  README FOR ext/tidy by John Coggeshall <john@php.net>

- Tidy Version: 0.5b
+ Tidy Version: 0.7b

  Tidy is an extension based on Libtidy (http://tidy.sf.net/) and allows a PHP developer
  to clean, repair, and traverse HTML, XHTML, and XML documents -- including ones with
@@ -19,55 +19,55 @@ then recompile libtidy.
  The Tidy extension has two separate APIs, one for general parsing, cleaning, and
  repairing and another for document traversal. The general API is provided below:

- tidy_create()                   Initialize and return a tidy document resource
- tidy_parse_file($tidy, $file)   Parse the document stored in $file
- tidy_parse_string($tidy, $str)  Parse the string stored in $str
+ tidy_create()                   Reinitialize the tidy engine
+ tidy_parse_file($file)          Parse the document stored in $file
+ tidy_parse_string($str)         Parse the string stored in $str

- tidy_clean_repair($tidy)        Clean and repair the document
- tidy_diagnose($tidy)            Diagnose a parsed document
+ tidy_clean_repair()             Clean and repair the document
+ tidy_diagnose()                 Diagnose a parsed document

- tidy_setopt($tidy, $opt, $val)  Set a configuration option $opt to $val
- tidy_getopt($tidy, $opt)        Retrieve a configuration option
+ tidy_setopt($opt, $val)         Set a configuration option $opt to $val
+ tidy_getopt($opt)               Retrieve a configuration option

- ** note: $opt is a string representing the option. Right now the only
- source of these options is the LibTidy source.. eventually I'll document
- them offically -- see the src/config.c file in the tidy source **
+ ** note: $opt is a string representing the option. Although no formal
+ documentation yet exists for PHP, you can find a description of many
+ of them at http://www.w3.org/People/Raggett/tidy/ and a list of supported
+ options in the phpinfo(); output**

- tidy_get_output($tidy)          Return the cleaned tidy HTML as a string
- tidy_get_error_buffer($tidy)    Return a log of the errors and warnings
+ tidy_get_output()               Return the cleaned tidy HTML as a string
+ tidy_get_error_buffer()         Return a log of the errors and warnings
                                  returned by tidy

  tidy_get_release()              Return the Libtidy release date
- tidy_get_status($tidy)          Return the status of the document
- tidy_get_html_ver($tidy)        Return the major HTML version detected for
+ tidy_get_status()               Return the status of the document
+ tidy_get_html_ver()             Return the major HTML version detected for
                                  the document;

- tidy_is_xhtml($tidy)            Determines if the document is XHTML
- tidy_is_xml($tidy)              Determines if the document is a generic XML
+ tidy_is_xhtml()                 Determines if the document is XHTML
+ tidy_is_xml()                   Determines if the document is a generic XML

- tidy_error_count($tidy)         Returns the number of errors in the document
- tidy_warning_count($tidy)       Returns the number of warnings in the document
- tidy_access_count($tidy)        Returns the number of accessibility-related
+ tidy_error_count()              Returns the number of errors in the document
+ tidy_warning_count()            Returns the number of warnings in the document
+ tidy_access_count()             Returns the number of accessibility-related
                                  warnings in the document.
- tidy_config_count($tidy)        Returns the number of configuration errors found
+ tidy_config_count()             Returns the number of configuration errors found

- tidy_load_config($tidy, $file)  Loads the specified configuration file
- tidY_load_config_enc($tidy,
-                      $file,
+ tidy_load_config($file)         Loads the specified configuration file
+ tidY_load_config_enc($file,
                       $enc)      Loads the specified config file using the specified
                                  character encoding
- tidy_set_encoding($tidy, $enc)  Sets the current character encoding for the document
- tidy_save_config($tidy, $file)  Saves the current config to $file
+ tidy_set_encoding($enc)         Sets the current character encoding for the document
+ tidy_save_config($file)         Saves the current config to $file


  Beyond these general-purpose API functions, Tidy also supports the following
  functions which are used to retrieve an object for document traversal:

- tidy_get_root($tidy)            Returns an object starting at the root of the
+ tidy_get_root()                 Returns an object starting at the root of the
                                  document
- tidy_get_head($tidy)            Returns an object starting at the <HEAD> tag
- tidy_get_html($tidy)            Returns an object starting at the <HTML> tag
- tidy_get_body($tidy)            Returns an object starting at the <BODY> tag
+ tidy_get_head()                 Returns an object starting at the <HEAD> tag
+ tidy_get_html()                 Returns an object starting at the <HTML> tag
+ tidy_get_body()                 Returns an object starting at the <BODY> tag

  All Navigation of the specified document is done via the PHP5 object constructs.
  There are two types of objects which Tidy can create. The first is TidyNode, which
@@ -82,18 +82,12 @@ class TidyNode {
      public $type; // type of node (text, php, asp, etc.)
      public $id; // id of node (i.e. TIDY_TAG_HEAD)

-     public $line; // line # of node in source
-     public $column; // column # of node in source
-
-     public $html_ver; // HTML version (0,1,2,3,4)
-
-     public $attribs; // an array of attributes (see TidyAttr)
-     public $children; // an array of child nodes
+     public function attributes(); // an array of attributes (see TidyAttr)
+     public function children(); // an array of child nodes

      function has_siblings(); // any sibling nodes?
      function has_children(); // any child nodes?
-     function has_parent(); // have a parent?
-
+
      function is_comment(); // is node a comment?
      function is_xhtml(); // is document XHTML?
      function is_xml(); // is document generic XML (not HTML/XHTML)
@@ -106,45 +100,12 @@ class TidyNode {
      function next(); // returns next node
      function prev(); // returns prev node
-     function parent(); // returns parent node
-     function child(); // returns first child node
-
+
      /* Searches for a particular attribute in the current node based
         on node ID. If found returns a TidyAttr object for it */
-     function get_attr_type ($attr_id);
+     function get_attr ($attr_id);

      /*
-
-     NOT YET IMPLEMENTED
-
-     Recursively traverses the tree from the current node and returns
-     an array of attributes matching the node ID/attr ID pair
-
-     Useful for pulling out things like links:
-         foreach($body->fetch_attrs(TIDY_TAG_A, TIDY_ATTR_HREF) as $link) {
-             echo "Link : {$link->value}\n";
-         }
-     */
-
-     function fetch_attrs($node_id, $attr_id);
-
-     /*
-
-     NOT YET IMPLEMENTED
-
-     Recursively traverses the tree from the current node and returns
-     an array of nodes matching the node ID
-
-     Useful for pulling out tables, etc (echos the HTML for every
-     <TABLE> block)
-
-         foreach($body->fetch_nodes(TIDY_TAG_TABLE) as $table) {
-
-             echo $table->value;
-
-         }
-     */
-     function fetch_nodes($node_id)
  }

  class TidyAttr {
@@ -153,11 +114,9 @@ class TidyAttr {
      public $value; // attribute value
      public $id; // attribute id i.e. TIDY_ATTR_HREF

-     function next(); // returns next attribute in tag
-     function tag(); // returns the tag node associated with attribute
  }

  Examples of using these objects to navigate the tree can be found in the examples/
  directory (I suggest looking at urlgrab.php and dumpit.php)

- E-mail thoughts, suggestions, patches, etc. to <john@php.net>
+ E-mail thoughts, suggestions, patches, etc. to <john@php.net>
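
The main effect of this change on calling code is that the general API no longer takes a `$tidy` resource. A minimal sketch of the 0.7b call sequence follows, using only the signatures from the "+" lines above. The input string is hypothetical, and the `indent` option name comes from the Tidy option list rather than from this README, so whether this build accepts it is an assumption. Later ext/tidy releases use a different, object-based API, so the sketch exits early when `tidy_setopt()` is not available.

```php
<?php
// Sketch only: follows the resource-free signatures on the "+" side of this
// diff (Tidy 0.7b). It is not the API of later ext/tidy releases, so bail out
// when the 0.7b-style tidy_setopt() function is missing.
if (!function_exists('tidy_setopt')) {
    exit("This sketch needs the 0.7b-style ext/tidy API.\n");
}

// Hypothetical input: a fragment with unclosed tags.
$html = '<p>An unclosed <b>paragraph';

// Assumed option name from the Tidy option list; adjust or drop as needed.
tidy_setopt('indent', true);

tidy_parse_string($html);   // 0.5b form: tidy_parse_string($tidy, $html)
tidy_clean_repair();        // 0.5b form: tidy_clean_repair($tidy)

// Print the repaired markup, then the log if Tidy reported anything.
echo tidy_get_output();
if (tidy_error_count() > 0 || tidy_warning_count() > 0) {
    echo tidy_get_error_buffer();
}
```

Code written against 0.5b would need the `tidy_create()` handle and the leading `$tidy` argument removed from each call, as the comments indicate.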