Commit d8eeb8e

Author: John Coggeshall
Updated test cases and examples and cleaned up the new OO code so it will
be easier to maintain.
1 parent 6b567f8 commit d8eeb8e

13 files changed: +134 -212 lines changed

ext/tidy/README

Lines changed: 36 additions & 77 deletions
@@ -1,7 +1,7 @@

 README FOR ext/tidy by John Coggeshall <john@php.net>

-Tidy Version: 0.5b
+Tidy Version: 0.7b

 Tidy is an extension based on Libtidy (http://tidy.sf.net/) and allows a PHP developer
 to clean, repair, and traverse HTML, XHTML, and XML documents -- including ones with
@@ -19,55 +19,55 @@ then recompile libtidy.
 The Tidy extension has two separate APIs, one for general parsing, cleaning, and
 repairing and another for document traversal. The general API is provided below:

-tidy_create()                   Initialize and return a tidy document resource
-tidy_parse_file($tidy, $file)   Parse the document stored in $file
-tidy_parse_string($tidy, $str)  Parse the string stored in $str
+tidy_create()                   Reinitialize the tidy engine
+tidy_parse_file($file)          Parse the document stored in $file
+tidy_parse_string($str)         Parse the string stored in $str

-tidy_clean_repair($tidy)        Clean and repair the document
-tidy_diagnose($tidy)            Diagnose a parsed document
+tidy_clean_repair()             Clean and repair the document
+tidy_diagnose()                 Diagnose a parsed document

-tidy_setopt($tidy, $opt, $val)  Set a configuration option $opt to $val
-tidy_getopt($tidy, $opt)        Retrieve a configuration option
+tidy_setopt($opt, $val)         Set a configuration option $opt to $val
+tidy_getopt($opt)               Retrieve a configuration option

-** note: $opt is a string representing the option. Right now the only
-source of these options is the LibTidy source.. eventually I'll document
-them offically -- see the src/config.c file in the tidy source **
+** note: $opt is a string representing the option. Although no formal
+documentation yet exists for PHP, you can find a description of many
+of them at http://www.w3.org/People/Raggett/tidy/ and a list of supported
+options in the phpinfo(); output**

-tidy_get_output($tidy)          Return the cleaned tidy HTML as a string
-tidy_get_error_buffer($tidy)    Return a log of the errors and warnings
+tidy_get_output()               Return the cleaned tidy HTML as a string
+tidy_get_error_buffer()         Return a log of the errors and warnings
                                 returned by tidy

 tidy_get_release()              Return the Libtidy release date
-tidy_get_status($tidy)          Return the status of the document
-tidy_get_html_ver($tidy)        Return the major HTML version detected for
+tidy_get_status()               Return the status of the document
+tidy_get_html_ver()             Return the major HTML version detected for
                                 the document;

-tidy_is_xhtml($tidy)            Determines if the document is XHTML
-tidy_is_xml($tidy)              Determines if the document is a generic XML
+tidy_is_xhtml()                 Determines if the document is XHTML
+tidy_is_xml()                   Determines if the document is a generic XML

-tidy_error_count($tidy)         Returns the number of errors in the document
-tidy_warning_count($tidy)       Returns the number of warnings in the document
-tidy_access_count($tidy)        Returns the number of accessibility-related
+tidy_error_count()              Returns the number of errors in the document
+tidy_warning_count()            Returns the number of warnings in the document
+tidy_access_count()             Returns the number of accessibility-related
                                 warnings in the document.
-tidy_config_count($tidy)        Returns the number of configuration errors found
+tidy_config_count()             Returns the number of configuration errors found

-tidy_load_config($tidy, $file)  Loads the specified configuration file
-tidY_load_config_enc($tidy,
-                     $file,
+tidy_load_config($file)         Loads the specified configuration file
+tidY_load_config_enc($file,
                      $enc)      Loads the specified config file using the specified
                                 character encoding
-tidy_set_encoding($tidy, $enc)  Sets the current character encoding for the document
-tidy_save_config($tidy, $file)  Saves the current config to $file
+tidy_set_encoding($enc)         Sets the current character encoding for the document
+tidy_save_config($file)         Saves the current config to $file


 Beyond these general-purpose API functions, Tidy also supports the following
 functions which are used to retrieve an object for document traversal:

-tidy_get_root($tidy)            Returns an object starting at the root of the
+tidy_get_root()                 Returns an object starting at the root of the
                                 document
-tidy_get_head($tidy)            Returns an object starting at the <HEAD> tag
-tidy_get_html($tidy)            Returns an object starting at the <HTML> tag
-tidy_get_body($tidy)            Returns an object starting at the <BODY> tag
+tidy_get_head()                 Returns an object starting at the <HEAD> tag
+tidy_get_html()                 Returns an object starting at the <HTML> tag
+tidy_get_body()                 Returns an object starting at the <BODY> tag

 All Navigation of the specified document is done via the PHP5 object constructs.
 There are two types of objects which Tidy can create. The first is TidyNode, which
@@ -82,18 +82,12 @@ class TidyNode {
     public $type; // type of node (text, php, asp, etc.)
     public $id; // id of node (i.e. TIDY_TAG_HEAD)

-    public $line; // line # of node in source
-    public $column; // column # of node in source
-
-    public $html_ver; // HTML version (0,1,2,3,4)
-
-    public $attribs; // an array of attributes (see TidyAttr)
-    public $children; // an array of child nodes
+    public function attributes(); // an array of attributes (see TidyAttr)
+    public function children(); // an array of child nodes

     function has_siblings(); // any sibling nodes?
     function has_children(); // any child nodes?
-    function has_parent(); // have a parent?
-
+
     function is_comment(); // is node a comment?
     function is_xhtml(); // is document XHTML?
     function is_xml(); // is document generic XML (not HTML/XHTML)
@@ -106,45 +100,12 @@ class TidyNode {

     function next(); // returns next node
     function prev(); // returns prev node
-    function parent(); // returns parent node
-    function child(); // returns first child node
-
+
     /* Searches for a particular attribute in the current node based
        on node ID. If found returns a TidyAttr object for it */
-    function get_attr_type($attr_id);
+    function get_attr($attr_id);

     /*
-
-    NOT YET IMPLEMENTED
-
-    Recursively traverses the tree from the current node and returns
-    an array of attributes matching the node ID/attr ID pair
-
-    Useful for pulling out things like links:
-    foreach($body->fetch_attrs(TIDY_TAG_A, TIDY_ATTR_HREF) as $link) {
-        echo "Link : {$link->value}\n";
-    }
-    */
-
-    function fetch_attrs($node_id, $attr_id);
-
-    /*
-
-    NOT YET IMPLEMENTED
-
-    Recursively traverses the tree from the current node and returns
-    an array of nodes matching the node ID
-
-    Useful for pulling out tables, etc (echos the HTML for every
-    <TABLE> block)
-
-    foreach($body->fetch_nodes(TIDY_TAG_TABLE) as $table) {
-
-        echo $table->value;
-
-    }
-    */
-    function fetch_nodes($node_id)
 }

 class TidyAttr {
@@ -153,11 +114,9 @@ class TidyAttr {
     public $value; // attribute value
     public $id; // attribute id i.e. TIDY_ATTR_HREF

-    function next(); // returns next attribute in tag
-    function tag(); // returns the tag node associated with attribute
 }

 Examples of using these objects to navigate the tree can be found in the examples/
 directory (I suggest looking at urlgrab.php and dumpit.php)

-E-mail thoughts, suggestions, patches, etc. to <john@php.net>
+E-mail thoughts, suggestions, patches, etc. to <john@php.net>

ext/tidy/TODO

Lines changed: 1 addition & 3 deletions
@@ -1,5 +1,3 @@
 TODO

-- Implement fetch_attr(), fetch_node() methods
-- Fix any memleaks
-- Fix Win32 crashes
+- Implement get_nodes() method

ext/tidy/examples/cleanhtml.php

Lines changed: 7 additions & 9 deletions
@@ -12,26 +12,24 @@
  *
  */

-$tidy = tidy_create();
-
 if(!isset($_SERVER['argv'][1])) {
     $data = file_get_contents("php://stdin");
-    tidy_parse_string($tidy, $data);
+    tidy_parse_string($data);
 } else {
-    tidy_parse_file($tidy, $_SERVER['argv'][1]);
+    tidy_parse_file($_SERVER['argv'][1]);
 }

-tidy_clean_repair($tidy);
+tidy_clean_repair();

-if(tidy_warning_count($tidy) ||
-   tidy_error_count($tidy)) {
+if(tidy_warning_count() ||
+   tidy_error_count()) {

     echo "\n\nThe following errors or warnings occured:\n";
-    echo tidy_get_error_buffer($tidy);
+    echo tidy_get_error_buffer();
     echo "\n";
 }

-echo tidy_get_output($tidy);
+echo tidy_get_output();

 ?>

ext/tidy/examples/dumpit.php

Lines changed: 6 additions & 8 deletions
@@ -10,15 +10,13 @@
  * Usage; php dumpit.php <filename>
  */

-
-$tidy = tidy_create();
-tidy_parse_file($tidy, $_SERVER['argv'][1]);
+tidy_parse_file($_SERVER['argv'][1]);

 /* Optionally you can do this here if you want to fix up the document */

-/* tidy_clean_repair($tidy); */
+/* tidy_clean_repair(); */

-$tree = tidy_get_root($tidy);
+$tree = tidy_get_root();
 dump_tree($tree);
 echo "\n";
@@ -70,20 +68,20 @@ function dump_tree($node, $indent = 0) {
     }

     /* Any attributes on this node? */
-    if(count($node->attribs)) {
+    if(count($node->attributes())) {
         do_leaf(" |\n", $indent);
         do_leaf(" +---- Attributes\n", $indent);

         /* Cycle through the attributes and display them and their values. */
-        foreach($node->attribs as $attrib) {
+        foreach($node->attributes() as $attrib) {
             do_leaf(" +--{$attrib->name}\n", $indent);
             do_leaf(" | +-- Value: {$attrib->value}\n", $indent);
         }
     }

     /* Recurse along the children to generate the remaining nodes */
     if($node->has_children()) {
-        foreach($node->children as $child) {
+        foreach($node->children() as $child) {
             dump_tree($child, $indent + 3);
         }
     }
}

ext/tidy/examples/urlgrab.php

Lines changed: 6 additions & 9 deletions
@@ -11,18 +11,15 @@
  * Usage: php urlgrab.php <file>
  *
  */
-
-/* Create a Tidy Resource */
-$tidy = tidy_create();
-
+
 /* Parse the document */
-tidy_parse_file($tidy, $_SERVER['argv'][1]);
+tidy_parse_file($_SERVER['argv'][1]);

 /* Fix up the document */
-tidy_clean_repair($tidy);
+tidy_clean_repair();

 /* Get an object representing everything from the <HTML> tag in */
-$html = tidy_get_html($tidy);
+$html = tidy_get_html();

 /* Traverse the document tree */
 print_r(get_links($html));
@@ -33,7 +30,7 @@ function get_links($node) {
     /* Check to see if we are on an <A> tag or not */
     if($node->id == TIDY_TAG_A) {
         /* If we are, find the HREF attribute */
-        $attrib = $node->get_attr_type(TIDY_ATTR_HREF);
+        $attrib = $node->get_attr(TIDY_ATTR_HREF);
         if($attrib) {
             /* Add the value of the HREF attrib to $urls */
             $urls[] = $attrib->value;
@@ -45,7 +42,7 @@ function get_links($node) {
     if($node->has_children()) {

         /* Traverse down each child recursively */
-        foreach($node->children as $child) {
+        foreach($node->children() as $child) {

             /* Append the results from recursion to $urls */
             foreach(get_links($child) as $url) {

ext/tidy/php_tidy.h

Lines changed: 27 additions & 0 deletions
@@ -95,6 +95,33 @@ extern zend_module_entry tidy_module_entry;
         obj = (PHPTidyObj *)zend_object_store_get_object(object TSRMLS_CC); \
     }

+#define INSTANCIATE_NODE(_zval, _container, _node) \
+    tidy_instanciate(tidy_ce_node, _zval TSRMLS_CC); \
+    _container = (PHPTidyObj *) zend_object_store_get_object(_zval TSRMLS_CC); \
+    _container->node = _node; \
+    _container->attr = NULL; \
+    _container->type = is_node; \
+    tidy_add_default_properities(_container, is_node TSRMLS_CC);
+
+#define INSTANCIATE_ATTR(_zval, _container, _attr) \
+    tidy_instanciate(tidy_ce_attr, _zval TSRMLS_CC); \
+    _container = (PHPTidyObj *) zend_object_store_get_object(_zval TSRMLS_CC); \
+    _container->node = NULL; \
+    _container->attr = _attr; \
+    _container->type = is_attr; \
+    tidy_add_default_properities(_container, is_attr TSRMLS_CC);
+
+#define PHP_NODE_METHOD_IS_TYPE(_type, _const) \
+PHP_NODE_METHOD(is_ ##_type) \
+{ \
+    GET_THIS_CONTAINER(); \
+    if(tidyNodeGetType(obj->node) == _const) {\
+        RETURN_TRUE; \
+    } else { \
+        RETURN_FALSE; \
+    } \
+}
+
 typedef enum {
     is_node,
     is_attr

ext/tidy/tests/002.phpt

Lines changed: 2 additions & 5 deletions
@@ -7,12 +7,9 @@ tidy_parse_string()
 --INI--
 --FILE--
 <?php
-
-$tidy = tidy_create();
-
-tidy_parse_string($tidy, "<HTML></HTML>");
+tidy_parse_string("<HTML></HTML>");

-echo tidy_get_output($tidy);
+echo tidy_get_output();

 ?>
 --EXPECT--

ext/tidy/tests/003.phpt

Lines changed: 3 additions & 5 deletions
@@ -8,12 +8,10 @@ tidy_clean_repair()
 --FILE--
 <?php

-$tidy = tidy_create();
-
-tidy_parse_string($tidy, "<HTML></HTML>");
-tidy_clean_repair($tidy);
+tidy_parse_string("<HTML></HTML>");
+tidy_clean_repair();

-echo tidy_get_output($tidy);
+echo tidy_get_output();

 ?>
 --EXPECT--

ext/tidy/tests/004.phpt

Lines changed: 3 additions & 6 deletions
@@ -7,12 +7,9 @@ tidy_diagnose()
 --INI--
 --FILE--
 <?php
-
-$tidy = tidy_create();
-
-tidy_parse_string($tidy, "<HTML></HTML>");
-tidy_diagnose($tidy);
-echo tidy_get_error_buffer($tidy);
+tidy_parse_string("<HTML></HTML>");
+tidy_diagnose();
+echo tidy_get_error_buffer();

 ?>
 --EXPECT--

ext/tidy/tests/005.phpt

Lines changed: 2 additions & 4 deletions
@@ -8,11 +8,9 @@ tidy_parse_file()
 --FILE--
 <?php

-$tidy = tidy_create();
+tidy_parse_file("ext/tidy/tests/005.html");

-tidy_parse_file($tidy, "ext/tidy/tests/005.html");
-
-echo tidy_get_output($tidy);
+echo tidy_get_output();

 ?>
 --EXPECT--
