@@ -92,12 +92,37 @@ Represents a page in the document:
This node represents a paragraph, a heading, or any other text within the document.

- - `category`: The type `"doc"`.
+ - `category`: The classification of the text within the document.
- `content`: A string representing the textual content.
- `marks`: List of [marks](#marks) applied to the text, such as bold or italic.
- `attributes`: Can contain metadata, such as the bounding box that locates this portion of text on the page.

-
+ ### Category
+ Below are the various categories of text that may be found within a document:
+
+ **Category Type**
+ - `page-header`: Represents the header of the page.
+ - `footer`: Represents the footer of the page.
+ - `heading`: Any heading within the document.
+ - `figure`: Represents a figure or an image.
+ - `other`: Any other unclassified text.
+ - `appendix`: Text within an appendix.
+ - `keywords`: List of keywords.
+ - `acknowledgments`: Section acknowledging contributors.
+ - `caption`: Caption associated with a figure or table.
+ - `toc`: Table of contents.
+ - `abstract`: The abstract of the document.
+ - `footnote`: Text at the bottom of the page providing additional information.
+ - `body`: Main body text of the document.
+ - `itemize-item`: Item in a list or bullet point.
+ - `title`: The title of the document.
+ - `reference`: References or citations within the document.
+ - `affiliation`: Author's institutional affiliation.
+ - `general-terms`: General terms section.
+ - `formula`: Mathematical formula or equation.
+ - `categories`: Categories or topics listed in the document.
+ - `table`: Represents a table.
+ - `authors`: List of authors.
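
As an illustration of the category list above, here is a minimal Python sketch of a text node whose `category` is checked against the documented values. The `TextNode` class, the `TEXT_CATEGORIES` name, and the Python types chosen for `marks` and `attributes` are assumptions made for this example; they are not taken from the library's actual API.

```python
from dataclasses import dataclass, field
from typing import Any

# Category values copied from the list above.
TEXT_CATEGORIES = frozenset({
    "page-header", "footer", "heading", "figure", "other", "appendix",
    "keywords", "acknowledgments", "caption", "toc", "abstract", "footnote",
    "body", "itemize-item", "title", "reference", "affiliation",
    "general-terms", "formula", "categories", "table", "authors",
})


@dataclass
class TextNode:
    """Hypothetical text node: field names follow the README, types are assumed."""

    category: str
    content: str
    marks: list[dict[str, Any]] = field(default_factory=list)
    attributes: dict[str, Any] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Reject any category that is not in the documented list.
        if self.category not in TEXT_CATEGORIES:
            raise ValueError(f"unknown text category: {self.category!r}")


heading = TextNode(category="heading", content="1. Introduction")
```

With this sketch, the old value `"doc"` would be rejected with a `ValueError`, since it is not one of the listed text categories.
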
### Marks
@@ -119,8 +144,9 @@ Attributes are optional fields that can store additional information for each no
- `DocumentAttributes`: General attributes for the document (currently reserved for future use).
- `PageAttributes`: Page-specific attributes, such as the page number.
- - `TextAttributes`: Text related attributes, such as bounding boxes.
+ - `TextAttributes`: Text-related attributes, such as bounding boxes or level.
- `BoundingBox`: A box that specifies the position of a piece of text on the page.
+ - `Level`: The level of the text within the document, for example the level of a heading.
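
To make the text attributes above more concrete, the following Python sketch models them as optional fields. The coordinate layout of `BoundingBox` (`left`, `top`, `width`, `height`), the integer type of `level`, and the snake_case field names are all assumptions for illustration; the README does not specify them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BoundingBox:
    """Assumed layout: top-left corner plus size, in page coordinates."""

    left: float
    top: float
    width: float
    height: float


@dataclass
class TextAttributes:
    """Hypothetical text attributes; both fields are optional."""

    bounding_box: Optional[BoundingBox] = None
    level: Optional[int] = None


# A second-level heading with an example position on the page.
attrs = TextAttributes(
    bounding_box=BoundingBox(left=72.0, top=90.0, width=450.0, height=18.0),
    level=2,
)
```

Both fields default to `None`, which reflects the description of attributes as optional.
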
## Getting started