This repository was archived by the owner on Jan 2, 2019. It is now read-only.

Semantic table HTML reader #638

Open
wants to merge 11 commits into
base: 1.8


@apapsch apapsch commented Aug 12, 2015

This pull request introduces a new reader, PHPExcel_Reader_HTML_SemanticTable. Unlike the existing HTML reader, SemanticTable does not try to convert everything that looks like a table. Instead, only table elements are recognized, and each one is treated as a worksheet: its child elements caption, thead, and tbody are parsed as the worksheet title, header rows, and content rows, respectively.

For example, the following HTML fragment is recognized by SemanticTable:

<table>
  <caption>Worksheet name</caption>
  <thead>
    <tr>
      <th>Heading in A1</th>
      <th>Heading in B1</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Data column A2</td>
      <td>Data column B2</td>
    </tr>
    <tr>
      <td>Data column A3</td>
      <td>Data column B3</td>
    </tr>
  </tbody>
</table>
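
To illustrate the mapping described above (caption to worksheet title, thead rows to header rows, tbody rows to content rows), here is a minimal sketch using PHP's bundled DOM extension. It is not the code from this pull request: the function names `readSemanticTables` and `readRows` and the returned array layout are hypothetical, and PHP 7 or later is assumed for the type declarations.

```php
<?php
// Hypothetical helpers, not part of PHPExcel: each <table> becomes one
// array with the keys 'title', 'header' and 'body'.
function readSemanticTables(string $html): array
{
    $dom = new DOMDocument();
    // Collect libxml parse warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $sheets = [];
    foreach ($xpath->query('//table') as $table) {
        // Only a direct <caption> child is used as the worksheet title.
        $caption = $xpath->query('./caption', $table)->item(0);
        $sheets[] = [
            'title'  => $caption ? trim($caption->textContent) : null,
            'header' => readRows($xpath, './thead/tr', $table),
            'body'   => readRows($xpath, './tbody/tr', $table),
        ];
    }
    return $sheets;
}

// Returns one array of trimmed cell texts per <tr> matched by $expr.
function readRows(DOMXPath $xpath, string $expr, DOMNode $table): array
{
    $rows = [];
    foreach ($xpath->query($expr, $table) as $tr) {
        $cells = [];
        foreach ($xpath->query('./th|./td', $tr) as $cell) {
            $cells[] = trim($cell->textContent);
        }
        $rows[] = $cells;
    }
    return $rows;
}
```

Calling `readSemanticTables($html)` on a table shaped like the example should yield one entry whose `header` holds a single row and whose `body` holds two rows. Anything outside caption, thead, and tbody is ignored by this sketch, which mirrors the "only table elements are recognized" behaviour only loosely.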

To avoid duplication between PHPExcel_Reader_HTML and PHPExcel_Reader_HTML_SemanticTable, an abstract base class is introduced to hold the common functionality. Care was taken to keep modifications to PHPExcel_Reader_HTML to a minimum; it therefore uses only defaultElementHandler and does not use the TRAVERSE_CHILDS hint.
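
The following is a rough sketch of how a shared base class with per-element handlers could be structured. Only the names `defaultElementHandler` and `TRAVERSE_CHILDS` come from the description above. Everything else is an assumption: the class names, the `walk` method, the `handle<Tag>` naming scheme, and in particular the idea that a handler returns `TRAVERSE_CHILDS` to ask the base class to descend into child elements. The actual signatures in this pull request may differ.

```php
<?php
// Hypothetical sketch of a shared base class; not the code from this PR.
abstract class HtmlReaderBaseSketch
{
    // Assumed meaning: a handler returns this value to request that the
    // walker also visits the element's children.
    const TRAVERSE_CHILDS = 1;

    // Visits each child element and dispatches to handle<Tag>() if the
    // subclass defines it, otherwise to defaultElementHandler().
    protected function walk(DOMNode $node)
    {
        foreach ($node->childNodes as $child) {
            if (!$child instanceof DOMElement) {
                continue; // skip text nodes, comments, doctype
            }
            $method = 'handle' . ucfirst(strtolower($child->nodeName));
            $result = method_exists($this, $method)
                ? $this->$method($child)
                : $this->defaultElementHandler($child);
            if ($result === self::TRAVERSE_CHILDS) {
                $this->walk($child);
            }
        }
    }

    abstract protected function defaultElementHandler(DOMElement $element);
}

// Example subclass: collects the text of every <caption> element.
class CaptionCollectorSketch extends HtmlReaderBaseSketch
{
    public $captions = [];

    public function read($html)
    {
        $dom = new DOMDocument();
        $previous = libxml_use_internal_errors(true);
        $dom->loadHTML($html);
        libxml_clear_errors();
        libxml_use_internal_errors($previous);
        $this->walk($dom);
        return $this->captions;
    }

    // Returns null, so the walker does not descend into the caption.
    protected function handleCaption(DOMElement $element)
    {
        $this->captions[] = trim($element->textContent);
        return null;
    }

    // Unknown elements are simply traversed.
    protected function defaultElementHandler(DOMElement $element)
    {
        return self::TRAVERSE_CHILDS;
    }
}
```

Under this reading, a reader that relies only on `defaultElementHandler` and never returns the hint would have to perform its own recursion, which would fit the goal of leaving the existing reader's traversal logic mostly untouched.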
