Extract HTML Data - Functional Specification
The sub-system Get HTML goes to the internet, grabs the HTML from a page and puts it into PARSED_HTML in a semi-structured format. That, however, is only half the story. I now want to extract real data from this HTML, and that is what this sub-system does.
HTML-based data comes in a number of internal structures within the page. This sub-system deals with the following:
- Table Data:
  - HTML can contain tables with data in them. A function will extract this data and return it in a structure. As there may be multiple tables on a web page, the ordinal position (first, second, and so on) of the table of interest needs to be known.
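The table extraction described above could look roughly like the following sketch. It is only an illustration under stated assumptions: the names TableExtractor and extract_table are hypothetical, the input is taken to be a raw HTML string (the layout of PARSED_HTML is not defined here), the position is counted from 1, and nested tables and omitted closing tags are not handled. It uses only Python's standard html.parser module.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect each <table> as a list of rows of cell text.

    Hypothetical helper for illustration; nested tables are not handled.
    """

    def __init__(self):
        super().__init__()
        self.tables = []   # one list of rows per table, in page order
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that sits inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None
            self._cell = None


def extract_table(html, position):
    """Return the table at the given 1-based position as a list of rows."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    if not 1 <= position <= len(parser.tables):
        raise IndexError(
            f"page has {len(parser.tables)} table(s), asked for {position}"
        )
    return parser.tables[position - 1]


# Usage: a page with two tables, of which the second is the one of interest.
page = (
    "<table><tr><td>a</td></tr></table>"
    "<table><tr><th>Name</th><th>Qty</th></tr>"
    "<tr><td>Bolt</td><td>4</td></tr></table>"
)
print(extract_table(page, 2))  # [['Name', 'Qty'], ['Bolt', '4']]
```

Returning a list of rows, each a list of cell strings, is just one possible "structure"; header cells are not distinguished from data cells in this sketch.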



