Humble Trader

Saturday, September 09, 2006

Extract HTML Data - Functional Specification

The Get HTML sub-system goes to the internet, grabs the HTML from a page and puts it into PARSED_HTML in a semi-structured format. That, however, is only half the story. I now want to extract real data from this HTML, and that is what this sub-system does.

Data in an HTML page can be held in several kinds of structure within the page. This sub-system deals with the following:

  • Table Data:
    • HTML can contain tables with data in them. A function will extract this data and return it in a structure. As there may be multiple tables on a web page, the function needs to be told the ordinal position (first, second, third and so on) of the table of interest.
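
To make the table extraction idea concrete, here is a minimal sketch in Python. It is only an illustration, not the actual design: the names TableCollector and extract_table are made up for this example, the input is assumed to be a plain HTML string (the real layout of PARSED_HTML is not covered here), positions are assumed to count from 1, and the markup is assumed to close its tr, td and th tags and to contain no nested tables.

```python
from html.parser import HTMLParser


class TableCollector(HTMLParser):
    """Hypothetical helper: gathers each <table> as a list of rows of cell text."""

    def __init__(self):
        super().__init__()
        self.tables = []   # one list of rows per table, in page order
        self._row = None   # row being built, or None when outside a <tr>
        self._cell = None  # text fragments of the current cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that sits inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None
            self._cell = None  # discard a cell left open by sloppy markup


def extract_table(html, position):
    """Return the table at the given 1-based position as a list of rows."""
    if position < 1:
        raise ValueError("position counts from 1")
    collector = TableCollector()
    collector.feed(html)
    collector.close()
    if position > len(collector.tables):
        raise IndexError("page has only %d table(s)" % len(collector.tables))
    return collector.tables[position - 1]


if __name__ == "__main__":
    page = (
        "<table><tr><td>menu</td></tr></table>"
        "<table><tr><th>Symbol</th><th>Price</th></tr>"
        "<tr><td>ABC</td><td>1.50</td></tr></table>"
    )
    # Ask for the second table on the page and print it row by row.
    for row in extract_table(page, 2):
        print(row)
```

Returning a list of rows, each a list of strings, keeps the structure simple; header cells and data cells are not distinguished, so the caller has to know whether the first row is a heading.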
