Humble Trader

Saturday, September 09, 2006

Extract HTML Data - Package STA.EXTRACT_HTML_DATA Specification

  • Package: extract_html_data
  • Description: Container for all procedures and functions relating to extracting data from semi-structured HTML.
    • Type:
      • Name: data_list
      • Description: Container for HTML table data.
      • Type: TABLE
      • Datatype: VARCHAR2(4000)
    • Function: extract_table
      • Description: Return a PL/SQL table containing the data found in HTML table number p_table_no in PARSED_HTML. The PARSED_HTML block used is the most recently created one for the given page name.
      • Parameters:
        • p_table_no:
          • Datatype: NUMBER
          • Direction: IN
          • Description: The ordinal position of the table within the HTML.
        • p_web_page:
          • Datatype: html_pages.name%TYPE
          • Direction: IN
          • Description: The page name that the PARSED_HTML data block originated from.
      • Return:
        • Datatype: data_list
        • Description: A PL/SQL table containing the data. Each element contains the data for one table row, converted to character form, in the format: '[data_1]','[data_2]','[...
      • Action:
        • Get run_no keys (parse and raw) for parsed_html from the web page name.
        • Get the start component sequence for the table of interest.
        • Get the end component sequence for the table of interest.
        • For all the HTML components inside the range found above...
          • If the start of a new table row...
            • Create a new list object.
            • Put an opening quote into the object.
          • Else if an opening or closing Table Data tag...
            • If an opening tag and not the first piece of data...
              • Add a comma component separator.
            • End; If an opening tag and not the first piece of data.
            • Add a quote.
          • Else if not a tag...
            • Add the data to the object.
          • Else do nothing.
          • End; If the start of a new table row.
        • End; For all the HTML components in the range found above.
        • Return the list.
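
The Action steps above can be sketched outside the database. This is a rough Python illustration of the logic only, not the package body. Everything named here is an assumption made for the example: the helpers tokenize and tag_name are hypothetical, the PARSED_HTML components are stood in for by a flat list of strings, the table number is treated as 1-based, and tables are assumed not to be nested. One more assumption: read literally, the steps would put two quotes at the start of each row (one at the row start, one at the first opening Table Data tag), so the sketch lets the row-start quote serve as the opening quote of the first cell.

```python
import re


def tokenize(html):
    """Hypothetical stand-in for PARSED_HTML: split into tags and text."""
    parts = re.split(r"(<[^>]+>)", html)
    return [p.strip() for p in parts if p.strip()]


def tag_name(component):
    """Return 'td', '/td', ... for a tag, '' for other markup, None for text."""
    if not component.startswith("<"):
        return None
    m = re.match(r"<\s*(/?\s*\w+)", component)
    return m.group(1).replace(" ", "").lower() if m else ""


def extract_table(components, table_no):
    """Return one quoted, comma-separated string per row of table table_no."""
    # Start and end component positions for the table of interest
    # (assumes a 1-based table number and no nested tables).
    starts = [i for i, c in enumerate(components) if tag_name(c) == "table"]
    start = starts[table_no - 1]
    end = next(i for i in range(start, len(components))
               if tag_name(components[i]) == "/table")

    rows = []
    first_cell = True
    for component in components[start + 1:end]:
        name = tag_name(component)
        if name == "tr":
            rows.append("'")        # new list element plus its opening quote
            first_cell = True
        elif not rows:
            continue                # ignore anything before the first row
        elif name in ("td", "/td"):
            if name == "td" and not first_cell:
                rows[-1] += ","     # comma component separator
            if name == "td" and first_cell:
                first_cell = False  # assumption: row-start quote opens this cell
            else:
                rows[-1] += "'"
        elif name is None:
            rows[-1] += component   # not a tag: add the data
        # any other tag: do nothing
    return rows


html = ("<table><tr><td>x</td></tr></table>"
        "<table><tr><td>IBM</td><td>81.5</td></tr>"
        "<tr><td>MSFT</td><td>25.6</td></tr></table>")
print(extract_table(tokenize(html), 2))
```

For the second table in the sample HTML this gives one element per row, each in the '[data_1]','[data_2]' form described under Return.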
