Humble Trader

Friday, September 01, 2006

Get HTML - Table STA.PARSED_HTML Specification

  • Name: PARSED_HTML
  • Description: Holds parsed external-site HTML. The data is reasonably well structured and includes indentation information in case a report is run against it. Each row contains either a tag or data.
  • Columns:
    • Name: PARSE_RUN_NO
      • Description: Part of primary key and foreign key to HTML_PARSE_PASSES.PARSE_RUN_NO. The Run Number that created this row.
      • Datatype: NUMBER
      • Null: No
      • Unique: No
      • Part of PK: Yes
    • Name: RAW_RUN_NO
      • Description: Part of primary key and foreign key to HTML_PARSE_PASSES.RAW_RUN_NO. The Run Number of the raw HTML that this row is sourced from.
      • Datatype: NUMBER
      • Null: No
      • Unique: No
      • Part of PK: Yes
    • Name: COMPONENT_SEQ
      • Description: Part of primary key. Maintains the order of the component in the HTML page.
      • Datatype: NUMBER
      • Null: No
      • Unique: No
      • Part of PK: Yes
    • Name: INDENT
      • Description: Indentation information for report layouts. The lowest indent is zero, and components nested within outer components are given numbers commensurate with their depth in the web page. For example, the BODY tag is at level 0, a TABLE tag could be at level 1, TR at level 2, and TD at level 3.
      • Datatype: NUMBER
      • Null: No
      • Unique: No
      • Part of PK: No
    • Name: HTML_COMPONENT
      • Description: The HTML tag or data from the web page. Long strings of data may be split over several rows, and no attempt at paragraph layout is made, i.e. words may be split across rows.
      • Datatype: VARCHAR2(4000)
      • Null: No
      • Unique: No
      • Part of PK: No
    • Name: INS_TSP
      • Description: Insert timestamp.
      • Datatype: DATE
      • Null: No
      • Unique: No
      • Part of PK: No
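
To make the role of COMPONENT_SEQ and INDENT concrete, here is a minimal Python sketch of how a report might lay out rows read from this table. It is only an illustration under stated assumptions: the ParsedHtmlRow class, the render_report function, the two-space padding, and the sample rows are all hypothetical and not part of the specification. INS_TSP is left out for brevity.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class ParsedHtmlRow:
    """Hypothetical in-memory copy of one PARSED_HTML row (INS_TSP omitted)."""
    parse_run_no: int
    raw_run_no: int
    component_seq: int
    indent: int
    html_component: str


def render_report(rows: Iterable[ParsedHtmlRow], pad: str = "  ") -> str:
    """Order rows by the primary key and indent each component by INDENT."""
    ordered = sorted(
        rows,
        key=lambda r: (r.parse_run_no, r.raw_run_no, r.component_seq),
    )
    return "\n".join(pad * r.indent + r.html_component for r in ordered)


# Made-up sample rows, deliberately out of order to show that
# COMPONENT_SEQ restores the page order.
rows = [
    ParsedHtmlRow(1, 1, 3, 2, "<TR>"),
    ParsedHtmlRow(1, 1, 1, 0, "<BODY>"),
    ParsedHtmlRow(1, 1, 2, 1, "<TABLE>"),
    ParsedHtmlRow(1, 1, 4, 3, "<TD>"),
    ParsedHtmlRow(1, 1, 5, 4, "some cell data"),
]

report = render_report(rows)
print(report)
```

The sort key mirrors the three primary key columns, so rows from different runs are never interleaved, and the padding is simply the INDENT value repeated.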
