

Recognizing Ontology-Applicable Multiple-Record Web Documents

David W. Embley

Dennis Ng

Li Xu

Brigham Young University


Problem: Recognizing Applicable Documents

Document 1: Car Ads

Document 2: Items for Sale or Rent



Car-Ads Ontology

Car [->object];

Car [0:0.975:1] has Year [1:*];

Car [0:0.925:1] has Make [1:*];

Car [0:0.908:1] has Model [1:*];

Car [0:0.45:1] has Mileage [1:*];

Car [0:2.1:*] has Feature [1:*];

Car [0:0.8:1] has Price [1:*];

PhoneNr [1:*] is for Car [1:1.15:*];

Year matches [4]

constant {extract "\d{2}";

context "([^\$\d]|^)[4-9]\d,[^\d]";

substitute "^" -> "19"; },

End;
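
A rough Python sketch of how the Year recognizer above might behave. It assumes that `context` locates the surrounding text, `extract` pulls the value out of that context, and `substitute` rewrites the extracted string. The function name `extract_years` is hypothetical, and this reading of the rule is an assumption, not the authors' implementation.

```python
import re

# Patterns copied from the Year rule on the slide.
CONTEXT = re.compile(r"([^\$\d]|^)[4-9]\d,[^\d]")
EXTRACT = re.compile(r"\d{2}")


def extract_years(text):
    """Return four-digit years for two-digit years found in the given context."""
    years = []
    for context_match in CONTEXT.finditer(text):
        value = EXTRACT.search(context_match.group(0))
        if value:
            # substitute "^" -> "19": prefix the two-digit year with the century.
            years.append(re.sub(r"^", "19", value.group(0)))
    return years


print(extract_years("Honda Accord 94, $5,000"))
```

The context pattern rejects a leading "$" or digit, so the price "$5,000" in the sample string is not mistaken for a year.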


Recognition Heuristics

  • H1: Density

  • H2: Expected Values

  • H3: Grouping


H1: Density

Document 1: Car Ads

Document 2: Items for Sale or Rent


H1: Density

  • Car Ads

    • Number of Matched Characters: 626

    • Total Number of Characters: 2048

    • Density: 0.306

  • Items for Rent or Sale

    • Number of Matched Characters: 196

    • Total Number of Characters: 2671

    • Density: 0.073
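
The density figures above are the ratio of matched characters to total characters. A minimal sketch, where the function name `density` is hypothetical and the character counts are taken from the slide:

```python
def density(matched_chars, total_chars):
    """Fraction of a document's characters matched by the ontology's recognizers."""
    if total_chars == 0:
        return 0.0
    return matched_chars / total_chars


car_ads = density(626, 2048)
sale_items = density(196, 2671)
print(round(car_ads, 3), round(sale_items, 3))
```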


H2: Expected Values

Document 1: Car Ads

Document 2: Items for Sale or Rent

Object Set  Document 1  Document 2
Year        3           1
Make        2           0
Model       3           0
Mileage     1           1
Price       1           0
Feature     15          0
PhoneNr     3           4


H2: Expected Values

Object Set  OV    D1  D2
Year        0.98  16  6
Make        0.93  10  0
Model       0.91  12  0
Mileage     0.45  6   2
Price       0.80  11  8
Feature     2.10  29  0
PhoneNr     1.15  15  11

D1: 0.996

D2: 0.567

[Figure: vectors D1, OV, and D2]
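
The slide does not state how the D1 and D2 scores are computed. They are consistent with the cosine of the angle between each document vector and the OV column, so the sketch below assumes that measure. The function name `cosine_similarity` is hypothetical; the vectors are copied from the table.

```python
import math

# Columns of the table, in the order Year, Make, Model, Mileage, Price, Feature, PhoneNr.
OV = [0.98, 0.93, 0.91, 0.45, 0.80, 2.10, 1.15]
D1 = [16, 10, 12, 6, 11, 29, 15]
D2 = [6, 0, 0, 2, 8, 0, 11]


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


print(round(cosine_similarity(OV, D1), 3))
print(round(cosine_similarity(OV, D2), 3))
```

Because the cosine ignores vector length, a long document and a short one with the same proportions of Year, Make, Model, and so on score the same.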


H3: Grouping (of 1-Max Object Sets)

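Document 1: Car Ads

[Figure: matched 1-max values bracketed into groups: Year, Make, Model, Price, Year, Model, Year, Make, Model, Mileage]

Document 2: Items for Sale or Rent

[Figure: matched 1-max values bracketed into groups: Year, Mileage, Mileage, Year, Price, Price]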


H3: Grouping

  • Car Ads

    • Group 1: Year, Year, Make, Model (3 distinct)

    • Group 2: Price, Year, Model, Year (3 distinct)

    • Group 3: Make, Model, Mileage, Year (4 distinct)

    • Group 4: Model, Mileage, Price, Year (4 distinct)

    • Grouping: (3+3+4+4) / (4×4) = 0.875

  • Sale Items

    • Group 1: Year, Year, Year, Mileage (2 distinct)

    • Group 2: Mileage, Year, Price, Price (3 distinct)

    • Group 3: Year, Price, Price, Year (2 distinct)

    • Group 4: Price, Price, Price, Price (1 distinct)

    • Grouping: (2+3+2+1) / (4×4) = 0.500

\[ \text{Grouping} = \frac{\text{Sum of Distinct 1-Max in each Group}}{\text{Number of Groups} \times \text{Expected Number in a Group}} \]

\[ \text{Expected Number in Group} = \left\lfloor \sum_{\text{1-Max}} \text{Ave} \right\rfloor = 4 \quad \text{(for our example)} \]
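
A small sketch of the grouping computation, using the group contents listed on this slide. The names `expected_group_size` and `grouping_score` are hypothetical. Taking the floor of the summed averages is an assumption (rounding gives the same value, 4, for this example), and ignoring a trailing partial group is a simplification.

```python
import math

# Average cardinalities of the 1-max object sets, from the Car-Ads ontology slide.
ONE_MAX_AVERAGES = {"Year": 0.975, "Make": 0.925, "Model": 0.908,
                    "Mileage": 0.45, "Price": 0.8}


def expected_group_size(averages):
    """Whole-number part of the summed averages (assumed floor)."""
    return math.floor(sum(averages.values()))


def grouping_score(labels, group_size):
    """Sum of distinct labels per group / (number of groups * group size)."""
    groups = [labels[i:i + group_size]
              for i in range(0, len(labels) - group_size + 1, group_size)]
    if not groups:
        return 0.0
    distinct = sum(len(set(group)) for group in groups)
    return distinct / (len(groups) * group_size)


car_ads = ["Year", "Year", "Make", "Model", "Price", "Year", "Model", "Year",
           "Make", "Model", "Mileage", "Year", "Model", "Mileage", "Price", "Year"]
sale_items = ["Year", "Year", "Year", "Mileage", "Mileage", "Year", "Price", "Price",
              "Year", "Price", "Price", "Year", "Price", "Price", "Price", "Price"]

n = expected_group_size(ONE_MAX_AVERAGES)
print(n, grouping_score(car_ads, n), grouping_score(sale_items, n))
```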


Combining Heuristics

  • Decision-Tree Learning Algorithm C4.5

    • (H1, H2, H3, Positive)

    • (H1, H2, H3, Negative)

  • Training Set

    • 20 positive examples

    • 30 negative examples (some purposely similar, e.g. classified ads)

  • Test Set

    • 10 positive examples

    • 20 negative examples
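
To illustrate how (H1, H2, H3, label) tuples can drive a decision-tree split, here is a simplified sketch that picks a threshold on one heuristic by information gain. It is not C4.5 itself, which additionally uses the gain ratio and pruning. The names `entropy` and `best_threshold` are hypothetical, and the only two examples used are the two sample documents from the earlier slides, not the actual training set.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())


def best_threshold(examples, attr):
    """Return (gain, threshold) for the best binary split on one attribute."""
    ordered = sorted(examples, key=lambda e: e[0][attr])
    labels = [label for _, label in ordered]
    base = entropy(labels)
    best = (0.0, None)
    for i in range(1, len(ordered)):
        low, high = ordered[i - 1][0][attr], ordered[i][0][attr]
        if low == high:
            continue  # no threshold separates identical values
        left, right = labels[:i], labels[i:]
        remainder = (len(left) * entropy(left)
                     + len(right) * entropy(right)) / len(labels)
        gain = base - remainder
        if gain > best[0]:
            best = (gain, (low + high) / 2)
    return best


# (H1, H2, H3) values of the two sample documents from the slides.
examples = [((0.306, 0.996, 0.875), "Positive"),
            ((0.073, 0.567, 0.500), "Negative")]
gain, threshold = best_threshold(examples, attr=0)  # split on H1
print(gain, threshold)
```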


Car Ads: Rule & Results

  • Precision: 100%

  • Recall: 91%

  • Accuracy: 97%

    • Harmonic Mean

    • 2/(1/Precision + 1/Recall)
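
The harmonic-mean formula above can be evaluated directly. A minimal sketch, with the hypothetical function name `harmonic_mean`; with the rounded precision and recall shown on this slide, it gives about 0.953.

```python
def harmonic_mean(precision, recall):
    """2 / (1/Precision + 1/Recall); defined as 0 if either input is 0."""
    if precision == 0 or recall == 0:
        return 0.0
    return 2 / (1 / precision + 1 / recall)


print(round(harmonic_mean(1.00, 0.91), 3))
```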




Obituaries: Rule & Results

  • Precision: 91%

  • Recall: 100%

  • Accuracy: 97%



Universal Rule

  • Precision: 84%

  • Recall: 100%

  • Accuracy: 93%


Additional and Future Work

  • Other Approaches

    • Naïve Bayes [McCallum96] (accuracy near 90%)

    • Logistic Regression [Wang01] (accuracy near 95%)

    • Multivariate Analysis with Continuous Random Vectors [Tang01] (accuracy near 100%)

  • More Extensive Testing

    • Similar documents (motorcycles, wedding announcements, …)

    • Accuracy drops to near 87%

    • Naïve Bayes drops to near 77%

    • Others … ?

  • Other Types of Documents

    • XML Documents

    • Forms and the Hidden Web

    • Tables


Summary

  • Objective: Automatically Recognize Document Applicability

  • Approach:

    • Conceptual Modeling

    • Recognition Heuristics

      • Density

      • Expected Values

      • Grouping

  • Result: Accuracy Near 95%

www.deg.byu.edu