Recognizing Ontology-Applicable Multiple-Record Web Documents


David W. Embley

Dennis Ng

Li Xu

Brigham Young University

Problem: Recognizing Applicable Documents

Document 1: Car Ads

Document 2: Items for Sale or Rent

Car-Ads Ontology

Car [->object];
Car [0:0.975:1] has Year [1:*];
Car [0:0.925:1] has Make [1:*];
Car [0:0.908:1] has Model [1:*];
Car [0:0.45:1] has Mileage [1:*];
Car [0:2.1:*] has Feature [1:*];
Car [0:0.8:1] has Price [1:*];
PhoneNr [1:*] is for Car [1:1.15:*];

Year matches [4]
  constant {extract "\d{2}";
    context "([^\$\d]|^)[4-9]\d,[^\d]";
    substitute "^" -> "19"; },
End;
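To illustrate how the Year rule above might be applied, here is a minimal Python sketch. It assumes the rule means: find text matching the context pattern, extract the two-digit constant inside it, then apply the substitution that prefixes "19". The function name `extract_years` and the sample strings are hypothetical; the ontology's actual matching engine is not shown in the slides.

```python
import re

# Patterns copied from the Year rule in the Car-Ads ontology.
CONTEXT = re.compile(r"([^\$\d]|^)[4-9]\d,[^\d]")
EXTRACT = re.compile(r"\d{2}")


def extract_years(text):
    """Return four-digit years found by the two-digit Year rule.

    Assumed reading of the rule: the context pattern locates a candidate,
    the extract pattern pulls the two digits out of it, and the
    substitution "^" -> "19" prefixes the century.
    """
    years = []
    for ctx in CONTEXT.finditer(text):
        constant = EXTRACT.search(ctx.group(0))
        if constant:
            # substitute "^" -> "19": insert "19" at the start of the constant
            years.append(re.sub(r"^", "19", constant.group(0), count=1))
    return years


if __name__ == "__main__":
    print(extract_years("'95, Honda Civic, $4,500"))
```

The context pattern excludes a leading "$" or digit, so a price such as "$95, firm" is not read as a year.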

Recognition Heuristics
  • H1: Density
  • H2: Expected Values
  • H3: Grouping
H1: Density

Document 1: Car Ads

Document 2: Items for Sale or Rent

H1: Density
  • Car Ads
    • Number of Matched Characters: 626
    • Total Number of Characters: 2048
    • Density: 0.306
  • Items for Sale or Rent
    • Number of Matched Characters: 196
    • Total Number of Characters: 2671
    • Density: 0.073
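The density figures above are the ratio of matched characters to total characters. A minimal sketch, with a hypothetical function name `density`, reproduces both values from the counts on the slide:

```python
def density(matched_chars, total_chars):
    """H1: fraction of a document's characters matched by the ontology."""
    if total_chars <= 0:
        raise ValueError("total_chars must be positive")
    return matched_chars / total_chars


if __name__ == "__main__":
    print(round(density(626, 2048), 3))   # Car Ads
    print(round(density(196, 2671), 3))   # Items for Sale or Rent
```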
H2: Expected Values

Document 1: Car Ads
  • Year: 3
  • Make: 2
  • Model: 3
  • Mileage: 1
  • Price: 1
  • Feature: 15
  • PhoneNr: 3

Document 2: Items for Sale or Rent
  • Year: 1
  • Make: 0
  • Model: 0
  • Mileage: 1
  • Price: 0
  • Feature: 0
  • PhoneNr: 4

H2: Expected Values

Object Set   OV     D1   D2
Year         0.98   16    6
Make         0.93   10    0
Model        0.91   12    0
Mileage      0.45    6    2
Price        0.80   11    8
Feature      2.10   29    0
PhoneNr      1.15   15   11

D1: 0.996

D2: 0.567

[Figure labels: D1, OV, D2]
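The slide does not name the measure behind the 0.996 and 0.567 scores. They are consistent with the cosine of the angle between the OV column and each document's count vector, so the sketch below assumes cosine similarity. The function name `cosine_similarity` is hypothetical; the vectors are the table columns.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumption: a document with no matches scores 0
    return dot / (norm_a * norm_b)


# Order: Year, Make, Model, Mileage, Price, Feature, PhoneNr
OV = [0.98, 0.93, 0.91, 0.45, 0.80, 2.10, 1.15]
D1 = [16, 10, 12, 6, 11, 29, 15]
D2 = [6, 0, 0, 2, 8, 0, 11]

if __name__ == "__main__":
    print(round(cosine_similarity(OV, D1), 3))
    print(round(cosine_similarity(OV, D2), 3))
```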

H3: Grouping (of 1-Max Object Sets)

Document 1: Car Ads

Document 2: Items for Sale or Rent

[Figure: 1-max object-set matches (Year, Make, Model, Mileage, Price) marked in each document and bracketed into groups]

H3: Grouping

Grouping = (Sum of Distinct 1-Max in each Group) / (Number of Groups × Expected Number in a Group)

Expected Number in a Group = sum of Ave over the 1-Max object sets, rounded to an integer = 4 (for our example)

Car Ads
  • Group 1: Year, Year, Make, Model (3 distinct)
  • Group 2: Price, Year, Model, Year (3 distinct)
  • Group 3: Make, Model, Mileage, Year (4 distinct)
  • Group 4: Model, Mileage, Price, Year (4 distinct)
  • Grouping: (3+3+4+4) / (4×4) = 0.875

Sale Items
  • Group 1: Year, Year, Year, Mileage (2 distinct)
  • Group 2: Mileage, Year, Price, Price (3 distinct)
  • Group 3: Year, Price, Price, Year (2 distinct)
  • Group 4: Price, Price, Price, Price (1 distinct)
  • Grouping: (2+3+2+1) / (4×4) = 0.500
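The grouping measure can be sketched as follows: split the sequence of 1-max matches into consecutive groups of the expected size, count the distinct object sets in each group, and divide the total by (number of groups × group size). The function name `grouping` is hypothetical. How a trailing partial group or an empty document should be handled is not stated on the slides, so both choices below are assumptions.

```python
def grouping(matches, group_size):
    """H3: sum of distinct 1-max object sets per group, normalized.

    matches: object-set names in document order.
    group_size: expected number of 1-max values in a group (4 in the example).
    """
    if group_size <= 0:
        raise ValueError("group_size must be positive")
    if not matches:
        return 0.0  # assumption: no matches means no grouping evidence
    # Consecutive chunks; a short final chunk still counts as one group (assumption).
    groups = [matches[i:i + group_size] for i in range(0, len(matches), group_size)]
    distinct_total = sum(len(set(group)) for group in groups)
    return distinct_total / (len(groups) * group_size)


CAR_ADS = ["Year", "Year", "Make", "Model",
           "Price", "Year", "Model", "Year",
           "Make", "Model", "Mileage", "Year",
           "Model", "Mileage", "Price", "Year"]

SALE_ITEMS = ["Year", "Year", "Year", "Mileage",
              "Mileage", "Year", "Price", "Price",
              "Year", "Price", "Price", "Year",
              "Price", "Price", "Price", "Price"]

if __name__ == "__main__":
    print(grouping(CAR_ADS, 4))
    print(grouping(SALE_ITEMS, 4))
```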

Combining Heuristics
  • Decision-Tree Learning Algorithm C4.5
    • (H1, H2, H3, Positive)
    • (H1, H2, H3, Negative)
  • Training Set
    • 20 positive examples
    • 30 negative examples (some purposely similar, e.g. classified ads)
  • Test Set
    • 10 positive examples
    • 20 negative examples
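To show the shape of the (H1, H2, H3, label) training tuples, the sketch below performs a single information-gain threshold split, which is one step of decision-tree induction. It is not C4.5 (no gain ratio, recursion, or pruning), and the names `entropy` and `best_split` are hypothetical. The only tuples used are the two example documents from the earlier slides; the actual training set of 20 positive and 30 negative examples is not reproduced here.

```python
import math


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    result = 0.0
    for value in set(labels):
        p = labels.count(value) / n
        result -= p * math.log2(p)
    return result


def best_split(examples):
    """Return (feature_index, threshold, gain) with the highest information gain.

    examples: list of ((h1, h2, h3), label). Candidate thresholds are the
    midpoints between consecutive distinct feature values.
    """
    base = entropy([label for _, label in examples])
    best = None
    for f in range(len(examples[0][0])):
        values = sorted({features[f] for features, _ in examples})
        for low, high in zip(values, values[1:]):
            threshold = (low + high) / 2
            left = [lab for feats, lab in examples if feats[f] <= threshold]
            right = [lab for feats, lab in examples if feats[f] > threshold]
            remainder = (len(left) * entropy(left)
                         + len(right) * entropy(right)) / len(examples)
            gain = base - remainder
            if best is None or gain > best[2]:
                best = (f, threshold, gain)
    return best


# (H1 density, H2 expected values, H3 grouping, label) for the two example documents.
EXAMPLES = [
    ((0.306, 0.996, 0.875), "Positive"),   # Document 1: Car Ads
    ((0.073, 0.567, 0.500), "Negative"),   # Document 2: Items for Sale or Rent
]

if __name__ == "__main__":
    print(best_split(EXAMPLES))
```

With only two examples every heuristic separates the classes perfectly, so the sketch simply returns the first one; a realistic training set is needed before the choice of split means anything.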
Car Ads: Rule & Results
  • Precision: 100%
  • Recall: 91%
  • Accuracy: 97%
    • Harmonic Mean
    • 2/(1/Precision + 1/Recall)
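The harmonic-mean formula above can be written as a small function. This is a sketch with a hypothetical name `harmonic_mean`; returning 0 when either input is 0 is an assumption to avoid division by zero. The percentages reported on the slides may be aggregated over runs, so the rounded precision and recall shown need not reproduce the reported accuracy exactly.

```python
def harmonic_mean(precision, recall):
    """2 / (1/Precision + 1/Recall), with inputs as fractions in [0, 1]."""
    if precision <= 0 or recall <= 0:
        return 0.0  # assumption: define the measure as 0 when either input is 0
    return 2 / (1 / precision + 1 / recall)


if __name__ == "__main__":
    print(harmonic_mean(0.5, 1.0))
```

The harmonic mean is pulled toward the smaller of the two inputs, so a classifier cannot score well by maximizing only precision or only recall.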
Obituaries: Rule & Results
  • Precision: 91%
  • Recall: 100%
  • Accuracy: 97%
Universal Rule
  • Precision: 84%
  • Recall: 100%
  • Accuracy: 93%
Additional and Future Work
  • Other Approaches
    • Naïve Bayes [McCallum96] (accuracy near 90%)
    • Logistic Regression [Wang01] (accuracy near 95%)
    • Multivariate Analysis with Continuous Random Vectors [Tang01] (accuracy near 100%)
  • More Extensive Testing
    • Similar documents (motorcycles, wedding announcements, …)
    • Accuracy drops to near 87%
    • Naïve Bayes drops to near 77%
    • Others … ?
  • Other Types of Documents
    • XML Documents
    • Forms and the Hidden Web
    • Tables
Summary
  • Objective: Automatically Recognize Document Applicability
  • Approach:
    • Conceptual Modeling
    • Recognition Heuristics
      • Density
      • Expected Values
      • Grouping
  • Result: Accuracy Near 95%

www.deg.byu.edu
