The DIADEM Ontology

Yiyang Bao², Xiaonan Guo², Giorgio Orsi¹,², Christian Schallhart², Cheng Wang²

¹ Institute for the Future of Computing, University of Oxford
² Department of Computer Science, University of Oxford


DIADEM 1.0

The languages of the web

<html>
  <head>
    <title></title>
  </head>
  <body>
    <div></div>
  </body>
</html>

  • HTML objects provide the data model of a web page.
  • CSS boxes and properties provide the layout.
  • JavaScript provides web dynamics, e.g., this.value.toLowerCase();
  • … ?
  • RDF annotations provide the conceptualization of the domain.

[Diagram: RDF terms ox:Property, ox:address and xsd:string, with the labels "Web" and "Real World"]

Why ontology?
  • Ontologies provide a conceptualization of a domain of interest (Gruber '93).
  • But we do not only want to model the application domain.
  • We must also model the domain of its web representations, i.e., its phenomenology.
  • In the end, that model is also an ontology.

[Diagram: RDF terms ox:Property, ox:partOf, ox:priceSegment, ox:minPrice, ox:address and xsd:string]
Why ontology?
  • Can be used to complete an incomplete model.
  • Can be used to verify a model.
  • Must tolerate uncertainty and inconsistency.
A logical model for web extraction
  • Logical model for web entities
    • input and refinement forms
    • result pages
    • page blocks (e.g., ads)
  • Phenomenological model
    • How logical entities are concretely represented
The building blocks

<form>
  <label for="male">Male</label>
  <input type="radio" name="sex" id="male" />
  <label for="female">Female</label>
  <input type="radio" name="sex" id="female" />
</form>

  • HTML entities
    • labels
    • fields (including links)
    • text-nodes and text attributes
  • Logical entities
    • constructs of our data model

<div>
  <span>Price:</span>
  <span>£ 250</span>
</div>

Rendered as: Price: £ 250

  • Rules
    • describe the phenomenology
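
As a rough illustration of the HTML-entity layer, the sketch below pairs each label in the radio-button form above with the field it refers to, using only Python's standard html.parser. The class and function names (LabelFieldParser, label_field_pairs) are assumptions made for this example and are not part of DIADEM.

```python
from html.parser import HTMLParser

# The form snippet shown on the slide.
FORM_HTML = """
<form>
  <label for="male">Male</label>
  <input type="radio" name="sex" id="male" />
  <label for="female">Female</label>
  <input type="radio" name="sex" id="female" />
</form>
"""


class LabelFieldParser(HTMLParser):
    """Collects <label for=...> texts and <input> fields (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.labels = {}          # field id -> label text
        self.fields = {}          # field id -> attribute dict of the <input>
        self._current_for = None  # id targeted by the <label> being read

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "label":
            self._current_for = attrs.get("for")
        elif tag == "input" and "id" in attrs:
            self.fields[attrs["id"]] = attrs

    def handle_data(self, data):
        # Only text inside a <label> counts as a label.
        if self._current_for is not None and data.strip():
            self.labels[self._current_for] = data.strip()

    def handle_endtag(self, tag):
        if tag == "label":
            self._current_for = None


def label_field_pairs(html):
    """Return (label text, field type, field name) for every field with an id."""
    parser = LabelFieldParser()
    parser.feed(html)
    return [(parser.labels.get(fid), attrs.get("type"), attrs.get("name"))
            for fid, attrs in parser.fields.items()]


pairs = label_field_pairs(FORM_HTML)
for label, field_type, name in pairs:
    print(label, field_type, name)
```

Pairs like these are the kind of raw material that phenomenological rules would then classify into logical entities.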
The form model
  • Goal: model web form phenomenology
The form model
  • Areas:
    • button
    • location
    • price
    • room
    • type
    • buy/rent
    • order-by
    • display
  • Root entity:
    • RealEstateForm
  • Properties:
    • partOf → hierarchical structures
The form model: elements
  • price
    • type = {min, max}
    • purpose = {buy, rent}
  • currency
  • geographic
    • location
    • area/branch
      • granularity = {area, branch}
      • area/branch input
      • area/branch select
    • address PO
    • radius
  • room
    • category = {bathroom, bedroom, …}
    • type = {min, max}
The form model: elements
  • property type
  • order-by
  • button
    • submit
    • reset
    • map search
    • advance submit
    • link button
  • display
  • per page
  • add-in-time
  • new/resale
  • SSTC
  • buy
  • rent
  • buy/rent
  • other
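
One possible way to encode the attributed elements listed above (price and room) is a pair of small typed records. This is only a sketch: the class names, the field_id attribute and the example ids are assumptions for illustration, and only the attribute values (min/max, buy/rent, bathroom/bedroom) come from the slides.

```python
from dataclasses import dataclass
from enum import Enum


class Bound(Enum):
    """type = {min, max}"""
    MIN = "min"
    MAX = "max"


class Purpose(Enum):
    """purpose = {buy, rent}"""
    BUY = "buy"
    RENT = "rent"


@dataclass(frozen=True)
class PriceElement:
    field_id: str      # hypothetical id of the underlying form field
    type: Bound
    purpose: Purpose


@dataclass(frozen=True)
class RoomElement:
    field_id: str      # hypothetical id of the underlying form field
    category: str      # e.g. "bathroom" or "bedroom"
    type: Bound


# Example instances with made-up field ids.
min_price = PriceElement("minPrice", Bound.MIN, Purpose.BUY)
max_beds = RoomElement("maxBeds", "bedroom", Bound.MAX)
print(min_price)
print(max_beds)
```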
The form model: phenomenology
  • Based on linguistic annotations and (visual) heuristics.

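% X is a buy element of form F: a visible field labelled "buy"
% (and neither "rent" nor "includeSSTC") that belongs to a group of F.
buyElement(X,F) :-
    visibleField(X),
    hasAnnotationFeature(X,"majorType","reform.label"),
    hasAnnotationFeature(X,"minorType","buy"),
    not hasAnnotationFeature(X,"minorType","rent"),
    not hasAnnotationFeature(X,"minorType","includeSSTC"),
    group(Ns,_,_,F), #member(X,Ns).

% X is a radius element of form F: a visible field labelled "radius"
% that belongs to a group of F.
radiusElement(X,F) :-
    visibleField(X),
    hasAnnotationFeature(X,"majorType","reform.label"),
    hasAnnotationFeature(X,"minorType","radius"),
    group(Ns,_,_,F), #member(X,Ns).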
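
To make the reading of these rules concrete, the following sketch evaluates the same two conditions over a handful of invented facts. The fact base (node ids f1 to f3, the form id form1) is entirely hypothetical, and the two unused arguments of group are simply dropped; the sketch mirrors the rule bodies and is not how DIADEM evaluates them.

```python
# Hypothetical facts, mirroring the predicates used in the rules.
visible_fields = {"f1", "f2", "f3"}

annotations = {  # (node, feature, value) triples for hasAnnotationFeature
    ("f1", "majorType", "reform.label"), ("f1", "minorType", "buy"),
    ("f2", "majorType", "reform.label"), ("f2", "minorType", "buy"),
    ("f2", "minorType", "rent"),
    ("f3", "majorType", "reform.label"), ("f3", "minorType", "radius"),
}

# group(Ns,_,_,F) reduced to (member nodes, form); ignored arguments omitted.
groups = [(("f1", "f2", "f3"), "form1")]


def has(x, feature, value):
    """hasAnnotationFeature(X, Feature, Value)."""
    return (x, feature, value) in annotations


def is_label(x):
    return x in visible_fields and has(x, "majorType", "reform.label")


def buy_elements():
    """All (X, F) satisfying the body of buyElement."""
    return {(x, f) for ns, f in groups for x in ns
            if is_label(x)
            and has(x, "minorType", "buy")
            and not has(x, "minorType", "rent")
            and not has(x, "minorType", "includeSSTC")}


def radius_elements():
    """All (X, F) satisfying the body of radiusElement."""
    return {(x, f) for ns, f in groups for x in ns
            if is_label(x) and has(x, "minorType", "radius")}


print(buy_elements())
print(radius_elements())
```

Field f2 is excluded from the buy elements because it also carries a "rent" annotation, which is the effect of the negated literals in the rule.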

The form model: segments
  • Segments
    • buttons
    • geographic
    • price
    • room
    • property type
    • buy/rent
    • order-by
    • display
    • per page
    • add in time
    • new/resale
    • SSTC
  • A segment is:
    • a single element
    • a group of elements
    • a group of segments
    • a pair <segment, label>
  • Form
    • real-estate
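
The definition of a segment above is recursive, so it can be sketched as a small recursive data type. The class names (Element, Group, Labelled) and the example form are assumptions for illustration; a single Group class stands in for both "a group of elements" and "a group of segments".

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Element:
    """A segment that is a single element."""
    name: str


@dataclass
class Group:
    """A group of elements or of segments."""
    members: List["Segment"]


@dataclass
class Labelled:
    """A pair <segment, label>."""
    segment: "Segment"
    label: str


Segment = Union[Element, Group, Labelled]


def elements_of(segment):
    """Flatten a segment into the names of the elements it contains."""
    if isinstance(segment, Element):
        return [segment.name]
    if isinstance(segment, Labelled):
        return elements_of(segment.segment)
    return [name for member in segment.members for name in elements_of(member)]


# Hypothetical form: a labelled price segment plus a submit button.
price = Labelled(Group([Element("min price"), Element("max price")]), "Price")
form = Group([price, Element("submit")])
print(elements_of(form))
```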
The result-page model
  • Goal: model result-page phenomenology
The result-page model
  • Attributes and values
    • e.g., < price, £ 250,000 >
  • Record
    • groups of pairs < attribute, value >
  • Data area
    • groups of records
  • Mandatory attribute(s)
    • must be present in a record
    • used for sanity checks
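
The result-page model above can be sketched with plain dictionaries: a record maps attributes to values, a data area is a list of records, and mandatory attributes act as a sanity check. Treating price as the mandatory attribute, and the location attribute with its value, are assumptions made for this example.

```python
# Assumption for this sketch: "price" is the only mandatory attribute.
MANDATORY = {"price"}


def is_valid_record(record, mandatory=MANDATORY):
    """A record passes the sanity check if it has every mandatory attribute."""
    return mandatory.issubset(record)


# A data area is a group of records; each record groups <attribute, value> pairs.
data_area = [
    {"price": "£ 250,000", "location": "Oxford"},  # hypothetical record
    {"location": "Oxford"},                         # lacks the mandatory price
]

valid = [record for record in data_area if is_valid_record(record)]
print(valid)
```

A candidate record that lacks a mandatory attribute is a hint that the record boundaries or the attribute annotations were identified incorrectly.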
A Conceptual Model for Data Extraction
  • Conceptual Modelling on the Web
    • Software modelling, e.g., UML and stereotypes
    • Ad hoc languages, e.g., WebML
DIADEM Ontology: discussion
  • Adaptability
    • The result-page model is largely domain-independent
    • The form model is domain-dependent (entity types)
      • The number of entities is limited
  • Expressive power
    • safe non-recursive Datalog (nr-datalog) with stratified negation and aggregation
    • pros: easy to compute
    • cons: not robust to uncertainty and inconsistencies

Uncertainty, Vagueness and Inconsistencies

  • Origin
    • annotations are noisy
    • entity types are uncertain
  • Multiple models
    • probabilistic models
      • Markov Logic Networks (Lukasiewicz and Simari)
      • C-tables, Bayesian Networks (Olteanu)
    • Answer Set Programming (ASP)
      • disjunctive models
      • weak constraints