EFFORTS TO AUTOMATE LABELING OF LECTURES WITH COMPUTING ONTOLOGY TERMS

Felicia Decker and Lois Delcambre

Portland State University


PREVIOUS WORK

  • Course

    • Intro to Databases

    • We found 6 courses on the web with all lectures available

  • Lecture notes

    • ppt/pdf/html

  • Hand-labeled each lecture topic with Computing Ontology (CO) terms

    • used this to validate the CO

    • leaf CO terms correspond to lecture topics


CURRENT WORK

  • Will the words that appear in these lecture notes help us choose CO terms? Are there “signature” words for each topic?

  • Tools

    • Lucene

    • Converter tools (ppt/pdf/html -> text)

    • Microsoft Excel


LUCENE

  • Index lecture notes

    • text from one lecture = one document

    • documents/lectures from one course = one collection (with an index)

  • Provides us with

    • Term frequency (tf)

    • Inverse document frequency (idf)

    • Tf-idf

  • Currently using single words; just now introducing stemming (a minimal indexing sketch follows)
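
A minimal sketch of the indexing step outlined above, assuming the lectures have already been converted to plain-text files (see CONVERTER TOOLS). The directory names, the "body" field name, and the idf formula are illustrative assumptions, and a Lucene 8.x-style API is assumed (class and method names vary across versions); this is not the authors' actual code.

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.*;
    import org.apache.lucene.index.*;
    import org.apache.lucene.store.FSDirectory;
    import org.apache.lucene.util.BytesRef;
    import java.nio.file.*;

    public class LectureIndexer {
        public static void main(String[] args) throws Exception {
            Path lectureDir = Paths.get("lectures-txt");   // hypothetical folder of converted .txt lectures
            Path indexDir = Paths.get("co-index");         // hypothetical index location

            // Store term vectors so the per-lecture term frequency (tf) can be read back later.
            FieldType bodyType = new FieldType(TextField.TYPE_NOT_STORED);
            bodyType.setStoreTermVectors(true);

            try (IndexWriter writer = new IndexWriter(FSDirectory.open(indexDir),
                                                      new IndexWriterConfig(new StandardAnalyzer()));
                 DirectoryStream<Path> txts = Files.newDirectoryStream(lectureDir, "*.txt")) {
                for (Path txt : txts) {                    // one lecture = one Lucene document
                    Document doc = new Document();
                    doc.add(new StringField("file", txt.getFileName().toString(), Field.Store.YES));
                    doc.add(new Field("body", Files.readString(txt), bodyType));  // Files.readString: Java 11+
                    writer.addDocument(doc);
                }
            }

            try (DirectoryReader reader = DirectoryReader.open(FSDirectory.open(indexDir))) {
                String word = "normalization";             // example term
                int n = reader.numDocs();                  // lectures in this course's collection
                int df = reader.docFreq(new Term("body", word));
                double idf = Math.log((double) n / (df + 1));   // one common idf variant, not Lucene's own scoring
                System.out.printf("df(%s) = %d of %d lectures, idf = %.3f%n", word, df, n, idf);

                // tf for the first indexed lecture, read from its stored term vector
                TermsEnum terms = reader.getTermVector(0, "body").iterator();
                if (terms.seekExact(new BytesRef(word))) {
                    long tf = terms.totalTermFreq();
                    System.out.printf("tf = %d, tf-idf = %.3f%n", tf, tf * idf);
                }
            }
        }
    }

Term vectors are stored here only so the per-lecture tf can be read back directly; the same number could also be pulled from the postings lists.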


CONVERTER TOOLS

  • Lecture notes come in different formats

  • PPT -> text

    • Apache POI (a small extraction sketch follows this list)

  • PDF -> text

    • TextMiningTool 1.1.42

    • Xpdf-3.02

  • HTML -> text

    • Copy/paste

    • Internet Explorer – save webpage as text
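
For the PPT branch, the conversion can be scripted with Apache POI rather than done by hand; below is a minimal sketch using POI's generic text extractor. The file names are hypothetical and a POI 5.x-style package layout is assumed (ExtractorFactory has moved between packages across versions); the slide does not say which POI API was actually used.

    import java.io.File;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Paths;
    import org.apache.poi.extractor.ExtractorFactory;
    import org.apache.poi.extractor.POITextExtractor;

    public class PptToText {
        public static void main(String[] args) throws Exception {
            File ppt = new File("lecture05-normalization.ppt");   // hypothetical lecture file
            // ExtractorFactory picks a matching extractor (.ppt or .pptx) and returns all slide text.
            try (POITextExtractor extractor = ExtractorFactory.createExtractor(ppt)) {
                Files.write(Paths.get("lecture05-normalization.txt"),
                            extractor.getText().getBytes(StandardCharsets.UTF_8));
            }
        }
    }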


EXCEL

  • After using Lucene to get tf, idf and tf-idf data for each term in the given index…

  • Select a CO term: e.g., Normalization

    • Using CO-labeled lecture notes (previous work), choose the lectures labeled with Normalization

    • Compile tf/idf/tf-idf data into one spreadsheet
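
As a concrete picture of that hand-off, the sketch below writes per-term statistics to a CSV file that Excel opens directly. Every term and count here is an illustrative placeholder rather than a measured value; in practice the numbers would come from the Lucene index described earlier, and the idf variant matches that sketch.

    import java.io.PrintWriter;
    import java.util.Map;

    public class TfIdfToCsv {
        public static void main(String[] args) throws Exception {
            int numLectures = 14;                                // hypothetical collection size
            // term -> { tf in the chosen lecture, df across the collection } (placeholder values only; Map.of: Java 9+)
            Map<String, int[]> stats = Map.of(
                    "normalization", new int[]{42, 3},
                    "decomposition", new int[]{17, 2},
                    "student",       new int[]{55, 9});          // a typical 'running example' word

            try (PrintWriter out = new PrintWriter("normalization-terms.csv", "UTF-8")) {
                out.println("term,tf,idf,tf-idf");
                for (Map.Entry<String, int[]> e : stats.entrySet()) {
                    int tf = e.getValue()[0], df = e.getValue()[1];
                    double idf = Math.log((double) numLectures / (df + 1));
                    out.printf("%s,%d,%.4f,%.4f%n", e.getKey(), tf, idf, tf * idf);
                }
            }
        }
    }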


HAND-LABEL WORDS FROM LECTURES AS “IMPORTANT”

  • Signature words were human-selected from Database Management Systems by Ramakrishnan and Gehrke, 3rd Ed.

  • Use the Find All / Replace All function in Excel to highlight all signature words that identify Normalization


INITIAL EFFORT


INITIAL EFFORT: RESULTS

  • Conclusions

    • Tf-idf is not a strong indicator

      • Cannot solely rely on tf-idf

    • ‘Running example’

      • While good for teaching

      • We don’t care about this data; its words don’t identify the topic

    • Stemming is important

    • Use of phrases may help


NEXT STEPS

  • Intersection of terms across all classes (a small sketch follows this list)

    • May solve ‘running example’ problem

    • Compute average rank

    • Compute average tf-idf (?)

  • Union all documents with the same CO label (union text from all the lectures on normalization, union text from all lectures on query optimization, etc.)

    • Look at tf-idf

  • Consider various classification algorithms (looking to see if there are some implemented for Lucene)
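
A small sketch of the first bullet above (intersection across classes plus average rank): keep only the terms that appear in every course's top-ranked list for a given CO label, then order the survivors by their average rank. The course names and term lists are hypothetical placeholders, not results.

    import java.util.*;

    public class TermIntersection {
        public static void main(String[] args) {
            // Top terms for 'Normalization' lectures, ranked by tf-idf (best first), one list per course.
            // Placeholder data; 'sailor' and 'student' stand in for running-example noise words.
            Map<String, List<String>> rankedByCourse = Map.of(
                    "courseA", List.of("normalization", "bcnf", "decomposition", "sailor"),
                    "courseB", List.of("decomposition", "normalization", "functional", "student"),
                    "courseC", List.of("bcnf", "normalization", "decomposition", "enrollment"));

            // Terms present in every course's list; course-specific running-example words drop out here.
            Set<String> shared = null;
            for (List<String> ranked : rankedByCourse.values()) {
                if (shared == null) shared = new HashSet<>(ranked);
                else shared.retainAll(ranked);
            }

            // Average 1-based rank across courses (lower is better).
            Map<String, Double> avgRank = new HashMap<>();
            for (String term : shared) {
                double sum = 0;
                for (List<String> ranked : rankedByCourse.values()) sum += ranked.indexOf(term) + 1;
                avgRank.put(term, sum / rankedByCourse.size());
            }
            avgRank.entrySet().stream()
                   .sorted(Map.Entry.comparingByValue())
                   .forEach(e -> System.out.printf("%-15s avg rank %.2f%n", e.getKey(), e.getValue()));
        }
    }

Unioning all documents with the same CO label, as the second bullet suggests, would instead merge their text into one Lucene document before computing tf-idf.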

