Efforts to automate labeling of lectures with computing ontology terms

Felicia Decker and Lois Delcambre

Portland State University


PREVIOUS WORK

  • Course

    • Intro to Databases

    • We found 6 courses – on the web – with all lectures

  • Lecture notes

    • ppt/pdf/html

  • Hand-labeled each lecture topic with Computing Ontology (CO) terms

    • used this to validate the CO

    • leaf CO terms correspond to lecture topics


CURRENT WORK

  • Will the words that appear in these lecture notes help us choose CO terms? Are there “signature” words for each topic?

  • Tools

    • Lucene

    • Converter tools (ppt/pdf/html -> text)

    • Microsoft Excel


LUCENE

  • Index lecture notes

    • text from one lecture = one document

    • documents/lectures from one course = one collection (with an index)

  • Provides us with

    • Term frequency (tf)

    • Inverse document frequency (idf)

    • Tf-idf

  • Currently using single words, just now introducing stemming
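
The three statistics listed above can be sketched in plain Python. This is a minimal illustration using the textbook definitions (raw count for tf, log(N/df) for idf); Lucene's own scoring formula differs in its details, and the `lectures` data and the function names are invented for the example.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase and split into single words (no stemming)."""
    return re.findall(r"[a-z]+", text.lower())

def tf_idf_table(collection):
    """collection: dict of lecture name -> text (one course = one collection).
    Returns dict of lecture name -> {word: (tf, idf, tf * idf)}."""
    tokens = {name: tokenize(text) for name, text in collection.items()}
    n_docs = len(tokens)
    # Document frequency: number of lectures that contain each word.
    df = Counter()
    for words in tokens.values():
        df.update(set(words))
    table = {}
    for name, words in tokens.items():
        tf = Counter(words)
        table[name] = {}
        for word, count in tf.items():
            idf = math.log(n_docs / df[word])
            table[name][word] = (count, idf, count * idf)
    return table

# Hypothetical three-lecture course, for illustration only.
lectures = {
    "lecture01": "The relational model stores data in tables",
    "lecture07": "Normalization removes redundancy; a functional dependency guides normalization",
    "lecture09": "Query optimization picks a plan for each query",
}
table = tf_idf_table(lectures)
tf, idf, score = table["lecture07"]["normalization"]
```

A word that occurs in every lecture of the course gets idf = log(1) = 0 under this definition, so it drops out of the tf-idf ranking no matter how often it is used.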


CONVERTER TOOLS

  • Lecture notes come in different formats

  • PPT -> text

    • Apache POI

  • PDF -> text

    • TextMiningTool 1.1.42

    • Xpdf-3.02

  • HTML -> text

    • Copy/paste

    • Internet Explorer – save webpage as text
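
The HTML step above was done by hand. As a hedged alternative (a swapped-in technique, not the tool chain listed on this slide), the same conversion could be scripted with Python's standard `html.parser` module. The class name, function name, and sample page below are assumptions made for the sketch.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collects visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style blocks.
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())

def html_to_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)

# Invented sample page standing in for one HTML lecture.
page = (
    "<html><head><style>p {color: red}</style></head>"
    "<body><h1>Normalization</h1><p>Third normal form (3NF)</p></body></html>"
)
text = html_to_text(page)
```

The resulting `text` holds one line per text fragment, which is enough for word-level indexing where layout does not matter.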


EXCEL

  • After using Lucene to get tf, idf and tf-idf data for each term in the given index…

  • Select a CO term: e.g., Normalization

    • Using CO-labeled lecture notes (previous work), choose the lectures labeled with Normalization

    • Compile tf/idf/tf-idf data into one spreadsheet
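
The compile step above can be sketched as a small script that writes a CSV file a spreadsheet can open. This is an illustration only: the `labels` and `stats` dictionaries, their values, and the function names are invented, and the original work was done by hand in Excel.

```python
import csv
import io

def compile_rows(co_term, labels, stats):
    """Gather (lecture, word, tf, idf, tf-idf) rows for every lecture
    hand-labeled with co_term, highest tf-idf first."""
    rows = [
        (lecture, word, tf, idf, tfidf)
        for lecture, terms in labels.items() if co_term in terms
        for word, (tf, idf, tfidf) in stats[lecture].items()
    ]
    return sorted(rows, key=lambda row: row[4], reverse=True)

def to_csv(rows):
    """Render the rows as CSV text with a header line."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["lecture", "word", "tf", "idf", "tf-idf"])
    writer.writerows(rows)
    return out.getvalue()

# Invented labels (from hand-labeling) and invented statistics.
labels = {"lecture07": {"Normalization"}, "lecture09": {"Query optimization"}}
stats = {
    "lecture07": {"dependency": (4, 1.1, 4.4), "sailors": (9, 0.7, 6.3)},
    "lecture09": {"plan": (5, 1.1, 5.5)},
}
rows = compile_rows("Normalization", labels, stats)
sheet = to_csv(rows)
```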


HAND-LABEL WORDS FROM LECTURES AS “IMPORTANT”

  • Signature words were human-selected from Database Management Systems by Ramakrishnan and Gehrke, 3rd Ed.

  • Use Find All/Replace All function in Excel to highlight all signature words that identify Normalization
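
The highlighting step can be mimicked in code by flagging which entries of a tf-idf-ranked word list are signature words. A minimal sketch follows; the word lists are invented for illustration and are not the signature words actually selected from the textbook.

```python
def mark_signature_words(ranked_words, signature_words):
    """Return (rank, word, is_signature) for a list already sorted by tf-idf."""
    signature = {word.lower() for word in signature_words}
    return [
        (rank, word, word.lower() in signature)
        for rank, word in enumerate(ranked_words, start=1)
    ]

# Invented ranking for one lecture and invented signature words.
ranked = ["sailors", "dependency", "decomposition", "boats", "lossless"]
signature = ["dependency", "decomposition", "lossless", "BCNF"]

marked = mark_signature_words(ranked, signature)
signature_ranks = [rank for rank, _, hit in marked if hit]
```

Looking at where the flagged words fall in the ranking gives a quick sense of whether tf-idf pushes signature words toward the top.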


INITIAL EFFORT


INITIAL EFFORT: RESULTS

  • Conclusions

    • Tf-idf is not a strong indicator

      • Cannot rely solely on tf-idf

    • ‘Running example’ words

      • A running example is good for teaching

      • But we don’t care about the words it contributes

    • Stemming is important

    • Use of phrases may help
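
The last two conclusions can be illustrated with a toy example. The suffix stripper below is a deliberately crude stand-in (it is not the Porter stemmer and not a Lucene analyzer), and the sample sentence is invented; the point is only to show how stemming merges word variants and how two-word phrases surface terms that single words miss.

```python
import re
from collections import Counter

# Toy suffix list, checked in order; NOT the Porter stemmer or a Lucene analyzer.
SUFFIXES = ("ation", "ing", "ed", "es", "s", "e")

def crude_stem(word):
    """Strip the first matching suffix if at least 3 letters remain."""
    for suffix in SUFFIXES:
        stem = word[: len(word) - len(suffix)]
        if word.endswith(suffix) and len(stem) >= 3:
            return stem
    return word

def bigrams(words):
    """Adjacent word pairs, used here as candidate phrases."""
    return list(zip(words, words[1:]))

# Invented sample text.
text = ("Normalizing a table: normalized tables satisfy a normal form. "
        "Third normal form is a normal form.")
words = re.findall(r"[a-z]+", text.lower())

word_counts = Counter(words)
stem_counts = Counter(crude_stem(word) for word in words)
phrase_counts = Counter(bigrams(words))
```

Without stemming, "normalizing" and "normalized" are counted as two unrelated words with one occurrence each; with it, they share one stem. The phrase count picks out "normal form", which neither "normal" nor "form" identifies on its own.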


NEXT STEPS

  • Intersection of terms across all classes

    • May solve ‘running example’ problem

    • Compute average rank

    • Compute average tf-idf (?)

  • Union all documents with the same CO label (union text from all the lectures on normalization, union text from all lectures on query optimization, etc.)

    • Look at tf-idf

  • Consider various classification algorithms (looking to see if there are some implemented for Lucene)
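
The first two next steps can be sketched as follows. This is one possible reading of the plan, not the authors' implementation: the course names, rankings, lecture texts, and function names are all invented, and ranks are 1-based positions in each course's tf-idf ordering.

```python
from collections import defaultdict

def average_rank_of_shared_words(rankings):
    """rankings: dict of course -> words sorted by tf-idf (best first).
    Keeps only words present in every course and averages their ranks."""
    shared = set.intersection(*(set(words) for words in rankings.values()))
    averages = {
        word: sum(words.index(word) + 1 for words in rankings.values()) / len(rankings)
        for word in shared
    }
    # Best (lowest) average rank first; ties broken alphabetically.
    return sorted(averages.items(), key=lambda item: (item[1], item[0]))

def union_by_label(lectures, labels):
    """Concatenate the text of all lectures that share a CO label."""
    pooled = defaultdict(list)
    for name, text in lectures.items():
        for term in labels.get(name, ()):
            pooled[term].append(text)
    return {term: " ".join(texts) for term, texts in pooled.items()}

# Invented rankings for lectures labeled Normalization in three courses.
rankings = {
    "courseA": ["sailors", "dependency", "decomposition", "lossless"],
    "courseB": ["dependency", "employees", "lossless", "decomposition"],
    "courseC": ["decomposition", "dependency", "parts", "lossless"],
}
shared = average_rank_of_shared_words(rankings)

# Invented lecture texts and CO labels.
lectures = {
    "A-07": "functional dependency",
    "B-05": "lossless decomposition",
    "A-09": "query plan",
}
labels = {
    "A-07": {"Normalization"},
    "B-05": {"Normalization"},
    "A-09": {"Query optimization"},
}
pooled = union_by_label(lectures, labels)
```

Words tied to a single course's running example ("sailors", "employees", "parts" in this invented data) fall out of the intersection because no other course uses them. Each pooled text can then be treated as one document when computing tf-idf.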

