Efforts to automate labeling of lectures with computing ontology terms



Felicia Decker and Lois Delcambre

Portland State University


Previous work

  • Course

    • Intro to Databases

    • We found 6 courses on the web with all lectures available

  • Lecture notes

    • ppt/pdf/html

  • Hand-labeled each lecture topic with Computing Ontology (CO) terms

    • used this to validate the CO

    • leaf CO terms correspond to lecture topics


Current work

  • Will the words that appear in these lecture notes help us choose CO terms? Are there “signature” words for each topic?

  • Tools

    • Lucene

    • Converter tools (ppt/pdf/html -> text)

    • Microsoft Excel


Lucene

  • Index lecture notes

    • text from one lecture = one document

    • documents/lectures from one course = one collection (with an index)

  • Provides us with

    • Term frequency (tf)

    • Inverse document frequency (idf)

    • Tf-idf

  • Currently indexing single words; stemming is just now being introduced
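
The three quantities listed above can be sketched in a few lines of Python. This is a minimal illustration, assuming raw counts for tf and log(N / df) for idf; Lucene's own scoring weights these differently, and the names `tokenize`, `tf_idf_table`, and the three-lecture `course` are hypothetical, invented for the example.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into single-word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def tf_idf_table(collection):
    """collection maps a lecture name to its extracted text.

    Returns {lecture: {term: (tf, idf, tf_idf)}} with tf as a raw count
    and idf = log(N / df). This is a textbook variant, assumed here for
    illustration; it is not Lucene's exact formula.
    """
    counts = {name: Counter(tokenize(text)) for name, text in collection.items()}
    n_docs = len(counts)
    # Document frequency: number of lectures in which each term appears.
    df = Counter(term for c in counts.values() for term in c)
    table = {}
    for name, c in counts.items():
        table[name] = {}
        for term, tf in c.items():
            idf = math.log(n_docs / df[term])
            table[name][term] = (tf, idf, tf * idf)
    return table

# Hypothetical course: one lecture = one document, one course = one collection.
course = {
    "lecture1": "relational model relation tuple attribute",
    "lecture2": "normalization functional dependency relation decomposition",
    "lecture3": "query optimization relation join",
}
table = tf_idf_table(course)
```

In this toy collection "relation" appears in every lecture, so its idf and tf-idf are zero, while "normalization" appears in only one lecture and scores higher.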


Converter tools

  • Lecture notes come in different formats

  • PPT -> text

    • Apache POI

  • PDF -> text

    • TextMiningTool 1.1.42

    • Xpdf-3.02

  • HTML -> text

    • Copy/paste

    • Internet Explorer – save webpage as text
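
The HTML step above was done by hand. As a possible scripted alternative (not the method used in this work), the sketch below strips tags with Python's standard `html.parser` module. The class name `TextExtractor` and the function `html_to_text` are hypothetical, and real lecture pages may need more careful handling of whitespace and navigation text.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> content."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text that is not inside a skipped element.
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())

def html_to_text(html):
    """Return the text content of an HTML string, joined by single spaces."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)

# Hypothetical lecture page, invented for illustration.
page = ("<html><head><style>p{color:red}</style></head><body>"
        "<h1>Normalization</h1><p>Third normal <b>form</b></p></body></html>")
text = html_to_text(page)
```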


Excel

  • After using Lucene to get tf, idf and tf-idf data for each term in the given index…

  • Select a CO term: e.g., Normalization

    • Using CO-labeled lecture notes (previous work), choose the lectures labeled with Normalization

    • Compile tf/idf/tf-idf data into one spreadsheet
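
The compile step above was done in Excel. A rough scripted equivalent, assuming the per-lecture statistics have already been exported from the index, might look like the following. The function names, lecture identifiers, and all numbers are hypothetical, made up only to show the shape of the data.

```python
import csv
import io

def compile_rows(labels, stats, co_term):
    """Gather (lecture, term, tf, idf, tf_idf) rows for lectures labeled co_term.

    labels: {lecture: set of CO terms}, from the hand labeling.
    stats:  {lecture: {term: (tf, idf, tf_idf)}}, exported from the index.
    """
    rows = []
    for lecture in sorted(labels):
        if co_term in labels[lecture]:
            for term, (tf, idf, tfidf) in stats[lecture].items():
                rows.append((lecture, term, tf, idf, tfidf))
    # Highest tf-idf first, so likely signature words surface at the top.
    rows.sort(key=lambda r: r[4], reverse=True)
    return rows

def to_csv(rows):
    """Serialize the rows as CSV text that a spreadsheet can open."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["lecture", "term", "tf", "idf", "tf_idf"])
    writer.writerows(rows)
    return buf.getvalue()

# Hypothetical labels and statistics, invented for illustration.
labels = {
    "courseA/lec5": {"Normalization"},
    "courseA/lec6": {"Query Optimization"},
    "courseB/lec4": {"Normalization"},
}
stats = {
    "courseA/lec5": {"dependency": (12, 1.1, 13.2), "bcnf": (4, 1.8, 7.2)},
    "courseA/lec6": {"join": (9, 0.7, 6.3)},
    "courseB/lec4": {"decomposition": (6, 1.5, 9.0)},
}
rows = compile_rows(labels, stats, "Normalization")
sheet = to_csv(rows)
```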


Hand-label words from lectures as “important”

  • Signature words were human-selected from Database Management Systems by Ramakrishnan and Gehrke, 3rd Ed.

  • Use the Find All/Replace All function in Excel to highlight all signature words that identify Normalization
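
The highlighting step can also be expressed as a small script that ranks terms by tf-idf and flags the hand-selected signature words, which makes it easy to see where they fall in the ranking. This is only a sketch: the function names are hypothetical, and the scores and word list below are invented, not taken from the lectures or the textbook.

```python
def rank_signature_words(tfidf, signature_words):
    """Rank terms by descending tf-idf and flag the signature words.

    Returns a list of (rank, term, tf_idf, is_signature), rank starting at 1.
    Ties are broken alphabetically so the ordering is deterministic.
    """
    ordered = sorted(tfidf.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(i, term, score, term in signature_words)
            for i, (term, score) in enumerate(ordered, start=1)]

def signature_ranks(ranked):
    """Return the ranks at which the flagged signature words appear."""
    return [rank for rank, _, _, flagged in ranked if flagged]

# Hypothetical tf-idf scores for one Normalization lecture.
tfidf = {"employee": 14.0, "dependency": 13.2, "decomposition": 9.0,
         "bcnf": 7.2, "slide": 2.0}
# Hypothetical signature words for Normalization.
signature = {"dependency", "decomposition", "bcnf"}

ranked = rank_signature_words(tfidf, signature)
positions = signature_ranks(ranked)
```

If the signature words cluster near the top of the ranking, tf-idf is doing useful work; if unrelated words outrank them, it is not.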



Initial effort: results

  • Conclusions

    • Tf-idf is not a strong indicator

      • Cannot rely solely on tf-idf

    • ‘Running example’

      • Good for teaching, but we don’t care about this data

    • Stemming is important

    • Use of phrases may help
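
The last two conclusions can be illustrated with a toy sketch. `crude_stem` is a deliberately simple suffix stripper, assumed here only to show why stemming matters; it is not the Porter stemmer or any stemmer shipped with Lucene. `bigrams` counts adjacent word pairs as candidate phrases. Both names are hypothetical.

```python
import re
from collections import Counter

# Longest suffixes first, so "ization" is tried before "s".
SUFFIXES = ("izations", "ization", "izing", "ized", "izes", "ize",
            "ing", "ed", "es", "s")

def crude_stem(word):
    """Strip the first matching suffix; a toy stand-in for a real stemmer."""
    for suffix in SUFFIXES:
        # Keep at least three characters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def bigrams(text):
    """Count adjacent word pairs as candidate phrases."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(zip(words, words[1:]))

stems = {crude_stem(w) for w in
         ("normalization", "normalized", "normalize", "normalizing")}
pairs = bigrams("functional dependency and functional dependency")
```

Without stemming, the four word forms above would be counted as four separate terms; with it, their frequencies are pooled under one stem. Counting pairs lets a phrase such as "functional dependency" be scored as a unit rather than as two unrelated words.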


Next steps

  • Intersection of terms across all classes

    • May solve ‘running example’ problem

    • Compute average rank

    • Compute average tf-idf (?)

  • Union all documents with the same CO label (union text from all the lectures on normalization, union text from all lectures on query optimization, etc.)

    • Look at tf-idf

  • Consider various classification algorithms (looking to see if there are some implemented for Lucene)
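
The first two next steps could be prototyped roughly as follows. This is a sketch under stated assumptions: tf-idf scores per course are already available, at least one course is supplied, and rank is computed by descending tf-idf. All function names and data values are hypothetical.

```python
from collections import defaultdict

def ranks(tfidf):
    """Map each term to its 1-based rank by descending tf-idf."""
    ordered = sorted(tfidf, key=lambda t: (-tfidf[t], t))
    return {term: i for i, term in enumerate(ordered, start=1)}

def average_rank_of_shared_terms(per_course):
    """per_course: {course: {term: tf_idf}} for lectures on one CO term.

    Keeps only terms present in every course, then averages their ranks.
    Assumes per_course contains at least one course.
    """
    course_ranks = [ranks(t) for t in per_course.values()]
    shared = set.intersection(*(set(r) for r in course_ranks))
    return {term: sum(r[term] for r in course_ranks) / len(course_ranks)
            for term in shared}

def union_by_label(lecture_text, labels):
    """Concatenate the text of all lectures that share a CO label."""
    merged = defaultdict(list)
    for lecture, text in lecture_text.items():
        for label in labels[lecture]:
            merged[label].append(text)
    return {label: " ".join(texts) for label, texts in merged.items()}

# Hypothetical scores: each course has its own running-example word.
per_course = {
    "courseA": {"employee": 14.0, "dependency": 13.2, "bcnf": 7.2},
    "courseB": {"supplier": 11.0, "dependency": 10.0, "bcnf": 9.0},
}
shared_ranks = average_rank_of_shared_terms(per_course)

# Hypothetical lecture text and CO labels.
lecture_text = {"A5": "functional dependency", "B4": "bcnf decomposition",
                "A6": "join order"}
labels = {"A5": {"Normalization"}, "B4": {"Normalization"},
          "A6": {"Query Optimization"}}
merged = union_by_label(lecture_text, labels)
```

In this invented data the running-example words ("employee", "supplier") differ between courses, so the intersection drops them and keeps only the terms both courses share. The merged text per label can then be indexed as a single document to look at tf-idf.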

