An Evaluation of Text Mining Tools as Applied to Selected Scientific and Engineering Literature

Walter J. Trybula, Ph.D., IEEE Fellow

Ronald E. Wyllys, Ph.D.

ASIS 2000 – Chicago, Illinois

November 14, 2000


Introduction

  • Data volume is growing and sources of information are more diverse

  • There is a need to evaluate this information

  • There are tools that claim to be able to find information in textbases

  • An investigation of existing tools would provide a measure of their ability.

  • If such tools worked, it might be possible to discover new knowledge (described by Swanson as "Undiscovered Public Knowledge").

w.trybula@ieee.org


Objective/Goals

  • Provide a means of testing the existing instruments to determine their ability to “find” knowledge.

  • Determine if any of these instruments provide useful insight to the data.

  • Evaluate the findings of domain experts to determine if the instruments are helpful.

  • Develop recommendations based on the results of the experiments.


Overview of Process

  • Selected a technical area with known commonality (lithography masks).

  • Collected the most recent reports available.

  • Compiled the reports into a textbase for analysis by text mining tools.

  • Had domain experts evaluate the results.

  • Analyzed their conclusions and drew recommendations for future directions.


Example of Commonality

[Figure: example of commonality; image not reproduced in this transcript.]


Selection of Information

  • Information from leading researchers was collected.

    • Asian efforts on X-ray technology.

    • U.S. efforts on X-ray technology.

    • European efforts on Ion Projection Lithography.

    • U.S. efforts on Electron Projection Lithography.

    • U.S. efforts on Extreme Ultraviolet technology.

  • The data consisted of each group's annual update on technology progress, provided for a yearly review.

  • All reports, presentations, and data were assembled into a single textbase for analysis.


Sources of Data

  • Regions: U.S., Europe, Asia

  • Concerns:

    • Language

    • Terminology

    • Program (format)


Text Mining Tools

  • Selected three types of Text Mining Instruments available for desktop operation.

    • Key terms identified with pointers to text

    • Excerpt presentation format

    • Hierarchical tree-structure presentation

      • Did not include Self-Organizing Maps (SOMs)

  • Included a search engine for baseline evaluation of the results (AltaVista).
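
To make the first output type above (key terms identified with pointers to text) concrete, here is a minimal Python sketch. It is not the method used by any of the evaluated tools, which this presentation does not name. The function names, the stop-word list, and the toy textbase are all hypothetical, and ranking terms by raw frequency is an assumption chosen for simplicity.

```python
import re
from collections import Counter, defaultdict

# Hypothetical stop-word list; a real instrument would use a far larger one.
STOP_WORDS = {"the", "a", "an", "of", "on", "and", "for", "in", "to", "is"}


def build_term_index(textbase):
    """Map each term to the (document id, word position) pairs where it occurs."""
    index = defaultdict(list)
    for doc_id, text in textbase.items():
        # Lowercase words, keeping hyphenated terms such as "x-ray" together.
        words = re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())
        for position, word in enumerate(words):
            if word not in STOP_WORDS:
                index[word].append((doc_id, position))
    return index


def key_terms(index, top_n=3):
    """Return the top_n most frequent terms together with their pointers."""
    counts = Counter({term: len(hits) for term, hits in index.items()})
    return [(term, index[term]) for term, _ in counts.most_common(top_n)]


# Invented sample sentences for illustration; not taken from the study's data.
textbase = {
    "report_a": "Progress on mask fabrication for X-ray lithography",
    "report_b": "Mask defects in electron projection lithography",
}
index = build_term_index(textbase)
top = key_terms(index, top_n=2)
```

Each entry in `top` pairs a term with the documents and word positions where it appears, which is the kind of pointer a reader would follow back into the textbase. A plain search engine, such as the baseline mentioned above, answers the reverse question: given a term, where does it occur?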


Text Mining Tools: Key Terms

[Figure: text mining tool that returns key terms.]


Text Mining Tools: Excerpts

[Figure: text mining tool that returns excerpts.]


Text Mining Tools: Hierarchy

[Figure: text mining tool that returns a hierarchy.]


Results

  • No method provided any novel results. There was some difficulty with mixed format documents.

  • Domain experts were required to evaluate the output and determine the importance of the delivered information.

  • Graphical information presentation was preferred over simple text.

  • The search engine provided many pointers to occurrences of search terms.

  • There was no evidence that this approach provided any novel knowledge.


Conclusions

  • Text Mining instruments are in a developmental stage and need refinement to be more useful.

  • Text Mining instruments must be able to handle data in various formats, e.g., documents, spreadsheets, and presentations.

  • Without a defined goal of what data will be delivered, there is no commonality among the various instruments.

  • Experts had difficulty retrieving information that was known to be present, because of the methodology used to evaluate information in the textbase.

  • There must be a cohesive direction provided for the development of these instruments.


Future Directions – Information Needs

  • An Instrument that evaluates the text in the textbase and provides an accurate representation of the information contained therein.

  • An Instrument that provides this information in a manner that can be accurately and quickly evaluated by the intended user.

  • An Instrument that draws the best elements from existing work and provides information based on proven methodologies. (In rapidly evolving technologies, efforts in one area may ignore developments in others. This is not acceptable.)

Recommendations


Data Mining Process

Recommendation: Start with existing methodology.


Text Mining Process

Recommendation: Develop new methodology from existing ones.


Future Directions – Instrument Needs

  • There needs to be a cohesive direction for future work. The existing development must draw on the knowledge developed in the Library Science field.

  • Can build from Data Mining to derive Text Mining functionality. A key concern will remain the method of presenting the results.

  • Need to have some agreement on the purpose of the Text Mining Instruments

    • What is the purpose of “mining” text?

    • What kind of user will there be?

    • What is the anticipated outcome?

  • Consider the application of the latest software developments, e.g., Groove, Napster, for information sharing.


Challenges

  • Establish a “goal” for the results of Text Mining. What will be accomplished?

  • Drive toward widespread application, i.e., desktop and handheld applications.

  • Incorporate latest hardware developments, i.e., distributed, parallel processing and wireless communications.

  • Deliver what the intended user needs.

  • Don’t reinvent the “wheel”

    • Have the Library Science, the Information Science, and the Computer Science people work together.


Acknowledgements

  • Dean Brooke Sheldon, Sanda Erdelez, Mary Lynn Rice-Lively (GSLIS, University of Texas at Austin).

  • John Konopka of IBM.

  • The International SEMATECH team including Scott Mackay, Mark Mason, Phil Seidel, David Stark.

  • The various technology champions for their efforts in providing the latest technology information.
