
An Evaluation of Text Mining Tools as Applied to Selected Scientific and Engineering Literature





Walter J. Trybula, Ph.D., IEEE Fellow

Ronald E. Wyllys, Ph.D.

ASIS 2000 – Chicago, Illinois

November 14, 2000



Introduction

  • Data volume is growing and sources of information are more diverse

  • There is a need to evaluate this information

  • There are tools that claim to be able to find information in textbases

  • An investigation of existing tools would provide a measure of their ability.

  • If such tools worked, it might be possible to discover new knowledge.¹

¹ Described by Swanson as “Undiscovered Public Knowledge.”

[email protected]



Objective/Goals

  • Provide a means of testing the existing instruments to determine their ability to “find” knowledge.

  • Determine whether any of these instruments provide useful insight into the data.

  • Evaluate the domain experts’ findings to determine whether the instruments are helpful.

  • Develop recommendations based on the results of the experiments.


Overview of Process

  • Selected a technical area with known commonality (lithography masks).

  • Collected the most recent reports available.

  • Compiled the collected material into a textbase for analysis by the text mining tools.

  • Had domain experts evaluate the results.

  • Analyzed their conclusions and drew recommendations for future directions.


Example of Commonality


Selection of Information

  • Information from leading researchers was collected.

    • Asian efforts on X-ray technology.

    • U.S. efforts on X-ray technology.

    • European efforts on Ion Projection Lithography.

    • U.S. efforts on Electron Projection Lithography.

    • U.S. efforts on Extreme Ultraviolet (EUV) technology.

  • The data consisted of each group’s annual technology progress update, prepared for the yearly review.

  • All reports, presentations, and data were assembled into a single textbase for analysis.


Sources of Data

U.S.

Europe

Asia

  • Concerns:

    • Language

    • Terminology

    • Program (format)
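The terminology concern can be illustrated with a normalization step that maps variant spellings and abbreviations onto one canonical term before indexing, so that reports from different groups index the same concept the same way. This is a minimal sketch, not a step described in the study: the `THESAURUS` entries and the `normalize_terms` helper are hypothetical, and a real instrument would need a much larger, domain-curated vocabulary.

```python
import re
from typing import Dict

# Hypothetical thesaurus (not from the study): variant -> canonical term.
THESAURUS: Dict[str, str] = {
    "euv": "extreme ultraviolet",
    "euvl": "extreme ultraviolet",
    "epl": "electron projection lithography",
    "ipl": "ion projection lithography",
}


def normalize_terms(text: str, thesaurus: Dict[str, str]) -> str:
    """Replace whole-word, case-insensitive matches of each variant with its
    canonical term so differently worded reports index the same way."""
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(v) for v in thesaurus) + r")\b",
        re.IGNORECASE,
    )
    # The \b anchors keep "euv" from matching inside a longer word such as "euvl".
    return pattern.sub(lambda m: thesaurus[m.group(0).lower()], text)


print(normalize_terms("EUVL and EUV masks; IPL stencils", THESAURUS))
# extreme ultraviolet and extreme ultraviolet masks; ion projection lithography stencils
```

A lookup table like this only addresses abbreviations and spelling variants; the language and format concerns would need separate handling.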


Text Mining Tools

  • Selected three types of Text Mining Instruments available for desktop operation:

    • Key terms identified with pointers to the text

    • Excerpt presentation format

    • Hierarchical tree-structure presentation

      • Did not include Self-Organizing Maps (SOMs)

  • Included a search engine (AltaVista) as a baseline for evaluating the results.
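The baseline role of the search engine can be pictured as literal term lookup that returns a pointer to every occurrence. The sketch below is a simplified stand-in, not AltaVista’s actual algorithm; `find_occurrences`, the document ids, and the toy textbase are hypothetical. It also shows a limitation of literal matching: a query for “masks” does not find “mask”.

```python
import re
from typing import Dict, List, Tuple


def find_occurrences(textbase: Dict[str, str], query: str) -> List[Tuple[str, int]]:
    """Return a sorted list of (document id, character offset) pointers for
    every case-insensitive, whole-word match of each query term."""
    pointers: List[Tuple[str, int]] = []
    for term in query.split():
        pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        for doc_id, text in textbase.items():
            pointers.extend((doc_id, m.start()) for m in pattern.finditer(text))
    return sorted(pointers)


# Hypothetical two-document textbase.
textbase = {
    "us-euv": "EUV mask defects remain the key mask issue.",
    "eu-ipl": "Ion projection uses a stencil mask.",
}
print(find_occurrences(textbase, "mask"))
# [('eu-ipl', 30), ('us-euv', 4), ('us-euv', 32)]
print(find_occurrences(textbase, "masks"))  # [] because matching is literal
```

The output is a list of locations, not an interpretation of them, which is why a search engine serves as a baseline rather than as a knowledge-discovery instrument.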


Text Mining Tools

Text Mining Tool that returns Key Terms
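A key-term instrument of this kind can be approximated by counting non-stopword terms across the textbase and keeping a pointer to each occurrence, so a reader can jump from a term back to the text. This is a minimal frequency-based sketch; the internals of the tools evaluated are not described, and `key_terms`, `STOPWORDS`, and the sample documents are hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical minimal stop list; real instruments use far larger ones.
STOPWORDS = {"a", "an", "and", "are", "for", "in", "is", "of", "on", "the", "to", "with"}
TOKEN = re.compile(r"[a-z0-9]+")

Pointer = Tuple[str, int]  # (document id, token position within that document)


def key_terms(textbase: Dict[str, str], top_n: int = 5) -> List[Tuple[str, List[Pointer]]]:
    """Rank non-stopword terms by total frequency, keeping a pointer to every
    occurrence so each key term links back to the underlying text."""
    pointers: Dict[str, List[Pointer]] = defaultdict(list)
    for doc_id, text in textbase.items():
        # Positions count every token, stopwords included, so they index the raw token stream.
        for position, token in enumerate(TOKEN.findall(text.lower())):
            if token not in STOPWORDS:
                pointers[token].append((doc_id, position))
    # Most frequent first; ties broken alphabetically for stable output.
    ranked = sorted(pointers.items(), key=lambda item: (-len(item[1]), item[0]))
    return ranked[:top_n]


textbase = {
    "us-euv": "EUV mask defects remain the key mask issue.",
    "eu-ipl": "Ion projection uses a stencil mask.",
}
for term, where in key_terms(textbase, top_n=3):
    print(term, where)
```

Raw frequency is the simplest ranking; weighting schemes such as TF-IDF would favor terms concentrated in a few documents instead.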


Text Mining Tools

Text Mining Tool that returns Excerpts
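An excerpt-style instrument returns passages rather than isolated terms. A simple way to sketch this, assuming sentence-level excerpts ranked by how many distinct query terms they contain, is shown below; `excerpts`, the naive sentence splitter, and the sample textbase are hypothetical and do not reproduce any tool from the study.

```python
import re
from typing import Dict, List, Tuple

# Split after ., ! or ? followed by whitespace (a naive sentence splitter).
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")
WORD = re.compile(r"[a-z0-9]+")


def excerpts(
    textbase: Dict[str, str], terms: List[str], limit: int = 3
) -> List[Tuple[str, str]]:
    """Return up to `limit` (document id, sentence) pairs, ranked by the
    number of distinct query terms each sentence contains."""
    wanted = {term.lower() for term in terms}
    scored: List[Tuple[int, str, str]] = []
    for doc_id, text in textbase.items():
        for sentence in SENTENCE_END.split(text.strip()):
            hits = len(wanted & set(WORD.findall(sentence.lower())))
            if hits:
                scored.append((hits, doc_id, sentence))
    # Highest score first; sort() is stable, so ties keep textbase order.
    scored.sort(key=lambda item: -item[0])
    return [(doc_id, sentence) for _, doc_id, sentence in scored[:limit]]


textbase = {
    "us-euv": "Throughput improved this year. Mask defects remain the key issue.",
    "eu-ipl": "The stencil mask is stable.",
}
for doc_id, sentence in excerpts(textbase, ["mask", "defects"]):
    print(f"[{doc_id}] {sentence}")
```

Returning whole sentences keeps context a domain expert can judge quickly, at the cost of showing less of the textbase than a term list.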


Text Mining Tools

Text Mining Tool that returns Hierarchy
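A hierarchical presentation can be produced by agglomerative clustering: start with one cluster per document and repeatedly merge the two most similar clusters until a single tree remains. The sketch below assumes bag-of-words vectors, cosine similarity, and single linkage; these choices, the `build_hierarchy` helper, and the toy documents are assumptions, not the method of any instrument in the study.

```python
import math
import re
from collections import Counter
from itertools import combinations
from typing import Dict, List, Set, Tuple, Union

# A tree is either a document id (leaf) or a pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def build_hierarchy(textbase: Dict[str, str]) -> Tree:
    """Single-linkage agglomerative clustering of a non-empty textbase."""
    vectors = {d: Counter(re.findall(r"[a-z0-9]+", t.lower())) for d, t in textbase.items()}
    # Each cluster pairs its subtree with the set of document ids it contains.
    clusters: List[Tuple[Tree, Set[str]]] = [(d, {d}) for d in textbase]

    def linkage(pair: Tuple[int, int]) -> float:
        # Single linkage: similarity of the closest pair of member documents.
        left, right = clusters[pair[0]][1], clusters[pair[1]][1]
        return max(cosine(vectors[x], vectors[y]) for x in left for y in right)

    while len(clusters) > 1:
        i, j = max(combinations(range(len(clusters)), 2), key=linkage)
        merged = ((clusters[i][0], clusters[j][0]), clusters[i][1] | clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0][0]


textbase = {
    "us-euv": "EUV mask defects",
    "us-epl": "EPL mask defects",
    "eu-ipl": "Ion projection stencil",
}
print(build_hierarchy(textbase))  # ('eu-ipl', ('us-euv', 'us-epl'))
```

Single linkage is the simplest choice; average or complete linkage would generally produce differently shaped trees on real data.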


Results

  • No method provided any novel results. Mixed-format documents caused some difficulty.

  • Domain experts were required to evaluate the output and determine the importance of the delivered information.

  • Graphical presentation of information was preferred over simple text.

  • The search engine provided many pointers to occurrences of the search terms.

  • There was no evidence that this approach provided any novel knowledge.


Conclusions

  • Text Mining instruments are in a developmental stage and need refinement to be more useful.

  • Text Mining instruments must be able to handle data in various formats, e.g., documents, spreadsheets, and presentations.

  • Without a defined goal for what data will be delivered, there is no commonality among the various instruments.

  • Because of the methodology used to evaluate information in the textbase, experts had difficulty retrieving information that was known to be present.

  • There must be a cohesive direction provided for the development of these instruments.


Future Directions – Information Needs

  • An Instrument that evaluates the text in the textbase and provides an accurate representation of the information contained therein.

  • An Instrument that provides this information in a manner that can be accurately and quickly evaluated by the intended user.

  • An Instrument that draws the best elements from existing work and provides information based on proven methodologies. (In rapidly evolving technologies, efforts in one area may ignore developments in others. This is not acceptable.)

Recommendations


Data Mining Process

Recommendations

Start with existing methodology.


Text Mining Process

Recommendations

Develop new methodology from existing ones.


Future Directions – Instrument Needs

  • There needs to be a cohesive direction for future work. Current development must draw on the knowledge developed in the field of Library Science.

  • Text Mining functionality can be built by extending Data Mining. A key concern will remain the method of presenting the results.

  • There needs to be agreement on the purpose of Text Mining Instruments:

    • What is the purpose of “mining” text?

    • What kind of user will there be?

    • What is the anticipated outcome?

  • Consider applying the latest information-sharing software, e.g., Groove and Napster.

Recommendations


Challenges

  • Establish a “goal” for the results of Text Mining. What will be accomplished?

  • Drive toward widespread application, e.g., desktop and handheld applications.

  • Incorporate the latest hardware developments, e.g., distributed and parallel processing, and wireless communications.

  • Deliver what the intended user needs.

  • Don’t reinvent the “wheel.”

    • Have Library Science, Information Science, and Computer Science researchers work together.

Recommendations


Acknowledgements

  • Dean Brooke Sheldon, Sanda Erdelez, Mary Lynn Rice-Lively (GSLIS, University of Texas at Austin).

  • John Konopka of IBM.

  • The International SEMATECH team including Scott Mackay, Mark Mason, Phil Seidel, David Stark.

  • The various technology champions for their efforts in providing the latest technology information.
