L.A.S.I.

Linguistic Analysis for Subject Identification

Feasibility Presentation

Presented by: CS410 Red Group

November 12, 2012


Outline

  • Team Red Staff Chart

  • Introduction

  • Societal Problem

  • Case Study

  • Proposed Solution

  • Major Component Diagram

  • Algorithm

  • The Competition

  • Risk

  • Conclusion


Team Red Staff Chart

  • Scott Minter: Project Co-Leader, Software Specialist

  • Brittany Johnson: Project Co-Leader, Documentation Specialist

  • Dustin Patrick: Algorithm Specialist, Expert Liaison

  • Richard Owens: Documentation Specialist, Communication Specialist

  • Erik Rogers: Marketing Specialist, GUI Developer

  • Aluan Haddad: Algorithm Specialist, Software Specialist


What is a theme?


A specific and distinctive quality, characteristic, or concern.¹

¹ "Theme," Merriam-Webster


What are you looking for when you are identifying a theme?


5 W’s & 1 H

  • Who

  • What

  • When

  • Where

  • Why

  • How


Bill’s stove was broken. He has been saying for months that he would go to the appliance store to buy a new one. He had some free time yesterday, so he drove to the store to buy a new stove.


The Theme from the 5 W’s & 1 H

Bill drove to the store yesterday to buy a new stove because his old one was broken.


Why are themes important?

  • Comprehension

  • Summarization

  • Assists in communication between people


Societal Problem

It is difficult for people to identify a common theme over a large set of documents in a timely, consistent, and objective manner.


How long does it take?

  • Finding a theme over multiple documents is a time-consuming process.

  • The average reading speed of an adult is 250 words per minute.²

    ² Thomas, "What Is the Average Reading Speed and the Best Rate of Reading?"
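To make the time cost concrete, the following is a minimal arithmetic sketch based on the 250 words-per-minute figure cited above. The function name and the example workload (40 documents of 5,000 words each) are hypothetical and chosen only for illustration; the sketch also assumes a single uninterrupted read-through with no time spent on note-taking or comparison.

```python
def reading_time_minutes(word_count: int, words_per_minute: int = 250) -> float:
    """Return the minutes needed to read word_count words at a steady pace."""
    if words_per_minute <= 0:
        raise ValueError("words_per_minute must be positive")
    return word_count / words_per_minute


# Hypothetical workload (an assumption, not a figure from the presentation):
# 40 documents of 5,000 words each.
total_words = 40 * 5_000
hours = reading_time_minutes(total_words) / 60
print(f"{total_words} words take about {hours:.1f} hours to read once")
```

Under these assumptions a single pass over the set already takes more than thirteen hours, before any cross-document comparison begins.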


Consistency and Objectivity

  • The criteria for evaluation may vary from person to person.

  • Large quantities of documents must be mentally digested, assessed, and interrelated.


Dr. Patrick Hester

“My research interests include multi-objective decision making under uncertainty, probabilistic and non-probabilistic uncertainty analysis, critical infrastructure protection, and decision making using modeling and simulation.”³

- Dr. Hester

Ph.D. from Vanderbilt University, 2007

Major: Risk and Reliability Engineering and Management

³ Patrick Hester website


  • Dr. Hester is a systems analyst and researcher

    • He must

      • Conduct extensive research

      • Quickly become familiar with client systems

      • Formulate concise, objective assessments

  • LASI will help with all of this


Assessment Improvement Design (A.I.D.)

  • A preliminary problem statement is identified from the documents

  • The problem statement is then used to find Critical Operational Issues (COIs)

  • COIs are used to find Measures of Effectiveness (MOEs)

  • MOEs are used to find Measures of Performance (MOPs)


Current Method

Continue on to the rest of the A.I.D Process

Customer Contact

yes

Is Customer satisfied?

Situational Awareness Meeting

Problem Statement Presentation

no

Will NCSOSE be needed?

yes

Document Gathering Process

Document Analysis

no

Client Goes Elsewhere


LASI: Linguistic Analysis for Subject Identification


Our Proposed Solution

  • LASI is a linguistic analysis decision support tool used to help determine a common theme across multiple documents. It is our goal with LASI to:

    • accurately find themes

    • use system resources efficiently

    • provide consistent results


What do we mean by “linguistic analysis”?

The contextual study of written works and how the words combine to form an overall meaning.


Linguistic analysis involves

Syntactic

  • Logical grammar

  • Statistical data

    • Alphabetical frequencies

    • Word counts

    • Parts of speech

  • Word dependencies

Semantic

  • Relating syntactic structures to language-independent meanings

  • Extracting meaning and conceptual arguments

  • Summarization
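The statistical measures named above (word counts and alphabetical frequencies) can be sketched with Python's standard library. This is an illustrative sketch only, not LASI's implementation; the function names and the sample sentence are assumptions, and part-of-speech tagging is left out because it needs a trained language model.

```python
import re
from collections import Counter


def word_counts(text: str) -> Counter:
    """Count how often each lowercased word occurs in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)


def letter_frequencies(text: str) -> Counter:
    """Count how often each alphabetic character occurs, ignoring case."""
    return Counter(ch for ch in text.lower() if ch.isalpha())


# Sample text adapted from the stove example earlier in the presentation.
sample = "Bill's stove was broken. He drove to the store to buy a new stove."
print(word_counts(sample).most_common(3))
print(letter_frequencies(sample).most_common(3))
```

Raw counts like these only cover the syntactic, statistical side; the semantic items in the list require relating words to one another rather than tallying them.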


The Wills and Will Nots of LASI

What LASI Will Do

What LASI Will Not Do

  • Analyze multiple documents to find common themes

  • Provide statistical data to help a user make a decision

  • Provide a concise synopsis

  • Provide a single theme


Who Would This Appeal To?

  • Researchers

  • Consultants

  • Academics

  • Students


Benefits To The Customer

  • Time saving

  • Objective output

  • Consistent output

  • Cost saving solution


How does LASI fit into our Case Study?


Before LASI

Customer Contact

Continue on to the rest of the A.I.D Process

yes

Is the Customer satisfied?

Situational Awareness Meeting

Problem Statement Presentation

no

Will NCSOSE be needed?

yes

Document Gathering Process

Document Analysis

no

Client Goes Elsewhere


After LASI

Customer Contact

Continue on to the rest of the A.I.D Process

yes

Is the Customer satisfied?

Situational Awareness Meeting

Problem Statement Presentation

no

Will NCSOSE be needed?

yes

Document Gathering Process

LASI Aided Document Analysis

no

Client Goes Elsewhere


Major Functional Components

Hardware

  • High-end notebook PC (~$1500 USD)

    • Computation: quad-core CPU

    • Primary memory: 8.0 GB DDR3 RAM

    • Document storage: solid-state storage

Software

  • Algorithm: extrapolates the most likely congruence of themes and ideas across all documents in the input domain

  • User interface

    • Multi-level views

    • Weighted phrase list

    • Detailed breakdown

    • Step-by-step justification


Linguistic Analysis Algorithm

Primary Analysis: Word Count and Syntactic Assessment

  • Traverse document in word-wise manner

  • Identify corresponding parts of speech

  • Determine frequency by grammatical role

Secondary Analysis: Associative Identification

  • Bind pronouns to nouns, updating frequency

  • Bind adjectives to nouns

  • Identify potential noun phrases

Tertiary Analysis: Semantic Relationship Assessment

  • Identify potential synonyms

  • Assess potential subject-object-verb relationships

  • Output list of weighted themes
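As a rough illustration of the first and last steps only (word-wise traversal with frequency counting, and a weighted theme list as output), here is a minimal sketch. It is not the LASI algorithm: the stop-word list, the scoring rule (total frequency scaled by the share of documents containing the word), and all names are assumptions made for this example. Pronoun binding, synonym detection, and subject-object-verb assessment are omitted because they require a real natural language processing toolkit.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real system would use a fuller one.
STOP_WORDS = {"a", "an", "and", "the", "to", "of", "in", "is", "was", "for"}


def content_words(text):
    """Traverse the text word by word, dropping stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def weighted_themes(documents, top_n=5):
    """Rank words by total frequency times the share of documents using them."""
    if not documents:
        return []
    total = Counter()     # occurrences across the whole set
    doc_freq = Counter()  # number of documents containing the word
    for doc in documents:
        words = content_words(doc)
        total.update(words)
        doc_freq.update(set(words))
    scores = {w: total[w] * doc_freq[w] / len(documents) for w in total}
    # Highest score first; ties are broken alphabetically for stable output.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


docs = [
    "The stove was broken.",
    "Bill drove to the store to buy a new stove.",
    "The appliance store sells a stove for every kitchen.",
]
themes = weighted_themes(docs, top_n=2)
print(themes)
```

In this toy set "stove" ranks first because it appears in every document, which matches the idea of a theme common to the whole set rather than one that is merely frequent in a single document.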


Milestone Diagram (figure not included in this transcript)


The Competition

  • WordStat

  • Stanford CoreNLP

  • ReadMe

  • AutoMap


Risk Matrix

Customer Risks

C1 -- Product Interest

C2 -- Maintenance

C3 -- Trust

Technical Risks

T1 -- System Limitations

T2 -- Scanned Text Recognition

T3 -- Jargon Recognition

T4 -- Illegal Character Handling


Customer Risks

C1. Product Interest

Probability 2, Impact 4

Mitigation: LASI offers unique functionality and user friendliness.

C2. Maintenance

Probability 3, Impact 2

Mitigation: LASI will be a free, open source application, allowing the community to maintain and extend it over time.

C3. Trust

Probability 3, Impact 3

Mitigation: LASI will provide a step-by-step breakdown of its output analysis and algorithm reasoning.


Technical Risks

T1. System Limitations

Probability 4, Impact 2

Mitigation: LASI will be designed from the ground up in native C++ to keep memory and CPU usage efficient.

T2. Scanned Text Recognition

Probability 4, Impact 3

Mitigation: LASI will implement an optical character recognition algorithm to handle scanned text.


T3. Jargon Recognition

Probability 3, Impact 2

Mitigation: LASI will have domain-specific dictionaries and feature intuitive contextual inference.

T4. Illegal Character Handling

Probability 4, Impact 2

Mitigation: LASI will provide contextual inference, synonym recognition, and statistical methods.
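One common way to compare entries in a risk matrix is to multiply probability by impact. The presentation does not state how its ratings are combined, so the product used below is an assumption made for this sketch; the probability and impact values themselves are taken from the customer and technical risk slides.

```python
# (probability, impact) pairs as listed on the risk slides.
RISKS = {
    "C1 Product Interest": (2, 4),
    "C2 Maintenance": (3, 2),
    "C3 Trust": (3, 3),
    "T1 System Limitations": (4, 2),
    "T2 Scanned Text Recognition": (4, 3),
    "T3 Jargon Recognition": (3, 2),
    "T4 Illegal Character Handling": (4, 2),
}


def rank_risks(risks):
    """Score each risk as probability * impact (an assumed rule) and sort."""
    scored = [(name, prob * impact) for name, (prob, impact) in risks.items()]
    # Highest score first; ties are broken by name for stable output.
    return sorted(scored, key=lambda item: (-item[1], item[0]))


ranked = rank_risks(RISKS)
for name, score in ranked:
    print(f"{score:>2}  {name}")
```

Under this assumed scoring, scanned text recognition (T2) comes out as the highest-exposure item, followed by trust (C3).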


Conclusion

  • LASI is feasible.

  • LASI is a decision support tool, not a decision-making tool.

  • The implications of success extend across a wide range of fields of study and professions.

  • For LASI to succeed, its output needs to be immediately usable and its interface user-friendly.


References

  • "Theme." Def. 1b. Merriam-Webster. N.p., n.d. Web. 19 Oct. 2012. <http://www.merriam-webster.com/dictionary/theme>.

  • Thomas, Mark. "What Is the Average Reading Speed and the Best Rate of Reading?" Web. 19 Oct. 2012. <http://www.healthguidance.org/entry/13263/1/What-Is-the-Average-Reading-Speed-and-the-Best-Rate-of-Reading.html>.

  • "Patrick Hester." Old Dominion University. N.p., n.d. Web. 24 Sept. 2012. <http://www.odu.edu/directory/people/p/pthester>.

  • Osinski, Stanislaw, and Dawid Weiss. Carrot2. 13 Aug. 2012. Web. 25 Sept. 2012. <http://project.carrot2.org>.

  • "WordStat." Provalis Research. Web. 24 Sept. 2012. <http://provalisresearch.com/products/content-analysis-software/>.

  • "ReadMe: Software for Automated Content Analysis." Web. 24 Sept. 2012. <http://gking.harvard.edu/node/4520/rbuild_documentation/readme.pdf>.

  • "AlchemyAPI Overview." AlchemyAPI. N.p., n.d. Web. 19 Oct. 2012. <http://www.alchemyapi.com/api/>.

  • "AutoMap." Project. N.p., n.d. Web. 19 Oct. 2012. <http://www.casos.cs.cmu.edu/projects/automap/>.

  • "CL Research Home Page." CL Research Home Page. N.p., n.d. Web. 19 Oct. 2012. <http://www.clres.com/>.

