
Data Science for Business: Semantic Verses

Data Science for Business: Semantic Verses. Dr. Brand Niemann Director and Senior Data Scientist Semantic Community http://semanticommunity.info http://www.meetup.com/Federal-Big-Data-Working-Group/ http://semanticommunity.info/Data_Science/Federal_Big_Data_Working_Group_Meetup



Presentation Transcript


  1. Data Science for Business: Semantic Verses Dr. Brand Niemann Director and Senior Data Scientist Semantic Community http://semanticommunity.info http://www.meetup.com/Federal-Big-Data-Working-Group/ http://semanticommunity.info/Data_Science/Federal_Big_Data_Working_Group_Meetup February 14, 2014

  2. Data Science for Business • Book Review Summary: • If you are a data scientist, take this as our challenge: think deeply about exactly why your work is relevant to helping the business and be able to present it as such. • Remember: • If you can’t explain it simply, you don’t understand it well enough.—Albert Einstein • Semantic Verses Magnet: • “Magnet is the only engine that treats topics as semantic objects, which gives it a competitive edge since the identification of “key topics” is generally considered to be the main feature of any semantic engine.” • “Semantic is used here to refer to understanding what a piece of text is about. We do not claim we are doing NLP/NLU for question/answering purposes.” • Source: Walid S. Saba, PhD, AI/NLP Scientist, February 2014.

  3. Magnet Text Analysis Engine: Understands What the Text is About http://semanticverses.com/Default.aspx

  4. Data Science for Business Knowledge Base • My Note: A Knowledge Base* with: • Data Story • Slides • Data Sets • Spotfire Dashboard • Book Web Pages • *Structured Mashup with everything treated as an object with a well-defined URL for the Glossary (taxonomy) and Table of Contents (thesaurus), integrated together in an Information Model! http://semanticommunity.info/Data_Science/Data_Science_for_Business

  5. MindTouch • MindTouch: • Treats topics as semantic objects (they can be searched for links to content). • MindTouch headings identify “key topics” (see the Table of Contents for the book in this page). • Allows one to construct a natural language front-end for enterprise data (and big data) integration across multiple sources (Google Chrome and Spotfire can Find words and data in their mashup Knowledge Bases). • Can be combined with Be Informed, YARCData, and big data analytics (Spotfire), and could pilot including Semantic Verses. • An example of expert subject matter that serves to provide a metamodel of topics as an interface to the integration of content (text and data) that can be both personalized by the user and integrated with similar metamodels. • Semantic Community: • I am doing Natural Language Processing (NLP)/Natural Language Understanding (NLU) by hand in MindTouch, and I see why it is so difficult to automate for massive information on the Internet without Subject Matter Expertise and Structure.

  6. Specific Example: TFIDF - Term Frequency (TF) and Inverse Document Frequency (IDF) • Using Google Find for TFIDF (12 hits), where the first is “Combining Them: TFIDF”, which says: See “Example: Attribute Selection with Information Gain” on page 56. • Which says: For a dataset with instances described by attributes and a target variable, we can determine which attribute is the most informative with respect to estimating the value of the target variable. We also can rank a set of attributes by their informativeness, in particular by their information gain. This can be used simply to understand the data better. It can be used to help predict the target. Or it can be used to reduce the size of the data to be analyzed, by selecting a subset of attributes in cases where we cannot or do not want to process the entire dataset. • See this UC Irvine Machine Learning Repository page for the data set used to illustrate information gain.
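
The attribute ranking described on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the book's code: the function names, the toy rows, and the attribute names (`color`, `size`, `label`) are all assumptions made up for the example, and the UC Irvine data set mentioned above is not used.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of target values."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def information_gain(rows, attribute, target):
    """Entropy of the target minus the weighted entropy after splitting on attribute."""
    parent = entropy([row[target] for row in rows])
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row[target])
    weighted = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return parent - weighted


# Hypothetical toy data: "color" separates the target perfectly, "size" not at all.
rows = [
    {"color": "red", "size": "small", "label": "yes"},
    {"color": "red", "size": "large", "label": "yes"},
    {"color": "blue", "size": "small", "label": "no"},
    {"color": "blue", "size": "large", "label": "no"},
]

# Rank attributes from most to least informative about the target.
ranking = sorted(
    ["color", "size"],
    key=lambda a: information_gain(rows, a, "label"),
    reverse=True,
)
print(ranking)
```

Keeping only the top-ranked attributes is one way to reduce the size of the data, as the quoted passage suggests.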

  7. Using Google Find for TFIDF 1

  8. Using Google Find for TFIDF 10
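
For reference, the TFIDF measure searched for on these slides multiplies a term's frequency within one document (TF) by its inverse document frequency across the collection (IDF). The sketch below assumes one common variant, length-normalized counts for TF and log(N / document frequency) for IDF; other weightings exist, and the function name and sample documents are hypothetical.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: score} dict per document.

    Assumed variant: TF = count / document length, IDF = ln(N / document frequency).
    """
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        scores.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


# Hypothetical mini-collection for illustration only.
docs = [
    "data science for business",
    "data mining process",
    "semantic text engine",
]
scores = tf_idf(docs)
# "business" appears in one document, "data" in two, so "business" scores higher.
print(scores[0]["business"] > scores[0]["data"])
```

A term that occurs in every document gets an IDF of zero under this variant, which is why rare terms stand out as candidate key topics.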

  9. The Data Mining Process 1 • Business Understanding • Data Understanding • Data Preparation • Modeling • Evaluation • Deployment

  10. The Data Mining Process 2 • Business Understanding: • Use real Subject Matter Expertise content instead of general Web content. • Data Understanding: • Make all content data, so that unstructured, semi-structured, and structured information becomes integrated data. • Data Preparation: • Create an index of content topics and objects that is both a relational and graph database. • Modeling: • A searchable Information Model with Analytics (Ontology) linked to the Thesaurus (Taxonomy) linked to the Glossary (Vocabulary). • Evaluation: • Finding more needles in the needle haystack and discovering things of interest that you did not know how to look for. • Deployment: • Publicly available on the Web using the Google Chrome Browser.

  11. Data Preparation • Topics • Knowledge Base URL • Function • Within Topic URLs • Figure and Tables URLs • Within Footnote URL • Relational and Graph (Subject, Object, & Predicate) Databases
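
One way to picture an index that is both relational and graph-like is a single table of subject, predicate, object triples. The sketch below uses Python's built-in `sqlite3` module; the table name, predicate names, and sample rows are hypothetical and are not taken from the Knowledge Base itself.

```python
import sqlite3

# Relational view: one table holding subject-predicate-object triples.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (subject TEXT, predicate TEXT, object TEXT)")

# Hypothetical sample triples, for illustration only.
triples = [
    ("Data_Science_for_Business", "hasTopic", "TFIDF"),
    ("Data_Science_for_Business", "hasTopic", "Information_Gain"),
    ("TFIDF", "definedIn", "Glossary"),
    ("Information_Gain", "hasFigure", "Figure_A"),
]
conn.executemany("INSERT INTO triples VALUES (?, ?, ?)", triples)


def neighbors(node):
    """Outgoing edges of a node as (predicate, object) pairs."""
    cursor = conn.execute(
        "SELECT predicate, object FROM triples WHERE subject = ? "
        "ORDER BY predicate, object",
        (node,),
    )
    return cursor.fetchall()


def reachable(start):
    """Graph view: every object reachable from start by following edges."""
    seen, frontier = set(), [start]
    while frontier:
        node = frontier.pop()
        for _, obj in neighbors(node):
            if obj not in seen:
                seen.add(obj)
                frontier.append(obj)
    return seen


print(neighbors("TFIDF"))
print(sorted(reachable("Data_Science_for_Business")))
```

The same rows answer table-style questions through SQL and graph-style questions through traversal, which is the dual use the slide points at.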

  12. Modeling • A searchable Information Model with Analytics (Ontology) linked to the Thesaurus (Taxonomy) linked to the Glossary (Vocabulary)

  13. Evaluation • Find: • The find tool is a fast way to find contents in your data, navigate in the analysis, and perform actions found in the menus of Spotfire. It consists of a text field where you enter a search string and a list of results for the search. • To reach the Find dialog: press Ctrl+F or select Tools > Find... • Searching in TIBCO Spotfire: • There are many places in TIBCO Spotfire where you can search for different items. For example, you can search for filters, analyses in the library, or elements used to build information links in the Information Designer. All of the available search fields use the same basic search syntax, which is presented below. For more information regarding search of a specific item, see the links at the bottom of this page. • Tip: If you cannot find what you are looking for, try adding more wildcards. For example, to locate a filter called "Sales ($)", enter the search expression "Sales ($*" to avoid interpreting the text within the parentheses as a Boolean expression. http://semanticommunity.info/Data_Science/TIBCO_Spotfire_6_for_Data_Science#Find

  14. Deployment • Publicly available on the Web using the Google Chrome Browser. Web Player
