
Semantic Search for NSF Decision Making


Presentation Transcript


  1. Semantic Search for NSF Decision Making Dr. Brand Niemann Director and Senior Enterprise Architect – Data Scientist Semantic Community http://semanticommunity.info/ AOL Government Blogger http://gov.aol.com/bloggers/brand-niemann/ April 4, 2012

  2. Overview • Background • NITRD Dashboards • Data.gov Developer Community • Research.gov Dashboard • Semantic MedLine • Some Next Steps

  3. Background • My role at EPA as its Senior Enterprise Architect and Data Scientist, as lead for several Federal CIO Council activities, and, since leaving government, as Director and Senior Enterprise Architect – Data Scientist of Semantic Community, has been to implement high-level direction as follows:

  4. Background • Teri Takai (DoD CIO): Harvard Leadership for a Networked World, Lead Practitioner. I am an Invited Practitioner who mentors students under her direction. • Social Business Intelligence from Open Government Data • Letitia Long, Director of the National Geospatial-Intelligence Agency. I am the lead for the pilot demonstration for the NCOIC-NGA CRADA at the upcoming 13th SOA for E-Government Conference, April 3. • A Quint – Cross Information Sharing and Integration for the Intelligence Community • Demonstration at the 13th SOA for E-Government Conference, April 3, 2012, at MITRE • Donna Roy, Executive Director of NIEM. She requested that I provide suggestions and demonstrations for evolving NIEM, which I have done twice. • A Plan for Scaling NIEM to Big Data • Build The NIEM Information Exchange Clearinghouse In The Cloud • Gus Hunt, CIA CTO. He challenged me to show how to make the CIA World Fact Book more semantic and to work with Digital Reasoning. • CIA World Fact Book • Digital Reasoning

  5. Background • Sonny Bhagowalia, David McClure, and Jeanne Holm (Data.gov Program Executive, GSA Associate Administrator, and Data.gov Evangelist, respectively) challenged me to do data science for Data.gov. • Data.gov • Data.gov Developers Community Space Launched • Wyatt Kash, Editor in Chief for AOL Government, challenged me to build shared services like those Federal CIO Steven VanRoekel is asking for. • Federal IT Dashboard in Motion and In Memory • Dennis Wisnosky, DoD CTO, and Walt Okon, DoD Senior Architect Engineer, challenged me to build DoD in the cloud and federate it with other DoD and non-DoD architectures (e.g., TOGAF). • Build DoD in the Cloud and Build TOGAF in the Cloud • Enterprise Information Web for Semantic Interoperability at DoD • Dr. George Strawn, Director of the NSF NITRD and White House OSTP Staff to the CTO (Aneesh Chopra and Todd Park), challenged me to do data science dashboards. • A NITRD Dashboard (March and April 2011) • SIRA for Semantic Search (August 10, 2011) • A Research.gov Dashboard (March 2012) • Semantic MedLine (in process)

  6. NITRD Dashboards Note: Also see Build the NITRD Dashboard in the Cloud and Build the R&D Dashboard in the Cloud. http://semanticommunity.info/A_NITRD_Dashboard#Spotfire

  7. Data.gov Developer Community • Play the role of a data scientist from an agency, use a platform that supports the capabilities below, and build an app that provides semantic search of NSF abstracts so that decision makers can identify future scientific research needs. • My distilled suggestions for the recent, excellent Data.gov meeting are: • Add a data scientist to the Data.gov team to lead a new community of data scientists from the agencies and nongovernmental organizations. • Ensure that the new Data.gov platform supports the sitemap and schema protocols, with well-defined URLs for content, faceted search, and big data in memory. • Encourage the new developer community to build their own Data.gov sites, becoming both publishers and consumers of data in support of the data scientist community above. • Note: Jeanne Holm, Data.gov Evangelist, has invited me to give a presentation at the end of April. http://semanticommunity.info/AOL_Government/Data.gov_Developers_Community_Space_Launched
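The sitemap suggestion can be made concrete with a small sketch. Assuming a site that publishes one page per NSF award abstract (the `data.example.org` URLs and the `build_sitemap` helper are hypothetical placeholders, not part of any Data.gov platform), the following Python writes a file in the sitemaps.org 0.9 format with one `<url>`/`<loc>` entry per well-defined content URL. It illustrates the protocol only.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls, lastmod=None):
    """Return sitemaps.org 0.9 XML with one <url><loc> entry per URL.

    lastmod, if given, is a W3C date string (e.g. "2012-04-04") applied to every entry.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc in urls:
        url = ET.SubElement(urlset, "url")
        # ElementTree escapes characters such as "&" in query strings, as the protocol requires.
        ET.SubElement(url, "loc").text = loc
        if lastmod:
            ET.SubElement(url, "lastmod").text = lastmod
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical award-abstract pages; a real site would list its own URLs.
    pages = [
        "https://data.example.org/nsf/abstracts?award=0000001",
        "https://data.example.org/nsf/abstracts?award=0000002&view=full",
    ]
    print(build_sitemap(pages, lastmod="2012-04-04"))
```

The protocol caps a single sitemap file at 50,000 URLs (and 50 MB uncompressed), so a collection of tens of thousands of abstracts would likely need several sitemap files tied together by a sitemap index.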

  8. Research.gov Dashboard • Build an app that provides semantic search of NSF abstracts so that decision makers can identify future scientific research needs. • Created a 176 MB Excel file (60,981 rows by 44 columns) for the Spotfire dashboard. • Get 2011 data from state tables? • Tried to extract text for semantic search with SIRA and Digital Reasoning, but found that the abstract text is cut off and that URLs are embedded in the Publications and Project Outcomes columns.
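The extraction problem on this slide (cut-off abstracts, URLs embedded in the Publications and Project Outcomes columns) can be sketched as a cleaning pass before text goes to a semantic search tool. This assumes the spreadsheet has been exported to CSV with columns named Abstract, Publications and Project Outcomes; the helper names, the URL pattern and the truncation heuristic (an abstract that does not end in sentence punctuation is flagged) are assumptions for illustration. Flagging cannot recover text that was cut off at the source; that still requires the raw text.

```python
import csv
import io
import re

# A URL ends at whitespace, a quote or an angle bracket, and never on trailing
# punctuation such as "." or "," that belongs to the surrounding sentence.
URL_RE = re.compile(r"https?://[^\s\"'<>]*[^\s\"'<>.,;:)]")


def split_text_and_urls(cell):
    """Separate URLs embedded in a cell from the remaining prose."""
    cell = cell or ""
    urls = URL_RE.findall(cell)
    text = " ".join(URL_RE.sub(" ", cell).split())
    return text, urls


def looks_truncated(abstract):
    """Heuristic only: flag an abstract that does not end in sentence punctuation."""
    stripped = (abstract or "").rstrip()
    return bool(stripped) and stripped[-1] not in ".!?\"')"


def clean_rows(csv_text, url_columns=("Publications", "Project Outcomes")):
    """Yield one cleaned record per row of a CSV export (column names are assumed)."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        abstract = row.get("Abstract") or ""
        record = {"Abstract": abstract, "abstract_truncated": looks_truncated(abstract)}
        for col in url_columns:
            record[col], record[col + " URLs"] = split_text_and_urls(row.get(col))
        yield record
```

Keeping the URLs in their own field, rather than discarding them, preserves the links to publications while leaving clean prose for text analytics.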

  9. Research.gov Spending & Results Download Data Sets https://www.research.gov/

  10. Research.gov Dashboard http://semanticommunity.info/A_NITRD_Dashboard/Research.gov#Spotfire_Dashboard

  11. Sample of Hand Parsed Text Note: We will need to get the raw text data to accomplish the objectives of this work. http://semanticommunity.info/A_NITRD_Dashboard/Research.gov/Sample_of_Hand_Parsed_Text

  12. Semantic MedLine Prototype: Home • Semantic MEDLINE is a prototype Web application that summarizes MEDLINE citations returned by a PubMed search. Natural language processing is used to analyze salient content in titles and abstracts. This information is then presented in a graph that has links to the MEDLINE text processed. • Currently, the results from 35 PubMed searches (including a variety of disorders and drugs) are available to be processed. The 500 most recent citations (from the date of the search) are available for further processing by Semantic MEDLINE. • Begin at the Search tab by selecting a search; then move to the Summarize tab. Choose a summary type to specify the point of view of the summary (Treatment of Disease, Substance Interactions, Diagnosis, or Pharmacogenomics). After selecting the topic of the summary, click the Summarize and Visualize button. The graph appears below. Right-click on an edge to display a MEDLINE citation. http://skr3.nlm.nih.gov/SemMedDemo/index.jsp
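To make the summarize-and-visualize step more concrete: Semantic MEDLINE's graphs are built from predications, subject-predicate-object triples (for example, a drug TREATS a disease) extracted from titles and abstracts. The sketch below is a deliberately simplified stand-in, not the NLM implementation. It keeps only predications whose predicate appears in a hypothetical list for the chosen point of view (`SUMMARY_PREDICATES`, `summarize` and the sample data are all assumptions), and groups them into graph edges that remember which citations support them, which is what lets a click on an edge bring up the underlying MEDLINE citations.

```python
from collections import defaultdict
from typing import NamedTuple


class Predication(NamedTuple):
    pmid: str       # MEDLINE citation the predication was extracted from
    subject: str
    predicate: str  # e.g. TREATS, INTERACTS_WITH
    obj: str


# Hypothetical point-of-view filters; the real summarization schemas are more elaborate.
SUMMARY_PREDICATES = {
    "Treatment of Disease": {"TREATS", "PREVENTS"},
    "Substance Interactions": {"INTERACTS_WITH", "INHIBITS", "STIMULATES"},
}


def summarize(predications, summary_type):
    """Group predications relevant to summary_type into graph edges.

    Each edge (subject, predicate, object) maps to the sorted PMIDs that support it,
    so selecting an edge in a viewer can list its source citations.
    """
    keep = SUMMARY_PREDICATES[summary_type]
    edges = defaultdict(set)
    for p in predications:
        if p.predicate in keep:
            edges[(p.subject, p.predicate, p.obj)].add(p.pmid)
    return {edge: sorted(pmids) for edge, pmids in edges.items()}


if __name__ == "__main__":
    # Illustrative predications, not real extraction output.
    sample = [
        Predication("1", "metformin", "TREATS", "Diabetes Mellitus, Type 2"),
        Predication("2", "metformin", "TREATS", "Diabetes Mellitus, Type 2"),
        Predication("3", "aspirin", "INTERACTS_WITH", "warfarin"),
    ]
    for (s, pred, o), pmids in summarize(sample, "Treatment of Disease").items():
        print(f"{s} -[{pred}]-> {o}: citations {pmids}")
```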

  13. Semantic MedLine Prototype: Search http://skr3.nlm.nih.gov/SemMedDemo/InitializeSearch.do

  14. Semantic MedLine Prototype: Summarize http://skr3.nlm.nih.gov/SemMedDemo/Summary.do

  15. Semantic MedLine http://skr3.nlm.nih.gov/SemMed/

  16. Semantic MedLine Prototype: Knowledgebase http://semanticommunity.info/A_NITRD_Dashboard/Semantic_Medline

  17. Semantic MedLine: Predication Database Note: Large tar and gzip files! ftp://lhcftp.nlm.nih.gov/outgoing/cgsb/
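Given the size of these archives, one way to get at their contents without unpacking everything to disk first is to stream them. The sketch below assumes a gzip-compressed tar file and uses Python's standard `tarfile` module in streaming mode (`"r|gz"`); the `iter_member_lines` helper, the suffix filter and the line-by-line reading are assumptions about the files' contents, which the slide does not describe.

```python
import tarfile


def iter_member_lines(archive_path, suffix=""):
    """Yield (member_name, line) pairs from a .tar.gz without unpacking it to disk.

    Mode "r|gz" reads the archive as a forward-only stream: members are visited
    in order and each file is read incrementally, so memory use stays small.
    """
    with tarfile.open(archive_path, mode="r|gz") as tar:
        for member in tar:
            # Skip directories, links and files that do not match the (assumed) suffix.
            if not member.isfile() or not member.name.endswith(suffix):
                continue
            handle = tar.extractfile(member)
            if handle is None:
                continue
            for raw in handle:
                yield member.name, raw.decode("utf-8", errors="replace").rstrip("\r\n")


if __name__ == "__main__":
    import sys

    # Usage: python stream_tar.py ARCHIVE.tar.gz [SUFFIX]
    counts = {}
    for name, _line in iter_member_lines(sys.argv[1], *sys.argv[2:3]):
        counts[name] = counts.get(name, 0) + 1
    for name, n in counts.items():
        print(f"{name}: {n} lines")
```

Streaming trades random access for low memory use: members can only be read in archive order, which suits a one-pass load into a search index.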

  18. Semantic MedLine: Data Extraction http://semanticommunity.info/A_NITRD_Dashboard/Semantic_Medline/Data_Extraction

  19. Semantic MedLine: Analytics I have questions based on these analytics. Web Player

  20. Semantic MedLine: Analytics Web Player

  21. Semantic MedLine: Analytics Web Player

  22. Some Next Steps • We will need to get the raw text data to accomplish the objectives of the work with the Research.gov Abstracts, Project Outcomes, etc. • We need to extract the large Semantic MedLine Predication Database files for semantic search with SIRA and Digital Reasoning.

  23. AOL Government Stories • Semantic Medline (Pending) • HPN Health Prize for Health Datapalooza (Pending) • From Catalyst to Semantic Synthesis - How the IC Finds More Needles in Bigger Haystacks (Pending) • Challenges and Opportunities in Big Data: Defense Department Bets Big On Big Data • Semantics and Ontologies for the Intelligence Community Working Toward Standards (Pending) • Data.gov Developers Community Space Launched - Is Dr. Merkin In the House? (Pending) • Building Trust Between Cloud Computing Providers and Suppliers • Health Datapalooza Would Benefit From Real Innovation Investment • Has NIEM Reached A Choke Point With Big Data? • Put Federal IT Dashboard Into Motion • Why The Intelligence Community Loves Big Data • Big Data Science Visualizations Past Present and Future http://semanticommunity.info/#AOL_Government_Stories

  24. Challenges and Opportunities in Big Data http://gov.aol.com/2012/03/30/defense-department-bets-big-on-big-data/

  25. My Suggestions • I think it leaves us with a disconnected federal big data program between the science and intelligence communities, with the former considerably behind the latter. • As Professor Jim Hendler, RPI computer scientist, commented during the meeting: "Computer scientists like us have to move to the social science side of things to really do big data." • This new White House Initiative needs Todd Park's entrepreneurial spirit, Gus Hunt's experience, and DoD's new money, spent in a coordinated way with the IC and civilian agencies, to make big data across the federal government a reality.

  26. Core Techniques and Technologies for Advancing Big Data Science & Engineering (BIGDATA) http://www.nsf.gov/publications/pub_summ.jsp?WT.z_pims_id=504767&ods_key=nsf12499
