1 / 19

Dan Roth Department of Computer Science University of Illinois at Urbana/Champaign

M ultimodal I nformation A ccess and S ynthesis A DHS Institute of Discrete Science Center @ UIUC. Dan Roth Department of Computer Science University of Illinois at Urbana/Champaign. MIAS Mission. Most of the data today is unstructured

nili
Download Presentation

Dan Roth Department of Computer Science University of Illinois at Urbana/Champaign

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Multimodal Information Access and Synthesis A DHS Institute of Discrete Science Center @ UIUC Dan Roth Department of Computer Science University of Illinois at Urbana/Champaign

  2. MIAS Mission • Most of the data today is unstructured • books, newspaper articles, journal publications, field reports, images, and audio and video streams. • How to deal with the huge amount of unstructured data as if it was organized in a database with a known schema. • how to locate, access, organize, analyze and synthesize unstructured data. MIAS Mission: • develop the theories, algorithms, and tools for analysts to • access a variety of data formats and models • integrate them with existing resources • transform raw data into useful and understandable information.

  3. Task Perspective • In the next decade, intelligence analysts will need to • monitor a huge number of interesting events and entities • formulate and evaluate hypotheses with respect to them. • Analysts must interact, at the appropriate level of semantic abstraction, with a system that can • synthesize, summarize and interpret vast amounts of multimodal information, • integrate observed data with domain models and background information in multiple formats, • propose hypotheses, and help verify them.

  4. Scenario • Consider an intelligence analyst researching a problem • Iranian nuclear program – generate a list of Iranian nuclear scientists, affiliations, specialties, biographies, photos, and notable recent activities. • Medical treatment – what is known about it; who are the experts; what do users say about it; what side effects have been reported • Current technologies have solved the problem of collecting and storing huge amounts of information; it would be reasonable to assume that the information she is after does exist; • However, multiple barriers exist on the way to a successful completion of the analysis, each posing a significant research challenge.

  5. Multimodal Information Access & Synthesis Saeed Zakeri Saeed Zakeri Online Data Sources Focused Multimodal Data Retrieval Web pages attended * * News Articles Field reports Specific Web sites Text Repositories Relational databases Surveillance Videos ... Discover unusual events, entities, and associations. Continuous monitoring of events, entities & associations Rapid retrieval of all info. about a particular entity Efficient search, querying, question answering, browsing, mining, * * * * * visited Elkhan Factory Text documents U. of Tehran * * Elkhan Factory * * * * * * Relational data loc = Northern Iran name = Elkhan Factory topics = fertilizer, enrichment Semantic categories Temporal categories Subjectivity/Opinions Images Infer Metadata: Semantic entities Discover Relations Between Semantic Entities Support Information Analysis, Knowledge Discovery, Monitoring Semantic Disambiguation & Integration across multiple sources and modalities

  6. MIAS Processes • Focused data retrieval and integration • Identify and collect relevant data from multiple sources • Semantic data enrichment • Infer semantics from unstructured data and images; • Allow navigation and search across disparate data modalities; augment KBs • Entity identification and relations discovery • Identify real-world entities and relations among them • Relate them to existing institutional resources for information integration • Knowledge discovery and hypotheses generation and verification • Construct the rich semantic structure and hidden networks of entity linkages • Foundations • Machine learning, database and data mining, natural language processing, inference and optimization and computer vision techniques • Required for and driven by the aforementioned problems.

  7. Integrated Mission: Research & Education • Develop diverse human resources to enhance the scientific research, educational, and governmental workforce in MIAS • Educational and Outreach Initiatives: • Target students from small research programs & minority-serving • Expose them to the national labs • Open opportunities for bigger impact • A comprehensive education program designed to increase participation in the study and practice of MIAS topics: • Provide substantive training for a new generation of experts in the field, • Serve as a tool for recruiting an experienced group of undergraduates into graduate study in one of the broad fields of information science • Be an intellectual community center, where participants at all levels of expertise come together in an enriched environment of collaboration.

  8. Data Science Summer Institute at UIUC 1st DSSI: May 2007 Huge Success • Intensive Course • in • The Math of Data Sciences • Probability and Statistics • Linear Algebra • Data Structures and Algorithms • Optimization • Learning & Clustering • Research Projects • Led by co-PIs and Grad Students • Topics: • Virtual Web: Focused Crawling • Relations and Entities • Text and Images 8 weeks course 25 students Faculty from UIUC, Kansas State, UTSA, UTEP Advanced MIAS Related Tutorials Speaker Series

  9. Machine Learning & Data Mining Databases & Information Integration Computer Vision Natural Language Processing & Information Extraction Information Retrieval & Web Information Access Data Science Summer Institute at UIUC Advanced MIAS Related Tutorials

  10. Data Science Summer Institute at UIUC

  11. Our Team • Leading researchers in intelligent information access & analysis and its foundations • A large number of affiliates/consultants covering all areas of interest to the MIAS center.

  12. Kevin C. Chang: Data Integration and Retrieval Deep Web MetaQuerier: Large-scale integration over deep Web • pioneered the “holistic integration” paradigm • Widely published at SIGMOD, VLDB, ICDE • System building and demo at CIDR, SIGMOD, VLDB, … • PC/editor/organizers of SIGMOD, ICDE, WWW, SIGKDD special issue, WIRI06, IIWeb06 workshops • Awards: NSF CAREER, IBM Faculty, NCSA Faculty VLDB00 Best Paper Selection

  13. AnHai Doan: Data Integration • Data integration • Matching Schemas, Ontologies, Entities • Integrating databases and text • ACM Doctoral Dissertation Award 2004 • Sloan Fellowship 2006 • Edited special issues on Data Integration • Co-chaired workshops on data integration, Web technologies, machine learning • Co-writing a book titled “Data Integration” Monitoring People and Events

  14. David Forsyth: Computer Vision and Learning Linking Text and Images Labeling images via (a lot of) caption text • Leading Computer Vision researcher, • Over 110 papers on vision, graphics, learning applications • Program Chair, CVPR 2000, General chair CVPR 2006, regular member of PC in all major vision conferences • IEEE Technical Achievement Award, 2006 • Lead author of main textbook, widely adopted

  15. Jiawei Han: Data Mining • Mining patterns and knowledge discovery from massive data • Research focus: Mining data streams, frequent patterns, sequential patterns, graph patterns, and their applications • Privacy preserving Data Mining • Developed many popular data mining algorithms, e.g., FPgrowth, PrefixSpan, gSpan, StarCubing, CrossMine, RankingCube, and CrossClus • Over 300 research papers published in conferences and journals • Editor-in-Chief, ACM Transactions on Knowledge Discovery from Data • Textbook, “Data mining: Concepts and Techniques,” adopted worldwide

  16. Cinda Heeren: Data Mining, Education • MIAS Summer School Director • Mathematical Foundation of Data Science Discrete UIUC CS Department Director of Diversity Programs • Lecturer for Discrete Math, Data Structures courses at UIUC • One of the leading teachers and educational leaders at UIUC. • Research in Data Mining and Data bases • Speaker and regular presenter at conferences for young women, including: • GAMES and WYSE summer camps • Expanding Your Horizons careers conference • Grace Hopper Celebration of Women in Computing 2004 - panel on best practices for recruiting women into undergraduate programs in CS. • SIGCSE 2006 - workshop, “How to host your own Small Regional Celebration of Women in Computing.” Summer School

  17. ChengXiang Zhai: Information Retrieval and Text Mining • Probabilistic Paradigms for Information Retrieval • Personalized/Context Dependent Search • Relation Identification and Data Integration • Leading expert in information retrieval and search technologies • Recipient of the 2004 Presidential Early Career Award for Scientists and Engineers (PECASE), • Main architect and key contributor of the Lemur Toolkit (being used by many research groups and IR companies around the world) • ACM SIGIR’04 best paper award • Selected services include Program Chair of ACM CIKM 2004; HLT/NAACL 06; SIGIR’09

  18. Dan Roth: Machine Learning, NLP • Semantic Analysis and Data enrichment • Entity and Relation Identification and Integration • Textual Entailment • Machine Learning Methods for NLP and IE • Leading Researcher in Machine Learning, NLP, AI • Developed Popular Machine Learning system and machine learning based NLP tools used in industry and NLP classes. • Program Chair, ACL’03, CoNLL’02; Regular senior PC member in all major Machine Learning, NLP and AI conferences • Associate Editor: Journal of Artificial Intelligence Research; Machine Learning • Multiple papers awards

  19. Information Access & Synthesis Processes Tools Text Processing& Analysis Semantic Analysis & Information Extraction Information Integration Machine Learning & Data Mining Integrating Text & Images • Focused data retrieval and integration, • Identify and collect relevant data from multiple sources • Semantic data enrichment, • Infer semantics from unstructured data and images; • Allow navigation and search across disparate data modalities; augment KBs • Entity identification and relations discovery, • Identify real-world entities and relations among them • Relate them to existing institutional resources for information integration • Knowledge discovery and hypotheses generation and verification, • Construct the rich semantic structure and hidden networks of entity linkages • Foundations • Machine learning, database and data mining, natural language processing, inference and optimization and computer vision techniques • Called for and driven by the aforementioned problems.

More Related