
Mtech Projects 2002

Mtech Projects 2002. Sunita Sarawagi. Sequence mining. Several real-life mining applications on sequence data. Classical applications: speech, language and handwriting are all complex sequences. Newer applications: bio-informatics (DNA and proteins).


Presentation Transcript


  1. Mtech Projects 2002 Sunita Sarawagi

  2. Sequence mining • Several real-life mining applications on sequence data • Classical applications • Speech, language and handwriting are all complex sequences • Newer applications • Bio-informatics: DNA and proteins • Telecommunication: Network alarms, network packet data • Retail data mining: Customer behavior

  3. Sequence mining: problems • Existing work scattered and application specific • Field in dire need of consolidated algorithms and software solutions • More technical details can be discussed after we finish this topic in class on March 3

  4. Sensor databases and mining • Several distributed sensors that push data to centralized database servers • Example: Automatic Vehicle Location systems consisting of sensors at bus stops that make an entry in the server each time a bus passes a stop • Goal: Build a DBMS for managing this data and supporting queries like “When is the next bus to X going to arrive?”

  5. Problems • Cross-disciplinary, covering several areas • A mining sub-problem: predicting arrival time based on • Previous arrival patterns of the same bus • Traffic conditions derived from other buses with common routes • A database query problem: • Approximate search based on spoken queries
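The slide names the two signals an arrival predictor would use but not a model. A minimal sketch, assuming a simple weighted blend: the bus's historical mean travel time on a segment is mixed with the travel times just observed by other buses on the shared route. The function name `predict_arrival` and the weight `alpha` are hypothetical illustrations, not part of the project description.

```python
from statistics import mean

def predict_arrival(last_arrival, own_history, recent_other, alpha=0.5):
    """Estimate when a bus reaches its next stop (minutes since midnight).

    Hypothetical model for illustration only.
    last_arrival: when the bus passed its previous stop
    own_history: past travel times (minutes) of this bus on the same segment
    recent_other: travel times just observed by other buses sharing the segment
    alpha: hypothetical weight on the bus's own history vs. current traffic
    """
    own_estimate = mean(own_history)
    if recent_other:
        # blend the long-run pattern with what other buses see right now
        traffic_estimate = mean(recent_other)
        travel = alpha * own_estimate + (1 - alpha) * traffic_estimate
    else:
        # no other bus has covered the segment recently: use history alone
        travel = own_estimate
    return last_arrival + travel

# Bus passed the previous stop at 08:00 (480 min); it usually takes ~11 min,
# but buses on the shared route just took 15 and 17 min.
print(predict_arrival(480, [10, 12, 11], [15, 17]))  # 493.5, i.e. 08:13:30
```

A fixed `alpha` is the simplest choice; a real system would more likely learn how much weight current traffic deserves from past data.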

  6. Multi-relational data mining • Existing mining software assumes data in a single relation • Real-life data spans multiple relations • Existing tools rely on manual preprocessing before mining can begin; this is time-consuming and inaccurate • Design and implement mining algorithms for multi-relational data
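To make the "manual preprocessing" concrete: before a single-relation miner can run, a one-to-many relation is typically flattened into one row per entity by hand-picked aggregates. A minimal sketch, assuming hypothetical `customers` and `orders` tables keyed by `cust_id`; which aggregates to compute is exactly the manual, lossy choice the slide refers to.

```python
from collections import defaultdict

def flatten_orders(customers, orders):
    """Collapse a one-to-many customers->orders relation into one row per customer.

    Hypothetical schema for illustration: each customer and order is a dict
    with a "cust_id" key; orders also carry an "amount".
    """
    count = defaultdict(int)
    total = defaultdict(float)
    for o in orders:
        count[o["cust_id"]] += 1
        total[o["cust_id"]] += o["amount"]
    # Only the chosen aggregates survive; e.g. the order sequence is discarded.
    return [
        {**c, "num_orders": count[c["cust_id"]], "total_amount": total[c["cust_id"]]}
        for c in customers
    ]
```

Any pattern not captured by the chosen aggregates (here, anything beyond count and sum) is lost before mining starts, which is one reason multi-relational algorithms that work on the original tables are attractive.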

  7. Who should apply • Fascinated by the areas of data mining, databases, machine learning • Want to get a flavor of cutting-edge research • Enjoyed the courses • Have a knack for algorithm design and implementation • Are very software savvy • Want to stretch their learning/knowledge rather than slide through with an “easy” project

  8. Possible achievements • Understand one topic deeply, learn to innovate • Produce software that several people use • Publish papers at top-quality international conferences • Demo the software in leading international forums

  9. Industries in the area • IBM IRL • Strand Genomics • GE Capital • TCS bio-informatics • PSPL • Startups like Vistaar • Outside India: several

  10. Sample outcomes from some previous MTPs

  11. Automatic segmentation of free text records, 2000 Batch • An HMM-based address segmenter • Software licensed by a data cleaning company • Paper in one of the two premier database conferences • ACM SIG on Management of Data (SIGMOD) 2001, Santa Barbara, USA
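The slide does not describe the segmenter's model. As a rough illustration of the general idea, here is a minimal sketch of HMM-based segmentation with Viterbi decoding, assuming three hypothetical states (`HouseNo`, `Street`, `City`), hand-set probabilities, and a tiny hypothetical word list for token classes. A real segmenter would learn these parameters from labelled addresses; this is not the MTP's actual model.

```python
import math

# Hypothetical states and hand-set probabilities, for illustration only.
STATES = ["HouseNo", "Street", "City"]
START = {"HouseNo": 0.7, "Street": 0.2, "City": 0.1}
TRANS = {
    "HouseNo": {"HouseNo": 0.1, "Street": 0.8, "City": 0.1},
    "Street": {"HouseNo": 0.05, "Street": 0.6, "City": 0.35},
    "City": {"HouseNo": 0.05, "Street": 0.05, "City": 0.9},
}
EMIT = {
    "HouseNo": {"digit": 0.9, "street_word": 0.02, "city_word": 0.02, "other": 0.06},
    "Street": {"digit": 0.1, "street_word": 0.4, "city_word": 0.05, "other": 0.45},
    "City": {"digit": 0.05, "street_word": 0.05, "city_word": 0.6, "other": 0.3},
}
STREET_WORDS = {"road", "street", "lane", "marg"}  # hypothetical dictionary
CITY_WORDS = {"pune", "mumbai", "bangalore"}       # hypothetical gazetteer

def token_class(tok):
    """Map a raw token to the coarse symbol the HMM emits."""
    t = tok.lower()
    if t.isdigit():
        return "digit"
    if t in STREET_WORDS:
        return "street_word"
    if t in CITY_WORDS:
        return "city_word"
    return "other"

def viterbi_segment(text):
    """Label each token with its most likely state sequence (Viterbi)."""
    tokens = text.split()
    if not tokens:
        return []
    obs = [token_class(t) for t in tokens]
    # best[s]: log-probability of the best path ending in state s so far
    best = {s: math.log(START[s]) + math.log(EMIT[s][obs[0]]) for s in STATES}
    back = []  # back[i][s]: best predecessor of s at token i + 1
    for o in obs[1:]:
        prev_of, new_best = {}, {}
        for s in STATES:
            p = max(STATES, key=lambda r: best[r] + math.log(TRANS[r][s]))
            prev_of[s] = p
            new_best[s] = best[p] + math.log(TRANS[p][s]) + math.log(EMIT[s][o])
        back.append(prev_of)
        best = new_best
    # trace back from the most likely final state
    state = max(STATES, key=best.get)
    path = [state]
    for prev_of in reversed(back):
        state = prev_of[state]
        path.append(state)
    path.reverse()
    return list(zip(tokens, path))

print(viterbi_segment("12 MG Road Pune"))
# [('12', 'HouseNo'), ('MG', 'Street'), ('Road', 'Street'), ('Pune', 'City')]
```

Working in log space avoids underflow on long records; "MG" is labelled `Street` only because of the transition structure, which is what lets an HMM segment tokens that no dictionary covers.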

  12. ICUBE – Intelligent Rollups • MTP work integrated in ICube, demoed at SIGMOD 2000, held in Texas, USA • ICube software adopted by a startup • Paper at the other premier database conference, VLDB 2001, held in Rome, Italy

  13. Data deduplication using active learning • Software likely to be transferred to National Informatics Corporation, Pune • Practical application of an interesting idea from machine learning • Paper at the KDD 2002 conference, held in Canada • Demos at VLDB 2002, Hong Kong, and ICDE 2003, Bangalore
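The slide does not say which active-learning strategy the project used. One common strategy is uncertainty sampling: ask the user to label only the record pairs the current model is least sure about. A minimal sketch, assuming token-set Jaccard similarity as the only feature and a single-threshold classifier; `most_uncertain_pairs` and `refit_threshold` are hypothetical names, and this is not presented as the project's method.

```python
from itertools import combinations

def jaccard(a, b):
    """Token-set Jaccard similarity between two strings."""
    x, y = set(a.lower().split()), set(b.lower().split())
    return len(x & y) / len(x | y) if x | y else 1.0

def most_uncertain_pairs(records, threshold, k):
    """Return the k record pairs whose similarity lies closest to the current
    duplicate/non-duplicate threshold (uncertainty sampling)."""
    pairs = [(jaccard(a, b), a, b) for a, b in combinations(records, 2)]
    pairs.sort(key=lambda p: abs(p[0] - threshold))
    return [(a, b) for _, a, b in pairs[:k]]

def refit_threshold(labeled):
    """Place the threshold midway between the most similar labeled
    non-duplicate and the least similar labeled duplicate.

    labeled: list of (similarity, is_duplicate) pairs from the user.
    A deliberately simple hypothetical learner.
    """
    dup = [s for s, is_dup in labeled if is_dup]
    non = [s for s, is_dup in labeled if not is_dup]
    if not dup or not non:
        return 0.5  # not enough evidence yet: keep a neutral default
    return (max(non) + min(dup)) / 2

records = ["Sunita Sarawagi", "S Sarawagi", "Sunita Sarawagi IIT", "Ramesh Kumar"]
print(most_uncertain_pairs(records, threshold=0.4, k=1))
# [('Sunita Sarawagi', 'S Sarawagi')]
```

In use, the loop alternates: show the user the most uncertain pairs, record their labels, refit the threshold, and repeat. Clear-cut pairs (identical or disjoint names) are never shown, which is where the labelling effort is saved.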
