Database Research Group - Modernization of Systems Information Retrieval Data Mining Data Management

The Database Research Group at Tehran University focuses on modernizing systems for information retrieval, data mining, and data management. They also offer related courses and have created a Persian Corpus for natural language processing research.

Presentation Transcript


  1. Hot News • Reporter: Hossein Kamyar Asefpoormasoomi • Supervisor: Dr. Mohsen Kahani

  2. Tehran University • Database Research Group • Natural Language and Text Processing Group

  3. Database Research Group (http://ece.ut.ac.ir/dbrg) • Members: faculty staff: 8, students: 9, alumni: 17 • Dr. Caro Lucas • Dr. Behzad Moshiri • Dr. Rohani Rankouhi

  4. Database Research Group Research Projects: • Modernization of Systems • Information Retrieval • Data Mining • Data Management

  7. Database Research Group: Industrial Projects

  8. Database Research Group Related Courses: 1. Introduction to Database Systems 2. Advanced Database Systems 3. Special Topics in Database Systems 4. Database Laboratory 5. Data Mining 6. Information Retrieval 7. Natural Language Processing

  9. Database Research Group Persian Corpus: Hamshahri Corpus • The official version 1 of the Hamshahri collection is maintained and distributed by the CLEF organizers. It was used in CLEF 2008 and CLEF 2009 and has 100 queries. • Version 2 of the Hamshahri collection was prepared in 1388 (Solar Hijri calendar, i.e. 2009/2010) with the UTIRE system in the Database Research Group of the University of Tehran, following the TREC standard.

  10. Database Research Group Persian Corpus: Bijankhan Corpus • The Bijankhan corpus is a tagged corpus suitable for natural language processing research on the Persian (Farsi) language. It was gathered from daily news and common texts. All documents in the collection are categorized into subjects such as political, cultural, and so on; in total there are 4300 different subjects. The Bijankhan collection contains about 2.6 million manually tagged words, with a tag set of 40 Persian POS tags.
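The slide describes Bijankhan as words each paired with one of 40 POS tags. A minimal Python sketch of how such word/tag data might be loaded and summarized, assuming a hypothetical plain-text layout of one `word<TAB>tag` pair per line; the corpus's real distribution format is not described here, and the tag labels in the sample are placeholders, not Bijankhan's actual tag names:

```python
from collections import Counter

def parse_tagged_lines(lines):
    """Parse lines of the assumed form 'word<TAB>TAG' into (word, tag) pairs.

    Blank lines and lines without a tab are skipped.
    """
    pairs = []
    for line in lines:
        line = line.strip()
        if "\t" not in line:
            continue
        word, tag = line.rsplit("\t", 1)
        pairs.append((word, tag))
    return pairs

def tag_distribution(pairs):
    """Return the relative frequency of each POS tag."""
    counts = Counter(tag for _, tag in pairs)
    total = sum(counts.values())
    return {tag: n / total for tag, n in counts.items()}

# Hypothetical sample; "N", "ADJ", "V" are placeholder tags.
sample = ["کتاب\tN", "خوب\tADJ", "است\tV", "کتاب‌ها\tN", ""]
pairs = parse_tagged_lines(sample)
print(tag_distribution(pairs))
```

On a real tagged corpus, the same distribution is a quick sanity check that all 40 tags actually occur.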

  11. Database Research Group Persian Corpus: dotIR Web Benchmark Collection • This collection was created by crawling the web in the .ir domain and contains one million documents. Then, using UTIRE, software developed in-house, 25 users built 50 queries. These queries were run against the collection, and the retrieved pages, 18424 documents in total (on average 369 documents per query), were judged by the same 25 users, thereby identifying the documents relevant to each query. • In addition, in a parallel effort to study and compare ranking algorithms, 56 features were extracted from the documents retrieved for each query, following the LETOR standard (introduced by Microsoft Research Asia). Researchers can use the feature-value vectors and relevance judgments to compare their proposed ranking algorithms, or to train and tune algorithms. • This project was supported by the Iran Telecommunication Research Center and the Database Laboratory of the University of Tehran.
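LETOR-style feature vectors plus relevance judgments are what a ranking function gets scored against. The sketch below assumes the common LETOR text layout `<relevance> qid:<id> <feature>:<value> ... # comment`. The two-feature sample is invented for illustration; the dotIR set has 56 features. Ranking by one feature and scoring with NDCG@k (using the (2^rel - 1) / log2(rank + 1) gain, one of several NDCG variants) is only a simple baseline, not the group's evaluation protocol:

```python
import math
from collections import defaultdict

def parse_letor_line(line):
    """Parse '<rel> qid:<id> 1:<v> 2:<v> ... # comment' into (rel, qid, features)."""
    body = line.split("#", 1)[0].split()
    rel = int(body[0])
    qid = body[1].split(":", 1)[1]
    features = {}
    for item in body[2:]:
        idx, val = item.split(":", 1)
        features[int(idx)] = float(val)
    return rel, qid, features

def dcg(rels, k):
    """Discounted cumulative gain over the first k relevance labels."""
    return sum((2 ** r - 1) / math.log2(i + 2) for i, r in enumerate(rels[:k]))

def ndcg_at_k(rels_in_ranked_order, k):
    """DCG of the given ranking divided by DCG of the ideal ranking."""
    ideal = dcg(sorted(rels_in_ranked_order, reverse=True), k)
    return dcg(rels_in_ranked_order, k) / ideal if ideal > 0 else 0.0

def mean_ndcg_by_feature(lines, feature_id, k=10):
    """Rank each query's documents by one feature (descending); average NDCG@k."""
    by_query = defaultdict(list)
    for line in lines:
        rel, qid, feats = parse_letor_line(line)
        by_query[qid].append((feats.get(feature_id, 0.0), rel))
    scores = []
    for docs in by_query.values():
        ranked = [rel for _, rel in sorted(docs, key=lambda d: d[0], reverse=True)]
        scores.append(ndcg_at_k(ranked, k))
    return sum(scores) / len(scores)

# Hypothetical three-document query with two features.
sample = [
    "2 qid:1 1:0.9 2:0.1 # doc a",
    "0 qid:1 1:0.2 2:0.8 # doc b",
    "1 qid:1 1:0.5 2:0.4 # doc c",
]
print(mean_ndcg_by_feature(sample, 1, k=3))  # feature 1 orders docs ideally
print(mean_ndcg_by_feature(sample, 2, k=3))  # feature 2 orders them worst-first
```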

  12. Natural Language and Text Processing Group • Members: 10 • Heshaam Faili (Assistant Professor; Ph.D. in Artificial Intelligence, Sharif University of Technology)

  13. Natural Language and Text Processing Group Research Projects: more than 23 papers?

  14. Natural Language and Text Processing Group Industrial Projects • Detection and correction of typing, grammatical, and semantic errors • Can be installed on the widely used Word editor • Learns and improves its performance automatically • Accurate and efficient • Free
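The slide does not say how the tool detects typing errors. A common baseline for non-word typos is to propose dictionary words within a small edit distance. The sketch below uses a hypothetical three-word lexicon. It illustrates only that technique and does not cover the grammatical or semantic error detection the slide also lists:

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]

def suggest(word, lexicon, max_distance=2):
    """Return lexicon words within max_distance edits, closest first."""
    if word in lexicon:
        return [word]
    candidates = sorted((levenshtein(word, w), w) for w in lexicon)
    return [w for d, w in candidates if d <= max_distance]

# Hypothetical lexicon: "book", "library", "school".
lexicon = {"کتاب", "کتابخانه", "مدرسه"}
print(suggest("کتب", lexicon))  # one missing letter away from "کتاب"
```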

  15. Natural Language and Text Processing Group Persian Corpora 1. TEP: Tehran English-Persian Parallel Corpus • First free English-Persian corpus • 4 million tokens on each side • Sentence-aligned 2. TMC: Tehran Monolingual Corpus • Largest freely available monolingual corpus for the Persian language • Tokenized • Suitable for language modeling 3. Mutual Information • http://ece.ut.ac.ir/nlp/resources.html
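The slide lists a "Mutual Information" resource without saying how it was computed. As an assumption, here is a minimal sketch of pointwise mutual information for adjacent word pairs in already tokenized text, PMI(x, y) = log2(p(x, y) / (p(x) p(y))), using plain maximum-likelihood estimates:

```python
import math
from collections import Counter

def bigram_pmi(tokens):
    """PMI of each adjacent word pair, estimated by maximum likelihood.

    p(x) = count(x) / N over unigrams; p(x, y) = count(x y) / (N - 1) over bigrams.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = len(tokens) - 1
    pmi = {}
    for (x, y), c in bigrams.items():
        p_xy = c / n_bi
        p_x = unigrams[x] / n_uni
        p_y = unigrams[y] / n_uni
        pmi[(x, y)] = math.log2(p_xy / (p_x * p_y))
    return pmi

tokens = "a b a b c".split()
for pair, score in sorted(bigram_pmi(tokens).items()):
    print(pair, round(score, 3))
```

Pairs that co-occur more often than chance get positive scores. On a large tokenized corpus such as TMC one would normally also drop rare pairs, since PMI overrates them.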

  16. Natural Language and Text Processing Group Related Courses: • Introduction to Natural Language Processing, Dr. Heshaam Faili • Advanced Database Systems

  17. Shahid Beheshti University • The Natural Language Processing Research Laboratory was founded by Dr. Mehrnoush Shamsfard at the beginning of 2006 in the Computer Engineering Department of Shahid Beheshti University. • More than 25 members. • More than 92 papers. • http://nlp.sbu.ac.ir/

  18. Research Project A. Developing Linguistic Resources • Developing a semantically annotated corpus • Developing a chunked corpus • Developing a parallel corpus • Developing a Persian verbs database • Semi-automatic lexicon acquisition • Start: 2006 • Researchers: Maliheh Monshizadeh, Elham Fekri

  19. Research Project B. Fundamental Persian Text Processing Tools • Standard Text Preparation for Persian • Stemmer / morphological analyzer / lemmatizer • Tokenizer • POS tagger • Spell checker • Chunker • Syntax parser • Persian Named Entity Recognition (SBUNER) • Persian anaphora resolution • Semantic role labelling • Start: 2006 • Researchers: Samira Noferesti, Rana Forsati, Pooneh Mortazavi, Hoda Sadat Jafari
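Several of the tools listed above (text preparation, tokenizer, stemmer) begin by normalizing Persian script. The minimal sketch below applies only two common normalizations: Arabic Yeh U+064A becomes Farsi Yeh U+06CC, and Arabic Kaf U+0643 becomes Keheh U+06A9. It also uses a hypothetical rule that strips a few suffixes only when they are attached with a zero-width non-joiner (ZWNJ). It is not STeP-1 or any other tool from the slide:

```python
import re

# Arabic letters often found in Persian text, mapped to their Persian forms.
CHAR_MAP = str.maketrans({"\u064a": "\u06cc", "\u0643": "\u06a9"})
ZWNJ = "\u200c"  # zero-width non-joiner, used inside words such as کتاب‌ها

# Hypothetical, deliberately tiny suffix list: ها, های, تر, ترین
SUFFIXES = {"\u0647\u0627", "\u0647\u0627\u06cc", "\u062a\u0631", "\u062a\u0631\u06cc\u0646"}

def normalize(text):
    """Replace Arabic Yeh/Kaf with their Persian counterparts."""
    return text.translate(CHAR_MAP)

def tokenize(text):
    """Split on anything that is not a word character, keeping ZWNJ inside tokens."""
    return re.findall(r"[\w\u200c]+", normalize(text))

def stem(token):
    """Remove a listed suffix only if it is attached with a ZWNJ."""
    head, sep, tail = token.rpartition(ZWNJ)
    if sep and head and tail in SUFFIXES:
        return head
    return token

# "كتاب‌ها را." written with an Arabic Kaf
tokens = tokenize("\u0643\u062a\u0627\u0628\u200c\u0647\u0627 \u0631\u0627.")
print([stem(t) for t in tokens])
```

Requiring the ZWNJ keeps the rule from cutting words that merely end in the same letters. A real stemmer would need morphological rules for suffixes written without it.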

  20. Research Project C. NLP Applications • Machine translation – PenTrans project • English-to-Persian translation system • Persian-to-English translation system • Machine translation evaluation toolkit • Persian text summarization – PARSUMIST • Question answering • Persian – • English – SBUQA • Information extraction – Mersad • Text understanding • Conversion between Persian sentences and first-order logic • Text generation • Start: 2006 • Researchers: Chakaveh Saedi, Yasaman Motazedi, Mostafa Nazari
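The slide does not name a metric for the machine translation evaluation toolkit. BLEU is a widely used one. The sketch below computes sentence-level BLEU against a single reference, with clipped n-gram precision, uniform weights, a brevity penalty, and no smoothing. It illustrates the metric and is not the toolkit's implementation:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Multiset of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def sentence_bleu(candidate, reference, max_n=4):
    """Sentence-level BLEU with one reference, uniform weights, no smoothing."""
    if not candidate:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand = ngrams(candidate, n)
        ref = ngrams(reference, n)
        # Clip each candidate n-gram count by its count in the reference.
        overlap = sum(min(c, ref[g]) for g, c in cand.items())
        total = sum(cand.values())
        if overlap == 0 or total == 0:
            return 0.0
        log_precisions.append(math.log(overlap / total))
    c, r = len(candidate), len(reference)
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(sum(log_precisions) / max_n)

cand = "the cat sat on the mat".split()
ref = "the cat is on the mat".split()
print(sentence_bleu(cand, ref))           # no shared 4-gram, so 0.0
print(sentence_bleu(cand, ref, max_n=2))  # sqrt(5/6 * 3/5)
```

Without smoothing, a single missing n-gram order drives the score to zero, which is why corpus-level BLEU is usually preferred for system comparison.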

  21. Research Project D. Ontology Engineering • Ontology development • Development of the CMMI-ACQ ontology • Collaborative development of an ontology of computer science and engineering (COMON) • Fuzzy ontologies • Ontology learning • Ontology learning from text • Ontology learning from the web • Relation extraction • Ontology mapping • Evolutionary ontology matching • A Linguistic-Structural Approach to Bilingual Ontology Mapping • Ontology population and instantiation • Start: 2006 • Researchers: Aynaz Taheri, Hakimeh Fadaei, Tara Akhavan, Rahim Dehkharghani, Valeh Montaghami, Bahareh Sarrafzadeh, Amir Sharifloo, Rana Forsati

  22. Research Project E. Semantic Web • Semantic annotation of documents • Converting web documents into semantic web resources • Semantic search • Semantic web service discovery and composition • Start: 2006 • Researchers: Bahareh Sarrafzadeh, Hoda Mirzaie, Maryam Haghollahi, Homan Farrokhzad

  23. Research Project F. Hybrids • Application of fuzzy ontologies in qualitative reasoning • E-learning • Ontology-based Content Rearrangement for Intelligent Tutoring Systems (OCRITS) • Intelligent content management • Start: 2006 • Researchers: Hamzeh Motahari, Marzieh Shariati

  24. Courseware • Ontology Engineering • Natural Language Processing • Semantic Web • Advanced Natural Language Processing, Fall 2005, by Regina Barzilay and Michael Collins (MIT; Columbia University)

  25. Tools • FarsNet: the first Persian WordNet • STeP-1: Standard Text Preparation for Persian (tokenizer, stemmer, POS tagger, spell checker)

  26. Sharif University • Natural Language Processing • Web Intelligence Laboratory

  27. Natural Language Processing • Dr. Ghasem Sani • Dr. Heshaam Faili • Active since 2003, after three years of inactivity • Eliza • POS Tagger • Unsupervised Natural Grammar Induction
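"Eliza" here presumably refers to an ELIZA-style conversational program, but the slide gives no details. Below is a minimal sketch of the classic ELIZA technique: keyword patterns, response templates, and first/second-person reflection. The rules are hypothetical and in English:

```python
import re

# Hypothetical rules: (pattern, response template). First match wins.
RULES = [
    (re.compile(r"\bI need (.+)", re.IGNORECASE), "Why do you need {0}?"),
    (re.compile(r"\bI am (.+)", re.IGNORECASE), "How long have you been {0}?"),
    (re.compile(r"\bbecause\b", re.IGNORECASE), "Is that the real reason?"),
]
DEFAULT = "Please tell me more."

# Swap first- and second-person words when echoing the user's words back.
REFLECTIONS = {"i": "you", "me": "you", "my": "your", "am": "are",
               "you": "I", "your": "my"}

def reflect(fragment):
    return " ".join(REFLECTIONS.get(w.lower(), w) for w in fragment.split())

def respond(utterance):
    text = utterance.strip().rstrip(".!?")
    for pattern, template in RULES:
        match = pattern.search(text)
        if match:
            return template.format(*(reflect(g) for g in match.groups()))
    return DEFAULT

print(respond("I need my coffee."))
print(respond("Hello"))
```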

  28. Web Intelligence Laboratory • Supervisor: Dr. Abolhasani • 28 members

  29. Web Intelligence Laboratory Advanced Research: • Semantic Search Engines • Semantic Web Services • Semantic Web for Pervasive Computing • Annotation • Semantic Grids • Social Networks Analysis • Ontology Alignment and Learning • Web Clustering • Business Intelligence

  30. Web Intelligence Laboratory New Research: • Composite Web Service Execution Framework. • Tracking news to find hot topics. • Semantic programming. • Trust model in the Semantic Web. • New models for recommender systems. • Using the web to create a lecture on a subject. • A Farsi framework for information retrieval. • A semantics-based framework for business intelligence applications.
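For "Tracking news to find hot topics", one simple heuristic is to flag terms whose relative frequency rises sharply between a previous and a current time window. This is an assumption, not the lab's method. Below is a minimal sketch with additive smoothing, so that terms unseen in the earlier window do not cause a division by zero:

```python
from collections import Counter

def term_counts(docs):
    """Lower-cased whitespace-token counts over a list of documents."""
    return Counter(token for doc in docs for token in doc.lower().split())

def hot_terms(previous_docs, current_docs, top_k=5, min_count=2, smoothing=1.0):
    """Rank terms by (smoothed) relative-frequency ratio, current vs previous window."""
    before = term_counts(previous_docs)
    now = term_counts(current_docs)
    n_before = sum(before.values()) or 1
    n_now = sum(now.values()) or 1
    scores = {}
    for term, c in now.items():
        if c < min_count:  # ignore terms too rare to call a trend
            continue
        rate_now = (c + smoothing) / n_now
        rate_before = (before[term] + smoothing) / n_before
        scores[term] = rate_now / rate_before
    # Highest ratio first; ties broken alphabetically for stable output.
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_k]

previous = ["election results today", "weather is sunny"]
current = ["earthquake hits city", "earthquake damage city", "weather is rainy"]
print(hot_terms(previous, current))
```

A production system would also remove stop words and compare many windows rather than two.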

  31. Science & Technology University • Unknown laboratory, but an online POS tagger, developed in cooperation with the Aruz (prosody) project and supported by the Supreme Council of Informatics • http://persianp.ir/index.php?option=com_wrapper&view=wrapper&Itemid=7 • http://www.prosody.ir

  32. Conferences • The Cross-Language Evaluation Forum (CLEF) • (i) developing an infrastructure for the testing, tuning and evaluation of information retrieval systems operating on European languages in both monolingual and cross-language contexts • (ii) creating test suites of reusable data that system developers can use for benchmarking. • CLEF conferences have been held since 2000; CLEF 2011 will be held by the University of Amsterdam. • Computational Approaches to Arabic Script-based Languages (CAASL): CAASL 2011 will be held in Geneva.
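Test suites of the kind CLEF creates pair topics (queries) with relevance judgments, and a system's ranked output is scored against them. Mean average precision (MAP) is one commonly reported measure in such campaigns. Below is a minimal sketch that assumes judgments and runs are already loaded into dictionaries; the query and document ids are hypothetical:

```python
def average_precision(ranked_doc_ids, relevant):
    """AP for one query: mean of precision@k at each rank k holding a relevant doc."""
    hits, total = 0, 0.0
    for k, doc_id in enumerate(ranked_doc_ids, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / k
    return total / len(relevant) if relevant else 0.0

def mean_average_precision(run, qrels):
    """run: {query_id: ranked doc ids}; qrels: {query_id: set of relevant doc ids}."""
    aps = [average_precision(run.get(q, []), rel) for q, rel in qrels.items()]
    return sum(aps) / len(aps) if aps else 0.0

run = {"q1": ["d1", "d2", "d3"], "q2": ["d5", "d4"]}
qrels = {"q1": {"d1", "d3"}, "q2": {"d4"}}
print(mean_average_precision(run, qrels))
```

Averaging over the judged queries (not the run's queries) means a system that returns nothing for a topic is penalized rather than skipped.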

  33. Corporation: عصر گویش پرداز (Asr Gooyesh Pardaz) • Extraction of n-gram statistics for Persian • Extraction of Persian grammar • Compilation of a Persian lexicon • Extraction of frequently used Persian words, by subject. Projects under research: • Probabilistic models of one-, two-, three-, and four-word sequences for Persian and English • GPSG grammar rules for Persian • Probabilistic grammar • Parsers suited to the language model • Word clustering methods
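The probabilistic word-sequence models listed above are n-gram language models. Below is a minimal sketch of the bigram case, with maximum-likelihood estimates and the sentence-boundary markers `<s>` and `</s>`. The slide mentions models of up to four words, and higher orders follow the same counting pattern. A real model would also need smoothing, because here any unseen bigram gets probability zero:

```python
from collections import Counter

def train_bigram_lm(sentences):
    """Maximum-likelihood bigram model: P(w | prev) = count(prev, w) / count(prev)."""
    context_counts = Counter()
    bigram_counts = Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for prev, word in zip(tokens, tokens[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, word)] += 1

    def prob(word, prev):
        if context_counts[prev] == 0:
            return 0.0
        return bigram_counts[(prev, word)] / context_counts[prev]

    return prob

def sentence_probability(prob, sentence):
    """Product of bigram probabilities over the padded sentence."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    p = 1.0
    for prev, word in zip(tokens, tokens[1:]):
        p *= prob(word, prev)
    return p

# Hypothetical two-sentence corpus: "I have a book", "I have a pen".
corpus = ["من کتاب دارم", "من قلم دارم"]
prob = train_bigram_lm(corpus)
print(sentence_probability(prob, corpus[0]))
```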

  34. We do ...
