
Knowledge Management

  • Speaker: Prof. Sudeshna Sarkar
  • Computer Science & Engineering Department, Indian Institute of Technology Kharagpur, Kharagpur
  • sudeshna@cse.iitkgp.ernet.in

Indo-German Workshop on Language Technologies

AU-KBC Research Centre, Chennai

Research Activities

Department of Computer Science & Engineering

College of Engineering, Guindy

Chennai – 600025

Participant: Dr. T. V. Geetha

Other members: Dr. Ranjani Parthasarathi

Ms. D. Manjula

Mr. S. Swamynathan

Knowledge Management, Semantic Web Retrieval - Possible Areas of cooperation

  • Semantic-Based Approaches to Information Retrieval and Extraction
    • Cognitive Approaches to Semantic Search Engines with user profiles and user perspective
    • Multilingual Semantic Search Engines – use of an intermediate representation like UNL
    • Goal-based Information Extraction from semi-structured Documents – use of ontology
    • Information Extraction and its Visualization – development of time-line visualization of documents

Contacted: Dr. Steffen Staab, University of Karlsruhe, Institute of Applied Informatics and Formal Description Methods

Core Competencies: Knowledge Management

Knowledge Management, Web Services - Work done in the area

  • Design and implementation of Reactive Web Services using Active Databases
    • Design and implementation of a rule engine
    • Design and implementation of complex rules to tackle client- and server-side semantics of the rule engine
    • Development of intelligent web services for e-commerce
    • Extension to tackle multiple and cooperative web service environments
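The reactive web services described above follow the event–condition–action (ECA) pattern of active databases. As a rough illustration only (the rule, event names and payload are hypothetical, not taken from the actual system), a minimal ECA rule engine might look like this:

```python
# Minimal event-condition-action (ECA) rule engine sketch.
# All rule names, events and payloads here are hypothetical illustrations.

class Rule:
    def __init__(self, event, condition, action):
        self.event = event          # event type the rule listens for
        self.condition = condition  # predicate over the event payload
        self.action = action        # callable run when the condition holds

class RuleEngine:
    def __init__(self):
        self.rules = []

    def register(self, rule):
        self.rules.append(rule)

    def notify(self, event, payload):
        """Fire every rule whose event matches and whose condition holds."""
        fired = []
        for rule in self.rules:
            if rule.event == event and rule.condition(payload):
                fired.append(rule.action(payload))
        return fired

# Example: re-order stock when inventory drops below a threshold.
engine = RuleEngine()
engine.register(Rule(
    event="stock_update",
    condition=lambda p: p["quantity"] < 10,
    action=lambda p: f"reorder {p['item']}",
))

print(engine.notify("stock_update", {"item": "widget", "quantity": 3}))
# ['reorder widget']
```

A real reactive-web-service engine would also handle rule priorities and cooperative (multi-service) semantics, which this sketch omits.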

Knowledge Management, Web Services - Possible Areas of cooperation

  • Formalization and Description of Web Service Semantics using Semantic Web
  • Introspection between Web Services
  • Personalization of Web Services
  • Rating of Web Services

Contacted: Dr. Steffen Staab, University of Karlsruhe, Institute of Applied Informatics and Formal Description Methods

Core Competencies: Knowledge Management

Natural Language Processing / Knowledge Representation – Possible Areas of Cooperation
  • Knowledge Representation Architecture based on Indian Logic
  • Argumentative Reasoning Models based on Indian Logic
  • Knowledge representation and interpretation strategies based on Indian sastras like Mimamsa
  • Building Domain Ontologies based on above architecture
  • Knowledge Management based on above approaches
  • Contacted: Prof. Dr. Gerd Unruh, University of Applied Sciences Furtwangen, Department of Informatics

Core Competencies: WordNet, Databases

Utkal University: We Work On

Image Processing

Speech Processing

Knowledge Management

Knowledge Management
  • Machine Translation
    • Normal sentences with WSD
  • Lexical Resources
    • (A) e-Dictionary (Oriya–English–Hindi) – IPR obtained; tested by SQTC, ETDC Bangalore. 27,000 Oriya, 30,000 English and 20,000 Hindi words.
    • (B) Oriya WordNet with Morphological Analyzer – IPR obtained; tested by SQTC, ETDC Bangalore. 1,000-entry lexicon.
    • (C) Ori-Spell (Oriya Spell Checker) – IPR obtained; tested by SQTC, ETDC Bangalore. 1,70,000 words (root and derived).
    • (D) Trilingual Word Processor (Hindi–English–Oriya) – integrated with Spell Checker and Grammar Checker.
KM(Sanskrit)
  • San-Net (Sanskrit WordNet)
  • Developed using Navya-Nyaya philosophy and Paninian grammar
  • Besides synonymy, antonymy, hypernymy, hyponymy, holonymy, meronymy etc., some further relations such as analogy, etymology, definition, nominal verb, nominal qualifier, verbal qualifier and verbal noun have been introduced in San-Net.
  • San-Net can be used for Indian language understanding, translation, summarization and generation.
  • A standard Knowledge Base (KB) has been developed for analyzing the syntactic, semantic and pragmatic aspects of any lexicon.
Present Interest
  • Sanskrit WordNet based Machine Translation System
  • Morphological Analyser for Sanskrit
  • Navya-Nyaya philosophy to be used extensively for these
  • Helps achieve better WSD, as Navya-Nyaya philosophy provides an effective conceptual-analysis capability

Natural Language Processing Group

Computer Sc. & Engg. Department

JADAVPUR UNIVERSITY

KOLKATA – 700 032, INDIA.

Professor Sivaji Bandyopadhyay

sivaji_ju@vsnl.com

Cross-lingual Information Management
  • Multilingual and Cross-lingual IR
    • A Cross Language Database (CLDB) System in Bengali and Hindi developed
    • Natural language query analyzed using a Template Grammar and Knowledge Bases to produce the corresponding SQL statement
    • Cooperative response in the query language
    • Anaphora / Coreference in CLDB studied
    • Database updates and elliptical queries also supported
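The CLDB pipeline above analyzes a natural-language query with a template grammar and knowledge bases to produce an SQL statement. A toy sketch of the template idea (the template, table and column names are invented for illustration; the real system handles Bengali and Hindi queries, cooperative responses and ellipsis):

```python
import re

# One hypothetical template: "what is the <column> of <entity>",
# mapped onto a fixed illustrative table.
TEMPLATES = [
    (re.compile(r"what is the (\w+) of (.+)", re.IGNORECASE),
     "SELECT {0} FROM employees WHERE name = '{1}'"),
]

def nl_to_sql(question):
    """Match the question against each template and fill the SQL skeleton."""
    for pattern, sql in TEMPLATES:
        m = pattern.match(question.strip())
        if m:
            return sql.format(m.group(1), m.group(2))
    return None  # no template matched; a real system gives a cooperative response

print(nl_to_sql("what is the salary of Asha"))
# SELECT salary FROM employees WHERE name = 'Asha'
```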
Cross-lingual Information Management
  • Open Domain Question Answering
    • Work being done for English
    • Currently building a set of question templates (Qtargets) and the corresponding Answer patterns with relative weights
    • Input question analyzed to produce the corresponding question template
      • Appropriate answer pattern retrieved
      • Answer generated using the input document and the synthesis rules of the language
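The Qtarget step above, mapping an input question to a question template with a weighted answer pattern, might be sketched like this (the patterns, weights and placeholder tags are invented for illustration, not the group's actual templates):

```python
import re

# Each Qtarget pairs a question pattern with an answer pattern and a weight.
QTARGETS = [
    {"qpattern": re.compile(r"who (is|was) (.+)\?", re.IGNORECASE),
     "answer_pattern": "{entity} is/was <PERSON-DESCRIPTION>",
     "weight": 0.9},
    {"qpattern": re.compile(r"when (did|was) (.+)\?", re.IGNORECASE),
     "answer_pattern": "{entity} ... <DATE>",
     "weight": 0.8},
]

def classify_question(question):
    """Return the answer pattern of the highest-weight matching Qtarget."""
    matches = []
    for qt in QTARGETS:
        m = qt["qpattern"].match(question.strip())
        if m:
            matches.append((qt["weight"], qt, m.group(2)))
    if not matches:
        return None
    weight, qt, entity = max(matches, key=lambda t: t[0])
    return qt["answer_pattern"].format(entity=entity)

print(classify_question("Who was Panini?"))
# Panini is/was <PERSON-DESCRIPTION>
```

The retrieved answer pattern would then be instantiated from the input document using the language's synthesis rules, as the slide describes.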

Search and Information Extraction Lab

IIIT Hyderabad

The Search and Information Extraction Lab focuses on building technologies for personalized, customizable and highly relevant information retrieval and extraction systems. Vertical or domain-specific search, when combined with personalization, will drastically improve the quality of search results.

Current work includes building search engines that are vertical portals: specific to a chosen domain and aimed at producing high-quality results (with high recall and precision). It has become clear in recent years that it is very difficult to build a generic search engine that works across all kinds of documents and domains while still producing high-quality results. Tasks involved in building domain-specific search engines include representing the domain as an ontology or taxonomy, and the ability to "deeply understand" the documents belonging to that domain using techniques such as natural language processing, semantic representation and context modeling. Another area of immediate interest for English is document summarization. Work is also going on in text categorization and clustering.

The development makes use of the basic technology already developed for English, as well as for Indian languages pertaining to word analyzers, sentential parsers, dictionaries, statistical techniques, keyword extraction, etc. These have been woven in a novel architecture for information extraction.

Knowledge-based approaches are being experimented with. The emphasis is on combining automatic processing with hand-crafting of knowledge. Applications that match information extracted from documents against given specifications are also being explored; for example, a given job requirement could be matched against resumes (say, after information is extracted from them).

A number of sponsored projects from industry and government are running at the Center in this area. A major knowledge management initiative in the area of e-Governance is also being planned.

We are building search engines and named-entity extraction tools specifically for the Indian context. As a test bed, we are building an experimental system code-named PSearch (http://nlp.iiit.net/~psearch).

SIEL is also actively developing proper-name gazetteers to cover the commonly used names of people, places, organizations etc. in the Indian news media for various languages. These resources will help in information extraction, categorization and machine translation activities.

For further information and details, please email vv@iiit.net

Efforts in Language & Speech Technology

Natural Language Processing Lab

Centre for Development of Advanced Computing

(Ministry of Communications & Information Technology)

‘Anusandhan Bhawan’,

C 56/1 Sector 62, Noida – 201 307, India

karunesharora@cdacnoida.com

Gyan Nidhi : Parallel Corpus

‘GyanNidhi’ which stands for ‘Knowledge Resource’ is parallel in 12 Indian languages , a project sponsored by TDIL, DIT, MC &IT, Govt of India

Gyan Nidhi: Multi-Lingual Aligned Parallel Corpus

What is it? The multilingual parallel text corpus contains the same text translated into more than one language.

What does Gyan Nidhi contain? The GyanNidhi corpus consists of text in English and 11 Indian languages (Hindi, Punjabi, Marathi, Bengali, Oriya, Gujarati, Telugu, Tamil, Kannada, Malayalam, Assamese). It aims to digitize 1 million pages in total, with at least 50,000 pages in each Indian language and English.

Source for Parallel Corpus

  • National Book Trust India
  • Sahitya Akademi
  • Navjivan Publishing House
  • Publications Division
  • SABDA, Pondicherry

Gyan Nidhi: Multi-Lingual Aligned Parallel Corpus

Platform: Windows

Data Encoding: XML, Unicode

Portability of Data: Data in XML format supports various platforms

Applications of GyanNidhi

Automatic Dictionary extraction

Creation of Translation memory

Example Based Machine Translation (EBMT)

Language research study and analysis

Language Modeling

Tools: Prabandhika: Corpus Manager
  • Categorisation of corpus data in various user-defined domains
  • Addition/deletion/modification of any Indian-language data files in HTML / RTF / TXT / XML format
  • Selection of languages for viewing parallel corpus with data aligned up to paragraph level
  • Automatic selection and viewing of parallel paragraphs in multiple languages
  • Abstract and metadata
  • Printing and saving parallel data in Unicode format
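Paragraph-aligned parallel data of this kind, stored as Unicode XML, could be read along the following lines. This is a generic sketch: the element names and sample text are illustrative, not GyanNidhi's actual schema.

```python
import xml.etree.ElementTree as ET

# Illustrative schema: each <para> holds the same paragraph in several languages.
SAMPLE = """
<corpus>
  <para id="1">
    <text lang="en">Knowledge is wealth.</text>
    <text lang="hi">Gyan hi dhan hai.</text>
  </para>
</corpus>
"""

def aligned_paragraphs(xml_text, lang_a, lang_b):
    """Yield (lang_a, lang_b) paragraph pairs from the aligned corpus."""
    root = ET.fromstring(xml_text)
    for para in root.iter("para"):
        texts = {t.get("lang"): t.text for t in para.iter("text")}
        if lang_a in texts and lang_b in texts:
            yield texts[lang_a], texts[lang_b]

pairs = list(aligned_paragraphs(SAMPLE, "en", "hi"))
print(pairs)
# [('Knowledge is wealth.', 'Gyan hi dhan hai.')]
```

Paragraph pairs read this way are the raw material for the applications listed earlier, such as dictionary extraction, translation memory and EBMT.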
Tools: Vishleshika: Statistical Text Analyzer
  • Vishleshika is a tool for statistical text analysis for Hindi, extendible to text in other Indian languages
  • It examines input text and generates various statistics, e.g.:
      • Sentence statistics
      • Word statistics
      • Character statistics
  • Text Analyzer presents analysis in Textual as well as Graphical form.
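The kinds of statistics listed above can be approximated in a few lines of code. This is a generic sketch, not Vishleshika's actual implementation; a real Hindi analyzer would segment Devanagari text and conjunct characters properly.

```python
import re
from collections import Counter

def text_statistics(text):
    """Compute simple sentence, word and character statistics for a text."""
    # \u0964 is the Devanagari danda, the Hindi sentence terminator.
    sentences = [s for s in re.split(r"[.?!\u0964]+", text) if s.strip()]
    words = re.findall(r"\w+", text)
    chars = Counter(c for c in text if not c.isspace())
    return {
        "sentences": len(sentences),
        "words": len(words),
        "avg_words_per_sentence": len(words) / max(len(sentences), 1),
        "most_common_chars": chars.most_common(3),
    }

stats = text_statistics("Rama reads. Sita writes well.")
print(stats["sentences"], stats["words"])
# 2 5
```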
Sample output: Character statistics

[Graph: character statistics of the sample text, with panels showing the most frequent consonants in Hindi and in Nepali.]

The graph shows that the distribution is almost equal in Hindi and Nepali in the sample text. The results also show that these six consonants constitute more than 50% of consonant usage.

AU-KBC Research Centre

Knowledge Management

Information Retrieval / Information Extraction

IE in Partially Structured Data

Information extraction on partially structured, domain-dependent data has been done for IB.

The sample data was in the criminal domain.

This is a rule-based system and the rules are hand-crafted.

There are various dictionaries for places, events and the basic verbs, which are used by the rules.

The dictionary can be dynamically updated.

The template is pre-defined.

Example:

Event : An exchange of fire took place between the police and CPML-PW extremists ( 2 ) at Basheera ( Kamarpally mandal/district Nizamabad/January 9 ) resulting in the death of a DCM of the outfit . The police also recovered wireless sets ( 2 ) , hand-grenade ( 1 ) and revolver ( 1 ) from the site .

Participant 1 = police

Participant 2 = CPML-PW_extremists

No of Participant 2 = ( 2 )

Material = revolver

Date = January 9 2002

Police Station = Nizamabad

Mandal = Kamarpally

District = Nizamabad

Event = exchange of fire
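A hand-crafted rule of the kind that fills this template, backed by small dictionaries of events, might look roughly as follows. The pattern and dictionaries are simplified illustrations, not the actual rules of the IB system.

```python
import re

# Small illustrative event dictionary; the real system has dictionaries
# for places, events and basic verbs, updatable at run time.
EVENT_DICT = {"exchange of fire", "encounter"}

# Hand-crafted rule: "An <event> took place between the <p1> and <p2>
# extremists ... at <place>".
RULE = re.compile(
    r"An (?P<event>exchange of fire|encounter) took place between "
    r"the (?P<p1>\w+) and (?P<p2>[\w-]+) extremists.*?at (?P<place>\w+)"
)

def extract(text):
    """Fill the pre-defined template from the first rule that matches."""
    m = RULE.search(text)
    if not m or m.group("event") not in EVENT_DICT:
        return None
    return {
        "Event": m.group("event"),
        "Participant 1": m.group("p1"),
        "Participant 2": m.group("p2") + "_extremists",
        "Place": m.group("place"),
    }

sentence = ("An exchange of fire took place between the police and "
            "CPML-PW extremists ( 2 ) at Basheera")
print(extract(sentence))
```

The full system would run many such rules and further dictionaries to fill the remaining slots (date, mandal, district, recovered material).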

IE in Unstructured Data

Information extraction on unstructured, domain-dependent data has been done on online matrimonial advertisements.

The sample data was taken from The Hindu's online matrimonials.

This is a rule-based system and the rules are hand-crafted. Linguistic rules as well as heuristic rules play a major role.

There are various dictionaries for caste, religion, language etc., which are used by the system.

The template to be filled is static and pre-defined.