Agro Explorer & UNL

Agro Explorer & UNL. CS 671 ICT For Development, 19th Sep 2008. Vishal Vachhani, CFILT and DIL, IIT Bombay. Agro Explorer: A Meaning-Based Multilingual Search Engine. Introduction to aAqua: a website for Indian farmers, where farmers can submit problems related to their crops.

Presentation Transcript


  1. Agro Explorer & UNL CS 671 ICT For Development 19th Sep 2008 Vishal Vachhani CFILT and DIL, IIT Bombay

  2. Agro Explorer: A Meaning-Based Multilingual Search Engine

  3. Introduction to aAqua • Website for Indian farmers • Farmers can submit problems related to their crops • Queries are answered by agricultural experts at KVK, Baramati • Languages supported: Marathi, Hindi, English

  4. Why Multilingual Search Is Needed • A vast amount of information is available on the Web • Almost 70% of this information is in English • The Indian rural populace is not English-literate • “A big language barrier” • Information has to be made available to them in their local languages.

  5. Why Meaning-Based Search Is Needed • Most current search engines are keyword-based. • They do not consider the semantics of the query. • The result set contains a large number of extraneous documents. • Searching on the meaning of the query helps narrow down to the desired information quickly.

  6. System (multilinguality) • Query in Hindi • Search over English and Marathi documents • Result shown in Hindi, including results from an English document

  7. Meaning-Based Search • Same keywords, different semantics • “Moneylenders exploit farmers”: found 1 result • “Farmers exploit moneylenders”: found 0 results

  8. Agro Explorer System • Provides both: • Meaning-based search • Cross-lingual information access

  9. System Architecture (figure)

  10-14. (No text in the transcript for these slides.)

  15. Conclusion • Provides two independent features: • Multilinguality • Meaning-based search • Because of UNL, multilingual and meaning-based properties can be incorporated together rather than using separate language translators in the search engine. • The scheme lends itself to integrating multiple languages in a seamless, scalable manner.

  16. UNL: Universal Networking Language

  17. UNL System (diagram): UNL at the centre, connected to Hindi, English, French, Marathi and Tamil

  18. Approaches to MT Systems • Direct translation: each language pair is translated directly, so N*(N-1) translators are needed for N languages. • Intermediate language: translation goes through an intermediate language, so only 2*N translators are required (one into and one out of the intermediate language per language).
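The difference in scale between the two approaches is simple arithmetic. A minimal Python sketch, assuming one directed translator per ordered language pair for direct translation, and one analyser (into the interlingua) plus one generator (out of it) per language for the intermediate-language approach; the function names are hypothetical.

```python
def direct_translators(n: int) -> int:
    """One translator per ordered (source, target) pair of distinct languages."""
    return n * (n - 1)


def interlingua_translators(n: int) -> int:
    """One analyser into the interlingua and one generator out of it, per language."""
    return 2 * n


if __name__ == "__main__":
    # The five languages drawn around UNL on the "UNL System" slide.
    languages = ["Hindi", "English", "French", "Marathi", "Tamil"]
    n = len(languages)
    print(direct_translators(n), interlingua_translators(n))
```

For five languages this prints 20 and 10. The two counts are equal at N = 3, and for every N > 3 the direct approach needs more translators, with the gap growing quadratically.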

  19. UNL: Interlingua • UNL is an acronym for “Universal Networking Language”. • UNL is a computer language that enables computers to process information and knowledge across language barriers. • UNL is a language for representing information and knowledge provided by natural languages. • Unlike natural languages, UNL expressions are unambiguous.

  20. UNL: Interlingua • Although UNL is a language for computers, it has all the components of a natural language. • It is composed of Universal Words (UWs), Relations and Attributes. • Knowledge is represented as a semantic graph: • Nodes → concepts • Arcs → relations between concepts

  21. Universal Words (UWs) • A UW represents a simple or compound concept. There are two classes of UWs: • unit concepts • compound structures of binary relations grouped together (indicated with Compound UW-IDs) • A UW is made up of a character string (an English-language word) followed by a list of constraints. • <UW> ::= <Head Word>[<Constraint List>] • Examples: • state(icl>express) • state(icl>country)
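The `<UW> ::= <Head Word>[<Constraint List>]` rule can be illustrated with a small parser. A minimal Python sketch, assuming a flat, comma-separated constraint list with no nested parentheses (full UNL allows richer constraint syntax); `parse_uw` and `UniversalWord` are hypothetical names.

```python
import re
from dataclasses import dataclass
from typing import Tuple

# <UW> ::= <Head Word>[<Constraint List>]
# Assumption: the constraint list is flat and comma-separated.
UW_PATTERN = re.compile(r"([^()]+?)(?:\((.*)\))?")


@dataclass(frozen=True)
class UniversalWord:
    head: str                      # English head word, e.g. "state"
    constraints: Tuple[str, ...]   # e.g. ("icl>express",)


def parse_uw(text: str) -> UniversalWord:
    match = UW_PATTERN.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a Universal Word: {text!r}")
    head, constraint_list = match.groups()
    constraints = (
        tuple(c.strip() for c in constraint_list.split(","))
        if constraint_list
        else ()
    )
    return UniversalWord(head.strip(), constraints)


if __name__ == "__main__":
    for uw in ("state(icl>express)", "state(icl>country)", "here"):
        print(parse_uw(uw))
```

The two `state` UWs share a head word but compare unequal, which is how the constraint list separates the two senses of the same English word.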

  22. Relations • A relation label is a string of 3 characters or fewer. • Relations between UWs are binary: • rel(UW1, UW2) • Labels differ according to the roles the relations play. • At present, there are 46 relations in UNL. • Examples: agt (agent), ins (instrument), pur (purpose), etc.

  23. Attribute Labels • Attribute labels express additional information about the Universal Words that appear in a sentence. • They show what is said from the speaker’s point of view: how the speaker views what is said (time, reference, emphasis, attitude, etc.). • Examples: @entry, @present, @progressive, @topic

  24. UNL: Interlingua • Example: Ram eats rice.
     {unl}
     agt(eat.@entry.@present, Ram)
     obj(eat.@entry.@present, rice(icl>eatable))
     {/unl}
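Because every relation is a binary `rel(UW1, UW2)`, a UNL document can be read into a list of labelled arcs. A minimal Python sketch, assuming one relation per line between `{unl}` and `{/unl}` and an optional `:NN` scope after the label (as in `agt:01(...)`); the names `parse_unl`, `Relation` and `split_top_level` are hypothetical. Commas are split only outside parentheses so that constraint lists stay intact.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Assumed line shape: label of at most 3 lowercase letters, optional ":NN"
# scope (compound UW-ID), then "(UW1, UW2)".
REL_LINE = re.compile(r"([a-z]{1,3})(?::(\d+))?\((.*)\)")


@dataclass(frozen=True)
class Relation:
    label: str             # e.g. "agt", "obj"
    scope: Optional[str]   # e.g. "01" for agt:01(...), None otherwise
    source: str            # UW1, attributes still attached
    target: str            # UW2, attributes still attached


def split_top_level(args: str) -> List[str]:
    """Split on commas that are not inside a UW's constraint parentheses."""
    parts, depth, start = [], 0, 0
    for i, ch in enumerate(args):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "," and depth == 0:
            parts.append(args[start:i].strip())
            start = i + 1
    parts.append(args[start:].strip())
    return parts


def parse_unl(document: str) -> List[Relation]:
    """Parse the relation lines between {unl} and {/unl}."""
    if "{unl}" not in document or "{/unl}" not in document:
        raise ValueError("missing {unl} ... {/unl} block")
    body = document.split("{unl}", 1)[1].split("{/unl}", 1)[0]
    relations = []
    for line in body.splitlines():
        line = line.strip()
        if not line:
            continue
        match = REL_LINE.fullmatch(line)
        if match is None:
            raise ValueError(f"not a UNL relation: {line!r}")
        label, scope, inner = match.groups()
        args = split_top_level(inner)
        if len(args) != 2:
            raise ValueError(f"relations are binary: {line!r}")
        relations.append(Relation(label, scope, args[0], args[1]))
    return relations


RAM_EATS_RICE = """{unl}
agt(eat.@entry.@present, Ram)
obj(eat.@entry.@present, rice(icl>eatable))
{/unl}"""

if __name__ == "__main__":
    for rel in parse_unl(RAM_EATS_RICE):
        print(rel.label, rel.source, "->", rel.target)
```

Each `Relation` is one arc of the semantic graph: the source UW is the parent node, the target UW is the child, and the label names the arc.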

  25. UNL as a graph: eat →agt→ Ram; eat →obj→ rice

  26. UNL: Interlingua • Example: The boy who works here went to school.
     {unl}
     agt(go(icl>move).@entry.@past, :01)
     plt(go(icl>move).@entry.@past, school(icl>institution))
     agt:01(work(icl>do), boy(icl>person).@entry)
     plc:01(work(icl>do), here)
     {/unl}

  27. UNL as a graph: go →agt→ :01; go →plt→ school; inside scope :01: work →agt→ boy; work →plc→ here

  28. Intermediate Language: source language → EnConverter → intermediate language → DeConverter → target language

  29. DeConverter • It is a language-independent generator. • It can deconvert UNL expressions into a variety of native languages, using linguistic data such as a word dictionary and the grammatical rules of each language. • The DeConverter transforms the sentence represented by a UNL expression into a natural-language sentence.

  30. DeConverter Block Diagram (figure)

  31. Block diagram of the natural-language generator • Pipeline: UNLDoc → UNL Parser → Case Marking Module → Morphology Module → Syntax Planning Module → HindiDoc • Linguistic resources: Dictionary, Case Marking Rules, Morphology Rules, Syntax Planning Rules • The diagram separates a language-dependent part from a language-independent part

  32. UNL Parser • The UNL parser module performs the following tasks: • Checks the input format of the UNL document • Separates attributes from UWs • Separates attributes from dictionary entries • Replaces UWs with Hindi root words
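Two of these steps, separating attributes from a UW and replacing the UW with a Hindi root word, can be sketched in a few lines. A minimal Python sketch, assuming attributes follow the UW as `.@name` suffixes outside any constraint parentheses, and using a tiny hypothetical in-memory dictionary (`HINDI_DICTIONARY`) in place of the real dictionary file; `split_attributes` and `to_hindi_node` are hypothetical names.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical UW -> (Hindi root, lexical attributes) entries for the
# "Ram eats rice" example; a real system would load these from a dictionary.
HINDI_DICTIONARY: Dict[str, Tuple[str, Set[str]]] = {
    "eat": ("खा", {"V"}),
    "Ram": ("राम", {"N"}),
    "rice(icl>eatable)": ("चावल", {"N"}),
}


def split_attributes(node: str) -> Tuple[str, List[str]]:
    """Split 'eat.@entry.@present' into ('eat', ['@entry', '@present']).

    Only a '.@' outside the constraint parentheses starts the attribute list.
    """
    depth = 0
    for i, ch in enumerate(node):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif depth == 0 and node.startswith(".@", i):
            return node[:i], node[i + 1:].split(".")
    return node, []


def to_hindi_node(node: str) -> dict:
    """Replace a UW with its Hindi root, keeping UNL and lexical attributes."""
    uw, unl_attrs = split_attributes(node)
    if uw not in HINDI_DICTIONARY:
        raise KeyError(f"no dictionary entry for UW {uw!r}")
    root, lexical_attrs = HINDI_DICTIONARY[uw]
    return {"root": root, "unl_attrs": unl_attrs, "lexical_attrs": lexical_attrs}


if __name__ == "__main__":
    print(to_hindi_node("eat.@entry.@present"))
```

Keeping the UNL attributes (such as `@present`) and the dictionary's lexical attributes (such as `V`) side by side is what the later case-marking and morphology steps would consume.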

  33. Case Marking Module • Case: a category of morpho-syntactic properties that distinguish the various relations a noun phrase may bear to a governing head. • Hindi case markers: ने, पर, के, से, पे, etc. • A rule base built on: • UNL attributes • lexical attributes from the dictionary

  34. Case Marking Module • Case marking is implemented using rules. • We analyze all UNL and dictionary attributes and decide the next and previous case markers. • We also use the relation with the parent to extract the right case marker.

  35. Rule for Case Marking • Example rule: agt:null:null:null:ने:@past#V:VINT:N:null • Structure (fields separated by ':'): • relation name • parent previous case marker • parent next case marker • child previous case marker • child next case marker • the remaining four fields are in the form attr'REL'relationname • attributes are separated by # • relation names are also separated by #
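The rule format can be read into a structured record before matching. A minimal Python sketch, assuming exactly nine ':'-separated fields with `null` meaning "no marker" or "no condition"; the slide does not say which of the last four fields refers to the parent or the child, so the sketch keeps them as a generic list of '#'-separated conditions. `CaseRule` and `parse_case_rule` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CaseRule:
    relation: str                # e.g. "agt"
    parent_prev: Optional[str]   # case marker before the parent, None for "null"
    parent_next: Optional[str]   # case marker after the parent
    child_prev: Optional[str]    # case marker before the child
    child_next: Optional[str]    # case marker after the child
    conditions: List[List[str]]  # remaining four fields, each split on '#'


def parse_case_rule(line: str) -> CaseRule:
    fields = line.strip().split(":")
    if len(fields) != 9:
        raise ValueError(f"expected 9 ':'-separated fields, got {len(fields)}")

    def marker(field: str) -> Optional[str]:
        return None if field == "null" else field

    # Assumption: "null" in a condition field means "no condition".
    conditions = [[] if f == "null" else f.split("#") for f in fields[5:]]
    return CaseRule(
        relation=fields[0],
        parent_prev=marker(fields[1]),
        parent_next=marker(fields[2]),
        child_prev=marker(fields[3]),
        child_next=marker(fields[4]),
        conditions=conditions,
    )


if __name__ == "__main__":
    print(parse_case_rule("agt:null:null:null:ने:@past#V:VINT:N:null"))
```

For the example rule this yields an `agt` rule whose only case marker is ने, placed after the child, with conditions `["@past", "V"]`, `["VINT"]`, `["N"]` and none.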

  36. Morphology Module • What is morphology? • The study of morphemes • and their formation into words, including inflection, derivation and composition

  37. Types of Morphology • Noun, verb and adjective morphology • Depends on the phonetic properties of the Hindi word • Noun morphology • Depends on the gender, number and vowel ending of the noun • Adjective morphology • अच्छा लड़का, अच्छी लड़की, अच्छे लड़के • The ending of the adjective stem अच्छ changes (lexical attribute “AdjA”) • Verb morphology • Depends on tense, gender, number, person, etc.
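The अच्छा / अच्छी / अच्छे alternation can be captured by a small ending table. A minimal Python sketch, assuming that "AdjA" marks adjectives whose stem takes ा in the masculine singular direct case, े in the masculine plural or oblique, and ी in the feminine; the function name `inflect_adja` and the feature codes are hypothetical.

```python
def inflect_adja(stem: str, gender: str, number: str, oblique: bool = False) -> str:
    """Inflect an 'AdjA' adjective stem such as अच्छ to agree with its noun.

    gender: "m" or "f"; number: "sg" or "pl".
    """
    if gender not in ("m", "f") or number not in ("sg", "pl"):
        raise ValueError(f"unsupported features: {gender!r}, {number!r}")
    if gender == "f":
        return stem + "\u0940"   # ी : अच्छी (feminine, any number or case)
    if number == "pl" or oblique:
        return stem + "\u0947"   # े : अच्छे (masculine plural or oblique)
    return stem + "\u093e"       # ा : अच्छा (masculine singular direct)


if __name__ == "__main__":
    for gender, number, noun in (("m", "sg", "लड़का"), ("f", "sg", "लड़की"), ("m", "pl", "लड़के")):
        print(inflect_adja("अच्छ", gender, number), noun)
```

This reproduces the three phrases on the slide; the gender and number would come from the noun's dictionary entry and UNL attributes.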

  38. Verb Morphology • Verbs are categorized by: • Tense (past, present, future) • Gender (masculine, feminine) • Person (1st, 2nd, 3rd) • Number (sg, pl) • Example: • Ladaka khana kha raha hai. (“The boy is eating food.”) • It is in the present continuous tense, masculine, singular, 3rd person.
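The example shows the verb phrase being assembled from the root, a progressive participle and an auxiliary, each agreeing in gender and number. A minimal Python sketch, assuming 3rd-person subjects only and romanized output in the slide's style; the tables `PARTICIPLE` and `AUXILIARY` and the function `progressive` are hypothetical and cover only the present and past continuous.

```python
from typing import Dict, Tuple

Features = Tuple[str, str]  # (gender, number), e.g. ("m", "sg")

# Progressive participle agrees in gender and number.
PARTICIPLE: Dict[Features, str] = {
    ("m", "sg"): "raha", ("m", "pl"): "rahe",
    ("f", "sg"): "rahi", ("f", "pl"): "rahi",
}

# 3rd-person auxiliary, by tense.
AUXILIARY: Dict[str, Dict[Features, str]] = {
    "present": {("m", "sg"): "hai", ("m", "pl"): "hain",
                ("f", "sg"): "hai", ("f", "pl"): "hain"},
    "past": {("m", "sg"): "tha", ("m", "pl"): "the",
             ("f", "sg"): "thi", ("f", "pl"): "thin"},
}


def progressive(root: str, tense: str, gender: str, number: str) -> str:
    """Build a 3rd-person continuous verb phrase, e.g. 'kha raha hai'."""
    key = (gender, number)
    return f"{root} {PARTICIPLE[key]} {AUXILIARY[tense][key]}"


if __name__ == "__main__":
    print("Ladaka khana", progressive("kha", "present", "m", "sg"))
```

Changing only the feature tuple switches the agreement, so the same root yields forms such as `kha rahi thi` for a feminine singular subject in the past.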

  39. Syntax Planning • Arranging words according to the structure of the language • Rule-based module • It is a priority-based graph traversal

  40. General Strategy • Algorithm for syntax planning: 1) Start traversing the UNL graph from the entry node. 2) If a node has no children, add it to the final string. 3) If a node has more than one child, sort the children by relation priority; the relation with the highest priority is traversed first. 4) Mark that node as visited. 5) Repeat steps 3 and 4 until all the children of that node are visited. 6) Once all the children of a node are visited, add that node to the final string. 7) Repeat steps 2 to 4 until all the nodes are traversed.

  41. Example • Sentence: “Also, spray 5% Neemark solution.” • Relation priorities (U-3): obj:17, man:9, mod:5, qua:5 • Graph: spray →obj→ solution; spray →man→ also; solution →mod→ percent; solution →mod→ Neemark; percent →qua→ 5
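The traversal amounts to a post-order depth-first search in which each node's children are visited in descending relation priority, and a node is emitted only after all its children. A minimal Python sketch, assuming the priorities and the graph read off the example slide; the dictionary layout and the function name `plan` are hypothetical. Ties (the two `mod:5` arcs) keep their insertion order because Python's sort is stable, also with `reverse=True`.

```python
from typing import Dict, List, Set, Tuple

# Relation priorities from the example slide (higher is traversed first).
PRIORITY: Dict[str, int] = {"obj": 17, "man": 9, "mod": 5, "qua": 5}

# UNL graph for "Also, spray 5% Neemark solution.": parent -> [(relation, child)].
GRAPH: Dict[str, List[Tuple[str, str]]] = {
    "spray": [("obj", "solution"), ("man", "also")],
    "solution": [("mod", "percent"), ("mod", "Neemark")],
    "percent": [("qua", "5")],
}


def plan(graph: Dict[str, List[Tuple[str, str]]], entry: str,
         priority: Dict[str, int]) -> List[str]:
    """Post-order DFS from the entry node, highest-priority relation first."""
    output: List[str] = []
    visited: Set[str] = set()

    def visit(node: str) -> None:
        visited.add(node)
        children = sorted(graph.get(node, []),
                          key=lambda rel_child: priority.get(rel_child[0], 0),
                          reverse=True)
        for _, child in children:
            if child not in visited:
                visit(child)
        output.append(node)  # emit the node once all its children are done

    visit(entry)
    return output


if __name__ == "__main__":
    print(" ".join(plan(GRAPH, "spray", PRIORITY)))
```

For the example it emits `5 percent Neemark solution also spray`: every modifier precedes its head, and the entry node (the verb) comes last.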

  42-50. Flow: the traversal is built up step by step from the Entry node spray, through obj:17 and man:9, to solution, its two mod:5 arcs, percent, and finally qua:5.
