Data mining and text analytics

Quranic Arabic Corpus

By Saima Rahna & Anees Mohammad



Summary

The Quranic Arabic Corpus enables further analysis of the Quran

Uses linguistic resources, such as morphology and syntax, for each word and verse in the Quran

Automated algorithms were applied to the Quranic text.



Introduction

Islam originated in Arabia about 1,400 years ago

The key sacred texts are in Arabic

Only a minority of Muslims can speak and understand Arabic

A larger percentage of Muslims know English, as a second or even a first language

Web and book resources use English in parallel with Arabic.



Data Mining

Uses tools and techniques to extract patterns and information from data

Different aspects of a single topic in the Quran can reappear in many chapters

Therefore, frequent patterns can be used to construct a subject index in which all verses on a single topic can be found easily.
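The idea can be sketched in Python as a minimal example. It assumes each verse is already split into words, and it treats a "frequent pattern" as a pair of words that co-occur in at least `min_support` verses. The verse data and the names `frequent_pairs` and `subject_index` are hypothetical. They are not the corpus's actual method or data.

```python
from collections import Counter
from itertools import combinations

# Toy verses keyed by (chapter, verse); the words are placeholders, not Quranic text.
VERSES = {
    (1, 1): "mercy prayer guidance",
    (2, 3): "prayer charity guidance",
    (7, 5): "mercy patience",
    (9, 2): "prayer charity",
}


def frequent_pairs(verses, min_support):
    """Return word pairs that co-occur in at least min_support verses."""
    counts = Counter()
    for text in verses.values():
        words = sorted(set(text.split()))  # each verse counts at most once per pair
        counts.update(combinations(words, 2))
    return {pair: n for pair, n in counts.items() if n >= min_support}


def subject_index(verses, min_support):
    """Map each frequent pair (treated as a topic) to the verses containing both words."""
    index = {}
    for pair in frequent_pairs(verses, min_support):
        index[pair] = sorted(
            loc for loc, text in verses.items() if set(pair) <= set(text.split())
        )
    return index
```

Here `subject_index(VERSES, 2)` groups verses (2, 3) and (9, 2) under the pair ("charity", "prayer"). General frequent-pattern miners such as Apriori extend the same counting to larger itemsets. Pairs keep the sketch short.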



Text Analytics

Also referred to as information extraction

The Quranic corpus benefits readers who do not understand Arabic

It can give English readers better insight into the source text

The translation is provided at a detailed, text-analytic level



Resources & Techniques

Statistical techniques

Implementing statistical techniques such as keyword extraction

Can explore semiotic relationships between sound and meaning in the Quran
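The slides do not say how keyword extraction is done. The sketch below uses TF-IDF as one common statistical choice. TF-IDF is swapped in for illustration and is not necessarily the technique the corpus uses. It ranks each chapter's words by TF-IDF. The input format (chapter id mapped to a list of normalised words) and the name `tfidf_keywords` are assumptions.

```python
import math
from collections import Counter


def tfidf_keywords(chapters, top_k=2):
    """Return the top_k words of each chapter ranked by TF-IDF.

    chapters: dict mapping a chapter id to its list of (already normalised) words.
    """
    n_docs = len(chapters)
    # Document frequency: the number of chapters each word appears in.
    df = Counter()
    for words in chapters.values():
        df.update(set(words))

    keywords = {}
    for cid, words in chapters.items():
        if not words:  # an empty chapter has no keywords (and would divide by zero)
            keywords[cid] = []
            continue
        tf = Counter(words)
        scores = {
            w: (count / len(words)) * math.log(n_docs / df[w])
            for w, count in tf.items()
        }
        # Highest score first; ties broken alphabetically so the output is deterministic.
        ranked = sorted(scores, key=lambda w: (-scores[w], w))
        keywords[cid] = ranked[:top_k]
    return keywords
```

A word that appears in every chapter gets an IDF of log(1) = 0. It can therefore only rank below words that are specific to a chapter.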

Recognise recurring patterns

Recognise recurring patterns with a high level of accuracy

Linguistic resources

Arabic grammar and syntax are given for each word in the Quran

A comment-based online system lets visitors discuss and correct the data.
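Below is a minimal sketch of how a per-word linguistic record might be stored and queried. It assumes a location written as chapter:verse:word and one part-of-speech tag per morphological segment. The `WordAnnotation` schema, the transliterations and the tags on the sample records are illustrative assumptions. They are not data copied from the corpus.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WordAnnotation:
    """One word with its grammatical tags (a hypothetical schema)."""
    chapter: int
    verse: int
    word: int
    form: str        # transliterated surface form
    segments: tuple  # one part-of-speech tag per morphological segment

    @property
    def location(self):
        return f"{self.chapter}:{self.verse}:{self.word}"


# Illustrative records for verse 1:1; the tags are for demonstration only.
ANNOTATIONS = [
    WordAnnotation(1, 1, 1, "bismi", ("P", "N")),
    WordAnnotation(1, 1, 2, "allahi", ("PN",)),
    WordAnnotation(1, 1, 3, "al-rahmani", ("ADJ",)),
    WordAnnotation(1, 1, 4, "al-rahimi", ("ADJ",)),
]


def words_with_tag(annotations, tag):
    """Return the locations of all words having at least one segment with the given tag."""
    return [a.location for a in annotations if tag in a.segments]
```

Keying each record by its location means a visitor's comment or correction can point at exactly one word.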



Algorithms

The Quranic Arabic Corpus used Java to implement its algorithms.

Search feature (searching for concepts and keywords in the Holy Quran)

Finding multi-word repetitions

Mining frequent patterns from a graph.
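Finding multi-word repetitions can be sketched as follows: record every run of `n` consecutive words, then keep the runs that occur more than once. This is a minimal Python illustration with a hypothetical function name (`repeated_phrases`) and placeholder tokens. The corpus itself was implemented in Java, and its exact method is not described here.

```python
from collections import defaultdict


def repeated_phrases(verses, n):
    """Find every n-word phrase that occurs more than once, with all its locations.

    verses: dict mapping (chapter, verse) to a list of words.
    Returns {phrase_tuple: [(chapter, verse, start_word_position), ...]}.
    """
    locations = defaultdict(list)
    for (chapter, verse), words in verses.items():
        # Slide a window of n words across the verse; positions start at 1.
        for i in range(len(words) - n + 1):
            locations[tuple(words[i:i + n])].append((chapter, verse, i + 1))
    return {phrase: locs for phrase, locs in locations.items() if len(locs) > 1}
```

Calling it with increasing `n` finds progressively longer repeated phrases. A phrase can only repeat at length `n` if every shorter run inside it also repeats.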



Algorithm for indexing the Quran

When a word is encountered for the first time, it is added to the index; if it already exists there, then a new location is added to its list.

For each verse V
    parse V into a word list -> list(W)
    For each word W in list(W)
        If INDEX does not contain W
            add W to INDEX with W.location
        Else
            fetch W from INDEX
            add W.location to the location list of W
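The indexing pseudocode translates almost line for line into Python. This sketch rests on several assumptions: the function names, the `(chapter, verse, word_position)` location tuple and the toy text. In it, the same index also serves the search feature, where a verse matches when it contains every keyword.

```python
def build_index(verses):
    """Inverted index: word -> list of (chapter, verse, word_position) locations.

    verses: dict mapping (chapter, verse) to the verse text, split on whitespace.
    """
    index = {}
    for (chapter, verse), text in verses.items():
        # Parse the verse into its word list; positions start at 1.
        for position, word in enumerate(text.split(), start=1):
            location = (chapter, verse, position)
            if word not in index:
                index[word] = [location]      # first occurrence: add W with its location
            else:
                index[word].append(location)  # already indexed: add the new location
    return index


def search(index, *keywords):
    """Return the sorted (chapter, verse) pairs that contain every keyword."""
    hits = None
    for word in keywords:
        found = {(c, v) for c, v, _ in index.get(word, [])}
        hits = found if hits is None else hits & found
    return sorted(hits or set())
```

For example, with placeholder verses `{(1, 1): "alpha beta gamma", (1, 2): "beta delta"}`, `build_index` maps "beta" to both verses, and `search(index, "beta", "delta")` returns only `(1, 2)`.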



Filtering algorithm

The Quranic 'quote filtering' algorithm

The Quranic text uses Arabic diacritics (symbols)

The filtering algorithm has three filtering stages, applied after the input text is prepared.
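The slides do not spell out the three filtering stages. The sketch below therefore shows only one plausible normalisation step: removing diacritics before comparing a quote with a verse, so that a quote typed without vowel marks can still match. The Unicode ranges and the function names are assumptions.

```python
import re

# Assumed set of marks to drop: Arabic harakat, tanween, shadda and sukun (U+064B-U+0652),
# the superscript alef (U+0670), and Quranic annotation signs (U+06D6-U+06ED).
DIACRITICS = re.compile("[\u064B-\u0652\u0670\u06D6-\u06ED]")


def strip_diacritics(text):
    return DIACRITICS.sub("", text)


def normalise(text):
    """Remove diacritics and collapse runs of whitespace to single spaces."""
    return " ".join(strip_diacritics(text).split())


def quote_matches(quote, verse):
    """True if the normalised quote appears inside the normalised verse."""
    return normalise(quote) in normalise(verse)
```

With this step, an undiacritised quote such as "بسم" matches a fully vowelled "بِسْمِ".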

Algorithm: Sub-path Mining

This is used to generate frequent patterns within the Quranic corpus

The process starts by scanning the transaction database and counting each vertex in the graph
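The slide gives only the first step, which is counting vertices. The sketch below fills in the rest with a standard level-wise, Apriori-style growth. This is an assumption rather than the corpus's documented method. Each transaction is treated as a path of vertices (for example, the words of a verse in order). A sub-path of length k+1 is counted only if both of its length-k sub-paths are already frequent.

```python
from collections import Counter


def frequent_subpaths(transactions, min_support):
    """Level-wise mining of sub-paths found in at least min_support transactions.

    transactions: list of paths, each a sequence of vertex labels.
    Returns {subpath_tuple: support}; support counts transactions, not occurrences.
    """
    # Level 1: one scan of the transaction database, counting each vertex once per transaction.
    counts = Counter()
    for path in transactions:
        counts.update(set(path))
    level = {(v,): n for v, n in counts.items() if n >= min_support}
    frequent = dict(level)

    k = 1
    while level:
        candidates = Counter()
        for path in transactions:
            seen = set()
            # Every contiguous window of k + 1 vertices is a candidate sub-path.
            for i in range(len(path) - k):
                window = tuple(path[i:i + k + 1])
                # Apriori pruning: its length-k prefix and suffix must both be frequent.
                if window[:-1] in level and window[1:] in level:
                    seen.add(window)
            candidates.update(seen)  # a transaction supports each sub-path at most once
        level = {p: n for p, n in candidates.items() if n >= min_support}
        frequent.update(level)
        k += 1
    return frequent
```

The pruning step is what keeps this tractable. Any extension of an infrequent sub-path is itself infrequent, so those candidates are never counted.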



Conclusion

Algorithms used

Resources and techniques used for the implementation of the Quranic Arabic Corpus

How data mining is applied

How text analytics has also been applied



Thank you

:-)

