Data Mining and Text Analytics

Quranic Arabic Corpus

By Saima Rahna & Anees Mohammad


Summary

The Quranic Arabic Corpus enables further analysis of the Quran.

It uses linguistic resources for each word and verse in the Quran, e.g. morphology and syntax.

Automated algorithms were applied to the text of the Quran.


Islam originated in Arabia about 1,400 years ago.

The key sacred texts are in Arabic.

Only a minority of Muslims can speak and understand Arabic.

A larger percentage of Muslims know English as a second or even first language.

Web resources and book resources use English in parallel with Arabic.

Data Mining

Uses tools and techniques to extract patterns from data.

Different aspects of a single topic in the Quran can reappear in many chapters.

Frequent patterns can therefore be used to construct a subject index in which all verses on a single topic can be found easily.

Text Analytics

Also referred to as information extraction.

The Quranic Arabic Corpus benefits readers who do not understand Arabic.

It can give English readers better insight into the source text.

The translation is given at a detailed, text-analytic level.

Resources & Techniques

Statistical techniques

Implementing statistical techniques such as keyword extraction

Can explore semiotic relationships between sound and meaning in the Quran

Recognise recurring patterns with a high level of accuracy

Linguistic resource

Arabic grammar and syntax are annotated for each word in the Quran.

A comment-based system is used online so that visitors can discuss and correct the data.


The Quranic Arabic Corpus uses Java to implement its algorithms.

Search feature

(searching concepts and key words in the Holy Quran)

Finding multi-word repetitions

Mining frequent patterns in a graph
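
The "finding multi-word repetitions" feature listed above can be illustrated by counting runs of consecutive words. The slides do not describe how the corpus does this, so the following is only a minimal sketch; the class name `RepeatedPhraseSketch`, the method `repeatedPhrases` and the parameter `n` are hypothetical names chosen for illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class RepeatedPhraseSketch {

    // Returns every run of n consecutive words that occurs at least twice,
    // mapped to the number of times it occurs.
    static Map<String, Integer> repeatedPhrases(List<String> words, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be at least 1");
        }
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (int i = 0; i + n <= words.size(); i++) {
            String phrase = String.join(" ", words.subList(i, i + n));
            counts.merge(phrase, 1, Integer::sum);
        }
        // Drop phrases seen only once; what remains are the repetitions.
        counts.values().removeIf(count -> count < 2);
        return counts;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("a b c a b d a b".split(" "));
        System.out.println(repeatedPhrases(words, 2));
    }
}
```

A real implementation would probably normalise the Arabic text first and try several phrase lengths; this sketch handles one length per call.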

Algorithm for indexing the Quran

When a word is encountered for the first time, it is added to the index together with its location; if it already exists there, the new location is added to its list.

For each verse V
    parse word list -> list(W)
    For each word W
        If INDEX does not contain W
            add W and W.location to INDEX
        Else
            fetch W in INDEX
            add new location to W
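
The pseudocode above can be sketched in Java as follows. This is an illustration under stated assumptions, not the corpus's actual code: the class name `VerseIndexSketch`, whitespace tokenisation, and the `verse:word` location format are all choices made for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class VerseIndexSketch {

    // Builds a word -> locations index. A location is written as
    // "verseNumber:wordPosition" (both 1-based), an assumed format.
    static Map<String, List<String>> buildIndex(List<String> verses) {
        Map<String, List<String>> index = new LinkedHashMap<>();
        for (int v = 0; v < verses.size(); v++) {
            String verse = verses.get(v).trim();
            if (verse.isEmpty()) {
                continue;
            }
            String[] words = verse.split("\\s+");
            for (int w = 0; w < words.length; w++) {
                String location = (v + 1) + ":" + (w + 1);
                // First sighting creates the entry; later sightings
                // append a new location to the existing list.
                index.computeIfAbsent(words[w], key -> new ArrayList<>())
                     .add(location);
            }
        }
        return index;
    }

    public static void main(String[] args) {
        List<String> verses = Arrays.asList("in the name", "the name of the book");
        System.out.println(buildIndex(verses));
    }
}
```

`computeIfAbsent` folds the "If ... Else" branches of the pseudocode into one call: it inserts an empty list when the word is new and returns the existing list otherwise.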

Filtering algorithm

The Quranic 'quote filtering' algorithm

The Quran is written with Arabic diacritics (symbols).

The filtering algorithm applies three filtering stages to the input text.
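
The slides do not say what the three filtering stages are. As a hedged illustration of the kind of step such a filter might include, the sketch below strips common Arabic diacritics so that vocalised and unvocalised spellings of a word compare equal. The class name `DiacriticFilterSketch` and the exact set of characters removed are assumptions, not the corpus's actual rules.

```java
import java.util.regex.Pattern;

final class DiacriticFilterSketch {

    // U+064B to U+0652 cover fathatan through sukun (the short-vowel
    // and related marks); U+0670 is the superscript alef. This character
    // set is an assumption made for the example.
    private static final Pattern DIACRITICS =
            Pattern.compile("[\\u064B-\\u0652\\u0670]");

    // Returns the text with the diacritics above removed.
    static String stripDiacritics(String text) {
        return DIACRITICS.matcher(text).replaceAll("");
    }

    public static void main(String[] args) {
        // "bismi" written with kasra and sukun marks.
        String vocalised = "\u0628\u0650\u0633\u0652\u0645\u0650";
        System.out.println(stripDiacritics(vocalised));
    }
}
```

An explicit character range is used here rather than Unicode normalisation, because decomposing the text first would also alter letters such as alef with hamza.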

Algorithm: Sub-path Mining

This is used to generate frequent patterns within the Quranic corpus.

The process starts by scanning the transaction database and calculating the count for each vertex in the graph.
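
Only this first scan is described, so the sketch below covers that step alone. It assumes each transaction is a list of vertex labels and that a vertex counts once per transaction; the class `VertexCountSketch` and the `minSupport` threshold are hypothetical additions for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class VertexCountSketch {

    // Counts, for each vertex label, how many transactions contain it.
    static Map<String, Integer> countVertices(List<List<String>> transactions) {
        Map<String, Integer> counts = new TreeMap<>();
        for (List<String> transaction : transactions) {
            // The set makes a vertex repeated in one transaction count once.
            for (String vertex : new HashSet<>(transaction)) {
                counts.merge(vertex, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Keeps only vertices whose count reaches the assumed threshold.
    static Map<String, Integer> frequentVertices(Map<String, Integer> counts,
                                                 int minSupport) {
        Map<String, Integer> frequent = new TreeMap<>();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            if (entry.getValue() >= minSupport) {
                frequent.put(entry.getKey(), entry.getValue());
            }
        }
        return frequent;
    }

    public static void main(String[] args) {
        List<List<String>> transactions = Arrays.asList(
                Arrays.asList("a", "b", "c"),
                Arrays.asList("a", "c"),
                Arrays.asList("a", "a", "d"));
        System.out.println(frequentVertices(countVertices(transactions), 2));
    }
}
```

The later stages, which extend frequent vertices into frequent sub-paths, are not covered by the slides and are left out here.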


Algorithms used

Resources and techniques used for implementation of the Quranic Arabic Corpus

How data mining is applied

How text analytics has also been applied