Data Mining and Text Analytics
Quranic Arabic Corpus
By Saima Rahna & Anees Mohammad
The Quranic Arabic Corpus enables further analysis of the Quran
Provides linguistic resources for each word and verse in the Quran, e.g. morphology and syntax
Automated algorithms were applied to the text of the Quran.
Islam was born in Arabia (1400 years ago)
The key sacred texts are in Arabic
Only a minority of Muslims can speak and understand Arabic
A larger percentage of Muslims know English as a second, or even first, language
Web resources and book resources use English in parallel with Arabic.
Uses tools and techniques to extract data
Different aspects of a single topic in the Quran can reappear in many chapters
Therefore frequent patterns can be used to construct a subject index in which all verses on a single topic can be found easily.
Referred to as information extraction
The Quranic corpus is an advantage to those who don't understand Arabic
Can give the English readers a better insight into the source
The translation is at a detailed text-analytic level
Implementing statistical techniques such as keyword extraction
Can explore semiotic relationships between sound and meaning in the Quran
Recognise recurring patterns with a high level of accuracy
Arabic grammar and syntax are given for each word in the Quran
An online comment-based system lets visitors discuss and correct the data.
The Quranic Arabic Corpus project used Java to implement its algorithms.
(searching concepts and key words in the Holy Quran)
Finding multi-word repetitions
Mining frequent patterns to a graph.
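Finding multi-word repetitions can be illustrated with a small sketch. This is a minimal Python example, not the corpus project's own code (the slides say that was written in Java). It assumes verses arrive as whitespace-separated strings, and the function name `repeated_ngrams` and its parameters `n` and `min_count` are hypothetical. It counts every run of `n` consecutive words and keeps the runs that occur at least `min_count` times.

```python
from collections import Counter


def repeated_ngrams(verses, n=2, min_count=2):
    """Return word n-grams that occur at least min_count times across verses.

    verses is an iterable of strings; n-grams never cross a verse boundary.
    """
    counts = Counter()
    for text in verses:
        words = text.split()
        # Slide a window of n words over the verse.
        for i in range(len(words) - n + 1):
            counts[tuple(words[i:i + n])] += 1
    return {gram: c for gram, c in counts.items() if c >= min_count}


# Toy placeholder tokens, not real Quranic text.
sample = ["a b c", "x a b", "b c d"]
repeats = repeated_ngrams(sample, n=2)
```

With the toy input, only the pairs that appear in two different verses are kept; pairs seen once are dropped.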
When a word is encountered for the first time, it is added to the index; if it already exists there, then a new location is added to its list.
For each verse V
    parse word list -> list(W)
    For each word W
        If INDEX does not contain W
            add W and W.location to INDEX
        Else
            fetch W in INDEX
            add new location to W
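The indexing pseudocode above can be sketched in Python as follows. This is an illustration only, assuming each verse is a pair of an identifier and a whitespace-separated string, and that a location is a (verse identifier, word position) pair; the name `build_index` and the toy data are hypothetical and not taken from the corpus.

```python
def build_index(verses):
    """Build a word index: each word maps to the list of places it occurs.

    verses is an iterable of (verse_id, text) pairs. A location is recorded
    as (verse_id, position), with positions counted from 1.
    """
    index = {}
    for verse_id, text in verses:
        for position, word in enumerate(text.split(), start=1):
            location = (verse_id, position)
            if word not in index:
                # First time the word is seen: create its entry.
                index[word] = [location]
            else:
                # Word already indexed: append the new location.
                index[word].append(location)
    return index


# Toy placeholder tokens, not real Quranic text.
verses = [("v1", "a b a"), ("v2", "b c")]
index = build_index(verses)
```

Looking up a word in the resulting dictionary returns every location at once, which is what makes keyword search over the whole text fast.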
The Quranic 'quote filtering' algorithm
The Quranic text is written with Arabic diacritics (symbols)
The filtering algorithm applies 3 filtering stages to the input text.
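The slides do not say what the three filtering stages are, so the following is not a reconstruction of them. It only sketches one normalisation step that is commonly needed when matching text against a diacritised source: removing the diacritic marks so that vocalised and unvocalised spellings compare equal. The function name `strip_diacritics` is hypothetical, and treating every Unicode nonspacing mark (category `Mn`) as a diacritic is an assumption of this sketch.

```python
import unicodedata


def strip_diacritics(text):
    """Remove combining marks (Unicode category Mn) from text.

    Arabic short-vowel marks such as kasra (U+0650) and sukun (U+0652)
    fall in this category; base letters are left untouched.
    """
    return "".join(ch for ch in text if unicodedata.category(ch) != "Mn")


# ba + kasra, sin + sukun, mim + kasra, written with escapes for clarity.
vocalised = "\u0628\u0650\u0633\u0652\u0645\u0650"
bare = strip_diacritics(vocalised)
```

After this step a candidate quote typed without diacritics can be compared directly with the stripped form of the source text.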
Algorithm: Sub-path Mining
This is used to generate frequent patterns within the Quran corpus
The process starts by scanning the transaction database, calculating the count for each vertex in the graph
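The first scan described above can be sketched as follows. This is a minimal illustration under stated assumptions: each transaction is taken to be a list of vertex labels, a vertex is counted once per transaction in which it appears, and the names `vertex_counts`, `frequent_vertices` and `min_support` are hypothetical. The later stages of sub-path mining are not covered by the slides and are not shown.

```python
from collections import Counter


def vertex_counts(transactions):
    """Count, for each vertex, the number of transactions that contain it."""
    counts = Counter()
    for path in transactions:
        # set() ensures a vertex repeated inside one transaction counts once.
        counts.update(set(path))
    return counts


def frequent_vertices(transactions, min_support):
    """Return the vertices whose count reaches min_support."""
    counts = vertex_counts(transactions)
    return {vertex for vertex, c in counts.items() if c >= min_support}


# Toy placeholder transactions, not real corpus data.
transactions = [["a", "b", "c"], ["a", "c"], ["b", "c", "c"]]
counts = vertex_counts(transactions)
```

Vertices that fall below the chosen threshold can be discarded before longer paths are considered, which keeps the search space small.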
Resources and techniques used for the implementation of the Quranic Arabic Corpus
How data mining is applied
How text analytics has also been applied