
The use of an intelligent forum crawler for data retrieval from e-learning portals

6th International Conference on Education and New Learning Technologies, Barcelona, 7th - 9th of July 2014. The use of an intelligent forum crawler for data retrieval from e-learning portals.


Presentation Transcript


  1. 6th International Conference on Education and New Learning Technologies, Barcelona, 7th - 9th of July 2014 • The use of an intelligent forum crawler for data retrieval from e-learning portals • Miloš Pavković and Jelica Protić, University of Belgrade, School of Electrical Engineering, Belgrade, Serbia

  2. Introduction • There is a large number of forums covering different topics • Forums are often used by students during their studies • A large amount of relevant information is scattered across different forums inside one university domain • Forums are based on different technologies

  3. Issues • The same topic can appear across different forums inside one university domain • Official school forums vs. independent department forums • The same documents can be uploaded as post attachments to several different web forums • Similar courses exist at different schools

  4. Solution – Specialized crawler • Specialized forum crawler • Aggregation of crawled data from multiple forums of a single university domain • Storing the data in a database • Forum modules that use this database to help students

  5. Forum structure • Always defined by presented implicit paths • Example of a) forum b) thread c) attachments inside post.

  6. Crawler algorithm • FCbRE – Forum Crawler based on Regular Expressions • Automated system • Identifying the DOM structure and basic forum elements with regular expressions • Identifying forum implicit paths using regex. Example: >>index\.php\?showforum\==\digit+!>+>\P=!<+ • Extracting post content and storing it in the database
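The idea of locating implicit forum paths with a regular expression can be sketched as follows. This is only an illustration: the pattern `FORUM_LINK`, the function `find_forum_links`, and the sample HTML are assumptions made for the example (modeled on `index.php?showforum=<id>` style links), not the expression or code used in FCbRE.

```python
import re

# Hypothetical pattern (an assumption, not the authors' expression):
# an anchor whose href ends in index.php?showforum=<digits>, followed by its text.
FORUM_LINK = re.compile(
    r'href="(?P<link>[^"]*index\.php\?showforum=(?P<id>\d+))"[^>]*>(?P<name>[^<]+)<'
)

def find_forum_links(html):
    """Return (forum id, forum name, forum link) for every matching anchor."""
    return [
        (int(m.group("id")), m.group("name").strip(), m.group("link"))
        for m in FORUM_LINK.finditer(html)
    ]

# Made-up sample page fragment: two forum links and one thread link.
sample = '''
<a href="index.php?showforum=12">Algorithms</a>
<a href="index.php?showtopic=99">Exam dates</a>
<a href="http://example.edu/forum/index.php?showforum=7" class="f">Databases</a>
'''

for forum in find_forum_links(sample):
    print(forum)
```

Only the two `showforum` links are returned; the `showtopic` link would need its own thread-level pattern.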

  7. Crawler database • Essential in the FCbRE model • Forum threads and posts are stored separately • Similarity tables contain unique pairs of identifiers of forums, threads and attachments • Diagram tables: Web Forum (site id, site name, site link); Forums (+ site id, forum id, forum name, forum link); Threads (+ forum id, thread id, thread name, thread link); Posts (+ thread id, post id, post info); Attach (+ post id, attach id, attach name, attach link) • Similarity tables: T – Simil. (+ thread id (1), + thread id (2)); A – Simil. (+ attach id (1), + attach id (2)); F – Simil. (+ forum id (1), + forum id (2)); F/T – Simil. (+ forum id, + thread id)
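One possible way to express the tables from this slide is an SQLite schema. The sketch below is an assumption throughout: the snake_case table and column names, the column types, the choice of SQLite, and the helper `add_thread_similarity` are illustrative and are not taken from the FCbRE implementation.

```python
import sqlite3

# Assumed schema: names and types are illustrative, derived from the slide's field lists.
SCHEMA = """
CREATE TABLE web_forum (site_id INTEGER PRIMARY KEY, site_name TEXT, site_link TEXT);
CREATE TABLE forums (forum_id INTEGER PRIMARY KEY,
                     site_id INTEGER REFERENCES web_forum(site_id),
                     forum_name TEXT, forum_link TEXT);
CREATE TABLE threads (thread_id INTEGER PRIMARY KEY,
                      forum_id INTEGER REFERENCES forums(forum_id),
                      thread_name TEXT, thread_link TEXT);
CREATE TABLE posts (post_id INTEGER PRIMARY KEY,
                    thread_id INTEGER REFERENCES threads(thread_id),
                    post_info TEXT);
CREATE TABLE attach (attach_id INTEGER PRIMARY KEY,
                     post_id INTEGER REFERENCES posts(post_id),
                     attach_name TEXT, attach_link TEXT);
-- Similarity tables hold unique pairs of identifiers.
CREATE TABLE thread_similarity (thread_id_1 INTEGER, thread_id_2 INTEGER,
                                PRIMARY KEY (thread_id_1, thread_id_2));
CREATE TABLE attach_similarity (attach_id_1 INTEGER, attach_id_2 INTEGER,
                                PRIMARY KEY (attach_id_1, attach_id_2));
CREATE TABLE forum_similarity (forum_id_1 INTEGER, forum_id_2 INTEGER,
                               PRIMARY KEY (forum_id_1, forum_id_2));
CREATE TABLE forum_thread_similarity (forum_id INTEGER, thread_id INTEGER,
                                      PRIMARY KEY (forum_id, thread_id));
"""

def create_database(path=":memory:"):
    """Create the assumed crawler schema and return the open connection."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn

def add_thread_similarity(conn, thread_a, thread_b):
    """Store a thread pair once, regardless of argument order."""
    low, high = sorted((thread_a, thread_b))
    conn.execute(
        "INSERT OR IGNORE INTO thread_similarity VALUES (?, ?)", (low, high)
    )

conn = create_database()
add_thread_similarity(conn, 5, 3)
add_thread_similarity(conn, 3, 5)  # same pair in reverse order, ignored
print(conn.execute("SELECT * FROM thread_similarity").fetchall())
```

Ordering each pair before inserting it is one simple way to keep the pairs unique, as the slide requires; other designs are possible.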

  8. Finding similarities • Determining the similarity of forum, thread or document names • Comparing the words alone is not enough, because of: grammatical errors; singular/plural forms; different wording with the same semantic meaning • Existing search engines are used to distinguish semantics • FCbRE uses low-level semantic difference
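The slides do not specify how the comparison is computed, so the following is only a rough sketch of a name comparison that tolerates spelling errors and singular/plural forms. The functions `normalize` and `similarity`, the naive plural stripping, and the use of Python's `difflib.SequenceMatcher` are all assumptions standing in for the FCbRE method; handling different wording with the same meaning would need more than this.

```python
import re
from difflib import SequenceMatcher

def normalize(name):
    """Lowercase, split into words, and strip a trailing 's' as a naive plural rule."""
    words = re.findall(r"[a-z0-9]+", name.lower())
    return [w[:-1] if len(w) > 3 and w.endswith("s") else w for w in words]

def similarity(name_a, name_b):
    """Return a score in [0, 1]; word order is ignored by sorting the words."""
    text_a = " ".join(sorted(normalize(name_a)))
    text_b = " ".join(sorted(normalize(name_b)))
    return SequenceMatcher(None, text_a, text_b).ratio()

# Made-up example names.
print(similarity("Operating Systems", "operating system"))  # plural form only
print(similarity("Database exam", "Databse exams"))         # typo plus plural
print(similarity("Operating Systems", "Linear algebra"))    # unrelated names
```

A threshold on the score (chosen empirically) would then decide which pairs are recorded as similar.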

  9. Module plugins • Two module plugins • FCbRE-S (FCbRE Search plugin) • FCbRE-DP (FCbRE Duplicate Prevention plugin) • Both are used for experimental purposes • Written for vBulletin technology • Can be adapted for any other forum technology

  10. FCbRE-S (FCbRE Search plugin) • Designed for standard forum searches • Forwards the requested query to the FCbRE database for similarity comparison • All similarities are shown in addition to the standard search results

  11. FCbRE-DP (Duplicate Prevention plugin) • Implemented in the section where users can create a topic or forum • Monitors the field for the name of the new thread or forum • Notifies the user that a similarity exists
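The check performed when a user types a new thread or forum name could look roughly like the sketch below. It is written in Python for illustration only (the plugins themselves target vBulletin), and the function `similar_existing_names`, the cutoff value, and the sample names are assumptions rather than details from the presentation.

```python
from difflib import get_close_matches

def similar_existing_names(new_name, existing_names, cutoff=0.8, limit=5):
    """Return existing names that closely match the proposed name (case-insensitive)."""
    by_lowercase = {}
    for name in existing_names:
        by_lowercase.setdefault(name.lower(), name)
    hits = get_close_matches(
        new_name.lower(), list(by_lowercase), n=limit, cutoff=cutoff
    )
    return [by_lowercase[hit] for hit in hits]

# Made-up names standing in for thread names already stored by the crawler.
existing = [
    "Operating Systems - exam questions",
    "Databases lab 3",
    "Computer networks",
]

matches = similar_existing_names("operating system exam questions", existing)
if matches:
    print("Similar threads already exist:", matches)
```

An empty result means no warning is shown and the thread can be created as usual.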

  12. Results • 9 web forums from the University of Belgrade, manually gathered • This group is a mixture from different sources • The percentage of similar forums is the smallest, while the percentage of similar documents is the highest • The true percentage of "useful" duplicates should be interpreted with caution

  13. Conclusion • The proposed solution aggregates information from related forums • It has the potential to reduce duplication of forums, topics and posts • The use of plugins would result in higher forum content quality

  14. Thank you! Feel free to contact us and ask any question that you may find interesting: milos_pavkovic@yahoo.com, jeca@etf.bg.ac.rs
