Team members: CSIE 4A 87068800 王俊傑, CSIE 4B 87070300 陳國富, CSIE 4B 87070600 夏希璿

Presentation Transcript


  1. IR. Team members: CSIE 4A 87068800 王俊傑, CSIE 4B 87070300 陳國富, CSIE 4B 87070600 夏希璿

  2. Development environment. Web interface: (1) Web server: Internet Information Services; (2) Web scripting language: PHP. Indexing program: Perl scripting language. Database: MySQL.

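  3. Program architecture and flow. Indexing: Document -> remove stop words -> Indexing -> database that stores the index. Query: Query -> web interface -> query sent to the database that stores the index -> result returned -> Result.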

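  4. Indexing steps
     1. Read in each file and remove stop words (the stop-word list is entered manually before the program runs).
     2. Store each word, together with the documents it appears in, in the database.
     3. Process the first half of the files with steps 1 and 2.
     4. Remove from the database every word that does not satisfy N/Ni > 10.
     N: total number of documents. Ni: number of documents in which a given word has appeared.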

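  5. Indexing steps
     5. Repeat steps 1, 2, and 3 to process the remaining files.
     6. Then reweight each count: dfi = dfi * (1 + log(N/Ni)) when Ni > 0, and dfi is left unchanged when Ni = 0 (that is, the word has not appeared). If dfi > (total number of words in the document) / 100, the word is taken as an index term.
     dfi: the number of times a given word appears in a single document.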
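
The pruning rule in step 4 and the weighting rule in step 6 can be sketched as follows. This is a minimal illustration in Python, not the team's Perl program: the stop-word list, the function names, the whitespace tokenizer, and the in-memory dictionaries standing in for the MySQL tables are all assumptions. The slides do not give the logarithm base or say whether the document word count is taken before or after stop-word removal; the sketch assumes the natural logarithm and the count after removal.

```python
import math
from collections import Counter, defaultdict

# Hypothetical stop-word list; the slides say it is entered by hand.
STOP_WORDS = {"the", "a", "of", "and", "is"}


def tokenize(text):
    """Lowercase, split on whitespace, and drop stop words (step 1)."""
    return [w for w in text.lower().split() if w not in STOP_WORDS]


def build_postings(docs):
    """Map each word to the set of document ids it appears in (step 2)."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            postings[word].add(doc_id)
    return postings


def prune_common_words(postings, n_docs, min_ratio=10):
    """Keep only words that satisfy N / Ni > min_ratio (step 4)."""
    return {w: ids for w, ids in postings.items()
            if n_docs / len(ids) > min_ratio}


def select_index_terms(text, doc_freq, n_docs):
    """Weight each in-document count and apply the 1/100 threshold (step 6).

    doc_freq maps a word to Ni; a missing word is treated as Ni = 0.
    """
    words = tokenize(text)
    threshold = len(words) / 100  # assumed: count after stop-word removal
    selected = {}
    for word, count in Counter(words).items():
        ni = doc_freq.get(word, 0)
        # Natural log is an assumption; the slides only write "log".
        weight = count * (1 + math.log(n_docs / ni)) if ni > 0 else count
        if weight > threshold:
            selected[word] = weight
    return selected
```

With N = 11, a word found in a single document gives N/Ni = 11 and survives pruning, while a word found in all 11 documents gives N/Ni = 1 and is removed.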

  6. Indexing run time. Indexing: about 5 to 10 minutes per document (including the time to remove stop words and select index terms). Searching: about 5 to 10 seconds when the query is a single word, and about 30 seconds when the query is two words.
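
The query path on slide 3 (query in, index lookup, result out) can be sketched as a lookup against an inverted index. This is only an illustration: the `search` function and the dictionary standing in for the MySQL index are hypothetical, and the slides do not say how a two-word query combines its words, so requiring every word to match is an assumption.

```python
def search(index, query):
    """Return ids of documents whose index terms include every query word.

    index maps an index term to a set of document ids (a stand-in for
    the database table; the real schema is not given in the slides).
    """
    words = query.lower().split()
    if not words:
        return set()
    # Copy the first posting set so the index itself is never modified.
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        # Assumed AND semantics: keep only documents matching all words.
        result &= index.get(word, set())
    return result
```

Each extra query word adds one more index lookup and one set intersection, which fits the slides' observation that a two-word query takes longer than a single-word one.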