Stanford Events Crawler
Zoe Chu, Michael Tung
Architectural Overview
[Diagram; components recovered from its labels]
• Sources: web (http), newsgroup (nntp), mailing list (pop)
• Backend: Crawler, "Event?" classifier, Extractor
• Event tuple, DBMS
• Frontend: Presentation layer, Applications, Notification, User
Crawling/Classification
• Event pages
• Index and detail pages
• 10 events/sec
Extraction/Normalization
• LR Wrappers
• Segmentation for email: decision tree classifier
• Hand-written rules for field extraction
• Date & Time Normalization
• Building Normalization
  • Edit distance against a lexicon of Stanford building names
• Free text search: Lucene
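
The building normalization step (edit distance against a lexicon) might look roughly like the sketch below. It is only an illustration under stated assumptions: the slides do not say which edit distance was used, so plain Levenshtein distance is assumed here, and `BUILDING_LEXICON`, `normalize_building`, and the `max_ratio` cutoff are hypothetical names and values, not the authors' code or data.

```python
from typing import List, Optional


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed with two rows of the DP table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute (or match)
            ))
        prev = curr
    return prev[-1]


# Hypothetical lexicon for illustration; the real list is not given in the slides.
BUILDING_LEXICON: List[str] = [
    "Gates Computer Science",
    "Packard Electrical Engineering",
    "Memorial Auditorium",
    "Tresidder Memorial Union",
]


def normalize_building(raw: str,
                       lexicon: List[str] = BUILDING_LEXICON,
                       max_ratio: float = 0.4) -> Optional[str]:
    """Return the closest lexicon entry, or None if nothing is close enough.

    max_ratio is an assumed cutoff: the match is rejected when the distance
    exceeds that fraction of the longer string's length.
    """
    query = raw.strip().lower()
    if not query:
        return None
    best = min(lexicon, key=lambda name: edit_distance(query, name.lower()))
    dist = edit_distance(query, best.lower())
    if dist > max_ratio * max(len(query), len(best)):
        return None
    return best
```

Calling `normalize_building("Gates Computer Sciense")` maps the misspelled string onto the lexicon entry `"Gates Computer Science"`, while a string far from every entry yields `None` instead of a forced match.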
Which fields?
• Title
• Date
• Time
• Location
• Category
• Sponsor
• Contact info
• Admission/fees
• Speaker
• Food
• Description
• Building
• X,Y physical coordinates
• Picture of building
• Map
• Nearby buildings
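
One way to picture the event tuple built from these fields is a simple record type. The sketch below is an assumption for illustration: the slides list the field names only, so the class name `EventTuple`, the Python attribute names, the types, and the `missing_fields` helper are all hypothetical.

```python
from dataclasses import dataclass, field, fields
from typing import List, Optional, Tuple


@dataclass
class EventTuple:
    """Hypothetical record with one attribute per field listed on the slide."""
    title: Optional[str] = None
    date: Optional[str] = None            # assumed to hold a normalized date string
    time: Optional[str] = None            # assumed to hold a normalized time string
    location: Optional[str] = None
    category: Optional[str] = None
    sponsor: Optional[str] = None
    contact_info: Optional[str] = None
    admission_fees: Optional[str] = None
    speaker: Optional[str] = None
    food: Optional[str] = None
    description: Optional[str] = None
    building: Optional[str] = None        # normalized building name
    coordinates: Optional[Tuple[float, float]] = None  # X,Y physical coordinates
    building_picture: Optional[str] = None             # assumed: path or URL
    map: Optional[str] = None                          # assumed: path or URL
    nearby_buildings: List[str] = field(default_factory=list)


def missing_fields(event: EventTuple) -> List[str]:
    """Names of fields the extractor left empty (None or an empty list)."""
    return [f.name for f in fields(event)
            if getattr(event, f.name) in (None, [])]
```

For example, `missing_fields(EventTuple(title="Seminar", building="Gates Computer Science"))` lists every field except `title` and `building`, which gives a quick view of how complete an extracted event is.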