
Intro of Dataset used in Dissertation Research

Intro of Dataset used in Dissertation Research, by Xiangyu Fan. Research topic and data used: the research focuses on recommending medical information, uses medical topic overlap to help improve recommendation, uses simulation as the research method, and uses the MedicalNewsDaily website as the data source.


Presentation Transcript


  1. Intro of Dataset used in Dissertation Research
     Xiangyu Fan

  2. Research Topic and Used Data
     • Focused on recommendation of medical information
     • Uses medical topic overlap to help improve recommendation
     • Uses simulation as the research method
     • Uses the MedicalNewsDaily website as the source
       • One of the most popular medical news websites
       • Good categorization by professionals
       • 123 unique topics (i.e. categories)
       • Each news article has one main topic and 0-4 sub topics

  3. Building Dataset
     Three steps to build the dataset:
     • Select 123 unique topics (i.e. the 123 categories in the data source)
     • Crawl 100 recent medical news articles for each topic and store them in the DB
     • Retrieve the main and sub topics for each document and store them in the DB

  4. Tables in DB
     • Article table: 12,300 records (12,300 articles)
       • Article title
       • Article content
       • Publish date
       • Source
     • Topic table: 34,290 records (frequency of topic occurrence)
       • Article ID
       • Topic name
       • Topic type (main vs. sub)
     Each article has 2.8 topics on average (1 main topic and ~2 sub topics)
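The two tables on this slide could be laid out roughly as in the sketch below. It uses SQLite through Python's standard `sqlite3` module; the table names, column names, and the `build_db` and `topics_per_article` helpers are assumptions made for illustration, since the slides do not give the actual schema or database engine. The inserted rows are toy data.

```python
import sqlite3

# Hypothetical schema: all names are illustrative, not taken from the slides.
SCHEMA = """
CREATE TABLE article (
    id           INTEGER PRIMARY KEY,
    title        TEXT NOT NULL,
    content      TEXT NOT NULL,
    publish_date TEXT,
    source       TEXT
);
CREATE TABLE topic (
    article_id INTEGER NOT NULL REFERENCES article(id),
    topic_name TEXT NOT NULL,
    topic_type TEXT NOT NULL CHECK (topic_type IN ('main', 'sub'))
);
"""


def build_db():
    """Create an in-memory database with the two tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def topics_per_article(conn):
    """Average number of topic rows per article row."""
    (n_articles,) = conn.execute("SELECT COUNT(*) FROM article").fetchone()
    (n_topics,) = conn.execute("SELECT COUNT(*) FROM topic").fetchone()
    return n_topics / n_articles


conn = build_db()
# Toy rows, only to show the shape of the data.
conn.execute(
    "INSERT INTO article VALUES (1, 'Example title', 'Example text', '2012-01-01', 'example')"
)
conn.executemany(
    "INSERT INTO topic VALUES (?, ?, ?)",
    [(1, "Headache", "main"), (1, "Neurology", "sub"), (1, "Pain", "sub")],
)
print(topics_per_article(conn))  # 3.0
```

With the record counts reported on the slide, the same ratio is 34,290 / 12,300 ≈ 2.79, which matches the stated average of 2.8 topics per article.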

  5. Sample Question 1
     • What are the frequently occurring sub topics in articles with headache as the main topic?
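A question of this kind can be answered with a self-join on the topic table. The sketch below is one possible query, assuming a hypothetical `topic` table with columns `article_id`, `topic_name`, and `topic_type` (values `'main'` or `'sub'`); these names, the `frequent_sub_topics` helper, and the toy rows are illustrative and not taken from the slides.

```python
import sqlite3


def frequent_sub_topics(conn, main_topic, limit=10):
    """Return (sub_topic, frequency) pairs for articles whose main topic is main_topic."""
    query = """
        SELECT s.topic_name, COUNT(*) AS freq
        FROM topic AS m
        JOIN topic AS s ON s.article_id = m.article_id
        WHERE m.topic_type = 'main' AND m.topic_name = ?
          AND s.topic_type = 'sub'
        GROUP BY s.topic_name
        ORDER BY freq DESC, s.topic_name
        LIMIT ?
    """
    return conn.execute(query, (main_topic, limit)).fetchall()


# Toy data: three articles, two of them with 'Headache' as the main topic.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE topic (article_id INTEGER, topic_name TEXT, topic_type TEXT)")
conn.executemany(
    "INSERT INTO topic VALUES (?, ?, ?)",
    [
        (1, "Headache", "main"), (1, "Neurology", "sub"), (1, "Pain", "sub"),
        (2, "Headache", "main"), (2, "Neurology", "sub"),
        (3, "Diabetes", "main"), (3, "Pain", "sub"),
    ],
)
result = frequent_sub_topics(conn, "Headache")
print(result)  # [('Neurology', 2), ('Pain', 1)]
```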

  6. Topic Distribution

  7. Sample Question 2
     • Which topic pairs have the strongest correlation?
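The slides do not say how correlation between topics is measured. As a minimal sketch, the code below assumes raw co-occurrence frequency (the number of articles in which both topics appear, as main or sub topic) as the measure; the `pair_cooccurrence` helper and the toy articles are hypothetical.

```python
from collections import Counter
from itertools import combinations


def pair_cooccurrence(article_topics):
    """Count, for each unordered topic pair, the articles that carry both topics."""
    counts = Counter()
    for topics in article_topics.values():
        # sorted(set(...)) gives each pair one canonical order and ignores duplicates
        for pair in combinations(sorted(set(topics)), 2):
            counts[pair] += 1
    return counts


# Toy data: article ID -> all topics (main and sub) attached to that article.
article_topics = {
    1: ["Headache", "Neurology", "Pain"],
    2: ["Headache", "Neurology"],
    3: ["Diabetes", "Pain"],
}
counts = pair_cooccurrence(article_topics)
strongest = counts.most_common(1)
print(strongest)  # [(('Headache', 'Neurology'), 2)]
```

Raw counts favor topics that are frequent overall; a normalized measure would rank pairs differently, but nothing in the slides indicates which was used.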

  8. Building Simulation Dataset
     • Topics with strong overlap
       • Select 30 topics with the highest frequency of co-occurrence
       • Average co-occurrence frequency: 63
     • Topics with weak overlap
       • Select 30 topics with the lowest frequency of co-occurrence
       • Average co-occurrence frequency: 1
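The selection described on this slide can be sketched as a ranking by co-occurrence frequency followed by taking both ends of the list. The slide does not say whether the ranking is over single topics or topic pairs, nor how ties are broken; the sketch assumes topic pairs and alphabetical tie-breaking. The `split_by_overlap` and `average_freq` helpers and the toy counts are hypothetical.

```python
from statistics import mean


def split_by_overlap(pair_counts, k=30):
    """Return the k most and k least frequently co-occurring pairs."""
    # Highest frequency first; ties broken by pair name so the result is stable.
    ranked = sorted(pair_counts.items(), key=lambda item: (-item[1], item[0]))
    strong = ranked[:k]
    weak = ranked[-k:]
    return strong, weak


def average_freq(pairs):
    """Mean co-occurrence frequency over a list of (pair, freq) items."""
    return mean(freq for _, freq in pairs)


# Toy counts, not the dissertation data.
pair_counts = {
    ("A", "B"): 10,
    ("A", "C"): 8,
    ("A", "D"): 4,
    ("B", "C"): 1,
    ("C", "D"): 1,
}
strong, weak = split_by_overlap(pair_counts, k=2)
print(average_freq(strong))  # 9
print(average_freq(weak))    # 1
```

If `k` exceeds half the number of pairs, the two groups overlap, so a real run would need at least `2 * k` distinct pairs.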
