
Researcher Portal

Researcher Portal, Group 26 (WANG Jingwei, JIANG Yu). Topics: introduction, data sources, data collection by keyword search (Scrapy, PHP) over the keywords Algorithm, Databases, Data mining, Computer, Computing, Internet, Network, Recognition, Software and System, and schema mapping.


Presentation Transcript


  1. Researcher Portal, Group 26: WANG Jingwei, JIANG Yu

  2. Introduction

  3. Data Sources

  4. Data Collection
     • Keyword search (tools: Scrapy, PHP)
     • Keywords: Algorithm, Databases, Data mining, Computer, Computing, Internet, Network, Recognition, Software, System
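
Searching once per keyword will return the same paper under several keywords, so the collected records overlap. The sketch below is plain Python, not Scrapy code. It assumes a hypothetical `search` callable that returns paper records as dicts. It also assumes duplicates are detected by a normalized `Title` field. The slide does not say how overlaps were handled.

```python
from typing import Callable, Dict, Iterable, List

# Keywords listed on the "Data Collection" slide.
KEYWORDS = [
    "Algorithm", "Databases", "Data mining", "Computer", "Computing",
    "Internet", "Network", "Recognition", "Software", "System",
]

# Hypothetical search callable: takes one keyword, returns paper records as dicts.
# The slide names Scrapy and PHP as the collection tools; this stand-in uses neither.
SearchFn = Callable[[str], Iterable[Dict[str, str]]]


def normalize_title(title: str) -> str:
    """Lower-case a title and collapse runs of whitespace."""
    return " ".join(title.lower().split())


def collect(keywords: List[str], search: SearchFn) -> List[Dict[str, str]]:
    """Search once per keyword; keep the first record seen for each normalized title."""
    seen = set()
    results: List[Dict[str, str]] = []
    for keyword in keywords:
        for record in search(keyword):
            key = normalize_title(record.get("Title", ""))
            if not key or key in seen:
                # Skip untitled records and papers already found under an earlier keyword.
                continue
            seen.add(key)
            results.append({**record, "Keyword": keyword})
    return results


# Usage with an in-memory stand-in for a real search backend.
fake_index = {
    "Algorithm": [{"Title": "Fast Sorting"}, {"Title": "Graph Search"}],
    "Databases": [{"Title": "fast  sorting"}, {"Title": "Query Planning"}],
}
papers = collect(KEYWORDS, lambda kw: fake_index.get(kw, []))
print([p["Title"] for p in papers])  # ['Fast Sorting', 'Graph Search', 'Query Planning']
```

Each kept record also stores the keyword that first found it. This makes it possible to count later how many papers each keyword contributed.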

  5. Schema Mapping
     • Delete unnecessary attributes
        • Repetition: e.g. journal & alternate journal
        • Useless: e.g. rec-number, db-id, Reference count, PDF link, Patent Citation Count…
     • Redefine data
        • String → Int: Year, Times cited, Volume, Pages
     • Replace inconsistent names: attributes that have different names in different datasets but the same meaning
        • Start page, End page → Pages
        • Journal / Book title → Publication
        • …
     • Split data: Author, Paper
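
The mapping steps above can be sketched as a per-record transformation. This is a minimal sketch under several assumptions:

- The raw field names are spelled exactly as on the slide.
- Authors within a record are separated by ";".
- Pages is stored as a page count (end page minus start page plus one), since the slide lists Pages among the String → Int fields.

`map_record` and `split_record` are hypothetical helper names.

```python
from typing import Dict, List, Optional, Tuple

# Attributes the slide lists as repeated or useless (raw spellings are assumed).
DROP = {"alternate journal", "rec-number", "db-id", "Reference count",
        "PDF link", "Patent Citation Count"}

# Differently named attributes with the same meaning -> one unified name.
RENAME = {"journal": "Publication", "Journal": "Publication", "Book title": "Publication"}

INT_FIELDS = ("Year", "Times cited", "Volume")


def to_int(value: object) -> Optional[int]:
    """Convert a numeric string to int; return None if missing or not numeric."""
    try:
        return int(str(value).strip())
    except (TypeError, ValueError):
        return None


def map_record(raw: Dict[str, str]) -> Dict[str, object]:
    """Drop unneeded attributes, unify names, and convert numeric strings to int."""
    rec: Dict[str, object] = {}
    for key, value in raw.items():
        if key in DROP:
            continue
        rec[RENAME.get(key, key)] = value
    for field in INT_FIELDS:
        if field in rec:
            rec[field] = to_int(rec[field])
    # Merge Start page / End page into Pages (assumed to mean a page count).
    start = to_int(rec.pop("Start page", None))
    end = to_int(rec.pop("End page", None))
    if start is not None and end is not None:
        rec["Pages"] = end - start + 1
    elif "Pages" in rec:
        rec["Pages"] = to_int(rec["Pages"])
    return rec


def split_record(rec: Dict[str, object], paper_id: int
                 ) -> Tuple[Dict[str, object], List[Dict[str, object]]]:
    """Split a mapped record into one Paper row and one Author row per author."""
    authors = [a.strip() for a in str(rec.get("Author", "")).split(";") if a.strip()]
    paper = {k: v for k, v in rec.items() if k != "Author"}
    paper["Id"] = paper_id
    author_rows = [{"Name": a, "PaperId": paper_id} for a in authors]
    return paper, author_rows


# Usage with a made-up raw record.
raw = {"rec-number": "17", "Title": "Query Planning", "Author": "Doe, A.; Roe, B.",
       "journal": "Some Journal", "alternate journal": "Some J.", "Year": "2013",
       "Times cited": "4", "Start page": "10", "End page": "19"}
paper, authors = split_record(map_record(raw), paper_id=1)
print(paper)
# {'Title': 'Query Planning', 'Publication': 'Some Journal', 'Year': 2013, 'Times cited': 4, 'Pages': 10, 'Id': 1}
print(authors)
# [{'Name': 'Doe, A.', 'PaperId': 1}, {'Name': 'Roe, B.', 'PaperId': 1}]
```

Non-numeric strings in an Int field become `None` rather than raising an error. A single malformed value therefore does not abort a whole import.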

  6. Result

  7. Entity Resolution

  8. Result
     • IEEE: Author count 4436; Paper count 8103
     • SCI: Author count 5141; Paper count 5002
     • DBLP: Author count 8489 (attributes: Author, Title); Paper count 11010 (attributes: Id, Score, Title, Author, Venue, Volume, Number, Pages, Year, Type, URL, Publisher, URL7)

  9. Evaluation
     • Explanation
        • M1: Method 1, take the most recent value
        • M2: Method 2, take the most frequently occurring value
     • Reduction Ratio =
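
After matching, several records may give different values for the same attribute of one entity. The two methods above pick a single value. This is a minimal sketch that assumes each candidate value is paired with the year of the record it came from, used as the recency signal. That pairing is an assumption; the slide does not say what "recent" was measured by.

```python
from collections import Counter
from typing import Any, List, Tuple

# One candidate = (attribute value, year of the source record). The year is an assumed recency signal.
Candidate = Tuple[Any, int]


def most_recent(candidates: List[Candidate]) -> Any:
    """M1: keep the value from the most recent record (first one wins on equal years)."""
    value, _ = max(candidates, key=lambda c: c[1])
    return value


def most_frequent(candidates: List[Candidate]) -> Any:
    """M2: keep the most common value; Counter breaks ties by first occurrence."""
    counts = Counter(value for value, _ in candidates)
    return counts.most_common(1)[0][0]


# Usage: three matched records disagree on the venue name.
venues = [("ICDE", 2011), ("Proc. ICDE", 2013), ("ICDE", 2012)]
print(most_recent(venues))    # Proc. ICDE
print(most_frequent(venues))  # ICDE
```

The two rules can disagree, as in the example. Comparing their outputs against a sample of known-correct values is one way to evaluate them.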

  10. Data Fusion

  11. Data Conflict

  12. Result

  13. Measure
     • Explanation
        • M1: Method 1, assume one data source is accurate
        • M2: Method 2, take the most recent value
     • Reduction Ratio =
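
Method 1 above resolves a conflict by trusting one source. This is a minimal sketch of that rule. It assumes a hypothetical priority order over the three sources named in the results: `SOURCE_PRIORITY` and `trust_source` are illustrative names. The slide does not say which source the authors treated as accurate.

```python
from typing import Dict, List, Optional

# Hypothetical ranking: the first source is treated as accurate. This order is only an example.
SOURCE_PRIORITY = ["DBLP", "IEEE", "SCI"]


def trust_source(values: Dict[str, Optional[str]],
                 priority: List[str] = SOURCE_PRIORITY) -> Optional[str]:
    """M1: return the value from the highest-priority source that has a non-empty value."""
    for source in priority:
        value = values.get(source)
        if value:
            return value
    return None  # no source supplied a value


# Usage: DBLP has no value here, so the next source in the ranking decides.
print(trust_source({"IEEE": "Proc. ICDE", "SCI": "ICDE"}))  # Proc. ICDE
print(trust_source({"SCI": "ICDE"}))                         # ICDE
```

The sketch falls back to the next source when the trusted one has no value. Without that fallback, applying the rule strictly would leave attributes empty for entities missing from the trusted source.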

  14. Application

  15. Thank you
