Build the NY Times Subject Headings and Topics in the Cloud

Dr. Brand Niemann, Director and Senior Data Scientist, Semantic Community. July 4, 2011.


Presentation Transcript


  1. Build the NY Times Subject Headings and Topics in the Cloud. Dr. Brand Niemann, Director and Senior Data Scientist, Semantic Community. July 4, 2011

  2. Preface • For the last 150 years, The New York Times has maintained one of the most authoritative news vocabularies ever developed. In 2009, they began to publish this vocabulary as linked open data. The New York Times also uses approximately 30,000 tags to power their Times Topics Pages. It is their intention to publish all of these tags as linked open data. • Today AOL Government publishes both of those together as linked open data in Spotfire so our readers can more readily browse, search, and download these invaluable data sets!

  3. data.nytimes.com • These can be screen scraped into Excel! • People is a 14 MB RDF file! • See next slide. http://data.nytimes.com/
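
As an alternative to screen scraping, an RDF file such as the People data set could be flattened into a spreadsheet-ready CSV. The following is only a minimal sketch, assuming the file is RDF/XML and that each record carries a `skos:prefLabel`; the sample records, their URIs, and the function names are hypothetical and are not taken from the actual NY Times file, whose layout may differ.

```python
import csv
import io
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SKOS_NS = "http://www.w3.org/2004/02/skos/core#"

# Hypothetical two-record sample; the real file's structure is an assumption.
SAMPLE_RDF = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  <rdf:Description rdf:about="http://data.nytimes.com/example-id-1">
    <skos:prefLabel xml:lang="en">Lorenzo, Frank</skos:prefLabel>
  </rdf:Description>
  <rdf:Description rdf:about="http://data.nytimes.com/example-id-2">
    <skos:prefLabel xml:lang="en">Example, Person</skos:prefLabel>
  </rdf:Description>
</rdf:RDF>
"""


def extract_labels(rdf_xml):
    """Return (uri, label) pairs for every top-level record that has both."""
    root = ET.fromstring(rdf_xml)
    rows = []
    for node in root:
        uri = node.get(f"{{{RDF_NS}}}about")
        label = node.findtext(f"{{{SKOS_NS}}}prefLabel")
        if uri and label:
            rows.append((uri, label.strip()))
    return rows


def labels_to_csv(rows):
    """Serialize the pairs as CSV text that Excel or Spotfire can import."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["uri", "label"])
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    print(labels_to_csv(extract_labels(SAMPLE_RDF)))
```

For a file as large as 14 MB, reading it in one pass like this should still be workable on a desktop machine, although a streaming parser would use less memory.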

  4. Build Your Own NYT Linked Data Application • March 30, 2010, 1:21 PM: “Build Your Own NYT Linked Data Application” by Evan Sandhaus • That’s It?: So there you have it: all it takes to build a simple linked data application with New York Times Linked Open Data. But remember: this post just focuses on the highlights. We encourage you to take a closer look at the code and dig into some of the more advanced features we didn’t discuss. We hope that you share our excitement about the possibilities of linked data, and we look forward to seeing what you create! http://open.blogs.nytimes.com/2010/03/30/build-your-own-nyt-linked-data-application/

  5. Alumni in the News Opens and Closes Snippet http://select.nytimes.com//2005/10/15/business/15nocera.html http://topics.nytimes.com/top/reference/timestopics/people/l/frank_lorenzo/index.html http://data.nytimes.com/schools/schools.html

  6. “Who Went Where” Code 833 lines of code! http://data.nytimes.com/code/schools.html

  7. Subject Headings See next slide http://data.nytimes.com/home/a.html

  8. Subject Headings See next slide http://data.nytimes.com/86075200336035840002

  9. Using Our Linked Data http://data.nytimes.com/home/about.html

  10. Times Topics The New York Times uses approximately 30,000 tags to power our Times Topics Pages. It is our intention to publish all of these tags as linked open data. See next page http://topics.nytimes.com/topics/reference/timestopics/index.html

  11. Times Topics See next page http://topics.nytimes.com/topics/reference/timestopics/all/a/index.html

  12. Times Topics http://topics.nytimes.com/top/news/business/companies/a-m-castle-and-company/index.html

  13. Spotfire • Describe the chart and how it’s made: The Spotfire chart was made by screen scraping the NY Times Subject Headings and Topics into an Excel spreadsheet and importing the spreadsheet into Spotfire. The author placed the two listings side by side, as Tufte suggests, to facilitate comparisons. The author also re-created the summary table of Subject Heading categories to see how much had changed between January 13, 2010, and July 4, 2011 (very little). • How it succeeds or falls short: This single Spotfire chart makes the two NY Times lists sortable (click on column headers), searchable (use Filters and facets), and downloadable (click on the down arrow in the table header in the Spotfire Web Player). • Tips for improving: The NY Times Topics need URLs (25,389). The author will find a way to automate that task and will soon finish adding the URLs for NY Times reporters by hand.
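
One possible way to automate the URL task is to capture each link's address while scraping the topic names, rather than adding URLs afterwards. The sketch below is not the author's method. It assumes the index pages list topics as ordinary `<a href>` links, and the sample HTML, the topic names in it, and the class and function names are hypothetical.

```python
import csv
import io
from html.parser import HTMLParser
from urllib.parse import urljoin

# Index page URL taken from the Times Topics slide; used to resolve relative links.
BASE_URL = "http://topics.nytimes.com/topics/reference/timestopics/all/a/index.html"

# Hypothetical markup; the real page layout is an assumption.
SAMPLE_HTML = """
<ul>
  <li><a href="/top/news/business/companies/a-m-castle-and-company/index.html">
      A. M. Castle and Company</a></li>
  <li><a href="/top/reference/timestopics/people/l/frank_lorenzo/index.html">Lorenzo, Frank</a></li>
  <li>No link here</li>
</ul>
"""


class LinkCollector(HTMLParser):
    """Collect (link text, absolute URL) pairs from anchor tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._href = urljoin(self.base_url, href)
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse line breaks and repeated spaces inside the link text.
            name = " ".join("".join(self._text).split())
            if name:
                self.links.append((name, self._href))
            self._href = None


def scrape_links(html, base_url):
    """Return every named link on the page as (name, url)."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    collector.close()
    return collector.links


def links_to_csv(rows):
    """Serialize the pairs as CSV text for Excel or Spotfire."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["name", "url"])
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    print(links_to_csv(scrape_links(SAMPLE_HTML, BASE_URL)))
```

Capturing the address at scrape time avoids guessing URL patterns, which appear to differ between topic types (compare the company and people links on the earlier slides).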

  14. Spotfire PC Desktop Spotfire
