Harpers: a Semantic Web(ish) site for Harper’s Magazine

Harpers.org: a Semantic Web(ish) site for Harper’s Magazine. Paul Ford, Associate Web Editor, Harpers.org (ford@harpers.org). Harper’s is a magazine of literature, politics, culture, and the arts, published continuously since 1850, and a small non-profit.


Presentation Transcript


  1. Harpers.org: a Semantic Web(ish) site for Harper’s Magazine Paul Ford Associate Web Editor, Harpers.org ford@harpers.org

  2. Harper’s is… • A magazine of literature, politics, culture, and the arts published continuously from 1850 • A small non-profit

  3. Available content • The Weekly Review, an emailed summary of world events, from 2000 • The Harper’s Index, a statistical portrait of the world, from 1998 • Public domain, scanned-in archives from 1850-1982 • Readings • Occasional features

  4. And that’s it. • Maybe full text of issues will be offered someday, but not soon. So… • How do we get more value out of limited content?

  5. Solution • Hack up what we have into bits by content type, then… • Reassemble it according to link targets… • Which are arranged in a taxonomy… • Creating a very small “Semantic Web” for Harpers.org

  6. A quick demo… • >>>

  7. How it works • Simple set of ontological relationships (partOf, supervisorOf) • Taxonomy of content • & narrative content • that is split into smaller pieces • & links into the taxonomy
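A minimal sketch of how a `partOf` relationship could tie topics into a taxonomy, assuming a plain child-to-parent mapping. The topic names and the `ancestors` helper are hypothetical illustrations, not taken from the Harpers.org data.

```python
# Hypothetical taxonomy: each topic points at its parent via a partOf relation.
PART_OF = {
    "Baghdad": "Iraq",
    "Iraq": "MiddleEast",
    "MiddleEast": "World",
}


def ancestors(topic, part_of=PART_OF):
    """Return every topic reached by following partOf upward from `topic`."""
    chain = []
    seen = {topic}
    while topic in part_of:
        topic = part_of[topic]
        if topic in seen:  # guard against an accidental cycle in the data
            break
        seen.add(topic)
        chain.append(topic)
    return chain


print(ancestors("Baghdad"))  # ['Iraq', 'MiddleEast', 'World']
```

A piece of narrative content linked to "Baghdad" could then also be listed on the pages for every ancestor topic.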

  8. Markup • Text: “Country Y announced that it had cut off relations with country Z. On Wednesday, something happened to persons X and Y.”

  9. Markup <event> Country Y announced that it had cut off relations with country Z. </event> <event> On Wednesday, something happened to persons X and Y. </event>

  10. Markup <event on="2004-03-12" id="24848"> Country Y announced that it had cut off relations with country Z. </event>

  11. Markup <event on="2004-03-12" id="24848"> <link to="#CountryY">Country Y</link> announced that it had cut off relations with <link to="#CountryZ">country Z</link>. </event>
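As a rough illustration of what the markup makes available, the sketch below reads one `<event>` with Python's standard `xml.etree.ElementTree` and pulls out its date, id, link targets, and plain text. The `describe_event` helper and the returned dictionary layout are assumptions for this example; the site itself used XSLT 2.0.

```python
import xml.etree.ElementTree as ET

# The event from the slide, written with straight quotes so it parses as XML.
EVENT_XML = (
    '<event on="2004-03-12" id="24848">'
    '<link to="#CountryY">Country Y</link> announced that it had cut off '
    'relations with <link to="#CountryZ">country Z</link>.'
    '</event>'
)


def describe_event(xml_text):
    """Extract the attributes, link targets, and plain text of one event."""
    event = ET.fromstring(xml_text)
    # Each link's `to` attribute names a taxonomy entry; drop the leading '#'.
    targets = [link.get("to").lstrip("#") for link in event.iter("link")]
    return {
        "id": event.get("id"),
        "on": event.get("on"),
        "targets": targets,
        "text": "".join(event.itertext()),
    }


info = describe_event(EVENT_XML)
print(info["targets"])  # ['CountryY', 'CountryZ']
```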

  12. Conditionals • Some text required conditional markup • Text: “Country Y announced that it had cut off relations with country Z, and on Wednesday, something happened to persons X and Y.”

  13. Conditionals: ugly, but simple <event> Country Y announced that it had cut off relations with country Z<cond is="id">, and</cond><cond not="id">.</cond> </event> <event> <cond is="id">on</cond><cond not="id">On</cond> Wednesday, something happened to persons X and Y. </event>

  14. Conditionals: ugly, but simple • Narrative version • Country Y announced that it had cut off relations with country Z, and on Wednesday, something happened to persons X and Y. • Timeline-friendly version • Country Y announced that it had cut off relations with country Z. • On Wednesday, something happened to persons X and Y.
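The two renderings can be sketched as follows. The slides do not spell out what `is="id"` and `not="id"` test, so this example assumes a simple convention: the renderer is given a set of active flags, `<cond is="X">` is kept when `X` is in the set, and `<cond not="X">` is kept when it is not. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Two events with conditional joiners, wrapped in a hypothetical <events> root.
SOURCE = (
    '<events>'
    '<event>Country Y announced that it had cut off relations with country Z'
    '<cond is="id">, and </cond><cond not="id">.</cond></event>'
    '<event><cond is="id">on</cond><cond not="id">On</cond>'
    ' Wednesday, something happened to persons X and Y.</event>'
    '</events>'
)


def keep(cond, flags):
    """Decide whether a <cond> element survives under the active flags."""
    if "is" in cond.attrib:
        return cond.get("is") in flags
    if "not" in cond.attrib:
        return cond.get("not") not in flags
    return True


def render_event(event, flags):
    """Flatten one event to text, dropping <cond> elements that do not apply."""
    parts = [event.text or ""]
    for child in event:
        if child.tag != "cond" or keep(child, flags):
            parts.append("".join(child.itertext()))
        parts.append(child.tail or "")  # text after the child always stays
    return "".join(parts)


def render(xml_text, flags):
    root = ET.fromstring(xml_text)
    return [render_event(event, flags) for event in root.iter("event")]


narrative = "".join(render(SOURCE, {"id"}))  # one run-on sentence
timeline = render(SOURCE, set())             # one sentence per event
print(narrative)
print(timeline)
```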

  15. All of it gets slurped up • And turned into a set of triples • Then processed in-memory • With HTML pages spit out as a result
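A small sketch of that pipeline, under stated assumptions: events are read from XML, flattened into (subject, predicate, object) tuples, held in memory, and regrouped into one HTML fragment per link target. The predicate names (`on`, `text`, `mentions`), the `<events>` wrapper, and the HTML layout are invented for illustration; the production site did this with XSLT 2.0.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from html import escape

SOURCE = (
    '<events>'
    '<event on="2004-03-12" id="24848"><link to="#CountryY">Country Y</link>'
    ' cut off relations with <link to="#CountryZ">country Z</link>.</event>'
    '<event on="2004-03-14" id="24849">Something happened in '
    '<link to="#CountryZ">country Z</link>.</event>'
    '</events>'
)


def to_triples(xml_text):
    """Flatten every event into (subject, predicate, object) tuples."""
    triples = []
    for event in ET.fromstring(xml_text).iter("event"):
        subject = "event:" + event.get("id")
        triples.append((subject, "on", event.get("on")))
        triples.append((subject, "text", "".join(event.itertext())))
        for link in event.iter("link"):
            triples.append((subject, "mentions", link.get("to").lstrip("#")))
    return triples


def build_pages(triples):
    """Group events by the topic they mention and emit one page per topic."""
    facts = defaultdict(dict)     # subject -> {predicate: object}
    mentions = defaultdict(list)  # topic -> [subject, ...]
    for subject, predicate, obj in triples:
        if predicate == "mentions":
            mentions[obj].append(subject)
        else:
            facts[subject][predicate] = obj
    pages = {}
    for topic, subjects in mentions.items():
        ordered = sorted(subjects, key=lambda s: facts[s]["on"])
        items = "".join(
            "<li>{}: {}</li>".format(escape(facts[s]["on"]), escape(facts[s]["text"]))
            for s in ordered
        )
        pages[topic] = "<h1>{}</h1><ul>{}</ul>".format(escape(topic), items)
    return pages


pages = build_pages(to_triples(SOURCE))
print(sorted(pages))  # ['CountryY', 'CountryZ']
```

Adding one more event that links to an existing topic adds an item to that topic's page without any other change, which is the "remixing" effect the later slides describe.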

  16. Hard, then easy • Hard to get started (lots of events, facts, and links) • Easy to keep going, if you don’t mind the markup and use a good text editor

  17. Tools used • Emacs, vi, BBEdit • XSLT 2.0 (Saxon) • CVS

  18. Why not RDF? • Not right for redundant content and conditionals • Easy enough to transform arbitrary structured XML into RDF with XSLT, as needed • (Or into RSS 1.0, RSS 2.0, Atom, etc.) • ?
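The slide proposes XSLT for this conversion. Purely to show the shape of such a transform, here is a Python sketch that turns one `<event>` into N-Triples lines. The `http://example.org/` namespace, the predicate names, and the helper functions are all hypothetical.

```python
import xml.etree.ElementTree as ET

BASE = "http://example.org/"  # hypothetical namespace, not a Harpers.org URI


def nt_literal(value):
    """Quote a string as an N-Triples literal, escaping the basics."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return '"' + escaped + '"'


def event_to_ntriples(xml_text):
    """Emit one N-Triples line per fact found in a single <event>."""
    event = ET.fromstring(xml_text)
    subject = "<{}event/{}>".format(BASE, event.get("id"))
    text = "".join(event.itertext())
    lines = [
        "{} <{}on> {} .".format(subject, BASE, nt_literal(event.get("on"))),
        "{} <{}text> {} .".format(subject, BASE, nt_literal(text)),
    ]
    for link in event.iter("link"):
        target = link.get("to").lstrip("#")
        lines.append(
            "{} <{}mentions> <{}topic/{}> .".format(subject, BASE, BASE, target)
        )
    return lines


SAMPLE = (
    '<event on="2004-03-12" id="24848">'
    '<link to="#CountryY">Country Y</link> cut off relations.</event>'
)
for line in event_to_ntriples(SAMPLE):
    print(line)
```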

  19. For free… • From 300 individual pages… • To 1100 pages of “remixed” content – all unique and relevant • And Google-friendly

  20. And also for free… • Semantically relevant in-site advertising, if we want it • Topic-sorted, reusable content • Permanent, readable URIs

  21. Do people get it? • Some do, and others just navigate the site as usual • Harper’s was fine with the learning curve • “Odd but useful” – Gawker

  22. Results • Uptick in traffic and subscription revenues • Low cost of maintenance • Ever-increasing database of facts and events – adding one Weekly Review adds value to 50 different pages • Happy client

  23. Why the SemWeb(ish) framework? • Leaves plenty of room to grow • Web-only content • Full text of issues • Subscriber services • Etc • Take advantage of new SemWeb tools • Incorporate RDF sources into the taxonomy • Anticipate Semantic Web browsers

  24. Next?

  25. Make it pretty • Redesign • Hide some of the navigation • Turn links on and off

  26. Make it scale • Currently maxes out at about 20-30 MB of content, because the in-memory DOM representation takes 10-12x the XML document size • Use a publicly available storage layer (Kowari, Jena, etc.) • Go triple-crazy

  27. Make it easy to query and navigate • “Show me everything related to George Bush and Iraq.” or • “Show me everything related to politicians and the Middle East.” • New navigation • ?
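One way such queries could work is sketched below, assuming a child-to-parent taxonomy and an index from events to the topics they link to. All names and data are hypothetical, and the taxonomy is assumed to contain no cycles. An event matches when, for every requested category, at least one of its topics falls under that category.

```python
# Hypothetical taxonomy: child topic -> parent topic (assumed acyclic).
PARENT = {
    "GeorgeBush": "Politicians",
    "Iraq": "MiddleEast",
    "Baghdad": "Iraq",
}

# Hypothetical index: event id -> topics the event links to.
EVENT_TOPICS = {
    "e1": {"GeorgeBush", "Baghdad"},
    "e2": {"GeorgeBush"},
    "e3": {"Iraq"},
}


def falls_under(topic, category):
    """True if `topic` is `category` or sits below it in the taxonomy."""
    while topic is not None:
        if topic == category:
            return True
        topic = PARENT.get(topic)
    return False


def related_to(*categories):
    """Return the ids of events that touch every requested category."""
    return sorted(
        event
        for event, topics in EVENT_TOPICS.items()
        if all(any(falls_under(t, c) for t in topics) for c in categories)
    )


print(related_to("GeorgeBush", "Iraq"))         # ['e1']
print(related_to("Politicians", "MiddleEast"))  # ['e1']
```

The second query uses only category names, yet finds the same event because "GeorgeBush" and "Baghdad" roll up to those categories.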
