RSA 2019, Toronto Preconference day March 16, 2019 11AM-1PM

Presentation Transcript


  1. RSA 2019, Toronto, Preconference day, March 16, 2019, 11AM-1PM. Data Organization and Visualization for Beginners. Jodi Cranston, Catherine Walsh, Angela Dressen

  2. Program • 11-11:05 – Introduction to the session and presenters • PRESENTATION OF PROJECTS • 11:05-11:20 – Jodi: Mapping Titian, Mapping Paintings • 11:20-11:35 – Catherine: Mapping Sculpture • PRESENTATION OF TOOLS • 11:35-12:05 – Angela: OpenRefine, TimelineJS • 12:05-12:35 – Catherine: Palladio, CARTO • Hands-on

  3. OpenRefine

  4. OpenRefine • Cleaning up messy data from a spreadsheet • Spelling errors • Uniform data • Removing whitespace • Splitting columns • Enriching data from external sources • Etc. You won't be analysing your data one by one, but in groups and sets. The application is therefore suitable for very large data sets.

  5. OpenRefine • Apart from cleaning data, you can also use OpenRefine for other purposes • Word counts in sets • Combining sheets • Enriching reconciled data with OpenRefine: import data from Wikidata or VIAF

  6. OpenRefine • Free, open source software • Works best with Google Chrome (less well with Safari and Internet Explorer) • Written in Java. Requires the Java JRE • It is an Interactive Data Transformation tool (IDT), which allows you to change a large data set in one go. It is similar to a spreadsheet, but has more functionality. • Works as a desktop application. It does not store your data. Save it! It may be used in several tabs simultaneously. • The .exe file opens a terminal window in which a small local server runs; the application itself opens in the browser. The terminal window needs to remain open. Runs offline.

  7. OpenRefine • Choose a project and upload it. • Rename the project (save it later; OpenRefine does not save or store automatically!) • Use UTF-8 encoding • Configure your data: You will be shown a preview of your data. In the lower blue field, make sure “Parse data as” is set to “CSV / TSV / separator-based files”. Where it says character encoding, click in the blank field next to it and select UTF-8 from the pop-up window of encodings. Make sure the first row with your column headers is recognized as headers (boldfaced) and not as data. If it is not recognized automatically, tick the check box for “Parse next ‘1’ line(s) as column headers”. Since our exercise file is a CSV, activate the radio button “commas (CSV)” as the separator.
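For comparison, the same import settings (UTF-8 encoding, comma separator, first row treated as column headers) can be sketched in Python with the standard `csv` module. This is only an illustration of what the settings mean, not how OpenRefine works internally; the sample data and the column names `artist` and `born` are made up.

```python
import csv
import io

# Hypothetical sample standing in for an uploaded CSV file, stored as UTF-8 bytes.
SAMPLE_BYTES = "artist,born\nExample Müller,1500\nExample Rossi,1501\n".encode("utf-8")


def read_rows(raw_bytes, delimiter=","):
    """Decode the bytes as UTF-8 and parse them, using the first line as headers."""
    text = raw_bytes.decode("utf-8")
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return list(reader)


rows = read_rows(SAMPLE_BYTES)
print(rows[0]["artist"])
```

Decoding with the wrong encoding is what typically turns characters such as "ü" into debris, which is why the slide insists on selecting UTF-8 at import time.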

  8. OpenRefine – basic clean-up • Text facet -> cluster • Get rid of whitespace: «Edit cells» -> «Common transforms» -> «Trim leading and trailing whitespace» / «Collapse consecutive whitespace» • Divide columns: «Edit column» -> «Split into several columns…» • Reorder columns • Cluster: «Edit cells» -> «Cluster and edit…» (only whole clusters can be merged; individual values cannot be selected) • Replace: «Edit cells» -> «Replace» • Undo/redo: step-by-step list in the menu • Deleting rows: Text facet -> choose what to eliminate and place a star -> facet by star -> true -> under «All»: «Edit rows» -> «Remove all matching rows»
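To make the whitespace and clustering steps more concrete, here is a minimal Python sketch. The `fingerprint` function is a simplified key loosely modeled on the idea behind key-collision clustering (normalize each value, then group values that end up with the same key); it is an assumption for illustration and not OpenRefine's actual implementation. The sample cells are hypothetical.

```python
import re
import string
from collections import defaultdict


def trim_and_collapse(value):
    """Trim leading/trailing whitespace and collapse whitespace runs to one space."""
    return re.sub(r"\s+", " ", value.strip())


def fingerprint(value):
    """Simplified key: lowercase, drop ASCII punctuation, sort the unique words."""
    cleaned = trim_and_collapse(value).lower()
    cleaned = cleaned.translate(str.maketrans("", "", string.punctuation))
    return " ".join(sorted(set(cleaned.split())))


def cluster(values):
    """Group values sharing a key; keep groups with more than one distinct spelling."""
    groups = defaultdict(set)
    for v in values:
        groups[fingerprint(v)].add(v)
    return [sorted(g) for g in groups.values() if len(g) > 1]


# Hypothetical messy cells.
cells = ["Botticelli, Sandro", "  Sandro   Botticelli ", "sandro botticelli", "Titian"]
print([trim_and_collapse(c) for c in cells])
print(cluster(cells))
```

Because whole groups are formed by a shared key, merging applies to the entire cluster at once, which matches the limitation noted on the slide.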

  9. OpenRefine – transform • Exchange values: Edit cells -> transform -> GREL -> transform the value • Replace: value.replace('xx', 'x') • Add characters to a column: "prefix" + value • Cleaning up a date to show only the year: datePart(value,'year') • GREL: General Refine Expression Language, documented on GitHub https://github.com/OpenRefine/OpenRefine/wiki/General-Refine-Expression-Language
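For readers more familiar with Python than GREL, the three transforms above have rough counterparts as follows. This is a sketch for comparison only; the function names are made up, and `year_of` assumes the cell holds an ISO-formatted date string such as `1510-05-17`.

```python
from datetime import date


def replace_text(value, old, new):
    """Rough counterpart of value.replace('xx', 'x')."""
    return value.replace(old, new)


def add_prefix(value, prefix):
    """Rough counterpart of "prefix" + value."""
    return prefix + value


def year_of(value):
    """Rough counterpart of datePart(value, 'year'); assumes an ISO date string."""
    return date.fromisoformat(value).year


print(replace_text("Axxa", "xx", "x"))
print(add_prefix("123", "ID-"))
print(year_of("1510-05-17"))
```

As in OpenRefine, each function takes one cell value and returns the new value, so it can be applied to a whole column in one pass.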

  10. OpenRefine – example from Wikipedia – Italian artists • Download a table from Wikipedia • You want to separate names and years • Add column based on this column • Edit cells -> replace (to change the brackets into a colon, to be used later as separator) • Edit column -> split into several columns (use the colon as separator) • Replace ) with nothing • value + ", " + cells["mycell"].value • Separate the person's name: Edit column -> add column based on this column -> value.split(" ")[1] • 1 = last name / 0 = first name • Join last name and first name: value + ", " + cells["Firstname"].value • Another option: split cells: choose 'Edit cells', 'Split multi-valued cells', entering '|' as the value separator.
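The sequence above can be sketched in Python as follows. The cell layout `First Last (years)` is an assumption about what the Wikipedia table looks like, and the function names are hypothetical; the logic simply mirrors the slide (bracket to colon, split on the colon, then reorder the name parts).

```python
def split_name_and_years(cell):
    """Split 'First Last (1445-1510)' into ('First Last', '1445-1510')."""
    # Mirror the slide: turn the opening bracket into a colon, drop the closing one.
    marked = cell.replace("(", ":").replace(")", "")
    name, _, years = marked.partition(":")
    return name.strip(), years.strip()


def last_first(name):
    """Reorder 'First Last' to 'Last, First' (index 0 = first name, 1 = last name)."""
    parts = name.split(" ")
    if len(parts) < 2:
        # Single-word names are returned unchanged.
        return name
    return parts[1] + ", " + parts[0]


name, years = split_name_and_years("Sandro Botticelli (1445-1510)")
print(name, "|", years)
print(last_first(name))
```

Like `value.split(" ")[1]` on the slide, this only handles two-part names correctly; names with more parts would need manual review.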

  11. OpenRefine for data enrichment (using Linked Open Data) • Fetch URLs using Refine • Construct URL queries to retrieve information from a simple web API • Using query services like: • Wikidata • Google Maps API • VIAF (Virtual International Authority File) • etc.

  12. Retrieving data from Wikidata • You need a column Wikidata_uri • Create a column Wikidata_id: Edit column -> add column based on this column -> to extract the ID, enter replace(value,"http://www.wikidata.org/entity/", "") • On the Wikidata_id column: Edit column -> add column by fetching URLs -> to query birth dates (property «P569»), enter "https://tools.wmflabs.org/openrefine-wikidata/en/fetch_values?item="+value+"&prop=P569" -> name the column «date_of_birth_Wikidata». The result is in JSON. • Clean the data: Edit cells -> transform -> enter forEach(value.parseJson().values,v,v).join(";") • Cleaning up a date to show only the year: datePart(value,'year')
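The three expressions on this slide (strip the URI prefix, build the fetch URL, flatten the JSON `values` list) can be sketched in Python without making any network request. The endpoint is copied from the slide and may have moved since 2019; the item ID `Q42` is only a placeholder, and the sample response is made up, assuming the `{"values": [...]}` shape implied by the slide's `parseJson().values` expression.

```python
import json
from urllib.parse import urlencode

ENTITY_PREFIX = "http://www.wikidata.org/entity/"
# Endpoint as given on the slide; not verified to still be live.
FETCH_ENDPOINT = "https://tools.wmflabs.org/openrefine-wikidata/en/fetch_values"


def wikidata_id(uri):
    """Strip the entity prefix, leaving only the ID."""
    return uri.replace(ENTITY_PREFIX, "")


def fetch_url(item_id, prop="P569"):
    """Build the query URL for one item and one property (P569 = date of birth)."""
    return FETCH_ENDPOINT + "?" + urlencode({"item": item_id, "prop": prop})


def join_values(response_text, separator=";"):
    """Join the entries of the JSON 'values' list into one string."""
    return separator.join(str(v) for v in json.loads(response_text)["values"])


item = wikidata_id("http://www.wikidata.org/entity/Q42")  # placeholder ID
print(fetch_url(item))
# Made-up response text in the assumed shape.
print(join_values('{"values": ["1500-01-01", "1501-01-01"]}'))
```

Joining with ";" keeps multiple values in a single cell, which can later be separated again with "Split multi-valued cells".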

  13. Retrieving data from Wikidata • Reconcile (how simple is this!) • Choose the source: Wikidata (if needed, include other columns too) • Start reconciling: records will be linked to Wikidata automatically (the remainder has to be done manually) • Use the values as identifiers

  14. OpenRefine – export • At the end: export your data set! (OpenRefine does not change your original data set) • Single column export -> facet -> choose facet -> export CSV • Full sheet export -> comma-separated values • It is also possible to export only parts of your sheet.
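As an illustration of full versus partial export, the following Python sketch writes either all columns or a chosen subset to CSV text. It stands in for the menu steps above and is not OpenRefine's exporter; the function name and the sample rows are hypothetical.

```python
import csv
import io


def export_csv(rows, columns):
    """Write the given columns of each row as CSV text, header line first."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=columns,
        extrasaction="ignore",  # silently skip columns that were not requested
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical cleaned rows.
rows = [{"artist": "A", "born": "1500"}, {"artist": "B", "born": "1501"}]
print(export_csv(rows, ["artist", "born"]))  # full sheet
print(export_csv(rows, ["artist"]))          # single column
```

The input rows are left untouched, in the same way that exporting from OpenRefine does not alter the original file.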

  15. OpenRefine tutorials • http://openrefine.org/ • https://programminghistorian.org/en/lessons/cleaning-data-with-openrefine • https://github.com/miriamposner/get-started-with-openrefine/blob/master/get-started-with-openrefine.md • https://github.com/OpenRefine/OpenRefine/wiki/Documentation-For-Users • Retrieving data from Wikidata or VIAF: https://medium.com/the-bytegeist-blog/enriching-reconciled-data-with-openrefine-89b885dcadbb • There are many more!

  16. Timeline JS

  17. Timelines (selection) • Timeline JS (Northwestern University) https://news.northwestern.edu/stories/2012/03/knight-lab-digital-timelines/ (with examples and spreadsheet) • Neatline – for Omeka http://docs.neatline.org/creating-records.html • Google Timeline https://www.google.com/maps/timeline?pb • Office Timelines (for Excel or PowerPoint) https://templates.office.com/en-us/Timelines?page=1

  18. TimelineJS with Google Chrome and Google Spreadsheets • Advantages • Easy to use for a chronological visualization • Incorporates maps and images from the web • Can be incorporated into websites and PowerPoint presentations • Disadvantages • Limited interactivity • Only uses images published on the web, not from your own collection

  19. TimelineJS with Google Chrome • https://timeline.knightlab.com/ • Botticelli spreadsheet: https://docs.google.com/spreadsheets/d/1BAg-2_XZM-Oap1cwQoftBcYjrJYBjXOSNOqdXBwQWyY/edit#gid=0 • Botticelli timeline (embedded link to website or presentation)

  20. Thank you! Dr. Angela Dressen, Villa I Tatti, The Harvard University Center for Italian Renaissance Studies, Florence, Italy, adressen@itatti.harvard.edu. Discipline Representative for Digital Humanities at the Renaissance Society of America (RSA)
