Data Science and Visualization

2014 Summer Internship - Tetherless World Constellation

Sumithra Gnanasekar

Lakshmi Chenicheri

Objective
  • Visualize datasets compliant with Minimum Information about a Marker Gene Sequence (MiMarks)
  • A dark data exercise

MiMarks
  • A standard developed by the Genomic Standards Consortium (GSC) for reporting marker gene sequences
  • Describes the environment from which the sample was taken
  • Ensures that contextual data are collected and submitted

Datasets
  • Two datasets from a bacterial diversity study in the Western English Channel
  • Focused on the seasonal structure of microbial communities
  • Dataset 1 was converted from Excel to CSV
  • Dataset 2 was converted from Sequence Read Archive (SRA) format to CSV
  • The data were then cleaned to extract the relevant fields
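The cleaning step can be sketched as follows. The slides do not say which fields were kept, so the column names in RELEVANT_FIELDS are hypothetical, and the sketch assumes the Excel sheet or SRA metadata has already been exported to CSV. It keeps only the chosen columns, trims stray whitespace, and drops incomplete rows.

```python
import csv
import io

# Hypothetical field names; the actual retained fields are not listed in the slides.
RELEVANT_FIELDS = ["sample_id", "collection_date", "depth", "nitrate", "phosphate"]


def clean_rows(csv_text, fields=RELEVANT_FIELDS):
    """Keep only the relevant columns and drop rows missing any of them."""
    cleaned = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Trim whitespace that spreadsheet exports often leave in cells;
        # a missing column yields None, which is treated as empty.
        values = {f: (row.get(f) or "").strip() for f in fields}
        # Skip incomplete records so every plotted point has all fields.
        if all(values.values()):
            cleaned.append(values)
    return cleaned


def write_csv(rows, fields=RELEVANT_FIELDS):
    """Serialize cleaned rows back to CSV text with a header line."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fields)
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()
```

Applied to a toy export with an extra "notes" column and one row missing its nitrate value, clean_rows returns a single record without the "notes" field.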

Tools for Visualization
  • R
  • Google Charts integrated with R
  • Shiny (RStudio)
  • D3.js

D3.js was ultimately chosen for its flexibility and the range of visualizations it supports

Scatter Plot Dataset 1
  • Lets the user filter fields
  • Supports drilling down and expanding
  • Groups points by a chosen field
  • Useful for spotting correlations between variables
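The filter-and-group behaviour can be illustrated with a small data-side sketch. The visualization itself was built in D3.js; this Python version only shows the logic of selecting records and partitioning them so that each group can be drawn in its own colour. The field names ("season", "depth", "nitrate", "phosphate") and the predicate are hypothetical.

```python
from collections import defaultdict


def filter_and_group(records, x_field, y_field, group_field, predicate=lambda r: True):
    """Return {group value: [(x, y), ...]} for the records that pass the filter."""
    groups = defaultdict(list)
    for r in records:
        # The filter step: skip records the user has excluded.
        if not predicate(r):
            continue
        # The grouping step: one list of points per value of group_field.
        groups[r[group_field]].append((float(r[x_field]), float(r[y_field])))
    return dict(groups)
```

For example, restricting to shallow samples with `lambda r: float(r["depth"]) < 10` and grouping by "season" yields one point list per season.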

Analysis of Scatter Plot Dataset 1
  • Depth, density, total depth of the water column (total_Depth), longitude and latitude appeared to be independent of the other environmental variables
  • Nitrate showed a near-linear correlation with both silicate and phosphate
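The correlations above were judged visually. One way to put a number on "near-linear" is Pearson's correlation coefficient r, which is close to +1 or -1 for a strong linear relationship. The sketch below is a plain-Python version of the standard formula and is not taken from the project. Python 3.10 and later also provide statistics.correlation.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variances (the common 1/n factors cancel out).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("r is undefined when one variable is constant")
    return cov / math.sqrt(var_x * var_y)
```

Applied to paired nitrate and silicate columns, a value near 1 would support the visual impression of a near-linear relationship.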

Scatter Plot Dataset 2
  • Lets the user filter fields
  • Supports drilling down and expanding

Analysis of Scatter Plot Dataset 2

A linear trend is visible in the scatter plots of:

  • Spots vs Bases
  • Nitrate vs Phosphate
  • Org_nitro vs Org_carb
  • Temperature vs Density
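A trend like the ones listed above could be made explicit by overlaying an ordinary least-squares line on the scatter plot. This is a sketch of that idea, not a description of the original D3.js code: it fits y = slope * x + intercept and returns the two endpoints that would be drawn across the observed x range.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are equal; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    # The fitted line always passes through the point of means.
    intercept = mean_y - slope * mean_x
    return slope, intercept


def trend_line_endpoints(xs, ys):
    """Two points spanning the data's x range, ready to draw as a line segment."""
    slope, intercept = linear_fit(xs, ys)
    x0, x1 = min(xs), max(xs)
    return (x0, slope * x0 + intercept), (x1, slope * x1 + intercept)
```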

Temporal Visualization

Lets the user filter values by time and examine how the selected period affects the other variables
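The time filter can be sketched as selecting the records inside a date window and then summarizing another variable over that window. The field names "collection_date" and "temperature" are hypothetical, and dates are assumed to be stored as ISO strings (YYYY-MM-DD). The published timeline view used crossfiltering in the browser; this only mirrors the logic.

```python
from datetime import date


def filter_by_period(records, start, end, date_field="collection_date"):
    """Keep records whose date falls within [start, end], inclusive."""
    return [r for r in records if start <= date.fromisoformat(r[date_field]) <= end]


def mean_of(records, field):
    """Mean of a numeric field over the selected records."""
    values = [float(r[field]) for r in records]
    if not values:
        raise ValueError("no records in the selected period")
    return sum(values) / len(values)
```

For instance, selecting June through August and averaging "temperature" shows how that variable behaves in the chosen season compared with the rest of the year.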

DOI Visualization
  • Represents the DOIs associated with data points as bubbles
  • Clicking a bubble fetches and displays the metadata for that DOI
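The slides do not say how DOI metadata is fetched. One possible approach, assumed here, is DOI content negotiation: requesting https://doi.org/&lt;DOI&gt; with an Accept header of application/vnd.citationstyles.csl+json returns CSL JSON metadata for Crossref and DataCite DOIs. The DOI used in the example is a placeholder, and the DOI is not URL-escaped, which is a simplification.

```python
import json
import urllib.request


def doi_metadata_request(doi):
    """Build (but do not send) a content-negotiation request for CSL JSON metadata."""
    return urllib.request.Request(
        "https://doi.org/" + doi,  # simplification: DOI is not URL-escaped
        headers={"Accept": "application/vnd.citationstyles.csl+json"},
    )


def fetch_doi_metadata(doi, timeout=10):
    """Send the request (following redirects) and parse the JSON response."""
    with urllib.request.urlopen(doi_metadata_request(doi), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Building the request separately from sending it keeps the click handler's network call in one place and makes the request easy to inspect.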

Bubble Chart
  • Represents the environmental data associated with each sample
  • Bubble size corresponds to organism count
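When bubble size encodes a count, scaling the radius by the square root of the count makes the bubble's area, which is what readers perceive, proportional to the count. D3's d3.scaleSqrt implements this mapping. The plain-Python sketch below shows the arithmetic; max_radius is a hypothetical display parameter.

```python
import math


def bubble_radius(count, max_count, max_radius=40.0):
    """Radius such that bubble area is proportional to count."""
    if max_count <= 0:
        raise ValueError("max_count must be positive")
    # Area scales with radius squared, so radius scales with sqrt(count).
    return max_radius * math.sqrt(count / max_count)
```

A sample with a quarter of the largest organism count gets half the maximum radius, and therefore a quarter of the area.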

RDF Conversion

The RDF conversion for MiMarks-compliant datasets involves two steps:

  • Construct an ontology or reuse an existing one
  • Convert the dataset into triple instances using CSV-to-RDF conversion tools

csv2rdf4lod is an open-source tool for converting the data in a CSV file into RDF
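To show what "triple instances" look like, here is a minimal stand-in written in plain Python. It is not csv2rdf4lod and does not reproduce its behaviour. Each CSV row becomes a subject, and each non-empty column becomes a predicate with a string literal, written in N-Triples syntax. Both namespaces are example.org placeholders rather than a real MiMarks ontology, and the sample identifiers are assumed to be safe to use inside IRIs.

```python
import csv
import io

# Placeholder namespaces; a real conversion would use the chosen ontology's IRIs.
BASE = "http://example.org/sample/"
VOCAB = "http://example.org/vocab#"


def escape_literal(value):
    """Escape backslashes, quotes and newlines for an N-Triples string literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def csv_to_ntriples(csv_text, id_field):
    """One triple per non-empty cell: <sample> <vocab#column> "value" ."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = f"<{BASE}{row[id_field]}>"
        for field, value in row.items():
            # The identifier column names the subject; empty cells produce no triple.
            if field == id_field or not value:
                continue
            lines.append(f'{subject} <{VOCAB}{field}> "{escape_literal(value)}" .')
    return "\n".join(lines)
```

A fuller conversion would map columns to ontology properties and attach datatypes (for example, xsd:decimal for depth) instead of emitting every value as a plain string.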

Spatio-temporal Features of MiMarks, VAMPS and CoDL Datasets

The following tools and visualizations could be used for the MiMarks, VAMPS and CoDL datasets:

  • Planetary.js, an open-source tool, could represent the spatial features interactively
  • Motion charts could show change over a period of time, with the quantity of interest encoded as bubble size
  • A calendar-based representation of values is another option when the data are continuous
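A calendar view typically needs one value per day. Continuous readings could be reduced to that form by averaging them per calendar day, as sketched below. The input format, a list of (ISO timestamp, value) pairs, is an assumption about how such data might be stored.

```python
from collections import defaultdict
from datetime import datetime


def daily_means(readings):
    """Average (ISO timestamp, value) readings per calendar day, sorted by day."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for ts, value in readings:
        day = datetime.fromisoformat(ts).date()
        sums[day] += value
        counts[day] += 1
    # One entry per day: the cell value for a calendar heatmap.
    return {day: sums[day] / counts[day] for day in sorted(sums)}
```

Each resulting day key maps directly onto one cell of a calendar heatmap.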

Links to Visualizations
  • Timeline crossfiltering visualization: http://dco.tw.rpi.edu/viz/timeline/index.html
  • DOI visualization: http://dco.tw.rpi.edu/viz/doiVis/index.html
  • Scatterplot visualization for Dataset 1: http://dco.tw.rpi.edu/viz/scatterPlot/demo/demo.html
  • Bubble chart visualization: http://dco.tw.rpi.edu/viz/Bubblechart/bubble_dataset2/index.html
  • Scatterplot visualization for Dataset 2: http://dco.tw.rpi.edu/viz/scatterplot_dataset2/demo/demo.html
