Data Wrangling



Presentation Transcript


  1. Data Wrangling
     IS&T Scientific Visualization Tutorial – Summer 2010

  2. Managing data
     • Programs
     • Scripts
     • Data
     • Documentation
     • Text
     • Images
     • Movies

  3. Programs and scripts
     • Reproducibility
       - Code snapshots
       - Documentation
     • Archiving
     • Version control
       - RCS, Subversion
     • Feature creep
       - Expansion vs modification

  4. Data management
     • Size matters
     • What to keep?
       - What is hard to reproduce
       - Short vs long-term
     • Archiving
       - SCF archive system
       - Back up to external drive

  5. Back to the pipeline
     [Pipeline diagram: Data feeding into the tools Matlab, VTK, OpenGL, Maya, IDL, Paraview, OSG, Photoshop, Gnuplot, DAFFIE, Performer, Premiere, Xmgrace, Excel]

  6. Your data sci-vis package • Minimal conversion, i.e., keep basic structure • Headers • Reformatting • ASCII vs binary • Data type (int, single, double) • Endian-ness • Example – exporting from Matlab to VTK IS&T Scientific Visualization Tutorial – Summer 2010

  7. Array layout
     • 2-D example, Matlab
         >> a(1,1) = 11;
         >> a(1,2) = 12;
         >> a(2,1) = 21;
         >> a(2,2) = 22;
         >> a
         a =
             11    12
             21    22
         >> a1d = reshape(a,4,1)
         a1d =
             11
             21
             12
             22

  8. Array layout
     • 2-D example, C
         #include <stdio.h>

         int main(void)
         {
             int m[2][2];
             int *pm = &m[0][0];   /* pointer to the first element of the contiguous 2x2 block */
             int i;

             m[0][0] = 11;
             m[0][1] = 12;
             m[1][0] = 21;
             m[1][1] = 22;

             /* walk the array in memory order: C stores it row by row */
             for (i = 0; i < 4; i++)
                 printf("%d\n", pm[i]);
             return 0;
         }
       Output:
         11
         12
         21
         22

  9. Permuting dimensions
     • 2-D example, Matlab
         >> a
         a =
             11    12
             21    22
         >> b = permute(a, [2,1])
         b =
             11    21
             12    22
         >> b1d = reshape(b,4,1)
         b1d =
             11
             12
             21
             22
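The Matlab and C sessions above differ only in storage order: Matlab walks a 2-D array column by column, C walks it row by row, and permuting the dimensions in Matlab before flattening reproduces the C order. A minimal sketch of that idea in plain Python follows; the helper names `flatten_column_major`, `flatten_row_major` and `transpose` are made up for this illustration and are not part of any package mentioned in the slides.

```python
def flatten_column_major(a):
    """Flatten a 2-D list column by column (the order Matlab's reshape uses)."""
    nrows, ncols = len(a), len(a[0])
    return [a[i][j] for j in range(ncols) for i in range(nrows)]


def flatten_row_major(a):
    """Flatten a 2-D list row by row (the order a C array has in memory)."""
    return [value for row in a for value in row]


def transpose(a):
    """Swap the two dimensions, like permute(a, [2,1]) for a 2-D array."""
    return [list(column) for column in zip(*a)]


a = [[11, 12],
     [21, 22]]

print(flatten_column_major(a))             # [11, 21, 12, 22]  (Matlab order)
print(flatten_row_major(a))                # [11, 12, 21, 22]  (C order)
print(flatten_column_major(transpose(a)))  # [11, 12, 21, 22]  (permute, then flatten)
```

The last line is the practical point: when a column-major tool has to write data for a row-major reader, permuting first and then writing the flat array gives the expected order.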

  10. Endian-ness
      [Figure panels: Big endian, Little endian]
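As a small illustration of what endian-ness means for a data file, the sketch below packs the same 32-bit float in both byte orders with Python's standard `struct` module and then reads the big-endian bytes back with the wrong order. The value 1.0 is chosen only because its bit pattern (0x3F800000) is easy to recognize.

```python
import struct
import sys

value = 1.0

big = struct.pack(">f", value)     # most significant byte first
little = struct.pack("<f", value)  # least significant byte first

print(big.hex())     # 3f800000
print(little.hex())  # 0000803f

# The two encodings hold the same four bytes in reverse order.
assert big == little[::-1]

# Reading big-endian bytes as if they were little-endian does not raise
# an error; it silently yields a different number.
wrong = struct.unpack("<f", big)[0]
print(wrong == value)  # False

# Byte order of the machine running this script ("little" or "big").
print(sys.byteorder)
```

Because a mismatch produces wrong numbers rather than an error, it helps to state the byte order explicitly whenever binary data is written or read.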

  11. VTK legacy format
      • Example
          # vtk DataFile Version 3.0
          output of gen_vtk_v3_loop.m
          BINARY
          DATASET STRUCTURED_POINTS
          ORIGIN 0.0 0.0 0.0
          SPACING 1.0 1.0 1.0
          DIMENSIONS 4 7 12
          POINT_DATA 336
          VECTORS v3 float
          @ @

  12. Writing out a VTK legacy file
      • Example using Matlab
          % fid is assumed to have been opened earlier with big-endian byte order,
          % e.g. fid = fopen(filename, 'w', 'ieee-be'); binary legacy VTK data is big-endian
          fprintf(fid, '# vtk DataFile Version 3.0\n');
          fprintf(fid, 'output of gen_vtk_v3_loop.m\n');
          fprintf(fid, 'BINARY\n');
          fprintf(fid, 'DATASET STRUCTURED_POINTS\n');
          fprintf(fid, 'ORIGIN 0.0 0.0 0.0\n');
          fprintf(fid, 'SPACING 1.0 1.0 1.0\n');
          fprintf(fid, 'DIMENSIONS %s %s %s\n', int2str(nx), int2str(ny), int2str(nz));
          fprintf(fid, 'POINT_DATA %s\n', int2str(nx*ny*nz));
          fprintf(fid, 'VECTORS %s float\n', varname);
          fwrite(fid, dv3, 'single');
          fclose(fid);
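The same header-plus-binary layout can be sketched outside Matlab. The Python function below is an illustration under stated assumptions, not the tutorial's own script: the name `write_vtk_vectors`, its parameters and the title line are invented here, the vector data is assumed to arrive as a flat sequence of 3*nx*ny*nz floats with the x index varying fastest, and the floats are written big-endian, which is the byte order binary legacy VTK files use.

```python
import struct


def write_vtk_vectors(path, nx, ny, nz, vectors, name="v3"):
    """Write a legacy-format VTK STRUCTURED_POINTS file with one vector field.

    vectors: flat sequence of 3*nx*ny*nz floats (vx, vy, vz per point),
             points ordered with x varying fastest, then y, then z.
    """
    npts = nx * ny * nz
    if len(vectors) != 3 * npts:
        raise ValueError("expected %d values, got %d" % (3 * npts, len(vectors)))

    # ASCII header, same keywords and order as the slide's example.
    header = (
        "# vtk DataFile Version 3.0\n"
        "written by write_vtk_vectors\n"
        "BINARY\n"
        "DATASET STRUCTURED_POINTS\n"
        "ORIGIN 0.0 0.0 0.0\n"
        "SPACING 1.0 1.0 1.0\n"
        f"DIMENSIONS {nx} {ny} {nz}\n"
        f"POINT_DATA {npts}\n"
        f"VECTORS {name} float\n"
    )

    with open(path, "wb") as f:
        f.write(header.encode("ascii"))
        # ">" forces big-endian 32-bit floats regardless of the host machine.
        f.write(struct.pack(f">{3 * npts}f", *vectors))
```

A call such as `write_vtk_vectors("field.vtk", 4, 7, 12, data)` would produce a file with the header shown on the previous slide, provided `data` holds 3 * 336 values.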

  13. VTK XML format
          <VTKFile type="ImageData" version="0.1" byte_order="LittleEndian">
            <ImageData WholeExtent="0 128 0 32 0 32" Origin="0.0 0.0 0.0" Spacing="1.0 1.0 1.0">
              <Piece Extent="0 128 0 32 0 32">
                <PointData Vectors="velo">
                  <DataArray Name="velo" type="Float32" format="ascii" NumberOfComponents="3">
                    0.0 8.2 69.2 0.0 1.2 68.8 ... 490.3 67.2 0.2 497.3 77.2 -0.1
                  </DataArray>
                </PointData>
              </Piece>
            </ImageData>
          </VTKFile>
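A file with this structure can be generated with any XML library. The sketch below uses Python's standard `xml.etree.ElementTree` and mirrors the element and attribute names of the slide; the function name `image_data_xml` and its parameters are assumptions made for this example. Note that the extents are inclusive point indices, so an extent of "0 128" along one axis corresponds to 129 points.

```python
import xml.etree.ElementTree as ET


def image_data_xml(nx, ny, nz, vectors, name="velo"):
    """Build a VTK XML ImageData document holding one ASCII vector array.

    nx, ny, nz: number of points along each axis.
    vectors:    flat sequence of 3*nx*ny*nz numbers (three components per point).
    """
    npts = nx * ny * nz
    if len(vectors) != 3 * npts:
        raise ValueError("expected %d values, got %d" % (3 * npts, len(vectors)))

    # Extents are inclusive index ranges, hence the "- 1".
    extent = f"0 {nx - 1} 0 {ny - 1} 0 {nz - 1}"

    root = ET.Element("VTKFile", type="ImageData", version="0.1",
                      byte_order="LittleEndian")
    image = ET.SubElement(root, "ImageData", WholeExtent=extent,
                          Origin="0.0 0.0 0.0", Spacing="1.0 1.0 1.0")
    piece = ET.SubElement(image, "Piece", Extent=extent)
    point_data = ET.SubElement(piece, "PointData", Vectors=name)
    array = ET.SubElement(point_data, "DataArray", Name=name, type="Float32",
                          format="ascii", NumberOfComponents="3")
    array.text = " ".join(repr(float(v)) for v in vectors)

    return ET.tostring(root, encoding="unicode")
```

Letting the library emit the tags and quotes avoids the mismatched-quote and unclosed-tag mistakes that are easy to make when XML is assembled by hand with print statements.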

  14. Larger picture

  15. Example – molecular dynamics
      • Simulation creates data files
      • Molecule x,y,z + type
        -> colored spheres (C program)
      • Electron density as volume data
        -> isosurfaces (IDL) -> .obj files
      • Rendered in Maya

  16. Problem statement
      • Atoms
        - File with x,y,z, atom type (over time)
      • Electron density
        - File containing volume data (over time)
      • Desired output: animation of
        - Atoms as colored balls
        - Electron density as isosurfaces

  17. Decisions
      • Final display program
        - Find an off-the-shelf solution?
        - Write an OpenGL program?
        - Produce models for generic display software?
      • How to represent the geometry
        - Colored spheres?
        - Colored isosurfaces?
      • How to get from input data to this representation
      • Electron density

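  18. Digging down - geometry
      • Spheres, program:
          /* Assumes <math.h>, the OpenGL headers and a PI constant are available. */
          void drawSphere(float x, float y, float z, float r, int nlat, int nlong)
          {
              int i, j;
              for (i = 0; i < nlat; i++) {
                  /* latitude runs from -PI/2 (south pole) to +PI/2 (north pole);
                     the casts avoid integer division */
                  double lat0 = PI * (-0.5 + (double)i / nlat);
                  double lat1 = PI * (-0.5 + (double)(i + 1) / nlat);
                  double s0 = sin(lat0), c0 = cos(lat0);
                  double s1 = sin(lat1), c1 = cos(lat1);
                  glBegin(GL_QUAD_STRIP);
                  for (j = 0; j <= nlong; j++) {
                      /* longitude runs once around the sphere */
                      double lon = 2 * PI * (double)j / nlong;
                      double c2 = cos(lon), s2 = sin(lon);
                      glNormal3f(c2*c0, s2*c0, s0);
                      glVertex3f(x + r*c2*c0, y + r*s2*c0, z + r*s0);
                      glNormal3f(c2*c1, s2*c1, s1);
                      glVertex3f(x + r*c2*c1, y + r*s2*c1, z + r*s1);
                  }
                  glEnd();
              }
          }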

  19. Digging down - geometry
          # OBJ file: sphere.obj
          # nvert = 512
          # nface = 128
          v 0.05257 0 -8.5065
          v 0.05257 0 8.5065
          v -0.05257 0 8.5065
          ...
          f 1 2 4 3
          f 3 4 6 5
          f 5 6 8 7
          ...
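The two previous slides show the same sphere twice: once drawn directly and once stored as a vertex and face list. As a rough sketch of how the second form can be produced, the Python function below generates a latitude-longitude sphere as OBJ text. The function name `sphere_obj` and its parameters are invented for this example, and the vertex layout is only one possible choice; it does not reproduce the 512-vertex file quoted on the slide.

```python
import math


def sphere_obj(cx, cy, cz, r, nlat=8, nlong=16):
    """Return OBJ text for a sphere made of nlat * nlong quadrilateral faces."""
    lines = ["# OBJ file: latitude-longitude sphere"]

    # (nlat + 1) rings of nlong vertices each. The two pole rings collapse
    # to a single position, which wastes a few vertices but keeps the
    # indexing below uniform.
    for i in range(nlat + 1):
        lat = math.pi * (-0.5 + i / nlat)      # -pi/2 .. +pi/2
        for j in range(nlong):
            lon = 2.0 * math.pi * j / nlong    # 0 .. 2*pi
            x = cx + r * math.cos(lat) * math.cos(lon)
            y = cy + r * math.cos(lat) * math.sin(lon)
            z = cz + r * math.sin(lat)
            lines.append(f"v {x:.5f} {y:.5f} {z:.5f}")

    # One quad between each pair of neighbouring rings; OBJ indices start at 1.
    for i in range(nlat):
        for j in range(nlong):
            jn = (j + 1) % nlong               # wrap around in longitude
            a = i * nlong + j + 1
            b = i * nlong + jn + 1
            c = (i + 1) * nlong + jn + 1
            d = (i + 1) * nlong + j + 1
            lines.append(f"f {a} {b} {c} {d}")  # counter-clockwise seen from outside

    return "\n".join(lines) + "\n"
```

Writing one such block per atom, with the center taken from the atoms file, is one way to get from x,y,z positions to something a generic 3D package can load.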

  20. Surfaces: polygonal representation

  21. Digging down - geometry
      • Sometimes special types of geometry
          #Inventor V2.1 ascii
          …
          DEF O_mat Material {
              ambientColor 0.05 0.20 0.40
              diffuseColor 0.05 0.20 0.50
              specularColor 0.05 0.20 0.20
              shininess 0.20
          }
          DEF atom_1187 Separator {
              USE O_mat
              Translation { translation -40.0 -60.0 0.0 }
              Sphere { radius 2.5 }
          }
          …

  22. And the isosurface
          v[0] = ( 0.52, 1.01, 9.50)
          v[1] = ( 0.57, 0.99, 8.11)
          v[2] = (-0.67, 0.43, 7.23)
          ...
          f[0] = {1, 2, 4}
          f[1] = {3, 4, 6, 5}
          f[2] = {5, 6, 8, 7}
          ...

  23. A variety of data structures for cells

  24. 3D file formats
      • What they represent
      • How they represent it
      • What software can read it
      • What software can write it
      • How complex is it
      • Human readable
      • ASCII vs binary
      • Proprietary vs open source
      • Cost

  25. 3D file formats - continuum
      • Simplest: explicit points, lines, planes, patches
      • Add color information, texture maps, bump maps
      • More complex: scene graph including lights, etc
      • Cutover to programmatic paradigm
      • Conversions may not preserve all features

  26. .obj files
      • Materials file
        - List of materials by name
        - Contain surface reflectance properties
        - Contain names of texture (image) files
      • Vertex list
        - v x y z
      • Normals list
        - vn x y z
      • Texture coordinate list
        - vt u v

  27. .obj files
      • Faces as vertex lists
        - f v1 v2 v3 ...
        - f v1/vt1 v2/vt2 v3/vt3 ...
        - f v1/vt1/vn1 v2/vt2/vn2 v3/vt3/vn3 ...
        - f v1//vn1 v2//vn2 v3//vn3 ...

  28. .obj file example
          mtllib ./alien.mtl
          v 0.0 0.0 0.0
          v 1.0 0.0 0.0
          v 1.3 0.6 0.0
          v 1.0 1.0 0.0
          v 0.0 1.0 0.0
          v 2.0 -0.3 0.0
          v 2.3 0.7 0.0
          vn 0.0 0.0 1.0
          vt 0.0 0.0
          vt 0.5 0.0
          vt 0.5 0.6
          vt 0.5 1.0
          vt 0.0 1.0
          vt 1.0 0.0
          vt 1.0 0.5
          usemtl alien
          f 1/1/1 2/2/1 3/3/1 4/4/1 5/5/1
          f 3/3/1 2/2/1 6/6/1 7/7/1
      [Figure: the two faces with vertices labeled v1 to v7, next to a texture square with corners (0,0), (1,0), (0,1), (1,1)]
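For a "roll-your-own" converter, reading this format takes little code. The sketch below parses the vertex, texture-coordinate, normal and face records described on the last three slides; the names `parse_face_token` and `parse_obj` are made up for this example. It ignores every other keyword (such as mtllib and usemtl) and does not handle negative, relative indices, which the OBJ format also allows.

```python
def parse_face_token(token):
    """Split one face corner such as '3/3/1', '3//1', '3/3' or '3'.

    Returns (v, vt, vn) as 1-based integer indices, with None for parts
    that are absent.
    """
    parts = token.split("/")
    parts += [""] * (3 - len(parts))  # pad so that three fields always exist
    v, vt, vn = (int(p) if p else None for p in parts[:3])
    return v, vt, vn


def parse_obj(text):
    """Collect vertices, texture coordinates, normals and faces from OBJ text."""
    mesh = {"v": [], "vt": [], "vn": [], "f": []}
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue  # blank line or comment
        key, args = fields[0], fields[1:]
        if key in ("v", "vt", "vn"):
            mesh[key].append(tuple(float(a) for a in args))
        elif key == "f":
            mesh["f"].append([parse_face_token(t) for t in args])
        # any other keyword is skipped in this sketch
    return mesh


example = """\
v 0.0 0.0 0.0
v 1.0 0.0 0.0
v 1.3 0.6 0.0
vn 0.0 0.0 1.0
vt 0.0 0.0
vt 0.5 0.0
vt 0.5 0.6
f 1/1/1 2/2/1 3/3/1
"""
mesh = parse_obj(example)
print(len(mesh["v"]), len(mesh["vt"]), len(mesh["vn"]))  # 3 3 1
print(mesh["f"][0])  # [(1, 1, 1), (2, 2, 1), (3, 3, 1)]
```

Keeping the three indices separate matters because a face corner may reuse a vertex position with a different texture coordinate or normal, as the shared normal index 1 in the slide's example shows.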

  29. Tools for 3D format conversion
      • VTK import and export (freely available)
      • Okino NuGraf (1 copy in CGL)
      • Roll-your-own
      • MeshLab (not tried)

  30. Example – data wrangling flow
      • For each time step:
        - atoms file -> obj file (dw)
        - E density file -> volume file (dw)
        - volume -> isosurface data file (IDL)
        - isosurface data file -> obj file (dw)
        - obj files -> tiff image file (Maya)
        - tiff file -> png file (ImageMagick)
      • Collect image files into movie (Premiere)

  31. Work flow with lots of data
      • Cannot fit whole experiment in running program
      • Cannot fit all data onto disk
      • Requires staging / tracking dependencies
      • Requires deleting intermediate data
      • Requires queue management and etiquette
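Tracking dependencies and deleting intermediate data can be sketched with file timestamps, in the spirit of the make utility. The Python code below is only an illustration under assumptions: the helper names `needs_rebuild` and `run_stage` are invented here, `build` stands for whatever conversion step a stage performs, and a real workflow on a shared system would also need the queueing that the slide mentions.

```python
import os


def needs_rebuild(target, sources):
    """Return True if target is missing or older than any existing source.

    Sources that have already been deleted are ignored, so an up-to-date
    target is not rebuilt just because its intermediate inputs were removed.
    """
    if not os.path.exists(target):
        return True
    target_time = os.path.getmtime(target)
    return any(
        os.path.exists(s) and os.path.getmtime(s) > target_time
        for s in sources
    )


def run_stage(target, sources, build, cleanup=()):
    """Run one pipeline stage, then delete the listed intermediate files.

    build:   callable taking (sources, target) that creates the target file.
    cleanup: paths that are no longer needed once the target exists.
    """
    if needs_rebuild(target, sources):
        build(sources, target)
    for path in cleanup:
        if os.path.exists(path):
            os.remove(path)
```

Running each time step through a chain of such stages means that an interrupted job can be restarted without redoing finished work, while the disk only ever holds the intermediates of the steps currently in flight.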

  32. Example workflow – pressure on turbine
      • For each time step
        - Simulation produces plot3d file
        - plot3d -> obj file with color as texture (dw)
        - obj file -> tiff image file (Maya)
        - tiff image file -> ppm file (ImageMagick)
        - ppm file -> DVD wall (SCV movie player)

  33. Conclusion - Tips
      • Don’t get consumed by tangents
        - Like, learning a new language
        - Or, developing a general matrix library
      • Use existing formats and software when possible
        - But don’t let them take over
        - Simple and open source
      • Make careful choices re ASCII vs binary
      • Take snapshots and make backups
      • Document, document, document

  34. The end
      • Erik Brisson: ebrisson@bu.edu
      • Tutorial powerpoint slides:
        http://www.bu.edu/tech/research/training/presentations/list/
