
Alternative Storage


hasad

Presentation Transcript


  1. Regions of Interest Alternative Storage

  2. Overview • What’s in an ROI? • Use cases • Requirements • Current Storage System • Problems • Alternative Storage

  3. What’s in an ROI? • ROI • Geometry • Measurements • ROI on Channel • Annotations • ROI • Measurement • Links

  4. Use Cases • User created ROI • Measurement tools • HCS generated ROI • Automatic • External • External analysis • Particle Tracking • Other • Templates • ROIs without images

  5. Use Cases – Human Generated • Human generated • More interactions • Merge, Propagate, Split, Delete • Measurements • Geometry • Intensity • Path • ROI/ROI Links • Tags mostly on ROI • Write Many/Read Many

  6. Use Cases - HCS • HCS Generated ROI • Lots of ROI • Attached to Channel • Measurements Attached • Multiple measurements • Tags on ROI, Measurements • Analysis, results and meta. • Write Once, Read Many

  7. Use Cases – External Tools • External Tool can Generate ROI (+ scripts) • Can be tagged • Links (ROI/ROI, ROI/Image) • Results can be in any format

  8. Use Cases - Templates • ROI need not be attached to image • Template to define other ROI

  9. ROI from the Nth Dimension • N-Dimensional Data • Storage of Image data simple • ROI more complex • Database entry, file format • We don’t just want to store in HDF

  10. Current Storage Solutions • Database • ROI • ROI Annotations • PyTables • Mask ROI • Measurements

  11. Current Status • PyTables • ROI are heterogeneous • Concurrency • Python behind a core service call • Measurements are optimal • Tagging is an issue • Inside file • Multiple annotations reported to be slow

  12. Database • ROI can be stored in database • Mask data can be an issue • Tagging in RDB not best • Many more annotations than we’d like • Link to external source for measurements

  13. Alternative Storage • Key-Value Pair Stores • Berkeley DB • Project Voldemort • Tokyo Cabinet • Document DB • MongoDB • CouchDB • Graph DB • Neo4J • InfoGrid • Table DB • Cassandra • Hypertable • HBase

  14. Where others have gone before • Other opinions on the storage solutions • MongoDB vs CouchDB, Cassandra, .. • CouchDB vs MongoDB • Pros and cons of MongoDB • Digg on Cassandra • What is a supercolumn • Cassandra talk • Indexing nodes in Neo4J

  15. MongoDB • Document Database • NoSQL movement • Schemaless • No Tables • Collections of like data • No Joins • Document is equivalent of row of data • Distributed file system (GridFS)

  16. MongoDB – Pros and Cons
      Pros: • It has bindings to numerous languages (C++, C#, Java, Python, ...) • Allows storage, indexing and linking of any user data • Annotations are now very easy and efficient • Has mechanisms for schema upgrade • Dynamic queries • Replication • Sharding • Map-Reduce framework • Fast • GridFS is a distributed file storage mechanism within Mongo • Easy to install
      Cons: • Schemaless, so data integrity will need to be worked on • Graph structures not inherently supported

  17. MongoDB – Deployments • SourceForge http://sourceforge.net/ • BusinessInsider http://www.businessinsider.com/ • New York Times http://www.nytimes.com/ • Disqus http://www.disqus.com/

  18. MongoDB – ROI Use cases

  19. MongoDB – Example insert
      from pymongo import MongoClient  # MongoClient replaces the older Connection class

      connection = MongoClient()
      db = connection['databaseName']
      collection = db['collectionName']

      # One ROI document holding two ellipse shapes, each with its own tags
      collection.insert_one({
          "tags": [],
          "label": "MyROI",
          "shapes": [
              {"tags": [{"tag": "foo1", "namespace": "bob"}],
               "rx": 17, "ry": 17, "label": None, "cy": 75, "cx": 3,
               "t": 0, "z": 0, "type": "Ellipse", "id": 3},
              {"tags": [{"tag": "foo2", "namespace": "bob"}],
               "rx": 10, "ry": 16, "label": None, "cy": 82, "cx": 45,
               "t": 0, "z": 0, "type": "Ellipse", "id": 5},
          ],
          "type": "Roi",
          "id": 565,
      })

  20. MongoDB – Example query
      from pymongo import MongoClient

      connection = MongoClient()
      db = connection['databaseName']
      collection = db['collectionName']

      # Find ROIs with tag foofoo that contain shapes with tag foo1
      collection.find({"shapes.tags.tag": "foo1", "tags.tag": "foofoo"})

      # Find ROIs whose shapes have a tag containing "mitosis" (case-insensitive)
      collection.find({"shapes.tags.tag": {"$regex": "mitosis", "$options": "i"}})
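The dotted keys in the queries above ("shapes.tags.tag") reach into nested arrays of sub-documents. As a rough illustration of that matching rule, the following pure-Python sketch walks a document along a dotted path and descends into every list element it meets. It is a simplified model for explanation only, not MongoDB's implementation; `values_at`, `matches` and the sample `roi` document are hypothetical names and data introduced here.

```python
import re


def values_at(node, parts):
    """Yield every value reachable from node along the path in parts,
    descending into each element whenever a list is encountered."""
    if isinstance(node, list):
        for item in node:
            yield from values_at(item, parts)
    elif not parts:
        yield node
    elif isinstance(node, dict) and parts[0] in node:
        yield from values_at(node[parts[0]], parts[1:])


def matches(doc, query):
    """Return True if every dotted key in query has at least one matching value."""
    for path, wanted in query.items():
        found = values_at(doc, path.split("."))
        if isinstance(wanted, re.Pattern):
            ok = any(isinstance(v, str) and wanted.search(v) for v in found)
        else:
            ok = any(v == wanted for v in found)
        if not ok:
            return False
    return True


# Hypothetical ROI document, shaped like the insert example
roi = {
    "id": 565,
    "tags": [{"tag": "foofoo", "namespace": "bob"}],
    "shapes": [
        {"id": 3, "tags": [{"tag": "foo1", "namespace": "bob"}]},
        {"id": 5, "tags": [{"tag": "Mitosis-late", "namespace": "bob"}]},
    ],
}

both_tags = matches(roi, {"shapes.tags.tag": "foo1", "tags.tag": "foofoo"})
mitosis = matches(roi, {"shapes.tags.tag": re.compile("mitosis", re.I)})
```

Because tags live inside the ROI document itself, a tag lookup of this kind needs no join, which is the point the deck makes about annotations being easy in a document store.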

  21. Neo4J • Graph Database • use nodes to represent objects • User specifies relationship between nodes • Allows complex traversal of node structures

  22. Neo4J – Pros and Cons
      Pros: • Handles graph structures nicely • Transactional • Supported by Gremlin • Native RDF http://components.neo4j.org/neo-rdf-sail/ • Easy to install
      Cons: • No C++ language binding • Not distributed • Tables are not so easily modeled • Difficult to query on node contents

  23. Neo4J – Deployments • The Swedish Defence forces http://www.mil.se • Windh Technologies http://www.windh.com • Flextoll http://www.flextoll.se

  24. Neo4J – Example
      // Relationship types used to link OMERO objects
      public enum OMERORelations implements RelationshipType {
          ASSOCIATE, DERIVE, AGGREGATE, COMPOSE
      }

      // Node for the source image
      // (Neo4j property values must be primitives, Strings or arrays of those,
      //  so the IObject itself may need to be serialised before it is stored.)
      Node image = neo.createNode();
      image.setProperty("IObject", imageI);
      image.setProperty("id", imageI.getId().getValue());
      image.setProperty("name", imageI.getName().getValue());

      // Node for the image derived from it
      Node derivedImage = neo.createNode();
      derivedImage.setProperty("IObject", derivedImageI);
      derivedImage.setProperty("id", derivedImageI.getId().getValue());
      derivedImage.setProperty("name", derivedImageI.getName().getValue());

      // Record that derivedImage was produced by cropping image with an ROI
      Relationship relationship = image.createRelationshipTo(derivedImage, OMERORelations.DERIVE);
      relationship.setProperty("type", "ROI");
      relationship.setProperty("operation", "crop");
      relationship.setProperty("roi", cropRoiI);
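To show what "complex traversal of node structures" could look like for the derived-image case, here is a minimal in-memory sketch in Python of a property graph with typed relationships, followed by a walk along DERIVE edges. It does not use the Neo4J API; the `Graph` class, its methods and the sample nodes are hypothetical and only mirror the shape of the example above.

```python
from collections import defaultdict, deque


class Graph:
    """Tiny property graph: nodes and relationships both carry properties."""

    def __init__(self):
        self.nodes = {}                 # node id -> property dict
        self.edges = defaultdict(list)  # node id -> [(rel_type, target id, props)]

    def add_node(self, node_id, **props):
        self.nodes[node_id] = props

    def relate(self, source, target, rel_type, **props):
        self.edges[source].append((rel_type, target, props))

    def traverse(self, start, rel_type):
        """Breadth-first walk from start along outgoing rel_type edges.
        Returns the ids of the reached nodes in visit order."""
        seen, order, queue = {start}, [], deque([start])
        while queue:
            current = queue.popleft()
            for kind, target, _props in self.edges[current]:
                if kind == rel_type and target not in seen:
                    seen.add(target)
                    order.append(target)
                    queue.append(target)
        return order


g = Graph()
g.add_node(1, name="original image")
g.add_node(2, name="crop of original")
g.add_node(3, name="crop of crop")
g.add_node(4, name="associated image")
g.relate(1, 2, "DERIVE", type="ROI", operation="crop")
g.relate(2, 3, "DERIVE", type="ROI", operation="crop")
g.relate(1, 4, "ASSOCIATE")

derived = g.traverse(1, "DERIVE")  # every image derived, directly or not, from image 1
```

Finding nodes by their contents (for example, all nodes whose name contains a word) would mean scanning `g.nodes`, which is the "difficult to query on node contents" trade-off listed in the cons.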

  25. Neo4J – ROI Use cases

  26. Cassandra. Modeled on Google’s Bigtable, Cassandra is a complex implementation of a key/value store used to represent a table. A sophisticated toolset is required to get the most out of this solution: for instance, Google has created Sawzall to query its system, and Digg has released Lazyboy, a Python library for working with Cassandra. It works by creating a table whose columns are grouped into column families; like data lives in the same column family (e.g. Ellipse ROI).
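The column-family layout described above can be pictured as nested maps: row key, then column family, then column name. The sketch below models that in plain Python, assuming one row per shape and one family per shape type. It is an illustration of the data model only, not Cassandra's API; `ColumnFamilyTable` and the row keys are hypothetical.

```python
from collections import defaultdict


class ColumnFamilyTable:
    """Nested-map model of a table: row key -> column family -> {column: value}."""

    def __init__(self):
        self.rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row_key, family, column, value):
        self.rows[row_key][family][column] = value

    def get_family(self, row_key, family):
        """Return a copy of the columns stored for one row in one family."""
        return dict(self.rows.get(row_key, {}).get(family, {}))


table = ColumnFamilyTable()

# An ellipse shape: its geometry lives in the "Ellipse" family
for column, value in {"cx": 3, "cy": 75, "rx": 17, "ry": 17}.items():
    table.put("roi:565/shape:3", "Ellipse", column, value)

# A rectangle shape in another row uses a different family and different columns
for column, value in {"x": 10, "y": 20, "width": 30, "height": 40}.items():
    table.put("roi:566/shape:1", "Rect", column, value)

ellipse = table.get_family("roi:565/shape:3", "Ellipse")
```

No schema forces the two rows to share columns, which is how this kind of store copes with heterogeneous ROI shapes.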

  27. Cassandra – Pros and Cons
      Pros: • Quick • Handles heterogeneous data well • Different rows can have different columns • Can manage distributed data • Map/Reduce • Focus on writes not reads • Scales nicely • Easy to install
      Cons: • Not simple to work with • Building hierarchical structures • Sorting • Querying • Ad hoc queries are bad; Digg still use MySQL for certain queries • Have to manage secondary indexes (K/V) • Version 0.5

  28. Cassandra – Deployments • Facebook (MAYBE!!) http://www.facebook.com • Digg http://www.digg.com
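"Have to manage secondary indexes" means the application itself keeps a second key/value mapping in step with the primary one on every write. A minimal sketch of that bookkeeping, assuming ROIs are looked up by tag, might look like the following; `TaggedStore` and its methods are hypothetical and stand in for two key/value collections rather than any real Cassandra call.

```python
class TaggedStore:
    """Primary key/value store plus a hand-maintained tag -> ids index."""

    def __init__(self):
        self.data = {}    # primary: roi id -> record
        self.by_tag = {}  # secondary index: tag -> set of roi ids

    def put(self, roi_id, record):
        # Remove any stale index entries before writing the new record
        self.delete(roi_id)
        self.data[roi_id] = record
        for tag in record.get("tags", []):
            self.by_tag.setdefault(tag, set()).add(roi_id)

    def delete(self, roi_id):
        old = self.data.pop(roi_id, None)
        if old is None:
            return
        for tag in old.get("tags", []):
            ids = self.by_tag.get(tag)
            if ids is not None:
                ids.discard(roi_id)
                if not ids:
                    del self.by_tag[tag]

    def find_by_tag(self, tag):
        return sorted(self.by_tag.get(tag, ()))


store = TaggedStore()
store.put(1, {"type": "Ellipse", "tags": ["mitosis"]})
store.put(2, {"type": "Rect", "tags": ["mitosis", "foo"]})
first_lookup = store.find_by_tag("mitosis")

# Re-tagging ROI 1 must update the index as well as the record
store.put(1, {"type": "Ellipse", "tags": ["foo"]})
second_lookup = store.find_by_tag("mitosis")
```

Every write path has to touch both maps, and in a distributed store the two updates are not automatically atomic, which is part of why the deck lists this as a cost.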

  29. Cassandra – ROI Use cases

  30. Hypertable. An implementation of Google’s Bigtable, Hypertable is a complex implementation of a key/value store used to represent a table. A sophisticated toolset is required to get the most out of this solution; for instance, Google has created Sawzall to query its system. Hypertable has a query language called HQL. It works by creating a table whose columns are grouped into column families; like data lives in the same column family (e.g. Ellipse ROI).

  31. Hypertable – Pros and Cons
      Pros: • Quick • Handles heterogeneous data well • Different rows can have different columns • Can manage distributed data • Map/Reduce • Scales nicely • Easy to install
      Cons: • GPL License • Building hierarchical structures • Docs are weak • HQL works for simple queries only • Map/Reduce for other work • Limit of 255 column families • Secondary keys

  32. Hypertable – Deployments • Rediff http://www.rediff.com • Zvents http://www.zvents.com/

  33. Hypertable – ROI Use cases

  34. Are we Normal? • Why do we have an RDBMS? • We don’t normalise the data • Each import will normalise on: • Image, ObjectiveSettings, LogicalChannel, LightSettings, DetectorSettings • Object Penalty • Difference between normalisation and view
