Computations with Big Image Data

Phuong Nguyen — Sponsor: NIST

Presentation Transcript

  1. Computations with Big Image Data
  Phuong Nguyen
  Sponsor: NIST

  2. Computations with Big Image Data
  • Motivation:
    • Live cell image processing application: the microscope generates a large number of spatial image tiles, with several measurements at each pixel, per time slice.
    • Analyze these images with computations that calibrate, segment and visualize image channels, as well as extract image features for further analyses.
    • A desktop does not scale:
      • E.g., image segmentation on stitched images using Matlab: 954 files * 8 mins = 127 hours (stitched TIFF: ~0.578 TB per experiment)
      • E.g., 161 files * 8 mins = 21.5 hours (1 GB per file)
  • Goals:
    • Computational scalability of cell image processing
    • Distributed data partitioning strategies and parallel algorithms
    • Analysis and evaluation of different algorithms/approaches
    • Generalize as libraries/benchmarks/tools for image processing

  3. Computations with Big Image Data cont.
  • Processing these images:
    • Operates either on thousands of mega-pixel images (image tiles) or on hundreds of half- or giga-pixel images (stitched images)
    • Ranges from computationally intensive to data intensive
  • Approaches:
    • Develop distributed data partitioning strategies and parallel processing algorithms
    • Implement and run benchmarks on distributed/parallel frameworks and platforms
    • Use the Hadoop MapReduce framework and compare it with other frameworks and with parallel scripts (PBS) using network file system storage

  4. Image segmentation using Java/Hadoop
  • Segmentation method that consists of four linear workflow steps:
    • Sobel-based image gradient computation (see the sketch below)
    • Connectivity analysis to group 4-connected pixels, thresholded by a value to remove small objects
    • Morphological open (dilation of an erosion) using a 3x3 convolution kernel to remove small holes and islands
    • Connectivity analysis and thresholding by a value to remove small objects again
  • Connectivity analysis assigns the same label to each contiguous group of 4-connected pixels.
  • Sobel gradient equation: G(x,y) = sqrt(Gx(x,y)^2 + Gy(x,y)^2), where Gx and Gy are the horizontal and vertical 3x3 Sobel responses.
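
  A minimal Java sketch of the Sobel gradient step, assuming a grayscale image stored row-major as img[row][col]; the method name and layout are illustrative, not the NIST implementation:

      // Sobel gradient magnitude: G = sqrt(Gx^2 + Gy^2).
      // Border pixels are left at 0 for simplicity.
      static double[][] sobelGradient(double[][] img) {
          int h = img.length, w = img[0].length;
          double[][] g = new double[h][w];
          for (int y = 1; y < h - 1; y++) {
              for (int x = 1; x < w - 1; x++) {
                  // Horizontal and vertical 3x3 Sobel responses
                  double gx = -img[y-1][x-1] - 2*img[y][x-1] - img[y+1][x-1]
                            +  img[y-1][x+1] + 2*img[y][x+1] + img[y+1][x+1];
                  double gy = -img[y-1][x-1] - 2*img[y-1][x] - img[y-1][x+1]
                            +  img[y+1][x-1] + 2*img[y+1][x] + img[y+1][x+1];
                  g[y][x] = Math.sqrt(gx*gx + gy*gy);
              }
          }
          return g;
      }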

  5. Flat Field Correction
  • Corrects spatial shading of a tile image:
    IFFC(x,y) = (I(x,y) - DI(x,y)) / (WI(x,y) - DI(x,y))
  • where IFFC(x,y) is the flat-field corrected image intensity,
  • I(x,y) is the raw uncorrected image intensity,
  • DI(x,y) is the dark image acquired by closing the camera shutter, and
  • WI(x,y) is the flat-field intensity acquired without any object.
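
  A per-pixel Java sketch of this correction, following the formula above. The array names mirror the slide's symbols; the zero-denominator guard is an added assumption, not part of the original method:

      // IFFC(x,y) = (I(x,y) - DI(x,y)) / (WI(x,y) - DI(x,y))
      static float[][] flatFieldCorrect(float[][] raw, float[][] dark, float[][] white) {
          int h = raw.length, w = raw[0].length;
          float[][] out = new float[h][w];
          for (int y = 0; y < h; y++) {
              for (int x = 0; x < w; x++) {
                  float denom = white[y][x] - dark[y][x];
                  // Guard against a zero denominator (assumption, not in the slide)
                  out[y][x] = denom != 0f ? (raw[y][x] - dark[y][x]) / denom : 0f;
              }
          }
          return out;
      }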

  6. Characteristics of selected cell image processing computations
  [Table: summary of computations, and input and output image data files]

  7. Hadoop MapReduce approach
  (Diagram source: http://developer.yahoo.com/hadoop/tutorial/module4.html)
  • Image files are uploaded to HDFS.
  • Input format changes: an InputFormat that reads images, plus serialization.
  • Splitting of the input: currently no split; each mapper processes a whole stitched image.
  • Map-only jobs (no reducers); output is written directly to HDFS as files. A job skeleton follows below.
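
  A map-only Hadoop job skeleton consistent with this approach: zero reducers, and each mapper receives one whole image. WholeImageInputFormat, ImageWritable, and Segmenter stand in for the custom classes the slide implies; they are hypothetical names, not the NIST code:

      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.fs.Path;
      import org.apache.hadoop.io.Text;
      import org.apache.hadoop.mapreduce.Job;
      import org.apache.hadoop.mapreduce.Mapper;
      import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
      import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

      public class SegmentationJob {
          public static class SegmentMapper
                  extends Mapper<Text, ImageWritable, Text, ImageWritable> {
              @Override
              protected void map(Text filename, ImageWritable image, Context ctx)
                      throws java.io.IOException, InterruptedException {
                  // Run the whole segmentation pipeline on one stitched image
                  ctx.write(filename, Segmenter.segment(image));
              }
          }

          public static void main(String[] args) throws Exception {
              Job job = Job.getInstance(new Configuration(), "image-segmentation");
              job.setJarByClass(SegmentationJob.class);
              job.setInputFormatClass(WholeImageInputFormat.class); // whole image, no split
              job.setMapperClass(SegmentMapper.class);
              job.setNumReduceTasks(0); // map-only: output written directly to HDFS
              job.setOutputKeyClass(Text.class);
              job.setOutputValueClass(ImageWritable.class);
              // A matching image OutputFormat would be set here as well (omitted)
              FileInputFormat.addInputPath(job, new Path(args[0]));
              FileOutputFormat.setOutputPath(job, new Path(args[1]));
              System.exit(job.waitForCompletion(true) ? 0 : 1);
          }
      }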

  8. Hadoop MapReduce approach cont.
  • Advantages of using Hadoop:
    • Data is local to the compute node -> avoids network file system bottlenecks when running at scale.
    • Manages execution of tasks and automatically reruns failed tasks.
    • Caveat: a big image loses more work if its task fails.
    • For small images (e.g., < 128 MB), use Hadoop SequenceFiles, which consist of binary key/value pairs (key: image filename, value: image data); see the packing sketch below. An alternative is Apache Avro (a data serialization system).
  • Run on the NIST HPC cluster (Raritan cluster):
    • HPC queue system
    • Data must be moved in and out
    • Not possible to share data in HDFS
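
  A sketch of packing many small image files into one Hadoop SequenceFile with the key/value layout described above (key: image filename, value: raw image bytes). The class name and arguments are illustrative:

      import java.io.File;
      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.io.BytesWritable;
      import org.apache.hadoop.io.SequenceFile;
      import org.apache.hadoop.io.Text;

      public class PackImages {
          // args[0]: local directory of image tiles, args[1]: output SequenceFile path
          public static void main(String[] args) throws Exception {
              Configuration conf = new Configuration();
              org.apache.hadoop.fs.Path out = new org.apache.hadoop.fs.Path(args[1]);
              try (SequenceFile.Writer writer = SequenceFile.createWriter(conf,
                      SequenceFile.Writer.file(out),
                      SequenceFile.Writer.keyClass(Text.class),
                      SequenceFile.Writer.valueClass(BytesWritable.class))) {
                  for (File f : new File(args[0]).listFiles()) {
                      byte[] bytes = java.nio.file.Files.readAllBytes(f.toPath());
                      // key: image filename, value: image data
                      writer.append(new Text(f.getName()), new BytesWritable(bytes));
                  }
              }
          }
      }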

  9. Image segmentation benchmark using Hadoop: results
  • Single-node, single-threaded Java takes 10 hours.
  • Matlab on a desktop machine takes ~21.5 hours.
  • The workload is both I/O and computation intensive.
  • Image segmentation scales well using Hadoop.
  • Efficiency decreases as the number of nodes increases.

  10. Flat Field Correction benchmark using Hadoop: results
  • An I/O-intensive task, primarily writing output data to the HDFS file system.

  11. Hadoop MapReduce approach cont.
  • Future work considers these techniques:
    • Achieve pixel-level parallelism by breaking each image into smaller images, running the algorithms (segmentation, flat field correction, …) on each piece, and joining the results upon completion (before downloading files from HDFS to the network file system).
    • This method can also be extended to overlapping blocks, by providing a split method that divides the input image along boundaries at an atomic number of rows/columns and defines a number of overlapping pixels along each side; see the block-splitting sketch after this list.
    • Compare no split vs. split vs. split with overlapping pixels.
    • Reduce tasks in the MapReduce framework can be useful for some image processing algorithms, e.g., feature extraction.
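
  A sketch of the overlapping-block split described above: the image is tiled into fixed-size blocks, each expanded by a halo of overlapping pixels and clamped to the image bounds. Names and parameters are illustrative assumptions:

      import java.util.ArrayList;
      import java.util.List;

      public class BlockSplitter {
          /** Pixel bounds of one block including its halo: [x0, x1) x [y0, y1). */
          public record Block(int x0, int y0, int x1, int y1) {}

          static List<Block> split(int width, int height, int blockSize, int overlap) {
              List<Block> blocks = new ArrayList<>();
              for (int y = 0; y < height; y += blockSize) {
                  for (int x = 0; x < width; x += blockSize) {
                      blocks.add(new Block(
                              Math.max(0, x - overlap),
                              Math.max(0, y - overlap),
                              Math.min(width,  x + blockSize + overlap),
                              Math.min(height, y + blockSize + overlap)));
                  }
              }
              return blocks;
          }
      }

  Setting overlap to 0 yields the plain split case, so the same routine supports the no-overlap comparison above.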

  12. Summary
  • We have developed image processing algorithms and characterized their computations as potential contributions to:
    • scaling the cell image analysis application, and
    • providing image processing benchmarks using Hadoop.
  • Future work considers:
    • Optimizing and tuning these image processing computations using Hadoop
    • Generalizing them into libraries/benchmarks/tools for image processing
