BULK DATA RETRIEVAL

gagan

Presentation Transcript


  1. Raytheon EED Program | ECHO Technical Interchange 2013 BULK DATA RETRIEVAL ECHO Technical Interchange Meeting April 30 & May 1, 2013

  2. What is it For?
     • Quick access to Publicly Available Data via URLs
     • No processing options
     • User Driven Pull
     • Near-instant

  3. State of the Union (Reverb)
     • Put items in your cart, click “Download”
     • URL Options:
       • Data
       • Metadata
       • Browse
     • Download Options:
       • Text File
       • FTP Batch Script

  4. How Does Reverb Do It?
     • Catalog-REST!
     • Granule Searches
     • “atom” format results
     • Scan for “links” to URLs
     • Download a file containing those links.

  5. How do we do it?
     • Catalog-REST!
     • Granule Searches
     • “atom” format results
     • Scan for “links” to URLs
     • Create a file containing those links.
     • Get them.

  6. Example (cURL)*
     • curl -gG "https://testbed.echo.nasa.gov/catalog-rest/echo_catalog/granules.atom?echo_collection_id=C3878-LPDAAC_ECS&bounding_box=10.488%2C-0.703%2C53.331%2C68.906&temporal[]=2009-01-01T10%3A00%3A00Z%2C2010-03-10T12%3A00%3A00Z"
     • This gets all granules with:
       • echo_collection_id of: C3878-LPDAAC_ECS
       • Spatial bounding box: 10.488, -0.703, 53.331, 68.906 (W, S, E, N)
       • Time constraint: 2009-01-01T10:00:00Z - 2010-03-10T12:00:00Z
     • ~80 hits! Add -I to the curl options to see the response headers, and look for “Echo-Hits”
     * also from perl/bulk/get_bulk.pl
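The query string on this slide can also be assembled from its parts instead of being percent-encoded by hand. The following is a minimal Python sketch using only the standard library; the helper names `granule_query_url` and `hits_from_headers` are illustrative assumptions and are not part of ECHO or of get_bulk.pl. The endpoint, parameter names, and the “Echo-Hits” header name are taken from the slide.

```python
from urllib.parse import urlencode

# Endpoint copied from the slide.
GRANULES_ATOM = "https://testbed.echo.nasa.gov/catalog-rest/echo_catalog/granules.atom"


def granule_query_url(collection_id, bbox, start, end, base=GRANULES_ATOM):
    """Build the granule search URL (hypothetical helper).

    bbox is (W, S, E, N), in the order the slide gives it.
    """
    params = [
        ("echo_collection_id", collection_id),
        ("bounding_box", ",".join(str(v) for v in bbox)),
        ("temporal[]", f"{start},{end}"),
    ]
    # safe="[]" keeps the brackets in "temporal[]" literal, as on the slide;
    # commas and colons in the values are percent-encoded.
    return base + "?" + urlencode(params, safe="[]")


def hits_from_headers(headers):
    """Return the Echo-Hits count from a mapping of response headers, or None."""
    for name, value in headers.items():
        if name.lower() == "echo-hits":
            return int(value)
    return None


url = granule_query_url(
    "C3878-LPDAAC_ECS",
    ("10.488", "-0.703", "53.331", "68.906"),
    "2009-01-01T10:00:00Z",
    "2010-03-10T12:00:00Z",
)
print(url)
```

The header mapping passed to `hits_from_headers` could come from a HEAD request (the counterpart of `curl -I`), for example the `headers` attribute of the response returned by `urllib.request.urlopen` for a `Request(url, method="HEAD")`.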

  7. What do the Results Look Like?
     <entry xmlns:georss="http://www.georss.org/georss/10"
            xmlns:time="http://a9.com/-/opensearch/extensions/time/1.0/"
            xmlns:echo="http://www.echo.nasa.gov/esip"
            xmlns:gml="http://www.opengis.net/gml">
       <id>G10607-LPDAAC_ECS</id>
       <title type="text">SC:MCD43A4.005:2075808749</title>
       <updated>2009-10-15T14:01:49.076Z</updated>
       <echo:datasetId>MODIS/Terra+Aqua Nadir BRDF-Adjusted Reflectance 16-Day L3 Global 500m SIN Grid V005</echo:datasetId>
       <echo:producerGranuleId>MCD43A4.A2009257.h21v08.005.2009276131145.hdf</echo:producerGranuleId>
       <echo:granuleSizeMB>57.7068</echo:granuleSizeMB>
       <echo:dataCenter>LPDAAC_ECS</echo:dataCenter>
       <time:start>2009-09-14T00:00:00.000Z</time:start>
       <time:end>2009-09-29T23:59:59.999Z</time:end>
       <link href="ftp://e4ftl01.cr.usgs.gov/MODIS_Composites/MOTA/MCD43A4.005/2009.09.14/MCD43A4.A2009257.h21v08.005.2009276131145.hdf" hreflang="en-US" rel="http://esipfed.org/ns/fedsearch/1.1/data#"/>
       <link href="ftp://e4ftl01.cr.usgs.gov/WORKING/BRWS/Browse.001/2009.10.03/BROWSE.MCD43A4.A2009257.h21v08.005.2009276091235.1.jpg" hreflang="en-US" title=" (BROWSE)" type="image/jpeg" rel="http://esipfed.org/ns/fedsearch/1.1/browse#"/>
       <link href="ftp://e4ftl01.cr.usgs.gov/MODIS_Composites/MOTA/MCD43A4.005/2009.09.14/MCD43A4.A2009257.h21v08.005.2009276131145.hdf.xml" hreflang="en-US" title=" (METADATA)" rel="http://esipfed.org/ns/fedsearch/1.1/metadata#"/>
       <link href="http://landweb.nascom.nasa.gov/cgi-bin/QA_WWW/qaFlagPage.cgi?sat=aqua" hreflang="en-US" title=" (DatasetDisclaimer)" rel="http://esipfed.org/ns/fedsearch/1.1/metadata#"/>
       <link href="http://lpdaac.usgs.gov/modis/dataprod.html" hreflang="en-US" title="Documents page for LP DAAC MODIS Products. (MiscInformation)" rel="http://esipfed.org/ns/fedsearch/1.1/metadata#"/>
       <link href="http://testbed.echo.nasa.gov/LPDAAC_ECS/2010/04/06/:BR:Browse.001:2075808773:1.BINARY" hreflang="en-US" type="application/x-hdfeos" rel="http://esipfed.org/ns/fedsearch/1.1/browse#" length="29953"/>
       <georss:polygon>3.85518158489962e-05 29.8878504914521 -0.00342414555683897 40.0119084380163 9.99985925957934 40.6260396971992 10.0030323070386 30.3449665340407 3.85518158489962e-05 29.8878504914521</georss:polygon>
       <echo:onlineAccessFlag>true</echo:onlineAccessFlag>
       <echo:browseFlag>true</echo:browseFlag>
       <echo:dayNightFlag>DAY</echo:dayNightFlag>
     </entry>
     YIKES!

  8. “link” is your friend!
     <link href="ftp://e4ftl01.cr.usgs.gov/MODIS_Composites/MOTA/MCD43A4.005/2009.09.14/MCD43A4.A2009257.h21v08.005.2009276131145.hdf" hreflang="en-US" rel="http://esipfed.org/ns/fedsearch/1.1/data#"/>
     <link href="ftp://e4ftl01.cr.usgs.gov/WORKING/BRWS/Browse.001/2009.10.03/BROWSE.MCD43A4.A2009257.h21v08.005.2009276091235.1.jpg" hreflang="en-US" title=" (BROWSE)" type="image/jpeg" rel="http://esipfed.org/ns/fedsearch/1.1/browse#"/>
     <link href="ftp://e4ftl01.cr.usgs.gov/MODIS_Composites/MOTA/MCD43A4.005/2009.09.14/MCD43A4.A2009257.h21v08.005.2009276131145.hdf.xml" hreflang="en-US" title=" (METADATA)" rel="http://esipfed.org/ns/fedsearch/1.1/metadata#"/>
     <link href="http://landweb.nascom.nasa.gov/cgi-bin/QA_WWW/qaFlagPage.cgi?sat=aqua" hreflang="en-US" title=" (DatasetDisclaimer)" rel="http://esipfed.org/ns/fedsearch/1.1/metadata#"/>
     <link href="http://lpdaac.usgs.gov/modis/dataprod.html" hreflang="en-US" title="Documents page for LP DAAC MODIS Products. (MiscInformation)" rel="http://esipfed.org/ns/fedsearch/1.1/metadata#"/>
     <link href="http://testbed.echo.nasa.gov/LPDAAC_ECS/2010/04/06/:BR:Browse.001:2075808773:1.BINARY" hreflang="en-US" type="application/x-hdfeos" rel="http://esipfed.org/ns/fedsearch/1.1/browse#" length="29953"/>

  9. Only link…/data# Please!
     • curl -gG "https://testbed.echo.nasa.gov/catalog-rest/echo_catalog/granules.atom?echo_collection_id=C3878-LPDAAC_ECS&bounding_box=10.488%2C-0.703%2C53.331%2C68.906&temporal[]=2009-01-01T10%3A00%3A00Z%2C2010-03-10T12%3A00%3A00Z" | perl -ne "print if m/link.*\/data#/;"
     • “Just show me the results that have ‘data’ type in the links”
     • Slightly more clever perl:
       • … | perl -anF/\"/ -e "print qq(\$F[1]\n) if m/link.*\/data#/;"
     • But watch out for Windows vs. Mac/Linux quoting!
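The same "only links whose rel ends in /data#" filter can be written without shell quoting at all. This is a minimal Python sketch using the standard-library XML parser; the function name `links_of_kind` is an illustrative assumption, and `SAMPLE` is abridged from the entry shown on the earlier slides. The element namespace is stripped before comparison because the sketch assumes a full feed may declare a default Atom namespace that the excerpt does not show.

```python
import xml.etree.ElementTree as ET

# Two links abridged from the sample entry shown on the slides.
SAMPLE = """<entry>
  <link href="ftp://e4ftl01.cr.usgs.gov/MODIS_Composites/MOTA/MCD43A4.005/2009.09.14/MCD43A4.A2009257.h21v08.005.2009276131145.hdf" hreflang="en-US" rel="http://esipfed.org/ns/fedsearch/1.1/data#"/>
  <link href="ftp://e4ftl01.cr.usgs.gov/MODIS_Composites/MOTA/MCD43A4.005/2009.09.14/MCD43A4.A2009257.h21v08.005.2009276131145.hdf.xml" hreflang="en-US" title=" (METADATA)" rel="http://esipfed.org/ns/fedsearch/1.1/metadata#"/>
</entry>"""


def links_of_kind(atom_xml, kind="data"):
    """Return href values of <link> elements whose rel ends with '/<kind>#'."""
    root = ET.fromstring(atom_xml)
    suffix = "/" + kind + "#"
    urls = []
    for elem in root.iter():
        # Drop a "{namespace}" prefix, if any, so both plain and
        # namespaced <link> elements are matched.
        local_tag = elem.tag.rsplit("}", 1)[-1]
        if local_tag == "link" and elem.get("rel", "").endswith(suffix):
            urls.append(elem.get("href"))
    return urls


for href in links_of_kind(SAMPLE):
    print(href)
```

Passing `kind="metadata"` or `kind="browse"` selects the other URL options listed for Reverb (Metadata, Browse).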

  10. Loop it Over your Echo-Hits
      • append &page_size=500 to URL_String
      • end_pages = (Echo-Hits DIV page_size) + 1
      • for page 1 .. end_pages
        • curl (string+&page_num=$page) | clever.perl >> output.URLs
      • Now you have an output.URLs file with lots of URLs in it…
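The paging arithmetic on this slide can be sketched as follows. This is a minimal Python illustration, not the deck's script: the function names `end_pages` and `page_urls` and the example base URL are assumptions; the `page_size` and `page_num` parameters and the page-count formula are taken from the slide.

```python
def end_pages(hits, page_size):
    """Number of pages to request, using the slide's formula:
    (Echo-Hits DIV page_size) + 1."""
    return hits // page_size + 1


def page_urls(url_string, hits, page_size=500):
    """Return one URL per page, numbered from 1 to end_pages."""
    paged = f"{url_string}&page_size={page_size}"
    last = end_pages(hits, page_size)
    return [f"{paged}&page_num={page}" for page in range(1, last + 1)]


# Hypothetical base URL; substitute the real granule query string.
for u in page_urls("https://example.invalid/granules.atom?echo_collection_id=X", 1200):
    print(u)
```

Note that when the hit count is an exact multiple of the page size (for example 1000 hits at 500 per page), this formula requests one more page than strictly needed; ceiling division, `-(-hits // page_size)`, would avoid the extra request.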

  11. So What if I Have Some URLs?
      • Scripting curl to the rescue!
      • Linux/Mac/Unix:
        • for url in $(<output.URLs); do curl "$url" -OL -s; done
      • Windows:
        • for /f %f in (output.URLs) do curl %f -OL
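For readers who would rather not maintain separate Linux and Windows loops, the download step can be approximated in one portable script. This is a minimal Python sketch under stated assumptions: the helper names `read_urls`, `local_name`, and `download_all` are illustrative, and `urllib.request.urlretrieve` stands in for `curl -OL`; it is not a drop-in replacement and does no retrying or authentication.

```python
import posixpath
import urllib.request
from urllib.parse import urlsplit


def read_urls(lines):
    """Strip whitespace and skip blank lines from an iterable of lines."""
    return [line.strip() for line in lines if line.strip()]


def local_name(url):
    """Local file name for a URL: the last segment of its path."""
    return posixpath.basename(urlsplit(url).path)


def download_all(list_path="output.URLs"):
    """Fetch every URL listed in list_path into the current directory."""
    with open(list_path) as fh:
        urls = read_urls(fh)
    for url in urls:
        # urlretrieve handles both http(s) and ftp URLs.
        urllib.request.urlretrieve(url, local_name(url))


example = ("ftp://e4ftl01.cr.usgs.gov/MODIS_Composites/MOTA/MCD43A4.005/"
           "2009.09.14/MCD43A4.A2009257.h21v08.005.2009276131145.hdf")
print(local_name(example))
```

Calling `download_all()` in the directory that holds output.URLs would then fetch each listed file in turn.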

  12. Questions?
