
OTN Workshop 2014

OTN Workshop 2014. OTN SandBox Presented by Marta Mihoff OTN Database/Data Process Manager. Outline Background Platform Overview Exercises Upload new Sandbox Backup data folder and Upload data files Create working folder File Conversion White-Mihoff False Filtering Tool


Presentation Transcript


  1. OTN Workshop 2014 OTN SandBox Presented by Marta Mihoff OTN Database/Data Process Manager

  2. Outline • Background • Platform Overview • Exercises • Upload new Sandbox • Backup data folder and Upload data files • Create working folder • File Conversion • White-Mihoff False Filtering Tool • Distance Matrix Merge • Mihoff Interval Data Tool • Cleanup Tool • Wrap Up

  3. OTN Sandbox Background • Symposium 2013 researcher requests • Fall 2013 request from Steve Kessel and Eddie Halfyard • Reverse engineered Easton White’s code • Presentation Platform • Winter development and testing

  4. OTN SandBox Platform • Free open software Black Box • Oracle VM VirtualBox • OTN Sandbox Appliance • PostgreSQL 9.1 database • pgAdmin • Python 2.7 • RStudio (only part visible)

  5. OTN SandBox Tools • White-Mihoff False Filtering Tool • Builds a file of suspect detections • Creates a file of filtered detections • Creates a distance matrix • Distance Matrix Merge • Outputs a matrix overriding distances with researcher input • Mihoff Interval Data Tool • Creates a file of Compressed detections and a file of interval data • Miscellaneous • File Conversion (UTF8) • Cleanup

  6. Start Oracle Virtual Box and OTN SandBox • Open Start Window • Click on Oracle VM VirtualBox • Start OTN Sandbox

  7. Sign In • Open Chrome or Firefox • Paste sandbox URL • Sign in • Username: sandbox • Password: otn123 • Will not work with VPN turned on

  8. Exercise: Update Sandbox Folder • Navigate to Documentation folder on USB stick • Open Update Sandbox Tools Instructions.doc • Move files you want to Save

  9. Update Sandbox: save off files • Click New Folder button • Type in a folder name • Click OK • Confirm you have saved files

  10. Exercise: Update Sandbox Folder • Navigate to folder Rstudio • Check the sandbox folder • Delete folder sandbox • Click the Upload button

  11. Upload Sandbox Folder • Navigate to USB stick • OTN TOOL BOX/OTN Sandbox • Choose sandbox.zip • Click Open • Click OK

  12. Data Folder Management • Manage your own data • Keep separate data folders for different projects • Current working data folder is always “data” • You can export a folder to your desktop

  13. Exercise: Renaming data folder • Check the data folder • Click the rename button on the Files menu • Type new name for data folder • Click OK

  14. Upload Sample Data • Click the Upload button • Navigate to USB stick • Choose data.zip • Click OK

  15. Create a workshop folder for test scripts • Click New Folder button on Files Menu • Type in folder name • Click OK

  16. Documentation and Software Location • Introduction page with links http://members.oceantrack.org/data/otn-tool-box • Direct Location for most up to date version http://members.oceantrack.org/toolbox/

  17. Folder Structure: Software • Check the date of sandbox.zip • Upload if more recent than your version • Watch for OVA updates • OVA replacement would be required if the underlying platform needed to change

  18. Folder Structure: Documentation • There is extra stuff for geeks in the Appendix of the Install guide • Update Sandbox Tools Instructions would be used after initial install to add new functions or fixes • Troubleshooting will be expanded as users report problems and we find solutions

  19. Exercise: File Conversion • Open sandbox folder • Click on file_conversion_driver.r • File will open in upper left window of GUI • Save file to WorkShop Scripts folder

  20. Exercise: File conversion • Open data folder • Cut and paste the file name into the script • Save the script

  21. Running R-scripts • Highlight the lines you want to execute • Click the run button

  22. File Conversion: Notepad++ Encoding • Open file in Notepad++ • Click Encoding on Menu Bar • Button indicates encoding • Click Convert to UTF-8 without BOM • Save file
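
The conversion on this slide can also be scripted. The following is a minimal Python 3 sketch, not the code behind file_conversion_driver.r: the function name `convert_to_utf8_no_bom` and the fallback encoding `latin-1` are assumptions made for illustration. It strips a UTF-8 byte order mark if one is present and otherwise re-encodes the text as UTF-8.

```python
import codecs


def convert_to_utf8_no_bom(src_path, dst_path, fallback_encoding="latin-1"):
    """Re-encode a text file as UTF-8 without a byte order mark (BOM).

    fallback_encoding is an assumption: it is only used when the input
    is not already valid UTF-8.
    """
    with open(src_path, "rb") as f:
        raw = f.read()

    if raw.startswith(codecs.BOM_UTF8):
        # Already UTF-8, but with a BOM: drop the BOM bytes.
        text = raw[len(codecs.BOM_UTF8):].decode("utf-8")
    else:
        try:
            text = raw.decode("utf-8")
        except UnicodeDecodeError:
            text = raw.decode(fallback_encoding)

    with open(dst_path, "wb") as f:
        f.write(text.encode("utf-8"))
```

Writing to a separate output path leaves the original file untouched, which fits the workshop advice to back up the data folder first.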

  23. Exercise: filtering suspect detections • Open sandbox folder • Click on filter_driver.r • Will open in upper left window • Save to WorkShop Scripts folder

  24. Exercise: Filtering Control Parameters • Highlight this entire section and click the run button

  25. Exercise: Filtering Functions • loadDetections() • Input a detection file • Outputs a file of suspected detections • And an optional distance matrix • filterDetections() • Input a detection file and a file of suspect detections • Outputs a file of filtered detections • And an optional distance matrix

  26. False Filtering: Minimum Requirements • Column: unqdetecid must be present. • Must contain unique values. • Column: catalognumber must be present. • This can be an animal id or a transmitter id. • Column: datecollected must be present. • Must be format YYYY-MM-DD HH:MI:SS • or YYYY-MM-DDTHH:MI:SS • All digits must be present • Column: station must be present.
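
The requirements above can be checked before a file is loaded into the tool. This is a hedged Python 3 sketch rather than part of the Sandbox: the function name `check_detections` and the wording of the messages are hypothetical. Only the column names and the two date formats come from the slide.

```python
import csv
import io
import re

REQUIRED_COLUMNS = ("unqdetecid", "catalognumber", "datecollected", "station")

# YYYY-MM-DD HH:MI:SS or YYYY-MM-DDTHH:MI:SS, with every digit present.
DATE_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}[ T]\d{2}:\d{2}:\d{2}")


def check_detections(csv_text):
    """Return a list of problems found; an empty list means the checks passed."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    problems = ["missing column: " + c for c in REQUIRED_COLUMNS if c not in header]
    if problems:
        return problems

    seen_ids = set()
    # The header is line 1 of the file, so data rows start at line 2.
    for line_no, row in enumerate(reader, start=2):
        det_id = row["unqdetecid"]
        if det_id in seen_ids:
            problems.append("line %d: duplicate unqdetecid %s" % (line_no, det_id))
        seen_ids.add(det_id)

        stamp = row["datecollected"] or ""
        if not DATE_PATTERN.fullmatch(stamp):
            problems.append("line %d: bad datecollected %r" % (line_no, stamp))
    return problems
```

The pattern only checks the shape of the timestamp, so a value such as month 13 would still pass. Parsing with `datetime.strptime` would be a stricter alternative.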

  27. False Filtering: Set input values • Open data folder • Highlight input detection file and copy • Paste into script window over detections.csv

  28. Run the load step • Paste the file name between the quotes • Highlight this section of code • Click the run button

  29. Output Messages: Load Step

  30. Data: Suspect Detections (transposed) • Each row represents info about three consecutive detections of one animal • The column value for suspect_detection represents the unique id from the input file
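
To illustrate what "three consecutive detections of one animal" means, here is a small Python 3 sketch that groups detections by animal and walks them in time order. It shows only the windowing. The criteria the tool uses to mark a detection as suspect are not given in this transcript and are not reproduced here. The function name `consecutive_triples` is hypothetical, and the sketch assumes every timestamp in a file uses the same one of the two accepted formats.

```python
from collections import defaultdict


def consecutive_triples(detections):
    """Yield (previous, current, next) detection triples for each animal.

    detections is an iterable of dicts with the keys catalognumber,
    datecollected and unqdetecid.
    """
    by_animal = defaultdict(list)
    for det in detections:
        by_animal[det["catalognumber"]].append(det)

    for animal in sorted(by_animal):
        # Timestamps in a single YYYY-MM-DD HH:MI:SS style sort
        # chronologically when compared as plain strings.
        rows = sorted(by_animal[animal], key=lambda d: d["datecollected"])
        for i in range(1, len(rows) - 1):
            yield rows[i - 1], rows[i], rows[i + 1]
```

An animal with fewer than three detections produces no triples in this sketch.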

  31. Run the Filter Step • If you have your own file of suspect detections or have edited the one the tool created • This is where you override the input file • Otherwise the program will use the one created in the previous step

  32. Output Messages: Filter Step • Messages will tell you: • What file of suspect detections was used • What the input detection file was • Record counts • Output file names
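
The filter step can be pictured as removing the listed suspect rows from the detection file. The Python 3 sketch below assumes that "filtered" means the suspect rows are dropped, which the transcript does not state outright. The function name `filter_detections` is hypothetical, while `suspect_detection` and `unqdetecid` are the column names mentioned in the slides.

```python
def filter_detections(detections, suspects):
    """Return the detections whose unqdetecid is not listed as suspect.

    detections: list of dicts with an unqdetecid key.
    suspects: list of dicts with a suspect_detection key holding the
    unique id of a detection from the input file.
    """
    suspect_ids = {row["suspect_detection"] for row in suspects}
    return [det for det in detections if det["unqdetecid"] not in suspect_ids]
```

Because the suspect list is an ordinary input, an edited copy of the file the tool created can be passed in the same way, matching the override described for the filter step.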

  33. Data: Distance Matrix

  34. Exercise: Distance Matrix Merge • Open sandbox folder • Click on distance_matrix_merge_driver.r • Will open in upper left window • Save to WorkShop Scripts folder

  35. Exercise: Distance Matrix Merge • File for distance_matrix_input was created in false filtering step • Highlight file sample_distance_matrix_override_values.csv in the data folder and paste into distance_real_input expected value

  36. Exercise: Distance Matrix Merge • Highlight entire script • Click Run
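
The merge can be thought of as replacing computed station-to-station distances with researcher-supplied values wherever an override exists. The Python 3 sketch below is an illustration under assumptions: the layout of the real CSV files is not shown in this transcript, so the matrix is modelled as a dictionary keyed by station pairs, and the name `merge_distances` is hypothetical. Pairs are treated as unordered, which is also an assumption.

```python
def _pair_key(station_a, station_b):
    """Normalise a station pair so that (A, B) and (B, A) share one key."""
    return tuple(sorted((station_a, station_b)))


def merge_distances(computed, overrides):
    """Return a new matrix in which override distances replace computed ones.

    Both arguments map a (station_a, station_b) tuple to a distance.
    The inputs are not modified.
    """
    merged = {_pair_key(a, b): dist for (a, b), dist in computed.items()}
    for (a, b), dist in overrides.items():
        merged[_pair_key(a, b)] = dist
    return merged
```

Pairs that have no override keep their computed distance, so the override file only needs to list the distances a researcher wants to correct.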

  37. Exercise: Interval Data • Open sandbox folder • Click file interval_data_driver.r • Will open in upper left window • Save to WorkShop Scripts folder

  38. Exercise: Interval Data • Grabbing filenames for subsequent steps • Find them in the output Console, at the bottom left-hand side

  39. Exercise: Interval Data • Grab the output file of detections from the last part of the filter step • Grab the output file from the distance matrix merge step • Paste values into the script • Save the script

  40. Exercise: Interval Data • Execute the script

  41. Data: Compressed Detections

  42. Data: Interval Data

  43. Exercise: Cleanup • Open sandbox folder • Click on file cleanup_driver.r • Will open in upper left window • Highlight entire script • Click Run

  44. Teach yourself to program • Free open software • Extremely powerful • Standardized • Python • Python(x,y): rival to MATLAB and Rstudio • PostgreSQL

  45. How? Coursera • Rice University: An Introduction to Interactive Programming in Python (TBA) • https://www.coursera.org/specialization/fundamentalscomputing/9?utm_medium=catalogSpec • University of Michigan: Programming for Everybody (next session Jun 2) • https://www.coursera.org/course/pythonlearn • Johns Hopkins: R Programming, part of the "Data Science" Specialization (next session Jun 2, not too late) • https://www.coursera.org/course/rprog

  46. PostgreSQL: Online Tutorials http://www.postgresqltutorial.com/

  47. RStudio vs R • Prefer RStudio: https://www.rstudio.com/ide/download/ • User friendly Interface • Not standardized so use with caution • Null always TRUE • Unpredictable results • Help • Rseek: http://www.rseek.org/

  48. Questions? • Wish list? • Cohort data • Separate files for animal detections on other lines • Station Group mapping function • If you can think it and describe it in English, we can program it.
