How and why to document data for long-term storage; and What's special about Geographical data?
Allan Reese
Cefas Weymouth
Cefas buildings: Weymouth, Burnham-on-Crouch
Fish farms in E&W
Fish disease
Spring Viremia of Carp Virus
• Causes and progress of infection
• Diagnostics (viruses, bacteria, fungi, parasites)
• Vaccines and therapeutics (safety and efficacy)
• Epidemiology & risk assessment
• Surveillance and control – Fish Health Inspectors
• Emerging and exotic diseases
• Policy advice
Who wants a database?
• I’ve got some data so I need a database
• Our demo will show you how easy it is to simultaneously search, share and retrieve information from thousands of library databases
• Project… plans to build, through networking, a database on best practices in the field
• Rapid growth in the quantity of omic data means bio-informaticians need to manage data in an efficient and reliable manner. The main focus of this course is on designing, creating and querying relational databases
Why a (relational) database?
• large volume of data (typically gigabytes)
• complex data structure (not matching standard application)
• long-term use / continued accumulation or incremental update
• total accuracy & consistency needed on micro-scale
• frequent accesses to small subsets, ad hoc queries
• data shared by more than one person
(University Computing 1991; Significance Dec 2007)
Extract for analysis
• Fields (variables) = columns
• Units (level of analysis) = rows
• Columns × Rows = Data table
Query -> view -> table of data -> summary or analysis
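The chain from query to summary can be sketched in a few lines. This is a minimal illustration using Python's built-in sqlite3 module; the table name, field names and values are all invented for the example and are not from any Cefas data set.

```python
import sqlite3
from statistics import mean

# Hypothetical raw data: one row per fish examined (the unit of analysis).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sample (fish_id INTEGER, farm TEXT, length_mm REAL, infected INTEGER)"
)
conn.executemany(
    "INSERT INTO sample VALUES (?, ?, ?, ?)",
    [(1, "A", 210.0, 0), (2, "A", 195.0, 1), (3, "B", 240.0, 0), (4, "B", 230.0, 1)],
)

# Query -> table of data: fields become columns, units become rows.
cursor = conn.execute(
    "SELECT fish_id, farm, length_mm, infected FROM sample ORDER BY fish_id"
)
columns = [description[0] for description in cursor.description]
table = [dict(zip(columns, row)) for row in cursor]

# Table of data -> summary: mean length per farm.
mean_length_by_farm = {
    farm: mean(row["length_mm"] for row in table if row["farm"] == farm)
    for farm in sorted({row["farm"] for row in table})
}
print(columns)
print(mean_length_by_farm)
```

Once the extract exists as a plain table, the analysis step no longer depends on the database at all.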
Mystery meat
• What tables form the raw data?
• What fields are in each table?
• Data dictionary?
• Documenting meanings or DB structure?
Table preferred when
• Scientific data probably SHOULD NOT be changed
  • or data added in batches (incremental)
• Structure NOT complex
  • replication across units allowed, but not excessive
• Levels of analysis are few (or few dominant)
• Analyses summarize whole data or samples
  • often one-offs (bespoke or user-written)
• Sorting or indexing allows very rapid access
Data table needs metadata
• Metadata standards (Dublin Core)
  • emphasis on discovery
  • list many fields
  • codebook not mentioned
• A modest suggestion
  • data table of rows and columns, with column headers
  • codebook: another table to explain headers
  • metadata: describe background, ownership etc
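The modest suggestion could look like the following sketch: a data table, a codebook table keyed on the column headers, and a small metadata record. All names and values here are hypothetical, and the helper `undocumented_columns` is an illustrative check, not part of any standard.

```python
import csv
import io

# Data table: rows and columns, with column headers (invented values).
data_csv = """site_id,date,temp_c,n_fish
W01,2007-06-01,14.2,30
W02,2007-06-01,15.1,25
"""

# Codebook: another table that explains each header.
codebook_csv = """column,meaning,units
site_id,Sampling site code,
date,Sampling date (ISO 8601),
temp_c,Water temperature,degrees Celsius
n_fish,Number of fish examined,count
"""

# Metadata: background and ownership (placeholder text only).
metadata = {
    "title": "Hypothetical site survey",
    "owner": "Hypothetical data owner",
    "codebook": "codebook.csv",
}

def undocumented_columns(data_text, codebook_text):
    """Return the data headers that have no entry in the codebook."""
    headers = next(csv.reader(io.StringIO(data_text)))
    documented = {row["column"] for row in csv.DictReader(io.StringIO(codebook_text))}
    return [header for header in headers if header not in documented]

missing = undocumented_columns(data_csv, codebook_csv)
print(missing)
```

Keeping the codebook as a table means the check that every header is explained can be automated whenever the data are archived.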
ESRI (ArcInfo) assumes
• The purpose of a GIS is to provide a spatial framework to support decisions …
• Most often, a GIS presents information in the form of maps and symbols …
• A map user is the end consumer of a GIS. This person looks at maps …
• When the Cassini spacecraft was launched, GIS was used to evaluate the risk of an accident with the plutonium generators on board
GISs contain
• Data as points, lines, areas
• Location data
  • lat/long, grid refs, postcodes, TOIDs
• Representation instructions
  • scaling, icons, label position, shading
Can you get data out?
• Point and click works for pop-up labels
  • not to output a table
• Limited to the precision of the input device, including the user’s eyesight
• I want, probably, a whole layer of data, including the positions as named fields
How do my needs map into the database?
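As a sketch of what "a whole layer with positions as named fields" might mean in practice, assume the layer can be exported as a GeoJSON-style feature collection. That is an assumption: the slides name no export format, and the site names and coordinates below are invented. Flattening it into a table is then short.

```python
import csv
import io

# Hypothetical point layer in GeoJSON-style form (coordinates are [lon, lat]).
layer = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature",
         "geometry": {"type": "Point", "coordinates": [-2.45, 50.61]},
         "properties": {"name": "Site A", "species": "carp"}},
        {"type": "Feature",
         "geometry": {"type": "Point", "coordinates": [0.81, 51.63]},
         "properties": {"name": "Site B", "species": "trout"}},
    ],
}

def layer_to_rows(layer):
    """Flatten point features into rows with lon and lat as named fields."""
    rows = []
    for feature in layer["features"]:
        geometry = feature["geometry"]
        if geometry["type"] != "Point":
            continue  # lines and areas would need a different flattening
        lon, lat = geometry["coordinates"][:2]
        row = {"lon": lon, "lat": lat}
        row.update(feature["properties"])
        rows.append(row)
    return rows

rows = layer_to_rows(layer)

# Write the rows as a plain delimited table that other software can read.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["name", "species", "lon", "lat"])
writer.writeheader()
writer.writerows(rows)
table_text = buffer.getvalue()
print(table_text)
```

The positions come out at the stored precision, not at the precision of a mouse click.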
Lacking / hidden / difficult in GIS
• List fields associated with physical object
• Choose many objects and output data
  • eg to make proximity matrix
• Distinguish raw from constructed data
  • point-heights versus interpolated contour
• Output data values for an area
  • eg sea surface temperatures
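Once the selected objects are available as a table with named position fields, a proximity matrix is straightforward to build outside the GIS. The sketch below uses great-circle (haversine) distances on a spherical Earth of radius 6371 km, which is one possible choice of distance among several; the three sites and their coordinates are invented.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in km between two lat/long points (spherical Earth)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))

# Hypothetical sites as (name, lat, lon).
sites = [("A", 50.0, -2.0), ("B", 50.0, -1.0), ("C", 51.0, -2.0)]

def proximity_matrix(sites):
    """Square matrix of pairwise distances, in the same order as the input rows."""
    return [
        [haversine_km(lat1, lon1, lat2, lon2) for (_, lat2, lon2) in sites]
        for (_, lat1, lon1) in sites
    ]

matrix = proximity_matrix(sites)
for (name, _, _), row in zip(sites, matrix):
    print(name, [round(distance, 1) for distance in row])
```

The matrix rows and columns follow the order of the exported table, so it can be joined back to the other fields by position or by name.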
Request
GIS suppliers may prefer to address users’ needs by adding yet more features to the interface, or pointing to the SQL interface.
I would rather they reconsider the role of the GIS as a data warehouse, from which it should be easier to select and extract data that can be analysed in other software.