Databases: storage and retrieval
Different kinds of DBs deal with biological information retrieved by various means, at the levels of DNA, RNA, protein and phenotype:
• DNA sequences (individual genes or complete genomes)
• cDNA
• ESTs
• Non-coding RNA
• Protein sequences
• Translated nucleotide sequences
• Protein domains
• Protein structure
• Protein-protein interactions
• Diseases
• Polymorphism
• Gene expression
Common to all databases
• A database is a structured collection of information.
• A database is composed of basic objects called records or entries.
• Each record is composed of fields, which hold defined data that is related to that record.
Let’s consider the following database of students learning bioinformatics at HUJI
Databases
A database can be thought of as a large table, where the rows represent records and the columns represent fields.
• Each record has a unique identifier. ID (Accession Numbers): unique identifiers of the database records.
• For some records there is only partial information: some fields contain no data (quality of DB).
• Some records contain similar data in some of the fields.
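The table view can be sketched in a few lines of Python. This is only an illustration: the field names, IDs and all values except the name Sharon Asulin are invented here and are not the table from the slides. It shows records as rows, fields as columns, a uniqueness check on the ID, and a way to list the empty fields of each record.

```python
# Hypothetical student table: every row is a record, every key is a field.
# IDs and values are invented for illustration only.
FIELDS = ("id", "first_name", "last_name", "gender", "comments")

records = [
    {"id": "S001", "first_name": "Sharon", "last_name": "Asulin",
     "gender": "female", "comments": None},
    {"id": "S002", "first_name": "Dan", "last_name": "Sharon",
     "gender": "male", "comments": "lives on Sharon St."},
    {"id": "S003", "first_name": None, "last_name": "Lopez",
     "gender": "female", "comments": "came from Cuba"},
]


def ids_are_unique(rows):
    """Return True if no two records share the same identifier."""
    ids = [row["id"] for row in rows]
    return len(ids) == len(set(ids))


def missing_fields(row):
    """List the fields of one record that hold no data."""
    return [name for name in FIELDS if not row.get(name)]


# Map each record ID to its list of empty fields (partial information).
completeness = {row["id"]: missing_fields(row) for row in records}
```

Records with a non-empty list in `completeness` are the ones with only partial information.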
Data Retrieval
• The purpose of databases is not merely to collect and organize data, but mainly to allow advanced data retrieval.
• A query is a method to retrieve information from the database.
• The organization of each record into predetermined fields allows us to use queries on fields.
1. Think: phrase your scientific question.
2. Choose the appropriate database.
3. Phrase your query: fields, syntax, keywords, Boolean operators.
4. Access additional entries discussing the same or similar entities by links to additional databases.
5. Think, evaluate. The computer is just a machine. You are (hopefully) a thinking organism.
Phrasing a query…
Terms/words for search [field] + (BOOLEAN OPERATORS) Terms/words for search [field]
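The pattern above (term, optional field in square brackets, Boolean operator, next term) can be assembled as a plain string. The helper names `term` and `combine` are made up for this sketch, and the field names are placeholders rather than the field tags of any particular database.

```python
def term(words, field=None):
    """Format one search term, optionally restricted to a field."""
    # Multi-word terms are quoted so they stay together as one phrase.
    text = f'"{words}"' if " " in words else words
    return f"{text}[{field}]" if field else text


def combine(left, operator, right):
    """Join two formatted terms with a Boolean operator."""
    if operator not in {"AND", "OR", "NOT"}:
        raise ValueError(f"unknown Boolean operator: {operator}")
    return f"{left} {operator} {right}"


# Placeholder field names, used only to show the shape of a query.
query = combine(term("female", "gender"), "AND", term("*cuba*", "comments"))
```

The resulting string only shows the shape of a query; which field names and wildcards are accepted depends on the database being searched.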
Boolean Operators
• 1 AND 2: cell AND cycle
• 1 OR 2: cell OR cycle
• 1 NOT 2: cell NOT cycle
• Exact phrase: “cell cycle”
• Truncation: cell* (cell, cells, cellular, etc.)
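One way to picture the three operators is as set operations on the IDs of the records that contain each word. The small index below is invented for illustration, and the `*` wildcard is handled with Python's `fnmatch`, which is an assumption of this sketch and not how any specific database implements truncation.

```python
import fnmatch

# Hypothetical index: word -> IDs of the records that contain it.
index = {
    "cell": {1, 2, 3},
    "cells": {4},
    "cellular": {5},
    "cycle": {2, 3, 6},
}


def ids_for(pattern):
    """IDs of records containing any word that matches the pattern (* = wildcard)."""
    hits = set()
    for word, ids in index.items():
        if fnmatch.fnmatchcase(word, pattern):
            hits |= ids
    return hits


cell, cycle = ids_for("cell"), ids_for("cycle")
both = cell & cycle           # cell AND cycle: records containing both words
either = cell | cycle         # cell OR cycle: records containing at least one
only_cell = cell - cycle      # cell NOT cycle: records with cell but without cycle
truncated = ids_for("cell*")  # cell*: cell, cells, cellular
```

AND narrows the result, OR widens it, and NOT removes records from it.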
The secretary wants to locate the record of the student Sharon Asulin but does not remember the last name, so she searches for Sharon.
The search was not limited to a certain field: Sharon[all fields]
OOPS !! Retrieved too many records that don’t match the required data - too much noise.
Evaluating Search Results
Search results compared with the “scientific truth”
What can we do to reduce/eliminate false positives without reducing true positives?
Sensitivity: the ability of a method to detect positives, irrespective of how many false positives are reported.
Selectivity: the ability of a method to reject negatives, irrespective of how many positives are wrongly rejected (false negatives).
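The two definitions can be turned into numbers by comparing the set of retrieved records with the set of truly relevant ones. The sketch below assumes sensitivity = TP / (TP + FN) and selectivity = TN / (TN + FP); the second formula is an interpretation of "ability to reject negatives", and some texts define selectivity differently. The record IDs are invented.

```python
def evaluate(retrieved, relevant, all_ids):
    """Return (sensitivity, selectivity) of one search against the known truth."""
    tp = len(retrieved & relevant)            # true positives
    fp = len(retrieved - relevant)            # false positives (noise)
    fn = len(relevant - retrieved)            # false negatives (missed records)
    tn = len(all_ids - retrieved - relevant)  # true negatives
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    selectivity = tn / (tn + fp) if tn + fp else 0.0  # assumed formula
    return sensitivity, selectivity


all_ids = set(range(1, 11))  # a hypothetical database of 10 records
relevant = {1, 2}            # the "scientific truth"
broad = {1, 2, 3, 4, 5, 6}   # result of an unrestricted search
narrow = {1}                 # result of a restricted search

broad_scores = evaluate(broad, relevant, all_ids)
narrow_scores = evaluate(narrow, relevant, all_ids)
```

In this made-up case the broad search finds everything relevant but keeps half of the noise, while the narrow search rejects all the noise but misses one relevant record.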
Let’s refine our search
Find all students whose first name is Sharon: Sharon[first name]
Syntax (NCBI): the keyword, followed by the field definition in square brackets.
Now we don’t retrieve any answer (a false negative?), but we are still not distracted by the noise. The original search phrase Sharon[all fields] would have retrieved all the noise but not the required info.
The secretary wants to locate the record of the female student who comes from Cuba but does not remember her name.
Search: female[gender] AND *cuba*[comments]
Syntax (NCBI): keyword[field definition], with the two terms combined by a Boolean operator.
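A field-restricted query like the one above can be imitated on a small table. The records, field names and the helper `match_field` are hypothetical, and the wildcard matching again relies on Python's `fnmatch` as a stand-in for the database's own rules.

```python
import fnmatch

# Hypothetical student records; all values are invented for illustration.
records = [
    {"id": "S001", "first_name": "Dana", "gender": "female",
     "comments": "exchange student"},
    {"id": "S002", "first_name": "Maria", "gender": "female",
     "comments": "Came from Cuba"},
    {"id": "S003", "first_name": "Jose", "gender": "male",
     "comments": "born in Cuba"},
]


def match_field(rows, pattern, field):
    """IDs of records whose field matches the pattern (case-insensitive, * = wildcard)."""
    return {
        row["id"]
        for row in rows
        if fnmatch.fnmatchcase(str(row.get(field, "")).lower(), pattern.lower())
    }


# female[gender] AND *cuba*[comments]
hits = match_field(records, "female", "gender") & match_field(records, "*cuba*", "comments")
```

Restricting each term to its field keeps out the male student from Cuba and the female student with an unrelated comment, so only one record remains.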
And the main thing, the main thing: not to be afraid at all.