Sql databases and ensembl modules
This presentation is the property of its rightful owner.
Sponsored Links
1 / 33

SQL, Databases, and Ensembl Modules PowerPoint PPT Presentation


  • 89 Views
  • Uploaded on
  • Presentation posted in: General

SQL, Databases, and Ensembl Modules. Please look for next lecture. Ensembl API Tutorial: http://www.ensembl.org/info/software/core/core_tutorial.html. Survey of Databases (if time). Ensembl web interface Ensembl Biomart. RDBMS. Oracle and Sybase (many others)

Download Presentation

SQL, Databases, and Ensembl Modules

An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -

Presentation Transcript


Sql databases and ensembl modules

SQL, Databases, and Ensembl Modules


Please look for next lecture

Please look for next lecture

Ensembl API Tutorial:

http://www.ensembl.org/info/software/core/core_tutorial.html


Survey of databases if time

Survey of Databases(if time)

Ensembl web interface

Ensembl Biomart


Rdbms

RDBMS

  • Oracle and Sybase (many others)

    • industry standard, commercial products

    • development and management tools

  • PostgresSQL

    • full-featured relational DBMS

    • open source

    • found in most linux distributions

    • handles unusual datatypes well which adds flexibility for future extensions

  • MySQL

    • open-source relational DBMS

    • easy to setup and use

    • Linux/Windows/Mac

  • Each has variations

    • SQL, datatypes, functions, features


Structured query language sql

Structured Query Language (SQL)

  • SQL is the standard language used to create, modify, maintain, and query relational databases.

  • SQL commands are issued within the context of the DBMS interface

  • SQL commands can be passed to the DBMS by other programs (such as perl, PHP, Java, manually, etc.)

  • Many venders have extensions (Sybase-SQL)

  • SQL Example:

  • select FIELD from TABLE where CONDITION

  • select TABLE.FIELD from TABLE where CONDITION

  • select program from software where program like 'blast%';


Sql datatypes

SQL datatypes

  • datatype -- describes the data stored in a particular column of a table

  • typically is either numeric or character strings

  • SQL defines subtypes that set different upper limits on the size of text or numerical data

  • Also -- special types such as DATE, MONEY


Sql datatypes1

SQL datatypes

  • INT

  • FLOAT

  • REAL (larger float)

  • CHAR -- fixed length text string

  • TEXT -- variable length text

  • BLOB -- variable length binary field

  • DECIMAL -- real number stored as character string

  • DATE

  • TIMESTAMP -- value updates every time the record is modified

  • ENUM -- limited set of options (numeric or named)

  • SET -- value is one of a limited set


Creating a new table in the database

Creating a new table in the database

CREATE TABLE tablename (columnane type [modifiers], columanme type [modifiers])

create table software_package (

packid INT NOT NULL PRIMARY KEY AUTO_INCREMENT,

packname VARCHAR(100),

packurl VARCHAR(255),

function TEXT,

keyword ENUM,

os SET,

format SET,

archfile VARCHAR(255) );


Inserting data

Inserting data

INSERT INTO table(colname1, colname2, columname3) VALUES ('value1','value2','value3')


Accessing data select

Accessing Data -- SELECT

SELECT [fields] FROM [table] where [condition]

select * from [table] where [condition]

select packname, packurl from software

select packname, packurl from software where keyword = "sequence alignment";

select packname, packurl from software where packname like "B%";


Joining multiple tables

Joining multiple tables

  • SELECT can "join" two related tables

  • relationships between tables are created by replicating information ("primary key") from one table as a "foreign key" in another table

    select attribute from table1, table2 where primarykey1 = foreignkey2

    SELECT packname, packurl, reference_date FROM software, reference WHERE software.package_id = reference.package_id AND reference_date >= 1998;


Example

Example

select Clients.Last, Clients.First, Clients.Phone from Clients where Clients.City = “El Paso” order by Clients.Last, Clients.First

select Last, First, Phone from Clients where City = "El Paso" order by Last, First


Sql basics

SQL Basics

  • select * from table_name

  • select count(*) from table_name

  • Find start times of Entertainers where Jameson is the Client

  • Nested query (may not always be supported)

    • select StartTime from Engagements where ClientID = (select ClientID from Clients where Last = “Jameson”)

  • Join

    • select Engagements.StartTime, Clients.ClientID from Engagements, Clients where Engagements.ClientID = Clients.ClientID and Client.Last = “Jameson”

    • select Engagements.StartTime from Engagements,Clients where Engagements.ClientID = Clients.ClientID and Client.Last = “Jameson”

  • insert into TABLE values (val1, val2, …)

  • insert into TABLE (FIELDa, FIELDb, …FIELDn) values (value1, value2, …)

  • insert into Clients values (“Terry”, “Braun”, 11/10/03,555-6666)


As a spreadsheet

…as a spreadsheet

Flattened table structure, and lined up the rows.

Appears to be okay….

Except…

The relationships in this trivial example are more complicated than a

simple flat structure such as a spreadsheet.

Example) The first Engagement is between Client 9001, and Entertainer 3002


Sql basics1

SQL Basics

delete TABLE where FIELD = value

update TABLE set FIELD = value

drop DATABASE


Sql databases and ensembl modules

exon

transcript_id

exon_num

sequence_start

sequence_stop

intron

transcript_id

intron_num

sequence_start

sequence_stop

primer_pair

id

transcript_id

left_primer_id

right_primer_id

project

id

name

description

date

transcript

id

sequence_id

source

source_id

sequence

id

target_id

type

sequence

chr_name

strand

genomic_start

genomic_stop

source

source_id

refresh

target

id

date

gene_name

description

accession

status

set_target_join

set_id

target_id

rank

cas_rank

cas_options

set_table

id

project_id

name

date

description


Sql databases and ensembl modules

Sample Data

exon

transcript_id

exon_num = 3

sequence_start

sequence_stop

intron

transcript_id

intron_num = 3

sequence_start

sequence_stop

primer_pair

id

transcript_id

left_primer_id

right_primer_id

project

id

name = pro1

description

date

transcript

id

sequence_id

source = Ensembl

source_id

sequence

id

target_id

type = nucleotide

sequence = ATG…

chr_name = 15

strand = 1

genomic_start = 15,123,120

genomic_stop = 16,378,131

source

source_id

refresh

target

id

date

gene_name = BBS4

description

accession

status

target_set_info

set_id

target_id

rank = 5

cas_rank

cas_options

set_table

id

project_id

name = testset

date

description


Extracting data from a db

Extracting Data from a DB

  • This simple figure provides me with enough understanding of the database structure to extract data


Local sql example

Local SQL Example:

  • We have gene name (BBS4), and would like to find project name, and the exon number, start/stops

  • select id from target where gene_name = "BBS4"

    • 11

  • select set_table.id from set_table, target_set_info where target_set_info.set_id = set_table.id and target_set_info.target_id = TARGET_ID

    • 3

  • select project.name from project, set_table where set_table.project_id = project.id and set_table.id = SET_ID

    • project333

  • select id from sequence where target_id = TARGET_ID

    • 33

  • select id from transcript where sequence_id = SEQUENCE_ID

    • 68

  • select exon_num, sequence_start, sequence_stop from exon where transcript_id = TRANSCRIPT_ID

  • Whew.


Mysql demo gscr

MySQL demo (gscr)

  • mysql -h gscr -u tabraun –p

  • use trapss;


Trapss database

TrAPSS Database

Show tables;

Show columns from table TABLE_NAME;


Sql databases and ensembl modules

exon

transcript_id

exon_num = 3

sequence_start

sequence_stop

intron

transcript_id

intron_num = 3

sequence_start

sequence_stop

primer_pair

id

transcript_id

left_primer_id

right_primer_id

project

id

name = pro1

description

date

transcript

id

sequence_id

source = Ensembl

source_id

sequence

id

target_id

type = nucleotide

sequence = ATG…

chr_name = 15

strand = 1

genomic_start = 15,123,120

genomic_stop = 16,378,131

source

source_id

refresh

target

id

date

gene_name = BBS4

description

accession

status

target_set_info

set_id

target_id

rank = 5

cas_rank

cas_options

set_table

id

project_id

name =testset

date

description

  • select id from target where gene_name = "BBS4";


Sql databases and ensembl modules

exon

transcript_id

exon_num = 3

sequence_start

sequence_stop

intron

transcript_id

intron_num = 3

sequence_start

sequence_stop

primer_pair

id

transcript_id

left_primer_id

right_primer_id

project

id

name = pro1

description

date

transcript

id

sequence_id

source = Ensembl

source_id

sequence

id

target_id

type = nucleotide

sequence = ATG…

chr_name = 15

strand = 1

genomic_start = 15,123,120

genomic_stop = 16,378,131

source

source_id

refresh

target

id = 23

date

gene_name = BBS4

description

accession

status

target_set_info

set_id

target_id = 23

rank = 5

cas_rank

cas_options

set_table

id

project_id

name = testset

date

description

select set_table.id from set_table, target_set_info where target_set_info.set_id = set_table.id and target_set_info.target_id = 23; (3 rows ==> in 3 sets)


Sql databases and ensembl modules

exon

transcript_id

exon_num = 3

sequence_start

sequence_stop

intron

transcript_id

intron_num = 3

sequence_start

sequence_stop

primer_pair

id

transcript_id

left_primer_id

right_primer_id

project

id

name = pro1

description

date

transcript

id

sequence_id

source = Ensembl

source_id

sequence

id

target_id

type = nucleotide

sequence = ATG…

chr_name = 15

strand = 1

genomic_start = 15,123,120

genomic_stop = 16,378,131

source

source_id

refresh

target

id

date

gene_name = BBS4

description

accession

status

target_set_info

set_id =

target_id

rank = 5

cas_rank

cas_options

set_table

id = 7

project_id

name =testset

date

description

select project.name from project, set_table where set_table.project_id = project.id and set_table.id = 7;


Sql databases and ensembl modules

exon

transcript_id

exon_num = 3

sequence_start

sequence_stop

intron

transcript_id

intron_num = 3

sequence_start

sequence_stop

primer_pair

id

transcript_id

left_primer_id

right_primer_id

project

id

name = pro1

description

date

transcript

id

sequence_id

source = Ensembl

source_id

sequence

id

target_id = 23

type = nucleotide

sequence = ATG…

chr_name = 15

strand = 1

genomic_start = 15,123,120

genomic_stop = 16,378,131

source

source_id

refresh

target

id

date

gene_name = BBS4

description

accession

status

target_set_info

set_id

target_id

rank = 5

cas_rank

cas_options

set_table

id

project_id

name = testset

date

description

select id from sequence where target_id = 23;


Sql databases and ensembl modules

exon

transcript_id

exon_num = 3

sequence_start

sequence_stop

intron

transcript_id

intron_num = 3

sequence_start

sequence_stop

primer_pair

id

transcript_id

left_primer_id

right_primer_id

project

id

name = pro1

description

date

transcript

id

sequence_id = 2216

source = Ensembl

source_id

sequence

id

target_id

type = nucleotide

sequence = ATG…

chr_name = 15

strand = 1

genomic_start = 15,123,120

genomic_stop = 16,378,131

source

source_id

refresh

target

id

date

gene_name = BBS4

description

accession

status

target_set_info

set_id

target_id

rank = 5

cas_rank

cas_options

set_table

id

project_id

name = testset

date

description

select id from transcript where sequence_id = 2216;


Sql databases and ensembl modules

exon

transcript_id = 5053

exon_num = 3

sequence_start

sequence_stop

intron

transcript_id

intron_num = 3

sequence_start

sequence_stop

primer_pair

id

transcript_id

left_primer_id

right_primer_id

project

id

name = pro1

description

date

transcript

id= 5053

sequence_id

source = Ensembl

source_id

sequence

id

target_id

type = nucleotide

sequence = ATG…

chr_name = 15

strand = 1

genomic_start = 15,123,120

genomic_stop = 16,378,131

source

source_id

refresh

target

id

date

gene_name = BBS4

description

accession

status

target_set_info

set_id

target_id

rank = 5

cas_rank

cas_options

set_table

id

project_id

name = testset

date

description

select exon_num, sequence_start, sequence_stop from exon where transcript_id = 5053;


Sql examples one large query

SQL Examples – One Large Query

select distinct exon_num, project.name, sequence_start, sequence_stop from exon, transcript, sequence, target, target_set_info, set_table, project where target.gene_name = "BBS4" and project.name = "pro1" and set_table.name = "testset" and set_table.project_id = project.id and target_set_info.set_id = set_table.id and target_set_info.target_id = target.id and sequence.target_id = target.id and transcript.sequence_id = sequence.id and exon.transcript_id = transcript.id order by exon_num;


Executing sql commands from a text file http www mysql org doc refman 5 0 en batch commands html

Executing SQL commands from a text filehttp://www.mysql.org/doc/refman/5.0/en/batch-commands.html

  • However, it is also possible to put your SQL statements in a file and then tell mysql to read its input from that file. To do so, create a text file text_file that contains the statements you wish to execute. Then invoke mysql as shown here:

  • shell> mysql db_name < text_file

  • If you place a USE db_name statement as the first statement in the file, it is unnecessary to specify the database name on the command line:

  • shell> mysql < text_file

  • If you are already running mysql, you can execute an SQL script file using the source or \. command:

  • mysql> source file_name

  • mysql> \. file_name


Remotely

Remotely

mysql -h gscr.eng.uiowa.edu -u tabraun -p


Setting up mysql

Setting up MySQL

  • server -- database application that is running on the machine where the data is stored

    • runs as a "daemon" on Unix machines

    • a process that is always on, listening for and responding to requests (from clients)

    • mysqld -- server

  • client -- programs that connect to the server and request data

    • can be on same machine as server, or

    • located on other machines and connected by network

    • mysql -- client

    • mysqladmin -- administration tool


Myslq help

MySLQ help

http://dev.mysql.com/

Show tables;

Show columns from table TABLE_NAME;

Show TrAPSS (time permitting)


Survey of databases if time1

Survey of Databases(if time)

Ensembl web interface

Ensembl Ensmart


  • Login