

A data retrieval workflow using NCBI E-Utils + Python Part II: Jinja2 / Flask

John Pinney

Tech talk Tue 19th Nov

slide2

My tasks

1. Produce a list of human genes that are associated with at least one resolved structure in PDB AND at least one genetic disorder in OMIM

2. Make an online table to display them
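
Task 1 boils down to intersecting two sets of gene IDs. A minimal sketch, assuming the gene-to-structure and gene-to-disorder links have already been collected into dictionaries. The names `struc_links_by_gene`, `omim_links_by_gene` and `genes_with_both` are invented here, and only the entry for gene 94 comes from the example output in this talk; the other keys are made-up placeholders.

```python
# Hypothetical inputs: gene ID -> list of linked records.
# Only gene 94 is taken from the talk's example output; the rest are placeholders.
struc_links_by_gene = {"94": ["4FAO", "3MY0"], "gene_x": ["pdb_1"], "gene_y": []}
omim_links_by_gene = {"94": ["600376"], "gene_y": ["omim_1"], "gene_z": ["omim_2"]}


def genes_with_both(struc_links, omim_links):
    """Return sorted gene IDs with at least one structure AND one OMIM entry."""
    with_struc = {g for g, links in struc_links.items() if links}
    with_omim = {g for g, links in omim_links.items() if links}
    return sorted(with_struc & with_omim)


sorted_genes = genes_with_both(struc_links_by_gene, omim_links_by_gene)
print(sorted_genes)  # ['94']
```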

slide4

Python modules used in part 1

PyCogent: Simple request handling for the main E-Utils.

pycogent.org

urllib2: General HTTP request handler.

docs.python.org/2/library/urllib2.html

BeautifulSoup: Amazingly easy-to-use object model for XML/HTML.

www.crummy.com/software/BeautifulSoup/bs4/doc/

slide5

Some REST services need API keys

The OMIM server requires a license agreement but is free for academic use.

They provide a personal API key which must be submitted with each HTTP request.

import urllib2

OMIM_APIKEY = 'E835870B16FBAF479E826FA5168CB2615EDA0F11'

# Fetch one OMIM entry; the API key is sent as a query parameter.
result = urllib2.urlopen(
    "http://api.europe.omim.org/api/entry?mimNumber="
    + omimid + "&apiKey=" + OMIM_APIKEY
).read()
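
As a variation on the request above, the URL can be assembled with `urlencode` and the key read from the environment instead of being written into the script. This is only a sketch, written for Python 3 (where the `urllib2` functionality lives in `urllib.request` and `urllib.parse`). The environment variable name `OMIM_APIKEY` and the helper `omim_entry_url` are assumptions made for illustration, and no request is actually sent.

```python
import os
from urllib.parse import urlencode

# Assumption: the key is kept in an environment variable named OMIM_APIKEY.
OMIM_APIKEY = os.environ.get("OMIM_APIKEY", "YOUR-KEY-HERE")


def omim_entry_url(omimid, apikey=OMIM_APIKEY):
    """Build the request URL for one OMIM entry (nothing is fetched here)."""
    query = urlencode({"mimNumber": omimid, "apiKey": apikey})
    return "http://api.europe.omim.org/api/entry?" + query


print(omim_entry_url("600376", apikey="demo"))
```

Using `urlencode` also takes care of escaping any characters in the parameters that are not URL-safe.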

slide6

Throttling queries

Most bioinformatics web servers have limits on the number of queries that can be sent from the same IP address (per day / per second etc.)

They will ban you from accessing the site if you attempt too many requests.

This can have serious consequences (e.g. the whole institution being blocked from NCBI).

slide7

Throttling queries

To ensure compliance with usage limits, implement a simple throttle:

def omim_info(omimid):
    checktime('api.europe.omim.org')
    result = urllib2.urlopen(...

slide8

Throttling queries

import time

lastRequestTime = {}
throttleDelay = {'eutils.ncbi.nlm.nih.gov': 0.25,
                 'api.europe.omim.org': 0.5}

def checktime(host):
    # Sleep until at least throttleDelay[host] seconds have passed
    # since the previous request to this host.
    if host in lastRequestTime:
        wait = throttleDelay[host] - (time.time() - lastRequestTime[host])
        if wait > 0:
            time.sleep(wait)
    lastRequestTime[host] = time.time()
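
One possible variation on this throttle keeps the state in a small class, so that the clock and the sleep function can be swapped out when testing. The class name `Throttle` and its `wait` method are invented for this sketch and are not part of the original slides; the per-host delays are the ones used in the talk.

```python
import time


class Throttle:
    """Enforce a minimum delay between requests to the same host."""

    def __init__(self, delays, clock=time.time, sleep=time.sleep):
        self.delays = delays        # host -> minimum seconds between requests
        self.clock = clock          # injectable so tests need not really wait
        self.sleep = sleep
        self.last_request = {}      # host -> time of the previous request

    def wait(self, host):
        """Block until a request to host is allowed; return seconds slept."""
        slept = 0.0
        if host in self.last_request:
            remaining = self.delays[host] - (self.clock() - self.last_request[host])
            if remaining > 0:
                self.sleep(remaining)
                slept = remaining
        self.last_request[host] = self.clock()
        return slept


throttle = Throttle({"eutils.ncbi.nlm.nih.gov": 0.25, "api.europe.omim.org": 0.5})
throttle.wait("api.europe.omim.org")  # first call returns immediately
throttle.wait("api.europe.omim.org")  # second call sleeps for up to 0.5 s
```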

slide9

HTML templating

I need to produce an HTML table containing basic information about the genes I have collected.

The Jinja2 templating engine is an easy way to generate these kinds of documents.

I will use web services at NCBI and OMIM to assemble the information I need.

slide10

Jinja2

Using Jinja2 as an HTML templating engine, we need to split the work between 2 files:

a normal Python script (in which I call the web services)

an HTML template with embedded Python-like commands

Not all Python functions are available within the template, so it makes sense to do as much work as possible within the script before passing the data over.
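
To illustrate that division of labour, here is a small sketch in which sorting and counting happen in the script and the template only prints the prepared values. The `raw` record is a hypothetical stand-in for data fetched from the web services (its values are borrowed from the example output in this talk), and the inline template string replaces a separate template file purely for brevity.

```python
from jinja2 import Template

# Hypothetical, already-fetched record standing in for web service results.
raw = {"symbol": "ACVRL1", "pdb_ids": ["4FAO", "3MY0"]}

# Do the work in the script: sort and count here, not in the template.
context = {
    "symbol": raw["symbol"],
    "pdb_ids": sorted(raw["pdb_ids"]),
    "n_structures": len(raw["pdb_ids"]),
}

template = Template(
    "{{ symbol }}: {{ n_structures }} structures ({{ pdb_ids|join(', ') }})"
)
print(template.render(**context))  # ACVRL1: 2 structures (3MY0, 4FAO)
```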

slide11

Jinja2 (script)

from jinja2 import Template

template = Template(open("gene_row_template.html").read())
fout = open("gene_list.html", 'w')
...
for g in sorted_genes:
    # Render one table row per gene and append it to the output file.
    fout.write(template.render(
        g=g,
        gene=gene_info(g),
        omim=[omim_info(x) for x in omim_links(g)],
        struc=[struc_info(x) for x in struc_links(g)]
    ))

(variables passed to template as kwargs)

slide12

Jinja2 (template)

<tr>
<td><a href='http://www.ncbi.nlm.nih.gov/gene/?term={{g}}[uid]'>
{{gene.find('Gene-ref_locus').text}}
</a></td>
<td>{{gene.find('Gene-ref_desc').text}}</td>
<td>{% for m in omim %}
<a href='http://omim.org/entry/{{m.mimNumber.text}}'>
{{m.preferredTitle.text}}
</a><br>
{% endfor %}</td>
<td>{% for s in struc -%}
<a href='http://www.rcsb.org/pdb/explore/explore.do?structureId={{s.find('Item',attrs={'Name':'PdbAcc'}).text}}'>
{{s.find('Item',attrs={'Name':'PdbAcc'}).text}}
</a><br>
{%- endfor %}</td>
</tr>

slide13

Jinja2 (template)


{{ }} = print statement (outputs the value of an expression)

{% %} = other command (control statements such as for / endfor)

I can access the methods of an object from within the template, so I can make use of all the nice BeautifulSoup shortcuts.
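
The template also uses `-%}` and `{%-`, which strip the whitespace next to a tag. A minimal sketch of both points (calling a method from the template and trimming whitespace) follows. The `Record` class is a hypothetical stand-in for a parsed XML element and is not BeautifulSoup; the accession codes are those from the talk's example.

```python
from jinja2 import Template


class Record:
    """Hypothetical stand-in for a parsed XML element."""

    def __init__(self, acc):
        self.acc = acc

    def label(self):
        return "PDB " + self.acc


# {{ ... }} prints a value; {% ... %} controls the loop.
# The minus signs remove the newlines around the loop body.
source = "<td>{% for s in struc -%}\n{{ s.label() }}<br>\n{%- endfor %}</td>"
html = Template(source).render(struc=[Record("4FAO"), Record("3MY0")])
print(html)  # <td>PDB 4FAO<br>PDB 3MY0<br></td>
```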

slide14

Jinja2 (output)

<tr>
<td><a href='http://www.ncbi.nlm.nih.gov/gene/?term=94[uid]'>
ACVRL1
</a></td>
<td>activin A receptor type II-like 1</td>
<td>
<a href='http://omim.org/entry/600376'>
TELANGIECTASIA, HEREDITARY HEMORRHAGIC, TYPE 2; HHT2
</a><br>
</td>
<td><a href='http://www.rcsb.org/pdb/explore/explore.do?structureId=4FAO'>
4FAO
</a><br><a href='http://www.rcsb.org/pdb/explore/explore.do?structureId=3MY0'>
3MY0
</a><br></td>
</tr>

slide15

Something more interactive

What if I need to produce a report on-the-fly?

Flask is a ‘micro’ web development framework for Python, which is useful for putting together a simple webserver.

For anything more substantial (e.g. if database queries are needed), consider using Django.

Flask uses Jinja2 as its template engine.

slide16

A simple webapp in Flask

from flask import Flask, request, render_template, Response

app = Flask(__name__)

@app.route('/report/')
def report_handler():
    gene = request.args.get('gene')
    if gene is None:
        # No gene supplied in the query string: show the search form.
        return render_template('report_form.html', unfound=None)
    else:
        return report_for_gene_name(gene)

if __name__ == '__main__':
    app.run(debug=True)
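
The handler above relies on a template file and on `report_for_gene_name`, neither of which is shown. A self-contained sketch of the same routing logic, with inline strings as hypothetical stand-ins for `report_form.html` and the report page, can be exercised through Flask's test client without starting a server:

```python
from flask import Flask, request, render_template_string

app = Flask(__name__)

# Inline stand-ins for the form and report templates (both hypothetical).
FORM = "<form><input name='gene'><input type='submit'></form>"
REPORT = "<h1>Report for {{ gene }}</h1>"


@app.route('/report/')
def report_handler():
    gene = request.args.get('gene')
    if gene is None:
        # No ?gene=... parameter: show the form.
        return render_template_string(FORM)
    return render_template_string(REPORT, gene=gene)


# Exercise the route in-process instead of calling app.run().
client = app.test_client()
print(client.get('/report/').status_code)
print(client.get('/report/?gene=ACVRL1').get_data(as_text=True))
```

Since Flask uses Jinja2 for rendering, the `{{ gene }}` placeholder behaves just as in the offline template.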

slide17

Summary

Some web services may be more fiddly than others to set up, especially if they involve:

API keys

Request limits (requires throttling)

Combining web services with an HTML template (either offline or on-the-fly via a webserver) is an easy way to generate user-friendly reports.

slide18

Python modules used in part 2

Jinja2: An elegant and highly versatile templating engine.

http://jinja.pocoo.org/

Flask: Python ‘micro’ web development framework.

http://flask.pocoo.org