Using Network Processors in Genomics


H. Bos – Leiden University 13/02/2004. 1. Using Network Processors in Genomics. Herbert Bos * † Kaiming Huang * {herbertb,khuang}@liacs.nl * Leiden Universiteit, Netherlands † Vrije Universiteit, Netherlands http://www.liacs.nl/~herbertb/projects/biocomp/.


H. Bos – Leiden University 13/02/2004

Using Network Processors in Genomics

Herbert Bos* † Kaiming Huang*

{herbertb,khuang}@liacs.nl

*Leiden Universiteit, Netherlands

† Vrije Universiteit, Netherlands

http://www.liacs.nl/~herbertb/projects/biocomp/


Case study: BLAST
  • search nucleotide/protein database for query
  • BLAST discovers similarity rather than exact match
  • two main phases:
    • scoring (registering where the query and the DNA database match)
    • alignment (dynamic programming)
  • only the first phase on NPUs

Window matching
  • naïve approach: roughly W*N*M comparisons
  • does not scale
  • string search algorithms: Aho-Corasick
    • all windows matched at the same time
    • shifting genome one nucleotide at a time
    • matching algorithm transformed into a DFA
  • DFA may be quite large
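
The naïve approach can be sketched as follows. This is only an illustration, not the authors' code: the slide does not define W, N and M, so the sketch assumes W is the window size, N the number of windows and M the genome length. The function names and the toy genome string are made up for the example; the query is the one used on the Aho-Corasick slides.

```python
def windows_of(query: str, w: int) -> list[str]:
    """All length-w substrings (windows) of the query, in order."""
    return [query[i:i + w] for i in range(len(query) - w + 1)]


def naive_window_matches(genome: str, query: str, w: int):
    """Compare every window against every genome offset, character by character.

    Returns (hits, comparisons): hits is a list of (genome_offset, window),
    comparisons counts the single-character comparisons performed.
    """
    hits = []
    comparisons = 0
    wins = windows_of(query, w)
    for pos in range(len(genome) - w + 1):   # roughly M genome offsets (assumed meaning of M)
        for win in wins:                     # N windows (assumed meaning of N)
            matched = True
            for k in range(w):               # at most W characters per attempt
                comparisons += 1
                if genome[pos + k] != win[k]:
                    matched = False
                    break
            if matched:
                hits.append((pos, win))
    return hits, comparisons


hits, comparisons = naive_window_matches("tacgcga", "acgccga", 3)
print(hits)
print(comparisons)
```

The triple loop is what makes the cost grow with the product of the three sizes; the early `break` lowers the constant but not the scaling.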

Aho-Corasick
  • Alphabet: acgt
  • Window size: 3
  • Query: acgccga
  • Windows: {acg,cgc,gcc,ccg,cga}
[Figure: Aho-Corasick automaton for the windows above, with states 0–12 and transitions labelled a, c, g, t, shown over several build steps; example input: tacgcga]
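
The automaton for this toy example can be reproduced with a short sketch. It is a textbook-style Aho-Corasick construction turned into a full DFA table, written for illustration only; it is not the IXPBlast microengine code. The names `build_dfa` and `scan` are hypothetical, and states are numbered in insertion order, which need not match the numbering in the figure.

```python
from collections import deque

ALPHABET = "acgt"


def build_dfa(patterns):
    """Build a full Aho-Corasick DFA: one next-state entry per (state, symbol)."""
    goto = [{}]      # trie edges per state
    out = [set()]    # patterns recognised on entering each state
    for pat in patterns:
        state = 0
        for ch in pat:
            if ch not in goto[state]:
                goto.append({})
                out.append(set())
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].add(pat)

    fail = [0] * len(goto)
    delta = [[0] * len(ALPHABET) for _ in goto]
    queue = deque()
    for i, ch in enumerate(ALPHABET):        # root: missing edges loop back to state 0
        nxt = goto[0].get(ch, 0)
        delta[0][i] = nxt
        if nxt:
            queue.append(nxt)
    while queue:                             # breadth-first, so shallower states are done first
        state = queue.popleft()
        out[state] |= out[fail[state]]
        for i, ch in enumerate(ALPHABET):
            if ch in goto[state]:
                child = goto[state][ch]
                fail[child] = delta[fail[state]][i]
                delta[state][i] = child
                queue.append(child)
            else:                            # fold the failure transition into the table
                delta[state][i] = delta[fail[state]][i]
    return delta, out


def scan(text, delta, out):
    """Feed the text one nucleotide at a time; one table lookup per symbol."""
    state = 0
    hits = []
    for pos, ch in enumerate(text):
        state = delta[state][ALPHABET.index(ch)]
        for pat in sorted(out[state]):
            hits.append((pos - len(pat) + 1, pat))
    return hits


query, w = "acgccga", 3
windows = [query[i:i + w] for i in range(len(query) - w + 1)]
delta, out = build_dfa(windows)
print(len(delta))                    # number of states
print(scan("tacgcga", delta, out))   # (start offset, window) pairs
```

For the five windows above this yields 13 states, in line with the states 0–12 in the figure, and the input tacgcga produces hits for acg, cgc and cga at offsets 1, 2 and 4. Each input symbol costs a single table lookup regardless of the number of windows, at the price of a table with one entry per state and symbol.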

IXPBlast: Architecture

[Figure: block diagram of the IXPBlast hardware. Labels: Gbps ports; NPU (IXP1200) with six microengines (ME), a StrongARM and scratch memory; SRAM; DRAM; control processor; Pentium; PCI bus. Later build steps of the figure add the Aho-Corasick automaton (states 0–12) next to the SRAM label.]


IXPBlast: packet handling

[Figure: numbered sequence 0–31]

  • packets read and processed in batches of 100,000
  • “spilling” must be taken into account
  • currently no feedback
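
The slide does not define “spilling”. One plausible reading, assumed here, is that a window can straddle the boundary between two consecutive packets, so a scanner that treats each packet in isolation would miss it. The sketch below illustrates that reading only: it keeps the last w-1 characters of the data seen so far and prepends them to the next chunk. The function name and the tiny chunks are hypothetical and say nothing about the actual packet format.

```python
def scan_in_chunks(chunks, windows, w):
    """Find window hits in a stream of chunks without losing boundary hits.

    The last w-1 characters are carried over so that a window starting in
    one chunk and ending in the next is still seen exactly once.
    """
    wanted = set(windows)
    carry = ""     # unscanned tail of the data seen so far (at most w-1 characters)
    offset = 0     # absolute stream offset of buf[0]
    hits = []
    for chunk in chunks:
        buf = carry + chunk
        for i in range(len(buf) - w + 1):
            if buf[i:i + w] in wanted:
                hits.append((offset + i, buf[i:i + w]))
        keep = min(w - 1, len(buf))
        carry = buf[len(buf) - keep:]
        offset += len(buf) - keep
    return hits


windows = ["acg", "cgc", "gcc", "ccg", "cga"]
print(scan_in_chunks(["tac", "gc", "ga"], windows, 3))
```

With a DFA-based matcher the same effect can be had by carrying the current automaton state from one packet to the next instead of carrying characters.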

Results
  • 232 MHz IXP1200 ~ 1.8 GHz Pentium 4
  • 1611-nucleotide query (MyD88)
  • 1.4 GB genome (Zebrafish)
    • IXP1200: 90 sec with DFA
    • IXP1200: 129 sec with “trie”
    • P4: 132 sec with “trie”
  • number of matches: 524856
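
To put the timings in perspective, the figures above can be turned into scan throughput and relative speed. This is plain arithmetic on the numbers from the slide; the only assumption is that “1.4 GB” means 1.4 × 10^9 bytes. The helper name `summarize` is made up for the example.

```python
GENOME_BYTES = 1.4e9   # "1.4 GB", assuming 1 GB = 10**9 bytes

TIMINGS_SEC = {
    "IXP1200, DFA": 90,
    "IXP1200, trie": 129,
    "P4, trie": 132,
}


def summarize(genome_bytes, timings_sec, baseline_key):
    """Return {name: (throughput in MB/s, speed relative to the baseline)}."""
    base = timings_sec[baseline_key]
    return {
        name: (genome_bytes / secs / 1e6, base / secs)
        for name, secs in timings_sec.items()
    }


for name, (mb_per_s, rel) in summarize(GENOME_BYTES, TIMINGS_SEC, "P4, trie").items():
    print(f"{name}: {mb_per_s:.1f} MB/s, {rel:.2f}x the P4 run")
```

Under that assumption the DFA variant scans at about 15.6 MB/s, roughly 1.47 times the speed of the Pentium 4 run, while the trie variant on the IXP1200 is close to parity with it.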

Conclusions
  • NPUs are useful in other application domains
  • Newer hardware is expected to perform much better
  • “Throughput processors”
  • Adapting our current approach to use BLAST tricks/heuristics

Network processors
  • geared for high throughput
  • used exclusively in network systems
  • example: intrusion detection
  • similar to looking for genes in genomes
  • differences

[Figure: Radisys IXP1200 board]


Application domain: “Genomics”
  • example: search genome for occurrence of “patterns”
  • similar problems to IDS: poor performance on general-purpose processors (GPPs), which cannot exploit the parallelism
    • throughput-driven
    • how about FPGAs?
    • how about clusters?
  • NPU
    • easier to program than FPGAs
    • cheaper than cluster computing
    • “on the desktop”: IP never leaves the room