Using network processors in genomics

Using Network Processors in Genomics

H. Bos – Leiden University 13/02/2004

Herbert Bos* † Kaiming Huang*

{herbertb,[email protected]

*Leiden Universiteit, Netherlands

† Vrije Universiteit, Netherlands

http://www.liacs.nl/~herbertb/projects/biocomp/


Case study: BLAST

  • search nucleotide/protein database for query

  • BLAST discovers similarity rather than exact matches

  • two main phases:

    • scoring (registering where the query and the DNA database match)

    • alignment (dynamic programming)

  • only the first phase (scoring) runs on NPUs


Window matching

  • naïve approach: roughly W*N*M comparisons

  • does not scale

  • string search algorithms: Aho-Corasick

    • all windows matched at the same time

    • shifting genome one nucleotide at a time

    • matching algorithm transformed into a DFA

  • DFA may be quite large
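The naïve approach from the list above can be sketched as follows. The slides do not define W, N and M; this sketch assumes W is the window length, M the number of query windows and N the number of genome positions. The function names are hypothetical and chosen only for illustration.

```python
def query_windows(query, w):
    """Return all overlapping length-w substrings (windows) of the query."""
    return [query[i:i + w] for i in range(len(query) - w + 1)]


def naive_match(genome, windows):
    """Compare every window against every genome position, character by character.

    Returns (hits, comparisons): hits is a list of (genome_position, window)
    pairs, comparisons is the number of character comparisons performed.
    """
    hits = []
    comparisons = 0
    for pos in range(len(genome)):
        for win in windows:
            if pos + len(win) > len(genome):
                continue  # window would run past the end of the genome
            matched = True
            for k, ch in enumerate(win):
                comparisons += 1
                if genome[pos + k] != ch:
                    matched = False
                    break
            if matched:
                hits.append((pos, win))
    return hits, comparisons


windows = query_windows("acgccga", 3)
hits, comparisons = naive_match("tacgcga", windows)
print(windows)
print(hits, comparisons)
```

Because the inner loop stops at the first mismatch, W*N*M is an upper bound on the comparison count rather than the exact figure. The bound still grows with the number of windows, which is the scaling problem the slide points at.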


Aho-Corasick

  • Alphabet: acgt

  • Window size: 3

  • Query: acgccga

  • Windows: {acg,cgc,gcc,ccg,cga}


[Figure: Aho-Corasick DFA for the windows above, with states 0 to 12 and transitions labelled a, c, g, t. The final animation frame shows the input tacgcga.]
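The example on these slides (alphabet acgt, window size 3, query acgccga, input tacgcga) can be reproduced with a small textbook Aho-Corasick construction. This is a minimal sketch, not the IXPBlast implementation: the function names are hypothetical, the state numbering need not match the figure, and the input is assumed to contain only the lowercase characters a, c, g, t.

```python
from collections import deque

ALPHABET = "acgt"


def build_dfa(patterns):
    """Build an Aho-Corasick automaton as a full DFA over ALPHABET.

    Returns (goto, out): goto[state][char] is the next state,
    out[state] is the set of patterns that end in that state.
    """
    goto = [{}]
    out = [set()]
    # Phase 1: insert all patterns into a trie.
    for pat in patterns:
        state = 0
        for ch in pat:
            if ch not in goto[state]:
                goto.append({})
                out.append(set())
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].add(pat)

    # Phase 2: breadth-first pass that computes failure links and fills
    # in the missing transitions, turning the trie into a DFA.
    fail = [0] * len(goto)
    queue = deque()
    for ch in ALPHABET:
        if ch in goto[0]:
            queue.append(goto[0][ch])
        else:
            goto[0][ch] = 0  # unmatched characters keep us in the root
    while queue:
        state = queue.popleft()
        for ch in ALPHABET:
            if ch in goto[state]:
                nxt = goto[state][ch]
                fail[nxt] = goto[fail[state]][ch]
                out[nxt] |= out[fail[nxt]]
                queue.append(nxt)
            else:
                goto[state][ch] = goto[fail[state]][ch]
    return goto, out


def scan(text, goto, out):
    """Feed text through the DFA one character at a time.

    Returns a list of (start_position, pattern) hits.
    """
    hits = []
    state = 0
    for i, ch in enumerate(text):
        state = goto[state][ch]
        for pat in sorted(out[state]):
            hits.append((i - len(pat) + 1, pat))
    return hits


windows = ["acg", "cgc", "gcc", "ccg", "cga"]
goto, out = build_dfa(windows)
print(len(goto))                     # number of DFA states
print(scan("tacgcga", goto, out))    # hits as (start, window)
```

For the five windows this yields 13 states, which agrees with the states 0 to 12 in the figure, and the scan of tacgcga reports acg at position 1, cgc at position 2 and cga at position 4. Each input character costs exactly one table lookup, independent of the number of windows. The price is the size of the transition table, which is why the DFA may become quite large.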


IXPBlast: architecture

[Figure: block diagram of the IXPBlast architecture, shown in several animation frames. Labelled components: Pentium, control processor, PCI bus, NPU (IXP1200) with StrongARM and six microengines (ME), scratch memory, SRAM, DRAM, and Gbps ports. In the later frames the Aho-Corasick DFA (states 0 to 12) is drawn next to SRAM.]

IXPBlast: packet handling

[Figure: packet-handling diagram with elements numbered 0 to 31]

  • packets read and processed in batches of 100,000

  • “spilling” must be taken into account

  • currently no feedback
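The slides do not say what “spilling” refers to. One plausible reading, assumed here, is that a window can straddle the boundary between two packets, so the last W-1 nucleotides of each packet have to be carried over into the next. The sketch below illustrates that idea with a plain set lookup instead of the DFA; `scan_stream` and its parameters are hypothetical names.

```python
def scan_stream(chunks, windows, w):
    """Scan an iterable of string chunks for any of the length-w windows.

    The last w-1 characters of each chunk are carried into the next one so
    that windows straddling a chunk boundary are still found.
    Yields (absolute_position, window) pairs.
    """
    wanted = set(windows)
    tail = ""       # characters carried over from the previous chunk
    consumed = 0    # number of stream characters that precede `tail`
    for chunk in chunks:
        buf = tail + chunk
        for i in range(len(buf) - w + 1):
            piece = buf[i:i + w]
            if piece in wanted:
                yield consumed + i, piece
        keep = min(w - 1, len(buf))
        consumed += len(buf) - keep
        tail = buf[len(buf) - keep:]


windows = ["acg", "cgc", "gcc", "ccg", "cga"]
whole = list(scan_stream(["tacgcga"], windows, 3))
split = list(scan_stream(["tac", "gcg", "a"], windows, 3))
print(whole)
print(split)
```

Both calls report the same hits, so cutting the genome into packets does not lose matches as long as the tail is carried over. With a DFA the equivalent is simply to keep the current state from one packet to the next instead of resetting it.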


Results

  • 232 MHz IXP1200 ~ 1.8 GHz Pentium 4

  • 1611-nucleotide query (MyD88)

  • 1.4 GB genome (Zebrafish)

    • IXP1200: 90 sec with DFA

    • IXP1200: 129 sec with “trie”

    • P4: 132 sec with “trie”

  • number of matches: 524856
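To put the run times above in perspective, they can be converted into scan throughput and relative speed. This is a back-of-the-envelope calculation, assuming that 1.4 GB means 1.4 × 10^9 bytes and that the whole genome is scanned once per run. The slides state neither.

```python
GENOME_BYTES = 1.4e9  # assumption: 1.4 GB read as 1.4 * 10^9 bytes

# Run times as reported on the slide.
RUNTIMES_SEC = {
    "IXP1200, DFA": 90,
    "IXP1200, trie": 129,
    "P4, trie": 132,
}


def throughput_mb_per_sec(seconds, total_bytes=GENOME_BYTES):
    """Average scan rate in MB/s (1 MB = 10^6 bytes)."""
    return total_bytes / seconds / 1e6


def speedup(baseline_sec, other_sec):
    """How many times faster `other` is than `baseline`."""
    return baseline_sec / other_sec


for name, secs in RUNTIMES_SEC.items():
    print(f"{name}: {throughput_mb_per_sec(secs):.1f} MB/s")
print(f"IXP1200 with DFA vs. P4 with trie: {speedup(132, 90):.2f}x")
```

Under these assumptions the DFA version on the IXP1200 scans roughly 15.6 MB/s and is about 1.47 times as fast as the trie version on the Pentium 4, while the two trie versions are nearly on par.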


Conclusions

  • NPUs are useful in application domains other than networking

  • Newer hardware is expected to perform much better

  • “Throughput processors”

  • Adapting our current approach to use BLAST tricks/heuristics


Network processors

  • geared for high throughput

  • used exclusively in network systems

  • example: intrusion detection

  • similar to looking for genes in genomes

  • differences

[Figure: Radisys IXP1200 board]


Application domain: “Genomics”

  • example: search genome for occurrence of “patterns”

  • similar problems as IDS: poor performance on general-purpose processors (GPPs), which cannot exploit the parallelism

    • throughput-driven

    • how about FPGAs?

    • how about clusters?

  • NPU

    • easier to program than FPGAs

    • cheaper than cluster computing

    • “on the desktop” → IP never leaves the room

