Using network processors in genomics





H. Bos – Leiden University 13/02/2004

Using Network Processors in Genomics

Herbert Bos* † Kaiming Huang*

{herbertb,[email protected]

*Leiden Universiteit, Netherlands

† Vrije Universiteit, Netherlands

http://www.liacs.nl/~herbertb/projects/biocomp/


Case study: BLAST

  • search nucleotide/protein database for query

  • BLAST discovers similarity rather than exact match

  • two main phases:

    • scoring (registering where the query and the DNA database match)

    • alignment (dynamic programming)

  • only the first phase on NPUs



Window matching

  • naïve approach: roughly W*N*M comparisons

  • does not scale

  • string search algorithms: Aho-Corasick

    • all windows matched at the same time

    • shifting genome one nucleotide at a time

    • matching algorithm transformed into a DFA

  • DFA may be quite large
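
The cost of the naïve approach can be made concrete with a small sketch. The slide does not define W, N and M; the code below assumes W is the window size, N the number of windows in the query and M the genome length. The function names are illustrative and are not taken from IXPBlast.

```python
def windows(query, w):
    """Return every length-w substring of the query, in order."""
    return [query[i:i + w] for i in range(len(query) - w + 1)]


def naive_matches(genome, query, w):
    """Try every window at every genome position.

    Returns the hits and an upper bound on the number of symbol
    comparisons (w per window per position).
    """
    hits = []
    comparisons = 0
    wins = windows(query, w)
    for pos in range(len(genome) - w + 1):
        for win in wins:
            comparisons += w  # worst case: all w symbols are compared
            if genome[pos:pos + w] == win:
                hits.append((pos, win))
    return hits, comparisons


hits, comparisons = naive_matches("tacgcga", "acgccga", 3)
print(hits, comparisons)
```

With a query of a few thousand windows and a genome of over a billion nucleotides, the product grows far too quickly, which is why the slide turns to Aho-Corasick.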


Aho-Corasick

  • Alphabet: acgt

  • Window size: 3

  • Query: acgccga

  • Windows: {acg,cgc,gcc,ccg,cga}
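
As a rough illustration of how the windows above become a DFA, here is a minimal textbook Aho-Corasick construction. It is a sketch only, not the IXPBlast implementation; `build_dfa` and `scan` are hypothetical names, and the dense table layout is an assumption.

```python
from collections import deque

ALPHABET = "acgt"


def build_dfa(patterns):
    """Build an Aho-Corasick automaton with a transition for every symbol."""
    # Phase 1: build the trie. goto[s] maps a symbol to the next state;
    # state 0 is the root. out[s] holds the patterns that end in state s.
    goto = [{}]
    out = [set()]
    for p in patterns:
        s = 0
        for ch in p:
            if ch not in goto[s]:
                goto.append({})
                out.append(set())
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
        out[s].add(p)

    # Phase 2: breadth-first pass. Missing transitions are copied from the
    # failure state, so the finished table never needs to backtrack.
    fail = [0] * len(goto)
    queue = deque()
    for ch in ALPHABET:
        if ch in goto[0]:
            queue.append(goto[0][ch])
        else:
            goto[0][ch] = 0
    while queue:
        s = queue.popleft()
        for ch in ALPHABET:
            if ch in goto[s]:
                t = goto[s][ch]
                fail[t] = goto[fail[s]][ch]
                out[t] |= out[fail[t]]
                queue.append(t)
            else:
                goto[s][ch] = goto[fail[s]][ch]
    return goto, out


def scan(genome, goto, out):
    """Feed the genome through the DFA one nucleotide at a time."""
    hits = []
    s = 0
    for i, ch in enumerate(genome):
        s = goto[s][ch]
        for p in out[s]:
            hits.append((i - len(p) + 1, p))
    return hits


goto, out = build_dfa(["acg", "cgc", "gcc", "ccg", "cga"])
print(len(goto), scan("tacgcga", goto, out))
```

For these five windows the trie has 13 states, numbered 0 to 12 in insertion order. Each input symbol costs one table lookup, regardless of how many windows there are.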


DFA diagram: states 0 to 12, transitions labelled a, c, g, t. Example input: tacgcga


IXPBlast

Architecture diagram (labels only): Gbps ports; NPU (IXP1200) with six microengines (ME) and StrongARM; control processor; Pentium; PCI bus; scratch, SRAM and DRAM. Later builds of the slide draw the Aho-Corasick DFA (states 0 to 12) beside the SRAM label.

IXPBlast: packet handling

Diagram: entries numbered 0 to 31.

  • packets are read and processed in batches of 100,000

  • “spilling” must be taken into account

  • currently no feedback
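
The slide does not spell out what “spilling” means. One plausible reading, assumed here, is that a window can straddle two consecutive packets, so some context has to be carried from one packet to the next. The sketch below carries the last w-1 symbols forward; `match_packets` is a hypothetical helper and uses a plain set lookup rather than the DFA.

```python
def match_packets(packets, windows, w):
    """Find window occurrences in a genome that arrives packet by packet."""
    wanted = set(windows)
    hits = []
    carry = ""   # tail of the data seen so far (at most w-1 symbols)
    offset = 0   # genome position of the first symbol of carry + packet
    for packet in packets:
        data = carry + packet
        for i in range(len(data) - w + 1):
            if data[i:i + w] in wanted:
                hits.append((offset + i, data[i:i + w]))
        # Keep only what a window starting in this packet could still need.
        keep = min(w - 1, len(data))
        carry = data[len(data) - keep:]
        offset += len(data) - keep
    return hits


wins = ["acg", "cgc", "gcc", "ccg", "cga"]
print(match_packets(["tac", "gcg", "a"], wins, 3))
```

Because the carried tail is shorter than a window, no match is reported twice. With a DFA, the same effect could be had by keeping the current state between packets instead of a string tail.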


Results

  • 232 MHz IXP1200 ~ 1.8 GHz Pentium-4

  • 1611-nucleotide query (MyD88)

  • 1.4 GB genome (Zebrafish)

    • IXP1200: 90 sec with DFA

    • IXP1200: 129 sec with “trie”

    • P4: 132 sec with “trie”

  • number of matches: 524856
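
The run times above can be turned into approximate scan rates. This is back-of-the-envelope arithmetic, assuming “1.4 GB” means 1.4 × 10^9 bytes; the names are illustrative.

```python
GENOME_BYTES = 1.4e9  # assumption: "1.4 GB" read as 1.4 * 10**9 bytes

# Run times in seconds, copied from the results bullets.
RUN_TIMES = {
    "IXP1200, DFA": 90,
    "IXP1200, trie": 129,
    "P4, trie": 132,
}


def throughput_mb_per_s(seconds):
    """Average scan rate in megabytes (10**6 bytes) per second."""
    return GENOME_BYTES / seconds / 1e6


for name, seconds in RUN_TIMES.items():
    print(f"{name}: {throughput_mb_per_s(seconds):.1f} MB/s")

# Ratio of the Pentium 4 run time to the IXP1200 DFA run time.
speedup = RUN_TIMES["P4, trie"] / RUN_TIMES["IXP1200, DFA"]
print(f"P4 trie time / IXP1200 DFA time: {speedup:.2f}")
```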



Conclusions

  • NPUs are useful in other application domains

  • Newer hardware is expected to perform much better

  • “Throughput processors”

  • Adapting our current approach to use BLAST tricks/heuristics


Network processors

  • geared for high throughput

  • used exclusively in network systems

  • example: intrusion detection

  • similar to looking for genes in genomes

  • differences

Radisys IXP1200 board


Application domain: “Genomics”

  • example: search genome for occurrence of “patterns”

  • similar problems to IDS: poor performance on general-purpose processors (GPPs), which cannot exploit the parallelism

    • throughput-driven

    • how about FPGAs?

    • how about clusters?

  • NPU

    • easier to program than FPGAs

    • cheaper than cluster computing

    • “on the desktop” → IP never leaves the room

