Webscale for a small web

Since 2005

All *.dk websites

4 times/year

~100 special sites harvested daily

Explicitly stated by law that everything public can and must be harvested

ELAG 2014 quickly hacked lightning talk, Toke Eskildsen, State and University Library, Denmark


Numbers

Currently 370TB+ / 8-10B web resources

Estimated final index: 20-24TB

Indexing: 24 core / 256GB RAM / 5TB SSD

Searching: 16 core / 256GB RAM / 24TB SSD

Cost: ~£12,000

1 optimized shard / SSD (900GB, 300M docs)

Build time / shard: ~8 days

1 Solr / shard, connected with SolrCloud

https://github.com/netarchivesuite/netsearch

(based on https://github.com/ukwa/webarchive-discovery)
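The figures above imply a rough shard count and build schedule. The following is a back-of-the-envelope sketch, not part of the talk: it assumes 1 TB = 1000 GB and that shards are built one at a time on the single indexing machine, neither of which the slides state. The function names are made up for illustration.

```python
import math

# Figures taken from the slide.
SHARD_GB = 900            # one optimized shard per SSD
DOCS_PER_SHARD = 300e6    # documents per shard
BUILD_DAYS = 8            # approximate build time per shard


def shards_for_index(index_tb, shard_gb=SHARD_GB):
    """Shards needed to hold an index of index_tb (assumes 1 TB = 1000 GB)."""
    return math.ceil(index_tb * 1000 / shard_gb)


def shards_for_docs(docs, docs_per_shard=DOCS_PER_SHARD):
    """Shards needed to hold a given number of documents."""
    return math.ceil(docs / docs_per_shard)


def sequential_build_days(shards, days_per_shard=BUILD_DAYS):
    """Total build time if shards are built one after another (assumption)."""
    return shards * days_per_shard


if __name__ == "__main__":
    for index_tb in (20, 24):
        n = shards_for_index(index_tb)
        print(f"{index_tb} TB index: {n} shards, "
              f"~{sequential_build_days(n)} days if built sequentially")
    for docs in (8e9, 10e9):
        print(f"{docs:.0e} resources: {shards_for_docs(docs)} shards")
```

Under these assumptions the size estimate gives 23 to 27 shards and the document count gives 27 to 34, so the two estimates on the slide are roughly consistent with each other.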


20 threads, 3.6TB, simple searches


1 thread, faceting, 900GB, URL-field
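For readers unfamiliar with Solr faceting, a request of the kind measured here might look like the sketch below. It only builds the request URL using standard Solr query parameters (`q`, `rows`, `facet`, `facet.field`, `facet.limit`, `wt`). The host, collection name `netarchive`, and field name `url` are assumptions for illustration, not taken from the talk.

```python
from urllib.parse import urlencode, urlparse, parse_qs


def facet_query_url(base_url, query, facet_field, limit=10):
    """Build a Solr select URL that returns only facet counts for one field."""
    params = {
        "q": query,
        "rows": 0,                  # skip the documents, return facets only
        "wt": "json",
        "facet": "true",
        "facet.field": facet_field,
        "facet.limit": limit,       # number of top facet values to return
    }
    return f"{base_url}/select?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical host and collection; "url" is an assumed field name.
    print(facet_query_url("http://localhost:8983/solr/netarchive", "*:*", "url"))
```

Faceting on a field with very many distinct values, such as a URL field, is typically far more expensive than a simple search, which is presumably why it was benchmarked separately.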


Remember the milk!

