Chris Saxton, Maria Sinn, Matt Wronski, Arifur Sumon Rahman

Gigaword Counter

Overview
  • Different Methods Used
  • Results
  • Amdahl’s Law
Alpha Design
  • Used manager-worker paradigm
  • Each worker received one file name
  • Counted words and articles in a file
  • Returned counts to manager after each file
  • Problems
    • Must have at least as many files as processes
    • Files must be of similar size
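The two problems listed for the alpha design can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the presenters' code: `alpha_makespan` is a hypothetical helper, and each entry of `sizes` stands in for the time needed to process one file. The manager hands the next file to whichever worker becomes idle first.

```c
#include <assert.h>
#include <stddef.h>

/* Simulate a manager that gives the next file to the first idle worker.
 * sizes[i] is a stand-in for the processing time of file i (assumption).
 * Returns the finish time of the slowest worker (the makespan). */
double alpha_makespan(const double *sizes, size_t nfiles, size_t nworkers)
{
    enum { MAX_WORKERS = 1024 };               /* hypothetical upper bound */
    double busy_until[MAX_WORKERS] = {0.0};
    assert(nworkers > 0 && nworkers <= MAX_WORKERS);

    for (size_t i = 0; i < nfiles; i++) {
        /* Find the worker that becomes free earliest. */
        size_t idle = 0;
        for (size_t w = 1; w < nworkers; w++)
            if (busy_until[w] < busy_until[idle])
                idle = w;
        busy_until[idle] += sizes[i];
    }

    double makespan = 0.0;
    for (size_t w = 0; w < nworkers; w++)
        if (busy_until[w] > makespan)
            makespan = busy_until[w];
    return makespan;
}
```

With four files and eight workers, half of the workers never receive a file, and a single file ten times larger than the rest sets the total run time on its own.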
Beta Decomposition
  • Wanted to send each worker an article at a time
  • Used fseek() and ftell()
  • Sent the file name and the location of the line in the file to each worker
  • Problems
    • Lots of communication
    • Each file was read a total of three times
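A minimal sketch of the offset-based idea, assuming that an article starts on a line beginning with `<DOC` (the slides do not say how articles were delimited). The helper names `index_articles` and `read_article_header` are hypothetical. The manager side records `ftell()` positions, and a worker later jumps to one of them with `fseek()`.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Manager side: record the byte offset of every line that starts an article.
 * ASSUMPTION: an article starts with a line beginning "<DOC".
 * Lines longer than the buffer are not handled in this sketch. */
size_t index_articles(FILE *fp, long *offsets, size_t max_offsets)
{
    char line[4096];
    size_t n = 0;
    assert(fp != NULL && offsets != NULL);

    rewind(fp);
    for (;;) {
        long pos = ftell(fp);              /* offset of the line about to be read */
        if (fgets(line, sizeof line, fp) == NULL)
            break;
        if (strncmp(line, "<DOC", 4) == 0 && n < max_offsets)
            offsets[n++] = pos;
    }
    return n;
}

/* Worker side: seek to a received offset and read the article's first line.
 * Returns 0 on success, -1 on failure. */
int read_article_header(FILE *fp, long offset, char *buf, int buflen)
{
    assert(fp != NULL && buf != NULL && buflen > 0);
    if (fseek(fp, offset, SEEK_SET) != 0)
        return -1;
    return fgets(buf, buflen, fp) != NULL ? 0 : -1;
}
```

Even in this small form the file is scanned once to build the index and then read again by the worker, which hints at where the repeated reads on the slide come from.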
Beta Design 2nd Attempt
  • Used getline()
  • Two methods
    • Each worker received an article
    • Each worker read every n lines from the file
  • Problems
    • High communication costs
    • May require many file reads
    • Huge buffers for send
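The second method can be read as a strided split of lines across workers. The sketch below assumes that reading: worker `rank` out of `nworkers` handles lines `rank`, `rank + nworkers`, and so on. The function name `count_words_strided` and the whitespace-based definition of a word are assumptions for illustration.

```c
#define _POSIX_C_SOURCE 200809L  /* for getline() */
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Count whitespace-separated words on the lines owned by one worker.
 * Worker `rank` owns lines rank, rank + nworkers, rank + 2*nworkers, ... */
long count_words_strided(FILE *fp, int rank, int nworkers)
{
    char *line = NULL;      /* getline() allocates and grows this buffer */
    size_t cap = 0;
    long line_no = 0, words = 0;
    assert(fp != NULL && nworkers > 0 && rank >= 0 && rank < nworkers);

    rewind(fp);
    while (getline(&line, &cap, fp) != -1) {
        if (line_no++ % nworkers != rank)
            continue;       /* line belongs to another worker */
        int in_word = 0;
        for (const char *p = line; *p != '\0'; p++) {
            if (isspace((unsigned char)*p)) {
                in_word = 0;
            } else if (!in_word) {
                in_word = 1;
                words++;
            }
        }
    }
    free(line);
    return words;
}
```

Note that every worker still reads the whole file and discards the lines it does not own, which is consistent with the file-read cost listed among the problems.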
Brute Force Approach
  • Adopted due to complexity with the struct
  • Decided to store every term of length m – n in an array
  • Checked each term individually to see if it was distinct
  • Problems
    • No shared memory
    • Uses a lot of memory
    • A lot of time spent searching the array for already existing terms
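The search cost mentioned above follows directly from the data structure. A minimal sketch, assuming a fixed-size array of strings and a linear scan (the name `add_if_distinct` and the `MAX_TERM_LEN` limit are hypothetical):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { MAX_TERM_LEN = 64 };   /* hypothetical limit on term length */

/* Append term to terms[] if no earlier entry matches.
 * Returns 1 if the term was new, 0 if it was already present
 * or the array is full. */
int add_if_distinct(char terms[][MAX_TERM_LEN], size_t *count,
                    size_t capacity, const char *term)
{
    assert(terms != NULL && count != NULL && term != NULL);
    assert(strlen(term) < MAX_TERM_LEN);

    /* Linear scan: every insert compares against all stored terms. */
    for (size_t i = 0; i < *count; i++)
        if (strcmp(terms[i], term) == 0)
            return 0;

    if (*count == capacity)
        return 0;
    strcpy(terms[(*count)++], term);
    return 1;
}
```

Each insert costs time proportional to the number of terms already stored, so building a list of k distinct terms takes on the order of k² comparisons.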
Decomposition by Term
  • Manager sends a term of length n to each worker
  • Manager responsible for total word and article count
  • Workers assigned part of the alphabet
  • Terms sent to appropriate worker based on first two letters
  • Preserves distinctness of terms
Results of Alpha

Processors    Execution Time
32            45.58 s
64            30.36 s
128           27.30 s
256           23.93 s

Execution time increases significantly when there is a smaller number of large files.
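The timings above can be turned into speedup figures. Going from 32 to 256 processors (8 times as many) gives 45.58 / 23.93 ≈ 1.9 times the speed. The sketch below computes that ratio and, for comparison, the standard Amdahl's Law prediction; the function names are hypothetical, and no serial fraction is given in the slides, so any value passed to `amdahl_speedup` is an assumption.

```c
#include <assert.h>

/* Speedup of a run relative to a baseline run: T_base / T_run. */
double relative_speedup(double t_base, double t_run)
{
    assert(t_base > 0.0 && t_run > 0.0);
    return t_base / t_run;
}

/* Amdahl's Law: predicted speedup on p processors when a fraction
 * serial_fraction of the work cannot be parallelized. */
double amdahl_speedup(double serial_fraction, int p)
{
    assert(serial_fraction >= 0.0 && serial_fraction <= 1.0 && p > 0);
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / p);
}
```

For example, `relative_speedup(45.58, 23.93)` gives the 32-to-256 ratio quoted above, and `amdahl_speedup(0.0, 8)` returns the ideal value of 8 for a fully parallel program.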

Beta Results
  • With the GIGAWORD corpus as input, the beta program would not finish or produce the expected result.
  • However, it would run on a smaller portion of the data, although the count would include some (but not all) duplicates of distinct terms.