
Arrays


barto

Presentation Transcript


  1. Arrays CS101 2012.1

  2. (1-dimensional) array
     • So far, each int, double or char variable had a distinct name
       • Called “scalar” variables
     • An array is a collection of variables
       • They share the same name
       • But they have different indices in the array
     • Arrays are provided in C++ in two ways
       • “Native arrays” supported by the language itself
       • The vector type (introduced later)
       • Similar: understand one, understand both
     Chakrabarti

  3. Notation
     • When writing series expressions in math, we use subscripts like a_i, b_j etc.
     • In code, the subscript is placed in square brackets, as in a[i], b[j]
     • Inside […] can be an arbitrary integer expression
     • In C++, the first element is a[0], b[0] etc.
     • If the array has n elements, the last element is at position n-1, as in a[n-1]
     • Watch out for out-of-bounds index errors

  4. Why do we need arrays
     • Print a running median of the last 1000 temperatures recorded at a sensor
     • Tedious to declare and use 1000 scalar variables of the form t0, t1, …, t999
     • Cannot easily express a computation like “print differences between consecutive readings” as a loop computation
       • Want to write “t_{i+1} - t_i”
       • I.e., want to access elements using an index that is an integer expression

  5. Declaring a native array
       int main() {
         const int vn = 9;   // number of elements in the array to be created
         int va[vn];         // reserve memory for a 9-element array
         for (int vx = 0; vx < vn; ++vx) {
           va[vx] = vx * (vn - 1 - vx);   // lvalue: cell to which the rhs value is written
         }
         for (int vx = 0; vx < vn; ++vx) {
           cout << va[vx] << ", ";        // rvalue: access the int in the specified cell
         }
         cout << endl;
       }
     • Note: a native array has no size() or length; hang on to vn

  6. Representation in memory
     • Elements of an array have fixed size
     • Laid out consecutively
     • Compiler associates array name va with the memory address of its first byte
     • To fetch va[ix], go to byte address A + ix*4 and fetch the next four bytes as an integer
     • Element addresses: A, A+4, A+8, A+12, …
     [Figure: va starts at address A; va[0], va[1], va[2] are consecutive 32-bit ints holding 0, 7, 12]

  7. Sum and product of all elements
       double arr[an]; // fill in suitably
       double sum = 0, prod = 1;
       for (int ax = 0; ax < an; ++ax) {
         sum += arr[ax];
         prod *= arr[ax];
       }
     • In standard notation we would write these as sum = arr[0] + arr[1] + … + arr[an-1] and prod = arr[0] × arr[1] × … × arr[an-1]

  8. Dot-product of two vectors
       double av[nn], bv[nn]; // filled in
       double dotprod = 0;
       for (int ix = 0; ix < nn; ++ix) {
         dotprod += av[ix] * bv[ix];
       }

  9. Cosine of the angle between two vectors
     • Need to compute norms alongside dot prod
       double av[nn], bv[nn]; // filled in
       double dot = 0, anorm = 0, bnorm = 0;
       for (int ix = 0; ix < nn; ++ix) {
         dot += av[ix] * bv[ix];
         anorm += av[ix] * av[ix];
         bnorm += bv[ix] * bv[ix];
       }
       double ans = dot/sqrt(anorm)/sqrt(bnorm);

  10. Josephus problem
      • numPeople people stand in a circle
        • Say named 0, 1, …, numPeople-1
      • Start at some arbitrary person (say 0)
      • Skip skip people
        • Loop back from numPeople-1 to 0 if needed
      • Throw out victim at next position
      • Repeat until circle becomes empty
      • Eviction order? Who survives to the end?
      • Need to model only two states per person: present (true) or absent (false)

  11. Josephus solution
        bool present[numPeople]; // initialize to all true (present)
        int victim = 0, numEvicted = 0;
        while (numEvicted < numPeople) {
          // Skip over skip present people
          // Evict victim
        }
      • Irrespective of whether victim is present or not at the beginning of the loop, ensure that after the skip step, victim is again a present person

  12. The skip step
        for (int toSkip = skip; ; victim = (victim+1) % numPeople) {
          if (present[victim]) { --toSkip; }
          if (toSkip == 0) { break; }
        }
      • Note that present[victim] is true after the loop
      • If skip is large, may skip over the same candidate victim many times
      • Note use of for without a condition to check

  13. Evict step
        present[victim] = false;
        ++numEvicted;
        cout << "Evicted " << victim << endl;
      • Eviction takes constant time
      • victim is now absent, but next iteration’s skip step will take care of that

  14. Alternative representation
      • Are we wasting too much time skipping over people who have already left?
      • Alternative: instead of a bool array, maintain a shrinking int array with person IDs
      • On every eviction, squeeze out the evicted ID and reduce array size by 1
      • (Allocated space remains the same, we just don’t use all of it)

  15. Busy eviction
      [Figure: the survivors array, with positions 0, victim and numPeople marked]
        for (int cx = victim; cx < numPeople-1; ++cx) {
          survivors[cx] = survivors[cx+1];
        }
        --numPeople;
      • Cannot just swap, because order of people in survivors must not be disturbed

  16. Skipping is now trivial
      • All person IDs in survivors are present by construction
      • So skipping is as simple as
          victim = (victim + skip) % numPeople;
      • Takes constant time
      • Note that numPeople decreases by one after each eviction

  17. Time analysis
      • First representation (bit vector)
        • Skipping is messy
        • Evicting is trivial
      • Second representation (person ID vector)
        • Skipping is trivial
        • Evicting is messy (move data)
      • Which is better?
        • Depends on the goal of the computation

  18. Smallest and largest elements
      • Convention: for an empty array
        • Largest element is −∞ (minus infinity)
        • Smallest element is +∞ (plus infinity)
      • Will show how to instantiate for all numeric types
        int array[an]; // suitably filled in
        int min = PLUS_INF, max = MINUS_INF;
        for (int ax = 0; ax < an; ++ax) {
          int elem = array[ax];
          min = (elem < min)? elem : min;
          max = (elem > max)? elem : max;
        }
      • Ties?

  19. Position (index) of smallest element
      • Given double array[0, …, an-1]
      • Find index imin of smallest element
      • Must remember current smallest too
        double amin = PLUS_INF;
        int imin = -1; // illegal value to start
        for (int ax = 0; ax < an; ++ax) {
          if (array[ax] < amin) {
            amin = array[ax];
            imin = ax;
          }
        }
        // imin holds the answer

  20. Swapping array elements
      • Once we find imin, we can exchange the element at imin with the element at 0
          double tmp = array[imin];
          array[imin] = array[0];
          array[0] = tmp;
      • This places the smallest element at slot 0
        • Or largest, if we wish
      • Repeating this on the remaining unsorted part gives us a sorted array

  21. Prefix (cumulative) sum
      • Given array a[0,…,n-1]
      • Compute array b[0,…,n-1] where b[i] = a[0] + a[1] + … + a[i]
      • Nested loop works but is naïve
        int a[n], b[n]; // vector a filled
        for (int bx = 0; bx < n; ++bx) {
          b[bx] = 0;
          for (int ax = 0; ax <= bx; ++ax) {
            b[bx] += a[ax];
          }
        }
      • Time taken is proportional to n^2

  22. Prefix sum: faster
      • Given array a[0,…,n-1]
      • Compute array b[0,…,n-1] where b[i] = a[0] + a[1] + … + a[i]
        int a[n], b[n]; // vector a filled
        for (int ix = 0; ix < n; ++ix) {
          b[ix] = ((ix == 0)? 0 : b[ix-1]) + a[ix];
        }
      • The (ix == 0)? 0 : … test is evaluated on every iteration, which makes the code inefficient
      [Figure: cells a[0] a[1] a[2] a[3] above cells b[0]=a[0], a[0]+a[1], a[0]+a[1]+a[2], ?]

  23. Applications of cumulative sum
      • Running balance in bank account
        • Deposits positive, withdrawals negative
      • Range sum: given an array, preprocess it so that, given a “query” i, j, we can return a[i] + a[i+1] + … + a[j] very quickly
        • Answer is simply b[j] - b[i-1] (for i = 0, just b[j])

  24. Merging sorted arrays
      • Arrays double a[m], b[n]
        • Repeated values possible
      • Each has been sorted in increasing order
        • Know sorting by repeated move-smallest-to-front
      • Output is array c[m+n]
        • Contains all elements of a[m] and b[n]
        • Including all repeated values
        • In increasing order
      • Example: a=(0, 2, 7, 8) b=(2, 3, 5, 8, 9) c=(0, 2, 2, 3, 5, 7, 8, 8, 9)

  25. Merge approach
      • Run two indexes ax on a[…] and bx on b[…]
        • Also called cursors
      • Choose minimum among cursors, advance that cursor
      • Append chosen number to c[…]
      [Figure: cursor ax on a=(0, 2, 7, 8), cursor bx on b=(2, 3, 5, 8, 9), and c so far = (0, 2, 2, 3, 5, 7)]

  26. Merge code
        double a[m], b[n], c[m+n];
        // a and b suitably initialized
        int ax = 0, bx = 0, cx = 0;
        while (ax < m && bx < n) {
          if (a[ax] < b[bx]) { c[cx++] = a[ax++]; }
          else { c[cx++] = b[bx++]; }
        }
        // at least one of a and b is exhausted here
        while (ax < m) { c[cx++] = a[ax++]; }
        while (bx < n) { c[cx++] = b[bx++]; }

  27. Another way to sort
      • Earlier we sorted by finding the minimum element and pulling it out to the leftmost unsorted position
      • Suppose array length is a power of 2
      • First sort segments of length 1 (already done)
      • Merge into sorted segments of length 2
        • [0, 1] [2, 3] [4, 5] [6, 7]
      • Merge into sorted segments of length 4
        • [0, 1, 2, 3] [4, 5, 6, 7]
      • This is mergesort

  28. Mergesort picture
      • Recall that any location of RAM can be read/written at approximately the same speed
      • But access to hard disk is best made in large contiguous chunks of bytes
      • Mergesort is best for data larger than RAM
        35  9 | 22 16 | 17 13 | 29  4
         9 35 | 16 22 | 13 17 |  4 29
         9 16 22 35 |  4 13 17 29
         4  9 13 16 17 22 29 35

  29. Time taken by mergesort
      • Time to merge a[m] and b[n] is m+n
      • Suppose array to be sorted is s[2^p]
      • 2^(p-1) merges of segments of size 1, 1 → takes time 2^p
      • 2^(p-2) merges of segments of size 2, 2 → takes time 2^p again
      • p merge phases, each taking 2^p time
      • So total time is p·2^p
      • Writing N = 2^p, total time is N log N (log base 2), optimal for sorting by comparisons!
      • Index arithmetic and bookkeeping slightly complicated

  30. Searching a sorted array
      • Given array a=(4, 9, 13, 16, 17, 22)
      • Find the index/position of q=16 (answer: 3)
      • Print -1 if q not found in a
      • Why do we care?
      • E.g., income tax department database
        • Array pan[numPeople] (sorted)
        • Array income[numPeople] (not sorted)
        • Each index is for one person
        • Find index of specific PAN
        • Then access person’s income

  31. Searching a sorted array
      • Given array a=(4, 9, 13, 16, 17, 22)
      • Find the index/position of q=16 (answer: 3)
      • Print -1 if q not found in a
      • Linear search
          int ax; // declared outside the loop so it is visible after it
          for (ax = 0; ax < an; ++ax) {
            if (a[ax] == q) { cout << ax; break; }
          }
          if (ax == an) { cout << -1; }
      • Because a is sorted, there is no need to look farther once a value larger than q is seen: all later values will be larger than q
      • More efficient to do binary search

  32. Binary search
      • Bracket unchecked array segment between indexes lo and hi
      • Bisect segment: mid = (lo + hi)/2
      • Compare q with a[mid]
        • If q is equal to a[mid], answer is mid
        • If q < a[mid], next bracket is lo…mid-1
        • If q > a[mid], next bracket is mid+1…hi
      [Figure: the array with lo and hi marked, searching for q=16]

  33. Binary search
      • Terminate when lo == hi (?)
      • Before first halving, bracket has n candidates
      • After first halving, reduces to ~n/2
      • After second halving, … ~n/4
      • Number of halvings required is approximately log n (base 2)
      • If the array was not sorted, we would need to check all n
      • To implement, need a little care with index arithmetic
        • lo=2 hi=3 mid=2, lo=3 hi=4 mid=3

  34. Binary search code
        int lo = 0, hi = n-1, ans = -1;
        while (lo <= hi) {
          int mid = (lo + hi)/2;
          if (q < a[mid]) { hi = mid-1; }
          if (q > a[mid]) { lo = mid+1; }
          if (q == a[mid]) { ans = mid; break; }
        }
        if (ans == -1) { … }
      • If not found, a common convention is to return the negative form of the index where the query should be inserted

  35. Median
      • Given a sorted array a with n elements
      • By definition, median is a[n/2] (with 0-based indexing; for even n, a[n/2-1] is an equally valid choice)
      • Similarly, percentiles
      • What if the array is not sorted?
        • Trivial: first sort, then find the middle element
        • Have seen that this will take n^2 or n log n time
      • Can we do better?
        • Turns out cn steps are enough for some constant c
        • (Complicated)

  36. Median
      • Given two sorted arrays a[m] and b[n], supposing their merge was c[m+n]
      • Goal: find median of c without computing c (would take m+n time)
        • I.e., find median in much less time
      • All items marked large are larger than all items marked small
        • Therefore no large item can be a median
        • Eliminates ~(m+n)/2 elements
      • Lab!
      [Figure: arrays a (split at m/2) and b (split at n/2), with regions marked small and large and example middle values 20 and 27]

  37. Back to mergesort index arithmetic
      • Assume array size N = 2^pow
      • There are pow merge rounds
      • In the px-th merge round, px = 0, 1, …, pow-1:
        • There are rn = 2^(pow-px) sorted runs
        • Numbered rx = 0, 1, …, rn-1
        • Each run has 2^px elements
        • The rx-th run starts at index rx * 2^px
        • And ends at index (rx+1) * 2^px (excluded)
        • Merge the rx-th and (rx+1)-th runs for rx = 0, 2, 4, …
      • Demo

  38. Bottom up mergesort
      • Start with solutions to small subproblems, then combine them to form solutions of larger subproblems
      • Eventually solving the whole problem
        35  9 | 22 16 | 17 13 | 29  4
         9 35 | 16 22 | 13 17 |  4 29
         9 16 22 35 |  4 13 17 29
         4  9 13 16 17 22 29 35

  39. Top down expression
      • To mergesort array in positions [lo, hi) …
      • If hi - lo <= 1 (at most one element) then we are done, otherwise …
      • Find m = (lo + hi)/2
      • Mergesort array in positions [lo, m)
        • Recursive call
      • Mergesort array in positions [m, hi)
        • Recursive call
      • Merge these two segments

  40. vector
      • A vector can hold items of many types
        • Need to tell the C++ compiler which type
      • Declare as
          vector<double> oneArray;
          vector<int> anotherArray;
      • Very similar to string in other respects
        • array.resize(10); // sets or resets size
        • array.size() // current number of elements
        • foo = array[index]; // reading a cell
        • array[index] = value; // writing a cell

  41. Initialization
        vector<int> array;
        const int an = 10;
        array.resize(an);
        for (int ax = 0; ax < an; ++ax) {
          array[ax] = ax * (an - ax);
        }
      • Usually, arrays are read from data files

  42. Advantages of using vector
      • Can store elements of any type in it
        • E.g. vector<string> firstNames;
      • Memory management is handled by the C++ runtime library
      • Can grow and shrink array after declaration
      • But there’s more!
        • Sorting is already provided
        • So is binary search
        • And many other useful algorithms
      • Demo

  43. Sparse arrays
      • Thus far, our arrays allocated storage for each and every element
      • Sometimes, can’t afford this
      • E.g. documents
        • Represented as vectors in a very high-dimensional space
        • One axis for each word in the vocabulary, which could have billions of tokens
        • Most coordinates of most docs are zeros
      [Figure: a document vector drawn against word axes labeled logic, dog, cat]

  44. Example
      • A corpus with only two docs
        • my care is loss of care
        • by old care done
      • Distinct tokens: 0=by 1=care 2=done 3=is 4=loss 5=my 6=of 7=old
      • Represented as token IDs, docs look like
        • 5 1 3 4 6 1
        • 0 7 1 2
      • In sparse notation ignoring sequence info
        • { 1:2, 3:1, 4:1, 5:1, 6:1 }
        • { 0:1, 1:1, 2:1, 7:1 }
      • Documents are similar if they frequently share words
      • Extreme compression: can instead store gaps between adjacent token IDs

  45. Storing sparse arrays
      • Assign integer IDs to each axis/dimension
      • Use two “ganged” vectors
        • vector<int> dims stores dimension IDs
        • vector<float> vals stores the coordinate in the corresponding dim
      • Example: dims = (1, 3, 4, 5, 6), vals = (2, 1, 1, 1, 1)
      • Some important operations
        • Find the L2 norm of a vector
        • Add two vectors
        • Find the dot product between two vectors

  46. Norm
        vector<int> dims;   // suitably filled
        vector<float> vals; // suitably filled
        float normSquared = 0;
        for (int vx = 0, vn = vals.size(); vx < vn; ++vx) {
          normSquared += vals[vx] * vals[vx];
        }
        float norm = sqrt(normSquared);

  47. Sparse sum
      • aDims, aVals, bDims, bVals → cDims, cVals
      • Best to keep all dims in increasing order
      • Conventionally written as
        • a = { 0:2.2, 3:-1.4, 11:35 } b = { 1:-5.2, 3:1.4 }
        • Then c = { 0:2.2, 1:-5.2, 11:35 }
      • Two parts
        • Given a sparse vector that is not ordered by dim, how to clean it up
        • Given a and b sorted on dim, how to compute c
      • Second part first
      • It’s just a merge!

  48. Sum merge
      [Figure: cursors ax and bx step through a and b while c is built]
        aDims:  0     3    11        aVals:  2.2  -1.4  35
        bDims:  1     3              bVals: -5.2   1.4
        cDims:  0     1    11        cVals:  2.2  -5.2  35

  49. Sum merge code
        vector<int> aDims, bDims, cDims;
        vector<float> aVals, bVals, cVals;
        // a and b filled, c empty
        int ax = 0, an = aDims.size(), bx = 0, bn = bDims.size();
        while (ax < an && bx < bn) {
          // three cases:
          // aDims[ax] < bDims[bx]
          // aDims[ax] > bDims[bx]
          // aDims[ax] == bDims[bx]
        }
        // append leftovers as before

  50. Three cases
      • aDims[ax] < bDims[bx]
          cDims.push_back(aDims[ax]);
          cVals.push_back(aVals[ax]); ++ax;
      • aDims[ax] > bDims[bx]
          cDims.push_back(bDims[bx]);
          cVals.push_back(bVals[bx]); ++bx;
      • aDims[ax] == bDims[bx]
          float newVal = aVals[ax] + bVals[bx];
        • If |newVal| is zero/small, discard, else
            cDims.push_back(aDims[ax]);
            cVals.push_back(newVal);
        • In either case, ++ax; ++bx;
