
APL Optimization Techniques

Eugene Ying, Senior Software Developer, Fiserv, Inc. September 14, 2012. Topics: File I/O Optimization (Component File Fragmentation; Storing Numbers in a Native File). CPU Optimization (The Outer Product; The Inner Product; The Match Function).


Presentation Transcript


  1. APL Optimization Techniques. Eugene Ying, Senior Software Developer, Fiserv, Inc. September 14, 2012.

  2. Topics

     • File I/O Optimization
         • Component File Fragmentation
         • Storing Numbers in a Native File
     • CPU Optimization
         • The Outer Product
         • The Inner Product
         • The Match Function

  3. A Component File where each Component Contains 100 Rows of Data

     [Diagram of the file before and after two updates to comp 2.] Updating component 2 with 50 rows of data. Updating component 2 with 150 rows of data: the file is fragmented.

  4. Initializing a Component File

     Suppose your data will never have more than 500 rows. To minimize the chance of fragmentation, you allocate 500 rows of data for each component.

         (500 10⍴' ')⎕FAPPEND TIE   ⍝ Component 1
         (500 4⍴0)⎕FAPPEND TIE      ⍝ Component 2
         (500 20⍴' ')⎕FAPPEND TIE   ⍝ Component 3
         (500 5⍴0)⎕FAPPEND TIE      ⍝ Component 4
         (500 15⍴' ')⎕FAPPEND TIE   ⍝ Component 5

  5. Initializing a Component File

     [Diagram comparing the intended initialization with the actual initialization of comp 1 to comp 5. Components 1, 3, and 5 hold characters; components 2 and 4 hold numbers.]

     Numeric components are greatly under-allocated in size.

  6. Storage Sizes of APL Numbers

               BOOLEAN←1000⍴1
               ⎕SIZE 'BOOLEAN'
         144
               INTEGER1←1000⍴2
               ⎕SIZE 'INTEGER1'
         1016
               INTEGER2←1000⍴128
               ⎕SIZE 'INTEGER2'
         2016
               INTEGER4←1000⍴32768
               ⎕SIZE 'INTEGER4'
         4016
               FLOAT8←1000⍴0.1
               ⎕SIZE 'FLOAT8'
         8016
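
The sizes above follow from the internal type that APL picks for each array. A minimal sketch, assuming Dyalog APL with the default ⎕FR←645, that uses monadic ⎕DR to report each type code. The names ARRAYS, TYPES, and BITS are illustrative and do not come from the slides. The leading digits of each code give the number of bits per item.

```apl
⍝ Illustrative names (not from the slides): ARRAYS, TYPES, BITS
ARRAYS←(1000⍴1)(1000⍴2)(1000⍴128)(1000⍴32768)(1000⍴0.1)
TYPES←⎕DR¨ARRAYS      ⍝ Internal type code of each array
BITS←⌊TYPES÷10        ⍝ Bits used per item
TYPES                 ⍝ 11 83 163 323 645
BITS                  ⍝ 1 8 16 32 64
```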

  7. The Default APL Number 0

               X←1000⍴0
               ⎕SIZE 'X'
         144
               X←1000⍴0.1-0.1
               ⎕SIZE 'X'
         144
               X←1000⍴0×0.1
               ⎕SIZE 'X'
         144
               X←1000↑0⍴0.1
               ⎕SIZE 'X'
         144
               X←0×1000⍴0.1
               ⎕SIZE 'X'
         144

  8. How Do You Create a Vector of Integer Zeros or a Vector of Floating Point Zeros?

               F64_0←1⊃11 645 ⎕DR 1000⍴0        ⍝ Floating pt # 0
               ⎕SIZE 'F64_0'
         8016
               B32_999←1⊃163 323 ⎕DR 1000⍴999   ⍝ Binary-32 # 999
               ⎕SIZE 'B32_999'
         4032
               B16_2←1⊃83 163 ⎕DR 1000⍴2        ⍝ Binary-16 # 2
               ⎕SIZE 'B16_2'
         2032
               B8_0←1⊃11 83 ⎕DR 1000⍴0          ⍝ Binary-8 # 0
               ⎕SIZE 'B8_0'
         1016

  9. Declaring Numbers Using a Defined Function to Preserve Numeric Type

               F64←64 DCL 1000⍴0     ⍝ Floating pt # 0
               ⎕SIZE 'F64'
         8016
               I32←32 DCL 1000⍴999   ⍝ Binary-32 # 999
               ⎕SIZE 'I32'
         4032
               I16←16 DCL 1000⍴2     ⍝ Binary-16 # 2
               ⎕SIZE 'I16'
         2032
               I8←8 DCL 1000⍴0       ⍝ Binary-8 # 0
               ⎕SIZE 'I8'
         1016

  10. The DCL (Declare) Function

          Z←X DCL Y;D;R
          ⍝ Declare a floating point or integer array so that each
          ⍝ item occupies the number of bits requested by the X argument
          ⍝ X: # of bits that each number in the array will occupy
          ⍝     8 for 8-bit (1-byte) integer (¯128 to 127)
          ⍝    16 for 16-bit (2-byte) integer (¯32768 to 32767)
          ⍝    32 for 32-bit (4-byte) integer (¯2147483648 to 2147483647)
          ⍝    64 for 64-bit (8-byte) floating point #
          ⍝ Y: Numeric array declared
          ⍝ Z: Numeric array that occupies the space you requested

          D←⎕DR Y                ⍝ Current data type of Y
          :Select ⍬⍴X
          :Case 8  ⋄ R←83
          :Case 16 ⋄ R←163
          :Case 32 ⋄ R←323
          :Case 64 ⋄ R←645
          :Else    ⋄ ∘           ⍝ Stop if requested data type not supported
          :EndSelect
          →(D>R)↑'∘'             ⍝ Stop if numeric overflow
          Z←1⊃(D,R)⎕DR Y         ⍝ Convert to requested data type

  11. Initialization as Intended

      For more accurate initialization:

          (500 10⍴' ')⎕FAPPEND TIE          ⍝ Component 1
          (64 DCL 500 4⍴0)⎕FAPPEND TIE      ⍝ Component 2
          (500 20⍴' ')⎕FAPPEND TIE          ⍝ Component 3
          (32 DCL 500 5⍴0)⎕FAPPEND TIE      ⍝ Component 4
          (500 15⍴' ')⎕FAPPEND TIE          ⍝ Component 5

  12. Changing the Floating Point 0

                Z1000←64 DCL 1000⍴0   ⍝ 1,000 Floating pt 0
                ⎕SIZE 'Z1000'
          8016
                Z2000←2000↑Z1000      ⍝ 2,000 Floating pt 0
                ⎕SIZE 'Z2000'
          268
                Z2000←64 DCL 2000⍴0   ⍝ 2,000 Floating pt 0
                ⎕SIZE 'Z2000'
          16016

  13. Precaution

      Do not change a declared array and then re-use it. If you need a similar array with different dimensions, declare the new one from scratch.

      Reason: the internal representation of the result R←X ⎕DR Y is guaranteed to remain unmodified until it is re-assigned (or partially re-assigned) with the result of any function (ref: Dyalog APL Reference Manual, Chapter 6).

  14. Storing Numbers in a Native File

  15. Storing Numbers as Characters

      Blanks and commas are the most frequently used separators for numbers stored in a text file. The Index Generator is also frequently used.

          N1←'40001 40002'
          N2←'40001,40002'
          N3←'40000+⍳2'

      The character strings are executed to retrieve the numbers:

          :For I :In ⍳10000
              X←⍎N1    ⍝ Elapsed time = 72 ms
              Y←⍎N2    ⍝ Elapsed time = 89 ms
              Z←⍎N3    ⍝ Elapsed time = 94 ms
          :EndFor
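
As a quick check of the three storage formats above, the following sketch confirms that all three strings execute to the same numbers. It assumes index origin 1 (⎕IO←1), which the N3 string implies; the comments describe how each string is evaluated.

```apl
⎕IO←1                  ⍝ Assumption: index origin 1, as N3 implies
N1←'40001 40002'
N2←'40001,40002'
N3←'40000+⍳2'
X←⍎N1                  ⍝ Read as a single numeric vector constant
Y←⍎N2                  ⍝ Two constants joined by the catenate function
Z←⍎N3                  ⍝ Index generator, then addition
(X≡Y)∧(Y≡Z)            ⍝ 1: all three yield 40001 40002
```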

  16. Storing 1,000 Numbers as Characters

          N←4000+(1500⍴1 1 0)/⍳1500
          N1←⍕N                  ⍝ 4001 4002 4004 4005 ... space separated
          N2←N1
          ((N2=' ')/N2)←','      ⍝ 4001,4002,4004,4005,... comma separated
          N3←¯1↓,'(',(⍕⍪¯1+(1000⍴1 0)/N),500 5⍴'+⍳2),'
                                 ⍝ (4000+⍳2),(4003+⍳2),... comma separated, index generated

          :For I :In ⍳100
              X←⍎N1    ⍝ Run time  96 ms
              Y←⍎N2    ⍝ Run time 661 ms
              Z←⍎N3    ⍝ Run time 504 ms
          :EndFor

  17. Space Wasted by Trailing Blanks

      [Diagram: a character matrix with 2 records.] Record 1 can be compressed a little by the Index Generator so that record 2 has fewer trailing blanks. In a nested vector, however, record 2 naturally has no trailing blanks.

  18. File I/O Optimization Suggestions

      • Use the DCL function to declare the arrays that initialize the numeric components of a component file. Otherwise the numeric components are under-allocated in size and the component file becomes fragmented too quickly.
      • To store purely numeric data in a native file, do not use commas to separate the numbers, even though the CSV format is very popular, because APL executes each comma as a primitive function.

  19. Outer Product

  20. Replacing Outer Product by Indexing

                ⎕WA
          2656824552
                X←1≠+/D∘.=D←⍳33000
          LIMIT ERROR

                ⎕WA
          270924                            ⍝ 10,000 times smaller WS
                X←D∊((⍳⍴D)≠D⍳D)/D←⍳33000    ⍝ No LIMIT ERROR

                Y←⍳32000
                :For I :In ⍳5
                    L←1≠+/Y∘.=Y             ⍝ 21724 ms
                    M←Y∊((⍳⍴Y)≠Y⍳Y)/Y       ⍝ 20 ms, 1,000 times faster
                :EndFor
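
Both expressions on this slide flag the items that occur more than once. A minimal sketch on a small vector, chosen here for illustration and not taken from the slides, shows that they agree:

```apl
D←3 1 4 1 5 9 2 6 5 3       ⍝ Small illustrative sample (not from the slides)
A←1≠+/D∘.=D                 ⍝ Outer product: builds a (⍴D) by (⍴D) Boolean matrix
B←D∊((⍳⍴D)≠D⍳D)/D           ⍝ Indexing: keeps only the repeated items, then membership
A                           ⍝ 1 1 0 1 1 0 0 0 1 1
A≡B                         ⍝ 1
```

The outer product needs space proportional to the square of the vector length, while the indexing form works on vectors no longer than D.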

  21. Replacing Outer Product by Simple Logic

                M←100000↑50000⍴⍳13
                ⎕WA
          1397828
                L←1≠×/×M∘.-1 12
          WS FULL

                ⎕WA
          37832                         ⍝ 40 times smaller WS
                L←(M≥1)^M≤12            ⍝ No WS FULL

                M←100000↑50000⍴⍳13
                :For I :In ⍳1000
                    L←1≠×/×M∘.-1 12     ⍝ 9210 ms
                    N←(M≥1)^M≤12        ⍝ 813 ms, 10 times faster
                :EndFor
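
Both expressions on this slide test whether each item of M lies between 1 and 12 inclusive. A minimal sketch on a five-item sample, chosen here for illustration, covers the boundary cases:

```apl
M←0 1 5 12 13               ⍝ Small illustrative sample (not from the slides)
L←1≠×/×M∘.-1 12             ⍝ Outer product: builds a (⍴M) by 2 matrix of differences first
N←(M≥1)∧M≤12                ⍝ Simple logic: two comparisons and an AND
L                           ⍝ 0 1 1 1 0
L≡N                         ⍝ 1
```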

  22. Replacing Outer Product by a Loop

                A←32800?32800
                B←20000+32800?32800

                ⎕WA
          2047735492
                X←+/((⍳⍴A)∘.≥⍳⍴A)^A∘.<B
          LIMIT ERROR

                ⎕WA
          405316                        ⍝ 5,000 times smaller workspace
                X←⍬
                :For I :In ⍳⍴B
                    X,←+/A[I]<I↑B
                :EndFor                 ⍝ No LIMIT ERROR

                :For J :In ⍳10
                    X←+/((⍳⍴A)∘.≥⍳⍴A)^A∘.<B    ⍝ 75810 ms
                    Y←⍬
                    :For I :In ⍳⍴B
                        Y,←+/A[I]<I↑B
                    :EndFor                     ⍝ 26422 ms, 3 times faster
                :EndFor
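
For each position I, both versions on this slide count how many of the first I items of B are greater than A[I]. The sketch below checks that on two short vectors chosen for illustration. It assumes index origin 1, and it wraps the loop in a defined function with the hypothetical name LOOPCOUNT, because control structures normally have to sit inside a function.

```apl
⎕IO←1                          ⍝ Assumption: I↑B relies on index origin 1

∇ Y←A LOOPCOUNT B;I            ⍝ Hypothetical name, not from the slides
  Y←⍬
  :For I :In ⍳⍴B
      Y,←+/A[I]<I↑B            ⍝ Compare A[I] with the first I items of B only
  :EndFor
∇

A←3 1 4 1 5                    ⍝ Small illustrative samples (not from the slides)
B←2 7 1 8 2
X←+/((⍳⍴A)∘.≥⍳⍴A)∧A∘.<B        ⍝ Outer products: two (⍴A) by (⍴B) matrices
Y←A LOOPCOUNT B
X                              ⍝ 0 2 1 3 2
X≡Y                            ⍝ 1
```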

  23. Inner Product

  24. Matrix on the (Wrong) Side of the Expression Requiring a Matrix Transpose

          ⍝ Transpose needed ("one less pair of parentheses")
          'ABC'^.=⍉((1↑⍴D),3)↑D

          ⍝ Transpose not needed
          (((1↑⍴D),3)↑D)^.='ABC'

  25. Transposed Inner Product: VECTOR^.=⍉MATRIX vs MATRIX^.=VECTOR

          Y←10000 6⍴⎕A
          :For I :In ⍳10000
              L←'EFGHIJ'^.=⍉Y    ⍝ 14561 ms
              M←Y^.='EFGHIJ'     ⍝ 2302 ms
          :EndFor
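
The two forms return the same Boolean vector, one item per row of the matrix, so the transpose can be dropped whenever the matrix can go on the left. A minimal sketch on a 3 by 3 matrix chosen for illustration:

```apl
Y←3 3⍴'ABCABDABC'          ⍝ Small illustrative matrix (not from the slides)
L←'ABC'∧.=⍉Y               ⍝ Vector on the left: the matrix must be transposed first
M←Y∧.='ABC'                ⍝ Matrix on the left: no transpose
M                          ⍝ 1 0 1
L≡M                        ⍝ 1
```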

  26. Array Comparisons

  27. Comparing Array Contents with a Scalar

          M←1000 1000⍴⎕AV

          ^/M^.=' '
      or
          ^/^/M=' '
      or
          M≡(⍴M)⍴' '

  28. Character Comparison Efficiency

          M←1000 1000⍴⎕AV
          :For I :In ⍳10000
              {}^/M^.=' '     ⍝ 9108 ms
              {}^/^/M=' '     ⍝ 9060 ms
              {}M≡(⍴M)⍴' '    ⍝ 587 ms
          :EndFor

  29. Numeric Comparison Efficiency

          M←1000 1000⍴⍳10000
          :For I :In ⍳10000
              {}^/M^.=0       ⍝ 12254 ms
              {}^/^/M=0       ⍝ 12201 ms
              {}M≡(⍴M)⍴0      ⍝ 52 ms
          :EndFor
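
The three tests timed on the last three slides give the same answer; they differ only in how much work they do to reach it. A minimal sketch on a small matrix, with the illustrative names R1, R2, R3, and S that are not from the slides:

```apl
M←3 4⍴' '                  ⍝ Small all-blank matrix (illustrative, not from the slides)
R1←∧/M∧.=' '               ⍝ Inner product per row, then AND across rows
R2←∧/∧/M=' '               ⍝ Item-by-item comparison, then two AND reductions
R3←M≡(⍴M)⍴' '              ⍝ Match against an all-blank array of the same shape
R1,R2,R3                   ⍝ 1 1 1
M[2;3]←'X'                 ⍝ Change one item
S←(∧/M∧.=' '),(∧/∧/M=' '),M≡(⍴M)⍴' '
S                          ⍝ 0 0 0
```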

  30. Comparing Vectors

          A←10000?10000
          B←10000?10000

          C←A^.=B
          C←A≡B

          :For I :In ⍳10000
              {}A^.=B    ⍝ 1244 ms
              {}A≡B      ⍝ 135 ms
          :EndFor

  31. Comparing Vectors of Unequal Lengths

                A←10000?10000
                B←9999?9999
                C←A^.=B
          LENGTH ERROR
                C←A^.=B
                   ^

  32. Comparing Vectors of Unequal Lengths

      To avoid LENGTH ERROR:

          L←(⍴A)⌈⍴B
          C←(L↑A)^.=L↑B
      or
          :If C←(⍴A)=⍴B
          :AndIf C←A^.=B
          :EndIf
      or
          C←A≡B
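
The alternatives above do not always agree. Take (↑) pads the shorter vector with zeros, so a numeric vector that ends in zeros can look equal to a shorter one, whereas match also compares the shapes. A small sketch with vectors chosen for illustration; the names C1 and C2 are not from the slides:

```apl
A←1 2 3 0                  ⍝ Illustrative vectors of unequal length (not from the slides)
B←1 2 3
L←(⍴A)⌈⍴B
C1←(L↑A)∧.=L↑B             ⍝ 1: padding B with a zero makes it look equal to A
C2←A≡B                     ⍝ 0: match also compares the shapes
```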

  33. Checking the Return Code of a Function

      Nowadays, many functions return a 2-item nested vector in which one item contains the result and the other contains the return code. But many functions are still written so that the result can be either the data or the return code. For example, if a function returns ¯1 to signal an error, we need to be very careful with the use of the membership function (∊):

          →(¯1∊DATA←FUNCTION_1)/ERR

  34. Example of Function Return Code

      A popular IBM APL utility function for reading text files is called ∆FM (File Matrix I/O). When ∆FM encounters an error while reading a text file, it returns an error code of 28 instead of the data. Thus many programmers write their text file I/O code in the following way:

          →(28∊DATA←∆FM 'file.csv')/ERR

  35. Example of Return Code Inefficiency

                Y←∆FM 'file.csv'
                ⎕SIZE 'Y'
          9979076
                ⍴Y
          72312 138
                :For I :In ⍳1000
                    {}28∊Y    ⍝ 3208521 ms
                    {}28≡Y    ⍝ 4 ms
                :EndFor
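
The two tests can be compared on small stand-in values. In this sketch DATA, ERR, and NUM are illustrative names that are not from the slides, and no real file is read. Besides the speed difference, membership can also report an error when numeric data merely contains the error code.

```apl
DATA←3 5⍴'ABCDE'           ⍝ Stand-in for file data (illustrative, not from the slides)
ERR←28                     ⍝ Stand-in for the error code returned instead of data
NUM←27 28 29               ⍝ Stand-in for numeric data that happens to contain 28
(28∊DATA),(28≡DATA)        ⍝ 0 0: neither test reports an error for the data
(28∊ERR),(28≡ERR)          ⍝ 1 1: both tests detect the error code
(28∊NUM),(28≡NUM)          ⍝ 1 0: only membership mistakes the data for an error
```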

  36. CPU Optimization Suggestions

      • When an elegant outer product generates a sparse matrix that causes LIMIT ERROR, WS FULL, or a computational slowdown, replace the outer product with a simpler, less elegant expression. Example of code elegance: 1≠×/×M∘.-1 12 vs (M≥1)^M≤12
      • Try to avoid an unnecessary transpose of a matrix when you perform an inner product of a matrix with a vector.
      • Remember that in some cases the match function can run much faster than the inner product or the membership function.

  37. The End. Eugene Ying, Fiserv, Inc.
