
KIPA Game Engine Seminars

KIPA Game Engine Seminars, Day 15. Jonathan Blow, Seoul, Korea, December 12, 2002. Bit Tricks: Generating Bit Masks; Is some number a power of two?; Avoiding ‘if’ statements (branch prediction); Floating-point absolute value; Floating-point compare; Floating-point log2.

Presentation Transcript


  1. KIPA Game Engine Seminars Day 15 Jonathan Blow Seoul, Korea December 12, 2002

  2. Bit Tricks • Generating Bit Masks • Is some number a power of two? • Avoiding ‘if’ statements (branch prediction) • Floating-point absolute value • Floating-point compare • Floating-point log2

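  3. Generating Bit Masks • Suppose we want to mask the low n bits of a machine word • We can generate that with a loop that sets bits 0 through n-1 • Summation equation for the loop: mask = sum of 2^i for i = 0 .. n-1 • The identity sum of 2^i for i = 0 .. n-1 = 2^n - 1 lets us compute the mask directly as (1 << n) - 1, with no loop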
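A minimal C++ sketch of both approaches on this slide, assuming 32-bit unsigned words; the function names `low_mask_loop` and `low_mask` are hypothetical. The loop ORs in one bit per iteration, and the closed form applies the identity that the sum of 2^i for i = 0 .. n-1 equals 2^n - 1.

```cpp
#include <cstdint>

// Loop version: OR in one bit per iteration (sum of 2^i for i = 0..n-1).
uint32_t low_mask_loop(unsigned n) {
    uint32_t mask = 0;
    for (unsigned i = 0; i < n; ++i) {
        mask |= (uint32_t)1 << i;
    }
    return mask;
}

// Closed form: 2^n - 1, i.e. (1 << n) - 1.
// Shifting a 32-bit value by 32 is undefined in C and C++,
// so n >= 32 is special-cased to "all bits set".
uint32_t low_mask(unsigned n) {
    return (n >= 32) ? 0xFFFFFFFFu : (((uint32_t)1 << n) - 1u);
}
```

The special case matters in practice: `(1 << 32) - 1` cannot be written directly for a full-width mask.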

  4. Is some number a power of two? • A power of two is a single set bit somewhere in the word • The power of two minus one is a bit mask like the ones we just looked at (every bit below that one is set) • ANDing them together, x & (x - 1), produces 0 • Zero also produces 0 under this test, so it must be excluded separately
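The power-of-two test as a one-line C++ sketch (the function name is hypothetical). The explicit zero check is needed because 0 & (0 - 1) is also 0.

```cpp
#include <cstdint>

// A power of two has exactly one set bit. Subtracting 1 clears that bit and
// sets every bit below it, so x & (x - 1) == 0. Zero would also pass the AND
// test, so it is rejected first (short-circuit also avoids the 0 - 1 wrap).
bool is_power_of_two(uint32_t x) {
    return x != 0 && (x & (x - 1)) == 0;
}
```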

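  5. Counting the number of set bits in a machine word • Slow loop version (test every bit) • “Trick” O(number of set bits) version: x & (x - 1) clears the lowest set bit • Discussion of tree version (sum neighboring bit fields in parallel)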
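Sketches of the three counting strategies for a 32-bit word, with hypothetical names. The tree version here is the common parallel (SWAR) formulation, assumed to match the one discussed: each line adds neighboring bit fields, doubling the field width until a single 32-bit sum remains.

```cpp
#include <cstdint>

// Slow loop: examine all 32 bits.
int popcount_loop(uint32_t x) {
    int count = 0;
    for (int i = 0; i < 32; ++i) {
        count += (int)((x >> i) & 1u);
    }
    return count;
}

// O(number of set bits): x & (x - 1) clears the lowest set bit each pass.
int popcount_sparse(uint32_t x) {
    int count = 0;
    while (x) {
        x &= x - 1;
        ++count;
    }
    return count;
}

// Tree version: add adjacent fields of width 1, 2, 4, 8, then 16 bits.
int popcount_tree(uint32_t x) {
    x = (x & 0x55555555u) + ((x >> 1) & 0x55555555u);  // 16 two-bit sums
    x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u);  // 8 four-bit sums
    x = (x & 0x0F0F0F0Fu) + ((x >> 4) & 0x0F0F0F0Fu);  // 4 byte sums
    x = (x & 0x00FF00FFu) + ((x >> 8) & 0x00FF00FFu);  // 2 sixteen-bit sums
    x = (x & 0x0000FFFFu) + (x >> 16);                 // final sum
    return (int)x;
}
```

Modern toolchains also offer `std::popcount` (C++20) and compiler builtins that map to a single instruction on CPUs that have one.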

  6. Pentium 4 “fireball” • A 16-bit integer unit at the core of the chip that runs at very high clock speeds • 32-bit integer operations are pipelined through the fireball as multi-stage 16-bit operations • Pipeline is organized for bits to flow from bottom to top of the word (as with addition and subtraction) • Right-shifts require a dependency that goes in the opposite direction (slower!)

  7. “How many bits does it take to store this range of values?” • Application: network or file I/O • Want ceil(log2(n_max + 1)) bits, assuming the values go from 0 to n_max (e.g. n_max = 8 needs 4 bits, not ceil(log2(8)) = 3) • Slow floating-point versions • Fast bit-extraction versions
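A sketch of both kinds of solution, with hypothetical names. The number of bits needed for values 0 through n_max is ceil(log2(n_max + 1)), which equals the index of the highest set bit plus one (and 0 for n_max = 0). The integer version finds that bit with a binary search over halves of the word instead of calling `log2`.

```cpp
#include <cmath>
#include <cstdint>

// Slow floating-point version: ceil(log2(n_max + 1)).
// Relies on the library log2 and on rounding behaving at powers of two.
int bits_needed_float(uint32_t n_max) {
    return (int)std::ceil(std::log2((double)n_max + 1.0));
}

// Bit-extraction version: binary search for the highest set bit.
int bits_needed(uint32_t n) {
    int bits = 0;
    if (n >= (1u << 16)) { n >>= 16; bits += 16; }
    if (n >= (1u << 8))  { n >>= 8;  bits += 8; }
    if (n >= (1u << 4))  { n >>= 4;  bits += 4; }
    if (n >= (1u << 2))  { n >>= 2;  bits += 2; }
    if (n >= (1u << 1))  { n >>= 1;  bits += 1; }
    return bits + (int)n;  // n is now 0 or 1
}
```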

  8. Floating-Point log2 • Show slow version • Fast version utilizing the IEEE-754 format: the exponent field already holds floor(log2(x)) plus a bias of 127
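A sketch of the IEEE-754 approach for 32-bit floats, assuming the input is positive, finite, and normalized (no zeros, denormals, infinities, or NaNs); the names are hypothetical. The exponent field (bits 23 to 30) stores floor(log2(x)) + 127, so extracting it gives the integer part directly. Treating the 23-bit mantissa as a linear fraction adds a rough fractional part, since log2(1 + m) is close to m for m in [0, 1). `memcpy` is used for the type pun because reading a `float` through an `int*` violates strict-aliasing rules.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Slow version: library call.
float log2_slow(float x) { return std::log2(x); }

// Integer part: the biased exponent minus 127.
int floor_log2_fast(float x) {
    uint32_t bits;
    std::memcpy(&bits, &x, sizeof bits);  // well-defined type pun
    return (int)((bits >> 23) & 0xFFu) - 127;
}

// Rough log2: exponent plus the mantissa read as a fraction in [0, 1).
float log2_approx(float x) {
    uint32_t bits;
    std::memcpy(&bits, &x, sizeof bits);
    float mantissa = (float)(bits & 0x7FFFFFu) / 8388608.0f;  // divide by 2^23
    return (float)floor_log2_fast(x) + mantissa;
}
```

The linear approximation is exact at powers of two and off by at most about 0.086 between them.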

  9. Fast absolute value • Utilizing the IEEE-754 floating-point format: clear the sign bit
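A sketch of the sign-bit trick for 32-bit IEEE-754 floats (the name `fast_fabs` is hypothetical): clearing bit 31 turns any float, including -0.0 and negative infinity, into its magnitude.

```cpp
#include <cstdint>
#include <cstring>

// Clear the IEEE-754 sign bit (bit 31); exponent and mantissa are untouched.
float fast_fabs(float x) {
    uint32_t bits;
    std::memcpy(&bits, &x, sizeof bits);
    bits &= 0x7FFFFFFFu;
    std::memcpy(&x, &bits, sizeof bits);
    return x;
}
```

Current compilers typically emit the same single AND for `std::fabs`, so a hand-written version mainly matters on older toolchains.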

  10. Fast floating-point compare • Description of how x86 machines compare floating-point numbers • Get at least one of them onto the x87 register stack • Perform ‘fcomp’ instruction • Load the floating-point status word (fnstsw) • Bit-mask it to see if the desired condition-code bit is set
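The transcript does not spell out the fast path. One well-known alternative that avoids the x87 status-word round trip, offered here as an assumption rather than as the method presented, is to compare IEEE-754 bit patterns as integers: after flipping every bit of negative values and setting the sign bit of non-negative ones, unsigned integer order matches float order. Names are hypothetical, and NaN inputs are not handled.

```cpp
#include <cstdint>
#include <cstring>

// Map a float to an unsigned key whose order matches float order
// (for non-NaN inputs).
uint32_t float_order_key(float f) {
    uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    // Negative: flip all bits so larger magnitudes sort lower.
    // Non-negative: set the sign bit so they sort above every negative.
    return (bits & 0x80000000u) ? ~bits : (bits | 0x80000000u);
}

bool float_less(float a, float b) {
    return float_order_key(a) < float_order_key(b);
}
```

One difference from a true float compare: -0.0f is ordered just below +0.0f instead of equal to it.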

  11. Decision-making without branching • (And without writing in assembly language, to use instructions like CMOV) • Build a mask based on whether some intermediate result is negative or not • Use that to mask values and add them, or whatever you want • Examples
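A C++ sketch of the mask technique, with hypothetical names. An arithmetic right shift by 31 smears the sign bit of a 32-bit difference across the whole word, giving all ones when the difference is negative and all zeros otherwise; the mask then picks one of two values with AND and OR. It assumes `x - y` does not overflow. Right-shifting a negative signed integer is implementation-defined before C++20 (arithmetic on mainstream compilers) and defined as arithmetic since C++20.

```cpp
#include <cstdint>

// All ones (-1) if v < 0, zero otherwise.
int32_t negative_mask(int32_t v) {
    return v >> 31;
}

// Returns a if x < y, else b, without a branch.
// Assumes x - y does not overflow.
int32_t select_if_less(int32_t x, int32_t y, int32_t a, int32_t b) {
    int32_t mask = negative_mask(x - y);
    return (a & mask) | (b & ~mask);
}

// Branchless min built from the same mask.
int32_t branchless_min(int32_t x, int32_t y) {
    return select_if_less(x, y, x, y);
}
```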

  12. Collision Detection • Speedbox and Schnitzel as alternatives to the “prevent tunneling” raycast

  13. Collision Detection • Don’t forget to optimize mainly for the expected case! • To miss a lot, or to hit a lot? • Example of Shock Force and the “early hit test” • We expect to miss usually! • So the early hit test was not so effective

  14. Collision detection • More Shock Force examples • Hierarchy of tests: bounding sphere, OBB, simple plane divide, BSP “hard case”

  15. Profiling • Motivation • You can’t optimize unless you profile. For some reason some people think they can… they’re wrong. • Demo of sample app • Goals: • Know where the overall CPU is being spent • May depend on which kind of behavior is happening! • Know which routines are stable and which ones are not

  16. Profiling • Example of getting the current time on Windows • At different accuracy levels • Description of how this is slow, and why • Too slow to call very often in code!

  17. Profiling (2) • Using the rdtsc instruction • Converting this to realtime units by calling QueryPerformanceCounter once per frame
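A sketch of the per-frame conversion, kept platform-neutral so it contains only the arithmetic; the `ClockSample` struct and function names are hypothetical. On Windows the `tsc` field would come from `rdtsc` (for example via the `__rdtsc` intrinsic) and the `qpc` field from `QueryPerformanceCounter`, both sampled once per frame, with `qpc_frequency` from `QueryPerformanceFrequency`. It assumes the QPC counter advanced between the two samples.

```cpp
#include <cstdint>

// One rdtsc reading and one QueryPerformanceCounter reading,
// taken at (nearly) the same moment.
struct ClockSample {
    uint64_t tsc;
    int64_t  qpc;
};

// Estimate CPU cycles per second over one frame.
double cycles_per_second(ClockSample prev, ClockSample cur, int64_t qpc_frequency) {
    double seconds = (double)(cur.qpc - prev.qpc) / (double)qpc_frequency;
    return (double)(cur.tsc - prev.tsc) / seconds;
}

// Convert a cycle count measured during the frame into seconds.
double cycles_to_seconds(uint64_t cycles, double cps) {
    return (double)cycles / cps;
}
```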

  18. Profiling (3) • Define macros that put rdtsc calls into preambles and postambles for functions • Measure and categorize CPU time this way • Measure “self time” and “hierarchical time” • Code review of macros / constructors
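A C++ sketch of preamble/postamble timing with a constructor and destructor, plus a macro that declares the scope object. All names (`ProfileScope`, `PROFILE_SCOPE`, `g_zone_stats`, `g_profile_clock`) are hypothetical, `std::chrono::steady_clock` stands in for `rdtsc`, and a single thread is assumed. Hierarchical time is the full span of a zone; self time subtracts the time spent in nested zones, which each zone reports to its parent when it closes.

```cpp
#include <chrono>
#include <cstdint>
#include <map>
#include <string>

// Stand-in for rdtsc; replaceable (e.g. with a fake clock in tests).
uint64_t default_clock() {
    return (uint64_t)std::chrono::steady_clock::now().time_since_epoch().count();
}
uint64_t (*g_profile_clock)() = default_clock;

struct ZoneStats {
    uint64_t self_time = 0;          // excludes nested zones
    uint64_t hierarchical_time = 0;  // includes nested zones
};
std::map<std::string, ZoneStats> g_zone_stats;

class ProfileScope;
ProfileScope *g_current_scope = nullptr;  // innermost open zone

class ProfileScope {
public:
    explicit ProfileScope(const char *name)
        : name_(name), parent_(g_current_scope), start_(g_profile_clock()) {
        g_current_scope = this;  // preamble
    }
    ~ProfileScope() {            // postamble
        uint64_t elapsed = g_profile_clock() - start_;
        ZoneStats &stats = g_zone_stats[name_];
        stats.hierarchical_time += elapsed;
        stats.self_time += elapsed - child_time_;
        if (parent_) parent_->child_time_ += elapsed;  // report to parent
        g_current_scope = parent_;
    }
private:
    const char *name_;
    ProfileScope *parent_;
    uint64_t start_;
    uint64_t child_time_ = 0;
};

#define PROFILE_CONCAT2(a, b) a##b
#define PROFILE_CONCAT(a, b) PROFILE_CONCAT2(a, b)
#define PROFILE_SCOPE(name) ProfileScope PROFILE_CONCAT(profile_scope_, __LINE__)(name)
```

Typical use is one line at the top of a function, e.g. `void update_ai() { PROFILE_SCOPE("update_ai"); /* ... */ }`. A recursive function would count its nested calls into its own hierarchical total more than once.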

  19. Problem with rdtsc • There’s this SpeedStep thing on Intel laptops • It changes the CPU’s clock speed based on performance / temperature demands • It does not adjust rdtsc to compensate • May spread beyond laptops in the future • Power consumption of CPUs is becoming an important concern for businesses

  20. We can detect if rdtsc is screwing up profiling data • But we can’t fix the profiling data • Solution: just draw a big warning on the screen
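A sketch of the detection check, assuming a reference rdtsc rate (cycles per QPC second) was measured at startup and a fresh rate is measured every frame; the function name and the `tolerance` parameter are hypothetical. A large disagreement means the cycle counter's rate changed, so the frame's profile should be flagged rather than trusted.

```cpp
#include <cmath>

// True if this frame's rdtsc rate differs from the reference rate by more
// than `tolerance`, a fraction of the reference (e.g. 0.02 for 2%).
bool rdtsc_rate_suspicious(double reference_cps, double frame_cps, double tolerance) {
    return std::fabs(frame_cps - reference_cps) > tolerance * reference_cps;
}
```

Later CPUs that provide an invariant TSC tick at a constant rate regardless of power-state changes, which makes this check less necessary on such hardware.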

  21. Division of Profiler • Low-Level Profiler • High-Level Profiler

  22. Walkthrough of first demo app • How it uses the macros • How it collects and draws the profiling data

  23. Measuring variance of profiling data • To figure out how stable each function is • Draw which functions are “hot” in the realtime display
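A sketch using Welford's online algorithm, a swapped-in choice since the transcript does not say how variance was computed, to keep a running mean and variance of one function's per-frame time without storing its history; `RunningStats` is hypothetical. Dividing the standard deviation by the mean gives a unitless stability measure that could be thresholded when deciding how to highlight a function in the display.

```cpp
#include <cmath>
#include <cstdint>

struct RunningStats {
    uint64_t count = 0;
    double mean = 0.0;
    double m2 = 0.0;  // running sum of squared deviations from the mean

    void add(double x) {
        ++count;
        double delta = x - mean;
        mean += delta / (double)count;
        m2 += delta * (x - mean);  // uses the updated mean
    }
    // Sample variance (n - 1 denominator).
    double variance() const { return count > 1 ? m2 / (double)(count - 1) : 0.0; }
    // Standard deviation relative to the mean.
    double relative_stddev() const {
        return mean != 0.0 ? std::sqrt(variance()) / mean : 0.0;
    }
};
```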

  24. Behaviors • We would like some better analysis of what the different behaviors are for our program • Just “eyeing” the results is not very scientific • Examples of different behaviors • Fill rate limited, AI limited, etc

  25. Batch Profiling vs Interactive Profiling • Batch profiling averages a bunch of data together over a session • Maybe it provides a way to peek at individual samples but the processing is never very convenient • Interactive profiling is about seeing results as soon as they happen • But interactive profilers are usually hacked together • What if we made a good one?

  26. Want to detect and analyze specific behaviors • But without preconceived ideas of what they might be • Treat incoming frames of profiling data as vectors, and cluster them • Description of k-means clustering
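A compact batch k-means (Lloyd's algorithm) sketch, treating each frame of profiling data as a vector of per-zone times; names are hypothetical, and the caller supplies the k initial centroids (for example, k frames chosen at random). Each iteration alternates an assignment step and an update step, and every iteration sweeps the entire input, which is why this form needs the whole batch available.

```cpp
#include <cstddef>
#include <limits>
#include <vector>

using Vec = std::vector<double>;

double squared_distance(const Vec &a, const Vec &b) {
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        double d = a[i] - b[i];
        sum += d * d;
    }
    return sum;
}

std::size_t nearest_centroid(const std::vector<Vec> &centroids, const Vec &x) {
    std::size_t best = 0;
    double best_d = std::numeric_limits<double>::max();
    for (std::size_t k = 0; k < centroids.size(); ++k) {
        double d = squared_distance(centroids[k], x);
        if (d < best_d) { best_d = d; best = k; }
    }
    return best;
}

// Refines `centroids` in place; returns each frame's cluster index
// from the last assignment step.
std::vector<std::size_t> kmeans(const std::vector<Vec> &frames,
                                std::vector<Vec> &centroids, int iterations) {
    std::vector<std::size_t> labels(frames.size(), 0);
    for (int it = 0; it < iterations; ++it) {
        // Assignment: each frame joins its nearest centroid.
        for (std::size_t i = 0; i < frames.size(); ++i)
            labels[i] = nearest_centroid(centroids, frames[i]);
        // Update: move each centroid to the mean of its frames.
        for (std::size_t k = 0; k < centroids.size(); ++k) {
            Vec sum(centroids[k].size(), 0.0);
            std::size_t n = 0;
            for (std::size_t i = 0; i < frames.size(); ++i) {
                if (labels[i] != k) continue;
                for (std::size_t d = 0; d < sum.size(); ++d) sum[d] += frames[i][d];
                ++n;
            }
            if (n == 0) continue;  // empty cluster keeps its old centroid
            for (std::size_t d = 0; d < sum.size(); ++d)
                centroids[k][d] = sum[d] / (double)n;
        }
    }
    return labels;
}
```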

  27. Clustering algorithms tend to be pretty slow • And they require batch data to process • k-means needs random access to the input! • Online k-means • Faster, non-batch. But quality?
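A sketch of online (sequential) k-means with a hypothetical `OnlineKMeans` type: each incoming frame updates only its nearest centroid, moving it 1/n of the way toward the frame, where n counts the frames that centroid has absorbed. That keeps each centroid equal to the running mean of the frames assigned to it, without revisiting old data. With this rate, the first frame assigned to a centroid replaces the initial guess outright.

```cpp
#include <cstddef>
#include <limits>
#include <utility>
#include <vector>

using Vec = std::vector<double>;

struct OnlineKMeans {
    std::vector<Vec> centroids;
    std::vector<std::size_t> counts;  // frames absorbed per centroid

    explicit OnlineKMeans(std::vector<Vec> initial)
        : centroids(std::move(initial)), counts(centroids.size(), 0) {}

    // Consume one frame; returns the index of the centroid it joined.
    std::size_t add(const Vec &x) {
        std::size_t best = 0;
        double best_d = std::numeric_limits<double>::max();
        for (std::size_t k = 0; k < centroids.size(); ++k) {
            double d = 0.0;
            for (std::size_t i = 0; i < x.size(); ++i) {
                double diff = x[i] - centroids[k][i];
                d += diff * diff;
            }
            if (d < best_d) { best_d = d; best = k; }
        }
        double rate = 1.0 / (double)(++counts[best]);
        for (std::size_t i = 0; i < x.size(); ++i)
            centroids[best][i] += rate * (x[i] - centroids[best][i]);
        return best;
    }
};
```

Because frames are assigned using the centroids as they stood at arrival time, results depend on input order, which is one source of the quality question on the slide.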

  28. Self-Organizing Map • “Kohonen Self-Organizing Map” • Description of the algorithm • Much like online k-means • But with coherence in a separate space
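A minimal Kohonen SOM sketch on a 1-D grid of nodes; the `SelfOrganizingMap` type, the Gaussian neighborhood, and the parameter names are assumptions rather than details from the talk. The step mirrors online k-means: find the best-matching node for the incoming vector and move it toward the vector. The difference is that the best-matching node's grid neighbors also move, weighted by their distance on the grid, which is what gives nearby nodes similar weight vectors.

```cpp
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

using Vec = std::vector<double>;

struct SelfOrganizingMap {
    std::vector<Vec> nodes;  // weight vectors, indexed by grid position

    std::size_t best_matching_unit(const Vec &x) const {
        std::size_t best = 0;
        double best_d = std::numeric_limits<double>::max();
        for (std::size_t j = 0; j < nodes.size(); ++j) {
            double d = 0.0;
            for (std::size_t i = 0; i < x.size(); ++i) {
                double diff = x[i] - nodes[j][i];
                d += diff * diff;
            }
            if (d < best_d) { best_d = d; best = j; }
        }
        return best;
    }

    // One online training step. learning_rate in (0, 1];
    // sigma is the neighborhood radius in grid units.
    std::size_t train(const Vec &x, double learning_rate, double sigma) {
        std::size_t bmu = best_matching_unit(x);
        for (std::size_t j = 0; j < nodes.size(); ++j) {
            double grid_dist = (double)j - (double)bmu;
            double h = std::exp(-(grid_dist * grid_dist) / (2.0 * sigma * sigma));
            for (std::size_t i = 0; i < x.size(); ++i)
                nodes[j][i] += learning_rate * h * (x[i] - nodes[j][i]);
        }
        return bmu;
    }
};
```

In practice `learning_rate` and `sigma` are usually shrunk over time so the map settles; a 2-D grid works the same way with a 2-D grid distance.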

  29. Demo of SOM-enabled Profiling Tool • Visualizations are still early • Hopefully they will mature into something truly useful (people in other visualization fields like SOMs, so hopes are high)

  30. Discussions of changes made to SOM to support online clustering
