AI & Parallelism

AI & Parallelism By: Bryan Griffiths

Topics • Parallel AI in Academics • Parallel AI in the Gaming Industry

Parallel AI in Academics • Areas of use • Academic Stigma • Extending Specific Algorithms (GA) • Benefits for other Algorithms

Areas of Use • There have been significant advances in parallel and distributed computing. • But what are the implications of these advances for AI? • Four areas: • Psychological modeling • Improving efficiency • Helping to organize systems in a modular fashion • New methods and mechanisms

Psychological Modeling • The production system was originally proposed as a model of human information processing and storage. • The ideas of short-term and long-term memory, independently operating productions, matching, and other operations came from the psychological literature. • The human brain contains individual neurons that are slow compared to digital computer circuits, but there are vast numbers of them, and they are richly connected components that operate concurrently. • SOAR is a production system with a dual mission: • Architecture for building AI systems • Model of human intelligence • SOAR incorporates both sequential and parallel aspects.

Improving Efficiency • AI programs consume significant space and time resources when working with complex problems. • It is therefore important that AI algorithms make use of advances in parallel computation to speed up research. • There are several sources of parallelism for speedup in production systems: • Production level parallelism, in which all the productions match themselves against working memory in parallel. • Condition level parallelism, in which all of the conditions of a single production are matched in parallel. • Action level parallelism, in which all of the actions of a single production are executed in parallel. • Task level parallelism, in which several cycles are executed simultaneously.
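As a minimal sketch of production level parallelism, the example below matches every production against working memory concurrently and collects the ones that fire. The facts, the production names, and the helpers `matches` and `match_all` are hypothetical, and the thread pool only illustrates the structure; a real production system would use a dedicated match algorithm and true parallel hardware.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical working memory: a set of currently known facts.
working_memory = {"fever", "cough", "fatigue"}

# Hypothetical productions: name -> conditions that must all hold.
productions = {
    "suspect_flu": {"fever", "cough"},
    "suspect_allergy": {"sneezing", "itchy_eyes"},
    "suggest_rest": {"fatigue"},
}

def matches(item):
    """Return the production name if all its conditions are in working memory."""
    name, conditions = item
    return name if conditions <= working_memory else None

def match_all(productions):
    """Match every production against working memory concurrently."""
    with ThreadPoolExecutor() as pool:
        results = pool.map(matches, productions.items())
    return sorted(name for name in results if name is not None)

conflict_set = match_all(productions)
print(conflict_set)
```

Each match only reads working memory, so the matches need no coordination beyond collecting the results.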

Parallelizing AI Algorithms • The amount of task level parallelism available is completely dependent on the nature of the task. • In medical diagnosis, each production firing might be dependent on the previous production firing, forming a long, sequential chain of reasoning. • If the system is diagnosing five patients simultaneously, productions involving different patients would not interact with one another and could be executed in parallel. • Embarrassingly parallel, but extremely useful.
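The five-patient case can be sketched as follows: reasoning for one patient stays a sequential chain, while separate patients run concurrently because they share no facts. The rules, the symptoms, and the `diagnose` helper are made up for illustration and are not a real diagnostic system.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical rules: if all conditions hold, add the conclusion.
RULES = [
    ({"fever", "cough"}, "flu_suspected"),
    ({"flu_suspected"}, "order_flu_test"),
    ({"rash"}, "allergy_suspected"),
]

def diagnose(facts):
    """Forward-chain sequentially until no rule adds a new fact."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in RULES:
            if conditions <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts

# Five independent patients (hypothetical symptom sets).
patients = [
    {"fever", "cough"},
    {"rash"},
    {"cough"},
    {"fever"},
    {"rash", "fever", "cough"},
]

# One task per patient; tasks never touch each other's facts.
with ThreadPoolExecutor(max_workers=5) as pool:
    diagnoses = list(pool.map(diagnose, patients))
```

The second rule depends on the first one firing, so the chain inside `diagnose` cannot be parallelized, but the five calls can.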

Parallelizing AI Algorithms • Examples: • Ten authors can write a book much faster than one author. • Ten women cannot bear a child any faster than one can! • Likewise, throwing more processors at an AI problem may not bring the desired benefits. • Many problems can be solved efficiently by parallel methods. • It is not always easy to convert a sequential algorithm into an efficient parallel one. • Some AI algorithms whose parallel aspects have been studied are: • Genetic Algorithms • Most types of searches (BFS, DFS, IDA*) • Alpha-beta pruning • Theorem proving • Neural Nets
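One common way to quantify why extra processors may not help is Amdahl's law, which the slides do not name but which fits the authors-versus-childbirth examples: only the parallelizable fraction of the work gets faster. The function name and the sample fractions below are illustrative assumptions.

```python
def amdahl_speedup(parallel_fraction, processors):
    """Ideal speedup when only `parallel_fraction` of the work can be split."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)

# Fully parallel work (the ten authors): speedup grows with processor count.
book = amdahl_speedup(1.0, 10)

# Fully sequential work (the childbirth example): no speedup at all.
child = amdahl_speedup(0.0, 10)

# A task that is 90% parallel gets far less than 10x on ten processors.
mixed = amdahl_speedup(0.9, 10)
print(book, child, round(mixed, 2))
```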

Academic Stigma • Currently most AI research does not use parallel implementations of the AI algorithms. Why is that? • Parallel algorithms need parallel machines, which of course means more money is needed to do the research. • There is also the way other academics view the effectiveness of an algorithm: • If it's parallel it must be slow. • If it's parallel it must be a weaker algorithm.

Academic Stigma • If it's parallel it must be slow? • You obviously parallelized it because your implementation was slow. • If it's parallel it must be a weaker algorithm? • You must have parallelized it because you were not getting “good” solutions and needed more CPUs traversing the search space to achieve your answers.

Extending Specific Algorithms • Genetic Algorithms: • A search technique used to compute solutions to optimization and search problems. • Categorized as global search heuristics. • A class of evolutionary algorithms that use techniques inspired by evolutionary biology such as inheritance, mutation, selection, and recombination.

Genetic Algorithms • Implemented as a computer simulation in which a population of abstract representations of candidate solutions to an optimization problem evolves toward better solutions. • Applications in biogenetics, computer science, engineering, economics, chemistry, manufacturing, mathematics, physics and other fields.

Parallel Genetic Algorithms • Parallel implementations of genetic algorithms come in various flavours: • Coarse-grained parallel genetic algorithms assume a population on each of the computer nodes and migration of individuals among the nodes. • Types: Island Model, Migration Models. • Fine-grained parallel genetic algorithms assume an individual on each processor node, which interacts with neighboring individuals for selection and reproduction. Also known as cellular PGAs.

Parallel Genetic Algorithms • Single-population master-slave GAs distribute the evaluation of individuals by scheduling fractions of the population among the slave processing nodes. Such a model is easy to implement and does not alter the search behavior of a sequential GA. • Hierarchical parallel GAs (HPGAs) are any combination of two or more of the three basic forms of PGA.
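A minimal sketch of the single-population master-slave scheme, assuming a toy OneMax fitness (count of 1 bits) and hypothetical helpers `evaluate_population` and `next_generation`: the master farms out fitness evaluation to a pool of workers and keeps selection, crossover, and mutation to itself. Threads stand in for the slave nodes here; a real setup would use separate processes or cluster nodes.

```python
import random
from concurrent.futures import ThreadPoolExecutor

def fitness(chromosome):
    """Toy objective (an assumption): number of 1 bits in the chromosome."""
    return sum(chromosome)

def evaluate_population(population, workers=4):
    """Master hands individuals to workers; scores come back in order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fitness, population))

def next_generation(population, scores, rng, mutation_rate=0.02):
    """Sequential part: tournament selection, one-point crossover, mutation."""
    def pick():
        a, b = rng.sample(range(len(population)), 2)
        return population[a] if scores[a] >= scores[b] else population[b]

    children = []
    while len(children) < len(population):
        p1, p2 = pick(), pick()
        cut = rng.randrange(1, len(p1))
        child = p1[:cut] + p2[cut:]
        child = [bit ^ 1 if rng.random() < mutation_rate else bit
                 for bit in child]
        children.append(child)
    return children

rng = random.Random(0)
population = [[rng.randint(0, 1) for _ in range(32)] for _ in range(40)]
for _ in range(30):
    scores = evaluate_population(population)  # the only parallel step
    population = next_generation(population, scores, rng)
best = max(evaluate_population(population))
print(best)
```

Because only the evaluation step is distributed, the scores are the same ones a sequential loop would compute, which is why this model leaves the search behavior unchanged.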

PGA - Migration • Migration can be done in various ways and can be a factor or function of your cluster design. • Ring Method • Grid Method

PGA - Migration • There are also various ways of choosing which chromosome will migrate from one subpopulation to the next. • Random • Best/Bad
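The ring layout combined with a best-replaces-worst policy can be sketched as below. The function `migrate_ring`, the toy bit-counting fitness, and the three tiny islands are assumptions made for illustration; the islands are simulated in one process, whereas in a cluster each would live on its own node.

```python
def fitness(chromosome):
    """Toy objective (an assumption): number of 1 bits."""
    return sum(chromosome)

def migrate_ring(islands, n_migrants=1):
    """Send each island's best to the next island, replacing its worst."""
    # Pick all emigrants first so migration happens simultaneously.
    emigrants = [sorted(isl, key=fitness, reverse=True)[:n_migrants]
                 for isl in islands]
    new_islands = []
    for i, isl in enumerate(islands):
        # Island i receives from its predecessor in the ring.
        incoming = emigrants[(i - 1) % len(islands)]
        # Keep everyone except the n_migrants worst individuals.
        survivors = sorted(isl, key=fitness, reverse=True)
        survivors = survivors[:len(isl) - n_migrants]
        new_islands.append(survivors + [list(c) for c in incoming])
    return new_islands

islands = [
    [[1, 1, 1], [0, 0, 0]],
    [[1, 0, 0], [0, 0, 0]],
    [[1, 1, 0], [0, 1, 0]],
]
after = migrate_ring(islands)
print(after)
```

Swapping the `sorted` selection for a random choice would give the random migration policy instead.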

PGA - Migration • Another approach could be to implement a central database to which subpopulations submit their best chromosome, and from which they then draw the best chromosome to replace their worst (King of the Hill Migration). • This spreads your top chromosome across your entire set of subpopulations faster, exposing your elite to new genetic material in the hope of advancing it even further.
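A minimal sketch of King of the Hill migration, with an in-memory object standing in for the central database. The class `HillDatabase`, its methods, and the toy fitness are hypothetical; a real deployment would need a networked store and locking around concurrent submissions.

```python
def fitness(chromosome):
    """Toy objective (an assumption): number of 1 bits."""
    return sum(chromosome)

class HillDatabase:
    """Hypothetical central store holding the best chromosome seen so far."""

    def __init__(self):
        self.king = None

    def submit(self, chromosome):
        # Only a strictly better chromosome dethrones the current king.
        if self.king is None or fitness(chromosome) > fitness(self.king):
            self.king = list(chromosome)

    def draw(self):
        return list(self.king)

def king_of_the_hill(islands, db):
    """Every island submits its best, then replaces its worst with the king."""
    for isl in islands:
        db.submit(max(isl, key=fitness))
    for isl in islands:
        worst = min(range(len(isl)), key=lambda i: fitness(isl[i]))
        isl[worst] = db.draw()

islands = [
    [[1, 1, 1], [0, 0, 0]],
    [[1, 0, 0], [0, 0, 0]],
    [[1, 1, 0], [0, 1, 0]],
]
db = HillDatabase()
king_of_the_hill(islands, db)
print(islands)
```

After a single round every island holds a copy of the global best, compared with one hop per round in a ring.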

PGA - Migration • Advantages: • Easy to implement, even in existing GA code. • Low network traffic. • Customizable to your architecture. • Faster than a sequential GA and generally achieves a better result (unless the optimal solution is easy to obtain).

PGA - Migration • Disadvantages: • Supposedly harder to work with because you now have three new variables to fine-tune: • Migration rate • Number of chromosomes to migrate • Migration direction and layout

Grid Enabled - HPGA

Benefits for other Algorithms (Greedy Algorithms) • Large search space. • A heuristic that tells which current alternative looks best. • Take that alternative first. • Potentially neglect other paths. • In parallel: • Take first N best looking alternatives. • Reducing the neglect.
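The idea of taking the first N best-looking alternatives can be sketched on a small reward graph: plain greedy commits to the single best first move, while the parallel variant runs one greedy search per top-N first move and keeps the best outcome. The graph, rewards, and function names are made up for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical search space: node -> list of (child, reward) edges.
GRAPH = {
    "root": [("a", 5), ("b", 4), ("c", 1)],
    "a": [("a1", 1)],
    "b": [("b1", 9)],
    "c": [("c1", 2)],
    "a1": [], "b1": [], "c1": [],
}

def greedy_path(node, total=0):
    """Follow the best-looking edge until a leaf; return (reward, path)."""
    path = [node]
    while GRAPH[node]:
        node, reward = max(GRAPH[node], key=lambda edge: edge[1])
        total += reward
        path.append(node)
    return total, path

def parallel_greedy(root, n):
    """Run one greedy search from each of the n best first alternatives."""
    first_moves = sorted(GRAPH[root], key=lambda e: e[1], reverse=True)[:n]
    with ThreadPoolExecutor(max_workers=n) as pool:
        futures = [pool.submit(greedy_path, child, reward)
                   for child, reward in first_moves]
        results = [f.result() for f in futures]
    total, path = max(results)
    return total, [root] + path

print(greedy_path("root"))         # single greedy search
print(parallel_greedy("root", 2))  # two searches, best one wins
```

Here the sequential greedy search is lured by the reward of 5 and misses the better path through `b`, which the second worker finds.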

Benefits for other Algorithms (Parameter sweeping) • Find the best solution for a given N-dimensional function by trying all input values. • Given a function f(x,y,z,…) • Find its max, min, highest differential, etc. • Tries all x, y, z,… • Mostly embarrassingly parallel. • The algorithms are extremely stupid: they are plain brute force. • Use other AI/search methods!
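A parameter sweep can be sketched as a brute-force evaluation of every grid point, with the points spread over a worker pool. The function `f`, the grid, and the `sweep` helper are assumptions for illustration; each evaluation is independent, which is what makes the sweep embarrassingly parallel.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

def f(point):
    """Hypothetical objective with its maximum at (1, -2, 0)."""
    x, y, z = point
    return -(x - 1) ** 2 - (y + 2) ** 2 - z ** 2

def sweep(axes, workers=4):
    """Evaluate f at every grid point in parallel; return the best point."""
    points = list(product(*axes))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        values = list(pool.map(f, points))
    best_value, best_point = max(zip(values, points))
    return best_point, best_value

axis = range(-3, 4)  # integer values -3..3 on each axis
best_point, best_value = sweep([axis, axis, axis])
print(best_point, best_value)
```

The number of points grows as the axis length raised to the number of dimensions, which is why the slide recommends smarter search methods for anything large.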

Benefits for other Algorithms (Simulated Annealing / Tabu Search) • Every value (or group of values) can be handled by a CPU and searched along its own path. • When a CPU becomes free it could request a new value (or group) from another CPU and continue on from that point.

Benefits for other Algorithms (Neural Networks) • Large neural networks can have huge computational needs. • Training a neural network with many input data sets can be time consuming. • Some problems require real time behaviour.

Benefits for other Algorithms (Neural Networks) • Perform learning in parallel • Embarrassingly parallel approaches: • Simply start N neural networks in parallel with different (pseudo) random initial weights and then select one from the final trained networks. • Evaluate input using multiple different networks in parallel and choose answer from network with highest level of confidence • In this approach messages only need to be sent for: • Sending input data to processors • Retrieving output from all processors
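The first embarrassingly parallel approach can be sketched with a deliberately tiny "network": a single sigmoid neuron trained on logical AND from several random starting weights, after which the copy with the lowest error is kept. The data set, hyperparameters, and function names are illustrative assumptions, and a real network would use a numerical library.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor

# Toy training set (an assumption): logical AND.
DATA = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

def predict(weights, x):
    """Single sigmoid neuron: weights = [bias, w1, w2]."""
    s = weights[0] + weights[1] * x[0] + weights[2] * x[1]
    return 1.0 / (1.0 + math.exp(-s))

def loss(weights):
    """Sum of squared errors over the training set."""
    return sum((predict(weights, x) - t) ** 2 for x, t in DATA)

def train(seed, epochs=2000, lr=0.5):
    """Train one neuron by gradient descent from seed-dependent weights."""
    rng = random.Random(seed)
    w = [rng.uniform(-1, 1) for _ in range(3)]
    for _ in range(epochs):
        for x, t in DATA:
            y = predict(w, x)
            grad = (y - t) * y * (1 - y)  # d(error)/d(pre-activation), up to a constant
            w[0] -= lr * grad
            w[1] -= lr * grad * x[0]
            w[2] -= lr * grad * x[1]
    return loss(w), w

def train_many(seeds):
    """Train one network per seed in parallel and keep the best one."""
    with ThreadPoolExecutor(max_workers=len(seeds)) as pool:
        results = list(pool.map(train, seeds))
    return min(results, key=lambda r: r[0])

best_loss, best_weights = train_many([0, 1, 2, 3])
```

The only communication is sending each worker its seed and collecting the trained weights, matching the two message types listed above.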

Benefits for other Algorithms (Neural Networks) • Split a forward-propagating network into columns. • Split the network into strongly connected clusters.

Benefits for other Algorithms (Breadth First Search) • Parallel Implementation: • Maintain shared queues for all processors • One queue per depth • Each processor gets/adds vertices from queue • Barrier until all vertices from one depth visited • Note: search order is potentially different than sequential search order
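A level-synchronous version of the parallel BFS above might look like the sketch below. It is a simplification under stated assumptions: workers only fetch neighbour lists for the current depth, and the master merges the results into the next queue itself, so there is no shared-queue locking to get wrong. The graph and the function name are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_bfs(graph, root, workers=4):
    """Return the BFS depth of every vertex reachable from root."""
    depth = {root: 0}
    frontier = [root]  # the queue for the current depth
    level = 0
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while frontier:
            # Workers expand the vertices of the current depth concurrently.
            neighbour_lists = pool.map(graph.__getitem__, frontier)
            # Consuming all results acts as the barrier between depths.
            next_frontier = []
            for neighbours in neighbour_lists:
                for w in neighbours:
                    if w not in depth:
                        depth[w] = level + 1
                        next_frontier.append(w)
            frontier = next_frontier
            level += 1
    return depth

# Hypothetical graph as adjacency lists.
graph = {0: [1, 2], 1: [3], 2: [3], 3: [4], 4: []}
depths = parallel_bfs(graph, 0)
print(depths)
```

Depths match a sequential BFS even though the order in which vertices inside one level are expanded may differ.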

Benefits for other Algorithms (Depth First Search) • Sequential_DFS(root) • Visited(root) = true; • For all neighbour vertices w of root • If not visited(w) • Dfs_parent(w) = root • Sequential_DFS(w) • Parallel_DFS(root) • Visited(root) = true; • ParFor all neighbour vertices w of root • If not visited(w) • Dfs_parent(w) = root • Parallel_DFS(w) • Note: search order is potentially different than sequential search order

Benefits for other Algorithms (IDA*) • Sequential: • For I = 0 to max_depth • DFS_search_to_depth(I) • Parallel: • For I = 0 to max_depth • Parallel BFS_search_to_depth(I)

Parallel AI in the Gaming Industry • Industry Stigmas • Areas of Use • A Look at Past and Current Generations of Hardware • Some Examples Of AI Engines • Some Ideas of Mine • What the Future Holds

Industry Stigma • True AI uses too many resources that we need instead for our new “mega-super cool flashy graphics that are uber-realistic” engine and that other stuff… • This is generally true, especially once you understand that the “stuff” outside the graphics engine includes physics engines, audio engines, voice and/or video chat, and everything else that goes into the game. • In most companies AI gets a choke-hold put on it very early in development, reducing it to simple finite state machines and scripted or trigger-based events because “well it works and that’s all we need”. • Games that feature “revolutionary” AI are sometimes as simple as a system in which characters now randomly utter battle chatter or scream in agony when they die.

Areas of Use for Parallel AI • Everywhere in this industry: • Graphics, animation, face effects, etc. • Pathfinding and path smoothing. • Individual and group NPC behaviours. • Dynamic music and sound systems. • Competitive and co-operative strategic play. • Arcade games that have large search spaces. • Chess, Go, etc.

Limitations of Past Generations • Single Processor • Single Thread • Limited Resources • Small amount of RAM • Small amount of ROM

Current Generation of Hardware • PS3: • Cell Processor: • 1 Power PE: • A 64-bit general purpose register set. • A 64-bit floating point register set. • A 128-bit AltiVec register set. • 8 Synergistic PEs: • Able to do SIMD computations. • Or scalar data types ranging from 8 to 128 bits in size.

Current Generation of Hardware • Xbox 360: • Triple-core PowerPC-based design. • Each of the cores has two symmetric hardware threads. • Multiple FPU and SIMD vector processing units in each core. • VMX128 (similar to AltiVec, just with more registers)

Current Generation of Hardware • Computers: • Intel CPUs – Dual, Quad and Eight-core versions are in production or in the pipeline. • GPU Technologies becoming even more parallel in nature: • Multi-processor cards • SLI • Crossfire

So how did they use this power? • UE3… multi-threaded, barely. • But that’s not all bad. It means we could do more advanced AI elements in a game if we separate the AI from the engine and parallelize the AI middleware engine.

AI Middleware Engines • AI.implant, from BioGraphic Technologies in Montreal, Canada. • Focuses on animation control, offering AI solutions for the complex animations a developer might need. • Hierarchical pathfinding • Rule-based decisions • Group behaviors • Flocking behaviors • We could further parallelize this by computing behaviours separately for the various groups or characters that are currently running around.

AI Middleware Engines • Kynapse Engine, Kynogon. • A.I. code reusable independently of engine code • Advanced 3D topology dynamic analysis • Runtime identification of key topological places for hiding, surrounding, organizing opposite flank assault, etc. • Path planning • Path smoothing • 3D pathfinding in a destructible world: http://www.kynogon.com/images-blog/Demos/GDC/destructibleworld.avi

What They Could be Doing… • Take the internet, a simple hosted server, storage drives, a learning algorithm, an AI implementation of algorithm “X” and a little bit of big brother, now blend… • If we can’t do the parallel heavy weight AI on the user’s system then outsource the work.

What They Could be Doing… • NPC AI for FPS games that could be updated over time, keeping the game fresh and challenging. • Just use a GA offsite on your company server that analyses the fitness of the AI when playing against different players, then recombines the chromosomes and sends out the new AI to be tested and have its fitness evaluated. • Planning AI for RTS games that constantly evolves along with players as they both discover new strategies. • Even after 10 different “balancing patches” have been applied to a game like Starcraft, the AI would not be outdated if you played it in a single player game or included an AI player in a multiplayer game.

What They Could be Doing… • Do you enjoy different kinds of music besides the rock, techno and orchestral music used in most games? • No problem, submit your favorite playlist of music and have the game adapt it to the gameplay using a Neural Net that replaces battle music with your favorite Kung-fu Fighter remix or cranky truck driver-gone mad country track, etc.

Where this Could Lead in the Future • LucasArts' next Indiana Jones and Star Wars video games contain some interesting AI improvements for character animations and physics reactions: • http://www.youtube.com/watch?v=jKLfD5M_I6o&eurl=http%3A%2F%2Fwww%2Egame%2Dreviews%2Eca%2Fnews%5F1116%2Ehtm • http://media.ps3.ign.com/media/823/823668/vids_1.html

Where this Could Lead in the Future • CryEngine 2: • Multi-threaded engine, which improves many aspects of the game such as AI and physics by speeding up CPU computations. • One huge advantage of CryEngine 2 is that it will detect the number of threads the CPU(s) have and will then distribute code equally across all of the threads.

Where this Could Lead in the Future • At a glance, many AI problems can be solved with embarrassingly parallel approaches. • Pathing • Behaviours • Learning • etc…
