
MULTI ROBOT LEARNING WITH PARTICLE SWARM OPTIMIZATION (PSO)

Presentation Transcript


  1. MULTI ROBOT LEARNING WITH PARTICLE SWARM OPTIMIZATION (PSO) Adham Atyabi Supervisor: Dr. Somnuk Phon-Amnuaisuk Co-Supervisor: Mr. Ho Chin Kuan January 2008

  2. Motivation • To reduce the cost of solving complex tasks through the cooperation of simple robots • To introduce a novel PSO method, called AEPSO, with higher adaptability to real-world robotic domains • To implement a method with a high capability for knowledge sharing in complex robotic domains

  3. Problem Statement • The environment is 500×500 pixels • Robots, bombs and obstacles are randomly located in the environment • Robots should find and disarm the bombs within a certain time • Robots should avoid obstacles • The robots have limited knowledge about the bombs' locations (they only know the likelihood of bombs in each area) • The likelihood information is uncertain (because of noise and shadow effects) • A minimal setup sketch follows below
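As a concrete illustration of this setup, here is a minimal Python sketch of the arena described above. The deck fixes only the arena size and (on slide 39) the number of robots; the bomb and obstacle counts below are assumptions for illustration.

```python
import random

# Minimal sketch of the simulated arena: a 500x500-pixel world with
# robots, bombs, and obstacles scattered uniformly at random.
WIDTH = HEIGHT = 500

def random_positions(n):
    """Place n entities uniformly at random in the arena."""
    return [(random.uniform(0, WIDTH), random.uniform(0, HEIGHT)) for _ in range(n)]

robots = random_positions(5)      # the experiments use 5 robots (slide 39)
bombs = random_positions(10)      # assumed count
obstacles = random_positions(20)  # assumed count
```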

  4. Research Objectives • To identify, design and evaluate strategies for implementing a new Particle Swarm Optimization (PSO) for multi-robot learning • To evaluate the effectiveness of the novel PSO in static, dynamic and real-time scenarios under noise constraints • To evaluate the effectiveness of the novel PSO in a multi-agent cooperative learning scenario with homogeneous and heterogeneous robots • To address the uncertainty in the robots'/agents' perception in cooperative learning scenarios

  5. Publications • "Particle Swarm Optimization with Area Extension (AEPSO)", IEEE Congress on Evolutionary Computation (CEC 2007), Singapore, accepted 15 July 2007. • "Effects of Communication Range, Noise and Help Request Signal on Particle Swarm Optimization with Area Extension (AEPSO)", IEEE/WIC/ACM International Conference on Intelligent Agent Technology (IAT 2007), Stanford University, USA, 25-28 September 2007. • "Particle Swarm Optimizations: A Critical Review", Third Conference of Information and Knowledge Technology (IKT07), Ferdowsi University, Iran, submitted 21 July 2007. • "Effectiveness of a Cooperative Learning Version of AEPSO in Homogeneous and Heterogeneous Multi Robot Learning Scenarios", IEEE World Congress on Computational Intelligence (WCCI 2008), Hong Kong, submitted 22 December 2007. • "Applying Area Extension PSO in Robotic Swarm", journal paper, Evolutionary Computation (MIT Press), submitted 10 December 2007.

  6. Particle Swarm Optimization (PSO) • PSO is an Evolutionary Algorithm inspired by animal social behaviors (Ribeiro and Schlansker, 2005; Chang et al., 2004; Pugh and Martinoli, 2006; Sousa et al., 2003; Nomura, 2007) • PSO has outperformed other Evolutionary Algorithms such as GA (Vesterstrom and Riget, 2002; Ratnaweera et al., 2004; Pasupuleti and Battiti, 2006) • In PSO, X represents a candidate solution (position) and V represents the velocity vector

  7. PSO's Parameters and Equations • Velocity: V_i,j(t+1) = w × V_i,j(t) + C1 × r1,j × (p_i,j(t) − x_i,j(t)) + C2 × r2,j × (g_i,j(t) − x_i,j(t)) • Position: X_i,j(t+1) = X_i,j(t) + V_i,j(t+1) • p is the personal best and g the global best (Riget and Vesterstrom, 2002a; Stacey et al., 2003). A minimal implementation sketch follows below.
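A minimal Python sketch of these update rules; the parameter values (w, C1, C2) are illustrative assumptions, not the tuned settings from the thesis.

```python
import random

W, C1, C2 = 0.7, 1.5, 1.5  # inertia and acceleration coefficients (assumed values)

def pso_step(x, v, pbest, gbest):
    """One PSO iteration for a single particle i across all dimensions j."""
    for j in range(len(x)):
        r1, r2 = random.random(), random.random()
        # V_ij(t+1) = w*V_ij(t) + C1*r1*(p_ij(t) - x_ij(t)) + C2*r2*(g_j(t) - x_ij(t))
        v[j] = W * v[j] + C1 * r1 * (pbest[j] - x[j]) + C2 * r2 * (gbest[j] - x[j])
        # X_ij(t+1) = X_ij(t) + V_ij(t+1)
        x[j] += v[j]
    return x, v
```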

  8. Neighborhood Topology • Ring (Lbest) • Star (Gbest) • Wheel • Von Neumann • Cluster • Pyramid [Figure: various neighborhood topologies (Kennedy and Mendes, 2002; Zavala, Aguirre and Diharce, 2005).] A sketch of the two most common topologies follows below.
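To make the difference concrete, here is a small sketch of how the two most common topologies select the best position a particle follows; it assumes a minimizing fitness list indexed like the positions.

```python
def ring_best(i, fitness, positions):
    """Lbest (Ring): best among particle i and its two ring neighbours."""
    n = len(fitness)
    neighbourhood = [(i - 1) % n, i, (i + 1) % n]
    return positions[min(neighbourhood, key=lambda k: fitness[k])]

def star_best(fitness, positions):
    """Gbest (Star): best of the entire swarm; every particle sees everyone."""
    return positions[min(range(len(fitness)), key=lambda k: fitness[k])]
```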

  9. Literature on PSO • Single-objective domains • Improvements on neighborhood topology • Improvements on the velocity equation • Improvements on the parameters and how they are controlled • Improvements on global best and personal best (Vesterstrom and Riget, 2002; Pasupuleti and Battiti, 2006; Ratnaweera et al., 2004; Peram et al., 2003; Parsopoulos and Vrahatis, 2002)

  10. Literature on PSO • Multi-objective domains: • Niching PSO • Mutation-based and similar techniques • Parallelism • Re-initialization • Clearing memory (personal and global best) • Using sub-swarms (Brits, Engelbrecht, and Van Den Bergh, 2002, 2003; Yoshida et al., 2001; Stacey, Jancic and Grundy, 2003; Chang et al., 2005; Vesterstrom and Riget, 2002; Qin et al., 2004)

  11. PSO's Weaknesses • Premature convergence • Parameter control • Fitness function • Diversity (exploration vs. exploitation) • Dynamic domains • Real-time domains (Angeline, 1998; Chang et al., 2003; Ratnaweera et al., 2004; Pasupuleti and Battiti, 2006; Brits, Engelbrecht, and Van Den Bergh, 2002, 2003; Yoshida et al., 2001; Stacey, Jancic and Grundy, 2003; Chang et al., 2005; Vesterstrom and Riget, 2002)

  12. Multi-Agent and Robotic Swarm Literature • The number of robots used in the literature ranges from 20 to 300 (Lee et al., 2005; Hettiarachchi, 2006; Werfel et al., 2005; Chang et al., 2005; Ahmadabadi et al., 2001; Mondada et al., 2004). • Robots may use richer knowledge (e.g., robots know the locations of goals and of their teammates) (Luke et al., 2005; Ahmadabadi et al., 2001; Yamaguchi et al., 1997; Martinson and Arkin, 2003). • It is common to train robots individually (Ahmadabadi et al., 2001; Yamaguchi et al., 1997; Hayas et al., 1994).

  13. Area Extended version of PSO (AEPSO) • To handle dynamic velocity: • A new velocity heuristic that addresses premature convergence (Vesterstrom and Riget, 2002; Ratnaweera et al., 2004; Chang et al., 2004; Pasupuleti and Battiti, 2006). • To handle direction and fitness criteria: • A Credit Assignment heuristic that addresses the cul-de-sac problem (Hettiarachchi, 2006; Ahmadabadi et al., 2001). • A Hot Zone/Area heuristic. • To handle cooperation: • Varying communication-range conditions, which provide dynamic neighborhoods and sub-swarms (Vesterstrom and Riget, 2002; Brits et al., 2002; Kennedy and Mendes, 2002). • A Help Request Signal, which provides cooperation between different sub-swarms (Chang et al., 2005).

  14. Area Extended version of PSO (AEPSO) • To handle search diversity: • A Boundary Condition heuristic that addresses the lack of diversity in basic PSO (Vesterstrom and Riget, 2002; Ratnaweera et al., 2004; Chang et al., 2004). • To handle the lack of reliable perception (Pugh and Martinoli, 2006; Bogatyreva and Shillerov, 2005): • A Leave Force heuristic, which provides a high level of noise resistance. • A Guess mechanism, which likewise provides a high level of noise resistance.

  15. Dynamic Velocity

  16. Hot Zone/Area Heuristic • The idea is to divide the environment into fixed virtual sub-areas with various credits • An area's credit reflects the proportion of goals and obstacles located in that area • Particles know the credits of the first and second layers of their current neighborhood (see the sketch below)
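A minimal sketch of the area/credit bookkeeping, assuming a 5×5 grid of equal virtual areas and a simple goal-minus-obstacle credit; both the grid size and the weighting are assumptions for illustration.

```python
GRID = 5            # assumed number of virtual areas per axis (5x5 grid)
CELL = 500 // GRID  # area side length in pixels

def area_of(pos):
    """Map a pixel position (x, y) to its virtual area's (row, col) index."""
    x, y = pos
    return min(int(y // CELL), GRID - 1), min(int(x // CELL), GRID - 1)

def area_credits(bombs, obstacles):
    """Credit per area from the goals and obstacles it contains (assumed form)."""
    credit = [[0.0] * GRID for _ in range(GRID)]
    for b in bombs:
        r, c = area_of(b)
        credit[r][c] += 1.0   # goals raise an area's credit
    for o in obstacles:
        r, c = area_of(o)
        credit[r][c] -= 0.5   # obstacles lower it (assumed weighting)
    return credit
```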

  17. Communication Methodology and Help Request Signal • Robots can only communicate with robots inside their communication range • Various communication ranges were used (500, 250, 187, 125, 5 pixels) • This heuristic has a major effect on sub-swarm size • A help request signal can propagate through a chain of connections (sketched below)
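A sketch of range-limited communication and the resulting help-request chain. Robot records are assumed to be dicts with a "pos" field, which is an illustrative data layout, not the thesis's.

```python
import math

def neighbours(robot, others, comm_range):
    """Robots directly reachable; smaller ranges yield smaller sub-swarms."""
    return [o for o in others
            if o is not robot and math.dist(robot["pos"], o["pos"]) <= comm_range]

def relay_help(origin, robots, comm_range):
    """Flood a help request hop by hop, forming a chain of connections."""
    reached, frontier = {id(origin)}, [origin]
    while frontier:
        current = frontier.pop()
        for n in neighbours(current, robots, comm_range):
            if id(n) not in reached:
                reached.add(id(n))
                frontier.append(n)
    return reached  # ids of all robots the signal can reach
```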

  18. Credit Assignment Heuristic • Reward and punishment • Suspend factor

  19. Boundary Condition • In AEPSO, robots receive a suspend punishment each time they cross a boundary line • This condition lets robots escape areas in which they are stuck, and is as effective as re-initializing the robots' states in the environment (see the sketch below)
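A minimal sketch of the boundary condition; the suspend length and the robot record layout are assumptions for illustration.

```python
ARENA = 500         # arena side length in pixels
SUSPEND_ITERS = 10  # assumed length of the suspend punishment

def enforce_boundary(robot):
    """Clamp a robot that crossed an edge and suspend it for a few iterations."""
    x, y = robot["pos"]
    if not (0 <= x < ARENA and 0 <= y < ARENA):
        robot["pos"] = (min(max(x, 0), ARENA - 1), min(max(y, 0), ARENA - 1))
        robot["suspended"] = SUSPEND_ITERS  # sitting out acts like re-initialization
```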

  20. Guess Mechanism and Leave Force Heuristics • The Guess mechanism uses an extra memory in each robot, called a Mask • Masks can take values from: • The shadow effect • The robot's own observations • The robot's own guesses • Neighbors' observations • Neighbors' guesses • Leave Force is an extra punishment that forces robots to reduce their current area's credit by 10% after a certain number of iterations

  21. Scenarios • Static scenario • Dynamic scenario • Real-time scenario • Cooperative learning scenarios: • Homogeneous • Heterogeneous

  22. Usage of Heuristics in various scenarios

  23. Empirical parameter setups in various scenarios

  24. Static and Dynamic Domains • In contrast to the static scenario, in the dynamic domain bombs are able to run away • Bomb velocity is set to 2 pixels/iteration, and robot velocity is a value in [1, 3] pixels/iteration • Bombs' explosion time is set to 20,000 iterations (the maximum iteration count)

  25. Static and Dynamic Scenarios - Results

  26. Real-Time Scenario • Bombs' explosion times are random values in [3,000, 20,000] iterations • Robots should locate bombs before they reach their explosion times • Simple noise is assumed in the environment (a simple +/- perturbation of areas' credits)

  27. Real-Time Scenario - Results

  28. Real-Time Scenario - Results

  29. Various PSOs' results on static environments

  30. AEPSO vs. Random Search and Linear Search

  31. Cooperative Learning Scenarios • A higher level of noise (Shadow) is assumed • The scenarios have two phases: • Training • Testing • A higher level of cooperation is needed

  32. Shadow Effect (Noise) • The Shadow idea is inspired by real-world perception errors: corrupted data caused by communication failures (e.g., in satellite data) or by weaknesses in sensing elements (sensors) • The Shadow effect forces over 50% noise into the environment (see the sketch below)
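A minimal sketch of how such a corruption could be modeled. The exact noise model is not given in the deck, so the 50% probability and the uniform perturbation below are assumptions.

```python
import random

NOISE_PROB = 0.5  # assumed chance that a credit reading is shadowed

def perceived_credit(true_credit):
    """Return the credit a robot perceives for an area, possibly corrupted."""
    if random.random() < NOISE_PROB:
        return true_credit + random.uniform(-1.0, 1.0)  # shadowed (corrupted) reading
    return true_credit
```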

  33. Homogeneous Cooperative Learning • The robots have limited knowledge about the bombs' locations (they only know the likelihood of bombs in each area) • The likelihood information is uncertain (because of noise and shadow effects) • Robots should find the true credit of each area and first solve the areas that most affect the others • Robots can draw on their training knowledge to solve the task faster • Robots should give priority to the areas with the highest effect on others

  34. Heterogeneous Cooperative Learning • There are various types of bombs and robots • Each robot can only disarm a specific type of bomb • Robot and bomb types are set randomly • Robots use a more accurate version of the Help Request Signal • Three scenarios are assumed: • Homogeneous robots (S1) • Without guarantee (S2) • With guarantee (S3)

  35. Past Knowledge in Homogeneous scenario

  36. Homogeneous Cooperative Learning - Results

  37. Past Knowledge in Heterogeneous scenario

  38. Heterogeneous Cooperative Learning - Results

  39. Conclusion and Future Work • In this study, we introduced AEPSO as a modified version of basic PSO and investigated its effectiveness in static, dynamic, real-time, multi-dimensional, and multi-objective problem domains • It is worth mentioning that the small number of particles (only 5 robots) gave AEPSO a great advantage by reducing costs • Robots were able to solve highly complex problems using limited knowledge (training knowledge) together with a high level of cooperation and experience sharing • Future work will test Q-learning and compare it with AEPSO in the heterogeneous scenario

  40. References Ahmadabadi, M. N., Asadpour, M., and Nakano, E. (2001). Cooperative Q-learning: the knowledge sharing issue. Advanced Robotics. Angeline, P. J. (1998). Evolutionary optimization versus particle swarm optimization: philosophy and performance differences. Evolutionary Programming VII, Lecture Notes in Computer Science, Springer. Atyabi, A. and Phon-Amnuaisuk, S. (2007). Particle swarm optimization with area extension (AEPSO). IEEE Congress on Evolutionary Computation (CEC 2007). Atyabi, A., Phon-Amnuaisuk, S., and Ho, C. K. (2007a). Effectiveness of a cooperative learning version of AEPSO in homogeneous and heterogeneous multi robot learning scenarios. Springer. Atyabi, A., Phon-Amnuaisuk, S., and Ho, C. K. (2007b). Effects of communication range, noise and help request signal on particle swarm optimization with area extension (AEPSO). IEEE/WIC/ACM International Conference on Intelligent Agent Technology (IAT 2007).

  41. References Bakker, P. and Kuniyoshi, Y. (1996). Robot see, robot do: an overview of robot imitation. AISB Workshop on Learning in Robots and Animals. Beslon, G., Biennier, F., and Hirsbrunner, B. (1998). Multi-robot path-planning based on implicit cooperation in a robotic swarm. Proceedings of the Second International Conference on Autonomous Agents. Bogatyreva, O. and Shillerov, A. (2005). Robot swarms in an uncertain world: controllable adaptability. International Journal of Advanced Robotic Systems. Brits, R., Engelbrecht, A. P., and Van Den Bergh, F. (2002). A niching particle swarm optimizer. 4th Asia-Pacific Conference on Simulated Evolution and Learning (SEAL 2002). Brits, R., Engelbrecht, A. P., and Van Den Bergh, F. (2003). Scalability of niche PSO. Swarm Intelligence Symposium (SIS '03). Chang, B. C. H., Ratnaweera, A., Halgamuge, S. K., and Watson, H. C. (2004). Particle swarm optimization for protein motif discovery. Journal of Genetic Programming and Evolvable Machines.

  42. References Chang, K., Hwang, J., Lee, E., and Kazadi, S. (2005). The application of swarm engineering technique to robust multi-chain robot systems. IEEE Conference on Systems, Man, and Cybernetics, USA. Chuanwen, J. and Jiaotong, S. (2005). A hybrid method of chaotic particle swarm optimization and linear interior for reactive power optimisation. Mathematics and Computers in Simulation. Clerc, M. (2002). The particle swarm - explosion, stability, and convergence in a multidimensional complex space. IEEE Transactions on Evolutionary Computation. Dowling, J. and Cahill, V. (2004). Self-managed decentralized systems using k-components and collaborative reinforcement learning. 1st ACM SIGSOFT Workshop on Self-Managed Systems. Grosan, C., Abraham, A., and Chis, M. (2006). Swarm intelligence in data mining. Studies in Computational Intelligence (SCI), Springer. Hayas, G. and Demiris, J. (1994). A robot controller using learning by imitation. 2nd International Symposium on Intelligent Robotic Systems.

  43. References Hettiarachchi, S. (2006). Distributed online evolution for swarm robotics. Autonomous Agents and Multi Agent Systems. Kennedy, J. and Mendes, R. (2002). Population structure and particle swarm performance. Proceedings of the 2002 Congress on Evolutionary Computation (CEC '02). Krink, T., Vesterstrom, J. S., and Riget, J. (2002). Particle swarm optimization with spatial particle extension. Congress on Evolutionary Computation (CEC), 2002 IEEE World Congress on Computational Intelligence. Lee, C., Kim, M., and Kazadi, S. (2005). Robot clustering. 2005 IEEE International Conference on Systems, Man and Cybernetics. Luke, S., Sullivan, K., Balan, G. C., and Panait, L. (2005). Tunably decentralized algorithms for cooperative target observation. Fourth International Joint Conference on Autonomous Agents and Multi-Agent Systems (AAMAS 2005).

  44. References Martinson, E. and Arkin, R. C. (2003). Learning to role-switch in multi-robot systems. IEEE International Conference on Robotics and Automation (ICRA '03). Mondada, F., Pettinaro, G., Guignard, A., Kwee, I., Floreano, D., Deneubourg, J., Nolfi, S., Gambardella, L., and Dorigo, M. (2004). Swarm-bot: a new distributed robotic concept. Autonomous Robots. Nomura, Y. (2007). An integrated fuzzy control system for structural vibration. Computer-Aided Civil and Infrastructure Engineering 22. Parsopoulos, K. E. and Vrahatis, M. N. (2002). Particle swarm optimization method in multiobjective problems. 2002 ACM Symposium on Applied Computing. Pasupuleti, S. and Battiti, R. (2006). The gregarious particle swarm optimizer (G-PSO). 8th Annual Conference on Genetic and Evolutionary Computation. Peram, T., Veeramachaneni, K., and Mohan, C. K. (2003). Fitness-distance-ratio based particle swarm optimization (FDR-PSO). Swarm Intelligence Symposium (SIS '03). Pugh, J. and Martinoli, A. (2006). Multi-robot learning with particle swarm optimization. AAMAS '06.

  45. References Pugh, J. and Zhang, Y. (2005). Particle swarm optimization for unsupervised robotic learning. Proceedings of the 2005 IEEE Swarm Intelligence Symposium (SIS 2005). Qin, Y., Sun, D., Li, N., and Cen, Y. (2004). Path planning for mobile robot using the particle swarm optimization with mutation operator. Proceedings of the Third International Conference on Machine Learning and Cybernetics. Ratnaweera, A., Halgamuge, S. K., and Watson, H. C. (2004). Self-organizing hierarchical particle swarm optimizer with time-varying acceleration coefficients. IEEE Transactions on Evolutionary Computation. Reggia, J. and Rodriguez, A. (2004). Extending self-organizing particle systems to problem solving. Artificial Life, MIT Press. Ribeiro, P. F. and Schlansker, W. K. (2005). A particle swarm optimized fuzzy neural network for voice controlled robot systems. IEEE Transactions on Industrial Electronics. Riget, J. and Vesterstrom, J. S. (2002a). Controlling diversity in particle swarm optimization. Congress on Evolutionary Computation (CEC), 2002 IEEE World Congress on Computational Intelligence.

  46. References Riget, J. and Vesterstrom, J. S. (2002b). Division of labor in particle swarm optimization. Congress on Evolutionary Computation (CEC), 2002 IEEE World Congress on Computational Intelligence. Sousa, T., Neves, A., and Silva, A. (2003). Swarm optimization as a new tool for data mining. Proceedings of the International Parallel and Distributed Processing Symposium (IPDPS '03). Stacey, A., Jancic, M., and Grundy, L. (2003). Particle swarm optimization with mutation. IEEE Congress on Evolutionary Computation (CEC '03). Tan, M. (1993). Multi-agent reinforcement learning: independent vs. cooperative agents. 10th International Conference on Machine Learning. Tangamchit, P., Dolan, J. M., and Khosla, P. K. (2003). Crucial factors affecting cooperative multirobot learning. Conference on Intelligent Robots and Systems. Vesterstrom, J. and Riget, J. (2002). Particle swarms: extensions for improved local, multi-modal, and dynamic search in numerical optimization. Master's thesis, Dept. of Computer Science, University of Aarhus, Aarhus C, Denmark.

  47. References Werfel, J., Bar-Yam, Y., and Nagpal, R. (2005). Building patterned structures with robot swarms. Nineteenth International Joint Conference on Artificial Intelligence (IJCAI '05). Yamaguchi, T., Tanaka, Y., and Yachida, M. (1997). Speed up reinforcement learning between two agents with adaptive mimetism. IEEE/RSJ Intelligent Robots and Systems. Yang, C. and Simon, D. (2005). A new particle swarm optimization technique. 18th International Conference on Systems Engineering (ICSEng 2005). Zavala, A. E. M., Aguirre, A. H., and Diharce, E. R. V. (2005). Constrained optimization via Particle Evolutionary Swarm Optimization algorithm (PESO). Proceedings of the 2005 Conference on Genetic and Evolutionary Computation. Zhang, W. and Xie, X. (2003). DEPSO: hybrid particle swarm with differential evolution operator. IEEE Systems, Man and Cybernetics. Zhao, Y. and Zheng, J. (2004). Particle swarm optimization algorithm in signal detection and blind extraction. IEEE Parallel Architectures, Algorithms and Networks.

  48. Thanks
