Control of a Walking Biped Using a Combination of Simple Policies

Whitman and Atkeson

Goal of the Paper • Present a decoupled controller for a simulated three-dimensional biped. • The dynamics are broken down into multiple subsystems that are controlled separately. • The full state is mapped onto simplified states, and the control actions are mapped back onto the full system. Cognitive Robotics 2010

I. Introduction • Coordination of multiple policies: • Time until touchdown • Dynamic programming simultaneously and globally optimizes: • Foot placement • Step timing • Body motion

A. Related Work • Common paradigm for high degree-of-freedom (DoF) systems: generate a nominal trajectory and stabilize it with a feedback controller. • This only functions in a small tube within the state space. • Alternative: produce policies that are valid for a large region of the state space (Brice et al. 2006). • Library of multiple trajectories.

A. Related Work • Dynamically stable walking trajectories based on the zero moment point (ZMP). • A desired ZMP trajectory is chosen based on pre-specified footstep locations and timing. The CoM trajectory is then calculated from the desired ZMP trajectory (Kajita et al. 2006).

B. Dynamic Programming • At each state in the state space: • (1) Pick a new action, best or random. • (2) Update the value function. • Compass gait walker • Point mass on two rigid legs.
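
The two-step update above can be illustrated on a toy problem. The following is a minimal sketch, not the authors' implementation: the 1-D grid, the cost function, the discount factor, and the names `dp_sweeps`, `explore`, and `goal` are all assumptions made for illustration. At each state it tries either the stored (best-known) action or a random one, keeps whichever gives the lower cost-to-go, and updates the value function.

```python
import random


def dp_sweeps(n_states=21, actions=(-1, 0, 1), goal=10, gamma=0.95,
              sweeps=1000, explore=0.5, seed=0):
    """Toy grid DP (hypothetical problem): reach `goal` on a 1-D line."""
    rng = random.Random(seed)
    value = [0.0] * n_states      # cost-to-go estimate per state
    policy = [0] * n_states       # best-known action per state

    def backup(s, a):
        # One-step lookahead: immediate cost plus discounted value of next state.
        s_next = min(max(s + a, 0), n_states - 1)   # clamp to the grid
        cost = abs(s_next - goal) + 0.1 * abs(a)    # distance plus action penalty
        return cost + gamma * value[s_next]

    for _ in range(sweeps):
        for s in range(n_states):
            # (1) Pick a new action: the best known one, or a random one.
            a_new = rng.choice(actions) if rng.random() < explore else policy[s]
            # (2) Update the value function, keeping the cheaper action.
            q_new, q_old = backup(s, a_new), backup(s, policy[s])
            if q_new < q_old:
                policy[s], value[s] = a_new, q_new
            else:
                value[s] = q_old
    return value, policy


value, policy = dp_sweeps()
print(policy)   # actions should point toward the goal state
```

Random exploration is what lets the policy improve without enumerating every action at every state on every sweep; the trade-off is that more sweeps are needed before the stored actions settle.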

II. System • Five rigid links • Mass: 78 kg (50 kg torso, 14 kg per leg) • Leg length: 0.81 m • CoM: 1.00 m above ground • 12 DoF • 6 for the torso; 2 x 2 for the hips and 1 for each knee. • 3 pitch joints for ankle. • μ = coefficient of friction = 1.0

III. Control Architecture • High-dimensional state/action space • Reduced to lower-dimensional subsystems: • Each joint acts in the sagittal or coronal plane. • Dynamics are decoupled and controlled separately. • Left and right coupling at the same time. • Double support is ignored (1%-2% of a step), as in a compass gait.

A. Sagittal Stance Leg Policy • 7 DoF • 3 on the torso • 4 on the hips and knees. • 5-dimensional action space • Torques at the hip pitch joints, knees and ankle. • Simplified system ↔ full system

A. Sagittal Stance Leg Policy • Simplifying the system: • Origin at ankle: -2 DoF. • Ignoring the swing leg: -2 DoF. • Stance knee straight: -1 DoF. • Torso at a constant angle: -1 DoF. • Remaining: 7 - 6 = 1 DoF, a two-link inverted pendulum with the upper link at a fixed angle.

A. Sagittal Stance Leg Policy • L = 0.81 m • L2 = 0.40 m • L1x = 0.4 sin(ϕ) • L1y = 0.4 cos(ϕ) • ϕ = 0.1 rad • M = 50 kg • m = 14 kg • I = ml²/3 • τ = torque
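
As a worked check of the parameters above, the derived quantities can be evaluated numerically. This sketch rests on two assumptions not stated on the slide: that the 0.4 factor in L1x and L1y is L2, and that the "l" in the inertia formula is the leg length L (a uniform rod pivoting about its end).

```python
import math

L = 0.81     # leg length [m]
L2 = 0.40    # assumed to be the 0.4 factor in L1x and L1y [m]
phi = 0.1    # torso lean angle [rad]
M = 50.0     # torso mass [kg]
m = 14.0     # leg mass [kg]

# Offsets from the slide formulas L1x = 0.4 sin(phi), L1y = 0.4 cos(phi).
L1x = L2 * math.sin(phi)
L1y = L2 * math.cos(phi)

# Leg inertia, assuming "l" on the slide denotes the leg length L.
I = m * L ** 2 / 3.0

print(f"L1x = {L1x:.4f} m, L1y = {L1y:.4f} m, I = {I:.4f} kg m^2")
```

With a lean of only 0.1 rad, the horizontal offset is about 4 cm, so the lean shifts the torso mass forward only slightly.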

A. Sagittal Stance Leg Policy • V = actual velocity • Vdes = desired velocity (1.0 m/s) • Fx = ground reaction force. • Mapping to the full state: the 3D vector from stance foot to stance hip, projected onto the sagittal plane. • Proportional-derivative (PD) control is used. • Kp = 1500 Nm; Kd = 150 Nm-s; Kp = 1000 Nm; Kd = 150 Nm-s • PD targets: leg straight and torso at a fixed angle.
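
The PD servo mentioned above follows the standard form: torque equals Kp times the position error plus Kd times the velocity error. The sketch below is illustrative only: the function name `pd_torque` and the example joint values are hypothetical, and the slide does not say which of the two gain pairs belongs to which joint, so the first pair is used as the default purely as an example.

```python
def pd_torque(q, qd, q_des, qd_des=0.0, kp=1500.0, kd=150.0):
    """Standard PD law: torque from angle error and angular-velocity error.

    q, qd        : measured joint angle [rad] and velocity [rad/s]
    q_des, qd_des: target angle and velocity
    kp, kd       : gains (defaults taken from the first gain pair on the slide)
    """
    return kp * (q_des - q) + kd * (qd_des - qd)


# Hypothetical example: a joint bent 0.05 rad past its target, still moving away.
tau = pd_torque(q=0.05, qd=0.2, q_des=0.0)
print(tau)  # negative torque, pushing the joint back toward the target
```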

A. Sagittal Stance Leg Policy • Trajectory shapes of the simplified and full models are similar. • The full model runs at a higher frequency. • Differences in speed and touchdown stem from variation in the touchdown model. • Torso bobs forward. • Torque is applied at the hip.

B. Sagittal Swing Leg Policy • Swing leg motion is governed by the stance leg. • The stance-leg dynamics and controller are known. • Time and angle at touchdown can therefore be predicted. • Prediction error: a few ms.

B. Sagittal Swing Leg Policy • Knee bends to keep the foot 5 cm above the ground, computed by inverse kinematics. • Spline starts at the current angle and velocity. • Velocities of swing and stance leg are matched.
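
One common way to build a spline that starts from the current angle and velocity and arrives at a target angle with a prescribed velocity is a cubic polynomial with four boundary conditions. The sketch below is an assumption about the general technique, not the authors' exact spline: the function name `swing_spline` and all numeric values are hypothetical, and matching the end velocity at touchdown is an interpretation of the last bullet.

```python
def swing_spline(q0, v0, q1, v1, T):
    """Cubic q(t) on [0, T] with q(0)=q0, q'(0)=v0, q(T)=q1, q'(T)=v1."""
    dq = q1 - q0
    a2 = (3.0 * dq - (2.0 * v0 + v1) * T) / T ** 2
    a3 = (-2.0 * dq + (v0 + v1) * T) / T ** 3

    def pos(t):
        return q0 + v0 * t + a2 * t ** 2 + a3 * t ** 3

    def vel(t):
        return v0 + 2.0 * a2 * t + 3.0 * a3 * t ** 2

    return pos, vel


# Hypothetical numbers: swing hip starts at -0.3 rad moving at 1.0 rad/s and
# should reach 0.35 rad at -0.5 rad/s after a predicted 0.4 s until touchdown.
pos, vel = swing_spline(q0=-0.3, v0=1.0, q1=0.35, v1=-0.5, T=0.4)
print(pos(0.0), vel(0.0), pos(0.4), vel(0.4))
```

Because the start conditions are the measured angle and velocity, the spline can be recomputed whenever the predicted touchdown time or angle changes, without a jump in the commanded trajectory.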

C. Coronal Policy • 5 DoF • 4-dimensional action space: hip and ankle. • Simplified dynamics are nearly the same. • A third state is added: the estimated time until touchdown. • The touchdown angle is variable. • Desired velocity of zero. • Added y² term: keeps legs close to vertical.

C. Coronal Policy • Periods match because the period is a design parameter. • The touchdown impact is countered by rolling the torso back to vertical.

D. Yaw Control • Ankle twist joints are used. • The joints are servoed to zero with Kp = 500 Nm and Kd = 30 Nm-s. • Coupling is large because the shin axis is in line with the coronal-plane torques.

IV. Robustness to Perturbations • The result of a perturbation depends on its timing, location and direction. • Unperturbed step: 56 sec. • Front to right not sensible, back to left. • Mid-step is more stable than the beginning or end of a step.

IV. Robustness to Perturbations • Friction affects slipping in the forward direction. • Changing the height of the perturbation has a significant effect on the perturbing torque around the stance foot: tipping. • Changing the height by +20 shows the effect of the perturbation location (right).
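
The two effects above can be made concrete with elementary statics. This is a rough sketch under stated assumptions, not the simulation used in the paper: a horizontal push of magnitude F applied at height h produces a torque F·h about the stance foot, and the foot slips when the horizontal force exceeds μ times the normal force (Coulomb friction). The function names and the 100 N push are hypothetical; μ = 1.0 and the 78 kg mass come from the system slide.

```python
G = 9.81  # gravitational acceleration [m/s^2]


def tipping_torque(force, height):
    """Torque [Nm] about the stance foot from a horizontal push at `height` [m]."""
    return force * height


def slips(f_horizontal, f_normal, mu=1.0):
    """Coulomb friction check: True if the horizontal force exceeds mu * normal."""
    return abs(f_horizontal) > mu * f_normal


weight = 78.0 * G  # normal force while standing on one foot [N]

# Hypothetical 100 N push: raising the push point by 0.2 m adds 20 Nm of torque.
print(tipping_torque(100.0, 1.0), tipping_torque(100.0, 1.2))
print(slips(100.0, weight))  # a 100 N push is well inside the friction cone
```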

V. Speed Control • Two ways to control speed: • Torso lean angle • Desired velocity. • Nominal values: desired velocity 1.0 m/s and lean angle 0.1 rad. • The simplified system loses energy at touchdown.

V. Speed Control • Change the forward speed of the sagittal policy. • Estimates of the touchdown angle are more accurate. • Policies and lean angles are changed in tandem. • First policy 1.0 m/s and second 0.25 m/s. • Little energy is lost on short steps at slow speeds.

VI. Discussion and Future Work • Despite the couplings between them, the subsystems function properly in the full system. • The study uses simulation only, but the researchers believe the approach is well suited to real hardware because it is simple. • The control architecture is modular. • It can also produce other types of walking.

VI. Discussion and Future Work • Future work: • Make the full dynamics correspond more closely to the simplified dynamics. • Add torso dynamics for a more accurate touchdown model. • Reverse scenario: the coronal policy determines the touchdown. • Provide a good mechanism for deciding which policy is most important.

Conclusion • The method generates policies that are valid for a large region of the high-dimensional state space of the full system. • This allows the controller to react to large perturbations.

Review Article Advantages: • Simplified systems are closely related to the full system. • Simultaneous adjustment. Disadvantages: • Only simulation is used; how would perturbations behave in the real world? • No double support phase is used in the simulation. • No torso dynamics in the simplified system. • Discussion…

Video experiment
