
Teuthology


Presentation Transcript


  1. Teuthology Presented 2011-07-01 tommi.virtanen@dreamhost.com image credit: http://www.flickr.com/photos/peterblapps/3250800528/

  2. Ceph as in Cephalopoda, Mollusca, Invertebrata. Teuthology as in Malacology.

  3. Not your grandmother's software stack

  4. We tried Autotest ... and quickly discovered its limitations. Currently at 15 independent patches; 24 files changed, 575 insertions(+), 19 deletions(-). Realized Autotest's architecture is working against us. We still use it for its packaged "client side" tests, but not its multi-machine features.

  5. Python + Paramiko (SSH) + gevent = orchestra
     Multi-machine control:
     - Real-time
     - Interactive
     - Central controller
     - Full SSH protocol (channels!)
     - Not Chef
     - Not Fabric
       cluster = Cluster(...)
       cluster.run(...)
       cluster.only('x86').run(...)
       cluster.exclude('x86').run(...)
     http://github.com/tv42/orchestra
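The only/exclude filtering shown above can be sketched with a toy stand-in. This is not the real orchestra API (which runs the commands over SSH via Paramiko); it only illustrates the selection semantics, with hosts and tags made up for the example:

```python
# Toy sketch of orchestra-style cluster filtering. The real Cluster runs
# commands remotely; this stand-in just selects which hosts would run them.
class Cluster:
    def __init__(self, remotes):
        # remotes: dict mapping hostname -> set of role/tag strings
        self.remotes = remotes

    def only(self, tag):
        return Cluster({h: t for h, t in self.remotes.items() if tag in t})

    def exclude(self, tag):
        return Cluster({h: t for h, t in self.remotes.items() if tag not in t})

    def run(self, args):
        # The real implementation would execute args on every remote via SSH.
        return [(host, args) for host in sorted(self.remotes)]

cluster = Cluster({'a': {'x86'}, 'b': {'arm'}})
print(cluster.only('x86').run(args=['uptime']))     # runs only on host 'a'
print(cluster.exclude('x86').run(args=['uptime']))  # runs only on host 'b'
```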

  6. Teuthology is a test runner
     Run tasks on targets as told to by roles. Automatically:
     - Setup
     - Monitor health
     - Run test(s)
     - Archive results
     - Archive logs, core dumps, etc.
     - Clean up
     http://github.com/tv42/teuthology (read the README)

  7. Run tasks on targets as told to by roles.
     targets:
     - ubuntu@sepiaXX.ceph.dreamhost.com
     - ubuntu@sepiaYY.ceph.dreamhost.com
     - ubuntu@sepiaZZ.ceph.dreamhost.com
     YAML format: lists, dicts, strings, numbers.
     You need to have SSH working, without passphrases.
     You need passphraseless sudo on the remote host.

  8. Run tasks on targets as told to by roles.
     roles:
     - [mon.0, mds.0, osd.0]
     - [mon.1, osd.1]
     - [mon.2, client.0]

  9. Run tasks on targets as told to by roles.
     targets:
     - ubuntu@sepiaXX...
     - ubuntu@sepiaYY...
     - ubuntu@sepiaZZ...
     roles:
     - [mon.0, mds.0, osd.0]
     - [mon.1, osd.1]
     - [mon.2, client.0]
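The pairing rule between the two lists is positional: the Nth roles list runs on the Nth target. A minimal sketch of that mapping (hostnames abbreviated as on the slide):

```python
# The Nth roles list applies to the Nth target, so the mapping is a zip.
targets = [
    'ubuntu@sepiaXX.ceph.dreamhost.com',
    'ubuntu@sepiaYY.ceph.dreamhost.com',
    'ubuntu@sepiaZZ.ceph.dreamhost.com',
]
roles = [
    ['mon.0', 'mds.0', 'osd.0'],
    ['mon.1', 'osd.1'],
    ['mon.2', 'client.0'],
]
assignment = dict(zip(targets, roles))
print(assignment['ubuntu@sepiaZZ.ceph.dreamhost.com'])  # ['mon.2', 'client.0']
```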

  10. Run tasks on targets as told to by roles.
      tasks:
      - ceph:
      - kclient: [client.0]
      - autotest:
          client.0: [dbench]

  11. Interactive mode
      tasks:
      - interactive:

      INFO:teuthology.run_tasks:Running task interactive...
      Ceph test interactive mode, use ctx to interact with the cluster, press control-D to exit...
      >>> 1+1
      2
      >>>

  12. Interactive mode
      >>> ctx.cluster.only('osd.0').run(args=['uptime'])
      INFO:orchestra.run.out: 13:05:38 up 42 days, 23:17,  0 users,  load average: 0.12, 0.09, 0.07
      [<orchestra.run.RemoteProcess object at 0x28bd110>]
      One RemoteProcess per command run.

  13. Using just one Remote first
      >>> (remote,) = ctx.cluster.only('osd.0').remotes.keys()
      >>> proc = remote.run(args=['echo', '*'])
      INFO:orchestra.run.out:*
      >>> proc
      <orchestra.run.RemoteProcess ...>
      >>> proc.command
      "echo '*'"
      Shell quoting done for you. Works like ctx.cluster.run. Just one RemoteProcess, not a list.
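The "shell quoting done for you" point can be reproduced with the standard library. This sketch uses shlex.quote rather than orchestra's actual quoting code, but builds the same command string the slide shows:

```python
import shlex

# Quote each argument so the remote shell does not glob-expand '*'.
args = ['echo', '*']
command = ' '.join(shlex.quote(a) for a in args)
print(command)  # echo '*'
```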

  14. Failing processes
      >>> remote.run(args=['bork'])
      INFO:orchestra.run.err:bash: bork: command not found
      ...
      CommandFailedError: Command failed with status 127: 'bork'
      >>> proc = remote.run(args=['bork'],
      ...     check_status=False)
      INFO:orchestra.run.err:bash: bork: command not found
      >>> proc.exitstatus
      127
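For comparison, the same pattern in plain subprocess terms: a stdlib analogue of check_status=False, not teuthology code. bash exits with 127 for an unknown command:

```python
import subprocess

# Analogue of check_status=False: don't raise on failure, inspect the
# exit status instead. 127 means "command not found".
proc = subprocess.run(['bash', '-c', 'bork'], capture_output=True, text=True)
print(proc.returncode)  # 127
print(proc.stderr.strip())
```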

  15. Concurrency
      >>> proc = remote.run(args=['uptime'], wait=False)
      >>> proc
      <orchestra.run.RemoteProcess object at 0x28bd1d0>
      >>> proc.exitstatus
      <gevent.event.AsyncResult object at 0x28c2a10>

  16. Concurrency
      >>> proc.exitstatus
      <gevent.event.AsyncResult object at 0x28c2a10>
      >>> import time; time.sleep(0)
      INFO:orchestra.run.out: 13:16:48 up 42 days, 23:28,  0 users,  load average: 0.35, 0.15, 0.08
      >>> proc.exitstatus
      <gevent.event.AsyncResult object at 0x28c2a10>
      >>> proc.exitstatus.get()
      0
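gevent's AsyncResult behaves much like a standard-library Future: wait=False hands back a handle, and .get() blocks for the value. A stdlib sketch of the same shape (no gevent involved; the command is a harmless echo rather than uptime):

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

# Launch a command without waiting, then block on the result later,
# mirroring proc.exitstatus.get() in the slide.
with ThreadPoolExecutor(max_workers=1) as pool:
    future = pool.submit(subprocess.run, ['echo', 'hello'],
                         capture_output=True, text=True)
    # future.done() plays the role of AsyncResult.ready()
    proc = future.result()  # blocks, like AsyncResult.get()
print(proc.returncode)  # 0
```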

  17. Capturing stdout/stderr
      >>> from orchestra import run
      >>> proc = remote.run(args=['uname', '-m'],
      ...     wait=False, stdout=run.PIPE)
      >>> proc.exitstatus
      <gevent.event.AsyncResult object at 0x28c2dd0>
      >>> proc.exitstatus.ready()    # just for debug
      False
      >>> proc.stdout.read()
      'x86_64\n'
      >>> proc.exitstatus.get()
      0

  18. Deadlocks you must avoid:
      - stdout vs stderr
      - stdout/err vs stdin
      - stdout/err vs exit
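These are the standard pipe-buffer traps: reading one stream to the end while the child blocks writing a full pipe on the other. Popen.communicate() shows the safe pattern, draining stdout and stderr concurrently (a stdlib illustration, not orchestra code):

```python
import subprocess
import sys

# A child that writes to both streams. Reading proc.stdout.read() and then
# proc.stderr.read() sequentially can deadlock once a pipe buffer fills;
# communicate() drains both concurrently and then reaps the exit status.
child = subprocess.Popen(
    [sys.executable, '-c',
     'import sys; sys.stdout.write("out\\n"); sys.stderr.write("err\\n")'],
    stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True)
out, err = child.communicate()
print(out.strip(), err.strip())  # out err
```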

  19. Using Cluster
      >>> processes = ctx.cluster.run(
      ...     args=['uname', '-m'],
      ...     wait=False,
      ...     stdout=run.PIPE)
      >>> processes
      [<orchestra.run.RemoteProcess object at 0x28bdbf0>,
       <orchestra.run.RemoteProcess object at 0x28bdb90>,
       <orchestra.run.RemoteProcess object at 0x28bdad0>]
      >>> [p.stdout.read() for p in processes]
      ['x86_64\n', 'x86_64\n', 'x86_64\n']
      >>> run.wait(processes)
      >>>

  20. Controlling stdout/stderr logging
      Usually looks like teuthology.task.foo
      >>> import logging
      >>> log = logging.getLogger(__name__)
      >>> log.info('foo')
      INFO:__builtin__:foo
      >>> ctx.cluster.only('osd.0').run(
      ...     args=['uptime'],
      ...     logger=log.getChild('uptime'))
      INFO:__builtin__.uptime.out: 13:52:49 up 43 days, 4 min,  0 users,  load average: 0.00, 0.01, 0.05
      [<orchestra.run.RemoteProcess object at 0x28bdb90>]
      >>>
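The dotted names come straight from stdlib logging: getChild appends a suffix to the logger's name, which is why the command output lands under ...uptime.out. (The slide shows __builtin__ rather than teuthology.task.foo presumably because __name__ evaluates to __builtin__ in the interactive shell.) A minimal demonstration:

```python
import logging

# getChild builds hierarchical logger names by appending a dotted suffix,
# so each command's output can be traced back to its task.
logging.basicConfig(level=logging.INFO)
log = logging.getLogger('teuthology.task.foo')
child = log.getChild('uptime')
print(child.name)  # teuthology.task.foo.uptime
child.info('13:52:49 up 43 days ...')
```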

  21. Tasks can be context managers
      tasks:
      - ceph:
      - kclient: ...
      - autotest: ...
      - interactive:
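The idea can be sketched with contextlib: each task sets up on entry and tears down on exit, even when an inner task fails. The names below are illustrative, not the real teuthology task signature:

```python
from contextlib import contextmanager

# Hypothetical task shaped like a context manager: setup runs on the way
# in, teardown runs on the way out, even if later tasks raise.
events = []

@contextmanager
def ceph_task(ctx):
    events.append('ceph setup')
    try:
        yield
    finally:
        events.append('ceph teardown')

with ceph_task(ctx=None):
    events.append('inner tasks run')
print(events)  # ['ceph setup', 'inner tasks run', 'ceph teardown']
```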

  22. /tmp/cephtest
      - Must not exist already, or the target is dirty (see teuthology-nuke, later)
      - Used by tasks to store things in
      - Tasks are responsible for cleaning up after themselves (no toplevel rm -rf, to flush out the bugs)
      - Anything in /tmp/cephtest/archive gets archived
      - Please bzip2 -9 any big files your task leaves in archive
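The bzip2 -9 request maps directly onto Python's bz2 module, whose compresslevel defaults to 9. A sketch of compressing log data before it lands in the archive (illustrative, not teuthology's actual archiving code):

```python
import bz2

# bzip2 -9 equivalent: compresslevel=9, the strongest (and default) level.
data = b'big log output\n' * 1000
compressed = bz2.compress(data, compresslevel=9)
print(len(data), '->', len(compressed))
restored = bz2.decompress(compressed)  # round-trips losslessly
```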

  23. Cleanups & failures
      Cleanup can fail; further cleanups are still attempted,
      so always study the first error, not the last one.
      If a task fails to clean up, the targets are left "dirty".
      teuthology-nuke is a Big Hammer.

  24. Archived results
      2011-06-21T10-00-44/
      ├── ceph-sha1
      ├── config.yaml
      ├── remote
      │   ├── ubuntu@sepia70.ceph.dreamhost.com
      │   │   ├── log
      │   │   │   ├── client.admin.log.bz2
      │   │   │   ├── mds.0.log.bz2
      │   │   │   ├── mon.0.log.bz2
      │   │   │   └── osd.0.log.bz2
      │   │   └── syslog
      │   │       ├── kern.log.bz2
      │   │       └── misc.log.bz2
      │   ├── ubuntu@sepia71.ceph.dreamhost.com ...
      │   └── ubuntu@sepia72.ceph.dreamhost.com
      │       ├── autotest
      │       │   └── ...
      │       ├── log ...
      │       └── syslog ...
      ├── summary.yaml
      └── teuthology.log

  25. gitbuilder
      A low-key, low-hype continuous integration tool.
      - Builds tags and heads of branches
      - On a bad build, tries older commits until it finds green
      - We have it building ceph and our kernel fork
      http://ceph.newdream.net/gitbuilder/
      http://ceph.newdream.net/gitbuilder-i386/
      http://ceph.newdream.net/gitbuilder-gcov-amd64/
      http://ceph.newdream.net/gitbuilder-deb-amd64/
      http://ceph.newdream.net/gitbuilder-kernel-amd64/

  26. We made gitbuilder create tarballs
      http://ceph.newdream.net/gitbuilder/output/ref/origin_master/
      Index of /output/ref/origin_master/
      mode links bytes     last-changed name
      dr-x  2    4096      Jun 29 13:58 ./
      dr-x 28    12288     Jun 29 15:16 ../
      -r--  1    149323650 Jun 29 13:58 ceph.x86_64.tgz
      -r--  1    41        Jun 29 13:57 sha1
      Don't trust the links; ProxyPass confuses the web server.
      Fetch .../output/origin_master/sha1, then fetch .../output/sha1/SHA1_HERE/ceph.x86_64.tgz
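The two-step fetch can be sketched as a URL builder. The layout follows the slide; in practice the fetch itself would use urllib.request.urlopen, but the helper below is hypothetical and only constructs the URL:

```python
# Step 1 fetches the sha1 file (41 bytes: 40 hex digits plus a newline);
# step 2 builds the tarball URL from its contents.
BASE = 'http://ceph.newdream.net/gitbuilder/output'

def tarball_url(sha1, arch='x86_64'):
    # the sha1 file ends in a newline, hence strip()
    return '%s/sha1/%s/ceph.%s.tgz' % (BASE, sha1.strip(), arch)

print(tarball_url('abc123\n'))
```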

  27. Future and topics not covered
      - teuthology-suite
      - nightly runs
      - machine allocation
      - gcov flavors
      - custom ceph builds
      - installing custom kernels
      - failure testing
      - monitor health

  28. Thank You Questions? tommi.virtanen@dreamhost.com
