
Processors with Hyper-Threading and AliRoot performance



  1. Processors with Hyper-Threading and AliRoot performance. Jiří Chudoba, FZÚ, Prague

  2. Motivation
     • How to choose the optimal hardware
     • Contributions are counted in SPECInts
     • But how to measure it?
       www.spec.org: CPU2000 tests – 150 USD
       Many results are published, but:
       • The hardware is often not identical to our machines
       • Results depend on OS, compiler, …
     chudoba@fzu.cz

  3. HP ProLiant DL360 G3: 2x Xeon 2.8 GHz HT, cache 512 KB, 2x 18.2 GB Ultra320 Hot Pluggable Drives
     http://h18004.www1.hp.com/products/servers/proliantdl360/description-g3.html
     SPECint2000 results:
     • ProLiant DL560, 2.8 GHz Intel Xeon MP (2 MB L3 cache), HT disabled in BIOS: SPECint2000 1247 (SPECint_base2000 1196)
     • ProLiant DL360 G3, 3.06 GHz Intel Xeon, 512 KB L2, 1 MB L3, no HT: SPECint2000 1258
     • Dell PowerEdge 2650, 2.8 GHz Xeon, 512 KB L2 cache, HT disabled: SPECint2000 907
     • Intel D875PBZ motherboard (2.80C GHz PIV, HT maybe on, default status): SPECint2000 1204
     • Intel D875PBZ motherboard (AA-301) (2.8E GHz, 1 MB cache, HT maybe on): SPECint2000 1269

  6. 2 logical processors
     Duplication of the architectural state on each processor, while sharing one set of processor execution resources.
     Details on http://www.intel.com/technology/hyperthread/
     Output of top on the idle machine:
       10:49pm up 3 days, 4:03, 1 user, load average: 0.00, 0.00, 0.00
       31 processes: 30 sleeping, 1 running, 0 zombie, 0 stopped
       CPU0 states: 0.0% user, 0.0% system, 0.0% nice, 100.0% idle
       CPU1 states: 0.0% user, 0.0% system, 0.0% nice, 100.0% idle
       CPU2 states: 0.0% user, 0.0% system, 0.0% nice, 100.0% idle
       CPU3 states: 0.0% user, 0.0% system, 0.0% nice, 100.0% idle
       Mem: 2069804K av, 1386140K used, 683664K free, 0K shrd, 145584K buff
       Swap: 2097112K av, 0K used, 2097112K free 1006064K cached
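
The four CPUs listed by top on a two-socket machine are logical processors. As a rough illustration, the sketch below counts logical processors and physical packages from /proc/cpuinfo-style text. It assumes a Linux system whose /proc/cpuinfo exposes `processor` and `physical id` fields, which may not hold for every kernel or architecture; the function name `count_cpus` is made up for this example.

```python
def count_cpus(cpuinfo_text):
    """Return (logical_cpus, physical_packages) parsed from /proc/cpuinfo-style text."""
    logical = 0
    packages = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank separator lines between processor entries
        key = key.strip()
        if key == "processor":
            logical += 1
        elif key == "physical id":
            # Assumption: this field is present; older kernels may omit it.
            packages.add(value.strip())
    return logical, len(packages)


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as handle:
            logical, packages = count_cpus(handle.read())
    except OSError:
        print("no /proc/cpuinfo on this system")
    else:
        print(f"{logical} logical CPUs on {packages} physical packages")
```

With Hyper-Threading enabled on a dual-Xeon box one would expect twice as many logical CPUs as packages; a package count of 0 only means the `physical id` field was not reported.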

  7. Not Doubled Performance
     Note that a CPU that supports hyper-threading is not going to provide comparable performance with two physical processors rated at the same speed. The simple reason for this is because the two logical processors that make up your hyper-threaded CPU have to share resources, namely the execution engine, cache, and access to the system bus.
     Intel promises 10-30% performance increase ...

  8. Hyper-Threading – not always better
     … but it is not always the case: http://www.2cpu.com/Hardware/ht_analysis/3.html

  9. Other tests
     Unix Benchmark Utility v.0.3, author: Sergei Viznyuk <sv@phystech.com>
     Klaus Schossmaier reported (numbers per CPU):
     • Opteron: 1.4 GHz 74955, 1.8 GHz 97749
     • Xeon 2.4 GHz: 88064
     • Itanium 1.0 GHz: 66714

  10. Results for AliRoot
      Real time per configuration; the factor in parentheses is the one shown next to each time on the slide (consistent with time relative to the 2-job noHT run):

      Jobs               noHT             HT                  HT with scheduling
      2 jobs, parallel   297 ± 1 s (1)    337 ± 48 s (1.13)   296 ± 2 s (1)
      4 jobs, parallel   596 ± 10 s (2)   515 ± 5 s (1.73)    514 ± 3 s (1.73)
      2+2 jobs           594 s (2)        674 s (2.26)        592 s (2)

      Two top snapshots shown on the slide:
        CPU0 states: 100.0% user, 0.0% system, 0.0% nice, 0.0% idle
        CPU1 states: 100.0% user, 0.0% system, 0.0% nice, 0.0% idle
        CPU2 states: 0.0% user, 0.0% system, 0.0% nice, 100.0% idle
        CPU3 states: 0.0% user, 0.1% system, 0.0% nice, 99.0% idle

        CPU0 states: 100.0% user, 0.0% system, 0.0% nice, 0.0% idle
        CPU1 states: 0.0% user, 0.0% system, 0.0% nice, 100.0% idle
        CPU2 states: 100.0% user, 0.0% system, 0.0% nice, 0.0% idle
        CPU3 states: 0.0% user, 0.0% system, 0.0% nice, 100.0% idle

      Setup: CERN RH 7.3.3, kernel 2.4.20, AliRoot v4-01-05, 1000 tracks HIJINGParam, Real time
      CPU affinity: ftp://ftp.kernel.org/pub/linux/kernel/people/rml/cpu-affinity/ + http://freshmeat.net/projects/sched-utils/
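
The "HT with scheduling" column relies on binding jobs to chosen CPUs; the slide used the cpu-affinity kernel patch and sched-utils for that. The sketch below is not that setup. It only illustrates the idea in Python: pick one logical CPU per physical package before doubling up, then pin a process with `os.sched_setaffinity` (Linux only). The helper names and the CPU-to-package layout are hypothetical; the real sibling numbering on the test machine is not given in the slides.

```python
import os


def spread_over_packages(cpu_to_package, n_jobs):
    """Choose n_jobs logical CPUs, using each physical package once before reusing any."""
    by_package = {}
    for cpu in sorted(cpu_to_package):
        by_package.setdefault(cpu_to_package[cpu], []).append(cpu)
    chosen = []
    rank = 0  # 0 = first logical CPU of every package, 1 = second, ...
    while len(chosen) < n_jobs:
        picked = [cpus[rank] for _, cpus in sorted(by_package.items()) if rank < len(cpus)]
        if not picked:
            raise ValueError("more jobs than logical CPUs")
        chosen.extend(picked)
        rank += 1
    return chosen[:n_jobs]


def pin_current_process(cpu):
    """Restrict the calling process to one logical CPU (no-op where unsupported)."""
    if hasattr(os, "sched_setaffinity"):
        os.sched_setaffinity(0, {cpu})


# Hypothetical layout: logical CPUs 0 and 1 share package 0, CPUs 2 and 3 share package 1.
layout = {0: 0, 1: 0, 2: 1, 3: 1}
print(spread_over_packages(layout, 2))  # [0, 2]
```

Each job would call `pin_current_process` with its assigned CPU before starting the workload. With two jobs this keeps them on separate physical processors, which is the situation where the table shows HT costing nothing (296 s against 297 s).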

  11. Conclusions
      • CPU resource estimates are probably very rough
      • HT can add 15% in performance, but in some cases it increases the Real Time instead
      • Publicly available results of some of our standard CPU tests would help (an update of the Root benchmark tests?)
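
As a cross-check of the 15% figure, the following sketch recomputes the HT gain from the real times on the AliRoot results slide. The times are copied from that slide; treating "gain" as the noHT time divided by the HT time, minus one, is an assumption about how the figure was obtained, and `ht_gain` is a name chosen for this example.

```python
# Wall-clock times in seconds, copied from the AliRoot results slide.
times = {
    "2 jobs": {"noHT": 297, "HT": 337, "HT with scheduling": 296},
    "4 jobs": {"noHT": 596, "HT": 515, "HT with scheduling": 514},
    "2+2 jobs": {"noHT": 594, "HT": 674, "HT with scheduling": 592},
}


def ht_gain(no_ht_seconds, ht_seconds):
    """Fractional throughput gain of an HT run over the noHT run for the same work."""
    return no_ht_seconds / ht_seconds - 1.0


for label, row in times.items():
    for mode in ("HT", "HT with scheduling"):
        # Positive values mean the HT configuration finished sooner than noHT.
        print(f"{label:9s} {mode:19s} {ht_gain(row['noHT'], row[mode]):+.1%}")
```

For the 4-job row this gives a gain of roughly 16%, in line with the 15% quoted above, while the unscheduled 2-job and 2+2 rows come out negative.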

  12. Root benchmark stress results: Root 3.10.02, gcc 3.2, -O
      • 4 parallel jobs, 9000 events, HT: 512
      • 4 parallel jobs, 9000 events, noHT: 732
      • 2 parallel jobs, 9000 events, HT: 733
      Klaus Schossmaier
