Production and Compression of Raw data for Time Projection Chamber Ajit Kumar Mohanty - PowerPoint PPT Presentation

Presentation Transcript

  1. Production and Compression of Raw Data for Time Projection Chamber. Ajit Kumar Mohanty, Dario Favretto. 9 September 2002

  2. Summary
     • ALTRO data format
     • Data compression based on the standard Huffman technique (ref. A. Nicolaucig, M. Mattavelli, S. Carrato)
        • Using one table
        • Using 5 tables
     • Preliminary results
     • Future developments

  3. ALTRO Data Format
     • ALTRO (ALICE TPC Read Out)
     • Only the samples over a given threshold are kept; the others are discarded.
     • A bunch is a group of adjacent over-threshold samples coming from one pad (the signal can be represented bunch by bunch).
     • Information relative to one pad is stored in one packet.
     A packet is a sequence of 10-bit words (range 0-1023) followed by a trailer:
     • Bunch length (number of samples in the bunch)
     • Time information (temporal position of the last sample in the bunch)
     • Sequence of amplitude values
     Trailer:
     • Number of words in the packet (10 bits)
     • Hardware and channel address (8 and 4 bits respectively)
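The bunch definition above can be sketched in a few lines of Python. This is only an illustration, not the ALTRO implementation: the function name `extract_bunches`, the strict "greater than" comparison against the threshold, and the sample values are assumptions made for the example.

```python
def extract_bunches(samples, threshold):
    """Split one pad's time samples into bunches.

    A bunch is a run of adjacent samples above the threshold.
    Returns a list of (length, last_time_bin, amplitudes) tuples,
    following the three items listed on the slide.
    Assumption: "over threshold" means strictly greater than.
    """
    bunches = []
    current = []
    for time_bin, amplitude in enumerate(samples):
        if amplitude > threshold:
            current.append(amplitude)
        elif current:
            # The run ended on the previous time bin.
            bunches.append((len(current), time_bin - 1, current))
            current = []
    if current:
        # A run that reaches the end of the sample window.
        bunches.append((len(current), len(samples) - 1, current))
    return bunches


# Hypothetical samples from one pad, one value per time bin.
pad_samples = [0, 1, 3, 7, 4, 0, 0, 5, 0, 2]
bunches = extract_bunches(pad_samples, threshold=2)
for length, last_bin, amplitudes in bunches:
    # Every stored value must fit in a 10-bit word (0-1023).
    assert all(0 <= v <= 1023 for v in (length, last_bin, *amplitudes))
```

With a threshold of 2 this input yields two bunches: a three-sample bunch ending at time bin 4 and an isolated sample at time bin 7.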

  4. Compression
     • Lossless compression technique
     • Static Huffman coding
        • Variable-length coding technique based on the frequency of the symbols: symbols that appear more frequently in the source file are coded with a shorter sequence of bits than symbols that appear less frequently.
        • Static means that the algorithm is based on one or more tables that are built before the compression phase according to the frequency of the symbols.
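A minimal sketch of static Huffman coding as described above, assuming the table is built once from a training sample and then reused unchanged for encoding and decoding. The names `build_huffman_table`, `encode`, and `decode` and the training values are hypothetical and are not taken from the AliTPC classes.

```python
import heapq
from collections import Counter


def build_huffman_table(frequencies):
    """Return {symbol: bit string} from a {symbol: frequency} mapping."""
    if len(frequencies) == 1:
        # Degenerate case: a single symbol still needs one bit.
        return {next(iter(frequencies)): "0"}
    # Heap entries are (weight, tie_breaker, {symbol: code so far}).
    # The unique tie_breaker keeps Python from ever comparing the dicts.
    heap = [(weight, index, {symbol: ""})
            for index, (symbol, weight) in enumerate(sorted(frequencies.items()))]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, codes1 = heapq.heappop(heap)
        w2, _, codes2 = heapq.heappop(heap)
        # Merge the two lightest subtrees, prefixing one bit to each side.
        merged = {s: "0" + c for s, c in codes1.items()}
        merged.update({s: "1" + c for s, c in codes2.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


def encode(symbols, table):
    return "".join(table[s] for s in symbols)


def decode(bits, table):
    inverse = {code: symbol for symbol, code in table.items()}
    decoded, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:  # Huffman codes are prefix-free
            decoded.append(inverse[current])
            current = ""
    return decoded


# "Static": the table is built beforehand from a training sample.
training = [3, 3, 3, 3, 5, 5, 9, 12]
table = build_huffman_table(Counter(training))
message = [3, 5, 3, 12, 9, 3]
bits = encode(message, table)
```

In this example the most frequent value (3) receives a 1-bit code and the two rarest values receive 3-bit codes. A symbol that never occurred in the training sample has no entry in a static table, which is one reason the choice of training data matters.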

  5. Compression using one table
     • Frequency distribution using one table (entropy: 4.97)
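The entropy quoted above can be computed from a frequency distribution as follows. This sketch assumes the slide refers to the Shannon entropy in bits per symbol (the slide does not state the unit); the function name and the sample values are illustrative only.

```python
import math
from collections import Counter


def shannon_entropy(symbols):
    """Shannon entropy in bits per symbol of an observed symbol sequence."""
    counts = Counter(symbols)
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# Hypothetical sample: probabilities 1/2, 1/4, 1/8, 1/8.
sample = [3, 3, 3, 3, 5, 5, 9, 12]
entropy_bits = shannon_entropy(sample)
print(f"{entropy_bits:.2f} bits per symbol")  # prints "1.75 bits per symbol"
```

Under the same assumption, an entropy of 4.97 bits for symbols stored as 10-bit words would put the best size a single-table symbol coder could reach at roughly half of the source size.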

  6. Results
     • Compression applied to a source file generated by simulating one event of 1000 primaries
     • Threshold value: 2 (source file size 6.5 MB)
        • Huffman: compressed file size ~3.5 MB (54%)
        • Gzip: compressed file size ~4.5 MB (69%)
     • Threshold value: 5 (source file size 1.4 MB)
        • Huffman: compressed file size ~0.9 MB (68%)
        • Gzip: compressed file size ~1.2 MB (83%)
     • Threshold value: 10 (source file size 1 MB)
        • Huffman: compressed file size ~0.7 MB (72%)
        • Gzip: compressed file size ~0.9 MB (85%)

  7. Compression using 5 tables
     Compression can be improved by taking the nature of the data into account. Most bunches have a pseudo-Gaussian shape, in which the first and last samples have smaller values than those in the central positions.
     • Samples are classified into three categories (each category corresponds to a table):
        • Isolated samples
        • Border samples
        • Central samples
     • Two more tables are used to store the frequencies of the time-bin values and the bunch-length values.
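The three amplitude categories can be sketched as below. This assumes that "isolated" means a bunch of exactly one sample and that "border" means only the first and last sample of a longer bunch; the slides do not spell out the exact rule, and the function names and sample bunches are hypothetical.

```python
from collections import Counter


def classify_bunch(amplitudes):
    """Label each amplitude of one bunch as isolated, border, or central."""
    n = len(amplitudes)
    if n == 1:
        return [("isolated", amplitudes[0])]
    labels = []
    for i, amplitude in enumerate(amplitudes):
        kind = "border" if i in (0, n - 1) else "central"
        labels.append((kind, amplitude))
    return labels


def amplitude_frequencies(bunches):
    """Accumulate one frequency table per amplitude category."""
    tables = {"isolated": Counter(), "border": Counter(), "central": Counter()}
    for amplitudes in bunches:
        for kind, amplitude in classify_bunch(amplitudes):
            tables[kind][amplitude] += 1
    return tables


# Hypothetical bunches (amplitude sequences only).
example_bunches = [[3, 7, 4], [5], [2, 6]]
freq_tables = amplitude_frequencies(example_bunches)
```

Each of the three frequency tables would then feed its own Huffman table, with two further tables built the same way from the time-bin and bunch-length values. Splitting the amplitudes helps when the per-category distributions are narrower than the combined one.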

  8. Frequency distribution
     Entropy:
     • Bunch length: 1.00
     • Bunch of 1 sample: 0.36
     • Border samples: 4.43
     • Central samples: 6.95

  9. Results
     • Compression applied to a source file generated by simulating one event of 1000 primaries
     • Threshold value: 2 (source file size 6.5 MB)
        • Huffman: compressed file size ~3.5 MB (54%)
        • Huffman, 5 tables: compressed file size ~2.8 MB (42%)
     • Threshold value: 5 (source file size 1.4 MB)
        • Huffman: compressed file size ~0.9 MB (68%)
        • Huffman, 5 tables: compressed file size ~0.8 MB (55%)
     • Threshold value: 10 (source file size 1 MB)
        • Huffman: compressed file size ~0.7 MB (72%)
        • Huffman, 5 tables: compressed file size ~0.6 MB (57%)

  10. Results
     • Compression applied to a source file generated by simulating one event of 10000 primaries
     • Threshold value: 2 (source file size 21.8 MB)
        • Gzip: compressed file size ~17.5 MB (80%)
        • Huffman, 5 tables: compressed file size ~10.7 MB (49%)
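The percentages on this slide appear to be the compressed size expressed as a fraction of the source size. A short check under that reading, using the rounded sizes quoted on the slide (the function name is illustrative):

```python
def size_ratio_percent(source_mb, compressed_mb):
    """Compressed size as a percentage of the source size."""
    return 100.0 * compressed_mb / source_mb


source_mb = 21.8  # 10000 primaries, threshold 2
gzip_pct = size_ratio_percent(source_mb, 17.5)
huff5_pct = size_ratio_percent(source_mb, 10.7)
print(f"gzip: {gzip_pct:.0f}%  Huffman 5 tables: {huff5_pct:.0f}%")
# prints "gzip: 80%  Huffman 5 tables: 49%"
```

Because the file sizes are quoted only approximately, recomputing the percentages for the smaller files on the earlier result slides may differ from the quoted values by a few points.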

  11. Main Macros and Classes
     • StoreDigits.C is a macro that creates a binary file (DigitsData.dat) containing the sequence of digits (amplitude, time bin, sector, row and pad number).
     • AliTPCBuildAltroFormat.C is a macro used to generate the ALTRO format file (AltroFormat.dat) from DigitsData.dat.
     • AliTPCBuffer160 is a class used to read and write values according to the ALTRO data format (10-bit words).
     • AliTPCHNode and AliTPCHTable are classes used to create and manage the tables used by Huffman coding.
     • AliTPCHCompression is a class that implements compression and decompression based on one table.
     • AliTPCCompression is a class that implements compression and decompression based on 5 tables.

  12. Future developments
     • Test phase using a bigger source file (80000 primaries)
     • Complete the implementation of the ALTRO data format
     • Optimize the frequency tables independently of a particular source file
     • Improve the compression factor
     • Abstract the classes to make them available for other detectors (ITS)