UCR Time Series Semantic Segmentation Archive

UCR Time Series Semantic Segmentation Archive. Please reference as: Eamonn Keogh (2016). The UCR Time Series Semantic Segmentation Archive. URL: www.cs.ucr.edu/~eamonn/SemanticSegmentation


Presentation Transcript


  1. UCR Time Series Semantic Segmentation Archive. Please reference as: Eamonn Keogh (2016). The UCR Time Series Semantic Segmentation Archive. URL: www.cs.ucr.edu/~eamonn/SemanticSegmentation

  2. Each file is an independent problem. The naming format is <mnemonic_name>_<recommended subsequence length>_<location of 1st change point>_...<location of ith change point>.txt. For example, consider the following simple test dataset, created with two lines of Matlab: >> a = sin(0:0.05:400) + randn(size(sin(0:0.05:400)))/30; >> a(3000:5000) = abs(a(3000:5000)); [Figure: plot of this dataset, with the x-axis running from 0 to 8000.] The name of this test dataset is SimpleSynthetic_125_3000_5000.txt. Note that the recommended subsequence length (about one period of the original sine wave) is just a very loose guess at the natural “scale” of the data (your algorithm can ignore it). Further note that the locations of the onset of the new segments are approximate; however, the scoring function we suggest will reward predictions close to the stated locations.
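The naming convention and the synthetic example can be sketched in Python as a stand-in for the Matlab above. The names `ProblemInfo`, `parse_problem_name` and `make_simple_synthetic` are hypothetical, and the parser assumes the mnemonic name itself contains no underscore, which holds for the file names listed in this deck.

```python
import re
from typing import List, NamedTuple

import numpy as np


class ProblemInfo(NamedTuple):
    """Fields encoded in an archive file name (hypothetical container)."""
    name: str
    subsequence_length: int
    change_points: List[int]


def parse_problem_name(filename: str) -> ProblemInfo:
    """Split <name>_<subsequence length>_<cp1>_..._<cpi>.txt into its parts."""
    stem = re.sub(r"\.txt$", "", filename)
    parts = stem.split("_")
    if len(parts) < 2:
        raise ValueError(f"no subsequence length found in {filename!r}")
    numbers = [int(p) for p in parts[1:]]
    return ProblemInfo(parts[0], numbers[0], numbers[1:])


def make_simple_synthetic(seed: int = 0) -> np.ndarray:
    """NumPy analogue of the two Matlab lines (noise values will differ)."""
    rng = np.random.default_rng(seed)
    t = np.linspace(0.0, 400.0, 8001)        # same grid as Matlab's 0:0.05:400
    a = np.sin(t) + rng.standard_normal(t.size) / 30
    a[2999:5000] = np.abs(a[2999:5000])      # Matlab a(3000:5000), 1-based inclusive
    return a


info = parse_problem_name("SimpleSynthetic_125_3000_5000.txt")
series = make_simple_synthetic()
print(info, series.shape)
```

One period of the sine wave on this grid is 2π/0.05, roughly 126 samples, which is consistent with the recommended subsequence length of 125.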

  3. There are at least two ways you could use these files to test your algorithm. • Option 1: You use the file name to find the true number (but not the locations) of regimes in the data, and give that “hint” to your algorithm. In the example dataset, SimpleSynthetic_125_3000_5000.txt, we can see that there are two change points, thus three regimes. Thus you might ask your algorithm “find me the best three regimes”. • Option 2: You ignore the clue embedded in the file name, and simply task the algorithm with “find me the best K regimes”, where it must also choose K. • Option 2 is clearly harder and more realistic; however, option 1 is a little easier to score and compare results, so we recommend using that.
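As a rough illustration of option 1, the number of regimes follows directly from the change points in the file name, and predictions can then be compared against the stated locations. The deck does not define its scoring function, so `nearest_distance_score` below is only an assumed, illustrative measure (distance from each prediction to the nearest stated change point, normalized by series length), not necessarily the one the archive suggests.

```python
from typing import Sequence


def num_regimes(change_points: Sequence[int]) -> int:
    """k change points split a series into k + 1 regimes."""
    return len(change_points) + 1


def nearest_distance_score(predicted: Sequence[int],
                           truth: Sequence[int],
                           series_length: int) -> float:
    """Illustrative score: 0.0 is perfect, larger is worse.

    Sums, over predictions, the distance to the nearest stated change
    point and divides by the series length. This is an assumption made
    for the example, not the archive's official scoring function.
    """
    if not predicted or not truth:
        raise ValueError("need at least one predicted and one true change point")
    total = sum(min(abs(p - t) for t in truth) for p in predicted)
    return total / series_length


truth = [3000, 5000]                      # from SimpleSynthetic_125_3000_5000.txt
print(num_regimes(truth))                 # ask the algorithm for this many regimes
print(nearest_distance_score([3100, 4900], truth, 8001))
```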

  4. There is an important caveat about the real datasets in this collection. We have made a best effort to provide correct annotations for the real datasets. In some cases we have access to external data (such as accompanying video or doctor's notes) that makes this easy. However, it is possible that there is a different set of regimes that is simply not apparent to us. To see this, consider an analogous example in the text domain. Suppose we have: “Une fois, sur le minuit lugubre, pendant que je méditais, faible et Over many a quaint and curious volume of forgotten lore”. A European would mark the regime change where the language changes from French to English. However, someone who is literate only in Japanese would probably see the change of font as the regime change.

  5. Notes • Some of these problems are easy if you look at the local amplitude change. • Some of these problems are easy if you look at the local frequency change. • Some of these problems are easy if you look at the local noise-level change. • Etc. • FLOSS does not look at any of these things explicitly; however, you are obviously free to use any features you want.
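To make the notes above concrete, here is a minimal sketch of one such local feature: a sliding-window standard deviation, which responds to local amplitude or noise-level changes. The function name `local_std` and the toy signal are assumptions for illustration only; this is not part of FLOSS. It relies on `sliding_window_view`, available in NumPy 1.20 and later.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def local_std(series, window: int) -> np.ndarray:
    """Standard deviation of every length-`window` subsequence."""
    windows = sliding_window_view(np.asarray(series, dtype=float), window)
    return windows.std(axis=1)


# Toy series: quiet noise for 500 points, then noise ten times larger.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0, 0.1, 500), rng.normal(0, 1.0, 500)])
feature = local_std(x, window=50)

# The feature jumps near the true change point at 500.
print(feature[:400].mean(), feature[-400:].mean())
```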

  6. NogunGun_150_3000.txt • This dataset is derived from the same original source as the Gun-Point dataset in the UCR Time Series Classification archive. • The time series is the y-axis position of the actor's hand as she points her finger about 20 times, followed by pointing a gun about 30 times. The actor holsters the gun between each time she aims it, and the holstering proved to be quite difficult. • There are some dropouts in the data (we used a very primitive image processing algorithm to track the red glove, and it sometimes failed). • The timing/periodicity is better than you might expect, because we used a metronome to signal the beginning of each event (the events are 150 datapoints apart).

  7. Fetal2013_70_6000_12000.txt This dataset is adapted from Noninvasive Fetal ECG: the PhysioNet/Computing in Cardiology Challenge 2013, https://physionet.org/challenge/2013/. We took just the AECG1 leads from three separate fetal recordings, a64, a68 and a57, and concatenated them (in that order). We then downsampled the data 1 in 10, so the effective sampling rate was 100 Hz.
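The concatenate-then-downsample construction can be sketched as follows. The function name `concat_then_downsample` is hypothetical, and the placeholder arrays stand in for the three recordings; their length of 60,000 samples each is an assumption, chosen because it is what change points at 6000 and 12000 imply under 1-in-10 downsampling.

```python
import numpy as np


def concat_then_downsample(recordings, step: int = 10):
    """Concatenate recordings in order, then keep every `step`-th sample.

    Returns the downsampled series and the change-point locations
    (the recording boundaries, expressed in downsampled indices).
    """
    arrays = [np.asarray(r, dtype=float) for r in recordings]
    series = np.concatenate(arrays)[::step]
    boundaries = np.cumsum([len(a) for a in arrays])[:-1]
    change_points = (boundaries // step).tolist()
    return series, change_points


# Placeholder data only: constant arrays, not real ECG.
recordings = [np.zeros(60000), np.ones(60000), np.full(60000, 2.0)]
series, change_points = concat_then_downsample(recordings)
print(len(series), change_points)
```

Taking every tenth sample is the simplest reading of "1 in 10"; whether any low-pass filtering was applied first is not stated in the slide.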

  8. TiltECG_200_25000.txt This dataset is adapted from Heldt T, Oefinger MB, Hoshiyama M, Mark RG. Circulatory response to passive and active changes in posture. Comput Cardiol, 30:263–266, Sept. 2003. A subject was lying on a tilt table with foot support. At time point 25000 the tilt table was rapidly rotated to the stand-up position. This dataset records the volunteer's ECG.

  9. TiltABP_210_25000.txt This dataset is adapted from Heldt T, Oefinger MB, Hoshiyama M, Mark RG. Circulatory response to passive and active changes in posture. Comput Cardiol, 30:263–266, Sept. 2003. A subject was lying on a tilt table with foot support. At time point 25000 the tilt table was rapidly rotated to the stand-up position. This dataset records the volunteer's ABP (arterial blood pressure).

  10. WalkJogRun1_80_3800_6800.txt Adapted from REALDISP Activity Recognition Dataset Data Set. Banos, O., Toth M. A., Damas, M., Pomares, H., Rojas, I. Dealing with the effects of sensor displacement in wearable activity recognition. Sensors vol. 14, no. 6, pp. 9995-10023 (2014). This problem is mildly difficult because the sensor is a rotation sensor (GyrY) on the left lower arm (presumably a leg sensor would be better), and because, while the distinction between walk and jog is very clear, the distinction between jog and run is less so. [Figure: plot of the time series.]

  11. WalkJogRun2_80_3800_6800.txt Adapted from REALDISP Activity Recognition Dataset Data Set. Banos, O., Toth M. A., Damas, M., Pomares, H., Rojas, I. Dealing with the effects of sensor displacement in wearable activity recognition. Sensors vol. 14, no. 6, pp. 9995-10023 (2014). The sensor is a rotation sensor (GyrY) on the left calf. This problem is mildly difficult because, while the distinction between walk and jog is very clear, the distinction between jog and run is less so. Moreover, there is some evidence of a “stumble” (at about 1400) and a correction (at about 1600). This is clearer if we look at a different view, the left leg acceleration (shown in red). [Figure: plot of the time series, with the left leg acceleration in red.]

  12. RoboticDogActivityX_64_8699.txt • This dataset is derived from the Carnegie Mellon University Sony RoboDog. • The time series is the x-axis acceleration of the Sony RoboDog. • We suggest a subsequence length of 64, which is about 1/2 second. • The time series contains two segments: for the first 8699 data points the robot is walking on cement, then it begins to play.

  13. RoboticDogActivityY_64_10699.txt • This dataset is derived from the Carnegie Mellon University Sony RoboDog. • The time series is the y-axis acceleration of the Sony RoboDog. • We suggest a subsequence length of 64, which is about 1/2 second. • The time series contains two segments: for the first 10699 data points the robot is walking on cement, then it begins to play.

  14. RoboticDogActivityY_64_4000.txt • This dataset is derived from the Carnegie Mellon University Sony RoboDog. • The time series is the y-axis acceleration of the Sony RoboDog. • We suggest a subsequence length of 64, which is about 1/2 second. • The time series contains two segments: for the first 4000 data points the robot is walking on cement, then it begins to walk on carpet.

  15. SuddenCardiacDeath1_25_6200_7600.txt Data is at 50 Hz. The patient is a 75-year-old male undergoing cardiac surgery. For the first 125 seconds the patient has bundle branch block beats, then there is a burst of premature ventricular contractions lasting for about 28 seconds before the patient settles back to normal heartbeats. This database is described in: Greenwald SD. Development and analysis of a ventricular fibrillation detector. M.S. thesis, MIT Dept. of Electrical Engineering and Computer Science, 1986. [Figure: plot of the time series.]
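The change points in the file name can be cross-checked against the durations in the description by converting sample indices to seconds at the stated 50 Hz. The helper name `index_to_seconds` is hypothetical; this is only a sanity-check sketch.

```python
def index_to_seconds(index: int, sampling_rate_hz: float) -> float:
    """Convert a sample index (or a sample count) to seconds."""
    return index / sampling_rate_hz


RATE_HZ = 50.0                       # stated in the slide
first_cp, second_cp = 6200, 7600     # from SuddenCardiacDeath1_25_6200_7600.txt

onset_s = index_to_seconds(first_cp, RATE_HZ)
burst_s = index_to_seconds(second_cp - first_cp, RATE_HZ)
print(onset_s, burst_s)
```

The first change point lands at 124 seconds, close to the "first 125 seconds" in the text, and the gap between the two change points is 28 seconds, matching the stated burst length.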

  16. SuddenCardiacDeath2_25_3250.txt Data is at 50 Hz. The patient is an 82-year-old female with heart failure. For the first 65 seconds the patient has a very irregular beat, then there is a sustained ventricular tachyarrhythmia. This database is described in: Greenwald SD. Development and analysis of a ventricular fibrillation detector. M.S. thesis, MIT Dept. of Electrical Engineering and Computer Science, 1986. [Figure: plot of the time series.]

  17. SuddenCardiacDeath3_25_3250.txt Data is at 50 Hz. The patient is an 82-year-old female with heart failure. For the first 65 seconds the patient has a very irregular beat, then there is a sustained ventricular tachyarrhythmia. This database is described in: Greenwald SD. Development and analysis of a ventricular fibrillation detector. M.S. thesis, MIT Dept. of Electrical Engineering and Computer Science, 1986. This is the same as SuddenCardiacDeath2_25_3250.txt, except that it is recorded from a different ECG channel (this is lead 1). [Figure: plot of the time series.]

  18. GreatBarbet1_50_1900_3700.txt Here we took three bird calls, all from the same species, the Great Barbet (Megalaima virens), and concatenated approximately one-minute snippets of their songs. We converted the sound files into MFCCs, and below we show just the 7th coefficient, which, by eye, is the best at discriminating the three individuals. Note that there are much better ways to discriminate the birds, using all the MFCCs or using other sound features. This exercise is just to create real data for which we have the ground truth. Thanks to the wonderful xeno-canto website. http://www.xeno-canto.org/173220 http://www.xeno-canto.org/163079 http://www.xeno-canto.org/35474 >> Data = ([ zscore(XC173220GreatBarbet(7,1:1900)), zscore(XC163079GreatBarbet(7,1:1800)), zscore(XC35474GreatBarbet(7,1:1000)) ]); [Figure: plot of the time series.]
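The Matlab line above z-normalizes each snippet separately and then concatenates them, so the change points fall at the cumulative snippet lengths (1900 and 1900 + 1800 = 3700). A Python sketch of the same construction follows; the names `zscore` and `concat_zscored` are defined here for illustration, and random arrays stand in for the MFCC rows, which are not part of this transcript.

```python
import numpy as np


def zscore(x) -> np.ndarray:
    """Zero mean, unit sample standard deviation (Matlab's zscore default)."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std(ddof=1)


def concat_zscored(snippets):
    """Z-normalize each snippet on its own, then concatenate in order."""
    normalized = [zscore(s) for s in snippets]
    data = np.concatenate(normalized)
    change_points = np.cumsum([len(s) for s in normalized])[:-1].tolist()
    return data, change_points


# Stand-in data with the snippet lengths used in the Matlab line.
rng = np.random.default_rng(0)
snippets = [rng.normal(size=n) for n in (1900, 1800, 1000)]
data, change_points = concat_zscored(snippets)
print(len(data), change_points)
```

Normalizing per snippet removes differences in recording level between the three source files, so a segmentation algorithm cannot succeed merely by detecting a shift in mean or scale.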

  19. GreatBarbet2_50_1900_3700.txt Here we took three bird calls, all from the same species, the Great Barbet (Megalaima virens), and concatenated approximately one-minute snippets of their songs. We converted the sound files into MFCCs, and below we show just the 3rd coefficient, which, by eye, is not the best at discriminating the three individuals. Note that there are much better ways to discriminate the birds, using all the MFCCs or using other sound features. This exercise is just to create real data for which we have the ground truth. Thanks to the wonderful xeno-canto website. http://www.xeno-canto.org/173220 http://www.xeno-canto.org/163079 http://www.xeno-canto.org/35474 >> Data = ([ zscore(XC173220GreatBarbet(3,1:1900)), zscore(XC163079GreatBarbet(3,1:1800)), zscore(XC35474GreatBarbet(3,1:1000)) ]); [Figure: plot of the time series.]

  20. PulsusParadoxusSP02_30_10000.txt This dataset records the onset of pulsus paradoxus in a patient. Note that the clinician who annotated this data was in the room at the time and may have had access to information that is simply not available in this signal. Also note that there are explicit algorithms for detecting pulsus paradoxus that may work here, but these datasets are designed to test domain-agnostic algorithms. SpO2 is an estimate of arterial oxygen saturation, or SaO2, which refers to the amount of oxygenated haemoglobin in the blood. Pulsus paradoxus (PP), also paradoxic pulse or paradoxical pulse, is an abnormally large decrease in systolic blood pressure and pulse wave amplitude during inspiration. See also https://www.youtube.com/watch?v=7AXIYQK5BBM [Figure: plot of the time series.]

  21. PulsusParadoxusECG1_30_10000.txt This dataset records the onset of pulsus paradoxus in a patient. Note that the clinician who annotated this data was in the room at the time and may have had access to information that is simply not available in this signal. Also note that there are explicit algorithms for detecting pulsus paradoxus that may work here, but these datasets are designed to test domain-agnostic algorithms. Electrocardiography (ECG) is the process of recording the electrical activity of the heart over a period of time using electrodes placed on the skin. Pulsus paradoxus (PP), also paradoxic pulse or paradoxical pulse, is an abnormally large decrease in systolic blood pressure and pulse wave amplitude during inspiration. See also https://www.youtube.com/watch?v=7AXIYQK5BBM [Figure: plot of the time series.]

  22. PulsusParadoxusECG2_30_10000.txt This dataset records the onset of pulsus paradoxus in a patient. Note that the clinician who annotated this data was in the room at the time and may have had access to information that is simply not available in this signal. Also note that there are explicit algorithms for detecting pulsus paradoxus that may work here, but these datasets are designed to test domain-agnostic algorithms. Electrocardiography (ECG) is the process of recording the electrical activity of the heart over a period of time using electrodes placed on the skin. Pulsus paradoxus (PP), also paradoxic pulse or paradoxical pulse, is an abnormally large decrease in systolic blood pressure and pulse wave amplitude during inspiration. See also https://www.youtube.com/watch?v=7AXIYQK5BBM [Figure: plot of the time series.]

  23. PigInternalBleedingDatasetCVP_100_7501.txt Published in the paper Classification of Time Sequences using Graphs of Temporal Constraints, by Mathieu Guillame-Bert and Artur Dubrawski. Undetected internal bleeding during and after surgical procedures poses a serious medical concern. Early and reliable detection of internal bleeding is considered a significant medical research problem. Our bleeding detection dataset is extracted from vital signs measured at high frequency (250 Hz) using a bedside hemodynamic monitor. The collected measurements include arterial blood pressure, central venous pressure and airway pressure. The data has been collected from a cohort of 52 pigs subjected to induced slow bleeding. Each animal was sedated, instrumented and bled with a pump at a rate of 20 mL/min. From the data of each pig, we randomly sampled two 30-second-long segments of data: one from the period before the bleeding and one from approximately 2 minutes into the bleeding. Pre- and post-bleeding segments are labeled as negative and positive, respectively. Unlike in the base UCR data, the SSTSs (vital signs) are not temporally aligned (either between pigs or for the same pig); the starting point of observation is arbitrary. Moreover, the data is multivariate. For reference, in current clinical practice, in the best cases, cardio-respiratory medical specialists are usually able to detect internal bleeding by monitoring high-frequency vital signs between 10 and 15 minutes after the onset of a slow internal bleeding episode. Pre- and post-bleeding SSTSs are not visually distinguishable.

  24. PigInternalBleedingDatasetArtPressureFluidFilled_100_7501.txt Published in the paper Classification of Time Sequences using Graphs of Temporal Constraints, by Mathieu Guillame-Bert and Artur Dubrawski. Undetected internal bleeding during and after surgical procedures poses a serious medical concern. Early and reliable detection of internal bleeding is considered a significant medical research problem. Our bleeding detection dataset is extracted from vital signs measured at high frequency (250 Hz) using a bedside hemodynamic monitor. The collected measurements include arterial blood pressure, central venous pressure and airway pressure. The data has been collected from a cohort of 52 pigs subjected to induced slow bleeding. Each animal was sedated, instrumented and bled with a pump at a rate of 20 mL/min. From the data of each pig, we randomly sampled two 30-second-long segments of data: one from the period before the bleeding and one from approximately 2 minutes into the bleeding. Pre- and post-bleeding segments are labeled as negative and positive, respectively. Unlike in the base UCR data, the SSTSs (vital signs) are not temporally aligned (either between pigs or for the same pig); the starting point of observation is arbitrary. Moreover, the data is multivariate. For reference, in current clinical practice, in the best cases, cardio-respiratory medical specialists are usually able to detect internal bleeding by monitoring high-frequency vital signs between 10 and 15 minutes after the onset of a slow internal bleeding episode. Pre- and post-bleeding SSTSs are not visually distinguishable.

  25. PigInternalBleedingDatasetAirwayPressure_400_7501.txt Published in the paper Classification of Time Sequences using Graphs of Temporal Constraints, by Mathieu Guillame-Bert and Artur Dubrawski. Undetected internal bleeding during and after surgical procedures poses a serious medical concern. Early and reliable detection of internal bleeding is considered a significant medical research problem. Our bleeding detection dataset is extracted from vital signs measured at high frequency (250 Hz) using a bedside hemodynamic monitor. The collected measurements include arterial blood pressure, central venous pressure and airway pressure. The data has been collected from a cohort of 52 pigs subjected to induced slow bleeding. Each animal was sedated, instrumented and bled with a pump at a rate of 20 mL/min. From the data of each pig, we randomly sampled two 30-second-long segments of data: one from the period before the bleeding and one from approximately 2 minutes into the bleeding. Pre- and post-bleeding segments are labeled as negative and positive, respectively. Unlike in the base UCR data, the SSTSs (vital signs) are not temporally aligned (either between pigs or for the same pig); the starting point of observation is arbitrary. Moreover, the data is multivariate. For reference, in current clinical practice, in the best cases, cardio-respiratory medical specialists are usually able to detect internal bleeding by monitoring high-frequency vital signs between 10 and 15 minutes after the onset of a slow internal bleeding episode. Pre- and post-bleeding SSTSs are not visually distinguishable.

  26. GrandMalSeizures_10_8200.txt Tonic-clonic seizures of a subject recorded with a scalp right central (C4) electrode (linked earlobes reference). It contains a total of 3 minutes, with about 1 minute of pre-seizure activity, the seizure, and some post-seizure activity. The sampling rate is 102.4 Hz (see the paper for more details). It could be argued that the post-seizure section is a more significant regime… Source: Searching for hidden information with Gabor Transform in generalized tonic-clonic seizures. Electroencephalogr Clin Neurophysiol. 1997 Oct;103(4):434-9. [Figure: plot of the time series.]

  27. GrandMalSeizures2_10_4550.txt Tonic-clonic seizures of a subject recorded with a scalp right central (C4) electrode (linked earlobes reference). It contains a total of about 2 minutes, covering the seizure and some post-seizure activity. The sampling rate is 102.4 Hz (see the paper for more details). Source: Searching for hidden information with Gabor Transform in generalized tonic-clonic seizures. Electroencephalogr Clin Neurophysiol. 1997 Oct;103(4):434-9. [Figure: plot of the time series.]

  28. EEGRat_10_1000.txt Each example contains 5 seconds of a two-channel EEG recording at the left and right frontal cortex of male adult WAG/Rij rats. Signals were referenced to an electrode placed at the cerebellum; they were filtered between 1 and 100 Hz and digitized at 200 Hz. The first 5 seconds correspond to a normal EEG (A). The next 5 seconds correspond to spike-wave discharges (E). http://www2.le.ac.uk/departments/engineering/research/bioengineering/neuroengineering-lab/software

  29. EEGRat2_10_1000.txt Each example contains 5 seconds of a two-channel EEG recording at the left and right frontal cortex of male adult WAG/Rij rats. Signals were referenced to an electrode placed at the cerebellum; they were filtered between 1 and 100 Hz and digitized at 200 Hz. The first 5 seconds correspond to spike-wave discharges (E). The next 5 seconds correspond to spike-wave discharges, but from a different rat (C). http://www2.le.ac.uk/departments/engineering/research/bioengineering/neuroengineering-lab/software [Figure: plot of the time series.]

  30. InsectEPG1_50_3802.txt Data from: Machine learning for characterization of insect vector feeding. Willett DS, George J, Willett NS, Stelinski LL, Lapointe SL. Date published: November 18, 2016. DOI: http://dx.doi.org/10.5061/dryad.4931c [Figure: plot of the time series.]

  31. InsectEPG2_50_1800.txt Data from: Machine learning for characterization of insect vector feeding. Willett DS, George J, Willett NS, Stelinski LL, Lapointe SL. Date published: November 18, 2016. DOI: http://dx.doi.org/10.5061/dryad.4931c [Figure: plot of the time series.]

  32. InsectEPG3_50_1710.txt Data from: Machine learning for characterization of insect vector feeding. Willett DS, George J, Willett NS, Stelinski LL, Lapointe SL. Date published: November 18, 2016. DOI: http://dx.doi.org/10.5061/dryad.4931c [Figure: plot of the time series.]

  33. InsectEPG4_50_3160.txt Data from: Machine learning for characterization of insect vector feeding. Willett DS, George J, Willett NS, Stelinski LL, Lapointe SL. Date published: November 18, 2016. DOI: http://dx.doi.org/10.5061/dryad.4931c [Figure: plot of the time series.]

  34. Cane_100_2345.txt Here we simply concatenated data from two different users of a SmartCane. Source: Winston H. Wu, Lawrence K. Au, Brett L. Jordan, Thanos Stathopoulos, Maxim A. Batalin, William J. Kaiser, Alireza Vahdatpour, Majid Sarrafzadeh, Meika Fang, Joshua Chodosh: The SmartCane system: an assistive device for geriatrics. BODYNETS 2008: 2

  35. Powerdemand_12_4500.txt This dataset is 320 days of electrical power demand in an Italian city, beginning in mid-August. At time point 4500, we simply flipped the remaining data left to right.

  36. DutchFactory_12_2184.txt This dataset is one year of electrical power demand in a Dutch facility. We simply flipped the first 91 days left to right. Jarke J. van Wijk, Edward R. van Selow: Cluster and Calendar Based Visualization of Time Series Data. INFOVIS 1999: 4-9
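The "flipped left to right" construction used for the two power demand datasets can be sketched as below. The helper names `flip_tail` and `flip_head` are hypothetical, the indices are Python-style 0-based (so the exact boundary may differ by one from the original Matlab), and the toy array stands in for the real data.

```python
import numpy as np


def flip_tail(series, start: int) -> np.ndarray:
    """Reverse everything from `start` onward (as for Powerdemand at 4500)."""
    out = np.array(series, dtype=float)      # np.array copies its input
    out[start:] = out[start:][::-1].copy()
    return out


def flip_head(series, end: int) -> np.ndarray:
    """Reverse the first `end` points (as for DutchFactory's first 91 days)."""
    out = np.array(series, dtype=float)
    out[:end] = out[:end][::-1].copy()
    return out


toy = np.arange(10)
print(flip_tail(toy, 6))   # last four points reversed
print(flip_head(toy, 4))   # first four points reversed

# The DutchFactory change point, 2184, equals 91 days times 24 samples,
# which suggests (but does not prove) 24 samples per day.
print(91 * 24)
```

Reversing a span keeps its value distribution unchanged, so any detectable regime change must come from the temporal shape of the data rather than from its level or variance.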
