Rafael J. Fernández-Moctezuma April 3, 2007

Have we met? Trying to find similar events in an archival database in the context of travel time estimation. Rafael J. Fernández-Moctezuma, April 3, 2007

Acknowledgements • Kristin Tufte for introducing me to the “fetch similar” problem, and for helping break the problem into smaller pieces to solve. • Umut Ozertem (OGI) for valuable discussion from a Machine Learning perspective.

Contents • Issues in estimating travel time • “Instantaneous estimate” • A posteriori estimate • Combining archived data efficiently • Imposing structure to minimize the search space • Preliminary results • Concurrent / Future work

Motivation: Travel time [Figure: roadway with the direction of flow marked]

Travel time: Inductive loop detectors measure speed, occupancy, and volume. These are the three fundamental quantities used to reason theoretically about traffic flow. [Figure: stations A, B, C with measurements MA, MB, MC along the direction of flow]

Travel time [Figure: stations A, B, C with measurements MA, MB, MC along the direction of flow, with the region of influence marked]

Travel time: A Variable Message Sign (VMS) could inform drivers of estimated travel times. This is useful before intersections, where alternate routes may be selected if considerable delay lies ahead. [Figure: segment from α to Ω with a VMS, stations A, B, C with measurements MA, MB, MC, the region of influence, and the direction of flow]

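Travel time [Figure: segment from α to Ω with a VMS and the points AB and BC marked, stations A, B, C with measurements MA, MB, MC, the region of influence, and the direction of flow]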

Instantaneous estimate: At time t0, calculate the travel time from α to Ω as the sum of the times across the regions (i.e., time from α to AB + time from AB to BC + time from BC to Ω), using the conditions measured at t0. However, by the time the vehicle arrives at AB, the conditions measured at B may have changed. [Figure: timeline from t0 to tf over the segment α, AB, BC, Ω, with the VMS, stations A, B, C, measurements MA, MB, MC, and the direction of flow]
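A minimal Python sketch of the instantaneous estimate, assuming each region has a known length and a single speed reading taken at t0. The function name, the region lengths, and the speeds are hypothetical illustrations, not values from this study.

```python
def instantaneous_travel_time(lengths_miles, speeds_mph):
    """Travel time in minutes, using only the speeds measured at one instant."""
    if len(lengths_miles) != len(speeds_mph):
        raise ValueError("one speed per region is required")
    hours = 0.0
    for length, speed in zip(lengths_miles, speeds_mph):
        if speed <= 0:
            raise ValueError("speeds must be positive")
        hours += length / speed  # time to cross this region at its current speed
    return hours * 60.0


# Hypothetical region lengths (miles) and speeds (mph) reported at t0.
minutes = instantaneous_travel_time([1.0, 1.5, 1.2], [60.0, 30.0, 60.0])
```

Every term in the sum uses a reading from the same instant, which is exactly why the estimate can drift from what a vehicle later experiences.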

A posteriori estimate: At time t0, calculate the travel time from α to AB (arriving at, say, t1). At time t1, look at the condition reported by B and calculate the time between AB and BC, and so on. This is closer to the travel time a vehicle actually experienced, but the estimate cannot be computed online at α for the complete segment, since we cannot see the future. [Figure: timeline t0, t1, t2, tf over the segment α, AB, BC, Ω, with the VMS, stations A, B, C, measurements MA, MB, MC, and the direction of flow]
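A rough Python sketch of the a posteriori walk over archived readings. It assumes each station's archive is a sorted list of timestamps (in minutes) with one speed per timestamp, and that the speed stays constant while a vehicle crosses a region. All names and numbers are hypothetical.

```python
import bisect


def speed_at(times, speeds, t):
    """Most recent archived speed reported at or before time t."""
    i = bisect.bisect_right(times, t) - 1
    if i < 0:
        raise ValueError("no reading at or before t")
    return speeds[i]


def a_posteriori_travel_time(t0, lengths_miles, station_series):
    """Walk the regions in order, reading each station when the vehicle arrives."""
    t = t0  # current clock time in minutes
    for length, (times, speeds) in zip(lengths_miles, station_series):
        t += 60.0 * length / speed_at(times, speeds, t)
    return t - t0


# Two one-mile regions; the second station slows down one minute after t0.
series = [
    ([0.0, 5.0], [60.0, 60.0]),
    ([0.0, 1.0, 5.0], [60.0, 30.0, 30.0]),
]
minutes = a_posteriori_travel_time(0.0, [1.0, 1.0], series)
```

With these made-up readings the walk gives 3 minutes, while summing the t0 speeds alone would give 2, which is the gap between the two estimates in miniature.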

Archived data is useful • It is possible to compute a posteriori estimates for previously observed measurements. • This opens the possibility for incorporating previously seen travel times (associated with instantaneous measurements) for online estimation.

Identical vs. Similar • We cannot guarantee that all possible combinations of measured values have been observed already. • We would also like the recall of relevant data points to be a fast process: we don’t want to go through the entire history at every refresh. • ML people constantly complain about “not having enough data”. In this case, we have a lot of data and we wish to quickly extract a representative sample.

Let’s step out and think in general terms: A model system [Figure: block diagram with a data stream of measurements, a selection mechanism, the archive, similar historical measurements, a fusion mechanism, and the resulting estimation]

A model system: How can we do this efficiently? [Figure: block diagram with a data stream of measurements, a selection mechanism, the archive, similar historical measurements, a fusion mechanism, and the resulting estimation]

A model system: What is a reasonable strategy? [Figure: block diagram with a data stream of measurements, a selection mechanism, the archive, similar historical measurements, a fusion mechanism, and the resulting estimation]

A model system: Our effort so far has concentrated on this section (indicated in the figure). [Figure: block diagram with a data stream of measurements, a selection mechanism, the archive, similar historical measurements, a fusion mechanism, and the resulting estimation]

Impose structure in the data archive • Databases are very efficient when we know what to ask (e.g., “value >= 20” benefits greatly from an index lookup, if an index exists). • Can we index “similarity”? • Consider imposing structure on previously seen information. We can be clever about what to index and reduce the search space on the fly.

Similar as “close” in a vector space: Retrieve the 3 nearest points to the new point. For n existing points in the space, we need n comparisons. This problem is referred to as k-nearest neighbors. We can define “close” in terms of Euclidean distance.
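The brute-force search can be sketched in a few lines of Python. This is an illustration with made-up points, not the prototype's code. It computes one Euclidean distance per existing point and keeps the k smallest.

```python
import math


def k_nearest(points, query, k=3):
    """Indices of the k points closest to query, nearest first."""
    order = sorted(range(len(points)), key=lambda i: math.dist(points[i], query))
    return order[:k]


# Hypothetical two-dimensional points.
points = [(0, 0), (1, 1), (5, 5), (0.5, 0), (9, 9)]
nearest = k_nearest(points, (0, 0), k=3)
```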

What if we can prune off some points? Suppose we are given regional boundaries. It is now feasible to look first at which one is the closest region, and then perform the search within it. The outcome of clustering can give us such boundaries.
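A sketch of the two-step lookup in Python, assuming the regions are already given as a list of centroids plus the member indices of each region (all names and values here are hypothetical). Note that this is an approximation: a true nearest neighbor can sit just across a boundary in a neighboring region.

```python
import math


def nearest_region(centroids, query):
    """Index of the centroid closest to the query point."""
    return min(range(len(centroids)), key=lambda c: math.dist(centroids[c], query))


def k_nearest_in_region(points, regions, centroids, query, k=3):
    """Pick the closest region first, then search only among its members."""
    region = nearest_region(centroids, query)
    members = sorted(regions[region], key=lambda i: math.dist(points[i], query))
    return members[:k]


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 12), (13, 10)]
regions = [[0, 1, 2], [3, 4, 5]]          # member indices per region
centroids = [(0.3, 0.3), (11.0, 10.7)]    # one representative per region
result = k_nearest_in_region(points, regions, centroids, (9, 9), k=2)
```

The cost becomes one comparison per centroid plus one per member of the chosen region, instead of one per archived point.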

K-means: One of the simplest algorithms for clustering. It attempts to minimize the variance among the members of each cluster while maximizing the variance between clusters. It finds the centroids of k clusters; these centroids need not be observed points. For this example, an initial comparison with three centroids reduces the search space to ~1/3.

K-means: The random start of k-means implies several limitations, three of which stand out: (1) For a particular choice of k, there can be more than one solution. (2) It is possible to end up with empty clusters. (3) The initial choice of centroids can be problematic (bad derivative). We can cope with these limitations by doing several Monte Carlo runs.
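One way to sketch k-means with repeated random starts in Python. This uses the standard Lloyd iteration and keeps the run with the lowest within-cluster sum of squares. The selection rule, the function names, and the sample points are assumptions for illustration; the MATLAB prototype may differ.

```python
import math
import random


def kmeans_once(points, k, rng, max_iter=100):
    """One run of Lloyd's algorithm from a random start."""
    centroids = rng.sample(points, k)  # start from k observed points
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its closest centroid.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its old centroid
                centroids[c] = tuple(sum(xs) / len(members) for xs in zip(*members))
    return centroids, labels


def within_cluster_ss(points, centroids, labels):
    """Sum of squared distances from each point to its centroid."""
    return sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))


def best_of_restarts(points, k, runs=20, seed=0):
    """Repeat k-means and keep the best run that has no empty cluster."""
    rng = random.Random(seed)
    best = None
    for _ in range(runs):
        centroids, labels = kmeans_once(points, k, rng)
        if len(set(labels)) < k:
            continue  # discard runs that ended with an empty cluster
        score = within_cluster_ss(points, centroids, labels)
        if best is None or score < best[0]:
            best = (score, centroids, labels)
    return best  # None if every run was discarded


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
score, centroids, labels = best_of_restarts(points, 2)
```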

May not be a small enough set for KNN: What if we could further reduce the search within a cluster? Suppose a torus is defined in terms of distances to the cluster centroid. We have already computed the archived points’ distances in pre-processing, and we have computed the novelty point’s distance d when classifying it as well. Only do KNN with points between the two radii (d - λ and d + λ).
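A small Python sketch of the torus filter, assuming the members of a cluster are sorted by their distance to the centroid during pre-processing so that the band d ± λ can be cut out with two binary searches. The helper names and the points are hypothetical.

```python
import bisect
import math


def build_cluster_index(points, centroid):
    """Pre-processing: sort member indices by distance to the centroid."""
    pairs = sorted((math.dist(p, centroid), i) for i, p in enumerate(points))
    radii = [r for r, _ in pairs]
    order = [i for _, i in pairs]
    return radii, order


def torus_candidates(radii, order, d, lam):
    """Members whose centroid distance lies in [d - lam, d + lam]."""
    lo = bisect.bisect_left(radii, d - lam)
    hi = bisect.bisect_right(radii, d + lam)
    return order[lo:hi]


centroid = (0.0, 0.0)
points = [(1, 0), (0, 2), (3, 0), (0, 4), (5, 0)]
radii, order = build_cluster_index(points, centroid)
d = math.dist((0.0, 3.2), centroid)  # distance of the new point to the centroid
candidates = torus_candidates(radii, order, d, lam=1.0)
```

By the triangle inequality, any member within distance λ of the new point has a centroid distance between d - λ and d + λ, so the band cannot drop a neighbor that is closer than λ.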

Back to transportation • Input vector for a particular time t, in a segment s that contains n sensor stations • What are we clustering? • The input vectors looking only at the n fundamental measurements. • Travel time is our measure of interest, i.e., target for prediction. Clustering is “blind” to it.

Considerations • I said that clustering is “blind” to the a posteriori estimate of travel time • This much is true, but we want clusters that help us predict travel time. • The core assumption is that the fundamental measurements at a time t are related to the travel time (they are, we just haven’t expressed that yet).

Considerations • Look at the difference in travel times among members of a cluster. This helps us choose a suitable number of clusters. • We could use variance, but (1) it grows quadratically, and (2) it is not intuitive. • The proposed error function should decrease smoothly as the number of clusters increases. • The error function is saying “+/- 3 is all the same to me.”
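The slides do not give the formula of the error function, so the Python sketch below is only one hypothetical function with the stated tolerance property: deviations from the cluster mean of up to a tolerance count as zero, and larger ones grow linearly rather than quadratically. The name, the use of the cluster mean, and the sample values are all assumptions.

```python
def tolerant_error(travel_times, labels, tolerance=3.0):
    """Mean deviation from the cluster mean, ignoring deviations within the tolerance."""
    groups = {}
    for tt, lab in zip(travel_times, labels):
        groups.setdefault(lab, []).append(tt)
    total = 0.0
    for members in groups.values():
        mean = sum(members) / len(members)
        # Anything within +/- tolerance of the mean contributes nothing.
        total += sum(max(0.0, abs(tt - mean) - tolerance) for tt in members)
    return total / len(travel_times)


# Hypothetical a posteriori travel times, in the same units as the tolerance.
times = [10.0, 12.0, 20.0, 30.0]
one_cluster = tolerant_error(times, [0, 0, 0, 0])
two_clusters = tolerant_error(times, [0, 0, 1, 1])
```

Evaluating a function of this kind for increasing k gives a curve from which a suitable number of clusters could be read off.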

Prototype implementation • Looked at US 26 E sub-segment • Morning period that includes peak: 06:00 – 11:00 • Treated one day as historical, tested on another day (Oct. 9 2006 and Oct. 12 2006) • Careful: if we just shuffle points to estimate performance, we are fooling ourselves – the fitting process may have seen past and future. For this domain, if we are to simulate data loss, always leave the test day(s) out.
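The leave-the-test-day-out rule can be sketched in Python as a split keyed on the day rather than on shuffled points. The record layout and the travel time values are hypothetical, and which of the two days serves as the test day is an assumption made for the example.

```python
def split_by_day(records, test_days):
    """Hold out whole days so the fitting set never sees the test day."""
    held_out = set(test_days)
    fit = [r for r in records if r["day"] not in held_out]
    test = [r for r in records if r["day"] in held_out]
    return fit, test


# Hypothetical records; the travel times are made up.
records = [
    {"day": "2006-10-09", "time": "06:00", "travel_time": 4.0},
    {"day": "2006-10-09", "time": "08:00", "travel_time": 11.0},
    {"day": "2006-10-12", "time": "06:00", "travel_time": 4.5},
    {"day": "2006-10-12", "time": "08:00", "travel_time": 9.0},
]
fit_set, test_set = split_by_day(records, ["2006-10-12"])
```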

US 26 E: Looked at three stations: Cornell, Murray, and Cedar Hills. Hypothetical VMS between 185th and Cornell, with target destination between Cedar Hills and Parkway. Segment length: 3.7 miles. Believe it or not, it can take up to 30 minutes during rush hour (been there, done that). Image from http://maps.google.com/

Choice of k • As expected, the error function drops as the number of clusters increases. [Figure: error function against k, with a range of k marked “suitable”]

Simplifying criteria • Radii determining the torus centered around the centroid: R1 = d/2 and R2 = 3d/2, where d is the distance from the novelty point to the centroid (i.e., λ = d/2 in d ± λ). • “Fusion mechanism” is the average of the travel times of the 3 nearest neighbors within the torus.
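Putting these criteria together, a sketch of the whole lookup in Python. It assumes each archived point is stored with its feature vector, its a posteriori travel time, and its pre-computed distance to its cluster centroid. The data layout, the names, and the fallback when fewer than three candidates are found are assumptions, not details of the MATLAB prototype.

```python
import math


def estimate_travel_time(query, centroids, clusters, k=3):
    """Average travel time of the k nearest neighbors inside the torus."""
    # Selection, step 1: closest centroid and the distance d to it.
    c = min(range(len(centroids)), key=lambda j: math.dist(query, centroids[j]))
    d = math.dist(query, centroids[c])
    r1, r2 = d / 2.0, 3.0 * d / 2.0
    # Selection, step 2: keep members whose centroid distance is in [R1, R2].
    candidates = [(f, tt) for f, tt, r in clusters[c] if r1 <= r <= r2]
    candidates.sort(key=lambda item: math.dist(item[0], query))
    nearest = candidates[:k]
    if not nearest:
        return None  # assumption: no estimate when the torus is empty
    # Fusion: plain average of the neighbors' travel times.
    return sum(tt for _, tt in nearest) / len(nearest)


centroids = [(0.0, 0.0), (100.0, 100.0)]
clusters = [
    [  # (features, travel time, distance to centroid), all hypothetical
        ((2.0, 0.0), 10.0, 2.0),
        ((0.0, 2.0), 20.0, 2.0),
        ((3.0, 0.0), 12.0, 3.0),
        ((2.5, 0.0), 14.0, 2.5),
        ((10.0, 0.0), 50.0, 10.0),
        ((0.5, 0.0), 5.0, 0.5),
    ],
    [],
]
estimate = estimate_travel_time((2.0, 0.0), centroids, clusters)
```

Here d = 2, so the torus spans radii 1 to 3; the points at radii 10 and 0.5 are never compared against the query.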

Experimental results: The trends during the peak period are followed correctly. Unsurprisingly, the early peak is only somewhat captured, since the fitting set did not have one. Still, ups and downs are discovered. ERRATA: LABELS REVERSED

Fitting set timeseries

Ongoing work (suggested by Kristin) • Looking at probe runs and comparing the measured times with a posteriori estimates; previous efforts were made with instantaneous estimates only. Curious as to whether the a posteriori estimate is significantly different from the instantaneous one. • Pick a larger dataset: OR 217 has better sensor density. Test over one month or so.

Future work • Current prototype is in MATLAB – should I start getting familiar with Niagara? • Any better ideas for the “fusion”? Should this just be an extra parameter? (“pick k nearest neighbors”) • The radius estimate can be a potential problem. Any suggestions? Should this just be one more parameter to find during fitting?

Future work • Is a torus the right shape? How about a hypercone? Could it be easily derived on the fly from pre-computed information? • We have considered expanding the feature vector (temperature, precipitation, etc.). These measurements are updated hourly, and are sometimes available only the next day. Any other sources that may make sense?
