### Weather Mining

Hayato Akatsuka

• Cluster regions that share a similar climate.

• Each weather station in the United States is an input

• Each station contains more than 50 parameters

• e.g., latitude, longitude, elevation, minimum temperature, maximum temperature, and so on

• 6,000 to 19,000 stations

Input (text file)

Station1 2005/01/01 MaxTemp MinTemp Latitude Longitude Elevation …

Station2 2005/01/01 MaxTemp MinTemp Latitude Longitude Elevation …

Station3 2005/01/01 MaxTemp MinTemp Latitude Longitude Elevation …
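As a rough illustration of reading this input, the sketch below parses one record. It assumes the fields are whitespace-separated and appear in the order shown above; the class name `StationRecord`, the function `parse_line`, and the sample values are hypothetical, and the parameters elided by "…" are simply collected into `extras` without interpretation.

```python
from dataclasses import dataclass, field
from datetime import date, datetime


@dataclass
class StationRecord:
    station: str
    day: date
    max_temp: float
    min_temp: float
    latitude: float
    longitude: float
    elevation: float
    # Remaining parameters are not specified on the slide; kept uninterpreted.
    extras: list = field(default_factory=list)


def parse_line(line: str) -> StationRecord:
    """Parse one whitespace-separated record (assumed field order)."""
    parts = line.split()
    if len(parts) < 7:
        raise ValueError(f"expected at least 7 fields, got {len(parts)}")
    return StationRecord(
        station=parts[0],
        day=datetime.strptime(parts[1], "%Y/%m/%d").date(),
        max_temp=float(parts[2]),
        min_temp=float(parts[3]),
        latitude=float(parts[4]),
        longitude=float(parts[5]),
        elevation=float(parts[6]),
        extras=[float(x) for x in parts[7:]],
    )


# Made-up sample values, for illustration only.
sample = "Station1 2005/01/01 12.5 -3.0 40.7 -74.0 10.0"
record = parse_line(sample)
print(record.station, record.day, record.min_temp)
```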

Output (image)

Clustering

• Euclidean Distance

If you are interested in some particular parameters, adjust k accordingly
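The slide does not define k. One plausible reading, assumed here, is that k is a vector of per-parameter weights applied inside the Euclidean distance, so that parameters of interest count more and others count less. The function name `weighted_euclidean` is hypothetical.

```python
import math


def weighted_euclidean(a, b, k):
    """Euclidean distance with a per-parameter weight k[i] (assumed meaning of k)."""
    if not (len(a) == len(b) == len(k)):
        raise ValueError("a, b, and k must have the same length")
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, k)))


# Two made-up stations described by (MaxTemp, MinTemp).
s1 = [30.0, 10.0]
s2 = [27.0, 14.0]

equal_weights = weighted_euclidean(s1, s2, [1.0, 1.0])  # both parameters count
max_temp_only = weighted_euclidean(s1, s2, [1.0, 0.0])  # MinTemp ignored
print(equal_weights, max_temp_only)
```

Setting a weight to zero drops that parameter from the comparison entirely, which is one way to focus the clustering on a few parameters.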

• Day 1 (Hierarchical Clustering)

• This is the initialization stage.

• Pick a number of clusters

• Then perform hierarchical clustering.
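The Day 1 initialization could be sketched as a small agglomerative (bottom-up) clustering. The slides do not say which linkage was used; this sketch assumes centroid linkage, merging the two clusters whose centroids are closest until the chosen number of clusters remains. The names `centroid` and `hierarchical_clusters` and the sample temperatures are hypothetical.

```python
import math


def centroid(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]


def hierarchical_clusters(points, n_clusters):
    """Merge the two clusters with the nearest centroids until n_clusters remain."""
    if n_clusters < 1:
        raise ValueError("n_clusters must be at least 1")
    clusters = [[p] for p in points]  # start with one cluster per station
    while len(clusters) > n_clusters:
        best = None  # (distance, i, j) of the closest pair found so far
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = math.dist(centroid(clusters[i]), centroid(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


# Made-up one-parameter inputs (e.g. a minimum temperature per station).
day1 = [[-20.0], [-18.0], [0.0], [1.0], [15.0]]
day1_clusters = hierarchical_clusters(day1, 3)
day1_centroids = [centroid(c) for c in day1_clusters]
print(day1_centroids)
```

This naive version recomputes every pairwise distance on each merge, so it is only meant to show the idea; for thousands of stations a library routine such as SciPy's `scipy.cluster.hierarchy.linkage` would be a more practical choice.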

• Day 2 (clustering variant)

• Assign each input to the nearest centroid obtained from the previous day (Day 1 in this case).

• Do not update the centroids during assignment.

• Repeat until all inputs for Day 2 have been assigned.

• Recalculate the centroids.

• Day 3

• Repeat the Day 2 procedure …
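The day-to-day procedure above might look like the following sketch: every input is assigned to the nearest centroid from the previous day while the centroids are held fixed, and only afterwards are the centroids recalculated. The function names and sample values are hypothetical, and keeping the old centroid for a cluster that receives no inputs is an assumption the slides do not address.

```python
import math


def nearest(point, centroids):
    """Index of the centroid closest to point (plain Euclidean distance)."""
    return min(range(len(centroids)), key=lambda c: math.dist(point, centroids[c]))


def next_day(points, prev_centroids):
    """Assign points to the previous day's centroids, then recompute centroids."""
    members = [[] for _ in prev_centroids]
    labels = []
    for p in points:
        c = nearest(p, prev_centroids)  # centroids stay fixed during assignment
        labels.append(c)
        members[c].append(p)
    new_centroids = []
    for c, group in enumerate(members):
        if group:
            dim = len(group[0])
            new_centroids.append(
                [sum(p[i] for p in group) / len(group) for i in range(dim)]
            )
        else:
            # Assumption: an empty cluster keeps its previous centroid.
            new_centroids.append(list(prev_centroids[c]))
    return labels, new_centroids


# Made-up values: centroids from Day 1, then inputs for Day 2 and Day 3.
centroids = [[-19.0], [0.5], [15.0]]
days = [
    [[-17.0], [-15.0], [2.0], [14.0], [16.0]],  # Day 2
    [[-16.5], [3.0], [9.0]],                    # Day 3
]
history = []
for day_points in days:
    labels, centroids = next_day(day_points, centroids)
    history.append((labels, centroids))
print(history)
```

Because each cluster index is carried over from the previous day, a given label refers to the same cluster across days, which is what makes the daily maps comparable.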

• For the same cluster: [Figure: 2nd day, 3rd day, 4th day]

[Figure: Day 1 and Day 2]

• For simplicity, use only one parameter (TMIN), with the number of clusters set to 5.

Output

Hardiness Zone

• Well… there is not much difference between the map I obtained for January and the one for December.

• Simply making a map out of annual data, instead of daily data, might be better.

• Hardiness Map: http://www.arborday.org/treeinfo/zonelookup.cfm
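The suggestion to work from annual rather than daily data could be sketched as a simple aggregation step before clustering. This example reduces daily TMIN readings to one annual minimum per station; choosing the minimum (rather than, say, the mean) is an assumption made here because hardiness zones are conventionally based on annual minimum temperatures. The function name `annual_minimum` and the readings are hypothetical.

```python
def annual_minimum(records):
    """Reduce (station, 'YYYY/MM/DD', tmin) records to {(station, year): lowest tmin}."""
    result = {}
    for station, day, tmin in records:
        key = (station, int(day[:4]))  # year is the first four characters
        if key not in result or tmin < result[key]:
            result[key] = tmin
    return result


# Made-up daily readings for two stations.
daily = [
    ("Station1", "2005/01/01", -3.0),
    ("Station1", "2005/07/01", 18.0),
    ("Station1", "2005/12/31", -7.5),
    ("Station2", "2005/01/01", 5.0),
    ("Station2", "2005/12/31", 2.0),
]
yearly = annual_minimum(daily)
print(yearly)
```

The resulting one-value-per-station table could then be clustered once, instead of producing a separate map for every day.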