CS5525 Data Analytics Crime Data Mining

Presentation Transcript


  1. CS5525 Data Analytics Crime Data Mining Wei Wang, Yi Xiao, ZhenHe Pan, Fang Jin

  2. Outline • Project Motivation • Challenge • Our Approach • Visualization • Conclusion

  3. Project Motivation • What associations exist among crime categories? • What drives the crime rate lower or higher? • How is crime distributed spatially and temporally? • How are crime events correlated with one another? • Which criminal gangs are active around DC?

  4. Outline • Project Motivation • Challenge • Our Approach • Visualization • Conclusion

  5. Challenges • Dataset problems • Wrongly formatted dates • Incomplete descriptions • Inconsistent encoding in the data source • Lack of detailed timestamps; records are resolved only to the day • The huge number of records requires an efficient algorithm to detect association rules • How to extract valuable information from the descriptions? • How to find an effective algorithm to detect similarity across a huge, varied set of records, e.g. robbery, larceny, homicide, and arson descriptions all have different formats.

  6. Outline • Project Motivation • Challenge • Our Approach • Visualization • Conclusion

  7. Our Approach • Time Series Analysis • Text Mining • Association Rule • Similarity Analysis

  8. 1) Time Series Analysis • Months • Weekdays and weekends • Seasons • Outside factors, e.g. unemployment
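
The breakdowns listed on slide 8 can be sketched as simple counts over the record dates. This is only a minimal illustration, assuming each record carries a day-level date; the sample dates and the `season_of` helper (with its season boundaries) are hypothetical and not taken from the project's dataset.

```python
from collections import Counter
from datetime import date

# Hypothetical day-level crime dates (the records have no finer timestamp).
dates = [date(2012, 1, 2), date(2012, 1, 7), date(2012, 7, 14), date(2012, 10, 3)]

def season_of(d):
    """Map a date to a season by month (an assumed definition of the seasons)."""
    seasons = {12: "winter", 1: "winter", 2: "winter",
               3: "spring", 4: "spring", 5: "spring",
               6: "summer", 7: "summer", 8: "summer"}
    return seasons.get(d.month, "autumn")

# Count events per month, per weekday/weekend, and per season.
by_month = Counter(d.month for d in dates)
by_day_type = Counter("weekend" if d.weekday() >= 5 else "weekday" for d in dates)
by_season = Counter(season_of(d) for d in dates)

print(by_month, by_day_type, by_season)
```

An outside factor such as unemployment would need a second series aligned on the same months before any comparison could be made.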

  9-23. 1) Time Series Analysis [figures not captured in transcript]

  24. 2) Text Mining 1. Delete stopwords, such as "a", "the", "and", "on". 2. Count word frequency to get the top frequent words, i.e. those whose support count is higher than 50. 3. Define training-set categories, like: • Time: AM, PM, days of the week, month • Weapon: gun, knife • Clothes: hoodie, T-shirt, cap • Color: black, red • Age: teenager, old • Car brand: Toyota, BMW • Wounded: • Action: 4. Starting from these seeds, expand the feature list by extracting nearby words. 5. Extend the feature list by analyzing crime news from websites.

  25. 2) Text Mining 6. Filter the crime description text through the feature library to keep only the effective words. 7. Set a length threshold for the description text, e.g. 30 effective words. A text below this threshold gives only very general or limited information about the criminal event, so we ignore that event completely. 8. Compare any two crime descriptions whose length ratio is not greater than 20 and find the words they share. If they share more than 5 words, we compute their similarity. Otherwise we discard the pair and treat the two events as independent. 9. Use the computed similarity of each pair of criminal events as our confidence.
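
Steps 1, 2, and 6 to 8 above can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's code: the `FEATURES` set is a tiny hypothetical stand-in for the real feature library, "support count" is read here as the total number of occurrences, and all function names are invented for the example. The default thresholds (50, 30, 20, 5) are the ones quoted on the slides.

```python
import re
from collections import Counter

STOPWORDS = {"a", "the", "and", "on"}  # the examples given on slide 24
# Hypothetical feature library; the real one is grown from seed words and crime news.
FEATURES = {"gun", "knife", "hoodie", "cap", "black", "red", "toyota", "bmw"}

def tokenize(text):
    """Step 1: lowercase, split into alphabetic tokens, drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]

def frequent_words(texts, min_support=50):
    """Step 2: words whose total count across all texts exceeds min_support."""
    counts = Counter(t for text in texts for t in tokenize(text))
    return {w for w, c in counts.items() if c > min_support}

def effective_words(text, features=FEATURES):
    """Step 6: keep only the tokens that appear in the feature library."""
    return [t for t in tokenize(text) if t in features]

def shared_words(a, b, min_len=30, max_ratio=20, min_shared=5):
    """Steps 7-8: return the shared effective words, or None if the pair is skipped."""
    if not a or not b or len(a) < min_len or len(b) < min_len:
        return None  # too little information in at least one description
    if max(len(a), len(b)) / min(len(a), len(b)) > max_ratio:
        return None  # lengths too different to compare
    common = set(a) & set(b)
    return common if len(common) > min_shared else None

a = effective_words("A man in a black hoodie and red cap with a gun left in a Toyota")
b = effective_words("Suspect wore the black hoodie, carried a knife and a gun, drove a BMW")
# Thresholds are lowered here only because the sample texts are short.
print(shared_words(a, b, min_len=3, min_shared=2))
```

With the slide thresholds left at their defaults, the same pair is skipped, since neither sample description reaches 30 effective words.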

  26-28. 2) Text Mining [figures not captured in transcript]

  29. 3) Association Rule • Goal: Explore the association rules among different crime types. • Algorithm: Apply the Apriori algorithm with support threshold = 0.5. Normalize the transactions: treat each day as a basket containing that day's set of crimes, and eliminate the low-support crime events. • Results: Burglary has a strong relationship with assault offenses and robbery. Each time an assault offense occurs, a burglary also occurs.
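
The day-as-basket setup can be illustrated with a small, self-contained Apriori sketch. The four baskets below are made-up data chosen only to show the mechanics, and the `apriori` and `confidence` functions are written for this example rather than taken from the project.

```python
from itertools import combinations

def apriori(baskets, min_support=0.5):
    """Return {itemset: support} for every itemset meeting min_support."""
    baskets = [frozenset(b) for b in baskets]
    n = len(baskets)

    def support(itemset):
        # Fraction of baskets (days) that contain the whole itemset.
        return sum(itemset <= b for b in baskets) / n

    items = {i for b in baskets for i in b}
    level = {frozenset([i]) for i in items}
    level = {s for s in level if support(s) >= min_support}
    frequent, k = {}, 1
    while level:
        frequent.update({s: support(s) for s in level})
        k += 1
        # Join step: unions of frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in frequent for s in combinations(c, k - 1))}
        level = {c for c in candidates if support(c) >= min_support}
    return frequent

def confidence(frequent, antecedent, consequent):
    """Confidence of antecedent -> consequent = support(both) / support(antecedent)."""
    a = frozenset(antecedent)
    return frequent[a | frozenset(consequent)] / frequent[a]

# Hypothetical baskets: the set of crime types reported on each of four days.
days = [
    {"assault", "burglary", "robbery"},
    {"assault", "burglary"},
    {"burglary", "larceny"},
    {"assault", "burglary", "robbery"},
]
freq = apriori(days, min_support=0.5)
print(confidence(freq, {"assault"}, {"burglary"}))
```

In this toy data every day with an assault also has a burglary, so the rule assault -> burglary has confidence 1, which is the kind of pattern the Results bullet describes.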

  30. 3) Association Rule [figure not captured in transcript]

  31. 4) Similarity Analysis Similarity analysis in different dimensions 1. Record normalization based on properties • Temporal: day of week, day of month, AM or PM • Spatial: latitude, longitude, using the Haversine formula to compute the distance between two locations • Category: URC_category, sub_category • Textual: TF-IDF
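
For reference, the Haversine distance mentioned under the spatial dimension can be computed as below. The function name, the argument order (longitude first, matching the order used on the next slide), and the choice of kilometres with a mean Earth radius of 6371 km are assumptions; the slides do not state the unit.

```python
import math

def haversine_km(lon1, lat1, lon2, lat2, radius_km=6371.0):
    """Great-circle distance between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    # Haversine of the central angle between the two points.
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))

# One degree of latitude is roughly 111 km.
print(round(haversine_km(0.0, 0.0, 0.0, 1.0), 2))
```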

  32. 4) Similarity Analysis Similarity analysis in different dimensions 2. Similarity computation
      Similarity = Wt * St + Ws * Ss + Wc * Sc + Wd * Sd
      St = 1/3 * ([diff(day of week) > 2] ? 0 : 1) + 1/3 * ([diff(day of month) > 3] ? 0 : 1) + 1/3 * [diff(phase)]
      Ss = [25 - Haversine(Lon1, Lat1, Lon2, Lat2)] / 25
      Sc = 1/2 * [diff(URC_category)] + 1/2 * [diff(sub_category)]
      Sd = cosine(|D1|, |D2|)
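
One possible reading of the weighted score is sketched below. Several points are assumptions rather than facts from the slides: each bracketed `[diff(...)]` term is read as 1 when the two values match and 0 otherwise, day differences are plain absolute differences, the equal weights are placeholders, the distance is passed in already computed, and the record field names (`dow`, `dom`, `phase`, `urc`, `sub`, `tfidf`) are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0

def temporal_score(a, b):
    """St: three equally weighted checks on day of week, day of month, AM/PM phase."""
    near_dow = 0 if abs(a["dow"] - b["dow"]) > 2 else 1
    near_dom = 0 if abs(a["dom"] - b["dom"]) > 3 else 1
    same_phase = 1 if a["phase"] == b["phase"] else 0
    return (near_dow + near_dom + same_phase) / 3

def spatial_score(distance):
    """Ss: (25 - distance) / 25; as written, it goes negative beyond 25 units."""
    return (25 - distance) / 25

def category_score(a, b):
    """Sc: half credit each for matching category and matching sub-category."""
    return 0.5 * (a["urc"] == b["urc"]) + 0.5 * (a["sub"] == b["sub"])

def similarity(a, b, distance, weights=(0.25, 0.25, 0.25, 0.25)):
    """Weighted sum Wt*St + Ws*Ss + Wc*Sc + Wd*Sd (placeholder weights)."""
    wt, ws, wc, wd = weights
    return (wt * temporal_score(a, b) + ws * spatial_score(distance)
            + wc * category_score(a, b) + wd * cosine(a["tfidf"], b["tfidf"]))

# Two hypothetical records, 5 distance units apart.
r1 = {"dow": 0, "dom": 10, "phase": "PM", "urc": "ROBBERY", "sub": "ARMED",
      "tfidf": {"gun": 1.0, "hoodie": 0.5}}
r2 = {"dow": 1, "dom": 12, "phase": "PM", "urc": "ROBBERY", "sub": "STREET",
      "tfidf": {"gun": 1.0, "hoodie": 0.5}}
print(similarity(r1, r2, distance=5.0))
```

For this pair the component scores are St = 1, Ss = 0.8, Sc = 0.5, and Sd close to 1, so the equal-weight total is about 0.825.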

  33. Outline • Project Motivation • Challenge • Our Approach • Visualization • Conclusion

  34. Visualization • Crime Distribution Revealed on Map • Crime Listing and Searching • Similarity of Crimes

  35. Spatial Marker Clustering • Why cluster? The map is too crowded

  36. Spatial Marker Clustering • Largest cluster size • Second-largest cluster size • Third-largest cluster size • Fourth-largest cluster size • Fifth-largest cluster size • Single crime event

  37-38. Spatial Marker Clustering [figures not captured in transcript]

  39-41. Visualization [figures not captured in transcript]

  42. Outline • Project Motivation • Challenge • Our Approach • Visualization • Conclusion

  43. Conclusion • What is the association among crime categories? Some crime categories are associated with high confidence, e.g. assault offense → burglary. • What drives the crime rate lower or higher? Certain crime categories follow their own patterns, for example: arson is more likely to happen in October. Burglary is higher on Mondays, while arson is higher on Wednesdays. Homicide is more likely during summer, and especially on Saturdays.

  44. Conclusion • How is crime distributed spatially and temporally? DC has the most crime events, accounting for 76%; Fairfax is second with 14%, so these two jurisdictions should have more police. Public order in Alexandria is getting better, while in Arlington it is getting worse. DC keeps the same distribution. • How are crime events correlated with one another? Which criminal gangs are active around DC? High-confidence similarity between crime events can hint that the same criminal gang is involved.
