This project, undertaken during the ESRI Summer Internship in 2014 by Raghuveer Nanduri, focuses on enhancing the relevancy of search results for users. By implementing TF-IDF, a frequency-based statistic that assesses word importance within a document, and combining it with clustering techniques, the objective is to improve data visualization in clusters. Important documents related to user queries are prioritized, enabling better understanding by maximizing similarities within clusters while minimizing inter-cluster similarities. The study demonstrates practical applications in managing diverse datasets, such as environmental assessments.
Optimization of Search Results
Raghuveer Nanduri
ESRI Summer Intern 2014
Goal
- To optimize the relevancy of search results
- To help users visualize data in the form of clusters

Idea
- TF-IDF: a frequency-based numerical statistic used to determine the importance of a word in a document.
- Combining TF-IDF with clustering: only the important documents corresponding to the query are considered.
  - Similarity of documents within the same cluster is maximized.
  - Similarity of documents across clusters is minimized.
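To make the TF-IDF step concrete, here is a minimal sketch of how documents might be scored against a query and the top ones kept for clustering. The slides do not show the implementation, so everything here is an assumption for illustration: the function names (`tokenize`, `tf_idf_vectors`, `top_documents`), the whitespace tokenizer, the unsmoothed `log(N / df)` weighting, and the sample documents.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization, no stemming or stop words.
    return text.lower().split()


def tf_idf_vectors(docs):
    """Return one {term: tf-idf weight} dict per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term occurs.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1
        vectors.append({
            # tf = relative frequency in the document, idf = log(N / df).
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


def top_documents(query, docs, k=2):
    """Rank documents by the summed tf-idf weight of the query terms."""
    vectors = tf_idf_vectors(docs)
    terms = tokenize(query)
    scores = [sum(vec.get(t, 0.0) for t in terms) for vec in vectors]
    ranked = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    # Keep only documents that actually match at least one query term.
    return [(i, scores[i]) for i in ranked[:k] if scores[i] > 0]


# Hypothetical sample documents, not taken from the project data.
docs = [
    "river water quality report",
    "land use survey",
    "water depth measurements for the river",
]
print(top_documents("river water", docs))
```

In this sketch only the documents returned by `top_documents` would be passed on to the clustering step, which matches the idea of clustering just the documents that are important for the query.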
Clustering
• “The process of grouping homogeneous objects.”
• Why do objects appear in the same cluster?
• Spatial or temporal correlation between objects leads to the formation of clusters.
K-means clustering
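The slide names K-means clustering without showing it, so the following is only a minimal sketch of the standard algorithm (assign each point to its nearest centroid, then recompute centroids as cluster means). The function names, the seeding with the first `k` points, and the sample 2-D points are assumptions made for illustration; in the project the points would presumably be TF-IDF vectors of the top documents.

```python
def squared_distance(a, b):
    # Squared Euclidean distance between two equal-length points.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    # Coordinate-wise mean of a non-empty list of points.
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def k_means(points, k, max_iters=100):
    """Return (labels, centroids) for a basic K-means run."""
    # Assumption: seed with the first k points so the run is deterministic.
    centroids = [tuple(p) for p in points[:k]]
    labels = None
    for _ in range(max_iters):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the run has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = mean(members)
    return labels, centroids


# Hypothetical 2-D points forming two well-separated groups.
points = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.2, 4.9), (0.2, 0.1)]
labels, centroids = k_means(points, k=2)
print(labels)
```

Each update step lowers (or leaves unchanged) the total squared distance of points to their centroid, which is how K-means pushes similarity within a cluster up. Real runs usually use random or k-means++ seeding rather than the first `k` points.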
[Diagram: Data Repository (a set of "Meta data" records), TF-IDF based top documents, Clustering]
[Figure: Modified (gptogc) vs. Existing (gptogc)]
[Diagram: Clustering, Clusters 1 to 5. Labels shown: Rivers, water; Land; Air quality; Biological Assessments; Water Depths]
[Figure: Modified (Alberta data) vs. Existing (Alberta data)]
[Diagram: Clustering, Clusters 1 to 3. Labels shown: Regional advisory council recommendation; Information articles about the parks; Different parks present in the area of Alberta]
Further Improvements
• Association and scalar clustering to perform query elaboration.
• Improving relevancy through user feedback.