Web Mining
Shah Mohammad Nur Alam Sawn, 03/03/2014
What is Web Mining?
Discovering desired and useful information from the World Wide Web.
Exploiting Geographical Location Information of Web Pages
Orkut Buyukkokten, Junghoo Cho
Department of Computer Science, Stanford University, Stanford, CA 94305.
Department of Computer Science, Columbia University, New York, 10027.
Ways of exploiting information from the Internet:
Computing geographical information
This database lists the phone numbers of the network administrators of all Class A and Class B domains. From it, the authors extracted the area code of each domain administrator and built a Site-Mapper table that holds area code information for IP addresses in the Class A and Class B ranges.
This database maps cities and townships to a given area code. In some cases, an entire state (e.g., Montana) corresponds to one area code. In other cases, a big city has multiple area codes (e.g., Los Angeles). Scripts then converted this data into a table that keeps, for each area code, the corresponding set of cities/counties.
This database maps each ZIP code to a range of longitudes and latitudes.
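The three tables above can be chained into a single lookup. The following is only a minimal sketch of that idea, not the authors' implementation: every table entry is an invented placeholder, the city-to-ZIP table is an assumed extra link that the slides do not describe, and matching on the first two octets of the IP address is a simplification of the Class A / Class B handling.

```python
from typing import Dict, List, Optional, Tuple

# ((lat_min, lat_max), (lon_min, lon_max))
Box = Tuple[Tuple[float, float], Tuple[float, float]]

# All entries below are illustrative placeholders, not real survey data.
SITE_MAPPER: Dict[str, str] = {"171.64": "650"}            # IP prefix -> area code
AREA_CODE_TO_CITIES: Dict[str, List[str]] = {"650": ["Stanford", "Palo Alto"]}
CITY_TO_ZIPS: Dict[str, List[str]] = {                      # assumed linking table
    "Stanford": ["94305"],
    "Palo Alto": ["94301"],
}
ZIP_TO_BOX: Dict[str, Box] = {
    "94305": ((37.41, 37.44), (-122.19, -122.15)),
    "94301": ((37.43, 37.46), (-122.17, -122.13)),
}


def locate(ip: str) -> Optional[Box]:
    """Return a bounding box covering every ZIP code reachable from the IP."""
    prefix = ".".join(ip.split(".")[:2])
    area_code = SITE_MAPPER.get(prefix)
    if area_code is None:
        return None
    boxes = [
        ZIP_TO_BOX[zip_code]
        for city in AREA_CODE_TO_CITIES.get(area_code, [])
        for zip_code in CITY_TO_ZIPS.get(city, [])
        if zip_code in ZIP_TO_BOX
    ]
    if not boxes:
        return None
    lats = [v for box in boxes for v in box[0]]
    lons = [v for box in boxes for v in box[1]]
    return (min(lats), max(lats)), (min(lons), max(lons))


print(locate("171.64.10.1"))
```

Because one area code can cover several cities, the sketch returns a single box enclosing all of them; a real system might prefer to keep the individual candidates.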
Graphical Interface of Proof of Concept Prototype
Output of search
Wenwen Li, Michael F. Goodchild, Richard L. Church, and Bin Zhou
Process of Web Crawler
A Web crawler is an Internet bot that systematically browses the World Wide Web, typically for the purpose of Web indexing. A Web crawler may also be called a Web spider, or an automatic indexer.
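The systematic browsing described above is commonly organised as a breadth-first traversal of links. The sketch below assumes a hypothetical `fetch_links` callback that returns the outgoing links of a page; here it is backed by a small in-memory dictionary (`TOY_WEB`) instead of real HTTP requests, so it illustrates only the traversal order, not the crawler used in the paper.

```python
from collections import deque
from typing import Callable, Iterable, List


def crawl(seed: str,
          fetch_links: Callable[[str], Iterable[str]],
          max_pages: int = 100) -> List[str]:
    """Visit pages breadth-first from the seed, each page at most once."""
    visited: List[str] = []
    seen = {seed}
    frontier = deque([seed])
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        visited.append(url)            # a real crawler would index the page here
        for link in fetch_links(url):
            if link not in seen:       # skip pages that are already queued
                seen.add(link)
                frontier.append(link)
    return visited


# Toy link graph standing in for the Web (purely illustrative).
TOY_WEB = {"a": ["b", "c"], "b": ["c", "d"], "c": ["a"], "d": []}
order = crawl("a", lambda url: TOY_WEB.get(url, []))
print(order)
```

The `seen` set is what keeps the crawler from looping on cyclic links such as the one from "c" back to "a".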
Form of street address for Identifying target webpages
d1: Distance between p and the location of the first digit of the number block closest to (and before) location p.
d2: Distance between p and the location of the last digit of the first number that appears (for detecting a 5-digit ZIP code), or the last digit of the second number after p if the token distance of the first and second number block equals
r1: regular expression [1-9][0-9]*[\\s\\r\\n\\t]*([a-zA-Z0-9\\.]+[\\s\\r\\n\\t])+
r2: regular expression "city-Pattern "[\\s\\r\\n\\t,]?+
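To see what r1 matches in practice, here is a small sketch that ports it to Python. It assumes the doubled backslashes in the slide are string-literal escaping, so the raw Python pattern uses single backslashes; the function name and the sample sentence are made up for illustration.

```python
import re
from typing import List

# r1 from the slide: a number block followed by one or more
# whitespace-terminated alphanumeric tokens (street number + street name).
R1 = re.compile(r"[1-9][0-9]*[\s\r\n\t]*([a-zA-Z0-9\.]+[\s\r\n\t])+")


def find_street_candidates(text: str) -> List[str]:
    """Return every substring matched by r1, with outer whitespace trimmed."""
    return [match.group(0).strip() for match in R1.finditer(text)]


sample = "Visit 4521 Elm Street Springfield, IL"
print(find_street_candidates(sample))
```

The match stops before "Springfield," because r1 requires each token to be followed by whitespace, not a comma. The pattern is also quite permissive: any number followed by ordinary words will match, which is presumably why it is combined with the distances d1 and d2 and the city pattern r2.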
Station + Num
Key word Station and
Title web page as fire
web page title
Locations of all fire stations obtained by Cyber Miner from the address database
This system searches for location-specific information in Singapore-based Web sites. Users can view their search locations on a satellite map instead of the two-dimensional maps currently used in street directories. The Web-based search engine can search for locations by area name, building name, group of landmark types, business name, and business category. Users can also supply their current coordinates as a parameter so that the search engine returns results ordered by distance from their current location.
Using Google Earth for their search
Keyhole Markup Language (KML) is a file format used to display geographic data in an earth browser such as Google Earth, Google Maps and Google Maps for mobile.
Useful for mobile phones only; it is also a web map service which merges with Google Earth
Google Earth allows tracks and waypoints to be downloaded from GPS devices and creates KML files for the downloaded waypoints and tracks.
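As a rough illustration of what such a KML file contains, the sketch below builds a single Placemark with Python's standard library. The function name and the sample coordinates (roughly central Singapore) are assumptions for the example; note that KML lists coordinates as longitude, latitude, altitude.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"


def placemark_kml(name: str, lat: float, lon: float) -> str:
    """Build a minimal KML document holding one named point."""
    ET.register_namespace("", KML_NS)       # emit KML as the default namespace
    kml = ET.Element(f"{{{KML_NS}}}kml")
    document = ET.SubElement(kml, f"{{{KML_NS}}}Document")
    placemark = ET.SubElement(document, f"{{{KML_NS}}}Placemark")
    ET.SubElement(placemark, f"{{{KML_NS}}}name").text = name
    point = ET.SubElement(placemark, f"{{{KML_NS}}}Point")
    # KML order is longitude,latitude,altitude
    ET.SubElement(point, f"{{{KML_NS}}}coordinates").text = f"{lon},{lat},0"
    return ET.tostring(kml, encoding="unicode")


kml_text = placemark_kml("Sample search result", 1.3521, 103.8198)
print(kml_text)
```

Saving the returned string to a file with a `.kml` extension should let an earth browser display the point.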
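The haversine formula is used here for faster processing.
The distance d between the two points is related to their locations
by the formula:
h = haversin(Δφ) + cos(φ1) cos(φ2) haversin(Δλ) ……(1)
where h = haversin(d/R), R is the radius of the sphere, φ1 and φ2 are the latitudes of the two points, Δφ is their difference in latitude, Δλ is their difference in longitude, and haversin(θ) = sin²(θ/2).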
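Formula (1) can be turned into a distance by inverting the haversine: d = 2R·arcsin(√h). The sketch below shows this in Python; the mean Earth radius of 6371 km and the function names are assumptions for the example, and the slides do not say which radius or language the original system used.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversin(theta: float) -> float:
    """haversin(theta) = sin^2(theta / 2), theta in radians."""
    return math.sin(theta / 2.0) ** 2


def haversine_distance_km(lat1: float, lon1: float,
                          lat2: float, lon2: float) -> float:
    """Great-circle distance between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    delta_phi = phi2 - phi1
    delta_lambda = math.radians(lon2 - lon1)
    # Formula (1): h = haversin(dphi) + cos(phi1) cos(phi2) haversin(dlambda)
    h = haversin(delta_phi) + math.cos(phi1) * math.cos(phi2) * haversin(delta_lambda)
    # Clamp to 1.0 to guard against rounding error before taking arcsin
    return 2.0 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, h)))


# A quarter of the equator: from (0°, 0°) to (0°, 90°E)
quarter = haversine_distance_km(0.0, 0.0, 0.0, 90.0)
print(round(quarter, 2))
```

Sorting search results by this value relative to the user's coordinates gives a distance-ordered result list.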
The database from which these results are taken contains 1652 entries with the following categories: