Artface: Automated Web Client Expectation Classifier

Artface (Automated reorganization to fit approximate client expectations)
Mike Venzke

Artface Goals
• Provide a method for determining the approximate expectation of a web client
• Examine the feasibility of using this information in an automated manner

Description
• Using Open Directory categories, create a model for classifying web pages.
• Fetch, parse, and classify the referring page of local web hits.
• The result is an approximate picture of what visitors expect when they arrive at different parts of your website.
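The first step of the pipeline, pairing each local hit with its referring page, might look like the following sketch. It assumes the web logs are in the common Combined Log Format, which the slides do not state; the function name `referrers_by_path` and the sample host are illustrative only.

```python
import re
from collections import defaultdict

# Assumption: Combined Log Format, i.e.
# ... "METHOD path protocol" status bytes "referrer" "user-agent"
LOG_PATTERN = re.compile(
    r'"(?:GET|POST|HEAD) (?P<path>\S+) [^"]*" \d{3} \S+ "(?P<referrer>[^"]*)"'
)

def referrers_by_path(log_lines, local_host):
    """Map each requested local path to the external referrer URLs seen for it."""
    result = defaultdict(list)
    for line in log_lines:
        match = LOG_PATTERN.search(line)
        if not match:
            continue  # line is not in the assumed format
        referrer = match.group("referrer")
        # Skip missing referrers ("-") and hits referred from the local site itself.
        if referrer in ("", "-") or local_host in referrer:
            continue
        result[match.group("path")].append(referrer)
    return dict(result)
```

Each referrer URL collected this way would then be fetched, parsed, and classified.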

Classification Categories
• Used DMOZ categories
• Already classified web pages; provides good training data.
• Went three levels deep in the directory
• Wanted an approximate expectation, not one so specific that very similar items are treated as different.
• Time and constraints
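Cutting a directory category off at three levels can be sketched as below. The slash-separated path format and the name `truncate_category` are assumptions made for illustration; the slides do not show how categories were stored.

```python
def truncate_category(category, depth=3):
    """Keep only the first `depth` levels of a slash-separated category path."""
    parts = [part for part in category.strip("/").split("/") if part]
    return "/".join(parts[:depth])
```

With this, two closely related deep categories such as `Computers/Programming/Languages/Python` and `Computers/Programming/Languages/Perl` collapse into the same label, which is the effect the slide describes.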

Page Fetching
• Used the SGMLParser class from Python's sgmllib module
• Good at stripping out irrelevant data
• Fast enough
• Easy to use
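The `sgmllib` module was removed in Python 3, so the sketch below uses the standard library's `html.parser.HTMLParser` as a stand-in to show the same idea: keep the visible text of a page and discard markup, scripts, and styles. The class and function names are illustrative, not taken from the original project.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect visible text, ignoring the contents of <script> and <style>."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0   # > 0 while inside a skipped element
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self):
        return " ".join(self._chunks)

def extract_text(html):
    """Return the visible text of an HTML document as one string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return parser.text()
```

The extracted string is what would be handed to the classifier.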

Classification
• Rainbow – LGPL’d Naïve Bayesian text classifier
• Used ~9,000 documents as training data, with the expanded category as the class label.
• ~7,000 test pages taken from the web logs of www.cs.rpi.edu and www.linenplace.com
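To illustrate the kind of model involved, here is a minimal multinomial Naive Bayes text classifier with add-one smoothing. It is not Rainbow and makes no claim about Rainbow's internals; whitespace tokenization and the class name `NaiveBayes` are simplifying assumptions.

```python
import math
from collections import Counter, defaultdict

class NaiveBayes:
    """Tiny multinomial Naive Bayes over whitespace-separated words."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # category -> word -> count
        self.doc_counts = Counter()              # category -> number of documents
        self.vocabulary = set()

    def train(self, category, text):
        words = text.lower().split()
        self.word_counts[category].update(words)
        self.doc_counts[category] += 1
        self.vocabulary.update(words)

    def classify(self, text):
        """Return the category with the highest log-posterior, or None if untrained."""
        words = text.lower().split()
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocabulary)
        best_category, best_score = None, float("-inf")
        for category, doc_count in self.doc_counts.items():
            counts = self.word_counts[category]
            total_words = sum(counts.values())
            score = math.log(doc_count / total_docs)  # log prior
            for word in words:
                # Add-one (Laplace) smoothing so unseen words do not zero the score.
                score += math.log((counts[word] + 1) / (total_words + vocab_size))
            if score > best_score:
                best_category, best_score = category, score
        return best_category
```

In the setup described on the slide, the training documents would be the DMOZ-listed pages and each category label would be the expanded category path.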

Data Results
• Fairly accurate results
• http://webgraph.canbelearned.com

Automation Possibilities
• Determine ‘good’ categories by self-site classification or user input
• Track traffic from ‘good’ categories and provide higher-level links to local pages.
• Set of bad categories is small and generally universal.
• Take action against local sites based on how they’re being used, not what they have.
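One way to act on these ideas is to tally the referrer categories seen for each local path and flag paths whose traffic comes mostly from ‘bad’ categories. The sketch below is hypothetical: the function names, the example category labels, and the 50% threshold are all assumptions, not values from the project.

```python
from collections import Counter, defaultdict

def expectation_profile(hits):
    """Build path -> Counter of referrer categories from (path, category) pairs."""
    profile = defaultdict(Counter)
    for path, category in hits:
        profile[path][category] += 1
    return profile

def paths_with_bad_traffic(profile, bad_categories, min_share=0.5):
    """Return local paths where at least `min_share` of hits come from bad categories."""
    flagged = []
    for path, counts in profile.items():
        total = sum(counts.values())
        bad = sum(n for category, n in counts.items() if category in bad_categories)
        if total and bad / total >= min_share:
            flagged.append(path)
    return sorted(flagged)
```

The same profile could feed the ‘good’ category case: paths that draw steady traffic from good categories are candidates for higher-level links.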

Automation Possibilities (cont'd)
• Provide custom pages based on what the user expected, rather than what the page contains.
• May not have found what they wanted.
• May be interested in a broader topic.
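A custom page could offer links chosen from the visitor's expected category, falling back to a broader parent category when nothing matches exactly. This is only a sketch of that fallback; `suggest_links` and the `links_by_category` mapping are hypothetical names, and the slides do not describe how such links would be stored.

```python
def suggest_links(expected_category, links_by_category):
    """Return links for the expected category, or for its nearest broader ancestor."""
    parts = [part for part in expected_category.split("/") if part]
    while parts:
        key = "/".join(parts)
        if key in links_by_category:
            return links_by_category[key]
        parts.pop()  # broaden the topic by dropping the most specific level
    return []
```

Walking up the category path covers the case where the visitor is interested in a broader topic than the page they landed on.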

Process Enhancement Ideas
• More training data
• Use all levels of DMOZ data, but push classifications up to a threshold level.
• Handle more page errors
• Scripting and authentication errors produce false data.
• Remove or special-parse ‘classless’ information pages
• Search engines
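One possible reading of "special-parse" for search engine referrers is to pull the search terms out of the referrer URL rather than classify the results page itself. The sketch below assumes the query is carried in a URL parameter; the parameter names listed are guesses for illustration and would need checking against real referrers.

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical list of query-string parameters that may carry the search terms.
SEARCH_QUERY_PARAMS = ("q", "query", "p")

def search_terms(referrer_url):
    """Return the search query embedded in a referrer URL, or None if there is none."""
    query = parse_qs(urlparse(referrer_url).query)
    for name in SEARCH_QUERY_PARAMS:
        if name in query:
            return query[name][0]
    return None
```

A referrer that yields search terms could either be dropped from the data or have the terms themselves passed to the classifier.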
