Internet and WWW Services
Presentation Transcript

  1. Internet and WWW Services • Security • Types of Services • Vended versus Internally Provided • Costs and Benefits • Servers and Clients • Potential Problems • Stats

  2. General Network Security • Isolated Servers • Restricted Subnets • Firewalls • Proxy Servers

  3. WWW Application Security • OS Level • Server Level • Program Level

  4. Types of WWW Services • Static Data • Server Search Engines • Dynamic Data • Server Applications • Java Enabled

  5. Vended • Which Vendor • How Much Do They Do • HTML • Graphics • Design & Layout • Programming • Bandwidth • Total • Dedicated

  6. Internally Provided WWW Server • For whom? • How many services, how much traffic? • For what use (scope the server)?

  7. Cost of a WWW Service • Server Usage • Disk Space • Network Bandwidth • Router or LAN Load • Application Development with Limited Capabilities • Application Development with Limited Standardization

  8. Benefits • High-touch, High Impact Narrow-casting • Kiosks • Fast, Simple Apps From Central Server • Built-in Protocols • Potentially Large Installed Client Base

  9. Shopping List • Server Machine and O/S • Network Access • WWW Server • WWW Client • Server Programming Tools • Data and/or Databases

  10. Which Server Platform? • Unix • NT

  11. Which Server? • CERN (httpd) • Microsoft • Netscape (Communications Server or Commerce Server) • O’Reilly • WebForce • Oracle WebServer

  12. Client Compliance Level • HTML 2.0 • HTML 3.0 • Netscape Enhancements • Java • Lynx (Text Browser)

  13. CGI-BIN Risks • Dangerous Programs or Scripts • User-supplied Programs or Scripts

  14. Robots and Other Network Creatures • Problems with “Automated Agents” • Deterring Robots • Reacting to Robots
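The usual way to deter well-behaved robots is the robots exclusion convention. The site publishes a `/robots.txt` file that lists `Disallow` path prefixes for each `User-agent`. The file is advisory only, so badly behaved robots still have to be dealt with by reacting to them on the server side.

The sketch below is minimal and makes several assumptions. It reads only `User-agent` and `Disallow` lines and matches paths by plain prefix, with no wildcards and no `Allow` lines. The class name `RobotsRules` is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not a standard library class): reads a simplified
// /robots.txt and answers whether one robot may fetch a given path.
class RobotsRules {
    private final List<String> disallowed = new ArrayList<>();

    RobotsRules(String robotsTxt, String agent) {
        boolean inGroup = false;       // does the current record apply to this agent?
        boolean lastWasAgent = false;  // consecutive User-agent lines share one record
        for (String raw : robotsTxt.split("\\r?\\n")) {
            String line = raw.replaceAll("#.*", "").trim();  // strip comments
            int colon = line.indexOf(':');
            if (colon < 0) continue;                          // blank or malformed line
            String field = line.substring(0, colon).trim().toLowerCase();
            String value = line.substring(colon + 1).trim();
            if (field.equals("user-agent")) {
                if (!lastWasAgent) inGroup = false;           // a new record begins
                if (value.equals("*") || value.equalsIgnoreCase(agent)) inGroup = true;
                lastWasAgent = true;
            } else {
                lastWasAgent = false;
                // An empty Disallow value means "allow everything", so skip it.
                if (field.equals("disallow") && inGroup && !value.isEmpty()) {
                    disallowed.add(value);
                }
            }
        }
    }

    // Simple prefix match; wildcards and Allow lines are not handled.
    boolean isAllowed(String path) {
        for (String prefix : disallowed) {
            if (path.startsWith(prefix)) return false;
        }
        return true;
    }
}
```

This sketch merges the rules from the `*` record with those from the robot's own record. That makes it stricter than crawlers that honor only the most specific matching record.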

  15. WWW Server Stats

  16. WWW Server Stats
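Server statistics like these are typically derived from the server's access log. The sketch below tallies requests per path. It assumes the log uses the NCSA Common Log Format, `host ident authuser [date] "request" status bytes`. The class `LogStats` is hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: counts requests per path in access-log lines written in
// the NCSA Common Log Format: host ident authuser [date] "request" status bytes
class LogStats {
    private static final Pattern CLF = Pattern.compile(
        "^\\S+ \\S+ \\S+ \\[[^\\]]+\\] \"\\S+ (\\S+)[^\"]*\" \\d{3} \\S+");

    static Map<String, Integer> hitsPerPath(Iterable<String> lines) {
        Map<String, Integer> hits = new TreeMap<>();   // sorted by path for readable reports
        for (String line : lines) {
            Matcher m = CLF.matcher(line);
            if (!m.find()) continue;                    // skip malformed lines
            hits.merge(m.group(1), 1, Integer::sum);
        }
        return hits;
    }
}
```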

  17. Web Mining: Web-based information extraction

  18. Why the Web (web = web browser) • Ubiquitous: • Web browsers are on every desktop, every PC, Mac, workstation, and terminal. • Platform independence: • Use of Java and server-side programs means clicking on a button does the same thing everywhere.

  19. Data Cleansing • Natural Language • Text Mining • Data Compression • News Services • Decision Trees • Multidimensional vectors • Factorial Analysis • Markov objects • ID3 • Tri-Grams • Keyword Search • Word frequency • Hypothesis Verification • Hidden Information • Data warehouse • Tri-Letter Sets

  20. (Flowchart) DATA → Cleansed Data → Extracted Data → Display Results

  21. What Kind of Data? • Usenet News • Most sites hold multiple gigabytes of news • System accounting files • Can tell who is doing what, and when • Misc. Web pages • A variety of interesting information • Listserver or public system email • We keep email concerning system problems

  22. Cleansing Data • News article • NNTP fields • signatures • Web Page • HTML codes • descriptions of links to other sites • pattern fields (headers and trailers that appear on every page at the site)
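For a news article, the cleansing described above mostly means cutting away the NNTP header block and the signature. The sketch below makes two assumptions. First, the headers end at the first blank line. Second, the signature starts at a line that reads exactly `-- ` (the usual delimiter) or `--`. `NewsCleaner` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: keeps only the body text of a raw Usenet article.
class NewsCleaner {
    static String cleanArticle(String raw) {
        String[] lines = raw.split("\r?\n", -1);
        List<String> body = new ArrayList<>();
        boolean inHeaders = true;
        for (String line : lines) {
            if (inHeaders) {
                if (line.isEmpty()) inHeaders = false;  // blank line ends the NNTP header block
                continue;
            }
            if (line.equals("-- ") || line.equals("--")) break;  // signature starts here
            body.add(line);
        }
        return String.join("\n", body).trim();
    }
}
```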

  23. Mining for data • Test hypothesis • Look for hidden information • Find other similar information
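"Find other similar information" is often done by comparing word-frequency vectors, one of the techniques listed on slide 19. The minimal sketch below uses cosine similarity. The tokenization rule (lower-cased runs of ASCII letters and digits) and the class name `Similarity` are assumptions.

```java
import java.util.HashMap;
import java.util.Map;

class Similarity {
    // Lower-cased word counts; a "word" is a run of ASCII letters/digits (an assumption).
    static Map<String, Integer> wordFreq(String text) {
        Map<String, Integer> freq = new HashMap<>();
        for (String w : text.toLowerCase().split("[^a-z0-9]+")) {
            if (!w.isEmpty()) freq.merge(w, 1, Integer::sum);
        }
        return freq;
    }

    // Cosine of the angle between two frequency vectors:
    // 1.0 = same word mix, 0.0 = no words in common.
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0, normA = 0, normB = 0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            normA += (double) e.getValue() * e.getValue();
            Integer other = b.get(e.getKey());
            if (other != null) dot += (double) e.getValue() * other;
        }
        for (int v : b.values()) normB += (double) v * v;
        if (normA == 0 || normB == 0) return 0.0;
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }
}
```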

  24. Display of Information • Graphical • Text Listing • Directories: human maintained categories • e.g.: recreation, computers, finances, arts • Computer generated list • Customized • User defined defaults • Cookie defined defaults
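Cookie-defined defaults reach the server in the browser's `Cookie` request header as `name=value` pairs separated by semicolons. The minimal sketch below assumes a hypothetical cookie named `display` that holds the user's preferred display style. When that cookie is absent, it falls back to a site default.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class DisplayPrefs {
    // Parses a Cookie request header such as "display=text; topics=computers".
    static Map<String, String> parseCookieHeader(String header) {
        Map<String, String> cookies = new LinkedHashMap<>();
        if (header == null) return cookies;
        for (String pair : header.split(";")) {
            int eq = pair.indexOf('=');
            if (eq <= 0) continue;                      // skip malformed pieces
            cookies.put(pair.substring(0, eq).trim(), pair.substring(eq + 1).trim());
        }
        return cookies;
    }

    // The cookie value wins; otherwise use the site-wide default.
    // "display" is a hypothetical cookie name chosen for this sketch.
    static String displayStyle(String cookieHeader, String siteDefault) {
        return parseCookieHeader(cookieHeader).getOrDefault("display", siteDefault);
    }
}
```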

  25. (Flowchart) Data and Services → Learning to use services → Learning to extract data from the answer → Compile and clean data → Display Results

  26. What Services? • Search Engines • Internet White Pages • (information on individuals) • Internet Yellow Pages • (information on corporations) • Usenet News repositories • Online libraries • Online periodicals

  27. Learning to use Services • Sample sets of data • the system can derive an answer format if taught with them • Machine learning (same as in Data Mining) • consider every interpretation and choose the one that conveys the most information

  28. Learning to interpret answers • What format is the information given in? • What do the fields mean? • Unknown fields can be identified by matching the data against known information.
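Matching returned data against known information can be sketched as follows. Query the service about something whose attributes you already know, then label each column of the answer by where those known values show up. This is a minimal illustration under that assumption. The names `FieldMatcher` and `labelColumns` are hypothetical, and a real learner would combine evidence from many answers.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class FieldMatcher {
    // knownFacts: attribute name -> value we already know (hypothetical example data).
    // answerRow: one row returned by the service, fields in an unknown order.
    // Returns attribute name -> column index where its known value was found.
    static Map<String, Integer> labelColumns(Map<String, String> knownFacts, List<String> answerRow) {
        Map<String, Integer> labels = new HashMap<>();
        for (Map.Entry<String, String> fact : knownFacts.entrySet()) {
            for (int col = 0; col < answerRow.size(); col++) {
                if (answerRow.get(col).trim().equalsIgnoreCase(fact.getValue().trim())) {
                    labels.put(fact.getKey(), col);   // first matching column wins
                    break;
                }
            }
        }
        return labels;
    }
}
```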

  29. Compile and Clean Data • Redundancies • Duplicates • Newer information has precedence
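Collapsing duplicates and letting newer information take precedence amounts to a keyed merge. The minimal sketch below assumes each compiled fact carries a key and a last-updated timestamp. `RecordMerger` and `Fact` are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class RecordMerger {
    // One compiled fact: which entity, what value, and when its source last updated it.
    static final class Fact {
        final String key;
        final String value;
        final long updatedAt;   // e.g. seconds since the epoch (an assumption)

        Fact(String key, String value, long updatedAt) {
            this.key = key;
            this.value = value;
            this.updatedAt = updatedAt;
        }
    }

    // Duplicates collapse to one entry per key; on conflict the newer fact wins.
    static Map<String, Fact> merge(List<Fact> facts) {
        Map<String, Fact> merged = new LinkedHashMap<>();
        for (Fact f : facts) {
            merged.merge(f.key, f, (kept, incoming) ->
                incoming.updatedAt > kept.updatedAt ? incoming : kept);
        }
        return merged;
    }
}
```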

  30. Security • Server environment • Use trusted CGI scripts and server-side includes • Client environment • Restrict access by IP address or domain • Restrict access by password • Internet • Encrypt data (PGP) • Certification authority
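Restricting access by IP address usually means comparing the client's address against an allowed network prefix. The minimal IPv4 sketch below assumes the allowed network is written in CIDR notation (for example `192.0.2.0/24`). Domain-based restriction would also need a reverse DNS lookup and is not shown. `IpRestriction` is a hypothetical name.

```java
class IpRestriction {
    // Converts dotted-quad IPv4 text such as "192.0.2.1" to a 32-bit value.
    static long toLong(String ip) {
        String[] parts = ip.trim().split("\\.");
        if (parts.length != 4) throw new IllegalArgumentException("not IPv4: " + ip);
        long value = 0;
        for (String p : parts) {
            int octet = Integer.parseInt(p);
            if (octet < 0 || octet > 255) throw new IllegalArgumentException("bad octet: " + p);
            value = (value << 8) | octet;
        }
        return value;
    }

    // True if ip lies inside network/prefixLen, e.g. "192.0.2.0/24".
    static boolean allowed(String ip, String cidr) {
        String[] pieces = cidr.split("/");
        int prefixLen = Integer.parseInt(pieces[1]);
        long mask = prefixLen == 0 ? 0 : (0xFFFFFFFFL << (32 - prefixLen)) & 0xFFFFFFFFL;
        return (toLong(ip) & mask) == (toLong(pieces[0]) & mask);
    }
}
```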

  31. Checking for hidden information (flowchart) • Decision: Data is in database? (Y / N) • N → Machine Learning

  32. Article: 52151 of comp.lang.perl.misc
      Path: lynx.unm.edu!pr1.plk.af.mil!tesuque.cs.sandia.gov!sloth.swcp.com!news.ironhorse.com!op.net!news.mathworks.com!enews.sgi.com!news.sgi.com!mr.net!news.mid.net!sbctri.tri.sbc.com!newspump.wustl.edu!newsfeed.rice.edu!rice!add
      From: add@pecos.is.rice.edu (Arthur Darren Dunham)
      Newsgroups: comp.lang.perl.misc,comp.infosystems.www.authoring.html
      Subject: Re: WWW: web site "pre-processor" in perl ?
      Date: 31 Oct 1996 00:20:06 GMT
      Organization: Rice University
      Lines: 23
      Message-ID: <558rbm$61k@listserv.rice.edu>
      References: <539045$3de@news1-alterdial.uu.net> <8cvicmud6t.fsf@gadget.cscaper.com> <slrn55ih2s.qs8.charlie@antipope.demon.co.uk> <53o93k$3ia@panix.com>
      NNTP-Posting-Host: pecos.is.rice.edu
      Xref: lynx.unm.edu comp.lang.perl.misc:52151 comp.infosystems.www.authoring.html:111886

      In article <53o93k$3ia@panix.com>, Clay Shirky <clays@panix.com> wrote:
      >
      >Au contraire. HTML _is_ broken, relative to, say, SGML, but if you are
      >careful with your tags and comment carefully, your data can be derived
      >from your HTML files, not v-v.
      >
      >find . -name '*html' -exec perl -p -i.bak -e
      >  's#(<body[^>]*bgcolor="?)oatmeal("?[^>]*>)#$1skyblue$2#i;' {} \;

      or if you wanted perl to do all the work, rather than have find(1)
      launch N perl executables for each .html files, you could do this....

      find . -type f -name '*html' -print | xargs perl -p -i.bak -e 's#(<body[^>]*bgcolor="?)oatmeal("?[^>]*>)#$1skyblue$2#i;'

      That way, perl happily iterates through all the lines in all the files
      since we don't care which file we're in when we do the substitution.

      --
      Darren Dunham                 add@is.rice.edu
      UNIX Sysadmin                 Rice University
      (This line currently in revision)    Houston, TX
      Any resemblance between real opinions and my post is coincidental
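The same edit can be expressed with Java's `java.util.regex`. The sketch below reuses the article's pattern verbatim. It covers only the substitution itself. Walking the directory tree and writing `.bak` backups, which `find` and `perl -i.bak` handle in the article, are left out. The class name `BgcolorFix` is hypothetical.

```java
import java.util.regex.Pattern;

class BgcolorFix {
    // Same pattern as the Perl one-liner; Perl's trailing /i becomes CASE_INSENSITIVE.
    private static final Pattern BODY_BG = Pattern.compile(
        "(<body[^>]*bgcolor=\"?)oatmeal(\"?[^>]*>)", Pattern.CASE_INSENSITIVE);

    // Perl's s/// without /g replaces only the first match on each line.
    static String fixLine(String line) {
        return BODY_BG.matcher(line).replaceFirst("$1skyblue$2");
    }

    // Applies the substitution line by line, as perl -p does.
    static String fixDocument(String html) {
        String[] lines = html.split("\n", -1);
        for (int i = 0; i < lines.length; i++) {
            lines[i] = fixLine(lines[i]);
        }
        return String.join("\n", lines);
    }
}
```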

  33. <HTML>
      <HEAD><TITLE>Information gathering</TITLE></HEAD>
      <BODY>
      <TABLE><TR><TH> <IMG SRC="info.gif"></TH>
      <TH> <font size="+3">Information Gathering</font> <BR>
      Just some sample text which might or might not be worthless.
      You'd want to sort out which of this was just HTML tags and
      other worthless junk and which was meaningful.
      </TH></TR></TABLE>
      <P>
      <CENTER><H2>Links to</H2>
      <A HREF="/sameplace/otherinfo"> A link to something on this site </A>
      <A HREF="/otherplace/otherinfo"> A link to something on this another site </A>
      </BODY></HTML>
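Separating the meaningful text of a page like the one above from its tags can be roughed out with regular expressions. The sketch drops the `<HEAD>` section and every tag, collapses whitespace, and collects double-quoted `HREF` targets. It assumes fairly tidy HTML. Regular expressions are not a real HTML parser, and entities such as `&amp;` are not decoded. `HtmlCleaner` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HtmlCleaner {
    private static final Pattern HREF =
        Pattern.compile("<a\\s[^>]*href\\s*=\\s*\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);

    // Drops the <HEAD> section and all tags, then collapses runs of whitespace.
    static String visibleText(String html) {
        String noHead = html.replaceAll("(?is)<head>.*?</head>", " ");
        String noTags = noHead.replaceAll("<[^>]*>", " ");
        return noTags.replaceAll("\\s+", " ").trim();
    }

    // Collects double-quoted HREF targets from anchor tags.
    static List<String> links(String html) {
        List<String> out = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) out.add(m.group(1));
        return out;
    }
}
```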

  34. Articles from sci.lang selected through WEBSOM
      Re: Scots and English (Gregory J Dalley, 30 May 1995, Lines: 18)
      Re: Dutch and English accents (Phil Rose, 15 Jun 1995, Lines: 28)
      Re: ANY SIL'rs out there? A.K.A. Summer Institute of Linguistics. (yomomma, 16 Jun 1995, Lines: 6)
      Re: ANY SIL'rs out there? A.K.A. Summer Institute of Linguistics. (yomomma, 16 Jun 1995, Lines: 6)
      Conferences, Seminars-info wanted (chris bowen, Mon, 03 Jul 1995, Lines: 7)
      AIGH? (Coby (Jacob) Lubliner, 8 Jul 1995, Lines: 8)
      "Shall" and "Will" in Welsh English (maryproto@delphi.com, Wed, 19 Jul 95, Lines: 14)
      careers in linguistics (scharle, 10 Sep 1995, Lines: 8)
      job opportunities in computational linguistics? (Sonny Xuan Vu, 30 Sep 1995, Lines: 14)
      Re: job opportunities in computational linguistics? (Miss Sarah Tiller, Wed, 4 Oct 1995, Lines: 27)
      Re: What Is Singapore English? (Zhong Qiyao, 11 Dec 1995, Lines: 28)
      Re: What Is Singapore English? (Chew Kim Swee Andrew, 14 Dec 1995, Lines: 41)
      Re: What Is Singapore English? (Pota alok Ashwin, 16 Dec 1995, Lines: 45)
      Re: How to write in English ... (Ann Weiner, Tue, 2 Jan 1996, Lines: 13)
      Re: What Is Singapore English? (Wing Luk, 7 Jan 1996, Lines: 27)
      Linguistics Careers (lebitz,stacey b, 23 Jan 1996, Lines: 14)
      English Teaching Offering in China - offer2.doc [1/1] (XIAOJUN ZHANG, 24 Jan 1996, Lines: 240)
      TRYING TO PROTECT YOUR WORK? (prepaid, Sun, 04 Feb 1996, Lines: 1)
      Give me, please, one program for learn to speak english!! Please!! ("Eugen I. Ivanov", 20 Feb 1996, Lines: 1)
      Re: The English "R" for Germans (Joerg Settemeyer, 8 Mar 1996, Lines: 5)
      English Tutor Needed. (Mua Tran, 23 Mar 1996, Lines: 20)
      Re: old form of shorthand (Fido, 1 Apr 1996, Lines: 9)
      Re: Math as pornography (Gordon Fitch, 17 May 1996, Lines: 7)
      Re: Chain Shift (Charles Lieberman, 26 Jul 1996, Lines: 10)
      Re: Tendency of Inflections to Disappear - Why? (Terrence Griffin, 28 Jul 96, Lines: 1)
      Re: Concerning the number of esperantists (Marc Bonnaud, Fri, 09 Aug 1996, Lines: 14)
      Re: Concerning the number of esperantists (Cheradenine Zakalwe, Fri, 9 Aug 1996, Lines: 16)
      Re: Concerning the number of esperantists (Alan Gould, Sat, 10 Aug 1996, Lines: 22)
      Re: Concerning the number of esperantists (Don HARLOW, Sun, 11 Aug 1996, Lines: 21)
      Re: Kiom da E-istoj *ne* regas la anglan? [Esperanto: "How many Esperantists do *not* know English?"] (Andrew McConnell, Fri, 30 Aug 1996, Lines: 19)
      cohesion in CMC (Per-Mikael Jansson ENGE, 22 Oct 1996, Lines: 10)

  35. Limitations of the Web • Some functionality/specialization was given up for ubiquity • Transfer time • Mass data transfer prohibitive • External to machine • Reliance on network • Not inherently as secure as keeping data on the local machine
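To put "Transfer time" in numbers, an idealized transfer takes size × 8 / link rate seconds. The sketch below is minimal and rests on illustrative assumptions that are not from the slides: a 10 MB file, and nominal rates of 28.8 kbps for a modem and 1.544 Mbps for a T1. It ignores protocol overhead, latency, and congestion, so real transfers take longer.

```java
class TransferTime {
    // Seconds to move `bytes` over a link of `bitsPerSecond`, ignoring protocol
    // overhead, latency and congestion.
    static double seconds(long bytes, double bitsPerSecond) {
        return bytes * 8.0 / bitsPerSecond;
    }

    public static void main(String[] args) {
        long tenMegabytes = 10L * 1024 * 1024;   // illustrative file size
        System.out.printf("28.8 kbps modem: %.0f s%n", seconds(tenMegabytes, 28_800));
        System.out.printf("T1 (1.544 Mbps): %.0f s%n", seconds(tenMegabytes, 1_544_000));
    }
}
```

At those nominal rates the modem needs roughly 48 minutes, while the T1 needs under a minute. That gap is why mass data transfer over typical connections was prohibitive.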

  36. Why Data Mining • There is a lot of data of unknown worth and purity • Data mining uses the same underlying procedures as other knowledge discovery/ data extraction systems

  37. Automatic Customization to user preferences • Web pages • HotWired autoconfigs based on what you surf to • News services • Usenet service custom.roy-corey.1 • Information display paradigm • industry report style • collegiate style • Microsoft style

  38. Methods for gathering data • Extraction from documents • data mining • keyword searches • similarity searches • Extraction from services • ILA: internet learning agents • Softbots • Metacrawler

  39. Data mining on the web? • Transfer rate too slow to transfer most databases whenever you want • Computation too intensive to let others mine your database whenever they want • So: Use pre-collected data or pre-indexed database
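A pre-indexed database typically means an inverted index. Each word maps to the documents that contain it, so a keyword search becomes a set intersection instead of a full scan of the data. The sketch below is a minimal in-memory version. The class `InvertedIndex` and its tokenization are assumptions, and a real index would live on disk.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class InvertedIndex {
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Index a document once, ahead of time, so later queries avoid rescanning text.
    void add(int docId, String text) {
        for (String w : text.toLowerCase().split("[^a-z0-9]+")) {
            if (!w.isEmpty()) postings.computeIfAbsent(w, k -> new TreeSet<>()).add(docId);
        }
    }

    // Documents containing every query word (boolean AND).
    Set<Integer> search(String query) {
        Set<Integer> result = null;
        for (String w : query.toLowerCase().split("[^a-z0-9]+")) {
            if (w.isEmpty()) continue;
            Set<Integer> docs = postings.getOrDefault(w, Collections.emptySet());
            if (result == null) result = new TreeSet<>(docs);
            else result.retainAll(docs);
        }
        return result == null ? Collections.emptySet() : result;
    }
}
```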

  40. Java -- What is it? • Programming Language • Java Compiler • Java Interpreter (Java Virtual Machine) • For creating applets which run inside a browser • For creating applications (stand alone programs)

  41. Java Application Source Code

      //
      // Sample HelloWorld application
      //
      class HelloWorldApp {
          public static void main(String[] args) {
              System.out.println("Hello World!");
          }
      }

  42. Java Applet Source Code

      //
      // Sample HelloWorld applet
      //
      import java.awt.Graphics;
      import java.applet.Applet;

      public class HelloWorld extends Applet {
          public void paint(Graphics g) {
              g.drawString("Hello world!", 25, 25);
          }
      }

  43. How could you use it? • Client applets or applications • Server code • Portable code • Create via Developer Tools

  44. Developer Tools • Microsoft Visual J++ • Symantec • Sun • SGI - Cosmo Code

  45. Developer Tools • SourceCraft • Powersoft - Fusion • Quintessential Objects - Diva for Java (Javaside) • Roguewave - JFactory

  46. Advantages • Object Oriented and event-driven • Portable* bytecode • Multi-threaded • Integrated Network Abilities • Built-in Multimedia Capabilities • “Robust and Secure”

  47. Drawbacks • Few deployed clients • Very C++ -like • Not yet stabilized • Very few Developer Tools • Not all the class libraries exist (yet)

  48. Class Structure

      Class java.applet.Applet

      java.lang.Object
         |
         +----java.awt.Component
                 |
                 +----java.awt.Container
                         |
                         +----java.awt.Panel
                                 |
                                 +----java.applet.Applet

  49. Security • OS security in applications • “No Pointers” and no user memory management • Compile-time and Run-time checking • Client Data Security • No access to disk from Netscape • Directory-based security in HotJava
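The run-time checking mentioned above is easy to see with arrays. Every index is checked, so a bad index raises an exception instead of silently reading or overwriting neighbouring memory as an unchecked pointer could. The small demonstration below uses hypothetical names `BoundsDemo` and `safeRead`. `ArrayIndexOutOfBoundsException` is the standard Java exception.

```java
class BoundsDemo {
    // Java checks every array index at run time; a bad index throws
    // ArrayIndexOutOfBoundsException instead of touching memory outside the array.
    static String safeRead(int[] data, int index) {
        try {
            return "value " + data[index];
        } catch (ArrayIndexOutOfBoundsException e) {
            return "rejected index " + index;
        }
    }

    public static void main(String[] args) {
        int[] buffer = {10, 20, 30};
        System.out.println(safeRead(buffer, 1));  // prints "value 20"
        System.out.println(safeRead(buffer, 5));  // prints "rejected index 5"
    }
}
```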

  50. Security • Network Security • No Applets • No Access • Applet Host • Firewall • Any Host
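The network security levels above can be read as a policy check the browser performs before an applet opens a connection. The sketch below makes two assumptions. It treats the levels as cumulative, and it takes "Firewall" to mean hosts outside the local firewall, as browser settings of that era appear to have done. The enum, the method, and the `targetInsideFirewall` flag are all hypothetical; a real browser would derive that flag from its network configuration.

```java
class AppletNetPolicy {
    enum Level { NO_APPLETS, NO_ACCESS, APPLET_HOST, FIREWALL, ANY_HOST }

    // May an applet loaded from appletHost open a connection to target?
    // targetInsideFirewall is a hypothetical input supplied by local configuration.
    static boolean mayConnect(Level level, String appletHost, String target,
                              boolean targetInsideFirewall) {
        switch (level) {
            case NO_APPLETS:   // applets do not run at all
            case NO_ACCESS:    // applets run but get no network access
                return false;
            case APPLET_HOST:  // only back to the host the applet came from
                return target.equalsIgnoreCase(appletHost);
            case FIREWALL:     // applet host, plus any host outside the firewall
                return target.equalsIgnoreCase(appletHost) || !targetInsideFirewall;
            case ANY_HOST:
                return true;
            default:
                throw new IllegalArgumentException("unknown level " + level);
        }
    }
}
```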