Administrative Information about Term Project



  1. Administrative Information about Term Project
     • Term project: 15% of the total semester score
     • Due_1: Jan 7, 2005 (20% bonus on the project score)
     • Before Jan 14 (final exam): 10% bonus
     • Due_2: Jan 21 (no bonus)
     • Not accepted after Jan 21 (score = 0 for this part)
     DCP1172, Ch.8,9

  2. Term Project - Requirement
     • You are expected to submit a final report (in both electronic and paper form, in English or Chinese) that includes a proposal (or report) selecting (or defining) a problem that can be solved with the skills learned in this class (DCP1172 - Introduction to Artificial Intelligence).
     • You are expected to design a Prolog program that solves the problem (or a representative part of it) and applies as many of the techniques and as much of the knowledge learned in this class as possible.

  3. Term Project - Report Format
     • Cover page information:
       • Subject
       • Name, student ID, e-mail address
       • Abstract (brief summary), about 150 words
     • Format:
       • The report should be 6-12 pages long (not counting the program and the cover page), on A4 paper in 12-point type.
       • If you include tables and/or drawings (or graphics) in your report, make sure they are clear enough for the instructor to review.
       • Put a short description above each table.
       • Put a short description below each graph.

  4. Term Project - Content of the Report
     • Part 1 - Problem formulation process
     • Part 2 - Problem-solving content
       • Logic
         • Syntax, semantics, rules of inference
         • Monotonic vs. non-monotonic logics
       • Ontology
         • Objects and relations, etc.
       • Computation model
         • By GA, by search heuristics, by CSP, etc.
     • Part 3 - Summary and discussion

  5. Term Project - Administrative information
     • Approach (choose one):
       • Approach 1: Do the term project on your own.
       • Approach 2: Do the term project in cooperation with one classmate (a team of two).
         • Both partners receive the same credit for this part of the course.
     • You must make up your mind and tell the instructor your decision before Dec 17.
     • After that date, you are assumed to take Approach 1 by default.

  6. Term Project - Administrative information (cont.)
     • Scoring: 15% of the total class (semester) score
       • Report = 10%, Program = 5%
     • Deadline:
       • Due_1: Jan 7, 2005 (20% bonus on the project score)
       • Before Jan 14 (final exam): 10% bonus
       • Due_2: Jan 21 (no bonus)
       • Not accepted after Jan 21 (score = 0 for this part)

  7. Appendix - Term Project Sample Content
     • Sample: SPAM mail filtering
     • Part 1 - Problem formulation
     • Part 2 - Problem-solving process (in part)
       • Logic
       • Ontology
       • Computation model
     • Part 3 - Summary and discussion
       • Server-side vs. client-side filtering strategy

  8. Introduction
     • What is SPAM? Why fight SPAM?
     • Examples: characteristics of SPAM messages
     • Where did the spammer get my e-mail address?
     • Asymmetric cost in anti-spam
       • Sending is almost cost-free for the sender/abuser
     • Asymmetric knowledge
       • Spam tools vs. anti-spam tools

  9. What Is Spam? Why So Much Spam?
     • Best description: "Unsolicited Bulk E-mail"
     • In human terms: bulk e-mail you didn't want and didn't ask for
     • Mailing lists, newsletters, "latest offers": not spam, if you asked for them in the first place
     • The most common form of spam is commercial spam, where the sender hopes to make a profit.
     • A spammer abuses the unmetered nature of e-mail to send out millions of messages.
     • Because the spammer's incremental cost of sending more e-mail is almost zero, he can still make a profit even with a success rate of 0.0001%.
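The profit argument on the slide above can be made concrete with a little arithmetic. The sketch below is only an illustration: the 0.0001% success rate comes from the slide, while the message count, revenue per sale, and cost per message are hypothetical numbers chosen for the example.

```python
def spam_profit(messages_sent, success_rate, revenue_per_sale, cost_per_message):
    """Return (expected_sales, profit) for one spam run."""
    expected_sales = messages_sent * success_rate
    revenue = expected_sales * revenue_per_sale
    cost = messages_sent * cost_per_message
    return expected_sales, revenue - cost


# Success rate from the slide: 0.0001 % expressed as a fraction.
SUCCESS_RATE = 0.0001 / 100

# Hypothetical figures, not taken from the slides.
MESSAGES_SENT = 10_000_000
REVENUE_PER_SALE = 50.0      # currency units earned per successful sale
COST_PER_MESSAGE = 0.000001  # near-zero incremental cost per message

expected_sales, profit = spam_profit(
    MESSAGES_SENT, SUCCESS_RATE, REVENUE_PER_SALE, COST_PER_MESSAGE
)
print(f"expected sales: {expected_sales:.1f}, profit: {profit:.2f}")
```

Under these assumed numbers the run stays profitable because the sending cost scales with a near-zero per-message price, while revenue scales with the much larger per-sale amount.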

  10. Why Bother Filtering Spam? - Economic view
     • On a global scale, bandwidth, CPU resources, and people's time are wasted.
     • Spam appears to account for about 30% to 60% of mail traffic, and the share is increasing.
     • The spam recipient's time is wasted as well, and receiving spam may directly cost the user bandwidth fees charged by their ISP.
     • ISPs also dislike spam because it costs them time and money to deal with complaints and to recover overloaded mail servers, which sometimes crash under the load of intensive spamming.

  11. Why Bother Filtering Spam? - Social view
     • The unfortunate side effect is that many people are annoyed by advertisements they are not, and are unlikely to become, interested in.
     • Nearly impossible to unsubscribe
       • "Unsubscribe" addresses work only 37% of the time, according to the US FTC
     • Opt-in vs. opt-out
     • Legal recourse (claiming compensation) is not yet possible in most parts of the world
       • Only possible in some regions (e.g., Korea, some states in the US, etc.)
       • Taiwan
         • A draft bill has been prepared and is about to be submitted to the legislature (according to recent newspaper reports)
         • It adopts opt-out

  12. Spam Volume Is Increasing
     [Figure: spam volume over time; data from Brightmail.com]

  13. Question - How to identify a SPAM message?
     • Truth depends on interpretation (e.g., for a system using predicate logic, etc.)
     • Plausible solution: logical inference + relaxed methods (using heuristics)
       • Pattern matching
       • Blacklist (or enhanced blacklist), whitelist, greylist (i.e., two-phase adaptive blacklist)
       • Machine learning
       • Genetic algorithm
       • Bayesian learning methods
       • A hybrid model

  14. Truth Depends on Interpretation (e.g., anti-spam or anti-virus mail filtering)
     [Figure labels: MTA0; Filtering with H1(msg); MTA1 (or MUA1); Filtering with H2(msg); MTA2 (or MUA1); Mail Spool. Outcomes: Accept, Discard]
     • MTA = Mail Transfer Agent
     • MUA = Mail User Agent

  15. Typical Example of Knowledge-based Agent: Anti-spam Mail Filtering
     • Generic mail filtering function: F(n) = g(n) + h(n)
       • g(n): exact value, known
       • h(n): heuristic / estimated value
     [Figure labels: Client; Mail Transfer Agent; Generic Mail Filtering; Anti-SPAM Filtering; Mail Spool. Edge labels: Pass, Fail. Outcomes: Accept, Reject]
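One possible reading of F(n) = g(n) + h(n) for mail filtering is a score with an exactly known part and an estimated part. The sketch below is an assumption-laden illustration, not the slide author's filter: the choice of "relay is on a blacklist" for g, "suspicious words in the subject" for h, the weights, and the threshold are all hypothetical.

```python
SPAM_THRESHOLD = 5.0  # hypothetical cut-off score


def g(msg, blacklist):
    """Exact part: a fact we can check with certainty (assumed: blacklist hit)."""
    return 5.0 if msg["relay_ip"] in blacklist else 0.0


def h(msg, suspicious_words):
    """Heuristic part: one point per suspicious word in the subject (assumed)."""
    words = msg["subject"].lower().split()
    return float(sum(1 for word in words if word in suspicious_words))


def f(msg, blacklist, suspicious_words):
    """Combined score F(n) = g(n) + h(n)."""
    return g(msg, blacklist) + h(msg, suspicious_words)


def verdict(msg, blacklist, suspicious_words):
    """Reject when the combined score reaches the threshold, else accept."""
    score = f(msg, blacklist, suspicious_words)
    return "reject" if score >= SPAM_THRESHOLD else "accept"


# Example data (documentation-range IP addresses, made-up word list).
BLACKLIST = {"192.0.2.10"}
SUSPICIOUS = {"free", "offer"}

known_bad = {"relay_ip": "192.0.2.10", "subject": "a test msg"}
unknown = {"relay_ip": "192.0.2.20", "subject": "Free free offer"}
print(verdict(known_bad, BLACKLIST, SUSPICIOUS))
print(verdict(unknown, BLACKLIST, SUSPICIOUS))
```

Keeping g and h as separate functions makes it easy to see which part of a verdict rests on certain knowledge and which part on an estimate.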

  16. A Typical Distribution Model of SPAM Mails
     [Figure labels: SPAMMER 1 (e.g., dial-up user); SPAMMER 2 (e.g., xDSL); Open Relay/Proxy; Internet; DNS servers, rDNS(RelayIP); Destination Mail Server; (EnvFrom, EnvRcpt, RelayIP, ConnTime)]
     • SPAM languages
       • English
       • Local native language

  17. E-mail Basics
     • In the context of electronic mail, messages are viewed as having an envelope and contents. [RFC2821]
       • Message = envelope + content
       • Content = mail headers + body
     • MIME extension
       • For the transmission of images, audio, or other sorts of structured data in electronic mail messages.
       • Several extensions have been published, such as the MIME document series [RFC2045, RFC2046, RFC2049]

  18. A Simple E-mail Message
      From cschen@ns2.nctu.edu.tw Wed May 26 16:29:42 2004
      Return-Path: <cschen@ns2.nctu.edu.tw>
      Received: from ns2.nctu.edu.tw (localhost [127.0.0.1])
              by ns2.nctu.edu.tw (8.13.0.Beta2/8.13.0.Beta2) with ESMTP id i4Q8Tgtl017667
              for <cschen@ns2.nctu.edu.tw>; Wed, 26 May 2004 16:29:42 +0800 (CST)
      Received: (from cschen@localhost)
              by ns2.nctu.edu.tw (8.13.0.Beta2/8.13.0.Beta2/Submit) id i4Q8TgHg017666
              for cschen; Wed, 26 May 2004 16:29:42 +0800 (CST)
      Date: Wed, 26 May 2004 16:29:42 +0800 (CST)
      From: User Cschen <cschen@ns2.nctu.edu.tw>
      Message-Id: <200405260829.i4Q8TgHg017666@ns2.nctu.edu.tw>
      To: cschen@ns2.nctu.edu.tw
      Subject: a test msg
      Status: R

      Hi, everybody.
      This is a test.
      -cschen
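The decomposition "message = envelope + content, content = headers + body" can be tried out on a shortened copy of the sample message using Python's standard `email` package. This is a minimal sketch: the `Envelope` and `MailMessage` classes are hypothetical names introduced here, and the envelope values are assumed for illustration, since a real envelope comes from the SMTP dialogue rather than from the message text.

```python
from dataclasses import dataclass
from email import message_from_string
from email.message import Message
from email.utils import parseaddr

# Shortened copy of the sample message: a few headers, a blank line, the body.
RAW_CONTENT = "\n".join([
    "Date: Wed, 26 May 2004 16:29:42 +0800 (CST)",
    "From: User Cschen <cschen@ns2.nctu.edu.tw>",
    "Message-Id: <200405260829.i4Q8TgHg017666@ns2.nctu.edu.tw>",
    "To: cschen@ns2.nctu.edu.tw",
    "Subject: a test msg",
    "",
    "Hi, everybody.",
    "This is a test.",
    "-cschen",
    "",
])


@dataclass
class Envelope:
    """Hypothetical container for SMTP envelope data (MAIL FROM / RCPT TO)."""
    mail_from: str
    rcpt_to: list


@dataclass
class MailMessage:
    """Message = envelope + content, following the slide's decomposition."""
    envelope: Envelope
    content: Message


content = message_from_string(RAW_CONTENT)  # splits headers from body
message = MailMessage(
    envelope=Envelope("cschen@ns2.nctu.edu.tw", ["cschen@ns2.nctu.edu.tw"]),
    content=content,
)

display_name, address = parseaddr(content["From"])
body = content.get_payload()
print(content["Subject"], "|", display_name, "|", address)
```

A filter that works on the envelope can act before the content is even transferred, whereas header and body checks need the full content.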

  19. A Simple Model for Anti-spam
     [Figure labels: Client; MTA; Anti-SPAM Search Engine-1; Anti-SPAM Search Engine-K; Anti-SPAM Learning Engine-N; Account database; Mail Spool. Outcomes: Reject, Bounce, Accept, Discard]

  20. E-mail Ontology
     • E-mail objects (concepts)
     • Types of relations
     • Types of constraints

  21. Partial Mail Ontology

  22. Types of Relations
     • MemberOf(Server_ns.nctu.edu.tw, DnsZone_nctu.edu.tw)
     • MemberOf(Ns_RR, DnsRR_Set)
     • MemberOf(BINDv8.4.5, BINDServerPackage)
     • PartOf(DnsSpace, DNS)
     • ComponentOf(Ns_RR, DnsZone)
     • PartOf(DnsZoneFwd, DnsSpace)
     • PartOf(DnsServer, DNS)
     • SubsetOf(BINDServerPackage, DnsServerPackage)
     • SubsetOf(DnsSPOF, DnsProblem_Set)
     • ProblemOf(DnsSPOF, DNS)
     • SynonymOf(PrimaryServer, MasterServer)
     • SynonymOf(PartOf, ComponentOf)
     • RelatedTo()
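Relations like the ones listed above can be stored as plain facts and queried. The sketch below uses a few of the slide's facts as Python tuples and answers an indirect PartOf question; treating PartOf as transitive is an assumption made here, and the helper name `part_of` is hypothetical.

```python
# A few facts copied from the slide, stored as (relation, subject, object).
FACTS = {
    ("MemberOf", "Ns_RR", "DnsRR_Set"),
    ("PartOf", "DnsSpace", "DNS"),
    ("PartOf", "DnsZoneFwd", "DnsSpace"),
    ("PartOf", "DnsServer", "DNS"),
    ("SubsetOf", "BINDServerPackage", "DnsServerPackage"),
}


def part_of(part, whole, facts=FACTS, _seen=None):
    """True if `part` is directly or indirectly PartOf `whole`.

    Assumes PartOf is transitive; `_seen` guards against cycles in the facts.
    """
    seen = set() if _seen is None else _seen
    if part in seen:
        return False
    seen.add(part)
    for relation, subject, obj in facts:
        if relation == "PartOf" and subject == part:
            if obj == whole or part_of(obj, whole, facts, seen):
                return True
    return False


print(part_of("DnsZoneFwd", "DNS"))   # follows DnsZoneFwd -> DnsSpace -> DNS
print(part_of("DNS", "DnsZoneFwd"))   # the relation is not symmetric
```

In a Prolog program the same idea would be two clauses for a transitive-closure predicate; the tuple set plays the role of the fact base.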

  23. SMTP Gateway and DNS
     [Figure labels: SMTP domain name with multiple GWs; Receiving SMTP GW; MailServer; Domain name; MX_RR; Explicit MX_RR; Implicit MX_RR; A_RR. Edge labels: Optional, Mutual_in, Mutual_ex, Prerequisite, PartOf, SubsetOf]

  24. Types of Constraints
     • Mutual_Exclusive(DnsServer_Model, CachingOnlyServer, AuthoritativeOnlyServer)
     • Mutually_Inclusive(ResolvingServer_Model, ResolvingServer, RootServerList)
     • Optional(RecMailGateway, Explicit_MX_RR, MX_RR)
     • Optional(DomainName_Host, A_RR, Ptr_RR)
     • Prerequisite(RecMailGateway, MX_RR_Explicit, MailGateway_multiple)
     • Prerequisite(DnsZone_Model, ZoneDelegation, AdvertisingServer)
     • Temporal(ZoneDelegation, AdvertisingServer)

  25. Types of Constraints
     • Mutual_Exclusive(MX_RR, Explicit_MX_RR, Implicit_MX_RR)
     • Optional(MailGateway, MX_RR, explicit)
       • For each receiving mail gateway on a specific forward domain zone (e.g., user@nctu.edu.tw), there must exist at least one corresponding registered mail exchange resource record (i.e., MX RR) for it to work properly.
       • In principle, it can optionally be defined as an explicit MX_RR.
       • Alternatively, an implicit mail exchange resource record can be derived from the corresponding A_RR, if one is defined (e.g., the A_RR for "nctu.edu.tw").

  26. Types of Constraints (cont.)
     • Mutual_Inclusive(RecMailServer, MX_RR, A_RR)
     • Prerequisite(MailGateway, MX_RR)
       • To have multiple mail gateways for a specific mail domain (e.g., user@nctu.edu.tw), you must first define multiple MX RRs explicitly in DNS.
     • Optional(MailGateway, MX_RR, explicit)
       • For each receiving mail gateway on a specific forward domain zone (e.g., user@nctu.edu.tw), there must exist at least one corresponding registered mail exchange resource record (i.e., MX RR) for it to work properly.
       • In principle, it can optionally be defined as an explicit MX_RR.
       • Alternatively, an implicit mail exchange resource record can be derived from the corresponding A_RR, if one is defined (e.g., the A_RR for "nctu.edu.tw").
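The explicit/implicit MX constraint described above can be sketched as a small lookup over a toy zone table. This is an illustration only: no real DNS query is made, the dictionaries, host names, and IP addresses are made up, and `mail_targets` is a hypothetical helper name.

```python
def mail_targets(domain, mx_records, a_records):
    """Return (kind, hosts) for the mail gateways of `domain`.

    kind is "explicit" when MX records exist, "implicit" when only an
    A record exists (the domain itself acts as the gateway), else "none".
    The two cases are mutually exclusive: an explicit MX always wins.
    """
    explicit = mx_records.get(domain, [])
    if explicit:
        # Lower preference value means higher priority.
        return "explicit", [host for _preference, host in sorted(explicit)]
    if domain in a_records:
        return "implicit", [domain]
    return "none", []


# Toy zone data; host names and addresses are made up for the example.
MX_RECORDS = {"nctu.edu.tw": [(20, "mx2.example"), (10, "mx1.example")]}
A_RECORDS = {"nctu.edu.tw": "192.0.2.1", "host.example": "192.0.2.2"}

print(mail_targets("nctu.edu.tw", MX_RECORDS, A_RECORDS))
print(mail_targets("host.example", MX_RECORDS, A_RECORDS))
print(mail_targets("missing.example", MX_RECORDS, A_RECORDS))
```

Note how the Prerequisite constraint shows up: only the explicit branch can ever return more than one gateway.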

  27. Logic in general - What is Logic?
     • Logic systems are formal languages for representing information such that conclusions can be drawn.
     • A logic system can be defined in terms of three parts:
       • Syntax: defines the valid sentences of the language (i.e., the alphabet of symbols and how they may be combined).
       • Semantics: defines the meaning of sentences (i.e., the truth of a sentence in a world).
         • Logic is concerned with truth values. The possible truth values are true and false.
       • A set of rules of deduction: these let us derive one expression from a set of other expressions and thus construct arguments and proofs.

  28. Logic in general - What is Logic? (cont.)
     • Language of arithmetic
       • Syntax issues
         • x + 2 >= y is a sentence
         • x2 + y > is not a sentence
       • Semantics issues
         • x + 2 >= y is true if and only if the number x + 2 is no less than the number y
         • x + 2 >= y is true in a world (or model) where x = 7, y = 1
         • x + 2 >= y is false in a world (or model) where x = 0, y = 6
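The split between syntax and semantics can be tried out directly. In this sketch Python's own expression grammar stands in for the language of arithmetic (an assumption: it accepts far more than arithmetic comparisons), and a model is just a dictionary assigning numbers to variables. The helper names are hypothetical.

```python
def is_sentence(text):
    """Syntax check: does `text` parse as a single expression?"""
    try:
        compile(text, "<sentence>", "eval")
    except SyntaxError:
        return False
    return True


def is_true_in(text, model):
    """Semantics: evaluate the sentence in a model (variable assignment)."""
    code = compile(text, "<sentence>", "eval")
    # Builtins are emptied so only the model's variables are visible.
    return bool(eval(code, {"__builtins__": {}}, dict(model)))


print(is_sentence("x + 2 >= y"))                     # well-formed
print(is_sentence("x2 + y >"))                       # missing right operand
print(is_true_in("x + 2 >= y", {"x": 7, "y": 1}))    # 9 >= 1
print(is_true_in("x + 2 >= y", {"x": 0, "y": 6}))    # 2 >= 6
```

The same sentence gets different truth values in different models, while its status as a sentence never depends on the model.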

  29. Logic in general - Why is Logic used?
     • Logic, which is concerned with reasoning and the validity of arguments, is widely used as a representation method in Artificial Intelligence. In general, logic is concerned not with the truth of statements but with the validity of arguments.
     • For example, although the following argument (a syllogism) is clearly logical, its conclusion is not something we would consider to be true:
         All lemons are blue.
         Mary is a lemon.
         Therefore, Mary is blue.
     • This argument is considered valid because the conclusion (Mary is blue) follows logically from the other two statements, which we call the premises.
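Validity can be checked mechanically by searching for a counterexample: a world in which every premise holds but the conclusion fails. The sketch below enumerates every way of choosing which individuals are lemons and which are blue over a tiny two-element domain. The domain and all helper names are assumptions for illustration, and a check over one small domain is a demonstration, not a general proof procedure.

```python
from itertools import product

DOMAIN = ("mary", "other")  # hypothetical two-individual world


def all_subsets(items):
    """Yield every subset of `items` as a frozenset."""
    for mask in product((False, True), repeat=len(items)):
        yield frozenset(item for item, keep in zip(items, mask) if keep)


def is_valid(premises, conclusion):
    """True if no interpretation makes all premises true and the conclusion false."""
    for lemons in all_subsets(DOMAIN):
        for blues in all_subsets(DOMAIN):
            premises_hold = all(p(lemons, blues) for p in premises)
            if premises_hold and not conclusion(lemons, blues):
                return False  # found a counterexample
    return True


def all_lemons_are_blue(lemons, blues):
    return lemons <= blues


def mary_is_a_lemon(lemons, blues):
    return "mary" in lemons


def mary_is_blue(lemons, blues):
    return "mary" in blues


# The syllogism from the slide: valid even though a premise is false in reality.
print(is_valid([all_lemons_are_blue, mary_is_a_lemon], mary_is_blue))
# Swapping a premise with the conclusion gives an invalid argument.
print(is_valid([all_lemons_are_blue, mary_is_blue], mary_is_a_lemon))
```

The second call fails on the world where nothing is a lemon and Mary is blue, which shows that validity depends on the form of the argument and not on whether lemons are actually blue.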

  30. Types of logic

  31. A Refined Model for Anti-spam - Generic Mail Filtering
     [Figure labels: Client; Generic Mail Filtering with numbered stages (1)-(4); White List; Black List; Grey List; Automatic SPAM Learning; Mail Spool. Edge labels: Pass, Fail, Fail temporarily, Update. Outcomes: Reject, Accept]
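One possible reading of the refined model is a pipeline that consults the white list, the black list, the grey list, and finally a learning step that feeds back into the black list. The sketch below follows that reading under stated assumptions: the order of the stages, the greylist key (relay IP plus sender), and the keyword test standing in for "Automatic SPAM Learning" are all hypothetical, as are the names `FilterState`, `looks_like_spam`, and `filter_message`.

```python
from dataclasses import dataclass, field


@dataclass
class FilterState:
    """Mutable lists used by the filter (hypothetical structure)."""
    whitelist: set = field(default_factory=set)      # trusted sender addresses
    blacklist: set = field(default_factory=set)      # rejected relay IPs
    greylist_seen: set = field(default_factory=set)  # (relay_ip, sender) pairs


def looks_like_spam(body):
    """Stand-in for the learning engine: a fixed keyword test (assumption)."""
    return "buy now" in body.lower()


def filter_message(sender, relay_ip, body, state):
    """Return "accept", "reject", or "tempfail" for one delivery attempt."""
    if sender in state.whitelist:            # (1) white list: pass
        return "accept"
    if relay_ip in state.blacklist:          # (2) black list: fail
        return "reject"
    key = (relay_ip, sender)
    if key not in state.greylist_seen:       # (3) grey list: fail temporarily
        state.greylist_seen.add(key)
        return "tempfail"
    if looks_like_spam(body):                # (4) learning step updates the lists
        state.blacklist.add(relay_ip)
        return "reject"
    return "accept"


state = FilterState(whitelist={"friend@example.org"})
attempts = [
    ("friend@example.org", "192.0.2.1", "hello"),
    ("seller@example.com", "192.0.2.9", "BUY NOW"),
    ("seller@example.com", "192.0.2.9", "BUY NOW"),
]
for sender, relay_ip, body in attempts:
    print(sender, filter_message(sender, relay_ip, body, state))
```

The temporary failure relies on legitimate mail servers retrying a delivery, while the update step means a relay caught once is rejected at stage (2) from then on.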
