data mining methods for malware detection n.
Skip this Video
Loading SlideShow in 5 Seconds..
Data Mining Methods for Malware Detection PowerPoint Presentation
Download Presentation
Data Mining Methods for Malware Detection

Loading in 2 Seconds...

play fullscreen
1 / 65

Data Mining Methods for Malware Detection - PowerPoint PPT Presentation

  • Uploaded on

Data Mining Methods for Malware Detection. From MUAZZAM AHMED SIDDIQUI Ph. D dissertation, 2008. Table of Contents. Definitions Malware Spam Phishing Exploits Review Detection methods Feature base approaches. Malware. Malicious programs Visuses Worms Trojans Spywares Adwares

I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
Download Presentation

PowerPoint Slideshow about 'Data Mining Methods for Malware Detection' - skyla

An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
data mining methods for malware detection

Data Mining Methods for Malware Detection


Ph. D dissertation, 2008

table of contents
Table of Contents
  • Definitions
    • Malware
    • Spam
    • Phishing
    • Exploits
  • Review
    • Detection methods
    • Feature base approaches
  • Malicious programs
    • Visuses
    • Worms
    • Trojans
    • Spywares
    • Adwares
    • Varieties
  • Malicious programs
    • Any program that is purposefully created to harm the computer system operations or data
    • is software used to disrupt computer operation, gather sensitive information, or gain access to private computer systems.[1] It can appear in the form of code,scripts, active content, and other software.[2] 'Malware' is a general term used to refer to a variety of forms of hostile or intrusive software.[3](from Wiki)
malware virus
MalwareVirus ?
  • includes viruses, worms, trojans, backdoors, adwares, spywares, bots, rootkits etc.
  • includes computer viruses, ransomware, worms, trojan horses, rootkits, keyloggers, dialers, spyware, adware, malicious BHOs, rogue security software and other malicious programs; the majority of active malware threats are usually worms or trojans rather than viruses.[4](from Wiki)
computer virus
Computer virus
  • A computer virus is a type of malware that, when executed, replicates by inserting copies of itself (possibly modified) into other computer programs, data files, or the boot sector of the hard drive; when this replication succeeds, the affected areas are then said to be "infected".[1][2][3] Viruses often perform some type of harmful activity on infected hosts, such as stealing hard disk space or CPU time, accessing private information, corrupting data, displaying political or humorous messages on the user's screen, spamming their contacts, or logging their keystrokes.
  • Ransomware comprises a class of malware which restricts access to the computer system that it infects, and demands a ransom paid to the creator of the malware in order for the restriction to be removed. Some forms of ransomwareencrypt files on the system's hard drive (cryptoviral extortion), while some may simply lock the system and display messages intended to coax the user into paying.
computer worm
Computer worm
  • A computer worm is a standalone malwarecomputer program that replicates itself in order to spread to other computers.[1] Often, it uses a computer network to spread itself, relying on security failures on the target computer to access it. Unlike a computer virus, it does not need to attach itself to an existing program.[2] Worms almost always cause at least some harm to the network, even if only by consumingbandwidth, whereas viruses almost always corrupt or modify files on a targeted computer.
  • A Trojan horse, or Trojan, is a hacking program that is a non-self-replicating type of malware which gains privileged access to the operating system while appearing to perform a desirable function but instead drops a malicious payload, often including a backdoor allowing unauthorized access to the target's computer.[1] These backdoors tend to be invisible to average users, but may cause the computer to run slowly. Trojans do not attempt to inject themselves into other files like a computer virus. Trojan horses may steal information, or harm their host computer systems.[2] Trojans may use drive-by downloads or install via online games or internet-driven applications in order to reach target computers. 
  • A rootkit is a stealthy type of software, typically malicious, designed to hide the existence of certain processes or programs from normal methods of detection and enable continued privileged access to a computer.[1] The term rootkit is a concatenation of "root" (the traditional name of the privileged account on Unix operating systems) and the word "kit" (which refers to the software components that implement the tool). The term "rootkit" has negative connotations through its association with malware.[1]

ZeroAccess is a trojan horse that affects Microsoft Windows operating systems. It is used to download othermalware on an infected machine and to form a botnet, while remaining hidden on a system usingrootkit techniques.

keystroke logging
Keystroke logging
  • Keystroke logging, often referred to as keylogging or Keyboard Capturing, is the action of recording (or logging) the keys struck on a keyboard, typically in a covert manner so that the person using the keyboard is unaware that their actions are being monitored.[1] It also has very legitimate uses in studies of human-computer interaction.  
  • Dialers are necessary to connect to the internet (at least for non-broadband connections), but some dialers are designed to connect to premium-rate numbers. The providers of such dialers often search for security holes in the operating system installed on the user's computer and use them to set the computer up to dial up through their number, so as to make money from the calls. Alternatively, some dialers inform the user what it is that they are doing, with the promise of special content, accessible only via the special number. Examples of this content include software for download, (usually illegal) trojans posing as MP3s, trojans posing as pornography, or 'underground' programs such as cracks and keygens.
  • Spyware is software that aids in gathering information about a person or organization without their knowledge and that may send such information to another entity without the consumer's consent, or that asserts control over a computer without the consumer's knowledge. [1]
  • "Spyware" is mostly classified into four types: system monitors, trojans, adware, and tracking cookies.[2] Spyware is mostly used for the purposes such as; tracking and storing internet users' movements on the web; serving up pop-up ads to internet users.
a dware
  • Adware, or advertising-supported software, is any software package which automatically renders advertisements in order to generate revenue for its author. The advertisements may be in the user interface of the software or on a screen presented to the user during the installation process. The functions may be designed to analyze which Internet sites the user visits and to present advertising pertinent to the types of goods or services featured there. The term is sometimes used to refer to software that displays unwanted advertisements.[1]
  • An Internet bot, also known as web robot, WWW robot or simply bot, is a software application that runs automated tasks over the Internet. Typically, bots perform tasks that are both simple and structurally repetitive, at a much higher rate than would be possible for a human alone. The largest use of bots is in web spidering, in which an automated script fetches, analyses and files information from web servers at many times the speed of a human. Each server can have a file called robots.txt, containing rules for the spidering of that server that the bot is supposed to obey or be removed.
  • A backdoor in a computer system (or cryptosystem or algorithm) is a method of bypassing normal authentication, securing illegal remote access to a computer, obtaining access to plaintext, and so on, while attempting to remain undetected. The backdoor may take the form of an installed program (e.g., Back Orifice) or may subvert the system through a rootkit.
rogue security software
Rogue security software
  • Rogue security software is a FraudTool (a form of Internet fraud using computer malware) that deceives or misleads users into paying money for fake or simulated removal of malware (so is a form of ransomware)—or it claims to get rid of, but instead introduces malware to the computer.[1]Rogue security software has become a growing and serious security threat in desktop computing in recent years (from 2008 on).[2]
malicious bho
Malicious BHO
  • A Browser Helper Object (BHO) is a DLLmodule designed as a plugin for Microsoft's Internet Explorerweb browser to provide added functionality.
  • Spam is abuse of electronic messaging systems to send unsolicited bulk messages Most widely recognized form is email spam. Also includes IM spam, blog spam, discussion forum spam, cell phone messaging spam etc.
  • Electronic Spamming is the use of electronic messaging systems to send unsolicited bulk messages (spam), especially advertising, indiscriminately.
  • Phishing can be defined as a criminal activity using social engineering techniques. The most common example of phishing is an email asking to enter account/credit card information for ecommerce websites (ebay, amazon etc) and online banking.
  • Phishing is the act of attempting to acquire information such as usernames, passwords, and credit card details (and sometimes, indirectly, money) by masquerading as a trustworthy entity in an electronic communication.[1][2]
  • An exploit is a piece of software, a chunk of data, or sequence of commands that take advantage of a bug, glitch or vulnerability in order to cause unintended or unanticipated behavior to occur on computer software, hardware, or something electronic (usually computerized).
  • This frequently includes such things as gaining control of a computer system or allowing privilege escalation or a denial of service attack.
  • In the analysis of malicious software such as worms, viruses and Trojans, it refers to the software's harmful results. Examples of payloads include data destruction, messages with insulting text or spurious e-mail messages sent to a large number of people.
  • Incomputer security, payload refers to the part of a computer virus which performs a malicious action.[3]
virus vs worms
Virus vs. worms
  • Both self replicated
  • Worms usually do not need any extra help from a user to replicate and execute
  • Computer virus usually need human intervention for replication and execution.
classification of virus by target
Classification of virus by target
  • Boot sector virus
    • Master Boot Record (Boot sector in DOS) is a piece of code that runs every time a computer system is booted. Boot sector virus infect the MBR on the disk, hence getting the privilege of getting executed every time the computer system starts up.
classification of virus by target1
Classification of virus by target
  • File virus
    • File virus is the most common form of viruses. They infect the file system on a computer. File virus infect executable programs and are executed every time the infected program is run.
  • Macro virus
    • Macro virus infect documents and templates instead of executable programs. It is written in a macro programming language that is built into applications like Microsoft Word or Excel. Macro virus can be automatically executed every time the document is opened with the application.
classification of virus by self protection strategy
Classification of virus by self protection strategy

Self-protection strategy can be defined as the technique used by a virus to avoid detection. In other words, the anti-antivirus techniques. They are

  • No concealment
  • Code obfuscation
  • Encryption
  • Polymorphism
  • Metamorphism
  • stealth
self protection strategy
Self protection strategy
  • No concealment
    • the one without any concealment. The virus code is clean without any garbage instructions or encryption.
  • Code obfuscation
    • a technique developed to avoid specific-signature detection. These include adding no-op instructions, unnecessary jumps etc., so the virus code look muddled and the signature fails.
  • Encryption
    • Encrypted viruses use an encrypted virus body and an unencrypted decryption engine. For each infection, the virus is encrypted with a different key to avoid giving a constant signature.
self protection strategy1
Self protection strategy
  • Polymorphism
    • Encrypted virus were caught by the presence of the unencrypted decryption engine that remain constant for every infection. This was cured by the mutating techniques. Polymorphic virus feature a mutation engine that generates the decryption engine on the fly. It consists of a decryption engine, a mutation engine and payload. The encrypted virus body and the mutating decryption engine refused to provide a constant signature.
self protection strategy2
Self protection strategy
  • Metamorphism
    • Metamorphic virus is a self mutating virus in its truest form of the word at it has no constant parts.
    • The virus body itself changes during the infection process and hence the infected file represents a new generation that does not resemble the parent.
  • Stealth
    • Stealth techniques, also called code armoring, refers to the set of techniques developed by the virus writers to avoid the recent detection methods of activity monitoring, code emulation etc.
    • The techniques include anti-disassembly, anti-debugging, anti-emulation, anti-heuristics etc.
classification of worm
Classification of worm
  • Activation
    • Activation defines the means by which a worm is activated onto the target system. This is the first phase in a worms life cycle.
  • Payload
    • The next phase in the worm’s life cycle is payload delivery. Payload describes what a worm does after the infection.
  • Target discovery
    • Once the payload is delivered, the worm start looking for new targets to attack.
  • Propagation
    • Propagation defines the means by which a worm spreads on a network.
activation of worm
Activation of worm
  • Human activation
    • Requiring a human to execute the worm
  • Human activity-based activation
    • Some action that the user performs not directly related to the worm such as launching an application program etc.
  • Scheduled process activation
    • Depending on the scheduled system processes such as automatic download of software updates etc.
  • Self activation
    • Initializing its execution by exploiting the vulnerabilities in the programs that are always running such as database or web servers
payload of worm
Payload of worm
  • None
    • Most worms do not carry any payload to avoid increasing machine and network traffic load.
  • Internet remote control
    • Some worms open a backdoor on the victims machine thus allowing connection from others via internet.
  • Spam-relays
    • Some worms convert the victim machine into a spam relay, thus allowing spammers to use it as a server.
target discovery of worm
Target discovery of worm
  • Scanning worms
    • They scan for targets by scanning sequentially through a block of addresses or by scanning randomly.
  • Flash worms
    • They use a pre-generated target list or a hit list to accelerate the target discovery process.
  • Metaserver worms
    • They use a list of addresses to infect maintained by an external metaserver.
  • Topological worms
    • They try to find the local communication topology by searching through a list of hosts maintained by application programs
  • Passive worms
    • They rely on user intervention or targets to contact worm for their execution
propagation of worm
Propagation of worm
  • Self-carried
    • They usually self activated and copy themselves to the target as part of the infection process.
  • Second channel
    • They copy their body after the infection by creating a connection from target to host to download the body.
  • Embedded
    • They embed themselves in the normal communication process as a stealth technique.
trojan vs virus worm
Trojan vs. virus/worm
  • More annoying than malicious
  • Do not reproduce by infecting other files
  • Do not self-replicate
  • Rely heavily on the exploitation of an end-user
  • Often create a backdoor
how trojan work
How Trojan work
  • Basic components: client and server
    • Client: attacker
    • Server: victim machine
  • Two kinds of connections:
    • Forward: The server listen on specific ports connections from client which knows IP sent by the server
    • Reverse: the server connects to Client (partly because of NAT infrastructure and the firewall on victim machine)
trojan must know
Trojan must know
  • Trojans are extremely simple to create in many programming languages. A simple Trojan in Visual Basic or C# using Visual Studio can be achieved in 10 lines of code or under.
  • Probably the most famous Trojan horse is the AIDS TROJAN DISK that was sent to about 7000 research organizations on a diskette.
common functions of trojan
Common functions of Trojan
  • Remote access trojans
  • Password sending trojans
  • Keyloggers
  • Destructive
  • Mail-bomb trojan
  • Proxy/wingatetrojans
  • FTP trojans
  • Software detection killers
remote access trojan
Remote Access Trojan
  • These are probably the most widely used trojans, just because they give the attackers the power to do more things on the victim’s machine than the victim itself while being in front of the machine.
  • Most of these trojans are often a combination of the other variations described below. The idea of these trojans is to give the attacker a total access to someone’s machine and therefore access to files, private conversations, accounting data, etc.
password sending trojan
Password Sending Trojan
  • The purpose of these trojans is to rip all the cached passwords and also look for other passwords you’re entering and then send them to a specific mail address without the user noticing anything.
  • Passwords for ICQ, IRC, FTP, HTTP or any other application that require a user to enter a (login+ password) are being sent back to the attacker’s email address, which in most cases is located at some free web based email provider.
  • These trojans are very simple. The only thing they do is logging the keystrokes of the victim and then letting the attacker search for passwords or other sensitive data in the log file.
  • Most of them come with two functions like online and offline recording. Of course, they could be configured to send the log file to a specific email address on a schedule basis.
  • The only function of these trojans is to destroy and delete files. This makes them very simple and easy to use. They can automatically delete all the system files on your machine.
  • The trojan is being activated by the attacker or sometimes works like a logic bomb and starts on a specific day and at specific hour.
  • Denial Of Service (DoS) Attack Trojans These trojans are getting very popular these days, giving the attacker the power to start DDoS when having enough victims.
mail bomb trojan
Mail-Bomb Trojan
  • This is another variation of a DoStrojan whose main aim is to infect as many machines as possible and simultaneously attack specific email address/addresses with random subjects and contents which cannot be filtered.
proxy wingate trojans
Proxy/Wingate Trojans
  • It turns the victim’s computer into a proxy/wingate server available to the whole world or to the attacker only. It’s used for anonymous Telnet, ICQ, IRC, etc., and also for registering domains with stolen credit cards and for many other illegal activities.
  • This gives the attacker complete anonymity and the chance to do everything from your computer, and if he/she gets caught, the trace leads back to you.
ftp trojans
FTP Trojans
  • These trojans are probably the most simple ones and are kind of outdated as the only thing they do is to open port 21 (the port for FTP transfers) and let everyone or just the attacker connect to your machine.
  • Newer versions are password protected, so only the one who infected you may connect to your computer.
software detection killers
Software Detection Killers
  • There are such functionalities built into some trojans, but there are also separate programs that will kill ZoneAlarm, Norton Anti-Virus and many other (popular anti-virus/firewall) programs that protect your machine.
  • When they are disabled, the attacker will have full access to your machine to perform some illegal activity, use your computer to attack others and often disappear.
  • Even though you may notice that these programs are not working or functioning properly, it will take you some time to remove the trojan, install the new software, configure it and get back online with some sense of security.
  • the first reported incidents of true viruses were in 1981 and 1982 on the Apple II computer.
  • Elk Cloner is considered to be the first documented example of a virus in mid-1981.
  • The first PC virus was a boot sector virus called Brain in 1986.
hierarchy of malware detection
Hierarchy of Malware Detection

Scanning, activity monitoring, integrity checking, data mining

Byte sequences, instruction sequences, API call sequences, hybrid, network connection records

Static analysis, dynamic analysis, mixed approach

Anomaly detection, misuse detection, hybrid detection

Detection methods

Feature type

Analysis type

Detection strategy