Data Mining Methods for Malware Detection

Data Mining Methods for Malware Detection From MUAZZAM AHMED SIDDIQUI Ph. D dissertation, 2008

Table of Contents • Definitions • Malware • Spam • Phishing • Exploits • Review • Detection methods • Feature-based approaches

Malware • Malicious programs • Viruses • Worms • Trojans • Spyware • Adware • Varieties

Malware • Malicious programs • Any program that is purposefully created to harm computer system operations or data • Malware is software used to disrupt computer operation, gather sensitive information, or gain access to private computer systems.[1] It can appear in the form of code, scripts, active content, and other software.[2] 'Malware' is a general term used to refer to a variety of forms of hostile or intrusive software.[3] (from Wikipedia)

MalwareVirus ? • includes viruses, worms, trojans, backdoors, adwares, spywares, bots, rootkits etc. • includes computer viruses, ransomware, worms, trojan horses, rootkits, keyloggers, dialers, spyware, adware, malicious BHOs, rogue security software and other malicious programs; the majority of active malware threats are usually worms or trojans rather than viruses.[4](from Wiki)

Computer virus • A computer virus is a type of malware that, when executed, replicates by inserting copies of itself (possibly modified) into other computer programs, data files, or the boot sector of the hard drive; when this replication succeeds, the affected areas are then said to be "infected".[1][2][3] Viruses often perform some type of harmful activity on infected hosts, such as stealing hard disk space or CPU time, accessing private information, corrupting data, displaying political or humorous messages on the user's screen, spamming their contacts, or logging their keystrokes.

Ransomware • Ransomware comprises a class of malware which restricts access to the computer system that it infects, and demands a ransom paid to the creator of the malware in order for the restriction to be removed. Some forms of ransomware encrypt files on the system's hard drive (cryptoviral extortion), while some may simply lock the system and display messages intended to coax the user into paying.

Computer worm • A computer worm is a standalone malware computer program that replicates itself in order to spread to other computers.[1] Often, it uses a computer network to spread itself, relying on security failures on the target computer to access it. Unlike a computer virus, it does not need to attach itself to an existing program.[2] Worms almost always cause at least some harm to the network, even if only by consuming bandwidth, whereas viruses almost always corrupt or modify files on a targeted computer.

Trojan • A Trojan horse, or Trojan, is a hacking program that is a non-self-replicating type of malware which gains privileged access to the operating system while appearing to perform a desirable function but instead drops a malicious payload, often including a backdoor allowing unauthorized access to the target's computer.[1] These backdoors tend to be invisible to average users, but may cause the computer to run slowly. Trojans do not attempt to inject themselves into other files like a computer virus. Trojan horses may steal information, or harm their host computer systems.[2] Trojans may use drive-by downloads or install via online games or internet-driven applications in order to reach target computers.

Figure: How the Duqu Trojan infects a PC through a Word document vulnerability

Rootkits • A rootkit is a stealthy type of software, typically malicious, designed to hide the existence of certain processes or programs from normal methods of detection and enable continued privileged access to a computer.[1] The term rootkit is a concatenation of "root" (the traditional name of the privileged account on Unix operating systems) and the word "kit" (which refers to the software components that implement the tool). The term "rootkit" has negative connotations through its association with malware.[1]

Rootkits • ZeroAccess is a trojan horse that affects Microsoft Windows operating systems. It is used to download other malware on an infected machine and to form a botnet, while remaining hidden on a system using rootkit techniques.

Keystroke logging • Keystroke logging, often referred to as keylogging or Keyboard Capturing, is the action of recording (or logging) the keys struck on a keyboard, typically in a covert manner so that the person using the keyboard is unaware that their actions are being monitored.[1] It also has very legitimate uses in studies of human-computer interaction.

Dialers • Dialers are necessary to connect to the internet (at least for non-broadband connections), but some dialers are designed to connect to premium-rate numbers. The providers of such dialers often search for security holes in the operating system installed on the user's computer and use them to set the computer up to dial up through their number, so as to make money from the calls. Alternatively, some dialers inform the user what it is that they are doing, with the promise of special content, accessible only via the special number. Examples of this content include software for download, (usually illegal) trojans posing as MP3s, trojans posing as pornography, or 'underground' programs such as cracks and keygens.

Spyware • Spyware is software that aids in gathering information about a person or organization without their knowledge and that may send such information to another entity without the consumer's consent, or that asserts control over a computer without the consumer's knowledge.[1] • Spyware is mostly classified into four types: system monitors, trojans, adware, and tracking cookies.[2] It is mostly used for purposes such as tracking and storing Internet users' movements on the web and serving pop-up ads to Internet users.

Adware • Adware, or advertising-supported software, is any software package which automatically renders advertisements in order to generate revenue for its author. The advertisements may be in the user interface of the software or on a screen presented to the user during the installation process. The functions may be designed to analyze which Internet sites the user visits and to present advertising pertinent to the types of goods or services featured there. The term is sometimes used to refer to software that displays unwanted advertisements.[1]

Bots • An Internet bot, also known as web robot, WWW robot or simply bot, is a software application that runs automated tasks over the Internet. Typically, bots perform tasks that are both simple and structurally repetitive, at a much higher rate than would be possible for a human alone. The largest use of bots is in web spidering, in which an automated script fetches, analyses and files information from web servers at many times the speed of a human. Each server can have a file called robots.txt, containing rules for the spidering of that server that the bot is supposed to obey or be removed.
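
As a minimal sketch of how a well-behaved bot might consult robots.txt before fetching a page, the following uses Python's standard urllib.robotparser module. The rules shown, the bot name ExampleBot, and the example.com URLs are hypothetical and chosen only for illustration; a real bot would download the file from the server it is about to crawl.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content (an assumption for this sketch).
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def allowed(url: str, agent: str = "ExampleBot") -> bool:
    """Return True if the rules above permit `agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())  # parse rules from a list of lines
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    print(allowed("https://example.com/index.html"))         # True
    print(allowed("https://example.com/private/data.html"))  # False
```

Note that robots.txt is advisory: the check only restrains bots that choose to perform it.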

Backdoor • A backdoor in a computer system (or cryptosystem or algorithm) is a method of bypassing normal authentication, securing illegal remote access to a computer, obtaining access to plaintext, and so on, while attempting to remain undetected. The backdoor may take the form of an installed program (e.g., Back Orifice) or may subvert the system through a rootkit.

Rogue security software • Rogue security software is a FraudTool (a form of Internet fraud using computer malware) that deceives or misleads users into paying money for fake or simulated removal of malware (so is a form of ransomware), or that claims to get rid of malware but instead introduces it to the computer.[1] Rogue security software has become a growing and serious security threat in desktop computing in recent years (from 2008 on).[2]

Malicious BHO • A Browser Helper Object (BHO) is a DLL module designed as a plugin for Microsoft's Internet Explorer web browser to provide added functionality.

Spam • Spam is the abuse of electronic messaging systems to send unsolicited bulk messages. The most widely recognized form is email spam. It also includes IM spam, blog spam, discussion forum spam, cell phone messaging spam, etc. • Electronic spamming is the use of electronic messaging systems to send unsolicited bulk messages (spam), especially advertising, indiscriminately.

Phishing • Phishing can be defined as a criminal activity using social engineering techniques. The most common example of phishing is an email asking the recipient to enter account or credit card information for e-commerce websites (eBay, Amazon, etc.) and online banking. • Phishing is the act of attempting to acquire information such as usernames, passwords, and credit card details (and sometimes, indirectly, money) by masquerading as a trustworthy entity in an electronic communication.[1][2]

Exploits • An exploit is a piece of software, a chunk of data, or a sequence of commands that takes advantage of a bug, glitch or vulnerability in order to cause unintended or unanticipated behavior to occur on computer software, hardware, or something electronic (usually computerized). • This frequently includes such things as gaining control of a computer system or allowing privilege escalation or a denial of service attack.

Payload • In the analysis of malicious software such as worms, viruses and Trojans, payload refers to the software's harmful results. Examples of payloads include data destruction, messages with insulting text or spurious e-mail messages sent to a large number of people. • In computer security, payload refers to the part of a computer virus which performs a malicious action.[3]

Virus vs. worms • Both self-replicate • Worms usually do not need any extra help from a user to replicate and execute • Computer viruses usually need human intervention for replication and execution.

Classification of virus by target • Boot sector virus • The Master Boot Record (the boot sector in DOS) is a piece of code that runs every time a computer system is booted. Boot sector viruses infect the MBR on the disk, and so are executed every time the computer system starts up.

Classification of virus by target • File virus • File viruses are the most common form of virus. They infect the file system on a computer. File viruses infect executable programs and are executed every time the infected program is run. • Macro virus • Macro viruses infect documents and templates instead of executable programs. They are written in a macro programming language that is built into applications like Microsoft Word or Excel. A macro virus can be executed automatically every time the document is opened with the application.

Classification of virus by self-protection strategy • A self-protection strategy is the technique a virus uses to avoid detection; in other words, its anti-antivirus techniques. The strategies are: • No concealment • Code obfuscation • Encryption • Polymorphism • Metamorphism • Stealth

Self-protection strategy • No concealment • The virus makes no attempt to hide. Its code is clean, with no garbage instructions or encryption. • Code obfuscation • A technique developed to avoid specific-signature detection. It includes adding no-op instructions, unnecessary jumps, etc., so that the virus code looks muddled and the signature fails to match. • Encryption • Encrypted viruses use an encrypted virus body and an unencrypted decryption engine. For each infection, the virus is encrypted with a different key to avoid presenting a constant signature.
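
To illustrate why padding code with no-op instructions defeats specific-signature detection, here is a toy sketch. The byte pattern used as SIGNATURE is arbitrary and hypothetical, the helper names are invented for this example, and the "normalization" step is deliberately crude; it is not how any particular scanner works.

```python
NOP = 0x90  # x86 single-byte no-op, used here only as a filler marker

# Hypothetical signature: an arbitrary byte pattern standing in for virus code.
SIGNATURE = bytes([0xDE, 0xAD, 0xBE, 0xEF])


def naive_match(sample: bytes, signature: bytes = SIGNATURE) -> bool:
    """Exact substring search, as a specific-signature scanner would do."""
    return signature in sample


def obfuscate(code: bytes) -> bytes:
    """Insert a NOP after every byte, mimicking garbage-instruction padding."""
    out = bytearray()
    for b in code:
        out += bytes([b, NOP])
    return bytes(out)


def strip_nops(sample: bytes) -> bytes:
    """Crude normalization: drop every 0x90 byte before matching.

    Real code cannot be normalized this blindly, because 0x90 may also
    occur as operand data; this is an assumption made for the toy example.
    """
    return bytes(b for b in sample if b != NOP)


plain = b"\x01\x02" + SIGNATURE + b"\x03"
padded = obfuscate(plain)

print(naive_match(plain))               # True: the signature is contiguous
print(naive_match(padded))              # False: NOPs split the signature
print(naive_match(strip_nops(padded)))  # True: matches after normalization
```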

Self-protection strategy • Polymorphism • Encrypted viruses were caught by the presence of the unencrypted decryption engine, which remains constant for every infection. Mutation techniques were developed to counter this. A polymorphic virus features a mutation engine that generates the decryption engine on the fly. It consists of a decryption engine, a mutation engine, and a payload. With an encrypted virus body and a mutating decryption engine, the virus presents no constant signature.
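
The observation that simple encrypted viruses were caught through their constant decryption engine can be sketched with harmless byte strings. Everything below is an assumption made for illustration: the stub and body are arbitrary bytes, single-byte XOR stands in for the encryption, and the keys are nonzero values picked by hand.

```python
# Hypothetical byte strings standing in for a decryptor stub and a virus body.
DECRYPTOR_STUB = b"\x10\x20\x30\x40"
BODY = b"\xde\xad\xbe\xef\x00\x11"


def xor_bytes(data: bytes, key: int) -> bytes:
    """Single-byte XOR, a stand-in for the per-infection encryption."""
    return bytes(b ^ key for b in data)


def make_variant(key: int) -> bytes:
    """Constant stub followed by the body encrypted under `key`."""
    return DECRYPTOR_STUB + xor_bytes(BODY, key)


variant_a = make_variant(0x5A)
variant_b = make_variant(0xC3)

# A signature taken from the plaintext body matches neither variant ...
body_sig_hits = [BODY in v for v in (variant_a, variant_b)]
# ... but the unencrypted stub is identical in both, so it still matches.
stub_sig_hits = [DECRYPTOR_STUB in v for v in (variant_a, variant_b)]

print(body_sig_hits)  # [False, False]
print(stub_sig_hits)  # [True, True]
```

A polymorphic virus removes this last constant by regenerating the stub itself for each infection, which is why the stub signature stops working.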

Self-protection strategy • Metamorphism • A metamorphic virus is self-mutating in the truest sense of the word, as it has no constant parts. • The virus body itself changes during the infection process, and hence the infected file represents a new generation that does not resemble the parent. • Stealth • Stealth techniques, also called code armoring, refer to the set of techniques developed by virus writers to evade recent detection methods such as activity monitoring and code emulation. • The techniques include anti-disassembly, anti-debugging, anti-emulation, anti-heuristics, etc.

Classification of worm • Activation • Activation defines the means by which a worm is activated on the target system. This is the first phase in a worm's life cycle. • Payload • The next phase in the worm's life cycle is payload delivery. Payload describes what a worm does after the infection. • Target discovery • Once the payload is delivered, the worm starts looking for new targets to attack. • Propagation • Propagation defines the means by which a worm spreads on a network.

Activation of worm • Human activation • Requires a human to execute the worm • Human activity-based activation • Triggered by some action the user performs that is not directly related to the worm, such as launching an application program • Scheduled process activation • Depends on scheduled system processes, such as automatic download of software updates • Self activation • The worm initiates its own execution by exploiting vulnerabilities in programs that are always running, such as database or web servers

Payload of worm • None • Most worms carry no payload, which avoids increasing machine and network traffic load. • Internet remote control • Some worms open a backdoor on the victim's machine, allowing connections from others via the Internet. • Spam relays • Some worms convert the victim's machine into a spam relay, allowing spammers to use it as a server.

Target discovery of worm • Scanning worms • They search for targets by scanning a block of addresses sequentially or by scanning randomly. • Flash worms • They use a pre-generated target list, or hit list, to accelerate the target discovery process. • Metaserver worms • They use a list of addresses to infect that is maintained by an external metaserver. • Topological worms • They try to find the local communication topology by searching through a list of hosts maintained by application programs. • Passive worms • They rely on user intervention, or wait for targets to contact the worm, in order to execute.

Propagation of worm • Self-carried • They are usually self-activated and copy themselves to the target as part of the infection process. • Second channel • They copy their body after the infection by creating a connection from the target to the host to download the body. • Embedded • They embed themselves in the normal communication process as a stealth technique.
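
The four-part worm classification above (activation, payload, target discovery, propagation) maps naturally onto a small labeling structure, which may be convenient when preparing labeled samples for data mining. The sketch below is one possible encoding; the class names, the WormProfile record, and the sample named ExampleWorm are all hypothetical and do not describe any real worm.

```python
from dataclasses import dataclass
from enum import Enum


class Activation(Enum):
    HUMAN = "human"
    HUMAN_ACTIVITY = "human activity-based"
    SCHEDULED_PROCESS = "scheduled process"
    SELF = "self"


class Payload(Enum):
    NONE = "none"
    REMOTE_CONTROL = "internet remote control"
    SPAM_RELAY = "spam relay"


class TargetDiscovery(Enum):
    SCANNING = "scanning"
    FLASH = "flash"
    METASERVER = "metaserver"
    TOPOLOGICAL = "topological"
    PASSIVE = "passive"


class Propagation(Enum):
    SELF_CARRIED = "self-carried"
    SECOND_CHANNEL = "second channel"
    EMBEDDED = "embedded"


@dataclass(frozen=True)
class WormProfile:
    """One label per classification axis for a single sample."""
    name: str
    activation: Activation
    payload: Payload
    target_discovery: TargetDiscovery
    propagation: Propagation

    def as_features(self) -> dict:
        """Flatten the labels into categorical features keyed by axis."""
        return {
            "activation": self.activation.value,
            "payload": self.payload.value,
            "target_discovery": self.target_discovery.value,
            "propagation": self.propagation.value,
        }


# Hypothetical sample, invented for illustration only.
profile = WormProfile(
    name="ExampleWorm",
    activation=Activation.SELF,
    payload=Payload.REMOTE_CONTROL,
    target_discovery=TargetDiscovery.SCANNING,
    propagation=Propagation.SELF_CARRIED,
)
print(profile.as_features())
```

Using enumerations rather than free-text labels keeps the categories closed, so a mistyped label fails at construction time instead of silently creating a new class.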

Trojan vs. virus/worm • More annoying than malicious • Do not reproduce by infecting other files • Do not self-replicate • Rely heavily on the exploitation of an end-user • Often create a backdoor
