Introduction
• Association rule mining finds interesting associations and relationships among items in large datasets. The support of an itemset measures how frequently that itemset occurs across the transactions.
• This data mining technique is widely used in industries such as retail, e-commerce, and healthcare to understand customer behavior, detect patterns, and make informed decisions.
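To make "how frequently an itemset occurs" concrete, here is a minimal sketch that counts every 2-item itemset across a handful of transactions. The baskets are hypothetical toy data, and `count_itemsets` is an illustrative helper, not a library function.

```python
from collections import Counter
from itertools import combinations

# Toy transactions (hypothetical retail baskets); each transaction is a set of items.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "beer"},
]


def count_itemsets(transactions, size):
    """Count how many transactions contain each itemset of the given size."""
    counts = Counter()
    for t in transactions:
        # Sorting gives each itemset one canonical tuple key, e.g. ("beer", "diapers").
        for itemset in combinations(sorted(t), size):
            counts[itemset] += 1
    return counts


pair_counts = count_itemsets(transactions, 2)
for itemset, n in pair_counts.most_common(3):
    # Support = share of all transactions that contain the itemset.
    print(itemset, n, f"support={n / len(transactions):.2f}")
```

In this toy data, ("beer", "diapers") appears in 3 of the 4 transactions, which gives a support of 0.75. Enumerating every combination works for small examples, but real datasets call for pruning algorithms such as Apriori.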
Use Case
• In cyber forensics, association rule mining identifies patterns and relationships in large datasets of network traffic or system logs. It uncovers co-occurring events that may indicate a cyber attack, even when each event seems benign on its own, and so helps investigators discover potential malicious activity. In essence, it reveals "if-then" relationships between different system behaviors.
Identifying suspicious patterns: By analyzing log data, association rule mining can identify combinations of seemingly normal events that, when they occur together, might indicate a malicious attack. One example is a suspicious IP address accessing multiple sensitive files within a short time frame.
"If-then" rule format: Association rules are typically expressed as "if [event A] occurs, then [event B] is likely to occur". This format helps investigators understand potential attack chains or correlated behaviors.
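Before rules can be mined from logs, raw events have to be grouped into transactions. The sketch below shows one common approach: each (source IP, fixed time window) pair becomes one transaction, so a burst of sensitive-file reads from one IP lands in a single itemset. The log records, event labels, and the 5-minute window are all hypothetical choices for illustration. `build_transactions` is not a standard API.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical log records: (timestamp, source IP, event label).
# The event labels are illustrative, not a real log schema.
events = [
    (datetime(2024, 1, 1, 9, 0, 5), "10.0.0.7", "login_success"),
    (datetime(2024, 1, 1, 9, 0, 40), "10.0.0.7", "read:/hr/salaries.xlsx"),
    (datetime(2024, 1, 1, 9, 1, 10), "10.0.0.7", "read:/finance/q4.pdf"),
    (datetime(2024, 1, 1, 9, 30, 0), "10.0.0.9", "login_success"),
    (datetime(2024, 1, 1, 9, 45, 0), "10.0.0.7", "usb_mount"),
]


def build_transactions(events, window=timedelta(minutes=5)):
    """Group events by (source IP, fixed time window); each group is one transaction."""
    if not events:
        return []
    origin = min(ts for ts, _, _ in events)
    buckets = defaultdict(set)
    for ts, ip, label in events:
        # timedelta // timedelta yields the integer index of the window.
        bucket = (ts - origin) // window
        buckets[(ip, bucket)].add(label)
    return [frozenset(labels) for labels in buckets.values()]


for tx in build_transactions(events):
    print(sorted(tx))
```

Fixed (tumbling) windows are the simplest option. They can split a burst that straddles a window boundary, so overlapping or session-based windows may fit some investigations better.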
Metrics for evaluation: "Support" (how frequently a pattern appears in the data) and "confidence" (the probability that event B occurs given that event A has occurred) are the key metrics used to assess the strength of an association rule.
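For a rule A ⇒ B over N transactions, support(A ⇒ B) = count(A ∪ B) / N and confidence(A ⇒ B) = support(A ∪ B) / support(A). The sketch below computes both. The event names and transactions are hypothetical.

```python
# Hypothetical security transactions: the set of events seen per host per hour.
transactions = [
    frozenset({"failed_login", "new_admin_account"}),
    frozenset({"failed_login", "new_admin_account", "log_cleared"}),
    frozenset({"failed_login"}),
    frozenset({"file_download"}),
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    if not transactions:
        return 0.0
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimate P(consequent | antecedent) as support(A ∪ B) / support(A)."""
    base = support(antecedent, transactions)
    if base == 0:
        return 0.0
    return support(antecedent | consequent, transactions) / base


A = frozenset({"failed_login"})
B = frozenset({"new_admin_account"})
print(support(A | B, transactions))  # 0.5
print(confidence(A, B, transactions))
```

Here {failed_login, new_admin_account} appears in 2 of 4 transactions (support 0.5). Of the 3 transactions containing failed_login, 2 also contain new_admin_account, so the confidence is about 0.67.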
Common applications of association rule mining in cyber forensics:
• Intrusion detection: analyzing network traffic logs to identify unusual patterns that might indicate an ongoing intrusion attempt.
• Anomaly detection: discovering deviations from normal system behavior by looking for unusual combinations of events.
• Malware analysis: identifying sequences of system calls or registry modifications characteristic of specific malware families.
• User behavior analysis: analyzing user logins, file access patterns, and network activity to detect potential insider threats.
Challenges in using association rule mining for cyber forensics:
• Data complexity: large volumes of diverse log data can be challenging to process and analyze effectively.
• False positives: identifying relevant associations while filtering out noise and irrelevant patterns can be difficult.
• Algorithm selection: choosing an association rule mining algorithm suited to the characteristics of the data is crucial.
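Apriori is one widely used algorithm that addresses the data-volume and noise challenges. It applies a minimum support threshold and prunes any candidate itemset that has an infrequent subset. FP-Growth is a common alternative for larger datasets. The following is a plain-Python sketch of level-wise Apriori, not an optimized implementation. The log-derived transactions are hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Level-wise Apriori: return {frozenset itemset: support} for frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    if n == 0:
        return {}
    # Level 1 candidates: every distinct single item.
    candidates = [frozenset([item]) for item in {i for t in transactions for i in t}]
    frequent = {}
    k = 1
    while candidates:
        # One pass over the data counts every candidate at this level.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        # Keep only candidates meeting the minimum support threshold.
        level = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
        frequent.update(level)
        k += 1
        # Join: merge pairs of frequent (k-1)-itemsets into k-itemsets.
        joined = {a | b for a, b in combinations(level, 2) if len(a | b) == k}
        # Prune: a k-itemset can only be frequent if all its (k-1)-subsets are.
        candidates = [
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        ]
    return frequent


logs = [
    {"port_scan", "ssh_bruteforce", "root_login"},
    {"port_scan", "ssh_bruteforce"},
    {"port_scan", "ssh_bruteforce", "root_login"},
    {"web_request"},
]
for itemset, sup in sorted(apriori(logs, min_support=0.5).items(), key=lambda kv: -kv[1]):
    print(sorted(itemset), sup)
```

Raising `min_support` shrinks the search space and suppresses rare, noisy combinations. Setting it too high, however, can hide rare but genuinely malicious patterns, which is a real trade-off in forensic work.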
Case Study: Identifying Phishing Attempts
• Dataset: Email metadata
• Rule: If the sender is unknown and link clicks > 5, then phishing attempt likelihood = 90%
• Visual representation: Present the rules and their metrics with graphs or charts
Dataset Overview
Email metadata fields used:
• Sender Type: Known or Unknown.
• Email Subject Keywords: presence of specific keywords such as "Urgent" or "Account Locked".
• Number of Link Clicks: count of URLs clicked within the email.
• Attachment Type: presence and type of attachments (e.g., .pdf, .exe).
• Time of Access: time when the email was opened or accessed.
• Email Content Sentiment: level of urgency or requests for sensitive information.
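Association rules operate on discrete items, so each email's metadata first has to be turned into a set of categorical items. The sketch below shows one way to do this. The `EmailRecord` schema, the item naming scheme, the sentiment labels, and the 07:00 to 19:00 business-hours cutoff are all hypothetical assumptions. Only the field list and the "> 5 clicks" threshold come from the case study.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional

# Illustrative keyword list taken from the subject-keyword examples above.
SUSPICIOUS_KEYWORDS = ("urgent", "account locked")


@dataclass
class EmailRecord:
    """Hypothetical schema mirroring the metadata fields listed above."""
    sender_known: bool
    subject: str
    link_clicks: int
    attachment_ext: Optional[str]  # e.g. ".pdf", ".exe", or None
    access_hour: int  # 0-23, hour at which the email was opened
    urgency: str  # assumed sentiment labels: "low", "medium", "high"


def to_items(rec: EmailRecord) -> FrozenSet[str]:
    """Discretize one email's metadata into categorical items for rule mining."""
    items = {"sender=known" if rec.sender_known else "sender=unknown"}
    subject = rec.subject.lower()
    items.update(f"keyword={kw}" for kw in SUSPICIOUS_KEYWORDS if kw in subject)
    # Numeric fields are bucketed because association rules operate on discrete items.
    items.add("clicks>5" if rec.link_clicks > 5 else "clicks<=5")
    if rec.attachment_ext:
        items.add(f"attachment={rec.attachment_ext.lower()}")
    # Assumed business hours of 07:00-19:00; adjust to the organization's schedule.
    off_hours = rec.access_hour < 7 or rec.access_hour >= 19
    items.add("access=off_hours" if off_hours else "access=work_hours")
    items.add(f"sentiment={rec.urgency}")
    return frozenset(items)


example = EmailRecord(False, "URGENT: Account Locked", 7, ".EXE", 23, "high")
print(sorted(to_items(example)))
```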
Association Rule Generation
Rule examples:
• If Sender Type = Unknown AND Link Clicks > 5 AND Keywords = "Urgent" or "Account Locked", THEN Phishing Attempt Likelihood = 90%
• If Attachment = .exe AND Sentiment = High Urgency, THEN Phishing Attempt Likelihood = 85%
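The "likelihood" percentages can be read as rule confidence: of the emails matching the antecedent, the share labeled as phishing. The sketch below shows how such a figure could be estimated from labeled emails, with each rule written as a predicate over an email's item set. The item names and the labeled sample are hypothetical, so the output does not reproduce the 90% and 85% figures. The "Urgent" or "Account Locked" condition is a disjunction, so in strict association-rule terms it corresponds to two conjunctive rules.

```python
from typing import Callable, FrozenSet, List, Tuple

Items = FrozenSet[str]


def rule_unknown_clicks_keyword(items: Items) -> bool:
    """Sender unknown AND clicks > 5 AND ("urgent" OR "account locked") keyword."""
    return (
        "sender=unknown" in items
        and "clicks>5" in items
        and ("keyword=urgent" in items or "keyword=account locked" in items)
    )


def rule_exe_high_urgency(items: Items) -> bool:
    """Attachment is .exe AND content sentiment is high urgency."""
    return "attachment=.exe" in items and "sentiment=high" in items


def rule_confidence(antecedent: Callable[[Items], bool],
                    labeled: List[Tuple[Items, bool]]) -> Tuple[float, int]:
    """Return (confidence, number of emails matching the antecedent)."""
    matched = [is_phish for items, is_phish in labeled if antecedent(items)]
    if not matched:
        return 0.0, 0
    # Confidence = matching emails labeled phishing / all matching emails.
    return sum(matched) / len(matched), len(matched)


# Hypothetical labeled sample: (items, is_phishing).
labeled = [
    (frozenset({"sender=unknown", "clicks>5", "keyword=urgent"}), True),
    (frozenset({"sender=unknown", "clicks>5", "keyword=account locked"}), True),
    (frozenset({"sender=unknown", "clicks>5", "keyword=urgent"}), False),
    (frozenset({"attachment=.exe", "sentiment=high"}), True),
    (frozenset({"sender=known", "clicks<=5"}), False),
]
for rule in (rule_unknown_clicks_keyword, rule_exe_high_urgency):
    conf, n = rule_confidence(rule, labeled)
    print(rule.__name__, f"confidence={conf:.2f}", f"matches={n}")
```

Reporting the match count alongside the confidence matters. A rule that fires on only a handful of emails can show a high confidence purely by chance, which is why support is usually checked too.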