Masters Project Defense: Investigating Techniques For Identifying Thread Behavior and Evaluating Alternative Automatic Classification Methods in a Realistic Middleware System. Robert C Broeckelmann Jr. 29 Nov 2007
Where We Started… • Began meeting with Dr. Gill in September 2006. • Officially started working in January 2007. • Explored how an OS scheduler could be extended to determine, before a scheduling decision, whether a thread is • displaying an undesirable behavior • operating outside of a predefined range • Can information available to an intrusion detection system (IDS) be fed to an OS scheduler?[22,23,24,25,26]
Where We Went… • What other information is available to make such decisions? • How do we gather & process this information? • How do you classify the high-level function of threads based upon this data? • Practical use in industry.
Original Test Environment • Spent Spring Semester building a test environment. • Several dead ends. • Original test environment consisted of: • VMware Workstation 5.x[10] • Fedora Core 6[11] • Custom build of KURT Linux 2.6.18, 2.6.19, 2.6.20/STREAMs[12] • Custom Linux kernel build 2.6.18[13] • Java 1.5[1,3,5,8,9] • strace[14] • KURT Linux is incapable of capturing system calls per LWP out of the box. • Explored using strace on Linux, but ran into problems with high thread counts.
Final Test Environment • 2 Dell PCs (2 CPUs, 2 GB memory) • OpenSolaris 2.11[28] • Java 1.6[4,5,6,8,9] • JBoss 3.2.8b[15] • MySQL v5.0 Community Server[29] • JMeter v2.2[31] • Java PetStore eCommerce application (J2EE Spec v1.3)[32] • PetStore configuration adaptation for JBoss[33] • Had to move to the Java 1.6.0 that ships with OpenSolaris 2.11 in order to utilize plug points with DTrace. • This Masters Project was completed using almost entirely Open Source tools. • Note: OpenSolaris and DTrace are released under the OpenSolaris Binary License & the CDDL (OSI-approved) license[34,35]. • Java 1.6 is not Open Source. JDK 1.7 will be Open Source[36].
What Information Is Available? • Available information • System calls[41] • File descriptor and I/O system call patterns[42] • CPU utilization • Traditional (User, Kernel, Idle, I/O Wait)[30] • Micro-State Accounting information[30] • Other information is available, but the scope was limited to the items above. • Must be gathered with minimal overhead.
Gathering Information • Each type of data has a tool of choice • System Calls -> DTrace/dtruss[7,27,37,38] • Traditional CPU Utilization -> vmstat, prstat[40,43,44] • Micro-State Accounting -> prstat[36,40] • This project focuses on the use of system call sequences (broken down per thread).
Practical Uses • The techniques developed here could have a practical use in industry. • For example, a System Administrator or Performance Engineer managing/monitoring a complex J2EE installation • such as BEA WebLogic, IBM WebSphere, or Red Hat JBoss[45,46,47] • or a similar multithreaded middleware environment.
Our Approach • Original Goal: build an OS Scheduler with an ability to distinguish between a thread whose behavior is desirable and one that is undesirable. • Chose to focus on a prerequisite. • What data do we gather? • Techniques for gathering and processing that data. • Focused on the classification of threads within a multithreaded process by data that can be gathered • on a per-thread basis at run-time. • efficiently • First step towards building this enhanced OS Scheduler.
Previous Work • Work in the areas of OS security research and intrusion detection systems (IDS) has used system call analysis heavily[19,20,21,48,49,43,42]. • SubDomain™: Parsimonious Server Security • Improving Host Security with System Call Policies • Traps and Pitfalls: Practical Problems in System Call Interposition Based Security Tools • A Secure Environment for Untrusted Helper Applications: Confining the Wily Hacker • Ostia: A Delegating Architecture for Secure System Call Interposition
Classifying Thread Functions • Using System Call information, I created a method to visually classify a thread’s function. • Experimented with different machine-learning algorithms to try to accurately predict thread function.
The Method For Classifying Threads • Basis for a new method that can be used to classify threads and determine if they are behaving correctly. • Produces a visual fingerprint of a thread’s behavior. • Produces a representation of run-time characteristics that would otherwise be difficult to analyze, visualize, & bring together.
Subject Of The Method – Modern Middleware • Modern (especially Java-based) middleware involves one or more processes with many threads & moving pieces. • Capturing the behavior of a single thread or the interaction between the constituent pieces can be challenging. • We used JBoss 3.2.8SP1 as a representative piece of modern middleware for this project.
JBoss Internals • JMX Micro-Kernel Architecture • All the major J2EE subsystems are JMX beans. • JBoss 3.x fully supports the J2EE 1.3 Spec. • Chose 3.2.8SP1 for its maturity and because of the Java PetStore application version used. • For more information, see [2]. • Note: JBoss 5.x is a complete architectural redesign.
High-Level Classification Of Threads In A JBoss J2EE Container
Data Gathering & Processing • Tools used to gather data during a load test • DTrace[7] • DTrace Toolkit[37] • dtruss[38] • Bash shell scripting[39] • GNU tools[18] • prstat[40] • Tools used to process data after a load test • gnuplot[17] • Bash shell & other GNU tools[18] • RapidMiner[16]
Thread Groups Studied • Thread groups (collections of threads that perform similar functions) • HTTP Processor • JMS Thread(3) • JMS Session Workers • Connection Consumer • JBoss MQ Cache Reference Softener • Scanner Thread • Young GC Threads • Old Gen GC Thread • JIT Compiler • HSQLDB Timer • TimeOut Factory Thread • Why were the other threads left out? • Could not capture the thread type via a Java thread dump. • Insufficient number of system calls made by the thread during the load test.
Result 1 – SysCall Graphs • Hypothesis: • We can use OS data (such as system call usage) to build a graphical representation (histogram) that uniquely identifies each type of thread (Thread Group). • Result: • For many thread types, yes. Classifying system call sequences using Thread Dumps shows that there is an identifiable pattern of System Calls in many thread types.
Data Processing: SysCall Graphs • Split the trace into individual threads. • Replace system call names with numbers. • Produce frequency counts. • Build gnuplot files. • Generate PNG images. • Generate an HTML page.
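The first three steps above (split per thread, number the system calls, count frequencies) can be sketched in a few lines of Python. This is an illustration only, not the project's actual Bash/GNU-tools scripts: the input format (one `lwpid syscall` pair per line) and the function names are assumptions made for the example.

```python
from collections import Counter, defaultdict

def split_by_thread(lines):
    """Group system call names by LWP id, preserving call order."""
    per_thread = defaultdict(list)
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        lwpid, syscall = parts
        per_thread[int(lwpid)].append(syscall)
    return dict(per_thread)

def number_syscalls(per_thread):
    """Assign each distinct system call name a stable integer id."""
    names = sorted({name for calls in per_thread.values() for name in calls})
    return {name: i for i, name in enumerate(names)}

def frequency_counts(per_thread, ids):
    """Count how often each numbered system call occurs in each thread."""
    return {lwpid: Counter(ids[name] for name in calls)
            for lwpid, calls in per_thread.items()}

# Hypothetical trace: "<lwpid> <syscall name>" per line (assumed format).
trace = ["12 read", "12 write", "15 pollsys",
         "12 read", "15 pollsys", "15 lwp_park"]
threads = split_by_thread(trace)
ids = number_syscalls(threads)
counts = frequency_counts(threads, ids)
```

The per-thread counters are the kind of data a plotting step would then turn into one histogram per thread.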
How To Map Threads & Graphs • The Sun JVM can pause all threads and print, for each thread, its • full call stack • thread description • native LWP ID (lwpid). • Several thread dumps were captured during load tests. • Matched LWPIDs to NIDs (Native IDs) in the thread dump.
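The matching step can be sketched as follows. The sketch assumes the thread dump header lines carry the thread name in double quotes followed by a hexadecimal `nid=0x...` field, as is typical of HotSpot thread dumps, and that this value is the LWP id reported by the tracing tools. The sample dump text and thread names are made up for illustration.

```python
import re

# Header line of a thread entry: "<name>" ... nid=0x<hex> ...
HEADER = re.compile(r'^"(?P<name>[^"]+)".*\bnid=(?P<nid>0x[0-9a-fA-F]+)')

def map_lwp_to_thread_name(dump_text):
    """Return {decimal LWP id: thread name} parsed from a thread dump."""
    mapping = {}
    for line in dump_text.splitlines():
        match = HEADER.match(line.strip())
        if match:
            # The nid is printed in hex; tracing tools report it in decimal.
            mapping[int(match.group("nid"), 16)] = match.group("name")
    return mapping

# Hypothetical dump fragment (names and addresses are illustrative).
sample_dump = '''
"HTTP Processor 1" daemon prio=10 tid=0x08a1c400 nid=0x1f runnable
    at java.net.SocketInputStream.read(SocketInputStream.java)
"Sample Timer" daemon prio=10 tid=0x08a2d800 nid=0x2a waiting on condition
'''
names = map_lwp_to_thread_name(sample_dump)
```

With such a table, each per-thread system call trace can be labeled with the thread group it belongs to.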
Graph Format • 3-Dimensional • X – Time* • Y – System Call Type • Z – Frequency • *Relative time-frames are not represented.
[Figure: SysCall graphs for the Cache Reference Softener, Connection Consumer, Young GC Thread, HSQLDB Timer, HTTP Processor, and TimeOutFactory threads]
[Figure: SysCall graphs for the JIT Compiler, JMS Thread(3), Session Worker, Old GC Thread, and Scanner Thread]
Results • Using these results we are able to categorize many of the threads with the SysCall Graphs. • From there, we were able to compare SysCall Graphs within a single run and between different runs. • There is a visually recognizable pattern for each of the thread types that we are looking at. • This pattern holds for threads of the same type in each run. • This pattern holds for threads of the same type in different runs.
[Figure: Comparisons between runs for the Connection Consumer, JMS Session Worker, and HTTP Processor threads]
Statistical Analysis of Data • Tried Nearest Neighbor (NN) on the actual sequences using Euclidean & nominal measures: unsuccessful. • The sequences have different lengths. • Experimented with hyperplanes: unsuccessful. • Experimented with first-order Markov chains: unsuccessful. • Tried NN on the system call counts of a thread using the Euclidean measure. • Greatest success • Not perfect
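The most successful variant, Nearest Neighbor on per-thread system call counts with Euclidean distance, can be illustrated with a minimal sketch. The project used RapidMiner for this step; the plain-Python version below only shows the idea, and the count vectors and their column meanings are hypothetical.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length count vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_neighbor(train, query):
    """Return (label, distance) of the training vector closest to query."""
    best_label, best_dist = None, math.inf
    for vector, label in train:
        dist = euclidean(vector, query)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label, best_dist

# Hypothetical counts of (read, write, pollsys, lwp_park) per thread.
train = [
    ((120, 115, 40, 2), "HTTP Processor"),
    ((3, 1, 0, 250), "Scanner Thread"),
    ((60, 10, 5, 90), "JMS Session Worker"),
]
label, dist = nearest_neighbor(train, (110, 100, 35, 5))
```

Because every thread is reduced to a fixed-length count vector, the differing sequence lengths that defeated the sequence-based attempt no longer matter here.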
Result 2 – Nearest Neighbor • Hypothesis 2: • We can apply machine learning techniques to predict the different thread types using the data we have gathered. • Result: • Using Nearest Neighbor on the system call counts we can partially do this.
Data Processing (Result 2) • RapidMiner data files[16] • Define an ARFF model definition file. • Define an AML test data definition file. • Put test data into a space-delimited file. • Define a Nearest Neighbor XML file. • Produces a RapidMiner model file. • Define a ModelLoader XML file. • Loads a model file and test data. • Forms predictions regarding the test data. • Produces a data file that lists predictions and confidence values for each row in the data file.
What It Did NOT Accurately Predict • No Connection Consumer threads accurately predicted with Nearest Neighbor. • System Call counts very similar to other threads. • One HTTP Processor thread mispredicted. • This thread handled very little traffic. As a result its system call counts were significantly different. • Shows shortcomings of Nearest Neighbor (Euclidean Distance Measure) algorithm for our purposes.
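The HTTP Processor misprediction can be illustrated with a small worked example. Euclidean distance on raw counts is sensitive to overall call volume, so a lightly loaded thread can land closer to a different thread type than to a busy thread of its own type. The numbers below are invented for illustration, and the proportion-based comparison is shown only as a contrast; it is not something the project reports having tried.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def proportions(vector):
    """Scale counts so they sum to 1, removing the effect of call volume."""
    total = sum(vector)
    return tuple(x / total for x in vector) if total else tuple(0.0 for _ in vector)

# Hypothetical (read, write, pollsys) counts.
busy_http = (1200, 1150, 400)   # HTTP thread that handled heavy traffic
idle_http = (12, 11, 4)         # HTTP thread that handled very little traffic
scanner = (5, 2, 30)            # a different thread type with low volume

# On raw counts the idle HTTP thread is nearer to the scanner thread.
raw_to_http = euclidean(idle_http, busy_http)
raw_to_scanner = euclidean(idle_http, scanner)

# On proportions the idle HTTP thread is nearer to the busy HTTP thread.
norm_to_http = euclidean(proportions(idle_http), proportions(busy_http))
norm_to_scanner = euclidean(proportions(idle_http), proportions(scanner))
```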
Future Directions • Rth-order Markov chain modeling of system call sequences to accurately predict thread functions[48,54]. • Using Micro-State Accounting data to fingerprint/predict thread types[36].
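As a rough sketch of what Rth-order Markov chain modeling could look like: each thread type gets a table of transition counts from the previous R system calls to the next one, and an unlabeled sequence is assigned to the type whose model gives it the highest likelihood. This is an illustrative assumption about the approach, not the project's implementation; the add-one smoothing, the function names, and the sample sequences are all choices made for the example.

```python
import math
from collections import Counter, defaultdict

def train_markov(sequences, order):
    """Count transitions from each `order`-length context to the next call."""
    transitions = defaultdict(Counter)
    for seq in sequences:
        for i in range(order, len(seq)):
            context = tuple(seq[i - order:i])
            transitions[context][seq[i]] += 1
    return transitions

def log_likelihood(seq, transitions, order, vocab_size):
    """Score a sequence under a model, using add-one smoothing."""
    total = 0.0
    for i in range(order, len(seq)):
        context = tuple(seq[i - order:i])
        counts = transitions.get(context, Counter())
        prob = (counts[seq[i]] + 1) / (sum(counts.values()) + vocab_size)
        total += math.log(prob)
    return total

def classify(seq, models, order, vocab_size):
    """Pick the label whose model assigns the sequence the highest score."""
    return max(models, key=lambda label:
               log_likelihood(seq, models[label], order, vocab_size))

ORDER = 2   # R = 2 for this example
VOCAB = 5   # distinct system calls in the hypothetical data below
models = {
    "HTTP Processor": train_markov(
        [["pollsys", "read", "write"] * 3], ORDER),
    "Scanner Thread": train_markov(
        [["lwp_park", "stat", "stat", "lwp_park", "stat", "stat", "lwp_park"]],
        ORDER),
}
predicted = classify(["read", "write", "pollsys", "read", "write"],
                     models, ORDER, VOCAB)
```

Unlike a distance on raw sequences, this scoring works on sequences of any length, since it only looks at local transitions.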
Questions? • Thank you.
Reference
[1] Java™ 2 Platform Standard Edition 5.0 API Specification. 29 Sept. 2004. Sun Microsystems Inc. 13 Jan. 2007 <http://java.sun.com/j2se/1.5.0/docs/api/>
[2] Research Project: An Analysis of JBoss Architecture. Liu, Jenny. 29 Apr. 2002. School of Information Technologies, University of Sydney. 13 Jan. 2007 <http://www.huihoo.org/jboss/jboss.html>
[3] JDK™ 5.0 Documentation. 29 Sept. 2004. Sun Microsystems Inc. 13 Jan. 2007 <http://java.sun.com/j2se/1.5.0/docs/>
[4] Java™ Platform, Standard Edition 6 API Specification. 12 Dec. 2006. Sun Microsystems Inc. 1 Apr. 2007 <http://java.sun.com/javase/6/docs/api/>
[5] HotSpot Runtime Overview. OpenJDK Project. 15 Apr. 2007 <https://openjdk.dev.java.net/hotspot/docs/RuntimeOverview.html>
[6] JDK™ 6 Documentation. 12 Dec. 2006. Sun Microsystems Inc. 1 Apr. 2007 <http://java.sun.com/javase/6/docs/>
[7] OpenSolaris Community: DTrace. OpenSolaris. Sun Microsystems Inc. 1 Apr. 2007 <http://www.opensolaris.org/os/community/dtrace/>
[8] The Java Language Specification, Third Edition. 1 Jan. 2005. Gosling, James. Joy, Bill. Steele, Guy. 13 Jan. 2007 <http://java.sun.com/docs/books/jls/third_edition/html/j3TOC.html>
[9] The Java™ Virtual Machine Specification, Second Edition. 1999. Lindholm, Tim. Yellin, Frank. 13 Jan. 2007 <http://java.sun.com/docs/books/jvms/second_edition/html/VMSpecTOC.doc.html>
[10] Workstation 5 User’s Manual. 16 Sept. 2005. VMware, Inc. 13 Jan. 2007 <http://www.vmware.com/pdf/ws5_manual.pdf>
[11] Fedora Project – Fedora Core 6. 22 Oct. 2006. RedHat, Inc. 13 Jan. 2007 <http://www.fedoraproject.org/>
[12] KU System Programming. The University of Kansas. 13 Jan. 2007 <http://wiki.ittc.ku.edu/kusp_wiki/index.php/Main_Page>
[13] The Linux Kernel Archive. 12 Jan. 2007. Linux Kernel Organization, Inc. 13 Jan. 2007 <http://www.kernel.org/>
[14] Strace Project. 13 Jan. 2007. Strace Project <http://sourceforge.net/projects/strace/>
[15] JBoss Admin Development Guide. 2004. JBoss, Inc. 13 Jan. 2007 <http://docs.jboss.org/jbossas/admindevel326/html/>
[16] Mierswa, I., Wurst, M., Klinkenberg, R., Scholz, M., and Euler, T. Yale (now: RapidMiner): Rapid Prototyping for Complex Data Mining Tasks. In: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2006), 2006.
[17] gnuplot homepage. 15 Apr. 2007. Williams, Thomas. Kelley, Colin. <http://www.gnuplot.info>
[18] The GNU Operating System - the GNU project - Free Software Foundation - Free as in Freedom - GNU/Linux. 15 Apr. 2007. Free Software Foundation. <http://www.gnu.org>
[19] Design and Performance of Configurable Endsystem Scheduling Mechanisms.
[20] The Design, Modeling, and Implementation of Group Scheduling for Isolation of Computations from Adversarial Interference.
[21] Group Scheduling in SELinux to Mitigate CPU-Focused Denial of Service Attacks.
[22] SubDomain™: Parsimonious Server Security.
[23] Improving Host Security with System Call Policies.
[24] Traps and Pitfalls: Practical Problems in System Call Interposition Based Security Tools.
[25] A Secure Environment for Untrusted Helper Applications: Confining the Wily Hacker.
[26] Ostia: A Delegating Architecture for Secure System Call Interposition.
[27] Solaris Dynamic Tracing Guide. 5 Sep. 2005. Sun Microsystems, Inc. 1 Apr. 2007 <http://docs.sun.com/app/docs/doc/817-6223>
[28] OpenSolaris v2.11: Home at OpenSolaris.org. 1 Jun. 2005. Sun Microsystems, Inc. 1 Apr. 2007 <http://www.opensolaris.org/os/>
[29] MySQL 5.0 Reference Manual. MySQL AB. 1 Apr. 2007 <http://dev.mysql.com/doc/refman/5.0/en/manual-info.html>
[30] Solaris Internals CPU/Processor. 15 July 2007. Solaris Internals. 1 Nov. 2007 <http://www.solarisinternals.com/wiki/index.php/CPU/Processor>
[31] JMeter: Users Manual. 1 Jun. 2006. Apache Jakarta Project. 15 Apr. 2007 <http://jakarta.apache.org/jmeter/usermanual/intro.html>
[32] Java Pet Store Demo 1.3.2. 4 Aug. 2003. Sun Microsystems, Inc. 13 Jan. 2007 <http://java.sun.com/blueprints/code/jps132/docs/index.html>
[33] Java Petstore Tutorial. MobileFish. 13 Jan. 2007 <http://www.mobilefish.com/tutorials/petstore_1_3_2/petstore_1_3_2_quickguide_jbossmysql.html>
[34] OpenSolaris Binary License. 4 Nov. 2005. Sun Microsystems, Inc. 1 Apr. 2007 <http://opensolaris.org/os/licensing/opensolaris_binary_license/>
[35] Common Development and Distribution License (CDDL). 24 Jan. 2004. Sun Microsystems, Inc. 15 Apr. 2007 <http://www.sun.com/cddl/cddl.html>
[36] The GNU General Public License, Version 2. 1 Jun. 1991. Free Software Foundation. 1 Nov. 2007 <http://www.fsf.org/licensing/licenses/info/GPLv2.html>
[37] OpenSolaris Community: DTrace. 1 Jun. 2005. Sun Microsystems, Inc. 1 Apr. 2007 <http://www.opensolaris.org/os/community/dtrace>
[38] DTraceToolkit at OpenSolaris.org. 1 Jun. 2005. Sun Microsystems, Inc. 1 Apr. 2007 <http://www.opensolaris.org/os/community/dtrace/dtracetoolkit>
[39] Bash Reference Manual. 15 Jul. 2002. Free Software Foundation. 1 Apr. 2007 <http://www.gnu.org/software/bash/manual/bashref.html>
[40] prstat(1M). 4 Jan. 2001. Sun Microsystems, Inc. 1 Apr. 2007 <http://docs.sun.com/app/docs/doc/816-0211/6m6nc673u?a=view>
[41] man pages section 2: System Calls. 4 Oct. 2005. Sun Microsystems, Inc. 1 Apr. 2007 <http://docs.sun.com/app/docs/doc/816-5167?l=en>
[42] S. Zanero, Unsupervised Learning Algorithms for Intrusion Detection, Ph.D. Thesis, DEI Politecnico di Milano, 2006.
[43] The Design, Modeling, and Implementation of Group Scheduling for Isolation of Computations from Adversarial Interference.
[44] vmstat(1M). 20 Dec. 2004. Sun Microsystems, Inc. 1 Apr. 2007 <http://docs.sun.com/app/docs/doc/816-5166/6mbb1kqjv?a=view>
[45] BEA Weblogic Server 10.0. 13 Dec. 2006. BEA Systems, Inc. 1 Nov. 2007 <http://edocs.bea.com/wls/docs100/index.html>
[46] WebSphere Application Server documentation. 29 May 2006. IBM Inc. 1 Nov. 2007 <http://publib.boulder.ibm.com/infocenter/wasinfo/v6r1/index.jsp?topic=/com.ibm.websphere.base.doc/info/welcome_base.html>
[47] JBoss.org: Community Documentation. 2004. Redhat, Inc. 13 Jan. 2007 <http://labs.jboss.com/projects/docs/>
[48] Markov Chain paper.
[49] Group Scheduling in SELinux to Mitigate CPU-Focused Denial of Service Attacks.