Storage Network Designs for OLTP Business Continuity Marc Farley President, Building Storage Networks, Inc.
Agenda • The Vendor Neutral Approach • Overview of OLTP & High Availability • I/O Redundancy Methods • Storage Network Technologies • Storage Networking for HA OLTP
Vendor Neutral Approach • Generic terms, not vendor terms • Assumes basic knowledge of SAN, NAS, and RAID
OLTP Environments • Mission critical business applications • Business in real-time • Expensive equipment and software • Aggressive performance objectives • Highly skilled IT staff • Hands-on computing operations
OLTP Database Software • Oracle • 8i Oracle Parallel Server (OPS) • 9i Real Application Cluster (RAC) • IBM • DB2 UDB • Informix • MS SQL Server • Sybase, MySQL, others
OLTP OS Platforms • IBM S/390 MVS • Unix Systems • Windows 2000+ • HA Linux
OLTP Requirements • 99.999% uptime • Non-degrading response time • High transaction rates • Seamless scalability • Cost relief
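It helps to put a number on 99.999% uptime. The short Python sketch below is my own illustration (not from the presentation) that converts an uptime target into the downtime budget it allows per year; five nines works out to roughly five minutes.

```python
# Convert an uptime target into the allowable downtime per year.
MINUTES_PER_YEAR = 365.25 * 24 * 60

for target in (0.999, 0.9999, 0.99999):
    downtime_minutes = MINUTES_PER_YEAR * (1 - target)
    print(f"{target * 100:.3f}% uptime -> {downtime_minutes:.1f} minutes of downtime per year")
```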
Database Storage Approaches • Raw partitions • Bypass OS I/O buffering • File system • Facilitates data management • NFS mounted • Offloads the DB server, NTAP + Oracle
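To make "bypass OS I/O buffering" concrete, here is a minimal, Linux-specific sketch of my own (the file path and block size are illustrative assumptions). Opening with O_DIRECT sends writes past the OS page cache, which is roughly what a database gains from a raw partition; O_DIRECT requires block-aligned buffers, supplied here by an anonymous mmap, and the target filesystem must support direct I/O.

```python
import mmap
import os

# Linux-only sketch: O_DIRECT bypasses the OS page cache, similar in spirit to
# a database writing to a raw partition. Buffers must be block-aligned and a
# multiple of the device block size, so allocate one page with mmap.
BLOCK = 4096
buf = mmap.mmap(-1, BLOCK)  # anonymous mapping: page-aligned memory
buf.write(b"redo record placeholder".ljust(BLOCK, b"\0"))

# /var/tmp is assumed to live on a filesystem that supports O_DIRECT (tmpfs does not).
fd = os.open("/var/tmp/direct_io_demo.dat", os.O_WRONLY | os.O_CREAT | os.O_DIRECT, 0o600)
try:
    os.write(fd, buf)  # the write goes to storage, not into the page cache
finally:
    os.close(fd)
    buf.close()
```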
ACID Properties of OLTP • Atomicity – No partial transactions • Consistency – All tables are in a consistent state before and after a completed transaction • Isolation – One transaction cannot contaminate other transactions • Durability – Transactions are complete only when the database updates are written to disk storage
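Atomicity and consistency are easiest to see in running code. The sketch below is my own illustration using Python's bundled sqlite3 module (not an OLTP-class engine): a transfer either commits as a whole or rolls back, so the tables are consistent before and after every completed transaction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Atomic transfer: both updates commit together, or neither is applied."""
    try:
        with conn:  # opens a transaction; commits on success, rolls back on error
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))
            balance = conn.execute("SELECT balance FROM accounts WHERE name = ?", (src,)).fetchone()[0]
            if balance < 0:
                raise ValueError("insufficient funds")  # forces a rollback
    except ValueError:
        pass  # the partial update never becomes visible (atomicity, isolation)

transfer(conn, "alice", "bob", 30)    # commits
transfer(conn, "alice", "bob", 500)   # rolls back
print(dict(conn.execute("SELECT name, balance FROM accounts")))  # {'alice': 70, 'bob': 80}
```

Durability in a real OLTP database comes from the redo log writes covered later: a transaction counts as complete only when its log records reach disk storage.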
Challenges of OLTP • Major systems integration effort • Intricate tuning and monitoring • Little tolerance for errors • Complex data structures & relationships • Time and sequence-sensitive processes • Must be adhered to for data integrity • Shifting workloads and bottlenecks
OLTP Database Files • Data files • Database data, tablespaces • Redo log files, archive log files • Reconstruct or rollback transactions • Control files • File layout information
OLTP Table Space Storage • Use many spindles to distribute hot spots • RAID 0+1 recommended • File system recommended over raw partitions • Easier data management
Striping for Performance • Diagram: a RAID controller (microsecond performance) striping data across multiple disk drives (millisecond performance, from rotational latency and seek time)
My Personal Favorite, RAID 0+1 • Diagram: a RAID controller presenting mirrored pairs of striped members (stripe members 1–5, each member mirrored)
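To show how the striped and mirrored layers combine, the sketch below is a simplified model of my own (the member count and stripe depth are assumptions, not vendor parameters): a logical block is mapped to a stripe member, and the write lands on both halves of that member's mirrored pair.

```python
# Simplified RAID 0+1 address mapping: N striped members, each member mirrored.
STRIPE_MEMBERS = 5          # disks per stripe set, matching the five pairs in the diagram
STRIPE_DEPTH_BLOCKS = 128   # blocks written to one member before moving to the next

def map_block(logical_block):
    """Return (member_index, block_on_member) for a logical block address."""
    stripe_number, offset = divmod(logical_block, STRIPE_DEPTH_BLOCKS)
    member = stripe_number % STRIPE_MEMBERS
    block_on_member = (stripe_number // STRIPE_MEMBERS) * STRIPE_DEPTH_BLOCKS + offset
    return member, block_on_member

def write_block(logical_block, data, mirror_a, mirror_b):
    """RAID 0+1: the same striped location is written to both mirror halves."""
    member, block = map_block(logical_block)
    mirror_a[member][block] = data
    mirror_b[member][block] = data

# Two mirrored copies of a 5-member stripe set, modeled as dictionaries.
mirror_a = [dict() for _ in range(STRIPE_MEMBERS)]
mirror_b = [dict() for _ in range(STRIPE_MEMBERS)]
write_block(1000, b"row data", mirror_a, mirror_b)
print(map_block(1000))  # which member and offset logical block 1000 lands on
```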
OLTP Redo Log Storage • Raw partitions recommended • Sequential high speed writes • Separate mirror pairs per log file group • Capacity for 30 – 60 minutes of data • Goal is to limit disk contention for current and active log files
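Sizing "capacity for 30–60 minutes of data" is simple arithmetic once the redo generation rate is known; the rate below is an assumed figure for illustration only, so measure your own workload before sizing.

```python
# Assumed redo generation rate for illustration only.
redo_rate_mb_per_sec = 5

for minutes in (30, 60):
    capacity_gb = redo_rate_mb_per_sec * 60 * minutes / 1024
    print(f"{minutes} min at {redo_rate_mb_per_sec} MB/s -> about {capacity_gb:.1f} GB of log capacity")
```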
OLTP Archive Log Storage • File system or NFS mounting is required • NFS mounting is recommended • Mirroring or RAID • Goal is to have easy access in case they are needed for reconstruction
High Availability • The ability of a system or application to immediately continue its mission after loss of or damage to system components, systems, facilities, or data
Availability Threats • Expected: scaling limitations (processor, storage capacity, network), consolidations, product life cycles • Unexpected: failures, bugs, viruses, operator errors, disasters
HA Engages All Elements • Systems • Application • Network connections • Network services • Storage and I/O subsystems
Managing the Risks • Local copies of data • Immediate availability • Remote nearby copies • Immediate availability to several hours • Remote far-away copies • One to several days to availability
Disaster/Availability Radii • Diagram: concentric radii, from local to remote nearby to remote far away
Nobody Expects… • Weird things to happen to them • Disintegration of media • Underground flooding through tunnels • Fires in Telco switching centers
High Availability for OLTP • Duplication of functions • Without degrading performance • Without risking data integrity • Brute force techniques • Automation and efficiency • Cost is always an issue • And high availability DOES cost
A Long Time Ago in a Job Not So Far Away… • Jedi Jim Gast: "You must learn to be a master of redundancy if you are going to be a storage geek. Remember Marc, there is only one concept: REDUNDANCY! Redundancy. Again!" • Marc Skyfaller Farley: "Got it Jim. Let's eat! Whatever."
Eventually, I Learned to Appreciate His Teachings… • REDUNDANCY • NSPoF (No Single Point of Failure) • Don't get the giant spicy Polish for lunch – it's too much for the digestion
OLTP HA Requires Complete Redundancy Protection • Client network • Server systems and components • Application modules • I/O Channels and Networks • Storage subsystems and components • Data
A Quick Look at Clustered Storage • Shared everything: both servers share control of a common storage address space • Shared nothing: each server controls its own storage address space
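A few lines of code make the shared-nothing side concrete: each node owns a slice of the key space and a router forwards every request to the owning node, whereas in a shared-everything cluster any node could serve the request from the common address space. The partitioning scheme is my own illustration, not a description of any particular product.

```python
import zlib

# Shared-nothing routing sketch: each cluster node owns part of the key space.
NODES = ["node0", "node1", "node2"]

def owner(key: str) -> str:
    """Hash-partition the key space so exactly one node controls each key."""
    return NODES[zlib.crc32(key.encode()) % len(NODES)]

# In a shared-everything cluster, any node could serve any key because all
# nodes share one storage address space; here the request must be routed.
for account in ("cust-1001", "cust-1002", "cust-1003"):
    print(account, "->", owner(account))
```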
Examples of OLTP Clusters • Microsoft SQL Server: data is exchanged between servers; storage connections are failover paths only • Oracle 9i RAC: data is accessed directly from storage
One more time, with subsystems… • Microsoft SQL Server: same subsystem, but different address spaces • Oracle 9i RAC: all storage is shared by all cluster nodes
I/O Redundancy • Host to subsystem • Mirroring: Host to independent targets • Multi-pathing: Host to a single target • Subsystem to subsystem • Store and forward: • Local • Remote
Disk Mirroring: Redundant Storage Targets • Diagram: independent, identically sized storage address spaces, mirrored through either one controller or two controllers
Disk Mirroring: I/Os to 2 Targets • “Brute force” redundancy: fast and simple • Both read and write I/Os • Overlapped reads for performance • Local connections • Limited capacity* • I/O Bottlenecks* for random I/O activity • * if targets are disk drives
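The host-side behavior described above can be sketched in a few lines: every write I/O goes to both targets, and reads are overlapped (alternated) across the two copies for performance. This is a simplified model of my own, not any vendor's volume manager.

```python
import itertools

class MirroredVolume:
    """Host-based mirroring sketch: duplicate writes, overlap reads across copies."""

    def __init__(self, target_a, target_b):
        self.targets = (target_a, target_b)
        self._read_from = itertools.cycle(self.targets)

    def write(self, block, data):
        # Brute-force redundancy: the host issues the write I/O to both targets.
        for target in self.targets:
            target[block] = data

    def read(self, block):
        # Overlapped reads: alternate between copies to spread the read load.
        return next(self._read_from)[block]

disk_a, disk_b = {}, {}
vol = MirroredVolume(disk_a, disk_b)
vol.write(42, b"transaction record")
print(vol.read(42), vol.read(42))   # served alternately from each copy
```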
Disk Mirroring for Redo Log Files • Log files are a common bottleneck • Use raw partitions • Redundancy is required • Mirroring is adequate • Use highest RPM with lowest seek times • Put on a separate channel from database I/O • Use separate mirrored pairs per group
Mirroring to Storage Subsystems • Diagram: a host with two controllers mirroring to two storage subsystems with independent, identically sized storage address spaces
Mirroring to Subsystems • Targets are subsystems, not disks • Separate address spaces • Capacity scales to subsystem max • Double level redundancy • Mirroring plus RAID • Multiple disk spindles reduce I/O bottlenecks
Disk Mirroring Datafiles from Host to Storage Subsystems • Disk mirroring + subsystem RAID • Excellent capacity scaling • Adjacent and across campus/town • One subsystem outside site radius • Requires longer distance cabling • Reads and writes both transmitted
Multi-Pathing: Redundant Paths Between a Host & Subsystem • Diagram: pathing software determines that a transmission error has occurred on one path to the application data volume and switches to a redundant path
Multi-pathing vs Mirroring • Mirroring assumes independent, but similar storage targets • Multi-pathing assumes multiple paths to the exact same target • Mirroring can use a single HBA, multi-pathing needs two HBAs
Path Failures • 1. HBA problem • 2. Link, switch, or network problem • 3. Subsystem controller problem
I/O is sent to storage but no ack is received • Transmission failures are recognized after SCSI timeouts are exceeded • The I/O is retried and eventually an error is passed back to the process that issued the I/O
Path Failover for OLTP I/O • Redundant path resources take over activities for a failed path to sustain operations without disrupting service or risking data integrity
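Putting the last few slides together, failover logic amounts to: issue the I/O on the active path, treat a missing acknowledgement after the SCSI timeout as a path failure, and resubmit the same I/O on a redundant path before returning an error to the caller. The sketch below is a hypothetical illustration of that sequence, not real multi-pathing driver code; the class and function names are my own.

```python
class PathTimeout(Exception):
    """No acknowledgement before the (simulated) SCSI timeout expired."""

class Path:
    """A single host-to-subsystem path; failure is simulated for illustration."""
    def __init__(self, name, healthy=True):
        self.name, self.healthy = name, healthy

    def submit(self, payload):
        if not self.healthy:
            raise PathTimeout(f"timeout on {self.name}")
        return f"ack from {self.name}"

def send_io(paths, payload):
    """Multi-path failover: retry the same I/O on each redundant path in turn."""
    last_error = None
    for path in paths:
        try:
            return path.submit(payload)
        except PathTimeout as exc:
            last_error = exc          # path failed; fail over without disrupting service
    raise last_error                  # all paths exhausted: error returned to the caller

paths = [Path("hba0->controllerA", healthy=False), Path("hba1->controllerB")]
print(send_io(paths, b"write block 42"))   # fails over to the second path
```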
Store and Forward • Diagram: the host writes to subsystem A, which forwards the data to subsystem B; the two subsystems present independent, identically sized storage address spaces
Store & Forward: One Host I/O and Two Copies of Data • Only real option for remote copies • Does not forward read I/Os • Proprietary protocols and methods • Standards are emerging, e.g. FC/IP • First step to storage snapshots
Store and Forward: Acknowledgements • Diagram: in the synchronous case, subsystem A forwards the I/O to subsystem B and acknowledges the host only after B's ACK; in the asynchronous case, A acknowledges the host immediately and forwards the I/O to B afterward
Trade-offs with Acknowledgement Handling • Synchronous • Always preferred • Slowest performance • State of copy is precise • Asynchronous • Fastest performance • Least precise knowledge of copy status
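The trade-off can be stated compactly in code: a synchronous write withholds the host acknowledgement until the remote copy confirms the forward, while an asynchronous write acknowledges immediately and forwards later, leaving the copy's state less precisely known. This is a schematic model of my own, not any product's replication protocol.

```python
from collections import deque

class Subsystem:
    def __init__(self, name):
        self.name, self.blocks = name, {}

    def write(self, block, data):
        self.blocks[block] = data
        return "ACK"

def synchronous_write(local, remote, block, data):
    """Host ack is withheld until the forwarded copy acknowledges: precise copy state, slower."""
    local.write(block, data)
    remote.write(block, data)          # forward and wait for the remote ACK
    return "ACK to host"

pending = deque()                      # writes forwarded later, in order

def asynchronous_write(local, remote, block, data):
    """Host ack returns immediately; the remote copy may lag behind: faster, less precise."""
    local.write(block, data)
    pending.append((block, data))      # queued for later forwarding to the remote copy
    return "ACK to host"

def drain(remote):
    while pending:
        remote.write(*pending.popleft())

A, B = Subsystem("local A"), Subsystem("remote B")
print(synchronous_write(A, B, 1, b"commit record"))
print(asynchronous_write(A, B, 2, b"commit record"))
print("remote has block 2 before drain:", 2 in B.blocks)   # False: copy state imprecise
drain(B)
print("remote has block 2 after drain:", 2 in B.blocks)    # True once the forward completes
```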
Store & Forward: Local and Remote Copies • Local and nearby copy techniques • Synchronous • Fiber optic cabling, optical/DWDM services • Remote far-away copy techniques • Asynchronous • ATM gateways, OC-12 or less, FC/IP
Mirroring vs Synchronous Store and Forward for Local & Nearby Copies • Mirroring: async I/O; reads and writes; no snapshot tie-in; uses more host slots; least costly • Store and Forward: async or sync I/O; writes only; snapshot ready; may conserve host I/O slots; most costly
Combining Mirroring with Store and Forward • Diagram: the mirroring radius covers local and nearby copies, while the store-and-forward radius extends from local out to remote far away