
Scalable, Fault-Tolerant NAS for Oracle - The Next Generation


Presentation Transcript


  1. Scalable, Fault-Tolerant NAS for Oracle - The Next Generation. Kevin Closson, Chief Software Architect, Oracle Platform Solutions, PolyServe Inc.

  2. The Un-”Show Stopper”
  • NAS for Oracle is not “file serving”; let me explain…
  • Think of GbE NFS I/O paths from the Oracle servers to the NAS device that are totally direct. No VLAN-style indirection.
  • In these terms, NFS over GbE is just a protocol, as is FCP over Fibre Channel.
  • The proof is in the numbers.
  • A single dual-socket/dual-core AMD server running Oracle10gR2 can push through 273MB/s of large I/Os (scattered reads, direct path read/write, etc.) over triple-bonded GbE NICs! (A bonding sketch follows below.)
  • Compare that to the infrastructure and HW costs of 4Gb FCP (~450MB/s, but you need 2 cards for redundancy).
  • OLTP over modern NFS with GbE is not a challenging I/O profile.
  • However, not all NAS devices are created equal by any means.
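
  A minimal sketch of what the triple-bonded GbE path might look like on the database server, assuming a RHEL-style Linux host and the standard bonding driver; the interface names, addresses, and bonding mode below are illustrative assumptions, not details from the presentation:

      # /etc/modprobe.conf -- load the bonding driver (mode/miimon values are assumptions)
      alias bond0 bonding
      options bond0 mode=balance-alb miimon=100

      # /etc/sysconfig/network-scripts/ifcfg-bond0 -- the bonded interface carrying NFS traffic
      DEVICE=bond0
      IPADDR=192.168.10.11        # illustrative storage-network address
      NETMASK=255.255.255.0
      ONBOOT=yes
      BOOTPROTO=none

      # /etc/sysconfig/network-scripts/ifcfg-eth1 -- one of the three slave NICs (repeat for eth2, eth3)
      DEVICE=eth1
      MASTER=bond0
      SLAVE=yes
      ONBOOT=yes
      BOOTPROTO=none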

  3. Agenda
  • Oracle on NAS
  • NAS Architecture
  • Proof of Concept Testing
  • Special Characteristics

  4. Oracle on NAS

  5. Oracle on NAS
  • Connectivity
    • A Fantasyland Dream Grid™ would be nearly impossible with FibreChannel switched fabric. For instance, 128 nodes == 256 HBAs and 2 switches, each with 256 ports, just for the servers; then you still have to work out the storage paths.
  • Simplicity
    • NFS is simple. Anyone with a pulse can plug in Cat-5 and mount filesystems (see the fstab sketch below).
    • MUCH, MUCH simpler than:
      • Raw partitions for ASM
      • Raw or OCFS2 for CRS
      • Oracle Home? Local Ext3 or UFS?
      • What a mess
    • Supports shared Oracle Home, and shared APPL_TOP too.
    • But not simpler than a certified third-party cluster filesystem; that is a different presentation.
  • Cost
    • FC HBAs are always going to be more expensive than NICs.
    • Ports on enterprise-level FC switches are very expensive.
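
  To make the “mount filesystems” point concrete, an /etc/fstab entry of the general shape documented for Oracle on Linux NFS clients might look like the following; the filer hostname and export paths are invented, and the exact options certified for a given platform and release should be checked:

      # /etc/fstab -- NFS mounts for Oracle (hostname and export paths are illustrative)
      nas1:/vol/u01  /u01  nfs  rw,bg,hard,nointr,tcp,vers=3,rsize=32768,wsize=32768,timeo=600,actimeo=0  0 0
      nas1:/vol/u02  /u02  nfs  rw,bg,hard,nointr,tcp,vers=3,rsize=32768,wsize=32768,timeo=600,actimeo=0  0 0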

  6. Oracle on NAS
  • NFS Client Improvements
    • Direct I/O: open() with O_DIRECT works with the Linux NFS client, the Solaris NFS client, and likely others.
  • Oracle Improvements
    • init.ora filesystemio_options=directIO (see the sketch below)
    • No async I/O on NFS, but look at the numbers.
    • The Oracle runtime checks mount options. Caveat: it doesn’t always get it right, but at least it tries (OSDS).
    • Don’t be surprised to see Oracle offer a platform-independent NFS client.
  • NFS v4 will have more improvements.
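
  As a sketch of the Oracle-side setting mentioned above (other parameter values exist, e.g. SETALL, and behavior is release-specific):

      # init.ora / spfile excerpt -- request unbuffered (O_DIRECT) I/O on NFS-resident files
      filesystemio_options = DIRECTIO

      # or set it via SQL*Plus; it takes effect at the next instance restart
      sqlplus / as sysdba <<'EOF'
      ALTER SYSTEM SET filesystemio_options = 'DIRECTIO' SCOPE = SPFILE;
      EOF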

  7. NAS Architecture

  8. NAS Architecture
  • Single-headed Filers
  • Clustered Single-headed Filers
  • Asymmetrical Multi-headed NAS
  • Symmetrical Multi-headed NAS

  9. Single Headed Filer Architecture

  10. NAS Architecture: Single-headed Filer [Diagram: servers reach the filer over a GigE network; the filer presents filesystems /u01, /u02, /u03]

  11. Oracle Servers Accessing a Single-headed Filer: I/O Bottleneck [Diagram: many Oracle database servers funnel into one filer head serving /u01, /u02, /u03; a single one of those database servers has the same (or more) bus bandwidth as the entire filer, so the filer head is the I/O bottleneck]

  12. Oracle Servers Accessing a Single-headed Filer: Single Point of Failure [Diagram: the database tier is highly available through failover HA, DataGuard, RAC, etc., but the single filer head serving /u01, /u02, /u03 is a single point of failure]

  13. Clustered Single-headed Filers

  14. Architecture: Cluster of Single-headed Filers [Diagram: two filer heads, one serving /u01 and /u02, the other serving /u03, with the paths that become active after failover shown]

  15. Oracle Servers Accessing a Cluster of Single-headed Filers [Diagram: Oracle database servers connected to both filer heads (/u01 and /u02 on one, /u03 on the other), with the paths that become active after failover shown]

  16. Architecture: Cluster of Single-headed Filers [Diagram: same cluster as above, /u01 and /u02 on one filer, /u03 on the other] What if /u03 I/O saturates its filer?

  17. Filer I/O Bottleneck; Resolution == Data Migration [Diagram: a fourth filesystem, /u04, is added on another filer and some of the “hot” data is migrated from /u03 to /u04]

  18. Data Migration Remedies the I/O Bottleneck but Creates a NEW Single Point of Failure [Diagram: after some of the “hot” data is migrated to /u04, the filer head serving /u04 becomes a new single point of failure]

  19. Summary: Single-headed Filers
  • Cluster to mitigate the S.P.O.F.
    • Clustering is a pure afterthought with filers.
    • Failover times? Long, really really long.
    • Transparent? Not in many cases.
  • Migrate data to mitigate I/O bottlenecks.
    • What if the data “hot spot” moves with time? The dog-chasing-his-tail syndrome.
  • Poor modularity: expanded by pairs for data availability.
  • What’s all this talk about CNS?

  20. Asymmetrical Multi-headed NAS Architecture

  21. Asymmetrical Multi-headed NAS Architecture [Diagram: Oracle database servers connect to a SAN gateway with three active NAS heads and three for failover, each active head owning its own “pool of data” on the FibreChannel SAN] Note: Some variants of this architecture support M:1 Active:Standby, but that doesn’t really change much.

  22. Asymmetrical NAS Gateway Architecture
  • Really not much different from clusters of single-headed filers:
    • 1 NAS head to 1 filesystem relationship
    • Migrate data to mitigate I/O contention
    • Failover not transparent
  • But:
    • More modular
    • Not necessary to scale up by pairs

  23. Symmetric Multi-headed NAS

  24. HP Enterprise File Services Clustered Gateway

  25. Symmetric vs Asymmetric EFS-CG [Diagram: in the symmetric EFS-CG, every NAS head can present every file (/Dir1/File1, /Dir2/File2, /Dir3/File3); in the asymmetric design, each file is reachable only through the single NAS head that owns its filesystem]

  26. Enterprise File Services Clustered Gateway Component Overview
  • Cluster Volume Manager
    • RAID 0
    • Expand online
  • Fully Distributed, Symmetric Cluster Filesystem
    • The embedded filesystem is a fully distributed, symmetric cluster filesystem.
  • Virtual NFS Services
    • Filesystems are presented through Virtual NFS Services.
  • Modular and Scalable
    • Add NAS heads without interruption.
    • All filesystems can be presented for read/write through any or all NAS heads.

  27. EFS-CG Clustered Volume Manager
  • RAID 0: the LUNs are RAID 1, so this implements S.A.M.E. (stripe and mirror everything)
  • Expand online: add LUNs, grow the volume
  • Up to a 16TB single volume
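
  The EFS-CG volume manager has its own tooling; purely as an analogy, the same S.A.M.E. idea, striping across LUNs the array has already mirrored and then growing online, looks roughly like this with generic Linux LVM commands (device names and sizes are invented):

      # Assumption: /dev/sdb..sde are LUNs already mirrored (RAID 1) inside the array
      pvcreate /dev/sdb /dev/sdc /dev/sdd /dev/sde
      vgcreate nasvg /dev/sdb /dev/sdc /dev/sdd /dev/sde
      # Stripe (RAID 0) a volume across the four mirrored LUNs: stripe-and-mirror-everything
      lvcreate -i 4 -I 64 -L 2048G -n u04 nasvg
      # Growing online later: add more mirrored LUNs, then extend the volume (and its filesystem)
      vgextend nasvg /dev/sdf /dev/sdg
      lvextend -L +1024G /dev/nasvg/u04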

  28. The EFS-CG Filesystem
  • All NAS devices have embedded operating systems and file systems, but the EFS-CG is:
    • Fully symmetric
    • Distributed Lock Manager; no metadata server or lock server
    • A general-purpose clustered file system
    • Standard C library and POSIX support
    • Journaled, with online recovery
    • Proprietary format, but uses standard Linux file system semantics and system calls, including flock() and fcntl(), clusterwide (see the illustration below)
  • Expand a single filesystem online up to 16TB; up to 254 filesystems in the current release.
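
  To picture the clusterwide lock semantics, here is an illustration using the flock(1) utility from two nodes that both mount the cluster filesystem; the path is hypothetical, and the point is simply that an exclusive lock taken on one node excludes the other:

      # On the first node: create a lock file, take an exclusive lock, and hold it for a minute
      node1$ touch /u03/poc/lockfile
      node1$ flock -x /u03/poc/lockfile -c 'echo "node1 has the lock"; sleep 60'

      # On a second node: this blocks until node1 releases the lock, then proceeds
      node2$ flock -x /u03/poc/lockfile -c 'echo "node2 has the lock"'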

  29. EFS-CG Filesystem Scalability

  30. Scalability: Single Filesystem Export Using x86 Xeon-based NAS Heads (Old Numbers) [Chart: single-filesystem throughput in MB/s versus cluster size; roughly 123, 246, 493, 739, 986, 1,084 and 1,196 MB/s at 1, 2, 4, 6, 8, 9 and 10 NAS heads, with the approximate single-headed filer limit marked for comparison] HP StorageWorks Clustered File System is optimized for both READ and WRITE performance.

  31. Virtual NFS Services
  • Specialized virtual host IP
  • Filesystem groups are exported through a VNFS
  • VNFS failover and rehosting are 100% transparent to the NFS client
    • Including active file descriptors, file locks (e.g., fcntl/flock), etc.

  32. EFS-CG Filesystems and VNFS

  33. Oracle Database Servers and the Enterprise File Services Clustered Gateway [Diagram: database servers mount /u01, /u02, /u03 and /u04 through virtual NFS services (vnfs1, vnfs1b, vnfs2b, vnfs3b) spread across four NAS heads, with every filesystem reachable through the gateway]
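
  One practical consequence of this layout: because any NAS head can serve any filesystem, the database servers can spread their mounts across the virtual NFS hostnames so that no single head carries all the traffic. A sketch, assuming the vnfs* names from the diagram and export paths that match the mount points (both assumptions):

      # Spread the Oracle mounts across the virtual NFS services (one VNFS per NAS head)
      mount -t nfs -o rw,bg,hard,nointr,tcp,vers=3,rsize=32768,wsize=32768 vnfs1:/u01  /u01
      mount -t nfs -o rw,bg,hard,nointr,tcp,vers=3,rsize=32768,wsize=32768 vnfs1b:/u02 /u02
      mount -t nfs -o rw,bg,hard,nointr,tcp,vers=3,rsize=32768,wsize=32768 vnfs2b:/u03 /u03
      mount -t nfs -o rw,bg,hard,nointr,tcp,vers=3,rsize=32768,wsize=32768 vnfs3b:/u04 /u04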

  34. EFS-CG Management Console

  35. EFS-CG Proof of Concept

  36. EFS-CG Proof of Concept
  • Goals: use Oracle10g (10.2.0.1) with a single high-performance filesystem for the RAC database and measure:
    • Durability
    • Scalability
    • Virtual NFS functionality

  37. EFS-CG Proof of Concept
  • The 4 filesystems presented by the EFS-CG were:
    • /u01: contained all Oracle executables (e.g., $ORACLE_HOME)
    • /u02: contained the Oracle10gR2 clusterware files (e.g., OCR, CSS) and some datafiles and External Tables for ETL testing
    • /u03: lower-performance space used for miscellaneous tests such as disk-to-disk backup
    • /u04: resided on a high-performance volume that spanned two storage arrays and contained the main benchmark database

  38. EFS-CG P.O.C.: Parallel Tablespace Creation
  • All datafiles created in a single exported filesystem
  • Proof of multi-headed, single-filesystem write scalability (see the sketch below)
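
  A sketch of what this kind of test can look like: several sessions, for example one per RAC instance, each creating a tablespace whose datafile lands in the same exported filesystem (tablespace names, paths, and sizes are invented, not the P.O.C.’s actual script):

      # Run one of these, with a different tablespace name, from each RAC node concurrently
      sqlplus / as sysdba <<'EOF'
      CREATE TABLESPACE ts01 DATAFILE '/u04/oradata/POC/ts01_01.dbf' SIZE 10G;
      EOF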

  39. EFS-CG P.O.C. Parallel Tablespace Creation

  40. EFS-CG P.O.C.: Full Table Scan Performance
  • All datafiles located in a single exported filesystem
  • Proof of multi-headed, single-filesystem sequential I/O scalability
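
  For reference, the kind of statement that drives this sequential I/O profile is a parallel full scan, along the lines of the following; the table name and degree of parallelism are invented:

      # A parallel full table scan generates large sequential reads across the RAC nodes
      sqlplus / as sysdba <<'EOF'
      SELECT /*+ FULL(t) PARALLEL(t, 16) */ COUNT(*) FROM poc.line_items t;
      EOF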

  41. EFS-CG P.O.C.: Parallel Query Scan Throughput

  42. EFS-CG P.O.C.: OLTP Testing
  • OLTP database based on an order-entry schema and workload
  • Test areas:
    • Physical I/O scalability under Oracle OLTP
    • Long-duration testing

  43. EFS-CG P.O.C.: OLTP Workload Transaction Average Cost [Table of average per-transaction costs] * Averages with RAC can be deceiving; be aware of CR sends.

  44. EFS-CG P.O.C.: OLTP Testing

  45. EFS-CG P.O.C.: OLTP Testing, Physical I/O Operations

  46. EFS-CG Handles all OLTP I/O Types Sufficiently—no Logging Bottleneck

  47. Long Duration Stress Test
  • Benchmarks do not prove durability. Benchmarks are “sprints”, typically 30-60 minute measured runs (e.g., TPC-C).
  • This long-duration stress test was no benchmark by any means:
    • Ramp OLTP I/O up to roughly 10,000 physical I/Os per second
    • Run non-stop until the aggregate I/O breaks through 10 billion physical transfers
    • That is 10,000 physical I/O transfers per second, every second, for nearly 12 days
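
  The arithmetic behind “nearly 12 days”: 10 billion transfers at 10,000 per second is 10,000,000,000 / 10,000 = 1,000,000 seconds, and 1,000,000 / 86,400 ≈ 11.6 days.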

  48. Long Duration Stress Test

  49. Long Duration Stress Test

  50. Long Duration Stress Test
