
ORACLE VLDB



    1. ORACLE & VLDB Nilo Segura IT/DB - CERN

    2. VLDB The real world is in the Tb range (British Telecom - 80 Tb using Sun+Oracle). Data is consolidated from different sources to build data warehouses, and data-mining techniques are used to extract useful information. Data is always read-only. Physics data has similar characteristics

    3. VLDB Current size limits: Solaris 64 bits + Oracle 64 bits = 4 Pb per database. This is more or less 19240 Sun StorEdge A1000 units (216 Gb per unit today). Current technology does not allow us to store all this data on-line in a manageable way. A database in Oracle terms: instance (memory) + database files, plus the dictionary and database accounts. The 4 Pb figure is a limit in the database software; the underlying filesystem can store more than that. It is clear that we need the proper hardware

    4. VLDB A typical LHC experiment will get several Petabytes of raw data. But the end user ought not to have access to all of this; we need to process/group/summarize it following certain criteria. This also means more disk space (if we want to keep everything on-line). We should keep only the reconstructed data on-line. I think it is a mistake to let users move the raw data around: for practical reasons, they do not know enough of the system to produce good-quality queries, and they do not need to repeat the processing already done at the reconstruction stage. An AMS data size of 11-15 Tb per year is possible to handle with today's technology, and things will certainly improve in 2-3 years

    5. VLDB Not to mention backup… we need to keep our data safe (have you devised your backup strategy?). RAID technology helps us increase performance and availability: RAID 0+1 (best) or RAID 5 (cheaper). Today we should have raw data on tapes and reconstructed data on-line. It is clear that we need to offer a good-quality service, with mirroring and striping or hardware RAID 5, unless everybody agrees that the service will stop in case of a disk failure. We also need some clustering techniques and load balancing. Even with 1 Tb disk units, this means N*1024 units… quite a lot, but becoming feasible

    6. VLDB Now some software tricks to deal with all this amount of data: Partitioning, Parallel DML, Materialized Views, Parallel Server (cluster configuration), Bitmapped indexes? These techniques are needed if we want to increase the performance of the system

    7. VLDB Partitioning A large amount of data can be divided into physically independent structures according to certain criteria. However, the user continues to see the same logical structure. Partition keys can be defined by range or using a hash function. The system can discard partitions based on the user's queries, reducing the amount of data to be processed
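As a sketch of what this looks like in Oracle DDL (table, column, and tablespace names here are illustrative, not taken from the slides), a range-partitioned table and a query that benefits from partition pruning:

```sql
-- Hypothetical raw-event table, range-partitioned by run number
CREATE TABLE raw_events (
    run_number  NUMBER,
    event_id    NUMBER,
    payload     BLOB
)
PARTITION BY RANGE (run_number) (
    PARTITION p_run_1999 VALUES LESS THAN (10000) TABLESPACE ts_1999,
    PARTITION p_run_2000 VALUES LESS THAN (20000) TABLESPACE ts_2000,
    PARTITION p_run_rest VALUES LESS THAN (MAXVALUE)
);

-- A predicate on the partition key lets the optimizer discard
-- partitions that cannot contain matching rows (partition pruning)
SELECT COUNT(*) FROM raw_events
WHERE run_number BETWEEN 10000 AND 15000;
```

The user still queries the single logical table `raw_events`; the physical split into per-range segments is invisible to the application.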

    8. VLDB Parallel DML A single select/insert/delete/update can be executed by several processes (slaves) coordinated by a master process. This naturally leads to a better use of SMP machines. The degree of parallelism can be set by the user or automatically by the system (testing is a must). Parallel insert/update/delete does need partitioning. You need to plan your I/O system carefully
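A sketch of how parallel DML is requested in Oracle (the table names are hypothetical, and the degree of parallelism shown is arbitrary; as the slide says, it should be chosen by testing):

```sql
-- Parallel DML is off by default and must be enabled per session
ALTER SESSION ENABLE PARALLEL DML;

-- The hint asks for a degree of parallelism of 4 for this insert;
-- the target table is assumed to be partitioned
INSERT /*+ PARALLEL(run_summary, 4) */ INTO run_summary
SELECT run_number, COUNT(*)
FROM raw_events
GROUP BY run_number;

-- The transaction must be committed (or rolled back) before the
-- same session can query the table it just modified in parallel
COMMIT;
```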

    9. VLDB Materialized views Normal views are just a name for a SQL query, with no real data associated until runtime. This can be very costly if we run the query regularly. A materialized view is a view with all the data that satisfies the query already stored (as in a normal table). It can be refreshed (manually or automatically) to reflect the dynamic nature of the view
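A minimal sketch, reusing the hypothetical names from the earlier examples, of materializing a summary once instead of recomputing it for every user query:

```sql
-- Store the per-run summary physically, like a normal table
CREATE MATERIALIZED VIEW mv_run_summary
  BUILD IMMEDIATE             -- populate the view now
  REFRESH COMPLETE ON DEMAND  -- recompute fully when asked
AS SELECT run_number, COUNT(*) AS n_events
   FROM raw_events
   GROUP BY run_number;

-- Manual refresh after new raw data has been loaded
-- ('C' = complete refresh; EXECUTE is SQL*Plus shorthand)
EXECUTE DBMS_MVIEW.REFRESH('MV_RUN_SUMMARY', 'C');
```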

    10. Parallel Server Several nodes access the same database, which sits on a disk system shared by all the nodes in the cluster. Users can be assigned to different nodes (load balancing). Intra-query parallelism: queries are executed across the different nodes. At CERN there are 3 Sun clusters (2 for DB, 1 for Web) and 1 Compaq. There is no such thing for Linux (yet). For the moment the files are actually raw partitions, but vendors are working on shared filesystems (OpenVMS had them years ago). There are heavy dependencies between HW/SW to create one of these clusters

    11. Others Another point is how to distribute data amongst the different institutes: network? tapes + post? … It would be nice to have a plug-in/plug-out mechanism in the database. We may also use the database replication option, but… In Oracle the plug-in mechanism is called transportable tablespaces; in its current version it is limited to traffic between the same hardware platform. Will XML help us? XML is very portable, but it certainly brings some extra processing cost and overhead
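A rough sketch of the transportable-tablespace mechanics (tablespace, file, and path names are made up; options as in the exp/imp utilities of the Oracle release where the feature appeared):

```sql
-- On the source database: freeze the tablespace, then export
-- only its metadata with the exp utility (run from the shell)
ALTER TABLESPACE ts_run_2000 READ ONLY;
-- host: exp userid='sys/... as sysdba' transport_tablespace=y
--            tablespaces=ts_run_2000 file=ts_meta.dmp

-- Ship ts_meta.dmp plus the tablespace's datafiles to the target
-- (same-hardware restriction mentioned above applies), then:
-- host: imp userid='sys/... as sysdba' transport_tablespace=y
--            datafiles='/u01/oradata/ts_run_2000_01.dbf'
--            file=ts_meta.dmp
ALTER TABLESPACE ts_run_2000 READ WRITE;
```

The appeal is that the datafiles travel as-is (over the network or on tape), with only a small metadata export, instead of unloading and reloading terabytes of rows.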

    12. Conclusion Do not despair at the size of the problem. We are trying to solve tomorrow's problem using today's technology, so keep an eye open and be VERY flexible in your model, to be able to adapt quickly and painlessly. Never declare your model frozen; that is a mistake. Try to improve it and adopt new technologies (if they are a benefit). Maybe for AMS tomorrow is today. I know cases of systems based on Oracle that were created many years ago with a lot of extra coding to work around certain limitations (no longer true), and now they are a real mess, extremely complex and heavy to maintain, because they keep on adding new things by patching and patching… without any real thought of what the system must do
