1 / 21

Large Object Datatypes

5a. Large Object Datatypes. Agenda. In this session, we will be discussing: What Is Unstructured Data? Why Should I Place Unstructured Data In Sybase IQ? Understanding How Sybase IQ Stores The Data Sybase IQ Limitations for Unstructured Data

Download Presentation

Large Object Datatypes

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. 5a Large Object Datatypes

  2. Agenda • In this session, we will be discussing: • What Is Unstructured Data? • Why Should I Place Unstructured Data In Sybase IQ? • Understanding How Sybase IQ Stores The Data • Sybase IQ Limitations for Unstructured Data • Creating, Loading, and Extracting Unstructured Data • New Functions • Sybase IQ 12.6 Functionality • Tuning Sybase IQ for Unstructured Data • Summary • Question And Answer

  3. What Is Unstructured Data? • Freeform binary or textual data that has meaning to a user or an application, but not to the database • Why is it unstructured? • Format is not known to the RDBMS • Cannot be indexed due to the unstructured nature • Large object types • Binary – Generally referred to as a BLOB (Binary Large Object) • Text – Generally referred to as a CLOB (Character Large Object) • Often referred to as a LOB or LOBs

  4. What Is Unstructured Data? • Can be any type of data you desire • Images • Video • Sound • Scanned or electronic documents • Applications • Company specific data format • It has to make sense to you, not the database!

  5. Why Should I Place Unstructured Data In Sybase IQ? • Single location for application data • The application can be written to access all structured and unstructured data using a single interface • The DBAs have control over all data for the application, not just the structured data • Filesystems are generally less efficient at handling large sequential files, causing more disk I/O • In order to achieve performance from filesystems, additional software and training is necessary • Requires tight coordination with system administrators • Application can be taken down for non-database problems • Filesystems going offline • Storage filling up

  6. Why Should I Place Unstructured Data In Sybase IQ? • Generally requires less storage • Filesystems cannot naturally compress the datafiles • IQ will compress the LOB data just as it compresses other data • Each IQ data page is run through a compression algorithm to compress the contents of the page into one or more disk blocks • Filesystems can have as much as 10% overhead • Inode table • Reserved space for the “root” account • Reading from filesystem must be done sequentially • The application can be designed to return portions of the data in parallel • Sybase IQ loads are done in parallel

  7. Understanding How Sybase IQ Stores The Data • Each object is stored in one or more data pages • If an object needs a portion of a page, the entire page is marked as in use for that LOB • Each page is then put through the IQ compression algorithms to squeeze as much storage savings as possible • No other indexing or changes take place on the LOB data

  8. Sybase IQ Limitations for Unstructured Data • There is no database limit on the number of LOB columns that can exist • Limited only by the total number of tables and columns in a database • A table can have an unlimited number of LOB columns • Limited only by the total number of columns allowed on a table • Each LOB columns is virtually unlimited in size • Maximum LOB size is 4GB * page size • Absolute maximum is 4GB * 512KB = 2 PetaBytes • Most RDBMS engines are limited to 2-4 GB per entry

  9. Creating, Loading, and Extracting Unstructured Data • Added new datatype “long binary” to support LOB data • Create a table with LOB support create table blob_data( file_id int primary key ,filename char(64) ,ext char(6) null ,file_size unsigned bigint ,lobcol long binary null ) LOB Column

  10. Creating, Loading, and Extracting Unstructured Data • New syntax added to support loading LOB data • “BINARY FILE()” added to column specification • Data for the binary file location must be delimited • LOB data does not exist in primary data file • Primary data file is a pointer to LOB data being loaded

  11. Creating, Loading, and Extracting Unstructured Data • Load a table with a LOB column set temporary option load_memory_mb=50; go load table blob_data ( file_id ',' ,filename ',‘ ,ext ',‘ ,file_size ',' ,lobcol binary file ( ',' ) ,filler(1) -- change this to filler(2) for Windows Data ) from ‘blob_file.dat’ format ascii preview on quotes off escapes off go commit go LOB Column

  12. Creating, Loading, and Extracting Unstructured Data • Sample blob_file.dat data 1,boston,jpg,/s1/loads/lobs/boston.jpg,1234, 2,map_of_concord,bmp,/s1/loads/maps/concord.bmp,321, 3,zero length test,NULL,,123, 4,null test,NULL,NULL,456, • Notice that there is no LOB data in the file • The IQ load engine will open the secondary files and load them into the appropriate rows • Row 1: /s1/loads/lobs/boston.jpg • Row 2: /s1/loads/maps/concord.bmp • Rows 3 and 4 will be NULL

  13. Creating, Loading, and Extracting Unstructured Data • More LOB loading notes… • If the file cannot be opened for any reason a NULL is loaded instead and no error is thrown • An error will be thrown if the column is created with “NOT NULL” • LOB data can only be loaded from an ASCII primary data file (e.g. blob_file.dat, map.jpg) • The ASCII file can be fixed width, however, the secondary file name column MUST have a delimiter after it

  14. Creating, Loading, and Extracting Unstructured Data • Three methods to extracting LOB data • Return the entire contents of the column • Return a portion of the column • Extract the column to disk • To return the entire contents of the column to the client application • select lobcol from my_table where file_id = 1 • To return a portion of the column to the client application • select byte_substr64( lobcol, 1, 1000 ) from my_table where file_id = 1 • Will return bytes 1 through 1000 of the LOB • Extract the column to disk • select bfile ( ‘/tmp/my_lob.dat’, lobcol ) from my_table where file_id = 1 • Will directly write the entire row/column to disk on the IQ server host

  15. New Functions • BYTE_LENGTH64( long binary ) • Returns the total bytes contained in the LOB column • select byte_length64( lobcol ) from my_table where file_id = 1 • BYTE_SUBSTR64( long binary, offset, length) • Returns a portion/substring of the LOB data • select byte_substr64( lobcol, 1, 1000 ) from my_table where file_id = 1 • BFILE( filename, long binary ) • Extract the contents of the LOB column to filename • Each row should have a unique filename • The filename can be any string manipulation • If the filename is non-unique the contents will be overwritten • select bfile( filename + ‘.’ + ext, lobcol ) from my_table where file_id = 1

  16. Sybase IQ 12.6 Functionality • New domain BLOB created for the “long binary” datatype • New domain CLOB created for the “long varchar” datatype • Single- and Multi-byte support added for the CLOB datatype • New load table syntax • Secondary_File_Error option specifies the desired error handling when an error occurred during opening/reading a secondary file • Supported syntax • UPDATE • INSERT..VALUES • INSERT..SELECT • LOAD • DELETE • TRUNCATE • SELECT..INTO • INSERT..LOCATION • SELECT

  17. Tuning Sybase IQ for Unstructured Data • CORE_Options14 • This sets the number of threads used to read an individual lob secondary file. Measurements have been inconsistent due to vast hardware differences and dependencies. A value much larger than 3 has shown to cause system time to be too high. Start with a value of 3 and tune from there. • CORE_Options15 • This sets the number of threads in the lob load team. A value of zero will use 1 thread per cpu, if available. A non-zero value of n sets the team size to n, if available. The lob load threads have to live within the other settings such as max threads per connection, max team size, etc. On a machine with many CPUs, this may need to be set manually.

  18. Tuning Sybase IQ for Unstructured Data • FP_LOB_Workunit_MBSize • This defaults to 100 MB and there is probably no reason to change it. It is the amount of data each thread reads in a single unit of work when loading lob data. • Loading LOB data is very sensitive to I/O and CPUs • Have sustained load rates of over 2 GB per second! • That’s 7 TB an hour • Had 72 CPUs and enough I/O controllers to handle the throughput • Loading LOB data is not memory intensive • A minimally configured IQ server (1-2 GB RAM total) is sufficient to load LOB data • Load_Memory_MB is not used heavily for LOB data

  19. Large Objects (LOBs) • Existing LONG BINARY columns created before 12.5ESD8 must be dropped before 12.6 installation • LONG BINARY size limit is between 512TB and 2PB (depending upon IQ PAGE size) – that is per row! • LONG VARCHAR now has the same “restrictions” as LONG BINARY • ASE TEXT can be inserted into LONG VARCHAR • A whole new manual entitled • Large Object Management in Sybase IQ

  20. Summary • A fast, compressible way to store unstructured data in an RDBMS • Compression ratio will vary, but is guaranteed to be no larger than the original file without the O/S overhead • Data can be written and read in parallel • Remove the need to rely on resources outside the database administrators and developers • Guaranteed performance since all resources are under the DBA’s control

  21. About things - End

More Related