
The search for a self-documenting image-file format for macromolecular crystallography

Development of imgCIF. R.M. Sweet, Brookhaven Biology; Herbert Bernstein, Math and Comp. Sci., Dowling College.



Presentation Transcript


  1. The search for a self-documenting image-file format for macromolecular crystallography: Development of imgCIF. R.M. Sweet, Brookhaven Biology; Herbert Bernstein, Math and Comp. Sci., Dowling College

  2. Motivation:
     • Since the early days of computing, crystallography has been a discipline that has pushed the limits of computation (I remember a Burroughs 220 tube-type in about 1963).
     • It’s a data-rich science: results are created by computation and stored as numbers.
     • Early on, crystallographers created the Crystallographic Information Framework (CIF), a relational database schema stored as 80-column text.
     • It organized the raw data, the metadata describing the experiment, and the resulting structure.
     • Nowadays, the diffraction experiment for small molecules is like spectroscopy: give the specimen to the machine; hit <Enter>; look at the answer; hit <Enter> again to submit the CIF file to the Cambridge Crystallographic Data Centre.

  3. High-throughput (HTP) macromolecular crystallography is approaching this situation.
     • The Protein Data Bank (PDB) was created for this discipline in the ’70s to store the data, long before it was HTP.
     • The PDB started to use a relational database, and a flat-text form of that schema (mmCIF), in the ’90s.
     • Some of us felt that the raw data should carry metadata in an organized way from the experiment to the deposition of the structure in the PDB.
     • This would help the experimenter keep records, and it would help the programmer who wanted to use the data.
     • He or she ought not to have to worry about how or where the original experiments were done; there should be a complete annotation of the work.

  4. A growing need is FedEx data: data are taken somewhere by one person and used somewhere else by someone else. It MUST be transparent where they came from and what they’re about.

  5. A little history: There has been over a decade of discussion and activity on this question.
     • March ’95 – The subject of internally documented images was raised in a workshop on GUIs at BNL.
     • July ’95 – The SR-SIG endorsed creation of a standard image format with header at the ACA meeting.
     • Early ’96 – Intense e-mail discussions made progress.
     • August ’96 – Report from this group at the CIF workshop of the Seattle IUCr meeting.
     • October ’97 – Major workshop at BNL. Led to two years of off-line work to establish the imgCIF/CBF standard: Andy Hammersley, Herb Bernstein, Paul Ellis.
     • August ’99 – Reported to IUCr COMCIFS.

  6. (History, continued)
     • July ’00 – Submitted to COMCIFS.
     • December ’00 – Approved by COMCIFS.
     • May ’05 – ACA data committee says, “Let’s get on with it!”
     • July ’06 – Bernstein and Sweet hold yet another workshop to regain momentum.

  7. The plan for handling image files is that there should be a header of essentially text, then the image, probably as raw binary.
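     As a rough illustration of the "text header, then raw binary image" idea, here is a minimal Python sketch. The layout it uses ("key: value" lines, a blank line, then little-endian 32-bit pixels) and the names write_image and read_image are assumptions made up for this example. It is not the imgCIF/CBF layout.

```python
import struct

# Hypothetical layout for illustration only (NOT the imgCIF/CBF layout):
# ASCII "key: value" lines, one blank line, then raw little-endian int32 pixels.

def write_image(header, pixels):
    """Serialize a dict of text metadata followed by raw binary pixel values."""
    text = "".join(f"{key}: {value}\n" for key, value in header.items()) + "\n"
    payload = struct.pack(f"<{len(pixels)}i", *pixels)
    return text.encode("ascii") + payload

def read_image(blob):
    """Split the blob at the first blank line into (header dict, pixel list)."""
    head, _, payload = blob.partition(b"\n\n")
    header = dict(line.split(": ", 1) for line in head.decode("ascii").splitlines())
    count = len(payload) // 4  # four bytes per int32 pixel
    pixels = list(struct.unpack(f"<{count}i", payload))
    return header, pixels
```

     The point of the sketch is only that anyone holding the file can read the metadata without knowing where or how the image was recorded.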

  8. The imgCIF dictionary is an add-on to the mmCIF dictionary. The dictionary is well documented. Here is an example from the dictionary of the data-array loop, with the binary data encoded as hexadecimal characters.
     loop_
     _array_data.array_id
     _array_data.binary_id
     _array_data.data
     image_1 1
     ;
     --CIF-BINARY-FORMAT-SECTION--
     Content-Type: application/octet-stream;
          conversions="x-CBF_CANONICAL"
     Content-Transfer-Encoding: X-BASE16
     X-Binary-Size: 3927126
     X-Binary-ID: 1
     Content-MD5: u2sTJEovAHkmkDjPi+gWsg==

     # Hexadecimal encoding, byte 0, byte order ...21
     #
     H4< 0050B810 00000000 00000000 00000000 000F423F 00000000 00000000 ...
     ....
     --CIF-BINARY-FORMAT-SECTION----
     ;

  9. And here is a clear-text description from the dictionary of the way to read and write components of the file.
     save__array_data.data
     _item_description.description
     ;
     The value of _array_data.data contains the array data encapsulated in a STAR string.
     The representation used is a variant on the Multipurpose Internet Mail Extensions (MIME) specified in RFC 2045-2049 by N. Freed et al.
     The boundary delimiter used in writing an imgCIF or CBF is "--CIF-BINARY-FORMAT-SECTION--" (including the required initial "--").
     The Content-Type may be any of the discrete types permitted in RFC 2045; "application/octet-stream" is recommended.
     If an octet stream was compressed, the compression should be specified by the parameter 'conversions="x-CBF_PACKED"' or the parameter 'conversions="x-CBF_CANONICAL"'.
     . . .
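     To make the MIME-style encapsulation concrete, here is a hedged Python sketch that splits one binary section into its header fields and body lines. The function names parse_section and content_md5 are invented for this example, and this is not the CBFlib API. It assumes a blank line ends the header fields, as in MIME, and that Content-MD5 is the base64 form of an MD5 digest (the RFC 1864 convention) taken over the stored octets. It does not interpret the word-size and byte-order marker (such as "H4<") used in X-BASE16 bodies.

```python
import base64
import hashlib

BOUNDARY = "--CIF-BINARY-FORMAT-SECTION--"

def parse_section(text):
    """Return (headers, body_lines) for one MIME-style binary section."""
    lines = text.strip().splitlines()
    if lines[0].strip() != BOUNDARY:
        raise ValueError("section does not start with the boundary delimiter")
    headers, last, i = {}, None, 1
    # Header fields run up to the first blank line (assumed, as in MIME).
    while i < len(lines) and lines[i].strip():
        line = lines[i]
        if line[0] in " \t" and last is not None:
            # An indented line continues the previous header field.
            headers[last] += " " + line.strip()
        else:
            name, _, value = line.partition(":")
            last = name.strip()
            headers[last] = value.strip()
        i += 1
    body = []
    for line in lines[i:]:
        if line.strip().startswith(BOUNDARY):  # closing delimiter ends the body
            break
        if line.strip():
            body.append(line.strip())
    return headers, body

def content_md5(data):
    """Base64 of the MD5 digest, the RFC 1864 form of a Content-MD5 value."""
    return base64.b64encode(hashlib.md5(data).digest()).decode("ascii")
```

     A reader could compare content_md5 of the decoded octets with the Content-MD5 field, and the decoded length with X-Binary-Size, before trusting the image data.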

  10. Experimental details are saved, e.g. the beam-collimation method:
     save__diffrn_radiation.collimation
     _item_description.description
     ;
     The collimation or focusing applied to the radiation.
     ;
     _item.name '_diffrn_radiation.collimation'
     _item.category_id diffrn_radiation
     _item.mandatory_code no
     _item_aliases.alias_name '_diffrn_radiation_collimation'
     _item_aliases.dictionary cif_core.dic
     _item_aliases.version 2.0.1
     _item_type.code text
     loop_
     _item_examples.case
     '0.3 mm double-pinhole'
     '0.5 mm'
     'focusing mirrors'
     save_

  11. Or the divergence of the X-ray beam:
     save__diffrn_radiation.div_y_source
     _item_description.description
     ;
     Beam crossfire in degrees parallel to the laboratory Y axis (see AXIS category).
     This is a characteristic of the xray beam as it illuminates the sample (or specimen) after all monochromation and collimation.
     This is the esd of the directions of photons in the Y-Z plane around the mean source beam direction.
     Note that some synchrotrons specify this value in milliradians, in which case a conversion would be needed. To go from a value in milliradians to a value in degrees, multiply by 0.180 and divide by Pi.
     ;
     _item.name '_diffrn_radiation.div_y_source'
     _item.category_id diffrn_radiation
     _item.mandatory_code no
     _item_type.code float
     _item_units.code degrees
     _item_default.value 0.0
     save_
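     The unit conversion described in that definition is small enough to write out. In this Python sketch the function name mrad_to_degrees is an assumption for illustration, and the factor is the one stated in the dictionary text (multiply by 0.180, divide by Pi).

```python
import math

def mrad_to_degrees(mrad):
    """Convert a beam divergence in milliradians to degrees (x 0.180 / Pi)."""
    return mrad * 0.180 / math.pi
```

     The factor follows from 1 mrad = 0.001 rad and 1 rad = 180/Pi degrees, so a beamline that reports crossfire in milliradians would convert before writing _diffrn_radiation.div_y_source.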

  12. At the BNL PXRR we have substantial infrastructure assembled to create the information for the header, and then to use it to create a final report.

  13. When data are taken, the data-collection system is linked to the group and its project. All of this, plus the experimental parameters, goes into the image headers.

  14. A plan to complete the project:
     • A stumbling block has been to get all of the data-reduction software writers to accept and use the standard.
     • The plan is to get sets of (a) beamline guy, (b) hardware vendor, and (c) software person to collaborate to get the system working, one facility at a time.
     • Begin to develop the habit of carrying metadata with intermediate results.
     • Ultimately create the PDB report, nearly complete, from parameters that started in the imgCIF.
