
Protein grouping in mzIdentML


anne

Presentation Transcript


  1. Protein grouping in mzIdentML

  2. ProteinDetectionList
       ProteinAmbiguityGroup id=“PAG1”
         ProteinDetectionHypothesis id=“PDH1” dbseq_ref=“dbseq_Q05421|CP2E1_MOUSE” (anchor protein)
         ProteinDetectionHypothesis id=“PDH2” dbseq_ref=“dbseq_Q05423|CP2E2_MOUSE” (sequence same-set)
         ProteinDetectionHypothesis id=“PDH3” dbseq_ref=“dbseq_Q05312|CP2F1_MOUSE” (sequence subset)
       ProteinAmbiguityGroup id=“PAG2”
         ....
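The nesting on this slide can be walked with a few lines of Python. This is a minimal sketch under stated assumptions: the XML fragment is a simplified, namespace-free stand-in that copies the slide's shorthand (`dbseq_ref`), the `role` attribute is a hypothetical placeholder for the attached CV term, and `list_groups` is an illustrative helper, not part of any mzIdentML library.

```python
import xml.etree.ElementTree as ET

# Simplified fragment mirroring the slide. Attribute names follow the slide's
# shorthand, and "role" is a hypothetical stand-in for the CV term on each PDH.
XML = """
<ProteinDetectionList>
  <ProteinAmbiguityGroup id="PAG1">
    <ProteinDetectionHypothesis id="PDH1" dbseq_ref="dbseq_Q05421|CP2E1_MOUSE" role="anchor protein"/>
    <ProteinDetectionHypothesis id="PDH2" dbseq_ref="dbseq_Q05423|CP2E2_MOUSE" role="sequence same-set"/>
    <ProteinDetectionHypothesis id="PDH3" dbseq_ref="dbseq_Q05312|CP2F1_MOUSE" role="sequence subset"/>
  </ProteinAmbiguityGroup>
</ProteinDetectionList>
"""


def list_groups(xml_text):
    """Map each ProteinAmbiguityGroup id to its (PDH id, dbseq_ref, role) tuples."""
    root = ET.fromstring(xml_text)
    groups = {}
    for pag in root.findall("ProteinAmbiguityGroup"):
        groups[pag.get("id")] = [
            (pdh.get("id"), pdh.get("dbseq_ref"), pdh.get("role"))
            for pdh in pag.findall("ProteinDetectionHypothesis")
        ]
    return groups


groups = list_groups(XML)
for pag_id, hypotheses in groups.items():
    for pdh_id, dbseq_ref, role in hypotheses:
        print(pag_id, pdh_id, dbseq_ref, role)
```

A real mzIdentML file declares an XML namespace and carries the relationship as a child CV parameter rather than an attribute, so the element lookups would need adjusting before this could be applied to actual files.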

  3. ProteinAmbiguityGroup and ProteinDetectionHypothesis

  4. Existing CV terms for ProteinDetectionHypothesis

     id: MS:1001591
     name: anchor protein
     def: "A representative protein selected from a set of sequence same-set or spectrum same-set proteins." [PSI:MS]
     xref: value-type:xsd\:string "The allowed value-type for this CV term."
     is_a: MS:1001101 ! protein group or subset relationship

     id: MS:1001592
     name: family member protein
     def: "A protein with significant homology to another protein, but some distinguishing peptide matches." [PSI:MS]
     xref: value-type:xsd\:string "The allowed value-type for this CV term."
     is_a: MS:1001101 ! protein group or subset relationship

     id: MS:1001593
     name: group member with undefined relationship OR ortholog protein
     def: "TO ENDETAIL: a really generic relationship OR ortholog protein." [PSI:MS]
     is_a: MS:1001101 ! protein group or subset relationship

     id: MS:1001594
     name: sequence same-set protein
     def: "A protein which is indistinguishable or equivalent to another protein, having matches to an identical set of peptide sequences." [PSI:MS]
     xref: value-type:xsd\:string "The allowed value-type for this CV term."
     is_a: MS:1001101 ! protein group or subset relationship

     id: MS:1001595
     name: spectrum same-set protein
     def: "A protein which is indistinguishable or equivalent to another protein, having matches to a set of peptide sequences that cannot be distinguished using the evidence in the mass spectra." [PSI:MS]
     xref: value-type:xsd\:string "The allowed value-type for this CV term."
     is_a: MS:1001101 ! protein group or subset relationship

  5. Existing CV terms for ProteinDetectionHypothesis

     id: MS:1001596
     name: sequence sub-set protein
     def: "A protein with a sub-set of the peptide sequence matches for another protein, and no distinguishing peptide matches." [PSI:MS]
     xref: value-type:xsd\:string "The allowed value-type for this CV term."
     is_a: MS:1001101 ! protein group or subset relationship

     id: MS:1001597
     name: spectrum sub-set protein
     def: "A protein with a sub-set of the matched spectra for another protein, where the matches cannot be distinguished using the evidence in the mass spectra, and no distinguishing peptide matches." [PSI:MS]
     xref: value-type:xsd\:string "The allowed value-type for this CV term."
     is_a: MS:1001101 ! protein group or subset relationship

     id: MS:1001598
     name: sequence subsumable protein
     def: "A sequence same-set or sequence sub-set protein where the matches are distributed across two or more proteins." [PSI:MS]
     xref: value-type:xsd\:string "The allowed value-type for this CV term."
     is_a: MS:1001101 ! protein group or subset relationship

     id: MS:1001599
     name: spectrum subsumable protein
     def: "A spectrum same-set or spectrum sub-set protein where the matches are distributed across two or more proteins." [PSI:MS]
     xref: value-type:xsd\:string "The allowed value-type for this CV term."
     is_a: MS:1001101 ! protein group or subset relationship
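The sequence same-set and sequence sub-set definitions above amount to set comparisons on matched peptide sequences. The sketch below is one way to express that, assuming each protein is represented simply by the set of peptide sequences matched to it. The function name and the peptide strings are hypothetical, and the spectrum-level and subsumable terms are deliberately left out because they need spectrum evidence and an interpretation the definitions do not pin down.

```python
def sequence_relationship(peptides, other_peptides):
    """Relate one protein's matched peptide sequences to another protein's.

    Returns the CV term name that applies to the first protein relative to
    the second, or None if the first has a distinguishing peptide match.
    """
    a, b = set(peptides), set(other_peptides)
    if a == b:
        return "sequence same-set protein"   # identical peptide sets
    if a < b:
        return "sequence sub-set protein"    # strict subset, nothing distinguishing
    return None


# Hypothetical peptide matches for the three hypotheses shown earlier.
pdh1 = {"PEPTIDEA", "PEPTIDEB", "PEPTIDEC"}
pdh2 = {"PEPTIDEA", "PEPTIDEB", "PEPTIDEC"}
pdh3 = {"PEPTIDEA"}

print(sequence_relationship(pdh2, pdh1))  # sequence same-set protein
print(sequence_relationship(pdh3, pdh1))  # sequence sub-set protein
print(sequence_relationship(pdh1, pdh3))  # None
```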

  6. Problems
     • No requirement for any exporter to use the terms (they are only “MAY”)
     • “anchor protein” doesn’t capture the intended role and isn’t used consistently
     • No definition of what should be put in the value slot of CV terms, for example:
         id: MS:1001596
         name: sequence sub-set protein
         def: "A protein with a sub-set ...." [PSI:MS]
         xref: value-type:xsd\:string "The allowed value-type for this CV term."
         is_a: MS:1001101 ! protein group or subset relationship
       • Could be the PDH identifier, accession or DBSequence identifier of the group representative, or of any other protein that is a super-set to this protein
       • Or anything else for that matter
     • What does passThreshold=“true” on a PDH mean?
     • Unclear how to count the number of identified proteins in an mzIdentML file
       • Count PAGs or count PDHs?
     • No terms for the protocol describing how inference has been done or how to interpret results

  7. Proposed work group outcomes
     • Attach CV terms to <ProteinDetectionProtocol> describing how protein inference has been done
       • Still under discussion, since these effectively describe parts of the algorithm used
     • Exactly one mandatory “representative protein” (new name for “anchor protein”) MUST be present per group, on a PDH
       • To be checked by the semantic validator
     • ProteinDetectionList MUST have a CV term “number of identified proteins” (count PAGs that have a “representative protein” PDH with passThreshold=“true”)
     • Each PDH SHOULD be flagged with one term from a group stating whether it is “representative protein”, “sequence|spectrum same-set”, “sequence|spectrum subset”, “sequence|spectrum subsumed” or “marginally distinguished” (i.e. not strictly any of these, but not enough evidence to be a group representative)
       • Value slot of these terms SHOULD contain a comma-separated list of super-set or same-set (as appropriate) PDH IDs
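The proposed rules on this slide can be sketched as three small checks: exactly one “representative protein” per group, a protein count taken over groups whose representative passes the threshold, and a value slot parsed as a comma-separated list of PDH IDs. The classes and function names below are hypothetical in-memory stand-ins, not the mzIdentML schema or the actual semantic validator.

```python
from dataclasses import dataclass

REPRESENTATIVE = "representative protein"


@dataclass
class PDH:
    id: str
    pass_threshold: bool
    role: str            # relationship term attached to this hypothesis
    value: str = ""      # value slot: comma-separated PDH ids (proposed)


@dataclass
class PAG:
    id: str
    hypotheses: list


def groups_violating_representative_rule(groups):
    """Ids of groups that do not have exactly one representative protein."""
    return [
        g.id for g in groups
        if sum(h.role == REPRESENTATIVE for h in g.hypotheses) != 1
    ]


def count_identified_proteins(groups):
    """Count groups whose representative PDH has passThreshold true."""
    return sum(
        any(h.role == REPRESENTATIVE and h.pass_threshold for h in g.hypotheses)
        for g in groups
    )


def related_pdh_ids(pdh):
    """Split the value slot into a list of PDH ids."""
    return [part.strip() for part in pdh.value.split(",") if part.strip()]


# Hypothetical data: PAG1 passes, PAG2's representative is below threshold.
groups = [
    PAG("PAG1", [
        PDH("PDH1", True, REPRESENTATIVE),
        PDH("PDH2", True, "sequence same-set", "PDH1"),
        PDH("PDH3", True, "sequence subset", "PDH1, PDH2"),
    ]),
    PAG("PAG2", [PDH("PDH4", False, REPRESENTATIVE)]),
]

print(groups_violating_representative_rule(groups))  # []
print(count_identified_proteins(groups))             # 1
print(related_pdh_ids(groups[0].hypotheses[2]))      # ['PDH1', 'PDH2']
```

Counting groups rather than hypotheses is what makes the result independent of how many same-set or subset members an exporter chooses to list.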

  8. Table 1: New CV terms for reporting how protein inference has been performed. The semantic validation software for mzIdentML reports an error (MUST), a warning (SHOULD) or an informational message (MAY) if these terms are not reported within the file.

  9. Table 1 (cont.): New CV terms for reporting how protein inference has been performed. The semantic validation software for mzIdentML reports an error (MUST), a warning (SHOULD) or an informational message (MAY) if these terms are not reported within the file.

  10. Table 2: New CV terms for reporting protein set (group) relationships and global statistics about the protein identification results. The semantic validation software for mzIdentML reports an error (MUST), a warning (SHOULD) or an informational message (MAY) if these terms are not reported within the file.

  11. Table 2 (cont.): New CV terms for reporting protein set (group) relationships and global statistics about the protein identification results. The semantic validation software for mzIdentML reports an error (MUST), a warning (SHOULD) or an informational message (MAY) if these terms are not reported within the file.
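The table captions describe a simple mapping from requirement level to validator message: a missing MUST term is an error, a missing SHOULD term a warning, a missing MAY term an informational message. A minimal sketch of that mapping follows. It is not the actual mzIdentML semantic validator; the rule set is illustrative, with the first two levels taken from the proposed outcomes slide and the MAY entry purely hypothetical.

```python
SEVERITY = {"MUST": "error", "SHOULD": "warning", "MAY": "info"}


def check_reported_terms(rules, reported_terms):
    """Return (severity, message) pairs for every required term that is missing.

    rules: dict mapping a CV term name to "MUST", "SHOULD" or "MAY".
    reported_terms: collection of term names found in the file.
    """
    messages = []
    for term, level in rules.items():
        if term not in reported_terms:
            messages.append((SEVERITY[level], f"{level} term not reported: {term}"))
    return messages


# Illustrative rules only; the MAY entry is a made-up placeholder.
rules = {
    "number of identified proteins": "MUST",
    "sequence same-set": "SHOULD",
    "example optional term": "MAY",
}

messages = check_reported_terms(rules, reported_terms={"sequence same-set"})
for severity, text in messages:
    print(severity, text)
```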

  12. Unresolved issues
      • Are the protocol terms necessary / sensible / overkill?
      • Is there general consensus on the idea that the number of identified proteins MUST be reported, and must equal the count of PAGs with PDH passThreshold=“true”?
      • Is it sensible to have SHOULD rules on all subset/same-sets?
      • Extra terms for relationships between protein sequences
        • Probably these will be removed
      • Mechanism for updating the mzIdentML specifications and validation software
        • Minor update + submission to shortened PSI process?
