What are the remaining 2400 unclassified (link-record-containing) carbohydrate-containing entries?
The Interns
1. Sugar extending from outer surface of polymer, non-active site, covalent. Examples: 1A0H, 1A65, 2ETE (figure label: NAG)
2. Di- and polysaccharides, non-covalently associated. Examples: 1A47, 1A3K, 2EXK (figure label: XYS)
3. Chain of sugar + other non-sugar modified residues, non-covalent to protein. Example: 148L [NAG-MUB-ALA-FGA-API-DAL] (figure labels: API, FGA, DAL)
What we know…
• All entries contain instances that can be classified into one of these 3 groups.
• However, the same entry often contains other instances that belong in groups already “binned”,
• e.g. the polysaccharide bin, the N-glycan bin, free-floating drug-like ligands (not yet binned), and nucleotide-containing instances.
1EWK
• Same as group 1 identified in this remaining unclassified pile.
• Definition of polysaccharide?
• “[ NAG A 801 ASN A 98 ]”
1DOT
• Same as group 2?
• Again, polysaccharide list definition?
• “[ NAG A 692 FUC A 693 FUC A 693 NAG A 692 FUC A 693 NAG A 692 ]”
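The two bracketed strings above can be compared more easily once they are split into residues. The following is a minimal sketch, assuming each string is a whitespace-separated run of (residue name, chain ID, residue number) triples, which is inferred only from the two examples on these slides. The function names are hypothetical, and residue numbers with insertion codes are not handled.

```python
from typing import List, Tuple

# (residue name, chain ID, residue number); layout assumed from the slide examples
Residue = Tuple[str, str, int]


def parse_link_string(text: str) -> List[Residue]:
    """Split a string such as '[ NAG A 801 ASN A 98 ]' into residue triples."""
    tokens = text.strip().strip("[]").split()
    if len(tokens) % 3 != 0:
        raise ValueError(f"expected name/chain/number triples, got {tokens}")
    return [
        (tokens[i], tokens[i + 1], int(tokens[i + 2]))
        for i in range(0, len(tokens), 3)
    ]


def distinct_residues(residues: List[Residue]) -> List[Residue]:
    """Drop repeated residues while keeping first-seen order."""
    return list(dict.fromkeys(residues))


entry_1dot = "[ NAG A 692 FUC A 693 FUC A 693 NAG A 692 FUC A 693 NAG A 692 ]"
print(distinct_residues(parse_link_string(entry_1dot)))
```

Under this assumption the 1DOT string lists six residues but only two distinct ones (NAG A 692 and FUC A 693), so whether it counts as a polysaccharide may depend on whether the search counted listed or distinct residues.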
Polysaccharides
• What exactly was Kim’s definition of polysaccharides, and how did his search find them?
• Number of monosaccharide units
• Covalent or non-covalent
• Can we refine the definition to improve search results?
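One way to make these questions concrete is to write the definition as an explicit rule with adjustable parameters. This is only a sketch: Kim's actual definition is not known here, the threshold and the covalent/non-covalent switch are placeholders, and `SUGAR_NAMES` lists only the sugar residue names that appear on these slides.

```python
from dataclasses import dataclass
from typing import Iterable

# Assumption: incomplete set, limited to sugar names mentioned in this deck.
SUGAR_NAMES = {"NAG", "FUC", "XYS"}


@dataclass(frozen=True)
class PolysaccharideRule:
    """Hypothetical, tunable definition of 'polysaccharide'."""
    min_units: int = 2                 # assumed minimum number of monosaccharide units
    require_noncovalent: bool = False  # if True, reject sugars covalently bound to protein


def is_polysaccharide(
    residue_names: Iterable[str],
    covalent_to_protein: bool,
    rule: PolysaccharideRule = PolysaccharideRule(),
) -> bool:
    """Apply the rule to one group of linked residues."""
    units = sum(1 for name in residue_names if name in SUGAR_NAMES)
    if units < rule.min_units:
        return False
    if rule.require_noncovalent and covalent_to_protein:
        return False
    return True
```

Running the same entries through several rule settings (for example `min_units=2` versus `min_units=3`) would show which definition reproduces the existing polysaccharide bin.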
N-glycans
• All entries that contain N-glycans not identified by Kim’s search fall into group 1.
• e.g. 1NMC, 2FK0, 1414
(Figure labels: N-glycan, Group 1, NAG)
N-glycans
• Was Kim’s search for N-glycans hindered by some property of group 1 entries?
• Or did Kim’s search for N-glycans purposely omit entries that could be classified into group 1?
• Either way, can we identify the search logic that led to this correlation and use it to automate classification into group 1 from this “remainder” group?
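As a starting point for that automation, a first-pass filter could flag entries in which a sugar residue and an asparagine appear in the same link group, as in the 1EWK example “[ NAG A 801 ASN A 98 ]”. This is a rough sketch and not Kim's search logic: it tests only the covalent sugar-to-ASN pairing, cannot tell whether the sugar sits on the outer surface or away from the active site, and all names in it are hypothetical.

```python
from typing import Dict, List, Tuple

# Assumption: incomplete set, limited to sugar names mentioned in this deck.
SUGAR_NAMES = {"NAG", "FUC", "XYS"}
# N-glycans attach to asparagine; other attachment residues are not considered.
ATTACHMENT_RESIDUES = {"ASN"}


def group1_candidate(residue_names: List[str]) -> bool:
    """True if one link group pairs a sugar with an attachment residue."""
    names = set(residue_names)
    return bool(names & SUGAR_NAMES) and bool(names & ATTACHMENT_RESIDUES)


def split_remainder(
    entries: Dict[str, List[List[str]]]
) -> Tuple[List[str], List[str]]:
    """Split entry IDs into (group 1 candidates, everything else).

    `entries` maps an entry ID to its link groups, each a list of residue names.
    """
    candidates, others = [], []
    for entry_id, groups in entries.items():
        if any(group1_candidate(group) for group in groups):
            candidates.append(entry_id)
        else:
            others.append(entry_id)
    return candidates, others


remainder = {
    "1EWK": [["NAG", "ASN"]],
    "1DOT": [["NAG", "FUC", "FUC", "NAG", "FUC", "NAG"]],
}
print(split_remainder(remainder))
```

Candidates flagged this way would still need manual or structural checks before being binned as group 1.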