QSAR Application Toolbox: Third Step - Data Gap Filling (Read-Across by Molecular Similarity)

Background

  • This is a step-by-step presentation designed to take you through the workflow of the Toolbox in a data-gap filling exercise using read-across based on molecular similarity with data pruning.

  • If you are a novice user of the Toolbox, you may wish to review the “Getting Started” document available at [www.oecd.org/env/existingchemicals/qsar].

Objectives (1)

  • This presentation reviews a number of functionalities of the Toolbox:

    • Entering and Profiling a target chemical,

    • Identifying analogues for a target chemical,

    • Retrieving experimental results available for those analogues, and

    • Filling data gaps by read-across.

Objectives (2)

  • This presentation also introduces several other functionalities of the Toolbox:

    • Using the Flexible Track,

    • Entering a target chemical by SMILES notation,

    • Identifying analogues for a target chemical by molecular similarity, and

    • Retrieving experimental results for multiple endpoints.

Specific Aims

  • To review the workflow of the Toolbox.

  • To review the use of the six modules of the Toolbox.

  • To review the basic functionalities within each module.

  • To introduce the user to new functionalities within selected modules.

  • To explain the rationale behind each step of the exercise.

The Exercise

  • In this exercise we will predict the Ames mutagenicity potential for an untested compound, n-hexanal (SMILES: CCCCCC=O), which is the “target” chemical.

  • This prediction will be accomplished by collecting a small set of test data for chemicals considered to be in the same category as the target molecule.

  • The category will be defined by molecular similarity, in particular “Organic functional groups”.

  • The prediction itself will be made by “read-across”.

Read-across & the Analogue Approach

  • Read-across can be used to estimate missing data from a single or limited number of chemicals using an analogue approach.

  • In the analogue approach, endpoint information for a single or small number of tested chemicals is used to predict the same endpoint for an untested chemical that is considered to be “similar.”
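
The analogue approach described above can be sketched in a few lines of Python. This is a minimal illustration, not the Toolbox's own algorithm: the function name `read_across`, the example outcomes, and the use of a simple majority vote to combine analogue results are all assumptions made for the sketch.

```python
from collections import Counter


def read_across(analogue_results):
    """Predict a categorical endpoint for the target chemical.

    Assumption: the prediction is the most common result among the tested
    analogues (a simple majority vote).
    """
    if not analogue_results:
        raise ValueError("read-across needs at least one tested analogue")
    outcome, _ = Counter(analogue_results).most_common(1)[0]
    return outcome


# Hypothetical Ames outcomes for three tested analogues (placeholders only).
analogue_results = ["negative", "negative", "positive"]
prediction = read_across(analogue_results)
print(prediction)
```

With a single tested analogue the function simply returns that analogue's result, which corresponds to the one-to-one case of the analogue approach.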

Analogous Chemicals

  • Previously you learned that analogous sets of chemicals are often selected based on the hypothesis that the toxicological effects of each member of the set will show a common behavior.

  • For this reason mechanistic profilers and grouping methods have been shown to be of great value in using the Toolbox.

  • However, there are cases where the mechanistic profilers and grouping methods are inadequate and one is forced to rely on molecular similarity to form a category.

  • The Toolbox allows one to develop a category by using either organic functional groups or structural similarity.

  • Since there is no preferred way of identifying structural similarity the user is guided to use organic functional groups as a first option.
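
Because there is no single preferred measure of structural similarity, any concrete example has to pick one. The sketch below assumes a Tanimoto (Jaccard) coefficient over sets of structural features, which is one common choice; the presentation does not say which measure the Toolbox applies, and the feature labels are hypothetical.

```python
def tanimoto(features_a, features_b):
    """Tanimoto (Jaccard) similarity between two sets of structural features."""
    a, b = set(features_a), set(features_b)
    union = a | b
    if not union:
        return 0.0  # two empty feature sets: treat as not comparable
    return len(a & b) / len(union)


# Hypothetical feature sets (illustrative labels, not Toolbox output).
target_features = {"aldehyde", "methyl", "methylene"}
candidate_features = {"aldehyde", "methyl", "methylene", "hydroxyl"}

score = tanimoto(target_features, candidate_features)
print(score)
```

A score of 1.0 means identical feature sets and 0.0 means no shared features; where to set a cut-off for "similar enough" is a judgment the user has to make.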

Side-Bar On Mutagenesis

  • Mutagens do not create mutations.

  • Mutagens create DNA damage.

  • Mutations are changes in nucleotide sequence.

  • Mutagenesis is a cellular process requiring enzymes and/or DNA replication, thus cells create mutations.

Tracks

  • After opening the Toolbox, the user has to choose between three use tracks (or workflows):

    • (Q)SAR Track

    • Category Track

    • Flexible Track

  • Since you are becoming more familiar with the functionalities of the Toolbox, select the Flexible Track.

Tracks and Workflow

Workflow

  • Remember each track follows the same workflow:

    • Chemical Input

    • Profiling

    • Endpoints

    • Category Definition

    • Filling Data Gaps

    • Reporting
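
The fixed order of the six modules can be written down as an ordered enumeration. This is only a sketch for keeping the sequence straight; the names `Module` and `next_module` are hypothetical and do not correspond to any Toolbox API.

```python
from enum import IntEnum


class Module(IntEnum):
    """The six workflow modules, numbered in the order they are visited."""
    CHEMICAL_INPUT = 1
    PROFILING = 2
    ENDPOINTS = 3
    CATEGORY_DEFINITION = 4
    FILLING_DATA_GAPS = 5
    REPORTING = 6


def next_module(current):
    """Return the module that follows `current`, or None after the last one."""
    if current is Module.REPORTING:
        return None
    return Module(current + 1)


order = [module.name for module in Module]
print(order)
```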

Chemical Input

  • Click on the “Flexible Track”.

  • This takes you to the first module, which is “Chemical input”.

  • This module provides the user with several means of entering the chemical of interest or the target chemical.

  • Since all subsequent functions are based on chemical structure, the goal here is to make sure the molecular structure assigned is the correct one.

Chemical Input Screen

Ways of Entering a Chemical

  • Remember there are several ways to enter a target chemical and the most often used are:

    • CAS#,

    • SMILES (Simplified Molecular Input Line Entry System) notation, and

    • Drawing the structure.

  • Click on SMILES.

  • This inserts the window entitled “Structure editor” (see next slide).

Blank Structure Editor Screen

Entering a SMILES

  • In the aqua-colored area, type in the SMILES; in this example, enter CCCCCC=O.

  • Note that as you type the SMILES code, the structure is drawn in the center of the field (see next slide).

  • Click “OK” to accept the target chemical.
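
Since everything downstream depends on the structure entered here, it can help to sanity-check the SMILES string before typing it in. The sketch below is a toy check, not a SMILES parser: it only counts heavy atoms written as bare symbols from a small organic subset, and it ignores aromatic (lowercase) atoms, bracket atoms, and hydrogens. The name `count_heavy_atoms` is hypothetical.

```python
import re
from collections import Counter

# Two-letter symbols come first so "Cl" is not read as "C" followed by "l".
ATOM_PATTERN = re.compile(r"Cl|Br|[BCNOPSFI]")


def count_heavy_atoms(smiles):
    """Count heavy atoms in a simple SMILES string (toy subset only)."""
    return Counter(ATOM_PATTERN.findall(smiles))


counts = count_heavy_atoms("CCCCCC=O")
print(dict(counts))
```

For the target chemical this gives six carbons and one oxygen, as expected for a six-carbon aldehyde. A real cheminformatics toolkit would be the appropriate tool for anything beyond this kind of quick check.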

SMILES Structure

Target Chemical

  • You have now selected your target chemical.

  • Click on the box next to “Substance Information”; this displays the chemical identification information.

  • It is important to remember that, from here on, the workflow will be based on the structure coded in SMILES.

  • The workflow on the first module is now complete; click “Profiling” to move to the next module.

Chemical Identification Information

Profiling

  • “Profiling” refers to the electronic process of retrieving relevant information on the target compound that is stored in the Toolbox, other than environmental fate, ecotoxicity, and toxicity data.

  • Available information includes likely mechanism(s) of action and a survey of the organic functional groups that form the target chemical.

Profiling Target Chemical

  • Select the “Profiling methods” you wish to use by clicking the box before the name of each profiler; a red check mark appears.

  • For this example, select all the profilers for the “mechanistic” methods (see next slide).

  • Click on “Apply”.

Profilers for 1-Hexanal

Profiling

  • The results of profiling automatically appear as a dropdown box under the target chemical (see next slide).

Profiles of 1-Hexanal (1)

Profiles of 1-Hexanal (2)

  • Very specific profiling results are obtained for the target compound.

  • Please note that no DNA-binding mechanism was identified (see the side-bar on mutagenesis above).

  • These results will be used to search for suitable analogues in the next steps of the exercise.

Side-Bar on the Data Tree

  • As one moves through the different modules of the workflow the information on the target chemical increases.

  • One may find it advantageous to conceal some of that information.

  • For example we can hide the substance information by double clicking on the


Side-Bar on Retrieving Concealed Information

  • One can retrieve hidden information by double clicking on the small box next to “Substance Information”.

  • This is demonstrated in the next two slides.


Double click on small box next to substance information

Substance information reappears on screen

Endpoints

  • Click on “Endpoints” to move to the next module.

  • “Endpoints” refers to the electronic process of retrieving the environmental fate, ecotoxicity and toxicity data that are stored in the Toolbox.

  • Data gathering can be executed in a global fashion (i.e., collecting all data of all endpoints) or on a more narrowly defined basis (e.g., collecting data for a single or limited number of endpoints).
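
The difference between global and narrowly defined data gathering can be pictured as optional filters over a table of stored results. The sketch below assumes a flat list of records with hypothetical field names (`chemical_id`, `database`, `endpoint`, `result`); the records themselves are invented placeholders, not Toolbox data.

```python
def gather_data(records, chemical_id, databases=None, endpoint=None):
    """Return stored results for one chemical.

    With no filters this is a global search; passing `databases` and/or
    `endpoint` narrows it. All field names are assumptions for this sketch.
    """
    hits = []
    for record in records:
        if record["chemical_id"] != chemical_id:
            continue
        if databases is not None and record["database"] not in databases:
            continue
        if endpoint is not None and record["endpoint"] != endpoint:
            continue
        hits.append(record)
    return hits


# Placeholder records for illustration only.
records = [
    {"chemical_id": "analogue-1", "database": "OASIS Genotox",
     "endpoint": "Ames mutagenicity", "result": "negative"},
    {"chemical_id": "analogue-1", "database": "other-db",
     "endpoint": "acute toxicity", "result": "placeholder"},
]

all_hits = gather_data(records, "analogue-1")
narrow_hits = gather_data(records, "analogue-1", endpoint="Ames mutagenicity")
target_hits = gather_data(records, "target", databases={"OASIS Genotox"})
print(len(all_hits), len(narrow_hits), len(target_hits))
```

An empty result for the target chemical is what the Toolbox reports as “no data found”, which is the data gap the rest of the exercise sets out to fill.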

Side-Bar on Gene Mutation

  • Mutations within a gene are generally base-substitutions or small deletions/insertions (i.e., frameshifts).

  • Such alterations are generally called point mutations.

  • The Ames scheme, based on strains of Salmonella, provides the corresponding experimental data.

This Example

  • In this example, we focus our data gathering on the multi-endpoint of mutagenicity and on the databases OASIS Genotox and ISSCAN.

  • Click on the boxes next to all the databases except those entitled “ISSCAN Genotox” and “OASIS Genotox”.

  • This leaves a black check mark in the box next to these two databases (the ones we want to search).

  • Click on “Gather data”.

OASIS Genotox Data Gathering

Next Step in Data Gathering

  • Toxicity information on the target chemical is electronically collected from the selected dataset(s).

  • In this example, an insert window appears stating there was “no data found” for the target chemical (see next slide).

  • Close the insert window.

No Data for Target Chemical

Recap

  • You have entered the target chemical by SMILES and found it to be 1-hexanal with the CAS# [66-25-1].

  • You have profiled the target chemical, gathered data, and found that no experimental data is currently available for 1-hexanal.

  • In other words, you have identified a data gap, which you would like to fill.

  • Click on “Category definition” to move to the next module.

Category Definition

  • This module provides the user with several means of grouping chemicals into a toxicologically meaningful category that includes the target molecule.

  • This is the critical step in the workflow.

  • Several options are available in the Toolbox to assist the user in refining the category definition.

Grouping Methods

  • Grouping methods allow the user to group chemicals into chemical categories according to different measures of “similarity” so that data gaps can be filled within a category.

  • For example, starting from a target chemical for which a specific DNA binding mechanism is identified, analogues can be found which can bind by the same mechanism and for which experimental results are available.

Side-Bar on Mutagens

  • It is important to remember that mutagens are really cell-damaging agents, which can create a wide array of adverse effects beyond damage to DNA.

  • Let’s take a moment to review our mechanistic profile of the target chemical (see next slide).

No DNA Binding

Defining the Category

  • In the case of 1-hexanal there is no structural evidence that it is a DNA binding compound.

  • Therefore, no grouping by a DNA mechanism is possible.

  • We elect to define the category by using molecular similarity.

  • Highlight “Organic functional groups”.

  • Click on “Defining Category”.
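
Defining a category by organic functional groups can be thought of as collecting every chemical whose set of functional groups matches the target's. The sketch below assumes an exact match on the full set; whether the Toolbox matches exactly or more loosely is not stated here. The function name, the inventory, and the group labels are illustrative.

```python
def define_category(target_groups, inventory):
    """Return the chemicals whose functional groups match the target's.

    Assumption: membership requires exactly the same set of groups.
    `inventory` maps a chemical name to its set of functional groups.
    """
    target = frozenset(target_groups)
    return [name for name, groups in inventory.items()
            if frozenset(groups) == target]


# Hypothetical inventory with illustrative group labels.
inventory = {
    "analogue A": {"Aldehydes", "Methyl"},
    "analogue B": {"Aldehydes", "Methyl", "Alcohol"},
    "analogue C": {"Ketones", "Methyl"},
}
target_groups = {"Aldehydes", "Methyl"}

category = define_category(target_groups, inventory)
print(category)
```

An exact match keeps the category narrow. A looser rule, such as requiring only that the target's groups be present, would admit chemicals carrying extra functional groups, and these are the kind of members that may later need pruning.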

Defining the Category

Confirmation of Groups

  • An insert window listing the organic functional groups of the target chemical appears.

  • Click on “OK”.

Naming Category

  • Another insert window listing the default category name appears.

  • Click “OK”.

Analogues Identified

Recap

  • You have identified a structurally similar category for the target chemical (1-hexanal).

  • There were 34 similar chemicals identified.

  • Available data on these similar chemicals can now be collected.

Next Step in Gathering Data

  • Highlight the “[35]Aldehydes <AND>Methyl …” under “Single Chemical” in the “Defined Categories” box.

  • The insert window entitled “Read Data?” appears (see next slide).

What Data to Collect?

Side-Bar to Data Collection

  • Data can be collected for a wide variety of endpoints or for a narrowly defined subset (e.g., a single endpoint or test scheme).

  • Since data is endpoint-specific, the data selection is presented in a drop-down menu.

  • By double clicking on an endpoint, the data tree is expanded.

Data Selection

  • To select the data to be read, click on the box(es) before the name of the data type.

  • This selects (a red check mark appears) or deselects (red check disappears) the data type.

  • Click on the box next to “Toxicological Information”.

  • This places a red check mark in the box next to this data type (the one we want to read).

  • Click on “OK” (see next slide).

Reading the Selected Data

Analogues

  • The data is automatically collated.

  • There is genotox data on only 15 of the 34 structurally similar analogues.

  • However, multiple entries of the same test result were found and one wants to eliminate duplications (see next slide).
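
Eliminating duplicated entries amounts to keeping one record per distinct test result. The sketch below shows one way to do that and is not a description of what “Select Single” does internally: the record fields and the choice of (chemical, assay, result) as the identity of a result are assumptions, and the records are placeholders.

```python
def drop_duplicate_results(records):
    """Keep the first occurrence of each (chemical, assay, result) combination."""
    seen = set()
    unique = []
    for record in records:
        key = (record["chemical"], record["assay"], record["result"])
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


# Placeholder records: the same result reported by two different sources.
records = [
    {"chemical": "analogue-1", "assay": "Ames", "result": "negative", "source": "db-1"},
    {"chemical": "analogue-1", "assay": "Ames", "result": "negative", "source": "db-2"},
    {"chemical": "analogue-2", "assay": "Ames", "result": "negative", "source": "db-1"},
]

unique_records = drop_duplicate_results(records)
print(len(unique_records))
```

Removing duplicates matters for read-across because a result counted twice would otherwise carry twice the weight in the prediction.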

Click “Select Single” then Click “OK”

Summary of Toxicological Information for Analogues

Side-Bar on Data

  • Note that the structures of the compounds with experimental results are shown.

  • Double clicking on any structure enlarges the view of the structure.

  • Details on the experimental results can be retrieved by double-clicking on any cell in the data matrix line.

Navigating Through the Data Tree

  • The user can navigate through the data tree by closing or opening the nodes of the tree.

  • In this example, results from genotox testing are available.

  • By double clicking on a cell in the data matrix, additional information on the test result (Ames) is made available (see next slide).

Data Tree

Side-Bar on Data Tree

  • Details about the specific assays, in this case the different strains of Salmonella typhimurium, can be observed at the bottom of the screen by placing the cursor on the text fragment of the test you want more information about (see next slide).

Filling Data Gap

  • You are ready to fill the data gap. Click on “Filling data gap”.

  • At this step in the workflow, the user is provided with three options for making a prediction for the target molecule.

  • In this example, with qualitative mutagenicity data, we can only use read-across. Click on “Read-across”.

  • Highlight the blank space in the “AMES_mutagenicity” line under the column for the target chemical.

  • Click on “All values” under data.

  • Click on “Apply” (see next slide).

Filling Data Gap

Possible Data Inconsistencies

  • An insert window alerting you to possible data inconsistencies appears.

  • Click on the small box before “Endpoint”.

Multiple Endpoint Data

  • In this example, a red-checked list appears showing all the Ames data that you are trying to model at the same time.

  • Click “OK”.

Results of Read-across

Interpreting the Read-across Figure

  • The resulting plot shows the experimental results of all analogues (Y axis) against a descriptor (X axis); the default descriptor is log Kow.

  • Note the dots along the bottom of the previous screen. The RED dot represents the target chemical; the PURPLE dots represent the experimental results available for the analogues that are used for the read-across; the BLUE dots represent the experimental results available for the analogues but not used for read-across.
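
The split between analogues that are used for read-across and those that are not suggests a selection step along the descriptor axis. The sketch below assumes that the analogues closest to the target in descriptor value are the ones used. This is an assumption for illustration, the value of `k` is arbitrary, and the descriptor values are made-up placeholders rather than measured log Kow data.

```python
def nearest_analogues(target_value, analogue_values, k=5):
    """Return the names of the k analogues closest to the target in descriptor value."""
    ranked = sorted(analogue_values.items(),
                    key=lambda item: abs(item[1] - target_value))
    return [name for name, _ in ranked[:k]]


# Placeholder descriptor values (not real log Kow measurements).
analogue_values = {"A": 1.0, "B": 1.9, "C": 3.5, "D": 2.0}

used = nearest_analogues(1.8, analogue_values, k=2)
not_used = [name for name in analogue_values if name not in used]
print(used, not_used)
```

In the terms of the plot, `used` would correspond to the purple dots and `not_used` to the blue ones.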

Interpreting the Read-across Figure

  • On further examination of the read-across results, note the aqua-colored dot in an upper corner.

  • This represents the lone positive result in the Ames tests.

  • Placing the cursor on this dot and double left-clicking brings up the structural details of this chemical (see next slide).

Details of Outlier

Can One Prune the Outlier?

  • Clearly this outlier is structurally dissimilar to all the other compounds in the category, including the target chemical.

  • It contains a number of functional groups which are not present in the target molecule.

  • This dissimilarity is justification for deleting (pruning) it from the category.

Pruning the Outlier

  • Close the structure detail window.

  • Place the cursor on the outlying dot and right-click.

  • Then click on “remove focused”.

  • This removes the outlier.

  • Note the read-across results are automatically re-tabulated (see next slide).

Re-evaluated Read-across

Interpretation of Read-across

  • In the pruned data, all 10 analogues are non-mutagenic in all the Ames assays.

  • The same non-mutagenic potential (value of -1.0) is, therefore, predicted with confidence for the target chemical.
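
The effect of pruning on the prediction can be checked with a small sketch. It uses the -1.0 coding for a non-mutagenic result quoted above and assumes +1.0 for a positive result, which is an assumption. The analogue names are placeholders, and the rule "predict only when all remaining analogues agree" is a simplification chosen for this example.

```python
def prune(results, outlier):
    """Return a copy of the results without the named outlier."""
    return {name: value for name, value in results.items() if name != outlier}


def unanimous_prediction(results):
    """Return the shared result if every analogue agrees, otherwise None."""
    values = set(results.values())
    return values.pop() if len(values) == 1 else None


# Ten non-mutagenic analogues (-1.0) plus one positive outlier (+1.0, assumed coding).
results = {f"analogue-{i}": -1.0 for i in range(1, 11)}
results["outlier"] = 1.0

before = unanimous_prediction(results)
after = unanimous_prediction(prune(results, "outlier"))
print(before, after)
```

Before pruning the analogues disagree, so no unanimous prediction is available; after the structurally dissimilar outlier is removed, the remaining analogues all give -1.0.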

Filled Data Gap

  • Click “Accept”.

  • By accepting the prediction the data gap is filled (see next slide).

  • You are now ready to complete the final module and download the report.

  • Click on “Report” to move to the last module.

Filled Data Gap

Report

  • The final step in the workflow, “Report”, provides the user with a downloadable written audit trail of what the Toolbox did to arrive at the prediction.

  • Click on “Show History”.

  • This study history can be printed or copied to be inserted in a more detailed report (see next slide).
