PASS

Chemo-Informatics

Compute biological activity for a compound library

PASS with PharmaExpert

Compute biological activity for chemical compounds using the SAR (structure-activity relationship) approach

  • Biggest SAR base: trained on over 1,600,000 compounds with known biological activities (over 5,000,000 structure-activity pairs)
  • Wide spectrum of biological activities: over 8,500 terms (pharmacotherapeutic effects, biochemical mechanisms, toxicity, metabolic effects)
  • High accuracy: the average accuracy of prediction for the whole PASS training set is 96%
  • Mechanisms of drug action: a deep hierarchy based on relationships between biological activities, drug-drug interactions and multiple targeting of chemical compounds
  • Local installation: runs on your Windows computer
  • Fast: analyses a library of hundreds of thousands of compounds in a minute on your laptop

GUSAR

Create quantitative models of structure-activity (structure-property) relationships

  • QSAR modeling: uses quantitative structure-activity relationships
  • QNA: Quantitative Neighborhoods of Atoms descriptors
  • Your own library: models trained on your own library of compounds with quantitative activity data
  • Highly specific: constructs highly specific models
  • Toxicity models: precomputed models for drug toxicity in mice and rats, as well as for antitargets

PASS with PharmaExpert

Introduction

The acronym PASS stands for Prediction of Activity Spectra for Substances. Using the structural formula of a drug-like substance as input, PASS outputs its estimated biological activity profile. The predicted biological activity list includes the names of the probable activities with two probabilities: Pa – the likelihood of belonging to the class of “Actives” and Pi – the likelihood of belonging to the class of “Inactives”. By default, all activities with Pa > Pi are considered probable; however, depending on the particular task, the user may choose any other cutoff for selecting the probable “Actives”.
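The default selection rule (Pa > Pi) and a stricter user-chosen cutoff can be sketched in a few lines of Python. This is a minimal illustration, not PASS code: the `Prediction` record, the `select_probable` helper and the activity names are hypothetical, and combining the custom Pa cutoff with the Pa > Pi rule is an assumption made here.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Prediction:
    """One entry of a predicted activity spectrum (hypothetical record type)."""
    activity: str
    pa: float  # estimated probability of belonging to "Actives"
    pi: float  # estimated probability of belonging to "Inactives"


def select_probable(spectrum: Iterable[Prediction],
                    pa_cutoff: Optional[float] = None) -> List[Prediction]:
    """Return the activities considered probable, most probable first.

    Without a cutoff, the default rule Pa > Pi is applied; with a cutoff,
    an activity must also have Pa above it (assumed combination).
    """
    kept = [p for p in spectrum if p.pa > p.pi]
    if pa_cutoff is not None:
        kept = [p for p in kept if p.pa > pa_cutoff]
    return sorted(kept, key=lambda p: p.pa, reverse=True)
```

For example, raising the cutoff to 0.7 keeps only the activities for which the prediction is both "active-leaning" (Pa > Pi) and high-confidence, trading recall for precision.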

PASS has been well accepted by the research community and is actively used in medicinal chemistry by both academic organizations and pharmaceutical companies. Over 1,200 publications describe the PASS approach and its applications. An overview of some of these papers is provided here.

Activity prediction for a chemical substance by PASS

The slide show demonstrating the look-and-feel of PASS can be found here as well as on our Facebook site. A particularly useful tool to analyze and utilize PASS results further is PharmaExpert.

Current PASS statistics

The PASS 2022 SAR Base is based on information about the structure-activity relationships of 1,614,066 substances; the total number of experimentally determined pairwise structure-activity records is 5,174,855. In total, 8,565 biological activities can be predicted with an average accuracy of about 0.93; by default, 1,957 activities are predicted with an average accuracy of 0.97.

PASS features and applications

Main PASS features

  • Several thousand biological activities are predicted simultaneously, including: mechanisms of action, pharmacological effects, toxic and side effects, interaction with antitargets, interaction with drug-metabolizing enzymes (inhibition, induction, interaction as a substrate), interaction with transporter proteins (inhibition, stimulation, interaction as a substrate), and changes in the expression of individual genes (increase, decrease). This broad analysis reveals the general pharmacological potential of the molecules under study.
  • The user may select a particular pharmacotherapeutic field or a certain set of activities relevant to the purpose of the study. Such a selection simplifies analysis of the output and reduces computation time, which is particularly important for large chemical libraries.
  • The prediction algorithm is statistically robust with respect to incompleteness of the training data. Since no biologically active compound has been tested against all known biological activities, the best possible estimates must be obtained even from imperfect information.
  • The prediction is based on the structural formula of the compound, so it can also be applied to virtual structures.
  • Evaluating the action of launched pharmaceuticals on many pharmacological targets reveals their hidden pharmacological potential and suggests ideas for possible drug repurposing.
  • Prediction of potential adverse/toxic effects of drug-like compounds allows such compounds to be filtered out at early stages of research.
  • Re-training the program on the customer’s training set creates unique proprietary SAR Bases, which can be used in house for further predictions.
  • The high computation speed allows efficient multitarget virtual screening of large chemical libraries containing many millions of structural formulae of drug-like compounds.

PASS applications

  • Medicinal chemistry
  • Computational chemistry
  • Drug discovery / drug development
  • Drug repositioning
  • Chemical toxicity
  • Safety assessment
  • Pharmacogenomics, chemogenomics
  • SAR (qualitative structure-activity relationship)
  • Natural compound effects
  • Translational research / translational medicine

PASS packages

  • PASS Standard
    The standard software package includes the standard SAR Base (Structure-Activity Relationship knowledgebase). Currently, the standard version of PASS can predict 8,565 biological activities (1,957 activities are predicted by default). Depending on the particular purpose, the user may add any of the 8,565 activity types to the predictable activity list using the “Selection” procedure.
  • PASS Refined
    The PASS Refined package provides all functions of PASS Standard and predicts the 1,957 most topical biological activities.
  • PASS Professional
    The PASS Professional package provides all functions of PASS Standard plus additional options to create, train and validate your proprietary SAR Base. With this package, you can build your own unique SAR Base and use it in house, on an exclusive basis, for further predictions. You may also add your own structure-activity data to the standard SAR Base and re-train the program to obtain an updated SAR Base. Thus, you would have a unique local variant of PASS.
  • PASS Light
    With PASS Light, you can create, train and validate your proprietary SAR base, and use it for further predictions. The standard SAR Base is not included.
  • PASS customized
    If you focus on particular types of biological activities, a customized version of PASS can be prepared that predicts a restricted set of activities of your choice. The command-line utilities PASS 2022 to CSV, PASS 2022 to SDF, and PASS 2022 to TXT are also available for use in pipelines.
  • PASS + PharmaExpert package
    Any of the products PASS Standard, PASS Refined, PASS Professional, PASS Light or PASS customized can be ordered in a package with PharmaExpert.

PharmaExpert tool

PharmaExpert is an analysis tool for studying the relationships between biological activities, drug-drug interactions and multiple targeting of chemical compounds, and for selecting compounds that have a predefined biological activity. It helps answer questions such as “How can I select the most promising compounds among those known to interact with the selected protein?”

Key features of PharmaExpert

PharmaExpert is an expert system taking into account the known relationships between pharmacotherapeutic effects and mechanisms of action of biologically active substances.

Fields of application: Analysis of the cause-effect relationships between the biological activities, estimation of possible positive and negative pharmacokinetic and pharmacodynamic drug-drug interactions, selection of compounds with the needed activity spectra predicted by PASS, identification of compounds with multiple mechanisms of particular pharmacological action.

PharmaExpert is designed to visualize and to analyze the prediction results of PASS and GUSAR software. It provides the following functions:

  • reading SD files containing the structures of organic compounds and the PASS prediction results for their biological activity spectra, as well as the prediction results of GUSAR (Q)SAR models;
  • visualization of relationships between the predicted biological activities based on the known data about the causal relationships between them, and “target-pathway-effect” relationships;
  • selecting compounds with desired biological activities in one or more SD files;
  • analysis of possible positive and negative drug-drug interactions for individual pairs of compounds or for all compounds contained in the SD file;
  • saving the identified relationships between activities, and the compounds selected for desired biological activities, in SD or TXT files.
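Selecting compounds by a predicted activity from SD files can be illustrated with a small reader for SD data items (a `> <FIELD>` header line, value lines, a blank line, and `$$$$` closing each record). This is only a sketch assuming `\n` line endings: the field name `PASS_ACTIVITIES` and its one-activity-per-line layout are hypothetical placeholders, since the exact fields written by PASS are not described here, and PharmaExpert performs this selection through its own interface.

```python
from typing import Dict, List


def read_sd_records(text: str) -> List[Dict[str, str]]:
    """Split SD-file text into records and collect each record's data items.

    Every record becomes a dict holding its title line under "_title" and one
    entry per data item; multi-line values are joined with newlines.
    """
    records = []
    for block in text.split("$$$$"):
        if block.startswith("\n"):
            block = block[1:]  # drop the line break that follows "$$$$"
        lines = block.splitlines()
        if not any(line.strip() for line in lines):
            continue  # nothing after the final "$$$$"
        record = {"_title": lines[0].strip()}
        field, values = None, []
        for line in lines[1:]:
            if line.startswith(">") and "<" in line:  # data header, e.g. "> <NAME>"
                start = line.index("<") + 1
                field, values = line[start:line.index(">", start)], []
            elif field is not None:
                if line.strip():
                    values.append(line.rstrip())
                else:  # a blank line ends the data item
                    record[field] = "\n".join(values)
                    field = None
        if field is not None:  # last item not followed by a blank line
            record[field] = "\n".join(values)
        records.append(record)
    return records


def select_by_activity(records: List[Dict[str, str]], activity: str,
                       field: str = "PASS_ACTIVITIES") -> List[str]:
    """Titles of records whose (hypothetical) activity field lists `activity`."""
    return [r["_title"] for r in records
            if activity in r.get(field, "").splitlines()]
```

Records read from several SD files can simply be concatenated before calling `select_by_activity`.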


PharmaExpert 2022 contains a knowledgebase with over 15,000 known interactions between biological activities, as well as relationships between proteins, signalling/regulatory pathways (KEGG or Reactome), Gene Ontology biological processes, and therapeutic and adverse effects.

All biological activities are divided into seven types: (1) mechanisms of action; (2) pharmacological effects; (3) toxic and side effects; (4) interaction with antitargets; (5) interaction with drug-metabolizing enzymes (inhibition, induction, interaction as a substrate); (6) changes in the expression of individual genes (increase, decrease); (7) interaction with transporter proteins (inhibition, stimulation, interaction as a substrate).

An automatic search is provided for compounds acting on any of the mechanisms of action (or simultaneously on several mechanisms of action, up to ten) associated with a therapeutic effect, a signalling/regulatory pathway (KEGG or Reactome) or a Gene Ontology biological process.
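The idea of this search can be shown as a set operation: look up the mechanisms of action linked to a therapeutic effect, then keep the compounds whose predicted mechanisms cover at least a chosen number of them. The mapping, the compound names and the `min_hits` parameter below are hypothetical; PharmaExpert's knowledgebase and matching rules are not reproduced here.

```python
from typing import Dict, Set

MAX_MECHANISMS = 10  # the search combines up to ten mechanisms of action


def find_compounds(effect: str,
                   effect_to_mechanisms: Dict[str, Set[str]],
                   predicted: Dict[str, Set[str]],
                   min_hits: int = 1) -> Dict[str, Set[str]]:
    """Return compounds acting on at least `min_hits` mechanisms linked to `effect`.

    effect_to_mechanisms: knowledgebase-style mapping effect -> mechanisms.
    predicted: compound -> mechanisms predicted as probable (e.g. Pa > Pi).
    The result maps each selected compound to the matching mechanisms.
    """
    if not 1 <= min_hits <= MAX_MECHANISMS:
        raise ValueError("min_hits must be between 1 and 10")
    linked = effect_to_mechanisms.get(effect, set())
    hits = {name: mechanisms & linked for name, mechanisms in predicted.items()}
    return {name: found for name, found in hits.items() if len(found) >= min_hits}
```

Raising `min_hits` moves the query from "acts on any linked mechanism" towards "acts on several linked mechanisms simultaneously", i.e. towards multitarget candidates.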

Analysis of possible drug-drug interactions is performed simultaneously for all seven types of biological activity.
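As a simplified illustration of one kind of pharmacokinetic check (not PharmaExpert's actual rules), a pair of compounds can be flagged when one is predicted to inhibit or induce a drug-metabolizing enzyme for which the other is predicted to be a substrate. The activity naming pattern (`"<enzyme> inhibitor"`, `"<enzyme> inducer"`, `"<enzyme> substrate"`) is an assumption made for this sketch.

```python
from typing import List, Set, Tuple

ROLES = ("inhibitor", "inducer")  # perpetrator roles checked in this sketch


def pk_interactions(a: Set[str], b: Set[str]) -> List[Tuple[str, str, str]]:
    """Flag possible pharmacokinetic interactions between compounds A and B.

    Each flag is (perpetrator, enzyme, role): the perpetrator ("A" or "B") is
    predicted to inhibit or induce an enzyme for which the other compound is
    predicted to be a substrate. An inhibitor may raise, and an inducer may
    lower, exposure to that substrate.
    """
    flags = []
    for perpetrator, own, other in (("A", a, b), ("B", b, a)):
        for activity in sorted(own):
            for role in ROLES:
                suffix = " " + role
                if activity.endswith(suffix):
                    enzyme = activity[: -len(suffix)]
                    if f"{enzyme} substrate" in other:
                        flags.append((perpetrator, enzyme, role))
    return flags
```

The same pattern extends to the other activity types (for example, transporter inhibition versus transporter substrates), which is why checking all seven types at once is useful.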

  • The program contains a knowledgebase with more than 14,000 known interactions between biological activities, as well as relationships between proteins, signalling/regulatory pathways (KEGG, Reactome), Gene Ontology biological processes, and therapeutic and adverse effects.
  • All biological activities are divided into 7 types: (1) mechanisms of action; (2) pharmacological effects; (3) toxic and side effects; (4) interaction with antitargets; (5) interaction with drug-metabolizing enzymes (inhibition, induction, interaction as a substrate); (6) interaction with transporter proteins (inhibition, stimulation, interaction as a substrate); (7) changes in the expression of individual genes (increase, decrease).
  • Analysis of possible drug-drug interactions is performed simultaneously for all 7 types of biological activity. 
  • Automatic search for compounds acting on any of the mechanisms of action (or simultaneously on several mechanisms of action, up to 10) associated with the therapeutic effect or signalling/regulatory pathway (KEGG, Reactome) or biological process of Gene Ontology.
  • User-friendly interface, and fast loading and analysis of the prediction results of PASS, PASS Affinity and GUSAR

How does PASS (Prediction of Activity Spectra for Substances) work?

PASS uses a structural formula of a drug-like substance as an input and computes its estimated biological activity profile (or spectrum) as an output. The predicted biological activity list includes the names of the probable activities with two probabilities: Pa – the likelihood of belonging to the class of “Actives” and Pi – the likelihood of belonging to the class of “Inactives”.

  • BIOLOGICAL ACTIVITY
    Biological activity is the result of a chemical compound’s interaction with biological objects. It depends on the characteristics of the compound (molecular structure), the biological object (species, sex, age, etc.), the details of exposure (route of administration, dosage), and the particular experimental conditions. In PASS, biological activities are described qualitatively («active» or «inactive»).
  • BIOLOGICAL ACTIVITY SPECTRUM
    The Biological Activity Spectrum of a chemical compound is the set of different types of biological activity that reflect the results of the compound’s interaction with various biological entities. It is treated as an “intrinsic” property of a compound that depends only on its chemical structure. Although this is a generalization, it makes it possible to combine information from many different sources in the same training set, which is necessary because no single publication comprehensively covers all facets of the biological action of a compound.
  • CHEMICAL STRUCTURE REPRESENTATION
    The 2D structural formulae of organic compounds were chosen as the basis for describing chemical structures, because this is the only information available at the early stage of research (compounds may be designed but not yet synthesized). The structure descriptors we use, which we call «Multilevel Neighborhoods of Atoms» (MNA), were specifically designed to represent chemical structures for the SAR analysis implemented in PASS and similar approaches (Filimonov D. et al., 1999).

    Extended Connectivity Fingerprints (ECFP), which were developed later (Glen R.C. et al., 2006), are based on the same idea as MNA descriptors.

    MNA descriptors, unlike ECFP, preserve the connectivity between atoms of different layers in the form of a nested bracket structure. They are based on a molecular structure representation that includes all hydrogen atoms according to the valences and charges of the atoms and does not specify bond types. Therefore, they inherently include information about the hybridization type of atomic orbitals.

  • EQUIVALENT STRUCTURES
    Chemical compounds are considered equivalent in PASS if their molecular structures have the same set of MNA descriptors. Since MNA descriptors do not represent the stereochemical features of a molecule, structures that differ only in stereochemistry are formally considered equivalent.
  • SAR BASE
    The SAR Base contains vocabularies of MNA descriptors and activity names, the database of substance structures represented by MNA descriptors, their biological activity spectra, and data on structure-activity relationships (SAR). Unfortunately, it is currently impossible to collect sufficiently large numbers of active compounds for all activities from available sources. This is why some activity types are represented in the general PASS training set by more than 300,000 drug-like compounds, while others are represented by only a few. The SAR Bases supplied with PASS consist of substances with at least one known activity.
    The PASS 2022 SAR Base is based on structure-activity information from an in-house training set of 1,614,066 compounds with 10,112 known biological activities. This training set is continuously curated and expanded. The SAR Base can also be replaced by an exclusive knowledgebase created from in-house data. The SAR Base, together with the user-defined selection of biological activities of interest and relevant parameters, provides PASS with the starting point for the computational prediction.
    A more detailed description of the PASS method is available in several publications, for instance Filimonov et al., 2014 (PDF file may be obtained on request).
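The nested-bracket idea behind MNA descriptors, and the equivalence rule based on them, can be sketched as follows. This is a simplified reading of the description above, not the PASS implementation: the molecule representation is hypothetical, hydrogens must be given as explicit atoms, bond types are ignored, neighbour descriptors are sorted as plain strings, and any additional atom labels used in the published MNA definition are omitted. `level` is the number of bracket layers.

```python
from typing import List, Set, Tuple

# Hypothetical molecule representation: element symbols (hydrogens included
# as explicit atoms) and, for each atom, the indices of its bonded neighbours.
# Bond orders are deliberately not stored.
Molecule = Tuple[List[str], List[List[int]]]


def mna_descriptors(symbols: List[str], neighbors: List[List[int]],
                    level: int = 2) -> Set[str]:
    """Return the set of atom descriptors at the given bracket level.

    Level 0 is the atom symbol; level k is the atom symbol followed, in
    brackets, by the sorted level-(k-1) descriptors of its neighbours.
    """
    current = list(symbols)
    for _ in range(level):
        current = [
            symbols[i] + "(" + "".join(sorted(current[j] for j in neighbors[i])) + ")"
            for i in range(len(symbols))
        ]
    return set(current)


def equivalent(mol_a: Molecule, mol_b: Molecule, level: int = 2) -> bool:
    """Structures count as equivalent when their descriptor sets coincide."""
    return mna_descriptors(*mol_a, level) == mna_descriptors(*mol_b, level)


# Methanol, CH3-OH: atom 0 = C, 1 = O, 2-4 = H on C, 5 = H on O.
methanol: Molecule = (["C", "O", "H", "H", "H", "H"],
                      [[1, 2, 3, 4], [0, 5], [0], [0], [0], [1]])
```

At level 1, methanol gives the set {`C(HHHO)`, `O(CH)`, `H(C)`, `H(O)`}. Because only element symbols and connectivity enter the descriptors, the result does not depend on atom numbering, and two stereoisomers yield identical sets, which is why such structures are treated as equivalent.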

Key features of PASS 2022

PASS training set

The general PASS training set was corrected and extended; thus, PASS 2022 SAR Base includes 1,614,066 (1,368,353 in PASS 2020) drugs, drug-candidates, pharmaceutical agents and chemical probes, as well as compounds for which specific toxicity information is known.

Biological activities list

The entire activity list includes 10,112 terms describing biological activities (9,942 in PASS 2020). About two hundred novel biological activities were added, including: Antiviral (Coronavirus), Antiviral (SARS coronavirus), 3C-Like protease (SARS coronavirus) inhibitor, Papain-like protease (SARS coronavirus) inhibitor.

Pairwise structure-activity

In PASS 2022 the total number of pairwise structure-activity records is 5,174,855 (4,288,195 in PASS 2020), with an average of 512 compounds per activity and 3.2 activities per compound.
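The averages quoted above follow from the totals given in this section; the short check below uses only those numbers (records divided by the 10,112 activity terms, and by the 1,614,066 compounds).

```python
records = 5_174_855     # pairwise structure-activity records, PASS 2022
activities = 10_112     # terms in the entire activity list
compounds = 1_614_066   # substances in the PASS 2022 SAR Base

per_activity = records / activities   # about 511.8
per_compound = records / compounds    # about 3.21

print(f"{per_activity:.0f} compounds per activity")   # 512 compounds per activity
print(f"{per_compound:.1f} activities per compound")  # 3.2 activities per compound
```

Dividing by the 8,565 predictable activities instead would give about 604, so the figure of 512 refers to the full activity list.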

Predictable activity types

The number of predictable activity types is 8,565, and 1,957 activity types are in the recommended activity list. The average invariant accuracy of prediction (IAP) exceeds 0.93 for all 8,565 predictable activities and 0.97 for the recommended activities. Depending on the particular purpose, the user may add any of the 8,565 activity types to the predictable activity list using the “Selection” procedure.
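IAP is commonly described in the PASS literature as the probability that a randomly chosen active compound receives a higher score than a randomly chosen inactive one, which makes it equivalent to the area under the ROC curve. A minimal pairwise computation under that reading is sketched below; counting ties as one half is a common convention assumed here, and how PASS obtains the scores used for validation is not reproduced.

```python
from typing import Sequence


def invariant_accuracy(scores: Sequence[float], is_active: Sequence[bool]) -> float:
    """Pairwise estimate of IAP (equivalently, ROC AUC).

    Over all (active, inactive) pairs, counts how often the active compound
    has the higher score; tied pairs contribute 0.5.
    """
    actives = [s for s, a in zip(scores, is_active) if a]
    inactives = [s for s, a in zip(scores, is_active) if not a]
    if not actives or not inactives:
        raise ValueError("need at least one active and one inactive compound")
    wins = 0.0
    for sa in actives:
        for si in inactives:
            wins += 1.0 if sa > si else 0.5 if sa == si else 0.0
    return wins / (len(actives) * len(inactives))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation of actives from inactives, so an average IAP above 0.93 means that actives are ranked above inactives in the large majority of pairs.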

In PASS 2022, MNA descriptors (for predicting activity spectra or for adding substances to the SAR Base) are generated only if the structure meets the following criteria:

  • each atom in the molecule must be represented by an element symbol from the periodic table; the unspecified-atom symbols A, Q and *, and R-group labels, are not allowed;
  • each bond in the molecule must be a covalent single, double or triple bond.

All other limitations on structural formulae implemented in previous PASS versions (only one uncharged component, at least three carbon atoms in the structure, MW < 1,250) no longer apply.

If the structure does not meet these criteria, or the input data contain any other errors, a message about the first critical error is reported.

For a multi-component structure, only the largest component (with the largest number of heavy atoms) is taken into account.
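The input checks and the choice of component described above can be sketched as follows. The molecule representation (element symbols plus `(i, j, order)` bonds), the `StructureError` exception and its messages are hypothetical, and PASS's own parser is not reproduced. Validating the whole input before choosing a component, and taking the first component found when two have the same number of heavy atoms, are assumptions, since the text does not specify either.

```python
from typing import Dict, List, Tuple

# All 118 element symbols; query atoms (A, Q, *) and R-group labels are absent.
ELEMENTS = set(
    "H He Li Be B C N O F Ne Na Mg Al Si P S Cl Ar K Ca Sc Ti V Cr Mn Fe Co Ni "
    "Cu Zn Ga Ge As Se Br Kr Rb Sr Y Zr Nb Mo Tc Ru Rh Pd Ag Cd In Sn Sb Te I Xe "
    "Cs Ba La Ce Pr Nd Pm Sm Eu Gd Tb Dy Ho Er Tm Yb Lu Hf Ta W Re Os Ir Pt Au Hg "
    "Tl Pb Bi Po At Rn Fr Ra Ac Th Pa U Np Pu Am Cm Bk Cf Es Fm Md No Lr Rf Db Sg "
    "Bh Hs Mt Ds Rg Cn Nh Fl Mc Lv Ts Og".split()
)
ALLOWED_BOND_ORDERS = {1, 2, 3}  # single, double, triple

Bond = Tuple[int, int, int]  # (atom i, atom j, bond order)


class StructureError(ValueError):
    """Raised for the first critical error found in an input structure."""


def validate(symbols: List[str], bonds: List[Bond]) -> None:
    """Stop at the first atom or bond that breaks the input criteria."""
    for index, symbol in enumerate(symbols):
        if symbol not in ELEMENTS:
            raise StructureError(f"atom {index}: '{symbol}' is not an element symbol")
    for i, j, order in bonds:
        if order not in ALLOWED_BOND_ORDERS:
            raise StructureError(f"bond {i}-{j}: order {order} is not allowed")


def largest_component(symbols: List[str], bonds: List[Bond]) -> List[int]:
    """Return the atom indices of the component with the most heavy atoms."""
    adjacency: Dict[int, List[int]] = {i: [] for i in range(len(symbols))}
    for i, j, _ in bonds:
        adjacency[i].append(j)
        adjacency[j].append(i)
    seen, components = set(), []
    for start in range(len(symbols)):
        if start in seen:
            continue
        stack, component = [start], []
        seen.add(start)
        while stack:  # depth-first traversal of one connected component
            atom = stack.pop()
            component.append(atom)
            for nxt in adjacency[atom]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        components.append(sorted(component))

    def heavy_atoms(component: List[int]) -> int:
        return sum(symbols[a] != "H" for a in component)

    return max(components, key=heavy_atoms)  # first maximum wins on ties
```

For a salt such as a sodium alkoxide written as two components, only the organic part (more heavy atoms) would be kept for descriptor generation.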

Based on the prediction results, you can evaluate the contribution of each of the atoms of the structure to the estimated biological activity. Select the desired biological activity in the predicted activity spectra by clicking on it; then, each of the atoms of the structure will be colored according to the following scheme:

Light Green   Pa = 1,     Pi = 0      (atom promotes activity)
Light Red     Pa = 0,     Pi = 1      (atom promotes inactivity)
Light Blue    Pa = 0,     Pi = 0      (atom does not generate any signal)
Grey          Pa = 0.33,  Pi = 0.33   (equivocal atom, weak signal)

Acyclovir, selected activity – “Antineoplastic enhancer”.

GUSAR

Run your compounds on pre-computed quantitative models

Introduction

GUSAR is a tool for creating quantitative structure-activity relationship models. The acronym stands for “General Unrestricted Structure-Activity Relationships”. The input of the program is your training set of chemical structures with quantitative data on their biological activities. The output is a reliable quantitative SAR/SPR (structure-activity and structure-property relationship) model.

GUSAR user interface

Quantitative prediction of the effect of a chemical compound. 

GUSAR is a tool for analysis of quantitative and qualitative structure-activity/structure-property relationships ((Q)SAR/(Q)SPR) on the basis of the structural formulas of compounds and data on their activity/property, and for predicting the activity/property of new compounds. GUSAR can easily be applied to various routine (Q)SAR tasks: building many models and using them to predict different quantitative/qualitative values simultaneously.

The GUSAR software provides the following functions:

  • reading of SD files with data on chemical structures and their activities/properties;
  • creation and validation of (Q)SAR models;
  • prediction of activities/properties for new compounds with the created (Q)SAR models, and saving the prediction results in SD or CSV files.

GUSAR features and applications

  • Unique descriptors and mathematical algorithms
  • High speed of predictions
  • Easy-to-use interface
  • Selection of the most predictive models
  • Activity impact: estimation of which parts of a molecule make a positive or negative contribution to the activity
  • Output extraction in SDF and CSV: saving GUSAR predictions in SDF and CSV formats for subsequent analyses

Automatic, simplified creation of (Q)SAR models
Enhanced choice of variables:

  • QNA descriptors
  • MNA based descriptors
  • Combinations of QNA and MNA descriptors with additional variables: topological length of a molecule, topological volume of a molecule and calculated lipophilicity

Improved algorithm of model building and prediction:

  • Self-consistent regression (SCR)
  • Radial basis function (RBF) neural networks regression
  • Both types of regression
  • Nearest neighbors’ prediction correction
  • Construction of a consensus of multiple models
  • Different methods for model validation
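As a generic illustration of RBF regression and of a consensus of several models (not GUSAR's implementation, whose descriptors, centres, widths and training procedure are not described here), the sketch below fits Gaussian radial basis functions centred on the training points by solving a slightly regularized linear system, then averages the predictions of models with different widths. The widths, the ridge term `lam` and the toy data are assumptions.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def gaussian(x: Vector, c: Vector, width: float) -> float:
    d2 = sum((xi - ci) ** 2 for xi, ci in zip(x, c))
    return math.exp(-d2 / (2.0 * width ** 2))


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][k] * x[k] for k in range(i + 1, n))) / m[i][i]
    return x


class RBFModel:
    """Gaussian RBF regression with one centre per training compound."""

    def __init__(self, width: float, lam: float = 1e-6):
        self.width, self.lam = width, lam
        self.centres: List[Vector] = []
        self.weights: List[float] = []

    def fit(self, xs: List[Vector], ys: List[float]) -> "RBFModel":
        self.centres = xs
        # Kernel matrix plus a small ridge term on the diagonal.
        k = [[gaussian(xi, xj, self.width) + (self.lam if i == j else 0.0)
              for j, xj in enumerate(xs)] for i, xi in enumerate(xs)]
        self.weights = solve(k, list(ys))
        return self

    def predict(self, x: Vector) -> float:
        return sum(w * gaussian(x, c, self.width)
                   for w, c in zip(self.weights, self.centres))


def consensus(models: List[RBFModel], x: Vector) -> float:
    """Average the predictions of several models (a simple consensus)."""
    return sum(m.predict(x) for m in models) / len(models)
```

In a real QSAR setting each input vector would hold the descriptor values of one compound; averaging several differently parameterized models is one simple way to reduce the variance of any single model.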

GUSAR algorithm

The core of GUSAR is a unique self-consistent regression algorithm that selects the best set of descriptors for a robust and reliable QSAR model.

Chemical structures are represented by MNA (Multilevel Neighborhoods of Atoms) or QNA (Quantitative Neighborhoods of Atoms) descriptors, and by biological activity descriptors based on PASS prediction results for more than 8,000 biological activities. QNA descriptors readily reflect the nature of intermolecular interactions. Models built with biological activity descriptors make it possible to reveal key mechanisms of action behind complex biological effects. MNA and QNA descriptors are also used to calculate several variables, such as the topological length and volume or the lipophilicity of a molecule. For further details, see Filimonov et al. (2009), Zakharov et al. (2012), Zakharov et al. (2016).

In comparison with a number of 3D and 2D QSAR methods, the predictivity of GUSAR was superior to that of most other QSAR methods both on heterogeneous and on homogeneous data sets.
(Filimonov et al., 2009)

Comparison of different QSAR approaches; shown is the performance of GUSAR relative to other methods.

The program can create individual (Q)SAR models as well as sets of models presented in the form of a consensus.
It can be used to create (Q)SAR models for predicting properties of organic compounds belonging to both homogeneous and heterogeneous chemical classes.

It is based on unique atom-centric descriptors called Quantitative and Multilevel Neighborhoods of Atoms (QNA and MNA) descriptors.
It uses robust machine learning approaches, self-consistent regression and radial basis functions, for automatic creation of (Q)SAR models.
Along with the prediction results, the user gets an estimate of the model's applicability domain.
Visualization of atomic contributions to the predicted activity value makes it possible to identify the atoms and molecular fragments that contribute positively or negatively to the activity.

User-friendly interface, and fast creation of (Q)SAR models and prediction for test compounds.

Precomputed GUSAR models

The GUSAR software optionally includes ready-trained GUSAR models (SAR bases) for predicting certain biological activities. These are SAR bases that can be used with the GUSAR program for predictions on acute rat toxicity, acute mouse toxicity or antitargets (off-targets).

The acute rat and mouse toxicity SAR bases can be used for in silico prediction of LD50 values in rats and mice for four routes of administration.

Quantitative prediction of antitarget interactions for chemical compounds can be made with the other SAR base. Its QSAR models for a set of 32 activities (using IC50, Ki or Kact values) include data on about 4,000 chemical compounds interacting with 18 antitarget proteins (13 receptors, 2 enzymes and 3 transporters).

GUSAR publications

WHAT MAKES PASS and GUSAR DIFFERENT FROM OTHER TOOLS?

  • Biggest SAR base: PASS is trained on over 1,600,000 compounds with known biological activities. During training, the prediction rules are learned from over 5,000,000 structure-activity pairs.
  • Wide spectrum of biological activities: PASS makes predictions for over 8,500 different biological activities, including pharmacotherapeutic effects, biochemical mechanisms, toxicity, metabolic effects, etc.
  • High accuracy: the average invariant accuracy of prediction (IAP) exceeds 0.93 for all 8,565 predictable activities and 0.97 for the recommended activities.