Biological activity is the result of a chemical compound’s interaction with a biological object. It depends on the characteristics of the compound (molecular structure), the biological object (species, sex, age, etc.), the details of the exposure (route of administration, dosage), and the specifics of the experimental conditions. In PASS, biological activities are described qualitatively («active» or «inactive»).
BIOLOGICAL ACTIVITY SPECTRUM
The Biological Activity Spectrum of a chemical compound is the set of different types of biological activity that reflect the results of the compound’s interaction with various biological entities. It represents an “intrinsic” property of the compound that depends only on its chemical structure. Though this is a simplification, it makes it possible to combine information from many different sources in the same training set, which is necessary because no single publication comprehensively covers all facets of a compound’s biological action.
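As a minimal sketch of how records from several sources could be combined into one spectrum per compound, the Python snippet below merges (compound, activity) pairs by set union. The function name `merge_spectra`, the compound identifiers, and the activity names are hypothetical and serve only as illustration; this is not the actual PASS data pipeline.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def merge_spectra(records: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Collect (compound, activity) pairs into one activity set per compound."""
    spectra: Dict[str, Set[str]] = defaultdict(set)
    for compound, activity in records:
        # A set keeps each activity once, even if several sources report it.
        spectra[compound].add(activity)
    return dict(spectra)


# Two hypothetical literature sources, each covering only part of the picture.
source_a = [("compound-1", "anti-inflammatory"), ("compound-2", "antifungal")]
source_b = [("compound-1", "analgesic"), ("compound-1", "anti-inflammatory")]

spectra = merge_spectra(source_a + source_b)
```

Because the spectrum is treated as a property of the structure alone, duplicates reported by different sources collapse into a single entry, and each source only needs to contribute the activities it happens to cover.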
CHEMICAL STRUCTURE REPRESENTATION
The 2D structural formulae of organic compounds were chosen as the basis for describing chemical structures, because this is the only information available at the early stage of research (compounds may have been designed but not yet synthesized). The structure descriptors we use, which we call «Multilevel Neighborhoods of Atoms» (MNA), were specifically designed to represent chemical structures for the SAR analysis implemented in PASS and in similar approaches (Filimonov D. et al., 1999).
Extended Connectivity Fingerprints (ECFP), which were developed later (Glen R.C. et al., 2006), are based on the same idea as MNA descriptors.
MNA descriptors, unlike ECFP, preserve the connectivity between atoms of different layers in the form of a nested bracket structure. They are based on the molecular structure representation, which includes all hydrogen atoms according to the valences and charges of atoms and does not specify bond types. Therefore, they inherently include the information about the type of hybridization of atomic orbitals.
Chemical compounds are considered equivalent in PASS if their molecular structures have the same set of MNA descriptors. Since MNA descriptors do not represent the stereochemical features of a molecule, structures that differ only in stereochemistry are formally considered equivalent.
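The nested bracket idea and the equivalence rule can be illustrated with a simplified Python sketch. It is an assumption-laden approximation, not the PASS implementation: the function names `mna_like` and `equivalent`, the input format (atom labels plus index pairs for bonds), and the exact string layout are all hypothetical. Hydrogens are listed explicitly and bond types are ignored, as described above. The plain concatenation used here is adequate only for the single-letter element symbols in the examples.

```python
from typing import FrozenSet, List, Sequence, Tuple

Molecule = Tuple[Sequence[str], Sequence[Tuple[int, int]]]  # (atom labels, bonds)


def mna_like(atoms: Sequence[str], bonds: Sequence[Tuple[int, int]],
             level: int = 2) -> FrozenSet[str]:
    """Return the set of nested-bracket neighborhood strings for all atoms."""
    neighbors: List[List[int]] = [[] for _ in atoms]
    for a, b in bonds:  # bonds carry no type, only connectivity
        neighbors[a].append(b)
        neighbors[b].append(a)
    desc = list(atoms)  # level 0: the atom label itself
    for _ in range(level):
        # Level n: atom label followed by the sorted level n-1 strings of its neighbors.
        desc = [atoms[i] + "(" + "".join(sorted(desc[j] for j in neighbors[i])) + ")"
                for i in range(len(atoms))]
    return frozenset(desc)


def equivalent(mol_a: Molecule, mol_b: Molecule, level: int = 2) -> bool:
    """Two structures are treated as equivalent if their descriptor sets match."""
    return mna_like(*mol_a, level) == mna_like(*mol_b, level)


# Methanol, written with two different atom numberings.
methanol = (["C", "O", "H", "H", "H", "H"],
            [(0, 1), (0, 2), (0, 3), (0, 4), (1, 5)])
methanol_renumbered = (["H", "O", "C", "H", "H", "H"],
                       [(0, 1), (1, 2), (2, 3), (2, 4), (2, 5)])

# Two constitutional isomers of C2H6O.
ethanol = (["C", "C", "O", "H", "H", "H", "H", "H", "H"],
           [(0, 1), (1, 2), (0, 3), (0, 4), (0, 5), (1, 6), (1, 7), (2, 8)])
dimethyl_ether = (["C", "O", "C", "H", "H", "H", "H", "H", "H"],
                  [(0, 1), (1, 2), (0, 3), (0, 4), (0, 5), (2, 6), (2, 7), (2, 8)])

same = equivalent(methanol, methanol_renumbered)
different = equivalent(ethanol, dimethyl_ether)
```

In this sketch the input contains only atoms and connectivity, so two stereoisomers would be encoded by the same graph and would necessarily yield the same descriptor set, which mirrors the formal equivalence described above.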
The SAR Base contains vocabularies of MNA descriptors and activity names, the database of the substances’ structures represented by MNA descriptors, their biological activity spectra, and data on the structure-activity relationships (SAR). Unfortunately, it is currently impossible to collect a sufficiently large number of active compounds for every activity from the available sources. This is why some activity types are represented in the general PASS training set by more than 300,000 drug-like compounds, while others are represented by only a few. The SAR Bases supplied with PASS consist of substances with at least one known activity.
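To make the listed components concrete, here is a toy Python sketch of such a store: two vocabularies that map descriptor strings and activity names to integer ids, plus substances kept as sets of those ids. The class `ToySarBase`, its fields, and the sample descriptor strings and activity names are hypothetical; the real SAR Base file format is not described here, and the stored SAR data themselves are not modeled.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Iterable, List, Tuple


@dataclass
class ToySarBase:
    descriptor_ids: Dict[str, int] = field(default_factory=dict)  # descriptor vocabulary
    activity_ids: Dict[str, int] = field(default_factory=dict)    # activity-name vocabulary
    substances: List[Tuple[FrozenSet[int], FrozenSet[int]]] = field(default_factory=list)

    def add(self, descriptors: Iterable[str], activities: Iterable[str]) -> None:
        """Store one substance as (descriptor ids, activity ids)."""
        activities = list(activities)
        if not activities:
            # Mirrors the rule that every substance has at least one known activity.
            raise ValueError("a substance needs at least one known activity")
        d = frozenset(self.descriptor_ids.setdefault(x, len(self.descriptor_ids))
                      for x in descriptors)
        a = frozenset(self.activity_ids.setdefault(x, len(self.activity_ids))
                      for x in activities)
        self.substances.append((d, a))

    def compounds_per_activity(self) -> Dict[str, int]:
        """Count how many stored substances are annotated with each activity."""
        names = {i: name for name, i in self.activity_ids.items()}
        counts = Counter(i for _, acts in self.substances for i in acts)
        return {names[i]: n for i, n in counts.items()}


base = ToySarBase()
base.add({"C(HHHO)", "O(CH)", "H(C)", "H(O)"}, {"activity A"})
base.add({"C(CHHH)", "C(CHHO)", "O(CH)", "H(C)", "H(O)"}, {"activity A", "activity B"})
counts = base.compounds_per_activity()
```

A per-activity count like `counts` makes the imbalance mentioned above directly visible: some activities are backed by many substances, others by very few.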
The PASS 2022 SAR Base is derived from structure-activity relationships obtained with an in-house training set of more than 1,614,066 compounds covering 10,112 known biological activities. This training set is continuously curated and expanded. The SAR Base can also be replaced by an exclusive knowledge base created from in-house data. Together with the user-defined constraints on the biological activities of interest and the relevant parameters, the SAR Base provides PASS with the starting point for computational prediction.
A more detailed description of the PASS method is available in several publications, for instance Filimonov et al., 2014 (a PDF file may be obtained on request).