Cambridge MedChem Consulting

Predicting Metabolism

All drugs are subject to metabolic processes; in general, these processes increase the polarity of molecules, which facilitates their excretion. Whilst the impact of metabolism on a drug's duration of action is a major concern, knowledge of the metabolic routes can be useful in other ways. In particular, understanding the mechanism may highlight potential high-energy intermediates that could contribute to toxicity, identify metabolites that may or may not themselves have pharmacological effects, and reveal changes in physicochemical properties resulting from biotransformation.

Several approaches have been used to produce in silico systems to predict metabolism:

Local solutions are intended to predict the activity of a single enzyme (and often only within a chemical series). These models can be based on pharmacophore or QSAR models derived from known substrates, on docking potential substrates into the active site of the enzyme, and/or on quantum mechanical calculations used to predict reactivity. The majority of drugs are metabolised by cytochrome P450 enzymes, which are found predominantly in the liver. Although similar in structure, the CYPs have distinct substrate specificities, so a separate model needs to be derived for each enzyme. Recently solved crystal structures of CYP450s should help refine these models. A number of the computational models for predicting CYP-mediated metabolism have been reviewed (European Journal of Medicinal Chemistry Volume 41, Issue 7, July 2006, Pages 795-808).

SMARTCyp is a method for predicting which sites in a molecule are labile to metabolism by cytochrome P450 isoform 3A4. It is also a reactivity model that is applicable to all P450 isoforms. The method has been published as "SMARTCyp – a 2D-method for Prediction of Cytochrome P450 Mediated Drug Metabolism" in ACS Medicinal Chemistry Letters, DOI. The results for the sites of metabolism for the NK1 antagonist Emend are shown below.


The SMARTCyp methodology has been used by several groups to enhance other tools, for example RS-Predictor DOI. The combination of RS-Predictor and SMARTCyp was shown to perform better than either method alone. RS-Predictor provides predictive models for a selection of cytochrome P450 enzymes (CYPs 1A2, 2A6, 2B6, 2C19, 2C8, 2C9, 2D6, 2E1, and 3A4).
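SMARTCyp is described as a 2D approach: each atom is assigned an activation energy by matching its environment against a fixed list of SMARTS patterns, and that energy is then adjusted by a topological accessibility term so that atoms towards the periphery of the molecule are favoured. The Python sketch below illustrates that style of scoring on a hand-built molecular graph. The SMARTS-lookup step is replaced by a dictionary of hypothetical energies, and the `weight` of 8.0 and the accessibility definition (an atom's largest bond-count distance to any other atom, divided by the largest such value in the molecule) should be treated as illustrative assumptions rather than SMARTCyp's actual parameters. Lower scores are ranked as more labile.

```python
from collections import deque


def topological_distances(adjacency, start):
    """Breadth-first search: number of bonds from `start` to every reachable atom."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        atom = queue.popleft()
        for neighbour in adjacency[atom]:
            if neighbour not in dist:
                dist[neighbour] = dist[atom] + 1
                queue.append(neighbour)
    return dist


def relative_span(adjacency):
    """Accessibility proxy (assumed): each atom's largest distance / molecule's largest."""
    spans = {a: max(topological_distances(adjacency, a).values()) for a in adjacency}
    longest = max(spans.values())
    return {a: s / longest for a, s in spans.items()}


def rank_sites(adjacency, energies, weight=8.0):
    """Score = E - weight * A for atoms with an (assumed) energy; most labile first."""
    span = relative_span(adjacency)
    scores = {a: energies[a] - weight * span[a] for a in energies}
    return sorted(scores.items(), key=lambda item: item[1])
```

For example, for a hypothetical five-atom chain `{0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}` with energies assigned only to atoms 0, 1 and 3, the peripheral atoms get an accessibility of 1.0 and the central atom 0.5, and the lowest-energy atom 1 is ranked first. Because only the graph is needed, this kind of score is fast to compute, which is the main attraction of a 2D method.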


XenoSite is a web-based tool for predicting the atomic sites at which xenobiotics will undergo metabolic modification by cytochrome P450 enzymes (J. Chem. Inf. Model., 2013, 53 (12), pp 3373–3383 DOI). XenoSite output is interpretable as a probability, which reflects both the confidence of the model that a particular atom is metabolised and the statistical likelihood that its prediction for that atom is correct.

XenoSite computes multiple quantitative descriptions of molecules, including topological and quantum chemical descriptors, as well as robust descriptors of the reactivity of atomic sites generated by the SMARTCyp software [2]. Using a neural-network model, XenoSite combines these molecular descriptions with a fingerprint-based similarity search using Influence Relevance Voting, a previously developed in silico screening technique. At the time of publication, the resulting models of nine CYP isozymes were reported to achieve the highest cross-validated accuracy in the literature on CYP metabolism prediction.

Note the warning below from the XenoSite website; do not submit confidential structures.

WARNING! XenoSite is provided free-of-charge, however, we cannot make guarantees about data confidentiality on this public website

XenoSite supports models addressing several different enzymes and mechanisms, and the results are very nicely displayed on a per-model basis (shown using Emend again).


These models can be rather time-consuming and are perhaps best suited for evaluating a few lead compounds.

FAME DOI is a collection of random forest models trained on a comprehensive and highly diverse data set of 20,000 small molecules annotated with their experimentally determined sites of metabolism taken from multiple species (rat, dog and human).


In addition, dedicated models are available to predict the sites of phase I and phase II metabolism.

StarDrop’s P450 calculations use semi-empirical quantum mechanical calculations (AM1 in MOPAC) to estimate the activation energy for product formation at each potential site of metabolism. These are corrected for the steric and orientation effects of the binding pockets of the different isoforms, currently covering CYP3A4, CYP2D6, CYP2C9, CYP2C8, CYP1A2 and CYP2E1. In addition to the regioselectivity of metabolism for each isoform, the models also predict the lability of each site in absolute terms and the metabolite formed by metabolism at each site. The calculations run on a server because they are more computationally intensive than QSAR models, taking ~1-2 minutes per ‘drug-like’ compound. However, they can be scaled across multiple cores/CPUs to increase throughput.
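As a rough guide to throughput, the wall-clock time for a batch can be estimated from the per-compound time and the number of cores. The helper below assumes ideal scaling, with each core processing whole compounds independently and every compound taking the same time; real server queues will usually be slower. The 2-minute figure is the upper end of the estimate above, while the compound and core counts are arbitrary examples.

```python
import math


def batch_minutes(n_compounds: int, minutes_per_compound: float, n_cores: int) -> float:
    """Estimate wall-clock minutes for a batch, assuming ideal parallel scaling."""
    # Compounds run in 'rounds' of n_cores at a time; the final round may be partly idle.
    rounds = math.ceil(n_compounds / n_cores)
    return rounds * minutes_per_compound


# 100 compounds at the upper estimate of 2 minutes each
single_core = batch_minutes(100, 2, 1)
eight_cores = batch_minutes(100, 2, 8)
```

Under these assumptions, 100 compounds take about 200 minutes on one core and 26 minutes on eight (13 rounds of 2 minutes), which is why these calculations suit a focused set of compounds rather than a large virtual library.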



A comparison of StarDrop and SMARTCyp (where isoform-specific models are available).


It is noteworthy that SMARTCyp is better at predicting the sites of metabolism for 2D6 and 2C9, probably because of the more restricted substrate specificity of these two isoforms. In contrast, the active site of 3A4 can accommodate a much greater diversity of substrates (and binding poses), so novel substrates that fall outside the applicability domain defined by SMARTCyp's fixed list of SMARTS patterns may not be predicted well, whereas the semi-empirical quantum mechanical calculations should still be valid.
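Site-of-metabolism tools are commonly compared by asking whether an experimentally observed site appears among the top k ranked atoms for each compound (top-1, top-2 or top-3 accuracy). The metric behind the comparison above is not stated, so this is only a sketch of that general scoring scheme; the compound names and atom indices used to exercise it are hypothetical.

```python
def top_k_accuracy(predictions, observed, k=2):
    """Fraction of compounds with at least one observed site among the top-k predicted atoms.

    predictions: compound -> list of atom indices, best-ranked first
    observed:    compound -> set of experimentally observed site-of-metabolism atoms
    """
    hits = 0
    for compound, sites in observed.items():
        ranked = predictions.get(compound, [])  # a missing prediction counts as a miss
        if sites.intersection(ranked[:k]):
            hits += 1
    return hits / len(observed)
```

For example, with hypothetical rankings `{"cpd1": [4, 7, 2], "cpd2": [1, 3, 9], "cpd3": [5, 6, 0]}` and observed sites `{"cpd1": {7}, "cpd2": {9}, "cpd3": {5}}`, top-1 accuracy is 1/3, top-2 is 2/3 and top-3 is 1.0. Reporting several values of k helps distinguish a tool that ranks the true site first from one that merely includes it somewhere near the top.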

BioTransformer is a comprehensive computational tool for small-molecule metabolism prediction and metabolite identification. Its metabolism prediction tool combines a machine-learning approach with a knowledge-based approach to predict small-molecule metabolism in human tissues (e.g. liver tissue), in the human gut and in the environment (soil and water microbiota). In addition, BioTransformer provides information on secondary metabolism. A review of BioTransformer is also available.

Using Emend as the test molecule, BioTransformer identified over 200 putative metabolites. As well as each putative metabolite structure, a range of other properties is calculated, including the InChIKey, which can be used to search many other databases, and the major isotopic mass, which could be used in metabolite identification. A description of the reaction and the likely enzymes involved is also provided. For the prediction of CYP450 metabolism, BioTransformer makes use of CypReact, a tool for CYP450 substrate-specificity prediction.


Running the prediction for diazepam yielded an SDF file containing 151 structures, but on closer examination many were duplicates and there were only 51 unique structures (I used this Vortex script to flag duplicate structures). This is because several reaction patterns can yield the same structure. Around 30 had a PubChem CID, suggesting these are known compounds and quite possibly previously identified metabolites.
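The same duplicate check can be sketched without Vortex. The stdlib-only Python below splits an SDF file on its `$$$$` record terminator and keeps the first record seen for each InChIKey. It assumes the InChIKey is stored as an SD data field called `InChIKey`; that field name is an assumption, so check the tags in your own BioTransformer output before relying on it.

```python
import re

# SD data-field header lines look like ">  <FieldName>" (sometimes with extra text)
FIELD_HEADER = re.compile(r"^>.*<([^>]+)>")


def sd_records(text):
    """Split SDF text into records on the '$$$$' terminator line."""
    records, current = [], []
    for line in text.splitlines():
        if line.strip() == "$$$$":
            records.append("\n".join(current))
            current = []
        else:
            current.append(line)
    if any(line.strip() for line in current):  # tolerate a missing final terminator
        records.append("\n".join(current))
    return records


def sd_fields(record):
    """Return the SD data fields of a record as {name: first value line}."""
    fields, lines = {}, record.splitlines()
    for i, line in enumerate(lines):
        match = FIELD_HEADER.match(line)
        if match and i + 1 < len(lines):
            fields[match.group(1)] = lines[i + 1].strip()
    return fields


def unique_by_field(text, field="InChIKey"):
    """Keep the first record for each distinct value of `field` (records lacking it are kept)."""
    seen, kept = set(), []
    for record in sd_records(text):
        key = sd_fields(record).get(field)
        if key is None or key not in seen:
            kept.append(record)
            if key is not None:
                seen.add(key)
    return kept
```

Any canonical identifier would do in place of the InChIKey, although the number of "unique" structures can depend on which identifier you deduplicate on.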

Phase II metabolism

Aspirin is hydrolysed to salicylic acid (2-hydroxybenzoic acid). Phase II metabolism then conjugates salicylic acid with glycine (forming salicyluric acid) or with glucuronic acid, giving several ionised metabolites that can then be excreted in the urine.


Since none of the processes described above are CYP-mediated, it is not surprising that they are not highlighted by SMARTCyp.


In contrast, FAME correctly identifies hydrolysis of the ester and the potential for the carboxylic acid to be involved in metabolic processes.


FAME offers high-performance prediction of sites of metabolism mediated by a wide variety of mechanisms.

GLORYx DOI predicts phase I and phase II metabolites for the chemical compound(s) provided by the user. The method is based on FAME site-of-metabolism (SoM) prediction combined with sets of reaction rules encoding both phase I and phase II metabolic reactions. GLORYx is freely available as a web server and is available as a Java jar file upon request.

The GLORYx jar can be run from the command line. All you need to do is specify one or more input molecules, either as SMILES strings or as an SDF file.

java -Xmx16g -jar /Users/chrisswain/Desktop/gloryx_2020-09-02/gloryx.jar -s "FC(F)(F)c1cc(cc(c1)C(F)(F)F)[C@H](O[C@H]4OCCN(CC/2=N/C(=O)NN\2)[C@H]4c3ccc(F)cc3)C"

Note that since the FAME 3 models can take quite a bit of memory, the -Xmx16g flag may be necessary. On my MacPro it utilised multiple cores.


The metabolism phase can be specified. For example, to predict only phase I metabolites:

java -Xmx16g -jar /Users/chrisswain/Desktop/gloryx_2020-09-02/gloryx.jar -s "FC(F)(F)c1cc(cc(c1)C(F)(F)F)[C@H](O[C@H]4OCCN(CC/2=N/C(=O)NN\2)[C@H]4c3ccc(F)cc3)C" -p P1

The output is stored in a folder named "metabolitepredictionresults", created in the user's home folder. This folder will contain the predictions as one or more SDF files, each SDF file corresponding to up to 1000 input molecules. Each input molecule is included as a record in the output file(s), immediately followed by its predicted metabolites.
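For more than a handful of molecules it can be convenient to call the jar from a script. This Python sketch simply assembles the command used in the examples above (`-Xmx16g`, `-s` and the optional `-p P1`) and runs it; the jar path is wherever you unpacked GLORYx, and no other GLORYx options are assumed. The folder returned is the one described above, `metabolitepredictionresults` in the user's home folder.

```python
import subprocess
from pathlib import Path


def gloryx_command(jar, smiles, phase=None, max_heap="16g"):
    """Build the GLORYx command line used in the examples above."""
    cmd = ["java", f"-Xmx{max_heap}", "-jar", str(jar), "-s", smiles]
    if phase is not None:  # e.g. "P1" to restrict predictions to phase I
        cmd += ["-p", phase]
    return cmd


def run_gloryx(jar, smiles, phase=None):
    """Run GLORYx (raises CalledProcessError on failure) and return the results folder."""
    subprocess.run(gloryx_command(jar, smiles, phase), check=True)
    return Path.home() / "metabolitepredictionresults"
```

Passing the arguments as a list rather than as a single shell string avoids quoting problems with characters such as `\`, `/` and `@` that occur in stereo SMILES.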

The resulting output file contains a large number of predicted metabolites, each with a score, a description of the reaction type and the InChI. This looks really comprehensive and would be very useful for those involved in metabolite identification.


Running the prediction with aspirin as the input highlights a variety of non-CYP mediated metabolic pathways.


I tried a range of other molecules and GLORYx was really very impressive in identifying potential metabolites.

Way2Drug offers a web service for predicting sites of metabolism, details of which have been published DOI. All major classes of metabolic reactions (aliphatic and aromatic hydroxylation, N- and O-glucuronidation, N-, S- and C-oxidation, and N- and O-dealkylation) are evaluated. However, I would not recommend that you use it for proprietary molecules.


Global solutions are intended to predict the metabolism of any molecule exposed to a complex biological system. This type of solution is often rule-based and uses an extensive database of known biotransformations. Programs such as MetaDrug (Expert Opin. Drug Metab. Toxicol. (2005) 1(1)) use a series of rules together with a series of QSAR models to predict metabolic transformations, covering both phase I and phase II metabolism. Among the many transformations described are C-, N-, S- and P-oxidation (including dealkylation), hydroxylation, double-bond peroxidation, quinone formation, reduction (nitro, carbonyl, azo, sulphur), hydrolysis (esters, amides, phosphates, epoxides), glucuronidation, sulphation, glutathione conjugation, methylation by methyltransferases and amino acid conjugation. Other programs using similar approaches include Meteor (Pure Appl. Chem., Vol. 76, No. 5, pp. 907–914, 2004) and Meta.

In general I’ve found these products very useful for identifying all potential metabolic sites; however, they can over-predict, and you may well find that many of the potential metabolic routes make negligible contributions in vivo.
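The tendency of rule-based systems to over-predict, and to generate the same structure by more than one route (as seen with the diazepam duplicates), can be illustrated with a toy expansion. The sketch below applies every matching rule at every site, generation by generation, and counts raw rule applications against distinct products. Molecules are reduced to sorted tuples of functional-group labels and the three rules are hypothetical; this is a deliberate simplification and not how MetaDrug, Meteor or Meta represent structures or rules.

```python
# Hypothetical rules: each maps a functional-group label to the label it becomes
RULES = {
    "CH3": "CH2OH",      # aliphatic hydroxylation
    "CH2OH": "COOH",     # oxidation of a primary alcohol to an acid
    "COOH": "COO-Gluc",  # acyl glucuronidation (phase II)
}


def apply_rules(molecule):
    """Yield the product of applying one rule at each matching site of `molecule`."""
    for i, group in enumerate(molecule):
        if group in RULES:
            product = list(molecule)
            product[i] = RULES[group]
            yield tuple(sorted(product))  # sorting gives a canonical form for comparison


def enumerate_metabolites(parent, generations):
    """Breadth-first expansion; returns (raw rule applications, set of distinct metabolites)."""
    start = tuple(sorted(parent))
    seen, frontier, raw = {start}, {start}, 0
    for _ in range(generations):
        next_frontier = set()
        for molecule in frontier:
            for product in apply_rules(molecule):
                raw += 1
                if product not in seen:
                    seen.add(product)
                    next_frontier.add(product)
        frontier = next_frontier
    return raw, seen - {start}
```

For a parent with two equivalent methyl groups, `("CH3", "CH3", "Ar")`, two generations give four rule applications but only three distinct metabolites, because hydroxylating either methyl gives the same product; a third generation brings the totals to eight applications and five metabolites. Nothing in this expansion weighs how likely each step is in vivo, which is the gap that scoring approaches such as FAME-based SoM prediction try to fill.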

Whilst the above programs offer a nice insight into the potential soft spots in a molecule, sometimes you just want to know which enzymes are likely to be involved in its metabolism. CypReact DOI takes a structure (SMILES or SDF input) and predicts whether the molecule will react with any of nine of the most important human cytochrome P450 (CYP450) enzymes [CYP1A2, CYP2A6, CYP2B6, CYP2C8, CYP2C9, CYP2C19, CYP2D6, CYP2E1, or CYP3A4]. CypReact is a command-line tool, is extremely fast and is ideal for quickly evaluating a batch of compounds. CypReact is available for download.

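Running CypReact on Emend gives the following output, where R indicates that the molecule is predicted to react with that isoform and N that it is not: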

MacPro:~ Chris$ java -jar /Users/Chris/Downloads/Leon_Ti-cypreact-29a582219630/CypReactBundle/cypreact.jar /Users/Chris/Downloads/Leon_Ti-cypreact-29a582219630/CypReactBundle/  /Users/Chris/Desktop/SampleFiles/emend.sdf /Users/Chris/Desktop/SampleFiles/emendmetab.sdf 1A2,2A6,2B6,2C8,2C9,2C19,2D6,2E1,3A4
Processing Molecue: 1
1A2, Mole1: R
3A4, Mole1: R
2B6, Mole1: N
2E1, Mole1: N
2C9, Mole1: R
2C19, Mole1: R
2D6, Mole1: R
2C8, Mole1: N
2A6, Mole1: N
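When screening a batch, console output like this is easy to turn into a table of predicted isoforms. The parser below assumes the line format shown for Emend (`<isoform>, <molecule id>: R|N`), with R read as a predicted reactant and N as a non-reactant; other CypReact versions may format their output differently, so treat the pattern as an assumption.

```python
import re

# Matches lines such as "2C19, Mole1: R"; other lines (e.g. progress messages) are ignored
LINE = re.compile(r"^\s*(\w+),\s*(\S+):\s*([RN])\s*$")


def parse_cypreact(output):
    """Map each molecule id to the sorted list of isoforms predicted to react (R)."""
    reactive = {}
    for line in output.splitlines():
        match = LINE.match(line)
        if match:
            isoform, molecule, call = match.groups()
            isoforms = reactive.setdefault(molecule, [])  # keep molecules with no R calls
            if call == "R":
                isoforms.append(isoform)
    return {mol: sorted(isos) for mol, isos in reactive.items()}
```

Applied to the Emend output above, this returns `{"Mole1": ["1A2", "2C19", "2C9", "2D6", "3A4"]}`, i.e. five of the nine isoforms are predicted to react.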

Worth reading:

Computational Prediction of Metabolism: Sites, Products, SAR, P450 Enzyme Dynamics, and Mechanisms DOI
Predicting Regioselectivity and Lability of Cytochrome P450 Metabolism Using Quantum Mechanical Simulations DOI

See also the section on Metabolism.

Updated 7 February 2021