Cambridge MedChem Consulting

AlphaFold Protein Structure Database

The AlphaFold Protein Structure Database, developed by DeepMind and EMBL-EBI, is now available online.

AlphaFold DB provides open access to protein structure predictions for the human proteome and 20 other key organisms to accelerate scientific research.

AlphaFold DB currently provides predicted structures for the organisms listed below, which include human, laboratory species, and key pathogens. The predictions for all species can be downloaded from the EBI FTP site.

Species                        Common Name      Reference Proteome  Predicted Structures  Download
Arabidopsis thaliana           Arabidopsis      UP000006548         27,434                Download (3642 MB)
Caenorhabditis elegans         Nematode worm    UP000001940         19,694                Download (2601 MB)
Candida albicans               C. albicans      UP000000559         5,974                 Download (965 MB)
Danio rerio                    Zebrafish        UP000000437         24,664                Download (4141 MB)
Dictyostelium discoideum       Dictyostelium    UP000002195         12,622                Download (2150 MB)
Drosophila melanogaster        Fruit fly        UP000000803         13,458                Download (2174 MB)
Escherichia coli               E. coli          UP000000625         4,363                 Download (448 MB)
Glycine max                    Soybean          UP000008827         55,799                Download (7142 MB)
Homo sapiens                   Human            UP000005640         23,391                Download (4784 MB)
Leishmania infantum            L. infantum      UP000008153         7,924                 Download (1481 MB)
Methanocaldococcus jannaschii  M. jannaschii    UP000000805         1,773                 Download (171 MB)
Mus musculus                   Mouse            UP000000589         21,615                Download (3547 MB)
Mycobacterium tuberculosis     M. tuberculosis  UP000001584         3,988                 Download (421 MB)
Oryza sativa                   Asian rice       UP000059680         43,649                Download (4416 MB)
Plasmodium falciparum          P. falciparum    UP000001450         5,187                 Download (1132 MB)
Rattus norvegicus              Rat              UP000002494         21,272                Download (3404 MB)
Saccharomyces cerevisiae       Budding yeast    UP000002311         6,040                 Download (960 MB)
Schizosaccharomyces pombe      Fission yeast    UP000002485         5,128                 Download (776 MB)
Staphylococcus aureus          S. aureus        UP000008816         2,888                 Download (268 MB)
Trypanosoma cruzi              T. cruzi         UP000002296         19,036                Download (2905 MB)
Zea mays                       Maize            UP000007305         39,299                Download (5014 MB)

The search bar at the top of the query page accepts queries based on protein name, gene name, UniProt identifier, or organism name. At present you can't search with a sequence to find similar proteins; you would first need to run a BLAST search and use the results from that as queries.

Here I searched for Plasmodium falciparum carbonic anhydrase (Q8IHW5), a potential malaria target. As you can see, there is no crystal structure in the PDB. Whilst the active site is predicted with high confidence, there are clearly regions for which the confidence is very low.


You can then download the structure in PDB or mmCIF format.
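
Once you have the file, it helps to know that AlphaFold stores its per-residue confidence score (pLDDT, on a 0 to 100 scale) in the B-factor column of the PDB file. The sketch below is one way the low-confidence regions might be listed using only the Python standard library. The function names and the toy records are my own assumptions for illustration, as is the choice of reading one CA atom per residue; the band cut-offs (90, 70 and 50) are the ones used in the AlphaFold DB colour key.

```python
from collections import Counter


def make_atom_line(serial, name, resname, chain, resnum, xyz, bfactor):
    """Build one fixed-width PDB ATOM record (helper for toy data only)."""
    x, y, z = xyz
    return (
        f"ATOM  {serial:5d} {name:<4s} {resname:>3s} {chain}{resnum:4d}    "
        f"{x:8.3f}{y:8.3f}{z:8.3f}{1.0:6.2f}{bfactor:6.2f}"
    )


def plddt_per_residue(pdb_text):
    """Return {(chain, residue_number): pLDDT} taken from CA atoms.

    Assumes an AlphaFold-style PDB file in which the B-factor field
    (columns 61-66) holds the pLDDT score.
    """
    scores = {}
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            chain = line[21]
            resnum = int(line[22:26])
            scores[(chain, resnum)] = float(line[60:66])
    return scores


def confidence_band(plddt):
    """Map a pLDDT score to the AlphaFold DB confidence band."""
    if plddt > 90:
        return "very high"
    if plddt > 70:
        return "confident"
    if plddt > 50:
        return "low"
    return "very low"


def summarise(scores):
    """Count residues in each confidence band."""
    return Counter(confidence_band(s) for s in scores.values())


def very_low_residues(scores):
    """List the (chain, residue_number) keys with pLDDT of 50 or below."""
    return sorted(key for key, s in scores.items() if s <= 50)


# Toy example with made-up values; for a real structure, read the
# downloaded file into a string and pass it to plddt_per_residue().
toy_pdb = "\n".join(
    [
        make_atom_line(1, "N", "MET", "A", 1, (0.0, 0.0, 0.0), 95.2),
        make_atom_line(2, "CA", "MET", "A", 1, (1.5, 0.0, 0.0), 95.2),
        make_atom_line(3, "CA", "HIS", "A", 2, (5.3, 0.0, 0.0), 72.0),
        make_atom_line(4, "CA", "GLY", "A", 3, (9.1, 0.0, 0.0), 30.1),
    ]
)
toy_scores = plddt_per_residue(toy_pdb)
print(summarise(toy_scores))
print(very_low_residues(toy_scores))
```

Reading the fixed-width columns directly keeps the example free of dependencies; for real work a structure library would be the more robust choice.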

I made a homology model (in purple below) of this protein a while back; the protein has little sequence similarity with any protein in the PDB. Although the prediction does not include a zinc, the AlphaFold predicted structure places histidines in positions where they could coordinate the zinc. If it is possible to include the zinc in the structure prediction I'd be interested in finding out.


Overall I'd say this is a very useful starting point.

The PROTACtable genome

As PROTACs have become more widespread, the obvious question is which proteins are best suited to modulation by PROTACs. A recent publication, The PROTACtable genome (DOI), provides useful guidelines. The workflow is based on a method developed by a group at GSK, subsequently expanded and now integrated into the Open Targets Platform. Using publicly available data sources, the new method assesses whether a protein could be targeted using a PROTAC, based on the protein’s sequence, location, natural turnover rate in the cell, and evidence from published literature. The framework will help drug discovery researchers to gauge the PROTACtability of their protein of interest, and to prioritise their research accordingly.

More details on PROTACs here

AI4Proteins videos now online

On 16 and 17 June 2021, RSC CICAG and AI3D held a joint meeting on protein structure prediction. The full lineup of speakers, titles and abstracts can be found here.

Session 1. Chair: Professor Jeremy Frey (University of Southampton)
"An AI solution to the protein folding problem: what is it, how did it happen, and some implications", Professor John Moult (University of Maryland)
Session 2. Chair: Dr Melanie Vollmar (Diamond)
"So you predicted a protein structure – What now?", Dr Thomas Steinbrecher (Schrödinger)
"Deep Learning enhanced prediction of protein structure and dynamics", Dr Martina Audagnotto (AstraZeneca)
"Fireflies-Lévy Flights algorithm for peptides conformational optimization", Dr Zied Hosni (University of Sheffield)
Session 3. Chair: Dr Chris Swain (Cambridge MedChem Consulting)
"How good are protein structure prediction methods at predicting folding pathways?", Mr Carlos Outeiral Rubiera (University of Oxford)
"Protein-Ligand Structure Prediction for GPCR Drug Design", Dr Chris De Graaf (Sosei Heptares)
Session 4. Chair: Dr Márton Vass
"Using icospherical input data in machine learning on the protein-binding problem", Dr Ella Gale (University of Bristol)
"Biological sequence design with machine learning", Professor Debora Marks (Harvard University)
Session 5. Chair: Dr Simone Fulle (Novo Nordisk)
"Lessons learned from generative models of biological sequences", Professor Aleksej Zelezniak (Chalmers University of Technology)
"DeepDock: a deep learning approach to predict ligand binding conformations", Dr Oscar Méndez-Lucio (Janssen Pharmaceuticals)
"Finding new in silico-based therapeutic strategies for IAHSP", Dr Matteo Rossi Sebastiano (University of Turin)
Session 6. Chair: Professor Jonathan Goodman (University of Cambridge)
"Designing molecular models by machine learning and experimental data", Professor Cecilia Clementi (Freie Universität Berlin)
"The “almost druggable” genome", Professor Tudor Oprea (University of New Mexico)
Session 7. Chair: Dr Lucy Colwell (University of Cambridge)
"General Effects of AI on Drug Discovery", Dr Derek Lowe (Novartis)
"Open Access Data: A Cornerstone for Artificial Intelligence Approaches to Protein Structure Prediction", Professor Stephen Burley (RCSB PDB, Rutgers University, UCSD)

The videos of the presentations are now available on YouTube and you can access the playlist here

For those wanting a hype-free insight into the impact AI might make on drug discovery, the presentation by Derek Lowe is well worth watching.

Open Targets Platform 21.06 has been released!

The Open Targets Platform is a comprehensive tool that supports systematic identification and prioritisation of potential therapeutic drug targets. By integrating publicly available datasets including data generated by the Open Targets consortium, the Platform builds and scores target-disease associations to assist in drug target identification and prioritisation. It also integrates relevant annotation information about targets, diseases, phenotypes, and drugs, as well as their most relevant relationships.

Currently there are:

Targets 60,606
Diseases 18,507
Drugs 13,185
Evidence strings 13,267,236
Associations 11,755,362

Computational Prediction of covalent Inhibitors

Covalent inhibitors are an increasingly important class of therapeutic agents.

The London lab has described a computational pipeline that suggests covalent analogs of non-covalent ligands (DOI).

Designing covalent inhibitors is increasingly important, although it remains challenging. Here, we present covalentizer, a computational pipeline for identifying irreversible inhibitors based on structures of targets with non-covalent binders. Through covalent docking of tailored focused libraries, we identify candidates that can bind covalently to a nearby cysteine while preserving the interactions of the original molecule. We found 11,000 cysteines proximal to a ligand across 8,386 complexes in the PDB. Of these, the protocol identified 1,553 structures with covalent predictions. In a prospective evaluation, five out of nine predicted covalent kinase inhibitors showed half-maximal inhibitory concentration (IC50) values between 155 nM and 4.5 μM. Application against an existing SARS-CoV Mpro reversible inhibitor led to an acrylamide inhibitor series with low micromolar IC50 values against SARS-CoV-2 Mpro. The docking was validated by 12 co-crystal structures. Together these examples hint at the vast number of covalent inhibitors accessible through our protocol.
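
The first step mentioned in the abstract, finding cysteines proximal to a ligand, can be pictured as a simple distance filter. The sketch below only illustrates that general idea and is not the Covalentizer code: the 6 Å cut-off, the function name, the residue labels and the toy coordinates are all assumptions made for the example.

```python
import math


def cysteines_near_ligand(cys_sg_atoms, ligand_atoms, cutoff=6.0):
    """Return [(residue_label, nearest_distance)] for cysteine sulfurs
    that lie within `cutoff` angstroms of any ligand atom.

    cys_sg_atoms: {label: (x, y, z)} for each cysteine SG atom
    ligand_atoms: list of (x, y, z) ligand atom coordinates
    The default cutoff is an arbitrary illustrative value.
    """
    hits = []
    for label, sg_xyz in cys_sg_atoms.items():
        nearest = min(math.dist(sg_xyz, atom_xyz) for atom_xyz in ligand_atoms)
        if nearest <= cutoff:
            hits.append((label, round(nearest, 2)))
    # Closest cysteine first
    return sorted(hits, key=lambda hit: hit[1])


# Hypothetical coordinates, not taken from any real complex
toy_cysteines = {"CYS45": (0.0, 0.0, 0.0), "CYS120": (20.0, 0.0, 0.0)}
toy_ligand = [(3.0, 4.0, 0.0), (10.0, 0.0, 0.0)]
print(cysteines_near_ligand(toy_cysteines, toy_ligand))
```

A distance filter like this only nominates candidate residues; whether an electrophile can actually reach the sulfur with sensible geometry is what the covalent docking step is for.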

RDKit (open-source cheminformatics, version 2018.09.3) was used for 2D molecular handling, conformation generation and RMSD calculation. Marvin (version 17.21.0, ChemAxon) was used in preparing the molecules for docking. OpenBabel was used to switch between molecular file formats. DOCKovalent (London et al., 2014) was used for virtual covalent docking. The Covalentizer code is available at

Drug-induced phospholipidosis confounds drug repurposing for SARS-CoV-2

An interesting open access paper in Science, "Drug-induced phospholipidosis confounds drug repurposing for SARS-CoV-2" (DOI), points out a potential flaw in interpreting in vitro data.

Repurposing drugs as treatments for COVID-19 has drawn much attention. Beginning with sigma receptor ligands, and expanding to other drugs from screening in the field, we became concerned that phospholipidosis was a shared mechanism underlying the antiviral activity of many repurposed drugs. For all of the 23 cationic amphiphilic drugs tested, including hydroxychloroquine, azithromycin, amiodarone, and four others already in clinical trials, phospholipidosis was monotonically correlated with antiviral efficacy. Conversely, drugs active against the same targets that did not induce phospholipidosis were not antiviral.
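
For anyone unsure what "monotonically correlated" implies, the usual measure is the Spearman rank correlation, which asks only whether two quantities rise together, not whether the relationship is linear. Below is a minimal pure-Python sketch; it is not the analysis used in the paper, and the numbers are made-up toy values rather than data from it.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


def spearman_rho(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Made-up toy values: y always increases with x, but not linearly
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 100]
print(spearman_rho(x, y), pearson(x, y))
```

In the toy example the rank correlation is perfect even though the linear (Pearson) correlation is not, which is the distinction the word "monotonically" is drawing.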

Phospholipidosis is a well-known phenomenon to those involved in drug discovery. It depends on the physicochemical properties of the molecule and results in excessive accumulation of intracellular phospholipids in tissues such as the liver, kidney and lung. The resulting accumulation can lead to liver, kidney, or respiratory failure.

Drug-induced phospholipidosis can be determined by measuring the accumulation of specific fluorescent probes in HepG2 cells or primary hepatocytes.

There is a more detailed explanation of the consequences here, "A Strategy for Risk Management of Drug-Induced Phospholipidosis" DOI.

6th RSC-BMCS Symposium on Mastering MedChem

The 6th RSC-BMCS Symposium on Mastering MedChem is a virtual event on Tuesday and Wednesday, 29th and 30th June 2021 (two afternoon sessions).

This is a fantastic opportunity: two days of high-quality presentations. The best way to learn MedChem is to listen to case histories. Details are here.

You can register here


LifeArc funding Project Moonshot

The independent medical charity LifeArc is to support an international effort to rapidly develop a potential antiviral candidate for COVID-19.

The grant will be entirely dedicated to the COVID Moonshot initiative, which PostEra jointly leads alongside leading scientists from Diamond Light Source, Oxford University, Weizmann Institute, Memorial Sloan Kettering Cancer Center and MedChemica.

On the Antibacterial Action of Cultures of a Penicillium

May 10th 1929 is a very important day in drug discovery research. It was on this day that Alexander Fleming submitted his paper entitled "On the Antibacterial Action of Cultures of a Penicillium, with Special Reference to their Use in the Isolation of B. influenzæ", Br J Exp Pathol. 1929 Jun; 10(3): 226–236.


Over 90 years later this chance discovery still has a major impact on health. Whilst isolation proved too challenging for Fleming, he sent his Penicillium mold to anyone who requested it in the hope that they might isolate penicillin. It was only in 1940 that Howard Florey and his team published the isolation and purification of penicillin: "Penicillin as a chemotherapeutic agent", Lancet. 1940;236:226–8. DOI 10.1016/S0140-6736(01)08728-1.


The basic structure of the beta-lactam ring is shown below.


The beta-lactam antibiotics (penicillins and cephalosporins) are a very well-studied class of therapeutic agent whose mechanism of action is the inhibition of cell wall synthesis. Penicillin inhibits the formation of peptidoglycan cross-links in the bacterial cell wall through reaction of its beta-lactam ring with the enzyme DD-transpeptidase. As a consequence, DD-transpeptidase cannot catalyze formation of the cell wall cross-links.


Predicting sites of metabolism

I've updated the predicting sites of metabolism page