AlphaFold predicts structure of almost every catalogued protein known to science
A little over a year ago I highlighted the AlphaFold Protein Structure Database (AlphaFold DB), which provided open access to protein structure predictions for the human proteome and 20 other key organisms to accelerate scientific research. Well, things have moved on.
DeepMind and EMBL’s European Bioinformatics Institute (EMBL-EBI) have made AI-powered predictions of the three-dimensional structures of nearly all catalogued proteins known to science freely and openly available to the scientific community, via the AlphaFold Protein Structure Database.
The database is being expanded by approximately 200 times, from nearly 1 million protein structures to over 200 million, covering almost every organism on Earth that has had its genome sequenced. The expansion of the database includes predicted structures for a wide range of species, including plants, bacteria, animals, and other organisms.
The full dataset of all predictions is available at no cost and under a CC-BY-4.0 licence from Google Cloud Public Datasets. We've grouped this by species for ease of downloading subsets or all of the data. We suggest that you only download the full dataset if you need to process all the data with local computing resources (the size of the dataset is 23 TiB, ~1M tar files).
Downloads can be found here https://alphafold.ebi.ac.uk/download#full-dataset-section.
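Before committing to the full download it is worth doing the arithmetic on the figures quoted above. The short Python sketch below uses only the approximate numbers from the text (23 TiB, ~1M tar files, nearly 1 million growing to over 200 million structures) and assumes, purely for illustration, that file sizes are spread evenly; real per-species archives will vary a lot in size.

```python
# Back-of-envelope numbers taken from the post (all approximate by the source's own wording)
TIB = 1024 ** 4  # bytes in one tebibyte
MIB = 1024 ** 2  # bytes in one mebibyte

dataset_bytes = 23 * TIB          # "the size of the dataset is 23 TiB"
n_tar_files = 1_000_000           # "~1M tar files"
structures_before = 1_000_000     # "nearly 1 million protein structures"
structures_after = 200_000_000    # "over 200 million"


def mean_file_size_mib(total_bytes: int, n_files: int) -> float:
    """Average size of one archive in MiB, assuming (unrealistically) equal sizes."""
    return total_bytes / n_files / MIB


expansion_factor = structures_after / structures_before

print(f"Average tar file: {mean_file_size_mib(dataset_bytes, n_tar_files):.1f} MiB")
print(f"Expansion factor: ~{expansion_factor:.0f}x")
```

With these inputs the average archive comes out at roughly 24 MiB, which is why fetching only the species you need is usually the sensible option.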
It is worth noting that AlphaFold2 is not the only protein structure prediction tool available; there are also RoseTTAFold, OpenFold and FastFold.
The details of the latest Critical Assessment of Structure Prediction (CASP) experiment to determine and advance the state of the art in modeling biomolecular structures have been published https://predictioncenter.org/casp15/index.cgi.
The core of CASP remains the same: blind testing of methods with independent assessment against experiment to establish the state of the art in modeling proteins and protein complexes. CASP15 will include the following categories:
- Single Protein and Domain Modeling: As in previous CASPs, the accuracy of single proteins and where appropriate single protein domains will be assessed, using the established metrics. Two changes will be the elimination of the distinction between template-based and template-free modeling, and an emphasis on the fine-grained accuracy of models, such as local main chain motifs and side chains. Because of the high accuracy of the new modeling methods, we expect assessment against high resolution experimental structures will be most informative.
- Assembly: As in recent CASPs, the ability of current methods to correctly model domain-domain, subunit-subunit, and protein-protein interactions will be assessed. We will again work in close collaboration with our CAPRI partners. Because of the promising deep learning results reported so far, substantial progress is expected.
- Accuracy Estimation: Members of the community will be invited to submit accuracy estimates for multimeric complexes and inter-subunit interfaces. There will no longer be a category for estimating the accuracy of single protein models, since it has become clear these cannot compete with modeling-method-specific estimates. Instead, there will be increased emphasis on assessment of self-reported accuracy estimates at the atomic level. Note that the units will now be pLDDT, not Angstroms.
- RNA structures and complexes: There will be a pilot experiment to assess the accuracy of modeling for RNA models and protein-RNA complexes. The assessment will be done in collaboration with RNA-Puzzles and Marta Szachniuk's group in Poznań.
- Protein-ligand complexes: Subject to the availability of adequate resources, there will also be a pilot experiment in this area. Deep learning is already having an impact here, and there is high interest because of the relevance to drug design.
- Data Assisted: As in recent CASPs, there will be assessment of the extent to which the accuracy of models can be increased by the provision of sparse data, particularly that provided by SAXS and mass spectrometry/chemical crosslinking. Only targets where these low-resolution data are likely to be useful will be considered, that is, large single proteins and complexes. As previously, we will work with collaborators to obtain the necessary experimental data. Targets will initially be released without the experimental data, followed by a second round of prediction including those data.
- Protein conformational ensembles: Following the success of deep-learning methods for single structures, it is increasingly important to assess methods for predicting structure ensembles. This is a huge area, ranging from the many conformations of disordered regions, to the small number of conformations that may be involved in allosteric transitions and enzyme excited states, to local protein dynamics. While it is clear that deep learning and other methods have the potential to generate ensembles in some circumstances, the difficulty is in finding cases where there are sufficiently accurate and extensive experimental data to allow rigorous assessment. One promising avenue is modeling sets of conformations in regions of cryo-EM structures where there is evidence of local conformational heterogeneity. If suitable cases arise, we will present these as a special type of sub-target, first requesting conformational ensembles that will be evaluated against the electron density map and then, in a possible second stage, providing the map for data-assisted ensemble prediction. A second possibility is cases where detailed NMR data have already established the structure of two or more conformations. We have a good lead for a few targets of this type. In addition, we are considering a non-blind experiment (a departure from normal CASP practice), where we will first ask those interested to reproduce the known conformations. We will also ask participants to identify any additional conformations that appear to be present. It may then be possible to test these against existing or new experimental data.
Details of the targets will be made available over the next week https://predictioncenter.org/casp15/targetlist.cgi.
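Since accuracy estimates will now be expressed as pLDDT, it may help to see where that number lives in AlphaFold output. AlphaFold writes a per-residue pLDDT (0 to 100) into the B-factor field of its PDB files, so it can be read with plain fixed-column slicing. The Python sketch below is a minimal illustration, not CASP's assessment code: it reads the value from CA atoms only, uses the confidence bands shown on AlphaFold DB entry pages (how values falling exactly on a boundary are binned is our own choice), and the `make_ca_line` helper is hypothetical, building synthetic records purely for the demonstration.

```python
from statistics import mean


def plddt_by_residue(pdb_text: str) -> dict[tuple[str, int], float]:
    """Return {(chain, residue_number): pLDDT} read from CA atoms.

    AlphaFold stores each residue's pLDDT in the B-factor field,
    which occupies fixed columns 61-66 of a PDB ATOM record.
    """
    scores: dict[tuple[str, int], float] = {}
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            chain = line[21]
            resnum = int(line[22:26])
            scores[(chain, resnum)] = float(line[60:66])
    return scores


def confidence_band(plddt: float) -> str:
    """Map pLDDT to the AlphaFold DB bands (boundary handling is an assumption)."""
    if plddt > 90:
        return "very high"
    if plddt > 70:
        return "confident"
    if plddt > 50:
        return "low"
    return "very low"


def summarise(pdb_text: str) -> dict[str, float]:
    """Mean pLDDT and the fraction of residues scoring above 70."""
    scores = list(plddt_by_residue(pdb_text).values())
    if not scores:
        raise ValueError("no CA atoms found")
    return {
        "mean_plddt": mean(scores),
        "fraction_confident": sum(s > 70 for s in scores) / len(scores),
    }


def make_ca_line(serial: int, resname: str, chain: str, resnum: int, bfactor: float) -> str:
    """Hypothetical helper: build a synthetic CA ATOM record with zero coordinates."""
    return (f"ATOM  {serial:>5}  CA  {resname:>3} {chain}{resnum:>4}    "
            f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{1.0:6.2f}{bfactor:6.2f}")


demo = "\n".join([
    make_ca_line(1, "MET", "A", 1, 95.5),
    make_ca_line(2, "ALA", "A", 2, 62.0),
    make_ca_line(3, "GLY", "A", 3, 40.0),
])
print(summarise(demo))
for (chain, resnum), score in plddt_by_residue(demo).items():
    print(chain, resnum, score, confidence_band(score))
```

For a real prediction you would pass the text of a downloaded PDB file to `summarise` instead of the synthetic `demo` string.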
AI4Proteins videos now online
On 16-17 June 2021 RSC CICAG and AI3SD held a joint meeting on Protein Structure Prediction. The full lineup of speakers, titles and abstracts can be found here.
Session 1: Session Chair: Professor Jeremy Frey (University of Southampton)
An AI solution to the protein folding problem: what is it, how did it happen, and some implications Professor John Moult (University of Maryland)
Session 2: Session Chair: Dr Melanie Vollmar (Diamond)
So you predicted a protein structure – What now? Dr Thomas Steinbrecher (Schrödinger)
Deep Learning enhanced prediction of protein structure and dynamics Dr Martina Audagnotto (AstraZeneca)
Fireflies-Lévy Flights algorithm for peptides conformational optimization Dr Zied Hosni (University of Sheffield)
Session 3: Session Chair: Dr Chris Swain (Cambridge MedChem Consulting)
How good are protein structure prediction methods at predicting folding pathways? Mr Carlos Outeiral Rubiera (University of Oxford)
Protein-Ligand Structure Prediction for GPCR Drug Design Dr Chris De Graaf (Sosei Heptares)
Session 4: Session Chair: Dr Márton Vass
Using icospherical input data in machine learning on the protein-binding problem Dr Ella Gale (University of Bristol)
Biological sequence design with machine learning Professor Debora Marks (Harvard University)
Session 5: Session Chair: Dr Simone Fulle (Novo Nordisk)
Lessons learned from generative models of biological sequences Professor Aleksej Zelezniak (Chalmers University of Technology)
DeepDock: a deep learning approach to predict ligand binding conformations Dr Oscar Méndez-Lucio (Janssen Pharmaceuticals)
Finding new in silico-based therapeutic strategies for IAHSP Dr Matteo Rossi Sebastiano (University of Turin)
Session 6: Session Chair: Professor Jonathan Goodman (University of Cambridge)
Designing molecular models by machine learning and experimental data Professor Cecilia Clementi (Freie Universität Berlin)
The “almost druggable” genome Professor Tudor Oprea (University of New Mexico)
Session 7: Session Chair: Dr Lucy Colwell (University of Cambridge)
General Effects of AI on Drug Discovery Dr Derek Lowe (Novartis)
Open Access Data: A Cornerstone for Artificial Intelligence Approaches to Protein Structure Prediction Professor Stephen Burley (RCSB PDB, Rutgers University, UCSD)
The videos of the presentations are now available on YouTube and you can access the playlist here https://www.youtube.com/playlist?list=PLBQwbn0mPhvWyTLnN6eFsbIwb5FByrs.
For those wanting a hype-free insight into the impact AI might have on drug discovery, the presentation by Derek Lowe is well worth watching.
AI3SD Online Guest Lecture Series
Artificial Intelligence and Augmented Intelligence for Automated Investigations for Scientific Discovery (AI3SD) are running an Online Guest Lecture Series this summer. The full seminar list is here.
If you missed a presentation or want to replay it, all the presentations are on the AI3SD YouTube channel.
COVID-19 Open Research Dataset Challenge (CORD-19)
There are a number of COVID-19 Kaggle challenges open at the moment, https://www.kaggle.com/datasets?search=COVID.
One of the more recent is:-
COVID-19 Open Research Dataset Challenge (CORD-19)
There is a large body of research and literature continuously evolving around COVID-19. Help the research community and global organizations better digest this to answer key questions.
In response to the COVID-19 pandemic, the White House and a coalition of leading research groups have prepared the COVID-19 Open Research Dataset (CORD-19). CORD-19 is a resource of over 29,000 scholarly articles, including over 13,000 with full text, about COVID-19, SARS-CoV-2, and related coronaviruses. This freely available dataset is provided to the global research community to apply recent advances in natural language processing and other AI techniques to generate new insights in support of the ongoing fight against this infectious disease. There is a growing urgency for these approaches because of the rapid acceleration in new coronavirus literature, making it difficult for the medical research community to keep up.
You can read more about it here
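As a flavour of the kind of text mining CORD-19 is intended to support, the Python sketch below ranks abstracts against a query using TF-IDF. This is a deliberately simple baseline rather than one of the recent NLP advances the challenge refers to; the three "abstracts" are invented toy strings, not CORD-19 records, and the smoothed IDF used here (log(N/df) + 1) is one common variant among several.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_rank(docs: dict[str, str], query: str) -> list[tuple[str, float]]:
    """Score every document against the query terms and sort best first."""
    tokenized = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    n_docs = len(docs)

    # Document frequency: in how many documents does each term appear?
    df: Counter = Counter()
    for tokens in tokenized.values():
        df.update(set(tokens))

    query_terms = set(tokenize(query))
    scores = []
    for doc_id, tokens in tokenized.items():
        counts = Counter(tokens)
        score = 0.0
        for term in query_terms:
            if counts[term]:
                tf = counts[term] / len(tokens)              # term frequency
                idf = math.log(n_docs / df[term]) + 1.0      # smoothed, never zero
                score += tf * idf
        scores.append((doc_id, score))
    # sorted() is stable, so tied documents keep their input order
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy abstracts, for illustration only
toy_abstracts = {
    "a": "Incubation period of SARS-CoV-2 estimated from case reports",
    "b": "Spike protein structure of coronaviruses and receptor binding",
    "c": "Hospital capacity planning during influenza seasons",
}

for doc_id, score in tfidf_rank(toy_abstracts, "incubation period"):
    print(doc_id, round(score, 3))
```

On tens of thousands of real articles the same idea works, but dedicated libraries and learned embeddings will rank far better than this hand-rolled version.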
Discovery of novel antibiotic Halicin using deep learning
A recent paper, "A Deep Learning Approach to Antibiotic Discovery" (DOI) from Regina Barzilay's group at MIT, has attracted a lot of attention. They used a deep neural network model to predict growth inhibition of Escherichia coli, trained on a collection of 2,335 molecules. The molecules were described using Morgan fingerprints, computed with RDKit using a radius of 2 and 2048-bit fingerprint vectors. Using this methodology they identified the known c-Jun N-terminal kinase inhibitor SU3327, which they renamed Halicin. A quick search using MolSeeker allowed identification of the structure and InChIKey.
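For readers less familiar with fingerprints, the Python sketch below illustrates two ideas behind the 2048-bit Morgan vectors mentioned above: hashed atom-environment identifiers being folded into a fixed number of bits, and Tanimoto similarity between the resulting bit sets. In practice RDKit generates the fingerprint (for example via `AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=2048)`); here the environment identifiers are hypothetical integers and the folding is plain modulo arithmetic, so this is a conceptual sketch rather than a reimplementation of RDKit.

```python
N_BITS = 2048  # fingerprint length quoted in the post


def fold(environment_ids: set[int], n_bits: int = N_BITS) -> set[int]:
    """Fold hashed atom-environment identifiers into a fixed-length bit vector,
    represented here as the set of 'on' bit positions (simple modulo folding)."""
    return {env_id % n_bits for env_id in environment_ids}


def tanimoto(bits_a: set[int], bits_b: set[int]) -> float:
    """Tanimoto similarity: shared on-bits divided by all distinct on-bits."""
    union = bits_a | bits_b
    if not union:
        return 0.0
    return len(bits_a & bits_b) / len(union)


# Hypothetical environment identifiers for two molecules
mol_a = fold({1, 2050, 777, 4096})  # 2050 -> bit 2, 4096 -> bit 0
mol_b = fold({1, 2, 999})

print(sorted(mol_a), sorted(mol_b))
print(f"Tanimoto = {tanimoto(mol_a, mol_b):.2f}")
```

Note how identifier 2050 in one molecule and identifier 2 in the other land on the same bit after folding; such bit collisions are an inherent cost of fixed-length fingerprints and can inflate apparent similarity slightly.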
A search of UniChem using the InChIKey NQQBNZBOOHHVQP-UHFFFAOYSA-N identified a number of other identifiers in different databases.
These included a link to the ChEMBL entry CHEMBL510038, which gives the biological data (0.7 nM inhibition of c-Jun N-terminal kinase in a time-resolved FRET assay) and links to the original 2009 publication (DOI) describing the c-JNK SAR. The compound has a rat half-life of 0.45 h. Another publication that might be of interest is "Discovery of 2-(5-nitrothiazol-2-ylthio)benzo[d]thiazoles as novel c-Jun N-terminal kinase inhibitors" (DOI).
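The InChIKey used for the UniChem search has a fixed layout, which is handy when cross-referencing databases. The Python sketch below splits a standard InChIKey into its parts: a 14-character hash of the connectivity, an 8-character hash of the remaining layers (such as stereochemistry), a standard/non-standard flag, a version character and a protonation indicator. It only checks the format and cannot reverse the hash; the `same_skeleton` helper, which compares first blocks to spot different forms of one skeleton, is our own illustrative addition.

```python
import re

# Standard InChIKey layout: 14 letters - 8 letters + flag + version - protonation letter
INCHIKEY_PATTERN = re.compile(r"^([A-Z]{14})-([A-Z]{8})([SN])([A-Z])-([A-Z])$")


def parse_inchikey(key: str) -> dict:
    """Split an InChIKey into its blocks, raising ValueError if malformed."""
    match = INCHIKEY_PATTERN.match(key.strip())
    if match is None:
        raise ValueError(f"not a well-formed InChIKey: {key!r}")
    connectivity, other_layers, flag, version, protonation = match.groups()
    return {
        "connectivity": connectivity,  # hash of the molecular skeleton
        "other_layers": other_layers,  # hash of stereo, isotopic and similar layers
        "standard": flag == "S",       # 'S' = standard InChI, 'N' = non-standard
        "version": version,            # 'A' = InChI version 1
        "protonation": protonation,    # 'N' = no protons added or removed
    }


def same_skeleton(key_a: str, key_b: str) -> bool:
    """True when two keys share the connectivity block (e.g. stereoisomers)."""
    return parse_inchikey(key_a)["connectivity"] == parse_inchikey(key_b)["connectivity"]


halicin = parse_inchikey("NQQBNZBOOHHVQP-UHFFFAOYSA-N")
print(halicin)
```

For Halicin the second block begins with UHFFFAOY, the value produced when no stereo or isotopic layers are present, which is consistent with an achiral molecule.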
Certainly an interesting approach. I suspect the nitrothiazole functionality would set off a few structural alerts, but there are certainly plenty of similar compounds commercially available that would allow exploration of the SAR without too much investment in resources.
All code and data is available on GitHub and there is also a website where you can test your own molecules http://chemprop.csail.mit.edu.
I just thought I'd mention a couple of meetings I'm helping to organise.
2nd RSC-BMCS / RSC-CICAG Artificial Intelligence in Chemistry
Artificial Intelligence is presently experiencing a renaissance in development of new methods and practical applications to ongoing challenges in Chemistry. Following the success of the inaugural “Artificial Intelligence in Chemistry” meeting in 2018 a second meeting has been organised at Fitzwilliam College, Cambridge (2nd to 3rd September 2019). The lineup is now finalised and looks like a great selection of speakers. There is still time to submit posters (closing date 5th July).
Registration is open and there are discounts for RSC members.
The Twitter hashtag - #AIChem19 is already being actively used.
20th SCI/RSC Medicinal Chemistry Symposium
This is Europe’s premier biennial Medicinal Chemistry event, focussing on first disclosures and new strategies in Medicinal Chemistry. It takes place at Churchill College, Cambridge, UK, 8-11 September 2019. There is a fantastic lineup of speakers and it looks to be one of the highlights of the MedChem calendar. Early career scientists can also take part in a medicinal chemistry workshop on the Sunday afternoon, a great way for people to learn medicinal chemistry and meet other scientists in a fun and informal setting.
You can register here. Both RSC and SCI members get a reduced rate and, despite the slightly confusing page on the SCI website, you don't have to be a member to attend: just select "Event Member FREE" from the dropdown menu and you can register for the event without membership.
Twenty Years of the Rule of Five
It has been over twenty years since Lipinski published his work determining the properties of drug molecules associated with good solubility and permeability. Since then, there have been a number of additions and expansions to these “rules”. There has also been keen interest in the application of these guidelines in the drug discovery process and how these apply to new emerging chemical structures such as macrocycles.
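The original "rules" are simple enough to express in a few lines. The Python sketch below encodes Lipinski's four thresholds (molecular weight over 500, calculated logP over 5, more than 5 hydrogen-bond donors, more than 10 hydrogen-bond acceptors) and the observation that poor absorption or permeation becomes more likely when more than one is exceeded. Descriptor values are assumed to have been calculated elsewhere, the example compounds are hypothetical, and none of the later additions and extensions are included; classes such as macrocycles frequently fall outside these thresholds.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    mol_weight: float      # molecular weight, Da
    clogp: float           # calculated octanol/water logP
    h_bond_donors: int     # count of OH and NH groups
    h_bond_acceptors: int  # count of N and O atoms


def ro5_violations(d: Descriptors) -> list[str]:
    """List which of the four Rule of Five thresholds are exceeded."""
    rules = [
        ("MW > 500", d.mol_weight > 500),
        ("cLogP > 5", d.clogp > 5),
        ("HBD > 5", d.h_bond_donors > 5),
        ("HBA > 10", d.h_bond_acceptors > 10),
    ]
    return [name for name, violated in rules if violated]


def passes_ro5(d: Descriptors, max_violations: int = 1) -> bool:
    """Flag likely absorption problems only when more than one rule is broken."""
    return len(ro5_violations(d)) <= max_violations


# Hypothetical descriptor sets, for illustration only
examples = {
    "hypothetical lead": Descriptors(mol_weight=350.4, clogp=2.5,
                                     h_bond_donors=2, h_bond_acceptors=5),
    "hypothetical macrocycle": Descriptors(mol_weight=812.0, clogp=5.6,
                                           h_bond_donors=4, h_bond_acceptors=12),
}

for name, desc in examples.items():
    verdict = "passes" if passes_ro5(desc) else "fails"
    print(name, ro5_violations(desc), verdict)
```

Keeping the violations as a list rather than a single pass/fail makes it easy to see which property drives a failure, which matters when deciding whether a compound class is a reasonable exception.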
This meeting aims to look at the impact the Ro5 has had on drug discovery, as well as looking to the future and how we use these rules in the changing drug compound landscape as drug discovery moves into novel areas of chemistry.
There is a very exciting group of speakers and the timetable has been designed to allow a panel discussion after each session. Given the topic and the speakers I'm sure these will be entertaining sessions.
You can register here and there are discounts for RSC members
Twitter hashtag - #RuleofFive2019
2nd RSC-BMCS / RSC-CICAG Artificial Intelligence in Chemistry
The lineup for the 2nd RSC-BMCS / RSC-CICAG Artificial Intelligence in Chemistry meeting (Monday-Tuesday, 2nd to 3rd September 2019, Fitzwilliam College, Cambridge, UK) has been updated.
Artificial Intelligence is presently experiencing a renaissance in development of new methods and practical applications to ongoing challenges in Chemistry. Following the success of the inaugural “Artificial Intelligence in Chemistry” meeting in 2018, we are pleased to announce that the Biological & Medicinal Chemistry Sector (BMCS) and Chemical Information & Computer Applications Group (CICAG) of the Royal Society of Chemistry are once again organising a conference to present the current efforts in applying these new methods. The meeting will be held over two days and will cover the application of artificial intelligence and deep machine learning methods to chemistry.
Deep learning applied to ligand-based de novo design: a real-life lead optimization case study, Quentin Perron, IKTOS, France
A Turing test for molecular generators, Jacob Bush, GlaxoSmithKline, UK
Presentation title to be confirmed, Keynote: Regina Barzilay, Massachusetts Institute of Technology, USA
Artificial intelligence for predicting molecular Electrostatic Potentials (ESPs): a step towards developing ESP-guided knowledge-based scoring functions, Prakash Rathi, Astex Pharmaceuticals, UK
Molecular transformer for chemical reaction prediction and uncertainty estimation, Alpha Lee, University of Cambridge, UK
Drug discovery disrupted - quantum physics meets machine learning, Noor Shaker, GTN, UK
Presentation title to be confirmed, Christian Tyrchan, AstraZeneca
Presentation title to be confirmed, Anthony Nicholls, OpenEye Scientific Software, USA
Deep generative models for 3D compound design from fragment screens, Fergus Imrie, University of Oxford, UK
DeeplyTough: learning to structurally compare protein binding sites, Joshua Meyers, BenevolentAI, UK
Presentation title to be confirmed, Maciej Haranczyk, IMDEA, Spain
Deep learning for drug discovery, Keynote: David Koes, University of Pittsburgh, USA
Presentation title to be confirmed, Olexandr Isayev, University of North Carolina at Chapel Hill, USA
Dreaming functional molecules with generative ML models, Christoph Kreisbeck, Kebotix, USA
Presentation title to be confirmed, Keynote: Adrian Roitberg, University of Florida, USA
Applications for poster presentations are welcome; the closing date for submission is 5th July. A number of RSC-BMCS and RSC-CICAG student bursaries are available up to a value of £250, to support registration, travel and accommodation costs for PhD and post-doctoral applicants studying at European academic institutions. The closing date for bursary applications is 15th July.
Full details are on the conference website
Atomwise AIMS awards
I suspect many will have noticed the recent announcement of the Early Results in Drug Discovery Partnership with AI Biotech Company. These are the first results of the Atomwise AIMS awards:
The researchers have been using Atomwise’s AI-powered in silico screening technology to develop therapeutic treatments for, among others, certain types of strokes, hand-foot-and-mouth disease, and an infection that causes reproductive failure in pigs.
The AIMS award program is a great opportunity for university research scientists to easily access AI-assisted structure-based virtual screening technology:
- Customized small molecule virtual screen using AtomNet™ technology
- 72 small molecules predicted to bind to a specific target protein – QC verified by mass spectrometry, resuspended and diluted to a convenient concentration, aliquoted into microtiter plates, and delivered at no cost to the researcher
- Support from Atomwise’s medicinal chemists and structural biologists
- Opportunity to receive up to $30K USD to subsidize assay work
If you have a target protein with an X-ray crystal, Cryo-EM, or NMR structure, or with close sequence homology to a protein with available structures, and an assay in place to evaluate 72 potential hits, then you should consider applying.
Full details are on the AIMS awards page and the closing date is 29 April 2019.
Encouraging early results for a drug discovered by artificial intelligence that delays the onset of Motor Neurone Disease
Motor neurone disease (MND) describes a group of diseases that affect the nerves (motor neurones) in the brain and spinal cord, and it is likely that there are multiple molecular targets. Amyotrophic lateral sclerosis (ALS), also known as Lou Gehrig's disease, is the most common form of MND. Edaravone was recently approved for the treatment of ALS, but its mechanism of action is unknown. It is a free radical scavenger, and oxidative stress has been hypothesised to be part of the process that kills neurones in people with ALS. However, new treatments are urgently needed.
For this reason I was particularly interested to read about a potential novel treatment for ALS arising from work between BenevolentAI and the Sheffield Institute for Translational Neuroscience.
The study, led by Dr. Richard Mead and Dr. Laura Ferraiuolo at SITraN, assessed the efficacy of a drug candidate proposed by BenevolentAI's artificial intelligence technology for Motor Neuron Disease (MND), also known as Amyotrophic Lateral Sclerosis (ALS). SITraN found there are significant and reproducible indications that the drug prevents the death of motor neurones in patient cell models, and delayed the onset of the disease in the gold standard model of ALS… Dr. Richard Mead of SITraN commented: "This is an exciting development in our research for a treatment for ALS. BenevolentAI came to us with some newly identified compounds discovered by their technology - two of which were new to us in the field and, following this research, are now looking very promising. Our plan now is to conduct further detailed testing and continue to quickly progress towards a potential treatment for ALS."
SITraN expect to publish an abstract at the Motor Neurone Disease Association 28th International Symposium in Boston in December 2017.