Fragment-Based Screening Reported Hits
Whilst there are a variety of techniques to measure the properties or diversity of fragment libraries, I thought it might be interesting to look at the profiles of compounds that actually appear as hits in fragment-based screening campaigns. I’ve been compiling a database of compounds that have been reported as hits in the literature. This database now has over 1300 entries culled from over 260 publications directed at nearly 200 different molecular targets using 26 different detection technologies, and might be expected to give some insight into the type of compounds that appear as hits, with the caveat that the dataset only includes information that has been published.
A profile of the physicochemical properties (HBD, HBA, PSA, HAC, LogP, LogD, MWt, RBC) was generated using an AppleScript that uses evaluate from ChemAxon to calculate the physicochemical properties and Aabel to construct the histograms. I also used it to determine pKa in order to identify acidic or basic groups and categorised the fragments accordingly. In addition, I calculated the fraction of aromatic atoms (number of aromatic atoms/number of heavy atoms), since there has been concern about the number of aromatic compounds in fragment collections. I’ve also included npri (normalized ratio of principal moments of inertia) as described by Sauer WH, Schwarz MK (2003) Molecular shape diversity of combinatorial libraries: A prerequisite for broad bioactivity. J Chem Inf Comput Sci 43:987–1003 (DOI); this was calculated using MOE. I’ve recently added “Plane of best fit” (A Novel Method to Characterize the Three-Dimensionality of Molecules, Nicholas C. Firth, Nathan Brown, and Julian Blagg, Journal of Chemical Information and Modeling 2012 52 (10), 2516–2525, DOI).
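As a rough illustration of two of the descriptors above, here is a minimal Python sketch. It is not the AppleScript/ChemAxon/MOE workflow used for the database: the function names are hypothetical, and it assumes the atom counts and the three principal moments of inertia have already been computed by a cheminformatics toolkit.

```python
def fraction_aromatic(n_aromatic_atoms, n_heavy_atoms):
    """Fraction aromatic = number of aromatic atoms / number of heavy atoms."""
    if n_heavy_atoms <= 0:
        raise ValueError("heavy atom count must be positive")
    return n_aromatic_atoms / n_heavy_atoms


def normalized_pmi_ratios(moments):
    """Normalized principal-moment-of-inertia ratios (I1/I3, I2/I3).

    `moments` holds the three principal moments in any order; they are
    sorted so that I1 <= I2 <= I3. On the resulting triangular plot the
    corners are rod (0, 1), disc (0.5, 0.5) and sphere (1, 1).
    """
    i1, i2, i3 = sorted(moments)
    if i3 <= 0:
        raise ValueError("largest principal moment must be positive")
    return i1 / i3, i2 / i3


# Toluene: 6 aromatic carbons out of 7 heavy atoms.
toluene_fraction = fraction_aromatic(6, 7)

# An idealised flat, disc-like shape has I1 = I2 = I3 / 2.
disc_like = normalized_pmi_ratios((5.0, 5.0, 10.0))
```

Points that fall on the edge between the rod and disc corners correspond to the largely planar or linear shapes discussed in this article.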
The results are shown in the collection of histograms below; in general the fragments are within the rule of three guidelines. One notable observation is that around 40% of the hits contain an ionisable group (pKa predicted using ChemAxon tools), and this has an impact when comparing cLogP and cLogD. The molecular weight is almost invariably below 300, with the majority below 250 and a heavy atom count (HAC) of 16 or lower. The lack of rotatable bonds (RBC) and the relatively high Fraction Aromatic scores are consistent with the observation that many of the hits are substituted aromatic or heteroaromatic rings. Looking at the npri plot, the compounds are clustered along the axis between rod-like and disc-like, again supporting the view that the hits have limited 3D shape. The plane of best fit data also underlines the paucity of 3D structures.
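For readers who want to apply the same kind of check to their own collection, the sketch below tests a fragment against the commonly quoted rule of three cut-offs. The thresholds (MWt below 300, cLogP, HBD, HBA and RBC of 3 or fewer, PSA of 60 or less) are the usual literature values rather than anything taken from this database, and the `FragmentProfile` class and example values are purely illustrative.

```python
from dataclasses import dataclass


@dataclass
class FragmentProfile:
    """Precomputed properties for one fragment (hypothetical container)."""
    mol_weight: float
    clogp: float
    hbd: int
    hba: int
    rbc: int
    psa: float


def rule_of_three_violations(p):
    """Return the names of the rule of three criteria the fragment fails."""
    checks = {
        "MWt < 300": p.mol_weight < 300,
        "cLogP <= 3": p.clogp <= 3,
        "HBD <= 3": p.hbd <= 3,
        "HBA <= 3": p.hba <= 3,
        "RBC <= 3": p.rbc <= 3,      # extended criterion
        "PSA <= 60": p.psa <= 60,    # extended criterion
    }
    return [name for name, passed in checks.items() if not passed]


# Made-up values for a small heteroaromatic fragment.
compliant = FragmentProfile(mol_weight=145.2, clogp=1.3, hbd=1, hba=2, rbc=0, psa=38.9)
# Made-up values for a larger, more lipophilic compound.
oversized = FragmentProfile(mol_weight=320.0, clogp=3.6, hbd=2, hba=3, rbc=3, psa=55.0)

print(rule_of_three_violations(compliant))
print(rule_of_three_violations(oversized))
```

Returning the list of failed criteria, rather than a single pass/fail flag, makes it easier to see which property pushes a compound outside the guidelines.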
Approximately 40% of the fragments are predicted to be ionised at physiological pH. This could be due either to the need for high solubility in the screen or to the ionised group providing a key interaction.
This profile needs to be taken with a pinch of salt since you can only test what is available; however, a comparison with the corresponding physicochemical profile of all available fragments (shown below) does highlight some interesting differences. Perhaps the most notable difference is that the proportion of ionisable groups in the published fragments is much higher than that found in the available fragments. This could reflect the need for a strong interaction to aid identification of fragment binding, or it could be that improved solubility enables screening at higher concentrations. In addition, the published fragments tend to be of lower molecular weight and have a greater proportion of aromatic atoms. The importance of "3D" character remains unproven since there are actually few 3D fragments available.
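To make the link between a predicted pKa, ionisation at physiological pH and the gap between cLogP and cLogD more concrete, here is a small sketch based on the Henderson-Hasselbalch relationship. It assumes a single ionisable group, a physiological pH of 7.4, and that only the neutral form partitions into the organic phase. The ChemAxon predictions used for the database handle multiple sites and are more sophisticated; the function names here are hypothetical.

```python
import math

PHYSIOLOGICAL_PH = 7.4  # assumed value for "physiological pH"


def fraction_ionised(pka, kind, ph=PHYSIOLOGICAL_PH):
    """Fraction of a monoprotic acid or base that is ionised at the given pH."""
    if kind == "acid":
        return 1.0 / (1.0 + 10 ** (pka - ph))
    if kind == "base":
        return 1.0 / (1.0 + 10 ** (ph - pka))
    raise ValueError("kind must be 'acid' or 'base'")


def logd_from_logp(logp, pka, kind, ph=PHYSIOLOGICAL_PH):
    """Estimate logD from logP, assuming only the neutral species partitions."""
    if kind == "acid":
        exponent = ph - pka
    elif kind == "base":
        exponent = pka - ph
    else:
        raise ValueError("kind must be 'acid' or 'base'")
    return logp - math.log10(1.0 + 10 ** exponent)


# A carboxylic-acid-like group (illustrative pKa 4.4) is almost fully
# ionised at pH 7.4, so its logD sits about three log units below its logP.
acid_fraction = fraction_ionised(4.4, "acid")
acid_logd = logd_from_logp(2.0, 4.4, "acid")
```

This is why a histogram of cLogD can look quite different from one of cLogP when a large share of the compounds carry an acidic or basic group.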
It will be interesting to see whether the efforts to design fragments with more 3D structure change the profile of the observed hits.
I also clustered the compounds (using Morgan fingerprints) within MOE and the results are shown in the histogram below. Whilst the majority are singletons there are a number of 2- or 4-membered clusters, although it has to be said that fragments will generate rather sparse fingerprints.
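The clustering method and similarity cut-off used in MOE are not stated here, so the following is only a generic sketch of the idea: fingerprints are represented as sets of "on" bit indices, compared by Tanimoto similarity, and grouped with a simple leader algorithm. The 0.7 threshold, the function names and the toy fingerprints are all assumptions for illustration.

```python
from collections import Counter


def tanimoto(a, b):
    """Tanimoto similarity between two fingerprints held as sets of on-bits."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def leader_cluster(fingerprints, threshold=0.7):
    """Assign each fingerprint to the first cluster leader it resembles."""
    clusters = []  # each entry is (leader fingerprint, list of member indices)
    for idx, fp in enumerate(fingerprints):
        for leader, members in clusters:
            if tanimoto(leader, fp) >= threshold:
                members.append(idx)
                break
        else:
            # No sufficiently similar leader found: start a new cluster.
            clusters.append((fp, [idx]))
    return [members for _, members in clusters]


def cluster_size_histogram(clusters):
    """Count how many clusters there are of each size."""
    return Counter(len(members) for members in clusters)


# Toy fingerprints; real Morgan fingerprints would come from a toolkit.
toy_fps = [{1, 2, 3, 4}, {1, 2, 3, 4, 5}, {10, 11}, {20, 21, 22}]
toy_clusters = leader_cluster(toy_fps)
toy_histogram = cluster_size_histogram(toy_clusters)
```

With only a handful of bits set, changing a single bit moves the similarity a long way (for example, `tanimoto({1, 2}, {1, 3})` is one third), which is one reason sparse fragment fingerprints tend to produce many singletons.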
Functional Group Analysis
I used Checkmol to analyse the fragments for the presence of various functional groups and structural elements. Selected results are shown below.
- 1240/1304 contain an aromatic ring, 950 of which are heteroaromatic
- 266 contain an aryl halide, 134 contain a phenol
- 216 contain an acidic group, 236 a basic group
- 28 contain a nitro group
- 178 contain a hydroxy, 126 an ether
- 520 contain an amine, 231 “anilines” (mainly on heteroaromatic systems)
- 186 amides, 42 esters, 23 ureas
The abundance of halides may be a result of the screening technology, with fluorine being exploited in 19F NMR and bromine in X-ray crystallography.
A detailed analysis of the molecular interactions found between ligands and macromolecules has been undertaken in "A systematic analysis of atomic protein–ligand interactions in the PDB" (DOI). Looking at over 11,000 complexes, the authors were able to categorise the 7 most common types of interaction and compare drug-like sized molecules with fragments. Interestingly, hydrogen bonds, π-stacking interactions and salt bridges are more common among the fragment ligands, which is consistent with the high proportion of aromatic rings and ionisable groups found in fragment hits.
Most Common Scaffolds
To identify the most common scaffolds I used sca.svl with MOE. The script finds all scaffolds in a database and writes them to a separate database. Interestingly, the top 10 scaffolds are all aromatic systems.
I’ve also looked to see which of the published fragments are in the commercial fragment collections; the results are shown in Table 1 below. It is clear that some collections contain many more of the published hits than others, and I suspect that this reflects those vendors who were the first to provide solubility data.
A total of 26 different technologies have been employed and, whilst NMR and X-ray have historically been the preferred technologies, thermal shift has become more common recently. It seems that on many occasions thermal shift or SPR are used for a rapid screen and then X-ray or NMR are used to confirm hits.
This is perhaps because these offer the best compromise between speed of throughput and protein requirements.
I’ve also captured the target description and UniProt ID, affinity (and how it was measured) and PDB code, and then added the target type/class using the ChEMBL ontology. As you might expect, the majority of targets explored are enzymes (mainly proteases and kinases).
Interestingly there are a number of fragments that have been identified for multiple targets, and it is notable that many are active against seemingly unrelated proteins. Whilst there might be concern about promiscuous ligands, inspection of the available X-ray structures suggests these fragments form specific binding interactions.
Of the 166 fragment hits identified for kinase targets, 49 are in the PDB and the majority appear to bind to the hinge region. A more detailed examination of the remaining kinase fragments is underway.
How well do they bind?
As you might expect, fragments have modest affinity. This is not measured for all examples, but the vast majority are in the range 1–1000 µM. A number of different technologies have been used to measure affinity, but bioassay data is the most commonly reported.
There is evidence from the literature that different technologies can identify the same hits for a single target, and no evidence that the detection technology influences the physicochemical properties of the hits identified. Some technologies (e.g. SPR) are thought to have a higher false-positive rate.
How Are Fragments Optimised?
A recent paper (J. Med. Chem. 2013, 56, 2478−2486, DOI) looked at the different ways that the initial fragments were subsequently optimised, examining a variety of physicochemical properties and ligand efficiency indices; in particular, the authors emphasise the need to use size-independent ligand efficiency measures. The lipophilicity metrics logP and LELP were found to be particularly useful, as all clinical candidates that arose from fragment-based programs were located in, or near to, the logP 0−5 and LELP 0−10 space.
LELP = (log P / LE)
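As a worked example of the formula above, the sketch below computes ligand efficiency (LE) as the binding free energy per heavy atom and then LELP from it. It assumes the affinity is expressed as a dissociation constant in mol/L at 298.15 K and that LE is in kcal/mol per heavy atom; the paper may derive LE from other potency measures such as IC50, and the function names and example values are illustrative only.

```python
import math

GAS_CONSTANT_KCAL = 0.0019872  # kcal / (mol * K)


def ligand_efficiency(kd_molar, heavy_atom_count, temperature=298.15):
    """LE = -deltaG / HAC, with deltaG = RT ln(Kd), in kcal/mol per heavy atom."""
    if kd_molar <= 0 or heavy_atom_count <= 0:
        raise ValueError("Kd and heavy atom count must be positive")
    delta_g = GAS_CONSTANT_KCAL * temperature * math.log(kd_molar)
    return -delta_g / heavy_atom_count


def lelp(logp, le):
    """LELP = logP / LE."""
    if le <= 0:
        raise ValueError("LE must be positive")
    return logp / le


# Illustrative fragment: Kd = 100 uM, 13 heavy atoms, logP = 1.5.
example_le = ligand_efficiency(100e-6, 13)
example_lelp = lelp(1.5, example_le)
print(round(example_le, 2), round(example_lelp, 1))
```

For this made-up fragment LE comes out at roughly 0.42 and LELP at roughly 3.6, which would sit inside the LELP 0−10 space mentioned above.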
Several conclusions were then drawn.
Hit detection methods were found to affect hit quality and biochemical hits were found to be especially advantageous. Biochemical hits not only exhibit better affinity and lipophilicity indices over NMR and X-ray hits, but compounds optimized from them preserve their advantageous properties over those optimized primarily from NMR and partly from X-ray hits.
It appears that the use of atomic level structural information of the target (most often from X-ray but also from NMR) advantageously influences compound properties.
The dataset includes work from universities (18%), Large Pharma (45%) and small/medium pharma (37%), with the small/medium pharma producing results that were superior to those from Large Pharma and universities, probably because protein structural information plays a critical role in the small/medium pharma companies.
Whilst getting fragment hits has now become a relatively predictable process, a general consensus from scientists working in the area is that the key to successful prosecution of fragment-based drug design is the availability of structural information. So whilst there are a steadily increasing number of technologies available for the initial screening of fragments, generating the protein crystal structure is still a critical and often rate-limiting step in the process. Diamond provide a "Tips and tricks" page based on their (and SGC) experience. It might also be worth reading "High-throughput production of human proteins for crystallization: The SGC experience" (DOI) and "Lessons from high-throughput protein crystallization screening: 10 years of practical experience" (DOI).
Updated 11 November 2018