Cambridge MedChem Consulting

Hit Finding Strategies

“The single most important factor determining the likelihood of success of a project is the quality of the starting lead”, Anon

In an analysis of 156 clinical candidates published in the Journal of Medicinal Chemistry between 2018 and 2021 ("An Analysis of Successful Hit-to-Clinical Candidate Pairs", DOI), the source of the initial hit was identified. The results are shown in the plot below.


The authors grouped the lead sources into six categories according to the initial hit-finding strategy.

The popularity of known molecules as starting points might at first seem surprising, but this category includes examples where the aim is to reduce an unexpected off-target activity/toxicity or drug-drug interaction, to respond to resistance caused by mutations in the target protein, or to combine two different biological activities in a single molecule. Interestingly, despite much recent interest, there appear to be few examples of a phenotypic screen being used to find hits. The discovery of ruzasvir (MK-8408), an HCV NS5A inhibitor (DOI), is one example, but it is worth noting this comment in the publication: "The exact mechanism of NS5A inhibition remains unclear and is poorly understood".

The analysis also highlighted the distribution of target classes, with kinases (31%) the most popular, followed by other enzymes (28%), GPCRs (10%) and ion channels (5%). Emerging areas include protein-protein interactions and epigenetic targets; both include many targets with open, shallow binding sites that require larger molecules to achieve high-affinity binding.

I calculated the physicochemical properties of both the hits and the clinical candidates.
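In practice properties such as MW, cLogP and HBD/HBA counts are computed from structures with a cheminformatics toolkit, and the text does not say which tool was used here. As a dependency-free illustration of the simplest of these, the sketch below computes average molecular weight from a molecular formula. The function name and the small atomic-weight table are assumptions for illustration only, and the parser deliberately ignores parentheses, hydrates and charges.

```python
import re

# Average atomic weights (rounded conventional values) for elements common in
# drug-like molecules; this table is illustrative and can be extended.
ATOMIC_WEIGHTS = {
    "C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999,
    "F": 18.998, "S": 32.06, "Cl": 35.45, "Br": 79.904,
}

def molecular_weight(formula: str) -> float:
    """Average molecular weight from a simple formula such as 'C9H8O4'.

    Hypothetical helper: only element symbols followed by optional counts
    are understood (no parentheses, hydrates or charges).
    """
    total = 0.0
    # Each match is (element symbol, count string); an empty count means 1.
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        if element not in ATOMIC_WEIGHTS:
            raise ValueError(f"no atomic weight for element {element!r}")
        total += ATOMIC_WEIGHTS[element] * (int(count) if count else 1)
    return total

print(round(molecular_weight("C9H8O4"), 2))  # aspirin → 180.16
```

Note that "Cl" is matched as chlorine because the pattern takes an uppercase letter plus an optional lowercase one.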



An analysis of the physicochemical properties of the hit-to-clinical candidate pairs shows an average increase in molecular weight (ΔMW = +85) but little change in lipophilicity (ΔclogP = −0.3), although there are exceptions. Interestingly, the number of compounds containing ionisable groups increased, as did the number of hydrogen-bond donors (HBD). The majority (>50%) of clinical candidates were structurally very different from their starting points and were more complex.
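The ΔMW and ΔclogP figures are averages of per-pair differences (candidate minus hit). A minimal sketch of that calculation is shown below; the `Compound` class, the `mean_deltas` function and all property values are hypothetical, invented only to exercise the code, and do not come from the published dataset.

```python
from dataclasses import dataclass
from statistics import mean

@dataclass
class Compound:
    mw: float     # molecular weight (Da)
    clogp: float  # calculated logP
    hbd: int      # hydrogen-bond donor count

# Hypothetical (hit, clinical candidate) pairs; invented values for illustration.
pairs = [
    (Compound(mw=310.4, clogp=3.1, hbd=1), Compound(mw=415.5, clogp=2.6, hbd=2)),
    (Compound(mw=285.3, clogp=2.4, hbd=2), Compound(mw=352.4, clogp=2.7, hbd=2)),
    (Compound(mw=350.2, clogp=4.0, hbd=0), Compound(mw=430.3, clogp=3.5, hbd=1)),
]

def mean_deltas(pairs):
    """Average change (candidate minus hit) for each property across all pairs."""
    return {
        "dMW": mean(cand.mw - hit.mw for hit, cand in pairs),
        "dClogP": mean(cand.clogp - hit.clogp for hit, cand in pairs),
        "dHBD": mean(cand.hbd - hit.hbd for hit, cand in pairs),
    }

d = mean_deltas(pairs)
print({k: round(v, 2) for k, v in d.items()})
# → {'dMW': 84.1, 'dClogP': -0.23, 'dHBD': 0.67}
```

Averaging per-pair differences, rather than comparing the mean of all hits with the mean of all candidates, keeps each hit matched to the candidate derived from it; for a plain mean the two give the same number, but the paired form also lets you inspect the spread and the exceptions.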

This comparison of hit-to-drug pairs largely mirrors the results of an analysis of W. Sneader’s book “Drug Prototypes and Their Exploitation” (DOI), with data from 480 case histories shown below.


These trends appear to be continuing. Looking at the properties of drugs published since the rule of 5 paper (we of course don’t know about all the failures), molecular weight has continued to increase and cLogP appears to have plateaued at around 4. Whilst the number of HBA has increased, there has been only a marginal increase in HBD.
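For reference, the rule of 5 flags compounds with MW > 500, cLogP > 5, more than 5 HBD or more than 10 HBA (Lipinski counted NH and OH groups as donors and N and O atoms as acceptors). The sketch below simply lists which criteria a compound breaks, given properties already calculated elsewhere; the function name is hypothetical, and the common practice of worrying mainly when more than one criterion is violated is left to the caller.

```python
def ro5_violations(mw: float, clogp: float, hbd: int, hba: int) -> list[str]:
    """Return the rule-of-5 criteria that a compound violates (hypothetical helper)."""
    checks = {
        "MW > 500": mw > 500,
        "cLogP > 5": clogp > 5,
        "HBD > 5": hbd > 5,
        "HBA > 10": hba > 10,
    }
    # Dict order is preserved, so violations are reported in a fixed order.
    return [name for name, failed in checks.items() if failed]

print(ro5_violations(mw=612.7, clogp=4.1, hbd=2, hba=9))        # → ['MW > 500']
print(len(ro5_violations(mw=640.0, clogp=5.6, hbd=3, hba=8)) > 1)  # → True
```

Applied to drugs published over time, a rising MW with a flat cLogP shows up as an increasing share of compounds whose only violation is "MW > 500".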


Hit identification

The hit confirmation phase is as follows:

Building up a sample collection for high-throughput screening is a major undertaking, and for a small company or academic group, submitting a proposal to the European Lead Factory (ELF) might be an attractive alternative. I've written a review of the ELF here.

There is an editorial in ACS Central Science DOI that I would encourage everyone involved in hit identification to read.

A couple of quotes will give you an idea of the content:

Alarmingly, up to 80–100% of initial hits from screening can be artefacts if appropriate control experiments are not employed.

it is important to realise that no PAINS-containing drug has ever been developed starting from a protein-reactive PAINS target-based screening hit

They also emphasise the critical need for experimental validation of any screening hit.

Such validation experiments include classic dose response curves, lack of incubation effects, imperviousness to mild reductants, and specificity versus counter-screening targets. If a molecule is flagged as a potential PAINS or aggregator using published patterns but is well-behaved by these criteria, it may be a true, well-behaved ligand. Ultimately, genuine SAR combined with careful mechanistic study provides the most convincing evidence for a specific interaction. Covalent and spectroscopic interference molecules act via specific physical mechanisms, for which controls are known. Colloidal aggregation, fortunately, is readily identified by rapid mechanistic tests and by counter-screening.
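Two widely used signs of colloidal aggregation are an unusually steep dose-response curve (high Hill slope) and a loss of potency when a non-ionic detergent is added to the assay buffer. The sketch below fits a simple Hill (logistic) curve by brute-force grid search and flags both. The function names and thresholds (Hill slope > 1.5, more than a 3-fold IC50 shift with detergent) are illustrative assumptions rather than values from the editorial, the top and bottom plateaus are fixed at 100% and 0% to keep it dependency-free, and a real analysis would use a proper nonlinear least-squares fit.

```python
def hill(conc, ic50, slope, top=100.0, bottom=0.0):
    """Logistic dose-response: % activity remaining at inhibitor concentration conc (M)."""
    return bottom + (top - bottom) / (1.0 + (conc / ic50) ** slope)

def fit_ic50_slope(concs, responses):
    """Least-squares grid search over log10(IC50) and Hill slope (plateaus fixed)."""
    best = None
    for i in range(-900, -299):      # log10(IC50) from -9.00 to -3.00 (molar)
        ic50 = 10 ** (i / 100)
        for j in range(5, 41):       # Hill slope from 0.5 to 4.0
            slope = j / 10
            sse = sum((r - hill(c, ic50, slope)) ** 2 for c, r in zip(concs, responses))
            if best is None or sse < best[0]:
                best = (sse, ic50, slope)
    return best[1], best[2]

def flag_hit(ic50_no_det, ic50_det, slope, max_slope=1.5, max_shift=3.0):
    """Hypothetical triage rule: list warning signs suggestive of aggregation."""
    flags = []
    if slope > max_slope:
        flags.append("steep Hill slope")
    if ic50_det / ic50_no_det > max_shift:
        flags.append("potency lost with detergent")
    return flags

# Synthetic, noise-free curves: 1 nM to ~20 uM in 3-fold steps.
concs = [1e-9 * 3 ** k for k in range(10)]
no_det = [hill(c, 1e-6, 3.0) for c in concs]     # steep curve without detergent
with_det = [hill(c, 1e-5, 1.0) for c in concs]   # right-shifted curve with detergent
ic50_a, slope_a = fit_ic50_slope(concs, no_det)
ic50_b, _ = fit_ic50_slope(concs, with_det)
print(flag_hit(ic50_a, ic50_b, slope_a))
# → ['steep Hill slope', 'potency lost with detergent']
```

A flag is a prompt for the mechanistic follow-up described above, not a verdict; some genuine inhibitors have steep curves for other reasons.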

In addition, you need to confirm compound identity and purity; reproducing the activity with an authentic sample is essential.
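Two routine numbers behind an identity and purity check are the HPLC area-percent purity and the mass error of the observed [M+H]+ ion against the proposed structure. A minimal sketch is shown below, assuming a 95% purity cut-off and a 5 ppm mass tolerance (both are common choices, not requirements stated here); the function names and the example masses are hypothetical. Area-percent purity assumes that all components respond similarly at the detection wavelength.

```python
PROTON_MASS = 1.007276  # Da, added to the neutral monoisotopic mass for [M+H]+

def area_percent_purity(peak_areas, main_index=0):
    """HPLC purity: main peak area as a percentage of the total integrated area."""
    return 100.0 * peak_areas[main_index] / sum(peak_areas)

def mass_error_ppm(observed_mz, monoisotopic_mass):
    """Error of an observed [M+H]+ m/z relative to the expected value, in ppm."""
    expected = monoisotopic_mass + PROTON_MASS
    return (observed_mz - expected) / expected * 1e6

def sample_ok(peak_areas, observed_mz, monoisotopic_mass, min_purity=95.0, max_ppm=5.0):
    """Hypothetical pass/fail rule combining purity and mass accuracy."""
    return (area_percent_purity(peak_areas) >= min_purity
            and abs(mass_error_ppm(observed_mz, monoisotopic_mass)) <= max_ppm)

# Hypothetical sample: three integrated peaks, proposed monoisotopic mass 300.1474 Da.
areas = [950.0, 30.0, 20.0]
print(area_percent_purity(areas))                   # → 95.0
print(round(mass_error_ppm(301.1550, 300.1474), 2))  # → 1.08
print(sample_ok(areas, 301.1550, 300.1474))          # → True
```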

Whilst time-consuming, this validation work will save a fortune in the future.

Updated 25 May 2023