Hit Identification

“The single most important factor determining the likelihood of success of a project is the quality of the starting lead”, Anon

In an analysis of 156 published clinical candidates from the Journal of Medicinal Chemistry between 2018 and 2021 (“An Analysis of Successful Hit-to-Clinical Candidate Pairs”, DOI), the source of the initial hit was identified. The results are shown in the plot below.

The lead sources were grouped into six categories according to the initial hit-finding strategy.

Known: this might be the endogenous ligand or a molecule taken from the published literature or patents.
Random Screen: usually a high-throughput screen of a large compound collection.
Structure-based drug design (SBDD): in silico screening of compound collections, including use of the 3D structure of the target protein.
Directed Screen: screening of smaller sets of compounds selected on the basis of prior knowledge of the target or chemical class. Also known as focused, targeted or biased screening.
Fragment Screen: typically libraries of a few thousand compounds or fewer, of low molecular weight (<200 Da), screened at high concentration.
DNA-encoded library: screening of very large collections (10^8) of small molecules, using a technology in which each molecule is conjugated to a DNA tag.

The popularity of known molecules as starting points might at first seem surprising, but this category includes examples where the aim is to reduce an unexpected off-target activity, toxicity or drug-drug interaction, to respond to resistance caused by mutations in the target protein, or to combine two different biological activities in a single molecule. Interestingly, despite much recent interest, there appear to be few examples of a phenotypic screen being used to find hits. The discovery of ruzasvir (MK-8408), an HCV NS5A inhibitor (DOI), is one example, but it is worth noting this comment in the publication: “The exact mechanism of NS5A inhibition remains unclear and is poorly understood”.

The analysis also highlighted the distribution of target classes, with kinases (31%) the most popular, followed by other enzymes (28%), GPCRs (10%) and ion channels (5%). Emerging areas highlighted include protein-protein interactions and epigenetic targets; both areas include many targets with open, shallow binding sites that require larger molecules to achieve high-affinity binding.

I calculated the physicochemical properties of both the hits and the clinical candidates.

An analysis of the physicochemical properties of the hit-to-clinical pairs shows an average increase in molecular weight (ΔMW = +85 Da) but little change in lipophilicity (ΔclogP = −0.3), although exceptions are noted. Interestingly, the number of compounds containing ionisable groups increased, as did the number of hydrogen bond donors (HBD). The majority (>50%) of clinical candidates were structurally very different from their starting point and were more complex.
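As a rough illustration of how such average property changes can be computed, here is a minimal Python sketch. The `Props` class, the `mean_deltas` function and the two hit/candidate pairs are hypothetical and invented for illustration; they are not data from the published analysis, and in practice the property values would come from a cheminformatics toolkit.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Props:
    """Pre-computed properties for one molecule (hypothetical container)."""
    mw: float     # molecular weight in Da
    clogp: float  # calculated logP
    hbd: int      # number of hydrogen bond donors


def mean_deltas(pairs):
    """Return the mean (candidate - hit) change for each property."""
    return {
        "dMW": mean(c.mw - h.mw for h, c in pairs),
        "dclogP": mean(c.clogp - h.clogp for h, c in pairs),
        "dHBD": mean(c.hbd - h.hbd for h, c in pairs),
    }


# Invented (hit, candidate) pairs, for illustration only
pairs = [
    (Props(320.0, 3.0, 1), Props(410.0, 2.5, 2)),
    (Props(280.0, 2.0, 2), Props(360.0, 2.0, 2)),
]
deltas = mean_deltas(pairs)
print(deltas)
```

Reporting the spread of the changes alongside the mean would show the exceptions that an average hides.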

This comparison of hit-to-drug pairs largely mirrors the results of an analysis of W. Sneader’s book “Drug Prototypes and Their Exploitation” (DOI), with data from 480 case histories shown below.

Whilst the rule of five proposed by Chris Lipinski is a useful rule of thumb for the physicochemical properties of “drug-like” molecules, it is worth bearing in mind that the physicochemical properties of a molecule will change during the optimisation process. It has been proposed that high-throughput screening decks should be biased toward lower molecular weight and lipophilicity, leaving room for both to increase during optimisation so that the resulting drug development candidate is still drug-like. Hence the rule of five has been extended to the rule of three (RO3) for defining lead-like compounds. [12]

A rule of three compliant compound is defined as one that has:

not more than 3 rotatable bonds
octanol-water partition coefficient log P not greater than 3
molecular mass less than 300 daltons
not more than 3 hydrogen bond donors
not more than 3 hydrogen bond acceptors
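The criteria above translate directly into a simple filter. The following is a minimal Python sketch, assuming the descriptors have already been calculated elsewhere (for example with a cheminformatics toolkit). The `Descriptors` class, the function names and the two example molecules are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptors:
    """Pre-computed descriptors for one compound (hypothetical container)."""
    mol_weight: float     # molecular mass in daltons
    logp: float           # octanol-water partition coefficient, log P
    h_donors: int         # hydrogen bond donors
    h_acceptors: int      # hydrogen bond acceptors
    rotatable_bonds: int  # rotatable bonds


def ro3_violations(d):
    """Return the rule-of-three criteria that the compound fails."""
    checks = {
        "rotatable bonds <= 3": d.rotatable_bonds <= 3,
        "log P <= 3": d.logp <= 3,
        "molecular mass < 300 Da": d.mol_weight < 300,
        "H-bond donors <= 3": d.h_donors <= 3,
        "H-bond acceptors <= 3": d.h_acceptors <= 3,
    }
    return [name for name, passed in checks.items() if not passed]


def is_ro3_compliant(d):
    """True when none of the five criteria is violated."""
    return not ro3_violations(d)


# Invented descriptor values, for illustration only
fragment = Descriptors(180.0, 1.2, 1, 3, 2)
larger = Descriptors(420.0, 4.1, 2, 6, 7)
print(is_ro3_compliant(fragment), ro3_violations(larger))
```

Returning the list of failed criteria, and not only a yes/no answer, makes it easier to see why a compound was excluded from a screening deck.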

Hit identification

The hit confirmation phase proceeds as follows:

Exclusion of hits with potential reactivity, assay interference or aggregation
Re-testing: compounds that were found active against the selected target are re-tested using the same assay conditions used during the HTS.
Dose response curve generation: an IC50 or EC50 value is then generated
Analogue testing: if related analogues are available, test them to check for genuine structure-activity relationships (SAR).
Check for irreversible binding
Orthogonal testing: Confirmed hits are assayed using a different assay which is usually closer to the target physiological condition or using a different technology.
Secondary screening: Confirmed hits are tested in a functional assay (agonist/antagonist) or in a cellular environment.
Assessment of drug-like properties using computational analysis and early physicochemical and ADME measurements
Chemical tractability: Medicinal chemists will evaluate compounds according to their synthetic feasibility and flexibility towards chemical diversification or library synthesis.
Intellectual Property evaluation: Hit compound structures are quickly checked in specialized databases to define patentability and freedom to operate.
Hit ranking and clustering, preliminary SAR.
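To make the dose-response step concrete, here is a minimal Python sketch of a four-parameter logistic (Hill) response and a crude IC50 estimate by log-linear interpolation between the two concentrations that bracket 50% activity. The function names and the synthetic data are hypothetical. A real analysis would normally fit the full curve by nonlinear regression; this sketch does not do that.

```python
import math


def hill_response(conc, ic50, hill=1.0, top=100.0, bottom=0.0):
    """Percent activity remaining at inhibitor concentration `conc`.

    `conc` and `ic50` must be in the same units.
    """
    return bottom + (top - bottom) / (1.0 + (conc / ic50) ** hill)


def estimate_ic50(concs, responses, half=50.0):
    """Interpolate, on a log10 concentration scale, where activity crosses `half`.

    Returns None when the responses never cross that level.
    """
    points = sorted(zip(concs, responses))
    for (c_lo, r_lo), (c_hi, r_hi) in zip(points, points[1:]):
        if r_lo >= half >= r_hi and r_lo != r_hi:
            frac = (r_lo - half) / (r_lo - r_hi)
            log_c = math.log10(c_lo) + frac * (math.log10(c_hi) - math.log10(c_lo))
            return 10 ** log_c
    return None


# Synthetic eight-point curve (concentrations in µM), true IC50 set to 0.5 µM
concs = [0.01, 0.03, 0.1, 0.3, 1.0, 3.0, 10.0, 30.0]
responses = [hill_response(c, ic50=0.5) for c in concs]
estimate = estimate_ic50(concs, responses)
print(estimate)

# A compound showing no inhibition gives no estimate
inactive = estimate_ic50(concs, [100.0] * len(concs))
```

An estimate of None, or a curve that does not approach both plateaus, is a prompt to re-test over a wider concentration range before ranking the hit.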

Building up a sample collection for high-throughput screening is a major undertaking, and for a small company or academic group it might not be a cost-effective investment if they are only running one screen a year. Many CROs now offer high-throughput screening, but it is worth evaluating their screening deck beforehand. Fragment-based screening may be an attractive alternative.

There are also opportunities for grant-funded screening, for example https://www.find-government-grants.service.gov.uk/grants/small-molecule-high-throughput-screen-using-astrazeneca-facilities-grant-1. There is also EU-OPENSCREEN: https://www.eu-openscreen.eu.

There is an editorial in ACS Central Science DOI that I would encourage everyone involved in hit identification to read.

A couple of quotes will give you an idea of the content:

“Alarmingly, up to 80–100% of initial hits from screening can be artefacts if appropriate control experiments are not employed.”

“it is important to realise that no PAINS-containing drug has ever been developed starting from a protein-reactive PAINS target-based screening hit”

They also emphasise the critical need for experimental validation for any screening hit.

Such validation experiments include classic dose response curves, lack of incubation effects, imperviousness to mild reductants, and specificity versus counter-screening targets. If a molecule is flagged as a potential PAINS or aggregator using published patterns but is well-behaved by these criteria, it may be a true, well-behaved ligand. Ultimately, genuine SAR combined with careful mechanistic study provides the most convincing evidence for a specific interaction. Covalent and spectroscopic interference molecules act via specific physical mechanisms, for which controls are known. Colloidal aggregation, fortunately, is readily identified by rapid mechanistic tests and by counter-screening.
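One way to record such control experiments is to compare the IC50 measured under each control condition with the value from the standard assay and flag large shifts. The Python sketch below assumes this approach. The condition names, the example values and the 3-fold threshold are hypothetical choices made for illustration; they are not recommendations from the editorial.

```python
def fold_shift(ic50_reference, ic50_test):
    """Fold change in IC50 between two conditions, always reported as >= 1."""
    ratio = ic50_test / ic50_reference
    return max(ratio, 1.0 / ratio)


def flag_controls(ic50_by_condition, reference="standard", max_fold=3.0):
    """Return the control conditions whose IC50 differs from the reference
    by more than `max_fold` (an arbitrary, assumed threshold)."""
    ref = ic50_by_condition[reference]
    return sorted(
        name
        for name, value in ic50_by_condition.items()
        if name != reference and fold_shift(ref, value) > max_fold
    )


# Invented IC50 values in µM for one hit, for illustration only
hit = {
    "standard": 1.0,
    "pre-incubated": 0.9,    # checks for incubation effects
    "plus reductant": 12.0,  # checks sensitivity to a mild reductant
}
flags = flag_controls(hit)
print(flags)
```

A flagged condition is not proof of an artefact; it marks where a follow-up mechanistic experiment is needed.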

In addition, you need to consider compound identity and purity; reproducing the activity with an authentic sample is essential.

Whilst time-consuming, this validation work will save a fortune in the future.

April 2, 2026

Cambridge MedChem Consulting
