Lipophilicity is possibly the most important physicochemical property of a potential drug. It plays a role in solubility, absorption, membrane penetration, plasma protein binding, distribution, CNS penetration and partitioning into other tissues or organs such as the liver, and it has an impact on the routes of clearance. It is also important in ligand recognition, not only at the target protein but also in CYP450 interactions, hERG binding, and PXR-mediated enzyme induction.
LogP is a component of Lipinski’s Rule of 5, a rule of thumb for predicting solubility and permeability that has become a surrogate for drug-likeness.
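As a rough illustration, the Rule of 5 can be written as a simple filter. The sketch below assumes the commonly quoted cut-offs (molecular weight no more than 500, logP no more than 5, no more than 5 hydrogen-bond donors and no more than 10 hydrogen-bond acceptors). The function name and the example inputs are hypothetical, and the descriptor values are assumed to have been calculated elsewhere.

```python
def rule_of_five_violations(mol_weight, logp, h_bond_donors, h_bond_acceptors):
    """Return the list of Rule of 5 criteria that a compound breaks."""
    violations = []
    if mol_weight > 500:
        violations.append("molecular weight > 500")
    if logp > 5:
        violations.append("logP > 5")
    if h_bond_donors > 5:
        violations.append("H-bond donors > 5")
    if h_bond_acceptors > 10:
        violations.append("H-bond acceptors > 10")
    return violations


# Hypothetical descriptor values, for illustration only
print(rule_of_five_violations(350.4, 2.8, 2, 5))   # no criteria broken
print(rule_of_five_violations(620.0, 6.1, 3, 8))   # too heavy and too lipophilic
```

Returning the list of broken criteria, rather than a simple pass/fail, makes it easier to see whether lipophilicity is the reason a compound is flagged.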
The Developability score (DOI) identifies four distinct cLogP/molecular weight regions that define optimal and sub-optimal chemical space, together with a developability score derived from regression models using solubility, permeability, protein binding and CYP3A4 inhibition screening data. Whilst the sector MWt <400, cLogP <4 offered the greatest chance of success, it was noted that even the MWt >400, cLogP >4 sector included some developable molecules, albeit with a much lower chance of success.
The most commonly used measure of lipophilicity is LogP, the logarithm of the partition coefficient of a molecule between a lipophilic and an aqueous phase, usually octanol and water.
LogP can be measured in a variety of ways. The most common is the shake-flask method, which consists of dissolving some of the solute in question in a volume of octanol and water, shaking for a period of time, then measuring the concentration of the solute in each phase. This can be time-consuming, particularly if there is no quick spectroscopic method for measuring the concentration of the molecule in the two phases. A faster method of log P determination makes use of high-performance liquid chromatography: the log P of a solute can be determined by correlating its retention time with those of similar compounds with known log P values (doi).
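Both measurements reduce to a small amount of arithmetic, sketched below. The shake-flask value is simply the log of the concentration ratio. For the HPLC method the sketch assumes a linear calibration of logP against the log of the capacity factor k = (tR - t0)/t0, which is one common way of using retention times; the function names, the dead time and the calibration standards are all hypothetical.

```python
import math


def shake_flask_logp(conc_octanol, conc_water):
    """logP from the solute concentration measured in each phase (same units)."""
    return math.log10(conc_octanol / conc_water)


def log_capacity_factor(retention_time, dead_time):
    """log10 of the capacity factor k = (tR - t0) / t0."""
    return math.log10((retention_time - dead_time) / dead_time)


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def calibrate_hplc(standards, dead_time):
    """Fit logP = slope * log k + intercept from (retention_time, known_logp) pairs."""
    log_ks = [log_capacity_factor(t, dead_time) for t, _ in standards]
    logps = [logp for _, logp in standards]
    return fit_line(log_ks, logps)


# Shake-flask: a 100-fold higher concentration in octanol gives logP = 2
print(shake_flask_logp(conc_octanol=200.0, conc_water=2.0))

# HPLC: hypothetical standards as (retention time in minutes, known logP)
standards = [(2.0, 1.0), (4.0, 2.0), (8.0, 3.0)]
slope, intercept = calibrate_hplc(standards, dead_time=1.0)
unknown_logp = slope * log_capacity_factor(5.0, dead_time=1.0) + intercept
print(round(unknown_logp, 2))
```

The calibration is only as good as the similarity between the standards and the unknown, which is why the standards should be structurally related compounds.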
Calculation of lipophilicity
It is usually not practical to determine the LogP of every compound experimentally, and it may be of interest to calculate logP prior to synthesis, so calculated values are used. A number of software tools are available, both desktop and online (don’t use the online tools for confidential compounds).
Many of these applications use a large training set of known values to determine fragment contributions for sub-structures and functional groups. However, logP is not a simple additive property, and correction terms are needed to allow for proximity effects, H-bonding, electronic effects etc., as shown in the examples below.
For unknown functional groups the programs often approximate using individual atom contributions.
The different methodologies for calculating logP can be divided into three approaches:
Atomic (e.g. “AlogP”) and Enhanced Atomic / Hybrid (“XlogP”, “SlogP”)
Fragment (“ClogP”, “KlogP”, “ACD/logP”)
Property-based methods (“MlogP”, “VlogP”, “MClogP”, “TlogP”)
Atomic logP considers that each atom makes a contribution to the logP, and that the contributions to the final value are purely additive. However, it is clear that a nitrogen in an amide is different from a nitrogen in an amine or pyridine, so Enhanced Atomic methods take the atom type into account.
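The difference between the two atomic schemes can be sketched in a few lines. The contribution values below are placeholders chosen purely for illustration and are not taken from AlogP, XlogP, SlogP or any other published parameter set; the atom-type labels and function name are likewise hypothetical, and only heavy atoms are considered.

```python
# Placeholder contributions for illustration only: these numbers are NOT taken
# from AlogP, XlogP, SlogP or any other published parameter set.
ELEMENT_CONTRIBUTIONS = {"C": 0.25, "N": -1.0, "O": -0.5}

ATOM_TYPE_CONTRIBUTIONS = {
    "C.aliphatic": 0.25,
    "C.aromatic": 0.25,
    "C.carbonyl": 0.0,
    "N.amine": -1.0,
    "N.amide": -0.75,
    "N.pyridine": -0.5,
    "O.carbonyl": -0.5,
}


def additive_logp(atoms, contributions):
    """Sum the contribution of every heavy atom in the list."""
    return sum(contributions[atom] for atom in atoms)


# Element-only scheme: every nitrogen contributes the same amount
print(additive_logp(["N"], ELEMENT_CONTRIBUTIONS))

# Atom-typed scheme: amine, amide and pyridine nitrogens can differ
for n_type in ("N.amine", "N.amide", "N.pyridine"):
    print(n_type, additive_logp([n_type], ATOM_TYPE_CONTRIBUTIONS))
```

In a real tool the atom typing is done automatically from the structure and the contributions are fitted to a training set of measured values.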
Fragment methods use a large training data-set of known values to determine fragment contributions for sub-structures and functional groups, together with correction terms to account for proximity effects. These methods often fall back on atomic models for novel functional groups.
Property-based methods tend to be computationally demanding and are not really suitable for processing large datasets.
Because the training sets and the algorithms vary between applications, it is very important not to combine calculated results from different tools.
Some of the tools allow the user to extend the training set using in-house measured values; this may be critical when exploring novel functional groups or scaffolds.
However, the majority of known drugs contain ionisable groups and are likely to be charged at physiological pH, as shown in the histogram below of the distribution of small-molecule drugs in DrugBank. LogP only correctly describes the partition coefficient of neutral (uncharged) molecules.
LogD, the logarithm of the distribution coefficient, is a better descriptor of the lipophilicity of an ionisable molecule. It can be determined in a similar manner to LogP, but instead of using water the aqueous phase is buffered to a specific pH. LogD is thus pH dependent, so one must specify the pH at which the log D was measured. Of particular interest is the log D at pH = 7.4 (the physiological pH of blood serum).
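For a compound with a single ionisable group, the pH dependence can be estimated from logP and pKa if one assumes that only the neutral form partitions into octanol. The sketch below uses that simplifying assumption; the function names and the example logP and pKa values are hypothetical, and real tools treat multiple ionisation sites and ion partitioning in more detail.

```python
import math


def logd_monoprotic_acid(logp, pka, ph):
    """logD of an acid HA, assuming only the neutral form enters octanol."""
    return logp - math.log10(1 + 10 ** (ph - pka))


def logd_monoprotic_base(logp, pka, ph):
    """logD of a base B, where pka is that of the conjugate acid BH+."""
    return logp - math.log10(1 + 10 ** (pka - ph))


# Hypothetical acid (logP 3.0, pKa 5.0) across a range of pH values
for ph in (2.0, 5.0, 7.4):
    print(ph, round(logd_monoprotic_acid(3.0, 5.0, ph), 2))

# Hypothetical base (logP 3.0, conjugate acid pKa 9.0) at pH 7.4
print(round(logd_monoprotic_base(3.0, 9.0, 7.4), 2))
```

Well below the pKa of an acid (or well above that of a base) logD approaches logP, and it falls by roughly one log unit for each pH unit further into the ionised range.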
Applications like Marvin allow the user not only to calculate logD but also to display the pH distribution profile, as shown below for warfarin.
For compounds with a pKa close to physiological pH it may be critical to consider what might actually be the predominant ionised form.
This can also be valuable when thinking about absorption from the different regions of the alimentary canal where the pH ranges from 1-3 in the stomach to 7-8 in the ileum.
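The fraction of a compound that is ionised at each of these pH values follows from the Henderson-Hasselbalch relationship. The sketch below assumes a single ionisable group; the function names and the example pKa values are hypothetical.

```python
def fraction_ionised_acid(pka, ph):
    """Fraction of an acid HA present as A- at the given pH."""
    return 1 / (1 + 10 ** (pka - ph))


def fraction_ionised_base(pka, ph):
    """Fraction of a base B present as BH+ (pka is that of the conjugate acid)."""
    return 1 / (1 + 10 ** (ph - pka))


# Hypothetical acid (pKa 4.5) and base (pKa 8.0) at stomach and ileum pH values
for ph in (1.0, 3.0, 7.0, 8.0):
    acid = fraction_ionised_acid(4.5, ph)
    base = fraction_ionised_base(8.0, ph)
    print(f"pH {ph}: acid {acid:.1%} ionised, base {base:.1%} ionised")
```

When the pH equals the pKa exactly half of the compound is ionised, which is why compounds with a pKa near 7.4 deserve particular attention.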
The contributions of various functional groups to LogD have been explored in "LogD contributions of substituents commonly used in medicinal chemistry" (DOI). This study used matched molecular pairs analysis of experimental LogD values from several thousand compounds, collected using the shake-flask method at pH = 7.4. The authors reported the average deltaLogD for particular molecular pairs, and the results are shown below for the case where the functional group is at any position on the phenyl ring. I've also included the LogD calculated using Chemaxon software.
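The averaging step of a matched molecular pairs analysis is straightforward once the pairs have been identified. The sketch below is not the method used in the study, only a minimal illustration of the idea: each pair is a parent compound and its substituted analogue, and the deltaLogD values are averaged per substituent. The function name and all of the LogD values are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def mean_delta_logd(pairs):
    """Average deltaLogD per substituent.

    pairs: iterable of (substituent, logd_parent, logd_substituted)
    """
    deltas = defaultdict(list)
    for substituent, logd_parent, logd_substituted in pairs:
        deltas[substituent].append(logd_substituted - logd_parent)
    return {substituent: mean(values) for substituent, values in deltas.items()}


# Hypothetical matched pairs, for illustration only
pairs = [
    ("F", 1.0, 1.5),
    ("F", 2.0, 2.25),
    ("OH", 1.0, 0.5),
    ("OH", 2.5, 1.75),
]
for substituent, delta in mean_delta_logd(pairs).items():
    print(substituent, delta)
```

In practice the spread of the deltaLogD values and the number of pairs matter as much as the mean, since a substituent effect can depend on its environment.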
This is a useful table for comparing functional groups; in particular, the last 11 entries compare the influence that various heterocycles have on LogD. These heterocycles are often used as bioisosteric replacements for a phenyl ring.
I thought it might be interesting to compare the LogD differences determined using the matched molecular pairs (deltaLogD_rad=0) with the values determined using the Chemaxon calculated LogD (deltacLogD). As you can see below, there is a pretty good correspondence.
Remember: because the training sets and the algorithms vary between applications, it is very important not to combine calculated results from different tools.
It is important to check that any improvement in binding affinity is not driven entirely by an increase in LogD, and it is often useful simply to plot binding affinity versus LogD. The more interesting compound modifications are not necessarily those that give the greatest increase in affinity but can be those that give an increase in affinity without a corresponding increase in lipophilicity. Looking at the table below, there are a number of very high affinity compounds.
However, if we plot IC50 versus LogP as shown below, there is a very clear correlation between LogP and IC50, with one compound that is clearly different: the 6-CN substituent gives an increase in affinity without a corresponding increase in LogP.
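The same idea can be applied numerically when there are too many compounds to inspect by eye. The sketch below fits a straight line of pIC50 against LogP and reports how far each compound sits above or below it, so that compounds gaining affinity without a matching gain in lipophilicity stand out as large positive residuals. The function names, compound labels and values are hypothetical and are not the data from the table.

```python
import math


def pic50_from_nm(ic50_nm):
    """Convert an IC50 in nM to pIC50 (-log10 of the molar IC50)."""
    return 9 - math.log10(ic50_nm)


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def affinity_residuals(compounds):
    """compounds: name -> (logP, IC50 in nM). Returns name -> residual in pIC50 units."""
    names = list(compounds)
    logps = [compounds[name][0] for name in names]
    pic50s = [pic50_from_nm(compounds[name][1]) for name in names]
    slope, intercept = fit_line(logps, pic50s)
    return {
        name: pic50 - (slope * logp + intercept)
        for name, logp, pic50 in zip(names, logps, pic50s)
    }


# Hypothetical series: A-D follow the lipophilicity trend, E does not
series = {
    "A": (1.0, 1000.0),
    "B": (2.0, 100.0),
    "C": (3.0, 10.0),
    "D": (4.0, 1.0),
    "E": (2.0, 1.0),
}
residuals = affinity_residuals(series)
best = max(residuals, key=residuals.get)
print(best, round(residuals[best], 2))
```

A simple plot is usually enough for a small series; the residuals are mainly useful for ranking when a project has hundreds of compounds.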
Lipophilicity is also an important component of many off-target liabilities: plasma protein binding (especially to albumin), hERG binding, CYP interactions and transporter interactions all correlate strongly with lipophilicity, and a number of studies have linked high logP to the likelihood of compounds failing in development as a result of poor ADMET (absorption, distribution, metabolism, excretion and toxicity) characteristics. In contrast, it is often clear that a certain size and lipophilicity are required to achieve reasonable levels of affinity. Balancing these requirements is a key challenge in drug discovery, and the suggestion is that chemists target the sweet spot of MWt 250-500 and LogP 2-4. One consequence of this approach is the need to prioritise low molecular weight, less lipophilic compounds from screening. The initial focus of medicinal chemistry should be to select good-quality starting points and then to control shifts in physicochemical properties effectively during the optimization process.
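When triaging screening hits, the sweet spot can be applied as a simple first-pass filter. The sketch below uses the MWt 250-500 and LogP 2-4 ranges quoted above and treats the boundaries as inclusive, which is an assumption; the function name, hit identifiers and property values are hypothetical.

```python
def in_sweet_spot(mol_weight, logp):
    """True if a compound falls within MWt 250-500 and LogP 2-4 (inclusive)."""
    return 250 <= mol_weight <= 500 and 2 <= logp <= 4


# Hypothetical screening hits as (identifier, MWt, calculated LogP)
hits = [
    ("hit-1", 310.0, 2.9),
    ("hit-2", 545.0, 5.2),
    ("hit-3", 275.0, 1.1),
    ("hit-4", 420.0, 3.8),
]
prioritised = [name for name, mol_weight, logp in hits if in_sweet_spot(mol_weight, logp)]
print(prioritised)
```

Hits outside the range are not necessarily discarded, but they leave less room for the increases in size and lipophilicity that tend to accompany optimisation.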
Last Updated 12 January 2019