Protein-ligand binding

Protein-ligand binding#

Introduction#

In the following simple mechanism, $A$ is the protein, $B$ the ligand, and $AB$ the protein-ligand complex:

\[ A + B \underset{k_{-1}}{\stackrel{k_{+1}}{\rightleftharpoons}} AB . \]

Where $k_{+1}$ (in $\mu$$M^{-1}$ $s^{-1}$) and $k_{−1}$ (in $s^{-1}$) are rate constants for the forward and reverse reactions between $A$ and $B$.

In this notebook, we will determine the equilibrium binding constant, $K_d = \frac{[A]_{eq}[B]_{eq}}{[AB]_{eq}}$ (in $M$) in different conditions:

Weak binding: $[A]_0 \ll K_d$ and $[B]_0 \gg [A]_0$
Tight binding: $[A]_0 > K_d$

We will also investigate how to study protein-ligand binding using thermal shift assay data.

Convention#

We will use the convention that $[A]$ is the concentration of free $A$, $[A]_0$ is the initial (or total) concentration of $A$ before any binding or conversion occurs, and $[A]_{eq}$ is the concentration of $A$ at equilibrium.

$K_d$#

The $K_d$ reports the extent of binding that occurs between two biomolecules, e.g. a protein $A$ and a ligand $B$. Using the $K_d$, $[B]_0$, and $[A]_0$ we can estimate how much protein-ligand complex $AB$ is formed. It is important to realize that the amount of complex formation does not depend on excess of $[B]_0$ over $[A]_0$, but on how much $[B]_0$ exceeds the $K_d$!

The $K_d$ is a measurement of affinity between two molecules. A low $K_d$ value indicates that the two molecules have a high affinity for each other: a low concentration of the ligand is needed to saturate the protein. A high $K_d$ value indicates that the two molecules have a low affinity for each other: a high concentration of the ligand is needed to saturate the protein.

Experimental design#

Several methods exist to quantitatively measure binding affinity, including

fluorescence intensity
fluorescence anisotropy or fluorescence polarization
isothermal titration calorimetry (ITC)
optical biosensors
chemical shift perturbation (CSP) experiments in solution nuclear magnetic resonance (NMR) spectroscopy
electrophoretic mobility shift assay (EMSA)
thermal shift assay (TSA) or differential scanning fluorimetry (DSF)
…

For all methods:

we perform a titration of one binding partner (= the titrant, usually the ligand $B$) against a constant concentration of the other binding partner (= the analyte, usually the protein $A$). In the titration, $[A]_0$ is fixed and increasing amounts of $[B]_0$ are added and the formation of $[AB]_{eq}$ is followed via e.g. a change in fluorescence intensity.
we plot our experimental values as a function of titrant concentration
we fit the data to an appropriate equilibrium binding equation

Data#

Get the data needed for this exercise here.

The spreadsheet “ProteinLigandBinding.xlsx” contains three sheets (see figure below), make sure to read in the correct sheet.

Weak binding conditions#

Background#

We are provided with data which describe the fluorescence from component $A$ (at 0.5 $\mu$$M$, $[A]_0 \ll K_d$) on addition of a large excess of component $B$ ($[B]_0 \gg [A]_0$), which is non-fluorescent. A schematic (based on a published phosphate biosensor by Brune et al, see here) is shown below.

The increase in fluorescence indicates that a complex is formed between $A$ and $B$:

\[ A + B \rightleftharpoons AB . \]

Such data allow calculation of the equilibrium dissociation constant, $K_d$. Because $B$ is in large excess (> 5 $\mu$$M$), the amount which is bound to $A$ is very small and can be neglected. This means that $[B] = [B]_0 - [AB] \sim [B]_0$ at all times throughout the titration. If the stoichiometry is 1:1, the equilibrium dissociation constant, $K_d$ is

\[ K_{d} = \frac{[A]_{eq}[B]_{eq}}{[AB]_{eq}} = \frac{([A]_0 - [AB]_{eq})[B]_0}{[AB]_{eq}} . \]

Rearranging gives

\[ [AB]_{eq} = \frac{[A]_0[B]_0}{K_d + [B]_0} , \]

which is the equation of a hyperbola.

In this titration we do not measure $[A]_{eq}$ and $[AB]_{eq}$ directly, but assume the fluorescence value provides a proportional measure of concentration. Let’s call $F_{obs}$ the observed fluorescence for a particular value of $[B]_0$ and $F_{min}$ and $F_{max}$ the fluorescence values observed at the start and end of the titration, respectively. At the start of the titration, when $[B]_0 = 0, [AB]_{eq} = 0$, the start fluorescence ($F_{min}$) corresponds to $[A]_{eq} = [A]_0$. When $[A]_0 \ll K_d$ and $[B]_0 \gg [A]_0$, at the end of the titration, all the original $A$ is bound to $B$ and $[A]_{eq} \sim 0$ so that the end fluorescence ($F_{max}$) corresponds to $[AB]_{eq} \sim [A]_0$. We can write this as

\[ F_{obs} = F_{min}\frac{[A]_{eq}}{[A]_0} + F_{max}\frac{[AB]_{eq}} {[A]_0} . \]

Using $[A]_{eq} = [A]_0 - [AB]_{eq}$ and rearranging gives

\[ F_{obs} = F_{min} + (F_{max} - F_{min})\frac{[AB]_{eq}}{[A]_0} , \]

where the term $F_{max} - F_{min}$ is the amplitude of the fluorescence change. Combining with the hyperbola equation above gives

\[ F_{obs} = F_{min} + (F_{max} - F_{min})\frac{[B]_0}{K_d + [B]_0} . \]

Importing the libraries#

#Import the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from scipy.optimize import curve_fit

Reading and plotting the data#

#Read the Excel file
dfWB = pd.read_excel ('../data/ProteinLigandBinding.xlsx',   #create a pandas DataFrame from the filename with file path and extension shown
                     sheet_name=0,   #use the first sheet
                     header=0)   #our data has column names

print(dfWB)   #print the DataFrame created

    [B]0 (µM)       Fobs
         0  50.309803
         5  52.311181
        10  53.483983
        15  55.288104
        20  56.325422
        25  56.552537
        30  57.144985
        35  57.813831
        40  58.260262
        45  57.947102
       50  58.289883
       55  59.189086
       60  59.297183
       65  58.707448
       70  59.406282
       75  59.806176
       80  59.360657
       85  60.091209
       90  60.297371
       95  60.123426
      100  60.303981
      105  60.118759
      110  60.483033
      115  60.754384
      120  60.522774
      125  60.770736
      130  60.450432
      135  60.183150
      140  60.287399
      145  61.124269
      150  60.950916
      155  60.717647
      160  60.439923
      165  60.602280

Let’s have a look at the relationship between $[B]_0$ and $F_{obs}$. We are going to plot $F_{obs}$ against $[B]_0$.

#Plot the data
plt.figure(figsize=(7,5))    #start a figure object

plt.plot(dfWB['[B]0 (µM)'], dfWB['Fobs'],   #plot a set of x (= [B]0), y (= Fobs) data points
         'o', color='gray', markersize=8, linestyle='None')   #use a round, gray marker with size 8; use no line

plt.title('Fluorescence titration', fontsize=22)   #title of graph
plt.xlabel('$[B]_0$ ($\\mu$M)', fontsize=18)   #X-axis label
plt.ylabel('$F_{obs}$', fontsize=18)   #Y-axis label
plt.axis([0, 170, 50, 62])   #axis boundaries, in this case from 0 to 170 for the X-axis and 50 to 62 for the Y-axis

plt.show()   #show the figure object

../_images/3407904c31c25e731e7701c134dcf00620e2d98632cf9d1b83a7ca0a1bf34faa.png

Fitting and reporting the data: $K_d$ and the signal change#

We first specify a parametrized model function (using def) to explain our data. The function takes as input 1 independent variable ($x = [B]_0$) and 3 parameters ($F_{min}$, $F_{max}$, and $K_d$):

\[ F_{obs} = F_{min} + (F_{max} - F_{min})\frac{[B]_0}{K_d + [B]_0} . \]

We then use the curve_fit command from SciPy to use non-linear least squares to fit our function to our data. The target function is the first argument, the independent variable the second argument, the dependent variable the third argument and the start values for the parameters the fourth argument. To find an initial guess for the parameters:

Look at the plotted fluorescence data and estimate the values of $F_{min}$ (i.e. start fluorescence) and $F_{max}$ (i.e. the end fluorescence).
Estimate the equilibrium dissociation constant $K_d$ from the concentration of $[B]_0$ which gives about 50% of the change in fluorescence.

Protein-ligand binding assay - initial guesses for parameters

The function returns the optimal values for the parameters and the covariance matrix. This covariance matrix can be used to calculate standard errors for the fit parameters.

#Define the protein-ligand binding function (weak binding)
def funcWeakBinding(x, Fmin, Fmax, Kd):
    return Fmin + (Fmax - Fmin) * ((x) / (Kd + x))

#Create a list of substrate concentrations as input for the function
Bin=np.linspace(0, 200, 100)

#Test the initial guesses for the fitting parameters
plt.figure(figsize=(7,5))   #start a figure object

plt.plot(dfWB['[B]0 (µM)'], dfWB['Fobs'],   #plot a set of x (= [B]0), y (= Fobs) data points
         'o', color='gray', markersize=8, linestyle='None',   #use a round, gray marker with size 8; use no line
         label='Data')   #add a legend label

plt.plot(Bin, funcWeakBinding(Bin, 50, 61, 20), #Test my guesses for the initial parameters. Use the generated [B]0 array between 0 and 200 as x-values.
         color="red", linestyle='solid',   #use red, solid line
         label='Guesses')   #add a legend label

plt.title('Fluorescence titration', fontsize=22)   #title of graph
plt.xlabel('$[B]_0$ ($\\mu$M)', fontsize=18)   #X-axis label
plt.ylabel('$F_{obs}$', fontsize=18)   #Y-axis label
plt.axis([0, 170, 50, 62])   #axis boundaries, in this case from 0 to 170 for the X-axis and 50 to 62 for the Y-axis
plt.legend()   #add a legend

plt.show()   #show the figure object

../_images/6065ac710d5cb6929b4dced33c89412277f1b8253a7125920e57d6eba2f51040.png

#Fit the data with the weak binding curve
paramsWB, params_covarianceWB = curve_fit(funcWeakBinding,   #the function we try to fit to the data
                                          dfWB['[B]0 (µM)'],   #the x values
                                          dfWB['Fobs'],   #the y values
                                          [50, 61, 20])   #my guesses for the initial parameters

#Report the fitted Kd parameter with standard error
print("Kd = ", paramsWB[2], "±", np.sqrt(np.diag(params_covarianceWB))[2])

Kd =  21.700795101327582 ± 1.6910805977378534

The fitted $K_d$ is 22 $\pm$ 2 $\mu$$M$. Now plot the resulting curve on the data and check the residuals to see if the fitting procedure worked well.

#Calculate the residuals
resid = dfWB['Fobs'] - funcWeakBinding(dfWB['[B]0 (µM)'], *paramsWB)   #calculate the residuals, the star in *paramsWB unpacks the array so the optimized parameter values become the arguments (after the x-values) to the function

#Produce a combined graph
fig = plt.figure()   #to create a figure object

xlimits = [0, 170]   #to make sure we use the same of the X-axis boundaries for both plots

ax1 = fig.add_axes([0.1, 0.53, 1, 0.81])   #to specify the coordinates, width and height of the top plot

ax2 = fig.add_axes([0.1, 0.1, 1, 0.4])   #to specify the coordinates, width and height of the bottom plot

ax1.plot(dfWB['[B]0 (µM)'], dfWB['Fobs'],   #plot a set of x (= [B]0), y (= Fobs) data points
         marker='o', color='gray', markersize=8, linestyle='None',   #use a round, gray marker with size 8; use no line
         label='Data')   #add a legend label

ax1.plot(Bin, funcWeakBinding(Bin, *paramsWB),   #Add the fitted line to plot. Use the generated [B]0 array between 0 and 200 as x-values. Use the fitted parameters to calculate the y-values. The star in *paramsWB unpacks the array so the optimized parameter values become the arguments (after the x-values) to the function.
         color="gray", linestyle='-',   #use a gray, solid line
         label='Fit')   #add a legend label

ax1.axis(xlimits + [50, 62])   #sets the X-axis and Y-axis boundaries for the top plot
ax1.tick_params(axis='x', bottom=False, labelbottom=False)   #removes the ticks and tick labels on the X-axis for the top plot
ax1.set_ylabel('$F_{obs}$')   #adds Y-axis title for the top plot
ax1.legend(loc='lower right')   #include legend

ax2.plot(dfWB['[B]0 (µM)'], resid,   #plot a set of x (= [B]0), y (= the residuals) data points
         marker='o', color='gray', linestyle='None', markersize=8)   #use gray datapoints size 8 and no line

ax2.axhline(0, color='gray', linestyle="--")   #adds a horizontal line at y=0
ax2.axis(xlimits + [-1, 1])   #sets the X-axis and Y-axis boundaries (needs some trial and error to find the right ones!) for the bottom plot
ax2.set_xlabel('$[B]_0$ ($\\mu$M)')   #adds X-axis title for the bottom plot, which is the same for the top plot
ax2.set_ylabel('Residuals')   #adds Y-axis title for the bottom plot

plt.show()   #show the figure object

../_images/32de3414a330921e52c007bae8d69dba36663b47cf5421ff3f5b85a48f5c7ee6.png

That looks good! Let’s now calculate the signal change.

Exercise 86

Calculate the % change in fluorescence based on the fitted values of $F_{max}$ and $F_{min}$:

\[ \% change = \frac{F_{max} - F_{min}}{F_{min}} 100 \% . \]

Solution to Exercise 86

#Define a function to calculate the change in fluorescence based on the minimum and maximum fluorescence
def fluochange(fmax, fmin):
    return ((fmax - fmin) / fmin) * 100

#Calculate the change in fluorescence using the minimum and maximum fluorescence from the fit
print(fluochange(pWramsTB[1], pWramsTB[0]))

The % change in fluorescence is 24% (increase!).

Tight binding conditions#

Background#

We are provided with data which describe the fluorescence from component $A$ (at 50 $\mu$$M$, $[A]_0 > K_d$) on addition of component $B$. Upon complex formation, we see a decrease in flurescence intensity (i.e. fluorescence quenching).

Since the starting $[A]_0$ value is high, a significant amount of $B$ is bound to form $AB$ in the early stages of the titration. As a consequence we cannot assume that $[B] \sim [B]_0$. Thus

\[ K_{d} = \frac{[A]_{eq}[B]_{eq}}{[AB]_{eq}} = \frac{([A]_0 - [AB]_{eq})([B]_0 - [AB]_{eq})}{[AB]_{eq}} . \]

Rearranging this equation gives a quadratic equation of the form $ax^2 + bx + c = 0$:

\[ [AB]_{eq}^2 - (K_d + [A]_0 + [B]_0)[AB]_{eq} + [A]_0[B]_0 = 0 . \]

Using the quadratic formula $x = \frac{- b ± \sqrt{b^2 - 4ac}}{2a}$, we can solve this equation (there is only one physically meaningful solution):

\[ [AB]_{eq} = \frac{K_d + [A]_0 + [B]_0 - \sqrt{(K_d + [A]_0 + [B]_0)^2 - 4[A]_0[B]_0}}{2} . \]

Importing the libraries#

#Import the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from scipy.optimize import curve_fit

Reading and plotting the data#

#Read the Excel file
dfTB = pd.read_excel ('../data/ProteinLigandBinding.xlsx',   #create a pandas DataFrame from the filename with file path and extension shown
                     sheet_name=1,   #use the second sheet
                     header=0)   #our data has column names

print(dfTB)   #print the DataFrame created

    [B]0 (µM)        Fobs
         0  101.281302
         5   95.546574
        10   91.905610
        15   90.282790
        20   86.508129
        25   81.503169
        30   81.923274
        35   77.492044
        40   77.296115
        45   70.331629
       50   68.978942
       55   69.103311
       60   66.267461
       65   68.373290
       70   64.928612
       75   66.796101
       80   67.250607
       85   63.991490
       90   62.293365
       95   64.068544
      100   61.697222
      105   64.684667
      110   64.637277
      115   62.298640
      120   63.763304
      125   60.691164
      130   60.303286
      135   60.034654
      140   64.196008
      145   61.231373
      150   60.467756
      155   64.271165
      160   59.671584
      165   59.371298

Let’s have a look at the relationship between $[B]_0$ and $F_{obs}$. We are going to plot $F_{obs}$ against $[B]_0$.

#Plot the data
plt.figure(figsize=(7,5))    #start a figure object

plt.plot(dfTB['[B]0 (µM)'], dfTB['Fobs'],   #plot a set of x (= [B]0), y (= Fobs) data points
         'o', color='gray', markersize=8, linestyle='None')   #use a round, gray marker with size 8; use no line

plt.title('Fluorescence titration', fontsize=22)   #title of graph
plt.xlabel('$[B]_0$ ($\\mu$M)', fontsize=18)   #X-axis label
plt.ylabel('$F_{obs}$', fontsize=18)   #Y-axis label
plt.axis([0, 170, 55, 105])   #axis boundaries, in this case from 0 to 170 for the X-axis and 55 to 105 for the Y-axis

plt.show()   #show the figure object

../_images/e30ab9ab3d1f7ebcc2354b2130362f96f865315127596b7066cce2b0d0ef2798.png

Fitting and reporting the data: $K_d$, signal change and stoichiometry#

We fit the data using the function:

\[ F_{obs} = F_{min} + (F_{max} - F_{min})\frac{K_d + [A]_0 + [B]_0 - \sqrt{(K_d + [A]_0 + [B]_0)^2 - 4[A]_0[B]_0}}{2} . \]

with our independent variable $x = [B]_0$ and 3 parameters $F_{min}$, $F_{max}$, and $K_d$. We need to specify $[A]_0$ outside the function so it can be used within the function.

To find an initial guess for the parameters:

Look at the plotted fluorescence data and estimate the values of $F_{min}$ (i.e. start fluorescence) and $F_{max}$ (i.e. the end fluorescence).
Estimate the equilibrium dissociation constant $K_d$ from the concentration of $[B]_0$ which gives about 50% of the change in fluorescence.

#Set the concentration of [A]0
A0 = 50

#Define the protein-ligand binding function (tight binding)
def funcTightBinding(x, Fmin, Fmax, Kd):
    return Fmin + (Fmax - Fmin) * (((Kd + A0 + x) - ((Kd + A0 + x)**2 - 4 * A0 * x )**(1/2)) / (2 * A0))

#Create a list of substrate concentrations as input for the function
Bin=np.linspace(0, 200, 100)

#Test the initial guesses for the fitting parameters
plt.figure(figsize=(7,5))   #start a figure object

plt.plot(dfTB['[B]0 (µM)'], dfTB['Fobs'],   #plot a set of x (= [B]0), y (= Fobs) data points
         'o', color='gray', markersize=8, linestyle='None',   #use a round, gray marker with size 8; use no line
         label='Data')   #add a legend label

plt.plot(Bin, funcTightBinding(Bin, 102, 55, 30), #Test my guesses for the initial parameters. Use the generated [B]0 array between 0 and 200 as x-values.
         color="red", linestyle='solid',   #use red, solid line
         label='Guesses')   #add a legend label

plt.title('Fluorescence titration', fontsize=22)   #title of graph
plt.xlabel('$[B]_0$ ($\\mu$M)', fontsize=18)   #X-axis label
plt.ylabel('$F_{obs}$', fontsize=18)   #Y-axis label
plt.axis([0, 170, 55, 105])   #axis boundaries, in this case from 0 to 170 for the X-axis and 55 to 105 for the Y-axis
plt.legend()   #add a legend

plt.show()   #show the figure object

../_images/8b9f36ecf57a75dd57fc0af83aee6e2625ddfd5d89f4e07ca3328a61d20c8651.png

#Fit the data with the weak binding curve
paramsTB, params_covarianceTB = curve_fit(funcTightBinding,   #the function we try to fit to the data
                                          dfTB['[B]0 (µM)'],   #the x values
                                          dfTB['Fobs'],   #the y values
                                          [102, 55, 30])   #my guesses for the initial parameters

#Report the fitted Kd parameter with standard error
print("Kd = ", paramsTB[2], "±", np.sqrt(np.diag(params_covarianceTB))[2])

Kd =  5.47180159867898 ± 1.3218122693165213

The fitted $K_d$ is 5 $\pm$ 1 $\mu$$M$. There is a large error on the fitted constant!

Note that the $K_d$ is now considerably less than $[A]_0$ confirming this is a tight binding case. A hyperbolic fit would be a poor approximation.

Now plot the resulting curve on the data and check the residuals to see if the fitting procedure worked well.

#Calculate the residuals
resid = dfTB['Fobs'] - funcTightBinding(dfTB['[B]0 (µM)'], *paramsTB)   #calculate the residuals, the star in *paramsTB unpacks the array so the optimized parameter values become the arguments (after the x-values) to the function

#Produce a combined graph
fig = plt.figure()   #to create a figure object

xlimits = [0, 170]   #to make sure we use the same of the X-axis boundaries for both plots

ax1 = fig.add_axes([0.1, 0.53, 1, 0.81])   #to specify the coordinates, width and height of the top plot

ax2 = fig.add_axes([0.1, 0.1, 1, 0.4])   #to specify the coordinates, width and height of the bottom plot

ax1.plot(dfTB['[B]0 (µM)'], dfTB['Fobs'],   #plot a set of x (= [B]0), y (= Fobs) data points
         marker='o', color='gray', markersize=8, linestyle='None',   #use a round, gray marker with size 8; use no line
         label='Data')   #add a legend label

ax1.plot(Bin, funcTightBinding(Bin, *paramsTB),   #Add the fitted line to plot. Use the generated [B]0 array between 0 and 200 as x-values. Use the fitted parameters to calculate the y-values. The star in *paramsTB unpacks the array so the optimized parameter values become the arguments (after the x-values) to the function.
         color="gray", linestyle='-',   #use a gray, solid line
         label='Fit')   #add a legend label

ax1.axis(xlimits + [55, 105])   #sets the X-axis and Y-axis boundaries for the top plot
ax1.tick_params(axis='x', bottom=False, labelbottom=False)   #removes the ticks and tick labels on the X-axis for the top plot
ax1.set_ylabel('$F_{obs}$')   #adds Y-axis title for the top plot
ax1.legend(loc='lower right')   #include legend

ax2.plot(dfTB['[B]0 (µM)'], resid,   #plot a set of x (= [B]0), y (= the residuals) data points
         marker='o', color='gray', linestyle='None', markersize=8)   #use gray datapoints size 8 and no line

ax2.axhline(0, color='gray', linestyle="--")   #adds a horizontal line at y=0
ax2.axis(xlimits + [-5, 5])   #sets the X-axis and Y-axis boundaries (needs some trial and error to find the right ones!) for the bottom plot
ax2.set_xlabel('$[B]_0$ ($\\mu$M)')   #adds X-axis title for the bottom plot, which is the same for the top plot
ax2.set_ylabel('Residuals')   #adds Y-axis title for the bottom plot

plt.show()   #show the figure object

../_images/e49a3484456101056f7bc78ccbc6ca90d89ae7940e3123e98d67b04876373a05.png

That looks good! Let’s now calculate the signal change.

Exercise 87

Calculate the % change in fluorescence based on the fitted values of $F_{max}$ and $F_{min}$.

Solution to Exercise 87

#Define a function to calculate the change in fluorescence based on the minimum and maximum fluorescence
def fluochange(fmax, fmin):
    return ((fmax - fmin) / fmin) * 100

#Calculate the change in fluorescence using the minimum and maximum fluorescence from the fit
print(fluochange(paramsTB[1], paramsTB[0]))

The %change in fluorescence is -41% (decrease!).

Let’s dive deeper into the topic: experimental design#

$[A]_0$#

Depending on the value of $K_d$ relative to $[A]_0$, the shape of a binding curve will vary from two straight lines which intersect at the equivalence point in which the amount of $B$ added is just enough to completely neutralize $A$ (for high affinity, small $K_d$ case) to a curve that is practically hyperbolic (for low affinity, large $K_d$ case).

We can use the function we wrote in “Tight binding conditions” to illustrate this:

\[ F_{obs} = F_{min} + (F_{max} - F_{min})\frac{K_d + [A]_0 + [B]_0 - \sqrt{(K_d + [A]_0 + [B]_0)^2 - 4[A]_0[B]_0}}{2[A]_0} . \]

#Set the concentration of [A]0
A0 = 50

#Define the protein-ligand binding function (tight binding)
def funcTightBinding(x, Fmin, Fmax, Kd):
    return Fmin + (Fmax - Fmin) * (((Kd + A0 + x) - ((Kd + A0 + x)**2 - 4 * A0 * x )**(1/2)) / (2 * A0))

To create a list of $[B]_0$ concentrations, we use the np.linspace(start, stop, num) command which returns num evenly spaced samples, calculated over the start-stop interval.

#Create a list of ligand concentrations as input for the function
xinput = np.linspace(0, 500, 501)

Let’s plot three curves: one with $[A]_0$ (50 $\mu$$M$) $\gg$ $K_d$ (1 $\mu$$M$), one with $[A]_0$ (50 $\mu$$M$) $\ll$ $K_d$ (1 $mM$) $K_d$, and one with $[A]_0$ (50 $\mu$$M$) = $K_d$ (50 $\mu$$M$). The other parameters are $F_{min} = 0$, and $F_{max} = 100$.

#Plot to compare the conditions
plt.figure(figsize=(7,5))
plt.plot(xinput, funcTightBinding(xinput, 0, 100, 1), color='gray', label='$[A]_0 \\gg K_d$')
plt.plot(xinput, funcTightBinding(xinput, 0, 100, 50), color='blue', label='$[A]_0 = K_d$')
plt.plot(xinput, funcTightBinding(xinput, 0, 100, 1000), color='red', label='$[A]_0 \\ll K_d$')
plt.title('Titration curves', fontsize=22)
plt.legend(loc='center right')
plt.xlabel('$[B]_0$ ($\\mu$M)', fontsize=18)
plt.ylabel('$F_{obs}$', fontsize=18)
plt.axis([0, 500, 0, 110])
plt.show()

../_images/1ca470735c020bd61b298d65f0aa42d33c0b515c2a2f7c724e308f7b4243c76c.png

The gray curve ($[A]_0 \gg K_d$) clearly shows a sharp bend at the equivalence point and the stoichiometry could be determined accurately for tight binding complexes. The blue and red curves ($[A]_0$ $\leqslant$ $K_d$) look more hyperbolic and the stoichiometry is difficult to determine accurately. Indeed, when $K_d$ > $10[A]_0$, a quadratic equation often fails to converge to a unique fit and it is necessary to use the simpler hyperbolic approximation. The ideal titration is when $K_d = \frac{[A]_0}{10}$ for which $K_d$ and the stoichiometry can be determined with reasonable accuracy. Obviously the $K_d$ is not known initially, but the experiment can be repeated with the appropriate $[A]_0$ concentration once an approximate $K_d$ has been determined.

Protein-ligand binding assay - equivalence point for various conditions

$[B]_0$#

The quality of the estimate of $K_d$ depends on covering a ligand concentration range that spans a significant portion of the binding isotherm. As a rule of thumb, we aim for a $[B]_{0}$ range from $0.05 K_d$ to $20 K_d$. We also need to get high data-density in the region of a binding curve that is most important for precise and accurate curve-fitting, i.e. around the $K_d$.

Experimentally, a convenient method for choosing ligand concentrations is to first make a stock solution of ligand at the highest concentration that is experimentally reasonable. Then, two-fold serial dilutions can be made from this stock to produce a range of lower ligand concentrations.

For example, let us say that the highest concentration of ligand to be used in the reaction is 200 $\mu$$M$. We could make a 2.0 $mM$ stock solution of the ligand that would be diluted 10-fold into the final assay reaction mixture (i.e., to give a final concentration of 200 $\mu$$M$). We could then take a portion of this stock solution and dilute it with an equal volume of buffer to yield a 1.0 $mM$ solution, which upon 10-fold dilution into the assay reaction mixture would give a final ligand concentration of 100 $\mu$$M$. A portion of this solution could also be diluted in half with buffer, and so on, to yield a series of solutions of diminishing ligand concentrations. The final ligand concentrations of such a 2-fold serial dilution scheme are… Let us write a function for that!ng

#Define the dilution function
def FoldDilution(conchigh, conclow, df): #highest concentration and lowest concentration you want
    i = conchigh #start with the highest concentration
    TwoFoldDilution = [] #create an empty list

    while ((i<=conchigh) and (i>=conclow)): #when i (which is either the highest concentration you provided or the newly calculated value) is below the highest concentration and above the lowest concentration you provided
        TwoFoldDilution.append(i) #i will be added to the list
        i = i/df #and a "new" i will be calculated, which will be evaluated 
    return TwoFoldDilution #return the list with calculated concentrations

Let’s see which values it returns for 200.0 $\mu$$M$ as highest concentration and around 0.1 $\mu$$M$ as lowest concentration in the assay reaction mixture.

Ligand1 = FoldDilution(200.0, 0.1, 2) #we get a list of concentrations that follow a two-fold dilution starting from 200 𝜇M and ending around 0.1 𝜇M
print(Ligand1)

[200.0, 100.0, 50.0, 25.0, 12.5, 6.25, 3.125, 1.5625, 0.78125, 0.390625, 0.1953125]

That looks good!

Let us now calculate and plot $[AB]_{eq}$ (in $\mu$$M$) against these ligand concentrations $[B]_0$ (in $\mu$$M$). We can use the function we wrote in “Tight binding conditions” to illustrate this:

\[ F_{obs} = F_{min} + (F_{max} - F_{min})\frac{K_d + [A]_0 + [B]_0 - \sqrt{(K_d + [A]_0 + [B]_0)^2 - 4[A]_0[B]_0}}{2[A]_0} . \]

Let’s use $[A]_0$ (0.2 $\mu$$M$) $\ll$ $K_d$ (10 $\mu$$M$). The other parameters are $F_{min} = 0$, and $F_{max} = 100$. Let us add uniform noise to the simulated data using the random.uniform command from NumPy.

#Set the concentration of [A]0
A0 = 0.2

#Define the protein-ligand binding function (tight binding)
def funcTightBinding(x, Fmin, Fmax, Kd):
    return Fmin + (Fmax - Fmin) * (((Kd + A0 + x) - ((Kd + A0 + x)**2 - 4 * A0 * x )**(1/2)) / (2 * A0))

#Create data with random noise
Exp1 = funcTightBinding(np.array(Ligand1), 0, 100, 10)   #we need to convert the list of numbers to a numpy array to use as an input for the mathematical function
Exp1Noise = Exp1 + np.random.uniform(size=Exp1.shape)   #with added random noise

#Plot to check
plt.figure(figsize=(7,5))
plt.plot(Ligand1, Exp1Noise, 'o-', color='gray', label='$[A]_0$ (0.2 $\\mu$$M$) $\\ll$ $K_d$ (10 $\\mu$$M$)')
plt.title('Titration curves', fontsize=22)
plt.legend(loc='center right')
plt.xlabel('$[B]_0$ ($\\mu$M)', fontsize=18)
plt.ylabel('$F_{obs}$', fontsize=18)
plt.axis([0, 250, 0, 110])
plt.show()

../_images/a6de18686b5606dcfd6c38fbb3b7cf19df537ee8d8fa5fc579cf16b0641c89f9.png

Exercise 88

To demonstrate the importance of a good range of ligand concentrations to determine $K_d$, try to fit the data generated above using non-linear least-squares fitting for

$[B]_{0}$ ranging from 200.0 $\mu$$M$ to 0.2 $\mu$$M$ (all 11 data points)
$[B]_{0}$ ranging from 1.6 $\mu$$M$ to 0.2 $\mu$$M$ (the last 4 data point)
$[B]_{0}$ ranging from 200.0 $\mu$$M$ to 25.0 $\mu$$M$ (the first 4 data points)

Solution to Exercise 88

#Calculate fitted parameters using a range of data points
paramsALL, params_covarianceALL = curve_fit(funcTightBinding, Ligand1, Exp1Noise, [0,100,10])
print("Kd using all data points = ", paramsALL[2], "±", np.sqrt(np.diag(params_covarianceALL))[2])

paramsLAST4, params_covarianceLAST4 = curve_fit(funcTightBinding, Ligand1[-4:], Exp1Noise[-4:], [0,100,10], maxfev = 1000)
print("Kd using 1.6 to 0.2 microM = ", paramsLAST4[2], "±", np.sqrt(np.diag(params_covarianceLAST4))[2])

paramsFIRST4, params_covarianceFIRST4 = curve_fit(funcTightBinding, Ligand1[:4], Exp1Noise[:4], [0,100,10], maxfev = 1000)
print("Kd using 200.0 to 25.0 microM = ", paramsFIRST4[2], "±", np.sqrt(np.diag(params_covarianceFIRST4))[2Using all data points, we get a $K_d$ = 10.0 ± 0.1 $\mu$$M$, in excellent agreement with the true value of this constant.

If measurements are made only at low substrate concentrations, we get a $K_d$ = 277 ± 21870 $\mu$$M$. In this concentration range, we never reach saturation, and graphically, $F_{max}$ appears to be infinite. 

If measurements are made only at high substrate concentrations, we get a $K_d$ = 8 ± 2 $\mu$$M$. We do not have enough data points in the region of a binding curve that is most important for precise and accurate curve-fitting, _i.e._ around the $K_d$.rease!).

Binding affinities using the Thermal Shift Assay#

When one performs the Thermal Shift Assay (TSA) in the presence of increasing concentrations of ligand, one will (usually) see a shift in the midpoint of the unfolding transition (i.e. the melting temperature, $T_m$) as the ligand interacts with and (typically) stabilizes the protein. Remember from the “Thermal shift assay” example, that the $T_m$ can be determined using curve-fitting via:

\[ Y = \frac{(Y_{n'} + m_n T) + (Y_{u'} + m_u T) \exp (\frac{\Delta H_m (\frac{T}{T_m} - 1) + \Delta C_p (T_m - T + T ln (\frac{T}{T_m}))}{RT})} {1 + \exp (\frac{\Delta H_m (\frac{T}{T_m} - 1) + \Delta C_p (T_m - T + T ln (\frac{T}{T_m}))}{RT})} \]

where $m_n$ and $m_u$ account for changes in the signals of the pre- and post-transition regions with changes in temperature and $Y_{n'}$ and $Y_{u'}$ are the signal of the native and unfolded states. $T$ is the temperature, $T_m$ is the midpoint of the thermal unfolding curve, $\Delta H_m$ is the enthalpy change for protein unfolding at $T_m$ and $\Delta C_p$ is the change in heat capacity.

Titration curves for TSA experiments are sometimes presented by showing the $T_m$ as a function of ligand concentration. However, we cannot simply fit these curves using the weak- or tight-binding equations we have seen earlier. These equations are only applicable when the observable is linearly proportional to the fraction of one of the species. $T_m$ is not such a variable as it also represents protein unfolding! In addition, the binding affinity, $K_d$ cannot be assumed constant at different temperatures.

So, we need to take into account the folding - unfolding reaction and the protein - ligand binding reaction: $U$ (unfolded protein) $+ L$ (ligand) $\rightleftharpoons F$ (folded protein) $+ L \rightleftharpoons FL$ (folded portein - ligand complex). $K_U$ is the equilibrium constant for the protein unfolding reaction, and $K_d$ is the equilibrium constant for the dissociation reaction. Both $K_U$ and $K_d$ depend on temperature, but both are constant at fixed temperature (and buffer conditions).

The 2019 paper from Bai and co-workers, available here, describes an isothermal strategy to analyse TSA data. Rather than analysing $T_m$ as a function of ligand concentration, they select a single temperature of interest, and at this temperature evaluate the fraction of protein that is folded / unfolded at each ligand concentration. Because all of the data used correspond to the same temperature, no thermodynamic parameters are required.

Their deviation leads to the following equation:

\[ f_u = \frac{1}{1 + (\frac{1}{K_u} (1 + \frac{[L]}{K_d}))} \]

with

\[ [L] = \frac{([L]_0 - [P]_0 - K_d (1 + K_u)) + \sqrt{([P]_0 - [L]_0 + K_d (1 + K_u))^2 + 4 [L]_0 K_d (1 + K_u)}}{2} \]

$[L]_0$ and $[P]_0$ are known experimental parameters; $K_U$ and $K_d$ can be fit at the same time, or alternatively, $K_U$ can first be determined at the temperature of interest from the thermal unfolding curve in the absence of ligand (see the Tutorial “Determining the free energy of unfolding of proteins in Python”).

Below, I included the equation and applied it to some experimental data. If you are interested, Bai and co-workers wrote a free and easy-to-use Python program implementing their isothermal approach for fitting an simulating data (including simulated thermal unfolding curves!). The code is available here.

Let us first import the data. Of note, $[P]_0$ is 1 $\mu$M.

#Read the Excel file
dfTSA = pd.read_excel ('../data/ProteinLigandBinding.xlsx',   #create a pandas DataFrame from the filename with file path and extension shown
                     sheet_name=2,   #use the second sheet
                     header=0)   #our data has column names

print(dfTSA)   #print the DataFrame created

    [L]0 (µM)    fu  Unnamed: 2  [P]0 = 1 µM
      0.15  0.49         NaN          NaN
      0.30  0.46         NaN          NaN
      0.60  0.43         NaN          NaN
      0.90  0.39         NaN          NaN
      1.20  0.36         NaN          NaN
      1.50  0.32         NaN          NaN
      2.00  0.30         NaN          NaN
      2.50  0.27         NaN          NaN
      3.00  0.24         NaN          NaN
      5.00  0.18         NaN          NaN
     7.00  0.13         NaN          NaN
    13.00  0.08         NaN          NaN
    30.00  0.04         NaN          NaN
   150.00  0.01         NaN          NaN

We can now plot the data. Let’s plot $f_u$ versus versus the logarithm of the free ligand concentration, $[L]_0$.

#Plot the data
plt.figure(figsize=(7,5))    #start a figure object

plt.plot(dfTSA['[L]0 (µM)'], dfTSA['fu'],   #plot a set of x (= [L]0), y (= fu) data points
         'o', color='gray', markersize=8, linestyle='None')   #use a round, gray marker with size 8; use no line

plt.xscale("log")   #use a logarithmic scale for the X-axis

plt.title('TSA: Titration Curve', fontsize=22)   #title of graph
plt.xlabel('$[L]_0$ ($\\mu$M)', fontsize=18)   #X-axis label
plt.ylabel('$f_{u}$', fontsize=18)   #Y-axis label
plt.axis([0.1, 200, 0, 1])   #axis boundaries, in this case from 0.1 to 200 for the X-axis and 0 to 1 for the Y-axis

plt.show()   #show the figure object

../_images/3b5c9b235c1732a3b0a0484fbfdcbb11843f728693f99803e6f5b8efaff25a9b.png

We now define the function and try out some initial guesses for $K_u$, and $K_d$.

Look at the plotted data to find an initial guess for the parameters:

Use the $f_u$ value at zero ligand concentration to estimate $K_u$ using $$ K_u = \frac{[U]_{eq}}{[N]_{eq}} = \frac{f_u}{f_n} = \frac{f_u}{1 - f_u} . $$
Estimate the equilibrium dissociation constant $K_d$ from the concentration of $[L]_0$ which gives about 50% of the change in $f_u$.

See how the protein concentration $[P]_0$ is one of the parameters for the function. This is to include uncertainties for the protein concentration during the fit. Yet, we will use bounds during fitting that will force this parameter to be searched within custom boundaries. We remove bounds for the other parameters with (-)np.inf.

#Define the function
def calculate_fitted_isothermal_point(ligand_conc, Ku, Kd, protein_conc):
    Kd = abs(Kd)
    Ku = abs(Ku)
    b = protein_conc + Kd*(1+Ku) - ligand_conc
    c = -1.0 * ligand_conc * Kd * (1+Ku)
    L_free = (-b+np.sqrt(b*b-4*c))/2
    fit_fraction_unfolded = 1 / (1 + (1/Ku)*(1+L_free/Kd))
    return fit_fraction_unfolded

#Create a list of substrate concentrations as input for the function
Bin=np.linspace(0, 150, 1000)

#Test the initial guesses for the fitting parameters
plt.figure(figsize=(7,5))   #start a figure object
plt.plot(dfTSA['[L]0 (µM)'], dfTSA['fu'], 'o', color='gray', markersize=8)   #plot a set of x (= [L]0), y (= fu) data points
plt.plot(Bin, calculate_fitted_isothermal_point(Bin, 1, 2, 1), color="red",label='Guess')   #Test my guesses for the initial parameters. Use the generated [B]0 array between 0 and 150 as x-values.
plt.xscale("log")   #use a logarithmic scale for the X-axis
plt.title('TSA: Titration Curve', fontsize=22)   #title of graph
plt.xlabel('$[L]_0$ ($\\mu$M)', fontsize=18)   #X-axis label
plt.ylabel('$f_{u}$', fontsize=18)   #Y-axis label
plt.axis([0.1, 200, 0, 1])   #axis boundaries, in this case from 0.1 to 200 for the X-axis and 0 to 1 for the Y-axis
plt.show()   #show the figure object

../_images/8c1d3acfceb9dae1e739d78fcaec740f89524a0d95bd8239770d18f16b721752.png

#Fit the data with the curve
paramsTSA, params_covarianceTSA = curve_fit(calculate_fitted_isothermal_point, dfTSA['[L]0 (µM)'],dfTSA['fu'],
                                               [2, 2, 1], bounds=((-np.inf,-np.inf,0.95), (np.inf,np.inf,1.05)))

#Report the fitted parameters with standard error
print("Ku = ", paramsTSA[0], "±", np.sqrt(np.diag(params_covarianceTSA))[0])
print("Kd = ", paramsTSA[1], "±", np.sqrt(np.diag(params_covarianceTSA))[1])

Ku =  1.0346673472477854 ± 0.023627867848768445
Kd =  1.1065639771448268 ± 0.06298727759934804

#Calculate the residuals
resid = dfTSA['fu'] - funcTightBinding(dfTSA['[L]0 (µM)'], *paramsTSA)   #calculate the residuals, the star in *paramsTB unpacks the array so the optimized parameter values become the arguments (after the x-values) to the function

#Produce a combined graph
fig = plt.figure()   #to create a figure object

xlimits = [0.1, 200]   #to make sure we use the same of the X-axis boundaries for both plots

ax1 = fig.add_axes([0.1, 0.53, 1, 0.81])   #to specify the coordinates, width and height of the top plot

ax2 = fig.add_axes([0.1, 0.1, 1, 0.4])   #to specify the coordinates, width and height of the bottom plot

ax1.plot(dfTSA['[L]0 (µM)'], dfTSA['fu'],   #plot a set of x (= [L]0), y (= fu) data points
         marker='o', color='gray', markersize=8, linestyle='None',   #use a round, gray marker with size 8; use no line
         label='Data')   #add a legend label

ax1.plot(Bin, calculate_fitted_isothermal_point(Bin, *paramsTSA),   #Add the fitted line to plot. Use the generated [L]0 array between 0 and 150 as x-values. Use the fitted parameters to calculate the y-values. The star in *paramsTSA unpacks the array so the optimized parameter values become the arguments (after the x-values) to the function.
         color="gray", linestyle='-',   #use a gray, solid line
         label='Fit')   #add a legend label

ax1.axis(xlimits + [0, 1])   #sets the X-axis and Y-axis boundaries for the top plot
ax1.set_xscale("log")   #use a logarithmic scale for the X-axis
ax1.tick_params(axis='x', bottom=False, labelbottom=False)   #removes the ticks and tick labels on the X-axis for the top plot
ax1.set_ylabel('$f_u$')   #adds Y-axis title for the top plot
ax1.legend(loc='best')   #include legend

ax2.plot(dfTSA['[L]0 (µM)'], resid,   #plot a set of x (= [B]0), y (= the residuals) data points
         marker='o', color='gray', linestyle='None', markersize=8)   #use gray datapoints size 8 and no line

ax2.axhline(0, color='gray', linestyle="--")   #adds a horizontal line at y=0
ax2.axis(xlimits + [-2, 2])   #sets the X-axis and Y-axis boundaries (needs some trial and error to find the right ones!) for the bottom plot
ax2.set_xscale("log")   #use a logarithmic scale for the X-axis
ax2.set_xlabel('$[L]_0$ ($\\mu$M)')   #adds X-axis title for the bottom plot, which is the same for the top plot
ax2.set_ylabel('Residuals')   #adds Y-axis title for the bottom plot

plt.show()   #show the figure object

../_images/496cda20ae59a397d22cff0fa6c7e2899e554fc4e596f4b98c7d7cc7a6bc5137.png

References#

Bai, N., Roder, H., Dickson, A. and Karanicolas, J. Isothermal Analysis of ThermoFluor Data can readily provide Quantitative Binding Affinities. Sci Rep 9, 2650 (2019).

Brune, M., Hunter, J.L., Corrie, J.E.T., and Webb, M.R. Direct, real-time measurement of rapid inorganic phosphate release using a novel fluorescent probe and its application to actomyosin subfragment 1 ATPase. Biochemistry 33, 8262-8271 (1994).

Protein-ligand binding

Contents

Protein-ligand binding#

Introduction#

Convention#

\(K_d\)#

Experimental design#

Data#

Weak binding conditions#

Background#

Importing the libraries#

Reading and plotting the data#

Fitting and reporting the data: \(K_d\) and the signal change#

Tight binding conditions#

Background#

Importing the libraries#

Reading and plotting the data#

Fitting and reporting the data: \(K_d\), signal change and stoichiometry#

Let’s dive deeper into the topic: experimental design#

\([A]_0\)#

\([B]_0\)#

Binding affinities using the Thermal Shift Assay#

References#