Colorimetric assay#

Introduction#


Colorimetric assays are based on a simple principle: add appropriate reagents to your protein samples to initiate a chemical reaction whose product is colored. The concentration of the colored product, and therefore its absorbance, is proportional to the initial protein concentration.

To calculate the protein concentration of an unknown sample, we use a standard curve that is generated from known protein standards.

When the absorbance of the known standards (Y-axis) is plotted against their protein concentration (X-axis), the result is a straight line or, in some cases, a parabola. The standard curve can be fitted using

  • a line equation

\[ Absorbance = a * x + b \]

  • a polynomial equation

\[ Absorbance = a * x^2 + b * x + c \]

Here, \(Absorbance\) is the measured signal, \(x\) is the protein concentration of the known standards, and \(a\), \(b\) and (for the polynomial equation) \(c\) are model parameters.
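The two candidate models translate directly into small Python functions. The following is a minimal sketch, assuming NumPy; the function names `line` and `quadratic` are hypothetical and chosen only for illustration.

```python
import numpy as np

def line(x, a, b):
    """Straight line: Absorbance = a * x + b."""
    x = np.asarray(x, dtype=float)
    return a * x + b

def quadratic(x, a, b, c):
    """Second-order polynomial: Absorbance = a * x**2 + b * x + c."""
    x = np.asarray(x, dtype=float)
    return a * x**2 + b * x + c
```

Placing the independent variable first and the model parameters after it is the argument order that SciPy's curve_fit() expects of a model function.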

See here for more information.

Data#


Get the data needed for this exercise here.

The spreadsheet “ColorimetricAssay.xlsx” contains the absorbances measured at 562 nm of the external standards and unknown protein samples on a plate reader. All samples were measured in duplicate at the same time.

The absorbances for the eight external standards (2000, 1500, 1000, 750, 500, 250, 125, 0 \(\mu\)g/mL) are in A5 to H6. The 0 \(\mu\)g/mL external standard is also known as the blank. The absorbances for the unknown protein samples (dilution factor 2.5, 5, 10 and 20) are in A7 to D8. E7 to H8 are empty wells.

Colorimetric assay data

Data analysis - creating the standard curve#


Exercise 47

Import the libraries needed. Use convenient naming.

Exercise 48

Read the data from the Excel file. Use only the columns and rows containing data, i.e. from A5 to H8. Name the columns BSA-1, BSA-2, Sample-1, and Sample-2. Store in a pandas DataFrame.
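One possible way to approach this step is sketched below. It assumes that the data sit on the first sheet of the workbook, that an Excel engine such as openpyxl is installed, and that each spreadsheet row (5 to 8) should become one DataFrame column; the helper names `reshape_block` and `read_plate` are hypothetical.

```python
import pandas as pd

COLUMN_NAMES = ["BSA-1", "BSA-2", "Sample-1", "Sample-2"]

def reshape_block(raw):
    """Turn the 4 x 8 block (rows 5-8, columns A-H) into an 8 x 4 DataFrame."""
    tidy = raw.T.reset_index(drop=True)   # one row per plate column (A..H)
    tidy.columns = COLUMN_NAMES           # one column per spreadsheet row (5..8)
    return tidy

def read_plate(path="ColorimetricAssay.xlsx"):
    """Read cells A5:H8 without a header row and reshape them.

    Assumption: the block is on the first sheet of the workbook.
    """
    raw = pd.read_excel(path, header=None, skiprows=4, nrows=4, usecols="A:H")
    return reshape_block(raw)
```

Calling `read_plate()` in the folder that contains the spreadsheet should return a DataFrame in which the empty wells (E7 to H8) appear as NaN in the last four rows of Sample-1 and Sample-2.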

Exercise 49

Add a new column containing the concentrations of the eight standard points (i.e. 2000, 1500, 1000, 750, 500, 250, 125 and 0 \(\mu\)g/mL) to the existing DataFrame.
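As an illustration of adding such a column, the sketch below uses a placeholder DataFrame filled with zeros instead of the real absorbances. The column name `[BSA]` is an assumption; any descriptive name works.

```python
import pandas as pd

# Placeholder standing in for the DataFrame read from the spreadsheet
# (zeros instead of real absorbances; illustration only).
dfCA = pd.DataFrame(0.0, index=range(8),
                    columns=["BSA-1", "BSA-2", "Sample-1", "Sample-2"])

# Standard concentrations in ug/mL, in the same order as plate columns A to H.
dfCA["[BSA]"] = [2000, 1500, 1000, 750, 500, 250, 125, 0]
```

The list must have exactly as many entries as the DataFrame has rows, otherwise pandas raises a ValueError.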

Exercise 50

Plot the data: \([BSA]\) in \(\mu\)g/mL versus absorbance in AU.

Inspect the data!

  • Do we discern a clear trend in our data?

    • Do the data show a positive (sloping upward), negative (sloping downward), or no (spread out) correlation?

    • Do we notice a linear or a non-linear relationship between x- and y-values?

    • Are the errors concentration dependent? Time dependent?

  • Do we have outliers?

    • Were the values entered correctly?

    • Were there any experimental errors? E.g. a calculation error that we picked up afterwards when looking at our lab notebook?

    • Are the data points a mistake? E.g. a pipetting error?

To deal with outliers for replicate data points at each value of x, we can use weighted linear regression (see AMC TB 27-2007, Why are we weighting?, available here and further). Alternatively, when each concentration used to construct a calibration curve is measured at least three times, one can use statistical tests developed for identifying outliers amongst replicate values (see AMC TB 69-2015, Using the Grubbs and Cochran tests to identify outliers, available here).

Exercise 51

Calculate the mean and the standard deviation for the duplicates, add them to the existing DataFrame.
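A minimal sketch of the row-wise statistics, using made-up absorbances rather than the values in the spreadsheet. The new column names `BSA-mean` and `BSA-std` are hypothetical.

```python
import pandas as pd

# Made-up duplicate absorbances, for illustration only.
df = pd.DataFrame({
    "BSA-1": [1.90, 1.50, 1.10],
    "BSA-2": [2.00, 1.60, 1.00],
})

replicates = df[["BSA-1", "BSA-2"]]
df["BSA-mean"] = replicates.mean(axis=1)   # row-wise mean of the duplicates
df["BSA-std"] = replicates.std(axis=1)     # row-wise sample standard deviation (pandas default ddof=1)
```

With only two replicates per concentration, the standard deviation is a rough estimate of the measurement error, so it is worth keeping that in mind when it is later used as a weight.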

Exercise 52

From the graph, it is unclear whether we have a line or a parabola. Define both functions that we can use to fit the data: a line and a quadratic curve.

Exercise 53

Fit the means using both functions. Use a weighted fit.

Exercise 54

Report the fit parameters and standard errors on the fit parameters for both functions.
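A weighted fit and the standard errors on its parameters could look like the sketch below. It uses noise-free, made-up standards and a constant made-up standard deviation; the variable names are hypothetical, and passing `absolute_sigma=True` is an assumption about how the weights should be interpreted, not something the exercise prescribes. Note that a standard deviation of exactly zero for any point would break the weighting and has to be handled first.

```python
import numpy as np
from scipy.optimize import curve_fit

def line(x, a, b):
    return a * x + b

def quadratic(x, a, b, c):
    return a * x**2 + b * x + c

# Made-up standards (ug/mL) with noise-free synthetic absorbances; illustration only.
conc = np.array([0.0, 125.0, 250.0, 500.0, 750.0, 1000.0, 1500.0, 2000.0])
mean_abs = quadratic(conc, -1.0e-7, 1.0e-3, 0.10)
sd_abs = np.full_like(conc, 0.02)   # made-up standard deviations, all non-zero

# Weighted fits: sigma holds the standard deviation of each mean.
params1, cov1 = curve_fit(line, conc, mean_abs, sigma=sd_abs, absolute_sigma=True)
params2, cov2 = curve_fit(quadratic, conc, mean_abs, sigma=sd_abs, absolute_sigma=True)

# Standard errors are the square roots of the diagonal of the covariance matrix.
stderr1 = np.sqrt(np.diag(cov1))
stderr2 = np.sqrt(np.diag(cov2))

for name, p, se in (("line", params1, stderr1), ("quadratic", params2, stderr2)):
    print(name, [f"{v:.3g} +/- {s:.2g}" for v, s in zip(p, se)])
```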

Exercise 55

Calculate the residuals.
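A residual is the measured value minus the value predicted by the model at the same concentration. A minimal sketch with made-up numbers and hypothetical names:

```python
import numpy as np

def line(x, a, b):
    return a * x + b

# Made-up concentrations, mean absorbances and fit parameters; illustration only.
conc = np.array([0.0, 500.0, 1000.0])
mean_abs = np.array([0.12, 0.58, 1.12])
params1 = (0.001, 0.10)

# Residual = measured absorbance - fitted absorbance.
residuals1 = mean_abs - line(conc, *params1)
```

The same expression works for the quadratic model once its function and fitted parameters are substituted.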

Exercise 56

Produce a combined figure showing the residual plots underneath the main plot, which contains the data with error bars and both fitted curves. Make sure the plots are aligned and share the same X-axis so we can see which residual corresponds to which data point.

Tip: Instead of using the matplotlib.pyplot.plot function, use the matplotlib.pyplot.errorbar function to create a graph with error bars to visualize the variability of the data.
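One way to lay out such a figure is sketched below with made-up data. To keep the block short, the curves come from `numpy.polyfit` with weights `1 / sd`, which is a stand-in for the fits from the previous exercises; all variable names are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")   # non-interactive backend so the sketch also runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Made-up standards, for illustration only.
conc = np.array([0.0, 250.0, 500.0, 1000.0, 2000.0])
mean_abs = np.array([0.10, 0.36, 0.58, 1.02, 1.68])
sd_abs = np.array([0.01, 0.02, 0.02, 0.03, 0.04])

# Stand-in weighted polynomial fits (degree 1 and degree 2).
coef1 = np.polyfit(conc, mean_abs, 1, w=1 / sd_abs)
coef2 = np.polyfit(conc, mean_abs, 2, w=1 / sd_abs)
x_fine = np.linspace(conc.min(), conc.max(), 200)

# Two stacked panels that share the X-axis; the main panel is three times taller.
fig, (ax_main, ax_res) = plt.subplots(
    2, 1, sharex=True, figsize=(6, 6),
    gridspec_kw={"height_ratios": [3, 1]},
)
ax_main.errorbar(conc, mean_abs, yerr=sd_abs, fmt="o", capsize=3, label="standards")
ax_main.plot(x_fine, np.polyval(coef1, x_fine), label="line")
ax_main.plot(x_fine, np.polyval(coef2, x_fine), label="quadratic")
ax_main.set_ylabel("Absorbance (AU)")
ax_main.legend()

ax_res.axhline(0, color="gray", linewidth=1)
ax_res.plot(conc, mean_abs - np.polyval(coef1, conc), "o", label="line")
ax_res.plot(conc, mean_abs - np.polyval(coef2, conc), "s", label="quadratic")
ax_res.set_xlabel(r"[BSA] ($\mu$g/mL)")
ax_res.set_ylabel("Residual (AU)")
fig.tight_layout()
```

Using `sharex=True` is what keeps each residual directly below its data point, even after zooming or changing the axis limits.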

Inspect the quality of both fits! Which one is the best model?

  • Look at the graph of the experimental data and the fitted curve. Do the experimental data and model match?

  • Look at the graph of the residuals. Are they around 0? Are they random or is there a trend? If the residuals display a systematic pattern, the model fits the data poorly.

  • Look at the fit parameters and the standard errors on the fit parameters. Are the fit parameters within (biological) reason? Are the standard errors on the fit parameters small? If a standard error on a fit parameter is bigger than the fit parameter, it is possible that there are not enough data points or that the model fits the data poorly.

  • Look at the goodness-of-fit statistics, but be careful! For example, R-squared, ranging from 0 (worst possible fit) to 1 (best possible fit), compares the fit of your model to the fit of a horizontal line through the mean of all Y values. This comparison is valid for linear regression, but not for non-linear regression. For this reason, such fit statistics are not readily available as output of the SciPy curve_fit() function…
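If a fit statistic is wanted anyway, it can be computed by hand from the residuals. The sketch below shows two common choices, R-squared and the reduced chi-square; the function names are hypothetical. Both models in this exercise are linear in their parameters, so R-squared can be interpreted in the usual way here, but the caveat above still applies to genuinely non-linear models.

```python
import numpy as np

def r_squared(y, y_fit):
    """1 - (residual sum of squares) / (total sum of squares around the mean)."""
    y, y_fit = np.asarray(y, dtype=float), np.asarray(y_fit, dtype=float)
    ss_res = np.sum((y - y_fit) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1 - ss_res / ss_tot

def reduced_chi_square(y, y_fit, sigma, n_params):
    """Sum of squared weighted residuals divided by the degrees of freedom."""
    y, y_fit = np.asarray(y, dtype=float), np.asarray(y_fit, dtype=float)
    dof = len(y) - n_params
    return np.sum(((y - y_fit) / sigma) ** 2) / dof
```

A reduced chi-square close to 1 suggests that the scatter around the curve is about as large as the error bars predict, provided the error bars themselves are trustworthy.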

Data analysis - using the standard curve#


Two replicates at four different dilutions (2.5 x, 5 x, 10 x, and 20 x) of a protein sample of unknown concentration were prepared and the absorbance measured.

  • We calculate the concentration for each sample, and

  • we calculate the average concentration taking the dilution factors into account.

The absorbances of the diluted samples need to be within the range of the standard curve. One might need to discard measurements that are not.

In our example, the sample with dilution factor 20 x is not within the range of the standard curve: its absorbance (~0.1) is below ~0.2, the lower detection limit of the standard curve. We need to exclude this data point.

Data falling below the lower detection limit
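The range check can also be done programmatically with a boolean mask. The sketch below uses made-up absorbances and hypothetical variable names; it keeps only the samples whose absorbance lies between the lowest and highest absorbance of the standards.

```python
import numpy as np

# Made-up absorbances, for illustration only.
standard_abs = np.array([0.20, 0.35, 0.60, 1.00, 1.40, 1.90])
sample_abs = np.array([1.20, 0.70, 0.40, 0.10])
dilution = np.array([2.5, 5, 10, 20])

# True where the sample absorbance falls inside the range covered by the standards.
in_range = (sample_abs >= standard_abs.min()) & (sample_abs <= standard_abs.max())
usable_dilutions = dilution[in_range]
```

In this made-up case the 20 x dilution is rejected because its absorbance of 0.10 is below the lowest standard.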

Calculate the concentration for each of the dilution factors#

To create a column with our solutions, we use the pandas.DataFrame.apply function. We first specify the function that defines the solution for the standard curve, and we then apply this function to the pandas DataFrame columns we want as input, i.e. Sample-1 and Sample-2.

def solcalc(y, a, b, c):   #create the function
    """
    Solve a * x**2 + b * x + c = y for x using the quadratic formula

    Args:
        y: the measured absorbance
        a: the first (quadratic) coefficient
        b: the second (linear) coefficient
        c: the constant

    Returns:
        the solution x-value of the quadratic equation with y given
        (only the root with the "+" sign in front of the square root)
    """
    return (-b + np.sqrt(b**2 - 4 * a * (c - y))) / (2 * a)

#Apply the function that calculates the solution x-values of a quadratic equation with y given to the Sample-1 and Sample-2 absorbances.
#The fitted parameters (converted from NumPy array to tuple) are passed as extra arguments for the function (after the y-values).
dfCA['Solution-1'] = dfCA['Sample-1'].apply(solcalc, args=tuple(params2))
dfCA['Solution-2'] = dfCA['Sample-2'].apply(solcalc, args=tuple(params2))

Take the dilution factors into account#

We now add a column with the dilution factors. We calculate the dilution-factor corrected concentrations in two new columns.

dfCA['DF'] = [2.5, 5, 10, 20, 0, 0, 0, 0]   #add a column containing the dilution factors
dfCA['Concentration-1'] = dfCA['Solution-1'] * dfCA['DF']   #add a column with the calculated values for undiluted samples for 1
dfCA['Concentration-2'] = dfCA['Solution-2'] * dfCA['DF']   #add a column with the calculated values for undiluted samples for 2
print(dfCA)   #print the DataFrame

Calculate the overall concentration#

We extract the concentrations we want to use to calculate the average concentration into a new DataFrame, called dfCAnew. We then calculate the mean and standard deviation of all values in the new DataFrame.

#Calculate the overall concentration
dfCAnew=dfCA.iloc[0:3,-2:]   #create a new DataFrame with the values to average: the first three rows (dilution factors 2.5, 5 and 10) of the last two columns (Concentration-1 and Concentration-2)
print(dfCAnew)   #print the new DataFrame

print(np.array(dfCAnew).mean())   #convert the new DataFrame into a NumPy array and calculate the mean of all elements
print(np.array(dfCAnew).std())   #convert the new DataFrame into a NumPy array and calculate the standard deviation of all elements (NumPy default, ddof=0)

The concentration of the original, undiluted protein sample is 2018 \(\pm\) 242 \(\mu\)g/mL. The error is derived from the technical repeats, i.e. the duplicates at the three dilutions that fall within the range of the standard curve.