Welcome to Python for BIOC0029
All techniques we use produce data that need to be analysed. Some instruments come with their own vendor-specific, often licensed, software for data acquisition and analysis. These programs usually behave like a black box, so we cannot control the equations and/or parameters they use. That is why we prefer to work with data analysis software or programming languages that give us flexibility.
There are many options out there: Wolfram Mathematica, MathWorks MATLAB, Microsoft Excel with the Solver add-in, Maple, RStudio, WaveMetrics IGOR Pro, GraphPad Prism, SigmaPlot, Origin, … and Python!
Python is a programming language that can be used to analyse biophysical and biochemical data, to analyse protein/DNA/RNA sequences, to process images, to build mathematical models of biochemical systems, to perform biomolecular simulations …
These notes introduce Python Jupyter Notebooks to biochemists. We focus on biophysical and biochemical data analysis.
Why Python?
Python is developed under an open source license, which makes it free to use and distribute. Its development is driven by the community.
It is a high-level programming language, written in a form that is close to human language. This makes it easier for the programmer to write, modify, and debug code.
It is portable: the same source code works in different environments (e.g. operating systems).
Python provides extensive libraries. Many common programming tasks have already been implemented in these libraries, which significantly reduces the amount of code we need to write. Examples include NumPy, pandas, matplotlib, and SciPy.
The wide base of users and active developers has resulted in rich online support (documentation and forums), which encourages development and the continued adoption of the language. Google it!
Alternatively, use AI code assistant tools, like ChatGPT API, GitHub Copilot, Replit Ghostwriter, or Amazon CodeWhisperer, to help you write your code. But be aware that AI assistants do not always give you what you need: always read and understand the code they suggest!
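As a small illustration of how much work a library can take off our hands, the sketch below fits a straight-line standard curve with NumPy in a single call. The concentrations and absorbances are made-up values chosen for illustration only, and the variable names are hypothetical.

```python
import numpy as np

# Made-up standard curve: concentration (uM) versus absorbance (illustrative values only)
concentration = np.array([0.0, 10.0, 20.0, 40.0, 80.0])
absorbance = np.array([0.01, 0.11, 0.21, 0.41, 0.81])

# Fit a first-degree polynomial (straight line); polyfit returns [slope, intercept]
slope, intercept = np.polyfit(concentration, absorbance, 1)

# Use the fitted line to estimate the concentration of a hypothetical unknown sample
unknown_absorbance = 0.31
unknown_concentration = (unknown_absorbance - intercept) / slope

print(f"slope = {slope:.4f}, intercept = {intercept:.4f}")
print(f"estimated concentration = {unknown_concentration:.1f} uM")
```

Writing the least-squares fit by hand would take considerably more code; here the library call does it in one line.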
Why Jupyter Notebooks?
Jupyter notebooks (.ipynb files) are a presentation layer. They allow us to create and share documents that contain cells. There are three types of cell:
live code
explanatory text, equations, tables, and figures using Markdown
raw content in Raw NBConvert cells, which is passed on unmodified when the notebook is converted to another format.
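Under the hood, an .ipynb file is a JSON document in which every cell records its type. The sketch below builds a minimal, hand-written notebook structure with one cell of each type, using only the standard library. The cell contents are hypothetical, and the structure is a simplified illustration that is not validated against the official notebook schema.

```python
import json

# A minimal, hand-written notebook in the version 4 JSON layout (illustration only)
notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        # Code cell: live code, plus its execution count and outputs
        {"cell_type": "code", "metadata": {}, "source": "print('hello')",
         "execution_count": None, "outputs": []},
        # Markdown cell: explanatory text, equations, tables
        {"cell_type": "markdown", "metadata": {}, "source": "Some *explanatory* text."},
        # Raw cell: content that is not evaluated by the notebook
        {"cell_type": "raw", "metadata": {}, "source": "Left untouched."},
    ],
}

# Serialise to JSON text (what would be stored in the .ipynb file) and read it back
text = json.dumps(notebook, indent=1)
cell_types = [cell["cell_type"] for cell in json.loads(text)["cells"]]
print(cell_types)  # prints ['code', 'markdown', 'raw']
```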
Each Jupyter notebook uses a kernel. The kernel runs our code cells in a specific programming language, in our case, Python. Any output is displayed. The kernel’s state persists over time and between cells. For example, if a library has been imported in one cell, then that library will be available for the whole notebook. We can reset the kernel by restarting it.
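The persistence of the kernel's state can be sketched as follows. The two parts of the block stand for two separate code cells run one after the other; the variable name and value are hypothetical.

```python
# Cell 1: import a library and define a variable
import math
concentration_M = 2.5e-4  # hypothetical concentration in mol/L

# Cell 2: run later in the same notebook; the kernel still remembers
# both the imported library and the variable from Cell 1
log_concentration = math.log10(concentration_M)
print(round(log_concentration, 2))  # prints -3.6
```

After restarting the kernel, Cell 2 would fail with a NameError until Cell 1 is run again.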
In code cells, lines that start with # are comment lines. They are not evaluated. Comments are used to explain what the code is doing.
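A short sketch of comments in a code cell, with a hypothetical variable and value. A # can also follow code on the same line, in which case everything after it is ignored.

```python
# This whole line is a comment and is ignored by Python.
volume_mL = 1.5  # inline comment: hypothetical sample volume in millilitres

# Convert millilitres to microlitres.
volume_uL = volume_mL * 1000
print(volume_uL)  # prints 1500.0
```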
Markdown
Markdown is a lightweight markup language that makes it easy to write formatted content. Its syntax is easy to remember. The following cheat sheet can help when writing explanatory text, equations, and tables in Markdown cells: