# Variables, Data Types, Indices, and Slices

## __<font color=blue>Assigning variables and using `print()` to check how the code is working</font>__
---

To save a value for later use, we assign it to a __variable__. The syntax for assigning variables is: `variable_name = variable_value`.

Use `print(variable_name)` to __print__ the specified variable.

_Tips:_
- Choose informative names for variables.
- Use comment lines to express the units of the variable or to describe the meaning of the variable.
- Common operators are `+` for addition, `-` for subtraction, `*` for multiplication, `/` for division, and `**` for power operations.
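
As a minimal sketch of these tips, the example below assigns a few variables, notes the units in comments, and combines them with the operators listed above. The values and variable names are made up for illustration only and are not part of the exercises.

```python
# Hypothetical example values, chosen only to illustrate the syntax
mass = 0.5      #g, mass of solute weighed out
MW = 40.0       #g/mol, molecular weight of the solute
volume = 0.25   #l, final volume of the solution

moles = mass / MW                #g / (g/mol) = mol
concentration = moles / volume   #mol / l = M

pH = 7.0
H_conc = 10 ** (-pH)   #M, proton concentration calculated from the pH

print(moles)           #print the values that we calculated
print(concentration)
print(H_conc)
```

Writing the units next to each assignment makes it easier to check that the final result has the units we expect.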

Let's see this in action with _<font color=red>calculations using variables</font>_.

```{exercise}
:label: my-exercise1

How many grams of solid NaOH (40.0 g/mol) are required to prepare 500 ml of a 0.04 M solution?
```

````{solution} my-exercise1
:label: my-solution1
:class: dropdown

Here's one possible solution.

```{code-block} python
liters = 0.5   #l
M = 0.04   #mol/l
MW = 40.0   #g/mol

wt = (liters * M) * MW   #(l * mol/l) * g/mol = g

print(wt)   #print the value that we calculated
```
````

```{exercise}
:label: my-exercise2

An enzyme has a $V_{max}$ of 1.2 $\mu$M s$^{-1}$ and a $K_m$ of 10 $\mu$M. What is the initial velocity (in $\mu$M s$^{-1}$) for an 8 $\mu$M substrate concentration?
```

````{solution} my-exercise2
:label: my-solution2
:class: dropdown

Here's one possible solution.

```{code-block} python
Km = 10   #microM
Vmax = 1.2   #microM/s
S = 8   #microM

V0 = (Vmax * S) / (Km + S)   #(microM/s * microM) / (microM + microM) = microM/s; Michaelis-Menten equation

print(V0)   #print the value that we calculated
```
````

```{exercise}
:label: my-exercise3

Convert the initial velocity in $\mu$M s$^{-1}$ from the previous exercise to $\mu$M min$^{-1}$.
```

````{solution} my-exercise3
:label: my-solution3
:class: dropdown

Here's one possible solution.

```{code-block} python
V0permin = V0 * 60   #microM/s * 60s/min = microM/min

print(V0permin)   #print the value that we calculated
```
````

## __<font color=blue>Data types, using `len()` to return the number of items in a sequence, and using `.count()` to count how many times an item appears in a sequence</font>__
---

In Python, the __data type__ is set when we assign a value to a variable. Each data type supports its own set of operations.

The most common data types are
- __strings__ ("str") for text (surrounded by either single quotation marks or double quotation marks),
- __integers__ ("int") for whole numbers, positive or negative, without decimals, of unlimited length,
- __floating point numbers__ ("float") for numbers, positive or negative, containing one or more decimals,
- __lists__ ("list") for multiple ordered and changeable items of different data types within one variable (created using square brackets (`[]`)),
- __tuples__ ("tuple") for multiple ordered and unchangeable items of different data types within one variable (created using round brackets (`()`)),
- __dictionaries__ ("dict") for storing a collection of changeable “key : value” pairs, kept in insertion order and with unique keys (created using curly brackets (`{}`); pairs are separated using commas, and keys and values are separated using colons).

Use `type(variable_name)` to identify the data type of any variable.
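
As a small illustration, the sketch below creates one variable for each of the data types listed above and checks it with `type()`. All names and values are hypothetical examples.

```python
# Hypothetical values, one for each data type listed above
enzyme = "EcoRI"                 #string
n_sites = 5                      #integer
MW = 40.0                        #float, g/mol
bases = ["A", "C", "G", "T"]     #list
codon = ("A", "U", "G")          #tuple
site = {"EcoRI": "GAATTC"}       #dictionary

print(type(enzyme))    #<class 'str'>
print(type(n_sites))   #<class 'int'>
print(type(MW))        #<class 'float'>
print(type(bases))     #<class 'list'>
print(type(codon))     #<class 'tuple'>
print(type(site))      #<class 'dict'>
```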

Use `len(variable_name)` to determine the __length__ of a sequence (_e.g._, a string, list, or tuple) or the number of “key : value” pairs in a dictionary.

Use `sequence_name.count(value)` to __count__ the number of items with a specified value within a sequence.

Use `dictionary_name.get(key_name)` to return the value of a specified key in a dictionary. If the key is not found, it returns `None`.
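
The three functions above can be tried on a short example before tackling the exercises. This is only a sketch: the sequence and the small codon dictionary are made up for illustration.

```python
DNAseq = "GAATTCGGATCC"   #hypothetical DNA sequence: an EcoRI site followed by a BamHI site

print(len(DNAseq))          #12, the number of characters in the string
print(DNAseq.count("G"))    #3, the number of times G appears in the string

codons = {"AUG": "Met", "UGG": "Trp"}   #hypothetical dictionary with codons (keys) and amino acids (values)

print(len(codons))          #2, the number of key : value pairs
print(codons.get("AUG"))    #Met
print(codons.get("UAA"))    #None, as this key is not in the dictionary
```

Because `.get()` returns `None` for a missing key instead of raising an error, it is a convenient way to look up keys that may not be present.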

Let's see this in action with _<font color=red>DNA, RNA, and protein sequences as strings</font>_ and _<font color=red>lists with substrate concentrations and amino acids</font>_.

```{exercise}
:label: my-exercise4

Determine the data type, length, and number of tryptophan residues for this LRRK2 protein sequence written in one-letter amino acid code.
```

```{code-block} python
protseqLRRK2 = "MASGSCQGCEEDEETLKKLIVRLNNVQEGKQIETLVQILEDLLVFTYSERASKLFQGKNIHVPLLIVLDSYMRVASVQQVGWSLLCKLIEVCPGTMQSLMGPQDVGNDWEVLGVHQLILKMLTVHNASVNLSVIGLKTLDLLLTSGKITLLILDEESDIFMLIFDAMHSFPANDEVQKLGCKALHVLFERVSEEQLTEFVENKDYMILLSALTNFKDEEEIVLHVLHCLHSLAIPCNNVEVLMSGNVRCYNIVVEAMKAFPMSERIQEVSCCLLHRLTLGNFFNILVLNEVHEFVVKAVQQYPENAALQISALSCLALLTETIFLNQDLEEKNENQENDDEGEEDKLFWLEACYKALTWHRKNKHVQEAACWALNNLLMYQNSLHEKIGDEDGHFPAHREVMLSMLMHSSSKEVFQASANALSTLLEQNVNFRKILLSKGIHLNVLELMQKHIHSPEVAESGCKMLNHLFEGSNTSLDIMAAVVPKILTVMKRHETSLPVQLEALRAILHFIVPGMPEESREDTEFHHKLNMVKKQCFKNDIHKLVLAALNRFIGNPGIQKCGLKVISSIVHFPDALEMLSLEGAMDSVLHTLQMYPDDQEIQCLGLSLIGYLITKKNVFIGTGHLLAKILVSSLYRFKDVAEIQTKGFQTILAILKLSASFSKLLVHHSFDLVIFHQMSSNIMEQKDQQFLNLCCKCFAKVAMDDYLKNVMLERACDQNNSIMVECLLLLGADANQAKEGSSLICQVCEKESSPKLVELLLNSGSREQDVRKALTISIGKGDSQIISLLLRRLALDVANNSICLGGFCIGKVEPSWLGPLFPDKTSNLRKQTNIASTLARMVIRYQMKSAVEEGTASGSDGNFSEDVLSKFDEWTFIPDSSMDSVFAQSDDLDSEGSEGSFLVKKKSNSISVGEFYRDAVLQRCSPNLQRHSNSLGPIFDHEDLLKRKRKILSSDDSLRSSKLQSHMRHSDSISSLASEREYITSLDLSANELRDIDALSQKCCISVHLEHLEKLELHQNALTSFPQQLCETLKSLTHLDLHSNKFTSFPSYLLKMSCIANLDVSRNDIGPSVVLDPTVKCPTLKQFNLSYNQLSFVPENLTDVVEKLEQLILEGNKISGICSPLRLKELKILNLSKNHISSLSENFLEACPKVESFSARMNFLAAMPFLPPSMTILKLSQNKFSCIPEAILNLPHLRSLDMSSNDIQYLPGPAHWKSLNLRELLFSHNQISILDLSEKAYLWSRVEKLHLSHNKLKEIPPEIGCLENLTSLDVSYNLELRSFPNEMGKLSKIWDLPLDELHLNFDFKHIGCKAKDIIRFLQQRLKKAVPYNRMKLMIVGNTGSGKTTLLQQLMKTKKSDLGMQSATVGIDVKDWPIQIRDKRKRDLVLNVWDFAGREEFYSTHPHFMTQRALYLAVYDLSKGQAEVDAMKPWLFNIKARASSSPVILVGTHLDVSDEKQRKACMSKITKELLNKRGFPAIRDYHFVNATEESDALAKLRKTIINESLNFKIRDQLVVGQLIPDCYVELEKIILSERKNVPIEFPVIDRKRLLQLVRENQLQLDENELPHAVHFLNESGVLLHFQDPALQLSDLYFVEPKWLCKIMAQILTVKVEGCPKHPKGIISRRDVEKFLSKKRKFPKNYMSQYFKLLEKFQIALPIGEEYLLVPSSLSDHRPVIELPHCENSEIIIRLYEMPYFPMGFWSRLINRLLEISPYMLSGRERALRPNRMYWRQGIYLNWSPEAYCLVGSEVLDNHPESFLKITVPSCRKGCILLGQVVDHIDSLMEEWFPGLLEIDICGEGETLLKKWALYSFNDGEEHQKILLDDLMKKAEEGDLLVNPDQPRLTIPISQIAPDLILADLPRNIMLNNDELEFEQAPEFLLGDGSFGSVYRAAYEGEEVAVKIFNKHTSLRLLRQELVVLCHLHHPSLISLLAAGIRPRMLVMELASKGSLDRLLQQDKASLTRTLQHRIALHVADGLRYLHSAMIIYRDLKPHNVLLFTLYPNAAIIAKIADYGIAQYCCRMGIKTSEGTPGFRAPEVARGNVIYNQQADVYSFGLLLYDILTTGGRIVEGLKFPNEFDELEIQGKLPDPVKEYGCAPWPMVEKLIKQCLKENPQERPTSAQVFDILNSAELVCLTRRILLPKNVIVECMVATHHNSRNASIWLGCGHTDRGQLSFLDLNTEGYTSEEVADSRILCLALVHLPVEKESWIVSGTQSGTLLVINTEDGKKRHTLEKMTDSVTCLYCNSFSKQSKQKNFLLVGTADGKLAIFEDKTVKLKGAAPLKILNIGNVSTPLMCLSESTNSTERNVMWGGCGTKIFSFSNDFTIQKLIETRTSQLFSYAAFSDSNIITVVVDTALYIAKQNSPVVEVWDKKTEKLCGLIDCVHFLREVMVKENKESKHKMSYSGRVKTLCLQKNTALWIGTGGGHILLLDLSTRRLIRVIYNFCNSVRVMMTAQLGSLKNVMLVLGYNRKNTEGTQKQKEIQSCLTVWDINLPHEVQNLEKHIEVRKELAEKMRRTSVE"
```

````{solution} my-exercise4
:label: my-solution4
:class: dropdown

Here's one possible solution.

```{code-block} python
print(type(protseqLRRK2))   #determine and print the data type

len_protseqLRRK2 = len(protseqLRRK2)   #determine the length of the string
print(len_protseqLRRK2)   #print the value that we calculated

Wcount_protseqLRRK2 = protseqLRRK2.count("W")   #count the number of times W appears in the string
print(Wcount_protseqLRRK2)   #print the value that we calculated
```
````

```{exercise}
:label: my-exercise5

Define the EcoRI DNA recognition sequence (GAATTC) as a string.
```

````{solution} my-exercise5
:label: my-solution5
:class: dropdown

Here's one possible solution.

```{code-block} python
DNAseqEcoRI = "GAATTC"   #create a string using double quotation marks
```
````

```{exercise}
:label: my-exercise6

Determine the data type and length for this list with substrate concentrations.
```

```{code-block} python
subconc = [0, 1, 2, 4, 8, 15, 30, 60, 125, 250, 500]
```

````{solution} my-exercise6
:label: my-solution6
:class: dropdown

Here's one possible solution.

```{code-block} python
print(type(subconc))   #determine and print the data type

len_subconc = len(subconc)   #determine the length of the list
print(len_subconc)   #print the value that we calculated
```
````

```{exercise}
:label: my-exercise7

Determine the data type, length, and number of alanine residues for this list with amino acids.
```

```{code-block} python
AA3Letter = ["ALA", "ARG", "ASN", "ASP", "CYS", "GLN", "GLU", "GLY", "HIS", "ILE", "LEU", "LYS", "MET", "PHE", "PRO", "SER", "THR", "TRP", "TYR", "VAL"]
```

````{solution} my-exercise7
:label: my-solution7
:class: dropdown

Here's one possible solution.

```{code-block} python
print(type(AA3Letter))   #determine and print the data type

len_AA3Letter = len(AA3Letter)   #determine the length of the list
print(len_AA3Letter)   #print the value that we calculated

ALAcount_AA3Letter = AA3Letter.count("ALA")   #count the number of times ALA appears in the list
print(ALAcount_AA3Letter)   #print the value that we calculated
```
````

```{exercise}
:label: my-exercise8

Determine the number of “key : value” pairs in the following dictionary of restriction enzymes, with key = name and value = cleavage site. Retrieve the cleavage site for TaqI.
```

```{code-block} python
REs = {
    'EcoRI' : 'GAATTC',
    'BamHI' : 'GGATCC',
    'EarI' : 'CTCTTC',
    'ScaI' : 'AGTACT',
    'NotI' : 'GCGGCCGC',
    'TaqI' : 'TCGA',
    'FokI' : 'GGATG',
    'HindIII' : 'AAGCTT'
}   #Create a dictionary with restriction enzyme names (keys) and cleavage sites (values). Both are strings.
```

````{solution} my-exercise8
:label: my-solution8
:class: dropdown

Here's one possible solution.

```{code-block} python
len_REs = len(REs)   #determine the length of the dictionary
print(len_REs)   #print the value that we calculated

REs_TaqI = REs.get('TaqI')   #look up the cleavage site stored for TaqI
print(REs_TaqI)   #print the value that we calculated
```
````

## __<font color=blue>Index and slice</font>__
---

Sequence-based data types (_e.g._, a string, list, or tuple) are __indexed__, with the first item having index `0`. To select the first item, use `sequence_name[0]`; to select the second item, use `sequence_name[1]`, and so on.

If we have a long sequence and want to select an item towards the end, we can count backwards: the last item has index `-1`, the second-to-last item has index `-2`, and so on.
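
A short sketch of counting from the front and from the back, using a made-up RNA sequence (the sequence and variable names are illustrative only):

```python
RNAseq = "AUGGCCUGA"   #hypothetical RNA sequence with 9 nucleotides

first_nt = RNAseq[0]     #first item, index 0
second_nt = RNAseq[1]    #second item, index 1
last_nt = RNAseq[-1]     #last item, counting backwards from -1
third_last_nt = RNAseq[-3]   #third item from the end

print(first_nt)        #A
print(second_nt)       #U
print(last_nt)         #A
print(third_last_nt)   #U
```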

The syntax for selecting a subset of an existing sequence, a __slice__, is: `sequence_name[start:end]`. When we specify the end item for the slice, the slice goes up to but does not include that item of the sequence!

Use `sequence_name[:end]` to have the slice starting from the beginning of the sequence.

Use `sequence_name[start:]` to have the slice going to the end of the sequence.

```{image} ./Images-VariablesDataTypesIndicesSlices/IndexingSlicing.png
:alt: Indexing and slicing of sequences
:width: 600px
:align: center
```
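
The slicing rules can be tried on a short example first. The sketch below uses a made-up RNA sequence and a made-up list of time points; both are illustrative assumptions, not data from the exercises.

```python
RNAseq = "AUGGCCUGA"   #hypothetical RNA sequence with 9 nucleotides

start_codon = RNAseq[0:3]    #items with index 0, 1, and 2 (index 3 is not included)
same_codon = RNAseq[:3]      #same slice, starting from the beginning of the sequence
middle_codon = RNAseq[3:6]   #items with index 3, 4, and 5
stop_codon = RNAseq[6:]      #from index 6 to the end of the sequence

print(start_codon)    #AUG
print(same_codon)     #AUG
print(middle_codon)   #GCC
print(stop_codon)     #UGA

timepoints = [0, 5, 10, 20, 40]   #min, hypothetical list; slicing works the same way for lists
print(timepoints[1:3])            #[5, 10]
```

Because the end index is excluded, the length of a slice equals `end - start`, which is a quick way to check that a slice covers the intended items.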

Let's see this in action with _<font color=red>DNA, RNA, and protein sequences as strings</font>_.

```{exercise}
:label: my-exercise9

Select the signal peptide (residues 1 to 22: MVSTMLSGLVLWLTFGWTPALA) and the two phosphorylated serine residues (S136 and S200) of this 7B2 protein sequence written in one-letter amino acid code.
```

```{code-block} python
protseq7B2 = "MVSTMLSGLVLWLTFGWTPALAYSPRTPDRVSETDIQRLLHGVMEQLGIARPRVEYPAHQAMNLVGPQSIEGGAHEGLQHLGPFGNIPNIVAELTGDNTPKDFSEDQGYPDPPNPCPIGKTDDGCLENTPDTAEFSREFQLHQHLFDPEHDYPGLGKWNKKLLYEKMKGGQRRKRRSVNPYLQGQRLDNVVAKKSVPHFSDEDKDPE"
```

````{solution} my-exercise9
:label: my-solution9
:class: dropdown

Here's one possible solution.

```{code-block} python
SPseq7B2 = protseq7B2[0:22]   #Select residues 1 to 22 (start at 0 as the first item has index 0; end at 22 as the slice goes up to but does not include index 22)
print(SPseq7B2)   #print the value that we calculated

PhosS136 = protseq7B2[135]   #Select residue 136 (135 as the first item has index 0)
print(PhosS136)   #print the value that we calculated

PhosS200 = protseq7B2[199]   #Select residue 200 (199 as the first item has index 0)
print(PhosS200)   #print the value that we calculated
```
````