{"id":836,"date":"2017-02-23T19:01:41","date_gmt":"2017-02-23T19:01:41","guid":{"rendered":"http:\/\/www.cellbiol.com\/bioinformatics_web_development\/?page_id=836"},"modified":"2017-04-08T17:49:02","modified_gmt":"2017-04-08T17:49:02","slug":"php-programming-language-basics-more-on-strings-and-biological-sequences-manipulation-with-predefined-functions","status":"publish","type":"page","link":"http:\/\/www.cellbiol.com\/bioinformatics_web_development\/chapter-4-adding-a-dynamic-layer-introducing-the-php-programming-language\/php-programming-language-basics-more-on-strings-and-biological-sequences-manipulation-with-predefined-functions\/","title":{"rendered":"4-7: PHP programming language basics &#8211; more on strings and biological sequences manipulation with predefined functions"},"content":{"rendered":"<p>In the previous section we have started to see some basics on how to <a href=\"http:\/\/www.cellbiol.com\/bioinformatics_web_development\/chapter-4-adding-a-dynamic-layer-introducing-the-php-programming-language\/php-programming-language-basics-built-in-predefined-functions-strings-and-biological-sequences-manipulation\/\">manipulate strings and biological sequences in PHP<\/a> by using predefined functions. In this section we explore the topic further by exploring a few more useful PHP built-in tools (predefined functions).<\/p>\n<h2>Splitting a biological sequence in single nucleotides, codons or amino-acids with the str_split() function<\/h2>\n<p>str_split() &#8211; <strong>string split<\/strong> &#8211; takes two arguments, a string and an integer (the second is optional and defaults to 1). As the name suggests, it splits a string into pieces of a given number of characters (the second argument passed in the function call) and returns them as an array. If the passed number exceeds the string length, the array will contain the entire string as its single element. 
If the string length is not an exact multiple of the number, it will return substrings of the requested length followed by a last, shorter substring with what remains.<\/p>\n<p>Let&#8217;s make this clearer with a few examples.<\/p>\n<pre lang=\"php\"><code>\r\n<?php\r\n\r\n$mystring = \"123456789\";\r\n\r\n$splitted_on_3 = str_split($mystring,3); \/\/ splitting into substrings of 3 characters \r\n\r\necho \"<p>Splitting 123456789 to substrings of 3 characters<br>\";\r\nvar_dump($splitted_on_3);\r\necho \"<\/p>\";\r\n\r\n\/\/ Will output: array(3) { [0]=> string(3) \"123\" [1]=> string(3) \"456\" [2]=> string(3) \"789\" }\r\n\r\n$splitted_on_2 = str_split($mystring,2); \/\/ splitting into substrings of 2 characters \r\n\r\necho \"<p>Splitting 123456789 to substrings of 2 characters<br>\";\r\nvar_dump($splitted_on_2);\r\n\/\/ Will output: array(5) { [0]=> string(2) \"12\" [1]=> string(2) \"34\" [2]=> string(2) \"56\" [3]=> string(2) \"78\" [4]=> string(1) \"9\" }\r\necho \"<\/p>\";\r\n\r\n$splitted_on_10 = str_split($mystring,10); \/\/ splitting into substrings of 10 characters \r\n\r\necho \"<p>Splitting 123456789 to substrings of 10 characters<br>\";\r\nvar_dump($splitted_on_10);\r\n\/\/ Will output: array(1) { [0]=> string(9) \"123456789\" }\r\necho \"<\/p>\";\r\n\r\n?>\r\n<\/code><\/pre>\n<p>In the example above, note that when we split a 9-character string into substrings of 2 characters, str_split() generates an array of 5 elements. 
The first 4 are substrings of 2 characters (what we asked for by passing 2 as the second argument to the function) and the last is just one character, the remainder of the string once all the possible two-character substrings have been taken out.<\/p>\n<p>Also note that when we attempt to subdivide our string into substrings longer than the string itself &#8211; in this example we try to subdivide a 9-character string into substrings of 10 &#8211; the whole string is returned as the single element of the str_split() output array.<\/p>\n<p>In the example that follows we use str_split() to subdivide a coding sequence into the codons (triplets) that compose it. You can see how this could be a first step toward translating our DNA coding sequence into a protein sequence.<\/p>\n<pre lang=\"php\"><code>\r\n<?php\r\n\r\n$cod_sequence = \"ATGGCTAATGATAGA\"; \/\/ A short portion of a DNA coding sequence\r\n\r\n$codons = str_split($cod_sequence,3);\r\n\r\necho \"<p>\\n<strong>Here are the codons composing $cod_sequence<\/strong>\\n<ul>\\n\";\r\nforeach($codons as $codon){\r\n    echo \"<li>\".$codon.\"<\/li>\\n\";\r\n}\r\necho \"<\/ul>\\n<\/p>\";\r\n\r\n?>\r\n<\/code><\/pre>\n<p>This will generate the following output:<\/p>\n<p>\n<strong>Here are the codons composing ATGGCTAATGATAGA<\/strong><\/p>\n<ul>\n<li>ATG<\/li>\n<li>GCT<\/li>\n<li>AAT<\/li>\n<li>GAT<\/li>\n<li>AGA<\/li>\n<\/ul>\n<p>If we want to split the same DNA sequence used in the previous example into single nucleotides instead of codons, all we have to do is use 1 instead of 3 as the second argument in the str_split() call. 
Let&#8217;s also change the variable names so that they make sense for the new script.<\/p>\n<pre lang=\"php\"><code>\r\n<?php\r\n\r\n$cod_sequence = \"ATGGCTAATGATAGA\"; \/\/ A short portion of a DNA coding sequence\r\n\r\n$nucleotides = str_split($cod_sequence,1);\r\n\r\necho \"<p>\\n<strong>Here are the nucleotides composing $cod_sequence<\/strong>\\n<ul>\\n\";\r\nforeach($nucleotides as $nucleotide){\r\n    echo \"<li>\".$nucleotide.\"<\/li>\\n\";\r\n}\r\necho \"<\/ul>\\n<\/p>\";\r\n\r\n?>\r\n<\/code><\/pre>\n<p>Here is the output of the script above:<\/p>\n<p>\n<strong>Here are the nucleotides composing ATGGCTAATGATAGA<\/strong><\/p>\n<ul>\n<li>A<\/li>\n<li>T<\/li>\n<li>G<\/li>\n<li>G<\/li>\n<li>C<\/li>\n<li>T<\/li>\n<li>A<\/li>\n<li>A<\/li>\n<li>T<\/li>\n<li>G<\/li>\n<li>A<\/li>\n<li>T<\/li>\n<li>A<\/li>\n<li>G<\/li>\n<li>A<\/li>\n<\/ul>\n<h2 id=\"reverse-complement\">How to reverse-complement a DNA sequence in PHP<\/h2>\n<p>We now have enough knowledge of PHP to perform a simple yet often essential operation on DNA sequences: from the sequence of one strand, derive the sequence of the other. 
You are surely familiar with the concept that DNA is a double helix and the two strands of the helix are complementary to each other: if A is on one strand, T is on the other (and vice-versa) and if C is on one strand G is on the other (and vice-versa).<\/p>\n<figure id=\"attachment_902\" aria-describedby=\"caption-attachment-902\" style=\"width: 2560px\" class=\"wp-caption alignnone\"><a href=\"http:\/\/www.cellbiol.com\/bioinformatics_web_development\/wp-content\/uploads\/2017\/02\/DNA_StructureKeyLabelled.pn_NoBB.png\"><img loading=\"lazy\" decoding=\"async\" src=\"http:\/\/www.cellbiol.com\/bioinformatics_web_development\/wp-content\/uploads\/2017\/02\/DNA_StructureKeyLabelled.pn_NoBB.png\" alt=\"The DNA double helix\" width=\"2560\" height=\"2498\" class=\"size-full wp-image-902\" srcset=\"http:\/\/www.cellbiol.com\/bioinformatics_web_development\/wp-content\/uploads\/2017\/02\/DNA_StructureKeyLabelled.pn_NoBB.png 2560w, http:\/\/www.cellbiol.com\/bioinformatics_web_development\/wp-content\/uploads\/2017\/02\/DNA_StructureKeyLabelled.pn_NoBB-300x293.png 300w, http:\/\/www.cellbiol.com\/bioinformatics_web_development\/wp-content\/uploads\/2017\/02\/DNA_StructureKeyLabelled.pn_NoBB-768x749.png 768w, http:\/\/www.cellbiol.com\/bioinformatics_web_development\/wp-content\/uploads\/2017\/02\/DNA_StructureKeyLabelled.pn_NoBB-1024x999.png 1024w, http:\/\/www.cellbiol.com\/bioinformatics_web_development\/wp-content\/uploads\/2017\/02\/DNA_StructureKeyLabelled.pn_NoBB-1200x1171.png 1200w\" sizes=\"auto, (max-width: 709px) 85vw, (max-width: 909px) 67vw, (max-width: 1362px) 62vw, 840px\" \/><\/a><figcaption id=\"caption-attachment-902\" class=\"wp-caption-text\">Figure 4-7-1: DNA is a double helix in which the two strands are complementary: when an A is present on a strand, a T is present on the other strand. When a C is present on a strand, a G is present on the other strand. 
From the sequence of one strand you can easily compute the sequence of the other by performing a &#8220;reverse complement&#8221; operation. Image credits: <a href=\"https:\/\/commons.wikimedia.org\/wiki\/User:Zephyris\">Zephyris<\/a>, Wikipedia<\/figcaption><\/figure>\n<p>Given the sequence of one DNA strand, you can easily obtain the sequence of the other by performing a so-called &#8220;<a href=\"http:\/\/www.cellbiol.com\/scripts\/complement\/dna_sequence_reverse_complement.php\">reverse complement<\/a>&#8221; operation.<\/p>\n<p>Here is some PHP code that does just that. We will explore this much further in the web applications chapter.<\/p>\n<pre lang=\"php\"><code>\r\n<?php\r\n\r\n\/\/ Dictionary mapping each nucleotide to its complement\r\n$complement_dict = array(\r\n    \"A\" => \"T\",\r\n    \"T\" => \"A\",\r\n    \"G\" => \"C\",\r\n    \"C\" => \"G\"\r\n);\r\n\r\n$sequence = \"ATGGTGAAGCAGATCGA\"; \r\n\r\n\/\/ Split the sequence into single nucleotides\r\n$nucleotides = str_split($sequence,1);\r\n\r\n$complement_sequence = \"\";\r\n\r\n\/\/ Build the complement strand one nucleotide at a time\r\nforeach($nucleotides as $nucleotide){\r\n    $complement_sequence = $complement_sequence.$complement_dict[$nucleotide];\r\n}\r\n\r\n\/\/ Reverse the complement to obtain the reverse complement\r\n$revcomp_sequence = strrev($complement_sequence);\r\n\r\necho \"<p>\\n<strong>Input Sequence<\/strong><br>\\n<span style=\\\"font-family:courier;\\\">$sequence<\/span>\\n<\/p>\\n<p>\\n<strong>Reverse Complement<\/strong><br>\\n<span style=\\\"font-family:courier;\\\">$revcomp_sequence<\/span>\\n<\/p>\";\r\n\r\n?>\r\n<\/code><\/pre>\n<p>Here is the output of this script:<\/p>\n<p>\n<strong>Input Sequence<\/strong><br \/>\n<span style=\"font-family:courier;\">ATGGTGAAGCAGATCGA<\/span>\n<\/p>\n<p>\n<strong>Reverse Complement<\/strong><br \/>\n<span style=\"font-family:courier;\">TCGATCTGCTTCACCAT<\/span>\n<\/p>\n<h2 id=\"sequence_translation\">Translating a DNA coding sequence to an amino-acids sequence with PHP<\/h2>\n<p>Let&#8217;s now take the splitting of a DNA sequence into codons shown above one step further and actually perform the translation of the DNA coding sequence to an 
amino-acid sequence. In order to do a translation, pretty much any translation from one language to another, we need a dictionary where we can look up a word and get its translation in the language we are interested in. In the case of DNA codons, we need a dictionary to translate triplets of DNA nucleotides (codons) to the corresponding amino-acids. We can easily generate such a dictionary in PHP from the <a href=\"https:\/\/en.wikipedia.org\/wiki\/DNA_codon_table\" target=\"_blank\">genetic code<\/a>.<\/p>\n<figure id=\"attachment_863\" aria-describedby=\"caption-attachment-863\" style=\"width: 1812px\" class=\"wp-caption alignnone\"><a href=\"http:\/\/www.cellbiol.com\/bioinformatics_web_development\/wp-content\/uploads\/2017\/02\/genetic-code-wikipedia.png\"><img loading=\"lazy\" decoding=\"async\" src=\"http:\/\/www.cellbiol.com\/bioinformatics_web_development\/wp-content\/uploads\/2017\/02\/genetic-code-wikipedia.png\" alt=\"The genetic code\" width=\"1812\" height=\"1122\" class=\"size-full wp-image-863\" srcset=\"http:\/\/www.cellbiol.com\/bioinformatics_web_development\/wp-content\/uploads\/2017\/02\/genetic-code-wikipedia.png 1812w, http:\/\/www.cellbiol.com\/bioinformatics_web_development\/wp-content\/uploads\/2017\/02\/genetic-code-wikipedia-300x186.png 300w, http:\/\/www.cellbiol.com\/bioinformatics_web_development\/wp-content\/uploads\/2017\/02\/genetic-code-wikipedia-768x476.png 768w, http:\/\/www.cellbiol.com\/bioinformatics_web_development\/wp-content\/uploads\/2017\/02\/genetic-code-wikipedia-1024x634.png 1024w, http:\/\/www.cellbiol.com\/bioinformatics_web_development\/wp-content\/uploads\/2017\/02\/genetic-code-wikipedia-1200x743.png 1200w\" sizes=\"auto, (max-width: 709px) 85vw, (max-width: 909px) 67vw, (max-width: 1362px) 62vw, 840px\" \/><\/a><figcaption id=\"caption-attachment-863\" class=\"wp-caption-text\">Figure 4-7-2: The Genetic Code &#8211; Source: Wikipedia<\/figcaption><\/figure>\n<p>As you know at this point, a dictionary 
can be created in PHP as an array in which each key is associated to a value. Each key is the word to translate while the corresponding value is the translation itself.<\/p>\n<p>Let&#8217;s get into it. Here&#8217;s the genetic code as a PHP array dictionary, derived from the figure above:<\/p>\n<pre lang=\"php\"><code>\r\n<?php\r\n\r\n$genetic_code = array(\r\n    \"TTT\" => \"F\",\r\n    \"TTC\" => \"F\",\r\n    \"TTA\" => \"L\",\r\n    \"TTG\" => \"L\",\r\n    \"CTT\" => \"L\",\r\n    \"CTC\" => \"L\",\r\n    \"CTA\" => \"L\",\r\n    \"CTG\" => \"L\",\r\n    \"ATT\" => \"I\",\r\n    \"ATC\" => \"I\",\r\n    \"ATA\" => \"I\",\r\n    \"ATG\" => \"M\",\r\n    \"GTT\" => \"V\",\r\n    \"GTC\" => \"V\",\r\n    \"GTA\" => \"V\",\r\n    \"GTG\" => \"V\",\r\n    \"TCT\" => \"S\",\r\n    \"TCC\" => \"S\",\r\n    \"TCA\" => \"S\",\r\n    \"TCG\" => \"S\",\r\n    \"CCT\" => \"P\",\r\n    \"CCC\" => \"P\",\r\n    \"CCA\" => \"P\",\r\n    \"CCG\" => \"P\",\r\n    \"ACT\" => \"T\",\r\n    \"ACC\" => \"T\",\r\n    \"ACA\" => \"T\",\r\n    \"ACG\" => \"T\",\r\n    \"GCT\" => \"A\",\r\n    \"GCC\" => \"A\",\r\n    \"GCA\" => \"A\",\r\n    \"GCG\" => \"A\",\r\n    \"TAT\" => \"Y\",\r\n    \"TAC\" => \"Y\",\r\n    \"TAA\" => \"Stop\",\r\n    \"TAG\" => \"Stop\",\r\n    \"CAT\" => \"H\",\r\n    \"CAC\" => \"H\",\r\n    \"CAA\" => \"Q\",\r\n    \"CAG\" => \"Q\",\r\n    \"AAT\" => \"N\",\r\n    \"AAC\" => \"N\",\r\n    \"AAA\" => \"K\",\r\n    \"AAG\" => \"K\",\r\n    \"GAT\" => \"D\",\r\n    \"GAC\" => \"D\",\r\n    \"GAA\" => \"E\",\r\n    \"GAG\" => \"E\",\r\n    \"TGT\" => \"C\",\r\n    \"TGC\" => \"C\",\r\n    \"TGA\" => \"Stop\",\r\n    \"TGG\" => \"W\",\r\n    \"CGT\" => \"R\",\r\n    \"CGC\" => \"R\",\r\n    \"CGA\" => \"R\",\r\n    \"CGG\" => \"R\", \r\n    \"AGT\" => \"S\",   \r\n    \"AGC\" => \"S\", \r\n    \"AGA\" => \"R\", \r\n    \"AGG\" => \"R\", \r\n    \"GGT\" => \"G\",\r\n    \"GGC\" => \"G\",\r\n    \"GGA\" => \"G\",\r\n    \"GGG\" => 
\"G\"\r\n);\r\n\r\n?>\r\n<\/code><\/pre>\n<p>We will use this genetic code PHP dictionary to translate an actual DNA coding sequence. Let&#8217;s take the Human Thioredoxin (<a href=\"http:\/\/www.uniprot.org\/uniprot\/P10599\" target=\"_blank\">Uniprot P10599<\/a>) as an example. The coding sequence can be found <a href=\"https:\/\/www.ncbi.nlm.nih.gov\/CCDS\/CcdsBrowse.cgi?REQUEST=CCDS&#038;GO=MainBrowse&#038;DATA=CCDS35103.1\" target=\"_blank\">here<\/a>.<\/p>\n<pre lang=\"php\"><code>\r\n<?php\r\n\r\n$genetic_code = array(\r\n    \"TTT\" => \"F\",\r\n    \"TTC\" => \"F\",\r\n    \"TTA\" => \"L\",\r\n    \"TTG\" => \"L\",\r\n    \"CTT\" => \"L\",\r\n    \"CTC\" => \"L\",\r\n    \"CTA\" => \"L\",\r\n    \"CTG\" => \"L\",\r\n    \"ATT\" => \"I\",\r\n    \"ATC\" => \"I\",\r\n    \"ATA\" => \"I\",\r\n    \"ATG\" => \"M\",\r\n    \"GTT\" => \"V\",\r\n    \"GTC\" => \"V\",\r\n    \"GTA\" => \"V\",\r\n    \"GTG\" => \"V\",\r\n    \"TCT\" => \"S\",\r\n    \"TCC\" => \"S\",\r\n    \"TCA\" => \"S\",\r\n    \"TCG\" => \"S\",\r\n    \"CCT\" => \"P\",\r\n    \"CCC\" => \"P\",\r\n    \"CCA\" => \"P\",\r\n    \"CCG\" => \"P\",\r\n    \"ACT\" => \"T\",\r\n    \"ACC\" => \"T\",\r\n    \"ACA\" => \"T\",\r\n    \"ACG\" => \"T\",\r\n    \"GCT\" => \"A\",\r\n    \"GCC\" => \"A\",\r\n    \"GCA\" => \"A\",\r\n    \"GCG\" => \"A\",\r\n    \"TAT\" => \"Y\",\r\n    \"TAC\" => \"Y\",\r\n    \"TAA\" => \"Stop\",\r\n    \"TAG\" => \"Stop\",\r\n    \"CAT\" => \"H\",\r\n    \"CAC\" => \"H\",\r\n    \"CAA\" => \"Q\",\r\n    \"CAG\" => \"Q\",\r\n    \"AAT\" => \"N\",\r\n    \"AAC\" => \"N\",\r\n    \"AAA\" => \"K\",\r\n    \"AAG\" => \"K\",\r\n    \"GAT\" => \"D\",\r\n    \"GAC\" => \"D\",\r\n    \"GAA\" => \"E\",\r\n    \"GAG\" => \"E\",\r\n    \"TGT\" => \"C\",\r\n    \"TGC\" => \"C\",\r\n    \"TGA\" => \"Stop\",\r\n    \"TGG\" => \"W\",\r\n    \"CGT\" => \"R\",\r\n    \"CGC\" => \"R\",\r\n    \"CGA\" => \"R\",\r\n    \"CGG\" => \"R\", \r\n    \"AGT\" => \"S\",   \r\n    \"AGC\" => \"S\", 
\r\n    \"AGA\" => \"R\", \r\n    \"AGG\" => \"R\", \r\n    \"GGT\" => \"G\",\r\n    \"GGC\" => \"G\",\r\n    \"GGA\" => \"G\",\r\n    \"GGG\" => \"G\"\r\n);\r\n\r\n$translated_sequence = \"\";\r\n\r\n$thio_human_cds = \"ATGGTGAAGCAGATCGAGAGCAAGACTGCTTTTCAGGAAGCCTTGGACGCTGCAGGTGATAAACTTGTAGTAGTTGACTTCTCAGCCACGTGGTGTGGGCCTTGCAAAATGATCAAGCCTTTCTTTCATTCCCTCTCTGAAAAGTATTCCAACGTGATATTCCTTGAAGTAGATGTGGATGACTGTCAGGATGTTGCTTCAGAGTGTGAAGTCAAATGCATGCCAACATTCCAGTTTTTTAAGAAGGGACAAAAGGTGGGTGAATTTTCTGGAGCCAATAAGGAAAAGCTTGAAGCCACCATTAATGAATTAGTCTAA\";\r\n\r\n$codons = str_split($thio_human_cds,3);\r\n\r\necho \"<p>\\n<strong>The Human Thioredoxin DNA coding sequence<\/strong><br>\\n<span style=\\\"font-family:courier;\\\">$thio_human_cds<\/span><\/p>\\n\";\r\necho \"<p>\\n<strong>The translation of the individual codons<\/strong><br>\\n\";\r\nforeach($codons as $codon){\r\n    $translated_codon = $genetic_code[$codon];\r\n    echo \"<span style=\\\"font-family:courier;\\\">$codon translates to $translated_codon<\/span><br>\\n\";\r\n    if($translated_codon == \"Stop\"){break;}\r\n    $translated_sequence = $translated_sequence.$translated_codon;\r\n}\r\n\r\necho \"<\/p>\\n<p><strong>The Human Thioredoxin protein sequence<\/strong><br><span style=\\\"font-family:courier;\\\">$translated_sequence<\/span><\/p>\";\r\n\r\n?>\r\n<\/code><\/pre>\n<p>This is the full output of the code above:<\/p>\n<p>\n<strong>The Human Thioredoxin DNA coding sequence<\/strong><br \/>\n<span style=\"font-family:courier;\">ATGGTGAAGCAGATCGAGAGCAAGACTGCTTTTCAGGAAGCCTTGGACGCTGCAGGTGATAAACTTGTAGTAGTTGACTTCTCAGCCACGTGGTGTGGGCCTTGCAAAATGATCAAGCCTTTCTTTCATTCCCTCTCTGAAAAGTATTCCAACGTGATATTCCTTGAAGTAGATGTGGATGACTGTCAGGATGTTGCTTCAGAGTGTGAAGTCAAATGCATGCCAACATTCCAGTTTTTTAAGAAGGGACAAAAGGTGGGTGAATTTTCTGGAGCCAATAAGGAAAAGCTTGAAGCCACCATTAATGAATTAGTCTAA<\/span><\/p>\n<p>\n<strong>The translation of the individual codons<\/strong><br \/>\n<span style=\"font-family:courier;\">ATG translates to M<\/span><br \/>\n<span 
style=\"font-family:courier;\">GTG translates to V<\/span><br \/>\n<span style=\"font-family:courier;\">AAG translates to K<\/span><br \/>\n<span style=\"font-family:courier;\">CAG translates to Q<\/span><br \/>\n<span style=\"font-family:courier;\">ATC translates to I<\/span><br \/>\n<span style=\"font-family:courier;\">GAG translates to E<\/span><br \/>\n<span style=\"font-family:courier;\">AGC translates to S<\/span><br \/>\n<span style=\"font-family:courier;\">AAG translates to K<\/span><br \/>\n<span style=\"font-family:courier;\">ACT translates to T<\/span><br \/>\n<span style=\"font-family:courier;\">GCT translates to A<\/span><br \/>\n<span style=\"font-family:courier;\">TTT translates to F<\/span><br \/>\n<span style=\"font-family:courier;\">CAG translates to Q<\/span><br \/>\n<span style=\"font-family:courier;\">GAA translates to E<\/span><br \/>\n<span style=\"font-family:courier;\">GCC translates to A<\/span><br \/>\n<span style=\"font-family:courier;\">TTG translates to L<\/span><br \/>\n<span style=\"font-family:courier;\">GAC translates to D<\/span><br \/>\n<span style=\"font-family:courier;\">GCT translates to A<\/span><br \/>\n<span style=\"font-family:courier;\">GCA translates to A<\/span><br \/>\n<span style=\"font-family:courier;\">GGT translates to G<\/span><br \/>\n<span style=\"font-family:courier;\">GAT translates to D<\/span><br \/>\n<span style=\"font-family:courier;\">AAA translates to K<\/span><br \/>\n<span style=\"font-family:courier;\">CTT translates to L<\/span><br \/>\n<span style=\"font-family:courier;\">GTA translates to V<\/span><br \/>\n<span style=\"font-family:courier;\">GTA translates to V<\/span><br \/>\n<span style=\"font-family:courier;\">GTT translates to V<\/span><br \/>\n<span style=\"font-family:courier;\">GAC translates to D<\/span><br \/>\n<span style=\"font-family:courier;\">TTC translates to F<\/span><br \/>\n<span style=\"font-family:courier;\">TCA translates to S<\/span><br \/>\n<span 
style=\"font-family:courier;\">GCC translates to A<\/span><br \/>\n<span style=\"font-family:courier;\">ACG translates to T<\/span><br \/>\n<span style=\"font-family:courier;\">TGG translates to W<\/span><br \/>\n<span style=\"font-family:courier;\">TGT translates to C<\/span><br \/>\n<span style=\"font-family:courier;\">GGG translates to G<\/span><br \/>\n<span style=\"font-family:courier;\">CCT translates to P<\/span><br \/>\n<span style=\"font-family:courier;\">TGC translates to C<\/span><br \/>\n<span style=\"font-family:courier;\">AAA translates to K<\/span><br \/>\n<span style=\"font-family:courier;\">ATG translates to M<\/span><br \/>\n<span style=\"font-family:courier;\">ATC translates to I<\/span><br \/>\n<span style=\"font-family:courier;\">AAG translates to K<\/span><br \/>\n<span style=\"font-family:courier;\">CCT translates to P<\/span><br \/>\n<span style=\"font-family:courier;\">TTC translates to F<\/span><br \/>\n<span style=\"font-family:courier;\">TTT translates to F<\/span><br \/>\n<span style=\"font-family:courier;\">CAT translates to H<\/span><br \/>\n<span style=\"font-family:courier;\">TCC translates to S<\/span><br \/>\n<span style=\"font-family:courier;\">CTC translates to L<\/span><br \/>\n<span style=\"font-family:courier;\">TCT translates to S<\/span><br \/>\n<span style=\"font-family:courier;\">GAA translates to E<\/span><br \/>\n<span style=\"font-family:courier;\">AAG translates to K<\/span><br \/>\n<span style=\"font-family:courier;\">TAT translates to Y<\/span><br \/>\n<span style=\"font-family:courier;\">TCC translates to S<\/span><br \/>\n<span style=\"font-family:courier;\">AAC translates to N<\/span><br \/>\n<span style=\"font-family:courier;\">GTG translates to V<\/span><br \/>\n<span style=\"font-family:courier;\">ATA translates to I<\/span><br \/>\n<span style=\"font-family:courier;\">TTC translates to F<\/span><br \/>\n<span style=\"font-family:courier;\">CTT translates to L<\/span><br \/>\n<span 
style=\"font-family:courier;\">GAA translates to E<\/span><br \/>\n<span style=\"font-family:courier;\">GTA translates to V<\/span><br \/>\n<span style=\"font-family:courier;\">GAT translates to D<\/span><br \/>\n<span style=\"font-family:courier;\">GTG translates to V<\/span><br \/>\n<span style=\"font-family:courier;\">GAT translates to D<\/span><br \/>\n<span style=\"font-family:courier;\">GAC translates to D<\/span><br \/>\n<span style=\"font-family:courier;\">TGT translates to C<\/span><br \/>\n<span style=\"font-family:courier;\">CAG translates to Q<\/span><br \/>\n<span style=\"font-family:courier;\">GAT translates to D<\/span><br \/>\n<span style=\"font-family:courier;\">GTT translates to V<\/span><br \/>\n<span style=\"font-family:courier;\">GCT translates to A<\/span><br \/>\n<span style=\"font-family:courier;\">TCA translates to S<\/span><br \/>\n<span style=\"font-family:courier;\">GAG translates to E<\/span><br \/>\n<span style=\"font-family:courier;\">TGT translates to C<\/span><br \/>\n<span style=\"font-family:courier;\">GAA translates to E<\/span><br \/>\n<span style=\"font-family:courier;\">GTC translates to V<\/span><br \/>\n<span style=\"font-family:courier;\">AAA translates to K<\/span><br \/>\n<span style=\"font-family:courier;\">TGC translates to C<\/span><br \/>\n<span style=\"font-family:courier;\">ATG translates to M<\/span><br \/>\n<span style=\"font-family:courier;\">CCA translates to P<\/span><br \/>\n<span style=\"font-family:courier;\">ACA translates to T<\/span><br \/>\n<span style=\"font-family:courier;\">TTC translates to F<\/span><br \/>\n<span style=\"font-family:courier;\">CAG translates to Q<\/span><br \/>\n<span style=\"font-family:courier;\">TTT translates to F<\/span><br \/>\n<span style=\"font-family:courier;\">TTT translates to F<\/span><br \/>\n<span style=\"font-family:courier;\">AAG translates to K<\/span><br \/>\n<span style=\"font-family:courier;\">AAG translates to K<\/span><br \/>\n<span 
style=\"font-family:courier;\">GGA translates to G<\/span><br \/>\n<span style=\"font-family:courier;\">CAA translates to Q<\/span><br \/>\n<span style=\"font-family:courier;\">AAG translates to K<\/span><br \/>\n<span style=\"font-family:courier;\">GTG translates to V<\/span><br \/>\n<span style=\"font-family:courier;\">GGT translates to G<\/span><br \/>\n<span style=\"font-family:courier;\">GAA translates to E<\/span><br \/>\n<span style=\"font-family:courier;\">TTT translates to F<\/span><br \/>\n<span style=\"font-family:courier;\">TCT translates to S<\/span><br \/>\n<span style=\"font-family:courier;\">GGA translates to G<\/span><br \/>\n<span style=\"font-family:courier;\">GCC translates to A<\/span><br \/>\n<span style=\"font-family:courier;\">AAT translates to N<\/span><br \/>\n<span style=\"font-family:courier;\">AAG translates to K<\/span><br \/>\n<span style=\"font-family:courier;\">GAA translates to E<\/span><br \/>\n<span style=\"font-family:courier;\">AAG translates to K<\/span><br \/>\n<span style=\"font-family:courier;\">CTT translates to L<\/span><br \/>\n<span style=\"font-family:courier;\">GAA translates to E<\/span><br \/>\n<span style=\"font-family:courier;\">GCC translates to A<\/span><br \/>\n<span style=\"font-family:courier;\">ACC translates to T<\/span><br \/>\n<span style=\"font-family:courier;\">ATT translates to I<\/span><br \/>\n<span style=\"font-family:courier;\">AAT translates to N<\/span><br \/>\n<span style=\"font-family:courier;\">GAA translates to E<\/span><br \/>\n<span style=\"font-family:courier;\">TTA translates to L<\/span><br \/>\n<span style=\"font-family:courier;\">GTC translates to V<\/span><br \/>\n<span style=\"font-family:courier;\">TAA translates to Stop<\/span>\n<\/p>\n<p><strong>The Human Thioredoxin protein sequence<\/strong><br \/><span style=\"font-family:courier;\">MVKQIESKTAFQEALDAAGDKLVVVDFSATWCGPCKMIKPFFHSLSEKYSNVIFLEVDVDDCQDVASECEVKCMPTFQFFKKGQKVGEFSGANKEKLEATINELV<\/span><\/p>\n<p>A few points are worth 
noting about our DNA-to-protein translation script above:<\/p>\n<ul>\n<li>Inside the foreach loop that cycles through the codons, we use a &#8220;break&#8221; statement that is executed when we find a stop codon: we do not wish to add a &#8220;Stop&#8221; string to our translated sequence, but rather end the translation and exit the foreach loop. Executing &#8220;break&#8221; within a loop stops it, and the code that follows the loop in the script is then executed. This is in contrast with die(), which terminates the script execution entirely, as we have seen in the <a href=\"http:\/\/www.cellbiol.com\/bioinformatics_web_development\/chapter-4-adding-a-dynamic-layer-introducing-the-php-programming-language\/php-programming-language-basics-conditional-statements-if-elseif-else\/\">PHP conditional statements section<\/a> earlier in this chapter.<\/li>\n<li>The output of the script as it stands has some issues. Specifically, the initial DNA sequence is a very long uninterrupted string and will normally force horizontal scrolling in the web page in order to see it fully. This does not happen on this very page because the WordPress template in use prevents it. If however you execute the translation script above in a standalone page, you will see a horizontal scroll bar, which is not nice. There are of course ways to format the sequence before sending it to a web page, for example by inserting break tags every 80 characters, so as to have a nice display and avoid horizontal scrolling. This was not implemented in this specific example.<\/li>\n<li>The $translated_sequence variable is initialized as an empty string before the foreach loop, and then filled up with translation results during the loop. 
This is a classic pattern: declare an empty string or empty array before the start of a loop and then fill it up during the loop. Take note.<\/li>\n<\/ul>\n<h2 id=\"amino-acids-classification\">Classifying amino-acids in a peptide or protein sequence according to their nature (nonpolar, polar, basic, acidic)<\/h2>\n<p>Let us use what we have learned in this section, splitting a sequence into individual amino-acids or nucleotides, to classify all the amino-acids of a peptide or protein sequence according to their nature, expanding on the example given at the end of the <a href=\"http:\/\/www.cellbiol.com\/bioinformatics_web_development\/chapter-4-adding-a-dynamic-layer-introducing-the-php-programming-language\/php-programming-language-basics-built-in-predefined-functions-strings-and-biological-sequences-manipulation\/\">previous section<\/a>.<\/p>\n<pre lang=\"php\"><code>\r\n<?php\r\n\r\n$nonpolar = \"FLIMVPAWG\"; \/\/ A string made of all the nonpolar amino-acids in single letter notation\r\n$polar = \"STYCQN\"; \/\/ Polar amino-acids\r\n$basic = \"HKR\"; \/\/ Basic amino-acids\r\n$acidic = \"DE\"; \/\/ Acidic amino-acids\r\n\r\n$peptide = \"MVKQIESKTAFQEALDAAGDKLVVVDF\";\r\n\r\n\/\/ Splitting the peptide in an array of the individual amino-acids\r\n$single_aminoacids = str_split($peptide,1); \r\n\r\n\/\/ In the following array $aminoacids_classified we will collect each \r\n\/\/ aminoacid AND its nature as a mini sub array of 2 elements\r\n\/\/ [(aminoacid,nature),(aminoacid,nature),.....]\r\n\/\/ in order to then provide an output\r\n$aminoacids_classified = array();\r\n\r\n\/\/ Let's start cycling through the amino-acids array\r\n\r\nforeach($single_aminoacids as $aminoacid){\r\n    if(strrchr($nonpolar, $aminoacid)){\r\n        $aminoacids_classified[] = array($aminoacid, \"nonpolar\");\r\n    }\r\n    elseif(strrchr($polar, $aminoacid)){\r\n        $aminoacids_classified[] = array($aminoacid, \"polar\");\r\n    }\r\n    
elseif(strrchr($basic, $aminoacid)){\r\n        $aminoacids_classified[] = array($aminoacid, \"basic\");\r\n    }\r\n    elseif(strrchr($acidic, $aminoacid)){\r\n        $aminoacids_classified[] = array($aminoacid, \"acidic\");\r\n    }\r\n    else{\r\n        \/\/ We leave the possibility open to encounter an unknown character we have not classified\r\n        $aminoacids_classified[] = array($aminoacid, \"unclassified\");\r\n    }\r\n}\r\n\r\n\/\/ We now provide an HTML output\r\n\r\necho \"<p>\\n<strong>Here is a full listing of the amino-acids in our sequence<\/strong>\\n<ol>\\n\";\r\nforeach($aminoacids_classified as $aa_result){\r\n    echo \"<li>\".$aa_result[0].\" => \".$aa_result[1].\"<\/li>\\n\"; \r\n}\r\necho \"<\/ol>\\n<\/p>\";\r\n\r\n?>\r\n<\/code><\/pre>\n<p>This is the output of the script:<\/p>\n<p>\n<strong>Here is a full listing of the amino-acids in our sequence<\/strong><\/p>\n<ol>\n<li>M => nonpolar<\/li>\n<li>V => nonpolar<\/li>\n<li>K => basic<\/li>\n<li>Q => polar<\/li>\n<li>I => nonpolar<\/li>\n<li>E => acidic<\/li>\n<li>S => polar<\/li>\n<li>K => basic<\/li>\n<li>T => polar<\/li>\n<li>A => nonpolar<\/li>\n<li>F => nonpolar<\/li>\n<li>Q => polar<\/li>\n<li>E => acidic<\/li>\n<li>A => nonpolar<\/li>\n<li>L => nonpolar<\/li>\n<li>D => acidic<\/li>\n<li>A => nonpolar<\/li>\n<li>A => nonpolar<\/li>\n<li>G => nonpolar<\/li>\n<li>D => acidic<\/li>\n<li>K => basic<\/li>\n<li>L => nonpolar<\/li>\n<li>V => nonpolar<\/li>\n<li>V => nonpolar<\/li>\n<li>V => nonpolar<\/li>\n<li>D => acidic<\/li>\n<li>F => nonpolar<\/li>\n<\/ol>\n<h2>Chapter 
Sections<\/h2>\n<p>[pagelist include=&#8221;435&#8243;]<\/p>\n<p>[siblings]<\/p>\n","protected":false},"excerpt":{"rendered":"<p>In the previous section we have started to see some basics on how to manipulate strings and biological sequences in PHP by using predefined functions. In this section we explore the topic further by exploring a few more useful PHP built-in tools (predefined functions). Splitting a biological sequence in single nucleotides, codons or amino-acids with &hellip; <a href=\"http:\/\/www.cellbiol.com\/bioinformatics_web_development\/chapter-4-adding-a-dynamic-layer-introducing-the-php-programming-language\/php-programming-language-basics-more-on-strings-and-biological-sequences-manipulation-with-predefined-functions\/\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\"> &#8220;4-7: PHP programming language basics &#8211; more on strings and biological sequences manipulation with predefined functions&#8221;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"parent":435,"menu_order":7,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-836","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"http:\/\/www.cellbiol.com\/bioinformatics_web_development\/wp-json\/wp\/v2\/pages\/836","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/www.cellbiol.com\/bioinformatics_web_development\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"http:\/\/www.cellbiol.com\/bioinformatics_web_development\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"http:\/\/www.cellbiol.com\/bioinformatics_web_development\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/www.cellbiol.com\/bioinformatics_web_development\/wp-json\/wp\/v2\/comments?post=836"}],"version-history":[{"count":77,"href":"http:\/\/www.cellbiol.com\/bioinformatics_web_development\/wp-json\/wp\/v2\/pages\/836\/revisions"}],"predecessor-vers
ion":[{"id":1606,"href":"http:\/\/www.cellbiol.com\/bioinformatics_web_development\/wp-json\/wp\/v2\/pages\/836\/revisions\/1606"}],"up":[{"embeddable":true,"href":"http:\/\/www.cellbiol.com\/bioinformatics_web_development\/wp-json\/wp\/v2\/pages\/435"}],"wp:attachment":[{"href":"http:\/\/www.cellbiol.com\/bioinformatics_web_development\/wp-json\/wp\/v2\/media?parent=836"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}