5-2: The reverse-complement sequence web application

In this section we will write a web application that can reverse, complement, or reverse-complement a DNA sequence. Starting from a DNA sequence, the reverse-complement operation computes the sequence of the complementary strand, as already discussed in section 4-7 of this book, where we also provided simple code to perform this operation. Building on this code, in section 4-12 we wrote a PHP function that performs the task.

Single sequence version

We will now use a slightly improved version of this function, which supports IUPAC characters, to build a web application that computes the reverse, complement, or reverse-complement of sequences provided by users.

We will also use the sequence breaker function seqbreak() to introduce a break every 80 nucleotides, to format the output sequence.
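
As a reminder of what such a formatter does, here is a minimal sketch. The name seqbreak_sketch(), its $width parameter, and the use of a plain <br> tag are assumptions made for illustration; the seqbreak() function written earlier in the book may be implemented differently.

```php
<?php
// Hypothetical sketch of a sequence breaker; not the book's seqbreak().
// The sequence is cut into chunks of $width characters, and the chunks are
// joined with an HTML break tag, so no tag follows the last chunk.
function seqbreak_sketch(string $seq, int $width = 80): string
{
    if ($seq === '') {
        return '';
    }
    return implode('<br>', str_split($seq, $width));
}

// A 170-nucleotide sequence is split into lines of 80, 80 and 10 nucleotides.
echo seqbreak_sketch(str_repeat('A', 170)), PHP_EOL;
```

Joining the chunks, rather than appending a tag after each one, avoids a trailing break at the end of the sequence.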

Reusing the previously written functions, stored in a functions.php file that is imported into the script file with an “include” statement, allows the script.php code to be extremely compact.

The application will support the FASTA format and the IUPAC code for degenerate sequences, and will offer options to select the kind of transformation (reverse, complement or reverse-complement) to apply to the input sequence.

The reverse-complement application web form
The output of the reverse-complement web application

In this first version of the application we will accept as input a single DNA sequence. At the end of this section we propose a version able to handle several sequences.

The code

As usual, the code for the web application will be distributed across several files. The general structure is the same as that of the application developed in the previous section. Directory names are in bold.

reverse-complement
    index.php
    script.php
    html
        header.html
        footer.html
    css
        style.css
    include
        functions.php

header.html

footer.html

index.php

functions.php

The reverse-complement function, revcomp(), used in this section is modified with respect to the one proposed in section 4-12 so as to support all IUPAC characters for nucleotides (A, C, G, T, U, R, Y, S, W, K, M, B, D, H, V, N, ., -). More specifically, the associative array that serves as complement dictionary was extended.
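
To illustrate what an extended complement dictionary can look like, here is a minimal sketch. The function name revcomp_sketch(), the three transformation names, and the choice to complement U to A are assumptions; the actual revcomp() in functions.php may differ in all of these respects.

```php
<?php
// Hypothetical sketch; the book's revcomp() and its dictionary may differ.
// Assumed transformation names: 'reverse', 'complement', 'reverse-complement'.
function revcomp_sketch(string $seq, string $transformation): string
{
    // IUPAC complement dictionary (uppercase characters only).
    // U is complemented to A; S, W, N, '.' and '-' are their own complement.
    $complement = [
        'A' => 'T', 'C' => 'G', 'G' => 'C', 'T' => 'A', 'U' => 'A',
        'R' => 'Y', 'Y' => 'R', 'S' => 'S', 'W' => 'W',
        'K' => 'M', 'M' => 'K', 'B' => 'V', 'V' => 'B',
        'D' => 'H', 'H' => 'D', 'N' => 'N', '.' => '.', '-' => '-',
    ];

    switch ($transformation) {
        case 'reverse':
            return strrev($seq);
        case 'complement':
            return strtr($seq, $complement);
        case 'reverse-complement':
            return strtr(strrev($seq), $complement);
        default:
            // Unknown transformation: return the sequence unchanged.
            return $seq;
    }
}

echo revcomp_sketch('ATGCRYKM', 'reverse-complement'), PHP_EOL; // prints KMRYGCAT
```

Calling strtr() with an array replaces every character in a single pass, so a base that has already been complemented is never translated a second time.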

style.css

script.php

You may test the script live here.

Batch version

Let’s now write a version of this application able to process several FASTA sequences at the same time, in batch.

To accept multiple sequences as input, we will switch the FASTA processing function from process_fasta() to fasta_sequences_to_array(). We have already written the code for both functions in section 4-12.

The header, footer and CSS files remain unchanged with respect to the single sequence version. In the web form (index.php), the only change is the name of the text-area: “fasta_sequence” becomes “fasta_sequences”. The id of the text-area and the “for” attribute of its label are adjusted to this new value as well.

index.php (batch version)

The web form for the reverse-complement application, batch version, loaded with some test sequences

In the functions file we replace process_fasta() with fasta_sequences_to_array().

functions.php (batch version)

And here is the script.

One line of the code deserves some explanation.

When we get the sequences from the web form, we use the fasta_sequences_to_array() function to convert them into an array ($seqs_array) with this structure:

[(seq1 header, seq1 sequence),(seq2 header, seq2 sequence),(seq3 header, seq3 sequence), etc…]
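
To make this structure concrete, here is a minimal sketch of a multi-FASTA parser that returns such an array. The name fasta_sequences_to_array_sketch() and the decision to strip the leading “>” from each header are assumptions; the function written in section 4-12 may behave differently.

```php
<?php
// Hypothetical sketch; not the fasta_sequences_to_array() from section 4-12.
// Returns a list of [header, sequence] pairs, one pair per FASTA record.
function fasta_sequences_to_array_sketch(string $fasta): array
{
    $records = [];
    // \R matches any line ending (Unix, Windows or old Mac).
    foreach (preg_split('/\R/', trim($fasta)) as $line) {
        $line = trim($line);
        if ($line === '') {
            continue; // skip blank lines
        }
        if ($line[0] === '>') {
            // New record: store the header without the leading ">".
            $records[] = [substr($line, 1), ''];
        } elseif (!empty($records)) {
            // Sequence line: append it to the current record.
            $records[count($records) - 1][1] .= $line;
        }
    }
    return $records;
}

$seqs_array = fasta_sequences_to_array_sketch(">seq1\nATGC\nGGTA\n>seq2\nTTAA\n");
print_r($seqs_array);
```

In this sketch, sequence lines that appear before the first header are ignored, and a sequence spread over several lines is joined into a single string.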

In one line of code, we transfer this information to a second array, in which both the headers and the sequences are modified.

More specifically, we append to each header a text string ($t_txt) that reads “ – reverse”, “ – complement”, or “ – reverse-complement”, depending on the transformation selected by the user.

The sequences themselves are also modified according to the selected transformation, and HTML tags are added to them. In particular, the seqbreak() function inserts a break tag every 80 nucleotides, and the whole sequence is embedded within a span tag with a “sequence” class, which is assigned font-family: courier in the CSS file.

All of this is done in a single line of code within a foreach loop:

This line is easiest to understand when read from right to left:

  • The sequence ($seq_array[1]) is converted to uppercase, the only character set that the revcomp() function understands
  • This uppercase sequence is passed as an argument to revcomp(), together with the selected transformation type ($transformation)
  • seqbreak() inserts a break tag every 80 nucleotides into the sequence returned by revcomp()
  • The sequence is then embedded within a span tag
  • The text describing the transformation is appended to the header ($seq_array[0])
  • The transformed header and the transformed, tagged sequence become the first and second elements of an array
  • This two-element array is appended to the transformed sequences array $seqs_array_t
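
The steps listed above can be sketched as follows. The helper functions revcomp_demo() and seqbreak_demo() are hypothetical stand-ins, reduced to the four standard nucleotides so that the example runs on its own; the test sequences and the way $t_txt is built are also assumptions, and the line in the actual script.php may be written differently.

```php
<?php
// Hypothetical stand-ins for the functions stored in functions.php.
function revcomp_demo(string $seq, string $transformation): string
{
    $map = ['A' => 'T', 'C' => 'G', 'G' => 'C', 'T' => 'A'];
    if ($transformation === 'reverse') {
        return strrev($seq);
    }
    if ($transformation === 'complement') {
        return strtr($seq, $map);
    }
    return strtr(strrev($seq), $map); // reverse-complement
}

function seqbreak_demo(string $seq, int $width = 80): string
{
    return implode('<br>', str_split($seq, $width));
}

$transformation = 'reverse-complement';   // assumed value coming from the form
$t_txt = ' – ' . $transformation;         // text appended to each header
$seqs_array = [['seq1', 'atgc'], ['seq2', 'ggatcc']];
$seqs_array_t = [];

foreach ($seqs_array as $seq_array) {
    // Read from right to left: uppercase, transform, break, wrap in a span.
    $seqs_array_t[] = [$seq_array[0] . $t_txt, '<span class="sequence">' . seqbreak_demo(revcomp_demo(strtoupper($seq_array[1]), $transformation)) . '</span>'];
}

print_r($seqs_array_t);
```

Since the headers come from user input, a production page would probably also pass them through htmlspecialchars() before printing them.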

Read the comments in the code to better understand the flow.

script.php (batch version)

The reverse-complement web application output, batch version

You may test the script live here.


