Introduction

Web skills are an increasingly important part of the “technology toolbox” that every Bioinformatics student and researcher should constantly build and keep up to date (ref).

This web development course, aimed at Biology and Bioinformatics students, teaches from scratch all the skills needed to set up a fully working Linux web server and to develop and deploy web applications for Bioinformatics.

No previous programming knowledge is assumed. By following this tutorial you will learn the fundamental concepts of programming through scripting languages: variables, types, arrays, loops, conditional statements, functions, objects, regular expressions, reading and manipulating files, and more.

Since this course aims to teach software development for the web, we have chosen to focus on PHP rather than on the other languages most commonly used to write bioinformatics applications (Perl, Python, Ruby and others). When it was first released, PHP was regarded as relatively slow compared with other languages and as having a limited vocabulary. For a while it was seen as a language of limited scope, good only for adding some dynamic features to web pages.

Things have since changed. PHP 5, released in July 2004, is a full-featured programming language that makes it relatively easy to develop essentially any kind of Bioinformatics web application, thanks to its rich set of built-in functions and its support for object-oriented programming. When required, PHP can also make system calls to external applications and scripts, possibly written in other languages, for specialized tasks.

The release of PHP 7 in December 2015 brought a further dramatic leap in performance: a twofold (100%) increase in speed and 50% lower memory consumption compared with PHP 5.6, thanks to the adoption of Zend Engine 3 (ref).

A historical overview of the various PHP versions and their main features can be found in this interesting article on Wikipedia.

Last but not least, PHP has a much “softer” learning curve than other, arguably more complex languages such as Python. This allows students to concentrate on the basic programming concepts, which, once acquired, can easily be applied when learning other languages. A function is a function in PHP, Python, Perl or Ruby: the syntax may change, but the basic concepts carry over across languages. These are all good reasons for starting with PHP and perhaps moving on to Perl, Python, Ruby and CGI at a later stage, if required.

Although this book is aimed at web development for bioinformatics, it is also suitable for anyone who wishes to learn how to set up a web server and develop a website. In particular, the information in the first chapters is not specific to Bioinformatics: it applies to the development of any kind of website or web application and does not require any understanding of biological concepts. When biology-related code examples are used, in chapter 4 and beyond, they are introduced gently and do not assume in-depth knowledge of Biology or Bioinformatics.

Bioinformatics Web Applications

We can broadly define a bioinformatics application as software that processes some kind of biological data, obtained either directly from a user or from other sources, and outputs the results of that processing (again, either to a user in human-readable format or to another application):

Input –> Processing –> Output
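As a minimal sketch of this three-step flow, the Python code below reads a raw DNA sequence (input), computes its reverse complement (processing) and formats the result as FASTA text (output), which a person can read and another program can parse. The course itself uses PHP; Python, one of the other languages mentioned above, is used here only so that the example runs with the standard library alone. The function names read_input, process and write_output are hypothetical, chosen to mirror the three steps.

```python
"""A minimal Input -> Processing -> Output pipeline (illustrative sketch)."""

# Translation table mapping each DNA base to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def read_input(raw: str) -> str:
    """Input: normalize a raw DNA string (drop whitespace, upper-case it)."""
    sequence = "".join(raw.split()).upper()
    invalid = set(sequence) - set("ACGT")
    if invalid:
        raise ValueError(f"unexpected characters: {sorted(invalid)}")
    return sequence


def process(sequence: str) -> str:
    """Processing: complement every base, then reverse the string."""
    return sequence.translate(COMPLEMENT)[::-1]


def write_output(name: str, result: str) -> str:
    """Output: format the result as FASTA text."""
    return f">{name}\n{result}\n"


if __name__ == "__main__":
    raw = "atg gcc\nttA"  # messy user input: mixed case, spaces, newlines
    print(write_output("revcomp", process(read_input(raw))), end="")
```

Keeping the three steps in separate functions makes it easy to swap one of them, for example replacing the text output with an HTML page when the same logic is moved to the web.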

In the case of a web application, the input from the user is gathered through a web page containing a web form (Figure 1).

Figure 1: The NCBI BLAST web form

Web forms include a SUBMIT button (see Figure 1). On pressing this button, the data collected in the web form are sent to a script (written in Perl, PHP, Python, Ruby or other languages).

The script:

  • processes the data from the web form. Processing can range from very simple to extremely complex, depending on the application
  • formats the processed data for the web, using HTML
  • sends back to the user an HTML web page containing the formatted results

So the flow for a web application becomes:

Web form (input) –> Processing and formatting –> Web page with results (output)
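The following sketch, again in Python rather than PHP and using only the standard library, simulates the server side of this flow: it parses the URL-encoded data a browser sends when SUBMIT is pressed, computes the GC content of a DNA sequence, and returns an HTML page with the result. It does not run a real web server; the form field name sequence and the functions handle_form and gc_content are hypothetical. Note the call to escape: anything typed by the user should be escaped before it is embedded in the output page.

```python
"""Sketch: web form (input) -> processing and formatting -> HTML page (output)."""
from html import escape
from urllib.parse import parse_qs


def gc_content(sequence: str) -> float:
    """Processing: percentage of G and C bases in a DNA sequence."""
    if not sequence:
        return 0.0
    gc = sum(1 for base in sequence if base in "GC")
    return 100.0 * gc / len(sequence)


def handle_form(body: str) -> str:
    """Turn a URL-encoded form submission into an HTML results page.

    'sequence' is a hypothetical form field name chosen for this example.
    """
    fields = parse_qs(body)  # e.g. {"sequence": ["ATGC GGCC"]}
    raw = fields.get("sequence", [""])[0]
    sequence = "".join(raw.split()).upper()
    percent = gc_content(sequence)
    # Formatting: escape user-supplied text before embedding it in HTML.
    return (
        "<!DOCTYPE html>\n"
        "<html><head><title>GC content</title></head><body>\n"
        f"<p>Sequence: {escape(sequence)}</p>\n"
        f"<p>GC content: {percent:.1f}%</p>\n"
        "</body></html>\n"
    )


if __name__ == "__main__":
    # What a browser might send after the user presses SUBMIT
    # ("+" encodes a space in form data).
    print(handle_form("sequence=ATGC+GGCC"))
```

In a real deployment, the web server (for example Apache, covered later in the course) would pass the submitted data to the script and send the returned page back to the browser.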

Figure 2: The NCBI BLAST web application. This figure underscores the fact that while the web form (input) and the script output are generally rendered with HTML, CSS and, optionally, JavaScript, the script itself requires a programming language such as PHP, Perl, Python, Ruby or another language with CGI support.

Designing and writing a bioinformatics web application from scratch therefore requires a number of technical skills. Here is a minimal list:

  • Understanding HTML and CSS, in order to create a web page and a web form, and to format a script's output as an HTML web page
  • Learning a scripting language, either PHP or another language that can be used to write CGI scripts. Writing a simple application requires only basic programming skills that you can probably learn in a day or so: stick with us.

In addition, if you want to set up a web server from scratch, instead of just using an account on somebody else’s server (which is of course perfectly fine, although slightly less cool than setting up your own), it is very useful to learn the basics of the Linux operating system and to have some notions about the Internet, networks and TCP/IP. If you want to be on the Internet as an active contributor and resource builder, you had better understand where you are: the more you know, the more freely and creatively you will be able to move, and the more secure you will stay. So let’s add:

  • Learning Linux basics: installation, setting up a web server with Apache, using the shell instead of a graphical interface
  • Understanding the basics of the Internet and networks

How this course is organized

In this web development for bioinformatics course you will learn all the skills detailed above, from scratch. No previous knowledge of HTML, Linux or programming is required. If you already have some knowledge, you may skip chapters and go directly to what you are interested in. Otherwise, we recommend that you follow the proposed order, as each chapter introduces new concepts that, in general, are the foundation for what comes next. By following this order, you will gradually acquire all the knowledge required to do the job.

This is a course for beginners, although experienced programmers may also benefit from different aspects of the book, depending on their background. For instance, if you are an accomplished programmer not familiar with PHP and/or web programming, this is a great, easy and friendly place to learn. Thanks to your prior experience, you will be able to go quickly through the various topics and grab the concepts you need to make the transition from local applications to web applications.

Like applications in general, web applications can become extremely complex in terms of input processing. For instance, processing might involve calls to other applications that do part of the job and then return their results to the “main” script for further processing. This book provides all the “foundations”. The ability to implement applications of growing complexity will come with time, practice and experience, and will probably require aspects that are not covered here. Still, the knowledge provided constitutes the basis on which you will be able to build, according to your personal interests.

Figure 3: Your toolbox. Based on the solid foundation provided by this book, you should aim to develop your own original and rich toolbox. Knife image credits.

Let’s start by learning the basics of the Internet and networks.