Bioinformatics Frequently Asked Questions
Latest changes
- added Georgia State University's courses---thanks to Eric VanWieren
- added FH Weihenstephan in Freising---thanks to Tobias Kailich
- added Johns Hopkins' courses---thanks to Tim Young
- added three new bioinformatics courses from Germany---thanks to Sebastian Kurscheid
- added courses at University of Illinois---thanks to Amit Sabnis
$Revision: 1.207 $ $Date: 2005/04/05 13:06:07 $
Introduction
Mail your questions to me, Damian
Counsell, and I'll try to bring you
answers. Alternatively, if you have your own answers, mail
them to me and I'll incorporate them. The practical section in particular is
full of gaps so your contributions to that are particularly
welcome; I am slowly completing and extending the entries
when I have the time.
Although I am happy to tackle questions of
general interest to all visitors to the site,
please note that:
- I cannot answer queries specific to you alone,
- I am not a careers adviser,
- I try not to offer opinions on the relative merits of bioinformatics courses,
- I won't answer your essay questions, assignments, or homework,
- I won't provide you with a list of companies for you to market your bioinformatics product to,
- I won't suggest a project for your Master's/PhD,
- I have not devised a bioinformatic cure for cancer---and neither have you, and
- This FAQ is perpetually under construction.
I hope, however, that the information here helps with your
studies, career and work.
I acknowledge the
help of many other individuals in creating this part of the
Bioinformatics.Org site. If you have contributed and I have
forgotten to credit you, please email
me and I will correct my oversight immediately.
Bioinformatics is, I believe, a special kind of
engineering discipline---it certainly isn't a
"pure" science. It has been enormously successful
in its short existence and I think its
successes have been the result of a practical and rigorous
approach which I hope to encourage in anyone interested in
entering the field.
This document is not a scientific paper or textbook (yet).
You will find blunt
opinions here. If you disagree with me about any of the
following please
tell me. I hope to learn a lot from your inevitable and
welcome criticisms.
There is certainly one sense in which I consider myself a
pure scientist: I'm open to rational persuasion.
I write this resource and hold the copyright for the
purposes of protecting its content from intellectual
property pirates. By that I mean I want to keep this out of
the hands of people who steal the work of others for
commercial gain, and those who abuse and extend the powers
of IP law at the expense of the disadvantaged---rather than
those who would like to copy or mirror this resource for
educational reasons. (This may sound overdramatic, but the
FAQ has already been pirated for
doubtful purposes.)
Overview
Contents
Definitions: What is Bioinformatics?
Definition of Bioinformatics: What is bioinformatics?
Roughly, bioinformatics describes any use of computers to
handle biological information.
In practice, the definition used by most people is
narrower; bioinformatics to them is a synonym for
"computational molecular biology"---the use of
computers to characterize the molecular components of living
things.
What is
Bioinformatics?---The Tight Definition
"Classical"
bioinformatics
Most biologists talk about "doing
bioinformatics" when they use computers to
store, retrieve, analyze or
predict the composition or the
structure of biomolecules. As computers become more
powerful you could probably add simulate to this
list of bioinformatics verbs. "Biomolecules"
include your genetic material---nucleic acids---and the
products of your genes: proteins. These are the
concerns of "classical" bioinformatics, dealing
primarily with sequence analysis.
Fredj Tekaia at the Institut Pasteur
offers this definition of bioinformatics:
"The mathematical, statistical and computing methods
that aim to solve biological problems using DNA and amino
acid sequences and related information."
It is a mathematically interesting property of most large
biological molecules that they are
polymers; ordered chains of simpler
molecular modules called monomers. Think of
the monomers as beads or building blocks which, despite
having different colours and shapes, all have the same
thickness and the same way of connecting to one
another.
Monomers that can combine in a chain are of the same
general class, but each kind of monomer in that class has
its own well-defined set of characteristics.
Many monomer molecules can be joined together to form a
single, far larger, macromolecule.
Macromolecules can have exquisitely specific informational
content and/or chemical properties.
According to this scheme, the monomers in a given
macromolecule of DNA or protein can be treated
computationally as letters of an alphabet,
put together in pre-programmed arrangements to carry messages
or do work in a cell.
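To make the "letters of an alphabet" idea concrete, here is a minimal Python sketch that treats a DNA macromolecule as a plain string over the four-letter alphabet A, C, G, T. The example fragment and the function names are made up for illustration; nothing here is taken from a particular bioinformatics package.

```python
from collections import Counter

# The four monomers (nucleotides) of DNA, written as letters.
DNA_ALPHABET = set("ACGT")


def is_dna(sequence: str) -> bool:
    """Return True if every letter belongs to the four-letter DNA alphabet."""
    return set(sequence.upper()) <= DNA_ALPHABET


def composition(sequence: str) -> Counter:
    """Count how often each monomer (letter) occurs in the chain."""
    return Counter(sequence.upper())


if __name__ == "__main__":
    fragment = "ATGGCCATTGTAATGGGCCGC"  # made-up example sequence
    print(is_dna(fragment))
    print(composition(fragment))
```

Once a molecule is a string, ordinary text-processing tools (searching, counting, comparing) become the basic instruments of sequence analysis.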
"New"
bioinformatics
The greatest achievement of bioinformatics methods, the Human
Genome Project, is currently being completed. Because of
this the nature and priorities of bioinformatics research and
applications are changing. People often talk portentously of
our living in the "
post-genomic" era. My personal view is that this
will affect bioinformatics in several ways:
- Now that we possess multiple whole genomes we can look for
differences and similarities between all the genes of
multiple species. From such studies we can draw particular
conclusions about species and general ones about evolution.
This kind of science is often referred to as
comparative genomics.
- There are now technologies designed to measure the
relative number of copies of a genetic message (levels of
gene expression) at different stages in development or
disease or in different tissues. Technologies of this kind, such as
DNA microarrays,
will grow in importance.
- Other, more direct, large-scale ways of identifying
gene functions and associations (for example
yeast two-hybrid methods) will grow in significance and
with them the accompanying bioinformatics of
functional genomics.
- There will be a general shift in emphasis (of sequence
analysis especially) from genes themselves to gene
products. This will lead to:
- attempts to catalogue the activities and
characterize interactions between all gene products (in
humans): proteomics.
- attempts to crystallize and/or predict the
structures of all proteins (in humans):
structural genomics.
- fewer DNA double-helices in bad sci-fi movies.
- What some people refer to as research
or medical informatics, the management of
all biomedical experimental data associated with particular
molecules or patients---from mass spectroscopy, to in
vitro assays to clinical side-effects---will move from
the concern of those working in drug company and hospital
I.T. (information technology) into the mainstream of cell
and molecular biology and migrate from the commercial and
clinical to academic sectors.
This FAQ concentrates on classical bioinformatics, but will,
I hope, grow to cover more of the "post-genomic"
aspects of the field. It is worth noting that all of the
above non-classical areas of research depend upon established
sequence analysis techniques.
Definitions of Fields
Related to Bioinformatics
What is Biophysics?
Molecular biology itself grew
out of biophysics. The British Biophysical
Society defines biophysics as:
"an interdisciplinary field which applies techniques from the physical sciences to understanding biological structure and function"
More
information about the various facets of the discipline
can be found at the society's
site hosted at Birkbeck
College, London.
Mike Goodrich wrote to ask what the status of biophysics
was given the definition of computational biology submitted
by Paul Schulte (below). A recent
article in The
Scientist [free registration required] dealt with
this question---thanks to Jo Wixon (Managing Editor of Comparative
and Functional Genomics) for the reference.
What is Computational Biology?
Computational biologists might object (please
do), but I find that people use "computational
biology" when discussing that subset of bioinformatics
(in the broadest sense) closest to the field of classical
general biology.
Computational biologists concern themselves more with
evolutionary, population and theoretical biology than with
cell and molecular biomedicine. It is inevitable that
molecular biology is profoundly important in computational
biology, but it is certainly not what computational
biology is all about (see next paragraph). In these areas of
computational biology it seems that computational biologists
have tended to prefer statistical models for biological
phenomena over physico-chemical ones. This is often
wise...
One computational biologist (Paul J Schulte) did object to
the above and makes the entirely valid point that this
definition derives from a popular use of the term, rather
than a correct one. Paul works on water flow in plant cells.
He points out that biological fluid dynamics is a field of
computational biology in itself. He argues that this, and any
application of computing to biology, can be described as
"computational biology" (see also the "loose"
definition of bioinformatics below). Where we disagree,
perhaps, is in the conclusion he draws from this---which I
reproduce in full:
"Computational biology is not a "field", but
an "approach" involving the use of computers to
study biological processes and hence it is an area as
diverse as biology itself."
Richard Durbin, Head of Informatics at the Wellcome Trust Sanger
Institute, expressed an interesting opinion on this
distinction in an
interview:
"I do not think all biological computing is
bioinformatics, e.g. mathematical modelling is not
bioinformatics, even when connected with biology-related
problems. In my opinion, bioinformatics has to do with
management and the subsequent use of biological
information, particular genetic information."
What is Medical
Informatics?
The Medical
Informatics FAQ (no relation) provides the following
definition:
"Biomedical Informatics is an emerging discipline that
has been defined as the study, invention, and
implementation of structures and algorithms to improve
communication, understanding and management of medical
information."
That FAQ also points here.
Aamir Zakaria, the author of the FAQ, emphasises that medical
informatics is more concerned with structures and algorithms
for the manipulation of medical data, rather than with the
data itself.
This suggests that one difference between bioinformatics
and medical informatics as disciplines lies with their
approaches to the data; there are bioinformaticians
interested in the theory behind the manipulation of that data
and there are bioinformatics scientists concerned
with the data itself and its biological implications. (I
believe that a good bioinformatics researcher should be
interested in both of these aspects of the field.)
Medical informatics, for practical reasons, is more likely
to deal with data obtained at "grosser" biological
levels---that is information from super-cellular systems,
right up to the population level---while most bioinformatics
is concerned with information about cellular and biomolecular
structures and systems.
On both of these points I'd be happy for any medical
informatics specialists to
correct me.
What is
Cheminformatics?
The Web advertisement for Cambridge Healthtech
Institute's Sixth Annual Cheminformatics conference
describes the field thus:
"the combination of chemical synthesis, biological
screening, and data-mining approaches used to guide drug
discovery and development"
but this, again, sounds more like a field being identified by
some of its most popular (and lucrative) activities, rather
than by including all the diverse studies that come under its
general heading.
The
story of one of the most successful drugs of all time,
penicillin,
seems bizarre, but the way we discover and develop drugs
even now has similarities, being the result of chance,
observation and a lot of slow, intensive chemistry. Until
recently, drug design always seemed doomed to continue to be
a labour-intensive, trial-and-error process. The possibility
of using information technology to plan intelligently and
to automate processes related to the chemical synthesis of
possible therapeutic compounds is very exciting for chemists
and biochemists. The rewards for bringing a drug to market
more rapidly are huge, so naturally this is what a lot of
cheminformatics work is about.
Here
is a page with a commercial slant which links to some
interesting discussions of the term
"cheminformatics", what it means, whether or not
it exists as a distinct discipline, and even whether it
should be replaced by "chemoinformatics".
The span of academic cheminformatics is wide and is
exemplified by the interests of the cheminformatics groups
at the Centre for Molecular
and Biomolecular Informatics at the University of Nijmegen in the
Netherlands. These interests include:
- Synthesis Planning
- Reaction and Structure Retrieval
- 3-D Structure Retrieval
- Modelling
- Computational Chemistry
- Visualisation Tools and Utilities
Trinity
University's Cheminformatics Web
page, for another example, concerns itself with
cheminformatics as the use of the Internet in chemistry.
What is Genomics?
Genomics is a field which existed before the completion of
the sequences of genomes, but only in the crudest of forms. For
example, the oft-re-referenced estimate of 100 000 genes in
the human genome derived from a(n) (in)famous piece of
"back of an envelope" genomics: guessing the weight
of chromosomes and the density of the genes they bear.
Genomics is any attempt to analyze or compare the entire
genetic complement of a species or species (plural). It is,
of course, possible to compare genomes by comparing
more-or-less representative subsets of genes within
genomes.
What is Mathematical Biology?
Mathematical biology is easier to distinguish from
bioinformatics than computational biology. Mathematical
biology also tackles biological problems, but the methods it
uses to tackle them need not be numerical and need not be
implemented in software or hardware. Indeed, such methods
need not "solve" anything; in mathematical biology
it would be considered reasonable to publish a result which
merely establishes that a biological problem
belongs to a particular general class.
The distinction between bioinformatics and mathematical
biology was illuminated by an email I received from Alex Kasman
at the College of
Charleston. According to his working
definition, he distinguished bioinformatics which (under the
tight
definition at least)...
"...seems to focus almost exclusively on specific
algorithms that can be applied to large molecular
biological data sets..."
...from mathematical biology which...
"...includes things of theoretical interest which are
not necessarily algorithmic, not necessarily molecular in
nature, and are not necessarily useful in analyzing
collected data."
What is Proteomics?
A recent
review on proteomics in the journal Nature defined the
field this way:
"The term proteome was first
coined to describe the set of proteins encoded by the
genome. The study of the proteome, called proteomics, now
evokes not only all the proteins in any given cell, but
also the set of all protein isoforms and modifications,
the interactions between them, the structural description
of proteins and their higher-order complexes, and for that
matter almost everything 'post-genomic'."
Michael J. Dunn, the Editor-in-Chief of
Proteomics defines the "proteome" as:
"the PROTEin complement of the genOME"
and proteomics to be concerned with:
"qualitative and quantitative studies of gene
expression at the level of the functional proteins
themselves"
that is:
"an interface between protein biochemistry and
molecular biology"
Characterizing the many tens of thousands of proteins
expressed in a given cell type at a given time---whether
measuring their molecular weights or isoelectric points,
identifying their ligands or determining their
structures---involves the storage and comparison of vast
numbers of data. Inevitably this requires bioinformatics.
Here is a
constructively skeptical review by Lukas
Huber.
What is
Pharmacogenomics?
Pharmacogenomics is the application of genomic approaches
and technologies to the identification of drug targets.
Examples include trawling entire genomes for potential
receptors by bioinformatics means, or by investigating
patterns of gene expression in both pathogens and hosts
during infection, or by examining the characteristic
expression patterns found in tumours or patient samples for
diagnostic purposes (possibly in the pursuit of potential
cancer therapy targets).
The term "pharmacogenomics" is also used for the
more "trivial"---but arguably more
useful---application of bioinformatics approaches to the
cataloguing and processing of information relating to
pharmacology and genetics, for example the accumulation of
information in databases like this
one. (Thanks to Ivanovi.)
What is
Pharmacogenetics?
All individuals respond differently to drug treatments;
some positively, others with little obvious change in their
conditions and yet others with side effects or allergic
reactions. Much of this variation is known to have a genetic
basis. Pharmacogenetics is a subset of pharmacogenomics
which uses genomic/bioinformatic methods to identify genomic
correlates, for example SNPs (Single
Nucleotide Polymorphisms), characteristic
of particular patient response profiles and use those
markers to inform the administration and development of
therapies. Strikingly, such approaches have been used to
"resurrect" drugs thought previously to be
ineffective, but subsequently found to work within a subset
of patients. They can also be used for optimizing the doses
of chemotherapy for particular patients.
Overview of most common bioinformatics programs
Everyday bioinformatics is done with sequence search
programs like BLAST,
sequence analysis programs, like the EMBOSS and Staden
packages, structure prediction programs like THREADER
or PHD
or molecular imaging/modelling programs like RasMol and
WHATIF.
Overview of most common bioinformatics technology
Currently, a lot of bioinformatics work is concerned with
the technology of databases.
(Thanks again to Ivanovi.) These databases include both
"public" repositories of gene data like
GenBank or the Protein DataBank (the
PDB), and private databases, like those used by research
groups involved in gene mapping projects or those held by
biotech companies. Making such databases accessible via open
standards is very important. Consumers of bioinformatics
data use a range of computer platforms: from the more
powerful and forbidding UNIX boxes favoured by the
developers and curators to the far friendlier Macs often
found populating the labs of computer-wary biologists.
Databases of existing sequencing data can be used to
identify homologues of new molecules that have been
amplified and sequenced in the lab. The property of sharing a
common ancestor, homology, can be a very powerful
indicator in bioinformatics (see below).
Acquisition of sequence data
Bioinformatics tools can be used to obtain sequences of
genes or proteins of interest, either from material obtained,
labelled, prepared and examined in electric fields by
individual researchers/groups or from repositories of
sequences from previously investigated material.
Analysis of data
Both types of sequence can then be analysed in many ways
with bioinformatics tools.
They can be assembled. Note that this is one of
the occasions when the meaning of a biological term differs
markedly from a computational one (see the amusing confusion
over the issue at Web-based geek forum Slashdot). Computer
scientists, banish from your mind any thought of assembly
language. Sequencing can only be performed for relatively
short stretches of a biomolecule and finished sequences are
therefore prepared by arranging overlapping "reads"
of monomers (single beads on a molecular chain) into
a single continuous passage of "code".
This is the bioinformatic sense of assembly.
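As a rough illustration of assembly in this sense, the following Python sketch merges overlapping reads greedily: it repeatedly joins the two fragments whose ends overlap the most. This is a toy under stated assumptions (error-free reads, a single strand, no repeats); the function names and the three example reads are hypothetical, and real assemblers are far more sophisticated.

```python
def overlap(left: str, right: str, min_length: int = 3) -> int:
    """Length of the longest suffix of `left` that equals a prefix of `right`."""
    for size in range(min(len(left), len(right)), min_length - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def greedy_assemble(reads: list[str], min_length: int = 3) -> list[str]:
    """Repeatedly merge the pair of fragments with the longest overlap."""
    contigs = list(reads)
    while len(contigs) > 1:
        best_size, best_i, best_j = 0, -1, -1
        for i, left in enumerate(contigs):
            for j, right in enumerate(contigs):
                if i != j:
                    size = overlap(left, right, min_length)
                    if size > best_size:
                        best_size, best_i, best_j = size, i, j
        if best_size == 0:
            break  # nothing overlaps any more: leave the pieces separate
        # Join the two fragments, writing the shared letters only once.
        merged = contigs[best_i] + contigs[best_j][best_size:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


if __name__ == "__main__":
    reads = ["ATGGCC", "GCCATT", "ATTGTA"]  # made-up overlapping reads
    print(greedy_assemble(reads))
```

The greedy strategy is chosen here only because it is short and easy to follow; it can be fooled by repeated sequence, which is one reason genuine assembly is a hard problem.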
They can be mapped---that is, their
sequences can be parsed to find sites where so-called
"restriction enzymes" will cut them.
They can be compared, usually by aligning
corresponding segments and looking for matching and
mismatching letters in their sequences. Genes or proteins
that are sufficiently similar are likely to be related and
are therefore said to be "homologous" to each
other---the whole truth is rather more complicated than this.
Such cousins are called "homologues".
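The comparison step can be sketched in a few lines of Python. The first function measures the percentage of matching letters between two sequences that are already aligned; the second computes a global alignment score by dynamic programming (the Needleman-Wunsch method). The scoring values (+1 match, -1 mismatch, -2 gap) are arbitrary assumptions for illustration, and the function names are hypothetical. Real tools use substitution matrices and more careful gap penalties.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of alignment columns in which both sequences carry the same letter."""
    if not aligned_a or len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must be non-empty and of equal length")
    # A column counts as a match only if the letters agree and are not gaps ("-").
    matches = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    return 100.0 * matches / len(aligned_a)


def global_alignment_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -2) -> int:
    """Best global alignment score of `a` against `b` (Needleman-Wunsch)."""
    # previous[j] holds the best score for aligning the processed prefix of a with b[:j].
    previous = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        current = [i * gap]
        for j in range(1, len(b) + 1):
            diagonal = previous[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            current.append(max(diagonal, previous[j] + gap, current[j - 1] + gap))
        previous = current
    return previous[-1]


if __name__ == "__main__":
    print(percent_identity("GATTACA", "GACTACA"))
    print(global_alignment_score("GATTACA", "GACTACA"))
```

A high score or identity suggests, but does not prove, that two sequences are related; as the text says, the whole truth about homology is more complicated.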
If a homologue (a related molecule) exists, then a newly
discovered protein may be modelled---that is the three
dimensional structure of the gene product can be predicted
without doing laboratory experiments.
Bioinformatics is used in primer design. Primers
are short sequences needed to make many copies of (amplify) a
piece of DNA as used in PCR (the Polymerase
Chain Reaction).
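A very crude sketch of what primer-design software has to compute is given below in Python. It picks a forward primer from the start of a template and a reverse primer as the reverse complement of its end, then estimates a melting temperature with the Wallace rule of thumb (2 degrees C per A or T, 4 degrees C per G or C), which is only a rough guide for short oligonucleotides. The function names, the default primer length of 18 and the example template are assumptions for illustration; real primer design also checks for hairpins, primer dimers and specificity.

```python
# Translation table mapping each base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence: str) -> str:
    """Return the sequence of the opposite strand, read 5' to 3'."""
    return sequence.upper().translate(COMPLEMENT)[::-1]


def gc_fraction(primer: str) -> float:
    """Fraction of the (non-empty) primer made up of G and C."""
    primer = primer.upper()
    return (primer.count("G") + primer.count("C")) / len(primer)


def wallace_tm(primer: str) -> int:
    """Rule-of-thumb melting temperature in degrees C: 2*(A+T) + 4*(G+C)."""
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


def primer_pair(template: str, length: int = 18) -> tuple[str, str]:
    """Forward primer from the template's start; reverse primer from its end."""
    template = template.upper()
    return template[:length], reverse_complement(template[-length:])


if __name__ == "__main__":
    template = "ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG"  # made-up template
    forward, reverse = primer_pair(template)
    for name, primer in (("forward", forward), ("reverse", reverse)):
        print(name, primer, gc_fraction(primer), wallace_tm(primer))
```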
Bioinformatics is used to attempt to predict the
function of actual gene products.
Information about the similarity, and, by implication, the
relatedness of proteins is used to trace the "family
trees" of different molecules through evolutionary
time.
There are various other applications of computer analysis
to sequence data, but, with so much raw data being generated
by the Human Genome Project and other initiatives in biology,
computers are presently essential for many biologists just to
manage their day-to-day results.
Molecular modelling / structural biology is a growing
field which can be considered part of bioinformatics. There
are, for example, tools which allow you (often via the Net)
to make pretty good predictions of the secondary
structure of proteins arising from a given amino acid
sequence, often based on known "solved" structures
and other sequenced molecules acquired by structural
biologists.
Structural biologists use "bioinformatics" to
handle the vast and complex data from X-ray crystallography,
nuclear magnetic resonance (NMR) and electron microscopy
investigations and create the 3-D models of molecules that
seem to be everywhere in the media.
note
Unfortunately the word "map" is
used in several different ways in
biology/genetics/bioinformatics. The definition given above
is the one most frequently used in this context, but a gene
can be said to be "mapped" when its parent
chromosome has been identified, when its physical or genetic
distance from other genes is established and---less
frequently---when the structure and locations of its various
coding components (its "exons") are
established.
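For the restriction-site sense of "mapping" (parsing a sequence to find where restriction enzymes will cut), a minimal Python sketch follows. The three enzymes and their recognition sequences are well-known examples that I have added; they are not named in this FAQ. The function names and the test sequence are hypothetical. The sketch reports where each recognition site starts (0-based), not the exact position of the cut within the site.

```python
# Recognition sequences of three commonly used restriction enzymes.
# All three are palindromic, so scanning one strand is sufficient.
RECOGNITION_SITES = {
    "EcoRI": "GAATTC",
    "BamHI": "GGATCC",
    "HindIII": "AAGCTT",
}


def find_sites(sequence: str, site: str) -> list[int]:
    """Return the 0-based start position of every occurrence of `site`."""
    sequence = sequence.upper()
    positions = []
    start = sequence.find(site)
    while start != -1:
        positions.append(start)
        start = sequence.find(site, start + 1)
    return positions


def restriction_map(sequence: str) -> dict[str, list[int]]:
    """Map each enzyme name to the positions of its recognition sites."""
    return {name: find_sites(sequence, site)
            for name, site in RECOGNITION_SITES.items()}


if __name__ == "__main__":
    sequence = "TTGAATTCAAGGATCCTTGAATTC"  # made-up example sequence
    print(restriction_map(sequence))
```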
What is
Bioinformatics?---The Loose definition
There are other fields---for example medical imaging /
image analysis which might be considered part of
bioinformatics. There is also a whole other discipline of
biologically-inspired computation;
genetic algorithms, AI, neural networks. Often these
areas interact in strange ways. Neural networks, inspired by
crude models of the functioning of nerve cells in the brain,
are used in a program called PHD to predict, surprisingly
accurately, the secondary structures of proteins from their
primary sequences.
What almost all bioinformatics has in common is the
processing of large amounts of biologically-derived
information, whether DNA sequences or breast X-rays.
How old is the
discipline?
"How old is bioinformatics?" The answer to this
one depends on which source you choose to read.
From T K Attwood and D J Parry-Smith's
"Introduction to Bioinformatics", Prentice-Hall
1999 [Longman Higher Education; ISBN 0582327881]:
"The term bioinformatics is used to encompass
almost all computer applications in biological sciences,
but was originally coined in the mid-1980s for the analysis
of biological sequence data."
From Mark S. Boguski's article in the "Trends
Guide to Bioinformatics" Elsevier, Trends Supplement
1998 p1:
"The term "bioinformatics" is a
relatively recent invention, not appearing in the
literature until 1991 and then only in the context of the
emergence of electronic publishing...
"...However, some of my role models when I was a
graduate student (Margaret O. Dayhoff, Russell F.
Doolittle, Walter M. Fitch and Andrew D. McLachlan) had
been building databases, developing algorithms and making
biological discoveries by sequence analysis since the
1960s---long before anyone thought to label this activity
with a special term (if anything it was called `molecular
evolution'). Even a relatively new kid on the block,
the National Center for Biotechnology Information (NCBI),
is celebrating its 10th anniversary this year, having been
written into existence by US Congressman Claude Pepper and
President Ronald Reagan in 1988. So bioinformatics has, in
fact, been in existence for more than 30 years and is now
middle-aged."
Books: Can you recommend any bioinformatics books?
It's notoriously difficult to find any books on
bioinformatics itself that cater well for all of those
coming from computing, from mathematics and from biology
backgrounds. The few textbooks available in the field tend
to be eyewateringly expensive as well. I've divided
suggested reading into books of
general interest, those best suited to
people coming from a computational/mathematical background
and books for biologists
interested in bioinformatics. Where a book is also
listed in Bioinformatics.Org's books section I
have linked the title to the relevant entry there. Links to
other lists of bioinformatics books follow this section of
suggested reading.
General
introductions
Many people are curious about the Human Genome (Project).
The completion of the first draft probably represents
bioinformatics' coming of age as a discipline. The first
couple of books are aimed at the intelligent layperson.
A gossipy and insightful account of the race to sequence
the genome can be found in "The
Sequence" by Kevin Davies [Weidenfeld; ISBN
0297646982]. Matt Ridley's
"Genome" [Fourth Estate; ISBN
185702835X] is both an interesting layperson's
introduction to the issues raised by the bioinformatic
revolution and an overview of its biology and enormous scope.
If I remember rightly, Ridley's book received a slightly
snooty review from Walter Bodmer. This is understandable,
since his and Robin McKie's excellent
"pre-genomic" guide to the Human Genome Mapping
Project, "The Book of Life" [Oxford Paperbacks;
ISBN 0195114876] was undeservedly in a remainders bin when I
bought my copy a couple of years ago.
If you are a non-biological scientist (or a non-scientist)
and are hooked by these, why not go back to the "real
beginning" of the race and read James Watson's
entertaining and indiscreet memoir of his and Francis
Crick's determination of the structure of DNA,
"The Double Helix" [Penguin; ISBN
0140268774]---now updated with an introduction by media don
Steve Jones.
Nigel Barber at Peterborough Regional College in the UK
recommends Gary Zweiger's "Transducing the
Genome" [McGraw-Hill Professional Publishing: ISBN
0071369805]. The summary
at Amazon makes it sound a tad pretentious, but all the
reviews seem pretty positive so it might be worth a read.
If you are a quantitative scientist and would like a
deeper knowledge of contemporary (molecular) biology, but
you want to acquire it as painlessly as possible you could
try the following:
- Donna Rae Siegfried's Biology for
Dummies [Wiley; ISBN 0-7645-5326-7] is fun, well
thought out and a lot more informative than the title
might suggest. If only all biology textbooks were this
entertaining and unpretentious.
- If you already have some biological knowledge and
would like to get a grip on modern biomolecular science
then Richard J. Epstein's Human Molecular
Biology is an elegant, colourful and detailed
guide.
There are two classic competing texts in cell and
molecular biology which Maximilian Haeussler reminds me to
include: Alberts et al's Molecular Biology
of the Cell [Garland Science: ISBN 0815340729] and
Molecular Biology of the Gene [Benjamin
Cummings: ISBN 0321248643].
Computational/Mathematical
aspects
If you are a hardcore maths/computing person Michael
Waterman's "Introduction to Computational
Biology" [Chapman & Hall/CRC Statistics and
Mathematics; ISBN 0412993910] and Pavel Pevzner's
"Computational Molecular Biology - An Algorithmic
Approach" [The MIT Press (A Bradford Book); ISBN
0262161974] will give you all the discrete maths you can
shake a stick at, but perfunctory introductions to the
biology.
Bioinformatics.Org's very own Jeff Bizzaro recommends
Dan
Gusfield's "Algorithms on Strings, Trees and
Sequences" [Cambridge, 1997 ISBN
0-52158-519-8], Richard Durbin, S. Eddy, A. Krogh, G.
Mitchison "Biological
Sequence Analysis: Probabilistic Models of Proteins and
Nucleic Acids" [Cambridge, 1997 ISBN
0-52162-971-3] (which I think is one of the clearest and
most comprehensive guides to alignment algorithms) and---for
that full "computers-to-biology conversion"---
Geoffrey M. Cooper "The Cell: A Molecular
Approach" [ASM Press, 1996 ISBN 0-87893-119-8].
Jeff Ames writes that a second edition of this book is now
available [Sinauer Associates, Incorporated, 2000 ISBN
0-87893-106-6] and that this version---if you can find it in
the shops---comes with a CD.
Applying bioinformatics to
biological research
One outstanding general text for the biologist is David
W. Mount's "Bioinformatics"
[Cold Spring Harbor Press; ISBN 0879696087]. It's not
cheap, but it's the best I've seen if you are
studying bioinformatics itself.
Bioinformatics has been dismissed by some as "the
science of BLAST searches". The best collection of
advice so far on doing BLAST searches is O'Reilly's BLAST book
by Ian Korf, Mark Yandell and Joseph Bedell [O'Reilly ISBN
0-596-00299-8]. I reviewed it enthusiastically, but not
uncritically, for the UK
UNIX Users' Group magazine. I'd go as far as to
say that all biologists thinking of using BLAST in their
research should read the relevant sections before they even
go near a computer.
If you wish to use general bioinformatics tools,
especially if you are a little wary of computers, my new
"best" book is "Bioinformatics
for Dummies" [John Wiley and Sons ISBN
0764516965]. It is (obviously) aimed at people who are
beginners, who are happier using the Web rather than typing
commands, and who are more interested in learning than in
impressing people---the writing is friendly, clear and
unpretentious. However, like several of my other tips
(below) it concentrates on Web-based resources so it will,
inevitably, date. (This is partially compensated for by
there being a
companion Website.)
Also, if you're coming to the subject as a computer
user with a biological background, looking to exploit the
many tools available, you might want to try Terry Attwood
and David Parry-Smith's "Introduction
to Bioinformatics" [Longman Higher
Education; ISBN 0582327881], or Des
Higgins and Willie Taylor's "Bioinformatics:
Sequence Structure and Databanks" [Oxford
University Press; ISBN 0199637903]. Another excellent
practical introduction is Andreas
Baxevanis and Francis Ouellette's
"Bioinformatics: A Practical Guide to the
Analysis of Genes and Proteins"
[Wiley-Interscience; ISBN 0471383910], now in its new and
improved second edition. Bax teaches bioinformatics all
over Canada and the experience shows. Arthur Lesk has also
produced an excellent teaching book particularly for protein
bioinformatics in his Introduction
to Bioinformatics.
Bioinformatics.Org also recommends Cynthia Gibas and Per
Jambeck's "Developing
Bioinformatics Skills" [O'Reilly, 2001
ISBN 1-56592-664-1].
Stuart Brown recommends his own book "Bioinformatics:
A Biologist's Guide to Biocomputing and the
Internet" [Eaton Pub Co; ISBN:
188129918X]. If he sends me a review copy I might recommend
it too ;-) .
Fiction books
"Darwin's Radio" by Greg Bear
[Ballantine Books, ISBN: 0345435249] is a wonderful hard SF
thriller which stretches ideas derived from genome
discoveries to their breaking point. It's gripping and
humane.
Leonard Crane, the author of Ninth Day of
Creation, kindly sent me a copy for review. So
far it's an excellent read. I haven't finished it
yet, not because it isn't a rattling good story, but
because, like "Darwin's Radio",
it is very long and because I am very busy. If you'd
like to read a well-researched, but speculative, novel
containing actual scenes of practising bioinformatics then
try it.
Ken Allen contributed the following reviews:
"Frameshift [Tor Books,
ISBN: 0812571088] by Robert J. Sawyer---based around the
HGP---reasonable read, but poor / confused
ending."
Calculating God [Tor Books,
ISBN: 0812580354] by the same author---has a subtler bio
connection and is a much better read. Near the start an
alien spacecraft lands, the alien emerges and says 'take
me to your paleontologist'.
Further suggestions for this section are
welcome.
Other lists of
bioinformatics books
See also compbiology.org's list, Steve
Brenner's
list, and Aik
Choon Tan's collection of books.
Centres of Bioinformatics Activity: Where is bioinformatics done?
The biggest and best source of bioinformatics links I
have encountered is the Genome Web
at the Rosalind Franklin Centre for Genomics Research at
the Genome
Campus near Cambridge,
UK. Most of the links below come from that resource. My
list is necessarily limited by comparison.
Research centres
Sequencing centres
[XXXX INSERT DETAILS OF MORE SEQUENCING CENTRES HERE]
Standards centres
[XXXX INSERT DETAILS OF STANDARDS CENTRES HERE]
What virtual centres (for example consortia and communities) for bioinformatics activity are there?
[XXXX INSERT MORE DETAILS OF VIRTUAL BIOINFORMATICS CENTRES HERE]
Online Resources: What bioinformatics Websites are there?
'Blogs
The front page of
Bioinformatics.Org itself is a bioinformatics
'Blog.
The Bio-Web links to
resources online for molecular and cell biologists and
covers current news in various biological/computational
fields.
Genehack is the first
bioinformatics 'Blog I ever encountered.
Information
The Australian National Genomic Information Service
(ANGIS) is operated by the Australian
Genomic Information Centre (currently at The University
of Sydney) to offer software, databases, documentation,
training and support for biologists.
"The University of Maryland AgNIC gateway is a guide to
quality agricultural biotechnology information on the
Internet."
Directories
Christy Hightower, Engineering Librarian at the Science and
Engineering Library, University of California Santa
Cruz has already done this better than me. Visit her
excellent article about bioinformatics Net resources in
Issues in Science and
Technology Librarianship.
Societies
Humberto Ortiz Zuazaga kindly introduced me to The International Society for
Computational Biology which he points out "has links
to programs of study and online courses in computational
biology and to job postings".
Collections of Tools
You can start right here at
Bioinformatics.Org if you are looking for a bioinformatics
toolbox.
I cannot recommend strongly enough the Rosalind Franklin Centre's "GenomeWeb".
Of historical interest only now, I guess, is the legendary
"Pedro's Molecular Biology Search and Analysis
Tools".
Portals
Bioinformatics.Org is an
international organization which promotes freedom and
openness in the field of bioinformatics and is the root domain
of a damned fine Website :-) .
CCP11
(Collaborative Computational Project 11) is another
product of the UK's Genome Campus. To quote their
Web site, it was...
"...established to foster the broad bioinformatics
community and the UK research community in particular. Its
purpose is to facilitate the transfer of knowledge and
expertise through conferences, workshops, a newsletter and
the use of the world wide web. CCP11 is funded by the BBSRC and is hosted at
the MRC Rosalind Franklin Centre for Genomics Research RFCGR located on
the Wellcome Trust Genome
Campus, Cambridge."
Jennifer Steinbachs runs compbiology.org which is a
general computational biology site as well as being a portal
to her own work.
BioPlanet is well
worth visiting. It describes itself as "a
not-for-profit site, funded with our resources, for [its
users'] benefit".
ColorBasePair
is a densely packed portal with lots of bioinformatics
links.
Nick Yates runs his own informative bioinformatics site,
unsurprisingly called nick-yates.com. He
doesn't aim to make money from it, but it may have
paid-for ads. Check out the glossaries---they are better
than mine.
Tutorials
A great place to start, whether you come from a
biological, physical or computational background, is at Martin
Vingron's superb online bioinformatics tutorial.
(Begin by choosing a section from the left-hand-side menu
bar.)
Tom Smith and Don Emmeluth have produced a nice little exploration
of bioinformatics using NCBI resources and tools.
I recently stumbled upon a promising set of
online lecture notes currently under construction by B.
Steipe at the Genzentrum (Gene
Center) at the Ludwig-Maximilians-Universität
München (University of Munich).
Chemistry for all
A defiantly frames-free chemistry
tutorial site.
Mathematics for biologists
First of all, an almost completely
painless introduction to the horrors of the quadratic
equation by Peter Whalen, James Walker, and Drew
Marticorena.
C. J.
Schwarz of the Department of Statistics and
Actuarial Science, Simon
Fraser University has produced a course in
statistics which is
accompanied by a set of sound,
online PDF handouts.
Here is a great
guide to a whole array of statistical learning/teaching
resources prepared by Juha Puranen of
the University of
Helsinki (English).
Computers for
biologists
Programming for
biologists
General introduction to
biology for computer scientists
Estrella Mountain
Community College in the States offers this excellent
short introduction to biology (actually "The Nature
of Science and Biology"). It's a great place for
keyboard jockeys to start their journey to
enlightenment. Thanks to Alex O'Neill for pointing out the broken link.
Genetics
The Dolan DNA Learning Center at Cold Spring Harbor has an
outstanding
interactive tutorial introducing genetics. To take full
advantage of the multimedia elements you should download the
Flash and Real players.
Molecular biology for computer
scientists
The Institute of Arable Crop Research Beginner's
Guide to Molecular Biology
Protein chemistry for
computer scientists
Unilever Education Advanced Series
tutorial on proteins.
Cell biology for computer
scientists
The University of
Arizona has made available a
high-quality tutorial in cell biology. Not only does it
cover the facts, but it also attempts to introduce some of
the philosophy of the field---recommended. Even better,
it's also available en
Español and in
Italiano.
Once you've worked your way through that you might
like to see some scanning electron microscope images
of some of the structures you've read about taken by
members of John
Heuser's lab.
Evolution for computer scientists
Bob Patterson maintains his "Darwiniana"
with amazing diligence.
Practical
bioinformatics
Other lists of
bioinformatics tutorials
Education: Where can I study Bioinformatics...
...in Europe?
...remotely?
This
section is not complete, but contributions to
broaden its coverage are welcome. Please do not
direct questions about eligibility, course quality or
admissions policy to me; ask the individual
institutions directly. Use the links to obtain
contact details. If an institution doesn't provide
telephone numbers/email addresses or snailmail details on
its Web site it doesn't deserve your patronage.
This resource focuses on complete, full-time degree
programmes rather than on individual study modules. Curating
a list of the latter would be a full-time job. You can go to
other places, however, if you are looking for short courses.
Thanks to various contributors, including
Wentian Li who pointed me to this list
at Rockefeller which is mirrored at various other sites. And
to Humberto Ortiz Zuazaga for mailing me a link to the ISCB,
where you can find this list.
If you are interested in U.S. programmes, here's a
list from Curtin and here's a
list from Stanford. Thanks to Amelie Stein who also
supplied some of the individual entries in this section.
Those wanting to find programmes in the Asia Pacific
region could have a look at this
resource maintained by the Asia Pacific Bioinformatics
Network APBioNet. Thanks to Sentausa.
In the UK, The
Bioinformatics Resource (part of the BBSRC's CCP11
project) maintains (among many other resources)
lists of (mainly) British
Masters and
PhDs in bioinformatics. If you have any suggestions or
updates please
contact me with them. You can publicize your course and
offer a public service at the same time.
Africa
Rhodes
University, Grahamstown, South Africa offers an MSc. in
Bioinformatics and Computational Molecular Biology. Thanks
to Natalie Twine.
Cathal Seoighe wrote a while back about the South African
National Bioinformatics Institute (SANBI). Ruediger
Braeuning has since written to point out that bioinformatics
training in South Africa has been radically reorganized. He
says: "A new institute, the National
Bioinformatics Network (NBN), has been created. We have nodes
at Universities all over the country (UWC, UCT, SUN, RU,
UKZN, UP, WITS). Our main tasks are to:
- develop capacity in Bioinformatics
- perform world-class research
- support local Biotechnology initiatives
"We do offer courses on various topics
in Bioinformatics ranging in length from 3 days to several
weeks. We also train Bioinformaticists on MSc, PhD and post
doc level. Undergraduate programs are currently being
developed. Bursaries are available. For more information
visit our
Website."
South African National Bioinformatics Institute (SANBI) Honours
Bioinformatics Course at the University of the Western
Cape. Next year the same institute will be offering a
Master's in bioinformatics---thanks to Cathal
Seoighe.
If you know of any other bioinformatics courses on the
African continent please feel free to
mail me about them.
The
Americas
Canada
Thanks to Jordan Patterson for the information that the
University of Alberta
offers four-year Biology
or Computer
Science degrees with a specialization in bioinformatics.
The Faculty of Computer
Science there offers Master's and PhD training in
bioinformatics.
Benjamin Horsman wrote to tell me that Simon Fraser University and
the University of British
Columbia are collaborating on a new Bioinformatics
training program with the British Columbia Cancer
Agency. The
program offers post-graduate diploma, Master's, and
PhD training in Bioinformatics. Now Simon Fraser University
also offers a joint major programme in Molecular
Biology and Biochemistry (MBB) and Computer Science in
Bioinformatics. Thanks to Brittany Nielsen for the
info.
Thanks to Olga Likhodi for the information that Seneca College, Toronto
offers a post-graduate diploma in Bioinformatics.
Peter Kublik informs me that from 2003/2004 the University of Calgary will offer a bioinformatics
programme. He's part of the first intake.
The University of
Waterloo, Department of Computer
Science offers
undergraduate and graduate
courses in bioinformatics. More information is
here.
California
The Keck Graduate Institute claims
that computational
biology is a core element of the curriculum in its Master of
Bioscience degree.
Stanford
University offers academic and professional
(distance-learning) MSs in Biomedical
Bioinformatics as well as its PhD programme. Thanks to
Betty Cheng.
Thanks to Momchil Georgiev for the information that the University of California at San
Diego offers a Bioinformatics
graduate programme and to Dana Brehm that there is now a
new bachelor's program, to quote her:
"[This is an] undergraduate, interdisciplinary program
for undergraduates leading to a B.S. degree. The new
Bioinformatics major is offered by the Division of Biology,
and the departments of Chemistry/Biochemistry, Computer
Science and Engineering, and Bioengineering. A student may
choose to major in Bioinformatics in any one of the four
departments or division. The Division of Biology currently
offers two Bioinformatics courses, and with the advent of
the cross-disciplinary major, even more courses are going
to be taught 2002-03 and 2003-04."
University of California,
Irvine Informatics in Biology
and Medicine
David Delong wrote to me to point out that the College of Natural and
Agricultural Sciences at the University of California,
Riverside is developing a
"Center in Genomics and Bioinformatics" which will
offer a PhD
curriculum in genomics and bioinformatics from academic
year 2001-2002 onwards.
Catherine Velazquez says that The University of California, Santa
Cruz offers a new
undergraduate BS course in
bioinformatics. They have a Frequently Asked Questions. Now they also offer an MS/PhD
in Bioinformatics. Thanks to Kevin Karplus for the update.
Connecticut
Javier Rojas Balderrama emailed me to point out that Yale University offers a Bioinformatics
and Computational Biology track
as part of its combined Biological
and Biomedical Sciences graduate programme.
Georgia
Georgia Institute of
Technology
Masters of Science in Bioinformatics
According to Eric VanWieren Georgia State University
offers a Master's and PhD in Computer Science with a
focus on bioinformatics. The university's Bachelor of
Science in Computer Science also offers a "Fundamentals
of Bioinformatics" course.
Illinois
The University
of Illinois at Chicago offers graduate programmes
covering Bioengineering Bioinformatics through its
Bioengineering department as well as an
undergraduate course track. Thanks to Amit Sabnis.
Indiana
IUPUI offers an MS
programme in Bioinformatics.
Indiana University
also offers an
MS programme in Bioinformatics.
Iowa
Iowa State
University offers an Interdisciplinary Ph.D. Program in
Bioinformatics and Computational Biology (BCB).
Maine
The Jackson Lab, a world
centre of mouse genome informatics, offers a graduate
training program.
Maryland
Tim Young wrote to say that Johns Hopkins University in
Maryland offers an MS in Bioinformatics through the Zanvyl Krieger
School of Arts and Sciences Advanced Academic Programs
and Whiting School of
Engineering Engineering and Applied Science Programs for
Professionals. They are also offering a Bioinformatics
concentration with their
MS in Biotechnology program.
Massachusetts
Boston University
offers a graduate
programme and so
does its partner Northeastern
University. Northeastern also offers a Graduate
Certificate in the subject.
Brandeis
University offers both a Master
of Science in Bioinformatics and a Graduate
Certificate in Bioinformatics. Thanks to Matt
Foster.
The Department of
Computer Science at UMass Lowell offers various degrees
from Bachelor's through to PhD level in Computer Science with
Bioinformatics options.
Mexico
At the National Autonomous University of Mexico a
doctoral program in biomedical sciences is available.
Their Computational Molecular Biology Group is
here.
Minnesota
The University of
Minnesota offers a graduate programme in
bioinformatics.
Thanks to Anu Haniharan for drawing my attention to
mixing up the Minnesota and New Jersey paragraphs.
Nebraska
The University
of Nebraska Lincoln offers an Interdisciplinary Bioinformatics Specialization.
The Graduate Program of the Pathology-Microbiology
Department at the University
of Nebraska Medical Center (University of Nebraska at
Omaha) offers a specialty
track in bioinformatics.
New Jersey
Rama Penta wrote to say that Stevens Institute of
Technology offers a Master's programme in
Bioinformatics.
The message also states that the University
of Medicine and Dentistry of New Jersey (UMDNJ) offers a
programme in biomedical
informatics.
Thanks to Anu Haniharan for drawing my attention to
mixing up the Minnesota and New Jersey paragraphs.
Moustafa wrote to say that Ramapo College in New
Jersey is the only school in New Jersey offering a
Bachelor's degree in bioinformatics.
New York State
The University at
Buffalo has been involved in establishing a "Center of
Excellence in Bioinformatics". It used to offer a range
of courses in bioinformatics and related subjects, but all
the course links seem to be dead now. Thanks to Jeff Ligas
for the original notification.
Canisius
College---also in Buffalo, NY---has had a state-approved
B.S. in Bioinformatics since
2001. Thanks to Deb Burhans.
Cornell and Rockefeller
Universities, together with the Sloan-Kettering Research
Institute offer a "Tri-institutional
program in Computational Biology and Medicine".
Thanks to Brant Inman.
Rensselaer Polytechnic
Institute offers both undergraduate
and graduate programmes in bioinformatics.
Rochester Institute of
Technology offers BS, MS and BS/MS
programmes in Bioinformatics. Thanks to Brandon H.
According to Maureen Downey, the College of Staten
Island, part of the City University of New York also offers
a challenging program in bioinformatics.
If you know of any other bioinformatics courses on the
American continent please feel free to
mail me about them.
North Carolina
Duke University's
Center for
Bioinformatics and Computational Biology offers var |