Time was when model building in structural biology was just what
it sounds like. Linus Pauling discovered the alpha helix by making
folds in a sheet of paper while he was lying in bed with a cold.
James Watson and Francis Crick developed their theory of DNA
structure through building a series of constructions of steel rods
and clamps, culminating in the famous double helix.
Three-dimensional models provided a route to understanding how
biological molecules worked, using the form to find the
function.
Computer models add dimensions to sequences.
Models are just as important today, but assemblages of steel rods
are a thing of the past. The most complex protein structure can be
realized in a computer and presented in all its 3-D glory on the
screen. Such images of known structures provide much of the raw
material for drug discovery programs, making it easy to visualize
likely targets and to design potential pharmaceutical agents to
match.
But what about unknown structures? Even with the most advanced
electron or X-ray diffraction methods, or with nuclear magnetic
resonance (NMR), solving the 3-D structures of the enzymes and
receptors that are the workhorses of the living organism is a
relatively slow business. Currently there are over 300,000 protein
sequences listed in databases such as SWISS-PROT/TrEMBL, while the
Brookhaven Protein Data Bank
of structures that have been determined through X-ray
crystallography or NMR (the vast majority of them in the past
decade) numbers fewer than 12,000. The gap can only keep growing, as
more and more protein sequences are derived from the genomic
information pouring out of the various genome sequencing
projects.
For the past decade, a number of groups around the world have
been attempting to bridge the gap by developing computer models that
can predict a protein's structure from its sequence of amino acids.
Many such models are based on a comparative approach - they use
proteins with known structures to predict how another, similar
sequence might fold itself into the more or less spherical forms
most proteins adopt. This is a highly technical and specialized
field, calling for both a good working understanding of protein
chemistry and high-level computing skills. It is not something
you can do at home on your PC - many of these programs require
access to some fairly formidable computing firepower. But the strong
philosophy of shared information in the biological research
community, combined with the ease of communication provided by the
World Wide Web, means that soon any biologist will be able to gain
access to the arcane world of protein-structure prediction.
SWISS-MODEL: Do try this at home.
One of the pioneers in this approach is Manuel Peitsch, now at
Glaxo Wellcome Experimental Research and a founding member of the
Swiss Institute of Bioinformatics in Geneva, Switzerland. With his
colleagues Nicolas Guex, Torsten Schwede, and Alexander Diemand, he
has developed a "comparative protein modeling
environment" that can be accessed from a PC via the Internet. "I
wanted to provide modeling capability as a service to the community
of molecular biologists," he says. "It provides them with a platform
to answer biological questions without having to worry about the
tedious technical aspects of the modeling procedure."
Called SWISS-MODEL,
the system is built around an Internet-based server that compares a
submitted protein sequence with a template sequence whose structure
is already known. As long as the template shares at least 25 percent
sequence identity with the target sequence, SWISS-MODEL proceeds
through a series of steps to generate a model structure based on the
template. A key feature of the system is a powerful graphical "front
end" and "sequence to structure workbench," which can be downloaded
free of charge. The Swiss-PdbViewer allows
biologists to visualize and analyze structures and sequences, and
manipulate them to find a satisfactory match before submitting a
modeling request. It now also offers the capability of creating
individual models on your PC at home.
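The 25 percent cutoff is easy to picture with a small sketch. The Python below is an illustration only, not SWISS-MODEL's own code: it assumes the two sequences have already been aligned (gaps written as "-"), and it assumes identity is counted over the positions where neither sequence has a gap, which is one of several definitions in use. The function names and the constant are hypothetical.

```python
MIN_IDENTITY = 25.0  # percent; the threshold quoted in the article


def percent_identity(aligned_target: str, aligned_template: str) -> float:
    """Percent of gap-free aligned positions where the residues match."""
    if len(aligned_target) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_target.upper(), aligned_template.upper()):
        if a == "-" or b == "-":
            continue  # skip positions where either sequence has a gap
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


def usable_template(aligned_target: str, aligned_template: str) -> bool:
    """True if the template meets the assumed 25 percent identity cutoff."""
    return percent_identity(aligned_target, aligned_template) >= MIN_IDENTITY


if __name__ == "__main__":
    print(percent_identity("ACDE", "ACDF"))
    print(usable_template("AAAAA", "ACDEF"))
```

A real server would also have to produce the alignment in the first place; that step is left out here.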
The chances are, though, that the protein you're interested in
has already been modeled. In 1998, Peitsch and his colleagues
carried out the first really large-scale protein modeling
experiment, which they dubbed the 3DCrunch.
Using a Cray supercomputer provided by Silicon Graphics in Cortaillod,
Switzerland, they submitted every protein sequence then in the
SWISS-PROT database - over 200,000 of them - to the modeling
procedure, resulting in more than 70,000 models. It took the
64-processor computer just under a working week to do the job. A
single-processor machine would have needed more than a working
year.
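The comparison is simple arithmetic, sketched below in Python. It assumes the job scales perfectly linearly with the number of processors, which real workloads rarely do, and it rounds "just under a working week" up to one week; both are simplifying assumptions, and the function name is hypothetical.

```python
WEEKS_PER_YEAR = 52  # calendar weeks; an assumption for the comparison


def single_processor_weeks(parallel_weeks: float, processors: int) -> float:
    """Estimate serial run time, assuming perfectly linear scaling."""
    return parallel_weeks * processors


if __name__ == "__main__":
    estimate = single_processor_weeks(1.0, 64)
    print(f"{estimate:.0f} weeks, or about {estimate / WEEKS_PER_YEAR:.1f} years")
```

On those assumptions, one week on 64 processors corresponds to 64 weeks on one, which is indeed more than a year.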
3DCrunch is continually updated and refined.
"The 3DCrunch was a big thing," says Peitsch, "but since then
we've been automatically modeling new sequences every month." For
some thousand or more of the sequences in the original 3DCrunch, the
structure was already known. That provided Peitsch and his
colleagues with the means to check the accuracy of their models, and
to look for ways to improve them. "I wrote the original
SWISS-MODEL," he says, "but the others have rewritten it! We're
spending much of our time developing new algorithms based on known
structures and targeting sources of inaccuracy. In general the
closer the template is to the target sequence, the more accurate the
model will be. But we have had models based on pairs that were less
than 50 percent identical, yet which still deviated from the true
structure by an average of less than 1 angstrom."
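The article does not say how that average deviation was measured. A common choice in the field is the root-mean-square deviation (RMSD) between matching atoms, and the Python sketch below assumes that measure. It also assumes the model and the experimental structure have already been superimposed and that the atoms are listed in the same order; the function name and the coordinates are made up for illustration.

```python
import math


def rmsd(model, reference):
    """Root-mean-square deviation between two equal-length lists of
    (x, y, z) coordinates, in whatever unit the coordinates use."""
    if len(model) != len(reference) or not model:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (x1, y1, z1), (x2, y2, z2) in zip(model, reference):
        total += (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
    return math.sqrt(total / len(model))


if __name__ == "__main__":
    # Hypothetical coordinates in angstroms, already superimposed.
    model = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.5, 0.0)]
    reference = [(0.0, 0.2, 0.0), (1.5, 0.0, 0.3), (3.0, 0.0, 0.0)]
    print(f"RMSD: {rmsd(model, reference):.2f} angstroms")
```

Finding the best superposition is a separate problem and is not attempted here.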
This spring, Peitsch is planning to start all over again. With
the improved version of SWISS-MODEL, he wants to crunch every
protein sequence again to improve the accuracy of the models in his
collection. And to do it he'll be going back to the place where he
started all this - the National
Cancer Institute in Frederick, Maryland. A decade ago, as a
postdoc in Jacob V.
Maizel's Laboratory of Mathematical Biology there, he
started working on protein structure modeling. Today known as the Laboratory of Experimental and
Computational Biology, the lab has established an Advanced Biological Computing
Center, which is the only public-domain supercomputing center in
the world dedicated to biomedical research. Last year it took
delivery of a 96-processor Cray SV1 supercomputer, to add to an
already impressive display of serious hardware.
Will computers replace crystallographers?
Will this kind of computer power, added to the ever-growing
knowledge base, eventually make X-ray crystallographers and NMR
spectroscopists redundant? "They'll never be redundant - or not for
a very long time," says Peitsch. "Experiments provide the ultimate
proof, and always give new insights, especially if you need to
assess complexes of proteins. But modeling can direct experiments -
you can plan to address a certain number of proteins, so that you
might do five experiments rather than 50. It also allows you to
interpret the results of experiments in the absence of structural
knowledge. For example, you could locate a mutation and assess its
impact on the structure."
SWISS-MODEL is just one of a number of examples of software
developed for protein structure prediction, though it's
distinguished by its focus on accessibility and ease of use. But
just how good are these programs? Since 1994, the fortune-tellers of
protein structure have been putting their crystal balls to the test
in a biennial series of competitions called the Critical Assessment
of Techniques for Protein Structure Prediction (CASP), originally
devised by John Moult of the Center for Advanced Research in
Biotechnology at the University of Maryland. The
organizers invite protein crystallographers and NMR spectroscopists
to submit the sequences of proteins whose structures they expect to
solve within a few months. These are posted on a Web server for
groups around the world to model. The true structures are then
revealed, and the exercise culminates in a four-day meeting at which
the most successful modelers are invited to describe their methods.
CASP 4 will be launched this spring, with open season on the target
structures from April until August, and a conference to present
the results in Asilomar, California, in December 2000.
Full automation hasn't won yet.
Michael Sternberg heads the Biomolecular Modelling Laboratory of
the Imperial Cancer Research Fund in London. Models he constructed
with his colleague Paul Bates were among the more successful entrants
in CASP 3, and he also worked with Peitsch on the 3DCrunch.
"There are two possible approaches," says Sternberg. "Either you
can concentrate on developing a fully automated procedure, or you
can improve the results of the initial automated algorithms by
manual intervention based on expert knowledge." The most successful
groups at CASP 3 all used a certain amount of manual intervention to
improve their models. But the disadvantage of this approach is that
it can deal with relatively small numbers of structures at a time -
the CASP 3 participants worked on around a dozen.
"Fully automated procedures are essential for large-scale
modeling," says Sternberg. In addition to the work of Peitsch's
group, he points to the recent project by Roberto Sánchez
and Andrej Šali at the Laboratory of Molecular
Biophysics at the Rockefeller University, wherein they attempted
to model all 6,218 putative proteins generated from the complete
genome sequence of baker's yeast, Saccharomyces cerevisiae.
Using software called MODELLER,
originally developed by Šali and Tom Blundell at Birkbeck College in London,
they managed to produce models for over 1,000 of the proteins. A
quarter of these revealed family relationships with proteins of
known structure for the first time. Only 40 of the yeast proteins
had previously had their 3-D structure determined experimentally [1].
"Modeling provides a platform for creating ideas."
"Models derived from these large-scale projects are not
guaranteed to be accurate," says Sternberg. "But you can still go a
long way with them. You can look at the results, and say 'this
enzyme in this pathogen might make a good drug target,' and go on to
solve its structure experimentally."
Peitsch agrees that tools such as SWISS-MODEL are a means to an
end, not an end in themselves. "Modeling provides a platform for
creating ideas," he says.