Profile

Protein Structure by Numbers
by Georgina Ferry
Posted March 3, 2000 · Issue 73

Abstract

Molecular models have moved from steel rods to on-screen visualizations. Researchers now use supercomputers and modeling programs to translate the growing quantities of sequence information into predicted protein structures. And in true egalitarian style, they're providing free access to this information on the Web.

Time was when model building in structural biology was just what it sounds like. Linus Pauling discovered the alpha helix by folding a sheet of paper while lying in bed with a cold. James Watson and Francis Crick developed their theory of DNA structure by building a series of models from steel rods and clamps, culminating in the famous double helix. Three-dimensional models provided a route to understanding how biological molecules worked, using the form to find the function.

Computer models add dimensions to sequences.

Models are just as important today, but assemblages of steel rods are a thing of the past. The most complex protein structure can be realized in a computer and presented in all its 3-D glory on the screen. Such images of known structures provide much of the raw material for drug discovery programs, making it easy to visualize likely targets and to design potential pharmaceutical agents to match.

But what about unknown structures? Even with the most advanced electron or X-ray diffraction methods, or with nuclear magnetic resonance (NMR), solving the 3-D structures of the enzymes and receptors that are the workhorses of the living organism is a relatively slow business. Currently there are more than 300,000 protein sequences listed in databases such as SWISS-PROT/TrEMBL, while the Brookhaven Protein Data Bank, which holds structures determined by X-ray crystallography or NMR (the vast majority of them in the past decade), contains fewer than 12,000. The gap can only keep growing as more and more protein sequences are derived from the genomic data pouring out of the various genome sequencing projects.

For the past decade, a number of groups around the world have been attempting to bridge the gap by developing computer models that can predict a protein's structure from its sequence of amino acids. Many such models are based on a comparative approach - they use proteins with known structures to predict how another, similar sequence might fold itself into the more or less spherical forms most proteins adopt. This is a highly technical and specialized field, calling for both a good working understanding of protein chemistry and high-level computing skills. It is not something you can do at home on your PC - many of these programs require access to some fairly formidable computing firepower. But the strong philosophy of shared information in the biological research community, combined with the ease of communication provided by the World Wide Web, means that soon any biologist will be able to gain access to the arcane world of protein-structure prediction.

SWISS-MODEL: Do try this at home.

One of the pioneers of this approach is Manuel Peitsch, now at Glaxo Wellcome Experimental Research and a founding member of the Swiss Institute of Bioinformatics in Geneva, Switzerland. With his colleagues Nicolas Guex, Torsten Schwede, and Alexander Diemand, he has developed a "comparative protein modeling environment" that can be accessed from a PC via the Internet. "I wanted to provide modeling capability as a service to the community of molecular biologists," he says. "It provides them with a platform to answer biological questions without having to worry about the tedious technical aspects of the modeling procedure."

Called SWISS-MODEL, the system is built around an Internet-based server that compares a submitted protein sequence with a template sequence whose structure is already known. As long as the template shares at least 25 percent sequence identity with the target sequence, SWISS-MODEL proceeds through a series of steps to generate a model structure based on the template. A key feature of the system is Swiss-PdbViewer, a powerful graphical "front end" and "sequence to structure workbench" that can be downloaded free of charge. It allows biologists to visualize and analyze structures and sequences, and to manipulate them to find a satisfactory match before submitting a modeling request. It can now also build individual models on your PC at home.
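To make the 25 percent criterion concrete, here is a minimal Python sketch of how percent identity might be computed for a target and a template that have already been aligned, with "-" marking gaps. It counts identity over columns where neither sequence has a gap, which is only one of several conventions in use; the function names, the gap symbol, and the way the threshold is applied are illustrative assumptions, not SWISS-MODEL's actual code.

```python
# Hypothetical sketch: percent identity between a target and a template that
# have ALREADY been aligned (the alignment step itself is assumed done elsewhere).
GAP = "-"
MIN_IDENTITY = 25.0  # percent; the threshold quoted for SWISS-MODEL


def percent_identity(aligned_target: str, aligned_template: str) -> float:
    """Identical residues as a percentage of gap-free aligned columns."""
    if len(aligned_target) != len(aligned_template):
        raise ValueError("aligned sequences must have the same length")
    # Keep only columns where both sequences have a residue.
    pairs = [(a, b) for a, b in zip(aligned_target, aligned_template)
             if a != GAP and b != GAP]
    if not pairs:
        return 0.0
    identical = sum(1 for a, b in pairs if a == b)
    return 100.0 * identical / len(pairs)


def template_is_usable(aligned_target: str, aligned_template: str,
                       min_identity: float = MIN_IDENTITY) -> bool:
    """True if the template clears the identity threshold."""
    return percent_identity(aligned_target, aligned_template) >= min_identity


if __name__ == "__main__":
    target, template = "MKV-LAGE", "MRVELAGD"
    print(round(percent_identity(target, template), 1))  # 71.4
    print(template_is_usable(target, template))          # True
```

In this toy alignment, 5 of the 7 gap-free columns match, so the template clears the threshold comfortably.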

The chances are, though, that the protein you're interested in has already been modeled. In 1998, Peitsch and his colleagues carried out the first really large-scale protein modeling experiment, which they dubbed the 3DCrunch. Using a Cray supercomputer provided by Silicon Graphics in Cortaillod, Switzerland, they submitted every protein sequence then in the SWISS-PROT database - over 200,000 of them - to the modeling procedure, resulting in more than 70,000 models. It took the 64-processor computer just under a working week to do the job. A single-processor machine would have needed more than a working year.
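The timing claim is simple arithmetic, provided the job split perfectly across the processors. The short Python sketch below makes that assumption explicit; the 5-day working week and the 250-day working year are round figures chosen for illustration, not numbers from the article.

```python
# Back-of-envelope check of the 3DCrunch timing, assuming ideal (linear)
# speedup: N processors finish in 1/N of the single-processor time.
PROCESSORS = 64
WORKING_DAYS_PER_WEEK = 5     # assumption
WORKING_DAYS_PER_YEAR = 250   # assumption: roughly 50 working weeks


def serial_days(parallel_days: float, processors: int) -> float:
    """Single-processor time implied by a parallel run with ideal speedup."""
    return parallel_days * processors


def break_even_parallel_days(processors: int) -> float:
    """Parallel run length at which the serial estimate equals one working year."""
    return WORKING_DAYS_PER_YEAR / processors


if __name__ == "__main__":
    estimate = serial_days(WORKING_DAYS_PER_WEEK, PROCESSORS)
    print(estimate)                                # 320
    print(estimate / WORKING_DAYS_PER_YEAR)        # 1.28
    print(break_even_parallel_days(PROCESSORS))    # 3.90625
```

So even if "just under a working week" means four days rather than five, ideal scaling still implies more than a working year on one processor (4 × 64 = 256 days). Because real programs rarely scale perfectly, the true single-processor time would be somewhat lower than this ideal-scaling estimate.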

3DCrunch is continually updated and refined.

"The 3DCrunch was a big thing," says Peitsch, "but since then we've been automatically modeling new sequences every month." For a thousand or more of the sequences in the original 3DCrunch, the structure was already known. That provided Peitsch and his colleagues with the means to check the accuracy of their models, and to look for ways to improve them. "I wrote the original SWISS-MODEL," he says, "but the others have rewritten it! We're spending much of our time developing new algorithms based on known structures and targeting sources of inaccuracy. In general the closer the template is to the target sequence, the more accurate the model will be. But we have had models based on pairs that were less than 50 percent identical, yet which still deviated from the true structure by an average of less than 1 angstrom."
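The article does not say exactly how deviation was measured, but a standard choice is the root-mean-square deviation (RMSD) between corresponding atoms, usually the alpha carbons, of the model and the experimental structure. The minimal Python sketch below assumes the two coordinate lists are already matched residue by residue and already superimposed; a real comparison would first find the best-fit rotation and translation, for example with the Kabsch algorithm.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]  # x, y, z in angstroms


def rmsd(model: Sequence[Coord], reference: Sequence[Coord]) -> float:
    """Root-mean-square deviation between matched, pre-superimposed atoms.

    Assumes model[i] and reference[i] are the same atom (e.g. the C-alpha of
    residue i) and that no further superposition is needed.
    """
    if len(model) != len(reference) or not model:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(
        sum((p - q) ** 2 for p, q in zip(a, b))  # squared distance per atom
        for a, b in zip(model, reference)
    )
    return math.sqrt(squared / len(model))


if __name__ == "__main__":
    # Toy C-alpha trace (about 3.8 A spacing) and a "model" displaced 0.5 A in y.
    reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
    model = [(x, y + 0.5, z) for x, y, z in reference]
    print(rmsd(model, reference))  # 0.5
```

Because deviations are squared before averaging, RMSD is dominated by the worst-fitting atoms.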

This spring, Peitsch is planning to start all over again. With the improved version of SWISS-MODEL, he wants to crunch every protein sequence again to improve the accuracy of the models in his collection. And to do it he'll be going back to the place where he started all this - the National Cancer Institute in Frederick, Maryland. A decade ago, as a postdoc in Jacob V. Maizel's Laboratory of Mathematical Biology there, he started working on protein structure modeling. Today known as the Laboratory of Experimental and Computational Biology, the lab has established an Advanced Biological Computing Center, which is the only public-domain supercomputing center in the world dedicated to biomedical research. Last year it took delivery of a 96-processor Cray SV1 supercomputer, to add to an already impressive display of serious hardware.

Will computers replace crystallographers?

Will this kind of computer power, added to the ever-growing knowledge base, eventually make X-ray crystallographers and NMR spectroscopists redundant? "They'll never be redundant - or not for a very long time," says Peitsch. "Experiments provide the ultimate proof, and always give new insights, especially if you need to assess complexes of proteins. But modeling can direct experiments - you can plan to address a certain number of proteins, so that you might do five experiments rather than 50. It also allows you to interpret the results of experiments in the absence of structural knowledge. For example, you could locate a mutation and assess its impact on the structure."

SWISS-MODEL is just one of a number of programs developed for protein structure prediction, though it's distinguished by its focus on accessibility and ease of use. But just how good are these programs? Since 1994, the fortune-tellers of protein structure have been putting their crystal balls to the test in a biennial series of competitions called the Critical Assessment of Techniques for Protein Structure Prediction (CASP), originally devised by John Moult of the Center for Advanced Research in Biotechnology at the University of Maryland. The organizers invite protein crystallographers and NMR spectroscopists to submit the sequences of proteins whose structures they expect to solve within a few months. These are posted on a Web server for groups around the world to model. The true structures are then revealed, and the exercise culminates in a four-day meeting at which the most successful modelers are invited to describe their methods. CASP 4 will be launched this spring, with open season on the target structures from April until August, and a conference to present the results in Asilomar, California, in December 2000.
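The blind-test logic can be illustrated with a small scoring sketch: once an experimental structure is released, each group's prediction is compared with it and the groups are ranked. The Python below uses a simplified score loosely modelled on GDT_TS, a measure used in CASP assessments: the percentage of residues lying within 1, 2, 4 and 8 angstroms of their true positions, averaged over the four cutoffs. Unlike the real measure, which searches for the superposition that maximizes each count, this version assumes the models are already superimposed; the group names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Coord = Tuple[float, float, float]
CUTOFFS = (1.0, 2.0, 4.0, 8.0)  # angstroms


def fraction_within(model: Sequence[Coord], reference: Sequence[Coord],
                    cutoff: float) -> float:
    """Fraction of residues whose model position lies within `cutoff` of the true one."""
    close = sum(1 for a, b in zip(model, reference) if math.dist(a, b) <= cutoff)
    return close / len(reference)


def simple_gdt(model: Sequence[Coord], reference: Sequence[Coord]) -> float:
    """Simplified GDT_TS-like score (0-100) for a pre-superimposed model."""
    if len(model) != len(reference) or not reference:
        raise ValueError("model and reference must be non-empty and the same length")
    total = sum(fraction_within(model, reference, c) for c in CUTOFFS)
    return 100.0 * total / len(CUTOFFS)


def rank_predictions(predictions: Dict[str, List[Coord]],
                     reference: Sequence[Coord]) -> List[Tuple[str, float]]:
    """Score every group's model against the revealed structure, best first."""
    scores = [(group, simple_gdt(model, reference))
              for group, model in predictions.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
    predictions = {
        # group_A: residues displaced by 0.5, 0.5, 1.5 and 3.0 A
        "group_A": [(x, y + d, z) for (x, y, z), d in zip(reference, (0.5, 0.5, 1.5, 3.0))],
        # group_B: every residue displaced by 5.0 A
        "group_B": [(x, y + 5.0, z) for x, y, z in reference],
    }
    for group, score in rank_predictions(predictions, reference):
        print(group, score)
    # group_A 81.25
    # group_B 25.0
```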

Full automation hasn't won yet.

Michael Sternberg heads the Biomolecular Modelling Laboratory of the Imperial Cancer Research Fund in London. Models he constructed with his colleague Paul Bates were among the more successful entries in CASP 3, and he also worked with Peitsch on the 3DCrunch.

"There are two possible approaches," says Sternberg. "Either you can concentrate on developing a fully automated procedure, or you can improve the results of the initial automated algorithms by manual intervention based on expert knowledge." The most successful groups at CASP 3 all used a certain amount of manual intervention to improve their models. But the disadvantage of this approach is that it can deal with only relatively small numbers of structures at a time - the CASP 3 participants worked on around a dozen.

"Fully automated procedures are essential for large-scale modeling," says Sternberg. In addition to the work of Peitsch's group, he points to the recent project by Roberto Sánchez and Andrej Šali at the Laboratory of Molecular Biophysics at the Rockefeller University, in which they attempted to model all 6,218 putative proteins predicted from the complete genome sequence of baker's yeast, Saccharomyces cerevisiae. Using software called MODELLER, originally developed by Šali and Tom Blundell at Birkbeck College in London, they managed to produce models for over 1,000 of the proteins. A quarter of these revealed family relationships with proteins of known structure for the first time. Only 40 of the yeast proteins had previously had their 3-D structure determined experimentally [1].
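The core of a fully automated, genome-scale run is a loop: search each sequence against a library of known structures, keep the best template that clears an identity threshold, and send only those sequences on for modeling. The Python sketch below assumes the database search has already produced a percent-identity table for each sequence; the sequence and template names and the 25 percent cutoff are illustrative assumptions, not the criteria MODELLER or the Rockefeller group actually used. The figures in the paragraph correspond to a coverage of a little over 16 percent (1,000 of 6,218 sequences).

```python
from typing import Dict, Optional, Tuple

MIN_IDENTITY = 25.0  # percent; an illustrative cutoff, not MODELLER's


def best_template(hits: Dict[str, float],
                  min_identity: float = MIN_IDENTITY) -> Optional[Tuple[str, float]]:
    """Return the (template, identity) pair with the highest identity, or None."""
    usable = [(name, pid) for name, pid in hits.items() if pid >= min_identity]
    return max(usable, key=lambda item: item[1]) if usable else None


def assign_templates(search_results: Dict[str, Dict[str, float]]
                     ) -> Tuple[Dict[str, Tuple[str, float]], float]:
    """Pick a template for every sequence that has one; report the fraction covered."""
    assignments: Dict[str, Tuple[str, float]] = {}
    for sequence_id, hits in search_results.items():
        choice = best_template(hits)
        if choice is not None:
            assignments[sequence_id] = choice  # this sequence would go on to modeling
    coverage = len(assignments) / len(search_results) if search_results else 0.0
    return assignments, coverage


if __name__ == "__main__":
    # Hypothetical search results: sequence -> {template: percent identity}.
    results = {
        "seq1": {"tmplA": 32.0, "tmplB": 41.5},
        "seq2": {"tmplC": 18.0},
        "seq3": {},
        "seq4": {"tmplD": 25.0},
    }
    chosen, coverage = assign_templates(results)
    print(chosen)    # {'seq1': ('tmplB', 41.5), 'seq4': ('tmplD', 25.0)}
    print(coverage)  # 0.5
```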

"Modeling provides a platform for creating ideas."

"Models derived from these large-scale projects are not guaranteed to be accurate," says Sternberg. "But you can still go a long way with them. You can look at the results, and say 'this enzyme in this pathogen might make a good drug target,' and go on to solve its structure experimentally."

Peitsch agrees that tools such as SWISS-MODEL are a means to an end, not an end in themselves. "Modeling provides a platform for creating ideas," he says.

Georgina Ferry is a scientific journalist based in Oxford, England.

Alexandria Heather-Vazquez is art director of HMS Beagle.


Endlinks

ExPASy Molecular Biology Server - offers a wide range of protein analysis tools, including SWISS-MODEL. From the Swiss Institute of Bioinformatics.

MODELLER Page at the Rockefeller University - provides FTP sites, an HTML manual, and other information about this modeling program.

Sisyphus and Protein Structure Prediction - an overview of methods and progress in structure solving. From the journal Bioinformatics.

Homology Modeling for Beginners - an online tutorial from the BIOcomputing Unit at EMBL. Offers a detailed 17-step or one-day introduction to these methods.

Bioinformatics Resource - offers conferences and workshops, a newsletter, and other resources.

Bioinformatics Links - an extensive listing of databases, online analyses, journals, and more.

Related HMS Beagle articles:

Bioinformatics Software Online - covers the wealth of resources available for sequence analysis.
Molecular Modeling - a compilation of Internet sites for biologists.
