April 22, 2025

Research in Context: Designing proteins

Harnessing nature’s nanotechnology

Proteins perform many different functions in biology. This special Research in Context feature explores advances in scientists’ efforts to custom design proteins that can perform unique functions beneficial to human health.

A researcher pointing to a molecular structure on a computer monitor. S. Singha / Shutterstock

Proteins are the workhorses of biochemistry. Nearly every function in every living organism involves proteins at some point. Proteins do it all—from enzymes that enable chemical reactions, to fibers that provide structural support for cells, to hormones that carry signals between different organs and the receptors that receive those signals. They are extremely versatile molecular tools.

Some scientists aim to harness that versatility to create new proteins that can perform specific tasks. “Proteins are the ultimate miniature machines,” says Dr. Brian Kuhlman, who studies protein design at the University of North Carolina School of Medicine. “Any biological process you can think about, proteins are involved with. And so, we want to be able to design proteins that interact with naturally occurring proteins and regulate their behavior, to create new therapeutics or probes for understanding biology. More ambitiously, we want to design completely new proteins that perform functions not observed in nature.”

Such bespoke proteins could serve many purposes. They might repair or remove the cause of a disease. They might make more effective vaccines. Or they could help to remove toxic substances from our bodies or our environment. Recent technological advances, particularly in artificial intelligence (AI), have made protein design increasingly feasible.

Predicting protein shapes

Illustration of a molecular structure emerging from a computer screen. ilyas / Adobe Stock

One way to design a new protein is to start with an existing protein and modify it to change its function. While this can work in many cases, it restricts the available options to ones that at least somewhat resemble existing proteins. Ideally, scientists would like to be able to design proteins from scratch, a process called de novo design. But doing this requires first solving a different problem: what controls a protein’s shape?

A protein starts off as a long chain of small units called amino acids. There are 20 different amino acids found in naturally occurring proteins, with varying structures and chemical properties. Once these amino acids are strung together, the chain can fold up into a specific three-dimensional shape that is necessary for its function. Lots of different shapes and lots of different functions are possible. How does a particular protein find the specific shape it needs out of the vast range of possibilities?

In the early 1960s, an NIH researcher, Dr. Christian Anfinsen, found that several unfolded proteins could refold into their proper shapes without any outside help. Thus, he concluded that a protein’s folded shape must be determined by its unique sequence of amino acids. For this discovery, Anfinsen received the 1972 Nobel Prize in Chemistry.

But how does the amino acid sequence determine the shape? If we know that, we should be able to predict what structure a given protein will adopt just by knowing its sequence.

In 1994, a group of researchers established a competition called the Critical Assessment of Protein Structure Prediction (CASP). Participants were given the amino acid sequences of proteins whose structures had been determined through painstaking laboratory work but not yet made public. They would try to predict the structures of these proteins, and the predictions would be judged against the known structures. Since then, NIH has begun supporting the competition, and it continues to be held every other year.

During the early years of the competition, significant improvements were seen from one competition to the next. By the second decade of the 21st century, though, the best accuracy seemed to have plateaued at below 40% for the most difficult category of proteins—those that weren’t related to others with known structures that could be used as a template.

Things changed with the 13th CASP in 2018, when AI-based protein structure prediction approaches, including one called AlphaFold, entered the fray. AlphaFold reached an accuracy of almost 60% for the most difficult category of proteins by leveraging AI, a large database of experimentally determined protein structures, and insights gained from decades of computational protein structure prediction. NIH funding was essential for laying the groundwork for the success of AlphaFold. Two years later, an improved model, AlphaFold2, reached almost 90% accuracy in the 14th CASP. A prediction with 90% accuracy is considered as good as a structure determined through experimentation.
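To give a feel for how a predicted structure can be judged against a known one, the sketch below computes a simplified score in the spirit of the GDT_TS measure used in CASP assessments: the fraction of residues whose predicted position falls within 1, 2, 4, and 8 angstroms of the experimental position, averaged over the four cutoffs. This is an illustration under stated assumptions, not the official scoring code. It assumes the two structures are already superimposed (the real measure searches over many superpositions), and the function name and coordinates are invented for the example.

```python
import math

def simplified_gdt(predicted, experimental, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Return a 0-100 score: the average, over the distance cutoffs (in angstroms),
    of the fraction of residues predicted within that cutoff.

    Assumes both structures are already superimposed and list one (x, y, z)
    coordinate per residue, in the same order.
    """
    if not predicted or len(predicted) != len(experimental):
        raise ValueError("structures must be non-empty and the same length")
    distances = [math.dist(p, e) for p, e in zip(predicted, experimental)]
    fractions = [
        sum(d <= cutoff for d in distances) / len(distances)
        for cutoff in cutoffs
    ]
    return 100.0 * sum(fractions) / len(cutoffs)

# Made-up coordinates for a four-residue chain (not real protein data).
experimental = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
predicted = [(0.5, 0.0, 0.0), (3.8, 1.5, 0.0), (7.6, 0.0, 3.0), (11.4, 9.0, 0.0)]

score = simplified_gdt(predicted, experimental)
print(f"{score:.2f}")  # prints 56.25
```

In this toy case the four residues are off by 0.5, 1.5, 3.0, and 9.0 angstroms, so one, two, three, and three of them pass the successive cutoffs. A perfect prediction would score 100.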

AI may have solved the basic challenge of predicting protein structure from sequence, but that doesn’t mean that CASP has outlived its usefulness. “People say, ‘okay, great, this is solved, so we can move on,’” says Dr. Krzysztof Fidelis, one of the founders and organizers of CASP. “I think that's a bit naïve. If our eventual goal is to understand biology, then solving the folding problem helps, but it’s just a small step.” There are still biologically important structures, he explains, that we can’t yet model accurately, including complexes of proteins with other proteins, RNA, or DNA. As researchers develop ways to model ever larger and more complex systems, the need for ways to objectively assess how well they’re doing will continue to grow.

From prediction to design

A complex protein (membrane-associated guanylate kinase, WW and PDZ domain-containing protein 1) as predicted by AlphaFold. JeanMarc / Adobe Stock

Designing proteins is a major challenge closely related to the structure prediction problem. Instead of asking what structure a given amino acid sequence will adopt, you ask what amino acid sequence could adopt the structure you want.
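One reason the design question is hard is the sheer number of candidate sequences. With 20 amino acids available at each position, a chain of N amino acids has 20^N possible sequences. The short calculation below makes that concrete. The chain lengths are arbitrary examples, not figures from the studies described here.

```python
NUM_AMINO_ACIDS = 20  # amino acids found in naturally occurring proteins

def count_sequences(chain_length):
    """Number of distinct amino acid sequences of the given length."""
    return NUM_AMINO_ACIDS ** chain_length

for n in (10, 50, 100):
    total = count_sequences(n)
    # len(str(total)) - 1 is the power of ten of the leading digit.
    print(f"{n} residues: about 10^{len(str(total)) - 1} sequences")
```

Even a modest 100-residue chain has roughly 10^130 possible sequences, far too many to test one by one, which is why design software has to search this space selectively.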

Kuhlman contributed to a breakthrough in de novo protein design when he was a postdoctoral fellow in Dr. David Baker’s lab at the University of Washington. First, Baker, Kuhlman, and their colleagues came up with a protein structure unlike any known to exist in nature. They then developed sequence design software to find an amino acid sequence that would fold to that structure. The resulting sequence didn’t resemble any naturally occurring proteins. When the researchers made the protein, called Top7, and determined its structure experimentally, it had exactly the structure they had designed.

Since then, Baker’s group has gone on to design a wide variety of protein types. These include proteins that assemble to form nanostructures, protein sensors for detecting fentanyl, small proteins that can inhibit SARS-CoV-2 infection, and nanoparticles that display influenza virus proteins to serve as a vaccine. In recognition of his accomplishments, Baker, along with two developers of AlphaFold and AlphaFold2, received the 2024 Nobel Prize in Chemistry.

It turns out that, just as AI made a huge difference in protein structure prediction, it’s also incredibly useful for protein design. Inspired by AlphaFold, Baker and others have incorporated AI into their suite of protein design tools in order to find the ideal sequence for a specific structure.

“What’s really cool,” Kuhlman explains, “is that you can then use those two types of models, one that goes from sequence to structure and one that goes from structure to sequence, in tandem and go back and forth between the two to, on the computer, evolve sequences for the particular function or structure that you want.”
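The back-and-forth between the two kinds of models can be pictured as a simple loop: propose a sequence, check how well it is predicted to match the goal, keep it if it is no worse, and repeat. The sketch below is a toy and not any lab's actual pipeline. Both functions are hypothetical placeholders: `propose_sequence` stands in for a structure-to-sequence model but simply makes a random single-residue change, and `predict_structure_score` stands in for a sequence-to-structure model but simply compares the sequence to a hidden target string.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # one-letter codes for the 20 standard amino acids

# Hypothetical stand-in for "the structure you want": in this toy, a sequence
# scores higher the more positions it shares with this hidden target string.
TARGET = "MKTAYIAKQR"

def predict_structure_score(sequence):
    """Toy sequence-to-structure step: fraction of positions matching TARGET."""
    return sum(a == b for a, b in zip(sequence, TARGET)) / len(TARGET)

def propose_sequence(sequence, rng):
    """Toy structure-to-sequence step: return a variant with one position redrawn at random."""
    position = rng.randrange(len(sequence))
    residue = rng.choice(AMINO_ACIDS)
    return sequence[:position] + residue + sequence[position + 1:]

def evolve(start, rounds, seed=0):
    """Alternate between proposing and scoring, keeping candidates that score no worse."""
    rng = random.Random(seed)
    best, best_score = start, predict_structure_score(start)
    for _ in range(rounds):
        candidate = propose_sequence(best, rng)
        score = predict_structure_score(candidate)
        if score >= best_score:
            best, best_score = candidate, score
    return best, best_score

start = "A" * len(TARGET)
designed, final_score = evolve(start, rounds=2000)
print(start, "->", designed, f"(score {final_score:.2f})")
```

Because a candidate is kept only when its score does not drop, the score can never get worse from one round to the next. Real design tools replace both placeholders with trained AI models and far richer scoring.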

Other researchers, like Dr. William DeGrado of the University of California, San Francisco, have continued to expand the possibilities of protein design. Getting a protein to adopt the shape you want is an important step, but it’s not the whole story. The protein also needs to be able to perform the function you want. Many proteins, such as sensors, receptors, and enzymes, have to be able to recognize and bind to a specific smaller molecule, or ligand. This requires that the protein have a cavity that the ligand can fit into and interact with, like a key in a lock.

But you can’t necessarily design a protein to fold into a particular shape first and then build a binding cavity into it afterwards. You need to find a protein sequence that simultaneously folds to the shape you want and creates the necessary binding cavity. Previous attempts to do so have typically yielded only weak binding in the initial design. Repeated rounds of tinkering and experimentation may then be needed.

DeGrado and his colleague Nicholas Polizzi developed a strategy for finding an amino acid sequence that simultaneously optimizes folding and ligand binding. To test the effectiveness of their strategy, they designed a protein from scratch that binds to the blood-thinning drug apixaban. The resulting protein bore no structural resemblance to apixaban’s natural binding target. Instead, it adopted a structure that doesn’t usually bind to small molecules. And like Top7, its amino acid sequence didn’t resemble that of any naturally occurring protein. Yet when the researchers made the protein, it bound tightly to apixaban without any further tinkering.

Designing proteins that can bind small molecules could have many potential applications. “We're trying to design proteins that'll bind small molecule toxic drugs,” DeGrado says. “They can sop up the drugs and then be cleared. Or you could imagine binding to them and then having a second domain (another part of the protein) that will bind to something, say, on a cancer cell, and that brings it into the cell.”

Researchers might also design enzymes to break down compounds that don’t break down naturally. Examples could include microplastics or per- and polyfluoroalkyl substances (PFAS), sometimes called “forever chemicals.”

“Natural proteins have evolved to work on natural substrates, and now we have all these entirely new substances,” DeGrado says. “We may only be able to go so far in terms of engineering existing proteins.” De novo protein design could get around this natural limitation.

Proteins to improve health

3D illustrations of protein structures. Christoph Burgstedt / Shutterstock

Kuhlman also continues to work on designing new proteins. One of his areas of research involves small proteins that can act as drugs, binding other proteins to activate or inhibit them. Many drugs are small molecules, with a molecular mass of less than 1,000 daltons (a dalton is approximately the mass of one hydrogen atom). By contrast, even relatively small proteins typically have masses in the thousands of daltons. Not every potential drug target has a deep cavity suitable for small-molecule binding. Proteins, however, can be designed to bind to a wide variety of molecular surfaces.
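A back-of-the-envelope estimate shows why even small proteins land in the thousands of daltons. The sketch below assumes an average mass of about 110 daltons per amino acid in a chain, a common rule of thumb that does not come from this article. The chain lengths are illustrative, and real masses depend on the exact sequence.

```python
AVERAGE_RESIDUE_MASS_DA = 110   # rule-of-thumb average per amino acid; an assumption
SMALL_MOLECULE_LIMIT_DA = 1000  # upper bound for small-molecule drugs quoted in the text

def estimate_protein_mass(num_residues):
    """Rough mass in daltons of a protein with the given number of amino acids."""
    return num_residues * AVERAGE_RESIDUE_MASS_DA

for n in (40, 100, 300):
    mass = estimate_protein_mass(n)
    ratio = mass / SMALL_MOLECULE_LIMIT_DA
    print(f"{n} amino acids: ~{mass:,} Da ({ratio:.1f}x the small-molecule limit)")
```

Under this assumption, a 40-amino-acid miniprotein already weighs in at roughly 4,400 daltons, several times the size of a typical small-molecule drug.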

One protein Kuhlman’s lab has designed blocks the function of another protein called PD-L1. PD-L1 suppresses immune activity, particularly in tumors. Blocking PD-L1 could thus boost the immune system’s ability to kill tumors. The problem is that if you block PD-L1 indiscriminately, it enhances immune activity throughout the body, leading to harmful side effects.

To get around this problem, Kuhlman and colleagues used the fact that tumors are enriched in enzymes called proteases, which cut up proteins. Using AI-based tools, they designed an “autoinhibitory domain” and attached it to a protein that blocks PD-L1, called HA-PD1. Autoinhibitory means it inhibits itself: the domain blocks the part of HA-PD1 that binds to PD-L1 and prevents binding. But when HA-PD1 gets into a tumor, the tumor proteases cut off the autoinhibitory domain, allowing HA-PD1 to bind and block PD-L1, thereby enhancing tumor killing.

Kuhlman has also used protein design to help develop a vaccine for dengue virus, a potentially life-threatening virus spread by mosquitoes. The virus surface is coated with a protein called the E protein. Antibodies could potentially recognize and bind the E protein to neutralize the virus. But previous vaccines based on the E protein have performed poorly in clinical trials. It turns out that neutralizing antibodies need to recognize pairs of E proteins, whereas in vaccines the E proteins are all separate from each other. Using protein design tools, Kuhlman and colleagues found that by changing a small number of amino acids, they could produce an E protein that spontaneously forms pairs. Mice immunized with these E protein pairs produced lots of neutralizing antibodies. Kuhlman’s team is currently testing their designed vaccine in monkeys. If it succeeds, they hope to develop it for use in people.

“Currently, there’s a revolution in protein design because of the new AI-based tools,” Kuhlman says. “So, we’re able now to start thinking about and working on problems that we couldn’t have worked on five years ago. We’re getting closer and closer to designing proteins that are as sophisticated as naturally occurring proteins.”

—by Brian Doctrow, Ph.D.

References

Principles that govern the folding of protein chains. Anfinsen CB. Science. 1973 Jul 20;181(4096):223-30. doi: 10.1126/science.181.4096.223. PMID: 4124164.

Design of a novel globular protein fold with atomic-level accuracy. Kuhlman B, Dantas G, Ireton GC, Varani G, Stoddard BL, Baker D. Science. 2003 Nov 21;302(5649):1364-8. doi: 10.1126/science.1089427. PMID: 14631033.

Critical assessment of methods of protein structure prediction (CASP)-Round XIV. Kryshtafovych A, Schwede T, Topf M, Fidelis K, Moult J. Proteins. 2021 Dec;89(12):1607-1617. doi: 10.1002/prot.26237. Epub 2021 Oct 7. PMID: 34533838.

De novo design of picomolar SARS-CoV-2 miniprotein inhibitors. Cao L, Goreshnik I, Coventry B, Case JB, Miller L, Kozodoy L, Chen RE, Carter L, Walls AC, Park YJ, Strauch EM, Stewart L, Diamond MS, Veesler D, Baker D. Science. 2020 Oct 23;370(6515):426-431. doi: 10.1126/science.abd9909. Epub 2020 Sep 9. PMID: 32907861.

Multivalent designed proteins neutralize SARS-CoV-2 variants of concern and confer protection against infection in mice. Hunt AC, Case JB, Park YJ, Cao L, Wu K, Walls AC, Liu Z, Bowen JE, Yeh HW, Saini S, Helms L, Zhao YT, Hsiang TY, Starr TN, Goreshnik I, Kozodoy L, Carter L, Ravichandran R, Green LB, Matochko WL, Thomson CA, Vögeli B, Krüger A, VanBlargan LA, Chen RE, Ying B, Bailey AL, Kafai NM, Boyken SE, Ljubetič A, Edman N, Ueda G, Chow CM, Johnson M, Addetia A, Navarro MJ, Panpradist N, Gale M Jr, Freedman BS, Bloom JD, Ruohola-Baker H, Whelan SPJ, Stewart L, Diamond MS, Veesler D, Jewett MC, Baker D. Sci Transl Med. 2022 May 25;14(646):eabn1252. doi: 10.1126/scitranslmed.abn1252. Epub 2022 May 25. PMID: 35412328.

Quadrivalent influenza nanoparticle vaccines induce broad protection. Boyoglu-Barnum S, Ellis D, Gillespie RA, Hutchinson GB, Park YJ, Moin SM, Acton OJ, Ravichandran R, Murphy M, Pettie D, Matheson N, Carter L, Creanga A, Watson MJ, Kephart S, Ataca S, Vaile JR, Ueda G, Crank MC, Stewart L, Lee KK, Guttman M, Baker D, Mascola JR, Veesler D, Graham BS, King NP, Kanekiyo M. Nature. 2021 Apr;592(7855):623-628. doi: 10.1038/s41586-021-03365-x. Epub 2021 Mar 24. PMID: 33762730.

A defined structural unit enables de novo design of small-molecule-binding proteins. Polizzi NF, DeGrado WF. Science. 2020 Sep 4;369(6508):1227-1233. doi: 10.1126/science.abb8330. PMID: 32883865.

In silico evolution of autoinhibitory domains for a PD-L1 antagonist using deep learning models. Goudy OJ, Nallathambi A, Kinjo T, Randolph NZ, Kuhlman B. Proc Natl Acad Sci U S A. 2023 Dec 5;120(49):e2307371120. doi: 10.1073/pnas.2307371120. Epub 2023 Nov 30. PMID: 38032933.

Designed, highly expressing, thermostable dengue virus 2 envelope protein dimers elicit quaternary epitope antibodies. Kudlacek ST, Metz S, Thiono D, Payne AM, Phan TTN, Tian S, Forsberg LJ, Maguire J, Seim I, Zhang S, Tripathy A, Harrison J, Nicely NI, Soman S, McCracken MK, Gromowski GD, Jarman RG, Premkumar L, de Silva AM, Kuhlman B. Sci Adv. 2021 Oct 15;7(42):eabg4084. doi: 10.1126/sciadv.abg4084. Epub 2021 Oct 15. PMID: 34652943.