“This is an excellent example of how great researchers take advantage of our user facilities and how federal investment in capabilities and expertise makes both broad and deep contributions to scientific discovery,” said Lab Director Mike Witherell.

David Baker, a biochemist at the University of Washington, was named a recipient of the 2024 Nobel Prize in Chemistry on October 9 for his work in computational protein design. He shares the prize with Demis Hassabis and John Jumper of Google DeepMind, who were recognized for their work in protein structure prediction. Baker and his groundbreaking research have many connections to Berkeley Lab: he has collaborated with Lab researchers and used all five Berkeley Lab user facilities. Here’s a summary of some of the ways Baker has worked with Berkeley Lab scientists and capabilities:

Advanced Light Source (ALS): Baker has used the ALS for more than two decades, drawing on its high-throughput small-angle X-ray scattering and protein crystallography capabilities in his work designing novel proteins. The ALS’s crystallography beamlines, particularly through the Integrated Diffraction Analysis Technologies program, enabled rapid validation and refinement of protein structures. These tools, and the collaboration between the ALS and Baker’s group, were critical for confirming the group’s breakthroughs in computational de novo protein design and for advancing both experimental and computational protein science. Baker has published 78 papers on protein structure that used ALS beamlines. He has been a regular user of two sets of beamlines run by the Biosciences Area, the Berkeley Center for Structural Biology beamlines and the Structurally Integrated Biology for Life Sciences (SIBYLS) beamline, where his group’s predicted protein structures were verified.

NERSC: Since first accessing NERSC systems in 2021, Baker’s team has used 1.5 million GPU hours on NERSC’s Perlmutter system and published at least eight major papers acknowledging NERSC, on topics ranging from modeling the protein-protein interactions that are key to biological processes to illuminating microbial “dark matter” through metagenomics. Baker and his team were among the first to use Perlmutter’s GPUs when they were installed in 2021. In June 2024, Baker described his work, and the ways NERSC has enabled it, in a seminar that was part of the NERSC@50 series celebrating NERSC’s 50th anniversary.

Joint Genome Institute (JGI): Baker has worked with the JGI since 2013 and led two proposals in 2017 focused on structural models for protein families. Through the JGI’s Community Science Program, Baker’s lab generated structural models for 614 protein families, 12 percent of the families that previously had no structural information available. The work was published in Science in 2017. A follow-up paper on functional dark matter appeared in Nature last year. Building on that proposal, Baker’s group took advantage of the FICUS collaborative science program in a call that harnessed the resources of both the JGI and NERSC. His team mined raw and annotated genome sequences in the IMG/M database to find additional homologs within protein families, which can then be used to develop computational methods. These methods could help build accurate models of how the proteins fold, providing testable clues to their potential functions.

Biosciences Area: While some members of the Biosciences Area have worked with Baker through the JGI and the Biosciences-run beamlines at the ALS, Paul Adams has collaborated with him on the computational side of his work. Adams, Associate Laboratory Director for Biosciences, leads a multi-institutional program that develops the Phenix software, which biologists around the world use to solve macromolecular structures. More than twelve years ago, Adams teamed up with Baker to explore how the highly accurate force field from Rosetta, a computational platform Baker developed to predict and design protein structures, could be integrated with Phenix. Together they developed automated tools that combine Phenix’s crystallographic algorithms with the Rosetta potential to improve the crystallographic refinement of structures from low-resolution data, resulting in dramatically better models. Most recently, Adams and other Biosciences researchers helped validate RoseTTAFold, software that uses AI to accurately predict protein structures, by using it to solve previously unsolvable structures.

Molecular Foundry: At the Molecular Foundry, Baker has worked with staff scientist Bruce Cohen to design protein cages that can encapsulate the nanocrystals developed at the Foundry. “David’s group has designed hollow protein cages, and our goal has been to see if we can get them to form around our nanoparticles. This has been a big challenge, since we have to get them to form around the nanoparticles before they assemble on their own,” said Cohen. The groups have gone back and forth on a number of protein designs for specific nanoparticles. “It’s amazing that they can design most whatever they want and it actually works most of the time.”

Energy Sciences Network (ESnet): Known as the Department of Energy’s (DOE’s) “data circulatory system,” ESnet is a high-performance, high-speed network that connects all of DOE’s national labs, user facilities, and large-scale scientific instruments to one another and to research and education network partners across the United States and around the globe. ESnet enabled Baker to quickly and efficiently move terabytes of his data from user facilities at Berkeley Lab, SLAC, Argonne National Lab, and Brookhaven National Lab to NERSC, to the Argonne Leadership Computing Facility, and to his home laboratory.