Feeding the Wulf: Biowulf Turns 25
NIH’s Supercomputer To Enter a New Era of Support for AI and Other Complex Demands
BY THE NIH CATALYST STAFF
Biowulf, NIH’s premier high-performance computing (HPC) system, turned 25 this year. Yet unlike that Dell 486 you might have invested in during the 1990s (32 megs of RAM; yeah, that oughta be plenty), Biowulf is faster and has more capacity than ever before.
Indeed, Biowulf is the most powerful HPC system in the United States dedicated to biomedical research. The NIH Center for Information Technology (CIT), which maintains Biowulf, hopes to further enhance its capacity to accommodate a rapidly evolving landscape of artificial intelligence (AI) and machine learning (ML) demands, according to Steve Bailey, director of CIT’s High Performance Computing Core Facility.
Today, Bailey said, Biowulf serves nearly 2,500 active users, including almost three quarters of the PIs in the NIH IRP. The majority of the usage is for genomics, followed by structural biology and imaging. Adam Phillippy, senior investigator and director of the NHGRI Center for Genomics and Data Science Research, is one such Biowulf user. His lab taps upward of 30 million central processing unit (CPU) hours per year.
With Biowulf, Phillippy and his colleagues were able to fully complete the sequence of a human genome in 2022, correcting errors introduced during the initial mapping, circa 1990–2003, and revealing the final unmapped 200 million bases in the 3-billion-base human DNA. Additionally, the international Telomere-to-Telomere Consortium behind the complete assembly used Biowulf’s file-sharing service as a centralized location to store and share data within the consortium as they worked together on the downstream analyses of the complete genome, Phillippy said.
“[Biowulf] was one of the draws that helped recruit me to the NIH in 2015,” said Phillippy. “Because we were such a compute-intensive group, I could only go somewhere that could support the needs of my group for the genome assembly and sequencing work. When I do my quadrennial reviews, and they have to ask ‘Why the NIH Intramural Research Program?’ I hold up Biowulf as a resource that’s unique to here that I don’t have access to elsewhere.”
Daniel Pine, chief of the NIMH Section on Development and Affective Neuroscience, is another moderately heavy user who now sees Biowulf as essential to his research, having turned to the system in 2015, some 15 years after joining the NIH.
A clinician-scientist, Pine specializes in brain function and psychopathology, such as stress, anxiety, and emotional problems in children and adolescents. His lab often uses more than a million CPU hours per month on Biowulf to analyze brain magnetic resonance imaging scans. His team can perform a multitude of individual computations on any given image across tens of thousands of microregions in the brain—capturing thickness, surface area, curvature, gyrification (brain folding), and other features—and compare that in parallel with thousands of other brain images.
“To be one of the leading groups in the world [on brain structure and function], you absolutely need access to high-performance computing resources,” Pine said. “Ultimately what we want to be able to do is to get information from the brain [scan] that we can’t get clinically to understand how we predict what’s going to happen to any one child…to better understand treatments.”
“For what my group does, which is fairly clinically focused, Biowulf is wonderful,” Pine added.
So, what’s under the hood?
Biowulf is a cluster of more than 3,000 computer nodes located in the belly of Building 12. Like books on a bookshelf, each node—sort of a computer unto itself, with a motherboard, memory, and processors—sits on a computer rack.
A typical node has between 32 and 192 processors. A CPU hour refers to one hour on one processor. If you do the math and multiply all those processors and nodes (we didn’t; they wouldn’t grant us access to Biowulf), you’ll understand how a single user can use millions of CPU hours over the course of only a few days.
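For readers who do want to do the math, here is a minimal Python sketch of the CPU-hour arithmetic. It uses only figures quoted in this article (192 processors on the largest typical node, 100,000 CPUs cluster-wide, 30 million CPU hours per year for one lab). The function names are illustrative, and the three-day, whole-cluster scenario is a hypothetical upper bound rather than a description of any real job.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 wall-clock hours in a non-leap year


def cpu_hours(processors: int, wall_hours: float) -> float:
    """One CPU hour is one processor running for one hour."""
    return processors * wall_hours


def average_processors(total_cpu_hours: float, wall_hours: float) -> float:
    """Average number of processors kept busy to reach a CPU-hour total."""
    return total_cpu_hours / wall_hours


# A single 192-processor node running for one day.
node_day = cpu_hours(192, 24)

# Hypothetical upper bound: all 100,000 CPUs busy for three days.
cluster_three_days = cpu_hours(100_000, 3 * 24)

# 30 million CPU hours per year, expressed as processors busy around the clock.
lab_average = average_processors(30_000_000, HOURS_PER_YEAR)

if __name__ == "__main__":
    print(f"One node, one day: {node_day:,.0f} CPU hours")
    print(f"Whole cluster, three days: {cluster_three_days:,.0f} CPU hours")
    print(f"Processors busy all year for 30M CPU hours: {lab_average:,.0f}")
```

Seen this way, a yearly total in the tens of millions works out to a few thousand processors running nonstop, a small slice of the cluster.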
Biowulf comprises 100,000 CPUs and 1,000 GPUs. A CPU, the processor just described, is a central processing unit. It has been the mainstay of computing for decades and is likely what powers your laptop computer. A GPU, short for graphics processing unit, was developed more recently, primarily for computer gaming.
A GPU isn’t necessarily better than a CPU, but adding more GPUs to Biowulf is one direction that CIT will pursue to increase speed and capacity, according to David Hoover, a computational biologist and chief scientist of Biowulf since 2003.
CPUs are ideal for complex, serial tasks such as protein motion simulations in which each new step in an intricate calculation depends heavily on the previous step, Hoover said. GPUs are ideal for less complex yet parallel tasks, such as identifying variants of gene sequences in very large datasets.
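The distinction Hoover draws can be illustrated with a toy Python sketch. This is an analogy that runs on an ordinary CPU, not GPU code and not anything taken from Biowulf: `simulate`, `mismatches`, and `scan_reads` are hypothetical names, the "simulation" is a simple spring model, and the "variant" check just compares short strings against a reference. The point is the shape of the work: the first loop cannot start step two until step one is done, while the second task splits into pieces that never need to talk to each other.

```python
from concurrent.futures import ThreadPoolExecutor


def simulate(position: float, velocity: float, steps: int, dt: float = 0.1) -> float:
    """Serial work: each step needs the values produced by the previous step."""
    for _ in range(steps):
        velocity -= position * dt  # toy spring force pulling toward zero
        position += velocity * dt
    return position


def mismatches(read: str, reference: str) -> list[int]:
    """Positions where one read differs from the reference sequence."""
    return [i for i, (a, b) in enumerate(zip(read, reference)) if a != b]


def scan_reads(reads: list[str], reference: str) -> list[list[int]]:
    """Parallel-friendly work: every read is checked independently of the others."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda read: mismatches(read, reference), reads))


if __name__ == "__main__":
    print(simulate(position=1.0, velocity=0.0, steps=100))
    print(scan_reads(["ACGT", "ACCT", "TCGT"], reference="ACGT"))
```

Adding more workers can speed up `scan_reads` because the reads are independent; no number of extra workers shortens the chain of dependent steps inside `simulate`.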
Perhaps more significant for Biowulf’s growth, Hoover said, GPUs are needed for the development of AI and ML programs. But there are limitations. First, there’s the physical size to consider: a CPU is about the size of a tea bag; a GPU is about the size of a tea box. Then there’s the cost. A GPU is about 10 to 100 times as expensive as a CPU. And, Hoover said, the highest-quality GPUs are in short supply: with the rise of AI, demand has been so high over the past two years that large commercial users are snatching them up.
Another option that CIT is considering to increase Biowulf’s capacity is to offload some jobs to the cloud, a lofty term to describe very terrestrial computer servers, some of them as close as northern Virginia. Doing so comes at a high cost for such a commercial service.
Similarly, CIT has tapped into the Texas Advanced Computing Center at the University of Texas at Austin, funded by the National Science Foundation. Although the cost might be lower than that of commercial cloud services, the queue might be long, Hoover said. And neither is a substitute for the Biowulf team’s hands-on service and familiarity with NIH research, including computational biology.
“Hardware is one thing. But people-resource? That’s something we have in spades,” Hoover said.
Solid infrastructure for exciting, uncertain future
Physical size is not a significant limiting factor for Biowulf; the system is more limited by power and cooling available in the data center, as well as funding for upgrades, according to Benjamin “Tim” Miller, the technical lead for NIH HPC responsible for architecture and technical operations—that is, the guy who gets called at 2 a.m. if Biowulf is down.
“We’re very fortunate that our data center facilities team at CIT, as well as the Office of Research Facilities, have been very accommodating of our needs,” Miller said. “Within the last several years, the chilled water pumps in the data center, which we rely on to cool Biowulf, underwent a major upgrade. Likewise, these groups have been taking good care of the power systems that provide a lot of electricity to the cluster. We just completed a major upgrade of our networking that will allow us to eventually increase our bandwidth to the NIH network backbone.”
Miller said he’s very excited to incorporate some DPUs, or data processing units, which are ideal for storage, networking, and security operations, essentially offloading these tasks from the CPUs in the system.
To flesh out this alphabet soup, what is not in Biowulf’s immediate future, Miller said, are TPUs and QPUs. The TPU, a tensor processing unit, is a proprietary processor developed by Google for neural network machine learning and AI; a QPU is a quantum processing unit, still as elusive as quantum stability itself.
One more exciting project for the Biowulf team is “HPC on demand,” now in pilot mode. While access to Biowulf is rather simple via a laptop and web interface (you enter your request, and you monitor your job), HPC on demand allows users to work directly on Biowulf in real time for relatively small tasks.
“Users dig it,” said Hoover. “It really is just dynamite.”
And such is the explosive outlook for Biowulf as it logs billions of computing hours well into the future.
Don’t miss the final installment of the Biowulf 25th Anniversary Seminar Series on November 14, 11 a.m. to noon in Building 10 in Room 7. The event, as well as the past three presentations, will be available on NIH VideoCast.
This page was last updated on Tuesday, November 5, 2024