The Training Page/News Everyone Can Use

Lasso Your Data

Consider Adding Data-Science Skills to Your Biologist’s Toolbox

Think back to when you still had a basic cell phone. You could make calls, you could text, you could play some games. It got the job done. When you got your first smartphone, its capabilities probably seemed endless. How could you possibly go back to your “dumb” phone now?

For day-to-day data organization and analysis, most of us are probably quite comfortable with Excel. But biology’s complexity is increasingly reflected in large, complex datasets, so computational analyses that require coding skills are becoming the norm. Hopefully, we biologists will soon look back on Excel the way we now look back on our old cell phones.

Learning to code sounds like a daunting task to many of us. Our excuses for not learning a programming language often resemble our justifications for not learning a foreign language: “I get by without it,” “I don’t have time,” or “I’m not good at those kinds of things.” But enhancing your data-science skills (machine learning, data visualization, and especially coding) can be your ticket to better personal and professional opportunities. The ability to work with model organism databases, structural data, clinically relevant variation data, omics data, or any other publicly available set of “big data” (a virtual treasure trove) can help biomedical scientists answer important research questions. Although most graduate programs in the basic biological sciences do not require coding classes, the demand for data-science training has not gone unnoticed at the NIH.

“All [intramural] trainees have quite a few resources at their disposal if they want to start to tip-toe into the computational space,” said Andy Baxevanis, head of the National Human Genome Research Institute’s Computational Genomics Unit. “Classes range from the hands-on training available through the NIH Library and Foundation for Advanced Education in the Sciences to the ‘Current Topics in Genome Analysis’ series.” (For the latter, check out those lectures online.)

Online coding classes are also growing in popularity, and many of them are free or reasonably priced for the quality of their content.

NIH scientific interest groups (SIGs) and LISTSERV electronic mailing lists are also valuable resources open to everyone. “One of the best things about the Bioinformatics SIG is that people can post specific questions on the LISTSERV. Questions are usually answered within the hour,” said Ben Busby, genomics outreach coordinator at the National Center for Biotechnology Information.

Efforts are underway to set up a similar forum for general data science. Busby and colleagues are also establishing an NIH-wide data-science mentorship program to facilitate one-on-one mentorship and training. Several NIH institutes and centers have their own bioinformatics cores that offer similar training and mentoring opportunities.

For example, Supriyo De, staff scientist at the National Institute on Aging, and colleagues recently launched the Biomedical Data Science Initiative, which offers seminar-type overview training, hands-on training for smaller focus groups, and integrated one-on-one teaching for fellows analyzing data from their own projects.

All trainees can also reach out to Intramural Research Program faculty members who use computational approaches in their own work. “Many of us very much enjoy mentoring,” said Baxevanis. “People should feel free to reach out when they need some advice and guidance regarding their projects.”

So should all trainees learn a coding language? “Familiarity with a command-line interface such as Linux is a basic literacy skill in any technical field because a large and increasing proportion of modern data and tools are intractable with GUI [graphical user interface] tools,” explained Busby. (For the coding-illiterate folks out there, a GUI is the graphical, rather than purely text-based, display that lets us interact with a computer in a user-friendly way.)

Whether you should learn to code may depend on the types of data you want to analyze and on your overall career trajectory. “It certainly is advantageous to have these skill sets to facilitate data analysis,” said Baxevanis. “But in many instances, what is more important is being able to use the analysis resources that are out there in an intelligent fashion, taking the time to understand what these predominantly Web-based resources can do and how they do it—and that they should never treat these resources as a ‘black box’! The same way it’s important to understand the underpinnings of any laboratory-based method, the same applies for all things computational.”

Resources for Learning Coding and Other Skills

Free courses and support at NIH

Paid courses hosted nearby

Online (both free and paid)