The path from scientist to data scientist

A natural transition

The transition from scientist to data scientist is achievable. As a scientist you’ve been trained to dive into the murky waters of real-world data and emerge with meaningful insights. You’re an expert at assessing causality (probably more expert than many data scientists) and won’t be fooled by spurious correlations. You understand experimental design and are an excellent communicator of highly technical content. What’s missing? A bit of math and coding.

In reality, data scientists are a pretty heterogeneous group without a common origin, training path, or skill set. What they share is competent coding, a mathematical inclination, and a love of data. If you add a bit of coding and some basic data science understanding to your scientific training, you could honestly market yourself as a data scientist. Furthermore, you could be the most valuable type of data scientist: a unicorn with deep subject matter expertise and solid data science fundamentals. It will take some study and grit, but you can do it if you want it badly enough.

For a scientist trying to learn data science, the place to start is learning to code in Python.

Why data science and analytics

Everyone’s why will be unique and multi-faceted. Here are some considerations. See if any of them resonate with you.

Abundant jobs

You’ll find more job listings for data scientist roles than you will for scientist roles. When you narrow your search to your subfield of plasma physics or population genetics, the disparity is further magnified. Data scientists are in high demand. Judging by CEOs’ comments from pharma to the automotive and tech industries, this trend isn’t changing any time soon.

High pay

In accord with their high demand, data scientists are very well paid. When you’re perusing salary listings online, remember that a lot of data scientists don’t have advanced degrees. Having one won’t necessarily make you a better data scientist than them, so check your ego. However, with some storytelling and the right fit, your Ph.D. will differentiate you from an ordinary data scientist.

Lateral mobility

Data science provides scientists with opportunities to explore other industries and functions. A scientist turned data scientist could get a job monitoring the impact of advertising campaigns on breakfast cereals.

Flexibility

Remote and flexible work arrangements are the norm in data science. These modern, post-pandemic working styles remain widespread, but you simply can’t do bench lab work from your home office.

Scale and scope

Data science is grand in scale–focusing on big problems or at least big data sets, often millions of data points. Contrast this with the reductionist approach of most laboratory experiments where the scientist labors to generate three results to interpret. As a data scientist, the data comes to you. You get to do the fun part of analyzing and interpreting it.

Figuring out your why will be a journey. Give it some thought now, but don’t wait to start learning. Do you think that data science could be a fit for you? Exercise a bias to action and dive in headfirst.

Getting started

First, enroll in an online course. I started with Coursera and DataCamp. This worked for me but took way longer than it needed to. None of the courses on the market are designed for a trained scientist. So I built the Scientist to Data Scientist program to share with you. Learn enough coding and beginner data science skills in one month (~60 hours) to get started on a project. These skills will include:

  • Setting up your coding environment
  • Using Python to orchestrate
  • Manipulating data with the pandas and NumPy packages
  • Visualizing data with Matplotlib, Seaborn, and Plotly
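
To give a flavor of the data manipulation skills in that list, here is a minimal sketch, assuming pandas and NumPy are installed. The measurements, the column names, and the `summarize` helper are all invented for illustration; swap in your own data.

```python
import numpy as np
import pandas as pd

# Hypothetical bench measurements: three replicates for each of two conditions.
df = pd.DataFrame({
    "condition": ["control"] * 3 + ["treated"] * 3,
    "signal": [1.0, 1.2, 0.8, 2.1, 1.9, 2.0],
})


def summarize(data: pd.DataFrame) -> pd.DataFrame:
    """Return the mean and sample standard deviation of signal per condition."""
    return data.groupby("condition")["signal"].agg(["mean", "std"])


summary = summarize(df)

# NumPy handles the numeric step: log2 fold change of treated over control.
log2_fc = np.log2(summary.loc["treated", "mean"] / summary.loc["control", "mean"])

if __name__ == "__main__":
    print(summary)
    print(f"log2 fold change: {log2_fc:.2f}")
```

The same few lines work unchanged whether the table holds six rows or six million, which is a large part of the appeal.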

Second, formulate and start building a data science project. As a scientist, you know how to learn while doing something useful. And, luckily, you have access to real data and real problems. So construct an impactful project that is professionally important to you. (That way you’ll be far more likely to stick with it and gain recognition for your efforts.) Hopefully, an idea will spring to mind. If not, consider the following for inspiration:

Big data

Do you have access to a data set with a huge number of data points? 

High dimensionality

Do you have data sets with many variables and measurements?

Data visualization

Do these examples spark any ideas on how to present your data? Browse the Matplotlib, Seaborn, and Plotly galleries.

Automation

Are there monotonous analyses you repeat frequently?
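
As one possible shape for an automation project, the sketch below applies the same summary to every CSV file in a folder using only the Python standard library. The folder name, the `absorbance` column, and both helper functions are hypothetical stand-ins for whatever your instrument exports.

```python
import csv
import statistics
from pathlib import Path


def summarize_csv(path: Path, column: str) -> dict:
    """Read one CSV file and return basic statistics for a numeric column."""
    with path.open(newline="") as handle:
        values = [float(row[column]) for row in csv.DictReader(handle)]
    return {
        "file": path.name,
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
    }


def summarize_folder(folder: Path, column: str) -> list[dict]:
    """Apply the same analysis to every CSV file in a folder, in name order."""
    return [summarize_csv(path, column) for path in sorted(folder.glob("*.csv"))]


if __name__ == "__main__":
    # Hypothetical folder of instrument exports with an "absorbance" column.
    for result in summarize_folder(Path("plate_reader_exports"), "absorbance"):
        print(result)
```

Once an analysis lives in a function like this, rerunning it on next week's data is a single command instead of an afternoon of copying and pasting.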

Image analysis

Do you have a collection of images that you currently inspect or quantify by hand?

Natural language processing

Do you have a lot of text (papers, websites, surveys) that you’d like to organize and analyze?
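
A first pass at a text project can be very small. The sketch below counts the most frequent terms across a set of documents with the standard library only. The stop-word list is deliberately tiny, and the example abstracts and the `top_terms` helper are made up for illustration.

```python
import re
from collections import Counter

# A tiny, illustrative stop-word list; real projects use a much longer one.
STOP_WORDS = {"the", "a", "an", "of", "in", "and", "to", "is", "was", "were"}


def top_terms(documents: list[str], n: int = 3) -> list[tuple[str, int]]:
    """Count the most frequent non-stop-words across a collection of texts."""
    counts = Counter()
    for text in documents:
        words = re.findall(r"[a-z]+", text.lower())
        counts.update(word for word in words if word not in STOP_WORDS)
    return counts.most_common(n)


if __name__ == "__main__":
    # Hypothetical abstracts standing in for your own papers or survey answers.
    abstracts = [
        "T cells were activated in the presence of the antigen.",
        "Activated T cells secrete cytokines.",
        "Cytokines regulate the response of T cells.",
    ]
    print(top_terms(abstracts))
```

Simple counts like these are often enough to decide whether a larger natural language processing project is worth pursuing.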

Spend 1-3 months building out a project that you’re proud of, then sell it to anyone who will listen. If you’re in academia, publish a paper on it and distribute it online (perhaps through GitHub).

Keep learning

You’ve only just begun your learning journey, and, honestly, it will never end. As technology continues to advance with incredible velocity, you’ll have to keep pace, so hopefully you identify as a lifelong learner. Luckily the internet is filled with resources, both free and paid. On the free side, Google searches, Stack Exchange, and Kaggle will be your best friends. However, I’d recommend investing in the resources that will get you where you want to be as efficiently as possible. Whether that’s through the Scientist to Data Scientist program or some other resource, don’t do it halfway. Commit to yourself that you’re either going to become a bona fide data scientist or, at the very least, develop some really valuable skills for a scientist and earn the certifications to prove it. From my personal experience as a hiring manager, these certificates broadcast a candidate’s commitment and conscientiousness.

After you master the basics, you should gain a broader perspective and a high-level understanding of data science. As always, learn by doing. Learn just enough theory to get started and demonstrate some applied success at each of the following:

Visualization

The Matplotlib, Seaborn, and Plotly packages are essential. They offer differing levels of ease of use and different feature sets. My go-to is Plotly Express for its ease of use, integration with pandas, and the interactivity of its plots.

Git

Version control will make your life easier and make you a better coder. There’s a huge community on GitHub that shares code. You should be both benefiting from this invaluable resource and contributing to it. Experience with Git is an absolute must in any data scientist application.

Linux

Command line wizardry separates the amateurs from the pros. Personally, I neglected this skill, and that was a huge mistake. The complexity of tasks that can be completed from a single line of code in your terminal is mind-boggling.

Cloud

Working in the cloud has become the norm, and AWS is the leader.

AI/ML

A vast toolkit to solve seemingly any problem. Focus on neural network-based applications.

SQL

Database queries and joins represent a major fraction of industry analytics work. 
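
If you have never written a join, Python's built-in `sqlite3` module lets you practice without installing a database server. The sketch below is illustrative only: the `samples` and `assays` tables and their contents are invented, and a production database will differ in scale and dialect.

```python
import sqlite3

# An in-memory database keeps the example self-contained.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE samples (sample_id INTEGER PRIMARY KEY, patient TEXT);
    CREATE TABLE assays (sample_id INTEGER, marker TEXT, value REAL);
    INSERT INTO samples VALUES (1, 'P01'), (2, 'P02'), (3, 'P03');
    INSERT INTO assays VALUES (1, 'IL6', 4.0), (1, 'TNF', 2.0), (2, 'IL6', 6.0);
""")

# LEFT JOIN keeps every sample, including those with no assay results yet.
query = """
    SELECT s.patient, COUNT(a.marker) AS n_assays, AVG(a.value) AS mean_value
    FROM samples AS s
    LEFT JOIN assays AS a ON a.sample_id = s.sample_id
    GROUP BY s.patient
    ORDER BY s.patient
"""
rows = conn.execute(query).fetchall()

if __name__ == "__main__":
    for patient, n_assays, mean_value in rows:
        print(patient, n_assays, mean_value)
```

Choosing between an inner join and a left join is the kind of decision that quietly changes your results, so it is worth experimenting with both on a toy table like this one.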

APIs

Abstract away as many details as possible by using existing APIs. 

While you can learn all of these skills by browsing documentation and various forums, organized courses really are the way to go (and the Scientist to Data Scientist program is the best). They’ll provide you with high-level context and prioritize the essential skills. But do NOT consume the courses passively. Course-associated exercises are good, but applying your new learnings to a real problem of your own is the best way to learn.

Serve others

Comprehending someone else’s problems and solving them represents the next level of maturity in your budding data science career. The problem and solution may be more complicated than previous projects–reflecting your growing competence–but that’s not the crucial part. Successfully solving someone else’s problem demonstrates to a hiring manager that you can listen, translate, and deliver.

Practice balancing your newly acquired hard technical data science skills with the softer skills required to function and climb in a modern business. You’ll need to be able to communicate complex technical concepts to and empathize with your less technical colleagues. Bonus points for solving a problem for a larger, diverse set of stakeholders, demonstrating an even greater degree of organizational and interpersonal effectiveness. I’ve found that building and distributing a widely-used data analysis tool that empowers and enables subject matter experts is a perfect scenario to check this box. In addition to forging a story of your effectiveness on a resume or in an interview, you’ll start to redefine your reputation and persona, becoming a data scientist in the eyes of your organization. 

While developing your data scientist persona, make sure people see you coding.

Be a leader

As your data science skills grow, you can demonstrate your technical and organizational leadership by taking on a more central business problem. This is where you cease to be a learning hobbyist and become a bona fide data scientist. Use your non-data science subject matter expertise (your day job) to identify and define an important business problem that you think you can solve. Develop a business case articulating the value creation. De-risk the project as much as possible by making sure you can get the required data and building a minimum viable product. Then take your shot.

Aim upward. Your audience should probably be more than one level above you–I’d go for 2-3 levels up. Put yourself in this audience’s shoes. What metrics do they care about? How important is this in the grand scheme of their myriad problems? How much do they know about data science? Answer these questions and hope that you’ll get the chance to make the meaningful impact that your efforts deserve. However, in all likelihood, it won’t be as glorious as you expect. Don’t let it get you down. Keep solving problems, and eventually the combination of your increasing technical skills, your growing organizational savvy, and the circumstances outside of your control will lead to success. 

Define and own your persona

As your reputation and capabilities continue to grow, you’ll become a data scientist. It could happen overnight if you get hired as a data scientist in a different company that was impressed by your stories of initiative and demonstrated successes. Or it could happen very gradually among the people you work with every day. Either way, you’re getting the incredible opportunity to redefine yourself.

Don’t let imposter syndrome shape your persona. Instead, own your identity as a data scientist and flavor it with your unique set of skills and experiences. While every company has data scientists, how many have a data scientist with a bachelor’s degree in chemistry, a Ph.D. in immunology, experience working as a scientist in the pharmaceutical industry, the proven initiative to learn a new field independently, and the vision and grit to accomplish incredible feats in complex organizations? You are one-of-a-kind. Recognize your uniqueness and value, and sculpt your persona accordingly.
