
Learn Data Science over 10 Years

Spend time doing data science on projects that motivate you. Don't worry about the math until you are interested. Be prepared to learn more about the things that support data science, like data engineering, MLOps, and how to communicate data and technical jargon. Consider formal education if you can afford it: it gives you a credential and provides feedback, but it will be a balance between theory and application.

A decade into my career isn't a very long time but it has given me enough exposure to the ups and downs of statistics, data analysis, machine learning, data science, deep learning, data engineering, and most recently MLOps. The time has helped give me perspective on failings of my own and expectations I had in the mad rush to learn data science.

I often think of Peter Norvig's Teach Yourself Programming in Ten Years and consider it good advice for any domain. That said, when Black and Hispanic people and women are underrepresented in tech, it can feel like a privileged opinion to hold. Even so, Norvig's comments are well suited to data science. To quickly summarize:

I would only add a few additional comments specific to data science.

Mathematical Statistics vs Algorithmic Modeling Culture

The Algorithmic Modeling Culture is in the lead and supports treating modeling exercises as a black box and prediction accuracy as most important. However, this does not preclude data scientists from learning about mathematical statistics.

Teaching yourself programming has a huge benefit over teaching yourself statistics. "Introductory statistics text book will assault you and prevent you from getting to things that are applicable". In contrast, a "Hello World" application lets you create something instantly and receive immediate feedback, which is crucial for beginners.
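To make the "immediate feedback" point concrete, here is a minimal sketch of what a data-flavored "Hello World" might look like. The function name `hello_data` and the sample numbers are made up for illustration; it uses only Python's standard `statistics` module.

```python
import statistics


def hello_data(values):
    """Return a one-line summary of a small numeric sample."""
    mean = statistics.mean(values)
    spread = statistics.stdev(values)  # sample standard deviation
    return f"Hello, data! n={len(values)}, mean={mean:.2f}, stdev={spread:.2f}"


if __name__ == "__main__":
    # Made-up numbers, purely for illustration.
    sample = [2, 4, 4, 4, 5, 5, 7, 9]
    print(hello_data(sample))
```

Change a value in `sample`, rerun, and the summary changes right away. That short loop is the kind of feedback an introductory statistics textbook rarely offers.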

Learn the Ecosystem - It Changes Fast

The biggest reason to learn more than just algorithmic modeling is that you won't necessarily be a data scientist "forever". Careers are long and interests change. In addition, having a broader set of skills will make both your life and other people's lives easier.

A word of caution: data science skills alone are now (2021) a commodity. Great data scientists are those with a major in another domain and a minor in data science. However, the higher-paid data scientists are the ones working in Information Technology teams rather than in the business unit, and they are likely more focused on the MLOps and data engineering sides.

Formal Education

If you think formal education will be helpful, focus on what you will get out of the investment and whether it matches your goals.

| Format | Direction | Application | Theory | Cost |
| --- | --- | --- | --- | --- |
| Self-Study | Chart your own course. Hard if you aren't sure what path to take or how to stay motivated. | Variable | Variable | Low (Books or Courses) |
| Boot Camp | Focus on applied skills, learning modern tools and potentially related skills like MLOps and serving models as APIs. | High | Low | Medium |
| Certificates | Certs from places like Udemy or Coursera are not impressive but they give you applied skills. Certs from a university are not as impressive as a master's and are often not as applied as a cert from a non-academic source. | High | Low | Medium |
| MS Statistics | Most programs are still focused on mathematical statistics first and applied analysis skills later. They don't typically teach neural networks / machine learning techniques. | Medium | High | High |
| MS CS | You'll have the basis of thinking algorithmically and they may offer a course or two on machine learning. However, this is likely the most lucrative degree and offers you the most options outside of data science. | Medium | Medium | High |
| MS Data Science | Very focused on applied skills with some theory and ethics thrown in. Not as generalizable as a degree in CS. | High | Low | High |
| MS Analytics | Also very focused on applied skills but with a heavier emphasis on functional areas like accounting, operations, and marketing. | Medium | Low | High |
| STEM Degree | You'll learn a domain and that will give you a foundation of analytical thinking. Some companies just want smart people and will look for candidates with this background. | Low | High | High |

I'm a big proponent of formal education in data science because it provides you with a path to follow, regular feedback, and exposure to topics you might not have considered looking at. However, it can be easy to pass a class and harder to remember the important parts without active learning and deliberate practice.

Learn to Work with People

Lastly, data science requires communication and collaboration skills since it requires you to convey technical results to non-technical people and work with people across industries. Having a few "soft skills" will help you be more effective during the 90% of your time you are not dealing with algorithms and data.

Summary and Hope

The important part is to START LEARNING AND DOING.

For retaining knowledge around data science, the next part is to recall what you've learned: quiz yourself over a period of time, practice the areas you actually need to improve, create recaps, talk about the topics with others, and, of course, use what you learned in projects at work and at home.

To become more effective in the workplace, practice your communication and explanation skills. Then start learning about your domain, and then the ecosystem around data science (data engineering, MLOps, big data frameworks, etc.). Executives and business people appreciate those who communicate clearly and know the business. They do not care how hard you worked or what steps you took to get the information or answer (except for the steps that ensure it is correct).