
Data science has always been different

Vicki has written a thought-provoking post. I think she overgeneralizes a bit, even as she complains about generalization herself. I am not sure her argument is well balanced: her definition of a junior data scientist sets the bar too low, and her picture of the expectations is too broad.

Junior Data Scientists

She seems to include almost every person who has completed a few Coursera courses or a couple of certificates. Taking a few classes in this space does not make someone a junior-level data scientist. That is an expansive definition. Then again, since Vicki says that data scientists don’t need to know statistics, maybe I shouldn’t fault her for her lack of inferential ability.

I would never group junior data scientists with engineering unless they were CS students looking for jobs in that space. I would group junior data scientists into the area she recommends at the end of the article (emphasis added).

While tuning models, visualization, and analysis make up some component of your time as a data scientist, data science is and has always been primarily about getting clean data in a single place to be used for interpolation.

Another point concerning the quote above: she does not make clear that visualization is vital both for cleaning data and for the inferential decisions that will be made with the data. It is not just a bullet point within the camp of analysis.

Data Science in Industry

The phrase ‘Data Scientist’ is too broad in industry. Vicki’s article struggles to convey that point clearly as it bounces between what industry wants and what an entry-level data scientist would do. She groups data scientists into three categories, and none of those categories includes the data munger, which she describes as the entry-level data science position.

First, I recommend learning SQL for everyone, regardless of whether their ambition is to be a data engineer, ML expert, or AI superwhiz.

We are starting to get a set of names that describe the sub-categories of data science. Data engineering is a specialized sub-category within data science and is tailored to CS expertise. The field of data science is much larger than data engineering and touches every business.

As for her Gartner hype cycle statement about data science, I don’t think we have gone through nearly every stage. I think we are just entering the ‘Slope of Enlightenment.’ Judging by her profile, Vicki appears to be in the ‘less than 5 percent of the potential audience’ and is once again making poor inferential statements from a poorly representative sample.

I would recommend reading What To Look for in A Data Science & Analytics Education. It gives a more explicit account of what junior-level data scientists need. Vicki attempts to put forth all the same ideas.

Conclusion

I have a decade of experience in applied statistics and data science. I have spent the last three years developing an undergraduate data science program at BYU-I. We have worked across departments and with industry to build the program. I suppose I have some bias in the argument, but I am not trying to sell anything.

We have received high marks from multiple industry employers (Intermountain Health Care, Sorenson Media, Gulfstream Aerospace, Henry Schein, and Honeywell Aerospace, for example) telling us that we are on the right path. Our summer 2019 interns are going to high-profile companies like Walmart, Goldman Sachs, the US Department of Energy, and EY. We are building out the skills that Vicki and industry describe for a junior data scientist.

  1. Learn SQL and R/Python.
  2. Have strong data munging skills.
  3. Get substantial experience with real data.
  4. Build soft skills (especially through collaborative projects).

Author

J. Hathaway

Data Scientist, Consultant, Teacher, Learner