See on Scoop.it: Data Nerd’s Corner
A look at some disparate definitions of data science
See on information-management.com
Carla Gentry CSPO's insight:
In his not-to-be-missed blog, Schutt’s adviser at Columbia, noted statistician Andrew Gelman, is even more unabashed in chronicling differences between data science and statistics. “There’s so much that goes on with data that is about computing, not statistics. I do think it would be fair to consider statistics (which includes sampling, experimental design, and data collection as well as data analysis (which itself includes model building, visualization, and model checking as well as inference)) as a subset of data science… The question then arises: why do descriptions of data science focus so strongly on statistical tasks? (As Schutt and O’Neil write, ‘the media often describes data science in a way that makes it sound like as if it’s simply statistics or machine learning in the context of the tech industry.’) I think it’s because statistics is the fun part and the part that, in this context, is new. The tech industry has always had to deal with databases and coding; that stuff is a necessity. The statistical part of data science is more of an option. To put it another way: you can do tech without statistics but you can’t do it without coding and databases.” A second Gelman blog post, “Statistics is the least important part of data science,” reiterates these points.