As I wrote in my introductory post, I’ve been learning data science specifically for the past five months, building on ten years of formal training in physics and math. Although I’m certainly no expert, I’ve found many online resources and habits to be useful in this endeavor. Here’s an overview:
Massive Open Online Courses are a recent phenomenon in education, and their growth in just the past couple of years has been very impressive. They present both benefits (free access to information, self-directed and flexible learning schedule) and challenges (lack of interaction with the instructor and other students, computer and internet requirements) to learning. If you have the time, though, they are a great source of information on many topics in data science, some of which would be quite difficult to learn through pure self-study. Personally, I’ve enrolled in several courses through Coursera, including:
- Machine Learning with Andrew Ng: Highly recommended! Provides an excellent introduction to and overview of machine learning techniques, with minimal prerequisites, from an expert in the field and the co-founder of Coursera itself. Also gets you up and running with GNU Octave.
- Computing for Data Analysis with Roger Peng: Recommended for those wishing to learn how to program and perform statistical analysis in R. (You do.) It’s much easier to follow along with a lecture than to learn directly from the R documentation!
- Web Intelligence and Big Data with Gautam Shroff: I’m currently enrolled in this course, and to be honest, I’m not impressed. That said, the instructor does provide a decent overview of a rapidly evolving topic that, in my experience, is hard to make sense of on one’s own.
- Introduction to Data Science with Bill Howe: This may be exactly what we’re looking for; unfortunately, it won’t be given until next April. Stay tuned!
Besides Coursera, I’ve heard good things about Udacity, which offers a number of relevant courses on math, science, and programming. There’s also iTunes U and edX, among others. If you’re interested in learning a new programming language, I strongly recommend checking out Codecademy, which provides a series of interactive lessons that cover the fundamentals of a given language, Python being the most useful to prospective data scientists. (I don’t think this qualifies as a MOOC, but I’m sticking it here anyway.)
Given the current growth and interest in data science, it’s not hard to find blogs that are written by and/or cater to data scientists (and data-scientists-in-training). The information they provide isn’t as strictly educational as that in MOOCs, but they cover a wide range of topics at varying levels of skill and detail as well as up-to-date news, trends, and tips in the field. Here’s a short list of my favorite data science blogs:
- Data Science Rules: Duh.
- Data Science 101: This blog is only a few months old; it’s written by a guy who is, like me, learning to become a data scientist. It’s a good source of news and learning opportunities for those looking to get into the field.
- Revolutions: An R-specific blog with news, excellent examples, and occasional tutorials of interest to the R community. On Fridays they post fun and interesting videos for the geeks among us.
- FiveThirtyEight (Nate Silver’s Political Calculus): My favorite New York Times blog. It’s written by a statistician and provides an engaging, readable, and ongoing example of data science applied to political issues and polling.
- HilaryMason.com: Written by the eponymous Chief Scientist at bitly, who is well known (and much appreciated!) for her good work and accessible explanations of complicated topics in the field. It also provides links to videos of Hilary’s presentations, which are worth watching!
This is just the tip of the iceberg; I suggest exploring the blogosphere on your own to get a feel for what’s out there. If you have recommendations, please let me know in the comments!
Since this post has run long (and the evening wears on…), I’ll stop here, and cover Books & Tutorials and Practice in my next post.