


Keywords: Data science, Python, machine learning, statistics

Title: Data Science from Scratch: First Principles with Python

Author: Joel Grus

Publisher: O’Reilly

ISBN: 978-1492041139

Media: Book

Verdict: Very highly recommended.

 

Back in the day, when I was doing my PhD, I would try to explain what it was I was researching. It was a sort of weird mix of programming, machine learning, data mining and statistics. The nearest thing to a catchy phrase to encapsulate this was 'intelligent data analysis', which never really took the world by storm. These days, of course, I'd only have to say 'data science' and people would get it, even if they only have the vaguest idea of what it entails. Even better, for those who are really interested in learning what it means, there are plenty of books which bring all of the elements that make up data science together in a single volume, like this one.

Python is probably the programming language most often associated with data science, but of course lots of other languages and tools are used in practice. In my own case I use a lot of Java, with R and even Excel VBA coming in handy at times. However, it's strictly Python in this book, although the author doesn't assume any existing Python knowledge or experience. So, after a quick outline of what data science is, and isn't, and the setting out of a series of hypothetical problems to solve, there is a crash course in Python. It feels rushed, of course, but when you consider that there are huge tomes devoted to learning Python, cramming it all into one chapter is no mean feat. It helps if you already have a programming language under your belt, just as it helps if you follow along and actually pay close attention.

Python isn't the only crash course on offer: there are single-chapter intros to data visualisation, linear algebra, statistics, probability, inference and gradient descent. These introductions are succinct, as you'd expect, but they're well written, use lots of code and make use of good illustrative examples. They lend themselves to doing rather than focusing on the underlying theory. And Joel Grus makes sure to point the interested reader to further information and references to go away and explore.

Of course every data science project involves some level of data wrangling, and there are two chapters looking at data input/output and data handling. As with the other parts of the book, good use is made of pre-existing libraries to lighten the load, although there's usually some digging under the covers rather than depending on packaged code without an eye on what's going on behind the scenes.

Most of the rest of the book goes into a range of different techniques and algorithms that are part of the data science repertoire: k-nearest neighbours, different types of regression analysis, Bayes' theorem, decision trees, neural networks, clustering and more. There is even a chapter on deep learning, which is definitely flavour of the month in the popular media. Some specialist topics are missing, such as genetic algorithms or ant colony optimisation, but these really are niche areas, more often of interest to researchers than to the reader looking to make a practical start.

Overall Grus does a really good job of covering the key material. Examples are well structured and clearly explained. The choice of Python makes a lot of sense: the code examples are concise and can be teased apart and played with easily. Sure, you can use other languages, but the concision of Python and the easy availability of excellent libraries mean that a huge amount of material is covered in a book of fewer than 400 pages.

For those of us who know something about the topic already but who aren't Python users, this is a really great resource. You can dive in, home in on the topic you want to explore and quickly find some working code to look at. The book does a pretty good job of getting to grips with a topic in just a few pages. If you're new to data science and are in the market for a single-volume introduction to it, then this is definitely the place to look. Very highly recommended.



Contents © TechBookReport 2020. Published March 11 2020