Keywords: Java, Python, SQL, XML

Title: Data Crunching

Author: Greg Wilson

Publisher: Pragmatic Bookshelf

ISBN: 0974514071

Media: Book

Level: All

Verdict: A really good read. Recommended


Wrestling with data is a fact of life for a lot of developers. There must be some who manage to steer clear of the problem, but for many of us it's an often painful chore that consumes more hours than we care to admit. Rolling up data from one format to another, applying adjustments, cleaning up poor-quality data, moving from one system to another … the list of data crunching tasks is as long as the list of obscure formats that we have to untangle. By its nature the data crunching task often starts as a one-off job or a quick fix, which means that the code we produce is also a one-off or a quick fix. But the same problems crop up again and again, and after a while we find ourselves solving similar problems for the umpteenth time. Greg Wilson has been there too, and this slim but useful little book is one of the results.

Subtitled 'Solve Everyday Problems Using Java, Python, and More', this is another in the highly recommended 'Pragmatic Bookshelf' series. That means an emphasis on getting things done, with solid code examples and good clear explanations. The code in this case focuses on Java and Python, with most examples showing solutions in both languages, though there's also an occasional look at other languages such as Ruby. For the most part, then, the examples will best suit Java and Python programmers.

This isn't to say that the book is about Java or Python as such. More important than the code is the thinking behind the problems and the solutions. Wilson gives readers the benefit of his experience as he solves common problems and points out the equally common pitfalls. The tone is nicely pitched: Wilson comes across as a helpful and experienced colleague rather than the all-seeing, all-important guru.

With chapters on text handling, regular expressions, XML, relational data (including a crash course in SQL) and binary data formats, the book runs through a set of techniques that belong in every developer's toolkit. So, for example, the chapter on XML starts with a basic intro, discusses DOM and SAX and then looks at XPath and XSLT. A book this size (under 200 pages) can't hope to provide in-depth coverage of these topics, and it doesn't pretend to. What it does is provide an overview and a set of tools that allow the reader to take things on and get results. It's the same with the relational data chapter, which includes enough SQL for the newbie to confidently get at the data locked away in a DBMS and to make changes and updates.

We thoroughly enjoyed this book. It was a pleasure to read, and it was gratifying to see how often the solutions we've come up with match the ones described by the author. Highly recommended.

Contents © TechBookReport 2005. Published June 29 2005