
Keywords: Machine learning, statistics, Python, data mining, social networks

Title: Programming Collective Intelligence

Author: Toby Segaran

Publisher: O'Reilly

ISBN: 0596529325

Media: Book

Level: Introductory

Verdict: Recommended


Machine learning algorithms are at the heart of many of the most successful modern web sites and applications. From ranking search results to generating product recommendations, from matching user interests to finding a date, machine learning is as essential as a fancy front-end with all the eye-candy you can muster. For those not versed in these arts, machine learning, data mining and knowledge discovery can be intimidatingly mathematical and conceptually difficult. Programming Collective Intelligence, by Toby Segaran, sets out to make a variety of machine learning algorithms accessible to the average Joe programming a social networking or data-driven web application.

Using Python as the main tool, Segaran walks the reader through a number of different scenarios and applies appropriate algorithms to each in turn. Most of the problem domains he looks at are fairly data-heavy, so he provides both the code for generating the required data and the code that processes the data sets once they are generated. Where possible he makes use of existing open-source Python libraries and of publicly available APIs and data sources (including Yahoo, eBay, and others).

In terms of algorithms he covers a comprehensive range, including neural networks, genetic algorithms, genetic programming, Bayesian classifiers, decision trees, support-vector machines, k-nearest neighbours and more. That list covers both supervised and unsupervised learning, with clear explanations of the difference between them. It's a fairly representative sample of the current state of machine learning as a discipline. On the whole the mathematical aspects are down-played, so the book doesn't include reams of derivations or proofs, but the underlying ideas are explained in non-mathematical language.

The problems that are tackled are fairly representative too. The book covers making recommendations based on finding other users with similar tastes, discovering groups (or features) in data, searching and ranking, document filtering, function optimisation, modelling with decision trees and more. In all cases the problem is explained in the context of a web site that seeks to provide a specific service or derive a specific type of knowledge. The emphasis is firmly on the practical rather than on the theoretical.
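To give a flavour of the first of those problems, here is a minimal sketch of recommending items by finding users with similar tastes. It is not code from the book: the ratings data, the function names pearson and recommend, and the choice of Pearson correlation as the similarity measure are all assumptions made for illustration.

```python
from math import sqrt

# Hypothetical ratings, invented for this sketch (not taken from the book).
ratings = {
    "alice": {"Alien": 5.0, "Brazil": 3.0, "Casablanca": 4.0},
    "bob": {"Alien": 4.0, "Brazil": 2.0, "Casablanca": 3.0, "Dune": 5.0},
    "carol": {"Alien": 1.0, "Brazil": 5.0, "Casablanca": 2.0,
              "Dune": 1.0, "Eraserhead": 4.0},
}


def pearson(prefs, a, b):
    """Pearson correlation of two users' ratings over the items both rated."""
    shared = [item for item in prefs[a] if item in prefs[b]]
    n = len(shared)
    if n == 0:
        return 0.0
    xs = [prefs[a][item] for item in shared]
    ys = [prefs[b][item] for item in shared]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # one user rated everything the same; no usable signal
    return cov / sqrt(var_x * var_y)


def recommend(prefs, user):
    """Rank items the user has not rated by similarity-weighted average score."""
    totals = {}
    weights = {}
    for other in prefs:
        if other == user:
            continue
        sim = pearson(prefs, user, other)
        if sim <= 0:
            continue  # ignore users with unrelated or opposite tastes
        for item, score in prefs[other].items():
            if item in prefs[user]:
                continue  # only suggest items the user has not rated yet
            totals[item] = totals.get(item, 0.0) + sim * score
            weights[item] = weights.get(item, 0.0) + sim
    ranked = [(totals[item] / weights[item], item) for item in totals]
    return sorted(ranked, reverse=True)


if __name__ == "__main__":
    print(recommend(ratings, "alice"))
```

With this sample data, bob's ratings track alice's exactly (correlation 1.0) while carol's run the other way, so recommend(ratings, "alice") returns [(5.0, 'Dune')]: only bob's opinion counts, and Dune is the one film he has rated that alice has not.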

While some knowledge of Python would make the code more intelligible, there's nothing to stop the beginner from just grabbing the code and running it. As with all types of data mining and machine learning, there's an element of experimentation at work here, and the author makes good use of the interactive features of Python to explore the data and to develop the solutions.

Machine learning is a very practical subject that has escaped from the more esoteric aspects of the science of artificial intelligence (AI). It's AI with a screwdriver (or compiler in our case), and this book does a lot to make complex algorithms available to developers. Unlike, say, 'AI Application Programming', the emphasis is less on providing sample code to understand the algorithm per se and more on finding a solution to a specific class of problem using the algorithm.

Overall, this is recommended for those developers (especially those who've got some Python skills) looking to solve problems using machine learning, particularly for web applications and social networking sites.


Contents © TechBookReport 2007. Published October 24 2007