Keywords: Genetic algorithms, data mining, evolutionary computation

Title: Data Mining and Knowledge Discovery with Evolutionary Algorithms

Author: Alex A. Freitas

Publisher: Springer

ISBN: 3-540-43331-7

Media: Book

Level: Advanced

Verdict: Excellent

We are drowning in data, of that there is no doubt. Technological advance, coupled with continuing decreases in costs, means that storage volumes are growing at an exponential rate. We are already at the stage where we cannot easily visualise the data that we are storing. In some domains, such as the Human Genome Project, on-line transaction logging and supermarket basket analysis, the sheer volume of data is such that we risk losing valuable insights that are locked away in gigabyte after gigabyte of raw numbers.

Data mining is the task of digging through this data looking for patterns, associations or predictions that transform the raw material into useful information. This is no mean feat given that very often we don't know precisely what it is we're looking for. It's pattern matching without knowing in advance what the pattern is. As a software engineering task it's one of the most interesting and urgent, and one where the usual tricks of database programming fail - either because of the data volumes or because the technology is not designed for the task.

Evolutionary algorithms - which include techniques such as genetic programming and genetic algorithms - are one way of tackling the data mining challenge. The approach literally means evolving the patterns which fit the data that is being mined - and using Darwinian principles to weed out the patterns which don't work in favour of those that do. Survival of the fittest ensures that over time it is the patterns (or rules) which best fit the raw data that are delivered as solutions.

In the snappily entitled Data Mining and Knowledge Discovery with Evolutionary Algorithms, leading researcher Alex A. Freitas introduces both data mining and evolutionary algorithms. He outlines the key features of the problems to be tackled and some of the key issues that practitioners face. For example, he details some of the different tasks that data mining has to perform, such as classification, prediction and so on. Although only a brief outline, it's enough to put into context the kinds of areas that evolutionary algorithms might be applied to.

The bulk of the book is focused on evolutionary techniques, starting with the biological inspiration for these developments and then leading to the key features shared by the wide variety of algorithms that are used. Key issues, such as representation, fitness function and population selection, are addressed in more detail. Representation, for example, is about the data structures that are used to encode potential patterns that are buried deep in the raw data. The fitness function is a measure of how good (or bad) a potential rule is, and population selection is the algorithm that is used to decide which individual patterns survive and which are cast aside. With an understanding of these three elements it is possible to design an evolutionary algorithm to mine just about any type of data you care to name.
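
To make those three elements concrete, here is a minimal Python sketch of an evolutionary search for a single classification rule. It is not taken from the book: the toy data set, the one-condition rule encoding, the function names and all parameter values are assumptions invented for illustration, and for brevity it uses mutation only, leaving out the crossover operator that genetic algorithms normally include.

```python
import random

# Toy data set, invented for illustration: (age, income in thousands) -> bought (1) or not (0).
DATA = [
    ((25, 20), 0), ((30, 28), 0), ((45, 31), 0), ((52, 35), 0),
    ((23, 48), 1), ((37, 55), 1), ((41, 62), 1), ((58, 70), 1),
]
N_ATTRS = 2

# Representation: a candidate rule is (attribute index, threshold), read as
# "IF record[attribute] > threshold THEN class 1 ELSE class 0".
def random_rule(rng):
    return (rng.randrange(N_ATTRS), rng.uniform(0, 80))

def predict(rule, record):
    attr, threshold = rule
    return 1 if record[attr] > threshold else 0

# Fitness function: the fraction of records the rule classifies correctly.
def fitness(rule, data=DATA):
    hits = sum(predict(rule, rec) == label for rec, label in data)
    return hits / len(data)

# Population selection: tournament selection keeps the fittest of k random rules.
def tournament(population, rng, k=3):
    return max(rng.sample(population, k), key=fitness)

# Variation: occasionally switch attribute and/or nudge the threshold.
def mutate(rule, rng, rate=0.3):
    attr, threshold = rule
    if rng.random() < rate:
        attr = rng.randrange(N_ATTRS)
    if rng.random() < rate:
        threshold += rng.gauss(0, 5)
    return (attr, threshold)

def evolve(generations=40, pop_size=30, seed=0):
    rng = random.Random(seed)  # seeded so that runs are repeatable
    population = [random_rule(rng) for _ in range(pop_size)]
    for _ in range(generations):
        best = max(population, key=fitness)  # elitism: the best rule survives unchanged
        children = [mutate(tournament(population, rng), rng)
                    for _ in range(pop_size - 1)]
        population = [best] + children
    return max(population, key=fitness)

best_rule = evolve()
print("best rule:", best_rule, "accuracy:", fitness(best_rule))
```

In this sketch a rule such as (1, 40.0), meaning "income above 40", classifies every toy record correctly, so it is the kind of pattern the search should drift towards. Real rule-mining systems use far richer representations and fitness measures than a single threshold and raw accuracy.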

Be aware, though, that this is not a book full of source code; the aim is to introduce and address the key challenges at a high level of detail. With an understanding gleaned from this book, and source code freely available on the web, the world of data mining is your oyster (if you don't mind mixing metaphors).