AITION: A Scalable Data Mining Platform for Medical Applications
By Harry Dimitropoulos, Omiros Metaxas and Manolis M. Tsangaris, Department of Informatics, University of Athens, Greece
AITION is a new and powerful tool in the quest for improved medical care. User-friendly and designed to run on grids, clouds or ad-hoc clusters alike, AITION allows doctors to investigate and integrate clinical, imaging, genetic and other patient data to find relationships between different medical variables. Based on generative state-of-the-art causal-probabilistic algorithms, AITION generates graph-based “knowledge models” that doctors can interactively explore to answer diagnostic and predictive questions.
Inside the AITION machine
The AITION system consists of four major components:
• User Interface: the heart of the system, the User Interface (UI) allows for user interaction and provides visualisation tools;
• Backend: all the core data mining algorithms run in the backend, which also coordinates the overall data mining flow;
• ADP engine: the Athena Distributed Processor (ADP) engine uses user-defined or “custom” operators to express, optimise, schedule and execute tasks or “queries” in a distributed system;
• Relational database: the database stores both the original data and the derived data models.
The backend uses a collection of algorithms, most of them taken from the open source data mining platform WEKA. These algorithms have been modified to run in a data streaming mode and to use multiple threads, and are implemented as user-defined ADP custom operators.
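To give a feel for what running an algorithm in "data streaming mode" can mean, here is a minimal Python sketch. It is not AITION, WEKA or ADP code: the class name, method names and toy records are all hypothetical, and the sketch is single-threaded for brevity. The operator reads one record at a time and keeps only running counts, from which a conditional probability can be estimated at any point in the stream.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator

Record = Dict[str, str]


def stream_records(rows: Iterable[Record]) -> Iterator[Record]:
    """Yield records one at a time, standing in for a streamed data source."""
    for row in rows:
        yield row


class StreamingCooccurrenceCounter:
    """Hypothetical streaming operator that counts joint values of two variables.

    Only the running counts are kept in memory, never the full dataset.
    """

    def __init__(self, var_a: str, var_b: str) -> None:
        self.var_a = var_a
        self.var_b = var_b
        self.counts: Counter = Counter()
        self.total = 0

    def consume(self, record: Record) -> None:
        """Update the counts with a single record."""
        self.counts[(record[self.var_a], record[self.var_b])] += 1
        self.total += 1

    def conditional(self, a_value: str, b_value: str) -> float:
        """Estimate P(var_b = b_value | var_a = a_value) from the counts so far."""
        a_total = sum(n for (a, _), n in self.counts.items() if a == a_value)
        if a_total == 0:
            return 0.0
        return self.counts[(a_value, b_value)] / a_total


# Made-up toy records, for illustration only (not real patient data).
rows = [
    {"smoker": "yes", "hypertension": "yes"},
    {"smoker": "yes", "hypertension": "no"},
    {"smoker": "no", "hypertension": "no"},
    {"smoker": "yes", "hypertension": "yes"},
]

op = StreamingCooccurrenceCounter("smoker", "hypertension")
for rec in stream_records(rows):
    op.consume(rec)

# Fraction of "smoker = yes" records that also have "hypertension = yes".
print(op.conditional("yes", "yes"))
```

Because the operator's state is just a table of counts, several such operators could in principle process separate parts of a stream and have their counts added together afterwards, which is one simple way a streaming algorithm can be made to use multiple threads.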
Rather than running each algorithm in isolation, the backend uses ADP to run queries that compose custom operators as a pipeline or sequence of simpler steps. AITION packages custom operators inside containers, then assigns them to different hosts and provides them with compute, memory and other resources. AITION’s ADP optimiser then decides which operator implementation to use, the number of threads, the container to assign operators to, and so on. The resulting execution plan is then run on the selected computing resources.
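The idea of composing operators into a pipeline and letting an optimiser pick an implementation for each step can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the ADP API: the names `Operator`, `Implementation`, `choose` and `execute` are hypothetical, the row-count threshold is an invented stand-in for whatever criteria the real optimiser applies, and containers and hosts are left out entirely.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Rows = List[float]


@dataclass
class Implementation:
    """One way of running an operator (hypothetical)."""
    name: str
    threads: int
    run: Callable[[Rows], Rows]


@dataclass
class Operator:
    """A pipeline step with two alternative implementations (hypothetical)."""
    name: str
    compact: Implementation
    distributed: Implementation


def choose(op: Operator, n_rows: int, threshold: int) -> Implementation:
    """Assumed rule: small inputs get the compact version, large ones the distributed."""
    return op.compact if n_rows < threshold else op.distributed


def execute(
    pipeline: Sequence[Operator], data: Rows, threshold: int = 1000
) -> Tuple[Rows, List[Tuple[str, str, int]]]:
    """Run the operators in order and record which implementation was chosen."""
    plan: List[Tuple[str, str, int]] = []
    for op in pipeline:
        impl = choose(op, len(data), threshold)
        plan.append((op.name, impl.name, impl.threads))
        data = impl.run(data)
    return data, plan


def drop_negative(rows: Rows) -> Rows:
    return [x for x in rows if x >= 0]


def scale_to_max(rows: Rows) -> Rows:
    top = max(rows, default=0.0)
    return [x / top for x in rows] if top else rows


# In this toy example both implementations share the same function;
# only the label and the thread count differ.
pipeline = [
    Operator("filter",
             Implementation("compact", 1, drop_negative),
             Implementation("distributed", 8, drop_negative)),
    Operator("scale",
             Implementation("compact", 1, scale_to_max),
             Implementation("distributed", 8, scale_to_max)),
]

result, plan = execute(pipeline, [4.0, -1.0, 2.0])
print(result)
print(plan)
```

With the default threshold, the three-row input is small enough that every step uses the compact, single-threaded implementation. Lowering the threshold, for example `execute(pipeline, [4.0, -1.0, 2.0], threshold=2)`, makes the same pipeline select the distributed implementations instead, without any change to the pipeline definition.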
Design benefits
Early in the design of AITION, we mapped each complex data mining algorithm to one or more ADP custom operators, so that the ADP optimiser can choose between a compact, low-overhead implementation for small problems and a distributed, higher-overhead implementation for larger ones. AITION makes these choices automatically. This has allowed us to make several key algorithms scalable, reducing their memory requirements at the cost of a greater need for parallel processing.
AITION’s primary focus is on providing our users, doctors in particular, with user-friendly and transparent access to the knowledge models it generates. In this it differs from traditional data mining, since it provides ways to present, navigate, visualise and, very often, interact with knowledge models. The end result is that users understand not only the process that led to a statistical conclusion, but also the impact of that conclusion on their medical hypotheses.
Further, since AITION users can easily “experiment” with alternative hypotheses, models and parameters (something very rare with traditional data mining approaches or tools), AITION is a real boon for discovering new medical leads.
AITION was developed by the University of Athens, under the Health-e-Child IST project (http://www.health-e-child.org/).
