Declarative Pattern Mining using Constraint Programming
The goal of pattern mining is to discover patterns in data. Many techniques have been proposed for this task, differing in the type of patterns they find. To ensure that only patterns of interest are found, a common approach is to impose constraints on the patterns. Constraint-based mining systems exist in which multiple constraints can be specified. However, combining constraints in new ways or adding complex constraints requires changing the underlying algorithms. A truly general approach to constraint-based pattern mining has been missing.
In this thesis we propose a general, declarative approach to pattern mining
based on constraint programming. In a declarative approach one specifies what
patterns need to be found, instead of algorithmically specifying how they must
be found. Constraint programming offers a methodology in which a problem
is stated in terms of constraints and a generic solver finds the solutions.
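To make the declarative idea concrete, here is a minimal sketch in Python. It is not the thesis's CP4IM implementation: the toy transactions, the helper names (`cover`, `min_frequency`, `min_size`, `solve`) and the naive generate-and-test "solver" are all assumptions made for illustration. A real constraint programming solver would use propagation and search rather than full enumeration. The point is only that the patterns are described by constraints, and the solving routine knows nothing about any specific constraint.

```python
from itertools import combinations

# Toy transaction database (hypothetical data, for illustration only).
TRANSACTIONS = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c"},
    {"b", "c"},
    {"a", "b", "c", "d"},
]


def cover(itemset, transactions):
    """Indices of the transactions that contain every item of the itemset."""
    return [i for i, t in enumerate(transactions) if itemset <= t]


# Each constraint is a predicate over (itemset, transactions).
def min_frequency(threshold):
    """The itemset must be covered by at least `threshold` transactions."""
    return lambda itemset, transactions: len(cover(itemset, transactions)) >= threshold


def min_size(n):
    """The itemset must contain at least `n` items."""
    return lambda itemset, transactions: len(itemset) >= n


def solve(transactions, constraints):
    """Generic stand-in for a solver: naive generate-and-test.

    Enumerates every non-empty itemset and keeps those that satisfy all
    constraints. It contains no logic specific to any one constraint.
    """
    items = sorted(set().union(*transactions))
    solutions = []
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            itemset = frozenset(candidate)
            if all(c(itemset, transactions) for c in constraints):
                solutions.append(itemset)
    return solutions


# State WHAT is wanted: itemsets of size >= 2 occurring in >= 3 transactions.
patterns = solve(TRANSACTIONS, [min_frequency(3), min_size(2)])
for p in patterns:
    print(sorted(p))
```

Adding or removing a constraint only changes the list passed to `solve`; the enumeration code stays untouched, which is the property the declarative approach aims for.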
A first contribution of this thesis is that we show how constraint programming can be used to solve constraint-based, closed and discriminative itemset mining problems, as well as combinations thereof. A second contribution is that we demonstrate how the difference in performance between general constraint solvers and specialised mining algorithms can be reduced. A third contribution is the introduction of the k-pattern set mining problem, which involves finding a set of k patterns that together satisfy constraints. We propose a high-level declarative language for k-pattern set mining as well as a transformation of this language to constraint programming. Finally, we apply our declarative pattern mining framework to a challenging problem in bioinformatics, namely cis-regulatory module detection. For this application, the ability to add domain-specific constraints and to combine them with existing constraints is essential.
This thesis thus investigates, for the first time, how constraint programming can be used in pattern mining. We conclude that the approach is promising and outline several remaining challenges.
Full text (online PDF): phd_thesis_tias.pdf
Software (open source): CP4IM
- Friday 27 January 2012
- Starting at 16:30
- Auditorium Arenbergkasteel (room 01.07)
- Kasteelpark Arenberg 1, 3001 Heverlee
- Detailed directions
- A 45-minute presentation explaining what my thesis is about
(in Dutch and aimed at a general audience)
- Many questions from the jury, followed by a deliberation
- A reception, to celebrate! (starting ~18:00)
- Let me know by email that you will join.
- So that there are enough appetizers, please notify me before Saturday 21 January
- Prof. Carlo Vandecasteele (Chair)
- Prof. Luc De Raedt (Promotor)
- Dr. Siegfried Nijssen (Co-promotor)
- Prof. Bettina Berendt
- Prof. Maurice Bruynooghe
- Prof. Patrick De Causmaecker
- Prof. Barry O'Sullivan, University College Cork, Ireland
- Prof. Pascal Van Hentenryck, Brown University, Providence, USA