Thesis information

Declarative Pattern Mining using Constraint Programming

The goal of pattern mining is to discover patterns in data. Many techniques have been proposed for this task, differing in the type of patterns they find. To ensure that only patterns of interest are found, a common approach is to impose constraints on the patterns. Constraint-based mining systems exist in which multiple constraints can be specified. However, combining constraints in new ways or adding complex constraints requires changing the underlying algorithms. A truly general approach to constraint-based pattern mining has been missing.

In this thesis we propose a general, declarative approach to pattern mining based on constraint programming. In a declarative approach one specifies what patterns need to be found, instead of algorithmically specifying how they must be found. Constraint programming offers a methodology in which a problem is stated in terms of constraints and a generic solver finds the solutions.
A first contribution of this thesis is that we show how constraint programming can be used to solve constraint-based, closed and discriminative itemset mining problems as well as combinations thereof. A second contribution is that we demonstrate how the difference in performance between general constraint solvers and specialised mining algorithms can be reduced. A third contribution is the introduction of the k-pattern set mining problem, which involves finding a set of k patterns that together satisfy constraints. We propose a high-level declarative language for k-pattern set mining as well as a transformation of this language to constraint programming. Finally we apply our declarative pattern mining framework on a challenging problem in bioinformatics, namely cis- regulatory module detection. For this application, the ability to add domain- specific constraints and to combine them with existing constraints is essential.

Hence we investigate, for the first time, how constraint programming can be used in pattern mining. We conclude on this promising approach with several remaining challenges.

Full text
Online PDF: phd_thesis_tias.pdf
Software (open source): CP4IM

The defence

You were warmly invited to my PhD defence:

When

  • Friday 27 January 2012
  • Starting at 16u30

Where

  • Auditorium Arenbergkasteel (room 01.07)
  • Kasteelpark Arenberg 1, 3001 Heverlee
  • Detailed directions

What

  • A 45 minute presentation explaining what my thesis is about
      (in Dutch and aimed at a general audience)
  • Many questions from the jury, followed by a deliberation
  • A reception, to celebrate! (starting ~18u)

How

  • Let me know by email that you will join.
  • To ensure sufficient appetizers, please notify me before Saturday 21 January

 

The jury

  • Prof. Carlo Vandecasteele (Chair)
  • Prof. Luc De Raedt (Promotor)
  • Dr. Siegfried Nijssen (Co-promotor)
  • Prof. Bettina Berendt
  • Prof. Maurice Bruynooghe
  • Prof. Patrick De Causmaecker
  • Prof. Barry O'Sullivan, University College Cork, Ireland
  • Prof. Pascal Van Hentenryck, Brown University, Providence, USA