Ali AYADI ICube Laboratory Télécom Physique Strasbourg 300 bd Sébastien Brant - CS 10413 F - 67412 Illkirch cedex
Office: C325 Phone: +33 (0) 3 68 85 45 78 Email: ali.ayadi (at) unistra (dot) fr
Title: Semantic technologies for the optimization of complex molecular networks
Promotor: Cecilia Zanni-Merk (Senior Tenured Associate Professor, ICube-SDC and INSA Strasbourg)
Overview: This PhD thesis focuses on two strong points of the Data Mining Theme of the BFO team of ICube: relational data mining on the one hand, and cost-sensitive learning on the other. Both are currently studied as part of the European project REFRAME, in collaboration with the University of Bristol and the Polytechnic University of Valencia.
Relational data mining is a subfield of data mining in which data is not represented according to the classic attribute-value model, where every row of a single table represents one training instance with its properties, including the attribute to predict. Instead, data is stored in several tables linked by foreign keys, which represent the different kinds of objects that make up the problem. One table, called the main table, contains the training instances (for instance, molecules) with the attribute to learn, while other tables (for instance, a table of the atoms constituting the molecules) contain the secondary objects linked to the main ones. We intend to take the properties of these secondary objects into account when learning on the main objects. One way to do so, in which we are particularly interested, is the use of complex aggregates. A complex aggregate aggregates the secondary objects that are linked to one main object and that meet a certain condition. More intuitively, it summarizes the secondary table in a single value. Two examples of such aggregates are the number of carbon atoms in the molecule and the average charge of the oxygen atoms of the molecule. However, the number of possible aggregate conditions and aggregate functions makes the exhaustive generation of all complex aggregates intractable. One of the goals of the PhD thesis is to propose a heuristic that explores the space of complex aggregates and incrementally generates those that are relevant to the given problem.
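The two aggregates mentioned above can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions, not the method developed in the thesis: the table layout, the column names (molecule_id, element, charge), the sample values and the helper complex_aggregate are all hypothetical.

```python
from statistics import mean

# Hypothetical secondary table: one row per atom, linked to its molecule
# through the foreign key "molecule_id". All values are made up.
ATOMS = [
    {"molecule_id": 1, "element": "C", "charge": 0.10},
    {"molecule_id": 1, "element": "O", "charge": -0.40},
    {"molecule_id": 1, "element": "O", "charge": -0.20},
    {"molecule_id": 2, "element": "C", "charge": 0.05},
    {"molecule_id": 2, "element": "C", "charge": 0.15},
]


def complex_aggregate(rows, molecule_id, condition, function, attribute=None):
    """Summarize in one value the secondary rows of one main object.

    Only rows linked to molecule_id that satisfy condition are kept.
    If attribute is None, function receives the selected rows (e.g. len);
    otherwise it receives the values of that attribute (e.g. mean).
    Returns None when an attribute is requested but no row is selected.
    """
    selected = [
        row for row in rows
        if row["molecule_id"] == molecule_id and condition(row)
    ]
    if attribute is None:
        return function(selected)
    values = [row[attribute] for row in selected]
    return function(values) if values else None


# Number of carbon atoms in molecule 1.
carbon_count = complex_aggregate(ATOMS, 1, lambda r: r["element"] == "C", len)

# Average charge of the oxygen atoms of molecule 1.
oxygen_mean_charge = complex_aggregate(
    ATOMS, 1, lambda r: r["element"] == "O", mean, "charge"
)
```

Each choice of condition, function and attribute yields a different aggregate, which gives a sense of how quickly the space of candidates grows.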
The other domain on which this PhD thesis focuses is multi-class cost-sensitive learning. In this kind of problem, the attribute to learn can take more than two values, in contrast to the binary problems for which many learning algorithms are designed. Moreover, not all classification errors have the same cost. This is to be expected in a medical domain, where diagnosing a disease in a healthy patient does not have the same impact as failing to diagnose the disease in a sick patient. In this framework, we are particularly interested in binarization approaches, which reduce a multi-class problem to several binary problems. More specifically, we consider the case where the binarization uses scorers, whose scores are used to set decision thresholds between the two classes of each binary subproblem.
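As a rough illustration of the threshold idea, the sketch below builds one binary subproblem from a multi-class label list (a one-vs-rest reduction, chosen here only as an example of binarization) and picks the score threshold that minimizes the total misclassification cost. The function names, the scores and the costs are assumptions made for the example and do not come from the thesis.

```python
def one_vs_rest_labels(classes, target):
    """Binarize a multi-class label list: True for target, False otherwise."""
    return [c == target for c in classes]


def best_threshold(scores, labels, cost_fp, cost_fn):
    """Return (threshold, cost) minimizing the total misclassification cost.

    An instance is predicted positive when its score is >= threshold.
    Candidate thresholds are the observed scores, plus infinity
    (which predicts every instance as negative).
    """
    candidates = sorted(set(scores)) + [float("inf")]
    best = None
    for threshold in candidates:
        cost = 0.0
        for score, is_positive in zip(scores, labels):
            predicted_positive = score >= threshold
            if predicted_positive and not is_positive:
                cost += cost_fp  # false positive
            elif not predicted_positive and is_positive:
                cost += cost_fn  # false negative
        if best is None or cost < best[1]:
            best = (threshold, cost)
    return best


# Hypothetical data: true classes and the scores of a scorer for class "a".
classes = ["a", "b", "c", "a", "b"]
scores_a = [0.9, 0.4, 0.2, 0.6, 0.7]
labels_a = one_vs_rest_labels(classes, "a")

# Missing class "a" is assumed to be five times as costly as a false alarm.
threshold_a, cost_a = best_threshold(scores_a, labels_a, cost_fp=1, cost_fn=5)
```

With these made-up values the cheapest threshold is 0.6: it accepts one false positive (the instance scored 0.7) rather than missing a true instance of class "a", because a false negative is assumed to be more expensive.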
Teaching assistant at the UFR Mathématiques-Informatique (Department of Mathematics and Computer Science) and at the Faculté de Géographie et d'Aménagement (Faculty of Geography and Planning) of the University of Strasbourg.
- L1/MathInfo Computer Science S1: Computer and internet certificate (C2i)
- Master1/GE-OTG Computer Science S2: Spatial databases and SQL (PostgreSQL)
- L1/MathInfo Computer Science S2: Databases and SQL (Oracle)
- L1/MathInfo Computer Science S2: Object-Oriented Programming (OCaml)