SDC, Data Science and Knowledge

Ali Ayadi


PhD student in the SDC team (formerly BFO team) of the ICube laboratory of the University of Strasbourg since May 2015.


ICube Laboratory
Télécom Physique Strasbourg
300 bd Sébastien Brant - CS 10413
F - 67412 Illkirch cedex
Office: C325
Phone: +33 (0) 3 68 85 45 78
Email: ali.ayadi (at) unistra (dot) fr


PhD Thesis

Title: Semantic technologies for the optimization of complex molecular networks

Promotor: Cecilia Zanni-Merk (Senior Tenured Associate Professor, ICube-SDC and INSA Strasbourg)

Co-advisors: François de Bertrand de Beuvron (Tenured Associate Professor, ICube-SDC and INSA Strasbourg) and Julie Thompson (CNRS Research Director, ICube-CSTB)

Overview: This PhD thesis is being prepared as part of a Franco-Tunisian cotutelle between the SDC Team, in collaboration with the LBGI Team, and the University of Tunis. The project aims to develop a platform for optimising the transittability of complex biomolecular networks by offering transition-steering mechanisms that drive these networks from an unexpected state to a desired state.

Many complex systems of scientific interest can be represented as networks in which pairs of nodes are connected by edges or arcs. Because of the interactions among nodes in a network, perturbing some nodes can affect other nodes, which may cause a state transition of the network. In practice, unexpected state transitions of a complex system are often observed (for example, from a normal state to an abnormal state). Here we are interested in how to effectively steer the system from an unexpected state to a desired state by applying suitable input control signals. The main purpose of this project is to provide a platform based on two strong points of the Data Mining Theme of the SDC team of ICube: semantic technologies on the one hand, and combinatorial optimisation tools on the other hand.
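The steering idea described above can be illustrated with a toy example. The sketch below is not the platform developed in the thesis: the three-node Boolean network, its update rules, and the helper names step and steer are all hypothetical, chosen only to show how clamping one node with a control signal can move a network from an unexpected state to a desired one.

```python
# Hypothetical 3-node Boolean network (an assumption for illustration only).
# Each rule computes a node's next value from the current state.
RULES = {
    "a": lambda s: s["a"],             # a keeps its value unless controlled
    "b": lambda s: s["a"],             # b copies a
    "c": lambda s: s["a"] and s["b"],  # c turns on when a and b are both on
}


def step(state, control=None):
    """Synchronous update; a control signal overrides the listed nodes."""
    nxt = {node: rule(state) for node, rule in RULES.items()}
    if control:
        nxt.update(control)
    return nxt


def steer(state, target, control, max_steps=10):
    """Apply the same control at every step.

    Returns the number of steps needed to reach the target state,
    or None if it is not reached within max_steps.
    """
    for t in range(max_steps + 1):
        if state == target:
            return t
        state = step(state, control)
    return None


unexpected = {"a": False, "b": False, "c": False}
desired = {"a": True, "b": True, "c": True}

steps_with_control = steer(unexpected, desired, control={"a": True})
steps_without_control = steer(unexpected, desired, control=None)
```

In this toy network, forcing node a on is enough for the change to propagate to b and then c, whereas without any control signal the network stays in its initial state.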

Relational data mining is a subfield of data mining where data is not represented according to the classic attribute-value model, in which every row of a single table represents a training instance with its properties, including the attribute to predict. Here, data is represented by several tables linked by foreign keys, which represent the different kinds of objects constituting the problem. One table, called the main table, contains the training instances (for instance, molecules) with the attribute to learn, and other tables (for instance, a table of the atoms constituting the molecules) contain the secondary objects linked to the main ones. We intend to take into account the properties of such secondary objects in the learning process on the main objects. One way to do so, in which we are particularly interested, is the use of complex aggregates. They aggregate the secondary objects linked to one main object that meet a certain condition. More intuitively, they make it possible to summarize the secondary table in a single value. Two examples of such an aggregate would be the number of carbon atoms in the molecule, or the average charge of the oxygen atoms of the molecule. However, the number of possibilities for the aggregate condition and the aggregate function makes the exhaustive generation of all complex aggregates intractable. One of the goals of the PhD thesis is to propose a heuristic that explores the complex aggregate space and incrementally generates the aggregates that are relevant to the given problem.
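The two aggregates mentioned above (number of carbon atoms, average charge of the oxygen atoms) can be sketched as a condition plus an aggregate function applied to the secondary table. The table contents, column names, and the helper complex_aggregate are assumptions made for illustration; they are not taken from the thesis.

```python
from statistics import mean

# Hypothetical secondary table: one row per atom, linked to its molecule
# through the foreign key "molecule_id". All values are made up.
atoms = [
    {"molecule_id": 1, "element": "C", "charge": -0.1},
    {"molecule_id": 1, "element": "O", "charge": -0.4},
    {"molecule_id": 1, "element": "O", "charge": -0.6},
    {"molecule_id": 2, "element": "C", "charge": 0.2},
    {"molecule_id": 2, "element": "C", "charge": 0.1},
]


def complex_aggregate(rows, molecule_id, condition, function):
    """Summarize in one value the secondary objects of one main object
    that satisfy the given condition."""
    selected = [
        r for r in rows
        if r["molecule_id"] == molecule_id and condition(r)
    ]
    return function(selected)


def count(rows):
    return len(rows)


def average_charge(rows):
    # No matching secondary object: the aggregate is undefined.
    return mean(r["charge"] for r in rows) if rows else None


def is_carbon(row):
    return row["element"] == "C"


def is_oxygen(row):
    return row["element"] == "O"


carbon_count_m1 = complex_aggregate(atoms, 1, is_carbon, count)
oxygen_charge_m1 = complex_aggregate(atoms, 1, is_oxygen, average_charge)
oxygen_charge_m2 = complex_aggregate(atoms, 2, is_oxygen, average_charge)
```

Each (condition, function) pair yields one new attribute on the main table. Even in this tiny setting the number of possible pairs grows quickly, which is why exhaustive generation becomes intractable on real data.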

The other domain on which this PhD thesis focuses is multi-class cost-sensitive learning. In this kind of problem, the attribute to learn can take more than two values, unlike the binary problems for which many learning algorithms are designed. Moreover, classification errors do not all have the same cost, as expected in a medical domain, where diagnosing a disease in a healthy patient does not have the same impact as failing to diagnose it in a sick patient. In this framework, we are particularly interested in binarization approaches, which reduce a multi-class problem to several binary problems. More specifically, we consider the case where the binarization uses scorers, the scores being used to set decision thresholds between the two classes of each binary subproblem.
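As a minimal sketch of how costs can drive a decision threshold in one binary subproblem, the example below assumes that the scorer outputs a calibrated probability of the positive class and that correct decisions cost nothing. Under those assumptions, which the text above does not state, predicting positive is cheaper in expectation when p * cost_fn >= (1 - p) * cost_fp. The function names and cost values are hypothetical.

```python
def cost_threshold(cost_fp, cost_fn):
    """Threshold on a calibrated probability that minimizes expected cost.

    Rearranging p * cost_fn >= (1 - p) * cost_fp gives
    p >= cost_fp / (cost_fp + cost_fn).
    """
    return cost_fp / (cost_fp + cost_fn)


def decide(score, cost_fp, cost_fn):
    """Return True (positive class) when the score reaches the threshold."""
    return score >= cost_threshold(cost_fp, cost_fn)


# Hypothetical costs: missing a disease is taken to be nine times worse
# than a false alarm.
threshold = cost_threshold(cost_fp=1, cost_fn=9)
flag_patient = decide(0.2, cost_fp=1, cost_fn=9)
flag_with_equal_costs = decide(0.2, cost_fp=1, cost_fn=1)
```

With asymmetric costs the threshold moves away from 0.5, so a patient with a score of 0.2 is flagged here even though the positive class is not the most probable one.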


Teaching assistant at the UFR Mathématiques-Informatique (Department of Mathematics and Computer Science) and at the Faculté de Géographie et d'Aménagement (Faculty of Geography and Planning) of the University of Strasbourg.


  • L1/MathInfo Computer Science S1: Computer and internet certificate (C2i)
  • Master1/GE-OTG Computer Science S2: Spatial databases and SQL (PostgreSQL)
  • L1/MathInfo Computer Science S2: Databases and SQL (Oracle)
  • L1/MathInfo Computer Science S2: Object-Oriented Programming (OCaml)