SDC, Data Science and Knowledge

Difference between revisions of "Pei Zhang"

From SDC, Data Science and Knowledge
Line 17: Line 17:
 
== PhD Thesis ==
 
== PhD Thesis ==
  
'''Title: ''' Quantification in Relational Data Mining
+
'''Title: ''' Capitalisation of experience in inventive design studies
  
'''Promotor: ''' [[Nicolas_Lachiche|Nicolas Lachiche]] (Tenured Senior Associate Professor, ICube-SDC)
+
'''Promotor: ''' [[Cecilia|Zanni-Merk]] (Senior Tenured Associate Professor, ICube-SDC)
  
'''Co-advisor: ''' [[Agnès_Braud|Agnès Braud]] (Tenured Associate Professor, ICube-SDC)
 
  
'''Funding: ''' Grant from the French Ministry of Higher Education and Research
+
'''Co-advisor: ''' [[Denis|CAVALLUCCI]] (Professor, LGeco)
  
'''Overview: ''' This PhD thesis focuses on two strong points of the Data Mining Theme of the BFO team of ICube: relational data mining on the one hand, and cost-sensitive learning on the other. These two points are currently studied as part of the European project [http://www.reframe-d2k.org/index.php/Main_Page REFRAME], in collaboration with the [http://www.bris.ac.uk University of Bristol] and the [http://www.upv.es/index-en.html Polytechnic University of Valencia].
 
  
Relational data mining is a subfield of data mining in which data is not represented according to the classic attribute-value model, where every row of a single table represents a training instance with its properties, including the attribute to predict. Instead, data is represented by several tables linked by foreign keys, which represent the different kinds of objects constituting the problem. One table, called the main table, contains the training instances (for instance, molecules) with the attribute to learn, and other tables (for instance, a table of the atoms constituting the molecules) contain the secondary objects linked to the main ones. We intend to take the properties of such secondary objects into account when learning on the main objects. One way to do so, in which we are particularly interested, is the use of complex aggregates. They aggregate the secondary objects that are linked to one main object and meet a certain condition. More intuitively, they summarize the secondary table in a single value. Two examples of such an aggregate are the number of carbon atoms in the molecule and the average charge of the oxygen atoms of the molecule. However, the number of possible aggregate conditions and aggregate functions makes the exhaustive generation of all complex aggregates intractable. One of the goals of the PhD thesis is to propose a heuristic that explores the complex aggregate space and incrementally generates the aggregates that are relevant to the given problem.
+
'''Funding: ''' Grant from the China Scholarship Council
  
The other domain on which this PhD thesis focuses is multi-class cost-sensitive learning. In this kind of problem, the attribute to learn can take more than two values, unlike the binary problems for which many learning algorithms are designed. Moreover, not all classification errors have the same cost: in a medical domain, for example, diagnosing a disease in a healthy patient does not have the same impact as failing to diagnose the disease in a sick patient. In this framework, we are particularly interested in binarization approaches, which reduce a multi-class problem to several binary problems. More specifically, we consider the case where the binarization uses scorers, the scores being used to set decision thresholds between the two classes of the binary subproblems.
+
'''Overview: ''' This PhD will be dedicated to building, testing and implementing new algorithms that investigate large quantities of data from the web and extract useful information from it, in order to support engineers inventing new objects at the early stages of the innovation pipeline.
 +
 
 +
Standards (ISO) related to innovation are now likely to emerge worldwide. How will R&D departments (re)organize themselves to systematically produce inventions upstream of the innovation chain? How can new tools be created to support teams in charge of breakthrough projects? The LGéCo (Design Engineering Lab.) is interested in theories of invention (such as TRIZ) and in how they could, in the age of Big Data and FabLabs, serve as a link between the knowledge that humanity continuously produces and the way its use could assist idea generation followed by quick prototyping. This new practical and pragmatic way of inventing, theoretically grounded but governed by performance rules and efficiency expectations, is the core target of our research in Inventive Design. Knowledge management in Inventive Design, as defined by our laboratories, is crucial to assist engineers when inventing new objects in the innovation pipeline. It has specific characteristics and requires the selection of certain pieces of knowledge which can induce evolutions; it leads to a reformulation of the initial problem in order to build an abstract model of the object concerned, and includes three main steps: • The “formulation” phase, where the expert uses different tools to express the problem in the form of a contradiction network or another model. • The “abstract solution finding” phase, where different knowledge bases are accessed to obtain one or more solution models. In this step, TRIZ users generally need wide experience with the TRIZ knowledge sources, and must be able to choose the appropriate abstract solution for the current abstract problem. • The “interpretation” phase, where these solution models are instantiated with the help of the scientific-engineering effects knowledge base, to obtain one or more solutions to be implemented in the real world.
Different knowledge sources exist to solve different types of inventive problems, such as the 40 inventive principles for eliminating technical contradictions and the 11 separation principles for eliminating physical contradictions. These knowledge sources are all built independently of the application field, and their levels of abstraction differ widely, which makes them quite complicated to use.
 +
 
 +
Our previous work has developed a framework for a new architecture (knowledge and rules) for managing data (currently semi-automatically) and partially filtering the data appropriate to specific engineering studies. The outcomes of this project will allow this general architecture to be finalized by incorporating experience and meta-knowledge to guide the use of the domain knowledge, the rules and the experience, in order to manage data completely, populate the inventive design ontology and test the impact of this new knowledge on inventive studies.
  
 
= Teaching =
 
= Teaching =

Revision as of 14:40, 8 January 2016

PhD student in the SDC team (formerly BFO team) of the ICube laboratory of the University of Strasbourg since October 2012.

Contact

Pei Zhang
ICube Laboratory
Télécom Physique Strasbourg
300 bd Sébastien Brant - CS 10413
F - 67412 Illkirch cedex
Office: C320
Phone: +33 (0) 7 68 16 61 08
Email: peizhang (at) unistra (dot) fr

Research

PhD Thesis

Title: Capitalisation of experience in inventive design studies

Promotor: Zanni-Merk (Senior Tenured Associate Professor, ICube-SDC)


Co-advisor: CAVALLUCCI (Professor, LGéCo)


Funding: Grant from the China Scholarship Council

Overview: This PhD will be dedicated to building, testing and implementing new algorithms that investigate large quantities of data from the web and extract useful information from it, in order to support engineers inventing new objects at the early stages of the innovation pipeline.

Standards (ISO) related to innovation are now likely to emerge worldwide. How will R&D departments (re)organize themselves to systematically produce inventions upstream of the innovation chain? How can new tools be created to support teams in charge of breakthrough projects? The LGéCo (Design Engineering Lab.) is interested in theories of invention (such as TRIZ) and in how they could, in the age of Big Data and FabLabs, serve as a link between the knowledge that humanity continuously produces and the way its use could assist idea generation followed by quick prototyping. This new practical and pragmatic way of inventing, theoretically grounded but governed by performance rules and efficiency expectations, is the core target of our research in Inventive Design.

Knowledge management in Inventive Design, as defined by our laboratories, is crucial to assist engineers when inventing new objects in the innovation pipeline. It has specific characteristics and requires the selection of certain pieces of knowledge which can induce evolutions; it leads to a reformulation of the initial problem in order to build an abstract model of the object concerned, and includes three main steps:

  • The “formulation” phase, where the expert uses different tools to express the problem in the form of a contradiction network or another model.
  • The “abstract solution finding” phase, where different knowledge bases are accessed to obtain one or more solution models. In this step, TRIZ users generally need wide experience with the TRIZ knowledge sources, and must be able to choose the appropriate abstract solution for the current abstract problem.
  • The “interpretation” phase, where these solution models are instantiated with the help of the scientific-engineering effects knowledge base, to obtain one or more solutions to be implemented in the real world.

Different knowledge sources exist to solve different types of inventive problems, such as the 40 inventive principles for eliminating technical contradictions and the 11 separation principles for eliminating physical contradictions. These knowledge sources are all built independently of the application field, and their levels of abstraction differ widely, which makes them quite complicated to use.

Our previous work has developed a framework for a new architecture (knowledge and rules) for managing data (currently semi-automatically) and partially filtering the data appropriate to specific engineering studies. The outcomes of this project will allow this general architecture to be finalized by incorporating experience and meta-knowledge to guide the use of the domain knowledge, the rules and the experience, in order to manage data completely, populate the inventive design ontology and test the impact of this new knowledge on inventive studies.

Teaching

Teaching assistant at the UFR Mathématiques-Informatique (department of Mathematics and Computer Science) and at the IUT Robert Schuman (University Institute of Technology) of the University of Strasbourg.

2014/2015:

  • IUT Computer Science S1: Databases and SQL (10h TD/28h TP)
  • IUT Computer Science S1: Introduction to Algorithmics and Programming (26h TP)

2013/2014:

  • IUT Computer Science S1: Databases and SQL (10h TD/28h TP)
  • IUT Computer Science S1: Data Structures and Fundamental Algorithms (14h TD/14h TP)

2012/2013:

  • L3/S6P Mathematics: Object-Oriented Programming (18h TD/12h TP)
  • L3/S5P Computer Science: Databases 2 (22h TP)
  • L3/S5P Computer Science: Operating Systems Basis (12h TP)

Publications

<anyweb>http://icube-publis.unistra.fr/?author=Charnay&=#hideMenu</anyweb>