Composition of machine learning algorithms in Weka

Machine Learning (ML) and artificial intelligence are everywhere, and several libraries exist to support software developers who use ML techniques. However, from a newcomer's point of view, selecting the right approach is extremely complicated. How should the data be preprocessed? Which algorithm should be selected? This project proposes to develop reverse-engineering methods that help newcomers use an ML library.

We will consider here the Weka library, which contains preprocessing and ML algorithms. The idea is to automatically extract from the Weka JAR file a model representation associated with each algorithm defined in the library: which kind of algorithm it is, its properties, … This information will be inferred automatically (as much as possible) by a static analysis of the source code of the library.
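As a rough illustration of this extraction step, the sketch below lists the class entries of a JAR file and guesses the kind of each algorithm from its package. It is only a starting point under stated assumptions: the class name WekaJarScanner and the kind labels are hypothetical, and the package prefixes (weka/classifiers, weka/clusterers, weka/filters) are used as a naive heuristic. Not every class in those packages is an algorithm, so a real analysis would also have to inspect the class hierarchy and the bytecode.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

// Hypothetical helper: builds a very coarse "model" (class name -> kind) from a JAR.
final class WekaJarScanner {

    // Guess the kind of algorithm from the entry's package prefix (heuristic, assumption).
    static Optional<String> classify(String entryName) {
        // Ignore non-class resources and nested/anonymous classes.
        if (!entryName.endsWith(".class") || entryName.contains("$")) {
            return Optional.empty();
        }
        if (entryName.startsWith("weka/classifiers/")) return Optional.of("classifier");
        if (entryName.startsWith("weka/clusterers/")) return Optional.of("clusterer");
        if (entryName.startsWith("weka/filters/")) return Optional.of("filter");
        return Optional.empty();
    }

    // Convert a JAR entry name such as "a/b/C.class" into "a.b.C".
    static String toClassName(String entryName) {
        String withoutSuffix = entryName.substring(0, entryName.length() - ".class".length());
        return withoutSuffix.replace('/', '.');
    }

    // Scan every entry of the JAR and keep the ones the heuristic recognises.
    static Map<String, String> scan(Path jarPath) throws IOException {
        Map<String, String> model = new TreeMap<>();
        try (JarFile jar = new JarFile(jarPath.toFile())) {
            jar.stream()
               .map(JarEntry::getName)
               .forEach(name -> classify(name)
                   .ifPresent(kind -> model.put(toClassName(name), kind)));
        }
        return model;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: WekaJarScanner <path-to-weka.jar>");
            return;
        }
        scan(Paths.get(args[0]))
            .forEach((cls, kind) -> System.out.println(kind + "\t" + cls));
    }
}
```

Running it against a Weka JAR prints one line per recognised class. Comparing the output across several versions of the library is one simple way to see how the extracted model evolves.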

Then, from a user's point of view, it will be possible to write Java code that assembles these algorithms into a workflow, for example using a small language created in the project. Based on the information extracted from the library, the project will define a compositional model that supports the user in (i) validating their workflow and (ii) identifying properties of the global workflow based on the properties associated with each separate part of the workflow.
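One possible shape for such a compositional model is sketched below, purely as an illustration. It assumes that each step of a workflow can be described by the data properties it requires and the properties it establishes. The class WorkflowChecker, the Step type and the property names (such as "no-missing-values") are all hypothetical and are not taken from Weka. Validation (i) checks that every requirement is met by an earlier step, and the global properties (ii) are the union of what the input and the steps provide.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical compositional model: steps are checked in sequence.
final class WorkflowChecker {

    // One workflow step, described only by the properties it needs and provides.
    static final class Step {
        final String name;
        final Set<String> requires;
        final Set<String> establishes;

        Step(String name, Set<String> requires, Set<String> establishes) {
            this.name = name;
            // Sorted copies keep error messages in a deterministic order.
            this.requires = new TreeSet<>(requires);
            this.establishes = new TreeSet<>(establishes);
        }
    }

    // (i) Returns one message per unmet requirement; an empty list means "valid".
    static List<String> validate(Set<String> initial, List<Step> workflow) {
        Set<String> known = new TreeSet<>(initial);
        List<String> errors = new ArrayList<>();
        for (Step step : workflow) {
            for (String needed : step.requires) {
                if (!known.contains(needed)) {
                    errors.add(step.name + " requires '" + needed + "'");
                }
            }
            known.addAll(step.establishes);
        }
        return errors;
    }

    // (ii) Properties of the whole workflow: initial ones plus everything established.
    static Set<String> globalProperties(Set<String> initial, List<Step> workflow) {
        Set<String> known = new TreeSet<>(initial);
        for (Step step : workflow) {
            known.addAll(step.establishes);
        }
        return known;
    }

    public static void main(String[] args) {
        Step impute = new Step("impute", Set.of(), Set.of("no-missing-values"));
        Step train = new Step("train", Set.of("no-missing-values"), Set.of("trained-model"));

        // Correct order: no errors are reported.
        System.out.println(validate(Set.of(), List.of(impute, train)));
        // Wrong order: the training step runs before its requirement is met.
        System.out.println(validate(Set.of(), List.of(train, impute)));
        System.out.println(globalProperties(Set.of(), List.of(impute, train)));
    }
}
```

This sketch deliberately ignores steps that invalidate a property and workflows that branch. Both would need a richer model than a single accumulated set.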

It might be possible to extend this work into an internship.

Required Skills

No ML skills are required, but being familiar with ML might help;
A strong taste for software engineering and compilation;
Not being afraid of static code analysis.

Customer Needs

As input, the project will take the Weka library as a JAR file. It will automatically feed a compositional model used to support the user who will assemble Weka artefacts for a given purpose.

Expected Results

A static analyser for the Weka library
An experiment applying this analyser to several versions of Weka
A reasoning engine to automatically infer properties of workflows defined using Weka algorithms

References

Administrative Information

Contact: Sébastien Mosser, mosser@i3s.unice.fr
Subject ID: Y1819-S056
Team size: 2 to 3 students
Recommended tracks: AL, SD
Team: SPARKS