Research Homepage of Edouard Klein

I serve science, and it is my joy.

Basile in Léonard.

Research interests

During my PhD, I developed new algorithms for Inverse Reinforcement Learning (IRL). I applied them to toy problems to provide empirical insight into their performance and inner workings. Read my thesis (in French) here.

My advisors were Matthieu Geist and Yann Guermeur, and I worked under the close supervision of Olivier Pietquin. I was part of the MaLIS and ABC research teams. The funding came from Région Lorraine through Supélec Metz. I deeply thank them all.

After tying up some loose ends (e.g. providing a clean, well-documented implementation of the algorithms I developed), I intend to leave academia for industry, to help put what I learned to good, practical use.

Main publications

SCIRL

SCIRL is an efficient algorithm (both in computation time and in data requirements) that solves the IRL problem using only data from the expert (up to the use of some heuristics). To the best of my knowledge, it is the first published algorithm to do so; other algorithms need at least non-expert data, if not access to the full model. A minimal sketch of the idea is given after the BibTeX entry below.

Edouard Klein, Matthieu Geist, Bilal Piot, and Olivier Pietquin. Inverse Reinforcement Learning through Structured Classification. In Advances in Neural Information Processing Systems (NIPS 2012), Lake Tahoe (NV, USA), December 2012.
This paper addresses the inverse reinforcement learning (IRL) problem, that is inferring a reward for which a demonstrated expert behavior is optimal. We introduce a new algorithm, SCIRL, whose principle is to use the so-called feature expectation of the expert as the parameterization of the score function of a multi-class classifier. This approach produces a reward function for which the expert policy is provably near-optimal. Contrary to most existing IRL algorithms, SCIRL does not require solving the direct RL problem. Moreover, with an appropriate heuristic, it can succeed with only trajectories sampled according to the expert behavior. This is illustrated on a car driving simulator.
		    @inproceedings{klein2012structured,
		    author = {Edouard Klein and Matthieu Geist and Bilal Piot and Olivier Pietquin},
		    title = {{Inverse Reinforcement Learning through Structured Classification}},
		    year = {2012},
		    booktitle = {{Advances in Neural Information Processing Systems (NIPS 2012)}},
		    month = {December},
		    address = {Lake Tahoe (NV, USA)},
		    url = {http://rdklein.fr/research/papers/klein2012stuctured.pdf}
		    }
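For concreteness, here is a minimal sketch of the SCIRL idea in Python. It is not the implementation from the paper: the feature expectation is estimated by plain Monte Carlo along a single expert trajectory, falling back to phi(s) for non-expert actions is a crude stand-in for the heuristics discussed in the paper, and the structured large-margin solver is replaced by a simple multiclass perceptron. The toy problem at the bottom and all the names are illustrative assumptions.

# SCIRL-style sketch (illustrative only; see the hedges above).
import numpy as np

def expert_feature_expectations(trajectory, phi, gamma, n_actions):
    """Monte-Carlo estimate of mu(s, a) along one expert trajectory.
    The expert action gets the discounted tail sum of phi; the other
    actions fall back to phi(s) (a crude heuristic)."""
    states, actions = zip(*trajectory)
    feats = np.array([phi(s) for s in states])          # shape (T, d)
    T, d = feats.shape
    mu = np.zeros((T, n_actions, d))
    tail = np.zeros(d)
    for t in reversed(range(T)):
        tail = feats[t] + gamma * tail                  # discounted sum of phi from t onwards
        mu[t, :, :] = feats[t]                          # heuristic for non-expert actions
        mu[t, actions[t], :] = tail                     # Monte-Carlo estimate for the expert action
    return mu, np.array(actions)

def scirl(mu, expert_actions, n_iter=50, lr=0.1):
    """Multiclass step: learn theta so that theta . mu(s, a) is maximised
    by the expert action (plain perceptron updates instead of the
    structured large-margin solver of the paper).  theta doubles as the
    reward parameter: R(s) = theta . phi(s)."""
    T, n_actions, d = mu.shape
    theta = np.zeros(d)
    for _ in range(n_iter):
        for t in range(T):
            a_star = int(np.argmax(mu[t] @ theta))      # classifier's decision in state t
            if a_star != expert_actions[t]:
                theta += lr * (mu[t, expert_actions[t]] - mu[t, a_star])
    return theta

if __name__ == "__main__":
    # Hypothetical toy chain: 5 states, 2 actions, the expert always picks action 1.
    phi = lambda s: np.eye(5)[s]
    expert_trajectory = [(0, 1), (1, 1), (2, 1), (3, 1), (4, 1)]
    mu, acts = expert_feature_expectations(expert_trajectory, phi, gamma=0.9, n_actions=2)
    print("reward weights:", np.round(scirl(mu, acts), 2))

The trick is that, for a linear reward R(s) = theta . phi(s), the expert's action-value function is exactly theta . mu(s, a); a theta that makes the classifier agree with the expert therefore yields a reward for which the expert policy is (as shown in the paper) near-optimal.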
		  

CSI

CSI is another efficient IRL algorithm. It has the same capabilities as SCIRL but is slightly easier to implement, and its theoretical bound takes a more standard form. A minimal sketch of the cascade is given after the BibTeX entry below.

Edouard Klein, Bilal Piot, Matthieu Geist, and Olivier Pietquin. A cascaded supervised learning approach to inverse reinforcement learning. In Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML/PKDD 2013), Prague (Czech Republic), September 2013.
This paper considers the Inverse Reinforcement Learning (IRL) problem, that is inferring a reward function for which a demonstrated expert policy is optimal. We propose to break the IRL problem down into two generic Supervised Learning steps: this is the Cascaded Supervised IRL (CSI) approach. A classification step that defines a score function is followed by a regression step providing a reward function. A theoretical analysis shows that the demonstrated expert policy is near-optimal for the computed reward function. Not needing to repeatedly solve a Markov Decision Process (MDP) and the ability to leverage existing techniques for classification and regression are two important advantages of the CSI approach. It is furthermore empirically demonstrated to compare positively to state-of-the-art approaches when using only transitions sampled according to the expert policy, up to the use of some heuristics. This is exemplified on two classical benchmarks (the mountain car problem and a highway driving simulator).
              
              @inproceedings{klein2013cascading,
              author = {Edouard Klein and Bilal Piot and Matthieu Geist and Olivier Pietquin},
              title = {{A cascaded supervised learning approach to inverse reinforcement learning}},
              year = {2013},
              booktitle = {{Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML/PKDD 2013)}},
              month = {September},
              address = {Prague (Czech Republic)},
              url = {http://rdklein.fr/research/papers/klein2013cascading.pdf}
              }
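To make the cascade concrete, here is a minimal, hypothetical Python sketch. Both generic supervised steps are instantiated with simple linear models (a multiclass perceptron for the classification step, per-action least squares for the regression step), and the regression targets invert the Bellman optimality equation on sampled transitions. The inputs (state-feature map phi, expert pairs, transitions) are assumptions for illustration, not the paper's setup.

# CSI-style sketch (illustrative only; see the hedges above).
import numpy as np

def classification_step(expert_states, expert_actions, phi, n_actions,
                        n_iter=100, lr=0.1):
    """Learn a linear score function q(s, a) = W[a] . phi(s) that ranks the
    expert action first (simple multiclass perceptron)."""
    d = phi(expert_states[0]).shape[0]
    W = np.zeros((n_actions, d))
    for _ in range(n_iter):
        for s, a in zip(expert_states, expert_actions):
            a_star = int(np.argmax(W @ phi(s)))
            if a_star != a:
                W[a] += lr * phi(s)
                W[a_star] -= lr * phi(s)
    return lambda s, a: float(W[a] @ phi(s))

def regression_step(transitions, q, phi, gamma, n_actions):
    """Fit a linear reward r(s, a) = theta[a] . phi(s) on the targets
    q(s, a) - gamma * max_a' q(s', a') computed on sampled transitions
    (s, a, s'), i.e. the Bellman optimality equation read backwards."""
    d = phi(transitions[0][0]).shape[0]
    X, y, acts = [], [], []
    for s, a, s_next in transitions:
        X.append(phi(s))
        y.append(q(s, a) - gamma * max(q(s_next, b) for b in range(n_actions)))
        acts.append(a)
    X, y, acts = np.array(X), np.array(y), np.array(acts)
    theta = np.zeros((n_actions, d))
    for a in range(n_actions):                          # one least-squares fit per action
        mask = acts == a
        if mask.any():
            theta[a], *_ = np.linalg.lstsq(X[mask], y[mask], rcond=None)
    return lambda s, a: float(theta[a] @ phi(s))

Any off-the-shelf classifier exposing a score function and any regressor could replace the two linear models above, which is exactly the modularity the abstract emphasizes.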
		  

LSTD-μ

LSTD-μ is a direct adaptation of LSTD-Q (Lagoudakis and Parr) that computes the feature expectation μ of a policy in a batch, off-policy manner. A minimal sketch is given after the BibTeX entry below.

Edouard Klein, Matthieu Geist, and Olivier Pietquin. Batch, Off-policy and Model-free Apprenticeship Learning. In Proceedings of the European Workshop on Reinforcement Learning (EWRL 2011), Lecture Notes in Computer Science (LNCS), 12 pages, Athens (Greece), September 2011. Springer Verlag - Heidelberg Berlin.
This paper addresses the problem of apprenticeship learning, that is learning control policies from demonstration by an expert. An efficient framework for it is inverse reinforcement learning (IRL). Based on the assumption that the expert maximizes a utility function, IRL aims at learning the underlying reward from example trajectories. Many IRL algorithms assume that the reward function is linearly parameterized and rely on the computation of some associated feature expectations, which is done through Monte Carlo simulation. However, this assumes access to full trajectories for the expert policy as well as at least a generative model for intermediate policies. In this paper, we introduce a temporal difference method, namely LSTD-μ, to compute these feature expectations. This allows extending apprenticeship learning to a batch and off-policy setting.
		    @inproceedings{klein2011batch,
		    author = {Edouard Klein and Matthieu Geist and Olivier Pietquin},
		    title = {{Batch, Off-policy and Model-free Apprenticeship Learning}},
		    year = {2011},
		    booktitle = {{Proceedings of the European Workshop on Reinforcement Learning (EWRL 2011)}},
		    publisher = {Springer Verlag - Heidelberg Berlin},
		    pages = {12 pages},
		    month = {September},
		    series = {Lecture Notes in Computer Science (LNCS)},
		    address = {Athens (Greece)},
		    url = {http://rdklein.fr/research/papers/klein2011batch.pdf}
		    }
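The adaptation is natural because each component of the feature expectation μ is the value of the evaluated policy when the corresponding state feature is treated as the reward, so the batch, off-policy machinery of LSTD-Q carries over unchanged. The following hypothetical Python sketch estimates all components at once from a batch of transitions (s, a, s'); the feature maps phi and xi, the policy pi and the regularisation are illustrative assumptions, not the paper's choices.

# LSTD-mu sketch (illustrative only; see the hedges above).
import numpy as np

def lstd_mu(transitions, pi, phi, xi, gamma, reg=1e-6):
    """Estimate Omega such that mu_pi(s, a) ~= Omega.T @ xi(s, a), from an
    arbitrary (hence off-policy) batch of transitions (s, a, s').  This is
    LSTD-Q with the vector of state features phi(s) playing the role of
    the reward, one column of Omega per feature."""
    d_xi = xi(*transitions[0][:2]).shape[0]
    d_phi = phi(transitions[0][0]).shape[0]
    A = reg * np.eye(d_xi)                     # small ridge term for invertibility
    B = np.zeros((d_xi, d_phi))
    for s, a, s_next in transitions:
        x = xi(s, a)
        x_next = xi(s_next, pi(s_next))        # bootstrap with the evaluated policy pi
        A += np.outer(x, x - gamma * x_next)
        B += np.outer(x, phi(s))               # the "reward" here is the state feature vector
    Omega = np.linalg.solve(A, B)
    return lambda s, a: Omega.T @ xi(s, a)     # estimated feature expectation mu_pi(s, a)

With μ estimated this way, feature-expectation-based apprenticeship learning (Abbeel and Ng style algorithms) can be run in a batch, off-policy setting, which is the point of the paper.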
		  

Source code and data

The code and data for the experiments in the aforementioned papers are available in this GitHub repo. They are of very limited use because of the lack of documentation and comments. A good tutorial on SCIRL is nonetheless available.

I hope to find the time and energy to properly implement and document LSTD-µ, SCIRL and CSI, if possible as part of a well-known machine learning library.

All publications

[1] Edouard Klein, Bilal Piot, Matthieu Geist, and Olivier Pietquin. Classification structurée pour l'apprentissage par renforcement inverse. Revue d'Intelligence Artificielle, May 2013. [ bib | .pdf ]
[2] Edouard Klein, Bilal Piot, Matthieu Geist, and Olivier Pietquin. Apprentissage par renforcement inverse en cascadant classification et régression. In Journées Francophones de Planification, Décision et Apprentissage (JFPDA), 2013. [ bib | .pdf ]
[3] Edouard Klein, Bilal Piot, Matthieu Geist, and Olivier Pietquin. A cascaded supervised learning approach to inverse reinforcement learning. In Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML/PKDD 2013), Prague (Czech Republic), September 2013. [ bib | .pdf ]
[4] Matthieu Geist, Edouard Klein, Yann Guermeur, and Olivier Pietquin. Cascading and merging supervised learning for inverse reinforcement learning. Technical report, 2013. [ bib ]
[5] Matthieu Geist, Edouard Klein, Bilal Piot, Yann Guermeur, and Olivier Pietquin. Around inverse reinforcement learning and score-based classification. In Reinforcement Learning and Decision Making Meetings, 2013. [ bib ]
[6] Laurent Bougrain, Matthieu Duvinage, and Edouard Klein. Inverse reinforcement learning to control a robotic arm using a brain-computer interface. Technical report, eNTERFACE Summer Workshop, 2012. [ bib | .pdf ]
[7] Edouard Klein, Bilal Piot, Matthieu Geist, and Olivier Pietquin. Classification structurée pour l'apprentissage par renforcement inverse. Revue d'Intelligence Artificielle, 2013. [ bib ]
[8] Edouard Klein, Matthieu Geist, Bilal Piot, and Olivier Pietquin. Inverse Reinforcement Learning through Structured Classification. In Advances in Neural Information Processing Systems (NIPS 2012), Lake Tahoe (NV, USA), December 2012. [ bib | .pdf ]
[9] Edouard Klein, Bilal Piot, Matthieu Geist, and Olivier Pietquin. Classification structurée pour l'apprentissage par renforcement inverse. In Actes de la Conférence Francophone sur l'Apprentissage Automatique (CAp 2012), Nancy, France, 2012. To appear. [ bib | .pdf ]
[10] Edouard Klein, Bilal Piot, Matthieu Geist, and Olivier Pietquin. Structured Classification for Inverse Reinforcement Learning. In European Workshop on Reinforcement Learning (EWRL 2012), Edinburgh (UK), June 2012. [ bib | .pdf ]
[11] Edouard Klein, Matthieu Geist, and Olivier Pietquin. Reducing the dimensionality of the reward space in the Inverse Reinforcement Learning problem. In Proceedings of the IEEE Workshop on Machine Learning Algorithms, Systems and Applications (MLASA 2011), 4 pages, Honolulu (USA), December 2011. [ bib | .pdf ]
[12] Edouard Klein, Matthieu Geist, and Olivier Pietquin. Batch, Off-policy and Model-free Apprenticeship Learning. In Proceedings of the European Workshop on Reinforcement Learning (EWRL 2011), Lecture Notes in Computer Science (LNCS), 12 pages, Athens (Greece), September 2011. Springer Verlag - Heidelberg Berlin. [ bib | .pdf ]
[13] Edouard Klein, Matthieu Geist, and Olivier Pietquin. Apprentissage par imitation étendu au cas batch, off-policy et sans modèle. In Sixièmes Journées Francophones de Planification, Décision et Apprentissage pour la conduite de systèmes (JFPDA 2011), 9 pages, Rouen (France), June 2011. [ bib | .pdf ]
[14] Edouard Klein, Matthieu Geist, and Olivier Pietquin. Batch, Off-policy and Model-Free Apprenticeship Learning. In IJCAI Workshop on Agents Learning Interactively from Human Teachers (ALIHT 2011), Barcelona (Spain), July 2011. 6 pages. [ bib | .pdf ]
