"Je sers la science et c'est ma joie." ("I serve science, and that is my joy.")
Basile, in Léonard.
During my PhD, I developed new algorithms for Inverse Reinforcement Learning. I applied them to toy problems to provide empirical insight into their performance and inner workings. Read my thesis (in French) here.
My advisors were Matthieu Geist and Yann Guermeur. I worked under the close supervision of Olivier Pietquin. I was part of the MaLIS and ABC research teams. The funding came from the Région Lorraine through Supélec Metz. I deeply thank them all.
After tying up some loose ends (e.g. providing a clean, usable implementation of the algorithms I developed), I intend to flee academia for industry, to put the wonders I learned to good, practical use.
SCIRL is an efficient algorithm (both in computation time and in data requirements) that solves the IRL problem with data from the expert only (up to the use of some heuristics). To the best of my knowledge, it is the first published algorithm to do so. Other algorithms need at least non-expert data, if not access to the whole model.
@inproceedings{klein2012structured,
  author    = {Edouard Klein and Matthieu Geist and Bilal Piot and Olivier Pietquin},
  title     = {{Inverse Reinforcement Learning through Structured Classification}},
  booktitle = {{Advances in Neural Information Processing Systems (NIPS 2012)}},
  year      = {2012},
  month     = {December},
  address   = {Lake Tahoe (NV, USA)},
  url       = {http://rdklein.fr/research/papers/klein2012stuctured.pdf}
}
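The core idea, training a multiclass classifier whose score function is linear in the expert's feature expectations so that the learned weights define a reward, can be sketched as follows. This is a minimal illustration only: the function names, the 0/1 margin term, and the plain subgradient solver are my own simplifications, not the paper's exact procedure, and `mu` is assumed to be an already-estimated expert feature expectation.

```python
import numpy as np

def scirl(expert_pairs, mu, actions, n_iter=200, lr=0.1):
    """Illustrative sketch of SCIRL's core idea (names are assumptions).

    Large-margin structured classification whose score function is
    theta . mu(s, a), where mu estimates the expert's feature
    expectation. The learned theta defines the reward R(s) = theta . phi(s).

    expert_pairs: list of (state, expert_action) demonstrations;
    mu: (state, action) -> R^p estimated feature expectation;
    actions: finite action set.
    """
    p = mu(*expert_pairs[0]).shape[0]
    theta = np.zeros(p)
    for _ in range(n_iter):
        grad = np.zeros(p)
        for s, a_exp in expert_pairs:
            # Margin-augmented decoding: a 0/1 loss term penalizes
            # actions that differ from the expert's.
            scores = [theta @ mu(s, a) + (a != a_exp) for a in actions]
            a_star = actions[int(np.argmax(scores))]
            grad += mu(s, a_star) - mu(s, a_exp)
        theta -= lr * grad / len(expert_pairs)
    return theta  # reward weights: R(s) = theta @ phi(s)
```

On a toy problem where two actions have orthogonal feature expectations, the learned weights score the expert's action strictly higher than the alternative.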
CSI is another efficient IRL algorithm. It possesses the same capabilities as SCIRL but is slightly easier to implement. The theoretical bound is more standard.
@inproceedings{klein2013cascading,
  author    = {Edouard Klein and Bilal Piot and Matthieu Geist and Olivier Pietquin},
  title     = {{A cascaded supervised learning approach to inverse reinforcement learning}},
  booktitle = {{Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML/PKDD 2013)}},
  year      = {2013},
  month     = {September},
  address   = {Prague (Czech Republic)},
  url       = {http://rdklein.fr/research/papers/klein2013cascading.pdf}
}
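The cascade can be sketched in a few lines: a score-based classifier trained on expert pairs yields a score function q(s, a), which is treated as an expert Q-function and inverted through the Bellman optimality equation to produce reward targets for a subsequent regression. This is a hedged illustration of the second stage only; the names and the exact form of the inversion are my simplification, not the paper's precise procedure.

```python
def csi_reward_targets(q_score, transitions, gamma=0.9):
    """Illustrative sketch of CSI's second stage (names are assumptions).

    q_score(s, a): score function of any multiclass classifier trained
    on expert pairs in the first stage, treated as the expert's Q-function.
    transitions: (s, a, s', available_next_actions) samples.
    Returns (input, target) pairs to feed to any regressor, which then
    generalizes the reward to unseen states.
    """
    data = []
    for s, a, s_next, next_actions in transitions:
        # Bellman inversion: r = Q(s, a) - gamma * max_a' Q(s', a')
        target = q_score(s, a) - gamma * max(q_score(s_next, b) for b in next_actions)
        data.append(((s, a), target))
    return data
```

Any off-the-shelf classifier/regressor pair can fill the two stages, which is what makes the approach easy to implement.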
LSTD-μ is a direct adaptation of the LSTDQ algorithm of Lagoudakis and Parr: it computes the feature expectation μ of a policy in a batch, off-policy manner.
@inproceedings{klein2011batch,
  author    = {Edouard Klein and Matthieu Geist and Olivier Pietquin},
  title     = {{Batch, Off-policy and Model-free Apprenticeship Learning}},
  booktitle = {{Proceedings of the European Workshop on Reinforcement Learning (EWRL 2011)}},
  series    = {Lecture Notes in Computer Science (LNCS)},
  publisher = {Springer Verlag - Heidelberg Berlin},
  pages     = {12 pages},
  year      = {2011},
  month     = {September},
  address   = {Athens (Greece)},
  url       = {http://rdklein.fr/research/papers/klein2011batch.pdf}
}
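A minimal sketch of the idea: LSTDQ estimates Q-function weights by solving a linear system whose right-hand side accumulates ψ(s, a)·r; replacing the scalar reward r with the reward feature vector φ(s) amounts to running one LSTDQ pass per reward feature, i.e. solving a matrix-valued system whose solution parameterizes the feature expectation. All names and signatures below are illustrative assumptions, not the paper's API.

```python
import numpy as np

def lstd_mu(transitions, policy, phi, psi, gamma=0.9):
    """Illustrative sketch of LSTD-mu (names are assumptions).

    Estimates mu^pi(s, a) = E[sum_t gamma^t phi(s_t)] from batch,
    off-policy samples by solving the LSTDQ system once per reward feature.

    transitions: list of (s, a, s') samples from any behavior policy;
    policy: the evaluated policy, state -> action;
    phi: reward features, state -> R^p;
    psi: Q-features, (state, action) -> R^q.
    """
    q = psi(transitions[0][0], transitions[0][1]).shape[0]
    p = phi(transitions[0][0]).shape[0]
    A = np.zeros((q, q))
    B = np.zeros((q, p))
    for s, a, s_next in transitions:
        feat = psi(s, a)
        next_feat = psi(s_next, policy(s_next))
        A += np.outer(feat, feat - gamma * next_feat)
        # LSTDQ would accumulate feat * r here; the scalar reward is
        # replaced by the whole reward feature vector phi(s).
        B += np.outer(feat, phi(s))
    W = np.linalg.solve(A + 1e-6 * np.eye(q), B)  # small ridge for stability
    return lambda s, a: W.T @ psi(s, a)  # estimated mu^pi(s, a)
```

On a small chain MDP with one-hot features, the returned estimate matches the analytic discounted feature occupancy of the evaluated policy.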
The code and data for the experiments in the aforementioned papers are available in this GitHub repo. They are of limited use because of the lack of documentation and comments. A good tutorial on SCIRL is nonetheless available.
I hope I will find the time and energy to properly implement and document LSTD-µ, SCIRL and CSI, if possible as part of a well-known machine learning library.
[1] Edouard Klein, Bilal Piot, Matthieu Geist, and Olivier Pietquin. Classification structurée pour l'apprentissage par renforcement inverse. Revue d'Intelligence Artificielle, May 2013.
[2] Edouard Klein, Bilal Piot, Matthieu Geist, and Olivier Pietquin. Apprentissage par renforcement inverse en cascadant classification et régression. In Journées Francophones de Planification, Décision et Apprentissage (JFPDA), 2013.
[3] Edouard Klein, Bilal Piot, Matthieu Geist, and Olivier Pietquin. A cascaded supervised learning approach to inverse reinforcement learning. In Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML/PKDD 2013), Prague (Czech Republic), September 2013.
[4] Matthieu Geist, Edouard Klein, Yann Guermeur, and Olivier Pietquin. Cascading and merging supervised learning for inverse reinforcement learning. Technical report, 2013.
[5] Matthieu Geist, Edouard Klein, Bilal Piot, Yann Guermeur, and Olivier Pietquin. Around inverse reinforcement learning and score-based classification. In Reinforcement Learning and Decision Making Meetings, 2013.
[6] Laurent Bougrain, Matthieu Duvinage, and Edouard Klein. Inverse reinforcement learning to control a robotic arm using a brain-computer interface. Technical report, eNTERFACE Summer Workshop, 2012.
[7] Edouard Klein, Matthieu Geist, Bilal Piot, and Olivier Pietquin. Inverse Reinforcement Learning through Structured Classification. In Advances in Neural Information Processing Systems (NIPS 2012), Lake Tahoe (NV, USA), December 2012.
[8] Edouard Klein, Bilal Piot, Matthieu Geist, and Olivier Pietquin. Classification structurée pour l'apprentissage par renforcement inverse. In Actes de la Conférence Francophone sur l'Apprentissage Automatique (CAp 2012), Nancy, France, 2012.
[9] Edouard Klein, Bilal Piot, Matthieu Geist, and Olivier Pietquin. Structured Classification for Inverse Reinforcement Learning. In European Workshop on Reinforcement Learning (EWRL 2012), Edinburgh (UK), June 2012.
[10] Edouard Klein, Matthieu Geist, and Olivier Pietquin. Reducing the dimensionality of the reward space in the Inverse Reinforcement Learning problem. In Proceedings of the IEEE Workshop on Machine Learning Algorithms, Systems and Applications (MLASA 2011), 4 pages, Honolulu (USA), December 2011.
[11] Edouard Klein, Matthieu Geist, and Olivier Pietquin. Batch, Off-policy and Model-free Apprenticeship Learning. In Proceedings of the European Workshop on Reinforcement Learning (EWRL 2011), Lecture Notes in Computer Science (LNCS), 12 pages, Athens (Greece), September 2011. Springer Verlag - Heidelberg Berlin.
[12] Edouard Klein, Matthieu Geist, and Olivier Pietquin. Apprentissage par imitation étendu au cas batch, off-policy et sans modèle. In Sixièmes Journées Francophones de Planification, Décision et Apprentissage pour la conduite de systèmes (JFPDA 2011), 9 pages, Rouen (France), June 2011.
[13] Edouard Klein, Matthieu Geist, and Olivier Pietquin. Batch, Off-policy and Model-Free Apprenticeship Learning. In IJCAI Workshop on Agents Learning Interactively from Human Teachers (ALIHT 2011), Barcelona (Spain), July 2011. 6 pages.
This list was generated by bibtex2html 1.96.