Reinforcement learning is a paradigm in machine learning where an agent learns an algorithm to make good decisions that are not known beforehand (unlike supervised learning, where the expected result of the model are known in advance, or unsupervised learning, where no decisions are made). Combinatorial optimisation achieves the same goal of making good decisions while relying on human experience to derive a mathematical model or to design heuristics.
I am currently involved in a team effort to bring these two worlds together. Each of them has its own drawbacks and advantages: combinatorial optimisation algorithms can make good decision in large dimension (but the computation times can grow impractical) while reinforcement learning cannot natively deal with the huge dimensionality of combinatorial sets; reinforcement learning cannot enforce constraints in the same way combinatorial optimisation models can; heuristics for combinatorial optimisations are designed by humans whereas reinforcement learning learns better policies over time.
The application domain is middle-mile logistics, a very hard combinatorial problem. Middle mile logistics is concerned by the transportation of goods between warehouses, sitting between the first and last miles of the supply chain. The major constraint is timing (goods must be present at the destination hub at the right time for the next step of the supply chain). While first and last miles assign one shipment to a single vehicle and its route through customers, middle mile is about finding a path for one shipment through several vehicles along predefined routes. We tried approaching the problem as a whole (finding a good path for each shipment through the entire graph) – as presented at a NeurIPS workshop and at the EURO conference – or decomposing it and solving the individual components using reinforcement learning (finding feasible paths in a time-expanded graph and covering the shipments with paths). I also presented a summary of published and unpublished results at a CNRS and Google joint event.
Along the way, we also developed an instance generator for middle-mile logistics: MilleMiglia. This generator can create new random instances with any level of hardness; it does so while defining an exchangeable data format to compare solvers easily (in the same way routing problems has formats such as TSPLIB, CARP, NEARP, or LiLim). This work has been presented at the EURO conference.
List of publications:
Applications of Reinforcement Learning to Combinatorial Optimization. Thibaut Cuvelier, Bruno De Backer, Onno Eberhard, Aymane Lotfi. CNRS + Google, May 2024. PDF.
A Novel Instance Generator for Simulating Middle-Mile Logistics Networks. Aymane Lotfi, Petris Matteo, Thibaut Cuvelier, Bruno De Backer, Claudia Archetti. 33rd European Conference on Operational Research. HAL (PDF).
A Tale of Middle-Mile Logistics, Graph Neural Networks, and Reinforcement Learning. Onno Eberhard, Thibaut Cuvelier, Bruno De Backer. 33rd European Conference on Operational Research. HAL (PDF).
Middle-Mile Logistics Through the Lens of Goal-Conditioned Reinforcement Learning. Onno Eberhard, Thibaut Cuvelier, Michal Valko, Bruno De Backer. NeurIPS 2023 Workshop on Goal-Conditioned Reinforcement Learning, November 2023. HAL (PDF).
Machine-learning algorithms are often based on optimisation techniques, but do not always take the most of them. For instance, in reinforcement learning, the combinatorial-bandit paradigm corresponds to situations where the agent makes decision that are chosen in a combinatorial space (a router decides for the path of an incoming packet, a website decides which ads should be shown, etc.). These decisions have a very specific structure, as they are made of many individual components (for instance, a path is a sequence of edges: the cost of the path is obtained as the sum of the cost of each edge). This field generalises online combinatorial optimisation.
Most algorithms for combinatorial bandits use mathematical-optimisation tools in some way. However, current algorithms either have an excellent operational performance (but taking a decision takes a very long time: these algorithms include ESCB and OSSB) or have great computational properties (but the solutions they take can be arbitrarily worse than those played by the first algorithms, like Thompson sampling or CUCB). More precisely, the first kind of algorithm typically has an exponential computational complexity (with respect to the dimension of the problem, i.e. the number of components of a solution).
Developing new advanced tools in the field of mathematical optimisation (solving nonlinear programs using budgeted linear problems), I reduce the bandit-problem complexity to polynomial in many useful cases, for two state-of-the-art algorithms (ESCB and OSSB). The result of this work is available as open-source software (under an MIT license). The techniques behind a polynomial-time implementation of ESCB have been presented at the SNAPP seminar, in an arXiv preprint, and at the ACM SIGMETRICS conference. The ones for OSSB have been presented at the ALT conference.
List of publications:
Polynomial-Time Algorithms for Combinatorial Semibandits: Computationally Tractable Reinforcement Learning in Complex Environments (official title: Algorithmes en temps polynomial pour les semi-bandits combinatoires : apprentissage par renforcement efficace dans des environnements complexes). Thibaut Cuvelier. PhD dissertation, university Paris-Saclay, June 2021. HAL (PDF).
Polynomial-Time Combinatorial Bandits: Computationally Tractable Reinforcement Learning in Complex Environments. Thibaut Cuvelier. Amazon Transport Services, Luxembourg (Luxembourg), March 2021. PDF.
Asymptotically Optimal Strategies For Combinatorial Semi-Bandits in Polynomial Time. Thibaut Cuvelier, Richard Combes, Éric Gourdin. Algorithmic Learning Theory (ALT), Paris (France), March 2021. Journal of Machine Learning. arXiv (PDF). HAL (PDF). Presentation at ALT 2021.
Statistically Efficient, Polynomial Time Algorithms for Combinatorial Semi Bandits. Thibaut Cuvelier, Richard Combes, Éric Gourdin. ACM SIGMETRICS 2021, Beijing (China), June 2021. ACM Digital Library. DOI. Kudos. arXiv (PDF). HAL (PDF).
Statistically Efficient, Polynomial Time Algorithms for Combinatorial Semi Bandits. Thibaut Cuvelier, Richard Combes, Éric Gourdin. ACM SIGMETRICS 2021, Beijing (China), June 2021. ACM Digital Library. DOI. HAL (PDF).
Active learning is the field of supervised machine learning where the learning algorithm can query an oracle (a human user or another information source) for new labels of previously unlabelled samples. In particular, in stream-based active learning, samples are sequentially presented to the algorithm. This field is highly related to optimal experimental design.
We propose a new algorithm, RAL (reinforced active learning), for stream-based active learning based on reinforcement learning, more specifically on contextual bandits with expert advice, that can outperform current techniques. RAL uses bandit algorithms to merge several existing active-learning algorithms in an adaptive ensemble. We provide a Python implementation of RAL. We studied RAL’s performance and provided a theoretical analysis at the IAL (interactive adaptive learning) workshop.
List of publications:
Adaptive and Reinforcement Learning Approaches for Online Network Monitoring and Analysis. Sarah Wassermann, Thibaut Cuvelier, Pavol Mulinka, Pedro Casas. IEEE Transactions on Network and Service Management, vol. 18, no. 2, June 2021. IEEE Xplore. HAL (PDF).
RAL — Reinforcement Active Learning for Network Traffic Monitoring and Analysis. Sarah Wassermann, Thibaut Cuvelier, Pedro Casas. Proceedings of the ACM SIGCOMM Conference Posters and Demos, online, August 2020. HAL (PDF).
ADAM & RAL: Adaptive Memory Learning and Reinforcement Active Learning for Network Monitoring. Sarah Wassermann, Thibaut Cuvelier, Pedro Casas, Pavol Mulinka. 15th International Conference on Network and Service Management (CNSM) 2019, Halifax (Canada), October 2019. HAL (PDF).
Improving Stream-Based Active Learning with Reinforcement Learning. Sarah Wassermann, Thibaut Cuvelier, Pedro Casas. Workshop for Women in Machine Learning (WiML) 2019. HAL (PDF).
RAL: Improving Stream-Based Active Learning by Reinforcement Learning. Sarah Wassermann, Thibaut Cuvelier, Pedro Casas. European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD) Workshop on Interactive Adaptive Learning (IAL), Würzburg (Germany), September 2019. IAL 2019 proceedings. HAL (PDF).
In computer networks, traffic engineering concerns the efficient use of existing network equipment so that users get the best experience when using the network. Most of the parameters have some level of uncertainty: for instance, operators cannot predict with an extremely high accuracy the traffic that their network will have to withstand. Therefore, this uncertainty must be built into the mathematical models that are used when making traffic-engineering decisions.
There are several ways to cope with the variability of parameters, such as robust optimisation. Oblivious routing is an often-disregarded paradigm, especially due to its computational cost. Other approaches include fairness, with the idea that a fair solution should withstand more traffic.
My implementation of many network-routing models is freely available on GitHub in the Seleroute.jl package (under an MIT license). Preliminary have been presented at the TMA conference. Details of the Julia implementation have been discussed at the Julia Days 2023.
List of publications:
Seleroute.jl, a generic package for network-routing optimisation. Thibaut Cuvelier. Julia and Optimization Days 2023. HAL (PDF).
Oblivious Routing: Static Routing Prepared Against Network Traffic and Link Failures. Thibaut Cuvelier, Éric Gourdin. Network Traffic Measurement and Analysis (TMA) PhD School 2019, Paris (France), June 2019. HAL (PDF).
Oblivious Routing: Worst-Case Routing is not Breaking the Internet’s Legs. Thibaut Cuvelier, Éric Gourdin. Network Traffic Measurement and Analysis (TMA) PhD School 2018, Vienna (Austria), June 2018. ORBi (PDF). HAL (PDF).
The InduStore research project develops a methodology that helps industrialists add flexibility when using their plants, especially regarding the price of electricity on the day-ahead market.
These developments started with easing the modelling of industrial processes with generic block-diagram models and their associated mathematical formulation. Thanks to this kind of models, those processes can be used in mathematical-programming-based solutions for optimising plants. Currently, a seemingly good candidate for this modelling is the reservoir, which can model many kinds of processes.
Once the process models are developed, they should be integrated within a complete plant model to actually exploit the flexibility. Outside the processes, two elements must be considered: the order book, but also the workers, as they are at the heart of the production. Industrial sites tend to use shift work to be able to produce twenty-four hours a day; however, this requires to organise the shifts according to legal constraints. Our approach considers the two faces of the same coin: production and shift work. The first one exploits the flexibility, while the second optimises the schedule for the workers’ well-being.
The source code of the integrated models is freely available on GitHub (under an MIT license). The production-HR coupling has been presented at the COMEX workshop and at the IFORS conference. The modelling methodology has been presented at the DS3 summer school.
List of publications:
Embedding reservoirs in industrial models to exploit their flexibility. Thibaut Cuvelier. SN Applied Sciences, volume 2, article number 2171, December 2020. Springer. DOI. Kudos. arXiv (PDF). HAL (PDF).
Characterising Industrial Sites’ Flexibility with Reservoir Models. Thibaut Cuvelier. DS3 Data Science Summer School (École Polytechnique), Paris (France), August 2017. ORBi (PDF).
Modelling the industrial flexibility from the electricity consumption and HR points of view. Thibaut Cuvelier, Quentin Louveaux. 22nd Belgian Mathematical Optimization Workshop, COMEX (combinatorial optimisation: metaheuristics and exact methods), La Roche-en-Ardenne (Belgium), April 2017. ORBi (PDF).
Optimising workforce and energy costs by exploiting production flexibility. Thibaut Cuvelier, Quentin Louveaux. 21st Conference of the International Federation of Operational Research Societies (IFORS), Québec (Canada), July 2017. ORBi (PDF).
Current dam management techniques that use optimisation either are based on mathematical programming and have a low level of detail, or heavily use metaheuristics to include physical phenomena. The latter completely lose the traditional convergence properties of mathematical optimisation, and both still lack provably reliable uncertainty modelling.
More specifically, this research aims to include flooding models inside mathematical programming while including uncertainty. Floods are usually described by nonlinear differential models, and the goal is to make them fit into convex optimisation frameworks.
For now, the models have a rather low level of detail, but include uncertainty in the inflow.
The Julia source code of the integrated models is freely available on GitHub (under an MIT license). The uncertain models have been compared in an article published in the Water Resources Management journal.
List of publications:
Operation rules of the Vesdre reservoir revisited. Benjamin Dewals, Thibaut Cuvelier, Pierre Archambeau, Sébastien Erpicum, Michel Pirotton, Quentin Louveaux. 6th International Symposium on Hydrological Modelling of the Meuse basin, September 2019. ORBi (PDF).
Comparison Between Robust and Stochastic Optimisation for Long-term Reservoir Operations Under Uncertainty. Thibaut Cuvelier, Pierre Archambeau, Benjamin Dewals, Quentin Louveaux. Water Resources Management, vol. 32, no. 5, pp. 1599–1614, March 2018. Springer. arXiv (PDF). ORBi (PDF). HAL (PDF).
Formal optimisation can take into account the uncertainty in two main ways: either stochastic or robust programming. Both paradigms have their own strengths, based on quite different theoretical notions: stochastic optimisation considers the uncertain parameters are described by their probability density functoin, while robust optimisation regards them as belonging to an uncertainty set.
Those two approaches are quite different, but have rarely seen a direct comparison, be it for their actual performance when solving the resulting optimisation problems, but also for the robustness of their solution against unforeseen uncertainty.
My master’s thesis was exactly about this comparison. It was presented at the ORBEL30 conference. The full text is also available online.
List of publications:
Comparing Oblivious and Robust Routing Approaches. Thibaut Cuvelier, Éric Gourdin. Programme Gaspard Monge pour l’optimisation, la recherche opérationnelle et leurs interactions avec les sciences des données (PGMO Days) 2018, November 2018. ORBi (PDF).
Optimisation and uncertainty: comparing stochastic and robust programming. Thibaut Cuvelier. 30th Annual Conference of the Belgian Operational Research Society (ORBEL), Louvain-la-Neuve (Belgium), January 2016. ORBi (PDF).
Implementing and Comparing Stochastic and Robust Programming. Thibaut Cuvelier. Engineering degree dissertation, university of Liège, June 2015. ORBi (PDF).
This book was written for anyone wanting to develop graphical user interfaces in Python, starting from a simple dialog to very elaborate applications with menus, toolbars, resizing, database access, etc. The reader is supposed to have little to no knowledge about Qt, but to be knowledgeable about Python and object-oriented programming, especially inheritance.
Qt is a very complete multiplatform framework, written in C++. PyQt is a binding between Python and Qt, bringing the full Python environment to Qt.
This book relies on PyQt version 5.6, which will be maintained in the long term, while also mentioning the differences with the next version, PyQt 5.7.
Two programming paradigms are proposed: the imperative way, that assembles widgets, and the declarative way, with the QML language (Qt Quick). The same example application (library management) is developed with the two methodologies. To go further, a third part details interactive 2D rendering with graphics views (within a Qt Widget application) and different advanced rendering techniques for Qt Quick (Canvas, Qt 3D).
Eric6 was chosen as default development environment. You will also use Qt Creator to develop in QML.
ISBN-13: 978-2-8227-0518-9.
Qt 5 was out at the end of 2012 and marks a new step in multiplatform programmin. Taking into account the evolution of application development, the framework was refactored as a large and modular toolbox that can be used to create any kind of application, either for desktop or mobile devices.
This book aims at covering all fundamental tools of Qt 5.
As a newcomer or a seasoned user, as a designer or a developer, the book will guide you through the first steps with the new version of the framework. Through rich and various examples, it gives you all the keys to create applications, help you find the best graphical modules, take the most out of Qt Creator, optimise your development, or succeed in migrating from Qt 4. A large part of the book is dedicated to Qt QUick and QML, being the most prominent addition to Qt.
Like the framework, this book is modular, meaning that it is made up of several relatively autonomous units, focused on specific items. Its objective is to directly answer learning goals and use cases, so that the reader may quickly find the most relevant bits.
This book will present:
ISBN-13: 978-2-8227-0108-2.
The World Wide Web has enabled the creation of a global information space comprising linked documents. As the Web becomes ever more enmeshed with our daily lives, there is a growing desire for direct access to raw data not currently available on the Web or bound up in hypertext documents. Linked data provides a publishing paradigm in which not only documents, but also data, can be a first class citizen of the Web, thereby enabling the extension of the Web with a global data space based on open standards - the Web of data. In this Synthesis lecture we provide readers with a detailed technical introduction to linked data. We begin by outlining the basic principles of linked data, including coverage of relevant aspects of Web architecture. The remainder of the text is based around two main themes - the publication and consumption of linked data. Drawing on a practical linked data scenario, we provide guidance and best practices on: architectural approaches to publishing linked data; choo ing URIs and vocabularies to identify and describe resources; deciding what data to return in a description of a resource on the Web; methods and frameworks for automated linking of data sets; and testing and debugging approaches for linked data deployments. We give an overview of existing linked data applications and then examine the architectures that are used to consume linked data from the Web, alongside existing tools and frameworks that enable these. Readers can expect to gain a rich technical understanding of linked data fundamentals, as the basis for application development, research or further study.
This book is a translation of Christian Bizer and Tom Heath’s Linked data: evolving the Web into a global data space.
ISBN-13: 978-2-7440-2519-8.