Exploiting environment configurability in reinforcement learning

| Classification: | Electronic Book |
| --- | --- |
| Format: | Electronic eBook |
| Language: | English |
| Published: | Amsterdam, Netherlands : IOS Press, 2022. |
| Series: | Frontiers in artificial intelligence and applications ; v. 361 |
| Online Access: | Full Text |
Table of Contents:
- Intro
- Title page
- Abstract
- Contents
- List of Figures
- List of Tables
- List of Algorithms
- List of Symbols and Notation
- Acknowledgments
- Introduction
- What is Reinforcement Learning?
- Why Environment Configurability?
- Original Contributions
- Overview
- Foundations of Sequential Decision-Making
- Introduction
- Markov Decision Processes
- Markov Reward Processes
- Markov Chains
- Performance Indexes
- Value Functions
- Optimality Criteria
- Exact Solution Methods
- Reinforcement Learning Algorithms
- Temporal Difference Methods
- Function Approximation
- Policy Search
- Modeling Environment Configurability
- Configurable Markov Decision Processes
- Introduction
- Motivations and Examples
- Definition
- Value Functions
- Bellman Equations and Operators
- Taxonomy
- Related Literature
- Solution Concepts for Conf-MDPs
- Cooperative Setting
- Non-Cooperative Setting
- Learning in Cooperative Configurable Markov Decision Processes
- Learning in Finite Cooperative Conf-MDPs
- Introduction
- Relative Advantage Functions
- Performance Improvement Bound
- Safe Policy Model Iteration
- Theoretical Analysis
- Experimental Evaluation
- Examples of Conf-MDPs
- Learning in Continuous Conf-MDPs
- Introduction
- Solving Parametric Conf-MDPs
- Relative Entropy Model Policy Search
- Theoretical Analysis
- Approximation of the Transition Model
- Experiments
- Applications of Configurable Markov Decision Processes
- Policy Space Identification
- Introduction
- Generalized Likelihood Ratio Test
- Policy Space Identification in a Fixed Environment
- Analysis for the Exponential Family
- Policy Space Identification in a Configurable Environment
- Connections with Existing Work
- Experimental Results
- Control Frequency Adaptation
- Introduction
- Persisting Actions in MDPs
- Bounding the Performance Loss
- Persistent Fitted Q-Iteration
- Persistence Selection
- Related Works
- Experimental Evaluation
- Open Questions
- Discussion and Conclusions
- Modeling Environment Configurability
- Learning in Conf-MDPs
- Applications of Conf-MDPs
- Appendices
- Additional Results and Proofs
- Additional Results and Proofs of Chapter 6
- Additional Results and Proofs of Chapter 7
- Additional Results and Proofs of Chapter 8
- Additional Results and Proofs of Chapter 9
- Exponential Family Policies
- Gaussian and Boltzmann Linear Policies as Exponential Family Distributions
- Fisher Information Matrix
- Subgaussianity Assumption
- Bibliography