
Reinforcement Learning

Reinforcement learning (RL) will deliver one of the biggest breakthroughs in AI over the next decade, enabling algorithms to learn from their environment to achieve arbitrary goals. This exciting development avoids constraints found in traditional machine learning (ML) algorithms. This practical book shows data science and AI professionals how to learn by reinforcement and enable a machine to learn by itself. Author Phil Winder of Winder Research covers everything from basic building blocks to state-of-the-art practices.

Full description

Bibliographic Details
Main Author: Winder, Phil
Format: Electronic eBook
Language: Undetermined
Published: [S.l.] : O'Reilly Media, Inc., 2020.
Online Access: Full text (requires prior registration with an institutional email address)

MARC

LEADER 00000cam a2200000Mu 4500
001 OR_on1202569355
003 OCoLC
005 20231017213018.0
006 m d
007 cr n |||
008 201011s2020 xx o ||| 0 und d
040 |a VT2  |b eng  |c VT2  |d EBLCP  |d STF  |d ERF  |d LDP  |d TOH  |d VT2  |d KSU  |d UPM  |d OCLCQ 
066 |c Grek  |c (S 
020 |a 9781098114831 
020 |a 1098114833 
029 1 |a AU@  |b 000070607057 
035 |a (OCoLC)1202569355 
049 |a UAMI 
100 1 |a Winder, Phil. 
245 1 0 |a Reinforcement Learning  |h [electronic resource] /  |c Phil Winder. 
260 |a [S.l.] :  |b O'Reilly Media, Inc.,  |c 2020. 
300 |a 1 online resource 
500 |a Title from content provider. 
505 0 |a Intro -- Copyright -- Table of Contents -- Preface -- Objective -- Who Should Read This Book? -- Guiding Principles and Style -- Prerequisites -- Scope and Outline -- Supplementary Materials -- Conventions Used in This Book -- Acronyms -- Mathematical Notation -- Fair Use Policy -- O'Reilly Online Learning -- How to Contact Us -- Acknowledgments -- Chapter 1. Why Reinforcement Learning? -- Why Now? -- Machine Learning -- Reinforcement Learning -- When Should You Use RL? -- RL Applications -- Taxonomy of RL Approaches -- Model-Free or Model-Based -- How Agents Use and Update Their Strategy 
505 8 |a Discrete or Continuous Actions -- Optimization Methods -- Policy Evaluation and Improvement -- Fundamental Concepts in Reinforcement Learning -- The First RL Algorithm -- Is RL the Same as ML? -- Reward and Feedback -- Reinforcement Learning as a Discipline -- Summary -- Further Reading -- Chapter 2. Markov Decision Processes, Dynamic Programming, and Monte Carlo Methods -- Multi-Arm Bandit Testing -- Reward Engineering -- Policy Evaluation: The Value Function -- Policy Improvement: Choosing the Best Action -- Simulating the Environment -- Running the Experiment 
505 8 |a Speedy Q-Learning -- Accumulating Versus Replacing Eligibility Traces -- Summary -- Further Reading -- Chapter 4. Deep Q-Networks -- Deep Learning Architectures -- Fundamentals -- Common Neural Network Architectures -- Deep Learning Frameworks -- Deep Reinforcement Learning -- Deep Q-Learning -- Experience Replay -- Q-Network Clones -- Neural Network Architecture -- Implementing DQN -- Example: DQN on the CartPole Environment -- Case Study: Reducing Energy Usage in Buildings -- Rainbow DQN -- Distributional RL -- Prioritized Experience Replay -- Noisy Nets -- Dueling Networks 
520 |a Reinforcement learning (RL) will deliver one of the biggest breakthroughs in AI over the next decade, enabling algorithms to learn from their environment to achieve arbitrary goals. This exciting development avoids constraints found in traditional machine learning (ML) algorithms. This practical book shows data science and AI professionals how to learn by reinforcement and enable a machine to learn by itself. Author Phil Winder of Winder Research covers everything from basic building blocks to state-of-the-art practices. You'll explore the current state of RL, focus on industrial applications, learn numerous algorithms, and benefit from dedicated chapters on deploying RL solutions to production. This is no cookbook; it doesn't shy away from math and expects familiarity with ML. Learn what RL is and how the algorithms help solve problems. Become grounded in RL fundamentals, including Markov decision processes, dynamic programming, and temporal difference learning. Dive deep into a range of value and policy gradient methods. Apply advanced RL solutions such as meta learning, hierarchical learning, multi-agent, and imitation learning. Understand cutting-edge deep RL algorithms, including Rainbow, PPO, TD3, SAC, and more. Get practical examples through the accompanying website. 
590 |a O'Reilly  |b O'Reilly Online Learning: Academic/Public Library Edition 
856 4 0 |u https://learning.oreilly.com/library/view/~/9781492072386/?ar  |z Texto completo (Requiere registro previo con correo institucional) 
880 8 |6 505-00  |a Improving the ϵ-greedy Algorithm -- Markov Decision Processes -- Inventory Control -- Inventory Control Simulation -- Policies and Value Functions -- Discounted Rewards -- Predicting Rewards with the State-Value Function -- Predicting Rewards with the Action-Value Function -- Optimal Policies -- Monte Carlo Policy Generation -- Value Iteration with Dynamic Programming -- Implementing Value Iteration -- Results of Value Iteration -- Summary -- Further Reading -- Chapter 3. Temporal-Difference Learning, Q-Learning, and n-Step Algorithms -- Formulation of Temporal-Difference Learning 
880 8 |6 505-00  |a Q-Learning -- SARSA -- Q-Learning Versus SARSA -- Case Study: Automatically Scaling Application Containers to Reduce Cost -- Industrial Example: Real-Time Bidding in Advertising -- Defining the MDP -- Results of the Real-Time Bidding Environments -- Further Improvements -- Extensions to Q-Learning -- Double Q-Learning -- Delayed Q-Learning -- Comparing Standard, Double, and Delayed Q-learning -- Opposition Learning -- n-Step Algorithms -- n-Step Algorithms on Grid Environments -- Eligibility Traces -- Extensions to Eligibility Traces -- Watkins's Q(λ) -- Fuzzy Wipes in Watkins's Q(λ) 
938 |a ProQuest Ebook Central  |b EBLB  |n EBL6386759 
994 |a 92  |b IZTAP