LEADER 00000cam a2200000Mu 4500
001    OR_on1202569355
003    OCoLC
005    20231017213018.0
006    m d
007    cr n |||
008    201011s2020 xx o ||| 0 und d
040    |a VT2 |b eng |c VT2 |d EBLCP |d STF |d ERF |d LDP |d TOH |d VT2 |d KSU |d UPM |d OCLCQ
066    |c Grek |c (S
020    |a 9781098114831
020    |a 1098114833
029 1  |a AU@ |b 000070607057
035    |a (OCoLC)1202569355
049    |a UAMI
100 1  |a Winder, Phil.
245 10 |a Reinforcement Learning |h [electronic resource] / |c Phil Winder.
260    |a [S.l.] : |b O'Reilly Media, Inc., |c 2020.
300    |a 1 online resource
500    |a Title from content provider.
505 0  |a Intro -- Copyright -- Table of Contents -- Preface -- Objective -- Who Should Read This Book? -- Guiding Principles and Style -- Prerequisites -- Scope and Outline -- Supplementary Materials -- Conventions Used in This Book -- Acronyms -- Mathematical Notation -- Fair Use Policy -- O'Reilly Online Learning -- How to Contact Us -- Acknowledgments -- Chapter 1. Why Reinforcement Learning? -- Why Now? -- Machine Learning -- Reinforcement Learning -- When Should You Use RL? -- RL Applications -- Taxonomy of RL Approaches -- Model-Free or Model-Based -- How Agents Use and Update Their Strategy
505 8  |a Discrete or Continuous Actions -- Optimization Methods -- Policy Evaluation and Improvement -- Fundamental Concepts in Reinforcement Learning -- The First RL Algorithm -- Is RL the Same as ML? -- Reward and Feedback -- Reinforcement Learning as a Discipline -- Summary -- Further Reading -- Chapter 2. Markov Decision Processes, Dynamic Programming, and Monte Carlo Methods -- Multi-Arm Bandit Testing -- Reward Engineering -- Policy Evaluation: The Value Function -- Policy Improvement: Choosing the Best Action -- Simulating the Environment -- Running the Experiment
505 8  |a Speedy Q-Learning -- Accumulating Versus Replacing Eligibility Traces -- Summary -- Further Reading -- Chapter 4. Deep Q-Networks -- Deep Learning Architectures -- Fundamentals -- Common Neural Network Architectures -- Deep Learning Frameworks -- Deep Reinforcement Learning -- Deep Q-Learning -- Experience Replay -- Q-Network Clones -- Neural Network Architecture -- Implementing DQN -- Example: DQN on the CartPole Environment -- Case Study: Reducing Energy Usage in Buildings -- Rainbow DQN -- Distributional RL -- Prioritized Experience Replay -- Noisy Nets -- Dueling Networks
520    |a Reinforcement learning (RL) will deliver one of the biggest breakthroughs in AI over the next decade, enabling algorithms to learn from their environment to achieve arbitrary goals. This exciting development avoids constraints found in traditional machine learning (ML) algorithms. This practical book shows data science and AI professionals how to learn by reinforcement and enable a machine to learn by itself. Author Phil Winder of Winder Research covers everything from basic building blocks to state-of-the-art practices. You'll explore the current state of RL, focus on industrial applications, learn numerous algorithms, and benefit from dedicated chapters on deploying RL solutions to production. This is no cookbook; it doesn't shy away from math and expects familiarity with ML. Learn what RL is and how the algorithms help solve problems -- Become grounded in RL fundamentals including Markov decision processes, dynamic programming, and temporal difference learning -- Dive deep into a range of value and policy gradient methods -- Apply advanced RL solutions such as meta learning, hierarchical learning, multi-agent, and imitation learning -- Understand cutting-edge deep RL algorithms including Rainbow, PPO, TD3, SAC, and more -- Get practical examples through the accompanying website.
590    |a O'Reilly |b O'Reilly Online Learning: Academic/Public Library Edition
856 40 |u https://learning.oreilly.com/library/view/~/9781492072386/?ar |z Full text (prior registration with an institutional email address required)
880 8  |6 505-00 |a Improving the ϵ-greedy Algorithm -- Markov Decision Processes -- Inventory Control -- Inventory Control Simulation -- Policies and Value Functions -- Discounted Rewards -- Predicting Rewards with the State-Value Function -- Predicting Rewards with the Action-Value Function -- Optimal Policies -- Monte Carlo Policy Generation -- Value Iteration with Dynamic Programming -- Implementing Value Iteration -- Results of Value Iteration -- Summary -- Further Reading -- Chapter 3. Temporal-Difference Learning, Q-Learning, and n-Step Algorithms -- Formulation of Temporal-Difference Learning
880 8  |6 505-00 |a Q-Learning -- SARSA -- Q-Learning Versus SARSA -- Case Study: Automatically Scaling Application Containers to Reduce Cost -- Industrial Example: Real-Time Bidding in Advertising -- Defining the MDP -- Results of the Real-Time Bidding Environments -- Further Improvements -- Extensions to Q-Learning -- Double Q-Learning -- Delayed Q-Learning -- Comparing Standard, Double, and Delayed Q-learning -- Opposition Learning -- n-Step Algorithms -- n-Step Algorithms on Grid Environments -- Eligibility Traces -- Extensions to Eligibility Traces -- Watkins's Q(λ) -- Fuzzy Wipes in Watkins's Q(λ)
938    |a ProQuest Ebook Central |b EBLB |n EBL6386759
994    |a 92 |b IZTAP