
Deep learning and the game of Go

The ancient strategy game of Go is an incredible case study for AI. In 2016, a deep learning-based system shocked the Go world by defeating a world champion. Shortly after that, the upgraded AlphaGo Zero crushed the original bot by using deep reinforcement learning to master the game. Now, you can l...


Bibliographic Details
Classification: Electronic book
Main Authors: Pumperla, Max (Author), Ferguson, Kevin (Author)
Format: Electronic eBook
Language: English
Published: Shelter Island, NY : Manning Publications Co., [2019]
Subjects:
Online Access: Full text (requires prior registration with an institutional email address)
Table of Contents:
  • Intro
  • Deep Learning and the Game of Go
  • Max Pumperla and Kevin Ferguson
  • Copyright
  • Dedication
  • Brief Table of Contents
  • Table of Contents
  • front matter
  • Foreword
  • Preface
  • Acknowledgments
  • About this book
  • Who should read this book
  • Roadmap
  • About the code
  • Book forum
  • About the authors
  • About the cover illustration
  • Part 1. Foundations
  • 1 Toward deep learning: a machine-learning introduction
  • 1.1. What is machine learning?
  • 1.1.1. How does machine learning relate to AI?
  • 1.1.2. What you can and can't do with machine learning
  • 1.2. Machine learning by example
  • 1.2.1. Using machine learning in software applications
  • 1.2.2. Supervised learning
  • 1.2.3. Unsupervised learning
  • 1.2.4. Reinforcement learning
  • 1.3. Deep learning
  • 1.4. What you'll learn in this book
  • 1.5. Summary
  • 2 Go as a machine-learning problem
  • 2.1. Why games?
  • 2.2. A lightning introduction to the game of Go
  • 2.2.1. Understanding the board
  • 2.2.2. Placing and capturing stones
  • 2.2.3. Ending the game and counting
  • 2.2.4. Understanding ko
  • 2.3. Handicaps
  • 2.4. Where to learn more
  • 2.5. What can we teach a machine?
  • 2.5.1. Selecting moves in the opening
  • 2.5.2. Searching game states
  • 2.5.3. Reducing the number of moves to consider
  • 2.5.4. Evaluating game states
  • 2.6. How to measure your Go AI's strength
  • 2.6.1. Traditional Go ranks
  • 2.6.2. Benchmarking your Go AI
  • 2.7. Summary
  • 3 Implementing your first Go bot
  • 3.1. Representing a game of Go in Python
  • 3.1.1. Implementing the Go board
  • 3.1.2. Tracking connected groups of stones in Go: strings
  • 3.1.3. Placing and capturing stones on a Go board
  • 3.2. Capturing game state and checking for illegal moves
  • 3.2.1. Self-capture
  • 3.2.2. Ko
  • 3.3. Ending a game
  • 3.4. Creating your first bot: the weakest Go AI imaginable
  • 3.5. Speeding up game play with Zobrist hashing
  • 3.6. Playing against your bot
  • 3.7. Summary
  • Part 2. Machine learning and game AI
  • 4 Playing games with tree search
  • 4.1. Classifying games
  • 4.2. Anticipating your opponent with minimax search
  • 4.3. Solving tic-tac-toe: a minimax example
  • 4.4. Reducing search space with pruning
  • 4.4.1. Reducing search depth with position evaluation
  • 4.4.2. Reducing search width with alpha-beta pruning
  • 4.5. Evaluating game states with Monte Carlo tree search
  • 4.5.1. Implementing Monte Carlo tree search in Python
  • 4.5.2. How to select which branch to explore
  • 4.5.3. Applying Monte Carlo tree search to Go
  • 4.6. Summary
  • 5 Getting started with neural networks
  • 5.1. A simple use case: classifying handwritten digits
  • 5.1.1. The MNIST data set of handwritten digits
  • 5.1.2. MNIST data preprocessing
  • 5.2. The basics of neural networks
  • 5.2.1. Logistic regression as a simple artificial neural network
  • 5.2.2. Networks with more than one output dimension
  • 5.3. Feed-forward networks
  • 5.4. How good are our predictions? Loss functions and optimization
  • 5.4.1. What is a loss function?
  • 5.4.2. Mean squared error
  • 5.4.3. Finding minima in loss functions
  • 5.4.4. Gradient descent to find minima
  • 5.4.5. Stochastic gradient descent for loss functions
  • 5.4.6. Propagating gradients back through your network
  • 5.5. Training a neural network step-by-step in Python
  • 5.5.1. Neural network layers in Python
  • 5.5.2. Activation layers in neural networks
  • 5.5.3. Dense layers in Python as building blocks for feed-forward networks
  • 5.5.4. Sequential neural networks with Python
  • 5.5.5. Applying your network to handwritten digit classification
  • 5.6. Summary
  • 6 Designing a neural network for Go data
  • 6.1. Encoding a Go game position for neural networks
  • 6.2. Generating tree-search games as network training data
  • 6.3. Using the Keras deep-learning library
  • 6.3.1. Understanding Keras design principles
  • 6.3.2. Installing the Keras deep-learning library
  • 6.3.3. Running a familiar first example with Keras
  • 6.3.4. Go move prediction with feed-forward neural networks in Keras
  • 6.4. Analyzing space with convolutional networks
  • 6.4.1. What convolutions do intuitively
  • 6.4.2. Building convolutional neural networks with Keras
  • 6.4.3. Reducing space with pooling layers
  • 6.5. Predicting Go move probabilities
  • 6.5.1. Using the softmax activation function in the last layer
  • 6.5.2. Cross-entropy loss for classification problems
  • 6.6. Building deeper networks with dropout and rectified linear units
  • 6.6.1. Dropping neurons for regularization
  • 6.6.2. The rectified linear unit activation function
  • 6.7. Putting it all together for a stronger Go move-prediction network
  • 6.8. Summary
  • 7 Learning from data: a deep-learning bot
  • 7.1. Importing Go game records
  • 7.1.1. The SGF file format
  • 7.1.2. Downloading and replaying Go game records from KGS
  • 7.2. Preparing Go data for deep learning
  • 7.2.1. Replaying a Go game from an SGF record
  • 7.2.2. Building a Go data processor
  • 7.2.3. Building a Go data generator to load data efficiently
  • 7.2.4. Parallel Go data processing and generators
  • 7.3. Training a deep-learning model on human game-play data
  • 7.4. Building more-realistic Go data encoders
  • 7.5. Training efficiently with adaptive gradients
  • 7.5.1. Decay and momentum in SGD
  • 7.5.2. Optimizing neural networks with Adagrad
  • 7.5.3. Refining adaptive gradients with Adadelta
  • 7.6. Running your own experiments and evaluating performance
  • 7.6.1. A guideline to testing architectures and hyperparameters
  • 7.6.2. Evaluating performance metrics for training and test data
  • 12.1.1. What is advantage?
  • 12.1.2. Calculating advantage during self-play
  • 12.2. Designing a neural network for actor-critic learning
  • 12.3. Playing games with an actor-critic agent
  • 12.4. Training an actor-critic agent from experience data
  • 12.5. Summary
  • Part 3. Greater than the sum of its parts
  • 13 AlphaGo: Bringing it all together
  • 13.1. Training deep neural networks for AlphaGo
  • 13.1.1. Network architectures in AlphaGo
  • 13.1.2. The AlphaGo board encoder
  • 13.1.3. Training AlphaGo-style policy networks
  • 13.2. Bootstrapping self-play from policy networks
  • 13.3. Deriving a value network from self-play data
  • 13.4. Better search with policy and value networks
  • 13.4.1. Using neural networks to improve Monte Carlo rollouts
  • 13.4.2. Tree search with a combined value function
  • 13.4.3. Implementing AlphaGo's search algorithm
  • 13.5. Practical considerations for training your own AlphaGo
  • 13.6. Summary
  • 14 AlphaGo Zero: Integrating tree search with reinforcement learning
  • 14.1. Building a neural network for tree search
  • 14.2. Guiding tree search with a neural network
  • 14.2.1. Walking down the tree
  • 14.2.2. Expanding the tree
  • 14.2.3. Selecting a move
  • 14.3. Training
  • 14.4. Improving exploration with Dirichlet noise
  • 14.5. Modern techniques for deeper neural networks
  • 14.5.1. Batch normalization
  • 14.5.2. Residual networks
  • 14.6. Exploring additional resources
  • 14.7. Wrapping up
  • 14.8. Summary
  • Appendix A. Mathematical foundations
  • Vectors, matrices, and beyond: a linear algebra primer
  • Vectors: one-dimensional data
  • Matrices: two-dimensional data
  • Rank 3 tensors
  • Rank 4 tensors
  • Calculus in five minutes: derivatives and finding maxima
  • Appendix B. The backpropagation algorithm
  • A bit of notation
  • The backpropagation algorithm for feed-forward networks