
Distributed machine learning with Python : accelerating model training and serving with distributed systems

Chapter 2: Parameter Server and All-Reduce -- Technical requirements -- Parameter server architecture -- Communication bottleneck in the parameter server architecture -- Sharding the model among parameter servers -- Implementing the parameter server -- Defining model layers -- Defining the parameter...


Bibliographic Details
Classification: Electronic Book
Main Author: Wang, Guanhua
Format: Electronic eBook
Language: English
Published: Birmingham : Packt Publishing, Limited, 2022.
Subjects:
Online Access: Full text (requires prior registration with an institutional email)

MARC

LEADER 00000cam a22000007a 4500
001 OR_on1312162521
003 OCoLC
005 20231017213018.0
006 m o d
007 cr cnu---unuuu
008 220423s2022 enka o 000 0 eng d
040 |a EBLCP  |b eng  |e pn  |c EBLCP  |d ORMDA  |d OCLCO  |d UKMGB  |d OCLCF  |d OCLCQ  |d N$T  |d UKAHL  |d OCLCQ  |d IEEEE 
015 |a GBC274179  |2 bnb 
016 7 |a 020566484  |2 Uk 
020 |a 1801817219 
020 |a 9781801817219  |q (electronic bk.) 
020 |z 9781801815697  |q (pbk.) 
029 1 |a AU@  |b 000071607833 
029 1 |a UKMGB  |b 020566484 
035 |a (OCoLC)1312162521 
037 |a 9781801815697  |b O'Reilly Media 
037 |a 10163213  |b IEEE 
050 4 |a Q325.5 
082 0 4 |a 006.3/1  |2 23/eng/20220503 
049 |a UAMI 
100 1 |a Wang, Guanhua. 
245 1 0 |a Distributed machine learning with Python :  |b accelerating model training and serving with distributed systems /  |c Guanhua Wang. 
260 |a Birmingham :  |b Packt Publishing, Limited,  |c 2022. 
300 |a 1 online resource (284 pages) :  |b color illustrations 
336 |a text  |b txt  |2 rdacontent 
337 |a computer  |b c  |2 rdamedia 
338 |a online resource  |b cr  |2 rdacarrier 
588 0 |a Print version record. 
505 0 |a Intro -- Title page -- Copyright and Credits -- Dedication -- Contributors -- Table of Contents -- Preface -- Section 1 -- Data Parallelism -- Chapter 1: Splitting Input Data -- Single-node training is too slow -- The mismatch between data loading bandwidth and model training bandwidth -- Single-node training time on popular datasets -- Accelerating the training process with data parallelism -- Data parallelism -- the high-level bits -- Stochastic gradient descent -- Model synchronization -- Hyperparameter tuning -- Global batch size -- Learning rate adjustment -- Model synchronization schemes 
520 |a Chapter 2: Parameter Server and All-Reduce -- Technical requirements -- Parameter server architecture -- Communication bottleneck in the parameter server architecture -- Sharding the model among parameter servers -- Implementing the parameter server -- Defining model layers -- Defining the parameter server -- Defining the worker -- Passing data between the parameter server and worker -- Issues with the parameter server -- The parameter server architecture introduces a high coding complexity for practitioners -- All-Reduce architecture -- Reduce -- All-Reduce -- Ring All-Reduce. 
505 8 |a Collective communication -- Broadcast -- Gather -- All-Gather -- Summary -- Chapter 3: Building a Data Parallel Training and Serving Pipeline -- Technical requirements -- The data parallel training pipeline in a nutshell -- Input pre-processing -- Input data partition -- Data loading -- Training -- Model synchronization -- Model update -- Single-machine multi-GPUs and multi-machine multi-GPUs -- Single-machine multi-GPU -- Multi-machine multi-GPU -- Checkpointing and fault tolerance -- Model checkpointing -- Load model checkpoints -- Model evaluation and hyperparameter tuning 
505 8 |a Model serving in data parallelism -- Summary -- Chapter 4: Bottlenecks and Solutions -- Communication bottlenecks in data parallel training -- Analyzing the communication workloads -- Parameter server architecture -- The All-Reduce architecture -- The inefficiency of state-of-the-art communication schemes -- Leveraging idle links and host resources -- Tree All-Reduce -- Hybrid data transfer over PCIe and NVLink -- On-device memory bottlenecks -- Recomputation and quantization -- Recomputation -- Quantization -- Summary -- Section 2 -- Model Parallelism -- Chapter 5: Splitting the Model 
505 8 |a Technical requirements -- Single-node training error -- out of memory -- Fine-tuning BERT on a single GPU -- Trying to pack a giant model inside one state-of-the-art GPU -- ELMo, BERT, and GPT -- Basic concepts -- RNN -- ELMo -- BERT -- GPT -- Pre-training and fine-tuning -- State-of-the-art hardware -- P100, V100, and DGX-1 -- NVLink -- A100 and DGX-2 -- NVSwitch -- Summary -- Chapter 6: Pipeline Input and Layer Split -- Vanilla model parallelism is inefficient -- Forward propagation -- Backward propagation -- GPU idle time between forward and backward propagation -- Pipeline input 
500 |a Pros and cons of pipeline parallelism. 
590 |a O'Reilly  |b O'Reilly Online Learning: Academic/Public Library Edition 
650 0 |a Machine learning. 
650 0 |a Python (Computer program language) 
650 6 |a Apprentissage automatique. 
650 6 |a Python (Langage de programmation) 
650 7 |a Machine learning.  |2 fast  |0 (OCoLC)fst01004795 
650 7 |a Python (Computer program language)  |2 fast  |0 (OCoLC)fst01084736 
776 0 8 |i Print version:  |a Wang, Guanhua.  |t Distributed Machine Learning with Python.  |d Birmingham : Packt Publishing, Limited, ©2022 
856 4 0 |u https://learning.oreilly.com/library/view/~/9781801815697/?ar  |z Texto completo (Requiere registro previo con correo institucional) 
938 |a Askews and Holts Library Services  |b ASKH  |n AH39813577 
938 |a ProQuest Ebook Central  |b EBLB  |n EBL6956758 
938 |a EBSCOhost  |b EBSC  |n 3242106 
994 |a 92  |b IZTAP