LEADER  00000cam a22000007a 4500
001     OR_on1312162521
003     OCoLC
005     20231017213018.0
006     m o d
007     cr cnu---unuuu
008     220423s2022 enka o 000 0 eng d
040     |a EBLCP |b eng |e pn |c EBLCP |d ORMDA |d OCLCO |d UKMGB |d OCLCF |d OCLCQ |d N$T |d UKAHL |d OCLCQ |d IEEEE
015     |a GBC274179 |2 bnb
016 7   |a 020566484 |2 Uk
020     |a 1801817219
020     |a 9781801817219 |q (electronic bk.)
020     |z 9781801815697 |q (pbk.)
029 1   |a AU@ |b 000071607833
029 1   |a UKMGB |b 020566484
035     |a (OCoLC)1312162521
037     |a 9781801815697 |b O'Reilly Media
037     |a 10163213 |b IEEE
050  4  |a Q325.5
082 04  |a 006.3/1 |2 23/eng/20220503
049     |a UAMI
100 1   |a Wang, Guanhua.
245 10  |a Distributed machine learning with Python : |b accelerating model training and serving with distributed systems / |c Guanhua Wang.
260     |a Birmingham : |b Packt Publishing, Limited, |c 2022.
300     |a 1 online resource (284 pages) : |b color illustrations
336     |a text |b txt |2 rdacontent
337     |a computer |b c |2 rdamedia
338     |a online resource |b cr |2 rdacarrier
588 0   |a Print version record.
505 0   |a Intro -- Title page -- Copyright and Credits -- Dedication -- Contributors -- Table of Contents -- Preface -- Section 1 -- Data Parallelism -- Chapter 1: Splitting Input Data -- Single-node training is too slow -- The mismatch between data loading bandwidth and model training bandwidth -- Single-node training time on popular datasets -- Accelerating the training process with data parallelism -- Data parallelism -- the high-level bits -- Stochastic gradient descent -- Model synchronization -- Hyperparameter tuning -- Global batch size -- Learning rate adjustment -- Model synchronization schemes
505 8   |a Chapter 2: Parameter Server and All-Reduce -- Technical requirements -- Parameter server architecture -- Communication bottleneck in the parameter server architecture -- Sharding the model among parameter servers -- Implementing the parameter server -- Defining model layers -- Defining the parameter server -- Defining the worker -- Passing data between the parameter server and worker -- Issues with the parameter server -- The parameter server architecture introduces a high coding complexity for practitioners -- All-Reduce architecture -- Reduce -- All-Reduce -- Ring All-Reduce.
505 8   |a Collective communication -- Broadcast -- Gather -- All-Gather -- Summary -- Chapter 3: Building a Data Parallel Training and Serving Pipeline -- Technical requirements -- The data parallel training pipeline in a nutshell -- Input pre-processing -- Input data partition -- Data loading -- Training -- Model synchronization -- Model update -- Single-machine multi-GPUs and multi-machine multi-GPUs -- Single-machine multi-GPU -- Multi-machine multi-GPU -- Checkpointing and fault tolerance -- Model checkpointing -- Load model checkpoints -- Model evaluation and hyperparameter tuning
505 8   |a Model serving in data parallelism -- Summary -- Chapter 4: Bottlenecks and Solutions -- Communication bottlenecks in data parallel training -- Analyzing the communication workloads -- Parameter server architecture -- The All-Reduce architecture -- The inefficiency of state-of-the-art communication schemes -- Leveraging idle links and host resources -- Tree All-Reduce -- Hybrid data transfer over PCIe and NVLink -- On-device memory bottlenecks -- Recomputation and quantization -- Recomputation -- Quantization -- Summary -- Section 2 -- Model Parallelism -- Chapter 5: Splitting the Model
505 8   |a Technical requirements -- Single-node training error -- out of memory -- Fine-tuning BERT on a single GPU -- Trying to pack a giant model inside one state-of-the-art GPU -- ELMo, BERT, and GPT -- Basic concepts -- RNN -- ELMo -- BERT -- GPT -- Pre-training and fine-tuning -- State-of-the-art hardware -- P100, V100, and DGX-1 -- NVLink -- A100 and DGX-2 -- NVSwitch -- Summary -- Chapter 6: Pipeline Input and Layer Split -- Vanilla model parallelism is inefficient -- Forward propagation -- Backward propagation -- GPU idle time between forward and backward propagation -- Pipeline input
500     |a Pros and cons of pipeline parallelism.
590     |a O'Reilly |b O'Reilly Online Learning: Academic/Public Library Edition
650  0  |a Machine learning.
650  0  |a Python (Computer program language)
650  6  |a Apprentissage automatique.
650  6  |a Python (Langage de programmation)
650  7  |a Machine learning. |2 fast |0 (OCoLC)fst01004795
650  7  |a Python (Computer program language) |2 fast |0 (OCoLC)fst01084736
776 08  |i Print version: |a Wang, Guanhua. |t Distributed Machine Learning with Python. |d Birmingham : Packt Publishing, Limited, ©2022
856 40  |u https://learning.oreilly.com/library/view/~/9781801815697/?ar |z Full text (requires prior registration with an institutional email address)
938     |a Askews and Holts Library Services |b ASKH |n AH39813577
938     |a ProQuest Ebook Central |b EBLB |n EBL6956758
938     |a EBSCOhost |b EBSC |n 3242106
994     |a 92 |b IZTAP