Intel Xeon Phi coprocessor high-performance programming
Authors Jim Jeffers and James Reinders spent two years helping educate customers about the prototype and pre-production hardware before Intel introduced the first Intel Xeon Phi coprocessor. They have distilled their own experiences coupled with insights from many expert customers, Intel Field Engin...
Classification: | Electronic Book |
---|---|
Main Author: | |
Other Authors: | |
Format: | Electronic eBook |
Language: | English |
Published: | Waltham, MA : Morgan Kaufmann/Elsevier, ©2013. |
Subjects: | |
Online Access: | Full text (requires prior registration with an institutional email address) |
Table of Contents:
- Front Cover; Intel® Xeon Phi™ Coprocessor High-Performance Programming; Copyright Page; Contents; Foreword; Preface; Organization; Lots-of-cores.com; Acknowledgements; 1 Introduction; Trend: more parallelism; Why Intel® Xeon Phi™ coprocessors are needed; Platforms with coprocessors; The first Intel® Xeon Phi™ coprocessor; Keeping the "Ninja Gap" under control; Transforming-and-tuning double advantage; When to use an Intel® Xeon Phi™ coprocessor; Maximizing performance on processors first; Why scaling past one hundred threads is so important; Maximizing parallel program performance
- Measuring readiness for highly parallel execution; What about GPUs?; Beyond the ease of porting to increased performance; Transformation for performance; Hyper-threading versus multithreading; Coprocessor major usage model: MPI versus offload; Compiler and programming models; Cache optimizations; Examples, then details; For more information; 2 High Performance Closed Track Test Drive!; Looking under the hood: coprocessor specifications; Starting the car: communicating with the coprocessor; Taking it out easy: running our first code; Starting to accelerate: running more than one thread
- Petal to the metal: hitting full speed using all cores; Easing in to the first curve: accessing memory bandwidth; High speed banked curved: maximizing memory bandwidth; Back to the pit: a summary; 3 A Friendly Country Road Race; Preparing for our country road trip: chapter focus; Getting a feel for the road: the 9-point stencil algorithm; At the starting line: the baseline 9-point stencil implementation; Rough road ahead: running the baseline stencil code; Cobblestone street ride: vectors but not yet scaling; Open road all-out race: vectors plus scaling
- Some grease and wrenches!: a bit of tuning; Adjusting the "Alignment"; Using streaming stores; Using huge 2-MB memory pages; Summary; For more information; 4 Driving Around Town: Optimizing A Real-World Code Example; Choosing the direction: the basic diffusion calculation; Turn ahead: accounting for boundary effects; Finding a wide boulevard: scaling the code; Thunder road: ensuring vectorization; Peeling out: peeling code from the inner loop; Trying higher octane fuel: improving speed using data locality and tiling; High speed driver certificate: summary of our high speed tour
- 5 Lots of Data (Vectors); Why vectorize?; How to vectorize; Five approaches to achieving vectorization; Six step vectorization methodology; Step 1. Measure baseline release build performance; Step 2. Determine hotspots using Intel® VTune™ Amplifier XE; Step 3. Determine loop candidates using Intel Compiler vec-report; Step 4. Get advice using the Intel Compiler GAP report and toolkit resources; Step 5. Implement GAP advice and other suggestions (such as using elemental functions and/or array notations); Step 6: Repeat!; Streaming through caches: data layout, alignment, prefetching, and so on