Optimizing HPC applications with Intel® cluster tools

Optimizing HPC Applications with Intel® Cluster Tools takes the reader on a tour of the fast-growing area of high performance computing and the optimization of hybrid programs. These programs typically combine distributed memory and shared memory programming models and use the Message Passing Interf...

Bibliographic Details
Classification:	Electronic Book
Main Author:	Supalov, Alexander (Author)
Format:	Electronic eBook
Language:	English
Published:	Berkeley, CA : ApressOpen, 2014.
Series:	Expert's voice in software engineering.
Subjects:	High performance computing. Supercomputers. Superinformatique. Superordinateurs. Computer science.
Online Access:	Full text (requires prior registration with an institutional email address)

Table of Contents:

ch. 1 No Time to Read This Book?
Using Intel MPI Library
Using Intel Composer XE
Tuning Intel MPI Library
Gather Built-in Statistics
Optimize Process Placement
Optimize Thread Placement
Tuning Intel Composer XE
Analyze Optimization and Vectorization Reports
Use Interprocedural Optimization
Summary
References
ch. 2 Overview of Platform Architectures
Performance Metrics and Targets
Latency, Throughput, Energy, and Power
Peak Performance as the Ultimate Limit
Scalability and Maximum Parallel Speedup
Bottlenecks and a Bit of Queuing Theory
Roofline Model
Performance Features of Computer Architectures
Increasing Single-Threaded Performance: Where You Can and Cannot Help
Process More Data with SIMD Parallelism
Distributed and Shared Memory Systems
HPC Hardware Architecture Overview
A Multicore Workstation or a Server Compute Node
Coprocessor for Highly Parallel Applications
Group of Similar Nodes Form an HPC Cluster
Other Important Components of HPC Systems
Summary
References
ch. 3 Top-Down Software Optimization
The Three Levels and Their Impact on Performance
System Level
Application Level
Microarchitecture Level
Closed-Loop Methodology
Workload, Application, and Baseline
Iterating the Optimization Process
Summary
References
ch. 4 Addressing System Bottlenecks
Classifying System-Level Bottlenecks
Identifying Issues Related to System Condition
Characterizing Problems Caused by System Configuration
Understanding System-Level Performance Limits
Checking General Compute Subsystem Performance
Testing Memory Subsystem Performance
Testing I/O Subsystem Performance
Characterizing Application System-Level Issues
Selecting Performance Characterization Tools
Monitoring the I/O Utilization
Analyzing Memory Bandwidth
Summary
References
ch. 5 Addressing Application Bottlenecks: Distributed Memory
Algorithm for Optimizing MPI Performance
Comprehending the Underlying MPI Performance
Recalling Some Benchmarking Basics
Gauging Default Intranode Communication Performance
Gauging Default Internode Communication Performance
Discovering Default Process Layout and Pinning Details
Gauging Physical Core Performance
Doing Initial Performance Analysis
Is It Worth the Trouble?
Getting an Overview of Scalability and Performance
Learning Application Behavior
Choosing Representative Workload(s)
Balancing Process and Thread Parallelism
Doing a Scalability Review
Analyzing the Details of the Application Behavior
Choosing the Optimization Objective
Detecting Load Imbalance
Dealing with Load Imbalance
Classifying Load Imbalance
Addressing Load Imbalance
Optimizing MPI Performance
Classifying the MPI Performance Issues
Addressing MPI Performance Issues
Mapping Application onto the Platform
Tuning the Intel MPI Library
Optimizing Application for Intel MPI
Using Advanced Analysis Techniques
Automatically Checking MPI Program Correctness
Comparing Application Traces
Instrumenting Application Code
Correlating MPI and Hardware Events
Summary
References
ch. 6 Addressing Application Bottlenecks: Shared Memory
Profiling Your Application
Using VTune Amplifier XE for Hotspots Profiling
Hotspots for the HPCG Benchmark
Compiler-Assisted Loop/Function Profiling
Sequential Code and Detecting Load Imbalances
Thread Synchronization and Locking
Dealing with Memory Locality and NUMA Effects
Thread and Process Pinning
Controlling OpenMP Thread Placement
Thread Placement in Hybrid Applications
Summary
References
ch. 7 Addressing Application Bottlenecks: Microarchitecture
Overview of a Modern Processor Pipeline
Pipelined Execution
Out-of-order vs. In-order Execution
Superscalar Pipelines
SIMD Execution
Speculative Execution: Branch Prediction
Memory Subsystem
Putting It All Together: A Final Look at the Sandy Bridge Pipeline
A Top-down Method for Categorizing the Pipeline Performance
Intel Composer XE Usage for Microarchitecture Optimizations
Basic Compiler Usage and Optimization
Using Optimization and Vectorization Reports to Read the Compiler's Mind
Optimizing for Vectorization
Dealing with Disambiguation
Dealing with Branches
When Optimization Leads to Wrong Results
Analyzing Pipeline Performance with Intel VTune Amplifier XE
Using a Standard Library Method
Summary
References
ch. 8 Application Design Considerations
Abstraction and Generalization of the Platform Architecture
Types of Abstractions
Levels of Abstraction and Complexities
Raw Hardware vs. Virtualized Hardware in the Cloud
Questions about Application Design
Designing for Performance and Scaling
Designing for Flexibility and Performance Portability
Understanding Bounds and Projecting Bottlenecks
Data Storage or Transfer vs. Recalculation
Total Productivity Assessment
Summary
References
