Optimizing HPC applications with Intel® cluster tools
Optimizing HPC Applications with Intel® Cluster Tools takes the reader on a tour of the fast-growing area of high performance computing and the optimization of hybrid programs. These programs typically combine distributed memory and shared memory programming models and use the Message Passing Interf...
Classification: | eBook |
---|---|
Main author: | |
Format: | Electronic eBook |
Language: | English |
Published: | Berkeley, CA : ApressOpen, 2014. |
Series: | Expert's voice in software engineering. |
Subjects: | |
Online access: | Full text (requires prior registration with an institutional email address) |
Table of Contents:
- Ch. 1 No Time to Read This Book?
- Using Intel MPI Library
- Using Intel Composer XE
- Tuning Intel MPI Library
- Gather Built-in Statistics
- Optimize Process Placement
- Optimize Thread Placement
- Tuning Intel Composer XE
- Analyze Optimization and Vectorization Reports
- Use Interprocedural Optimization
- Summary
- References
- Ch. 2 Overview of Platform Architectures
- Performance Metrics and Targets
- Latency, Throughput, Energy, and Power
- Peak Performance as the Ultimate Limit
- Scalability and Maximum Parallel Speedup
- Bottlenecks and a Bit of Queuing Theory
- Roofline Model
- Performance Features of Computer Architectures
- Increasing Single-Threaded Performance: Where You Can and Cannot Help
- Process More Data with SIMD Parallelism
- Distributed and Shared Memory Systems
- HPC Hardware Architecture Overview
- A Multicore Workstation or a Server Compute Node
- Coprocessor for Highly Parallel Applications
- Group of Similar Nodes Form an HPC Cluster
- Other Important Components of HPC Systems
- Summary
- References
- Ch. 3 Top-Down Software Optimization
- The Three Levels and Their Impact on Performance
- System Level
- Application Level
- Microarchitecture Level
- Closed-Loop Methodology
- Workload, Application, and Baseline
- Iterating the Optimization Process
- Summary
- References
- Ch. 4 Addressing System Bottlenecks
- Classifying System-Level Bottlenecks
- Identifying Issues Related to System Condition
- Characterizing Problems Caused by System Configuration
- Understanding System-Level Performance Limits
- Checking General Compute Subsystem Performance
- Testing Memory Subsystem Performance
- Testing I/O Subsystem Performance
- Characterizing Application System-Level Issues
- Selecting Performance Characterization Tools
- Monitoring the I/O Utilization
- Analyzing Memory Bandwidth
- Summary
- References
- Ch. 5 Addressing Application Bottlenecks: Distributed Memory
- Algorithm for Optimizing MPI Performance
- Comprehending the Underlying MPI Performance
- Recalling Some Benchmarking Basics
- Gauging Default Intranode Communication Performance
- Gauging Default Internode Communication Performance
- Discovering Default Process Layout and Pinning Details
- Gauging Physical Core Performance
- Doing Initial Performance Analysis
- Is It Worth the Trouble?
- Getting an Overview of Scalability and Performance
- Learning Application Behavior
- Choosing Representative Workload(s)
- Balancing Process and Thread Parallelism
- Doing a Scalability Review
- Analyzing the Details of the Application Behavior
- Choosing the Optimization Objective
- Detecting Load Imbalance
- Dealing with Load Imbalance
- Classifying Load Imbalance
- Addressing Load Imbalance
- Optimizing MPI Performance
- Classifying the MPI Performance Issues
- Addressing MPI Performance Issues
- Mapping Application onto the Platform
- Tuning the Intel MPI Library
- Optimizing Application for Intel MPI
- Using Advanced Analysis Techniques
- Automatically Checking MPI Program Correctness
- Comparing Application Traces
- Instrumenting Application Code
- Correlating MPI and Hardware Events
- Summary
- References
- Ch. 6 Addressing Application Bottlenecks: Shared Memory
- Profiling Your Application
- Using VTune Amplifier XE for Hotspots Profiling
- Hotspots for the HPCG Benchmark
- Compiler-Assisted Loop/Function Profiling
- Sequential Code and Detecting Load Imbalances
- Thread Synchronization and Locking
- Dealing with Memory Locality and NUMA Effects
- Thread and Process Pinning
- Controlling OpenMP Thread Placement
- Thread Placement in Hybrid Applications
- Summary
- References
- Ch. 7 Addressing Application Bottlenecks: Microarchitecture
- Overview of a Modern Processor Pipeline
- Pipelined Execution
- Out-of-order vs. In-order Execution
- Superscalar Pipelines
- SIMD Execution
- Speculative Execution: Branch Prediction
- Memory Subsystem
- Putting It All Together: A Final Look at the Sandy Bridge Pipeline
- A Top-down Method for Categorizing the Pipeline Performance
- Intel Composer XE Usage for Microarchitecture Optimizations
- Basic Compiler Usage and Optimization
- Using Optimization and Vectorization Reports to Read the Compiler's Mind
- Optimizing for Vectorization
- Dealing with Disambiguation
- Dealing with Branches
- When Optimization Leads to Wrong Results
- Analyzing Pipeline Performance with Intel VTune Amplifier XE
- Using a Standard Library Method
- Summary
- References
- Ch. 8 Application Design Considerations
- Abstraction and Generalization of the Platform Architecture
- Types of Abstractions
- Levels of Abstraction and Complexities
- Raw Hardware vs. Virtualized Hardware in the Cloud
- Questions about Application Design
- Designing for Performance and Scaling
- Designing for Flexibility and Performance Portability
- Understanding Bounds and Projecting Bottlenecks
- Data Storage or Transfer vs. Recalculation
- Total Productivity Assessment
- Summary
- References