
Optimizing HPC Applications with Intel® Cluster Tools

Optimizing HPC Applications with Intel® Cluster Tools takes the reader on a tour of the fast-growing area of high performance computing and the optimization of hybrid programs. These programs typically combine distributed memory and shared memory programming models and use the Message Passing Interface…
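As a rough illustration of the hybrid model the description refers to, the following minimal sketch (not taken from the book) shows a program that combines distributed memory and shared memory parallelism; it assumes the common MPI + OpenMP pairing, and the file name and compile command are hypothetical examples.

    /* hybrid_hello.c: minimal hybrid MPI + OpenMP example (illustrative sketch).
     * Each MPI rank (distributed memory) runs an OpenMP parallel region
     * (shared memory) and reports its rank and thread identifiers.
     * Possible build: mpicc -fopenmp hybrid_hello.c -o hybrid_hello
     */
    #include <mpi.h>
    #include <omp.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int provided, rank, size;

        /* Request a threading support level suitable for OpenMP regions. */
        MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* Shared-memory parallelism inside each MPI process. */
        #pragma omp parallel
        {
            printf("Rank %d of %d, thread %d of %d\n",
                   rank, size, omp_get_thread_num(), omp_get_num_threads());
        }

        MPI_Finalize();
        return 0;
    }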


Bibliographic Details
Classification: Electronic Book
Main Author: Supalov, Alexander (Author)
Format: Electronic eBook
Language: English
Published: Berkeley, CA : ApressOpen, 2014.
Series: Expert's Voice in Software Engineering.
Subjects:
Online Access: Full text (requires prior registration with an institutional email address)
Table of Contents:
  • Ch. 1 No Time to Read This Book?
  • Using Intel MPI Library
  • Using Intel Composer XE
  • Tuning Intel MPI Library
  • Gather Built-in Statistics
  • Optimize Process Placement
  • Optimize Thread Placement
  • Tuning Intel Composer XE
  • Analyze Optimization and Vectorization Reports
  • Use Interprocedural Optimization
  • Summary
  • References
  • Ch. 2 Overview of Platform Architectures
  • Performance Metrics and Targets
  • Latency, Throughput, Energy, and Power
  • Peak Performance as the Ultimate Limit
  • Scalability and Maximum Parallel Speedup
  • Bottlenecks and a Bit of Queuing Theory
  • Roofline Model
  • Performance Features of Computer Architectures
  • Increasing Single-Threaded Performance: Where You Can and Cannot Help
  • Process More Data with SIMD Parallelism
  • Distributed and Shared Memory Systems
  • HPC Hardware Architecture Overview
  • A Multicore Workstation or a Server Compute Node
  • Coprocessor for Highly Parallel Applications
  • Group of Similar Nodes Form an HPC Cluster
  • Other Important Components of HPC Systems
  • Summary
  • References
  • Ch. 3 Top-Down Software Optimization
  • The Three Levels and Their Impact on Performance
  • System Level
  • Application Level
  • Microarchitecture Level
  • Closed-Loop Methodology
  • Workload, Application, and Baseline
  • Iterating the Optimization Process
  • Summary
  • References
  • Ch. 4 Addressing System Bottlenecks
  • Classifying System-Level Bottlenecks
  • Identifying Issues Related to System Condition
  • Characterizing Problems Caused by System Configuration
  • Understanding System-Level Performance Limits
  • Checking General Compute Subsystem Performance
  • Testing Memory Subsystem Performance
  • Testing I/O Subsystem Performance
  • Characterizing Application System-Level Issues
  • Selecting Performance Characterization Tools
  • Monitoring the I/O Utilization
  • Analyzing Memory Bandwidth
  • Summary
  • References
  • Ch. 5 Addressing Application Bottlenecks: Distributed Memory
  • Algorithm for Optimizing MPI Performance
  • Comprehending the Underlying MPI Performance
  • Recalling Some Benchmarking Basics
  • Gauging Default Intranode Communication Performance
  • Gauging Default Internode Communication Performance
  • Discovering Default Process Layout and Pinning Details
  • Gauging Physical Core Performance
  • Doing Initial Performance Analysis
  • Is It Worth the Trouble?
  • Getting an Overview of Scalability and Performance
  • Learning Application Behavior
  • Choosing Representative Workload(s)
  • Balancing Process and Thread Parallelism
  • Doing a Scalability Review
  • Analyzing the Details of the Application Behavior
  • Choosing the Optimization Objective
  • Detecting Load Imbalance
  • Dealing with Load Imbalance
  • Classifying Load Imbalance
  • Addressing Load Imbalance
  • Optimizing MPI Performance
  • Classifying the MPI Performance Issues
  • Addressing MPI Performance Issues
  • Mapping Application onto the Platform
  • Tuning the Intel MPI Library
  • Optimizing Application for Intel MPI
  • Using Advanced Analysis Techniques
  • Automatically Checking MPI Program Correctness
  • Comparing Application Traces
  • Instrumenting Application Code
  • Correlating MPI and Hardware Events
  • Summary
  • References
  • Ch. 6 Addressing Application Bottlenecks: Shared Memory
  • Profiling Your Application
  • Using VTune Amplifier XE for Hotspots Profiling
  • Hotspots for the HPCG Benchmark
  • Compiler-Assisted Loop/Function Profiling
  • Sequential Code and Detecting Load Imbalances
  • Thread Synchronization and Locking
  • Dealing with Memory Locality and NUMA Effects
  • Thread and Process Pinning
  • Controlling OpenMP Thread Placement
  • Thread Placement in Hybrid Applications
  • Summary
  • References
  • Ch. 7 Addressing Application Bottlenecks: Microarchitecture
  • Overview of a Modern Processor Pipeline
  • Pipelined Execution
  • Out-of-order vs. In-order Execution
  • Superscalar Pipelines
  • SIMD Execution
  • Speculative Execution: Branch Prediction
  • Memory Subsystem
  • Putting It All Together: A Final Look at the Sandy Bridge Pipeline
  • A Top-down Method for Categorizing the Pipeline Performance
  • Intel Composer XE Usage for Microarchitecture Optimizations
  • Basic Compiler Usage and Optimization
  • Using Optimization and Vectorization Reports to Read the Compiler's Mind
  • Optimizing for Vectorization
  • Dealing with Disambiguation
  • Dealing with Branches
  • When Optimization Leads to Wrong Results
  • Analyzing Pipeline Performance with Intel VTune Amplifier XE
  • Using a Standard Library Method
  • Summary
  • References
  • Ch. 8 Application Design Considerations
  • Abstraction and Generalization of the Platform Architecture
  • Types of Abstractions
  • Levels of Abstraction and Complexities
  • Raw Hardware vs. Virtualized Hardware in the Cloud
  • Questions about Application Design
  • Designing for Performance and Scaling
  • Designing for Flexibility and Performance Portability
  • Understanding Bounds and Projecting Bottlenecks
  • Data Storage or Transfer vs. Recalculation
  • Total Productivity Assessment
  • Summary
  • References