
Optimizing HPC applications with Intel® cluster tools /

Optimizing HPC Applications with Intel® Cluster Tools takes the reader on a tour of the fast-growing area of high performance computing and the optimization of hybrid programs. These programs typically combine distributed memory and shared memory programming models and use the Message Passing Interf...


Bibliographic Details
Classification: Electronic Book
Main author: Supalov, Alexander (Author)
Format: Electronic eBook
Language: English
Published: Berkeley, CA : ApressOpen, 2014.
Series: Expert's voice in software engineering.
Subjects:
Online access: Full text (requires prior registration with an institutional email address)

MARC

LEADER 00000cam a2200000Ii 4500
001 OR_ocn893478338
003 OCoLC
005 20231017213018.0
006 m o d
007 cr cnu|||unuuu
008 141021s2014 caua ob 001 0 eng d
040 |a GW5XE  |b eng  |e rda  |e pn  |c GW5XE  |d COO  |d B24X7  |d OCLCO  |d UMI  |d DEBBG  |d E7B  |d UPM  |d OCLCF  |d EBLCP  |d OCL  |d OCLCQ  |d Z5A  |d LIV  |d ESU  |d OCLCQ  |d VT2  |d IOG  |d CEF  |d UAB  |d DEHBZ  |d VTS  |d REB  |d OCLCQ  |d MERER  |d YDXCP  |d U3W  |d AU@  |d WYU  |d YOU  |d OCLCQ  |d OAPEN  |d OCLCQ  |d LEAUB  |d CNCEN  |d UWK  |d OCLCQ  |d OCLCO  |d DCT  |d ERF  |d OCLCQ  |d ADU  |d UKKNU  |d BRF  |d OCLCQ  |d DIPCC  |d S2H  |d OCLCQ  |d AAA  |d OCLCO  |d OCLCQ 
019 |a 895116739  |a 896861824  |a 897432769  |a 900883023  |a 1005798414  |a 1026466454  |a 1048144882  |a 1055343303  |a 1056437276  |a 1059037595  |a 1066501737  |a 1086529949  |a 1086952905  |a 1103281025  |a 1105798456  |a 1107345047  |a 1110281888  |a 1110846153  |a 1112554158  |a 1112592095  |a 1119459551  |a 1129340472  |a 1153035476  |a 1159386313  |a 1162639450  |a 1163814492  |a 1166053629  |a 1166379255  |a 1179888057  |a 1192336585  |a 1224922934  |a 1228530265  |a 1235834693  |a 1240531568 
020 |a 9781430264972  |q (electronic bk.) 
020 |a 1430264977  |q (electronic bk.) 
020 |a 1430264969  |q (print) 
020 |a 9781430264965  |q (print) 
020 |z 9781430264965 
024 7 |a 10.1007/978-1-4302-6497-2  |2 doi 
024 8 |a 9781430264972 
029 1 |a AU@  |b 000058380604 
029 1 |a AU@  |b 000060583841 
029 1 |a DEBBG  |b BV042491002 
029 1 |a DEBSZ  |b 43484179X 
029 1 |a GBVCP  |b 882741381 
035 |a (OCoLC)893478338  |z (OCoLC)895116739  |z (OCoLC)896861824  |z (OCoLC)897432769  |z (OCoLC)900883023  |z (OCoLC)1005798414  |z (OCoLC)1026466454  |z (OCoLC)1048144882  |z (OCoLC)1055343303  |z (OCoLC)1056437276  |z (OCoLC)1059037595  |z (OCoLC)1066501737  |z (OCoLC)1086529949  |z (OCoLC)1086952905  |z (OCoLC)1103281025  |z (OCoLC)1105798456  |z (OCoLC)1107345047  |z (OCoLC)1110281888  |z (OCoLC)1110846153  |z (OCoLC)1112554158  |z (OCoLC)1112592095  |z (OCoLC)1119459551  |z (OCoLC)1129340472  |z (OCoLC)1153035476  |z (OCoLC)1159386313  |z (OCoLC)1162639450  |z (OCoLC)1163814492  |z (OCoLC)1166053629  |z (OCoLC)1166379255  |z (OCoLC)1179888057  |z (OCoLC)1192336585  |z (OCoLC)1224922934  |z (OCoLC)1228530265  |z (OCoLC)1235834693  |z (OCoLC)1240531568 
037 |a CL0500000540  |b Safari Books Online 
050 4 |a QA76.88 
072 7 |a UY  |2 bicssc 
072 7 |a COM014000  |2 bisacsh 
082 0 4 |a 004.1/1  |2 23 
049 |a UAMI 
245 0 0 |a Optimizing HPC applications with Intel® cluster tools /  |c Alexander Supalov, Andrey Semin, Michael Klemm, Christopher Dahnken. 
264 1 |a Berkeley, CA :  |b ApressOpen,  |c 2014. 
264 2 |a New York, NY :  |b Distributed to the Book trade worldwide by Springer 
264 4 |c ©2014 
300 |a 1 online resource (xxiv, 265 pages) :  |b illustrations 
336 |a text  |b txt  |2 rdacontent 
337 |a computer  |b c  |2 rdamedia 
338 |a online resource  |b cr  |2 rdacarrier 
347 |a text file 
347 |b PDF 
490 1 |a The expert's voice in software engineering 
500 |a Includes index. 
588 0 |a Online resource; title from PDF title page (SpringerLink, viewed October 21, 2014). 
520 |a Optimizing HPC Applications with Intel® Cluster Tools takes the reader on a tour of the fast-growing area of high performance computing and the optimization of hybrid programs. These programs typically combine distributed memory and shared memory programming models and use the Message Passing Interface (MPI) and OpenMP for multi-threading to achieve the ultimate goal of high performance at low power consumption on enterprise-class workstations and compute clusters. The book focuses on optimization for clusters consisting of the Intel® Xeon processor, but the optimization methodologies also apply to the Intel® Xeon Phi™ coprocessor and heterogeneous clusters mixing both architectures. Besides the tutorial and reference content, the authors address and refute many myths and misconceptions surrounding the topic. The text is augmented and enriched by descriptions of real-life situations. 
504 |a Includes bibliographical references and index. 
542 |f Copyright © 2014 by Apress Media, LLC, all rights reserved  |g 2014 
546 |a English. 
505 0 |a Ch. 1 No Time to Read This Book? -- Using Intel MPI Library -- Using Intel Composer XE -- Tuning Intel MPI Library -- Gather Built-in Statistics -- Optimize Process Placement -- Optimize Thread Placement -- Tuning Intel Composer XE -- Analyze Optimization and Vectorization Reports -- Use Interprocedural Optimization -- Summary -- References -- ch. 2 Overview of Platform Architectures -- Performance Metrics and Targets -- Latency, Throughput, Energy, and Power -- Peak Performance as the Ultimate Limit -- Scalability and Maximum Parallel Speedup -- Bottlenecks and a Bit of Queuing Theory -- Roofline Model -- Performance Features of Computer Architectures -- Increasing Single-Threaded Performance: Where You Can and Cannot Help -- Process More Data with SIMD Parallelism -- Distributed and Shared Memory Systems -- HPC Hardware Architecture Overview -- A Multicore Workstation or a Server Compute Node -- Coprocessor for Highly Parallel Applications -- Group of Similar Nodes Form an HPC Cluster -- Other Important Components of HPC Systems -- Summary -- References -- ch. 3 Top-Down Software Optimization -- The Three Levels and Their Impact on Performance -- System Level -- Application Level -- Microarchitecture Level -- Closed-Loop Methodology -- Workload, Application, and Baseline -- Iterating the Optimization Process -- Summary -- References -- ch. 4 Addressing System Bottlenecks -- Classifying System-Level Bottlenecks -- Identifying Issues Related to System Condition -- Characterizing Problems Caused by System Configuration -- Understanding System-Level Performance Limits -- Checking General Compute Subsystem Performance -- Testing Memory Subsystem Performance -- Testing I/O Subsystem Performance -- Characterizing Application System-Level Issues -- Selecting Performance Characterization Tools -- Monitoring the I/O Utilization -- Analyzing Memory Bandwidth -- Summary -- References -- ch. 5 Addressing Application Bottlenecks: Distributed Memory -- Algorithm for Optimizing MPI Performance -- Comprehending the Underlying MPI Performance -- Recalling Some Benchmarking Basics -- Gauging Default Intranode Communication Performance -- Gauging Default Internode Communication Performance -- Discovering Default Process Layout and Pinning Details -- Gauging Physical Core Performance -- Doing Initial Performance Analysis -- Is It Worth the Trouble? -- Getting an Overview of Scalability and Performance -- Learning Application Behavior -- Choosing Representative Workload(s) -- Balancing Process and Thread Parallelism -- Doing a Scalability Review -- Analyzing the Details of the Application Behavior -- Choosing the Optimization Objective -- Detecting Load Imbalance -- Dealing with Load Imbalance -- Classifying Load Imbalance -- Addressing Load Imbalance -- Optimizing MPI Performance -- Classifying the MPI Performance Issues -- Addressing MPI Performance Issues -- Mapping Application onto the Platform -- Tuning the Intel MPI Library -- Optimizing Application for Intel MPI -- Using Advanced Analysis Techniques -- Automatically Checking MPI Program Correctness -- Comparing Application Traces -- Instrumenting Application Code -- Correlating MPI and Hardware Events -- Summary -- References -- ch. 6 Addressing Application Bottlenecks: Shared Memory -- Profiling Your Application -- Using VTune Amplifier XE for Hotspots Profiling -- Hotspots for the HPCG Benchmark -- Compiler-Assisted Loop/Function Profiling -- Sequential Code and Detecting Load Imbalances -- Thread Synchronization and Locking -- Dealing with Memory Locality and NUMA Effects -- Thread and Process Pinning -- Controlling OpenMP Thread Placement -- Thread Placement in Hybrid Applications -- Summary -- References -- ch. 7 Addressing Application Bottlenecks: Microarchitecture -- Overview of a Modern Processor Pipeline -- Pipelined Execution -- Out-of-order vs. In-order Execution -- Superscalar Pipelines -- SIMD Execution -- Speculative Execution: Branch Prediction -- Memory Subsystem -- Putting It All Together: A Final Look at the Sandy Bridge Pipeline -- A Top-down Method for Categorizing the Pipeline Performance -- Intel Composer XE Usage for Microarchitecture Optimizations -- Basic Compiler Usage and Optimization -- Using Optimization and Vectorization Reports to Read the Compiler's Mind -- Optimizing for Vectorization -- Dealing with Disambiguation -- Dealing with Branches -- When Optimization Leads to Wrong Results -- Analyzing Pipeline Performance with Intel VTune Amplifier XE -- Using a Standard Library Method -- Summary -- References -- ch. 8 Application Design Considerations -- Abstraction and Generalization of the Platform Architecture -- Types of Abstractions -- Levels of Abstraction and Complexities -- Raw Hardware vs. Virtualized Hardware in the Cloud -- Questions about Application Design -- Designing for Performance and Scaling -- Designing for Flexibility and Performance Portability -- Understanding Bounds and Projecting Bottlenecks -- Data Storage or Transfer vs. Recalculation -- Total Productivity Assessment -- Summary -- References. 
590 |a O'Reilly  |b O'Reilly Online Learning: Academic/Public Library Edition 
650 0 |a High performance computing. 
650 0 |a Supercomputers. 
650 6 |a Superinformatique. 
650 6 |a Superordinateurs. 
650 7 |a Computer science.  |2 bicssc 
650 7 |a High performance computing.  |2 fast  |0 (OCoLC)fst00956032 
650 7 |a Supercomputers.  |2 fast  |0 (OCoLC)fst01138790 
653 |a Computer science 
700 1 |a Supalov, Alexander,  |e author. 
758 |i Is found in:  |a Apress  |1 https://openresearchlibrary.org/module/8b6e954c-c94f-4241-bea2-12704534d0e6 
776 0 8 |i Printed edition:  |z 9781430264965 
830 0 |a Expert's voice in software engineering. 
856 4 0 |u https://learning.oreilly.com/library/view/~/9781430264972/?ar  |z Full text (requires prior registration with an institutional email address) 
938 |a Books 24x7  |b B247  |n bks00073674 
938 |a ProQuest Ebook Central  |b EBLB  |n EBL3091883 
938 |a ebrary  |b EBRY  |n ebr10952632 
938 |a Knowledge Unlatched  |b KNOW  |n 8efd21de-950b-4d01-ad23-c75fdb2e75de 
938 |a OAPEN Foundation  |b OPEN  |n 1001835 
938 |a YBP Library Services  |b YANK  |n 12143232 
938 |a DCS UAT TEST 8  |b TEST  |n 1001835 
994 |a 92  |b IZTAP