Design and optimization of highly heterogeneous HPC platforms for scientific applications

Over the last decade, HPC platforms have become increasingly heterogeneous and hierarchical. The main source of heterogeneity within individual computing nodes is the use of specialized accelerators, such as FPGAs, DSPs, and GPUs, alongside general-purpose CPUs. Heterogeneous many-core processors are another source of intra-node heterogeneity. As the latest generation of HPC clusters grows more heterogeneous with the increasing number of different processing devices, a hierarchical approach to memory and communication interconnects is needed to reduce complexity. In recent years, many scientific codes have been ported to multicore and GPU architectures.

Highly heterogeneous and hierarchical HPC platforms, known as hardware-accelerated multicore clusters, are widely used in high-performance computing because of their better power efficiency and performance/price ratio. The adoption of multicores in HPC required significant refactoring of existing parallel applications. For instance, for general-purpose computing on GPUs, new programming models such as CUDA and OpenCL were proposed. A large number of algorithms and specific applications have been successfully ported to GPUs, delivering substantial speedups over their optimized CPU counterparts. The transition to hybrid CPU/GPU architectures is challenging with respect to efficient utilization of the heterogeneous hardware and reuse of the software stack. In existing programming and execution environments for hybrid platforms, such as OpenCL, StarPU [1] and CHPS [2], the problem of efficient cross-device data partitioning and load balancing remains open.
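
To make the cross-device data partitioning problem concrete, the following minimal sketch splits the rows of a data-parallel workload among devices in proportion to their relative speeds. It is only an illustration under simplifying assumptions, not the method of StarPU, CHPS, or this thesis: the function name partition_rows and the speeds list are hypothetical, speeds are assumed to be known and constant, and data transfer costs are ignored.

```python
def partition_rows(n_rows, speeds):
    """Split n_rows contiguous rows among devices proportionally to speeds.

    speeds: hypothetical relative throughputs (e.g. rows per second), one per
    device, assumed known in advance and constant (a strong simplification).
    Returns one half-open (start, end) range per device.
    """
    if n_rows < 0 or not speeds or any(s <= 0 for s in speeds):
        raise ValueError("n_rows must be >= 0 and all speeds must be > 0")

    total = sum(speeds)
    bounds = [0]
    acc = 0.0
    for s in speeds:
        acc += s
        # Rounding the cumulative share keeps the boundaries monotonic.
        bounds.append(round(n_rows * acc / total))
    bounds[-1] = n_rows  # guard against floating-point drift on the last bound

    return [(bounds[i], bounds[i + 1]) for i in range(len(speeds))]


if __name__ == "__main__":
    # Example: one CPU and one GPU assumed to be three times faster.
    for device, (start, end) in zip(["cpu", "gpu"], partition_rows(100, [1, 3])):
        print(device, start, end)
```

A static split of this kind only balances the load if the speeds are accurate and independent of the problem size; in practice, device performance varies with the workload and with transfer costs, which is part of why the problem remains open.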

To achieve optimal performance of these applications on hybrid platforms, software heterogeneity needs to be taken into account. The efficient implementation of data-parallel scientific applications on such highly heterogeneous and hierarchical platforms represents a significant scientific and engineering challenge. The goal of this thesis is therefore to propose a new programming model for this purpose. To validate the proposed solutions, they will be applied to real-world scientific problems (for example, in medical image processing and biostatistics).

This work will be carried out as a collaboration between the LISSI lab and the company ATOS (formerly BULL).

The applicant should hold a Master's degree (or equivalent) in operations research, computer science and/or applied mathematics, and have a strong background in mathematics and computer programming. Familiarity with optimization algorithms is a plus.

The applicant must be motivated by research and teamwork, and have strong interpersonal and writing skills.

Applications should be sent to nakib@u-pec.fr with the following documents: CV, cover letter, Master's transcripts and class ranking, and recommendation letters (if possible).

Starting date: autumn 2017, for a duration of 3 years.

References

  1. C. Augonnet, S. Thibault, R. Namyst, and P. Wacrenier, “StarPU: A unified platform for task scheduling on heterogeneous multicore architectures,” Euro-Par 2009, pp. 863–874, 2009.
  2. A. Ilic and L. Sousa, “Collaborative execution environment for heterogeneous parallel systems,” in IPDPS Workshops and PhD Forum (IPDPSW), pp. 1–8, 2010.
  3. T.-A. Nguyen, A. Nakib, and H.-N. Nguyen, “Optimal Distributed Non-local Means filtering for Hybrid Parallel Architecture,” Computer Methods and Programs in Biomedicine (Elsevier), vol. 127, pp. 29–39, 2016.