A Distributed OpenCL Framework using Redundant Computation and Data Replication
Applications written solely in CUDA or OpenCL cannot execute on a cluster that runs multiple operating system instances. For this reason, many studies have been done to extend these programming models to clusters. Most previous approaches are based on a common idea: designating a centralized host and having that host coordinate the computation of the other nodes. However, the centralized host may become a significant performance bottleneck when the number of nodes is large. In this paper, we propose a scalable and distributed OpenCL framework for large-scale heterogeneous clusters. To overcome the limitations of centralized approaches, it executes the host program of an OpenCL application in every node of the cluster, exploiting redundant computation with data replication. This significantly reduces inter-node communication and synchronization overhead. In addition, the proposed framework applies several optimization techniques, such as remote device virtualization and queueing optimization, to reduce command delivery and enqueueing overhead. We also propose a new OpenCL API function to alleviate the command scheduling overhead. We show the effectiveness of the framework by evaluating it with a microbenchmark and eleven benchmark applications on a large-scale CPU cluster of 512 nodes and a medium-scale GPU cluster of 36 nodes.
Thu 16 Jun (displayed time zone: Tijuana, Baja California)
17:00 - 18:00
Higher-Order and Tuple-Based Massively-Parallel Prefix Sums
Sepideh Maleki (Texas State University), Annie Yang (Texas State University), Martin Burtscher (Texas State University)
A Distributed OpenCL Framework using Redundant Computation and Data Replication
Junghyun Kim (Seoul National University), Gangwon Jo (Seoul National University), Jaehoon Jung (Seoul National University), Jungwon Kim (Oak Ridge National Laboratory), Jaejin Lee (Seoul National University)