A Distributed OpenCL Framework using Redundant Computation and Data Replication (PLDI 2016 - Research Papers)

Who

Junghyun Kim, Gangwon Jo, Jaehoon Jung, Jungwon Kim, Jaejin Lee

Track

PLDI 2016 Research Papers

Time Zone

The program is currently displayed in (GMT-07:00) Tijuana, Baja California.

The GMT offsets shown reflect the offsets at the moment of the conference.

When

Thu 16 Jun 2016 17:30 - 18:00 at Grand Ballroom San Rafael, session Parallelism I (Chair: Tony Hosking)

Abstract

Applications written solely in CUDA or OpenCL cannot execute on a cluster that runs multiple operating system instances. For this reason, many studies have been done to extend these programming models to clusters. Most previous approaches are based on a common idea: designating a centralized host and coordinating the other nodes by the host for computation. However, the centralized host may be a significant performance bottleneck when the number of nodes is large. In this paper, we propose a scalable and distributed OpenCL framework for large-scale heterogeneous clusters. To overcome the limitations of the centralized approaches, it executes the host program of an OpenCL application in each node of the cluster by exploiting redundant computation with data replication. This reduces inter-node communication and synchronization overhead significantly. In addition, the proposed framework applies several optimization techniques, such as remote device virtualization and queueing optimization, to reduce the command delivery and enqueueing overhead. We also propose a new OpenCL API function to alleviate the command scheduling overhead. We show the effectiveness of the framework by evaluating it with a microbenchmark and eleven benchmark applications on a large-scale CPU cluster of 512 nodes and a medium-scale GPU cluster of 36 nodes.

Junghyun Kim, Seoul National University, South Korea
Gangwon Jo, Seoul National University, South Korea
Jaehoon Jung, Seoul National University, South Korea
Jungwon Kim, Oak Ridge National Laboratory, United States
Jaejin Lee, Seoul National University, South Korea

Slides: A Distributed OpenCL Framework using Redundant Computation and Data Replication - Junghyun Kim (PDF and PPTX)

Session Program

Thu 16 Jun
Displayed time zone: Tijuana, Baja California

17:00 - 18:00 Parallelism I (Research Papers) at Grand Ballroom San Rafael. Chair(s): Tony Hosking, Australian National University, Data61, and Purdue University

17:00 (30m talk): Higher-Order and Tuple-Based Massively-Parallel Prefix Sums (Research Papers). Sepideh Maleki (Texas State University), Annie Yang (Texas State University), Martin Burtscher (Texas State University). Pre-print and media attached.
17:30 (30m talk): A Distributed OpenCL Framework using Redundant Computation and Data Replication (Research Papers). Junghyun Kim (Seoul National University), Gangwon Jo (Seoul National University), Jaehoon Jung (Seoul National University), Jungwon Kim (Oak Ridge National Laboratory), Jaejin Lee (Seoul National University). Media attached.