Thu 16 Jun 2016 17:00 - 17:30 at Grand Ballroom San Rafael - Parallelism I Chair(s): Tony Hosking

Prefix sums are an important parallel primitive, especially in massively-parallel programs. This paper discusses two orthogonal generalizations thereof, which we call higher-order and tuple-based prefix sums. Moreover, it describes and evaluates SAM, a GPU-friendly algorithm for computing prefix sums and other scans that directly supports higher orders and tuple values. Its templated CUDA implementation unifies all of these computations in a single 100-statement kernel. SAM is communication-efficient in the sense that it minimizes main-memory accesses. When computing prefix sums of a million or more values, it outperforms Thrust and CUDPP on both a Titan X and a K40 GPU. On the Titan X, SAM reaches memory-copy speeds for large input sizes, which cannot be surpassed. SAM outperforms CUB, the currently fastest conventional prefix sum implementation, by up to a factor of 2.9 on eighth-order prefix sums and by up to a factor of 2.6 on eight-tuple prefix sums.

Thu 16 Jun

Displayed time zone: Tijuana, Baja California change

17:00 - 18:00
Parallelism IResearch Papers at Grand Ballroom San Rafael
Chair(s): Tony Hosking Australian National University, Data61, and Purdue University
Higher-Order and Tuple-Based Massively-Parallel Prefix Sums
Research Papers
Sepideh Maleki Texas State University, Annie Yang Texas State University, Martin Burtscher Texas State University
Pre-print Media Attached
A Distributed OpenCL Framework using Redundant Computation and Data Replication
Research Papers
Junghyun Kim Seoul National University, Gangwon Jo Seoul National University, Jaehoon Jung Seoul National University, Jungwon Kim Oak Ridge National Laboratory, Jaejin Lee Seoul National University
Media Attached