By abstracting away the complexity of distributed systems, large-scale data processing platforms—MapReduce, Hadoop, Spark, Dryad, etc.—have provided developers with simple means for harnessing the power of the cloud. In this paper, we ask whether we can automatically synthesize MapReduce-style distributed programs from input–output examples. Our ultimate goal is to enable end users to specify large-scale data analyses through the simple interface of examples. We thus present a new algorithm and tool for synthesizing programs composed of efficient data-parallel operations that can execute on cloud computing infrastructure. We evaluate our tool on a range of real-world big-data analysis tasks and general computations. Our results demonstrate the efficiency of our approach and the small number of examples it requires to synthesize correct, scalable programs.
Thu 16 Jun
|13:30 - 14:00|
|14:00 - 14:30|
Ravi Chugh (University of Chicago), Brian Hempel (University of Chicago), Mitchell Spradlin (University of Chicago), Jacob Albers (University of Chicago)
|14:30 - 15:00|
Calvin Loncaric (University of Washington), Emina Torlak (University of Washington), Michael D. Ernst (University of Washington)