This paper presents a use case on dynamically heterogeneous architectures that is both realistic and favorable for distributed work-stealing in regular parallel applications. Using a straightforward implementation of distributed dense matrix multiplication built on X10’s Global Load Balancing (GLB) library, we show that moderate differences in node processing power allow work-stealing to significantly outperform a standard static schedule such as SUMMA. The work-stealing implementation also scales comparably to the static schedule on up to 128 cores.