Disk-Based Parallel Computation, Rubik’s Cube, and Checkpointing

MAR 29, 2008-1 MIN
Scale Cast – A podcast about big data, distributed systems, and scalability

Description

This talk takes us on a journey through three varied but interconnected
topics. First, our research lab has engaged in a series of disk-based
computations extending over five years. Disks have traditionally
been used for filesystems, for virtual memory, and for databases.
Disk-based computation opens up an important fourth use: an abstraction
for multiple disks that allows parallel programs to treat them in a
manner similar to RAM. The key observation is that 50 disks have
approximately the same parallel bandwidth as a _single_ RAM subsystem.
This leaves latency as the primary concern. A second key is the use
of techniques like delayed duplicate detection to avoid latency
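
As a rough illustration of delayed duplicate detection, here is a minimal sketch of a breadth-first search that never does a random-access "have I seen this state?" lookup. It appends every successor to a batch, then sorts the batch and removes duplicates in one sequential pass against the sorted set of visited states. The function name, the use of in-memory lists in place of on-disk files, and the toy graph are all assumptions made for this example; they are not taken from the talk.

```python
import heapq
from typing import Callable, Iterable, List


def bfs_delayed_duplicate_detection(
    start: int,
    neighbors: Callable[[int], Iterable[int]],
) -> List[int]:
    """Return the number of newly discovered states at each BFS level.

    Hypothetical sketch: the sorted lists stand in for sorted files on disk,
    which would be read and written sequentially rather than probed randomly.
    """
    visited: List[int] = [start]   # kept sorted at all times
    frontier: List[int] = [start]
    level_sizes: List[int] = [1]

    while frontier:
        # Phase 1: expand the frontier with no duplicate checks at all.
        pending: List[int] = []
        for state in frontier:
            pending.extend(neighbors(state))

        # Phase 2: sort the batch, then make one sequential pass that drops
        # repeats within the batch and states already present in `visited`.
        pending.sort()
        new_states: List[int] = []
        i = 0
        prev = None
        for state in pending:
            if state == prev:
                continue  # duplicate inside this batch
            prev = state
            while i < len(visited) and visited[i] < state:
                i += 1
            if i < len(visited) and visited[i] == state:
                continue  # already seen at an earlier level
            new_states.append(state)

        # Merge two sorted sequences; on disk this is a streaming merge.
        visited = list(heapq.merge(visited, new_states))
        frontier = new_states
        if new_states:
            level_sizes.append(len(new_states))

    return level_sizes


def ring_neighbors(state: int) -> List[int]:
    """Toy graph (an assumption for the example): ten states on a cycle."""
    return [(state + 1) % 10, (state - 1) % 10]


if __name__ == "__main__":
    print(bfs_delayed_duplicate_detection(0, ring_neighbors))
```

The design choice is to trade many small random lookups for a sort and a linear scan, both of which stream through storage and so depend on bandwidth rather than latency.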
