Disk-Based Parallel Computation, Rubik’s Cube, and Checkpointing

MAR 29, 2008-1 MIN
Scale Cast – A podcast about big data, distributed systems, and scalability

Description

This talk takes us on a journey through three varied but interconnected topics. First, our research lab has engaged in a series of disk-based computations extending over five years. Disks have traditionally been used for filesystems, for virtual memory, and for databases. Disk-based computation opens up an important fourth use: an abstraction for multiple disks that allows parallel programs to treat them in a manner similar to RAM. The key observation is that 50 disks have approximately the same parallel bandwidth as a _single_ RAM subsystem. With bandwidth matched, latency becomes the primary concern. A second key idea is the use of techniques such as delayed duplicate detection to avoid paying that latency.
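To make "delayed duplicate detection" concrete, here is a minimal sketch of a breadth-first search that uses it. It is not the lab's implementation, which the talk summary does not describe. The idea: instead of probing a hash table for every newly generated state (a random access that would cost a disk seek), each level's successors are streamed into a buffer unchecked, then sorted and deduplicated in one sequential pass, and finally merged against the sorted earlier levels. Plain Python lists stand in for sequential disk files, and the toy "puzzle" (permutations of four items under a swap and a rotation, loosely echoing a Rubik's Cube move set) is an illustrative assumption. The names `ddd_bfs`, `subtract_sorted`, and `puzzle_neighbors` are hypothetical.

```python
from typing import Callable, Iterable, List, Tuple

State = Tuple[int, ...]


def subtract_sorted(a: List[State], b: List[State]) -> List[State]:
    """Return the items of sorted list a that are absent from sorted list b.

    A two-pointer merge: both inputs are read strictly front to back,
    the access pattern that sequential disk I/O favors.
    """
    out: List[State] = []
    j = 0
    for x in a:
        while j < len(b) and b[j] < x:
            j += 1
        if j == len(b) or b[j] != x:
            out.append(x)
    return out


def ddd_bfs(start: State, neighbors: Callable[[State], Iterable[State]]) -> List[int]:
    """BFS with delayed duplicate detection; returns the number of new states per level.

    Assumption: each list below plays the role of a file on disk.
    """
    visited_levels: List[List[State]] = [[start]]  # each level stored sorted
    frontier: List[State] = [start]
    counts = [1]
    while frontier:
        # 1. Expand: append every successor with NO duplicate check (a streaming write).
        buffer: List[State] = []
        for s in frontier:
            buffer.extend(neighbors(s))
        # 2. Delayed detection: sort once, then drop adjacent repeats in one pass.
        buffer.sort()
        new = [s for i, s in enumerate(buffer) if i == 0 or s != buffer[i - 1]]
        # 3. Remove states already seen at earlier levels via sequential merges.
        for level in visited_levels:
            new = subtract_sorted(new, level)
        if new:
            visited_levels.append(new)
            counts.append(len(new))
        frontier = new
    return counts


def puzzle_neighbors(s: State) -> List[State]:
    """Hypothetical toy moves: swap the first two items, or rotate left by one."""
    swapped = (s[1], s[0]) + s[2:]
    rotated = s[1:] + s[:1]
    return [swapped, rotated]


if __name__ == "__main__":
    counts = ddd_bfs((0, 1, 2, 3), puzzle_neighbors)
    print(counts, sum(counts))  # the total is 24 = 4!, every ordering of four items
```

Step 3 checks all earlier levels because the rotation move is not its own inverse, so the state graph is directed. In a disk-based setting, the sort in step 2 would be an external (or distributed) sort, and the payoff is that every disk access is sequential, which is how bandwidth can substitute for low latency.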