Hans de Leenheer has blogged about Nexenta again, with a good post about Nexenta failover techniques.
1) Disk management:
The first part of failover management is still plain old ZFS. It’s the way disks work together, how I/O is handled and how data corruption is avoided.
- Storage Pool: a set of (equally configured) vDEVs
- vDEV: a set of disks with a certain protection level
- RAIDZ: first form of physical disk protection. Examples:
- RAIDZ-5: 4 disks data, 1 disk parity
- RAIDZ2-6: 4 disks data, 2 disks parity
- RAIDZ3-9: 6 disks data, 3 disks parity
- Mirrored Disks: second form of physical disk protection. Here too you have multiple choices (dual-mirror, triple-mirror, …)
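To make the RAIDZ examples above a bit more tangible, here is a small sketch that works out the width, the number of disk failures a vDEV survives and the share of raw capacity left for data. The function name `raidz_layout` is made up for this illustration, and the usable fraction is the naive data/(data+parity) ratio, which ignores padding and metadata overhead.

```python
def raidz_layout(data_disks: int, parity_disks: int) -> dict:
    """Summarise a RAIDZ vDEV: width, tolerated failures, naive usable fraction."""
    if data_disks < 1 or parity_disks not in (1, 2, 3):
        raise ValueError("RAIDZ uses 1, 2 or 3 parity disks and at least 1 data disk")
    width = data_disks + parity_disks
    return {
        "width": width,
        "tolerated_failures": parity_disks,
        # Assumption: ignores padding and metadata, so real numbers are lower.
        "usable_fraction": data_disks / width,
    }


# The three examples from the list above.
examples = {
    "RAIDZ-5": raidz_layout(4, 1),
    "RAIDZ2-6": raidz_layout(4, 2),
    "RAIDZ3-9": raidz_layout(6, 3),
}

for name, info in examples.items():
    print(
        f"{name}: {info['width']} disks, "
        f"survives {info['tolerated_failures']} failure(s), "
        f"{info['usable_fraction']:.0%} usable"
    )
```

The trade-off is visible right away: every extra parity disk buys one more tolerated failure at the cost of usable capacity.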
Stripes: writes are striped across all members of a vDEV, so the speed/bandwidth is limited by the slowest member. To add bandwidth/IOPS, add more vDEVs. But this only helps with parallel I/O streams! One I/O stream = 1 vDEV.
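The striping rule of thumb can be put into a back-of-the-envelope model. This is only a sketch of the reasoning above, not a benchmark: `vdev_bandwidth` and `pool_bandwidth` are hypothetical helpers, the MB/s figures are invented, and the model assumes equally configured vDEVs with one I/O stream landing on one vDEV.

```python
def vdev_bandwidth(member_mb_s: list[float], parity_disks: int = 0) -> float:
    """Every member takes part in a stripe, so the slowest disk sets the pace."""
    data_disks = len(member_mb_s) - parity_disks
    if data_disks < 1:
        raise ValueError("a vDEV needs at least one data disk")
    return min(member_mb_s) * data_disks


def pool_bandwidth(vdev_mb_s: float, vdev_count: int, streams: int) -> float:
    """Assumption from the text: one I/O stream keeps exactly one vDEV busy."""
    return vdev_mb_s * min(vdev_count, streams)


# A 4+1 RAIDZ vDEV where one disk is slower than the rest (made-up numbers).
one_vdev = vdev_bandwidth([150, 150, 150, 150, 100], parity_disks=1)

single_stream = pool_bandwidth(one_vdev, vdev_count=3, streams=1)
three_streams = pool_bandwidth(one_vdev, vdev_count=3, streams=3)
eight_streams = pool_bandwidth(one_vdev, vdev_count=3, streams=8)

print(one_vdev, single_stream, three_streams, eight_streams)
```

In this model a single stream never gets faster by adding vDEVs, and more streams than vDEVs do not help either.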
RAID5 write-hole: ZFS is protected against the RAID5 write-hole because it does copy-on-write instead of read-modify-write. If a power failure hits before some blocks are completely written and acknowledged (incl. parity), the existing data is untouched, so it is as if the write never happened.
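The difference between the two approaches is easy to show with a toy model. The classes `InPlaceStripe` and `CowStore` below are invented for this illustration and have nothing to do with how ZFS is actually implemented; they only mimic "update data and parity in place" versus "write somewhere new, then switch a pointer".

```python
class InPlaceStripe:
    """Toy read-modify-write stripe: two data blocks plus XOR parity, updated in place."""

    def __init__(self, a: int, b: int):
        self.data = [a, b]
        self.parity = a ^ b

    def write(self, index: int, value: int, crash_before_parity: bool = False) -> None:
        self.data[index] = value          # step 1: overwrite the data block
        if crash_before_parity:
            return                        # simulated power failure
        self.parity = self.data[0] ^ self.data[1]  # step 2: update parity

    def consistent(self) -> bool:
        return self.parity == (self.data[0] ^ self.data[1])


class CowStore:
    """Toy copy-on-write store: live blocks are never overwritten."""

    def __init__(self, initial: bytes):
        self.blocks = {0: initial}        # block id -> contents
        self.root = 0                     # points at the live, consistent block
        self._next_id = 1

    def write(self, data: bytes, crash_before_commit: bool = False) -> None:
        new_id = self._next_id
        self._next_id += 1
        self.blocks[new_id] = data        # step 1: write to a fresh location
        if crash_before_commit:
            return                        # simulated power failure: root untouched
        self.root = new_id                # step 2: switch the pointer

    def read(self) -> bytes:
        return self.blocks[self.root]


stripe = InPlaceStripe(1, 2)
stripe.write(0, 7, crash_before_parity=True)
print("in-place stripe consistent after crash:", stripe.consistent())

store = CowStore(b"old")
store.write(b"new", crash_before_commit=True)
print("copy-on-write store after crash:", store.read())
```

After the simulated crash the in-place stripe has data and parity that no longer match (the write-hole), while the copy-on-write store still returns the old, consistent contents.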
Physical disk-enclosure failover: if you design smartly, you can even survive the loss of a whole disk enclosure by aligning the physical disks of each vDEV vertically (1 per enclosure).
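A quick sketch of what "vertical" alignment means in practice. The enclosure and disk names (`enc1`, `e1d1`, ...) and both helper functions are hypothetical; the check simply verifies that losing any single enclosure costs each vDEV no more disks than its parity level can absorb.

```python
def vertical_vdevs(enclosures: dict[str, list[str]]) -> list[list[str]]:
    """Build vDEVs that take exactly one disk from each enclosure."""
    if len({len(disks) for disks in enclosures.values()}) != 1:
        raise ValueError("all enclosures need the same number of disks")
    # zip(*...) turns rows (enclosures) into columns (vDEVs).
    return [list(column) for column in zip(*enclosures.values())]


def survives_enclosure_loss(
    vdevs: list[list[str]], enclosures: dict[str, list[str]], parity_disks: int
) -> bool:
    """True if no single enclosure failure removes more disks than parity covers."""
    for lost_disks in enclosures.values():
        lost = set(lost_disks)
        for vdev in vdevs:
            if sum(disk in lost for disk in vdev) > parity_disks:
                return False
    return True


# Five enclosures with four disks each (invented names).
enclosures = {
    f"enc{e}": [f"e{e}d{d}" for d in range(1, 5)] for e in range(1, 6)
}

vertical = vertical_vdevs(enclosures)                   # 4 vDEVs of 5 disks (4+1)
horizontal = [list(d) for d in enclosures.values()]     # 1 vDEV per enclosure

print(vertical[0])
print("vertical survives:", survives_enclosure_loss(vertical, enclosures, 1))
print("horizontal survives:", survives_enclosure_loss(horizontal, enclosures, 1))
```

With single parity the vertical layout survives the loss of any one enclosure, whereas a vDEV built from the disks of one enclosure is gone as soon as that enclosure fails.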
2) System Scale:
The next step is in-system scale. It’s not really high availability but an equally important step in the story going forward. Because ZFS benefits a lot from more bandwidth, you’d want to "scale-out" first inside your system and then go to "scale-up". Here you find a simplified example of that.
So instead of daisy chaining the first 3 nodes and then adding a 2nd and 3rd loop, I did it the other way around, so you’d have 3 loops of 6Gbps SAS connectivity from the beginning.
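The two cabling orders can be compared with a few lines of arithmetic. This is a rough sketch only: the helpers are hypothetical, the 6 Gbps per loop is the figure quoted above, and the model simply counts how many loops are in use, treating each loop as a shared 6 Gbps path.

```python
LOOP_GBPS = 6  # per-loop SAS bandwidth as quoted in the post


def chain_first(nodes: int, loops: int, chain_len: int = 3) -> list[int]:
    """Fill one loop with chain_len daisy-chained nodes before opening the next."""
    if nodes > loops * chain_len:
        raise ValueError("not enough loop capacity")
    return [i // chain_len for i in range(nodes)]


def loops_first(nodes: int, loops: int) -> list[int]:
    """Spread nodes over all loops first, then start chaining deeper."""
    return [i % loops for i in range(nodes)]


def aggregate_gbps(assignment: list[int]) -> int:
    """Upper bound: every loop that has at least one node contributes 6 Gbps."""
    return len(set(assignment)) * LOOP_GBPS


for nodes in (3, 6, 9):
    a = aggregate_gbps(chain_first(nodes, loops=3))
    b = aggregate_gbps(loops_first(nodes, loops=3))
    print(f"{nodes} nodes: chain-first {a} Gbps, loops-first {b} Gbps")
```

Both orders end up at the same place once all 9 nodes are installed; the difference is that spreading over the loops first gives you the full 3 x 6 Gbps with only 3 nodes.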
Remark: this is a theoretical model. I would have to double-check with the solutions team before designing this for a customer.