
Switch Engines Volume Reliability


#1

Hi,

We would like to host a website and database on Switch Engines. As we don’t want to lose any data, we are currently trying to figure out how reliable Switch Engines is.

The documentation mentions Snapshots, which look useful for periodically backing up the whole machine. However, we were wondering how reliable the Volume storage is. Is our volume stored redundantly, e.g. using RAID or replication?

Cheers, Philipp


#2

All volume data is stored with 3-way replication on separate disks and separate servers.

The underlying mechanism is Ceph RBD. In our experience, its data storage is very reliable. Even though we have to replace faulty disks every now and then, data remains intact and usually accessible. Access doesn’t even slow down much when we lose a disk, because data is simply served from the remaining two copies while the missing third copies are re-created across the (hundreds of) remaining disks. Access does slow down noticeably when a replacement disk is inserted, because all those third copies are then copied back to the “right” disk, which becomes a bottleneck.
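A toy Python model may help picture the replication and recovery behavior described above. It is only a sketch under simplifying assumptions: the server/disk layout, the function names `place`, `read` and `recover`, and the seeded random placement are hypothetical illustrations, not Ceph’s actual CRUSH placement or recovery code, which is considerably more elaborate.

```python
import random

REPLICAS = 3  # 3-way replication, as described above


def place(obj_id, servers, replicas=REPLICAS):
    """Choose `replicas` (server, disk) locations, never two on the same server.

    Toy placement seeded by the object id, so the same id always maps to the
    same locations. This is NOT Ceph's CRUSH algorithm, only an illustration.
    """
    rng = random.Random(obj_id)
    hosts = rng.sample(sorted(servers), replicas)
    return [(h, rng.choice(sorted(servers[h]))) for h in hosts]


def read(obj_id, placement, failed, store):
    """Serve a read from the first replica whose disk has not failed."""
    for loc in placement:
        if loc not in failed:
            return store[loc][obj_id]
    raise OSError(f"no surviving replica for {obj_id}")


def recover(obj_id, placement, failed, servers, store):
    """Re-create missing copies on healthy disks of servers not yet holding one."""
    survivors = [loc for loc in placement if loc not in failed]
    if not survivors:
        raise OSError(f"{obj_id}: all replicas lost")
    data = store[survivors[0]][obj_id]          # copy from a surviving replica
    used_hosts = {h for h, _ in survivors}
    new_placement = list(survivors)
    for h in sorted(servers):
        if len(new_placement) == len(placement):
            break                               # replica count restored
        if h in used_hosts:
            continue                            # keep copies on separate servers
        healthy = [d for d in sorted(servers[h]) if (h, d) not in failed]
        if healthy:
            loc = (h, healthy[0])
            store.setdefault(loc, {})[obj_id] = data
            new_placement.append(loc)
            used_hosts.add(h)
    return new_placement


if __name__ == "__main__":
    # Hypothetical cluster: 6 servers with 4 disks each.
    servers = {f"srv{i}": [f"disk{j}" for j in range(4)] for i in range(6)}
    store = {}
    obj = "volume-1/object-42"
    placement = place(obj, servers)
    for loc in placement:
        store.setdefault(loc, {})[obj] = b"payload"
    failed = {placement[0]}                        # one disk dies
    print(read(obj, placement, failed, store))     # still served from a surviving copy
    print(recover(obj, placement, failed, servers, store))
```

Keeping each copy on a different server is what lets a single disk (or even a whole server) fail without making data unavailable; the real system additionally spreads recovery work over many disks, which is why losing a disk barely slows things down.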

In the past few days (between 6 and 14 October), we had some networking issues that led to (temporary) unavailability of volume data. This is a consequence of a tradeoff: “if in doubt, block access rather than risking data loss or inconsistency”. We regret these issues. They were very hard to diagnose, but we think we have found their origin and should be able to work around them from now on, until the underlying software bug (presumably in the Linux kernel) is fixed.

Another nice feature is that Ceph regularly “scrubs” its data: the three copies are read and compared periodically. This detects accidental modifications of data “at rest”, as well as latent disk errors.
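The scrubbing idea can be illustrated in a few lines of Python. This is a hypothetical sketch, not Ceph’s implementation: it hashes each copy with SHA-256 and flags any copy that disagrees with the majority. The hash function, the majority vote and the `scrub` name are assumptions made for illustration; Ceph’s own scrub and repair logic is more involved.

```python
import hashlib
from collections import Counter


def scrub(obj_id, copies):
    """Compare all copies of one object and return the locations that disagree.

    `copies` maps a (hypothetical) location name to the bytes read from it.
    A copy is flagged when its SHA-256 digest differs from the majority digest.
    """
    digests = {loc: hashlib.sha256(data).hexdigest() for loc, data in copies.items()}
    majority, votes = Counter(digests.values()).most_common(1)[0]
    if votes * 2 <= len(digests):
        # No clear majority: refuse to guess which copy is correct.
        raise ValueError(f"{obj_id}: copies disagree and no majority exists")
    return {loc for loc, d in digests.items() if d != majority}


if __name__ == "__main__":
    good = b"database page"
    copies = {"srv1/disk0": good, "srv3/disk2": b"database pagE", "srv5/disk1": good}
    print(scrub("volume-1/object-42", copies))  # → {'srv3/disk2'}
```

With three copies, a single bad copy is outvoted by the other two; with only two copies, a mismatch could be detected but not attributed to either side.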


#3

That sounds great and is definitely more than good enough for us. I think we also experienced some of the storage issues in the last few days, but well, if it’s fixed now, we are happy :-)

Thank you for the detailed response!