In this VMworld TV interview, VCDX Josh Odgers explains how the Nutanix system achieves reliability. The Nutanix Virtual Computing Platform is designed and engineered from the ground up to provide best-in-class reliability and to cope efficiently with possible hardware and software failures. The distributed software architecture runs a virtual storage controller (Controller VM, or CVM) on each node, forming a distributed system. All nodes actively work together to aggregate individual direct-attached storage resources into a single global namespace that can be leveraged by all hosts. All storage resources are managed by the Nutanix Distributed Filesystem (NDFS) to ensure that data and system integrity are preserved in the event of a node, disk, or software failure.
Data Protection
NDFS uses a distributed Oplog (short for operations log, and analogous to a fault-tolerant
journal on a local filesystem) as a staging area to absorb incoming writes onto a fast,
low-latency SSD tier. The data is then coalesced and later written (i.e., drained) to back-end
storage resources (Extent Store) asynchronously.
The Extent Store is made up of extents,
which are variable-sized contiguous regions of a vDisk (i.e., an NDFS file).
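The staging-and-drain behavior described above can be illustrated with a small sketch. It is not NDFS code: the Oplog class, the coalesce and drain functions, and the dictionary standing in for the Extent Store are all hypothetical names chosen for illustration, and the "later write wins" overlap rule is an assumption. It only shows the general idea of absorbing writes in arrival order and later merging them into contiguous regions before writing them to back-end storage.

```python
from dataclasses import dataclass, field


@dataclass
class Oplog:
    """Hypothetical in-memory stand-in for an SSD-backed staging log."""
    entries: list = field(default_factory=list)  # (offset, data) in arrival order

    def append(self, offset: int, data: bytes) -> None:
        # Absorb the incoming write as-is; no merging happens on the write path.
        self.entries.append((offset, data))


def coalesce(entries):
    """Merge staged writes into contiguous (offset, data) runs.

    Assumption: when writes overlap, the most recent one wins.
    """
    image = {}
    for offset, data in entries:
        for i, byte in enumerate(data):
            image[offset + i] = byte  # later entries overwrite earlier ones

    runs = []
    for pos in sorted(image):
        if runs and runs[-1][0] + len(runs[-1][1]) == pos:
            runs[-1][1].append(image[pos])  # extends the current contiguous run
        else:
            runs.append((pos, bytearray([image[pos]])))  # starts a new run
    return [(start, bytes(buf)) for start, buf in runs]


def drain(oplog: Oplog, extent_store: dict) -> None:
    """Write coalesced runs to the (hypothetical) extent store, then empty the log."""
    for start, data in coalesce(oplog.entries):
        extent_store[start] = data
    oplog.entries.clear()
```

For example, staging the writes (0, b"abcd"), (4, b"efgh"), (2, b"XY") and (100, b"z") and then calling drain leaves two variable-sized regions in the store: one starting at offset 0 and one at offset 100.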
NDFS implements a fully distributed design that prevents Oplog data loss in the event of a
node failure. Before any write is acknowledged to the host, it is synchronously replicated to
an Oplog on an adjacent node. All nodes participate in replication. Only after the data
and its associated metadata are replicated will the host receive an acknowledgment of a
successful write. This ensures that the data exists in at least two independent locations
within the cluster and is fault tolerant.
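The acknowledgment rule above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the NDFS implementation: the Node class and the write function are hypothetical, and both the fallback to the next peer and the rollback of the local entry when no peer is reachable are choices made for this sketch that the source does not describe. The only behavior taken from the text is that a write is acknowledged only after the data and its metadata exist in an Oplog on a second node.

```python
class Node:
    """Hypothetical cluster node holding an Oplog (a plain list in this sketch)."""

    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy
        self.oplog = []

    def persist(self, data, metadata):
        # A failed node cannot accept the replica.
        if not self.healthy:
            raise ConnectionError(f"{self.name} is unavailable")
        self.oplog.append((data, metadata))


def write(local, peers, data, metadata):
    """Return True (acknowledge) only once the write exists on two nodes."""
    local.persist(data, metadata)
    for peer in peers:
        try:
            # Synchronous replication: wait for the peer before acknowledging.
            peer.persist(data, metadata)
        except ConnectionError:
            continue  # assumption: try the next candidate peer
        return True  # data and metadata now exist in two independent locations
    # Assumption: with no replica available, undo the local entry and do not ack.
    local.oplog.pop()
    return False
```

With nodes a, b (failed), and c, the call write(a, [b, c], b"x", {"vdisk": "v1", "offset": 0}) skips b, stores the replica on c, and returns True. If b is the only candidate, the same call returns False and the host never sees an acknowledgment.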
Learn more at: Nutanix Tech Note System Reliability