This paper describes best practices for optimizing Big Data applications running on VMware vSphere. It documents hardware, software, and vSphere configuration parameters, as well as tuning parameters for the operating system, Hadoop, and Spark. The test cluster of 12 Dell servers, 10 of which were dedicated to Hadoop worker processes, is described in detail to show how the best practices were applied in its configuration. Test results are presented for two MapReduce and two Spark applications running both on vSphere and directly on the hardware, along with results from a reduced-size cluster of 5 worker servers.
The virtualized cluster outperformed the bare-metal cluster by 5-10% on all MapReduce and Spark workloads except one Spark workload, which ran at parity. All workloads showed excellent scaling from 5 to 10 worker servers and from smaller to larger dataset sizes.
Download the Technical White Paper: Big Data Performance on vSphere 6: Best Practices for Optimizing Virtualized Big Data Applications