Deploying VMware ESXi 4.1 the enterprise way
Friday, October 8. 2010
My ex-colleague and virtualization friend Ernst Cozijnsen, who works as a virtualization subject matter expert over at Capgemini, did something awesome this week. We philosophised about unravelling the ESXi boot process in a few (long) phone calls over the past few days, and just before the start of VMworld he actually did the unthinkable. Here's his description of deploying VMware ESXi 4.1 in the enterprise.
Ok, for us the time came to start deploying VMware ESXi 4.1 on some massive servers that rolled into our datacenter. Normally we use RDP (Altiris) for deployment of the native ESX, so naturally we also wanted to deploy VMware ESXi 4.1 that way. This article from Eric Wannemacher describes a way to inject ESXi into your RDP PXE boot menu, giving you the possibility to PXE boot the ESXi installer and start your install.
The downside of that approach is that the installer starts directly after booting and destroys everything in its way, including all files on your destination drive. The margin of error with this method is very small, and a catastrophic failure can happen very easily, without any safety net. Since this kind of failure can have enormous consequences in a large enterprise environment, we had to come up with a way that was more inside our "comfort zone".
The (let's call it) traditional way of deploying the normal ESX via RDP was to install an automation partition on the target system drive. After booting into the automation partition, a job called "# Configure Boot Environment" gets executed.
What this normally does is modify the default.cfg kick-start file with the server-specific parameters and copy it to %ID%.cfg, where %ID% is the computer object ID in RDP.
After this, a script called "configuregrub.sh", "vmesx40.sh" or similar is called, which copies the vmlinuz (kernel) and initrd.img (RAM drive) from the new ESX install tree to the /init directory of the automation partition. This way the correct kernel version is booted for the installer.
Changes are also made to grub.conf so the correct kick-start file (%ID%.cfg) is used when booting the automation partition the next time.
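The kick-start customisation step can be sketched roughly as below. Note that the template token @HOSTNAME@, the example object ID and the hostname are all assumptions for illustration; the real RDP job will use its own placeholder conventions.

```shell
# Hypothetical sketch of the "# Configure Boot Environment" kick-start step.
# Placeholder token, ID and hostname are invented for this demo.
set -e
WORKDIR=$(mktemp -d)          # stands in for the kick-start directory
ID=5001                       # example RDP computer object ID
ESX_HOSTNAME=esx01.example.local

# A template default.cfg with an assumed placeholder token
cat > "$WORKDIR/default.cfg" <<'EOF'
network --bootproto=static --hostname=@HOSTNAME@
rootpw changeme
EOF

# Substitute the server-specific value and save as %ID%.cfg
sed "s/@HOSTNAME@/$ESX_HOSTNAME/" "$WORKDIR/default.cfg" > "$WORKDIR/$ID.cfg"
```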
Since ESXi uses a stripped version of Linux and doesn't have the normal "vmlinuz" kernel and initrd.img RAM drive, this way of replacing files is unusable for installing ESXi.
Rather...
ESXi uses a small stripped syslinux boot loader to boot the multi-kernel instance for the installer. So installing this boot loader would fix the problem, right?
After I wrecked our central deployment server not once, not twice, but three times, a very nice and solid procedure came out.
- Download the syslinux 3.70 tarball sources from http://syslinux.zytor.com and place the tarball in a new subdir of the ESXi install tree. MAKE SURE it's version 3.70, otherwise it won't work.
- In that same dir, create a file "syslinux.cfg" containing the following code. This file NEEDS to be there for the boot loader to be configured correctly:
----------------------------------------------------
default 1
prompt 1
menu title VMware VMvisor Boot Menu
timeout 50
label 1
kernel mboot.c32
append vmkboot.gz ks= --- vmkernel.gz --- sys.vgz --- cim.vgz --- ienviron.vgz --- install.vgz
label 0
localboot 0x80
----------------------------------------------------
- Remove the copy "vmlinuz" and "initrd.img" statements from the (e.g. vmesx40.sh) script and replace them with a procedure that does the following:
Copy the following files (from the ESXi install tree) to the root "/" of the automation partition. So not the root "/" of the RAM drive from which you just PXE booted!
cim.vgz
ienviron.vgz
install.vgz
mboot.c32
menu.c32 (optional, if you want a boot loader menu with all kinds of nifty options)
sys.vgz
vmkboot.gz
vmkernel.gz
Now all ESXi files needed for booting the installer are in place!
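The copy step can be sketched as below. To keep the sketch runnable anywhere, temp directories stand in for the real ESXi install tree and the automation partition; substitute your actual mount points in production.

```shell
# Sketch of copying the ESXi boot files to the automation partition root.
# TREE and PART are temp-dir stand-ins for the real locations.
set -e
TREE=$(mktemp -d)    # stands in for the ESXi install tree
PART=$(mktemp -d)    # stands in for the automation partition root "/"
FILES="cim.vgz ienviron.vgz install.vgz mboot.c32 menu.c32 sys.vgz vmkboot.gz vmkernel.gz"

# Dummy source files so this demo runs without a real install tree
for f in $FILES; do touch "$TREE/$f"; done

# The actual copy: everything goes to the root of the automation partition
for f in $FILES; do
    cp "$TREE/$f" "$PART/"
done
```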
- Copy the syslinux.cfg file you created.
Do not forget to run dos2unix on this file to remove the irritating "^M" characters. Example: dos2unix syslinux.cfg ./tmpfile; mv ./tmpfile syslinux.cfg
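If dos2unix happens not to be on your automation image, tr gives the same result; the sketch below strips the CR (^M) characters with tr instead. The sample file contents are only a stand-in for your real syslinux.cfg.

```shell
# Portable equivalent of the dos2unix step, using tr to drop CR characters.
set -e
CFGDIR=$(mktemp -d)
# Stand-in syslinux.cfg written with DOS (CRLF) line endings
printf 'default 1\r\nprompt 1\r\n' > "$CFGDIR/syslinux.cfg"

tr -d '\r' < "$CFGDIR/syslinux.cfg" > "$CFGDIR/tmpfile"
mv "$CFGDIR/tmpfile" "$CFGDIR/syslinux.cfg"
```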
- cd to the root of the automation partition and untar the syslinux tarball.
Enter <automation partition root>/syslinux-3.70/linux
Execute ./syslinux -s /dev/<target BOOT PARTITION> (in my case /dev/sda1)
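The untar mechanics look like this. So the sketch can run anywhere, it first builds a tiny stand-in tarball in a temp directory; on the real system you would of course use the syslinux-3.70 tarball you placed on the automation partition, and the final syslinux command is shown only as a comment because it writes to a live boot partition.

```shell
# Demo of the untar step with a locally created stand-in tarball.
set -e
ROOT=$(mktemp -d)            # stands in for the automation partition root
mkdir -p "$ROOT/stage/syslinux-3.70/linux"
touch "$ROOT/stage/syslinux-3.70/linux/syslinux"
tar -czf "$ROOT/syslinux-3.70.tar.gz" -C "$ROOT/stage" syslinux-3.70
rm -rf "$ROOT/stage"

# The actual steps from the procedure:
cd "$ROOT"
tar -xzf syslinux-3.70.tar.gz        # untar at the partition root
cd syslinux-3.70/linux
# On the real system you would now run (NOT here, it writes to the disk):
#   ./syslinux -s /dev/sda1
```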
The syslinux boot loader is now installed. Because GRUB messed up the master boot record (MBR), we need to replace it with a clean one.
- dd if=<automation partition>/syslinux-3.70/mbr/mbr.bin of=/dev/<target drive> (use the device and NOT the partition)
Example: dd if=/mnt/hd/syslinux-3.70/mbr/mbr.bin of=/dev/sda
Now that the MBR is clean again and the boot loader is installed, we can modify the syslinux.cfg file with the correct parameters. Rework your syslinux.cfg so the correct "ks=" parameter is passed to the installer and you're good to go. (Check vmesx40.sh to see how the lines are modified and parsed.)
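Reworking the ks= parameter could be done with a sed one-liner like the one below. The kickstart URL scheme and deployment server name are invented examples, not taken from the actual environment; your script would build them from the RDP object ID.

```shell
# Hypothetical sketch of filling in the ks= parameter in syslinux.cfg.
set -e
CFG=$(mktemp)
cat > "$CFG" <<'EOF'
label 1
kernel mboot.c32
append vmkboot.gz ks= --- vmkernel.gz --- sys.vgz --- cim.vgz --- ienviron.vgz --- install.vgz
EOF

ID=5001   # example RDP computer object ID
# Point the installer at this server's kickstart file (example URL scheme)
sed -i "s|ks=|ks=http://deploy.example.local/kickstart/$ID.cfg|" "$CFG"
```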
Reboot your machine and be amazed ;-) Next time: injecting ESXi drivers unravelled.