OpenStack – Take 2 – Doing It The Hard Way

This is Part 2 of an ongoing series on testing AMD’s SeaMicro 15000 chassis with OpenStack. (Note: Part 1 is delayed while AMD vets it for technical accuracy.)

In the first part, we configured the SeaMicro for a basic OpenStack deployment using MaaS and Juju for bootstrapping and orchestration. This works fine for quickly showing what OpenStack looks like and what it can do on the SeaMicro hardware. That’s great and all, but Juju is only easily configurable within the options its service charms expose. Because it fully manages the configuration, any manual configuration added (for example, the console proxy in our previous example) will get wiped out whenever Juju makes a change (to a relationship, for example).

For production purposes, there are other, more powerful orchestration suites out there (Puppet, Chef, SaltStack, etc.), but because they are highly configurable they also require significantly more manual intervention and scripting. This makes sense, of course: the very thing that makes Juju rapid and easy is what makes it of questionable value in a production deployment. To that end, we’re going to deploy OpenStack on the SeaMicro chassis the hard way: from scratch.

The Architecture

If you’re going to do something, you should do it right. We decided to take a slightly different approach to the design of the stack than the Juju-based reference architecture did. If we were creating a “cloud-in-a-box” for our own datacenter, we would be optimizing for scalability, availability and performance. This means highly available control services capable of running one or more additional SeaMicro chassis, as well as maximizing the number of compute resources available within the chassis. While the bootstrap/install node isn’t required to be a blade on the chassis for this exercise, we’re going to leave it in place with the expectation that it will be used in future testing as an orchestration control node. That leaves 63 server cards available to us. The Juju deployment used 29 of them, with most control services being non-redundant. (The only redundant services were the Cinder block services.)

The Juju-based architecture had the following layout for the 29 configured servers:

  • Machine 1 – Juju control node
  • Machine 2 – MySQL database node
  • Machine 3 – RabbitMQ AMQP node
  • Machine 4 – Keystone identity service, Glance image service, and Horizon OpenStack dashboard service
  • Machine 5 – Nova cloud controller
  • Machine 6 – Neutron network gateway
  • Machines 7-9 – Cinder block storage nodes
  • Machines 10-29 – Nova compute nodes

The goal of this experiment is to end up with the following layout:

  • Machines 1-3 – Controller nodes (Keystone, Nova controller, MySQL, RabbitMQ and Neutron)
  • Machines 4-6 – Storage nodes (Cinder, Ceph, Swift, etc… Specific backends TBD)
  • Machines 7-63 – Nova compute nodes
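
As a sanity check on the numbers, here is a minimal Python sketch of the target layout. The role names and the role-plus-index hostnames (in the same style as the controller-0 entry in the DHCP config further down) are just our illustrative naming, not anything OpenStack requires.

```python
from collections import Counter

# Target layout from the list above: (first machine, last machine, role).
LAYOUT = [
    (1, 3, "controller"),
    (4, 6, "storage"),
    (7, 63, "compute"),
]


def plan(machine):
    """Return (role, hostname) for a 1-based machine number."""
    for first, last, role in LAYOUT:
        if first <= machine <= last:
            # Hypothetical naming scheme: role plus a zero-based index.
            return role, f"{role}-{machine - first}"
    raise ValueError(f"machine {machine} is outside the 63-card plan")


def role_counts():
    """Count how many machines land in each role."""
    return Counter(plan(m)[0] for m in range(1, 64))
```

Calling role_counts() tallies 3 controllers, 3 storage nodes and 57 compute nodes, which accounts for all 63 cards.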

As part of this re-deployment we’ll be running through each of the OpenStack control services, their ability to be made highly available, and what configuration is required to do so.

A world without MaaS

Since we’re eliminating the MaaS bootstrap server, we’ll need to replace the services it provides. NAT and routing are still configured in the kernel, so the cloud servers will have the same internet access as before. The services we’ll need to replace are:

  • TFTP – using tftpd-hpa for serving the PXE boot images to the servers on initial install
  • DHCP – using isc-dhcp-server for address assignment and TFTP server options
  • DNS – MaaS uses dnsmasq to cache/serve local DNS. We’ll just be replacing this with a DHCP option for the upstream DNS servers for simplicity’s sake

Configuring the bootstrap process

Ubuntu makes this easy, with a simple apt-get install tftpd-hpa. Because the services conflict, this step will uninstall MaaS and start the tftpd-hpa service.
On our MaaS server, isc-dhcp-server was already installed, so we just needed to create a /etc/dhcpd.conf file. Since we want to have “static” IP addresses, we’ll create a fixed lease for every server rather than an actual DHCP pool.

First, we need all the MAC addresses of the down servers (everything except our MaaS server):

seasm15k01# show server summary | include /0 | include Intel | exclude up

The output is easily run through shell, Perl, or whatever your text-parsing language of choice is to produce a DHCP config that looks something like the following:


subnet 10.1.1.0 netmask 255.255.255.0 {
filename "pxelinux.0";
option subnet-mask 255.255.255.0;
option broadcast-address 10.1.1.255;
option domain-name "local";
option routers 10.1.1.1;
option interface-mtu 9000; # Need this for Neutron GRE
}

host controller-0 {
hardware ethernet 00:22:99:ec:00:00;
fixed-address 10.1.1.20;
}

etc…
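
For what it’s worth, the conversion can be sketched in a few lines of Python. This is only an illustration: it assumes the MAC addresses have already been pulled out of the chassis output into (name, MAC) pairs, and the second MAC below is a made-up placeholder, not one of our servers.

```python
import ipaddress


def host_entries(hosts, first_ip="10.1.1.20"):
    """Render one dhcpd 'host' block per (name, mac) pair.

    Addresses are handed out sequentially starting at first_ip.
    """
    start = ipaddress.IPv4Address(first_ip)
    blocks = []
    for offset, (name, mac) in enumerate(hosts):
        blocks.append(
            f"host {name} {{\n"
            f"hardware ethernet {mac.lower()};\n"
            f"fixed-address {start + offset};\n"
            f"}}\n"
        )
    return "\n".join(blocks)


# Hypothetical input; only the first MAC appears in the config above.
SAMPLE_HOSTS = [
    ("controller-0", "00:22:99:ec:00:00"),
    ("controller-1", "00:22:99:EC:01:00"),
]

SAMPLE_CONFIG = host_entries(SAMPLE_HOSTS)
```

Printing SAMPLE_CONFIG gives host blocks in the same shape as the controller-0 entry, ready to append below the subnet declaration.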

Restart the DHCP server process and we now have a functioning DHCP environment directing the servers to our TFTP server.

Fun with preseeds

The basic Ubuntu netboot image loads an interactive installer, which was fine when we configured the MaaS server, but nobody wants to manually enter installation details for 63 servers. By passing some preseed information to the kernel, we can have it download and run the installer unattended; it just needs some hints as to what it should be doing.

This took a lot of trial and error, even just to get a good environment with nothing but the base system tools and an ssh server (which is all we want for the controller nodes for now).

The PXE defaults config we settled on looks something like this:


default install
label install
menu label ^Install
menu default
kernel linux
append initrd=ubuntu-installer/amd64/initrd.gz console=ttyS0,9600n8 auto=true priority=critical interface=auto netcfg/dhcp_timeout=120 preseed/url=http://10.1.1.1/preseed-openstack.cfg -- quiet
ipappend 2

This tells the kernel to use the SeaMicro serial console (mandatory in this environment), to install over the first interface with link (eth0 here), to disable hardware-based interface renaming, and to fetch a preseed file hosted on the MaaS server for the install.
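
Since pxelinux reads the append options from a single line, and the list of boot parameters tends to grow during trial and error, it can help to build that line from a list. The following is a rough Python sketch using the parameters from the config above; the build_append helper is our own invention, not part of any tool. Parameters placed after the "--" separator may also be carried over to the installed system’s kernel command line.

```python
def build_append(installer_args, target_args=()):
    """Join boot parameters into a single pxelinux 'append' line."""
    parts = list(installer_args)
    if target_args:
        # Separator between installer-only and carried-over parameters.
        parts.append("--")
        parts.extend(target_args)
    for part in parts:
        # A stray space or newline would split or truncate the option list.
        if any(ch.isspace() for ch in part):
            raise ValueError(f"boot parameter contains whitespace: {part!r}")
    return "append " + " ".join(parts)


INSTALLER_ARGS = [
    "initrd=ubuntu-installer/amd64/initrd.gz",
    "console=ttyS0,9600n8",
    "auto=true",
    "priority=critical",
    "interface=auto",
    "netcfg/dhcp_timeout=120",
    "preseed/url=http://10.1.1.1/preseed-openstack.cfg",
]

APPEND_LINE = build_append(INSTALLER_ARGS, ["quiet"])
```

APPEND_LINE can then be dropped into the PXE config as-is, with no risk of an accidental line break in the middle of the options.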

The preseed file partitions the disk using a standard single-partition layout (plus swap), creates an “ubuntu” user with a long useless password, disables root login, and copies an authorized keys file for ssh use into /home/ubuntu/.ssh, allowing ssh login post-install. Since we’re not using a fastpath installer like MaaS does, this takes a bit of time, but it’s hands off once the preseed file is created. Once we get to installing compute nodes later on, we’ll probably find a way to use the preseed file to script the installation and configuration of the OpenStack components on them. Since the controller nodes will be installed manually (for this experiment), there isn’t any reason to add much beyond basic ssh access in the initial preseed.

One note: Ubuntu insists on overwriting the preseed target’s /etc/network/interfaces with what it uses to bootstrap the network. Because of udev renaming, that file may not match the interface names on the installed system, which causes the server to come up without a network. A hacky solution that seems to work is to download an interfaces file, then chattr +i /target/etc/network/interfaces at the end of the preseed so the installer cannot overwrite it. Additionally, udev was exhibiting some strange behavior, renaming only eth0 and eth3 to the old ethX nomenclature on most servers, but leaving the other 5 interfaces in the newer pXpX style. This unfortunately seemed to be somewhat inconsistent, with some servers honoring the udev persistent net rules file and renaming all interfaces to ethX, and others ignoring it. Since eth0 was renamed in all cases, we decided to ignore this issue for the time being, as this isn’t intended to be a production preseed environment.
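
To make the interfaces hack concrete, here is a small Python sketch that assembles a preseed late_command line from the two steps described above. Treat it as an assumption-laden illustration: the URL of the interfaces file is hypothetical, and it presumes wget and chattr are usable from the installer environment, as they appeared to be for us.

```python
def late_command(commands):
    """Chain shell commands into one preseed/late_command line.

    Commands are joined with '&&' so a failed step stops the chain.
    """
    if not commands:
        raise ValueError("late_command needs at least one command")
    return "d-i preseed/late_command string " + " && ".join(commands)


# Hypothetical location of the replacement interfaces file.
INTERFACES_URL = "http://10.1.1.1/interfaces"
TARGET_FILE = "/target/etc/network/interfaces"

LATE_COMMAND = late_command([
    # Fetch our own interfaces file into the installed system.
    f"wget -O {TARGET_FILE} {INTERFACES_URL}",
    # Mark it immutable so the installer cannot overwrite it.
    f"chattr +i {TARGET_FILE}",
])
```

The resulting LATE_COMMAND string is a single line that would go at the end of the preseed file.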

To be continued, once all these nodes install and boot.