<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Russ Garrett &#187; netboot</title>
	<atom:link href="http://russ.garrett.co.uk/tag/netboot/feed/" rel="self" type="application/rss+xml" />
	<link>http://russ.garrett.co.uk</link>
	<description></description>
	<lastBuildDate>Wed, 02 Jun 2010 21:20:55 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0</generator>
		<item>
		<title>Diskless Web Serving for Fun and Profit</title>
		<link>http://russ.garrett.co.uk/2008/12/03/diskless-web-serving-for-fun-and-profit/</link>
		<comments>http://russ.garrett.co.uk/2008/12/03/diskless-web-serving-for-fun-and-profit/#comments</comments>
		<pubDate>Wed, 03 Dec 2008 00:17:30 +0000</pubDate>
		<dc:creator>Russ</dc:creator>
				<category><![CDATA[Systems Admin]]></category>
		<category><![CDATA[boot]]></category>
		<category><![CDATA[linux]]></category>
		<category><![CDATA[netboot]]></category>
		<category><![CDATA[pxe]]></category>
		<category><![CDATA[web]]></category>

		<guid isPermaLink="false">http://russ.garrett.co.uk/?p=12</guid>
		<description><![CDATA[We&#8217;ve used network-booting diskless servers at Last.fm ever since we got our third web server back in 2004. I think it&#8217;s one of the best architectural decisions we&#8217;ve made, yet there&#8217;s precious little information around about running diskless servers. Hopefully this article will go some way towards rectifying that. Why? First, a summary of why [...]]]></description>
			<content:encoded><![CDATA[<p>We&#8217;ve used network-booting diskless servers at Last.fm ever since we got our third web server back in 2004. I think it&#8217;s one of the best architectural decisions we&#8217;ve made, yet there&#8217;s precious little information around about running diskless servers. Hopefully this article will go some way towards rectifying that.</p>
<h3>Why?</h3>
<p>First, a summary of why diskless web serving rocks so much:</p>
<ul>
<li>No disks mean fewer failures, hence less maintenance. I hate disks.</li>
<li>One single image means only one copy of your web serving environment to maintain.</li>
<li>It&#8217;s very easy to bring web servers online &#8211; seconds from power-on to web serving.</li>
</ul>
<p>There are situations where diskless web serving won&#8217;t work, primarily when the content you want to serve won&#8217;t fit into an economical amount of RAM. If your code base isn&#8217;t enormous and you serve your static assets separately (which you absolutely should be doing), you shouldn&#8217;t hit this problem.</p>
<p>Of course, you can use this for other purposes than just web serving &#8211; we also boot our <a href="http://hadoop.apache.org/">Hadoop</a> clusters off the network.</p>
<h3>What you need</h3>
<ul>
<li>A server to boot from: the hardware requirements for this are minimal, but it must be reliable. If it dies, your entire web cluster dies with it. Its availability can be improved using <a href="http://www.linux-ha.org/HaNFS">HA-NFS</a> and similar trickery.</li>
<li>Between 1 and N web nodes. The only hardware requirement for your web nodes is that they support <a href="http://en.wikipedia.org/wiki/Preboot_Execution_Environment">PXE</a> booting, but pretty much everything does these days.</li>
</ul>
<p>I&#8217;m going to assume that you&#8217;re using Debian/Ubuntu on your machines. Debian&#8217;s installation tools are very handy for getting this running; your mileage may vary with other operating systems.</p>
<h3>Booting</h3>
<p>Here&#8217;s how a server will boot up into a web serving environment:</p>
<ol>
<li>The machine powers on. The BIOS is configured to use PXE in its boot order.</li>
<li>The network card&#8217;s PXE code brings up the link and gets an IP via DHCP (usually this will be a static MAC-&gt;IP mapping).</li>
<li>The DHCP response contains the &#8220;next-server&#8221; and &#8220;filename&#8221; options. The PXE client connects to the next-server over TFTP, fetches the named file (which happens to be <a href="http://syslinux.zytor.com/wiki/index.php/PXELINUX">PXELINUX</a>), and executes it.</li>
<li>PXELINUX fetches the boot configuration for the machine, which names the Linux kernel and initrd to use. It fetches both over TFTP and boots the kernel.</li>
<li>Linux starts and runs the initrd.</li>
<li>The initrd mounts the root filesystem, switches to it, then starts init. From here on, it&#8217;s the plain Linux boot sequence.</li>
</ol>
<p>It&#8217;s a bit of a marathon, but the stages are actually quite simple. (And it&#8217;s magical when it works.) For the purposes of tying things together nicely, I&#8217;m going to run through configuring these steps roughly backwards.</p>
<h3>Bootstrap a Linux install</h3>
<p>You&#8217;ll need a Linux install for your webservers to run. This will be a full Linux filesystem which will exist on the boot server. To make this we use Debian&#8217;s excellent <code>debootstrap</code> tool. We&#8217;ll put a Debian Etch install in the directory <code>/netboot/root</code>:</p>

<div class="wp_syntax"><div class="code"><pre class="bash" style="font-family:monospace;">root<span style="color: #000000; font-weight: bold;">@</span>bootserver:<span style="color: #000000; font-weight: bold;">/</span>netboot<span style="color: #666666; font-style: italic;"># debootstrap etch ./root</span>
I: Retrieving Release
I: Retrieving Packages
I: Validating Packages
...</pre></div></div>

<p>Once that finishes (it will take a while), you should have a pristine Debian install. One thing you should do immediately is set up the <code>debian_chroot</code> file so you know if you&#8217;re inside the image or not:</p>

<div class="wp_syntax"><div class="code"><pre class="bash" style="font-family:monospace;">root<span style="color: #000000; font-weight: bold;">@</span>bootserver:<span style="color: #000000; font-weight: bold;">/</span>netboot<span style="color: #666666; font-style: italic;"># echo &quot;webserver&quot; &gt; /netboot/root/etc/debian_chroot</span></pre></div></div>

<p>Now you can <code>chroot</code> into the filesystem and start configuring your new install:</p>

<div class="wp_syntax"><div class="code"><pre class="bash" style="font-family:monospace;">root<span style="color: #000000; font-weight: bold;">@</span>bootserver:<span style="color: #000000; font-weight: bold;">/</span>netboot<span style="color: #666666; font-style: italic;"># chroot /netboot/root</span>
<span style="color: #7a0874; font-weight: bold;">&#40;</span>webserver<span style="color: #7a0874; font-weight: bold;">&#41;</span>root<span style="color: #000000; font-weight: bold;">@</span>bootserver:<span style="color: #000000; font-weight: bold;">/</span><span style="color: #666666; font-style: italic;">#</span></pre></div></div>

<p>You&#8217;re now inside the image. You need to set up <code>/etc/fstab</code> so that <code>/proc</code> gets mounted on boot. On a web server it&#8217;s also good security practice to mount <code>/tmp</code> with <code>noexec</code>:</p>

<div class="wp_syntax"><div class="code"><pre class="ini" style="font-family:monospace;">none        /       tmpfs   defaults        <span style="">0</span>       <span style="">0</span>
proc        /proc   proc    defaults        <span style="">0</span>       <span style="">0</span>
none        /tmp    tmpfs   noexec          <span style="">0</span>       <span style="">0</span></pre></div></div>

<h4>Tuning: Linux Swapless Memory Management</h4>
<p>It&#8217;s worth noting that Linux&#8217;s memory management strategy doesn&#8217;t take kindly to being run under high memory pressure with no swap. People have tried various approaches to solving this in a diskless environment, even going so far as putting swap partitions on network block devices. We&#8217;ve found that it&#8217;s not too hard to keep things under control if you&#8217;re careful, even with PHP&#8217;s poor memory management.</p>
<p>We use two methods. Firstly, leave a 10% safety margin when sizing Apache&#8217;s <code>MaxClients</code>. 10% &#8220;wasted&#8221; RAM may seem bad, but it&#8217;s a small price to pay for maintainability.</p>
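<p>To make that first point concrete, here&#8217;s a rough sketch of the sums. Every figure in it is a made-up placeholder, not a measurement from our cluster. It also takes the 10% off total RAM, which is only one way of applying the margin. <code>MaxClients</code> is the prefork MPM&#8217;s cap on Apache child processes:</p>

<div class="wp_syntax"><div class="code"><pre class="bash" style="font-family:monospace;">#!/bin/sh
# Hypothetical figures: substitute measurements from your own nodes.
total_mb=8192      # physical RAM in the node
rootfs_mb=1024     # tmpfs root filesystem plus the code copied onto it
other_mb=512       # kernel, memcached and other daemons
child_mb=40        # typical resident size of one Apache/PHP child

# Keep 10% of total RAM back as the safety margin.
usable_mb=$(( total_mb * 90 / 100 - rootfs_mb - other_mb ))
max_clients=$(( usable_mb / child_mb ))
echo "MaxClients $max_clients"</pre></div></div>

<p>With these example numbers the script prints <code>MaxClients 145</code>. The useful part is the shape of the calculation: everything that lives in RAM, including the root filesystem, comes off the total before you divide by the size of a child.</p>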
<p>Secondly, put these settings in <code>/etc/sysctl.conf</code>:</p>

<div class="wp_syntax"><div class="code"><pre class="ini" style="font-family:monospace;"># Always allow allocations to succeed rather than refusing them up front
vm.overcommit_memory<span style="color: #000066; font-weight:bold;">=</span><span style="color: #660066;">1</span>
# Reclaim dentry and inode caches more eagerly than the default of 100
vm.vfs_cache_pressure<span style="color: #000066; font-weight:bold;">=</span><span style="color: #660066;">300</span>
# Try to keep at least 32 MB of memory free at all times
vm.min_free_kbytes<span style="color: #000066; font-weight:bold;">=</span><span style="color: #660066;">32768</span></pre></div></div>

<p>I might document these better in a future post, but it&#8217;s at least a good start.</p>
<p>Bear in mind that in memory statistics, the size of the tmpfs root filesystem will not show up as &#8220;used&#8221; memory; it will show up as &#8220;cached&#8221;. Annoyingly, there&#8217;s no easy way of telling how much of your cached memory is essential and how much can be &#8220;swapped out&#8221;.</p>
<p>(Oh, and keep an eye out for shared memory leaks if you&#8217;re seeing inexplicable out-of-memory issues. That one kept me guessing for 4 months. Shared memory is also accounted for under &#8220;cached&#8221;.)</p>
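<p>For what it&#8217;s worth, these are the sort of commands you can use to look at those numbers on a running node. Treat it as a starting point, not a recipe: it assumes the root filesystem is the tmpfs mount described above, and it doesn&#8217;t tell you which cached pages are reclaimable.</p>

<div class="wp_syntax"><div class="code"><pre class="bash" style="font-family:monospace;"># Space used on the tmpfs root (this is counted as "cached" memory)
df -k /
# The kernel's own summary; the Shmem line only exists on newer kernels
grep -E '^(MemFree|Cached|Shmem):' /proc/meminfo
# SysV shared memory segments, with their owners and sizes
ipcs -m</pre></div></div>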
<h3>The kernel and initrd</h3>
<p>Next, you should choose which kernel you want to use. It might be possible to use a distribution&#8217;s stock kernel, but it simplifies things a lot if you have a custom kernel with the modules you require statically compiled in. Generally this is pretty minimal: just Ethernet, NFS and possibly USB HID drivers are needed.</p>
<p>You now need to build a skeleton initrd, which you can do with <code>mkinitrd</code>, then mount it:</p>

<div class="wp_syntax"><div class="code"><pre class="bash" style="font-family:monospace;">root<span style="color: #000000; font-weight: bold;">@</span>bootserver:~<span style="color: #666666; font-style: italic;"># mkinitrd -o ./netboot.img</span>
root<span style="color: #000000; font-weight: bold;">@</span>bootserver:~<span style="color: #666666; font-style: italic;"># mkdir netboot</span>
root<span style="color: #000000; font-weight: bold;">@</span>bootserver:~<span style="color: #666666; font-style: italic;"># mount -o loop ./netboot.img ./netboot</span>
root<span style="color: #000000; font-weight: bold;">@</span>bootserver:~<span style="color: #666666; font-style: italic;"># ls ./netboot</span>
bin  bin2  dev  dev2  devfs  etc  keyscripts  lib  lib64  linuxrc  linuxrc.conf  loadmodules  mnt  proc  sbin  script  scripts  sys  tmp  usr  var</pre></div></div>

<p>I&#8217;m going to skirt around the finer points of initrd construction here; the initrd <code>mkinitrd</code> provides is slight overkill for our needs. The important thing is that you add a custom <code>linuxrc</code> file into your initrd to configure the network and root filesystem. <a href="http://static.last.fm/russ/linuxrc">Here&#8217;s</a> the one we use &#8211; it&#8217;s a little crude but it works (refinements welcome).</p>
<p>Once that&#8217;s done, unmount and compress it:</p>

<div class="wp_syntax"><div class="code"><pre class="bash" style="font-family:monospace;">root<span style="color: #000000; font-weight: bold;">@</span>bootserver:~<span style="color: #666666; font-style: italic;"># umount ./netboot</span>
root<span style="color: #000000; font-weight: bold;">@</span>bootserver:~<span style="color: #666666; font-style: italic;"># gzip -9 ./netboot.img</span></pre></div></div>

<h3>TFTP</h3>
<p>Set up a TFTP server of your choice. I like atftpd, but I find it quite hard to get excited about TFTP servers so I won&#8217;t prescribe one. I&#8217;ll assume that it&#8217;s working and it&#8217;s serving from the directory <code>/tftpboot</code>. Install <a href="http://syslinux.zytor.com/wiki/index.php/PXELINUX">PXELINUX</a> into that directory, as well as your kernel (<code>vmlinuz</code>) and initrd. You should have this directory structure:</p>
<pre>/tftpboot/netboot.img.gz              (your initrd)
/tftpboot/vmlinuz-2.6.27.7-amd64      (the kernel)
/tftpboot/pxelinux.0                  (PXELINUX itself)
/tftpboot/pxelinux.cfg/               (PXELINUX config directory)</pre>
<p>Into the pxelinux.cfg directory, you can now put a file called default, in this format:</p>

<div class="wp_syntax"><div class="code"><pre class="ini" style="font-family:monospace;">LABEL linux
        KERNEL vmlinuz-2.6.27.7-amd64
        APPEND initrd<span style="color: #000066; font-weight:bold;">=</span><span style="color: #660066;">netboot.img.gz ramdisk_size=8192</span></pre></div></div>
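<p>PXELINUX only falls back to <code>default</code> if it finds nothing more specific. It first looks in <code>pxelinux.cfg</code> for a file named after the client&#8217;s MAC address (prefixed with <code>01-</code>, lower case, dash-separated), then for one named after its IP address in upper-case hex, trying progressively shorter prefixes of that name. If you ever need a per-node config, this little sketch works out the names; the function names are my own invention, and the MAC and IP are those of the example node in the DHCP section below:</p>

<div class="wp_syntax"><div class="code"><pre class="bash" style="font-family:monospace;">#!/bin/sh
# Print the pxelinux.cfg file name PXELINUX derives from a MAC address.
pxe_mac_name() {
    echo "01-$1" | tr 'A-Z:' 'a-z-'
}

# Print the file name PXELINUX derives from a dotted-quad IPv4 address.
pxe_ip_name() {
    # Splitting on "." turns the address into four printf arguments.
    ( IFS=.; set -- $1; printf '%02X%02X%02X%02X\n' "$@" )
}

pxe_mac_name 00:E0:81:2F:64:6C
pxe_ip_name 10.0.1.1</pre></div></div>

<p>That prints <code>01-00-e0-81-2f-64-6c</code> and <code>0A000101</code>. A file called <code>0A0001</code> would therefore match every node in 10.0.1.x, which is handy if different groups of machines need different kernels.</p>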

<h3>DHCP</h3>
<p>Lastly, you need to set up your DHCP server to send the correct boot options to your web nodes. I&#8217;ll assume you&#8217;re using ISC dhcpd v3, which seems to be a decent enough DHCP server. In <code>dhcpd.conf</code>, we create a separate group for web servers (this is just a snippet, it assumes you have a working config beforehand):</p>

<div class="wp_syntax"><div class="code"><pre class="ini" style="font-family:monospace;">group <span style="">&#123;</span>
        next-server 10.0.0.10<span style="color: #666666; font-style: italic;">;          # IP address of your boot server</span>
        filename <span style="color: #933;">&quot;/pxelinux.0&quot;</span><span style="color: #666666; font-style: italic;">;         # Path of pxelinux on your boot server, relative to the tftp root</span>
        option root-path <span style="color: #933;">&quot;10.0.0.10:/export/root,actimeo=120&quot;</span><span style="color: #666666; font-style: italic;">;   # Where to mount your root from</span>
&nbsp;
        # An example web node, statically mapped by MAC address:
        host www1 <span style="">&#123;</span>
                hardware ethernet 00:E0:<span style="">81</span>:2F:<span style="">64</span>:6C<span style="color: #666666; font-style: italic;">;</span>
                fixed-address 10.0.1.1<span style="color: #666666; font-style: italic;">;</span>
                option host-name <span style="color: #933;">&quot;www1&quot;</span><span style="color: #666666; font-style: italic;">;</span>
        <span style="">&#125;</span>
<span style="">&#125;</span></pre></div></div>

<h4>Tuning: The NFS root filesystem</h4>
<p>You can see <code>,actimeo=120</code> in the root-path option. This is a standard mount option for NFS, and it&#8217;s used to control the stat (or getattr) cache. On your web nodes, all your system files will be NFS-mounted. In some cases files in these directories will be hit very frequently (glibc loves statting <code>/etc/localtime</code>) &#8211; you don&#8217;t want to incur a network trip every time that happens. This setting makes the nodes cache file attributes for 120 seconds, so be aware that changes made on the boot server can take up to two minutes to be noticed, which may cause some weirdness.</p>
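<p>The boot server also has to export the image over NFS, with the path matching the one in root-path. As a minimal sketch of <code>/etc/exports</code> (the subnet and the options are illustrative assumptions, not our production config; in particular, <code>ro</code> assumes your nodes never need to write to the shared image):</p>

<div class="wp_syntax"><div class="code"><pre class="ini" style="font-family:monospace;"># /etc/exports on the boot server
# ro: nodes cannot modify the shared image
# no_root_squash: root on a node can still read root-only files
/export/root    10.0.1.0/24(ro,no_root_squash,no_subtree_check)</pre></div></div>

<p>After editing the file, <code>exportfs -ra</code> makes the NFS server re-read it.</p>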
<h3>Icing</h3>
<p>Provided I haven&#8217;t missed anything, you should be able to boot a node and have it load up your Linux install. That&#8217;s the hard part.</p>
<p>We have init scripts which copy our web codebase onto the tmpfs ramdisk, then launch Apache and Memcache with parameters appropriate to the machine spec. Those are pretty specialist, though, so I&#8217;m not publishing them.</p>
<p>To finish by way of a list of credits, here&#8217;s a quick list of the other things which keep our web cluster ticking over smoothly:</p>
<ul>
<li><a href="http://ganglia.info/">Ganglia</a>: lightweight, comprehensive low-level monitoring.</li>
<li><a href="http://www.cacti.net/">Cacti</a>: customisable higher-level monitoring.</li>
<li><a href="http://sourceforge.net/projects/dsh">dsh</a>: distributed shell.</li>
<li><a href="http://www.danga.com/perlbal/">Perlbal</a>: configurable layer 7 load-balancing.</li>
<li><a href="http://www.linuxvirtualserver.org/">LVS</a>: fast layer 4 load-balancing.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://russ.garrett.co.uk/2008/12/03/diskless-web-serving-for-fun-and-profit/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>
