<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>LinuxKVM | Linux</title>
	<atom:link href="http://linux.sjolshagen.net/tag/kvm/feed/" rel="self" type="application/rss+xml" />
	<link>http://linux.sjolshagen.net</link>
	<description>Linux for Businesses</description>
	<lastBuildDate>Wed, 01 Feb 2012 17:33:51 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Linux: Configure &#8220;bridge at boot&#8221; for NIC(s) in Fedora 13</title>
		<link>http://linux.sjolshagen.net/2010/07/28/linux-configure-bridge-at-boot-for-nics-in-fedora-13/</link>
		<comments>http://linux.sjolshagen.net/2010/07/28/linux-configure-bridge-at-boot-for-nics-in-fedora-13/#comments</comments>
		<pubDate>Wed, 28 Jul 2010 14:31:51 +0000</pubDate>
		<dc:creator>Thomas S</dc:creator>
				<category><![CDATA[EqualLogic]]></category>
		<category><![CDATA[Linux]]></category>
		<category><![CDATA[Mission Critical Computing]]></category>
		<category><![CDATA[Virtualization]]></category>
		<category><![CDATA[bridge]]></category>
		<category><![CDATA[Fedora 13]]></category>
		<category><![CDATA[ipv4]]></category>
		<category><![CDATA[iscsi]]></category>
		<category><![CDATA[KVM]]></category>
		<category><![CDATA[libvirt]]></category>
		<category><![CDATA[network]]></category>
		<category><![CDATA[network tuning]]></category>
		<category><![CDATA[NetworkManager]]></category>
		<category><![CDATA[Performance]]></category>
		<category><![CDATA[virsh]]></category>
		<category><![CDATA[virtualization]]></category>

		<guid isPermaLink="false">http://linux.sjolshagen.net/?p=385</guid>
		<description><![CDATA[Sometimes, for instance when having a limited number of Network Interface Cards (NICs) on a system that will be used for a Linux hosted platform virtualization solution (and you&#8217;re running Fedora 13), the easiest approach to giving each of the guests &#8220;direct&#8221; access to a network is to configure the physical devices as bridges on...]]></description>
			<content:encoded><![CDATA[<p>Sometimes, for instance when you have a limited number of Network Interface Cards (NICs) on a system that will be used for a Linux-hosted <a href="http://en.wikipedia.org/wiki/Hardware_virtualization#Concept">platform virtualization</a> solution (and you&#8217;re running Fedora 13), the easiest way to give each of the guests &#8220;direct&#8221; access to a network is to configure the physical devices as <a href="http://gd.tuwien.ac.at/linuxcommand.org/man_pages/brctl8.html">bridge</a>s on the host.</p>
<p>This will permit the <a href="http://www.libvirt.org/">libvirt virtualization (management) abstraction interface</a> to easily build &#8220;bridges of bridges&#8221; that in turn let a <a href="http://www.linux-kvm.com/">Kernel-based Virtual Machine (KVM)</a> guest get its own &#8220;public&#8221; IP address and route its traffic directly onto the ether (via the lower levels of the IP stack of the host environment). &#8220;Public&#8221; is only in quotes because I happen to think the average bear would not be so silly as to put their Linux/KVM host directly onto the internet. Right???</p>
<p>There are, as is the case with all things Linux or UNIX, a couple of ways to skin this particular bear (sorry, that&#8217;s bad!), but the one that makes the most sense to me is to have <span style="font-family: courier new,courier;">init</span> take care of the configuration as part of the system boot process (when the <span style="font-family: courier new,courier;">network </span>service executes). And doing that, although in its simplest form it requires access to a terminal window and a text editor on the Fedora host, is actually very simple once you know what you&#8217;re doing. Hopefully, the following will help you learn (if you don&#8217;t already know and are only reading this because you&#8217;re looking around and are a very bored individual).<span id="more-385"></span></p>
<p>At a high level, all you have to do is edit the <span style="font-family: courier new,courier;">/etc/sysconfig/network-scripts/ifcfg-ethX</span> file for the Ethernet (eth) device you want to bridge, specify that the device will belong to a bridge named &lt;bridgename&gt; and then create a <span style="font-family: courier new,courier;">/etc/sysconfig/network-scripts/ifcfg-&lt;bridgename&gt;</span> file that configures the bridge (assigning it a way to obtain an IP address &#8211; probably static, routing and DNS configuration information).</p>
<p>Prior to configuring the system to use bridging, I had configured static IP addresses on the two physical interfaces I will be creating bridges for. Since that configuration was being obsoleted in order to use the same interfaces for guest-to-iSCSI-SAN traffic, I edited and created the above mentioned configuration files. All of the <strong>edited items are in bold</strong>. Before you edit and create these files, life gets a little easier if you:</p>
<ol>
<li>Back up the original <span style="font-family: courier new,courier;">ifcfg-eth[N]</span> files to some other location than <span style="font-family: courier new,courier;">/etc/sysconfig/network-scripts/</span></li>
<li><span style="font-family: courier new,courier;"># ifdown eth[N]</span></li>
</ol>
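<p>As a rough sketch of those two preparatory steps for the eth2 and eth3 devices used in the examples below (the backup directory <span style="font-family: courier new,courier;">/root/ifcfg-backup</span> is only an assumed location; any directory outside of <span style="font-family: courier new,courier;">/etc/sysconfig/network-scripts/</span> should do):</p>
<pre># mkdir -p /root/ifcfg-backup
# cp -p /etc/sysconfig/network-scripts/ifcfg-eth2 /root/ifcfg-backup/
# cp -p /etc/sysconfig/network-scripts/ifcfg-eth3 /root/ifcfg-backup/
# ifdown eth2
# ifdown eth3</pre>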
<p>Then, as an example, here are the two to four files you need (ifcfg-eth[N], ifcfg-eth[N+1], ifcfg-bridge[N] and ifcfg-bridge[N+1]):</p>
<p><span style="font-family: courier new,courier;">/etc/sysconfig/network-scripts/ifcfg-eth2:</span></p>
<pre># Intel Corporation 82571EB Gigabit Ethernet Controller
DEVICE=eth2
BOOTPROTO=none
HWADDR=00:15:17:6C:97:94
<strong>#IPADDR=X.Y.Z.231
#NETMASK=255.255.255.0
#PREFIX=24
#DEFROUTE=yes</strong>
ONBOOT=yes
TYPE=Ethernet
<strong>#DNS1=[DNS-SERVER1]
#DNS2=[DNS-SERVER2]</strong>
IPV6INIT=no
USERCTL=no
IPV4_FAILURE_FATAL=yes
NAME="System eth2"
<strong>BRIDGE=iscsi-bridge0</strong>
<strong>MTU=9000</strong></pre>
<p><span style="font-family: courier new,courier;">/etc/sysconfig/network-scripts/ifcfg-eth3:</span></p>
<pre># Intel Corporation 82571EB Gigabit Ethernet Controller
DEVICE=eth3
<strong>#IPADDR=X.Y.Z.232
#NETMASK=255.255.255.0
#PREFIX=24
#DEFROUTE=yes
#IPV4_FAILURE_FATAL=yes</strong>
HWADDR=00:15:17:6C:97:95
ONBOOT=yes
BOOTPROTO=none
TYPE=Ethernet
<strong>BRIDGE=iscsi-bridge1</strong>
IPV6INIT=no
USERCTL=no
NAME="System eth3"
<strong>MTU=9000</strong></pre>
<p><span style="font-family: courier new,courier;">/etc/sysconfig/network-scripts/ifcfg-iscsi-bridge0:</span></p>
<pre><strong>DEVICE=iscsi-bridge0
ONBOOT=yes
TYPE=Bridge
IPADDR=X.Y.Z.231
NETMASK=255.255.255.0
STP=off
MTU=9000
DELAY=0</strong></pre>
<p><span style="font-family: courier new,courier;">/etc/sysconfig/network-scripts/ifcfg-iscsi-bridge1:</span></p>
<pre><strong>DEVICE=iscsi-bridge1
ONBOOT=yes
TYPE=Bridge
IPADDR=X.Y.Z.232
NETMASK=255.255.255.0
MTU=9000
STP=off
DELAY=0</strong></pre>
<p>Although there&#8217;s a perfectly valid way of achieving the same thing through the <span style="font-family: courier new,courier;">virsh/libvirtd</span> management interface to <span style="font-family: courier new,courier;">libvirt</span>, as well as with the <a href="http://projects.gnome.org/NetworkManager/">Network Manager</a> tools, my preference is to make this configuration &#8220;stick&#8221; using the old <span style="font-family: courier new,courier;">network</span> init service. The problem I see with the <span style="font-family: courier new,courier;">NetworkManager</span>/<span style="font-family: courier new,courier;">libvirtd</span> approach is twofold:</p>
<ul>
<li>Timing of <span style="font-family: courier new,courier;">NetworkManager</span> start-up (not all that early) relative to the Open-iSCSI stack startup (early)</li>
<li>Timing of <span style="font-family: courier new,courier;">libvirtd </span>start-up (one of the last services to get called) relative to other iSCSI volumes needing to be available for the host environment.</li>
</ul>
<p>So, for this example, disable <span style="font-family: courier new,courier;">NetworkManager</span> as a boot service and enable the <span style="font-family: courier new,courier;">network</span> service:</p>
<pre># service NetworkManager stop
# chkconfig NetworkManager off
# chkconfig network on
# ifup iscsi-bridge0
# ifup iscsi-bridge1</pre>
<p>And, Bob&#8217;s yer uncle (or, at least, he should be!). To verify that everything is working properly, ping an IP target that should be reachable from the bridge device interface(s) only:</p>
<pre># ping -I iscsi-bridge0 [TARGET-IP]</pre>
<pre># ping -I iscsi-bridge1 [TARGET-IP]</pre>
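<p>Two further checks may be worth running. This is only a sketch: it assumes the <span style="font-family: courier new,courier;">brctl</span> utility (from bridge-utils) is installed and that your <span style="font-family: courier new,courier;">ping</span> supports the <span style="font-family: courier new,courier;">-M</span> option, and [TARGET-IP] is a placeholder for an address on the iSCSI network. The first command should list eth2 as an interface of iscsi-bridge0. The second sends packets that may not be fragmented and that fill the 9000 byte MTU exactly: 8972 bytes of ICMP payload, plus 8 bytes of ICMP header, plus 20 bytes of IPv4 header. If it fails while a normal ping works, something along the path (a switch port or the target NIC, for instance) is probably not configured for jumbo frames.</p>
<pre># brctl show iscsi-bridge0
# ping -M do -s 8972 -c 3 -I iscsi-bridge0 [TARGET-IP]</pre>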
<p>By the way: If you use the bridged interfaces for guests only (in other words, no iSCSI volumes on the host) and have iptables enabled on the host (which you should), make sure to configure your host iptables to leave the bridged interfaces alone. For details, see the &#8211; soon to be created, I promise &#8211; post about performance tuning the Linux IPv4 environment for iSCSI-initiators on this site. Alternatively, you can grab the information pertaining to the relevant bridge sysctl.conf entries from the <a href="https://inquiries.redhat.com/go/redhat/rhel-hp-proliant">KVM scalability white paper</a> that Red Hat published (and for which I provided most of the content in my previous career).</p>
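<p>For reference, the bridge-related sysctl entries usually meant in this context are the three below. Treat this as an assumption about which settings apply rather than a quote from the white paper: they tell the kernel not to pass bridged frames through the host&#8217;s iptables, ip6tables and arptables rules. They would go in <span style="font-family: courier new,courier;">/etc/sysctl.conf</span> and can be loaded with <span style="font-family: courier new,courier;">sysctl -p</span> once the bridge module is present.</p>
<pre>net.bridge.bridge-nf-call-ip6tables = 0
net.bridge.bridge-nf-call-iptables = 0
net.bridge.bridge-nf-call-arptables = 0</pre>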
]]></content:encoded>
			<wfw:commentRss>http://linux.sjolshagen.net/2010/07/28/linux-configure-bridge-at-boot-for-nics-in-fedora-13/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Scaling up your virtualization solution on 8-socket HP ProLiant Servers</title>
		<link>http://linux.sjolshagen.net/2010/03/01/scaling-up-your-virtualization-solution-on-8-socket-hp-proliant-servers/</link>
		<comments>http://linux.sjolshagen.net/2010/03/01/scaling-up-your-virtualization-solution-on-8-socket-hp-proliant-servers/#comments</comments>
		<pubDate>Mon, 01 Mar 2010 19:10:28 +0000</pubDate>
		<dc:creator>Thomas S</dc:creator>
				<category><![CDATA[Linux]]></category>
		<category><![CDATA[Virtualization]]></category>
		<category><![CDATA[8-socket]]></category>
		<category><![CDATA[DL 785]]></category>
		<category><![CDATA[HP ProLiant]]></category>
		<category><![CDATA[KVM]]></category>
		<category><![CDATA[Performance]]></category>
		<category><![CDATA[Red Hat]]></category>
		<category><![CDATA[RHEL 5.4]]></category>
		<category><![CDATA[scalability]]></category>
		<category><![CDATA[server]]></category>
		<category><![CDATA[virtualization]]></category>

		<guid isPermaLink="false">http://linux.sjolshagen.net/?p=145</guid>
		<description><![CDATA[Some of the things we've learned while testing the KVM based virtualization solution in RHEL 5.4 on an 8-socket HP ProLiant server.]]></description>
			<content:encoded><![CDATA[<p>These days, when wearing my “Linux planner” hat, and with Virtualization being the “phrase that pays”, I’m often asked for guidance on how to best take advantage of the technology included in our 8-socket HP ProLiant server offerings for Linux-based virtualization solutions like Red Hat Enterprise Virtualization or SUSE Linux Enterprise Server Xen. (There’s a plethora of information out there about VMware ESX/ESXi 3.5.x and vSphere 4.0, so I’m not going to talk about that this time around.)</p>
<p>The problem I’ve had, until recently, was providing actual, objective data to help illustrate my points. For instance, I could not clearly illustrate how a snoop filter on the CPU interconnect can improve the linearity of the workload scalability in a virtualized environment (see Fig. 1).</p>
<div id="attachment_143" class="wp-caption aligncenter" style="width: 310px"><a href="http://linux.sjolshagen.net/wp-content/uploads/2010/03/Best-run-pinned-vs-unpinned.png"><img class="size-medium wp-image-143" title="Pinned and un-pinned tiles" src="http://linux.sjolshagen.net/wp-content/uploads/2010/03/Best-run-pinned-vs-unpinned-300x192.png" alt="" width="300" height="192" /></a><p class="wp-caption-text">Fig. 1: Average response time with pinned vs. un-pinned processors</p></div>
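<p>For readers who want to try a pinned configuration like the one in Fig. 1 themselves, a minimal sketch follows. It is an illustration only, not the procedure used for these tests: the guest name <span style="font-family: courier new,courier;">guest01</span> and the CPU numbers are hypothetical, and the right CPU list depends on which cores belong to which NUMA node on your host (which <span style="font-family: courier new,courier;">numactl --hardware</span> reports). Each <span style="font-family: courier new,courier;">vcpupin</span> line ties one virtual CPU of the guest to one physical CPU of the host.</p>
<pre># numactl --hardware
# virsh vcpuinfo guest01
# virsh vcpupin guest01 0 4
# virsh vcpupin guest01 1 5</pre>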
<p>I was also unable to demonstrate the benefits of the NUMA-aware scheduler that the Linux kernel uses, and how it <em>does</em> improve performance when your workloads run with memory interleaving disabled (in Fig. 2, this shows up as the improvement in average response times from the web servers included in the workload; compare Fig. 2 and 3). Unless, for support reasons, your application vendor explicitly tells you otherwise, of course!</p>
<div id="attachment_151" class="wp-caption aligncenter" style="width: 310px"><a rel="attachment wp-att-151" href="http://linux.sjolshagen.net/2010/03/scaling-up-your-virtualization-solution-on-8-socket-hp-proliant-servers/non-interleaved-memory-avg_response/"><img class="size-medium wp-image-151" title="Non-interleaved Memory Config" src="http://linux.sjolshagen.net/wp-content/uploads/2010/03/non-interleaved-memory-avg_response-300x192.png" alt="" width="300" height="192" /></a><p class="wp-caption-text">Fig. 2: Average Response Times - Non-interleaved Memory Config</p></div>
<div id="attachment_153" class="wp-caption aligncenter" style="width: 310px"><a href="http://linux.sjolshagen.net/wp-content/uploads/2010/03/Interleaved-memory-avg-response.png"><img class="size-medium wp-image-153" title="Average Response Times - Interleaved RAM" src="http://linux.sjolshagen.net/wp-content/uploads/2010/03/Interleaved-memory-avg-response-300x225.png" alt="" width="300" height="225" /></a><p class="wp-caption-text">Fig. 3: Average Response Times - Interleaved memory</p></div>
<p>I also used to have a hard time explaining how and why to tune the Linux kernel for these systems. For instance, I only suspected how little tuning of the host platform (none, in fact) is required in order to drive a pretty significant number of guests (98) in these environments &#8211; see Fig. 4. Nor could I show how, if you engage in some very minor tuning of the network stack, those very same workload performance results can be extended even further (to 256 guests) – see Fig. 5:</p>
<div id="attachment_161" class="wp-caption aligncenter" style="width: 310px"><a href="http://linux.sjolshagen.net/wp-content/uploads/2010/03/forgot-to-tune-linearity-graph.png"><img class="size-medium wp-image-161" title="Default tuning for Host server" src="http://linux.sjolshagen.net/wp-content/uploads/2010/03/forgot-to-tune-linearity-graph-300x192.png" alt="" width="300" height="192" /></a><p class="wp-caption-text">Fig. 4: The system has not been tuned beyond its &quot;out of the box&quot; state.</p></div>
<div id="attachment_163" class="wp-caption aligncenter" style="width: 310px"><a href="http://linux.sjolshagen.net/wp-content/uploads/2010/03/tuned-slice.png"><img class="size-medium wp-image-163" title="Fully tuned and linear scalability" src="http://linux.sjolshagen.net/wp-content/uploads/2010/03/tuned-slice-300x192.png" alt="" width="300" height="192" /></a><p class="wp-caption-text">Fig. 5: System is tuned and exhibiting linear scalability to 256 KVM guests</p></div>
<p>As part of a joint documentation effort with Red Hat, all of the data collected has been brought together in a <a href="https://inquiries.redhat.com/go/redhat/rhel-hp-proliant">Reference Architecture document  &#8211; “Scaling RHEL 5.4 + KVM up to 256 Guests&#8221;</a> available for free from Red Hat’s website.</p>
<p>We obviously picked the guest density to prove a point about the platform. However, it’s worth mentioning that <strong><em>256 guests</em></strong> <strong><em>does not represent the upper bound for the platform</em></strong>. It only represents the point where we thought the density went (far) beyond what is reasonable to expect in a production environment in this day and age.</p>
]]></content:encoded>
			<wfw:commentRss>http://linux.sjolshagen.net/2010/03/01/scaling-up-your-virtualization-solution-on-8-socket-hp-proliant-servers/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>KVM/Qemu and caching of I/O</title>
		<link>http://linux.sjolshagen.net/2010/01/10/kvmqemu-and-caching-of-io/</link>
		<comments>http://linux.sjolshagen.net/2010/01/10/kvmqemu-and-caching-of-io/#comments</comments>
		<pubDate>Sun, 10 Jan 2010 15:00:36 +0000</pubDate>
		<dc:creator>Thomas S</dc:creator>
				<category><![CDATA[Linux]]></category>
		<category><![CDATA[disk i/o]]></category>
		<category><![CDATA[KVM]]></category>
		<category><![CDATA[libvirt]]></category>
		<category><![CDATA[virtualization]]></category>

		<guid isPermaLink="false">http://linux.sjolshagen.net/?p=97</guid>
		<description><![CDATA[A feeble(ish) attempt at documenting the 'cache' properties for the Kernel Virtual Machine when managed by libvirtd.]]></description>
			<content:encoded><![CDATA[<p>I like to live &#8220;on the edge&#8221;. At least technologically speaking.</p>
<p>As a consequence, in my environment, I&#8217;ve got a couple of KVM guests that are running Fedora 12 with Red Hat Cluster v3.0.6 installed. That&#8217;s not really &#8220;living on the edge&#8221;. The &#8220;living on the edge&#8221; part of that configuration is that the two guests share a clustered file system. This clustered file system is hosted on a DRBD replicated volume between two standard internal SATA drives hosted on two different KVM host systems. And these host systems are, in turn, their own Fedora 12 based Red Hat Cluster.</p>
<p>Obviously, there are plenty of opportunities for data to go &#8220;missing&#8221; (get corrupted/get lost/disappear/etc) in a configuration like this. And I thought I&#8217;d been able to eliminate them all.</p>
<p>That was what I thought, until I ran one of the KVM guests on one of the hosts, and the other on the other. My GFS2 file system wasn&#8217;t impressed! And I was stumped. DRBD had been configured with synchronous replication (let&#8217;s not talk about the performance impact of that decision, shall we&#8230;?) but obviously the data wasn&#8217;t being committed simultaneously to both drives<sup>[1]</sup>.</p>
<p>I now suspect that happened because the KVM hosts were caching the data on the guests&#8217; behalf. That could be a very spiffy performance boost<sup>[2]</sup>, but it causes all sorts of problems for my clustered applications, which rely on the data in the file system being where it&#8217;s supposed to be.</p>
<p>So, I had to dig around a little and discovered  that Qemu/KVM/libvirt actually supports setting the caching properties for the &#8216;physical&#8217; devices backing its virtual hard drives (i.e. the hard drives or container files exported to the guest as &#8220;disks&#8221;). And it&#8217;s &#8211; if you&#8217;re using the CLI interfaces for managing KVM, libvirtd &amp; virsh &#8211; fairly easy to set it to what you want/need it to be.</p>
<p>The caching properties you can set are:</p>
<ul>
<li>writeback</li>
<li>writethrough</li>
<li>none</li>
<li>default</li>
</ul>
<p>Unfortunately, I&#8217;ve not been able to locate a way to set this while creating the guest with virt-manager. However, virt-install does let you set it, and if the guest is inactive (i.e. not running), you can set it by editing the &lt;driver&gt; tag of the disk in the guest&#8217;s XML definition.</p>
<p>For example:</p>
<blockquote>
<pre>&lt;disk type='block' device='disk'&gt;
   &lt;driver name='qemu' cache='none'/&gt;
   &lt;source dev='/dev/mapper/sharedVG01-www--local'/&gt;
   &lt;target dev='vde' bus='virtio'/&gt;
&lt;/disk&gt;</pre>
</blockquote>
<div><span style="color: #800000;">NOTE: </span><span style="color: #000080;">Early versions of libvirtd may <strong><em>not</em></strong> support the &lt;driver cache=''&gt; nomenclature.</span> I&#8217;m using 0.7.5 in my environment, but I believe any recent version of libvirtd (0.7 and later, for sure) includes support for this. To check your libvirt version, issue the command:</div>
<blockquote>
<pre># virsh version</pre>
</blockquote>
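<p>A sketch of how the &lt;driver&gt; edit might be applied with virsh, assuming a guest named <span style="font-family: courier new,courier;">guest01</span> (a hypothetical name): shut the guest down, open its XML definition in an editor, change the cache attribute, then start the guest again and confirm that the setting is present.</p>
<pre># virsh shutdown guest01
# virsh edit guest01
# virsh start guest01
# virsh dumpxml guest01 | grep cache</pre>
<p>With virt-install, the equivalent is the cache sub-option of the <span style="font-family: courier new,courier;">--disk</span> argument, for example <span style="font-family: courier new,courier;">--disk path=/dev/mapper/sharedVG01-www--local,bus=virtio,cache=none</span>, alongside whatever other options your guest needs.</p>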
<h3>Apropos:</h3>
<p>[1] = I know, I know. A DRBD mirror set up to use &#8220;protocol C&#8221; doesn&#8217;t, technically, commit the data simultaneously to both devices. It only &#8220;looks&#8221; like that because the write() operation does not return success until the data has been successfully written on the &#8220;remote&#8221; device as well as the local one.</p>
<p>[2] = It is, actually. As an example, the reason why the likes of Xen, KVM, etc. have been able to post IO benchmarks that are more than 100% of the performance of the underlying hardware is that the host environment caches the data on the guests&#8217; behalf. Looks good on benchmarks. Not so much if your host fails before the data have been flushed from the host cache onto the physical disk devices. Applications tend to get cranky when data they &#8220;know&#8221; was committed to persistent storage is missing&#8230;</p>
]]></content:encoded>
			<wfw:commentRss>http://linux.sjolshagen.net/2010/01/10/kvmqemu-and-caching-of-io/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

