KVM RAS Test Suite
==================
The KVM RAS Test Suite is a collection of test scripts for testing the
Linux kernel MCE processing features in KVM guest system.

Jan 26th, 2010

Jiajia Zheng


In the Package
----------------

Here is a short description of what is included in the package

host/*
	Contains host test scripts, which drive test procedure on host system.
guest/*
	Contains guest test scripts, which drive test procedure on guest system.

Dependencies
----------------

KVM RAS Test Suite has following dependencies on kernel and other tools:

* Linux Kernel:
  Version 2.6.32 or newer, with MCA high level handlers enabled.

* mce-inject
  A tool to inject mce error into kernel

* page-types:
  A tool to query page types, which is accompanied with Linux kernel
  source (2.6.32 or newer, $KERNEL_SRC/Documentation/vm/page-types.c).

* simple_process:
  A process constantly access the allocated memeory. (../tools/simple_process)

* kpartx:
  A tool to list partition mappings from partition tables

* (optionally) losetup
  A tool to set up and control loop devices

* (optionally) pvdisplay/vgchange:
  A tool to display/change physical volume attribute

* (optionally) ssh-keygen
  A tool to provide the authentication key generation, management and conversion

Test method
---------------
- Start a process in the guest OS, get a virtual address from guest OS

- Translate this address untill we get the physical address on the host OS

- Software injects an SRAO MCE at that physical address from host OS

- (optionally) Write to the address from the guest, i.e attempt to
  write to the poisoned page from the guest.

the expected result:

HOST system dmesg:

...
Machine check injector initialized
Triggering MCE exception on CPU 0
Disabling lock debugging due to kernel taint
[Hardware Error]: Machine check events logged
MCE exception done on CPU 0
MCE 0x806324: Killing qemu-system-x86:8829 early due to hardware memory corruption
MCE 0x806324: dirty LRU page recovery: Recovered
...


GUEST system dmesg:
...
[Hardware Error]: Machine check events logged
MCE 0x75925: Killing simple_process:2273 early due to hardware memory corruption
MCE 0x75925: dirty LRU page recovery : Recovered
...


Installation
---------------
1. Build *host* kernel with
	CONFIG_KVM=y
	CONFIG_KVM_INTEL=y
	CONFIG_X86_MCE=y
	CONFIG_X86_MCE_INTEL=y
	CONFIG_X86_MCE_INJECT=y or CONFIG_X86_MCE_INJECT=m

   and following config both on *host* and *guest*
	CONFIG_ARCH_SUPPORTS_MEMORY_FAILURE=y
	CONFIG_MEMORY_FAILURE=y

   NOTE: if the host machine doesn't support software error recovery
   (MCG_SER_P in IA32_MCG_CAP[24]), please apply the patch fake_ser_p.patch
   under ./patches/
2. Use ssh-keygen to generate public and privite keys on the host OS,
   and copy id_rsa and id_rsa.pub from ~/.ssh/ into the testing directory
   on the host system.
3. compile and install qemu-kvm
   the qemu-kvm source can be got from git://git.kernel.org/pub/scm/virt/kvm/qemu-kvm.git.
   Before compile qemu-kvm, a patch p2v.patch should be applied. This patch
   is located under ./patches/
   Please ensure *SDL-devel* library is installed on the host machine, otherwise
   only VNC can be used substituing local graphic output.
4. install the guest OS on the qemu
   e.g. 
   step 1: qemu-img create -f qcow2 test.img 10G
   step 2: qemu-system-x86_64 -hda ./test.img -m 2048 -cdrom rhel6.iso -boot d
   after the installation, please be sure to execute the following check:
     a) add necessary command line parameters under your boot item.
        This is to enable console output redirection to the serial.
        e.g.
          before:
        	title Red Hat Enterprise Linux Server (2.6.32kvm)
        		root (hd0,1)
        		kernel /boot/vmlinuz-2.6.32kvm ro root=/dev/sda1
        		initrd /boot/initramfs-2.6.32kvm.img
             after:
        	title Red Hat Enterprise Linux Server (2.6.32kvm)
        		root (hd0,1)
        		kernel /boot/vmlinuz-2.6.32kvm ro root=/dev/sda1 console=tty0 console=ttyS0,115200n8
        		initrd /boot/initramfs-2.6.32kvm.img
     b) DHCP guest ethernet card. This operation is to ensure the network connection
        is OK.
        e.g.
          bash> dhclient eth0
     c) enable SSH public/private key authorization, otherwise, the SSH connection password
        is necessary to be provided in the test progress.
        please check /etc/ssh/ssd_config and ensure related options are opened.
        e.g.
          RSAAuthentication yes
          PubkeyAuthentication yes
        after the related options are opened, please restart ssh service
        e.g.
          bash> service sshd restart


Start Testing
---------------
Run testing by
	./host_run.sh <option> <argument>
You can get the help information by
	./host_run.sh -h

