Xen Debugging
Debugging Xen hypervisor with kdb
Requirements
Oracle VM 2.2 server. Download from Oracle E-Delivery.
A debugging host, running either Linux or Windows.
Connect the Oracle VM server and the debugging host with an RS-232 serial cable.
Boot the debugger-enabled hypervisor on the Oracle VM server.
Create a new GRUB boot entry and reboot with it:
title Oracle VM Server-ovs serial console (xen-64-3.4.0 2.6.18-128.2.1.4.3.el5ovs) debug
root (hd0,4)
kernel /boot/xen-64bit-debug.gz console=com1,vga com1=57600,8n1 dom0_mem=543M
module /boot/vmlinuz-2.6.18-128.2.1.4.3.el5xen ro root=LABEL=/ console=tty0 console=ttyS0,57600n8
module /boot/initrd-2.6.18-128.2.1.4.3.el5xen.img
On the debugging host (Oracle Enterprise Linux in this example), start Minicom:
# minicom
The Minicom config:
# cat /etc/minirc.dfl
# Machine-generated file - use "minicom -s" to change parameters.
pr port /dev/ttyS0
pu baudrate 57600
pu rtscts No
If you are using Windows as the debugging host, you can use HyperTerminal instead.
Debugging
Run the following tests from the serial console on the debugging host.
Switch serial console input:
(XEN) *** Serial input -> Xen (type 'CTRL-a' three times to switch input to DOM0)
(XEN) *** Serial input -> DOM0 (type 'CTRL-a' three times to switch input to Xen)
Note
Minicom intercepts CTRL-a as its own command key. To send one CTRL-a to the remote side you must press CTRL-a twice, so switching the serial input takes six presses of CTRL-a.
Type CTRL-\ to break into kdb:
Enter kdb (cpu:0 reason:1 vcpu=0 domid:32767 eflg:0x246 irqs:1)
ffff828c8017a922: acpi_safe_halt+2 ret
[0]xkdb>
Some testing output:
[0]xkdb> h
 - ccpu is current cpu
 - following are always output in decimal: vcpu num, cpu num, domid
 - otherwise, almost all numbers output are in hex
 - all input in hex, unless indicated in usage
 - output: z17 means decimal 17
 - domid 7fff(z32767) refers to hypervisor
 - if no domid before function name, then it's hypervisor
 - earlykdb in xen grub line to break into kdb during boot
info : basic info/constants...
f [vcpu ptr] : Display stack frames
fg domid ipaddr(eip) spaddr(esp) : Display stack given ip and sp for guest
dw <vaddr|sym>[num(dec)][domid] : Display word
dd <vaddr|sym>[num(dec)][domid] : Display dword
dwm <maddr|sym>[num(dec)] : Display machine word
ddm <maddr|sym>[num(dec)] : Display machine dword
dr [sp] : Display [special]Registers
drg : Display guest/stack registers
dis [addr|sym][num][0xdomid] : Disassemble
dism : toggle Intel/ATT modes
mw <vaddr|sym><val>[domid] : Modify Mem Word
md <vaddr|sym><val>[domid] : Modify Mem DWord
mr <reg><val> : Modify Register
bc <num|all> : brkpt delete
bp [addr|sym][0xdomid] : brkpt list/set(on all cpus)
wp [addr|sym][w|i] : watchpoint list/set(on all cpus)
wc <num|all> : watchpoint delete
ni : next instr after call
ss : Single Step
ssb : Single Step to branch
go : Continue Execution
cpu [all|num] : Switch CPU
nmi <cpu|all> : send nmi to cpu/s
sym sym ? for usage : Load guest symbols
vcpuh vcpu-ptr : Display hvm_vcpu{}
vcpu [all|ptr] : Display vcpu/s
dom [all|0xdomid] : Display dom/s
sched : scheduler info
mmu : Basic mmu info
p2m 0xdomid 0xgpfn : gpfn to mfn
m2p 0xmfn : mfn to pfn
dpage mfn|page-ptr : Display page info
dtrq : Dump timer queues
didt : Dump IDT current table
dgdt : Dump GDT table
dirq : Dump IRQs bindings
dvmc [0xdomid][0xvcpuid] : Dump vmcs/vmcb
trcon : turn tracing on
trcoff : turn tracing off
trcz : zero entire trace buffer
trcp : hints to print trace buffer via dd cmd
usr1 : User defined cmd
kdbf : Display kdb stack frames
kdbdbg : toggle kdb debug
reboot : Reboot
h : Help
[0]xkdb> dom 5
DOMAIN : domid:0x0005 ptr:0xffff83006f6de000
pgalk: 0001 4095 0 pglist: 0xffff828400df8ac0 0xffff82840041fd20
xpglist: 0xffff828400df8380 0xffff828400df8ca0
PAGES: tot:0x0003e800 max:0x0003e800 xenheap:0x00000005
next:0xffff830020fe6000 hashnext:0x0000000000000000
rangesets: nxt:0xffff8300600cbad0 prev:0xffff8300600cba10 lk: 0001 4095 0
Evt: MAX_EVTCHNS:$1024 ptr:ffff83006f6de078 pollmsk:00000000 lk: 0001 4095 0
&evtchn_pending:ffff83006fc65800 &evtchn_mask:ffff83006fc65880
chn: 0 st:1 _consumr=0 ntfy_:0 pend:0 mask:1
chn: 1 st:3 _consumr=0 ntfy_:0 rem-port:$26 domid:0 pend:0 mask:0
chn: 2 st:3 _consumr=0 ntfy_:0 rem-port:$27 domid:0 pend:0 mask:0
chn: 3 st:5 _consumr=0 ntfy_:0 virq:$0 pend:1 mask:0
chn: 4 st:6 _consumr=0 ntfy_:0 pend:0 mask:0
chn: 5 st:6 _consumr=0 ntfy_:0 pend:0 mask:0
chn: 6 st:6 _consumr=0 ntfy_:1 pend:0 mask:0
chn: 7 st:6 _consumr=0 ntfy_:1 pend:0 mask:0
chn: 8 st:5 _consumr=0 ntfy_:1 virq:$0 pend:1 mask:0
chn: 9 st:3 _consumr=0 ntfy_:0 rem-port:$28 domid:0 pend:0 mask:0
chn: 10 st:3 _consumr=0 ntfy_:0 rem-port:$29 domid:0 pend:0 mask:0
chn: 11 st:3 _consumr=0 ntfy_:0 rem-port:$30 domid:0 pend:0 mask:0
chn: 12 st:3 _consumr=0 ntfy_:0 rem-port:$36 domid:0 pend:0 mask:0
Grant table: gp:0xffff83001d905e00
nr_frames:0x00000004 shpp:0xffff83001d905e40 active:0xffff83001d8ed840
maptrk:0xffff8300600cb160 maphd:0x00000000 maplmt:0x00000200 mapcnt:mapcnt: lk: 0001 4095 0
hvm:0 priv:0 dbg:1 dying:0 paused:1
shutdown: lk: 0001 4095 0 shutn:0 shut:0 code:0
pausecnt:0x00000001
vm_assist:0x000000000000000f refcnt:0x0000007b cpumask:0
shared == vcpu_info[]: ffff83006fc65000
arch_shared: maxpfn: 3f000 pfn-mfn-frame-ll mfn: 6fd84
arch_domain at : ffff83006f6de580
pt:0xffff83006f6d2000 l2:0xffff83006fc68000 l3:0xffff83006fc66000
ioport:0xffff8300600cba10 &hvm_dom:0xffff83006f6de5c0
&pging_dom:ffff83006f6df128 mode:0 disabled
p2m ptr:ffff8300600cb970 pages:{0000000000000000, 0000000000000000}
max_mapped_pfn:0000000000000000
&alloc_page:ffff8300600cb990 phys_table.pfn:0000000000000000
physaddr_bitsz:0 32bit_pv:1 has_32bit_shinfo:1
sched:0xffff8300600cbb30 &handle:0xffff83006f6dfe08
vcpu ptrs: 0:ffff83006f6fe000 1:ffff83006fc3a000
[0]xkdb> cpu all
[0]ffff828c8017a922: acpi_safe_halt+2 ret
[1]ffff828c8017a922: acpi_safe_halt+2 ret
[0]xkdb> dr
(XEN) ----[ Xen-3.4.0 x86_64 debug=n Not tainted ]----
(XEN) CPU: 0
(XEN) RIP: e008:[<ffff828c8017a922>] acpi_safe_halt+0x2/0x10
(XEN) RFLAGS: 0000000000000246 CONTEXT: hypervisor
(XEN) rax: 0000000000000003 rbx: 00000000008e026c rcx: 0000000000000001
(XEN) rdx: 0000000000000808 rsi: 0000000004d2b4e8 rdi: ffff83007f2c2460
(XEN) rbp: ffff83007f2c2400 rsp: ffff828c802d7ec0 r8: 00000000000002b9
(XEN) r9: 0000000000000002 r10: 0000000000000000 r11: ffff828c8031a3e0
(XEN) r12: ffff83007f2c2460 r13: 00000b2a9fba595a r14: 0000000000000000
(XEN) r15: ffff828c8024d100 cr0: 000000008005003b cr4: 00000000000026f0
(XEN) cr3: 0000000020fc0000 cr2: 00000000b7f44000
(XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: 0000 cs: e008
[0]xkdb> bp schedule
BP set for domid:32767 at: 0xffff828c801178b0 schedule+0
[0]xkdb> go
Breakpoint on cpu 0 at 0xffff828c801178b0
ffff828c801178b0: schedule+0 subq $0x78, %rsp
[0]xkdb> ss
ffff828c801178b4: schedule+4 mov %rbx, 0x48(%rsp)
[0]xkdb> bc all
Deleted breakpoint [0] addr:0xffff828c801178b0 domid:32767
[0]xkdb> cpu 1
Switching to cpu:1
[1]xkdb>
[1]xkdb> f
(XEN) Xen call trace:
(XEN) [<ffff828c80117a13>] schedule+0x163/0x3a0
(XEN) [<ffff828c80118968>] do_softirq+0x58/0x80
(XEN) [<ffff828c8013fedc>] idle_loop+0x4c/0xa0
[1]xkdb> go
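The number conventions in the kdb help text (cpu, vcpu and domid always in decimal, a leading z marking decimal output, everything else hex) are easy to trip over when reading a captured session. The following is a minimal Python sketch for decoding them from a saved serial log; the function names are hypothetical and the banner layout is assumed to match the sample above.

```python
import re

# Per the kdb help text: domid 7fff (z32767) refers to the hypervisor.
HYPERVISOR_DOMID = 0x7FFF


def parse_kdb_number(token):
    """Decode a number as kdb prints it: a 'z' prefix means decimal, else hex."""
    if token.startswith("z"):
        return int(token[1:], 10)
    return int(token, 16)  # base 16 also accepts an optional 0x prefix


def parse_kdb_banner(line):
    """Extract the fields of an 'Enter kdb (...)' banner, or None if absent."""
    match = re.search(r"Enter kdb \((.*?)\)", line)
    if match is None:
        return None
    fields = dict(re.findall(r"(\w+)[:=](\S+)", match.group(1)))
    # cpu, vcpu and domid are always printed in decimal (see the help text).
    for key in ("cpu", "vcpu", "domid"):
        if key in fields:
            fields[key] = int(fields[key], 10)
    return fields


banner = "Enter kdb (cpu:0 reason:1 vcpu=0 domid:32767 eflg:0x246 irqs:1)"
info = parse_kdb_banner(banner)
print("stopped in hypervisor:", info["domid"] == HYPERVISOR_DOMID)
```

In the sample session the banner reports domid 32767, which is why the breakpoint and stack output refer to hypervisor symbols rather than to a guest.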
Debugging Xen guest with gdbsx
Requirements
Oracle VM 2.2 server. Download from Oracle E-Delivery.
A debugging host (should be Linux).
Oracle Enterprise Linux template OVM_EL5U3_X86_PVM_4GB.
Download from Oracle E-Delivery.
Oracle Enterprise Linux kernel-debug rpm packages:
- kernel-xen-debuginfo-2.6.18-128.0.0.0.2.el5.i686.rpm
- kernel-debuginfo-common-2.6.18-128.0.0.0.2.el5.i686.rpm
Download from: http://oss.oracle.com/el5/debuginfo/
Oracle Enterprise Linux kernel source code:
- kernel-2.6.18-128.0.0.0.2.el5.src.rpm
Download from: http://oss.oracle.com/el5/SRPMS-updates/
To debug multiple VCPUs, edit OVM_EL5U3_X86_PVM_4GB/vm.cfg so that the guest uses two VCPUs:
vcpus = 2
On the Oracle VM server, start OVM_EL5U3_X86_PVM_4GB:
# xm create OVM_EL5U3_X86_PVM_4GB/vm.cfg
Stop the iptables service on the Oracle VM server:
# service iptables stop
On the debugging host, first make sure that gdb 6.5-16 or later is installed:
$ gdb --version
GNU gdb Fedora (6.8-37.el5)
Prepare the kernel source:
$ rpm -ivh kernel-2.6.18-128.0.0.0.2.el5.src.rpm
$ rpmbuild -bp ~/rpmbuild/SPECS/kernel-2.6.spec
$ mv ~/rpmbuild/BUILD/kernel-2.6.18 /share/tmp/pkg/debug/
To debug a 32-bit x86 Linux guest, you may need one extra step:
$ cd /share/tmp/pkg/debug/kernel-2.6.18/linux-2.6.18.i686/include/
$ ln -s asm-i386 asm
Prepare the debugging kernel image:
$ cd /share/tmp/pkg/debug/
$ rpm2cpio kernel-xen-debuginfo-2.6.18-128.0.0.0.2.el5.i686.rpm | cpio -iumd
$ rpm2cpio kernel-debuginfo-common-2.6.18-128.0.0.0.2.el5.i686.rpm | cpio -iumd
Debugging on Oracle VM server
VM status:
# xm list
Name                    ID   Mem VCPUs State   Time(s)
Domain-0                 0   487     2 r-----    478.2
OVM_EL5U3_X86_PVM_4GB    5  1000     2 -b----     17.9
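gdbsx takes a numeric domain id, so the id has to be read out of the xm list table first. A small Python sketch of that lookup is shown here; it assumes the six-column layout of the sample above (every domain has an ID) and the helper names are hypothetical.

```python
XM_LIST_SAMPLE = """\
Name                    ID   Mem VCPUs State   Time(s)
Domain-0                 0   487     2 r-----    478.2
OVM_EL5U3_X86_PVM_4GB    5  1000     2 -b----     17.9
"""


def parse_xm_list(text):
    """Map each domain name to its columns from `xm list` output."""
    domains = {}
    for line in text.splitlines()[1:]:  # skip the header row
        if not line.strip():
            continue
        # Split from the right so the name is whatever remains on the left.
        name, domid, mem, vcpus, state, seconds = line.rsplit(None, 5)
        domains[name] = {
            "id": int(domid),
            "mem": int(mem),
            "vcpus": int(vcpus),
            "state": state,
            "time_s": float(seconds),
        }
    return domains


def gdbsx_attach_command(domid, bits, port):
    """Build the 'gdbsx -a domid <32|64> PORT' command line."""
    return "gdbsx -a %d %d %d" % (domid, bits, port)


guest = parse_xm_list(XM_LIST_SAMPLE)["OVM_EL5U3_X86_PVM_4GB"]
print(gdbsx_attach_command(guest["id"], 32, 9999))
```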
gdbsx usage:
# gdbsx
Usage 1: gdbsx -a domid <32|64> PORT [-d]
         PORT to listen for a TCP connection.
         Eg. gdbsx -a 3 32 9999
Usage 2: gdbsx -c domid <32|64> [vcpu#] [-d]
         to dump vcpu context(s) for given domid
Display VCPU contexts:
# gdbsx -c 5 32
===> Context for DOMID:5
--> VCPU:0
eip:c04013a7 esp:c06effc0 flags:00001246
eax:00000000 ebx:00000001 ecx:00000000 edx:00000000
esi:00000001 edi:00000000 ebp:c062780f
cs:61 ds:7b fs:0 gs:0
Call Trace:
[c04013a7] [c0408664] [c17ee6c8] [ffffffff] [c040321a] [c0403339] [c17ee6c8] [c071fc64] [c0627831] [c06f49f5] [c0765800]
--> VCPU:1
eip:c04013a7 esp:c0e44f9c flags:00001246
eax:00000000 ebx:00000001 ecx:00000000 edx:00000000
esi:00000001 edi:00000001 ebp:00000000
cs:61 ds:7b fs:0 gs:0
Call Trace:
[c04013a7] [c0408664] [ffffffff] [c040321a] [c0403339] [2d6e6578] [2d302e33] [5f363878] [6fc65000] [c0e48000] [c0d4b000]
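When a guest has several VCPUs it can help to compare their contexts programmatically, for example to see whether all of them are parked at the same instruction. The sketch below parses the gdbsx -c dump in Python; it assumes the "--> VCPU:n" and "name:hex" layout shown above, and the function name is hypothetical.

```python
import re

GDBSX_SAMPLE = """\
===> Context for DOMID:5
--> VCPU:0
eip:c04013a7 esp:c06effc0 flags:00001246
cs:61 ds:7b fs:0 gs:0
Call Trace:
[c04013a7] [c0408664] [c17ee6c8]
--> VCPU:1
eip:c04013a7 esp:c0e44f9c flags:00001246
cs:61 ds:7b fs:0 gs:0
Call Trace:
[c04013a7] [c0408664] [ffffffff]
"""


def parse_gdbsx_contexts(text):
    """Return {vcpu: {"regs": {...}, "trace": [...]}} from a gdbsx -c dump."""
    contexts = {}
    # re.split keeps the captured VCPU number between the text chunks:
    # [preamble, "0", block0, "1", block1, ...]
    parts = re.split(r"--> VCPU:(\d+)", text)
    for i in range(1, len(parts), 2):
        block = parts[i + 1]
        regs = {
            name: int(value, 16)
            for name, value in re.findall(r"\b([a-z]+):([0-9a-f]+)", block)
        }
        trace = [int(addr, 16) for addr in re.findall(r"\[([0-9a-f]+)\]", block)]
        contexts[int(parts[i])] = {"regs": regs, "trace": trace}
    return contexts


contexts = parse_gdbsx_contexts(GDBSX_SAMPLE)
same_eip = len({ctx["regs"]["eip"] for ctx in contexts.values()}) == 1
print("all VCPUs at the same eip:", same_eip)
```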
Remote debugging
Start the gdb server (gdbsx) on the Oracle VM server to debug the guest OVM_EL5U3_X86_PVM_4GB:
# gdbsx -a 5 32 9999
From the debugging host, connect to that gdb server:
$ gdb
(gdb) file /share/tmp/pkg/debug/usr/lib/debug/lib/modules/2.6.18-128.0.0.0.2.el5xen/vmlinux
Reading symbols from /share/tmp/pkg/debug/usr/lib/debug/lib/modules/2.6.18-128.0.0.0.2.el5xen/vmlinux...done.
(gdb) dir /share/tmp/pkg/debug/kernel-2.6.18/linux-2.6.18.i686/
Source directories searched: /share/tmp/pkg/debug/kernel-2.6.18/linux-2.6.18.i686:$cdir:$cwd
(gdb) dir /share/tmp/pkg/debug/usr/src/debug/kernel-2.6.18/linux-2.6.18.i686/
Source directories searched: /share/tmp/pkg/debug/kernel-2.6.18/linux-2.6.18.i686: /share/tmp/pkg/debug/usr/src/debug/kernel-2.6.18/linux-2.6.18.i686:$cdir:$cwd
(gdb) target remote 10.182.120.209:9999
Remote debugging using 10.182.120.209:9999
[New Thread 0]
[Switching to Thread 0]
0xc04013a7 in hypercall_page ()
Debugging multiple VCPUs.
To make single-stepping stay on the selected VCPU, enable scheduler locking:
(gdb) set scheduler-locking on
Because gdb is not a kernel debugger, VCPUs are presented as threads, so info threads lists all VCPUs:
(gdb) info thread
[New Thread 1]
  2 Thread 1 0xc04013a7 in hypercall_page ()
* 1 Thread 0 0xc04013a7 in hypercall_page ()
Switch threads to reach another VCPU. Remember that gdb assigns its own thread ids, which are off by one: gdb thread 1 is VCPU 0 and gdb thread 2 is VCPU 1.
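The off-by-one relationship is visible in the info thread listing, where gdb id 1 is "Thread 0" and gdb id 2 is "Thread 1". As a trivial Python sketch (helper names are hypothetical), the gdb command for a given VCPU can be derived like this:

```python
def gdb_thread_id(vcpu):
    """gdb numbers threads from 1, while VCPUs are numbered from 0."""
    if vcpu < 0:
        raise ValueError("VCPU numbers start at 0")
    return vcpu + 1


def thread_command(vcpu):
    """Return the gdb 'thread N' command that selects the given VCPU."""
    return "thread %d" % gdb_thread_id(vcpu)


print(thread_command(1))
```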
Back trace:
(gdb) bt
#0 0xc04013a7 in hypercall_page ()
#1 0xc0408664 in raw_safe_halt () at include/asm/mach-xen/asm/hypercall.h:197
#2 0xc040321a in xen_idle () at arch/i386/kernel/process-xen.c:109
#3 0xc0403339 in cpu_idle () at arch/i386/kernel/process-xen.c:161
#4 0xc06f49f5 in start_kernel () at init/main.c:618
#5 0xc040006f in startup_32 ()
Using gdb macros
Here are some useful gdb macros to ease Linux kernel debugging (gdbmacros):
define ps
  dont-repeat
  # sizeof(long) is 4 on 32-bit kernels and 8 on 64-bit kernels.
  set $sz = sizeof(long)
  # Address of the list head embedded in init_task.
  set $tasks = &init_task.tasks
  # Byte offset of the tasks member inside struct task_struct.
  set $offset = (char *)&init_task.tasks - (char *)&init_task
  set $task = $tasks
  set $task_entry = (struct task_struct *)((char *)$task - $offset)
  if ($sz == 4)
    printf "Pointer       PID      Command\n"
    printf "0x%-12lx%-9d%s\n", $task_entry, $task_entry->pid, $task_entry->comm
  else
    printf "Pointer               PID      Command\n"
    printf "0x%-20lx%-9d%s\n", $task_entry, $task_entry->pid, $task_entry->comm
  end
  set $task = $task->next
  while $task != $tasks
    # Step back from the list node to the enclosing task_struct.
    set $task_entry = (struct task_struct *)((char *)$task - $offset)
    if ($task_entry->pid) != 0
      if ($sz == 4)
        printf "0x%-12lx%-9d%s\n", $task_entry, $task_entry->pid, $task_entry->comm
      else
        printf "0x%-20lx%-9d%s\n", $task_entry, $task_entry->pid, $task_entry->comm
      end
    end
    set $task = $task->next
  end
end
document ps
Report a snapshot of the current processes.
end
define lsmod
  dont-repeat
  # sizeof(long) is 4 on 32-bit kernels and 8 on 64-bit kernels.
  set $sz = sizeof(long)
  # Start at the first entry after the global modules list head.
  set $mod = modules.next
  if ($sz == 4)
    printf "Pointer       Address       Name\n"
  else
    printf "Pointer               Address               Name\n"
  end
  while 1
    # The list member sits one long after the start of struct module.
    set $mod_entry = (struct module *)((char *)$mod - $sz)
    if ($sz == 4)
      printf "0x%-12lx0x%-12lx%s\n", $mod_entry, $mod_entry->module_core, $mod_entry->name
    else
      printf "0x%-20lx0x%-20lx%s\n", $mod_entry, $mod_entry->module_core, $mod_entry->name
    end
    set $mod = $mod->next
    if ($mod == &modules)
      loop_break
    end
  end
end
document lsmod
Show the status of modules in the Linux kernel.
end
define log
  dont-repeat
  printf "%s", log_buf
end
document log
Dump system message buffer.
end
To use these macros in a debugging session:
(gdb) source gdbmacros
(gdb) help ps
Report a snapshot of the current processes.
(gdb) ps
Pointer       PID      Command
0xc0d5daa0    1        init
0xc0d5d000    2        migration/0
...
(gdb) lsmod
Pointer       Address       Name
0xe1234c00    0xe1231000    xenblk
0xe130b200    0xe12fc000    dm_raid45
0xe120d900    0xe120d000    dm_message
...
(gdb) log
<5>Linux version 2.6.18-128.0.0.0.2.el5xen (mockbuild@ca-build10.us.oracle.com) (gcc version 4.1.2 20080704 (Red Hat 4.1.2-44)) #1 SMP Wed Jan 21 05:49:36 EST 2009
<6>BIOS-provided physical RAM map:
<4> Xen: 0000000000000000 - 0000000020800000 (usable)
<5>0MB HIGHMEM available.
<5>520MB LOWMEM available.
<4>Using x86 segment limits to approximate NX protection
...
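The ps macro walks a circular list of list_head nodes embedded in each task_struct, and recovers the enclosing structure by subtracting the member offset from the node address (the kernel's container_of idiom). The following Python sketch reproduces that arithmetic with ctypes; the Task structure is a hypothetical three-field stand-in for task_struct, not the real kernel layout.

```python
import ctypes


class ListHead(ctypes.Structure):
    """Doubly linked list node, like the kernel's struct list_head."""


ListHead._fields_ = [
    ("next", ctypes.POINTER(ListHead)),
    ("prev", ctypes.POINTER(ListHead)),
]


class Task(ctypes.Structure):
    """Hypothetical stand-in for struct task_struct."""
    _fields_ = [
        ("pid", ctypes.c_int),
        ("tasks", ListHead),
        ("comm", ctypes.c_char * 16),
    ]


def container_of(node_ptr, container_type, member):
    """Step back from an embedded node to the structure that contains it."""
    offset = getattr(container_type, member).offset
    address = ctypes.addressof(node_ptr.contents) - offset
    return container_type.from_address(address)


def link_circular(tasks):
    """Chain the embedded nodes into a ring, as the kernel task list is."""
    count = len(tasks)
    for i, task in enumerate(tasks):
        task.tasks.next = ctypes.pointer(tasks[(i + 1) % count].tasks)
        task.tasks.prev = ctypes.pointer(tasks[(i - 1) % count].tasks)


def walk_tasks(head):
    """Visit every task after head, stopping when the ring closes."""
    found = []
    head_address = ctypes.addressof(head.tasks)
    node = head.tasks.next
    while ctypes.addressof(node.contents) != head_address:
        task = container_of(node, Task, "tasks")
        found.append((task.pid, task.comm.decode()))
        node = node.contents.next
    return found


tasks = [Task(pid=0, comm=b"swapper"), Task(pid=1, comm=b"init"),
         Task(pid=2, comm=b"migration/0")]
link_circular(tasks)
for pid, comm in walk_tasks(tasks[0]):
    print(pid, comm)
```

The lsmod macro relies on the same idea, except that it hardcodes the offset as sizeof(long) instead of computing it from a known instance.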
Using gdb init file
You can collect all of the gdb commands above in an init file, saved as ~/.gdbinit or $cwd/.gdbinit:
file /share/tmp/pkg/debug/usr/lib/debug/lib/modules/2.6.18-128.0.0.0.2.el5xen/vmlinux
dir /share/tmp/pkg/debug/kernel-2.6.18/linux-2.6.18.i686/
dir /share/tmp/pkg/debug/usr/src/debug/kernel-2.6.18/linux-2.6.18.i686/
source /share/data/docs/ovm/xen-debugging/gdb/gdbmacros
target remote 10.182.120.209:9999
set scheduler-locking on
Then invoke gdb with:
$ gdb
Debugging crash dumps
Get the guest core dump:
# xm dump-core OVM_EL5U3_X86_PVM_4GB core-OVM_EL5U3_X86_PVM_4GB
Invoking crash to debug the core:
$ crash vmlinux core-OVM_EL5U3_X86_PVM_4GB
KERNEL: vmlinux
DUMPFILE: core-OVM_EL5U3_X86_PVM_4GB
CPUS: 2
DATE: Tue Oct 13 08:15:15 2009
UPTIME: 00:08:52
LOAD AVERAGE: 0.00, 0.04, 0.03
TASKS: 41
NODENAME: localhost.localdomain
RELEASE: 2.6.18-128.0.0.0.2.el5xen
VERSION: #1 SMP Wed Jan 21 05:49:36 EST 2009
MACHINE: i686 (2127 Mhz)
MEMORY: 520 MB
PANIC: ""
PID: 0
COMMAND: "swapper"
TASK: c06762c0 (1 of 2) [THREAD_INFO: c06ef000]
CPU: 0
STATE: TASK_RUNNING (ACTIVE)
WARNING: panic task not found
crash> help
*        files    mod      runq     union
alias    foreach  mount    search   vm
ascii    fuser    net      set      vtop
bt       gdb      p        sig      waitq
btop     help     ps       struct   whatis
dev      irq      pte      swap     wr
dis      kmem     ptob     sym      q
eval     list     ptov     sys
exit     log      rd       task
extend   mach     repeat   timer
crash> ps
   PID  PPID  CPU  TASK      ST  %MEM  VSZ   RSS  COMM
>    0     0    0  c06762c0  RU   0.0    0     0  [swapper]
>    0     1    1  c0d5d550  RU   0.0    0     0  [swapper]
     1     0    0  c0d5daa0  IN   0.1  2080  604  init
     2     1    0  c0d5d000  IN   0.0    0     0  [migration/0]
     3     1    0  c0d54aa0  IN   0.0    0     0  [ksoftirqd/0]
...
crash> mod
MODULE    NAME          SIZE   OBJECT FILE
e1206b80  scsi_dh       11713  (not loaded) [CONFIG_KALLSYMS]
e120a500  dm_mem_cache  10049  (not loaded) [CONFIG_KALLSYMS]
e120d900  dm_message     6977  (not loaded) [CONFIG_KALLSYMS]
...
Xen Trace
Get trace data:
# xentrace -D -S 256 -T 1 trace.raw
Analyze trace data using xentrace_format:
# cat trace.raw | xentrace_format formats >trace.txt
Analyze trace data using xenalyze:
# xenalyze --cpu-hz=2.0G -s trace.raw >trace.summary
...
] 1.000744383 -x vmexit exit_reason EXCEPTION_NMI eip fffff80001029795
] 1.000744383 -x mmio_assist w gpa fee000b0 data 0
] 1.000745561 x- vmexit exit_reason EXCEPTION_NMI eip fffff8000102ff73
] 1.000745561 x- mmio_assist w gpa fee000b0 data 0 vlapic eoi
] 1.000745561 x- fast mmio va fffffffffffe00b0
] 1.000748415 -x vmentry
] 1.000748692 x- vmentry
] 1.000749948 -x vmexit exit_reason HLT eip fffffadffab2fb41
] 1.000749948 -x hlt [ 0 0 a0028006 10dced2d ]
] 1.000750155 x- vmexit exit_reason HLT eip fffffadffab2fb41
] 1.000750155 x- hlt [ 0 0 a0028006 10dcef1c ]
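A quick way to get an overview of a long xenalyze dump is to tally the VM exit reasons. The Python sketch below does this with a regular expression; it assumes the "vmexit exit_reason NAME" wording of the sample above, and the function name is hypothetical.

```python
import re
from collections import Counter

XENALYZE_SAMPLE = """\
] 1.000744383 -x vmexit exit_reason EXCEPTION_NMI eip fffff80001029795
] 1.000744383 -x mmio_assist w gpa fee000b0 data 0
] 1.000745561 x- vmexit exit_reason EXCEPTION_NMI eip fffff8000102ff73
] 1.000748415 -x vmentry
] 1.000749948 -x vmexit exit_reason HLT eip fffffadffab2fb41
] 1.000750155 x- vmexit exit_reason HLT eip fffffadffab2fb41
"""


def count_vmexits(summary_text):
    """Count how often each exit reason appears in xenalyze output."""
    return Counter(re.findall(r"vmexit exit_reason (\S+)", summary_text))


for reason, count in count_vmexits(XENALYZE_SAMPLE).most_common():
    print(reason, count)
```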
Use xenctx to show domain info:
# xenctx --symbol-table=/share/tmp/pkg/debug/boot/System.map-2.6.18-128.0.0.0.2.el5xen --all 1
cs:eip: 0061:c04013a7 hypercall_page+0x3a7
flags: 00001246 i z p
ss:esp: 0069:c06effc0
eax: 00000000 ebx: 00000001 ecx: 00000000 edx: 00000000
esi: 00000001 edi: 00000000 ebp: c062780f
ds: 007b es: 007b fs: 0000 gs: 0000
cr0: 8005003b cr2: b7f7e000 cr3: 3c6fd000 cr4: 00002660
dr0: 00000000 dr1: 00000000 dr2: 00000000 dr3: 00000000
dr6: ffff0ff0 dr7: 00000400
Code (instr addr c04013a7)
cc cc cc cc cc cc cc cc cc cc cc cc cc cc b8 1d 00 00 00 cd 82 <c3> cc cc cc cc cc cc cc cc cc cc
Stack:
c0408664 c141c6c8 ffffffff c040321a c0403339 c141c6c8 c071fc64 c0627831
c06f49f5 00004b64 c0765800 01020800 c0dcb000 00000000 00000000 c040006f
Call Trace:
[<c04013a7>] hypercall_page+0x3a7 <--
[<c0408664>] raw_safe_halt+0x8c
[<c040321a>] xen_idle+0x22
[<c0403339>] cpu_idle+0x91
[<c06f49f5>] start_kernel+0x37a
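The symbol+offset annotations in the xenctx trace come from looking each address up in the System.map file: the matching symbol is the one with the highest start address that is not above the target. Here is a minimal Python sketch of that lookup using bisect. The map excerpt is not copied from a real System.map; its addresses are derived from the offsets in the trace above and the type letters are assumptions.

```python
import bisect

# Assumed excerpt: start addresses reconstructed as (trace address - offset).
SYSTEM_MAP_SAMPLE = """\
c0401000 T hypercall_page
c04031f8 t xen_idle
c04032a8 T cpu_idle
c04085d8 T raw_safe_halt
"""


def load_system_map(text):
    """Parse 'address type name' lines into a sorted list of (addr, name)."""
    symbols = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue
        address, _kind, name = parts
        symbols.append((int(address, 16), name))
    symbols.sort()
    return symbols


def resolve(symbols, address):
    """Return 'name+0xoffset' for the nearest symbol at or below address."""
    starts = [start for start, _name in symbols]
    index = bisect.bisect_right(starts, address) - 1
    if index < 0:
        return None  # address lies below the first known symbol
    start, name = symbols[index]
    return "%s+0x%x" % (name, address - start)


symbols = load_system_map(SYSTEM_MAP_SAMPLE)
for address in (0xC04013A7, 0xC0408664, 0xC040321A, 0xC0403339):
    print("[<%08x>] %s" % (address, resolve(symbols, address)))
```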
Use xen-hvmctx to show HVM guest info:
# xen-hvmctx 2
HVM save record for domain 2
Entry 0: type 1 instance 0, length 24
Header: magic 0x54381286, version 1
Xen changeset ffffffffffffffff
CPUID[0][%eax] 0x000006f2
Entry 1: type 2 instance 0, length 1016
CPU: rax 0x0000000000000000 rbx 0x0000000000000000
rcx 0x00000000c0403bb0 rdx 0x00000000c06f0000
rbp 0x00000000c0627743 rsi 0x00000000c072dee4
rdi 0x00000000c0627765 rsp 0x00000000c06f0fd8
r8 0x0000000000000000 r9 0x0000000000000000
r10 0x0000000000000000 r11 0x0000000000000000
r12 0x0000000000000000 r13 0x0000000000000000
r14 0x0000000000000000 r15 0x0000000000000000
rip 0x00000000c0403be1 rflags 0x0000000000000246
cr0 0x000000008005003b cr2 0x0000000000dd09e0
cr3 0x000000001e93d000 cr4 0x00000000000006d0
dr0 0x0000000000000000 dr1 0x0000000000000000
dr2 0x0000000000000000 dr3 0x0000000000000000
dr6 0x00000000ffff0ff0 dr7 0x0000000000000400
cs 0x00000060 (0x0000000000000000 + 0xffffffff / 0x00c9b)
ds 0x0000007b (0x0000000000000000 + 0xffffffff / 0x00cf3)
es 0x0000007b (0x0000000000000000 + 0xffffffff / 0x00cf3)
fs 0x00000000 (0x0000000000000000 + 0xffffffff / 0x00c00)
gs 0x00000000 (0x0000000000000000 + 0xffffffff / 0x00c00)
ss 0x00000068 (0x0000000000000000 + 0xffffffff / 0x00c93)
tr 0x00000080 (0x00000000c1401c80 + 0x00002073 / 0x0008b)
ldtr 0x00000088 (0x00000000c0732020 + 0x00000027 / 0x00082)
itdr (0x00000000c06e2000 + 0x000007ff)
gdtr (0x00000000c1410000 + 0x000000ff)
...
Send debug keys to Xen by xm debug-key
To show installed key handlers:
# xm debug-key h
# xm dmesg
...
(XEN) 'h' pressed -> showing installed handlers
(XEN) key '%' (ascii '25') => Trap to xendbg
(XEN) key '0' (ascii '30') => dump Dom0 registers
(XEN) key 'C' (ascii '43') => trigger a crashdump
(XEN) key 'H' (ascii '48') => dump heap info
(XEN) key 'N' (ascii '4e') => NMI statistics
(XEN) key 'Q' (ascii '51') => dump PCI devices
(XEN) key 'R' (ascii '52') => reboot machine
(XEN) key 'a' (ascii '61') => dump timer queues
(XEN) key 'c' (ascii '63') => dump cx structures
(XEN) key 'd' (ascii '64') => dump registers
(XEN) key 'e' (ascii '65') => dump evtchn info
(XEN) key 'h' (ascii '68') => show this message
(XEN) key 'i' (ascii '69') => dump interrupt bindings
(XEN) key 'm' (ascii '6d') => memory info
(XEN) key 'n' (ascii '6e') => trigger an NMI
(XEN) key 'q' (ascii '71') => dump domain (and guest debug) info
(XEN) key 'r' (ascii '72') => dump run queues
(XEN) key 't' (ascii '74') => display multi-cpu clock info
(XEN) key 'u' (ascii '75') => dump numa info
(XEN) key 'v' (ascii '76') => dump Intel's VMCS
(XEN) key 'z' (ascii '7a') => print ioapic info
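Note that the "ascii" codes in the handler list are hexadecimal ('q' is 0x71). To look up handlers from a script, the listing can be turned into a dictionary, as in this Python sketch; it assumes the line format shown above and the function name is hypothetical.

```python
import re

HANDLER_RE = re.compile(r"key '(.)' \(ascii '([0-9a-f]+)'\) => (.+)")

DMESG_SAMPLE = """\
(XEN) 'h' pressed -> showing installed handlers
(XEN) key '%' (ascii '25') => Trap to xendbg
(XEN) key 'd' (ascii '64') => dump registers
(XEN) key 'q' (ascii '71') => dump domain (and guest debug) info
"""


def parse_debug_keys(dmesg_text):
    """Map each debug key to its description from 'xm dmesg' output."""
    handlers = {}
    for key, code, description in HANDLER_RE.findall(dmesg_text):
        # The printed code is the key's character code in hex; skip any
        # line where the two disagree (for example, a corrupted log line).
        if int(code, 16) != ord(key):
            continue
        handlers[key] = description.strip()
    return handlers


handlers = parse_debug_keys(DMESG_SAMPLE)
print("xm debug-key q ->", handlers["q"])
```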
To dump domain (and guest debug) info:
# xm debug-key q
# xm dmesg
...
(XEN) 'q' pressed -> dumping domain info (now=0x40B9:7BDAEE7D)
(XEN) General information for domain 0:
(XEN) refcnt=3 nr_pages=139008 xenheap_pages=5 dirty_cpus={1}
(XEN) handle=00000000-0000-0000-0000-000000000000 vm_assist=0000000f
(XEN) Rangesets belonging to domain 0:
(XEN) Interrupts { 0-255 }
(XEN) I/O Memory { 0-febff, fec01-fedff, fee01-ffffffffffffffff }
(XEN) I/O Ports { 0-1f, 22-3f, 44-60, 62-9f, a2-ffff }
(XEN) Memory pages belonging to domain 0:
(XEN) DomPage list too long to display
(XEN) XenPage 000000007f3f7000: mfn=000000000007f3f7, caf=80000002, taf=00000000e8000002
(XEN) XenPage 000000007f3f6000: mfn=000000000007f3f6, caf=80000001, taf=00000000e8000001
(XEN) XenPage 000000007f3f5000: mfn=000000000007f3f5, caf=80000001, taf=00000000e8000001
(XEN) XenPage 000000007f3f4000: mfn=000000000007f3f4, caf=80000001, taf=00000000e8000001
(XEN) XenPage 000000007f3f1000: mfn=000000000007f3f1, caf=80000002, taf=00000000e8000002
(XEN) VCPU information and callbacks for domain 0:
(XEN) VCPU0: CPU1 [has=T] flags=0 upcall_pend = 00, upcall_mask = 00 dirty_cpus={1} cpu_affinity={0-31}
(XEN) 250 Hz periodic timer (period 4 ms)
(XEN) Notifying guest (virq 1, port 0, stat 0/-1/0)
(XEN) VCPU1: CPU0 [has=F] flags=1 upcall_pend = 00, upcall_mask = 00 dirty_cpus={} cpu_affinity={0-31}
(XEN) 250 Hz periodic timer (period 4 ms)
(XEN) Notifying guest (virq 1, port 0, stat 0/-1/0)
(XEN) General information for domain 1:
(XEN) refcnt=17652 nr_pages=0 xenheap_pages=0 dirty_cpus={}
(XEN) handle=da6cc464-0ecd-7bea-3704-de76da53a771 vm_assist=0000000f
(XEN) Rangesets belonging to domain 1:
(XEN) Interrupts { }
(XEN) I/O Memory { }
(XEN) I/O Ports { }
Xen kexec and kdump
Note
I could not get kdump to work under Oracle VM 2.2. It does work under Oracle VM 2.1.5, but the resulting vmcore cannot be analysed by crash.
Xen kdump works on EL5U3, and the vmcore can be analysed by crash.
Enable crash dump of Xen hypervisor and dom0 kernel:
title Oracle VM Server-ovs serial console (xen-64-3.4.0 2.6.18-128.2.1.4.3.el5ovs) debug
root (hd0,4)
kernel /boot/xen-64bit-debug.gz console=com1,vga com1=57600,8n1 dom0_mem=543M crashkernel=128M@16M
module /boot/vmlinuz-2.6.18-128.2.1.4.3.el5xen ro root=LABEL=/ console=tty0 console=ttyS0,57600n8
module /boot/initrd-2.6.18-128.2.1.4.3.el5xen.img
Start the kdump service and enable it at system boot:
# service kdump start
# chkconfig --level 2345 kdump on
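The crashkernel=128M@16M option in the boot entry above uses the size@offset form: it reserves 128 MB for the capture kernel, starting at the 16 MB mark of physical memory. A small Python sketch for decoding such a value follows; the function names are hypothetical, and only the plain size[@offset] form with K/M/G suffixes is handled.

```python
import re

UNITS = {"": 1, "K": 1 << 10, "M": 1 << 20, "G": 1 << 30}


def parse_size(text):
    """Convert a size such as '128M' into a number of bytes."""
    match = re.fullmatch(r"(\d+)([KMG]?)", text.upper())
    if match is None:
        raise ValueError("unrecognised size: %r" % text)
    return int(match.group(1)) * UNITS[match.group(2)]


def parse_crashkernel(option):
    """Split 'crashkernel=size[@offset]' into (size, offset) in bytes."""
    value = option.split("=", 1)[1]
    size, _, offset = value.partition("@")
    return parse_size(size), (parse_size(offset) if offset else None)


size, offset = parse_crashkernel("crashkernel=128M@16M")
print("reserve %d bytes at offset %d" % (size, offset))
```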
Reboot, then trigger a dom0 panic either by pressing ALT-SysRq-c or by running the following command as root:
# echo "c" >/proc/sysrq-trigger
The vmcore will be available at /var/crash/.
Debug the vmcore using crash:
$ crash boot/xen-syms-2.6.18-128.el5 usr/lib/debug/boot/xen-syms-2.6.18-128.el5.debug core-EL5U3
KERNEL: boot/xen-syms-2.6.18-128.el5
DEBUGINFO: usr/lib/debug/boot/xen-syms-2.6.18-128.el5.debug
DUMPFILE: core-EL5U3
CPUS: 2
DOMAINS: 4
UPTIME: 00:02:44
MACHINE: Intel(R) Core(TM)2 CPU 6400 @ 2.13GHz (2128 Mhz)
MEMORY: 2 GB
PCPU-ID: 0
PCPU: ff1d3fb4
VCPU-ID: 0
VCPU: ffbe7080 (VCPU_RUNNING)
DOMAIN-ID: 0
DOMAIN: ffbf0080 (DOMAIN_RUNNING)
STATE: CRASH
crash> help
*         dumpinfo  log       search    vcpus
alias     eval      p         set       whatis
ascii     exit      pcpus     struct    wr
bt        extend    pte       sym       q
dis       gdb       rd        sys
domain    help      repeat    union
doms      list      sched     vcpu
crash version: 4.1.0 gdb version: 6.1
For help on any command above, enter "help <command>".
For help on input options, enter "help input".
For help on output options, enter "help output".
crash> list
list: starting address required
Usage: list [[-o] offset] [-e end] [-s struct[.member[,member]]] [-H] start
Enter "help list" for details.
crash> doms
   DID    DOMAIN    ST  T  MAXPAGE   TOTPAGE  VCPU  SHARED_I  P2M_MFN
   32753  ff21c080  RU  O  0         0        0     0         ----
   32754  ff1c8080  RU  X  0         0        0     0         ----
   32767  ff214080  RU  I  0         0        2     0         ----
>* 0      ffbf0080  RU  0  ffffffff  67100    2     ffbec000  3e308
crash> pcpus
   PCID  PCPU      CUR-VCPU  TSS
*  0     ff1d3fb4  ffbe7080  ff1fa180
   1     ff21bfb4  ffbe6080  ff1fa200
crash> vcpus
   VCID  PCID  VCPU      ST  T  DOMID  DOMAIN
   0     0     ffbfd080  RU  I  32767  ff214080
   1     1     ff1cc080  RU  I  32767  ff214080
>* 0     0     ffbe7080  RU  0  0      ffbf0080
>  1     1     ffbe6080  RU  0  0      ffbf0080
Xend Debugging
- Xend is written in Python and can be debugged by pdb.
- All user-space debugging techniques apply.
- Refer to Xend debugging.
Xend tracing/debugging/profiling
Export the necessary environment variables:
# export XEND_DEBUG=1
# export XEND_DAEMONIZE=0
Start xend tracing:
# /usr/sbin/xend trace_start
The trace file will be available at /var/log/xen/xend.trace.
Debug xend using the Python Debugger:
# python -m pdb /usr/sbin/xend trace_start
You'll drop to a debugging shell. You can set breakpoints and continue.
Profile xend using the Python Profiler:
# python -m profile /usr/sbin/xend start >xend.profile
To generate the profile result:
# kill -INT <pid of xend>
# kill -USR1 <pid of xend>
The profiling log is available at xend.profile.
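For readers unfamiliar with the Python profiler, the sketch below shows the kind of per-function report it produces, using the same profile module programmatically on a small hypothetical function. It is written for a current Python 3 interpreter and does not touch xend itself; busy_work and profile_call are illustrative names only.

```python
import io
import profile
import pstats


def busy_work(n):
    """Hypothetical stand-in for a slow code path."""
    return sum(i * i for i in range(n))


def profile_call(func, *args):
    """Run func under the profiler and return (result, text report)."""
    profiler = profile.Profile()
    result = profiler.runcall(func, *args)
    report = io.StringIO()
    stats = pstats.Stats(profiler, stream=report)
    # Sort by cumulative time so the most expensive call chains come first.
    stats.sort_stats("cumulative").print_stats(5)
    return result, report.getvalue()


result, report = profile_call(busy_work, 10000)
print(report)
```

Each row of the report lists call counts and times per function, which is the same information to look for in xend.profile.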
Troubleshooting
Compile Xen with debug=y
Compiling Xen with debug=y mainly turns on a large number of ASSERTs and similar checks. The "debug version" in OVM is built with gdbsx=y kdb=y, but not with debug=y. More work is needed to make debug=y work with kdb, because many data structures differ. If you really want debug=y, you can build with gdbsx=y debug=y and without kdb.
Capture Xen hypervisor log to filesystem
Start xenconsoled with option --log=hv. The log will be: <log_dir>/hypervisor.log where <log_dir> defaults to /var/log/xen/console, but can be overridden with command-line option --log-dir.
You can use this shell script (xen-console-trace.sh):
#!/bin/sh
/etc/init.d/xend stop
killall -9 xenconsoled
mkdir -p /var/log/xen/console
export XENCONSOLED_TRACE=hv
/etc/init.d/xend start
Reference
- Xen project home.
- Oracle VM.
- Oracle Enterprise Linux.
- Xen debuggers source repository.
- Xend debugging.
- Xenalyze source repository.
- Debugging Xen and Xen guests presentations on Xen Summit 2008 by Mukesh Rathor.
- Xen kexec.
- Xen port of kexec/kdump.
- Minicom.
- Crash.
- GDB.
- Xen serial console.
