Xen Debugging

Debugging Xen hypervisor with kdb

Requirements

  • Oracle VM 2.2 server. Download from Oracle E-Delivery.

  • A debugging host, either Linux or Windows.

  • An RS-232 serial cable connecting the Oracle VM server to the debugging host.

  • Boot the debugger-enabled hypervisor on the Oracle VM server.

    Create a new GRUB boot entry and reboot into it:

    title Oracle VM Server-ovs serial console (xen-64-3.4.0 2.6.18-128.2.1.4.3.el5ovs) debug
        root (hd0,4)
        kernel /boot/xen-64bit-debug.gz console=com1,vga com1=57600,8n1 dom0_mem=543M
        module /boot/vmlinuz-2.6.18-128.2.1.4.3.el5xen ro root=LABEL=/ console=tty0 console=ttyS0,57600n8
        module /boot/initrd-2.6.18-128.2.1.4.3.el5xen.img
  • On the debugging host (Oracle Enterprise Linux in this example), start Minicom:

    # minicom

    The Minicom config:

    # cat /etc/minirc.dfl
    # Machine-generated file - use "minicom -s" to change parameters.
    pr port             /dev/ttyS0
    pu baudrate         57600
    pu rtscts           No

    If you are using Windows as the debugging host, you can use HyperTerminal instead.
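
    The serial speed has to agree in three places: Xen's com1= option, the dom0 kernel's console=ttyS0 option, and Minicom's baudrate setting. The following Python sketch cross-checks the three values. The function names are illustrative only, and the parsing covers just the forms used in the GRUB entry and minirc.dfl shown above.

```python
import re

def xen_baud(kernel_line):
    """Return the baud rate from Xen's com1=<baud>,<format> option, or None."""
    m = re.search(r"\bcom1=(\d+)", kernel_line)
    return int(m.group(1)) if m else None

def dom0_baud(module_line):
    """Return the baud rate from the dom0 console=ttyS0,<baud>... option, or None."""
    m = re.search(r"\bconsole=ttyS0,(\d+)", module_line)
    return int(m.group(1)) if m else None

def minicom_baud(minirc_text):
    """Return the baud rate from a 'pu baudrate <n>' line in a minirc file, or None."""
    m = re.search(r"^\s*pu\s+baudrate\s+(\d+)", minirc_text, re.MULTILINE)
    return int(m.group(1)) if m else None

def serial_settings_agree(kernel_line, module_line, minirc_text):
    """True when all three baud rates are present and identical."""
    rates = {xen_baud(kernel_line), dom0_baud(module_line), minicom_baud(minirc_text)}
    return None not in rates and len(rates) == 1

# Values copied from the GRUB entry and Minicom config above.
KERNEL = "kernel /boot/xen-64bit-debug.gz console=com1,vga com1=57600,8n1 dom0_mem=543M"
MODULE = ("module /boot/vmlinuz-2.6.18-128.2.1.4.3.el5xen ro root=LABEL=/ "
          "console=tty0 console=ttyS0,57600n8")
MINIRC = "pr port             /dev/ttyS0\npu baudrate         57600\npu rtscts           No\n"

print(serial_settings_agree(KERNEL, MODULE, MINIRC))
```

    If the check fails, compare the three numbers by hand before suspecting the cable.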

Debugging

Run the following tests from the serial console on the debugging host.

  • Switch serial console input:

    (XEN) *** Serial input -> Xen (type 'CTRL-a' three times to switch input to DOM0)
    (XEN) *** Serial input -> DOM0 (type 'CTRL-a' three times to switch input to Xen)

    Note

    Minicom intercepts CTRL-a as its own command key. To send a single CTRL-a through Minicom, press CTRL-a twice, so switching the input takes six presses of CTRL-a.

  • Type CTRL-\ to break into kdb:

    Enter kdb (cpu:0 reason:1 vcpu=0 domid:32767 eflg:0x246 irqs:1)
    ffff828c8017a922: acpi_safe_halt+2               ret
    [0]xkdb>
  • Sample output:

    [0]xkdb> h
     - ccpu is current cpu
     - following are always output in decimal:
         vcpu num, cpu num, domid
     - otherwise, almost all numbers output are in hex
     - all input in hex, unless indicated in usage
     - output: z17 means decimal 17
     - domid 7fff(z32767) refers to hypervisor
     - if no domid before function name, then it's hypervisor
     - earlykdb in xen grub line to break into kdb during boot
    
    info   : basic info/constants...
    f [vcpu ptr]  : Display stack frames
    fg domid ipaddr(eip) spaddr(esp)  : Display stack given ip and sp for guest
    dw <vaddr|sym>[num(dec)][domid]  : Display word
    dd <vaddr|sym>[num(dec)][domid]  : Display dword
    dwm <maddr|sym>[num(dec)]  : Display machine word
    ddm <maddr|sym>[num(dec)]  : Display machine dword
    dr [sp]  : Display [special]Registers
    drg   : Display guest/stack registers
    dis [addr|sym][num][0xdomid]  : Disassemble
    dism   : toggle Intel/ATT modes
    mw <vaddr|sym><val>[domid]  : Modify Mem Word
    md <vaddr|sym><val>[domid]  : Modify Mem DWord
    mr <reg><val>  : Modify Register
    bc <num|all>  : brkpt delete
    bp [addr|sym][0xdomid]  : brkpt list/set(on all cpus)
    wp [addr|sym][w|i]  : watchpoint list/set(on all cpus)
    wc <num|all>  : watchpoint delete
    ni   : next instr after call
    ss   : Single Step
    ssb   : Single Step to branch
    go   : Continue Execution
    cpu [all|num]  : Switch CPU
    nmi <cpu|all>  : send nmi to cpu/s
    sym sym ? for usage  : Load guest symbols
    vcpuh vcpu-ptr  : Display hvm_vcpu{}
    vcpu [all|ptr]  : Display vcpu/s
    dom [all|0xdomid]  : Display dom/s
    sched   : scheduler info
    mmu   : Basic mmu info
    p2m 0xdomid 0xgpfn  : gpfn to mfn
    m2p 0xmfn  : mfn to pfn
    dpage mfn|page-ptr  : Display page info
    dtrq   : Dump timer queues
    didt   : Dump IDT current table
    dgdt   : Dump GDT table
    dirq   : Dump IRQs bindings
    dvmc [0xdomid][0xvcpuid]  : Dump vmcs/vmcb
    trcon   : turn tracing on
    trcoff   : turn tracing off
    trcz   : zero entire trace buffer
    trcp   : hints to print trace buffer via dd cmd
    usr1   : User defined cmd
    kdbf   : Display kdb stack frames
    kdbdbg   : toggle kdb debug
    reboot   : Reboot
    h   : Help
    
    [0]xkdb> dom 5
    
    DOMAIN :    domid:0x0005 ptr:0xffff83006f6de000
      pgalk: 0001 4095 0
      pglist:  0xffff828400df8ac0 0xffff82840041fd20
      xpglist: 0xffff828400df8380 0xffff828400df8ca0
      PAGES: tot:0x0003e800 max:0x0003e800 xenheap:0x00000005
      next:0xffff830020fe6000 hashnext:0x0000000000000000
      rangesets: nxt:0xffff8300600cbad0 prev:0xffff8300600cba10 lk: 0001 4095 0
    
      Evt: MAX_EVTCHNS:$1024 ptr:ffff83006f6de078 pollmsk:00000000 lk: 0001 4095 0
        &evtchn_pending:ffff83006fc65800 &evtchn_mask:ffff83006fc65880
        chn:   0 st:1 _consumr=0 ntfy_:0   pend:0 mask:1
        chn:   1 st:3 _consumr=0 ntfy_:0  rem-port:$26 domid:0  pend:0 mask:0
        chn:   2 st:3 _consumr=0 ntfy_:0  rem-port:$27 domid:0  pend:0 mask:0
        chn:   3 st:5 _consumr=0 ntfy_:0  virq:$0  pend:1 mask:0
        chn:   4 st:6 _consumr=0 ntfy_:0   pend:0 mask:0
        chn:   5 st:6 _consumr=0 ntfy_:0   pend:0 mask:0
        chn:   6 st:6 _consumr=0 ntfy_:1   pend:0 mask:0
        chn:   7 st:6 _consumr=0 ntfy_:1   pend:0 mask:0
        chn:   8 st:5 _consumr=0 ntfy_:1  virq:$0  pend:1 mask:0
        chn:   9 st:3 _consumr=0 ntfy_:0  rem-port:$28 domid:0  pend:0 mask:0
        chn:  10 st:3 _consumr=0 ntfy_:0  rem-port:$29 domid:0  pend:0 mask:0
        chn:  11 st:3 _consumr=0 ntfy_:0  rem-port:$30 domid:0  pend:0 mask:0
        chn:  12 st:3 _consumr=0 ntfy_:0  rem-port:$36 domid:0  pend:0 mask:0
    
      Grant table: gp:0xffff83001d905e00
        nr_frames:0x00000004 shpp:0xffff83001d905e40 active:0xffff83001d8ed840
        maptrk:0xffff8300600cb160 maphd:0x00000000 maplmt:0x00000200
        mapcnt:mapcnt: lk: 0001 4095 0
      hvm:0 priv:0 dbg:1 dying:0 paused:1
      shutdown: lk: 0001 4095 0
      shutn:0 shut:0 code:0
      pausecnt:0x00000001 vm_assist:0x000000000000000f refcnt:0x0000007b
      cpumask:0
      shared == vcpu_info[]: ffff83006fc65000
        arch_shared: maxpfn: 3f000 pfn-mfn-frame-ll mfn: 6fd84
    
      arch_domain at : ffff83006f6de580
        pt:0xffff83006f6d2000     l2:0xffff83006fc68000 l3:0xffff83006fc66000
        ioport:0xffff8300600cba10 &hvm_dom:0xffff83006f6de5c0
        &pging_dom:ffff83006f6df128 mode:0 disabled
        p2m ptr:ffff8300600cb970  pages:{0000000000000000, 0000000000000000}
                max_mapped_pfn:0000000000000000  &alloc_page:ffff8300600cb990
        phys_table.pfn:0000000000000000
        physaddr_bitsz:0 32bit_pv:1 has_32bit_shinfo:1
      sched:0xffff8300600cbb30  &handle:0xffff83006f6dfe08
      vcpu ptrs:
        0:ffff83006f6fe000 1:ffff83006fc3a000
    
    [0]xkdb> cpu all
    [0]ffff828c8017a922: acpi_safe_halt+2               ret
    [1]ffff828c8017a922: acpi_safe_halt+2               ret
    
    [0]xkdb> dr
    (XEN) ----[ Xen-3.4.0  x86_64  debug=n  Not tainted ]----
    (XEN) CPU:    0
    (XEN) RIP:    e008:[<ffff828c8017a922>] acpi_safe_halt+0x2/0x10
    (XEN) RFLAGS: 0000000000000246   CONTEXT: hypervisor
    (XEN) rax: 0000000000000003   rbx: 00000000008e026c   rcx: 0000000000000001
    (XEN) rdx: 0000000000000808   rsi: 0000000004d2b4e8   rdi: ffff83007f2c2460
    (XEN) rbp: ffff83007f2c2400   rsp: ffff828c802d7ec0   r8:  00000000000002b9
    (XEN) r9:  0000000000000002   r10: 0000000000000000   r11: ffff828c8031a3e0
    (XEN) r12: ffff83007f2c2460   r13: 00000b2a9fba595a   r14: 0000000000000000
    (XEN) r15: ffff828c8024d100   cr0: 000000008005003b   cr4: 00000000000026f0
    (XEN) cr3: 0000000020fc0000   cr2: 00000000b7f44000
    (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
    
    [0]xkdb> bp schedule
    BP set for domid:32767 at: 0xffff828c801178b0 schedule+0
    [0]xkdb> go
    Breakpoint on cpu 0 at 0xffff828c801178b0
    ffff828c801178b0: schedule+0                     subq $0x78, %rsp
    [0]xkdb> ss
    ffff828c801178b4: schedule+4                     mov %rbx, 0x48(%rsp)
    [0]xkdb> bc all
    Deleted breakpoint [0] addr:0xffff828c801178b0 domid:32767
    
    [0]xkdb> cpu 1
    Switching to cpu:1
    [1]xkdb>
    
    [1]xkdb> f
    (XEN) Xen call trace:
    (XEN)    [<ffff828c80117a13>] schedule+0x163/0x3a0
    (XEN)    [<ffff828c80118968>] do_softirq+0x58/0x80
    (XEN)    [<ffff828c8013fedc>] idle_loop+0x4c/0xa0
    
    [1]xkdb> go
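
The kdb help text above says that input is hexadecimal unless a command states otherwise, and that decimal output is marked with a z prefix (z17 is decimal 17). Tools such as xm list show domain IDs in decimal, so a conversion is needed before typing them at the xkdb prompt. A minimal Python sketch of that conversion follows; the helper names are illustrative and not part of kdb.

```python
HYPERVISOR_DOMID = 0x7fff  # kdb prints this as 7fff(z32767)

def to_kdb_arg(number):
    """Format a decimal number (for example a domid from xm list) as hex for kdb input."""
    return format(number, "x")

def from_kdb_output(token):
    """Interpret a number printed by kdb.

    'z17' means decimal 17; anything else is treated as hex. Per the help
    text, vcpu num, cpu num and domid are always printed in decimal, so do
    not pass those fields through this function.
    """
    if token.startswith("z"):
        return int(token[1:], 10)
    return int(token, 16)

print(to_kdb_arg(26))             # domain 26 in xm list is typed as hex at the xkdb prompt
print(from_kdb_output("z32767") == HYPERVISOR_DOMID)
```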

Debugging Xen guest with gdbsx

Requirements

  • Oracle VM 2.2 server. Download from Oracle E-Delivery.

  • A debugging host (should be Linux).

  • Oracle Enterprise Linux template OVM_EL5U3_X86_PVM_4GB.

    Download from Oracle E-Delivery.

  • Oracle Enterprise Linux kernel-debug rpm packages:

    • kernel-xen-debuginfo-2.6.18-128.0.0.0.2.el5.i686.rpm
    • kernel-debuginfo-common-2.6.18-128.0.0.0.2.el5.i686.rpm

    Download from: http://oss.oracle.com/el5/debuginfo/

  • Oracle Enterprise Linux kernel source code:

    • kernel-2.6.18-128.0.0.0.2.el5.src.rpm

    Download from: http://oss.oracle.com/el5/SRPMS-updates/

  • Boot the debugger-enabled hypervisor on the Oracle VM server.

  • For multiple-VCPU debugging, edit OVM_EL5U3_X86_PVM_4GB/vm.cfg so that the guest uses two VCPUs:

    vcpus = 2
  • On Oracle VM server, start OVM_EL5U3_X86_PVM_4GB:

    # xm create OVM_EL5U3_X86_PVM_4GB/vm.cfg

    Stop the iptables service on the Oracle VM server:

    # service iptables stop
  • On the debugging host, first ensure that gdb version 6.5-16 or later is installed:

    $ gdb --version
    GNU gdb Fedora (6.8-37.el5)

    Prepare the kernel source:

    $ rpm -ivh kernel-2.6.18-128.0.0.0.2.el5.src.rpm
    $ rpmbuild -bp ~/rpmbuild/SPECS/kernel-2.6.spec
    $ mv ~/rpmbuild/BUILD/kernel-2.6.18 /share/tmp/pkg/debug/

    To debug an x86 Linux guest, you may need one extra step:

    $ cd /share/tmp/pkg/debug/kernel-2.6.18/linux-2.6.18.i686/include/
    $ ln -s asm-i386 asm

    Prepare the debugging kernel image:

    $ cd /share/tmp/pkg/debug/
    $ rpm2cpio kernel-xen-debuginfo-2.6.18-128.0.0.0.2.el5.i686.rpm | cpio -iumd
    $ rpm2cpio kernel-debuginfo-common-2.6.18-128.0.0.0.2.el5.i686.rpm | cpio -iumd

Debugging on Oracle VM server

  • VM status:

    # xm list
    Name                                        ID   Mem VCPUs      State   Time(s)
    Domain-0                                     0   487     2     r-----    478.2
    OVM_EL5U3_X86_PVM_4GB                        5  1000     2     -b----     17.9
  • gdbsx usage:

    # gdbsx
    Usage 1: gdbsx -a domid <32|64> PORT [-d]
             PORT to listen for a TCP connection.
             Eg. gdbsx -a 3 32 9999
    
    Usage 2: gdbsx -c domid <32|64> [vcpu#] [-d]
             to dump vcpu context(s) for given domid
  • Display VCPU contexts:

    # gdbsx -c 5 32
    ===> Context for DOMID:5
    --> VCPU:0
    eip:c04013a7 esp:c06effc0 flags:00001246
    eax:00000000 ebx:00000001 ecx:00000000 edx:00000000
    esi:00000001 edi:00000000 ebp:c062780f
    cs:61 ds:7b fs:0 gs:0
    
    Call Trace:
       [c04013a7]
       [c0408664]
       [c17ee6c8]
       [ffffffff]
       [c040321a]
       [c0403339]
       [c17ee6c8]
       [c071fc64]
       [c0627831]
       [c06f49f5]
       [c0765800]
    --> VCPU:1
    eip:c04013a7 esp:c0e44f9c flags:00001246
    eax:00000000 ebx:00000001 ecx:00000000 edx:00000000
    esi:00000001 edi:00000001 ebp:00000000
    cs:61 ds:7b fs:0 gs:0
    
    Call Trace:
       [c04013a7]
       [c0408664]
       [ffffffff]
       [c040321a]
       [c0403339]
       [2d6e6578]
       [2d302e33]
       [5f363878]
       [6fc65000]
       [c0e48000]
       [c0d4b000]
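
The gdbsx -c dump is easy to post-process when you want to compare registers across VCPUs or across successive dumps. The Python sketch below parses text in the layout shown above into a dictionary per VCPU. It assumes exactly that layout ("--> VCPU:n" headers followed by name:hexvalue pairs) and ignores the call trace addresses; the function name is illustrative.

```python
import re

def parse_gdbsx_contexts(text):
    """Map VCPU number -> {register name: integer value} from 'gdbsx -c' output."""
    contexts = {}
    regs = None  # registers of the VCPU currently being read
    for line in text.splitlines():
        line = line.strip()
        header = re.match(r"--> VCPU:(\d+)$", line)
        if header:
            regs = contexts.setdefault(int(header.group(1)), {})
            continue
        # Skip everything before the first VCPU header and the call trace.
        if regs is None or line.startswith("Call Trace") or line.startswith("["):
            continue
        for name, value in re.findall(r"(\w+):([0-9a-fA-F]+)", line):
            regs[name] = int(value, 16)
    return contexts

# Excerpt of the dump shown above.
SAMPLE = """\
===> Context for DOMID:5
--> VCPU:0
eip:c04013a7 esp:c06effc0 flags:00001246
cs:61 ds:7b fs:0 gs:0

Call Trace:
   [c04013a7]
--> VCPU:1
eip:c04013a7 esp:c0e44f9c flags:00001246
"""

for vcpu, regs in sorted(parse_gdbsx_contexts(SAMPLE).items()):
    print(vcpu, hex(regs["eip"]), hex(regs["esp"]))
```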

Remote debugging

  • Start the gdb server on the Oracle VM server to debug the guest OVM_EL5U3_X86_PVM_4GB:

    # gdbsx -a 5 32 9999
  • Connect to the above gdb server on the debugging host:

    $ gdb
    (gdb) file /share/tmp/pkg/debug/usr/lib/debug/lib/modules/2.6.18-128.0.0.0.2.el5xen/vmlinux
    Reading symbols from /share/tmp/pkg/debug/usr/lib/debug/lib/modules/2.6.18-128.0.0.0.2.el5xen/vmlinux...done.
    
    (gdb) dir /share/tmp/pkg/debug/kernel-2.6.18/linux-2.6.18.i686/
    Source directories searched: /share/tmp/pkg/debug/kernel-2.6.18/linux-2.6.18.i686:$cdir:$cwd
    
    (gdb) dir /share/tmp/pkg/debug/usr/src/debug/kernel-2.6.18/linux-2.6.18.i686/
    Source directories searched: /share/tmp/pkg/debug/kernel-2.6.18/linux-2.6.18.i686:
    /share/tmp/pkg/debug/usr/src/debug/kernel-2.6.18/linux-2.6.18.i686:$cdir:$cwd
    
    (gdb) target remote 10.182.120.209:9999
    Remote debugging using 10.182.120.209:9999
    [New Thread 0]
    [Switching to Thread 0]
    0xc04013a7 in hypercall_page ()
  • Debugging multiple VCPUs.

    • Enable scheduler locking so that single-stepping stays on the selected VCPU:

      (gdb) set scheduler-locking on
    • gdb is not a kernel debugger, so VCPUs are presented as threads. info threads therefore lists all VCPUs:

      (gdb) info thread
      [New Thread 1]
        2 Thread 1  0xc04013a7 in hypercall_page ()
        * 1 Thread 0  0xc04013a7 in hypercall_page ()

      Switch threads to reach another VCPU. Remember that gdb assigns its own thread IDs, which are off by one: gdb thread 1 is VCPU 0 and gdb thread 2 is VCPU 1.

  • Back trace:

    (gdb) bt
    #0  0xc04013a7 in hypercall_page ()
    #1  0xc0408664 in raw_safe_halt () at include/asm/mach-xen/asm/hypercall.h:197
    #2  0xc040321a in xen_idle () at arch/i386/kernel/process-xen.c:109
    #3  0xc0403339 in cpu_idle () at arch/i386/kernel/process-xen.c:161
    #4  0xc06f49f5 in start_kernel () at init/main.c:618
    #5  0xc040006f in startup_32 ()
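
Because gdb numbers its threads from 1 while VCPUs are numbered from 0, it is easy to select the wrong one. The Python sketch below builds the VCPU-to-thread mapping from info threads output in the form shown above. It assumes that the number after "Thread" is the VCPU number, as described in this section; the function name is illustrative.

```python
import re

def vcpu_to_gdb_thread(info_threads_output):
    """Map VCPU number -> gdb thread number from 'info threads' output."""
    mapping = {}
    for line in info_threads_output.splitlines():
        # Lines look like "  2 Thread 1  0x..." or "* 1 Thread 0  0x...".
        m = re.match(r"\s*\*?\s*(\d+)\s+Thread\s+(\d+)\b", line)
        if m:
            gdb_id, vcpu = int(m.group(1)), int(m.group(2))
            mapping[vcpu] = gdb_id
    return mapping

SAMPLE = """\
[New Thread 1]
  2 Thread 1  0xc04013a7 in hypercall_page ()
* 1 Thread 0  0xc04013a7 in hypercall_page ()
"""

for vcpu, gdb_id in sorted(vcpu_to_gdb_thread(SAMPLE).items()):
    print("VCPU %d -> (gdb) thread %d" % (vcpu, gdb_id))
```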

Using gdb macros

Here are some useful gdb macros to ease Linux kernel debugging (gdbmacros):

define ps
    dont-repeat
    # sizeof(long) is 4 on 32-bit kernels and 8 on 64-bit kernels.
    set $sz = sizeof(long)
    # Head of the circular task list, embedded in init_task.
    set $tasks = &init_task.tasks
    # Byte offset of the tasks member inside struct task_struct.
    set $offset = (char *)&init_task.tasks - (char *)&init_task
    if ($sz == 4)
        printf "Pointer       PID      Command\n"
    else
        printf "Pointer               PID      Command\n"
    end
    set $task = $tasks->next
    while $task != $tasks
        # Step back from the list node to the enclosing task_struct.
        set $task_entry = (struct task_struct *)((char *)$task - $offset)
        # Skip idle tasks (PID 0).
        if ($task_entry->pid) != 0
            if ($sz == 4)
                printf "0x%-12lx%-9d%s\n", $task_entry, $task_entry->pid, $task_entry->comm
            else
                printf "0x%-20lx%-9d%s\n", $task_entry, $task_entry->pid, $task_entry->comm
            end
        end
        set $task = $task->next
    end
end

document ps
Report a snapshot of the current processes.
end

define lsmod
    dont-repeat
    # sizeof(long) is 4 on 32-bit kernels and 8 on 64-bit kernels.
    set $sz = sizeof(long)
    # First node of the circular module list headed by "modules".
    set $mod = modules.next
    if ($sz == 4)
        printf "Pointer       Address       Name\n"
    else
        printf "Pointer               Address               Name\n"
    end
    while $mod != &modules
        # In this kernel the list node follows one word-sized member of
        # struct module, so it sits sizeof(long) bytes into the struct.
        set $mod_entry = (struct module *)((char *)$mod - $sz)
        if ($sz == 4)
            printf "0x%-12lx0x%-12lx%s\n", $mod_entry, $mod_entry->module_core, $mod_entry->name
        else
            printf "0x%-20lx0x%-20lx%s\n", $mod_entry, $mod_entry->module_core, $mod_entry->name
        end
        set $mod = $mod->next
    end
end

document lsmod
Show the status of modules in the Linux kernel.
end

define log
    dont-repeat
    printf "%s", log_buf
end

document log
Dump system message buffer.
end
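
Both ps and lsmod rely on the same idea: the kernel links objects through a list node embedded in each structure, so the macro subtracts the node's offset from the node's address to get back to the enclosing structure. The Python sketch below reproduces that arithmetic with ctypes on a toy structure. Task and ListHead are simplified stand-ins invented for this illustration, not the kernel's real layouts.

```python
import ctypes

class ListHead(ctypes.Structure):
    pass

# A list node points at other list nodes, never at the enclosing structure.
ListHead._fields_ = [("next", ctypes.POINTER(ListHead)),
                     ("prev", ctypes.POINTER(ListHead))]

class Task(ctypes.Structure):
    # Toy stand-in for struct task_struct.
    _fields_ = [("pid", ctypes.c_int),
                ("tasks", ListHead),
                ("comm", ctypes.c_char * 16)]

def link_circular(tasks):
    """Link the embedded nodes into a circular doubly linked list."""
    n = len(tasks)
    for i, task in enumerate(tasks):
        task.tasks.next = ctypes.pointer(tasks[(i + 1) % n].tasks)
        task.tasks.prev = ctypes.pointer(tasks[(i - 1) % n].tasks)

def walk(head_task):
    """Yield (pid, comm) for every task after head_task, stopping back at the head."""
    offset = Task.tasks.offset                      # same role as $offset in ps
    head = ctypes.addressof(head_task.tasks)
    node = ctypes.addressof(head_task.tasks.next.contents)
    while node != head:
        task = Task.from_address(node - offset)     # node address -> enclosing Task
        yield task.pid, task.comm.decode()
        node = ctypes.addressof(ListHead.from_address(node).next.contents)

tasks = [Task(pid=0, comm=b"swapper"),
         Task(pid=1, comm=b"init"),
         Task(pid=2, comm=b"migration/0")]
link_circular(tasks)
print(list(walk(tasks[0])))
```

The walk starts at the node after the head and stops when it returns to the head, which is the same termination test the ps macro uses.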

To use these macros, source the file in the debugging session:

(gdb) source gdbmacros
(gdb) help ps
Report a snapshot of the current processes.
(gdb) ps
Pointer       PID      Command
0xc0d5daa0    1        init
0xc0d5d000    2        migration/0
...
(gdb) lsmod
Pointer       Address       Name
0xe1234c00    0xe1231000    xenblk
0xe130b200    0xe12fc000    dm_raid45
0xe120d900    0xe120d000    dm_message
...
(gdb) log
<5>Linux version 2.6.18-128.0.0.0.2.el5xen
(mockbuild@ca-build10.us.oracle.com) (gcc version 4.1.2 20080704 (Red Hat
4.1.2-44)) #1 SMP Wed Jan 21 05:49:36 EST 2009
<6>BIOS-provided physical RAM map:
<4> Xen: 0000000000000000 - 0000000020800000 (usable)
<5>0MB HIGHMEM available.
<5>520MB LOWMEM available.
<4>Using x86 segment limits to approximate NX protection
...

Using gdb init file

You can put all of the above gdb commands in an init file, saved as ~/.gdbinit or as .gdbinit in the current working directory:

file /share/tmp/pkg/debug/usr/lib/debug/lib/modules/2.6.18-128.0.0.0.2.el5xen/vmlinux
dir /share/tmp/pkg/debug/kernel-2.6.18/linux-2.6.18.i686/
dir /share/tmp/pkg/debug/usr/src/debug/kernel-2.6.18/linux-2.6.18.i686/
source /share/data/docs/ovm/xen-debugging/gdb/gdbmacros
target remote 10.182.120.209:9999
set scheduler-locking on

Then invoke gdb with:

$ gdb

Debugging crash dumps

  • Get the guest core dump:

    # xm dump-core OVM_EL5U3_X86_PVM_4GB core-OVM_EL5U3_X86_PVM_4GB
  • Invoke crash to debug the core:

    $ crash vmlinux core-OVM_EL5U3_X86_PVM_4GB
    KERNEL: vmlinux
        DUMPFILE: core-OVM_EL5U3_X86_PVM_4GB
            CPUS: 2
            DATE: Tue Oct 13 08:15:15 2009
          UPTIME: 00:08:52
    LOAD AVERAGE: 0.00, 0.04, 0.03
           TASKS: 41
        NODENAME: localhost.localdomain
         RELEASE: 2.6.18-128.0.0.0.2.el5xen
         VERSION: #1 SMP Wed Jan 21 05:49:36 EST 2009
         MACHINE: i686  (2127 Mhz)
          MEMORY: 520 MB
           PANIC: ""
             PID: 0
         COMMAND: "swapper"
            TASK: c06762c0  (1 of 2)  [THREAD_INFO: c06ef000]
             CPU: 0
           STATE: TASK_RUNNING (ACTIVE)
         WARNING: panic task not found
    
    crash> help
    
    *              files          mod            runq           union
    alias          foreach        mount          search         vm
    ascii          fuser          net            set            vtop
    bt             gdb            p              sig            waitq
    btop           help           ps             struct         whatis
    dev            irq            pte            swap           wr
    dis            kmem           ptob           sym            q
    eval           list           ptov           sys
    exit           log            rd             task
    extend         mach           repeat         timer
    
    crash> ps
       PID    PPID  CPU   TASK    ST  %MEM     VSZ    RSS  COMM
    >     0      0   0  c06762c0  RU   0.0       0      0  [swapper]
    >     0      1   1  c0d5d550  RU   0.0       0      0  [swapper]
          1      0   0  c0d5daa0  IN   0.1    2080    604  init
          2      1   0  c0d5d000  IN   0.0       0      0  [migration/0]
          3      1   0  c0d54aa0  IN   0.0       0      0  [ksoftirqd/0]
    ...
    
    crash> mod
     MODULE   NAME              SIZE  OBJECT FILE
    e1206b80  scsi_dh          11713  (not loaded)  [CONFIG_KALLSYMS]
    e120a500  dm_mem_cache     10049  (not loaded)  [CONFIG_KALLSYMS]
    e120d900  dm_message        6977  (not loaded)  [CONFIG_KALLSYMS]
    ...

Xen Trace

  • Get trace data:

    # xentrace -D -S 256 -T 1 trace.raw
  • Analyze trace data using xentrace_format:

    # cat trace.raw | xentrace_format formats >trace.txt
  • Analyze trace data using xenalyze:

    # xenalyze --cpu-hz=2.0G -s trace.raw >trace.summary
    ...
    ]  1.000744383 -x  vmexit exit_reason EXCEPTION_NMI eip fffff80001029795
    ]  1.000744383 -x mmio_assist w gpa fee000b0 data 0
    ]  1.000745561 x-  vmexit exit_reason EXCEPTION_NMI eip fffff8000102ff73
    ]  1.000745561 x- mmio_assist w gpa fee000b0 data 0 vlapic eoi
    ]  1.000745561 x- fast mmio va fffffffffffe00b0
    ]  1.000748415 -x  vmentry
    ]  1.000748692 x-  vmentry
    ]  1.000749948 -x  vmexit exit_reason HLT eip fffffadffab2fb41
    ]  1.000749948 -x  hlt [ 0 0 a0028006 10dced2d ]
    ]  1.000750155 x-  vmexit exit_reason HLT eip fffffadffab2fb41
    ]  1.000750155 x-  hlt [ 0 0 a0028006 10dcef1c ]
  • Use xenctx to show domain info:

    # xenctx --symbol-table=/share/tmp/pkg/debug/boot/System.map-2.6.18-128.0.0.0.2.el5xen --all 1
    cs:eip: 0061:c04013a7 hypercall_page+0x3a7
    flags: 00001246 i z p
    ss:esp: 0069:c06effc0
    eax: 00000000   ebx: 00000001   ecx: 00000000   edx: 00000000
    esi: 00000001   edi: 00000000   ebp: c062780f
     ds:     007b    es:     007b    fs:     0000    gs:     0000
    
    cr0: 8005003b
    cr2: b7f7e000
    cr3: 3c6fd000
    cr4: 00002660
    
    dr0: 00000000
    dr1: 00000000
    dr2: 00000000
    dr3: 00000000
    dr6: ffff0ff0
    dr7: 00000400
    Code (instr addr c04013a7)
    cc cc cc cc cc cc cc cc cc cc cc cc cc cc b8 1d 00 00 00 cd 82 <c3> cc cc cc cc cc cc cc cc cc cc
    
    
    Stack:
     c0408664 c141c6c8 ffffffff c040321a c0403339 c141c6c8 c071fc64 c0627831
     c06f49f5 00004b64 c0765800 01020800 c0dcb000 00000000 00000000 c040006f
    
    Call Trace:
      [<c04013a7>] hypercall_page+0x3a7  <--
      [<c0408664>] raw_safe_halt+0x8c
      [<c040321a>] xen_idle+0x22
      [<c0403339>] cpu_idle+0x91
      [<c06f49f5>] start_kernel+0x37a
  • Use xen-hvmctx to show HVM guest info:

    # xen-hvmctx 2
    HVM save record for domain 2
    Entry 0: type 1 instance 0, length 24
         Header: magic 0x54381286, version 1
                 Xen changeset ffffffffffffffff
                 CPUID[0][%eax] 0x000006f2
    Entry 1: type 2 instance 0, length 1016
        CPU:    rax 0x0000000000000000     rbx 0x0000000000000000
                rcx 0x00000000c0403bb0     rdx 0x00000000c06f0000
                rbp 0x00000000c0627743     rsi 0x00000000c072dee4
                rdi 0x00000000c0627765     rsp 0x00000000c06f0fd8
                 r8 0x0000000000000000      r9 0x0000000000000000
                r10 0x0000000000000000     r11 0x0000000000000000
                r12 0x0000000000000000     r13 0x0000000000000000
                r14 0x0000000000000000     r15 0x0000000000000000
                rip 0x00000000c0403be1  rflags 0x0000000000000246
                cr0 0x000000008005003b     cr2 0x0000000000dd09e0
                cr3 0x000000001e93d000     cr4 0x00000000000006d0
                dr0 0x0000000000000000     dr1 0x0000000000000000
                dr2 0x0000000000000000     dr3 0x0000000000000000
                dr6 0x00000000ffff0ff0     dr7 0x0000000000000400
                 cs 0x00000060 (0x0000000000000000 + 0xffffffff / 0x00c9b)
                 ds 0x0000007b (0x0000000000000000 + 0xffffffff / 0x00cf3)
                 es 0x0000007b (0x0000000000000000 + 0xffffffff / 0x00cf3)
                 fs 0x00000000 (0x0000000000000000 + 0xffffffff / 0x00c00)
                 gs 0x00000000 (0x0000000000000000 + 0xffffffff / 0x00c00)
                 ss 0x00000068 (0x0000000000000000 + 0xffffffff / 0x00c93)
                 tr 0x00000080 (0x00000000c1401c80 + 0x00002073 / 0x0008b)
               ldtr 0x00000088 (0x00000000c0732020 + 0x00000027 / 0x00082)
               itdr            (0x00000000c06e2000 + 0x000007ff)
               gdtr            (0x00000000c1410000 + 0x000000ff)
    ...
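
The xenalyze lines shown earlier in this section carry a timestamp in seconds with nanosecond resolution, so the time a guest spends outside guest mode can be read off by pairing each vmexit with the next vmentry. The Python sketch below does that for lines in the form shown above. Treating the "-x" and "x-" column as a per-CPU marker is an assumption made for this example, and the function name is illustrative.

```python
import re

# "]  <sec>.<9 digits>  <marker>  vmexit|vmentry ..."
LINE = re.compile(r"^\]\s+(\d+)\.(\d{9})\s+(\S+)\s+(vmexit|vmentry)\b")

def exit_to_entry_ns(lines):
    """Return [(marker, nanoseconds)] for each vmexit followed by a vmentry."""
    pending = {}  # marker -> timestamp (ns) of the last unmatched vmexit
    spans = []
    for line in lines:
        m = LINE.match(line.strip())
        if not m:
            continue  # mmio_assist, hlt and other record types
        ts = int(m.group(1)) * 10**9 + int(m.group(2))  # integer ns avoids float rounding
        marker, event = m.group(3), m.group(4)
        if event == "vmexit":
            pending[marker] = ts
        elif marker in pending:
            spans.append((marker, ts - pending.pop(marker)))
    return spans

SAMPLE = """\
]  1.000744383 -x  vmexit exit_reason EXCEPTION_NMI eip fffff80001029795
]  1.000744383 -x mmio_assist w gpa fee000b0 data 0
]  1.000745561 x-  vmexit exit_reason EXCEPTION_NMI eip fffff8000102ff73
]  1.000748415 -x  vmentry
]  1.000748692 x-  vmentry
]  1.000749948 -x  vmexit exit_reason HLT eip fffffadffab2fb41
""".splitlines()

for marker, ns in exit_to_entry_ns(SAMPLE):
    print(marker, ns, "ns")
```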

Sending debug keys to Xen with xm debug-key

  • To show installed key handlers:

    # xm debug-key h
    # xm dmesg
    ...
    (XEN) 'h' pressed -> showing installed handlers
    (XEN)  key '%' (ascii '25') => Trap to xendbg
    (XEN)  key '0' (ascii '30') => dump Dom0 registers
    (XEN)  key 'C' (ascii '43') => trigger a crashdump
    (XEN)  key 'H' (ascii '48') => dump heap info
    (XEN)  key 'N' (ascii '4e') => NMI statistics
    (XEN)  key 'Q' (ascii '51') => dump PCI devices
    (XEN)  key 'R' (ascii '52') => reboot machine
    (XEN)  key 'a' (ascii '61') => dump timer queues
    (XEN)  key 'c' (ascii '63') => dump cx structures
    (XEN)  key 'd' (ascii '64') => dump registers
    (XEN)  key 'e' (ascii '65') => dump evtchn info
    (XEN)  key 'h' (ascii '68') => show this message
    (XEN)  key 'i' (ascii '69') => dump interrupt bindings
    (XEN)  key 'm' (ascii '6d') => memory info
    (XEN)  key 'n' (ascii '6e') => trigger an NMI
    (XEN)  key 'q' (ascii '71') => dump domain (and guest debug) info
    (XEN)  key 'r' (ascii '72') => dump run queues
    (XEN)  key 't' (ascii '74') => display multi-cpu clock info
    (XEN)  key 'u' (ascii '75') => dump numa info
    (XEN)  key 'v' (ascii '76') => dump Intel's VMCS
    (XEN)  key 'z' (ascii '7a') => print ioapic info
  • To dump domain (and guest debug) info:

    # xm debug-key q
    # xm dmesg
    ...
    (XEN) 'q' pressed -> dumping domain info (now=0x40B9:7BDAEE7D)
    (XEN) General information for domain 0:
    (XEN)     refcnt=3 nr_pages=139008 xenheap_pages=5 dirty_cpus={1}
    (XEN)     handle=00000000-0000-0000-0000-000000000000 vm_assist=0000000f
    (XEN) Rangesets belonging to domain 0:
    (XEN)     Interrupts { 0-255 }
    (XEN)     I/O Memory { 0-febff, fec01-fedff, fee01-ffffffffffffffff }
    (XEN)     I/O Ports  { 0-1f, 22-3f, 44-60, 62-9f, a2-ffff }
    (XEN) Memory pages belonging to domain 0:
    (XEN)     DomPage list too long to display
    (XEN)     XenPage 000000007f3f7000: mfn=000000000007f3f7, caf=80000002, taf=00000000e8000002
    (XEN)     XenPage 000000007f3f6000: mfn=000000000007f3f6, caf=80000001, taf=00000000e8000001
    (XEN)     XenPage 000000007f3f5000: mfn=000000000007f3f5, caf=80000001, taf=00000000e8000001
    (XEN)     XenPage 000000007f3f4000: mfn=000000000007f3f4, caf=80000001, taf=00000000e8000001
    (XEN)     XenPage 000000007f3f1000: mfn=000000000007f3f1, caf=80000002, taf=00000000e8000002
    (XEN) VCPU information and callbacks for domain 0:
    (XEN)     VCPU0: CPU1 [has=T] flags=0 upcall_pend = 00, upcall_mask = 00 dirty_cpus={1} cpu_affinity={0-31}
    (XEN)     250 Hz periodic timer (period 4 ms)
    (XEN)     Notifying guest (virq 1, port 0, stat 0/-1/0)
    (XEN)     VCPU1: CPU0 [has=F] flags=1 upcall_pend = 00, upcall_mask = 00 dirty_cpus={} cpu_affinity={0-31}
    (XEN)     250 Hz periodic timer (period 4 ms)
    (XEN)     Notifying guest (virq 1, port 0, stat 0/-1/0)
    (XEN) General information for domain 1:
    (XEN)     refcnt=17652 nr_pages=0 xenheap_pages=0 dirty_cpus={}
    (XEN)     handle=da6cc464-0ecd-7bea-3704-de76da53a771 vm_assist=0000000f
    (XEN) Rangesets belonging to domain 1:
    (XEN)     Interrupts { }
    (XEN)     I/O Memory { }
    (XEN)     I/O Ports  { }

Xen kexec and kdump

Note

I could not get kdump to work under Oracle VM 2.2. It does work under Oracle VM 2.1.5, but the resulting vmcore cannot be analysed by crash.

Xen kdump works on EL5U3, and the resulting vmcore can be analysed by crash.

  • Enable crash dump of Xen hypervisor and dom0 kernel:

    title Oracle VM Server-ovs serial console (xen-64-3.4.0 2.6.18-128.2.1.4.3.el5ovs) debug
        root (hd0,4)
        kernel /boot/xen-64bit-debug.gz console=com1,vga com1=57600,8n1 dom0_mem=543M crashkernel=128M@16M
        module /boot/vmlinuz-2.6.18-128.2.1.4.3.el5xen ro root=LABEL=/ console=tty0 console=ttyS0,57600n8
        module /boot/initrd-2.6.18-128.2.1.4.3.el5xen.img
  • Start kdump service and mark it as autostart at system boot:

    # service kdump start
    # chkconfig --level 2345 kdump on
  • Reboot and trigger a dom0 panic by pressing the key combination ALT-SysRq-c or by running the following command as root:

    # echo "c" >/proc/sysrq-trigger
  • The vmcore will be available at /var/crash/.

  • Debug the vmcore using crash:

    $ crash boot/xen-syms-2.6.18-128.el5 usr/lib/debug/boot/xen-syms-2.6.18-128.el5.debug core-EL5U3
       KERNEL: boot/xen-syms-2.6.18-128.el5
    DEBUGINFO: usr/lib/debug/boot/xen-syms-2.6.18-128.el5.debug
     DUMPFILE: core-EL5U3
         CPUS: 2
      DOMAINS: 4
       UPTIME: 00:02:44
      MACHINE: Intel(R) Core(TM)2 CPU          6400  @ 2.13GHz  (2128 Mhz)
       MEMORY: 2 GB
      PCPU-ID: 0
         PCPU: ff1d3fb4
      VCPU-ID: 0
         VCPU: ffbe7080  (VCPU_RUNNING)
    DOMAIN-ID: 0
       DOMAIN: ffbf0080  (DOMAIN_RUNNING)
        STATE: CRASH
    
    crash> help
    
    *              dumpinfo       log            search         vcpus
    alias          eval           p              set            whatis
    ascii          exit           pcpus          struct         wr
    bt             extend         pte            sym            q
    dis            gdb            rd             sys
    domain         help           repeat         union
    doms           list           sched          vcpu
    
    crash version: 4.1.0    gdb version: 6.1
    For help on any command above, enter "help <command>".
    For help on input options, enter "help input".
    For help on output options, enter "help output".
    
    crash> list
    list: starting address required
    Usage: list [[-o] offset] [-e end] [-s struct[.member[,member]]] [-H] start
    Enter "help list" for details.
    crash> doms
       DID   DOMAIN  ST T  MAXPAGE  TOTPAGE VCPU SHARED_I  P2M_MFN
      32753 ff21c080 RU O     0        0      0      0      ----
      32754 ff1c8080 RU X     0        0      0      0      ----
      32767 ff214080 RU I     0        0      2      0      ----
    >*    0 ffbf0080 RU 0 ffffffff   67100    2  ffbec000   3e308
    crash> pcpus
       PCID   PCPU   CUR-VCPU    TSS
     *    0 ff1d3fb4 ffbe7080 ff1fa180
          1 ff21bfb4 ffbe6080 ff1fa200
    crash> vcpus
       VCID  PCID   VCPU   ST T DOMID  DOMAIN
          0     0 ffbfd080 RU I 32767 ff214080
          1     1 ff1cc080 RU I 32767 ff214080
    >*    0     0 ffbe7080 RU 0     0 ffbf0080
    >     1     1 ffbe6080 RU 0     0 ffbf0080
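
The crashkernel=128M@16M option in the GRUB entry at the start of this section uses the size@offset form: it reserves a region of the given size at the given physical offset for the capture kernel. The Python sketch below turns such a value into byte counts, which helps when checking the reservation against the memory map. It handles only the plain size@offset form with optional K, M, or G suffixes, and the function names are illustrative.

```python
import re

UNITS = {"": 1, "K": 1 << 10, "M": 1 << 20, "G": 1 << 30}

def parse_crashkernel(value):
    """Parse 'size@offset' (for example '128M@16M') into (size_bytes, offset_bytes)."""
    m = re.fullmatch(r"(\d+)([KMG]?)@(\d+)([KMG]?)", value)
    if not m:
        raise ValueError("unsupported crashkernel value: %r" % value)
    size = int(m.group(1)) * UNITS[m.group(2)]
    offset = int(m.group(3)) * UNITS[m.group(4)]
    return size, offset

def find_crashkernel(kernel_line):
    """Return (size_bytes, offset_bytes) from a GRUB kernel line, or None if absent."""
    m = re.search(r"\bcrashkernel=(\S+)", kernel_line)
    return parse_crashkernel(m.group(1)) if m else None

KERNEL = ("kernel /boot/xen-64bit-debug.gz console=com1,vga com1=57600,8n1 "
          "dom0_mem=543M crashkernel=128M@16M")

size, offset = find_crashkernel(KERNEL)
print("reserve %d bytes starting at physical offset %d" % (size, offset))
```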

Xend Debugging

  • Xend is written in Python and can be debugged by pdb.
  • All user-space debugging techniques apply.
  • Refer to Xend debugging.

Xend tracing/debugging/profiling

  • Export the necessary environment variables:

    # export XEND_DEBUG=1
    # export XEND_DAEMONIZE=0
  • Start xend tracing:

    # /usr/sbin/xend trace_start

    The trace file will be available at /var/log/xen/xend.trace.

  • Debug xend using the Python Debugger:

    # python -m pdb /usr/sbin/xend trace_start

    You'll drop to a debugging shell. You can set breakpoints and continue.

  • Profile xend using the Python Profiler:

    # python -m profile /usr/sbin/xend start >xend.profile

    To generate the profile result:

    # kill -INT <pid of xend>
    # kill -USR1 <pid of xend>

    The profiling log is available at xend.profile.

Troubleshooting

Compile Xen with debug=y

When Xen is compiled with debug=y, it mainly turns on a large number of ASSERTs and similar checks. The debug version in OVM is built with gdbsx=y kdb=y but not debug=y. More work is needed to make debug=y work with kdb, since many data structures differ. If you really want debug=y, compile with gdbsx=y debug=y and without kdb.

Capture Xen hypervisor log to filesystem

Start xenconsoled with the option --log=hv. The log is written to <log_dir>/hypervisor.log, where <log_dir> defaults to /var/log/xen/console and can be overridden with the command-line option --log-dir.

You can use this shell script (xen-console-trace.sh):

#!/bin/sh

/etc/init.d/xend stop
killall -9 xenconsoled
mkdir -p /var/log/xen/console
export XENCONSOLED_TRACE=hv
/etc/init.d/xend start

Reference

XenDebugging (last edited 2010-02-26 03:31:58 by ZhigangWang)