*****************************************************************************
*      Kernel debugging, part 1:  "Understanding what's gone wrong"         *
*                                                                           *
* Copyright (c)  2001  Daniel P. Bovet, Marco Cesati, and Cosimo Comella    *
*    Permission is granted to copy, distribute and/or modify this document  *
*    under the terms of the GNU Free Documentation License, Version 1.1,    *
*    published by the Free Software Foundation; with no Invariant Sections, *
*    with no Front-Cover Texts, and with no Back-Cover Texts. A copy of the *
*    license is included in the file named LICENSE.                         *
*                                                                           *
* (version 1.0)                                                             *
*****************************************************************************

Every complex program has bugs, and the Linux kernel is not an
exception. However, debugging the kernel is an hard task, because:

    * the kernel is BIG (millions of lines)
    * the kernel is very complex (multithreaded, 
      hardware-related, ...) 
    * there is no higher program that monitors it

You are usually made aware of a kernel bug when: 
    * your User Mode program crashes, freezes, or behaves in 
      unexpected ways 
    * you discover a kernel oops on the console or in the log files 
    * your system does not reply to ping, your console no longer 
      responds ...
    * your cabinet smokes :-)

Fixing a kernel bug involves three different steps:
    1. finding where the bug showed itself 
    2. understanding what is wrong with the existing code 
    3. fixing the code

As you may well suppose, steps 1 and 3 are relatively easy, while
step 2 is hairy. You need to understand how the kernel is supposed to
work, and how it actually works. If you finds a difference, then you
have likely found a bug.

Now, I have to admit that the title of this lecture is a lie: I
cannot give you any help to cope with step 2. You have to study the
kernel by yourself,  and trying to make a sense of it. 

In this lecture we'll focus on step 1, namely, locating the exact
instruction in the kernel source code where a particular bug showed
up. Be warned that the bug is not necessarily due to a wrong
instruction in the exact place where it showed up: the bug might be a
design flaw, a stale pointer passed by another function,  and so on.

We shall proceed by examples.

Example 1. The "mystery1" program hangs up. 

If "mystery1" is in foreground, it cannot be stopped or killed. But we can use
the "ps" command to collect some information about it:

$ ps u -C mystery1
USER       PID %CPU %MEM   VSZ  RSS TTY      STAT START   TIME COMMAND
cesati     208  0.0  0.1  1200  312 pts/1    D    18:35   0:00 ./mystery1  
  
The "D" flag gives us two bits of information:

1. the process is in Kernel Mode
2. the process is in TASK_UNINTERRUPTIBLE state

In Linux, every process alternates between User Mode and Kernel Mode. In User
Mode the process executes application-specific code, while in Kernel Mode it
executes kernel code. (In Intel processors, User and Kernel Modes are different
hardware states encoded in the last two bits of the cs and ss segmentation
registers.)

While in Kernel Mode, the process can be in one of the following states:
    
    TASK_RUNNING            running or waiting for the CPU
    TASK_INTERRUPTIBLE      waiting for some event, interruptible by signals
    TASK_UNINTERRUPTIBLE    waiting for some event, not interruptible
    TASK_STOPPED            stopped by a signal or terminal keypress (Ctrl-Z)
    TASK_ZOMBIE             execution terminated, waiting to be destroyed
    
Thus, our "mystery1" program is stuck in Kernel Mode waiting for some event.
Understanding what precisely the process is waiting for might clarify a lot.
The "ps" command is still useful here:

$ ps n -o pid,tt,user,fname,wchan -C mystery1
  PID TT           USER COMMAND   WCHAN
  208 pts/1        1000 mystery1 161dad

The WCHAN field is the information we need: it denotes the "wait
channel" of the process, namely, an offset of a function where the
process is blocked. The "n" flag of "ps" forces numerical output, thus
161dad must be interpreted as hex number 0x161dad.

How can we make sense out of this number? First of all, recall that the kernel
lives in the fourth gigabyte of linear address space. The address returned by
WCHAN should therefore be considered as 0xc00161dad. 

Now, we must know what instruction lies in RAM at that address. Association
between kernel symbols and RAM addresses is mostly done in the linking phase
of kernel compilation. Beside the kernel image file to be loaded in RAM, the
compilation yields a file named System.map, which lists the linear addresses of
all kernel symbols. When installing a new kernel, you should always save such
file in a safe place.

Even better, you should save this file in a place where the "ps" command may
find it. IMHO, the best place is the /boot directory. Also, "ps" looks first for
a file named "System.map-X.Y.Z", where "X.Y.Z" is the release of the kernel
as returned by "uname -r" (first four lines of the top Makefile).

Without the "n" flag, "ps" will search the System.map file and translate the
address by yourself:

$ ps -o pid,tt,user,fname,wchan -C mystery1
  PID TT       USER     COMMAND  WCHAN  
  208 pts/1    cesati   mystery1 semctl_down 

WARNING: be sure that "ps" looks at the right System.map file before using the
symbolic name! In our case, "ps" looks at the wrong file, therefore it gives
a wrong result. Using the correct System.map file yields:

$ ps -o pid,tt,user,fname,wchan -C mystery1
  PID TT       USER     COMMAND  WCHAN  
  208 pts/1    cesati   mystery1 noninterruptible_delay
  
Now, we should look up noninterruptible_delay() in the kernel source
code. Either you already know the source file where the function is defined,
or you have to search. I'm used to type a command like this in the top source
directory:

$ find . -name \*.[chS] -exec grep -l interruptible_delay {} \;
./lkhc/hack1.c
./lkhc/hack-static.c

In the lkhc/hack1.c file, interruptible_delay() is invoked:

====== excerpt from lkhc/hack1.c ======

static int hack1_read(char *page, char **start, off_t off, int count, int *eof,
                      void *data)
{
        int written;
 
        written = sprintf(page+off, "jiffies=%ld ", jiffies);
        noninterruptible_delay(1); /* sleep one second */
        written += sprintf(page+off, "%ld\n", jiffies);
        return written;
}

=======================================


In the lkhc/hack-static.c file, interruptible_delay() is defined as:

====== excerpt from lkhc/hack-static.c ======

void noninterruptible_delay(int seconds)
{
        DECLARE_WAIT_QUEUE_HEAD(wait);
 
        sleep_on_timeout(&wait, seconds*HZ*1000);
}

=============================================

We eventually found where our process is blocked: the sleep_on_timeout()
function, which puts the current process in TASK_UNINTERRUPTIBLE state 
until either it is awakened by another process, or a timeout expires.

Now step 2 can start: how is the kernel behaving? From its definition, it is
quite clear that the parameter "seconds" represents the timeout in seconds.
However, there is a mistake in the invocation of sleep_on_timeout(), because
this function expects a time interval in ticks.

Step 3 is also trivial: just replace the wrong line with:

    sleep_on_timeout(&wait, seconds*HZ);

    
Let's try a slightly different example.


Example 2. The "mystery2" program hangs up. 

When applying our "ps" command, we get a weird result:

$ ps -o pid,tt,user,fname,wchan -C mystery2
  PID TT       USER     COMMAND  WCHAN
10687 pts/3    cesati   mystery2 end

The symbol "end" is too vague to be a real symbol name, and actually it
corresponds to the linker symbol denoting the end of the kernel code.
Something's going wrong. Let's try "ps" with numerical output:

$ ps n -o pid,tt,user,fname,wchan -C mystery2
  PID TT           USER COMMAND   WCHAN
10687 pts/3        1000 mystery2  5509d

Now, if we try to interpret 5509d as the linear address 0xc005509d, we fail:
looking at the Symbol.map file, we don't find it.

As you have already guessed, the wait channel falls inside a module. It is easy
to understand that the System.map file cannot hold the linear addresses of
symbols defined inside modules, because any module is relocated when it is
loaded at run-time. Thus, we need another source of information:   /proc/ksyms
virtual file to the rescue!

The /proc/ksyms file has not been designed for debugging. It is mainly used
when loading a module, since it includes the linear addresses of all kernel
symbols (both from static code and loaded modules) that are exported. For
example, consider our hack1.c module, that refers to the jiffies global
variable: the /proc/ksyms file includes the linear address of jiffies:

$ grep jiffies /proc/ksyms
c02559e4 jiffies

The insmod program looks into /proc/ksyms, fetches the address, and replace any
occurrence of jiffies in hack1.o when preparing the relocated executable code
of the module.

This discussion suggests that /proc/ksyms does not include the address of every
function of every module. In fact:

$ cat /proc/ksyms
cd055000 __insmod_hack2_O/home/cesati/LKHC/linux-2.4.16-lkhc1/lkhc/hack2.o_M3C1659AA_V132112    [hack2]
cd055060 __insmod_hack2_S.text_L208     [hack2]
cd055130 __insmod_hack2_S.rodata_L24    [hack2]
cd0551f4 __insmod_hack2_S.bss_L4        [hack2]
cd053000 __insmod_hack1_O/home/cesati/LKHC/linux-2.4.16-lkhc1/lkhc/hack1.o_M3C164209_V132112    [hack1]
cd053060 __insmod_hack1_S.text_L144     [hack1]
cd0530f0 __insmod_hack1_S.rodata_L24    [hack1]
cd0531b4 __insmod_hack1_S.bss_L4        [hack1]
cd046c88 pcmcia_lookup_bus      [pcmcia_core]
cd0465e4 pcmcia_access_configuration_register   [pcmcia_core]
cd044d8c pcmcia_adjust_resource_info    [pcmcia_core]
cd0466a0 pcmcia_bind_device     [pcmcia_core]
cd04676c pcmcia_bind_mtd        [pcmcia_core]
cd045830 pcmcia_check_erase_queue       [pcmcia_core]
cd0458f8 pcmcia_close_memory    [pcmcia_core]
(snipped)

The hack1 and hack2 modules don't export any symbol. Fortunately, the insmod
program produces a few symbols that mark the beginning and the end of the
kernel address space allocated to any loaded module. In our example, the code
of module hack1 starts at linear address 0xcd053060 (text section), and the
code of module hack2 starts at 0xcd055060. These symbols allow us to make sense
of the number 0x5509d returned by "ps": it looks like an offset inside the code
of hack2.

(Short digression: you might wonder why statically compiled symbols start at
0xc0000000, while module symbols start at 0xcd000000. The kernel uses the
initial portion of the fourth gigabyte of linear address space to linearly map
all physical RAM in the system; the statically compiled kernel "lives" here. The
rest of the fourth gigabyte is used by the kernel to establish temporary
mappings; modules are loaded using these temporary mappings.)

Now, we would like to establish the exact function inside the hack2 module
that corresponds to 0xcd05509d. The module starts at 0xcd055060, thus
the instruction is at offset 0xcd05509d-0xcd055060=0x3d=61. In order to get the
Assembly code of the module, let's use the objdump command:

$ objdump -d hack2.o

hack2.o:     file format elf32-i386

Disassembly of section .text:

0000000000000000 <_noninterruptible_delay>:
   0:   83 ec 18                sub    $0x18,%esp
   3:   53                      push   %ebx
   4:   8b 4c 24 20             mov    0x20(%esp,1),%ecx
   8:   c7 44 24 04 00 00 00    movl   $0x0,0x4(%esp,1)
   f:   00
  10:   8b 44 24 04             mov    0x4(%esp,1),%eax
  14:   8d 54 24 14             lea    0x14(%esp,1),%edx
  18:   89 54 24 08             mov    %edx,0x8(%esp,1)
  1c:   89 54 24 0c             mov    %edx,0xc(%esp,1)
  20:   89 54 24 14             mov    %edx,0x14(%esp,1)
  24:   89 54 24 18             mov    %edx,0x18(%esp,1)
  28:   69 d1 a0 86 01 00       imul   $0x186a0,%ecx,%edx
  2e:   8d 5c 24 10             lea    0x10(%esp,1),%ebx
  32:   89 44 24 10             mov    %eax,0x10(%esp,1)
  36:   89 d8                   mov    %ebx,%eax
  38:   e8 fc ff ff ff          call   39 <_noninterruptible_delay+0x39>
  3d:   5b                      pop    %ebx
  3e:   83 c4 18                add    $0x18,%esp
  41:   c3                      ret
  42:   89 f6                   mov    %esi,%esi

0000000000000044 <init_module>:
  44:   6a 00                   push   $0x0
  46:   68 a4 01 00 00          push   $0x1a4
  4b:   68 00 00 00 00          push   $0x0
  50:   e8 fc ff ff ff          call   51 <init_module+0xd>
  55:   a3 00 00 00 00          mov    %eax,0x0
  5a:   83 c4 0c                add    $0xc,%esp
(snipped)

The instruction precedings those starting at offset 0x3d is where our process is
blocked. As you see, the instruction is inside the _uninterruptible_timeout
function, and it is a call to an external function whose symbol is yet to be
resolved by insmod.

If things are not clear, you can also compare the output of objdump with the
Assembly code generated by the compiler. To generate the Assembly file, go
into the top source directory and give the command:

# make lkhc/hack2.s

Assembly files have suffix ".s". In our case it contains:

====== excerpt from lkhc/hack2.s ======
        .file   "hack2.c"
        .version        "01.01"
gcc2_compiled.:
.text
        .align 4
        .type    _noninterruptible_delay,@function
_noninterruptible_delay:
        subl $24,%esp
        pushl %ebx
        movl 32(%esp),%ecx
        movl $0,4(%esp)
        movl 4(%esp),%eax
        leal 20(%esp),%edx
        movl %edx,8(%esp)
        movl %edx,12(%esp)
        movl %edx,20(%esp)
        movl %edx,24(%esp)
        imull $100000,%ecx,%edx
        leal 16(%esp),%ebx
        movl %eax,16(%esp)
        movl %ebx,%eax
        call sleep_on_timeout
        popl %ebx
        addl $24,%esp
        ret
.Lfe1:
        .size    _noninterruptible_delay,.Lfe1-_noninterruptible_delay
.section        .rodata
.LC11:
(snipped)

Thus, the problem is again the invocation of the sleep_on_timeout() function.
We look in the source code of the module, and we discover the same problem as
in the previous example:

====== excerpt from lkhc/hack2.c ======
static void _noninterruptible_delay(int seconds)
{
	DECLARE_WAIT_QUEUE_HEAD(wait);
	
	sleep_on_timeout(&wait, seconds*HZ*1000);
}

(A hint that may save you some trouble: remember to remove your Assembly file
from the kernel source code tree before recompiling, otherwise the make program
will use it instead of the corresponding C file.)


Example 3. The "mystery3" program fails.

The "mystery3" program fails with a "Segmentation fault" message. This message
is usually due to a system call that returned the -EFAULT error code, but it
could also be raised by a uncaught SIGSEGV signal sent to the process.

If we have the source code of the failing program, we could easily understand
what happened. Things are a bit harder if the source code of the program is not
available. 

Let's suppose we don't have the source code of mystery3. In order to understand
what happens, we must trace the system calls invoked by the program by means of
the "strace" command:

$ strace mystery3
execve("./mystery3", ["./mystery3"], [/* 36 vars */]) = 0
brk(0)                                  = 0x80496c8
old_mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x40016000
open("/etc/ld.so.preload", O_RDONLY)    = -1 ENOENT (No such file or directory)
open("/opt/kde/lib/i686/mmx/libc.so.6", O_RDONLY) = -1 ENOENT (No such file or directory)
stat64(0xbfffedbc, 0xbfffee14)          = -1 ENOENT (No such file or directory)
open("/opt/kde/lib/i686/libc.so.6", O_RDONLY) = -1 ENOENT (No such file or directory)
stat64(0xbfffedbc, 0xbfffee14)          = -1 ENOENT (No such file or directory)
open("/opt/kde/lib/mmx/libc.so.6", O_RDONLY) = -1 ENOENT (No such file or directory)
stat64(0xbfffedbc, 0xbfffee14)          = -1 ENOENT (No such file or directory)
open("/opt/kde/lib/libc.so.6", O_RDONLY) = -1 ENOENT (No such file or directory)
stat64(0xbfffedbc, 0xbfffee14)          = 0
open("/etc/ld.so.cache", O_RDONLY)      = 3
fstat64(0x3, 0xbfffedc4)                = 0
old_mmap(NULL, 37203, PROT_READ, MAP_PRIVATE, 3, 0) = 0x40017000
close(3)                                = 0
open("/lib/libc.so.6", O_RDONLY)        = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\264\323"..., 1024) = 1024
fstat64(0x3, 0xbfffee0c)                = 0
old_mmap(NULL, 1116516, PROT_READ|PROT_EXEC, MAP_PRIVATE, 3, 0) = 0x40021000
mprotect(0x40128000, 39268, PROT_NONE)  = 0
old_mmap(0x40128000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 3, 0x106000) = 0x40128000
old_mmap(0x4012e000, 14692, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x4012e000
close(3)                                = 0
munmap(0x40017000, 37203)               = 0
getpid()                                = 220
open("/proc/hack3", O_RDONLY)           = 3
read(3, "jiffies=14462\n", 128)         = 14
fstat64(0x1, 0xbffff20c <unfinished ...>
+++ killed by SIGSEGV +++


The first system calls are issued by the ELF program interpreter that sets up
the execution context of the program. We are interested in the last three system
calls. The program successfully open the /proc/hack3 virtual file (file
descriptor 3). Then it read 128 bytes from the file. The read() system call
fills the program buffer with the string "jiffies=14462\n" (length = 14 bytes).
Then the program invokes the fstat64() system call, but this system call never
terminates: the process receives a SIGSEGV signal and aborts.

The SIGSEGV signal is sent either by a User Mode process, or by the kernel 
when a serious error occurs. In this second case, the kernel builds a so-called
"kernel Oops".

A kernel Oops is a short message that describes the erroneous condition.
There are several ways to read the message (and you often have to try many of
them):

1. The Oops appears on the system console, or on a terminal that acts as the
   console. This is not very convenient for you, because console messages
   scrolls as new messages arrive, and the Oops is eventually lost.
   
2. The Oops can be read in the /proc/kmsg virtual file. Usually a system process
   named klogd continuously read from this file, and sends its content to
   another system process called syslogd. In turn, syslogd could write the
   message in one of the system log files in /var/log.
   
3. If the klogd process is not running, you could read the /proc/kmsg file by
   yourself.
   
4. The kernel messages (including Oops) can also be read with the dmesg
   command. This is my preferred method. Be aware, however, that the kernel
   message buffer usually contains only 16KB, thus the kernel messages
   are cyclically discarded.
   
Let's use the dmesg command to look for the Oops:

$ dmesg 
(snipped)
Unable to handle kernel NULL pointer dereference at virtual address 00000008
 printing eip:
c012ecc9
*pde = 00000000
Oops: 0000
CPU:    0
EIP:    0010:[<c012ecc9>]    Not tainted
EFLAGS: 00010246
eax: 00000000   ebx: c8840000   ecx: 00000001   edx: 00000000
esi: fffffff7   edi: 00000001   ebp: bffff1ac   esp: c8841fb0
ds: 0018   es: 0018   ss: 0018
Process mystery3 (pid: 217, stackpage=c8841000)
Stack: c0134065 c8840000 40131170 00000001 c0106b0b 00000001 bffff20c 4012d760
       40131170 00000001 bffff1ac 000000c5 0000002b 0000002b 000000c5 400df05c
       00000023 00000212 bffff140 0000002b
Call Trace: [<c0134065>] [<c0106b0b>]
 
Code: 3b 48 08 73 06 8b 40 14 8b 14 88 85 d2 74 03 ff 42 14 89 d0

As you see, there is lot of information here. First of all, a description of
the error: "Unable to handle kernel NULL pointer dereference at virtual address
XXX". This is equivalent to state that the kernel tried to access an object
addressed by a null pointer. The CPU raised an exception ("General protection
fault"), and the corresponding exception handler established that the fault was
due to a kernel bug.

The Oops then shows the content of the eip register (the Program Counter) when
the exception was raised.  Finally, this is something we can understand. We
could look at the System.map file, and search the corresponding function:

====== excerpt from System.map ======
00000000c012ecb8 T fget
00000000c012ece0 T put_filp
=====================================

Thus, we have a function name, fget(), and an offset inside its code:
0xc012ecc9 - 0xc012ecb8 = 0x11 = 17

Next steps are straightforward: we search fget() inside the kernel source code
with the find command. It turns out to be defined in the fs/file_table.c file:

====== excerpt from fs/file_table.c ======
struct file * fget(unsigned int fd)
{
        struct file * file;
        struct files_struct *files = current->files;
 
        read_lock(&files->file_lock);
        file = fcheck(fd);
        if (file)
                get_file(file);
        read_unlock(&files->file_lock);
        return file;
}
===========================================

Now we have to locate the faulting instruction. By invoking objdump on the
correspoding object file we get:

$ objdump -d fs/file_table.o
(snipped)
0000000000000248 <fget>:
 248:   89 c1                   mov    %eax,%ecx
 24a:   b8 00 e0 ff ff          mov    $0xffffe000,%eax
 24f:   21 e0                   and    %esp,%eax
 251:   8b 80 54 05 00 00       mov    0x554(%eax),%eax
 257:   31 d2                   xor    %edx,%edx
 259:   3b 48 08                cmp    0x8(%eax),%ecx
 25c:   73 06                   jae    264 <fget+0x1c>
 25e:   8b 40 14                mov    0x14(%eax),%eax
 261:   8b 14 88                mov    (%eax,%ecx,4),%edx
 264:   85 d2                   test   %edx,%edx
 266:   74 03                   je     26b <fget+0x23>
 268:   ff 42 14                incl   0x14(%edx)
 26b:   89 d0                   mov    %edx,%eax
 26d:   c3                      ret
 26e:   89 f6                   mov    %esi,%esi
(snipped)

Linus suggests using the GNU debugger on the uncompressed kernel image:

$ gdb linux-2.4.16-lkhc/vmlinux
GNU gdb 5.0
Copyright 2000 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-slackware-linux"...(no debugging symbols found)...
(gdb) disassemble fget
Dump of assembler code for function fget:
0xc012ecb8 <fget>:      mov    %eax,%ecx
0xc012ecba <fget+2>:    mov    $0xffffe000,%eax
0xc012ecbf <fget+7>:    and    %esp,%eax
0xc012ecc1 <fget+9>:    mov    0x554(%eax),%eax
0xc012ecc7 <fget+15>:   xor    %edx,%edx
0xc012ecc9 <fget+17>:   cmp    0x8(%eax),%ecx
0xc012eccc <fget+20>:   jae    0xc012ecd4 <fget+28>
0xc012ecce <fget+22>:   mov    0x14(%eax),%eax
0xc012ecd1 <fget+25>:   mov    (%eax,%ecx,4),%edx
0xc012ecd4 <fget+28>:   test   %edx,%edx
0xc012ecd6 <fget+30>:   je     0xc012ecdb <fget+35>
0xc012ecd8 <fget+32>:   incl   0x14(%edx)
0xc012ecdb <fget+35>:   mov    %edx,%eax
0xc012ecdd <fget+37>:   ret
0xc012ecde <fget+38>:   mov    %esi,%esi
End of assembler dump.
(gdb) quit


The instruction at offset 17 is "cmp 0x8(%eax),%ecx". (This is also confirmed
by the Code section in our Oops, which shows the bytes of the machine code of
the faulting instruction.) The Assembly instruction addresses the data
structure pointed to by the eax register, and store the field at offset 8 in
the ecx register. The source code of the function suggests that eax should
contain the content of the files field of the current process descriptor (at
offset 0x554); 8 is the offset of the file_lock field inside the data structure
pointed to by files.

Now, we have to consider another precious information coming from the Oops: the
dump of all register contents. In particular, before executing the faulting
instruction, %eax is set to 0. This causes the "NULL pointer dereference", the
general protection fault, the exception handler, the SIGSEGV signal sent to
mystery3, and so on. 

However, the crucial point is: what is the bug? Of course, the Oops could be
avoided by simply adding a check on the files field before using its value. In
this way, however, the bug is not fixed, it is just hidden. Step 2 of our bug
fixing procedure starts here: is it conceivable that the files field is NULL? A
basic understanding of the kernel is enough to establish that files should
definitevely be not null here. In fact, the field stores a pointer to the table
of opened file of the process; moreover, the fget() function is invoked to 
lookup a particular opened file specified by its file descriptor.

Clearly, the bug is not in the fget() function itself. Most likely, the bug has
been introduced in some function invoked before fget(). The Oops comes once
again to our rescue, because it shows a dump of the kernel stack when the
exception was raised. Even better, the "call trace" section lists the addresses
on the stack corresponding to the functions, discarding the function arguments:

Call Trace: [<c0134065>] [<c0106b0b>]

Looking into the System.map file we find that 0xc0134065 corresponds to
sys_fstat64(), while 0xc0106b0b is inside the system_call() function. In this
particular case, the call trace section is not particularly rewarding: the Oops
occurred while processing the fstat64() system call, as we already know.
Searching the function and looking at it doesn't help, either:

====== excerpt from fs/stat.c ======
asmlinkage long sys_fstat64(unsigned long fd, struct stat64 * statbuf, 
                            long flags)
{
        struct file * f;
        int err = -EBADF;
 
        f = fget(fd);
        if (f) {
                struct dentry * dentry = f->f_dentry;
 
                err = do_revalidate(dentry);
                if (!err)
                        err = cp_new_stat64(dentry->d_inode, statbuf);
                fput(f);
        }
        return err;
}
===================================

The function doesn't touch the files field. Therefore, the bug has been
introduced in a previous kernel function, most likely in the previous read()
system call. Unfortunately, the kernel stack is cleared when returning in User
Mode, thus the Oops cannot show the information relative to the read() system
call execution.

However, the strace command told us that the read() was reading from the
/proc/hack3 virtual file. We can thus dig in the kernel source code looking for
the function that implements /proc/hack3. We easily find the lkhc/hack3.c
source file, and discover the buggy instruction:

====== excerpt from lkhc/hack3.c ======
static int hack3_read(char *page, char **start, off_t off, int count, 
int *eof, void *data)
{
        int written;
        current->files = NULL;
        written = sprintf(page+off, "jiffies=%d\n", jiffies);
        return written;
}
=======================================

Eureka!

(A final note on kernel Oops. A small program named ksymoops fetches several
bits of information from the System.map file, the uncompressed kernel image,
the /proc/ksyms file, and so on, and then interprets the Oops for you. The home
page of the program is listed in the scripts/ksymoops directory of the Linux
source code.)