09 Feb, 2008

40 commits

  • ipc_lock_check_down(), ipc_lock_check() and ipcget() seem too large to be
    inline. Besides, inlining them gives no optimization, since they perform
    calls internally in any case.

    Moving them into ipc/util.c saves about 500 bytes of vmlinux and shortens
    the internal IPC API.

    $ ./scripts/bloat-o-meter vmlinux-orig vmlinux
    add/remove: 3/2 grow/shrink: 0/10 up/down: 490/-989 (-499)
    function                 old    new   delta
    ipcget                     -    392    +392
    ipc_lock_check_down        -     49     +49
    ipc_lock_check             -     49     +49
    sys_semget               119    105     -14
    sys_shmget               108     86     -22
    sys_msgget               100     78     -22
    do_msgsnd                665    631     -34
    do_msgrcv                680    644     -36
    do_shmat                 771    733     -38
    sys_msgctl              1302   1229     -73
    ipcget_new                80      -     -80
    sys_semtimedop          1534   1452     -82
    sys_semctl              2034   1922    -112
    sys_shmctl              1919   1765    -154
    ipcget_public            322      -    -322

    The growth of ipcget() comes from gcc inlining ipcget_new() and
    ipcget_public(), which are now static, into it.
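    As a quick arithmetic check of the bloat-o-meter summary line, the sketch
    below re-adds the per-function deltas from the table. The struct and
    function names are illustrative only and are not part of the kernel or of
    scripts/bloat-o-meter.

```c
#include <stddef.h>

/* One row of the bloat-o-meter table: function name and size delta (bytes). */
struct size_delta {
	const char *func;
	int delta;
};

/* Deltas copied from the table above. */
static const struct size_delta ipc_deltas[] = {
	{ "ipcget", 392 },        { "ipc_lock_check_down", 49 },
	{ "ipc_lock_check", 49 }, { "sys_semget", -14 },
	{ "sys_shmget", -22 },    { "sys_msgget", -22 },
	{ "do_msgsnd", -34 },     { "do_msgrcv", -36 },
	{ "do_shmat", -38 },      { "sys_msgctl", -73 },
	{ "ipcget_new", -80 },    { "sys_semtimedop", -82 },
	{ "sys_semctl", -112 },   { "sys_shmctl", -154 },
	{ "ipcget_public", -322 },
};

/* Sum the growth ("up") and shrinkage ("down") separately, as the tool does. */
static void tally(const struct size_delta *d, size_t n, int *up, int *down)
{
	*up = 0;
	*down = 0;
	for (size_t i = 0; i < n; i++) {
		if (d[i].delta > 0)
			*up += d[i].delta;
		else
			*down += d[i].delta;
	}
}
```

    The totals come out as up = 490 and down = -989, giving the net -499 bytes
    reported on the summary line.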

    Signed-off-by: Pavel Emelyanov
    Cc: Nadia Derbey
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelyanov
     
  • Signed-off-by: Alan Cox
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alan Cox
     
  • Signed-off-by: Alan Cox
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alan Cox
     
  • Signed-off-by: Alan Cox
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alan Cox
     
  • Signed-off-by: Alan Cox
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alan Cox
     
  • Signed-off-by: Alan Cox
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alan Cox
     
  • Signed-off-by: Alan Cox
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alan Cox
     
  • Signed-off-by: Alan Cox
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alan Cox
     
  • Signed-off-by: Alan Cox
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alan Cox
     
  • Signed-off-by: Alan Cox
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alan Cox
     
  • Signed-off-by: Alan Cox
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alan Cox
     
  • Signed-off-by: Alan Cox
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alan Cox
     
  • Signed-off-by: Alan Cox
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alan Cox
     
  • Also removes a cflag comparison that caused some mode changes to be wrongly
    ignored.

    Signed-off-by: Alan Cox
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alan Cox
     
  • Signed-off-by: Alan Cox
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alan Cox
     
  • Signed-off-by: Alan Cox
    Cc: Jiri Slaby
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alan Cox
     
  • Get the constness right, avoid nasty cast.

    Cc: Ingo Molnar
    Cc: Kyle McMartin
    Cc: Rusty Russell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • module.c should not define linker variables on its own. We have an include
    file for that.

    Signed-off-by: Christoph Lameter
    Cc: Rusty Russell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • The module subsystem cannot handle symbols whose value is zero. If such
    symbols are present, the module resolver prints a message saying that
    they are unresolved.
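    The underlying problem is using 0 both as a valid symbol value and as the
    "not found" marker. A minimal sketch of the alternative, assuming a
    hypothetical symbol table and lookup functions (not the kernel's actual
    __find_symbol()), reports success separately from the value:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct demo_symbol {
	const char *name;
	unsigned long value;
};

/* Hypothetical symbol table; "zero_sym" legitimately has the value 0. */
static const struct demo_symbol demo_symtab[] = {
	{ "zero_sym", 0 },
	{ "other_sym", 0x1000 },
};

#define DEMO_NSYMS (sizeof(demo_symtab) / sizeof(demo_symtab[0]))

/* Buggy style: 0 doubles as "not found", so zero_sym looks unresolved. */
static unsigned long lookup_buggy(const char *name)
{
	for (size_t i = 0; i < DEMO_NSYMS; i++)
		if (strcmp(demo_symtab[i].name, name) == 0)
			return demo_symtab[i].value;
	return 0;
}

/* Fixed style: the return value says whether the symbol exists and the
 * value is passed back separately, so a zero value is unambiguous. */
static bool lookup_fixed(const char *name, unsigned long *value)
{
	for (size_t i = 0; i < DEMO_NSYMS; i++) {
		if (strcmp(demo_symtab[i].name, name) == 0) {
			*value = demo_symtab[i].value;
			return true;
		}
	}
	return false;
}
```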

    [akinobu.mita@gmail.com: fix __find_symbol() error checks]
    Cc: Mathieu Desnoyers
    Cc: Kay Sievers
    Cc: Rusty Russell
    Cc: Andi Kleen
    Signed-off-by: Akinobu Mita
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • Backend for s390.

    Acked-by: Alan Cox
    Cc: Martin Schwidefsky
    Signed-off-by: Heiko Carstens
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Heiko Carstens
     
  • Give architectures that support the new termios2 the possibility to override
    the user_termios_to_kernel_termios and kernel_termios_to_user_termios
    macros. As soon as all architectures that use the generic variant have been
    converted, the ifdefs can go away again. The architectures in question are
    avr32, frv, powerpc and s390.
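    A common C idiom for this kind of override is for the generic header to
    define a macro only when the architecture header has not already done so.
    The sketch below uses hypothetical demo_ names and plain memcpy() in place
    of the kernel's user-copy helpers; it shows the #ifndef pattern, not the
    exact kernel headers.

```c
#include <string.h>

struct demo_termios {
	unsigned int c_iflag, c_oflag, c_cflag, c_lflag;
};

/* "Architecture" header: an arch with termios2 support defines its own
 * conversion first. It also counts its uses so the selection is observable. */
static int demo_arch_conversions;
#define demo_user_termios_to_kernel_termios(k, u) \
	(demo_arch_conversions++, memcpy((k), (u), sizeof(*(k))))

/* "Generic" header: supply the default only if no override exists. */
#ifndef demo_user_termios_to_kernel_termios
#define demo_user_termios_to_kernel_termios(k, u) \
	memcpy((k), (u), sizeof(*(k)))
#endif
```

    Once every architecture provides its own definition, the #ifndef guard
    and the generic fallback can be dropped.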

    Cc: Alan Cox
    Cc: Paul Mackerras
    Cc: David Howells
    Cc: Haavard Skinnemoen
    Cc: Martin Schwidefsky
    Signed-off-by: Heiko Carstens
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Heiko Carstens
     
  • Fix an off by one bug in the fault reason string reporting function, and
    clean up some of the code around this buglet.
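    The commit text does not show the code, but a typical shape of this kind
    of off-by-one is a bounds check that lets an index equal to the table size
    through. A sketch with a hypothetical reason table and helper, comparing
    with >= against the number of entries:

```c
#include <stddef.h>

/* Hypothetical table of fault reason strings, indexed by reason code. */
static const char *const demo_fault_reasons[] = {
	"Software",
	"Present bit in root entry is clear",
	"Present bit in context entry is clear",
};

#define DEMO_NR_REASONS (sizeof(demo_fault_reasons) / sizeof(demo_fault_reasons[0]))

/* The buggy variant would test "reason > DEMO_NR_REASONS", which accepts
 * reason == DEMO_NR_REASONS and reads one element past the end. */
static const char *demo_fault_reason(unsigned int reason)
{
	if (reason >= DEMO_NR_REASONS)
		return "Unknown";
	return demo_fault_reasons[reason];
}
```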

    [akpm@linux-foundation.org: cleanup]
    Signed-off-by: mark gross
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: Andi Kleen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    mark gross
     
  • Add support for the protected memory enable bits by clearing them if they
    are set at startup time. Some future boot loaders or firmware could leave
    this bit set after loading the kernel, and it must be cleared for DMA to
    work properly.
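    The clearing step can be sketched as "read the register, clear the enable
    bit if set, then wait for the status bit to drop". This runs against a
    simulated register, not real MMIO; the bit positions (bit 31 enable,
    bit 0 status) are our reading of the VT-d specification and are an
    assumption here, as are all demo_ names.

```c
#include <stdint.h>

/* Assumed layout: bit 31 enables protected memory, bit 0 reports status. */
#define DEMO_PMEN_EPM (UINT32_C(1) << 31)
#define DEMO_PMEN_PRS (UINT32_C(1) << 0)

/* Simulated register: firmware left protection enabled. */
static uint32_t demo_pmen_reg = DEMO_PMEN_EPM | DEMO_PMEN_PRS;

static uint32_t demo_readl(void)
{
	return demo_pmen_reg;
}

/* Simulated hardware: the status bit follows the enable bit on a write. */
static void demo_writel(uint32_t val)
{
	demo_pmen_reg = val & DEMO_PMEN_EPM;
	if (val & DEMO_PMEN_EPM)
		demo_pmen_reg |= DEMO_PMEN_PRS;
}

/* Clear the enable bit if it was set at startup, then wait for the
 * protected regions to be reported as disabled. */
static void demo_disable_protect_mem_regions(void)
{
	uint32_t pmen = demo_readl();

	if (!(pmen & DEMO_PMEN_EPM))
		return;		/* already off, nothing to do */

	demo_writel(pmen & ~DEMO_PMEN_EPM);
	while (demo_readl() & DEMO_PMEN_PRS) {
		/* real code would relax the CPU and bound this with a timeout */
	}
}
```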

    Signed-off-by: mark gross
    Acked-by: Muli Ben-Yehuda
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: Andi Kleen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    mark gross
     
  • Typical PDE creation code looks like:

        pde = create_proc_entry("foo", 0, NULL);
        if (pde)
                pde->proc_fops = &foo_proc_fops;

    Notice that the PDE is created first, and only then is ->proc_fops set to
    its final value. This is a problem because right after creation
    a) the PDE is fully visible in /proc, and
    b) its ->proc_fops are proc_file_operations, which have no ->open
       callback. So it is possible to ->read without ->open (see one class of
       oopses below).

    The fix is a new API called proc_create(), which makes sure ->proc_fops
    are set up before the PDE is glued into the main tree. Typical new code
    looks like:

        pde = proc_create("foo", 0, NULL, &foo_proc_fops);
        if (!pde)
                return -ENOMEM;

    Fix most networking users for a start.

    In the long run, create_proc_entry() for regular files will go away.
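    The ordering problem can be modelled in userspace. In this single-threaded
    sketch, a hook called at registration time stands in for another task that
    looks the entry up the moment it becomes visible. All demo_ names are
    hypothetical; only the ordering mirrors create_proc_entry() versus
    proc_create().

```c
#include <stddef.h>

struct demo_fops {
	int (*open)(void);
};

struct demo_pde {
	const char *name;
	const struct demo_fops *fops;
	struct demo_pde *next;
};

/* Default operations with no ->open, like proc_file_operations. */
static const struct demo_fops demo_default_fops = { NULL };

static int demo_foo_open(void) { return 0; }
static const struct demo_fops demo_foo_fops = { demo_foo_open };

static struct demo_pde *demo_root;	/* the "visible" tree */
static int demo_open_missing;		/* lookups that saw no ->open */

/* Stands in for a concurrent reader hitting the entry right away. */
static void demo_concurrent_lookup(const struct demo_pde *pde)
{
	if (pde->fops->open == NULL)
		demo_open_missing++;
}

/* Link an entry into the tree; from here on it is visible to others. */
static void demo_register(struct demo_pde *pde)
{
	pde->next = demo_root;
	demo_root = pde;
	demo_concurrent_lookup(pde);
}

/* Old pattern: publish first, set the real fops afterwards (too late). */
static void demo_create_then_set(struct demo_pde *pde, const struct demo_fops *fops)
{
	pde->fops = &demo_default_fops;
	demo_register(pde);
	pde->fops = fops;
}

/* proc_create()-style pattern: fops are final before the entry is linked. */
static void demo_create_with_fops(struct demo_pde *pde, const struct demo_fops *fops)
{
	pde->fops = fops;
	demo_register(pde);
}
```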

    BUG: unable to handle kernel NULL pointer dereference at virtual address 00000024
    printing eip: c1188c1b *pdpt = 000000002929e001 *pde = 0000000000000000
    Oops: 0002 [#1] PREEMPT SMP DEBUG_PAGEALLOC
    last sysfs file: /sys/block/sda/sda1/dev
    Modules linked in: foo af_packet ipv6 cpufreq_ondemand loop serio_raw psmouse k8temp hwmon sr_mod cdrom

    Pid: 24679, comm: cat Not tainted (2.6.24-rc3-mm1 #2)
    EIP: 0060:[] EFLAGS: 00210002 CPU: 0
    EIP is at mutex_lock_nested+0x75/0x25d
    EAX: 000006fe EBX: fffffffb ECX: 00001000 EDX: e9340570
    ESI: 00000020 EDI: 00200246 EBP: e9340570 ESP: e8ea1ef8
    DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
    Process cat (pid: 24679, ti=E8EA1000 task=E9340570 task.ti=E8EA1000)
    Stack: 00000000 c106f7ce e8ee05b4 00000000 00000001 458003d0 f6fb6f20 fffffffb
    00000000 c106f7aa 00001000 c106f7ce 08ae9000 f6db53f0 00000020 00200246
    00000000 00000002 00000000 00200246 00200246 e8ee05a0 fffffffb e8ee0550
    Call Trace:
    [] seq_read+0x24/0x28a
    [] seq_read+0x0/0x28a
    [] seq_read+0x24/0x28a
    [] seq_read+0x0/0x28a
    [] proc_reg_read+0x60/0x73
    [] proc_reg_read+0x0/0x73
    [] vfs_read+0x6c/0x8b
    [] sys_read+0x3c/0x63
    [] sysenter_past_esp+0x5f/0xa5
    [] destroy_inode+0x24/0x33
    =======================
    INFO: lockdep is turned off.
    Code: 75 21 68 e1 1a 19 c1 68 87 00 00 00 68 b8 e8 1f c1 68 25 73 1f c1 e8 84 06 e9 ff e8 52 b8 e7 ff 83 c4 10 9c 5f fa e8 28 89 ea ff fe 4e 04 79 0a f3 90 80 7e 04 00 7e f8 eb f0 39 76 34 74 33
    EIP: [] mutex_lock_nested+0x75/0x25d SS:ESP 0068:e8ea1ef8

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Alexey Dobriyan
    Cc: "Eric W. Biederman"
    Cc: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • Long ago, when the CLONE_THREAD support first went in, someone thought it
    would be wise to point /proc/self at /proc/ instead of /proc/.

    Given that /proc/ can return information about a very different task
    (if enough things have been unshared) then our current process /proc/
    seems blatantly wrong. So far I have yet to think up an example where the
    current behavior would be advantageous, and I can see several places where
    it is seriously non-intuitive.

    We may be stuck with the current broken behavior for backwards
    compatibility reasons, but let's try fixing our ancient bug in the 2.6.25
    time frame and see if anyone screams.

    Signed-off-by: Eric W. Biederman
    Acked-by: Ingo Molnar
    Cc: "Guillaume Chazarain"
    Cc: "Pavel Emelyanov"
    Cc: "Rafael J. Wysocki"
    Cc: Oleg Nesterov
    Cc: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric W. Biederman
     
  • Currently, if you access a /proc that is not mounted for your process's
    current pid namespace, /proc/self will point at a completely random task.

    This patch fixes /proc/self to point to the current process if it is
    visible in that particular mount of /proc, or to return -ENOENT if the
    current process is not visible.
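    The readlink logic this describes reduces to "translate the caller's pid
    into the mount's namespace; if there is no translation, fail with
    -ENOENT". A minimal sketch, assuming a hypothetical helper that is handed
    the already-translated number (0 meaning "not visible in this mount"):

```c
#include <errno.h>
#include <stdio.h>

/* pid_in_mount_ns: the caller's pid as seen from the pid namespace this
 * /proc was mounted for, or 0 if the caller is not visible there. */
static int demo_self_link(int pid_in_mount_ns, char *buf, size_t len)
{
	if (pid_in_mount_ns == 0)
		return -ENOENT;	/* current process is not visible in this /proc */
	snprintf(buf, len, "%d", pid_in_mount_ns);
	return 0;
}
```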

    Signed-off-by: Eric W. Biederman
    Cc: Pavel Emelyanov
    Cc: Alexey Dobriyan
    Cc: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric W. Biederman
     
  • Currently we may look up the pid in the wrong pid namespace. So convert
    proc_pid_status to seq_file, which ensures that the proper pid namespace
    is passed in.

    [akpm@linux-foundation.org: coding-style fixes]
    [akpm@linux-foundation.org: build fix]
    [akpm@linux-foundation.org: another build fix]
    [akpm@linux-foundation.org: s390 build fix]
    [akpm@linux-foundation.org: fix task_name() output]
    [akpm@linux-foundation.org: fix nommu build]
    Signed-off-by: Eric W. Biederman
    Cc: Andrew Morgan
    Cc: Serge Hallyn
    Cc: Cedric Le Goater
    Cc: Pavel Emelyanov
    Cc: Martin Schwidefsky
    Cc: Heiko Carstens
    Cc: Paul Menage
    Cc: Paul Jackson
    Cc: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric W. Biederman
     
  • This conversion is just for code cleanliness, uniformity, and general safety.

    Signed-off-by: Eric W. Biederman
    Cc: Oleg Nesterov
    Cc: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric W. Biederman
     
  • Currently (as pointed out by Oleg) do_task_stat has a race when calling
    task_pid_nr_ns with the task exiting. In addition do_task_stat is not
    currently displaying information in the context of the pid namespace that
    mounted the /proc filesystem. So "cut -d' ' -f 1 /proc//stat" may not
    equal .

    This patch fixes the problem by converting to a single_open seq_file show
    method, getting the pid namespace from the filesystem superblock instead
    of from current, and simply using the struct pid from the inode instead of
    attempting to get that same pid from the task.
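    Why the namespace must come from the mount can be seen with a simplified
    model of a pid that carries one number per namespace level. The structs
    and helpers below are hypothetical, and the lookup assumes the namespace
    is an ancestor of the pid's own (the kernel additionally checks that the
    namespace matches at that level).

```c
#include <stdio.h>

#define DEMO_MAX_LEVELS 4

/* A pid has one number per level, from the initial namespace (level 0)
 * down to the namespace it was created in. */
struct demo_pid {
	int level;
	int nr[DEMO_MAX_LEVELS];
};

struct demo_pid_ns {
	int level;
};

/* The pid's number as seen from ns, or 0 if it is not visible there. */
static int demo_pid_nr_ns(const struct demo_pid *pid, const struct demo_pid_ns *ns)
{
	if (ns->level > pid->level)
		return 0;
	return pid->nr[ns->level];
}

/* Show-method sketch: the namespace is the one /proc was mounted for, not
 * the reader's, so the first field matches the /proc directory name. */
static int demo_task_stat_show(char *buf, size_t len,
			       const struct demo_pid_ns *mount_ns,
			       const struct demo_pid *pid, const char *comm)
{
	return snprintf(buf, len, "%d (%s)", demo_pid_nr_ns(pid, mount_ns), comm);
}
```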

    Signed-off-by: Eric W. Biederman
    Cc: Oleg Nesterov
    Cc: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric W. Biederman
     
  • Currently many /proc/pid files use a crufty precursor to the current
    seq_file API, and they don't have direct access to the pid_namespace or to
    the pid for which they are displaying data.

    So implement proc_single_file_operations to make the seq_file routines easy
    to use, and to give access to the full state of the pid we are displaying
    data for.

    Signed-off-by: Eric W. Biederman
    Cc: Oleg Nesterov
    Cc: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric W. Biederman
     
  • Print a warning if a PDE is registered with a name that already exists in
    the target directory.

    Bug report and a simple fix can be found here:
    http://bugzilla.kernel.org/show_bug.cgi?id=8798
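    The check amounts to scanning the target directory's children for the same
    name before linking the new entry. A sketch with hypothetical structs, a
    hypothetical helper and an illustrative message text; like the description
    above, it only warns and still registers the entry:

```c
#include <stdio.h>
#include <string.h>

struct demo_entry {
	const char *name;
	struct demo_entry *next;
};

struct demo_dir {
	const char *name;
	struct demo_entry *subdir;
};

/* Warn about (but still register) an entry whose name is already taken.
 * Returns the number of duplicates found. */
static int demo_register_entry(struct demo_dir *dir, struct demo_entry *e)
{
	int dups = 0;

	for (struct demo_entry *p = dir->subdir; p; p = p->next) {
		if (strcmp(p->name, e->name) == 0) {
			fprintf(stderr, "demo: '%s/%s' already registered\n",
				dir->name, e->name);
			dups++;
		}
	}
	e->next = dir->subdir;
	dir->subdir = e;
	return dups;
}
```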

    [\n fixlet and no undescriptive variable usage --adobriyan]
    [akpm@linux-foundation.org: make printk comprehensible]
    Signed-off-by: Zhang Rui
    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Zhang Rui
     
  • proc symlinks always have a valid ->data containing the destination of the
    symlink. There is no need to check it on removal; proc_symlink() has
    already done so.

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • Move code around so as to reduce the number of forward-declarations.

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • Pseudo-code for lookup effectively is:

    LOCK kernel
    LOCK proc_subdir_lock
    find PDE
    UNLOCK proc_subdir_lock

    get inode

    LOCK proc_subdir_lock
    goto unlock
    UNLOCK proc_subdir_lock
    UNLOCK kernel

    We can get rid of the LOCK/UNLOCK pair after getting the inode simply by
    jumping to unlock_kernel() directly.
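    The pseudo-code above can be turned into a small C model. The lock helpers
    here are hypothetical stand-ins that only count acquisitions (they provide
    no mutual exclusion), so the old and new control flow can be compared.

```c
/* Hypothetical lock stand-ins that only record state and acquisitions. */
static int demo_kernel_locked, demo_subdir_locked, demo_subdir_acquisitions;

static void demo_lock_kernel(void)   { demo_kernel_locked = 1; }
static void demo_unlock_kernel(void) { demo_kernel_locked = 0; }
static void demo_subdir_lock(void)   { demo_subdir_locked = 1; demo_subdir_acquisitions++; }
static void demo_subdir_unlock(void) { demo_subdir_locked = 0; }

/* Stand-ins for finding the PDE and getting its inode. */
static int demo_find_pde(const char *name) { return name[0] != '\0'; }
static int demo_get_inode(void) { return 1; }

/* Before: re-takes the subdir lock only to jump to a label that drops it. */
static int demo_lookup_old(const char *name)
{
	int inode = 0;

	demo_lock_kernel();
	demo_subdir_lock();
	if (demo_find_pde(name)) {
		demo_subdir_unlock();
		inode = demo_get_inode();
		demo_subdir_lock();
		goto out_unlock;
	}
out_unlock:
	demo_subdir_unlock();
	demo_unlock_kernel();
	return inode;
}

/* After: once the inode is obtained, jump straight to unlock_kernel(). */
static int demo_lookup_new(const char *name)
{
	int inode = 0;

	demo_lock_kernel();
	demo_subdir_lock();
	if (demo_find_pde(name)) {
		demo_subdir_unlock();
		inode = demo_get_inode();
		goto out_unlock_kernel;
	}
	demo_subdir_unlock();
out_unlock_kernel:
	demo_unlock_kernel();
	return inode;
}
```

    Both variants leave every lock released; the new one simply takes the
    subdir lock once instead of twice on the success path.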

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • proc is not modular, so MODULE_LICENSE just expands to nothing. proc
    undoubtedly remains GPLed.

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • There's already an option controlling the net namespaces cloning code, so make
    it work the same way as all the other namespaces' options.

    Signed-off-by: Pavel Emelyanov
    Cc: "David S. Miller"
    Acked-by: Serge Hallyn
    Cc: Cedric Le Goater
    Cc: "Eric W. Biederman"
    Cc: Herbert Poetzl
    Cc: Kirill Korotaev
    Cc: Sukadev Bhattiprolu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelyanov
     
  • Just like with the user namespaces, move the namespace management code into
    a separate .c file and mark the (already existing) PID_NS option as "depend
    on NAMESPACES".

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Pavel Emelyanov
    Acked-by: Serge Hallyn
    Cc: Cedric Le Goater
    Cc: "Eric W. Biederman"
    Cc: Herbert Poetzl
    Cc: Kirill Korotaev
    Cc: Sukadev Bhattiprolu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelyanov
     
  • Make the user_namespace.o compilation depend on this option and move
    init_user_ns into the user.c file, so that the kernel compiles and works
    without namespace support. This organizes the user namespace code
    similarly to the other namespaces.

    Also mark the USER_NS option as "depend on NAMESPACES".

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Pavel Emelyanov
    Acked-by: Serge Hallyn
    Cc: Cedric Le Goater
    Cc: "Eric W. Biederman"
    Cc: Herbert Poetzl
    Cc: Kirill Korotaev
    Cc: Sukadev Bhattiprolu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelyanov
     
  • Currently the IPC namespace management code is spread over the ipc/*.c
    files. I moved this code into the ipc/namespace.c file, which is compiled
    out when namespace support is not configured.

    The linux/ipc_namespace.h file is used to store the prototypes of the
    functions in namespace.c and the stubs for the NAMESPACES=n case. This is
    done because the stub for copy_ipc_namespace requires knowledge of the
    CLONE_NEWIPC flag, which is in sched.h. But the linux/ipc.h file itself is
    included in many .c files via the sys.h->sem.h sequence, so adding sched.h
    to it would make all these .c files depend on sched.h, which is not
    desirable. On the other hand, knowledge of the namespace code is required
    in only four .c files.

    Besides, this patch compiles out some auxiliary functions from the
    ipc/sem.c, msg.c and shm.c files. It turned out that moving these functions
    into namespace.c is not that easy, because they use many other calls and
    macros from the original files. Moving them would make this patch
    complicated. On the other hand, all these functions can be consolidated,
    so I will send a separate patch doing this a bit later.
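    The header layout described above (a prototype when namespaces are built
    in, an inline stub otherwise, with the stub needing the clone flag) can be
    sketched as follows. The demo_ names, the DEMO_CONFIG_IPC_NS switch and the
    flag value are hypothetical stand-ins; where the kernel stub would report
    an error, this sketch returns NULL.

```c
#include <stddef.h>

/* Stand-in for CLONE_NEWIPC from sched.h; the stub below needs it, which is
 * why the stub cannot live in a header that must avoid sched.h. */
#define DEMO_CLONE_NEWIPC 0x08000000UL

struct demo_ipc_ns {
	int id;
};

/* Always built in, like the initial IPC namespace. */
static struct demo_ipc_ns demo_init_ipc_ns = { 0 };

#ifdef DEMO_CONFIG_IPC_NS
/* With namespace support, the implementation lives in namespace.c. */
struct demo_ipc_ns *demo_copy_ipcs(unsigned long flags, struct demo_ipc_ns *ns);
#else
/* NAMESPACES=n stub: a request for a new IPC namespace fails, and any
 * other clone keeps sharing the existing namespace. */
static inline struct demo_ipc_ns *demo_copy_ipcs(unsigned long flags,
						 struct demo_ipc_ns *ns)
{
	if (flags & DEMO_CLONE_NEWIPC)
		return NULL;
	return ns;
}
#endif
```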

    Signed-off-by: Pavel Emelyanov
    Acked-by: Serge Hallyn
    Cc: Cedric Le Goater
    Cc: "Eric W. Biederman"
    Cc: Herbert Poetzl
    Cc: Kirill Korotaev
    Cc: Sukadev Bhattiprolu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelyanov
     
  • Currently all the namespace management code is in the kernel/utsname.c file,
    so just compile it out and make stubs in the appropriate header.

    The init namespace itself is in init/version.c and is in the kernel all the
    time.

    Signed-off-by: Pavel Emelyanov
    Acked-by: Serge Hallyn
    Cc: Cedric Le Goater
    Cc: "Eric W. Biederman"
    Cc: Herbert Poetzl
    Cc: Kirill Korotaev
    Cc: Sukadev Bhattiprolu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelyanov