20 Jul, 2007

4 commits

  • Slab destructors were no longer supported after Christoph's
    c59def9f222d44bb7e2f0a559f2906191a0862d7 change. They've been
    BUGs for both slab and slub, and slob never supported them
    either.

    This rips out support for the dtor pointer from kmem_cache_create()
    completely and fixes up every single callsite in the kernel (there were
    about 224, not including the slab allocator definitions themselves,
    or the documentation references).

    Signed-off-by: Paul Mundt

    Paul Mundt
     
  • This patch adds an interface to set/reset the flags which determine whether
    each memory segment should be dumped when a core file is generated.

    The /proc/<pid>/coredump_filter file is provided to access the flags. You
    can read or change the flag status for a particular process by reading from
    or writing to the file.

    The flag status is inherited by the child process when it is created.

    Signed-off-by: Hidehiro Kawai
    Cc: Alan Cox
    Cc: David Howells
    Cc: Hugh Dickins
    Cc: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kawai, Hidehiro
     
  • This patch changes mm_struct.dumpable to a pair of bit flags.

    set_dumpable() converts three-value dumpable to two flags and stores it into
    lower two bits of mm_struct.flags instead of mm_struct.dumpable.
    get_dumpable() behaves in the opposite way.
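
    The conversion described above can be sketched in plain C. This is a
    userspace illustration only, not the kernel code: the bit names, the
    sketch_ prefixes and the exact encoding (value 1 sets the low bit, value 2
    sets both) are assumptions made for the example.

```c
#include <assert.h>

/* Hypothetical bit positions standing in for the two flags kept in the
 * low bits of mm_struct.flags. */
#define SKETCH_DUMPABLE_BIT      0u
#define SKETCH_DUMP_SECURELY_BIT 1u
#define SKETCH_DUMPABLE_MASK \
    ((1ul << SKETCH_DUMPABLE_BIT) | (1ul << SKETCH_DUMP_SECURELY_BIT))

/* Store the three-value setting (0, 1 or 2) in the low two bits of *flags,
 * leaving every other bit untouched. */
static void sketch_set_dumpable(unsigned long *flags, int value)
{
    assert(value >= 0 && value <= 2);
    *flags &= ~SKETCH_DUMPABLE_MASK;
    if (value >= 1)
        *flags |= 1ul << SKETCH_DUMPABLE_BIT;
    if (value == 2)
        *flags |= 1ul << SKETCH_DUMP_SECURELY_BIT;
}

/* Recover the three-value setting from the low two bits. */
static int sketch_get_dumpable(unsigned long flags)
{
    unsigned long bits = flags & SKETCH_DUMPABLE_MASK;

    return bits >= 2 ? 2 : (int)bits;
}
```

    Keeping the setting in two bits leaves the rest of the flags word free for
    other per-mm flags.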

    [akpm@linux-foundation.org: export set_dumpable]
    Signed-off-by: Hidehiro Kawai
    Cc: Alan Cox
    Cc: David Howells
    Cc: Hugh Dickins
    Cc: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kawai, Hidehiro
     
  • Optimize show_stat to collect per-irq information just once.

    On x86_64, with newer kernel versions, kstat_irqs is a bit of a problem.
    On every call to kstat_irqs, the process brings in per-cpu data from all
    online cpus. Doing this for NR_IRQS, which is now 256 + 32 * NR_CPUS
    results in (256+32*63) * 63 remote cpu references on a 64 cpu config.
    Considering the fact that we already compute this value per-cpu, we can
    save on the remote references as below.
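
    The idea can be sketched in userspace C. The array below is a hypothetical
    stand-in for the per-cpu interrupt counters, and the sizes and function
    names are invented for the example; it only illustrates folding the
    per-irq sums into the walk that already visits every cpu.

```c
#include <assert.h>

#define SKETCH_NR_CPUS 4
#define SKETCH_NR_IRQS 8

/* Hypothetical stand-in for per-cpu interrupt counters: counts[cpu][irq]. */
typedef unsigned int sketch_counts_t[SKETCH_NR_CPUS][SKETCH_NR_IRQS];

/* Per-irq total the expensive way: one walk over every cpu for each irq. */
static unsigned int sketch_sum_one_irq(sketch_counts_t counts, int irq)
{
    unsigned int sum = 0;

    assert(irq >= 0 && irq < SKETCH_NR_IRQS);
    for (int cpu = 0; cpu < SKETCH_NR_CPUS; cpu++)
        sum += counts[cpu][irq];
    return sum;
}

/* Single pass: while visiting each cpu for the grand total, accumulate the
 * per-irq totals as well, so no second walk over the cpus is needed. */
static unsigned long sketch_collect_once(sketch_counts_t counts,
                                         unsigned int per_irq[SKETCH_NR_IRQS])
{
    unsigned long total = 0;

    for (int irq = 0; irq < SKETCH_NR_IRQS; irq++)
        per_irq[irq] = 0;
    for (int cpu = 0; cpu < SKETCH_NR_CPUS; cpu++) {
        for (int irq = 0; irq < SKETCH_NR_IRQS; irq++) {
            per_irq[irq] += counts[cpu][irq];
            total += counts[cpu][irq];
        }
    }
    return total;
}

/* The reference count quoted in the message: (256 + 32*63) * 63. */
static unsigned long sketch_quoted_remote_refs(void)
{
    return (256ul + 32ul * 63ul) * 63ul;
}
```

    Evaluating the quoted expression gives 2272 * 63 = 143136 remote
    references per read of the stat file.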

    Signed-off-by: Alok N Kataria
    Signed-off-by: Ravikiran Thirumalai
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ravikiran G Thirumalai
     

18 Jul, 2007

1 commit

  • KSYM_NAME_LEN is peculiar in that it does not include the space for the
    trailing '\0', forcing all users to use KSYM_NAME_LEN + 1 when allocating
    a buffer. This is nonsense and error-prone. Moreover, when the caller
    forgets that, it's very likely to bite back subtly by corrupting the stack
    because the last position of the buffer is always cleared to zero.

    This patch increments KSYM_NAME_LEN by one and updates code accordingly.

    * off-by-one bug in asm-powerpc/kprobes.h::kprobe_lookup_name() macro
    is fixed.

    * Where MODULE_NAME_LEN and KSYM_NAME_LEN were used together,
    MODULE_NAME_LEN was treated as if it didn't include space for the
    trailing '\0'. Fix it.
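
    The convention after the change can be illustrated with a small userspace
    sketch. SKETCH_NAME_LEN and sketch_copy_name() are hypothetical names, and
    the value 16 is arbitrary; the point is only that a length constant which
    already counts the terminator lets callers size buffers without "+ 1".

```c
#include <assert.h>
#include <string.h>

/* Hypothetical limit that, like KSYM_NAME_LEN after this patch, already
 * counts the trailing '\0'. */
#define SKETCH_NAME_LEN 16

/* Copy src into a buffer of exactly SKETCH_NAME_LEN bytes, truncating if
 * needed. The terminator lands at index SKETCH_NAME_LEN - 1 at most, so the
 * write always stays inside the buffer. */
static void sketch_copy_name(char dst[SKETCH_NAME_LEN], const char *src)
{
    size_t n;

    assert(dst != NULL && src != NULL);
    n = strlen(src);
    if (n > SKETCH_NAME_LEN - 1)
        n = SKETCH_NAME_LEN - 1;
    memcpy(dst, src, n);
    dst[n] = '\0';
}
```

    A caller declares char buf[SKETCH_NAME_LEN] and passes it directly; with
    the old convention the same declaration would have been one byte short.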

    Signed-off-by: Tejun Heo
    Acked-by: Paulo Marques
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tejun Heo
     

17 Jul, 2007

9 commits

  • * git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched:
    [PATCH] sched: fix up fs/proc/array.c whitespace problems
    [PATCH] sched: prettify prio_to_wmult[]
    [PATCH] sched: document prio_to_wmult[]
    [PATCH] sched: improve weight-array comments
    [PATCH] sched: remove dead code from task_stime()

    Fixed up trivial conflict in fs/proc/array.c

    Linus Torvalds
     
  • This reduces the memory footprint and it enforces that only the current
    task can enable seccomp on itself (this is a requirement for a
    straightforward [modulo preempt ;) ] TIF_NOTSC implementation).

    Signed-off-by: Andrea Arcangeli
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrea Arcangeli
     
  • Make available to the user the following task and process performance
    statistics:

    * Involuntary Context Switches (task_struct->nivcsw)
    * Voluntary Context Switches (task_struct->nvcsw)

    Statistics information is available from:
    1. taskstats interface (Documentation/accounting/)
    2. /proc/PID/status (task only).

    This data is useful for detecting hyperactivity patterns between processes.
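
    A reader of /proc/PID/status could pick the counters out as sketched
    below. The field names voluntary_ctxt_switches and
    nonvoluntary_ctxt_switches are the ones current kernels print; they are
    not quoted in this message, so treat them as an assumption here. The
    helper name is hypothetical and the function works on text already read
    from the file.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Find "<key>:" at the start of a line in status-style text and parse the
 * unsigned number after it. Returns 0 on success, -1 if the key is absent
 * or its value is malformed. */
static int sketch_status_field(const char *text, const char *key,
                               unsigned long *out)
{
    size_t klen;

    assert(text != NULL && key != NULL && out != NULL);
    klen = strlen(key);
    for (const char *line = text; line != NULL && *line != '\0'; ) {
        if (strncmp(line, key, klen) == 0 && line[klen] == ':')
            return sscanf(line + klen + 1, "%lu", out) == 1 ? 0 : -1;
        line = strchr(line, '\n');   /* advance to the next line, if any */
        if (line != NULL)
            line++;
    }
    return -1;
}
```

    Sampling both counters twice and comparing the deltas is one way to spot
    tasks that are being preempted far more often than they yield.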

    [akpm@linux-foundation.org: cleanup]
    Signed-off-by: Maxim Uvarov
    Cc: Shailabh Nagar
    Cc: Balbir Singh
    Cc: Jay Lan
    Cc: Jonathan Lim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Maxim Uvarov
     
  • It's a bit dopey-looking and can permit a task to cause a pagefault in an mm
    which it doesn't have permission to read from.

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • Function proc_register() will assign proc_dir_operations and
    proc_dir_inode_operations to ent's members proc_fops and proc_iops
    correctly if ent is a directory. So the early assignment isn't
    necessary.

    Cc: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Changli Gao
     
  • Simple and stupid like some previous ones. Just use new API.

    Signed-off-by: Pavel Emelianov
    Cc: Alan Cox
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelianov
     
  • Commit 411187fb05cd11676b0979d9fbf3291db69dbce2 caused uptime not to increase
    during suspend. This may cause confusion so I restore the old behaviour by
    using the boot based time instead of monotonic for uptime.

    Signed-off-by: Tomas Janousek
    Acked-by: John Stultz
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tomas Janousek
     
  • Commit 411187fb05cd11676b0979d9fbf3291db69dbce2 caused boot time to move and
    process start times to become invalid after suspend. Using boot based time
    for those restores the old behaviour and fixes the issue.

    [akpm@linux-foundation.org: little cleanup]
    Signed-off-by: Tomas Janousek
    Cc: Tomas Smetana
    Acked-by: John Stultz
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tomas Janousek
     
  • Fix the following races:
    ===========================================
    1. Write via ->write_proc sleeps in copy_from_user(). Module disappears
       meanwhile. Or, more generically, system call done on /proc file, method
       supplied by module is called, module disappears meanwhile.

       pde = create_proc_entry()
       if (!pde)
               return -ENOMEM;
       pde->write_proc = ...
                                       open
                                       write
                                       copy_from_user
       pde = create_proc_entry();
       if (!pde) {
               remove_proc_entry();
               return -ENOMEM;
               /* module unloaded */
       }
                                       *boom*
    ==========================================
    2. bogo-revoke aka proc_kill_inodes()

       remove_proc_entry               vfs_read
       proc_kill_inodes                [check ->f_op validness]
                                       [check ->f_op->read validness]
                                       [verify_area, security permissions checks]
       ->f_op = NULL;
                                       if (file->f_op->read)
                                               /* ->f_op dereference, boom */

    NOTE, NOTE, NOTE: file_operations are proxied for regular files only. Let's
    see how this scheme behaves, then extend if needed for directories.
    Directories creators in /proc only set ->owner for them, so proxying for
    directories may be unneeded.

    NOTE, NOTE, NOTE: methods being proxied are ->llseek, ->read, ->write,
    ->poll, ->unlocked_ioctl, ->ioctl, ->compat_ioctl, ->open, ->release.
    If your in-tree module uses something else, yell at me. Full audit pending.
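
    The proxying idea can be sketched in userspace C. Everything here is
    hypothetical: the struct and function names, the use counter and the -EIO
    return value are assumptions made for illustration, and the locking that a
    real implementation needs around ops and users is deliberately left out,
    so this shows only the bookkeeping, not a race-free implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical miniature of a /proc entry whose methods live in a module. */
struct sketch_ops {
    long (*read)(char *buf, size_t len);
};

struct sketch_entry {
    const struct sketch_ops *ops; /* set to NULL once removal has begun */
    int users;                    /* calls currently inside a module method */
};

/* Proxy for ->read: refuse new calls after removal and count calls in
 * flight. A real implementation would guard ops and users with a lock. */
static long sketch_proxy_read(struct sketch_entry *e, char *buf, size_t len)
{
    long (*read)(char *, size_t);
    long ret;

    assert(e != NULL);
    if (e->ops == NULL)
        return -EIO;
    read = e->ops->read;          /* copy the method before calling out */
    e->users++;
    ret = read ? read(buf, len) : -EIO;
    e->users--;
    return ret;
}

/* Removal side: detach the methods and report how many calls must still
 * drain before the module text may go away. */
static int sketch_begin_remove(struct sketch_entry *e)
{
    assert(e != NULL);
    e->ops = NULL;
    return e->users;
}

/* Example module method: writes one byte when there is room. */
static long sketch_module_read(char *buf, size_t len)
{
    if (len == 0)
        return 0;
    buf[0] = 'x';
    return 1;
}
```

    Because callers only ever hold the proxy, removal can cut off new calls
    at one point and wait for the counter to reach zero.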

    [akpm@linux-foundation.org: build fix]
    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     

16 Jul, 2007

2 commits


10 Jul, 2007

4 commits


17 May, 2007

1 commit

  • SLAB_CTOR_CONSTRUCTOR is always specified. No point in checking it.

    Signed-off-by: Christoph Lameter
    Cc: David Howells
    Cc: Jens Axboe
    Cc: Steven French
    Cc: Michael Halcrow
    Cc: OGAWA Hirofumi
    Cc: Miklos Szeredi
    Cc: Steven Whitehouse
    Cc: Roman Zippel
    Cc: David Woodhouse
    Cc: Dave Kleikamp
    Cc: Trond Myklebust
    Cc: "J. Bruce Fields"
    Cc: Anton Altaparmakov
    Cc: Mark Fasheh
    Cc: Paul Mackerras
    Cc: Christoph Hellwig
    Cc: Jan Kara
    Cc: David Chinner
    Cc: "David S. Miller"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     

09 May, 2007

18 commits

  • /proc/pid/clear_refs is only defined in the CONFIG_MMU case, so make sure we
    don't have any references to clear_refs_smap() in generic procfs code.

    Signed-off-by: David Rientjes
    Signed-off-by: David Howells
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Rientjes
     
  • Cleanup using simple_read_from_buffer() in procfs.

    Signed-off-by: Akinobu Mita
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Akinobu Mita
     
  • notify_change() already calls security_inode_setattr() before
    calling iop->setattr.

    Alan sayeth

    This is a behaviour change on all of these and limits some behaviour of
    existing established security modules

    When inode_change_ok is called it has side effects. This includes
    clearing the SGID bit on attribute changes caused by chmod. If you make
    this change the results of some rulesets may be different before or after
    the change is made.

    I'm not saying the change is wrong but it does change behaviour so that
    needs looking at closely (ditto all other attribute twiddles)

    Signed-off-by: Steve Beattie
    Signed-off-by: Andreas Gruenbacher
    Signed-off-by: John Johansen
    Acked-by: Stephen Smalley
    Cc: James Morris
    Cc: Chris Wright
    Cc: Alan Cox
    Cc: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    John Johansen
     
  • notify_change() already calls security_inode_setattr() before
    calling iop->setattr.

    Signed-off-by: Tony Jones
    Signed-off-by: Andreas Gruenbacher
    Signed-off-by: John Johansen
    Acked-by: Stephen Smalley
    Cc: James Morris
    Cc: Chris Wright
    Cc: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    John Johansen
     
  • We can save some lines of code by using seq_release_private().

    Signed-off-by: Martin Peschke
    Cc: "Eric W. Biederman"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Martin Peschke
     
  • kallsyms_lookup() can go iterating over modules list unprotected which is OK
    for emergency situations (oops), but not OK for regular stuff like
    /proc/*/wchan.

    Introduce lookup_symbol_name()/lookup_module_symbol_name() which copy symbol
    name into caller-supplied buffer or return -ERANGE. All copying is done with
    module_mutex held, so...

    Signed-off-by: Alexey Dobriyan
    Cc: Rusty Russell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • Several kallsyms_lookup() callers pass dummy arguments but only need, say,
    the module's name. Make kallsyms_lookup() accept NULLs where possible.

    This also makes the picture clearer about which interfaces are needed for
    all the symbol resolving business.

    Signed-off-by: Alexey Dobriyan
    Cc: Rusty Russell
    Acked-by: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • Remove includes of where it is not used/needed.
    Suggested by Al Viro.

    Builds cleanly on x86_64, i386, alpha, ia64, powerpc, sparc,
    sparc64, and arm (all 59 defconfigs).

    Signed-off-by: Randy Dunlap
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Randy Dunlap
     
  • Additions to and removals from the tty_drivers list were done without any
    locking, as was iterating over it for /proc/tty/drivers generation.

    testing: modprobe/rmmod loop of simple module which does nothing but
    tty_register_driver() vs cat /proc/tty/drivers loop

    BUG: unable to handle kernel paging request at virtual address 6b6b6b6b
    printing eip:
    c01cefa7
    *pde = 00000000
    Oops: 0000 [#1]
    PREEMPT
    last sysfs file: devices/pci0000:00/0000:00:1d.7/usb5/5-0:1.0/bInterfaceProtocol
    Modules linked in: ohci_hcd af_packet e1000 ehci_hcd uhci_hcd usbcore xfs
    CPU: 0
    EIP: 0060:[] Not tainted VLI
    EFLAGS: 00010297 (2.6.21-rc4-mm1 #4)
    EIP is at vsnprintf+0x3a4/0x5fc
    eax: 6b6b6b6b ebx: f6cb50f2 ecx: 6b6b6b6b edx: fffffffe
    esi: c0354700 edi: f6cb6000 ebp: 6b6b6b6b esp: f31f5e68
    ds: 007b es: 007b fs: 00d8 gs: 0033 ss: 0068
    Process cat (pid: 31864, ti=f31f4000 task=c1998030 task.ti=f31f4000)
    Stack: 00000000 c0103f20 c013003a c0103f20 00000000 f6cb50da 0000000a 00000f0e
    f6cb50f2 00000010 00000014 ffffffff ffffffff 00000007 c0354753 f6cb50f2
    f73e39dc f73e39dc 00000001 c0175416 f31f5ed8 f31f5ed4 0ee00000 f32090bc
    Call Trace:
    [] restore_nocheck+0x12/0x15
    [] mark_held_locks+0x6d/0x86
    [] restore_nocheck+0x12/0x15
    [] seq_printf+0x2e/0x52
    [] show_tty_range+0x35/0x1f3
    [] seq_printf+0x2e/0x52
    [] show_tty_driver+0x8a/0x1d9
    [] seq_read+0x70/0x2ba
    [] seq_read+0x0/0x2ba
    [] proc_reg_read+0x63/0x9f
    [] vfs_read+0x7d/0xb5
    [] proc_reg_read+0x0/0x9f
    [] sys_read+0x41/0x6a
    [] sysenter_past_esp+0x5f/0x99
    =======================
    Code: 00 8b 4d 04 e9 44 ff ff ff 8d 4d 04 89 4c 24 50 8b 6d 00 81 fd ff 0f 00 00 b8 a4 c1 35 c0 0f 46 e8 8b 54 24 2c 89 e9 89 c8 eb 06 38 00 74 07 40 4a 83 fa ff 75 f4 29 c8 89 c6 8b 44 24 28 89
    EIP: [] vsnprintf+0x3a4/0x5fc SS:ESP 0068:f31f5e68

    Signed-off-by: Alexey Dobriyan
    Cc: Alan Cox
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • Eternal quest to make

    while true; do cat /proc/fs/xfs/stat >/dev/null 2>/dev/null; done
    while true; do find /proc -type f 2>/dev/null | xargs cat >/dev/null 2>/dev/null; done
    while true; do modprobe xfs; rmmod xfs; done

    work reliably continues and now kernel oopses in the following way:

    BUG: unable to handle ... at virtual address 6b6b6b6b
    EIP is at badness
    process: cat
    proc_oom_score
    proc_info_read
    sys_fstat64
    vfs_read
    proc_info_read
    sys_read

    Failing code is prefetch hidden in list_for_each_entry() in badness().
    badness() is reachable from two points. One is proc_oom_score, another
    is out_of_memory() => select_bad_process() => badness().

    Second path grabs tasklist_lock, while first doesn't.

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • Add support for finding out the current file position, open flags and
    possibly other info in the future.

    These new entries are added:

    /proc/PID/fdinfo/FD
    /proc/PID/task/TID/fdinfo/FD

    For each fd the information is provided in the following format:

    pos: 1234
    flags: 0100002
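
    A consumer of these files might parse the two fields as sketched below.
    The function name is hypothetical, and the code assumes only the two
    fields shown above, in that order; flags is printed in octal, so it is
    read back with %o.

```c
#include <assert.h>
#include <stdio.h>

/* Parse the "pos:" and "flags:" fields from fdinfo-style text. Whitespace
 * in the format matches any run of blanks, tabs or newlines in the input.
 * Returns 0 on success, -1 if either field could not be read. */
static int sketch_parse_fdinfo(const char *text, long long *pos,
                               unsigned int *flags)
{
    assert(text != NULL && pos != NULL && flags != NULL);
    return sscanf(text, "pos: %lld flags: %o", pos, flags) == 2 ? 0 : -1;
}
```

    In the sample above, the low two bits of flags are 02, which is the
    access mode O_RDWR on Linux.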

    [bunk@stusta.de: make struct proc_fdinfo_file_operations static]
    Signed-off-by: Miklos Szeredi
    Cc: Alexey Dobriyan
    Signed-off-by: Adrian Bunk
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Miklos Szeredi
     
  • Change the order of fields of struct pid_entry (file fs/proc/base.c) in order
    to avoid a hole on 64bit archs. (8 bytes saved per object)

    Also change all pid_entry arrays to be const qualified, to make clear they
    must not be modified.

    Before (on x86_64) :

    # size fs/proc/base.o
       text    data     bss     dec     hex filename
      15549    2192       0   17741    454d fs/proc/base.o

    After :

    # size fs/proc/base.o
       text    data     bss     dec     hex filename
      17229     176       0   17405    43fd fs/proc/base.o

    That's 336 bytes saved on kernel size on x86_64
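
    How a reordering removes a hole can be shown with two hypothetical
    structs. These are not the real struct pid_entry; the members are invented
    so that each 4-byte field is followed by a pointer in the first layout.
    The sizes stated in the comments assume a typical 64-bit ABI with 8-byte
    pointers and 4-byte int.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layout: each 4-byte member is followed by a pointer, so on
 * typical 64-bit ABIs the compiler pads after both of them. */
struct sketch_holey {
    int len;
    const char *name;
    unsigned int mode;
    const void *ops;
};

/* Same members with the two 4-byte fields adjacent: they share one 8-byte
 * slot and the padding disappears. */
struct sketch_packed {
    int len;
    unsigned int mode;
    const char *name;
    const void *ops;
};

/* Bytes saved per object by the reordering on the ABI being compiled for. */
static size_t sketch_bytes_saved(void)
{
    assert(sizeof(struct sketch_holey) >= sizeof(struct sketch_packed));
    return sizeof(struct sketch_holey) - sizeof(struct sketch_packed);
}
```

    On such a 64-bit target the first struct occupies 32 bytes and the second
    24, matching the 8 bytes per object mentioned above; on 32-bit targets the
    two layouts are usually the same size.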

    Signed-off-by: Eric Dumazet
    Acked-by: "Eric W. Biederman"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric Dumazet
     
  • The /proc/pid/ "maps", "smaps", and "numa_maps" files contain sensitive
    information about the memory location and usage of processes. Issues:

    - maps should not be world-readable, especially if programs expect any
    kind of ASLR protection from local attackers.
    - maps cannot just be 0400 because "-D_FORTIFY_SOURCE=2 -O2" makes glibc
    check the maps when %n is in a *printf call, and a setuid(getuid())
    process wouldn't be able to read its own maps file. (For reference
    see http://lkml.org/lkml/2006/1/22/150)
    - a system-wide toggle is needed to allow prior behavior in the case of
    non-root applications that depend on access to the maps contents.

    This change implements a check using "ptrace_may_attach" before allowing
    access to read the maps contents. To control this protection, the new knob
    /proc/sys/kernel/maps_protect has been added, with corresponding updates to
    the procfs documentation.

    [akpm@linux-foundation.org: build fixes]
    [akpm@linux-foundation.org: New sysctl numbers are old hat]
    Signed-off-by: Kees Cook
    Cc: Arjan van de Ven
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kees Cook
     
  • WARN_ON(de && de->deleted); is sooo unreliable. Why?

    proc_lookup                         remove_proc_entry
    ===========                         =================
    lock_kernel();
    spin_lock(&proc_subdir_lock);
    [find proc entry]
    spin_unlock(&proc_subdir_lock);
                                        spin_lock(&proc_subdir_lock);
                                        [find proc entry]

    proc_get_inode
    ==============
    WARN_ON(de && de->deleted);         ...

                                        if (!atomic_read(&de->count))
                                                free_proc_entry(de);
                                        else
                                                de->deleted = 1;

    So, if you have some strange oops [1] and don't see this WARN_ON, it means
    nothing.

    [1] try_module_get() of module which doesn't exist, two lines below
    should suffice, or not?

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • Fix the following race:

    proc_readdir                        remove_proc_entry
    ============                        =================

    spin_lock(&proc_subdir_lock);
    [choose PDE to start filldir from]
    spin_unlock(&proc_subdir_lock);
                                        spin_lock(&proc_subdir_lock);
                                        [find PDE]
                                        [free PDE, refcount is 0]
                                        spin_unlock(&proc_subdir_lock);
    /* boom */
    if (filldir(dirent, de->name, ...

    [de_put on error path --adobriyan]
    Signed-off-by: Darrick J. Wong
    Signed-off-by: Alexey Dobriyan
    Cc: "Eric W. Biederman"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Darrick J. Wong
     
  • proc_lookup                         remove_proc_entry
    ===========                         =================

    lock_kernel();
    spin_lock(&proc_subdir_lock);
    [find PDE with refcount 0]
    spin_unlock(&proc_subdir_lock);
                                        spin_lock(&proc_subdir_lock);
                                        [find PDE with refcount 0]
                                        [check refcount and free PDE]
                                        spin_unlock(&proc_subdir_lock);
    proc_get_inode:
            de_get(de); /* boom */

    Signed-off-by: Alexey Dobriyan
    Cc: "Eric W. Biederman"
    Cc: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • This past week I was playing around with that pahole tool
    (http://oops.ghostprotocols.net:81/acme/dwarves/) and looking at the size
    of various structs in the kernel. I was surprised by the size of the
    task_struct on x86_64, approaching 4K. I looked through the fields in
    task_struct and found that a number of them were declared as "unsigned
    long" rather than "unsigned int" despite them appearing okay as 32-bit
    sized fields. On x86_64 "unsigned long" ends up being 8 bytes in size and
    forces 8 byte alignment. Is there a reason they are "unsigned long"?

    The patch below drops the size of the struct from 3808 bytes (60 64-byte
    cachelines) to 3760 bytes (59 64-byte cachelines). A couple other fields
    in the task struct take a significant amount of space:

    struct thread_struct thread;         688
    struct held_lock held_locks[30];    1680

    CONFIG_LOCKDEP is turned on in the .config
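
    The cacheline figures above follow from rounding the size up to a
    multiple of 64. A small sketch of that arithmetic, with hypothetical
    names and assuming the struct starts on a cacheline boundary:

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_CACHELINE 64u

/* Number of 64-byte cachelines covered by an object of the given size,
 * assuming it starts on a cacheline boundary. */
static size_t sketch_cachelines(size_t bytes)
{
    assert(bytes > 0);
    return (bytes + SKETCH_CACHELINE - 1) / SKETCH_CACHELINE;
}
```

    sketch_cachelines(3808) is 60 and sketch_cachelines(3760) is 59, so the
    48 bytes saved are just enough to drop one cacheline.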

    [akpm@linux-foundation.org: fix printk warnings]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    William Cohen
     
  • /proc/$PID/fd has r-x------ permissions, so if process does setuid(), it
    will not be able to access /proc/*/fd/. This breaks fstatat() emulation
    in glibc.

    open("foo", O_RDONLY|O_DIRECTORY) = 4
    setuid32(65534) = 0
    stat64("/proc/self/fd/4/bar", 0xbfafb298) = -1 EACCES (Permission denied)

    Signed-off-by: Alexey Dobriyan
    Cc: "Eric W. Biederman"
    Cc: James Morris
    Cc: Chris Wright
    Cc: Ulrich Drepper
    Cc: Oleg Nesterov
    Acked-By: Kirill Korotaev
    Cc: Al Viro
    Cc: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     

08 May, 2007

1 commit

  • I have never seen a use of SLAB_DEBUG_INITIAL. It is only supported by
    SLAB.

    I think its purpose was to have a callback after an object has been freed
    to verify that the state is the constructor state again? The callback is
    performed before each freeing of an object.

    I would think that it is much easier to check the object state manually
    before the free. That also places the check near the code that
    manipulates the object.

    Also the SLAB_DEBUG_INITIAL callback is only performed if the kernel was
    compiled with SLAB debugging on. If there would be code in a constructor
    handling SLAB_DEBUG_INITIAL then it would have to be conditional on
    SLAB_DEBUG otherwise it would just be dead code. But there is no such code
    in the kernel. I think SLAB_DEBUG_INITIAL is too problematic to make real
    use of, difficult to understand and there are easier ways to accomplish the
    same effect (i.e. add debug code before kfree).

    There is a related flag SLAB_CTOR_VERIFY that is frequently checked to be
    clear in fs inode caches. Remove the pointless checks (they would even be
    pointless without removal of SLAB_DEBUG_INITIAL) from the fs constructors.

    This is the last slab flag that SLUB did not support. Remove the check for
    unimplemented flags from SLUB.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter