09 May, 2007

3 commits

  • Remove includes of where it is not used/needed.
    Suggested by Al Viro.

    Builds cleanly on x86_64, i386, alpha, ia64, powerpc, sparc,
    sparc64, and arm (all 59 defconfigs).

    Signed-off-by: Randy Dunlap
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Randy Dunlap
     
  • sys_clone() and sys_unshare() both makes copies of nsproxy and its associated
    namespaces. But they have different code paths.

    This patch merges all the nsproxy and its associated namespace copy/clone
    handling (as much as possible). Posted on container list earlier for
    feedback.

    - Create a new nsproxy and its associated namespaces and pass it back to
    caller to attach it to right process.

    - Changed all copy_*_ns() routines to return a new copy of namespace
    instead of attaching it to task->nsproxy.

    - Moved the CAP_SYS_ADMIN checks out of copy_*_ns() routines.

    - Removed unnessary !ns checks from copy_*_ns() and added BUG_ON()
    just incase.

    - Get rid of all individual unshare_*_ns() routines and make use of
    copy_*_ns() instead.

    [akpm@osdl.org: cleanups, warning fix]
    [clg@fr.ibm.com: remove dup_namespaces() declaration]
    [serue@us.ibm.com: fix CONFIG_IPC_NS=n, clone(CLONE_NEWIPC) retval]
    [akpm@linux-foundation.org: fix build with CONFIG_SYSVIPC=n]
    Signed-off-by: Badari Pulavarty
    Signed-off-by: Serge Hallyn
    Cc: Cedric Le Goater
    Cc: "Eric W. Biederman"
    Cc:
    Signed-off-by: Cedric Le Goater
    Cc: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Badari Pulavarty
     
  • The value of shmmax may be larger than will fit in the struct used by
    the 32bit compat version of sys_shmctl. This change mirrors what the
    normal sys_shmctl does when called with the old IPC_INFO command.

    Signed-off-by: Guy Streeter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Guy Streeter
     

08 May, 2007

1 commit

  • I have never seen a use of SLAB_DEBUG_INITIAL. It is only supported by
    SLAB.

    I think its purpose was to have a callback after an object has been freed
    to verify that the state is the constructor state again? The callback is
    performed before each freeing of an object.

    I would think that it is much easier to check the object state manually
    before the free. That also places the check near the code object
    manipulation of the object.

    Also the SLAB_DEBUG_INITIAL callback is only performed if the kernel was
    compiled with SLAB debugging on. If there would be code in a constructor
    handling SLAB_DEBUG_INITIAL then it would have to be conditional on
    SLAB_DEBUG otherwise it would just be dead code. But there is no such code
    in the kernel. I think SLUB_DEBUG_INITIAL is too problematic to make real
    use of, difficult to understand and there are easier ways to accomplish the
    same effect (i.e. add debug code before kfree).

    There is a related flag SLAB_CTOR_VERIFY that is frequently checked to be
    clear in fs inode caches. Remove the pointless checks (they would even be
    pointless without removeal of SLAB_DEBUG_INITIAL) from the fs constructors.

    This is the last slab flag that SLUB did not support. Remove the check for
    unimplemented flags from SLUB.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     

28 Mar, 2007

1 commit

  • When CONFIG_IPC_NS=n, clone(CLONE_NEWIPC) claims success, but did not actually
    clone a new IPC namespace.

    Fix this to return -EINVAL so the caller knows his request was denied.

    Signed-off-by: Serge E. Hallyn
    Cc: "Eric W. Biederman"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Serge E. Hallyn
     

07 Mar, 2007

1 commit


02 Mar, 2007

2 commits

  • This patch provides the following hugetlb-related fixes to the recent stacked
    shm files changes:
    - Update is_file_hugepages() so it will reconize hugetlb shm segments.
    - get_unmapped_area must be called with the nested file struct to handle
    the sfd->file->f_ops->get_unmapped_area == NULL case.
    - The fsync f_op must be wrapped since it is specified in the hugetlbfs
    f_ops.

    This is based on proposed fixes from Eric Biederman that were debugged and
    tested by me. Without it, attempting to use hugetlb shared memory segments
    on powerpc (and likely ia64) will kill your box.

    Signed-off-by: Adam Litke
    Cc: Eric Biederman
    Cc: Andrew Morton
    Acked-by: William Irwin
    Signed-off-by: Linus Torvalds

    Adam Litke
     
  • shm_nopage() can become static.

    Signed-off-by: Adrian Bunk
    Acked-by: Eric W. Biederman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Adrian Bunk
     

21 Feb, 2007

1 commit

  • The current ipc shared memory code runs into several problems because it
    does not quite use files like the rest of the kernel. With the option of
    backing ipc shared memory with either hugetlbfs or ordinary shared memory
    the problems got worse. With the added support for ipc namespaces things
    behaved so unexpected that we now have several bad namespace reference
    counting bugs when using what appears at first glance to be a reasonable
    idiom.

    So to attack these problems and hopefully make the code more maintainable
    this patch simply uses the files provided by other parts of the kernel and
    builds it's own files out of them. The shm files are allocated in do_shmat
    and freed when their reference count drops to zero with their last unmap.
    The file and vm operations that we don't want to implement or we don't
    implement completely we just delegate to the operations of our backing
    file.

    This means that we now get an accurate shm_nattch count for we have a
    hugetlbfs inode for backing store, and the shm accounting of last attach
    and last detach time work as well.

    This means that getting a reference to the ipc namespace when we create the
    file and dropping the referenece in the release method is now safe and
    correct.

    This means we no longer need a special case for clearing VM_MAYWRITE
    as our file descriptor now only has write permissions when we have
    requested write access when calling shmat. Although VM_SHARED is now
    cleared as well which I believe is harmless and is mostly likely a
    minor bug fix.

    By using the same set of operations for both the hugetlb case and regular
    shared memory case shmdt is not simplified and made slightly more correct
    as now the test "vma->vm_ops == &shm_vm_ops" is 100% accurate in spotting
    all shared memory regions generated from sysvipc shared memory.

    Signed-off-by: Eric W. Biederman
    Cc: Michal Piotrowski
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric W. Biederman
     

15 Feb, 2007

2 commits

  • The semantic effect of insert_at_head is that it would allow new registered
    sysctl entries to override existing sysctl entries of the same name. Which is
    pain for caching and the proc interface never implemented.

    I have done an audit and discovered that none of the current users of
    register_sysctl care as (excpet for directories) they do not register
    duplicate sysctl entries.

    So this patch simply removes the support for overriding existing entries in
    the sys_sysctl interface since no one uses it or cares and it makes future
    enhancments harder.

    Signed-off-by: Eric W. Biederman
    Acked-by: Ralf Baechle
    Acked-by: Martin Schwidefsky
    Cc: Russell King
    Cc: David Howells
    Cc: "Luck, Tony"
    Cc: Ralf Baechle
    Cc: Paul Mackerras
    Cc: Martin Schwidefsky
    Cc: Andi Kleen
    Cc: Jens Axboe
    Cc: Corey Minyard
    Cc: Neil Brown
    Cc: "John W. Linville"
    Cc: James Bottomley
    Cc: Jan Kara
    Cc: Trond Myklebust
    Cc: Mark Fasheh
    Cc: David Chinner
    Cc: "David S. Miller"
    Cc: Patrick McHardy
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric W. Biederman
     
  • This is just a simple cleanup to keep kernel/sysctl.c from getting to crowded
    with special cases, and by keeping all of the ipc logic to together it makes
    the code a little more readable.

    [gcoady.lk@gmail.com: build fix]
    Signed-off-by: Eric W. Biederman
    Cc: Serge E. Hallyn
    Cc: Herbert Poetzl
    Cc: Kirill Korotaev
    Signed-off-by: Grant Coady
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric W. Biederman
     

13 Feb, 2007

3 commits

  • Many struct inode_operations in the kernel can be "const". Marking them const
    moves these to the .rodata section, which avoids false sharing with potential
    dirty data. In addition it'll catch accidental writes at compile time to
    these shared resources.

    Signed-off-by: Arjan van de Ven
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arjan van de Ven
     
  • Many struct file_operations in the kernel can be "const". Marking them const
    moves these to the .rodata section, which avoids false sharing with potential
    dirty data. In addition it'll catch accidental writes at compile time to
    these shared resources.

    Signed-off-by: Arjan van de Ven
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arjan van de Ven
     
  • The problem we were assuming that current->nsproxy->ipc_ns would never
    change while someone has our file in /proc/sysvipc/ file open. Given that
    this can change with both unshare and by passing the file descriptor to
    another process that assumption is occasionally wrong.

    Therefore this patch causes /proc/sysvipc/* to cache the namespace and
    increment it's count when we open the file and to decrement the count when
    we close the file, ensuring consistent operation with no surprises.

    Signed-off-by: Eric W. Biederman
    Cc: Serge E. Hallyn
    Cc: Herbert Poetzl
    Cc: Kirill Korotaev
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric W. Biederman
     

12 Feb, 2007

1 commit

  • A variety of (mostly) innocuous fixes to the embedded kernel-doc content in
    source files, including:

    * make multi-line initial descriptions single line
    * denote some function names, constants and structs as such
    * change erroneous opening '/*' to '/**' in a few places
    * reword some text for clarity

    Signed-off-by: Robert P. J. Day
    Cc: "Randy.Dunlap"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Robert P. J. Day
     

24 Jan, 2007

1 commit


14 Dec, 2006

1 commit

  • Run this:

    #!/bin/sh
    for f in $(grep -Erl "\([^\)]*\) *k[cmz]alloc" *) ; do
    echo "De-casting $f..."
    perl -pi -e "s/ ?= ?\([^\)]*\) *(k[cmz]alloc) *\(/ = \1\(/" $f
    done

    And then go through and reinstate those cases where code is casting pointers
    to non-pointers.

    And then drop a few hunks which conflicted with outstanding work.

    Cc: Russell King , Ian Molton
    Cc: Mikael Starvik
    Cc: Yoshinori Sato
    Cc: Roman Zippel
    Cc: Geert Uytterhoeven
    Cc: Ralf Baechle
    Cc: Paul Mackerras
    Cc: Kyle McMartin
    Cc: Benjamin Herrenschmidt
    Cc: Martin Schwidefsky
    Cc: "David S. Miller"
    Cc: Jeff Dike
    Cc: Greg KH
    Cc: Jens Axboe
    Cc: Paul Fulghum
    Cc: Alan Cox
    Cc: Karsten Keil
    Cc: Mauro Carvalho Chehab
    Cc: Jeff Garzik
    Cc: James Bottomley
    Cc: Ian Kent
    Cc: Steven French
    Cc: David Woodhouse
    Cc: Neil Brown
    Cc: Jaroslav Kysela
    Cc: Takashi Iwai
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Robert P. J. Day
     

09 Dec, 2006

1 commit


08 Dec, 2006

4 commits

  • Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Burman Yan
     
  • Currently we allocate 64k space on the user stack and use it the msgbuf for
    sys_{msgrcv,msgsnd} for compat and the results are later copied in user [
    by copy_in_user]. This patch introduces helper routines for
    sys_{msgrcv,msgsnd} as below:

    do_msgsnd() : Accepts the mtype and user space ptr to the buffer along with
    the msqid and msgflg.

    do_msgrcv() : Accepts a kernel space ptr to mtype and a userspace ptr to
    the buffer. The mtype has to be copied back the user space msgbuf by the
    caller.

    These changes avoid the need to allocate the msgsize on the userspace (
    thus removing the size limt ) and the overhead of an extra copy_in_user().

    Signed-off-by: Suzuki K P
    Cc: Arnd Bergmann
    Cc: "David S. Miller"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    suzuki
     
  • Replace all uses of kmem_cache_t with struct kmem_cache.

    The patch was generated using the following script:

    #!/bin/sh
    #
    # Replace one string by another in all the kernel sources.
    #

    set -e

    for file in `find * -name "*.c" -o -name "*.h"|xargs grep -l $1`; do
    quilt add $file
    sed -e "1,\$s/$1/$2/g" $file >/tmp/$$
    mv /tmp/$$ $file
    quilt refresh
    done

    The script was run like this

    sh replace kmem_cache_t "struct kmem_cache"

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • SLAB_KERNEL is an alias of GFP_KERNEL.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     

22 Nov, 2006

1 commit

  • Pass the work_struct pointer to the work function rather than context data.
    The work function can use container_of() to work out the data.

    For the cases where the container of the work_struct may go away the moment the
    pending bit is cleared, it is made possible to defer the release of the
    structure by deferring the clearing of the pending bit.

    To make this work, an extra flag is introduced into the management side of the
    work_struct. This governs auto-release of the structure upon execution.

    Ordinarily, the work queue executor would release the work_struct for further
    scheduling or deallocation by clearing the pending bit prior to jumping to the
    work function. This means that, unless the driver makes some guarantee itself
    that the work_struct won't go away, the work function may not access anything
    else in the work_struct or its container lest they be deallocated.. This is a
    problem if the auxiliary data is taken away (as done by the last patch).

    However, if the pending bit is *not* cleared before jumping to the work
    function, then the work function *may* access the work_struct and its container
    with no problems. But then the work function must itself release the
    work_struct by calling work_release().

    In most cases, automatic release is fine, so this is the default. Special
    initiators exist for the non-auto-release case (ending in _NAR).

    Signed-Off-By: David Howells

    David Howells
     

05 Nov, 2006

1 commit

  • Commit 5a06a363ef48444186f18095ae1b932dddbbfa89 ("[PATCH] ipc/msg.c:
    clean up coding style") breaks fakeroot on Alpha (variously hangs or
    oopses), according to a report by Falk Hueffner.

    The fact that the code seems to rely on compiler access ordering through
    the use of "volatile" is a pretty certain sign that the code has locking
    problems, and we should fix those properly and then remove the whole
    "volatile" entirely.

    But in the meantime, the movement of "volatile" was unintentional, and
    should be reverted.

    Cc: Falk Hueffner
    Cc: Andrew Morton
    Acked-by: Ingo Molnar
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

04 Nov, 2006

1 commit

  • Fix two issuses related to ipc_ids->entries freeing.

    1. When freeing ipc namespace we need to free entries allocated
    with ipc_init_ids().

    2. When removing old entries in grow_ary() ipc_rcu_putref()
    may be called on entries set to &ids->nullentry earlier in
    ipc_init_ids().
    This is almost impossible without namespaces, but with
    them this situation becomes possible.

    Found during OpenVZ testing after obvious leaks in beancounters.

    Signed-off-by: Pavel Emelianov
    Cc: Kirill Korotaev
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelianov
     

04 Oct, 2006

2 commits


02 Oct, 2006

6 commits


01 Oct, 2006

2 commits

  • This is mostly included for parity with dec_nlink(), where we will have some
    more hooks. This one should stay pretty darn straightforward for now.

    Signed-off-by: Dave Hansen
    Acked-by: Christoph Hellwig
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Hansen
     
  • When a filesystem decrements i_nlink to zero, it means that a write must be
    performed in order to drop the inode from the filesystem.

    We're shortly going to have keep filesystems from being remounted r/o between
    the time that this i_nlink decrement and that write occurs.

    So, add a little helper function to do the decrements. We'll tie into it in a
    bit to note when i_nlink hits zero.

    Signed-off-by: Dave Hansen
    Acked-by: Christoph Hellwig
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Hansen
     

27 Sep, 2006

2 commits

  • This eliminates the i_blksize field from struct inode. Filesystems that want
    to provide a per-inode st_blksize can do so by providing their own getattr
    routine instead of using the generic_fillattr() function.

    Note that some filesystems were providing pretty much random (and incorrect)
    values for i_blksize.

    [bunk@stusta.de: cleanup]
    [akpm@osdl.org: generic_fillattr() fix]
    Signed-off-by: "Theodore Ts'o"
    Signed-off-by: Adrian Bunk
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Theodore Ts'o
     
  • * Rougly half of callers already do it by not checking return value
    * Code in drivers/acpi/osl.c does the following to be sure:

    (void)kmem_cache_destroy(cache);

    * Those who check it printk something, however, slab_error already printed
    the name of failed cache.
    * XFS BUGs on failed kmem_cache_destroy which is not the decision
    low-level filesystem driver should make. Converted to ignore.

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     

01 Aug, 2006

1 commit

  • Clean up ipc/msg.c to conform to Documentation/CodingStyle. (before it was
    an inconsistent hodepodge of various coding styles)

    Verified that the before/after .o's are identical.

    Signed-off-by: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ingo Molnar
     

01 Jul, 2006

1 commit


23 Jun, 2006

1 commit

  • Pass the POSIX lock owner ID to the flush operation.

    This is useful for filesystems which don't want to store any locking state
    in inode->i_flock but want to handle locking/unlocking POSIX locks
    internally. FUSE is one such filesystem but I think it possible that some
    network filesystems would need this also.

    Also add a flag to indicate that a POSIX locking request was generated by
    close(), so filesystems using the above feature won't send an extra locking
    request in this case.

    Signed-off-by: Miklos Szeredi
    Cc: Trond Myklebust
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Miklos Szeredi