17 Dec, 2009

1 commit


22 Sep, 2009

1 commit


29 Jun, 2009

1 commit

  • This patch fixes an imbalance message as reported by Sanchin Sant.
    As we don't need to measure the message queue, just increment the
    counters.

    Reported-by: Sanchin Sant
    Signed-off-by: Mimi Zohar
    Acked-by: Serge Hallyn
    Signed-off-by: James Morris

    Mimi Zohar
     

07 Apr, 2009

3 commits

  • Largely inspired from ipc/ipc_sysctl.c. This patch isolates the mqueue
    sysctl stuff in its own file.

    [akpm@linux-foundation.org: build fix]
    Signed-off-by: Cedric Le Goater
    Signed-off-by: Nadia Derbey
    Signed-off-by: Serge E. Hallyn
    Cc: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Serge E. Hallyn
     
  • Implement multiple mounts of the mqueue file system, and link it to usage
    of CLONE_NEWIPC.

    Each ipc ns has a corresponding mqueuefs superblock. When a user does
    clone(CLONE_NEWIPC) or unshare(CLONE_NEWIPC), the unshare will cause an
    internal mount of a new mqueuefs sb linked to the new ipc ns.

    When a user does 'mount -t mqueue mqueue /dev/mqueue', he mounts the
    mqueuefs superblock.

    Posix message queues can be worked with both through the mq_* system calls
    (see mq_overview(7)), and through the VFS through the mqueue mount. Any
    usage of mq_open() and friends will work with the acting task's ipc
    namespace. Any actions through the VFS will work with the mqueuefs in
    which the file was created. So if a user doesn't remount mqueuefs after
    unshare(CLONE_NEWIPC), mq_open("/ab") will not be reflected in "ls
    /dev/mqueue".

    If task a mounts mqueue for ipc_ns:1, then clones task b with a new ipcns,
    ipcns:2, and then task a is the last task in ipc_ns:1 to exit, then (1)
    ipc_ns:1 will be freed, (2) it's superblock will live on until task b
    umounts the corresponding mqueuefs, and vfs actions will continue to
    succeed, but (3) sb->s_fs_info will be NULL for the sb corresponding to
    the deceased ipc_ns:1.

    To make this happen, we must protect the ipc reference count when

    a) a task exits and drops its ipcns->count, since it might be dropping
    it to 0 and freeing the ipcns

    b) a task accesses the ipcns through its mqueuefs interface, since it
    bumps the ipcns refcount and might race with the last task in the ipcns
    exiting.

    So the kref is changed to an atomic_t so we can use
    atomic_dec_and_lock(&ns->count,mq_lock), and every access to the ipcns
    through ns = mqueuefs_sb->s_fs_info is protected by the same lock.

    Signed-off-by: Cedric Le Goater
    Signed-off-by: Serge E. Hallyn
    Cc: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Serge E. Hallyn
     
  • Move mqueue vfsmount plus a few tunables into the ipc_namespace struct.
    The CONFIG_IPC_NS boolean and the ipc_namespace struct will serve both the
    posix message queue namespaces and the SYSV ipc namespaces.

    The sysctl code will be fixed separately in patch 3. After just this
    patch, making a change to posix mqueue tunables always changes the values
    in the initial ipc namespace.

    Signed-off-by: Cedric Le Goater
    Signed-off-by: Serge E. Hallyn
    Cc: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Serge E. Hallyn
     

01 Apr, 2009

1 commit


16 Mar, 2009

1 commit

  • Traditionally, changes to struct file->f_flags have been done under BKL
    protection, or with no protection at all. This patch causes all f_flags
    changes after file open/creation time to be done under protection of
    f_lock. This allows the removal of some BKL usage and fixes a number of
    longstanding (if microscopic) races.

    Reviewed-by: Christoph Hellwig
    Cc: Al Viro
    Signed-off-by: Jonathan Corbet

    Jonathan Corbet
     

14 Jan, 2009

3 commits


09 Jan, 2009

1 commit

  • If a process registers for asynchronous notification on a POSIX message
    queue, it gets a signal and a siginfo_t structure when a message arrives
    on the message queue. The si_pid in the siginfo_t structure is set to the
    PID of the process that sent the message to the message queue.

    The principle is the following:
    . when mq_notify(SIGEV_SIGNAL) is called, the caller registers for
    notification when a msg arrives. The associated pid structure is stroed into
    inode_info->notify_owner. Let's call this process P1.
    . when mq_send() is called by say P2, P2 sends a signal to P1 to notify
    him about msg arrival.

    The way .si_pid is set today is not correct, since it doesn't take into account
    the fact that the process that is sending the message might not be in the
    same namespace as the notified one.

    This patch proposes to set si_pid to the sender's pid into the notify_owner
    namespace.

    Signed-off-by: Nadia Derbey
    Signed-off-by: Sukadev Bhattiprolu
    Acked-by: Oleg Nesterov
    Cc: Roland McGrath
    Cc: Bastian Blank
    Cc: Pavel Emelyanov
    Cc: Eric W. Biederman
    Acked-by: Serge Hallyn
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sukadev Bhattiprolu
     

06 Jan, 2009

2 commits

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
    inotify: fix type errors in interfaces
    fix breakage in reiserfs_new_inode()
    fix the treatment of jfs special inodes
    vfs: remove duplicate code in get_fs_type()
    add a vfs_fsync helper
    sys_execve and sys_uselib do not call into fsnotify
    zero i_uid/i_gid on inode allocation
    inode->i_op is never NULL
    ntfs: don't NULL i_op
    isofs check for NULL ->i_op in root directory is dead code
    affs: do not zero ->i_op
    kill suid bit only for regular files
    vfs: lseek(fd, 0, SEEK_CUR) race condition

    Linus Torvalds
     
  • ... and don't bother in callers. Don't bother with zeroing i_blocks,
    while we are at it - it's already been zeroed.

    i_mode is not worth the effort; it has no common default value.

    Signed-off-by: Al Viro

    Al Viro
     

05 Jan, 2009

4 commits

  • * don't bother with allocations
    * don't do double copy_from_user()
    * don't duplicate parts of check for audit_dummy_context()

    Signed-off-by: Al Viro

    Al Viro
     
  • * logging the original value of *msg_prio in mq_timedreceive(2)
    is insane - the argument is write-only (i.e. syscall always
    ignores the original value and only overwrites it).
    * merge __audit_mq_timed{send,receive}
    * don't do copy_from_user() twice
    * don't mess with allocations in auditsc part
    * ... and don't bother checking !audit_enabled and !context in there -
    we'd already checked for audit_dummy_context().

    Signed-off-by: Al Viro

    Al Viro
     
  • * don't copy_from_user() twice
    * don't bother with allocations
    * don't duplicate parts of audit_dummy_context()
    * make it return void

    Signed-off-by: Al Viro

    Al Viro
     
  • * get rid of allocations
    * make it return void
    * don't duplicate parts of audit_dummy_context()

    Signed-off-by: Al Viro

    Al Viro
     

14 Nov, 2008

4 commits

  • Pass credentials through dentry_open() so that the COW creds patch can have
    SELinux's flush_unauthorized_files() pass the appropriate creds back to itself
    when it opens its null chardev.

    The security_dentry_open() call also now takes a creds pointer, as does the
    dentry_open hook in struct security_operations.

    Signed-off-by: David Howells
    Acked-by: James Morris
    Signed-off-by: James Morris

    David Howells
     
  • Wrap current->cred and a few other accessors to hide their actual
    implementation.

    Signed-off-by: David Howells
    Acked-by: James Morris
    Acked-by: Serge Hallyn
    Signed-off-by: James Morris

    David Howells
     
  • Separate the task security context from task_struct. At this point, the
    security data is temporarily embedded in the task_struct with two pointers
    pointing to it.

    Note that the Alpha arch is altered as it refers to (E)UID and (E)GID in
    entry.S via asm-offsets.

    With comment fixes Signed-off-by: Marc Dionne

    Signed-off-by: David Howells
    Acked-by: James Morris
    Acked-by: Serge Hallyn
    Signed-off-by: James Morris

    David Howells
     
  • Wrap access to task credentials so that they can be separated more easily from
    the task_struct during the introduction of COW creds.

    Change most current->(|e|s|fs)[ug]id to current_(|e|s|fs)[ug]id().

    Change some task->e?[ug]id to task_e?[ug]id(). In some places it makes more
    sense to use RCU directly rather than a convenient wrapper; these will be
    addressed by later patches.

    Signed-off-by: David Howells
    Reviewed-by: James Morris
    Acked-by: Serge Hallyn
    Signed-off-by: James Morris

    David Howells
     

20 Oct, 2008

1 commit

  • Increase the range of various posix message queue limits.

    Posix gives the message queue user the ability to 'trade off' the maximum
    size of messages with the number of possible messages that can be 'in
    flight'. Linux currently makes this trade off more restrictive than it
    needs to be.

    In particular, the maximum message size today can be made no smaller than
    8192. This greatly restricts those applications that would like to have
    the ability to post large numbers of very small messages.

    So this task lowers the limit that the maximum message size can be set to,
    from 8192 to 128. It also lowers the limit that the maximum #number of
    messages in flight can be set to, from 10 to 1.

    With these changes the message queue user can make better trade offs
    between #messages and message size, in order to get everything to fit
    within the setrlimit(RLIMIT_MSGQUEUE) limit for that particular user.

    This patch also applies the values in

    /proc/sys/fs/mqueue/msg_max
    /proc/sys/fs/mqueue/msgsize_max

    as the defaults for the max #messages allowed and the max message size
    allowed, respectively, for those applications that do not supply these.
    Previously, the defaults were hardwired to 10 and 8192, respectively.

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Joe Korty
    Cc: Al Viro
    Cc: Manfred Spraul
    Cc: Nadia Derbey
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joe Korty
     

27 Jul, 2008

2 commits

  • Incidentally, the name that gives hundreds of false positives on grep
    is not a good idea...

    Signed-off-by: Al Viro

    Al Viro
     
  • Kmem cache passed to constructor is only needed for constructors that are
    themselves multiplexeres. Nobody uses this "feature", nor does anybody uses
    passed kmem cache in non-trivial way, so pass only pointer to object.

    Non-trivial places are:
    arch/powerpc/mm/init_64.c
    arch/powerpc/mm/hugetlbpage.c

    This is flag day, yes.

    Signed-off-by: Alexey Dobriyan
    Acked-by: Pekka Enberg
    Acked-by: Christoph Lameter
    Cc: Jon Tollefson
    Cc: Nick Piggin
    Cc: Matt Mackall
    [akpm@linux-foundation.org: fix arch/powerpc/mm/hugetlbpage.c]
    [akpm@linux-foundation.org: fix mm/slab.c]
    [akpm@linux-foundation.org: fix ubifs]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     

26 Jul, 2008

1 commit


06 Jun, 2008

1 commit


04 May, 2008

1 commit

  • A very small cleanup for mq_open.

    We do not have to call set_close_on_exit if we create the file
    descriptor right away with the flag set. We have a function for this
    now. The resulting code is smaller and a tiny bit faster.

    Signed-off-by: Ulrich Drepper
    Signed-off-by: Linus Torvalds

    Ulrich Drepper
     

19 Apr, 2008

2 commits

  • This is the first really tricky patch in the series. It elevates the writer
    count on a mount each time a non-special file is opened for write.

    We used to do this in may_open(), but Miklos pointed out that __dentry_open()
    is used as well to create filps. This will cover even those cases, while a
    call in may_open() would not have.

    There is also an elevated count around the vfs_create() call in open_namei().
    See the comments for more details, but we need this to fix a 'create, remount,
    fail r/w open()' race.

    Some filesystems forego the use of normal vfs calls to create
    struct files. Make sure that these users elevate the mnt
    writer count because they will get __fput(), and we need
    to make sure they're balanced.

    Acked-by: Al Viro
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Dave Hansen
    Signed-off-by: Andrew Morton
    Signed-off-by: Al Viro

    Dave Hansen
     
  • Elevate the write count during the vfs_rmdir() and vfs_unlink().

    [AV: merged rmdir and unlink parts, added missing pieces in nfsd]

    Acked-by: Serge Hallyn
    Acked-by: Al Viro
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Dave Hansen
    Signed-off-by: Andrew Morton
    Signed-off-by: Al Viro

    Dave Hansen
     

09 Feb, 2008

2 commits

  • When sending the pid namespaces patches I wrongly converted the tsk->tgid into
    task_pid_vnr(tsk) in mqueue-s (the git id of this patch is
    b488893a390edfe027bae7a46e9af8083e740668).

    The proper behavior is to get the task_tgid_vnr(tsk).

    This seem to be the only mistake of that kind.

    Signed-off-by: Pavel Emelyanov
    Cc: "Eric W. Biederman"
    Cc: Oleg Nesterov
    Cc: Balbir Singh
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelyanov
     
  • Some time ago the xxx_vnr() calls (e.g. pid_vnr or find_task_by_vpid) were
    _all_ converted to operate on the current pid namespace. After this each call
    like xxx_nr_ns(foo, current->nsproxy->pid_ns) is nothing but a xxx_vnr(foo)
    one.

    Switch all the xxx_nr_ns() callers to use the xxx_vnr() calls where
    appropriate.

    Signed-off-by: Pavel Emelyanov
    Reviewed-by: Oleg Nesterov
    Cc: "Eric W. Biederman"
    Cc: Balbir Singh
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelyanov
     

30 Nov, 2007

1 commit


07 Nov, 2007

1 commit

  • Commit ed6dcf4a in the history.git tree broke netlink_unicast timeouts
    by moving the schedule_timeout() call to a new function that doesn't
    propagate the remaining timeout back to the caller. This means on each
    retry we start with the full timeout again.

    ipc/mqueue.c seems to actually want to wait indefinitely so this
    behaviour is retained.

    Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Patrick McHardy
     

21 Oct, 2007

1 commit


20 Oct, 2007

1 commit

  • This is the largest patch in the set. Make all (I hope) the places where
    the pid is shown to or get from user operate on the virtual pids.

    The idea is:
    - all in-kernel data structures must store either struct pid itself
    or the pid's global nr, obtained with pid_nr() call;
    - when seeking the task from kernel code with the stored id one
    should use find_task_by_pid() call that works with global pids;
    - when showing pid's numerical value to the user the virtual one
    should be used, but however when one shows task's pid outside this
    task's namespace the global one is to be used;
    - when getting the pid from userspace one need to consider this as
    the virtual one and use appropriate task/pid-searching functions.

    [akpm@linux-foundation.org: build fix]
    [akpm@linux-foundation.org: nuther build fix]
    [akpm@linux-foundation.org: yet nuther build fix]
    [akpm@linux-foundation.org: remove unneeded casts]
    Signed-off-by: Pavel Emelyanov
    Signed-off-by: Alexey Dobriyan
    Cc: Sukadev Bhattiprolu
    Cc: Oleg Nesterov
    Cc: Paul Menage
    Cc: "Eric W. Biederman"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelyanov
     

19 Oct, 2007

1 commit


17 Oct, 2007

1 commit

  • Slab constructors currently have a flags parameter that is never used. And
    the order of the arguments is opposite to other slab functions. The object
    pointer is placed before the kmem_cache pointer.

    Convert

    ctor(void *object, struct kmem_cache *s, unsigned long flags)

    to

    ctor(struct kmem_cache *s, void *object)

    throughout the kernel

    [akpm@linux-foundation.org: coupla fixes]
    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     

11 Oct, 2007

1 commit


20 Jul, 2007

1 commit

  • Slab destructors were no longer supported after Christoph's
    c59def9f222d44bb7e2f0a559f2906191a0862d7 change. They've been
    BUGs for both slab and slub, and slob never supported them
    either.

    This rips out support for the dtor pointer from kmem_cache_create()
    completely and fixes up every single callsite in the kernel (there were
    about 224, not including the slab allocator definitions themselves,
    or the documentation references).

    Signed-off-by: Paul Mundt

    Paul Mundt