04 Sep, 2017

1 commit

  • struct timespec is not y2038 safe. Replace
    all uses of timespec by y2038 safe struct timespec64.

    Even though timespec is used here to represent timeouts,
    replace these with timespec64 so that it facilitates
    in verification by creating a y2038 safe kernel image
    that is free of timespec.

    The syscall interfaces themselves are not changed as part
    of the patch. They will be part of a different series.

    Signed-off-by: Deepa Dinamani
    Cc: Paul Moore
    Cc: Richard Guy Briggs
    Reviewed-by: Richard Guy Briggs
    Reviewed-by: Arnd Bergmann
    Acked-by: Paul Moore
    Signed-off-by: Al Viro

    Deepa Dinamani
     

10 Jul, 2017

1 commit

  • The retry logic for netlink_attachskb() inside sys_mq_notify()
    is nasty and vulnerable:

    1) The sock refcnt is already released when retry is needed
    2) The fd is controllable by user-space because we already
    release the file refcnt

    so we when retry but the fd has been just closed by user-space
    during this small window, we end up calling netlink_detachskb()
    on the error path which releases the sock again, later when
    the user-space closes this socket a use-after-free could be
    triggered.

    Setting 'sock' to NULL here should be sufficient to fix it.

    Reported-by: GeneBlue
    Signed-off-by: Cong Wang
    Cc: Andrew Morton
    Cc: Manfred Spraul
    Cc: stable@kernel.org
    Signed-off-by: Linus Torvalds

    Cong Wang
     

05 Jul, 2017

1 commit


02 Mar, 2017

3 commits


28 Feb, 2017

1 commit


21 Nov, 2016

1 commit

  • Currently the wake_q data structure is defined by the WAKE_Q() macro.
    This macro, however, looks like a function doing something as "wake" is
    a verb. Even checkpatch.pl was confused as it reported warnings like

    WARNING: Missing a blank line after declarations
    #548: FILE: kernel/futex.c:3665:
    + int ret;
    + WAKE_Q(wake_q);

    This patch renames the WAKE_Q() macro to DEFINE_WAKE_Q() which clarifies
    what the macro is doing and eliminates the checkpatch.pl warnings.

    Signed-off-by: Waiman Long
    Acked-by: Davidlohr Bueso
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/1479401198-1765-1-git-send-email-longman@redhat.com
    [ Resolved conflict and added missing rename. ]
    Signed-off-by: Ingo Molnar

    Waiman Long
     

28 Sep, 2016

1 commit

  • CURRENT_TIME macro is not appropriate for filesystems as it
    doesn't use the right granularity for filesystem timestamps.
    Use current_time() instead.

    CURRENT_TIME is also not y2038 safe.

    This is also in preparation for the patch that transitions
    vfs timestamps to use 64 bit time and hence make them
    y2038 safe. As part of the effort current_time() will be
    extended to do range checks. Hence, it is necessary for all
    file system timestamps to use current_time(). Also,
    current_time() will be transitioned along with vfs to be
    y2038 safe.

    Note that whenever a single call to current_time() is used
    to change timestamps in different inodes, it is because they
    share the same time granularity.

    Signed-off-by: Deepa Dinamani
    Reviewed-by: Arnd Bergmann
    Acked-by: Felipe Balbi
    Acked-by: Steven Whitehouse
    Acked-by: Ryusuke Konishi
    Acked-by: David Sterba
    Signed-off-by: Al Viro

    Deepa Dinamani
     

24 Jun, 2016

3 commits

  • Introduce a function may_open_dev that tests MNT_NODEV and a new
    superblock flab SB_I_NODEV. Use this new function in all of the
    places where MNT_NODEV was previously tested.

    Add the new SB_I_NODEV s_iflag to proc, sysfs, and mqueuefs as those
    filesystems should never support device nodes, and a simple superblock
    flags makes that very hard to get wrong. With SB_I_NODEV set if any
    device nodes somehow manage to show up on on a filesystem those
    device nodes will be unopenable.

    Acked-by: Seth Forshee
    Signed-off-by: "Eric W. Biederman"

    Eric W. Biederman
     
  • Set SB_I_NOEXEC on mqueuefs to ensure small implementation mistakes
    do not result in executable on mqueuefs by accident.

    Acked-by: Seth Forshee
    Signed-off-by: "Eric W. Biederman"

    Eric W. Biederman
     
  • Today what is normally called data (the mount options) is not passed
    to fill_super through mount_ns.

    Pass the mount options and the namespace separately to mount_ns so
    that filesystems such as proc that have mount options, can use
    mount_ns.

    Pass the user namespace to mount_ns so that the standard permission
    check that verifies the mounter has permissions over the namespace can
    be performed in mount_ns instead of in each filesystems .mount method.
    Thus removing the duplication between mqueuefs and proc in terms of
    permission checks. The extra permission check does not currently
    affect the rpc_pipefs filesystem and the nfsd filesystem as those
    filesystems do not currently allow unprivileged mounts. Without
    unpvileged mounts it is guaranteed that the caller has already passed
    capable(CAP_SYS_ADMIN) which guarantees extra permission check will
    pass.

    Update rpc_pipefs and the nfsd filesystem to ensure that the network
    namespace reference is always taken in fill_super and always put in kill_sb
    so that the logic is simpler and so that errors originating inside of
    fill_super do not cause a network namespace leak.

    Acked-by: Seth Forshee
    Signed-off-by: "Eric W. Biederman"

    Eric W. Biederman
     

05 Apr, 2016

1 commit

  • PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
    ago with promise that one day it will be possible to implement page
    cache with bigger chunks than PAGE_SIZE.

    This promise never materialized. And unlikely will.

    We have many places where PAGE_CACHE_SIZE assumed to be equal to
    PAGE_SIZE. And it's constant source of confusion on whether
    PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
    especially on the border between fs and mm.

    Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much
    breakage to be doable.

    Let's stop pretending that pages in page cache are special. They are
    not.

    The changes are pretty straight-forward:

    - << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> ;

    - >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> ;

    - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};

    - page_cache_get() -> get_page();

    - page_cache_release() -> put_page();

    This patch contains automated changes generated with coccinelle using
    script below. For some reason, coccinelle doesn't patch header files.
    I've called spatch for them manually.

    The only adjustment after coccinelle is revert of changes to
    PAGE_CAHCE_ALIGN definition: we are going to drop it later.

    There are few places in the code where coccinelle didn't reach. I'll
    fix them manually in a separate patch. Comments and documentation also
    will be addressed with the separate patch.

    virtual patch

    @@
    expression E;
    @@
    - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
    + E

    @@
    expression E;
    @@
    - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
    + E

    @@
    @@
    - PAGE_CACHE_SHIFT
    + PAGE_SHIFT

    @@
    @@
    - PAGE_CACHE_SIZE
    + PAGE_SIZE

    @@
    @@
    - PAGE_CACHE_MASK
    + PAGE_MASK

    @@
    expression E;
    @@
    - PAGE_CACHE_ALIGN(E)
    + PAGE_ALIGN(E)

    @@
    expression E;
    @@
    - page_cache_get(E)
    + get_page(E)

    @@
    expression E;
    @@
    - page_cache_release(E)
    + put_page(E)

    Signed-off-by: Kirill A. Shutemov
    Acked-by: Michal Hocko
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     

23 Jan, 2016

1 commit

  • parallel to mutex_{lock,unlock,trylock,is_locked,lock_nested},
    inode_foo(inode) being mutex_foo(&inode->i_mutex).

    Please, use those for access to ->i_mutex; over the coming cycle
    ->i_mutex will become rwsem, with ->lookup() done with it held
    only shared.

    Signed-off-by: Al Viro

    Al Viro
     

15 Jan, 2016

1 commit

  • Mark those kmem allocations that are known to be easily triggered from
    userspace as __GFP_ACCOUNT/SLAB_ACCOUNT, which makes them accounted to
    memcg. For the list, see below:

    - threadinfo
    - task_struct
    - task_delay_info
    - pid
    - cred
    - mm_struct
    - vm_area_struct and vm_region (nommu)
    - anon_vma and anon_vma_chain
    - signal_struct
    - sighand_struct
    - fs_struct
    - files_struct
    - fdtable and fdtable->full_fds_bits
    - dentry and external_name
    - inode for all filesystems. This is the most tedious part, because
    most filesystems overwrite the alloc_inode method.

    The list is far from complete, so feel free to add more objects.
    Nevertheless, it should be close to "account everything" approach and
    keep most workloads within bounds. Malevolent users will be able to
    breach the limit, but this was possible even with the former "account
    everything" approach (simply because it did not account everything in
    fact).

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Vladimir Davydov
    Acked-by: Johannes Weiner
    Acked-by: Michal Hocko
    Cc: Tejun Heo
    Cc: Greg Thelen
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vladimir Davydov
     

07 Aug, 2015

1 commit

  • A while back, the message queue implementation in the kernel was
    improved to use btrees to speed up retrieval of messages, in commit
    d6629859b36d ("ipc/mqueue: improve performance of send/recv").

    That patch introducing the improved kernel handling of message queues
    (using btrees) has, as a by-product, changed the meaning of the QSIZE
    field in the pseudo-file created for the queue. Before, this field
    reflected the size of the user-data in the queue. Since, it also takes
    kernel data structures into account. For example, if 13 bytes of user
    data are in the queue, on my machine the file reports a size of 61
    bytes.

    There was some discussion on this topic before (for example
    https://lkml.org/lkml/2014/10/1/115). Commenting on a th lkml, Michael
    Kerrisk gave the following background
    (https://lkml.org/lkml/2015/6/16/74):

    The pseudofiles in the mqueue filesystem (usually mounted at
    /dev/mqueue) expose fields with metadata describing a message
    queue. One of these fields, QSIZE, as originally implemented,
    showed the total number of bytes of user data in all messages in
    the message queue, and this feature was documented from the
    beginning in the mq_overview(7) page. In 3.5, some other (useful)
    work happened to break the user-space API in a couple of places,
    including the value exposed via QSIZE, which now includes a measure
    of kernel overhead bytes for the queue, a figure that renders QSIZE
    useless for its original purpose, since there's no way to deduce
    the number of overhead bytes consumed by the implementation.
    (The other user-space breakage was subsequently fixed.)

    This patch removes the accounting of kernel data structures in the
    queue. Reporting the size of these data-structures in the QSIZE field
    was a breaking change (see Michael's comment above). Without the QSIZE
    field reporting the total size of user-data in the queue, there is no
    way to deduce this number.

    It should be noted that the resource limit RLIMIT_MSGQUEUE is counted
    against the worst-case size of the queue (in both the old and the new
    implementation). Therefore, the kernel overhead accounting in QSIZE is
    not necessary to help the user understand the limitations RLIMIT imposes
    on the processes.

    Signed-off-by: Marcus Gelderie
    Acked-by: Doug Ledford
    Acked-by: Michael Kerrisk
    Acked-by: Davidlohr Bueso
    Cc: David Howells
    Cc: Alexander Viro
    Cc: John Duffy
    Cc: Arto Bendiken
    Cc: Manfred Spraul
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Marcus Gelderie
     

08 May, 2015

1 commit

  • This patch moves the wakeup_process() invocation so it is not done under
    the info->lock by making use of a lockless wake_q. With this change, the
    waiter is woken up once it is STATE_READY and it does not need to loop
    on SMP if it is still in STATE_PENDING. In the timeout case we still need
    to grab the info->lock to verify the state.

    This change should also avoid the introduction of preempt_disable() in -rt
    which avoids a busy-loop which pools for the STATE_PENDING -> STATE_READY
    change if the waiter has a higher priority compared to the waker.

    Additionally, this patch micro-optimizes wq_sleep by using the cheaper
    cousin of set_current_state(TASK_INTERRUPTABLE) as we will block no
    matter what, thus get rid of the implied barrier.

    Signed-off-by: Davidlohr Bueso
    Signed-off-by: Peter Zijlstra (Intel)
    Acked-by: George Spelvin
    Acked-by: Thomas Gleixner
    Cc: Andrew Morton
    Cc: Borislav Petkov
    Cc: Chris Mason
    Cc: H. Peter Anvin
    Cc: Linus Torvalds
    Cc: Manfred Spraul
    Cc: Peter Zijlstra
    Cc: Sebastian Andrzej Siewior
    Cc: Steven Rostedt
    Cc: dave@stgolabs.net
    Link: http://lkml.kernel.org/r/1430748166.1940.17.camel@stgolabs.net
    Signed-off-by: Ingo Molnar

    Davidlohr Bueso
     

16 Apr, 2015

1 commit


20 Nov, 2014

1 commit

  • ... for situations when we don't have any candidate in pathnames - basically,
    in descriptor-based syscalls.

    [Folded the build fix for !CONFIG_AUDITSYSCALL configs from Chen Gang]

    Signed-off-by: Al Viro

    Al Viro
     

08 Apr, 2014

1 commit


26 Feb, 2014

1 commit

  • Commit 93e6f119c0ce ("ipc/mqueue: cleanup definition names and
    locations") added global hardcoded limits to the amount of message
    queues that can be created. While these limits are per-namespace,
    reality is that it ends up breaking userspace applications.
    Historically users have, at least in theory, been able to create up to
    INT_MAX queues, and limiting it to just 1024 is way too low and dramatic
    for some workloads and use cases. For instance, Madars reports:

    "This update imposes bad limits on our multi-process application. As
    our app uses approaches that each process opens its own set of queues
    (usually something about 3-5 queues per process). In some scenarios
    we might run up to 3000 processes or more (which of-course for linux
    is not a problem). Thus we might need up to 9000 queues or more. All
    processes run under one user."

    Other affected users can be found in launchpad bug #1155695:
    https://bugs.launchpad.net/ubuntu/+source/manpages/+bug/1155695

    Instead of increasing this limit, revert it entirely and fallback to the
    original way of dealing queue limits -- where once a user's resource
    limit is reached, and all memory is used, new queues cannot be created.

    Signed-off-by: Davidlohr Bueso
    Reported-by: Madars Vitolins
    Acked-by: Doug Ledford
    Cc: Manfred Spraul
    Cc: [3.5+]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     

28 Jan, 2014

2 commits

  • Deal with checkpatch messages:
    WARNING: braces {} are not necessary for single statement blocks

    Signed-off-by: Davidlohr Bueso
    Cc: Aswin Chandramouleeswaran
    Cc: Rik van Riel
    Acked-by: Manfred Spraul
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     
  • The ipc code does not adhere the typical linux coding style.
    This patch fixes lots of simple whitespace errors.

    - mostly autogenerated by
    scripts/checkpatch.pl -f --fix \
    --types=pointer_location,spacing,space_before_tab
    - one manual fixup (keep structure members tab-aligned)
    - removal of additional space_before_tab that were not found by --fix

    Tested with some of my msg and sem test apps.

    Andrew: Could you include it in -mm and move it towards Linus' tree?

    Signed-off-by: Manfred Spraul
    Suggested-by: Li Bin
    Cc: Joe Perches
    Acked-by: Rafael Aquini
    Cc: Davidlohr Bueso
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Manfred Spraul
     

09 Nov, 2013

1 commit

  • We need to break delegations on any operation that changes the set of
    links pointing to an inode. Start with unlink.

    Such operations also hold the i_mutex on a parent directory. Breaking a
    delegation may require waiting for a timeout (by default 90 seconds) in
    the case of a unresponsive NFS client. To avoid blocking all directory
    operations, we therefore drop locks before waiting for the delegation.
    The logic then looks like:

    acquire locks
    ...
    test for delegation; if found:
    take reference on inode
    release locks
    wait for delegation break
    drop reference on inode
    retry

    It is possible this could never terminate. (Even if we take precautions
    to prevent another delegation being acquired on the same inode, we could
    get a different inode on each retry.) But this seems very unlikely.

    The initial test for a delegation happens after the lock on the target
    inode is acquired, but the directory inode may have been acquired
    further up the call stack. We therefore add a "struct inode **"
    argument to any intervening functions, which we use to pass the inode
    back up to the caller in the case it needs a delegation synchronously
    broken.

    Cc: David Howells
    Cc: Tyler Hicks
    Cc: Dustin Kirkland
    Acked-by: Jeff Layton
    Signed-off-by: J. Bruce Fields
    Signed-off-by: Al Viro

    J. Bruce Fields
     

10 Jul, 2013

1 commit

  • The old audit PATH records for mq_open looked like this:

    type=PATH msg=audit(1366282323.982:869): item=1 name=(null) inode=6777
    dev=00:0c mode=041777 ouid=0 ogid=0 rdev=00:00
    obj=system_u:object_r:tmpfs_t:s15:c0.c1023
    type=PATH msg=audit(1366282323.982:869): item=0 name="test_mq" inode=26732
    dev=00:0c mode=0100700 ouid=0 ogid=0 rdev=00:00
    obj=staff_u:object_r:user_tmpfs_t:s15:c0.c1023

    ...with the audit related changes that went into 3.7, they now look like this:

    type=PATH msg=audit(1366282236.776:3606): item=2 name=(null) inode=66655
    dev=00:0c mode=0100700 ouid=0 ogid=0 rdev=00:00
    obj=staff_u:object_r:user_tmpfs_t:s15:c0.c1023
    type=PATH msg=audit(1366282236.776:3606): item=1 name=(null) inode=6926
    dev=00:0c mode=041777 ouid=0 ogid=0 rdev=00:00
    obj=system_u:object_r:tmpfs_t:s15:c0.c1023
    type=PATH msg=audit(1366282236.776:3606): item=0 name="test_mq"

    Both of these look wrong to me. As Steve Grubb pointed out:

    "What we need is 1 PATH record that identifies the MQ. The other PATH
    records probably should not be there."

    Fix it to record the mq root as a parent, and flag it such that it
    should be hidden from view when the names are logged, since the root of
    the mq filesystem isn't terribly interesting. With this change, we get
    a single PATH record that looks more like this:

    type=PATH msg=audit(1368021604.836:484): item=0 name="test_mq" inode=16914
    dev=00:0c mode=0100644 ouid=0 ogid=0 rdev=00:00
    obj=unconfined_u:object_r:user_tmpfs_t:s0

    In order to do this, a new audit_inode_parent_hidden() function is
    added. If we do it this way, then we avoid having the existing callers
    of audit_inode needing to do any sort of flag conversion if auditing is
    inactive.

    Signed-off-by: Jeff Layton
    Reported-by: Jiri Jaburek
    Cc: Steve Grubb
    Cc: Eric Paris
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jeff Layton
     

29 Mar, 2013

1 commit

  • Pull userns fixes from Eric W Biederman:
    "The bulk of the changes are fixing the worst consequences of the user
    namespace design oversight in not considering what happens when one
    namespace starts off as a clone of another namespace, as happens with
    the mount namespace.

    The rest of the changes are just plain bug fixes.

    Many thanks to Andy Lutomirski for pointing out many of these issues."

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
    userns: Restrict when proc and sysfs can be mounted
    ipc: Restrict mounting the mqueue filesystem
    vfs: Carefully propogate mounts across user namespaces
    vfs: Add a mount flag to lock read only bind mounts
    userns: Don't allow creation if the user is chrooted
    yama: Better permission check for ptraceme
    pid: Handle the exit of a multi-threaded init.
    scm: Require CAP_SYS_ADMIN over the current pidns to spoof pids.

    Linus Torvalds
     

27 Mar, 2013

1 commit

  • Only allow mounting the mqueue filesystem if the caller has CAP_SYS_ADMIN
    rights over the ipc namespace. The principle here is if you create
    or have capabilities over it you can mount it, otherwise you get to live
    with what other people have mounted.

    This information is not particularly sensitive and mqueue essentially
    only reports which posix messages queues exist. Still when creating a
    restricted environment for an application to live any extra
    information may be of use to someone with sufficient creativity. The
    historical if imperfect way this information has been restricted has
    been not to allow mounts and restricting this to ipc namespace
    creators maintains the spirit of the historical restriction.

    Cc: stable@vger.kernel.org
    Acked-by: Serge Hallyn
    Signed-off-by: "Eric W. Biederman"

    Eric W. Biederman
     

23 Mar, 2013

1 commit

  • mnt_drop_write() must be called only if mnt_want_write() succeeded,
    otherwise the mnt_writers counter will diverge.

    mnt_writers counters are used to check if remounting FS as read-only is
    OK, so after an extra mnt_drop_write() call, it would be impossible to
    remount mqueue FS as read-only. Besides, on umount a warning would be
    printed like this one:

    =====================================
    [ BUG: bad unlock balance detected! ]
    3.9.0-rc3 #5 Not tainted
    -------------------------------------
    a.out/12486 is trying to release lock (sb_writers) at:
    mnt_drop_write+0x1f/0x30
    but there are no more locks to release!

    Signed-off-by: Vladimir Davydov
    Cc: Doug Ledford
    Cc: KOSAKI Motohiro
    Cc: "Eric W. Biederman"
    Cc: Al Viro
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vladimir Davydov
     

27 Feb, 2013

1 commit

  • Pull vfs pile (part one) from Al Viro:
    "Assorted stuff - cleaning namei.c up a bit, fixing ->d_name/->d_parent
    locking violations, etc.

    The most visible changes here are death of FS_REVAL_DOT (replaced with
    "has ->d_weak_revalidate()") and a new helper getting from struct file
    to inode. Some bits of preparation to xattr method interface changes.

    Misc patches by various people sent this cycle *and* ocfs2 fixes from
    several cycles ago that should've been upstream right then.

    PS: the next vfs pile will be xattr stuff."

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (46 commits)
    saner proc_get_inode() calling conventions
    proc: avoid extra pde_put() in proc_fill_super()
    fs: change return values from -EACCES to -EPERM
    fs/exec.c: make bprm_mm_init() static
    ocfs2/dlm: use GFP_ATOMIC inside a spin_lock
    ocfs2: fix possible use-after-free with AIO
    ocfs2: Fix oops in ocfs2_fast_symlink_readpage() code path
    get_empty_filp()/alloc_file() leave both ->f_pos and ->f_version zero
    target: writev() on single-element vector is pointless
    export kernel_write(), convert open-coded instances
    fs: encode_fh: return FILEID_INVALID if invalid fid_type
    kill f_vfsmnt
    vfs: kill FS_REVAL_DOT by adding a d_weak_revalidate dentry op
    nfsd: handle vfs_getattr errors in acl protocol
    switch vfs_getattr() to struct path
    default SET_PERSONALITY() in linux/elf.h
    ceph: prepopulate inodes only when request is aborted
    d_hash_and_lookup(): export, switch open-coded instances
    9p: switch v9fs_set_create_acl() to inode+fid, do it before d_instantiate()
    9p: split dropping the acls from v9fs_set_create_acl()
    ...

    Linus Torvalds
     

23 Feb, 2013

1 commit


28 Jan, 2013

1 commit

  • This patch allow the unprivileged user to mount mqueuefs in
    user ns.

    If two userns share the same ipcns,the files in mqueue fs
    should be seen in both these two userns.

    If the userns has its own ipcns,it has its own mqueue fs too.
    ipcns has already done this job well.

    Signed-off-by: Gao feng
    Signed-off-by: Eric W. Biederman

    Gao feng
     

13 Oct, 2012

2 commits

  • Keep a pointer to the audit_names "slot" in struct filename.

    Have all of the audit_inode callers pass a struct filename ponter to
    audit_inode instead of a string pointer. If the aname field is already
    populated, then we can skip walking the list altogether and just use it
    directly.

    Signed-off-by: Jeff Layton
    Signed-off-by: Al Viro

    Jeff Layton
     
  • getname() is intended to copy pathname strings from userspace into a
    kernel buffer. The result is just a string in kernel space. It would
    however be quite helpful to be able to attach some ancillary info to
    the string.

    For instance, we could attach some audit-related info to reduce the
    amount of audit-related processing needed. When auditing is enabled,
    we could also call getname() on the string more than once and not
    need to recopy it from userspace.

    This patchset converts the getname()/putname() interfaces to return
    a struct instead of a string. For now, the struct just tracks the
    string in kernel space and the original userland pointer for it.

    Later, we'll add other information to the struct as it becomes
    convenient.

    Signed-off-by: Jeff Layton
    Signed-off-by: Al Viro

    Jeff Layton
     

12 Oct, 2012

1 commit

  • Currently, this gets set mostly by happenstance when we call into
    audit_inode_child. While that might be a little more efficient, it seems
    wrong. If the syscall ends up failing before audit_inode_child ever gets
    called, then you'll have an audit_names record that shows the full path
    but has the parent inode info attached.

    Fix this by passing in a parent flag when we call audit_inode that gets
    set to the value of LOOKUP_PARENT. We can then fix up the pathname for
    the audit entry correctly from the get-go.

    While we're at it, clean up the no-op macro for audit_inode in the
    !CONFIG_AUDITSYSCALL case.

    Signed-off-by: Jeff Layton
    Signed-off-by: Al Viro

    Jeff Layton
     

09 Oct, 2012

1 commit

  • Commit d6629859b36d ("ipc/mqueue: improve performance of send/recv") and
    ce2d52cc ("ipc/mqueue: add rbtree node caching support") introduced an
    rbtree of message priorities, and usage of rb_init_node() to initialize
    the corresponding nodes. As it turns out, rb_init_node() is unnecessary
    here, as the nodes are fully initialized on insertion by rb_link_node()
    and the code doesn't access nodes that aren't inserted on the rbtree.

    Removing the rb_init_node() calls as I removed that function during
    rbtree API cleanups (the only other use of it was in a place that
    similarly didn't require it).

    Signed-off-by: Michel Lespinasse
    Acked-by: Doug Ledford
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     

27 Sep, 2012

2 commits


19 Aug, 2012

1 commit


23 Jul, 2012

1 commit


14 Jul, 2012

1 commit