01 May, 2013

19 commits

  • While running an application that tried to put data into half of
    memory using shmget(), we found that a shmall value below 8EiB-8TiB
    prevented us from using anything more than 8TiB. Setting
    kernel.shmall greater than 8EiB-8TiB made the job work.

    In the newseg() function, the culprit is ns->shm_tot, which at 8TiB is INT_MAX.

    ipc/shm.c:
    static int newseg(struct ipc_namespace *ns, struct ipc_params *params)
    {
    ...
            int numpages = (size + PAGE_SIZE - 1) >> PAGE_SHIFT;
    ...
            if (ns->shm_tot + numpages > ns->shm_ctlall)
                    return -ENOSPC;

    [akpm@linux-foundation.org: make ipc/shm.c:newseg()'s numpages size_t, not int]
    Signed-off-by: Robin Holt
    Reported-by: Alex Thorlton
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Robin Holt
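
The overflow described above can be illustrated in user space. This is only a sketch under stated assumptions: 4 KiB pages, a 32-bit int, and the hypothetical helper names `pages_for` and `page_count_fits_int`, none of which come from the kernel source.

```c
#include <limits.h>
#include <stdint.h>

/* Assumed for illustration: 4 KiB pages, as on x86-64. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE  (UINT64_C(1) << SKETCH_PAGE_SHIFT)

/* Number of pages needed to back `size` bytes, computed in 64 bits. */
uint64_t pages_for(uint64_t size)
{
    return (size + SKETCH_PAGE_SIZE - 1) >> SKETCH_PAGE_SHIFT;
}

/* Returns 1 if the page count can be stored in an int without loss. */
int page_count_fits_int(uint64_t size)
{
    return pages_for(size) <= (uint64_t)INT_MAX;
}
```

With these assumptions a request of exactly 8 TiB needs 2^31 pages, one more than a 32-bit int can hold, which is why a wider type such as size_t is needed for the page count.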
     
  • The ipc/msg.c code does its list operations by hand and open-codes the
    accesses, instead of using list_for_each_entry[_safe].

    Signed-off-by: Nikola Pajkovsky
    Cc: Stanislav Kinsbursky
    Cc: "Eric W. Biederman"
    Cc: Peter Hurley
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nikola Pajkovsky
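
The difference between an open-coded traversal and an iterator macro can be sketched in user space. The kernel macro relies on typeof; the `sketch_for_each_entry` macro below is a simplified stand-in that takes the type explicitly, and `struct demo_msg`, `demo_sum` and the other names are hypothetical.

```c
#include <stddef.h>

struct list_head { struct list_head *next, *prev; };

/* Recover the enclosing structure from a pointer to its list member. */
#define sketch_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Simplified stand-in for list_for_each_entry(); the type is explicit. */
#define sketch_for_each_entry(pos, head, type, member)              \
    for (pos = sketch_container_of((head)->next, type, member);     \
         &pos->member != (head);                                    \
         pos = sketch_container_of(pos->member.next, type, member))

struct demo_msg { long type; struct list_head link; };

void demo_list_init(struct list_head *h) { h->next = h; h->prev = h; }

void demo_list_add_tail(struct list_head *node, struct list_head *head)
{
    node->prev = head->prev;
    node->next = head;
    head->prev->next = node;
    head->prev = node;
}

/* Open-coded walk: manual pointer chasing plus container_of at each step. */
long sum_open_coded(struct list_head *head)
{
    long sum = 0;
    struct list_head *p;

    for (p = head->next; p != head; p = p->next)
        sum += sketch_container_of(p, struct demo_msg, link)->type;
    return sum;
}

/* Same walk expressed through the iterator macro. */
long sum_with_macro(struct list_head *head)
{
    long sum = 0;
    struct demo_msg *m;

    sketch_for_each_entry(m, head, struct demo_msg, link)
        sum += m->type;
    return sum;
}

/* Builds a three-entry queue (types 1, 2, 4) and sums it either way. */
long demo_sum(int use_macro)
{
    struct list_head queue;
    struct demo_msg a = { .type = 1 }, b = { .type = 2 }, c = { .type = 4 };

    demo_list_init(&queue);
    demo_list_add_tail(&a.link, &queue);
    demo_list_add_tail(&b.link, &queue);
    demo_list_add_tail(&c.link, &queue);
    return use_macro ? sum_with_macro(&queue) : sum_open_coded(&queue);
}
```

Both traversals visit the same entries; the macro form simply hides the pointer arithmetic that the open-coded loop repeats at every call site.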
     
  • Introduce finer grained locking for semtimedop, to handle the common case
    of a program wanting to manipulate one semaphore from an array with
    multiple semaphores.

    If the call is a semop manipulating just one semaphore in an array with
    multiple semaphores, only take the lock for that semaphore itself.

    If the call needs to manipulate multiple semaphores, or another caller is
    in a transaction that manipulates multiple semaphores, the sem_array lock
    is taken, as well as all the locks for the individual semaphores.

    On a 24 CPU system, performance numbers with the semop-multi
    test with N threads and N semaphores, look like this:

                vanilla    Davidlohr's    Davidlohr's +     Davidlohr's +
    threads                patches        rwlock patches    v3 patches
    10          610652     726325         1783589           2142206
    20          341570     365699         1520453           1977878
    30          288102     307037         1498167           2037995
    40          290714     305955         1612665           2256484
    50          288620     312890         1733453           2650292
    60          289987     306043         1649360           2388008
    70          291298     306347         1723167           2717486
    80          290948     305662         1729545           2763582
    90          290996     306680         1736021           2757524
    100         292243     306700         1773700           3059159

    [davidlohr.bueso@hp.com: do not call sem_lock when bogus sma]
    [davidlohr.bueso@hp.com: make refcounter atomic]
    Signed-off-by: Rik van Riel
    Suggested-by: Linus Torvalds
    Acked-by: Davidlohr Bueso
    Cc: Chegu Vinod
    Cc: Jason Low
    Reviewed-by: Michel Lespinasse
    Cc: Peter Hurley
    Cc: Stanislav Kinsbursky
    Tested-by: Emmanuel Benisty
    Tested-by: Sedat Dilek
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rik van Riel
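
The locking rule in this commit message can be modelled as a small decision function. This is a deliberately simplified sketch, not the kernel's sem_lock(): the function name `locks_taken` and its parameters are hypothetical, and real code also has to handle races between the two paths.

```c
#include <stddef.h>

/*
 * How many locks one call would take in this simplified model.
 *   nsems             - semaphores in the array
 *   nsops             - semaphores touched by this call
 *   complex_in_flight - nonzero if another caller is in a multi-semaphore
 *                       transaction
 */
size_t locks_taken(size_t nsems, size_t nsops, int complex_in_flight)
{
    /* Fast path: one semaphore touched and no multi-semaphore op pending. */
    if (nsops == 1 && !complex_in_flight)
        return 1;

    /* Slow path: the array lock plus every per-semaphore lock. */
    return 1 + nsems;
}
```

In this model, N threads each operating on their own semaphore contend on N different locks rather than one shared lock, which is consistent with the throughput gains reported in the table above.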
     
  • Having only one list in struct sem_queue, and only queueing simple
    semaphore operations on the list for the semaphore involved, allows us to
    introduce finer grained locking for semtimedop.

    Signed-off-by: Rik van Riel
    Acked-by: Davidlohr Bueso
    Cc: Chegu Vinod
    Cc: Emmanuel Benisty
    Cc: Jason Low
    Cc: Michel Lespinasse
    Cc: Peter Hurley
    Cc: Stanislav Kinsbursky
    Tested-by: Sedat Dilek
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rik van Riel
     
  • Rename sem_lock() to sem_obtain_lock(), so we can introduce a sem_lock()
    later that only locks the sem_array and does nothing else.

    Open code the locking from ipc_lock() in sem_obtain_lock() so we can
    introduce finer grained locking for the sem_array in the next patch.

    [akpm@linux-foundation.org: propagate the ipc_obtain_object() errno out of sem_obtain_lock()]
    Signed-off-by: Rik van Riel
    Acked-by: Davidlohr Bueso
    Cc: Chegu Vinod
    Cc: Emmanuel Benisty
    Cc: Jason Low
    Cc: Michel Lespinasse
    Cc: Peter Hurley
    Cc: Stanislav Kinsbursky
    Tested-by: Sedat Dilek
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rik van Riel
     
  • Instead of holding the ipc lock for permissions and security checks, among
    others, only acquire it when necessary.

    Some numbers....

    1) With Rik's semop-multi.c microbenchmark we can see the following
    results:

    Baseline (3.9-rc1):
    cpus 4, threads: 256, semaphores: 128, test duration: 30 secs
    total operations: 151452270, ops/sec 5048409

    + 59.40% a.out [kernel.kallsyms] [k] _raw_spin_lock
    + 6.14% a.out [kernel.kallsyms] [k] sys_semtimedop
    + 3.84% a.out [kernel.kallsyms] [k] avc_has_perm_flags
    + 3.64% a.out [kernel.kallsyms] [k] __audit_syscall_exit
    + 2.06% a.out [kernel.kallsyms] [k] copy_user_enhanced_fast_string
    + 1.86% a.out [kernel.kallsyms] [k] ipc_lock

    With this patchset:
    cpus 4, threads: 256, semaphores: 128, test duration: 30 secs
    total operations: 273156400, ops/sec 9105213

    + 18.54% a.out [kernel.kallsyms] [k] _raw_spin_lock
    + 11.72% a.out [kernel.kallsyms] [k] sys_semtimedop
    + 7.70% a.out [kernel.kallsyms] [k] ipc_has_perm.isra.21
    + 6.58% a.out [kernel.kallsyms] [k] avc_has_perm_flags
    + 6.54% a.out [kernel.kallsyms] [k] __audit_syscall_exit
    + 4.71% a.out [kernel.kallsyms] [k] ipc_obtain_object_check

    2) While on an Oracle swingbench DSS (data mining) workload the
    improvements are not as exciting as with Rik's benchmark, we can see
    some positive numbers. For an 8 socket machine the following are the
    percentages of %sys time incurred in the ipc lock:

    Baseline (3.9-rc1):
    100 swingbench users: 8,74%
    400 swingbench users: 21,86%
    800 swingbench users: 84,35%

    With this patchset:
    100 swingbench users: 8,11%
    400 swingbench users: 19,93%
    800 swingbench users: 77,69%

    [riel@redhat.com: fix two locking bugs]
    [sasha.levin@oracle.com: prevent releasing RCU read lock twice in semctl_main]
    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Davidlohr Bueso
    Signed-off-by: Rik van Riel
    Reviewed-by: Chegu Vinod
    Acked-by: Michel Lespinasse
    Cc: Rik van Riel
    Cc: Jason Low
    Cc: Emmanuel Benisty
    Cc: Peter Hurley
    Cc: Stanislav Kinsbursky
    Tested-by: Sedat Dilek
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     
  • Various forms of ipc use ipcctl_pre_down() to retrieve an ipc object and
    check permissions, mostly for IPC_RMID and IPC_SET commands.

    Introduce ipcctl_pre_down_nolock(), a lockless version of this function.
    The locking version is retained, yet modified to call the nolock version
    without affecting its semantics, thus transparent to all ipc callers.

    Signed-off-by: Davidlohr Bueso
    Signed-off-by: Rik van Riel
    Suggested-by: Linus Torvalds
    Cc: Chegu Vinod
    Cc: Emmanuel Benisty
    Cc: Jason Low
    Cc: Michel Lespinasse
    Cc: Peter Hurley
    Cc: Stanislav Kinsbursky
    Tested-by: Sedat Dilek
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     
  • Through ipc_lock() and therefore ipc_lock_check() we currently return the
    locked ipc object. This is not necessary for all situations and can,
    therefore, cause unnecessary ipc lock contention.

    Introduce analogous ipc_obtain_object() and ipc_obtain_object_check()
    functions that only lookup and return the ipc object.

    Both these functions must be called within the RCU read critical section.

    [akpm@linux-foundation.org: propagate the ipc_obtain_object() errno from ipc_lock()]
    Signed-off-by: Davidlohr Bueso
    Signed-off-by: Rik van Riel
    Reviewed-by: Chegu Vinod
    Acked-by: Michel Lespinasse
    Cc: Emmanuel Benisty
    Cc: Jason Low
    Cc: Peter Hurley
    Cc: Stanislav Kinsbursky
    Tested-by: Sedat Dilek
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     
  • This series makes the sysv semaphore code more scalable, by reducing the
    time the semaphore lock is held, and making the locking more scalable for
    semaphore arrays with multiple semaphores.

    The first four patches were written by Davidlohr Bueso, and reduce the
    hold time of the semaphore lock.

    The last three patches change the sysv semaphore code locking to be more
    fine grained, providing a performance boost when multiple semaphores in a
    semaphore array are being manipulated simultaneously.

    On a 24 CPU system, performance numbers with the semop-multi
    test with N threads and N semaphores, look like this:

                vanilla    Davidlohr's    Davidlohr's +     Davidlohr's +
    threads                patches        rwlock patches    v3 patches
    10          610652     726325         1783589           2142206
    20          341570     365699         1520453           1977878
    30          288102     307037         1498167           2037995
    40          290714     305955         1612665           2256484
    50          288620     312890         1733453           2650292
    60          289987     306043         1649360           2388008
    70          291298     306347         1723167           2717486
    80          290948     305662         1729545           2763582
    90          290996     306680         1736021           2757524
    100         292243     306700         1773700           3059159

    This patch:

    There is no reason to be holding the ipc lock while reading ipcp->seq,
    hence remove misleading comment.

    Also simplify the return value for the function.

    Signed-off-by: Davidlohr Bueso
    Signed-off-by: Rik van Riel
    Cc: Chegu Vinod
    Cc: Emmanuel Benisty
    Cc: Jason Low
    Cc: Michel Lespinasse
    Cc: Peter Hurley
    Cc: Stanislav Kinsbursky
    Tested-by: Sedat Dilek
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     
  • Signed-off-by: HoSung Jung
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    HoSung Jung
     
  • [fengguang.wu@intel.com: find_msg can be static]
    Signed-off-by: Peter Hurley
    Cc: Fengguang Wu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Hurley
     
  • Signed-off-by: Peter Hurley
    Acked-by: Stanislav Kinsbursky
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Hurley
     
  • Teach the helper routines about MSG_COPY so that msgtyp is preserved as
    the message number to copy.

    The security functions affected by this change were audited and no
    additional changes are necessary.

    Signed-off-by: Peter Hurley
    Acked-by: Stanislav Kinsbursky
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Hurley
     
  • In preparation for refactoring the queue scan into a separate
    function, relocate msg copying.

    Signed-off-by: Peter Hurley
    Acked-by: Stanislav Kinsbursky
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Hurley
     
  • Signed-off-by: Peter Hurley
    Acked-by: Stanislav Kinsbursky
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Hurley
     
  • Signed-off-by: Peter Hurley
    Acked-by: Stanislav Kinsbursky
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Hurley
     
  • Separating msg allocation enables single-block vmalloc
    allocation instead.

    Signed-off-by: Peter Hurley
    Acked-by: Stanislav Kinsbursky
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Hurley
     
  • Signed-off-by: Peter Hurley
    Acked-by: Stanislav Kinsbursky
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Hurley
     
  • Pull compat cleanup from Al Viro:
    "Mostly about syscall wrappers this time; there will be another pile
    with patches in the same general area from various people, but I'd
    rather push those after both that and vfs.git pile are in."

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal:
    syscalls.h: slightly reduce the jungles of macros
    get rid of union semop in sys_semctl(2) arguments
    make do_mremap() static
    sparc: no need to sign-extend in sync_file_range() wrapper
    ppc compat wrappers for add_key(2) and request_key(2) are pointless
    x86: trim sys_ia32.h
    x86: sys32_kill and sys32_mprotect are pointless
    get rid of compat_sys_semctl() and friends in case of ARCH_WANT_OLD_COMPAT_IPC
    merge compat sys_ipc instances
    consolidate compat lookup_dcookie()
    convert vmsplice to COMPAT_SYSCALL_DEFINE
    switch getrusage() to COMPAT_SYSCALL_DEFINE
    switch epoll_pwait to COMPAT_SYSCALL_DEFINE
    convert sendfile{,64} to COMPAT_SYSCALL_DEFINE
    switch signalfd{,4}() to COMPAT_SYSCALL_DEFINE
    make SYSCALL_DEFINE-generated wrappers do asmlinkage_protect
    make HAVE_SYSCALL_WRAPPERS unconditional
    consolidate cond_syscall and SYSCALL_ALIAS declarations
    teach SYSCALL_DEFINE how to deal with long long/unsigned long long
    get rid of duplicate logics in __SC_....[1-6] definitions

    Linus Torvalds
     

30 Apr, 2013

1 commit


03 Apr, 2013

1 commit

  • Make sure that the msg pointer is set back to an error value if the
    MSG_COPY flag is set and the desired message to copy wasn't found. This
    guarantees that msg is either an error pointer or a copy address.

    Otherwise the last message in queue will be freed without unlinking from
    the queue (which leads to memory corruption) and the dummy allocated
    copy won't be released.

    Signed-off-by: Stanislav Kinsbursky
    Signed-off-by: Linus Torvalds

    Stanislav Kinsbursky
     

29 Mar, 2013

1 commit

  • Pull userns fixes from Eric W Biederman:
    "The bulk of the changes are fixing the worst consequences of the user
    namespace design oversight in not considering what happens when one
    namespace starts off as a clone of another namespace, as happens with
    the mount namespace.

    The rest of the changes are just plain bug fixes.

    Many thanks to Andy Lutomirski for pointing out many of these issues."

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
    userns: Restrict when proc and sysfs can be mounted
    ipc: Restrict mounting the mqueue filesystem
    vfs: Carefully propogate mounts across user namespaces
    vfs: Add a mount flag to lock read only bind mounts
    userns: Don't allow creation if the user is chrooted
    yama: Better permission check for ptraceme
    pid: Handle the exit of a multi-threaded init.
    scm: Require CAP_SYS_ADMIN over the current pidns to spoof pids.

    Linus Torvalds
     

27 Mar, 2013

1 commit

  • Only allow mounting the mqueue filesystem if the caller has CAP_SYS_ADMIN
    rights over the ipc namespace. The principle here is if you create
    or have capabilities over it you can mount it, otherwise you get to live
    with what other people have mounted.

    This information is not particularly sensitive and mqueue essentially
    only reports which POSIX message queues exist. Still, when creating a
    restricted environment for an application to live in, any extra
    information may be of use to someone with sufficient creativity. The
    historical, if imperfect, way this information has been restricted has
    been not to allow mounts, and restricting this to ipc namespace
    creators maintains the spirit of the historical restriction.

    Cc: stable@vger.kernel.org
    Acked-by: Serge Hallyn
    Signed-off-by: "Eric W. Biederman"

    Eric W. Biederman
     

23 Mar, 2013

1 commit

  • mnt_drop_write() must be called only if mnt_want_write() succeeded,
    otherwise the mnt_writers counter will diverge.

    mnt_writers counters are used to check if remounting FS as read-only is
    OK, so after an extra mnt_drop_write() call, it would be impossible to
    remount mqueue FS as read-only. Besides, on umount a warning would be
    printed like this one:

    =====================================
    [ BUG: bad unlock balance detected! ]
    3.9.0-rc3 #5 Not tainted
    -------------------------------------
    a.out/12486 is trying to release lock (sb_writers) at:
    mnt_drop_write+0x1f/0x30
    but there are no more locks to release!

    Signed-off-by: Vladimir Davydov
    Cc: Doug Ledford
    Cc: KOSAKI Motohiro
    Cc: "Eric W. Biederman"
    Cc: Al Viro
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vladimir Davydov
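
The counter imbalance described above can be reproduced with a toy model. This is a sketch only: `struct mount_sketch`, `want_write`, `drop_write` and the other names are hypothetical stand-ins for the kernel helpers, and the read-only flag is just one assumed way for the "want" step to fail.

```c
#include <errno.h>

/* Toy mount: a read-only flag and a count of active writers. */
struct mount_sketch { int read_only; int writers; };

/* Stand-in for mnt_want_write(): may fail, counts only on success. */
int want_write(struct mount_sketch *m)
{
    if (m->read_only)
        return -EROFS;
    m->writers++;
    return 0;
}

/* Stand-in for mnt_drop_write(): must pair with a successful want_write(). */
void drop_write(struct mount_sketch *m) { m->writers--; }

/* Buggy pattern: drops the reference even when it was never taken. */
int op_buggy(struct mount_sketch *m)
{
    int err = want_write(m);

    drop_write(m);
    return err;
}

/* Fixed pattern: drop only after a successful want_write(). */
int op_fixed(struct mount_sketch *m)
{
    int err = want_write(m);

    if (err)
        return err;
    drop_write(m);
    return 0;
}

/* Writer count left behind after one operation on a fresh mount. */
int writers_after(int read_only, int use_fixed)
{
    struct mount_sketch m = { read_only, 0 };

    if (use_fixed)
        op_fixed(&m);
    else
        op_buggy(&m);
    return m.writers;
}
```

In the buggy variant a failed "want" leaves the counter at -1, which is the kind of divergence that triggers the unlock-balance warning quoted in the commit message.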
     

09 Mar, 2013

2 commits

  • When MSG_COPY is set, a duplicate message must be allocated for the copy
    before locking the queue. However, the copy cannot be larger than what
    was sent, which is limited to msg_ctlmax.

    Signed-off-by: Peter Hurley
    Acked-by: Stanislav Kinsbursky
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Hurley
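
The size limit described above amounts to clamping the requested buffer size before allocating the copy. A minimal sketch, assuming the hypothetical helper name `copy_alloc_size` and plain size_t arguments in place of the kernel's namespace fields:

```c
#include <stddef.h>

/*
 * Size to allocate for the MSG_COPY duplicate: never more than the
 * caller's buffer, and never more than the per-message maximum.
 */
size_t copy_alloc_size(size_t bufsz, size_t msg_ctlmax)
{
    return bufsz < msg_ctlmax ? bufsz : msg_ctlmax;
}
```

A caller passing a very large buffer size therefore cannot force an allocation bigger than any message that could actually be on the queue.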
     
  • If the src msg is > 4k, then dest->next points to the
    next allocated segment; resetting it just prior to dereferencing
    is bad.

    Signed-off-by: Peter Hurley
    Acked-by: Stanislav Kinsbursky
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Hurley
     

06 Mar, 2013

1 commit


04 Mar, 2013

3 commits


28 Feb, 2013

1 commit

  • Convert to the much saner new idr interface.

    The new interface doesn't directly translate to the way idr_pre_get()
    was used around ipc_addid() as preloading disables preemption. From
    my cursory reading, it seems like we should be able to do all
    allocation from ipc_addid(), so I moved it there. Can you please
    check whether this would be okay? If this is wrong and ipc_addid()
    should be allowed to be called from non-sleepable context, I'd suggest
    allocating id itself in the outer functions and later install the
    pointer using idr_replace().

    Signed-off-by: Tejun Heo
    Reported-by: Sedat Dilek
    Tested-by: Sedat Dilek
    Cc: Stanislav Kinsbursky
    Cc: "Eric W. Biederman"
    Cc: James Morris
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tejun Heo
     

27 Feb, 2013

1 commit

  • Pull vfs pile (part one) from Al Viro:
    "Assorted stuff - cleaning namei.c up a bit, fixing ->d_name/->d_parent
    locking violations, etc.

    The most visible changes here are death of FS_REVAL_DOT (replaced with
    "has ->d_weak_revalidate()") and a new helper getting from struct file
    to inode. Some bits of preparation to xattr method interface changes.

    Misc patches by various people sent this cycle *and* ocfs2 fixes from
    several cycles ago that should've been upstream right then.

    PS: the next vfs pile will be xattr stuff."

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (46 commits)
    saner proc_get_inode() calling conventions
    proc: avoid extra pde_put() in proc_fill_super()
    fs: change return values from -EACCES to -EPERM
    fs/exec.c: make bprm_mm_init() static
    ocfs2/dlm: use GFP_ATOMIC inside a spin_lock
    ocfs2: fix possible use-after-free with AIO
    ocfs2: Fix oops in ocfs2_fast_symlink_readpage() code path
    get_empty_filp()/alloc_file() leave both ->f_pos and ->f_version zero
    target: writev() on single-element vector is pointless
    export kernel_write(), convert open-coded instances
    fs: encode_fh: return FILEID_INVALID if invalid fid_type
    kill f_vfsmnt
    vfs: kill FS_REVAL_DOT by adding a d_weak_revalidate dentry op
    nfsd: handle vfs_getattr errors in acl protocol
    switch vfs_getattr() to struct path
    default SET_PERSONALITY() in linux/elf.h
    ceph: prepopulate inodes only when request is aborted
    d_hash_and_lookup(): export, switch open-coded instances
    9p: switch v9fs_set_create_acl() to inode+fid, do it before d_instantiate()
    9p: split dropping the acls from v9fs_set_create_acl()
    ...

    Linus Torvalds
     

26 Feb, 2013

1 commit

  • Pull user namespace and namespace infrastructure changes from Eric W Biederman:
    "This set of changes starts with a few small enhancements to the user
    namespace: reboot support, allowing more arbitrary mappings, and
    support for mounting devpts, ramfs, tmpfs, and mqueuefs as just the
    user namespace root.

    I do my best to document that if you care about limiting your
    unprivileged users, you will need to enable memory control groups
    when user namespace support is enabled.

    There is a minor bug fix to prevent overflowing the stack if someone
    creates way too many user namespaces.

    The bulk of the changes are a continuation of the kuid/kgid push down
    work through the filesystems. These changes make using uids and gids
    typesafe which ensures that these filesystems are safe to use when
    multiple user namespaces are in use. The filesystems converted for
    3.9 are ceph, 9p, afs, ocfs2, gfs2, ncpfs, nfs, nfsd, and cifs. The
    changes for these filesystems were a little more involved so I split
    the changes into smaller hopefully obviously correct changes.

    XFS is the only filesystem that remains. I was hoping I could get
    that in this release so that user namespace support would be enabled
    with an allyesconfig or an allmodconfig but it looks like the xfs
    changes need another couple of days before they are ready."

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (93 commits)
    cifs: Enable building with user namespaces enabled.
    cifs: Convert struct cifs_ses to use a kuid_t and a kgid_t
    cifs: Convert struct cifs_sb_info to use kuids and kgids
    cifs: Modify struct smb_vol to use kuids and kgids
    cifs: Convert struct cifsFileInfo to use a kuid
    cifs: Convert struct cifs_fattr to use kuid and kgids
    cifs: Convert struct tcon_link to use a kuid.
    cifs: Modify struct cifs_unix_set_info_args to hold a kuid_t and a kgid_t
    cifs: Convert from a kuid before printing current_fsuid
    cifs: Use kuids and kgids SID to uid/gid mapping
    cifs: Pass GLOBAL_ROOT_UID and GLOBAL_ROOT_GID to keyring_alloc
    cifs: Use BUILD_BUG_ON to validate uids and gids are the same size
    cifs: Override unmappable incoming uids and gids
    nfsd: Enable building with user namespaces enabled.
    nfsd: Properly compare and initialize kuids and kgids
    nfsd: Store ex_anon_uid and ex_anon_gid as kuids and kgids
    nfsd: Modify nfsd4_cb_sec to use kuids and kgids
    nfsd: Handle kuids and kgids in the nfs4acl to posix_acl conversion
    nfsd: Convert nfsxdr to use kuids and kgids
    nfsd: Convert nfs3xdr to use kuids and kgids
    ...

    Linus Torvalds
     

24 Feb, 2013

2 commits

  • do_mmap_pgoff() rounds up the desired size to the next PAGE_SIZE
    multiple, however there was no equivalent code in mm_populate(), which
    caused issues.

    This could be fixed by introducing the same rounding in mm_populate(),
    however I think it's preferable to make do_mmap_pgoff() return populate
    as a size rather than as a boolean, so we don't have to duplicate the
    size rounding logic in mm_populate().

    Signed-off-by: Michel Lespinasse
    Acked-by: Rik van Riel
    Tested-by: Andy Lutomirski
    Cc: Greg Ungerer
    Cc: David Howells
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
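
The design choice in this commit, reporting the populate length as a size instead of a boolean, can be sketched as follows. The 4 KiB page size and the names `demo_page_align`, `demo_map` and `demo_populate_len` are assumptions made for illustration, not the kernel's interfaces.

```c
#define DEMO_PAGE_SIZE 4096UL  /* assumed page size */

/* Round a length up to the next multiple of the page size. */
unsigned long demo_page_align(unsigned long len)
{
    return (len + DEMO_PAGE_SIZE - 1) & ~(DEMO_PAGE_SIZE - 1);
}

/*
 * Hypothetical stand-in for the mapping step: it rounds the length once
 * and reports how many bytes the caller should populate (0 = nothing).
 */
unsigned long demo_map(unsigned long len, int want_populate,
                       unsigned long *populate)
{
    unsigned long rounded = demo_page_align(len);

    *populate = want_populate ? rounded : 0;
    return rounded;
}

/* Bytes a caller would hand to the populate step for this request. */
unsigned long demo_populate_len(unsigned long len, int want_populate)
{
    unsigned long populate;

    demo_map(len, want_populate, &populate);
    return populate;
}
```

Because the rounded size travels with the result, the populate step does not need its own copy of the rounding logic and cannot disagree with the mapping step about the length.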
     
  • When creating new mappings using the MAP_POPULATE / MAP_LOCKED flags (or
    with MCL_FUTURE in effect), we want to populate the pages within the
    newly created vmas. This may take a while as we may have to read pages
    from disk, so ideally we want to do this outside of the write-locked
    mmap_sem region.

    This change introduces mm_populate(), which is used to defer populating
    such mappings until after the mmap_sem write lock has been released.
    This is implemented as a generalization of the former do_mlock_pages(),
    which accomplished the same task but was used during mlock() /
    mlockall().

    Signed-off-by: Michel Lespinasse
    Reported-by: Andy Lutomirski
    Acked-by: Rik van Riel
    Tested-by: Andy Lutomirski
    Cc: Greg Ungerer
    Cc: David Howells
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     

23 Feb, 2013

2 commits

  • Allocating a file structure in function get_empty_filp() might fail for
    several reasons:
    - not enough memory for file structures
    - operation is not allowed
    - user is over its limit

    Currently the function returns NULL in all cases and we lose the exact
    reason for the error. All callers of get_empty_filp() assume that the function
    can fail with ENFILE only.

    Return error through pointer. Change all callers to preserve this error code.

    [AV: cleaned up a bit, carved the get_empty_filp() part out into a separate commit
    (things remaining here deal with alloc_file()), removed pipe(2) behaviour change]

    Signed-off-by: Anatol Pomozov
    Reviewed-by: "Theodore Ts'o"
    Signed-off-by: Al Viro

    Anatol Pomozov
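
Returning an error through the pointer itself can be sketched in user space. The helpers below mimic the idea behind the kernel's ERR_PTR(), PTR_ERR() and IS_ERR(), but they are simplified re-implementations; `struct file_sketch`, `get_file_sketch` and the failure conditions are hypothetical.

```c
#include <errno.h>
#include <stdint.h>

#define SKETCH_MAX_ERRNO 4095

/* Encode a negative errno value in a pointer. */
void *err_ptr(long error) { return (void *)(intptr_t)error; }

/* Decode the errno value from such a pointer. */
long ptr_err(const void *ptr) { return (long)(intptr_t)ptr; }

/* True if the pointer lies in the top SKETCH_MAX_ERRNO addresses. */
int is_err(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-SKETCH_MAX_ERRNO;
}

struct file_sketch { int unused; };

static struct file_sketch the_file;

/* Toy allocator that reports *why* it failed instead of returning NULL. */
struct file_sketch *get_file_sketch(int nr_files, int max_files,
                                    int have_memory)
{
    if (nr_files >= max_files)
        return err_ptr(-ENFILE);   /* over the file limit */
    if (!have_memory)
        return err_ptr(-ENOMEM);   /* allocation failed */
    return &the_file;
}

/* Caller side: preserve the callee's error code rather than guessing. */
long open_result(int nr_files, int max_files, int have_memory)
{
    struct file_sketch *f = get_file_sketch(nr_files, max_files, have_memory);

    return is_err(f) ? ptr_err(f) : 0;
}
```

With a NULL return the caller could only guess at ENFILE; here each failure reason reaches the caller unchanged.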
     
  • Signed-off-by: Al Viro

    Al Viro
     

28 Jan, 2013

1 commit

  • This patch allows an unprivileged user to mount mqueuefs in a
    user namespace.

    If two user namespaces share the same ipcns, the files in mqueue fs
    should be visible in both of them.

    If a userns has its own ipcns, it has its own mqueue fs too.
    ipcns already handles this well.

    Signed-off-by: Gao feng
    Signed-off-by: Eric W. Biederman

    Gao feng
     

05 Jan, 2013

2 commits