13 Jul, 2017

19 commits

  • Now that ipc_rcu_alloc() and ipc_rcu_free() are removed, document when
    it is valid to use ipc_getref() and ipc_putref().

    Link: http://lkml.kernel.org/r/20170525185107.12869-21-manfred@colorfullife.com
    Signed-off-by: Manfred Spraul
    Cc: Davidlohr Bueso
    Cc: Kees Cook
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Manfred Spraul
     
  • The remaining users of __sem_free() can simply call kvfree() instead for
    better readability.

    [manfred@colorfullife.com: Rediff to keep rcu protection for security_sem_alloc()]
    Link: http://lkml.kernel.org/r/20170525185107.12869-20-manfred@colorfullife.com
    Signed-off-by: Kees Cook
    Signed-off-by: Manfred Spraul
    Cc: Davidlohr Bueso
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kees Cook
     
  • There is nothing special about the msg_alloc/free routines any more, so
    remove them to make code more readable.

    [manfred@colorfullife.com: Rediff to keep rcu protection for security_msg_queue_alloc()]
    Link: http://lkml.kernel.org/r/20170525185107.12869-19-manfred@colorfullife.com
    Signed-off-by: Kees Cook
    Signed-off-by: Manfred Spraul
    Cc: Davidlohr Bueso
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kees Cook
     
  • There is nothing special about the shm_alloc/free routines any more, so
    remove them to make code more readable.

    [manfred@colorfullife.com: Rediff, to continue to keep rcu for free calls after a successful security_shm_alloc()]
    Link: http://lkml.kernel.org/r/20170525185107.12869-18-manfred@colorfullife.com
    Signed-off-by: Kees Cook
    Signed-off-by: Manfred Spraul
    Cc: Davidlohr Bueso
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kees Cook
     
  • Only after ipc_addid() has succeeded will refcounting be used, so move
    initialization into ipc_addid() and remove from open-coded *_alloc()
    routines.

    Link: http://lkml.kernel.org/r/20170525185107.12869-17-manfred@colorfullife.com
    Signed-off-by: Kees Cook
    Signed-off-by: Manfred Spraul
    Cc: Davidlohr Bueso
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kees Cook
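
    The ordering described in this commit can be pictured with a small userspace sketch. All names here (perm_sketch, ids_sketch, addid_sketch) are hypothetical stand-ins, not the kernel's definitions, and a plain int takes the place of the real reference counter. The point is only that the refcount is first written at the moment the object is published.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; not the kernel's structures. */
struct perm_sketch {
    int id;        /* slot index, valid only after a successful add */
    int refcount;  /* left untouched until the object is published */
};

#define IDS_MAX_SKETCH 4

struct ids_sketch {
    struct perm_sketch *slot[IDS_MAX_SKETCH];
};

/*
 * Publish 'p' in 'ids'. The refcount is set here, and only on success,
 * because nobody else can take a reference before the object is visible.
 * Returns the new id, or -ENOSPC when the table is full.
 */
static int addid_sketch(struct ids_sketch *ids, struct perm_sketch *p)
{
    assert(ids != NULL && p != NULL);

    for (int i = 0; i < IDS_MAX_SKETCH; i++) {
        if (ids->slot[i] == NULL) {
            p->refcount = 1;
            p->id = i;
            ids->slot[i] = p;
            return i;
        }
    }
    return -ENOSPC;
}
```

    A caller that gets a negative return value has an object whose refcount was never initialized, so the failure path cannot go through a "put" style helper.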
     
  • Loosely based on a patch from Kees Cook:
    - id and retval can be merged
    - if ipc_addid() fails, then use call_rcu() directly.

    The difference is that call_rcu is used for failed ipc_addid() calls, to
    continue to guarantee an rcu delay for security_msg_queue_free().

    Link: http://lkml.kernel.org/r/20170525185107.12869-16-manfred@colorfullife.com
    Signed-off-by: Manfred Spraul
    Cc: Kees Cook
    Cc: Davidlohr Bueso
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Manfred Spraul
     
  • Loosely based on a patch from Kees Cook:
    - id and error can be merged
    - if operations before ipc_addid() fail, then use call_rcu() directly.

    The difference is that call_rcu is used for failures after
    security_shm_alloc(), to continue to guarantee an rcu delay for
    security_sem_free().

    Link: http://lkml.kernel.org/r/20170525185107.12869-15-manfred@colorfullife.com
    Signed-off-by: Manfred Spraul
    Cc: Kees Cook
    Cc: Davidlohr Bueso
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Manfred Spraul
     
  • Loosely based on a patch from Kees Cook:
    - id and retval can be merged
    - if ipc_addid() fails, then use call_rcu() directly.

    The difference is that call_rcu is used for failed ipc_addid() calls, to
    continue to guarantee an rcu delay for security_sem_free().

    Link: http://lkml.kernel.org/r/20170525185107.12869-14-manfred@colorfullife.com
    Signed-off-by: Manfred Spraul
    Cc: Kees Cook
    Cc: Davidlohr Bueso
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Manfred Spraul
     
  • No callers remain for ipc_rcu_alloc(). Drop the function.

    [manfred@colorfullife.com: Rediff because the memset was temporarily inside ipc_rcu_free()]
    Link: http://lkml.kernel.org/r/20170525185107.12869-13-manfred@colorfullife.com
    Signed-off-by: Kees Cook
    Signed-off-by: Manfred Spraul
    Cc: Kees Cook
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kees Cook
     
  • Instead of using ipc_rcu_alloc() which only performs the refcount bump,
    open code it. This also allows for msg_queue structure layout to be
    randomized in the future.

    Link: http://lkml.kernel.org/r/20170525185107.12869-12-manfred@colorfullife.com
    Signed-off-by: Kees Cook
    Signed-off-by: Manfred Spraul
    Cc: Davidlohr Bueso
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kees Cook
     
  • Instead of using ipc_rcu_alloc() which only performs the refcount bump,
    open code it. This also allows for shmid_kernel structure layout to be
    randomized in the future.

    Link: http://lkml.kernel.org/r/20170525185107.12869-11-manfred@colorfullife.com
    Signed-off-by: Kees Cook
    Signed-off-by: Manfred Spraul
    Cc: Davidlohr Bueso
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kees Cook
     
  • Instead of using ipc_rcu_alloc() which only performs the refcount bump,
    open code it to perform better sem-specific checks. This also allows
    for sem_array structure layout to be randomized in the future.

    [manfred@colorfullife.com: Rediff, because the memset was temporarily inside ipc_rcu_alloc()]
    Link: http://lkml.kernel.org/r/20170525185107.12869-10-manfred@colorfullife.com
    Signed-off-by: Kees Cook
    Signed-off-by: Manfred Spraul
    Cc: Davidlohr Bueso
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kees Cook
     
  • There are no more callers of ipc_rcu_free(), so remove it.

    Link: http://lkml.kernel.org/r/20170525185107.12869-9-manfred@colorfullife.com
    Signed-off-by: Kees Cook
    Signed-off-by: Manfred Spraul
    Cc: Davidlohr Bueso
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kees Cook
     
  • Avoid using ipc_rcu_free, since it just re-finds the original structure
    pointer. For the pre-list-init failure path, there is no RCU needed,
    since it was just allocated. It can be directly freed.

    Link: http://lkml.kernel.org/r/20170525185107.12869-8-manfred@colorfullife.com
    Signed-off-by: Kees Cook
    Signed-off-by: Manfred Spraul
    Cc: Davidlohr Bueso
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kees Cook
     
  • Avoid using ipc_rcu_free, since it just re-finds the original structure
    pointer. For the pre-list-init failure path, there is no RCU needed,
    since it was just allocated. It can be directly freed.

    Link: http://lkml.kernel.org/r/20170525185107.12869-7-manfred@colorfullife.com
    Signed-off-by: Kees Cook
    Signed-off-by: Manfred Spraul
    Cc: Davidlohr Bueso
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kees Cook
     
  • Avoid using ipc_rcu_free, since it just re-finds the original structure
    pointer. For the pre-list-init failure path, there is no RCU needed,
    since it was just allocated. It can be directly freed.

    Link: http://lkml.kernel.org/r/20170525185107.12869-6-manfred@colorfullife.com
    Signed-off-by: Kees Cook
    Signed-off-by: Manfred Spraul
    Cc: Davidlohr Bueso
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kees Cook
     
  • The only users of ipc_alloc() were ipc_rcu_alloc() and the on-heap
    sem_io fall-back memory. Better to just open-code these to make things
    easier to read.

    [manfred@colorfullife.com: Rediff due to inclusion of memset() into ipc_rcu_alloc()]
    Link: http://lkml.kernel.org/r/20170525185107.12869-5-manfred@colorfullife.com
    Signed-off-by: Kees Cook
    Signed-off-by: Manfred Spraul
    Cc: Davidlohr Bueso
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kees Cook
     
  • ipc has two management structures that exist for every id:
    - struct kern_ipc_perm, it contains e.g. the permissions.
    - struct ipc_rcu, it contains the rcu head for rcu handling and the
    refcount.

    The patch merges both structures.

    As a bonus, we may save one cacheline, because both structures are
    cacheline aligned. In addition, it reduces the number of casts: most
    codepaths can use container_of instead.

    To simplify code, the ipc_rcu_alloc initializes the allocation to 0.

    [manfred@colorfullife.com: really include the memset() into ipc_alloc_rcu()]
    Link: http://lkml.kernel.org/r/564f8612-0601-b267-514f-a9f650ec9b32@colorfullife.com
    Link: http://lkml.kernel.org/r/20170525185107.12869-3-manfred@colorfullife.com
    Signed-off-by: Manfred Spraul
    Cc: Davidlohr Bueso
    Cc: Kees Cook
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Manfred Spraul
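
    To illustrate why a merged structure lets code use container_of instead of casts, here is a minimal userspace sketch. The struct layouts and the names ending in _sketch are assumptions made for illustration; only the container_of idea itself comes from the commit message.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace re-creation of the container_of() idea. */
#define container_of_sketch(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-in for an rcu head. */
struct rcu_head_sketch {
    void *next;
};

/* Hypothetical merged structure: permissions, rcu head and refcount. */
struct ipc_perm_sketch {
    int mode;
    struct rcu_head_sketch rcu;
    int refcount;
};

/* A queue object embeds the merged structure instead of pointing at it. */
struct msq_sketch {
    struct ipc_perm_sketch q_perm;
    long q_qnum;
};

/* From the embedded permission struct back to the queue, no cast needed. */
static struct msq_sketch *msq_from_perm(struct ipc_perm_sketch *perm)
{
    return container_of_sketch(perm, struct msq_sketch, q_perm);
}

/* From an rcu callback argument back to the permission struct. */
static struct ipc_perm_sketch *perm_from_rcu(struct rcu_head_sketch *head)
{
    return container_of_sketch(head, struct ipc_perm_sketch, rcu);
}
```

    Because the conversion is driven by the member name rather than by an assumed position in memory, it keeps working if the members are reordered.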
     
  • sma->sem_base is initialized with

    sma->sem_base = (struct sem *) &sma[1];

    The current code has four problems:
    - There is an unnecessary pointer dereference - sem_base is not needed.
    - Alignment for struct sem only works by chance.
    - The current code causes false positives for static code analysis.
    - This is a cast between different non-void types, which the future
    randstruct GCC plugin warns on.

    And, as a bonus, the code size gets smaller:

    Before:
    0 .text 00003770
    After:
    0 .text 0000374e

    [manfred@colorfullife.com: s/[0]/[]/, per hch]
    Link: http://lkml.kernel.org/r/20170525185107.12869-2-manfred@colorfullife.com
    Link: http://lkml.kernel.org/r/20170515171912.6298-2-manfred@colorfullife.com
    Signed-off-by: Manfred Spraul
    Acked-by: Kees Cook
    Cc: Kees Cook
    Cc: Davidlohr Bueso
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Fabian Frederick
    Cc: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Manfred Spraul
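
    A minimal userspace sketch of the alternative to the &sma[1] cast: a flexible array member at the end of the structure (the "[]" form mentioned in the rediff note). The type names, the field name sems and the SEMS_MAX_SKETCH bound are assumptions for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define SEMS_MAX_SKETCH 32000  /* hypothetical upper bound for this sketch */

struct sem_sketch {
    int semval;
};

struct sem_array_sketch {
    int nsems;
    struct sem_sketch sems[];  /* flexible array member, no base pointer */
};

/*
 * One allocation holds the header and all semaphores. The compiler places
 * sems[] at a correctly aligned offset, so alignment no longer depends on
 * the size of the header happening to be a suitable multiple.
 */
static struct sem_array_sketch *sem_array_alloc_sketch(int nsems)
{
    struct sem_array_sketch *sma;
    size_t size;

    if (nsems <= 0 || nsems > SEMS_MAX_SKETCH)
        return NULL;

    size = sizeof(*sma) + (size_t)nsems * sizeof(sma->sems[0]);
    sma = calloc(1, size);
    if (sma != NULL)
        sma->nsems = nsems;
    return sma;
}
```

    Accesses become sma->sems[i] directly, which removes the extra pointer load and the cast between unrelated struct types.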
     

10 Jul, 2017

1 commit

  • The retry logic for netlink_attachskb() inside sys_mq_notify()
    is nasty and vulnerable:

    1) The sock refcnt is already released when retry is needed
    2) The fd is controllable by user-space because we already
    release the file refcnt

    so when we retry but the fd has just been closed by user-space
    during this small window, we end up calling netlink_detachskb()
    on the error path, which releases the sock again. Later, when
    user-space closes this socket, a use-after-free could be
    triggered.

    Setting 'sock' to NULL here should be sufficient to fix it.

    Reported-by: GeneBlue
    Signed-off-by: Cong Wang
    Cc: Andrew Morton
    Cc: Manfred Spraul
    Cc: stable@kernel.org
    Signed-off-by: Linus Torvalds

    Cong Wang
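
    The double release can be modelled in a few lines of userspace C. This is a deliberately simplified, single-threaded model with hypothetical names (sock_sketch, fd_table_sketch, attach_sketch, notify_sketch); the concurrent close is simulated inside attach_sketch(), and a plain int stands in for the socket refcount.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct sock_sketch {
    int refcnt;
};

struct fd_table_sketch {
    struct sock_sketch *sock;  /* NULL once the fd has been closed */
};

/* Lookup by fd; takes a reference on success. */
static struct sock_sketch *lookup_sketch(struct fd_table_sketch *t)
{
    if (t->sock != NULL)
        t->sock->refcnt++;
    return t->sock;
}

/*
 * Always asks for a retry (returns 1) in this model. As in the commit
 * message, the caller's reference is already dropped in that case.
 * Closing the fd here simulates user space racing with the retry.
 */
static int attach_sketch(struct sock_sketch *s, struct fd_table_sketch *t)
{
    s->refcnt--;
    t->sock = NULL;
    return 1;
}

static int notify_sketch(struct fd_table_sketch *t, bool clear_on_retry)
{
    struct sock_sketch *sock = NULL;
    int ret = 0;

    for (;;) {
        struct sock_sketch *found = lookup_sketch(t);

        if (found == NULL) {
            ret = -1;      /* fd vanished: take the error path */
            break;
        }
        sock = found;
        if (attach_sketch(sock, t) != 1)
            break;         /* attached, no retry needed */
        if (clear_on_retry)
            sock = NULL;   /* the fix: our reference is already gone */
    }

    /* Error path: drop the reference we believe we still hold. */
    if (ret < 0 && sock != NULL)
        sock->refcnt--;
    return ret;
}
```

    With clear_on_retry set to false, a socket that starts with one reference held by its owner ends at zero, which is the over-release described above. With clear_on_retry set to true, the owner's reference survives.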
     

08 Jul, 2017

1 commit

  • Pull Writeback error handling fixes from Jeff Layton:
    "The main rationale for all of these changes is to tighten up writeback
    error reporting to userland. There are many ways now that writeback
    errors can be lost, such that fsync/fdatasync/msync return 0 when
    writeback actually failed.

    This pile contains a small set of cleanups and writeback error
    handling fixes that I was able to break off from the main pile (#2).

    Two of the patches in this pile are trivial. The exceptions are the
    patch to fix up error handling in write_one_page, and the patch to
    make JFS pay attention to write_one_page errors"

    * tag 'for-linus-v4.13-1' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux:
    fs: remove call_fsync helper function
    mm: clean up error handling in write_one_page
    JFS: do not ignore return code from write_one_page()
    mm: drop "wait" parameter from write_one_page()

    Linus Torvalds
     

06 Jul, 2017

1 commit


05 Jul, 2017

1 commit


09 May, 2017

2 commits

  • Patch series "kvmalloc", v5.

    There are many open-coded instances of kmalloc with a vmalloc fallback
    in the tree. Most of them are not careful enough, or simply do not care,
    about the underlying semantics of the kmalloc/page allocator, which means
    that a) some vmalloc fallbacks are basically unreachable because the
    kmalloc part will keep retrying until it succeeds, and b) the page
    allocator can invoke really disruptive steps like the OOM killer to move
    forward, which doesn't sound appropriate when we consider that the
    vmalloc fallback is available.

    As can be seen, implementing kvmalloc requires quite intimate knowledge
    of the page allocator and the memory reclaim internals, which strongly
    suggests that a helper should be implemented in the memory subsystem
    proper.

    Most callers I could find have been converted to use the helper
    instead. This is patch 6. There are some more relying on __GFP_REPEAT
    in the networking stack which I have converted as well, and Eric Dumazet
    was not opposed [2] to converting them.

    [1] http://lkml.kernel.org/r/20170130094940.13546-1-mhocko@kernel.org
    [2] http://lkml.kernel.org/r/1485273626.16328.301.camel@edumazet-glaptop3.roam.corp.google.com

    This patch (of 9):

    Using kmalloc with the vmalloc fallback for larger allocations is a
    common pattern in the kernel code. Yet we do not have any common helper
    for that and so users have invented their own helpers. Some of them are
    really creative when doing so. Let's just add kv[mz]alloc and make sure
    it is implemented properly. This implementation makes sure not to create
    heavy memory pressure for > PAGE_SIZE requests (__GFP_NORETRY) and also
    not to warn about allocation failures. This also rules out the OOM
    killer, as vmalloc is a more appropriate fallback than a disruptive
    user-visible action.

    This patch also changes some existing users and removes helpers which
    are specific to them. In some cases this is not possible (e.g.
    ext4_kvmalloc, libcfs_kvzalloc) because those seem to be broken and
    require GFP_NO{FS,IO} context, which is not vmalloc compatible in general
    (note that the page table allocation is GFP_KERNEL). Those need to be
    fixed separately.

    While we are at it, document which gfp masks __vmalloc{_node} does not
    support, because there seems to be a lot of confusion out there.
    kvmalloc_node will warn about flags that are incompatible with
    GFP_KERNEL (i.e. not a superset of it) to catch new abusers. Existing
    ones would have to die slowly.

    [sfr@canb.auug.org.au: f2fs fixup]
    Link: http://lkml.kernel.org/r/20170320163735.332e64b7@canb.auug.org.au
    Link: http://lkml.kernel.org/r/20170306103032.2540-2-mhocko@kernel.org
    Signed-off-by: Michal Hocko
    Signed-off-by: Stephen Rothwell
    Reviewed-by: Andreas Dilger [ext4 part]
    Acked-by: Vlastimil Babka
    Cc: John Hubbard
    Cc: David Miller
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
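
    The fallback policy described above can be sketched in userspace C. This is not the kernel helper: malloc() stands in for both allocators, the contiguous path is made to fail artificially above CONTIG_MAX_SKETCH, and every name ending in _sketch (plus contig_try_alloc and virt_alloc) is hypothetical. The rule that sub-page requests get no fallback is my reading of the helper and should be treated as an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define PAGE_SIZE_SKETCH  4096u
/* Pretend that larger physically contiguous runs are unavailable. */
#define CONTIG_MAX_SKETCH (4u * PAGE_SIZE_SKETCH)

enum kv_backend { KV_CONTIG, KV_VIRT };

/* Header in front of the payload so the free side knows the backend. */
union kv_hdr {
    enum kv_backend backend;
    max_align_t align;  /* keeps the payload suitably aligned */
};

/* Stand-in for "try once, do not retry hard, do not warn". */
static void *contig_try_alloc(size_t total)
{
    if (total > CONTIG_MAX_SKETCH)
        return NULL;
    return malloc(total);
}

/* Stand-in for the virtually contiguous fallback. */
static void *virt_alloc(size_t total)
{
    return malloc(total);
}

static void *kv_alloc_sketch(size_t size)
{
    size_t total = sizeof(union kv_hdr) + size;
    union kv_hdr *h;

    if (total < size)
        return NULL;  /* size overflow */

    h = contig_try_alloc(total);
    if (h != NULL) {
        h->backend = KV_CONTIG;
        return h + 1;
    }
    /* Assumed rule: no fallback for requests that fit in one page. */
    if (size <= PAGE_SIZE_SKETCH)
        return NULL;

    h = virt_alloc(total);
    if (h == NULL)
        return NULL;
    h->backend = KV_VIRT;
    return h + 1;
}

static enum kv_backend kv_backend_of(const void *p)
{
    return ((const union kv_hdr *)p - 1)->backend;
}

/* One free routine for both backends, mirroring the role of kvfree(). */
static void kv_free_sketch(void *p)
{
    if (p != NULL)
        free((union kv_hdr *)p - 1);
}
```

    The design point is that callers see a single allocate/free pair and never open-code the fallback decision themselves.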
     
  • Clean up early flag and address some minutia.

    Link: http://lkml.kernel.org/r/1486673582-6979-3-git-send-email-dave@stgolabs.net
    Signed-off-by: Davidlohr Bueso
    Cc: Manfred Spraul
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     

06 May, 2017

1 commit

  • Pull namespace updates from Eric Biederman:
    "This is a set of small fixes that were mostly stumbled over during
    more significant development. This proc fix and the fix to
    posix-timers are the most significant of the lot.

    There is a lot of good development going on but unfortunately it
    didn't quite make the merge window"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
    proc: Fix unbalanced hard link numbers
    signal: Make kill_proc_info static
    rlimit: Properly call security_task_setrlimit
    signal: Remove unused definition of sig_user_definied
    ia64: Remove unused IA64_TASK_SIGHAND_OFFSET and IA64_SIGHAND_SIGLOCK_OFFSET
    ipc: Remove unused declaration of recompute_msgmni
    posix-timers: Correct sanity check in posix_cpu_nsleep
    sysctl: Remove dead register_sysctl_root

    Linus Torvalds
     

18 Apr, 2017

1 commit


03 Apr, 2017

1 commit

  • ./lib/string.c:134: WARNING: Inline emphasis start-string without end-string.
    ./mm/filemap.c:522: WARNING: Inline interpreted text or phrase reference start-string without end-string.
    ./mm/filemap.c:1283: ERROR: Unexpected indentation.
    ./mm/filemap.c:3003: WARNING: Inline interpreted text or phrase reference start-string without end-string.
    ./mm/vmalloc.c:1544: WARNING: Inline emphasis start-string without end-string.
    ./mm/page_alloc.c:4245: ERROR: Unexpected indentation.
    ./ipc/util.c:676: ERROR: Unexpected indentation.
    ./drivers/pci/irq.c:35: WARNING: Block quote ends without a blank line; unexpected unindent.
    ./security/security.c:109: ERROR: Unexpected indentation.
    ./security/security.c:110: WARNING: Definition list ends without a blank line; unexpected unindent.
    ./block/genhd.c:275: WARNING: Inline strong start-string without end-string.
    ./block/genhd.c:283: WARNING: Inline strong start-string without end-string.
    ./include/linux/clk.h:134: WARNING: Inline emphasis start-string without end-string.
    ./include/linux/clk.h:134: WARNING: Inline emphasis start-string without end-string.
    ./ipc/util.c:477: ERROR: Unknown target name: "s".

    Signed-off-by: Mauro Carvalho Chehab
    Acked-by: Bjorn Helgaas
    Signed-off-by: Jonathan Corbet

    mchehab@s-opensource.com
     

04 Mar, 2017

1 commit

  • Pull sched.h split-up from Ingo Molnar:
    "The point of these changes is to significantly reduce the
    header footprint, to speed up the kernel build and to
    have a cleaner header structure.

    After these changes the new <linux/sched.h>'s typical preprocessed
    size goes down from a previous ~0.68 MB (~22K lines) to ~0.45 MB (~15K
    lines), which is around 40% faster to build on typical configs.

    Not much changed from the last version (-v2) posted three weeks ago: I
    eliminated quirks, backmerged fixes plus I rebased it to an upstream
    SHA1 from yesterday that includes most changes queued up in -next plus
    all sched.h changes that were pending from Andrew.

    I've re-tested the series both on x86 and on cross-arch defconfigs,
    and did a bisectability test at a number of random points.

    I tried to test as many build configurations as possible, but some
    build breakage is probably still left - but it should be mostly
    limited to architectures that have no cross-compiler binaries
    available on kernel.org, and non-default configurations"

    * 'WIP.sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (146 commits)
    sched/headers: Clean up
    sched/headers: Remove #ifdefs from
    sched/headers: Remove the include from
    sched/headers, hrtimer: Remove the include from
    sched/headers, x86/apic: Remove the header inclusion from
    sched/headers, timers: Remove the include from
    sched/headers: Remove from
    sched/headers: Remove from
    sched/core: Remove unused prefetch_stack()
    sched/headers: Remove from
    sched/headers: Remove the 'init_pid_ns' prototype from
    sched/headers: Remove from
    sched/headers: Remove from
    sched/headers: Remove the runqueue_is_locked() prototype
    sched/headers: Remove from
    sched/headers: Remove from
    sched/headers: Remove from
    sched/headers: Remove from
    sched/headers: Remove the include from
    sched/headers: Remove from
    ...

    Linus Torvalds
     

03 Mar, 2017

1 commit

  • Pull vfs pile two from Al Viro:

    - orangefs fix

    - series of fs/namei.c cleanups from me

    - VFS stuff coming from overlayfs tree

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    orangefs: Use RCU for destroy_inode
    vfs: use helper for calling f_op->fsync()
    mm: use helper for calling f_op->mmap()
    vfs: use helpers for calling f_op->{read,write}_iter()
    vfs: pass type instead of fn to do_{loop,iter}_readv_writev()
    vfs: extract common parts of {compat_,}do_readv_writev()
    vfs: wrap write f_ops with file_{start,end}_write()
    vfs: deny copy_file_range() for non regular files
    vfs: deny fallocate() on directory
    vfs: create vfs helper vfs_tmpfile()
    namei.c: split unlazy_walk()
    namei.c: fold the check for DCACHE_OP_REVALIDATE into d_revalidate()
    lookup_fast(): clean up the logics around the fallback to non-rcu mode
    namei: fold unlazy_link() into its sole caller

    Linus Torvalds
     

02 Mar, 2017

7 commits


28 Feb, 2017

3 commits

  • The issue is described here, with a nice testcase:

    https://bugzilla.kernel.org/show_bug.cgi?id=192931

    The problem is that shmat() calls do_mmap_pgoff() with MAP_FIXED, and
    the address rounded down to 0. For the regular mmap case, the
    protection mentioned above is that the kernel gets to generate the
    address -- arch_get_unmapped_area() will always check for MAP_FIXED and
    return that address. So by the time we do security_mmap_addr(0) things
    get funky for shmat().

    The testcase itself shows that while a regular user crashes, root will
    not have a problem attaching a nil-page. There are two possible fixes
    to this. The first, which is what this patch does, is to simply allow
    root to crash as well -- this is also regular mmap behavior, i.e. when
    hacking up the testcase and adding mmap(... |MAP_FIXED). While this approach
    is the safer option, the second alternative is to ignore SHM_RND if the
    rounded address is 0, thus only having MAP_SHARED flags. This makes the
    behavior of shmat() identical to the mmap() case. The downside of this
    is obviously user visible, but does make sense in that it maintains
    semantics after the round-down wrt 0 address and mmap.

    Passes shm related ltp tests.

    Link: http://lkml.kernel.org/r/1486050195-18629-1-git-send-email-dave@stgolabs.net
    Signed-off-by: Davidlohr Bueso
    Reported-by: Gareth Evans
    Cc: Manfred Spraul
    Cc: Michael Kerrisk
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
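
    The round-down that produces address 0 is plain mask arithmetic. The sketch below assumes a boundary of 4096 bytes (the real SHMLBA is architecture-specific) and uses hypothetical helper names; it only shows which requested addresses collapse to the nil page when SHM_RND is applied.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed boundary for this sketch; the real SHMLBA depends on the arch. */
#define SHMLBA_SKETCH 4096u

/* Round an attach address down to the boundary, as SHM_RND requests. */
static uintptr_t shm_rnd_sketch(uintptr_t addr)
{
    return addr & ~((uintptr_t)SHMLBA_SKETCH - 1);
}

/* True when a nonzero request would end up as a fixed mapping at 0. */
static int rounds_to_nil_page(uintptr_t addr)
{
    return addr != 0 && shm_rnd_sketch(addr) == 0;
}
```

    Any nonzero address below the boundary falls into this case, which is why the rounded result, not the requested address, has to be the one checked.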
     
  • Link: http://lkml.kernel.org/r/20170128235704.45302-1-luc.vanoostenryck@gmail.com
    Signed-off-by: Luc Van Oostenryck
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Luc Van Oostenryck
     
  • sysv sem has two lock modes: One with per-semaphore locks, one lock mode
    with a single global lock for the whole array. When switching from the
    per-semaphore locks to the global lock, all per-semaphore locks must be
    scanned for ongoing operations.

    The patch adds a hysteresis for switching from the global lock to the
    per-semaphore locks. This reduces how often the per-semaphore locks
    must be scanned.

    Compared to the initial patch, this is a simplified solution: Setting
    USE_GLOBAL_LOCK_HYSTERESIS to 1 restores the current behavior.

    In theory, a workload with exactly 10 simple sops and then one complex
    op now scales a bit worse, but this is pure theory: If there is
    concurrency, then it won't be exactly 10:1:10:1:10:1:... If there is no
    concurrency, then there is no need for scalability.

    Link: http://lkml.kernel.org/r/1476851896-3590-3-git-send-email-manfred@colorfullife.com
    Signed-off-by: Manfred Spraul
    Cc: Peter Zijlstra
    Cc: Davidlohr Bueso
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: H. Peter Anvin
    Cc: kernel test robot
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Manfred Spraul
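
    The effect of the hysteresis can be seen in a small single-threaded model. It is a sketch under simplifying assumptions: there is no real locking, every operation in global mode counts down once when it releases the global lock, and the struct and function names are hypothetical. The hysteresis field plays the role of USE_GLOBAL_LOCK_HYSTERESIS.

```c
#include <assert.h>

struct sem_model {
    int hysteresis;       /* role of USE_GLOBAL_LOCK_HYSTERESIS, must be >= 1 */
    int use_global_lock;  /* > 0: global lock mode, 0: per-semaphore mode */
    int scans;            /* how often all per-semaphore locks were scanned */
};

/* Releasing the global lock counts down towards per-semaphore mode. */
static void global_unlock(struct sem_model *m)
{
    assert(m->use_global_lock > 0);
    m->use_global_lock--;
}

/* A complex operation always runs under the global lock. */
static void complex_op(struct sem_model *m)
{
    assert(m->hysteresis >= 1);
    if (m->use_global_lock == 0)
        m->scans++;  /* mode switch: scan every per-semaphore lock */
    m->use_global_lock = m->hysteresis;
    global_unlock(m);
}

/* A simple operation uses the global lock only while global mode lasts. */
static void simple_op(struct sem_model *m)
{
    if (m->use_global_lock > 0)
        global_unlock(m);
    /* otherwise only the per-semaphore lock would be taken */
}
```

    In this model, running one complex operation, three simple ones and another complex one causes a single scan with a hysteresis of 10 and two scans with a hysteresis of 1, which matches the statement that a value of 1 restores the old behavior.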