13 Jul, 2017

40 commits

  • Split SOFTLOCKUP_DETECTOR from LOCKUP_DETECTOR, and split
    HARDLOCKUP_DETECTOR_PERF from HARDLOCKUP_DETECTOR.

    LOCKUP_DETECTOR implies the general boot, sysctl, and programming
    interfaces for the lockup detectors.

    An architecture that wants to use a hard lockup detector must define
    HAVE_HARDLOCKUP_DETECTOR_PERF or HAVE_HARDLOCKUP_DETECTOR_ARCH.

    Alternatively an arch can define HAVE_NMI_WATCHDOG, which provides the
    minimum arch_touch_nmi_watchdog, and it otherwise does its own thing and
    does not implement the LOCKUP_DETECTOR interfaces.

    sparc is unusual in that it has started to implement some of the
    interfaces, but not fully yet. It should probably be converted to a full
    HAVE_HARDLOCKUP_DETECTOR_ARCH.

    [npiggin@gmail.com: fix]
    Link: http://lkml.kernel.org/r/20170617223522.66c0ad88@roar.ozlabs.ibm.com
    Link: http://lkml.kernel.org/r/20170616065715.18390-4-npiggin@gmail.com
    Signed-off-by: Nicholas Piggin
    Reviewed-by: Don Zickus
    Reviewed-by: Babu Moger
    Tested-by: Babu Moger [sparc]
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: Michael Ellerman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nicholas Piggin
     
  • For architectures that define HAVE_NMI_WATCHDOG, instead of having them
    provide the complete touch_nmi_watchdog() function, just have them
    provide arch_touch_nmi_watchdog().

    This gives the generic code more flexibility in implementing this
    function, and arch implementations don't miss out on touching the
    softlockup watchdog or other generic details.

    Link: http://lkml.kernel.org/r/20170616065715.18390-3-npiggin@gmail.com
    Signed-off-by: Nicholas Piggin
    Reviewed-by: Don Zickus
    Reviewed-by: Babu Moger
    Tested-by: Babu Moger [sparc]
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: Michael Ellerman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nicholas Piggin
     
  • Patch series "Improve watchdog config for arch watchdogs", v4.

    A series to make the hardlockup watchdog more easily replaceable by arch
    code. The last patch provides some justification for why we want to do
    this (existing sparc watchdog is another that could benefit).

    This patch (of 5):

    Remove unused declaration.

    Link: http://lkml.kernel.org/r/20170616065715.18390-2-npiggin@gmail.com
    Signed-off-by: Nicholas Piggin
    Reviewed-by: Don Zickus
    Reviewed-by: Babu Moger
    Tested-by: Babu Moger [sparc]
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: Michael Ellerman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nicholas Piggin
     
  • xt_alloc_table_info() basically opencodes kvmalloc() so use the library
    function instead.

    Link: http://lkml.kernel.org/r/20170531155145.17111-4-mhocko@kernel.org
    Signed-off-by: Michal Hocko
    Cc: Pablo Neira Ayuso
    Cc: Jozsef Kadlecsik
    Cc: Florian Westphal
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
     
  • Now that ipc_rcu_alloc() and ipc_rcu_free() are removed, document when
    it is valid to use ipc_getref() and ipc_putref().

    Link: http://lkml.kernel.org/r/20170525185107.12869-21-manfred@colorfullife.com
    Signed-off-by: Manfred Spraul
    Cc: Davidlohr Bueso
    Cc: Kees Cook
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Manfred Spraul
     
  • The remaining users of __sem_free() can simply call kvfree() instead for
    better readability.

    [manfred@colorfullife.com: Rediff to keep rcu protection for security_sem_alloc()]
    Link: http://lkml.kernel.org/r/20170525185107.12869-20-manfred@colorfullife.com
    Signed-off-by: Kees Cook
    Signed-off-by: Manfred Spraul
    Cc: Davidlohr Bueso
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kees Cook
     
  • There is nothing special about the msg_alloc/free routines any more, so
    remove them to make code more readable.

    [manfred@colorfullife.com: Rediff to keep rcu protection for security_msg_queue_alloc()]
    Link: http://lkml.kernel.org/r/20170525185107.12869-19-manfred@colorfullife.com
    Signed-off-by: Kees Cook
    Signed-off-by: Manfred Spraul
    Cc: Davidlohr Bueso
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kees Cook
     
  • There is nothing special about the shm_alloc/free routines any more, so
    remove them to make code more readable.

    [manfred@colorfullife.com: Rediff, to continue to keep rcu for free calls after a successful security_shm_alloc()]
    Link: http://lkml.kernel.org/r/20170525185107.12869-18-manfred@colorfullife.com
    Signed-off-by: Kees Cook
    Signed-off-by: Manfred Spraul
    Cc: Davidlohr Bueso
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kees Cook
     
  • Only after ipc_addid() has succeeded will refcounting be used, so move
    initialization into ipc_addid() and remove from open-coded *_alloc()
    routines.

    Link: http://lkml.kernel.org/r/20170525185107.12869-17-manfred@colorfullife.com
    Signed-off-by: Kees Cook
    Signed-off-by: Manfred Spraul
    Cc: Davidlohr Bueso
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kees Cook
     
  • Loosely based on a patch from Kees Cook:
    - id and retval can be merged
    - if ipc_addid() fails, then use call_rcu() directly.

    The difference is that call_rcu() is used for failed ipc_addid() calls, to
    continue to guarantee an RCU delay for security_msg_queue_free().

    Link: http://lkml.kernel.org/r/20170525185107.12869-16-manfred@colorfullife.com
    Signed-off-by: Manfred Spraul
    Cc: Kees Cook
    Cc: Davidlohr Bueso
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Manfred Spraul
     
  • Loosely based on a patch from Kees Cook:
    - id and error can be merged
    - if operations before ipc_addid() fail, then use call_rcu() directly.

    The difference is that call_rcu() is used for failures after
    security_shm_alloc(), to continue to guarantee an RCU delay for
    security_sem_free().

    Link: http://lkml.kernel.org/r/20170525185107.12869-15-manfred@colorfullife.com
    Signed-off-by: Manfred Spraul
    Cc: Kees Cook
    Cc: Davidlohr Bueso
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Manfred Spraul
     
  • Loosely based on a patch from Kees Cook:
    - id and retval can be merged
    - if ipc_addid() fails, then use call_rcu() directly.

    The difference is that call_rcu() is used for failed ipc_addid() calls, to
    continue to guarantee an RCU delay for security_sem_free().

    Link: http://lkml.kernel.org/r/20170525185107.12869-14-manfred@colorfullife.com
    Signed-off-by: Manfred Spraul
    Cc: Kees Cook
    Cc: Davidlohr Bueso
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Manfred Spraul
     
  • No callers remain for ipc_rcu_alloc(). Drop the function.

    [manfred@colorfullife.com: Rediff because the memset was temporarily inside ipc_rcu_free()]
    Link: http://lkml.kernel.org/r/20170525185107.12869-13-manfred@colorfullife.com
    Signed-off-by: Kees Cook
    Signed-off-by: Manfred Spraul
    Cc: Kees Cook
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kees Cook
     
  • Instead of using ipc_rcu_alloc() which only performs the refcount bump,
    open code it. This also allows for msg_queue structure layout to be
    randomized in the future.

    Link: http://lkml.kernel.org/r/20170525185107.12869-12-manfred@colorfullife.com
    Signed-off-by: Kees Cook
    Signed-off-by: Manfred Spraul
    Cc: Davidlohr Bueso
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kees Cook
     
  • Instead of using ipc_rcu_alloc() which only performs the refcount bump,
    open code it. This also allows for shmid_kernel structure layout to be
    randomized in the future.

    Link: http://lkml.kernel.org/r/20170525185107.12869-11-manfred@colorfullife.com
    Signed-off-by: Kees Cook
    Signed-off-by: Manfred Spraul
    Cc: Davidlohr Bueso
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kees Cook
     
  • Instead of using ipc_rcu_alloc() which only performs the refcount bump,
    open code it to perform better sem-specific checks. This also allows
    for sem_array structure layout to be randomized in the future.

    [manfred@colorfullife.com: Rediff, because the memset was temporarily inside ipc_rcu_alloc()]
    Link: http://lkml.kernel.org/r/20170525185107.12869-10-manfred@colorfullife.com
    Signed-off-by: Kees Cook
    Signed-off-by: Manfred Spraul
    Cc: Davidlohr Bueso
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kees Cook
     
  • There are no more callers of ipc_rcu_free(), so remove it.

    Link: http://lkml.kernel.org/r/20170525185107.12869-9-manfred@colorfullife.com
    Signed-off-by: Kees Cook
    Signed-off-by: Manfred Spraul
    Cc: Davidlohr Bueso
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kees Cook
     
  • Avoid using ipc_rcu_free, since it just re-finds the original structure
    pointer. For the pre-list-init failure path, there is no RCU needed,
    since it was just allocated. It can be directly freed.

    Link: http://lkml.kernel.org/r/20170525185107.12869-8-manfred@colorfullife.com
    Signed-off-by: Kees Cook
    Signed-off-by: Manfred Spraul
    Cc: Davidlohr Bueso
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kees Cook
     
  • Avoid using ipc_rcu_free, since it just re-finds the original structure
    pointer. For the pre-list-init failure path, there is no RCU needed,
    since it was just allocated. It can be directly freed.

    Link: http://lkml.kernel.org/r/20170525185107.12869-7-manfred@colorfullife.com
    Signed-off-by: Kees Cook
    Signed-off-by: Manfred Spraul
    Cc: Davidlohr Bueso
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kees Cook
     
  • Avoid using ipc_rcu_free, since it just re-finds the original structure
    pointer. For the pre-list-init failure path, there is no RCU needed,
    since it was just allocated. It can be directly freed.

    Link: http://lkml.kernel.org/r/20170525185107.12869-6-manfred@colorfullife.com
    Signed-off-by: Kees Cook
    Signed-off-by: Manfred Spraul
    Cc: Davidlohr Bueso
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kees Cook
     
  • The only users of ipc_alloc() were ipc_rcu_alloc() and the on-heap
    sem_io fall-back memory. Better to just open-code these to make things
    easier to read.

    [manfred@colorfullife.com: Rediff due to inclusion of memset() into ipc_rcu_alloc()]
    Link: http://lkml.kernel.org/r/20170525185107.12869-5-manfred@colorfullife.com
    Signed-off-by: Kees Cook
    Signed-off-by: Manfred Spraul
    Cc: Davidlohr Bueso
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kees Cook
     
  • sem_ctime is initialized to the semget() time and then updated at every
    semctl() that changes the array.

    Thus it does not represent the time of the last change.

    In particular, semop() calls are recorded only in sem_otime, not in
    sem_ctime.

    This is already described in ipc/sem.c; I just overlooked that there is
    a comment in include/linux/sem.h and in man semctl(2) as well.

    So: Correct wrong comments.

    Link: http://lkml.kernel.org/r/20170515171912.6298-4-manfred@colorfullife.com
    Signed-off-by: Manfred Spraul
    Cc: Kees Cook
    Cc: Davidlohr Bueso
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Fabian Frederick
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Manfred Spraul
     
  • ipc has two management structures that exist for every id:
    - struct kern_ipc_perm, it contains e.g. the permissions.
    - struct ipc_rcu, it contains the rcu head for rcu handling and the
      refcount.

    The patch merges both structures.

    As a bonus, we may save one cacheline, because both structures are
    cacheline aligned. In addition, it reduces the number of casts; most
    codepaths can use container_of instead.

    To simplify the code, ipc_rcu_alloc() initializes the allocation to 0.

    [manfred@colorfullife.com: really include the memset() into ipc_alloc_rcu()]
    Link: http://lkml.kernel.org/r/564f8612-0601-b267-514f-a9f650ec9b32@colorfullife.com
    Link: http://lkml.kernel.org/r/20170525185107.12869-3-manfred@colorfullife.com
    Signed-off-by: Manfred Spraul
    Cc: Davidlohr Bueso
    Cc: Kees Cook
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Manfred Spraul
     
  • sma->sem_base is initialized with

    sma->sem_base = (struct sem *) &sma[1];

    The current code has four problems:
    - There is an unnecessary pointer dereference - sem_base is not needed.
    - Alignment for struct sem only works by chance.
    - The current code causes false positives in static code analysis.
    - This is a cast between different non-void types, which the future
      randstruct GCC plugin warns on.

    And, as a bonus, the code size gets smaller:

    Before:
    0 .text 00003770
    After:
    0 .text 0000374e

    [manfred@colorfullife.com: s/[0]/[]/, per hch]
    Link: http://lkml.kernel.org/r/20170525185107.12869-2-manfred@colorfullife.com
    Link: http://lkml.kernel.org/r/20170515171912.6298-2-manfred@colorfullife.com
    Signed-off-by: Manfred Spraul
    Acked-by: Kees Cook
    Cc: Kees Cook
    Cc: Davidlohr Bueso
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Fabian Frederick
    Cc: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Manfred Spraul
     
  • Add /proc/self/task/<current-tid>/fail-nth file that allows failing
    0-th, 1-st, 2-nd and so on calls systematically.
    Excerpt from the added documentation:

    "Write to this file of integer N makes N-th call in the current task
    fail (N is 0-based). Read from this file returns a single char 'Y' or
    'N' that says if the fault setup with a previous write to this file
    was injected or not, and disables the fault if it wasn't yet injected.
    Note that this file enables all types of faults (slab, futex, etc).
    This setting takes precedence over all other generic settings like
    probability, interval, times, etc. But per-capability settings (e.g.
    fail_futex/ignore-private) take precedence over it. This feature is
    intended for systematic testing of faults in a single system call. See
    an example below"

    Why add a new setting:
    1. Existing settings are global rather than per-task.
       So parallel testing is not possible.
    2. attr->interval is close but it depends on attr->count
       which is not reset to 0, so interval does not work as expected.
    3. Trying to model this with existing settings requires manipulations
       of all of probability, interval, times, space, task-filter and
       unexposed count and per-task make-it-fail files.
    4. Existing settings are per-failure-type, and the set of failure
       types is potentially expanding.
    5. make-it-fail can't be changed by an unprivileged user, and aggressive
       stress testing is better done from an unprivileged user.
       Similarly, this would require opening the debugfs files to the
       unprivileged user, as he would need to reopen at least the times file
       (not possible to pre-open before dropping privs).

    The proposed interface solves all of the above (see the example).
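
    As a rough user-space illustration of the interface described above, the
    following Python sketch arms and reads back the per-task file. It follows
    the semantics quoted in this commit message (0-based N, 'Y'/'N' on read);
    the helper names are made up for this example, and the file only exists
    on kernels built with fault injection support.

```python
import threading


def fail_nth_path(tid=None):
    """Build the per-task fail-nth path; defaults to the calling thread."""
    if tid is None:
        tid = threading.get_native_id()  # kernel thread id, Python 3.8+
    return "/proc/self/task/%d/fail-nth" % tid


def arm_fail_nth(n, path=None):
    """Write N so that the N-th faultable call in this task fails.

    Returns False when the file cannot be written, for example on a
    kernel built without fault injection.
    """
    try:
        with open(path or fail_nth_path(), "w") as f:
            f.write(str(n))
        return True
    except OSError:
        return False


def fault_was_injected(path=None):
    """Read back 'Y' or 'N'; per the excerpt, reading also disarms the fault."""
    with open(path or fail_nth_path()) as f:
        return f.read(1) == "Y"
```

    A test harness would typically call arm_fail_nth(n) for n = 0, 1, 2, ...
    around a single system call and stop once fault_was_injected() reports
    that no fault fired.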

    We want to integrate this into the syzkaller fuzzer. A prototype found
    10 bugs in the kernel in its first day of use:

    https://groups.google.com/forum/#!searchin/syzkaller/%22FAULT_INJECTION%22%7Csort:relevance

    I've made the current interface work with all types of our sandboxes.
    For setuid the secret sauce was prctl(PR_SET_DUMPABLE, 1, 0, 0, 0) to
    make /proc entries non-root owned. So I am fine with the current
    version of the code.

    [akpm@linux-foundation.org: fix build]
    Link: http://lkml.kernel.org/r/20170328130128.101773-1-dvyukov@google.com
    Signed-off-by: Dmitry Vyukov
    Cc: Akinobu Mita
    Cc: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dmitry Vyukov
     
  • The kcmp syscall is built iff CONFIG_CHECKPOINT_RESTORE is selected, so
    wrap the appropriate helpers in the epoll code with the config option to
    build them conditionally.

    Link: http://lkml.kernel.org/r/20170513083456.GG1881@uranus.lan
    Signed-off-by: Cyrill Gorcunov
    Reported-by: Andrew Morton
    Cc: Andrey Vagin
    Cc: Al Viro
    Cc: Pavel Emelyanov
    Cc: Michael Kerrisk
    Cc: Jason Baron
    Cc: Andy Lutomirski
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Cyrill Gorcunov
     
  • With the current epoll architecture, target files are addressed by
    file_struct and file descriptor number, where the latter is not unique.
    Moreover, files can be transferred from another process via a unix
    socket, added to the queue and then closed, so we won't find this
    descriptor in the task fdinfo list.

    Thus, to checkpoint and restore such processes, CRIU needs to find out
    where exactly the target file is present in order to add it to the epoll
    queue. For this purpose one can use the kcmp call, where a particular
    target file from the queue is compared with an arbitrary file passed as
    an argument.

    Because epoll target files can have the same file descriptor number but
    a different file_struct, a caller should explicitly specify the offset
    within.

    To test whether a particular file matches an entry inside epoll, one has
    to

    - fill a kcmp_epoll_slot structure with the epoll file descriptor, the
      target file number and the target file offset (if only one target is
      present, the offset should be 0)

    - call kcmp as kcmp(pid1, pid2, KCMP_EPOLL_TFD, fd, &kcmp_epoll_slot)
    - the kernel fetches the file pointer matching file descriptor @fd of pid1
    - looks up the file struct in the epoll queue of pid2 and returns the
      traditional 0,1,2 result for sorting purposes
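
    The call sequence above could be exercised from user space roughly as
    follows. This is only a sketch under stated assumptions: the syscall
    number 312 is the x86-64 value and differs on other architectures, the
    structure layout (three 32-bit fields efd, tfd, toff) and the constant
    KCMP_EPOLL_TFD = 7 are taken from the uapi header as far as I know, and
    the wrapper name is invented for this example.

```python
import ctypes
import os

KCMP_EPOLL_TFD = 7        # enum kcmp_type value (assumed from the uapi header)
SYS_KCMP_X86_64 = 312     # assumption: x86-64 only; other arches differ


class KcmpEpollSlot(ctypes.Structure):
    """Mirror of struct kcmp_epoll_slot: epoll fd, target fd, target offset."""
    _fields_ = [
        ("efd", ctypes.c_uint32),
        ("tfd", ctypes.c_uint32),
        ("toff", ctypes.c_uint32),
    ]


def kcmp_epoll_tfd(pid1, pid2, fd, efd, tfd, toff=0):
    """Compare fd of pid1 with the target (tfd, toff) in epoll efd of pid2.

    Returns the raw kcmp result (0 means the files are equal) and raises
    OSError on failure, e.g. when the kernel lacks CONFIG_CHECKPOINT_RESTORE.
    """
    libc = ctypes.CDLL(None, use_errno=True)
    libc.syscall.restype = ctypes.c_long
    slot = KcmpEpollSlot(efd, tfd, toff)
    ret = libc.syscall(
        ctypes.c_long(SYS_KCMP_X86_64),
        ctypes.c_long(pid1),
        ctypes.c_long(pid2),
        ctypes.c_long(KCMP_EPOLL_TFD),
        ctypes.c_ulong(fd),
        ctypes.byref(slot),
    )
    if ret < 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    return ret
```

    With a single target registered under a given descriptor number, toff
    stays 0, matching the note in the list above.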

    Link: http://lkml.kernel.org/r/20170424154423.511592110@gmail.com
    Signed-off-by: Cyrill Gorcunov
    Acked-by: Andrey Vagin
    Cc: Al Viro
    Cc: Pavel Emelyanov
    Cc: Michael Kerrisk
    Cc: Jason Baron
    Cc: Andy Lutomirski
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Cyrill Gorcunov
     
  • Since it is possible to have the same number in the tfd field (say a
    file is added, closed, then another file is dup'ed to the same number
    and added back), it is impossible to distinguish such target files
    solely by their numbers.

    Strictly speaking regular applications don't need to recognize these
    targets at all but for checkpoint/restore sake we need to collect
    targets to be able to push them back on restore stage in a proper order.

    Thus let's add the file position, inode and device number where this
    target lies. These three fields can be used as a primary key for sorting,
    and together with kcmp's help CRIU can find the exact file target (from
    the whole set of processes being checkpointed).
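
    To illustrate how the three new fields can serve as a sort key, here is
    a small Python sketch that parses epoll fdinfo lines. The exact line
    layout ("tfd: ... events: ... data: ... pos:... ino:... sdev:...", with
    ino and sdev in hex) is an assumption about the output format, and the
    sample text in the usage note is made up.

```python
import re

# Assumed layout of one epoll target line in /proc/<pid>/fdinfo/<epfd>.
_TFD_RE = re.compile(
    r"tfd:\s*(\d+)\s+events:\s*([0-9a-f]+)\s+data:\s*([0-9a-f]+)"
    r"\s+pos:\s*(-?\d+)\s+ino:\s*([0-9a-f]+)\s+sdev:\s*([0-9a-f]+)"
)


def parse_epoll_targets(fdinfo_text):
    """Return one dict per epoll target found in the fdinfo text."""
    targets = []
    for m in _TFD_RE.finditer(fdinfo_text):
        targets.append({
            "tfd": int(m.group(1)),
            "events": int(m.group(2), 16),
            "data": int(m.group(3), 16),
            "pos": int(m.group(4)),
            "ino": int(m.group(5), 16),
            "sdev": int(m.group(6), 16),
        })
    return targets


def sort_targets(targets):
    """Order targets by the (pos, ino, sdev) key described above."""
    return sorted(targets, key=lambda t: (t["pos"], t["ino"], t["sdev"]))
```

    For example, parse_epoll_targets("tfd:        5 events:       19 data:
    5  pos:0 ino:1d1b sdev:8") would yield a single entry with tfd 5; entries
    that still compare equal on all three fields need kcmp to tell apart.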

    Link: http://lkml.kernel.org/r/20170424154423.436491881@gmail.com
    Signed-off-by: Cyrill Gorcunov
    Acked-by: Andrei Vagin
    Cc: Al Viro
    Cc: Pavel Emelyanov
    Cc: Michael Kerrisk
    Cc: Jason Baron
    Cc: Andy Lutomirski
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Cyrill Gorcunov
     
  • This is a layering violation so we replace the uses with calls to
    sg_page(). This is a prep patch for replacing page_link and this is one
    of the very few uses outside of scatterlist.h.

    Link: http://lkml.kernel.org/r/1495663199-22234-1-git-send-email-logang@deltatee.com
    Signed-off-by: Logan Gunthorpe
    Signed-off-by: Stephen Bates
    Acked-by: Stefani Seibold
    Cc: Stefani Seibold
    Cc: Greg KH
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Logan Gunthorpe
     
  • Use errors=replace because it is never desirable for lx-dmesg to fail on
    string decoding errors, not even if the log buffer is corrupt and we
    show incorrect info.

    The kernel will sometimes print utf8, for example the copyright symbol
    from jffs2. In order to make this work specify 'utf8' everywhere
    because python2 otherwise defaults to 'ascii'.

    In theory the second errors='replace' is not required because
    everything that can be decoded as utf8 should also be encodable back to
    utf8. But it's better to be extra safe here. It's worth noting that
    this is definitely not true for encoding='ascii', unknown characters are
    replaced with U+FFFD REPLACEMENT CHARACTER and they fail to encode back
    to ascii.
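
    The behaviour described above can be reproduced in plain Python 3 without
    gdb. The sketch below is not the lx-dmesg code itself; the helper names
    and the sample bytes (a UTF-8 copyright sign followed by one invalid
    byte) are made up for illustration.

```python
def decode_log(raw, encoding="utf8"):
    """Decode log bytes, replacing undecodable bytes with U+FFFD."""
    return raw.decode(encoding, errors="replace")


def encode_for_output(text, encoding="utf8"):
    """Encode text for output, never raising on unencodable characters."""
    return text.encode(encoding, errors="replace")


# Hypothetical log fragment: valid UTF-8 copyright sign, then a stray 0xff.
sample = b"jffs2: \xc2\xa9 2001-2006 Red Hat, Inc.\xff"

text = decode_log(sample)           # the 0xff byte becomes U+FFFD
utf8_again = encode_for_output(text)

# With 'ascii', every non-ASCII byte becomes U+FFFD, and that character
# cannot be encoded back to ascii without errors='replace'.
ascii_text = decode_log(sample, "ascii")
try:
    ascii_text.encode("ascii")
    ascii_roundtrip_ok = True
except UnicodeEncodeError:
    ascii_roundtrip_ok = False
```

    Here the utf8 round trip succeeds even for the corrupt byte, while the
    strict ascii re-encode fails, which is the asymmetry the last paragraph
    points out.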

    Link: http://lkml.kernel.org/r/acee067f3345954ed41efb77b80eebdc038619c6.1498481469.git.leonard.crestez@nxp.com
    Signed-off-by: Leonard Crestez
    Acked-by: Jan Kiszka
    Cc: Jason Wessel
    Cc: Kieran Bingham
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Leonard Crestez
     
  • In some cases it is possible for the str() conversion here to throw
    encoding errors because log_buf might not point to valid ascii. For
    example:

    (gdb) python print str(gdb.parse_and_eval("log_buf"))
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
    UnicodeEncodeError: 'ascii' codec can't encode character u'\u0303' in
    position 24: ordinal not in range(128)

    Avoid this by explicitly casting to (void *) inside the gdb expression.

    Link: http://lkml.kernel.org/r/ba6f85dbb02ca980ebd0e2399b0649423399b565.1498481469.git.leonard.crestez@nxp.com
    Signed-off-by: Leonard Crestez
    Reviewed-by: Jan Kiszka
    Cc: Jason Wessel
    Cc: Kieran Bingham
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Leonard Crestez
     
  • lx-fdtdump dumps the flattened device tree passed to the kernel from the
    bootloader to the filename specified as the command argument. If no
    argument is provided it defaults to fdtdump.dtb. This then allows
    further post processing on the machine running GDB. The fdt header is
    also printed in the GDB console. For example:

    (gdb) lx-fdtdump
    fdt_magic: 0xD00DFEED
    fdt_totalsize: 0xC108
    off_dt_struct: 0x38
    off_dt_strings: 0x3804
    off_mem_rsvmap: 0x28
    version: 17
    last_comp_version: 16
    Dumped fdt to fdtdump.dtb

    >fdtdump fdtdump.dtb | less

    This command is useful as the bootloader can often re-write parts of the
    device tree, and this can sometimes cause the kernel to not boot.

    Link: http://lkml.kernel.org/r/1481280065-5336-2-git-send-email-kbingham@kernel.org
    Signed-off-by: Peter Griffin
    Signed-off-by: Kieran Bingham
    Cc: Jason Wessel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Griffin
     
  • As of commit bf3eac84c42d ("percpu-rwsem: kill CONFIG_PERCPU_RWSEM") we
    unconditionally build pcpu-rwsems. Remove a leftover for
    FILE_LOCKING.

    Link: http://lkml.kernel.org/r/20170518180115.2794-1-dave@stgolabs.net
    Signed-off-by: Davidlohr Bueso
    Cc: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     
  • Mount fails if the file system image has empty files, because of a
    sanity check performed while reading the superblock. For empty files the
    disk offset to end of file (i_eoffset) is cpu_to_le32(-1). The sanity
    check, which compares the disk offset with the file system size, isn't
    valid for this value and hence is skipped for it with this patch.
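
    A minimal user-space model of the adjusted comparison, assuming only what
    is stated above (i_eoffset is a little-endian 32-bit field, and the value
    -1 marks an empty file). The function name is hypothetical and the other
    checks done by bfs_fill_super() are not modelled.

```python
import struct

EMPTY_FILE_EOFFSET = 0xFFFFFFFF  # cpu_to_le32(-1) read back as unsigned


def eoffset_is_sane(raw_le32, fs_size):
    """Check an on-disk i_eoffset against the file system size in bytes.

    The empty-file marker is exempt from the comparison, which is the
    behaviour this patch introduces.
    """
    (eoffset,) = struct.unpack("<I", raw_le32)
    if eoffset == EMPTY_FILE_EOFFSET:
        return True
    return eoffset <= fs_size
```

    Without the exemption, the marker value would compare greater than any
    realistic file system size and the inode would be reported as corrupted,
    as in the dmesg output below.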

    Steps to reproduce:

    $ dd if=/dev/zero of=bfs-image count=204800
    $ mkfs.bfs bfs-image
    $ mkdir bfs-mount-point
    $ sudo mount -t bfs -o loop bfs-image bfs-mount-point/
    $ cd bfs-mount-point/
    $ sudo touch a
    $ cd ..
    $ sudo umount bfs-mount-point/
    $ sudo mount -t bfs -o loop bfs-image bfs-mount-point/
    mount: /dev/loop0: can't read superblock

    $ dmesg
    [25526.689580] BFS-fs: bfs_fill_super(): Inode 0x00000003 corrupted

    Tigran said:
    "If you had created the filesystem with the proper mkfs under SCO
    UnixWare 7 you (probably) wouldn't encounter this issue. But since
    commercial Unix-es are now part of history and the only proper way is
    the Linux mkfs.bfs utility, your patch is fine"

    Link: http://lkml.kernel.org/r/20170505201625.GA3097@hercules.tuxera.com
    Signed-off-by: Rakesh Pandit
    Acked-by: Tigran Aivazian
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rakesh Pandit
     
  • The add_device_randomness() function would ignore incoming bytes if the
    crng wasn't ready. This additionally makes sure to make an early enough
    call to add_latent_entropy() to influence the initial stack canary,
    which is especially important on non-x86 systems where it stays the same
    through the life of the boot.

    Link: http://lkml.kernel.org/r/20170626233038.GA48751@beast
    Signed-off-by: Kees Cook
    Cc: "Theodore Ts'o"
    Cc: Arnd Bergmann
    Cc: Greg Kroah-Hartman
    Cc: Ingo Molnar
    Cc: Jessica Yu
    Cc: Steven Rostedt (VMware)
    Cc: Viresh Kumar
    Cc: Tejun Heo
    Cc: Prarit Bhargava
    Cc: Lokesh Vutla
    Cc: Nicholas Piggin
    Cc: AKASHI Takahiro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kees Cook
     
  • Prevent use of uninitialized memory (originating from the stack frame of
    do_sysctl()) by verifying that the name array is filled with sufficient
    input data before comparing its specific entries with integer constants.

    Through timing measurement or analyzing the kernel debug logs, a
    user-mode program could potentially infer the results of comparisons
    against the uninitialized memory, and acquire some (very limited)
    information about the state of the kernel stack. The change also
    eliminates possible future warnings by tools such as KMSAN and other
    code checkers / instrumentations.

    Link: http://lkml.kernel.org/r/20170524122139.21333-1-mjurczyk@google.com
    Signed-off-by: Mateusz Jurczyk
    Acked-by: Kees Cook
    Cc: "David S. Miller"
    Cc: Matthew Whitehead
    Cc: "Eric W. Biederman"
    Cc: Tetsuo Handa
    Cc: Alexander Potapenko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mateusz Jurczyk
     
  • Add a few initial respective tests for an array:

    o Echoing values separated by spaces works
    o Echoing only first elements will set first elements
    o Confirm PAGE_SIZE limit still applies even if an array is used

    Link: http://lkml.kernel.org/r/20170630224431.17374-7-mcgrof@kernel.org
    Signed-off-by: Luis R. Rodriguez
    Cc: Kees Cook
    Cc: "Eric W. Biederman"
    Cc: Shuah Khan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Luis R. Rodriguez
     
  • Test against a simple proc_douintvec() case. While at it, add a test
    against UINT_MAX. Make sure that UINT_MAX works, that UINT_MAX+1 fails,
    and that negative values are not accepted.
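
    The three expectations can be written down as a tiny user-space model.
    This is not the kernel parser or the selftest script; it assumes a
    32-bit unsigned int and uses a made-up function name, with ValueError
    standing in for the write being rejected.

```python
UINT_MAX = 2**32 - 1  # assumption: unsigned int is 32 bits wide


def parse_uint(text):
    """Accept a decimal value in [0, UINT_MAX]; reject anything else."""
    value = int(text.strip(), 10)
    if value < 0:
        raise ValueError("negative values are not accepted")
    if value > UINT_MAX:
        raise ValueError("value exceeds UINT_MAX")
    return value
```

    Under this model parse_uint("4294967295") succeeds, while "4294967296"
    and "-1" are both rejected.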

    Link: http://lkml.kernel.org/r/20170630224431.17374-6-mcgrof@kernel.org
    Signed-off-by: Luis R. Rodriguez
    Cc: Kees Cook
    Cc: "Eric W. Biederman"
    Cc: Shuah Khan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Luis R. Rodriguez
     
  • Test against a simple proc_dointvec() case. While at it, add a test
    against INT_MAX. Make sure that INT_MAX works and that INT_MAX+1 fails.
    Also test that negative values work.

    Link: http://lkml.kernel.org/r/20170630224431.17374-5-mcgrof@kernel.org
    Signed-off-by: Luis R. Rodriguez
    Cc: Kees Cook
    Cc: "Eric W. Biederman"
    Cc: Shuah Khan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Luis R. Rodriguez
     
  • Add the following tests to ensure we do not regress:

    o Test using a buffer full of spaces (PAGE_SIZE-1) followed by a
      single digit works

    o Test using a buffer full of spaces (PAGE_SIZE or over) will fail

    As the number of tests increases, instead of unloading the module and
    reloading it we can just do a shell reset_vals() with a reset to values
    we know are set at init on the driver.

    Link: http://lkml.kernel.org/r/20170630224431.17374-4-mcgrof@kernel.org
    Signed-off-by: Luis R. Rodriguez
    Cc: Kees Cook
    Cc: "Eric W. Biederman"
    Cc: Shuah Khan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Luis R. Rodriguez