22 Nov, 2013

1 commit

  • Pull audit updates from Eric Paris:
    "Nothing amazing. Formatting, small bug fixes, couple of fixes where
    we didn't get records due to some old VFS changes, and a change to how
    we collect execve info..."

    Fixed conflict in fs/exec.c as per Eric and linux-next.

    * git://git.infradead.org/users/eparis/audit: (28 commits)
    audit: fix type of sessionid in audit_set_loginuid()
    audit: call audit_bprm() only once to add AUDIT_EXECVE information
    audit: move audit_aux_data_execve contents into audit_context union
    audit: remove unused envc member of audit_aux_data_execve
    audit: Kill the unused struct audit_aux_data_capset
    audit: do not reject all AUDIT_INODE filter types
    audit: suppress stock memalloc failure warnings since already managed
    audit: log the audit_names record type
    audit: add child record before the create to handle case where create fails
    audit: use given values in tty_audit enable api
    audit: use nlmsg_len() to get message payload length
    audit: use memset instead of trying to initialize field by field
    audit: fix info leak in AUDIT_GET requests
    audit: update AUDIT_INODE filter rule to comparator function
    audit: audit feature to set loginuid immutable
    audit: audit feature to only allow unsetting the loginuid
    audit: allow unsetting the loginuid (with priv)
    audit: remove CONFIG_AUDIT_LOGINUID_IMMUTABLE
    audit: loginuid functions coding style
    selinux: apply selinux checks on new audit message types
    ...

    Linus Torvalds
     

13 Nov, 2013

3 commits

  • Merge first patch-bomb from Andrew Morton:
    "Quite a lot of other stuff is banked up awaiting further
    next->mainline merging, but this batch contains:

    - Lots of random misc patches
    - OCFS2
    - Most of MM
    - backlight updates
    - lib/ updates
    - printk updates
    - checkpatch updates
    - epoll tweaking
    - rtc updates
    - hfs
    - hfsplus
    - documentation
    - procfs
    - update gcov to gcc-4.7 format
    - IPC"

    * emailed patches from Andrew Morton : (269 commits)
    ipc, msg: fix message length check for negative values
    ipc/util.c: remove unnecessary work pending test
    devpts: plug the memory leak in kill_sb
    ./Makefile: export initial ramdisk compression config option
    init/Kconfig: add option to disable kernel compression
    drivers: w1: make w1_slave::flags long to avoid memory corruption
    drivers/w1/masters/ds1wm.cuse dev_get_platdata()
    drivers/memstick/core/ms_block.c: fix unreachable state in h_msb_read_page()
    drivers/memstick/core/mspro_block.c: fix attributes array allocation
    drivers/pps/clients/pps-gpio.c: remove redundant of_match_ptr
    kernel/panic.c: reduce 1 byte usage for print tainted buffer
    gcov: reuse kbasename helper
    kernel/gcov/fs.c: use pr_warn()
    kernel/module.c: use pr_foo()
    gcov: compile specific gcov implementation based on gcc version
    gcov: add support for gcc 4.7 gcov format
    gcov: move gcov structs definitions to a gcc version specific file
    kernel/taskstats.c: return -ENOMEM when alloc memory fails in add_del_listener()
    kernel/taskstats.c: add nla_nest_cancel() for failure processing between nla_nest_start() and nla_nest_end()
    kernel/sysctl_binary.c: use scnprintf() instead of snprintf()
    ...

    Linus Torvalds
     
  • Pull vfs updates from Al Viro:
    "All kinds of stuff this time around; some more notable parts:

    - RCU'd vfsmounts handling
    - new primitives for coredump handling
    - files_lock is gone
    - Bruce's delegations handling series
    - exportfs fixes

    plus misc stuff all over the place"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (101 commits)
    ecryptfs: ->f_op is never NULL
    locks: break delegations on any attribute modification
    locks: break delegations on link
    locks: break delegations on rename
    locks: helper functions for delegation breaking
    locks: break delegations on unlink
    namei: minor vfs_unlink cleanup
    locks: implement delegations
    locks: introduce new FL_DELEG lock flag
    vfs: take i_mutex on renamed file
    vfs: rename I_MUTEX_QUOTA now that it's not used for quotas
    vfs: don't use PARENT/CHILD lock classes for non-directories
    vfs: pull ext4's double-i_mutex-locking into common code
    exportfs: fix quadratic behavior in filehandle lookup
    exportfs: better variable name
    exportfs: move most of reconnect_path to helper function
    exportfs: eliminate unused "noprogress" counter
    exportfs: stop retrying once we race with rename/remove
    exportfs: clear DISCONNECTED on all parents sooner
    exportfs: more detailed comment for path_reconnect
    ...

    Linus Torvalds
     
  • The get_dumpable() return value is not boolean. Most users of the
    function actually want to be testing for non-SUID_DUMP_USER(1) rather than
    SUID_DUMP_DISABLE(0). The SUID_DUMP_ROOT(2) is also considered a
    protected state. Almost all places did this correctly, excepting the two
    places fixed in this patch.

    Wrong logic:
    if (dumpable == SUID_DUMP_DISABLE) { /* be protective */ }
    or
    if (dumpable == 0) { /* be protective */ }
    or
    if (!dumpable) { /* be protective */ }

    Correct logic:
    if (dumpable != SUID_DUMP_USER) { /* be protective */ }
    or
    if (dumpable != 1) { /* be protective */ }

    Without this patch, if the system had set the sysctl fs/suid_dumpable=2, a
    user was able to ptrace attach to processes that had dropped privileges to
    that user. (This may have been partially mitigated if Yama was enabled.)

    The macros have been moved into the file that declares get/set_dumpable(),
    which means things like the ia64 code can see them too.

    CVE-2013-2929

    Reported-by: Vasily Kulikov
    Signed-off-by: Kees Cook
    Cc: "Luck, Tony"
    Cc: Oleg Nesterov
    Cc: "Eric W. Biederman"
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kees Cook
     

06 Nov, 2013

1 commit

  • Move the audit_bprm() call from search_binary_handler() to exec_binprm(). This
    allows us to get rid of the mm member of struct audit_aux_data_execve since
    bprm->mm will equal current->mm.

    This also mitigates the issue that ->argc could be modified by the
    load_binary() call in search_binary_handler().

    audit_bprm() was being called to add an AUDIT_EXECVE record to the audit
    context every time search_binary_handler() was recursively called. Only one
    reference is necessary.

    Reported-by: Oleg Nesterov
    Cc: Eric Paris
    Signed-off-by: Richard Guy Briggs
    Signed-off-by: Eric Paris
    ---
    This patch is against 3.11, but was developed on Oleg's post-3.11 patches that
    introduce exec_binprm().

    Richard Guy Briggs
     

25 Oct, 2013

1 commit


09 Oct, 2013

1 commit

  • It is possible for a task in a numa group to call exec, and
    have the new (unrelated) executable inherit the numa group
    association from its former self.

    This has the potential to break numa grouping, and is trivial
    to fix.

    Signed-off-by: Rik van Riel
    Signed-off-by: Mel Gorman
    Cc: Andrea Arcangeli
    Cc: Johannes Weiner
    Cc: Srikar Dronamraju
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1381141781-10992-51-git-send-email-mgorman@suse.de
    Signed-off-by: Ingo Molnar

    Rik van Riel
     

12 Sep, 2013

9 commits

  • The error hanling and ret-from-loop look confusing and inconsistent.

    - "retval >= 0" simply returns

    - "!bprm->file" returns too but with read_unlock() because
    binfmt_lock was already re-acquired

    - "retval != -ENOEXEC || bprm->mm == NULL" does "break" and
    relies on the same check after the main loop

    Consolidate these checks into a single if/return statement.

    need_retry still checks "retval == -ENOEXEC", but this and -ENOENT before
    the main loop are not needed. This is only for pathological and
    impossible list_empty(&formats) case.

    It is not clear why do we check "bprm->mm == NULL", probably this
    should be removed.

    Signed-off-by: Oleg Nesterov
    Acked-by: Kees Cook
    Cc: Al Viro
    Cc: Evgeniy Polyakov
    Cc: Zach Levis
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • A separate one-liner for better documentation.

    It doesn't make sense to retry if request_module() fails to exec
    /sbin/modprobe, add the additional "request_module() < 0" check.

    However, this logic still doesn't look exactly right:

    1. It would be better to check "request_module() != 0", the user
    space modprobe process should report the correct exit code.
    But I didn't dare to add the user-visible change.

    2. The whole ENOEXEC logic looks suboptimal. Suppose that we try
    to exec a "#!path-to-unsupported-binary" script. In this case
    request_module() + "retry" will be done twice: first by the
    "depth == 1" code, and then again by the "depth == 0" caller
    which doesn't make sense.

    3. And note that in the case above bprm->buf was already changed
    by load_script()->prepare_binprm(), so this looks even more
    ugly.

    Signed-off-by: Oleg Nesterov
    Acked-by: Kees Cook
    Cc: Al Viro
    Cc: Evgeniy Polyakov
    Cc: Zach Levis
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • search_binary_handler() uses "for (try=0; try
    Acked-by: Kees Cook
    Cc: Al Viro
    Cc: Evgeniy Polyakov
    Cc: Zach Levis
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • search_binary_handler() checks ->load_binary != NULL for no reason, this
    method should be always defined. Turn this check into WARN_ON() and move
    it into __register_binfmt().

    Also, kill the function pointer. The current code looks confusing, as if
    ->load_binary can go away after read_unlock(&binfmt_lock). But we rely on
    module_get(fmt->module), this fmt can't be changed or unregistered,
    otherwise this code is buggy anyway.

    Signed-off-by: Oleg Nesterov
    Acked-by: Kees Cook
    Cc: Al Viro
    Cc: Evgeniy Polyakov
    Cc: Zach Levis
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • When search_binary_handler() succeeds it does allow_write_access() and
    fput(), then it clears bprm->file to ensure the caller will not do the
    same.

    We can simply move this code to exec_binprm() which is called only once.
    In fact we could move this to free_bprm() and remove the same code in
    do_execve_common's error path.

    Signed-off-by: Oleg Nesterov
    Acked-by: Kees Cook
    Cc: Al Viro
    Cc: Evgeniy Polyakov
    Cc: Zach Levis
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • A separate one-liner with the minor fix.

    PROC_EVENT_EXEC reports the "exec" event, but this message is sent at
    least twice if search_binary_handler() is called by ->load_binary()
    recursively, say, load_script().

    Move it to exec_binprm(), this is "depth == 0" code too.

    Signed-off-by: Oleg Nesterov
    Acked-by: Kees Cook
    Cc: Al Viro
    Cc: Evgeniy Polyakov
    Cc: Zach Levis
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • Nobody except search_binary_handler() should touch ->recursion_depth, "int
    depth" buys nothing but complicates the code, kill it.

    Probably we should also kill "fn" and the !NULL check, ->load_binary
    should be always defined. And it can not go away after read_unlock() or
    this code is buggy anyway.

    Signed-off-by: Oleg Nesterov
    Acked-by: Kees Cook
    Cc: Al Viro
    Cc: Evgeniy Polyakov
    Cc: Zach Levis
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • task_pid_nr_ns() and trace/ptrace code in the middle of the recursive
    search_binary_handler() looks confusing and imho annoying. We only need
    this code if "depth == 0", lets add a simple helper which calls
    search_binary_handler() and does trace_sched_process_exec() +
    ptrace_event().

    The patch also moves the setting of task->did_exec, we need to do this
    only once.

    Note: we can kill either task->did_exec or PF_FORKNOEXEC.

    Signed-off-by: Oleg Nesterov
    Acked-by: Kees Cook
    Cc: Al Viro
    Cc: Evgeniy Polyakov
    Cc: Zach Levis
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • Pavel reported that in case if vma area get unmapped and then mapped (or
    expanded) in-place, the soft dirty tracker won't be able to recognize this
    situation since it works on pte level and ptes are get zapped on unmap,
    loosing soft dirty bit of course.

    So to resolve this situation we need to track actions on vma level, there
    VM_SOFTDIRTY flag comes in. When new vma area created (or old expanded)
    we set this bit, and keep it here until application calls for clearing
    soft dirty bit.

    Thus when user space application track memory changes now it can detect if
    vma area is renewed.

    Reported-by: Pavel Emelyanov
    Signed-off-by: Cyrill Gorcunov
    Cc: Andy Lutomirski
    Cc: Matt Mackall
    Cc: Xiao Guangrong
    Cc: Marcelo Tosatti
    Cc: KOSAKI Motohiro
    Cc: Stephen Rothwell
    Cc: Peter Zijlstra
    Cc: "Aneesh Kumar K.V"
    Cc: Rob Landley
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Cyrill Gorcunov
     

16 Aug, 2013

1 commit

  • Ben Tebulin reported:

    "Since v3.7.2 on two independent machines a very specific Git
    repository fails in 9/10 cases on git-fsck due to an SHA1/memory
    failures. This only occurs on a very specific repository and can be
    reproduced stably on two independent laptops. Git mailing list ran
    out of ideas and for me this looks like some very exotic kernel issue"

    and bisected the failure to the backport of commit 53a59fc67f97 ("mm:
    limit mmu_gather batching to fix soft lockups on !CONFIG_PREEMPT").

    That commit itself is not actually buggy, but what it does is to make it
    much more likely to hit the partial TLB invalidation case, since it
    introduces a new case in tlb_next_batch() that previously only ever
    happened when running out of memory.

    The real bug is that the TLB gather virtual memory range setup is subtly
    buggered. It was introduced in commit 597e1c3580b7 ("mm/mmu_gather:
    enable tlb flush range in generic mmu_gather"), and the range handling
    was already fixed at least once in commit e6c495a96ce0 ("mm: fix the TLB
    range flushed when __tlb_remove_page() runs out of slots"), but that fix
    was not complete.

    The problem with the TLB gather virtual address range is that it isn't
    set up by the initial tlb_gather_mmu() initialization (which didn't get
    the TLB range information), but it is set up ad-hoc later by the
    functions that actually flush the TLB. And so any such case that forgot
    to update the TLB range entries would potentially miss TLB invalidates.

    Rather than try to figure out exactly which particular ad-hoc range
    setup was missing (I personally suspect it's the hugetlb case in
    zap_huge_pmd(), which didn't have the same logic as zap_pte_range()
    did), this patch just gets rid of the problem at the source: make the
    TLB range information available to tlb_gather_mmu(), and initialize it
    when initializing all the other tlb gather fields.

    This makes the patch larger, but conceptually much simpler. And the end
    result is much more understandable; even if you want to play games with
    partial ranges when invalidating the TLB contents in chunks, now the
    range information is always there, and anybody who doesn't want to
    bother with it won't introduce subtle bugs.

    Ben verified that this fixes his problem.

    Reported-bisected-and-tested-by: Ben Tebulin
    Build-testing-by: Stephen Rothwell
    Build-testing-by: Richard Weinberger
    Reviewed-by: Michal Hocko
    Acked-by: Peter Zijlstra
    Cc: stable@vger.kernel.org
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

04 Jul, 2013

4 commits

  • 924b42d5 ("Use boot based time for process start time and boot time in
    /proc") updated copy_process/do_task_stat but forgot about de_thread().
    This breaks "ps axOT" if a sub-thread execs.

    Note: I think that task->start_time should die.

    Signed-off-by: Oleg Nesterov
    Cc: "Eric W. Biederman"
    Acked-by: John Stultz
    Cc: Tomas Janousek
    Cc: Tomas Smetana
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • Trivial cleanup. do_execve_common() can use current_user() and avoid the
    unnecessary "struct cred *cred" var.

    Signed-off-by: Oleg Nesterov
    Cc: Vasiliy Kulikov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • de_thread() can use change_pid() instead of detach + attach. This looks
    better and this ensures that, say, next_thread() can never see a task with
    ->pid == NULL.

    Signed-off-by: Oleg Nesterov
    Acked-by: "Eric W. Biederman"
    Cc: Michal Hocko
    Cc: Pavel Emelyanov
    Cc: Sergey Dyasly
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • Pull second set of VFS changes from Al Viro:
    "Assorted f_pos race fixes, making do_splice_direct() safe to call with
    i_mutex on parent, O_TMPFILE support, Jeff's locks.c series,
    ->d_hash/->d_compare calling conventions changes from Linus, misc
    stuff all over the place."

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (63 commits)
    Document ->tmpfile()
    ext4: ->tmpfile() support
    vfs: export lseek_execute() to modules
    lseek_execute() doesn't need an inode passed to it
    block_dev: switch to fixed_size_llseek()
    cpqphp_sysfs: switch to fixed_size_llseek()
    tile-srom: switch to fixed_size_llseek()
    proc_powerpc: switch to fixed_size_llseek()
    ubi/cdev: switch to fixed_size_llseek()
    pci/proc: switch to fixed_size_llseek()
    isapnp: switch to fixed_size_llseek()
    lpfc: switch to fixed_size_llseek()
    locks: give the blocked_hash its own spinlock
    locks: add a new "lm_owner_key" lock operation
    locks: turn the blocked_list into a hashtable
    locks: convert fl_link to a hlist_node
    locks: avoid taking global lock if possible when waking up blocked waiters
    locks: protect most of the file_lock handling with i_lock
    locks: encapsulate the fl_link list handling
    locks: make "added" in __posix_lock_file a bool
    ...

    Linus Torvalds
     

29 Jun, 2013

1 commit


26 Jun, 2013

1 commit

  • There was a a bug in setup_new_exec(), whereby
    the test to disabled perf monitoring was not
    correct because the new credentials for the
    process were not yet committed and therefore
    the get_dumpable() test was never firing.

    The patch fixes the problem by moving the
    perf_event test until after the credentials
    are committed.

    Signed-off-by: Stephane Eranian
    Tested-by: Jiri Olsa
    Acked-by: Peter Zijlstra
    Cc:
    Signed-off-by: Ingo Molnar

    Stephane Eranian
     

02 May, 2013

1 commit

  • Pull VFS updates from Al Viro,

    Misc cleanups all over the place, mainly wrt /proc interfaces (switch
    create_proc_entry to proc_create(), get rid of the deprecated
    create_proc_read_entry() in favor of using proc_create_data() and
    seq_file etc).

    7kloc removed.

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (204 commits)
    don't bother with deferred freeing of fdtables
    proc: Move non-public stuff from linux/proc_fs.h to fs/proc/internal.h
    proc: Make the PROC_I() and PDE() macros internal to procfs
    proc: Supply a function to remove a proc entry by PDE
    take cgroup_open() and cpuset_open() to fs/proc/base.c
    ppc: Clean up scanlog
    ppc: Clean up rtas_flash driver somewhat
    hostap: proc: Use remove_proc_subtree()
    drm: proc: Use remove_proc_subtree()
    drm: proc: Use minor->index to label things, not PDE->name
    drm: Constify drm_proc_list[]
    zoran: Don't print proc_dir_entry data in debug
    reiserfs: Don't access the proc_dir_entry in r_open(), r_start() r_show()
    proc: Supply an accessor for getting the data from a PDE's parent
    airo: Use remove_proc_subtree()
    rtl8192u: Don't need to save device proc dir PDE
    rtl8187se: Use a dir under /proc/net/r8180/
    proc: Add proc_mkdir_data()
    proc: Move some bits from linux/proc_fs.h to linux/{of.h,signal.h,tty.h}
    proc: Move PDE_NET() to fs/proc/proc_net.c
    ...

    Linus Torvalds
     

01 May, 2013

2 commits

  • threadgroup_lock() takes signal->cred_guard_mutex to ensure that
    thread_group_leader() is stable. This doesn't look nice, the scope of
    this lock in do_execve() is huge.

    And as Dave pointed out this can lead to deadlock, we have the
    following dependencies:

    do_execve: cred_guard_mutex -> i_mutex
    cgroup_mount: i_mutex -> cgroup_mutex
    attach_task_by_pid: cgroup_mutex -> cred_guard_mutex

    Change de_thread() to take threadgroup_change_begin() around the
    switch-the-leader code and change threadgroup_lock() to avoid
    ->cred_guard_mutex.

    Note that de_thread() can't sleep with ->group_rwsem held, this can
    obviously deadlock with the exiting leader if the writer is active, so it
    does threadgroup_change_end() before schedule().

    Reported-by: Dave Jones
    Acked-by: Tejun Heo
    Acked-by: Li Zefan
    Signed-off-by: Oleg Nesterov
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • set_task_comm() does memset() + wmb() before strlcpy(). This buys
    nothing and to add to the confusion, the comment is wrong.

    - We do not need memset() to be "safe from non-terminating string
    reads", the final char is always zero and we never change it.

    - wmb() is paired with nothing, it cannot prevent from printing
    the mixture of the old/new data unless the reader takes the lock.

    Signed-off-by: Oleg Nesterov
    Cc: Andi Kleen
    Cc: John Stultz
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     

30 Apr, 2013

2 commits

  • On architectures where a pgd entry may be shared between user and kernel
    (e.g. ARM+LPAE), freeing page tables needs a ceiling other than 0.
    This patch introduces a generic USER_PGTABLES_CEILING that arch code can
    override. It is the responsibility of the arch code setting the ceiling
    to ensure the complete freeing of the page tables (usually in
    pgd_free()).

    [catalin.marinas@arm.com: commit log; shift_arg_pages(), asm-generic/pgtables.h changes]
    Signed-off-by: Hugh Dickins
    Signed-off-by: Catalin Marinas
    Cc: Russell King
    Cc: [3.3+]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • switch binfmts that use ->read() to that (and to kernel_read()
    in several cases in binfmt_flat - sure, it's nommu, but still,
    doing ->read() into kmalloc'ed buffer...)

    Signed-off-by: Al Viro

    Al Viro
     

28 Feb, 2013

1 commit

  • The existing SUID_DUMP_* defines duplicate the newer SUID_DUMPABLE_*
    defines introduced in 54b501992dd2 ("coredump: warn about unsafe
    suid_dumpable / core_pattern combo"). Remove the new ones, and use the
    prior values instead.

    Signed-off-by: Kees Cook
    Reported-by: Chen Gang
    Cc: Alexander Viro
    Cc: Alan Cox
    Cc: "Eric W. Biederman"
    Cc: Doug Ledford
    Cc: Serge Hallyn
    Cc: James Morris
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kees Cook
     

26 Feb, 2013

1 commit


23 Feb, 2013

1 commit


12 Jan, 2013

1 commit

  • The tricky problem is this check:

    if (i++ >= max)

    icc (mis)optimizes this check as:

    if (++i > max)

    The check now becomes a no-op since max is MAX_ARG_STRINGS (0x7FFFFFFF).

    This is "allowed" by the C standard, assuming i++ never overflows,
    because signed integer overflow is undefined behavior. This
    optimization effectively reverts the previous commit 362e6663ef23
    ("exec.c, compat.c: fix count(), compat_count() bounds checking") that
    tries to fix the check.

    This patch simply moves ++ after the check.

    Signed-off-by: Xi Wang
    Cc: Jason Baron
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Xi Wang
     

21 Dec, 2012

3 commits

  • Merge the rest of Andrew's patches for -rc1:
    "A bunch of fixes and misc missed-out-on things.

    That'll do for -rc1. I still have a batch of IPC patches which still
    have a possible bug report which I'm chasing down."

    * emailed patches from Andrew Morton : (25 commits)
    keys: use keyring_alloc() to create module signing keyring
    keys: fix unreachable code
    sendfile: allows bypassing of notifier events
    SGI-XP: handle non-fatal traps
    fat: fix incorrect function comment
    Documentation: ABI: remove testing/sysfs-devices-node
    proc: fix inconsistent lock state
    linux/kernel.h: fix DIV_ROUND_CLOSEST with unsigned divisors
    memcg: don't register hotcpu notifier from ->css_alloc()
    checkpatch: warn on uapi #includes that #include
    mm: cma: WARN if freed memory is still in use
    exec: do not leave bprm->interp on stack
    ...

    Linus Torvalds
     
  • Pull signal handling cleanups from Al Viro:
    "sigaltstack infrastructure + conversion for x86, alpha and um,
    COMPAT_SYSCALL_DEFINE infrastructure.

    Note that there are several conflicts between "unify
    SS_ONSTACK/SS_DISABLE definitions" and UAPI patches in mainline;
    resolution is trivial - just remove definitions of SS_ONSTACK and
    SS_DISABLED from arch/*/uapi/asm/signal.h; they are all identical and
    include/uapi/linux/signal.h contains the unified variant."

    Fixed up conflicts as per Al.

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal:
    alpha: switch to generic sigaltstack
    new helpers: __save_altstack/__compat_save_altstack, switch x86 and um to those
    generic compat_sys_sigaltstack()
    introduce generic sys_sigaltstack(), switch x86 and um to it
    new helper: compat_user_stack_pointer()
    new helper: restore_altstack()
    unify SS_ONSTACK/SS_DISABLE definitions
    new helper: current_user_stack_pointer()
    missing user_stack_pointer() instances
    Bury the conditionals from kernel_thread/kernel_execve series
    COMPAT_SYSCALL_DEFINE: infrastructure

    Linus Torvalds
     
  • If a series of scripts are executed, each triggering module loading via
    unprintable bytes in the script header, kernel stack contents can leak
    into the command line.

    Normally execution of binfmt_script and binfmt_misc happens recursively.
    However, when modules are enabled, and unprintable bytes exist in the
    bprm->buf, execution will restart after attempting to load matching
    binfmt modules. Unfortunately, the logic in binfmt_script and
    binfmt_misc does not expect to get restarted. They leave bprm->interp
    pointing to their local stack. This means on restart bprm->interp is
    left pointing into unused stack memory which can then be copied into the
    userspace argv areas.

    After additional study, it seems that both recursion and restart remains
    the desirable way to handle exec with scripts, misc, and modules. As
    such, we need to protect the changes to interp.

    This changes the logic to require allocation for any changes to the
    bprm->interp. To avoid adding a new kmalloc to every exec, the default
    value is left as-is. Only when passing through binfmt_script or
    binfmt_misc does an allocation take place.

    For a proof of concept, see DoTest.sh from:

    http://www.halfdog.net/Security/2012/LinuxKernelBinfmtScriptStackDataDisclosure/

    Signed-off-by: Kees Cook
    Cc: halfdog
    Cc: P J P
    Cc: Alexander Viro
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kees Cook
     

20 Dec, 2012

1 commit

  • All architectures have
    CONFIG_GENERIC_KERNEL_THREAD
    CONFIG_GENERIC_KERNEL_EXECVE
    __ARCH_WANT_SYS_EXECVE
    None of them have __ARCH_WANT_KERNEL_EXECVE and there are only two callers
    of kernel_execve() (which is a trivial wrapper for do_execve() now) left.
    Kill the conditionals and make both callers use do_execve().

    Signed-off-by: Al Viro

    Al Viro
     

18 Dec, 2012

3 commits

  • Merge misc patches from Andrew Morton:
    "Incoming:

    - lots of misc stuff

    - backlight tree updates

    - lib/ updates

    - Oleg's percpu-rwsem changes

    - checkpatch

    - rtc

    - aoe

    - more checkpoint/restart support

    I still have a pile of MM stuff pending - Pekka should be merging
    later today after which that is good to go. A number of other things
    are twiddling thumbs awaiting maintainer merges."

    * emailed patches from Andrew Morton : (180 commits)
    scatterlist: don't BUG when we can trivially return a proper error.
    docs: update documentation about /proc//fdinfo/ fanotify output
    fs, fanotify: add @mflags field to fanotify output
    docs: add documentation about /proc//fdinfo/ output
    fs, notify: add procfs fdinfo helper
    fs, exportfs: add exportfs_encode_inode_fh() helper
    fs, exportfs: escape nil dereference if no s_export_op present
    fs, epoll: add procfs fdinfo helper
    fs, eventfd: add procfs fdinfo helper
    procfs: add ability to plug in auxiliary fdinfo providers
    tools/testing/selftests/kcmp/kcmp_test.c: print reason for failure in kcmp_test
    breakpoint selftests: print failure status instead of cause make error
    kcmp selftests: print fail status instead of cause make error
    kcmp selftests: make run_tests fix
    mem-hotplug selftests: print failure status instead of cause make error
    cpu-hotplug selftests: print failure status instead of cause make error
    mqueue selftests: print failure status instead of cause make error
    vm selftests: print failure status instead of cause make error
    ubifs: use prandom_bytes
    mtd: nandsim: use prandom_bytes
    ...

    Linus Torvalds
     
  • To avoid an explosion of request_module calls on a chain of abusive
    scripts, fail maximum recursion with -ELOOP instead of -ENOEXEC. As soon
    as maximum recursion depth is hit, the error will fail all the way back
    up the chain, aborting immediately.

    This also has the side-effect of stopping the user's shell from attempting
    to reexecute the top-level file as a shell script. As seen in the
    dash source:

    if (cmd != path_bshell && errno == ENOEXEC) {
    *argv-- = cmd;
    *argv = cmd = path_bshell;
    goto repeat;
    }

    The above logic was designed for running scripts automatically that lacked
    the "#!" header, not to re-try failed recursion. On a legitimate -ENOEXEC,
    things continue to behave as the shell expects.

    Additionally, when tracking recursion, the binfmt handlers should not be
    involved. The recursion being tracked is the depth of calls through
    search_binary_handler(), so that function should be exclusively responsible
    for tracking the depth.

    Signed-off-by: Kees Cook
    Cc: halfdog
    Cc: P J P
    Cc: Alexander Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kees Cook
     
  • Pull user namespace changes from Eric Biederman:
    "While small this set of changes is very significant with respect to
    containers in general and user namespaces in particular. The user
    space interface is now complete.

    This set of changes adds support for unprivileged users to create user
    namespaces and as a user namespace root to create other namespaces.
    The tyranny of supporting suid root preventing unprivileged users from
    using cool new kernel features is broken.

    This set of changes completes the work on setns, adding support for
    the pid, user, mount namespaces.

    This set of changes includes a bunch of basic pid namespace
    cleanups/simplifications. Of particular significance is the rework of
    the pid namespace cleanup so it no longer requires sending out
    tendrils into all kinds of unexpected cleanup paths for operation. At
    least one case of broken error handling is fixed by this cleanup.

    The files under /proc//ns/ have been converted from regular files
    to magic symlinks which prevents incorrect caching by the VFS,
    ensuring the files always refer to the namespace the process is
    currently using and ensuring that the ptrace_mayaccess permission
    checks are always applied.

    The files under /proc//ns/ have been given stable inode numbers
    so it is now possible to see if different processes share the same
    namespaces.

    Through the David Miller's net tree are changes to relax many of the
    permission checks in the networking stack to allowing the user
    namespace root to usefully use the networking stack. Similar changes
    for the mount namespace and the pid namespace are coming through my
    tree.

    Two small changes to add user namespace support were commited here adn
    in David Miller's -net tree so that I could complete the work on the
    /proc//ns/ files in this tree.

    Work remains to make it safe to build user namespaces and 9p, afs,
    ceph, cifs, coda, gfs2, ncpfs, nfs, nfsd, ocfs2, and xfs so the
    Kconfig guard remains in place preventing that user namespaces from
    being built when any of those filesystems are enabled.

    Future design work remains to allow root users outside of the initial
    user namespace to mount more than just /proc and /sys."

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (38 commits)
    proc: Usable inode numbers for the namespace file descriptors.
    proc: Fix the namespace inode permission checks.
    proc: Generalize proc inode allocation
    userns: Allow unprivilged mounts of proc and sysfs
    userns: For /proc/self/{uid,gid}_map derive the lower userns from the struct file
    procfs: Print task uids and gids in the userns that opened the proc file
    userns: Implement unshare of the user namespace
    userns: Implent proc namespace operations
    userns: Kill task_user_ns
    userns: Make create_new_namespaces take a user_ns parameter
    userns: Allow unprivileged use of setns.
    userns: Allow unprivileged users to create new namespaces
    userns: Allow setting a userns mapping to your current uid.
    userns: Allow chown and setgid preservation
    userns: Allow unprivileged users to create user namespaces.
    userns: Ignore suid and sgid on binaries if the uid or gid can not be mapped
    userns: fix return value on mntns_install() failure
    vfs: Allow unprivileged manipulation of the mount namespace.
    vfs: Only support slave subtrees across different user namespaces
    vfs: Add a user namespace reference from struct mnt_namespace
    ...

    Linus Torvalds
     

29 Nov, 2012

1 commit