26 Aug, 2014

1 commit


10 Jul, 2014

2 commits


08 Jul, 2014

1 commit

  • Enabling NO_HZ_FULL currently has the side effect of enabling callback
    offloading on all CPUs. This results in lots of additional rcuo kthreads,
    and can also increase context switching and wakeups, even in cases where
    callback offloading is neither needed nor particularly desirable. This
    commit therefore enables callback offloading on a given CPU only if
    specifically requested at build time or boot time, or if that CPU has
    been specifically designated (again, either at build time or boot time)
    as a nohz_full CPU.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
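
The selection rule in this commit message can be modelled roughly as follows. This is a minimal Python sketch, not the kernel's implementation: the function name and the use of plain CPU-number collections are illustrative assumptions, with `rcu_nocbs` standing for CPUs requested explicitly at build or boot time and `nohz_full` for CPUs designated as nohz_full.

```python
def offloaded_cpus(nr_cpus, rcu_nocbs=(), nohz_full=()):
    """Return the CPUs whose RCU callbacks would be offloaded.

    Hypothetical model: a CPU is offloaded only if offloading was
    explicitly requested for it (rcu_nocbs) or if it was designated
    as a nohz_full CPU. Merely enabling NO_HZ_FULL offloads nothing.
    """
    present = set(range(nr_cpus))
    requested = set(rcu_nocbs) | set(nohz_full)
    # Ignore CPU numbers that do not exist on this machine.
    return sorted(requested & present)
```

With no explicit request, `offloaded_cpus(8)` yields an empty list, which corresponds to the reduced number of rcuo kthreads the commit aims for.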
     

12 Jun, 2014

1 commit

  • Pull module updates from Rusty Russell:
    "Most of this is cleaning up various driver sysfs permissions so we can
    re-add the perm check (we unified the module param and sysfs checks,
    but the module ones were stronger so we weakened them temporarily).

    Param parsing gets documented, and also "--" now forces args to be
    handed to init (and ignored by the kernel).

    Module NX/RO protections get tightened: we now set them before calling
    parse_args()"

    * tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
    module: set nx before marking module MODULE_STATE_COMING.
    samples/kobject/: avoid world-writable sysfs files.
    drivers/hid/hid-picolcd_fb: avoid world-writable sysfs files.
    drivers/staging/speakup/: avoid world-writable sysfs files.
    drivers/regulator/virtual: avoid world-writable sysfs files.
    drivers/scsi/pm8001/pm8001_ctl.c: avoid world-writable sysfs files.
    drivers/hid/hid-lg4ff.c: avoid world-writable sysfs files.
    drivers/video/fbdev/sm501fb.c: avoid world-writable sysfs files.
    drivers/mtd/devices/docg3.c: avoid world-writable sysfs files.
    speakup: fix incorrect perms on speakup_acntsa.c
    cpumask.h: silence warning with -Wsign-compare
    Documentation: Update kernel-parameters.tx
    param: hand arguments after -- straight to init
    modpost: Fix resource leak in read_dump()

    Linus Torvalds
     

05 Jun, 2014

11 commits

  • Pull x86-64 espfix changes from Peter Anvin:
    "This is the espfix64 code, which fixes the IRET information leak as
    well as the associated functionality problem. With this code applied,
    16-bit stack segments finally work as intended even on a 64-bit
    kernel.

    Consequently, this patchset also removes the runtime option that we
    added as an interim measure.

    To help the people working on Linux kernels for very small systems,
    this patchset also makes these compile-time configurable features"

    * 'x86/espfix' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    Revert "x86-64, modify_ldt: Make support for 16-bit segments a runtime option"
    x86, espfix: Make it possible to disable 16-bit support
    x86, espfix: Make espfix64 a Kconfig option, fix UML
    x86, espfix: Fix broken header guard
    x86, espfix: Move espfix definitions into a separate header file
    x86-32, espfix: Remove filter for espfix32 due to race
    x86-64, espfix: Don't leak bits 31:16 of %esp returning to 16-bit stack

    Linus Torvalds
     
  • Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • 1. Remove CLONE_KERNEL, it has no users and it is dangerous.

    The (old) comment says "List of flags we want to share for kernel
    threads" but this is not true, we do not want to share ->sighand by
    default. This flag can only be used if the caller is sure that both
    parent/child will never play with signals (say, allow_signal/etc).

    2. Change rest_init() to clone kernel_init() without CLONE_SIGHAND.

    In this case CLONE_SIGHAND does not really hurt, and it looks like an
    optimization because copy_sighand() can avoid kmem_cache_alloc().

    But in fact this only adds a minor pessimization. kernel_init()
    is going to exec the init process, and de_thread() will need to
    unshare ->sighand and do kmem_cache_alloc(sighand_cachep) anyway,
    but it needs to do more work and take tasklist_lock and siglock.

    Signed-off-by: Oleg Nesterov
    Acked-by: Peter Zijlstra
    Acked-by: Steven Rostedt
    Cc: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Mathieu Desnoyers
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • When a module is built into the kernel the module_init() function
    becomes an initcall. Sometimes debugging through dynamic debug can
    help, however, debugging built in kernel modules is typically done by
    changing the .config, recompiling, and booting the new kernel in an
    effort to determine exactly which module caused a problem.

    This patchset can be useful stand-alone or combined with initcall_debug.
    There are cases where some initcalls can hang the machine before the
    console can be flushed, which can make initcall_debug output inaccurate.
    Having the ability to skip initcalls can help further debugging of these
    scenarios.

    Usage: initcall_blacklist=

    ex) added "initcall_blacklist=sgi_uv_sysfs_init" as a kernel parameter and
    the log contains:

    blacklisting initcall sgi_uv_sysfs_init
    ...
    ...
    initcall sgi_uv_sysfs_init blacklisted

    ex) added "initcall_blacklist=foo_bar,sgi_uv_sysfs_init" as a kernel parameter
    and the log contains:

    blacklisting initcall foo_bar
    blacklisting initcall sgi_uv_sysfs_init
    ...
    ...
    initcall sgi_uv_sysfs_init blacklisted

    [akpm@linux-foundation.org: tweak printk text]
    Signed-off-by: Prarit Bhargava
    Cc: Richard Weinberger
    Cc: Andi Kleen
    Cc: Josh Boyer
    Cc: Rob Landley
    Cc: Steven Rostedt
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Frederic Weisbecker
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Prarit Bhargava
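
The behaviour described in this commit message can be sketched as a small model. This is an illustrative Python sketch under stated assumptions, not the kernel code: the function names `parse_blacklist` and `run_initcalls` and the list-based log are hypothetical, and only the two log message formats are taken from the examples above.

```python
def parse_blacklist(value, log):
    """Split the comma-separated initcall_blacklist= value into names."""
    names = [name for name in value.split(",") if name]
    for name in names:
        log.append(f"blacklisting initcall {name}")
    return set(names)


def run_initcalls(initcalls, blacklist, log):
    """Run each (name, function) pair unless the name is blacklisted."""
    ran = []
    for name, fn in initcalls:
        if name in blacklist:
            # Skipped initcalls are reported so the boot log shows why
            # the function never ran.
            log.append(f"initcall {name} blacklisted")
            continue
        fn()
        ran.append(name)
    return ran
```

A name that never matches a built-in initcall (such as `foo_bar` in the second example) is only reported at parse time, which is why it appears once in the log.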
     
  • Partially revert commit ea676e846a81 ("init/main.c: convert to
    pr_foo()").

    Unbeknownst to me, pr_debug() is different from the other pr_foo()
    levels: pr_debug() is a no-op when DEBUG is not defined.

    Happily, init/main.c does have a #define DEBUG so we didn't break
    initcall_debug. But the functioning of initcall_debug should not be
    dependent upon the presence of that #define DEBUG.

    Reported-by: Russell King
    Cc: Joe Perches
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • ... instead of naked numbers.

    Stuff in sysrq.c used to set it to 8 which is supposed to mean above
    default level so set it to DEBUG instead as we're terminating/killing all
    tasks and we want to be verbose there.

    Also, correct the check in x86_64_start_kernel which should be >= as
    we're clearly issuing the string there for all debug levels, not only
    the magical 10.

    Signed-off-by: Borislav Petkov
    Acked-by: Kees Cook
    Acked-by: Randy Dunlap
    Cc: Joe Perches
    Cc: Valdis Kletnieks
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Borislav Petkov
     
  • sys_sgetmask and sys_ssetmask are obsolete system calls no longer
    supported in libc.

    This patch replaces the architecture-related __ARCH_WANT_SYS_SGETMAX with
    an expert-mode configuration option. That option is enabled by default
    for those architectures.

    Signed-off-by: Fabian Frederick
    Cc: Steven Miao
    Cc: Mikael Starvik
    Cc: Jesper Nilsson
    Cc: David Howells
    Cc: Geert Uytterhoeven
    Cc: Michal Simek
    Cc: Ralf Baechle
    Cc: Koichi Yasutake
    Cc: "James E.J. Bottomley"
    Cc: Helge Deller
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: "David S. Miller"
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: "H. Peter Anvin"
    Cc: Greg Ungerer
    Cc: Heiko Carstens
    Cc: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Fabian Frederick
     
  • CONFIG_CROSS_MEMORY_ATTACH adds a couple of syscalls, process_vm_readv and
    process_vm_writev, which provide a kind of IPC for copying data between
    processes. Currently this option is placed inside "Processor type and
    features".

    This patch moves it into "General setup" (where all other arch-independent
    syscalls and IPC features are placed) and makes the prompt string less
    cryptic.

    Signed-off-by: Konstantin Khlebnikov
    Cc: Christopher Yeoh
    Cc: Davidlohr Bueso
    Cc: Hugh Dickins
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Konstantin Khlebnikov
     
  • Remove start_kernel()->mm_init_owner(&init_mm, &init_task).

    This doesn't really hurt, but it is unnecessary and misleading. init_task is the
    "swapper" thread == current, its ->mm is always NULL. And init_mm can
    only be used as ->active_mm, not as ->mm.

    mm_init_owner() has a single caller with this patch, perhaps it should
    die. mm_init() can initialize ->owner under #ifdef.

    Signed-off-by: Oleg Nesterov
    Reviewed-by: Michal Hocko
    Cc: Balbir Singh
    Cc: Johannes Weiner
    Cc: KAMEZAWA Hiroyuki
    Cc: Michal Hocko
    Cc: Peter Chiang
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • CONFIG_MM_OWNER makes no sense. It is not user-selectable, it is only
    selected by CONFIG_MEMCG automatically. So we can kill this option in
    init/Kconfig and do s/CONFIG_MM_OWNER/CONFIG_MEMCG/ globally.

    Signed-off-by: Oleg Nesterov
    Acked-by: Michal Hocko
    Acked-by: Johannes Weiner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • Kmemcg is currently under development and lacks some important features.
    In particular, it does not have support of kmem reclaim on memory pressure
    inside cgroup, which practically makes it unusable in real life. Let's
    warn about it in both Kconfig and Documentation to prevent complaints
    arising.

    Signed-off-by: Vladimir Davydov
    Acked-by: Michal Hocko
    Cc: Johannes Weiner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vladimir Davydov
     

22 May, 2014

1 commit


06 May, 2014

1 commit


05 May, 2014

1 commit

  • Make espfix64 a hidden Kconfig option. This fixes the x86-64 UML
    build which had broken due to the non-existence of init_espfix_bsp()
    in UML: since UML uses its own Kconfig, this option does not appear in
    the UML build.

    This also makes it possible to make support for 16-bit segments a
    configuration option, for the people who want to minimize the size of
    the kernel.

    Reported-by: Ingo Molnar
    Signed-off-by: H. Peter Anvin
    Cc: Richard Weinberger
    Link: http://lkml.kernel.org/r/1398816946-3351-1-git-send-email-hpa@linux.intel.com

    H. Peter Anvin
     

01 May, 2014

1 commit

  • The IRET instruction, when returning to a 16-bit segment, only
    restores the bottom 16 bits of the user space stack pointer. This
    causes some 16-bit software to break, but it also leaks kernel state
    to user space. We have a software workaround for that ("espfix") for
    the 32-bit kernel, but it relies on a nonzero stack segment base which
    is not available in 64-bit mode.

    In checkin:

    b3b42ac2cbae x86-64, modify_ldt: Ban 16-bit segments on 64-bit kernels

    we "solved" this by forbidding 16-bit segments on 64-bit kernels, with
    the logic that 16-bit support is crippled on 64-bit kernels anyway (no
    V86 support), but it turns out that people are doing stuff like
    running old Win16 binaries under Wine and expect it to work.

    This works around this by creating percpu "ministacks", each of which
    is mapped 2^16 times 64K apart. When we detect that the return SS is
    on the LDT, we copy the IRET frame to the ministack and use the
    relevant alias to return to userspace. The ministacks are mapped
    readonly, so if IRET faults we promote #GP to #DF which is an IST
    vector and thus has its own stack; we then do the fixup in the #DF
    handler.

    (Making #GP an IST exception would make the msr_safe functions unsafe
    in NMI/MC context, and quite possibly have other effects.)

    Special thanks to:

    - Andy Lutomirski, for the suggestion of using very small stack slots
    and copy (as opposed to map) the IRET frame there, and for the
    suggestion to mark them readonly and let the fault promote to #DF.
    - Konrad Wilk for paravirt fixup and testing.
    - Borislav Petkov for testing help and useful comments.

    Reported-by: Brian Gerst
    Signed-off-by: H. Peter Anvin
    Link: http://lkml.kernel.org/r/1398816946-3351-1-git-send-email-hpa@linux.intel.com
    Cc: Konrad Rzeszutek Wilk
    Cc: Borislav Petkov
    Cc: Andrew Lutomirski
    Cc: Linus Torvalds
    Cc: Dirk Hohndel
    Cc: Arjan van de Ven
    Cc: comex
    Cc: Alexander van Heukelum
    Cc: Boris Ostrovsky
    Cc: # consider after upstream merge

    H. Peter Anvin
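
The arithmetic behind the leak and the alias trick can be illustrated with a short model. This is a simplified Python sketch under stated assumptions, not the kernel's code: `ESPFIX_BASE`, the function names, and the sample addresses are hypothetical, and the model only tracks which bits of the stack pointer survive an IRET to a 16-bit stack segment.

```python
ALIAS_STRIDE = 1 << 16            # aliases of a ministack are 64K apart
ALIAS_COUNT = 1 << 16             # each ministack is mapped 2^16 times
ESPFIX_BASE = 0xFFFFFF0000000000  # hypothetical base, aligned to 4 GiB


def esp_after_iret16(kernel_rsp, user_esp):
    """ESP as seen by user space after IRET to a 16-bit stack segment.

    Only the bottom 16 bits are restored from the IRET frame; bits
    31:16 are whatever the kernel stack pointer held at that moment.
    """
    return (kernel_rsp & 0xFFFF0000) | (user_esp & 0xFFFF)


def espfix_alias(slot_offset, user_esp):
    """Pick the ministack alias whose bits 31:16 equal those of user_esp."""
    assert 0 <= slot_offset < ALIAS_STRIDE
    return ESPFIX_BASE + (user_esp & 0xFFFF0000) + slot_offset
```

Because the aliases cover every value of bits 31:16 (2^16 aliases times 64K is 4 GiB), there is always one alias from which the IRET leaves the upper half of ESP equal to what user space expects, so no kernel stack bits are exposed.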
     

28 Apr, 2014

1 commit

  • The kernel passes any args it doesn't need through to init, except it
    assumes anything containing '.' belongs to the kernel (for a module).
    This change means all users can clearly distinguish which arguments
    are for init.

    For example, the kernel uses debug ("dee-bug") to mean log everything to
    the console, where systemd uses the debug from the Scandinavian "day-boog"
    meaning "fail to boot". If a future version uses argv[] instead of
    reading /proc/cmdline, this confusion will be avoided.

    eg: test 'FOO="this is --foo"' -- 'systemd.debug="true true true"'

    Gives:
    argv[0] = '/debug-init'
    argv[1] = 'test'
    argv[2] = 'systemd.debug=true true true'
    envp[0] = 'HOME=/'
    envp[1] = 'TERM=linux'
    envp[2] = 'FOO=this is --foo'

    Signed-off-by: Rusty Russell

    Rusty Russell
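
The routing rule described in this commit message can be modelled as follows. This is a simplified Python sketch, not the kernel's parser: the names `split_args` and `route_args` are hypothetical, every argument is treated as unknown to the kernel, and all double quotes are simply dropped during tokenizing.

```python
def split_args(cmdline):
    """Split on whitespace outside double quotes, dropping the quotes."""
    args, cur, in_quote, started = [], [], False, False
    for ch in cmdline:
        if ch == '"':
            in_quote = not in_quote
            started = True
        elif ch.isspace() and not in_quote:
            if started:
                args.append("".join(cur))
            cur, started = [], False
        else:
            cur.append(ch)
            started = True
    if started:
        args.append("".join(cur))
    return args


def route_args(cmdline, init="/sbin/init"):
    """Return (argv, envp) for init, modelling the '--' separator."""
    argv = [init]
    envp = ["HOME=/", "TERM=linux"]
    to_init = False
    for arg in split_args(cmdline):
        if to_init:
            argv.append(arg)        # after "--": handed straight to init
        elif arg == "--":
            to_init = True
        elif "." in arg.split("=", 1)[0]:
            continue                # assumed to be a module parameter
        elif "=" in arg:
            envp.append(arg)        # NAME=value becomes environment
        else:
            argv.append(arg)        # bare word becomes an init argument
    return argv, envp
```

Calling `route_args('test FOO="this is --foo" -- systemd.debug="true true true"', init="/debug-init")` reproduces the argv and envp listed in the commit message; without the `--`, the `systemd.debug` argument would be dropped by the dotted-name rule.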
     

19 Apr, 2014

1 commit


13 Apr, 2014

1 commit

  • Pull audit updates from Eric Paris.

    * git://git.infradead.org/users/eparis/audit: (28 commits)
    AUDIT: make audit_is_compat depend on CONFIG_AUDIT_COMPAT_GENERIC
    audit: renumber AUDIT_FEATURE_CHANGE into the 1300 range
    audit: do not cast audit_rule_data pointers pointlesly
    AUDIT: Allow login in non-init namespaces
    audit: define audit_is_compat in kernel internal header
    kernel: Use RCU_INIT_POINTER(x, NULL) in audit.c
    sched: declare pid_alive as inline
    audit: use uapi/linux/audit.h for AUDIT_ARCH declarations
    syscall_get_arch: remove useless function arguments
    audit: remove stray newline from audit_log_execve_info() audit_panic() call
    audit: remove stray newlines from audit_log_lost messages
    audit: include subject in login records
    audit: remove superfluous new- prefix in AUDIT_LOGIN messages
    audit: allow user processes to log from another PID namespace
    audit: anchor all pid references in the initial pid namespace
    audit: convert PPIDs to the inital PID namespace.
    pid: get pid_t ppid of task in init_pid_ns
    audit: rename the misleading audit_get_context() to audit_take_context()
    audit: Add generic compat syscall support
    audit: Add CONFIG_HAVE_ARCH_AUDITSYSCALL
    ...

    Linus Torvalds
     

08 Apr, 2014

2 commits

  • This can greatly aid in narrowing down the real source of initramfs
    problems such as failures related to the compression of the in-kernel
    initramfs when an external initramfs is in use as well. Existing errors
    are ambiguous as to which initramfs is a problem and why.

    [akpm@linux-foundation.org: use pr_debug()]
    Signed-off-by: Daniel M. Weeks
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Daniel M. Weeks
     
  • "make allnoconfig" exists to ease testing of minimal configurations.
    Documentation/SubmitChecklist includes a note to test with allnoconfig.
    This helps catch missing dependencies on common-but-not-required
    functionality, which might otherwise go unnoticed.

    However, allnoconfig still leaves many symbols enabled, because they're
    hidden behind CONFIG_EMBEDDED or CONFIG_EXPERT. For instance, allnoconfig
    still has CONFIG_PRINTK and CONFIG_BLOCK enabled, so drivers don't
    typically get build-tested with those disabled.

    To address this, introduce a new Kconfig option "allnoconfig_y", used on
    symbols which only exist to hide other symbols. Set it on CONFIG_EMBEDDED
    (which then selects CONFIG_EXPERT). allnoconfig will then disable all the
    symbols hidden behind those.

    Signed-off-by: Josh Triplett
    Tested-by: Paul E. McKenney
    Cc: Michal Marek
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Josh Triplett
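
The effect of the new option can be sketched with a toy model of allnoconfig. This is an illustrative Python sketch under stated assumptions, not how Kconfig is implemented: the dictionary layout and the keys `visible_if`, `default`, `select`, and `allnoconfig_y` are hypothetical stand-ins, and only one level of `select` is followed.

```python
def allnoconfig(symbols):
    """Toy allnoconfig: answer 'n' to every visible prompt.

    A symbol whose prompt is hidden keeps its default value; a symbol
    flagged allnoconfig_y is forced to 'y' together with its selects.
    """
    config = {}
    # Pass 1: force the flagged symbols (and what they select) to y.
    for name, sym in symbols.items():
        if sym.get("allnoconfig_y"):
            config[name] = "y"
            for selected in sym.get("select", []):
                config[selected] = "y"
    # Pass 2: visible prompts get n, hidden symbols keep their default.
    for name, sym in symbols.items():
        if name in config:
            continue
        gate = sym.get("visible_if")
        visible = gate is None or config.get(gate) == "y"
        config[name] = "n" if visible else sym.get("default", "n")
    return config


# Example symbols: PRINTK and BLOCK only show a prompt when EXPERT is y.
before = {
    "EMBEDDED": {"select": ["EXPERT"]},
    "EXPERT": {},
    "PRINTK": {"visible_if": "EXPERT", "default": "y"},
    "BLOCK": {"visible_if": "EXPERT", "default": "y"},
}
after = dict(before, EMBEDDED={"select": ["EXPERT"], "allnoconfig_y": True})
```

In this model `allnoconfig(before)` leaves PRINTK and BLOCK at `y` because their prompts are hidden, while `allnoconfig(after)` turns both off.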
     

04 Apr, 2014

5 commits

  • Merge first patch-bomb from Andrew Morton:
    - Various misc bits
    - kmemleak fixes
    - small befs, codafs, cifs, efs, freexxfs, hfsplus, minixfs, reiserfs things
    - fanotify
    - I appear to have become SuperH maintainer
    - ocfs2 updates
    - direct-io tweaks
    - a bit of the MM queue
    - printk updates
    - MAINTAINERS maintenance
    - some backlight things
    - lib/ updates
    - checkpatch updates
    - the rtc queue
    - nilfs2 updates
    - Small Documentation/ updates

    * emailed patches from Andrew Morton : (237 commits)
    Documentation/SubmittingPatches: remove references to patch-scripts
    Documentation/SubmittingPatches: update some dead URLs
    Documentation/filesystems/ntfs.txt: remove changelog reference
    Documentation/kmemleak.txt: updates
    fs/reiserfs/super.c: add __init to init_inodecache
    fs/reiserfs: move prototype declaration to header file
    fs/hfsplus/attributes.c: add __init to hfsplus_create_attr_tree_cache()
    fs/hfsplus/extents.c: fix concurrent acess of alloc_blocks
    fs/hfsplus/extents.c: remove unused variable in hfsplus_get_block
    nilfs2: update project's web site in nilfs2.txt
    nilfs2: update MAINTAINERS file entries fix
    nilfs2: verify metadata sizes read from disk
    nilfs2: add FITRIM ioctl support for nilfs2
    nilfs2: add nilfs_sufile_trim_fs to trim clean segs
    nilfs2: implementation of NILFS_IOCTL_SET_SUINFO ioctl
    nilfs2: add nilfs_sufile_set_suinfo to update segment usage
    nilfs2: add struct nilfs_suinfo_update and flags
    nilfs2: update MAINTAINERS file entries
    fs/coda/inode.c: add __init to init_inodecache()
    BEFS: logging cleanup
    ...

    Linus Torvalds
     
  • Signed-off-by: chishanmingshen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    chishanmingshen
     
  • uselib hasn't been used since libc5; glibc does not use it. Support
    turning it off.

    When disabled, also omit the load_elf_library implementation from
    binfmt_elf.c, which only uselib invokes.

    bloat-o-meter:
    add/remove: 0/4 grow/shrink: 0/1 up/down: 0/-785 (-785)
    function                 old     new   delta
    padzero                   39      36      -3
    uselib_flags              20       -     -20
    sys_uselib               168       -    -168
    SyS_uselib               168       -    -168
    load_elf_library         426       -    -426

    The new CONFIG_USELIB defaults to `y'.

    Signed-off-by: Josh Triplett
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Josh Triplett
     
  • sys_sysfs is an obsolete system call no longer supported by libc.

    - This patch adds a default CONFIG_SYSFS_SYSCALL=y

    - Option can be turned off in expert mode.

    - cond_syscall added to kernel/sys_ni.c

    [akpm@linux-foundation.org: tweak Kconfig help text]
    Signed-off-by: Fabian Frederick
    Cc: Randy Dunlap
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Fabian Frederick
     
  • Pull cgroup updates from Tejun Heo:
    "A lot of updates for cgroup:

    - The biggest one is cgroup's conversion to kernfs. cgroup took
    after the long abandoned vfs-entangled sysfs implementation and
    made it even more convoluted over time. cgroup's internal objects
    were fused with vfs objects which also brought in vfs locking and
    object lifetime rules. Naturally, there are places where vfs rules
    don't fit and nasty hacks, such as credential switching or lock
    dance interleaving inode mutex and cgroup_mutex with object serial
    number comparison thrown in to decide whether the operation is
    actually necessary, needed to be employed.

    After conversion to kernfs, internal object lifetime and locking
    rules are mostly isolated from vfs interactions allowing shedding
    of several nasty hacks and overall simplification. This will also
    allow implementation of operations which may affect multiple cgroups
    which weren't possible before as it would have required nesting
    i_mutexes.

    - Various simplifications including dropping of module support,
    easier cgroup name/path handling, simplified cgroup file type
    handling and task_cg_lists optimization.

    - Preparatory changes for the planned unified hierarchy, which is still
    a patchset away from being actually operational. The dummy
    hierarchy is updated to serve as the default unified hierarchy.
    Controllers which aren't claimed by other hierarchies are
    associated with it, which BTW was what the dummy hierarchy was for
    anyway.

    - Various fixes from Li and others. This pull request includes some
    patches to add missing slab.h to various subsystems. This was
    triggered by the xattr.h include removal from cgroup.h. cgroup.h
    indirectly got included in a lot of files, which brought in xattr.h,
    which brought in slab.h.

    There are several merge commits - one to pull in kernfs updates
    necessary for converting cgroup (already in upstream through
    driver-core), others for interfering changes in the fixes branch"

    * 'for-3.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (74 commits)
    cgroup: remove useless argument from cgroup_exit()
    cgroup: fix spurious lockdep warning in cgroup_exit()
    cgroup: Use RCU_INIT_POINTER(x, NULL) in cgroup.c
    cgroup: break kernfs active_ref protection in cgroup directory operations
    cgroup: fix cgroup_taskset walking order
    cgroup: implement CFTYPE_ONLY_ON_DFL
    cgroup: make cgrp_dfl_root mountable
    cgroup: drop const from @buffer of cftype->write_string()
    cgroup: rename cgroup_dummy_root and related names
    cgroup: move ->subsys_mask from cgroupfs_root to cgroup
    cgroup: treat cgroup_dummy_root as an equivalent hierarchy during rebinding
    cgroup: remove NULL checks from [pr_cont_]cgroup_{name|path}()
    cgroup: use cgroup_setup_root() to initialize cgroup_dummy_root
    cgroup: reorganize cgroup bootstrapping
    cgroup: relocate setting of CGRP_DEAD
    cpuset: use rcu_read_lock() to protect task_cs()
    cgroup_freezer: document freezer_fork() subtleties
    cgroup: update cgroup_transfer_tasks() to either succeed or fail
    cgroup: drop task_lock() protection around task->cgroups
    cgroup: update how a newly forked task gets associated with css_set
    ...

    Linus Torvalds
     

01 Apr, 2014

1 commit

  • Pull core locking updates from Ingo Molnar:
    "The biggest change is the MCS spinlock generalization changes from Tim
    Chen, Peter Zijlstra, Jason Low et al. There's also lockdep
    fixes/enhancements from Oleg Nesterov, in particular a false negative
    fix related to lockdep_set_novalidate_class() usage"

    * 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (22 commits)
    locking/mutex: Fix debug checks
    locking/mutexes: Add extra reschedule point
    locking/mutexes: Introduce cancelable MCS lock for adaptive spinning
    locking/mutexes: Unlock the mutex without the wait_lock
    locking/mutexes: Modify the way optimistic spinners are queued
    locking/mutexes: Return false if task need_resched() in mutex_can_spin_on_owner()
    locking: Move mcs_spinlock.h into kernel/locking/
    m68k: Skip futex_atomic_cmpxchg_inatomic() test
    futex: Allow architectures to skip futex_atomic_cmpxchg_inatomic() test
    Revert "sched/wait: Suppress Sparse 'variable shadowing' warning"
    lockdep: Change lockdep_set_novalidate_class() to use _and_name
    lockdep: Change mark_held_locks() to check hlock->check instead of lockdep_no_validate
    lockdep: Don't create the wrong dependency on hlock->check == 0
    lockdep: Make held_lock->check and "int check" argument bool
    locking/mcs: Allow architecture specific asm files to be used for contended case
    locking/mcs: Order the header files in Kbuild of each architecture in alphabetical order
    sched/wait: Suppress Sparse 'variable shadowing' warning
    hung_task/Documentation: Fix hung_task_warnings description
    locking/mcs: Allow architectures to hook in to contended paths
    locking/mcs: Micro-optimize the MCS code, add extra comments
    ...

    Linus Torvalds
     

20 Mar, 2014

2 commits

  • Currently AUDITSYSCALL has a long list of architecture dependencies:
    depends on AUDIT && (X86 || PARISC || PPC || S390 || IA64 || UML ||
    SPARC64 || SUPERH || (ARM && AEABI && !OABI_COMPAT) || ALPHA)
    The purpose of this patch is to replace it with HAVE_ARCH_AUDITSYSCALL
    for simplicity.

    Signed-off-by: AKASHI Takahiro
    Acked-by: Will Deacon (arm)
    Acked-by: Richard Guy Briggs (audit)
    Acked-by: Matt Turner (alpha)
    Acked-by: Michael Ellerman (powerpc)
    Signed-off-by: Eric Paris

    AKASHI Takahiro
     
  • Signed-off-by: Zhenglong.cai
    Signed-off-by: Matt Turner

    蔡正龙
     

13 Mar, 2014

1 commit

  • Commit 73f7d1ca3263 (ACPI / init: Run acpi_early_init() before
    timekeeping_init()) optimistically moved the early ACPI initialization
    before timekeeping_init(), but that didn't work, because it broke fast
    TSC calibration for Julian Wollrath on Thinkpad x121e (and most likely
    for others too). The reason is that acpi_early_init() enables the SCI
    and that interferes with the fast TSC calibration mechanism.

    Thus follow the original idea to execute acpi_early_init() before
    efi_enter_virtual_mode() to help the EFI people for now and we can
    revisit the other problem that commit 73f7d1ca3263 attempted to
    address in the future (if really necessary).

    Fixes: 73f7d1ca3263 (ACPI / init: Run acpi_early_init() before timekeeping_init())
    Reported-by: Julian Wollrath
    Reviewed-by: Thomas Gleixner
    Signed-off-by: Rafael J. Wysocki

    Rafael J. Wysocki
     

03 Mar, 2014

1 commit

  • If an architecture has futex_atomic_cmpxchg_inatomic() implemented and
    no runtime check is necessary, allow it to skip the test within futex_init().

    This gets rid of some code which would always give the same result,
    and also allows the compiler to optimize a couple of if statements away.

    Signed-off-by: Heiko Carstens
    Cc: Finn Thain
    Cc: Geert Uytterhoeven
    Link: http://lkml.kernel.org/r/20140302120947.GA3641@osiris
    Signed-off-by: Thomas Gleixner

    Heiko Carstens
     

12 Feb, 2014

1 commit

  • cgroup filesystem code was derived from the original sysfs
    implementation which was heavily intertwined with vfs objects and
    locking with the goal of re-using the existing vfs infrastructure.
    That experiment turned out rather disastrous and sysfs switched, a
    long time ago, to a distributed filesystem model where a separate
    representation is maintained which is queried by vfs. Unfortunately,
    cgroup stuck with the failed experiment all these years and
    accumulated even more problems over time.

    Locking and object lifetime management being entangled with vfs is
    probably the most egregious. vfs was never designed to be misused like
    this and cgroup ends up going through various convoluted dances to
    make things work. Even then, operations across multiple cgroups can't
    be done safely as they would deadlock with rename locking.

    Recently, kernfs was separated out from sysfs so that it can be used by
    users other than sysfs. This patch converts cgroup to use kernfs,
    which will bring the following benefits.

    * Separation from vfs internals. Locking and object lifetime
    management is contained in cgroup proper making things a lot
    simpler. This removes a significant amount of locking convolutions,
    hairy object lifetime rules and the restriction on multi-cgroup
    operations.

    * Can drop a lot of code to implement filesystem interface as most are
    provided by kernfs.

    * Proper "severing" semantics, which allows controllers to not worry
    about lingering file accesses after offline.

    While the preceding patches did as much as possible to make the
    transition less painful, a large part of the conversion has to be one
    discrete step making this patch rather large. The rest of the commit
    message lists notable changes in different areas.

    Overall
    -------

    * vfs constructs replaced with kernfs ones. cgroup->dentry w/ ->kn,
    cgroupfs_root->sb w/ ->kf_root.

    * All dentry accessors are removed. Helpers to map from kernfs
    constructs are added.

    * All vfs plumbing around dentry, inode and bdi removed.

    * cgroup_mount() now directly looks for matching root and then
    proceeds to create a new one if not found.

    Synchronization and object lifetime
    -----------------------------------

    * vfs inode locking removed. Among other things, this removes the
    need for the convolution in cgroup_cfts_commit(). Future patches
    will further simplify it.

    * vfs refcnting replaced with cgroup internal ones. cgroup->refcnt,
    cgroupfs_root->refcnt added. cgroup_put_root() now directly puts
    root->refcnt and when it reaches zero proceeds to destroy it thus
    merging cgroup_put_root() and the former cgroup_kill_sb().
    Similarly, cgroup_put() now directly schedules cgroup_free_rcu()
    when refcnt reaches zero.

    * Unlike before, kernfs objects don't hold onto cgroup objects. When
    cgroup destroys a kernfs node, all existing operations are drained
    and the association is broken immediately. The same for
    cgroupfs_roots and mounts.

    * All operations which come through kernfs guarantee that the
    associated cgroup is and stays valid for the duration of operation;
    however, there are two paths which need to find out the associated
    cgroup from dentry without going through kernfs -
    css_tryget_from_dir() and cgroupstats_build(). For these two,
    kernfs_node->priv is RCU managed so that they can dereference it
    under RCU read lock.

    File and directory handling
    ---------------------------

    * File and directory operations converted to kernfs_ops and
    kernfs_syscall_ops.

    * xattrs is implicitly supported by kernfs. No need to worry about it
    from cgroup. This means that "xattr" mount option is no longer
    necessary. A future patch will add a deprecated warning message
    when sane_behavior.

    * When cftype->max_write_len > PAGE_SIZE, it's necessary to make a
    private copy of one of the kernfs_ops to set its atomic_write_len.
    cftype->kf_ops is added and cgroup_init/exit_cftypes() are updated
    to handle it.

    * cftype->lockdep_key added so that kernfs lockdep annotation can be
    per cftype.

    * Individual file entries and open states are now managed by kernfs.
    No need to worry about them from cgroup. cfent, cgroup_open_file
    and their friends are removed.

    * kernfs_nodes are created deactivated and kernfs_activate()
    invocations added to places where creation of new nodes are
    committed.

    * cgroup_rmdir() uses kernfs_[un]break_active_protection() for
    self-removal.

    v2: - Li pointed out in an earlier patch that specifying "name="
    during mount without subsystem specification should succeed if
    there's an existing hierarchy with a matching name although it
    should fail with -EINVAL if a new hierarchy should be created.
    Prior to the conversion, this used to be handled by deferring
    failure from NULL return from cgroup_root_from_opts(), which was
    necessary because root was being created before checking for
    existing ones. Note that cgroup_root_from_opts() returned an
    ERR_PTR() value for error conditions which require immediate
    mount failure.

    As we now have separate search and creation steps, deferring
    failure from cgroup_root_from_opts() is no longer necessary.
    cgroup_root_from_opts() is updated to always return ERR_PTR()
    value on failure.

    - The logic to match existing roots is updated so that a mount
    attempt with a matching name but a different subsys_mask is
    rejected. This was handled by a separate matching loop under
    the comment "Check for name clashes with existing mounts" but
    got lost during conversion. Merge the check into the main
    search loop.

    - Add __rcu __force casting in RCU_INIT_POINTER() in
    cgroup_destroy_locked() to avoid the sparse address space
    warning reported by kbuild test bot. Maybe we want an explicit
    interface to use kn->priv as RCU protected pointer?

    v3: Make CONFIG_CGROUPS select CONFIG_KERNFS.

    v4: Rebased on top of 0ab02ca8f887 ("cgroup: protect modifications to
    cgroup_idr with cgroup_mutex").

    Signed-off-by: Tejun Heo
    Acked-by: Li Zefan
    Cc: kbuild test robot <fengguang.wu@intel.com>

    Tejun Heo
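
The separate search and creation steps described in the v2 notes can be sketched as below. This is a minimal Python model under stated assumptions, not the kernel's `cgroup_mount()`: the function name, the dictionary representation of a root, and the `MountError` class are hypothetical, and the use of `EBUSY` for the name-clash case is an assumption (the message only says such mounts are rejected).

```python
import errno


class MountError(Exception):
    """Hypothetical error type carrying an errno-style code."""

    def __init__(self, code):
        super().__init__(errno.errorcode[code])
        self.code = code


def find_or_create_root(roots, name=None, subsys_mask=0):
    """Search existing roots first; create a new one only if none match."""
    for root in roots:
        name_match = False
        if name is not None:
            if root["name"] != name:
                continue
            name_match = True
        if subsys_mask and subsys_mask != root["subsys_mask"]:
            if not name_match:
                continue
            # Same name but different subsystems: reject the mount
            # (the errno used here is an assumption).
            raise MountError(errno.EBUSY)
        return root  # reuse the existing hierarchy
    # Nothing matched: a new hierarchy needs a subsystem specification.
    if not subsys_mask:
        raise MountError(errno.EINVAL)
    root = {"name": name, "subsys_mask": subsys_mask}
    roots.append(root)
    return root
```

Because the search runs to completion before any creation is attempted, a `name=`-only mount succeeds against an existing hierarchy and fails with `EINVAL` only when a new hierarchy would have to be created.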
     

06 Feb, 2014

1 commit

  • This changes 'do_execve()' to get the executable name as a 'struct
    filename', and to free it when it is done. This is what the normal
    users want, and it simplifies and streamlines their error handling.

    The controlled lifetime of the executable name also fixes a
    use-after-free problem with the trace_sched_process_exec tracepoint: the
    lifetime of the passed-in string for kernel users was not at all
    obvious, and the user-mode helper code used UMH_WAIT_EXEC to serialize
    the pathname allocation lifetime with the execve() having finished,
    which in turn meant that the trace point that happened after
    mm_release() of the old process VM ended up using already free'd memory.

    To solve the kernel string lifetime issue, this simply introduces
    "getname_kernel()" that works like the normal user-space getname()
    function, except with the source coming from kernel memory.

    As Oleg points out, this also means that we could drop the tcomm[] array
    from 'struct linux_binprm', since the pathname lifetime now covers
    setup_new_exec(). That would be a separate cleanup.

    Reported-by: Igor Zhbanov
    Tested-by: Steven Rostedt
    Cc: Oleg Nesterov
    Cc: Al Viro
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

01 Feb, 2014

1 commit


29 Jan, 2014

1 commit

  • Pull vfs updates from Al Viro:
    "Assorted stuff; the biggest pile here is Christoph's ACL series. Plus
    assorted cleanups and fixes all over the place...

    There will be another pile later this week"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (43 commits)
    __dentry_path() fixes
    vfs: Remove second variable named error in __dentry_path
    vfs: Is mounted should be testing mnt_ns for NULL or error.
    Fix race when checking i_size on direct i/o read
    hfsplus: remove can_set_xattr
    nfsd: use get_acl and ->set_acl
    fs: remove generic_acl
    nfs: use generic posix ACL infrastructure for v3 Posix ACLs
    gfs2: use generic posix ACL infrastructure
    jfs: use generic posix ACL infrastructure
    xfs: use generic posix ACL infrastructure
    reiserfs: use generic posix ACL infrastructure
    ocfs2: use generic posix ACL infrastructure
    jffs2: use generic posix ACL infrastructure
    hfsplus: use generic posix ACL infrastructure
    f2fs: use generic posix ACL infrastructure
    ext2/3/4: use generic posix ACL infrastructure
    btrfs: use generic posix ACL infrastructure
    fs: make posix_acl_create more useful
    fs: make posix_acl_chmod more useful
    ...

    Linus Torvalds
     

28 Jan, 2014

1 commit