25 Mar, 2012

1 commit

  • Pull cleanup of fs/ and lib/ users of module.h from Paul Gortmaker:
    "Fix up files in fs/ and lib/ dirs to only use module.h if they really
    need it.

    These are trivial in scope vs the work done previously. We now have
    things where any few remaining cleanups can be farmed out to arch or
    subsystem maintainers, and I have done so when possible. What is
    remaining here represents the bits that don't clearly lie within a
    single arch/subsystem boundary, like the fs dir and the lib dir.

    Some duplicate includes arising from overlapping fixes from
    independent subsystem maintainer submissions are also quashed."

    Fix up trivial conflicts due to clashes with other include file cleanups
    (including some due to the previous bug.h cleanup pull).

    * tag 'module-for-3.4' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux:
    lib: reduce the use of module.h wherever possible
    fs: reduce the use of module.h wherever possible
    includecheck: delete any duplicate instances of module.h

    Linus Torvalds
     

24 Mar, 2012

20 commits

  • Pull sysctl updates from Eric Biederman:

    - Rewrite of sysctl for speed and clarity.

    Insert/remove/Lookup in sysctl are all now O(log N) operations, and
    are no longer bottlenecks in the process of adding and removing
    network devices.

    sysctl is now focused on being a filesystem instead of system call
    and the code can all be found in fs/proc/proc_sysctl.c. Hopefully
    this means the code is now approachable.

    Much thanks is owed to Lucian Grinjincu for keeping at this until
    something was found that was usable.

    - The recent proc_sys_poll oops found by the fuzzer during hibernation
    is fixed.

    * git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/sysctl: (36 commits)
    sysctl: protect poll() in entries that may go away
    sysctl: Don't call sysctl_follow_link unless we are a link.
    sysctl: Comments to make the code clearer.
    sysctl: Correct error return from get_subdir
    sysctl: An easier to read version of find_subdir
    sysctl: fix memset parameters in setup_sysctl_set()
    sysctl: remove an unused variable
    sysctl: Add register_sysctl for normal sysctl users
    sysctl: Index sysctl directories with rbtrees.
    sysctl: Make the header lists per directory.
    sysctl: Move sysctl_check_dups into insert_header
    sysctl: Modify __register_sysctl_paths to take a set instead of a root and an nsproxy
    sysctl: Replace root_list with links between sysctl_table_sets.
    sysctl: Add sysctl_print_dir and use it in get_subdir
    sysctl: Stop requiring explicit management of sysctl directories
    sysctl: Add a root pointer to ctl_table_set
    sysctl: Rewrite proc_sys_readdir in terms of first_entry and next_entry
    sysctl: Rewrite proc_sys_lookup introducing find_entry and lookup_entry.
    sysctl: Normalize the root_table data structure.
    sysctl: Factor out insert_header and erase_header
    ...

    Linus Torvalds
     
  • As Tetsuo Handa pointed out, request_module() can stress the system
    while the oom-killed caller sleeps in TASK_UNINTERRUPTIBLE.

    The task T uses "almost all" memory, then it does something which
    triggers request_module(). Say, it can simply call sys_socket(). This
    in turn needs more memory and leads to OOM. oom-killer correctly
    chooses T and kills it, but this can't help because it sleeps in
    TASK_UNINTERRUPTIBLE and after that oom-killer becomes "disabled" by the
    TIF_MEMDIE task T.

    Make __request_module() killable. The only necessary change is that
    call_modprobe() should kmalloc argv and module_name, they can't live in
    the stack if we use UMH_KILLABLE. This memory is freed via
    call_usermodehelper_freeinfo()->cleanup.
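
    The reason argv and module_name cannot stay on the stack can be
    illustrated in userspace. This is a minimal sketch, not the kernel
    code: struct helper_info, build_modprobe_args(), free_modprobe_args()
    and the modprobe path are hypothetical names chosen for illustration,
    and malloc() stands in for kmalloc().

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the kernel's subprocess_info. */
struct helper_info {
    char **argv;                            /* heap-allocated argument vector */
    void (*cleanup)(struct helper_info *);  /* runs when the info is freed */
};

/* Frees what build_modprobe_args() allocated: argv[3] and argv itself. */
static void free_modprobe_args(struct helper_info *info)
{
    free(info->argv[3]);
    free(info->argv);
    info->argv = NULL;
}

/*
 * Builds "modprobe -q -- <module_name>" entirely on the heap, so the
 * arguments stay valid even if the caller is killed and returns before
 * the helper has finished with them. Returns 0 on success, -1 on failure.
 */
static int build_modprobe_args(struct helper_info *info, const char *module_name)
{
    size_t len = strlen(module_name) + 1;
    char **argv = malloc(5 * sizeof(*argv));
    char *name_copy = malloc(len);

    if (!argv || !name_copy) {
        free(argv);
        free(name_copy);
        return -1;
    }
    memcpy(name_copy, module_name, len);

    argv[0] = "/sbin/modprobe";  /* assumed path, for illustration only */
    argv[1] = "-q";
    argv[2] = "--";
    argv[3] = name_copy;
    argv[4] = NULL;

    info->argv = argv;
    info->cleanup = free_modprobe_args;
    return 0;
}
```

    Whoever drops the last reference to the info calls info->cleanup(),
    which is the role the message assigns to
    call_usermodehelper_freeinfo().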

    Reported-by: Tetsuo Handa
    Signed-off-by: Oleg Nesterov
    Cc: Rusty Russell
    Cc: Tejun Heo
    Cc: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • No functional changes. Move the call_usermodehelper code from
    __request_module() into the new simple helper, call_modprobe().

    Signed-off-by: Oleg Nesterov
    Cc: Tetsuo Handa
    Cc: Rusty Russell
    Cc: Tejun Heo
    Cc: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • Minor cleanup. ____call_usermodehelper() can simply return, no need to
    call do_exit() explicitly.

    Signed-off-by: Oleg Nesterov
    Cc: Tetsuo Handa
    Cc: Rusty Russell
    Cc: Tejun Heo
    Cc: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • No functional changes. It is not sane to use UMH_KILLABLE with enum
    umh_wait, but obviously we do not want another argument in
    call_usermodehelper_* helpers. Kill this enum, use the plain int.

    Signed-off-by: Oleg Nesterov
    Cc: Tetsuo Handa
    Cc: Rusty Russell
    Cc: Tejun Heo
    Cc: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • Implement UMH_KILLABLE, should be used along with UMH_WAIT_EXEC/PROC.
    The caller must ensure that subprocess_info->path/etc can not go away
    until call_usermodehelper_freeinfo().

    call_usermodehelper_exec(UMH_KILLABLE) does
    wait_for_completion_killable. If it fails, it uses
    xchg(&sub_info->complete, NULL) to serialize with umh_complete() which
    does the same xchg() to access sub_info->complete.

    If call_usermodehelper_exec wins, it can safely return. umh_complete()
    should get NULL and call call_usermodehelper_freeinfo().

    Otherwise we know that umh_complete() was already called, in this case
    call_usermodehelper_exec() falls back to wait_for_completion() which
    should succeed "very soon".
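
    The handshake described above can be sketched in userspace with C11
    atomics. This is an illustration under assumptions, not the kernel
    implementation: struct sub_info, umh_complete_sketch() and
    caller_abandons_wait() are hypothetical names, atomic_exchange()
    stands in for xchg(), and a boolean flag marks the point where the
    kernel would free the info.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct completion and subprocess_info. */
struct completion { bool done; };

struct sub_info {
    _Atomic(struct completion *) complete;  /* NULL once one side claimed it */
    bool freed;                             /* set where the info would be freed */
};

/* Helper side: runs when the usermode helper finishes. */
static void umh_complete_sketch(struct sub_info *info)
{
    struct completion *comp = atomic_exchange(&info->complete, NULL);

    if (comp)
        comp->done = true;   /* the caller is still waiting: wake it */
    else
        info->freed = true;  /* the caller already left: we own the info */
}

/*
 * Caller side, after a killable wait was interrupted.
 * Returns true if the caller won the race and may return at once (the
 * helper will free the info later), false if the helper already
 * completed and the caller must fall back to an ordinary wait.
 */
static bool caller_abandons_wait(struct sub_info *info)
{
    return atomic_exchange(&info->complete, NULL) != NULL;
}
```

    Because both sides swap in NULL atomically, exactly one of them sees
    the original pointer, so the info is either completed or freed, never
    both.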

    Note: UMH_NO_WAIT == -1 but it obviously should not be used with
    UMH_KILLABLE. We delay the necessary cleanup to simplify the back
    porting.

    Signed-off-by: Oleg Nesterov
    Cc: Tetsuo Handa
    Cc: Rusty Russell
    Cc: Tejun Heo
    Cc: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • Preparation. Add the new trivial helper, umh_complete(). Currently it
    simply does complete(sub_info->complete).

    Signed-off-by: Oleg Nesterov
    Cc: Tetsuo Handa
    Cc: Rusty Russell
    Cc: Tejun Heo
    Cc: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • Change zap_pid_ns_processes() to use SEND_SIG_FORCED, it looks more
    clear compared to SEND_SIG_NOINFO which relies on from_ancestor_ns logic
    in send_signal().

    It is also more efficient if we need to kill a lot of tasks because it
    doesn't alloc sigqueue.

    While at it, add the __fatal_signal_pending(task) check as a minor
    optimization.

    Signed-off-by: Oleg Nesterov
    Cc: Tejun Heo
    Cc: Anton Vorontsov
    Cc: "Eric W. Biederman"
    Cc: KOSAKI Motohiro
    Cc: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • Cosmetic, rename the from_ancestor_ns argument in prepare_signal()
    paths. After the previous change it doesn't match the reality.

    Signed-off-by: Oleg Nesterov
    Cc: Tejun Heo
    Cc: Anton Vorontsov
    Cc: "Eric W. Biederman"
    Cc: KOSAKI Motohiro
    Cc: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • force_sig_info() and friends have the special semantics for synchronous
    signals, this interface should not be used if the target is not current.
    And it needs the fixes, in particular the clearing of SIGNAL_UNKILLABLE
    is not exactly right.

    However there are callers which have to use force_ exactly because it
    clears SIGNAL_UNKILLABLE and thus it can kill the CLONE_NEWPID tasks,
    although this is almost always wrong for various reasons.

    With this patch SEND_SIG_FORCED ignores SIGNAL_UNKILLABLE, like we do if
    the signal comes from the ancestor namespace.

    This makes the naming in prepare_signal() paths insane, fixed by the
    next cleanup.

    Note: this only affects SIGKILL/SIGSTOP, but this is enough for
    force_sig() abusers.

    Signed-off-by: Oleg Nesterov
    Cc: Tejun Heo
    Cc: Anton Vorontsov
    Cc: "Eric W. Biederman"
    Cc: KOSAKI Motohiro
    Cc: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • PTRACE_SEIZE code is tested and ready for production use, remove the
    code which requires special bit in data argument to make PTRACE_SEIZE
    work.

    Strace team prepares for a new release of strace, and we would like to
    ship the code which uses PTRACE_SEIZE, preferably after this change goes
    into released kernel.

    Signed-off-by: Denys Vlasenko
    Acked-by: Tejun Heo
    Acked-by: Oleg Nesterov
    Cc: Pedro Alves
    Cc: Jan Kratochvil
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Denys Vlasenko
     
  • This can be used to close a few corner cases in strace where we get
    unwanted racy behavior after attach, but before we have a chance to set
    options (the notorious post-execve SIGTRAP comes to mind), and removes
    the need to track "did we set opts for this task" state in strace
    internals.

    While we are at it:

    Make it possible to extend SEIZE in the future with more functionality
    by passing non-zero 'addr' parameter. To that end, error out if 'addr'
    is non-zero. PTRACE_ATTACH did not (and still does not) have such
    check, and users (strace) do pass garbage there... let's avoid
    repeating this mistake with SEIZE.

    Set all task->ptrace bits in one operation - before this change, we were
    adding PT_SEIZED and PT_PTRACE_CAP with task->ptrace |= BIT ops. This
    was probably ok (not a bug), but let's be on a safer side.

    Changes since v2: use (unsigned long) casts instead of (long) ones, move
    PTRACE_SEIZE_DEVEL-related code to separate lines of code.

    Signed-off-by: Denys Vlasenko
    Acked-by: Tejun Heo
    Cc: Pedro Alves
    Reviewed-by: Oleg Nesterov
    Cc: Jan Kratochvil
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Denys Vlasenko
     
  • Exchange PT_TRACESYSGOOD and PT_PTRACE_CAP bit positions, which makes
    PT_option bits contiguous and therefore makes code in
    ptrace_setoptions() much simpler.

    Every PTRACE_O_TRACEevent is defined to (1 << PTRACE_EVENT_event)
    instead of using explicit numeric constants, to ensure we don't mess up
    relationship between bit positions and event ids.

    PT_EVENT_FLAG_SHIFT was not particularly useful, PT_OPT_FLAG_SHIFT with
    value of PT_EVENT_FLAG_SHIFT-1 is easier to use.

    PT_TRACE_MASK constant is nuked, its only use is replaced by
    (PTRACE_O_MASK << PT_OPT_FLAG_SHIFT).
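
    Why contiguous option bits simplify the code can be shown with a
    small sketch. All names and numeric values below (EV_*, OPT_*,
    OPT_FLAG_SHIFT, store_options()) are assumptions made for
    illustration and are not copied from the kernel headers.

```c
/* Illustrative event ids; assumed values. */
enum { EV_FORK = 1, EV_VFORK = 2, EV_CLONE = 3, EV_EXEC = 4 };

/* User-visible option bits: bit 0 is TRACESYSGOOD, bit N is event N. */
#define OPT_TRACESYSGOOD 1u
#define OPT_TRACE(event) (1u << (event))
#define OPT_MASK         0x1fu  /* every option bit known to this sketch */

/* Internal flag word: option bits are stored contiguously from this shift. */
#define OPT_FLAG_SHIFT   3

/* Move validated option bits to their position in the internal flag word. */
static unsigned int options_to_flags(unsigned int options)
{
    return options << OPT_FLAG_SHIFT;
}

/* Replace all option bits in flags, leaving the unrelated low bits alone. */
static unsigned int store_options(unsigned int flags, unsigned int options)
{
    flags &= ~(OPT_MASK << OPT_FLAG_SHIFT);
    return flags | options_to_flags(options);
}
```

    With this layout a single shift replaces a chain of per-option
    tests, and deriving each option bit from its event id keeps the two
    numberings from drifting apart.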

    Signed-off-by: Denys Vlasenko
    Acked-by: Tejun Heo
    Reviewed-by: Oleg Nesterov
    Cc: Pedro Alves
    Cc: Jan Kratochvil
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Denys Vlasenko
     
  • On ptrace(PTRACE_SETOPTIONS, pid, 0, <opts>), we used to set those
    option bits which are known, and then fail with -EINVAL if there are
    some unknown bits in <opts>.

    This is inconsistent with typical error handling, which does not change
    any state if input is invalid.

    This patch changes PTRACE_SETOPTIONS behavior so that in this case, we
    return -EINVAL and don't change any bits in task->ptrace.

    It's very unlikely that there is userspace code in the wild which will
    be affected by this change: it should have the form

    ptrace(PTRACE_SETOPTIONS, pid, 0, PTRACE_O_BOGUSOPT)

    where PTRACE_O_BOGUSOPT is a constant unknown to the kernel. But kernel
    headers, naturally, don't contain any PTRACE_O_BOGUSOPTs, thus the only
    way userspace can use one is if it defines one itself. I can't see why
    anyone would do such a thing deliberately.
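
    The new behavior amounts to validating the whole argument before
    touching any state. A minimal sketch, assuming a hypothetical mask
    KNOWN_OPTS and a hypothetical function set_options() (neither is
    taken from the kernel source):

```c
#include <errno.h>

#define KNOWN_OPTS 0x7fu  /* hypothetical mask of supported option bits */

/*
 * Applies data to *opts only if every requested bit is known.
 * Returns 0 on success, or -EINVAL with *opts left unchanged.
 */
static int set_options(unsigned int *opts, unsigned long data)
{
    if (data & ~(unsigned long)KNOWN_OPTS)
        return -EINVAL;  /* reject before changing anything */

    *opts = (unsigned int)data;
    return 0;
}
```

    The old behavior corresponds to assigning the known bits first and
    checking for unknown ones afterwards, which left *opts modified even
    on failure.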

    Signed-off-by: Denys Vlasenko
    Acked-by: Tejun Heo
    Reviewed-by: Oleg Nesterov
    Cc: Pedro Alves
    Cc: Jan Kratochvil
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Denys Vlasenko
     
  • Revelation from Peter.

    Cc: Peter Zijlstra
    Cc: Don Zickus
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • It fixes some 80-col wordwrappings and adds some consistency.

    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • If the system is loaded while hotplugging a CPU we might end up with a
    bogus hardlockup detection. This has been seen during LTP pounder test
    executed in parallel with hotplug test.

    The main problem is that enable_watchdog (called when CPU is brought up)
    registers perf event which periodically checks per-cpu counter
    (hrtimer_interrupts), updated from a hrtimer callback, but the hrtimer
    is fired from the kernel thread.

    This means that while we already do check for the hard lockup the kernel
    thread might be sitting on the runqueue with zillions of tasks so there
    is nobody to update the value we rely on and so we KABOOM.

    Let's fix this by boosting the watchdog thread priority before we wake
    it up rather than when it's already running. This still doesn't handle
    a case where we have the same amount of high prio FIFO tasks but that
    doesn't seem to be common. The current implementation doesn't handle
    that case anyway so this is not worse at least.

    Unfortunately, we cannot start perf counter from the watchdog thread
    because we could miss a real lock up and also we cannot start the
    hrtimer from watchdog_enable because there is no way (at least I don't
    know any) to start a hrtimer from a different CPU.

    [dzickus@redhat.com: fix compile issue with param]
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Reviewed-by: Mandeep Singh Baines
    Signed-off-by: Michal Hocko
    Signed-off-by: Don Zickus
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
     
  • I just received another user's pleas for help when their init
    mysteriously died. I again explained that they need to check whether it
    died because of bad instruction, a segv, or something else. Which was
    an annoying detour into writing a trivial C program to spawn his init
    and print its exit code:

    http://lists.busybox.net/pipermail/busybox/2012-January/077172.html

    I hear you saying "just test it under /bin/sh". Well, the crashing init
    _was_ /bin/sh.

    Which prompted me to make kernel do this first step automatically. We can
    print exit code, which makes it possible to see that death was from e.g.
    SIGILL without writing test programs.
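
    Reading such a printed exit code can be sketched as follows,
    assuming the usual Linux wait-status layout (signal number in the low
    7 bits, core-dump flag in bit 7, exit status in the next byte). The
    struct and function names are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical decoded view of a raw exit code. */
struct exit_info {
    bool signaled;     /* true if the process was killed by a signal */
    int  signal;       /* signal number when signaled, else 0 */
    bool core_dumped;  /* true if the core-dump flag is set */
    int  status;       /* exit status when not signaled, else 0 */
};

/* Splits a raw exit code into its fields (assumed Linux layout). */
static struct exit_info decode_exit_code(unsigned int code)
{
    struct exit_info info = { false, 0, false, 0 };

    info.signal = (int)(code & 0x7f);
    info.signaled = info.signal != 0;
    info.core_dumped = (code & 0x80) != 0;
    if (!info.signaled)
        info.status = (int)((code >> 8) & 0xff);
    return info;
}
```

    Under that layout an exit code of 0x00000004 decodes to signal 4,
    which is SIGILL on Linux, and 0x0000008b to signal 11 (SIGSEGV) with
    a core dump.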

    [akpm@linux-foundation.org: add 0x to hex number output]
    Signed-off-by: Denys Vlasenko
    Acked-by: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Denys Vlasenko
     
  • Userspace service managers/supervisors need to track their started
    services. Many services daemonize by double-forking and get implicitly
    re-parented to PID 1. The service manager will no longer be able to
    receive the SIGCHLD signals for them, and is no longer in charge of
    reaping the children with wait(). All information about the children is
    lost at the moment PID 1 cleans up the re-parented processes.

    With this prctl, a service manager process can mark itself as a sort of
    'sub-init', able to stay as the parent for all orphaned processes
    created by the started services. All SIGCHLD signals will be delivered
    to the service manager.

    Receiving SIGCHLD and doing wait() is, for a service manager, much
    preferred over any possible asynchronous notification about specific
    PIDs, because the service manager has full access to the child process
    data in /proc and the PID cannot be re-used until the service manager
    itself has called wait().
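
    A minimal userspace sketch of the idea, assuming the prctl described
    here is the one exposed as PR_SET_CHILD_SUBREAPER in current kernel
    headers. The function name is hypothetical, and the fallback
    definition is only for older libc headers. It needs a kernel that
    carries this change.

```c
#include <errno.h>
#include <poll.h>
#include <sys/prctl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#ifndef PR_SET_CHILD_SUBREAPER
#define PR_SET_CHILD_SUBREAPER 36  /* assumed value, for older libc headers */
#endif

/*
 * Marks the calling process as a subreaper, starts a "service" that
 * daemonizes by double-forking, and reaps both the intermediate child
 * and the orphaned grandchild.
 * Returns the number of children reaped, or -1 on error.
 */
static int reap_double_forked_service(void)
{
    int reaped = 0;
    pid_t child;

    if (prctl(PR_SET_CHILD_SUBREAPER, 1, 0, 0, 0) != 0)
        return -1;

    child = fork();
    if (child < 0)
        return -1;
    if (child == 0) {
        /* Intermediate process: fork the "daemon" and exit at once. */
        pid_t daemon_pid = fork();
        if (daemon_pid == 0) {
            poll(NULL, 0, 100);  /* sleep about 100 ms as an orphan */
            _exit(7);
        }
        _exit(daemon_pid < 0 ? 1 : 0);
    }

    /* wait() fails with ECHILD once nothing is left to reap. */
    for (;;) {
        int status;
        pid_t pid = wait(&status);

        if (pid < 0) {
            if (errno == EINTR)
                continue;
            break;
        }
        reaped++;
    }
    return reaped;
}
```

    Without the prctl call the orphaned grandchild would be re-parented
    to PID 1 and the loop would reap only the intermediate child.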

    As a side effect, the relevant parent PID information does not get lost
    by a double-fork, which results in a more elaborate process tree and
    'ps' output:

    before:
    # ps afx
    253 ? Ss 0:00 /bin/dbus-daemon --system --nofork
    294 ? Sl 0:00 /usr/libexec/polkit-1/polkitd
    328 ? S 0:00 /usr/sbin/modem-manager
    608 ? Sl 0:00 /usr/libexec/colord
    658 ? Sl 0:00 /usr/libexec/upowerd
    819 ? Sl 0:00 /usr/libexec/imsettings-daemon
    916 ? Sl 0:00 /usr/libexec/udisks-daemon
    917 ? S 0:00 \_ udisks-daemon: not polling any devices

    after:
    # ps afx
    294 ? Ss 0:00 /bin/dbus-daemon --system --nofork
    426 ? Sl 0:00 \_ /usr/libexec/polkit-1/polkitd
    449 ? S 0:00 \_ /usr/sbin/modem-manager
    635 ? Sl 0:00 \_ /usr/libexec/colord
    705 ? Sl 0:00 \_ /usr/libexec/upowerd
    959 ? Sl 0:00 \_ /usr/libexec/udisks-daemon
    960 ? S 0:00 | \_ udisks-daemon: not polling any devices
    977 ? Sl 0:00 \_ /usr/libexec/packagekitd

    This prctl is orthogonal to PID namespaces. PID namespaces are isolated
    from each other, while a service management process usually requires the
    services to live in the same namespace, to be able to talk to each
    other.

    Users of this will be the systemd per-user instance, which provides
    init-like functionality for the user's login session and D-Bus, which
    activates bus services on-demand. Both need init-like capabilities to
    be able to properly keep track of the services they start.

    Many thanks to Oleg for several rounds of review and insights.

    [akpm@linux-foundation.org: fix comment layout and spelling]
    [akpm@linux-foundation.org: add lengthy code comment from Oleg]
    Reviewed-by: Oleg Nesterov
    Signed-off-by: Lennart Poettering
    Signed-off-by: Kay Sievers
    Acked-by: Valdis Kletnieks
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Lennart Poettering
     
  • Pull KGDB/KDB updates from Jason Wessel:
    "Fixes:
    - Fix KDB keyboard repeat scan codes and leaked keyboard events
    - Fix kernel crash with kdb_printf() for users who compile new
    kdb_printf()'s in early code
    - Return all segment registers to gdb on x86_64

    Features:
    - KDB/KGDB hook the reboot notifier and end user can control if it
    stops, detaches or does nothing (updated docs as well)
    - Notify users who use CONFIG_DEBUG_RODATA to use hw breakpoints"

    * tag 'for_linus-3.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/kgdb:
    kdb: Add message about CONFIG_DEBUG_RODATA on failure to install breakpoint
    kdb: Avoid using dbg_io_ops until it is initialized
    kgdb,debug_core: add the ability to control the reboot notifier
    KDB: Fix usability issues relating to the 'enter' key.
    kgdb,debug-core,gdbstub: Hook the reboot notifier for debugger detach
    kgdb: Respect that flush op is optional
    kgdb: x86: Return all segment registers also in 64-bit mode

    Linus Torvalds
     

23 Mar, 2012

8 commits

  • Pull input subsystem updates from Dmitry Torokhov:
    "- we finally merged driver for USB version of Synaptics touchpads
    (I guess most commonly found in IBM/Lenovo keyboard/touchpad combo);

    - a bunch of new drivers for embedded platforms (Cypress
    touchscreens, DA9052 OnKey, MAX8997-haptic, Ilitek ILI210x
    touchscreens, TI touchscreen);

    - input core allows clients to specify desired clock source for
    timestamps on input events (EVIOCSCLOCKID ioctl);

    - input core allows querying state of all MT slots for given event
    code via EVIOCGMTSLOTS ioctl;

    - various driver fixes and improvements."

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: (45 commits)
    Input: ili210x - add support for Ilitek ILI210x based touchscreens
    Input: altera_ps2 - use of_match_ptr()
    Input: synaptics_usb - switch to module_usb_driver()
    Input: convert I2C drivers to use module_i2c_driver()
    Input: convert SPI drivers to use module_spi_driver()
    Input: omap4-keypad - move platform_data to
    Input: kxtj9 - who_am_i check value and initial data rate fixes
    Input: add driver support for MAX8997-haptic
    Input: tegra-kbc - revise device tree support
    Input: of_keymap - add device tree bindings for simple key matrices
    Input: wacom - fix physical size calculation for 3rd-gen Bamboo
    Input: twl4030-vibra - really switch from #if to #ifdef
    Input: hp680_ts_input - ensure arguments to request_irq and free_irq are compatible
    Input: max8925_onkey - avoid accessing input device too early
    Input: max8925_onkey - allow to be used as a wakeup source
    Input: atmel-wm97xx - convert to dev_pm_ops
    Input: atmel-wm97xx - set driver owner
    Input: add cyttsp touchscreen maintainer entry
    Input: cyttsp - remove useless checks in cyttsp_probe()
    Input: usbtouchscreen - add support for Data Modul EasyTouch TP 72037
    ...

    Linus Torvalds
     
  • On x86, if CONFIG_DEBUG_RODATA is set, one cannot set breakpoints
    via KDB. Apparently this is a well-known problem, as at least one distribution
    now ships with both KDB enabled and CONFIG_DEBUG_RODATA=y for security reasons.

    This patch adds a printk message to the breakpoint failure case,
    in order to provide suggestions about how to use the debugger.

    Reported-by: Tim Bird
    Signed-off-by: Jason Wessel
    Acked-by: Tim Bird

    Jason Wessel
     
  • This fixes a bug with setting a breakpoint during kdb initialization
    (from kdb_cmds). Any call to kdb_printf() before the initialization
    of the kgdboc serial console driver (which happens much later during
    bootup than kdb_init), results in kernel panic due to the use of
    dbg_io_ops before it is initialized.

    Signed-off-by: Tim Bird
    Signed-off-by: Jason Wessel

    Tim Bird
     
  • Sometimes it is desirable to stop the kernel debugger before allowing
    a system to reboot either with kdb or kgdb. This patch adds the
    ability to turn the reboot notifier on and off or enter the debugger
    and stop kernel execution before rebooting.

    It is possible to change the setting after booting the kernel with the
    following:

    echo 1 > /sys/module/debug_core/parameters/kgdbreboot

    It is also possible to change this setting using kdb / kgdb to
    manipulate the variable directly.

    Using KDB:
    mm kgdbreboot 1

    Using gdb:
    set kgdbreboot=1

    Reported-by: Jan Kiszka
    Signed-off-by: Jason Wessel

    Jason Wessel
     
  • This fixes the following problems:
    1) Typematic-repeat of 'enter' gives warning message
    and leaks make/break if KDB exits. Repeats
    look something like 0x1c 0x1c .... 0x9c
    2) Use of 'keypad enter' gives warning message and
    leaks the ENTER break/make code out if KDB exits.
    KP ENTER repeats look something like 0xe0 0x1c
    0xe0 0x1c ... 0xe0 0x9c.
    3) Lag on the order of seconds between "break" and "make" when
    expecting the enter "break" code. Seen under virtualized
    environments such as VMware ESX.

    The existing special enter handler tries to glob the enter break code,
    but this fails if the other (KP) enter was used, or if there was a key
    repeat. It also fails if you mashed some keys along with enter, and
    you ended up with a non-enter make or non-enter break code coming
    after the enter make code. So first, we modify the handler to handle
    these cases. But performing these actions on every enter is annoying
    since now you can't hold ENTER down to scroll <more>d messages in
    KDB. This special behaviour is only necessary to handle exiting KDB
    ('g' + ENTER) without leaking scancodes to the OS, so the cleanup
    needs to get executed anytime the kdb_main loop exits.

    Tested on QEMU. Set a bp on atkbd.c to verify no scan code was leaked.

    Cc: Andrei Warkentin
    [jason.wessel@windriver.com: move cleanup calls to kdb_main.c]
    Signed-off-by: Andrei Warkentin
    Signed-off-by: Jason Wessel

    Andrei Warkentin
     
  • The gdbstub and kdb should get detached if the system is rebooting.
    Calling gdbstub_exit() will set the proper debug core state and send a
    message to any debugger that is connected to correctly detach.

    An attached debugger will receive the exit code from
    include/linux/reboot.h based on SYS_HALT, SYS_REBOOT, etc...

    Reported-by: Jan Kiszka
    Signed-off-by: Jason Wessel

    Jason Wessel
     
  • Not all kgdb I/O drivers implement a flush operation. Adjust
    gdbstub_exit accordingly.

    Signed-off-by: Jan Kiszka
    Signed-off-by: Jason Wessel

    Jan Kiszka
     
  • Merge first batch of patches from Andrew Morton:
    "A few misc things and all the MM queue"

    * emailed from Andrew Morton: (92 commits)
    memcg: avoid THP split in task migration
    thp: add HPAGE_PMD_* definitions for !CONFIG_TRANSPARENT_HUGEPAGE
    memcg: clean up existing move charge code
    mm/memcontrol.c: remove unnecessary 'break' in mem_cgroup_read()
    mm/memcontrol.c: remove redundant BUG_ON() in mem_cgroup_usage_unregister_event()
    mm/memcontrol.c: s/stealed/stolen/
    memcg: fix performance of mem_cgroup_begin_update_page_stat()
    memcg: remove PCG_FILE_MAPPED
    memcg: use new logic for page stat accounting
    memcg: remove PCG_MOVE_LOCK flag from page_cgroup
    memcg: simplify move_account() check
    memcg: remove EXPORT_SYMBOL(mem_cgroup_update_page_stat)
    memcg: kill dead prev_priority stubs
    memcg: remove PCG_CACHE page_cgroup flag
    memcg: let css_get_next() rely upon rcu_read_lock()
    cgroup: revert ss_id_lock to spinlock
    idr: make idr_get_next() good for rcu_read_lock()
    memcg: remove unnecessary thp check in page stat accounting
    memcg: remove redundant returns
    memcg: enum lru_list lru
    ...

    Linus Torvalds
     

22 Mar, 2012

11 commits

  • Remove lock and unlock around css_get_next()'s call to idr_get_next().
    memcg iterators (only users of css_get_next) already did rcu_read_lock(),
    and its comment demands that; but add a WARN_ON_ONCE to make sure of it.

    Signed-off-by: Hugh Dickins
    Acked-by: KAMEZAWA Hiroyuki
    Acked-by: Li Zefan
    Cc: Eric Dumazet
    Acked-by: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • Commit c1e2ee2dc436 ("memcg: replace ss->id_lock with a rwlock") has now
    been seen to cause the unfair behavior we should have expected from
    converting a spinlock to an rwlock: softlockup in cgroup_mkdir(), whose
    get_new_cssid() is waiting for the wlock, while there are 19 tasks using
    the rlock in css_get_next() to get on with their memcg workload (in an
    artificial test, admittedly). Yet lib/idr.c was made suitable for RCU
    way back: revert that commit, restoring ss->id_lock to a spinlock.

    Signed-off-by: Hugh Dickins
    Acked-by: KAMEZAWA Hiroyuki
    Acked-by: Li Zefan
    Cc: Eric Dumazet
    Acked-by: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • sync_mm_rss() can only be used for current to avoid race conditions in
    iterating and clearing its per-task counters. Remove the task argument
    for it and its helper function, __sync_task_rss_stat(), to avoid thinking
    it can be used safely for anything other than current.

    Signed-off-by: David Rientjes
    Acked-by: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Rientjes
     
  • Commit c0ff7453bb5c ("cpuset,mm: fix no node to alloc memory when
    changing cpuset's mems") wins a super prize for the largest number of
    memory barriers entered into fast paths for one commit.

    [get|put]_mems_allowed is incredibly heavy with pairs of full memory
    barriers inserted into a number of hot paths. This was detected while
    investigating a large page allocator slowdown introduced some time
    after 2.6.32. The largest portion of this overhead was shown by
    oprofile to be at an mfence introduced by this commit into the page
    allocator hot path.

    For extra style points, the commit introduced the use of yield() in an
    implementation of what looks like a spinning mutex.

    This patch replaces the full memory barriers on both read and write
    sides with a sequence counter with just read barriers on the fast path
    side. This is much cheaper on some architectures, including x86. The
    main bulk of the patch is the retry logic if the nodemask changes in a
    manner that can cause a false failure.

    While updating the nodemask, a check is made to see if a false failure
    is a risk. If it is, the sequence number gets bumped and parallel
    allocators will briefly stall while the nodemask update takes place.
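
    The read-side pattern can be sketched in userspace with C11 atomics.
    This is a generic sequence-counter illustration under assumptions,
    not the kernel code: struct mems_state and the function names are
    hypothetical, writers are assumed to be serialized elsewhere, and the
    reader here retries on any change, whereas the patch retries only
    when a change could cause a false failure.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the state guarded by the sequence counter. */
struct mems_state {
    atomic_uint  seq;       /* odd while an update is in progress */
    atomic_ulong nodemask;  /* the data readers care about */
};

/* Reader: snapshot the counter, waiting out any update in progress. */
static unsigned int read_begin(struct mems_state *s)
{
    unsigned int seq;

    do {
        seq = atomic_load_explicit(&s->seq, memory_order_acquire);
    } while (seq & 1u);
    return seq;
}

/* Reader: true if the nodemask may have changed since read_begin(). */
static bool read_retry(struct mems_state *s, unsigned int start)
{
    atomic_thread_fence(memory_order_acquire);
    return atomic_load_explicit(&s->seq, memory_order_relaxed) != start;
}

/* Writer: bump the counter before and after the update. */
static void update_nodemask(struct mems_state *s, unsigned long new_mask)
{
    unsigned int seq = atomic_load_explicit(&s->seq, memory_order_relaxed);

    atomic_store_explicit(&s->seq, seq + 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&s->nodemask, new_mask, memory_order_relaxed);
    atomic_store_explicit(&s->seq, seq + 2, memory_order_release);
}

/* Reader loop: repeat the read until no update raced with it. */
static unsigned long read_nodemask(struct mems_state *s)
{
    unsigned int cookie;
    unsigned long mask;

    do {
        cookie = read_begin(s);
        mask = atomic_load_explicit(&s->nodemask, memory_order_relaxed);
    } while (read_retry(s, cookie));
    return mask;
}
```

    Readers never write to shared memory, which is why the fast path
    stays cheap; the cost moves to the rare case where an update forces a
    retry.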

    In a page fault test microbenchmark, oprofile samples from
    __alloc_pages_nodemask went from 4.53% of all samples to 1.15%. The
    actual results were

                           3.3.0-rc3            3.3.0-rc3
                           rc3-vanilla          nobarrier-v2r1
    Clients 1 UserTime           0.07 ( 0.00%)        0.08 (-14.19%)
    Clients 2 UserTime           0.07 ( 0.00%)        0.07 ( 2.72%)
    Clients 4 UserTime           0.08 ( 0.00%)        0.07 ( 3.29%)
    Clients 1 SysTime            0.70 ( 0.00%)        0.65 ( 6.65%)
    Clients 2 SysTime            0.85 ( 0.00%)        0.82 ( 3.65%)
    Clients 4 SysTime            1.41 ( 0.00%)        1.41 ( 0.32%)
    Clients 1 WallTime           0.77 ( 0.00%)        0.74 ( 4.19%)
    Clients 2 WallTime           0.47 ( 0.00%)        0.45 ( 3.73%)
    Clients 4 WallTime           0.38 ( 0.00%)        0.37 ( 1.58%)
    Clients 1 Flt/sec/cpu   497620.28 ( 0.00%)   520294.53 ( 4.56%)
    Clients 2 Flt/sec/cpu   414639.05 ( 0.00%)   429882.01 ( 3.68%)
    Clients 4 Flt/sec/cpu   257959.16 ( 0.00%)   258761.48 ( 0.31%)
    Clients 1 Flt/sec       495161.39 ( 0.00%)   517292.87 ( 4.47%)
    Clients 2 Flt/sec       820325.95 ( 0.00%)   850289.77 ( 3.65%)
    Clients 4 Flt/sec      1020068.93 ( 0.00%)  1022674.06 ( 0.26%)

    MMTests Statistics: duration
    Sys Time Running Test (seconds)         135.68    132.17
    User+Sys Time Running Test (seconds)     164.2    160.13
    Total Elapsed Time (seconds)            123.46    120.87

    The overall improvement is small but the System CPU time is much
    improved and roughly in correlation to what oprofile reported (these
    performance figures are without profiling so skew is expected). The
    actual number of page faults is noticeably improved.

    For benchmarks like kernel builds, the overall benefit is marginal but
    the system CPU time is slightly reduced.

    To test the actual bug the commit fixed I opened two terminals. The
    first ran within a cpuset and continually ran a small program that
    faulted 100M of anonymous data. In a second window, the nodemask of the
    cpuset was continually randomised in a loop.

    Without the commit, the program would fail every so often (usually
    within 10 seconds) and obviously with the commit everything worked fine.
    With this patch applied, it also worked fine so the fix should be
    functionally equivalent.

    Signed-off-by: Mel Gorman
    Cc: Miao Xie
    Cc: David Rientjes
    Cc: Peter Zijlstra
    Cc: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     
  • Warn about non-zero rss counters at final mmdrop.

    This check will prevent reoccurrences of bugs such as that fixed in "mm:
    fix rss count leakage during migration".

    I didn't hide this check under CONFIG_VM_DEBUG because it is rather small and
    rss counters cover whole page-table management, so this is a good
    invariant.

    Signed-off-by: Konstantin Khlebnikov
    Cc: Hugh Dickins
    Cc: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Konstantin Khlebnikov
     
  • Pull vfs pile 1 from Al Viro:
    "This is _not_ all; in particular, Miklos' and Jan's stuff is not there
    yet."

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (64 commits)
    ext4: initialization of ext4_li_mtx needs to be done earlier
    debugfs-related mode_t whack-a-mole
    hfsplus: add an ioctl to bless files
    hfsplus: change finder_info to u32
    hfsplus: initialise userflags
    qnx4: new helper - try_extent()
    qnx4: get rid of qnx4_bread/qnx4_getblk
    take removal of PF_FORKNOEXEC to flush_old_exec()
    trim includes in inode.c
    um: uml_dup_mmap() relies on ->mmap_sem being held, but activate_mm() doesn't hold it
    um: embed ->stub_pages[] into mmu_context
    gadgetfs: list_for_each_safe() misuse
    ocfs2: fix leaks on failure exits in module_init
    ecryptfs: make register_filesystem() the last potential failure exit
    ntfs: forgets to unregister sysctls on register_filesystem() failure
    logfs: missing cleanup on register_filesystem() failure
    jfs: mising cleanup on register_filesystem() failure
    make configfs_pin_fs() return root dentry on success
    configfs: configfs_create_dir() has parent dentry in dentry->d_parent
    configfs: sanitize configfs_create()
    ...

    Linus Torvalds
     
  • Pull security subsystem updates for 3.4 from James Morris:
    "The main addition here is the new Yama security module from Kees Cook,
    which was discussed at the Linux Security Summit last year. Its
    purpose is to collect miscellaneous DAC security enhancements in one
    place. This also marks a departure in policy for LSM modules, which
    were previously limited to being standalone access control systems.
    Chromium OS is using Yama, and I believe there are plans for Ubuntu,
    at least.

    This patchset also includes maintenance updates for AppArmor, TOMOYO
    and others."

    Fix trivial conflict due to the jump_label->static_key rename.

    * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (38 commits)
    AppArmor: Fix location of const qualifier on generated string tables
    TOMOYO: Return error if fails to delete a domain
    AppArmor: add const qualifiers to string arrays
    AppArmor: Add ability to load extended policy
    TOMOYO: Return appropriate value to poll().
    AppArmor: Move path failure information into aa_get_name and rename
    AppArmor: Update dfa matching routines.
    AppArmor: Minor cleanup of d_namespace_path to consolidate error handling
    AppArmor: Retrieve the dentry_path for error reporting when path lookup fails
    AppArmor: Add const qualifiers to generated string tables
    AppArmor: Fix oops in policy unpack auditing
    AppArmor: Fix error returned when a path lookup is disconnected
    KEYS: testing wrong bit for KEY_FLAG_REVOKED
    TOMOYO: Fix mount flags checking order.
    security: fix ima kconfig warning
    AppArmor: Fix the error case for chroot relative path name lookup
    AppArmor: fix mapping of META_READ to audit and quiet flags
    AppArmor: Fix underflow in xindex calculation
    AppArmor: Fix dropping of allowed operations that are force audited
    AppArmor: Add mising end of structure test to caps unpacking
    ...

    Linus Torvalds
     
  • Pull crypto update from Herbert Xu:
    "* sha512 bug fixes (already in your tree).
    * SHA224/SHA384 AEAD support in caam.
    * X86-64 optimised version of Camellia.
    * Tegra AES support.
    * Bulk algorithm registration interface to make driver registration easier.
    * padata race fixes.
    * Misc fixes."

    * git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (31 commits)
    padata: Fix race on sequence number wrap
    padata: Fix race in the serialization path
    crypto: camellia - add assembler implementation for x86_64
    crypto: camellia - rename camellia.c to camellia_generic.c
    crypto: camellia - fix checkpatch warnings
    crypto: camellia - rename camellia module to camellia_generic
    crypto: tcrypt - add more camellia tests
    crypto: testmgr - add more camellia test vectors
    crypto: camellia - simplify key setup and CAMELLIA_ROUNDSM macro
    crypto: twofish-x86_64/i586 - set alignmask to zero
    crypto: blowfish-x86_64 - set alignmask to zero
    crypto: serpent-sse2 - combine ablk_*_init functions
    crypto: blowfish-x86_64 - use crypto_[un]register_algs
    crypto: twofish-x86_64-3way - use crypto_[un]register_algs
    crypto: serpent-sse2 - use crypto_[un]register_algs
    crypto: serpent-sse2 - remove dead code from serpent_sse2_glue.c::serpent_sse2_init()
    crypto: twofish-x86 - Remove dead code from twofish_glue_3way.c::init()
    crypto: In crypto_add_alg(), 'exact' wants to be initialized to 0
    crypto: caam - fix gcc 4.6 warning
    crypto: Add bulk algorithm registration interface
    ...

    Linus Torvalds
     
  • Pull irq_domain support for all architectures from Grant Likely:
    "Generalize powerpc's irq_host as irq_domain

    This branch takes the PowerPC irq_host infrastructure (reverse mapping
    from Linux IRQ numbers to hardware irq numbering), generalizes it,
    renames it to irq_domain, and makes it available to all architectures.

    Originally the plan was to create an all-new irq_domain
    implementation which addresses some of the powerpc shortcomings such
    as not handling 1:1 mappings well, but doing that proved to be far
    more difficult and invasive than generalizing the working code and
    refactoring it in-place. So, this branch rips out the 'new'
    irq_domain and replaces it with the modified powerpc version (in a
    fully bisectable way of course). It converts all users over to the
    new API and makes irq_domain selectable on any architecture.

    No architecture is forced to enable irq_domain, but the infrastructure
    is required for doing OpenFirmware style irq translations. It will
    even work on SPARC even though SPARC has its own mechanism for
    translating irqs at boot time. MIPS, microblaze, embedded x86 and c6x
    are converted too.

    The resulting irq_domain code is probably still too verbose and can be
    optimized more, but that can be done incrementally and is a task for
    follow-on patches."

    * tag 'irqdomain-for-linus' of git://git.secretlab.ca/git/linux-2.6: (31 commits)
    dt: fix twl4030 for non-dt compile on x86
    mfd: twl-core: Add IRQ_DOMAIN dependency
    devicetree: Add empty of_platform_populate() for !CONFIG_OF_ADDRESS (sparc)
    irq_domain: Centralize definition of irq_dispose_mapping()
    irq_domain/mips: Allow irq_domain on MIPS
    irq_domain/x86: Convert x86 (embedded) to use common irq_domain
    ppc-6xx: fix build failure in flipper-pic.c and hlwd-pic.c
    irq_domain/microblaze: Convert microblaze to use irq_domains
    irq_domain/powerpc: Replace custom xlate functions with library functions
    irq_domain/powerpc: constify irq_domain_ops
    irq_domain/c6x: Use library of xlate functions
    irq_domain/c6x: constify irq_domain structures
    irq_domain/c6x: Convert c6x to use generic irq_domain support.
    irq_domain: constify irq_domain_ops
    irq_domain: Create common xlate functions that device drivers can use
    irq_domain: Remove irq_domain_add_simple()
    irq_domain: Remove 'new' irq_domain in favour of the ppc one
    mfd: twl-core.c: Fix the number of interrupts managed by twl4030
    of/address: add empty static inlines for !CONFIG_OF
    irq_domain: Add support for base irq and hwirq in legacy mappings
    ...

    Linus Torvalds
     
  • Pull power management updates for 3.4 from Rafael Wysocki:
    "Assorted extensions and fixes including:

    * Introduction of early/late suspend/hibernation device callbacks.
    * Generic PM domains extensions and fixes.
    * devfreq updates from Axel Lin and MyungJoo Ham.
    * Device PM QoS updates.
    * Fixes of concurrency problems with wakeup sources.
    * System suspend and hibernation fixes."

    * tag 'pm-for-3.4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (43 commits)
    PM / Domains: Check domain status during hibernation restore of devices
    PM / devfreq: add relation of recommended frequency.
    PM / shmobile: Make MTU2 driver use pm_genpd_dev_always_on()
    PM / shmobile: Make CMT driver use pm_genpd_dev_always_on()
    PM / shmobile: Make TMU driver use pm_genpd_dev_always_on()
    PM / Domains: Introduce "always on" device flag
    PM / Domains: Fix hibernation restore of devices, v2
    PM / Domains: Fix handling of wakeup devices during system resume
    sh_mmcif / PM: Use PM QoS latency constraint
    tmio_mmc / PM: Use PM QoS latency constraint
    PM / QoS: Make it possible to expose PM QoS latency constraints
    PM / Sleep: JBD and JBD2 missing set_freezable()
    PM / Domains: Fix include for PM_GENERIC_DOMAINS=n case
    PM / Freezer: Remove references to TIF_FREEZE in comments
    PM / Sleep: Add more wakeup source initialization routines
    PM / Hibernate: Enable usermodehelpers in hibernate() error path
    PM / Sleep: Make __pm_stay_awake() delete wakeup source timers
    PM / Sleep: Fix race conditions related to wakeup source timer function
    PM / Sleep: Fix possible infinite loop during wakeup source destruction
    PM / Hibernate: print physical addresses consistently with other parts of kernel
    ...

    Linus Torvalds
     
  • Pull kmap_atomic cleanup from Cong Wang.

    It's been in -next for a long time, and it gets rid of the (no longer
    used) second argument to k[un]map_atomic().

    Fix up a few trivial conflicts in various drivers, and do an "evil
    merge" to catch some new uses that have come in since Cong's tree.

    * 'kmap_atomic' of git://github.com/congwang/linux: (59 commits)
    feature-removal-schedule.txt: schedule the deprecated form of kmap_atomic() for removal
    highmem: kill all __kmap_atomic() [swarren@nvidia.com: highmem: Fix ARM build break due to __kmap_atomic rename]
    drbd: remove the second argument of k[un]map_atomic()
    zcache: remove the second argument of k[un]map_atomic()
    gma500: remove the second argument of k[un]map_atomic()
    dm: remove the second argument of k[un]map_atomic()
    tomoyo: remove the second argument of k[un]map_atomic()
    sunrpc: remove the second argument of k[un]map_atomic()
    rds: remove the second argument of k[un]map_atomic()
    net: remove the second argument of k[un]map_atomic()
    mm: remove the second argument of k[un]map_atomic()
    lib: remove the second argument of k[un]map_atomic()
    power: remove the second argument of k[un]map_atomic()
    kdb: remove the second argument of k[un]map_atomic()
    udf: remove the second argument of k[un]map_atomic()
    ubifs: remove the second argument of k[un]map_atomic()
    squashfs: remove the second argument of k[un]map_atomic()
    reiserfs: remove the second argument of k[un]map_atomic()
    ocfs2: remove the second argument of k[un]map_atomic()
    ntfs: remove the second argument of k[un]map_atomic()
    ...

    Linus Torvalds