06 Apr, 2012

2 commits

  • Merge batch of fixes from Andrew Morton:
    "The simple_open() cleanup was held back while I waited for laggards to
    merge things.

    I still need to send a few checkpoint/restore patches. I've been
    wobbly about merging them because I'm wobbly about the overall
    prospects for success of the project. But after speaking with Pavel
    at the LSF conference, it sounds like they're further toward
    completion than I feared - apparently davem is at the "has stopped
    complaining" stage regarding the net changes. So I need to go back
    and re-review those patches and their (lengthy) discussion."

    * emailed from Andrew Morton: (16 patches)
    memcg swap: use mem_cgroup_uncharge_swap fix
    backlight: add driver for DA9052/53 PMIC v1
    C6X: use set_current_blocked() and block_sigmask()
    MAINTAINERS: add entry for sparse checker
    MAINTAINERS: fix REMOTEPROC F: typo
    alpha: use set_current_blocked() and block_sigmask()
    simple_open: automatically convert to simple_open()
    scripts/coccinelle/api/simple_open.cocci: semantic patch for simple_open()
    libfs: add simple_open()
    hugetlbfs: remove unregister_filesystem() when initializing module
    drivers/rtc/rtc-88pm860x.c: fix rtc irq enable callback
    fs/xattr.c:setxattr(): improve handling of allocation failures
    fs/xattr.c:listxattr(): fall back to vmalloc() if kmalloc() failed
    fs/xattr.c: suppress page allocation failure warnings from sys_listxattr()
    sysrq: use SEND_SIG_FORCED instead of force_sig()
    proc: fix mount -t proc -o AAA

    Linus Torvalds
     
  • Many users of debugfs copy the implementation of default_open() when
    they want to support a custom read/write function op. This leads to a
    proliferation of the default_open() implementation across the entire
    tree.

    Now that the common implementation has been consolidated into libfs we
    can replace all the users of this function with simple_open().

    This replacement was done with the following semantic patch:

    @ open @
    identifier open_f != simple_open;
    identifier i, f;
    @@
    -int open_f(struct inode *i, struct file *f)
    -{
    (
    -if (i->i_private)
    -f->private_data = i->i_private;
    |
    -f->private_data = i->i_private;
    )
    -return 0;
    -}

    @ has_open depends on open @
    identifier fops;
    identifier open.open_f;
    @@
    struct file_operations fops = {
    ...
    -.open = open_f,
    +.open = simple_open,
    ...
    };
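
    For readers who have not seen the helper, the body that the semantic
    patch removes can be sketched in userspace as below. The struct inode
    and struct file definitions are cut-down stand-ins (an assumption for
    illustration), not the kernel's own.

```c
#include <stddef.h>

/* Cut-down stand-ins for the kernel structures (assumption: only the
 * fields touched by the helper are modelled). */
struct inode { void *i_private; };
struct file  { void *private_data; };

/* Mirrors the first alternative matched by the semantic patch: copy the
 * inode's private pointer into the file, if there is one. */
int simple_open(struct inode *inode, struct file *file)
{
	if (inode->i_private)
		file->private_data = inode->i_private;
	return 0;
}
```

    A file_operations user then only needs ".open = simple_open" instead
    of carrying its own copy of this function.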

    [akpm@linux-foundation.org: checkpatch fixes]
    Signed-off-by: Stephen Boyd
    Cc: Greg Kroah-Hartman
    Cc: Al Viro
    Cc: Julia Lawall
    Acked-by: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Stephen Boyd
     

05 Apr, 2012

2 commits

  • Pull KGDB/KDB regression fixes from Jason Wessel:
    - Fix a Smatch warning that appeared in the 3.4 merge window
    - Fix kgdb test suite with SMP for all archs without HW single stepping
    - Fix kgdb sw breakpoints with CONFIG_DEBUG_RODATA=y limitations on x86
    - Fix oops on kgdb test suite with CONFIG_DEBUG_RODATA
    - Fix kgdb test suite with SMP for all archs with HW single stepping

    * tag 'for_linus-3.4-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/kgdb:
    x86,kgdb: Fix DEBUG_RODATA limitation using text_poke()
    kgdb,debug_core: pass the breakpoint struct instead of address and memory
    kgdbts: (2 of 2) fix single step awareness to work correctly with SMP
    kgdbts: (1 of 2) fix single step awareness to work correctly with SMP
    kgdbts: Fix kernel oops with CONFIG_DEBUG_RODATA
    kdb: Fix smatch warning on dbg_io_ops->is_console

    Linus Torvalds
     
  • Pull more power management updates from Rafael Wysocki:
    - Patch series that hopefully fixes races between the freezer and
    request_firmware() and request_firmware_nowait() for good, with two
    cleanups from Stephen Boyd on top.
    - Runtime PM fix from Alan Stern preventing tasks from getting stuck
    indefinitely in the runtime PM wait queue.
    - Device PM QoS update from MyungJoo Ham introducing a new variant of
    pm_qos_update_request() allowing the callers to specify a timeout.

    * tag 'pm-for-3.4-part-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
    PM / QoS: add pm_qos_update_request_timeout() API
    firmware_class: Move request_firmware_nowait() to workqueues
    firmware_class: Reorganize fw_create_instance()
    PM / Sleep: Mitigate race between the freezer and request_firmware()
    PM / Sleep: Move disabling of usermode helpers to the freezer
    PM / Hibernate: Disable usermode helpers right before freezing tasks
    firmware_class: Do not warn that system is not ready from async loads
    firmware_class: Split _request_firmware() into three functions, v2
    firmware_class: Rework usermodehelper check
    PM / Runtime: don't forget to wake up waitqueue on failure

    Linus Torvalds
     

03 Apr, 2012

2 commits

  • This merges some of the fixes from Paul Gortmaker for the header file
    cleanup fallout.

    Some of the patches are going through arch maintainer trees, and David
    Howells suggested another be done differently, but this at least fixes a
    few cases.

    * emailed from Paul Gortmaker:
    asm-generic: add linux/types.h to cmpxchg.h
    firewire: restore the device.h include in linux/firewire.h
    frv: fix warnings in mb93090-mb00/pci-dma.c about implicit EXPORT_SYMBOL
    parisc: fix missing cmpxchg file error from system.h split
    blackfin: fix cmpxchg build fails from system.h fallout
    avr32: fix build failures from mis-naming of atmel_nand.h
    ARM: mach-msm: fix compile fail from system.h fallout
    irq_work: fix compile failure on MIPS from system.h split

    Linus Torvalds
     
  • Pull crypto fixes from Herbert Xu:
    - Fix for CPU hotplug hang in padata.
    - Avoid using cpu_active inappropriately in pcrypt and padata.
    - Fix for user-space algorithm lookup hang with IV generators.
    - Fix for netlink dump of algorithms where stuff went missing due to
    incorrect calculation of message size.

    * git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
    crypto: user - Fix size of netlink dump message
    crypto: user - Fix lookup of algorithms with IV generator
    crypto: pcrypt - Use the online cpumask as the default
    padata: Fix cpu hotplug
    padata: Use the online cpumask as the default
    padata: Add a reference to the api documentation

    Linus Torvalds
     

02 Apr, 2012

2 commits

  • Pull cpumask cleanups from Rusty Russell:
    "(Somehow forgot to send this out; it's been sitting in linux-next, and
    if you don't want it, it can sit there another cycle)"

    I'm a sucker for things that actually delete lines of code.

    Fix up trivial conflict in arch/arm/kernel/kprobes.c, where Rusty fixed
    a user of &cpu_online_map to be cpu_online_mask, but that code got
    deleted by commit b21d55e98ac2 ("ARM: 7332/1: extract out code patch
    function from kprobes").

    * tag 'for-linus' of git://github.com/rustyrussell/linux:
    cpumask: remove old cpu_*_map.
    documentation: remove references to cpu_*_map.
    drivers/cpufreq/db8500-cpufreq: remove references to cpu_*_map.
    remove references to cpu_*_map in arch/

    Linus Torvalds
     
  • Builds of the MIPS platform ip32_defconfig fail as of commit
    0195c00244dc ("Merge tag 'split-asm_system_h ...") because MIPS xchg()
    macro uses BUILD_BUG_ON and it was moved in commit b81947c646bf
    ("Disintegrate asm/system.h for MIPS").

    The root cause is that the system.h split wasn't tested on a baseline
    with commit 6c03438edeb5 ("kernel.h: doesn't explicitly use bug.h, so
    don't include it.")

    Since this file uses BUG code in several other places besides the xchg
    call, simply make the inclusion explicit.

    Signed-off-by: Paul Gortmaker
    Acked-by: David Howells
    Signed-off-by: Linus Torvalds

    Paul Gortmaker
     

01 Apr, 2012

2 commits

  • Pull scheduler fixes from Ingo Molnar.

    * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    sched: Fix incorrect usage of for_each_cpu_mask() in select_fallback_rq()
    sched: Fix __schedule_bug() output when called from an interrupt
    sched/arch: Introduce the finish_arch_post_lock_switch() scheduler callback

    Linus Torvalds
     
  • Pull perf updates and fixes from Ingo Molnar:
    "It's mostly fixes, but there's also two late items:

    - preliminary GTK GUI support for perf report
    - PMU raw event format descriptors in sysfs, to be parsed by tooling

    The raw event format in sysfs is a new ABI. For example for the 'CPU'
    PMU we have:

    aldebaran:~> ll /sys/bus/event_source/devices/cpu/format/*
    -r--r--r--. 1 root root 4096 Mar 31 10:29 /sys/bus/event_source/devices/cpu/format/any
    -r--r--r--. 1 root root 4096 Mar 31 10:29 /sys/bus/event_source/devices/cpu/format/cmask
    -r--r--r--. 1 root root 4096 Mar 31 10:29 /sys/bus/event_source/devices/cpu/format/edge
    -r--r--r--. 1 root root 4096 Mar 31 10:29 /sys/bus/event_source/devices/cpu/format/event
    -r--r--r--. 1 root root 4096 Mar 31 10:29 /sys/bus/event_source/devices/cpu/format/inv
    -r--r--r--. 1 root root 4096 Mar 31 10:29 /sys/bus/event_source/devices/cpu/format/offcore_rsp
    -r--r--r--. 1 root root 4096 Mar 31 10:29 /sys/bus/event_source/devices/cpu/format/pc
    -r--r--r--. 1 root root 4096 Mar 31 10:29 /sys/bus/event_source/devices/cpu/format/umask

    those lists of fields contain a specific format:

    aldebaran:~> cat /sys/bus/event_source/devices/cpu/format/offcore_rsp
    config1:0-63

    So, those who wish to specify raw events can now use the following
    event format:

    -e cpu/cmask=1,event=2,umask=3

    Most people will not want to specify any events (let alone raw
    events), they'll just use whatever default event the tools use.

    But for more obscure PMU events that have no cross-architecture
    generic events the above syntax is more usable and a bit more
    structured than specifying hex numbers."
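
    As a rough illustration of how tooling might consume such a format
    entry, the sketch below parses a "name:lo-hi" string and places a value
    into that bit range. The struct and function names are hypothetical,
    and the single-bit "name:bit" form is an assumption about what other
    entries may look like; only "config1:0-63" appears in the text above.

```c
#include <stdint.h>
#include <stdio.h>

/* One parsed sysfs format entry, e.g. "config1:0-63". */
struct pmu_format {
	char name[16];     /* target word: config, config1, ... */
	unsigned int lo;   /* lowest bit of the field */
	unsigned int hi;   /* highest bit of the field (inclusive) */
};

/* Parse "name:lo-hi" or the single-bit form "name:bit".
 * Returns 0 on success, -1 on malformed input. */
int pmu_format_parse(const char *spec, struct pmu_format *out)
{
	int n = sscanf(spec, "%15[^:]:%u-%u", out->name, &out->lo, &out->hi);

	if (n == 2)
		out->hi = out->lo;   /* single-bit field */
	else if (n != 3)
		return -1;
	if (out->lo > out->hi || out->hi > 63)
		return -1;
	return 0;
}

/* Place 'value' into the bit range described by 'fmt'. */
uint64_t pmu_format_encode(const struct pmu_format *fmt, uint64_t value)
{
	unsigned int width = fmt->hi - fmt->lo + 1;
	uint64_t mask = (width == 64) ? ~0ULL : ((1ULL << width) - 1);

	return (value & mask) << fmt->lo;
}
```

    With a hypothetical entry "config:8-15", encoding the value 3 yields
    0x300, which a tool could OR into the event's config word.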

    * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (41 commits)
    perf tools: Remove auto-generated bison/flex files
    perf annotate: Fix off by one symbol hist size allocation and hit accounting
    perf tools: Add missing ref-cycles event back to event parser
    perf annotate: addr2line wants addresses in same format as objdump
    perf probe: Finder fails to resolve function name to address
    tracing: Fix ent_size in trace output
    perf symbols: Handle NULL dso in dso__name_len
    perf symbols: Do not include libgen.h
    perf tools: Fix bug in raw sample parsing
    perf tools: Fix display of first level of callchains
    perf tools: Switch module.h into export.h
    perf: Move mmap page data_head offset assertion out of header
    perf: Fix mmap_page capabilities and docs
    perf diff: Fix to work with new hists design
    perf tools: Fix modifier to be applied on correct events
    perf tools: Fix various casting issues for 32 bits
    perf tools: Simplify event_read_id exit path
    tracing: Fix ftrace stack trace entries
    tracing: Move the tracing_on/off() declarations into CONFIG_TRACING
    perf report: Add a simple GTK2-based 'perf report' browser
    ...

    Linus Torvalds
     

31 Mar, 2012

4 commits

  • The function for_each_cpu_mask() expects a *pointer* to struct
    cpumask as its second argument, whereas select_fallback_rq()
    passes the value itself.

    And moreover, for_each_cpu_mask() has been marked as obsolete
    in include/linux/cpumask.h. So move to the more appropriate
    for_each_cpu() variant.
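
    The pointer-taking convention of for_each_cpu() can be illustrated with
    a small userspace sketch. The struct cpumask layout, the NR_CPUS value
    and the cpumask_next() helper below are simplified stand-ins written
    for this example, not the kernel's definitions.

```c
#include <limits.h>

#define NR_CPUS 64   /* assumption: small fixed CPU count for the sketch */
#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

struct cpumask {
	unsigned long bits[(NR_CPUS + BITS_PER_LONG - 1) / BITS_PER_LONG];
};

/* Return the first set CPU number greater than 'prev', or NR_CPUS. */
static int cpumask_next(int prev, const struct cpumask *mask)
{
	for (int cpu = prev + 1; cpu < NR_CPUS; cpu++)
		if (mask->bits[cpu / BITS_PER_LONG] &
		    (1UL << (cpu % BITS_PER_LONG)))
			return cpu;
	return NR_CPUS;
}

/* The iterator takes a *pointer* to the mask. */
#define for_each_cpu(cpu, maskp) \
	for ((cpu) = cpumask_next(-1, (maskp)); (cpu) < NR_CPUS; \
	     (cpu) = cpumask_next((cpu), (maskp)))

/* Count the CPUs set in a mask, passing the pointer straight through. */
int count_allowed(const struct cpumask *allowed)
{
	int cpu, n = 0;

	for_each_cpu(cpu, allowed)
		n++;
	return n;
}
```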

    Reported-by: Sasha Levin
    Signed-off-by: Srivatsa S. Bhat
    Acked-by: Peter Zijlstra
    Cc: Dave Jones
    Cc: Liu Chuansheng
    Cc: vapier@gentoo.org
    Cc: rusty@rustcorp.com.au
    Link: http://lkml.kernel.org/r/4F75BED4.9050005@linux.vnet.ibm.com
    Signed-off-by: Ingo Molnar

    Srivatsa S. Bhat
     
  • Pull genirq updates from Thomas Gleixner.

    * 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    genirq: Adjust irq thread affinity on IRQ_SET_MASK_OK_NOCOPY return value
    genirq: Respect NUMA node affinity in setup_irq_irq affinity()
    genirq: Get rid of unneeded force parameter in irq_finalize_oneshot()
    genirq: Minor readablity improvement in irq_wake_thread()

    Linus Torvalds
     
  • Pull core locking updates from Thomas Gleixner.

    * 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    futex: Mark get_robust_list as deprecated
    futex: Do not leak robust list to unprivileged process

    Linus Torvalds
     
  • irq_move_masked_irq() checks the return code of
    chip->irq_set_affinity() only for 0, but IRQ_SET_MASK_OK_NOCOPY is
    also a valid return code, which is there to avoid a redundant copy of
    the cpumask. But in case of IRQ_SET_MASK_OK_NOCOPY we not only avoid
    the redundant copy, we also fail to adjust the thread affinity of an
    eventually threaded interrupt handler.

    Handle IRQ_SET_MASK_OK (==0) and IRQ_SET_MASK_OK_NOCOPY(==1) return
    values correctly by checking the valid return values separately.
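
    The intended handling of the two success codes can be sketched in
    userspace as follows. The numeric values come from the text above;
    struct irq_state and handle_set_affinity_ret() are hypothetical
    stand-ins for the descriptor state and the code around the
    chip->irq_set_affinity() call.

```c
/* Return codes as given in the text above. */
enum {
	IRQ_SET_MASK_OK = 0,        /* core copies the mask itself */
	IRQ_SET_MASK_OK_NOCOPY = 1, /* chip already updated the stored mask */
};

/* Hypothetical stand-in for the per-IRQ state touched by the fix. */
struct irq_state {
	unsigned long affinity;       /* stored cpumask (one word here) */
	int thread_affinity_updates;  /* counts thread affinity adjustments */
};

/* 'ret' is what the chip callback returned. Returns 0 on success. */
int handle_set_affinity_ret(struct irq_state *st, unsigned long mask, int ret)
{
	switch (ret) {
	case IRQ_SET_MASK_OK:
		st->affinity = mask;            /* copy the new mask */
		/* fall through */
	case IRQ_SET_MASK_OK_NOCOPY:
		st->thread_affinity_updates++;  /* adjust the IRQ thread too */
		return 0;
	default:
		return ret;                     /* error: state untouched */
	}
}
```

    The fall-through is the point of the sketch: both success codes reach
    the thread affinity update, and only the copy is skipped for NOCOPY.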

    Signed-off-by: Jiang Liu
    Cc: Jiang Liu
    Cc: Keping Chen
    Cc: stable@vger.kernel.org
    Link: http://lkml.kernel.org/r/1333120296-13563-2-git-send-email-jiang.liu@huawei.com
    Signed-off-by: Thomas Gleixner

    Jiang Liu
     

30 Mar, 2012

10 commits

  • Pull urgent cgroup fix from Tejun Heo:
    "Commit 61d1d219c4c0 ('cgroup: remove extra calls to
    find_existing_css_set') which was part of the rc1 cgroup pull request
    made writes to the cgroup "tasks" file return an uninitialized retval
    on success which can cause boot failures with systemd.

    The change stayed in linux-next for quite some time but gcc
    interestingly failed to emit warning about using uninitialized
    variable and the problem seems to materialize only for certain build
    combinations (probably depends on register allocation).

    It's just missing local variable initialization and the fix is trivial
    & safe. As the problem is critical when it materializes, I'm
    fast-tracking it. Also included is Li's email address change in
    MAINTAINERS."

    * 'for-3.4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
    cgroup: cgroup_attach_task() could return -errno after success
    cgroup: update MAINTAINERS entry

    Linus Torvalds
     
  • 61d1d219c4 "cgroup: remove extra calls to find_existing_css_set" made
    cgroup_task_migrate() return void. An unfortunate side effect was
    that cgroup_attach_task() was depending on that function's return
    value to clear its @retval on the success path. On cgroup mounts
    without any subsystem with ->can_attach() callback,
    cgroup_attach_task() ended up returning @retval without initializing
    it on success.

    For some reason, gcc failed to warn about it and it didn't cause
    cgroup_attach_task() to return non-zero value in many cases, probably
    due to difference in register allocation. When the problem
    materializes, systemd fails to populate /systemd cgroup mount and
    fails to boot.

    Fix it by initializing @retval to zero on declaration.
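
    The shape of the bug can be shown with a minimal userspace sketch.
    struct subsys, attach_task() and example_refuse() are hypothetical
    names for this illustration; the real function does considerably more.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a subsystem with an optional veto callback. */
struct subsys {
	int (*can_attach)(void);   /* may be NULL */
};

/* Example callback that refuses the attach. */
int example_refuse(void)
{
	return -EBUSY;
}

/* Without "= 0" below, a mount whose subsystems have no can_attach()
 * callback would return an indeterminate value on the success path. */
int attach_task(const struct subsys *ss, size_t n)
{
	int retval = 0;   /* the fix: initialize on declaration */

	for (size_t i = 0; i < n; i++) {
		if (ss[i].can_attach) {
			retval = ss[i].can_attach();
			if (retval)
				return retval;   /* a subsystem refused */
		}
	}
	return retval;
}
```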

    Signed-off-by: Tejun Heo
    Reported-by: Jiri Kosina
    LKML-Reference:
    Reviewed-by: Mandeep Singh Baines
    Acked-by: Li Zefan

    Tejun Heo
     
  • Pull x32 support for x86-64 from Ingo Molnar:
    "This tree introduces the X32 binary format and execution mode for x86:
    32-bit data space binaries using 64-bit instructions and 64-bit kernel
    syscalls.

    This allows applications whose working set fits into a 32-bit address
    space to make use of 64-bit instructions while using a 32-bit address
    space with shorter pointers, more compressed data structures, etc."

    Fix up trivial context conflicts in arch/x86/{Kconfig,vdso/vma.c}

    * 'x86-x32-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (71 commits)
    x32: Fix alignment fail in struct compat_siginfo
    x32: Fix stupid ia32/x32 inversion in the siginfo format
    x32: Add ptrace for x32
    x32: Switch to a 64-bit clock_t
    x32: Provide separate is_ia32_task() and is_x32_task() predicates
    x86, mtrr: Use explicit sizing and padding for the 64-bit ioctls
    x86/x32: Fix the binutils auto-detect
    x32: Warn and disable rather than error if binutils too old
    x32: Only clear TIF_X32 flag once
    x32: Make sure TS_COMPAT is cleared for x32 tasks
    fs: Remove missed ->fds_bits from cessation use of fd_set structs internally
    fs: Fix close_on_exec pointer in alloc_fdtable
    x32: Drop non-__vdso weak symbols from the x32 VDSO
    x32: Fix coding style violations in the x32 VDSO code
    x32: Add x32 VDSO support
    x32: Allow x32 to be configured
    x32: If configured, add x32 system calls to system call tables
    x32: Handle process creation
    x32: Signal-related system calls
    x86: Add #ifdef CONFIG_COMPAT to
    ...

    Linus Torvalds
     
  • Pull more ARM updates from Russell King.

    This got a fair number of conflicts with the asm/system.h split, but
    also with some other sparse-irq and header file include cleanups. They
    all looked pretty trivial, though.

    * 'for-linus' of git://git.linaro.org/people/rmk/linux-arm: (59 commits)
    ARM: fix Kconfig warning for HAVE_BPF_JIT
    ARM: 7361/1: provide XIP_VIRT_ADDR for no-MMU builds
    ARM: 7349/1: integrator: convert to sparse irqs
    ARM: 7259/3: net: JIT compiler for packet filters
    ARM: 7334/1: add jump label support
    ARM: 7333/2: jump label: detect %c support for ARM
    ARM: 7338/1: add support for early console output via semihosting
    ARM: use set_current_blocked() and block_sigmask()
    ARM: exec: remove redundant set_fs(USER_DS)
    ARM: 7332/1: extract out code patch function from kprobes
    ARM: 7331/1: extract out insn generation code from ftrace
    ARM: 7330/1: ftrace: use canonical Thumb-2 wide instruction format
    ARM: 7351/1: ftrace: remove useless memory checks
    ARM: 7316/1: kexec: EOI active and mask all interrupts in kexec crash path
    ARM: Versatile Express: add NO_IOPORT
    ARM: get rid of asm/irq.h in asm/prom.h
    ARM: 7319/1: Print debug info for SIGBUS in user faults
    ARM: 7318/1: gic: refactor irq_start assignment
    ARM: 7317/1: irq: avoid NULL check in for_each_irq_desc loop
    ARM: 7315/1: perf: add support for the Cortex-A7 PMU
    ...

    Linus Torvalds
     
  • There is extra state information that needs to be exposed in the
    kgdb_bpt structure for tracking how a breakpoint was installed. The
    debug_core only uses probe_kernel_write() to install
    breakpoints, but this is not enough for all the archs. Some archs such
    as x86 need to use text_poke() in order to install a breakpoint into a
    read only page.

    Passing the kgdb_bpt structure to kgdb_arch_set_breakpoint() and
    kgdb_arch_remove_breakpoint() allows other archs to set the type
    variable which indicates how the breakpoint was installed.
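
    Why the whole structure has to be passed, rather than just an address,
    can be sketched as below. This is a userspace illustration only: the
    enum values, struct bpt and both functions are hypothetical, and the
    actual memory writes are left out.

```c
#include <stdbool.h>

/* How a breakpoint was written; hypothetical names for this sketch. */
enum bp_method { BP_NONE, BP_PLAIN_WRITE, BP_TEXT_POKE };

/* Cut-down stand-in for the breakpoint record described above. */
struct bpt {
	unsigned long addr;
	enum bp_method type;   /* recorded at install time */
};

/* Install: remember which path was needed for this page. */
int arch_set_breakpoint(struct bpt *bp, bool page_read_only)
{
	bp->type = page_read_only ? BP_TEXT_POKE : BP_PLAIN_WRITE;
	return 0;
}

/* Remove: report the method that must be used to undo the install,
 * then clear the record. */
enum bp_method arch_remove_breakpoint(struct bpt *bp)
{
	enum bp_method used = bp->type;

	bp->type = BP_NONE;
	return used;
}
```

    With only an address, the removal path would have no way to know that
    the read-only path was taken at install time.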

    Cc: stable@vger.kernel.org # >= 2.6.36
    Signed-off-by: Jason Wessel

    Jason Wessel
     
  • The Smatch tool warned that the change from commit b8adde8dd
    (kdb: Avoid using dbg_io_ops until it is initialized) should
    add another null check later in the kdb_printf().

    It is worth noting that the second use of dbg_io_ops->is_console
    is protected by the KDB_PAGER state variable which would only
    get set when kdb is fully active and initialized. If we
    ever encounter changes or defects in the KDB_PAGER state
    we do not want to crash the kernel in a kdb_printf/printk.

    CC: Tim Bird
    Reported-by: Dan Carpenter
    Signed-off-by: Jason Wessel

    Jason Wessel
     
  • Pull scheduler fixes from Ingo Molnar.

    * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    cpusets: Remove an unused variable
    sched/rt: Improve pick_next_highest_task_rt()
    sched: Fix select_fallback_rq() vs cpu_active/cpu_online
    sched/x86/smp: Do not enable IRQs over calibrate_delay()
    sched: Fix compiler warning about declared inline after use
    MAINTAINERS: Update email address for SCHEDULER and PERF EVENTS

    Linus Torvalds
     
  • Pull x86 updates from Ingo Molnar.

    This touches some non-x86 files due to the sanitized INLINE_SPIN_UNLOCK
    config usage.

    Fixed up trivial conflicts due to just header include changes (removing
    headers due to cpu_idle() merge clashing with the asm/system.h split).

    * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    x86/apic/amd: Be more verbose about LVT offset assignments
    x86, tls: Off by one limit check
    x86/ioapic: Add io_apic_ops driver layer to allow interception
    x86/olpc: Add debugfs interface for EC commands
    x86: Merge the x86_32 and x86_64 cpu_idle() functions
    x86/kconfig: Remove CONFIG_TR=y from the defconfigs
    x86: Stop recursive fault in print_context_stack after stack overflow
    x86/io_apic: Move and reenable irq only when CONFIG_GENERIC_PENDING_IRQ=y
    x86/apic: Add separate apic_id_valid() functions for selected apic drivers
    locking/kconfig: Simplify INLINE_SPIN_UNLOCK usage
    x86/kconfig: Update defconfigs
    x86: Fix excessive MSR print out when show_msr is not specified

    Linus Torvalds
     
  • Pull timer core updates from Thomas Gleixner.

    * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    ia64: vsyscall: Add missing paranthesis
    alarmtimer: Don't call rtc_timer_init() when CONFIG_RTC_CLASS=n
    x86: vdso: Put declaration before code
    x86-64: Inline vdso clock_gettime helpers
    x86-64: Simplify and optimize vdso clock_gettime monotonic variants
    kernel-time: fix s/then/than/ spelling errors
    time: remove no_sync_cmos_clock
    time: Avoid scary backtraces when warning of > 11% adj
    alarmtimer: Make sure we initialize the rtctimer
    ntp: Fix leap-second hrtimer livelock
    x86, tsc: Skip refined tsc calibration on systems with reliable TSC
    rtc: Provide flag for rtc devices that don't support UIE
    ia64: vsyscall: Use seqcount instead of seqlock
    x86: vdso: Use seqcount instead of seqlock
    x86: vdso: Remove bogus locking in update_vsyscall_tz()
    time: Remove bogus comments
    time: Fix change_clocksource locking
    time: x86: Fix race switching from vsyscall to non-vsyscall clock

    Linus Torvalds
     
  • The debugfs code is really generic for all platforms. This patch removes the
    powerpc-specific directory reference and makes it available to all
    architectures.

    Signed-off-by: Grant Likely

    Grant Likely
     

29 Mar, 2012

16 commits

  • We don't remove the cpu that went offline from our cpumasks
    on cpu hotplug. This got lost somewhere along the line, so
    restore it. This fixes a hang of the padata instance on cpu
    hotplug.

    Signed-off-by: Steffen Klassert
    Signed-off-by: Herbert Xu

    Steffen Klassert
     
  • We use the active cpumask to determine the superset of cpus
    to use for parallelization. However, the active cpumask is
    for internal usage of the scheduler and therefore not the
    appropriate cpumask for these purposes. So use the online
    cpumask instead.

    Reported-by: Peter Zijlstra
    Signed-off-by: Steffen Klassert
    Signed-off-by: Herbert Xu

    Steffen Klassert
     
  • Add a reference to the padata api documentation at Documentation/padata.txt

    Suggested-by: Peter Zijlstra
    Signed-off-by: Steffen Klassert
    Signed-off-by: Herbert Xu

    Steffen Klassert
     
  • Merge reason: It has not gone upstream via the ARM tree, merge it via
    the scheduler tree.

    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • Notify get_robust_list users that the syscall is going away.

    Suggested-by: Thomas Gleixner
    Signed-off-by: Kees Cook
    Cc: Randy Dunlap
    Cc: Darren Hart
    Cc: Peter Zijlstra
    Cc: Jiri Kosina
    Cc: Eric W. Biederman
    Cc: David Howells
    Cc: Serge E. Hallyn
    Cc: kernel-hardening@lists.openwall.com
    Cc: spender@grsecurity.net
    Link: http://lkml.kernel.org/r/20120323190855.GA27213@www.outflux.net
    Signed-off-by: Thomas Gleixner

    Kees Cook
     
  • It was possible to extract the robust list head address from a setuid
    process if it had used set_robust_list(), allowing an ASLR info leak. This
    changes the permission checks to be the same as those used for similar
    info that comes out of /proc.

    Running a setuid program that uses robust futexes would have had:
    cred->euid != pcred->euid
    cred->euid == pcred->uid
    so the old permissions check would allow it. I'm not aware of any setuid
    programs that use robust futexes, so this is just a preventative measure.
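
    The two comparisons above can be turned into a small userspace sketch
    of the old-style check. struct mini_cred and old_check_allows() are
    hypothetical names, and the capability override is deliberately
    omitted, so this is an approximation of the logic rather than the
    kernel code.

```c
#include <stdbool.h>

/* Cut-down credentials: real and effective user IDs only. */
struct mini_cred {
	unsigned int uid;
	unsigned int euid;
};

/* Old-style check as implied by the text: access is allowed if the
 * caller's euid matches either the target's euid or the target's uid. */
bool old_check_allows(const struct mini_cred *caller,
		      const struct mini_cred *target)
{
	return caller->euid == target->euid || caller->euid == target->uid;
}
```

    For a setuid-root program started by uid 1000, the target has uid 1000
    and euid 0, so a caller with euid 1000 passes the second comparison.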

    (This patch is based on changes from grsecurity.)

    Signed-off-by: Kees Cook
    Cc: Darren Hart
    Cc: Peter Zijlstra
    Cc: Jiri Kosina
    Cc: Eric W. Biederman
    Cc: David Howells
    Cc: Serge E. Hallyn
    Cc: kernel-hardening@lists.openwall.com
    Cc: spender@grsecurity.net
    Link: http://lkml.kernel.org/r/20120319231253.GA20893@www.outflux.net
    Signed-off-by: Thomas Gleixner

    Kees Cook
     
  • We respect node affinity of devices already in the irq descriptor
    allocation, but we ignore it for the initial interrupt affinity
    setup, so the interrupt might be routed to a different node.

    Restrict the default affinity mask to the node on which the irq
    descriptor is allocated.

    [ tglx: Massaged changelog ]

    Signed-off-by: Prarit Bhargava
    Acked-by: Neil Horman
    Cc: Yinghai Lu
    Cc: David Rientjes
    Link: http://lkml.kernel.org/r/1332788538-17425-1-git-send-email-prarit@redhat.com
    Signed-off-by: Thomas Gleixner

    Prarit Bhargava
     
  • The only place irq_finalize_oneshot() is called with force parameter set
    is the threaded handler error exit path. But IRQTF_RUNTHREAD is dropped
    at this point and irq_wake_thread() is not going to set it again,
    since PF_EXITING is set for this thread already. So irq_finalize_oneshot()
    will drop the thread's bit in threads_oneshot anyway and hence the force
    parameter is superfluous.

    Signed-off-by: Alexander Gordeev
    Link: http://lkml.kernel.org/r/20120321162234.GP24806@dhcp-26-207.brq.redhat.com
    Signed-off-by: Thomas Gleixner

    Alexander Gordeev
     
  • exit_irq_thread() clears the IRQTF_RUNTHREAD flag and then drops the thread's
    bit in desc->threads_oneshot. The bit must not be set again in between, and
    it is not, since irq_wake_thread() sees the PF_EXITING flag first and returns.

    Because of this, the order of checking the PF_EXITING and IRQTF_RUNTHREAD
    flags in irq_wake_thread() is important. This change just makes it more
    visible in the source code.

    Signed-off-by: Alexander Gordeev
    Link: http://lkml.kernel.org/r/20120321162212.GO24806@dhcp-26-207.brq.redhat.com
    Signed-off-by: Thomas Gleixner

    Alexander Gordeev
     
  • If schedule() is called from an interrupt handler, __schedule_bug()
    will call show_regs() with the registers saved during the
    interrupt handling done in do_IRQ(). This means we'll see the
    registers and the backtrace for the process that was interrupted
    and not the full backtrace explaining who called schedule().

    This is due to 838225b ("sched: use show_regs() to improve
    __schedule_bug() output", 2007-10-24) which improperly assumed
    that get_irq_regs() would return the registers for the current
    stack because it is being called from within an interrupt
    handler. Simply remove the show_regs() code so that we dump a
    backtrace for the interrupt handler that called schedule().

    [ I ran across this when I was presented with a scheduling while
    atomic log with a stacktrace pointing at spin_unlock_irqrestore().
    It made no sense and I had to guess what interrupt handler could
    be called and poke around for someone calling schedule() in an
    interrupt handler. A simple test of putting an msleep() in
    an interrupt handler works better with this patch because you
    can actually see the msleep() call in the backtrace. ]

    Also-reported-by: Chris Metcalf
    Signed-off-by: Stephen Boyd
    Cc: Satyam Sharma
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1332979847-27102-1-git-send-email-sboyd@codeaurora.org
    Signed-off-by: Ingo Molnar

    Stephen Boyd
     
  • This has been obsolescent for a while, fix documentation and
    misc comments.

    Signed-off-by: Rusty Russell

    Rusty Russell
     
  • Merge third batch of patches from Andrew Morton:
    - Some MM stragglers
    - core SMP library cleanups (on_each_cpu_mask)
    - Some IPI optimisations
    - kexec
    - kdump
    - IPMI
    - the radix-tree iterator work
    - various other misc bits.

    "That'll do for -rc1. I still have ~10 patches for 3.4, will send
    those along when they've baked a little more."

    * emailed from Andrew Morton: (35 commits)
    backlight: fix typo in tosa_lcd.c
    crc32: add help text for the algorithm select option
    mm: move hugepage test examples to tools/testing/selftests/vm
    mm: move slabinfo.c to tools/vm
    mm: move page-types.c from Documentation to tools/vm
    selftests/Makefile: make `run_tests' depend on `all'
    selftests: launch individual selftests from the main Makefile
    radix-tree: use iterators in find_get_pages* functions
    radix-tree: rewrite gang lookup using iterator
    radix-tree: introduce bit-optimized iterator
    fs/proc/namespaces.c: prevent crash when ns_entries[] is empty
    nbd: rename the nbd_device variable from lo to nbd
    pidns: add reboot_pid_ns() to handle the reboot syscall
    sysctl: use bitmap library functions
    ipmi: use locks on watchdog timeout set on reboot
    ipmi: simplify locking
    ipmi: fix message handling during panics
    ipmi: use a tasklet for handling received messages
    ipmi: increase KCS timeouts
    ipmi: decrease the IPMI message transaction time in interrupt mode
    ...

    Linus Torvalds
     
  • In the case of a child pid namespace, rebooting the system does not really
    make sense. When the pid namespace is used in conjunction with the other
    namespaces in order to create a linux container, the reboot syscall leads
    to some problems.

    A container can reboot the host. That can be fixed by dropping the
    sys_reboot capability, but then we are unable to correctly poweroff/
    halt/reboot a container, and the container stays stuck at shutdown time
    with the container's init process waiting indefinitely.

    After several attempts, no solution from userspace was found to reliably
    handle the shutdown from a container.

    This patch proposes to make the init process of the child pid namespace
    exit with a signal status set to SIGINT if the child pid namespace
    called "halt/poweroff" and SIGHUP if the child pid namespace called
    "reboot". When the reboot syscall is called and we are not in the initial
    pid namespace, we kill the pid namespace for "HALT", "POWEROFF",
    "RESTART", and "RESTART2". Otherwise we return EINVAL.

    Returning EINVAL is also an easy way to check if this feature is supported
    by the kernel when invoking another 'reboot' option like CAD.

    This way the parent process of the child pid namespace knows if it
    rebooted or not and can make the right decision.
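
    The command-to-signal mapping described above can be sketched as a
    plain function. reboot_exit_signal() is a hypothetical name for this
    illustration; it only models the decision, not the killing of the
    namespace.

```c
#include <errno.h>
#include <signal.h>
#include <linux/reboot.h>

/* Map a reboot command issued inside a child pid namespace to the signal
 * its init should appear to have died from, or -EINVAL if the command is
 * not handled for child namespaces. */
int reboot_exit_signal(unsigned int cmd)
{
	switch (cmd) {
	case LINUX_REBOOT_CMD_RESTART:
	case LINUX_REBOOT_CMD_RESTART2:
		return SIGHUP;
	case LINUX_REBOOT_CMD_HALT:
	case LINUX_REBOOT_CMD_POWER_OFF:
		return SIGINT;
	default:
		return -EINVAL;
	}
}
```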

    Test case:
    ==========

    #define _GNU_SOURCE /* needed for clone() */
    #include <alloca.h>
    #include <stdio.h>
    #include <sched.h>
    #include <unistd.h>
    #include <signal.h>
    #include <sys/reboot.h>
    #include <sys/types.h>
    #include <sys/wait.h>

    #include <linux/reboot.h>

    /* Runs as init of the new pid namespace and issues the reboot command. */
    static int do_reboot(void *arg)
    {
        int *cmd = arg;

        if (reboot(*cmd))
            printf("failed to reboot(%d): %m\n", *cmd);
        return 0;
    }

    /* Clone a child into a new pid namespace and check how it terminated. */
    int test_reboot(int cmd, int sig)
    {
        long stack_size = 4096;
        void *stack = alloca(stack_size) + stack_size;
        int status;
        pid_t ret;

        ret = clone(do_reboot, stack, CLONE_NEWPID | SIGCHLD, &cmd);
        if (ret < 0) {
            printf("failed to clone: %m\n");
            return -1;
        }

        if (wait(&status) < 0) {
            printf("unexpected wait error: %m\n");
            return -1;
        }

        if (!WIFSIGNALED(status)) {
            printf("child process exited but was not signaled\n");
            return -1;
        }

        if (WTERMSIG(status) != sig) {
            printf("signal termination is not the one expected\n");
            return -1;
        }

        return 0;
    }

    int main(int argc, char *argv[])
    {
        int status;

        status = test_reboot(LINUX_REBOOT_CMD_RESTART, SIGHUP);
        if (status < 0)
            return 1;
        printf("reboot(LINUX_REBOOT_CMD_RESTART) succeed\n");

        status = test_reboot(LINUX_REBOOT_CMD_RESTART2, SIGHUP);
        if (status < 0)
            return 1;
        printf("reboot(LINUX_REBOOT_CMD_RESTART2) succeed\n");

        status = test_reboot(LINUX_REBOOT_CMD_HALT, SIGINT);
        if (status < 0)
            return 1;
        printf("reboot(LINUX_REBOOT_CMD_HALT) succeed\n");

        status = test_reboot(LINUX_REBOOT_CMD_POWER_OFF, SIGINT);
        if (status < 0)
            return 1;
        printf("reboot(LINUX_REBOOT_CMD_POWER_OFF) succeed\n");

        /* CAD_ON is not handled for child namespaces, so this must fail. */
        status = test_reboot(LINUX_REBOOT_CMD_CAD_ON, -1);
        if (status >= 0) {
            printf("reboot(LINUX_REBOOT_CMD_CAD_ON) should have failed\n");
            return 1;
        }
        printf("reboot(LINUX_REBOOT_CMD_CAD_ON) has failed as expected\n");

        return 0;
    }

    [akpm@linux-foundation.org: tweak and add comments]
    [akpm@linux-foundation.org: checkpatch fixes]
    Signed-off-by: Daniel Lezcano
    Acked-by: Serge Hallyn
    Tested-by: Serge Hallyn
    Reviewed-by: Oleg Nesterov
    Cc: Michael Kerrisk
    Cc: "Eric W. Biederman"
    Cc: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Daniel Lezcano
     
  • Use bitmap_set() instead of using set_bit() for each bit. This conversion
    is valid because the bitmap is private in the function call and atomic
    bitops were unnecessary.

    This also includes a minor change:
    - Use bitmap_copy() for shorter typing
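
    The difference between setting bits one at a time and setting a range
    at once can be sketched in userspace as below. Both functions are
    written for this example and are not the kernel's set_bit() or
    bitmap_set(); in particular, the per-bit version here is not atomic.

```c
#include <limits.h>
#include <stddef.h>

#define WORD_BITS (sizeof(unsigned long) * CHAR_BIT)

/* Per-bit version: one read-modify-write per bit. */
void set_bits_one_by_one(unsigned long *map, size_t start, size_t nr)
{
	for (size_t i = start; i < start + nr; i++)
		map[i / WORD_BITS] |= 1UL << (i % WORD_BITS);
}

/* Range version in the spirit of bitmap_set(): one OR per word. */
void set_bits_range(unsigned long *map, size_t start, size_t nr)
{
	size_t end = start + nr;   /* one past the last bit */

	while (start < end) {
		size_t word = start / WORD_BITS;
		size_t off = start % WORD_BITS;
		size_t span = WORD_BITS - off;   /* bits left in this word */
		unsigned long mask;

		if (span > end - start)
			span = end - start;
		mask = (span == WORD_BITS) ? ~0UL
					   : ((1UL << span) - 1) << off;
		map[word] |= mask;
		start += span;
	}
}
```

    Both produce the same bitmap; the range version simply touches each
    word once, which is fine when no other context can see the bitmap.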

    Signed-off-by: Akinobu Mita
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Akinobu Mita
     
  • When using crashkernel=2M-256M, the kernel doesn't give any warning. This
    is misleading sometimes.

    Signed-off-by: Zhenzhong Duan
    Acked-by: "Eric W. Biederman"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Zhenzhong Duan
     
  • nommu platforms don't have very interesting swapper_pg_dir pointers and
    usually just #define them to NULL, meaning that we can't include them in
    the vmcoreinfo on the kexec crash path.

    This patch only saves the swapper_pg_dir if we have an MMU.

    Signed-off-by: Will Deacon
    Reviewed-by: Simon Horman
    Cc: "Eric W. Biederman"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Will Deacon