07 Aug, 2015

3 commits

  • The s-Par visornic driver, currently in staging, processes a queue being
    serviced by an s-Par service partition. We can get a message that
    something has happened with the Service Partition. When that happens, we
    must not access the channel until we get a message that the service
    partition is back again.

    The visornic driver has a thread for processing the channel. When we get
    the message, we need to be able to park the thread and then resume it
    when the problem clears.

    We can do this with kthread_park and unpark, but they are not exported
    from the kernel. This patch exports the needed functions.

    Signed-off-by: David Kershner
    Acked-by: Ingo Molnar
    Acked-by: Neil Horman
    Acked-by: Thomas Gleixner
    Cc: Richard Weinberger
    Cc: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Kershner
     
  • This function may copy the si_addr_lsb, si_lower and si_upper fields to
    user mode when they haven't been initialized, which can leak kernel
    stack data to user mode.

    Just checking the value of si_code is insufficient because the same
    si_code value is shared between multiple signals. This is solved by
    checking the value of si_signo in addition to si_code.

    Signed-off-by: Amanieu d'Antras
    Cc: Oleg Nesterov
    Cc: Ingo Molnar
    Cc: Russell King
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Amanieu d'Antras
     
  • This function can leak kernel stack data when the user siginfo_t has a
    positive si_code value. The top 16 bits of si_code describe which fields
    in the siginfo_t union are active, but they are treated inconsistently
    between copy_siginfo_from_user32, copy_siginfo_to_user32 and
    copy_siginfo_to_user.

    copy_siginfo_from_user32 is called from rt_sigqueueinfo and
    rt_tgsigqueueinfo in which the user has full control over the top 16 bits
    of si_code.

    This fixes the following information leaks:
    x86:   8 bytes leaked when sending a signal from a 32-bit process to
           itself. This leak grows to 16 bytes if the process uses x32.
           (si_code = __SI_CHLD)
    x86:   100 bytes leaked when sending a signal from a 32-bit process to
           a 64-bit process. (si_code = -1)
    sparc: 4 bytes leaked when sending a signal from a 32-bit process to a
           64-bit process. (si_code = any)

    parisc and s390 have similar bugs, but they are not vulnerable because
    rt_[tg]sigqueueinfo have checks that prevent sending a positive si_code
    to a different process. These bugs are also fixed for consistency.

    Signed-off-by: Amanieu d'Antras
    Cc: Oleg Nesterov
    Cc: Ingo Molnar
    Cc: Russell King
    Cc: Ralf Baechle
    Cc: Benjamin Herrenschmidt
    Cc: Chris Metcalf
    Cc: Paul Mackerras
    Cc: Michael Ellerman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Amanieu d'Antras
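
A minimal userspace sketch of the si_signo check described in the second entry above (the si_addr_lsb, si_lower and si_upper leak). The helper names and the MODEL_ constants are assumptions made for illustration and are not the kernel's code; the numeric values are stand-ins chosen so that one si_code value can arrive with more than one signal, which is the situation the commit message describes.

```c
#include <signal.h>
#include <stdbool.h>

/* Stand-in si_code values, defined locally so the sketch does not
 * depend on what a given libc exposes. */
enum {
    MODEL_SEGV_BNDERR   = 3,
    MODEL_BUS_MCEERR_AR = 4,
    MODEL_BUS_MCEERR_AO = 5
};

/* Insufficient: decides on si_code alone, so any signal that happens
 * to carry the same code value is treated as having si_addr_lsb. */
bool has_addr_lsb_code_only(int si_code)
{
    return si_code == MODEL_BUS_MCEERR_AR || si_code == MODEL_BUS_MCEERR_AO;
}

/* Checks si_signo in addition to si_code before the field is copied. */
bool has_addr_lsb(int si_signo, int si_code)
{
    return si_signo == SIGBUS &&
           (si_code == MODEL_BUS_MCEERR_AR || si_code == MODEL_BUS_MCEERR_AO);
}

/* Same idea for the si_lower/si_upper pair. */
bool has_bounds(int si_signo, int si_code)
{
    return si_signo == SIGSEGV && si_code == MODEL_SEGV_BNDERR;
}
```

With the code-only test, a SIGILL carrying code 4 would be treated as if si_addr_lsb had been filled in; the combined test copies the field only for SIGBUS.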
     

29 Jul, 2015

1 commit

  • We don't actually hold the module_mutex when calling find_module_all
    from module_kallsyms_lookup_name: that's because it's used by the oops
    code and we don't want to deadlock.

    However, access to the list read-only is safe if preempt is disabled,
    so we can weaken the assertion. Keep a strong version for external
    callers though.

    Fixes: 0be964be0d45 ("module: Sanitize RCU usage and locking")
    Reported-by: He Kuang
    Cc: stable@kernel.org
    Acked-by: Peter Zijlstra (Intel)
    Signed-off-by: Rusty Russell

    Rusty Russell
     

27 Jul, 2015

1 commit

  • Pull x86 fixes from Thomas Gleixner:
    "This update contains:

    - the manual revert of the SYSCALL32 changes which caused a
    regression

    - a fix for the MPX vma handling

    - three fixes for the ioremap 'is ram' checks.

    - PAT warning fixes

    - a trivial fix for the size calculation of TLB tracepoints

    - handle old EFI structures gracefully

    This also contains a PAT fix from Jan plus a revert thereof. Toshi
    explained why the code is correct"

    * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    x86/mm/pat: Revert 'Adjust default caching mode translation tables'
    x86/asm/entry/32: Revert 'Do not use R9 in SYSCALL32' commit
    x86/mm: Fix newly introduced printk format warnings
    mm: Fix bugs in region_is_ram()
    x86/mm: Remove region_is_ram() call from ioremap
    x86/mm: Move warning from __ioremap_check_ram() to the call site
    x86/mm/pat, drivers/media/ivtv: Move the PAT warning and replace WARN() with pr_warn()
    x86/mm/pat, drivers/infiniband/ipath: Replace WARN() with pr_warn()
    x86/mm/pat: Adjust default caching mode translation tables
    x86/fpu: Disable dependent CPU features on "noxsave"
    x86/mpx: Do not set ->vm_ops on MPX VMAs
    x86/mm: Add parenthesis for TLB tracepoint size calculation
    efi: Handle memory error structures produced based on old versions of standard

    Linus Torvalds
     

26 Jul, 2015

1 commit

  • Pull ftrace fix from Steven Rostedt:
    "Back in 3.16 the ftrace code was redesigned and cleaned up to remove
    the double iteration list (one for registered ftrace ops, and one for
    registered "global" ops), to just use one list. That simplified the
    code but also broke the function tracing filtering on pid.

    This updates the code to handle the filtering again with the new
    logic"

    * tag 'trace-v4.2-rc2-fix3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
    ftrace: Fix breakage of set_ftrace_pid

    Linus Torvalds
     

25 Jul, 2015

1 commit

  • Commit 4104d326b670 ("ftrace: Remove global function list and call function
    directly") simplified the ftrace code by removing the global_ops list with a
    new design. But this cleanup also broke the filtering of PIDs that are added
    to the set_ftrace_pid file.

    Add back the proper hooks to have pid filtering working once again.

    Cc: stable@vger.kernel.org # 3.16+
    Reported-by: Matt Fleming
    Reported-by: Richard Weinberger
    Tested-by: Matt Fleming
    Signed-off-by: Steven Rostedt

    Steven Rostedt (Red Hat)
     

22 Jul, 2015

1 commit

  • region_is_ram() looks up the iomem_resource table to check if
    a target range is in RAM. However, it always returns with -1
    due to invalid range checks. It always breaks the loop at the
    first entry of the table.

    Another issue is that it compares p->flags and flags, but the comparison
    always fails. flags is declared as int, which makes it a negative value
    with IORESOURCE_BUSY (0x80000000) set while p->flags is unsigned long.

    Fix the range check and flags so that region_is_ram() works as
    advertised.

    Signed-off-by: Toshi Kani
    Reviewed-by: Dan Williams
    Cc: Mike Travis
    Cc: Luis R. Rodriguez
    Cc: Andrew Morton
    Cc: Roland Dreier
    Cc: linux-mm@kvack.org
    Link: http://lkml.kernel.org/r/1437088996-28511-4-git-send-email-toshi.kani@hp.com
    Signed-off-by: Thomas Gleixner

    Toshi Kani
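
The flags mismatch described above can be reproduced outside the kernel. The following is a sketch under stated assumptions: struct resource_model and both helper functions are hypothetical, IORESOURCE_BUSY uses the value quoted in the commit message, and the IORESOURCE_MEM value is assumed for the example. The mismatch only shows up on ABIs where unsigned long is wider than int.

```c
#include <stdbool.h>

#define IORESOURCE_MEM  0x00000200UL   /* assumed value for the sketch */
#define IORESOURCE_BUSY 0x80000000UL   /* value quoted in the commit message */

/* Hypothetical stand-in for the resource entry being compared. */
struct resource_model {
    unsigned long flags;
};

/* Buggy variant: the wanted flags are held in an int. Converting a value
 * with bit 31 set to int is implementation-defined and yields a negative
 * number on common ABIs; converting that back to a wider unsigned long
 * sign-extends it, so the comparison fails. */
bool flags_match_buggy(const struct resource_model *p)
{
    int flags = (int)(IORESOURCE_MEM | IORESOURCE_BUSY);

    return p->flags == (unsigned long)flags;
}

/* Fixed variant: keep the wanted flags in the same type as p->flags. */
bool flags_match_fixed(const struct resource_model *p)
{
    unsigned long flags = IORESOURCE_MEM | IORESOURCE_BUSY;

    return p->flags == flags;
}
```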
     

19 Jul, 2015

4 commits

  • Pull x86 fixes from Ingo Molnar:
    "Two families of fixes:

    - Fix an FPU context related boot crash on newer x86 hardware with
    larger context sizes than what most people test. To fix this
    without ugly kludges or extensive reverts we had to touch core task
    allocator, to allow x86 to determine the task size dynamically, at
    boot time.

    I've tested it on a number of x86 platforms, and I cross-built it
    to a handful of architectures:

                               (warns)            (warns)
    testing  x86-64: -git: pass (   0), -tip: pass (   0)
    testing  x86-32: -git: pass (   0), -tip: pass (   0)
    testing     arm: -git: pass (1359), -tip: pass (1359)
    testing    cris: -git: pass (1031), -tip: pass (1031)
    testing    m32r: -git: pass (1135), -tip: pass (1135)
    testing    m68k: -git: pass (1471), -tip: pass (1471)
    testing    mips: -git: pass (1162), -tip: pass (1162)
    testing mn10300: -git: pass (1058), -tip: pass (1058)
    testing  parisc: -git: pass (1846), -tip: pass (1846)
    testing   sparc: -git: pass (1185), -tip: pass (1185)

    ... so I hope the cross-arch impact is 'none', as intended.

    (by Dave Hansen)

    - Fix various NMI handling related bugs unearthed by the big asm code
    rewrite and generally make the NMI code more robust and more
    maintainable while at it. These changes are a bit late in the
    cycle, I hope they are still acceptable.

    (by Andy Lutomirski)"

    * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    x86/fpu, sched: Introduce CONFIG_ARCH_WANTS_DYNAMIC_TASK_STRUCT and use it on x86
    x86/fpu, sched: Dynamically allocate 'struct fpu'
    x86/entry/64, x86/nmi/64: Add CONFIG_DEBUG_ENTRY NMI testing code
    x86/nmi/64: Make the "NMI executing" variable more consistent
    x86/nmi/64: Minor asm simplification
    x86/nmi/64: Use DF to avoid userspace RSP confusing nested NMI detection
    x86/nmi/64: Reorder nested NMI checks
    x86/nmi/64: Improve nested NMI comments
    x86/nmi/64: Switch stacks on userspace NMI entry
    x86/nmi/64: Remove asm code that saves CR2
    x86/nmi: Enable nested do_nmi() handling for 64-bit kernels

    Linus Torvalds
     
  • Pull timer fix from Ingo Molnar:
    "Fix for a misplaced export that can cause build failures in certain
    (rare) Kconfig situations"

    * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    tick: Move the export of tick_broadcast_oneshot_control to the proper place

    Linus Torvalds
     
  • Pull scheduler fix from Ingo Molnar:
    "A oneliner rq throttling fix"

    * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    sched/fair: Test list head instead of list entry in throttle_cfs_rq()

    Linus Torvalds
     
  • Pull irq fixes from Ingo Molnar:
    "Misc irq fixes:

    - two driver fixes
    - a Xen regression fix
    - a nested irq thread crash fix"

    * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    irqchip/gicv3-its: Fix mapping of LPIs to collections
    genirq: Prevent resend to interrupts marked IRQ_NESTED_THREAD
    genirq: Revert sparse irq locking around __cpu_up() and move it to x86 for now
    gpio/davinci: Fix race in installing chained irq handler

    Linus Torvalds
     

18 Jul, 2015

2 commits

  • Don't burden architectures without dynamic task_struct sizing
    with the overhead of dynamic sizing.

    Also optimize the x86 code a bit by caching task_struct_size.

    Acked-and-Tested-by: Dave Hansen
    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Brian Gerst
    Cc: Dave Hansen
    Cc: Denys Vlasenko
    Cc: Linus Torvalds
    Cc: Oleg Nesterov
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/1437128892-9831-3-git-send-email-mingo@kernel.org
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • The FPU rewrite removed the dynamic allocations of 'struct fpu'.
    But, this potentially wastes massive amounts of memory (2k per
    task on systems that do not have AVX-512 for instance).

    Instead of having a separate slab, this patch just appends the
    space that we need to the 'task_struct' which we dynamically
    allocate already. This saves us from doing an extra slab
    allocation at fork().

    The only real downside here is that we have to stick everything
    at the end of the task_struct. But, I think the
    BUILD_BUG_ON()s I stuck in there should keep that from being too
    fragile.

    Signed-off-by: Dave Hansen
    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Brian Gerst
    Cc: Dave Hansen
    Cc: Denys Vlasenko
    Cc: H. Peter Anvin
    Cc: Linus Torvalds
    Cc: Oleg Nesterov
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/1437128892-9831-2-git-send-email-mingo@kernel.org
    Signed-off-by: Ingo Molnar

    Dave Hansen
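
A userspace sketch of the layout idea in the entry above: variable-size state is appended to the end of the task structure, the total size is computed once and cached, and each task needs a single allocation. All names here are hypothetical. The sketch uses a C flexible array member, which the language itself requires to be the last member, so it stands in for the BUILD_BUG_ON() checks the commit message mentions rather than reproducing them.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for task_struct. */
struct task_model {
    int pid;
    /* Variable-size state; a flexible array member must come last. */
    unsigned char fpu_state[];
};

/* Computed once at startup and cached, so the allocation path does not
 * recompute it. */
static size_t task_struct_size;

void task_size_init(size_t fpu_state_size)
{
    task_struct_size = sizeof(struct task_model) + fpu_state_size;
}

size_t task_size(void)
{
    return task_struct_size;
}

/* One allocation covers the fixed part and the appended state. Returns
 * NULL if the size has not been initialised or the allocation fails. */
struct task_model *alloc_task(int pid)
{
    struct task_model *t;

    if (task_struct_size < sizeof(struct task_model))
        return NULL;

    t = calloc(1, task_struct_size);
    if (t)
        t->pid = pid;
    return t;
}
```

A caller would run task_size_init() once with the state size detected at startup and then use alloc_task() for every new task.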
     

17 Jul, 2015

1 commit

  • The resend mechanism happily calls the interrupt handler of interrupts
    which are marked IRQ_NESTED_THREAD from softirq context. This can
    result in crashes because the interrupt handler is not the proper way
    to invoke the device handlers. They must be invoked via
    handle_nested_irq.

    Prevent the resend even if the interrupt has no valid parent irq
    set. It's better to have a lost interrupt than a crashing machine.

    Reported-by: Uwe Kleine-König
    Signed-off-by: Thomas Gleixner
    Cc: stable@vger.kernel.org

    Thomas Gleixner
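
The decision described above can be sketched as a small pure function. This is an illustration only: struct irq_desc_model and resend_target() are hypothetical, and redirecting the resend to a parent irq when one is set is an assumption of the sketch rather than something stated in the commit message.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the parts of an interrupt descriptor that
 * matter for the resend decision. */
struct irq_desc_model {
    unsigned int irq;         /* this interrupt's number */
    bool nested_thread;       /* marked IRQ_NESTED_THREAD */
    unsigned int parent_irq;  /* 0 means no valid parent is set */
};

/* Decide which interrupt, if any, may be resent from softirq context.
 * Returns false when the resend must be dropped. */
bool resend_target(const struct irq_desc_model *desc, unsigned int *irq_out)
{
    if (desc->nested_thread) {
        /* Never call the nested interrupt's handler directly; without
         * a parent to retrigger, losing the interrupt is preferred. */
        if (!desc->parent_irq)
            return false;
        *irq_out = desc->parent_irq;
        return true;
    }
    *irq_out = desc->irq;
    return true;
}
```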
     

16 Jul, 2015

1 commit

  • Pull tracing fix from Steven Rostedt:
    "Fengguang Wu discovered a crash that happened to be because of the
    branch tracer (traces unlikely and likely branches) when enabled with
    certain debug options.

    What happened was that various debug options like lockdep and
    DEBUG_PREEMPT can cause parts of the branch tracer to recurse outside
    its recursion protection. In fact, part of its recursion protection
    used these features that caused the lockup. This cleans up the code a
    little and makes the recursion protection a bit more robust"

    * tag 'trace-v4.2-rc1-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
    tracing: Have branch tracer use recursive field of task struct

    Linus Torvalds
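
The commit pulled here, "tracing: Have branch tracer use recursive field of task struct", guards the tracer with state kept in the current task. A minimal userspace sketch of that kind of guard follows; the bit name, the task structure and the debug hook are all hypothetical, and a single global stands in for the current task.

```c
#include <stdbool.h>

#define TRACE_BRANCH_BIT (1UL << 0)   /* hypothetical bit number */

/* Hypothetical stand-in for the per-task recursion field. */
struct task_model {
    unsigned long trace_recursion;
};

struct task_model current_task;   /* stands in for the current task */
int events_recorded;

void trace_branch(int depth);

/* Stand-in for debug code that itself runs into traced branches. */
static void debug_hook(int depth)
{
    if (depth > 0)
        trace_branch(depth - 1);
}

void trace_branch(int depth)
{
    /* Bit already set: we are being re-entered, so do not trace. */
    if (current_task.trace_recursion & TRACE_BRANCH_BIT)
        return;

    current_task.trace_recursion |= TRACE_BRANCH_BIT;
    debug_hook(depth);            /* may call back into trace_branch() */
    events_recorded++;
    current_task.trace_recursion &= ~TRACE_BRANCH_BIT;
}
```

Because the guard lives in plain task state, checking it does not go through any helper that could itself be traced.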
     

15 Jul, 2015

1 commit

  • Boris reported that the sparse_irq protection around __cpu_up() in the
    generic code causes a regression on Xen. Xen allocates interrupts and
    some more in the xen_cpu_up() function, so it deadlocks on the
    sparse_irq_lock.

    There is no simple fix for this and we really should have the
    protection for all architectures, but for now the only solution is to
    move it to x86 where actual wreckage due to the lack of protection has
    been observed.

    Reported-and-tested-by: Boris Ostrovsky
    Fixes: a89941816726 'hotplug: Prevent alloc/free of irq descriptors during cpu up/down'
    Signed-off-by: Thomas Gleixner
    Cc: Peter Zijlstra
    Cc: xiao jin
    Cc: Joerg Roedel
    Cc: Borislav Petkov
    Cc: Yanmin Zhang
    Cc: xen-devel

    Thomas Gleixner
     

14 Jul, 2015

1 commit


13 Jul, 2015

2 commits

  • Pull timer fixes from Thomas Gleixner:
    "This update from the timer department contains:

    - A series of patches which address a shortcoming in the tick
    broadcast code.

    If the broadcast device is not available or an hrtimer emulated
    broadcast device, some of the original assumptions lead to boot
    failures. I rather plugged all of the corner cases instead of only
    addressing the issue reported, so the change got a little larger.

    Has been extensively tested on x86 and arm.

    - Get rid of the last holdouts using do_posix_clock_monotonic_gettime()

    - A regression fix for the imx clocksource driver

    - An update to the new state callbacks mechanism for clockevents.
    This is required to simplify the conversion, which will take place
    in 4.3"

    * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    tick/broadcast: Prevent NULL pointer dereference
    time: Get rid of do_posix_clock_monotonic_gettime
    cris: Replace do_posix_clock_monotonic_gettime()
    tick/broadcast: Unbreak CONFIG_GENERIC_CLOCKEVENTS=n build
    tick/broadcast: Handle spurious interrupts gracefully
    tick/broadcast: Check for hrtimer broadcast active early
    tick/broadcast: Return busy when IPI is pending
    tick/broadcast: Return busy if periodic mode and hrtimer broadcast
    tick/broadcast: Move the check for periodic mode inside state handling
    tick/broadcast: Prevent deep idle if no broadcast device available
    tick/broadcast: Make idle check independent from mode and config
    tick/broadcast: Sanity check the shutdown of the local clock_event
    tick/broadcast: Prevent hrtimer recursion
    clockevents: Allow set-state callbacks to be optional
    clocksource/imx: Define clocksource for mx27

    Linus Torvalds
     
  • Pull irq fix from Thomas Gleixner:
    "A single fix for a cpu hotplug race vs. interrupt descriptors:

    Prevent irq setup/teardown across the cpu starting/dying parts of cpu
    hotplug so that the starting/dying cpu has a stable view of the
    descriptor space. This has been an issue for all architectures in the
    cpu dying phase, where interrupts are migrated away from the dying
    cpu. In the starting phase it's mostly an x86 issue vs the vector space
    update"

    * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    hotplug: Prevent alloc/free of irq descriptors during cpu up/down

    Linus Torvalds
     

11 Jul, 2015

1 commit


09 Jul, 2015

2 commits

  • The load_module() error path frees a module but forgot to take it out
    of the mod_tree, leaving a dangling entry in the tree, causing havoc.

    Cc: Mathieu Desnoyers
    Reported-by: Arthur Marsh
    Tested-by: Arthur Marsh
    Fixes: 93c2e105f6bc ("module: Optimize __module_address() using a latched RB-tree")
    Signed-off-by: Peter Zijlstra (Intel)
    Signed-off-by: Rusty Russell

    Peter Zijlstra
     
  • The "fix" in commit 0b08c5e5944 ("audit: Fix check of return value of
    strnlen_user()") didn't fix anything, it broke things. As reported by
    Steven Rostedt:

    "Yes, strnlen_user() returns 0 on fault, but if you look at what len is
    set to, then you would notice that on fault len would be -1"

    because we just subtracted one from the return value. So testing
    against 0 doesn't test for a fault condition, it tests against a
    perfectly valid empty string.

    Also fix up the usual braindamage wrt using WARN_ON() inside a
    conditional - make it part of the conditional and remove the explicit
    unlikely() (which is already part of the WARN_ON*() logic, exactly so
    that you don't have to write unreadable code).

    Reported-and-tested-by: Steven Rostedt
    Cc: Jan Kara
    Cc: Paul Moore
    Signed-off-by: Linus Torvalds

    Linus Torvalds
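
The off-by-one reasoning in the audit fix above can be checked with a small userspace model. This is a sketch under assumptions: strnlen_user_model() is a hypothetical stand-in that returns the length including the terminating NUL and 0 on a fault (a NULL pointer models the fault), and the MAX_ARG_STRLEN value is made up for the example.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_ARG_STRLEN 4096   /* made-up limit for the sketch */

/* Stand-in for strnlen_user(): length including the NUL, 0 on fault. */
long strnlen_user_model(const char *s, long max)
{
    long n = 0;

    if (!s)
        return 0;               /* models a fault */
    while (n < max && s[n] != '\0')
        n++;
    return n + 1;
}

/* The broken "fix": after subtracting one, 0 means an empty string,
 * not a fault, so valid input is rejected and faults slip through. */
bool arg_len_ok_buggy(const char *p)
{
    long len = strnlen_user_model(p, MAX_ARG_STRLEN) - 1;

    return !(len == 0 || len > MAX_ARG_STRLEN - 1);
}

/* A fault shows up as len == -1 after the subtraction. */
bool arg_len_ok_fixed(const char *p)
{
    long len = strnlen_user_model(p, MAX_ARG_STRLEN) - 1;

    return !(len < 0 || len > MAX_ARG_STRLEN - 1);
}
```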
     

08 Jul, 2015

11 commits

  • Fengguang Wu's tests triggered a bug in the branch tracer's start up
    test when CONFIG_DEBUG_PREEMPT is set. This was because that config
    adds some debug logic in the per cpu field, which calls back into
    the branch tracer.

    The branch tracer has its own recursive checks, but uses a per cpu
    variable to implement it. If retrieving the per cpu variable calls
    back into the branch tracer, you can see how things will break.

    Instead of using a per cpu variable, use the trace_recursion field
    of the current task struct. Simply set a bit when entering the
    branch tracing and clear it when leaving. If the bit is set on
    entry, just don't do the tracing.

    There's also the case with lockdep, as the local_irq_save() called
    before the recursion can also trigger code that can call back into
    the function. Changing that to a raw_local_irq_save() will protect
    that as well.

    This prevents the recursion and the inevitable crash that follows.

    Link: http://lkml.kernel.org/r/20150630141803.GA28071@wfg-t540p.sh.intel.com

    Cc: stable@vger.kernel.org # 3.10+
    Reported-by: Fengguang Wu
    Tested-by: Fengguang Wu
    Signed-off-by: Steven Rostedt

    Steven Rostedt (Red Hat)
     
  • When a cpu goes up some architectures (e.g. x86) have to walk the irq
    space to set up the vector space for the cpu. While this needs extra
    protection at the architecture level we can avoid a few race
    conditions by preventing the concurrent allocation/free of irq
    descriptors and the associated data.

    When a cpu goes down it moves the interrupts which are targeted to
    this cpu away by reassigning the affinities. While this happens
    interrupts can be allocated and freed, which opens a can of race
    conditions in the code which reassigns the affinities because
    interrupt descriptors might be freed underneath.

    Example:

    CPU1                            CPU2
    cpu_up/down
     irq_desc = irq_to_desc(irq);
                                    remove_from_radix_tree(desc);
     raw_spin_lock(&desc->lock);
                                    free(desc);

    We could protect the irq descriptors with RCU, but that would require
    a full tree change of all accesses to interrupt descriptors. But
    fortunately these kinds of race conditions are rather limited to a few
    things like cpu hotplug. The normal setup/teardown is very well
    serialized. So the simpler and obvious solution is:

    Prevent allocation and freeing of interrupt descriptors across cpu
    hotplug.

    Signed-off-by: Thomas Gleixner
    Acked-by: Peter Zijlstra
    Cc: xiao jin
    Cc: Joerg Roedel
    Cc: Borislav Petkov
    Cc: Yanmin Zhang
    Link: http://lkml.kernel.org/r/20150705171102.063519515@linutronix.de

    Thomas Gleixner
     
  • Andriy reported that on a virtual machine the warning about negative
    expiry time in the clock events programming code triggered:

    hpet: hpet0 irq 40 for MSI
    hpet: hpet1 irq 41 for MSI
    Switching to clocksource hpet
    WARNING: at kernel/time/clockevents.c:239

    [] clockevents_program_event+0xdb/0xf0
    [] tick_handle_periodic_broadcast+0x41/0x50
    [] timer_interrupt+0x15/0x20

    When the second hpet is installed as a per cpu timer the broadcast
    event is no longer required and stopped, which sets the next_evt of
    the broadcast device to KTIME_MAX.

    If after that a spurious interrupt happens on the broadcast device,
    then the current code blindly handles it and tries to reprogram the
    broadcast device afterwards, which adds the period to
    next_evt. KTIME_MAX + period results in a negative expiry value
    causing the WARN_ON in the clockevents code to trigger.

    Add a proper check for the state of the broadcast device into the
    interrupt handler and return if the interrupt is spurious.

    [ Folded in pointer fix from Sudeep ]

    Reported-by: Andriy Gapon
    Signed-off-by: Thomas Gleixner
    Cc: Sudeep Holla
    Cc: Peter Zijlstra
    Cc: Preeti U Murthy
    Link: http://lkml.kernel.org/r/20150705205221.802094647@linutronix.de

    Thomas Gleixner
     
  • If the current cpu is the one which has the hrtimer based broadcast
    queued then we better return busy immediately instead of going through
    loops and hoops to figure that out.

    [ Split out from a larger combo patch ]

    Tested-by: Sudeep Holla
    Signed-off-by: Thomas Gleixner
    Cc: Suzuki Poulose
    Cc: Lorenzo Pieralisi
    Cc: Catalin Marinas
    Cc: Rafael J. Wysocki
    Cc: Peter Zijlstra
    Cc: Preeti U Murthy
    Cc: Ingo Molnar
    Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1507070929360.3916@nanos

    Thomas Gleixner
     
  • Tell the idle code not to go deep if the broadcast IPI is about to
    arrive.

    [ Split out from a larger combo patch ]

    Tested-by: Sudeep Holla
    Signed-off-by: Thomas Gleixner
    Cc: Suzuki Poulose
    Cc: Lorenzo Pieralisi
    Cc: Catalin Marinas
    Cc: Rafael J. Wysocki
    Cc: Peter Zijlstra
    Cc: Preeti U Murthy
    Cc: Ingo Molnar
    Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1507070929360.3916@nanos

    Thomas Gleixner
     
  • If the system is in periodic mode and the broadcast device is hrtimer
    based, return busy as we have no proper handling for this.

    [ Split out from a larger combo patch ]

    Tested-by: Sudeep Holla
    Signed-off-by: Thomas Gleixner
    Cc: Suzuki Poulose
    Cc: Lorenzo Pieralisi
    Cc: Catalin Marinas
    Cc: Rafael J. Wysocki
    Cc: Peter Zijlstra
    Cc: Preeti U Murthy
    Cc: Ingo Molnar
    Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1507070929360.3916@nanos

    Thomas Gleixner
     
  • We need to check more than the periodic mode for proper operation in
    all runtime combinations. To avoid code duplication move the check
    into the enter state handling.

    No functional change.

    [ Split out from a larger combo patch ]

    Reported-and-tested-by: Sudeep Holla
    Signed-off-by: Thomas Gleixner
    Cc: Suzuki Poulose
    Cc: Lorenzo Pieralisi
    Cc: Catalin Marinas
    Cc: Rafael J. Wysocki
    Cc: Peter Zijlstra
    Cc: Preeti U Murthy
    Cc: Ingo Molnar
    Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1507070929360.3916@nanos

    Thomas Gleixner
     
  • Add a check for an installed broadcast device to the oneshot control
    function and return busy if not.

    [ Split out from a larger combo patch ]

    Reported-and-tested-by: Sudeep Holla
    Signed-off-by: Thomas Gleixner
    Cc: Suzuki Poulose
    Cc: Lorenzo Pieralisi
    Cc: Catalin Marinas
    Cc: Rafael J. Wysocki
    Cc: Peter Zijlstra
    Cc: Preeti U Murthy
    Cc: Ingo Molnar
    Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1507070929360.3916@nanos

    Thomas Gleixner
     
  • Currently the broadcast busy check, which prevents the idle code from
    going into deep idle, works only in one shot mode.

    If NOHZ and HIGHRES are off (config or command line) there is no
    sanity check at all, so under certain conditions cpus are allowed to
    go into deep idle, where the local timer stops, and are not woken up
    again because there is no broadcast timer installed or a hrtimer based
    broadcast device is not evaluated.

    Move tick_broadcast_oneshot_control() into the common code and provide
    proper subfunctions for the various config combinations.

    The common check in tick_broadcast_oneshot_control() is for the C3STOP
    misfeature flag of the local clock event device. If it's not set, idle
    can proceed. If set, further checks are necessary.

    Provide checks for the trivial cases:

    - If broadcast is disabled in the config, then return busy

    - If oneshot mode (NOHZ/HIGHRES) is disabled in the config, return
      busy if the broadcast device is hrtimer based.

    - If oneshot mode is enabled in the config call the original
      tick_broadcast_oneshot_control() function. That function needs
      extra checks which will be implemented in separate patches.

    [ Split out from a larger combo patch ]

    Reported-and-tested-by: Sudeep Holla
    Signed-off-by: Thomas Gleixner
    Cc: Suzuki Poulose
    Cc: Lorenzo Pieralisi
    Cc: Catalin Marinas
    Cc: Rafael J. Wysocki
    Cc: Peter Zijlstra
    Cc: Preeti U Murthy
    Cc: Ingo Molnar
    Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1507070929360.3916@nanos

    Thomas Gleixner
     
  • The broadcast code shuts down the local clock event unconditionally
    even if no broadcast device is installed or if the broadcast device is
    hrtimer based.

    Add proper sanity checks.

    [ Split out from a larger combo patch ]

    Reported-and-tested-by: Sudeep Holla
    Signed-off-by: Thomas Gleixner
    Cc: Suzuki Poulose
    Cc: Lorenzo Pieralisi
    Cc: Catalin Marinas
    Cc: Rafael J. Wysocki
    Cc: Peter Zijlstra
    Cc: Preeti U Murthy
    Cc: Ingo Molnar
    Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1507070929360.3916@nanos

    Thomas Gleixner
     
  • The hrtimer based broadcast vehicle can cause a hrtimer recursion
    which went unnoticed until we changed the hrtimer expiry code to keep
    track of the currently running timer.

    local_timer_interrupt()
      local_handler()
        hrtimer_interrupt()
          expire_hrtimers()
            broadcast_hrtimer()
              send_ipis()
                local_handler()
                  hrtimer_interrupt()
                    ....

    Solution is simple: Prevent the local handler call from the broadcast
    code when the broadcast 'device' is hrtimer based.

    [ Split out from a larger combo patch ]

    Tested-by: Sudeep Holla
    Signed-off-by: Thomas Gleixner
    Cc: Suzuki Poulose
    Cc: Lorenzo Pieralisi
    Cc: Catalin Marinas
    Cc: Rafael J. Wysocki
    Cc: Peter Zijlstra
    Cc: Preeti U Murthy
    Cc: Ingo Molnar
    Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1507070929360.3916@nanos

    Thomas Gleixner
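
The recursion in the last entry above, and the way the fix cuts it, can be modelled in a few lines. Everything in this sketch is hypothetical: struct tick_model, the guard flag used to switch the fix on and off, and the depth cap that keeps the unguarded variant from recursing forever.

```c
#include <stdbool.h>

#define DEPTH_CAP 8   /* stops the unguarded model from recursing forever */

struct tick_model {
    bool bc_is_hrtimer;   /* broadcast 'device' is hrtimer based */
    bool guard;           /* apply the fix from the commit */
    int depth;            /* current nesting of local_handler() */
    int max_depth;        /* deepest nesting seen */
};

void local_handler(struct tick_model *t);

/* Broadcast expiry: the current CPU is served by calling the local
 * handler directly, unless the fix forbids it for hrtimer broadcast. */
void broadcast_expired(struct tick_model *t)
{
    bool run_local = !(t->guard && t->bc_is_hrtimer);

    if (run_local)
        local_handler(t);
}

/* Local timer handler: expires hrtimers, and an hrtimer based
 * broadcast 'device' is one of those timers. */
void local_handler(struct tick_model *t)
{
    if (t->depth >= DEPTH_CAP)
        return;
    t->depth++;
    if (t->depth > t->max_depth)
        t->max_depth = t->depth;
    if (t->bc_is_hrtimer)
        broadcast_expired(t);
    t->depth--;
}
```

With the guard enabled the handler nests once; without it the model only stops at the artificial cap.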
     

07 Jul, 2015

1 commit

  • It's mandatory for the drivers to provide set_state_{oneshot|periodic}()
    (only if related modes are supported) and set_state_shutdown() callbacks
    today, if they are implementing the new set-state interface.

    But this leads to unnecessary noop callbacks for drivers which don't
    want to implement them. On top of that, it leads to a full function call
    for nothing really useful.

    Let's make all set-state callbacks optional.

    Suggested-by: Daniel Lezcano
    Signed-off-by: Viresh Kumar
    Signed-off-by: Daniel Lezcano
    Link: http://lkml.kernel.org/r/1436256875-15562-1-git-send-email-daniel.lezcano@linaro.org
    Signed-off-by: Thomas Gleixner

    Viresh Kumar
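
A sketch of what optional callbacks look like from the caller's side: the core tests each function pointer before calling it and treats a missing callback as success. The structure, the enum and switch_state() are hypothetical names for this example and do not reproduce the kernel's clockevents code.

```c
#include <stddef.h>

enum clock_state { STATE_SHUTDOWN, STATE_PERIODIC, STATE_ONESHOT };

/* Hypothetical device with optional set-state callbacks. */
struct clock_event_model {
    int (*set_state_shutdown)(struct clock_event_model *dev);
    int (*set_state_periodic)(struct clock_event_model *dev);
    int (*set_state_oneshot)(struct clock_event_model *dev);
    int calls;   /* bookkeeping for the example callback below */
};

/* Example driver callback: only counts how often it ran. */
int count_call(struct clock_event_model *dev)
{
    dev->calls++;
    return 0;
}

/* Switch state; a driver that left the callback NULL costs no call. */
int switch_state(struct clock_event_model *dev, enum clock_state state)
{
    int (*cb)(struct clock_event_model *) = NULL;

    switch (state) {
    case STATE_SHUTDOWN:
        cb = dev->set_state_shutdown;
        break;
    case STATE_PERIODIC:
        cb = dev->set_state_periodic;
        break;
    case STATE_ONESHOT:
        cb = dev->set_state_oneshot;
        break;
    }
    return cb ? cb(dev) : 0;
}
```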
     

06 Jul, 2015

2 commits

  • According to the comments, we need to test if this is
    the first throttled task. However, list_empty() tests
    the entry cfs_rq->throttled_list, not the head, which is wrong.

    This is a bug because we don't re-init the list entry after
    removing it from the list, so list_empty() could return false
    even if the list is really empty.

    Signed-off-by: Cong Wang
    Signed-off-by: Cong Wang
    Signed-off-by: Peter Zijlstra (Intel)
    Reviewed-by: Ben Segall
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/1435174907-432-1-git-send-email-xiyou.wangcong@gmail.com
    Signed-off-by: Ingo Molnar

    Cong Wang
     
  • It's currently possible to drop the last refcount to the aux buffer
    from NMI context, which results in the expected fireworks.

    The refcounting needs a bigger overhaul, but to cure the immediate
    problem, delay the freeing by using an irq_work.

    Reviewed-and-tested-by: Alexander Shishkin
    Reported-by: Vince Weaver
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Arnaldo Carvalho de Melo
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/20150618103249.GK19282@twins.programming.kicks-ass.net
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
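
For the throttle_cfs_rq() entry above (the first of this date), the difference between testing the list head and testing an entry can be shown with a minimal circular list. This is a userspace sketch in the style of the kernel's list helpers, not the kernel code itself; list_del() here leaves NULL pointers in the removed entry to model an entry that is not re-initialised.

```c
#include <stdbool.h>
#include <stddef.h>

struct list_head {
    struct list_head *next, *prev;
};

void list_init(struct list_head *head)
{
    head->next = head->prev = head;
}

/* Insert entry right after head. */
void list_add(struct list_head *entry, struct list_head *head)
{
    entry->next = head->next;
    entry->prev = head;
    head->next->prev = entry;
    head->next = entry;
}

/* Unlink entry without re-initialising it. */
void list_del(struct list_head *entry)
{
    entry->prev->next = entry->next;
    entry->next->prev = entry->prev;
    entry->next = entry->prev = NULL;
}

/* Only meaningful when called on the list head. */
bool list_empty(const struct list_head *head)
{
    return head->next == head;
}
```

After the only entry is removed, list_empty() on the head reports an empty list, while the same test on the removed entry returns false because its pointers no longer point at itself.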
     

05 Jul, 2015

2 commits

  • Pull more vfs updates from Al Viro:
    "Assorted VFS fixes and related cleanups (IMO the most interesting in
    that part are f_path-related things and Eric's descriptor-related
    stuff). UFS regression fixes (it got broken last cycle). 9P fixes.
    fs-cache series, DAX patches, Jan's file_remove_suid() work"

    [ I'd say this is much more than "fixes and related cleanups". The
    file_table locking rule change by Eric Dumazet is a rather big and
    fundamental update even if the patch isn't huge. - Linus ]

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (49 commits)
    9p: cope with bogus responses from server in p9_client_{read,write}
    p9_client_write(): avoid double p9_free_req()
    9p: forgetting to cancel request on interrupted zero-copy RPC
    dax: bdev_direct_access() may sleep
    block: Add support for DAX reads/writes to block devices
    dax: Use copy_from_iter_nocache
    dax: Add block size note to documentation
    fs/file.c: __fget() and dup2() atomicity rules
    fs/file.c: don't acquire files->file_lock in fd_install()
    fs:super:get_anon_bdev: fix race condition could cause dev exceed its upper limitation
    vfs: avoid creation of inode number 0 in get_next_ino
    namei: make set_root_rcu() return void
    make simple_positive() public
    ufs: use dir_pages instead of ufs_dir_pages()
    pagemap.h: move dir_pages() over there
    remove the pointless include of lglock.h
    fs: cleanup slight list_entry abuse
    xfs: Correctly lock inode when removing suid and file capabilities
    fs: Call security_ops->inode_killpriv on truncate
    fs: Provide function telling whether file_remove_privs() will do anything
    ...

    Linus Torvalds
     
  • Pull kvm fixes from Paolo Bonzini:
    "Except for the preempt notifiers fix, these are all small bugfixes
    that could have waited for -rc2. Sending them now since I was
    taking care of Peter's patch anyway"

    * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
    kvm: add hyper-v crash msrs values
    KVM: x86: remove data variable from kvm_get_msr_common
    KVM: s390: virtio-ccw: don't overwrite config space values
    KVM: x86: keep track of LVT0 changes under APICv
    KVM: x86: properly restore LVT0
    KVM: x86: make vapics_in_nmi_mode atomic
    sched, preempt_notifier: separate notifier registration from static_key inc/dec

    Linus Torvalds
     

04 Jul, 2015

1 commit

  • Pull scheduler fixes from Ingo Molnar:
    "Debug info and other statistics fixes and related enhancements"

    * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    sched/numa: Fix numa balancing stats in /proc/pid/sched
    sched/numa: Show numa_group ID in /proc/sched_debug task listings
    sched/debug: Move print_cfs_rq() declaration to kernel/sched/sched.h
    sched/stat: Expose /proc/pid/schedstat if CONFIG_SCHED_INFO=y
    sched/stat: Simplify the sched_info accounting dependency

    Linus Torvalds