19 Dec, 2013

1 commit

  • Commit 1b3a5d02ee07 ("reboot: move arch/x86 reboot= handling to generic
    kernel") moved reboot= handling to generic code. In the process it also
    removed the code in native_machine_shutdown() which was moving the reboot
    process to reboot_cpu/cpu0.

    I guess the thought must have been that all reboot paths call
    migrate_to_reboot_cpu(), so we don't need this special handling. But
    the kexec reboot path (kernel_kexec()) does not call
    migrate_to_reboot_cpu(), so the above change broke kexec. Now reboot can
    happen on a non-boot cpu, and when INIT is sent in the second kernel to
    bring up the BP, it brings down the machine.

    So start calling migrate_to_reboot_cpu() in the kexec reboot path to
    avoid this problem.

    Bisected by WANG Chao.

    Reported-by: Matthew Whitehead
    Reported-by: Dave Young
    Signed-off-by: Vivek Goyal
    Tested-by: Baoquan He
    Tested-by: WANG Chao
    Acked-by: H. Peter Anvin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vivek Goyal
     

18 Dec, 2013

1 commit


16 Dec, 2013

1 commit

  • Pull PCI updates from Bjorn Helgaas:
    "PCI device hotplug
    - Move device_del() from pci_stop_dev() to pci_destroy_dev() (Rafael
    Wysocki)

    Host bridge drivers
    - Update maintainers for DesignWare, i.MX6, Armada, R-Car (Bjorn
    Helgaas)
    - mvebu: Return 'unsupported' for Interrupt Line and Interrupt Pin
    (Jason Gunthorpe)

    Miscellaneous
    - Avoid unnecessary CPU switch when calling .probe() (Alexander
    Duyck)
    - Revert "workqueue: allow work_on_cpu() to be called recursively"
    (Bjorn Helgaas)
    - Disable Bus Master only on kexec reboot (Khalid Aziz)
    - Omit PCI ID macro strings to shorten quirk names for LTO (Michal
    Marek)"

    * tag 'pci-v3.13-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
    MAINTAINERS: Add DesignWare, i.MX6, Armada, R-Car PCI host maintainers
    PCI: Disable Bus Master only on kexec reboot
    PCI: mvebu: Return 'unsupported' for Interrupt Line and Interrupt Pin
    PCI: Omit PCI ID macro strings to shorten quirk names
    PCI: Move device_del() from pci_stop_dev() to pci_destroy_dev()
    Revert "workqueue: allow work_on_cpu() to be called recursively"
    PCI: Avoid unnecessary CPU switch when calling driver .probe() method

    Linus Torvalds
     

13 Dec, 2013

3 commits

  • Pull misc keyrings fixes from David Howells:
    "These break down into five sets:

    - A patch to error handling in the big_key type for huge payloads.
    If the payload is larger than the "low limit" and the backing store
    allocation fails, then big_key_instantiate() doesn't clear the
    payload pointers in the key, assuming them to have been previously
    cleared - but only one of them is.

    Unfortunately, the garbage collector still calls big_key_destroy()
    when it sees one of the pointers with a weird value in it (and not
    NULL) which it then tries to clean up.

    - Three patches to fix the keyring type:

    * A patch to fix the hash function to correctly divide keyrings off
    from keys in the topology of the tree inside the associative
    array. This is only a problem if searching through nested
    keyrings - and only if the hash function incorrectly puts a
    keyring outside of the 0 branch of the root node.

    * A patch to fix keyrings' use of the associative array. The
    __key_link_begin() function initially passes a NULL key pointer
    to assoc_array_insert() on the basis that it's holding a place in
    the tree whilst it does more allocation and stuff.

    This is only a problem when a node contains 16 keys that match at
    that level and we want to add an also matching 17th. This should
    easily be manufactured with a keyring full of keyrings (without
    chucking any other sort of key into the mix) - except for (a)
    above which makes it on average adding the 65th keyring.

    * A patch to fix searching down through nested keyrings, where any
    keyring in the set has more than 16 keyrings and none of the
    first keyrings we look through has a match (before the tree
    iteration needs to step to a more distal node).

    Test in keyutils test suite:

    http://git.kernel.org/cgit/linux/kernel/git/dhowells/keyutils.git/commit/?id=8b4ae963ed92523aea18dfbb8cab3f4979e13bd1

    - A patch to fix the big_key type's use of a shmem file as its
    backing store causing audit messages and LSM check failures. This
    is done by setting S_PRIVATE on the file to avoid LSM checks on the
    file (access to the shmem file goes through the keyctl() interface
    and so is gated by the LSM that way).

    This isn't normally a problem if a key is used by the context that
    generated it - and it's currently only used by libkrb5.

    Test in keyutils test suite:

    http://git.kernel.org/cgit/linux/kernel/git/dhowells/keyutils.git/commit/?id=d9a53cbab42c293962f2f78f7190253fc73bd32e

    - A patch to add a generated file to .gitignore.

    - A patch to fix the alignment of the system certificate data such
    that it works on s390. As I understand it, on the S390 arch,
    symbols must be 2-byte aligned because loading the address discards
    the least-significant bit"

    * tag 'keys-devel-20131210' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
    KEYS: correct alignment of system_certificate_list content in assembly file
    Ignore generated file kernel/x509_certificate_list
    security: shmem: implement kernel private shmem inodes
    KEYS: Fix searching of nested keyrings
    KEYS: Fix multiple key add into associative array
    KEYS: Fix the keyring hash function
    KEYS: Pre-clear struct key on allocation

    Linus Torvalds
     
  • When debugging the read-only hugepage case, I was confused by the fact
    that get_futex_key() did an access_ok() only for the non-shared futex
    case, since the user address checking really isn't in any way specific
    to the private key handling.

    Now, it turns out that the shared key handling does effectively do the
    equivalent checks inside get_user_pages_fast() (it doesn't actually
    check the address range on x86, but does check the page protections for
    being a user page). So it wasn't actually a bug, but the fact that we
    treat the address differently for private and shared futexes threw me
    for a loop.

    Just move the check up, so that it gets done for both cases. Also, use
    the 'rw' parameter for the type, even if it doesn't actually matter any
    more (it's a historical artifact of the old racy i386 "page faults from
    kernel space don't check write protections").

    Cc: Thomas Gleixner
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     
  • The hugepage code had the exact same bug that regular pages had in
    commit 7485d0d3758e ("futexes: Remove rw parameter from
    get_futex_key()").

    The regular page case was fixed by commit 9ea71503a8ed ("futex: Fix
    regression with read only mappings"), but the transparent hugepage case
    (added in a5b338f2b0b1: "thp: update futex compound knowledge")
    remained broken.

    Found by Dave Jones and his trinity tool.

    Reported-and-tested-by: Dave Jones
    Cc: stable@kernel.org # v2.6.38+
    Acked-by: Thomas Gleixner
    Cc: Mel Gorman
    Cc: Darren Hart
    Cc: Andrea Arcangeli
    Cc: Oleg Nesterov
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

11 Dec, 2013

4 commits

  • Christian suffers from a bad BIOS that wrecks his i5's TSC sync. This
    results in him occasionally seeing time going backwards - which
    crashes the scheduler ...

    Most of our time accounting can actually handle that except the most
    common one; the tick time update of sched_fair.

    There is a further problem with that code; previously we assumed that
    because we get a tick every TICK_NSEC our time delta could never
    exceed 32bits and math was simpler.

    However, ever since Frederic managed to get NO_HZ_FULL merged, this is
    no longer the case, since a task can now run for a long time
    without getting a tick. It takes only about 4.3 seconds (2^32 ns) to
    overflow our u32 in nanoseconds.

    This means we not only need to better deal with time going backwards;
    but also means we need to be able to deal with large deltas.

    This patch reworks the entire code and uses mul_u64_u32_shr() as
    proposed by Andy a long while ago.

    We express our virtual time scale factor in a u32 multiplier and shift
    right and the 32bit mul_u64_u32_shr() implementation reduces to a
    single 32x32->64 multiply if the time delta is still short (common
    case).

    For 64bit a 64x64->128 multiply can be used if ARCH_SUPPORTS_INT128.

    Reported-and-Tested-by: Christian Engelmayer
    Signed-off-by: Peter Zijlstra
    Cc: fweisbec@gmail.com
    Cc: Paul Turner
    Cc: Stanislaw Gruszka
    Cc: Andy Lutomirski
    Cc: Linus Torvalds
    Cc: Andrew Morton
    Link: http://lkml.kernel.org/r/20131118172706.GI3866@twins.programming.kicks-ass.net
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
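
The split-multiply idea described in this entry can be sketched as follows. This is a minimal Python model, not the kernel's C code: the function name is taken from the commit message, while the operand checks, the restriction to shift values of at most 32, and the sample values are assumptions made for illustration.

```python
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1


def mul_u64_u32_shr(a, mul, shift):
    """Return (a * mul) >> shift, truncated to 64 bits, using 32x32->64 products.

    The split used here is exact only for 0 <= shift <= 32 (an assumption
    of this sketch).
    """
    if not (0 <= a <= MASK64 and 0 <= mul <= MASK32 and 0 <= shift <= 32):
        raise ValueError("operand out of range")
    lo = a & MASK32   # low 32 bits of the time delta
    hi = a >> 32      # high 32 bits; zero for short deltas (common case)
    result = (lo * mul) >> shift
    if hi:
        # Only long deltas pay for the second multiply.
        result += (hi * mul) << (32 - shift)
    return result & MASK64


# A u32 nanosecond counter wraps after 2**32 ns.
U32_WRAP_SECONDS = (1 << 32) / 1e9

# A 5 s delta does not fit in 32 bits; scale it by 0.5 (mul = 2**31, shift = 32).
scaled = mul_u64_u32_shr(5_000_000_000, 1 << 31, 32)
```

When the high half of the delta is zero, only the first multiply runs, which corresponds to the short-delta case the message calls common.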
     
  • Yinghai reported that he saw a /0 in sg_capacity on his EX parts.
    Make sure to always initialize power_orig now that we actually use it.

    Ideally build_sched_domains() -> init_sched_groups_power() would also
    initialize this; but for some yet unexplained reason some setups seem
    to miss updates there.

    Reported-by: Yinghai Lu
    Tested-by: Yinghai Lu
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/n/tip-l8ng2m9uml6fhibln8wqpom7@git.kernel.org
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Apart from data-type specific alignment constraints, there are also
    architecture-specific alignment requirements.
    For example, on s390 symbols must be on even addresses implying a 2-byte
    alignment. If the system_certificate_list_end symbol is on an odd address
    and if this address is loaded, the least-significant bit is ignored. As a
    result, the load_system_certificate_list() fails to load the certificates
    because of a wrong certificate length calculation.

    To be safe, align system_certificate_list on an 8-byte boundary. Also improve
    the length calculation of the system_certificate_list content. Introduce a
    system_certificate_list_size (8-byte aligned because of unsigned long) variable
    that stores the length. Let the linker calculate this size by introducing
    a start and end label for the certificate content.

    Signed-off-by: Hendrik Brueckner
    Signed-off-by: David Howells

    Hendrik Brueckner
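
A toy model of the length miscalculation described in this entry, assuming only what the message states: loading a symbol address discards the least-significant bit. The addresses, the blob size, and the helper names are hypothetical.

```python
def load_address_dropping_lsb(addr):
    """Toy model of an address load that discards the least-significant bit."""
    return addr & ~1


def length_from_symbols(start, end):
    """Length computed at run time from two loaded symbol addresses."""
    return load_address_dropping_lsb(end) - load_address_dropping_lsb(start)


def align_up(addr, alignment):
    """Round addr up to the next multiple of alignment (a power of two)."""
    return (addr + alignment - 1) & ~(alignment - 1)


start = align_up(0x1003, 8)   # hypothetical start address, 8-byte aligned
true_size = 1231              # hypothetical odd-sized certificate blob
end = start + true_size       # therefore an odd end address

# Run-time subtraction of loaded addresses comes out one byte short.
runtime_length = length_from_symbols(start, end)

# A size computed at link time and stored in a variable keeps the exact value.
link_time_size = end - start
```

Storing the size in an aligned variable avoids ever loading the odd end address, which is the direction the fix takes.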
     
  • $ git status
    # On branch pending-rebases
    # Untracked files:
    # (use "git add ..." to include in what will be committed)
    #
    # kernel/x509_certificate_list
    nothing added to commit but untracked files present (use "git add" to track)
    $

    Signed-off-by: Rusty Russell
    Signed-off-by: David Howells

    Rusty Russell
     

08 Dec, 2013

1 commit

  • Add a flag to tell the PCI subsystem that the kernel is shutting down in
    preparation to kexec a kernel. Add code in the PCI subsystem to use this
    flag to clear the Bus Master bit on PCI devices only in case of a kexec
    reboot.

    This fixes a power-off problem on Acer Aspire V5-573G and likely other
    machines and avoids any other issues caused by clearing Bus Master bit on
    PCI devices in normal shutdown path. The problem was introduced by
    b566a22c2332 ("PCI: disable Bus Master on PCI device shutdown").

    This patch is based on discussion at
    http://marc.info/?l=linux-pci&m=138425645204355&w=2

    Link: https://bugzilla.kernel.org/show_bug.cgi?id=63861
    Reported-by: Chang Liu
    Signed-off-by: Khalid Aziz
    Signed-off-by: Bjorn Helgaas
    Acked-by: Konstantin Khlebnikov
    Cc: stable@vger.kernel.org # v3.5+

    Khalid Aziz
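
The behaviour change can be illustrated with a small model of the PCI command register. The flag name `kexec_in_progress` and the function below are assumptions for this sketch, since the message does not name them; the Bus Master enable bit is bit 2 (0x4) of the command register.

```python
PCI_COMMAND_MASTER = 0x4   # Bus Master enable bit in the PCI command register


def command_after_shutdown(command, kexec_in_progress):
    """Return the command register value left behind by device shutdown.

    Bus Master is cleared only when the shutdown precedes a kexec;
    a normal shutdown or power-off leaves the register untouched.
    """
    if kexec_in_progress:
        return command & ~PCI_COMMAND_MASTER
    return command


before = 0x0007                                      # I/O, memory, bus master enabled
kexec_value = command_after_shutdown(before, True)   # bus master cleared
normal_value = command_after_shutdown(before, False) # unchanged
```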
     

07 Dec, 2013

1 commit

  • …t/rostedt/linux-trace

    Pull tracing fix from Steven Rostedt:
    "A regression showed up that there's a large delay when enabling all
    events. This was prevalent when FTRACE_SELFTEST was enabled which
    enables all events several times, and caused the system bootup to
    pause for over a minute.

    This was tracked down to an addition of a synchronize_sched()
    performed when system call tracepoints are unregistered.

    The synchronize_sched() is needed between the unregistering of the
    system call tracepoint and a deletion of a tracing instance buffer.
    But placing the synchronize_sched() in the unreg of *every* system
    call tracepoint is a bit overboard. A single synchronize_sched()
    before the deletion of the instance is sufficient"

    * tag 'trace-fixes-3.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
    tracing: Only run synchronize_sched() at instance deletion time

    Linus Torvalds
     

06 Dec, 2013

1 commit

  • It has been reported that boot up with FTRACE_SELFTEST enabled can take a
    very long time. There can be stalls of over a minute.

    This was tracked down to the synchronize_sched() called when a system call
    event is disabled. As the self tests enable and disable thousands of events,
    this makes the synchronize_sched() get called thousands of times.

    The synchronize_sched() was added with d562aff93bfb53 "tracing: Add support
    for SOFT_DISABLE to syscall events" which caused this regression (added
    in 3.13-rc1).

    The synchronize_sched() is to protect against the events being accessed
    when a tracer instance is being deleted. When an instance is being deleted
    all the events associated to it are unregistered. The synchronize_sched()
    makes sure that no more users are running when it finishes.

    Instead of calling synchronize_sched() for all syscall events, we only
    need to call it once, after the events are unregistered and before the
    instance is deleted. The event_mutex is held during this action to
    prevent new users from enabling events.

    Link: http://lkml.kernel.org/r/20131203124120.427b9661@gandalf.local.home

    Reported-by: Petr Mladek
    Acked-by: Tom Zanussi
    Acked-by: Petr Mladek
    Tested-by: Petr Mladek
    Signed-off-by: Steven Rostedt

    Steven Rostedt
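
A rough model of why the change helps, counting how often the expensive wait is invoked. The class and function names are hypothetical; only synchronize_sched() comes from the message, and it is modelled as a counter rather than a real grace-period wait.

```python
class GracePeriodCounter:
    """Stand-in for synchronize_sched(); counts calls instead of waiting."""

    def __init__(self):
        self.calls = 0

    def synchronize_sched(self):
        self.calls += 1


def disable_events_sync_each(events, rcu):
    """Old behaviour: every syscall event unregister waits for a grace period."""
    for _event in events:
        rcu.synchronize_sched()


def delete_instance_sync_once(events, rcu):
    """New behaviour: unregister everything, then wait once before deletion."""
    for _event in events:
        pass  # unregister only, no wait here
    rcu.synchronize_sched()


events = ["event%d" % i for i in range(1000)]   # hypothetical event list

old = GracePeriodCounter()
disable_events_sync_each(events, old)

new = GracePeriodCounter()
delete_instance_sync_once(events, new)
```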
     

05 Dec, 2013

1 commit

  • Pull timer fixes from Thomas Gleixner:

    - timekeeping: Cure a subtle drift issue on GENERIC_TIME_VSYSCALL_OLD

    - nohz: Make CONFIG_NO_HZ=n and nohz=off command line option behave the
    same way. Fixes a long standing load accounting wreckage.

    - clocksource/ARM: Kconfig update to avoid ARM=n wreckage

    - clocksource/ARM: Fixlets for the AT91 and SH clocksource/clockevents

    - Trivial documentation update and kzalloc conversion from akpms pile

    * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    nohz: Fix another inconsistency between CONFIG_NO_HZ=n and nohz=off
    time: Fix 1ns/tick drift w/ GENERIC_TIME_VSYSCALL_OLD
    clocksource: arm_arch_timer: Hide eventstream Kconfig on non-ARM
    clocksource: sh_tmu: Add clk_prepare/unprepare support
    clocksource: sh_tmu: Release clock when sh_tmu_register() fails
    clocksource: sh_mtu2: Add clk_prepare/unprepare support
    clocksource: sh_mtu2: Release clock when sh_mtu2_register() fails
    ARM: at91: rm9200: switch back to clockevents_config_and_register
    tick: Document tick_do_timer_cpu
    timer: Convert kmalloc_node(...GFP_ZERO...) to kzalloc_node(...)
    NOHZ: Check for nohz active instead of nohz enabled

    Linus Torvalds
     

03 Dec, 2013

3 commits

  • Pull irq fixes from Thomas Gleixner:
    - Correction of fuzzy and fragile IRQ_RETVAL macro
    - IRQ related resume fix affecting only XEN
    - ARM/GIC fix for chained GIC controllers

    * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    irqchip: Gic: fix boot for chained gics
    irq: Enable all irqs unconditionally in irq_resume
    genirq: Correct fuzzy and fragile IRQ_RETVAL() definition

    Linus Torvalds
     
  • Pull scheduler fixes from Ingo Molnar:
    "Various smaller fixlets, all over the place"

    * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    sched/doc: Fix generation of device-drivers
    sched: Expose preempt_schedule_irq()
    sched: Fix a trivial typo in comments
    sched: Remove unused variable in 'struct sched_domain'
    sched: Avoid NULL dereference on sd_busy
    sched: Check sched_domain before computing group power
    MAINTAINERS: Update file patterns in the lockdep and scheduler entries

    Linus Torvalds
     
  • Pull perf fixes from Ingo Molnar:
    "Misc kernel and tooling fixes"

    * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    tools lib traceevent: Fix conversion of pointer to integer of different size
    perf/trace: Properly use u64 to hold event_id
    perf: Remove fragile swevent hlist optimization
    ftrace, perf: Avoid infinite event generation loop
    tools lib traceevent: Fix use of multiple options in processing field
    perf header: Fix possible memory leaks in process_group_desc()
    perf header: Fix bogus group name
    perf tools: Tag thread comm as overriden

    Linus Torvalds
     

30 Nov, 2013

2 commits

  • Pull workqueue fixes from Tejun Heo:
    "This contains one important fix. The NUMA support added a while back
    broke ordering guarantees on ordered workqueues. It was enforced by
    having single frontend interface with @max_active == 1 but the NUMA
    support puts multiple interfaces on unbound workqueues on NUMA
    machines thus breaking the ordered guarantee. This is fixed by
    disabling NUMA support on ordered workqueues.

    The above and a couple other patches were sitting in for-3.12-fixes
    but I forgot to push that out, so they ended up waiting a bit too
    long. My apologies.

    Other fixes are minor"

    * 'for-3.13-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
    workqueue: fix pool ID allocation leakage and remove BUILD_BUG_ON() in init_workqueues
    workqueue: fix comment typo for __queue_work()
    workqueue: fix ordered workqueues in NUMA setups
    workqueue: swap set_cpus_allowed_ptr() and PF_NO_SETAFFINITY

    Linus Torvalds
     
  • Pull cgroup fixes from Tejun Heo:
    "Fixes for three issues.

    - cgroup destruction path could swamp system_wq possibly leading to
    deadlock. This actually seems to happen in the wild with memcg
    because memcg destruction path adds nested dependency on system_wq.

    Resolved by isolating cgroup destruction work items on its
    dedicated workqueue.

    - Possible locking context deadlock through seqcount reported by
    lockdep

    - Memory leak under certain conditions"

    * 'for-3.13-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
    cgroup: fix cgroup_subsys_state leak for seq_files
    cpuset: Fix memory allocator deadlock
    cgroup: use a dedicated workqueue for cgroup destruction

    Linus Torvalds
     

29 Nov, 2013

2 commits


28 Nov, 2013

2 commits

  • If a cgroup file implements either read_map() or read_seq_string(),
    such file is served using seq_file by overriding file->f_op to
    cgroup_seqfile_operations, which also overrides the release method to
    single_release() from cgroup_file_release().

    Because cgroup_file_open() didn't use to acquire any resources, this
    used to be fine, but since f7d58818ba42 ("cgroup: pin
    cgroup_subsys_state when opening a cgroupfs file"), cgroup_file_open()
    pins the css (cgroup_subsys_state) which is put by
    cgroup_file_release(). The patch forgot to update the release path
    for seq_files and each open/release cycle leaks a css reference.

    Fix it by updating cgroup_file_release() to also handle seq_files and
    using it for seq_file release path too.

    Signed-off-by: Tejun Heo
    Cc: stable@vger.kernel.org # v3.12

    Tejun Heo
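
The leak can be reproduced in a few lines with a plain counter standing in for the css reference count. The function names mirror those in the message, but their bodies are assumptions that model only the pin and put.

```python
class Css:
    """Minimal stand-in for a cgroup_subsys_state with a reference count."""

    def __init__(self):
        self.refcnt = 1   # base reference held by the cgroup itself


def cgroup_file_open(css):
    css.refcnt += 1       # open pins the css


def cgroup_file_release(css):
    css.refcnt -= 1       # the matching put


def single_release(css):
    pass                  # seq_file release path: frees seq state, no put


def refcnt_after_cycles(release, cycles):
    """Open and release a file `cycles` times; return the final refcount."""
    css = Css()
    for _ in range(cycles):
        cgroup_file_open(css)
        release(css)
    return css.refcnt


leaky = refcnt_after_cycles(single_release, 5)        # one leaked ref per cycle
fixed = refcnt_after_cycles(cgroup_file_release, 5)   # balanced
```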
     
  • Juri hit the below lockdep report:

    [ 4.303391] ======================================================
    [ 4.303392] [ INFO: SOFTIRQ-safe -> SOFTIRQ-unsafe lock order detected ]
    [ 4.303394] 3.12.0-dl-peterz+ #144 Not tainted
    [ 4.303395] ------------------------------------------------------
    [ 4.303397] kworker/u4:3/689 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:
    [ 4.303399] (&p->mems_allowed_seq){+.+...}, at: [] new_slab+0x6c/0x290
    [ 4.303417]
    [ 4.303417] and this task is already holding:
    [ 4.303418] (&(&q->__queue_lock)->rlock){..-...}, at: [] blk_execute_rq_nowait+0x5b/0x100
    [ 4.303431] which would create a new lock dependency:
    [ 4.303432] (&(&q->__queue_lock)->rlock){..-...} -> (&p->mems_allowed_seq){+.+...}
    [ 4.303436]

    [ 4.303898] the dependencies between the lock to be acquired and SOFTIRQ-irq-unsafe lock:
    [ 4.303918] -> (&p->mems_allowed_seq){+.+...} ops: 2762 {
    [ 4.303922] HARDIRQ-ON-W at:
    [ 4.303923] [] __lock_acquire+0x65a/0x1ff0
    [ 4.303926] [] lock_acquire+0x93/0x140
    [ 4.303929] [] kthreadd+0x86/0x180
    [ 4.303931] [] ret_from_fork+0x7c/0xb0
    [ 4.303933] SOFTIRQ-ON-W at:
    [ 4.303933] [] __lock_acquire+0x68c/0x1ff0
    [ 4.303935] [] lock_acquire+0x93/0x140
    [ 4.303940] [] kthreadd+0x86/0x180
    [ 4.303955] [] ret_from_fork+0x7c/0xb0
    [ 4.303959] INITIAL USE at:
    [ 4.303960] [] __lock_acquire+0x344/0x1ff0
    [ 4.303963] [] lock_acquire+0x93/0x140
    [ 4.303966] [] kthreadd+0x86/0x180
    [ 4.303969] [] ret_from_fork+0x7c/0xb0
    [ 4.303972] }

    Which reports that we take mems_allowed_seq with interrupts enabled. A
    little digging found that this can only be from
    cpuset_change_task_nodemask().

    This is an actual deadlock because an interrupt doing an allocation will
    hit get_mems_allowed()->...->__read_seqcount_begin(), which will spin
    forever waiting for the write side to complete.

    Cc: John Stultz
    Cc: Mel Gorman
    Reported-by: Juri Lelli
    Signed-off-by: Peter Zijlstra
    Tested-by: Juri Lelli
    Acked-by: Li Zefan
    Acked-by: Mel Gorman
    Signed-off-by: Tejun Heo
    Cc: stable@vger.kernel.org

    Peter Zijlstra
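
The spin described above can be sketched with a toy sequence counter: an odd value means a write is in progress, and a reader waits for an even one. The spin bound, the function names, and the way the interrupt is injected are all assumptions of this single-CPU model; a real reader would loop without a bound.

```python
class SeqCount:
    """Toy sequence counter: odd means a writer is mid-update."""

    def __init__(self):
        self.sequence = 0

    def write_begin(self):
        self.sequence += 1

    def write_end(self):
        self.sequence += 1


def read_begin(seq, max_spins=1000):
    """Return an even sequence value, or None if the writer never finishes.

    The bound exists only so the model terminates.
    """
    for _ in range(max_spins):
        if seq.sequence % 2 == 0:
            return seq.sequence
    return None


def update_with_interrupt(irqs_disabled):
    """Run a write section while an interrupt that allocates memory is pending."""
    seq = SeqCount()
    seq.write_begin()
    result = None
    if not irqs_disabled:
        # Interrupt lands mid-write on the same CPU; the writer cannot progress.
        result = read_begin(seq)
    seq.write_end()
    if irqs_disabled:
        # Interrupt is delivered only after the write section completes.
        result = read_begin(seq)
    return result


stuck = update_with_interrupt(irqs_disabled=False)
safe = update_with_interrupt(irqs_disabled=True)
```

In this model, keeping interrupts off across the write section is what lets the reader see an even value.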
     

27 Nov, 2013

2 commits

  • Tony reported that aa0d53260596 ("ia64: Use preempt_schedule_irq")
    broke PREEMPT=n builds on ia64.

    Ok, wrapped my brain around it. I tripped over the magic asm foo which
    has a single need_resched check and schedule point for both sys call
    return and interrupt return.

    So you need the preempt_schedule_irq() for kernel preemption from
    interrupt return, while on a normal syscall preemption a schedule would
    be sufficient. But using preempt_schedule_irq() is not harmful here in
    any way. It just sets the preempt_active bit also in cases where it
    would not be required.

    Even on preempt=n kernels adding the preempt_active bit is completely
    harmless. So instead of having an extra function, moving the existing
    one out of the ifdef PREEMPT looks like the sanest thing to do.

    It would also allow getting rid of various other sti/schedule/cli asm
    magic in other archs.

    Reported-and-Tested-by: Tony Luck
    Fixes: aa0d53260596 ("ia64: Use preempt_schedule_irq")
    Signed-off-by: Thomas Gleixner
    [slightly edited Changelog]
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1311211230030.30673@ionos.tec.linutronix.de
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
     
  • …it/rostedt/linux-trace

    Pull tracing fixes from Steven Rostedt:
    "This includes two fixes.

    1) is a bug fix that happens when root does the following:

    echo function_graph > current_tracer
    modprobe foo
    echo nop > current_tracer

    This causes the ftrace internal accounting to get screwed up and
    crashes ftrace, preventing the user from using the function tracer
    after that.

    2) if a TRACE_EVENT has a string field, and NULL is given for it.

    The internal trace event code does a strlen() and strcpy() on the
    source of field. If it is NULL it causes the system to oops.

    This bug has been there since 2.6.31, but no TRACE_EVENT ever passed
    in a NULL to the string field, until now"

    * tag 'trace-fixes-v3.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
    ftrace: Fix function graph with loading of modules
    tracing: Allow events to have NULL strings

    Linus Torvalds
     

26 Nov, 2013

3 commits

  • Commit 8c4f3c3fa9681 "ftrace: Check module functions being traced on reload"
    fixed module loading and unloading with respect to function tracing, but
    it missed the function graph tracer. If you perform the following

    # cd /sys/kernel/debug/tracing
    # echo function_graph > current_tracer
    # modprobe nfsd
    # echo nop > current_tracer

    You'll get the following oops message:

    ------------[ cut here ]------------
    WARNING: CPU: 2 PID: 2910 at /linux.git/kernel/trace/ftrace.c:1640 __ftrace_hash_rec_update.part.35+0x168/0x1b9()
    Modules linked in: nfsd exportfs nfs_acl lockd ipt_MASQUERADE sunrpc ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables uinput snd_hda_codec_idt
    CPU: 2 PID: 2910 Comm: bash Not tainted 3.13.0-rc1-test #7
    Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./To be filled by O.E.M., BIOS SDBLI944.86P 05/08/2007
    0000000000000668 ffff8800787efcf8 ffffffff814fe193 ffff88007d500000
    0000000000000000 ffff8800787efd38 ffffffff8103b80a 0000000000000668
    ffffffff810b2b9a ffffffff81a48370 0000000000000001 ffff880037aea000
    Call Trace:
    [] dump_stack+0x4f/0x7c
    [] warn_slowpath_common+0x81/0x9b
    [] ? __ftrace_hash_rec_update.part.35+0x168/0x1b9
    [] warn_slowpath_null+0x1a/0x1c
    [] __ftrace_hash_rec_update.part.35+0x168/0x1b9
    [] ? __mutex_lock_slowpath+0x364/0x364
    [] ftrace_shutdown+0xd7/0x12b
    [] unregister_ftrace_graph+0x49/0x78
    [] graph_trace_reset+0xe/0x10
    [] tracing_set_tracer+0xa7/0x26a
    [] tracing_set_trace_write+0x8b/0xbd
    [] ? ftrace_return_to_handler+0xb2/0xde
    [] ? __sb_end_write+0x5e/0x5e
    [] vfs_write+0xab/0xf6
    [] ftrace_graph_caller+0x85/0x85
    [] SyS_write+0x59/0x82
    [] ftrace_graph_caller+0x85/0x85
    [] system_call_fastpath+0x16/0x1b
    ---[ end trace 940358030751eafb ]---

    The above mentioned commit didn't go far enough. Well, it covered the
    function tracer by adding checks in __register_ftrace_function(). The
    problem is that the function graph tracer circumvents that (for a slight
    efficiency gain when function graph trace is running with a function
    tracer. The gain was not worth this).

    The problem came with ftrace_startup() which should always be called after
    __register_ftrace_function(), if you want this bug to be completely fixed.

    Anyway, this solution moves __register_ftrace_function() inside of
    ftrace_startup() and removes the need to call them both.

    Reported-by: Dave Wysochanski
    Fixes: ed926f9b35cd ("ftrace: Use counters to enable functions to trace")
    Cc: stable@vger.kernel.org # 3.0+
    Signed-off-by: Steven Rostedt

    Steven Rostedt (Red Hat)
     
  • This reverts commit c2fda509667b0fda4372a237f5a59ea4570b1627.

    c2fda509667b removed lockdep annotation from work_on_cpu() to work around
    the PCI path that calls work_on_cpu() from within a work_on_cpu() work item
    (PF driver .probe() method -> pci_enable_sriov() -> add VFs -> VF driver
    .probe method).

    961da7fb6b22 ("PCI: Avoid unnecessary CPU switch when calling driver
    .probe() method") avoids that recursive work_on_cpu() use in a different
    way, so this revert restores the work_on_cpu() lockdep annotation.

    Signed-off-by: Bjorn Helgaas
    Acked-by: Tejun Heo

    Bjorn Helgaas
     
  • When the system enters suspend, it disables all interrupts in
    suspend_device_irqs(), including the interrupts marked EARLY_RESUME.

    On the resume side things are different. The EARLY_RESUME interrupts
    are reenabled in sys_core_ops->resume and the non EARLY_RESUME
    interrupts are reenabled in the normal system resume path.

    When suspend_noirq() fails or suspend is aborted for any other
    reason, we might omit the resume side call to sys_core_ops->resume()
    and therefore the interrupts marked EARLY_RESUME are not reenabled and
    stay disabled forever.

    To solve this, enable all irqs unconditionally in irq_resume()
    regardless of whether interrupts marked EARLY_RESUME have already been
    enabled or not.

    This might try to reenable already enabled interrupts in the non
    failure case, but the only affected platform is XEN and it has been
    confirmed that it does not cause any side effects.

    [ tglx: Massaged changelog. ]

    Signed-off-by: Laxman Dewangan
    Acked-by-and-tested-by: Konrad Rzeszutek Wilk
    Acked-by: Heiko Stuebner
    Reviewed-by: Pavel Machek
    Cc: stable@vger.kernel.org
    Link: http://lkml.kernel.org/r/1385388587-16442-1-git-send-email-ldewangan@nvidia.com
    Signed-off-by: Thomas Gleixner

    Laxman Dewangan
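
The failure mode can be modelled with sets of interrupt names. Everything here is a sketch under stated assumptions: the interrupt names are hypothetical, and the two boolean parameters stand in for "the sys_core_ops->resume() call happened" and "irq_resume() enables all irqs unconditionally".

```python
def enabled_after_resume(irqs, syscore_resume_ran, enable_all_unconditionally):
    """Return the set of interrupts enabled once the resume path has finished.

    `irqs` maps an interrupt name to True if it is marked EARLY_RESUME.
    All interrupts are assumed to have been disabled by suspend.
    """
    enabled = set()
    if syscore_resume_ran:
        # Early resume path reenables only the EARLY_RESUME interrupts.
        enabled |= {name for name, early in irqs.items() if early}
    # Normal resume path.
    enabled |= {
        name for name, early in irqs.items()
        if enable_all_unconditionally or not early
    }
    return enabled


irqs = {"evtchn": True, "nic": False}   # hypothetical interrupt names

# Aborted suspend: the early resume callback is skipped.
old_abort = enabled_after_resume(irqs, False, False)
new_abort = enabled_after_resume(irqs, False, True)

# Normal resume: both variants end with everything enabled.
old_normal = enabled_after_resume(irqs, True, False)
new_normal = enabled_after_resume(irqs, True, True)
```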
     

24 Nov, 2013

1 commit

  • Pull crypto update from Herbert Xu:
    - Made x86 ablk_helper generic for ARM
    - Phase out chainiv in favour of eseqiv (affects IPsec)
    - Fixed aes-cbc IV corruption on s390
    - Added constant-time crypto_memneq which replaces memcmp
    - Fixed aes-ctr in omap-aes
    - Added OMAP3 ROM RNG support
    - Add PRNG support for MSM SoC's
    - Add and use Job Ring API in caam
    - Misc fixes

    [ NOTE! This pull request was sent within the merge window, but Herbert
    has some questionable email sending setup that makes him public enemy
    #1 as far as gmail is concerned. So most of his emails seem to be
    trapped by gmail as spam, resulting in me not seeing them. - Linus ]

    * git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (49 commits)
    crypto: s390 - Fix aes-cbc IV corruption
    crypto: omap-aes - Fix CTR mode counter length
    crypto: omap-sham - Add missing modalias
    padata: make the sequence counter an atomic_t
    crypto: caam - Modify the interface layers to use JR API's
    crypto: caam - Add API's to allocate/free Job Rings
    crypto: caam - Add Platform driver for Job Ring
    hwrng: msm - Add PRNG support for MSM SoC's
    ARM: DT: msm: Add Qualcomm's PRNG driver binding document
    crypto: skcipher - Use eseqiv even on UP machines
    crypto: talitos - Simplify key parsing
    crypto: picoxcell - Simplify and harden key parsing
    crypto: ixp4xx - Simplify and harden key parsing
    crypto: authencesn - Simplify key parsing
    crypto: authenc - Export key parsing helper function
    crypto: mv_cesa: remove deprecated IRQF_DISABLED
    hwrng: OMAP3 ROM Random Number Generator support
    crypto: sha256_ssse3 - also test for BMI2
    crypto: mv_cesa - Remove redundant of_match_ptr
    crypto: sahara - Remove redundant of_match_ptr
    ...

    Linus Torvalds
     

23 Nov, 2013

6 commits

  • When a work item starts execution, the high bits of the work's data
    contain the pool ID, which can represent a maximum of
    WORK_OFFQ_POOL_NONE. The pool ID is set to WORK_OFFQ_POOL_NONE when the
    work is being initialized, indicating that no pool is associated, and
    get_work_pool() uses it to check the associated pool. So if
    worker_pool_assign_id() assigns an ID greater than or equal to
    WORK_OFFQ_POOL_NONE to a pool, it triggers leakage, and it may break
    the non-reentrance guarantee.

    This patch fixes this issue by making worker_pool_assign_id() call
    idr_alloc() with the @end param set to WORK_OFFQ_POOL_NONE.

    Furthermore, in the current implementation, the BUILD_BUG_ON() in
    init_workqueues makes no sense. The number of worker pools needed
    cannot be determined at compile time, because the number of backing
    pools for UNBOUND workqueues is dynamic based on the assigned custom
    attributes. So remove it.

    tj: Minor comment and indentation updates.

    Signed-off-by: Li Bin
    Signed-off-by: Tejun Heo

    Li Bin
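
    The bounded allocation described above can be illustrated with a small
    user-space sketch. This is not the kernel code: POOL_NONE and
    alloc_bounded_id() are hypothetical stand-ins for WORK_OFFQ_POOL_NONE
    and the bounded idr_alloc() call, and the sentinel value 8 is chosen
    only to keep the example small. The point is that an exclusive upper
    bound keeps every allocated ID strictly below the "no pool" sentinel.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for WORK_OFFQ_POOL_NONE: the "no pool" sentinel. */
#define POOL_NONE 8

/* Tracks which IDs in [0, POOL_NONE) are currently in use. */
static bool id_used[POOL_NONE];

/*
 * Return the lowest free ID in [start, end), or -ENOSPC if none is free.
 * The exclusive upper bound plays the role of the @end argument that the
 * commit sets to the sentinel, so the sentinel itself is never handed out.
 */
int alloc_bounded_id(int start, int end)
{
    assert(start >= 0 && start <= end && end <= POOL_NONE);

    for (int id = start; id < end; id++) {
        if (!id_used[id]) {
            id_used[id] = true;
            return id;
        }
    }
    return -ENOSPC;
}
```

    Once all IDs below the sentinel are taken, the allocator fails instead
    of returning a value that could be mistaken for "no pool".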
     
  • It seems the "dying" should be "draining" here.

    Signed-off-by: Li Bin
    Signed-off-by: Tejun Heo

    Li Bin
     
  • An ordered workqueue implements execution ordering by using single
    pool_workqueue with max_active == 1. On a given pool_workqueue, work
    items are processed in FIFO order and limiting max_active to 1
    enforces the queued work items to be processed one by one.

    Unfortunately, 4c16bd327c ("workqueue: implement NUMA affinity for
    unbound workqueues") accidentally broke this guarantee by applying
    NUMA affinity to ordered workqueues too. On NUMA setups, an ordered
    workqueue would end up with separate pool_workqueues for different
    nodes. Each pool_workqueue still limits max_active to 1 but multiple
    work items may be executed concurrently and out of order depending on
    which node they are queued to.

    Fix it by using dedicated ordered_wq_attrs[] when creating ordered
    workqueues. The new attrs match the unbound ones except that no_numa
    is always set thus forcing all NUMA nodes to share the default
    pool_workqueue.

    While at it, add a sanity check in the workqueue creation path which
    verifies that an ordered workqueue has only the default
    pool_workqueue.

    Signed-off-by: Tejun Heo
    Reported-by: Libin
    Cc: stable@vger.kernel.org
    Cc: Lai Jiangshan

    Tejun Heo
     
  • Move the setting of PF_NO_SETAFFINITY up before set_cpus_allowed()
    in create_worker(). Otherwise userland can change ->cpus_allowed
    in between.

    Signed-off-by: Oleg Nesterov
    Signed-off-by: Tejun Heo

    Oleg Nesterov
     
  • Since be44562613851 ("cgroup: remove synchronize_rcu() from
    cgroup_diput()"), cgroup destruction path makes use of workqueue. css
    freeing is performed from a work item from that point on and a later
    commit, ea15f8ccdb430 ("cgroup: split cgroup destruction into two
    steps"), moves css offlining to workqueue too.

    As cgroup destruction isn't depended upon for memory reclaim, the
    destruction work items were put on the system_wq; unfortunately, some
    controllers may block in the destruction path for a considerable
    duration while holding cgroup_mutex. As a large part of the
    destruction path is synchronized through cgroup_mutex, when combined
    with a high rate of cgroup removals, this has the potential to fill up
    system_wq's max_active of 256.

    Also, it turns out that memcg's css destruction path ends up queueing
    and waiting for work items on system_wq through work_on_cpu(). If
    such operation happens while system_wq is fully occupied by cgroup
    destruction work items, work_on_cpu() can't make forward progress
    because system_wq is full and other destruction work items on
    system_wq can't make forward progress because the work item waiting
    for work_on_cpu() is holding cgroup_mutex, leading to deadlock.

    This can be fixed by queueing destruction work items on a separate
    workqueue. This patch creates a dedicated workqueue -
    cgroup_destroy_wq - for this purpose. As these work items shouldn't
    have inter-dependencies and are mostly serialized by cgroup_mutex anyway,
    giving high concurrency level doesn't buy anything and the workqueue's
    @max_active is set to 1 so that destruction work items are executed
    one by one on each CPU.

    Hugh Dickins: Because cgroup_init() is run before init_workqueues(),
    cgroup_destroy_wq can't be allocated from cgroup_init(). Do it from a
    separate core_initcall(). In the future, we probably want to reorder
    so that workqueue init happens before cgroup_init().

    Signed-off-by: Tejun Heo
    Reported-by: Hugh Dickins
    Reported-by: Shawn Bohrer
    Link: http://lkml.kernel.org/r/20131111220626.GA7509@sbohrermbp13-local.rgmadvisors.com
    Link: http://lkml.kernel.org/g/alpine.LNX.2.00.1310301606080.2333@eggly.anvils
    Cc: stable@vger.kernel.org # v3.9+

    Tejun Heo
     
  • Since commit 1e75fa8be9f (time: Condense timekeeper.xtime
    into xtime_sec - merged in v3.6), there has been a problem
    with the error accounting in the timekeeping code, such that
    when truncating to nanoseconds, we round up to the next nsec,
    but the balancing adjustment to the ntp_error value was dropped.

    This causes 1ns per tick drift forward of the clock.

    In 3.7, this logic was isolated to only GENERIC_TIME_VSYSCALL_OLD
    architectures (s390, ia64, powerpc).

    The fix is simply to balance the accounting and to subtract the
    added nanosecond from ntp_error. This allows the internal long-term
    clock steering to keep the clock accurate.

    While this fix removes the regression added in 1e75fa8be9f, the
    ideal solution is to move away from GENERIC_TIME_VSYSCALL_OLD
    and use the new VSYSCALL method, which avoids entirely the
    nanosecond granular rounding, and the resulting short-term clock
    adjustment oscillation needed to keep long term accurate time.

    [ jstultz: Many thanks to Martin for his efforts identifying this
    subtle bug, and providing the fix. ]

    Originally-from: Martin Schwidefsky
    Cc: Tony Luck
    Cc: Paul Mackerras
    Cc: Benjamin Herrenschmidt
    Cc: Andy Lutomirski
    Cc: Paul Turner
    Cc: Steven Rostedt
    Cc: Richard Cochran
    Cc: Prarit Bhargava
    Cc: Fenghua Yu
    Cc: Thomas Gleixner
    Cc: stable #v3.6+
    Link: http://lkml.kernel.org/r/1385149491-20307-1-git-send-email-john.stultz@linaro.org
    Signed-off-by: John Stultz
    Signed-off-by: Thomas Gleixner

    Martin Schwidefsky
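
    The accounting imbalance described above can be sketched in user space
    as follows. This is a simplified model under stated assumptions, not
    the kernel's timekeeping code: struct clock_sketch, round_up_tick(),
    the SHIFT value and the sign convention for the error field are all
    hypothetical. Time is kept in shifted (sub-nanosecond) units; rounding
    up to the next whole nanosecond drops the remainder and adds one full
    nanosecond, and both halves need to be reflected in the error term.

```c
#include <assert.h>
#include <stdint.h>

#define SHIFT 8  /* hypothetical: nanoseconds carry 8 fractional bits */

struct clock_sketch {
    uint64_t nsec_shifted; /* clock value in units of 2^-SHIFT ns */
    int64_t  error;        /* how far the clock is behind the ideal time;
                              negative means the clock is ahead */
};

/*
 * Round the clock up to the next whole nanosecond.
 * With balanced == 0 only the dropped remainder is accounted for, which
 * models the regression; with balanced != 0 the added nanosecond is
 * subtracted from the error as well, which models the fix.
 */
void round_up_tick(struct clock_sketch *c, int balanced)
{
    uint64_t remainder = c->nsec_shifted & ((1ULL << SHIFT) - 1);

    c->nsec_shifted -= remainder;       /* truncate to whole nanoseconds */
    c->nsec_shifted += 1ULL << SHIFT;   /* round up to the next nanosecond */

    c->error += (int64_t)remainder;     /* truncation put the clock behind */
    if (balanced)
        c->error -= (int64_t)(1ULL << SHIFT); /* rounding put it ahead */
}
```

    In the balanced case the sum of nsec_shifted and error is unchanged by
    the rounding, so long-term steering sees the true time. In the
    unbalanced case the error term claims the clock is behind when it is
    actually ahead, which is consistent with the forward drift the commit
    message reports.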
     

22 Nov, 2013

2 commits

  • Pull security subsystem updates from James Morris:
    "In this patchset, we finally get an SELinux update, with Paul Moore
    taking over as maintainer of that code.

    Also a significant update for the Keys subsystem, as well as
    maintenance updates to Smack, IMA, TPM, and Apparmor"

    and since I wanted to know more about the updates to key handling,
    here's the explanation from David Howells on that:

    "Okay. There are a number of separate bits. I'll go over the big bits
    and the odd important other bit, most of the smaller bits are just
    fixes and cleanups. If you want the small bits accounting for, I can
    do that too.

    (1) Keyring capacity expansion.

    KEYS: Consolidate the concept of an 'index key' for key access
    KEYS: Introduce a search context structure
    KEYS: Search for auth-key by name rather than target key ID
    Add a generic associative array implementation.
    KEYS: Expand the capacity of a keyring

    Several of the patches are providing an expansion of the capacity of a
    keyring. Currently, the maximum size of a keyring payload is one page.
    Subtract a small header and then divide up into pointers, that only gives
    you ~500 pointers on an x86_64 box. However, since the NFS idmapper uses
    a keyring to store ID mapping data, that has proven to be insufficient to
    the cause.

    Whatever data structure I use to handle the keyring payload, it can only
    store pointers to keys, not the keys themselves because several keyrings
    may point to a single key. This precludes inserting, say, an rb_node
    struct into the key struct for this purpose.

    I could make an rbtree of records such that each record has an rb_node
    and a key pointer, but that would use four words of space per key stored
    in the keyring. It would, however, be able to use much existing code.

    I selected instead a non-rebalancing radix-tree type approach as that
    could have a better space-used/key-pointer ratio. I could have used the
    radix tree implementation that we already have and insert keys into it by
    their serial numbers, but that means any sort of search must iterate over
    the whole radix tree. Further, its nodes are a bit on the capacious side
    for what I want - especially given that key serial numbers are randomly
    allocated, thus leaving a lot of empty space in the tree.

    So what I have is an associative array that internally is a radix-tree
    with 16 pointers per node where the index key is constructed from the key
    type pointer and the key description. This means that an exact lookup by
    type+description is very fast as this tells us how to navigate directly to
    the target key.

    I made the data structure general in lib/assoc_array.c; as far as it is
    concerned, its index key is just a sequence of bits that leads to a
    pointer. It's possible that someone else will be able to make use of it
    also. FS-Cache might, for example.

    (2) Mark keys as 'trusted' and keyrings as 'trusted only'.

    KEYS: verify a certificate is signed by a 'trusted' key
    KEYS: Make the system 'trusted' keyring viewable by userspace
    KEYS: Add a 'trusted' flag and a 'trusted only' flag
    KEYS: Separate the kernel signature checking keyring from module signing

    These patches allow keys carrying asymmetric public keys to be marked as
    being 'trusted' and allow keyrings to be marked as only permitting the
    addition or linkage of trusted keys.

    Keys loaded from hardware during kernel boot or compiled into the kernel
    during build are marked as being trusted automatically. New keys can be
    loaded at runtime with add_key(). They are checked against the system
    keyring contents and if their signatures can be validated with keys that
    are already marked trusted, then they are marked trusted also and can
    thus be added into the master keyring.

    Patches from Mimi Zohar make this usable with the IMA keyrings also.

    (3) Remove the date checks on the key used to validate a module signature.

    X.509: Remove certificate date checks

    It's not reasonable to reject a signature just because the key that it was
    generated with is no longer valid datewise - especially if the kernel
    hasn't yet managed to set the system clock when the first module is
    loaded - so just remove those checks.

    (4) Make it simpler to deal with additional X.509 being loaded into the kernel.

    KEYS: Load *.x509 files into kernel keyring
    KEYS: Have make canonicalise the paths of the X.509 certs better to deduplicate

    The builder of the kernel now just places files with the extension ".x509"
    into the kernel source or build trees and they're concatenated by the
    kernel build and stuffed into the appropriate section.

    (5) Add support for userspace kerberos to use keyrings.

    KEYS: Add per-user_namespace registers for persistent per-UID kerberos caches
    KEYS: Implement a big key type that can save to tmpfs

    Fedora went to, by default, storing kerberos tickets and tokens in tmpfs.
    We looked at storing it in keyrings instead as that confers certain
    advantages such as tickets being automatically deleted after a certain
    amount of time and the ability for the kernel to get at these tokens more
    easily.

    To make this work, two things were needed:

    (a) A way for the tickets to persist beyond the lifetime of all a user's
    sessions so that cron-driven processes can still use them.

    The problem is that a user's session keyrings are deleted when the
    session that spawned them logs out and the user's user keyring is
    deleted when the UID is deleted (typically when the last log out
    happens), so neither of these places is suitable.

    I've added a system keyring into which a 'persistent' keyring is
    created for each UID on request. Each time a user requests their
    persistent keyring, the expiry time on it is set anew. If the user
    doesn't ask for it for, say, three days, the keyring is automatically
    expired and garbage collected using the existing gc. All the kerberos
    tokens it held are then also gc'd.

    (b) A key type that can hold really big tickets (up to 1MB in size).

    The problem is that Active Directory can return huge tickets with lots
    of auxiliary data attached. We don't, however, want to eat up huge
    tracts of unswappable kernel space for this, so if the ticket is
    greater than a certain size, we create a swappable shmem file and dump
    the contents in there and just live with the fact we then have an
    inode and a dentry overhead. If the ticket is smaller than that, we
    slap it in a kmalloc()'d buffer"
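
    The 16-way radix layout from item (1) above can be sketched as follows.
    This is an illustrative user-space sketch, not the lib/assoc_array.c
    implementation: slot_at_level() is a hypothetical helper, and the order
    in which nibbles are consumed (low nibble of each byte first) is an
    assumption made for the example. It shows how a node with 16 pointers
    consumes four bits of the index key per tree level, so an exact lookup
    can walk straight down to the target.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FAN_OUT        16  /* pointers per node, as in the description */
#define BITS_PER_LEVEL 4   /* log2(FAN_OUT) */

/*
 * Return which of the 16 slots to follow at the given tree level.
 * Assumption: level 0 uses the low nibble of byte 0, level 1 the high
 * nibble of byte 0, level 2 the low nibble of byte 1, and so on.
 */
unsigned slot_at_level(const uint8_t *index_key, size_t key_len, size_t level)
{
    size_t byte = level / 2;
    unsigned shift = (unsigned)(level % 2) * BITS_PER_LEVEL;

    assert(byte < key_len);
    return (index_key[byte] >> shift) & (FAN_OUT - 1);
}
```

    For the two-byte key { 0xA7, 0x3C }, this walk visits slots 7, 10, 12
    and 3 at levels 0 through 3.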

    * 'for-linus2' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (121 commits)
    KEYS: Fix keyring content gc scanner
    KEYS: Fix error handling in big_key instantiation
    KEYS: Fix UID check in keyctl_get_persistent()
    KEYS: The RSA public key algorithm needs to select MPILIB
    ima: define '_ima' as a builtin 'trusted' keyring
    ima: extend the measurement list to include the file signature
    kernel/system_certificate.S: use real contents instead of macro GLOBAL()
    KEYS: fix error return code in big_key_instantiate()
    KEYS: Fix keyring quota misaccounting on key replacement and unlink
    KEYS: Fix a race between negating a key and reading the error set
    KEYS: Make BIG_KEYS boolean
    apparmor: remove the "task" arg from may_change_ptraced_domain()
    apparmor: remove parent task info from audit logging
    apparmor: remove tsk field from the apparmor_audit_struct
    apparmor: fix capability to not use the current task, during reporting
    Smack: Ptrace access check mode
    ima: provide hash algo info in the xattr
    ima: enable support for larger default filedata hash algorithms
    ima: define kernel parameter 'ima_template=' to change configured default
    ima: add Kconfig default measurement list template
    ...

    Linus Torvalds
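
    The storage decision described in item (5)(b) of the pull message can
    be sketched like this. It is a user-space illustration under stated
    assumptions: choose_storage() and the enum names are hypothetical, the
    threshold is a parameter because the message only says "a certain
    size", and rejecting empty payloads and treating a payload exactly at
    the threshold as small are choices made for the example. Only the 1MB
    upper limit comes from the text.

```c
#include <assert.h>
#include <stddef.h>

#define BIG_KEY_MAX (1024 * 1024)  /* "up to 1MB in size", per the message */

enum storage {
    STORE_REJECT,         /* empty or larger than the 1MB limit */
    STORE_INLINE_BUFFER,  /* small payload: stands in for a kmalloc()'d buffer */
    STORE_SHMEM_FILE      /* large payload: stands in for a swappable shmem file */
};

/* Pick a backing store for a payload of len bytes. */
enum storage choose_storage(size_t len, size_t threshold)
{
    assert(threshold <= BIG_KEY_MAX);

    if (len == 0 || len > BIG_KEY_MAX)
        return STORE_REJECT;
    return len > threshold ? STORE_SHMEM_FILE : STORE_INLINE_BUFFER;
}
```

    The trade-off is the one named in the message: large payloads accept
    the inode and dentry overhead in exchange for being swappable, while
    small ones stay in ordinary kernel memory.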
     
  • Pull audit updates from Eric Paris:
    "Nothing amazing. Formatting, small bug fixes, couple of fixes where
    we didn't get records due to some old VFS changes, and a change to how
    we collect execve info..."

    Fixed conflict in fs/exec.c as per Eric and linux-next.

    * git://git.infradead.org/users/eparis/audit: (28 commits)
    audit: fix type of sessionid in audit_set_loginuid()
    audit: call audit_bprm() only once to add AUDIT_EXECVE information
    audit: move audit_aux_data_execve contents into audit_context union
    audit: remove unused envc member of audit_aux_data_execve
    audit: Kill the unused struct audit_aux_data_capset
    audit: do not reject all AUDIT_INODE filter types
    audit: suppress stock memalloc failure warnings since already managed
    audit: log the audit_names record type
    audit: add child record before the create to handle case where create fails
    audit: use given values in tty_audit enable api
    audit: use nlmsg_len() to get message payload length
    audit: use memset instead of trying to initialize field by field
    audit: fix info leak in AUDIT_GET requests
    audit: update AUDIT_INODE filter rule to comparator function
    audit: audit feature to set loginuid immutable
    audit: audit feature to only allow unsetting the loginuid
    audit: allow unsetting the loginuid (with priv)
    audit: remove CONFIG_AUDIT_LOGINUID_IMMUTABLE
    audit: loginuid functions coding style
    selinux: apply selinux checks on new audit message types
    ...

    Linus Torvalds
     

21 Nov, 2013

2 commits

  • Pull vfs bits and pieces from Al Viro:
    "Assorted bits that got missed in the first pull request + fixes for a
    couple of coredump regressions"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    fold try_to_ascend() into the sole remaining caller
    dcache.c: get rid of pointless macros
    take read_seqbegin_or_lock() and friends to seqlock.h
    consolidate simple ->d_delete() instances
    gfs2: endianness misannotations
    dump_emit(): use __kernel_write(), not vfs_write()
    dump_align(): fix the dumb braino

    Linus Torvalds
     
  • Pull more ACPI and power management updates from Rafael Wysocki:

    - ACPI-based device hotplug fixes for issues introduced recently and a
    fix for an older error code path bug in the ACPI PCI host bridge
    driver

    - Fix for recently broken OMAP cpufreq build from Viresh Kumar

    - Fix for a recent hibernation regression related to s2disk

    - Fix for a locking-related regression in the ACPI EC driver from
    Puneet Kumar

    - System suspend error code path fix related to runtime PM and runtime
    PM documentation update from Ulf Hansson

    - cpufreq's conservative governor fix from Xiaoguang Chen

    - New processor IDs for intel_idle and turbostat and removal of an
    obsolete Kconfig option from Len Brown

    - New device IDs for the ACPI LPSS (Low-Power Subsystem) driver and
    ACPI-based PCI hotplug (ACPIPHP) cleanup from Mika Westerberg

    - Removal of several ACPI video DMI blacklist entries that are not
    necessary any more from Aaron Lu

    - Rework of the ACPI companion representation in struct device and code
    cleanup related to that change from Rafael J Wysocki, Lan Tianyu and
    Jarkko Nikula

    - Fixes for assigning names to ACPI-enumerated I2C and SPI devices from
    Jarkko Nikula

    * tag 'pm+acpi-2-3.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (24 commits)
    PCI / hotplug / ACPI: Drop unused acpiphp_debug declaration
    ACPI / scan: Set flags.match_driver in acpi_bus_scan_fixed()
    ACPI / PCI root: Clear driver_data before failing enumeration
    ACPI / hotplug: Fix PCI host bridge hot removal
    ACPI / hotplug: Fix acpi_bus_get_device() return value check
    cpufreq: governor: Remove fossil comment in the cpufreq_governor_dbs()
    ACPI / video: clean up DMI table for initial black screen problem
    ACPI / EC: Ensure lock is acquired before accessing ec struct members
    PM / Hibernate: Do not crash kernel in free_basic_memory_bitmaps()
    ACPI / AC: Remove struct acpi_device pointer from struct acpi_ac
    spi: Use stable dev_name for ACPI enumerated SPI slaves
    i2c: Use stable dev_name for ACPI enumerated I2C slaves
    ACPI: Provide acpi_dev_name accessor for struct acpi_device device name
    ACPI / bind: Use (put|get)_device() on ACPI device objects too
    ACPI: Eliminate the DEVICE_ACPI_HANDLE() macro
    ACPI / driver core: Store an ACPI device pointer in struct acpi_dev_node
    cpufreq: OMAP: Fix compilation error 'r & ret undeclared'
    PM / Runtime: Fix error path for prepare
    PM / Runtime: Update documentation around probe|remove|suspend
    cpufreq: conservative: set requested_freq to policy max when it is over policy max
    ...

    Linus Torvalds
     

20 Nov, 2013

1 commit

  • Pull networking fixes from David Miller:
    "Mostly these are fixes for fallout due to merge window changes, as
    well as cures for problems that have been with us for a much longer
    period of time"

    1) Johannes Berg noticed two major deficiencies in our genetlink
    registration. Some genetlink protocols were passing in constant
    counts for their ops array rather than something like
    ARRAY_SIZE(ops) or similar. Also, some genetlink protocols were
    using fixed IDs for their multicast groups.

    We have to retain these fixed IDs to keep existing userland tools
    working, but reserve them so that other multicast groups used by
    other protocols can not possibly conflict.

    In dealing with these two problems, we actually now use less state
    management for genetlink operations and multicast groups.

    2) When configuring interface hardware timestamping, fix several
    drivers that simply do not validate that the hwtstamp_config value
    is one the driver actually supports. From Ben Hutchings.

    3) Invalid memory references in mwifiex driver, from Amitkumar Karwar.

    4) In dev_forward_skb(), set the skb->protocol in the right order
    relative to skb_scrub_packet(). From Alexei Starovoitov.

    5) Bridge erroneously fails to use the proper wrapper functions to make
    calls to netdev_ops->ndo_vlan_rx_{add,kill}_vid. Fix from Toshiaki
    Makita.

    6) When detaching a bridge port, make sure to flush all VLAN IDs to
    prevent them from leaking, also from Toshiaki Makita.

    7) Put in a compromise for TCP Small Queues so that deep queued devices
    that delay TX reclaim non-trivially don't have such a performance
    decrease. One particularly problematic area is 802.11 AMPDU in
    wireless. From Eric Dumazet.

    8) Fix crashes in tcp_fastopen_cache_get(), we can see NULL socket dsts
    here. Fix from Eric Dumazet, reported by Dave Jones.

    9) Fix use after free in ipv6 SIT driver, from Willem de Bruijn.

    10) When computing mergeable buffer sizes, virtio-net fails to take the
    virtio-net header into account. From Michael Dalton.

    11) Fix seqlock deadlock in ip4_datagram_connect() wrt. statistic
    bumping, this one has been with us for a while. From Eric Dumazet.

    12) Fix NULL deref in the new TIPC fragmentation handling, from Erik
    Hugne.

    13) 6lowpan bit used for traffic classification was wrong, from Jukka
    Rissanen.

    14) macvlan has the same issue as normal vlans did wrt. propagating LRO
    disabling down to the real device, fix it the same way. From Michal
    Kubecek.

    15) CPSW driver needs to soft reset all slaves during suspend, from
    Daniel Mack.

    16) Fix small frame pacing in FQ packet scheduler, from Eric Dumazet.

    17) The xen-netfront RX buffer refill timer isn't properly scheduled on
    partial RX allocation success, from Ma JieYue.

    18) When ipv6 ping protocol support was added, the AF_INET6 protocol
    initialization cleanup path on failure was borked a little. Fix
    from Vlad Yasevich.

    19) If a socket disconnects during a read/recvmsg/recvfrom/etc that
    blocks we can do the wrong thing with the msg_name we write back to
    userspace. From Hannes Frederic Sowa. There is another fix in the
    works from Hannes which will prevent future problems of this nature.

    20) Fix route leak in VTI tunnel transmit, from Fan Du.
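
    Item 1 in the list above contrasts hard-coded op counts with something
    like ARRAY_SIZE(ops). A minimal user-space sketch of that idiom follows.
    The struct, the ops table and the helper functions are hypothetical and
    are not the genetlink API; the macro here is a simplified form that,
    unlike the kernel's version, does not check that its argument is really
    an array.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified element-count macro; only valid for true arrays. */
#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical operation descriptor, for illustration only. */
struct op_sketch {
    int cmd;
    const char *name;
};

static const struct op_sketch ops[] = {
    { 1, "get" },
    { 2, "set" },
    { 3, "dump" },
};

/* The count follows the table automatically when entries are added. */
size_t ops_count(void)
{
    return ARRAY_SIZE(ops);
}

/* Bounds-checked lookup that relies on the derived count. */
const char *op_name(size_t i)
{
    assert(i < ARRAY_SIZE(ops));
    return ops[i].name;
}
```

    Deriving the count from the table avoids the mismatch that arises when
    an entry is added or removed and a separately maintained constant is
    not updated.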

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (106 commits)
    genetlink: make multicast groups const, prevent abuse
    genetlink: pass family to functions using groups
    genetlink: add and use genl_set_err()
    genetlink: remove family pointer from genl_multicast_group
    genetlink: remove genl_unregister_mc_group()
    hsr: don't call genl_unregister_mc_group()
    quota/genetlink: use proper genetlink multicast APIs
    drop_monitor/genetlink: use proper genetlink multicast APIs
    genetlink: only pass array to genl_register_family_with_ops()
    tcp: don't update snd_nxt, when a socket is switched from repair mode
    atm: idt77252: fix dev refcnt leak
    xfrm: Release dst if this dst is improper for vti tunnel
    netlink: fix documentation typo in netlink_set_err()
    be2net: Delete secondary unicast MAC addresses during be_close
    be2net: Fix unconditional enabling of Rx interface options
    net, virtio_net: replace the magic value
    ping: prevent NULL pointer dereference on write to msg_name
    bnx2x: Prevent "timeout waiting for state X"
    bnx2x: prevent CFC attention
    bnx2x: Prevent panic during DMAE timeout
    ...

    Linus Torvalds