23 May, 2017

38 commits

  • To enable smp_processor_id() and might_sleep() debug checks earlier, it's
    required to add system states between SYSTEM_BOOTING and SYSTEM_RUNNING.

    Adjust the system_state check in async_run_entry_fn() and
    async_synchronize_cookie_domain() to handle the extra states.

    Tested-by: Mark Rutland
    Signed-off-by: Thomas Gleixner
    Signed-off-by: Peter Zijlstra (Intel)
    Acked-by: Arjan van de Ven
    Cc: Greg Kroah-Hartman
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Link: http://lkml.kernel.org/r/20170516184735.865155020@linutronix.de
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
     
  • To enable smp_processor_id() and might_sleep() debug checks earlier, it's
    required to add system states between SYSTEM_BOOTING and SYSTEM_RUNNING.

    Adjust the system_state check in of_iommu_driver_present() to handle the
    extra states.

    Tested-by: Mark Rutland
    Signed-off-by: Thomas Gleixner
    Signed-off-by: Peter Zijlstra (Intel)
    Acked-by: Joerg Roedel
    Acked-by: Robin Murphy
    Cc: Greg Kroah-Hartman
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: iommu@lists.linux-foundation.org
    Link: http://lkml.kernel.org/r/20170516184735.788023442@linutronix.de
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
     
  • To enable smp_processor_id() and might_sleep() debug checks earlier, it's
    required to add system states between SYSTEM_BOOTING and SYSTEM_RUNNING.

    Adjust the system_state checks in dmar_parse_one_atsr() and
    dmar_iommu_notify_scope_dev() to handle the extra states.

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Peter Zijlstra (Intel)
    Acked-by: Joerg Roedel
    Cc: David Woodhouse
    Cc: Greg Kroah-Hartman
    Cc: Linus Torvalds
    Cc: Mark Rutland
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: iommu@lists.linux-foundation.org
    Link: http://lkml.kernel.org/r/20170516184735.712365947@linutronix.de
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
     
  • To enable smp_processor_id() and might_sleep() debug checks earlier, it's
    required to add system states between SYSTEM_BOOTING and SYSTEM_RUNNING.

    Adjust the system_state check in pas_cpufreq_cpu_exit() to handle the extra
    states.

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Peter Zijlstra (Intel)
    Acked-by: Viresh Kumar
    Cc: Greg Kroah-Hartman
    Cc: Linus Torvalds
    Cc: Mark Rutland
    Cc: Peter Zijlstra
    Cc: Rafael J. Wysocki
    Cc: Steven Rostedt
    Cc: linuxppc-dev@lists.ozlabs.org
    Link: http://lkml.kernel.org/r/20170516184735.620023128@linutronix.de
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
     
  • To enable smp_processor_id() and might_sleep() debug checks earlier, it's
    required to add system states between SYSTEM_BOOTING and SYSTEM_RUNNING.

    get_nid_for_pfn() checks for system_state == SYSTEM_BOOTING to decide
    whether to use early_pfn_to_nid() when CONFIG_DEFERRED_STRUCT_PAGE_INIT=y.

    That check is dubious, because the switch to state RUNNING happens way after
    page_alloc_init_late() has been invoked.

    Change the check to "less than SYSTEM_RUNNING" so it covers the new
    intermediate states as well.

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Peter Zijlstra (Intel)
    Acked-by: Greg Kroah-Hartman
    Cc: Linus Torvalds
    Cc: Mark Rutland
    Cc: Mel Gorman
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Link: http://lkml.kernel.org/r/20170516184735.528279534@linutronix.de
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
     
  • To enable smp_processor_id() and might_sleep() debug checks earlier, it's
    required to add system states between SYSTEM_BOOTING and SYSTEM_RUNNING.

    Make the decision whether a PCI root is hotplugged depend on SYSTEM_RUNNING
    instead of !SYSTEM_BOOTING. It makes no sense to cover states greater than
    SYSTEM_RUNNING, as there are no hotplug events on reboot and poweroff.

    Tested-by: Mark Rutland
    Signed-off-by: Thomas Gleixner
    Signed-off-by: Peter Zijlstra (Intel)
    Reviewed-by: Steven Rostedt (VMware)
    Cc: Greg Kroah-Hartman
    Cc: Len Brown
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Rafael J. Wysocki
    Link: http://lkml.kernel.org/r/20170516184735.446455652@linutronix.de
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
     
  • To enable smp_processor_id() and might_sleep() debug checks earlier, it's
    required to add system states between SYSTEM_BOOTING and SYSTEM_RUNNING.

    Adjust the system_state check in smp_generic_cpu_bootable() to handle the
    extra states.

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Peter Zijlstra (Intel)
    Acked-by: Michael Ellerman
    Cc: Benjamin Herrenschmidt
    Cc: Greg Kroah-Hartman
    Cc: Linus Torvalds
    Cc: Mark Rutland
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: linuxppc-dev@lists.ozlabs.org
    Link: http://lkml.kernel.org/r/20170516184735.359536998@linutronix.de
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
     
  • To enable smp_processor_id() and might_sleep() debug checks earlier, it's
    required to add system states between SYSTEM_BOOTING and SYSTEM_RUNNING.

    Adjust the system_state check in stop_this_cpu() to handle the extra states.

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Greg Kroah-Hartman
    Cc: James Hogan
    Cc: Linus Torvalds
    Cc: Mark Rutland
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Link: http://lkml.kernel.org/r/20170516184735.283420315@linutronix.de
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
     
  • To enable smp_processor_id() and might_sleep() debug checks earlier, it's
    required to add system states between SYSTEM_BOOTING and SYSTEM_RUNNING.

    Adjust the system_state check in announce_cpu() to handle the extra states.

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Peter Zijlstra (Intel)
    Reviewed-by: Steven Rostedt (VMware)
    Cc: Greg Kroah-Hartman
    Cc: Linus Torvalds
    Cc: Mark Rutland
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20170516184735.191715856@linutronix.de
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
     
  • To enable smp_processor_id() and might_sleep() debug checks earlier, it's
    required to add system states between SYSTEM_BOOTING and SYSTEM_RUNNING.

    Adjust the system_state check in smp_send_stop() to handle the extra states.

    Tested-by: Mark Rutland
    Signed-off-by: Thomas Gleixner
    Signed-off-by: Peter Zijlstra (Intel)
    Acked-by: Mark Rutland
    Acked-by: Catalin Marinas
    Cc: Greg Kroah-Hartman
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: Will Deacon
    Link: http://lkml.kernel.org/r/20170516184735.112589728@linutronix.de
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
     
  • To enable smp_processor_id() and might_sleep() debug checks earlier, it's
    required to add system states between SYSTEM_BOOTING and SYSTEM_RUNNING.

    Adjust the system_state check in ipi_cpu_stop() to handle the extra states.

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Greg Kroah-Hartman
    Cc: Linus Torvalds
    Cc: Mark Rutland
    Cc: Peter Zijlstra
    Cc: Russell King
    Cc: Steven Rostedt
    Cc: linux-arm-kernel@lists.infradead.org
    Link: http://lkml.kernel.org/r/20170516184735.020718977@linutronix.de
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
     
    Some of the boot code in kernel_init_freeable(), which runs before SMP
    bringup, assumes (rightfully) that it runs on the boot CPU and therefore can
    use smp_processor_id() in preemptible context.

    That has worked so far because the smp_processor_id() check only becomes
    effective after SMP bringup. That's just wrong: starting with SMP bringup
    and the ability to move threads around, smp_processor_id() in preemptible
    context is broken.

    Aside from that, it makes no sense to allow init to run on all CPUs
    before sched_init_smp() has been run.

    Pin init to the boot CPU so the existing code can continue to use
    smp_processor_id() without triggering the checks once they are enabled
    earlier.

    Tested-by: Mark Rutland
    Signed-off-by: Thomas Gleixner
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Greg Kroah-Hartman
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Link: http://lkml.kernel.org/r/20170516184734.943149935@linutronix.de
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
     
    A customer has reported a soft lockup when running an intensive memory
    stress test, where the trace on multiple CPUs looks like this:

    RIP: 0010:[]
    [] native_queued_spin_lock_slowpath+0x10e/0x190
    ...
    Call Trace:
    [] queued_spin_lock_slowpath+0x7/0xa
    [] change_protection_range+0x3b1/0x930
    [] change_prot_numa+0x18/0x30
    [] task_numa_work+0x1fe/0x310
    [] task_work_run+0x72/0x90

    Further investigation showed that the lock contention here is pmd_lock().

    The task_numa_work() function ensures (via cmpxchg) that only one thread
    performs the work in a single scan period, but if a thread holds mmap_sem
    for writing for several periods, multiple threads in task_numa_work() can
    build up a convoy waiting for mmap_sem for read and then all get unblocked
    at once.

    This patch changes the down_read() to the trylock version, which prevents
    the buildup. For a workload experiencing mmap_sem contention, it's probably
    better to postpone the NUMA balancing work anyway. This seems to have fixed
    the soft lockups involving pmd_lock(), which is in line with the convoy
    theory.

    Signed-off-by: Vlastimil Babka
    Signed-off-by: Peter Zijlstra (Intel)
    Acked-by: Rik van Riel
    Acked-by: Mel Gorman
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/20170515131316.21909-1-vbabka@suse.cz
    Signed-off-by: Ingo Molnar

    Vlastimil Babka
     
  • With CONFIG_RT_GROUP_SCHED=y, do_sched_rt_period_timer() sequentially
    takes each CPU's rq->lock. On a large, busy system, the cumulative time it
    takes to acquire each lock can be excessive, even triggering a watchdog
    timeout.

    If rt_rq->rt_time and rt_rq->rt_nr_running are both zero, this function does
    nothing while holding the lock, so don't bother taking it at all.

    Signed-off-by: Dave Kleikamp
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/a767637b-df85-912f-ba69-c90ee00a3fb6@oracle.com
    Signed-off-by: Ingo Molnar

    Dave Kleikamp
     
    When priority inheritance was added to sched_setscheduler() back in 2.6.18,
    it added a path to taking an rt-mutex wait_lock, which is not IRQ safe. As
    PI is not a common occurrence, lockdep will likely never trigger if
    sched_setscheduler() is called from interrupt context. A BUG_ON() was added
    to trigger if __sched_setscheduler() was ever called from interrupt
    context, because there was a possibility of taking the wait_lock.

    Today the wait_lock is IRQ safe, but the path to taking it in
    sched_setscheduler() is the same as the path to taking it from normal
    context. The wait_lock is taken with raw_spin_lock_irq() and released with
    raw_spin_unlock_irq(), which unconditionally enables interrupts; that would
    be bad in interrupt context.

    The problem is that normalize_rt_tasks(), which is called when the sysrq
    nice-all-RT-tasks trigger fires, was changed to call __sched_setscheduler(),
    and this is done from interrupt context!

    Now __sched_setscheduler() takes a "pi" parameter that is used to know if
    the priority inheritance should be called or not. As the BUG_ON() only cares
    about calling the PI code, it should only bug if called from interrupt
    context with the "pi" parameter set to true.

    Reported-by: Laurent Dufour
    Tested-by: Laurent Dufour
    Signed-off-by: Steven Rostedt (VMware)
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andrew Morton
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Fixes: dbc7f069b93a ("sched: Use replace normalize_task() with __sched_setscheduler()")
    Link: http://lkml.kernel.org/r/20170308124654.10e598f2@gandalf.local.home
    Signed-off-by: Ingo Molnar

    Steven Rostedt (VMware)
     
  • pick_next_pushable_dl_task(rq) has BUG_ON(rq->cpu != task_cpu(task))
    when it returns a task other than NULL, which means that task_cpu(task)
    must be rq->cpu. So if task == next_task, then task_cpu(next_task) must
    be rq->cpu as well. Remove the redundant condition and make the code simpler.

    This way one unnecessary branch and two LOAD operations can be avoided.

    Signed-off-by: Byungchul Park
    Signed-off-by: Peter Zijlstra (Intel)
    Reviewed-by: Steven Rostedt (VMware)
    Reviewed-by: Juri Lelli
    Reviewed-by: Daniel Bristot de Oliveira
    Cc:
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/1494551159-22367-1-git-send-email-byungchul.park@lge.com
    Signed-off-by: Ingo Molnar

    Byungchul Park
     
    pick_next_pushable_task(rq) has BUG_ON(rq->cpu != task_cpu(task)) when
    it returns a task other than NULL, which means that task_cpu(task) must
    be rq->cpu. So if task == next_task, then task_cpu(next_task) must be
    rq->cpu as well. Remove the redundant condition and make the code simpler.

    This way one unnecessary branch and two LOAD operations can be avoided.

    Signed-off-by: Byungchul Park
    Signed-off-by: Peter Zijlstra (Intel)
    Reviewed-by: Steven Rostedt (VMware)
    Reviewed-by: Juri Lelli
    Reviewed-by: Daniel Bristot de Oliveira
    Cc:
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/1494551143-22219-1-git-send-email-byungchul.park@lge.com
    Signed-off-by: Ingo Molnar

    Byungchul Park
     
    Now that we've added llist_for_each_entry_safe(), use it to simplify an
    open-coded version of it in sched_ttwu_pending().

    Signed-off-by: Byungchul Park
    Signed-off-by: Peter Zijlstra (Intel)
    Cc:
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/1494549584-11730-1-git-send-email-byungchul.park@lge.com
    Signed-off-by: Ingo Molnar

    Byungchul Park
     
    Sometimes we have to dereference the next field of an llist node before
    entering the loop body, because the node might be deleted or its next field
    modified within the loop. So this adds the safe version of llist_for_each(),
    that is, llist_for_each_safe().

    Signed-off-by: Byungchul Park
    Signed-off-by: Peter Zijlstra (Intel)
    Reviewed-by: Huang, Ying
    Cc:
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/1494549416-10539-1-git-send-email-byungchul.park@lge.com
    Signed-off-by: Ingo Molnar

    Byungchul Park
     
    The cpumasks in smp_call_function_many() are private and not subject to
    concurrency, so atomic bitops are pointless and expensive.

    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
    An Inter-Processor Interrupt (IPI) is needed when a page is unmapped and
    the process' mm_cpumask() shows the process has ever run on other CPUs.
    Page migration and page reclaim both need IPIs. The number of IPIs that
    need to be sent to different CPUs is especially large for multi-threaded
    workloads, since mm_cpumask() is per process.

    In smp_call_function_many(), whenever a CPU queues a CSD to a target CPU,
    it sends an IPI to let the target CPU handle the work. This isn't
    necessary: we only need to send an IPI when queueing a CSD to an empty
    call_single_queue.

    The reason:

    flush_smp_call_function_queue(), which is called when a CPU receives an
    IPI, empties the queue and then handles all of the CSDs there. So if
    the target CPU's call_single_queue is not empty, we know that:
    i. An IPI for the target CPU has already been sent by a previous queuer;
    ii. flush_smp_call_function_queue() hasn't emptied that CPU's queue yet.
    Thus, it's safe for us to just queue our CSD there without sending an
    additional IPI. And the "previous queuers" can be limited to the first
    queuer.

    To demonstrate the effect of this patch, a multi-threaded workload that
    spawns 80 threads to consume 100G of memory equally was used. This was
    tested on a 2-node Broadwell-EP with 44 cores/88 threads and 32G of
    memory, so after 32G of memory is used up, page reclaim starts to happen
    a lot.

    With this patch, IPI number dropped 88% and throughput increased about
    15% for the above workload.

    Signed-off-by: Aaron Lu
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Dave Hansen
    Cc: Huang Ying
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Tim Chen
    Link: http://lkml.kernel.org/r/20170519075331.GE2084@aaronlu.sh.intel.com
    Signed-off-by: Ingo Molnar

    Aaron Lu
     
  • Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • Pull crypto fix from Herbert Xu:
    "This fixes a regression in the skcipher interface that allows bogus
    key parameters to hit underlying implementations which can cause
    crashes"

    * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
    crypto: skcipher - Add missing API setkey checks

    Linus Torvalds
     
  • Pull pstore fix from Kees Cook:
    "Marta noticed another misbehavior in EFI pstore, which this fixes.

    Hopefully this is the last of the v4.12 fixes for pstore!"

    * tag 'pstore-v4.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
    efi-pstore: Fix write/erase id tracking

    Linus Torvalds
     
  • Pull ACPI fixes from Rafael Wysocki:
    "These revert a 4.11 change that turned out to be problematic and add a
    .gitignore file.

    Specifics:

    - Revert a 4.11 commit related to the ACPI-based handling of laptop
    lids that made changes incompatible with existing user space stacks
    and broke things there (Lv Zheng).

    - Add .gitignore to the ACPI tools directory (Prarit Bhargava)"

    * tag 'acpi-4.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
    Revert "ACPI / button: Remove lid_init_state=method mode"
    tools/power/acpi: Add .gitignore file

    Linus Torvalds
     
  • Pull power management fixes from Rafael Wysocki:
    "These fix RTC wakeup from suspend-to-idle broken recently, fix CPU
    idleness detection condition in the schedutil cpufreq governor, fix a
    cpufreq driver build failure, fix an error code path in the power
    capping framework, clean up the hibernate core and update the
    intel_pstate documentation.

    Specifics:

    - Fix RTC wakeup from suspend-to-idle broken by the recent rework of
    ACPI wakeup handling (Rafael Wysocki).

    - Update intel_pstate driver documentation to reflect the current
    code and explain how it works in more detail (Rafael Wysocki).

    - Fix an issue related to CPU idleness detection on systems with
    shared cpufreq policies in the schedutil governor (Juri Lelli).

    - Fix a possible build issue in the dbx500 cpufreq driver (Arnd
    Bergmann).

    - Fix a function in the power capping framework core to return an
    error code instead of 0 when there's an error (Dan Carpenter).

    - Clean up variable definition in the hibernation core (Pushkar
    Jambhlekar)"

    * tag 'pm-4.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
    cpufreq: dbx500: add a Kconfig symbol
    PM / hibernate: Declare variables as static
    PowerCap: Fix an error code in powercap_register_zone()
    RTC: rtc-cmos: Fix wakeup from suspend-to-idle
    PM / wakeup: Fix up wakeup_source_report_event()
    cpufreq: intel_pstate: Document the current behavior and user interface
    cpufreq: schedutil: use now as reference when aggregating shared policy requests

    Linus Torvalds
     
    We need to initialize those variables to 0 for platforms that do not
    provide ACPI parameters. Otherwise, we set sda_hold_time to random values,
    breaking e.g. the Galileo and IOT2000 boards.

    Reported-and-tested-by: Linus Torvalds
    Reported-by: Tobias Klausmann
    Fixes: 9d6408433019 ("i2c: designware: don't infer timings described by ACPI from clock rate")
    Signed-off-by: Jan Kiszka
    Reviewed-by: Ard Biesheuvel
    Acked-by: Jarkko Nikula
    Signed-off-by: Wolfram Sang
    Signed-off-by: Linus Torvalds

    Jan Kiszka
     
  • Prior to the pstore interface refactoring, the "id" generated during
    a backend pstore_write() was only retained by the internal pstore
    inode tracking list. Additionally the "part" was ignored, so EFI
    would encode this in the id. This corrects the misunderstandings
    and correctly sets "id" during pstore_write(), and uses "part"
    directly during pstore_erase().

    Reported-by: Marta Lofstedt
    Fixes: 76cc9580e3fb ("pstore: Replace arguments for write() API")
    Fixes: a61072aae693 ("pstore: Replace arguments for erase() API")
    Signed-off-by: Kees Cook
    Tested-by: Marta Lofstedt

    Kees Cook
     
  • Pull networking fixes from David Miller:
    "Mostly netfilter bug fixes in here, but we have some bits elsewhere as
    well.

    1) Don't do SNAT replies for non-NATed connections in IPVS, from
    Julian Anastasov.

    2) Don't delete conntrack helpers while they are still in use, from
    Liping Zhang.

    3) Fix zero padding in xtables's xt_data_to_user(), from Willem de
    Bruijn.

    4) Add proper RCU protection to nf_tables_dump_set() because we
    cannot guarantee that we hold the NFNL_SUBSYS_NFTABLES lock. From
    Liping Zhang.

    5) Initialize rcv_mss in tcp_disconnect(), from Wei Wang.

    6) smsc95xx devices can't handle IPV6 checksums fully, so don't
    advertise support for offloading them. From Nisar Sayed.

    7) Fix out-of-bounds access in __ip6_append_data(), from Eric
    Dumazet.

    8) Make atl2_probe() propagate the error code properly on failures,
    from Alexey Khoroshilov.

    9) arp_target[] in bond_check_params() is used uninitialized. This
    got changed from a global static to a local variable, which is how
    this mistake happened. Fix from Jarod Wilson.

    10) Fix fallout from unnecessary NULL check removal in cls_matchall,
    from Jiri Pirko. This is definitely brown paper bag territory..."

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (26 commits)
    net: sched: cls_matchall: fix null pointer dereference
    vsock: use new wait API for vsock_stream_sendmsg()
    bonding: fix randomly populated arp target array
    net: Make IP alignment calulations clearer.
    bonding: fix accounting of active ports in 3ad
    net: atheros: atl2: don't return zero on failure path in atl2_probe()
    ipv6: fix out of bound writes in __ip6_append_data()
    bridge: start hello_timer when enabling KERNEL_STP in br_stp_start
    smsc95xx: Support only IPv4 TCP/UDP csum offload
    arp: always override existing neigh entries with gratuitous ARP
    arp: postpone addr_type calculation to as late as possible
    arp: decompose is_garp logic into a separate function
    arp: fixed error in a comment
    tcp: initialize rcv_mss to TCP_MIN_MSS instead of 0
    netfilter: xtables: fix build failure from COMPAT_XT_ALIGN outside CONFIG_COMPAT
    ebtables: arpreply: Add the standard target sanity check
    netfilter: nf_tables: revisit chain/object refcounting from elements
    netfilter: nf_tables: missing sanitization in data from userspace
    netfilter: nf_tables: can't assume lock is acquired when dumping set elems
    netfilter: synproxy: fix conntrackd interaction
    ...

    Linus Torvalds
     
  • Since the head is guaranteed by the check above to be null, the call_rcu
    would explode. Remove the previously logically dead code that was made
    logically very much alive and kicking.

    Fixes: 985538eee06f ("net/sched: remove redundant null check on head")
    Signed-off-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Jiri Pirko
     
  • As reported by Michal, vsock_stream_sendmsg() could still
    sleep at vsock_stream_has_space() after prepare_to_wait():

    vsock_stream_has_space
    vmci_transport_stream_has_space
    vmci_qpair_produce_free_space
    qp_lock
    qp_acquire_queue_mutex
    mutex_lock

    Just switch to the new wait API like we did for commit
    d9dc8b0f8b4e ("net: fix sleeping for sk_wait_event()").

    Reported-by: Michal Kubecek
    Cc: Stefan Hajnoczi
    Cc: Jorgen Hansen
    Cc: "Michael S. Tsirkin"
    Cc: Claudio Imbrenda
    Signed-off-by: Cong Wang
    Reviewed-by: Stefan Hajnoczi
    Signed-off-by: David S. Miller

    WANG Cong
     
    In commit dc9c4d0fe023, the arp_target array moved from a static global
    to a local variable. By the nature of static globals, the array used to
    be initialized to all 0. At present, it's full of random data, which
    gets interpreted as arp_target values when none have actually been
    specified. Systems end up booting with spew along these lines:

    [ 32.161783] IPv6: ADDRCONF(NETDEV_UP): lacp0: link is not ready
    [ 32.168475] IPv6: ADDRCONF(NETDEV_UP): lacp0: link is not ready
    [ 32.175089] 8021q: adding VLAN 0 to HW filter on device lacp0
    [ 32.193091] IPv6: ADDRCONF(NETDEV_UP): lacp0: link is not ready
    [ 32.204892] lacp0: Setting MII monitoring interval to 100
    [ 32.211071] lacp0: Removing ARP target 216.124.228.17
    [ 32.216824] lacp0: Removing ARP target 218.160.255.255
    [ 32.222646] lacp0: Removing ARP target 185.170.136.184
    [ 32.228496] lacp0: invalid ARP target 255.255.255.255 specified for removal
    [ 32.236294] lacp0: option arp_ip_target: invalid value (-255.255.255.255)
    [ 32.243987] lacp0: Removing ARP target 56.125.228.17
    [ 32.249625] lacp0: Removing ARP target 218.160.255.255
    [ 32.255432] lacp0: Removing ARP target 15.157.233.184
    [ 32.261165] lacp0: invalid ARP target 255.255.255.255 specified for removal
    [ 32.268939] lacp0: option arp_ip_target: invalid value (-255.255.255.255)
    [ 32.276632] lacp0: Removing ARP target 16.0.0.0
    [ 32.281755] lacp0: Removing ARP target 218.160.255.255
    [ 32.287567] lacp0: Removing ARP target 72.125.228.17
    [ 32.293165] lacp0: Removing ARP target 218.160.255.255
    [ 32.298970] lacp0: Removing ARP target 8.125.228.17
    [ 32.304458] lacp0: Removing ARP target 218.160.255.255

    None of these were actually specified as ARP targets, and the driver does
    seem to clean up the mess okay, but it's rather noisy and confusing, leaks
    values to userspace, and the 255.255.255.255 spew shows up even when debug
    prints are disabled.

    The fix: just zero out arp_target at init time.

    While we're in here, init arp_all_targets_value in the right place.

    Fixes: dc9c4d0fe023 ("bonding: reduce scope of some global variables")
    CC: Mahesh Bandewar
    CC: Jay Vosburgh
    CC: Veaceslav Falico
    CC: Andy Gospodarek
    CC: netdev@vger.kernel.org
    CC: stable@vger.kernel.org
    Signed-off-by: Jarod Wilson
    Acked-by: Andy Gospodarek
    Signed-off-by: David S. Miller

    Jarod Wilson
     
  • * pm-sleep:
    PM / hibernate: Declare variables as static
    RTC: rtc-cmos: Fix wakeup from suspend-to-idle
    PM / wakeup: Fix up wakeup_source_report_event()

    * powercap:
    PowerCap: Fix an error code in powercap_register_zone()

    Rafael J. Wysocki
     
  • * acpi-button:
    Revert "ACPI / button: Remove lid_init_state=method mode"

    * acpi-tools:
    tools/power/acpi: Add .gitignore file

    Rafael J. Wysocki
     
  • * intel_pstate:
    cpufreq: intel_pstate: Document the current behavior and user interface

    * pm-cpufreq:
    cpufreq: dbx500: add a Kconfig symbol

    * pm-cpufreq-sched:
    cpufreq: schedutil: use now as reference when aggregating shared policy requests

    Rafael J. Wysocki
     
    The assignment:

    ip_align = strict ? 2 : NET_IP_ALIGN;

    in compare_pkt_ptr_alignment() trips up Coverity, because we can only
    get to this code when strict is true; therefore ip_align will always
    be 2 regardless of NET_IP_ALIGN's value.

    So just assign 2 directly and explain the situation in the comment
    above it.

    Reported-by: "Gustavo A. R. Silva"
    Signed-off-by: David S. Miller

    David S. Miller
     
    As of 7bb11dc9f59d and 0622cab0341c, bond slaves in a 3ad bond are not
    removed from the aggregator when they are down, and the active slave count
    is NOT equal to the number of ports in the aggregator, but rather the
    number of ports in the aggregator that are still enabled. The sysfs spew
    for bonding_show_ad_num_ports() has a comment that says "Show number of
    active 802.3ad ports.", but it currently shows the total number of ports,
    both active and inactive. Remedy this by using the same logic introduced
    in 0622cab0341c in __bond_3ad_get_active_agg_info(), so sysfs, procfs and
    netlink all report the number of active ports. Note that this means that
    IFLA_BOND_AD_INFO_NUM_PORTS really means NUM_ACTIVE_PORTS instead of
    NUM_PORTS, and thus perhaps should be renamed for clarity.

    Lightly tested on a dual i40e LACP bond, simulating link downs with an ip
    link set dev down; I was able to produce a state where both ports were in
    the same aggregator, but the number of ports count was 1.

    MII Status: up
    Active Aggregator Info:
    Aggregator ID: 1
    Number of ports: 2
    CC: Veaceslav Falico
    CC: Andy Gospodarek
    CC: netdev@vger.kernel.org
    Signed-off-by: Jarod Wilson
    Signed-off-by: David S. Miller

    Jarod Wilson
     
    If the DMA mask checks fail in atl2_probe(), it breaks off initialization
    and deallocates all resources, but returns zero.

    The patch adds a proper error code return value and unifies the error
    code setup.

    Found by Linux Driver Verification project (linuxtesting.org).

    Signed-off-by: Alexey Khoroshilov
    Signed-off-by: David S. Miller

    Alexey Khoroshilov
     

22 May, 2017

2 commits

    Andrey Konovalov and idaifish@gmail.com reported crashes caused by an
    skb's shared_info being overwritten from __ip6_append_data().

    Andrey's program led to the following state:

    copy -4200 datalen 2000 fraglen 2040
    maxfraglen 2040 alloclen 2048 transhdrlen 0 offset 0 fraggap 6200

    The skb_copy_and_csum_bits(skb_prev, maxfraglen, data + transhdrlen,
    fraggap, 0); call is overwriting skb->head and the skb_shared_info.

    Since we apparently detect this rare condition too late, move the check
    earlier to avoid even allocating the skb and risking crashes.

    Once again, many thanks to Andrey and syzkaller team.

    Signed-off-by: Eric Dumazet
    Reported-by: Andrey Konovalov
    Tested-by: Andrey Konovalov
    Reported-by:
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • Linus Torvalds