12 May, 2016

11 commits

  • Use the future-safe accessor for struct task_struct's.

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Sebastian Andrzej Siewior
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Linus Torvalds
    Cc: Mike Galbraith
    Cc: Peter Zijlstra
    Cc: linux-kernel@vger.kernel.org
    Link: http://lkml.kernel.org/r/1462969411-17735-1-git-send-email-bigeasy@linutronix.de
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
     
  • Systems show a minimal load average of 0.00, 0.01, 0.05 even when they
    have no load at all.

    Uptime and /proc/loadavg on all systems with kernels released during the
    last five years up until kernel version 4.6-rc5, show a 5- and 15-minute
    minimum loadavg of 0.01 and 0.05 respectively. This should be 0.00 on
    idle systems, but the way the kernel calculates this value prevents it
    from getting lower than the mentioned values.

    Likewise, though not as obviously noticeable, a fully loaded system
    with no processes waiting shows a maximum 1/5/15 loadavg of 1.00, 0.99,
    0.95 (multiplied by the number of cores).

    Once the (old) load becomes 93 or higher, it mathematically can never
    get lower than 93, even when the active (load) remains 0 forever.
    This results in the strange 0.00, 0.01, 0.05 uptime values on idle
    systems. Note: 93/2048 = 0.0454..., which rounds up to 0.05.

    It is not correct to add a 0.5 rounding (=1024/2048) here, since the
    result from this function is fed back into the next iteration again,
    so the result of that +0.5 rounding value then gets multiplied by
    (2048-2037), and then rounded again, so there is a virtual "ghost"
    load created, next to the old and active load terms.

    By changing the way the internally kept value is rounded, that internal
    value can now reach the equivalent of 0.00 on idle and 1.00 on full
    load. When the load is increasing, the internally kept load value is
    rounded up; when the load is decreasing, it is rounded down.
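    The rounding change can be sketched in Python (a toy model of the
    kernel's fixed-point load update; FIXED_1 = 2048 and the 15-minute
    decay factor 2037 come from the text above, the function names are
    illustrative, not the kernel's):

```python
FIXED_1 = 1 << 11   # 2048: fixed-point 1.0, as used by the kernel's loadavg
EXP_15  = 2037      # 15-minute decay factor (2048 - 2037 = 11 per tick)

def calc_load_old(load, exp, active):
    # pre-patch behaviour: unconditional +0.5 rounding every iteration
    return (load * exp + active * (FIXED_1 - exp) + (FIXED_1 >> 1)) >> 11

def calc_load_new(load, exp, active):
    # patched behaviour: round up while load rises, truncate while it decays
    newload = load * exp + active * (FIXED_1 - exp)
    if active >= load:
        newload += FIXED_1 - 1
    return newload >> 11

# decay from a loaded state with zero active load
old = new = 500
for _ in range(10000):
    old = calc_load_old(old, EXP_15, 0)
    new = calc_load_new(new, EXP_15, 0)

print(old, new)   # old sticks at 93 (displayed as 0.05), new reaches 0
```

    With active = 0, 93 * 2037 + 1024 >= 93 * 2048, so the old formula can
    never drop below the fixed-point value 93, matching the 0.05 floor
    described above.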

    The modified code was tested on nohz=off and nohz kernels, on vanilla
    kernel 4.6-rc5 and on the CentOS 7.1 kernel 3.10.0-327, on single-,
    dual-, and octal-core systems, and on virtual hosts as well as bare
    hardware. No unwanted effects were observed, and the problems the patch
    intended to fix were indeed gone.

    Tested-by: Damien Wyart
    Signed-off-by: Vik Heyndrickx
    Signed-off-by: Peter Zijlstra (Intel)
    Cc:
    Cc: Doug Smythies
    Cc: Linus Torvalds
    Cc: Mike Galbraith
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Fixes: 0f004f5a696a ("sched: Cure more NO_HZ load average woes")
    Link: http://lkml.kernel.org/r/e8d32bff-d544-7748-72b5-3c86cc71f09f@veribox.net
    Signed-off-by: Ingo Molnar

    Vik Heyndrickx
     
  • In calculate_imbalance(), load_above_capacity currently has the unit
    [capacity] while it is used as [load/capacity]. Not only is this wrong,
    it also makes it unlikely that load_above_capacity is ever used, since
    the subsequent code picks the smaller of load_above_capacity and
    avg_load.

    This patch ensures that load_above_capacity has the right unit
    [load/capacity].

    Signed-off-by: Morten Rasmussen
    [ Changed changelog to note it was in capacity unit; +rebase. ]
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Dietmar Eggemann
    Cc: Linus Torvalds
    Cc: Mike Galbraith
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: linux-kernel@vger.kernel.org
    Link: http://lkml.kernel.org/r/1461958364-675-4-git-send-email-dietmar.eggemann@arm.com
    Signed-off-by: Ingo Molnar

    Morten Rasmussen
     
  • Wanpeng noted that the scale_load_down() in calculate_imbalance() was
    weird. I agree, it should be SCHED_CAPACITY_SCALE, since we're going
    to compare against busiest->group_capacity, which is in [capacity]
    units.

    Reported-by: Wanpeng Li
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Linus Torvalds
    Cc: Mike Galbraith
    Cc: Morten Rasmussen
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Yuyang Du
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • The following commit:

    9642d18eee2c ("nohz: Affine unpinned timers to housekeepers")

    intended to affine unpinned timers to housekeepers:

    unpinned timers (full dynticks, idle) => nearest busy housekeeper (otherwise, fall back to any housekeeper)
    unpinned timers (full dynticks, busy) => nearest busy housekeeper (otherwise, fall back to any housekeeper)
    unpinned timers (housekeeper, idle)   => nearest busy housekeeper (otherwise, fall back to itself)

    However, the !idle_cpu(i) && is_housekeeping_cpu(cpu) check modified
    the intention to:

    unpinned timers (full dynticks, idle) => any housekeeper (no matter the CPU topology)
    unpinned timers (full dynticks, busy) => any housekeeper (no matter the CPU topology)
    unpinned timers (housekeeper, idle)   => any busy CPU (otherwise, fall back to any housekeeper)

    This patch fixes it by checking whether there are busy housekeepers
    nearby, otherwise falling back to any housekeeper (or to itself). After
    the patch:

    unpinned timers (full dynticks, idle) => nearest busy housekeeper (otherwise, fall back to any housekeeper)
    unpinned timers (full dynticks, busy) => nearest busy housekeeper (otherwise, fall back to any housekeeper)
    unpinned timers (housekeeper, idle)   => nearest busy housekeeper (otherwise, fall back to itself)
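    The intended policy in the table above can be sketched as a selection
    function (a hypothetical model only; the names and the topology-ordered
    CPU list are illustrative, not the kernel's API):

```python
def pick_timer_cpu(this_cpu, housekeepers, idle, cpus_by_distance):
    """Pick a target CPU for an unpinned timer armed on this_cpu.

    housekeepers:     set of housekeeping CPUs
    idle:             set of currently idle CPUs
    cpus_by_distance: other CPUs ordered nearest-first from this_cpu
    """
    # 1) prefer the nearest busy housekeeper
    for cpu in cpus_by_distance:
        if cpu in housekeepers and cpu not in idle:
            return cpu
    # 2) fall back: a housekeeper keeps the timer itself,
    #    a full-dynticks CPU pushes it to any housekeeper
    if this_cpu in housekeepers:
        return this_cpu
    return next(cpu for cpu in cpus_by_distance if cpu in housekeepers)
```

    The buggy check effectively skipped step 1's topology ordering for
    dynticks CPUs and step 2's "itself" fallback for housekeepers.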

    Signed-off-by: Wanpeng Li
    Signed-off-by: Peter Zijlstra (Intel)
    [ Fixed the changelog. ]
    Cc: Frederic Weisbecker
    Cc: Linus Torvalds
    Cc: Mike Galbraith
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: linux-kernel@vger.kernel.org
    Fixes: 9642d18eee2c ("nohz: Affine unpinned timers to housekeepers")
    Link: http://lkml.kernel.org/r/1462344334-8303-1-git-send-email-wanpeng.li@hotmail.com
    Signed-off-by: Ingo Molnar

    Wanpeng Li
     
  • Pavan reported that in the presence of very light tasks (or cgroups)
    the placement of migrated tasks can cause severe fairness issues.

    The problem is that enqueue_entity() places the task before it updates
    time, and can thereby place the task far in the past (remember that
    light tasks will shoot virtual time forward at high speed, so relative
    to the pre-existing light task we can land far in the past).

    This is done because update_curr() needs the current task, and we
    might be placing the current task.

    The obvious solution is to differentiate between the current and any
    other task; placing the current before we update time, and placing any
    other task after, such that !curr tasks end up at the current moment
    in time, and not in the past.
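    A toy model of that ordering (names simplified, not the kernel's actual
    API; update_curr() here just advances the queue's virtual time, and
    place() enqueues at the current virtual time):

```python
# Toy model of the ordering fix: update_curr() advances virtual time;
# a task "placed" before the update lands in the past.
class Rq:
    def __init__(self):
        self.min_vruntime = 0

def update_curr(rq, delta=100):
    rq.min_vruntime += delta      # light tasks push virtual time forward fast

def place(rq):
    return rq.min_vruntime        # enqueue at the current virtual time

def enqueue_old(rq):
    v = place(rq)                 # placed before the update: in the past
    update_curr(rq)
    return v

def enqueue_new(rq, is_curr):
    if is_curr:                   # the current task is placed before it
        v = place(rq)             # updates its own time
        update_curr(rq)
    else:                         # any other task: update time, then place
        update_curr(rq)
        v = place(rq)
    return v

rq = Rq()
print(enqueue_new(rq, is_curr=False))   # placed at the current moment
```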

    This commit re-introduces the previously reverted commit:

    3a47d5124a95 ("sched/fair: Fix fairness issue on migration")

    ... which is now safe to do, after we've also fixed another
    underlying bug first, in:

    sched/fair: Prepare to fix fairness problems on migration

    and cleaned up other details in the migration code:

    sched/core: Kill sched_class::task_waking

    Reported-by: Pavan Kondeti
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Linus Torvalds
    Cc: Mike Galbraith
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • With sched_class::task_waking being called only when we do
    set_task_cpu(), we can make sched_class::migrate_task_rq() do the work
    and eliminate sched_class::task_waking entirely.

    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andrew Hunter
    Cc: Ben Segall
    Cc: Linus Torvalds
    Cc: Matt Fleming
    Cc: Mike Galbraith
    Cc: Morten Rasmussen
    Cc: Paul Turner
    Cc: Pavan Kondeti
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: byungchul.park@lge.com
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Mike reported that our recent attempt to fix migration problems:

    3a47d5124a95 ("sched/fair: Fix fairness issue on migration")

    broke interactivity and the signal starve test. We reverted that
    commit and now let's try it again more carefully, with some other
    underlying problems fixed first.

    One problem is that I assumed ENQUEUE_WAKING was only set when we do a
    cross-cpu wakeup (migration), which isn't true. This means we now
    destroy the vruntime history of tasks and wakeup-preemption suffers.

    Cure this by making my assumption true: only call
    sched_class::task_waking() when we do a cross-CPU wakeup. This avoids
    the indirect call in the case of a local wakeup.

    Reported-by: Mike Galbraith
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andrew Hunter
    Cc: Ben Segall
    Cc: Linus Torvalds
    Cc: Matt Fleming
    Cc: Mike Galbraith
    Cc: Morten Rasmussen
    Cc: Paul Turner
    Cc: Pavan Kondeti
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: byungchul.park@lge.com
    Cc: linux-kernel@vger.kernel.org
    Fixes: 3a47d5124a95 ("sched/fair: Fix fairness issue on migration")
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Since I want to make ->task_woken() conditional on the task getting
    migrated, we cannot use it to call record_wakee().

    Move it to select_task_rq_fair(), which gets called in almost all the
    same conditions. The only exception is if the woken task (@p) is
    CPU-bound (as per the nr_cpus_allowed test in select_task_rq()).

    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andrew Hunter
    Cc: Ben Segall
    Cc: Linus Torvalds
    Cc: Matt Fleming
    Cc: Mike Galbraith
    Cc: Morten Rasmussen
    Cc: Paul Turner
    Cc: Pavan Kondeti
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: byungchul.park@lge.com
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Conflicts:
    kernel/sched/core.c

    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • Signed-off-by: Ingo Molnar

    Ingo Molnar
     

11 May, 2016

4 commits


10 May, 2016

10 commits

  • We got this warning:

    WARNING: CPU: 1 PID: 2468 at kernel/sched/core.c:1161 set_task_cpu+0x1af/0x1c0
    [...]
    Call Trace:

    dump_stack+0x63/0x87
    __warn+0xd1/0xf0
    warn_slowpath_null+0x1d/0x20
    set_task_cpu+0x1af/0x1c0
    push_dl_task.part.34+0xea/0x180
    push_dl_tasks+0x17/0x30
    __balance_callback+0x45/0x5c
    __sched_setscheduler+0x906/0xb90
    SyS_sched_setattr+0x150/0x190
    do_syscall_64+0x62/0x110
    entry_SYSCALL64_slow_path+0x25/0x25

    This corresponds to:

    WARN_ON_ONCE(p->state == TASK_RUNNING &&
                 p->sched_class == &fair_sched_class &&
                 (p->on_rq && !task_on_rq_migrating(p)))

    It happens because in find_lock_later_rq(), the task whose scheduling
    class was changed to fair class is still pushed away as if it were
    a deadline task ...

    So, check in find_lock_later_rq() after double_lock_balance(), if the
    scheduling class of the deadline task was changed, break and retry.

    Apply the same logic to RT tasks.

    Signed-off-by: Xunlei Pang
    Reviewed-by: Steven Rostedt
    Acked-by: Peter Zijlstra
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: Juri Lelli
    Link: http://lkml.kernel.org/r/1462767091-1215-1-git-send-email-xlpang@redhat.com
    Signed-off-by: Ingo Molnar

    Xunlei Pang
     
  • Josef reported that the uncore driver trips over with CONFIG_SMP=n because
    x86_max_cores is 16 instead of 12.

    The reason is that for SMP=n the extended topology detection is a NOOP
    and the cache leaf is used to determine the number of cores. That's
    wrong in two aspects:

    1) The cache leaf enumerates the maximum addressable number of cores in the
    package, which is obviously not correct

    2) UP has no business with topology bits at all.

    Make intel_num_cpu_cores() return 1 for CONFIG_SMP=n.

    Reported-by: Josef Bacik
    Signed-off-by: Thomas Gleixner
    Cc: Peter Zijlstra
    Cc: kernel-team
    Cc: Kan Liang
    Link: http://lkml.kernel.org/r/761b4a2a-0332-7954-f030-c6639f949612@fb.com

    Thomas Gleixner
     
  • Pull libnvdimm build fix from Dan Williams:
    "A build fix for the usage of HPAGE_SIZE in the last libnvdimm pull
    request.

    I have taken note that the kbuild robot build success test does not
    include results for alpha_allmodconfig. Thanks to Guenter for the
    report. It's tagged for -stable since the original fix will land
    there and cause build problems"

    * 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
    libnvdimm, pfn: fix ARCH=alpha allmodconfig build failure

    Linus Torvalds
     
  • Allowing unprivileged kernel profiling lets any user follow kernel
    control flow and dump kernel registers. This most likely allows trivial
    kASLR bypassing, and it may allow other mischief as well. (Off the top
    of my head, the PERF_SAMPLE_REGS_INTR output during /dev/urandom reads
    could be quite interesting.)

    Signed-off-by: Andy Lutomirski
    Acked-by: Kees Cook
    Signed-off-by: Linus Torvalds

    Andy Lutomirski
     
  • Merge fixes from Andrew Morton:
    "2 fixes"

    * emailed patches from Andrew Morton :
    zsmalloc: fix zs_can_compact() integer overflow
    Revert "proc/base: make prompt shell start from new line after executing "cat /proc/$pid/wchan""

    Linus Torvalds
     
  • zs_can_compact() has two race conditions in its core calculation:

    unsigned long obj_wasted = zs_stat_get(class, OBJ_ALLOCATED) -
                               zs_stat_get(class, OBJ_USED);

    1) classes are not locked, so the numbers of allocated and used
    objects can change by the concurrent ops happening on other CPUs
    2) shrinker invokes it from preemptible context

    Thus, depending on the circumstances, OBJ_ALLOCATED can become less
    than OBJ_USED, which can result in either a very high or a negative
    `total_scan' value calculated later in do_shrink_slab().

    do_shrink_slab() has some logic to prevent those cases:

    vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62
    vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62
    vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-64
    vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62
    vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62
    vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62

    However, due to the way `total_scan' is calculated, not every
    shrinker->count_objects() overflow can be spotted and handled.
    To demonstrate the latter, I added some debugging code to do_shrink_slab()
    (x86_64) and the results were:

    vmscan: OVERFLOW: shrinker->count_objects() == -1 [18446744073709551615]
    vmscan: but total_scan > 0: 92679974445502
    vmscan: resulting total_scan: 92679974445502
    [..]
    vmscan: OVERFLOW: shrinker->count_objects() == -1 [18446744073709551615]
    vmscan: but total_scan > 0: 22634041808232578
    vmscan: resulting total_scan: 22634041808232578

    Even though shrinker->count_objects() has returned an overflowed value,
    the resulting `total_scan' is positive, and, what is more worrisome, it
    is insanely huge. This value is getting used later on in
    shrinker->scan_objects() loop:

    while (total_scan >= batch_size ||
           total_scan >= freeable) {
            unsigned long ret;
            unsigned long nr_to_scan = min(batch_size, total_scan);

            shrinkctl->nr_to_scan = nr_to_scan;
            ret = shrinker->scan_objects(shrinker, shrinkctl);
            if (ret == SHRINK_STOP)
                    break;
            freed += ret;

            count_vm_events(SLABS_SCANNED, nr_to_scan);
            total_scan -= nr_to_scan;

            cond_resched();
    }

    `total_scan >= batch_size' is true for a very, very long time and
    `total_scan >= freeable' is also true for quite some time, because
    `freeable < 0' and `total_scan' is large enough, for example,
    22634041808232578. The only break condition, in the given scheme of
    things, is the shrinker->scan_objects() == SHRINK_STOP test, which is
    a bit too weak to rely on, especially in heavy zsmalloc-usage
    scenarios.

    To fix the issue, take a pool stat snapshot and use it instead of
    racy zs_stat_get() calls.
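    The underflow and the snapshot-style fix can be illustrated with 64-bit
    unsigned arithmetic (a toy model; the stat reads and the race are
    simulated, not zsmalloc's actual API):

```python
U64_MASK = (1 << 64) - 1   # model C's unsigned long on x86_64

def obj_wasted_racy(allocated, used):
    # two separate racy reads: if a concurrent op makes used > allocated,
    # the unsigned subtraction wraps around to a huge value
    return (allocated - used) & U64_MASK

def obj_wasted_snapshot(stats):
    # the fix: compute from one consistent snapshot of the pool stats
    allocated, used = stats
    return max(allocated - used, 0)

# two objects freed between the OBJ_ALLOCATED and OBJ_USED reads:
print(obj_wasted_racy(100, 102))        # 18446744073709551614 ("-2" as u64)
print(obj_wasted_snapshot((100, 102)))  # 0
```

    The wrapped value is the same order of magnitude as the -1 ==
    18446744073709551615 overflow shown in the debug output above.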

    Link: http://lkml.kernel.org/r/20160509140052.3389-1-sergey.senozhatsky@gmail.com
    Signed-off-by: Sergey Senozhatsky
    Cc: Minchan Kim
    Cc: [4.3+]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sergey Senozhatsky
     
  • This reverts the 4.6-rc1 commit 7e2bc81da333 ("proc/base: make prompt
    shell start from new line after executing "cat /proc/$pid/wchan")
    because it breaks /proc/$PID/wchan formatting in ps and top.

    Revert also because the patch is inconsistent - it adds a newline at the
    end of only the '0' wchan, and does not add a newline when
    /proc/$PID/wchan contains a symbol name.

    e.g.
    $ ps -eo pid,stat,wchan,comm
    PID STAT WCHAN COMMAND
    ...
    1189 S - dbus-launch
    1190 Ssl 0
    dbus-daemon
    1198 Sl 0
    lightdm
    1299 Ss ep_pol systemd
    1301 S - (sd-pam)
    1304 Ss wait sh

    Signed-off-by: Robin Humble
    Cc: Minfei Huang
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Robin Humble
     
  • Pull crypto fixes from Herbert Xu:
    "This fixes the following issues:

    - bug in ahash SG list walking that may lead to crashes

    - resource leak in qat

    - missing RSA dependency that causes it to fail"

    * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
    crypto: rsa - select crypto mgr dependency
    crypto: hash - Fix page length clamping in hash walk
    crypto: qat - fix adf_ctl_drv.c:undefined reference to adf_init_pf_wq
    crypto: qat - fix invalid pf2vf_resp_wq logic

    Linus Torvalds
     
  • Pull networking fixes from David Miller:

    1) Check klogctl failure correctly, from Colin Ian King.

    2) Prevent OOM when under memory pressure in flowcache, from Steffen
    Klassert.

    3) Fix info leak in llc and rtnetlink ifmap code, from Kangjie Lu.

    4) Memory barrier and multicast handling fixes in bnxt_en, from Michael
    Chan.

    5) Endianness bug in mlx5, from Daniel Jurgens.

    6) Fix disconnect handling in VSOCK, from Ian Campbell.

    7) Fix locking of netdev list walking in get_bridge_ifindices(), from
    Nikolay Aleksandrov.

    8) Bridge multicast MLD parser can look at wrong packet offsets, fix
    from Linus Lüssing.

    9) Fix chip hang in qede driver, from Sudarsana Reddy Kalluru.

    10) Fix missing setting of encapsulation before inner handling completes
    in udp_offload code, from Jarno Rajahalme.

    11) Missing rollbacks during LAG join and flood configuration failures
    in mlxsw driver, from Ido Schimmel.

    12) Fix error code checks in netxen driver, from Dan Carpenter.

    13) Fix key size in new macsec driver, from Sabrina Dubroca.

    14) Fix mlx5/VXLAN dependencies, from Arnd Bergmann.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (29 commits)
    net/mlx5e: make VXLAN support conditional
    Revert "net/mlx5: Kconfig: Fix MLX5_EN/VXLAN build issue"
    macsec: key identifier is 128 bits, not 64
    Documentation/networking: more accurate LCO explanation
    macvtap: segmented packet is consumed
    tools: bpf_jit_disasm: check for klogctl failure
    qede: uninitialized variable in qede_start_xmit()
    netxen: netxen_rom_fast_read() doesn't return -1
    netxen: reversed condition in netxen_nic_set_link_parameters()
    netxen: fix error handling in netxen_get_flash_block()
    mlxsw: spectrum: Add missing rollback in flood configuration
    mlxsw: spectrum: Fix rollback order in LAG join failure
    udp_offload: Set encapsulation before inner completes.
    udp_tunnel: Remove redundant udp_tunnel_gro_complete().
    qede: prevent chip hang when increasing channels
    net: ipv6: tcp reset, icmp need to consider L3 domain
    bridge: fix igmp / mld query parsing
    net: bridge: fix old ioctl unlocked net device walk
    VSOCK: do not disconnect socket when peer has shutdown SEND only
    net/mlx4_en: Fix endianness bug in IPV6 csum calculation
    ...

    Linus Torvalds
     
  • gcc support for __builtin_bswap16() was supposedly added for powerpc in
    gcc 4.6, and was then later added for other architectures in gcc 4.8.

    However, Stephen Rothwell reported that attempting to use it on powerpc
    in gcc 4.6 fails with:

    lib/vsprintf.c:160:2: error: initializer element is not constant
    lib/vsprintf.c:160:2: error: (near initialization for 'decpair[0]')
    lib/vsprintf.c:160:2: error: initializer element is not constant
    lib/vsprintf.c:160:2: error: (near initialization for 'decpair[1]')
    ...

    I'm not entirely sure what those errors mean, but I don't see them on
    gcc 4.8. So let's consider gcc 4.8 to be the official starting point
    for __builtin_bswap16().
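    For reference, the builtin's semantics are a plain 16-bit byte swap,
    sketched here in Python (this shows only what __builtin_bswap16()
    computes, not the gcc-version guard the patch adjusts):

```python
def bswap16(x):
    # swap the two bytes of a 16-bit value, which is what
    # __builtin_bswap16() does in a single instruction
    return ((x & 0x00ff) << 8) | ((x & 0xff00) >> 8)

print(hex(bswap16(0x1234)))   # 0x3412
```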

    Arnd Bergmann adds:
    "I found the commit in gcc-4.8 that replaced the powerpc-specific
    implementation of __builtin_bswap16 with an architecture-independent
    one. Apparently the powerpc version (gcc-4.6 and 4.7) just mapped to
    the lhbrx/sthbrx instructions, so it ended up not being a constant,
    though the intent of the patch was mainly to add support for the
    builtin to x86:

    https://gcc.gnu.org/bugzilla/show_bug.cgi?id=52624

    has the patch that went into gcc-4.8 and more information."

    Fixes: 7322dd755e7d ("byteswap: try to avoid __builtin_constant_p gcc bug")
    Reported-by: Stephen Rothwell
    Tested-by: Stephen Rothwell
    Acked-by: Arnd Bergmann
    Signed-off-by: Josh Poimboeuf
    Signed-off-by: Stephen Rothwell
    Signed-off-by: Linus Torvalds

    Josh Poimboeuf
     

09 May, 2016

11 commits

  • ... the comment clearly refers to wake_up_q(), and not
    wake_up_list().

    Signed-off-by: Davidlohr Bueso
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: dave@stgolabs.net
    Link: http://lkml.kernel.org/r/1462766290-28664-1-git-send-email-dave@stgolabs.net
    Signed-off-by: Ingo Molnar

    Davidlohr Bueso
     
  • Remove unused variable 'ret', and directly return 0.

    Signed-off-by: Muhammad Falak R Wani
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/1462441879-10092-1-git-send-email-falakreyaz@gmail.com
    Signed-off-by: Ingo Molnar

    Muhammad Falak R Wani
     
  • Saeed Mahameed says:

    ====================
    net/mlx5e: Kconfig fixes for VxLAN

    Reposting to 'net' the build-error fixes posted by Arnd last week.

    Originally Arnd posted those fixes to net-next, while the issue
    is also seen in net. For net-next a different approach is required
    for fixing the issue as VXLAN and Device Drivers are no longer
    dependent, but there is no harm for those fixes to get into net-next.

    Optionally, once net is merged into net-next we can
    Revert "net/mlx5e: make VXLAN support conditional" as the
    CONFIG_MLX5_CORE_EN_VXLAN will no longer be required.

    Applied on top: 288928658583 ('mlxsw: spectrum: Add missing rollback in flood configuration')
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • VXLAN can be disabled at compile-time or it can be a loadable
    module while mlx5 is built-in, which leads to a link error:

    drivers/net/built-in.o: In function `mlx5e_create_netdev':
    ntb_netdev.c:(.text+0x106de4): undefined reference to `vxlan_get_rx_port'

    This avoids the link error and makes the vxlan code optional,
    like the other ethernet drivers do as well.

    Link: https://patchwork.ozlabs.org/patch/589296/
    Fixes: b3f63c3d5e2c ("net/mlx5e: Add netdev support for VXLAN tunneling")
    Signed-off-by: Arnd Bergmann
    Signed-off-by: Saeed Mahameed
    Signed-off-by: David S. Miller

    Arnd Bergmann
     
  • This reverts commit 69976fb1045850a742deb9790ea49cbc6f497531.

    We cannot select VXLAN when IPv4 support is disabled, that just gives
    us additional build errors, including:

    warning: (MLX5_CORE_EN) selects VXLAN which has unmet direct dependencies (NETDEVICES && NET_CORE && INET)
    In file included from ../drivers/net/vxlan.c:36:0:
    include/net/udp_tunnel.h: In function 'udp_tunnel_handle_offloads':
    include/net/udp_tunnel.h:112:9: error: implicit declaration of function 'iptunnel_handle_offloads' [-Werror=implicit-function-declaration]
    return iptunnel_handle_offloads(skb, type);
    ^~~~~~~~~~~~~~~~~~~~~~~~

    I'm sending a proper fix for the original bug in a separate patch.

    Signed-off-by: Arnd Bergmann
    Signed-off-by: Saeed Mahameed
    Signed-off-by: David S. Miller

    Arnd Bergmann
     
  • The MACsec standard mentions a key identifier for each key, but
    doesn't specify anything about it, so I arbitrarily chose 64 bits.

    IEEE 802.1X-2010 specifies MKA (MACsec Key Agreement), and defines the
    key identifier to be 128 bits (96 bits "member identifier" + 32 bits
    "key number").

    Signed-off-by: Sabrina Dubroca
    Acked-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller

    Sabrina Dubroca
     
  • In a few places the term "ones-complement sum" was used where the
    actual meaning is "the complement of the ones-complement sum".

    Also, avoid enclosing long statements with underscores, to ease
    readability.
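    The distinction can be made concrete with a sketch of a 16-bit
    ones'-complement checksum (RFC 1071 style; the word list is
    illustrative):

```python
def ones_complement_sum(words):
    # 16-bit ones'-complement sum with end-around carry folding
    s = 0
    for w in words:
        s += w
        s = (s & 0xffff) + (s >> 16)
    return s

def checksum(words):
    # what a header actually carries: the *complement* of that sum
    return ~ones_complement_sum(words) & 0xffff

words = [0x4500, 0x0073, 0x4000]        # illustrative 16-bit words
s = ones_complement_sum(words)          # the "ones-complement sum"
c = checksum(words)                     # the complement of it
print(hex(s), hex(c))                   # two different values
```

    Verifying a packet sums the words plus the carried checksum to 0xffff,
    which is why the two terms are easy to conflate.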

    Signed-off-by: Shmulik Ladkani
    Acked-by: Edward Cree
    Signed-off-by: David S. Miller

    Shmulik Ladkani
     
  • If a GSO packet is segmented and its segments are properly queued,
    we call consume_skb() instead of kfree_skb() to be drop-monitor
    friendly.

    Fixes: 3e4f8b7873709 ("macvtap: Perform GSO on forwarding path.")
    Signed-off-by: Eric Dumazet
    Cc: Vlad Yasevich
    Reviewed-by: Shmulik Ladkani
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • klogctl() can fail and return a negative length, so check for this and
    return NULL to avoid passing (size_t)-1 to malloc().

    Signed-off-by: Colin Ian King
    Acked-by: Daniel Borkmann
    Signed-off-by: David S. Miller

    Colin Ian King
     
  • "data_split" was never set to false. It's just uninitialized.

    Fixes: 2950219d87b0 ('qede: Add basic network device support')
    Signed-off-by: Dan Carpenter
    Signed-off-by: David S. Miller

    Dan Carpenter
     
  • Linus Torvalds
     

08 May, 2016

4 commits