28 Aug, 2015

4 commits


27 Aug, 2015

8 commits

  • Make sure to indicate to tunnel driver that key.tun_id is set,
    otherwise gre won't recognize the metadata.

    Fixes: d3aa45ce6b94 ("bpf: add helpers to access tunnel metadata")
    Signed-off-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Alexei Starovoitov
     
  • Alexei Starovoitov says:

    ====================
    act_bpf: remove spinlock in fast path

    v1 version had a race condition in cleanup path of bpf_prog.
    I tried to fix it by adding new callback 'cleanup_rcu' to 'struct tcf_common'
    and call it out of act_api cleanup path, but Daniel noticed
    (thanks for the idea!) that most of the classifiers already do action cleanup
    out of rcu callback.
    So instead this set of patches converts tcindex and rsvp classifiers to call
    tcf_exts_destroy() after rcu grace period and since action cleanup logic
    in __tcf_hash_release() is only called when bind and refcnt go to zero,
    it's guaranteed that cleanup() callback is called from rcu callback.
    More specifically:
    patches 1 and 2 - simple fixes
    patches 3 and 4 - convert tcf_exts_destroy in tcindex and rsvp to call_rcu
    patch 5 - removes spin_lock from act_bpf

    The cleanup of actions is now universally done after rcu grace period
    and in the future we can drop the (now unnecessary) call_rcu from tcf_hash_destroy().
    Patch 5 uses synchronize_rcu() in the act_bpf replacement path, since replacement is
    very rare and the alternative of dynamically allocating 'struct tcf_bpf_cfg' just
    to pass it to call_rcu looks even less appealing.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Similar to act_gact/act_mirred, act_bpf can be lockless in packet processing
    with extra care taken to free bpf programs after rcu grace period.
    Replacement of existing act_bpf (very rare) is done with synchronize_rcu()
    and final destruction is done from tc_action_ops->cleanup() callback that is
    called from tcf_exts_destroy()->tcf_action_destroy()->__tcf_hash_release() when
    bind and refcnt reach zero which is only possible when classifier is destroyed.
    Previous two patches fixed the last two classifiers (tcindex and rsvp) to
    call tcf_exts_destroy() from rcu callback.

    Similar to gact/mirred there is a race between prog->filter and
    prog->tcf_action. Meaning that the program being replaced may use
    previous default action if it happened to return TC_ACT_UNSPEC.
    The act_mirred race between tcf_action and tcfm_dev is similar.
    In all cases the race is harmless.
    Long term we may want to improve the situation by replacing the whole
    tc_action->priv as single pointer instead of updating inner fields one by one.

    Signed-off-by: Alexei Starovoitov
    Acked-by: Daniel Borkmann
    Signed-off-by: David S. Miller

    Alexei Starovoitov
     
  • Adjust destroy path of cls_rsvp to call tcf_exts_destroy() after
    rcu grace period.

    Signed-off-by: Alexei Starovoitov
    Acked-by: Daniel Borkmann
    Signed-off-by: David S. Miller

    Alexei Starovoitov
     
  • Adjust destroy path of cls_tcindex to call tcf_exts_destroy() after
    rcu grace period.

    Signed-off-by: Alexei Starovoitov
    Acked-by: Daniel Borkmann
    Signed-off-by: David S. Miller

    Alexei Starovoitov
     
  • Fix harmless typo and avoid unnecessary copy of empty 'prog' into
    unused 'struct tcf_bpf_cfg old'.

    Fixes: f4eaed28c783 ("act_bpf: fix memory leaks when replacing bpf programs")
    Signed-off-by: Alexei Starovoitov
    Acked-by: Daniel Borkmann
    Signed-off-by: David S. Miller

    Alexei Starovoitov
     
  • tcf_hash_destroy() is used only once. Make it static.

    Signed-off-by: Alexei Starovoitov
    Acked-by: Daniel Borkmann
    Signed-off-by: David S. Miller

    Alexei Starovoitov
     
  • The Kconfig option AVERAGE and its implementation have been removed by
    commit f4e774f55fe0 ("average: remove out-of-line implementation").
    Remove the dead build rule in lib/Makefile.

    Signed-off-by: Valentin Rothberg
    Reviewed-by: Johannes Berg
    Signed-off-by: David S. Miller

    Valentin Rothberg
     

26 Aug, 2015

28 commits

  • Florian Fainelli says:

    ====================
    Documentation: dsa

    This patch series adds some documentation about DSA as a subsystem as well
    as the SF2 driver since it slightly diverges from your average DSA driver ;)
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Add a document describing the Broadcom Starfighter 2 switch hardware,
    its specifics, and how the driver is implemented.

    Signed-off-by: Florian Fainelli
    Reviewed-by: Vivien Didelot
    Signed-off-by: David S. Miller

    Florian Fainelli
     
  • Describe how the DSA subsystem works, its design principles and
    limitations, and describe in detail how to implement a DSA switch
    driver.

    Acked-by: Andrew Lunn
    Acked-by: Scott Feldman
    Reviewed-by: Vivien Didelot
    Signed-off-by: Florian Fainelli
    Signed-off-by: David S. Miller

    Florian Fainelli
     
  • Santosh Shilimkar says:

    ====================
    RDS: Few more fixes

    As indicated in the earlier series [1], this is a follow-up series which
    addresses a few issues around the RDS FMR code. With [1] and the subject
    series, I can now run many parallel threads with multiple sockets with
    N x N traffic. The stress tests have survived overnight runs.

    [1] https://lkml.org/lkml/2015/8/22/127
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Memory allocated for 'ibmr' uses kzalloc_node(), which already
    initialises the memory to zero. There is no need to memset()
    that memory to 0.

    Signed-off-by: Santosh Shilimkar
    Signed-off-by: Santosh Shilimkar
    Signed-off-by: David S. Miller

    santosh.shilimkar@oracle.com
     
  • FMR flush is an expensive and time-consuming operation. Reduce the
    frequency of FMR pool flushes by 50% so that more FMR work accumulates
    for more efficient flushing.

    Signed-off-by: Santosh Shilimkar
    Signed-off-by: Santosh Shilimkar
    Signed-off-by: David S. Miller

    santosh.shilimkar@oracle.com
     
  • The RDS FMR flush operation races with connect/reconnect,
    which happens a lot with RDS. Running the FMR flush on the common rds_wq
    aggravates the problem. Let's push the RDS FMR pool flush work to its own worker.

    Signed-off-by: Santosh Shilimkar
    Signed-off-by: Santosh Shilimkar
    Signed-off-by: David S. Miller

    santosh.shilimkar@oracle.com
     
  • In rds_ib_flush_mr_pool(), dirty_count also accounts for the clean ones,
    which is wrong. This can lead to a negative dirty count value.

    Let's fix it.

    Signed-off-by: Wengang Wang
    Signed-off-by: Santosh Shilimkar
    Signed-off-by: Santosh Shilimkar
    Signed-off-by: David S. Miller

    Wengang Wang
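
The accounting problem described in the commit above can be illustrated with a small userspace sketch. It is not the RDS code: `demo_mr`, `demo_flush` and `demo_dirty_after_flush` are hypothetical names, and the only point is that the dirty counter should be reduced by the number of regions that were actually dirty, not by the size of the whole batch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a pooled memory region; not the RDS struct. */
struct demo_mr {
    int dirty;  /* non-zero while the region still needs unmapping */
};

/*
 * Flush a batch of regions and return how many of them were dirty.
 * Only that number may be subtracted from the pool's dirty counter;
 * subtracting the batch size would also count the clean regions.
 */
static size_t demo_flush(struct demo_mr *batch, size_t n)
{
    size_t unmapped = 0;

    for (size_t i = 0; i < n; i++) {
        if (batch[i].dirty) {
            batch[i].dirty = 0;
            unmapped++;
        }
    }
    return unmapped;
}

/* Return the pool's dirty counter after flushing the given batch. */
static long demo_dirty_after_flush(long dirty_count,
                                   struct demo_mr *batch, size_t n)
{
    assert(batch != NULL);
    return dirty_count - (long)demo_flush(batch, n);
}
```

With a batch of two dirty regions and one clean region and a starting counter of 2, the counter ends at 0. Subtracting the batch size of 3 instead would leave it at -1, which is the kind of negative value the commit message mentions.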
     
  • rds_rdma_unuse() drops an mr reference count that it hasn't
    taken. The correct way to remove an mr is to remove it from the tree
    and rdma_destroy_mr() it first, then rds_mr_put() to decrement
    its reference count. Whichever thread holds the last reference will free
    the mr via rds_mr_put().

    This bug was triggering weird NULL pointer crashes. One of the traces
    for it is captured below.

    BUG: unable to handle kernel NULL pointer dereference at
    0000000000000104
    IP: [] rds_ib_free_mr+0x31/0x130 [rds_rdma]
    PGD 4366fa067 PUD 4366f9067 PMD 0
    Oops: 0000 [#1] SMP

    [...]

    task: ffff88046da6a000 ti: ffff88046da6c000 task.ti: ffff88046da6c000
    RIP: 0010:[] []
    rds_ib_free_mr+0x31/0x130 [rds_rdma]
    RSP: 0018:ffff88046fa43bd8 EFLAGS: 00010286
    RAX: 0000000071d38b80 RBX: 0000000000000000 RCX: 0000000000000000
    RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff880079e7ff40
    RBP: ffff88046fa43bf8 R08: 0000000000000000 R09: 0000000000000000
    R10: ffff88046fa43ca8 R11: ffff88046a802ed8 R12: ffff880079e7fa40
    R13: 0000000000000000 R14: ffff880079e7ff40 R15: 0000000000000000
    FS: 0000000000000000(0000) GS:ffff88046fa40000(0000)
    knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
    CR2: 0000000000000104 CR3: 00000004366fb000 CR4: 00000000000006e0
    Stack:
    ffff880079e7fa40 ffff880671d38f08 ffff880079e7ff40 0000000000000296
    ffff88046fa43c28 ffffffffa087a38b ffff880079e7fa40 ffff880671d38f10
    0000000000000000 0000000000000292 ffff88046fa43c48 ffffffffa087a3b6
    Call Trace:

    [] rds_destroy_mr+0x8b/0xa0 [rds]
    [] __rds_put_mr_final+0x16/0x30 [rds]
    [] rds_rdma_unuse+0xc2/0x120 [rds]
    [] rds_recv_incoming_exthdrs+0x83/0xa0 [rds]
    [] rds_recv_incoming+0x92/0x200 [rds]
    [] rds_ib_process_recv+0x259/0x320 [rds_rdma]
    [] rds_ib_recv_tasklet_fn+0x1a8/0x490 [rds_rdma]
    [] ? __remove_hrtimer+0x58/0x90
    [] tasklet_action+0xb1/0xc0
    [] __do_softirq+0xe2/0x290
    [] irq_exit+0xa6/0xb0
    [] do_IRQ+0x65/0xf0
    [] common_interrupt+0x6b/0x6b

    Signed-off-by: Santosh Shilimkar
    Signed-off-by: Santosh Shilimkar
    Signed-off-by: David S. Miller

    santosh.shilimkar@oracle.com
     
  • On rds_ib_frag_slab allocation failure, ensure rds_ib_incoming_slab
    is not pointing to the destroyed memory.

    Signed-off-by: Santosh Shilimkar
    Signed-off-by: Santosh Shilimkar
    Signed-off-by: David S. Miller

    santosh.shilimkar@oracle.com
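
The pattern behind the commit above can be sketched in plain C. This is an illustration under stated assumptions, not the RDS implementation: the `demo_*` names are hypothetical, `malloc()` stands in for slab cache creation, and the `fail_second` flag only exists to simulate the second allocation failing.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical module-level caches standing in for the two slabs. */
static void *demo_incoming_cache;
static void *demo_frag_cache;

/*
 * Create both caches. If the second one cannot be created, release the
 * first and clear its pointer so that no later code can touch freed memory.
 * fail_second simulates an allocation failure for the second cache.
 */
static int demo_recv_init(int fail_second)
{
    demo_incoming_cache = malloc(64);
    if (!demo_incoming_cache)
        return -1;

    demo_frag_cache = fail_second ? NULL : malloc(64);
    if (!demo_frag_cache) {
        free(demo_incoming_cache);
        demo_incoming_cache = NULL;  /* do not leave a dangling pointer */
        return -1;
    }
    return 0;
}

/* Release both caches; free(NULL) is a no-op, so this is always safe. */
static void demo_recv_exit(void)
{
    free(demo_frag_cache);
    free(demo_incoming_cache);
    demo_frag_cache = NULL;
    demo_incoming_cache = NULL;
}
```

Clearing the pointer on the error path means a later `demo_recv_exit()` cannot free the same memory twice.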
     
  • Antonio Quartulli says:

    ====================
    Included changes:
    - code restyling and beautification
    - use int kernel types instead of C99
    - update kerneldoc
    - prevent potential hlist double deletion of VLAN objects
    - fix gw bandwidth calculation
    - convert list to hlist when needed
    - add lockdep_asserts calls in function with lock requirements
    described in kerneldoc
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • posted_index is RO in firmware. We need not do an ioread every time to get
    the posted index. Store the posted index locally.

    Signed-off-by: Govindarajulu Varadarajan
    Signed-off-by: David S. Miller

    Govindarajulu Varadarajan
     
  • The r8169 driver collects statistical information returned by
    @get_stats64 by counting them in the driver itself, even though many
    (but not all) of the values are already collected by tally counters
    (TCs) in the NIC. Some of these TC values are not returned by
    @get_stats64. In particular, the received multicast packets are missing
    from /proc/net/dev.

    Rectify this by fetching the TCs and returning them from
    rtl8169_get_stats64.

    The counters collected in the driver obviously disappear as soon as the
    driver is unloaded so after a driver is loaded the counters always start
    at 0. The TCs on the other hand are only reset by a power cycle. Without
    further considerations the values collected by the driver would not match
    up against the TC values.

    This patch introduces a new function rtl8169_reset_counters which
    resets the TCs. Also, since rtl8169_reset_counters shares most of
    its code with rtl8169_update_counters, refactor the shared code into
    two new functions rtl8169_map_counters and rtl8169_unmap_counters.

    Unfortunately chip versions prior to RTL_GIGA_MAC_VER_19 don't allow
    resetting the TCs programmatically. Therefore introduce an addition to
    the rtl8169_private struct and a function rtl8169_init_counter_offsets
    to store the TCs at first rtl_open. Use these values as offsets in
    rtl8169_get_stats64. Propagate a failure to reset *and* update the
    counters up to rtl_open and emit a warning message, if so.

    Signed-off-by: Corinna Vinschen
    Signed-off-by: David S. Miller

    Corinna Vinschen
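
The offset approach described in the commit above, for chips whose tally counters cannot be reset, might look roughly like the following sketch. The struct layout and the `demo_*` names are assumptions made for illustration; they do not reproduce the r8169 driver's data structures.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical snapshot of one hardware tally counter. */
struct demo_counters {
    uint64_t rx_multicast;
};

/* Hypothetical per-device state. */
struct demo_nic {
    struct demo_counters hw;      /* latest value read from the NIC */
    struct demo_counters offset;  /* value captured at first open */
    int offsets_valid;
};

/* Record the counter values seen at first open; later opens keep them. */
static void demo_init_counter_offsets(struct demo_nic *nic)
{
    if (nic->offsets_valid)
        return;
    nic->offset = nic->hw;
    nic->offsets_valid = 1;
}

/* Report the hardware value relative to the stored offset. */
static uint64_t demo_rx_multicast(const struct demo_nic *nic)
{
    assert(nic->offsets_valid);
    return nic->hw.rx_multicast - nic->offset.rx_multicast;
}
```

If the hardware counter already reads 1000 when the device is first opened, the reported value starts at 0, and a later hardware reading of 1042 is reported as 42, so the reported statistics start from zero after a driver load.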
     
  • >> net/rds/ib_recv.c:382:28: sparse: incorrect type in initializer (different base types)
    net/rds/ib_recv.c:382:28: expected int [signed] can_wait
    net/rds/ib_recv.c:382:28: got restricted gfp_t
    net/rds/ib_recv.c:828:23: sparse: cast to restricted __le64

    Reported-by: kbuild test robot
    Signed-off-by: David S. Miller

    David S. Miller
     
  • Shreyas Bhatewara will no longer maintain the vmxnet3 driver. Taking over
    the role of vmxnet3 maintainer.

    Signed-off-by: Shrikrishna Khare
    Signed-off-by: Shreyas Bhatewara
    Signed-off-by: David S. Miller

    Shrikrishna Khare
     
  • The vxlan_get_sk_family inline function was added after the last #endif,
    making multiple inclusion of net/vxlan.h fail. Move it to the proper place.

    Reported-by: Mark Rustad
    Fixes: 705cc62f6728c ("vxlan: provide access function for vxlan socket address family")
    Signed-off-by: Jiri Benc
    Signed-off-by: David S. Miller

    Jiri Benc
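
The include-guard issue in the commit above can be shown in a single file by pasting the contents of a hypothetical header twice, which is what the preprocessor effectively does when a header is included twice. The header name `demo_vxlan.h`, the function `demo_get_sk_family` and its return values are illustrative assumptions, not the contents of net/vxlan.h.

```c
#include <assert.h>

/* ---- first inclusion of the hypothetical header demo_vxlan.h ---- */
#ifndef DEMO_VXLAN_H
#define DEMO_VXLAN_H

/* The helper sits inside the guard, so it is defined exactly once. */
static inline int demo_get_sk_family(int is_ipv6)
{
    return is_ipv6 ? 6 : 4;  /* illustrative labels, not AF_* constants */
}

#endif /* DEMO_VXLAN_H */

/* ---- second inclusion: the guard skips the whole body ---- */
#ifndef DEMO_VXLAN_H
#define DEMO_VXLAN_H

static inline int demo_get_sk_family(int is_ipv6)
{
    return is_ipv6 ? 6 : 4;
}

#endif /* DEMO_VXLAN_H */
```

If the function definition were moved below the closing `#endif`, the second copy would no longer be skipped and the compiler would report a redefinition, which matches the failure described in the commit message.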
     
  • Add entry for new VRF device driver.

    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern
     
  • This patch fixes the following crash:

    general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC
    CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.2.0-rc7+ #166
    Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
    task: ffff88010656d280 ti: ffff880106570000 task.ti: ffff880106570000
    RIP: 0010:[] [] dst_destroy+0xa6/0xef
    RSP: 0018:ffff880107603e38 EFLAGS: 00010202
    RAX: 0000000000000001 RBX: ffff8800d225a000 RCX: ffffffff82250fd0
    RDX: 0000000000000001 RSI: ffffffff82250fd0 RDI: 6b6b6b6b6b6b6b6b
    RBP: ffff880107603e58 R08: 0000000000000001 R09: 0000000000000001
    R10: 000000000000b530 R11: ffff880107609000 R12: 0000000000000000
    R13: ffffffff82343c40 R14: 0000000000000000 R15: ffffffff8182fb4f
    FS: 0000000000000000(0000) GS:ffff880107600000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
    CR2: 00007fcabd9d3000 CR3: 00000000d7279000 CR4: 00000000000006e0
    Stack:
    ffffffff82250fd0 ffff8801077d6f00 ffffffff82253c40 ffff8800d225a000
    ffff880107603e68 ffffffff8182fb5d ffff880107603f08 ffffffff810d795e
    ffffffff810d7648 ffff880106574000 ffff88010656d280 ffff88010656d280
    Call Trace:

    [] dst_destroy_rcu+0xe/0x1d
    [] rcu_process_callbacks+0x618/0x7eb
    [] ? rcu_process_callbacks+0x302/0x7eb
    [] ? dst_gc_task+0x1eb/0x1eb
    [] __do_softirq+0x178/0x39f
    [] irq_exit+0x41/0x95
    [] smp_apic_timer_interrupt+0x34/0x40
    [] apic_timer_interrupt+0x6d/0x80

    [] ? default_idle+0x21/0x32
    [] ? default_idle+0x1f/0x32
    [] arch_cpu_idle+0xf/0x11
    [] default_idle_call+0x1f/0x21
    [] cpu_startup_entry+0x1ad/0x273
    [] start_secondary+0x135/0x156

    dst is freed right before lwtstate_put(), this is not correct...

    Fixes: 61adedf3e3f1 ("route: move lwtunnel state to dst_entry")
    Acked-by: Jiri Benc
    Signed-off-by: Cong Wang
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller

    WANG Cong
     
  • This patch fixes the following warnings.

    .//net/core/skbuff.c:407: warning: No description found
    for parameter 'len'
    .//net/core/skbuff.c:407: warning: Excess function parameter
    'length' description in '__netdev_alloc_skb'
    .//net/core/skbuff.c:476: warning: No description found
    for parameter 'len'
    .//net/core/skbuff.c:476: warning: Excess function parameter
    'length' description in '__napi_alloc_skb'

    Signed-off-by: Masanari Iida
    Signed-off-by: David S. Miller

    Masanari Iida
     
  • Let packets move from one netns to the other at PPP encapsulation and
    decapsulation time.

    PPP units and channels remain in the netns in which they were
    originally created. Only the net_device may move to a different
    namespace. Cross netns handling is thus transparent to lower PPP
    layers (PPPoE, L2TP, etc.).

    PPP devices are automatically unregistered when their netns gets
    removed. So read() and poll() on the unit file descriptor will
    respectively receive EOF and POLLHUP. Channels aren't affected.

    Signed-off-by: Guillaume Nault
    Signed-off-by: David S. Miller

    Guillaume Nault
     
  • Claim the emac sram ourselves, rather than relying on the bootloader
    having mapped the sram to the emac controller during boot.

    Signed-off-by: Hans de Goede
    Signed-off-by: David S. Miller

    Hans de Goede
     
  • Remove various inlined functions not referenced in the kernel.

    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern
     
  • To avoid multiply/division operations on the data path,
    we hold a {channel, tc}==>txq mapping table.
    We held this mapping table inside the channel object that is
    being destroyed upon some configuration operations (e.g. MTU change).
    So in case ndo_select_queue occurs during such a configuration operation,
    it may access a NULL channel pointer, resulting in kernel panic.
    To fix this issue we moved the {channel, tc}==>txq mapping table
    outside the channel object so that it will be available also
    during such configuration operations.

    Signed-off-by: Rana Shahout
    Signed-off-by: David S. Miller

    Rana Shahout
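
A precomputed {channel, tc} to txq table of the kind described in the commit above could be sketched as follows. The names, the table bounds, and the layout `txq = channel + tc * num_channels` are assumptions chosen for illustration; the commit message does not state the formula the driver uses.

```c
#include <assert.h>

#define DEMO_MAX_CHANNELS 8
#define DEMO_MAX_TC       4

/*
 * Hypothetical device-private state. The table lives here rather than in
 * the per-channel objects, so it stays valid while channels are torn down
 * and rebuilt during reconfiguration.
 */
struct demo_eth {
    int num_channels;
    int num_tc;
    int channeltc_to_txq[DEMO_MAX_CHANNELS][DEMO_MAX_TC];
};

/* Fill the table once at configuration time (assumed layout, see above). */
static void demo_build_txq_map(struct demo_eth *eth, int num_channels, int num_tc)
{
    assert(num_channels > 0 && num_channels <= DEMO_MAX_CHANNELS);
    assert(num_tc > 0 && num_tc <= DEMO_MAX_TC);

    eth->num_channels = num_channels;
    eth->num_tc = num_tc;
    for (int ch = 0; ch < num_channels; ch++)
        for (int tc = 0; tc < num_tc; tc++)
            eth->channeltc_to_txq[ch][tc] = ch + tc * num_channels;
}

/* Data-path lookup: read the precomputed entry, no channel object needed. */
static int demo_select_txq(const struct demo_eth *eth, int ch, int tc)
{
    return eth->channeltc_to_txq[ch][tc];
}
```

Because the lookup only touches the device-level table, a queue selection that runs while channels are being recreated never dereferences a channel pointer.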
     
  • Noticed by Herbert Xu.

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Return a negative error code on failure.

    A simplified version of the semantic match that finds this problem is as
    follows: (http://coccinelle.lip6.fr/)

    //
    @@
    identifier ret; expression e1,e2;
    @@
    (
    if (\(ret < 0\|ret != 0\))
    { ... return ret; }
    |
    ret = 0
    )
    ... when != ret = e1
    when != &ret
    *if(...)
    {
    ... when != ret = e2
    when forall
    return ret;
    }
    //

    Signed-off-by: Julia Lawall
    Acked-by: Herbert Xu
    Signed-off-by: David S. Miller

    Julia Lawall
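
The bug class that the semantic match above looks for can be illustrated with a short userspace sketch. `demo_setup` and its parameters are hypothetical; the point is that `ret` still holds 0 from an earlier successful step, so a later failure path must assign a negative error code before returning.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical two-step setup routine. first_ok and alloc_ok simulate
 * whether each step succeeds.
 */
static int demo_setup(int first_ok, int alloc_ok)
{
    void *buf;
    int ret;

    ret = first_ok ? 0 : -EINVAL;
    if (ret < 0)
        return ret;

    /* ret is 0 at this point. */
    buf = alloc_ok ? malloc(16) : NULL;
    if (!buf) {
        /* Without this assignment the function would return 0 here. */
        ret = -ENOMEM;
        goto out;
    }

    free(buf);
out:
    return ret;
}
```

Callers that check `if (ret < 0)` only see the allocation failure when the error path sets `ret` explicitly, which is the fix applied by this group of commits.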
     
  • Return a negative error code on failure.

    A simplified version of the semantic match that finds this problem is as
    follows: (http://coccinelle.lip6.fr/)

    //
    @@
    identifier ret; expression e1,e2;
    @@
    (
    if (\(ret < 0\|ret != 0\))
    { ... return ret; }
    |
    ret = 0
    )
    ... when != ret = e1
    when != &ret
    *if(...)
    {
    ... when != ret = e2
    when forall
    return ret;
    }
    //

    Signed-off-by: Julia Lawall
    Acked-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Julia Lawall
     
  • Propagate error code on failure.

    A simplified version of the semantic match that finds this problem is as
    follows: (http://coccinelle.lip6.fr/)

    //
    @@
    identifier ret; expression e1,e2;
    @@
    (
    if (\(ret < 0\|ret != 0\))
    { ... return ret; }
    |
    ret = 0
    )
    ... when != ret = e1
    when != &ret
    *if(...)
    {
    ... when != ret = e2
    when forall
    return ret;
    }
    //

    Signed-off-by: Julia Lawall
    Signed-off-by: David S. Miller

    Julia Lawall
     
  • Santosh Shilimkar says:

    ====================
    RDS: Assorted bug fixes

    We would like to improve RDS upstream support and in that context, I
    started playing with it. But I ran into a number of issues, including one as
    basic as RDS IB RDMA not working. As part of the debugging, I ended up
    creating the $subject series, which has a bunch of assorted fixes. At
    least with this series I can run RDS IB RDMA and other tests
    successfully.

    Some of these fixes have been done by Chris Meson, Andy Grover and
    Zach Brown while at Oracle. There are still more kinks with FMR and
    error handling and I plan to address them in a follow up series.

    Series generated against Linus's master (v4.2-rc7) but also applies
    against net-next cleanly. It's tested on Oracle hardware with an IB
    fabric for both bcopy as well as RDMA mode. I don't have access
    to iWARP hardware, so any testing help on iWARP hardware is appreciated.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller