21 Mar, 2015

1 commit

  • Conflicts:
    drivers/net/ethernet/emulex/benet/be_main.c
    net/core/sysctl_net_core.c
    net/ipv4/inet_diag.c

    The be_main.c conflict resolution was really tricky. The conflict
    hunks generated by GIT were very unhelpful, to say the least. It
    split functions in half and moved them around, when the real actual
    conflict only existed solely inside of one function, that being
    be_map_pci_bars().

    So instead, to resolve this, I checked out be_main.c from the top
    of net-next, then I applied the be_main.c changes from 'net' since
    the last time I merged. And this worked beautifully.

    The inet_diag.c and sysctl_net_core.c conflicts were simple
    overlapping changes, and were easily to resolve.

    Signed-off-by: David S. Miller

    David S. Miller
     

10 Mar, 2015

2 commits

  • We dynamically allocate divisor+1 entries for ->ht[] in tc_u_hnode:

    ht = kzalloc(sizeof(*ht) + divisor*sizeof(void *), GFP_KERNEL);

    So ->ht is supposed to be the last field of this struct, however
    this is broken, since an rcu head is appended after it.

    Fixes: 1ce87720d456 ("net: sched: make cls_u32 lockless")
    Cc: Jamal Hadi Salim
    Cc: John Fastabend
    Signed-off-by: Cong Wang
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    WANG Cong
     
  • Kernel automatically creates a tp for each
    (kind, protocol, priority) tuple, which has handle 0,
    when we add a new filter, but it still is left there
    after we remove our own, unless we don't specify the
    handle (literally means all the filters under
    the tuple). For example this one is left:

    # tc filter show dev eth0
    filter parent 8001: protocol arp pref 49152 basic

    The user-space is hard to clean up these for kernel
    because filters like u32 are organized in a complex way.
    So kernel is responsible to remove it after all filters
    are gone. Each type of filter has its own way to
    store the filters, so each type has to provide its
    way to check if all filters are gone.

    Cc: Jamal Hadi Salim
    Signed-off-by: Cong Wang
    Signed-off-by: Cong Wang
    Acked-by: Jamal Hadi Salim
    Signed-off-by: David S. Miller

    Cong Wang
     

10 Dec, 2014

1 commit


02 Oct, 2014

1 commit

  • This fixes the following crash:

    [ 63.976822] general protection fault: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
    [ 63.980094] CPU: 1 PID: 15 Comm: ksoftirqd/1 Not tainted 3.17.0-rc6+ #648
    [ 63.980094] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
    [ 63.980094] task: ffff880117dea690 ti: ffff880117dfc000 task.ti: ffff880117dfc000
    [ 63.980094] RIP: 0010:[] [] u32_destroy_key+0x27/0x6d
    [ 63.980094] RSP: 0018:ffff880117dffcc0 EFLAGS: 00010202
    [ 63.980094] RAX: ffff880117dea690 RBX: ffff8800d02e0820 RCX: 0000000000000000
    [ 63.980094] RDX: 0000000000000001 RSI: 0000000000000002 RDI: 6b6b6b6b6b6b6b6b
    [ 63.980094] RBP: ffff880117dffcd0 R08: 0000000000000000 R09: 0000000000000000
    [ 63.980094] R10: 00006c0900006ba8 R11: 00006ba100006b9d R12: 0000000000000001
    [ 63.980094] R13: ffff8800d02e0898 R14: ffffffff817e6d4d R15: ffff880117387a30
    [ 63.980094] FS: 0000000000000000(0000) GS:ffff88011a800000(0000) knlGS:0000000000000000
    [ 63.980094] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
    [ 63.980094] CR2: 00007f07e6732fed CR3: 000000011665b000 CR4: 00000000000006e0
    [ 63.980094] Stack:
    [ 63.980094] ffff88011a9cd300 ffffffff82051ac0 ffff880117dffce0 ffffffff817e6d68
    [ 63.980094] ffff880117dffd70 ffffffff810cb4c7 ffffffff810cb3cd ffff880117dfffd8
    [ 63.980094] ffff880117dea690 ffff880117dea690 ffff880117dfffd8 000000000000000a
    [ 63.980094] Call Trace:
    [ 63.980094] [] u32_delete_key_freepf_rcu+0x1b/0x1d
    [ 63.980094] [] rcu_process_callbacks+0x3bb/0x691
    [ 63.980094] [] ? rcu_process_callbacks+0x2c1/0x691
    [ 63.980094] [] ? u32_destroy_key+0x6d/0x6d
    [ 63.980094] [] __do_softirq+0x142/0x323
    [ 63.980094] [] run_ksoftirqd+0x23/0x53
    [ 63.980094] [] smpboot_thread_fn+0x203/0x221
    [ 63.980094] [] ? smpboot_unpark_thread+0x33/0x33
    [ 63.980094] [] kthread+0xc9/0xd1
    [ 63.980094] [] ? do_wait_for_common+0xf8/0x125
    [ 63.980094] [] ? __kthread_parkme+0x61/0x61
    [ 63.980094] [] ret_from_fork+0x7c/0xb0
    [ 63.980094] [] ? __kthread_parkme+0x61/0x61

    tp could be freed in call_rcu callback too, the order is not guaranteed.

    John Fastabend says:

    ====================
    Its worth noting why this is safe. Any running schedulers will either
    read the valid class field or it will be zeroed.

    All schedulers today when the class is 0 do a lookup using the
    same call used by the tcf_exts_bind(). So even if we have a running
    classifier hit the null class pointer it will do a lookup and get
    to the same result. This is particularly fragile at the moment because
    the only way to verify this is to audit the schedulers call sites.
    ====================

    Cc: John Fastabend
    Signed-off-by: Cong Wang
    Acked-by: John Fastabend
    Signed-off-by: David S. Miller

    WANG Cong
     

29 Sep, 2014

1 commit


23 Sep, 2014

3 commits

  • $ grep CONFIG_CLS_U32_MARK .config
    # CONFIG_CLS_U32_MARK is not set

    net/sched/cls_u32.c: In function 'u32_change':
    net/sched/cls_u32.c:852:1: warning: label 'errout' defined but not used
    [-Wunused-label]

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • Changes to the cls_u32 classifier must appear atomic to the
    readers. Before this patch if a change is requested for both
    the exts and ifindex, first the ifindex is updated then the
    exts with tcf_exts_change(). This opens a small window where
    a reader can have a exts chain with an incorrect ifindex. This
    violates the the RCU semantics.

    Here we resolve this by always passing u32_set_parms() a copy
    of the tc_u_knode to work on and then inserting it into the hash
    table after the updates have been successfully applied.

    Tested with the following short script:

    #tc filter add dev p3p2 parent 8001:0 protocol ip prio 99 handle 1: \
    u32 divisor 256

    #tc filter add dev p3p2 parent 8001:0 protocol ip prio 99 \
    u32 link 1: hashkey mask ffffff00 at 12 \
    match ip src 192.168.8.0/2

    #tc filter add dev p3p2 parent 8001:0 protocol ip prio 102 \
    handle 1::10 u32 classid 1:2 ht 1: \
    match ip src 192.168.8.0/8 match ip tos 0x0a 1e

    #tc filter change dev p3p2 parent 8001:0 protocol ip prio 102 \
    handle 1::10 u32 classid 1:2 ht 1: \
    match ip src 1.1.0.0/8 match ip tos 0x0b 1e

    CC: Eric Dumazet
    CC: Jamal Hadi Salim
    Signed-off-by: John Fastabend
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    John Fastabend
     
  • This fixes a missed free_percpu in the unwind code path and when
    keys are destroyed.

    Signed-off-by: John Fastabend
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    John Fastabend
     

20 Sep, 2014

1 commit

  • tc_u32_sel 'sel' in tc_u_knode expects to be the last element in the
    structure and pads the structure with tc_u32_key fields for each key.

    kzalloc(sizeof(*n) + s->nkeys*sizeof(struct tc_u32_key), GFP_KERNEL)

    CC: Eric Dumazet
    Signed-off-by: John Fastabend
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    John Fastabend
     

17 Sep, 2014

2 commits

  • Add missing rcu_assign_pointer and missing annotation for ht_up
    in cls_u32.c

    Caught by kbuild bot,

    >> net/sched/cls_u32.c:378:36: sparse: incorrect type in initializer (different address spaces)
    net/sched/cls_u32.c:378:36: expected struct tc_u_hnode *ht
    net/sched/cls_u32.c:378:36: got struct tc_u_hnode [noderef] *ht_up
    >> net/sched/cls_u32.c:610:54: sparse: incorrect type in argument 4 (different address spaces)
    net/sched/cls_u32.c:610:54: expected struct tc_u_hnode *ht
    net/sched/cls_u32.c:610:54: got struct tc_u_hnode [noderef] *ht_up
    >> net/sched/cls_u32.c:684:18: sparse: incorrect type in assignment (different address spaces)
    net/sched/cls_u32.c:684:18: expected struct tc_u_hnode [noderef] *ht_up
    net/sched/cls_u32.c:684:18: got struct tc_u_hnode *[assigned] ht
    >> net/sched/cls_u32.c:359:18: sparse: dereference of noderef expression

    Signed-off-by: John Fastabend
    Signed-off-by: David S. Miller

    John Fastabend
     
  • kbuild test robot reported an unused variable cpu in cls_u32.c
    after the patch below. This happens when PERF and MARK config
    variables are disabled

    Fix this is to use separate variables for perf and mark
    and define the cpu variable inside the ifdef logic.

    Fixes: 459d5f626da7 ("net: sched: make cls_u32 per cpu")'
    Signed-off-by: John Fastabend
    Acked-by: Cong Wang
    Signed-off-by: David S. Miller

    John Fastabend
     

14 Sep, 2014

2 commits

  • Make cls_u32 classifier safe to run without holding lock. This patch
    converts statistics that are kept in read section u32_classify into
    per cpu counters.

    This patch was tested with a tight u32 filter add/delete loop while
    generating traffic with pktgen. By running pktgen on vlan devices
    created on top of a physical device we can hit the qdisc layer
    correctly. For ingress qdisc's a loopback cable was used.

    for i in {1..100}; do
    q=`echo $i%8|bc`;
    echo -n "u32 tos: iteration $i on queue $q";
    tc filter add dev p3p2 parent $p prio $i u32 match ip tos 0x10 0xff \
    action skbedit queue_mapping $q;
    sleep 1;
    tc filter del dev p3p2 prio $i;

    echo -n "u32 tos hash table: iteration $i on queue $q";
    tc filter add dev p3p2 parent $p protocol ip prio $i handle 628: u32 divisor 1
    tc filter add dev p3p2 parent $p protocol ip prio $i u32 \
    match ip protocol 17 0xff link 628: offset at 0 mask 0xf00 shift 6 plus 0
    tc filter add dev p3p2 parent $p protocol ip prio $i u32 \
    ht 628:0 match ip tos 0x10 0xff action skbedit queue_mapping $q
    sleep 2;
    tc filter del dev p3p2 prio $i
    sleep 1;
    done

    Signed-off-by: John Fastabend
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    John Fastabend
     
  • This uses per cpu counters in cls_u32 in preparation
    to convert over to rcu.

    Signed-off-by: John Fastabend
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    John Fastabend
     

21 Jul, 2014

1 commit

  • When kernel generates a handle for a u32 filter, it tries to start
    from the max in the bucket. So when we have a filter with the max (fff)
    handle, it will cause kernel always generates the same handle for new
    filters. This can be shown by the following command:

    tc qdisc add dev eth0 ingress
    tc filter add dev eth0 parent ffff: protocol ip pref 770 handle 800::fff u32 match ip protocol 1 0xff
    tc filter add dev eth0 parent ffff: protocol ip pref 770 u32 match ip protocol 1 0xff
    ...

    we will get some u32 filters with same handle:

    # tc filter show dev eth0 parent ffff:
    filter protocol ip pref 770 u32
    filter protocol ip pref 770 u32 fh 800: ht divisor 1
    filter protocol ip pref 770 u32 fh 800::fff order 4095 key ht 800 bkt 0
    match 00010000/00ff0000 at 8
    filter protocol ip pref 770 u32 fh 800::fff order 4095 key ht 800 bkt 0
    match 00010000/00ff0000 at 8
    filter protocol ip pref 770 u32 fh 800::fff order 4095 key ht 800 bkt 0
    match 00010000/00ff0000 at 8
    filter protocol ip pref 770 u32 fh 800::fff order 4095 key ht 800 bkt 0
    match 00010000/00ff0000 at 8

    handles should be unique. This patch fixes it by looking up a bitmap,
    so that can guarantee the handle is as unique as possible. For compatibility,
    we still start from 0x800.

    Cc: "David S. Miller"
    Signed-off-by: Cong Wang
    Signed-off-by: Cong Wang
    Signed-off-by: Jamal Hadi Salim
    Signed-off-by: David S. Miller

    Cong Wang
     

28 Apr, 2014

1 commit

  • When actions are attached to a filter, they are a part of the filter
    itself, so when changing a filter we should allow to overwrite the actions
    inside as well.

    In my specific case, when I tried to _append_ a new action to an existing
    filter which already has an action, I got EEXIST since kernel refused
    to overwrite the existing one in kernel.

    This patch checks if we are changing the filter checking NLM_F_CREATE flag
    (Sigh, filters don't use NLM_F_REPLACE...) and then passes the boolean down
    to actions. This fixes the problem above.

    Cc: Jamal Hadi Salim
    Cc: David S. Miller
    Signed-off-by: Cong Wang
    Signed-off-by: Cong Wang
    Signed-off-by: Jamal Hadi Salim
    Signed-off-by: David S. Miller

    Cong Wang
     

14 Jan, 2014

3 commits

  • tp->root is a void* pointer, no need to cast it.

    Cc: Jamal Hadi Salim
    Cc: David S. Miller
    Signed-off-by: Cong Wang
    Acked-by: Jamal Hadi Salim
    Signed-off-by: David S. Miller

    WANG Cong
     
  • tcf_match_indev() is called in fast path, it is not wise to
    search for a netdev by ifindex and then compare by its name,
    just compare the ifindex.

    Also, dev->name could be changed by user-space, therefore
    the match would be always fail, but dev->ifindex could
    be consistent.

    BTW, this will also save some bytes from the core struct of u32.

    Cc: Jamal Hadi Salim
    Cc: David S. Miller
    Signed-off-by: Cong Wang
    Signed-off-by: Jamal Hadi Salim
    Signed-off-by: David S. Miller

    WANG Cong
     
  • It will be needed by the next patch.

    Cc: Jamal Hadi Salim
    Cc: David S. Miller
    Signed-off-by: Cong Wang
    Signed-off-by: Jamal Hadi Salim
    Signed-off-by: David S. Miller

    WANG Cong
     

19 Dec, 2013

2 commits

  • These information can be saved in tcf_exts, and this will
    simplify the code.

    Cc: Jamal Hadi Salim
    Cc: David S. Miller
    Signed-off-by: Cong Wang
    Signed-off-by: Jamal Hadi Salim
    Signed-off-by: David S. Miller

    WANG Cong
     
  • Currently actions are chained by a singly linked list,
    therefore it is a bit hard to add and remove a specific
    entry. Convert it to struct list_head so that in the
    latter patch we can remove an action without finding
    its head.

    Cc: Jamal Hadi Salim
    Cc: David S. Miller
    Signed-off-by: Cong Wang
    Signed-off-by: Jamal Hadi Salim
    Signed-off-by: David S. Miller

    WANG Cong
     

11 Dec, 2013

1 commit


15 Jan, 2013

1 commit

  • Eric Dumazet pointed out that act_mirred needs to find the current net_ns,
    and struct net pointer is not provided in the call chain. His original
    patch made use of current->nsproxy->net_ns to find the network namespace,
    but this fails to work correctly for userspace code that makes use of
    netlink sockets in different network namespaces. Instead, pass the
    "struct net *" down along the call chain to where it is needed.

    This version removes the ifb changes as Eric has submitted that patch
    separately, but is otherwise identical to the previous version.

    Signed-off-by: Benjamin LaHaise
    Tested-by: Eric Dumazet
    Acked-by: Jamal Hadi Salim
    Signed-off-by: David S. Miller

    Benjamin LaHaise
     

15 Aug, 2012

1 commit

  • cls_flow.c plays with uids and gids. Unless I misread that
    code it is possible for classifiers to depend on the specific uid and
    gid values. Therefore I need to know the user namespace of the
    netlink socket that is installing the packet classifiers. Pass
    in the rtnetlink skb so I can access the NETLINK_CB of the passed
    packet. In particular I want access to sk_user_ns(NETLINK_CB(in_skb).ssk).

    Pass in not the user namespace but the incomming rtnetlink skb into
    the the classifier change routines as that is generally the more useful
    parameter.

    Cc: Jamal Hadi Salim
    Acked-by: David S. Miller
    Acked-by: Serge Hallyn
    Signed-off-by: Eric W. Biederman

    Eric W. Biederman
     

16 May, 2012

1 commit


02 Apr, 2012

1 commit


06 Jul, 2011

1 commit


23 Feb, 2011

1 commit


20 Jan, 2011

1 commit


05 Oct, 2010

1 commit

  • skb_headroom() is unsigned so "skb_headroom(skb) + toff" is also
    unsigned and can't be less than zero. This test was added in 66d50d25:
    "u32: negative offset fix" It was supposed to fix a regression.

    Signed-off-by: Dan Carpenter
    Signed-off-by: David S. Miller

    Dan Carpenter
     

03 Aug, 2010

1 commit

  • It was possible to use a negative offset in a u32 match to reference
    the ethernet header or other parts of the link layer header.
    This fixes the regression caused by:

    commit fbc2e7d9cf49e0bf89b9e91fd60a06851a855c5d
    Author: Changli Gao
    Date: Wed Jun 2 07:32:42 2010 -0700

    cls_u32: use skb_header_pointer() to dereference data safely

    Signed-off-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    stephen hemminger
     

02 Jun, 2010

1 commit

  • use skb_header_pointer() to dereference data safely

    the original skb->data dereference isn't safe, as there isn't any skb->len or
    skb_is_nonlinear() check. skb_header_pointer() is used instead in this patch.
    And when the skb isn't long enough, we terminate the function u32_classify()
    immediately with -1.

    Signed-off-by: Changli Gao
    Signed-off-by: David S. Miller

    Changli Gao
     

18 May, 2010

1 commit

  • The previous patch encourage me to go look at all the messages in
    the network scheduler and fix them. Many messages were missing
    any severity level. Some serious ones that should never happen
    were turned into WARN(), and the random noise messages that were
    handled changed to pr_debug().

    Signed-off-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    stephen hemminger
     

12 Apr, 2010

1 commit


30 Mar, 2010

1 commit

  • …it slab.h inclusion from percpu.h

    percpu.h is included by sched.h and module.h and thus ends up being
    included when building most .c files. percpu.h includes slab.h which
    in turn includes gfp.h making everything defined by the two files
    universally available and complicating inclusion dependencies.

    percpu.h -> slab.h dependency is about to be removed. Prepare for
    this change by updating users of gfp and slab facilities include those
    headers directly instead of assuming availability. As this conversion
    needs to touch large number of source files, the following script is
    used as the basis of conversion.

    http://userweb.kernel.org/~tj/misc/slabh-sweep.py

    The script does the followings.

    * Scan files for gfp and slab usages and update includes such that
    only the necessary includes are there. ie. if only gfp is used,
    gfp.h, if slab is used, slab.h.

    * When the script inserts a new include, it looks at the include
    blocks and try to put the new include such that its order conforms
    to its surrounding. It's put in the include block which contains
    core kernel includes, in the same order that the rest are ordered -
    alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
    doesn't seem to be any matching order.

    * If the script can't find a place to put a new include (mostly
    because the file doesn't have fitting include block), it prints out
    an error message indicating which .h file needs to be added to the
    file.

    The conversion was done in the following steps.

    1. The initial automatic conversion of all .c files updated slightly
    over 4000 files, deleting around 700 includes and adding ~480 gfp.h
    and ~3000 slab.h inclusions. The script emitted errors for ~400
    files.

    2. Each error was manually checked. Some didn't need the inclusion,
    some needed manual addition while adding it to implementation .h or
    embedding .c file was more appropriate for others. This step added
    inclusions to around 150 files.

    3. The script was run again and the output was compared to the edits
    from #2 to make sure no file was left behind.

    4. Several build tests were done and a couple of problems were fixed.
    e.g. lib/decompress_*.c used malloc/free() wrappers around slab
    APIs requiring slab.h to be added manually.

    5. The script was run on all .h files but without automatically
    editing them as sprinkling gfp.h and slab.h inclusions around .h
    files could easily lead to inclusion dependency hell. Most gfp.h
    inclusion directives were ignored as stuff from gfp.h was usually
    wildly available and often used in preprocessor macros. Each
    slab.h inclusion directive was examined and added manually as
    necessary.

    6. percpu.h was updated not to include slab.h.

    7. Build test were done on the following configurations and failures
    were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
    distributed build env didn't work with gcov compiles) and a few
    more options had to be turned off depending on archs to make things
    build (like ipr on powerpc/64 which failed due to missing writeq).

    * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
    * powerpc and powerpc64 SMP allmodconfig
    * sparc and sparc64 SMP allmodconfig
    * ia64 SMP allmodconfig
    * s390 SMP allmodconfig
    * alpha SMP allmodconfig
    * um on x86_64 SMP allmodconfig

    8. percpu.h modifications were reverted so that it could be applied as
    a separate patch and serve as bisection point.

    Given the fact that I had only a couple of failures from tests on step
    6, I'm fairly confident about the coverage of this conversion patch.
    If there is a breakage, it's likely to be something in one of the arch
    headers which should be easily discoverable easily on most builds of
    the specific arch.

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

    Tejun Heo
     

25 Mar, 2010

1 commit


06 Jan, 2009

1 commit

  • New nodes are inserted in u32_change() under rtnl_lock() with wmb(),
    so without tcf_tree_lock() like in other classifiers (e.g. cls_fw).
    This isn't enough without rmb() on the read side, but on the other
    hand adding such barriers doesn't give any savings, so the lock is
    added instead.

    Reported-by: m0sia
    Signed-off-by: Jarek Poplawski
    Signed-off-by: David S. Miller

    Jarek Poplawski
     

20 Nov, 2008

1 commit

  • The use of xchg() hasn't been necessary since 2.2.something when proper
    locking was added to packet schedulers. In the case of classifiers they
    mostly weren't even necessary before that since they're mainly used
    to assign a NULL pointer to the filter root in the ->destroy path;
    the root is destroyed immediately after that.

    Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Patrick McHardy
     

26 Jul, 2008

1 commit

  • Removes legacy reinvent-the-wheel type thing. The generic
    machinery integrates much better to automated debugging aids
    such as kerneloops.org (and others), and is unambiguous due to
    better naming. Non-intuively BUG_TRAP() is actually equal to
    WARN_ON() rather than BUG_ON() though some might actually be
    promoted to BUG_ON() but I left that to future.

    I could make at least one BUILD_BUG_ON conversion.

    Signed-off-by: Ilpo Järvinen
    Signed-off-by: David S. Miller

    Ilpo Järvinen
     

19 Jul, 2008

1 commit

  • The u32_list is just an indirect way of maintaining a reference
    to a U32 node on a per-qdisc basis.

    Just add an explicit node pointer for u32 to struct Qdisc an do
    away with this global list.

    Signed-off-by: David S. Miller

    David S. Miller