22 May, 2014

1 commit

  • Kelly reported the following crash:

    IP: [] tcf_action_exec+0x46/0x90
    PGD 3009067 PUD 300c067 PMD 11ff30067 PTE 800000011634b060
    Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
    CPU: 1 PID: 639 Comm: dhclient Not tainted 3.15.0-rc4+ #342
    Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
    task: ffff8801169ecd00 ti: ffff8800d21b8000 task.ti: ffff8800d21b8000
    RIP: 0010:[] [] tcf_action_exec+0x46/0x90
    RSP: 0018:ffff8800d21b9b90 EFLAGS: 00010283
    RAX: 00000000ffffffff RBX: ffff88011634b8e8 RCX: ffff8800cf7133d8
    RDX: ffff88011634b900 RSI: ffff8800cf7133e0 RDI: ffff8800d210f840
    RBP: ffff8800d21b9bb0 R08: ffffffff8287bf60 R09: 0000000000000001
    R10: ffff8800d2b22b24 R11: 0000000000000001 R12: ffff8800d210f840
    R13: ffff8800d21b9c50 R14: ffff8800cf7133e0 R15: ffff8800cad433d8
    FS: 00007f49723e1840(0000) GS:ffff88011a800000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: ffff88011634b8f0 CR3: 00000000ce469000 CR4: 00000000000006e0
    Stack:
    ffff8800d2170188 ffff8800d210f840 ffff8800d2171b90 0000000000000000
    ffff8800d21b9be8 ffffffff817c55bb ffff8800d21b9c50 ffff8800d2171b90
    ffff8800d210f840 ffff8800d21b0300 ffff8800d21b9c50 ffff8800d21b9c18
    Call Trace:
    [] tcindex_classify+0x88/0x9b
    [] tc_classify_compat+0x3e/0x7b
    [] tc_classify+0x25/0x9f
    [] htb_enqueue+0x55/0x27a
    [] dsmark_enqueue+0x165/0x1a4
    [] __dev_queue_xmit+0x35e/0x536
    [] dev_queue_xmit+0x10/0x12
    [] packet_sendmsg+0xb26/0xb9a
    [] ? __lock_acquire+0x3ae/0xdf3
    [] __sock_sendmsg_nosec+0x25/0x27
    [] sock_aio_write+0xd0/0xe7
    [] do_sync_write+0x59/0x78
    [] vfs_write+0xb5/0x10a
    [] SyS_write+0x49/0x7f
    [] system_call_fastpath+0x16/0x1b

    This is because we memcpy() struct tcindex_filter_result, which
    contains a struct tcf_exts; a struct list_head obviously cannot
    simply be copied, since its prev/next pointers would still point
    at the original node. This is a regression introduced by commit
    33be627159913b094bb578 (net_sched: act: use standard struct list_head).

    It's not very easy to fix it as the code is a mess:

        if (old_r)
                memcpy(&cr, r, sizeof(cr));
        else {
                memset(&cr, 0, sizeof(cr));
                tcf_exts_init(&cr.exts, TCA_TCINDEX_ACT, TCA_TCINDEX_POLICE);
        }
        ...
        tcf_exts_change(tp, &cr.exts, &e);
        ...
        memcpy(r, &cr, sizeof(cr));

    the above code should be equivalent to:

        tcindex_filter_result_init(&cr);
        if (old_r)
                cr.res = r->res;
        ...
        if (old_r)
                tcf_exts_change(tp, &r->exts, &e);
        else
                tcf_exts_change(tp, &cr.exts, &e);
        ...
        r->res = cr.res;

    After this change there is no longer any need to copy struct tcf_exts.

    It also fixes other places that zero structs containing a struct tcf_exts.

    Fixes: commit 33be627159913b0 (net_sched: act: use standard struct list_head)
    Reported-by: Kelly Anderson
    Tested-by: Kelly Anderson
    Cc: David S. Miller
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller

    Cong Wang
     

05 May, 2014

1 commit

  • hhf_change() takes the sch_tree_lock and releases it, but misses
    releasing it in the error cases. Fix the missed cases here.

    To reproduce try a command like this,

    # tc qdisc change dev p3p2 root hhf quantum 40960 non_hh_weight 300000

    Signed-off-by: John Fastabend
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    John Fastabend
     

25 Apr, 2014

1 commit

  • It is possible, by passing a netlink socket to a more privileged
    executable and then fooling that executable into writing to the
    socket data that happens to be a valid netlink message, to make
    that privileged executable do something it did not intend to do.

    To keep this from happening, replace bare capable and ns_capable
    calls with netlink_capable, netlink_net_capable and
    netlink_ns_capable calls, which act the same as the previous calls
    except that they also verify that the opener of the socket had the
    desired permissions.

    Reported-by: Andy Lutomirski
    Signed-off-by: "Eric W. Biederman"
    Signed-off-by: David S. Miller

    Eric W. Biederman
     

01 Apr, 2014

1 commit

  • This allows monitoring carrier on/off transitions and detecting
    link flapping issues:
    - new /sys/class/net/X/carrier_changes
    - new rtnetlink IFLA_CARRIER_CHANGES (getlink)

    Tested:
    - grep . /sys/class/net/*/carrier_changes
    + ip link set dev X down/up
    + plug/unplug cable
    - updated iproute2: prints IFLA_CARRIER_CHANGES
    - iproute2 20121211-2 (debian): unchanged behavior

    Signed-off-by: David Decotigny
    Signed-off-by: David S. Miller

    david decotigny
     

19 Mar, 2014

1 commit

  • In commit b4e9b520ca5d ("[NET_SCHED]: Add mask support to fwmark
    classifier") Patrick added a u32 field in fw_head, making it slightly
    bigger than one page.

    Let's use 256 slots to make fw_hash() more straightforward, and move
    @mask to the beginning of the structure, as we often use a small
    number of skb->mark values; @mask and the first hash buckets then
    share the same cache line.

    This brings the memory usage back below 4000 bytes, and permits
    John to add an rcu_head at the end of the structure later without
    any worry.

    Signed-off-by: Eric Dumazet
    Cc: Thomas Graf
    Cc: John Fastabend
    Acked-by: Thomas Graf
    Signed-off-by: David S. Miller

    Eric Dumazet
     

15 Mar, 2014

1 commit


14 Mar, 2014

1 commit


12 Mar, 2014

2 commits

  • We have seen delays of more than 50ms in class or qdisc dumps, in case
    device is under high TX stress, even with the prior 4KB per skb limit.

    Add cond_resched() to give a chance to higher prio tasks to get cpu.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • Like all rtnetlink dump operations, we hold RTNL in tc_dump_qdisc(),
    so we do not need to use rcu protection to protect list of netdevices.

    This will allow preemption to occur, thus reducing latencies.
    Following patch adds explicit cond_resched() calls.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

11 Mar, 2014

2 commits

  • Resizing fq hash table allocates memory while holding qdisc spinlock,
    with BH disabled.

    This is definitely not good, as allocation might sleep.

    We can drop the lock and get it when needed, we hold RTNL so no other
    changes can happen at the same time.

    Signed-off-by: Eric Dumazet
    Fixes: afe4fd062416 ("pkt_sched: fq: Fair Queue packet scheduler")
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • The WARN_ON(root == &noop_qdisc) added in qdisc_list_add()
    can trigger in normal conditions when devices are not up.
    It should be done only right before the list_add_tail() call.

    Fixes: e57a784d8cae4 ("pkt_sched: set root qdisc before change() in attach_default_qdiscs()")
    Reported-by: Valdis Kletnieks
    Tested-by: Mirco Tischler
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

09 Mar, 2014

1 commit


07 Mar, 2014

1 commit


06 Mar, 2014

1 commit

  • Conflicts:
    drivers/net/wireless/ath/ath9k/recv.c
    drivers/net/wireless/mwifiex/pcie.c
    net/ipv6/sit.c

    The SIT driver conflict consists of a bug fix being done by hand
    in 'net' (missing u64_stats_init()) whilst in 'net-next' a helper
    was created (netdev_alloc_pcpu_stats()) which takes care of this.

    The two wireless conflicts were overlapping changes.

    Signed-off-by: David S. Miller

    David S. Miller
     

04 Mar, 2014

1 commit

  • On x86_64 we have 3 holes in struct tbf_sched_data.

    The member peak_present can be replaced with peak.rate_bytes_ps,
    because peak.rate_bytes_ps is set only when peak is specified in
    tbf_change(). tbf_peak_present() is introduced to test
    peak.rate_bytes_ps.

    The member max_size is moved to fill a 32-bit hole.

    Signed-off-by: Hiroaki SHIMODA
    Signed-off-by: David S. Miller

    Hiroaki SHIMODA
     

28 Feb, 2014

1 commit

  • The allocated child qdisc is not freed in error conditions.
    Defer the allocation after user configuration turns out to be
    valid and acceptable.

    Fixes: cc106e441a63b ("net: sched: tbf: fix the calculation of max_size")
    Signed-off-by: Hiroaki SHIMODA
    Cc: Yang Yingliang
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Hiroaki SHIMODA
     

19 Feb, 2014

1 commit


18 Feb, 2014

1 commit


14 Feb, 2014

4 commits


13 Feb, 2014

5 commits


27 Jan, 2014

1 commit


23 Jan, 2014

1 commit


22 Jan, 2014

4 commits

  • Jakub Zawadzki noticed that some divisions by reciprocal_divide()
    were not correct [1][2], which he could also show with BPF code
    after divisions are transformed into reciprocal_value() for runtime
    invariance, which can be passed to reciprocal_divide() later on;
    the reverse transformation in the BPF dump ended up with a
    different, off-by-one K in some situations.

    This has been fixed by Eric Dumazet in commit aee636c4809fa5
    ("bpf: do not use reciprocal divide"). This follow-up patch
    improves reciprocal_value() and reciprocal_divide() to work in
    all cases by using Granlund and Montgomery method, so that also
    future use is safe and without any non-obvious side-effects.
    Known problems with the old implementation were that division by 1
    always returned 0, and that there were some off-by-ones when the
    dividend and divisor were very large. This seemed not to be
    problematic with its current users, as far as we can tell. Eric
    Dumazet checked the slab usage; we cannot say so with certainty in
    the case of flex_array.
    Still, in order to fix that, we propose an extension from the
    original implementation from commit 6a2d7a955d8d resp. [3][4],
    by using the algorithm proposed in "Division by Invariant Integers
    Using Multiplication" [5], Torbjörn Granlund and Peter L.
    Montgomery, that is, pseudocode for q = n/d where q, n, d is in
    u32 universe:

    1) Initialization:

    int l = ceil(log_2 d)
    uword m' = floor((1<<32)*((1<<l)-d)/d) + 1
    int sh_1 = min(l, 1)
    int sh_2 = max(l - 1, 0)

    2) For q = n/d, all in u32:

    uword t = (n*m')>>32
    q = (t + ((n - t)>>sh_1))>>sh_2
    Cc: Eric Dumazet
    Cc: Austin S Hemmelgarn
    Cc: linux-kernel@vger.kernel.org
    Cc: Jesse Gross
    Cc: Jamal Hadi Salim
    Cc: Stephen Hemminger
    Cc: Matt Mackall
    Cc: Pekka Enberg
    Cc: Christoph Lameter
    Cc: Andy Gospodarek
    Cc: Veaceslav Falico
    Cc: Jay Vosburgh
    Cc: Jakub Zawadzki
    Signed-off-by: Daniel Borkmann
    Signed-off-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller

    Hannes Frederic Sowa
     
  • Many places have open-coded a function that returns a random number
    in the range [0, N-1]. Under the assumption that we have a PRNG
    such as taus113, well distributed over the [0, ~0U] space, we can
    implement such a function as uword t = (n*m')>>32, where m' is a
    random number obtained from the PRNG, n the right-open interval
    border and t our resulting random number, with n, m', t in the u32
    universe.

    Let's go with Joe and simply call it prandom_u32_max(), although
    technically we have a right-open interval endpoint, which we have
    documented. Other users can be migrated to the new prandom_u32_max()
    function later on; for now, we need to make sure to migrate the
    reciprocal_divide() users for the reciprocal_divide() follow-up
    fixup, since their function signatures are going to change.

    Joint work with Hannes Frederic Sowa.

    Cc: Jakub Zawadzki
    Cc: Eric Dumazet
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Hannes Frederic Sowa
    Signed-off-by: Daniel Borkmann
    Signed-off-by: David S. Miller

    Daniel Borkmann
     
  • So that we will not expose struct tcf_common to modules.

    Cc: Jamal Hadi Salim
    Cc: David S. Miller
    Signed-off-by: Cong Wang
    Signed-off-by: Jamal Hadi Salim
    Signed-off-by: David S. Miller

    WANG Cong
     
  • Every action ops has a pointer to hash info, so we don't need to
    hard-code it in each module.

    Cc: Jamal Hadi Salim
    Cc: David S. Miller
    Signed-off-by: Cong Wang
    Signed-off-by: Jamal Hadi Salim
    Signed-off-by: David S. Miller

    WANG Cong
     

20 Jan, 2014

2 commits


17 Jan, 2014

3 commits


15 Jan, 2014

2 commits