06 Aug, 2020

1 commit

  • Pull networking updates from David Miller:

    1) Support 6Ghz band in ath11k driver, from Rajkumar Manoharan.

    2) Support UDP segmentation in code TSO code, from Eric Dumazet.

    3) Allow flashing different flash images in cxgb4 driver, from Vishal
    Kulkarni.

    4) Add drop frames counter and flow status to tc flower offloading,
    from Po Liu.

    5) Support n-tuple filters in cxgb4, from Vishal Kulkarni.

    6) Various new indirect call avoidance, from Eric Dumazet and Brian
    Vazquez.

    7) Fix BPF verifier failures on 32-bit pointer arithmetic, from
    Yonghong Song.

    8) Support querying and setting hardware address of a port function via
    devlink, use this in mlx5, from Parav Pandit.

    9) Support hw ipsec offload on bonding slaves, from Jarod Wilson.

    10) Switch qca8k driver over to phylink, from Jonathan McDowell.

    11) In bpftool, show list of processes holding BPF FD references to
    maps, programs, links, and btf objects. From Andrii Nakryiko.

    12) Several conversions over to generic power management, from Vaibhav
    Gupta.

    13) Add support for SO_KEEPALIVE et al. to bpf_setsockopt(), from Dmitry
    Yakunin.

    14) Various https url conversions, from Alexander A. Klimov.

    15) Timestamping and PHC support for mscc PHY driver, from Antoine
    Tenart.

    16) Support bpf iterating over tcp and udp sockets, from Yonghong Song.

    17) Support 5GBASE-T i40e NICs, from Aleksandr Loktionov.

    18) Add kTLS RX HW offload support to mlx5e, from Tariq Toukan.

    19) Fix the ->ndo_start_xmit() return type to be netdev_tx_t in several
    drivers. From Luc Van Oostenryck.

    20) XDP support for xen-netfront, from Denis Kirjanov.

    21) Support receive buffer autotuning in MPTCP, from Florian Westphal.

    22) Support EF100 chip in sfc driver, from Edward Cree.

    23) Add XDP support to mvpp2 driver, from Matteo Croce.

    24) Support MPTCP in sock_diag, from Paolo Abeni.

    25) Commonize UDP tunnel offloading code by creating udp_tunnel_nic
    infrastructure, from Jakub Kicinski.

    26) Several pci_ --> dma_ API conversions, from Christophe JAILLET.

    27) Add FLOW_ACTION_POLICE support to mlxsw, from Ido Schimmel.

    28) Add SK_LOOKUP bpf program type, from Jakub Sitnicki.

    29) Refactor a lot of networking socket option handling code in order to
    avoid set_fs() calls, from Christoph Hellwig.

    30) Add rfc4884 support to icmp code, from Willem de Bruijn.

    31) Support TBF offload in dpaa2-eth driver, from Ioana Ciornei.

    32) Support XDP_REDIRECT in qede driver, from Alexander Lobakin.

    33) Support PCI relaxed ordering in mlx5 driver, from Aya Levin.

    34) Support TCP syncookies in MPTCP, from Flowian Westphal.

    35) Fix several tricky cases of PMTU handling wrt. briding, from Stefano
    Brivio.

    * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2056 commits)
    net: thunderx: initialize VF's mailbox mutex before first usage
    usb: hso: remove bogus check for EINPROGRESS
    usb: hso: no complaint about kmalloc failure
    hso: fix bailout in error case of probe
    ip_tunnel_core: Fix build for archs without _HAVE_ARCH_IPV6_CSUM
    selftests/net: relax cpu affinity requirement in msg_zerocopy test
    mptcp: be careful on subflow creation
    selftests: rtnetlink: make kci_test_encap() return sub-test result
    selftests: rtnetlink: correct the final return value for the test
    net: dsa: sja1105: use detected device id instead of DT one on mismatch
    tipc: set ub->ifindex for local ipv6 address
    ipv6: add ipv6_dev_find()
    net: openvswitch: silence suspicious RCU usage warning
    Revert "vxlan: fix tos value before xmit"
    ptp: only allow phase values lower than 1 period
    farsync: switch from 'pci_' to 'dma_' API
    wan: wanxl: switch from 'pci_' to 'dma_' API
    hv_netvsc: do not use VF device if link is down
    dpaa2-eth: Fix passing zero to 'PTR_ERR' warning
    net: macb: Properly handle phylink on at91sam9x
    ...

    Linus Torvalds
     

17 Jul, 2020

2 commits

  • This reverts commit aebe4426ccaa4838f36ea805cdf7d76503e65117.

    Signed-off-by: Petr Machata
    Signed-off-by: Jakub Kicinski

    Petr Machata
     
  • Using uninitialized_var() is dangerous as it papers over real bugs[1]
    (or can in the future), and suppresses unrelated compiler warnings
    (e.g. "unused variable"). If the compiler thinks it is uninitialized,
    either simply initialize the variable or make compiler changes.

    In preparation for removing[2] the[3] macro[4], remove all remaining
    needless uses with the following script:

    git grep '\buninitialized_var\b' | cut -d: -f1 | sort -u | \
    xargs perl -pi -e \
    's/\buninitialized_var\(([^\)]+)\)/\1/g;
    s:\s*/\* (GCC be quiet|to make compiler happy) \*/$::g;'

    drivers/video/fbdev/riva/riva_hw.c was manually tweaked to avoid
    pathological white-space.

    No outstanding warnings were found building allmodconfig with GCC 9.3.0
    for x86_64, i386, arm64, arm, powerpc, powerpc64le, s390x, mips, sparc64,
    alpha, and m68k.

    [1] https://lore.kernel.org/lkml/20200603174714.192027-1-glider@google.com/
    [2] https://lore.kernel.org/lkml/CA+55aFw+Vbj0i=1TGqCR5vQkCzWJ0QxK6CernOU6eedsudAixw@mail.gmail.com/
    [3] https://lore.kernel.org/lkml/CA+55aFwgbgqhbp1fkxvRKEpzyR5J8n1vKT1VZdz9knmPuXhOeg@mail.gmail.com/
    [4] https://lore.kernel.org/lkml/CA+55aFz2500WfbKXAx8s67wrm9=yVJu65TpLgN_ybYNv0VEOKA@mail.gmail.com/

    Reviewed-by: Leon Romanovsky # drivers/infiniband and mlx4/mlx5
    Acked-by: Jason Gunthorpe # IB
    Acked-by: Kalle Valo # wireless drivers
    Reviewed-by: Chao Yu # erofs
    Signed-off-by: Kees Cook

    Kees Cook
     

08 Jul, 2020

1 commit

  • Replace the existing /* fall through */ comments and its variants with
    the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary
    fall-through markings when it is the case.

    [1] https://www.kernel.org/doc/html/latest/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through

    Signed-off-by: Gustavo A. R. Silva
    Signed-off-by: David S. Miller

    Gustavo A. R. Silva
     

30 Jun, 2020

1 commit

  • A following patch introduces qevents, points in qdisc algorithm where
    packet can be processed by user-defined filters. Should this processing
    lead to a situation where a new packet is to be enqueued on the same port,
    holding the root lock would lead to deadlocks. To solve the issue, qevent
    handler needs to unlock and relock the root lock when necessary.

    To that end, add the root lock argument to the qdisc op enqueue, and
    propagate throughout.

    Signed-off-by: Petr Machata
    Signed-off-by: David S. Miller

    Petr Machata
     

21 Jun, 2020

1 commit

  • The user tool modinfo is used to get information on kernel modules, including a
    description where it is available.

    This patch adds a brief MODULE_DESCRIPTION to the following modules:

    9p
    drop_monitor
    esp4_offload
    esp6_offload
    fou
    fou6
    ila
    sch_fq
    sch_fq_codel
    sch_hhf

    Signed-off-by: Rob Gill
    Signed-off-by: David S. Miller

    Rob Gill
     

28 Apr, 2020

1 commit


23 Oct, 2019

1 commit


09 Aug, 2019

1 commit

  • Fixes gcc '-Wunused-but-set-variable' warning:

    net/sched/sch_fq_codel.c: In function fq_codel_dequeue:
    net/sched/sch_fq_codel.c:288:23: warning: variable prev_ecn_mark set but not used [-Wunused-but-set-variable]
    net/sched/sch_fq_codel.c:288:6: warning: variable prev_drop_count set but not used [-Wunused-but-set-variable]

    They are not used since commit 77ddaff218fc ("fq_codel: Kill
    useless per-flow dropped statistic")

    Reported-by: Hulk Robot
    Signed-off-by: YueHaibing
    Signed-off-by: David S. Miller

    YueHaibing
     

07 Aug, 2019

2 commits

  • It is almost impossible to get anything other than a 0 out of
    flow->dropped statistic with a tc class dump, as it resets to 0
    on every round.

    It also conflates ecn marks with drops.

    It would have been useful had it kept a cumulative drop count, but
    it doesn't. This patch doesn't change the API, it just stops
    tracking a stat and state that is impossible to measure and nobody
    uses.

    Signed-off-by: Dave Taht
    Signed-off-by: David S. Miller

    Dave Taht
     
  • In the field fq_codel is often used with a smaller memory or
    packet limit than the default, and when the bulk dropper is hit,
    the drop pattern bifircates into one that more slowly increases
    the codel drop rate and hits the bulk dropper more than it should.

    The scan through the 1024 queues happens more often than it needs to.

    This patch increases the codel count in the bulk dropper, but
    does not change the drop rate there, relying on the next codel round
    to deliver the next packet at the original drop rate
    (after that burst of loss), then escalate to a higher signaling rate.

    Signed-off-by: Dave Taht
    Signed-off-by: David S. Miller

    Dave Taht
     

18 Jul, 2019

1 commit

  • For qdisc's that support TC filters and set TCQ_F_CAN_BYPASS,
    notably fq_codel, it makes no sense to let packets bypass the TC
    filters we setup in any scenario, otherwise our packets steering
    policy could not be enforced.

    This can be reproduced easily with the following script:

    ip li add dev dummy0 type dummy
    ifconfig dummy0 up
    tc qd add dev dummy0 root fq_codel
    tc filter add dev dummy0 parent 8001: protocol arp basic action mirred egress redirect dev lo
    tc filter add dev dummy0 parent 8001: protocol ip basic action mirred egress redirect dev lo
    ping -I dummy0 192.168.112.1

    Without this patch, packets are sent directly to dummy0 without
    hitting any of the filters. With this patch, packets are redirected
    to loopback as expected.

    This fix is not perfect, it only unsets the flag but does not set it back
    because we have to save the information somewhere in the qdisc if we
    really want that. Note, both fq_codel and sfq clear this flag in their
    ->bind_tcf() but this is clearly not sufficient when we don't use any
    class ID.

    Fixes: 23624935e0c4 ("net_sched: TCQ_F_CAN_BYPASS generalization")
    Cc: Eric Dumazet
    Signed-off-by: Cong Wang
    Reviewed-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Cong Wang
     

31 May, 2019

1 commit

  • Based on 1 normalized pattern(s):

    this program is free software you can redistribute it and or modify
    it under the terms of the gnu general public license as published by
    the free software foundation either version 2 of the license or at
    your option any later version

    extracted by the scancode license scanner the SPDX license identifier

    GPL-2.0-or-later

    has been chosen to replace the boilerplate/reference in 3029 file(s).

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Allison Randal
    Cc: linux-spdx@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190527070032.746973796@linutronix.de
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

28 Apr, 2019

2 commits

  • We currently have two levels of strict validation:

    1) liberal (default)
    - undefined (type >= max) & NLA_UNSPEC attributes accepted
    - attribute length >= expected accepted
    - garbage at end of message accepted
    2) strict (opt-in)
    - NLA_UNSPEC attributes accepted
    - attribute length >= expected accepted

    Split out parsing strictness into four different options:
    * TRAILING - check that there's no trailing data after parsing
    attributes (in message or nested)
    * MAXTYPE - reject attrs > max known type
    * UNSPEC - reject attributes with NLA_UNSPEC policy entries
    * STRICT_ATTRS - strictly validate attribute size

    The default for future things should be *everything*.
    The current *_strict() is a combination of TRAILING and MAXTYPE,
    and is renamed to _deprecated_strict().
    The current regular parsing has none of this, and is renamed to
    *_parse_deprecated().

    Additionally it allows us to selectively set one of the new flags
    even on old policies. Notably, the UNSPEC flag could be useful in
    this case, since it can be arranged (by filling in the policy) to
    not be an incompatible userspace ABI change, but would then going
    forward prevent forgetting attribute entries. Similar can apply
    to the POLICY flag.

    We end up with the following renames:
    * nla_parse -> nla_parse_deprecated
    * nla_parse_strict -> nla_parse_deprecated_strict
    * nlmsg_parse -> nlmsg_parse_deprecated
    * nlmsg_parse_strict -> nlmsg_parse_deprecated_strict
    * nla_parse_nested -> nla_parse_nested_deprecated
    * nla_validate_nested -> nla_validate_nested_deprecated

    Using spatch, of course:
    @@
    expression TB, MAX, HEAD, LEN, POL, EXT;
    @@
    -nla_parse(TB, MAX, HEAD, LEN, POL, EXT)
    +nla_parse_deprecated(TB, MAX, HEAD, LEN, POL, EXT)

    @@
    expression NLH, HDRLEN, TB, MAX, POL, EXT;
    @@
    -nlmsg_parse(NLH, HDRLEN, TB, MAX, POL, EXT)
    +nlmsg_parse_deprecated(NLH, HDRLEN, TB, MAX, POL, EXT)

    @@
    expression NLH, HDRLEN, TB, MAX, POL, EXT;
    @@
    -nlmsg_parse_strict(NLH, HDRLEN, TB, MAX, POL, EXT)
    +nlmsg_parse_deprecated_strict(NLH, HDRLEN, TB, MAX, POL, EXT)

    @@
    expression TB, MAX, NLA, POL, EXT;
    @@
    -nla_parse_nested(TB, MAX, NLA, POL, EXT)
    +nla_parse_nested_deprecated(TB, MAX, NLA, POL, EXT)

    @@
    expression START, MAX, POL, EXT;
    @@
    -nla_validate_nested(START, MAX, POL, EXT)
    +nla_validate_nested_deprecated(START, MAX, POL, EXT)

    @@
    expression NLH, HDRLEN, MAX, POL, EXT;
    @@
    -nlmsg_validate(NLH, HDRLEN, MAX, POL, EXT)
    +nlmsg_validate_deprecated(NLH, HDRLEN, MAX, POL, EXT)

    For this patch, don't actually add the strict, non-renamed versions
    yet so that it breaks compile if I get it wrong.

    Also, while at it, make nla_validate and nla_parse go down to a
    common __nla_validate_parse() function to avoid code duplication.

    Ultimately, this allows us to have very strict validation for every
    new caller of nla_parse()/nlmsg_parse() etc as re-introduced in the
    next patch, while existing things will continue to work as is.

    In effect then, this adds fully strict validation for any new command.

    Signed-off-by: Johannes Berg
    Signed-off-by: David S. Miller

    Johannes Berg
     
  • Even if the NLA_F_NESTED flag was introduced more than 11 years ago, most
    netlink based interfaces (including recently added ones) are still not
    setting it in kernel generated messages. Without the flag, message parsers
    not aware of attribute semantics (e.g. wireshark dissector or libmnl's
    mnl_nlmsg_fprintf()) cannot recognize nested attributes and won't display
    the structure of their contents.

    Unfortunately we cannot just add the flag everywhere as there may be
    userspace applications which check nlattr::nla_type directly rather than
    through a helper masking out the flags. Therefore the patch renames
    nla_nest_start() to nla_nest_start_noflag() and introduces nla_nest_start()
    as a wrapper adding NLA_F_NESTED. The calls which add NLA_F_NESTED manually
    are rewritten to use nla_nest_start().

    Except for changes in include/net/netlink.h, the patch was generated using
    this semantic patch:

    @@ expression E1, E2; @@
    -nla_nest_start(E1, E2)
    +nla_nest_start_noflag(E1, E2)

    @@ expression E1, E2; @@
    -nla_nest_start_noflag(E1, E2 | NLA_F_NESTED)
    +nla_nest_start(E1, E2)

    Signed-off-by: Michal Kubecek
    Acked-by: Jiri Pirko
    Acked-by: David Ahern
    Signed-off-by: David S. Miller

    Michal Kubecek
     

11 Sep, 2018

1 commit


13 Jul, 2018

1 commit

  • When fq_codel_init fails, qdisc_create_dflt will cleanup by using
    qdisc_destroy. This function calls the ->reset() op prior to calling the
    ->destroy() op.

    Unfortunately, during the failure flow for sch_fq_codel, the ->flows
    parameter is not initialized, so the fq_codel_reset function will null
    pointer dereference.

    kernel: BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
    kernel: IP: fq_codel_reset+0x58/0xd0 [sch_fq_codel]
    kernel: PGD 0 P4D 0
    kernel: Oops: 0000 [#1] SMP PTI
    kernel: Modules linked in: i40iw i40e(OE) xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack tun bridge stp llc devlink ebtable_filter ebtables ip6table_filter ip6_tables rpcrdma ib_isert iscsi_target_mod sunrpc ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm intel_rapl sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel intel_cstate iTCO_wdt iTCO_vendor_support intel_uncore ib_core intel_rapl_perf mei_me mei joydev i2c_i801 lpc_ich ioatdma shpchp wmi sch_fq_codel xfs libcrc32c mgag200 ixgbe drm_kms_helper isci ttm firewire_ohci
    kernel: mdio drm igb libsas crc32c_intel firewire_core ptp pps_core scsi_transport_sas crc_itu_t dca i2c_algo_bit ipmi_si ipmi_devintf ipmi_msghandler [last unloaded: i40e]
    kernel: CPU: 10 PID: 4219 Comm: ip Tainted: G OE 4.16.13custom-fq-codel-test+ #3
    kernel: Hardware name: Intel Corporation S2600CO/S2600CO, BIOS SE5C600.86B.02.05.0004.051120151007 05/11/2015
    kernel: RIP: 0010:fq_codel_reset+0x58/0xd0 [sch_fq_codel]
    kernel: RSP: 0018:ffffbfbf4c1fb620 EFLAGS: 00010246
    kernel: RAX: 0000000000000400 RBX: 0000000000000000 RCX: 00000000000005b9
    kernel: RDX: 0000000000000000 RSI: ffff9d03264a60c0 RDI: ffff9cfd17b31c00
    kernel: RBP: 0000000000000001 R08: 00000000000260c0 R09: ffffffffb679c3e9
    kernel: R10: fffff1dab06a0e80 R11: ffff9cfd163af800 R12: ffff9cfd17b31c00
    kernel: R13: 0000000000000001 R14: ffff9cfd153de600 R15: 0000000000000001
    kernel: FS: 00007fdec2f92800(0000) GS:ffff9d0326480000(0000) knlGS:0000000000000000
    kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    kernel: CR2: 0000000000000008 CR3: 0000000c1956a006 CR4: 00000000000606e0
    kernel: Call Trace:
    kernel: qdisc_destroy+0x56/0x140
    kernel: qdisc_create_dflt+0x8b/0xb0
    kernel: mq_init+0xc1/0xf0
    kernel: qdisc_create_dflt+0x5a/0xb0
    kernel: dev_activate+0x205/0x230
    kernel: __dev_open+0xf5/0x160
    kernel: __dev_change_flags+0x1a3/0x210
    kernel: dev_change_flags+0x21/0x60
    kernel: do_setlink+0x660/0xdf0
    kernel: ? down_trylock+0x25/0x30
    kernel: ? xfs_buf_trylock+0x1a/0xd0 [xfs]
    kernel: ? rtnl_newlink+0x816/0x990
    kernel: ? _xfs_buf_find+0x327/0x580 [xfs]
    kernel: ? _cond_resched+0x15/0x30
    kernel: ? kmem_cache_alloc+0x20/0x1b0
    kernel: ? rtnetlink_rcv_msg+0x200/0x2f0
    kernel: ? rtnl_calcit.isra.30+0x100/0x100
    kernel: ? netlink_rcv_skb+0x4c/0x120
    kernel: ? netlink_unicast+0x19e/0x260
    kernel: ? netlink_sendmsg+0x1ff/0x3c0
    kernel: ? sock_sendmsg+0x36/0x40
    kernel: ? ___sys_sendmsg+0x295/0x2f0
    kernel: ? ebitmap_cmp+0x6d/0x90
    kernel: ? dev_get_by_name_rcu+0x73/0x90
    kernel: ? skb_dequeue+0x52/0x60
    kernel: ? __inode_wait_for_writeback+0x7f/0xf0
    kernel: ? bit_waitqueue+0x30/0x30
    kernel: ? fsnotify_grab_connector+0x3c/0x60
    kernel: ? __sys_sendmsg+0x51/0x90
    kernel: ? do_syscall_64+0x74/0x180
    kernel: ? entry_SYSCALL_64_after_hwframe+0x3d/0xa2
    kernel: Code: 00 00 48 89 87 00 02 00 00 8b 87 a0 01 00 00 85 c0 0f 84 84 00 00 00 31 ed 48 63 dd 83 c5 01 48 c1 e3 06 49 03 9c 24 90 01 00 00 8b 73 08 48 8b 3b e8 6c 9a 4f f6 48 8d 43 10 48 c7 03 00 00
    kernel: RIP: fq_codel_reset+0x58/0xd0 [sch_fq_codel] RSP: ffffbfbf4c1fb620
    kernel: CR2: 0000000000000008
    kernel: ---[ end trace e81a62bede66274e ]---

    This is caused because flows_cnt is non-zero, but flows hasn't been
    initialized. fq_codel_init has left the private data in a partially
    initialized state.

    To fix this, reset flows_cnt to 0 when we fail to initialize.
    Additionally, to make the state more consistent, also cleanup the flows
    pointer when the allocation of backlogs fails.

    This fixes the NULL pointer dereference, since both the for-loop and
    memset in fq_codel_reset will be no-ops when flow_cnt is zero.

    Signed-off-by: Jacob Keller
    Signed-off-by: David S. Miller

    Jacob Keller
     

13 Jun, 2018

1 commit

  • The kvzalloc() function has a 2-factor argument form, kvcalloc(). This
    patch replaces cases of:

    kvzalloc(a * b, gfp)

    with:
    kvcalloc(a * b, gfp)

    as well as handling cases of:

    kvzalloc(a * b * c, gfp)

    with:

    kvzalloc(array3_size(a, b, c), gfp)

    as it's slightly less ugly than:

    kvcalloc(array_size(a, b), c, gfp)

    This does, however, attempt to ignore constant size factors like:

    kvzalloc(4 * 1024, gfp)

    though any constants defined via macros get caught up in the conversion.

    Any factors with a sizeof() of "unsigned char", "char", and "u8" were
    dropped, since they're redundant.

    The Coccinelle script used for this was:

    // Fix redundant parens around sizeof().
    @@
    type TYPE;
    expression THING, E;
    @@

    (
    kvzalloc(
    - (sizeof(TYPE)) * E
    + sizeof(TYPE) * E
    , ...)
    |
    kvzalloc(
    - (sizeof(THING)) * E
    + sizeof(THING) * E
    , ...)
    )

    // Drop single-byte sizes and redundant parens.
    @@
    expression COUNT;
    typedef u8;
    typedef __u8;
    @@

    (
    kvzalloc(
    - sizeof(u8) * (COUNT)
    + COUNT
    , ...)
    |
    kvzalloc(
    - sizeof(__u8) * (COUNT)
    + COUNT
    , ...)
    |
    kvzalloc(
    - sizeof(char) * (COUNT)
    + COUNT
    , ...)
    |
    kvzalloc(
    - sizeof(unsigned char) * (COUNT)
    + COUNT
    , ...)
    |
    kvzalloc(
    - sizeof(u8) * COUNT
    + COUNT
    , ...)
    |
    kvzalloc(
    - sizeof(__u8) * COUNT
    + COUNT
    , ...)
    |
    kvzalloc(
    - sizeof(char) * COUNT
    + COUNT
    , ...)
    |
    kvzalloc(
    - sizeof(unsigned char) * COUNT
    + COUNT
    , ...)
    )

    // 2-factor product with sizeof(type/expression) and identifier or constant.
    @@
    type TYPE;
    expression THING;
    identifier COUNT_ID;
    constant COUNT_CONST;
    @@

    (
    - kvzalloc
    + kvcalloc
    (
    - sizeof(TYPE) * (COUNT_ID)
    + COUNT_ID, sizeof(TYPE)
    , ...)
    |
    - kvzalloc
    + kvcalloc
    (
    - sizeof(TYPE) * COUNT_ID
    + COUNT_ID, sizeof(TYPE)
    , ...)
    |
    - kvzalloc
    + kvcalloc
    (
    - sizeof(TYPE) * (COUNT_CONST)
    + COUNT_CONST, sizeof(TYPE)
    , ...)
    |
    - kvzalloc
    + kvcalloc
    (
    - sizeof(TYPE) * COUNT_CONST
    + COUNT_CONST, sizeof(TYPE)
    , ...)
    |
    - kvzalloc
    + kvcalloc
    (
    - sizeof(THING) * (COUNT_ID)
    + COUNT_ID, sizeof(THING)
    , ...)
    |
    - kvzalloc
    + kvcalloc
    (
    - sizeof(THING) * COUNT_ID
    + COUNT_ID, sizeof(THING)
    , ...)
    |
    - kvzalloc
    + kvcalloc
    (
    - sizeof(THING) * (COUNT_CONST)
    + COUNT_CONST, sizeof(THING)
    , ...)
    |
    - kvzalloc
    + kvcalloc
    (
    - sizeof(THING) * COUNT_CONST
    + COUNT_CONST, sizeof(THING)
    , ...)
    )

    // 2-factor product, only identifiers.
    @@
    identifier SIZE, COUNT;
    @@

    - kvzalloc
    + kvcalloc
    (
    - SIZE * COUNT
    + COUNT, SIZE
    , ...)

    // 3-factor product with 1 sizeof(type) or sizeof(expression), with
    // redundant parens removed.
    @@
    expression THING;
    identifier STRIDE, COUNT;
    type TYPE;
    @@

    (
    kvzalloc(
    - sizeof(TYPE) * (COUNT) * (STRIDE)
    + array3_size(COUNT, STRIDE, sizeof(TYPE))
    , ...)
    |
    kvzalloc(
    - sizeof(TYPE) * (COUNT) * STRIDE
    + array3_size(COUNT, STRIDE, sizeof(TYPE))
    , ...)
    |
    kvzalloc(
    - sizeof(TYPE) * COUNT * (STRIDE)
    + array3_size(COUNT, STRIDE, sizeof(TYPE))
    , ...)
    |
    kvzalloc(
    - sizeof(TYPE) * COUNT * STRIDE
    + array3_size(COUNT, STRIDE, sizeof(TYPE))
    , ...)
    |
    kvzalloc(
    - sizeof(THING) * (COUNT) * (STRIDE)
    + array3_size(COUNT, STRIDE, sizeof(THING))
    , ...)
    |
    kvzalloc(
    - sizeof(THING) * (COUNT) * STRIDE
    + array3_size(COUNT, STRIDE, sizeof(THING))
    , ...)
    |
    kvzalloc(
    - sizeof(THING) * COUNT * (STRIDE)
    + array3_size(COUNT, STRIDE, sizeof(THING))
    , ...)
    |
    kvzalloc(
    - sizeof(THING) * COUNT * STRIDE
    + array3_size(COUNT, STRIDE, sizeof(THING))
    , ...)
    )

    // 3-factor product with 2 sizeof(variable), with redundant parens removed.
    @@
    expression THING1, THING2;
    identifier COUNT;
    type TYPE1, TYPE2;
    @@

    (
    kvzalloc(
    - sizeof(TYPE1) * sizeof(TYPE2) * COUNT
    + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
    , ...)
    |
    kvzalloc(
    - sizeof(TYPE1) * sizeof(THING2) * (COUNT)
    + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
    , ...)
    |
    kvzalloc(
    - sizeof(THING1) * sizeof(THING2) * COUNT
    + array3_size(COUNT, sizeof(THING1), sizeof(THING2))
    , ...)
    |
    kvzalloc(
    - sizeof(THING1) * sizeof(THING2) * (COUNT)
    + array3_size(COUNT, sizeof(THING1), sizeof(THING2))
    , ...)
    |
    kvzalloc(
    - sizeof(TYPE1) * sizeof(THING2) * COUNT
    + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
    , ...)
    |
    kvzalloc(
    - sizeof(TYPE1) * sizeof(THING2) * (COUNT)
    + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
    , ...)
    )

    // 3-factor product, only identifiers, with redundant parens removed.
    @@
    identifier STRIDE, SIZE, COUNT;
    @@

    (
    kvzalloc(
    - (COUNT) * STRIDE * SIZE
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    kvzalloc(
    - COUNT * (STRIDE) * SIZE
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    kvzalloc(
    - COUNT * STRIDE * (SIZE)
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    kvzalloc(
    - (COUNT) * (STRIDE) * SIZE
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    kvzalloc(
    - COUNT * (STRIDE) * (SIZE)
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    kvzalloc(
    - (COUNT) * STRIDE * (SIZE)
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    kvzalloc(
    - (COUNT) * (STRIDE) * (SIZE)
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    kvzalloc(
    - COUNT * STRIDE * SIZE
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    )

    // Any remaining multi-factor products, first at least 3-factor products,
    // when they're not all constants...
    @@
    expression E1, E2, E3;
    constant C1, C2, C3;
    @@

    (
    kvzalloc(C1 * C2 * C3, ...)
    |
    kvzalloc(
    - (E1) * E2 * E3
    + array3_size(E1, E2, E3)
    , ...)
    |
    kvzalloc(
    - (E1) * (E2) * E3
    + array3_size(E1, E2, E3)
    , ...)
    |
    kvzalloc(
    - (E1) * (E2) * (E3)
    + array3_size(E1, E2, E3)
    , ...)
    |
    kvzalloc(
    - E1 * E2 * E3
    + array3_size(E1, E2, E3)
    , ...)
    )

    // And then all remaining 2 factors products when they're not all constants,
    // keeping sizeof() as the second factor argument.
    @@
    expression THING, E1, E2;
    type TYPE;
    constant C1, C2, C3;
    @@

    (
    kvzalloc(sizeof(THING) * C2, ...)
    |
    kvzalloc(sizeof(TYPE) * C2, ...)
    |
    kvzalloc(C1 * C2 * C3, ...)
    |
    kvzalloc(C1 * C2, ...)
    |
    - kvzalloc
    + kvcalloc
    (
    - sizeof(TYPE) * (E2)
    + E2, sizeof(TYPE)
    , ...)
    |
    - kvzalloc
    + kvcalloc
    (
    - sizeof(TYPE) * E2
    + E2, sizeof(TYPE)
    , ...)
    |
    - kvzalloc
    + kvcalloc
    (
    - sizeof(THING) * (E2)
    + E2, sizeof(THING)
    , ...)
    |
    - kvzalloc
    + kvcalloc
    (
    - sizeof(THING) * E2
    + E2, sizeof(THING)
    , ...)
    |
    - kvzalloc
    + kvcalloc
    (
    - (E1) * E2
    + E1, E2
    , ...)
    |
    - kvzalloc
    + kvcalloc
    (
    - (E1) * (E2)
    + E1, E2
    , ...)
    |
    - kvzalloc
    + kvcalloc
    (
    - E1 * E2
    + E1, E2
    , ...)
    )

    Signed-off-by: Kees Cook

    Kees Cook
     

22 Dec, 2017

5 commits


22 Oct, 2017

1 commit


17 Oct, 2017

1 commit


02 Sep, 2017

1 commit


31 Aug, 2017

1 commit

  • It is very unlikely to happen but the backlogs memory allocation
    could fail and will free q->flows, but then ->destroy() will free
    q->flows too. For correctness remove the first free and let ->destroy
    clean up.

    Fixes: 87b60cfacf9f ("net_sched: fix error recovery at qdisc creation")
    Signed-off-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller

    Nikolay Aleksandrov
     

26 Aug, 2017

1 commit

  • For TC classes, their ->get() and ->put() are always paired, and the
    reference counting is completely useless, because:

    1) For class modification and dumping paths, we already hold RTNL lock,
    so all of these ->get(),->change(),->put() are atomic.

    2) For filter bindiing/unbinding, we use other reference counter than
    this one, and they should have RTNL lock too.

    3) For ->qlen_notify(), it is special because it is called on ->enqueue()
    path, but we already hold qdisc tree lock there, and we hold this
    tree lock when graft or delete the class too, so it should not be gone
    or changed until we release the tree lock.

    Therefore, this patch removes ->get() and ->put(), but:

    1) Adds a new ->find() to find the pointer to a class by classid, no
    refcnt.

    2) Move the original class destroy upon the last refcnt into ->delete(),
    right after releasing tree lock. This is fine because the class is
    already removed from hash when holding the lock.

    For those who also use ->put() as ->unbind(), just rename them to reflect
    this change.

    Cc: Jamal Hadi Salim
    Signed-off-by: Cong Wang
    Acked-by: Jiri Pirko
    Acked-by: Jamal Hadi Salim
    Signed-off-by: David S. Miller

    WANG Cong
     

07 Jun, 2017

1 commit

  • There is need to instruct the HW offloaded path to push certain matched
    packets to cpu/kernel for further analysis. So this patch introduces a
    new TRAP control action to TC.

    For kernel datapath, this action does not make much sense. So with the
    same logic as in HW, new TRAP behaves similar to STOLEN. The skb is just
    dropped in the datapath (and virtually ejected to an upper level, which
    does not exist in case of kernel).

    Signed-off-by: Jiri Pirko
    Reviewed-by: Yotam Gigi
    Reviewed-by: Andrew Lunn
    Signed-off-by: David S. Miller

    Jiri Pirko
     

18 May, 2017

2 commits

  • Currently, the filter chains are direcly put into the private structures
    of qdiscs. In order to be able to have multiple chains per qdisc and to
    allow filter chains sharing among qdiscs, there is a need for common
    object that would hold the chains. This introduces such object and calls
    it "tcf_block".

    Helpers to get and put the blocks are provided to be called from
    individual qdisc code. Also, the original filter_list pointers are left
    in qdisc privs to allow the entry into tcf_block processing without any
    added overhead of possible multiple pointer dereference on fast path.

    Signed-off-by: Jiri Pirko
    Acked-by: Jamal Hadi Salim
    Signed-off-by: David S. Miller

    Jiri Pirko
     
  • Move tc_classify function to cls_api.c where it belongs, rename it to
    fit the namespace.

    Signed-off-by: Jiri Pirko
    Acked-by: Jamal Hadi Salim
    Signed-off-by: David S. Miller

    Jiri Pirko
     

09 May, 2017

1 commit

  • There are many code paths opencoding kvmalloc. Let's use the helper
    instead. The main difference to kvmalloc is that those users are
    usually not considering all the aspects of the memory allocator. E.g.
    allocation requests
    Reviewed-by: Boris Ostrovsky # Xen bits
    Acked-by: Kees Cook
    Acked-by: Vlastimil Babka
    Acked-by: Andreas Dilger # Lustre
    Acked-by: Christian Borntraeger # KVM/s390
    Acked-by: Dan Williams # nvdim
    Acked-by: David Sterba # btrfs
    Acked-by: Ilya Dryomov # Ceph
    Acked-by: Tariq Toukan # mlx4
    Acked-by: Leon Romanovsky # mlx5
    Cc: Martin Schwidefsky
    Cc: Heiko Carstens
    Cc: Herbert Xu
    Cc: Anton Vorontsov
    Cc: Colin Cross
    Cc: Tony Luck
    Cc: "Rafael J. Wysocki"
    Cc: Ben Skeggs
    Cc: Kent Overstreet
    Cc: Santosh Raspatur
    Cc: Hariprasad S
    Cc: Yishai Hadas
    Cc: Oleg Drokin
    Cc: "Yan, Zheng"
    Cc: Alexander Viro
    Cc: Alexei Starovoitov
    Cc: Eric Dumazet
    Cc: David Miller
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
     

14 Apr, 2017

1 commit


17 Mar, 2017

1 commit

  • The code introduced by commit 2ccccf5fb43f ("net_sched: update
    hierarchical backlog too") only sets prev_backlog in fq_codel_dequeue()
    but not using that anywhere, remove that setting.

    Cc: Cong Wang
    Signed-off-by: Or Gerlitz
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Or Gerlitz
     

11 Feb, 2017

1 commit


21 Jan, 2017

1 commit

  • The fq_codel qdisc currently always regenerates the skb flow hash.
    This wastes some cycles and prevents flow seperation in cases where
    the traffic has been encrypted and can no longer be understood by the
    flow dissector.

    Change it to use the prexisting flow hash if one exists, and only
    regenerate if necessary.

    Signed-off-by: Andrew Collins
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Andrew Collins
     

26 Jun, 2016

2 commits

  • Now we defer skb drops, it makes sense to keep a copy
    of skb->truesize in struct codel_skb_cb to avoid one
    cache line miss per dropped skb in fq_codel_drop(),
    to reduce latencies a bit further.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • Qdisc performance suffers when packets are dropped at enqueue()
    time because drops (kfree_skb()) are done while qdisc lock is held,
    delaying a dequeue() draining the queue.

    Nominal throughput can be reduced by 50 % when this happens,
    at a time we would like the dequeue() to proceed as fast as possible.

    Even FQ is vulnerable to this problem, while one of FQ goals was
    to provide some flow isolation.

    This patch adds a 'struct sk_buff **to_free' parameter to all
    qdisc->enqueue(), and in qdisc_drop() helper.

    I measured a performance increase of up to 12 %, but this patch
    is a prereq so that future batches in enqueue() can fly.

    Signed-off-by: Eric Dumazet
    Acked-by: Jesper Dangaard Brouer
    Signed-off-by: David S. Miller

    Eric Dumazet
     

16 Jun, 2016

1 commit


11 Jun, 2016

1 commit

  • Conflicts:
    net/sched/act_police.c
    net/sched/sch_drr.c
    net/sched/sch_hfsc.c
    net/sched/sch_prio.c
    net/sched/sch_red.c
    net/sched/sch_tbf.c

    In net-next the drop methods of the packet schedulers got removed, so
    the bug fixes to them in 'net' are irrelevant.

    A packet action unload crash fix conflicts with the addition of the
    new firstuse timestamp.

    Signed-off-by: David S. Miller

    David S. Miller