21 Jun, 2020

1 commit

  • The user tool modinfo is used to get information on kernel modules, including a
    description where it is available.

    This patch adds a brief MODULE_DESCRIPTION to the following modules:

    9p
    drop_monitor
    esp4_offload
    esp6_offload
    fou
    fou6
    ila
    sch_fq
    sch_fq_codel
    sch_hhf

    Signed-off-by: Rob Gill
    Signed-off-by: David S. Miller

    Rob Gill
     

24 Oct, 2019

1 commit

  • UDP IPv6 packets auto flowlabels are using a 32bit secret
    (static u32 hashrnd in net/core/flow_dissector.c) and
    apply jhash() over fields known by the receivers.

    Attackers can easily infer the 32bit secret and use this information
    to identify a device and/or user, since this 32bit secret is only
    set at boot time.

    Really, using jhash() to generate cookies sent on the wire
    is a serious security concern.

    Trying to change the rol32(hash, 16) in ip6_make_flowlabel() would be
    a dead end. Trying to periodically change the secret (like in sch_sfq.c)
    could change paths taken in the network for long lived flows.

    Let's switch to siphash, as we did in commit df453700e8d8
    ("inet: switch IP ID generator to siphash")

    Using a cryptographically strong pseudo random function will solve this
    privacy issue and more generally remove other weak points in the stack.

    Packet schedulers using skb_get_hash_perturb() benefit from this change.

    Fixes: b56774163f99 ("ipv6: Enable auto flow labels by default")
    Fixes: 42240901f7c4 ("ipv6: Implement different admin modes for automatic flow labels")
    Fixes: 67800f9b1f4e ("ipv6: Call skb_get_hash_flowi6 to get skb->hash in ip6_make_flowlabel")
    Fixes: cb1ce2ef387b ("ipv6: Implement automatic flow label generation on transmit")
    Signed-off-by: Eric Dumazet
    Reported-by: Jonathan Berger
    Reported-by: Amit Klein
    Reported-by: Benny Pinkas
    Cc: Tom Herbert
    Signed-off-by: David S. Miller

    Eric Dumazet
     

11 Sep, 2019

1 commit

  • In case of TCA_HHF_NON_HH_WEIGHT or TCA_HHF_QUANTUM is zero,
    it would make no progress inside the loop in hhf_dequeue() thus
    kernel would get stuck.

    Fix this by checking this corner case in hhf_change().

    Fixes: 10239edf86f1 ("net-qdisc-hhf: Heavy-Hitter Filter (HHF) qdisc")
    Reported-by: syzbot+bc6297c11f19ee807dc2@syzkaller.appspotmail.com
    Reported-by: syzbot+041483004a7f45f1f20a@syzkaller.appspotmail.com
    Reported-by: syzbot+55be5f513bed37fc4367@syzkaller.appspotmail.com
    Cc: Jamal Hadi Salim
    Cc: Jiri Pirko
    Cc: Terry Lam
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller

    Cong Wang
     

21 May, 2019

1 commit

  • Add SPDX license identifiers to all files which:

    - Have no license information of any form

    - Have MODULE_LICENCE("GPL*") inside which was used in the initial
    scan/conversion to ignore the file

    These files fall under the project license, GPL v2 only. The resulting SPDX
    license identifier is:

    GPL-2.0-only

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

28 Apr, 2019

2 commits

  • We currently have two levels of strict validation:

    1) liberal (default)
    - undefined (type >= max) & NLA_UNSPEC attributes accepted
    - attribute length >= expected accepted
    - garbage at end of message accepted
    2) strict (opt-in)
    - NLA_UNSPEC attributes accepted
    - attribute length >= expected accepted

    Split out parsing strictness into four different options:
    * TRAILING - check that there's no trailing data after parsing
    attributes (in message or nested)
    * MAXTYPE - reject attrs > max known type
    * UNSPEC - reject attributes with NLA_UNSPEC policy entries
    * STRICT_ATTRS - strictly validate attribute size

    The default for future things should be *everything*.
    The current *_strict() is a combination of TRAILING and MAXTYPE,
    and is renamed to _deprecated_strict().
    The current regular parsing has none of this, and is renamed to
    *_parse_deprecated().

    Additionally it allows us to selectively set one of the new flags
    even on old policies. Notably, the UNSPEC flag could be useful in
    this case, since it can be arranged (by filling in the policy) to
    not be an incompatible userspace ABI change, but would then going
    forward prevent forgetting attribute entries. Similar can apply
    to the POLICY flag.

    We end up with the following renames:
    * nla_parse -> nla_parse_deprecated
    * nla_parse_strict -> nla_parse_deprecated_strict
    * nlmsg_parse -> nlmsg_parse_deprecated
    * nlmsg_parse_strict -> nlmsg_parse_deprecated_strict
    * nla_parse_nested -> nla_parse_nested_deprecated
    * nla_validate_nested -> nla_validate_nested_deprecated

    Using spatch, of course:
    @@
    expression TB, MAX, HEAD, LEN, POL, EXT;
    @@
    -nla_parse(TB, MAX, HEAD, LEN, POL, EXT)
    +nla_parse_deprecated(TB, MAX, HEAD, LEN, POL, EXT)

    @@
    expression NLH, HDRLEN, TB, MAX, POL, EXT;
    @@
    -nlmsg_parse(NLH, HDRLEN, TB, MAX, POL, EXT)
    +nlmsg_parse_deprecated(NLH, HDRLEN, TB, MAX, POL, EXT)

    @@
    expression NLH, HDRLEN, TB, MAX, POL, EXT;
    @@
    -nlmsg_parse_strict(NLH, HDRLEN, TB, MAX, POL, EXT)
    +nlmsg_parse_deprecated_strict(NLH, HDRLEN, TB, MAX, POL, EXT)

    @@
    expression TB, MAX, NLA, POL, EXT;
    @@
    -nla_parse_nested(TB, MAX, NLA, POL, EXT)
    +nla_parse_nested_deprecated(TB, MAX, NLA, POL, EXT)

    @@
    expression START, MAX, POL, EXT;
    @@
    -nla_validate_nested(START, MAX, POL, EXT)
    +nla_validate_nested_deprecated(START, MAX, POL, EXT)

    @@
    expression NLH, HDRLEN, MAX, POL, EXT;
    @@
    -nlmsg_validate(NLH, HDRLEN, MAX, POL, EXT)
    +nlmsg_validate_deprecated(NLH, HDRLEN, MAX, POL, EXT)

    For this patch, don't actually add the strict, non-renamed versions
    yet so that it breaks compile if I get it wrong.

    Also, while at it, make nla_validate and nla_parse go down to a
    common __nla_validate_parse() function to avoid code duplication.

    Ultimately, this allows us to have very strict validation for every
    new caller of nla_parse()/nlmsg_parse() etc as re-introduced in the
    next patch, while existing things will continue to work as is.

    In effect then, this adds fully strict validation for any new command.

    Signed-off-by: Johannes Berg
    Signed-off-by: David S. Miller

    Johannes Berg
     
  • Even if the NLA_F_NESTED flag was introduced more than 11 years ago, most
    netlink based interfaces (including recently added ones) are still not
    setting it in kernel generated messages. Without the flag, message parsers
    not aware of attribute semantics (e.g. wireshark dissector or libmnl's
    mnl_nlmsg_fprintf()) cannot recognize nested attributes and won't display
    the structure of their contents.

    Unfortunately we cannot just add the flag everywhere as there may be
    userspace applications which check nlattr::nla_type directly rather than
    through a helper masking out the flags. Therefore the patch renames
    nla_nest_start() to nla_nest_start_noflag() and introduces nla_nest_start()
    as a wrapper adding NLA_F_NESTED. The calls which add NLA_F_NESTED manually
    are rewritten to use nla_nest_start().

    Except for changes in include/net/netlink.h, the patch was generated using
    this semantic patch:

    @@ expression E1, E2; @@
    -nla_nest_start(E1, E2)
    +nla_nest_start_noflag(E1, E2)

    @@ expression E1, E2; @@
    -nla_nest_start_noflag(E1, E2 | NLA_F_NESTED)
    +nla_nest_start(E1, E2)

    Signed-off-by: Michal Kubecek
    Acked-by: Jiri Pirko
    Acked-by: David Ahern
    Signed-off-by: David S. Miller

    Michal Kubecek
     

11 Sep, 2018

1 commit


13 Jun, 2018

1 commit

  • The kvzalloc() function has a 2-factor argument form, kvcalloc(). This
    patch replaces cases of:

    kvzalloc(a * b, gfp)

    with:
    kvcalloc(a * b, gfp)

    as well as handling cases of:

    kvzalloc(a * b * c, gfp)

    with:

    kvzalloc(array3_size(a, b, c), gfp)

    as it's slightly less ugly than:

    kvcalloc(array_size(a, b), c, gfp)

    This does, however, attempt to ignore constant size factors like:

    kvzalloc(4 * 1024, gfp)

    though any constants defined via macros get caught up in the conversion.

    Any factors with a sizeof() of "unsigned char", "char", and "u8" were
    dropped, since they're redundant.

    The Coccinelle script used for this was:

    // Fix redundant parens around sizeof().
    @@
    type TYPE;
    expression THING, E;
    @@

    (
    kvzalloc(
    - (sizeof(TYPE)) * E
    + sizeof(TYPE) * E
    , ...)
    |
    kvzalloc(
    - (sizeof(THING)) * E
    + sizeof(THING) * E
    , ...)
    )

    // Drop single-byte sizes and redundant parens.
    @@
    expression COUNT;
    typedef u8;
    typedef __u8;
    @@

    (
    kvzalloc(
    - sizeof(u8) * (COUNT)
    + COUNT
    , ...)
    |
    kvzalloc(
    - sizeof(__u8) * (COUNT)
    + COUNT
    , ...)
    |
    kvzalloc(
    - sizeof(char) * (COUNT)
    + COUNT
    , ...)
    |
    kvzalloc(
    - sizeof(unsigned char) * (COUNT)
    + COUNT
    , ...)
    |
    kvzalloc(
    - sizeof(u8) * COUNT
    + COUNT
    , ...)
    |
    kvzalloc(
    - sizeof(__u8) * COUNT
    + COUNT
    , ...)
    |
    kvzalloc(
    - sizeof(char) * COUNT
    + COUNT
    , ...)
    |
    kvzalloc(
    - sizeof(unsigned char) * COUNT
    + COUNT
    , ...)
    )

    // 2-factor product with sizeof(type/expression) and identifier or constant.
    @@
    type TYPE;
    expression THING;
    identifier COUNT_ID;
    constant COUNT_CONST;
    @@

    (
    - kvzalloc
    + kvcalloc
    (
    - sizeof(TYPE) * (COUNT_ID)
    + COUNT_ID, sizeof(TYPE)
    , ...)
    |
    - kvzalloc
    + kvcalloc
    (
    - sizeof(TYPE) * COUNT_ID
    + COUNT_ID, sizeof(TYPE)
    , ...)
    |
    - kvzalloc
    + kvcalloc
    (
    - sizeof(TYPE) * (COUNT_CONST)
    + COUNT_CONST, sizeof(TYPE)
    , ...)
    |
    - kvzalloc
    + kvcalloc
    (
    - sizeof(TYPE) * COUNT_CONST
    + COUNT_CONST, sizeof(TYPE)
    , ...)
    |
    - kvzalloc
    + kvcalloc
    (
    - sizeof(THING) * (COUNT_ID)
    + COUNT_ID, sizeof(THING)
    , ...)
    |
    - kvzalloc
    + kvcalloc
    (
    - sizeof(THING) * COUNT_ID
    + COUNT_ID, sizeof(THING)
    , ...)
    |
    - kvzalloc
    + kvcalloc
    (
    - sizeof(THING) * (COUNT_CONST)
    + COUNT_CONST, sizeof(THING)
    , ...)
    |
    - kvzalloc
    + kvcalloc
    (
    - sizeof(THING) * COUNT_CONST
    + COUNT_CONST, sizeof(THING)
    , ...)
    )

    // 2-factor product, only identifiers.
    @@
    identifier SIZE, COUNT;
    @@

    - kvzalloc
    + kvcalloc
    (
    - SIZE * COUNT
    + COUNT, SIZE
    , ...)

    // 3-factor product with 1 sizeof(type) or sizeof(expression), with
    // redundant parens removed.
    @@
    expression THING;
    identifier STRIDE, COUNT;
    type TYPE;
    @@

    (
    kvzalloc(
    - sizeof(TYPE) * (COUNT) * (STRIDE)
    + array3_size(COUNT, STRIDE, sizeof(TYPE))
    , ...)
    |
    kvzalloc(
    - sizeof(TYPE) * (COUNT) * STRIDE
    + array3_size(COUNT, STRIDE, sizeof(TYPE))
    , ...)
    |
    kvzalloc(
    - sizeof(TYPE) * COUNT * (STRIDE)
    + array3_size(COUNT, STRIDE, sizeof(TYPE))
    , ...)
    |
    kvzalloc(
    - sizeof(TYPE) * COUNT * STRIDE
    + array3_size(COUNT, STRIDE, sizeof(TYPE))
    , ...)
    |
    kvzalloc(
    - sizeof(THING) * (COUNT) * (STRIDE)
    + array3_size(COUNT, STRIDE, sizeof(THING))
    , ...)
    |
    kvzalloc(
    - sizeof(THING) * (COUNT) * STRIDE
    + array3_size(COUNT, STRIDE, sizeof(THING))
    , ...)
    |
    kvzalloc(
    - sizeof(THING) * COUNT * (STRIDE)
    + array3_size(COUNT, STRIDE, sizeof(THING))
    , ...)
    |
    kvzalloc(
    - sizeof(THING) * COUNT * STRIDE
    + array3_size(COUNT, STRIDE, sizeof(THING))
    , ...)
    )

    // 3-factor product with 2 sizeof(variable), with redundant parens removed.
    @@
    expression THING1, THING2;
    identifier COUNT;
    type TYPE1, TYPE2;
    @@

    (
    kvzalloc(
    - sizeof(TYPE1) * sizeof(TYPE2) * COUNT
    + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
    , ...)
    |
    kvzalloc(
    - sizeof(TYPE1) * sizeof(THING2) * (COUNT)
    + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
    , ...)
    |
    kvzalloc(
    - sizeof(THING1) * sizeof(THING2) * COUNT
    + array3_size(COUNT, sizeof(THING1), sizeof(THING2))
    , ...)
    |
    kvzalloc(
    - sizeof(THING1) * sizeof(THING2) * (COUNT)
    + array3_size(COUNT, sizeof(THING1), sizeof(THING2))
    , ...)
    |
    kvzalloc(
    - sizeof(TYPE1) * sizeof(THING2) * COUNT
    + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
    , ...)
    |
    kvzalloc(
    - sizeof(TYPE1) * sizeof(THING2) * (COUNT)
    + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
    , ...)
    )

    // 3-factor product, only identifiers, with redundant parens removed.
    @@
    identifier STRIDE, SIZE, COUNT;
    @@

    (
    kvzalloc(
    - (COUNT) * STRIDE * SIZE
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    kvzalloc(
    - COUNT * (STRIDE) * SIZE
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    kvzalloc(
    - COUNT * STRIDE * (SIZE)
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    kvzalloc(
    - (COUNT) * (STRIDE) * SIZE
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    kvzalloc(
    - COUNT * (STRIDE) * (SIZE)
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    kvzalloc(
    - (COUNT) * STRIDE * (SIZE)
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    kvzalloc(
    - (COUNT) * (STRIDE) * (SIZE)
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    kvzalloc(
    - COUNT * STRIDE * SIZE
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    )

    // Any remaining multi-factor products, first at least 3-factor products,
    // when they're not all constants...
    @@
    expression E1, E2, E3;
    constant C1, C2, C3;
    @@

    (
    kvzalloc(C1 * C2 * C3, ...)
    |
    kvzalloc(
    - (E1) * E2 * E3
    + array3_size(E1, E2, E3)
    , ...)
    |
    kvzalloc(
    - (E1) * (E2) * E3
    + array3_size(E1, E2, E3)
    , ...)
    |
    kvzalloc(
    - (E1) * (E2) * (E3)
    + array3_size(E1, E2, E3)
    , ...)
    |
    kvzalloc(
    - E1 * E2 * E3
    + array3_size(E1, E2, E3)
    , ...)
    )

    // And then all remaining 2 factors products when they're not all constants,
    // keeping sizeof() as the second factor argument.
    @@
    expression THING, E1, E2;
    type TYPE;
    constant C1, C2, C3;
    @@

    (
    kvzalloc(sizeof(THING) * C2, ...)
    |
    kvzalloc(sizeof(TYPE) * C2, ...)
    |
    kvzalloc(C1 * C2 * C3, ...)
    |
    kvzalloc(C1 * C2, ...)
    |
    - kvzalloc
    + kvcalloc
    (
    - sizeof(TYPE) * (E2)
    + E2, sizeof(TYPE)
    , ...)
    |
    - kvzalloc
    + kvcalloc
    (
    - sizeof(TYPE) * E2
    + E2, sizeof(TYPE)
    , ...)
    |
    - kvzalloc
    + kvcalloc
    (
    - sizeof(THING) * (E2)
    + E2, sizeof(THING)
    , ...)
    |
    - kvzalloc
    + kvcalloc
    (
    - sizeof(THING) * E2
    + E2, sizeof(THING)
    , ...)
    |
    - kvzalloc
    + kvcalloc
    (
    - (E1) * E2
    + E1, E2
    , ...)
    |
    - kvzalloc
    + kvcalloc
    (
    - (E1) * (E2)
    + E1, E2
    , ...)
    |
    - kvzalloc
    + kvcalloc
    (
    - E1 * E2
    + E1, E2
    , ...)
    )

    Signed-off-by: Kees Cook

    Kees Cook
     

22 Dec, 2017

2 commits


31 Aug, 2017

1 commit

  • If sch_hhf fails in its ->init() function (either due to wrong
    user-space arguments as below or memory alloc failure of hh_flows) it
    will do a null pointer deref of q->hh_flows in its ->destroy() function.

    To reproduce the crash:
    $ tc qdisc add dev eth0 root hhf quantum 2000000 non_hh_weight 10000000

    Crash log:
    [ 690.654882] BUG: unable to handle kernel NULL pointer dereference at (null)
    [ 690.655565] IP: hhf_destroy+0x48/0xbc
    [ 690.655944] PGD 37345067
    [ 690.655948] P4D 37345067
    [ 690.656252] PUD 58402067
    [ 690.656554] PMD 0
    [ 690.656857]
    [ 690.657362] Oops: 0000 [#1] SMP
    [ 690.657696] Modules linked in:
    [ 690.658032] CPU: 3 PID: 920 Comm: tc Not tainted 4.13.0-rc6+ #57
    [ 690.658525] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
    [ 690.659255] task: ffff880058578000 task.stack: ffff88005acbc000
    [ 690.659747] RIP: 0010:hhf_destroy+0x48/0xbc
    [ 690.660146] RSP: 0018:ffff88005acbf9e0 EFLAGS: 00010246
    [ 690.660601] RAX: 0000000000000000 RBX: 0000000000000020 RCX: 0000000000000000
    [ 690.661155] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffffffff821f63f0
    [ 690.661710] RBP: ffff88005acbfa08 R08: ffffffff81b10a90 R09: 0000000000000000
    [ 690.662267] R10: 00000000f42b7019 R11: ffff880058578000 R12: 00000000ffffffea
    [ 690.662820] R13: ffff8800372f6400 R14: 0000000000000000 R15: 0000000000000000
    [ 690.663769] FS: 00007f8ae5e8b740(0000) GS:ffff88005d980000(0000) knlGS:0000000000000000
    [ 690.667069] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [ 690.667965] CR2: 0000000000000000 CR3: 0000000058523000 CR4: 00000000000406e0
    [ 690.668918] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    [ 690.669945] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    [ 690.671003] Call Trace:
    [ 690.671743] qdisc_create+0x377/0x3fd
    [ 690.672534] tc_modify_qdisc+0x4d2/0x4fd
    [ 690.673324] rtnetlink_rcv_msg+0x188/0x197
    [ 690.674204] ? rcu_read_unlock+0x3e/0x5f
    [ 690.675091] ? rtnl_newlink+0x729/0x729
    [ 690.675877] netlink_rcv_skb+0x6c/0xce
    [ 690.676648] rtnetlink_rcv+0x23/0x2a
    [ 690.677405] netlink_unicast+0x103/0x181
    [ 690.678179] netlink_sendmsg+0x326/0x337
    [ 690.678958] sock_sendmsg_nosec+0x14/0x3f
    [ 690.679743] sock_sendmsg+0x29/0x2e
    [ 690.680506] ___sys_sendmsg+0x209/0x28b
    [ 690.681283] ? __handle_mm_fault+0xc7d/0xdb1
    [ 690.681915] ? check_chain_key+0xb0/0xfd
    [ 690.682449] __sys_sendmsg+0x45/0x63
    [ 690.682954] ? __sys_sendmsg+0x45/0x63
    [ 690.683471] SyS_sendmsg+0x19/0x1b
    [ 690.683974] entry_SYSCALL_64_fastpath+0x23/0xc2
    [ 690.684516] RIP: 0033:0x7f8ae529d690
    [ 690.685016] RSP: 002b:00007fff26d2d6b8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
    [ 690.685931] RAX: ffffffffffffffda RBX: ffffffff810d278c RCX: 00007f8ae529d690
    [ 690.686573] RDX: 0000000000000000 RSI: 00007fff26d2d700 RDI: 0000000000000003
    [ 690.687047] RBP: ffff88005acbff98 R08: 0000000000000001 R09: 0000000000000000
    [ 690.687519] R10: 00007fff26d2d480 R11: 0000000000000246 R12: 0000000000000002
    [ 690.687996] R13: 0000000001258070 R14: 0000000000000001 R15: 0000000000000000
    [ 690.688475] ? trace_hardirqs_off_caller+0xa7/0xcf
    [ 690.688887] Code: 00 00 e8 2a 02 ae ff 49 8b bc 1d 60 02 00 00 48 83
    c3 08 e8 19 02 ae ff 48 83 fb 20 75 dc 45 31 f6 4d 89 f7 4d 03 bd 20 02
    00 00 8b 07 49 39 c7 75 24 49 83 c6 10 49 81 fe 00 40 00 00 75 e1
    [ 690.690200] RIP: hhf_destroy+0x48/0xbc RSP: ffff88005acbf9e0
    [ 690.690636] CR2: 0000000000000000

    Fixes: 87b60cfacf9f ("net_sched: fix error recovery at qdisc creation")
    Fixes: 10239edf86f1 ("net-qdisc-hhf: Heavy-Hitter Filter (HHF) qdisc")
    Signed-off-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller

    Nikolay Aleksandrov
     

09 May, 2017

1 commit

  • There are many code paths opencoding kvmalloc. Let's use the helper
    instead. The main difference to kvmalloc is that those users are
    usually not considering all the aspects of the memory allocator. E.g.
    allocation requests
    Reviewed-by: Boris Ostrovsky # Xen bits
    Acked-by: Kees Cook
    Acked-by: Vlastimil Babka
    Acked-by: Andreas Dilger # Lustre
    Acked-by: Christian Borntraeger # KVM/s390
    Acked-by: Dan Williams # nvdim
    Acked-by: David Sterba # btrfs
    Acked-by: Ilya Dryomov # Ceph
    Acked-by: Tariq Toukan # mlx4
    Acked-by: Leon Romanovsky # mlx5
    Cc: Martin Schwidefsky
    Cc: Heiko Carstens
    Cc: Herbert Xu
    Cc: Anton Vorontsov
    Cc: Colin Cross
    Cc: Tony Luck
    Cc: "Rafael J. Wysocki"
    Cc: Ben Skeggs
    Cc: Kent Overstreet
    Cc: Santosh Raspatur
    Cc: Hariprasad S
    Cc: Yishai Hadas
    Cc: Oleg Drokin
    Cc: "Yan, Zheng"
    Cc: Alexander Viro
    Cc: Alexei Starovoitov
    Cc: Eric Dumazet
    Cc: David Miller
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
     

14 Apr, 2017

1 commit


12 Feb, 2017

1 commit

  • Dmitry reported uses after free in qdisc code [1]

    The problem here is that ops->init() can return an error.

    qdisc_create_dflt() then call ops->destroy(),
    while qdisc_create() does _not_ call it.

    Four qdisc chose to call their own ops->destroy(), assuming their caller
    would not.

    This patch makes sure qdisc_create() calls ops->destroy()
    and fixes the four qdisc to avoid double free.

    [1]
    BUG: KASAN: use-after-free in mq_destroy+0x242/0x290 net/sched/sch_mq.c:33 at addr ffff8801d415d440
    Read of size 8 by task syz-executor2/5030
    CPU: 0 PID: 5030 Comm: syz-executor2 Not tainted 4.3.5-smp-DEV #119
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    0000000000000046 ffff8801b435b870 ffffffff81bbbed4 ffff8801db000400
    ffff8801d415d440 ffff8801d415dc40 ffff8801c4988510 ffff8801b435b898
    ffffffff816682b1 ffff8801b435b928 ffff8801d415d440 ffff8801c49880c0
    Call Trace:
    [] __dump_stack lib/dump_stack.c:15 [inline]
    [] dump_stack+0x6c/0x98 lib/dump_stack.c:51
    [] kasan_object_err+0x21/0x70 mm/kasan/report.c:158
    [] print_address_description mm/kasan/report.c:196 [inline]
    [] kasan_report_error+0x1b4/0x4b0 mm/kasan/report.c:285
    [] kasan_report mm/kasan/report.c:305 [inline]
    [] __asan_report_load8_noabort+0x43/0x50 mm/kasan/report.c:326
    [] mq_destroy+0x242/0x290 net/sched/sch_mq.c:33
    [] qdisc_destroy+0x12d/0x290 net/sched/sch_generic.c:953
    [] qdisc_create_dflt+0xf0/0x120 net/sched/sch_generic.c:848
    [] attach_default_qdiscs net/sched/sch_generic.c:1029 [inline]
    [] dev_activate+0x6ad/0x880 net/sched/sch_generic.c:1064
    [] __dev_open+0x221/0x320 net/core/dev.c:1403
    [] __dev_change_flags+0x15e/0x3e0 net/core/dev.c:6858
    [] dev_change_flags+0x8e/0x140 net/core/dev.c:6926
    [] dev_ifsioc+0x446/0x890 net/core/dev_ioctl.c:260
    [] dev_ioctl+0x1ba/0xb80 net/core/dev_ioctl.c:546
    [] sock_do_ioctl+0x99/0xb0 net/socket.c:879
    [] sock_ioctl+0x2a0/0x390 net/socket.c:958
    [] vfs_ioctl fs/ioctl.c:44 [inline]
    [] do_vfs_ioctl+0x8a8/0xe50 fs/ioctl.c:611
    [] SYSC_ioctl fs/ioctl.c:626 [inline]
    [] SyS_ioctl+0x94/0xc0 fs/ioctl.c:617
    [] entry_SYSCALL_64_fastpath+0x12/0x17

    Signed-off-by: Eric Dumazet
    Reported-by: Dmitry Vyukov
    Signed-off-by: David S. Miller

    Eric Dumazet
     

26 Jun, 2016

1 commit

  • Qdisc performance suffers when packets are dropped at enqueue()
    time because drops (kfree_skb()) are done while qdisc lock is held,
    delaying a dequeue() draining the queue.

    Nominal throughput can be reduced by 50 % when this happens,
    at a time we would like the dequeue() to proceed as fast as possible.

    Even FQ is vulnerable to this problem, while one of FQ goals was
    to provide some flow isolation.

    This patch adds a 'struct sk_buff **to_free' parameter to all
    qdisc->enqueue(), and in qdisc_drop() helper.

    I measured a performance increase of up to 12 %, but this patch
    is a prereq so that future batches in enqueue() can fly.

    Signed-off-by: Eric Dumazet
    Acked-by: Jesper Dangaard Brouer
    Signed-off-by: David S. Miller

    Eric Dumazet
     

16 Jun, 2016

1 commit


09 Jun, 2016

1 commit

  • after removal of TCA_CBQ_OVL_STRATEGY from cbq scheduler, there are no
    more callers of ->drop() outside of other ->drop functions, i.e.
    nothing calls them.

    Signed-off-by: Florian Westphal
    Signed-off-by: David S. Miller

    Florian Westphal
     

01 Mar, 2016

1 commit

  • When the bottom qdisc decides to, for example, drop some packet,
    it calls qdisc_tree_decrease_qlen() to update the queue length
    for all its ancestors, we need to update the backlog too to
    keep the stats on root qdisc accurate.

    Cc: Jamal Hadi Salim
    Acked-by: Jamal Hadi Salim
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller

    WANG Cong
     

11 Oct, 2015

1 commit

  • Similar to commit c0afd9ce4d6a ("fq_codel: fix return value of fq_codel_drop()")
    ->drop() is supposed to return the number of bytes it dropped,
    but hhf_drop () returns the id of the bucket where it drops
    a packet from.

    Cc: Jamal Hadi Salim
    Cc: Terry Lam
    Signed-off-by: Cong Wang
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller

    WANG Cong
     

04 May, 2015

1 commit


30 Sep, 2014

1 commit


05 Jun, 2014

1 commit


13 May, 2014

1 commit


05 May, 2014

1 commit

  • hhf_change() takes the sch_tree_lock and releases it but misses the
    error cases. Fix the missed case here.

    To reproduce try a command like this,

    # tc qdisc change dev p3p2 root hhf quantum 40960 non_hh_weight 300000

    Signed-off-by: John Fastabend
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    John Fastabend
     

14 Mar, 2014

1 commit


15 Jan, 2014

1 commit


14 Jan, 2014

1 commit


27 Dec, 2013

1 commit


20 Dec, 2013

1 commit

  • This patch implements the first size-based qdisc that attempts to
    differentiate between small flows and heavy-hitters. The goal is to
    catch the heavy-hitters and move them to a separate queue with less
    priority so that bulk traffic does not affect the latency of critical
    traffic. Currently "less priority" means less weight (2:1 in
    particular) in a Weighted Deficit Round Robin (WDRR) scheduler.

    In essence, this patch addresses the "delay-bloat" problem due to
    bloated buffers. In some systems, large queues may be necessary for
    obtaining CPU efficiency, or due to the presence of unresponsive
    traffic like UDP, or just a large number of connections with each
    having a small amount of outstanding traffic. In these circumstances,
    HHF aims to reduce the HoL blocking for latency sensitive traffic,
    while not impacting the queues built up by bulk traffic. HHF can also
    be used in conjunction with other AQM mechanisms such as CoDel.

    To capture heavy-hitters, we implement the "multi-stage filter" design
    in the following paper:
    C. Estan and G. Varghese, "New Directions in Traffic Measurement and
    Accounting", in ACM SIGCOMM, 2002.

    Some configurable qdisc settings through 'tc':
    - hhf_reset_timeout: period to reset counter values in the multi-stage
    filter (default 40ms)
    - hhf_admit_bytes: threshold to classify heavy-hitters
    (default 128KB)
    - hhf_evict_timeout: threshold to evict idle heavy-hitters
    (default 1s)
    - hhf_non_hh_weight: Weighted Deficit Round Robin (WDRR) weight for
    non-heavy-hitters (default 2)
    - hh_flows_limit: max number of heavy-hitter flow entries
    (default 2048)

    Note that the ratio between hhf_admit_bytes and hhf_reset_timeout
    reflects the bandwidth of heavy-hitters that we attempt to capture
    (25Mbps with the above default settings).

    The false negative rate (heavy-hitter flows getting away unclassified)
    is zero by the design of the multi-stage filter algorithm.
    With 100 heavy-hitter flows, using four hashes and 4000 counters yields
    a false positive rate (non-heavy-hitters mistakenly classified as
    heavy-hitters) of less than 1e-4.

    Signed-off-by: Terry Lam
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Terry Lam