06 Oct, 2020

1 commit


03 Oct, 2020

3 commits

  • Add policy to the struct genl_ops structure, this time
    with maxattr, so it can be used properly.

    Propagate .policy and .maxattr from the family
    in genl_get_cmd() if needed, this way the rest of the
    code does not have to worry if the policy is per op
    or global.

    Signed-off-by: Jakub Kicinski
    Reviewed-by: Johannes Berg
    Signed-off-by: David S. Miller

    Jakub Kicinski
     
  • We want to add maxattr and policy back to genl_ops, to enable
    dumping per command policy to user space. This, however, would
    cause bloat for all the families with global policies. Introduce
    smaller version of ops (half the size of genl_ops). Translate
    these smaller ops into a full blown struct before use in the
    core.

    v1:
    - use struct assignment
    - put a full copy of the op in struct genl_dumpit_info
    - s/light/small/

    Signed-off-by: Jakub Kicinski
    Reviewed-by: Johannes Berg
    Signed-off-by: David S. Miller

    Jakub Kicinski
     
  • There are holes and oversized members in struct genl_family.

    Before: /* size: 104, cachelines: 2, members: 16 */
    After: /* size: 88, cachelines: 2, members: 16 */

    The command field in struct genlmsghdr is a u8, so no point
    in the operation count being 32 bit. Also operation 0 is
    usually undefined, so we only need 255 entries.

    netnsok and parallel_ops are only ever initialized to true.

    We can grow the fields as needed, compiler should warn us
    if someone tries to assign larger constants.

    Signed-off-by: Jakub Kicinski
    Reviewed-by: Johannes Berg
    Signed-off-by: David S. Miller

    Jakub Kicinski
     

29 Sep, 2020

1 commit


18 Sep, 2020

1 commit


02 Jul, 2020

1 commit

  • A potential deadlock can occur during registering or unregistering a
    new generic netlink family between the main nl_table_lock and the
    cb_lock where each thread wants the lock held by the other, as
    demonstrated below.

    1) Thread 1 is performing a netlink_bind() operation on a socket. As part
    of this call, it will call netlink_lock_table(), incrementing the
    nl_table_users count to 1.
    2) Thread 2 is registering (or unregistering) a genl_family via the
    genl_(un)register_family() API. The cb_lock semaphore will be taken for
    writing.
    3) Thread 1 will call genl_bind() as part of the bind operation to handle
    subscribing to GENL multicast groups at the request of the user. It will
    attempt to take the cb_lock semaphore for reading, but it will fail and
    be scheduled away, waiting for Thread 2 to finish the write.
    4) Thread 2 will call netlink_table_grab() during the (un)registration
    call. However, as Thread 1 has incremented nl_table_users, it will not
    be able to proceed, and both threads will be stuck waiting for the
    other.

    genl_bind() is a noop, unless a genl_family implements the mcast_bind()
    function to handle setting up family-specific multicast operations. Since
    no one in-tree uses this functionality as Cong pointed out, simply removing
    the genl_bind() function will remove the possibility for deadlock, as there
    is no attempt by Thread 1 above to take the cb_lock semaphore.

    Fixes: c380d9a7afff ("genetlink: pass multicast bind/unbind to families")
    Suggested-by: Cong Wang
    Acked-by: Johannes Berg
    Reported-by: kernel test robot
    Signed-off-by: Sean Tranchetti
    Signed-off-by: David S. Miller

    Sean Tranchetti
     

30 Jun, 2020

1 commit

  • genl_family_rcv_msg_attrs_parse() reuses the global family->attrbuf
    when family->parallel_ops is false. However, family->attrbuf is not
    protected by any lock on the genl_family_rcv_msg_doit() code path.

    This leads to several different consequences, one of them is UAF,
    like the following:

    genl_family_rcv_msg_doit(): genl_start():
    genl_family_rcv_msg_attrs_parse()
    attrbuf = family->attrbuf
    __nlmsg_parse(attrbuf);
    genl_family_rcv_msg_attrs_parse()
    attrbuf = family->attrbuf
    __nlmsg_parse(attrbuf);
    info->attrs = attrs;
    cb->data = info;

    netlink_unicast_kernel():
    consume_skb()
    genl_lock_dumpit():
    genl_dumpit_info(cb)->attrs

    Note family->attrbuf is an array of pointers to the skb data, once
    the skb is freed, any dereference of family->attrbuf will be a UAF.

    Maybe we could serialize the family->attrbuf with genl_mutex too, but
    that would make the locking more complicated. Instead, we can just get
    rid of family->attrbuf and always allocate attrbuf from heap like the
    family->parallel_ops==true code path. This may add some performance
    overhead but comparing with taking the global genl_mutex, it still
    looks better.

    Fixes: 75cdbdd08900 ("net: ieee802154: have genetlink code to parse the attrs during dumpit")
    Fixes: 057af7071344 ("net: tipc: have genetlink code to parse the attrs during dumpit")
    Reported-and-tested-by: syzbot+3039ddf6d7b13daf3787@syzkaller.appspotmail.com
    Reported-and-tested-by: syzbot+80cad1e3cb4c41cde6ff@syzkaller.appspotmail.com
    Reported-and-tested-by: syzbot+736bcbcb11b60d0c0792@syzkaller.appspotmail.com
    Reported-and-tested-by: syzbot+520f8704db2b68091d44@syzkaller.appspotmail.com
    Reported-and-tested-by: syzbot+c96e4dfb32f8987fdeed@syzkaller.appspotmail.com
    Cc: Jiri Pirko
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller

    Cong Wang
     

06 Oct, 2019

3 commits


28 Apr, 2019

3 commits

  • Add options to strictly validate messages and dump messages,
    sometimes perhaps validating dump messages non-strictly may
    be required, so add an option for that as well.

    Since none of this can really be applied to existing commands,
    set the options everwhere using the following spatch:

    @@
    identifier ops;
    expression X;
    @@
    struct genl_ops ops[] = {
    ...,
    {
    .cmd = X,
    + .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP,
    ...
    },
    ...
    };

    For new commands one should just not copy the .validate 'opt-out'
    flags and thus get strict validation.

    Signed-off-by: Johannes Berg
    Signed-off-by: David S. Miller

    Johannes Berg
     
  • This re-adds the parse and validate functions like nla_parse()
    that are now actually strict after the previous rename and were
    just split out to make sure everything is converted (and if not
    compilation of the previous patch would fail.)

    Signed-off-by: Johannes Berg
    Signed-off-by: David S. Miller

    Johannes Berg
     
  • We currently have two levels of strict validation:

    1) liberal (default)
    - undefined (type >= max) & NLA_UNSPEC attributes accepted
    - attribute length >= expected accepted
    - garbage at end of message accepted
    2) strict (opt-in)
    - NLA_UNSPEC attributes accepted
    - attribute length >= expected accepted

    Split out parsing strictness into four different options:
    * TRAILING - check that there's no trailing data after parsing
    attributes (in message or nested)
    * MAXTYPE - reject attrs > max known type
    * UNSPEC - reject attributes with NLA_UNSPEC policy entries
    * STRICT_ATTRS - strictly validate attribute size

    The default for future things should be *everything*.
    The current *_strict() is a combination of TRAILING and MAXTYPE,
    and is renamed to _deprecated_strict().
    The current regular parsing has none of this, and is renamed to
    *_parse_deprecated().

    Additionally it allows us to selectively set one of the new flags
    even on old policies. Notably, the UNSPEC flag could be useful in
    this case, since it can be arranged (by filling in the policy) to
    not be an incompatible userspace ABI change, but would then going
    forward prevent forgetting attribute entries. Similar can apply
    to the POLICY flag.

    We end up with the following renames:
    * nla_parse -> nla_parse_deprecated
    * nla_parse_strict -> nla_parse_deprecated_strict
    * nlmsg_parse -> nlmsg_parse_deprecated
    * nlmsg_parse_strict -> nlmsg_parse_deprecated_strict
    * nla_parse_nested -> nla_parse_nested_deprecated
    * nla_validate_nested -> nla_validate_nested_deprecated

    Using spatch, of course:
    @@
    expression TB, MAX, HEAD, LEN, POL, EXT;
    @@
    -nla_parse(TB, MAX, HEAD, LEN, POL, EXT)
    +nla_parse_deprecated(TB, MAX, HEAD, LEN, POL, EXT)

    @@
    expression NLH, HDRLEN, TB, MAX, POL, EXT;
    @@
    -nlmsg_parse(NLH, HDRLEN, TB, MAX, POL, EXT)
    +nlmsg_parse_deprecated(NLH, HDRLEN, TB, MAX, POL, EXT)

    @@
    expression NLH, HDRLEN, TB, MAX, POL, EXT;
    @@
    -nlmsg_parse_strict(NLH, HDRLEN, TB, MAX, POL, EXT)
    +nlmsg_parse_deprecated_strict(NLH, HDRLEN, TB, MAX, POL, EXT)

    @@
    expression TB, MAX, NLA, POL, EXT;
    @@
    -nla_parse_nested(TB, MAX, NLA, POL, EXT)
    +nla_parse_nested_deprecated(TB, MAX, NLA, POL, EXT)

    @@
    expression START, MAX, POL, EXT;
    @@
    -nla_validate_nested(START, MAX, POL, EXT)
    +nla_validate_nested_deprecated(START, MAX, POL, EXT)

    @@
    expression NLH, HDRLEN, MAX, POL, EXT;
    @@
    -nlmsg_validate(NLH, HDRLEN, MAX, POL, EXT)
    +nlmsg_validate_deprecated(NLH, HDRLEN, MAX, POL, EXT)

    For this patch, don't actually add the strict, non-renamed versions
    yet so that it breaks compile if I get it wrong.

    Also, while at it, make nla_validate and nla_parse go down to a
    common __nla_validate_parse() function to avoid code duplication.

    Ultimately, this allows us to have very strict validation for every
    new caller of nla_parse()/nlmsg_parse() etc as re-introduced in the
    next patch, while existing things will continue to work as is.

    In effect then, this adds fully strict validation for any new command.

    Signed-off-by: Johannes Berg
    Signed-off-by: David S. Miller

    Johannes Berg
     

22 Mar, 2019

1 commit

  • Since maxattr is common, the policy can't really differ sanely,
    so make it common as well.

    The only user that did in fact manage to make a non-common policy
    is taskstats, which has to be really careful about it (since it's
    still using a common maxattr!). This is no longer supported, but
    we can fake it using pre_doit.

    This reduces the size of e.g. nl80211.o (which has lots of commands):

    text data bss dec hex filename
    398745 14323 2240 415308 6564c net/wireless/nl80211.o (before)
    397913 14331 2240 414484 65314 net/wireless/nl80211.o (after)
    --------------------------------
    -832 +8 0 -824

    Which is obviously just 8 bytes for each command, and an added 8
    bytes for the new policy pointer. I'm not sure why the ops list is
    counted as .text though.

    Most of the code transformations were done using the following spatch:
    @ops@
    identifier OPS;
    expression POLICY;
    @@
    struct genl_ops OPS[] = {
    ...,
    {
    - .policy = POLICY,
    },
    ...
    };

    @@
    identifier ops.OPS;
    expression ops.POLICY;
    identifier fam;
    expression M;
    @@
    struct genl_family fam = {
    .ops = OPS,
    .maxattr = M,
    + .policy = POLICY,
    ...
    };

    This also gets rid of devlink_nl_cmd_region_read_dumpit() accessing
    the cb->data as ops, which we want to change in a later genl patch.

    Signed-off-by: Johannes Berg
    Signed-off-by: David S. Miller

    Johannes Berg
     

30 Aug, 2018

1 commit


16 Nov, 2017

1 commit

  • According to the description, first argument of genlmsg_nlhdr() points to
    what genlmsg_put() returns, i.e. beginning of user header. Therefore we
    should only subtract size of genetlink header and netlink message header,
    not user header.

    This also means we don't need to pass the pointer to genetlink family and
    the same is true for genl_dump_check_consistent() which is the only caller
    of genlmsg_nlhdr(). (Note that at the moment, these functions are only
    used for families which do not have user header so that they are not
    affected.)

    Fixes: 670dc2833d14 ("netlink: advertise incomplete dumps")
    Signed-off-by: Michal Kubecek
    Reviewed-by: Johannes Berg
    Signed-off-by: David S. Miller

    Michal Kubecek
     

02 Nov, 2017

1 commit

  • Many source files in the tree are missing licensing information, which
    makes it harder for compliance tools to determine the correct license.

    By default all files without license information are under the default
    license of the kernel, which is GPL version 2.

    Update the files which contain no license information with the 'GPL-2.0'
    SPDX license identifier. The SPDX identifier is a legally binding
    shorthand, which can be used instead of the full boiler plate text.

    This patch is based on work done by Thomas Gleixner and Kate Stewart and
    Philippe Ombredanne.

    How this work was done:

    Patches were generated and checked against linux-4.14-rc6 for a subset of
    the use cases:
    - file had no licensing information it it.
    - file was a */uapi/* one with no licensing information in it,
    - file was a */uapi/* one with existing licensing information,

    Further patches will be generated in subsequent months to fix up cases
    where non-standard license headers were used, and references to license
    had to be inferred by heuristics based on keywords.

    The analysis to determine which SPDX License Identifier to be applied to
    a file was done in a spreadsheet of side by side results from of the
    output of two independent scanners (ScanCode & Windriver) producing SPDX
    tag:value files created by Philippe Ombredanne. Philippe prepared the
    base worksheet, and did an initial spot review of a few 1000 files.

    The 4.13 kernel was the starting point of the analysis with 60,537 files
    assessed. Kate Stewart did a file by file comparison of the scanner
    results in the spreadsheet to determine which SPDX license identifier(s)
    to be applied to the file. She confirmed any determination that was not
    immediately clear with lawyers working with the Linux Foundation.

    Criteria used to select files for SPDX license identifier tagging was:
    - Files considered eligible had to be source code files.
    - Make and config files were included as candidates if they contained >5
    lines of source
    - File already had some variant of a license header in it (even if
    Reviewed-by: Philippe Ombredanne
    Reviewed-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     

05 Jun, 2017

1 commit

  • commit d91824c08fbc ("genetlink: register family ops as array") removed the
    ops_list member from both genl_family and genl_ops; while the
    documentation of genl_family was updated accordingly by this patch,
    ops_list remained in the documentation of the genl_ops object.
    This patch fixes it by removing ops_list from genl_ops documentation.

    Signed-off-by: Rami Rosen
    Signed-off-by: David S. Miller

    Rosen, Rami
     

14 Apr, 2017

2 commits


14 Nov, 2016

1 commit


28 Oct, 2016

4 commits

  • Since generic netlink family IDs are small integers, allocated
    densely, IDR is an ideal match for lookups. Replace the existing
    hand-written hash-table with IDR for allocation and lookup.

    This lets the families only be written to once, during register,
    since the list_head can be removed and removal of a family won't
    cause any writes.

    It also slightly reduces the code size (by about 1.3k on x86-64).

    Signed-off-by: Johannes Berg
    Signed-off-by: David S. Miller

    Johannes Berg
     
  • Instead of providing macros/inline functions to initialize
    the families, make all users initialize them statically and
    get rid of the macros.

    This reduces the kernel code size by about 1.6k on x86-64
    (with allyesconfig).

    Signed-off-by: Johannes Berg
    Signed-off-by: David S. Miller

    Johannes Berg
     
  • Static family IDs have never really been used, the only
    use case was the workaround I introduced for those users
    that assumed their family ID was also their multicast
    group ID.

    Additionally, because static family IDs would never be
    reserved by the generic netlink code, using a relatively
    low ID would only work for built-in families that can be
    registered immediately after generic netlink is started,
    which is basically only the control family (apart from
    the workaround code, which I also had to add code for so
    it would reserve those IDs)

    Thus, anything other than GENL_ID_GENERATE is flawed and
    luckily not used except in the cases I mentioned. Move
    those workarounds into a few lines of code, and then get
    rid of GENL_ID_GENERATE entirely, making it more robust.

    Signed-off-by: Johannes Berg
    Signed-off-by: David S. Miller

    Johannes Berg
     
  • This helper function allows family implementations to access
    their family's attrbuf. This gets rid of the attrbuf usage
    in families, and also adds locking validation, since it's not
    valid to use the attrbuf with parallel_ops or outside of the
    dumpit callback.

    Signed-off-by: Johannes Berg
    Signed-off-by: David S. Miller

    Johannes Berg
     

19 Feb, 2016

1 commit


16 Dec, 2015

1 commit


25 Sep, 2015

1 commit

  • The genl_notify function has too many arguments for no real reason - all
    callers use genl_info to get them anyway. Just pass the genl_info down to
    genl_notify.

    Signed-off-by: Jiri Benc
    Signed-off-by: David S. Miller

    Jiri Benc
     

13 Mar, 2015

1 commit

  • Having to say
    > #ifdef CONFIG_NET_NS
    > struct net *net;
    > #endif

    in structures is a little bit wordy and a little bit error prone.

    Instead it is possible to say:
    > typedef struct {
    > #ifdef CONFIG_NET_NS
    > struct net *net;
    > #endif
    > } possible_net_t;

    And then in a header say:

    > possible_net_t net;

    Which is cleaner and easier to use and easier to test, as the
    possible_net_t is always there no matter what the compile options.

    Further this allows read_pnet and write_pnet to be functions in all
    cases which is better at catching typos.

    This change adds possible_net_t, updates the definitions of read_pnet
    and write_pnet, updates optional struct net * variables that
    write_pnet uses on to have the type possible_net_t, and finally fixes
    up the b0rked users of read_pnet and write_pnet.

    Signed-off-by: "Eric W. Biederman"
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric W. Biederman
     

28 Jan, 2015

1 commit


27 Jan, 2015

1 commit


18 Jan, 2015

1 commit

  • Contrary to common expectations for an "int" return, these functions
    return only a positive value -- if used correctly they cannot even
    return 0 because the message header will necessarily be in the skb.

    This makes the very common pattern of

    if (genlmsg_end(...) < 0) { ... }

    be a whole bunch of dead code. Many places also simply do

    return nlmsg_end(...);

    and the caller is expected to deal with it.

    This also commonly (at least for me) causes errors, because it is very
    common to write

    if (my_function(...))
    /* error condition */

    and if my_function() does "return nlmsg_end()" this is of course wrong.

    Additionally, there's not a single place in the kernel that actually
    needs the message length returned, and if anyone needs it later then
    it'll be very easy to just use skb->len there.

    Remove this, and make the functions void. This removes a bunch of dead
    code as described above. The patch adds lines because I did

    - return nlmsg_end(...);
    + nlmsg_end(...);
    + return 0;

    I could have preserved all the function's return values by returning
    skb->len, but instead I've audited all the places calling the affected
    functions and found that none cared. A few places actually compared
    the return value with < 0 with no change in behaviour, so I opted for the more
    efficient version.

    One instance of the error I've made numerous times now is also present
    in net/phonet/pn_netlink.c in the route_dumpit() function - it didn't
    check for
    Signed-off-by: David S. Miller

    Johannes Berg
     

17 Jan, 2015

2 commits

  • In addition to the problem Jeff Layton reported, I looked at the code
    and reproduced the same warning by subscribing and removing the genl
    family with a socket still open. This is a fairly tricky race which
    originates in the fact that generic netlink allows the family to go
    away while sockets are still open - unlike regular netlink which has
    a module refcount for every open socket so in general this cannot be
    triggered.

    Trying to resolve this issue by the obvious locking isn't possible as
    it will result in deadlocks between unregistration and group unbind
    notification (which incidentally lockdep doesn't find due to the home
    grown locking in the netlink table.)

    To really resolve this, introduce a "closing socket" reference counter
    (for generic netlink only, as it's the only affected family) in the
    core netlink code and use that in generic netlink to wait for all the
    sockets that are being closed at the same time as a generic netlink
    family is removed.

    This fixes the race that when a socket is closed, it will should call
    the unbind, but if the family is removed at the same time the unbind
    will not find it, leading to the warning. The real problem though is
    that in this case the unbind could actually find a new family that is
    registered to have a multicast group with the same ID, and call its
    mcast_unbind() leading to confusing.

    Also remove the warning since it would still trigger, but is now no
    longer a problem.

    This also moves the code in af_netlink.c to before unreferencing the
    module to avoid having the same problem in the normal non-genl case.

    Signed-off-by: Johannes Berg
    Signed-off-by: David S. Miller

    Johannes Berg
     
  • The kernel-doc for the parallel_ops family struct member is
    missing, add it.

    Signed-off-by: Johannes Berg
    Signed-off-by: David S. Miller

    Johannes Berg
     

27 Dec, 2014

3 commits

  • Netlink families can exist in multiple namespaces, and for the most
    part multicast subscriptions are per network namespace. Thus it only
    makes sense to have bind/unbind notifications per network namespace.

    To achieve this, pass the network namespace of a given client socket
    to the bind/unbind functions.

    Also do this in generic netlink, and there also make sure that any
    bind for multicast groups that only exist in init_net is rejected.
    This isn't really a problem if it is accepted since a client in a
    different namespace will never receive any notifications from such
    a group, but it can confuse the family if not rejected (it's also
    possible to silently (without telling the family) accept it, but it
    would also have to be ignored on unbind so families that take any
    kind of action on bind/unbind won't do unnecessary work for invalid
    clients like that.

    Signed-off-by: Johannes Berg
    Signed-off-by: David S. Miller

    Johannes Berg
     
  • In order to make the newly fixed multicast bind/unbind
    functionality in generic netlink, pass them down to the
    appropriate family.

    Signed-off-by: Johannes Berg
    Signed-off-by: David S. Miller

    Johannes Berg
     
  • There's no point to force the caller to know about the internal
    genl_sock to use inside struct net, just have them pass the network
    namespace. This doesn't really change code generation since it's
    an inline, but makes the caller less magic - there's never any
    reason to pass another socket.

    Signed-off-by: Johannes Berg
    Signed-off-by: David S. Miller

    Johannes Berg
     

20 Sep, 2014

1 commit


07 Jan, 2014

1 commit

  • Allocates a new sk_buff large enough to cover the specified payload
    plus required Netlink headers. Will check receiving socket for
    memory mapped i/o capability and use it if enabled. Will fall back
    to non-mapped skb if message size exceeds the frame size of the ring.

    Signed-of-by: Thomas Graf
    Reviewed-by: Daniel Borkmann
    Signed-off-by: Jesse Gross

    Thomas Graf