22 Oct, 2020

1 commit

  • "ip addr show" fails when a physical network card has a large
    number of VFs

    The return value of if_nlmsg_size() in rtnl_calcit() exceeds the
    range of the u16 data type when any network card has a large number
    of VFs. rtnl_vfinfo_size() significantly increases the needed dump
    size as num_vfs grows.

    Because of the overflow we end up with a wrong value of
    min_ifinfo_dump_size, which decides the memory size needed by the
    netlink dump, and netlink_dump() returns -EMSGSIZE because not
    enough memory was allocated.

    Fix it by promoting the min_dump_alloc data type to u32 so that the
    whole netlink message size cannot overflow. This also aligns it
    with the data type of struct netlink_callback{}.min_dump_alloc,
    which is assigned from the return value of rtnl_calcit().
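
    The truncation described above can be reproduced in user space. This
    is only a sketch: BASE_IFINFO_SIZE and PER_VF_SIZE are made-up numbers
    for illustration, not the values computed by if_nlmsg_size() or
    rtnl_vfinfo_size().

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sizes, chosen only for illustration. */
#define BASE_IFINFO_SIZE 1024u
#define PER_VF_SIZE       400u

/* Total dump size needed for a device with num_vfs virtual functions. */
static size_t needed_dump_size(unsigned int num_vfs)
{
	return (size_t)BASE_IFINFO_SIZE + (size_t)num_vfs * PER_VF_SIZE;
}

/* Old behaviour: the size is stored in a 16-bit field and wraps around. */
static uint16_t min_dump_alloc_u16(unsigned int num_vfs)
{
	return (uint16_t)needed_dump_size(num_vfs);
}

/* Fixed behaviour: a 32-bit field holds the full value. */
static uint32_t min_dump_alloc_u32(unsigned int num_vfs)
{
	return (uint32_t)needed_dump_size(num_vfs);
}
```

    With these assumed sizes, 200 VFs need 81024 bytes, which a 16-bit
    field stores as 15488, so the dump buffer would be allocated far too
    small.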

    Signed-off-by: Di Zhu
    Link: https://lore.kernel.org/r/20201021020053.1401-1-zhudi21@huawei.com
    Signed-off-by: Jakub Kicinski

    Di Zhu
     

10 Oct, 2020

1 commit

  • Add a new attribute NLMSGERR_ATTR_POLICY to the extended ACK
    to advertise the policy, e.g. if an attribute was out of range,
    you'll know the range that's permissible.

    Add new NL_SET_BAD_ATTR_POLICY() and NL_SET_ERR_MSG_ATTR_POL()
    macros to set this, since realistically it's only useful to do
    this when the bad attribute (offset) is also returned.

    Use it in lib/nlattr.c which practically does all the policy
    validation.

    v2:
    - add and use netlink_policy_dump_attr_size_estimate()
    v3:
    - remove redundant break
    v4:
    - really remove redundant break ... sorry

    Reviewed-by: Jakub Kicinski
    Signed-off-by: Johannes Berg
    Signed-off-by: Jakub Kicinski

    Johannes Berg
     

26 Mar, 2020

1 commit


24 Mar, 2020

1 commit

  • Unlike NL_SET_ERR_* macros, nl_set_extack_cookie_u64() and
    nl_set_extack_cookie_u32() helpers do not check extack argument for null
    and neither do their callers, as syzbot recently discovered for
    ethnl_parse_header().

    Instead of fixing the callers and leaving the trap in place, add a
    null check of extack to both helpers to make them consistent with
    the NL_SET_ERR_* macros.
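
    A minimal user-space sketch of the idea follows. struct ext_ack_sketch
    and set_extack_cookie_u64() are simplified stand-ins invented for this
    example, not the kernel definitions.

```c
#include <stdint.h>
#include <string.h>

#define COOKIE_MAX_LEN 20 /* the 20-byte cookie limit mentioned in this log */

/* Simplified stand-in for struct netlink_ext_ack: cookie fields only. */
struct ext_ack_sketch {
	uint8_t cookie[COOKIE_MAX_LEN];
	uint8_t cookie_len;
};

/* Store a u64 cookie; do nothing when the caller passed no extack. */
static void set_extack_cookie_u64(struct ext_ack_sketch *extack,
				  uint64_t cookie)
{
	if (!extack)
		return;
	memcpy(extack->cookie, &cookie, sizeof(cookie));
	extack->cookie_len = sizeof(cookie);
}
```

    Putting the check in the helper means a caller that received a NULL
    extack can call it unconditionally.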

    v2: drop incorrect second Fixes tag

    Fixes: 2363d73a2f3e ("ethtool: reject unrecognized request flags")
    Reported-by: syzbot+258a9089477493cea67b@syzkaller.appspotmail.com
    Signed-off-by: Michal Kubecek
    Signed-off-by: David S. Miller

    Michal Kubecek
     

16 Mar, 2020

1 commit


28 Feb, 2020

1 commit

  • This patch will dump out the bpf_sk_storages of a sk
    if the request has the INET_DIAG_REQ_SK_BPF_STORAGES nlattr.

    An array of SK_DIAG_BPF_STORAGE_REQ_MAP_FD can be specified in
    INET_DIAG_REQ_SK_BPF_STORAGES to select which bpf_sk_storage to dump.
    If no map_fd is specified, all bpf_sk_storages of a sk will be dumped.

    bpf_sk_storages can be added to the system at runtime. It is difficult
    to find a proper static value for cb->min_dump_alloc.

    This patch learns the nlattr size required to dump the bpf_sk_storages
    of a sk. If it happens to be the very first nlmsg of a dump and it
    cannot fit the needed bpf_sk_storages, it will try to expand the
    skb by "pskb_expand_head()".

    Instead of expanding it in inet_sk_diag_fill(), it is expanded at a
    sleepable context in __inet_diag_dump() so __GFP_DIRECT_RECLAIM can
    be used. In __inet_diag_dump(), it will retry as long as the
    skb is empty and the cb->min_dump_alloc becomes larger than before.
    cb->min_dump_alloc is bounded by KMALLOC_MAX_SIZE. The min_dump_alloc
    is also changed from 'u16' to 'u32' to accommodate a sk that may have
    a few large bpf_sk_storages.

    The updated cb->min_dump_alloc will also be used to allocate the skb in
    the next dump. This logic already exists in netlink_dump().

    Here is the sample output of a locally modified 'ss' and it could be made
    more readable by using BTF later:
    [root@arch-fb-vm1 ~]# ss --bpf-map-id 14 --bpf-map-id 13 -t6an 'dst [::1]:8989'
    State Recv-Q Send-Q Local Address:Port Peer Address:Port Process
    ESTAB 0 0 [::1]:51072 [::1]:8989
    bpf_map_id:14 value:[ 3feb ]
    bpf_map_id:13 value:[ 3f ]
    ESTAB 0 0 [::1]:51070 [::1]:8989
    bpf_map_id:14 value:[ 3feb ]
    bpf_map_id:13 value:[ 3f ]

    [root@arch-fb-vm1 ~]# ~/devshare/github/iproute2/misc/ss --bpf-maps -t6an 'dst [::1]:8989'
    State Recv-Q Send-Q Local Address:Port Peer Address:Port Process
    ESTAB 0 0 [::1]:51072 [::1]:8989
    bpf_map_id:14 value:[ 3feb ]
    bpf_map_id:13 value:[ 3f ]
    bpf_map_id:12 value:[ 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000... total:65407 ]
    ESTAB 0 0 [::1]:51070 [::1]:8989
    bpf_map_id:14 value:[ 3feb ]
    bpf_map_id:13 value:[ 3f ]
    bpf_map_id:12 value:[ 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000... total:65407 ]

    Signed-off-by: Martin KaFai Lau
    Signed-off-by: Alexei Starovoitov
    Acked-by: Song Liu
    Link: https://lore.kernel.org/bpf/20200225230427.1976129-1-kafai@fb.com

    Martin KaFai Lau
     

02 Jul, 2019

1 commit

  • People are inclined to stuff random things into cb->args[n] because it
    looks like an array of integers. Sometimes people even put u64s in there
    with comments noting that a certain member takes up two slots. The
    horror! Really this should mirror the usage of skb->cb, which are just
    48 opaque bytes suitable for casting a struct. Then people can create
    their usual casting macros for accessing strongly typed members of a
    struct.

    As a plus, this also gives us the same amount of space on 32bit and 64bit.
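
    The casting pattern described above might look roughly like this.
    struct callback_sketch and struct my_dump_ctx are hypothetical names
    for this sketch; only the 48-byte size comes from the text.

```c
#include <stdint.h>

#define CTX_SIZE 48

/* Stand-in for struct netlink_callback: 48 opaque bytes of per-dump
 * state, aligned so that a struct can be placed in them. */
struct callback_sketch {
	_Alignas(8) uint8_t ctx[CTX_SIZE];
};

/* Hypothetical per-dump state of some subsystem, strongly typed. */
struct my_dump_ctx {
	uint64_t last_id;
	uint32_t bucket;
	uint32_t flags;
};

/* Fail the build if the typed state ever outgrows the opaque area. */
_Static_assert(sizeof(struct my_dump_ctx) <= CTX_SIZE,
	       "dump state must fit in ctx");

/* Accessor in the style of the usual skb->cb casting macros. The kernel
 * builds with -fno-strict-aliasing; strictly portable user-space code
 * would memcpy() the struct in and out instead of casting. */
static struct my_dump_ctx *my_dump_ctx(struct callback_sketch *cb)
{
	return (struct my_dump_ctx *)cb->ctx;
}
```

    A dump handler would then read and write my_dump_ctx(cb)->last_id
    instead of remembering which integer slot holds which value.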

    Signed-off-by: Jason A. Donenfeld
    Reviewed-by: Johannes Berg
    Signed-off-by: David S. Miller

    Jason A. Donenfeld
     

20 Jan, 2019

1 commit


21 Dec, 2018

1 commit


09 Nov, 2018

1 commit

  • Add a helper function nl_set_extack_cookie_u64() to use a u64 as
    the netlink extended ACK cookie, to avoid having to open-code it
    in any users of the cookie.

    A u64 should be sufficient for most subsystems though we allow
    for up to 20 bytes right now. This also matches the cookies in
    nl80211 where I intend to use this.

    Signed-off-by: Johannes Berg
    Acked-by: David S. Miller
    Signed-off-by: Johannes Berg

    Johannes Berg
     

16 Oct, 2018

1 commit

  • With dump filtering we need a way to ensure the NLM_F_DUMP_FILTERED
    flag is set on a message back to the user if the data returned is
    influenced by some input attributes. Normally this can be done as
    messages are added to the skb, but if the filter results in no data
    being returned, the user could be confused as to why.

    This patch adds answer_flags to the netlink_callback allowing dump
    handlers to set the NLM_F_DUMP_FILTERED at a minimum in the
    NLMSG_DONE message ensuring the flag gets back to the user.

    The netlink_callback space is initialized to 0 via a memset in
    __netlink_dump_start, so init of the new answer_flags is covered.
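
    As a rough illustration of how the flag reaches the NLMSG_DONE
    message: the struct and function names below are invented for this
    sketch, and the two flag values are the ones I understand the uapi
    header to define.

```c
#include <stdint.h>

/* Flag values as defined in the uapi netlink header (assumed here). */
#define SK_NLM_F_MULTI         0x02
#define SK_NLM_F_DUMP_FILTERED 0x20

/* Stand-in for the relevant part of struct netlink_callback. */
struct callback_sketch {
	uint16_t answer_flags; /* zeroed when the dump starts */
};

/* A dump handler that honoured a filter records that fact once. */
static void handler_mark_filtered(struct callback_sketch *cb)
{
	cb->answer_flags |= SK_NLM_F_DUMP_FILTERED;
}

/* Flags placed on the final NLMSG_DONE message of the dump. */
static uint16_t done_message_flags(const struct callback_sketch *cb)
{
	return (uint16_t)(SK_NLM_F_MULTI | cb->answer_flags);
}
```

    Even when the filter matches nothing and no data message is sent,
    the DONE message still carries the filtered flag.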

    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern
     

09 Oct, 2018

2 commits

  • Add a new socket option, NETLINK_DUMP_STRICT_CHK, that userspace
    can use via setsockopt to request strict checking of headers and
    attributes on dump requests.

    To get dump features such as kernel side filtering based on data in
    the header or attributes appended to the dump request, userspace
    must call setsockopt() for NETLINK_DUMP_STRICT_CHK and a non-zero
    value. Since the netlink sock and its flags are private to the
    af_netlink code, the strict checking flag is passed to dump handlers
    via a flag in the netlink_callback struct.

    For old userspace on new kernel there is no impact as all of the data
    checks in later patches are wrapped in a check on the new strict flag.

    For new userspace on old kernel, the setsockopt will fail and even if
    new userspace sets data in the headers and appended attributes the
    kernel will silently ignore it. Moving forward when the setsockopt
    succeeds, the new userspace on old kernel means the dump request can
    pass an attribute the kernel does not understand. The dump will then
    fail as the older kernel does not understand it.

    New userspace on new kernel setting the socket option gets the benefit
    of the improved data dump.

    Kernel side the NETLINK_DUMP_STRICT_CHK uapi is converted to a generic
    NETLINK_F_STRICT_CHK flag which can potentially be leveraged for tighter
    checking on the NEW, DEL, and SET commands.
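
    From user space, requesting strict checking could look like the
    sketch below (Linux only). open_rtnl_strict() is a hypothetical
    helper. The fallback defines are assumptions for headers that lack
    the names; to my knowledge later uapi headers call the option
    NETLINK_GET_STRICT_CHK with the same value.

```c
#include <sys/socket.h>
#include <linux/netlink.h>
#include <unistd.h>

#ifndef SOL_NETLINK
#define SOL_NETLINK 270
#endif
#ifndef NETLINK_DUMP_STRICT_CHK
#define NETLINK_DUMP_STRICT_CHK 12 /* assumed value, see note above */
#endif

/* Open a rtnetlink socket and ask for strict checking of dump requests.
 * Returns 1 if strict checking is enabled, 0 if the kernel rejected the
 * option (old kernel, dumps stay unfiltered), -1 if no socket could be
 * opened. On 0 or 1 the socket is stored in *fd_out. */
static int open_rtnl_strict(int *fd_out)
{
	int one = 1;
	int fd = socket(AF_NETLINK, SOCK_RAW | SOCK_CLOEXEC, NETLINK_ROUTE);

	if (fd < 0)
		return -1;
	*fd_out = fd;
	if (setsockopt(fd, SOL_NETLINK, NETLINK_DUMP_STRICT_CHK,
		       &one, sizeof(one)) < 0)
		return 0;
	return 1;
}
```

    A caller that gets 0 back should not rely on kernel-side filtering
    and should filter the dump results itself.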

    Signed-off-by: David Ahern
    Acked-by: Christian Brauner
    Signed-off-by: David S. Miller

    David Ahern
     
  • Declare extack in netlink_dump and pass to dump handlers via
    netlink_callback. Add any extack message after the dump_done_errno
    allowing error messages to be returned. This will be useful when
    strict checking is done on dump requests, returning why the dump
    fails with EINVAL.

    Signed-off-by: David Ahern
    Acked-by: Christian Brauner
    Signed-off-by: David S. Miller

    David Ahern
     

25 Jul, 2018

1 commit


16 Jan, 2018

1 commit

  • NL_SET_ERR_MSG() and NL_SET_ERR_MSG_ATTR() lead to the following warning
    in newer versions of gcc:
    warning: array initialized from parenthesized string constant

    Just remove the parentheses; they're not needed in this context
    since there can be no operator precedence issues or similar.

    Signed-off-by: Johannes Berg
    Signed-off-by: David S. Miller

    Johannes Berg
     

14 Nov, 2017

1 commit


02 Nov, 2017

1 commit

  • Many source files in the tree are missing licensing information, which
    makes it harder for compliance tools to determine the correct license.

    By default all files without license information are under the default
    license of the kernel, which is GPL version 2.

    Update the files which contain no license information with the 'GPL-2.0'
    SPDX license identifier. The SPDX identifier is a legally binding
    shorthand, which can be used instead of the full boiler plate text.

    This patch is based on work done by Thomas Gleixner and Kate Stewart and
    Philippe Ombredanne.

    How this work was done:

    Patches were generated and checked against linux-4.14-rc6 for a subset of
    the use cases:
    - file had no licensing information in it,
    - file was a */uapi/* one with no licensing information in it,
    - file was a */uapi/* one with existing licensing information.

    Further patches will be generated in subsequent months to fix up cases
    where non-standard license headers were used, and references to license
    had to be inferred by heuristics based on keywords.

    The analysis to determine which SPDX License Identifier to apply to
    a file was done in a spreadsheet of side by side results from the
    output of two independent scanners (ScanCode & Windriver) producing SPDX
    tag:value files created by Philippe Ombredanne. Philippe prepared the
    base worksheet, and did an initial spot review of a few 1000 files.

    The 4.13 kernel was the starting point of the analysis with 60,537 files
    assessed. Kate Stewart did a file by file comparison of the scanner
    results in the spreadsheet to determine which SPDX license identifier(s)
    to be applied to the file. She confirmed any determination that was not
    immediately clear with lawyers working with the Linux Foundation.

    Criteria used to select files for SPDX license identifier tagging were:
    - Files considered eligible had to be source code files.
    - Make and config files were included as candidates if they contained >5
    lines of source
    - File already had some variant of a license header in it (even if
    Reviewed-by: Philippe Ombredanne
    Reviewed-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     

30 May, 2017

1 commit

  • Pass extack arg down to lwtunnel_build_state and the build_state callbacks.
    Add messages for failures in lwtunnel_build_state, and add the extack to
    nla_parse where possible in the build_state callbacks.

    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern
     

23 May, 2017

1 commit

  • Add messages for non-obvious errors (e.g., no need to add text for malloc
    failures or ENODEV failures). This mostly covers the annoying EINVAL errors.
    Some message strings violate the 80-column rule, but searchable strings
    need to trump that rule.

    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern
     

03 May, 2017

1 commit

  • Small follow-up to d74a32acd59a ("xdp: use netlink extended ACK reporting")
    in order to let drivers all use the same NL_SET_ERR_MSG_MOD() helper macro
    for reporting. This also ensures that we consistently add the driver's
    prefix for dumping the report in user space to indicate that the error
    message is driver specific and not coming from core code. Furthermore,
    NL_SET_ERR_MSG_MOD() now reuses NL_SET_ERR_MSG() and thus makes all macros
    check the pointer as suggested.

    References: https://www.spinics.net/lists/netdev/msg433267.html
    Signed-off-by: Daniel Borkmann
    Acked-by: Jakub Kicinski
    Reviewed-by: Johannes Berg
    Signed-off-by: David S. Miller

    Daniel Borkmann
     

01 May, 2017

1 commit

  • As we propagate extended ack reporting throughout various paths in
    the kernel it may be that the same function is called with the
    extended ack parameter passed as NULL. One place where that happens
    is in drivers which have a centralized reconfiguration function
    called both from ndos and from ethtool_ops. Add a new helper for
    setting the error message in such conditions.

    The existing helper is left as is to encourage propagating the ext ack
    fully wherever possible. It also makes it clear in the code which
    messages may be lost due to the ext ack being NULL.

    Signed-off-by: Jakub Kicinski
    Acked-by: Daniel Borkmann
    Signed-off-by: David S. Miller

    Jakub Kicinski
     

14 Apr, 2017

2 commits

  • Now that we have extended error reporting and a new message format for
    netlink ACK messages, also extend this to be able to return arbitrary
    cookie data on success.

    This will allow, for example, nl80211 to not send an extra message for
    cookies identifying newly created objects, but return those directly
    in the ACK message.

    The cookie data size is currently limited to 20 bytes (since Jamal
    talked about using SHA1 for identifiers.)

    Thanks to Jamal Hadi Salim for bringing up this idea during the
    discussions.

    Signed-off-by: Johannes Berg
    Reviewed-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Johannes Berg
     
  • Add the base infrastructure and UAPI for netlink extended ACK
    reporting. All "manual" calls to netlink_ack() pass NULL for now and
    thus don't get extended ACK reporting.

    Big thanks goes to Pablo Neira Ayuso for not only bringing up the
    whole topic at netconf (again) but also coming up with the nlattr
    passing trick and various other ideas.

    Signed-off-by: Johannes Berg
    Reviewed-by: David Ahern
    Signed-off-by: David S. Miller

    Johannes Berg
     

19 Feb, 2016

1 commit


16 Dec, 2015

1 commit


10 Sep, 2015

1 commit

  • When netlink mmap on receive side is the consumer of nf queue data,
    it can happen that in some edge cases, we write skb shared info into
    the user space mmap buffer:

    Assume a possible rx ring frame size of only 4096, and the network skb,
    which is being zero-copied into the netlink skb, contains page frags
    with an overall skb->len larger than the linear part of the netlink
    skb.

    skb_zerocopy(), which is generic and thus not aware of the fact that
    shared info cannot be accessed for such skbs then tries to write and
    fill frags, thus leaking kernel data/pointers and in some corner cases
    possibly writing out of bounds of the mmap area (when filling the
    last slot in the ring buffer this way).

    I.e. the ring buffer slot is then of status NL_MMAP_STATUS_VALID, has
    an advertised length larger than 4096, where the linear part is visible
    at the slot beginning, and the leaked sizeof(struct skb_shared_info)
    has been written to the beginning of the next slot (also corrupting
    the struct nl_mmap_hdr slot header incl. status etc), since skb->end
    points to skb->data + ring->frame_size - NL_MMAP_HDRLEN.

    The fix adds and lets __netlink_alloc_skb() take the actual needed
    linear room for the network skb + meta data into account. It's completely
    irrelevant for non-mmaped netlink sockets, but in case mmap sockets
    are used, it can be decided whether the available skb_tailroom() is
    really large enough for the buffer, or whether it needs to internally
    fallback to a normal alloc_skb().

    From nf queue side, the information whether the destination port is
    an mmap RX ring is not really available without extra port-to-socket
    lookup, thus it can only be determined in lower layers i.e. when
    __netlink_alloc_skb() is called that checks internally for this. I
    chose to add the extra ldiff parameter as mmap will then still work:
    We have data_len and hlen in nfqnl_build_packet_message(), data_len
    is the full length (capped at queue->copy_range) for skb_zerocopy()
    and hlen some possible part of data_len that needs to be copied; the
    rem_len variable indicates the needed remaining linear mmap space.

    The only other workaround in nf queue internally would be after
    allocation time by f.e. cap'ing the data_len to the skb_tailroom()
    iff we deal with an mmap skb, but that would 1) expose the fact that
    we use a mmap skb to upper layers, and 2) trim the skb where we
    otherwise could just have moved the full skb into the normal receive
    queue.

    After the patch, in my test case the ring slot doesn't fit and therefore
    shows NL_MMAP_STATUS_COPY, where a full skb carries all the data and
    thus needs to be picked up via recv().

    Fixes: 3ab1f683bf8b ("nfnetlink: add support for memory mapped netlink")
    Signed-off-by: Daniel Borkmann
    Signed-off-by: David S. Miller

    Daniel Borkmann
     

10 May, 2015

1 commit

  • More accurately, listen to all netns that have a nsid assigned in the netns
    where the netlink socket is opened.
    For this purpose, a netlink socket option is added:
    NETLINK_LISTEN_ALL_NSID. When this option is set on a netlink socket, this
    socket will receive netlink notifications from all netns that have a nsid
    assigned in the netns where the socket has been opened. The nsid is sent
    to userland via ancillary data.

    With this patch, a daemon needs only one socket to listen to many netns.
    This is useful when the number of netns is high.

    Because 0 is a valid value for a nsid, the field nsid_is_set indicates if
    the field nsid is valid or not. skb->cb is initialized to 0 on skb
    allocation, thus we are sure that we will never send a nsid 0 by error to
    the userland.

    Signed-off-by: Nicolas Dichtel
    Acked-by: Thomas Graf
    Signed-off-by: David S. Miller

    Nicolas Dichtel
     

14 Apr, 2015

1 commit


27 Dec, 2014

1 commit

  • Netlink families can exist in multiple namespaces, and for the most
    part multicast subscriptions are per network namespace. Thus it only
    makes sense to have bind/unbind notifications per network namespace.

    To achieve this, pass the network namespace of a given client socket
    to the bind/unbind functions.

    Also do this in generic netlink, and there also make sure that any
    bind for multicast groups that only exist in init_net is rejected.
    This isn't really a problem if it is accepted since a client in a
    different namespace will never receive any notifications from such
    a group, but it can confuse the family if not rejected. (It's also
    possible to silently accept it without telling the family, but it
    would then also have to be ignored on unbind, so that families that
    take any kind of action on bind/unbind won't do unnecessary work for
    invalid clients like that.)

    Signed-off-by: Johannes Berg
    Signed-off-by: David S. Miller

    Johannes Berg
     

04 Jun, 2014

1 commit


03 Jun, 2014

1 commit

  • It was possible to get a setuid root or setcap executable to write to
    its stdout or stderr (which has been made a netlink socket) and
    inadvertently reconfigure the networking stack.

    To prevent this we check that both the creator of the socket and
    the current application have permission to reconfigure the network
    stack.

    Unfortunately this breaks Zebra which always uses sendto/sendmsg
    and creates its socket without any privileges.

    To keep Zebra working don't bother checking if the creator of the
    socket has privilege when a destination address is specified. Instead
    rely exclusively on the privileges of the sender of the socket.

    Note from Andy: This is exactly Eric's code except for some comment
    clarifications and formatting fixes. Neither I nor, I think, anyone
    else is thrilled with this approach, but I'm hesitant to wait on a
    better fix since 3.15 is almost here.

    Note to stable maintainers: This is a mess. An earlier series of
    patches in 3.15 fix a rather serious security issue (CVE-2014-0181),
    but they did so in a way that breaks Zebra. The offending series
    includes:

    commit aa4cf9452f469f16cea8c96283b641b4576d4a7b
    Author: Eric W. Biederman
    Date: Wed Apr 23 14:28:03 2014 -0700

    net: Add variants of capable for use on netlink messages

    If a given kernel version is missing that series of fixes, it's
    probably worth backporting it and this patch. If that series is
    present, then this fix is critical if you care about Zebra.

    Cc: stable@vger.kernel.org
    Signed-off-by: "Eric W. Biederman"
    Signed-off-by: Andy Lutomirski
    Signed-off-by: David S. Miller

    Eric W. Biederman
     

13 May, 2014

1 commit

  • Conflicts:
    drivers/net/ethernet/altera/altera_sgdma.c
    net/netlink/af_netlink.c
    net/sched/cls_api.c
    net/sched/sch_api.c

    The netlink conflict dealt with moving to netlink_capable() and
    netlink_ns_capable() in the 'net' tree vs. supporting 'tc' operations
    in non-init namespaces. These were simple transformations from
    netlink_capable to netlink_ns_capable.

    The Altera driver conflict was simply code removal overlapping some
    void pointer cast cleanups in net-next.

    Signed-off-by: David S. Miller

    David S. Miller
     

25 Apr, 2014

1 commit

  • netlink_net_capable - The common case use, for operations that are safe on a network namespace
    netlink_capable - For operations that are only known to be safe for the global root
    netlink_ns_capable - The general case of capable used to handle special cases

    __netlink_ns_capable - Same as netlink_ns_capable except taking a netlink_skb_parms instead of
    the skbuff of a netlink message.

    Signed-off-by: "Eric W. Biederman"
    Signed-off-by: David S. Miller

    Eric W. Biederman
     

23 Apr, 2014

1 commit

  • Have the netlink per-protocol optional bind function return an int error code
    rather than void to signal a failure.

    This will enable netlink protocols to perform extra checks including
    capabilities and permissions verifications when updating memberships in
    multicast groups.

    In netlink_bind() and netlink_setsockopt() the call to the per-protocol bind
    function was moved above the multicast group update to prevent any access to
    the multicast socket groups before checking with the per-protocol bind
    function. This will enable the per-protocol bind function to be used to check
    permissions which could be denied before making them available, and to avoid
    the messy job of undoing the addition should the per-protocol bind function
    fail.
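
    The ordering argument can be sketched as follows. All names here
    (proto_sketch, sock_sketch, update_membership, deny_group_3) are
    hypothetical and the group bookkeeping is reduced to a 32-bit mask.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Per-protocol hooks; bind is optional and returns 0 or a negative errno. */
struct proto_sketch {
	int (*bind)(int group);
};

/* Socket state reduced to a bitmask of subscribed multicast groups. */
struct sock_sketch {
	uint32_t groups;
};

/* Ask the protocol first, and touch the membership only on success, so
 * a refusal leaves nothing to undo. */
static int update_membership(struct sock_sketch *sk,
			     const struct proto_sketch *proto, int group)
{
	if (group < 1 || group > 32)
		return -EINVAL;
	if (proto->bind) {
		int err = proto->bind(group);

		if (err)
			return err;
	}
	sk->groups |= (uint32_t)1 << (group - 1);
	return 0;
}

/* Example policy: refuse subscriptions to group 3. */
static int deny_group_3(int group)
{
	return group == 3 ? -EPERM : 0;
}
```

    With deny_group_3 installed, a request for group 3 returns -EPERM
    and the socket's group mask is left exactly as it was.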

    The netfilter subsystem seems to be the only one currently using the
    per-protocol bind function.

    Signed-off-by: Richard Guy Briggs
    Signed-off-by: David S. Miller

    Richard Guy Briggs
     

02 Jan, 2014

1 commit


28 Jun, 2013

1 commit

  • Since (c05cdb1 netlink: allow large data transfers from user-space),
    netlink splats if it invokes skb_clone on large netlink skbs since:

    * skb_shared_info was not correctly initialized.
    * skb->destructor is not set in the cloned skb.

    This was spotted by trinity:

    [ 894.990671] BUG: unable to handle kernel paging request at ffffc9000047b001
    [ 894.991034] IP: [] skb_clone+0x24/0xc0
    [...]
    [ 894.991034] Call Trace:
    [ 894.991034] [] nl_fib_input+0x6a/0x240
    [ 894.991034] [] ? _raw_read_unlock+0x26/0x40
    [ 894.991034] [] netlink_unicast+0x169/0x1e0
    [ 894.991034] [] netlink_sendmsg+0x251/0x3d0

    Fix it by:

    1) introducing a new netlink_skb_clone function that is used in nl_fib_input,
    that sets our special skb->destructor in the cloned skb. Moreover, handle
    the release of the large cloned skb head area in the destructor path.

    2) not allowing large skbuffs in the netlink broadcast path. I cannot find
    any reasonable use of the large data transfer using netlink in that path,
    moreover this helps to skip extra skb_clone handling.

    I found two more netlink clients that are cloning the skbs, but they are
    not in the sendmsg path. Therefore, the sole client cloning that I found
    seems to be the fib frontend.

    Thanks to Eric Dumazet for helping to address this issue.

    Reported-by: Fengguang Wu
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: David S. Miller

    Pablo Neira
     

25 Jun, 2013

1 commit

  • Similarly to the networking receive path with ptype_all taps, we add
    the possibility to register netdevices that are for ARPHRD_NETLINK to
    the netlink subsystem, so that those can be used for netlink analyzers
    resp. debuggers. We do not offer a direct callback function as out-of-tree
    modules could do crap with it. Instead, a netdevice must be registered
    properly and only receives a clone, managed by the netlink layer. Symbols
    are exported as GPL-only.

    Signed-off-by: Daniel Borkmann
    Signed-off-by: David S. Miller

    Daniel Borkmann
     

11 Jun, 2013

1 commit

  • As we know, netlink sockets are a private resource of the net
    namespace; they can communicate with each other only when they
    are in the same net namespace. This works well until we try to
    add namespace support for other subsystems which use netlink.

    Unlike ipv4 and the route table, some of these subsystems are
    not suited to belong to a net namespace. The audit and crypto
    subsystems, for example, are a better fit for the user namespace.

    So we must have the ability to let netlink sockets in the same
    user namespace communicate with each other.

    This patch adds a new function pointer "compare" to
    netlink_table. Through this self-defined compare function, a
    netlink_table can decide whether netlink sockets can communicate
    with each other.

    The behavior isn't changed if we don't provide the compare
    function for netlink_table.
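
    One way to picture the optional hook is the sketch below. The types
    and the integer namespace ids are invented for the example, and the
    kernel's real compare callback has a different signature.

```c
#include <stdbool.h>
#include <stddef.h>

/* Socket reduced to the two namespace ids relevant for the example. */
struct sock_sketch {
	int net_ns;
	int user_ns;
};

/* Per-protocol table entry; compare is optional. */
struct table_sketch {
	bool (*compare)(const struct sock_sketch *a,
			const struct sock_sketch *b);
};

/* Default rule: sockets must share a net namespace. */
static bool same_net_ns(const struct sock_sketch *a,
			const struct sock_sketch *b)
{
	return a->net_ns == b->net_ns;
}

/* Alternative rule a subsystem such as audit might install. */
static bool same_user_ns(const struct sock_sketch *a,
			 const struct sock_sketch *b)
{
	return a->user_ns == b->user_ns;
}

/* Use the table's own rule if present, the default otherwise. */
static bool can_communicate(const struct table_sketch *t,
			    const struct sock_sketch *a,
			    const struct sock_sketch *b)
{
	if (t->compare)
		return t->compare(a, b);
	return same_net_ns(a, b);
}
```

    A table that leaves compare as NULL keeps the old net-namespace
    behaviour, which matches the no-change guarantee stated above.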

    Signed-off-by: Gao feng
    Acked-by: Serge E. Hallyn
    Signed-off-by: David S. Miller

    Gao feng
     

20 Apr, 2013

2 commits

  • Add support for mmap'ed recvmsg(). To allow the kernel to construct messages
    into the mapped area, a dataless skb is allocated and the data pointer is
    set to point into the ring frame. This means frames will be delivered to
    userspace in order of allocation instead of order of transmission. This
    usually doesn't matter since the order is either not determinable by
    userspace or message creation/transmission is serialized. The only case
    where this can have a visible difference is nfnetlink_queue. Userspace
    can't assume mmap'ed messages have ordered IDs anymore and needs to check
    this if using batched verdicts.

    For non-mapped sockets, nothing changes.

    Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Patrick McHardy
     
  • Add helper functions for looking up mmap'ed frame headers, reading and
    writing their status, allocating skbs with mmap'ed data areas and a poll
    function.

    Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Patrick McHardy