19 Feb, 2016

1 commit


16 Dec, 2015

1 commit


10 Sep, 2015

1 commit

  • When netlink mmap on receive side is the consumer of nf queue data,
    it can happen that in some edge cases, we write skb shared info into
    the user space mmap buffer:

    Assume a possible rx ring frame size of only 4096, and the network skb,
    which is being zero-copied into the netlink skb, contains page frags
    with an overall skb->len larger than the linear part of the netlink
    skb.

    skb_zerocopy(), which is generic and thus not aware of the fact that
    shared info cannot be accessed for such skbs then tries to write and
    fill frags, thus leaking kernel data/pointers and in some corner cases
    possibly writing out of bounds of the mmap area (when filling the
    last slot in the ring buffer this way).

    I.e. the ring buffer slot is then of status NL_MMAP_STATUS_VALID, has
    an advertised length larger than 4096, where the linear part is visible
    at the slot beginning, and the leaked sizeof(struct skb_shared_info)
    has been written to the beginning of the next slot (also corrupting
    the struct nl_mmap_hdr slot header incl. status etc), since skb->end
    points to skb->data + ring->frame_size - NL_MMAP_HDRLEN.

    The fix adds and lets __netlink_alloc_skb() take the actual needed
    linear room for the network skb + meta data into account. It's completely
    irrelevant for non-mmaped netlink sockets, but in case mmap sockets
    are used, it can be decided whether the available skb_tailroom() is
    really large enough for the buffer, or whether it needs to internally
    fallback to a normal alloc_skb().

    From nf queue side, the information whether the destination port is
    an mmap RX ring is not really available without extra port-to-socket
    lookup, thus it can only be determined in lower layers i.e. when
    __netlink_alloc_skb() is called that checks internally for this. I
    chose to add the extra ldiff parameter as mmap will then still work:
    We have data_len and hlen in nfqnl_build_packet_message(), data_len
    is the full length (capped at queue->copy_range) for skb_zerocopy()
    and hlen some possible part of data_len that needs to be copied; the
    rem_len variable indicates the needed remaining linear mmap space.

    The only other workaround in nf queue internally would be after
    allocation time by f.e. cap'ing the data_len to the skb_tailroom()
    iff we deal with an mmap skb, but that would 1) expose the fact that
    we use a mmap skb to upper layers, and 2) trim the skb where we
    otherwise could just have moved the full skb into the normal receive
    queue.

    After the patch, in my test case the ring slot doesn't fit and therefore
    shows NL_MMAP_STATUS_COPY, where a full skb carries all the data and
    thus needs to be picked up via recv().

    Fixes: 3ab1f683bf8b ("nfnetlink: add support for memory mapped netlink")
    Signed-off-by: Daniel Borkmann
    Signed-off-by: David S. Miller

    Daniel Borkmann
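
The allocation decision described in this commit can be modelled in a few lines of user-space C. This is only a sketch under assumptions: `SKETCH_MMAP_HDRLEN` is a placeholder for the kernel's `NL_MMAP_HDRLEN`, the function names are hypothetical, and the exact comparison inside `__netlink_alloc_skb()` may differ. It shows that a ring frame is used only when the requested linear size plus the extra `ldiff` room fits behind the slot header; otherwise the caller would fall back to a normal `alloc_skb()`.

```c
#include <stdbool.h>
#include <stddef.h>

/* Placeholder for the kernel's NL_MMAP_HDRLEN; the real value depends on
 * struct nl_mmap_hdr and its alignment (assumption for this sketch). */
#define SKETCH_MMAP_HDRLEN 16u

/* Linear room a ring frame offers once the slot header is accounted for. */
static size_t sketch_frame_tailroom(size_t frame_size)
{
    return frame_size > SKETCH_MMAP_HDRLEN ? frame_size - SKETCH_MMAP_HDRLEN : 0;
}

/*
 * Decide whether a message may be built inside the ring frame.
 *   size  - linear bytes requested for the netlink message
 *   ldiff - additional linear bytes still needed for the packet payload
 * Returning false models the fallback to a regular skb, which user space
 * then sees as NL_MMAP_STATUS_COPY and picks up via recv().
 */
static bool sketch_use_ring_frame(size_t frame_size, size_t size, size_t ldiff)
{
    return size + ldiff <= sketch_frame_tailroom(frame_size);
}
```

With a 4096-byte frame, a 1000-byte message with 4000 bytes of remaining payload does not fit, so the sketch selects the fallback path, which matches the test case described in the commit message.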
     

10 May, 2015

1 commit

  • More accurately, listen all netns that have a nsid assigned into the netns
    where the netlink socket is opened.
    For this purpose, a netlink socket option is added:
    NETLINK_LISTEN_ALL_NSID. When this option is set on a netlink socket, this
    socket will receive netlink notifications from all netns that have a nsid
    assigned into the netns where the socket has been opened. The nsid is sent
    to userland via ancillary data.

    With this patch, a daemon needs only one socket to listen to many netns. This
    is useful when the number of netns is high.

    Because 0 is a valid value for a nsid, the field nsid_is_set indicates if
    the field nsid is valid or not. skb->cb is initialized to 0 on skb
    allocation, thus we are sure that we will never send a nsid 0 by error to
    the userland.

    Signed-off-by: Nicolas Dichtel
    Acked-by: Thomas Graf
    Signed-off-by: David S. Miller

    Nicolas Dichtel
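
A minimal user-space sketch of how a daemon might use this option. The helper names are hypothetical; the fallback `#define` values are the ones I understand the kernel UAPI header `linux/netlink.h` to use, supplied only in case the installed headers lack them. The first helper enables the option, the second scans the ancillary data returned by `recvmsg()` for the nsid.

```c
#include <string.h>
#include <sys/socket.h>

/* Fallbacks in case the libc headers do not provide these (assumed values
 * taken from the kernel UAPI header linux/netlink.h). */
#ifndef SOL_NETLINK
#define SOL_NETLINK 270
#endif
#ifndef NETLINK_LISTEN_ALL_NSID
#define NETLINK_LISTEN_ALL_NSID 8
#endif

/* Enable the option on an already opened netlink socket.
 * Returns 0 on success, -1 with errno set on failure. */
static int sketch_listen_all_nsid(int fd)
{
    int one = 1;

    return setsockopt(fd, SOL_NETLINK, NETLINK_LISTEN_ALL_NSID,
                      &one, sizeof(one));
}

/* Scan the control messages of a received msghdr.
 * Returns 1 and stores the nsid if one was attached, 0 otherwise. */
static int sketch_find_nsid(struct msghdr *msg, int *nsid)
{
    struct cmsghdr *cmsg;

    for (cmsg = CMSG_FIRSTHDR(msg); cmsg; cmsg = CMSG_NXTHDR(msg, cmsg)) {
        if (cmsg->cmsg_level == SOL_NETLINK &&
            cmsg->cmsg_type == NETLINK_LISTEN_ALL_NSID &&
            cmsg->cmsg_len >= CMSG_LEN(sizeof(int))) {
            memcpy(nsid, CMSG_DATA(cmsg), sizeof(int));
            return 1;
        }
    }
    return 0;
}
```

A return value of 0 from `sketch_find_nsid()` corresponds to the "nsid not set" case, which is why the caller must not treat a missing control message as nsid 0.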
     

14 Apr, 2015

1 commit


27 Dec, 2014

1 commit

  • Netlink families can exist in multiple namespaces, and for the most
    part multicast subscriptions are per network namespace. Thus it only
    makes sense to have bind/unbind notifications per network namespace.

    To achieve this, pass the network namespace of a given client socket
    to the bind/unbind functions.

    Also do this in generic netlink, and there also make sure that any
    bind for multicast groups that only exist in init_net is rejected.
    This isn't really a problem if it is accepted since a client in a
    different namespace will never receive any notifications from such
    a group, but it can confuse the family if not rejected (it's also
    possible to silently (without telling the family) accept it, but it
    would also have to be ignored on unbind so families that take any
    kind of action on bind/unbind won't do unnecessary work for invalid
    clients like that).

    Signed-off-by: Johannes Berg
    Signed-off-by: David S. Miller

    Johannes Berg
     

04 Jun, 2014

1 commit


03 Jun, 2014

1 commit

  • It was possible to get a setuid root or setcap executable to write to
    its stdout or stderr (which has been made a netlink socket) and
    inadvertently reconfigure the networking stack.

    To prevent this we check that both the creator of the socket and
    the current application have permission to reconfigure the network
    stack.

    Unfortunately this breaks Zebra which always uses sendto/sendmsg
    and creates its socket without any privileges.

    To keep Zebra working don't bother checking if the creator of the
    socket has privilege when a destination address is specified. Instead
    rely exclusively on the privileges of the sender of the socket.

    Note from Andy: This is exactly Eric's code except for some comment
    clarifications and formatting fixes. Neither I nor, I think, anyone
    else is thrilled with this approach, but I'm hesitant to wait on a
    better fix since 3.15 is almost here.

    Note to stable maintainers: This is a mess. An earlier series of
    patches in 3.15 fix a rather serious security issue (CVE-2014-0181),
    but they did so in a way that breaks Zebra. The offending series
    includes:

    commit aa4cf9452f469f16cea8c96283b641b4576d4a7b
    Author: Eric W. Biederman
    Date: Wed Apr 23 14:28:03 2014 -0700

    net: Add variants of capable for use on netlink messages

    If a given kernel version is missing that series of fixes, it's
    probably worth backporting it and this patch. If that series is
    present, then this fix is critical if you care about Zebra.

    Cc: stable@vger.kernel.org
    Signed-off-by: "Eric W. Biederman"
    Signed-off-by: Andy Lutomirski
    Signed-off-by: David S. Miller

    Eric W. Biederman
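
The resulting rule can be written as a small truth function. This is a simplified user-space model, not the kernel code: the struct and function names are hypothetical, and the three booleans stand in for the real capability lookups on the socket's opener and on the sending task.

```c
#include <stdbool.h>

/* Hypothetical, flattened view of what the check looks at. */
struct sketch_msg {
    bool dst_specified;   /* sent with an explicit destination address */
    bool opener_capable;  /* creator of the socket held the capability */
    bool sender_capable;  /* task sending the message holds the capability */
};

/*
 * The sender must always be privileged. The creator of the socket is
 * checked only when no destination address was given, which is the
 * write()-to-inherited-stdout case the original fix was aimed at.
 */
static bool sketch_netlink_capable(const struct sketch_msg *m)
{
    return (m->dst_specified || m->opener_capable) && m->sender_capable;
}
```

Under this model a setuid program tricked into `write()` on an unprivileged opener's socket is refused, while Zebra, which opens its socket unprivileged but sends with `sendto`/`sendmsg` and a destination address, is accepted as long as the sender is privileged.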
     

13 May, 2014

1 commit

  • Conflicts:
    drivers/net/ethernet/altera/altera_sgdma.c
    net/netlink/af_netlink.c
    net/sched/cls_api.c
    net/sched/sch_api.c

    The netlink conflict dealt with moving to netlink_capable() and
    netlink_ns_capable() in the 'net' tree vs. supporting 'tc' operations
    in non-init namespaces. These were simple transformations from
    netlink_capable to netlink_ns_capable.

    The Altera driver conflict was simply code removal overlapping some
    void pointer cast cleanups in net-next.

    Signed-off-by: David S. Miller

    David S. Miller
     

25 Apr, 2014

1 commit

  • netlink_net_capable - The common case use, for operations that are safe on a network namespace
    netlink_capable - For operations that are only known to be safe for the global root
    netlink_ns_capable - The general case of capable used to handle special cases

    __netlink_ns_capable - Same as netlink_ns_capable except taking a netlink_skb_parms instead of
    the skbuff of a netlink message.

    Signed-off-by: "Eric W. Biederman"
    Signed-off-by: David S. Miller

    Eric W. Biederman
     

23 Apr, 2014

1 commit

  • Have the netlink per-protocol optional bind function return an int error code
    rather than void to signal a failure.

    This will enable netlink protocols to perform extra checks including
    capabilities and permissions verifications when updating memberships in
    multicast groups.

    In netlink_bind() and netlink_setsockopt() the call to the per-protocol bind
    function was moved above the multicast group update to prevent any access to
    the multicast socket groups before checking with the per-protocol bind
    function. This will enable the per-protocol bind function to be used to check
    permissions which could be denied before making them available, and to avoid
    the messy job of undoing the addition should the per-protocol bind function
    fail.

    The netfilter subsystem seems to be the only one currently using the
    per-protocol bind function.

    Signed-off-by: Richard Guy Briggs
    Signed-off-by: David S. Miller

    Richard Guy Briggs
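
The ordering argument above (ask the per-protocol hook first, update memberships only on success) can be illustrated with a toy model. All names here are hypothetical and the group bookkeeping is reduced to a single bitmask; the point is only that a failing hook leaves nothing to undo.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical per-protocol hook: 0 to allow, negative errno to deny. */
typedef int (*sketch_bind_fn)(int group);

struct sketch_sock {
    unsigned long groups;   /* bitmask of subscribed multicast groups */
};

/* Consult the protocol first; touch the subscription mask only on success. */
static int sketch_join_group(struct sketch_sock *sk, int group,
                             sketch_bind_fn bind)
{
    int err;

    if (group < 1 || group > (int)(8 * sizeof(sk->groups)))
        return -EINVAL;

    if (bind) {
        err = bind(group);
        if (err)
            return err;     /* nothing was changed, nothing to roll back */
    }

    sk->groups |= 1UL << (group - 1);
    return 0;
}

/* Example hook that refuses group 2, e.g. after a permission check. */
static int sketch_deny_group2(int group)
{
    return group == 2 ? -EPERM : 0;
}
```

Had the mask been updated before calling the hook, the error path would need to clear the bit again, which is the "messy job of undoing the addition" the commit message avoids.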
     

02 Jan, 2014

1 commit


28 Jun, 2013

1 commit

  • Since (c05cdb1 netlink: allow large data transfers from user-space),
    netlink splats if it invokes skb_clone on large netlink skbs since:

    * skb_shared_info was not correctly initialized.
    * skb->destructor is not set in the cloned skb.

    This was spotted by trinity:

    [ 894.990671] BUG: unable to handle kernel paging request at ffffc9000047b001
    [ 894.991034] IP: [] skb_clone+0x24/0xc0
    [...]
    [ 894.991034] Call Trace:
    [ 894.991034] [] nl_fib_input+0x6a/0x240
    [ 894.991034] [] ? _raw_read_unlock+0x26/0x40
    [ 894.991034] [] netlink_unicast+0x169/0x1e0
    [ 894.991034] [] netlink_sendmsg+0x251/0x3d0

    Fix it by:

    1) introducing a new netlink_skb_clone function that is used in nl_fib_input,
    that sets our special skb->destructor in the cloned skb. Moreover, handle
    the release of the large cloned skb head area in the destructor path.

    2) not allowing large skbuffs in the netlink broadcast path. I cannot find
    any reasonable use of the large data transfer using netlink in that path,
    moreover this helps to skip extra skb_clone handling.

    I found two more netlink clients that are cloning the skbs, but they are
    not in the sendmsg path. Therefore, the sole client cloning that I found
    seems to be the fib frontend.

    Thanks to Eric Dumazet for helping to address this issue.

    Reported-by: Fengguang Wu
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: David S. Miller

    Pablo Neira
     

25 Jun, 2013

1 commit

  • Similarly to the networking receive path with ptype_all taps, we add
    the possibility to register netdevices that are for ARPHRD_NETLINK to
    the netlink subsystem, so that those can be used for netlink analyzers
    resp. debuggers. We do not offer a direct callback function as out-of-tree
    modules could do crap with it. Instead, a netdevice must be registered
    properly and only receives a clone, managed by the netlink layer. Symbols
    are exported as GPL-only.

    Signed-off-by: Daniel Borkmann
    Signed-off-by: David S. Miller

    Daniel Borkmann
     

11 Jun, 2013

1 commit

  • As we know, netlink sockets are a private resource of a
    net namespace; they can communicate with each other
    only when they are in the same net namespace. This works
    well until we try to add namespace support for other
    subsystems which use netlink.

    Unlike ipv4 and the route table, some subsystems are not
    suited to belong to a net namespace. The audit and crypto
    subsystems, for example, are more suitable to a
    user namespace.

    So we must have the ability to let netlink sockets
    in the same user namespace communicate with each other.

    This patch adds a new function pointer "compare" to
    netlink_table; whether netlink sockets can communicate
    with each other is then decided by this self-defined
    compare function.

    The behavior isn't changed if we don't provide the compare
    function for netlink_table.

    Signed-off-by: Gao feng
    Acked-by: Serge E. Hallyn
    Signed-off-by: David S. Miller

    Gao feng
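
A rough user-space model of the optional compare hook. The types and names are hypothetical and namespaces are reduced to integer ids; the sketch only shows the dispatch: when no hook is supplied, the default net-namespace comparison applies, so behaviour is unchanged.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a netlink socket; namespaces are plain ids. */
struct sketch_sock {
    int net_ns;
};

typedef bool (*sketch_compare_fn)(int net_ns, const struct sketch_sock *sk);

/* Hypothetical stand-in for the per-protocol table entry. */
struct sketch_table {
    sketch_compare_fn compare;   /* NULL means "use the default" */
};

/* Default rule: sockets match only inside the same net namespace. */
static bool sketch_compare_net(int net_ns, const struct sketch_sock *sk)
{
    return sk->net_ns == net_ns;
}

/* Example custom rule for a subsystem that is not tied to a net
 * namespace: accept the socket regardless of its net namespace. */
static bool sketch_compare_any(int net_ns, const struct sketch_sock *sk)
{
    (void)net_ns;
    (void)sk;
    return true;
}

/* Lookup-time check: use the table's hook if present, else the default. */
static bool sketch_match(const struct sketch_table *t, int net_ns,
                         const struct sketch_sock *sk)
{
    sketch_compare_fn cmp = t->compare ? t->compare : sketch_compare_net;

    return cmp(net_ns, sk);
}
```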
     

20 Apr, 2013

3 commits

  • Add support for mmap'ed recvmsg(). To allow the kernel to construct messages
    into the mapped area, a dataless skb is allocated and the data pointer is
    set to point into the ring frame. This means frames will be delivered to
    userspace in order of allocation instead of order of transmission. This
    usually doesn't matter since the order is either not determinable by
    userspace or message creation/transmission is serialized. The only case
    where this can have a visible difference is nfnetlink_queue. Userspace
    can't assume mmap'ed messages have ordered IDs anymore and needs to check
    this if using batched verdicts.

    For non-mapped sockets, nothing changes.

    Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Patrick McHardy
     
  • Add helper functions for looking up mmap'ed frame headers, reading and
    writing their status, allocating skbs with mmap'ed data areas and a poll
    function.

    Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Patrick McHardy
     
  • Memory mapped netlink needs to store the receiving userspace socket
    when sending from the kernel to userspace. Rename 'ssk' to 'sk' to
    avoid confusion.

    Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Patrick McHardy
     

13 Oct, 2012

1 commit


07 Oct, 2012

1 commit

  • I get a panic when I use ss -a and rmmod inet_diag at the
    same time.

    It's because netlink_dump uses inet_diag_dump which belongs to module
    inet_diag.

    I searched the code and found that many modules have the same problem. We
    need to add a reference to the module which the cb->dump belongs to.

    Thanks for all help from Stephen, Jan, Eric, Steffen and Pablo.

    Change From v3:
    change netlink_dump_start to inline, suggestion from Pablo and
    Eric.

    Change From v2:
    delete netlink_dump_done, and call module_put in netlink_dump
    and netlink_sock_destruct.

    Signed-off-by: Gao feng
    Signed-off-by: David S. Miller

    Gao feng
     

23 Sep, 2012

1 commit


22 Sep, 2012

1 commit

  • Since (9f00d97 netlink: hide struct module parameter in netlink_kernel_create),
    linux/netlink.h includes linux/module.h because of the use of THIS_MODULE.

    Use linux/export.h instead, as suggested by Stephen Rothwell, which is
    significantly smaller and defines THIS_MODULE.

    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: David S. Miller

    Pablo Neira Ayuso
     

11 Sep, 2012

1 commit

  • It is a frequent mistake to confuse the netlink port identifier with a
    process identifier. Try to reduce this confusion by renaming fields
    that hold port identifiers portid instead of pid.

    I have carefully avoided changing the structures exported to
    userspace to avoid changing the userspace API.

    I have successfully built an allyesconfig kernel with this change.

    Signed-off-by: "Eric W. Biederman"
    Acked-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Eric W. Biederman
     

09 Sep, 2012

2 commits

  • This patch defines netlink_kernel_create as a wrapper function of
    __netlink_kernel_create to hide the struct module *me parameter
    (which seems to be THIS_MODULE in all existing netlink subsystems).

    Suggested by David S. Miller.

    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: David S. Miller

    Pablo Neira Ayuso
     
  • Replace netlink_set_nonroot by one new field `flags' in
    struct netlink_kernel_cfg that is passed to netlink_kernel_create.

    This patch also renames NL_NONROOT_* to NL_CFG_F_NONROOT_* since
    now the flags field in nl_table is generic (so we can add more
    flags if needed in the future).

    Also adjust all callers in the net-next tree to use these flags
    instead of netlink_set_nonroot.

    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: David S. Miller

    Pablo Neira Ayuso
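
A small sketch of the flag-based configuration. The two flag values are the ones I understand `include/linux/netlink.h` to define; the struct is a trimmed, hypothetical stand-in that keeps only the `flags` field, and the helper name is made up for illustration.

```c
#include <stdbool.h>

/* Flag values as I understand them from include/linux/netlink.h. */
#define NL_CFG_F_NONROOT_RECV (1 << 0)
#define NL_CFG_F_NONROOT_SEND (1 << 1)

/* Trimmed stand-in for struct netlink_kernel_cfg: only the new field. */
struct sketch_cfg {
    unsigned int flags;
};

/* True if the given non-root operation is permitted by the configuration. */
static bool sketch_nonroot_allowed(const struct sketch_cfg *cfg,
                                   unsigned int flag)
{
    return (cfg->flags & flag) != 0;
}
```

Because the field is a generic bitmask, a later flag can be added without touching the signature of `netlink_kernel_create`, which is the extensibility argument the commit message makes.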
     

08 Sep, 2012

1 commit

  • Passing uids and gids on NETLINK_CB from a process in one user
    namespace to a process in another user namespace can result in the
    wrong uid or gid being presented to userspace. Avoid that problem by
    passing kuids and kgids instead.

    - define struct scm_creds for use in scm_cookie and netlink_skb_parms
    that holds uid and gid information in kuid_t and kgid_t.

    - Modify scm_set_cred to fill out scm_creds by hand instead of using
    cred_to_ucred to fill out struct ucred. This conversion ensures
    userspace does not get incorrect uid or gid values to look at.

    - Modify scm_recv to convert from struct scm_creds to struct ucred
    before copying credential values to userspace.

    - Modify __scm_send to populate struct scm_creds in the scm_cookie,
    instead of just copying struct ucred from userspace.

    - Modify netlink_sendmsg to copy scm_creds instead of struct ucred
    into the NETLINK_CB.

    Signed-off-by: "Eric W. Biederman"
    Signed-off-by: David S. Miller

    Eric W. Biederman
     

15 Aug, 2012

1 commit

  • The sending socket of an skb is already available by its port id
    in the NETLINK_CB. If you want to know more, such as examining the
    credentials on the sending socket, you have to look up the sending
    socket by its port id, and all of the needed functions and data
    structures are static inside of af_netlink.c. So do the simple
    thing and pass the sending socket to the receivers in the NETLINK_CB.

    I intend to use this to get the user namespace of the sending socket
    in inet_diag so that I can report uids in the context of the process
    who opened the socket, the same way I report uids in the context
    of the process who opens files.

    Acked-by: David S. Miller
    Acked-by: Serge Hallyn
    Signed-off-by: Eric W. Biederman

    Eric W. Biederman
     

30 Jun, 2012

2 commits

  • This patch adds a hook in the binding path of netlink.

    This is used by ctnetlink to allow module autoloading for the case
    in which one user executes:

    conntrack -E

    So far, this resulted in nfnetlink loaded, but not
    nf_conntrack_netlink.

    I have received many complaints about this behaviour in the past.

    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: David S. Miller

    Pablo Neira Ayuso
     
  • This patch adds the following structure:

    struct netlink_kernel_cfg {
            unsigned int    groups;
            void            (*input)(struct sk_buff *skb);
            struct mutex    *cb_mutex;
    };

    That can be passed to netlink_kernel_create to set optional configurations
    for netlink kernel sockets.

    I've populated this structure by looking for NULL and zero parameters at the
    existing code. The remaining parameters that always need to be set are still
    left in the original interface.

    That includes optional parameters for the netlink socket creation. This allows
    easy extensibility of this interface in the future.

    This patch also adapts all callers to use this new interface.

    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: David S. Miller

    Pablo Neira Ayuso
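
The "optional parameters in a struct" pattern can be sketched outside the kernel as follows. Everything here is a hypothetical model: the types are stand-ins, `cb_mutex` is left out for brevity, and the minimum group count of 32 is an assumption about the defaults rather than something stated in the commit message.

```c
#include <stddef.h>

struct sketch_skb;   /* opaque stand-in for struct sk_buff */

/* Optional settings, mirroring the shape of struct netlink_kernel_cfg. */
struct sketch_kernel_cfg {
    unsigned int groups;
    void (*input)(struct sketch_skb *skb);
};

/* Hypothetical result of creating a kernel-side socket. */
struct sketch_ksock {
    unsigned int groups;
    void (*input)(struct sketch_skb *skb);
};

#define SKETCH_MIN_GROUPS 32u   /* assumed default, see lead-in */

/* A NULL cfg means "all defaults", so callers with nothing special to
 * configure do not have to spell out a row of NULL and zero arguments. */
static void sketch_kernel_create(struct sketch_ksock *ks,
                                 const struct sketch_kernel_cfg *cfg)
{
    ks->groups = SKETCH_MIN_GROUPS;
    ks->input = NULL;

    if (cfg) {
        if (cfg->groups > ks->groups)
            ks->groups = cfg->groups;
        ks->input = cfg->input;
    }
}
```

New optional settings then become new struct members, and existing callers that zero-initialise the struct keep compiling unchanged.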
     

27 Jun, 2012

1 commit


09 May, 2012

1 commit

  • This patch removes ip_queue support which was marked as obsolete
    years ago. The nfnetlink_queue module provides a more advanced
    user-space packet queueing mechanism.

    This patch also removes capability code included in SELinux that
    refers to ip_queue. Otherwise, we break compilation.

    Several warnings have been sent regarding this to the mailing list
    in the past month without anyone raising a hand to stop this
    with some strong argument.

    Signed-off-by: Pablo Neira Ayuso

    Pablo Neira Ayuso
     

27 Feb, 2012

2 commits

  • This patch allows you to pass a data pointer that can be
    accessed from the dump callback.

    Netfilter is going to use this patch to provide filtered dumps
    to user-space. This is specifically interesting in ctnetlink that
    may handle lots of conntrack entries. We can save precious
    cycles by skipping the conversion to TLV format of conntrack
    entries that are not interesting for user-space.

    More specifically, ctnetlink will include one operation that allows
    filtering the dumping of conntrack entries by ctmark values.

    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: David S. Miller

    Pablo Neira Ayuso
     
  • Davem considers that the argument list of this interface is getting
    out of control. This patch tries to address this issue following
    his proposal:

    struct netlink_dump_control c = { .dump = dump, .done = done, ... };

    netlink_dump_start(..., &c);

    Suggested by David S. Miller.

    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: David S. Miller

    Pablo Neira Ayuso
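
The control-struct pattern from the two commits above can be tried in plain C. This is a toy model with hypothetical names: the real `netlink_dump_start()` runs the dump callback once per receive call rather than in a loop, and the `data` member mirrors the data pointer introduced by the other commit in this group.

```c
#include <stddef.h>

struct sketch_cb;

/* Bundles the callbacks instead of growing the argument list. */
struct sketch_dump_control {
    int (*dump)(struct sketch_cb *cb);   /* > 0 while more data remains */
    int (*done)(struct sketch_cb *cb);   /* optional cleanup, may be NULL */
    void *data;                          /* opaque pointer for the callbacks */
};

struct sketch_cb {
    struct sketch_dump_control ctl;
    int pos;                             /* progress kept between rounds */
};

/* Copy the control block, then drive the dump to completion.
 * Returns the number of rounds that produced data. */
static int sketch_dump_start(struct sketch_cb *cb,
                             const struct sketch_dump_control *c)
{
    int rounds = 0;

    cb->ctl = *c;
    cb->pos = 0;
    while (cb->ctl.dump(cb) > 0)
        rounds++;
    if (cb->ctl.done)
        cb->ctl.done(cb);
    return rounds;
}

/* Example dump callback: emit one entry per round until the limit that
 * was handed in through ctl.data is reached. */
static int sketch_dump_upto(struct sketch_cb *cb)
{
    const int *limit = cb->ctl.data;

    if (cb->pos >= *limit)
        return 0;
    cb->pos++;
    return 1;
}
```

A caller fills the struct with designated initializers, as in the commit message, and any member left out is simply zero.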
     

31 Jan, 2012

1 commit

  •    text    data     bss      dec    hex filename
    8455963  532732 1810804 10799499 a4c98b vmlinux.o.before
    8448899  532732 1810804 10792435 a4adf3 vmlinux.o

    This change also removes commented-out copy of __nlmsg_put
    which was last touched in 2005 with "Enable once all users
    have been converted" comment on top.

    Changes in v2: rediffed against net-next.

    Signed-off-by: Denys Vlasenko
    Signed-off-by: David S. Miller

    Denys Vlasenko
     

07 Dec, 2011

1 commit

  • The ultimate goal is to get the sock_diag module, that works in
    family+protocol terms. Currently this is suitable to do on the
    inet_diag basis, so rename parts of the code. It will be moved
    to sock_diag.c later.

    Signed-off-by: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Pavel Emelyanov
     

02 Nov, 2011

1 commit

  • * git://github.com/herbertx/crypto: (48 commits)
    crypto: user - Depend on NET instead of selecting it
    crypto: user - Add dependency on NET
    crypto: talitos - handle descriptor not found in error path
    crypto: user - Initialise match in crypto_alg_match
    crypto: testmgr - add twofish tests
    crypto: testmgr - add blowfish test-vectors
    crypto: Make hifn_795x build depend on !ARCH_DMA_ADDR_T_64BIT
    crypto: twofish-x86_64-3way - fix ctr blocksize to 1
    crypto: blowfish-x86_64 - fix ctr blocksize to 1
    crypto: whirlpool - count rounds from 0
    crypto: Add userspace report for compress type algorithms
    crypto: Add userspace report for cipher type algorithms
    crypto: Add userspace report for rng type algorithms
    crypto: Add userspace report for pcompress type algorithms
    crypto: Add userspace report for nivaead type algorithms
    crypto: Add userspace report for aead type algorithms
    crypto: Add userspace report for givcipher type algorithms
    crypto: Add userspace report for ablkcipher type algorithms
    crypto: Add userspace report for blkcipher type algorithms
    crypto: Add userspace report for ahash type algorithms
    ...

    Linus Torvalds
     

21 Oct, 2011

1 commit


27 Aug, 2011

1 commit


08 Aug, 2011

1 commit

  • Currently userland will barf when including linux/netlink.h unless it
    precisely includes sys/socket.h first. The issue is where the
    definition of "sa_family_t" comes from.

    We've been back and forth on how to fix this issue in the past, see:

    http://thread.gmane.org/gmane.linux.debian.devel.bugs.general/622621
    http://thread.gmane.org/gmane.linux.network/143380

    Ben Hutchings suggested we take a hint from how we handle the
    sockaddr_storage type. First we add a "__kernel_sa_family_t"
    to linux/socket.h that is always defined.

    Then if __KERNEL__ is defined, we also define "sa_family_t" as
    equal to "__kernel_sa_family_t".

    Then in places like linux/netlink.h we use __kernel_sa_family_t
    in user visible datastructures.

    Reported-by: Michel Machado
    Signed-off-by: David S. Miller

    David S. Miller
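
The three steps can be pictured with the following sketch. The `sketch_` prefixed names are hypothetical stand-ins chosen so the block does not clash with real system headers; the field layout follows my understanding of `struct sockaddr_nl`, with the family field using the always-available kernel typedef.

```c
#include <stdint.h>

/* Step 1: a typedef that is always defined (stands in for
 * __kernel_sa_family_t). */
typedef unsigned short sketch_kernel_sa_family_t;

/* Step 2: only kernel builds also get the short name (SKETCH_KERNEL
 * stands in for __KERNEL__). */
#ifdef SKETCH_KERNEL
typedef sketch_kernel_sa_family_t sketch_sa_family_t;
#endif

/* Step 3: user-visible structures use the always-defined typedef, so
 * the header no longer depends on sys/socket.h being included first. */
struct sketch_sockaddr_nl {
    sketch_kernel_sa_family_t nl_family;  /* AF_NETLINK */
    unsigned short            nl_pad;     /* zero */
    uint32_t                  nl_pid;     /* port ID */
    uint32_t                  nl_groups;  /* multicast groups mask */
};
```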
     

25 Jun, 2011

1 commit