12 Apr, 2014

1 commit

  • Several spots in the kernel perform a sequence like:

    skb_queue_tail(&sk->s_receive_queue, skb);
    sk->sk_data_ready(sk, skb->len);

    But at the moment we place the SKB onto the socket receive queue it
    can be consumed and freed up. So this skb->len access is potentially
    to freed up memory.

    Furthermore, the skb->len can be modified by the consumer so it is
    possible that the value isn't accurate.

    And finally, no actual implementation of this callback actually uses
    the length argument. And since nobody actually cared about it's
    value, lots of call sites pass arbitrary values in such as '0' and
    even '1'.

    So just remove the length argument from the callback, that way there
    is no confusion whatsoever and all of these use-after-free cases get
    fixed as a side effect.

    Based upon a patch by Eric Dumazet and his suggestion to audit this
    issue tree-wide.

    Signed-off-by: David S. Miller

    David S. Miller
     

11 Mar, 2014

1 commit

  • One known problem with netlink is the fact that NLMSG_GOODSIZE is
    really small on PAGE_SIZE==4096 architectures, and it is difficult
    to know in advance what buffer size is used by the application.

    This patch adds an automatic learning of the size.

    First netlink message will still be limited to ~4K, but if user used
    bigger buffers, then following messages will be able to use up to 16KB.

    This speedups dump() operations by a large factor and should be safe
    for legacy applications.

    Signed-off-by: Eric Dumazet
    Cc: Thomas Graf
    Acked-by: Thomas Graf
    Signed-off-by: David S. Miller

    Eric Dumazet
     

06 Mar, 2014

1 commit

  • Conflicts:
    drivers/net/wireless/ath/ath9k/recv.c
    drivers/net/wireless/mwifiex/pcie.c
    net/ipv6/sit.c

    The SIT driver conflict consists of a bug fix being done by hand
    in 'net' (missing u64_stats_init()) whilst in 'net-next' a helper
    was created (netdev_alloc_pcpu_stats()) which takes care of this.

    The two wireless conflicts were overlapping changes.

    Signed-off-by: David S. Miller

    David S. Miller
     

26 Feb, 2014

1 commit

  • netlink_sendmsg() was changed to prevent non-root processes from sending
    messages with dst_pid != 0.
    netlink_connect() however still only checks if nladdr->nl_groups is set.
    This patch modifies netlink_connect() to check for the same condition.

    Signed-off-by: Mike Pecovnik
    Signed-off-by: David S. Miller

    Mike Pecovnik
     

18 Feb, 2014

1 commit


19 Jan, 2014

1 commit

  • This is a follow-up patch to f3d3342602f8bc ("net: rework recvmsg
    handler msg_name and msg_namelen logic").

    DECLARE_SOCKADDR validates that the structure we use for writing the
    name information to is not larger than the buffer which is reserved
    for msg->msg_name (which is 128 bytes). Also use DECLARE_SOCKADDR
    consistently in sendmsg code paths.

    Signed-off-by: Steffen Hurrle
    Suggested-by: Hannes Frederic Sowa
    Acked-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller

    Steffen Hurrle
     

07 Jan, 2014

3 commits

  • Jesse Gross says:

    ====================
    [GIT net-next] Open vSwitch

    Open vSwitch changes for net-next/3.14. Highlights are:
    * Performance improvements in the mechanism to get packets to userspace
    using memory mapped netlink and skb zero copy where appropriate.
    * Per-cpu flow stats in situations where flows are likely to be shared
    across CPUs. Standard flow stats are used in other situations to save
    memory and allocation time.
    * A handful of code cleanups and rationalization.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • An insufficent ring frame size configuration can lead to an
    unnecessary skb allocation for every Netlink message. Check frame
    size before taking the queue lock and allocating the skb and
    re-check with lock to be safe.

    Signed-off-by: Thomas Graf
    Reviewed-by: Daniel Borkmann
    Signed-off-by: Jesse Gross

    Thomas Graf
     
  • Allocates a new sk_buff large enough to cover the specified payload
    plus required Netlink headers. Will check receiving socket for
    memory mapped i/o capability and use it if enabled. Will fall back
    to non-mapped skb if message size exceeds the frame size of the ring.

    Signed-of-by: Thomas Graf
    Reviewed-by: Daniel Borkmann
    Signed-off-by: Jesse Gross

    Thomas Graf
     

02 Jan, 2014

1 commit


01 Jan, 2014

2 commits

  • In order to facilitate development for netlink protocol dissector,
    fill the unused field skb->pkt_type of the cloned skb with a hint
    of the address space of the new owner (receiver) socket in the
    notion of "to kernel" resp. "to user".

    At the time we invoke __netlink_deliver_tap_skb(), we already have
    set the new skb owner via netlink_skb_set_owner_r(), so we can use
    that for netlink_is_kernel() probing.

    In normal PF_PACKET network traffic, this field denotes if the
    packet is destined for us (PACKET_HOST), if it's broadcast
    (PACKET_BROADCAST), etc.

    As we only have 3 bit reserved, we can use the value (= 6) of
    PACKET_FASTROUTE as it's _not used_ anywhere in the whole kernel
    and not supported anywhere, and packets of such type were never
    exposed to user space, so there are no overlapping users of such
    kind. Thus, as wished, that seems the only way to make both
    PACKET_* values non-overlapping and therefore device agnostic.

    By using those two flags for netlink skbs on nlmon devices, they
    can be made available and picked up via sll_pkttype (previously
    unused in netlink context) in struct sockaddr_ll. We now have
    these two directions:

    - PACKET_USER (= 6) -> to user space
    - PACKET_KERNEL (= 7) -> to kernel space

    Partial `ip a` example strace for sa_family=AF_NETLINK with
    detected nl msg direction:

    syscall: direction:
    sendto(3, ...) = 40 /* to kernel */
    recvmsg(3, ...) = 3404 /* to user */
    recvmsg(3, ...) = 1120 /* to user */
    recvmsg(3, ...) = 20 /* to user */
    sendto(3, ...) = 40 /* to kernel */
    recvmsg(3, ...) = 168 /* to user */
    recvmsg(3, ...) = 144 /* to user */
    recvmsg(3, ...) = 20 /* to user */

    Signed-off-by: Daniel Borkmann
    Signed-off-by: Jakub Zawadzki
    Signed-off-by: David S. Miller

    Daniel Borkmann
     
  • We should also deliver packets to nlmon devices when we are in
    netlink_unicast_kernel(), and only one of the {src,dst} sockets
    is user sk and the other one kernel sk. That's e.g. the case in
    netlink diag, netlink route, etc. Still, forbid to deliver messages
    from kernel to kernel sks.

    Signed-off-by: Daniel Borkmann
    Signed-off-by: Jakub Zawadzki
    Signed-off-by: David S. Miller

    Daniel Borkmann
     

29 Nov, 2013

2 commits

  • The pmcraid driver is abusing the genetlink API and is using its
    family ID as the multicast group ID, which is invalid and may
    belong to somebody else (and likely will.)

    Make it use the correct API, but since this may already be used
    as-is by userspace, reserve a family ID for this code and also
    reserve that group ID to not break userspace assumptions.

    My previous patch broke event delivery in the driver as I missed
    that it wasn't using the right API and forgot to update it later
    in my series.

    While changing this, I noticed that the genetlink code could use
    the static group ID instead of a strcmp(), so also do that for
    the VFS_DQUOT family.

    Cc: Anil Ravindranath
    Cc: "James E.J. Bottomley"
    Signed-off-by: Johannes Berg
    Signed-off-by: David S. Miller

    Johannes Berg
     
  • net/netlink/genetlink.c: In function ‘genl_validate_assign_mc_groups’:
    net/netlink/genetlink.c:217: warning: ‘err’ may be used uninitialized in this
    function

    Commit 2a94fe48f32ccf7321450a2cc07f2b724a444e5b ("genetlink: make multicast
    groups const, prevent abuse") split genl_register_mc_group() in multiple
    functions, but dropped the initialization of err.

    Initialize err to zero to fix this.

    Signed-off-by: Geert Uytterhoeven
    Signed-off-by: David S. Miller

    Geert Uytterhoeven
     

22 Nov, 2013

1 commit

  • Unfortunately, I introduced a tremendously stupid bug into
    genlmsg_multicast() when doing all those multicast group
    changes: it adjusts the group number, but then passes it
    to genlmsg_multicast_netns() which does that again.

    Somehow, my tests failed to catch this, so add a warning
    into genlmsg_multicast_netns() and remove the offending
    group ID adjustment.

    Also add a warning to the similar code in other functions
    so people who misuse them are more loudly warned.

    Signed-off-by: Johannes Berg
    Signed-off-by: David S. Miller

    Johannes Berg
     

21 Nov, 2013

1 commit


20 Nov, 2013

8 commits

  • Register generic netlink multicast groups as an array with
    the family and give them contiguous group IDs. Then instead
    of passing the global group ID to the various functions that
    send messages, pass the ID relative to the family - for most
    families that's just 0 because the only have one group.

    This avoids the list_head and ID in each group, adding a new
    field for the mcast group ID offset to the family.

    At the same time, this allows us to prevent abusing groups
    again like the quota and dropmon code did, since we can now
    check that a family only uses a group it owns.

    Signed-off-by: Johannes Berg
    Signed-off-by: David S. Miller

    Johannes Berg
     
  • This doesn't really change anything, but prepares for the
    next patch that will change the APIs to pass the group ID
    within the family, rather than the global group ID.

    Signed-off-by: Johannes Berg
    Signed-off-by: David S. Miller

    Johannes Berg
     
  • There's no reason to have the family pointer there since it
    can just be passed internally where needed, so remove it.

    Signed-off-by: Johannes Berg
    Signed-off-by: David S. Miller

    Johannes Berg
     
  • There are no users of this API remaining, and we'll soon
    change group registration to be static (like ops are now)

    Signed-off-by: Johannes Berg
    Signed-off-by: David S. Miller

    Johannes Berg
     
  • The quota code is abusing the genetlink API and is using
    its family ID as the multicast group ID, which is invalid
    and may belong to somebody else (and likely will.)

    Make the quota code use the correct API, but since this
    is already used as-is by userspace, reserve a family ID
    for this code and also reserve that group ID to not break
    userspace assumptions.

    Acked-by: Jan Kara
    Signed-off-by: Johannes Berg
    Signed-off-by: David S. Miller

    Johannes Berg
     
  • The drop monitor code is abusing the genetlink API and is
    statically using the generic netlink multicast group 1, even
    if that group belongs to somebody else (which it invariably
    will, since it's not reserved.)

    Make the drop monitor code use the proper APIs to reserve a
    group ID, but also reserve the group id 1 in generic netlink
    code to preserve the userspace API. Since drop monitor can
    be a module, don't clear the bit for it on unregistration.

    Acked-by: Neil Horman
    Signed-off-by: Johannes Berg
    Signed-off-by: David S. Miller

    Johannes Berg
     
  • As suggested by David Miller, make genl_register_family_with_ops()
    a macro and pass only the array, evaluating ARRAY_SIZE() in the
    macro, this is a little safer.

    The openvswitch has some indirection, assing ops/n_ops directly in
    that code. This might ultimately just assign the pointers in the
    family initializations, saving the struct genl_family_and_ops and
    code (once mcast groups are handled differently.)

    Signed-off-by: Johannes Berg
    Signed-off-by: David S. Miller

    Johannes Berg
     
  • The parameter is just 'group', not 'groups', fix the documentation typo.

    Signed-off-by: Johannes Berg
    Signed-off-by: David S. Miller

    Johannes Berg
     

19 Nov, 2013

1 commit


16 Nov, 2013

1 commit

  • Now that the ops assignment is just two variables rather than a
    long list iteration etc., there's no reason to separately export
    __genl_register_family() and __genl_register_family_with_ops().

    Unify the two functions into __genl_register_family() and make
    genl_register_family_with_ops() call it after assigning the ops.

    Signed-off-by: Johannes Berg
    Signed-off-by: David S. Miller

    Johannes Berg
     

15 Nov, 2013

3 commits

  • Allow making the ops array const by not modifying the ops
    flags on registration but rather only when ops are sent
    out in the family information.

    No users are updated yet except for the pre_doit/post_doit
    calls in wireless (the only ones that exist now.)

    Signed-off-by: Johannes Berg
    Signed-off-by: David S. Miller

    Johannes Berg
     
  • Instead of using a linked list, use an array. This reduces
    the data size needed by the users of genetlink, for example
    in wireless (net/wireless/nl80211.c) on 64-bit it frees up
    over 1K of data space.

    Remove the attempted sending of CTRL_CMD_NEWOPS ctrl event
    since genl_ctrl_event(CTRL_CMD_NEWOPS, ...) only returns
    -EINVAL anyway, therefore no such event could ever be sent.

    Signed-off-by: Johannes Berg
    Signed-off-by: David S. Miller

    Johannes Berg
     
  • genl_register_ops() is still needed for internal registration,
    but is no longer available to users of the API.

    Signed-off-by: Johannes Berg
    Signed-off-by: David S. Miller

    Johannes Berg
     

07 Sep, 2013

1 commit

  • Fix finer-grained control and let only a whitelist of allowed netlink
    protocols pass, in our case related to networking. If later on, other
    subsystems decide they want to add their protocol as well to the list
    of allowed protocols they shall simply add it. While at it, we also
    need to tell what protocol is in use otherwise BPF_S_ANC_PROTOCOL can
    not pick it up (as it's not filled out).

    Signed-off-by: Daniel Borkmann
    Signed-off-by: David S. Miller

    Daniel Borkmann
     

06 Sep, 2013

1 commit

  • Conflicts:
    drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
    net/bridge/br_multicast.c
    net/ipv6/sit.c

    The conflicts were minor:

    1) sit.c changes overlap with change to ip_tunnel_xmit() signature.

    2) br_multicast.c had an overlap between computing max_delay using
    msecs_to_jiffies and turning MLDV2_MRC() into an inline function
    with a name using lowercase instead of uppercase letters.

    3) stmmac had two overlapping changes, one which conditionally allocated
    and hooked up a dma_cfg based upon the presence of the pbl OF property,
    and another one handling store-and-forward DMA made. The latter of
    which should not go into the new of_find_property() basic block.

    Signed-off-by: David S. Miller

    David S. Miller
     

29 Aug, 2013

2 commits

  • netlink dump operations take module as parameter to hold
    reference for entire netlink dump duration.
    Currently it holds ref only on genl module which is not correct
    when we use ops registered to genl from another module.
    Following patch adds module pointer to genl_ops so that netlink
    can hold ref count on it.

    CC: Jesse Gross
    CC: Johannes Berg
    Signed-off-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Pravin B Shelar
     
  • In case of genl-family with parallel ops off, dumpif() callback
    is expected to run under genl_lock, But commit def3117493eafd9df
    (genl: Allow concurrent genl callbacks.) changed this behaviour
    where only first dumpit() op was called under genl-lock.
    For subsequent dump, only nlk->cb_lock was taken.
    Following patch fixes it by defining locked dumpit() and done()
    callback which takes care of genl-locking.

    CC: Jesse Gross
    CC: Johannes Berg
    Signed-off-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Pravin B Shelar
     

27 Aug, 2013

1 commit


23 Aug, 2013

1 commit

  • This reverts commit 58ad436fcf49810aa006016107f494c9ac9013db.

    It turns out that the change introduced a potential deadlock
    by causing a locking dependency with netlink's cb_mutex. I
    can't seem to find a way to resolve this without doing major
    changes to the locking, so revert this.

    Signed-off-by: Johannes Berg
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Johannes Berg
     

17 Aug, 2013

1 commit


16 Aug, 2013

1 commit

  • Following patch stores struct netlink_callback in netlink_sock
    to avoid allocating and freeing it on every netlink dump msg.
    Only one dump operation is allowed for a given socket at a time
    therefore we can safely convert cb pointer to cb struct inside
    netlink_sock.

    Signed-off-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Pravin B Shelar
     

13 Aug, 2013

1 commit

  • When dumping generic netlink families, only the first dump call
    is locked with genl_lock(), which protects the list of families,
    and thus subsequent calls can access the data without locking,
    racing against family addition/removal. This can cause a crash.
    Fix it - the locking needs to be conditional because the first
    time around it's already locked.

    A similar bug was reported to me on an old kernel (3.4.47) but
    the exact scenario that happened there is no longer possible,
    on those kernels the first round wasn't locked either. Looking
    at the current code I found the race described above, which had
    also existed on the old kernel.

    Cc: stable@vger.kernel.org
    Reported-by: Andrei Otcheretianski
    Signed-off-by: Johannes Berg
    Signed-off-by: David S. Miller

    Johannes Berg
     

04 Aug, 2013

1 commit


03 Aug, 2013

1 commit