08 Apr, 2015

1 commit

  • On the output paths in particular, we have to sometimes deal with two
    socket contexts. First, and usually skb->sk, is the local socket that
    generated the frame.

    And second, is potentially the socket used to control a tunneling
    socket, such as one the encapsulates using UDP.

    We do not want to disassociate skb->sk when encapsulating in order
    to fix this, because that would break socket memory accounting.

    The most extreme case where this can cause huge problems is an
    AF_PACKET socket transmitting over a vxlan device. We hit code
    paths doing checks that assume they are dealing with an ipv4
    socket, but are actually operating upon the AF_PACKET one.

    Signed-off-by: David S. Miller

    David Miller
     

07 Apr, 2015

1 commit

  • Conflicts:
    drivers/net/ethernet/mellanox/mlx4/cmd.c
    net/core/fib_rules.c
    net/ipv4/fib_frontend.c

    The fib_rules.c and fib_frontend.c conflicts were locking adjustments
    in 'net' overlapping addition and removal of code in 'net-next'.

    The mlx4 conflict was a bug fix in 'net' happening in the same
    place a constant was being replaced with a more suitable macro.

    Signed-off-by: David S. Miller

    David S. Miller
     

05 Apr, 2015

1 commit


03 Apr, 2015

1 commit

  • We have to hold rtnl lock for fib_rules_unregister()
    otherwise the following race could happen:

    fib_rules_unregister(): fib_nl_delrule():
    ... ...
    ... ops = lookup_rules_ops();
    list_del_rcu(&ops->list);
    list_for_each_entry(ops->rules) {
    fib_rules_cleanup_ops(ops); ...
    list_del_rcu(); list_del_rcu();
    }

    Note, net->rules_mod_lock is actually not needed at all,
    either upper layer netns code or rtnl lock guarantees
    we are safe.

    Cc: Alexander Duyck
    Cc: Thomas Graf
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller

    WANG Cong
     

10 Mar, 2015

1 commit

  • After my change to neigh_hh_init to obtain the protocol from the
    neigh_table there are no more users of protocol in struct dst_ops.
    Remove the protocol field from dst_ops and all of it's initializers.

    Signed-off-by: "Eric W. Biederman"
    Signed-off-by: David S. Miller

    Eric W. Biederman
     

07 Mar, 2015

1 commit

  • Other users users of the neighbour table use neigh->output as the method
    to decided when and which link-layer header to place on a packet.
    DECnet has been using neigh->output to decide which DECnet headers to
    place on a packet depending which neighbour the packet is destined for.

    The DECnet usage isn't totally wrong but it can run into problems if the
    neighbour output function is run for a second time as the teql driver
    and the bridge netfilter code can do.

    Therefore to avoid pathologic problems later down the line and make the
    neighbour code easier to understand by refactoring the decnet output
    code to only use a neighbour method to add a link layer header to a
    packet.

    This is done by moving the neigbhour operations lookup from
    dn_to_neigh_output to dn_neigh_output_packet.

    Signed-off-by: "Eric W. Biederman"
    Signed-off-by: David S. Miller

    Eric W. Biederman
     

04 Mar, 2015

2 commits

  • While looking at the mpls code I found myself writing yet another
    version of neigh_lookup_noref. We currently have __ipv4_lookup_noref
    and __ipv6_lookup_noref.

    So to make my work a little easier and to make it a smidge easier to
    verify/maintain the mpls code in the future I stopped and wrote
    ___neigh_lookup_noref. Then I rewote __ipv4_lookup_noref and
    __ipv6_lookup_noref in terms of this new function. I tested my new
    version by verifying that the same code is generated in
    ip_finish_output2 and ip6_finish_output2 where these functions are
    inlined.

    To get to ___neigh_lookup_noref I added a new neighbour cache table
    function key_eq. So that the static size of the key would be
    available.

    I also added __neigh_lookup_noref for people who want to to lookup
    a neighbour table entry quickly but don't know which neibhgour table
    they are going to look up.

    Signed-off-by: "Eric W. Biederman"
    Signed-off-by: David S. Miller

    Eric W. Biederman
     
  • Conflicts:
    drivers/net/ethernet/rocker/rocker.c

    The rocker commit was two overlapping changes, one to rename
    the ->vport member to ->pport, and another making the bitmask
    expression use '1ULL' instead of plain '1'.

    Signed-off-by: David S. Miller

    David S. Miller
     

03 Mar, 2015

2 commits

  • - Add protocol to neigh_tbl so that dst->ops->protocol is not needed
    - Acquire the device from neigh->dev

    This results in a neigh_hh_init that will cache the samve values
    regardless of the packets flowing through it.

    Signed-off-by: "Eric W. Biederman"
    Signed-off-by: David S. Miller

    Eric W. Biederman
     
  • After TIPC doesn't depend on iocb argument in its internal
    implementations of sendmsg() and recvmsg() hooks defined in proto
    structure, no any user is using iocb argument in them at all now.
    Then we can drop the redundant iocb argument completely from kinds of
    implementations of both sendmsg() and recvmsg() in the entire
    networking stack.

    Cc: Christoph Hellwig
    Suggested-by: Al Viro
    Signed-off-by: Ying Xue
    Signed-off-by: David S. Miller

    Ying Xue
     

24 Feb, 2015

1 commit


19 Jan, 2015

1 commit

  • Commit 053c095a82cf ("netlink: make nlmsg_end() and genlmsg_end()
    void") didn't catch all of the cases where callers were breaking out
    on the return value being equal to zero, which they no longer should
    when zero means success.

    Fix all such cases.

    Reported-by: Marcel Holtmann
    Reported-by: Scott Feldman
    Signed-off-by: David S. Miller

    David S. Miller
     

18 Jan, 2015

1 commit

  • Contrary to common expectations for an "int" return, these functions
    return only a positive value -- if used correctly they cannot even
    return 0 because the message header will necessarily be in the skb.

    This makes the very common pattern of

    if (genlmsg_end(...) < 0) { ... }

    be a whole bunch of dead code. Many places also simply do

    return nlmsg_end(...);

    and the caller is expected to deal with it.

    This also commonly (at least for me) causes errors, because it is very
    common to write

    if (my_function(...))
    /* error condition */

    and if my_function() does "return nlmsg_end()" this is of course wrong.

    Additionally, there's not a single place in the kernel that actually
    needs the message length returned, and if anyone needs it later then
    it'll be very easy to just use skb->len there.

    Remove this, and make the functions void. This removes a bunch of dead
    code as described above. The patch adds lines because I did

    - return nlmsg_end(...);
    + nlmsg_end(...);
    + return 0;

    I could have preserved all the function's return values by returning
    skb->len, but instead I've audited all the places calling the affected
    functions and found that none cared. A few places actually compared
    the return value with < 0 with no change in behaviour, so I opted for the more
    efficient version.

    One instance of the error I've made numerous times now is also present
    in net/phonet/pn_netlink.c in the route_dumpit() function - it didn't
    check for
    Signed-off-by: David S. Miller

    Johannes Berg
     

06 Jan, 2015

1 commit

  • This patch adds the minimum necessary for the RTAX_CC_ALGO congestion
    control metric to be set up and dumped back to user space.

    While the internal representation of RTAX_CC_ALGO is handled as a u32
    key, we avoided to expose this implementation detail to user space, thus
    instead, we chose the netlink attribute that is being exchanged between
    user space to be the actual congestion control algorithm name, similarly
    as in the setsockopt(2) API in order to allow for maximum flexibility,
    even for 3rd party modules.

    It is a bit unfortunate that RTAX_QUICKACK used up a whole RTAX slot as
    it should have been stored in RTAX_FEATURES instead, we first thought
    about reusing it for the congestion control key, but it brings more
    complications and/or confusion than worth it.

    Joint work with Florian Westphal.

    Signed-off-by: Florian Westphal
    Signed-off-by: Daniel Borkmann
    Signed-off-by: David S. Miller

    Daniel Borkmann
     

24 Nov, 2014

2 commits


12 Nov, 2014

1 commit

  • Currently there are only three neigh tables in the whole kernel:
    arp table, ndisc table and decnet neigh table. What's more,
    we don't support registering multiple tables per family.
    Therefore we can just make these tables statically built-in.

    Cc: David S. Miller
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller

    WANG Cong
     

23 Aug, 2014

3 commits

  • The functions time_before, time_before_eq, time_after, and time_after_eq
    are more robust for comparing jiffies against other values.

    A simplified version of the Coccinelle semantic patch making this change
    is as follows:

    @change@
    expression E1,E2,E3;
    @@
    - jiffies - E1 >= (E2*E3)
    + time_after_eq(jiffies, E1+E2*E3)

    Signed-off-by: Himangi Saraogi
    Acked-by: Julia Lawall
    Signed-off-by: David S. Miller

    Himangi Saraogi
     
  • The functions time_before, time_before_eq, time_after, and time_after_eq
    are more robust for comparing jiffies against other values.

    A simplified version of the Coccinelle semantic patch making this change
    is as follows:

    @change@
    expression E1,E2;
    @@
    - (jiffies - E1) >= E2
    + time_after_eq(jiffies, E1+E2)

    Signed-off-by: Himangi Saraogi
    Acked-by: Julia Lawall
    Signed-off-by: David S. Miller

    Himangi Saraogi
     
  • The functions time_before, time_before_eq, time_after, and time_after_eq
    are more robust for comparing jiffies against other values.

    A simplified version of the Coccinelle semantic patch making this change
    is as follows:

    @change@
    expression E1,E2;
    @@

    (
    - (jiffies - E1) < E2
    + time_before(jiffies, E1+E2)
    )

    Signed-off-by: Himangi Saraogi
    Acked-by: Julia Lawall
    Signed-off-by: David S. Miller

    Himangi Saraogi
     

24 May, 2014

1 commit

  • Define separate fields in the sock structure for configuring disabling
    checksums in both TX and RX-- sk_no_check_tx and sk_no_check_rx.
    The SO_NO_CHECK socket option only affects sk_no_check_tx. Also,
    removed UDP_CSUM_* defines since they are no longer necessary.

    Signed-off-by: Tom Herbert
    Signed-off-by: David S. Miller

    Tom Herbert
     

25 Apr, 2014

1 commit

  • It is possible by passing a netlink socket to a more privileged
    executable and then to fool that executable into writing to the socket
    data that happens to be valid netlink message to do something that
    privileged executable did not intend to do.

    To keep this from happening replace bare capable and ns_capable calls
    with netlink_capable, netlink_net_calls and netlink_ns_capable calls.
    Which act the same as the previous calls except they verify that the
    opener of the socket had the desired permissions as well.

    Reported-by: Andy Lutomirski
    Signed-off-by: "Eric W. Biederman"
    Signed-off-by: David S. Miller

    Eric W. Biederman
     

16 Apr, 2014

1 commit

  • In the dst->output() path for ipv4, the code assumes the skb it has to
    transmit is attached to an inet socket, specifically via
    ip_mc_output() : The sk_mc_loop() test triggers a WARN_ON() when the
    provider of the packet is an AF_PACKET socket.

    The dst->output() method gets an additional 'struct sock *sk'
    parameter. This needs a cascade of changes so that this parameter can
    be propagated from vxlan to final consumer.

    Fixes: 8f646c922d55 ("vxlan: keep original skb ownership")
    Reported-by: lucien xin
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

12 Apr, 2014

1 commit

  • Several spots in the kernel perform a sequence like:

    skb_queue_tail(&sk->s_receive_queue, skb);
    sk->sk_data_ready(sk, skb->len);

    But at the moment we place the SKB onto the socket receive queue it
    can be consumed and freed up. So this skb->len access is potentially
    to freed up memory.

    Furthermore, the skb->len can be modified by the consumer so it is
    possible that the value isn't accurate.

    And finally, no actual implementation of this callback actually uses
    the length argument. And since nobody actually cared about it's
    value, lots of call sites pass arbitrary values in such as '0' and
    even '1'.

    So just remove the length argument from the callback, that way there
    is no confusion whatsoever and all of these use-after-free cases get
    fixed as a side effect.

    Based upon a patch by Eric Dumazet and his suggestion to audit this
    issue tree-wide.

    Signed-off-by: David S. Miller

    David S. Miller
     

10 Feb, 2014

2 commits

  • Move prototype declaration of functions to header file include/net/dn.h
    from net/decnet/af_decnet.c because they are used by more than one file.

    This eliminates the following warning in net/decnet/af_decnet.c:
    net/decnet/sysctl_net_decnet.c:354:6: warning: no previous prototype for ‘dn_register_sysctl’ [-Wmissing-prototypes]
    net/decnet/sysctl_net_decnet.c:359:6: warning: no previous prototype for ‘dn_unregister_sysctl’ [-Wmissing-prototypes]

    Signed-off-by: Rashika Kheria
    Reviewed-by: Josh Triplett
    Signed-off-by: David S. Miller

    Rashika Kheria
     
  • Move prototype declaration of functions to header file include/net/dn_route.h
    from net/decnet/af_decnet.c because it is used by more than one file.

    This eliminates the following warning in net/decnet/dn_route.c:
    net/decnet/dn_route.c:629:5: warning: no previous prototype for ‘dn_route_rcv’ [-Wmissing-prototypes]

    Signed-off-by: Rashika Kheria
    Reviewed-by: Josh Triplett
    Signed-off-by: David S. Miller

    Rashika Kheria
     

19 Jan, 2014

1 commit

  • This is a follow-up patch to f3d3342602f8bc ("net: rework recvmsg
    handler msg_name and msg_namelen logic").

    DECLARE_SOCKADDR validates that the structure we use for writing the
    name information to is not larger than the buffer which is reserved
    for msg->msg_name (which is 128 bytes). Also use DECLARE_SOCKADDR
    consistently in sendmsg code paths.

    Signed-off-by: Steffen Hurrle
    Suggested-by: Hannes Frederic Sowa
    Acked-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller

    Steffen Hurrle
     

15 Jan, 2014

1 commit

  • The following call chain we can identify that dn_cache_getroute() is
    protected under rtnl_lock. So if we use __dev_get_by_index() instead
    of dev_get_by_index() to find interface handlers in it, this would help
    us avoid to change interface reference counter.

    rtnetlink_rcv()
    rtnl_lock()
    netlink_rcv_skb()
    dn_cache_getroute()
    rtnl_unlock()

    Signed-off-by: Ying Xue
    Signed-off-by: David S. Miller

    Ying Xue
     

20 Dec, 2013

1 commit

  • Steffen Klassert says:

    ====================
    pull request (net-next): ipsec-next 2013-12-19

    1) Use the user supplied policy index instead of a generated one
    if present. From Fan Du.

    2) Make xfrm migration namespace aware. From Fan Du.

    3) Make the xfrm state and policy locks namespace aware. From Fan Du.

    4) Remove ancient sleeping when the SA is in acquire state,
    we now queue packets to the policy instead. This replaces the
    sleeping code.

    5) Remove FLOWI_FLAG_CAN_SLEEP. This was used to notify xfrm about the
    posibility to sleep. The sleeping code is gone, so remove it.

    6) Check user specified spi for IPComp. Thr spi for IPcomp is only
    16 bit wide, so check for a valid value. From Fan Du.

    7) Export verify_userspi_info to check for valid user supplied spi ranges
    with pfkey and netlink. From Fan Du.

    8) RFC3173 states that if the total size of a compressed payload and the IPComp
    header is not smaller than the size of the original payload, the IP datagram
    must be sent in the original non-compressed form. These packets are dropped
    by the inbound policy check because they are not transformed. Document the need
    to set 'level use' for IPcomp to receive such packets anyway. From Fan Du.

    Please pull or let me know if there are problems.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     

11 Dec, 2013

1 commit


10 Dec, 2013

1 commit

  • This patch converts the neigh param members to an array. This allows easier
    manipulation which will be needed later on to provide better management of
    default values.

    Signed-off-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Jiri Pirko
     

06 Dec, 2013

1 commit


14 Oct, 2013

1 commit


13 Jun, 2013

1 commit

  • Reduce the uses of this unnecessary typedef.

    Done via perl script:

    $ git grep --name-only -w ctl_table net | \
    xargs perl -p -i -e '\
    sub trim { my ($local) = @_; $local =~ s/(^\s+|\s+$)//g; return $local; } \
    s/\b(?<!struct\s)ctl_table\b(\s*\*\s*|\s+\w+)/"struct ctl_table " . trim($1)/ge'

    Reflow the modified lines that now exceed 80 columns.

    Signed-off-by: Joe Perches
    Signed-off-by: David S. Miller

    Joe Perches
     

29 May, 2013

1 commit

  • So far, only net_device * could be passed along with netdevice notifier
    event. This patch provides a possibility to pass custom structure
    able to provide info that event listener needs to know.

    Signed-off-by: Jiri Pirko

    v2->v3: fix typo on simeth
    shortened dev_getter
    shortened notifier_info struct name
    v1->v2: fix notifier_call parameter in call_netdevice_notifier()
    Signed-off-by: David S. Miller

    Jiri Pirko
     

08 Apr, 2013

1 commit


29 Mar, 2013

1 commit


23 Mar, 2013

1 commit


22 Mar, 2013

2 commits

  • With decnet converted, we can finally get rid of rta_buf and its
    computations around it. It also gets rid of the minimal header
    length verification since all message handlers do that explicitly
    anyway.

    Signed-off-by: Thomas Graf
    Signed-off-by: David S. Miller

    Thomas Graf
     
  • decnet is the only subsystem left that is relying on the global
    netlink attribute buffer rta_buf. It's horrible design and we
    want to get rid of it.

    This converts all of decnet to do implicit attribute parsing. It
    also gets rid of the error prone struct dn_kern_rta.

    Yes, the fib_magic() stuff is not pretty.

    It's compiled tested but I need someone with appropriate hardware
    to test the patch since I don't have access to it.

    Cc: linux-decnet-user@lists.sourceforge.net
    Signed-off-by: Thomas Graf
    Signed-off-by: David S. Miller

    Thomas Graf