18 Apr, 2017

1 commit

  • Add netlink_ext_ack arg to rtnl_doit_func. Pass extack arg to nlmsg_parse
    for doit functions that call it directly.

    This is the first step to using extended error reporting in rtnetlink.
    From here individual subsystems can be updated to set netlink_ext_ack as
    needed.

    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern
     

14 Apr, 2017

1 commit


11 Apr, 2016

1 commit


17 Oct, 2015

2 commits

  • This merge resolves conflicts with 75aec9df3a78 ("bridge: Remove
    br_nf_push_frag_xmit_sk") as part of Eric Biederman's effort to improve
    netns support in the network stack that reached upstream via David's
    net-next tree.

    Signed-off-by: Pablo Neira Ayuso

    Conflicts:
    net/bridge/br_netfilter_hooks.c

    Pablo Neira Ayuso
     
  • A recent change to the dst_output handling caused a new warning
    when the call to NF_HOOK() is the only use of a local variable
    passed as 'dev', and CONFIG_NETFILTER is disabled:

    net/ipv6/ip6_output.c: In function 'ip6_output':
    net/ipv6/ip6_output.c:135:21: warning: unused variable 'dev' [-Wunused-variable]

    The reason for this is that the NF_HOOK macro in this case does
    not reference the variable at all, and the call to dev_net(dev)
    got removed from the ip6_output function. To avoid that warning now
    and in the future, this changes the macro into an equivalent
    inline function, which tells the compiler that the variable is
    passed correctly but still unused.

    The dn_forward function apparently had the same problem in
    the past and added a local workaround that no longer works
    with the inline function. In order to avoid a regression, we
    have to also remove the #ifdef from decnet in the same patch.

    Fixes: ede2059dbaf9 ("dst: Pass net into dst->output")
    Signed-off-by: Arnd Bergmann
    Signed-off-by: Pablo Neira Ayuso

    Arnd Bergmann
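
The macro-versus-inline point in the commit above can be illustrated with a minimal sketch. All names here (`HOOK_MACRO`, `hook_inline`, `finish`, `output_with_inline`, and the two struct definitions) are hypothetical stand-ins, not the kernel's definitions; the sketch only assumes that the disabled-netfilter variant of the hook ignores its device argument.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for kernel types; not the real definitions. */
struct net_device { int ifindex; };
struct sk_buff { int len; };

/*
 * Macro form with the hook compiled out: 'dev' never appears in the
 * expansion. A caller whose only use of 'dev' is
 *     HOOK_MACRO(skb, dev, finish);
 * would therefore get -Wunused-variable for 'dev'.
 */
#define HOOK_MACRO(skb, dev, okfn) (okfn)(skb)

/*
 * Inline-function form: 'dev' is passed as a real argument, so the
 * caller's variable counts as used even though the body ignores it.
 */
static inline int hook_inline(struct sk_buff *skb, struct net_device *dev,
                              int (*okfn)(struct sk_buff *))
{
    (void)dev; /* deliberately unused when the hook is compiled out */
    return okfn(skb);
}

/* Continuation called once the (absent) hook has let the packet through. */
static int finish(struct sk_buff *skb)
{
    return skb->len;
}

/* The only use of the local 'dev' is the hook call, as in the warning. */
int output_with_inline(struct sk_buff *skb, struct net_device *netdev)
{
    struct net_device *dev = netdev;

    return hook_inline(skb, dev, finish);
}
```

Calling `output_with_inline()` with a packet of length 42 simply returns 42; the interesting part is that the compiler no longer reports `dev` as unused.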
     

08 Oct, 2015

1 commit


18 Sep, 2015

2 commits

  • This is immediately motivated by the bridge code that chains functions that
    call into netfilter. Without passing net into the okfns the bridge code would
    need to guess about the best expression for the network namespace to process
    packets in.

    As net is frequently one of the first things computed in continuation functions
    after netfilter has done its job, passing in the desired network namespace is in
    many cases a code simplification.

    To support this change the function dst_output_okfn is introduced to
    simplify passing dst_output as an okfn. For the moment dst_output_okfn
    just silently drops the struct net.

    Signed-off-by: "Eric W. Biederman"
    Signed-off-by: David S. Miller

    Eric W. Biederman
     
  • Pass a network namespace parameter into the netfilter hooks. At the
    call site of the netfilter hooks the path a packet is taking through
    the network stack is well known which allows the network namespace to
    be determined easily and reliably.

    This allows the replacement of magic code like
    "dev_net(state->in?:state->out)" that appears at the start of most
    netfilter hooks with "state->net".

    In almost all cases the network namespace passed in is derived
    from the first network device passed in, guaranteeing those
    paths will not see any changes in practice.

    The exceptions are:
    xfrm/xfrm_output.c:xfrm_output_resume() xs_net(skb_dst(skb)->xfrm)
    ipvs/ip_vs_xmit.c:ip_vs_nat_send_or_cont() ip_vs_conn_net(cp)
    ipvs/ip_vs_xmit.c:ip_vs_send_or_cont() ip_vs_conn_net(cp)
    ipv4/raw.c:raw_send_hdrinc() sock_net(sk)
    ipv6/ip6_output.c:ip6_xmit() sock_net(sk)
    ipv6/ndisc.c:ndisc_send_skb() dev_net(skb->dev) not dev_net(dst->dev)
    ipv6/raw.c:raw6_send_hdrinc() sock_net(sk)
    br_netfilter_hooks.c:br_nf_pre_routing_finish() dev_net(skb->dev) before skb->dev is set to nf_bridge->physindev

    In all cases these exceptions seem to be a better expression for the
    network namespace the packet is being processed in than the historic
    "dev_net(in?in:out)". I am documenting them in case something odd
    pops up and someone starts trying to track down what happened.

    Signed-off-by: "Eric W. Biederman"
    Signed-off-by: David S. Miller

    Eric W. Biederman
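
The replacement described in the commit above can be sketched as follows. The struct layouts and the helpers `dev_net_mock`, `net_from_devices`, and `net_from_state` are simplified, hypothetical stand-ins rather than the kernel's definitions; the sketch only shows that, when the stored namespace is derived from the first device, both expressions agree.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct net { int id; };
struct net_device { struct net *nd_net; };
struct hook_state {
    struct net_device *in;   /* input device, may be NULL */
    struct net_device *out;  /* output device, may be NULL */
    struct net *net;         /* namespace supplied by the call site */
};

/* Mock of the device-to-namespace lookup. */
static struct net *dev_net_mock(const struct net_device *dev)
{
    return dev->nd_net;
}

/* Old idiom: guess the namespace from whichever device is present. */
struct net *net_from_devices(const struct hook_state *state)
{
    return dev_net_mock(state->in ? state->in : state->out);
}

/* New idiom: use the namespace the caller passed in. */
struct net *net_from_state(const struct hook_state *state)
{
    return state->net;
}
```

In the exceptional call sites listed in the commit, the caller would fill `net` from a socket or transform instead of a device, which is exactly the case where the two helpers could differ.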
     

08 Apr, 2015

1 commit

  • On the output paths in particular, we have to sometimes deal with two
    socket contexts. First, and usually skb->sk, is the local socket that
    generated the frame.

    And second, is potentially the socket used to control a tunneling
    socket, such as one that encapsulates using UDP.

    We do not want to disassociate skb->sk when encapsulating in order
    to fix this, because that would break socket memory accounting.

    The most extreme case where this can cause huge problems is an
    AF_PACKET socket transmitting over a vxlan device. We hit code
    paths doing checks that assume they are dealing with an ipv4
    socket, but are actually operating upon the AF_PACKET one.

    Signed-off-by: David S. Miller

    David Miller
     

10 Mar, 2015

1 commit

  • After my change to neigh_hh_init to obtain the protocol from the
    neigh_table there are no more users of protocol in struct dst_ops.
    Remove the protocol field from dst_ops and all of its initializers.

    Signed-off-by: "Eric W. Biederman"
    Signed-off-by: David S. Miller

    Eric W. Biederman
     

07 Mar, 2015

1 commit

  • Other users of the neighbour table use neigh->output as the method
    to decide when and which link-layer header to place on a packet.
    DECnet has been using neigh->output to decide which DECnet headers to
    place on a packet depending on which neighbour the packet is destined for.

    The DECnet usage isn't totally wrong but it can run into problems if the
    neighbour output function is run for a second time as the teql driver
    and the bridge netfilter code can do.

    Therefore, to avoid pathological problems later down the line and to make
    the neighbour code easier to understand, refactor the decnet output
    code to only use a neighbour method to add a link-layer header to a
    packet.

    This is done by moving the neighbour operations lookup from
    dn_to_neigh_output to dn_neigh_output_packet.

    Signed-off-by: "Eric W. Biederman"
    Signed-off-by: David S. Miller

    Eric W. Biederman
     

24 Feb, 2015

1 commit


19 Jan, 2015

1 commit

  • Commit 053c095a82cf ("netlink: make nlmsg_end() and genlmsg_end()
    void") didn't catch all of the cases where callers were breaking out
    on the return value being equal to zero, which they no longer should
    when zero means success.

    Fix all such cases.

    Reported-by: Marcel Holtmann
    Reported-by: Scott Feldman
    Signed-off-by: David S. Miller

    David S. Miller
     

18 Jan, 2015

1 commit

  • Contrary to common expectations for an "int" return, these functions
    return only a positive value -- if used correctly they cannot even
    return 0 because the message header will necessarily be in the skb.

    This makes the very common pattern of

    if (genlmsg_end(...) < 0) { ... }

    be a whole bunch of dead code. Many places also simply do

    return nlmsg_end(...);

    and the caller is expected to deal with it.

    This also commonly (at least for me) causes errors, because it is very
    common to write

    if (my_function(...))
    /* error condition */

    and if my_function() does "return nlmsg_end()" this is of course wrong.

    Additionally, there's not a single place in the kernel that actually
    needs the message length returned, and if anyone needs it later then
    it'll be very easy to just use skb->len there.

    Remove this, and make the functions void. This removes a bunch of dead
    code as described above. The patch adds lines because I did

    - return nlmsg_end(...);
    + nlmsg_end(...);
    + return 0;

    I could have preserved all the function's return values by returning
    skb->len, but instead I've audited all the places calling the affected
    functions and found that none cared. A few places actually compared
    the return value with < 0 with no change in behaviour, so I opted for the more
    efficient version.

    One instance of the error I've made numerous times now is also present
    in net/phonet/pn_netlink.c in the route_dumpit() function - it didn't
    check for
    Signed-off-by: David S. Miller

    Johannes Berg
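
The pitfall described in the commit above can be reproduced with a small sketch. The names `struct msg`, `old_msg_end`, `new_msg_end`, `fill_old`, `fill_new`, and `caller_sees_error` are hypothetical and only mimic the shape of the netlink helpers; the one assumption taken from the commit text is that the old helper returned the (always positive) message length.

```c
#include <assert.h>

/* Hypothetical stand-in for an skb carrying a netlink message. */
struct msg { int len; };

/* Old style: returns the message length, which is always positive. */
static int old_msg_end(struct msg *m)
{
    return m->len;
}

/* New style: finalising the message has nothing useful to return. */
static void new_msg_end(struct msg *m)
{
    (void)m;
}

/* Fill function written as "return nlmsg_end(...)" used to be. */
int fill_old(struct msg *m)
{
    m->len = 16;
    return old_msg_end(m);
}

/* Fill function after the conversion: end the message, return 0. */
int fill_new(struct msg *m)
{
    m->len = 16;
    new_msg_end(m);
    return 0;
}

/* Caller using the common "non-zero means error" convention. */
int caller_sees_error(int (*fill)(struct msg *))
{
    struct msg m = { 0 };

    if (fill(&m))
        return 1; /* treated as an error condition */
    return 0;
}
```

With the old-style fill function the caller reports an error even though the message was built successfully; with the new-style one it does not.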
     

16 Apr, 2014

1 commit

  • In the dst->output() path for ipv4, the code assumes the skb it has to
    transmit is attached to an inet socket, specifically via
    ip_mc_output() : The sk_mc_loop() test triggers a WARN_ON() when the
    provider of the packet is an AF_PACKET socket.

    The dst->output() method gets an additional 'struct sock *sk'
    parameter. This needs a cascade of changes so that this parameter can
    be propagated from vxlan to final consumer.

    Fixes: 8f646c922d55 ("vxlan: keep original skb ownership")
    Reported-by: lucien xin
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

15 Jan, 2014

1 commit

  • From the following call chain we can see that dn_cache_getroute() is
    protected by rtnl_lock. So if we use __dev_get_by_index() instead
    of dev_get_by_index() to find the interface in it, we avoid
    changing the interface reference counter.

    rtnetlink_rcv()
    rtnl_lock()
    netlink_rcv_skb()
    dn_cache_getroute()
    rtnl_unlock()

    Signed-off-by: Ying Xue
    Signed-off-by: David S. Miller

    Ying Xue
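
The difference between the two lookup flavours in the commit above can be sketched with a mock device table. `lookup_locked`, `lookup_get`, `dev_put_mock`, and the `refcnt` field are hypothetical analogues, not the kernel functions themselves; the sketch assumes only what the commit states, namely that the double-underscore variant leaves the reference counter alone and relies on the caller holding the lock.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified device with a plain reference counter. */
struct net_device { int ifindex; int refcnt; };

static struct net_device devices[] = { { 1, 0 }, { 2, 0 } };

/*
 * Analogue of __dev_get_by_index(): no reference is taken, so the
 * caller must already hold the lock that keeps the device alive.
 */
struct net_device *lookup_locked(int ifindex)
{
    size_t i;

    for (i = 0; i < sizeof(devices) / sizeof(devices[0]); i++)
        if (devices[i].ifindex == ifindex)
            return &devices[i];
    return NULL;
}

/* Analogue of dev_get_by_index(): takes a reference on success. */
struct net_device *lookup_get(int ifindex)
{
    struct net_device *dev = lookup_locked(ifindex);

    if (dev)
        dev->refcnt++;
    return dev;
}

/* Analogue of dev_put(): drops the reference taken by lookup_get(). */
void dev_put_mock(struct net_device *dev)
{
    dev->refcnt--;
}
```

A caller that is already serialised by the lock can use `lookup_locked()` and skip the matching put entirely, which is the simplification the commit makes.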
     

06 Dec, 2013

1 commit


23 Mar, 2013

1 commit


22 Mar, 2013

2 commits

  • With decnet converted, we can finally get rid of rta_buf and the
    computations around it. It also gets rid of the minimal header
    length verification since all message handlers do that explicitly
    anyway.

    Signed-off-by: Thomas Graf
    Signed-off-by: David S. Miller

    Thomas Graf
     
  • decnet is the only subsystem left that is relying on the global
    netlink attribute buffer rta_buf. It's a horrible design and we
    want to get rid of it.

    This converts all of decnet to do implicit attribute parsing. It
    also gets rid of the error prone struct dn_kern_rta.

    Yes, the fib_magic() stuff is not pretty.

    It's compile tested but I need someone with appropriate hardware
    to test the patch since I don't have access to it.

    Cc: linux-decnet-user@lists.sourceforge.net
    Signed-off-by: Thomas Graf
    Signed-off-by: David S. Miller

    Thomas Graf
     

19 Feb, 2013

2 commits

  • proc_net_remove is only used to remove proc entries
    that are directly under /proc/net; it is not a general function
    for removing proc entries of a netns. If we want to remove
    proc entries that are under /proc/net/stat/, we still
    need to call remove_proc_entry.

    This patch uses remove_proc_entry to replace proc_net_remove.
    We can remove proc_net_remove after this patch.

    Signed-off-by: Gao feng
    Signed-off-by: David S. Miller

    Gao feng
     
  • Right now, some modules such as bonding use proc_create
    to create proc entries under /proc/net/, and other modules
    such as ipv4 use proc_net_fops_create.

    This looks a little chaotic. This patch changes all uses of
    proc_net_fops_create to proc_create. We can remove
    proc_net_fops_create after this patch.

    Signed-off-by: Gao feng
    Signed-off-by: David S. Miller

    Gao feng
     

28 Jan, 2013

1 commit


11 Sep, 2012

1 commit

  • It is a frequent mistake to confuse the netlink port identifier with a
    process identifier. Try to reduce this confusion by renaming fields
    that hold port identifiers to portid instead of pid.

    I have carefully avoided changing the structures exported to
    userspace to avoid changing the userspace API.

    I have successfully built an allyesconfig kernel with this change.

    Signed-off-by: "Eric W. Biederman"
    Acked-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Eric W. Biederman
     

10 Aug, 2012

1 commit

  • As pointed out, there are places that access net->loopback_dev->ifindex,
    and after ifindex generation is made per-net this value becomes a constant
    equal to 1. So go ahead and introduce the LOOPBACK_IFINDEX constant and use
    it where appropriate.

    Signed-off-by: Pavel Emelyanov
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Pavel Emelyanov
     

24 Jul, 2012

1 commit


21 Jul, 2012

1 commit


17 Jul, 2012

1 commit

  • This will be used so that we can compose a full flow key.

    Even though we have a route in this context, we need more. In the
    future the routes will be without destination address, source address,
    etc. keying. One ipv4 route will cover entire subnets, etc.

    In this environment we have to have a way to possess persistent storage
    for redirects and PMTU information. This persistent storage will exist
    in the FIB tables, and that's why we'll need to be able to rebuild a
    full lookup flow key here. Using that flow key will do a fib_lookup()
    and create/update the persistent entry.

    Signed-off-by: David S. Miller

    David S. Miller
     

12 Jul, 2012

1 commit


11 Jul, 2012

2 commits


05 Jul, 2012

2 commits


28 Jun, 2012

1 commit


27 Jun, 2012

1 commit


16 May, 2012

1 commit


16 Apr, 2012

1 commit


06 Feb, 2012

1 commit


06 Dec, 2011

1 commit


27 Nov, 2011

1 commit