19 Nov, 2016

1 commit


09 Aug, 2016

1 commit


03 Aug, 2016

1 commit

  • There was only one use of __initdata_refok and __exit_refok

    __init_refok was used 46 times against 82 for __ref.

    Those definitions have been obsolete since commit 312b1485fb50 ("Introduce new
    section reference annotations tags: __ref, __refdata, __refconst").

    This patch removes the following compatibility definitions and replaces
    them treewide.

    /* compatibility defines */
    #define __init_refok __ref
    #define __initdata_refok __refdata
    #define __exit_refok __ref

    I can also provide separate patches if necessary
    (one patch per tree, then a check in a month or two to remove the old definitions).

    [akpm@linux-foundation.org: coding-style fixes]
    Link: http://lkml.kernel.org/r/1466796271-3043-1-git-send-email-fabf@skynet.be
    Signed-off-by: Fabian Frederick
    Cc: Ingo Molnar
    Cc: Sam Ravnborg
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Fabian Frederick
     

14 Dec, 2015

1 commit


07 Aug, 2015

1 commit

  • - Move the nfnl_acct_list into the network namespace, initialize
    and destroy it per namespace
    - Keep track of refcnt on nfacct objects; the old logic no
    longer works with a per-namespace list
    - Adjust xt_nfacct to pass the namespace when registering objects

    Signed-off-by: Andreas Schultz
    Signed-off-by: Pablo Neira Ayuso

    Andreas Schultz
     

19 Jun, 2015

1 commit


18 May, 2015

1 commit

  • The spinlock is used to protect netns_ids which is per net,
    so there is no need to use a global spinlock.

    Cc: Nicolas Dichtel
    Signed-off-by: Cong Wang
    Acked-by: Nicolas Dichtel
    Signed-off-by: David S. Miller

    WANG Cong
     

10 May, 2015

2 commits

  • More precisely, listen to all netns that have an nsid assigned in the netns
    where the netlink socket is opened.
    For this purpose, a netlink socket option is added:
    NETLINK_LISTEN_ALL_NSID. When this option is set on a netlink socket, the
    socket will receive netlink notifications from all netns that have an nsid
    assigned in the netns where the socket has been opened. The nsid is sent
    to userland as ancillary data.

    With this patch, a daemon needs only one socket to listen to many netns. This
    is useful when the number of netns is high.

    Because 0 is a valid value for an nsid, the field nsid_is_set indicates whether
    the field nsid is valid. skb->cb is initialized to 0 on skb
    allocation, so we are sure that we will never send an nsid of 0 to
    userland by mistake.

    Signed-off-by: Nicolas Dichtel
    Acked-by: Thomas Graf
    Signed-off-by: David S. Miller

    Nicolas Dichtel
     
  • In a following commit, a new function will be introduced that only looks up
    an nsid (without allocating one if the nsid doesn't exist). To avoid confusion,
    the existing function is renamed.

    Signed-off-by: Nicolas Dichtel
    Acked-by: Thomas Graf
    Signed-off-by: David S. Miller

    Nicolas Dichtel
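The NETLINK_LISTEN_ALL_NSID commit above says the nsid reaches userland as ancillary data. The following is a minimal userspace sketch, assuming glibc's CMSG_* macros and the SOL_NETLINK and NETLINK_LISTEN_ALL_NSID constants from the system headers (fallback values are supplied only in case old headers lack them). The helper names enable_listen_all_nsid() and nsid_from_msg() are hypothetical. It turns the option on for an existing netlink socket and extracts the nsid from a received message's control data, which is how one daemon can tell apart notifications from many netns on a single socket.

```c
#include <string.h>
#include <sys/socket.h>
#include <linux/netlink.h>

/* Fallbacks in case the installed headers predate these constants. */
#ifndef SOL_NETLINK
#define SOL_NETLINK 270
#endif
#ifndef NETLINK_LISTEN_ALL_NSID
#define NETLINK_LISTEN_ALL_NSID 8
#endif

/* Hypothetical helper: ask for notifications from every netns that has an
 * nsid assigned in this socket's netns. Returns 0 on success, -1 on error
 * (the kernel may refuse an unprivileged caller). */
int enable_listen_all_nsid(int fd)
{
	int one = 1;
	return setsockopt(fd, SOL_NETLINK, NETLINK_LISTEN_ALL_NSID,
			  &one, sizeof(one));
}

/* Hypothetical helper: scan the control data of a message filled in by
 * recvmsg(). Returns 1 and stores the nsid if one was attached, else 0. */
int nsid_from_msg(struct msghdr *msg, int *nsid)
{
	struct cmsghdr *c;

	for (c = CMSG_FIRSTHDR(msg); c != NULL; c = CMSG_NXTHDR(msg, c)) {
		if (c->cmsg_level == SOL_NETLINK &&
		    c->cmsg_type == NETLINK_LISTEN_ALL_NSID) {
			/* memcpy avoids assuming the payload is int-aligned */
			memcpy(nsid, CMSG_DATA(c), sizeof(*nsid));
			return 1;
		}
	}
	return 0;
}
```

A receiver would hand recvmsg() a msghdr whose control buffer holds at least CMSG_SPACE(sizeof(int)) bytes, then call nsid_from_msg() on it. Note that an nsid of 0 is valid, which is why the sketch reports presence separately from the value.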
     

13 Mar, 2015

2 commits

  • Having to say
    > #ifdef CONFIG_NET_NS
    > struct net *net;
    > #endif

    in structures is a little bit wordy and a little bit error prone.

    Instead it is possible to say:
    > typedef struct {
    > #ifdef CONFIG_NET_NS
    > struct net *net;
    > #endif
    > } possible_net_t;

    And then in a header say:

    > possible_net_t net;

    Which is cleaner, easier to use and easier to test, as
    possible_net_t is always there no matter what the compile options are.

    Further this allows read_pnet and write_pnet to be functions in all
    cases which is better at catching typos.

    This change adds possible_net_t, updates the definitions of read_pnet
    and write_pnet, changes the optional struct net * members that
    write_pnet is used on to have the type possible_net_t, and finally fixes
    up the b0rked users of read_pnet and write_pnet.

    Signed-off-by: "Eric W. Biederman"
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric W. Biederman
     
  • hold_net and release_net were an idea that turned out to be useless.
    The code has been disabled since 2008. Kill the code; it is long past due.

    Signed-off-by: "Eric W. Biederman"
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric W. Biederman
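A standalone sketch of the possible_net_t pattern from the first commit above, assuming a stand-in struct net and a single global init_net. It relies on GCC or Clang accepting an empty struct in C (a GNU extension, which the kernel also uses). Build with -DCONFIG_NET_NS to get the namespace-aware variant; in either configuration read_pnet and write_pnet are real functions, so their arguments are type-checked, which is the typo-catching benefit the commit mentions. The example_obj struct is hypothetical.

```c
/* Stand-in for the kernel's struct net. */
struct net { int id; };

static struct net init_net = { 0 };

/* Always present as a member, whatever the configuration. */
typedef struct {
#ifdef CONFIG_NET_NS
	struct net *net;
#endif
} possible_net_t;

static inline void write_pnet(possible_net_t *pnet, struct net *net)
{
#ifdef CONFIG_NET_NS
	pnet->net = net;
#else
	(void)pnet;   /* nothing to store: there is only init_net */
	(void)net;
#endif
}

static inline struct net *read_pnet(const possible_net_t *pnet)
{
#ifdef CONFIG_NET_NS
	return pnet->net;
#else
	(void)pnet;
	return &init_net;
#endif
}

/* Hypothetical user: no #ifdef needed around the member. */
struct example_obj {
	possible_net_t net;
	int value;
};
```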
     

12 Mar, 2015

1 commit

  • A long standing problem in netlink socket dumps is the use
    of kernel socket addresses as cookies.

    1) It is a security concern.

    2) Sockets can be reused quite quickly, so there is
    no guarantee that a cookie is used only once and identifies
    a single flow.

    3) request sock, establish sock, and timewait socks
    for a given flow have different cookies.

    Part of our effort to bring better TCP statistics requires
    switching to a different allocator.

    In this patch, I chose to use a per network namespace 64bit generator,
    and to use it only in the case a socket needs to be dumped to netlink.
    (This might be refined later if needed)

    Note that I tried to carry cookies from request socks to established
    socks, then to timewait socks.

    Signed-off-by: Eric Dumazet
    Cc: Eric Salo
    Signed-off-by: David S. Miller

    Eric Dumazet
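A sketch of the lazily assigned, per-namespace 64-bit cookie described above, assuming C11 atomics and hypothetical stand-in structures; it is not the kernel code. A cookie is assigned only the first time a socket needs one, and a compare-and-swap makes concurrent callers agree on a single value. Because the counter only moves forward, a cookie no longer depends on a kernel address and is not repeated when socket memory is reused (ignoring 64-bit wraparound).

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel structures. */
struct net {
	_Atomic uint64_t cookie_gen;   /* per-namespace generator */
};

struct sock {
	struct net *net;
	_Atomic uint64_t cookie;       /* 0 means "not assigned yet" */
};

/* Return the socket's cookie, assigning one from its namespace's
 * generator on first use. */
uint64_t sock_get_cookie(struct sock *sk)
{
	uint64_t res = atomic_load(&sk->cookie);
	if (res != 0)
		return res;

	uint64_t fresh = atomic_fetch_add(&sk->net->cookie_gen, 1) + 1;
	uint64_t expected = 0;
	if (atomic_compare_exchange_strong(&sk->cookie, &expected, fresh))
		return fresh;
	/* Another caller won the race; expected now holds its value. */
	return expected;
}
```

The generator value consumed by a losing racer is simply skipped, which keeps the code lock-free at the cost of occasional gaps in the sequence.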
     

04 Mar, 2015

1 commit

  • This change adds a new Kconfig option MPLS_ROUTING.

    The core of this change is the code that looks at an MPLS packet received
    from another machine, looks that packet up in a routing table, and
    forwards the packet on.

    Support of MPLS over ATM is not considered or attempted here. This
    implementation follows RFC3032 and implements the MPLS shim header that
    can pass over essentially any network.

    What RFC3031 refers to as the Incoming Label Map (ILM) I call
    net->mpls.platform_label[]. What RFC3031 refers to as the Next Hop
    Label Forwarding Entry (NHLFE) I call mpls_route. Though calling it the
    label forwarding information base (lfib) might also be valid.

    Further, the implementation forwards packets as described in RFC3032.
    There is no need for a flexible label forwarding path and, given the
    original motivation for MPLS, a strong disincentive to have one. In
    essence, the topmost label is read, looked up, removed, and replaced by
    0 or more new labels, and the packet is then sent out the specified
    interface to its next hop.

    Quite a few optional features are not implemented here. Among them
    are generation of ICMP errors when the TTL is exceeded or the packet
    is larger than the next hop MTU (those conditions are detected and the
    packets are dropped instead of generating an icmp error). The traffic
    class field is always set to 0. The implementation focuses on IP over
    MPLS and does not handle egress of other kinds of protocols.

    Instead of implementing coordination with the neighbour table and
    sorting out how to input next hops in a different address family (for
    which there is value), I was lazy and implemented a next hop MAC
    address instead. The code is simpler, and there are flavors of MPLS
    such as MPLS-TP where neither an IPv4 nor an IPv6 next hop is
    appropriate, so a next hop by MAC address would need to be implemented
    at some point.

    Two new definitions AF_MPLS and PF_MPLS are exposed to userspace.

    Decoding the MPLS header must be done by first byteswapping a 32-bit
    big-endian word into the local CPU byte order and then bit shifting to
    extract the pieces. There is no C bit-field that can represent a wire
    format MPLS header on a little-endian machine, as the low bits of the
    20-bit label wind up in the wrong half of the third byte. Therefore,
    internally everything is dealt with in CPU-native byte order except
    when writing to and reading from a packet.

    For management simplicity, if a label is configured to forward out
    an interface that is down, the packet is dropped early. Similarly,
    if a network interface is removed, rt_dev is updated to NULL
    (so no reference is preserved) and any packets for that label
    are dropped. Keeping the label entries in the kernel allows
    the kernel label table to function as the definitive source
    of which labels are allocated and which are not.

    Signed-off-by: "Eric W. Biederman"
    Signed-off-by: David S. Miller

    Eric W. Biederman
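The decoding rule described above (byteswap the 32-bit word first, then shift) can be sketched in userspace C. The layout is the RFC 3032 label stack entry: from most to least significant bit, a 20-bit label, a 3-bit traffic class, a 1-bit bottom-of-stack flag and an 8-bit TTL. The struct and function names are hypothetical, not the kernel's.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* Decoded label stack entry, held in CPU-native form. */
struct mpls_entry {
	uint32_t label;   /* 20 bits */
	uint8_t tc;       /* 3 bits, traffic class */
	uint8_t bos;      /* 1 bit, bottom of stack */
	uint8_t ttl;      /* 8 bits */
};

struct mpls_entry mpls_decode(const unsigned char wire[4])
{
	uint32_t be, v;
	struct mpls_entry e;

	memcpy(&be, wire, sizeof(be));  /* no unaligned load from the packet */
	v = ntohl(be);                   /* big-endian wire -> host order */

	e.label = (v >> 12) & 0xFFFFF;
	e.tc    = (v >> 9) & 0x7;
	e.bos   = (v >> 8) & 0x1;
	e.ttl   = v & 0xFF;
	return e;
}

void mpls_encode(const struct mpls_entry *e, unsigned char wire[4])
{
	uint32_t v = ((e->label & 0xFFFFFu) << 12) |
		     ((uint32_t)(e->tc & 0x7) << 9) |
		     ((uint32_t)(e->bos & 0x1) << 8) |
		     e->ttl;
	uint32_t be = htonl(v);          /* host order -> big-endian wire */

	memcpy(wire, &be, sizeof(be));
}
```

For example, the bytes 00 06 41 40 decode to label 100, traffic class 0, bottom of stack set and TTL 64, and encoding that entry again yields the same four bytes.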
     

20 Jan, 2015

1 commit

  • With this patch, a user can define an id for a peer netns by providing an FD or a
    PID. These ids are local to the netns where they are added (i.e. valid only in
    this netns).

    The main function (i.e. the one exported to other modules), peernet2id(),
    returns the id of a peer netns. If no id has been assigned by the user, this
    function allocates one.

    These ids will be used in netlink messages to point to a peer netns, for example
    in the case of an x-netns interface.

    Signed-off-by: Nicolas Dichtel
    Signed-off-by: David S. Miller

    Nicolas Dichtel
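A toy model of per-netns peer ids, assuming a fixed-size array in place of the kernel's IDR and a stand-in struct net. Each table belongs to one netns, so the same peer can carry different ids in different namespaces. The lookup-only and allocating variants mirror the split described in the 10 May 2015 rename commit; the names and the capacity are illustrative.

```c
#include <stddef.h>

#define MAX_NSID 16            /* hypothetical capacity */
#define NSID_NOT_ASSIGNED (-1)

struct net { int id; };        /* stand-in */

/* One table per netns: slot i holds the peer whose id is i. */
struct nsid_table {
	struct net *peer[MAX_NSID];
};

/* Lookup only: never allocates. */
int peernet2id(const struct nsid_table *t, const struct net *peer)
{
	if (peer == NULL)
		return NSID_NOT_ASSIGNED;
	for (int id = 0; id < MAX_NSID; id++)
		if (t->peer[id] == peer)
			return id;
	return NSID_NOT_ASSIGNED;
}

/* Lookup, allocating the lowest free id if the peer has none yet. */
int peernet2id_alloc(struct nsid_table *t, struct net *peer)
{
	int id = peernet2id(t, peer);

	if (id != NSID_NOT_ASSIGNED || peer == NULL)
		return id;
	for (id = 0; id < MAX_NSID; id++) {
		if (t->peer[id] == NULL) {
			t->peer[id] = peer;
			return id;
		}
	}
	return NSID_NOT_ASSIGNED;  /* table full */
}
```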
     

05 Dec, 2014

1 commit


01 Oct, 2014

1 commit

  • Eric Dumazet noticed that all no-nexthop or no-gateway routes which
    are already marked DST_HOST (e.g. input routes) will always be
    invalidated during sk_dst_check. Thus per-socket dst caching had
    absolutely no effect and early demuxing had no effect.

    Thus this patch removes rt6i_genid: fn_sernum already gets modified during
    add operations, so we only need to ensure we mutate fn_sernum during IPv6
    address remove operations. This is a fairly costly operation,
    but address removal should not happen that often. Also our MTU update
    functions do the same and we have heard no complaints so far. XFRM policy
    changes also cause a call into fib6_flush_trees. Also plug a hole in
    rt6_info (no cacheline changes).

    I verified via tracing that this change has effect.

    Cc: Eric Dumazet
    Cc: YOSHIFUJI Hideaki
    Cc: Vlad Yasevich
    Cc: Nicolas Dichtel
    Cc: Martin Lau
    Signed-off-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller

    Hannes Frederic Sowa
     

25 Apr, 2014

1 commit

  • Johannes noted this is not needed: none of the fragment
    accessors need CONFIG_NET_NS. This was test-compiled with
    CONFIG_BT_6LOWPAN=y and CONFIG_NET_NS disabled.

    CC: Alexander Smirnov
    Cc: Dmitry Eremin-Solenikov
    Cc: linux-zigbee-devel@lists.sourceforge.net
    Cc: David S. Miller
    Cc: netdev@vger.kernel.org
    Cc: linux-kernel@vger.kernel.org
    Cc: Johannes Berg
    Signed-off-by: Luis R. Rodriguez
    Signed-off-by: David S. Miller

    Luis R. Rodriguez
     

21 Apr, 2014

1 commit

  • This will simplify the new reassembly backport
    with no code changes being required.

    CC: Alexander Smirnov
    Cc: Dmitry Eremin-Solenikov
    Cc: linux-zigbee-devel@lists.sourceforge.net
    Cc: David S. Miller
    Cc: netdev@vger.kernel.org
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Luis R. Rodriguez
    Signed-off-by: David S. Miller

    Luis R. Rodriguez
     

17 Apr, 2014

1 commit

  • As suggested by Julian:

    Simply, flowi4_iif must not contain 0, it does not
    look logical to ignore all ip rules with specified iif.

    because in fib_rule_match() we do:

    if (rule->iifindex && (rule->iifindex != fl->flowi_iif))
            goto out;

    flowi4_iif should be LOOPBACK_IFINDEX by default.

    We need to move LOOPBACK_IFINDEX to include/net/flow.h:

    1) It is mostly used by flowi_iif

    2) Fix the following compile error if we use it in flow.h
    in later patches:

    In file included from include/linux/netfilter.h:277:0,
    from include/net/netns/netfilter.h:5,
    from include/net/net_namespace.h:21,
    from include/linux/netdevice.h:43,
    from include/linux/icmpv6.h:12,
    from include/linux/ipv6.h:61,
    from include/net/ipv6.h:16,
    from include/linux/sunrpc/clnt.h:27,
    from include/linux/nfs_fs.h:30,
    from init/do_mounts.c:32:
    include/net/flow.h: In function ‘flowi4_init_output’:
    include/net/flow.h:84:32: error: ‘LOOPBACK_IFINDEX’ undeclared (first use in this function)

    Cc: Eric Biederman
    Cc: Julian Anastasov
    Cc: David S. Miller
    Signed-off-by: Cong Wang
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller

    Cong Wang
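The iif condition quoted from fib_rule_match() above can be exercised on its own. This sketch uses hypothetical stand-in structs and copies only that condition: with flowi_iif left at 0, a rule that names an input interface (for example loopback) can never match, whereas defaulting flowi_iif to LOOPBACK_IFINDEX lets such a rule match.

```c
#include <stdbool.h>

#define LOOPBACK_IFINDEX 1   /* the constant introduced in Aug 2012 */

struct fib_rule { int iifindex; };  /* 0: rule has no iif selector */
struct flowi    { int flowi_iif; };

/* Same test as the quoted code: a rule with an iif selector only
 * matches flows whose input interface equals it. */
bool rule_iif_matches(const struct fib_rule *rule, const struct flowi *fl)
{
	if (rule->iifindex && rule->iifindex != fl->flowi_iif)
		return false;
	return true;
}
```

Rules without an iif selector match either way, so the new default only changes the outcome for rules that do specify one.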
     

01 Mar, 2014

1 commit


10 Feb, 2014

1 commit

  • Move the prototype declarations of functions to the header file
    include/net/net_namespace.h from net/ipx/af_ipx.c because they are used
    by more than one file.

    This eliminates the following warning in net/ipx/sysctl_net_ipx.c:
    net/ipx/sysctl_net_ipx.c:33:6: warning: no previous prototype for ‘ipx_register_sysctl’ [-Wmissing-prototypes]
    net/ipx/sysctl_net_ipx.c:38:6: warning: no previous prototype for ‘ipx_unregister_sysctl’ [-Wmissing-prototypes]

    Signed-off-by: Rashika Kheria
    Signed-off-by: David S. Miller

    Rashika Kheria
     

15 Oct, 2013

1 commit


02 Oct, 2013

1 commit

  • Conflicts:
    drivers/net/ethernet/emulex/benet/be.h
    drivers/net/usb/qmi_wwan.c
    drivers/net/wireless/brcm80211/brcmfmac/dhd_bus.h
    include/net/netfilter/nf_conntrack_synproxy.h
    include/net/secure_seq.h

    The conflicts are of two varieties:

    1) Conflicts with Joe Perches's 'extern' removal from header file
    function declarations. Usually it's an argument signature change
    or a function being added/removed. The resolutions are trivial.

    2) Some overlapping changes in qmi_wwan.c and be.h, one commit adds
    a new value, another changes an existing value. That sort of
    thing.

    Signed-off-by: David S. Miller

    David S. Miller
     

29 Sep, 2013

1 commit

  • There is currently no serialization between network namespaces exiting and
    network devices exiting, as the final part of netdev_run_todo does not
    happen under the rtnl_lock. This is compounded by the fact that the
    only list of devices unregistering in netdev_run_todo is local to
    netdev_run_todo.

    This lack of serialization in extreme cases results in network devices
    unregistering in netdev_run_todo after the loopback device of their
    network namespace has been freed (making dst_ifdown unsafe), and after
    their network namespace has exited (making the NETDEV_UNREGISTER
    and NETDEV_UNREGISTER_FINAL callbacks unsafe).

    Add the missing serialization with a per-network-namespace count of how
    many network devices are unregistering and a wait queue that is
    woken up whenever the count is decreased. The count and wait queue
    allow default_device_exit_batch to wait until all of the unregistration
    activity for a network namespace has finished before proceeding to
    unregister the loopback device and then allowing the network namespace
    to exit.

    Only a single global wait queue is used because there is a single global
    lock and a single waiter; per-network-namespace wait queues would be a
    waste of resources.

    The per network namespace count of unregistering devices gives a
    progress guarantee because the number of network devices unregistering
    in an exiting network namespace must ultimately drop to zero (assuming
    network device unregistration completes).

    The basic logic remains the same as in v1. This patch is now half
    comment and half rtnl_lock_unregistering, an expanded version of
    wait_event that performs no extra work in the common case where no network
    devices are unregistering when we get to default_device_exit_batch.

    Reported-by: Francesco Ruggeri
    Signed-off-by: "Eric W. Biederman"
    Signed-off-by: David S. Miller

    Eric W. Biederman
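The count-plus-wait-queue scheme can be modeled with POSIX threads. This is an illustrative sketch, not the kernel code: a pthread mutex stands in for the rtnl lock, one condition variable stands in for the single global wait queue, and all function and field names are hypothetical. Because the wait queue is shared by every namespace, the waiter rechecks its own namespace's count in a loop after each wakeup.

```c
#include <pthread.h>

/* One global lock and one global wait queue, as in the commit. */
static pthread_mutex_t rtnl = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t netdev_unregistering_wq = PTHREAD_COND_INITIALIZER;

/* Stand-in: only the per-namespace count matters here. */
struct net { int dev_unreg_count; };

/* A device in this namespace starts unregistering. */
void dev_unreg_begin(struct net *net)
{
	pthread_mutex_lock(&rtnl);
	net->dev_unreg_count++;
	pthread_mutex_unlock(&rtnl);
}

/* A device finished unregistering: drop the count and wake the waiter. */
void dev_unreg_done(struct net *net)
{
	pthread_mutex_lock(&rtnl);
	net->dev_unreg_count--;
	pthread_cond_broadcast(&netdev_unregistering_wq);
	pthread_mutex_unlock(&rtnl);
}

/* Like rtnl_lock_unregistering(): return holding the lock, but only once
 * no device in this namespace is still unregistering. */
void lock_unregistering(struct net *net)
{
	pthread_mutex_lock(&rtnl);
	while (net->dev_unreg_count > 0)
		pthread_cond_wait(&netdev_unregistering_wq, &rtnl);
}

void rtnl_unlock_sketch(void)
{
	pthread_mutex_unlock(&rtnl);
}
```

In the common case the count is already zero, so lock_unregistering() costs no more than taking the lock, matching the "no extra work" point above.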
     

22 Sep, 2013

1 commit

  • There is a mix of function prototypes with and without extern
    in the kernel sources. Standardize on not using extern for
    function prototypes.

    Function prototypes don't need to be written with extern.
    extern is assumed by the compiler. Its use is as unnecessary as
    using auto to declare automatic/local variables in a block.

    Signed-off-by: Joe Perches
    Signed-off-by: David S. Miller

    Joe Perches
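A small illustration of the point about extern on prototypes: both declarations below declare the same function with external linkage, so the compiler treats them identically. The function name is arbitrary. For objects the keyword still matters: at file scope "extern int x;" is only a declaration, while "int x;" is a tentative definition.

```c
/* These two prototypes are equivalent; extern is implied for functions. */
extern int add_one(int x);
int add_one(int x);

int add_one(int x)
{
	return x + 1;
}
```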
     

01 Aug, 2013

1 commit

  • The current network namespace has only one genid for both IPv4 and IPv6,
    which has the following drawbacks:

    - Adding or deleting an IPv4 address invalidates all IPv6 routing table entries.
    - Inserting or removing an XFRM policy also invalidates both IPv4 and IPv6
    routing table entries, even when the policy only applies to one address family.

    Thus, this patch splits the one genid in two, so that IPv4 and IPv6 are
    handled separately at a finer granularity.

    Signed-off-by: Fan Du
    Acked-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller

    fan.du
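The generation-counter idea behind this split can be sketched as follows, with hypothetical names and C11 atomics. Each cached entry records the counter value for its own address family; bumping one family's counter invalidates all of that family's entries at once, without walking the cache, and leaves the other family untouched.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-namespace counters, one per address family. */
struct net {
	atomic_int ipv4_genid;
	atomic_int ipv6_genid;
};

struct cached_route {
	int family;   /* 4 or 6 */
	int genid;    /* counter value seen when the entry was created */
};

static atomic_int *genid_of(struct net *net, int family)
{
	return family == 4 ? &net->ipv4_genid : &net->ipv6_genid;
}

void route_cache(struct net *net, struct cached_route *rt, int family)
{
	rt->family = family;
	rt->genid = atomic_load(genid_of(net, family));
}

/* Invalidate every cached entry of one family in O(1). */
void genid_bump(struct net *net, int family)
{
	atomic_fetch_add(genid_of(net, family), 1);
}

bool route_valid(struct net *net, const struct cached_route *rt)
{
	return rt->genid == atomic_load(genid_of(net, rt->family));
}
```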
     

26 Jun, 2013

1 commit


03 Jun, 2013

1 commit

  • commit 13d82bf5 (ipv4: Fix flushing of cached routing informations)
    added support for flushing learned PMTU information.

    However, using rt_genid is quite heavy as it is bumped on route
    add/change and multicast events amongst other places. These can
    happen quite often, especially if using dynamic routing protocols.

    While this is OK for routes (as they are just recreated locally),
    the PMTU information is learned from remote systems and the ICMP
    notification can come with long delays. It is worth having a separate
    genid to avoid excessive PMTU resets.

    Cc: Steffen Klassert
    Signed-off-by: Timo Teräs
    Signed-off-by: David S. Miller

    Timo Teräs
     

06 Apr, 2013

1 commit


20 Nov, 2012

1 commit

  • Assign a unique proc inode to each namespace, and use that
    inode number to ensure we only allocate at most one proc
    inode for every namespace in proc.

    A single proc inode per namespace allows userspace to test
    to see if two processes are in the same namespace.

    This has been a long requested feature and only blocked because
    a naive implementation would put the id in a global space and
    would ultimately require having a namespace for the names of
    namespaces, making migration and certain virtualization tricks
    impossible.

    We still don't have per-superblock inode numbers for proc, which
    appears necessary for application-unaware checkpoint/restart and
    migrations (if the application is using namespace file descriptors),
    but that is now allowed by the design if it becomes important.

    I have preallocated the ipc and uts initial proc inode numbers so
    their structures can be statically initialized.

    Signed-off-by: Eric W. Biederman

    Eric W. Biederman
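With a unique proc inode per namespace, a userspace "same namespace?" test reduces to comparing the device and inode numbers of the /proc/<pid>/ns/ files. A minimal sketch, assuming a Linux system with /proc mounted; the helper name same_namespace() is hypothetical.

```c
#include <sys/stat.h>

/* Returns 1 if both paths refer to the same namespace, 0 if not,
 * and -1 if either path cannot be stat()ed. */
int same_namespace(const char *a, const char *b)
{
	struct stat sa, sb;

	if (stat(a, &sa) != 0 || stat(b, &sb) != 0)
		return -1;
	/* Compare both fields: an inode number is only unique per device. */
	return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}
```

For example, same_namespace("/proc/self/ns/net", "/proc/1/ns/net") reports whether the caller shares init's network namespace; reading another process's ns files may require extra privileges.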
     

19 Nov, 2012

2 commits

  • The user namespace which creates a new network namespace owns that
    namespace and all resources created in it. This way we can target
    capability checks for privileged operations against network resources to
    the user_ns which created the network namespace in which the resource
    lives. Privilege to the user namespace which owns the network
    namespace, or any parent user namespace thereof, provides the same
    privilege to the network resource.

    This patch is reworked from a version originally by
    Serge E. Hallyn

    Acked-by: Serge Hallyn
    Signed-off-by: Eric W. Biederman

    Eric W. Biederman
     
  • The copy of copy_net_ns used when the network stack is not
    built is broken as it does not return -EINVAL when attempting
    to create a new network namespace. We don't even have
    a previous network namespace.

    Since we need a copy of copy_net_ns in net/net_namespace.h that is
    available when the networking stack is not built at all, move the
    correct version of copy_net_ns from net_namespace.c into net_namespace.h.
    This leaves us with just 2 versions of copy_net_ns: one for when
    we compile in network namespace support and a stub for all other
    occasions.

    Acked-by: Serge Hallyn
    Signed-off-by: Eric W. Biederman

    Eric W. Biederman
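The ownership rule in the first commit above (privilege in the user namespace that owns the network namespace, or in any ancestor of it, applies to the network resource) amounts to a walk up the parent chain. This simplified sketch uses stand-in structs and a hypothetical helper, privileged_over_net(); it leaves out the credential and capability-set details of the kernel's real check.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-ins: a user namespace knows its parent; a netns knows its owner. */
struct user_namespace { struct user_namespace *parent; };
struct net { struct user_namespace *user_ns; };

/* True if holding a capability in cap_ns grants it over net's resources,
 * i.e. cap_ns is the owning user namespace or one of its ancestors. */
bool privileged_over_net(const struct user_namespace *cap_ns,
			 const struct net *net)
{
	for (const struct user_namespace *ns = net->user_ns; ns != NULL;
	     ns = ns->parent)
		if (ns == cap_ns)
			return true;
	return false;
}
```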
     

06 Oct, 2012

1 commit


29 Sep, 2012

1 commit

  • Conflicts:
    drivers/net/team/team.c
    drivers/net/usb/qmi_wwan.c
    net/batman-adv/bat_iv_ogm.c
    net/ipv4/fib_frontend.c
    net/ipv4/route.c
    net/l2tp/l2tp_netlink.c

    The team, fib_frontend, route, and l2tp_netlink conflicts were simply
    overlapping changes.

    qmi_wwan and bat_iv_ogm were of the "use HEAD" variety.

    With help from Antonio Quartulli.

    Signed-off-by: David S. Miller

    David S. Miller
     

20 Sep, 2012

1 commit

  • As pointed out by Michal, it is necessary to add a new
    namespace for the nf_conntrack_reasm code; this prepares
    for the second patch.

    Cc: Herbert Xu
    Cc: Michal Kubeček
    Cc: David Miller
    Cc: Patrick McHardy
    Cc: Pablo Neira Ayuso
    Cc: netfilter-devel@vger.kernel.org
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller

    Amerigo Wang
     

19 Sep, 2012

1 commit


15 Aug, 2012

1 commit

  • - Move the address lists into struct net
    - Add per network namespace initialization and cleanup
    - Pass around struct net so it is everywhere I need it.
    - Rename all of the global variable references into references
    to the variables moved into struct net

    Signed-off-by: "Eric W. Biederman"
    Acked-by: Vlad Yasevich
    Signed-off-by: David S. Miller

    Eric W. Biederman
     

10 Aug, 2012

2 commits

  • As pointed out, there are places that access net->loopback_dev->ifindex,
    and once ifindex generation is made per-net this value becomes a constant
    equal to 1. So go ahead and introduce the LOOPBACK_IFINDEX constant and use
    it where appropriate.

    Signed-off-by: Pavel Emelyanov
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Pavel Emelyanov
     
  • Strictly speaking this is only _really_ required for checkpoint-restore to
    make loopback device always have the same index.

    This change appears to be safe wrt the "ifindex should be unique per-system"
    concept, as all ifindex usage is either already made per net namespace
    or is explicitly limited to init_net only.

    There are two cool side effects of this. The first one: ifindices of
    devices in a container are always small, regardless of how many containers
    we've started (and re-started) so far. The second one: we can speed
    up the loopback ifindex access as shown in the next patch.

    v2: Place ifindex right after dev_base_seq : avoid two holes and use the
    same cache line, dirtied in list_netdevice()/unlist_netdevice()

    Signed-off-by: Pavel Emelyanov
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Pavel Emelyanov
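A sketch of per-namespace ifindex allocation, assuming a small fixed-size "in use" table in place of the kernel's lookup by index; the names and the capacity are hypothetical. Each namespace has its own counter, so, assuming as the commits imply that loopback is the first device registered in a namespace, loopback always receives index 1. That is what makes a LOOPBACK_IFINDEX constant possible.

```c
#include <stdbool.h>

#define LOOPBACK_IFINDEX 1
#define MAX_DEVS 64              /* hypothetical capacity */

struct net {
	int ifindex;                 /* last index handed out here */
	bool used[MAX_DEVS + 1];     /* used[i]: index i is taken */
};

/* Hand out the next free index in this namespace, wrapping back to 1.
 * Returns -1 if every index is taken. */
int dev_new_index(struct net *net)
{
	for (int tries = 0; tries < MAX_DEVS; tries++) {
		if (++net->ifindex > MAX_DEVS)
			net->ifindex = 1;
		if (!net->used[net->ifindex]) {
			net->used[net->ifindex] = true;
			return net->ifindex;
		}
	}
	return -1;
}
```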
     

17 Jul, 2012

1 commit

  • Before this patch, sock_diag works for init_net only and dumps
    information about sockets from all namespaces.

    This patch extends sock_diag to all namespaces.
    It creates a netlink kernel socket for each netns and filters
    data during dumping.

    v2: filter according to netns in all places;
    remove an unused variable.

    Cc: "David S. Miller"
    Cc: Alexey Kuznetsov
    Cc: James Morris
    Cc: Hideaki YOSHIFUJI
    Cc: Patrick McHardy
    Cc: Pavel Emelyanov
    CC: Eric Dumazet
    Cc: linux-kernel@vger.kernel.org
    Cc: netdev@vger.kernel.org
    Signed-off-by: Andrew Vagin
    Acked-by: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Andrey Vagin
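The per-namespace filtering described in the sock_diag commit can be sketched as a simple loop over candidate sockets. The types and the diag_dump() helper are hypothetical; the kernel compares namespaces with a helper such as net_eq() rather than the raw pointer comparison used here, though with network namespaces enabled the effect is the same.

```c
#include <stddef.h>

struct net { int id; };                            /* stand-in */
struct sock { const struct net *net; int ino; };   /* stand-in */

/* Copy the inode numbers of sockets that live in req_net into out,
 * skipping sockets from other namespaces. Returns the count written. */
size_t diag_dump(const struct net *req_net, const struct sock *socks,
		 size_t n, int *out, size_t out_len)
{
	size_t written = 0;

	for (size_t i = 0; i < n && written < out_len; i++) {
		if (socks[i].net != req_net)
			continue;   /* belongs to another netns */
		out[written++] = socks[i].ino;
	}
	return written;
}
```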