15 Oct, 2013

1 commit


02 Oct, 2013

1 commit

  • Conflicts:
    drivers/net/ethernet/emulex/benet/be.h
    drivers/net/usb/qmi_wwan.c
    drivers/net/wireless/brcm80211/brcmfmac/dhd_bus.h
    include/net/netfilter/nf_conntrack_synproxy.h
    include/net/secure_seq.h

    The conflicts are of two varieties:

    1) Conflicts with Joe Perches's 'extern' removal from header file
    function declarations. Usually it's an argument signature change
    or a function being added/removed. The resolutions are trivial.

    2) Some overlapping changes in qmi_wwan.c and be.h; one commit adds
    a new value, another changes an existing value. That sort of
    thing.

    Signed-off-by: David S. Miller

    David S. Miller
     

29 Sep, 2013

1 commit

  • There is currently no serialization between network namespaces exiting
    and network devices exiting, as the final part of netdev_run_todo does
    not happen under the rtnl_lock. This is compounded by the fact that the
    only list of devices unregistering in netdev_run_todo is local to
    netdev_run_todo.

    This lack of serialization in extreme cases results in network devices
    unregistering in netdev_run_todo after the loopback device of their
    network namespace has been freed (making dst_ifdown unsafe), and after
    their network namespace has exited (making the NETDEV_UNREGISTER
    and NETDEV_UNREGISTER_FINAL callbacks unsafe).

    Add the missing serialization with a per network namespace count of how
    many network devices are unregistering and a wait queue that is woken
    up whenever the count is decreased. The count and wait queue
    allow default_device_exit_batch to wait until all of the unregistration
    activity for a network namespace has finished before proceeding to
    unregister the loopback device and then allowing the network namespace
    to exit.

    Only a single global wait queue is used: there is a single global lock
    and a single waiter, so per network namespace wait queues would be a
    waste of resources.

    The per network namespace count of unregistering devices gives a
    progress guarantee because the number of network devices unregistering
    in an exiting network namespace must ultimately drop to zero (assuming
    network device unregistration completes).

    The basic logic remains the same as in v1. This patch is now half
    comment and half rtnl_lock_unregistering, an expanded version of
    wait_event that performs no extra work in the common case where no
    network devices are unregistering when we get to
    default_device_exit_batch.

    Reported-by: Francesco Ruggeri
    Signed-off-by: "Eric W. Biederman"
    Signed-off-by: David S. Miller

    Eric W. Biederman
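The count-plus-wait-queue scheme described above can be modeled in userspace. This is a minimal sketch under stated assumptions: a pthread mutex stands in for rtnl_lock, one global pthread condition variable stands in for the single global wait queue, and struct netns, begin_unregister(), finish_unregister() and lock_unregistering() are hypothetical names, not the kernel's code.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical model: big_lock stands in for rtnl_lock and
 * unregistering_wq for the single global wait queue. */
static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t unregistering_wq = PTHREAD_COND_INITIALIZER;

struct netns {
	int dev_unreg_count;	/* devices of this namespace still unregistering */
};

/* Called with big_lock held when a device starts unregistering. */
static void begin_unregister(struct netns *net)
{
	net->dev_unreg_count++;
}

/* Called when a device finishes the final, unlocked part of its
 * unregistration: drop the count and wake any waiter. */
static void finish_unregister(struct netns *net)
{
	pthread_mutex_lock(&big_lock);
	assert(net->dev_unreg_count > 0);
	net->dev_unreg_count--;
	pthread_cond_broadcast(&unregistering_wq);
	pthread_mutex_unlock(&big_lock);
}

/* Namespace exit path: returns with big_lock held once no device of
 * this namespace is unregistering. When the count is already zero
 * (the common case) it never sleeps. */
static void lock_unregistering(struct netns *net)
{
	pthread_mutex_lock(&big_lock);
	while (net->dev_unreg_count > 0)
		pthread_cond_wait(&unregistering_wq, &big_lock);
}
```

Because the count can only fall to zero once every unregistration finishes, the waiter is guaranteed to make progress, matching the argument in the commit message.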
     

22 Sep, 2013

1 commit

  • There is a mix of function prototypes with and without extern
    in the kernel sources. Standardize on not using extern for
    function prototypes.

    Function prototypes don't need to be written with extern.
    extern is assumed by the compiler. Its use is as unnecessary as
    using auto to declare automatic/local variables in a block.

    Signed-off-by: Joe Perches
    Signed-off-by: David S. Miller

    Joe Perches
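To illustrate the claim that extern is redundant on function prototypes, here is a small sketch in plain C; add_one() and sum_to() are hypothetical functions used only for illustration. A file-scope function declaration has external linkage by default, and a block-scope variable already has automatic storage duration.

```c
#include <assert.h>

/* Both declarations mean the same thing: file-scope function
 * declarations have external linkage by default, so "extern" adds
 * nothing. Declaring the function both ways is legal C. */
extern int add_one(int x);
int add_one(int x);

/* One definition satisfies both declarations. */
int add_one(int x)
{
	return x + 1;
}

/* "auto" on a local variable is just as redundant: block-scope
 * variables already have automatic storage duration. */
int sum_to(int n)
{
	auto int total = 0;	/* same as: int total = 0; */

	for (int i = 1; i <= n; i++)
		total += i;
	return total;
}
```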
     

01 Aug, 2013

1 commit

  • The current network namespace has only one genid for both IPv4 and IPv6,
    which has the following drawbacks:

    - Adding/deleting an IPv4 address invalidates all IPv6 routing table entries.
    - Inserting/removing an XFRM policy also invalidates both IPv4 and IPv6 routing
    table entries, even when the policy only applies to one address family.

    Thus, this patch attempts to split the single genid into two, one for IPv4
    and one for IPv6, for finer-grained invalidation.

    Signed-off-by: Fan Du
    Acked-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller

    fan.du
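The effect of the split can be sketched with one generation counter per address family: a cached route records the genid it was created under and is stale once that family's counter moves on. This is a simplified model; struct netns_genids, struct cached_route and the helper functions are hypothetical, not the kernel's data structures.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: one generation counter per address family. */
struct netns_genids {
	int ipv4;
	int ipv6;
};

/* A cached route remembers the generation it was created under. */
struct cached_route {
	bool is_ipv6;
	int genid;
};

static int current_genid(const struct netns_genids *g, bool is_ipv6)
{
	return is_ipv6 ? g->ipv6 : g->ipv4;
}

static struct cached_route cache_route(const struct netns_genids *g, bool is_ipv6)
{
	struct cached_route r = { is_ipv6, current_genid(g, is_ipv6) };
	return r;
}

/* A cached entry is stale once its own family's genid has moved on. */
static bool route_valid(const struct netns_genids *g, const struct cached_route *r)
{
	return r->genid == current_genid(g, r->is_ipv6);
}

/* Adding or deleting an IPv4 address bumps only the IPv4 genid. */
static void ipv4_address_changed(struct netns_genids *g)
{
	g->ipv4++;
}
```

With a single shared counter, the IPv6 entry in the usage below would have been invalidated as well.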
     

26 Jun, 2013

1 commit


03 Jun, 2013

1 commit

  • commit 13d82bf5 (ipv4: Fix flushing of cached routing informations)
    added the support to flush learned pmtu information.

    However, using rt_genid is quite heavy as it is bumped on route
    add/change and multicast events amongst other places. These can
    happen quite often, especially if using dynamic routing protocols.

    While this is ok with routes (as they are just recreated locally),
    the pmtu information is learned from remote systems and the icmp
    notification can come with long delays. It is worth having a separate
    genid to avoid excessive pmtu resets.

    Cc: Steffen Klassert
    Signed-off-by: Timo Teräs
    Signed-off-by: David S. Miller

    Timo Teräs
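A sketch of why a separate counter helps: learned PMTU values are checked against their own generation counter, so the frequent route-generation bumps leave them alone. The names struct netns_gen, pmtu_genid and the helpers are hypothetical stand-ins, not the kernel's fields.

```c
#include <assert.h>

/* Hypothetical model: routes and learned PMTU values are validated
 * against separate generation counters. */
struct netns_gen {
	unsigned int rt_genid;		/* bumped often: route add/change, multicast, ... */
	unsigned int pmtu_genid;	/* bumped only when PMTU state must be flushed */
};

struct learned_pmtu {
	unsigned int mtu;
	unsigned int genid;		/* pmtu_genid at the time it was learned */
};

static struct learned_pmtu learn_pmtu(const struct netns_gen *g, unsigned int mtu)
{
	struct learned_pmtu p = { mtu, g->pmtu_genid };
	return p;
}

/* Use the learned MTU while it is still valid, else the default. */
static unsigned int effective_mtu(const struct netns_gen *g,
				  const struct learned_pmtu *p,
				  unsigned int default_mtu)
{
	return p->genid == g->pmtu_genid ? p->mtu : default_mtu;
}

static void route_changed(struct netns_gen *g) { g->rt_genid++; }
static void flush_pmtu(struct netns_gen *g) { g->pmtu_genid++; }
```

A route change no longer throws away a PMTU value that may have taken a long time to learn from a remote ICMP notification; only an explicit PMTU flush does.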
     

06 Apr, 2013

1 commit


20 Nov, 2012

1 commit

  • Assign a unique proc inode to each namespace, and use that
    inode number to ensure we only allocate at most one proc
    inode for every namespace in proc.

    A single proc inode per namespace allows userspace to test
    to see if two processes are in the same namespace.

    This has been a long requested feature and only blocked because
    a naive implementation would put the id in a global space and
    would ultimately require having a namespace for the names of
    namespaces, making migration and certain virtualization tricks
    impossible.

    We still don't have per superblock inode numbers for proc, which
    appears necessary for application unaware checkpoint/restart and
    migrations (if the application is using namespace file descriptors)
    but that is now allowed by the design if it becomes important.

    I have preallocated the ipc and uts initial proc inode numbers so
    their structures can be statically initialized.

    Signed-off-by: Eric W. Biederman

    Eric W. Biederman
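From userspace, the "same namespace" test this enables amounts to comparing the device and inode numbers of two /proc/<pid>/ns/net entries, the approach described in the namespaces(7) man page. A minimal sketch; same_ns_ids() and same_namespace() are hypothetical helper names.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Two namespace handles name the same namespace when both the
 * device and the inode numbers match. */
static bool same_ns_ids(dev_t dev_a, ino_t ino_a, dev_t dev_b, ino_t ino_b)
{
	return dev_a == dev_b && ino_a == ino_b;
}

/* Returns 1 if the two paths (e.g. "/proc/<pid>/ns/net") refer to the
 * same namespace, 0 if they differ, and -1 if either stat() fails.
 * stat() follows the ns symlink to the namespace inode itself. */
static int same_namespace(const char *path_a, const char *path_b)
{
	struct stat a, b;

	if (stat(path_a, &a) != 0 || stat(path_b, &b) != 0)
		return -1;
	return same_ns_ids(a.st_dev, a.st_ino, b.st_dev, b.st_ino) ? 1 : 0;
}
```

For example, same_namespace("/proc/1/ns/net", "/proc/self/ns/net") tells a sufficiently privileged process whether it shares init's network namespace.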
     

19 Nov, 2012

2 commits

  • The user namespace which creates a new network namespace owns that
    namespace and all resources created in it. This way we can target
    capability checks for privileged operations against network resources to
    the user_ns which created the network namespace in which the resource
    lives. Privilege to the user namespace which owns the network
    namespace, or any parent user namespace thereof, provides the same
    privilege to the network resource.

    This patch is reworked from a version originally by
    Serge E. Hallyn

    Acked-by: Serge Hallyn
    Signed-off-by: Eric W. Biederman

    Eric W. Biederman
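The ownership rule above (privilege in the owning user namespace, or in any ancestor of it, grants privilege over the network resource) can be sketched as a walk up the parent chain. This simplified model assumes the caller holds the capability in exactly one user namespace. struct user_ns, struct net_ns and capable_over_net() are hypothetical and are not the kernel's ns_capable().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of user-namespace ownership. */
struct user_ns {
	const struct user_ns *parent;	/* NULL for the initial user namespace */
};

struct net_ns {
	const struct user_ns *owner;	/* user namespace that created it */
};

/* The caller is privileged over a network resource if the user namespace
 * it is privileged in owns the network namespace, or is an ancestor of
 * the owner. */
static bool capable_over_net(const struct user_ns *caller_privileged_in,
			     const struct net_ns *net)
{
	for (const struct user_ns *ns = net->owner; ns != NULL; ns = ns->parent)
		if (ns == caller_privileged_in)
			return true;
	return false;
}
```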
     
  • The copy of copy_net_ns used when the network stack is not
    built is broken, as it does not return -EINVAL when asked
    to create a new network namespace. We don't even have
    a previous network namespace.

    Since we need a copy of copy_net_ns in net/net_namespace.h that is
    available when the networking stack is not built at all, move the
    correct version of copy_net_ns from net_namespace.c into net_namespace.h,
    leaving us with just 2 versions of copy_net_ns: one version for when
    we compile in network namespace support and another stub for all other
    occasions.

    Acked-by: Serge Hallyn
    Signed-off-by: Eric W. Biederman

    Eric W. Biederman
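The behavior the stub must have can be sketched in userspace: a request for a new network namespace (CLONE_NEWNET) fails with -EINVAL, and any other request keeps the caller's existing namespace. Assumptions: struct net_ns is a hypothetical stand-in for struct net, and because userspace has no ERR_PTR(), copy_net_ns_stub() (a hypothetical name) reports the error through its return value instead of an error pointer.

```c
#define _GNU_SOURCE		/* for CLONE_NEWNET in <sched.h> */
#include <assert.h>
#include <errno.h>
#include <sched.h>
#include <stddef.h>

struct net_ns { int id; };	/* hypothetical stand-in for struct net */

/* Stub used when network namespace support is not built: asking for
 * a new namespace fails, otherwise the old namespace is reused. */
static int copy_net_ns_stub(unsigned long flags, struct net_ns *old_net,
			    struct net_ns **out)
{
	if (flags & CLONE_NEWNET)
		return -EINVAL;
	*out = old_net;
	return 0;
}
```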
     

06 Oct, 2012

1 commit


29 Sep, 2012

1 commit

  • Conflicts:
    drivers/net/team/team.c
    drivers/net/usb/qmi_wwan.c
    net/batman-adv/bat_iv_ogm.c
    net/ipv4/fib_frontend.c
    net/ipv4/route.c
    net/l2tp/l2tp_netlink.c

    The team, fib_frontend, route, and l2tp_netlink conflicts were simply
    overlapping changes.

    qmi_wwan and bat_iv_ogm were of the "use HEAD" variety.

    With help from Antonio Quartulli.

    Signed-off-by: David S. Miller

    David S. Miller
     

20 Sep, 2012

1 commit

  • As pointed out by Michal, it is necessary to add a new
    namespace for nf_conntrack_reasm code. This prepares
    for the second patch.

    Cc: Herbert Xu
    Cc: Michal Kubeček
    Cc: David Miller
    Cc: Patrick McHardy
    Cc: Pablo Neira Ayuso
    Cc: netfilter-devel@vger.kernel.org
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller

    Amerigo Wang
     

19 Sep, 2012

1 commit


15 Aug, 2012

1 commit

  • - Move the address lists into struct net
    - Add per network namespace initialization and cleanup
    - Pass around struct net so it is everywhere I need it.
    - Rename all of the global variable references into references
    to the variables moved into struct net

    Signed-off-by: "Eric W. Biederman"
    Acked-by: Vlad Yasevich
    Signed-off-by: David S. Miller

    Eric W. Biederman
     

10 Aug, 2012

2 commits

  • As pointed out, there are places that access net->loopback_dev->ifindex,
    and after ifindex generation is made per-net this value becomes a constant
    equal to 1. So go ahead and introduce the LOOPBACK_IFINDEX constant and use
    it where appropriate.

    Signed-off-by: Pavel Emelyanov
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Pavel Emelyanov
     
  • Strictly speaking this is only _really_ required for checkpoint-restore to
    make loopback device always have the same index.

    This change appears to be safe with respect to the "ifindex should be
    unique per-system" concept, as all the ifindex usage is either already
    made per net namespace or is explicitly limited to init_net only.

    There are two cool side effects of this. The first one -- ifindices of
    devices in container are always small, regardless of how many containers
    we've started (and re-started) so far. The second one is -- we can speed
    up the loopback ifindex access as shown in the next patch.

    v2: Place ifindex right after dev_base_seq : avoid two holes and use the
    same cache line, dirtied in list_netdevice()/unlist_netdevice()

    Signed-off-by: Pavel Emelyanov
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Pavel Emelyanov
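The two patches above can be illustrated together with a per-namespace index allocator: each namespace hands out indices from its own counter, so the first device registered in every namespace (the loopback device) always receives LOOPBACK_IFINDEX. This is a toy sketch: the fixed-size table, struct netns_devs and alloc_ifindex() are hypothetical, and the kernel checks for a free index by looking up existing devices rather than using an array.

```c
#include <assert.h>
#include <stdbool.h>

#define LOOPBACK_IFINDEX 1
#define MAX_DEVS 64		/* hypothetical table size for the sketch */

/* Hypothetical per-namespace device index state. */
struct netns_devs {
	int last_ifindex;		/* last index handed out in this namespace */
	bool in_use[MAX_DEVS + 1];	/* in_use[i]: ifindex i is taken */
};

/* Allocate the next free ifindex in this namespace, wrapping back to 1.
 * Returns -1 if every index in the toy table is taken. */
static int alloc_ifindex(struct netns_devs *net)
{
	int ifindex = net->last_ifindex;

	for (int tries = 0; tries < MAX_DEVS; tries++) {
		if (++ifindex > MAX_DEVS)
			ifindex = 1;
		if (!net->in_use[ifindex]) {
			net->in_use[ifindex] = true;
			return net->last_ifindex = ifindex;
		}
	}
	return -1;
}
```

Since indices are not shared across namespaces, a container's devices keep small indices no matter how many containers have been started.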
     

17 Jul, 2012

1 commit

  • Before this patch, sock_diag worked for init_net only and dumped
    information about sockets from all namespaces.

    This patch expands sock_diag for all name-spaces.
    It creates a netlink kernel socket for each netns and filters
    data during dumping.

    v2: filter according to netns in all places;
    remove an unused variable.

    Cc: "David S. Miller"
    Cc: Alexey Kuznetsov
    Cc: James Morris
    Cc: Hideaki YOSHIFUJI
    Cc: Patrick McHardy
    Cc: Pavel Emelyanov
    CC: Eric Dumazet
    Cc: linux-kernel@vger.kernel.org
    Cc: netdev@vger.kernel.org
    Signed-off-by: Andrew Vagin
    Acked-by: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Andrey Vagin
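The filtering step can be sketched as a dump loop that skips every socket whose namespace differs from the namespace of the requesting netlink socket (in the kernel this comparison is done with net_eq()). struct net_ns, struct sock_rec and dump_sockets() are hypothetical names for this illustration.

```c
#include <assert.h>
#include <stddef.h>

struct net_ns { int id; };	/* hypothetical stand-in for struct net */

struct sock_rec {
	const struct net_ns *net;	/* namespace the socket lives in */
	int inode;
};

/* Copy into out[] only the sockets that belong to the requester's
 * namespace; returns how many were emitted. */
static size_t dump_sockets(const struct net_ns *req_net,
			   const struct sock_rec *socks, size_t n,
			   struct sock_rec *out)
{
	size_t emitted = 0;

	for (size_t i = 0; i < n; i++) {
		if (socks[i].net != req_net)
			continue;	/* skip sockets from other namespaces */
		out[emitted++] = socks[i];
	}
	return emitted;
}
```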
     

24 Apr, 2012

1 commit

  • Randy Dunlap reported:
    > On 04/23/2012 12:07 AM, Stephen Rothwell wrote:
    >
    >> Hi all,
    >>
    >> Changes since 20120420:
    >
    >
    >
    > ERROR: "unregister_net_sysctl_table" [net/phonet/phonet.ko] undefined!
    > ERROR: "register_net_sysctl" [net/phonet/phonet.ko] undefined!
    >
    > when CONFIG_SYSCTL is not enabled.

    Add static inline stub functions to gracefully handle the case when sysctl
    support is not present.

    Signed-off-by: Eric W. Biederman
    Acked-by: Randy Dunlap
    Signed-off-by: David S. Miller

    Eric W. Biederman
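The stub pattern described above looks roughly like this. It is a sketch: the function names come from the error report, while struct net, struct ctl_table and struct ctl_table_header are simplified stand-ins. Because CONFIG_SYSCTL is not defined in an ordinary userspace build, this compiles the !CONFIG_SYSCTL branch, which is the case the fix addresses.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel types. */
struct net;
struct ctl_table { const char *procname; };
struct ctl_table_header { int registered; };

#ifdef CONFIG_SYSCTL
struct ctl_table_header *register_net_sysctl(struct net *net, const char *path,
					     struct ctl_table *table);
void unregister_net_sysctl_table(struct ctl_table_header *header);
#else
/* Stubs: registration returns NULL and unregistering is a no-op, so
 * callers such as phonet still link when sysctl support is absent. */
static inline struct ctl_table_header *
register_net_sysctl(struct net *net, const char *path, struct ctl_table *table)
{
	(void)net; (void)path; (void)table;
	return NULL;
}

static inline void unregister_net_sysctl_table(struct ctl_table_header *header)
{
	(void)header;
}
#endif
```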
     

21 Apr, 2012

4 commits

  • All of the users have been converted to use register_net_sysctl so we
    no longer need register_net_sysctl_table.

    Signed-off-by: Eric W. Biederman
    Acked-by: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Eric W. Biederman
     
  • register_sysctl_rotable never caught on as an interesting way to
    register sysctls. My take on the situation is that what we want are
    sysctls that we can only see in the initial network namespace. What we
    have implemented with register_sysctl_rotable are sysctls that we can
    see in all of the network namespaces and can only change in the initial
    network namespace.

    That is a very silly way to go. Just register the network sysctls
    in the initial network namespace and we don't have any weird special
    cases to deal with.

    The sysctls affected are:
    /proc/sys/net/ipv4/ipfrag_secret_interval
    /proc/sys/net/ipv4/ipfrag_max_dist
    /proc/sys/net/ipv6/ip6frag_secret_interval
    /proc/sys/net/ipv6/mld_max_msf

    I really don't expect anyone will miss them if they can't read them in a
    child user namespace.

    CC: Pavel Emelyanov
    Signed-off-by: Eric W. Biederman
    Acked-by: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Eric W. Biederman
     
  • If the netfilter code is modified to use register_net_sysctl_table the
    kernel fails to boot because the per net sysctl infrastructure is not set
    up soon enough. So to avoid races, call net_sysctl_init from sock_init().

    Signed-off-by: Eric W. Biederman
    Acked-by: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Eric W. Biederman
     
  • Right now all of the networking sysctl registrations are running in a
    compatibility mode. The native sysctl registration api takes a cstring
    for a path and a simple ctl_table. Implement register_net_sysctl so
    that we can register network sysctls without needing to use
    compatibility code in the sysctl core.

    Switching from a ctl_path to a cstring results in less boilerplate
    and denser code that is a little easier to read.

    I would simply have changed the arguments to register_net_sysctl_table
    instead of keeping two functions in parallel, but gcc will allow a
    ctl_path pointer to be passed as a char * pointer while only issuing a
    warning, which means completely incorrect code can be built. Since I
    have to change the function name, I am taking advantage of the situation
    to let both register_net_sysctl and register_net_sysctl_table live for a
    short time in parallel, which makes clean conversion patches a bit easier
    to read and write.

    Signed-off-by: Eric W. Biederman
    Acked-by: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Eric W. Biederman
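To show how the two path styles correspond, this sketch joins an old-style ctl_path array (components ending in a NULL procname) into the single "net/ipv4" style string the newer interface accepts. struct ctl_path here is a simplified stand-in, and ctl_path_to_string() is a hypothetical helper that is not part of the kernel.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Simplified stand-in for the old path type: an array of components
 * terminated by an entry with a NULL procname. */
struct ctl_path { const char *procname; };

/* Join the components into one "a/b/c" string.
 * Returns 0 on success, -1 if buf is too small. */
static int ctl_path_to_string(const struct ctl_path *path, char *buf, size_t len)
{
	size_t used = 0;

	if (len == 0)
		return -1;
	buf[0] = '\0';
	for (; path->procname != NULL; path++) {
		int n = snprintf(buf + used, len - used, "%s%s",
				 used ? "/" : "", path->procname);
		if (n < 0 || (size_t)n >= len - used)
			return -1;	/* output would be truncated */
		used += (size_t)n;
	}
	return 0;
}
```

The three-element array collapses into one short string literal, which is the boilerplate reduction the commit message refers to.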
     

12 Dec, 2011

1 commit


27 Jul, 2011

1 commit

  • This allows us to move duplicated code in
    (atomic_inc_not_zero() for now) to

    Signed-off-by: Arun Sharma
    Reviewed-by: Eric Dumazet
    Cc: Ingo Molnar
    Cc: David Miller
    Cc: Eric Dumazet
    Acked-by: Mike Frysinger
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arun Sharma
     

02 Jul, 2011

1 commit

  • This patch adds a change sequence counter to each net namespace
    which is bumped whenever a netdevice is added or removed from
    the list. If such a change occurred while a link dump took place,
    the dump will have the NLM_F_DUMP_INTR flag set in the first
    message which has been interrupted and in all subsequent messages
    of the same dump.

    Note that links may still be modified or renamed while a dump is
    taking place, but we can guarantee that userspace receives a
    complete list of links and does not miss any.

    Testing:
    I have added 500 VLAN netdevices to make sure the dump is split
    over multiple messages. Then while continuously dumping links in
    one process I also continuously deleted and re-added a dummy
    netdevice in another process. Multiple dumps per second have
    had the NLM_F_DUMP_INTR flag set.

    I guess we can wait for Johannes's patch to hit net-next via the
    wireless tree. I just wanted to give this some testing right away.

    Signed-off-by: Thomas Graf
    Signed-off-by: David S. Miller

    Thomas Graf
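The consistency check can be sketched as follows: the dump records the namespace's sequence counter when it starts, and once the counter has moved, that message and every later message of the same dump carry the interrupt flag. DUMP_INTR mirrors the value of NLM_F_DUMP_INTR (0x10) but is defined locally. struct netns_links, struct dump_state and the helpers are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define DUMP_INTR 0x10		/* local mirror of NLM_F_DUMP_INTR */

struct netns_links {
	unsigned int dev_base_seq;	/* bumped on every add/remove */
};

struct dump_state {
	unsigned int seq_at_start;
	bool interrupted;		/* sticky once set */
};

static void link_list_changed(struct netns_links *net)
{
	net->dev_base_seq++;
}

static void dump_begin(struct dump_state *d, const struct netns_links *net)
{
	d->seq_at_start = net->dev_base_seq;
	d->interrupted = false;
}

/* Flags for the next dump message: once the sequence has moved, this
 * and every later message of the same dump carry the interrupt flag. */
static unsigned int dump_msg_flags(struct dump_state *d,
				   const struct netns_links *net)
{
	if (net->dev_base_seq != d->seq_at_start)
		d->interrupted = true;
	return d->interrupted ? DUMP_INTR : 0;
}
```

A userspace reader that sees the flag knows the list may be inconsistent and can restart the dump.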
     

13 Jun, 2011

1 commit

  • * new refcount in struct net, controlling actual freeing of the memory
    * new method in kobj_ns_type_operations (->drop_ns())
    * ->current_ns() semantics change - it's supposed to be followed by
    corresponding ->drop_ns(). For struct net in case of CONFIG_NET_NS it bumps
    the new refcount; net_drop_ns() decrements it and calls net_free() if the
    last reference has been dropped. Method renamed to ->grab_current_ns().
    * old net_free() callers call net_drop_ns() instead.
    * sysfs_exit_ns() is gone, along with a large part of callchain
    leading to it; now that the references stored in ->ns[...] stay valid we
    do not need to hunt them down and replace them with NULL. That fixes
    problems in sysfs_lookup() and sysfs_readdir(), along with getting rid
    of sb->s_instances abuse.

    Note that struct net *shutdown* logic has not changed - net_cleanup()
    is called exactly when it used to be called. The only thing postponed by
    having a sysfs instance referring to that struct net is the actual freeing
    of the memory occupied by struct net.

    Signed-off-by: Al Viro

    Al Viro
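The split between shutdown and freeing can be sketched with two counters. "count" tracks active users, and cleanup runs when it drops to zero. "passive" only keeps the memory alive, for example for a sysfs instance. net_drop_ns() is named in the commit. struct net_obj, net_alloc(), net_grab_passive(), net_put() and the frees counter are hypothetical, and the cleaned_up flag merely stands in for net_cleanup().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical model of a namespace with two reference counts. */
struct net_obj {
	atomic_int count;	/* active users; cleanup when it hits zero */
	atomic_int passive;	/* keeps the memory alive */
	int cleaned_up;
};

static int frees;	/* how many objects were freed (for the example) */

static struct net_obj *net_alloc(void)
{
	struct net_obj *n = malloc(sizeof(*n));

	if (!n)
		return NULL;
	atomic_init(&n->count, 1);
	atomic_init(&n->passive, 1);	/* held on behalf of active users */
	n->cleaned_up = 0;
	return n;
}

static void net_grab_passive(struct net_obj *n)
{
	atomic_fetch_add(&n->passive, 1);
}

/* atomic_fetch_sub() returns the old value: 1 means last reference. */
static void net_drop_ns(struct net_obj *n)
{
	if (atomic_fetch_sub(&n->passive, 1) == 1) {
		free(n);
		frees++;
	}
}

/* Dropping the last active reference runs cleanup, then releases the
 * passive reference held on behalf of the active users. */
static void net_put(struct net_obj *n)
{
	if (atomic_fetch_sub(&n->count, 1) == 1) {
		n->cleaned_up = 1;	/* stands in for net_cleanup() */
		net_drop_ns(n);
	}
}
```

In the usage below, shutdown happens on net_put(), but the memory is only freed when the passive holder calls net_drop_ns().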
     

28 May, 2011

2 commits

  • * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
    net: Kill ratelimit.h dependency in linux/net.h
    net: Add linux/sysctl.h includes where needed.
    net: Kill ether_table[] declaration.
    inetpeer: fix race in unused_list manipulations
    atm: expose ATM device index in sysfs
    IPVS: bug in ip_vs_ftp, same list heaad used in all netns.
    bug.h: Move ratelimit warn interfaces to ratelimit.h
    bonding: cleanup module option descriptions
    net:8021q:vlan.c Fix pr_info to just give the vlan fullname and version.
    net: davinci_emac: fix dev_err use at probe
    can: convert to %pK for kptr_restrict support
    net: fix ETHTOOL_SFEATURES compatibility with old ethtool_ops.set_flags
    netfilter: Fix several warnings in compat_mtw_from_user().
    netfilter: ipset: fix ip_set_flush return code
    netfilter: ipset: remove unused variable from type_pf_tdel()
    netfilter: ipset: Use proper timeout value to jiffies conversion

    Linus Torvalds
     
  • Several networking headers were depending upon the implicit
    linux/sysctl.h include they get when including linux/net.h

    Add explicit includes.

    Signed-off-by: David S. Miller

    David S. Miller
     

11 May, 2011

1 commit


15 Mar, 2011

1 commit

  • Remove include/net/netns/ip_vs.h because it depends on
    structures from include/net/ip_vs.h. As ipvs is a pointer in
    struct net it is better to move struct netns_ipvs into
    include/net/ip_vs.h, so that we can easily use other structures
    in struct netns_ipvs.

    Signed-off-by: Julian Anastasov
    Signed-off-by: Simon Horman

    Julian Anastasov
     

13 Jan, 2011

1 commit

  • Preparation for network namespace init; at this stage
    some empty functions exist.

    In most files there is a check whether it is the root ns, i.e. init_net:
        if (!net_eq(net, &init_net))
            return ...
    This will be removed by the last patch, when enabling namespaces.

    *v3
    ip_vs_conn.c merge error corrected.
    net_ipvs #ifdef removed as suggested by Jan Engelhardt

    [ horms@verge.net.au: Removed whitespace-change-only hunks ]
    Signed-off-by: Hans Schillstrom
    Acked-by: Julian Anastasov
    Signed-off-by: Simon Horman

    Hans Schillstrom
     

26 Oct, 2010

1 commit


18 Oct, 2010

1 commit

  • In a network bench, I noticed an unfortunate false sharing between
    'loopback_dev' and 'count' fields in "struct net".

    'count' is written each time a socket is created or destroyed, while
    loopback_dev might be often read in routing code.

    Move loopback_dev into a read-mostly section of "struct net".

    Note: struct netns_xfrm is cache line aligned on SMP.
    (It contains a "struct dst_ops")
    Move it to the end to avoid holes, and reduce sizeof(struct net) by 128
    bytes on ia32.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
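Both layout points can be checked with a small C11 sketch, assuming 64-byte cache lines. It uses alignas to force a frequently written counter and a read-mostly pointer onto different cache lines; this is not how the commit did it (the commit reordered fields). It also shows that placing a cache-line-aligned member in the middle of a struct costs an extra line of padding compared with placing it at the end. All struct names are hypothetical.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

#define CACHE_LINE 64		/* assumption: 64-byte cache lines */

struct net_device;		/* opaque for the sketch */

/* Writers of "count" no longer share a line with readers of loopback_dev. */
struct net_layout {
	int count;				/* written on every socket create/destroy */
	alignas(CACHE_LINE) struct net_device *loopback_dev;	/* read mostly */
};

/* A member that must be cache-line aligned, like netns_xfrm on SMP. */
struct aligned_blob {
	alignas(CACHE_LINE) char data[CACHE_LINE];
};

/* In the middle: padding before it and after the trailing int. */
struct blob_in_middle { int a; struct aligned_blob x; int b; };

/* At the end: the two ints share the first line, no extra hole. */
struct blob_at_end { int a; int b; struct aligned_blob x; };
```

With these assumptions, struct blob_in_middle occupies three cache lines (192 bytes) while struct blob_at_end occupies two (128 bytes).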
     

13 Mar, 2010

1 commit

  • Remove INIT_NSPROXY(), use C99 initializer.
    Remove INIT_IPC_NS(), INIT_NET_NS() while I'm at it.

    Note: headers trim will be done later, now it's quite pointless because
    results will be invalidated by merge window.

    Signed-off-by: Alexey Dobriyan
    Acked-by: Serge Hallyn
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     

13 Jan, 2010

1 commit


04 Dec, 2009

1 commit

  • - Add exit_list to struct net to support building lists of network
    namespaces to clean up.

    - Add exit_batch to pernet_operations to allow running operations only
    once during a network namespace exit, instead of once per network
    namespace.

    - Factor out ops_exit_list and ops_exit_free so the logic for cleaning
    up a network namespace does not need to be duplicated.

    Signed-off-by: Eric W. Biederman
    Signed-off-by: David S. Miller

    Eric W. Biederman
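The difference between exit and exit_batch can be sketched as follows: a subsystem's per-namespace exit runs once for every dying namespace on the list, while exit_batch runs once for the whole list. Assumptions: struct net_ns, struct pernet_ops_sketch, the singly linked exit_next field (the kernel uses a list_head) and the counting callbacks are hypothetical. The helper reuses the name ops_exit_list from the commit but is a simplified model of it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of batched namespace exit. */
struct net_ns {
	int id;
	struct net_ns *exit_next;	/* next namespace in the dying batch */
};

struct pernet_ops_sketch {
	void (*exit)(struct net_ns *net);		/* once per namespace */
	void (*exit_batch)(struct net_ns *list);	/* once per batch */
};

/* Run one subsystem's exit handlers over a batch of namespaces. */
static void ops_exit_list(const struct pernet_ops_sketch *ops, struct net_ns *list)
{
	if (ops->exit)
		for (struct net_ns *n = list; n != NULL; n = n->exit_next)
			ops->exit(n);
	if (ops->exit_batch)
		ops->exit_batch(list);
}

/* Example callbacks that just count invocations. */
static int exit_calls, batch_calls;
static void count_exit(struct net_ns *net) { (void)net; exit_calls++; }
static void count_batch(struct net_ns *list) { (void)list; batch_calls++; }
```

Expensive work such as waiting for in-flight device teardown can then be paid once per batch instead of once per namespace.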
     

02 Dec, 2009

2 commits

  • Now that all of the callers have been updated to set fields in
    struct pernet_operations, and simplified to let the network
    namespace core handle the allocation and freeing of the storage
    for them, remove the superfluous methods and update the docs
    to the new style.

    Signed-off-by: Eric W. Biederman
    Signed-off-by: David S. Miller

    Eric W. Biederman
     
  • To get the full benefit of batched network namespace cleanup, network
    device deletion needs to be performed by the generic code. When
    using register_pernet_gen_device and freeing the data in exit_net,
    it is impossible to delay freeing until after exit_net has been called,
    as the device uninit methods are no longer safe.

    To correct this, and to simplify working with per network namespace data
    I have moved allocation and deletion of per network namespace data into
    the network namespace core. The core now frees the data only after
    all of the network namespace exit routines have run.

    Now it is only required to set the new fields .id and .size
    in the pernet_operations structure if you want network namespace
    data to be managed for you automatically.

    This makes the current register_pernet_gen_device and
    register_pernet_gen_subsys routines unnecessary. For the moment
    I have left them as compatibility wrappers in net_namespace.h.
    They will be removed once all of the users have been updated.

    Signed-off-by: Eric W. Biederman
    Signed-off-by: David S. Miller

    Eric W. Biederman
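The "set .id and .size and let the core manage the data" model can be sketched as follows: the core allocates zeroed storage of .size bytes in a per-namespace slot indexed by *id, the subsystem looks it up by id, and the core frees it only after the exit routines have run. Assumptions: the fixed slot array, struct net_ns, struct pernet_ops_sketch and net_generic_sketch() are hypothetical, and ops_init()/ops_free() here are simplified models rather than the kernel functions.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_GEN_IDS 8		/* hypothetical slot limit for the sketch */

struct net_ns {
	void *gen[MAX_GEN_IDS];	/* per-subsystem data, indexed by id */
};

/* A subsystem only declares where its id lives and how big its data is. */
struct pernet_ops_sketch {
	int *id;
	size_t size;
};

/* Core side: allocate zeroed storage for the subsystem in this net. */
static int ops_init(const struct pernet_ops_sketch *ops, struct net_ns *net)
{
	if (ops->id && ops->size) {
		void *data = calloc(1, ops->size);

		if (!data)
			return -1;
		net->gen[*ops->id] = data;
	}
	return 0;
}

/* Core side: called only after every exit routine has run. */
static void ops_free(const struct pernet_ops_sketch *ops, struct net_ns *net)
{
	if (ops->id && ops->size) {
		free(net->gen[*ops->id]);
		net->gen[*ops->id] = NULL;
	}
}

/* Subsystem side: find its own data in a given namespace. */
static void *net_generic_sketch(const struct net_ns *net, int id)
{
	return net->gen[id];
}
```

Because the core owns the allocation, device uninit code that runs late in namespace teardown can still reach the subsystem's data.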