26 May, 2011

2 commits

  • * git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/linux-2.6-nsfd:
    net: fix get_net_ns_by_fd for !CONFIG_NET_NS
    ns proc: Return -ENOENT for a nonexistent /proc/self/ns/ entry.
    ns: Declare sys_setns in syscalls.h
    net: Allow setting the network namespace by fd
    ns proc: Add support for the ipc namespace
    ns proc: Add support for the uts namespace
    ns proc: Add support for the network namespace.
    ns: Introduce the setns syscall
    ns: proc files for namespace naming policy.

    Linus Torvalds
     
  • Commit e67f88dd12f6 (dont hold rtnl mutex during netlink dump callbacks)
    missed the fact that rtnl_fill_ifinfo() must be called with rtnl held.

    Because of possible deadlocks between two mutexes (cb_mutex and rtnl),
    it's not easy to solve this problem, so revert this part of the patch.

    It also forgot one rcu_read_unlock() in FIB dump_rules().

    Add one ASSERT_RTNL() in rtnl_fill_ifinfo() to remind us of the rule.

    Signed-off-by: Eric Dumazet
    CC: Patrick McHardy
    CC: Stephen Hemminger
    Signed-off-by: David S. Miller

    Eric Dumazet
     

23 May, 2011

1 commit


11 May, 2011

1 commit


10 May, 2011

1 commit

  • veth devices don't use the batched device unregisters yet.

    Since veth devices come in pairs, it makes sense to use a batch of two
    unregisters. This roughly divides dismantle time by two.

    Fix this by changing dellink() callers to always provide a non-NULL
    head. (Idea from Michał Mirosław)

    This patch also handles the macvlan case: we now dismantle all macvlans
    on top of a lower dev at once.

    Reported-by: Alex Bligh
    Signed-off-by: Eric Dumazet
    Cc: Michał Mirosław
    Cc: Jesse Gross
    Cc: Paul E. McKenney
    Cc: Ben Greear
    Signed-off-by: David S. Miller

    Eric Dumazet
     

06 May, 2011

1 commit

  • Force dev_alloc_name() to be called from register_netdevice() by
    dev_get_valid_name(). That allows multiple explicit dev_alloc_name()
    calls to be removed.

    The possibility to call dev_alloc_name in advance remains.

    This also fixes the veth creation regression caused by
    84c49d8c3e4abefb0a41a77b25aa37ebe8d6b743

    Signed-off-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Jiri Pirko
     

03 May, 2011

1 commit

  • Four years ago, Patrick made a change to hold rtnl mutex during netlink
    dump callbacks.

    I believe it was a wrong move. This slows down concurrent dumps, making
    good old /proc/net/ files faster than rtnetlink in some situations.

    This occurred to me because one "ip link show dev ..." was _very_ slow
    on a workload adding/removing network devices in background.

    All dump callbacks are able to use RCU locking now, so this patch does
    roughly a revert of commits :

    1c2d670f366 : [RTNETLINK]: Hold rtnl_mutex during netlink dump callbacks
    6313c1e0992 : [RTNETLINK]: Remove unnecessary locking in dump callbacks

    This lets writers fight for the rtnl mutex while readers go full speed.

    It also takes care of phonet: phonet_route_get() is now called from an
    RCU read-side section, so I renamed it to phonet_route_get_rcu().

    Signed-off-by: Eric Dumazet
    Cc: Patrick McHardy
    Cc: Remi Denis-Courmont
    Acked-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Eric Dumazet
     

31 Mar, 2011

1 commit


14 Feb, 2011

1 commit


01 Feb, 2011

1 commit


30 Jan, 2011

1 commit

  • Ed Swierk writes:
    > On 2.6.35.7
    > ip link add link eth0 netns 9999 type macvlan
    > where 9999 is a nonexistent PID triggers an oops and causes all network functions to hang:
    > [10663.821898] BUG: unable to handle kernel NULL pointer dereference at 000000000000006d
    > [10663.821917] IP: [] __dev_alloc_name+0x9a/0x170
    > [10663.821933] PGD 1d3927067 PUD 22f5c5067 PMD 0
    > [10663.821944] Oops: 0000 [#1] SMP
    > [10663.821953] last sysfs file: /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq
    > [10663.821959] CPU 3
    > [10663.821963] Modules linked in: macvlan ip6table_filter ip6_tables rfcomm ipt_MASQUERADE binfmt_misc iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack sco ipt_REJECT bnep l2cap xt_tcpudp iptable_filter ip_tables x_tables bridge stp vboxnetadp vboxnetflt vboxdrv kvm_intel kvm parport_pc ppdev snd_hda_codec_intelhdmi snd_hda_codec_conexant arc4 iwlagn iwlcore mac80211 snd_hda_intel snd_hda_codec snd_hwdep snd_pcm snd_seq_midi snd_rawmidi i915 snd_seq_midi_event snd_seq thinkpad_acpi drm_kms_helper btusb tpm_tis nvram uvcvideo snd_timer snd_seq_device bluetooth videodev v4l1_compat v4l2_compat_ioctl32 tpm drm tpm_bios snd cfg80211 psmouse serio_raw intel_ips soundcore snd_page_alloc intel_agp i2c_algo_bit video output netconsole configfs lp parport usbhid hid e1000e sdhci_pci ahci libahci sdhci led_class
    > [10663.822155]
    > [10663.822161] Pid: 6000, comm: ip Not tainted 2.6.35-23-generic #41-Ubuntu 2901CTO/2901CTO
    > [10663.822167] RIP: 0010:[] [] __dev_alloc_name+0x9a/0x170
    > [10663.822177] RSP: 0018:ffff88014aebf7b8 EFLAGS: 00010286
    > [10663.822182] RAX: 00000000fffffff4 RBX: ffff8801ad900800 RCX: 0000000000000000
    > [10663.822187] RDX: ffff880000000000 RSI: 0000000000000000 RDI: ffff88014ad63000
    > [10663.822191] RBP: ffff88014aebf808 R08: 0000000000000041 R09: 0000000000000041
    > [10663.822196] R10: 0000000000000000 R11: dead000000200200 R12: ffff88014aebf818
    > [10663.822201] R13: fffffffffffffffd R14: ffff88014aebf918 R15: ffff88014ad62000
    > [10663.822207] FS: 00007f00c487f700(0000) GS:ffff880001f80000(0000) knlGS:0000000000000000
    > [10663.822212] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    > [10663.822216] CR2: 000000000000006d CR3: 0000000231f19000 CR4: 00000000000026e0
    > [10663.822221] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    > [10663.822226] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
    > [10663.822231] Process ip (pid: 6000, threadinfo ffff88014aebe000, task ffff88014afb16e0)
    > [10663.822236] Stack:
    > [10663.822240] ffff88014aebf808 ffffffff814a2bb5 ffff88014aebf7e8 00000000a00ee8d6
    > [10663.822251] 0000000000000000 ffffffffa00ef940 ffff8801ad900800 ffff88014aebf818
    > [10663.822265] ffff88014aebf918 ffff8801ad900800 ffff88014aebf858 ffffffff8149c413
    > [10663.822281] Call Trace:
    > [10663.822290] [] ? dev_addr_init+0x75/0xb0
    > [10663.822298] [] dev_alloc_name+0x43/0x90
    > [10663.822307] [] rtnl_create_link+0xbe/0x1b0
    > [10663.822314] [] rtnl_newlink+0x48a/0x570
    > [10663.822321] [] ? rtnl_newlink+0x1ac/0x570
    > [10663.822332] [] ? native_x2apic_icr_read+0x4/0x20
    > [10663.822339] [] rtnetlink_rcv_msg+0x177/0x290
    > [10663.822346] [] ? rtnetlink_rcv_msg+0x0/0x290
    > [10663.822354] [] netlink_rcv_skb+0xa9/0xd0
    > [10663.822360] [] rtnetlink_rcv+0x25/0x40
    > [10663.822367] [] netlink_unicast+0x2de/0x2f0
    > [10663.822374] [] netlink_sendmsg+0x1fe/0x2e0
    > [10663.822383] [] sock_sendmsg+0xf3/0x120
    > [10663.822391] [] ? _raw_spin_lock+0xe/0x20
    > [10663.822400] [] ? __d_lookup+0x136/0x150
    > [10663.822406] [] ? _raw_spin_lock+0xe/0x20
    > [10663.822414] [] ? _atomic_dec_and_lock+0x4d/0x80
    > [10663.822422] [] ? mntput_no_expire+0x30/0x110
    > [10663.822429] [] ? move_addr_to_kernel+0x65/0x70
    > [10663.822435] [] ? verify_iovec+0x88/0xe0
    > [10663.822442] [] sys_sendmsg+0x240/0x3a0
    > [10663.822450] [] ? __do_fault+0x479/0x560
    > [10663.822457] [] ? _raw_spin_lock+0xe/0x20
    > [10663.822465] [] ? alloc_fd+0x10a/0x150
    > [10663.822473] [] ? do_page_fault+0x15e/0x350
    > [10663.822482] [] system_call_fastpath+0x16/0x1b
    > [10663.822487] Code: 90 48 8d 78 02 be 25 00 00 00 e8 92 1d e2 ff 48 85 c0 75 cf bf 20 00 00 00 e8 c3 b1 c6 ff 49 89 c7 b8 f4 ff ff ff 4d 85 ff 74 bd 8b 75 70 49 8d 45 70 48 89 45 b8 49 83 ee 58 eb 28 48 8d 55
    > [10663.822618] RIP [] __dev_alloc_name+0x9a/0x170
    > [10663.822627] RSP
    > [10663.822631] CR2: 000000000000006d
    > [10663.822636] ---[ end trace 3dfd6c3ad5327ca7 ]---

    This bug was introduced in:
    commit 81adee47dfb608df3ad0b91d230fb3cef75f0060
    Author: Eric W. Biederman
    Date: Sun Nov 8 00:53:51 2009 -0800

    net: Support specifying the network namespace upon device creation.

    There is no good reason to not support userspace specifying the
    network namespace during device creation, and it makes it easier
    to create a network device and pass it to a child network namespace
    with a well known name.

    We have to be careful to ensure that the target network namespace
    for the new device exists through the life of the call. To keep
    that logic clear I have factored out the network namespace grabbing
    logic into rtnl_link_get_net.

    In addition we need to continue to pass the source network namespace
    to the rtnl_link_ops.newlink method so that we can find the base
    device source network namespace.

    Signed-off-by: Eric W. Biederman
    Acked-by: Eric Dumazet

    Where apparently I forgot to add error handling to the path where we create
    a new network device in a new network namespace, and pass in an invalid pid.

    Cc: stable@kernel.org
    Reported-by: Ed Swierk
    Signed-off-by: "Eric W. Biederman"
    Signed-off-by: David S. Miller

    Eric W. Biederman
     

28 Jan, 2011

2 commits


25 Jan, 2011

1 commit


21 Jan, 2011

1 commit

  • rtnl_group_changelink() is invoked by rtnl_newlink() before the link
    attributes have been validated. Additionally the group changes are
    performed even if NLM_F_CREATE is specified and a new link is
    created, while more reasonable semantics would be to set the group
    value on the newly created link.

    Fix both problems by moving the rtnl_group_changelink() invocation
    down to the handling of non-existent links without NLM_F_CREATE
    and add a dev_set_group() call to rtnl_create_link().

    Signed-off-by: Patrick McHardy
    Acked-by: Vlad Dogaru
    Signed-off-by: David S. Miller

    Patrick McHardy
     

20 Jan, 2011

3 commits


10 Jan, 2011

1 commit

  • Because NLM_F_DUMP is composed of two bits, NLM_F_ROOT | NLM_F_MATCH,
    a test such as "if (x & NLM_F_DUMP)" succeeds when _either_ of the bits
    is set. Because NLM_F_MATCH's value overlaps with NLM_F_EXCL,
    non-dump requests with NLM_F_EXCL set are mistaken for dump requests.

    Substitute the condition to test for _all_ bits being set.

    Signed-off-by: Jan Engelhardt
    Acked-by: Pablo Neira Ayuso
    Signed-off-by: David S. Miller

    Jan Engelhardt
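
The difference between the two tests can be shown in a few lines of userspace C. This is only an illustrative sketch: the flag values are copied from <linux/netlink.h> so it builds without kernel headers, and the helper names is_dump_any_bit() and is_dump_all_bits() are made up for this example and do not appear in the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Flag values as defined in <linux/netlink.h>, repeated here so the
 * sketch builds on its own. */
#define NLM_F_ROOT	0x100
#define NLM_F_MATCH	0x200
#define NLM_F_DUMP	(NLM_F_ROOT | NLM_F_MATCH)
#define NLM_F_EXCL	0x200	/* shares its bit with NLM_F_MATCH */

static_assert(NLM_F_MATCH == NLM_F_EXCL, "the two flags overlap");

/* Old style test: true as soon as either of the two bits is set. */
static bool is_dump_any_bit(unsigned int flags)
{
	return (flags & NLM_F_DUMP) != 0;
}

/* Fixed test: true only when both NLM_F_ROOT and NLM_F_MATCH are set. */
static bool is_dump_all_bits(unsigned int flags)
{
	return (flags & NLM_F_DUMP) == NLM_F_DUMP;
}
```

With flags set to NLM_F_EXCL alone, the first helper reports a dump request and the second does not, which is the misclassification described above.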
     

28 Nov, 2010

1 commit

  • As David pointed out correctly, updates to af-specific attributes
    are currently not atomic. If multiple changes are requested and
    one of them fails, previous updates may have been applied already,
    leaving the link behind in an undefined state.

    This patch splits the function parse_link_af() into two functions,
    validate_link_af() and set_link_af(). validate_link_af() is called
    from validate_linkmsg() to check for errors as early as possible,
    before any changes to the link have been made. set_link_af() is
    called later to commit the changes.

    This method is not foolproof. While it is currently sufficient
    to make set_link_af() unable to fail and thus 100% atomic, the
    validation function will not be able to detect all error
    scenarios in the future. There will likely always be errors
    that depend on state which is, for example, not protected by
    rtnl_mutex and thus may change between validation and setting.

    Also, instead of silently ignoring unknown address families and
    config blocks for address families which did not register a set
    function, the errors EAFNOSUPPORT and EOPNOTSUPP respectively are
    returned to avoid committing 4 out of 5 update requests without
    notifying the user.

    Signed-off-by: Thomas Graf
    Signed-off-by: David S. Miller

    Thomas Graf
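
The validate-then-commit split described above can be sketched in plain C as follows. Everything here is hypothetical and exists only for illustration: struct af_update, struct af_handler, the example handlers and apply_updates() are not kernel interfaces, and the real code walks netlink attributes rather than an array.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical request and handler types for this sketch only. */
struct af_update {
	size_t family;	/* index into the handler table */
	int value;	/* requested setting */
};

struct af_handler {
	int (*validate)(int value);		/* may fail, must not change state */
	void (*set)(int *state, int value);	/* cannot fail */
};

/* Example handlers: accept only non-negative values. */
static int validate_nonneg(int value)
{
	return value < 0 ? -EINVAL : 0;
}

static void set_value(int *state, int value)
{
	*state = value;
}

/*
 * Phase 1 checks every update, phase 2 commits them. state[] holds one
 * slot per handler. Returns 0 or a negative errno; on error state[] is
 * left untouched.
 */
static int apply_updates(const struct af_handler *ops, size_t nops,
			 int *state, const struct af_update *upd, size_t n)
{
	size_t i;
	int err;

	assert(ops != NULL && state != NULL && (upd != NULL || n == 0));

	for (i = 0; i < n; i++) {
		if (upd[i].family >= nops)
			return -EAFNOSUPPORT;	/* unknown family */
		if (!ops[upd[i].family].set)
			return -EOPNOTSUPP;	/* no set function registered */
		if (ops[upd[i].family].validate) {
			err = ops[upd[i].family].validate(upd[i].value);
			if (err < 0)
				return err;
		}
	}

	for (i = 0; i < n; i++)
		ops[upd[i].family].set(&state[upd[i].family], upd[i].value);

	return 0;
}
```

Because the second loop cannot fail, a request is either applied in full or rejected before anything changes, as long as nothing relevant changes between the two loops.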
     

18 Nov, 2010

1 commit

  • Each net_device contains address family specific data such as
    per device settings and statistics. We already expose this data
    via procfs/sysfs and partially netlink.

    The netlink method requires the requester to send one RTM_GETLINK
    request for each address family it wishes to receive data of
    and then merge this data itself.

    This patch implements a new API which combines all address family
    specific link data in a new netlink attribute IFLA_AF_SPEC.
    IFLA_AF_SPEC contains a sequence of nested attributes, one for each
    address family which in turn defines the structure of its own
    attribute. Example:

    [IFLA_AF_SPEC] = {
        [AF_INET] = {
            [IFLA_INET_CONF] = ...,
        },
        [AF_INET6] = {
            [IFLA_INET6_FLAGS] = ...,
            [IFLA_INET6_CONF] = ...,
        }
    }

    The API also allows for address families to implement a function
    which parses the IFLA_AF_SPEC attribute sent by userspace to
    implement address family specific link options.

    Signed-off-by: Thomas Graf
    Signed-off-by: David S. Miller

    Thomas Graf
     

13 Nov, 2010

1 commit

  • nlmsg_total_size() calculates the length of a netlink message
    including header and alignment. nla_total_size() calculates the
    space an individual attribute consumes which was meant to be used
    in this context.

    Also, make sure to account for the attribute header of the
    IFLA_INFO_XSTATS attribute, as implementations of get_xstats_size()
    seem to assume that we do so.

    The addition of two message headers minus the missing attribute
    header resulted in a calculated message size that was larger than
    required. Therefore we never risked running out of skb tailroom.

    Signed-off-by: Thomas Graf
    Acked-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Thomas Graf
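
To make the size difference concrete, here is a small userspace sketch of the two calculations. It assumes the usual 16-byte netlink message header, 4-byte attribute header and 4-byte alignment. The sketch_ names are made up for this example, and the functions only mirror the arithmetic of nlmsg_total_size() and nla_total_size() rather than calling the kernel helpers.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_NLMSG_HDRLEN	16u	/* aligned size of struct nlmsghdr */
#define SKETCH_NLA_HDRLEN	4u	/* aligned size of struct nlattr */

static_assert(SKETCH_NLMSG_HDRLEN % 4 == 0 && SKETCH_NLA_HDRLEN % 4 == 0,
	      "header lengths are already 4-byte aligned");

/* Round len up to the next multiple of 4. */
static size_t sketch_align4(size_t len)
{
	return (len + 3) & ~(size_t)3;
}

/* Space taken by a whole message: message header + payload, padded. */
static size_t sketch_nlmsg_total_size(size_t payload)
{
	return sketch_align4(SKETCH_NLMSG_HDRLEN + payload);
}

/* Space taken by one attribute: attribute header + payload, padded. */
static size_t sketch_nla_total_size(size_t payload)
{
	return sketch_align4(SKETCH_NLA_HDRLEN + payload);
}
```

Under these assumptions, every attribute sized with the message helper is counted 12 bytes too large, which is why the old calculation overestimated rather than underestimated.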
     

21 Oct, 2010

1 commit


24 Aug, 2010

1 commit


13 Jul, 2010

1 commit


08 Jul, 2010

1 commit

  • There is a small possibility that a reader gets incorrect values on 32
    bit arches. SNMP applications could catch incorrect counters when a
    32bit high part is changed by another stats consumer/provider.

    One way to solve this is to add a rtnl_link_stats64 param to all
    ndo_get_stats64() methods, and also add such a parameter to
    dev_get_stats().

    The rule is that we are not allowed to use dev->stats64 as temporary
    storage for 64bit stats; a caller-provided area (usually on the stack)
    must be used instead.

    Old drivers (only providing get_stats() method) need no changes.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
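
The caller-provided storage rule can be sketched as follows. The types and the function name are hypothetical stand-ins, not the kernel's struct net_device, struct rtnl_link_stats64 or dev_get_stats(). The sketch only shows the calling convention and makes no attempt to make the per-queue reads themselves atomic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_NQUEUES 2

/* Hypothetical stand-in for the 64-bit stats structure. */
struct stats64_sketch {
	uint64_t rx_packets;
	uint64_t tx_packets;
};

/* Hypothetical device keeping per-queue counters. */
struct dev_sketch {
	uint64_t rx_packets[SKETCH_NQUEUES];
	uint64_t tx_packets[SKETCH_NQUEUES];
};

/*
 * Sum the per-queue counters into storage supplied by the caller and
 * return that same pointer. Nothing is written to the device, so two
 * concurrent readers never share a scratch area.
 */
static struct stats64_sketch *
dev_get_stats_sketch(const struct dev_sketch *dev,
		     struct stats64_sketch *storage)
{
	size_t q;

	assert(dev != NULL && storage != NULL);

	memset(storage, 0, sizeof(*storage));
	for (q = 0; q < SKETCH_NQUEUES; q++) {
		storage->rx_packets += dev->rx_packets[q];
		storage->tx_packets += dev->tx_packets[q];
	}
	return storage;
}
```

A caller would typically declare a struct stats64_sketch on its own stack and pass its address, so each reader works on a private copy.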
     

13 Jun, 2010

1 commit

  • Use struct rtnl_link_stats64 as the statistics structure.

    On 32-bit architectures, insert 32 bits of padding after/before each
    field of struct net_device_stats to make its layout compatible with
    struct rtnl_link_stats64. Add an anonymous union in net_device; move
    stats into the union and add struct rtnl_link_stats64 stats64.

    Add net_device_ops::ndo_get_stats64, implementations of which will
    return a pointer to struct rtnl_link_stats64. Drivers that implement
    this operation must not update the structure asynchronously.

    Change dev_get_stats() to call ndo_get_stats64 if available, and to
    return a pointer to struct rtnl_link_stats64. Change callers of
    dev_get_stats() accordingly.

    Signed-off-by: Ben Hutchings
    Signed-off-by: David S. Miller

    Ben Hutchings
     

28 May, 2010

2 commits

  • The wrong size was being calculated for vfinfo. In one case, it was
    over-calculating by using nlmsg_total_size on attrs; in another case, it
    was under-calculating by assuming ifla_vf_* structs are packed together,
    but each struct is its own attr w/ hdr (and padding).

    Signed-off-by: Scott Feldman
    Signed-off-by: David S. Miller

    Scott Feldman
     
  • Noticed by Patrick McHardy: was continuing to fill skb after a
    nla_put_failure, ignoring the size calculated by upper layer. Now,
    return -EMSGSIZE on any overruns, but also allow netdev to
    fail ndo_get_vf_port with error other than -EMSGSIZE, thus unwinding
    nest.

    Signed-off-by: Scott Feldman
    Signed-off-by: David S. Miller

    Scott Feldman
     

24 May, 2010

1 commit

  • Commit c02db8c6290bb992442fec1407643c94cc414375:

    Author: Chris Wright
    Date: Sun May 16 01:05:45 2010 -0700
    Subject: rtnetlink: make SR-IOV VF interface symmetric

    adds broken error handling to do_setlink() in net/core/rtnetlink.c. The
    problem is the following chunk of code:

    if (tb[IFLA_VFINFO_LIST]) {
            struct nlattr *attr;
            int rem;
            nla_for_each_nested(attr, tb[IFLA_VFINFO_LIST], rem) {
                    if (nla_type(attr) != IFLA_VF_INFO)
    ---->                   goto errout;
                    err = do_setvfinfo(dev, attr);
                    if (err < 0)
                            goto errout;
                    modified = 1;
            }
    }

    which can get to errout without setting err, resulting in the following warning:

    net/core/rtnetlink.c: In function 'do_setlink':
    net/core/rtnetlink.c:904: warning: 'err' may be used uninitialized in this function

    Change the code to return -EINVAL in this case. Note that this might not be
    the appropriate error though.

    Signed-off-by: David Howells
    cc: Chris Wright
    cc: David S. Miller
    Acked-by: Chris Wright
    Signed-off-by: David S. Miller

    David Howells
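
The shape of the fix can be sketched outside the kernel like this. The attribute type, the struct and both functions are hypothetical stand-ins for nla_for_each_nested() and do_setvfinfo(). The only point illustrated is that err is assigned before every jump to errout.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

enum { SKETCH_VF_INFO = 1 };	/* hypothetical attribute type */

struct attr_sketch {
	int type;
	int value;
};

/* Hypothetical stand-in for do_setvfinfo(): rejects negative values. */
static int set_one_sketch(const struct attr_sketch *attr)
{
	return attr->value < 0 ? -ERANGE : 0;
}

static int set_vf_list_sketch(const struct attr_sketch *attrs, size_t n,
			      int *modified)
{
	size_t i;
	int err = 0;

	assert(modified != NULL);

	for (i = 0; i < n; i++) {
		if (attrs[i].type != SKETCH_VF_INFO) {
			err = -EINVAL;	/* the assignment the fix adds */
			goto errout;
		}
		err = set_one_sketch(&attrs[i]);
		if (err < 0)
			goto errout;
		*modified = 1;
	}

errout:
	return err;
}
```

Without the marked assignment, an attribute of the wrong type would reach errout with whatever value err held before, which is what the compiler warning points at.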
     

18 May, 2010

1 commit

  • Add new netdev ops ndo_{set|get}_vf_port to allow setting of
    port-profile on a netdev interface. Extends netlink socket RTM_SETLINK/
    RTM_GETLINK with two new sub msgs called IFLA_VF_PORTS and IFLA_PORT_SELF
    (added to end of IFLA_cmd list). These are both nested attributes
    using this layout:

    [IFLA_NUM_VF]
    [IFLA_VF_PORTS]
            [IFLA_VF_PORT]
                    [IFLA_PORT_*], ...
            [IFLA_VF_PORT]
                    [IFLA_PORT_*], ...
            ...
    [IFLA_PORT_SELF]
            [IFLA_PORT_*], ...

    These attributes are designed to be set and retrieved symmetrically.
    VF_PORTS is a list of VF_PORTs, one for each VF, when dealing with an
    SR-IOV device. PORT_SELF is for the PF of the SR-IOV device, in case it
    wants to also have a port-profile, or for the case where the VF==PF, like
    in enic patch 2/2 of this patch set.

    A port-profile is used to configure/enable the external switch virtual port
    backing the netdev interface, not to configure the host-facing side of the
    netdev. A port-profile is an identifier known to the switch. How
    port-profiles are installed on the switch or how available port-profiles
    are made known to the host is outside the scope of this patch.

    There are two types of port-profile specs in the netlink msg. The first spec
    is for the 802.1Qbg (pre-)standard VDP protocol. The second spec is for devices
    that run a protocol similar to VDP but in firmware, thus hiding the protocol
    details. In either case, the specs have much in common and it makes sense to
    define the netlink msg as the union of the two specs. For example, both specs
    have a notion of associating/disassociating a port-profile, and both specs
    require some information from the hypervisor manager, such as the client port
    instance ID.

    The general flow is as follows: the port-profile is applied to a host netdev
    interface using RTM_SETLINK, the receiver of the RTM_SETLINK msg communicates
    with the switch, and the switch virtual port backing the host netdev interface
    is configured/enabled based on the settings defined by the port-profile. What
    those settings comprise, and how those settings are managed, is again
    outside the scope of this patch, since this patch only deals with the
    first step in the flow.

    Signed-off-by: Scott Feldman
    Signed-off-by: Roopa Prabhu
    Signed-off-by: David S. Miller

    Scott Feldman
     

17 May, 2010

1 commit


16 May, 2010

1 commit

  • Now we have a set of nested attributes:

    IFLA_VFINFO_LIST (NESTED)
        IFLA_VF_INFO (NESTED)
            IFLA_VF_MAC
            IFLA_VF_VLAN
            IFLA_VF_TX_RATE

    This allows a single set to operate on multiple attributes if desired.
    Among other things, it means a dump can be replayed to set state.

    The current interface has yet to be released, so this seems like
    something to consider for 2.6.34.

    Signed-off-by: Chris Wright
    Signed-off-by: David S. Miller

    Chris Wright
     

28 Apr, 2010

2 commits


26 Apr, 2010

1 commit

  • Decouple rtnetlink address families from real address families in socket.h to
    be able to add rtnetlink interfaces to code that is not a real address family
    without increasing AF_MAX/NPROTO.

    This will be used to add support for multicast route dumping from all tables
    as the proc interface can't be extended to support anything but the main table
    without breaking compatibility.

    This partially undoes the patch to introduce independent families for routing
    rules and converts ipmr routing rules to a new rtnetlink family. Similar to
    that patch, values up to 127 are reserved for real address families, values
    above that may be used arbitrarily.

    Signed-off-by: Patrick McHardy

    Patrick McHardy
     

23 Apr, 2010

1 commit


14 Apr, 2010

1 commit

  • Decouple the address family values used for fib_rules from the real
    address families in socket.h. This allows fib_rules to be used for
    code that is not a real address family without increasing AF_MAX/NPROTO.

    Values up to 127 are reserved for real address families and map directly
    to the corresponding AF value, values starting from 128 are for other
    uses. rtnetlink is changed to invoke the AF_UNSPEC dumpit/doit handlers
    for these families.

    Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Patrick McHardy
     

28 Mar, 2010

2 commits