17 Dec, 2009

1 commit

  • * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (26 commits)
    net: sh_eth alignment fix for sh7724 using NET_IP_ALIGN V2
    ixgbe: allow tx of pre-formatted vlan tagged packets
    ixgbe: Fix 82598 premature copper PHY link indicatation
    ixgbe: Fix tx_restart_queue/non_eop_desc statistics counters
    bcm63xx_enet: fix compilation failure after get_stats_count removal
    packet: dont call sleeping functions while holding rcu_read_lock()
    tcp: Revert per-route SACK/DSACK/TIMESTAMP changes.
    ipvs: zero usvc and udest
    netfilter: fix crashes in bridge netfilter caused by fragment jumps
    ipv6: reassembly: use seperate reassembly queues for conntrack and local delivery
    sky2: leave PCI config space writeable
    sky2: print Optima chip name
    x25: Update maintainer.
    ipvs: fix synchronization on connection close
    netfilter: xtables: document minimal required version
    drivers/net/bonding/: : use pr_fmt
    can: CAN_MCP251X should depend on HAS_DMA
    drivers/net/usb: Correct code taking the size of a pointer
    drivers/net/cpmac.c: Correct code taking the size of a pointer
    drivers/net/sfc: Correct code taking the size of a pointer
    ...

    Linus Torvalds
     

16 Dec, 2009

1 commit


14 Dec, 2009

1 commit

  • I received some bug reports about userspace programs having problems
    because after RTM_NEWLINK was received they could not immediately access
    files under /proc/sys/net/ because they had not been registered yet.

    The original problem was trivially fixed by moving the userspace
    notification from rtnetlink_event() to the end of
    register_netdevice().

    When testing that change I discovered I was still getting RTM_NEWLINK
    events before I could access proc and I was also getting RTM_NEWLINK
    events after I was seeing RTM_DELLINK. Things practically guaranteed
    to confuse userspace.

    After a little more investigation these extra notifications proved to
    be from the new notifiers NETDEV_POST_INIT and NETDEV_UNREGISTER_BATCH
    hitting the default case in rtnetlink_event, and triggering
    unnecessary RTM_NEWLINK messages.

    rtnetlink_event now explicitly handles NETDEV_UNREGISTER_BATCH and
    NETDEV_POST_INIT to avoid sending the incorrect userspace
    notifications.
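
    The dispatch described above can be modelled in userspace as a small
    sketch. Everything prefixed fake_/FAKE_ is hypothetical and only mirrors
    the kernel names; the mapping for events other than the two named in
    this commit is an assumption, not the kernel's actual table.

```c
#include <assert.h>

/* Hypothetical stand-ins for the kernel's event codes; values are illustrative. */
enum fake_netdev_event {
    FAKE_NETDEV_UP,
    FAKE_NETDEV_CHANGEADDR,
    FAKE_NETDEV_UNREGISTER,
    FAKE_NETDEV_POST_INIT,
    FAKE_NETDEV_UNREGISTER_BATCH,
    FAKE_NETDEV_EVENT_COUNT
};

enum fake_rtm { FAKE_RTM_NONE, FAKE_RTM_NEWLINK, FAKE_RTM_DELLINK };

/* Returns the message a handler shaped like rtnetlink_event() would send. */
enum fake_rtm fake_rtnetlink_event(enum fake_netdev_event event)
{
    assert(event < FAKE_NETDEV_EVENT_COUNT); /* the model knows no other events */

    switch (event) {
    case FAKE_NETDEV_UNREGISTER:
        return FAKE_RTM_DELLINK;
    case FAKE_NETDEV_POST_INIT:
    case FAKE_NETDEV_UNREGISTER_BATCH:
        /* Handled explicitly so they no longer fall into the default case. */
        return FAKE_RTM_NONE;
    default:
        return FAKE_RTM_NEWLINK;
    }
}
```

    Without the two explicit cases, both events would reach the default
    branch and produce a spurious RTM_NEWLINK.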

    Signed-off-by: Eric W. Biederman
    Signed-off-by: David S. Miller

    Eric W. Biederman
     

12 Dec, 2009

1 commit

  • Fix two problems:

    1. If unregister_netdevice_many() is called with both registered
    and unregistered devices, rollback_registered_many() bails out
    when it reaches the first unregistered device. The processing
    of the prior registered devices is left unfinished, and the
    remaining devices are skipped, so any registered netdevs among
    them are leaked (never unregistered).

    2. System hangs or panics depending on how the devices are passed,
    since when netdev_run_todo() runs, some devices were not fully
    processed.

    Tested by passing intermingled unregistered and registered vlan
    devices to unregister_netdevice_many() as follows:
    1. dev, fake_dev1, fake_dev2: hangs in run_todo
    ("unregister_netdevice: waiting for eth1.100 to become
    free. Usage count = 1")
    2. fake_dev1, dev, fake_dev2: failure during de-registration
    and next registration, followed by a vlan driver Oops
    during subsequent registration.

    Confirmed that the patch fixes both cases.
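
    The difference between bailing out and skipping can be shown with a
    minimal userspace sketch. The types and function names below are
    hypothetical; the real code walks a list of net_device structures and
    does considerably more per device.

```c
#include <assert.h>
#include <stddef.h>

enum fake_reg_state { FAKE_UNINITIALIZED, FAKE_REGISTERED, FAKE_UNREGISTERING };

struct fake_dev {
    enum fake_reg_state state;
};

/* Old shape: return at the first unregistered device. */
size_t fake_rollback_bail_out(struct fake_dev *devs, size_t n)
{
    size_t done = 0;

    assert(devs != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (devs[i].state == FAKE_UNINITIALIZED)
            return done; /* every later device is never visited */
        devs[i].state = FAKE_UNREGISTERING;
        done++;
    }
    return done;
}

/* Fixed shape: skip unregistered devices and keep going. */
size_t fake_rollback_skip(struct fake_dev *devs, size_t n)
{
    size_t done = 0;

    assert(devs != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (devs[i].state == FAKE_UNINITIALIZED)
            continue;
        devs[i].state = FAKE_UNREGISTERING;
        done++;
    }
    return done;
}
```

    With the input (registered, unregistered, registered), the first
    function processes one device and leaves the third untouched, while the
    second processes both registered devices.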

    Signed-off-by: Krishna Kumar
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Krishna Kumar
     

08 Dec, 2009

2 commits

  • * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1815 commits)
    mac80211: fix reorder buffer release
    iwmc3200wifi: Enable wimax core through module parameter
    iwmc3200wifi: Add wifi-wimax coexistence mode as a module parameter
    iwmc3200wifi: Coex table command does not expect a response
    iwmc3200wifi: Update wiwi priority table
    iwlwifi: driver version track kernel version
    iwlwifi: indicate uCode type when fail dump error/event log
    iwl3945: remove duplicated event logging code
    b43: fix two warnings
    ipw2100: fix rebooting hang with driver loaded
    cfg80211: indent regulatory messages with spaces
    iwmc3200wifi: fix NULL pointer dereference in pmkid update
    mac80211: Fix TX status reporting for injected data frames
    ath9k: enable 2GHz band only if the device supports it
    airo: Fix integer overflow warning
    rt2x00: Fix padding bug on L2PAD devices.
    WE: Fix set events not propagated
    b43legacy: avoid PPC fault during resume
    b43: avoid PPC fault during resume
    tcp: fix a timewait refcnt race
    ...

    Fix up conflicts due to sysctl cleanups (dead sysctl_check code and
    CTL_UNNUMBERED removed) in
    kernel/sysctl_check.c
    net/ipv4/sysctl_net_ipv4.c
    net/ipv6/addrconf.c
    net/sctp/sysctl.c

    Linus Torvalds
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/sysctl-2.6: (43 commits)
    security/tomoyo: Remove now unnecessary handling of security_sysctl.
    security/tomoyo: Add a special case to handle accesses through the internal proc mount.
    sysctl: Drop & in front of every proc_handler.
    sysctl: Remove CTL_NONE and CTL_UNNUMBERED
    sysctl: kill dead ctl_handler definitions.
    sysctl: Remove the last of the generic binary sysctl support
    sysctl net: Remove unused binary sysctl code
    sysctl security/tomoyo: Don't look at ctl_name
    sysctl arm: Remove binary sysctl support
    sysctl x86: Remove dead binary sysctl support
    sysctl sh: Remove dead binary sysctl support
    sysctl powerpc: Remove dead binary sysctl support
    sysctl ia64: Remove dead binary sysctl support
    sysctl s390: Remove dead sysctl binary support
    sysctl frv: Remove dead binary sysctl support
    sysctl mips/lasat: Remove dead binary sysctl support
    sysctl drivers: Remove dead binary sysctl support
    sysctl crypto: Remove dead binary sysctl support
    sysctl security/keys: Remove dead binary sysctl support
    sysctl kernel: Remove binary sysctl logic
    ...

    Linus Torvalds
     

06 Dec, 2009

2 commits


04 Dec, 2009

8 commits

  • Provide common routine for the transition of operational state for a leaf
    device during a root device transition.

    Signed-off-by: Patrick Mullaney
    Acked-by: Arnd Bergmann
    Acked-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Patrick Mullaney
     
  • Refactor the code so fib_rules_register always takes a template instead
    of the actual fib_rules_ops structure that will be used. This is
    required for network namespace support, so 2 out of the 3 callers already
    do this. It allows the error handling to be made common, and it allows
    fib_rules_unregister to free the template for the caller.

    Modify fib_rules_unregister to use call_rcu instead of synchronize_rcu
    to allow multiple namespaces to be cleaned up in the same rcu grace
    period.

    Signed-off-by: Eric W. Biederman
    Signed-off-by: David S. Miller

    Eric W. Biederman
     
  • This allows namespace exit methods to batch work that requires an
    rcu barrier using call_rcu without having to treat the
    unregister_pernet_operations cases specially.

    Signed-off-by: Eric W. Biederman
    Signed-off-by: David S. Miller

    Eric W. Biederman
     
  • Move network device exit batching from a special case in
    net_namespace.c to using common mechanisms in dev.c

    Signed-off-by: Eric W. Biederman
    Signed-off-by: David S. Miller

    Eric W. Biederman
     
  • - Add exit_list to struct net to support building lists of network
    namespaces to cleanup.

    - Add exit_batch to pernet_operations to allow running operations only
    once during a network namespace exit, instead of once per network
    namespace.

    - Factor out ops_exit_list and ops_exit_free so the logic for cleaning
    up a network namespace does not need to be duplicated.
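
    A rough userspace sketch of the exit/exit_batch split follows. The
    structures are hypothetical models (arrays instead of kernel lists), and
    the two counting callbacks exist only to make the call pattern visible.

```c
#include <assert.h>
#include <stddef.h>

struct fake_net {
    int id;
};

struct fake_pernet_ops {
    void (*exit)(struct fake_net *net);                    /* once per namespace */
    void (*exit_batch)(struct fake_net **nets, size_t n);  /* once per batch */
};

/* Run one subsystem's exit methods over a batch of dying namespaces. */
void fake_ops_exit_list(const struct fake_pernet_ops *ops,
                        struct fake_net **nets, size_t n)
{
    assert(ops != NULL);
    if (ops->exit) {
        for (size_t i = 0; i < n; i++)
            ops->exit(nets[i]);
    }
    if (ops->exit_batch)
        ops->exit_batch(nets, n);
}

/* Illustrative callbacks that only count how often they run. */
int fake_exit_calls;
int fake_batch_calls;

void fake_count_exit(struct fake_net *net)
{
    (void)net;
    fake_exit_calls++;
}

void fake_count_batch(struct fake_net **nets, size_t n)
{
    (void)nets;
    (void)n;
    fake_batch_calls++;
}
```

    For a batch of three namespaces, the per-namespace callback runs three
    times and the batch callback once, which is where expensive shared work
    such as waiting for a grace period would go.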

    Signed-off-by: Eric W. Biederman
    Signed-off-by: David S. Miller

    Eric W. Biederman
     
  • commit d124356ce314fff22a047ea334379d5105b2d834
    Author: Patrick McHardy
    Date: Thu Dec 3 12:16:35 2009 +0100

    net: fib_rules: allow to delete local rule

    Allow to delete the local rule and recreate it with a higher priority. This
    can be used to force packets with a local destination out on the wire instead
    of routing them to loopback. Additionally this patch allows to recreate rules
    with a priority of 0.

    Combined with the previous patch to allow oif classification, a socket can
    be bound to the desired interface and packets routed to the wire like this:

    # move local rule to lower priority
    ip rule add pref 1000 lookup local
    ip rule del pref 0

    # route packets of sockets bound to eth0 to the wire independent
    # of the destination address
    ip rule add pref 100 oif eth0 lookup 100
    ip route add default dev eth0 table 100

    Signed-off-by: Patrick McHardy

    Signed-off-by: David S. Miller

    Patrick McHardy
     
  • commit 68144d350f4f6c348659c825cde6a82b34c27a91
    Author: Patrick McHardy
    Date: Thu Dec 3 12:05:25 2009 +0100

    net: fib_rules: add oif classification

    Support routing table lookup based on the flow's oif. This is useful to
    classify packets originating from sockets bound to interfaces differently.

    The route cache already includes the oif and needs no changes.

    Signed-off-by: Patrick McHardy

    Signed-off-by: David S. Miller

    Patrick McHardy
     
  • commit 229e77eec406ad68662f18e49fda8b5d366768c5
    Author: Patrick McHardy
    Date: Thu Dec 3 12:05:23 2009 +0100

    net: fib_rules: rename ifindex/ifname/FRA_IFNAME to iifindex/iifname/FRA_IIFNAME

    The next patch will add oif classification, rename interface related members
    and attributes to reflect that they're used for iif classification.

    Signed-off-by: Patrick McHardy

    Signed-off-by: David S. Miller

    Patrick McHardy
     

03 Dec, 2009

1 commit

  • The two functions skb_dma_map/unmap are unsafe to use as they cause
    problems when packets are cloned and sent to multiple devices while a HW
    IOMMU is enabled. Due to this it is best to remove the code so it is not
    used by any other network driver maintainers.

    Signed-off-by: Alexander Duyck
    Signed-off-by: Jeff Kirsher
    Signed-off-by: David S. Miller

    Alexander Duyck
     

02 Dec, 2009

4 commits

  • - Defer dellink to net_cleanup() allowing for batching.
    - Fix comment.
    - Use for_each_netdev_safe again as dev_change_net_namespace touches
    at most one network device (unlike veth dellink).

    Signed-off-by: Eric W. Biederman
    Signed-off-by: David S. Miller

    Eric W. Biederman
     
  • To get the full benefit of batched network namespace cleanup, network
    device deletion needs to be performed by the generic code. When
    using register_pernet_gen_device and freeing the data in exit_net
    it is impossible to delay allocation until after exit_net has been called
    as the device uninit methods are no longer safe.

    To correct this, and to simplify working with per network namespace data
    I have moved allocation and deletion of per network namespace data into
    the network namespace core. The core now frees the data only after
    all of the network namespace exit routines have run.

    Now it is only required to set the new fields .id and .size
    in the pernet_operations structure if you want network namespace
    data to be managed for you automatically.
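
    As a sketch of what "managed for you" means here, the following
    userspace model lets a core allocate and free per-namespace data based on
    .id and .size. All fake_ names are hypothetical; the real core uses the
    kernel's allocators and its own id assignment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define FAKE_MAX_GEN_PTRS 8

struct fake_net {
    void *gen[FAKE_MAX_GEN_PTRS]; /* per-namespace data, indexed by id */
};

struct fake_pernet_ops {
    int *id;      /* where the core stores the assigned slot */
    size_t size;  /* bytes of per-namespace data to manage */
    int (*init)(struct fake_net *net);
    void (*exit)(struct fake_net *net);
};

static int fake_next_id;

int fake_register_pernet(struct fake_pernet_ops *ops)
{
    assert(ops != NULL);
    if (ops->id) {
        if (fake_next_id >= FAKE_MAX_GEN_PTRS)
            return -1;
        *ops->id = fake_next_id++;
    }
    return 0;
}

void *fake_net_generic(const struct fake_net *net, int id)
{
    assert(id >= 0 && id < FAKE_MAX_GEN_PTRS);
    return net->gen[id];
}

/* The core allocates zeroed data before calling the subsystem's init. */
int fake_ops_init(const struct fake_pernet_ops *ops, struct fake_net *net)
{
    if (ops->id && ops->size) {
        void *data = calloc(1, ops->size);
        if (!data)
            return -1;
        net->gen[*ops->id] = data;
    }
    return ops->init ? ops->init(net) : 0;
}

void fake_ops_exit(const struct fake_pernet_ops *ops, struct fake_net *net)
{
    if (ops->exit)
        ops->exit(net);
}

/* Freeing is a separate step, run only after all exit routines. */
void fake_ops_free(const struct fake_pernet_ops *ops, struct fake_net *net)
{
    if (ops->id && ops->size) {
        free(net->gen[*ops->id]);
        net->gen[*ops->id] = NULL;
    }
}
```

    The point of the split is that data stays valid between fake_ops_exit()
    and fake_ops_free(), so late device teardown can still read it.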

    This makes the current register_pernet_gen_device and
    register_pernet_gen_subsys routines unnecessary. For the moment
    I have left them as compatibility wrappers in net_namespace.h
    They will be removed once all of the users have been updated.

    Signed-off-by: Eric W. Biederman
    Signed-off-by: David S. Miller

    Eric W. Biederman
     
  • It is fairly common to kill several network namespaces at once, either
    because they are nested one inside the other or because they are cooperating
    in multiple machine networking experiments. As the network stack control logic
    does not parallelize easily, batch up multiple network namespaces exiting
    together.

    To get the full benefit of batching, the virtual network devices to be
    removed must all be removed in one batch. For that purpose I have added
    a loop after the last network device operations have run that batches
    up all remaining network devices and deletes them.

    An extra benefit is that the reorganization slightly shrinks the size
    of the per network namespace data structures, replacing a work_struct
    with a list_head.

    In a trivial test with 4K namespaces this change reduced the cost of
    destroying 4K namespaces from 7+ minutes (at 12% cpu) to 44 seconds
    (at 60% cpu). The bulk of that 44s was spent in inet_twsk_purge.

    Signed-off-by: Eric W. Biederman
    Signed-off-by: David S. Miller

    Eric W. Biederman
     
  • The motivation for an additional notifier in batched netdevice
    notification is that rt_do_flush only needs to be called once per
    batch, not once per namespace.

    For further batching improvements I need a guarantee that the
    netdevices are unregistered in order, allowing me to unregister all
    of the network devices in a network namespace at the same time with
    the guarantee that the loopback device is really and truly
    unregistered last.

    Additionally it appears that we moved the route cache flush after
    the final synchronize_net, which seems wrong and there was no
    explanation. So I have restored the original location of the final
    synchronize_net.

    Cc: Octavian Purdila
    Signed-off-by: Eric W. Biederman
    Signed-off-by: David S. Miller

    Eric W. Biederman
     

30 Nov, 2009

1 commit


29 Nov, 2009

2 commits

  • pktgen threads are bound to a given CPU, so we can allocate memory for
    these threads in a NUMA-aware way.

    After a pktgen session on two threads, we can check that the flows memory
    was allocated on the right node instead of an unrelated one.

    # grep pktgen_thread_write /proc/vmallocinfo
    0xffffc90007204000-0xffffc90007385000 1576960 pktgen_thread_write+0x3a4/0x6b0 [pktgen] pages=384 vmalloc N0=384
    0xffffc90007386000-0xffffc90007507000 1576960 pktgen_thread_write+0x3a4/0x6b0 [pktgen] pages=384 vmalloc N1=384

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • Conflicts:
    drivers/ieee802154/fakehard.c
    drivers/net/e1000e/ich8lan.c
    drivers/net/e1000e/phy.c
    drivers/net/netxen/netxen_nic_init.c
    drivers/net/wireless/ath/ath9k/main.c

    David S. Miller
     

27 Nov, 2009

1 commit

  • The veth driver contains code to forward an skb
    from the start_xmit function of one network
    device into the receive path of another device.

    Moving that code into a common location lets us
    reuse the code for direct forwarding of data
    between macvlan ports, and possibly in other
    drivers.

    Signed-off-by: Arnd Bergmann
    Acked-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Arnd Bergmann
     

26 Nov, 2009

1 commit

  • Generated with the following semantic patch

    @@
    struct net *n1;
    struct net *n2;
    @@
    - n1 == n2
    + net_eq(n1, n2)

    @@
    struct net *n1;
    struct net *n2;
    @@
    - n1 != n2
    + !net_eq(n1, n2)

    applied over {include,net,drivers/net}.
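
    For readers unfamiliar with the helper, here is a minimal userspace
    sketch of what a net_eq()-style comparison buys. The FAKE_CONFIG_NET_NS
    switch, the bool return type, and the fake_dev structure are assumptions
    made for illustration; only the pointer comparison itself comes from the
    semantic patch above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct net {
    int placeholder;
};

#ifndef FAKE_CONFIG_NET_NS
#define FAKE_CONFIG_NET_NS 1 /* hypothetical switch modelling a config option */
#endif

/* Compare two namespaces; trivially true when namespaces are compiled out. */
static inline bool net_eq(const struct net *net1, const struct net *net2)
{
#if FAKE_CONFIG_NET_NS
    return net1 == net2;
#else
    (void)net1;
    (void)net2;
    return true;
#endif
}

struct fake_dev {
    const struct net *nd_net; /* namespace the device lives in */
};

/* A call site after the conversion: no open-coded pointer comparison. */
bool fake_dev_in_net(const struct fake_dev *dev, const struct net *net)
{
    assert(dev != NULL);
    return net_eq(dev->nd_net, net);
}
```

    Routing every comparison through one helper lets a build without
    namespace support fold the test to a constant.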

    Signed-off-by: Octavian Purdila
    Signed-off-by: David S. Miller

    Octavian Purdila
     

25 Nov, 2009

1 commit

  • When multi-queue compatible names are used by pktgen (e.g. eth0@0),
    we currently cannot unload a NIC driver if one of its devices
    is currently in use.

    Allow pktgen_find_dev() to find pktgen devices by their suffix (netdev name).

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

24 Nov, 2009

1 commit

  • Commit e6fce5b916cd7f7f7 (pktgen: multiqueue etc.) tried to relax
    the pktgen restriction of one device per kernel thread, adding a '@'
    tag to device names.

    The problem is that we do not check the full pktgen device name.
    This allows adding the same 'device' to a pktgen thread many times:

    pgset "add_device eth0@0"

    one session later:

    pgset "add_device eth0@0"

    (This does not find the previous device.)

    This consumes ~1.5 MBytes of vmalloc memory per round and also triggers
    this warning:

    [ 673.186380] proc_dir_entry 'pktgen/eth0@0' already registered
    [ 673.186383] Modules linked in: pktgen ixgbe ehci_hcd psmouse mdio mousedev evdev [last unloaded: pktgen]
    [ 673.186406] Pid: 6219, comm: bash Tainted: G W 2.6.32-rc7-03302-g41cec6f-dirty #16
    [ 673.186410] Call Trace:
    [ 673.186417] [] warn_slowpath_common+0x7b/0xc0
    [ 673.186422] [] warn_slowpath_fmt+0x41/0x50
    [ 673.186426] [] proc_register+0x109/0x210
    [ 673.186433] [] ? apic_timer_interrupt+0xe/0x20
    [ 673.186438] [] proc_create_data+0x75/0xd0
    [ 673.186444] [] pktgen_thread_write+0x568/0x640 [pktgen]
    [ 673.186449] [] ? pktgen_thread_write+0x0/0x640 [pktgen]
    [ 673.186453] [] proc_reg_write+0x84/0xc0
    [ 673.186458] [] vfs_write+0xb8/0x180
    [ 673.186463] [] sys_write+0x51/0x90
    [ 673.186468] [] system_call_fastpath+0x16/0x1b
    [ 673.186470] ---[ end trace ccbb991b0a8d994d ]---

    The solution to this problem is to use an odevname field (includes the
    @ tag and suffix), instead of using the netdevice name.
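
    The name handling can be sketched in userspace as follows. The
    structure, the ifname field, and the function names are hypothetical;
    only the idea of keeping the full name in odevname and looking devices up
    by it comes from the text above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define FAKE_NAME_LEN 32

struct fake_pktgen_dev {
    char odevname[FAKE_NAME_LEN]; /* full name as typed, e.g. "eth0@0" */
    char ifname[FAKE_NAME_LEN];   /* netdev part only, e.g. "eth0" */
};

/* Record the full name and derive the netdev name by cutting at '@'. */
void fake_pktgen_set_names(struct fake_pktgen_dev *p, const char *name)
{
    size_t len;

    assert(p != NULL && name != NULL);
    snprintf(p->odevname, sizeof(p->odevname), "%s", name);
    len = strcspn(p->odevname, "@");
    memcpy(p->ifname, p->odevname, len);
    p->ifname[len] = '\0';
}

/* Look a device up by its full name, tag included. */
struct fake_pktgen_dev *fake_find_by_odevname(struct fake_pktgen_dev *devs,
                                              size_t n, const char *name)
{
    for (size_t i = 0; i < n; i++) {
        if (strcmp(devs[i].odevname, name) == 0)
            return &devs[i];
    }
    return NULL;
}
```

    Comparing the requested "eth0@0" against the stripped "eth0" never
    matches, which is how the duplicate entry slipped through; comparing
    against the stored full name does match.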

    Signed-off-by: Eric Dumazet
    Signed-off-by: Robert Olsson
    Signed-off-by: David S. Miller

    Eric Dumazet
     

23 Nov, 2009

1 commit


21 Nov, 2009

1 commit


19 Nov, 2009

1 commit


18 Nov, 2009

4 commits

  • Signed-off-by: Octavian Purdila
    Signed-off-by: David S. Miller

    Octavian Purdila
     
  • Herbert Xu wrote:
    > On Tue, Nov 17, 2009 at 04:26:04AM -0800, David Miller wrote:
    >> Really, the link watch stuff is just due for a redesign. I don't
    >> think a simple hack is going to cut it this time, sorry Eric :-)
    >
    > I have no objections against any redesigns, but since the only
    > caller of linkwatch_forget_dev runs in process context with the
    > RTNL, it could also legally emit those events.

    Thanks guys, here is an updated version then, before linkwatch surgery?

    In this version, I force the event to be sent synchronously.

    [PATCH net-next-2.6] linkwatch: linkwatch_forget_dev() to speedup device dismantle

    time ip link del eth3.103 ; time ip link del eth3.104 ; time ip link del eth3.105

    real 0m0.266s
    user 0m0.000s
    sys 0m0.001s

    real 0m0.770s
    user 0m0.000s
    sys 0m0.000s

    real 0m1.022s
    user 0m0.000s
    sys 0m0.000s

    One problem with the current scheme in the vlan dismantle phase is that
    the device is held by the following chain:

    vlan_dev_stop() ->
    netif_carrier_off(dev) ->
    linkwatch_fire_event(dev) ->
    dev_hold() ...

    And __linkwatch_run_queue() runs up to one second later...

    A generic fix to this problem is to add a linkwatch_forget_dev() method
    to unlink the device from the list of watched devices.

    dev->link_watch_next becomes dev->link_watch_list (and uses a bit more memory),
    to be able to unlink the device in O(1).
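
    Why a list node gives O(1) unlinking can be seen in a small userspace
    sketch. The list helpers and the refcnt field are simplified,
    hypothetical models; the real function also takes a lock and emits the
    pending event before dropping the reference.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct fake_list {
    struct fake_list *prev, *next;
};

void fake_list_init(struct fake_list *n)
{
    n->prev = n->next = n;
}

bool fake_list_empty(const struct fake_list *n)
{
    return n->next == n;
}

void fake_list_add_tail(struct fake_list *n, struct fake_list *head)
{
    n->prev = head->prev;
    n->next = head;
    head->prev->next = n;
    head->prev = n;
}

/* Unlink using only the node's own neighbours: no walk over the queue. */
void fake_list_del_init(struct fake_list *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
    fake_list_init(n);
}

struct fake_dev {
    struct fake_list link_watch_list;
    int refcnt;
};

/* Queue a device for a deferred link event, holding a reference. */
void fake_linkwatch_add(struct fake_dev *dev, struct fake_list *queue)
{
    assert(dev != NULL && queue != NULL);
    if (fake_list_empty(&dev->link_watch_list)) {
        dev->refcnt++;
        fake_list_add_tail(&dev->link_watch_list, queue);
    }
}

/* Drop the device from the queue right away instead of waiting for the timer. */
void fake_linkwatch_forget_dev(struct fake_dev *dev)
{
    assert(dev != NULL);
    if (!fake_list_empty(&dev->link_watch_list)) {
        fake_list_del_init(&dev->link_watch_list);
        dev->refcnt--;
    }
}
```

    With a singly linked next pointer, removing an arbitrary device would
    require walking the queue to find its predecessor.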

    After patch :
    time ip link del eth3.103 ; time ip link del eth3.104 ; time ip link del eth3.105

    real 0m0.024s
    user 0m0.000s
    sys 0m0.000s

    real 0m0.032s
    user 0m0.000s
    sys 0m0.001s

    real 0m0.033s
    user 0m0.000s
    sys 0m0.000s

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • This new event is called once for each unique net namespace in batched
    unregister operations (with the argument set to a random device from
    that namespace) and once per device in non-batched unregister
    operations.

    It allows us to factorize some device unregister work such as clearing the
    routing cache.

    Signed-off-by: Octavian Purdila
    Signed-off-by: David S. Miller

    Octavian Purdila
     
  • Some drivers' ndo_get_stats() methods need to perform txqueue stats folding.

    Move folding from dev_get_stats() to a new dev_txq_stats_fold() function.
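
    "Folding" here means summing per-queue counters into one device-wide
    total. A simplified userspace sketch, with a hypothetical structure and
    function name and only three counters, could look like this:

```c
#include <assert.h>
#include <stddef.h>

struct fake_txq_stats {
    unsigned long tx_packets;
    unsigned long tx_bytes;
    unsigned long tx_dropped;
};

/* Sum the counters of every TX queue into one device-wide total. */
void fake_txq_stats_fold(const struct fake_txq_stats *queues, size_t n,
                         struct fake_txq_stats *total)
{
    assert(total != NULL);
    assert(queues != NULL || n == 0);

    total->tx_packets = 0;
    total->tx_bytes = 0;
    total->tx_dropped = 0;
    for (size_t i = 0; i < n; i++) {
        total->tx_packets += queues[i].tx_packets;
        total->tx_bytes += queues[i].tx_bytes;
        total->tx_dropped += queues[i].tx_dropped;
    }
}
```

    Having this as a separate helper lets a driver's own stats method call
    it before adding hardware counters on top.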

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

17 Nov, 2009

2 commits


16 Nov, 2009

3 commits

  • net: Fix the rollback test in dev_change_name()

    In dev_change_name() an err variable is used for storing the original
    call_netdevice_notifiers() errno (negative) and testing for a rollback
    error later, but the test for non-zero is wrong, because the err might
    have a positive value as well - from dev_alloc_name(). It means the
    rollback for a netdevice with a number > 0 will never happen. (The err
    test is also reordered to make it more readable.)
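
    The two tests can be contrasted in a tiny sketch. The function names
    are hypothetical; err stands for the value left by the naming step
    (negative errno, or a non-negative unit number on success) and
    notifier_ret for the errno reported by the notifier chain.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Old test: any non-zero err, including a positive unit number, blocks rollback. */
bool fake_should_rollback_old(int err, int notifier_ret)
{
    assert(notifier_ret <= 0); /* notifier errors are negative errno values */
    return notifier_ret != 0 && err == 0;
}

/* Fixed test: only a negative err (a stored errno) blocks rollback. */
bool fake_should_rollback_fixed(int err, int notifier_ret)
{
    assert(notifier_ret <= 0);
    return notifier_ret != 0 && err >= 0;
}
```

    For a device whose name was allocated with unit number 3, the old test
    refuses to roll back after a notifier failure, while the fixed test does
    roll back.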

    Signed-off-by: Jarek Poplawski
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • Recent changes in the TX error propagation require additional checking
    and masking of values returned from hard_start_xmit(), mainly to
    separate cases where skb was consumed. This aim can be simplified by
    changing the order of NETDEV_TX and NET_XMIT codes, because the latter
    are treated similarly to negative (ERRNO) values.

    After this change much simpler dev_xmit_complete() is also used in
    sch_direct_xmit(), so it is moved to netdevice.h.

    Additionally NET_RX definitions in netdevice.h are moved up from
    between TX codes to avoid confusion while reading the TX comment.
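
    The effect of reordering the codes can be illustrated with a sketch.
    The numeric values below are assumptions chosen so that the NET_XMIT
    codes sit below the mask and the "not consumed" NETDEV_TX codes sit above
    it; they are not quoted from netdevice.h.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical values: queueing codes below the mask, driver "retry" codes above. */
enum {
    FAKE_NET_XMIT_SUCCESS = 0x00,
    FAKE_NET_XMIT_DROP    = 0x01,
    FAKE_NET_XMIT_CN      = 0x02,
    FAKE_NET_XMIT_MASK    = 0x0f,
    FAKE_NETDEV_TX_OK     = 0x00,
    FAKE_NETDEV_TX_BUSY   = 0x10,
    FAKE_NETDEV_TX_LOCKED = 0x20
};

/* True when the skb was consumed: success, negative errno, or a NET_XMIT code. */
bool fake_dev_xmit_complete(int rc)
{
    assert(rc <= FAKE_NETDEV_TX_LOCKED); /* the model knows no larger code */
    return rc < FAKE_NET_XMIT_MASK;
}
```

    Because negative errno values and NET_XMIT codes all fall below the
    mask, one comparison replaces the separate checks and masking.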

    Signed-off-by: Jarek Poplawski
    Signed-off-by: David S. Miller

    Jarek Poplawski
     
  • Check the return value of ndo_select_queue(). If the value isn't smaller
    than the real_num_tx_queues, print a warning message, and reset it to zero.
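
    The check amounts to a bounds test on the returned index. A minimal
    userspace sketch, with a hypothetical function name and fprintf standing
    in for the kernel's warning:

```c
#include <assert.h>
#include <stdio.h>

/* Clamp a driver-chosen queue index: out-of-range values fall back to queue 0. */
unsigned int fake_cap_txqueue(unsigned int queue_index,
                              unsigned int real_num_tx_queues)
{
    assert(real_num_tx_queues > 0);
    if (queue_index >= real_num_tx_queues) {
        fprintf(stderr, "selected queue %u exceeds real_num_tx_queues %u\n",
                queue_index, real_num_tx_queues);
        return 0;
    }
    return queue_index;
}
```

    Falling back to queue 0 keeps the packet on a queue that is guaranteed
    to exist instead of indexing past the end of the queue array.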

    Signed-off-by: Changli Gao
    Signed-off-by: Eric Dumazet
    ----
    Signed-off-by: David S. Miller

    Eric Dumazet