02 Jul, 2008

3 commits


28 Jun, 2008

2 commits

  • Signed-off-by: Wang Chen
    Signed-off-by: David S. Miller

    Wang Chen
     
  • If an skb has nr_frags set to zero but its frag_list is not empty (as
    it can happen if software LRO is enabled), and a previous
    tcp_read_sock has consumed the linear part of the skb, then
    __skb_splice_bits:

    (a) incorrectly reports an error and

    (b) forgets to update the offset to account for the linear part

    Either of the two problems will cause the subsequent __skb_splice_bits
    call (the one that handles the frag_list skbs) to either skip data,
    or, if the unadjusted offset is greater than the size of the next skb
    in the frag_list, make tcp_splice_read loop forever.

    Signed-off-by: Octavian Purdila
    Signed-off-by: David S. Miller

    Octavian Purdila
     

21 Jun, 2008

1 commit

  • Alexey Dobriyan writes:
    > Subject: ICMP sockets destruction vs ICMP packets oops

    > After icmp_sk_exit() nuked ICMP sockets, we get an interrupt.
    > icmp_reply() wants ICMP socket.
    >
    > Steps to reproduce:
    >
    > launch shell in new netns
    > move real NIC to netns
    > setup routing
    > ping -i 0
    > exit from shell
    >
    > BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
    > IP: [] icmp_sk+0x17/0x30
    > PGD 17f3cd067 PUD 17f3ce067 PMD 0
    > Oops: 0000 [1] PREEMPT SMP DEBUG_PAGEALLOC
    > CPU 0
    > Modules linked in: usblp usbcore
    > Pid: 0, comm: swapper Not tainted 2.6.26-rc6-netns-ct #4
    > RIP: 0010:[] [] icmp_sk+0x17/0x30
    > RSP: 0018:ffffffff8057fc30 EFLAGS: 00010286
    > RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff81017c7db900
    > RDX: 0000000000000034 RSI: ffff81017c7db900 RDI: ffff81017dc41800
    > RBP: ffffffff8057fc40 R08: 0000000000000001 R09: 000000000000a815
    > R10: 0000000000000000 R11: 0000000000000001 R12: ffffffff8057fd28
    > R13: ffffffff8057fd00 R14: ffff81017c7db938 R15: ffff81017dc41800
    > FS: 0000000000000000(0000) GS:ffffffff80525000(0000) knlGS:0000000000000000
    > CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
    > CR2: 0000000000000000 CR3: 000000017fcda000 CR4: 00000000000006e0
    > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    > DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
    > Process swapper (pid: 0, threadinfo ffffffff8053a000, task ffffffff804fa4a0)
    > Stack: 0000000000000000 ffff81017c7db900 ffffffff8057fcf0 ffffffff803fcfe4
    > ffffffff804faa38 0000000000000246 0000000000005a40 0000000000000246
    > 000000000001ffff ffff81017dd68dc0 0000000000005a40 0000000055342436
    > Call Trace:
    > [] icmp_reply+0x44/0x1e0
    > [] ? ip_route_input+0x23a/0x1360
    > [] icmp_echo+0x65/0x70
    > [] icmp_rcv+0x180/0x1b0
    > [] ip_local_deliver+0xf4/0x1f0
    > [] ip_rcv+0x33b/0x650
    > [] netif_receive_skb+0x27a/0x340
    > [] process_backlog+0x9d/0x100
    > [] net_rx_action+0x18d/0x250
    > [] __do_softirq+0x75/0x100
    > [] call_softirq+0x1c/0x30
    > [] do_softirq+0x65/0xa0
    > [] irq_exit+0x97/0xa0
    > [] do_IRQ+0xa8/0x130
    > [] ? mwait_idle+0x0/0x60
    > [] ret_from_intr+0x0/0xf
    > [] ? mwait_idle+0x4c/0x60
    > [] ? mwait_idle+0x43/0x60
    > [] ? cpu_idle+0x57/0xa0
    > [] ? rest_init+0x70/0x80
    > Code: 10 5b 41 5c 41 5d 41 5e c9 c3 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 53
    > 48 83 ec 08 48 8b 9f 78 01 00 00 e8 2b c7 f1 ff 89 c0 8b 04 c3 48 83 c4 08
    > 5b c9 c3 66 66 66 66 66 2e 0f 1f 84 00
    > RIP [] icmp_sk+0x17/0x30
    > RSP
    > CR2: 0000000000000000
    > ---[ end trace ea161157b76b33e8 ]---
    > Kernel panic - not syncing: Aiee, killing interrupt handler!

    Receiving packets while we are cleaning up a network namespace is a
    racy proposition. It is possible when the packet arrives that we have
    removed some but not all of the state we need to fully process it. We
    have the choice of either playing whack-a-mole with the cleanup routines
    or simply dropping packets when we don't have a network namespace to
    handle them.

    Since the check looks inexpensive in netif_receive_skb let's just
    drop the incoming packets.

    Signed-off-by: Eric W. Biederman
    Signed-off-by: David S. Miller

    Eric W. Biederman
     

17 Jun, 2008

1 commit

  • Selected device feature bits can be propagated to VLAN devices, so we
    can make use of TX checksum offload and TSO on VLAN-tagged packets.
    However, if the physical device does not do VLAN tag insertion or
    generic checksum offload then the test for TX checksum offload in
    dev_queue_xmit() will see a protocol of htons(ETH_P_8021Q) and yield
    false.

    This splits the checksum offload test into two functions:

    - can_checksum_protocol() tests a given protocol against a feature bitmask

    - dev_can_checksum() first tests the skb protocol against the device
    features; if that fails and the protocol is htons(ETH_P_8021Q) then
    it tests the encapsulated protocol against the effective device
    features for VLANs

    Signed-off-by: Ben Hutchings
    Acked-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Ben Hutchings
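
The two-step test described in the VLAN checksum commit above can be sketched in user-space C. This is only an illustration, not the kernel code: the function names come from the commit message, but the signatures, the `struct sk_pkt` and `struct sk_dev` types, the `SK_*` feature bits, and the use of host byte order are all assumptions made for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical feature bits; values are illustrative, not the kernel's. */
#define SK_FEAT_IP_CSUM   0x1u  /* checksums TCP/UDP over IPv4 only */
#define SK_FEAT_IPV6_CSUM 0x2u  /* checksums TCP/UDP over IPv6 only */
#define SK_FEAT_GEN_CSUM  0x4u  /* checksums any protocol */

/* EtherType values, kept in host byte order for simplicity. */
#define SK_PROTO_IP    0x0800u
#define SK_PROTO_IPV6  0x86DDu
#define SK_PROTO_8021Q 0x8100u

struct sk_pkt {
    uint16_t protocol;        /* outer protocol seen by the device */
    uint16_t encap_protocol;  /* protocol inside the VLAN tag, if any */
};

struct sk_dev {
    uint32_t features;        /* features of the physical device */
    uint32_t vlan_features;   /* features that apply to VLAN traffic */
};

/* Step 1: test one protocol against one feature bitmask. */
static bool can_checksum_protocol(uint32_t features, uint16_t protocol)
{
    if (features & SK_FEAT_GEN_CSUM)
        return true;
    if ((features & SK_FEAT_IP_CSUM) && protocol == SK_PROTO_IP)
        return true;
    if ((features & SK_FEAT_IPV6_CSUM) && protocol == SK_PROTO_IPV6)
        return true;
    return false;
}

/* Step 2: try the outer protocol first, then fall back to the
 * encapsulated protocol and the effective VLAN features. */
static bool dev_can_checksum(const struct sk_dev *dev, const struct sk_pkt *pkt)
{
    if (can_checksum_protocol(dev->features, pkt->protocol))
        return true;
    if (pkt->protocol == SK_PROTO_8021Q)
        return can_checksum_protocol(dev->features & dev->vlan_features,
                                     pkt->encap_protocol);
    return false;
}
```

With a device that only advertises IPv4 checksumming, a VLAN-tagged IPv4 packet fails the first test (its outer protocol is 802.1Q) but passes the fallback, which is the case the commit message describes.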
     

05 Jun, 2008

1 commit

  • skb_splice_bits temporarily drops the socket lock while iterating over
    the socket queue in order to break a reverse locking condition which
    happens with sendfile. This, however, opens a window of opportunity
    for tcp_collapse() to aggregate skbs and thus potentially free the
    current skb used in skb_splice_bits and tcp_read_sock.

    This patch fixes the problem by (re-)getting the same "logical skb"
    after the lock has been temporarily dropped.

    Based on idea and initial patch from Evgeniy Polyakov.

    Signed-off-by: Octavian Purdila
    Acked-by: Evgeniy Polyakov
    Signed-off-by: David S. Miller

    Octavian Purdila
     

04 Jun, 2008

3 commits

  • Make nlmsg_trim(), nlmsg_cancel(), genlmsg_cancel(), and
    nla_nest_cancel() void functions.

    Return -EMSGSIZE instead of -1 if the provided message buffer is not
    big enough.

    Signed-off-by: Thomas Graf
    Signed-off-by: David S. Miller

    Thomas Graf
     
  • No need to compute copy twice in the frags loop in
    dma_skb_copy_datagram_iovec().

    Signed-off-by: Brice Goglin
    Acked-by: Shannon Nelson
    Signed-off-by: Maciej Sosnowski
    Signed-off-by: Dan Williams
    Signed-off-by: David S. Miller

    Brice Goglin
     
  • The neighbor table time of last use information is returned in the
    incorrect unit. Kernel to user space ABIs need to use USER_HZ (or
    milliseconds), otherwise the application has to try and discover the
    real system HZ value which is problematic. Linux has standardized on
    keeping USER_HZ consistent (100 Hz) even when the kernel is running
    internally at some other value.

    This change is small, but it breaks the ABI for older versions of the
    iproute2 utilities. But these utilities are already broken since they
    are looking at the psched_hz values which are completely different. So
    let's just go ahead and fix both kernel and user space. Older
    utilities will just print wrong values.

    Signed-off-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Stephen Hemminger
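
As a rough illustration of the unit conversion the neighbour-table commit above relies on, the sketch below rescales a tick count from the kernel's internal rate to the fixed 100 Hz rate. The helper name `sk_ticks_to_user_hz` and the `SK_USER_HZ` macro are invented for this example and are not the kernel's own interface.

```c
#include <stdint.h>

/* The commit message states that USER_HZ is kept at 100. */
#define SK_USER_HZ 100u

/* Convert ticks counted at kernel_hz into ticks at SK_USER_HZ.
 * Hypothetical helper: no overflow handling for very large tick counts. */
static uint64_t sk_ticks_to_user_hz(uint64_t ticks, uint32_t kernel_hz)
{
    return ticks * SK_USER_HZ / kernel_hz;
}
```

For example, 2500 ticks on a kernel running at 250 Hz are 10 seconds, which this helper reports as 1000 user-visible ticks, so an application never needs to know the internal HZ.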
     

21 May, 2008

2 commits

  • The following corruption can happen during pktgen stop:
    list_del corruption. prev->next should be ffff81007e8a5e70, but was 6b6b6b6b6b6b6b6b
    kernel BUG at lib/list_debug.c:67!
    :pktgen:pktgen_thread_worker+0x374/0x10b0
    ? autoremove_wake_function+0x0/0x40
    ? _spin_unlock_irqrestore+0x42/0x80
    ? :pktgen:pktgen_thread_worker+0x0/0x10b0
    kthread+0x4d/0x80
    child_rip+0xa/0x12
    ? restore_args+0x0/0x30
    ? kthread+0x0/0x80
    ? child_rip+0x0/0x12
    RIP list_del+0x48/0x70

    The problem is that pktgen_thread_worker may never get to run if kthread_stop
    is called too early. Insert a completion on the normal initialization
    path to make sure that pktgen_thread_worker actually gains control.

    Signed-off-by: Denis V. Lunev
    Acked-by: Alexey Dobriyan
    Signed-off-by: David S. Miller

    Denis V. Lunev
     
  • Am I just being particularly dim today, or can the call to
    dev->change_rx_flags(dev, IFF_MULTICAST) in dev_change_flags() never
    happen?

    We've just set dev->flags = flags & IFF_MULTICAST, effectively. So the
    condition '(dev->flags ^ flags) & IFF_MULTICAST' is _never_ going to be
    true.

    Signed-off-by: David Woodhouse
    Signed-off-by: David S. Miller

    David Woodhouse
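
The dead condition pointed out in the dev_change_flags() commit above can be reproduced in isolation. The sketch below is a simplified model, not the kernel function: the `sk_` helpers are hypothetical, and comparing against the value saved before the assignment is shown only as one plausible way to make the test meaningful, since the excerpt does not say how the patch resolves it.

```c
#include <stdbool.h>

/* Stand-in for IFF_MULTICAST; the exact bit does not matter here. */
#define SK_IFF_MULTICAST 0x1000u

/* Models the problem: the multicast bit is copied into dev_flags first,
 * so the XOR against the requested flags can never have that bit set. */
static bool sk_multicast_changed_after_assign(unsigned dev_flags, unsigned flags)
{
    dev_flags = (dev_flags & ~SK_IFF_MULTICAST) | (flags & SK_IFF_MULTICAST);
    return ((dev_flags ^ flags) & SK_IFF_MULTICAST) != 0;
}

/* One possible alternative: compare the requested flags with the value
 * the device had before the assignment. */
static bool sk_multicast_changed_vs_old(unsigned old_flags, unsigned flags)
{
    return ((old_flags ^ flags) & SK_IFF_MULTICAST) != 0;
}
```

Whatever the starting state, the first helper returns false, while the second detects both the set and the clear transition.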
     

15 May, 2008

1 commit


14 May, 2008

1 commit


13 May, 2008

1 commit

  • This patch adds needed_headroom/needed_tailroom members to struct
    net_device and updates many places that allocate skbs to use them. Not
    all of them can be converted though, and I'm sure I missed some (I
    mostly grepped for LL_RESERVED_SPACE)

    Signed-off-by: Johannes Berg
    Signed-off-by: David S. Miller

    Johannes Berg
     

08 May, 2008

2 commits

  • dev_open() and dev_close() must be called holding the RTNL, since they
    call device functions and netdevice notifiers that are promised the RTNL.

    Signed-off-by: Ben Hutchings
    Signed-off-by: David S. Miller

    Ben Hutchings
     
  • When a net namespace is destroyed, some devices (those not killed
    explicitly on ns stop) are moved back to init_net.

    The problem is that this net_ns change has one point of failure:
    __dev_alloc_name() may be called if a name collision occurs (and
    this is easy to trigger). This allocator performs a likely-to-fail
    GFP_ATOMIC allocation to find a suitable number. Other possible
    conditions that may cause error (for device being ns local or not
    registered) are always false in this case.

    So, when this call fails, the device is unregistered. But this is
    *not* the right thing to do, since after this the device may be
    released (and kfree-ed) improperly. E. g. bridges require more
    actions (sysfs update, timer disarming, etc.), some other devices
    want to remove their private areas from lists, etc.

    I. e. arbitrary use-after-free cases may occur.

    The proposed fix is the following: since the only reason for the
    dev_change_net_namespace to fail is the name generation, we may
    give it a unique fall-back name w/o %d-s in it - the dev<ifindex>
    one, since ifindexes are still unique.

    So make this change, raise the failure-case printk loglevel to
    EMERG and replace the unregister_netdevice call with BUG().

    [ Use snprintf() -DaveM ]

    Signed-off-by: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Pavel Emelyanov
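
The fall-back naming idea in the commit above, together with DaveM's note to use snprintf(), might look roughly like the following. This is a sketch under assumptions: the helper `sk_make_fallback_name` is hypothetical, and the "dev" prefix followed by the ifindex is inferred from the commit text rather than copied from the patch.

```c
#include <stdio.h>

/* Hypothetical helper: build a name that needs no allocation and no
 * search for a free number, because the ifindex is already unique.
 * Returns the number of characters snprintf() wanted to write. */
static int sk_make_fallback_name(char *buf, size_t len, int ifindex)
{
    return snprintf(buf, len, "dev%d", ifindex);
}
```

Because snprintf() bounds the write to `len` and always terminates the buffer when `len` is non-zero, the name cannot overrun a fixed-size interface-name array even for large ifindex values.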
     

04 May, 2008

1 commit

  • include/linux/skbuff.h says:
    /* These elements must be at the end, see alloc_skb() for details. */

    net/core/skbuff.c says:
    * See comment in sk_buff definition, just before the 'tail' member

    This patch contains my guess as to the actual reason rather than a
    dead comment reference loop.

    Signed-off-by: Johannes Berg
    Signed-off-by: David S. Miller

    Johannes Berg
     

03 May, 2008

4 commits

  • When a netdev is moved across namespaces with the
    'dev_change_net_namespace' function, the 'device_rename' function is
    used to fixup kobject and refresh the sysfs tree. The device_rename
    function will call kobject_rename and this one will check if there is
    an object with the same name and this is the case because we are
    renaming the object with the same name.

    The use of 'device_rename' seems wrong to me because we usually don't
    rename the device but just move it across namespaces. As we just want to do a
    mini "netdev_[un]register", IMO the functions
    'netdev_[un]register_kobject' should be used instead, as with a usual
    network device [un]registering.

    This patch replaces device_rename by netdev_unregister_kobject,
    followed by netdev_register_kobject.

    The netdev_register_kobject will call device_initialize and will raise
    a warning indicating the device was already initialized. In order to
    fix that, I split the device initialization into a separate function
    and use it together with 'netdev_register_kobject' into
    register_netdevice. So we can safely call 'netdev_register_kobject' in
    'dev_change_net_namespace'.

    This fix will allow proper use of the per-namespace sysfs support that is
    coming from the -mm tree.

    Signed-off-by: Daniel Lezcano
    Acked-by: Benjamin Thery
    Signed-off-by: David S. Miller

    Daniel Lezcano
     
  • Remove the fixed size channels[NR_CPUS] array in net/core/dev.c and
    dynamically allocate array based on nr_cpu_ids.

    Signed-off-by: Mike Travis
    Signed-off-by: Andrew Morton
    Signed-off-by: David S. Miller

    Mike Travis
     
  • Signed-off-by: Harvey Harrison
    Signed-off-by: David S. Miller

    Harvey Harrison
     
  • One finds all kinds of crazy things with some shell pipelining.

    Signed-off-by: Ilpo Järvinen
    Acked-by: David Howells
    Signed-off-by: David S. Miller

    Ilpo Järvinen
     

02 May, 2008

1 commit


29 Apr, 2008

1 commit

  • Some drivers have duplicated unlikely() macros. IS_ERR() already has
    unlikely() in itself.

    This patch cleans up such pointless code.

    Signed-off-by: Hirofumi Nakagawa
    Acked-by: David S. Miller
    Acked-by: Jeff Garzik
    Cc: Paul Clements
    Cc: Richard Purdie
    Cc: Alessandro Zummo
    Cc: David Brownell
    Cc: James Bottomley
    Cc: Michael Halcrow
    Cc: Anton Altaparmakov
    Cc: Al Viro
    Cc: Carsten Otte
    Cc: Patrick McHardy
    Cc: Paul Mundt
    Cc: Jaroslav Kysela
    Cc: Takashi Iwai
    Acked-by: Mike Frysinger
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hirofumi Nakagawa
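
To see why wrapping IS_ERR() in another unlikely() is redundant, as the cleanup commit above argues, here is a user-space approximation of the pattern. The `sk_` names are stand-ins written for this example, and `__builtin_expect` assumes a GCC- or Clang-compatible compiler.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed upper bound on errno values encoded in a pointer. */
#define SK_MAX_ERRNO 4095

/* Branch hint; requires GCC or Clang. */
#define sk_unlikely(x) __builtin_expect(!!(x), 0)

/* Encode a negative errno value as a pointer. */
static inline void *sk_err_ptr(long err)
{
    return (void *)(intptr_t)err;
}

/* The hint already lives inside the check itself. */
static inline bool sk_is_err(const void *ptr)
{
    return sk_unlikely((uintptr_t)ptr >= (uintptr_t)-SK_MAX_ERRNO);
}
```

A caller can therefore write `if (sk_is_err(p))` directly; `if (sk_unlikely(sk_is_err(p)))` only repeats a hint the compiler has already been given.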
     

26 Apr, 2008

1 commit

  • * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (48 commits)
    net: Fix wrong interpretation of some copy_to_user() results.
    xfrm: alg_key_len & alg_icv_len should be unsigned
    [netdrvr] tehuti: move ioctl perm check closer to function start
    ipv6: Fix typo in net/ipv6/Kconfig
    via-velocity: fix vlan receipt
    tg3: sparse cleanup
    forcedeth: realtek phy crossover detection
    ibm_newemac: Increase MDIO timeouts
    gianfar: Fix skb allocation strategy
    netxen: reduce stack usage of netxen_nic_flash_print
    smc911x: test after postfix decrement fails in smc911x_{reset,drop_pkt}
    net drivers: fix platform driver hotplug/coldplug
    forcedeth: new backoff implementation
    ehea: make things static
    phylib: Add support for board-level PHY fixups
    [netdrvr] atlx: code movement: move atl1 parameter parsing
    atlx: remove flash vendor parameter
    korina: misc cleanup
    korina: fix misplaced return statement
    WAN: Fix confusing insmod error code for C101 too.
    ...

    Linus Torvalds
     

25 Apr, 2008

1 commit

  • In the ethtool user-space application, tg3 and natsemi over-ride the
    default implementation of dump_eeprom(). In both tg3_dump_eeprom() and
    natsemi_dump_eeprom(), there is a magic number check which is not
    present in the default implementation.

    Commit b131dd5d ("[ETHTOOL]: Add support for large eeproms") snipped
    the code which copied the ethtool_eeprom structure back to
    user-space. tg3 and natsemi are over-writing the magic number field
    and then checking it in user-space. With the ethtool_eeprom copy
    removed, the check is failing.

    The fix is simple. Add the ethtool_eeprom copy back.

    Signed-off-by: Mandeep Singh Baines
    Signed-off-by: David S. Miller

    Mandeep Singh Baines
     

24 Apr, 2008

3 commits

  • * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (22 commits)
    tun: Multicast handling in tun_chr_ioctl() needs proper locking.
    [NET]: Fix heavy stack usage in seq_file output routines.
    [AF_UNIX] Initialise UNIX sockets before general device initcalls
    [RTNETLINK]: Fix bogus ASSERT_RTNL warning
    iwlwifi: Fix built-in compilation of iwlcore (part 2)
    tun: Fix minor race in TUNSETLINK ioctl handling.
    ppp_generic: use stats from net_device structure
    iwlwifi: Don't unlock priv->mutex if it isn't locked
    wireless: rndis_wlan: modparam_workaround_interval is never below 0.
    prism54: prism54_get_encode() test below 0 on unsigned index
    mac80211: update mesh EID values
    b43: Workaround DMA quirks
    mac80211: fix use before check of Qdisc length
    net/mac80211/rx.c: fix off-by-one
    mac80211: Fix race between ieee80211_rx_bss_put and lookup routines.
    ath5k: Fix radio identification on AR5424/2424
    ssb: Fix all-ones boardflags
    b43: Add more btcoexist workarounds
    b43: Fix HostFlags data types
    b43: Workaround invalid bluetooth settings
    ...

    Linus Torvalds
     
  • ASSERT_RTNL uses mutex_trylock to test whether the rtnl_mutex is
    held. This produces bogus warnings when running in atomic context, which
    f.e. happens when adding secondary unicast addresses through
    macvlan or vlan or when synchronizing multicast addresses from
    wireless devices.

    Mid-term we might want to consider moving all address updates
    to process context since the locking seems overly complicated,
    for now just fix the bogus warning by changing ASSERT_RTNL to
    use mutex_is_locked().

    Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Patrick McHardy
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
    iwlwifi: Fix built-in compilation of iwlcore
    net: Unexport move_addr_to_{kernel,user}
    rt2x00: Select LEDS_CLASS.
    iwlwifi: Select LEDS_CLASS.
    leds: Do not guard NEW_LEDS with HAS_IOMEM
    [IPSEC]: Fix catch-22 with algorithm IDs above 31
    time: Export set_normalized_timespec.
    tcp: Make use of before macro in tcp_input.c
    hamradio: Remove unneeded and deprecated cli()/sti() calls in dmascc.c
    [NETNS]: Remove empty ->init callback.
    [DCCP]: Convert do_gettimeofday() to getnstimeofday().
    [NETNS]: Don't initialize err variable twice.
    [NETNS]: The ip6_fib_timer can work with garbage on net namespace stop.
    [IPV4]: Convert do_gettimeofday() to getnstimeofday().
    [IPV4]: Make icmp_sk_init() static.
    [IPV6]: Make struct ip6_prohibit_entry_template static.
    tcp: Trivial fix to correct function name in a comment in net/ipv4/tcp.c
    [NET]: Expose netdevice dev_id through sysfs
    skbuff: fix missing kernel-doc notation
    [ROSE]: Fix soft lockup wrt. rose_node_list_lock

    Linus Torvalds
     

22 Apr, 2008

4 commits

  • * git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6:
    [SPARC]: Remove SunOS and Solaris binary support.

    Linus Torvalds
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/juhl/trivial: (24 commits)
    DOC: A couple corrections and clarifications in USB doc.
    Generate a slightly more informative error msg for bad HZ
    fix typo "is" -> "if" in Makefile
    ext*: spelling fix prefered -> preferred
    DOCUMENTATION: Use newer DEFINE_SPINLOCK macro in docs.
    KEYS: Fix the comment to match the file name in rxrpc-type.h.
    RAID: remove trailing space from printk line
    DMA engine: typo fixes
    Remove unused MAX_NODES_SHIFT
    MAINTAINERS: Clarify access to OCFS2 development mailing list.
    V4L: Storage class should be before const qualifier (sn9c102)
    V4L: Storage class should be before const qualifier
    sonypi: Storage class should be before const qualifier
    intel_menlow: Storage class should be before const qualifier
    DVB: Storage class should be before const qualifier
    arm: Storage class should be before const qualifier
    ALSA: Storage class should be before const qualifier
    acpi: Storage class should be before const qualifier
    firmware_sample_driver.c: fix coding style
    MAINTAINERS: Add ati_remote2 driver
    ...

    Fixed up trivial conflicts in firmware_sample_driver.c

    Linus Torvalds
     
  • As you can see, there's no zero_it arg (in fact code always uses __GFP_ZERO).

    Signed-off-by: Rusty Russell
    Signed-off-by: Jesper Juhl

    Rusty Russell
     
  • As per Documentation/feature-removal-schedule.txt

    Signed-off-by: David S. Miller

    David S. Miller
     

21 Apr, 2008

1 commit

  • Expose dev_id to userspace, because it helps to disambiguate between
    interfaces where the MAC address is not unique.

    This should allow us to simplify the handling of persistent naming for
    S390 network devices in udev -- because it can depend on a simple
    attribute of the device like the other match criteria, rather than
    having a special case for SUBSYSTEMS=="ccwgroup".

    Signed-off-by: David Woodhouse
    Signed-off-by: David S. Miller

    David Woodhouse
     

19 Apr, 2008

2 commits

  • None of these files use any of the functionality promised by
    asm/semaphore.h. It's possible that they rely on it dragging in some
    unrelated header file, but I can't build all these files, so we'll have
    to fix any build failures as they come up.

    Signed-off-by: Matthew Wilcox

    Matthew Wilcox
     
  • This patch effectively reverts commit d0498d9ae1a5cebac363e38907266d5cd2eedf89
    aka "[NET]: Do not allocate unneeded memory for dev->priv alignment."
    It was found to be buggy because of final unconditional += NETDEV_ALIGN_CONST
    removal.

    For example, for sizeof(struct net_device) being 2048 bytes, "alloc_size"
    was also 2048 bytes, but allocator with debugging options turned on started
    giving out !32-byte aligned memory resulting in redzones overwrites.

    Patch does small optimization in ->priv'less case: bumping size to next
    32-byte boundary was always done to ensure ->priv will also be aligned.
    But, no ->priv, no need to do that.

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: David S. Miller

    Alexey Dobriyan
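
The size arithmetic discussed in the revert above can be worked through with a small sketch. It assumes 32-byte alignment as stated in the commit messages; the `sk_` helpers are hypothetical and only model the calculation, not the real alloc_netdev_mq().

```c
#include <stddef.h>
#include <stdint.h>

#define SK_ALIGN       32u
#define SK_ALIGN_CONST (SK_ALIGN - 1)

/* Bytes to request from the allocator for a device structure of
 * dev_size bytes followed by priv_size bytes of private data. */
static size_t sk_alloc_size(size_t dev_size, size_t priv_size)
{
    size_t size = dev_size;

    if (priv_size) {
        /* Round up only when a private area follows, so that it
         * also starts on a 32-byte boundary. */
        size = (size + SK_ALIGN_CONST) & ~(size_t)SK_ALIGN_CONST;
        size += priv_size;
    }
    /* Always keep slack for aligning the start of the structure. */
    size += SK_ALIGN_CONST;
    return size;
}

/* Round an address up to the next 32-byte boundary. */
static uintptr_t sk_align_up(uintptr_t addr)
{
    return (addr + SK_ALIGN_CONST) & ~(uintptr_t)SK_ALIGN_CONST;
}
```

If the allocator hands back an address such as 0x1008, aligning it moves the structure start forward by 24 bytes; without the trailing slack a 2048-byte structure in a 2048-byte allocation would then run past the end of the buffer, which matches the redzone overwrites described above.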
     

16 Apr, 2008

3 commits

  • The alloc_netdev_mq() tries to produce 32-bytes alignment for both
    the net_device itself and its private data. The second alignment is
    achieved by adding the NETDEV_ALIGN_CONST to the whole size of
    the memory to be allocated.

    However, for those devices that do not need the private area, this
    addition just makes the net_device weigh 1024 + 32 = 1056 bytes,
    i.e. consume twice as much memory.

    Since loopback device is such (sizeof_priv == 0 for it), and each
    net namespace creates one, this can save a noticeable amount of
    memory for kernel with net namespaces turned on.

    After this set the lo device is actually allocated from a size-1024
    kmem cache on i386 box even with NETPOLL and WIRELESS_EXT turned on.

    Signed-off-by: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Pavel Emelyanov
     
  • dev_set_net is called for
    - just allocated devices
    - devices moving from one namespace to another
    release_net has proper check inside to distinguish these cases.

    Signed-off-by: Denis V. Lunev
    Signed-off-by: David S. Miller

    Denis V. Lunev
     
  • Signed-off-by: Denis V. Lunev
    Signed-off-by: David S. Miller

    Denis V. Lunev