23 Feb, 2011

7 commits

  • Currently the bridge multicast snooping feature periodically issues
    IPv6 general multicast listener queries to sense the absence of a
    listener.

    For this, it uses :: as its source address - however RFC 2710 requires:
    "To be valid, the Query message MUST come from a link-local IPv6 Source
    Address". Current Linux kernel versions seem to follow this requirement
    and ignore our bogus MLD queries.

    With this commit a link-local address from the bridge interface is
    used to issue the MLD query, so that other Linux devices which are
    multicast listeners in the network respond with an MLD response (which
    was not the case before).

    Signed-off-by: Linus Lüssing
    Signed-off-by: David S. Miller

    Linus Lüssing
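
The RFC 2710 requirement quoted above comes down to a prefix test on the source address. A minimal sketch, assuming the address is available as 16 raw bytes; `demo_is_link_local` is a hypothetical helper, not a kernel function:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns true if the 16-byte IPv6 address lies in
 * the link-local unicast prefix fe80::/10. */
bool demo_is_link_local(const uint8_t addr[16])
{
    assert(addr != NULL);
    /* First 10 bits must be 1111 1110 10. */
    return addr[0] == 0xfe && (addr[1] & 0xc0) == 0x80;
}
```

The unspecified address :: fails this test, which is consistent with receivers ignoring queries sent from it.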
     
  • Map the IPv6 header's destination multicast address to an ethernet
    destination address instead of the MLD query's multicast address.

    For instance, for a general MLD query (multicast address in the MLD query
    set to ::), this would wrongly be mapped to 33:33:00:00:00:00, although
    an MLD query's destination MAC should always be 33:33:00:00:00:01, which
    matches the IPv6 header's multicast destination ff02::1.

    Signed-off-by: Linus Lüssing
    Signed-off-by: David S. Miller

    Linus Lüssing
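
The two MAC addresses mentioned above follow from the usual IPv6-over-Ethernet multicast mapping (RFC 2464): 33:33 followed by the low-order 32 bits of the IPv6 address. A minimal sketch of that mapping; `demo_ipv6_mc_to_eth` is a hypothetical name used only for illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: derive the Ethernet multicast MAC from a 16-byte
 * IPv6 address by prefixing 33:33 to its last four bytes. */
void demo_ipv6_mc_to_eth(const uint8_t addr[16], uint8_t mac[6])
{
    assert(addr != NULL && mac != NULL);
    mac[0] = 0x33;
    mac[1] = 0x33;
    memcpy(&mac[2], &addr[12], 4);
}
```

Feeding it ff02::1 yields 33:33:00:00:00:01, while feeding it :: yields the wrong 33:33:00:00:00:00 described in the commit message.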
     
  • Currently the multicast bridge snooping support is not active for
    link local multicast. I assume this has been done to leave
    important multicast data untouched, like IPv6 Neighbor Discovery.

    In larger, bridged, local networks it could however be desirable to
    optimize for instance local multicast audio/video streaming too.

    With the transient flag in IPv6 multicast addresses we have an easy
    way to optimize such multimedia traffic without tampering with the
    high priority multicast data from well-known addresses.

    This patch alters the multicast bridge snooping for IPv6, to take
    effect for transient multicast addresses instead of non-link-local
    addresses.

    Signed-off-by: Linus Lüssing
    Signed-off-by: David S. Miller

    Linus Lüssing
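
In the IPv6 multicast address layout (RFC 4291), the second byte holds a 4-bit flags field followed by a 4-bit scope, and the lowest flag bit (T) marks a transient address. A minimal sketch of such a check; the macro and function names are hypothetical and are not the kernel's defines:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical name for the T (transient) bit inside the 4-bit flags field. */
#define DEMO_MC_FLAG_TRANSIENT 0x1

/* Returns true for a multicast address (ff00::/8) whose T flag is set. */
bool demo_mc_is_transient(const uint8_t addr[16])
{
    assert(addr != NULL);
    if (addr[0] != 0xff)
        return false;                 /* not a multicast address */
    return ((addr[1] >> 4) & DEMO_MC_FLAG_TRANSIENT) != 0;
}
```

With this test, a well-known address such as ff02::1 is left alone, while a transient link-local group such as ff12::1234 would qualify for snooping.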
     
  • This commit adds the missing IPv6 multicast address flag defines to
    complement the already existing multicast address scope defines and to
    be able to check these flags nicely in the future.

    Signed-off-by: Linus Lüssing
    Signed-off-by: David S. Miller

    Linus Lüssing
     
  • The nsrcs number is 2 bytes wide, therefore we need to call ntohs()
    before using it.

    Signed-off-by: Linus Lüssing
    Signed-off-by: David S. Miller

    Linus Lüssing
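
A minimal userspace sketch of the conversion, assuming the field is read from a raw packet buffer; `demo_read_be16` is a hypothetical helper and `ntohs()` here is the POSIX one from `<arpa/inet.h>`:

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read a 16-bit network-byte-order field from a
 * packet buffer and return it in host byte order. */
uint16_t demo_read_be16(const uint8_t *p)
{
    uint16_t raw;

    assert(p != NULL);
    memcpy(&raw, p, sizeof(raw));   /* avoids unaligned access */
    return ntohs(raw);
}
```

On a little-endian host, skipping the conversion would turn a source count of 2 (bytes 00 02) into 512.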
     
  • We actually want a pointer to grec_nsrcs and not to the following
    field. Otherwise we can get very high values for *nsrcs as the first two
    bytes of the IPv6 multicast address are being used instead, leading to
    a failing pskb_may_pull() which results in MLDv2 reports not being
    parsed.

    Signed-off-by: Linus Lüssing
    Signed-off-by: David S. Miller

    Linus Lüssing
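
The effect of pointing at the wrong field can be reproduced in userspace. The sketch below assumes a record layout modelled on the MLDv2 multicast address record (type, aux data length, number of sources, multicast address); `struct demo_grec` and both functions are hypothetical stand-ins, not the kernel's definitions:

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed layout, for illustration only. */
struct demo_grec {
    uint8_t  type;
    uint8_t  auxwords;
    uint16_t nsrcs;      /* network byte order */
    uint8_t  mca[16];    /* multicast address */
};

/* Reads the 16-bit value found at the given byte offset of a raw record. */
static uint16_t demo_be16_at(const uint8_t *rec, size_t off)
{
    uint16_t raw;

    assert(rec != NULL);
    memcpy(&raw, rec + off, sizeof(raw));
    return ntohs(raw);
}

/* Correct: points at the source count itself. */
uint16_t demo_grec_nsrcs(const uint8_t *rec)
{
    return demo_be16_at(rec, offsetof(struct demo_grec, nsrcs));
}

/* Mistaken: points at the following field, the multicast address. */
uint16_t demo_grec_nsrcs_wrong(const uint8_t *rec)
{
    return demo_be16_at(rec, offsetof(struct demo_grec, mca));
}
```

For a record carrying one source and a group in ff02::/16, the mistaken variant returns 0xff02, a source count far larger than the packet can hold.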
     
  • The protocol type for IPv6 entries in the hash table for multicast
    bridge snooping is falsely set to ETH_P_IP, marking it as an IPv4
    address, instead of setting it to ETH_P_IPV6, which results in negative
    look-ups in the hash table later.

    Signed-off-by: Linus Lüssing
    Signed-off-by: David S. Miller

    Linus Lüssing
     

22 Feb, 2011

1 commit

  • Fix a bug where undo_retrans is incorrectly decremented when undo_marker is
    not set or undo_retrans is already 0. This happens when the sender receives
    more DSACK ACKs than packets retransmitted during the current
    undo phase. This may also happen when the sender receives a DSACK after
    the undo operation is completed or cancelled.

    Fix another bug where undo_retrans is incorrectly incremented when the
    sender retransmits an skb and tcp_skb_pcount(skb) > 1 (TSO). This case
    is rare but not impossible.

    Signed-off-by: Yuchung Cheng
    Acked-by: Ilpo Järvinen
    Signed-off-by: David S. Miller

    Yuchung Cheng
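
The first fix amounts to guarding the decrement. A simplified sketch under stated assumptions: `struct demo_tcp_state` and `demo_on_dsack` are hypothetical and model only the two fields named above, not the real TCP code path:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily reduced stand-in for the sender's undo state. */
struct demo_tcp_state {
    uint32_t undo_marker;   /* 0 means no undo phase is active */
    int      undo_retrans;  /* retransmissions not yet proven spurious */
};

/* Account for one DSACK: decrement only while an undo phase is active
 * and there is still something left to undo. */
void demo_on_dsack(struct demo_tcp_state *tp)
{
    assert(tp != NULL);
    if (tp->undo_marker != 0 && tp->undo_retrans > 0)
        tp->undo_retrans--;
}
```

With the guard in place, extra DSACKs beyond the number of retransmissions leave the counter at 0 instead of driving it negative.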
     

21 Feb, 2011

1 commit

  • From: Eric W. Biederman

    In the beginning with batching unreg_list was a list that was used only
    once in the lifetime of a network device (I think). Now we have calls
    using the unreg_list that can happen multiple times in the life of a
    network device like dev_deactivate and dev_close that are also using the
    unreg_list. In addition in unregister_netdevice_queue we also do a
    list_move because for devices like veth pairs it is possible that
    unregister_netdevice_queue will be called multiple times.

    So I think the change below to fix dev_deactivate which Eric D. missed
    will fix this problem. Now to go test that.

    Signed-off-by: David S. Miller

    Eric W. Biederman
     

20 Feb, 2011

4 commits

  • commit 5fa782c2f5ef6c2e4f04d3e228412c9b4a4c8809 re-worked the
    handling of unknown parameters. sctp_init_cause_fixed() can now
    return -ENOSPC if there is not enough tailroom in the error
    chunk skb. When this happens, the error header is not appended to
    the error chunk. In that case, the payload of the unknown parameter
    should not be appended either.

    Signed-off-by: Jiri Bohac
    Acked-by: Vlad Yasevich
    Signed-off-by: David S. Miller

    Jiri Bohac
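
The rule described above (no error header, no payload) can be sketched with a fixed-size buffer standing in for the error chunk. Everything here is a hypothetical model: `struct demo_chunk`, `demo_init_cause` and `demo_report_unknown_param` are illustration names, not SCTP functions:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical error chunk with a fixed amount of room. */
struct demo_chunk {
    uint8_t data[32];
    size_t  len;
};

/* Appends the cause header only if header plus payload will fit. */
int demo_init_cause(struct demo_chunk *c, const void *hdr, size_t hdr_len,
                    size_t payload_len)
{
    assert(c != NULL && hdr != NULL);
    if (sizeof(c->data) - c->len < hdr_len + payload_len)
        return -ENOSPC;
    memcpy(c->data + c->len, hdr, hdr_len);
    c->len += hdr_len;
    return 0;
}

/* Appends the payload only when the header was appended successfully. */
int demo_report_unknown_param(struct demo_chunk *c,
                              const void *hdr, size_t hdr_len,
                              const void *payload, size_t payload_len)
{
    int err = demo_init_cause(c, hdr, hdr_len, payload_len);

    if (err)
        return err;                 /* chunk is left untouched */
    memcpy(c->data + c->len, payload, payload_len);
    c->len += payload_len;
    return 0;
}
```

When the second call does not fit, the chunk length stays where it was, so the chunk never contains a payload without its header.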
     
  • The dcb_app protocol field is a __u32, however the 802.1Qaz
    specification defines it as a 16 bit field. This patch brings
    the structure in line with the spec, making it a __u16.

    CC: Shmulik Ravid
    Signed-off-by: John Fastabend
    Signed-off-by: David S. Miller

    John Fastabend
     
  • Eric W. Biederman reported a lockdep splat in inet_twsk_deschedule()

    This is caused by inet_twsk_purge(), run from process context,
    and commit 575f4cd5a5b6394577 (net: Use rcu lookups in inet_twsk_purge.)
    removed the BH disabling that was necessary.

    Add the BH disabling but fine grained, right before calling
    inet_twsk_deschedule(), instead of whole function.

    With help from Linus Torvalds and Eric W. Biederman

    Reported-by: Eric W. Biederman
    Signed-off-by: Eric Dumazet
    CC: Daniel Lezcano
    CC: Pavel Emelyanov
    CC: Arnaldo Carvalho de Melo
    CC: stable (# 2.6.33+)
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • David S. Miller
     

19 Feb, 2011

8 commits

  • * 'rtc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    RTC: Re-enable UIE timer/polling emulation
    RTC: Revert UIE emulation removal
    RTC: Release mutex in error path of rtc_alarm_irq_enable

    Linus Torvalds
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (37 commits)
    net: deinit automatic LIST_HEAD
    net: dont leave active on stack LIST_HEAD
    net: provide default_advmss() methods to blackhole dst_ops
    tg3: Restrict phy ioctl access
    drivers/net: Call netif_carrier_off at the end of the probe
    ixgbe: work around for DDP last buffer size
    ixgbe: fix panic due to uninitialised pointer
    e1000e: flush all writebacks before unload
    e1000e: check down flag in tasks
    isdn: hisax: Use l2headersize() instead of dup (and buggy) func.
    arp_notify: unconditionally send gratuitous ARP for NETDEV_NOTIFY_PEERS.
    cxgb4vf: Use defined Mailbox Timeout
    cxgb4vf: Quiesce Virtual Interfaces on shutdown ...
    cxgb4vf: Behave properly when CONFIG_DEBUG_FS isn't defined ...
    cxgb4vf: Check driver parameters in the right place ...
    pch_gbe: Fix the MAC Address load issue.
    iwlwifi: Delete iwl3945_good_plcp_health.
    net/can/softing: make CAN_SOFTING_CS depend on CAN_SOFTING
    netfilter: nf_iterate: fix incorrect RCU usage
    pch_gbe: Fix the issue that the receiving data is not normal.
    ...

    Linus Torvalds
     
  • * 'for-linus/bugfixes' of git://xenbits.xen.org/people/ianc/linux-2.6:
    xen: suspend and resume system devices when running PVHVM

    Linus Torvalds
     
  • * 'fixes-2.6.38' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
    workqueue: make sure MAYDAY_INITIAL_TIMEOUT is at least 2 jiffies long
    workqueue, freezer: unify spelling of 'freeze' + 'able' to 'freezable'
    workqueue: wake up a worker when a rescuer is leaving a gcwq

    Linus Torvalds
     
  • commit 9b5e383c11b08784 (net: Introduce
    unregister_netdevice_many()) left an active LIST_HEAD() in
    rollback_registered(), with possible memory corruption.

    Even if the device is freed without touching its unreg_list (and therefore
    touching the previous memory location holding LIST_HEAD(single)), better
    close the bug for good, since it's really subtle.

    (Same fix for default_device_exit_batch() for completeness)

    Reported-by: Michal Hocko
    Tested-by: Michal Hocko
    Reported-by: Eric W. Biederman
    Tested-by: Eric W. Biederman
    Signed-off-by: Linus Torvalds
    Signed-off-by: Eric Dumazet
    CC: Ingo Molnar
    CC: Octavian Purdila
    CC: stable [.33+]
    Signed-off-by: David S. Miller

    Eric Dumazet
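
Why an on-stack list head must be unlinked before the function returns can be shown with a small userspace list. This is a sketch under stated assumptions: the `demo_` types and functions are hypothetical and only imitate a circular doubly linked list:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical minimal circular doubly linked list. */
struct demo_list_head {
    struct demo_list_head *next, *prev;
};

void demo_list_init(struct demo_list_head *h)
{
    h->next = h;
    h->prev = h;
}

void demo_list_add(struct demo_list_head *entry, struct demo_list_head *head)
{
    entry->next = head->next;
    entry->prev = head;
    head->next->prev = entry;
    head->next = entry;
}

void demo_list_del(struct demo_list_head *entry)
{
    entry->prev->next = entry->next;
    entry->next->prev = entry->prev;
}

/* Queues one entry on a temporary on-stack head, then unlinks the head
 * so the entry no longer points into this stack frame on return. */
void demo_process_single(struct demo_list_head *entry)
{
    struct demo_list_head single;

    assert(entry != NULL);
    demo_list_init(&single);
    demo_list_add(entry, &single);
    /* ... batch processing of the list would happen here ... */
    demo_list_del(&single);   /* deinit: entry now links only to itself */
}
```

Without the final `demo_list_del(&single)`, the entry's `next` and `prev` would keep pointing at a dead stack slot, and a later list operation on the entry would write into whatever reused that memory.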
     
  • Eric W. Biederman and Michal Hocko reported various memory corruptions
    that we suspected to be related to a LIST head located on stack, that
    was manipulated after thread left function frame (and eventually exited,
    so its stack was freed and reused).

    Eric Dumazet suggested the problem was probably coming from commit
    443457242beb (net: factorize
    sync-rcu call in unregister_netdevice_many)

    This patch fixes __dev_close() and dev_close() to properly deinit their
    respective LIST_HEAD(single) before exiting.

    References: https://lkml.org/lkml/2011/2/16/304
    References: https://lkml.org/lkml/2011/2/14/223

    Reported-by: Michal Hocko
    Tested-by: Michal Hocko
    Reported-by: Eric W. Biederman
    Tested-by: Eric W. Biederman
    Signed-off-by: Linus Torvalds
    Signed-off-by: Eric Dumazet
    CC: Ingo Molnar
    CC: Octavian Purdila
    Signed-off-by: David S. Miller

    Linus Torvalds
     
  • Commit 0dbaee3b37e118a (net: Abstract default ADVMSS behind an
    accessor.) introduced a possible crash in tcp_connect_init(), when
    dst->default_advmss() is called from dst_metric_advmss()

    Reported-by: George Spelvin
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • When list debugging is enabled, we aim to readably show list corruption
    errors, and the basic list_add/list_del operations end up having extra
    debugging code in them to do some basic validation of the list entries.

    However, "list_del_init()" and "list_move[_tail]()" ended up avoiding
    the debug code due to how they were written. This fixes that.

    So the _next_ time we have list_move() problems with stale list entries,
    we'll hopefully have an easier time finding them.

    Signed-off-by: Linus Torvalds

    Linus Torvalds
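
The idea of routing a move through the validating delete can be sketched in userspace. The names below are hypothetical, and returning an error code is a simplification chosen for this example; it is not how the kernel's list debugging reports corruption:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical minimal circular doubly linked list node. */
struct demo_node {
    struct demo_node *next, *prev;
};

void demo_node_init(struct demo_node *n)
{
    n->next = n;
    n->prev = n;
}

void demo_node_add(struct demo_node *n, struct demo_node *head)
{
    n->next = head->next;
    n->prev = head;
    head->next->prev = n;
    head->next = n;
}

/* Validating delete: returns -1 if the neighbours do not point back. */
int demo_node_del_checked(struct demo_node *n)
{
    assert(n != NULL);
    if (n->prev->next != n || n->next->prev != n)
        return -1;                  /* stale or corrupted entry */
    n->prev->next = n->next;
    n->next->prev = n->prev;
    demo_node_init(n);
    return 0;
}

/* Move built on the checked delete, so it gets the same validation. */
int demo_node_move(struct demo_node *n, struct demo_node *head)
{
    int err = demo_node_del_checked(n);

    if (err)
        return err;
    demo_node_add(n, head);
    return 0;
}
```

A move of an entry whose neighbours no longer reference it is then reported instead of silently rewriting unrelated memory.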
     

18 Feb, 2011

12 commits

  • * 'pm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6:
    PM / Hibernate: Return error code when alloc_image_page() fails

    Linus Torvalds
     
  • * 'drm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6:
    drm/radeon/kms: add missing frac fb div flag for dce4+
    drm/radeon/kms: do not reject X16 and Y16X16 floating-point formats on r300
    drm/nouveau: fix suspend/resume on GPUs that don't have PM support
    drm/nouveau: flips/flipd need to always set 'evict' for move_accel_cleanup()
    drm/nv40: fix tiling-related setup for a number of chipsets
    drm/nouveau: fix non-EDIDful native mode selection
    drm/nouveau: Fix detection of DDC-based LVDS on DCB15 boards.
    drm/nv04-nv40: Fix NULL dereference when we fail to find an LVDS native mode.
    drm/nv10: Fix crash when allocating a BO larger than half the available VRAM.

    Linus Torvalds
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
    IB/qib: Prevent double completions after a timeout or RNR error
    IB/qib: Fix double add_timer()
    RDMA/nes: Don't generate async events for unregistered devices

    Linus Torvalds
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6:
    sparc64: Fix NMI startup bug which also breaks perf.
    sparc: fix size argument to find_next_zero_bit()
    sparc: use bitmap_set()
    sparc32: unaligned memory access (MNA) trap handler bug

    Linus Torvalds
     
  • Validate number of blocks in map and remove redundant variable.

    Signed-off-by: Timo Warns
    Cc: stable@kernel.org
    Signed-off-by: Linus Torvalds

    Timo Warns
     
  • This patch re-enables UIE timer/polling emulation for rtc devices
    that do not support alarm irqs.

    CC: Uwe Kleine-König
    CC: Thomas Gleixner
    Reported-by: Uwe Kleine-König
    Tested-by: Uwe Kleine-König
    Signed-off-by: John Stultz

    John Stultz
     
  • Uwe pointed out that my alarm based UIE emulation is not sufficient
    to replace the older timer/polling based UIE emulation on devices
    where there is no alarm irq. This causes rtc devices without alarms
    to return -EINVAL to UIE ioctls. The fix is to re-instate the old
    timer/polling method for devices without alarm irqs.

    This patch reverts the following commits:
    042620a018afcfba1d678062b62e46 - Remove UIE emulation
    1daeddd5962acad1bea55e524fc0fa - Cleanup removed UIE emulation declaration
    b5cc8ca1c9c3a37eaddf709b2fd3e1 - Remove Kconfig symbol for UIE emulation

    The emulation mode will still need to be wired-in with a following
    patch before it will work.

    CC: Uwe Kleine-König
    CC: Thomas Gleixner
    Reported-by: Uwe Kleine-König
    Signed-off-by: John Stultz

    John Stultz
     
  • On hardware that doesn't support alarm interrupts, rtc_alarm_irq_enable
    could return without releasing the ops_lock mutex.

    This was introduced in
    aa0be0f (RTC: Propagate error handling via rtc_timer_enqueue properly)

    This patch corrects the issue by only returning once the mutex is
    released.

    [john.stultz: Reworded the commit log]

    Signed-off-by: Uwe Kleine-König
    Signed-off-by: John Stultz

    Uwe Kleine-König
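
The shape of such a fix is a single exit path that always passes through the unlock. A minimal sketch under stated assumptions: `struct demo_lock` is a toy stand-in for a mutex, and `struct demo_rtc` and `demo_alarm_irq_enable` are hypothetical, not the RTC core's code:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy lock used only to make a leaked lock observable. */
struct demo_lock {
    bool held;
};

void demo_lock_acquire(struct demo_lock *l)
{
    assert(!l->held);   /* taking it twice would deadlock a real mutex */
    l->held = true;
}

void demo_lock_release(struct demo_lock *l)
{
    assert(l->held);
    l->held = false;
}

/* Hypothetical device: some hardware has no alarm interrupt. */
struct demo_rtc {
    struct demo_lock ops_lock;
    bool has_alarm_irq;
    bool alarm_enabled;
};

/* The error is recorded, not returned early, so the lock is always released. */
int demo_alarm_irq_enable(struct demo_rtc *rtc, bool enabled)
{
    int err = 0;

    assert(rtc != NULL);
    demo_lock_acquire(&rtc->ops_lock);
    if (!rtc->has_alarm_irq)
        err = -EINVAL;
    else
        rtc->alarm_enabled = enabled;
    demo_lock_release(&rtc->ops_lock);
    return err;
}
```

Calling it twice on hardware without alarm interrupts succeeds in taking the lock both times, which would not be the case if the error path returned while still holding it.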
     
  • If management firmware is present and the device is down, the firmware
    will assume control of the phy. If a phy access were allowed from the
    host, it would collide with firmware phy accesses, resulting in
    unpredictable behavior. This patch fixes the problem by disallowing phy
    accesses during the problematic condition.

    Signed-off-by: Matt Carlson
    Reviewed-by: Michael Chan
    Signed-off-by: David S. Miller

    Matt Carlson
     
  • Without a call to netif_carrier_off at the end of the probe the operstate
    is unknown when the device is initially opened. By default the carrier is
    on so when the device is opened and netif_carrier_on is called the link
    watch event is not fired and operstate remains zero (unknown).

    This patch fixes this behavior in forcedeth and r8169.

    Signed-off-by: Ivan Vecera
    Acked-by: Francois Romieu
    Signed-off-by: David S. Miller

    Ivan Vecera
     
  • Roland Dreier
     
  • There is a double completion associated with error handling for RC QPs.

    The sequence is:

    - The do_rc_ack() routine fields an RNR nack and there are 0
    rnr_retries configured on the QP.
    - qib_error_qp() stops the pending timer
    - qib_rc_send_complete() is called from sdma_complete()
    - qib_rc_send_complete() starts the timer because the msb of the psn
    just completed says an ack is needed.
    - a bunch of flushes occur as ipoib posts WQEs to an error'ed QP
    - rc_timeout() calls qib_restart_rc()
    - qib_restart_rc() calls qib_send_complete() with an
    IB_WC_RETRY_EXC_ERR on a wqe that has already been completed in the
    past

    The fix avoids starting the timer since another packet will never
    arrive.

    Signed-off-by: Mike Marciniszyn
    Signed-off-by: Roland Dreier

    Mike Marciniszyn
     

17 Feb, 2011

7 commits

  • The flaw was in skipping the second byte in MAC header due to increasing
    the pointer AND indexed access starting at '1'.

    Signed-off-by: Joerg Marx
    Signed-off-by: Patrick McHardy

    Joerg Marx
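
Walking the header with a single index avoids the double advance described above (pointer increment combined with an index that starts at 1). A minimal userspace sketch; `demo_format_mac` is a hypothetical helper, not the netfilter code:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: formats len header bytes as colon-separated hex.
 * Returns the string length, or -1 if the output buffer is too small. */
int demo_format_mac(const uint8_t *hdr, size_t len, char *out, size_t out_size)
{
    size_t pos = 0;

    assert(hdr != NULL && out != NULL);
    if (out_size == 0)
        return -1;
    out[0] = '\0';
    for (size_t i = 0; i < len; i++) {
        /* One index only: byte i is printed exactly once. */
        int n = snprintf(out + pos, out_size - pos,
                         i ? ":%02x" : "%02x", hdr[i]);

        if (n < 0 || (size_t)n >= out_size - pos)
            return -1;
        pos += (size_t)n;
    }
    return (int)pos;
}
```

For the bytes 00 11 22 33 44 55 this produces 00:11:22:33:44:55, with the second byte present.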
     
  • Assigning a socket in timewait state to skb->sk can trigger
    kernel oops, e.g. in nfnetlink_log, which does:

        if (skb->sk) {
                read_lock_bh(&skb->sk->sk_callback_lock);
                if (skb->sk->sk_socket && skb->sk->sk_socket->file) ...

    In the timewait case, accessing sk->sk_callback_lock and sk->sk_socket
    is invalid.

    Either all of these spots will need to add a test for sk->sk_state != TCP_TIME_WAIT,
    or xt_TPROXY must not assign a timewait socket to skb->sk.

    This does the latter.

    If a TW socket is found, assign the tproxy nfmark, but skip the skb->sk assignment,
    thus mimicking behaviour of a '-m socket .. -j MARK/ACCEPT' re-routing rule.

    The 'SYN to TW socket' case is left unchanged -- we try to redirect to the
    listener socket.

    Cc: Balazs Scheidler
    Cc: KOVACS Krisztian
    Signed-off-by: Florian Westphal
    Signed-off-by: Patrick McHardy

    Florian Westphal
     
  • Otherwise we fail to properly suspend/resume all of the emulated devices.

    Something between 2.6.38-rc2 and rc3 appears to have exposed this
    issue, but it's always been wrong not to do this.

    Signed-off-by: Ian Campbell
    Acked-by: Stefano Stabellini
    Acked-by: Jeremy Fitzhardinge

    Ian Campbell
     
  • A HW limitation was recently discovered where the last buffer in a DDP offload
    cannot be a full buffer size in length. Fix the issue with a workaround by
    adding another buffer with size = 1.

    Signed-off-by: Amir Hanania
    Tested-by: Ross Brattain
    Signed-off-by: Jeff Kirsher

    Amir Hanania
     
  • Systems containing an 82599EB and running a backported driver from
    upstream were panicking on boot. It turns out hw->mac.ops.setup_sfp is
    only set for 82599, so one should check to be sure that pointer is set
    before continuing in ixgbe_sfp_config_module_task. I verified by
    inspection that the upstream driver has the same issue and also added a
    check before the call in ixgbe_sfp_link_config.

    Signed-off-by: Andy Gospodarek
    Signed-off-by: Jeff Kirsher

    Andy Gospodarek
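
The check described above is the usual guard for an optional operation in a function-pointer table. A minimal sketch with hypothetical types; `struct demo_hw`, `struct demo_mac_ops` and the functions are illustration names, not the ixgbe structures:

```c
#include <assert.h>
#include <stddef.h>

struct demo_hw;

/* Hypothetical ops table in which setup_sfp is optional. */
struct demo_mac_ops {
    int (*setup_sfp)(struct demo_hw *hw);
};

struct demo_hw {
    struct demo_mac_ops ops;
    int sfp_configured;
};

/* Example implementation for hardware that provides the operation. */
int demo_setup_sfp_impl(struct demo_hw *hw)
{
    hw->sfp_configured = 1;
    return 0;
}

/* Calls the optional operation only when the hardware provides it. */
int demo_sfp_config(struct demo_hw *hw)
{
    assert(hw != NULL);
    if (hw->ops.setup_sfp == NULL)
        return 0;                   /* nothing to set up on this device */
    return hw->ops.setup_sfp(hw);
}
```

On a device whose table leaves the pointer unset, the call becomes a no-op instead of a jump through a NULL pointer.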
     
  • The driver was not flushing all writebacks before unloading, possibly
    causing memory to be written by the hardware after the driver had
    reinitialized the rings.

    This adds missing functionality to flush any pending writebacks and is
    called in all spots where descriptors should be completed before the driver
    begins processing.

    Signed-off-by: Jesse Brandeburg
    Reviewed-by: Bruce Allan
    Tested-by: Jeff Pieper
    Signed-off-by: Jeff Kirsher

    Jesse Brandeburg
     
  • This change is part of a fix to avoid any tasks running while the driver is
    exiting and deinitializing resources.

    Signed-off-by: Jesse Brandeburg
    Tested-by: Jeff Pieper
    Signed-off-by: Jeff Kirsher

    Jesse Brandeburg