08 Mar, 2012

6 commits


07 Mar, 2012

8 commits

  • When OVS_VPORT_ATTR_NAME is specified and dp_ifindex is nonzero, the
    logical behavior would be for the vport name lookup scope to be limited
    to the specified datapath, but in fact the dp_ifindex value was ignored.
    This commit causes the search scope to be honored.

    Signed-off-by: Ben Pfaff
    Signed-off-by: Jesse Gross

    Ben Pfaff
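
A minimal sketch of the scoping rule described above. It uses a hypothetical flat array of vports, not the Open vSwitch structures or its lookup function. It also assumes, as the message implies, that a zero dp_ifindex means the lookup is not limited to any datapath.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified vport record; not the Open vSwitch kernel structures. */
struct vport {
    const char *name;
    int dp_ifindex; /* ifindex of the datapath this vport belongs to */
};

/* Look up a vport by name. A nonzero dp_ifindex restricts the search to that
 * datapath (the behaviour the fix restores); zero searches all datapaths. */
const struct vport *lookup_vport(const struct vport *ports, size_t n,
                                 const char *name, int dp_ifindex)
{
    for (size_t i = 0; i < n; i++) {
        if (dp_ifindex != 0 && ports[i].dp_ifindex != dp_ifindex)
            continue; /* vport lives in a different datapath: out of scope */
        if (strcmp(ports[i].name, name) == 0)
            return &ports[i];
    }
    return NULL;
}
```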
     
  • When forwarding is enabled and a new net device is registered,
    we need to add this device to the all-routers multicast group.

    Signed-off-by: Li Wei
    Signed-off-by: David S. Miller

    Li Wei
     
  • If reliable event delivery is enabled and ctnetlink fails to deliver
    the destroy event in early_drop, the conntrack subsystem cannot
    drop the candidate flow that was planned to be evicted.

    Reported-by: Kerin Millar
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: David S. Miller

    Pablo Neira Ayuso
     
  • When net.bridge.bridge-nf-filter-vlan-tagged is 0 (default), vlan packets
    arriving should not be sent to ip(6)tables by bridge netfilter.

    However, it turns out that we currently always send VLAN packets to
    netfilter if either:
    a) CONFIG_VLAN_8021Q is enabled; or
    b) CONFIG_VLAN_8021Q is not set but rx vlan offload is enabled
    on the bridge port.

    This is because bridge netfilter treats skb with
    skb->protocol == ETH_P_IP{V6} as "non-vlan packet".

    With rx vlan offload on or CONFIG_VLAN_8021Q=y, the vlan header has
    already been removed here, and we cannot rely on skb->protocol alone.

    Fix this by only using skb->protocol if the skb has no vlan tag,
    or if a vlan tag is present and filter-vlan-tagged bridge netfilter
    sysctl is enabled.

    We cannot remove the skb->protocol == htons(ETH_P_8021Q) test
    because the vlan tag is still around in the CONFIG_VLAN_8021Q=n &&
    "ethtool -K $itf rxvlan off" case.

    reproducer:
    iptables -t raw -I PREROUTING -i br0
    iptables -t raw -I PREROUTING -i br0.1

    Then send packets to an ip address configured on br0.1 interface.
    Even with net.bridge.bridge-nf-filter-vlan-tagged=0, the 1st rule
    will match instead of the 2nd one.

    With this patch applied, the 2nd rule will match instead.
    In the non-local address case, netfilter won't be consulted after
    this patch unless the sysctl is switched on.

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: David S. Miller

    Florian Westphal
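
The decision this patch implements can be sketched as follows. The struct pkt fields and pass_to_iptables() are hypothetical simplifications, not the bridge netfilter code. offloaded_tag stands in for a tag already removed by rx vlan offload or CONFIG_VLAN_8021Q=y, and EtherTypes are kept in host byte order for clarity.

```c
#include <stdbool.h>
#include <stdint.h>

/* EtherType values (host byte order here, for simplicity). */
#define ETH_P_IP    0x0800
#define ETH_P_IPV6  0x86DD
#define ETH_P_8021Q 0x8100

/* Hypothetical summary of the packet state the decision depends on. */
struct pkt {
    uint16_t protocol;    /* like skb->protocol */
    bool offloaded_tag;   /* tag already stripped (rx vlan offload or 8021q module) */
    uint16_t inner_proto; /* encapsulated protocol when protocol == ETH_P_8021Q */
};

static bool is_ip(uint16_t proto)
{
    return proto == ETH_P_IP || proto == ETH_P_IPV6;
}

/* Should bridge netfilter hand this frame to ip(6)tables? */
bool pass_to_iptables(const struct pkt *p, bool filter_vlan_tagged)
{
    /* No VLAN tag anywhere: skb->protocol can be trusted. */
    if (!p->offloaded_tag && p->protocol != ETH_P_8021Q)
        return is_ip(p->protocol);

    /* Tagged frame: only filtered when the sysctl is switched on. */
    if (!filter_vlan_tagged)
        return false;

    /* In-band 802.1Q header still present (CONFIG_VLAN_8021Q=n, rxvlan off). */
    if (p->protocol == ETH_P_8021Q)
        return is_ip(p->inner_proto);

    return is_ip(p->protocol);
}
```

A tagged frame whose protocol field already reads ETH_P_IP is now skipped unless filter_vlan_tagged is set, instead of being mistaken for an untagged IP packet.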
     
  • In adf7ff8, an invalid dereference was added in ebt_make_names.

    CC [M] net/bridge/netfilter/ebtables.o
    net/bridge/netfilter/ebtables.c: In function `ebt_make_names':
    net/bridge/netfilter/ebtables.c:1371:20: warning: `t' may be used uninitialized in this function [-Wuninitialized]

    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: David S. Miller

    Pablo Neira Ayuso
     
  • Since 7d367e0, ctnetlink_new_conntrack is called without holding
    the nf_conntrack_lock spinlock. Thus, ctnetlink_parse_nat_setup
    no longer needs to release that spinlock in the NAT module
    autoload case.

    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: David S. Miller

    Pablo Neira Ayuso
     
  • User-space ebtables expects 32-byte names, but xt_match names
    use only 29 bytes. We must copy at most 29 bytes and then make sure
    the remaining bytes are filled with zeroes.

    Signed-off-by: Santosh Nayak
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: David S. Miller

    Santosh Nayak
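
A user-space sketch of the copy described above. The constant names and copy_match_name() are hypothetical. The values 32 and 29 come from the message, and treating 29 as including the terminating NUL is an assumption. The point is to never read past the 29-byte source and to zero-fill the rest of the 32-byte destination.

```c
#include <string.h>

#define USER_NAME_LEN  32 /* name size user-space ebtables expects (per the message) */
#define MATCH_NAME_LEN 29 /* xt_match name size, assumed to include the NUL */

/* Fill a USER_NAME_LEN buffer from a name stored in a MATCH_NAME_LEN field:
 * copy no more than the field holds, then zero everything after it. */
void copy_match_name(char *dst, const char *src)
{
    size_t n = 0;

    /* Stop at the NUL or at the field's last usable byte; never read past it. */
    while (n < MATCH_NAME_LEN - 1 && src[n] != '\0')
        n++;

    memset(dst, 0, USER_NAME_LEN); /* zero padding, including the terminator */
    memcpy(dst, src, n);
}
```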
     
  • This commit fixes tcp_shift_skb_data() so that it does not shift
    SACKed data below snd_una.

    This fixes an issue whose symptoms exactly match reports showing
    tp->sacked_out going negative since 3.3.0-rc4 (see "WARNING: at
    net/ipv4/tcp_input.c:3418" thread on netdev).

    Since 2008 (832d11c5cd076abc0aa1eaf7be96c81d1a59ce41)
    tcp_shift_skb_data() had been shifting SACKed ranges that were below
    snd_una. It checked that the *end* of the skb it was about to shift
    from was above snd_una, but did not check that the end of the actual
    shifted range was above snd_una; this commit adds that check.

    Shifting SACKed ranges below snd_una is problematic because for such
    ranges tcp_sacktag_one() short-circuits: it does not declare anything
    as SACKed and does not increase sacked_out.

    Before the fixes in commits cc9a672ee522d4805495b98680f4a3db5d0a0af9
    and daef52bab1fd26e24e8e9578f8fb33ba1d0cb412, shifting SACKed ranges
    below snd_una happened to work because tcp_shifted_skb() was always
    (incorrectly) passing in to tcp_sacktag_one() an skb whose end_seq
    tcp_shift_skb_data() had already guaranteed was beyond snd_una. Hence
    tcp_sacktag_one() never short-circuited and always increased
    tp->sacked_out in this case.

    After those two fixes, my testing has verified that shifting SACKed
    ranges below snd_una could cause tp->sacked_out to go negative with
    the following sequence of events:

    (1) tcp_shift_skb_data() sees an skb whose end_seq is beyond snd_una,
    then shifts a prefix of that skb that is below snd_una

    (2) tcp_shifted_skb() increments the packet count of the
    already-SACKed prev sk_buff

    (3) tcp_sacktag_one() sees the end of the new SACKed range is below
    snd_una, so it short-circuits and doesn't increase tp->sacked_out

    (4) tcp_clean_rtx_queue() sees the SACKed skb has been ACKed,
    decrements tp->sacked_out by this "inflated" pcount that was
    missing a matching increase in tp->sacked_out, and hence
    tp->sacked_out underflows to a u32 like 0xFFFFFFFF, which, cast
    to s32, is negative.

    (5) this leads to the warnings seen in the recent "WARNING: at
    net/ipv4/tcp_input.c:3418" thread on the netdev list; e.g.:
    tcp_input.c:3418 WARN_ON((int)tp->sacked_out < 0);

    More generally, I think this bug can be tickled in some cases where
    two or more ACKs from the receiver are lost and then a DSACK arrives
    that is immediately above an existing SACKed skb in the write queue.

    This fix changes tcp_shift_skb_data() to abort this sequence at step
    (1) in the scenario above by noticing that the bytes are below snd_una
    and not shifting them.

    Signed-off-by: Neal Cardwell
    Signed-off-by: David S. Miller

    Neal Cardwell
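
The two pieces of arithmetic in this message can be sketched in C. seq_before() and seq_after() follow the wraparound-safe comparison style of the kernel's before()/after(). may_shift_prefix() is a hypothetical helper that expresses the added condition on the end of the shifted range, and sacked_out_as_s32() shows how an unmatched decrement turns into a negative value. The u32-to-s32 conversion is implementation-defined in ISO C; GCC and Clang wrap modulo 2^32, as the kernel assumes.

```c
#include <stdbool.h>
#include <stdint.h>

/* Wraparound-safe sequence comparisons in the style of the kernel's before()/after(). */
static bool seq_before(uint32_t seq1, uint32_t seq2)
{
    return (int32_t)(seq1 - seq2) < 0;
}

static bool seq_after(uint32_t seq1, uint32_t seq2)
{
    return seq_before(seq2, seq1);
}

/* The added check: shifting the first len bytes of an skb starting at seq is
 * allowed only if the end of that shifted range is beyond snd_una. Checking
 * only the skb's own end_seq is not enough. */
bool may_shift_prefix(uint32_t seq, uint32_t len, uint32_t snd_una)
{
    return seq_after(seq + len, snd_una);
}

/* How sacked_out "goes negative": an unmatched decrement wraps the u32,
 * and reading it back as s32 (as the WARN_ON does) gives a negative value. */
int32_t sacked_out_as_s32(uint32_t sacked_out, uint32_t pcount)
{
    return (int32_t)(sacked_out - pcount);
}
```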
     

06 Mar, 2012

1 commit


05 Mar, 2012

3 commits


04 Mar, 2012

1 commit

  • In tcp_mark_head_lost() we should not attempt to fragment a SACKed skb
    to mark the first portion as lost. This is for two primary reasons:

    (1) tcp_shifted_skb() coalesces adjacent regions of SACKed skbs. When
    doing this, it preserves the sum of their packet counts in order to
    reflect the real-world dynamics on the wire. But given that skbs can
    have remainders that do not align to MSS boundaries, this packet count
    preservation means that for SACKed skbs there is not necessarily a
    direct linear relationship between tcp_skb_pcount(skb) and
    skb->len. Thus tcp_mark_head_lost()'s previous attempts to fragment
    off and mark as lost a prefix of length (packets - oldcnt)*mss from
    SACKed skbs were leading to occasional failures of the WARN_ON(len >
    skb->len) in tcp_fragment() (which used to be a BUG_ON(); see the
    recent "crash in tcp_fragment" thread on netdev).

    (2) there is no real point in fragmenting off part of a SACKed skb and
    calling tcp_skb_mark_lost() on it, since tcp_skb_mark_lost() is a NOP
    for SACKed skbs.

    Signed-off-by: Neal Cardwell
    Acked-by: Ilpo Järvinen
    Acked-by: Yuchung Cheng
    Acked-by: Nandita Dukkipati
    Signed-off-by: David S. Miller

    Neal Cardwell
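
To see why the packet count and the length of a SACKed skb stop being linearly related, here is a sketch with hypothetical numbers: MSS 1000 and two 1500-byte SACKed skbs, each counted as 2 packets (1500 bytes rounds up to 2 MSS-sized segments). The struct and helpers are illustrative only, not kernel code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical view of a SACKed skb: just its length and packet count. */
struct sacked_skb {
    uint32_t len;
    uint32_t pcount;
};

/* Coalesce as the message describes tcp_shifted_skb(): lengths add, and the
 * packet counts are preserved by summing them. */
struct sacked_skb coalesce(struct sacked_skb a, struct sacked_skb b)
{
    struct sacked_skb merged = { a.len + b.len, a.pcount + b.pcount };
    return merged;
}

/* The old fragmentation length was (packets - oldcnt) * mss. Returns true if
 * that prefix would be longer than the skb, i.e. the len > skb->len case. */
bool prefix_overruns(struct sacked_skb skb, uint32_t packets_minus_oldcnt,
                     uint32_t mss)
{
    return packets_minus_oldcnt * mss > skb.len;
}
```

Coalescing yields len 3000 with pcount 4, so marking all 4 packets lost computes a 4000-byte prefix, longer than the skb itself.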
     

02 Mar, 2012

1 commit


29 Feb, 2012

1 commit

  • When tcp_shifted_skb() shifts bytes from the skb that is currently
    pointed to by 'highest_sack' then the increment of
    TCP_SKB_CB(skb)->seq implicitly advances tcp_highest_sack_seq(). This
    implicit advancement, combined with the recent fix to pass the correct
    SACKed range into tcp_sacktag_one(), caused tcp_sacktag_one() to think
    that the newly SACKed range was before the tcp_highest_sack_seq(),
    leading to a call to tcp_update_reordering() with a degree of
    reordering matching the size of the newly SACKed range (typically just
    1 packet, which is a NOP, but potentially larger).

    This commit fixes this by simply calling tcp_sacktag_one() before the
    TCP_SKB_CB(skb)->seq advancement that can advance our notion of the
    highest SACKed sequence.

    Correspondingly, we can simplify the code a little now that
    tcp_shifted_skb() should update the lost_cnt_hint in all cases where
    skb == tp->lost_skb_hint.

    Signed-off-by: Neal Cardwell
    Signed-off-by: David S. Miller

    Neal Cardwell
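
A hypothetical model of the ordering problem. The highest-SACK marker is read from the current start sequence of the skb it points at, as the message says for 'highest_sack'. A newly SACKed range that starts before that marker is treated as reordering, which is an assumed simplification of the test in tcp_sacktag_one(). Tagging after advancing the seq reports false reordering; tagging first does not.

```c
#include <stdbool.h>
#include <stdint.h>

/* Wraparound-safe "seq1 is before seq2", in the style of the kernel's before(). */
static bool seq_before(uint32_t seq1, uint32_t seq2)
{
    return (int32_t)(seq1 - seq2) < 0;
}

/* Hypothetical model: the highest SACKed sequence is read from the current
 * start seq of the skb that 'highest_sack' points at. */
struct model {
    uint32_t highest_skb_seq;
};

/* Simplified reordering test: a newly SACKed range starting before the
 * highest SACKed sequence looks like a hole being filled, i.e. reordering. */
static bool looks_reordered(const struct model *m, uint32_t start_seq)
{
    return seq_before(start_seq, m->highest_skb_seq);
}

/* Shift len bytes off the front of the highest-SACK skb and tag them, either
 * before the seq advance (the fixed order) or after it (the old order).
 * Returns true if the tagging step reports reordering. */
bool shift_and_tag(struct model *m, uint32_t len, bool tag_before_advance)
{
    uint32_t start_seq = m->highest_skb_seq;
    bool reordered;

    if (tag_before_advance) {
        reordered = looks_reordered(m, start_seq);
        m->highest_skb_seq += len; /* skb->seq advances past the shifted bytes */
    } else {
        m->highest_skb_seq += len; /* implicitly advances the highest-SACK marker */
        reordered = looks_reordered(m, start_seq);
    }
    return reordered;
}
```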
     

28 Feb, 2012

1 commit


27 Feb, 2012

1 commit

  • 1) ICMP sockets leave err uninitialized but we try to return it for the
    unsupported MSG_OOB case, reported by Dave Jones.

    2) Add new Zaurus device ID entries, from Dave Jones.

    3) Pointer calculation in hso driver memset is wrong, from Dan
    Carpenter.

    4) ks8851_probe() tests an unsigned value for being negative; fix also
    from Dan Carpenter.

    5) Fix crashes in atl1c driver due to TX queue handling, from Eric
    Dumazet. I anticipate some TX side locking fixes coming in the near
    future for this driver as well.

    6) The inline directive fix in Bluetooth which was breaking the build
    only with very new versions of GCC, from Johan Hedberg.

    7) Fix crashes in the ATM CLIP code due to ARP cleanups this merge
    window, reported by Meelis Roos and fixed by Eric Dumazet.

    8) JME driver doesn't flush RX FIFO correctly, from Guo-Fu Tseng.

    9) Some ip6_route_output() callers test the return value for NULL, but
    this never happens as the convention is to return a dst entry with
    dst->error set. Fixes from RonQing Li.

    10) Logitech Harmony 900 should be handled by zaurus driver not
    cdc_ether, update white lists and black lists accordingly. From
    Scott Talbert.

    11) When receiving from certain kinds of devices there won't be a MAC
    header, so there is no MAC header to fix up in the IPSEC code, and if
    we try to do it we'll crash. Fix from Eric Dumazet.

    12) Port type array indexing off-by-one in mlx4 driver, fix from Yevgeny
    Petrilin.

    13) Fix regression in link-down handling in davinci_emac which causes
    all RX descriptors to be freed up and therefore RX to wedge
    completely, from Christian Riesch.

    14) It took two attempts, but ctnetlink soft lockups seem to be
    cured now, from Pablo Neira Ayuso.

    15) Endianness bug fix in ENIC driver, from Santosh Nayak.

    16) The long-ago conversion of the PPP fragmentation code over to
    abstracted SKB list handling wasn't perfect: once we get an
    out-of-sequence SKB we don't flush the rest of them like we
    should. From Ben McKeegan.

    17) Fix regression of ->ip_summed initialization in sfc driver.
    From Ben Hutchings.

    18) Bluetooth timeout mistakenly using msecs instead of jiffies,
    from Andrzej Kaczmarek.

    19) Using _sync variant of work cancellation results in deadlocks,
    use the non _sync variants instead. From Andre Guedes.

    20) Bluetooth rfcomm code had reference counting problems leading
    to crashes, fix from Octavian Purdila.

    21) The conversion of netem over to classful qdisc handling added
    two bugs to netem_dequeue(), fixes from Eric Dumazet.

    22) Missing pci_iounmap() in ATM Solos driver. Fix from Julia Lawall.

    23) b44_pci_exit() should not have __exit tag since it's invoked from
    non-__exit code. From Nikola Pajkovsky.

    24) The conversion of the neighbour hash tables over to RCU added a
    race, fixed here by adding the necessary reread of tbl->nht, fix
    from Michel Machado.

    25) When we added VF (virtual function) attributes for network device
    dumps, this potentially bloats up the size of the dump of one
    network device such that the dump size is too large for the buffer
    allocated by properly written netlink applications.

    In particular, if you add 255 VFs to a network device, parts of
    GLIBC stop working.

    To fix this, we add an attribute that is used to turn on these
    extended portions of the network device dump. Sophisticated
    applications like 'ip' that want to see this stuff will be changed
    to set the attribute, whereas things like GLIBC that don't care
    about VFs simply will not, and therefore won't be busted by the
    mere presence of VFs on a network device.

    Thanks to the tireless work of Greg Rose on this fix.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (53 commits)
    sfc: Fix assignment of ip_summed for pre-allocated skbs
    ppp: fix 'ppp_mp_reconstruct bad seq' errors
    enic: Fix endianness bug.
    gre: fix spelling in comments
    netfilter: ctnetlink: fix soft lockup when netlink adds new entries (v2)
    Revert "netfilter: ctnetlink: fix soft lockup when netlink adds new entries"
    davinci_emac: Do not free all rx dma descriptors during init
    mlx4_core: Fixing array indexes when setting port types
    phy: IC+101G and PHY_HAS_INTERRUPT flag
    netdev/phy/icplus: Correct broken phy_init code
    ipsec: be careful of non existing mac headers
    Move Logitech Harmony 900 from cdc_ether to zaurus
    hso: memsetting wrong data in hso_get_count()
    netfilter: ip6_route_output() never returns NULL.
    ethernet/broadcom: ip6_route_output() never returns NULL.
    ipv6: ip6_route_output() never returns NULL.
    jme: Fix FIFO flush issue
    atm: clip: remove clip_tbl
    ipv4: ping: Fix recvmsg MSG_OOB error handling.
    rtnetlink: Fix problem with buffer allocation
    ...

    Linus Torvalds
     

25 Feb, 2012

3 commits


24 Feb, 2012

3 commits

  • Marcell Zambo and Janos Farago noticed and reported that when
    new conntrack entries are added via netlink and the conntrack table
    gets full, a soft lockup happens. This is because the nf_conntrack_lock
    is held while nf_conntrack_alloc is called, which in turn wants
    to lock nf_conntrack_lock while evicting entries from the full table.

    The patch fixes the soft lockup by limiting the holding of the
    nf_conntrack_lock to the minimum, where it's absolutely required.
    This required extending (and thus changing) nf_conntrack_hash_insert
    so that it makes sure conntrack and ctnetlink do not add the same entry
    twice to the conntrack table.

    Signed-off-by: Jozsef Kadlecsik
    Signed-off-by: Pablo Neira Ayuso

    Jozsef Kadlecsik
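
The general pattern, allocating outside the lock and holding it only for a duplicate-checked insert, can be sketched with POSIX threads. Everything here (struct table, alloc_entry(), insert_locked(), add_entry()) is a hypothetical miniature, not the nf_conntrack code. alloc_entry() takes the table lock itself to evict an entry when the table is full. This mirrors why calling nf_conntrack_alloc under nf_conntrack_lock locked up.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical miniature table; not the nf_conntrack data structures. */
struct entry { int key; struct entry *next; };

struct table {
    pthread_mutex_t lock;
    struct entry *head;
    size_t count, max;
};

/* Allocation path that needs the table lock itself (to evict when full).
 * Calling it while already holding t->lock would self-deadlock. */
static struct entry *alloc_entry(struct table *t, int key)
{
    pthread_mutex_lock(&t->lock);
    if (t->count >= t->max && t->head) {
        struct entry *victim = t->head; /* evict the newest entry, for simplicity */
        t->head = victim->next;
        t->count--;
        free(victim);
    }
    pthread_mutex_unlock(&t->lock);

    struct entry *e = malloc(sizeof(*e));
    if (e) {
        e->key = key;
        e->next = NULL;
    }
    return e;
}

/* Insert with a duplicate check; the caller must hold t->lock. */
static bool insert_locked(struct table *t, struct entry *e)
{
    for (struct entry *p = t->head; p; p = p->next)
        if (p->key == e->key)
            return false;
    e->next = t->head;
    t->head = e;
    t->count++;
    return true;
}

/* Hold the lock only around the insertion, not around the allocation. */
bool add_entry(struct table *t, int key)
{
    struct entry *e = alloc_entry(t, key);
    if (!e)
        return false;
    pthread_mutex_lock(&t->lock);
    bool ok = insert_locked(t, e);
    pthread_mutex_unlock(&t->lock);
    if (!ok)
        free(e);
    return ok;
}
```

Because the lock is dropped between allocation and insertion, another thread may insert the same key in the meantime. The duplicate check inside insert_locked() keeps the table consistent, which corresponds to the message's point about extending nf_conntrack_hash_insert.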
     
  • This reverts commit af14cca162ddcdea017b648c21b9b091e4bf1fa4.

    This patch contains a race condition between packets and ctnetlink
    in the conntrack addition. A new patch to fix this issue follows up.

    Signed-off-by: Pablo Neira Ayuso

    Pablo Neira Ayuso
     
  • Niccolo Belli reported ipsec crashes when we handle a frame without
    a mac header (ATM in his case).

    Before copying mac header, better make sure it is present.

    Bugzilla reference: https://bugzilla.kernel.org/show_bug.cgi?id=42809

    Reported-by: Niccolò Belli
    Tested-by: Niccolò Belli
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
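
A sketch of the check described above. It uses a hypothetical frame layout in which a sentinel offset marks a missing link-layer header; the kernel's own helpers and sk_buff layout differ.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MAC_HEADER_UNSET 0xFFFFu /* hypothetical "no link-layer header" marker */

/* Hypothetical frame: a buffer plus the offset and length of its MAC header. */
struct frame {
    uint8_t buf[128];
    uint16_t mac_off;
    uint16_t mac_len;
};

/* Copy the MAC header into dst only if the frame actually has one. */
bool copy_mac_header(const struct frame *f, uint8_t *dst, size_t dst_len)
{
    if (f->mac_off == MAC_HEADER_UNSET || f->mac_len == 0)
        return false; /* e.g. a frame from an ATM device: nothing to copy */
    if (f->mac_len > dst_len || (size_t)f->mac_off + f->mac_len > sizeof(f->buf))
        return false; /* header would not fit or lies outside the buffer */
    memcpy(dst, f->buf + f->mac_off, f->mac_len);
    return true;
}
```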
     

23 Feb, 2012

3 commits


22 Feb, 2012

6 commits

  • Commit 32092ecf0644 (atm: clip: Use device neigh support on top of
    "arp_tbl".) introduced a bug because clip_tbl is zeroed: a crash occurs
    in __neigh_for_each_release().

    idle_timer_check() must use arp_tbl instead, and neigh_check_cb() should
    ignore non-CLIP neighbours.

    Idea from David Miller.

    Reported-by: Meelis Roos
    Signed-off-by: Eric Dumazet
    Tested-by: Meelis Roos
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • Don't return an uninitialized variable as the error, return
    -EOPNOTSUPP instead.

    Reported-by: Dave Jones
    Signed-off-by: David S. Miller

    David S. Miller
     
  • Implement a new netlink attribute type IFLA_EXT_MASK. The mask
    is a 32-bit value that can be used to indicate to the kernel that
    certain extended ifinfo values are requested by the user application.
    At this time the only mask value defined is RTEXT_FILTER_VF to
    indicate that the user wants the ifinfo dump to send information
    about the VFs belonging to the interface.

    This patch fixes a bug in which certain applications do not have
    large enough buffers to accommodate the extra information returned
    by the kernel with large numbers of SR-IOV virtual functions.
    Those applications will not send the new netlink attribute with
    the interface info dump request netlink messages so they will
    not get unexpectedly large request buffers returned by the kernel.

    Modifies the rtnl_calcit function to traverse the list of net
    devices and compute the minimum buffer size that can hold the
    info dumps of all matching devices based upon the filter passed
    in via the new netlink attribute filter mask. If no filter
    mask is sent then the buffer allocation defaults to NLMSG_GOODSIZE.

    With this change it is possible to add yet to be defined netlink
    attributes to the dump request which should make it fairly extensible
    in the future.

    Signed-off-by: Greg Rose
    Signed-off-by: David S. Miller

    Greg Rose
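
A minimal user-space sketch of how an application might opt in to the VF details by adding IFLA_EXT_MASK with RTEXT_FILTER_VF to an RTM_GETLINK dump request. It assumes Linux UAPI headers new enough to define both constants. struct link_dump_req and build_link_dump_req() are hypothetical names, and opening a NETLINK_ROUTE socket and sending the message are left out.

```c
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>
#include <linux/if_link.h>

/* Hypothetical request layout: netlink header, ifinfomsg, one u32 attribute. */
struct link_dump_req {
    struct nlmsghdr nlh;
    struct ifinfomsg ifm;
    char attrbuf[RTA_SPACE(sizeof(uint32_t))];
};

/* Build an RTM_GETLINK dump request that asks for VF details via IFLA_EXT_MASK. */
void build_link_dump_req(struct link_dump_req *req, uint32_t seq)
{
    uint32_t mask = RTEXT_FILTER_VF;
    struct rtattr *rta;

    memset(req, 0, sizeof(*req));
    req->nlh.nlmsg_len = NLMSG_LENGTH(sizeof(struct ifinfomsg));
    req->nlh.nlmsg_type = RTM_GETLINK;
    req->nlh.nlmsg_flags = NLM_F_REQUEST | NLM_F_DUMP;
    req->nlh.nlmsg_seq = seq;
    req->ifm.ifi_family = AF_UNSPEC;

    /* Append the filter mask attribute right after the ifinfomsg payload. */
    rta = (struct rtattr *)req->attrbuf;
    rta->rta_type = IFLA_EXT_MASK;
    rta->rta_len = RTA_LENGTH(sizeof(mask));
    memcpy(RTA_DATA(rta), &mask, sizeof(mask));
    req->nlh.nlmsg_len = NLMSG_ALIGN(req->nlh.nlmsg_len) + RTA_ALIGN(rta->rta_len);
}
```

An application that omits the attribute keeps receiving link dumps without the VF information.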
     
  • The race condition fixed here happens as follows:

    1. While function neigh_periodic_work scans the neighbor hash table
    pointed to by field tbl->nht, it unlocks and locks tbl->lock between
    buckets in order to call cond_resched.

    2. Assume that function neigh_periodic_work calls cond_resched, that is,
    the lock tbl->lock is available, and function neigh_hash_grow runs.

    3. Once function neigh_hash_grow finishes, and RCU calls
    neigh_hash_free_rcu, the original struct neigh_hash_table that function
    neigh_periodic_work was using doesn't exist anymore.

    4. Once back at neigh_periodic_work, whenever the old struct
    neigh_hash_table is accessed, things can go badly.

    Signed-off-by: Michel Machado
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Michel Machado
     
  • Nothing needs to be done for monitor/AP_VLAN mode when calling
    ieee80211_bss_info_change_notify -> drv_bss_info_changed with the change
    flag 'BSS_CHANGED_IDLE'. 'wl1271' seems to use BSS_CHANGED_IDLE only for
    STA and IBSS mode. Furthermore, the non-idle state of monitor mode is
    taken care of by the 'count' variable, which counts non-idle interfaces,
    so ieee80211_idle_off(local, "in use") will be called.
    This fixes the following WARNING, seen when we initially have STA mode
    (network manager running) unassociated, then change it to monitor
    mode with network manager disabled and bring the monitor interface up.
    This changes the idle state from 'true' (STA unassociated) to 'false'
    (MONITOR mode).
    The problem was exposed by commit 405385f8ce7a2ed8f82e216d88b5282142e1288b
    ("mac80211: set bss_conf.idle when vif is connected").

    WARNING: net/mac80211/main.c:212 ieee80211_bss_info_change_notify+0x1cf/0x330 [mac80211]()
    Hardware name: 64756D6
    Pid: 3835, comm: ifconfig Tainted: G O 3.3.0-rc3-wl #9
    Call Trace:
    [] warn_slowpath_common+0x72/0xa0
    [] ? ieee80211_bss_info_change_notify+0x1cf/0x330 [mac80211]
    [] ? ieee80211_bss_info_change_notify+0x1cf/0x330 [mac80211]
    [] warn_slowpath_null+0x22/0x30
    [] ieee80211_bss_info_change_notify+0x1cf/0x330 [mac80211]
    [] __ieee80211_recalc_idle+0x113/0x430 [mac80211]
    [] ieee80211_do_open+0x156/0x7e0 [mac80211]
    [] ? ieee80211_check_concurrent_iface+0x25/0x180 [mac80211]
    [] ? raw_notifier_call_chain+0x1f/0x30
    [] ieee80211_open+0x40/0x80 [mac80211]
    [] __dev_open+0x96/0xe0
    [] ? _raw_spin_unlock_bh+0x35/0x40
    [] __dev_change_flags+0x109/0x170
    [] dev_change_flags+0x23/0x60
    [] devinet_ioctl+0x6a0/0x770

    ieee80211 phy0: device no longer idle - in use

    Cc: Eliad Peller
    Signed-off-by: Mohammed Shafi Shajakhan
    Signed-off-by: John W. Linville

    Mohammed Shafi Shajakhan
     
  • Rate control algorithms conclude that a rate is invalid when
    rate[i].idx < 0, but they also check whether rate[i].count is
    non-zero, so it is safer to zero-initialize the 'count' field.
    Recently we had an ath9k rate control crash where ath_tx_status
    checked only that rate[i].count was non-zero in one instance, and
    ended up using an invalid rate index for 'connection monitoring
    NULL func frames', which eventually led to the crash.
    Thanks to Pavel Roskin for fixing it and finding the root cause.
    https://bugzilla.redhat.com/show_bug.cgi?id=768639

    Cc: stable@vger.kernel.org
    Cc: Pavel Roskin
    Signed-off-by: Mohammed Shafi Shajakhan
    Signed-off-by: John W. Linville

    Mohammed Shafi Shajakhan
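
A simplified illustration of why zeroing 'count' matters. struct tx_rate is loosely modelled on mac80211's struct ieee80211_tx_rate, but MAX_RATES and the helper functions are hypothetical. A consumer that looks only at count, as the message describes ath_tx_status doing in one place, trusts an unused slot whose count is stale. Once unused slots have count == 0, both kinds of check agree.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_RATES 4 /* hypothetical table size for this sketch */

/* Hypothetical simplified rate slot, loosely modelled on ieee80211_tx_rate. */
struct tx_rate {
    int8_t idx;    /* rate index; -1 marks an unused slot */
    uint8_t count; /* number of tries at this rate */
};

/* Mark every slot unused, zeroing count as well as setting idx = -1. */
void reset_rates(struct tx_rate *rates)
{
    for (int i = 0; i < MAX_RATES; i++) {
        rates[i].idx = -1;
        rates[i].count = 0;
    }
}

/* A careful consumer checks both fields. */
bool rate_usable(const struct tx_rate *r)
{
    return r->idx >= 0 && r->count > 0;
}

/* A consumer that looks only at count is safe only if unused slots have count == 0. */
bool count_only_check(const struct tx_rate *r)
{
    return r->count > 0;
}
```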
     

21 Feb, 2012

2 commits

  • Marcell Zambo and Janos Farago noticed and reported that when
    new conntrack entries are added via netlink and the conntrack table
    gets full, a soft lockup happens. This is because the nf_conntrack_lock
    is held while nf_conntrack_alloc is called, which in turn wants
    to lock nf_conntrack_lock while evicting entries from the full table.

    The patch fixes the soft lockup by limiting the holding of the
    nf_conntrack_lock to the minimum, where it's absolutely required.

    Signed-off-by: Jozsef Kadlecsik
    Signed-off-by: Pablo Neira Ayuso

    Jozsef Kadlecsik
     
  • Assorted fixes, sat in -next for a week or so...

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    ocfs2: deal with wraparounds of i_nlink in ocfs2_rename()
    vfs: fix compat_sys_stat() handling of overflows in st_nlink
    quota: Fix deadlock with suspend and quotas
    vfs: Provide function to get superblock and wait for it to thaw
    vfs: fix panic in __d_lookup() with high dentry hashtable counts
    autofs4 - fix lockdep splat in autofs
    vfs: fix d_inode_lookup() dentry ref leak

    Linus Torvalds