24 Dec, 2011

26 commits


23 Dec, 2011

14 commits

  • "! --connbytes 23:42" should match if the packet/byte count is not in range.

    As there is no explicit "invert match" toggle in the match structure,
    userspace swaps the from and to arguments
    (i.e., as if "--connbytes 42:23" were given).

    However, "what <= 23 && what >= 42" will always be false.

    Change things so we use "||" in case "from" is larger than "to".

    This change may look like it breaks backwards compatibility when "to" is 0.
    However, older iptables binaries will refuse "connbytes 42:0",
    and current releases treat it as "! --connbytes 0:42",
    so we should be fine.

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
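    A minimal C sketch of the match logic described above. It assumes a simplified, hypothetical `struct count_range` in place of the real match structure, and it is not the kernel's code. It treats `from > to` as the swapped, inverted form that userspace sends for "! --connbytes from:to".

    ```c
    #include <assert.h>
    #include <stdbool.h>
    #include <stdint.h>

    /* Hypothetical stand-in for the match's from/to pair; not the kernel struct. */
    struct count_range {
        uint64_t from;
        uint64_t to;
    };

    /* from <= to: match values inside [from, to].
     * from >  to: userspace swapped the bounds to express an inverted match,
     *             so match values outside [to, from]. */
    static bool connbytes_match(const struct count_range *r, uint64_t what)
    {
        if (r->to >= r->from)
            return what >= r->from && what <= r->to;
        /* "! --connbytes 23:42" arrives as from = 42, to = 23. */
        return what < r->to || what > r->from;
    }
    ```

    With `from = 42, to = 23`, counts such as 10 or 43 match and 23 through 42 do not. With an `&&` test on the swapped bounds, no value could ever match.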
     
  • This closes races where btrfs is calling d_instantiate too soon during
    inode creation. All of the callers of btrfs_add_nondir are updated to
    instantiate after the inode is fully set up in memory.

    Signed-off-by: Al Viro
    Signed-off-by: Chris Mason

    Al Viro
     
  • Dan Carpenter noticed that we were doing a double unlock on the worker
    lock, and sometimes picking a worker thread without the lock held.

    This fixes both errors.

    Signed-off-by: Chris Mason
    Reported-by: Dan Carpenter

    Chris Mason
     
  • skb->truesize might be big even for a small packet.

    It's even bigger after commit 87fb4b7b533 (net: more accurate skb
    truesize) and with a big MTU.

    We should allow queueing at least one packet per receiver, even with a
    low RCVBUF setting.

    Reported-by: Michal Simek
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
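    One way to guarantee that a receiver can always queue at least one packet is to compare only the memory already queued against the limit, rather than the queued memory plus the new packet's truesize. The C sketch below contrasts the two checks. `struct rx_sock` and the helper functions are hypothetical, and the exact form of the kernel change is an assumption here.

    ```c
    #include <assert.h>
    #include <stdbool.h>
    #include <stddef.h>

    /* Hypothetical stand-in for a receiving socket's memory accounting. */
    struct rx_sock {
        size_t rmem_alloc; /* bytes already charged to the receive queue */
        size_t rcvbuf;     /* configured receive-buffer limit */
    };

    /* Strict check: reject if this packet would push the total over the limit.
     * A packet whose truesize alone exceeds rcvbuf can never be queued. */
    static bool admit_strict(const struct rx_sock *sk, size_t truesize)
    {
        return sk->rmem_alloc + truesize < sk->rcvbuf;
    }

    /* Relaxed check: reject only once the queue is already at or over the
     * limit, so an empty queue always accepts one packet. */
    static bool admit_relaxed(const struct rx_sock *sk, size_t truesize)
    {
        (void)truesize; /* the packet's size no longer gates admission */
        return sk->rmem_alloc < sk->rcvbuf;
    }

    /* Charge an admitted packet to the socket. */
    static void charge(struct rx_sock *sk, size_t truesize)
    {
        sk->rmem_alloc += truesize;
    }
    ```

    The relaxed form can overshoot `rcvbuf` by at most one packet. That is the trade-off for never starving a receiver configured with a low RCVBUF.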
     
  • Setting a large rps_flow_cnt like (1 << 30) on 32-bit platform will
    cause a kernel oops due to insufficient bounds checking.

    The existing check, if (count > 1<<30), still lets the table size
    reach (1 << 30) * 8 bytes, which overflows 32 bits.

    This patch replaces the magic number (1 << 30) with a symbolic bound.

    Suggested-by: Eric Dumazet
    Signed-off-by: Xi Wang
    Signed-off-by: David S. Miller

    Xi Wang
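    To make the overflow concrete, here is a C sketch that computes the table size in 32-bit unsigned arithmetic and derives a symbolic bound from `UINT32_MAX`. The 8-byte entry size comes from the "* 8" above. `FLOW_ENTRY_SIZE`, `MAX_FLOW_CNT`, and both functions are hypothetical, and the sketch ignores any table header or rounding the real code may apply.

    ```c
    #include <assert.h>
    #include <stdbool.h>
    #include <stdint.h>

    /* Assumed per-entry size, taken from the "* 8" in the commit text. */
    #define FLOW_ENTRY_SIZE 8u

    /* Hypothetical symbolic bound: the largest count whose table size
     * still fits in 32 bits (no header, no rounding). */
    #define MAX_FLOW_CNT (UINT32_MAX / FLOW_ENTRY_SIZE)

    /* Table size computed in 32-bit arithmetic, as on a 32-bit platform.
     * Unsigned overflow wraps modulo 2^32 instead of trapping. */
    static uint32_t table_size32(uint32_t count)
    {
        return count * FLOW_ENTRY_SIZE;
    }

    /* Reject counts whose table size cannot be represented in 32 bits. */
    static bool flow_cnt_ok(uint32_t count)
    {
        return count <= MAX_FLOW_CNT;
    }
    ```

    Here `table_size32(1u << 30)` wraps to 0, so a later allocation would be far too small for the count that was accepted.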
     
  • Chris Boot reported crashes occurring in ipv6_select_ident().

    [ 461.457562] RIP: 0010:[] [] ipv6_select_ident+0x31/0xa7

    [ 461.578229] Call Trace:
    [ 461.580742]
    [ 461.582870] [] ? udp6_ufo_fragment+0x124/0x1a2
    [ 461.589054] [] ? ipv6_gso_segment+0xc0/0x155
    [ 461.595140] [] ? skb_gso_segment+0x208/0x28b
    [ 461.601198] [] ? ipv6_confirm+0x146/0x15e [nf_conntrack_ipv6]
    [ 461.608786] [] ? nf_iterate+0x41/0x77
    [ 461.614227] [] ? dev_hard_start_xmit+0x357/0x543
    [ 461.620659] [] ? nf_hook_slow+0x73/0x111
    [ 461.626440] [] ? br_parse_ip_options+0x19a/0x19a [bridge]
    [ 461.633581] [] ? dev_queue_xmit+0x3af/0x459
    [ 461.639577] [] ? br_dev_queue_push_xmit+0x72/0x76 [bridge]
    [ 461.646887] [] ? br_nf_post_routing+0x17d/0x18f [bridge]
    [ 461.653997] [] ? nf_iterate+0x41/0x77
    [ 461.659473] [] ? br_flood+0xfa/0xfa [bridge]
    [ 461.665485] [] ? nf_hook_slow+0x73/0x111
    [ 461.671234] [] ? br_flood+0xfa/0xfa [bridge]
    [ 461.677299] [] ? nf_bridge_update_protocol+0x20/0x20 [bridge]
    [ 461.684891] [] ? nf_ct_zone+0xa/0x17 [nf_conntrack]
    [ 461.691520] [] ? br_flood+0xfa/0xfa [bridge]
    [ 461.697572] [] ? NF_HOOK.constprop.8+0x3c/0x56 [bridge]
    [ 461.704616] [] ? nf_bridge_push_encap_header+0x1c/0x26 [bridge]
    [ 461.712329] [] ? br_nf_forward_finish+0x8a/0x95 [bridge]
    [ 461.719490] [] ? nf_bridge_pull_encap_header+0x1c/0x27 [bridge]
    [ 461.727223] [] ? br_nf_forward_ip+0x1c0/0x1d4 [bridge]
    [ 461.734292] [] ? nf_iterate+0x41/0x77
    [ 461.739758] [] ? __br_deliver+0xa0/0xa0 [bridge]
    [ 461.746203] [] ? nf_hook_slow+0x73/0x111
    [ 461.751950] [] ? __br_deliver+0xa0/0xa0 [bridge]
    [ 461.758378] [] ? NF_HOOK.constprop.4+0x56/0x56 [bridge]

    This is caused by bridge netfilter special dst_entry (fake_rtable), a
    special shared entry, where attaching an inetpeer makes no sense.

    The problem has been present since commit 87c48fa3b46 (ipv6: make
    fragment identifications less predictable).

    Introduce a DST_NOPEER dst flag and make sure ipv6_select_ident() and
    __ip_select_ident() fall back to the 'no peer attached' handling.

    Reported-by: Chris Boot
    Tested-by: Chris Boot
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
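    The fallback can be sketched in C as follows. The flag value, `struct dst`, `struct peer`, and the function names are hypothetical simplifications, not the kernel's definitions. The sketch shows only that a route carrying the no-peer flag never touches a peer. Such a route draws instead from a shared counter, updated with a compare-and-swap loop. Skipping 0 on wraparound is an assumption of this sketch.

    ```c
    #include <assert.h>
    #include <stdatomic.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical flag bit; not the real DST_NOPEER value. */
    #define SKETCH_DST_NOPEER 0x1u

    struct peer {
        uint32_t next_id; /* per-destination fragment ID counter */
    };

    struct dst {
        unsigned int flags;
        struct peer *peer; /* may be NULL if no peer is attached */
    };

    /* Shared counter for entries without a usable peer. */
    static _Atomic uint32_t global_frag_id;

    static uint32_t next_global_id(void)
    {
        uint32_t old = atomic_load(&global_frag_id);
        uint32_t next;
        do {
            next = old + 1;
            if (next == 0)
                next = 1; /* skip 0 on wraparound */
            /* On failure, 'old' is reloaded with the current value. */
        } while (!atomic_compare_exchange_weak(&global_frag_id, &old, next));
        return next;
    }

    static uint32_t select_ident(struct dst *d)
    {
        /* Shared special entries (such as the bridge's fake route) are
         * marked no-peer, so they always take the fallback path. */
        if (d != NULL && !(d->flags & SKETCH_DST_NOPEER) && d->peer != NULL)
            return d->peer->next_id++;
        return next_global_id();
    }
    ```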
     
    Userspace may not provide TCA_OPTIONS; in fact, tc currently does
    not do so if no arguments are specified on the command line.
    Return EINVAL instead of panicking.

    Signed-off-by: Thomas Graf
    Signed-off-by: David S. Miller

    Thomas Graf
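    A minimal C sketch of the guard. It uses a hypothetical `struct qdisc_opts` and `qdisc_change()` in place of the scheduler's real option parsing. A missing options attribute is modeled as a NULL pointer and rejected with `-EINVAL` before anything is dereferenced.

    ```c
    #include <assert.h>
    #include <errno.h>
    #include <stddef.h>

    /* Hypothetical parsed options standing in for the qdisc configuration. */
    struct qdisc_opts {
        unsigned int limit;
    };

    /* Hypothetical change handler. 'opt' is NULL when userspace (for example,
     * tc run without arguments) did not send any options. */
    static int qdisc_change(struct qdisc_opts *cur, const struct qdisc_opts *opt)
    {
        if (opt == NULL)
            return -EINVAL; /* report the error instead of dereferencing NULL */
        cur->limit = opt->limit;
        return 0;
    }
    ```

    The existing configuration is left untouched when the options are missing.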
     
  • Commit 618f9bc74a039da76 (net: Move mtu handling down to the protocol
    depended handlers) forgot the bridge netfilter case, adding a NULL
    dereference in ip_fragment().

    Reported-by: Chris Boot
    CC: Steffen Klassert
    Signed-off-by: Eric Dumazet
    Acked-by: Steffen Klassert
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • * 'for-linus' of git://neil.brown.name/md:
    md/bitmap: It is OK to clear bits during recovery.
    md: don't give up looking for spares on first failure-to-add
    md/raid5: ensure correct assessment of drives during degraded reshape.
    md/linear: fix hot-add of devices to linear arrays.

    Linus Torvalds
     
  • commit d0a4bb492772ce5c4bdfba3744a99ed6f6fb238f introduced a
    regression which is annoying but fairly harmless.

    When writing to an array that is undergoing recovery (a spare
    is being integrated into the array), the write will set bits
    in the bitmap, but they will not be cleared when the write
    completes.

    For bits covering areas that have not been recovered yet this is not a
    problem as the recovery will clear the bits. However, bits set in an
    already-recovered region will stay set and never be cleared.
    This doesn't risk data integrity. The only negatives are:
    - next time there is a crash, more resyncing than necessary will
    be done.
    - the bitmap doesn't look clean, which is confusing.

    While an array is recovering we don't want to update the
    'events_cleared' setting in the bitmap but we do still want to clear
    bits that have very recently been set - providing they were written to
    the recovering device.

    So split those two needs, which previously both depended on 'success',
    and always clear the bit if the write went to all devices.

    Signed-off-by: NeilBrown

    NeilBrown
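    The split can be sketched in C with a simplified, hypothetical bitmap (`struct wbitmap`, `end_write()`). The field names echo the text, but the conditions are an illustration, not md's code. Clearing a bit depends only on whether the write reached every device. Advancing `events_cleared` additionally requires that the array is not recovering.

    ```c
    #include <assert.h>
    #include <stdbool.h>
    #include <stdint.h>

    /* Hypothetical, simplified bitmap state; not md's struct bitmap. */
    struct wbitmap {
        uint64_t bits;           /* one dirty bit per region */
        uint64_t events_cleared; /* last event count known to be clean */
    };

    /* Called when a write to 'region' completes.
     * all_devices: the write reached every device, including a recovering one.
     * recovering:  the array is still integrating a spare.
     * events:      the array's current event count. */
    static void end_write(struct wbitmap *bm, unsigned int region,
                          bool all_devices, bool recovering, uint64_t events)
    {
        assert(region < 64);

        /* Need 1: only advance events_cleared when not recovering. */
        if (all_devices && !recovering && bm->events_cleared < events)
            bm->events_cleared = events;

        /* Need 2: clear the bit whenever the write went everywhere,
         * even while recovery is in progress. */
        if (all_devices)
            bm->bits &= ~(UINT64_C(1) << region);
    }
    ```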
     
  • Before performing a recovery we try to remove any spares that
    might not be working, then add any that might have become relevant.

    Currently we abort on the first spare that cannot be added.
    This is a false optimisation.
    It is conceivable that - depending on rules in the personality - a
    subsequent spare might be accepted.
    Also the loop does other things like count the available spares and
    reset the 'recovery_offset' value.

    If we abort early these might not happen properly.

    So remove the early abort.

    In particular, if you have an array that is undergoing recovery and
    which has extra spares, then the recovery may not restart after a
    reboot, as the count of 'spares' might end up as zero.

    Reported-by: Anssi Hannula
    Signed-off-by: NeilBrown

    NeilBrown
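    The loop change can be pictured as replacing an early `break` with `continue`, so the per-spare bookkeeping still runs for later spares. Below is a C sketch with a hypothetical `struct spare` and `add_spares()`. The acceptance rule stands in for the personality's real checks.

    ```c
    #include <assert.h>
    #include <stdbool.h>
    #include <stddef.h>

    /* Hypothetical spare descriptor for illustration. */
    struct spare {
        bool acceptable;               /* would the personality accept it? */
        bool added;
        unsigned long recovery_offset;
    };

    /* Try every spare rather than stopping at the first failure.
     * Returns how many spares were added. */
    static size_t add_spares(struct spare *s, size_t n)
    {
        size_t count = 0;
        for (size_t i = 0; i < n; i++) {
            if (!s[i].acceptable)
                continue; /* an early abort here would skip later spares */
            s[i].recovery_offset = 0; /* reset bookkeeping for this spare */
            s[i].added = true;
            count++;
        }
        return count;
    }
    ```

    If the first spare is rejected, the later ones are still added and counted, so the spare count does not end up as zero.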
     
  • While reshaping a degraded array (as when reshaping a RAID0 by first
    converting it to a degraded RAID4) we currently get confused about
    which devices are in_sync. In most cases we get it right, but in the
    region that is being reshaped we need to treat non-failed devices as
    in-sync when we have the data but haven't actually written it out yet.

    Reported-by: Adam Kwolek
    Signed-off-by: NeilBrown

    NeilBrown
     
  • commit d70ed2e4fafdbef0800e73942482bb075c21578b
    broke hot-add to a linear array.
    After that commit, metadata is not written to devices until they
    have been fully integrated into the array, as determined by
    saved_raid_disk. That patch arranged to clear that field after
    a recovery completed.

    However for linear arrays, there is no recovery - the integration is
    instantaneous. So we need to explicitly clear the saved_raid_disk
    field.

    Signed-off-by: NeilBrown

    NeilBrown
     
    This had silently worked for many years but stopped working on
    Niagara-T3 machines.

    We need to set the MSIQ to VALID before we can set its state to IDLE.

    On Niagara-T3, setting the state to IDLE first was causing HV_EINVAL
    errors. The hypervisor documentation says, rather ambiguously, that
    the MSIQ must be "initialized" before one can set the state.

    I previously understood this to mean merely that a successful setconf()
    operation has been performed on the MSIQ, which we have done at this
    point. But it seems to also mean that it has been set VALID too.

    Signed-off-by: David S. Miller

    David S. Miller
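    The required ordering can be illustrated with a mock in C. `struct msiq`, the `mock_*` functions, and the status codes are hypothetical stand-ins, not the sun4v hypervisor API. The mock refuses to set the state until the queue is both configured and VALID, which is the Niagara-T3 behaviour the commit describes.

    ```c
    #include <assert.h>
    #include <stdbool.h>

    /* Hypothetical status codes; not the real hypervisor error values. */
    enum hv_status { SKETCH_HV_OK = 0, SKETCH_HV_EINVAL = 1 };

    /* Hypothetical model of one MSI queue's setup state. */
    struct msiq {
        bool configured; /* a setconf-style call has succeeded */
        bool valid;
        bool idle;
    };

    static enum hv_status mock_set_valid(struct msiq *q)
    {
        if (!q->configured)
            return SKETCH_HV_EINVAL;
        q->valid = true;
        return SKETCH_HV_OK;
    }

    /* Stricter behaviour: the state can only be set on a configured AND
     * valid queue. */
    static enum hv_status mock_set_idle(struct msiq *q)
    {
        if (!q->configured || !q->valid)
            return SKETCH_HV_EINVAL;
        q->idle = true;
        return SKETCH_HV_OK;
    }

    /* Bring a configured queue up in the required order: VALID, then IDLE. */
    static enum hv_status msiq_enable(struct msiq *q)
    {
        enum hv_status err = mock_set_valid(q);
        if (err != SKETCH_HV_OK)
            return err;
        return mock_set_idle(q);
    }
    ```

    Calling `mock_set_idle()` first on a merely configured queue fails, mirroring the HV_EINVAL errors seen with the old order.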