17 Jan, 2012

2 commits

  • When we are initializing using arch_get_random_long() we only need to
    loop enough times to touch all the bytes in the buffer; using
    poolwords for that does twice the number of operations necessary on a
    64-bit machine, since in the random number generator code "word" means
    32 bits.

    Signed-off-by: H. Peter Anvin
    Cc: "Theodore Ts'o"
    Link: http://lkml.kernel.org/r/1324589281-31931-1-git-send-email-tytso@mit.edu

    H. Peter Anvin
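
    As a rough userspace illustration of the loop-count point above (not the
    kernel's code): the sketch below steps through a pool of 32-bit words one
    unsigned long at a time. demo_get_random_long() is a hypothetical stand-in
    for arch_get_random_long(), and the XOR is only a placeholder for the real
    mixing step.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for arch_get_random_long(): returns 1 on success. */
static int demo_get_random_long(unsigned long *v)
{
    static unsigned long state = 1;

    state = state * 1103515245UL + 12345UL; /* not random, demo only */
    *v = state;
    return 1;
}

/*
 * XOR one unsigned long into the pool per iteration until every byte has
 * been touched. Returns the number of calls made to the generator.
 */
size_t demo_seed_pool(uint32_t *pool, size_t poolwords)
{
    unsigned char *bytes = (unsigned char *)pool;
    size_t poolbytes = poolwords * sizeof(uint32_t);
    size_t calls = 0;

    for (size_t off = 0; off < poolbytes; off += sizeof(unsigned long)) {
        unsigned long v;
        size_t n = poolbytes - off;

        if (n > sizeof(v))
            n = sizeof(v);
        if (!demo_get_random_long(&v))
            break;
        for (size_t i = 0; i < n; i++)
            bytes[off + i] ^= ((unsigned char *)&v)[i];
        calls++;
    }
    return calls;
}
```

    With a 128-word pool this makes 64 calls where unsigned long is 8 bytes,
    rather than the 128 that looping over poolwords would make.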
     
  • If there is an architecture-specific random number generator (such as
    RDRAND for Intel architectures), use it to initialize /dev/random's
    entropy stores. Even in the worst case, if RDRAND is something like
    AES(NSA_KEY, counter++), it won't hurt, and it will definitely help
    against any other adversaries.

    Signed-off-by: "Theodore Ts'o"
    Link: http://lkml.kernel.org/r/1324589281-31931-1-git-send-email-tytso@mit.edu
    Signed-off-by: H. Peter Anvin

    Theodore Ts'o
     

30 Dec, 2011

1 commit

  • We still don't use rdrand in /dev/random, which just seems stupid. We
    accept the *cycle*counter* as a random input, but we don't accept
    rdrand? That's just broken.

    Sure, people can do things in user space (write to /dev/random, use
    rdrand in addition to /dev/random themselves etc etc), but that
    *still* seems to be a particularly stupid reason for saying "we
    shouldn't bother to try to do better in /dev/random".

    And even if somebody really doesn't trust rdrand as a source of random
    bytes, it seems singularly stupid to trust the cycle counter *more*.

    So I'd suggest the attached patch. I'm not going to even bother
    arguing that we should add more bits to the entropy estimate, because
    that's not the point - I don't care if /dev/random fills up slowly or
    not, I think it's just stupid to not use the bits we can get from
    rdrand and mix them into the strong randomness pool.

    Link: http://lkml.kernel.org/r/CA%2B55aFwn59N1=m651QAyTy-1gO1noGbK18zwKDwvwqnravA84A@mail.gmail.com
    Acked-by: "David S. Miller"
    Acked-by: "Theodore Ts'o"
    Acked-by: Herbert Xu
    Cc: Matt Mackall
    Cc: Tony Luck
    Cc: Eric Dumazet
    Signed-off-by: H. Peter Anvin

    Linus Torvalds
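
    The idea of mixing extra bits into the pool without raising the entropy
    estimate can be sketched as follows. This is a toy model under stated
    assumptions: struct demo_pool and demo_mix_uncredited() are hypothetical
    names, and the XOR stands in for whatever mixing function is really used.

```c
#include <stddef.h>
#include <stdint.h>

struct demo_pool {
    uint8_t data[64];
    unsigned entropy_bits; /* the entropy estimate */
};

/* Fold input bytes into the pool; the entropy estimate is left untouched. */
void demo_mix_uncredited(struct demo_pool *p, const uint8_t *in, size_t len)
{
    for (size_t i = 0; i < len; i++)
        p->data[i % sizeof(p->data)] ^= in[i];
}
```

    The pool contents change, but entropy_bits stays where it was, so the
    pool fills up no faster than before.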
     

24 Dec, 2011

9 commits


23 Dec, 2011

18 commits

  • "! --connbytes 23:42" should match if the packet/byte count is not in range.

    As there is no explicit "invert match" toggle in the match structure,
    userspace swaps the from and to arguments
    (i.e., as if "--connbytes 42:23" were given).

    However, "what <= 23 && what >= 42" will always be false.

    Change things so we use "||" in case "from" is larger than "to".

    This change may look like it breaks backwards compatibility when "to" is 0.
    However, older iptables binaries will refuse "connbytes 42:0",
    and current releases treat it as "! --connbytes 0:42",
    so we should be fine.

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
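
    A minimal sketch of the range test described above, assuming the swapped
    bounds are the only signal for inversion. demo_connbytes_match() is a
    hypothetical name, not the function in the kernel source.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * from <= to: plain range match.
 * from >  to: userspace swapped the bounds to request an inverted match,
 *             so match anything outside [to, from].
 */
bool demo_connbytes_match(uint64_t what, uint64_t from, uint64_t to)
{
    if (to >= from)
        return what >= from && what <= to;
    return what < to || what > from;
}
```

    With from = 42 and to = 23 (the encoding of "! --connbytes 23:42"), a
    count of 30 does not match, while 10 and 50 do.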
     
  • This closes races where btrfs is calling d_instantiate too soon during
    inode creation. All of the callers of btrfs_add_nondir are updated to
    instantiate after the inode is fully set up in memory.

    Signed-off-by: Al Viro
    Signed-off-by: Chris Mason

    Al Viro
     
  • Dan Carpenter noticed that we were doing a double unlock on the worker
    lock, and sometimes picking a worker thread without the lock held.

    This fixes both errors.

    Signed-off-by: Chris Mason
    Reported-by: Dan Carpenter

    Chris Mason
     
  • skb->truesize might be big even for a small packet.

    It's even bigger after commit 87fb4b7b533 (net: more accurate skb
    truesize) and big MTU.

    We should allow queueing at least one packet per receiver, even with a
    low RCVBUF setting.

    Reported-by: Michal Simek
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • Setting a large rps_flow_cnt like (1 << 30) on 32-bit platform will
    cause a kernel oops due to insufficient bounds checking.

    The existing check only rejects count > (1 << 30), but the table size is
    computed from count * 8, and (1 << 30) * 8 will overflow 32 bits.

    This patch replaces the magic number (1 << 30) with a symbolic bound.

    Suggested-by: Eric Dumazet
    Signed-off-by: Xi Wang
    Signed-off-by: David S. Miller

    Xi Wang
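
    The overflow and a bound derived from the size computation can be
    sketched as below. The entry size of 8 comes from the "* 8" in the
    message; DEMO_HEADER_SIZE and all the demo_* names are assumptions made
    for illustration, and uint32_t models a 32-bit size_t.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_ENTRY_SIZE  8u /* per-entry size, from the "* 8" above */
#define DEMO_HEADER_SIZE 8u /* hypothetical fixed header size */

/* Largest count whose table size still fits in 32 bits. */
#define DEMO_MAX_COUNT ((UINT32_MAX - DEMO_HEADER_SIZE) / DEMO_ENTRY_SIZE)

/* Unchecked size computation: silently wraps for large counts. */
uint32_t demo_unchecked_size32(uint32_t count)
{
    return DEMO_HEADER_SIZE + count * DEMO_ENTRY_SIZE;
}

/* Checked variant: rejects any count that would overflow. */
bool demo_checked_size32(uint32_t count, uint32_t *size)
{
    if (count > DEMO_MAX_COUNT)
        return false;
    *size = DEMO_HEADER_SIZE + count * DEMO_ENTRY_SIZE;
    return true;
}
```

    Deriving the limit from the size expression means it tracks the entry
    size automatically instead of relying on a hand-picked constant.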
     
  • Chris Boot reported crashes occurring in ipv6_select_ident().

    [ 461.457562] RIP: 0010:[] []
    ipv6_select_ident+0x31/0xa7

    [ 461.578229] Call Trace:
    [ 461.580742]
    [ 461.582870] [] ? udp6_ufo_fragment+0x124/0x1a2
    [ 461.589054] [] ? ipv6_gso_segment+0xc0/0x155
    [ 461.595140] [] ? skb_gso_segment+0x208/0x28b
    [ 461.601198] [] ? ipv6_confirm+0x146/0x15e
    [nf_conntrack_ipv6]
    [ 461.608786] [] ? nf_iterate+0x41/0x77
    [ 461.614227] [] ? dev_hard_start_xmit+0x357/0x543
    [ 461.620659] [] ? nf_hook_slow+0x73/0x111
    [ 461.626440] [] ? br_parse_ip_options+0x19a/0x19a
    [bridge]
    [ 461.633581] [] ? dev_queue_xmit+0x3af/0x459
    [ 461.639577] [] ? br_dev_queue_push_xmit+0x72/0x76
    [bridge]
    [ 461.646887] [] ? br_nf_post_routing+0x17d/0x18f
    [bridge]
    [ 461.653997] [] ? nf_iterate+0x41/0x77
    [ 461.659473] [] ? br_flood+0xfa/0xfa [bridge]
    [ 461.665485] [] ? nf_hook_slow+0x73/0x111
    [ 461.671234] [] ? br_flood+0xfa/0xfa [bridge]
    [ 461.677299] [] ?
    nf_bridge_update_protocol+0x20/0x20 [bridge]
    [ 461.684891] [] ? nf_ct_zone+0xa/0x17 [nf_conntrack]
    [ 461.691520] [] ? br_flood+0xfa/0xfa [bridge]
    [ 461.697572] [] ? NF_HOOK.constprop.8+0x3c/0x56
    [bridge]
    [ 461.704616] [] ?
    nf_bridge_push_encap_header+0x1c/0x26 [bridge]
    [ 461.712329] [] ? br_nf_forward_finish+0x8a/0x95
    [bridge]
    [ 461.719490] [] ?
    nf_bridge_pull_encap_header+0x1c/0x27 [bridge]
    [ 461.727223] [] ? br_nf_forward_ip+0x1c0/0x1d4 [bridge]
    [ 461.734292] [] ? nf_iterate+0x41/0x77
    [ 461.739758] [] ? __br_deliver+0xa0/0xa0 [bridge]
    [ 461.746203] [] ? nf_hook_slow+0x73/0x111
    [ 461.751950] [] ? __br_deliver+0xa0/0xa0 [bridge]
    [ 461.758378] [] ? NF_HOOK.constprop.4+0x56/0x56
    [bridge]

    This is caused by bridge netfilter special dst_entry (fake_rtable), a
    special shared entry, where attaching an inetpeer makes no sense.

    Problem is present since commit 87c48fa3b46 (ipv6: make fragment
    identifications less predictable)

    Introduce DST_NOPEER dst flag and make sure ipv6_select_ident() and
    __ip_select_ident() fallback to the 'no peer attached' handling.

    Reported-by: Chris Boot
    Tested-by: Chris Boot
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
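
    The fallback described above might look roughly like this. It is a
    simplified model, not the kernel code: DEMO_DST_NOPEER, struct demo_dst,
    struct demo_peer and demo_select_ident() are hypothetical stand-ins, and
    the flag value is arbitrary.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_DST_NOPEER 0x1u /* hypothetical flag value */

struct demo_peer { uint32_t next_id; };

struct demo_dst {
    unsigned flags;
    struct demo_peer *peer;
};

/*
 * Use the per-peer counter only when the dst allows a peer; otherwise take
 * the "no peer attached" path and use a global counter.
 */
uint32_t demo_select_ident(struct demo_dst *dst, uint32_t *global_id)
{
    if (!(dst->flags & DEMO_DST_NOPEER) && dst->peer != NULL)
        return dst->peer->next_id++;
    return (*global_id)++;
}
```

    A shared entry marked with the flag never touches peer state, even if a
    peer pointer happens to be present.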
     
  • Userspace may not provide TCA_OPTIONS; in fact tc currently does
    not do so if no arguments are specified on the command line.
    Return EINVAL instead of panicking.

    Signed-off-by: Thomas Graf
    Signed-off-by: David S. Miller

    Thomas Graf
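
    The shape of such a guard, as a minimal sketch: struct demo_attr and
    demo_change() are hypothetical and only model "the options attribute may
    be absent".

```c
#include <errno.h>
#include <stddef.h>

struct demo_attr { int value; };

/* Reject a missing options attribute instead of dereferencing it. */
int demo_change(const struct demo_attr *opt, int *out)
{
    if (opt == NULL)
        return -EINVAL;
    *out = opt->value;
    return 0;
}
```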
     
  • Commit 618f9bc74a039da76 (net: Move mtu handling down to the protocol
    depended handlers) forgot the bridge netfilter case, adding a NULL
    dereference in ip_fragment().

    Reported-by: Chris Boot
    CC: Steffen Klassert
    Signed-off-by: Eric Dumazet
    Acked-by: Steffen Klassert
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • * 'for-linus' of git://neil.brown.name/md:
    md/bitmap: It is OK to clear bits during recovery.
    md: don't give up looking for spares on first failure-to-add
    md/raid5: ensure correct assessment of drives during degraded reshape.
    md/linear: fix hot-add of devices to linear arrays.

    Linus Torvalds
     
  • commit d0a4bb492772ce5c4bdfba3744a99ed6f6fb238f introduced a
    regression which is annoying but fairly harmless.

    When writing to an array that is undergoing recovery (a spare
    is being integrated into the array), writing to the array will
    set bits in the bitmap, but they will not be cleared when the
    write completes.

    For bits covering areas that have not been recovered yet this is not a
    problem as the recovery will clear the bits. However bits set in
    already-recovered region will stay set and never be cleared.
    This doesn't risk data integrity. The only negatives are:
    - next time there is a crash, more resyncing than necessary will
    be done.
    - the bitmap doesn't look clean, which is confusing.

    While an array is recovering we don't want to update the
    'events_cleared' setting in the bitmap but we do still want to clear
    bits that have very recently been set - providing they were written to
    the recovering device.

    So split those two needs - which previously both depended on 'success' -
    and always clear the bit if the write went to all devices.

    Signed-off-by: NeilBrown

    NeilBrown
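
    One way to picture the split is as two separate predicates where there
    used to be a single 'success' value. This is a sketch under the
    assumptions stated in the message; the struct and function names are
    hypothetical, not the md code.

```c
#include <stdbool.h>

struct demo_write {
    bool reached_all_devices; /* including the recovering device */
    bool array_recovering;
};

/* The bit may be cleared whenever the write reached every device. */
bool demo_may_clear_bit(const struct demo_write *w)
{
    return w->reached_all_devices;
}

/* 'events_cleared' is only advanced when no recovery is in progress. */
bool demo_may_update_events_cleared(const struct demo_write *w)
{
    return w->reached_all_devices && !w->array_recovering;
}
```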
     
  • Before performing a recovery we try to remove any spares that
    might not be working, then add any that might have become relevant.

    Currently we abort on the first spare that cannot be added.
    This is a false optimisation.
    It is conceivable that - depending on rules in the personality - a
    subsequent spare might be accepted.
    Also the loop does other things like count the available spares and
    reset the 'recovery_offset' value.

    If we abort early these might not happen properly.

    So remove the early abort.

    In particular if you have an array that is undergoing recovery and
    which has extra spares, then the recovery may not restart after a
    reboot as the count of 'spares' might end up as zero.

    Reported-by: Anssi Hannula
    Signed-off-by: NeilBrown

    NeilBrown
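
    The effect of dropping the early abort can be sketched with a toy loop.
    Here accepted[i] models whether candidate i can be added; the names and
    the exact bookkeeping are assumptions, not the md implementation.

```c
#include <stdbool.h>
#include <stddef.h>

struct demo_scan {
    size_t spares; /* candidates seen (per-iteration bookkeeping) */
    size_t added;  /* candidates actually added */
};

/* Visit every candidate; a failed add no longer ends the loop. */
struct demo_scan demo_add_spares(const bool *accepted, size_t n)
{
    struct demo_scan r = {0, 0};

    for (size_t i = 0; i < n; i++) {
        r.spares++;
        if (!accepted[i])
            continue; /* previously the loop stopped here */
        r.added++;
    }
    return r;
}
```

    If the first candidate is refused, later ones are still tried and the
    bookkeeping still covers all of them.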
     
  • While reshaping a degraded array (as when reshaping a RAID0 by first
    converting it to a degraded RAID4) we currently get confused about
    which devices are in_sync. In most cases we get it right, but in the
    region that is being reshaped we need to treat non-failed devices as
    in-sync when we have the data but haven't actually written it out yet.

    Reported-by: Adam Kwolek
    Signed-off-by: NeilBrown

    NeilBrown
     
  • commit d70ed2e4fafdbef0800e73942482bb075c21578b
    broke hot-add to a linear array.
    After that commit, metadata is not written to devices until they
    have been fully integrated into the array as determined by
    saved_raid_disk. That patch arranged to clear that field after
    a recovery completed.

    However for linear arrays, there is no recovery - the integration is
    instantaneous. So we need to explicitly clear the saved_raid_disk
    field.

    Signed-off-by: NeilBrown

    NeilBrown
     
  • This silently was working for many years and stopped working on
    Niagara-T3 machines.

    We need to set the MSIQ to VALID before we can set its state to IDLE.

    On Niagara-T3, setting the state to IDLE first was causing HV_EINVAL
    errors. The hypervisor documentation says, rather ambiguously, that
    the MSIQ must be "initialized" before one can set the state.

    I previously understood this to mean merely that a successful setconf()
    operation has been performed on the MSIQ, which we have done at this
    point. But it seems to also mean that it has been set VALID too.

    Signed-off-by: David S. Miller

    David S. Miller
     
  • * 'usb-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
    USB: Fix usb/isp1760 build on sparc
    usb: gadget: epautoconf: do not change number of streams
    usb: dwc3: core: fix cached revision on our structure
    usb: musb: fix reset issue with full speed device

    Linus Torvalds
     
  • * 'upstream-linus' of git://github.com/jgarzik/libata-dev:
    pata_of_platform: Add missing CONFIG_OF_IRQ dependency.

    Linus Torvalds
     
  • Signed-off-by: David S. Miller
    Signed-off-by: Jeff Garzik

    David Miller
     
  • Signed-off-by: Stephen Rothwell
    Acked-by: Eric Dumazet
    Acked-by: David Miller
    Signed-off-by: Linus Torvalds

    Stephen Rothwell
     

22 Dec, 2011

10 commits

  • Currently, the *_global_[un]lock_online() routines are not at all synchronized
    with CPU hotplug. Soft-lockups detected as a consequence of this race were
    reported earlier at https://lkml.org/lkml/2011/8/24/185. (Thanks to Cong Meng
    for finding out that the root-cause of this issue is the race condition
    between br_write_[un]lock() and CPU hotplug, which results in the lock states
    getting messed up).

    Fixing this race by just adding {get,put}_online_cpus() at appropriate places
    in *_global_[un]lock_online() is not a good option, because, then suddenly
    br_write_[un]lock() would become blocking, whereas they have been kept as
    non-blocking all this time, and we would want to keep them that way.

    So, overall, we want to ensure 3 things:
    1. br_write_lock() and br_write_unlock() must remain as non-blocking.
    2. The corresponding lock and unlock of the per-cpu spinlocks must not happen
    for different sets of CPUs.
    3. Either prevent any new CPU online operation in between this lock-unlock, or
    ensure that the newly onlined CPU does not proceed with its corresponding
    per-cpu spinlock unlocked.

    To achieve all this:
    (a) We introduce a new spinlock that is taken by the *_global_lock_online()
    routine and released by the *_global_unlock_online() routine.
    (b) We register a callback for CPU hotplug notifications, and this callback
    takes the same spinlock as above.
    (c) We maintain a bitmap which is close to the cpu_online_mask, and once it is
    initialized in the lock_init() code, all future updates to it are done in
    the callback, under the above spinlock.
    (d) The above bitmap is used (instead of cpu_online_mask) while locking and
    unlocking the per-cpu locks.

    The callback takes the spinlock upon the CPU_UP_PREPARE event. So, if the
    br_write_lock-unlock sequence is in progress, the callback keeps spinning,
    thus preventing the CPU online operation till the lock-unlock sequence is
    complete. This takes care of requirement (3).

    The bitmap that we maintain remains unmodified throughout the lock-unlock
    sequence, since all updates to it are managed by the callback, which takes
    the same spinlock as the one taken by the lock code and released only by the
    unlock routine. Combining this with (d) above, satisfies requirement (2).

    Overall, since we use a spinlock (mentioned in (a)) to prevent CPU hotplug
    operations from racing with br_write_lock-unlock, requirement (1) is also
    taken care of.

    By the way, it is to be noted that a CPU offline operation can actually run
    in parallel with our lock-unlock sequence, because our callback doesn't react
    to notifications earlier than CPU_DEAD (in order to maintain our bitmap
    properly). And this means, since we use our own bitmap (which is stale, on
    purpose) during the lock-unlock sequence, we could end up unlocking the
    per-cpu lock of an offline CPU (because we had locked it earlier, when the
    CPU was online), in order to satisfy requirement (2). But this is harmless,
    though it looks a bit awkward.

    Debugged-by: Cong Meng
    Signed-off-by: Srivatsa S. Bhat
    Signed-off-by: Al Viro
    Cc: stable@vger.kernel.org

    Srivatsa S. Bhat
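
    Points (a) through (d) above can be modelled in a single-threaded toy,
    shown below. It is only a sketch: a boolean stands in for the new
    spinlock, a callback that "would spin" simply returns false, and every
    demo_* name is hypothetical.

```c
#include <stdbool.h>

#define DEMO_NCPU 8

struct demo_brlock {
    bool global_held;              /* stands in for the spinlock in (a) */
    unsigned mask;                 /* private bitmap from (c) */
    bool cpu_locked[DEMO_NCPU];    /* per-cpu locks */
};

/* (a) + (d): take the global lock, then lock every CPU in our bitmap. */
void demo_global_lock(struct demo_brlock *b)
{
    b->global_held = true;
    for (int cpu = 0; cpu < DEMO_NCPU; cpu++)
        if (b->mask & (1u << cpu))
            b->cpu_locked[cpu] = true;
}

/* Unlock exactly the same set of CPUs, then drop the global lock. */
void demo_global_unlock(struct demo_brlock *b)
{
    for (int cpu = 0; cpu < DEMO_NCPU; cpu++)
        if (b->mask & (1u << cpu))
            b->cpu_locked[cpu] = false;
    b->global_held = false;
}

/*
 * (b): CPU_UP_PREPARE callback. Returns false where the real callback
 * would keep spinning on the lock; otherwise updates the bitmap.
 */
bool demo_cpu_up_prepare(struct demo_brlock *b, int cpu)
{
    if (b->global_held)
        return false;
    b->mask |= 1u << cpu;
    return true;
}
```

    Because the bitmap cannot change while the global lock is held, the lock
    and unlock passes always cover the same set of CPUs.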
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
    net: Add a flow_cache_flush_deferred function
    ipv4: reintroduce route cache garbage collector
    net: have ipconfig not wait if no dev is available
    sctp: Do not account for sizeof(struct sk_buff) in estimated rwnd
    asix: new device id
    davinci-cpdma: fix locking issue in cpdma_chan_stop
    sctp: fix incorrect overflow check on autoclose
    r8169: fix Config2 MSIEnable bit setting.
    llc: llc_cmsg_rcv was getting called after sk_eat_skb.
    net: bpf_jit: fix an off-one bug in x86_64 cond jump target
    iwlwifi: update SCD BC table for all SCD queues
    Revert "Bluetooth: Revert: Fix L2CAP connection establishment"
    Bluetooth: Clear RFCOMM session timer when disconnecting last channel
    Bluetooth: Prevent uninitialized data access in L2CAP configuration
    iwlwifi: allow to switch to HT40 if not associated
    iwlwifi: tx_sync only on PAN context
    mwifiex: avoid double list_del in command cancel path
    ath9k: fix max phy rate at rate control init
    nfc: signedness bug in __nci_request()
    iwlwifi: do not set the sequence control bit is not needed

    Linus Torvalds
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
    ALSA: atmel/ac97c: using software reset instead hardware reset if not available

    Linus Torvalds
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/mfd-2.6:
    mfd: Include linux/io.h to jz4740-adc
    mfd: Use request_threaded_irq for twl4030-irq instead of irq_set_chained_handler
    mfd: Base interrupt for twl4030-irq must be one-shot
    mfd: Handle tps65910 clear-mask correctly
    mfd: add #ifdef CONFIG_DEBUG_FS guard for ab8500_debug_resources
    mfd: Fix twl-core oops while calling twl_i2c_* for unbound driver
    mfd: include linux/module.h for ab5500-debugfs
    mfd: Update wm8994 active device checks for WM1811
    mfd: Set tps6586x bits if new value is different from the old one
    mfd: Set da903x bits if new value is different from the old one
    mfd: Set adp5520 bits if new value is different from the old one
    mfd: Add missed free_irq in da903x_remove

    Linus Torvalds
     
  • lockdep reports a deadlock in jfs because a special inode's rw semaphore
    is taken recursively. The mapping's gfp mask is GFP_NOFS, but is not
    used when __read_cache_page() calls add_to_page_cache_lru().

    Signed-off-by: Dave Kleikamp
    Acked-by: Hugh Dickins
    Acked-by: Al Viro
    Cc: stable@kernel.org
    Signed-off-by: Linus Torvalds

    Dave Kleikamp
     
  • * 'for-greg' of git://git.kernel.org/pub/scm/linux/kernel/git/balbi/usb:
    usb: gadget: epautoconf: do not change number of streams
    usb: dwc3: core: fix cached revision on our structure
    usb: musb: fix reset issue with full speed device

    Greg Kroah-Hartman
     
  • This commit:

    commit 8f5d621543cb064d2989fc223d3c2bc61a43981e
    Author: Joachim Foerster
    Date: Mon Oct 10 18:06:54 2011 +0200

    usb/isp1760: Let OF bindings depend on general CONFIG_OF instead of PPC_OF .

    To be able to use the driver on other OF-aware architectures, too.
    And add necessary OF related #includes to fix compilation error.

    Signed-off-by: Joachim Foerster
    Signed-off-by: Greg Kroah-Hartman

    enabled the build on all CONFIG_OF architectures, but it cannot do
    this.

    This driver depends upon CONFIG_OF_IRQ but not all CONFIG_OF platforms
    support that infrastructure, in particular Sparc does not, so the
    build fails.

    Please push a patch like the following to Linus so that this code only
    gets built where it actually should.

    --------------------
    usb/isp1760: Add missing CONFIG_OF_IRQ dependency on OF code.

    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    David Miller
     
  • flow_cache_flush() might sleep but can be called from
    atomic context via the xfrm garbage collector. So add
    a flow_cache_flush_deferred() function and use this if
    the xfrm garbage collector is invoked from within the
    packet path.

    Signed-off-by: Steffen Klassert
    Acked-by: Timo Teräs
    Signed-off-by: David S. Miller

    Steffen Klassert
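
    The deferral pattern can be sketched in plain C as follows. This is a
    toy model, assuming a pending flag in place of real deferred work; all
    demo_* names are hypothetical and nothing here sleeps.

```c
#include <stdbool.h>

struct demo_cache {
    int entries;
    bool flush_pending;
};

/* Immediate flush: in the real code this is the call that may sleep. */
void demo_flush(struct demo_cache *c)
{
    c->entries = 0;
    c->flush_pending = false;
}

/* Deferred flush: only records that a flush is wanted. */
void demo_flush_deferred(struct demo_cache *c)
{
    c->flush_pending = true;
}

/* Garbage collector: must not flush directly from the packet path. */
void demo_gc(struct demo_cache *c, bool in_packet_path)
{
    if (in_packet_path)
        demo_flush_deferred(c);
    else
        demo_flush(c);
}

/* Runs later, from a context where sleeping is allowed. */
void demo_run_deferred_work(struct demo_cache *c)
{
    if (c->flush_pending)
        demo_flush(c);
}
```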
     
  • Commit 2c8cec5c10b (ipv4: Cache learned PMTU information in inetpeer)
    removed IP route cache garbage collector a bit too soon, as this gc was
    responsible for expired routes cleanup, releasing their neighbour
    reference.

    As pointed out by Robert Gladewitz, recent kernels can fill and exhaust
    their neighbour cache.

    Reintroduce the garbage collection, since we'll have to wait until our
    neighbour lookups become refcount-less to not depend on this stuff.

    Reported-by: Robert Gladewitz
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • …wireless into for-davem

    John W. Linville