06 May, 2011

3 commits

  • Combine the current TREE_PREEMPT_RCU ->blocked_tasks[] lists in the
    rcu_node structure into a single ->blkd_tasks list with ->gp_tasks
    and ->exp_tasks tail pointers. This is in preparation for RCU priority
    boosting, which will add a third dimension to the combinatorial explosion
    in the ->blocked_tasks[] case, but simply a third pointer in the new
    ->blkd_tasks case.

    Also update the documentation to reflect the ->blocked_tasks[] merge.
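
    The single-list layout can be sketched in user-space C. This is a
    minimal model under stated assumptions, not the kernel code: the names
    task_model, node_model, block_task() and unblock_task() are hypothetical,
    locking is omitted, and every task from ->gp_tasks to the end of the list
    is assumed to block the current grace period.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a blocked task. */
struct task_model {
	struct task_model *prev, *next;
};

/* Hypothetical stand-in for the relevant rcu_node fields. */
struct node_model {
	struct task_model blkd_tasks;  /* sentinel of the one list of blocked tasks */
	struct task_model *gp_tasks;   /* first task blocking the normal GP, or NULL */
	struct task_model *exp_tasks;  /* first task blocking the expedited GP, or NULL */
};

static void node_init(struct node_model *n)
{
	n->blkd_tasks.prev = n->blkd_tasks.next = &n->blkd_tasks;
	n->gp_tasks = NULL;
	n->exp_tasks = NULL;
}

/* Link t into the list immediately before pos. */
static void insert_before(struct task_model *t, struct task_model *pos)
{
	assert(t != NULL && pos != NULL);
	t->prev = pos->prev;
	t->next = pos;
	pos->prev->next = t;
	pos->prev = t;
}

/*
 * Queue a newly blocked task. A task that blocks the current grace period
 * goes directly in front of the existing ->gp_tasks segment (or at the tail
 * if there is none) and becomes the new ->gp_tasks; any other task goes to
 * the head of the list.
 */
static void block_task(struct node_model *n, struct task_model *t, int blocks_gp)
{
	if (blocks_gp) {
		insert_before(t, n->gp_tasks ? n->gp_tasks : &n->blkd_tasks);
		n->gp_tasks = t;
	} else {
		insert_before(t, n->blkd_tasks.next);
	}
}

/* Remove a task, advancing any pointer that referenced it. */
static void unblock_task(struct node_model *n, struct task_model *t)
{
	struct task_model *next = (t->next == &n->blkd_tasks) ? NULL : t->next;

	if (n->gp_tasks == t)
		n->gp_tasks = next;
	if (n->exp_tasks == t)
		n->exp_tasks = next;
	t->prev->next = t->next;
	t->next->prev = t->prev;
}
```

    In this model, a further consumer such as priority boosting would need
    only one more pointer into the same list, not another set of lists.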

    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Paul E. McKenney
     
  • Commit d09b62d fixed grace-period synchronization, but left some smp_mb()
    invocations in rcu_process_callbacks() that are no longer needed; sheer
    paranoia prevented them from being removed at the time. This commit removes
    them and provides a proof of correctness in their absence. It also adds
    a memory barrier to rcu_report_qs_rsp() immediately before the update to
    rsp->completed in order to handle the theoretical possibility that the
    compiler or CPU might move massive quantities of code into a lock-based
    critical section. This also proves that the sheer paranoia was not
    entirely unjustified, at least from a theoretical point of view.

    In addition, the old dyntick-idle synchronization depended on the fact
    that grace periods were many milliseconds in duration, so that it could
    be assumed that no dyntick-idle CPU could reorder a memory reference
    across an entire grace period. Unfortunately for this design, the
    addition of expedited grace periods breaks this assumption, which has
    the unfortunate side-effect of requiring atomic operations in the
    functions that track dyntick-idle state for RCU. (There is some hope
    that the algorithms used in user-level RCU might be applied here, but
    some work is required to handle the NMIs that user-space applications
    can happily ignore. For the short term, better safe than sorry.)

    This proof assumes that neither compiler nor CPU will allow a lock
    acquisition and release to be reordered, as doing so can result in
    deadlock. The proof is as follows:

    1. A given CPU declares a quiescent state under the protection of
    its leaf rcu_node's lock.

    2. If there is more than one level of rcu_node hierarchy, the
    last CPU to declare a quiescent state will also acquire the
    ->lock of the next rcu_node up in the hierarchy, but only
    after releasing the lower level's lock. The acquisition of this
    lock clearly cannot occur prior to the acquisition of the leaf
    node's lock.

    3. Step 2 repeats until we reach the root rcu_node structure.
    Please note again that only one lock is held at a time through
    this process. The acquisition of the root rcu_node's ->lock
    must occur after the release of that of the leaf rcu_node.

    4. At this point, we set the ->completed field in the rcu_state
    structure in rcu_report_qs_rsp(). However, if the rcu_node
    hierarchy contains only one rcu_node, then in theory the code
    preceding the quiescent state could leak into the critical
    section. We therefore precede the update of ->completed with a
    memory barrier. All CPUs will therefore agree that any updates
    preceding any report of a quiescent state will have happened
    before the update of ->completed.

    5. Regardless of whether a new grace period is needed, rcu_start_gp()
    will propagate the new value of ->completed to all of the leaf
    rcu_node structures, under the protection of each rcu_node's ->lock.
    If a new grace period is needed immediately, this propagation
    will occur in the same critical section that ->completed was
    set in, but courtesy of the memory barrier in #4 above, is still
    seen to follow any pre-quiescent-state activity.

    6. When a given CPU invokes __rcu_process_gp_end(), it becomes
    aware of the end of the old grace period and therefore makes
    any RCU callbacks that were waiting on that grace period eligible
    for invocation.

    If this CPU is the same one that detected the end of the grace
    period, and if there is but a single rcu_node in the hierarchy,
    we will still be in the single critical section. In this case,
    the memory barrier in step #4 guarantees that all callbacks will
    be seen to execute after each CPU's quiescent state.

    On the other hand, if this is a different CPU, it will acquire
    the leaf rcu_node's ->lock, and will again be serialized after
    each CPU's quiescent state for the old grace period.

    On the strength of this proof, this commit therefore removes the memory
    barriers from rcu_process_callbacks() and adds one to rcu_report_qs_rsp().
    The effect is to reduce the number of memory barriers by one and to
    reduce the frequency of execution from about once per scheduling tick
    per CPU to once per grace period.

    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Paul E. McKenney
     
  • The RCU CPU stall warnings can now be controlled using the
    rcu_cpu_stall_suppress boot-time parameter or via the same parameter
    from sysfs. There is therefore no longer any reason to have
    kernel config parameters for this feature. This commit therefore
    removes the RCU_CPU_STALL_DETECTOR and RCU_CPU_STALL_DETECTOR_RUNNABLE
    kernel config parameters. The RCU_CPU_STALL_TIMEOUT parameter remains
    to allow the timeout to be tuned and the RCU_CPU_STALL_VERBOSE parameter
    remains to allow task-stall information to be suppressed if desired.

    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Paul E. McKenney
     

04 May, 2011

10 commits


03 May, 2011

18 commits

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
    Input: wm831x-ts - move BTN_TOUCH reporting to data transfer
    Input: wm831x-ts - allow IRQ flags to be specified
    Input: wm831x-ts - fix races with IRQ management

    Linus Torvalds
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (47 commits)
    sysctl: net: call unregister_net_sysctl_table where needed
    Revert: veth: remove unneeded ifname code from veth_newlink()
    smsc95xx: fix reset check
    tg3: Fix failure to enable WoL by default when possible
    networking: inappropriate ioctl operation should return ENOTTY
    amd8111e: trivial typo spelling: Negotitate -> Negotiate
    ipv4: don't spam dmesg with "Using LC-trie" messages
    af_unix: Only allow recv on connected seqpacket sockets.
    mii: add support of pause frames in mii_get_an
    net: ftmac100: fix scheduling while atomic during PHY link status change
    usbnet: Transfer of maintainership
    usbnet: add support for some Huawei modems with cdc-ether ports
    bnx2: cancel timer on device removal
    iwl4965: fix "Received BA when not expected"
    iwlagn: fix "Received BA when not expected"
    dsa/mv88e6131: fix unknown multicast/broadcast forwarding on mv88e6085
    usbnet: Resubmit interrupt URB if device is open
    iwl4965: fix "TX Power requested while scanning"
    iwlegacy: led stay solid on when no traffic
    b43: trivial: update module info about ucode16_mimo firmware
    ...

    Linus Torvalds
     
  • ctl_table_headers registered with register_net_sysctl_table should
    have been unregistered with the equivalent unregister_net_sysctl_table

    Signed-off-by: Lucian Adrian Grijincu
    Signed-off-by: David S. Miller

    Lucian Adrian Grijincu
     
  • 84c49d8c3e4abefb0a41a77b25aa37ebe8d6b743 ("veth: remove unneeded
    ifname code from veth_newlink()") caused regression on veth
    creation. This patch reverts the original one.

    Reported-by: Michał Mirosław
    Signed-off-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Jiri Pirko
     
  • The reset loop check should check the MII_BMCR register value for
    BMCR_RESET rather than for MII_BMCR (the register address, which also
    happens to be zero).

    Signed-off-by: Rabin Vincent
    Signed-off-by: David S. Miller

    Rabin Vincent
     
  • tg3 is supposed to enable WoL by default on adapters which support
    that, but it fails to do so unless the adapter's
    /sys/devices/.../power/wakeup file contains 'enabled' during the
    initialization of the adapter. Fix that by making tg3 use
    device_set_wakeup_enable() to enable wakeup automatically whenever
    WoL should be enabled by default.

    Signed-off-by: Rafael J. Wysocki
    Signed-off-by: David S. Miller

    Rafael J. Wysocki
     
  • ioctl() calls against a socket with an inappropriate ioctl operation
    are incorrectly returning EINVAL rather than ENOTTY:

    [ENOTTY]
    Inappropriate I/O control operation.

    BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=33992

    Signed-off-by: Lifeng Sun
    Signed-off-by: David S. Miller

    Lifeng Sun
     
  • The use of base for %ebx in this file is arbitrary, *except* that we
    also use it to compute the real-mode segment. Therefore, make it so
    that r_base really is the true address to which %ebx points.

    This resolves kernel bugzilla 33302.

    Reported-and-tested-by: Alexey Zaytsev
    Signed-off-by: H. Peter Anvin
    Link: http://lkml.kernel.org/n/tip-08os5wi3yq1no0y4i5m4z7he@git.kernel.org

    H. Peter Anvin
     
  • Signed-off-by: Joe Perches
    Signed-off-by: David S. Miller

    Joe Perches
     
  • mask_rw_pte is currently checking if a pfn is a pagetable page if it
    falls in the range pgt_buf_start - pgt_buf_end but that is incorrect
    because pgt_buf_end is a moving target: pgt_buf_top is the real
    boundary.
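
    The difference between the two range checks can be sketched as below.
    The pgt_buf_model structure and both helper names are hypothetical, and
    the half-open interval convention is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the early page-table buffer bounds, in PFNs. */
struct pgt_buf_model {
	unsigned long start;  /* first PFN of the buffer */
	unsigned long end;    /* next unused PFN: advances as pages are allocated */
	unsigned long top;    /* one past the last PFN of the buffer: fixed */
};

/* Old check: misses pages that will only become page-table pages later. */
static bool is_pgt_page_old(const struct pgt_buf_model *b, unsigned long pfn)
{
	return pfn >= b->start && pfn < b->end;
}

/* New check: uses the fixed upper boundary of the buffer. */
static bool is_pgt_page_new(const struct pgt_buf_model *b, unsigned long pfn)
{
	assert(b->start <= b->end && b->end <= b->top);
	return pfn >= b->start && pfn < b->top;
}
```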

    Acked-by: "H. Peter Anvin"
    Signed-off-by: Stefano Stabellini
    Signed-off-by: Konrad Rzeszutek Wilk

    Stefano Stabellini
     
  • The commit:

    commit 4b239f458c229de044d6905c2b0f9fe16ed9e01e
    Author: Yinghai Lu
    Date: Fri Dec 17 16:58:28 2010 -0800

    x86-64, mm: Put early page table high

    causes the Linux kernel to crash under Xen:

    mapping kernel into physical memory
    Xen: setup ISA identity maps
    about to get started...
    (XEN) mm.c:2466:d0 Bad type (saw 7400000000000001 != exp 1000000000000000) for mfn b1d89 (pfn bacf7)
    (XEN) mm.c:3027:d0 Error while pinning mfn b1d89
    (XEN) traps.c:481:d0 Unhandled invalid opcode fault/trap [#6] on VCPU 0 [ec=0000]
    (XEN) domain_crash_sync called from entry.S
    (XEN) Domain 0 (vcpu#0) crashed on cpu#0:
    ...

    The reason is that at some point init_memory_mapping is going to reach
    the pagetable pages area and map those pages too (mapping them as normal
    memory that falls in the range of addresses passed to init_memory_mapping
    as argument). Some of those pages are already pagetable pages (they are
    in the range pgt_buf_start-pgt_buf_end) therefore they are going to be
    mapped RO and everything is fine.
    Some of these pages are not pagetable pages yet (they fall in the range
    pgt_buf_end-pgt_buf_top; for example the page at pgt_buf_end) so they
    are going to be mapped RW. When these pages become pagetable pages and
    are hooked into the pagetable, Xen will find that the guest already has
    a RW mapping of them somewhere and will fail the operation.
    The reason Xen requires pagetables to be RO is that the hypervisor needs
    to verify that the pagetables are valid before using them. The validation
    operations are called "pinning" (more details in arch/x86/xen/mmu.c).

    In order to fix the issue we mark all the pages in the entire range
    pgt_buf_start-pgt_buf_top as RO. However, when the pagetable allocation
    is completed, only the range pgt_buf_start-pgt_buf_end is reserved by
    init_memory_mapping. Hence the kernel is going to crash as soon as one
    of the pages in the range pgt_buf_end-pgt_buf_top is reused (because
    those pages are RO).

    For this reason, this function is introduced which is called _after_
    the init_memory_mapping has completed (in a perfect world we would
    call this function from init_memory_mapping, but let's ignore that).

    Because we are called _after_ init_memory_mapping the pgt_buf_[start,
    end,top] have all changed to new values (b/c another init_memory_mapping
    is called). Hence, the first time we enter this function, we save
    away the pgt_buf_start value and update the pgt_buf_[end,top].

    When we detect that the "old" pgt_buf_start through pgt_buf_end
    PFNs have been reserved (so memblock_x86_reserve_range has been called),
    we immediately set out to RW the "old" pgt_buf_end through pgt_buf_top.

    And then we update those "old" pgt_buf_[end|top] with the new ones
    so that we can redo this on the next pagetable.

    Acked-by: "H. Peter Anvin"
    Reviewed-by: Jeremy Fitzhardinge
    [v1: Updated with Jeremy's comments]
    [v2: Added the crash output]
    Signed-off-by: Konrad Rzeszutek Wilk

    Konrad Rzeszutek Wilk
     
  • David S. Miller
     
  • * 'for-linus' of git://git.infradead.org/ubifs-2.6:
    UBIFS: seek journal heads to the latest bud in replay
    UBIFS: do not free write-buffers when in R/O mode

    Linus Torvalds
     
  • * 'fixes' of master.kernel.org:/home/rmk/linux-2.6-arm: (47 commits)
    CLKDEV: Fix clkdev return value for NULL clk case
    ARM: 6891/1: prevent heap corruption in OABI semtimedop
    ARM: kprobes: Tidy-up kprobes-decode.c
    ARM: kprobes: Add emulation of hint instructions like NOP and WFI
    ARM: kprobes: Add emulation of SBFX, UBFX, BFI and BFC instructions
    ARM: kprobes: Add emulation of MOVW and MOVT instructions
    ARM: kprobes: Reject probing of undefined data processing instructions
    ARM: kprobes: Remove redundant code in space_1111
    ARM: kprobes: Fix emulation of PLD instructions
    ARM: kprobes: Reject probing of SETEND instructions
    ARM: kprobes: Consolidate stub decoding functions
    ARM: kprobes: Reject probing of all coprocessor instructions
    ARM: kprobes: Fix emulation of USAD8 instructions
    ARM: kprobes: Fix emulation of SMUAD, SMUSD and SMMUL instructions
    ARM: kprobes: Fix emulation of SXTB16, SXTB, SXTH, UXTB16, UXTB and UXTH instructions
    ARM: kprobes: Reject probing of undefined media instructions
    ARM: kprobes: Add emulation of RBIT instruction
    ARM: kprobes: Reject probing of LDRB instructions which load PC
    ARM: kprobes: Fix emulation of LDRD and STRD instructions
    ARM: kprobes: Reject probing of LDR/STR instructions which update PC unpredictably
    ...

    Linus Torvalds
     
  • commit ab7798ffcf98b11a9525cf65bacdae3fd58d357f ("genirq: Expand generic
    show_interrupts()") added the Kconfig option GENERIC_IRQ_SHOW_LEVEL to
    accommodate PowerPC, but this doesn't actually enable the functionality due
    to a typo in the #ifdef check.

    Signed-off-by: Geert Uytterhoeven
    Cc: Linux/PPC Development
    Link: http://lkml.kernel.org/r/%3Calpine.DEB.2.00.1104302251370.19068%40ayla.of.borg%3E
    Signed-off-by: Thomas Gleixner

    Geert Uytterhoeven
     
  • This is the second fix of the following symptom:

    UBIFS error (pid 34456): could not find an empty LEB

    which sometimes happens after power cuts when we mount the file-system - UBIFS
    refuses it with the above error message which comes from the
    'ubifs_rcvry_gc_commit()' function. I can reproduce this using the integck test
    with the UBIFS power cut emulation enabled.

    Analysis of the problem.

    Currently UBIFS replay seeks the journal heads to the last _replayed_ bud.
    But the buds are replayed out-of-order, so the replay basically seeks journal
    heads to the "random" bud belonging to this head, and not to the _last_ one.

    The result of this is that the GC head may be seeked to a full LEB with no free
    space, or very little free space. And 'ubifs_rcvry_gc_commit()' tries to find a
    fully or mostly dirty LEB to match the current GC head (because we need to
    garbage-collect that dirty LEB at one go, because we do not have @c->gc_lnum).
    So 'ubifs_find_dirty_leb()' fails and we fall back to finding an empty LEB and
    also fail. As a result - recovery fails and mounting fails.

    This patch teaches the replay to initialize the GC heads exactly to the latest
    buds, i.e. the buds which have the largest sequence number in corresponding
    log reference nodes.
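
    The selection rule can be sketched as follows: for each journal head,
    pick the bud whose reference node has the largest sequence number,
    whatever order the buds were replayed in. The bud_model structure and
    latest_bud_for_head() are hypothetical names, not UBIFS code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a replayed bud. */
struct bud_model {
	int jhead;                 /* journal head this bud belongs to */
	int lnum;                  /* LEB number of the bud */
	unsigned long long sqnum;  /* sequence number of its log reference node */
};

/*
 * Return the index of the latest bud of journal head jhead, or -1 if that
 * head has no buds. The order of the array does not matter.
 */
static int latest_bud_for_head(const struct bud_model *buds, size_t n, int jhead)
{
	int best = -1;

	assert(buds != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (buds[i].jhead != jhead)
			continue;
		if (best < 0 || buds[i].sqnum > buds[best].sqnum)
			best = (int)i;
	}
	return best;
}
```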

    Signed-off-by: Artem Bityutskiy
    Cc: stable@kernel.org

    Artem Bityutskiy
     
  • Currently UBIFS has a small optimization - it frees write-buffers when it is
    re-mounted from R/W mode to R/O mode. Of course, when it is mounted R/O, it
    does not allocate write-buffers either.

    This optimization is nice but it leads to subtle problems and complications
    in recovery, which I can reproduce using the integck test. The symptoms are
    that after a power cut the file-system cannot be mounted if we first mount
    it R/O, and then re-mount R/W - 'ubifs_rcvry_gc_commit()' prints:

    UBIFS error (pid 34456): could not find an empty LEB

    Analysis of the problem.

    When mounting R/W, the replay process sets journal heads to buds [1], but
    when mounting R/O - it does not do this, because the write-buffers are not
    allocated. So 'ubifs_rcvry_gc_commit()' works completely differently for the
    same file-system but for the following 2 cases:

    1. mounting R/W after a power cut and recovering
    2. mounting R/O after a power cut, re-mounting R/W and running deferred recovery

    In the former case, we have journal heads seeked to a bud, in the latter
    case, they are non-seeked (wbuf->lnum == -1). So in the latter case we do not
    try to recover the GC LEB by garbage-collecting to the GC head, but we just
    try to find an empty LEB, and there may be no empty LEBs, so we just fail.
    On the other hand, in the former case (mount R/W), we are able to make a GC LEB
    (@c->gc_lnum) by garbage-collecting.

    Thus, let's remove this small nice optimization and always allocate
    write-buffers. This should not make too big a difference - we have only 3
    of them, each of max. write unit size, which is usually 2KiB. So this is
    about 6KiB of RAM for the typical case, and only when mounted R/O.

    [1]: Note, currently the replay process is setting (seeking) the journal heads
    to _some_ buds, not necessarily to the buds which had been the journal heads
    before the power cut happened. This will be fixed separately.

    Signed-off-by: Artem Bityutskiy
    Cc: stable@kernel.org

    Artem Bityutskiy
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6:
    ALSA: hda - Fix Realtek's chained fixup checks
    Revert "ALSA: hda - Fix pin-config of Gigabyte mobo"
    ALSA: HDA: Fix automute for Gateway NV79
    ALSA: hda: add beep quirk for Realtek 0x1043:831a
    ALSA: usb-audio - Terratec Aureon 7.1 USB ID as C-Media cm6206 quirks
    ALSA: hda - VIA: Fix notify_aa_path_ctls() invalid issue.
    ALSA - au88x0 - Add buffer bytes constraints

    Linus Torvalds
     

02 May, 2011

9 commits

  • * 'for-linus' of git://git390.marist.edu/pub/scm/linux-2.6:
    [S390] irqstats: fix counting of pfault, dasd diag and virtio irqs
    [S390] prng: fix pointer arithmetic

    Linus Torvalds
     
  • * 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/staging:
    hwmon: (twl4030-madc-hwmon) Return proper error if hwmon_device_register fails

    Linus Torvalds
     
  • * 'i2c-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging:
    i2c-parport: Fix adapter list handling
    i2c-i801: Move device ID definitions to driver

    Linus Torvalds
     
  • The old code considered valid empty LZMA2 streams to be corrupt.
    Note that a typical empty .xz file has no LZMA2 data at all,
    and thus most .xz files having no uncompressed data are handled
    correctly even without this fix.

    Signed-off-by: Lasse Collin
    Signed-off-by: Linus Torvalds

    Lasse Collin
     
  • The check of chained fixup list entry was done against the wrong element.
    A stupid mistake during refactoring.

    Signed-off-by: Takashi Iwai

    Takashi Iwai
     
  • This reverts commit c6b358748e19ce7e230b0926ac42696bc485a562.

    It turned out that there are different pin configurations for this
    PCI SSID, including multi-channel modes. A more proper fix for
    allowing line-out mutes will come up in 2.6.40 tree, so we won't need
    this fixup any more there.

    Reported-by: Andrew Clayton
    Reported-by: Emmanuel Benisty
    Cc:
    Signed-off-by: Takashi Iwai

    Takashi Iwai
     
  • fib_trie_table() is called during netns creation and
    Chromium uses clone(CLONE_NEWNET) to sandbox renderer processes.

    Don't print anything.

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: David S. Miller

    Alexey Dobriyan
     
  • This fixes the following oops discovered by Dan Aloni:
    > Anyway, the following is the output of the Oops that I got on the
    > Ubuntu kernel on which I first detected the problem
    > (2.6.37-12-generic). The Oops that followed will be more useful, I
    > guess.

    >[ 5594.669852] BUG: unable to handle kernel NULL pointer dereference
    > at           (null)
    > [ 5594.681606] IP: [] unix_dgram_recvmsg+0x1fb/0x420
    > [ 5594.687576] PGD 2a05d067 PUD 2b951067 PMD 0
    > [ 5594.693720] Oops: 0002 [#1] SMP
    > [ 5594.699888] last sysfs file:

    The bug was that unix domain sockets use a pseudo packet for
    connecting and accept uses that pseudo packet to get the socket.
    In the buggy seqpacket case we were allowing unconnected
    sockets to call recvmsg and try to receive the pseudo packet.

    That is always wrong and as of commit 7361c36c5 the pseudo
    packet had become different enough from a normal packet
    that the kernel started oopsing.

    Do for seqpacket_recv what was done for seqpacket_send in 2.5
    and only allow it on connected seqpacket sockets.

    Cc: stable@kernel.org
    Tested-by: Dan Aloni
    Signed-off-by: Eric W. Biederman
    Signed-off-by: David S. Miller

    Eric W. Biederman
     
  • numa_cleanup_meminfo() trims each memblk between low (0) and
    high (max_pfn) limits and discards empty ones. However, the
    emptiness detection incorrectly used an equality test. If the
    start of a memblk is higher than max_pfn, it is empty but fails
    the equality test and doesn't get discarded.

    The condition triggers when max_pfn is lower than start of a
    NUMA node and results in memory misconfiguration - leading to
    WARN_ON()s and other funnies. The bug was discovered in the devel
    branch, where 32bit also uses this code path for NUMA init. If a
    node is above the addressing limit, max_pfn ends up lower than
    the node triggering this problem.

    The failure hasn't been observed on x86-64 but is still possible
    with broken hardware e820/NUMA info. As the fix is very low
    risk, it would be better to apply it even for 64bit.

    Fix it by using >= instead of ==.

    Signed-off-by: Yinghai Lu
    [ Extracted the actual fix from the original patch and rewrote patch description. ]
    Signed-off-by: Tejun Heo
    Link: http://lkml.kernel.org/r/20110501171204.GO29280@htj.dyndns.org
    Signed-off-by: Ingo Molnar

    Yinghai Lu