29 May, 2012

2 commits

  • On systems based on chip select rows, all channels need to use memories
    with the same properties, otherwise the memories on channels A and B
    won't be recognized.

    However, this assumption does not hold for all types of memory
    controllers.

    Controllers for FB-DIMMs don't have such requirements.

    Also, modern Intel controllers seem to be capable of handling such
    differences.

    So, we need to stop storing the DIMM information in per-csrow data
    and store it in the right place instead.

    The first step is to move grain, mtype, dtype and edac_mode to the
    per-dimm struct.

    Reviewed-by: Aristeu Rozanski
    Reviewed-by: Borislav Petkov
    Acked-by: Chris Metcalf
    Cc: Doug Thompson
    Cc: Borislav Petkov
    Cc: Mark Gross
    Cc: Jason Uhlenkott
    Cc: Tim Small
    Cc: Ranganathan Desikan
    Cc: "Arvind R."
    Cc: Olof Johansson
    Cc: Egor Martovetsky
    Cc: Michal Marek
    Cc: Jiri Kosina
    Cc: Joe Perches
    Cc: Dmitry Eremin-Solenikov
    Cc: Benjamin Herrenschmidt
    Cc: Hitoshi Mitake
    Cc: Andrew Morton
    Cc: James Bottomley
    Cc: "Niklas Söderlund"
    Cc: Shaohui Xie
    Cc: Josh Boyer
    Cc: Mike Williams
    Cc: linuxppc-dev@lists.ozlabs.org
    Signed-off-by: Mauro Carvalho Chehab

    Mauro Carvalho Chehab
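
A minimal sketch of the direction this commit describes, assuming simplified, hypothetical types (`dimm_sketch`, `csrow_sketch` and the `mem_kind` values are illustrative names, not the real EDAC definitions). Once grain, mtype, dtype and edac_mode live in a per-DIMM struct, two channels of the same csrow can describe different memories:

```c
#include <stddef.h>

/* Hypothetical memory kinds; the real EDAC enum has different names and values. */
enum mem_kind { MEM_KIND_UNKNOWN, MEM_KIND_DDR2, MEM_KIND_DDR3 };

/* Per-DIMM properties that the commit moves out of the csrow. */
struct dimm_sketch {
    unsigned int grain;   /* error-reporting granularity in bytes */
    enum mem_kind mtype;  /* memory type */
    int dtype;            /* device type (illustrative integer code) */
    int edac_mode;        /* error-correction mode (illustrative integer code) */
};

/* A csrow only points at the DIMMs sitting on its channels. */
struct csrow_sketch {
    size_t nr_channels;               /* assumed to be at most 2 here */
    struct dimm_sketch *channels[2];
};

/* Returns 1 when every populated channel of the row has the same memory type. */
int csrow_is_uniform(const struct csrow_sketch *row)
{
    const struct dimm_sketch *first = NULL;

    for (size_t i = 0; i < row->nr_channels; i++) {
        const struct dimm_sketch *d = row->channels[i];
        if (d == NULL)
            continue;               /* empty slot */
        if (first == NULL)
            first = d;              /* remember the first populated DIMM */
        else if (d->mtype != first->mtype)
            return 0;               /* mixed memories on one row */
    }
    return 1;
}
```

With per-DIMM data, a row for which `csrow_is_uniform()` returns 0 is still fully described, because each channel keeps its own properties.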
     
  • The way a DIMM is currently represented implies that it is
    linked into a per-csrow struct. However, some drivers don't see
    csrows, as they're hidden behind some chip, like the AMBs
    on FBDIMMs, for example.

    This forced drivers to fake^Wvirtualize a csrow struct, and made
    a mess of the original csrow/channel concept.

    Move the DIMM labels into a per-DIMM struct, and add there
    the real location of the socket, in terms of csrow/channel.
    Later patches will modify the location to properly represent the
    memory architecture.

    All other drivers will use a per-csrow type of location.
    Some of those drivers will require a later conversion, as
    they also fake the csrows internally.

    TODO: This patch doesn't change the existing behavior. On
    csrows-based memory controllers, a csrow/channel pair points to a memory
    rank. There's a known bug in the EDAC core that allows different
    labels for the same DIMM if it has more than one rank. A later patch
    is needed to merge the several ranks of a DIMM into the same dimm_info
    struct, in order to avoid having different labels for the same DIMM.

    edac_mc_alloc() will now contain a per-dimm initialization loop that
    will be changed by later patches in order to match other types of
    memory architectures.

    Reviewed-by: Aristeu Rozanski
    Reviewed-by: Borislav Petkov
    Cc: Doug Thompson
    Cc: Ranganathan Desikan
    Cc: "Arvind R."
    Cc: "Niklas Söderlund"
    Signed-off-by: Mauro Carvalho Chehab

    Mauro Carvalho Chehab
     

21 May, 2012

1 commit


20 May, 2012

5 commits

  • Pull PA-RISC fixes from James Bottomley:
    "This is a set of three bug fixes that gets parisc running again on
    systems with PA1.1 processors.

    Two fix regressions introduced in 2.6.39 and one fixes a prefetch bug
    that only affects PA7300LC processors. We also have another pending
    fix to do with the sectional arrangement of vmlinux.lds, but there's a
    query on it during testing on one particular system type, so I'll hold
    off sending it in for now."

    * tag 'parisc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/parisc-2.6:
    [PARISC] fix panic on prefetch(NULL) on PA7300LC
    [PARISC] fix crash in flush_icache_page_asm on PA1.1
    [PARISC] fix PA1.1 oops on boot

    Linus Torvalds
     
  • Pull x86 linker bug workarounds from Peter Anvin.

    GNU ld-2.22.52.0.[12] (*) has an unfortunate bug where it incorrectly
    turns certain relocation entries absolute. Section-relative symbols
    that are part of otherwise empty sections are silently changed to
    absolute. We rely on section-relative symbols staying section-relative,
    and actually have several sections in the linker script solely for this
    purpose.

    See for example

    http://sourceware.org/bugzilla/show_bug.cgi?id=14052

    We could just black-list the buggy linker, but it appears that it got
    shipped in at least F17, and possibly other distros too, so it's sadly
    not some rare unusual case.

    This backports the workaround from the x86/trampoline branch, and as
    Peter says: "This is not a minimal fix, not at all, but it is a tested
    code base."

    * 'x86/ld-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    x86, relocs: When printing an error, say relative or absolute
    x86, relocs: Workaround for binutils 2.22.52.0.1 section bug
    x86, realmode: 16-bit real-mode code support for relocs tool

    (*) That's a manly release numbering system. Stupid, sure. But manly.

    Linus Torvalds
     
  • Pull block layer fixes from Jens Axboe:
    "A few small, but important fixes. Most of them are marked for stable
    as well

    - Fix failure to release a semaphore on error path in mtip32xx.
    - Fix crashable condition in bio_get_nr_vecs().
    - Don't mark end-of-disk buffers as mapped, limit it to i_size.
    - Fix for build problem with CONFIG_BLOCK=n on arm at least.
    - Fix for a buffer overflow on UUID partition printing.
    - Trivial removal of unused variables in dac960."

    * 'for-linus' of git://git.kernel.dk/linux-block:
    block: fix buffer overflow when printing partition UUIDs
    Fix blkdev.h build errors when BLOCK=n
    bio allocation failure due to bio_get_nr_vecs()
    block: don't mark buffers beyond end of disk as mapped
    mtip32xx: release the semaphore on an error path
    dac960: Remove unused variables from DAC960_CreateProcEntries()

    Linus Torvalds
     
  • Pull one more networking bug-fix from David Miller:
    "One last straggler.

    Eric Dumazet's pktgen unload oops fix was not entirely complete, but
    all the cases should be handled properly now.... fingers crossed."

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
    pktgen: fix module unload for good

    Linus Torvalds
     
  • Occasionally, testing memcg's move_charge_at_immigrate on rc7 shows
    a flurry of hundreds of warnings at kernel/res_counter.c:96, where
    res_counter_uncharge_locked() does WARN_ON(counter->usage < val).

    The first trace of each flurry implicates __mem_cgroup_cancel_charge()
    of mc.precharge, and an audit of mc.precharge handling points to
    mem_cgroup_move_charge_pte_range()'s THP handling in commit 12724850e806
    ("memcg: avoid THP split in task migration").

    Checking !mc.precharge is good everywhere else, when a single page is to
    be charged; but here the "mc.precharge -= HPAGE_PMD_NR" that is likely
    to follow is liable to underflow (a lot can change since the
    precharge was estimated).

    Simply check against HPAGE_PMD_NR: there's probably a better
    alternative, trying precharge for more, splitting if unsuccessful; but
    this one-liner is safer for now - no kernel/res_counter.c:96 warnings
    seen in 26 hours.

    Signed-off-by: Hugh Dickins
    Signed-off-by: Linus Torvalds

    Hugh Dickins
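
The underflow described above can be sketched in plain C. This is an illustration only: `move_charge_sketch`, both helpers and the value chosen for `HPAGE_PMD_NR_SKETCH` are assumptions made for this example, not the memcg code itself.

```c
#include <stdbool.h>

/* Assumption: 512 base pages per huge page (2 MiB / 4 KiB). */
#define HPAGE_PMD_NR_SKETCH 512UL

struct move_charge_sketch {
    unsigned long precharge;  /* pages charged in advance */
};

/* Single-page style check: only proves that one page is available. */
bool has_any_precharge(const struct move_charge_sketch *mc)
{
    return mc->precharge != 0;
}

/*
 * Huge-page move: require a full huge page worth of precharge before
 * subtracting, so the unsigned counter can never wrap around.
 */
bool move_huge_page(struct move_charge_sketch *mc)
{
    if (mc->precharge < HPAGE_PMD_NR_SKETCH)
        return false;                     /* not enough precharge, skip */
    mc->precharge -= HPAGE_PMD_NR_SKETCH;
    return true;
}
```

With a precharge of 100 pages, `has_any_precharge()` succeeds, yet subtracting 512 from the unsigned counter would wrap; `move_huge_page()` refuses instead.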
     

19 May, 2012

13 commits

  • When the relocs tool throws an error, let the error message say if it
    is an absolute or relative symbol. This should make it a lot more
    clear what action the programmer needs to take and should help us find
    the reason if additional symbol bugs show up.

    Signed-off-by: H. Peter Anvin

    H. Peter Anvin
     
  • GNU ld 2.22.52.0.1 has a bug in which it blindly changes symbols from
    section-relative to absolute if they are in a section of zero length.
    This turns the symbols __init_begin and __init_end into absolute
    symbols. Let the relocs program know that those should be treated as
    relative symbols.

    Reported-by: Ingo Molnar
    Signed-off-by: H. Peter Anvin
    Cc: H.J. Lu
    Cc: Jarkko Sakkinen

    H. Peter Anvin
     
  • A new option is added to the relocs tool called '--realmode'.
    This option causes the generation of 16-bit segment relocations
    and 32-bit linear relocations for the real-mode code. When
    the real-mode code is moved to the low-memory during kernel
    initialization, these relocation entries can be used to
    relocate the code properly.

    In the assembly code 16-bit segment relocations must be relative
    to the 'real_mode_seg' absolute symbol. Linear relocations must be
    relative to a symbol prefixed with 'pa_'.

    16-bit segment relocation is used to load cs:ip in 16-bit code.
    Linear relocations are used in the 32-bit code for relocatable
    data references. They are declared in the linker script of the
    real-mode code.

    The relocs tool is moved to arch/x86/tools/relocs.c, and a new
    target, archscripts, is added. It can be used to build scripts needed
    for building an architecture; these must be compiled before building
    the arch/x86 tree.

    [ hpa: accelerating this because it detects invalid absolute
    relocations, a serious bug in binutils 2.22.52.0.x which currently
    produces bad kernels. ]

    Signed-off-by: H. Peter Anvin
    Link: http://lkml.kernel.org/r/1336501366-28617-2-git-send-email-jarkko.sakkinen@intel.com
    Signed-off-by: Jarkko Sakkinen
    Signed-off-by: H. Peter Anvin

    H. Peter Anvin
     
  • Pull a dm fix from Alasdair G Kergon:
    "A fix to the thin provisioning userspace interface."

    * tag 'dm-3.4-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-dm:
    dm thin: fix table output when pool target disables discard passdown internally

    Linus Torvalds
     
  • When the thin pool target clears the discard_passdown parameter
    internally, it incorrectly changes the table line reported to userspace.
    This breaks dumb string comparisons on these table lines in generic
    userspace device-mapper library code and leads to tables being reloaded
    repeatedly when nothing is actually meant to be changing.

    This patch corrects this by no longer changing the table line when
    discard passdown was disabled.

    We can still tell when discard passdown is overridden by looking for the
    message "Discard unsupported by data device (sdX): Disabling discard passdown."

    This automatic detection is also moved from the 'load' to the 'resume'
    so that it is re-evaluated should the properties of underlying devices
    change.

    Signed-off-by: Mike Snitzer
    Acked-by: Joe Thornber
    Signed-off-by: Alasdair G Kergon

    Mike Snitzer
     
  • Pull one more md bugfix from NeilBrown:
    "Fix bug in recent fix to RAID10.

    Without this patch, recovery will crash"

    * tag 'md-3.4-fixes' of git://neil.brown.name/md:
    md/raid10: fix transcription error in calc_sectors conversion.

    Linus Torvalds
     
  • Pull tile tree bugfix from Chris Metcalf:
    "This fixes a security vulnerability (and correctness bug) in tilegx"

    * 'stable' of git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile:
    tilegx: enable SYSCALL_WRAPPERS support

    Linus Torvalds
     
  • The old code was
    sector_div(stride, fc);
    the new code was
    sector_div(size, conf->near_copies);

    'size' is right (the stride variable wasn't really needed), but
    'fc' means 'far_copies', and that is an important difference.

    Signed-off-by: NeilBrown

    NeilBrown
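
A small sketch of why the divisor matters. `sector_div_sketch()` imitates the in-place division of the kernel's sector_div() (which is a macro taking the variable itself, while this illustration takes a pointer); the type name, the helper and the copy counts used below are made up for the example.

```c
#include <stdint.h>

typedef uint64_t sector_sketch_t;

/* Divides *n in place and returns the remainder, like sector_div(). */
uint32_t sector_div_sketch(sector_sketch_t *n, uint32_t base)
{
    uint32_t rem = (uint32_t)(*n % base);
    *n /= base;
    return rem;
}

/* Size left after dividing by a number of copies. */
sector_sketch_t size_per_copy_sketch(sector_sketch_t size, uint32_t copies)
{
    sector_div_sketch(&size, copies);
    return size;
}
```

For a size of 1200 sectors, dividing by a hypothetical far_copies of 3 gives 400, while dividing by a hypothetical near_copies of 2 gives 600, so picking the wrong field silently changes the result.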
     
  • Merge misc fixes from Andrew Morton.

    * emailed from Andrew Morton: (4 patches)
    frv: delete incorrect task prototypes causing compile fail
    slub: missing test for partial pages flush work in flush_all()
    fs, proc: fix ABBA deadlock in case of execution attempt of map_files/ entries
    drivers/rtc/rtc-pl031.c: configure correct wday for 2000-01-01

    Linus Torvalds
     
  • Instead of doing the i_mode calculations at proc_fd_instantiate() time,
    move them into tid_fd_revalidate(), which is where the other inode state
    (notably uid/gid information) is updated too.

    Otherwise we'll end up with stale i_mode information if an fd is re-used
    while the dentry still hangs around. Not that anything really *cares*
    (symlink permissions don't really matter), but Tetsuo Handa noticed that
    the owner read/write bits don't always match the state of the
    readability of the file descriptor, and we _used_ to get this right a
    long time ago in a galaxy far, far away.

    Besides, aside from fixing an ugly detail (that has apparently been this
    way since commit 61a28784028e: "proc: Remove the hard coded inode
    numbers" in 2006), this removes more lines of code than it adds. And it
    just makes sense to update i_mode in the same place we update i_uid/gid.

    Al Viro correctly points out that we could just do the inode fill in the
    inode iops ->getattr() function instead. However, that does require
    somewhat slightly more invasive changes, and adds yet *another* lookup
    of the file descriptor. We need to do the revalidate() for other
    reasons anyway, and have the file descriptor handy, so we might as well
    fill in the information at this point.

    Reported-by: Tetsuo Handa
    Cc: Al Viro
    Acked-by: Eric Biederman
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     
  • commit c57b5468406 (pktgen: fix crash at module unload) did a very poor
    job with list primitives.

    1) list_splice() arguments were in the wrong order

    2) list_splice(list, head) has undefined behavior if head is not
    initialized.

    3) We should use the list_splice_init() variant to clear pktgen_threads
    list.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
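
The three points above can be illustrated with a userspace imitation of a circular doubly linked list. This is a sketch, not the kernel's list.h: all names carry a `_sketch` suffix to mark them as hypothetical. It shows the argument order (source first, destination second), the need for an initialized destination head, and the re-initialization of the source.

```c
#include <stddef.h>

struct list_sketch {
    struct list_sketch *next, *prev;
};

/* An empty list points at itself in both directions. */
void list_init_sketch(struct list_sketch *head)
{
    head->next = head;
    head->prev = head;
}

int list_empty_sketch(const struct list_sketch *head)
{
    return head->next == head;
}

void list_add_tail_sketch(struct list_sketch *node, struct list_sketch *head)
{
    node->prev = head->prev;
    node->next = head;
    head->prev->next = node;
    head->prev = node;
}

/*
 * Move every node of `list` to the front of `head`, then reset `list`
 * to empty. `head` must already be initialized, otherwise head->next
 * is garbage and the links written below are undefined.
 */
void list_splice_init_sketch(struct list_sketch *list, struct list_sketch *head)
{
    if (list_empty_sketch(list))
        return;

    struct list_sketch *first = list->next;
    struct list_sketch *last = list->prev;
    struct list_sketch *at = head->next;

    first->prev = head;
    head->next = first;
    last->next = at;
    at->prev = last;

    list_init_sketch(list);  /* source is left empty and reusable */
}

size_t list_length_sketch(const struct list_sketch *head)
{
    size_t n = 0;
    for (const struct list_sketch *p = head->next; p != head; p = p->next)
        n++;
    return n;
}
```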
     
  • Some discussion with the glibc mailing lists revealed that this was
    necessary for 64-bit platforms with MIPS-like sign-extension rules
    for 32-bit values. The original symptom was that passing (uid_t)-1 to
    setreuid() was failing in programs linked -pthread because of the "setxid"
    mechanism for passing setxid-type function arguments to the syscall code.
    SYSCALL_WRAPPERS handles ensuring that all syscall arguments end up with
    proper sign-extension and is thus the appropriate fix for this problem.

    On other platforms (s390, powerpc, sparc64, and mips) this was fixed
    in 2.6.28.6. The general issue is tracked as CVE-2009-0029.

    Signed-off-by: Chris Metcalf

    Chris Metcalf
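
A rough userspace model of the problem, under the assumption that a 64-bit register is represented by an `int64_t` and that `uid_sketch_t` stands in for a 32-bit uid_t; the function names are hypothetical. Code that trusts the register to be sign-extended compares all 64 bits, while a wrapper that first narrows the argument to its declared type gives the same answer for either encoding.

```c
#include <stdint.h>

typedef uint32_t uid_sketch_t;  /* stand-in for a 32-bit uid_t */

/* The implementation works on the declared 32-bit type. */
int is_minus_one_uid(uid_sketch_t uid)
{
    return uid == (uid_sketch_t)-1;
}

/*
 * Entry point that trusts the caller: it compares the full 64-bit
 * register, so it only matches when the upper 32 bits are all ones.
 */
int entry_unwrapped_sketch(int64_t raw_register)
{
    return raw_register == -1;
}

/*
 * Wrapper-style entry point: narrow the raw register to the declared
 * argument type first, then call the implementation.
 */
int entry_wrapped_sketch(int64_t raw_register)
{
    return is_minus_one_uid((uid_sketch_t)raw_register);
}
```

Passing `0x00000000ffffffff` (the value without sign extension) is rejected by the unwrapped entry point but accepted by the wrapped one, which mirrors the setreuid() symptom described above.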
     
  • Pull a machine check recovery fix from Tony Luck.

    I really don't like how the MCE code does some of the things it does,
    but this does seem to be an improvement.

    * tag 'linus-mce-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
    x86/mce: Only restart instruction after machine check recovery if it is safe

    Linus Torvalds
     

18 May, 2012

16 commits

  • Commit 41101809a865 ("fork: Provide weak arch_release_[task_struct|
    thread_info] functions") in -tip highlights a problem in the frv arch,
    where it has needless prototypes for alloc_task_struct_node and
    free_task_struct. This now shows up as:

    kernel/fork.c:120:66: error: static declaration of 'alloc_task_struct_node' follows non-static declaration
    kernel/fork.c:127:51: error: static declaration of 'free_task_struct' follows non-static declaration

    since that commit turned them into real functions. Since arch/frv does
    not define __HAVE_ARCH_TASK_STRUCT_ALLOCATOR (i.e. it just
    uses the generic ones) it shouldn't list these at all.

    Signed-off-by: Paul Gortmaker
    Cc: David Howells
    Cc: Thomas Gleixner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Gortmaker
     
  • I found some kernel messages such as:

    SLUB raid5-md127: kmem_cache_destroy called for cache that still has objects.
    Pid: 6143, comm: mdadm Tainted: G O 3.4.0-rc6+ #75
    Call Trace:
    kmem_cache_destroy+0x328/0x400
    free_conf+0x2d/0xf0 [raid456]
    stop+0x41/0x60 [raid456]
    md_stop+0x1a/0x60 [md_mod]
    do_md_stop+0x74/0x470 [md_mod]
    md_ioctl+0xff/0x11f0 [md_mod]
    blkdev_ioctl+0xd8/0x7a0
    block_ioctl+0x3b/0x40
    do_vfs_ioctl+0x96/0x560
    sys_ioctl+0x91/0xa0
    system_call_fastpath+0x16/0x1b

    Then using kmemleak I found these messages:

    unreferenced object 0xffff8800b6db7380 (size 112):
    comm "mdadm", pid 5783, jiffies 4294810749 (age 90.589s)
    hex dump (first 32 bytes):
    01 01 db b6 ad 4e ad de ff ff ff ff ff ff ff ff .....N..........
    ff ff ff ff ff ff ff ff 98 40 4a 82 ff ff ff ff .........@J.....
    backtrace:
    kmemleak_alloc+0x21/0x50
    kmem_cache_alloc+0xeb/0x1b0
    kmem_cache_open+0x2f1/0x430
    kmem_cache_create+0x158/0x320
    setup_conf+0x649/0x770 [raid456]
    run+0x68b/0x840 [raid456]
    md_run+0x529/0x940 [md_mod]
    do_md_run+0x18/0xc0 [md_mod]
    md_ioctl+0xba8/0x11f0 [md_mod]
    blkdev_ioctl+0xd8/0x7a0
    block_ioctl+0x3b/0x40
    do_vfs_ioctl+0x96/0x560
    sys_ioctl+0x91/0xa0
    system_call_fastpath+0x16/0x1b

    This bug was introduced by commit a8364d5555b ("slub: only IPI CPUs that
    have per cpu obj to flush"), which did not include checks for per cpu
    partial pages being present on a cpu.

    Signed-off-by: majianpeng
    Cc: Gilad Ben-Yossef
    Acked-by: Christoph Lameter
    Cc: Pekka Enberg
    Tested-by: Jeff Layton
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    majianpeng
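
The missing check can be sketched as a predicate over a simplified, hypothetical per-cpu structure (`cpu_slab_sketch` and both function names are assumptions for illustration, not the SLUB definitions): a CPU needs flushing if it holds either an active slab page or per-cpu partial pages.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-cpu state: an active slab page and a partial list. */
struct cpu_slab_sketch {
    void *page;     /* currently active slab page, or NULL */
    void *partial;  /* per-cpu partial pages, or NULL */
};

/* Incomplete test: ignores per-cpu partial pages. */
bool needs_flush_incomplete(const struct cpu_slab_sketch *c)
{
    return c->page != NULL;
}

/* Complete test: partial pages alone are enough to require a flush. */
bool needs_flush_complete(const struct cpu_slab_sketch *c)
{
    return c->page != NULL || c->partial != NULL;
}
```

A CPU that holds only partial pages is skipped by the incomplete test, which matches the leaked objects reported in the traces above.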
     
  • map_files/ entries are never supposed to be executed; still, curious
    minds might try to run them, which leads to the following deadlock:

    ======================================================
    [ INFO: possible circular locking dependency detected ]
    3.4.0-rc4-24406-g841e6a6 #121 Not tainted
    -------------------------------------------------------
    bash/1556 is trying to acquire lock:
    (&sb->s_type->i_mutex_key#8){+.+.+.}, at: do_lookup+0x267/0x2b1

    but task is already holding lock:
    (&sig->cred_guard_mutex){+.+.+.}, at: prepare_bprm_creds+0x2d/0x69

    which lock already depends on the new lock.

    the existing dependency chain (in reverse order) is:

    -> #1 (&sig->cred_guard_mutex){+.+.+.}:
    validate_chain+0x444/0x4f4
    __lock_acquire+0x387/0x3f8
    lock_acquire+0x12b/0x158
    __mutex_lock_common+0x56/0x3a9
    mutex_lock_killable_nested+0x40/0x45
    lock_trace+0x24/0x59
    proc_map_files_lookup+0x5a/0x165
    __lookup_hash+0x52/0x73
    do_lookup+0x276/0x2b1
    walk_component+0x3d/0x114
    do_last+0xfc/0x540
    path_openat+0xd3/0x306
    do_filp_open+0x3d/0x89
    do_sys_open+0x74/0x106
    sys_open+0x21/0x23
    tracesys+0xdd/0xe2

    -> #0 (&sb->s_type->i_mutex_key#8){+.+.+.}:
    check_prev_add+0x6a/0x1ef
    validate_chain+0x444/0x4f4
    __lock_acquire+0x387/0x3f8
    lock_acquire+0x12b/0x158
    __mutex_lock_common+0x56/0x3a9
    mutex_lock_nested+0x40/0x45
    do_lookup+0x267/0x2b1
    walk_component+0x3d/0x114
    link_path_walk+0x1f9/0x48f
    path_openat+0xb6/0x306
    do_filp_open+0x3d/0x89
    open_exec+0x25/0xa0
    do_execve_common+0xea/0x2f9
    do_execve+0x43/0x45
    sys_execve+0x43/0x5a
    stub_execve+0x6c/0xc0

    This is because prepare_bprm_creds grabs task->signal->cred_guard_mutex
    and when do_lookup happens we try to grab task->signal->cred_guard_mutex
    again in lock_trace.

    Fix it by using the plain ptrace_may_access() helper in
    proc_map_files_lookup() and in proc_map_files_readdir() instead of
    lock_trace(); the caller must be granted CAP_SYS_ADMIN anyway.

    Signed-off-by: Cyrill Gorcunov
    Reported-by: Sasha Levin
    Cc: Konstantin Khlebnikov
    Cc: Pavel Emelyanov
    Cc: Dave Jones
    Cc: Vasiliy Kulikov
    Cc: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Cyrill Gorcunov
     
  • The reset date of the ST Micro version of PL031 is 2000-01-01. The
    correct weekday for 2000-01-01 is Saturday, but pl031 is initialized to
    Sunday. This may lead to alarm malfunction, so configure the correct
    wday if RTC_DR indicates reset.

    Signed-off-by: Rajkumar Kasirajan
    Signed-off-by: Linus Walleij
    Cc: Mattias Wallin
    Cc: Alessandro Zummo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rajkumar Kasirajan
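
The weekday claim is easy to check independently. The sketch below uses Sakamoto's day-of-week method for Gregorian dates; it is not the driver's code, and the function name is made up for this example. The result uses 0 for Sunday through 6 for Saturday.

```c
/*
 * Day of week for a Gregorian date, 0 = Sunday ... 6 = Saturday.
 * month is 1..12, day is 1..31.
 */
int weekday_sketch(int year, int month, int day)
{
    static const int offsets[12] = { 0, 3, 2, 5, 0, 3, 5, 1, 4, 6, 2, 4 };

    if (month < 3)
        year -= 1;  /* January and February count with the previous year */

    return (year + year / 4 - year / 100 + year / 400
            + offsets[month - 1] + day) % 7;
}
```

`weekday_sketch(2000, 1, 1)` evaluates to 6, which is Saturday, in line with the commit message.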
     
  • Pull ARM fixes from Russell King:
    "Small set of fixes again."

    * 'fixes' of git://git.linaro.org/people/rmk/linux-arm:
    ARM: 7419/1: vfp: fix VFP flushing regression on sigreturn path
    ARM: 7418/1: LPAE: fix access flag setup in mem_type_table
    ARM: prevent VM_GROWSDOWN mmaps extending below FIRST_USER_ADDRESS
    ARM: 7417/1: vfp: ensure preemption is disabled when enabling VFP access

    Linus Torvalds
     
  • Pull two networking fixes from David S. Miller:

    1) Thanks to Willy Tarreau and Eric Dumazet, we've unlocked a bug that's
    been present in do_tcp_sendpages() since that function was written in
    2002.

    When we block to wait for memory we have to unconditionally try and
    push out pending TCP data, otherwise we can block for an unreasonably
    long amount of time.

    2) Fix deadlock in e1000, fixes kernel bugzilla 43132

    From Tushar Dave.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
    e1000: Prevent reset task killing itself.
    tcp: do_tcp_sendpages() must try to push data out on oom conditions

    Linus Torvalds
     
  • Commit 1cc0c998fdf2 ("ACPI: Fix D3hot v D3cold confusion") introduced a
    bug in __acpi_bus_set_power() and changed the behavior of
    acpi_pci_set_power_state() in such a way that it generally doesn't work
    as expected if PCI_D3hot is passed to it as the second argument.

    First off, if ACPI_STATE_D3 (equal to ACPI_STATE_D3_COLD) is passed to
    __acpi_bus_set_power() and the explicit_set flag is set for the D3cold
    state, the function will try to execute AML method called "_PS4", which
    doesn't exist.

    Fix this by adding a check to ensure that the name of the AML method
    to execute for transitions to ACPI_STATE_D3_COLD is correct in
    __acpi_bus_set_power(). Also make sure that the explicit_set flag
    for ACPI_STATE_D3_COLD will be set if _PS3 is present and modify
    acpi_power_transition() to avoid accessing power resources for
    ACPI_STATE_D3_COLD, because they don't exist.

    Second, if PCI_D3hot is passed to acpi_pci_set_power_state() as the
    target state, the function will request a transition to
    ACPI_STATE_D3_HOT instead of ACPI_STATE_D3. However,
    ACPI_STATE_D3_HOT is now only marked as supported if the _PR3 AML
    method is defined for the given device, which is rare. This causes
    problems on systems where devices used to be successfully put
    into ACPI D3 by pci_set_power_state(PCI_D3hot), which doesn't work
    now. In particular, some unused graphics adapters are not turned
    off as a result.

    To fix this issue restore the old behavior of
    acpi_pci_set_power_state(), which is to request a transition to
    ACPI_STATE_D3 (equal to ACPI_STATE_D3_COLD) if either PCI_D3hot or
    PCI_D3cold is passed to it as the argument.

    This approach is not ideal, because generally power should not
    be removed from devices if PCI_D3hot is the target power state,
    but since this behavior is relied on, we have no choice but to
    restore it at the moment and spend more time on designing a
    better solution in the future.

    References: https://bugzilla.kernel.org/show_bug.cgi?id=43228
    Reported-by: rocko
    Reported-by: Cristian Rodríguez
    Reported-and-tested-by: Peter
    Signed-off-by: Rafael J. Wysocki
    Signed-off-by: Linus Torvalds

    Rafael J. Wysocki
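
The restored mapping can be sketched as a plain switch. The enum names and values below are hypothetical stand-ins, not the kernel's PCI_* and ACPI_STATE_* constants; the point is only that both PCI D3 variants request the same ACPI D3 state.

```c
/* Hypothetical stand-ins for the PCI and ACPI power state constants. */
enum pci_state_sketch {
    PCI_SK_D0, PCI_SK_D1, PCI_SK_D2, PCI_SK_D3HOT, PCI_SK_D3COLD
};

enum acpi_state_sketch {
    ACPI_SK_D0, ACPI_SK_D1, ACPI_SK_D2, ACPI_SK_D3
};

/* Returns the ACPI state to request, or -1 for an unknown PCI state. */
int pci_to_acpi_state_sketch(enum pci_state_sketch state)
{
    switch (state) {
    case PCI_SK_D0:
        return ACPI_SK_D0;
    case PCI_SK_D1:
        return ACPI_SK_D1;
    case PCI_SK_D2:
        return ACPI_SK_D2;
    case PCI_SK_D3HOT:   /* restored behavior: treated like D3cold */
    case PCI_SK_D3COLD:
        return ACPI_SK_D3;
    }
    return -1;
}
```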
     
  • Killing reset task while adapter is resetting causes deadlock.
    Only kill reset task if adapter is not resetting.
    Ref bug #43132 on bugzilla.kernel.org

    CC: stable@vger.kernel.org
    Signed-off-by: Tushar Dave
    Tested-by: Aaron Brown
    Signed-off-by: Jeff Kirsher
    Signed-off-by: David S. Miller

    Tushar Dave
     
  • Since recent changes on TCP splicing (starting with commits 2f533844
    "tcp: allow splice() to build full TSO packets" and 35f9c09f "tcp:
    tcp_sendpages() should call tcp_push() once"), I started seeing
    massive stalls when forwarding traffic between two sockets using
    splice() when pipe buffers were larger than socket buffers.

    Latest changes (net: netdev_alloc_skb() use build_skb()) made the
    problem even more apparent.

    The reason seems to be that if do_tcp_sendpages() fails on out of memory
    condition without being able to send at least one byte, tcp_push() is not
    called and the buffers cannot be flushed.

    After applying the attached patch, I cannot reproduce the stalls at all
    and the data rate is perfectly stable and steady under any condition
    which previously caused the problem to be permanent.

    The issue seems to have been there since before the kernel migrated to
    git, which makes me think that the stalls I occasionally experienced
    with tux during stress-tests years ago were probably related to the
    same issue.

    This issue was first encountered on 3.0.31 and 3.2.17, so please backport
    to -stable.

    Signed-off-by: Willy Tarreau
    Acked-by: Eric Dumazet

    Willy Tarreau
     
  • Pull two more target-core updates from Nicholas Bellinger:
    "The first patch addresses a SPC-2 reservations RELEASE bug in a
    special (iscsi specific) multi-ISID setup case that was allowing the
    same initiator to incorrectly release its own reservation on
    a different SCSI path with enforce_pr_isid=1 operation. This bug was
    caught by Bernhard Kohl.

    The second patch is to address a bug with FILEIO backends where the
    incorrect number of blocks for READ_CAPACITY was being reported after
    an underlying device-mapper block_device size change. This patch now
    uses i_size_read() in fd_get_blocks() for FILEIO backends with an
    underlying block_device, instead of trying to determine this value at
    setup time during fd_create_virtdevice(). (hch CC'ed)

    Both are CC'ed to stable."

    * '3.4-urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending:
    target: Fix bug in handling of FILEIO + block_device resize ops
    target: Fix SPC-2 RELEASE bug for multi-session iSCSI client setups

    Linus Torvalds
     
  • This patch fixes a bug in the handling of FILEIO w/ underlying block_device
    resize operations where the original fd_dev->fd_dev_size was incorrectly being
    used in fd_get_blocks() for READ_CAPACITY response payloads.

    This patch avoids using fd_dev->fd_dev_size for FILEIO devices with
    an underlying block_device, and instead changes fd_get_blocks() to
    get the sector count directly from i_size_read() as recommended by hch.

    Reported-by: Christoph Hellwig
    Signed-off-by: Nicholas Bellinger

    Nicholas Bellinger
     
  • Pull slave-dmaengine fixes from Vinod Koul:
    "fixes of cyclic dma usages in slave dma drivers"

    * 'fixes' of git://git.infradead.org/users/vkoul/slave-dma:
    dmaengine: fix cyclic dma usage
    dmaengine: pl330: dont complete descriptor for cyclic dma

    Linus Torvalds
     
  • Pull last minute virtio fixes from Michael S. Tsirkin:
    "Here are a couple of last minute virtio fixes for 3.4. Hope it's not
    too late yet - I might have tried too hard to make sure the fix is
    well tested.

    Fixes are by Amit and myself. One fixes module removal and one
    suspend of a VM, the last one the handling of out of memory condition.

    They are thus very low risk as most people never hit these paths, but
    do fix very annoying problems for people that do use the feature.

    Signed-off-by: Michael S. Tsirkin"

    * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
    virtio_net: invoke softirqs after __napi_schedule
    virtio: balloon: let host know of updated balloon size before module removal
    virtio: console: tell host of open ports after resume from s3/s4

    Linus Torvalds
     
  • Pull ARM: SoC fixes from Olof Johansson:
    "I will stop trying to predict when we're done with fixes for a
    release.

    Here's another small batch of three patches for arm-soc:

    - A fix for a boot time WARN_ON() due to irq domain conversion on
    PRIMA2
    - Fix for a regression in Tegra SMP spinup code due to swapped
    register offsets
    - Fixed config dependency for mv_cesa crypto driver to avoid build
    breakage"

    * tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
    ARM: PRIMA2: fix irq domain size and IRQ mask of internal interrupt controller
    crypto: mv_cesa requires on CRYPTO_HASH to build
    ARM: tegra: Fix flow controller accesses

    Linus Torvalds
     
  • Pull two md fixes from NeilBrown:
    "One fixes a bug in the new raid10 resize code so is relevant to 3.4
    only.

    The other fixes a bug in the use of md by dm-raid, so is relevant to
    any kernel with dm-raid support"

    * tag 'md-3.4-fixes' of git://neil.brown.name/md:
    MD: Add del_timer_sync to mddev_suspend (fix nasty panic)
    md/raid10: set dev_sectors properly when resizing devices in array.

    Linus Torvalds
     
  • …-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

    Pull perf, x86 and scheduler updates from Ingo Molnar.

    * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    tracing: Do not enable function event with enable
    perf stat: handle ENXIO error for perf_event_open
    perf: Turn off compiler warnings for flex and bison generated files
    perf stat: Fix case where guest/host monitoring is not supported by kernel
    perf build-id: Fix filename size calculation

    * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    x86, kvm: KVM paravirt kernels don't check for CPUID being unavailable
    x86: Fix section annotation of acpi_map_cpu2node()
    x86/microcode: Ensure that module is only loaded on supported Intel CPUs

    * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    sched: Fix KVM and ia64 boot crash due to sched_groups circular linked list assumption

    Linus Torvalds
     

17 May, 2012

3 commits

  • Commit ff9a184c ("ARM: 7400/1: vfp: clear fpscr length and stride bits
    on entry to sig handler") flushes the VFP state prior to entering a
    signal handler so that a VFP operation inside the handler will trap and
    force a restore of ABI-compliant registers. Reflushing and disabling VFP
    on the sigreturn path is predicated on the saved thread state indicating
    that VFP was used by the handler -- however for SMP platforms this is
    only set on context-switch, making the check unreliable and causing VFP
    register corruption in userspace since the register values are not
    necessarily those restored from the sigframe.

    This patch unconditionally flushes the VFP state after a signal handler.
    Since we already perform the flush before the handler and the flushing
    itself happens lazily, the redundant flush when VFP is not used by the
    handler is essentially a nop.

    Reported-by: Jon Medhurst
    Signed-off-by: Jon Medhurst
    Signed-off-by: Will Deacon
    Signed-off-by: Russell King

    Will Deacon
     
  • A zero value for prot_sect in the memory types table implies that
    section mappings should never be created for the memory type in question.
    This is checked for in alloc_init_section().

    With LPAE, we set a bit to mask access flag faults for kernel mappings.
    This breaks the aforementioned (!prot_sect) check in alloc_init_section().

    This patch fixes this bug by first checking for a non-zero
    prot_sect before setting the PMD_SECT_AF flag.

    Signed-off-by: Vitaly Andrianov
    Acked-by: Catalin Marinas
    Signed-off-by: Russell King

    Vitaly Andrianov
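
A sketch of the guarded flag update described above. The struct, the function name and the bit chosen for `PMD_SECT_AF_SKETCH` are assumptions for illustration; only the shape of the check (test for non-zero before OR-ing in the flag) comes from the commit message.

```c
#include <stddef.h>

/* Hypothetical bit position for the access flag. */
#define PMD_SECT_AF_SKETCH (1u << 10)

struct mem_type_sketch {
    unsigned int prot_sect;  /* 0 means: never create section mappings */
};

/*
 * Set the access flag only for types that allow section mappings, so a
 * zero prot_sect stays zero and later (!prot_sect) checks keep working.
 */
void set_access_flag_sketch(struct mem_type_sketch *types, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        if (types[i].prot_sect)
            types[i].prot_sect |= PMD_SECT_AF_SKETCH;
    }
}
```

Without the guard, an entry with `prot_sect == 0` would become non-zero and would no longer be recognized as a type that forbids section mappings.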
     
  • __napi_schedule might raise softirq but nothing
    causes do_softirq to trigger, so it does not in fact
    run. As a result,
    the error message "NOHZ: local_softirq_pending 08"
    sometimes occurs during boot of a KVM guest when the network service is
    started and we are oom:

    ...
    Bringing up loopback interface: [ OK ]
    Bringing up interface eth0:
    Determining IP information for eth0...NOHZ: local_softirq_pending 08
    done.
    [ OK ]
    ...

    Further, receive queue processing might get delayed
    indefinitely until some interrupt triggers:
    virtio_net expected napi to be run immediately.

    One way to cause do_softirq to be executed is by
    invoking local_bh_enable(). As __napi_schedule is
    normally called from bh or irq context, this
    seems to make sense: disable bh before __napi_schedule
    and enable afterwards.

    In fact it's a very complicated way of calling do_softirq(),
    and works since this function is only used when we are not
    in interrupt context. It's not hot at all, in any ideal scenario.

    Reported-by: Ulrich Obergfell
    Tested-by: Ulrich Obergfell
    Signed-off-by: Michael S. Tsirkin
    Acked-by: Rusty Russell

    Michael S. Tsirkin