01 Jan, 2012

1 commit

  • It was found (by Sasha) that if you use a futex located in the gate
    area we get stuck in an uninterruptible infinite loop, much like the
    ZERO_PAGE issue.

    While looking at this problem, PeterZ realized you'll get into similar
    trouble when hitting any install_special_pages() mapping. And are there
    still drivers setting up their own special mmaps without page->mapping,
    and without special VM or pte flags to make get_user_pages fail?

    In most cases, if page->mapping is NULL, we do not need to retry at all:
    Linus points out that even /proc/sys/vm/drop_caches poses no problem,
    because it ends up using remove_mapping(), which takes care not to
    interfere when the page reference count is raised.

    But there is still one case which does need a retry: if memory pressure
    called shmem_writepage in between get_user_pages_fast dropping page
    table lock and our acquiring page lock, then the page gets switched from
    filecache to swapcache (and ->mapping set to NULL) whatever the refcount.
    Fault it back in to get the page->mapping needed for key->shared.inode.

    Reported-by: Sasha Levin
    Signed-off-by: Hugh Dickins
    Cc: stable@vger.kernel.org
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     

31 Dec, 2011

7 commits


30 Dec, 2011

11 commits

  • * 'iommu/fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
    iommu: Initialize domain->handler in iommu_domain_alloc()

    Linus Torvalds
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
    packet: fix possible dev refcnt leak when bind fail
    netem: dont call vfree() under spinlock and BH disabled
    netfilter: ctnetlink: fix scheduling while atomic if helper is autoloaded
    netfilter: ctnetlink: fix return value of ctnetlink_get_expect()

    Linus Torvalds
     
  • * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    perf/x86: Fix raw_spin_unlock_irqrestore() usage
    oprofile, arm/sh: Fix oprofile_arch_exit() linkage issue

    Linus Torvalds
     
  • * 'for-linus' of git://oss.sgi.com/xfs/xfs:
    xfs: log all dirty inodes in xfs_fs_sync_fs
    xfs: log the inode in ->write_inode calls for kupdate

    Linus Torvalds
     
  • * 'for-linus' of git://git.kernel.dk/linux-block:
    block: fix blk_queue_end_tag()
    block: re-use existing 'reading' variable instead of checking direction again
    block, cfq: fix empty queue crash caused by request merge

    Linus Torvalds
     
  • If a huge page is enqueued under the protection of hugetlb_lock, then the
    operation is atomic and safe.

    Signed-off-by: Hillf Danton
    Reviewed-by: Michal Hocko
    Acked-by: KAMEZAWA Hiroyuki
    Cc: [2.6.37+]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hillf Danton
     
  • Commit 2a95ea6c0d129b4 ("procfs: do not overflow get_{idle,iowait}_time
    for nohz") did not take into account that on some architectures jiffies
    and cputime use different units.

    This causes get_idle_time() to return numbers in the wrong units, making
    the idle time fields in /proc/stat wrong.

    Instead of converting the usec value returned by
    get_cpu_{idle,iowait}_time_us to units of jiffies, use the new function
    usecs_to_cputime64 to convert it to the correct unit of cputime64_t.
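
    As a rough illustration of the unit mismatch described above, the
    sketch below converts the same microsecond value two ways. The
    constants and the sketch_* helpers are hypothetical stand-ins, not the
    kernel's definitions; real values depend on the architecture.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values for illustration only; real ones are per-architecture. */
#define SKETCH_HZ 100ULL                 /* jiffies per second */
#define SKETCH_CPUTIME_PER_USEC 4096ULL  /* cputime units per microsecond */

/* Microseconds to jiffies: the conversion the commit moves away from. */
uint64_t sketch_usecs_to_jiffies(uint64_t usecs)
{
    return usecs * SKETCH_HZ / 1000000ULL;
}

/* Microseconds to cputime units: stand-in for usecs_to_cputime64(). */
uint64_t sketch_usecs_to_cputime64(uint64_t usecs)
{
    /* Guard against overflow in the multiplication below. */
    assert(usecs <= UINT64_MAX / SKETCH_CPUTIME_PER_USEC);
    return usecs * SKETCH_CPUTIME_PER_USEC;
}
```

    With these assumed constants, 2 seconds of idle time is 200 jiffies
    but 8192000000 cputime units, so a jiffies value reported where
    cputime is expected would be far too small.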

    Signed-off-by: Andreas Schwab
    Acked-by: Michal Hocko
    Cc: Arnd Bergmann
    Cc: "Artem S. Tashkinov"
    Cc: Dave Jones
    Cc: Alexey Dobriyan
    Cc: Thomas Gleixner
    Cc: "Luck, Tony"
    Cc: Benjamin Herrenschmidt
    Cc: Martin Schwidefsky
    Cc: Heiko Carstens
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andreas Schwab
     
  • Commit 8aacc9f550 ("mm/mempolicy.c: fix pgoff in mbind vma merge") is a
    slightly incorrect fix.

    Why? Consider the following case.

    1. map 4 pages of a file at offset 0

    [0123]

    2. map 2 pages just after the first mapping of the same file but with
    page offset 2

    [0123][23]

    3. mbind() 2 pages from the first mapping at offset 2.
    mbind_range() should treat the new vma as,

    [0123][23]
    |23|
    mbind vma

    but it does

    [0123][23]
    |01|
    mbind vma

    Oops. It then performs a wrong vma merge and split ([01][0123] or similar).

    This patch fixes it.
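
    The page offset arithmetic behind the diagrams above can be sketched
    as follows. This is not the patch itself: struct sketch_vma and
    sketch_pgoff_at() are hypothetical, pared-down stand-ins, and a 4 KiB
    page size is assumed.

```c
#include <assert.h>

#define SKETCH_PAGE_SHIFT 12  /* assumed 4 KiB pages */

/* Hypothetical, minimal stand-in for the kernel's vm_area_struct. */
struct sketch_vma {
    unsigned long vm_start;  /* first address of the mapping */
    unsigned long vm_end;    /* one past the last address */
    unsigned long vm_pgoff;  /* file offset of vm_start, in pages */
};

/* File page offset of addr, which must lie inside the vma. */
unsigned long sketch_pgoff_at(const struct sketch_vma *vma, unsigned long addr)
{
    assert(addr >= vma->vm_start && addr < vma->vm_end);
    return vma->vm_pgoff + ((addr - vma->vm_start) >> SKETCH_PAGE_SHIFT);
}
```

    For the first mapping ([0123], file offset 0), a range starting two
    pages in yields page offset 2, matching the |23| case rather than
    the |01| case.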

    [testcase]
    test result - before the patch

    case4: 126: test failed. expect '2,4', actual '2,2,2'
    case5: passed
    case6: passed
    case7: passed
    case8: passed
    case_n: 246: test failed. expect '4,2', actual '1,4'

    ------------[ cut here ]------------
    kernel BUG at mm/filemap.c:135!
    invalid opcode: 0000 [#4] SMP DEBUG_PAGEALLOC

    (snip long bug on messages)

    test result - after the patch

    case4: passed
    case5: passed
    case6: passed
    case7: passed
    case8: passed
    case_n: passed

    source: mbind_vma_test.c
    ============================================================
    #include <numa.h>
    #include <numaif.h>
    #include <sys/mman.h>
    #include <stdio.h>
    #include <unistd.h>
    #include <stdlib.h>
    #include <string.h>

    static unsigned long pagesize;
    void* mmap_addr;
    struct bitmask *nmask;
    char buf[1024];
    FILE *file;
    char retbuf[10240] = "";
    int mapped_fd;

    char *rubysrc = "ruby -e '\
    pid = %d; \
    vstart = 0x%llx; \
    vend = 0x%llx; \
    s = `pmap -q #{pid}`; \
    rary = []; \
    s.each_line {|line|; \
    ary=line.split(\" \"); \
    addr = ary[0].to_i(16); \
    if(vstart < vend) then \
    rary.push(ary[1].to_i()/4); \
    end; \
    }; \
    print rary.join(\",\"); \
    '";

    void init(void)
    {
    void* addr;
    char buf[128];

    nmask = numa_allocate_nodemask();
    numa_bitmask_setbit(nmask, 0);

    pagesize = getpagesize();

    sprintf(buf, "%s", "mbind_vma_XXXXXX");
    mapped_fd = mkstemp(buf);
    if (mapped_fd == -1)
    perror("mkstemp "), exit(1);
    unlink(buf);

    if (lseek(mapped_fd, pagesize*8, SEEK_SET) < 0)
    perror("lseek "), exit(1);
    if (write(mapped_fd, "\0", 1) < 0)
    perror("write "), exit(1);

    addr = mmap(NULL, pagesize*8, PROT_NONE,
    MAP_SHARED, mapped_fd, 0);
    if (addr == MAP_FAILED)
    perror("mmap "), exit(1);

    if (mprotect(addr+pagesize, pagesize*6, PROT_READ|PROT_WRITE) < 0)
    perror("mprotect "), exit(1);

    mmap_addr = addr + pagesize;

    /* make page populate */
    memset(mmap_addr, 0, pagesize*6);
    }

    void fin(void)
    {
    void* addr = mmap_addr - pagesize;
    munmap(addr, pagesize*8);

    memset(buf, 0, sizeof(buf));
    memset(retbuf, 0, sizeof(retbuf));
    }

    void mem_bind(int index, int len)
    {
    int err;

    err = mbind(mmap_addr+pagesize*index, pagesize*len,
    MPOL_BIND, nmask->maskp, nmask->size, 0);
    if (err)
    perror("mbind "), exit(err);
    }

    void mem_interleave(int index, int len)
    {
    int err;

    err = mbind(mmap_addr+pagesize*index, pagesize*len,
    MPOL_INTERLEAVE, nmask->maskp, nmask->size, 0);
    if (err)
    perror("mbind "), exit(err);
    }

    void mem_unbind(int index, int len)
    {
    int err;

    err = mbind(mmap_addr+pagesize*index, pagesize*len,
    MPOL_DEFAULT, NULL, 0, 0);
    if (err)
    perror("mbind "), exit(err);
    }

    void Assert(char *expected, char *value, char *name, int line)
    {
    if (strcmp(expected, value) == 0) {
    fprintf(stderr, "%s: passed\n", name);
    return;
    }
    else {
    fprintf(stderr, "%s: %d: test failed. expect '%s', actual '%s'\n",
    name, line,
    expected, value);
    // exit(1);
    }
    }

    /*
    AAAA
    PPPPPPNNNNNN
    might become
    PPNNNNNNNNNN
    case 4 below
    */
    void case4(void)
    {
    init();
    sprintf(buf, rubysrc, getpid(), mmap_addr, mmap_addr+pagesize*6);

    mem_bind(0, 4);
    mem_unbind(2, 2);

    file = popen(buf, "r");
    fread(retbuf, sizeof(retbuf), 1, file);
    Assert("2,4", retbuf, "case4", __LINE__);

    fin();
    }

    /*
    AAAA
    PPPPPPNNNNNN
    might become
    PPPPPPPPPPNN
    case 5 below
    */
    void case5(void)
    {
    init();
    sprintf(buf, rubysrc, getpid(), mmap_addr, mmap_addr+pagesize*6);

    mem_bind(0, 2);
    mem_bind(2, 2);

    file = popen(buf, "r");
    fread(retbuf, sizeof(retbuf), 1, file);
    Assert("4,2", retbuf, "case5", __LINE__);

    fin();
    }

    /*
    AAAA
    PPPPNNNNXXXX
    might become
    PPPPPPPPPPPP 6
    */
    void case6(void)
    {
    init();
    sprintf(buf, rubysrc, getpid(), mmap_addr, mmap_addr+pagesize*6);

    mem_bind(0, 2);
    mem_bind(4, 2);
    mem_bind(2, 2);

    file = popen(buf, "r");
    fread(retbuf, sizeof(retbuf), 1, file);
    Assert("6", retbuf, "case6", __LINE__);

    fin();
    }

    /*
    AAAA
    PPPPNNNNXXXX
    might become
    PPPPPPPPXXXX 7
    */
    void case7(void)
    {
    init();
    sprintf(buf, rubysrc, getpid(), mmap_addr, mmap_addr+pagesize*6);

    mem_bind(0, 2);
    mem_interleave(4, 2);
    mem_bind(2, 2);

    file = popen(buf, "r");
    fread(retbuf, sizeof(retbuf), 1, file);
    Assert("4,2", retbuf, "case7", __LINE__);

    fin();
    }

    /*
    AAAA
    PPPPNNNNXXXX
    might become
    PPPPNNNNNNNN 8
    */
    void case8(void)
    {
    init();
    sprintf(buf, rubysrc, getpid(), mmap_addr, mmap_addr+pagesize*6);

    mem_bind(0, 2);
    mem_interleave(4, 2);
    mem_interleave(2, 2);

    file = popen(buf, "r");
    fread(retbuf, sizeof(retbuf), 1, file);
    Assert("2,4", retbuf, "case8", __LINE__);

    fin();
    }

    void case_n(void)
    {
    init();
    sprintf(buf, rubysrc, getpid(), mmap_addr, mmap_addr+pagesize*6);

    /* make redundant mappings [0][1234][34][7] */
    mmap(mmap_addr + pagesize*4, pagesize*2, PROT_READ|PROT_WRITE,
    MAP_FIXED|MAP_SHARED, mapped_fd, pagesize*3);

    /* Expect to do nothing. */
    mem_unbind(2, 2);

    file = popen(buf, "r");
    fread(retbuf, sizeof(retbuf), 1, file);
    Assert("4,2", retbuf, "case_n", __LINE__);

    fin();
    }

    int main(int argc, char** argv)
    {
    case4();
    case5();
    case6();
    case7();
    case8();
    case_n();

    return 0;
    }
    =============================================================

    Signed-off-by: KOSAKI Motohiro
    Acked-by: Johannes Weiner
    Cc: Minchan Kim
    Cc: Caspar Zhang
    Cc: KOSAKI Motohiro
    Cc: Christoph Lameter
    Cc: Hugh Dickins
    Cc: Mel Gorman
    Cc: Lee Schermerhorn
    Cc: [3.1.x]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KOSAKI Motohiro
     
  • The new iso bandwidth calculation code accidentally has broken support
    for bulk mode cameras. This has broken the following drivers:
    finepix, jeilinj, ovfx2, ov534, ov534_9, se401, sq905, sq905c, sq930x,
    stv0680, vicam.

    This patch fixes this. Fix tested with: se401, sq905, sq905c, stv0680 & vicam
    cams.

    Signed-off-by: Hans de Goede
    Signed-off-by: Mauro Carvalho Chehab
    Signed-off-by: Linus Torvalds

    Hans de Goede
     
  • Fix the wrong register offset used to retrieve the number of buttons
    attached to the hardware.

    Signed-off-by: Tai-hwa Liang
    Signed-off-by: Dmitry Torokhov

    Tai-hwa Liang
     
  • Ceph attempts to use the dcache to satisfy negative lookups and readdir
    when the entire directory contents are in cache. Disable this behavior
    until lingering bugs in this code are shaken out; we'll re-enable these
    hooks once things are fully stable.

    Signed-off-by: Sage Weil

    Sage Weil
     

29 Dec, 2011

1 commit

  • Commit 5e081591 "block: warn if tag is greater than real_max_depth"
    cleaned up blk_queue_end_tag() to warn when the tag is truly invalid
    (greater than real_max_depth). However, it changed behavior in the tag <
    max_depth case to not end the request, which triggers
    BUG_ON(blk_queued_rq(rq)) in the request completion path:

    http://marc.info/?l=linux-kernel&m=132204370518629&w=2

    In order to allow blk_queue_resize_tags() to shrink the tag space
    blk_queue_end_tag() must always complete tags with a value less than
    real_max_depth regardless of the current max_depth. The comment about
    "handling the shrink case" seems to be what prompted changes in this
    space, so remove it and BUG on all invalid tags (made even simpler by
    Matthew's suggestion to use an unsigned compare).
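
    The unsigned compare mentioned above can be illustrated in isolation.
    The helper name sketch_tag_is_invalid() is hypothetical; the sketch
    only shows why one unsigned comparison covers both negative and
    too-large tags.

```c
#include <assert.h>
#include <stdbool.h>

/* Returns true when tag is outside the range [0, real_max_depth). */
bool sketch_tag_is_invalid(int tag, int real_max_depth)
{
    assert(real_max_depth >= 0);
    /*
     * A negative tag converts to a very large unsigned value, so this
     * single comparison rejects negative tags as well as tags that are
     * greater than or equal to real_max_depth.
     */
    return (unsigned int)tag >= (unsigned int)real_max_depth;
}
```

    With a depth of 32, tags 0 through 31 are accepted, while -1 and 32
    are both rejected by the same test.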

    Signed-off-by: Dan Williams
    Cc: Tao Ma
    Cc: Matthew Wilcox
    Reported-by: Meelis Roos
    Reported-by: Ed Nadolski
    Cc: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Jens Axboe

    Dan Williams
     

28 Dec, 2011

4 commits

  • SROMC static memory mapping is included in the common s5p initialization
    code. Hence, remove the duplicated SROMC static memory mapping for EXYNOS.

    Signed-off-by: Thomas Abraham
    Cc: stable@kernel.org
    Signed-off-by: Kukjin Kim

    Thomas Abraham
     
  • The following happens when CONFIG_CPU_FREQ_S3C24XX_DEBUGFS
    is selected without building the s3c2410-iotiming.c file:

    arch/arm/mach-s3c2440/built-in.o:(.data+0x38c): undefined reference to `s3c2410_iotiming_debugfs

    CONFIG_S3C2410_IOTIMING is not selected for MACH_MINI2440, so
    s3c2410-iotiming.c is never compiled, and enabling the
    CONFIG_CPU_FREQ_S3C24XX_DEBUGFS option causes an undefined
    reference to s3c2410_iotiming_debugfs(), which is defined in
    that file. Define s3c2410_iotiming_debugfs as NULL for this case.
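
    The general pattern of defining an optional hook as NULL when its
    source file is not built might look like the sketch below. Every
    sketch_* name and SKETCH_CONFIG_IOTIMING are hypothetical; this is
    not the actual Samsung header.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*sketch_debugfs_fn)(void);

#ifdef SKETCH_CONFIG_IOTIMING
/* Provided by an optional source file when the option is enabled. */
int sketch_iotiming_debugfs(void);
#else
/* Option disabled: the name resolves to NULL, so nothing is left to link. */
#define sketch_iotiming_debugfs NULL
#endif

struct sketch_info {
    sketch_debugfs_fn debug_io_show;
};

/* The board data can reference the hook unconditionally. */
struct sketch_info sketch_board_info = {
    .debug_io_show = sketch_iotiming_debugfs,
};

/* Callers must tolerate a missing hook. */
int sketch_show(const struct sketch_info *info)
{
    assert(info != NULL);
    if (!info->debug_io_show)
        return -1;
    return info->debug_io_show();
}
```

    The trade-off is that every caller has to check the pointer before
    using it, in exchange for keeping the board data free of #ifdefs.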

    Signed-off-by: Denis Kuzmenko
    Cc: stable@kernel.org
    [kgene.kim@samsung.com: removed useless changes]
    Signed-off-by: Kukjin Kim

    Denis Kuzmenko
     
  • If bind() fails when it is called after setting the PACKET_FANOUT
    sock option, the dev refcnt will leak.

    Signed-off-by: Wei Yongjun
    Signed-off-by: David S. Miller

    Wei Yongjun
     
  • Redhat Bugzilla: Bug 727875 - TCO_EN bit is disabled by TCO driver

    The previous patch breaks reset watchdog behaviour on the older hardware.
    It is therefore better to make sure that the behaviour for older hardware ( Do not turn off SMI clearing watchdog.
    turn_SMI_watchdog_clear_off=1 -> Turn off SMI clearing watchdog when iTCO_version=1
    (ICHO till ICH5 + 6300ESB only)
    turn_SMI_watchdog_clear_off=2 -> Turn off SMI clearing watchdog.

    Signed-off-by: Wim Van Sebroeck

    Wim Van Sebroeck
     

27 Dec, 2011

5 commits

  • RC6 fails again.

    > I found my system freeze mostly during starting up X and KDE. Sometimes it
    > works for some minutes, sometimes it freezes immediatly. When the freeze
    > happens, everything is dead (even the reset button does not work, I need to
    > power cycle).

    > I disabled RC6, and my system runs wonderfully.

    > The system is a Z68 Pro board with Sandybridge i5-2500K processor, 8
    > GB of RAM and UEFI firmware.

    Reported-by: Kai Krakow
    Signed-off-by: Keith Packard
    Signed-off-by: Linus Torvalds

    Keith Packard
     
  • Semaphores still cause problems on some machines:

    > From Udo Steinberg:
    >
    > With Linux-3.2-rc6 I'm frequently seeing GPU hangs when large amounts of
    > text scroll in an xterm, such as when extracting a tar archive. Such as this
    > one (note the timestamps):
    >
    > I can reproduce it fairly easily with something
    > as simple as:
    >
    > while true; do dmesg; done

    This patch turns them off on SNB while leaving them on for IVB.

    Reported-by: Udo Steinberg
    Cc: Daniel Vetter
    Cc: Eugeni Dodonov
    Signed-off-by: Keith Packard
    Signed-off-by: Linus Torvalds

    Keith Packard
     
  • * 'kvm-updates/3.2' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
    KVM: PPC: e500: include linux/export.h
    KVM: PPC: fix kvmppc_start_thread() for CONFIG_SMP=N
    KVM: PPC: protect use of kvmppc_h_pr
    KVM: PPC: move compute_tlbie_rb to book3s_64 common header
    KVM: Don't automatically expose the TSC deadline timer in cpuid
    KVM: Device assignment permission checks
    KVM: Remove ability to assign a device without iommu support
    KVM: x86: Prevent starting PIT timers in the absence of irqchip support

    Linus Torvalds
     
  • post 3.2-rc7 pull request

    * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394:
    MAINTAINERS: firewire git URL update

    Linus Torvalds
     
  • Bruce Fields notes that commit 778fc546f749 ("locks: fix tracking of
    inprogress lease breaks") introduced a possible error pointer
    dereference on failure to allocate memory. locks_conflict() will
    dereference the passed-in new lease lock structure that may be an error pointer.

    This means an open (without O_NONBLOCK set) on a file with a lease
    applied (generally only done when Samba or nfsd (with v4) is running)
    could crash if a kmalloc() fails.

    So instead of playing games with IS_ERR() all over the place, just
    check the allocation failure early. That makes the code more
    straightforward, and avoids this possible bad pointer dereference.
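
    A userspace sketch of the "check the allocation early" approach is
    shown below. The sketch_* helpers only imitate the kernel's error
    pointer convention, and sketch_lease_alloc() with its fail flag is a
    hypothetical allocator used to simulate a kmalloc() failure.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

struct sketch_lease { int type; };

/* Userspace imitations of the kernel's error pointer helpers. */
#define SKETCH_MAX_ERRNO 4095
void *sketch_err_ptr(long err) { return (void *)(intptr_t)err; }
long sketch_ptr_err(const void *p) { return (long)(intptr_t)p; }
int sketch_is_err(const void *p)
{
    return (uintptr_t)p >= (uintptr_t)-SKETCH_MAX_ERRNO;
}

/* Hypothetical allocator; a nonzero fail simulates allocation failure. */
struct sketch_lease *sketch_lease_alloc(int type, int fail)
{
    struct sketch_lease *l = fail ? NULL : malloc(sizeof(*l));

    if (!l)
        return sketch_err_ptr(-ENOMEM);
    l->type = type;
    return l;
}

/* Check the allocation once, up front; later code never sees an error pointer. */
int sketch_break_lease(int type, int fail)
{
    struct sketch_lease *new_fl = sketch_lease_alloc(type, fail);
    int conflict;

    if (sketch_is_err(new_fl))
        return (int)sketch_ptr_err(new_fl);

    assert(new_fl != NULL);
    conflict = (new_fl->type != 0);  /* safe: new_fl is a real object here */
    free(new_fl);
    return conflict;
}
```

    Returning the error immediately keeps the rest of the function free
    of per-use error pointer checks.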

    Based-on-patch-by: J. Bruce Fields
    Cc: Al Viro
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

26 Dec, 2011

9 commits

  • This patch makes use of the set_memory_x() kernel API in order
    to make necessary BIOS calls to source NMIs.

    This is needed for SLES11 SP2 and the latest upstream kernel as it appears
    the NX Execute Disable has grown in its control.

    Signed-off-by: Thomas Mingarelli
    Signed-off-by: Wim Van Sebroeck
    Cc: stable@kernel.org

    Mingarelli, Thomas
     
  • The AMBA ID table is marked as __initdata, yet it is referenced by the
    driver struct which is not. This causes a (somewhat unhelpful) section
    mismatch warning:

    WARNING: drivers/watchdog/sp805_wdt.o(.data+0x4c): Section mismatch in
    reference from the variable sp805_wdt_driver to the (unknown
    reference) .init.data:(unknown)

    Fix this by removing the annotation.

    Signed-off-by: Nick Bowler
    Signed-off-by: Wim Van Sebroeck

    Nick Bowler
     
  • The state holders used in the PM path of the drivers report as
    unused variables when compiling without CONFIG_PM so let's
    move them inside CONFIG_PM.

    Signed-off-by: Linus Walleij
    Signed-off-by: Wim Van Sebroeck

    Linus Walleij
     
  • This is required for THIS_MODULE. We recently stopped acquiring
    it via some other header.

    Signed-off-by: Scott Wood
    Signed-off-by: Alexander Graf

    Scott Wood
     
  • Currently kvmppc_start_thread() tries to wake other SMT threads via
    xics_wake_cpu(). Unfortunately xics_wake_cpu only exists when
    CONFIG_SMP=Y so when compiling with CONFIG_SMP=N we get:

    arch/powerpc/kvm/built-in.o: In function `.kvmppc_start_thread':
    book3s_hv.c:(.text+0xa1e0): undefined reference to `.xics_wake_cpu'

    The following should be fine since kvmppc_start_thread() shouldn't be
    called to start non-zero threads when SMP=N since threads_per_core=1.

    Signed-off-by: Michael Neuling
    Signed-off-by: Alexander Graf

    Michael Neuling
     
  • kvmppc_h_pr is only available if CONFIG_KVM_BOOK3S_64_PR.

    Signed-off-by: Andreas Schwab
    Signed-off-by: Alexander Graf

    Andreas Schwab
     
  • compute_tlbie_rb is only used on ppc64 and cannot be compiled on ppc32.

    Signed-off-by: Andreas Schwab
    Signed-off-by: Alexander Graf

    Andreas Schwab
     
  • Unlike all of the other cpuid bits, the TSC deadline timer bit is set
    unconditionally, regardless of what userspace wants.

    This is broken in several ways:
    - if userspace doesn't use KVM_CREATE_IRQCHIP, and doesn't emulate the TSC
    deadline timer feature, a guest that uses the feature will break
    - live migration to older host kernels that don't support the TSC deadline
    timer will cause the feature to be pulled from under the guest's feet,
    breaking it
    - guests that are broken wrt the feature will fail.

    Fix by not enabling the feature automatically; instead report it to userspace.
    Because the feature depends on KVM_CREATE_IRQCHIP, which we cannot guarantee
    will be called, we expose it via a KVM_CAP_TSC_DEADLINE_TIMER and not
    KVM_GET_SUPPORTED_CPUID.

    Fixes the Illumos guest kernel, which uses the TSC deadline timer feature.

    [avi: add the KVM_CAP + documentation]

    Reported-by: Alexey Zaytsev
    Tested-by: Alexey Zaytsev
    Signed-off-by: Jan Kiszka
    Signed-off-by: Avi Kivity

    Jan Kiszka
     
  • Only allow KVM device assignment to attach to devices which:

    - Are not bridges
    - Have BAR resources (assume others are special devices)
    - The user has permissions to use

    Assigning a bridge is a configuration error, it's not supported, and
    typically doesn't result in the behavior the user is expecting anyway.
    Devices without BAR resources are typically chipset components that
    also don't have host drivers. We don't want users to hold such devices
    captive or cause system problems by fencing them off into an iommu
    domain. We determine "permission to use" by testing whether the user
    has access to the PCI sysfs resource files. By default a normal user
    will not have access to these files, so it provides a good indication
    that an administration agent has granted the user access to the device.
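
    The three conditions listed above combine into a single predicate.
    The sketch below is only a model of that decision: struct
    sketch_pci_dev and its fields are hypothetical, and the real checks
    inspect PCI device state and sysfs file permissions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of the facts the checks depend on. */
struct sketch_pci_dev {
    bool is_bridge;                  /* device is a PCI bridge */
    int nr_bar_resources;            /* number of BAR resources */
    bool user_can_access_resources;  /* user may open the sysfs resource files */
};

/* A device may be assigned only if all three conditions hold. */
bool sketch_may_assign(const struct sketch_pci_dev *dev)
{
    assert(dev != NULL);
    return !dev->is_bridge &&
           dev->nr_bar_resources > 0 &&
           dev->user_can_access_resources;
}
```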

    [Yang Bai: add missing #include]
    [avi: fix comment style]

    Signed-off-by: Alex Williamson
    Signed-off-by: Yang Bai
    Signed-off-by: Marcelo Tosatti

    Alex Williamson
     

25 Dec, 2011

2 commits

  • This option has no users and it exposes a security hole: it
    allows devices to be assigned without iommu protection. Make
    KVM_DEV_ASSIGN_ENABLE_IOMMU a mandatory option.

    Signed-off-by: Alex Williamson
    Signed-off-by: Marcelo Tosatti

    Alex Williamson
     
  • User space may create the PIT and forget about setting up the irqchips.
    In that case, firing PIT IRQs will crash the host:

    BUG: unable to handle kernel NULL pointer dereference at 0000000000000128
    IP: [] kvm_set_irq+0x30/0x170 [kvm]
    ...
    Call Trace:
    [] pit_do_work+0x51/0xd0 [kvm]
    [] process_one_work+0x111/0x4d0
    [] worker_thread+0x152/0x340
    [] kthread+0x7e/0x90
    [] kernel_thread_helper+0x4/0x10

    Prevent this by checking the irqchip mode before starting a timer. We
    can't deny creating the PIT if the irqchips aren't set up yet as
    current user land expects this order to work.

    Signed-off-by: Jan Kiszka
    Signed-off-by: Marcelo Tosatti

    Jan Kiszka