04 Feb, 2010

1 commit

  • hrtimer callbacks are always run from hardirq context, either the
    jiffy tick interrupt or the hrtimer device interrupt.

    [ there is currently one exception that can still call a hrtimer
    callback from softirq, but even in that case this will still
    work correctly. ]

    Reported-by: Wei Yongjun
    Signed-off-by: Peter Zijlstra
    Cc: Yury Polyanskiy
    Tested-by: Wei Yongjun
    Acked-by: David S. Miller
    Signed-off-by: Thomas Gleixner

    Peter Zijlstra
     

03 Feb, 2010

31 commits

  • …s/security-testing-2.6

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6:
    kernel/cred.c: use kmem_cache_free

    Linus Torvalds
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (23 commits)
    connector: Delete buggy notification code.
    be2net: use eq-id to calculate cev-isr reg offset
    Bluetooth: Use the control channel for raw HID reports
    Bluetooth: Add DFU driver for Atheros Bluetooth chipset AR3011
    Bluetooth: Redo checks in IRQ handler for shared IRQ support
    Bluetooth: Fix memory leak in L2CAP
    Bluetooth: Remove double free of SKB pointer in L2CAP
    cdc_ether: Partially revert "usbnet: Set link down initially ..."
    be2net: Fix memset() arg ordering.
    bonding: bond_open error return value
    ixgbe: if ixgbe_copy_dcb_cfg is going to fail learn about it early
    ixgbe: set the correct DCB bit for pg tx settings
    igbvf: fix issue w/ mapped_as_page being left set after unmap
    drivers/net: ks8851_mll ethernet network driver
    be2net: Bug fix to support newer generation of BE ASIC
    starfire: clean up properly if firmware loading fails
    mac80211: fix NULL pointer dereference when ftrace is enabled
    netfilter: ctnetlink: fix expectation mask dump
    ipv6: conntrack: Add member of user to nf_ct_frag6_queue structure
    ath9k: fix eeprom INI values override for 2GHz-only cards
    ...

    Linus Torvalds
     
  • This is the counterpart to cba767175becadc5c4016cceb7bfdd2c7fe722f4
    ("pktcdvd: remove broken dev_t export of class devices"). The device is
    not registered using dev_t, so it should not be destroyed using
    device_destroy, which looks up the device by dev_t. That lookup fails,
    and adding the device again then fails with the "duplicate name" error.
    Fix this by using device_unregister instead of device_destroy.

    Signed-off-by: Thadeu Lima de Souza Cascardo
    Cc: Kay Sievers
    Cc: Peter Osterlund
    Cc: Al Viro
    Cc: Jens Axboe
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Thadeu Lima de Souza Cascardo
     
  • Newly added memory cannot be accessed via /dev/mem, because we do not
    update the variables high_memory, max_pfn and max_low_pfn.

    Add a function update_end_of_memory_vars() to update these variables for
    64-bit kernels.
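
    The bookkeeping described above can be sketched in user space roughly as
    follows. This is only an illustration, not the kernel source: the page
    size, the SKETCH_PAGE_OFFSET base and the plain-integer globals are
    assumptions made so that the example compiles on its own.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12                        /* assumed 4 KiB pages */
#define SKETCH_PAGE_OFFSET 0xffff880000000000ULL    /* assumed linear-map base */

/* Stand-ins for the kernel variables named in the commit message. */
static uint64_t max_pfn;
static uint64_t max_low_pfn;
static uint64_t high_memory;    /* modelled as a plain address */

/* Raise the end-of-memory markers if [start, start + size) ends above them. */
static void update_end_of_memory_vars(uint64_t start, uint64_t size)
{
    uint64_t page_size = 1ULL << SKETCH_PAGE_SHIFT;
    uint64_t end_pfn;

    assert(start + size >= start);  /* the range must not wrap around */

    /* Round the end address up to a whole page frame number. */
    end_pfn = (start + size + page_size - 1) >> SKETCH_PAGE_SHIFT;

    if (end_pfn > max_pfn) {
        max_pfn = end_pfn;
        max_low_pfn = end_pfn;
        high_memory = SKETCH_PAGE_OFFSET + (max_pfn << SKETCH_PAGE_SHIFT);
    }
}
```

    A range that ends below the current max_pfn leaves all three values
    untouched, which is why only the hot-add path needs to call it.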

    [akpm@linux-foundation.org: simplify comment]
    Signed-off-by: Shaohui Zheng
    Cc: Andi Kleen
    Cc: Li Haicheng
    Reviewed-by: Wu Fengguang
    Reviewed-by: KAMEZAWA Hiroyuki
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: "H. Peter Anvin"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Shaohui Zheng
     
  • init_fault_attr_entries() should be init_fault_attr_dentries().

    cleanup_fault_attr_entries() should be cleanup_fault_attr_dentries().

    Signed-off-by: Anton Blanchard
    Acked-by: Akinobu Mita
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Anton Blanchard
     
  • hugetlb_sysfs_add_hstate is called by hugetlb_register_node directly
    during init and also indirectly via sysfs after init.

    This patch removes the __init tag from hugetlb_sysfs_add_hstate.

    Signed-off-by: Jeff Mahoney
    Cc: Lee Schermerhorn
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jeff Mahoney
     
  • Move the ulite_console_setup to the .devinit section since it might be
    called on probe, which is in devinit. Fixes the crash below where the
    uartlite hw is probed after the .init section is freed from the kernel.

    uartlite: ttyUL0 at MMIO 0xc8000100 (irq = 30) is a uartlite
    BUG: unable to handle kernel NULL pointer dereference at (null)
    IP: [] ulite_console_setup+0x6f/0xa8
    *pdpt = 0000000036fb0001 *pde = 0000000000000000
    Oops: 0000 [#1] PREEMPT SMP
    last sysfs file: /sys/devices/pci0000:00/0000:00:1f.1/host0/uevent
    Modules linked in: puffin(+) serio_raw

    Pid: 151, comm: modprobe Not tainted (2.6.31.5-1.0.b1-b1 #1) POULSBO
    EIP: 0060:[] EFLAGS: 00010246 CPU: 0
    EIP is at ulite_console_setup+0x6f/0xa8
    EAX: c16ec824 EBX: c16ec824 ECX: c176719f EDX: 00000000
    ESI: 00000000 EDI: c17b42c4 EBP: f6fd1cf0 ESP: f6fd1cd8
    DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
    Process modprobe (pid: 151, ti=f6fd0000 task=f6fa1020 task.ti=f6fd0000)
    Stack:
    c1031f51 00000000 00000000 00000246 c182237c f7742000 f6fd1d5c c11fd316
    c16ec85c f77420d4 0000001e 00000000 00000000 c1633e78 4f494d4d 63783020
    30303038 00303031 f6fd1d3c c10e0786 f6fd1d48 00000000 f6fd1d48 00000000
    Call Trace:
    [] ? register_console+0xf6/0x1fc
    [] ? uart_add_one_port+0x237/0x2bb
    [] ? sysfs_add_one+0x13/0xd3
    [] ? sysfs_do_create_link+0xba/0xfc
    [] ? ulite_probe+0x198/0x1eb
    [] ? platform_drv_probe+0xc/0xe
    [] ? driver_probe_device+0x79/0x105
    [] ? __device_attach+0x28/0x30
    [] ? bus_for_each_drv+0x3d/0x67
    [] ? device_attach+0x44/0x58
    [] ? __device_attach+0x0/0x30
    [] ? bus_probe_device+0x1f/0x34
    [] ? device_add+0x385/0x4c0
    [] ? _write_unlock+0x8/0x1f
    [] ? platform_device_add+0xd9/0x11c
    [] ? mfd_add_devices+0x165/0x1bc
    [] ? puffin_probe+0x2d0/0x390 [puffin]
    [] ? pci_match_device+0xa0/0xa7
    [] ? local_pci_probe+0xe/0x10
    [] ? pci_device_probe+0x43/0x66
    [] ? driver_probe_device+0x79/0x105
    [] ? __driver_attach+0x43/0x5f
    [] ? bus_for_each_dev+0x3d/0x67
    [] ? driver_attach+0x14/0x16
    [] ? __driver_attach+0x0/0x5f
    [] ? bus_add_driver+0xf9/0x220
    [] ? driver_register+0x8b/0xeb
    [] ? __pci_register_driver+0x43/0x9f
    [] ? __blocking_notifier_call_chain+0x40/0x4c
    [] ? puffin_init+0x0/0x48 [puffin]
    [] ? puffin_init+0x17/0x48 [puffin]
    [] ? do_one_initcall+0x4c/0x131
    [] ? sys_init_module+0xa7/0x1b7
    [] ? syscall_call+0x7/0xb
    Code: 6e 74 00 00 00 92 33 00 00 18 00 0e 01 73 79 6e 63 65 2d 72 65 67 69 73 74 72 79 0c 00 49 32
    00 00 14 00 09 01 61 6c 73 61 2d 69 66 6f 00 00 00 42 37 00 00 10 00 07 01 6b 69 6c 6c 61 6c 6c
    EIP: [] ulite_console_setup+0x6f/0xa8 SS:ESP 0068:f6fd1cd8
    CR2: 0000000000000000

    Signed-off-by: Richard Röjfors
    Acked-by: Peter Korsgaard
    Cc: Alan Cox
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Richard Röjfors
     
  • The probe function passes a pointer to a struct fb_info to
    platform_set_drvdata(), so don't interpret the return value of
    platform_get_drvdata() as a pointer to struct imxfb_info.

    In the original imxfb_info, fbi->backlight_power was NULL, but in
    imxfb_suspend it read as 4, resulting in an oops because imxfb_suspend
    calls imxfb_disable_controller(fbi), which in turn has

        if (fbi->backlight_power)
            fbi->backlight_power(0);
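
    A minimal user-space sketch of the pointer mix-up, assuming heavily
    simplified structures: the struct layouts and the imxfb_from_pdev()
    helper are hypothetical, and the two drvdata accessors are local
    stand-ins for the kernel functions of the same name.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins; the real structures have many more fields. */
struct imxfb_info {
    void (*backlight_power)(int on);
};

struct fb_info {
    int node;   /* placeholder for the fields that precede par */
    void *par;  /* driver-private data, here a struct imxfb_info */
};

struct platform_device {
    void *drvdata;
};

static void platform_set_drvdata(struct platform_device *pdev, void *data)
{
    pdev->drvdata = data;
}

static void *platform_get_drvdata(const struct platform_device *pdev)
{
    return pdev->drvdata;
}

/* Correct retrieval: drvdata is the fb_info, the driver state hangs off par. */
static struct imxfb_info *imxfb_from_pdev(const struct platform_device *pdev)
{
    struct fb_info *info = platform_get_drvdata(pdev);

    assert(info != NULL);
    return info->par;
}
```

    Casting the drvdata pointer straight to struct imxfb_info would instead
    read whatever fb_info field happens to sit at the same offset, which is
    consistent with the bogus non-NULL value described above.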

    Signed-off-by: Uwe Kleine-König
    Acked-by: Sascha Hauer
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Uwe Kleine-König
     
  • In cgroup_create(), if alloc_css_id() returns failure, the errno is not
    propagated to userspace, so mkdir will fail silently.

    To trigger this bug, we mount blkio (or memory subsystem), and create more
    than 65534 cgroups. (The number of cgroups is limited to 65535 if a
    subsystem has use_id == 1)

    # mount -t cgroup -o blkio xxx /mnt
    # for ((i = 0; i < 65534; i++)); do mkdir /mnt/$i; done
    # mkdir /mnt/65534
    (should return ENOSPC)
    #
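
    One plausible shape of such a bug, and of its fix, sketched in user
    space. The id_space structure and every function name here are
    hypothetical and are not taken from kernel/cgroup.c.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical id pool with a hard limit (65535 in the case described). */
struct id_space {
    int in_use;
    int limit;
};

/* Returns 0 on success or a negative errno when the pool is exhausted. */
static int alloc_id_sketch(struct id_space *s)
{
    assert(s != NULL);
    if (s->in_use >= s->limit)
        return -ENOSPC;
    s->in_use++;
    return 0;
}

/* Buggy shape: the failure path never stores the allocator's errno. */
static int create_buggy(struct id_space *s)
{
    int err = 0;

    if (alloc_id_sketch(s) < 0)
        goto out;   /* err is still 0, so the caller sees success */
    return 0;
out:
    return err;
}

/* Fixed shape: keep the allocator's return value and propagate it. */
static int create_fixed(struct id_space *s)
{
    int err = alloc_id_sketch(s);

    if (err < 0)
        goto out;
    return 0;
out:
    return err;
}
```

    With the pool full, create_buggy() reports 0 while create_fixed()
    reports -ENOSPC, which is the value mkdir is expected to surface.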

    Signed-off-by: Li Zefan
    Acked-by: Serge Hallyn
    Acked-by: Paul Menage
    Acked-by: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Li Zefan
     
  • When I use markup_oops.pl to parse an x86_64 oops, I get:

    objdump: --start-address: bad number: NaN
    No matching code found
    This is because:
    main::(./m.pl:228): open(FILE, "objdump -dS --adjust-vma=$vmaoffset --start-address=$decodestart --stop-address=$decodestop $filename |") || die "Cannot start objdump";
    DB p $decodestart
    NaN

    This NaN is from:
    main::(./m.pl:176): my $decodestart = Math::BigInt->from_hex("0x$target") - Math::BigInt->from_hex("0x$func_offset");
    DB p $func_offset
    0x175

    There is already a "0x" in $func_offset; prepending another "0x" makes it a NaN.

    The $func_offset is from line:

    if ($line =~ /RIP: 0010:\[\\] \[\\] ([a-zA-Z0-9\_]+)\+(0x[0-9a-f]+)\/0x[a-f0-9]/) {
    $function = $1;
    $func_offset = $2;
    }

    This patch changes "(0x[0-9a-f]+)\/0x[a-f0-9]/)" to "0x([0-9a-f]+)\/0x[a-f0-9]/)",
    so that the captured offset no longer includes the "0x" prefix.
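
    The same capture idea, sketched in C with POSIX regex instead of Perl.
    The parse_func_offset() helper and the shortened pattern are
    assumptions made for illustration; only the placement of the capture
    group mirrors the patch. Note that the NaN itself is specific to
    Math::BigInt, so this shows the capture, not the failure.

```c
#include <assert.h>
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/*
 * Extract the function offset from text such as "foo+0x175/0x2a0".
 * The second capture group excludes the "0x" prefix.
 * Returns 0 on success, -1 if the line does not match.
 */
static int parse_func_offset(const char *line, unsigned long *offset)
{
    regex_t re;
    regmatch_t m[3];
    char digits[32];
    size_t len;
    int rc = -1;

    assert(line != NULL && offset != NULL);

    if (regcomp(&re, "([a-zA-Z0-9_]+)\\+0x([0-9a-f]+)/0x[a-f0-9]",
                REG_EXTENDED) != 0)
        return -1;

    if (regexec(&re, line, 3, m, 0) == 0) {
        len = (size_t)(m[2].rm_eo - m[2].rm_so);
        if (len < sizeof(digits)) {
            memcpy(digits, line + m[2].rm_so, len);
            digits[len] = '\0';
            /* The base is given explicitly, so no prefix is needed. */
            *offset = strtoul(digits, NULL, 16);
            rc = 0;
        }
    }
    regfree(&re);
    return rc;
}
```

    Because the prefix stays outside the group, the caller can add exactly
    one "0x" (or pass an explicit base) without doubling it.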

    Signed-off-by: Hui Zhu
    Cc: Arjan van de Ven
    Cc: Michal Marek
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hui Zhu
     
  • When git has been set to always use color in .gitconfig then I get the
    warning message

    Bad divisor in main::vcs_assign: 0

    This is caused by vcs_file_signoffs not matching any commits because the
    pattern does not understand the colour codes. Fix this by telling git log
    to never use colour.

    Signed-off-by: Richard Kennedy
    Acked-by: Joe Perches
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Richard Kennedy
     
  • write_kmem() used to assume that vwrite() always returns the full buffer
    length. However, vwrite() can now return 0 to indicate a memory hole,
    in which case "buf" is not advanced accordingly.

    Fix it by simply ignoring the return value, and hence the memory hole.
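
    A small user-space sketch of the cursor handling, assuming a
    hypothetical vwrite_sketch() that returns 0 for a hole. The names and
    the chunking scheme are illustrative only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for vwrite(): copies into dst unless the chunk is a
 * hole, in which case it copies nothing and returns 0.
 */
static size_t vwrite_sketch(char *dst, const char *src, size_t n, int is_hole)
{
    if (is_hole)
        return 0;
    memcpy(dst, src, n);
    return n;
}

/*
 * Write len bytes in chunk-sized pieces. hole_chunk is the index of the chunk
 * to treat as a hole (or -1 for none). The cursor advances by the chunk size
 * whatever vwrite_sketch() returns, so later chunks stay aligned.
 */
static size_t write_chunks(char *dst, const char *buf, size_t len,
                           size_t chunk, long hole_chunk)
{
    size_t done = 0;
    long idx = 0;

    assert(chunk > 0);
    while (done < len) {
        size_t n = len - done < chunk ? len - done : chunk;

        (void)vwrite_sketch(dst + done, buf + done, n, idx == hole_chunk);
        done += n;  /* advance past the hole as well */
        idx++;
    }
    return done;
}
```

    Advancing by the value returned from the write helper instead would
    leave the source and destination cursors stuck at the hole.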

    Signed-off-by: Wu Fengguang
    Cc: Andi Kleen
    Cc: Benjamin Herrenschmidt
    Cc: Christoph Lameter
    Cc: Ingo Molnar
    Cc: Tejun Heo
    Cc: Nick Piggin
    Cc: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wu Fengguang
     
  • Otherwise vmalloc_to_page() will BUG().

    This also makes the kmem read/write implementation aligned with mem(4):
    "References to nonexistent locations cause errors to be returned." Here we
    return -ENXIO (inspired by Hugh) if no bytes have been transferred to/from
    user space, otherwise return partial read/write results.
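
    The return-value policy can be sketched as below. For brevity the
    sketch counts whole pages rather than bytes, and the validity map and
    function name are assumptions rather than kernel code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/*
 * Walk count pages starting at first, stopping at the first page that is out
 * of range or marked invalid. Returns -ENXIO if nothing could be transferred,
 * otherwise the number of pages handled (a full or partial result).
 */
static ssize_t read_pages_sketch(const unsigned char *valid, size_t npages,
                                 size_t first, size_t count)
{
    size_t done = 0;

    assert(valid != NULL);
    while (done < count) {
        size_t page = first + done;

        if (page >= npages || !valid[page])
            break;
        done++;
    }
    if (done == 0 && count > 0)
        return -ENXIO;      /* nothing transferred at all */
    return (ssize_t)done;   /* full or partial result */
}
```

    A caller that receives a short count can retry from the next offset
    and will then see -ENXIO for the invalid location itself.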

    Signed-off-by: KAMEZAWA Hiroyuki
    Signed-off-by: Wu Fengguang
    Cc: Greg Kroah-Hartman
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     
  • A cache alias problem occurs if changes made through a user shared
    mapping are not flushed before copying: the user and kernel mappings may
    land in two different cache lines, and coherence cannot be guaranteed
    after iov_iter_copy_from_user_atomic. So the right steps should be:

    flush_dcache_page(page);
    kmap_atomic(page);
    write to page;
    kunmap_atomic(page);
    flush_dcache_page(page);

    More precisely, we might create two new APIs flush_dcache_user_page and
    flush_dcache_kern_page to replace the two flush_dcache_page accordingly.

    Here is a snippet tested on omap2430 with VIPT cache, and I think it is
    not ARM-specific:

    int val = 0x11111111;
    fd = open("abc", O_RDWR);
    addr = mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
    *(addr+0) = 0x44444444;
    tmp = *(addr+0);
    *(addr+1) = 0x77777777;
    write(fd, &val, sizeof(int));
    close(fd);

    The results are not always 0x11111111 0x77777777 at the beginning as expected. Sometimes we see 0x44444444 0x77777777.
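
    A self-contained variant of the snippet, as a sketch: the
    alias_check() wrapper, the ftruncate() call and the read-back through
    pread() are additions for illustration and were not part of the
    original test. On hardware without cache aliasing the first word is
    expected to be the value passed to write(); the commit message reports
    that this does not always hold on the affected platform.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Store through a shared mapping, then write(2) to the same page, and read
 * the first two words of the file back into out[]. Returns 0 on success and
 * -1 if any system call fails.
 */
static int alias_check(const char *path, unsigned int out[2])
{
    unsigned int val = 0x11111111;
    volatile unsigned int *addr;
    unsigned int tmp;
    int fd, rc = -1;

    assert(path != NULL);
    fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return -1;
    /* The mapping must be backed by real file pages. */
    if (ftruncate(fd, 4096) != 0)
        goto out_close;

    addr = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (addr == MAP_FAILED)
        goto out_close;

    addr[0] = 0x44444444;
    tmp = addr[0];      /* touch the word through the user mapping */
    (void)tmp;
    addr[1] = 0x77777777;

    /* Kernel-side write to offset 0 of the same page. */
    if (write(fd, &val, sizeof(val)) != (ssize_t)sizeof(val))
        goto out_unmap;
    /* Read the first two words back through the file, not the mapping. */
    if (pread(fd, out, 2 * sizeof(out[0]), 0) == (ssize_t)(2 * sizeof(out[0])))
        rc = 0;
out_unmap:
    munmap((void *)addr, 4096);
out_close:
    close(fd);
    return rc;
}
```

    Calling it with a scratch file path and printing out[0] and out[1] in
    hex reproduces the comparison made in the message above.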

    Signed-off-by: Anfei
    Cc: Russell King
    Cc: Miklos Szeredi
    Cc: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    anfei zhou
     
  • Fix kfifo kernel-doc warnings:

    Warning(kernel/kfifo.c:361): No description found for parameter 'total'
    Warning(kernel/kfifo.c:402): bad line: @ @lenout: pointer to output variable with copied data
    Warning(kernel/kfifo.c:412): No description found for parameter 'lenout'

    Signed-off-by: Randy Dunlap
    Cc: Stefani Seibold
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Randy Dunlap
     
  • Add missing braces for multiline 'if' statements in fm3130_probe.

    Signed-off-by: Sergey Matyukevich
    Signed-off-by: Alessandro Zummo
    Cc: Sergey Lapin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sergey Matyukevich
     
  • Fix the kernel oops when dev_dbg is called with mx3_fbi->txd == NULL.

    Fix the late initialisation of mx3fb->backlight_level. Otherwise, in the
    chain of functions started by init_fb_chan(), __blank() calls
    sdc_set_brightness(mx3fb, mx3fb->backlight_level), which shuts down the
    CONTRAST PWM output.

    Signed-off-by: Alberto Panizzo
    Acked-by: Guennadi Liakhovetski
    Cc: Sascha Hauer
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alberto Panizzo
     
  • Eric Paris located a bug in idr. With IDR_BITS of 6, it grows to three
    layers when id 4096 is first allocated. When that happens, idr wraps
    incorrectly and searches the idr array ignoring the high bits. The
    following test code from Eric demonstrates the bug nicely.

    #include <linux/idr.h>
    #include <linux/kernel.h>
    #include <linux/module.h>

    static DEFINE_IDR(test_idr);

    int init_module(void)
    {
        int ret, forty95, forty96;
        void *addr;

        /* add 2 entries both with 4095 as the start address */
    again1:
        if (!idr_pre_get(&test_idr, GFP_KERNEL))
            return -ENOMEM;
        ret = idr_get_new_above(&test_idr, (void *)4095, 4095, &forty95);
        if (ret) {
            if (ret == -EAGAIN)
                goto again1;
            return ret;
        }
        if (forty95 != 4095)
            printk(KERN_ERR "hmmm, forty95=%d\n", forty95);

    again2:
        if (!idr_pre_get(&test_idr, GFP_KERNEL))
            return -ENOMEM;
        ret = idr_get_new_above(&test_idr, (void *)4096, 4095, &forty96);
        if (ret) {
            if (ret == -EAGAIN)
                goto again2;
            return ret;
        }
        if (forty96 != 4096)
            printk(KERN_ERR "hmmm, forty96=%d\n", forty96);

        /* try to find the 2 entries, noticing that 4096 broke */
        addr = idr_find(&test_idr, forty95);
        if ((int)addr != forty95)
            printk(KERN_ERR "hmmm, after find forty95=%d addr=%d\n", forty95, (int)addr);
        addr = idr_find(&test_idr, forty96);
        if ((int)addr != forty96)
            printk(KERN_ERR "hmmm, after find forty96=%d addr=%d\n", forty96, (int)addr);
        /* really weird, the entry which should be at 4096 is actually at 0!! */
        addr = idr_find(&test_idr, 0);
        if ((int)addr)
            printk(KERN_ERR "found an entry at id=0 for addr=%d\n", (int)addr);

        idr_remove(&test_idr, forty95);
        idr_remove(&test_idr, forty96);

        return 0;
    }

    void cleanup_module(void)
    {
    }

    MODULE_AUTHOR("Eric Paris");
    MODULE_DESCRIPTION("Simple idr test");
    MODULE_LICENSE("GPL");

    This happens because when sub_alloc() back tracks it doesn't always do it
    step-by-step while the over-the-limit detection assumes step-by-step
    backtracking. The logic in sub_alloc() looks like the following.

    restart:
      clear pa[top level + 1] for end cond detection
      l = top level
      while (true) {
          search for empty slot at this level
          if (not found) {
              push id to the next possible value
              l++
    A:        if (pa[l] is clear)
                  failed, return asking caller to grow the tree
              if (going up 1 level gives more slots to search)
                  continue the while loop above with the incremented l
              else
    C:            goto restart
          }
          adjust id accordingly to the found slot
          if (l == 0)
              return found id;
          create lower level if not there yet
          record pa[l] and l--
      }

    Test A is the fail exit condition, but it assumes that failure is
    propagated upward one level at a time. The C optimization path breaks
    that assumption and restarts the whole thing with a start value which is
    above the possible limit with the current layers. sub_alloc() assumes the
    start id value is inside the limit when called and test A is the only exit
    condition check, so it ends up searching for an empty slot while ignoring
    the high set bit.

    So, for 4095->4096 test, level0 search fails but pa[1] contains a valid
    pointer. However, going up 1 level wouldn't give any more empty slot so
    it takes C and when the whole thing restarts nobody notices the high bit
    set beyond the top level.

    This patch fixes the bug by changing the fail exit condition check to full
    id limit check.
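
    The arithmetic behind the 4095->4096 case can be sketched as follows.
    The helper names are hypothetical; only the IDR_BITS value of 6 comes
    from the message above.

```c
#include <assert.h>

#define IDR_BITS_SKETCH 6   /* value quoted in the commit message */

/* Number of ids addressable by a tree with the given number of layers. */
static long idr_capacity(int layers)
{
    assert(layers >= 0 && layers * IDR_BITS_SKETCH < 31);
    return 1L << (layers * IDR_BITS_SKETCH);
}

/* Full id limit check: nonzero if id cannot live in a tree this tall. */
static int id_over_limit(long id, int layers)
{
    return id >= idr_capacity(layers);
}

/* What a search effectively sees if bits above the top layer are dropped. */
static long id_with_high_bits_ignored(long id, int layers)
{
    return id & (idr_capacity(layers) - 1);
}
```

    With two layers the capacity is 4096, so id 4096 is over the limit and
    the tree has to grow; dropping its high bit instead maps it to 0, which
    matches the stray entry the test module finds at id 0.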

    Based-on-patch-from: Eric Paris
    Reported-by: Eric Paris
    Signed-off-by: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tejun Heo
     
  • On Tue, Feb 02, 2010 at 02:57:14PM -0800, Greg KH (gregkh@suse.de) wrote:
    > > There are at least two ways to fix it: using a big cannon and a small
    > > one. The former way is to disable notification registration, since it is
    > > not used by anyone at all. Second way is to check whether calling
    > > process is root and its destination group is -1 (kind of privileged
    > > one) before command is dispatched to workqueue.
    >
    > Well if no one is using it, removing it makes the most sense, right?
    >
    > No objection from me, care to make up a patch either way for this?

    Given that it is not used, let's drop support for notifications about
    (un)registered events from connector.
    Another option was to check credentials on receive, but we can always
    restore the feature, without the bug, if it is needed. Genetlink has a
    wider code base, and no one has complained there that userspace cannot
    get a notification when other clients are (un)registered.

    Kudos to Sebastian Krahmer, who found a bug in the code.

    Signed-off-by: Evgeniy Polyakov
    Acked-by: Greg Kroah-Hartman
    Signed-off-by: David S. Miller

    Evgeniy Polyakov
     
  • Free memory allocated using kmem_cache_zalloc using kmem_cache_free rather
    than kfree.

    The semantic patch that makes this change is as follows:
    (http://coccinelle.lip6.fr/)

    // <smpl>
    @@
    expression x,E,c;
    @@

     x = \(kmem_cache_alloc\|kmem_cache_zalloc\|kmem_cache_alloc_node\)(c,...)
     ... when != x = E
         when != &x
    ?-kfree(x)
    +kmem_cache_free(c,x)
    // </smpl>

    Signed-off-by: Julia Lawall
    Acked-by: David Howells
    Cc: James Morris
    Cc: Steve Dickson
    Signed-off-by: Andrew Morton
    Signed-off-by: James Morris

    Julia Lawall
     
  • * 'for-linus' of git://git.kernel.dk/linux-2.6-block:
    cfq-iosched: Do not idle on async queues
    blk-cgroup: Fix potential deadlock in blk-cgroup
    block: fix bugs in bio-integrity mempool usage
    block: fix bio_add_page for non trivial merge_bvec_fn case
    drbd: null dereference bug
    drbd: fix max_segment_size initialization

    Linus Torvalds
     
  • Improve handling of fragmented per-CPU vmaps. Previously, a per-CPU map
    was not freed until all of its addresses had been used and freed. So
    fragmented blocks could fill up vmalloc space even if they actually had
    no active vmap regions within them.

    Add some logic to allow all CPUs to have these blocks purged in the case
    of failure to allocate a new vm area, and also put some logic to trim
    such blocks of a current CPU if we hit them in the allocation path (so
    as to avoid a large build up of them).

    Christoph reported some vmap allocation failures when using the per CPU
    vmap APIs in XFS, which cannot be reproduced after this patch and the
    previous bug fix.

    Cc: linux-mm@kvack.org
    Cc: stable@kernel.org
    Tested-by: Christoph Hellwig
    Signed-off-by: Nick Piggin
    --
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • RCU list walking of the per-cpu vmap cache was broken. It did not use
    RCU primitives, and also the union of free_list and rcu_head is
    obviously wrong (because free_list is indeed the list we are RCU
    walking).

    While we are there, remove a couple of unused fields from an earlier
    iteration.

    These APIs aren't actually used anywhere, because of problems with the
    XFS conversion. Christoph has now verified that the problems are solved
    with these patches. Also it is an exported interface, so I think it
    will be good to be merged now (and Christoph wants to get the XFS
    changes into their local tree).

    Cc: stable@kernel.org
    Cc: linux-mm@kvack.org
    Tested-by: Christoph Hellwig
    Signed-off-by: Nick Piggin
    --
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
    random: Remove unused inode variable
    crypto: padlock-sha - Add import/export support
    random: drop weird m_time/a_time manipulation

    Linus Torvalds
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-fixes:
    GFS2: Use GFP_NOFS for alloc structure
    GFS2: Fix previous patch
    GFS2: Don't withdraw on partial rindex entries
    GFS2: Fix refcnt leak on gfs2_follow_link() error path

    Linus Torvalds
     
  • * 'sh/for-2.6.33' of git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6:
    sh: Fix access to released memory in clk_debugfs_register_one()
    sh: Fix access to released memory in dwarf_unwinder_cleanup()
    usb: r8a66597-hdc disable interrupts fix
    spi: spi_sh_msiof: Fixed data sampling on the correct edge

    Linus Torvalds
     
  • * 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus:
    MIPS: 64-bit: Detect virtual memory size
    MIPS: AR7: Fix USB slave mem range typo
    MIPS: Alchemy: Fix dbdma ring destruction memory debugcheck.

    Linus Torvalds
     
  • Commit 221af7f87b9 ("Split 'flush_old_exec' into two functions") split
    the function at the point of no return - ie right where there were no
    more error cases to check. That made sense from a technical standpoint,
    but when we then also combined it with the actual personality setting
    going in between flush_old_exec() and setup_new_exec(), it needs to be a
    bit more careful.

    In particular, we need to make sure that we really flush the old
    personality bits in the 'flush' stage, rather than later in the 'setup'
    stage, since otherwise we might be flushing the _new_ personality state
    that we're just setting up.

    So this moves the flags and personality flushing (and 'flush_thread()',
    which is the arch-specific function that generally resets lazy FP state
    etc) of the old process into flush_old_exec(), so that it doesn't affect
    any state that execve() is setting up for the new process environment.
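
    The ordering issue can be illustrated with a toy model. Everything in
    this sketch is hypothetical: the task structure, the constants and the
    function names only mimic the flush-then-setup sequence described
    above and do not reflect the real exec code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified task state. */
struct task_sketch {
    unsigned int personality;
};

#define PER_OLD_SKETCH 0x1u
#define PER_NEW_SKETCH 0x2u

/* 'flush' stage: drop state inherited from the old program. */
static void flush_old(struct task_sketch *t)
{
    assert(t != NULL);
    t->personality = 0;
}

/* Binary-format handler step: choose the personality for the new program. */
static void set_new_personality(struct task_sketch *t)
{
    assert(t != NULL);
    t->personality = PER_NEW_SKETCH;
}

/* Order after the fix: flush first, then set up the new state. */
static unsigned int exec_fixed_order(struct task_sketch *t)
{
    flush_old(t);
    set_new_personality(t);
    return t->personality;
}

/* Problem order: flushing after the new state is set wipes it out. */
static unsigned int exec_problem_order(struct task_sketch *t)
{
    set_new_personality(t);
    flush_old(t);
    return t->personality;
}
```

    In the problem order the freshly chosen personality is lost, which is
    the kind of clobbering the commit avoids by flushing in the earlier
    stage.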

    This was reported by Michal Simek as breaking his Microblaze qemu
    environment.

    Reported-and-tested-by: Michal Simek
    Cc: Peter Anvin
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     
  • A few weeks back, Shaohua Li posted a similar patch. I am reposting it
    with more test results.

    This patch does two things.

    - Do not idle on async queues.

    - It also changes the write queue depth CFQ drives (cfq_may_dispatch()).
    Currently, we seem to be driving a queue depth of 1 always for WRITES.
    This is true even if there is only one write queue in the system; all the
    logic of infinite queue depth in case of a single busy queue, as well as
    slowly increasing queue depth based on the last delayed sync request,
    does not seem to be kicking in at all.

    This patch will allow deeper WRITE queue depths (subject to the other
    WRITE queue depth constraints like cfq_quantum and last delayed sync
    request).

    Shaohua Li had reported getting more out of his SSD. For me, I have got
    one Lun exported from an HP EVA and when pure buffered writes are on, I
    can get more out of the system. Following are test results of pure
    buffered writes (with end_fsync=1) with vanilla and patched kernel. These
    results are the average of 3 sets of runs with increasing number of
    threads.

    AVERAGE[bufwfs][vanilla]
    -------
    job     Set  NR  ReadBW(KB/s)  MaxClat(us)  WriteBW(KB/s)  MaxClat(us)
    ---     ---  --  ------------  -----------  -------------  -----------
    bufwfs  3    1   0             0            95349          474141
    bufwfs  3    2   0             0            100282         806926
    bufwfs  3    4   0             0            109989         2.7301e+06
    bufwfs  3    8   0             0            116642         3762231
    bufwfs  3    16  0             0            118230         6902970

    AVERAGE[bufwfs] [patched kernel]
    -------
    bufwfs  3    1   0             0            270722         404352
    bufwfs  3    2   0             0            206770         1.06552e+06
    bufwfs  3    4   0             0            195277         1.62283e+06
    bufwfs  3    8   0             0            260960         2.62979e+06
    bufwfs  3    16  0             0            299260         1.70731e+06

    I also ran buffered writes along with some sequential reads and some
    buffered reads going on in the system on a SATA disk, because the
    potential risk is that driving the queue depth higher in the presence of
    sync IO could push the max clat up.

    With some random and sequential reads going on in the system on one SATA
    disk I did not see any significant increase in max clat. So it looks like
    other WRITE queue depth control logic is doing its job. Here are the
    results.

    AVERAGE[brr, bsr, bufw together] [vanilla]
    -------
    job     Set  NR  ReadBW(KB/s)  MaxClat(us)  WriteBW(KB/s)  MaxClat(us)
    ---     ---  --  ------------  -----------  -------------  -----------
    brr     3    1   850           546345       0              0
    bsr     3    1   14650         729543       0              0
    bufw    3    1   0             0            23908          8274517

    brr     3    2   981.333       579395       0              0
    bsr     3    2   14149.7       1175689      0              0
    bufw    3    2   0             0            21921          1.28108e+07

    brr     3    4   898.333       1.75527e+06  0              0
    bsr     3    4   12230.7       1.40072e+06  0              0
    bufw    3    4   0             0            19722.3        2.4901e+07

    brr     3    8   900           3160594      0              0
    bsr     3    8   9282.33       1.91314e+06  0              0
    bufw    3    8   0             0            18789.3        23890622

    AVERAGE[brr, bsr, bufw mixed] [patched kernel]
    -------
    job     Set  NR  ReadBW(KB/s)  MaxClat(us)  WriteBW(KB/s)  MaxClat(us)
    ---     ---  --  ------------  -----------  -------------  -----------
    brr     3    1   837           417973       0              0
    bsr     3    1   14357.7       591275       0              0
    bufw    3    1   0             0            24869.7        8910662

    brr     3    2   1038.33       543434       0              0
    bsr     3    2   13351.3       1205858      0              0
    bufw    3    2   0             0            18626.3        13280370

    brr     3    4   913           1.86861e+06  0              0
    bsr     3    4   12652.3       1430974      0              0
    bufw    3    4   0             0            15343.3        2.81305e+07

    brr     3    8   890           2.92695e+06  0              0
    bsr     3    8   9635.33       1.90244e+06  0              0
    bufw    3    8   0             0            17200.3        24424392

    So it looks like it might make sense to include this patch.

    Thanks
    Vivek

    Signed-off-by: Vivek Goyal
    Signed-off-by: Jens Axboe

    Vivek Goyal
     
  • Linux kernel 2.6.32 and later allocate address space from the top of the
    kernel virtual memory address space.

    This patch implements virtual memory size detection for 64 bit MIPS CPUs
    to avoid resulting crashes.

    Signed-off-by: Guenter Roeck
    Cc: linux-mips@linux-mips.org
    Patchwork: http://patchwork.linux-mips.org/patch/935/
    Reviewed-by: David Daney
    Signed-off-by: Ralf Baechle

    Guenter Roeck
     
  • David S. Miller
     

02 Feb, 2010

8 commits