21 Jan, 2011

40 commits

  • Convert the irq chips to the new functions and use proper flow
    handlers. handle_level_irq is appropriate.

    Signed-off-by: Thomas Gleixner
    Cc: Hirokazu Takata
    Cc: Paul Mundt

    Thomas Gleixner
     
  • Convert the irq chips to the new functions and use proper flow
    handlers. handle_level_irq is appropriate.

    Signed-off-by: Thomas Gleixner
    Cc: Hirokazu Takata
    Cc: Paul Mundt

    Thomas Gleixner
     
  • Convert the irq chip to the new functions and use proper flow
    handlers. handle_level_irq is appropriate.

    Signed-off-by: Thomas Gleixner
    Cc: Hirokazu Takata
    Cc: Paul Mundt

    Thomas Gleixner
     
  • Convert the irq chips to the new functions and use proper flow
    handlers. handle_level_irq is appropriate.

    Signed-off-by: Thomas Gleixner
    Cc: Hirokazu Takata
    Cc: Paul Mundt

    Thomas Gleixner
     
  • Convert the irq chips to the new functions and use proper flow
    handlers. handle_level_irq is appropriate.

    Signed-off-by: Thomas Gleixner
    Cc: Hirokazu Takata
    Cc: Paul Mundt

    Thomas Gleixner
     
  • Convert the irq chips to the new functions and use proper flow
    handlers. handle_level_irq is appropriate.

    Signed-off-by: Thomas Gleixner
    Cc: Hirokazu Takata
    Cc: Paul Mundt

    Thomas Gleixner
     
  • Convert the irq chips to the new functions and use proper flow
    handlers. handle_level_irq is appropriate.

    Signed-off-by: Thomas Gleixner
    Cc: Hirokazu Takata
    Cc: Paul Mundt

    Thomas Gleixner
     
  • The irq descriptors are already initialized by the generic
    code. Remove the redundant init code and set the irq chip with the
    proper accessor function.

    Signed-off-by: Thomas Gleixner
    Cc: Hirokazu Takata
    Cc: Paul Mundt

    Thomas Gleixner
     
  • Use the generic irq Kconfig. Select GENERIC_HARDIRQS_NO_DEPRECATED as
    we have converted all irq_chip functions. Fix the fallout in
    show_interrupts().

    Signed-off-by: Thomas Gleixner
    Cc: Mikael Starvik

    Thomas Gleixner
     
  • Convert the irq chip functions and install handle_simple_irq for each
    interrupt to get rid of __do_IRQ()

    Signed-off-by: Thomas Gleixner
    Cc: Mikael Starvik

    Thomas Gleixner
     
  • Convert the irq_chip functions and install handle_simple_irq for each
    interrupt. This converts V10 to the flow handling and lets us remove
    __do_IRQ().

    Signed-off-by: Thomas Gleixner
    Cc: Mikael Starvik

    Thomas Gleixner
     
  • Use the wrapper around __do_IRQ() so we can convert V10 and V32
    separately.

    Signed-off-by: Thomas Gleixner
    Cc: Mikael Starvik

    Thomas Gleixner
     
  • Switch to the generic irq Kconfig. h8300 has all irq chips converted
    to the new functions, so select the GENERIC_HARDIRQS_NO_DEPRECATED
    switch as well. Fixup the resulting fallout in show_interrupts().

    Signed-off-by: Thomas Gleixner
    Cc: Yoshinori Sato
    Cc: Paul Mundt

    Thomas Gleixner
     
  • __do_IRQ is deprecated so h8300 needs to be converted to proper flow
    handling. The irq chip is simple and does not require any
    mask/ack/eoi functions, so we can use handle_simple_irq.

    Signed-off-by: Thomas Gleixner
    Cc: Yoshinori Sato
    Cc: Paul Mundt

    Thomas Gleixner
     
  • No functional change, just a straightforward conversion.

    Signed-off-by: Thomas Gleixner
    Cc: Yoshinori Sato
    Cc: Paul Mundt

    Thomas Gleixner
     
  • …/git/tip/linux-2.6-tip

    * 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    smp: Allow on_each_cpu() to be called while early_boot_irqs_disabled
    lockdep: Move early boot local IRQ enable/disable status to init/main.c

    Linus Torvalds
     
  • It turns out that some device drivers map pages from the ACPI NVS region
    during resume using ioremap(), which conflicts with ioremap_cache() used
    for mapping those pages by the NVS save/restore code in nvs.c.

    Make the NVS pages mapped by the code in nvs.c be unmapped before device
    drivers' resume routines run.

    Signed-off-by: Rafael J. Wysocki
    Signed-off-by: Linus Torvalds

    Rafael J. Wysocki
     
  • Commit ca9b600be38c ("ACPI / PM: Make suspend_nvs_save() use
    acpi_os_map_memory()") attempted to prevent the code in osl.c and nvs.c
    from using different ioremap() variants by making the latter use
    acpi_os_map_memory() for mapping the NVS pages. However, that also
    requires acpi_os_unmap_memory() to be used for unmapping them, which
    causes synchronize_rcu() to be executed many times in a row
    unnecessarily and introduces substantial delays during resume on some
    systems.

    Instead of using acpi_os_map_memory() for mapping the NVS pages in nvs.c
    introduce acpi_os_ioremap() calling ioremap_cache() and make the code in
    both osl.c and nvs.c use it.

    Reported-by: Jeff Chua
    Signed-off-by: Rafael J. Wysocki
    Signed-off-by: Linus Torvalds

    Rafael J. Wysocki
     
  • * akpm:
    kernel/smp.c: consolidate writes in smp_call_function_interrupt()
    kernel/smp.c: fix smp_call_function_many() SMP race
    memcg: correctly order reading PCG_USED and pc->mem_cgroup
    backlight: fix 88pm860x_bl macro collision
    drivers/leds/ledtrig-gpio.c: make output match input, tighten input checking
    MAINTAINERS: update Atmel AT91 entry
    mm: fix truncate_setsize() comment
    memcg: fix rmdir, force_empty with THP
    memcg: fix LRU accounting with THP
    memcg: fix USED bit handling at uncharge in THP
    memcg: modify accounting function for supporting THP better
    fs/direct-io.c: don't try to allocate more than BIO_MAX_PAGES in a bio
    mm: compaction: prevent division-by-zero during user-requested compaction
    mm/vmscan.c: remove duplicate include of compaction.h
    memblock: fix memblock_is_region_memory()
    thp: keep highpte mapped until it is no longer needed
    kconfig: rename CONFIG_EMBEDDED to CONFIG_EXPERT

    Linus Torvalds
     
  • We have to test the cpu mask in the interrupt handler before checking the
    refs, otherwise we can start to follow an entry before it's deleted and
    find it partially initialized for the next trip. Presently we also clear
    the cpumask bit before executing the called function, which implies
    getting write access to the line. After the function is called we then
    decrement refs, and if they go to zero we then unlock the structure.

    However, this implies getting write access to the call function data
    before and after the function is called. If we can assert that no
    smp_call_function execution function is allowed to enable interrupts, then
    we can move both writes to after the function is called, hopefully allowing
    both writes with one cache line bounce.
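
    The ordering described above can be sketched in userspace C. This is
    only an illustration under stated assumptions: C11 atomics stand in for
    the kernel's cpumask and atomic_t helpers, acquire loads stand in for the
    read barrier, and the names call_data_sketch, handler_should_run() and
    handler_finish() are hypothetical. The real handler in kernel/smp.c does
    considerably more.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-cpu call function data. */
struct call_data_sketch {
	atomic_ulong cpumask;	/* one bit per cpu that still has to run func */
	atomic_int refs;	/* number of cpus that have not finished yet */
};

/*
 * Decide whether this cpu should run the queued function. The mask is
 * tested first and refs second, so an entry that has been completed and
 * is being reused is skipped instead of followed.
 */
static bool handler_should_run(struct call_data_sketch *d, unsigned int cpu)
{
	unsigned long mask = atomic_load_explicit(&d->cpumask,
						  memory_order_acquire);

	if (!(mask & (1UL << cpu)))
		return false;
	if (atomic_load_explicit(&d->refs, memory_order_acquire) == 0)
		return false;
	return true;
}

/*
 * Called after the queued function has returned: both writes to the
 * shared entry now happen back to back. Returns true when this cpu
 * dropped the last reference.
 */
static bool handler_finish(struct call_data_sketch *d, unsigned int cpu)
{
	atomic_fetch_and(&d->cpumask, ~(1UL << cpu));
	int refs = atomic_fetch_sub(&d->refs, 1) - 1;

	assert(refs >= 0);	/* going negative would be the bug */
	return refs == 0;
}
```

    A caller would check handler_should_run(), invoke the function, and only
    then call handler_finish(), which is why the called function must not
    enable interrupts in between.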

    On a 256 thread system with a kernel compiled for 1024 threads, the time
    to execute testcase in the "smp_call_function_many race" changelog was
    reduced by about 30-40ms out of about 545 ms.

    I decided to keep this as WARN because it's now a buggy function, even
    though the stack trace is of no value -- a simple printk would give us the
    information needed.

    Raw data:

    Without patch:
    ipi_test startup took 1219366ns complete 539819014ns total 541038380ns
    ipi_test startup took 1695754ns complete 543439872ns total 545135626ns
    ipi_test startup took 7513568ns complete 539606362ns total 547119930ns
    ipi_test startup took 13304064ns complete 533898562ns total 547202626ns
    ipi_test startup took 8668192ns complete 544264074ns total 552932266ns
    ipi_test startup took 4977626ns complete 548862684ns total 553840310ns
    ipi_test startup took 2144486ns complete 541292318ns total 543436804ns
    ipi_test startup took 21245824ns complete 530280180ns total 551526004ns

    With patch:
    ipi_test startup took 5961748ns complete 500859628ns total 506821376ns
    ipi_test startup took 8975996ns complete 495098924ns total 504074920ns
    ipi_test startup took 19797750ns complete 492204740ns total 512002490ns
    ipi_test startup took 14824796ns complete 487495878ns total 502320674ns
    ipi_test startup took 11514882ns complete 494439372ns total 505954254ns
    ipi_test startup took 8288084ns complete 502570774ns total 510858858ns
    ipi_test startup took 6789954ns complete 493388112ns total 500178066ns

    #include <linux/module.h>
    #include <linux/workqueue.h>
    #include <linux/sched.h> /* sched clock */

    #define ITERATIONS 100

    static void do_nothing_ipi(void *dummy)
    {
    }

    /* Each worker sends ITERATIONS IPIs to all other cpus and waits for them. */
    static void do_ipis(struct work_struct *dummy)
    {
            int i;

            for (i = 0; i < ITERATIONS; i++)
                    smp_call_function(do_nothing_ipi, NULL, 1);

            printk(KERN_DEBUG "cpu %d finished\n", smp_processor_id());
    }

    static struct work_struct work[NR_CPUS];

    static int __init testcase_init(void)
    {
            int cpu;
            u64 start, started, done;

            /* Queue one worker per online cpu, then time how long they take. */
            start = local_clock();
            for_each_online_cpu(cpu) {
                    INIT_WORK(&work[cpu], do_ipis);
                    schedule_work_on(cpu, &work[cpu]);
            }
            started = local_clock();
            for_each_online_cpu(cpu)
                    flush_work(&work[cpu]);
            done = local_clock();
            pr_info("ipi_test startup took %lldns complete %lldns total %lldns\n",
                    started-start, done-started, done-start);

            return 0;
    }

    static void __exit testcase_exit(void)
    {
    }

    module_init(testcase_init)
    module_exit(testcase_exit)
    MODULE_LICENSE("GPL");
    MODULE_AUTHOR("Anton Blanchard");

    Signed-off-by: Milton Miller
    Cc: Anton Blanchard
    Cc: Ingo Molnar
    Cc: "Paul E. McKenney"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Milton Miller
     
  • I noticed a failure where we hit the following WARN_ON in
    generic_smp_call_function_interrupt:

    if (!cpumask_test_and_clear_cpu(cpu, data->cpumask))
            continue;

    data->csd.func(data->csd.info);

    refs = atomic_dec_return(&data->refs);
    WARN_ON(refs < 0);

    [...]

    The interrupt handler:
      sees and clears bit in cpumask
      might be using old or new fn!
      decrements refs below 0

    The owner of the percpu *data:
      set data->refs (too late!)

    The important thing to note is since the interrupt handler walks a
    potentially stale call_function.queue without any locking, then another
    cpu can view the percpu *data structure at any time, even when the owner
    is in the process of initialising it.

    The following test case hits the WARN_ON 100% of the time on my PowerPC
    box (having 128 threads does help :)

    #include <linux/module.h>
    #include <linux/workqueue.h>

    #define ITERATIONS 100

    static void do_nothing_ipi(void *dummy)
    {
    }

    /* Each worker sends ITERATIONS IPIs to all other cpus and waits for them. */
    static void do_ipis(struct work_struct *dummy)
    {
            int i;

            for (i = 0; i < ITERATIONS; i++)
                    smp_call_function(do_nothing_ipi, NULL, 1);

            printk(KERN_DEBUG "cpu %d finished\n", smp_processor_id());
    }

    static struct work_struct work[NR_CPUS];

    static int __init testcase_init(void)
    {
            int cpu;

            /* Queue one worker on every online cpu so the IPIs overlap. */
            for_each_online_cpu(cpu) {
                    INIT_WORK(&work[cpu], do_ipis);
                    schedule_work_on(cpu, &work[cpu]);
            }

            return 0;
    }

    static void __exit testcase_exit(void)
    {
    }

    module_init(testcase_init)
    module_exit(testcase_exit)
    MODULE_LICENSE("GPL");
    MODULE_AUTHOR("Anton Blanchard");

    I tried to fix it by ordering the read and the write of ->cpumask and
    ->refs. In doing so I missed a critical case, but thankfully Paul McKenney
    was able to spot my bug :) To ensure we aren't viewing previous
    iterations, the interrupt handler needs to read ->refs then ->cpumask then
    ->refs _again_.

    Thanks to Milton Miller and Paul McKenney for helping to debug this issue.

    [miltonm@bga.com: add WARN_ON and BUG_ON, remove extra read of refs before initial read of mask that doesn't help (also noted by Peter Zijlstra), adjust comments, hopefully clarify scenario ]
    [miltonm@bga.com: remove excess tests]
    Signed-off-by: Anton Blanchard
    Signed-off-by: Milton Miller
    Cc: Ingo Molnar
    Cc: "Paul E. McKenney"
    Cc: [2.6.32+]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Anton Blanchard
     
  • The placement of the read-side barrier is confused: the writer first
    sets pc->mem_cgroup, then PCG_USED. The read-side barrier has to be
    between testing PCG_USED and reading pc->mem_cgroup.
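
    The pairing can be illustrated with a small userspace sketch. It assumes
    C11 release/acquire atomics in place of the kernel's smp_wmb()/smp_rmb(),
    and the names page_cgroup_sketch, PCG_USED_SKETCH, commit_charge_sketch()
    and lookup_memcg_sketch() are hypothetical, not the memcg code itself.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define PCG_USED_SKETCH 0x1UL	/* hypothetical flag bit */

struct page_cgroup_sketch {
	atomic_ulong flags;
	void *mem_cgroup;	/* opaque in this sketch */
};

/* Writer: store the pointer first, then set the Used flag. */
static void commit_charge_sketch(struct page_cgroup_sketch *pc, void *memcg)
{
	assert(memcg != NULL);
	pc->mem_cgroup = memcg;
	/* release ordering plays the role of the write-side barrier */
	atomic_fetch_or_explicit(&pc->flags, PCG_USED_SKETCH,
				 memory_order_release);
}

/* Reader: test the flag, then read the pointer after the barrier. */
static void *lookup_memcg_sketch(struct page_cgroup_sketch *pc)
{
	/* acquire ordering sits between the flag test and the pointer read */
	unsigned long flags = atomic_load_explicit(&pc->flags,
						   memory_order_acquire);

	if (!(flags & PCG_USED_SKETCH))
		return NULL;
	return pc->mem_cgroup;
}
```

    With the barrier on the wrong side of the flag test, a reader could see
    the flag set and still load a stale pointer.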

    Signed-off-by: Johannes Weiner
    Acked-by: KAMEZAWA Hiroyuki
    Acked-by: Daisuke Nishimura
    Cc: Balbir Singh
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • Fix collision with kernel-supplied #define:

    drivers/video/backlight/88pm860x_bl.c:24:1: warning: "CURRENT_MASK" redefined
    arch/x86/include/asm/page_64_types.h:6:1: warning: this is the location of the previous definition

    Signed-off-by: Randy Dunlap
    Cc: Haojian Zhuang
    Cc: Richard Purdie
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Randy Dunlap
     
  • Replicate changes made to drivers/leds/ledtrig-backlight.c.

    Cc: Paul Mundt
    Cc: Richard Purdie
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Janusz Krzysztofik
     
  • Add two co-maintainers and update the entry with new information.

    Signed-off-by: Nicolas Ferre
    Acked-by: Andrew Victor
    Acked-by: Jean-Christophe PLAGNIOL-VILLARD
    Cc: Russell King
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nicolas Ferre
     
  • Contrary to what the comment says, truncate_setsize() should be called
    *before* the filesystem truncates blocks.

    Signed-off-by: Jan Kara
    Cc: Christoph Hellwig
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kara
     
  • Currently, when THP is enabled, memcg's rmdir() function is broken because
    move_account() is not supported for THP pages.

    This causes an accounting leak or an -EBUSY failure at rmdir().
    This patch fixes the issue by supporting move_account() for THP pages.

    Signed-off-by: KAMEZAWA Hiroyuki
    Cc: Daisuke Nishimura
    Cc: Balbir Singh
    Cc: Johannes Weiner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     
  • The memory cgroup's LRU stat should take the size of pages into account
    because Transparent Hugepage inserts hugepages into the LRU. If this
    number is wrong, memory reclaim will not work well.

    Note: only the head page of a THP huge page is linked into the LRU.
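
    A minimal sketch of size-aware LRU accounting, assuming a huge page
    covers 512 base pages (order 9). The struct and function names are
    hypothetical and only illustrate counting base pages per LRU entry
    rather than counting entries.

```c
#include <assert.h>

#define HPAGE_ORDER_SKETCH 9	/* assumption: 512 base pages per huge page */

struct lru_stat_sketch {
	unsigned long nr_pages;	/* counted in base pages */
};

/* Number of base pages represented by one LRU entry of the given order. */
static unsigned long pages_for_order(unsigned int order)
{
	return 1UL << order;
}

/* Only the head page is linked, but it accounts for the whole huge page. */
static void lru_add_sketch(struct lru_stat_sketch *lru, unsigned int order)
{
	lru->nr_pages += pages_for_order(order);
}

static void lru_del_sketch(struct lru_stat_sketch *lru, unsigned int order)
{
	assert(lru->nr_pages >= pages_for_order(order));
	lru->nr_pages -= pages_for_order(order);
}
```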

    Signed-off-by: KAMEZAWA Hiroyuki
    Cc: Daisuke Nishimura
    Cc: Balbir Singh
    Cc: Johannes Weiner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     
  • Now, under THP:

    at charge:
    - PageCgroupUsed bit is set on every page_cgroup of a hugepage,
      i.e. on 512 pages.
    at uncharge:
    - PageCgroupUsed bit is unset on the head page.

    So, some pages will remain with the "Used" bit set.

    This patch fixes that by setting the Used bit only on the head page.
    Used bits for tail pages will be set at splitting if necessary.

    This patch adds this lock order:
    compound_lock() -> page_cgroup_move_lock().

    [akpm@linux-foundation.org: fix warning]
    Signed-off-by: KAMEZAWA Hiroyuki
    Cc: Daisuke Nishimura
    Cc: Balbir Singh
    Cc: Johannes Weiner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     
  • mem_cgroup_charge_statistics() was designed for charging a single page, but
    now we have transparent hugepages. To fix problems (in the following
    patch) the function needs to take the number of pages as an argument.

    The new function takes the following arguments:
    - the type of page rather than 'pc'
    - the size of the page that is accounted.

    Signed-off-by: KAMEZAWA Hiroyuki
    Cc: Daisuke Nishimura
    Cc: Balbir Singh
    Cc: Johannes Weiner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     
  • When using devices that support max_segments > BIO_MAX_PAGES (256), direct
    IO tries to allocate a bio with more pages than allowed, which leads to an
    oops in dio_bio_alloc(). Clamp the request to the supported maximum, and
    change dio_bio_alloc() to reflect that bio_alloc() will always return a
    bio when called with __GFP_WAIT and a valid number of vectors.
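
    The clamp amounts to taking the smaller of the requested page count and
    the limit. A minimal sketch, using the value 256 quoted above and a
    hypothetical helper name (this is not the code in fs/direct-io.c):

```c
#include <assert.h>

#define BIO_MAX_PAGES_SKETCH 256	/* value quoted in the commit message */

/* Clamp the number of pages requested for one bio to the supported maximum. */
static int clamp_bio_pages(int nr_pages)
{
	assert(nr_pages > 0);
	return nr_pages < BIO_MAX_PAGES_SKETCH ? nr_pages : BIO_MAX_PAGES_SKETCH;
}
```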

    [akpm@linux-foundation.org: remove redundant BUG_ON()]
    Signed-off-by: David Dillow
    Reviewed-by: Jeff Moyer
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Dillow
     
  • Up until 3e7d344 ("mm: vmscan: reclaim order-0 and use compaction instead
    of lumpy reclaim"), compaction skipped calculating the fragmentation index
    of a zone when compaction was explicitly requested through the procfs
    knob.

    However, when compaction_suitable was introduced, it did not come with an
    extra check for order == -1, set on explicit compaction requests, and
    passed this order on to the fragmentation index calculation, where it
    overshifts the number of requested pages, leading to a division by zero.

    This patch makes sure that order == -1 is recognized as the flag it is
    rather than passing it along as a valid order parameter.
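
    The shape of the check can be sketched as follows. The enum, the function
    name and the simplified suitability test are assumptions made for
    illustration; the division below merely stands in for the fragmentation
    index calculation, which divides by the number of requested pages.

```c
#include <assert.h>

enum compact_sketch { COMPACT_SKIPPED_SKETCH, COMPACT_CONTINUE_SKETCH };

static enum compact_sketch
compaction_suitable_sketch(int order, unsigned long free_pages)
{
	unsigned long requested;

	/* order == -1 flags an explicit request; it is not a real order */
	if (order == -1)
		return COMPACT_CONTINUE_SKETCH;

	assert(order >= 0 && order < 31);
	requested = 1UL << order;	/* a valid shift count from here on */

	/* stand-in for the index calculation that divides by 'requested' */
	if (free_pages / requested == 0)
		return COMPACT_SKIPPED_SKETCH;
	return COMPACT_CONTINUE_SKETCH;
}
```

    Returning before the shift means the flag value never reaches the
    arithmetic where it would be treated as a shift count.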

    [akpm@linux-foundation.org: add comment, per Mel]
    Signed-off-by: Johannes Weiner
    Reviewed-by: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • Signed-off-by: Jesper Juhl
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jesper Juhl
     
  • memblock_is_region_memory() uses reserved memblocks to search for the
    given region, while it should use the memory memblocks.

    I encountered the problem with OMAP's framebuffer ram allocation.
    Normally the ram is allocated dynamically, and this function is not
    called. However, if we want to pass the framebuffer from the bootloader
    to the kernel (to retain the boot image), this function is used to check
    the validity of the kernel parameters for the framebuffer ram area.
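
    The difference between the two lookups can be shown with a simplified
    model. The types and helpers below are hypothetical and only mimic the
    idea of separate "memory" and "reserved" region lists; they are not the
    memblock implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct region_sketch {
	unsigned long base, size;
};

struct memblock_sketch {
	const struct region_sketch *memory;
	size_t nr_memory;
	const struct region_sketch *reserved;
	size_t nr_reserved;
};

/* True when [base, base + size) lies entirely inside one listed region. */
static bool region_list_contains(const struct region_sketch *r, size_t n,
				 unsigned long base, unsigned long size)
{
	for (size_t i = 0; i < n; i++)
		if (base >= r[i].base &&
		    base + size <= r[i].base + r[i].size)
			return true;
	return false;
}

/* Search the memory list; searching the reserved list was the bug. */
static bool is_region_memory_sketch(const struct memblock_sketch *mb,
				    unsigned long base, unsigned long size)
{
	assert(mb != NULL);
	return region_list_contains(mb->memory, mb->nr_memory, base, size);
}
```

    A framebuffer area handed over by the bootloader is ordinary RAM that is
    usually not in the reserved list, so only the memory list gives the
    right answer for it.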

    Signed-off-by: Tomi Valkeinen
    Acked-by: Yinghai Lu
    Cc: Benjamin Herrenschmidt
    Cc: "H. Peter Anvin"
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tomi Valkeinen
     
  • Two users reported THP-related crashes on 32-bit x86 machines. Their oops
    reports indicated an invalid pte, and subsequent code inspection showed
    that the highpte is actually used after unmap.

    The fix is to unmap the pte only after all operations against it are
    finished.

    Signed-off-by: Johannes Weiner
    Reported-by: Ilya Dryomov
    Reported-by: werner
    Cc: Andrea Arcangeli
    Tested-by: Ilya Dryomov
    Tested-by: Steven Rostedt
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • The meaning of CONFIG_EMBEDDED has long since been obsoleted; the option
    is used to configure any non-standard kernel with a much larger scope than
    only small devices.

    This patch renames the option to CONFIG_EXPERT in init/Kconfig and fixes
    references to the option throughout the kernel. A new CONFIG_EMBEDDED
    option is added that automatically selects CONFIG_EXPERT when enabled and
    can be used in the future to isolate options that should only be
    considered for embedded systems (RISC architectures, SLOB, etc).

    Calling the option "EXPERT" more accurately represents its intention: only
    expert users who understand the impact of the configuration changes they
    are making should enable it.

    Reviewed-by: Ingo Molnar
    Acked-by: David Woodhouse
    Signed-off-by: David Rientjes
    Cc: Greg KH
    Cc: "David S. Miller"
    Cc: Jens Axboe
    Cc: Arnd Bergmann
    Cc: Robin Holt
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Rientjes
     
  • * 'tty-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty-2.6:
    tty: update MAINTAINERS file due to driver movement
    tty: move drivers/serial/ to drivers/tty/serial/
    tty: move hvc drivers to drivers/tty/hvc/

    Linus Torvalds
     
  • …l/git/tip/linux-2.6-tip

    * 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    sched, cgroup: Use exit hook to avoid use-after-free crash
    sched: Fix signed unsigned comparison in check_preempt_tick()
    sched: Replace rq->bkl_count with rq->rq_sched_info.bkl_count
    sched, autogroup: Fix CONFIG_RT_GROUP_SCHED sched_setscheduler() failure
    sched: Display autogroup names in /proc/sched_debug
    sched: Reinstate group names in /proc/sched_debug
    sched: Update effective_load() to use global share weights

    Linus Torvalds
     
  • * 'xen/xenbus' of git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen:
    xenbus: Fix memory leak on release
    xenbus: avoid zero returns from read()
    xenbus: add missing wakeup in concurrent read/write
    xenbus: allow any xenbus command over /proc/xen/xenbus
    xenfs/xenbus: report partial reads/writes correctly

    Linus Torvalds
     
  • * 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6:
    cifs: mangle existing header for SMB_COM_NT_CANCEL
    cifs: remove code for setting timeouts on requests
    [CIFS] cifs: reconnect unresponsive servers
    cifs: set up recurring workqueue job to do SMB echo requests
    cifs: add ability to send an echo request
    cifs: add cifs_call_async
    cifs: allow for different handling of received response
    cifs: clean up sync_mid_result
    cifs: don't reconnect server when we don't get a response
    cifs: wait indefinitely for responses
    cifs: Use mask of ACEs for SID Everyone to calculate all three permissions user, group, and other
    cifs: Fix regression during share-level security mounts (Repost)
    [CIFS] Update cifs version number
    cifs: move mid result processing into common function
    cifs: move locked sections out of DeleteMidQEntry and AllocMidQEntry
    cifs: clean up accesses to midCount
    cifs: make wait_for_free_request take a TCP_Server_Info pointer
    cifs: no need to mark smb_ses_list as cifs_demultiplex_thread is exiting
    cifs: don't fail writepages on -EAGAIN errors
    CIFS: Fix oplock break handling (try #2)

    Linus Torvalds