27 Oct, 2010

40 commits

  • Use the new {max,min}3 macros to save some cycles and bytes on the stack.
    This patch replaces trivial nested min()/max() macros with their
    three-operand counterparts.

    Signed-off-by: Hagen Paul Pfeifer
    Cc: Joe Perches
    Cc: Ingo Molnar
    Cc: Hartley Sweeten
    Cc: Russell King
    Cc: Benjamin Herrenschmidt
    Cc: Thomas Gleixner
    Cc: Herbert Xu
    Cc: Roland Dreier
    Cc: Sean Hefty
    Cc: Pekka Enberg
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hagen Paul Pfeifer
     
  • Introduce two additional min/max macros to compare three operands. This
    saves some cycles as well as some bytes on the stack and, last but not
    least, is more pleasing than macro nesting.
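
    A minimal user-space sketch of the idea, not the kernel's actual macros:
    the names min3_int() and max3_int() are hypothetical, and plain functions
    over int stand in for the type-generic min3()/max3().

```c
#include <assert.h>

/*
 * Hypothetical stand-ins for the three-operand macros. Each operand is
 * evaluated exactly once and no nested two-operand call is needed.
 */
static inline int min3_int(int a, int b, int c)
{
	return a < b ? (a < c ? a : c) : (b < c ? b : c);
}

static inline int max3_int(int a, int b, int c)
{
	return a > b ? (a > c ? a : c) : (b > c ? b : c);
}

/* Usage: one call replaces the nested min(a, min(b, c)) form. */
static inline void min3_max3_demo(void)
{
	assert(min3_int(7, 3, 5) == 3);
	assert(max3_int(7, 3, 5) == 7);
	assert(min3_int(2, 2, 9) == 2);
}
```

    The kernel versions are macros so that they work for any comparable type;
    the functions above only show the comparison structure.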

    [akpm@linux-foundation.org: fix warnings]
    Signed-off-by: Hagen Paul Pfeifer
    Cc: Joe Perches
    Cc: Ingo Molnar
    Cc: Hartley Sweeten
    Cc: Russell King
    Cc: Benjamin Herrenschmidt
    Cc: Thomas Gleixner
    Cc: Herbert Xu
    Cc: Roland Dreier
    Cc: Sean Hefty
    Cc: Pekka Enberg
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hagen Paul Pfeifer
     
  • Some code cleanups for hostfs.

    Signed-off-by: Richard Weinberger
    Cc: Jeff Dike
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Richard Weinberger
     
  • This patch removes __do_IRQ() from user mode linux. __do_IRQ is deprecated.

    Signed-off-by: Richard Weinberger
    Cc: Jeff Dike
    Cc: Thomas Gleixner
    Cc: Peter Zijlstra
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Richard Weinberger
     
  • With glibc 2.11 or later that was built with --enable-multi-arch, the UML
    link fails with undefined references to __rel_iplt_start and similar
    symbols. In recent binutils, the default linker script defines these
    symbols (see ld --verbose). Fix the UML linker scripts to match the new
    defaults for these sections.

    Signed-off-by: Roland McGrath
    Cc: Jeff Dike
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Roland McGrath
     
  • I think that it's better to detect DMA misuse at build time rather than
    calling BUG_ON. Architectures that can't do DMA need to define
    CONFIG_NO_DMA.

    Thanks to Sam Ravnborg for explaining how CONFIG_NO_DMA and CONFIG_HAS_DMA
    work:

    http://marc.info/?l=linux-kernel&m=128359913825550&w=2

    HAS_DMA is defined like this:

    config HAS_DMA
            boolean
            depends on !NO_DMA
            default y

    So to set HAS_DMA to true an arch should do one of the following:
    1) Do not define NO_DMA
    2) Define NO_DMA and set it to 'n'

    Most archs - including um - used principle 1).

    In the um case we want to say that we do NOT have any DMA.
    This can be done in two ways.
    a) define NO_DMA and set it to 'y'
    b) redefine HAS_DMA and set it to 'n'.

    The patch you provided used principle b) where other archs use principle a).
    So I suggest you should use principle a) for um too.

    Signed-off-by: FUJITA Tomonori
    Cc: Miklos Szeredi
    Cc: Jeff Dike
    Cc: Sam Ravnborg
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    FUJITA Tomonori
     
  • T2 are the only alpha SMP systems that do HAE switching at runtime, which
    is fundamentally racy on SMP. This patch limits MMIO space on T2 to HAE0
    only, like we did on MCPCIA (rawhide) long ago. This leaves us with only
    112 MB of PCI MMIO (128 MB HAE aperture minus 16 MB reserved for EISA),
    but since linux PCI allocations are reasonably tight, it should be enough
    for sane hardware configurations.

    Also, fix a typo in MCPCIA_FROB_MMIO macro which shouldn't call set_hae()
    if MCPCIA_ONE_HAE_WINDOW is defined. It's more for correctness, as
    set_hae() is a no-op anyway in that case.

    Signed-off-by: Ivan Kokshaysky
    Cc: Matt Turner
    Cc: Richard Henderson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ivan Kokshaysky
     
  • Signed-off-by: FUJITA Tomonori
    Cc: Ivan Kokshaysky
    Cc: Richard Henderson
    Cc: Matt Turner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    FUJITA Tomonori
     
  • Structure info is copied to userland with some padding fields
    uninitialized, which leaks stack memory.
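
    A sketch of the usual remedy, assuming a made-up structure: zero the
    whole object before filling in its fields, so that padding bytes carry no
    stale stack contents. struct info_example and its fields are hypothetical
    and are not the HPET structure itself.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical layout: on common ABIs there is padding after 'flags'. */
struct info_example {
	unsigned long freq;
	unsigned short flags;
	unsigned long timer;
};

/* Zero the whole object first, then set the individual fields. */
static inline void fill_info(struct info_example *info)
{
	memset(info, 0, sizeof(*info));	/* clears the padding bytes as well */
	info->freq = 64;
	info->flags = 1;
	info->timer = 2;
}

/* Returns 1 if the bytes between 'flags' and 'timer' are all zero. */
static inline int padding_is_clear(const struct info_example *info)
{
	const unsigned char *p = (const unsigned char *)info;
	size_t i = offsetof(struct info_example, flags) + sizeof(info->flags);

	for (; i < offsetof(struct info_example, timer); i++)
		if (p[i] != 0)
			return 0;
	return 1;
}

/* Usage: simulate stale stack contents, then fill the structure. */
static inline void fill_info_demo(void)
{
	struct info_example info;

	memset(&info, 0xAA, sizeof(info));
	fill_info(&info);
	assert(padding_is_clear(&info));
}
```

    Without the memset() in fill_info(), the padding would still hold the
    0xAA pattern and would be copied out along with the real fields.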

    [akpm@linux-foundation.org: remove now-unneeded zeroing of info->hi_ireqfreq]
    Signed-off-by: Vasiliy Kulikov
    Cc: Clemens Ladisch
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vasiliy Kulikov
     
  • $./hpet_example info /dev/hpet
    -hpet: executing info
    hpet_info: hi_irqfreq 0x0 hi_flags 0x0 hi_hpet 0 hi_timer 2

    Signed-off-by: Jaswinder Singh Rajput
    Cc: Clemens Ladisch
    Cc: "Venkatesh Pallipadi (Venki)"
    Cc: john stultz
    Cc: Thomas Gleixner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jaswinder Singh Rajput
     
  • Fix the following style problems:

    WARNING: Use #include instead of
    WARNING: Use #include instead of
    ERROR: code indent should use tabs where possible
    ERROR: do not initialise statics to 0 or NULL

    Signed-off-by: Jaswinder Singh Rajput
    Cc: Clemens Ladisch
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jaswinder Singh Rajput
     
  • Jaswinder Singh Rajput wrote:
    > By executing Documentation/timers/hpet_example.c
    >
    > for polling, I requested for 3 iterations but it seems iteration work
    > for only 2 as first expired time is always very small.
    >
    > # ./hpet_example poll /dev/hpet 10 3
    > -hpet: executing poll
    > hpet_poll: info.hi_flags 0x0
    > hpet_poll: expired time = 0x13
    > hpet_poll: revents = 0x1
    > hpet_poll: data 0x1
    > hpet_poll: expired time = 0x1868c
    > hpet_poll: revents = 0x1
    > hpet_poll: data 0x1
    > hpet_poll: expired time = 0x18645
    > hpet_poll: revents = 0x1
    > hpet_poll: data 0x1

    Clearing the HPET interrupt enable bit disables interrupt generation
    but does not disable the timer, so the interrupt status bit will still
    be set when the timer elapses. If another interrupt arrives before
    the timer has been correctly programmed (due to some other device on
    the same interrupt line, or CONFIG_DEBUG_SHIRQ), this results in an
    extra unwanted interrupt event because the status bit is likely to be
    set from comparator matches that happened before the device was opened.

    Therefore, we have to ensure that the interrupt status bit is and
    stays cleared until we actually program the timer.

    Signed-off-by: Clemens Ladisch
    Reported-by: Jaswinder Singh Rajput
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: john stultz
    Cc: Bob Picco
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Clemens Ladisch
     
  • When the initialization code in hpet finds a memory resource and does not
    find an IRQ, it does not unmap the memory resource previously mapped.

    There are buggy BIOSes which report resources exactly like this and what
    is worse the memory region bases point to normal RAM. This normally would
    not matter since the space is not touched. But when PAT is turned on,
    ioremap causes the page to be uncached and sets this bit in page->flags.

    Then when the page is about to be used by the allocator, it is reported
    as:

    BUG: Bad page state in process md5sum pfn:3ed00
    page:ffffea0000dbd800 count:0 mapcount:0 mapping:(null) index:0x0
    page flags: 0x20000001000000(uncached)
    Pid: 7956, comm: md5sum Not tainted 2.6.34-12-desktop #1
    Call Trace:
    [] bad_page+0xb1/0x100
    [] prep_new_page+0x1a5/0x1c0
    [] get_page_from_freelist+0x3a1/0x640
    [] __alloc_pages_nodemask+0x10f/0x6b0
    ...

    In this particular case:

    1) HPET returns 3ed00000 as memory region base, but it is not in
    reserved ranges reported by the BIOS (excerpt):
    BIOS-e820: 0000000000100000 - 00000000af6cf000 (usable)
    BIOS-e820: 00000000af6cf000 - 00000000afdcf000 (reserved)

    2) there is no IRQ resource reported by HPET method. On the other
    hand, the Intel HPET spec (1.0a) says (3.2.5.1):
    _CRS (
    // Report 1K of memory consumed by this Timer Block
    memory range consumed
    // Optional: only used if BIOS allocates Interrupts [1]
    IRQs consumed
    )

    [1] For case where Timer Block is configured to consume IRQ0/IRQ8 AND
    Legacy 8254/Legacy RTC hardware still exists, the device objects
    associated with 8254 & RTC devices should not report IRQ0/IRQ8 as
    "consumed resources".

    So in theory we should check whether that is the case and use those
    interrupts instead.

    Anyway, the address reported by the BIOS here is bogus, so the absence
    of an IRQ here is not the "optional" case from point 2).

    Since I got no reply previously, fix this by simply unmapping the space
    when no IRQ is found and a memory region was mapped previously. It would
    probably be safer to walk the resources again and unmap appropriately
    depending on type. But as we now use only ioremap for both memory
    resource types, it is not strictly needed right now.
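
    The error path described above can be sketched in user space as follows.
    This is an illustration only: struct probe_example and
    probe_example_init() are hypothetical, and malloc()/free() stand in for
    ioremap()/iounmap().

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical probe state; 'mapping' stands in for an ioremap()ed region. */
struct probe_example {
	void *mapping;
	int irq;
};

/*
 * Map the memory resource if one is reported, then require an IRQ.
 * Returns 0 on success, -1 on failure with nothing left mapped.
 */
static inline int probe_example_init(struct probe_example *p, int have_mem,
				     int irq)
{
	assert(p != NULL);
	p->mapping = NULL;
	p->irq = -1;

	if (have_mem) {
		p->mapping = malloc(1024);	/* stands in for ioremap() */
		if (!p->mapping)
			return -1;
	}

	if (irq < 0) {
		/* The fix: undo the earlier mapping before bailing out. */
		free(p->mapping);		/* stands in for iounmap() */
		p->mapping = NULL;
		return -1;
	}

	p->irq = irq;
	return 0;
}
```

    Calling probe_example_init(&p, 1, -1) fails and leaves p.mapping NULL,
    which is the behaviour the patch restores for the memory-but-no-IRQ case.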

    Addresses https://bugzilla.novell.com/show_bug.cgi?id=629908

    Reported-by: Olaf Hering
    Signed-off-by: Jiri Slaby
    Acked-by: Clemens Ladisch
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jiri Slaby
     
  • Simplify the code by reducing the list_empty(&source) checks.

    Signed-off-by: Bob Liu
    Acked-by: KAMEZAWA Hiroyuki
    Acked-by: Wu Fengguang
    Cc: KOSAKI Motohiro
    Cc: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Bob Liu
     
  • If not_managed is true, all pages will be put back on the LRU, so break
    out of the loop early and skip isolating the remaining pages.

    Signed-off-by: Bob Liu
    Acked-by: KAMEZAWA Hiroyuki
    Acked-by: Wu Fengguang
    Cc: KOSAKI Motohiro
    Cc: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Bob Liu
     
  • __test_page_isolated_in_pageblock() returns 1 if all pages in the range
    are isolated, so fix the comment. Variable `pfn' will be initialised in
    the following loop so remove it.

    Signed-off-by: Bob Liu
    Acked-by: KAMEZAWA Hiroyuki
    Cc: Wu Fengguang
    Cc: KOSAKI Motohiro
    Cc: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Bob Liu
     
  • page_order() is called by memory hotplug's user interface to check whether
    a section is removable (is_mem_section_removable()).

    It calls page_order() without holding zone->lock.
    So, even if the caller does

        if (PageBuddy(page))
                ret = page_order(page) ...

    the caller may hit BUG_ON().

    There are two ways to fix this:
    1. take zone->lock.
    2. remove BUG_ON().

    is_mem_section_removable() is only used for "advice" and doesn't need to
    be 100% accurate. It can be called from a user program, and we don't want
    to hold this important lock for long at a user's request. So this patch
    removes BUG_ON().

    Signed-off-by: KAMEZAWA Hiroyuki
    Acked-by: Wu Fengguang
    Acked-by: Michal Hocko
    Acked-by: Mel Gorman
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     
  • Add missing spin_lock() of the page_table_lock before an error return in
    hugetlb_cow(). Callers of hugetlb_cow() expect it to be held upon return.
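
    A small sketch of the locking contract, using a hypothetical lock that
    only records whether it is held. None of these names come from the
    kernel; cow_example() stands in for a function that is entered and left
    with the lock held.

```c
#include <assert.h>

/* Hypothetical lock that only tracks whether it is held. */
struct lock_example {
	int held;
};

static inline void lock_take(struct lock_example *l)
{
	assert(!l->held);
	l->held = 1;
}

static inline void lock_drop(struct lock_example *l)
{
	assert(l->held);
	l->held = 0;
}

/*
 * Called with the lock held and expected to return with it held,
 * on success and on failure alike.
 */
static inline int cow_example(struct lock_example *l, int prepare_fails)
{
	lock_drop(l);		/* drop the lock around work that may sleep */

	if (prepare_fails) {
		lock_take(l);	/* the fix: re-take it before the error return */
		return -1;
	}

	lock_take(l);
	return 0;
}

/* The caller unlocks unconditionally, which is only safe after the fix. */
static inline int fault_example(struct lock_example *l, int prepare_fails)
{
	int ret;

	lock_take(l);
	ret = cow_example(l, prepare_fails);
	lock_drop(l);
	return ret;
}
```

    Removing the lock_take() on the error path makes the lock_drop() in
    fault_example() trip its assertion, which mirrors the unbalanced unlock
    the patch fixes.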

    Signed-off-by: Dean Nelson
    Cc: Mel Gorman
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dean Nelson
     
  • The vma returned by find_vma does not necessarily include the target
    address. If this happens the code tries to follow a page outside of any
    vma and returns ENOENT instead of EFAULT.
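
    The lookup contract can be illustrated with a user-space sketch. The
    types and helpers below are hypothetical; find_region() only imitates
    the "first region ending above the address" behaviour of find_vma().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical region covering [start, end); the array is sorted by address. */
struct region_example {
	unsigned long start;
	unsigned long end;
};

/* Same contract as find_vma(): first region with end > addr, or NULL. */
static inline const struct region_example *
find_region(const struct region_example *r, size_t n, unsigned long addr)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (addr < r[i].end)
			return &r[i];
	return NULL;
}

/* Returns 0 if addr lies inside a region, -EFAULT otherwise. */
static inline int check_addr(const struct region_example *r, size_t n,
			     unsigned long addr)
{
	const struct region_example *hit = find_region(r, n, addr);

	/* The region found may start above addr, so test the lower bound too. */
	if (!hit || addr < hit->start)
		return -EFAULT;
	return 0;
}

/* Usage: an address in the hole between two regions must be rejected. */
static inline void check_addr_demo(void)
{
	const struct region_example map[] = {
		{ 0x1000, 0x2000 },
		{ 0x5000, 0x6000 },
	};

	assert(check_addr(map, 2, 0x1800) == 0);
	assert(check_addr(map, 2, 0x3000) == -EFAULT);	/* in the hole */
	assert(check_addr(map, 2, 0x7000) == -EFAULT);	/* above everything */
}
```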

    Signed-off-by: Gleb Natapov
    Acked-by: Christoph Lameter
    Cc: Minchan Kim
    Cc: KAMEZAWA Hiroyuki
    Cc: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Gleb Natapov
     
  • System management wants to subscribe to changes in swap configuration.
    Make /proc/swaps pollable like /proc/mounts.

    [akpm@linux-foundation.org: document proc_poll_event]
    Signed-off-by: Kay Sievers
    Acked-by: Greg KH
    Cc: Jonathan Corbet
    Cc: Peter Zijlstra
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kay Sievers
     
  • Add vzalloc() and vzalloc_node() to encapsulate the
    vmalloc-then-memset-zero operation.

    Use __GFP_ZERO to zero fill the allocated memory.
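
    A user-space analogy, assuming malloc()/calloc() in place of the kernel
    allocator: the helper names are hypothetical, and calloc() plays the role
    that __GFP_ZERO plays for vzalloc().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Before: every call site allocates and then clears the buffer itself. */
static inline void *alloc_then_clear(size_t size)
{
	void *p = malloc(size);

	if (p)
		memset(p, 0, size);
	return p;
}

/* After: one helper asks the allocator for zeroed memory directly. */
static inline void *zalloc_example(size_t size)
{
	return calloc(1, size);
}

/* Returns 1 if all 'size' bytes at p are zero. */
static inline int all_zero(const void *p, size_t size)
{
	const unsigned char *b = p;
	size_t i;

	for (i = 0; i < size; i++)
		if (b[i] != 0)
			return 0;
	return 1;
}

/* Usage: both forms yield a zero-filled buffer. */
static inline void zalloc_demo(void)
{
	void *old_style = alloc_then_clear(64);
	void *new_style = zalloc_example(64);

	assert(old_style && new_style);
	assert(all_zero(old_style, 64));
	assert(all_zero(new_style, 64));
	free(old_style);
	free(new_style);
}
```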

    Signed-off-by: Dave Young
    Cc: Christoph Lameter
    Acked-by: Greg Ungerer
    Cc: David Howells
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Young
     
  • Reported-by: KOSAKI Motohiro
    Cc: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • I had to go back to a 2.6.20 tree to work out why we're adding a
    number-of-inodes into a number-of-pages count. Restore the lost comment.

    Cc: Jens Axboe
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • Introduce ___GFP_* masks so that gfp_t is not mixed with plain integers,
    which causes a lot of warnings like the following:

    warning: restricted gfp_t degrades to integer
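
    A rough sketch of the two-level scheme, with made-up names and
    illustrative values. Outside sparse the annotations expand to nothing, so
    the example only shows where the plain-integer masks and the typed flags
    are each used.

```c
#include <assert.h>

/* Stand-ins for the sparse annotations; empty when not running sparse. */
#define bitwise_example
#define force_example

/* Hypothetical restricted flag type. */
typedef unsigned int bitwise_example gfp_example_t;

/* Plain integer masks: safe for shifts, table indexes and arithmetic. */
#define RAW_GFP_WAIT	0x10u
#define RAW_GFP_IO	0x40u

/* Typed flags built from the raw masks with one explicit cast each. */
#define TYPED_GFP_WAIT	((force_example gfp_example_t)RAW_GFP_WAIT)
#define TYPED_GFP_IO	((force_example gfp_example_t)RAW_GFP_IO)

/* Integer context: convert once, then work with the raw masks only. */
static inline unsigned int raw_bits(gfp_example_t flags)
{
	return (force_example unsigned int)flags & (RAW_GFP_WAIT | RAW_GFP_IO);
}

/* Usage: typed flags are combined as usual; raw masks handle the integer side. */
static inline void raw_bits_demo(void)
{
	assert(raw_bits(TYPED_GFP_WAIT | TYPED_GFP_IO) == 0x50u);
	assert(raw_bits(TYPED_GFP_IO) == 0x40u);
}
```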

    Signed-off-by: Namhyung Kim
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Namhyung Kim
     
  • This removes the following warning from sparse:

    mm/vmstat.c:466:5: warning: symbol 'fragmentation_index' was not declared. Should it be static?

    [akpm@linux-foundation.org: move the include to top-of-file]
    Signed-off-by: Namhyung Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Namhyung Kim
     
  • Declare 'bdi_pending_list' and 'tag_pages_for_writeback()' to remove the
    following sparse warnings:

    mm/backing-dev.c:46:1: warning: symbol 'bdi_pending_list' was not declared. Should it be static?
    mm/page-writeback.c:825:6: warning: symbol 'tag_pages_for_writeback' was not declared. Should it be static?

    Signed-off-by: Namhyung Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Namhyung Kim
     
  • s_start() and s_stop() grab/release vmlist_lock but were missing proper
    annotations. Add them.

    Signed-off-by: Namhyung Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Namhyung Kim
     
  • Rename redundant 'tmp' to fix the following sparse warnings:

    mm/vmalloc.c:296:34: warning: symbol 'tmp' shadows an earlier one
    mm/vmalloc.c:293:24: originally declared here

    Signed-off-by: Namhyung Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Namhyung Kim
     
  • Make anon_vma_chain_free() static. It is called only in rmap.c and the
    corresponding alloc function is already static.

    Signed-off-by: Namhyung Kim
    Cc: Andi Kleen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Namhyung Kim
     
  • page_check_address() conditionally grabs *@ptlp when it returns
    non-NULL. Renaming it and wrapping it with __cond_lock() removes the
    following warnings from sparse:

    mm/rmap.c:472:9: warning: context imbalance in 'page_mapped_in_vma' - unexpected unlock
    mm/rmap.c:524:9: warning: context imbalance in 'page_referenced_one' - unexpected unlock
    mm/rmap.c:706:9: warning: context imbalance in 'page_mkclean_one' - unexpected unlock
    mm/rmap.c:1066:9: warning: context imbalance in 'try_to_unmap_one' - unexpected unlock
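
    The rename-and-wrap pattern can be sketched like this. The macro is
    modeled on the non-sparse form of __cond_lock(), which just evaluates
    the condition; all other names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Outside sparse the annotation reduces to the condition itself. Under
 * sparse, the kernel's macro also records that lock x is taken when the
 * condition is true.
 */
#define cond_lock_example(x, c)	(c)

/* Hypothetical object guarded by a flag that stands in for a spinlock. */
struct slot_example {
	int locked;
	int present;
	int value;
};

/* Raw lookup: takes the lock only when it returns non-NULL. */
static inline int *slot_get_raw(struct slot_example *s)
{
	if (!s->present)
		return NULL;
	s->locked = 1;
	return &s->value;
}

/* Annotated wrapper: same behaviour, but the conditional lock is visible. */
static inline int *slot_get(struct slot_example *s)
{
	int *p;

	cond_lock_example(s->locked, p = slot_get_raw(s));
	return p;
}

/* Unlocks unconditionally; callers use it only after a non-NULL slot_get(). */
static inline void slot_put(struct slot_example *s)
{
	assert(s->locked);
	s->locked = 0;
}
```

    Callers keep using the original name (slot_get() here), so only the
    definition changes while the checker learns that the lock is held
    exactly when the result is non-NULL.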

    Signed-off-by: Namhyung Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Namhyung Kim
     
  • The page_lock_anon_vma() conditionally grabs RCU and anon_vma lock but
    page_unlock_anon_vma() releases them unconditionally. This leads sparse
    to complain about context imbalance. Annotate them.

    Signed-off-by: Namhyung Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Namhyung Kim
     
  • follow_pte() conditionally grabs *@ptlp when it returns 0.
    Renaming it and wrapping it with __cond_lock() removes the following
    warnings:

    mm/memory.c:2337:9: warning: context imbalance in 'do_wp_page' - unexpected unlock
    mm/memory.c:3142:19: warning: context imbalance in 'handle_mm_fault' - different lock contexts for basic block

    Signed-off-by: Namhyung Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Namhyung Kim
     
  • do_wp_page() releases @ptl but was missing the proper annotation. Add it.
    This removes the following warnings from sparse:

    mm/memory.c:2337:9: warning: context imbalance in 'do_wp_page' - unexpected unlock
    mm/memory.c:3142:19: warning: context imbalance in 'handle_mm_fault' - different lock contexts for basic block

    Signed-off-by: Namhyung Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Namhyung Kim
     
  • The get_locked_pte() conditionally grabs 'ptl' in case of returning
    non-NULL. This leads sparse to complain about context imbalance. Rename
    and wrap it using __cond_lock() to make sparse happy.

    Signed-off-by: Namhyung Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Namhyung Kim
     
  • This removes the following warning from sparse:

    mm/page_alloc.c:1934:9: warning: restricted gfp_t degrades to integer

    Signed-off-by: Namhyung Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Namhyung Kim
     
  • 'end' shadows an earlier declaration and is not necessary at all. Remove
    it and use 'pos' instead. This removes the following sparse warnings:

    mm/filemap.c:2180:24: warning: symbol 'end' shadows an earlier one
    mm/filemap.c:2132:25: originally declared here

    Signed-off-by: Namhyung Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Namhyung Kim
     
  • access_error() already takes error_code as an argument, so there is
    no need for an additional write flag.

    Signed-off-by: Michel Lespinasse
    Acked-by: Rik van Riel
    Cc: Nick Piggin
    Acked-by: Wu Fengguang
    Cc: Ying Han
    Cc: Peter Zijlstra
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Acked-by: "H. Peter Anvin"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     
  • This change reduces mmap_sem hold times that are caused by waiting for
    disk transfers when accessing file mapped VMAs.

    It introduces the VM_FAULT_ALLOW_RETRY flag, which indicates that the call
    site wants mmap_sem to be released if blocking on a pending disk transfer.
    In that case, filemap_fault() returns the VM_FAULT_RETRY status bit and
    do_page_fault() will then re-acquire mmap_sem and retry the page fault.

    It is expected that the retry will hit the same page which will now be
    cached, and thus it will complete with a low mmap_sem hold time.
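
    The control flow can be modeled with a toy example. Everything here is a
    sketch under stated assumptions: the EX_* values, struct mm_example and
    both functions are hypothetical, and allowing only a single retry is a
    choice made for this illustration.

```c
#include <assert.h>

/* Hypothetical flag and status values, chosen for this sketch only. */
#define EX_ALLOW_RETRY	0x1u	/* caller permits dropping the semaphore */
#define EX_FAULT_RETRY	0x1u	/* handler dropped it; caller must retry */

struct mm_example {
	int sem_held;		/* stands in for mmap_sem */
	int page_cached;	/* has the disk transfer completed? */
	int waits_with_sem;	/* I/O waits done while holding the semaphore */
	int retries;
};

/* Stand-in for the filemap_fault() side. */
static inline unsigned int handle_fault_example(struct mm_example *mm,
						unsigned int flags)
{
	assert(mm->sem_held);
	if (!mm->page_cached) {
		if (flags & EX_ALLOW_RETRY) {
			mm->sem_held = 0;	/* release before blocking */
			mm->page_cached = 1;	/* transfer completes meanwhile */
			return EX_FAULT_RETRY;
		}
		mm->waits_with_sem++;	/* old behaviour: block while holding it */
		mm->page_cached = 1;
	}
	return 0;
}

/* Stand-in for the do_page_fault() side. */
static inline unsigned int do_fault_example(struct mm_example *mm)
{
	unsigned int flags = EX_ALLOW_RETRY;
	unsigned int ret;

	for (;;) {
		mm->sem_held = 1;	/* acquire, or re-acquire on retry */
		ret = handle_fault_example(mm, flags);
		if (!(ret & EX_FAULT_RETRY))
			break;
		flags &= ~EX_ALLOW_RETRY;	/* assumption: retry only once */
		mm->retries++;
	}
	mm->sem_held = 0;
	return ret;
}
```

    On an uncached page, do_fault_example() retries once and never waits for
    I/O with the semaphore held; on a cached page it completes without any
    retry.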

    Tests:

    - microbenchmark: thread A mmaps a large file and does random read accesses
    to the mmaped area - achieves about 55 iterations/s. Thread B does
    mmap/munmap in a loop at a separate location - achieves 55 iterations/s
    before, 15000 iterations/s after.

    - We are seeing related effects in some applications in house, which show
    significant performance regressions when running without this change.

    [akpm@linux-foundation.org: fix warning & crash]
    Signed-off-by: Michel Lespinasse
    Acked-by: Rik van Riel
    Acked-by: Linus Torvalds
    Cc: Nick Piggin
    Reviewed-by: Wu Fengguang
    Cc: Ying Han
    Cc: Peter Zijlstra
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Acked-by: "H. Peter Anvin"
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     
  • Introduce a single location where filemap_fault() locks the desired page.
    There used to be two such places, depending on whether the initial
    find_get_page() was successful or not.

    Signed-off-by: Michel Lespinasse
    Acked-by: Rik van Riel
    Acked-by: Linus Torvalds
    Cc: Nick Piggin
    Reviewed-by: Wu Fengguang
    Cc: Ying Han
    Cc: Peter Zijlstra
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: "H. Peter Anvin"
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     
  • Reorder structure anon_vma to remove alignment padding on 64-bit builds
    when (CONFIG_KSM || CONFIG_MIGRATION).
    This will shrink the size of the anon_vma structure from 40 to 32 bytes
    and allow more objects per slab in its kmem_cache.

    Under slub the objects in the anon_vma kmem_cache will then be 40 bytes
    with 102 objects per slab. (On v2.6.36 without this patch, the size is 48
    bytes and 85 objects/slab.)
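
    The effect of field order on padding can be shown with two hypothetical
    layouts. These are not the real anon_vma definitions; the field names and
    types are assumptions picked so that a typical LP64 ABI gives 40 and 32
    bytes. The slab arithmetic assumes a 4096-byte slab, which is consistent
    with the figures quoted above.

```c
#include <assert.h>
#include <stddef.h>

struct list_example {
	void *next;
	void *prev;
};

/* Hypothetical "before" order: each 4-byte field precedes a pointer. */
struct anon_before {
	unsigned int lock;
	void *root;
	unsigned int refcount;
	struct list_example head;
};

/* "After": the two 4-byte fields are adjacent and share one 8-byte slot. */
struct anon_after {
	void *root;
	unsigned int lock;
	unsigned int refcount;
	struct list_example head;
};

/* How many objects of a given size fit in one slab of slab_bytes. */
static inline size_t objects_per_slab(size_t slab_bytes, size_t object_bytes)
{
	return slab_bytes / object_bytes;
}

static inline void padding_demo(void)
{
	/* For these layouts the reordered struct is never the larger one. */
	assert(sizeof(struct anon_after) <= sizeof(struct anon_before));
	/* 40-byte and 48-byte objects in an assumed 4096-byte slab. */
	assert(objects_per_slab(4096, 40) == 102);
	assert(objects_per_slab(4096, 48) == 85);
}
```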

    Signed-off-by: Richard Kennedy
    Reviewed-by: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Richard Kennedy