30 Mar, 2010

1 commit

  • …it slab.h inclusion from percpu.h

    percpu.h is included by sched.h and module.h and thus ends up being
    included when building most .c files. percpu.h includes slab.h which
    in turn includes gfp.h making everything defined by the two files
    universally available and complicating inclusion dependencies.

    percpu.h -> slab.h dependency is about to be removed. Prepare for
    this change by updating users of gfp and slab facilities to include
    those headers directly instead of assuming availability. As this
    conversion needs to touch a large number of source files, the following
    script is used as the basis of conversion.

    http://userweb.kernel.org/~tj/misc/slabh-sweep.py

    The script does the following.

    * Scan files for gfp and slab usages and update includes such that
    only the necessary includes are there. ie. if only gfp is used,
    gfp.h, if slab is used, slab.h.

    * When the script inserts a new include, it looks at the include
    blocks and tries to put the new include such that its order conforms
    to its surroundings. It's put in the include block which contains
    core kernel includes, in the same order that the rest are ordered -
    alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
    doesn't seem to be any matching order.

    * If the script can't find a place to put a new include (mostly
    because the file doesn't have fitting include block), it prints out
    an error message indicating which .h file needs to be added to the
    file.
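
    The ordering rule above can be sketched in C. This is only an
    illustrative model, not the logic of slabh-sweep.py (which is a
    Python script and is not reproduced here). The names detect_order()
    and insert_pos() are hypothetical, and "Christmas tree" is assumed
    to mean "shortest line first".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical classification of an existing include block. */
enum inc_order { ORDER_ALPHA, ORDER_XMAS, ORDER_REV_XMAS, ORDER_NONE };

/* Detect how the n include names in incs[] are ordered. */
enum inc_order detect_order(const char *const incs[], size_t n)
{
	int alpha = 1, xmas = 1, rev = 1;

	for (size_t i = 1; i < n; i++) {
		size_t prev = strlen(incs[i - 1]);
		size_t cur = strlen(incs[i]);

		if (strcmp(incs[i - 1], incs[i]) > 0)
			alpha = 0;	/* not alphabetical */
		if (prev > cur)
			xmas = 0;	/* not shortest-first */
		if (prev < cur)
			rev = 0;	/* not longest-first */
	}
	if (alpha)
		return ORDER_ALPHA;
	if (xmas)
		return ORDER_XMAS;
	if (rev)
		return ORDER_REV_XMAS;
	return ORDER_NONE;
}

/*
 * Index at which new_inc would be inserted so that the block keeps its
 * detected order. Returns n (append at the end) if no order matches.
 */
size_t insert_pos(const char *const incs[], size_t n, const char *new_inc)
{
	assert(new_inc != NULL);

	enum inc_order order = detect_order(incs, n);
	size_t new_len = strlen(new_inc);

	for (size_t i = 0; i < n; i++) {
		size_t len = strlen(incs[i]);

		if (order == ORDER_ALPHA && strcmp(new_inc, incs[i]) < 0)
			return i;
		if (order == ORDER_XMAS && new_len < len)
			return i;
		if (order == ORDER_REV_XMAS && new_len > len)
			return i;
	}
	return n;
}
```

    With an alphabetical block of linux/fs.h, linux/mm.h and linux/uio.h,
    this sketch places linux/slab.h before linux/uio.h. With a block that
    matches none of the orders, it appends the new include at the end.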

    The conversion was done in the following steps.

    1. The initial automatic conversion of all .c files updated slightly
    over 4000 files, deleting around 700 includes and adding ~480 gfp.h
    and ~3000 slab.h inclusions. The script emitted errors for ~400
    files.

    2. Each error was manually checked. Some didn't need the inclusion,
    some needed manual addition while adding it to implementation .h or
    embedding .c file was more appropriate for others. This step added
    inclusions to around 150 files.

    3. The script was run again and the output was compared to the edits
    from #2 to make sure no file was left behind.

    4. Several build tests were done and a couple of problems were fixed.
    e.g. lib/decompress_*.c used malloc/free() wrappers around slab
    APIs requiring slab.h to be added manually.

    5. The script was run on all .h files but without automatically
    editing them as sprinkling gfp.h and slab.h inclusions around .h
    files could easily lead to inclusion dependency hell. Most gfp.h
    inclusion directives were ignored as stuff from gfp.h was usually
    widely available and often used in preprocessor macros. Each
    slab.h inclusion directive was examined and added manually as
    necessary.

    6. percpu.h was updated not to include slab.h.

    7. Build tests were done on the following configurations and failures
    were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
    distributed build env didn't work with gcov compiles) and a few
    more options had to be turned off depending on archs to make things
    build (like ipr on powerpc/64 which failed due to missing writeq).

    * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
    * powerpc and powerpc64 SMP allmodconfig
    * sparc and sparc64 SMP allmodconfig
    * ia64 SMP allmodconfig
    * s390 SMP allmodconfig
    * alpha SMP allmodconfig
    * um on x86_64 SMP allmodconfig

    8. percpu.h modifications were reverted so that it could be applied as
    a separate patch and serve as bisection point.

    Given the fact that I had only a couple of failures from tests on step
    6, I'm fairly confident about the coverage of this conversion patch.
    If there is a breakage, it's likely to be something in one of the arch
    headers, which should be easily discoverable on most builds of the
    specific arch.

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

    Tejun Heo
     

07 Mar, 2010

1 commit

  • Presently, the per-mm statistics counters are defined by macros in sched.h.

    This patch changes them to
    - be defined in mm.h as inline functions
    - use an array instead of macro-generated names.

    This patch is for reducing patch size in future patch to modify
    implementation of per-mm counter.
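
    The shape of the change can be sketched as follows. This is a
    userspace model, not the code from the patch: struct mm_model and
    the accessor names are hypothetical stand-ins, and the two counter
    kinds are an assumption chosen for illustration.

```c
#include <assert.h>

/* Counter kinds index an array instead of being pasted into names. */
enum { MM_FILEPAGES, MM_ANONPAGES, NR_MM_COUNTERS };

/* Hypothetical stand-in for the per-mm structure. */
struct mm_model {
	long counters[NR_MM_COUNTERS];
};

/* Guard against an out-of-range counter index. */
static inline void check_member(int member)
{
	assert(member >= 0 && member < NR_MM_COUNTERS);
}

static inline long get_counter(const struct mm_model *mm, int member)
{
	check_member(member);
	return mm->counters[member];
}

static inline void add_counter(struct mm_model *mm, int member, long value)
{
	check_member(member);
	mm->counters[member] += value;
}

static inline void inc_counter(struct mm_model *mm, int member)
{
	add_counter(mm, member, 1);
}

static inline void dec_counter(struct mm_model *mm, int member)
{
	add_counter(mm, member, -1);
}
```

    Because every access goes through a small set of inline functions,
    a later patch can change how the counters are stored without
    touching each caller.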

    Signed-off-by: KAMEZAWA Hiroyuki
    Reviewed-by: Minchan Kim
    Cc: Christoph Lameter
    Cc: Lee Schermerhorn
    Cc: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     

28 Sep, 2009

1 commit


03 Apr, 2009

1 commit

  • The calculation of the value nr in do_xip_mapping_read is incorrect. If
    the copy requires more than one iteration of the do-while loop, the copied
    variable will be non-zero. The maximum length that may be passed to the
    call to copy_to_user(buf+copied, xip_mem+offset, nr) is len-copied, but the
    check only compares against len (nr > len).
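
    The difference between the two clamps can be modelled in userspace.
    This is a sketch under assumptions, not the kernel function: the
    helper names and MODEL_PAGE_SIZE are hypothetical, and only the
    per-iteration length calculation is reproduced.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096u	/* assumed page size for the model */

/* Per-iteration length with the old check: ignores bytes already copied. */
size_t chunk_buggy(size_t offset, size_t len, size_t copied)
{
	size_t nr = MODEL_PAGE_SIZE - offset;

	(void)copied;		/* the old check never looked at this */
	if (nr > len)
		nr = len;
	return nr;
}

/* Per-iteration length clamped to what is still left to copy. */
size_t chunk_fixed(size_t offset, size_t len, size_t copied)
{
	assert(offset < MODEL_PAGE_SIZE && copied <= len);

	size_t nr = MODEL_PAGE_SIZE - offset;

	if (nr > len - copied)
		nr = len - copied;
	return nr;
}
```

    For a 5000-byte read starting at a page boundary, the second
    iteration has copied == 4096. The old check still allows 4096 bytes,
    so 8192 bytes would be written into a 5000-byte buffer, while the
    corrected clamp allows only the remaining 904.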

    This bug is the cause for the heap corruption Carsten has been chasing
    for so long:

    *** glibc detected *** /bin/bash: free(): invalid next size (normal): 0x00000000800e39f0 ***
    ======= Backtrace: =========
    /lib64/libc.so.6[0x200000b9b44]
    /lib64/libc.so.6(cfree+0x8e)[0x200000bdade]
    /bin/bash(free_buffered_stream+0x32)[0x80050e4e]
    /bin/bash(close_buffered_stream+0x1c)[0x80050ea4]
    /bin/bash(unset_bash_input+0x2a)[0x8001c366]
    /bin/bash(make_child+0x1d4)[0x8004115c]
    /bin/bash[0x8002fc3c]
    /bin/bash(execute_command_internal+0x656)[0x8003048e]
    /bin/bash(execute_command+0x5e)[0x80031e1e]
    /bin/bash(execute_command_internal+0x79a)[0x800305d2]
    /bin/bash(execute_command+0x5e)[0x80031e1e]
    /bin/bash(reader_loop+0x270)[0x8001efe0]
    /bin/bash(main+0x1328)[0x8001e960]
    /lib64/libc.so.6(__libc_start_main+0x100)[0x200000592a8]
    /bin/bash(clearerr+0x5e)[0x8001c092]

    With this bug fix the commit 0e4a9b59282914fe057ab17027f55123964bc2e2
    "ext2/xip: refuse to change xip flag during remount with busy inodes" can
    be removed again.

    Cc: Carsten Otte
    Cc: Nick Piggin
    Cc: Jared Hulbert
    Cc:
    Signed-off-by: Martin Schwidefsky
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Martin Schwidefsky
     

07 Jan, 2009

1 commit

  • Remove page_remove_rmap()'s vma arg, which was only for the Eeek message.
    And remove the BUG_ON(page_mapcount(page) == 0) from CONFIG_DEBUG_VM's
    page_dup_rmap(): we're trying to be more resilient about that than BUGs.

    Signed-off-by: Hugh Dickins
    Cc: Nick Piggin
    Cc: Christoph Lameter
    Cc: Mel Gorman
    Cc: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     

21 Aug, 2008

3 commits

  • XIP can call into get_xip_mem concurrently with the same file,offset with
    create=1. This usually maps down to get_block, which expects the page
    lock to prevent such a situation. This causes ext2 to explode for one
    reason or another.

    Serialise those calls for the moment. For common usages today, I suspect
    get_xip_mem rarely is called to create new blocks. In future as XIP
    technologies evolve we might need to look at which operations require
    scalability, and rework the locking to suit.

    Signed-off-by: Nick Piggin
    Cc: Jared Hulbert
    Acked-by: Carsten Otte
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • XIP has a race between sparse pages being inserted into page tables, and
    sparse pages being zapped when it's time to put a non-sparse page in.

    What can happen is that a process can be left with a dangling sparse page
    in a MAP_SHARED mapping, while the rest of the world sees the non-sparse
    version, i.e. data corruption.

    Guard these operations with a seqlock, making fault-in-sparse-pages the
    slowpath, and try-to-unmap-sparse-pages the fastpath.
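
    The sequence-counter protocol behind a seqlock can be sketched with
    C11 atomics. This is a simplified userspace model, not the kernel's
    seqlock API: struct seq_model and the four helpers are hypothetical
    names, writers are assumed to be serialised by some outer lock, and
    the protected data itself is left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Even value: no writer active. Odd value: a write is in progress. */
struct seq_model {
	atomic_uint seq;
};

/* Reader: wait until no write is in progress and sample the counter. */
unsigned read_begin(struct seq_model *s)
{
	unsigned v;

	while ((v = atomic_load(&s->seq)) & 1u)
		;	/* writer active, spin */
	return v;
}

/* Reader: true if a writer ran since read_begin(), so the work must be redone. */
bool read_retry(struct seq_model *s, unsigned start)
{
	return atomic_load(&s->seq) != start;
}

/* Writer: make the counter odd before touching the protected state. */
void write_begin(struct seq_model *s)
{
	atomic_fetch_add(&s->seq, 1u);
}

/* Writer: make the counter even again once the update is complete. */
void write_end(struct seq_model *s)
{
	unsigned prev = atomic_fetch_add(&s->seq, 1u);

	assert(prev & 1u);	/* must pair with write_begin() */
	(void)prev;
}
```

    In terms of the description above, the unmap of sparse pages would
    play the reader role (cheap, retried if a writer interfered) and the
    fault that inserts a sparse page would play the writer role.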

    Signed-off-by: Nick Piggin
    Cc: Jared Hulbert
    Acked-by: Carsten Otte
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • There is a race with dirty page accounting where a page may not properly
    be accounted for.

    clear_page_dirty_for_io() calls page_mkclean; then TestClearPageDirty.

    page_mkclean walks the rmaps for that page, and for each one it cleans and
    write protects the pte if it was dirty. It uses page_check_address to
    find the pte. That function has a shortcut to avoid the ptl if the pte is
    not present. Unfortunately, the pte can be switched to not-present then
    back to present by other code while holding the page table lock -- this
    should not be a signal for page_mkclean to ignore that pte, because it may
    be dirty.

    For example, powerpc64's set_pte_at will clear a previously present pte
    before setting it to the desired value. There may also be other code in
    core mm or in arch that does similar things.

    The consequence of the bug is loss of data integrity due to msync, and
    loss of dirty page accounting accuracy. XIP's __xip_unmap could easily
    also be unreliable (depending on the exact XIP locking scheme), which can
    lead to data corruption.

    Fix this by having an option to always take ptl to check the pte in
    page_check_address.

    It's possible to retain this optimization for page_referenced and
    try_to_unmap.

    Signed-off-by: Nick Piggin
    Cc: Jared Hulbert
    Cc: Carsten Otte
    Cc: Hugh Dickins
    Acked-by: Peter Zijlstra
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin