23 Sep, 2010

1 commit


30 Mar, 2010

1 commit

  • …it slab.h inclusion from percpu.h

    percpu.h is included by sched.h and module.h and thus ends up being
    included when building most .c files. percpu.h includes slab.h which
    in turn includes gfp.h making everything defined by the two files
    universally available and complicating inclusion dependencies.

    percpu.h -> slab.h dependency is about to be removed. Prepare for
    this change by updating users of gfp and slab facilities to include
    those headers directly instead of assuming availability. As this
    conversion needs to touch a large number of source files, the
    following script is used as the basis of conversion.

    http://userweb.kernel.org/~tj/misc/slabh-sweep.py

    The script does the following.

    * Scan files for gfp and slab usages and update includes such that
    only the necessary includes are there, i.e. if only gfp is used,
    gfp.h; if slab is used, slab.h.

    * When the script inserts a new include, it looks at the include
    blocks and tries to put the new include such that its order conforms
    to its surroundings. It's put in the include block which contains
    core kernel includes, in the same order that the rest are ordered -
    alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
    doesn't seem to be any matching order.

    * If the script can't find a place to put a new include (mostly
    because the file doesn't have a fitting include block), it prints out
    an error message indicating which .h file needs to be added to the
    file.
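
    The scan-and-insert behaviour described above can be sketched roughly as
    follows. This is not the slabh-sweep.py script itself: the identifier
    lists are short hypothetical samples, the helper names are invented for
    illustration, and only the alphabetical ordering case is handled.

```python
import re

# Hypothetical, deliberately short identifier lists; the real script's
# heuristics are not reproduced here.
SLAB_IDENTS = ("kmalloc", "kzalloc", "kcalloc", "kfree", "kmem_cache_alloc")
GFP_IDENTS = ("GFP_KERNEL", "GFP_ATOMIC", "alloc_pages", "__get_free_pages")


def uses_any(source, idents):
    """Return True if any identifier appears as a whole word in source."""
    pattern = r"\b(?:" + "|".join(map(re.escape, idents)) + r")\b"
    return re.search(pattern, source) is not None


def needed_header(source):
    """Pick the one header a file needs; slab.h already pulls in gfp.h."""
    if uses_any(source, SLAB_IDENTS):
        return "linux/slab.h"
    if uses_any(source, GFP_IDENTS):
        return "linux/gfp.h"
    return None


def insert_alphabetically(includes, header):
    """Add header to an include block that is assumed to be alphabetical."""
    if header in includes:
        return list(includes)
    return sorted(list(includes) + [header])
```

    A file that calls kmalloc(..., GFP_KERNEL) gets only linux/slab.h in this
    sketch, matching the rule that gfp.h is added only when gfp alone is used.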

    The conversion was done in the following steps.

    1. The initial automatic conversion of all .c files updated slightly
    over 4000 files, deleting around 700 includes and adding ~480 gfp.h
    and ~3000 slab.h inclusions. The script emitted errors for ~400
    files.

    2. Each error was manually checked. Some didn't need the inclusion,
    some needed manual addition while adding it to implementation .h or
    embedding .c file was more appropriate for others. This step added
    inclusions to around 150 files.

    3. The script was run again and the output was compared to the edits
    from #2 to make sure no file was left behind.

    4. Several build tests were done and a couple of problems were fixed.
    e.g. lib/decompress_*.c used malloc/free() wrappers around slab
    APIs requiring slab.h to be added manually.

    5. The script was run on all .h files but without automatically
    editing them as sprinkling gfp.h and slab.h inclusions around .h
    files could easily lead to inclusion dependency hell. Most gfp.h
    inclusion directives were ignored as stuff from gfp.h was usually
    widely available and often used in preprocessor macros. Each
    slab.h inclusion directive was examined and added manually as
    necessary.

    6. percpu.h was updated not to include slab.h.

    7. Build tests were done on the following configurations and failures
    were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
    distributed build env didn't work with gcov compiles) and a few
    more options had to be turned off depending on archs to make things
    build (like ipr on powerpc/64 which failed due to missing writeq).

    * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
    * powerpc and powerpc64 SMP allmodconfig
    * sparc and sparc64 SMP allmodconfig
    * ia64 SMP allmodconfig
    * s390 SMP allmodconfig
    * alpha SMP allmodconfig
    * um on x86_64 SMP allmodconfig

    8. percpu.h modifications were reverted so that it could be applied as
    a separate patch and serve as bisection point.

    Given the fact that I had only a couple of failures from tests on step
    6, I'm fairly confident about the coverage of this conversion patch.
    If there is a breakage, it's likely to be something in one of the arch
    headers which should be easily discoverable on most builds of
    the specific arch.

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

    Tejun Heo
     

03 Jun, 2009

1 commit


08 May, 2009

1 commit

  • Tim Starling reported that crashdump will panic with a kernel compiled
    with CONFIG_KEXEC_JUMP due to a null pointer dereference in
    machine_kexec_32.c: machine_kexec(), when dereferencing
    kexec_image. See:

    http://bugzilla.kernel.org/show_bug.cgi?id=13265

    This patch fixes the bug by replacing the global variable reference
    kexec_image in machine_kexec() with the local variable reference image,
    which is more appropriate and will not be null.

    The same bug exists in machine_kexec_64.c, so it is fixed in the same way.

    [ Impact: fix crash on kexec ]

    Reported-by: Tim Starling
    Signed-off-by: Huang Ying
    LKML-Reference:
    Signed-off-by: H. Peter Anvin

    Huang Ying
     

11 Mar, 2009

3 commits


04 Feb, 2009

1 commit

  • Impact: reduce kernel BSS size by 7 pages, improve code readability

    Two page tables are used in the current x86_64 kexec implementation. One
    is used to jump from the kernel virtual address to the identity map
    address, the other is used to map all physical memory. In fact, on x86_64,
    there is no conflict between the kernel virtual address space and the
    physical memory space, so just one page table is sufficient. The page table
    pages used to map the control page are dynamically allocated to save
    memory if no kexec image is loaded. The ASM code used to map the control
    page is replaced by C code too.

    Signed-off-by: Huang Ying
    Signed-off-by: H. Peter Anvin

    Huang Ying
     

27 Jul, 2008

1 commit

  • This patch provides an enhancement to kexec/kdump. It implements the
    following features:

    - Backup/restore memory used by the original kernel before/after
    kexec.

    - Save/restore CPU state before/after kexec.

    The features of this patch can be used as a general method to call a program in
    physical mode (paging turned off). This can be used to call BIOS code under
    Linux.

    kexec-tools needs to be patched to support kexec jump. The patches and
    the precompiled kexec can be downloaded from the following URLs:

    source: http://khibernation.sourceforge.net/download/release_v10/kexec-tools/kexec-tools-src_git_kh10.tar.bz2
    patches: http://khibernation.sourceforge.net/download/release_v10/kexec-tools/kexec-tools-patches_git_kh10.tar.bz2
    binary: http://khibernation.sourceforge.net/download/release_v10/kexec-tools/kexec_git_kh10

    Usage example of calling some physical mode code and returning:

    1. Compile and install the patched kernel with the following options selected:

    CONFIG_X86_32=y
    CONFIG_KEXEC=y
    CONFIG_PM=y
    CONFIG_KEXEC_JUMP=y

    2. Build the patched kexec-tools or download the pre-built one.

    3. Build a physical mode executable, named for example "phy_mode".

    4. Boot the kernel compiled in step 1.

    5. Load the physical mode executable with /sbin/kexec. The shell command
    line can be as follows:

    /sbin/kexec --load-preserve-context --args-none phy_mode

    6. Call the physical mode executable with the following shell command line:

    /sbin/kexec -e

    Implementation point:

    To support jumping without reserving memory, one shadow backup page (source
    page) is allocated for each page used by the kexeced code image (destination
    page). During kexec_load, the image of the kexeced code is loaded into the
    source pages, and before executing, the destination pages and the source
    pages are swapped, so the contents of the destination pages are backed up.
    Before jumping to the kexeced code image and after jumping back to the
    original kernel, the destination pages and the source pages are swapped too.
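
    The swap-based backup can be illustrated with a small model. This is only
    a sketch of the idea, not the kernel code: pages are modelled as Python
    bytearrays, and PAGE_SIZE and swap_pages are names chosen for the example.

```python
PAGE_SIZE = 4096  # illustrative page size


def swap_pages(dest_pages, src_pages):
    """Exchange the contents of each destination page with its shadow page."""
    for dest, src in zip(dest_pages, src_pages):
        # The right-hand side is copied first, so this is a true exchange.
        dest[:], src[:] = bytes(src), bytes(dest)


if __name__ == "__main__":
    kernel_page = bytearray(b"K" * PAGE_SIZE)  # destination page, owned by the running kernel
    image_page = bytearray(b"I" * PAGE_SIZE)   # source page, filled at load time

    swap_pages([kernel_page], [image_page])    # before jumping to the loaded image
    assert kernel_page[0:1] == b"I" and image_page[0:1] == b"K"

    swap_pages([kernel_page], [image_page])    # after jumping back
    assert kernel_page[0:1] == b"K" and image_page[0:1] == b"I"
```

    Because the same exchange is applied on the way out and on the way back,
    the original kernel's pages are restored without any memory having been
    reserved in advance.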

    C ABI (calling convention) is used as communication protocol between
    kernel and called code.

    A flag named KEXEC_PRESERVE_CONTEXT for sys_kexec_load is added to
    indicate that the loaded kernel image is used for jumping back.

    Currently, only the i386 architecture is supported.

    Signed-off-by: Huang Ying
    Acked-by: Vivek Goyal
    Cc: "Eric W. Biederman"
    Cc: Pavel Machek
    Cc: Nigel Cunningham
    Cc: "Rafael J. Wysocki"
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Huang Ying
     

14 Jul, 2008

1 commit


08 Jul, 2008

1 commit


24 May, 2008

1 commit


03 Apr, 2008

1 commit

  • Fix the problem that makedumpfile sometimes fails on x86_64 machines.

    This patch adds the symbol "phys_base" to the vmcoreinfo data. The
    vmcoreinfo data contains only the minimum debugging information needed
    for dump filtering. makedumpfile (the dump filtering command) uses it to
    identify unnecessary pages and create a small dumpfile.

    On an x86_64 kernel compiled with CONFIG_PHYSICAL_START=0x0 and
    CONFIG_RELOCATABLE=y, makedumpfile fails as follows:

    # makedumpfile -d31 /proc/vmcore dumpfile
    The kernel version is not supported.
    The created dumpfile may be incomplete.
    _exclude_free_page: Can't get next online node.

    makedumpfile Failed.
    #

    The cause is the lack of the symbol "phys_base" in the vmcoreinfo data.
    If the symbol "phys_base" does not exist, makedumpfile considers an
    x86_64 kernel to be non-relocatable. As a result, makedumpfile
    misunderstands the physical address where the kernel is loaded, and it
    cannot translate a kernel virtual address to a physical address correctly.

    To fix this problem, this patch adds the symbol "phys_base" to the
    vmcoreinfo data.
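
    The effect of a missing "phys_base" can be shown with the usual x86_64
    kernel-text translation. The sketch below is illustrative only and is not
    makedumpfile's code; the function name is made up, and the addresses used
    in the example are hypothetical.

```python
START_KERNEL_MAP = 0xFFFFFFFF80000000  # x86_64 __START_KERNEL_map


def kernel_text_to_phys(vaddr, phys_base=0):
    """Translate a kernel-text virtual address to a physical address.

    phys_base defaults to 0, which models a tool that treats the kernel
    as non-relocatable because the symbol is unavailable.
    """
    return vaddr - START_KERNEL_MAP + phys_base


if __name__ == "__main__":
    vaddr = 0xFFFFFFFF80200000   # hypothetical symbol address
    relocation = 0x1000000       # hypothetical phys_base of a relocated kernel

    print(hex(kernel_text_to_phys(vaddr)))              # assumes no relocation
    print(hex(kernel_text_to_phys(vaddr, relocation)))  # uses phys_base
```

    The two results differ by exactly phys_base, which is why a tool that
    assumes zero reads the wrong physical pages on a relocated kernel.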

    Signed-off-by: Ken'ichi Ohmichi
    Cc: "Eric W. Biederman"
    Cc:
    Acked-by: Vivek Goyal
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ken'ichi Ohmichi
     

08 Feb, 2008

1 commit

  • This patch fixes the configuration dependencies in the vmcoreinfo data.

    i386's "node_data" is defined in arch/x86/mm/discontig_32.c,
    and x86_64's is defined in arch/x86/mm/numa_64.c.
    They depend on CONFIG_NUMA:
      arch/x86/mm/Makefile_32:7
        obj-$(CONFIG_NUMA) += discontig_32.o
      arch/x86/mm/Makefile_64:7
        obj-$(CONFIG_NUMA) += numa_64.o

    ia64's "pgdat_list" is defined in arch/ia64/mm/discontig.c,
    and it depends on CONFIG_DISCONTIGMEM or CONFIG_SPARSEMEM:
      arch/ia64/mm/Makefile:9-10
        obj-$(CONFIG_DISCONTIGMEM) += discontig.o
        obj-$(CONFIG_SPARSEMEM) += discontig.o

    ia64's "node_memblk" is defined in arch/ia64/mm/numa.c,
    and it depends on CONFIG_NUMA:
      arch/ia64/mm/Makefile:8
        obj-$(CONFIG_NUMA) += numa.o

    Signed-off-by: Ken'ichi Ohmichi
    Acked-by: Simon Horman
    Cc: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ken'ichi Ohmichi
     

30 Jan, 2008

1 commit

  • Use sparsemem as the only memory model for UP, SMP and NUMA. Measurements
    indicate that DISCONTIGMEM has a higher overhead than sparsemem, and
    FLATMEM's benefits are minimal, so I think it's best to simply standardize
    on sparsemem.

    Results of page allocator tests (the tests are available via git from the
    slab git tree, branch "tests"):

    Measurements in cycle counts. 1000 allocations were performed and then the
    average cycle count was calculated.

    Order  FlatMem  Discontig  SparseMem
        0      639        665        641
        1      567        647        593
        2      679        774        692
        3      763        967        781
        4      961       1501        962
        5     1356       2344       1392
        6     2224       3982       2336
        7     4869       7225       5074
        8    12500      14048      12732
        9    27926      28223      28165
       10    58578      58714      58682

    (Note that FlatMem is an SMP config and the rest are NUMA configurations)
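
    To make the comparison easier to read, the relative overhead of each model
    against FlatMem can be computed from the cycle counts in the table above.
    This is just arithmetic on the reported numbers; the helper name is
    illustrative.

```python
# (order, FlatMem, Discontig, SparseMem) cycle counts from the table above
RESULTS = [
    (0, 639, 665, 641),
    (1, 567, 647, 593),
    (2, 679, 774, 692),
    (3, 763, 967, 781),
    (4, 961, 1501, 962),
    (5, 1356, 2344, 1392),
    (6, 2224, 3982, 2336),
    (7, 4869, 7225, 5074),
    (8, 12500, 14048, 12732),
    (9, 27926, 28223, 28165),
    (10, 58578, 58714, 58682),
]


def overhead_percent(value, baseline):
    """Relative cost of value over baseline, in percent."""
    return (value - baseline) / baseline * 100.0


print(f"{'Order':>5} {'Discontig %':>12} {'SparseMem %':>12}")
for order, flat, discontig, sparse in RESULTS:
    print(f"{order:>5} {overhead_percent(discontig, flat):>12.1f} "
          f"{overhead_percent(sparse, flat):>12.1f}")
```

    In these numbers the Discontig column is above the SparseMem column at
    every order, with the largest gap at the middle orders.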

    Memory use:

    SMP Sparsemem
    -------------

    Kernel size:

       text    data     bss     dec    hex filename
    3849268  397739 1264856 5511863 541ab7 vmlinux

                 total       used       free     shared    buffers     cached
    Mem:       8242252      41164    8201088          0        352      11512
    -/+ buffers/cache:      29300    8212952
    Swap:      9775512          0    9775512

    SMP Flatmem
    -----------

    Kernel size:

       text    data     bss     dec    hex filename
    3844612  397739 1264536 5506887 540747 vmlinux

    So 4.5k growth in text size vs. FLATMEM.

                 total       used       free     shared    buffers     cached
    Mem:       8244052      40544    8203508          0        352      11484
    -/+ buffers/cache:      28708    8215344

    2k growth in overall memory use after boot.

    NUMA discontig:

       text    data     bss     dec    hex filename
    3888124  470659 1276504 5635287 55fcd7 vmlinux

                 total       used       free     shared    buffers     cached
    Mem:       8256256      56908    8199348          0        352      11496
    -/+ buffers/cache:      45060    8211196
    Swap:      9775512          0    9775512

    NUMA sparse:

       text    data     bss     dec    hex filename
    3896428  470659 1276824 5643911 561e87 vmlinux

    8k text growth. Given that we fully inline virt_to_page and friends now
    that is rather good.

                 total       used       free     shared    buffers     cached
    Mem:       8264720      57240    8207480          0        352      11516
    -/+ buffers/cache:      45372    8219348
    Swap:      9775512          0    9775512

    The total available memory is increased by 8k.

    This patch makes sparsemem the default and removes discontig and
    flatmem support from x86.

    [ akpm@linux-foundation.org: allnoconfig build fix ]

    Acked-by: Andi Kleen
    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Ingo Molnar
    Signed-off-by: Thomas Gleixner

    Christoph Lameter
     

28 Oct, 2007

1 commit

  • This patch adds the symbol "init_level4_pgt" to the vmcoreinfo data so
    that makedumpfile (the dump filtering command) supports the x86_64
    sparsemem kernel of linux-2.6.24.

    makedumpfile creates a small dumpfile by excluding pages that are
    unnecessary for the analysis. It checks attributes in page structures to
    distinguish necessary pages from unnecessary ones. To check them,
    makedumpfile reads the vmcoreinfo data, which contains only the minimum
    debugging information needed for dump filtering.

    For older x86_64 kernels (linux-2.6.23 or before), makedumpfile translates
    the virtual address of a page structure into a physical address by
    subtracting PAGE_OFFSET from the virtual address, but this translation
    doesn't work for the linux-2.6.24 sparsemem kernel, because its page
    structures are in the virtual memmap area. makedumpfile has to translate
    their virtual addresses by 4-level paging, and for that it needs the symbol
    "init_level4_pgt".
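
    The difference between the two translations can be sketched as follows.
    The index split is the standard x86_64 4-level layout for 4 KiB pages;
    everything else is a simplification for illustration: the page tables are
    toy nested dicts, large pages are ignored, and the function names are not
    taken from makedumpfile.

```python
def direct_map_to_phys(vaddr, page_offset):
    """Old method: valid only for addresses inside the direct mapping."""
    return vaddr - page_offset


def split_vaddr(vaddr):
    """Split an x86_64 virtual address into 4-level paging indices."""
    return {
        "pgd": (vaddr >> 39) & 0x1FF,   # bits 47..39
        "pud": (vaddr >> 30) & 0x1FF,   # bits 38..30
        "pmd": (vaddr >> 21) & 0x1FF,   # bits 29..21
        "pte": (vaddr >> 12) & 0x1FF,   # bits 20..12
        "offset": vaddr & 0xFFF,        # bits 11..0
    }


def walk(pgd_table, vaddr):
    """Walk toy page tables (nested dicts) starting at the top-level table.

    pgd_table stands in for the table that init_level4_pgt points to.
    A KeyError means the address is not mapped in the toy tables.
    """
    idx = split_vaddr(vaddr)
    pud_table = pgd_table[idx["pgd"]]
    pmd_table = pud_table[idx["pud"]]
    pte_table = pmd_table[idx["pmd"]]
    frame = pte_table[idx["pte"]]      # physical address of the 4 KiB frame
    return frame + idx["offset"]
```

    The subtraction needs only a constant, while the walk needs a starting
    table, which is the role "init_level4_pgt" plays for the dump tool.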

    Signed-off-by: Ken'ichi Ohmichi
    Signed-off-by: Thomas Gleixner

    Ken'ichi Ohmichi
     

20 Oct, 2007

1 commit

  • This patch removes the crashkernel parsing from
    arch/x86_64/kernel/machine_kexec.c and calls the generic function, introduced
    in the last patch, in setup_bootmem_allocator().

    This is necessary because, with the new syntax, the amount of System RAM
    must be known in this function.
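
    Why the amount of System RAM is needed can be seen from a rough model of
    the range-based form, crashkernel=<range1>:<size1>[,<range2>:<size2>,...][@offset].
    That form is taken here from the kernel's kdump documentation rather than
    from this commit message, and the parser below is only a sketch: it
    accepts upper-case K/M/G suffixes only and its function names are
    hypothetical.

```python
import re

UNITS = {"": 1, "K": 1 << 10, "M": 1 << 20, "G": 1 << 30}


def parse_size(text):
    """Parse a size such as '64M' into bytes (upper-case suffixes only)."""
    match = re.fullmatch(r"(\d+)([KMG]?)", text)
    if match is None:
        raise ValueError(f"bad size: {text!r}")
    return int(match.group(1)) * UNITS[match.group(2)]


def crashkernel_size(spec, system_ram):
    """Return the reservation size for system_ram bytes, or 0 if no range fits."""
    spec = spec.split("@", 1)[0]            # the optional @offset is ignored here
    for entry in spec.split(","):
        mem_range, size = entry.split(":")
        start, end = mem_range.split("-")   # an empty end means "no upper bound"
        low = parse_size(start)
        high = parse_size(end) if end else float("inf")
        if low <= system_ram < high:
            return parse_size(size)
    return 0


if __name__ == "__main__":
    spec = "512M-2G:64M,2G-:128M"
    for ram in (256 << 20, 1 << 30, 4 << 30):
        print(ram, crashkernel_size(spec, ram))
```

    The reservation depends on which range the machine's memory falls into, so
    it cannot be chosen before the total is known.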

    Signed-off-by: Bernhard Walle
    Cc: Andi Kleen
    Cc: Vivek Goyal
    Cc: "Eric W. Biederman"
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Bernhard Walle
     

17 Oct, 2007

2 commits

  • Add a prefix "VMCOREINFO_" to the vmcoreinfo macros. The old vmcoreinfo
    macros were defined with generic names (SYMBOL/SIZE/OFFSET/LENGTH/CONFIG),
    which are impossible to grep for, so these names should be changed. The
    discussion is here:
    http://www.ussg.iu.edu/hypermail/linux/kernel/0709.1/0415.html

    Signed-off-by: Ken'ichi Ohmichi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ken'ichi Ohmichi
     
  • This patch set removes the restriction that makedumpfile users must install a
    vmlinux file (including the debugging information) on each system.

    The makedumpfile command is the dump filtering feature for kdump. It creates a
    small dumpfile by filtering out pages that are unnecessary for the analysis. To
    identify unnecessary pages, it needs a vmlinux file including the debugging
    information. These days, the debugging package has become a huge file, and it is
    hard to install it on each system.

    To solve the problem, kdump developers discussed it on lkml and kexec-ml. As
    a result, we reached the conclusion that the information necessary for dump
    filtering (called "vmcoreinfo") should be embedded into the first kernel file
    and should be accessed through /proc/vmcore from the second kernel.
    (http://www.uwsg.iu.edu/hypermail/linux/kernel/0707.0/1806.html)

    Dan Aloni created the patch set for the above implementation.
    (http://www.uwsg.iu.edu/hypermail/linux/kernel/0707.1/1053.html)

    And I updated it for multiple architectures and memory models.
    (http://lists.infradead.org/pipermail/kexec/2007-August/000479.html)
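
    For orientation, the vmcoreinfo data is plain text made of KEY=VALUE
    lines, with symbol addresses written in hexadecimal. A minimal reader
    might look like the sketch below; the sample values are invented and the
    function names are not taken from makedumpfile.

```python
def parse_vmcoreinfo(text):
    """Parse vmcoreinfo-style 'KEY=VALUE' lines into a dict of strings."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        info[key] = value
    return info


def symbol_address(info, name):
    """Return the address recorded for SYMBOL(name), or None if absent."""
    value = info.get(f"SYMBOL({name})")
    return int(value, 16) if value is not None else None


if __name__ == "__main__":
    # Invented sample; real values depend on the kernel build.
    sample = """\
OSRELEASE=2.6.24
PAGESIZE=4096
SYMBOL(init_level4_pgt)=ffffffff80201000
"""
    info = parse_vmcoreinfo(sample)
    print(info["OSRELEASE"], info["PAGESIZE"])
    print(hex(symbol_address(info, "init_level4_pgt")))
    print(symbol_address(info, "missing_symbol"))
```

    A tool reading this data can tell a missing symbol from a present one,
    which is how it decides which translation method it is able to use.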

    Signed-off-by: Dan Aloni
    Signed-off-by: Ken'ichi Ohmichi
    Signed-off-by: Bernhard Walle
    Signed-off-by: Daisuke Nishimura
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ken'ichi Ohmichi
     

14 Oct, 2007

1 commit

  • Since the x86 merge, lots of files that referenced their own filenames
    are no longer correct. Rather than keep them up to date, just delete
    them, as they add no real value.

    Additionally:
    - fix up comment formatting in scx200_32.c
    - Remove a credit from myself in setup_64.c from a time when we had no SCM
    - remove longwinded history from tsc_32.c which can be figured out from
    git.

    Signed-off-by: Dave Jones
    Signed-off-by: Linus Torvalds

    Dave Jones
     

11 Oct, 2007

1 commit