Eric Lee / smarc-ti-linux-kernel | Embedian Git Server

23 Sep, 2008

1 commit

f9092f358 kexec: fix segmentation fault in kimage_add_entry

A segmentation fault can occur in kimage_add_entry in kexec.c when loading
a kernel image into memory. The fault occurs because a page is requested
by calling kimage_alloc_page with gfp_mask GFP_KERNEL and the function may
actually return a page with gfp_mask GFP_HIGHUSER. The high mem page is
returned because it was swapped with the kernel page due to the kernel
page being a page that will shortly be copied to.

This patch ensures that kimage_alloc_page returns a page that was created
with the correct gfp flags.

I have verified the change and fixed the whitespace damage of the original
patch. Jonathan did a great job of tracking this down after he hit the
problem. -- Eric

Signed-off-by: Jonathan Steel
Signed-off-by: Eric W. Biederman
Acked-by: Simon Horman
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Jonathan Steel
2008-09-23 23:09:14 +0800

15 Aug, 2008

7 commits

8c5a1cf0a kexec: use a mutex for locking rather than xchg()

Functionally the same, but more conventional.
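
The difference between the two locking styles can be sketched in user space. This is a minimal illustration, assuming C11 atomics and POSIX threads as stand-ins for the kernel's xchg() and mutex primitives; every name below is hypothetical and none of it is the actual kernel/kexec.c code.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Old style: an integer flag claimed with an atomic exchange.
 * lock_word is a hypothetical stand-in for the kernel's lock variable. */
static atomic_int lock_word;

static bool xchg_trylock(void)
{
    /* Succeeds only if the previous value was 0, i.e. the lock was free. */
    return atomic_exchange(&lock_word, 1) == 0;
}

static void xchg_unlock(void)
{
    atomic_exchange(&lock_word, 0);
}

/* New style: a real mutex, taken with trylock so the caller never blocks. */
static pthread_mutex_t lock_mutex = PTHREAD_MUTEX_INITIALIZER;

static bool mutex_trylock_sketch(void)
{
    return pthread_mutex_trylock(&lock_mutex) == 0;
}

static void mutex_unlock_sketch(void)
{
    pthread_mutex_unlock(&lock_mutex);
}
```

Both pairs reject a second attempt while the lock is held, which is why the change is functionally the same. The mutex form also avoids the "value computed is not used" warning that an unlock written as a bare exchange can trigger.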

Cc: Huang Ying
Tested-by: Vivek Goyal
Cc: "Eric W. Biederman"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Andrew Morton
2008-08-15 23:35:43 +0800
3122c3311 kexec jump: fix for ftrace

Ftrace depends on some processor state that is destroyed during kexec and
restored by restore_processor_state(). So save_processor_state() and
restore_processor_state() are moved into machine_kexec(), and ftrace is
restored after restore_processor_state().

Signed-off-by: Huang Ying
Cc: Pavel Machek
Cc: "Rafael J. Wysocki"
Cc: "Eric W. Biederman"
Cc: Vivek Goyal
Cc: Ingo Molnar
Cc: Steven Rostedt
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Huang Ying
2008-08-15 23:35:43 +0800
73bd9c72a kexec jump: in sync with hibernation implementation

Add device_pm_lock() and device_pm_unlock() in kernel_kexec() to stay in sync
with the current hibernation implementation.

Signed-off-by: Huang Ying
Acked-by: Pavel Machek
Cc: "Rafael J. Wysocki"
Cc: "Eric W. Biederman"
Cc: Vivek Goyal
Cc: Ingo Molnar
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Huang Ying
2008-08-15 23:35:42 +0800
ca195b7f6 kexec jump: remove duplication of kexec_restart_prepare()

Call kernel_restart_prepare() in kernel_kexec() instead of duplicating the
code.

Signed-off-by: Huang Ying
Acked-by: Pavel Machek
Acked-by: Vivek Goyal
Cc: Pavel Machek
Cc: "Rafael J. Wysocki"
Cc: "Eric W. Biederman"
Cc: Vivek Goyal
Cc: Ingo Molnar
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Huang Ying
2008-08-15 23:35:42 +0800
163f6876f kexec jump: rename KEXEC_CONTROL_CODE_SIZE to KEXEC_CONTROL_PAGE_SIZE

Rename KEXEC_CONTROL_CODE_SIZE to KEXEC_CONTROL_PAGE_SIZE, because the control
page is used for more than code on some platforms. For example, in kexec
jump it is used for data and stack too.

[akpm@linux-foundation.org: unbreak powerpc and arm, finish conversion]
Signed-off-by: Huang Ying
Cc: Pavel Machek
Cc: "Rafael J. Wysocki"
Cc: "Eric W. Biederman"
Cc: Vivek Goyal
Cc: Ingo Molnar
Cc: Russell King
Cc: Benjamin Herrenschmidt
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Huang Ying
2008-08-15 23:35:42 +0800
7ade3fcc1 kexec jump: clean up #ifdef and comments

Move if (kexec_image->preserve_context) { ... } into #ifdef
CONFIG_KEXEC_JUMP to make the code look cleaner.

Fix the no-longer-correct comments of kernel_kexec().

Signed-off-by: Huang Ying
Acked-by: Vivek Goyal
Cc: Pavel Machek
Cc: "Rafael J. Wysocki"
Cc: "Eric W. Biederman"
Cc: Vivek Goyal
Cc: Ingo Molnar
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Huang Ying
2008-08-15 23:35:42 +0800
4cd69b986 kexec: fix compilation warning on xchg(&kexec_lock, 0) in kernel_kexec()

kernel/kexec.c: In function 'kernel_kexec':
kernel/kexec.c:1506: warning: value computed is not used

Signed-off-by: Huang Ying
Cc: "Eric W. Biederman"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Huang Ying
2008-08-15 23:35:42 +0800

27 Jul, 2008

3 commits

89081d17f kexec jump: save/restore device state

This patch implements device state save/restore before and after kexec.

This patch, together with the features in the kexec_jump patch, can be used
for the following:

- A simple hibernation implementation without ACPI support. You can kexec a
hibernating kernel, save the memory image of the original system and shut down
the system. When resuming, you restore the memory image of the original system
via an ordinary kexec load, then jump back.

- Kernel/system debugging by making system snapshots. You can make a system
snapshot, jump back, do something and make another system snapshot.

- Cooperative multi-kernel/system. With kexec jump, you can switch between
several kernels/systems quickly without a boot process except the first time.
This appears like swapping a whole kernel/system out/in.

- A general method to call a program in physical mode (paging turned
off). This can be used to invoke BIOS code under Linux.

The following user-space tools can be used with kexec jump:

- kexec-tools needs to be patched to support kexec jump. The patches
and the precompiled kexec can be downloaded from the following URLs:
source: http://khibernation.sourceforge.net/download/release_v10/kexec-tools/kexec-tools-src_git_kh10.tar.bz2
patches: http://khibernation.sourceforge.net/download/release_v10/kexec-tools/kexec-tools-patches_git_kh10.tar.bz2
binary: http://khibernation.sourceforge.net/download/release_v10/kexec-tools/kexec_git_kh10

- makedumpfile with patches is used as the memory image saving tool; it
can exclude free pages from the original kernel memory image file. The
patches and the precompiled makedumpfile can be downloaded from the
following URLs:
source: http://khibernation.sourceforge.net/download/release_v10/makedumpfile/makedumpfile-src_cvs_kh10.tar.bz2
patches: http://khibernation.sourceforge.net/download/release_v10/makedumpfile/makedumpfile-patches_cvs_kh10.tar.bz2
binary: http://khibernation.sourceforge.net/download/release_v10/makedumpfile/makedumpfile_cvs_kh10

- An initramfs image can be used as the root file system of the kexeced
kernel. An initramfs image built with "BuildRoot" can be downloaded
from the following URL:
initramfs image: http://khibernation.sourceforge.net/download/release_v10/initramfs/rootfs_cvs_kh10.gz
All user space tools above are included in the initramfs image.

Usage example of simple hibernation:

1. Compile and install the patched kernel with the following options selected:

CONFIG_X86_32=y
CONFIG_RELOCATABLE=y
CONFIG_KEXEC=y
CONFIG_CRASH_DUMP=y
CONFIG_PM=y
CONFIG_HIBERNATION=y
CONFIG_KEXEC_JUMP=y

2. Build an initramfs image that contains kexec-tools and makedumpfile, or
download the pre-built initramfs image, called rootfs.gz in the
following text.

3. Prepare a partition to save the memory image of the original kernel, called
the hibernating partition in the following text.

4. Boot the kernel compiled in step 1 (kernel A).

5. In kernel A, load the kernel compiled in step 1 (kernel B) with
/sbin/kexec. The shell command line can be as follows:

/sbin/kexec --load-preserve-context /boot/bzImage --mem-min=0x100000 \
--mem-max=0xffffff --initrd=rootfs.gz

6. Boot kernel B with the following shell command line:

/sbin/kexec -e

7. Kernel B will boot as with a normal kexec. In kernel B the memory
image of kernel A can be saved into the hibernating partition as
follows:

jump_back_entry=`cat /proc/cmdline | tr ' ' '\n' | grep kexec_jump_back_entry | cut -d '=' -f 2`
echo $jump_back_entry > kexec_jump_back_entry
cp /proc/vmcore dump.elf

Then you can shut down the machine as normal.

8. Boot the kernel compiled in step 1 (kernel C). Use rootfs.gz as the
root file system.

9. In kernel C, load the memory image of kernel A as follows:

/sbin/kexec -l --args-none --entry=`cat kexec_jump_back_entry` dump.elf

10. Jump back to kernel A as follows:

/sbin/kexec -e

Then, kernel A is resumed.

Implementation point:

To support jumping between two kernels, before jumping to (executing)
the new kernel and before jumping back to the original kernel, the devices
are put into a quiescent state, and the state of the devices and CPU is
saved. After jumping back from the kexeced kernel and after jumping to the new
kernel, the state of the devices and CPU is restored accordingly. The
device/CPU state save/restore code of software suspend is called to
implement the corresponding functions.

Known issues:

- Because the number of segments supported by sys_kexec_load is limited,
a hibernation image with many segments may fail to load. This is
planned to be eliminated by adding a new flag to sys_kexec_load so
that an image can be loaded with multiple sys_kexec_load invocations.

Now, only the i386 architecture is supported.

Signed-off-by: Huang Ying
Acked-by: Vivek Goyal
Cc: "Eric W. Biederman"
Cc: Pavel Machek
Cc: Nigel Cunningham
Cc: "Rafael J. Wysocki"
Cc: Ingo Molnar
Cc: Thomas Gleixner
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Huang Ying
2008-07-27 03:00:04 +0800
3ab835213 kexec jump

This patch provides an enhancement to kexec/kdump. It implements the
following features:

- Backup/restore memory used by the original kernel before/after
kexec.

- Save/restore CPU state before/after kexec.

The features of this patch can be used as a general method to call a program in
physical mode (paging turned off). This can be used to call BIOS code under
Linux.

kexec-tools needs to be patched to support kexec jump. The patches and
the precompiled kexec can be downloaded from the following URLs:

source: http://khibernation.sourceforge.net/download/release_v10/kexec-tools/kexec-tools-src_git_kh10.tar.bz2
patches: http://khibernation.sourceforge.net/download/release_v10/kexec-tools/kexec-tools-patches_git_kh10.tar.bz2
binary: http://khibernation.sourceforge.net/download/release_v10/kexec-tools/kexec_git_kh10

Usage example of calling some physical mode code and returning:

1. Compile and install the patched kernel with the following options selected:

CONFIG_X86_32=y
CONFIG_KEXEC=y
CONFIG_PM=y
CONFIG_KEXEC_JUMP=y

2. Build the patched kexec-tools or download the pre-built one.

3. Build a physical mode executable, named for example "phy_mode".

4. Boot the kernel compiled in step 1.

5. Load the physical mode executable with /sbin/kexec. The shell command
line can be as follows:

/sbin/kexec --load-preserve-context --args-none phy_mode

6. Call the physical mode executable with the following shell command line:

/sbin/kexec -e

Implementation point:

To support jumping without reserving memory, one shadow backup page (source
page) is allocated for each page used by the kexeced code image (destination
page). During kexec_load, the image of the kexeced code is loaded into the
source pages, and before executing, the destination pages and the source pages
are swapped, so the contents of the destination pages are backed up. Before
jumping to the kexeced code image and after jumping back to the original
kernel, the destination pages and the source pages are swapped too.
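
The shadow-page swap can be illustrated in user space. This is only a sketch, assuming a 4096-byte page and ordinary memory buffers. The names SKETCH_PAGE_SIZE, swap_page and swap_all are hypothetical, and the real swap runs in architecture-specific relocation code rather than in C like this.

```c
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096   /* assumed page size for this sketch */

/* Exchange the contents of one destination page and its shadow source page. */
static void swap_page(unsigned char *dest, unsigned char *src)
{
    unsigned char tmp[SKETCH_PAGE_SIZE];

    memcpy(tmp, dest, SKETCH_PAGE_SIZE);
    memcpy(dest, src, SKETCH_PAGE_SIZE);
    memcpy(src, tmp, SKETCH_PAGE_SIZE);
}

/* Swap every destination page with its paired source page. */
static void swap_all(unsigned char (*dest)[SKETCH_PAGE_SIZE],
                     unsigned char (*src)[SKETCH_PAGE_SIZE],
                     size_t npages)
{
    size_t i;

    for (i = 0; i < npages; i++)
        swap_page(dest[i], src[i]);
}
```

Calling swap_all() once moves the loaded image into the destination pages while parking the original contents in the source pages. Calling it a second time restores the original contents, which is the property the jump-back path relies on.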

C ABI (calling convention) is used as communication protocol between
kernel and called code.

A flag named KEXEC_PRESERVE_CONTEXT for sys_kexec_load is added to
indicate that the loaded kernel image is used for jumping back.

Now, only the i386 architecture is supported.

Signed-off-by: Huang Ying
Acked-by: Vivek Goyal
Cc: "Eric W. Biederman"
Cc: Pavel Machek
Cc: Nigel Cunningham
Cc: "Rafael J. Wysocki"
Cc: Ingo Molnar
Cc: Thomas Gleixner
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Huang Ying
2008-07-27 03:00:04 +0800
7fccf0326 kernel/kexec.c: make 'kimage_terminate' void

Since kimage_terminate() always returns 0, make it void.

Signed-off-by: WANG Cong
Signed-off-by: "Eric W. Biederman"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

WANG Cong
2008-07-27 03:00:04 +0800

01 May, 2008

1 commit

be089d79c kexec: make extended crashkernel= syntax less confusing

The extended crashkernel syntax is a little confusing in the way it handles
ranges, e.g.:

crashkernel=512M-2G:64M,2G-:128M

This means that if the machine has between 512M and 2G of memory the crash
region should be 64M, and if the machine has exactly 2G of memory the region
should still be 64M. Only if the machine has more than 2G of memory will 128M
be allocated.

Although those semantics are correct, they are somewhat baffling. Instead I
propose that the end of the range means the first address past the end of the
range, i.e. 512M up to but not including 2G.
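
The proposed half-open range semantics can be sketched as follows. This is a user-space illustration, not the kernel's parse_crashkernel(); parse_size() and crash_size_for() are hypothetical names, and parse_size() only approximates the kernel's memparse() for the K/M/G suffixes.

```c
#include <stdlib.h>

/* Parse sizes such as "512M" or "2G" and report where parsing stopped. */
static unsigned long long parse_size(const char *s, const char **end)
{
    char *e;
    unsigned long long v = strtoull(s, &e, 0);

    switch (*e) {
    case 'G': case 'g': v <<= 10; /* fall through */
    case 'M': case 'm': v <<= 10; /* fall through */
    case 'K': case 'k': v <<= 10; e++; break;
    default: break;
    }
    *end = e;
    return v;
}

/* Return the crash region size for system_ram, or 0 if nothing matches.
 * A range "start-end" matches when start <= system_ram < end, so the
 * end is exclusive; an omitted end means "no upper bound". */
static unsigned long long crash_size_for(const char *spec,
                                         unsigned long long system_ram)
{
    const char *p = spec;

    while (*p) {
        unsigned long long start = parse_size(p, &p);
        unsigned long long end = ~0ULL;
        unsigned long long size;

        if (*p != '-')
            return 0;               /* malformed range */
        p++;
        if (*p != ':')
            end = parse_size(p, &p);
        if (*p != ':')
            return 0;               /* malformed entry */
        p++;
        size = parse_size(p, &p);

        if (system_ram >= start && system_ram < end)
            return size;
        if (*p != ',')
            break;
        p++;
    }
    return 0;
}
```

With the specification "512M-2G:64M,2G-:128M", a machine with 1G of memory gets 64M under this sketch and a machine with exactly 2G gets 128M, because 2G is no longer part of the first range.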

[bwalle@suse.de: clarify inclusive/exclusive in crashkernel commandline in documentation]
Signed-off-by: Michael Ellerman
Acked-by: Bernhard Walle
Cc: "Eric W. Biederman"
Cc: Simon Horman
Signed-off-by: Bernhard Walle
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Michael Ellerman
2008-05-01 23:04:00 +0800

28 Apr, 2008

1 commit

122c7a590 vmcoreinfo: add page flags values

Add some values of page flags to the vmcoreinfo data.

The vmcoreinfo data has the minimum debugging information only for dump
filtering. makedumpfile (dump filtering command) gets it to distinguish
unnecessary pages, and makedumpfile creates a small dumpfile.

An old makedumpfile (v1.2.4 or before) had assumed some values of page flags
internally, and this implementation could not follow changes to these
values. For example, Christoph Lameter is changing these values with the
following patch: http://lkml.org/lkml/2008/2/29/463

So a new makedumpfile (v1.2.5) came to need these values and I created this
patch to let the kernel output them.

Signed-off-by: Ken'ichi Ohmichi
Cc: Christoph Lameter
Cc: "Eric W. Biederman"
Acked-by: Vivek Goyal
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Ken'ichi Ohmichi
2008-04-28 23:58:23 +0800

19 Apr, 2008

1 commit

a65502075 kernel: Remove unnecessary inclusions of asm/semaphore.h

None of these files use any of the functionality promised by
asm/semaphore.h.

Signed-off-by: Matthew Wilcox

Matthew Wilcox
2008-04-19 10:17:04 +0800

08 Feb, 2008

2 commits

bba1f603b vmcoreinfo: add "VMCOREINFO_" to all the call for vmcoreinfo_append_str()

For readability, all the calls to vmcoreinfo_append_str() are changed to macros
having a prefix "VMCOREINFO_".

The discussion can be found here:
http://www.ussg.iu.edu/hypermail/linux/kernel/0709.3/0584.html

Signed-off-by: Ken'ichi Ohmichi
Acked-by: Simon Horman
Cc: David Rientjes
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Ken'ichi Ohmichi
2008-02-08 00:42:25 +0800
c76f860c4 vmcoreinfo: rename vmcoreinfo's macros returning the size

This patchset is for the vmcoreinfo data.

The vmcoreinfo data has the minimum debugging information only for dump
filtering. makedumpfile (dump filtering command) gets it to distinguish
unnecessary pages, and makedumpfile creates a small dumpfile.

This patch:

VMCOREINFO_SIZE() should be renamed VMCOREINFO_STRUCT_SIZE() since it's always
returning the size of the struct with a given name. This change would allow
VMCOREINFO_TYPEDEF_SIZE() to simply become VMCOREINFO_SIZE() since it need not
be used exclusively for typedefs.

The discussion can be found here:
http://www.ussg.iu.edu/hypermail/linux/kernel/0709.3/0582.html

Signed-off-by: Ken'ichi Ohmichi
Acked-by: David Rientjes
Acked-by: Simon Horman
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Ken'ichi Ohmichi
2008-02-08 00:42:25 +0800

09 Jan, 2008

1 commit

83a08e7c6 vmcoreinfo: add the array length of "free_list" for filtering free pages

This patch adds the array length of "free_area.free_list" to the vmcoreinfo
data so that makedumpfile (dump filtering command) can exclude all free pages
in linux-2.6.24.

makedumpfile creates a small dumpfile by excluding unnecessary pages for the
analysis. To distinguish unnecessary pages, makedumpfile gets the vmcoreinfo
data which has the minimum debugging information only for dump filtering.

In 2.6.24-rc1 or later, free_area.free_list is an array which has one list
for each migrate type instead of a single list. makedumpfile needs the array
length of "free_area.free_list" and the vmcoreinfo data should contain it.

Signed-off-by: Huang Ying
Tested-by: Ken'ichi Ohmichi
Acked-by: Simon Horman
Cc: David Rientjes
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Ken'ichi Ohmichi
2008-01-09 08:10:36 +0800

20 Oct, 2007

2 commits

cba63c308 Extended crashkernel command line

This patch adds an extended crashkernel syntax that makes the value of reserved
system RAM dependent on the system RAM itself:

crashkernel=<range1>:<size1>[,<range2>:<size2>,...][@offset]
range=start-[end]

For example:

crashkernel=512M-2G:64M,2G-:128M

The motivation comes from distributors that configure their crashkernel
command line automatically with some configuration tool (YaST, you know ;)).
Of course that tool knows the value of System RAM, but if the user removes
RAM, then the system becomes unbootable or at least unusable and error
handling is very difficult.

This series implements this change for i386, x86_64, ia64, ppc64 and sh. That
should be all platforms that support kdump in current mainline. I tested all
platforms except sh due to the lack of a sh processor.

This patch:

This is the generic part of the patch. It adds a parse_crashkernel() function
in kernel/kexec.c that is called by the architecture specific code that
actually reserves the memory. That function takes the whole command line and
looks itself for "crashkernel=" in it.

If there are multiple occurrences, then the last one is taken. The advantage
is that if you have a bootloader like lilo or elilo which allows you to append
a command line parameter but not to remove one (as GRUB does), then you can
add another crashkernel value for testing at the boot command line, and this
one then overrides the value from the configuration.
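
The "last occurrence wins" rule can be sketched with a small user-space helper. The function name last_crashkernel() is hypothetical, and the sketch does a plain substring search, so it does not check that the match starts at a parameter boundary.

```c
#include <string.h>

/* Return a pointer to the value of the last "crashkernel=" parameter in
 * cmdline, or NULL if the parameter does not appear at all. */
static const char *last_crashkernel(const char *cmdline)
{
    static const char key[] = "crashkernel=";
    const char *last = NULL;
    const char *p = cmdline;

    while ((p = strstr(p, key)) != NULL) {
        p += sizeof(key) - 1;   /* skip past "crashkernel=" */
        last = p;               /* remember the most recent match */
    }
    return last;
}
```

For a command line that carries a configured value followed by a value appended at the boot prompt, the returned pointer addresses the appended one, so the test value takes effect without editing the configuration.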

Signed-off-by: Bernhard Walle
Cc: Andi Kleen
Cc: "Luck, Tony"
Cc: Paul Mackerras
Cc: Benjamin Herrenschmidt
Cc: Paul Mundt
Cc: Vivek Goyal
Cc: "Eric W. Biederman"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Bernhard Walle
2007-10-20 02:53:49 +0800
b460cbc58 pid namespaces: define is_global_init() and is_container_init()

is_init() is an ambiguous name for the pid==1 check. Split it into
is_global_init() and is_container_init().

A cgroup init has its tsk->pid == 1.

A global init also has its tsk->pid == 1 and its active pid namespace
is the init_pid_ns. But rather than check the active pid namespace,
compare the task structure with 'init_pid_ns.child_reaper', which is
initialized during boot to the /sbin/init process and never changes.
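
The distinction between the two checks can be sketched with simplified structures. The types and names below (sketch_task, sketch_pid_ns, sketch_init_pid_ns) are hypothetical stand-ins for the kernel's task_struct and init_pid_ns, reduced to the fields this description mentions.

```c
#include <stdbool.h>

struct sketch_task {
    int pid;                            /* pid as seen inside its namespace */
};

struct sketch_pid_ns {
    struct sketch_task *child_reaper;   /* set once at boot in the kernel */
};

/* Stand-in for init_pid_ns, the initial pid namespace. */
static struct sketch_pid_ns sketch_init_pid_ns;

/* Any task with pid 1 counts as an init of some container. */
static bool sketch_is_container_init(const struct sketch_task *tsk)
{
    return tsk->pid == 1;
}

/* Only the task recorded as the initial namespace's reaper is global init. */
static bool sketch_is_global_init(const struct sketch_task *tsk)
{
    return tsk == sketch_init_pid_ns.child_reaper;
}
```

Comparing pointers against the stored reaper avoids looking up the task's active pid namespace, which is the motivation the changelog below gives for this approach.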

Changelog:

2.6.22-rc4-mm2-pidns1:
- Use 'init_pid_ns.child_reaper' to determine if a given task is the
global init (/sbin/init) process. This would improve performance
and remove dependence on the task_pid().

2.6.21-mm2-pidns2:

- [Sukadev Bhattiprolu] Changed is_container_init() calls in {powerpc,
ppc,avr32}/traps.c for the _exception() call to is_global_init().
This way, we kill only the cgroup if the cgroup's init has a
bug rather than force a kernel panic.

[akpm@linux-foundation.org: fix comment]
[sukadev@us.ibm.com: Use is_global_init() in arch/m32r/mm/fault.c]
[bunk@stusta.de: kernel/pid.c: remove unused exports]
[sukadev@us.ibm.com: Fix capability.c to work with threaded init]
Signed-off-by: Serge E. Hallyn
Signed-off-by: Sukadev Bhattiprolu
Acked-by: Pavel Emelianov
Cc: Eric W. Biederman
Cc: Cedric Le Goater
Cc: Dave Hansen
Cc: Herbert Poetzel
Cc: Kirill Korotaev
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Serge E. Hallyn
2007-10-20 02:53:37 +0800

19 Oct, 2007

1 commit

c80544dc0 sparse pointer use of zero as null

Get rid of sparse-related warnings from places that use an integer as a NULL
pointer.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Stephen Hemminger
Cc: Andi Kleen
Cc: Jeff Garzik
Cc: Matt Mackall
Cc: Ian Kent
Cc: Arnd Bergmann
Cc: Davide Libenzi
Cc: Stephen Smalley
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Stephen Hemminger
2007-10-19 05:37:31 +0800

17 Oct, 2007

5 commits

bcbba6c10 add-vmcore: add a prefix "VMCOREINFO_" to the vmcoreinfo macros

Add a prefix "VMCOREINFO_" to the vmcoreinfo macros. The old vmcoreinfo macros
were defined with the generic names SYMBOL/SIZE/OFFSET/LENGTH/CONFIG, and it is
impossible to grep for them. So these names should be changed. The
discussion can be found here:
http://www.ussg.iu.edu/hypermail/linux/kernel/0709.1/0415.html
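
The idea of greppable, prefixed macros wrapping a single append function can be sketched in user space. This is an illustration only: the SKETCH_ prefix, append_str(), note_buf and struct sketch_page are hypothetical names, the output format is an assumption modelled on key=value lines, and the format attribute shown is a GCC/Clang extension.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

static char note_buf[4096];     /* accumulated key=value text */
static size_t note_len;

/* The format attribute lets the compiler check arguments against fmt. */
static void append_str(const char *fmt, ...)
    __attribute__((format(printf, 1, 2)));

static void append_str(const char *fmt, ...)
{
    va_list ap;
    int n;

    va_start(ap, fmt);
    n = vsnprintf(note_buf + note_len, sizeof(note_buf) - note_len, fmt, ap);
    va_end(ap);
    if (n > 0) {
        /* vsnprintf reports the untruncated length; clamp to what fit. */
        size_t room = sizeof(note_buf) - note_len - 1;
        note_len += (size_t)n < room ? (size_t)n : room;
    }
}

/* A sample structure to describe; purely illustrative. */
struct sketch_page {
    unsigned long flags;
    int count;
};

/* Prefixed wrappers: unique names that are easy to grep for. */
#define SKETCH_VMCOREINFO_STRUCT_SIZE(name) \
    append_str("SIZE(%s)=%lu\n", #name, (unsigned long)sizeof(struct name))
#define SKETCH_VMCOREINFO_OFFSET(name, field) \
    append_str("OFFSET(%s.%s)=%lu\n", #name, #field, \
               (unsigned long)offsetof(struct name, field))
#define SKETCH_VMCOREINFO_LENGTH(name, value) \
    append_str("LENGTH(%s)=%lu\n", #name, (unsigned long)(value))
```

A call such as SKETCH_VMCOREINFO_OFFSET(sketch_page, flags) appends one text line to the buffer, which a dump filtering tool could later parse without needing debugging information.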

Signed-off-by: Ken'ichi Ohmichi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Ken'ichi Ohmichi
2007-10-17 23:42:54 +0800
6cfa062f0 add-vmcore: add nodemask_t's size and NR_FREE_PAGES's value to vmcoreinfo_data

[2/3] Add nodemask_t's size and NR_FREE_PAGES's value to vmcoreinfo_data.
The dump filtering command 'makedumpfile' (v1.1.6 or before) had assumed
the above values, which was not good from a reliability viewpoint.
So makedumpfile v1.2.0 came to need these values and I created the patch
to let the kernel output them.
makedumpfile site:
https://sourceforge.net/projects/makedumpfile/

Signed-off-by: Ken'ichi Ohmichi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Ken'ichi Ohmichi
2007-10-17 23:42:54 +0800
d768281e9 add-vmcore: cleanup the coding style according to Andrew's comments

[1/3] Cleanup the coding style according to Andrew's comments:
http://lists.infradead.org/pipermail/kexec/2007-August/000522.html
- vmcoreinfo_append_str() should have suitable __attribute__s so that
the compiler can check its use.
- vmcoreinfo_max_size should have size_t.
- Use get_seconds() instead of xtime.tv_sec.
- Use init_uts_ns.name.release instead of UTS_RELEASE.

Signed-off-by: Ken'ichi Ohmichi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Ken'ichi Ohmichi
2007-10-17 23:42:54 +0800
fd59d231f Add vmcoreinfo

This patch set removes the restriction that makedumpfile users must install a
vmlinux file (including the debugging information) on each system.

The makedumpfile command is the dump filtering feature for kdump. It creates a
small dumpfile by filtering out pages that are unnecessary for the analysis. To
distinguish unnecessary pages, it needs a vmlinux file including the debugging
information. These days, the debugging package is a huge file, and it is
hard to install it on each system.

To solve the problem, kdump developers discussed it at lkml and kexec-ml. As
a result, we reached the conclusion that the information necessary for dump
filtering (called "vmcoreinfo") should be embedded into the first kernel file
and should be accessed through /proc/vmcore in the second kernel.
(http://www.uwsg.iu.edu/hypermail/linux/kernel/0707.0/1806.html)

Dan Aloni created the patch set for the above implementation.
(http://www.uwsg.iu.edu/hypermail/linux/kernel/0707.1/1053.html)

And I updated it for multiple architectures and memory models.
(http://lists.infradead.org/pipermail/kexec/2007-August/000479.html)

Signed-off-by: Dan Aloni
Signed-off-by: Ken'ichi Ohmichi
Signed-off-by: Bernhard Walle
Signed-off-by: Daisuke Nishimura
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Ken'ichi Ohmichi
2007-10-17 23:42:54 +0800
a9022e9cb Clean up duplicate includes in kernel/

This patch cleans up duplicate includes in
kernel/

Signed-off-by: Jesper Juhl
Acked-by: Paul E. McKenney
Reviewed-by: Satyam Sharma
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Jesper Juhl
2007-10-17 23:42:48 +0800

09 May, 2007

1 commit

6672f76a5 kdump/kexec: calculate note size at compile time

Currently the size of the per-cpu region reserved to save crash notes is
set by the per-architecture value MAX_NOTE_BYTES, which in turn is
currently set to 1024 on all supported architectures.

While testing ia64 I recently discovered that this value is in fact too
small. The particular setup I was using actually needs 1172 bytes. This
led to a very tedious failure mode where the tail of one elf note would
overwrite the head of another if they ended up being allocated sequentially
by kmalloc, which was often the case.

It seems to me that a far better approach is to calculate the size that
the area needs to be. This patch does just that.
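
A compile-time size calculation of this kind can be sketched as below. It is a simplified illustration, assuming 4-byte alignment of each note part and room for one extra empty header to terminate the notes; the SKETCH_ names and both structures are hypothetical, and struct sketch_prstatus is a placeholder whose size is chosen arbitrarily rather than taken from any architecture.

```c
#include <stddef.h>
#include <stdint.h>

/* Round x up to the next multiple of 4. */
#define SKETCH_ALIGN4(x) (((size_t)(x) + 3u) & ~(size_t)3u)

/* Simplified note header: name size, descriptor size, note type. */
struct sketch_elf_note {
    uint32_t n_namesz;
    uint32_t n_descsz;
    uint32_t n_type;
};

/* Placeholder for the per-CPU register payload; the size is arbitrary. */
struct sketch_prstatus {
    unsigned char regs[1101];
};

#define SKETCH_NOTE_NAME        "CORE"
#define SKETCH_NOTE_HEAD_BYTES  SKETCH_ALIGN4(sizeof(struct sketch_elf_note))
#define SKETCH_NOTE_NAME_BYTES  SKETCH_ALIGN4(sizeof(SKETCH_NOTE_NAME))
#define SKETCH_NOTE_DESC_BYTES  SKETCH_ALIGN4(sizeof(struct sketch_prstatus))

/* Two headers (the note plus a terminating empty note), name and payload. */
#define SKETCH_NOTE_BYTES \
    (SKETCH_NOTE_HEAD_BYTES * 2 + SKETCH_NOTE_NAME_BYTES + SKETCH_NOTE_DESC_BYTES)

/* The buffer is sized by the compiler, so it tracks the payload type. */
static unsigned char sketch_crash_note[SKETCH_NOTE_BYTES];
```

Because every term is derived with sizeof, a larger register structure on one architecture enlarges the buffer automatically instead of silently overflowing a fixed constant.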

If a simpler stop-gap patch for ia64 to be squeezed into 2.6.21(.X) is
needed then this should be as easy as making MAX_NOTE_BYTES larger in
arch/asm-ia64/kexec.h. Perhaps 2048 would be a good choice. However, I
think that the approach in this patch is a much more robust idea.

Acked-by: Vivek Goyal
Signed-off-by: Simon Horman
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Simon Horman
2007-05-09 02:15:07 +0800

08 Dec, 2006

4 commits

6ee7e78e7 Merge branch 'release' of master.kernel.org:/pub/scm/linux/kernel/git/aegl/linux-2.6

* 'release' of master.kernel.org:/pub/scm/linux/kernel/git/aegl/linux-2.6:
[IA64] replace kmalloc+memset with kzalloc
[IA64] resolve name clash by renaming is_available_memory()
[IA64] Need export for csum_ipv6_magic
[IA64] Fix DISCONTIGMEM without VIRTUAL_MEM_MAP
[PATCH] Add support for type argument in PAL_GET_PSTATE
[IA64] tidy up return value of ip_fast_csum
[IA64] implement csum_ipv6_magic for ia64.
[IA64] More Itanium PAL spec updates
[IA64] Update processor_info features
[IA64] Add se bit to Processor State Parameter structure
[IA64] Add dp bit to cache and bus check structs
[IA64] SN: Correctly update smp_affinty mask
[IA64] sparse cleanups
[IA64] IA64 Kexec/kdump

Linus Torvalds
2006-12-08 07:39:22 +0800
a79561134 [IA64] IA64 Kexec/kdump

Changes and updates.

1. Remove fake rendz path and related code following discussion with Khalid Aziz.
2. fc.i offset fix in relocate_kernel.S.
3. iospic shutdown code eoi and mask race fix from Fujitsu.
4. Warm boot hook in machine_kexec to SN SAL code from Jack Steiner.
5. Send slave to SAL slave loop patch from Jay Lan.
6. Kdump on non-recoverable MCA event patch from Jay Lan
7. Use CTL_UNNUMBERED in kdump_on_init sysctl.

Signed-off-by: Zou Nan hai
Signed-off-by: Tony Luck

Zou Nan hai
2006-12-08 01:51:35 +0800
85916f816 [PATCH] Kexec / Kdump: Unify elf note code

The elf note saving code is currently duplicated over several
architectures. This cleanup patch simply adds code to a common file and
then replaces the arch-specific code with calls to the newly added code.

The only drawback with this approach is that s390 doesn't fully support
kexec-on-panic, which for that arch leads to the introduction of unused code.

Signed-off-by: Magnus Damm
Cc: Vivek Goyal
Cc: Andi Kleen
Cc: Paul Mackerras
Cc: Benjamin Herrenschmidt
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Magnus Damm
2006-12-08 00:39:46 +0800
4668edc33 [PATCH] kernel core: replace kmalloc+memset with kzalloc

Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Burman Yan
2006-12-08 00:39:41 +0800

30 Sep, 2006

2 commits

0b4a8a789 [PATCH] kexec warning fix

This fixes a couple of compiler warnings, and adds paranoia checks as well.

Signed-off-by: Roland McGrath
Cc: "Eric W. Biederman"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Roland McGrath
2006-09-30 00:18:15 +0800
f400e198b [PATCH] pidspace: is_init()

This is an updated version of Eric Biederman's is_init() patch.
(http://lkml.org/lkml/2006/2/6/280). It applies cleanly to 2.6.18-rc3 and
replaces a few more instances of ->pid == 1 with is_init().

Further, is_init() checks pid and thus removes dependency on Eric's other
patches for now.

Eric's original description:

There are a lot of places in the kernel where we test for init
because we give it special properties. Most significantly init
must not die. This results in code all over the kernel testing
->pid == 1.

Introduce is_init to capture this case.

With multiple pid spaces for all of the cases affected we are
looking for only the first process on the system, not some other
process that has pid == 1.

Signed-off-by: Eric W. Biederman
Signed-off-by: Sukadev Bhattiprolu
Cc: Dave Hansen
Cc: Serge Hallyn
Cc: Cedric Le Goater
Cc:
Acked-by: Paul Mackerras
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Sukadev Bhattiprolu
2006-09-30 00:18:12 +0800

28 Jun, 2006

1 commit

c0ce7d088 [POWERPC] Add the use of the firmware soft-reset-nmi to kdump.

With this patch, kdump uses the firmware soft-reset NMI for two purposes:
1) Initiate the kdump (take a crash dump) by issuing a soft-reset.
2) Break a CPU out of a deadlock condition that is detected during kdump
processing.

When a soft-reset is initiated each CPU will enter
system_reset_exception() and set its corresponding bit in the global
bit-array cpus_in_sr, then call die(). When die() finds the CPU's bit set
in cpus_in_sr, crash_kexec() is called to initiate a crash dump. The first
CPU to enter crash_kexec() is called the "crashing CPU". All other CPUs
are "secondary CPUs". The secondary CPUs pass through to
crash_kexec_secondary() and sleep. The crashing CPU waits for all CPUs
to enter via soft-reset, then boots the kdump kernel (see
crash_soft_reset_check()).

When the system crashes due to a panic or exception, crash_kexec() is
called by panic() or die(). The crashing CPU sends an IPI to all other
CPUs to notify them of the pending shutdown. If a CPU is in a deadlock
or hung state with interrupts disabled, the IPI will not be delivered.
The result is that the kdump kernel is not booted. This problem is
solved with the use of a firmware generated soft-reset. After the
crashing_cpu has issued the IPI, it waits for 10 sec for all CPUs to
enter crash_ipi_callback(). A CPU signifies its entry to
crash_ipi_callback() by setting its corresponding bit in the
cpus_in_crash bit array. After 10 sec, if one or more CPUs have not set
their bit in cpus_in_crash we assume that the CPU(s) is deadlocked. The
operator is then prompted to generate a soft-reset to break the
deadlock. Each CPU enters the soft reset handler as described above.

Two conditions must be handled at this point:
1) The system crashed because the operator generated a soft-reset. See
2) The system had crashed before the soft-reset was generated ( in the
case of a Panic or oops).

The first CPU to enter crash_kexec() uses the state of the kexec_lock to
determine this state. If kexec_lock is already held then condition 2 is
true and crash_kexec_secondary() is called, else; this CPU is flagged as
the crashing CPU, the kexec_lock is acquired and crash_kexec() proceeds
as described above.

Each additional CPU responding to the soft-reset will pass through
crash_kexec() to kexec_secondary(). All secondary CPUs call
crash_ipi_callback(), readying themselves for the shutdown. When ready
they clear their bit in cpus_in_sr. The crashing CPU waits in
kexec_secondary() until all other CPUs have cleared their bits in
cpus_in_sr. The kexec kernel boot is then started.

Signed-off-by: Haren Myneni
Signed-off-by: David Wilder
Signed-off-by: Paul Mackerras

David Wilder
2006-06-28 13:18:52 +0800

23 Jun, 2006

1 commit

c330dda90 [PATCH] Add a sysfs file to determine if a kexec kernel is loaded

Create two files in /sys/kernel, kexec_loaded and kexec_crash_loaded. Each
file contains a simple boolean value indicating whether the relevant kernel
has been loaded into memory. The motivation for this is geared around
support.
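
Reading one of these files from user space can be sketched as follows. The helper name read_kexec_flag() is hypothetical; the sketch assumes only what the description states, namely that each file holds a simple boolean value, and it treats a missing or unreadable file as "unknown".

```c
#include <stdio.h>

/* Return 1 or 0 as read from the file at path, or -1 if the file cannot
 * be opened or does not start with '0' or '1'. */
static int read_kexec_flag(const char *path)
{
    FILE *f = fopen(path, "r");
    int c;

    if (f == NULL)
        return -1;
    c = fgetc(f);
    fclose(f);

    if (c == '1')
        return 1;
    if (c == '0')
        return 0;
    return -1;
}
```

A support script could call read_kexec_flag("/sys/kernel/kexec_crash_loaded") and warn when the result is not 1, which matches the support-oriented motivation given above.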

Signed-off-by: Jeff Moyer
Cc: "Eric W. Biederman"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Jeff Moyer
2006-06-23 22:43:02 +0800

12 Jan, 2006

1 commit

c59ede7b7 [PATCH] move capable() to capability.h

- Move capable() from sched.h to capability.h;

- Use <linux/capability.h> where capable() is used
(in include/, block/, ipc/, kernel/, a few drivers/,
mm/, security/, & sound/;
many more drivers/ to go)

Signed-off-by: Randy Dunlap
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Randy.Dunlap
2006-01-12 10:42:13 +0800

11 Jan, 2006

2 commits

e996e5813 [PATCH] kdump: save registers early (inline functions)

- If the system panics, the cpu register states are captured through the
function crash_get_current_regs(). This is not an inline function, hence a
stack frame is pushed onto the stack and then the cpu register state is
captured. Later this frame is popped and new frames are pushed (machine_kexec).

- In theory this is not quite right, as we are capturing register states for a
frame that is no longer valid. This seems to have created back trace
problems for ppc64.

- This patch fixes it up. The very first thing it does after entering
crash_kexec() is to capture the register states. Anyway, we don't want the
back trace beyond crash_kexec(). crash_get_current_regs() has been made
inline.

- crash_setup_regs() is the top architecture dependent function, which should
be responsible for capturing the register states as well as doing any
architecture dependent tricks, for example fixing up ss and esp for i386.
crash_setup_regs() has also been made inline to ensure no new call frame is
pushed onto the stack.

Signed-off-by: Vivek Goyal
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Vivek Goyal
2006-01-11 00:01:27 +0800
cc5716587 [PATCH] kdump: dynamic per cpu allocation of memory for saving cpu registers

- In case of system crash, current state of cpu registers is saved in memory
in elf note format. So far memory for storing elf notes was being allocated
statically for NR_CPUS.

- This patch introduces dynamic allocation of memory for storing elf notes.
It uses alloc_percpu() interface. This should lead to better memory usage.

- Introduced based on Andi Kleen's and Eric W. Biederman's suggestions.

- This patch also moves memory allocation for elf notes from architecture
dependent portion to architecture independent portion. Now crash_notes is
architecture independent. The whole idea is that size of memory to be
allocated per cpu (MAX_NOTE_BYTES) can be architecture dependent and
allocation of this memory can be architecture independent.
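
The split between an architecture dependent size and an architecture independent allocation can be sketched in user-space C as follows. This is an illustration only: calloc() stands in for alloc_percpu(), which in the kernel places each copy in that CPU's own per-cpu area, the value of MAX_NOTE_BYTES is arbitrary, and crash_notes_init(), crash_notes_for() and crash_notes_cpus are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Architecture dependent in the kernel; 1024 is an arbitrary value here. */
#define MAX_NOTE_BYTES 1024

/* One ELF note buffer, sized by the architecture. */
typedef uint32_t note_buf_t[MAX_NOTE_BYTES / sizeof(uint32_t)];

/* Allocated at run time for the CPUs that can exist, not statically for NR_CPUS. */
static note_buf_t *crash_notes;
static unsigned int crash_notes_cpus;

/* Architecture independent allocation: one zeroed buffer per possible CPU. */
static int crash_notes_init(unsigned int possible_cpus)
{
    crash_notes = calloc(possible_cpus, sizeof(note_buf_t));
    if (!crash_notes)
        return -1;
    crash_notes_cpus = possible_cpus;
    return 0;
}

/* Return the note buffer that belongs to the given CPU. */
static uint32_t *crash_notes_for(unsigned int cpu)
{
    assert(cpu < crash_notes_cpus);
    return crash_notes[cpu];
}
```

With a static array the cost is NR_CPUS buffers no matter how many CPUs are present. Here it is one buffer per CPU passed to crash_notes_init().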

Signed-off-by: Vivek Goyal
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Vivek Goyal
2006-01-11 00:01:26 +0800

30 Oct, 2005

1 commit

4c21e2f24 [PATCH] mm: split page table lock

Christoph Lameter demonstrated very poor scalability on the SGI 512-way, with
a many-threaded application which concurrently initializes different parts of
a large anonymous area.

This patch corrects that, by using a separate spinlock per page table page, to
guard the page table entries in that page, instead of using the mm's single
page_table_lock. (But even then, page_table_lock is still used to guard page
table allocation, and anon_vma allocation.)

In this implementation, the spinlock is tucked inside the struct page of the
page table page: with a BUILD_BUG_ON in case it overflows - which it would in
the case of 32-bit PA-RISC with spinlock debugging enabled.

Splitting the lock is not quite for free: another cacheline access. Ideally,
I suppose we would use split ptlock only for multi-threaded processes on
multi-cpu machines; but deciding that dynamically would have its own costs.
So for now enable it by config, at some number of cpus - since the Kconfig
language doesn't support inequalities, let preprocessor compare that with
NR_CPUS. But I don't think it's worth being user-configurable: for good
testing of both split and unsplit configs, split now at 4 cpus, and perhaps
change that to 8 later.
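
The compile-time selection described here can be sketched as follows. This is a simplified user-space illustration and not the kernel's actual definitions: struct lock is a placeholder for a spinlock, struct pt_page stands in for the struct page of a page table page, the NR_CPUS value is assumed, and pte_lock_for() is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS 8            /* assumed build-time CPU limit */
#define SPLIT_PTLOCK_CPUS 4  /* threshold named in the text above */

struct lock { int held; };   /* placeholder for a spinlock */

struct mm {
    struct lock page_table_lock;   /* the single per-mm lock */
};

struct pt_page {
#if NR_CPUS >= SPLIT_PTLOCK_CPUS
    struct lock ptl;               /* per page table page lock */
#endif
    unsigned long entries[4];      /* stand-in for the page table entries */
};

/* Pick the lock that guards the entries in this page table page. */
static struct lock *pte_lock_for(struct mm *mm, struct pt_page *page)
{
    assert(mm != NULL && page != NULL);
#if NR_CPUS >= SPLIT_PTLOCK_CPUS
    (void)mm;
    return &page->ptl;             /* split: each page has its own lock */
#else
    (void)page;
    return &mm->page_table_lock;   /* unsplit: fall back to the mm's lock */
#endif
}
```

Because the comparison is done by the preprocessor, the unsplit configuration carries no extra field in the page structure and no extra cacheline access.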

There is a benefit even for singly threaded processes: kswapd can be attacking
one part of the mm while another part is busy faulting.

Signed-off-by: Hugh Dickins
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Hugh Dickins
2005-10-30 12:40:42 +0800

28 Oct, 2005

1 commit

9796fdd82 [PATCH] gfp_t: kernel/*

Signed-off-by: Al Viro
Signed-off-by: Linus Torvalds

Al Viro
2005-10-28 23:16:49 +0800

29 Jun, 2005

1 commit

314b6a4d8 [PATCH] kexec: fix sparse warnings

Signed-off-by: Alexey Dobriyan
Cc: Eric Biederman
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Alexey Dobriyan
2005-06-29 05:53:40 +0800