Eric Lee / smarc-fsl-linux-kernel

15 Feb, 2011

1 commit

261cd298a s390: remove task_show_regs ... Browse Code »

task_show_regs used to be a debugging aid in the early bringup days
of Linux on s390. /proc//status is a world readable file, it
is not a good idea to show the registers of a process. The only
correct fix is to remove task_show_regs.

Reported-by: Al Viro
Signed-off-by: Martin Schwidefsky
Signed-off-by: Linus Torvalds

Martin Schwidefsky
2011-02-15 23:34:16 +0800

26 Jan, 2011

1 commit

ac751efa6 console: rename acquire/release_console_sem() to console_lock/unlock() ... Browse Code »

The -rt patches change the console_semaphore to console_mutex. As a
result, a quite large chunk of the patches changes all
acquire/release_console_sem() to acquire/release_console_mutex()

This commit makes things use more neutral function names which dont make
implications about the underlying lock.

The only real change is the return value of console_trylock which is
inverted from try_acquire_console_sem()

This patch also paves the way to switching console_sem from a semaphore to
a mutex.

[akpm@linux-foundation.org: coding-style fixes]
[akpm@linux-foundation.org: make console_trylock return 1 on success, per Geert]
Signed-off-by: Torben Hohn
Cc: Thomas Gleixner
Cc: Greg KH
Cc: Ingo Molnar
Cc: Geert Uytterhoeven
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Torben Hohn
2011-01-26 08:50:06 +0800

21 Jan, 2011

1 commit

6a108a14f kconfig: rename CONFIG_EMBEDDED to CONFIG_EXPERT ... Browse Code »

The meaning of CONFIG_EMBEDDED has long since been obsoleted; the option
is used to configure any non-standard kernel with a much larger scope than
only small devices.

This patch renames the option to CONFIG_EXPERT in init/Kconfig and fixes
references to the option throughout the kernel. A new CONFIG_EMBEDDED
option is added that automatically selects CONFIG_EXPERT when enabled and
can be used in the future to isolate options that should only be
considered for embedded systems (RISC architectures, SLOB, etc).

Calling the option "EXPERT" more accurately represents its intention: only
expert users who understand the impact of the configuration changes they
are making should enable it.

Reviewed-by: Ingo Molnar
Acked-by: David Woodhouse
Signed-off-by: David Rientjes
Cc: Greg KH
Cc: "David S. Miller"
Cc: Jens Axboe
Cc: Arnd Bergmann
Cc: Robin Holt
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

David Rientjes
2011-01-21 09:02:05 +0800

14 Jan, 2011

13 commits

5f24ce5fd thp: remove PG_buddy ... Browse Code »

PG_buddy can be converted to _mapcount == -2. So the PG_compound_lock can
be added to page->flags without overflowing (because of the sparse section
bits increasing) with CONFIG_X86_PAE=y and CONFIG_X86_PAT=y. This also
has to move the memory hotplug code from _mapcount to lru.next to avoid
any risk of clashes. We can't use lru.next for PG_buddy removal, but
memory hotplug can use lru.next even more easily than the mapcount
instead.

Signed-off-by: Andrea Arcangeli
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Andrea Arcangeli
2011-01-14 09:32:43 +0800
79134171d thp: transparent hugepage vmstat ... Browse Code »

Add hugepage stat information to /proc/vmstat and /proc/meminfo.

Signed-off-by: Andrea Arcangeli
Acked-by: Rik van Riel
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Andrea Arcangeli
2011-01-14 09:32:43 +0800
dabb16f63 oom: allow a non-CAP_SYS_RESOURCE proces to oom_score_adj down ... Browse Code »

We'd like to be able to oom_score_adj a process up/down as it
enters/leaves the foreground. Currently, it is not possible to oom_adj
down without CAP_SYS_RESOURCE. This patch allows a task to decrease its
oom_score_adj back to the value that a CAP_SYS_RESOURCE thread set it to
or its inherited value at fork. Assuming the thread that has forked it
has oom_score_adj of 0, each process could decrease it back from 0 upon
activation unless a CAP_SYS_RESOURCE thread elevated it to something
higher.

Alternative considered:

* a setuid binary
* a daemon with CAP_SYS_RESOURCE

Since you don't wan't all processes to be able to reduce their oom_adj, a
setuid or daemon implementation would be complex. The alternatives also
have much higher overhead.

This patch updated from original patch based on feedback from David
Rientjes.

Signed-off-by: Mandeep Singh Baines
Acked-by: David Rientjes
Cc: KAMEZAWA Hiroyuki
Cc: KOSAKI Motohiro
Cc: Rik van Riel
Cc: Ying Han
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Mandeep Singh Baines
2011-01-14 09:32:35 +0800
2d90508f6 mm: smaps: export mlock information ... Browse Code »

Currently there is no way to find whether a process has locked its pages
in memory or not. And which of the memory regions are locked in memory.

Add a new field "Locked" to export this information via the smaps file.

Signed-off-by: Nikanth Karthikesan
Acked-by: Balbir Singh
Acked-by: Wu Fengguang
Cc: Matt Mackall
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Nikanth Karthikesan
2011-01-14 09:32:33 +0800
ceff1a770 /proc/kcore: fix seeking ... Browse Code »

Commit 34aacb2920 ("procfs: Use generic_file_llseek in /proc/kcore") broke
seeking on /proc/kcore. This changes it back to use default_llseek in
order to restore the original behavior.

The problem with generic_file_llseek is that it only allows seeks up to
inode->i_sb->s_maxbytes, which is 2GB-1 on procfs, where the memory file
offset values in the /proc/kcore PT_LOAD segments may exceed or start
beyond that offset value.

A similar revert was made for /proc/vmcore.

Signed-off-by: Dave Anderson
Acked-by: Frederic Weisbecker
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Dave Anderson
2011-01-14 00:03:17 +0800
bf33cbdf8 proc: move proc_console.c to fs/proc/consoles.c ... Browse Code »

Filename is supposed to match procfile name for random junk.

Add __init while I'm at it.

Signed-off-by: Alexey Dobriyan
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Alexey Dobriyan
2011-01-14 00:03:17 +0800
3740a20c4 proc: less LOCK/UNLOCK in remove_proc_entry() ... Browse Code »

For the common case where a proc entry is being removed and nobody is in
the process of using it, save a LOCK/UNLOCK pair.

Signed-off-by: Alexey Dobriyan
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Alexey Dobriyan
2011-01-14 00:03:17 +0800
a6fc86d2b kpagecount: add slab page checking because _mapcount is in a union ... Browse Code »

Add a PageSlab() check before adding the _mapcount value to /kpagecount.
page->_mapcount is in a union with the SLAB structure so for pages
controlled by SLAB, page_mapcount() returns nonsense.

Signed-off-by: Petr Holasek
Cc: Wu Fengguang
Cc: Matt Mackall
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Petr Holasek
2011-01-14 00:03:17 +0800
c6a340584 proc: use single_open() correctly ... Browse Code »

single_open()'s third argument is for copying into seq_file->private. Use
that, rather than open-coding it.

Signed-off-by: Jovi Zhang
Acked-by: David Rientjes
Acked-by: Alexey Dobriyan
Reviewed-by: KOSAKI Motohiro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Jovi Zhang
2011-01-14 00:03:16 +0800
6d1b6e4ef proc: ->low_ino cleanup ... Browse Code »

- ->low_ino is write-once field -- reading it under locks is unnecessary.

- /proc/$PID stuff never reaches pde_put()/free_proc_entry() --
PROC_DYNAMIC_FIRST check never triggers.

- in proc_get_inode(), inode number always matches proc dir entry, so
save one parameter.

Signed-off-by: Alexey Dobriyan
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Alexey Dobriyan
2011-01-14 00:03:16 +0800
9d6de12f7 proc: use seq_puts()/seq_putc() where possible ... Browse Code »

For string without format specifiers, use seq_puts().
For seq_printf("\n"), use seq_putc('\n').

text data bss dec hex filename
61866 488 112 62466 f402 fs/proc/proc.o
61729 488 112 62329 f379 fs/proc/proc.o
----------------------------------------------------
-139

Signed-off-by: Alexey Dobriyan
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Alexey Dobriyan
2011-01-14 00:03:16 +0800
a2ade7b6c proc: use unsigned long inside /proc/*/statm ... Browse Code »

/proc/*/statm code needlessly truncates data from unsigned long to int.
One needs only 8+ TB of RAM to make truncation visible.

Signed-off-by: Alexey Dobriyan
Reviewed-by: WANG Cong
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Alexey Dobriyan
2011-01-14 00:03:16 +0800
34e49d4f6 fs/proc/base.c, kernel/latencytop.c: convert sprintf_symbol() to %ps ... Browse Code »

Use temporary lr for struct latency_record for improved readability and
fewer columns used. Removed trailing space from output.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Joe Perches
Cc: Jiri Kosina
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Joe Perches
2011-01-14 00:03:16 +0800

08 Jan, 2011

2 commits

56b85f32d Merge branch 'tty-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty-2.6 ... Browse Code »

* 'tty-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty-2.6: (36 commits)
serial: apbuart: Fixup apbuart_console_init()
TTY: Add tty ioctl to figure device node of the system console.
tty: add 'active' sysfs attribute to tty0 and console device
drivers: serial: apbuart: Handle OF failures gracefully
Serial: Avoid unbalanced IRQ wake disable during resume
tty: fix typos/errors in tty_driver.h comments
pch_uart : fix warnings for 64bit compile
8250: fix uninitialized FIFOs
ip2: fix compiler warning on ip2main_pci_tbl
specialix: fix compiler warning on specialix_pci_tbl
rocket: fix compiler warning on rocket_pci_ids
8250: add a UPIO_DWAPB32 for 32 bit accesses
8250: use container_of() instead of casting
serial: omap-serial: Add support for kernel debugger
serial: fix pch_uart kconfig & build
drivers: char: hvc: add arm JTAG DCC console support
RS485 documentation: add 16C950 UART description
serial: ifx6x60: fix memory leak
serial: ifx6x60: free IRQ on error
Serial: EG20T: add PCH_UART driver
...

Fixed up conflicts in drivers/serial/apbuart.c with evil merge that
makes the code look fairly sane (unlike either side).

Linus Torvalds
2011-01-08 06:39:20 +0800
b4a45f5fe Merge branch 'vfs-scale-working' of git://git.kernel.org/pub/scm/linux/kernel/gi… ... Browse Code »

…t/npiggin/linux-npiggin

* 'vfs-scale-working' of git://git.kernel.org/pub/scm/linux/kernel/git/npiggin/linux-npiggin: (57 commits)
fs: scale mntget/mntput
fs: rename vfsmount counter helpers
fs: implement faster dentry memcmp
fs: prefetch inode data in dcache lookup
fs: improve scalability of pseudo filesystems
fs: dcache per-inode inode alias locking
fs: dcache per-bucket dcache hash locking
bit_spinlock: add required includes
kernel: add bl_list
xfs: provide simple rcu-walk ACL implementation
btrfs: provide simple rcu-walk ACL implementation
ext2,3,4: provide simple rcu-walk ACL implementation
fs: provide simple rcu-walk generic_check_acl implementation
fs: provide rcu-walk aware permission i_ops
fs: rcu-walk aware d_revalidate method
fs: cache optimise dentry and inode for rcu-walk
fs: dcache reduce branches in lookup path
fs: dcache remove d_mounted
fs: fs_struct use seqlock
fs: rcu-walk for path lookup
...

Linus Torvalds
2011-01-08 00:56:33 +0800

07 Jan, 2011

8 commits

b74c79e99 fs: provide rcu-walk aware permission i_ops ... Browse Code »

Signed-off-by: Nick Piggin

Nick Piggin
2011-01-07 14:50:29 +0800
34286d666 fs: rcu-walk aware d_revalidate method ... Browse Code »

Require filesystems be aware of .d_revalidate being called in rcu-walk
mode (nd->flags & LOOKUP_RCU). For now do a simple push down, returning
-ECHILD from all implementations.

Signed-off-by: Nick Piggin

Nick Piggin
2011-01-07 14:50:29 +0800
fb045adb9 fs: dcache reduce branches in lookup path ... Browse Code »

Reduce some branches and memory accesses in dcache lookup by adding dentry
flags to indicate common d_ops are set, rather than having to check them.
This saves a pointer memory access (dentry->d_op) in common path lookup
situations, and saves another pointer load and branch in cases where we
have d_op but not the particular operation.

Patched with:

git grep -E '[.>]([[:space:]])*d_op([[:space:]])*=' | xargs sed -e 's/$[^\t ]*$->d_op = $.*$;/d_set_d_op(\1, \2);/' -e 's/$[^\t ]*$\.d_op = $.*$;/d_set_d_op(\&\1, \2);/' -i

Signed-off-by: Nick Piggin

Nick Piggin
2011-01-07 14:50:28 +0800
31e6b01f4 fs: rcu-walk for path lookup ... Browse Code »

Perform common cases of path lookups without any stores or locking in the
ancestor dentry elements. This is called rcu-walk, as opposed to the current
algorithm which is a refcount based walk, or ref-walk.

This results in far fewer atomic operations on every path element,
significantly improving path lookup performance. It also avoids cacheline
bouncing on common dentries, significantly improving scalability.

The overall design is like this:
* LOOKUP_RCU is set in nd->flags, which distinguishes rcu-walk from ref-walk.
* Take the RCU lock for the entire path walk, starting with the acquiring
of the starting path (eg. root/cwd/fd-path). So now dentry refcounts are
not required for dentry persistence.
* synchronize_rcu is called when unregistering a filesystem, so we can
access d_ops and i_ops during rcu-walk.
* Similarly take the vfsmount lock for the entire path walk. So now mnt
refcounts are not required for persistence. Also we are free to perform mount
lookups, and to assume dentry mount points and mount roots are stable up and
down the path.
* Have a per-dentry seqlock to protect the dentry name, parent, and inode,
so we can load this tuple atomically, and also check whether any of its
members have changed.
* Dentry lookups (based on parent, candidate string tuple) recheck the parent
sequence after the child is found in case anything changed in the parent
during the path walk.
* inode is also RCU protected so we can load d_inode and use the inode for
limited things.
* i_mode, i_uid, i_gid can be tested for exec permissions during path walk.
* i_op can be loaded.

When we reach the destination dentry, we lock it, recheck lookup sequence,
and increment its refcount and mountpoint refcount. RCU and vfsmount locks
are dropped. This is termed "dropping rcu-walk". If the dentry refcount does
not match, we can not drop rcu-walk gracefully at the current point in the
lokup, so instead return -ECHILD (for want of a better errno). This signals the
path walking code to re-do the entire lookup with a ref-walk.

Aside from the final dentry, there are other situations that may be encounted
where we cannot continue rcu-walk. In that case, we drop rcu-walk (ie. take
a reference on the last good dentry) and continue with a ref-walk. Again, if
we can drop rcu-walk gracefully, we return -ECHILD and do the whole lookup
using ref-walk. But it is very important that we can continue with ref-walk
for most cases, particularly to avoid the overhead of double lookups, and to
gain the scalability advantages on common path elements (like cwd and root).

The cases where rcu-walk cannot continue are:
* NULL dentry (ie. any uncached path element)
* parent with d_inode->i_op->permission or ACLs
* dentries with d_revalidate
* Following links

In future patches, permission checks and d_revalidate become rcu-walk aware. It
may be possible eventually to make following links rcu-walk aware.

Uncached path elements will always require dropping to ref-walk mode, at the
very least because i_mutex needs to be grabbed, and objects allocated.

Signed-off-by: Nick Piggin

Nick Piggin
2011-01-07 14:50:27 +0800
fa0d7e3de fs: icache RCU free inodes ... Browse Code »

RCU free the struct inode. This will allow:

- Subsequent store-free path walking patch. The inode must be consulted for
permissions when walking, so an RCU inode reference is a must.
- sb_inode_list_lock to be moved inside i_lock because sb list walkers who want
to take i_lock no longer need to take sb_inode_list_lock to walk the list in
the first place. This will simplify and optimize locking.
- Could remove some nested trylock loops in dcache code
- Could potentially simplify things a bit in VM land. Do not need to take the
page lock to follow page->mapping.

The downsides of this is the performance cost of using RCU. In a simple
creat/unlink microbenchmark, performance drops by about 10% due to inability to
reuse cache-hot slab objects. As iterations increase and RCU freeing starts
kicking over, this increases to about 20%.

In cases where inode lifetimes are longer (ie. many inodes may be allocated
during the average life span of a single inode), a lot of this cache reuse is
not applicable, so the regression caused by this patch is smaller.

The cache-hot regression could largely be avoided by using SLAB_DESTROY_BY_RCU,
however this adds some complexity to list walking and store-free path walking,
so I prefer to implement this at a later date, if it is shown to be a win in
real situations. I haven't found a regression in any non-micro benchmark so I
doubt it will be a problem.

Signed-off-by: Nick Piggin

Nick Piggin
2011-01-07 14:50:26 +0800
621e155a3 fs: change d_compare for rcu-walk ... Browse Code »

Change d_compare so it may be called from lock-free RCU lookups. This
does put significant restrictions on what may be done from the callback,
however there don't seem to have been any problems with in-tree fses.
If some strange use case pops up that _really_ cannot cope with the
rcu-walk rules, we can just add new rcu-unaware callbacks, which would
cause name lookup to drop out of rcu-walk mode.

For in-tree filesystems, this is just a mechanical change.

Signed-off-by: Nick Piggin

Nick Piggin
2011-01-07 14:50:19 +0800
fe15ce446 fs: change d_delete semantics ... Browse Code »

Change d_delete from a dentry deletion notification to a dentry caching
advise, more like ->drop_inode. Require it to be constant and idempotent,
and not take d_lock. This is how all existing filesystems use the callback
anyway.

This makes fine grained dentry locking of dput and dentry lru scanning
much simpler.

Signed-off-by: Nick Piggin

Nick Piggin
2011-01-07 14:50:18 +0800
3c0cb7c31 Merge branch 'devel' of master.kernel.org:/home/rmk/linux-2.6-arm ... Browse Code »

* 'devel' of master.kernel.org:/home/rmk/linux-2.6-arm: (416 commits)
ARM: DMA: add support for DMA debugging
ARM: PL011: add DMA burst threshold support for ST variants
ARM: PL011: Add support for transmit DMA
ARM: PL011: Ensure IRQs are disabled in UART interrupt handler
ARM: PL011: Separate hardware FIFO size from TTY FIFO size
ARM: PL011: Allow better handling of vendor data
ARM: PL011: Ensure error flags are clear at startup
ARM: PL011: include revision number in boot-time port printk
ARM: vexpress: add sched_clock() for Versatile Express
ARM i.MX53: Make MX53 EVK bootable
ARM i.MX53: Some bug fix about MX53 MSL code
ARM: 6607/1: sa1100: Update platform device registration
ARM: 6606/1: sa1100: Fix platform device registration
ARM i.MX51: rename IPU irqs
ARM i.MX51: Add ipu clock support
ARM: imx/mx27_3ds: Add PMIC support
ARM: DMA: Replace page_to_dma()/dma_to_page() with pfn_to_dma()/dma_to_pfn()
mx51: fix usb clock support
MX51: Add support for usb host 2
arch/arm/plat-mxc/ehci.c: fix errors/typos
...

Linus Torvalds
2011-01-07 08:50:35 +0800

06 Jan, 2011

1 commit

31edf274f Merge branches 'ftrace', 'gic', 'io', 'kexec', 'mod', 'sa11x0', 'sh' and 'versatile' into devel Browse Code »

Russell King
2011-01-06 02:08:10 +0800

09 Dec, 2010

1 commit

8e9255e6a Merge branch 'linus' into sched/core ... Browse Code »

Merge reason: we want to queue up dependent cleanup

Signed-off-by: Ingo Molnar

Ingo Molnar
2010-12-09 03:15:29 +0800

06 Dec, 2010

1 commit

7b2a69ba7 Revert "vfs: show unreachable paths in getcwd and proc" ... Browse Code »

Because it caused a chroot ttyname regression in 2.6.36.

As of 2.6.36 ttyname does not work in a chroot. It has already been
reported that screen breaks, and for me this breaks an automated
distribution testsuite, that I need to preserve the ability to run the
existing binaries on for several more years. glibc 2.11.3 which has a
fix for this is not an option.

The root cause of this breakage is:

commit 8df9d1a4142311c084ffeeacb67cd34d190eff74
Author: Miklos Szeredi
Date: Tue Aug 10 11:41:41 2010 +0200

vfs: show unreachable paths in getcwd and proc

Prepend "(unreachable)" to path strings if the path is not reachable
from the current root.

Two places updated are
- the return string from getcwd()
- and symlinks under /proc/$PID.

Other uses of d_path() are left unchanged (we know that some old
software crashes if /proc/mounts is changed).

Signed-off-by: Miklos Szeredi
Signed-off-by: Al Viro

So remove the nice sounding, but ultimately ill advised change to how
/proc/fd symlinks work.

Signed-off-by: "Eric W. Biederman"
Signed-off-by: Linus Torvalds

Eric W. Biederman
2010-12-06 08:39:45 +0800

30 Nov, 2010

2 commits

5091faa44 sched: Add 'autogroup' scheduling feature: automated per session task groups ... Browse Code »

A recurring complaint from CFS users is that parallel kbuild has
a negative impact on desktop interactivity. This patch
implements an idea from Linus, to automatically create task
groups. Currently, only per session autogroups are implemented,
but the patch leaves the way open for enhancement.

Implementation: each task's signal struct contains an inherited
pointer to a refcounted autogroup struct containing a task group
pointer, the default for all tasks pointing to the
init_task_group. When a task calls setsid(), a new task group
is created, the process is moved into the new task group, and a
reference to the preveious task group is dropped. Child
processes inherit this task group thereafter, and increase it's
refcount. When the last thread of a process exits, the
process's reference is dropped, such that when the last process
referencing an autogroup exits, the autogroup is destroyed.

At runqueue selection time, IFF a task has no cgroup assignment,
its current autogroup is used.

Autogroup bandwidth is controllable via setting it's nice level
through the proc filesystem:

cat /proc//autogroup

Displays the task's group and the group's nice level.

echo > /proc//autogroup

Sets the task group's shares to the weight of nice task.
Setting nice level is rate limited for !admin users due to the
abuse risk of task group locking.

The feature is enabled from boot by default if
CONFIG_SCHED_AUTOGROUP=y is selected, but can be disabled via
the boot option noautogroup, and can also be turned on/off on
the fly via:

echo [01] > /proc/sys/kernel/sched_autogroup_enabled

... which will automatically move tasks to/from the root task group.

Signed-off-by: Mike Galbraith
Acked-by: Linus Torvalds
Acked-by: Peter Zijlstra
Cc: Markus Trippelsdorf
Cc: Mathieu Desnoyers
Cc: Paul Turner
Cc: Oleg Nesterov
[ Removed the task_group_path() debug code, and fixed !EVENTFD build failure. ]
Signed-off-by: Ingo Molnar
LKML-Reference:
Signed-off-by: Ingo Molnar

Mike Galbraith
2010-11-30 23:03:35 +0800
9833c3940 ARM: 6485/5: proc/vmcore - allow archs to override vmcore_elf_check_arch() ... Browse Code »

Allow architectures to redefine this macro if needed. This is useful for
example in architectures where 64-bit ELF vmcores are not supported.
Specifying zero vmcore_elf64_check_arch() allows compiler to optimize
away unnecessary parts of parse_crash_elf64_headers().

We also rename the macro to vmcore_elf64_check_arch() to reflect that it
is used for 64-bit vmcores only.

Signed-off-by: Mika Westerberg
Signed-off-by: Russell King

Mika Westerberg
2010-11-30 21:39:55 +0800

25 Nov, 2010

1 commit

ea251c1d5 pagemap: set pagemap walk limit to PMD boundary ... Browse Code »

Currently one pagemap_read() call walks in PAGEMAP_WALK_SIZE bytes (== 512
pages.) But there is a corner case where walk_pmd_range() accidentally
runs over a VMA associated with a hugetlbfs file.

For example, when a process has mappings to VMAs as shown below:

# cat /proc//maps
...
3a58f6d000-3a58f72000 rw-p 00000000 00:00 0
7fbd51853000-7fbd51855000 rw-p 00000000 00:00 0
7fbd5186c000-7fbd5186e000 rw-p 00000000 00:00 0
7fbd51a00000-7fbd51c00000 rw-s 00000000 00:12 8614 /hugepages/test

then pagemap_read() goes into walk_pmd_range() path and walks in the range
0x7fbd51853000-0x7fbd51a53000, but the hugetlbfs VMA should be handled by
walk_hugetlb_range(). Otherwise PMD for the hugepage is considered bad
and cleared, which causes undesirable results.

This patch fixes it by separating pagemap walk range into one PMD.

Signed-off-by: Naoya Horiguchi
Cc: Jun'ichi Nomura
Acked-by: KAMEZAWA Hiroyuki
Cc: Matt Mackall
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Naoya Horiguchi
2010-11-25 05:50:46 +0800

18 Nov, 2010

1 commit

451a3c24b BKL: remove extraneous #include <smp_lock.h> ... Browse Code »

The big kernel lock has been removed from all these files at some point,
leaving only the #include.

Remove this too as a cleanup.

Signed-off-by: Arnd Bergmann
Signed-off-by: Linus Torvalds

Arnd Bergmann
2010-11-18 00:59:32 +0800

17 Nov, 2010

1 commit

23308ba54 console: add /proc/consoles ... Browse Code »

It allows users to see what consoles are currently known to the system
and with what flags.

It is based on Werner's patch, the part about traversing fds was
removed, the code was moved to kernel/printk.c, where consoles are
handled and it makes more sense to me.

Signed-off-by: Jiri Slaby [cleanups]
Signed-off-by: "Dr. Werner Fink"
Cc: Al Viro
Cc: Greg Kroah-Hartman
Signed-off-by: Greg Kroah-Hartman

Jiri Slaby
2010-11-17 04:50:17 +0800

29 Oct, 2010

2 commits

aed1d84f9 switch procfs to ->mount() ... Browse Code »

Signed-off-by: Al Viro

Al Viro
2010-10-29 16:17:01 +0800
579441a39 setting ->proc_mnt doesn't belong in proc_get_sb() ... Browse Code »

take that to kern_mount_data()-using callers

Signed-off-by: Al Viro

Al Viro
2010-10-29 16:16:58 +0800

28 Oct, 2010

4 commits

478735e38 /proc/stat: fix scalability of irq sum of all cpu ... Browse Code »

In /proc/stat, the number of per-IRQ event is shown by making a sum each
irq's events on all cpus. But we can make use of kstat_irqs().

kstat_irqs() do the same calculation, If !CONFIG_GENERIC_HARDIRQ,
it's not a big cost. (Both of the number of cpus and irqs are small.)

If a system is very big and CONFIG_GENERIC_HARDIRQ, it does

for_each_irq()
for_each_cpu()
- look up a radix tree
- read desc->irq_stat[cpu]
This seems not efficient. This patch adds kstat_irqs() for
CONFIG_GENRIC_HARDIRQ and change the calculation as

for_each_irq()
look up radix tree
for_each_cpu()
- read desc->irq_stat[cpu]

This reduces cost.

A test on (4096cpusp, 256 nodes, 4592 irqs) host (by Jack Steiner)

%time cat /proc/stat > /dev/null

Before Patch: 2.459 sec
After Patch : .561 sec

[akpm@linux-foundation.org: unexport kstat_irqs, coding-style tweaks]
[akpm@linux-foundation.org: fix unused variable 'per_irq_sum']
Signed-off-by: KAMEZAWA Hiroyuki
Tested-by: Jack Steiner
Acked-by: Jack Steiner
Cc: Yinghai Lu
Cc: Ingo Molnar
Cc: Thomas Gleixner
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

KAMEZAWA Hiroyuki
2010-10-28 09:03:13 +0800
f2c66cd8e /proc/stat: scalability of irq num per cpu ... Browse Code »

/proc/stat shows the total number of all interrupts to each cpu. But when
the number of IRQs are very large, it take very long time and 'cat
/proc/stat' takes more than 10 secs. This is because sum of all irq
events are counted when /proc/stat is read. This patch adds "sum of all
irq" counter percpu and reduce read costs.

The cost of reading /proc/stat is important because it's used by major
applications as 'top', 'ps', 'w', etc....

A test on a mechin (4096cpu, 256 nodes, 4592 irqs) shows

%time cat /proc/stat > /dev/null
Before Patch: 12.627 sec
After Patch: 2.459 sec

Signed-off-by: KAMEZAWA Hiroyuki
Tested-by: Jack Steiner
Acked-by: Jack Steiner
Cc: Yinghai Lu
Cc: Ingo Molnar
Cc: Thomas Gleixner
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

KAMEZAWA Hiroyuki
2010-10-28 09:03:13 +0800
19cd56c48 procfs: fix /proc/softirqs formatting ... Browse Code »

The length of the BLOCK_IPOLL string is making i's value be printed too
far to the right. This patch fixes this and makes the output a bit
neater.

Currently:
CPU0
HI: 0
TIMER: 599792
NET_TX: 2
NET_RX: 6
BLOCK: 80807
BLOCK_IOPOLL: 0
TASKLET: 20012
SCHED: 0
HRTIMER: 63
RCU: 619279

With patch:
CPU0
HI: 0
TIMER: 585582
NET_TX: 2
NET_RX: 6
BLOCK: 80320
BLOCK_IOPOLL: 0
TASKLET: 19287
SCHED: 0
HRTIMER: 62
RCU: 604441

Signed-off-by: Davidlohr Bueso
Acked-by: Keika Kobayashi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Davidlohr Bueso
2010-10-28 09:03:13 +0800
b40d4f84b /proc/pid/smaps: export amount of anonymous memory in a mapping ... Browse Code »

Export the number of anonymous pages in a mapping via smaps.

Even the private pages in a mapping backed by a file, would be marked as
anonymous, when they are modified. Export this information to user-space via
smaps.

Exporting this count will help gdb to make a better decision on which
areas need to be dumped in its coredump; and should be useful to others
studying the memory usage of a process.

Signed-off-by: Nikanth Karthikesan
Acked-by: Hugh Dickins
Reviewed-by: KOSAKI Motohiro
Cc: Matt Mackall
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Nikanth Karthikesan
2010-10-28 09:03:13 +0800