15 Feb, 2011

1 commit

  • task_show_regs used to be a debugging aid in the early bringup days
    of Linux on s390. /proc//status is a world readable file, it
    is not a good idea to show the registers of a process. The only
    correct fix is to remove task_show_regs.

    Reported-by: Al Viro
    Signed-off-by: Martin Schwidefsky
    Signed-off-by: Linus Torvalds

    Martin Schwidefsky
     

26 Jan, 2011

1 commit

  • The -rt patches change the console_semaphore to console_mutex. As a
    result, a quite large chunk of the patches changes all
    acquire/release_console_sem() to acquire/release_console_mutex()

    This commit makes things use more neutral function names which dont make
    implications about the underlying lock.

    The only real change is the return value of console_trylock which is
    inverted from try_acquire_console_sem()

    This patch also paves the way to switching console_sem from a semaphore to
    a mutex.

    [akpm@linux-foundation.org: coding-style fixes]
    [akpm@linux-foundation.org: make console_trylock return 1 on success, per Geert]
    Signed-off-by: Torben Hohn
    Cc: Thomas Gleixner
    Cc: Greg KH
    Cc: Ingo Molnar
    Cc: Geert Uytterhoeven
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Torben Hohn
     

21 Jan, 2011

1 commit

  • The meaning of CONFIG_EMBEDDED has long since been obsoleted; the option
    is used to configure any non-standard kernel with a much larger scope than
    only small devices.

    This patch renames the option to CONFIG_EXPERT in init/Kconfig and fixes
    references to the option throughout the kernel. A new CONFIG_EMBEDDED
    option is added that automatically selects CONFIG_EXPERT when enabled and
    can be used in the future to isolate options that should only be
    considered for embedded systems (RISC architectures, SLOB, etc).

    Calling the option "EXPERT" more accurately represents its intention: only
    expert users who understand the impact of the configuration changes they
    are making should enable it.

    Reviewed-by: Ingo Molnar
    Acked-by: David Woodhouse
    Signed-off-by: David Rientjes
    Cc: Greg KH
    Cc: "David S. Miller"
    Cc: Jens Axboe
    Cc: Arnd Bergmann
    Cc: Robin Holt
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Rientjes
     

14 Jan, 2011

13 commits

  • PG_buddy can be converted to _mapcount == -2. So the PG_compound_lock can
    be added to page->flags without overflowing (because of the sparse section
    bits increasing) with CONFIG_X86_PAE=y and CONFIG_X86_PAT=y. This also
    has to move the memory hotplug code from _mapcount to lru.next to avoid
    any risk of clashes. We can't use lru.next for PG_buddy removal, but
    memory hotplug can use lru.next even more easily than the mapcount
    instead.

    Signed-off-by: Andrea Arcangeli
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrea Arcangeli
     
  • Add hugepage stat information to /proc/vmstat and /proc/meminfo.

    Signed-off-by: Andrea Arcangeli
    Acked-by: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrea Arcangeli
     
  • We'd like to be able to oom_score_adj a process up/down as it
    enters/leaves the foreground. Currently, it is not possible to oom_adj
    down without CAP_SYS_RESOURCE. This patch allows a task to decrease its
    oom_score_adj back to the value that a CAP_SYS_RESOURCE thread set it to
    or its inherited value at fork. Assuming the thread that has forked it
    has oom_score_adj of 0, each process could decrease it back from 0 upon
    activation unless a CAP_SYS_RESOURCE thread elevated it to something
    higher.

    Alternative considered:

    * a setuid binary
    * a daemon with CAP_SYS_RESOURCE

    Since you don't wan't all processes to be able to reduce their oom_adj, a
    setuid or daemon implementation would be complex. The alternatives also
    have much higher overhead.

    This patch updated from original patch based on feedback from David
    Rientjes.

    Signed-off-by: Mandeep Singh Baines
    Acked-by: David Rientjes
    Cc: KAMEZAWA Hiroyuki
    Cc: KOSAKI Motohiro
    Cc: Rik van Riel
    Cc: Ying Han
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mandeep Singh Baines
     
  • Currently there is no way to find whether a process has locked its pages
    in memory or not. And which of the memory regions are locked in memory.

    Add a new field "Locked" to export this information via the smaps file.

    Signed-off-by: Nikanth Karthikesan
    Acked-by: Balbir Singh
    Acked-by: Wu Fengguang
    Cc: Matt Mackall
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nikanth Karthikesan
     
  • Commit 34aacb2920 ("procfs: Use generic_file_llseek in /proc/kcore") broke
    seeking on /proc/kcore. This changes it back to use default_llseek in
    order to restore the original behavior.

    The problem with generic_file_llseek is that it only allows seeks up to
    inode->i_sb->s_maxbytes, which is 2GB-1 on procfs, where the memory file
    offset values in the /proc/kcore PT_LOAD segments may exceed or start
    beyond that offset value.

    A similar revert was made for /proc/vmcore.

    Signed-off-by: Dave Anderson
    Acked-by: Frederic Weisbecker
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Anderson
     
  • Filename is supposed to match procfile name for random junk.

    Add __init while I'm at it.

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • For the common case where a proc entry is being removed and nobody is in
    the process of using it, save a LOCK/UNLOCK pair.

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • Add a PageSlab() check before adding the _mapcount value to /kpagecount.
    page->_mapcount is in a union with the SLAB structure so for pages
    controlled by SLAB, page_mapcount() returns nonsense.

    Signed-off-by: Petr Holasek
    Cc: Wu Fengguang
    Cc: Matt Mackall
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Petr Holasek
     
  • single_open()'s third argument is for copying into seq_file->private. Use
    that, rather than open-coding it.

    Signed-off-by: Jovi Zhang
    Acked-by: David Rientjes
    Acked-by: Alexey Dobriyan
    Reviewed-by: KOSAKI Motohiro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jovi Zhang
     
  • - ->low_ino is write-once field -- reading it under locks is unnecessary.

    - /proc/$PID stuff never reaches pde_put()/free_proc_entry() --
    PROC_DYNAMIC_FIRST check never triggers.

    - in proc_get_inode(), inode number always matches proc dir entry, so
    save one parameter.

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • For string without format specifiers, use seq_puts().
    For seq_printf("\n"), use seq_putc('\n').

    text data bss dec hex filename
    61866 488 112 62466 f402 fs/proc/proc.o
    61729 488 112 62329 f379 fs/proc/proc.o
    ----------------------------------------------------
    -139

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • /proc/*/statm code needlessly truncates data from unsigned long to int.
    One needs only 8+ TB of RAM to make truncation visible.

    Signed-off-by: Alexey Dobriyan
    Reviewed-by: WANG Cong
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • Use temporary lr for struct latency_record for improved readability and
    fewer columns used. Removed trailing space from output.

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Joe Perches
    Cc: Jiri Kosina
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joe Perches
     

08 Jan, 2011

2 commits

  • * 'tty-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty-2.6: (36 commits)
    serial: apbuart: Fixup apbuart_console_init()
    TTY: Add tty ioctl to figure device node of the system console.
    tty: add 'active' sysfs attribute to tty0 and console device
    drivers: serial: apbuart: Handle OF failures gracefully
    Serial: Avoid unbalanced IRQ wake disable during resume
    tty: fix typos/errors in tty_driver.h comments
    pch_uart : fix warnings for 64bit compile
    8250: fix uninitialized FIFOs
    ip2: fix compiler warning on ip2main_pci_tbl
    specialix: fix compiler warning on specialix_pci_tbl
    rocket: fix compiler warning on rocket_pci_ids
    8250: add a UPIO_DWAPB32 for 32 bit accesses
    8250: use container_of() instead of casting
    serial: omap-serial: Add support for kernel debugger
    serial: fix pch_uart kconfig & build
    drivers: char: hvc: add arm JTAG DCC console support
    RS485 documentation: add 16C950 UART description
    serial: ifx6x60: fix memory leak
    serial: ifx6x60: free IRQ on error
    Serial: EG20T: add PCH_UART driver
    ...

    Fixed up conflicts in drivers/serial/apbuart.c with evil merge that
    makes the code look fairly sane (unlike either side).

    Linus Torvalds
     
  • …t/npiggin/linux-npiggin

    * 'vfs-scale-working' of git://git.kernel.org/pub/scm/linux/kernel/git/npiggin/linux-npiggin: (57 commits)
    fs: scale mntget/mntput
    fs: rename vfsmount counter helpers
    fs: implement faster dentry memcmp
    fs: prefetch inode data in dcache lookup
    fs: improve scalability of pseudo filesystems
    fs: dcache per-inode inode alias locking
    fs: dcache per-bucket dcache hash locking
    bit_spinlock: add required includes
    kernel: add bl_list
    xfs: provide simple rcu-walk ACL implementation
    btrfs: provide simple rcu-walk ACL implementation
    ext2,3,4: provide simple rcu-walk ACL implementation
    fs: provide simple rcu-walk generic_check_acl implementation
    fs: provide rcu-walk aware permission i_ops
    fs: rcu-walk aware d_revalidate method
    fs: cache optimise dentry and inode for rcu-walk
    fs: dcache reduce branches in lookup path
    fs: dcache remove d_mounted
    fs: fs_struct use seqlock
    fs: rcu-walk for path lookup
    ...

    Linus Torvalds
     

07 Jan, 2011

8 commits

  • Signed-off-by: Nick Piggin

    Nick Piggin
     
  • Require filesystems be aware of .d_revalidate being called in rcu-walk
    mode (nd->flags & LOOKUP_RCU). For now do a simple push down, returning
    -ECHILD from all implementations.

    Signed-off-by: Nick Piggin

    Nick Piggin
     
  • Reduce some branches and memory accesses in dcache lookup by adding dentry
    flags to indicate common d_ops are set, rather than having to check them.
    This saves a pointer memory access (dentry->d_op) in common path lookup
    situations, and saves another pointer load and branch in cases where we
    have d_op but not the particular operation.

    Patched with:

    git grep -E '[.>]([[:space:]])*d_op([[:space:]])*=' | xargs sed -e 's/\([^\t ]*\)->d_op = \(.*\);/d_set_d_op(\1, \2);/' -e 's/\([^\t ]*\)\.d_op = \(.*\);/d_set_d_op(\&\1, \2);/' -i

    Signed-off-by: Nick Piggin

    Nick Piggin
     
  • Perform common cases of path lookups without any stores or locking in the
    ancestor dentry elements. This is called rcu-walk, as opposed to the current
    algorithm which is a refcount based walk, or ref-walk.

    This results in far fewer atomic operations on every path element,
    significantly improving path lookup performance. It also avoids cacheline
    bouncing on common dentries, significantly improving scalability.

    The overall design is like this:
    * LOOKUP_RCU is set in nd->flags, which distinguishes rcu-walk from ref-walk.
    * Take the RCU lock for the entire path walk, starting with the acquiring
    of the starting path (eg. root/cwd/fd-path). So now dentry refcounts are
    not required for dentry persistence.
    * synchronize_rcu is called when unregistering a filesystem, so we can
    access d_ops and i_ops during rcu-walk.
    * Similarly take the vfsmount lock for the entire path walk. So now mnt
    refcounts are not required for persistence. Also we are free to perform mount
    lookups, and to assume dentry mount points and mount roots are stable up and
    down the path.
    * Have a per-dentry seqlock to protect the dentry name, parent, and inode,
    so we can load this tuple atomically, and also check whether any of its
    members have changed.
    * Dentry lookups (based on parent, candidate string tuple) recheck the parent
    sequence after the child is found in case anything changed in the parent
    during the path walk.
    * inode is also RCU protected so we can load d_inode and use the inode for
    limited things.
    * i_mode, i_uid, i_gid can be tested for exec permissions during path walk.
    * i_op can be loaded.

    When we reach the destination dentry, we lock it, recheck lookup sequence,
    and increment its refcount and mountpoint refcount. RCU and vfsmount locks
    are dropped. This is termed "dropping rcu-walk". If the dentry refcount does
    not match, we can not drop rcu-walk gracefully at the current point in the
    lokup, so instead return -ECHILD (for want of a better errno). This signals the
    path walking code to re-do the entire lookup with a ref-walk.

    Aside from the final dentry, there are other situations that may be encounted
    where we cannot continue rcu-walk. In that case, we drop rcu-walk (ie. take
    a reference on the last good dentry) and continue with a ref-walk. Again, if
    we can drop rcu-walk gracefully, we return -ECHILD and do the whole lookup
    using ref-walk. But it is very important that we can continue with ref-walk
    for most cases, particularly to avoid the overhead of double lookups, and to
    gain the scalability advantages on common path elements (like cwd and root).

    The cases where rcu-walk cannot continue are:
    * NULL dentry (ie. any uncached path element)
    * parent with d_inode->i_op->permission or ACLs
    * dentries with d_revalidate
    * Following links

    In future patches, permission checks and d_revalidate become rcu-walk aware. It
    may be possible eventually to make following links rcu-walk aware.

    Uncached path elements will always require dropping to ref-walk mode, at the
    very least because i_mutex needs to be grabbed, and objects allocated.

    Signed-off-by: Nick Piggin

    Nick Piggin
     
  • RCU free the struct inode. This will allow:

    - Subsequent store-free path walking patch. The inode must be consulted for
    permissions when walking, so an RCU inode reference is a must.
    - sb_inode_list_lock to be moved inside i_lock because sb list walkers who want
    to take i_lock no longer need to take sb_inode_list_lock to walk the list in
    the first place. This will simplify and optimize locking.
    - Could remove some nested trylock loops in dcache code
    - Could potentially simplify things a bit in VM land. Do not need to take the
    page lock to follow page->mapping.

    The downsides of this is the performance cost of using RCU. In a simple
    creat/unlink microbenchmark, performance drops by about 10% due to inability to
    reuse cache-hot slab objects. As iterations increase and RCU freeing starts
    kicking over, this increases to about 20%.

    In cases where inode lifetimes are longer (ie. many inodes may be allocated
    during the average life span of a single inode), a lot of this cache reuse is
    not applicable, so the regression caused by this patch is smaller.

    The cache-hot regression could largely be avoided by using SLAB_DESTROY_BY_RCU,
    however this adds some complexity to list walking and store-free path walking,
    so I prefer to implement this at a later date, if it is shown to be a win in
    real situations. I haven't found a regression in any non-micro benchmark so I
    doubt it will be a problem.

    Signed-off-by: Nick Piggin

    Nick Piggin
     
  • Change d_compare so it may be called from lock-free RCU lookups. This
    does put significant restrictions on what may be done from the callback,
    however there don't seem to have been any problems with in-tree fses.
    If some strange use case pops up that _really_ cannot cope with the
    rcu-walk rules, we can just add new rcu-unaware callbacks, which would
    cause name lookup to drop out of rcu-walk mode.

    For in-tree filesystems, this is just a mechanical change.

    Signed-off-by: Nick Piggin

    Nick Piggin
     
  • Change d_delete from a dentry deletion notification to a dentry caching
    advise, more like ->drop_inode. Require it to be constant and idempotent,
    and not take d_lock. This is how all existing filesystems use the callback
    anyway.

    This makes fine grained dentry locking of dput and dentry lru scanning
    much simpler.

    Signed-off-by: Nick Piggin

    Nick Piggin
     
  • * 'devel' of master.kernel.org:/home/rmk/linux-2.6-arm: (416 commits)
    ARM: DMA: add support for DMA debugging
    ARM: PL011: add DMA burst threshold support for ST variants
    ARM: PL011: Add support for transmit DMA
    ARM: PL011: Ensure IRQs are disabled in UART interrupt handler
    ARM: PL011: Separate hardware FIFO size from TTY FIFO size
    ARM: PL011: Allow better handling of vendor data
    ARM: PL011: Ensure error flags are clear at startup
    ARM: PL011: include revision number in boot-time port printk
    ARM: vexpress: add sched_clock() for Versatile Express
    ARM i.MX53: Make MX53 EVK bootable
    ARM i.MX53: Some bug fix about MX53 MSL code
    ARM: 6607/1: sa1100: Update platform device registration
    ARM: 6606/1: sa1100: Fix platform device registration
    ARM i.MX51: rename IPU irqs
    ARM i.MX51: Add ipu clock support
    ARM: imx/mx27_3ds: Add PMIC support
    ARM: DMA: Replace page_to_dma()/dma_to_page() with pfn_to_dma()/dma_to_pfn()
    mx51: fix usb clock support
    MX51: Add support for usb host 2
    arch/arm/plat-mxc/ehci.c: fix errors/typos
    ...

    Linus Torvalds
     

06 Jan, 2011

1 commit


09 Dec, 2010

1 commit


06 Dec, 2010

1 commit

  • Because it caused a chroot ttyname regression in 2.6.36.

    As of 2.6.36 ttyname does not work in a chroot. It has already been
    reported that screen breaks, and for me this breaks an automated
    distribution testsuite, that I need to preserve the ability to run the
    existing binaries on for several more years. glibc 2.11.3 which has a
    fix for this is not an option.

    The root cause of this breakage is:

    commit 8df9d1a4142311c084ffeeacb67cd34d190eff74
    Author: Miklos Szeredi
    Date: Tue Aug 10 11:41:41 2010 +0200

    vfs: show unreachable paths in getcwd and proc

    Prepend "(unreachable)" to path strings if the path is not reachable
    from the current root.

    Two places updated are
    - the return string from getcwd()
    - and symlinks under /proc/$PID.

    Other uses of d_path() are left unchanged (we know that some old
    software crashes if /proc/mounts is changed).

    Signed-off-by: Miklos Szeredi
    Signed-off-by: Al Viro

    So remove the nice sounding, but ultimately ill advised change to how
    /proc/fd symlinks work.

    Signed-off-by: "Eric W. Biederman"
    Signed-off-by: Linus Torvalds

    Eric W. Biederman
     

30 Nov, 2010

2 commits

  • A recurring complaint from CFS users is that parallel kbuild has
    a negative impact on desktop interactivity. This patch
    implements an idea from Linus, to automatically create task
    groups. Currently, only per session autogroups are implemented,
    but the patch leaves the way open for enhancement.

    Implementation: each task's signal struct contains an inherited
    pointer to a refcounted autogroup struct containing a task group
    pointer, the default for all tasks pointing to the
    init_task_group. When a task calls setsid(), a new task group
    is created, the process is moved into the new task group, and a
    reference to the preveious task group is dropped. Child
    processes inherit this task group thereafter, and increase it's
    refcount. When the last thread of a process exits, the
    process's reference is dropped, such that when the last process
    referencing an autogroup exits, the autogroup is destroyed.

    At runqueue selection time, IFF a task has no cgroup assignment,
    its current autogroup is used.

    Autogroup bandwidth is controllable via setting it's nice level
    through the proc filesystem:

    cat /proc//autogroup

    Displays the task's group and the group's nice level.

    echo > /proc//autogroup

    Sets the task group's shares to the weight of nice task.
    Setting nice level is rate limited for !admin users due to the
    abuse risk of task group locking.

    The feature is enabled from boot by default if
    CONFIG_SCHED_AUTOGROUP=y is selected, but can be disabled via
    the boot option noautogroup, and can also be turned on/off on
    the fly via:

    echo [01] > /proc/sys/kernel/sched_autogroup_enabled

    ... which will automatically move tasks to/from the root task group.

    Signed-off-by: Mike Galbraith
    Acked-by: Linus Torvalds
    Acked-by: Peter Zijlstra
    Cc: Markus Trippelsdorf
    Cc: Mathieu Desnoyers
    Cc: Paul Turner
    Cc: Oleg Nesterov
    [ Removed the task_group_path() debug code, and fixed !EVENTFD build failure. ]
    Signed-off-by: Ingo Molnar
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Mike Galbraith
     
  • Allow architectures to redefine this macro if needed. This is useful for
    example in architectures where 64-bit ELF vmcores are not supported.
    Specifying zero vmcore_elf64_check_arch() allows compiler to optimize
    away unnecessary parts of parse_crash_elf64_headers().

    We also rename the macro to vmcore_elf64_check_arch() to reflect that it
    is used for 64-bit vmcores only.

    Signed-off-by: Mika Westerberg
    Signed-off-by: Russell King

    Mika Westerberg
     

25 Nov, 2010

1 commit

  • Currently one pagemap_read() call walks in PAGEMAP_WALK_SIZE bytes (== 512
    pages.) But there is a corner case where walk_pmd_range() accidentally
    runs over a VMA associated with a hugetlbfs file.

    For example, when a process has mappings to VMAs as shown below:

    # cat /proc//maps
    ...
    3a58f6d000-3a58f72000 rw-p 00000000 00:00 0
    7fbd51853000-7fbd51855000 rw-p 00000000 00:00 0
    7fbd5186c000-7fbd5186e000 rw-p 00000000 00:00 0
    7fbd51a00000-7fbd51c00000 rw-s 00000000 00:12 8614 /hugepages/test

    then pagemap_read() goes into walk_pmd_range() path and walks in the range
    0x7fbd51853000-0x7fbd51a53000, but the hugetlbfs VMA should be handled by
    walk_hugetlb_range(). Otherwise PMD for the hugepage is considered bad
    and cleared, which causes undesirable results.

    This patch fixes it by separating pagemap walk range into one PMD.

    Signed-off-by: Naoya Horiguchi
    Cc: Jun'ichi Nomura
    Acked-by: KAMEZAWA Hiroyuki
    Cc: Matt Mackall
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Naoya Horiguchi
     

18 Nov, 2010

1 commit


17 Nov, 2010

1 commit

  • It allows users to see what consoles are currently known to the system
    and with what flags.

    It is based on Werner's patch, the part about traversing fds was
    removed, the code was moved to kernel/printk.c, where consoles are
    handled and it makes more sense to me.

    Signed-off-by: Jiri Slaby [cleanups]
    Signed-off-by: "Dr. Werner Fink"
    Cc: Al Viro
    Cc: Greg Kroah-Hartman
    Signed-off-by: Greg Kroah-Hartman

    Jiri Slaby
     

29 Oct, 2010

2 commits


28 Oct, 2010

4 commits

  • In /proc/stat, the number of per-IRQ event is shown by making a sum each
    irq's events on all cpus. But we can make use of kstat_irqs().

    kstat_irqs() do the same calculation, If !CONFIG_GENERIC_HARDIRQ,
    it's not a big cost. (Both of the number of cpus and irqs are small.)

    If a system is very big and CONFIG_GENERIC_HARDIRQ, it does

    for_each_irq()
    for_each_cpu()
    - look up a radix tree
    - read desc->irq_stat[cpu]
    This seems not efficient. This patch adds kstat_irqs() for
    CONFIG_GENRIC_HARDIRQ and change the calculation as

    for_each_irq()
    look up radix tree
    for_each_cpu()
    - read desc->irq_stat[cpu]

    This reduces cost.

    A test on (4096cpusp, 256 nodes, 4592 irqs) host (by Jack Steiner)

    %time cat /proc/stat > /dev/null

    Before Patch: 2.459 sec
    After Patch : .561 sec

    [akpm@linux-foundation.org: unexport kstat_irqs, coding-style tweaks]
    [akpm@linux-foundation.org: fix unused variable 'per_irq_sum']
    Signed-off-by: KAMEZAWA Hiroyuki
    Tested-by: Jack Steiner
    Acked-by: Jack Steiner
    Cc: Yinghai Lu
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     
  • /proc/stat shows the total number of all interrupts to each cpu. But when
    the number of IRQs are very large, it take very long time and 'cat
    /proc/stat' takes more than 10 secs. This is because sum of all irq
    events are counted when /proc/stat is read. This patch adds "sum of all
    irq" counter percpu and reduce read costs.

    The cost of reading /proc/stat is important because it's used by major
    applications as 'top', 'ps', 'w', etc....

    A test on a mechin (4096cpu, 256 nodes, 4592 irqs) shows

    %time cat /proc/stat > /dev/null
    Before Patch: 12.627 sec
    After Patch: 2.459 sec

    Signed-off-by: KAMEZAWA Hiroyuki
    Tested-by: Jack Steiner
    Acked-by: Jack Steiner
    Cc: Yinghai Lu
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     
  • The length of the BLOCK_IPOLL string is making i's value be printed too
    far to the right. This patch fixes this and makes the output a bit
    neater.

    Currently:
    CPU0
    HI: 0
    TIMER: 599792
    NET_TX: 2
    NET_RX: 6
    BLOCK: 80807
    BLOCK_IOPOLL: 0
    TASKLET: 20012
    SCHED: 0
    HRTIMER: 63
    RCU: 619279

    With patch:
    CPU0
    HI: 0
    TIMER: 585582
    NET_TX: 2
    NET_RX: 6
    BLOCK: 80320
    BLOCK_IOPOLL: 0
    TASKLET: 19287
    SCHED: 0
    HRTIMER: 62
    RCU: 604441

    Signed-off-by: Davidlohr Bueso
    Acked-by: Keika Kobayashi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     
  • Export the number of anonymous pages in a mapping via smaps.

    Even the private pages in a mapping backed by a file, would be marked as
    anonymous, when they are modified. Export this information to user-space via
    smaps.

    Exporting this count will help gdb to make a better decision on which
    areas need to be dumped in its coredump; and should be useful to others
    studying the memory usage of a process.

    Signed-off-by: Nikanth Karthikesan
    Acked-by: Hugh Dickins
    Reviewed-by: KOSAKI Motohiro
    Cc: Matt Mackall
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nikanth Karthikesan