28 Oct, 2010

40 commits

  • This takes care of leaking uninitialized kernel stack memory to
    userspace from non-zeroed fields in structs in compat ipc functions.

    Signed-off-by: Dan Rosenberg
    Cc: Manfred Spraul
    Cc: Arnd Bergmann
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dan Rosenberg
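
The class of bug described here can be illustrated with a small user-space sketch. The structure and function names below are hypothetical and are not taken from the patch; the point is only that zeroing the whole structure before setting individual fields leaves no stale stack bytes in padding or unassigned members.

```c
#include <string.h>

/* Hypothetical reply structure, for illustration only. The compiler
 * typically inserts padding after 'mode', and 'unused' is never assigned. */
struct demo_ipc_reply {
    unsigned short mode;
    unsigned long  size;
    char           unused[8];
};

/* Fill 'out' so that every byte is defined before it is handed to a
 * consumer (in the kernel, that consumer would be a copy to userspace). */
void demo_fill_reply(struct demo_ipc_reply *out,
                     unsigned short mode, unsigned long size)
{
    memset(out, 0, sizeof(*out));   /* zero first, then set known fields */
    out->mode = mode;
    out->size = size;
}

/* Return 1 if every byte outside 'mode' and 'size' is zero. */
int demo_reply_is_clean(const struct demo_ipc_reply *r)
{
    struct demo_ipc_reply expect;

    memset(&expect, 0, sizeof(expect));
    expect.mode = r->mode;
    expect.size = r->size;
    return memcmp(&expect, r, sizeof(expect)) == 0;
}
```

Without the memset() call, the padding and the 'unused' field would carry whatever happened to be on the stack.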
     
  • The kernel currently provides no functionality to analyze the RSS and swap
    space usage of each individual sysvipc shared memory segment.

    This patch adds this info for each existing shm segment by extending the
    output of /proc/sysvipc/shm by two columns for RSS and swap.

    Since shmctl(SHM_INFO) already provides a similar calculation (it
    currently sums up all RSS/swap info for all segments), I split out a
    static function which is now used by both the /proc/sysvipc/shm output
    and shmctl(SHM_INFO).

    SAP products (esp. the SAP Netweaver ABAP Kernel) use lots of big shared
    memory segments (we often have Linux systems with >= 16GB shm usage).
    Sometimes we get customer reports about "slow" system responses and while
    looking into their configurations we often find massive swapping activity
    on the system. With this patch it's now easy to see from the command line
    whether and which shm segments get swapped out (and how much), so we can
    more easily give recommendations for system tuning. Without the patch it's
    currently not possible to do such shm analysis at all.

    Also...

    Add some spaces in front of the "size" field for 64bit kernels to get the
    columns correct if you cat the contents of the file. In
    sysvipc_shm_proc_show() the kernel prints the size value in "SPEC_SIZE"
    format, which is defined like this:

    #if BITS_PER_LONG
    Cc: Manfred Spraul
    Acked-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Helge Deller
     
  • Presently do_execve() turns PF_KTHREAD off before search_binary_handler().
    This has a theoretical risk of PF_KTHREAD getting lost. We don't have to
    turn PF_KTHREAD off in the ENOEXEC case.

    This patch moves this flag modification to after the finding of the
    executable file.

    This is only a theoretical issue because kthreads do not call do_execve()
    directly. But fixing it would be better.

    Signed-off-by: KOSAKI Motohiro
    Acked-by: Roland McGrath
    Acked-by: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KOSAKI Motohiro
     
  • In /proc/stat, the number of per-IRQ events is shown by summing each
    irq's events over all cpus. But we can make use of kstat_irqs().

    kstat_irqs() does the same calculation. If !CONFIG_GENERIC_HARDIRQ,
    it's not a big cost. (Both the number of cpus and the number of irqs
    are small.)

    If a system is very big and CONFIG_GENERIC_HARDIRQ is set, it does

    for_each_irq()
      for_each_cpu()
        - look up a radix tree
        - read desc->irq_stat[cpu]

    This is not efficient. This patch adds kstat_irqs() for
    CONFIG_GENERIC_HARDIRQ and changes the calculation to

    for_each_irq()
      look up radix tree
      for_each_cpu()
        - read desc->irq_stat[cpu]

    This reduces cost.

    A test on a (4096 cpus, 256 nodes, 4592 irqs) host (by Jack Steiner):

    %time cat /proc/stat > /dev/null

    Before Patch: 2.459 sec
    After Patch : .561 sec

    [akpm@linux-foundation.org: unexport kstat_irqs, coding-style tweaks]
    [akpm@linux-foundation.org: fix unused variable 'per_irq_sum']
    Signed-off-by: KAMEZAWA Hiroyuki
    Tested-by: Jack Steiner
    Acked-by: Jack Steiner
    Cc: Yinghai Lu
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
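
The loop restructuring described in this entry can be sketched in user space as follows. Everything here is a stand-in: the array replaces the radix tree, and all `demo_` names are hypothetical. The lookup counter only makes the difference between the two loop shapes visible.

```c
#include <stddef.h>

#define DEMO_NR_IRQS 4
#define DEMO_NR_CPUS 3

/* Hypothetical descriptor holding per-cpu event counts for one irq. */
struct demo_irq_desc {
    unsigned int irq_stat[DEMO_NR_CPUS];
};

static struct demo_irq_desc demo_descs[DEMO_NR_IRQS];
static unsigned long demo_lookups;   /* how many lookups were performed */

/* Stand-in for the radix-tree lookup; counts how often it is called. */
static struct demo_irq_desc *demo_irq_to_desc(unsigned int irq)
{
    demo_lookups++;
    return irq < DEMO_NR_IRQS ? &demo_descs[irq] : NULL;
}

/* Old shape: one lookup per (irq, cpu) pair. */
unsigned long demo_sum_lookup_per_cpu(unsigned int irq)
{
    unsigned long sum = 0;
    unsigned int cpu;

    for (cpu = 0; cpu < DEMO_NR_CPUS; cpu++) {
        struct demo_irq_desc *desc = demo_irq_to_desc(irq);

        if (desc)
            sum += desc->irq_stat[cpu];
    }
    return sum;
}

/* New shape: look the descriptor up once, then walk the cpus. */
unsigned long demo_kstat_irqs(unsigned int irq)
{
    struct demo_irq_desc *desc = demo_irq_to_desc(irq);
    unsigned long sum = 0;
    unsigned int cpu;

    if (!desc)
        return 0;
    for (cpu = 0; cpu < DEMO_NR_CPUS; cpu++)
        sum += desc->irq_stat[cpu];
    return sum;
}
```

Both functions return the same sum; the second performs one lookup per irq instead of one per cpu.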
     
  • /proc/stat shows the total number of all interrupts to each cpu. But when
    the number of IRQs is very large, this takes a very long time and 'cat
    /proc/stat' takes more than 10 secs. This is because the sum of all irq
    events is computed when /proc/stat is read. This patch adds a per-cpu
    "sum of all irqs" counter and reduces read costs.

    The cost of reading /proc/stat is important because it's used by major
    applications such as 'top', 'ps', 'w', etc.

    A test on a machine (4096 cpus, 256 nodes, 4592 irqs) shows

    %time cat /proc/stat > /dev/null
    Before Patch: 12.627 sec
    After Patch: 2.459 sec

    Signed-off-by: KAMEZAWA Hiroyuki
    Tested-by: Jack Steiner
    Acked-by: Jack Steiner
    Cc: Yinghai Lu
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
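
The idea of keeping a running per-cpu total, rather than summing every irq at read time, might look like the sketch below. The structure layout and all `demo_` names are assumptions made for illustration, not the kernel's definitions.

```c
#define DEMO_SUM_NR_IRQS 4
#define DEMO_SUM_NR_CPUS 2

/* Hypothetical per-cpu statistics: per-irq counts plus a running total. */
struct demo_kstat {
    unsigned int  irqs[DEMO_SUM_NR_IRQS];
    unsigned long irqs_sum;
};

static struct demo_kstat demo_kstat_cpu[DEMO_SUM_NR_CPUS];

/* Account one event: the total is updated at the same time as the count. */
void demo_account_irq(unsigned int cpu, unsigned int irq)
{
    demo_kstat_cpu[cpu].irqs[irq]++;
    demo_kstat_cpu[cpu].irqs_sum++;
}

/* Read side after the change: cost is proportional to the cpu count only. */
unsigned long demo_total_irqs(void)
{
    unsigned long total = 0;
    unsigned int cpu;

    for (cpu = 0; cpu < DEMO_SUM_NR_CPUS; cpu++)
        total += demo_kstat_cpu[cpu].irqs_sum;
    return total;
}

/* Read side before the change: cost is proportional to cpus times irqs. */
unsigned long demo_total_irqs_slow(void)
{
    unsigned long total = 0;
    unsigned int cpu, irq;

    for (cpu = 0; cpu < DEMO_SUM_NR_CPUS; cpu++)
        for (irq = 0; irq < DEMO_SUM_NR_IRQS; irq++)
            total += demo_kstat_cpu[cpu].irqs[irq];
    return total;
}
```

The trade-off is one extra increment per interrupt in exchange for a much cheaper read.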
     
  • The length of the BLOCK_IOPOLL string is making its value be printed too
    far to the right. This patch fixes this and makes the output a bit
    neater.

    Currently:
    CPU0
    HI: 0
    TIMER: 599792
    NET_TX: 2
    NET_RX: 6
    BLOCK: 80807
    BLOCK_IOPOLL: 0
    TASKLET: 20012
    SCHED: 0
    HRTIMER: 63
    RCU: 619279

    With patch:
    CPU0
    HI: 0
    TIMER: 585582
    NET_TX: 2
    NET_RX: 6
    BLOCK: 80320
    BLOCK_IOPOLL: 0
    TASKLET: 19287
    SCHED: 0
    HRTIMER: 62
    RCU: 604441

    Signed-off-by: Davidlohr Bueso
    Acked-by: Keika Kobayashi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
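
The column alignment has been lost in the listing above, so here is a small sketch of the underlying issue. It assumes the name column is printed right-aligned with a fixed field width; the widths 8 and 12 and the `demo_` helpers are illustrative choices, not values quoted from the patch.

```c
#include <stdio.h>
#include <string.h>

/* Format one "NAME: count" row; 'width' is the width of the right-aligned
 * name column. Returns what snprintf() returns. */
int demo_format_row(char *buf, size_t len, int width,
                    const char *name, unsigned int count)
{
    return snprintf(buf, len, "%*s: %10u", width, name, count);
}

/* Return 1 if rows for names 'a' and 'b' have the same length, that is,
 * if their count columns line up. */
int demo_rows_align(int width, const char *a, const char *b)
{
    char ra[64], rb[64];

    demo_format_row(ra, sizeof(ra), width, a, 0);
    demo_format_row(rb, sizeof(rb), width, b, 0);
    return strlen(ra) == strlen(rb);
}
```

A name longer than the field width (BLOCK_IOPOLL has 12 characters) overflows the column and pushes its value to the right; a width of at least 12 keeps all rows aligned.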
     
  • Document /proc/pid/pagemap in Documentation/filesystems/proc.txt

    Signed-off-by: Nikanth Karthikesan
    Cc: Richard Guenther
    Cc: Balbir Singh
    Cc: KOSAKI Motohiro
    Acked-by: Matt Mackall
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nikanth Karthikesan
     
  • Export the number of anonymous pages in a mapping via smaps.

    Even the private pages in a file-backed mapping are marked as anonymous
    once they are modified. Export this information to user space via
    smaps.

    Exporting this count will help gdb to make a better decision on which
    areas need to be dumped in its coredump; and should be useful to others
    studying the memory usage of a process.

    Signed-off-by: Nikanth Karthikesan
    Acked-by: Hugh Dickins
    Reviewed-by: KOSAKI Motohiro
    Cc: Matt Mackall
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nikanth Karthikesan
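
A consumer of the new field could sum it over all mappings roughly as sketched below. This assumes the field appears in smaps as a line of the form "Anonymous: <n> kB"; the helper names are hypothetical.

```c
#include <stdio.h>

/* Return 1 and store the value in *kb if 'line' is an "Anonymous:" line
 * from smaps; return 0 for any other line. */
int demo_parse_anonymous_kb(const char *line, unsigned long *kb)
{
    return sscanf(line, "Anonymous: %lu kB", kb) == 1;
}

/* Sum the Anonymous values of all mappings read from 'fp', in kB. */
unsigned long demo_sum_anonymous_kb(FILE *fp)
{
    char line[256];
    unsigned long kb, total = 0;

    while (fgets(line, sizeof(line), fp))
        if (demo_parse_anonymous_kb(line, &kb))
            total += kb;
    return total;
}
```

A caller would open /proc/PID/smaps with fopen() and pass the stream to demo_sum_anonymous_kb().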
     
  • find_new_reaper() releases and regrabs tasklist_lock but was missing the
    proper annotation. Add it. This removes the following sparse warning:

    warning: context imbalance in 'find_new_reaper' - unexpected unlock

    Signed-off-by: Namhyung Kim
    Acked-by: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Namhyung Kim
     
  • The userland ELF tools have been coping with partial-segments core files
    for a few years now. Multiple distro builds are now setting this option.
    It behooves everyone who ever deals with core files to have more info
    dumped in there, especially as more and more people's compilers are
    producing build IDs. Make it the default.

    Anyone using older tools confused by these core files can configure this
    option off, or just change /proc/PID/coredump_filter after boot.

    Signed-off-by: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Roland McGrath
     
  • We hit a parameter truncation issue. Consider the following:
    > echo "|/root/core_pattern_pipe_test %p /usr/libexec/blah-blah-blah \
    %s %c %p %u %g 11 12345678901234567890123456789012345678 %t" > \
    /proc/sys/kernel/core_pattern

    This is okay because the string is shorter than CORENAME_MAX_SIZE. "cat
    /proc/sys/kernel/core_pattern" shows the whole string. But after we ran
    the core_pattern_pipe_test program from the man page, we found the last
    parameter was truncated, as below:

    argc[10]=

    The root cause is that core_pattern allows % specifiers, which need to be
    replaced at parse time, but the replacement may expand the string beyond
    CORENAME_MAX_SIZE. So if the last parameter is a % specifier, the
    replacement code, which uses snprintf(out_ptr, out_end - out_ptr, ...),
    will write outside the corename array.

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Xiaotian Feng
    Cc: Alexander Viro
    Cc: Oleg Nesterov
    Cc: KOSAKI Motohiro
    Reviewed-by: Neil Horman
    Cc: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Xiaotian Feng
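
The hazard comes from snprintf() returning the length the output would have had, which can exceed the space that was available. If the caller advances its write pointer by that return value, the pointer can move past the end of the buffer. The sketch below shows one defensive way to append; it is not the kernel's actual change, and `demo_append` is a hypothetical helper.

```c
#include <stdarg.h>
#include <stdio.h>

/* Append formatted text at offset *pos inside buf[size].
 * Returns 0 on success and advances *pos. Returns -1 if the text would not
 * fit; in that case *pos is unchanged and the partial output is discarded. */
int demo_append(char *buf, size_t size, size_t *pos, const char *fmt, ...)
{
    va_list ap;
    size_t room;
    int rc;

    if (*pos >= size)
        return -1;
    room = size - *pos;

    va_start(ap, fmt);
    rc = vsnprintf(buf + *pos, room, fmt, ap);
    va_end(ap);

    /* rc >= room means the output was truncated. */
    if (rc < 0 || (size_t)rc >= room) {
        buf[*pos] = '\0';
        return -1;
    }
    *pos += (size_t)rc;
    return 0;
}
```

Checking the return value against the remaining room before advancing keeps the offset inside the buffer even when an expanded specifier is longer than expected.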
     
  • Oleg Nesterov pointed out that we have to prevent
    multiple-threads-inside-exec itself and that we can reuse
    ->cred_guard_mutex for it. Indeed, concurrent execve() serves no purpose.

    Let's move ->cred_guard_mutex from task_struct to signal_struct. This
    naturally prevents multiple-threads-inside-exec.

    Signed-off-by: KOSAKI Motohiro
    Reviewed-by: Oleg Nesterov
    Acked-by: Roland McGrath
    Acked-by: David Howells
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KOSAKI Motohiro
     
  • ptrace_stop() releases and regrabs current->sighand->siglock but was
    missing proper annotation. Add it.

    Signed-off-by: Namhyung Kim
    Acked-by: Roland McGrath
    Cc: Ingo Molnar
    Cc: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Namhyung Kim
     
  • lock_task_sighand() grabs sighand->siglock only when it returns non-NULL,
    but unlock_task_sighand() releases it unconditionally. This leads sparse
    to complain about the lock context imbalance. Rename lock_task_sighand()
    and wrap it using the __cond_lock() macro to make sparse happy.

    Suggested-by: Eric Dumazet
    Signed-off-by: Namhyung Kim
    Cc: Ingo Molnar
    Cc: Oleg Nesterov
    Cc: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Namhyung Kim
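
The pattern can be sketched in user space as below. The `demo_` macros are modelled on the kernel's sparse annotations but are written here from memory as an assumption; the lock and task types are hypothetical. Outside of sparse (when __CHECKER__ is not defined) the macro reduces to its condition, so the code behaves like an ordinary conditional lock.

```c
#include <stddef.h>

#ifdef __CHECKER__
/* Seen by sparse only: record that the lock is held when 'c' is true. */
# define demo_acquire(x)      __context__(x, 1)
# define demo_cond_lock(x, c) ((c) ? ({ demo_acquire(x); 1; }) : 0)
#else
# define demo_cond_lock(x, c) (c)
#endif

struct demo_lock { int held; };
struct demo_task { struct demo_lock *lock; };   /* NULL once torn down */

/* Renamed worker: takes the lock only if the task still has one. */
static struct demo_lock *demo__lock_task(struct demo_task *t)
{
    if (!t->lock)
        return NULL;
    t->lock->held = 1;
    return t->lock;
}

/* Annotated wrapper: the lock counts as acquired only on non-NULL return. */
static struct demo_lock *demo_lock_task(struct demo_task *t)
{
    struct demo_lock *l;

    (void)demo_cond_lock(t->lock, (l = demo__lock_task(t)) != NULL);
    return l;
}

/* Unconditional release, called only after a successful lock. */
static void demo_unlock_task(struct demo_lock *l)
{
    l->held = 0;
}
```

Callers follow the usual shape: take the lock, proceed only if the result is non-NULL, then unlock.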
     
  • Use new 'datap' variable in order to remove unnecessary castings.

    Signed-off-by: Namhyung Kim
    Cc: Chris Zankel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Namhyung Kim
     
  • Remove unnecessary castings using void pointer and fix copy_to_user()
    return value. Also add missing __user markup on the argument of
    arch_ptrctl().

    Signed-off-by: Namhyung Kim
    Cc: Jeff Dike
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Namhyung Kim
     
  • Remove checking @addr less than 0 because @addr is now unsigned.

    Signed-off-by: Namhyung Kim
    Acked-by: Chris Metcalf
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Namhyung Kim
     
  • Factor out struct fps and remove redundant castings.

    Signed-off-by: Namhyung Kim
    Acked-by: David S. Miller
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Namhyung Kim
     
  • Remove unnecessary castings and get rid of dummy pointer in favor of
    offsetof() macro in ptrace_32.c. Also use temporary variables and
    break long lines in order to improve readability.

    Signed-off-by: Namhyung Kim
    Cc: Paul Mundt
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Namhyung Kim
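
As a rough illustration of using offsetof() for this kind of address decoding, the sketch below maps a byte offset into a hypothetical user-area structure onto the field it names. The structure layout and the `demo_` names are invented for the example and do not reflect ptrace_32.c.

```c
#include <stddef.h>

/* Hypothetical user-area layout, for illustration only. */
struct demo_user {
    long   regs[4];
    long   fpvalid;
    double fpregs[2];
};

/* Read the long stored at byte offset 'addr' within *u.
 * Returns 0 on success, or -1 if 'addr' is unaligned or does not name a
 * readable long field. */
int demo_peek_user(const struct demo_user *u, unsigned long addr, long *val)
{
    if (addr & (sizeof(long) - 1))
        return -1;

    /* offsetof() gives the field position without a dummy pointer. */
    if (addr < offsetof(struct demo_user, fpvalid)) {
        *val = u->regs[addr / sizeof(long)];
        return 0;
    }
    if (addr == offsetof(struct demo_user, fpvalid)) {
        *val = u->fpvalid;
        return 0;
    }
    return -1;
}
```

Compared with taking the address of a member through a dummy pointer and casting it, offsetof() yields the same number as an integer constant and needs no casts.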
     
  • Remove unnecessary castings.

    Signed-off-by: Namhyung Kim
    Cc: Chen Liqin
    Cc: Lennox Wu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Namhyung Kim
     
  • Use new 'datavp' and 'datalp' variables in order to remove unnecessary
    castings.

    Signed-off-by: Namhyung Kim
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Namhyung Kim
     
  • Add missing __user markup on the argument of put_user().

    Signed-off-by: Namhyung Kim
    Cc: Kyle McMartin
    Cc: Helge Deller
    Cc: "James E.J. Bottomley"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Namhyung Kim
     
  • Use new 'datap' variable in order to remove unnecessary castings.
    Also remove checking @addr less than 0 because @addr is now unsigned.

    Signed-off-by: Namhyung Kim
    Cc: David Howells
    Cc: Koichi Yasutake
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Namhyung Kim
     
  • Use new 'addrp', 'datavp' and 'datalp' variables in order to remove
    unnecessary castings.

    Signed-off-by: Namhyung Kim
    Cc: Ralf Baechle
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Namhyung Kim
     
  • Remove checking @addr less than 0 because @addr is now unsigned.

    Signed-off-by: Namhyung Kim
    Cc: Michal Simek
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Namhyung Kim
     
  • Use new 'regno', 'datap' variables in order to remove duplicated
    expressions and unnecessary castings. Also remove checking @addr less
    than 0 because @addr is now unsigned.

    Signed-off-by: Namhyung Kim
    Acked-by: Greg Ungerer
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Namhyung Kim
     
  • Use new 'regno', 'datap' variables in order to remove duplicated
    expressions and unnecessary castings.

    Signed-off-by: Namhyung Kim
    Acked-by: Geert Uytterhoeven
    Cc: Roman Zippel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Namhyung Kim
     
  • Use new 'datap' variable in order to remove duplicated castings.

    Signed-off-by: Namhyung Kim
    Cc: Hirokazu Takata
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Namhyung Kim
     
  • Use new 'regno', 'datap' variables in order to remove duplicated
    expressions and unnecessary castings. Also remove checking @addr
    less than 0 because @addr is now unsigned.

    Signed-off-by: Namhyung Kim
    Cc: Yoshinori Sato
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Namhyung Kim
     
  • Use new 'regno', 'datap' variables in order to remove duplicated
    expressions and unnecessary castings. Also remove checking @addr
    less than 0 because @addr is now unsigned.

    Signed-off-by: Namhyung Kim
    Cc: David Howells
    Cc: "Daniel K."
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Namhyung Kim
     
  • Use new 'regno' variable in order to remove redundant expression and
    remove checking @addr less than 0 because @addr is now unsigned. Also
    update 'datap' on PTRACE_GET/SETREGS to fix a bug on arch-v10.

    Signed-off-by: Namhyung Kim
    Acked-by: Mikael Starvik
    Cc: Jesper Nilsson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Namhyung Kim
     
  • Change signature of get/put_reg() according to the change of arch_ptrace()
    and remove unnecessary castings.

    Signed-off-by: Namhyung Kim
    Acked-by: Mike Frysinger
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Namhyung Kim
     
  • Use new 'datap' variable of void pointer type in order to remove
    unnecessary castings.

    Signed-off-by: Namhyung Kim
    Acked-by: Haavard Skinnemoen
    Cc: Hans-Christian Egtvedt
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Namhyung Kim
     
  • Use new 'datap' variable in order to remove unnecessary castings.

    Signed-off-by: Namhyung Kim
    Cc: Russell King
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Namhyung Kim
     
  • Remove checking @addr less than 0 because @addr is now unsigned and
    use new udescp variable in order to remove unnecessary castings.

    [akpm@linux-foundation.org: fix unused variable 'udescp']
    Signed-off-by: Namhyung Kim
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: "H. Peter Anvin"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Namhyung Kim
     
  • Fix up the arguments to arch_ptrace() to take account of the fact that
    @addr and @data are now unsigned long rather than long as of a preceding
    patch in this series.

    Signed-off-by: Namhyung Kim
    Cc:
    Acked-by: Roland McGrath
    Acked-by: David Howells
    Acked-by: Geert Uytterhoeven
    Acked-by: David S. Miller
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Namhyung Kim
     
  • Use new 'datavp' and 'datalp' variables to remove unnecessary castings.

    Signed-off-by: Namhyung Kim
    Acked-by: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Namhyung Kim
     
  • Since the userspace API of the ptrace syscall defines @addr and @data as
    void pointers, it is more appropriate to define them as unsigned long in
    the kernel. The related functions are changed accordingly.

    'unsigned long' is typically used elsewhere in the kernel as an opaque
    data type, and using it here helps clean up a lot of warnings from
    sparse.

    Suggested-by: Arnd Bergmann
    Signed-off-by: Namhyung Kim
    Acked-by: Arnd Bergmann
    Acked-by: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Namhyung Kim
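
Several of the per-architecture cleanups in this series drop a "less than 0" test on @addr. The sketch below shows why that test becomes redundant once the parameter is unsigned: a negative input wraps to a very large value and is rejected by the upper-bound check. The area size and the function names are hypothetical.

```c
#define DEMO_USER_AREA_SIZE 64UL

/* With a signed 'addr', both bounds have to be checked explicitly. */
int demo_addr_ok_signed(long addr)
{
    if (addr < 0 || addr >= (long)DEMO_USER_AREA_SIZE)
        return 0;
    return (addr & 3) == 0;     /* require 4-byte alignment */
}

/* With an unsigned 'addr', the upper bound alone covers "negative" input,
 * because such a value wraps to something far above the area size. */
int demo_addr_ok_unsigned(unsigned long addr)
{
    if (addr >= DEMO_USER_AREA_SIZE)
        return 0;
    return (addr & 3) == 0;     /* require 4-byte alignment */
}
```

Both versions accept and reject the same inputs; the unsigned one simply needs one comparison fewer.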
     
  • exit_ptrace() releases and regrabs tasklist_lock but was missing proper
    annotation. Add it.

    Signed-off-by: Namhyung Kim
    Acked-by: Roland McGrath
    Cc: Ingo Molnar
    Cc: Oleg Nesterov
    Signed-off-by: Linus Torvalds

    Namhyung Kim
     
  • This patch extracts the core logic from mem_cgroup_update_file_mapped() as
    mem_cgroup_update_file_stat() and adds a wrapper.

    As a planned future update, the memory cgroup has to count dirty pages to
    implement dirty_ratio/limit. In addition, the number of dirty pages is
    required to kick the flusher thread to start writeback. (Currently there
    is no kick.)

    This patch is preparation for that and makes the implementation of other
    statistics clearer. It is just a cleanup.

    Signed-off-by: KAMEZAWA Hiroyuki
    Acked-by: Balbir Singh
    Reviewed-by: Greg Thelen
    Cc: Daisuke Nishimura
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
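
The refactoring shape described here, a generic updater keyed by a statistic index plus a thin wrapper for the existing caller, might look like the following sketch. All types and names are hypothetical and much simpler than the memory cgroup code; the dirty-page index only illustrates where a future counter could plug in.

```c
/* Hypothetical statistic indices; DEMO_STAT_FILE_DIRTY stands for a
 * counter that a later change could add. */
enum demo_stat_idx {
    DEMO_STAT_FILE_MAPPED,
    DEMO_STAT_FILE_DIRTY,
    DEMO_STAT_NR
};

struct demo_memcg {
    long stat[DEMO_STAT_NR];
};

/* Core logic, shared by every file statistic. */
static void demo_update_file_stat(struct demo_memcg *memcg,
                                  enum demo_stat_idx idx, int val)
{
    memcg->stat[idx] += val;
}

/* Wrapper that keeps the existing entry point for file-mapped pages. */
void demo_update_file_mapped(struct demo_memcg *memcg, int val)
{
    demo_update_file_stat(memcg, DEMO_STAT_FILE_MAPPED, val);
}
```

Adding another statistic then needs only a new index and, if wanted, another one-line wrapper.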