24 Sep, 2009

40 commits

  • The following series adds a "cgroup.procs" file to each cgroup that
    reports unique tgids rather than pids, and allows all threads in a
    threadgroup to be atomically moved to a new cgroup.
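
    As a rough illustration of what "unique tgids rather than pids" means,
    the userspace sketch below collapses a per-thread list into a sorted list
    of distinct thread-group ids. The struct and function names are
    hypothetical and this is not the code from the series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for one thread: its own pid and its thread-group id. */
struct thread_ids {
    int pid;
    int tgid;
};

static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/*
 * Fill out[] with the sorted, de-duplicated tgids of the given threads.
 * out must have room for n entries. Returns the number of unique tgids.
 */
size_t unique_tgids(const struct thread_ids *threads, size_t n, int *out)
{
    size_t i, count = 0;

    assert(threads != NULL && out != NULL);
    for (i = 0; i < n; i++)
        out[i] = threads[i].tgid;
    qsort(out, n, sizeof(out[0]), cmp_int);

    /* Keep the first entry of each run of equal values. */
    for (i = 0; i < n; i++)
        if (i == 0 || out[i] != out[i - 1])
            out[count++] = out[i];
    return count;
}
```

    Five threads spread over two thread groups would thus be reported as two
    entries, one per group.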

    The subsystem "attach" interface is modified to support attaching whole
    threadgroups at a time, which could introduce potential problems if any
    subsystem were to need to access the old cgroup of every thread being
    moved. The attach interface may need to be revised if this becomes the
    case.

    Also added is functionality for read/write locking all CLONE_THREAD
    fork()ing within a threadgroup, by means of an rwsem that lives in the
    sighand_struct, for per-threadgroup-ness and also for sharing a cacheline
    with the sighand's atomic count. This scheme should introduce no extra
    overhead in the fork path when there's no contention.

    The final patch reveals potential for a race when forking before a
    subsystem's attach function is called - one potential solution in case any
    subsystem has this problem is to hang on to the group's fork mutex through
    the attach() calls, though no subsystem yet demonstrates need for an
    extended critical section.

    This patch:

    Revert

    commit 096b7fe012d66ed55e98bc8022405ede0cc80e96
    Author: Li Zefan
    AuthorDate: Wed Jul 29 15:04:04 2009 -0700
    Commit: Linus Torvalds
    CommitDate: Wed Jul 29 19:10:35 2009 -0700

    cgroups: fix pid namespace bug

    This is in preparation for some clashing cgroups changes that subsume the
    original commit's functionality.

    The original commit fixed a pid namespace bug which Ben Blum fixed
    independently (in the same way, but with different code) as part of a
    series of patches. I played around with trying to reconcile Ben's patch
    series with Li's patch, but concluded that it was simpler to just revert
    Li's, given that Ben's patch series contained essentially the same fix.

    Signed-off-by: Paul Menage
    Cc: Li Zefan
    Cc: Matt Helsley
    Cc: "Eric W. Biederman"
    Cc: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Menage
     
  • This patch removes the restriction that a cgroup hierarchy must have at
    least one bound subsystem. The mount option "none" is treated as an
    explicit request for no bound subsystems.

    A hierarchy with no subsystems can be useful for plain task tracking, and
    is also a step towards the support for multiply-bindable subsystems.

    As part of this change, the hierarchy id is no longer calculated from the
    bitmask of subsystems in the hierarchy (since this is not guaranteed to be
    unique) but is allocated via an ida. Reference counts on cgroups from
    css_set objects are now taken explicitly one per hierarchy, rather than
    one per subsystem.
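
    To see why a subsystem bitmask cannot serve as an id once empty
    hierarchies are allowed, consider two hierarchies that both bind no
    subsystems: both have bitmask 0. The sketch below hands out the lowest
    free id instead. It is a simplified userspace stand-in, not the kernel's
    ida API, and all names in it are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_HIERARCHIES 32   /* arbitrary bound for this sketch */

static bool id_in_use[MAX_HIERARCHIES];

/* Return the lowest unused id, or -1 if none is left. */
int hierarchy_id_alloc(void)
{
    for (int id = 0; id < MAX_HIERARCHIES; id++) {
        if (!id_in_use[id]) {
            id_in_use[id] = true;
            return id;
        }
    }
    return -1;
}

/* Release an id so that a later allocation can reuse it. */
void hierarchy_id_free(int id)
{
    assert(id >= 0 && id < MAX_HIERARCHIES && id_in_use[id]);
    id_in_use[id] = false;
}
```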

    Example usage:

    mount -t cgroup -o none,name=foo cgroup /mnt/cgroup

    Based on the "no-op"/"none" subsystem concept proposed by
    kamezawa.hiroyu@jp.fujitsu.com

    Signed-off-by: Paul Menage
    Reviewed-by: Li Zefan
    Cc: KAMEZAWA Hiroyuki
    Cc: Balbir Singh
    Cc: Dhaval Giani
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Menage
     
  • Currently the cgroups code makes the assumption that the subsystem
    pointers in a struct css_set uniquely identify the hierarchy->cgroup
    mappings associated with the css_set; and there's no way to directly
    identify the associated set of cgroups other than by indirecting through
    the appropriate subsystem state pointers.

    This patch removes the need for that assumption by adding a back-pointer
    from struct cg_cgroup_link object to its associated cgroup; this allows
    the set of cgroups to be determined by traversing the cg_links list in
    the struct css_set.
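
    A minimal sketch of the traversal this enables is shown below. It uses a
    plain singly linked list rather than the kernel's list helpers, and the
    field names (cgrp, next) and the helper css_set_cgroups() are assumptions
    made for illustration.

```c
#include <assert.h>
#include <stddef.h>

struct cgroup {
    const char *name;
};

/* One link per (css_set, cgroup) pair; cgrp is the back-pointer. */
struct cg_cgroup_link {
    struct cgroup *cgrp;
    struct cg_cgroup_link *next;   /* next link on the css_set's list */
};

struct css_set {
    struct cg_cgroup_link *cg_links;
};

/*
 * Collect up to max cgroups of a css_set by walking its links,
 * without going through any subsystem state pointers.
 * Returns the number of cgroups stored in out[].
 */
size_t css_set_cgroups(const struct css_set *cg, struct cgroup **out, size_t max)
{
    size_t n = 0;

    assert(cg != NULL && out != NULL);
    for (const struct cg_cgroup_link *l = cg->cg_links; l && n < max; l = l->next)
        out[n++] = l->cgrp;
    return n;
}
```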

    Signed-off-by: Paul Menage
    Reviewed-by: Li Zefan
    Cc: KAMEZAWA Hiroyuki
    Cc: Balbir Singh
    Cc: Dhaval Giani
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Menage
     
  • While it's architecturally clean to have the cgroup debug subsystem be
    completely independent of the cgroups framework, it limits its usefulness
    for debugging the contents of internal data structures. Move the debug
    subsystem code into the scope of all the cgroups data structures to make
    more detailed debugging possible.

    Signed-off-by: Paul Menage
    Reviewed-by: Li Zefan
    Reviewed-by: KAMEZAWA Hiroyuki
    Cc: Balbir Singh
    Cc: Dhaval Giani
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Menage
     
  • To simplify referring to cgroup hierarchies in mount statements, and to
    allow disambiguation in the presence of empty hierarchies and
    multiply-bindable subsystems, this patch adds support for naming a new
    cgroup hierarchy via the "name=" mount option.

    A pre-existing hierarchy may be specified by either name or by subsystems;
    a hierarchy's name cannot be changed by a remount operation.

    Example usage:

    # To create a hierarchy called "foo" containing the "cpu" subsystem
    mount -t cgroup -oname=foo,cpu cgroup /mnt/cgroup1

    # To mount the "foo" hierarchy on a second location
    mount -t cgroup -oname=foo cgroup /mnt/cgroup2

    Signed-off-by: Paul Menage
    Reviewed-by: Li Zefan
    Cc: KAMEZAWA Hiroyuki
    Cc: Balbir Singh
    Cc: Dhaval Giani
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Menage
     
  • Make the last unlock sequence consistent with the previous unlock sequence.

    Acked-by: Balbir Singh
    Acked-by: Paul Menage
    Signed-off-by: Xiaotian Feng
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Xiaotian Feng
     
  • Fix various Documentation/ paths in include/linux/.

    Signed-off-by: Randy Dunlap
    Reviewed-by: Jesper Juhl
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Randy Dunlap
     
  • Introduce "-p|--pid " for walking the process address space. The
    default action is to walk raw memory PFNs.

    Both the virtual address and physical address of each present page will
    be listed:

    # ./tools/vm/page-types -lp $$ | head -3
    voffset offset len flags
    400 11bebe 1 __RU_lA____M______________________
    402 11bebc 1 __RU_lA____M______________________

    Note that voffset/offset/len are now shown as hex numbers.

    [akpm@linux-foundation.org: coding-style fixes]
    Cc: Andi Kleen
    Signed-off-by: Wu Fengguang
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wu Fengguang
     
  • Signed-off-by: Josh Triplett
    Cc: Wu Fengguang
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Josh Triplett
     
  • fix the following 'make includecheck' warning:

    Documentation/auxdisplay/cfag12864b-example.c: string.h is included more than once.

    Signed-off-by: Jaswinder Singh Rajput
    Acked-by: Randy Dunlap
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jaswinder Singh Rajput
     
  • In "documentation: update Documentation/filesystem/proc.txt and
    Documentation/sysctls" (commit 760df93ec) we merged /proc/sys/fs
    documentation in Documentation/sysctl/fs.txt and
    Documentation/filesystem/proc.txt, but a stale file-nr definition
    remained.

    This patch adds back the right file-nr definition for the 2.6 kernel.

    Signed-off-by: Xiaotian Feng
    Cc: Randy Dunlap
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Xiaotian Feng
     
  • Documentation/filesystems/sharedsubtree.txt needs updating because the
    mount command in util-linux package is well aware of shared subtree
    features now. The patch also fixes two typos in sharedsubtree.txt.

    Signed-off-by: Peng Tao
    Signed-off-by: Randy Dunlap
    Cc: Miklos Szeredi
    Cc: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peng Tao
     
  • mount(8) handles shared subtrees just fine, so remove the smount program
    from Documentation/filesystems/sharedsubtree.txt.

    Fix annoying "Lets" -> "Let's".
    Insert space between '#' prompt and "mount" command.

    Signed-off-by: Randy Dunlap
    Acked-by: Miklos Szeredi
    Cc: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Randy Dunlap
     
  • The kernel contains many similar pieces of code that do the same thing:
    convert between calendar time and broken-down time.

    Here are some of the sources I found:
    fs/ncpfs/dir.c
    fs/smbfs/proc.c
    fs/fat/misc.c
    fs/udf/udftime.c
    fs/cifs/netmisc.c
    net/netfilter/xt_time.c
    drivers/scsi/ips.c
    drivers/input/misc/hp_sdc_rtc.c
    drivers/rtc/rtc-lib.c
    arch/ia64/hp/sim/boot/fw-emu.c
    arch/m68k/mac/misc.c
    arch/powerpc/kernel/time.c
    arch/parisc/include/asm/rtc.h
    ...

    We can make a common function for this type of conversion. At least we
    get the following benefits:

    1: Make the kernel simpler and more unified
    2: Make it easier to fix bugs in the conversion code
    3: Reduce code duplication in the future
    For example, I'm trying to make ftrace display walltime,
    and this patch will make that easier.

    This code is based on code from glibc-2.6
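
    For readers unfamiliar with the conversion, the sketch below turns
    seconds since the epoch into broken-down UTC time with a simple
    year-by-year loop. It is a userspace illustration only: the struct and
    function names are hypothetical, it handles non-negative inputs only, and
    it is not the glibc-derived code the patch adds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct broken_down {
    int year;   /* full year, e.g. 2009 */
    int mon;    /* 0..11 */
    int mday;   /* 1..31 */
    int hour, min, sec;
};

static bool is_leap(long y)
{
    return (y % 4 == 0 && y % 100 != 0) || y % 400 == 0;
}

/* Convert non-negative seconds since 1970-01-01 00:00:00 UTC. */
void secs_to_broken_down(long long secs, struct broken_down *out)
{
    static const int mdays[2][12] = {
        {31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31},
        {31, 29, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31},
    };
    long long days = secs / 86400;
    long rem = (long)(secs % 86400);
    long year = 1970;
    int mon = 0;

    assert(secs >= 0 && out != NULL);
    out->hour = (int)(rem / 3600);
    out->min = (int)(rem % 3600 / 60);
    out->sec = (int)(rem % 60);

    /* Peel off whole years, then whole months of the final year. */
    while (days >= (is_leap(year) ? 366 : 365)) {
        days -= is_leap(year) ? 366 : 365;
        year++;
    }
    while (days >= mdays[is_leap(year)][mon]) {
        days -= mdays[is_leap(year)][mon];
        mon++;
    }
    out->year = (int)year;
    out->mon = mon;
    out->mday = (int)days + 1;
}
```

    With this sketch, 1253750400 seconds maps to 2009-09-24 00:00:00 UTC
    (mon == 8, since months are counted from zero).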

    Signed-off-by: Zhao Lei
    Cc: OGAWA Hirofumi
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: Pavel Machek
    Cc: Andi Kleen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Zhaolei
     
  • Commit 6bfde05bf5c ("hugetlbfs: allow the creation of files suitable for
    MAP_PRIVATE on the vfs internal mount") altered can_do_hugetlb_shm() to
    check if a file is being created for shared memory or mmap(). If this
    returns false, we then unconditionally call user_shm_lock() triggering a
    warning. This block should never be entered for MAP_HUGETLB. This
    patch partially reverts the problematic change and fixes the check.

    Signed-off-by: Eric B Munson
    Cc: David Rientjes
    Cc: Mel Gorman
    Cc: Adam Litke
    Cc: David Gibson
    Cc: Lee Schermerhorn
    Cc: Nick Piggin
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    From: Mel Gorman
     
  • Now that ksm is in mainline it is better to change the default values to
    better suit most users.

    This patch changes the ksm default values to be:

    ksm_thread_pages_to_scan = 100 (instead of 200)
    ksm_thread_sleep_millisecs = 20 (like before)
    ksm_run = KSM_RUN_STOP (instead of KSM_RUN_MERGE - meaning ksm is
    disabled by default)
    ksm_max_kernel_pages = nr_free_buffer_pages / 4 (instead of 2046)

    The important aspect of this patch is: it disables ksm by default, and sets
    the number of the kernel_pages that can be allocated to be a reasonable
    number.

    Signed-off-by: Izik Eidus
    Cc: Hugh Dickins
    Cc: Andrea Arcangeli
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Izik Eidus
     
  • …e driver Kconfig entry

    Fix these warnings:

    drivers/built-in.o: In function `apanel_remove':
    apanel.c:(.text+0x56e852): undefined reference to `led_classdev_unregister'
    drivers/built-in.o: In function `apanel_probe':
    apanel.c:(.text+0x56eae3): undefined reference to `led_classdev_register'
    drivers/built-in.o: In function `acpi_fujitsu_hotkey_add':
    fujitsu-laptop.c:(.text+0x5d7647): undefined reference to `led_classdev_register'
    fujitsu-laptop.c:(.text+0x5d76b5): undefined reference to `led_classdev_register'
    drivers/built-in.o: In function `wbcir_probe':
    winbond-cir.c:(.devinit.text+0x5f375): undefined reference to `led_classdev_register'
    winbond-cir.c:(.devinit.text+0x5f663): undefined reference to `led_classdev_unregister'
    drivers/built-in.o: In function `wbcir_remove':
    winbond-cir.c:(.devexit.text+0x7f23): undefined reference to `led_classdev_unregister'
    drivers/built-in.o: In function `fujitsu_cleanup':
    fujitsu-laptop.c:(.exit.text+0xbe37): undefined reference to `led_classdev_unregister'
    fujitsu-laptop.c:(.exit.text+0xbe53): undefined reference to `led_classdev_unregister'

    It happens because the new INPUT_WINBOND_CIR driver relies on new-leds
    infrastructure - but does not select it in drivers/input/misc/Kconfig.
    But it selects LEDS_CLASS, which confuses a number of other drivers into
    thinking that all the leds infrastructure is in place.

    Fix this by selecting NEW_LEDS as well, like similar drivers do.

    Eventually, this whole leds infrastructure complexity should be
    cleaned up, it's been going on for years.

    Signed-off-by: Ingo Molnar <mingo@elte.hu>
    Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
    Cc: Dmitry Torokhov <dtor@mail.ru>
    Cc: David Härdeman <david@hardeman.nu>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

    Ingo Molnar
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus: (39 commits)
    cpumask: Move deprecated functions to end of header.
    cpumask: remove unused deprecated functions, avoid accusations of insanity
    cpumask: use new-style cpumask ops in mm/quicklist.
    cpumask: use mm_cpumask() wrapper: x86
    cpumask: use mm_cpumask() wrapper: um
    cpumask: use mm_cpumask() wrapper: mips
    cpumask: use mm_cpumask() wrapper: mn10300
    cpumask: use mm_cpumask() wrapper: m32r
    cpumask: use mm_cpumask() wrapper: arm
    cpumask: Use accessors for cpu_*_mask: um
    cpumask: Use accessors for cpu_*_mask: powerpc
    cpumask: Use accessors for cpu_*_mask: mips
    cpumask: Use accessors for cpu_*_mask: m32r
    cpumask: remove arch_send_call_function_ipi
    cpumask: arch_send_call_function_ipi_mask: s390
    cpumask: arch_send_call_function_ipi_mask: powerpc
    cpumask: arch_send_call_function_ipi_mask: mips
    cpumask: arch_send_call_function_ipi_mask: m32r
    cpumask: arch_send_call_function_ipi_mask: alpha
    cpumask: remove obsolete topology_core_siblings and topology_thread_siblings: ia64
    ...

    Linus Torvalds
     
  • * remove asm/atomic.h inclusion from linux/utsname.h --
    not needed after kref conversion
    * remove linux/utsname.h inclusion from files which do not need it

    NOTE: it looks like fs/binfmt_elf.c does not need utsname.h, however
    due to some personality stuff it _is_ needed -- cowardly leave ELF-related
    headers and files alone.

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • This reverts commit c02e3f361c7 ("kmod: fix race in usermodehelper code")

    The patch is wrong. UMH_WAIT_EXEC is called with VFORK, which ensures
    that the child finishes prior to returning back to the parent. No race.

    In fact, the patch makes it even worse because it does the thing it
    claims not to do:

    - It calls ->complete() on UMH_WAIT_EXEC

    - the complete() callback may deallocate subinfo as seen in the
    following call chain:

    [] (__link_path_walk+0x20/0xeb4) from [] (path_walk+0x48/0x94)
    [] (path_walk+0x48/0x94) from [] (do_path_lookup+0x24/0x4c)
    [] (do_path_lookup+0x24/0x4c) from [] (do_filp_open+0xa4/0x83c)
    [] (do_filp_open+0xa4/0x83c) from [] (open_exec+0x24/0xe0)
    [] (open_exec+0x24/0xe0) from [] (do_execve+0x7c/0x2e4)
    [] (do_execve+0x7c/0x2e4) from [] (kernel_execve+0x34/0x80)
    [] (kernel_execve+0x34/0x80) from [] (____call_usermodehelper+0x130/0x148)
    [] (____call_usermodehelper+0x130/0x148) from [] (kernel_thread_exit+0x0/0x8)

    and the path pointer was NULL. Good that ARM's kernel_execve()
    doesn't check the pointer for NULL or else I wouldn't notice it.

    The only race there might be is with UMH_NO_WAIT but it is too late for
    me to investigate it now. UMH_WAIT_PROC could probably also use VFORK
    and we could save one exec. So the only race I see is with UMH_NO_WAIT
    and recent scheduler changes where the child does not always run first
    might have triggered something here but as I said, it is late....

    Signed-off-by: Sebastian Andrzej Siewior
    Acked-by: Neil Horman
    Signed-off-by: Linus Torvalds

    Sebastian Andrzej Siewior
     
  • The new ones have pretty kerneldoc. Move the old ones to the end to
    avoid confusing people.

    Signed-off-by: Rusty Russell
    Cc: benh@kernel.crashing.org

    Rusty Russell
     
  • We're not forcing removal of the old cpu_ functions, but we might as
    well delete the now-unused ones.

    Especially CPUMASK_ALLOC and friends. I actually got a phone call (!)
    from a hacker who thought I had introduced them as the new cpumask
    API. He seemed bewildered that I had lost all taste.

    Signed-off-by: Rusty Russell
    Cc: benh@kernel.crashing.org

    Rusty Russell
     
  • This slipped past the previous sweeps.

    Signed-off-by: Rusty Russell
    Acked-by: Christoph Lameter

    Rusty Russell
     
  • Makes code futureproof against the impending change to mm->cpu_vm_mask (to be a pointer).

    It's also a chance to use the new cpumask_ ops which take a pointer
    (the older ones are deprecated, but there's no hurry for arch code).
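
    The idea behind the wrapper can be sketched in userspace as follows:
    callers obtain the mask through mm_cpumask() and pass the resulting
    pointer to pointer-taking operations, so they would not need to change if
    the field later became a pointer. The types here are simplified mock-ups
    and NR_CPUS is an arbitrary value chosen for the example; this is not the
    kernel's definition.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 64   /* arbitrary for this sketch */

struct cpumask {
    unsigned long long bits;
};

struct mm_struct {
    struct cpumask cpu_vm_mask;   /* embedded here; could become a pointer */
};

/* Callers go through this accessor and never name the field directly. */
#define mm_cpumask(mm) (&(mm)->cpu_vm_mask)

/* Pointer-taking operations in the style of the new cpumask_ ops. */
static inline void cpumask_set_cpu(unsigned int cpu, struct cpumask *m)
{
    assert(cpu < NR_CPUS);
    m->bits |= 1ULL << cpu;
}

static inline bool cpumask_test_cpu(unsigned int cpu, const struct cpumask *m)
{
    assert(cpu < NR_CPUS);
    return (m->bits >> cpu) & 1ULL;
}
```

    If cpu_vm_mask were turned into a pointer, only the mm_cpumask()
    definition in this sketch would have to change.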

    Signed-off-by: Rusty Russell

    Rusty Russell
     
  • Makes code futureproof against the impending change to mm->cpu_vm_mask.

    It's also a chance to use the new cpumask_ ops which take a pointer
    (the older ones are deprecated, but there's no hurry for arch code).

    Signed-off-by: Rusty Russell

    Rusty Russell
     
  • Makes code futureproof against the impending change to mm->cpu_vm_mask.

    It's also a chance to use the new cpumask_ ops which take a pointer
    (the older ones are deprecated, but there's no hurry for arch code).

    Signed-off-by: Rusty Russell

    Rusty Russell
     
  • Makes code futureproof against the impending change to mm->cpu_vm_mask
    (to be a pointer).

    It's also a chance to use the new cpumask_ ops which take a pointer
    (the older ones are deprecated, but there's no hurry for arch code).

    Also change the actual arg name here to "mm" (which it is), not "task".

    Signed-off-by: Rusty Russell

    Rusty Russell
     
  • Makes code futureproof against the impending change to mm->cpu_vm_mask.

    It's also a chance to use the new cpumask_ ops which take a pointer
    (the older ones are deprecated, but there's no hurry for arch code).

    Signed-off-by: Rusty Russell
    Acked-by: Hirokazu Takata (fixes)

    Rusty Russell
     
  • Makes code futureproof against the impending change to mm->cpu_vm_mask.

    It's also a chance to use the new cpumask_ ops which take a pointer
    (the older ones are deprecated, but there's no hurry for arch code).

    Signed-off-by: Rusty Russell

    Rusty Russell
     
  • Use the accessors rather than frobbing bits directly (the new versions
    are const).

    Signed-off-by: Rusty Russell
    Signed-off-by: Mike Travis

    Rusty Russell
     
  • Use the accessors rather than frobbing bits directly (the new versions
    are const).

    Signed-off-by: Rusty Russell
    Signed-off-by: Mike Travis

    Rusty Russell
     
  • Use the accessors rather than frobbing bits directly (the new versions
    are const).

    Signed-off-by: Rusty Russell
    Signed-off-by: Mike Travis

    Rusty Russell
     
  • Use the accessors rather than frobbing bits directly (the new versions
    are const).

    Signed-off-by: Rusty Russell
    Signed-off-by: Mike Travis

    Rusty Russell
     
  • Now everyone is converted to arch_send_call_function_ipi_mask, remove
    the shim and the #defines.

    Signed-off-by: Rusty Russell

    Rusty Russell
     
  • We're weaning the core code off handing cpumask's around on-stack.
    This introduces arch_send_call_function_ipi_mask().

    Signed-off-by: Rusty Russell

    Rusty Russell
     
  • We're weaning the core code off handing cpumask's around on-stack.
    This introduces arch_send_call_function_ipi_mask(), and by defining
    it, the old arch_send_call_function_ipi is defined by the core code.

    Signed-off-by: Rusty Russell

    Rusty Russell
     
  • We're weaning the core code off handing cpumask's around on-stack.
    This introduces arch_send_call_function_ipi_mask(), and by defining
    it, the old arch_send_call_function_ipi is defined by the core code.

    We also take the chance to wean the implementations off the
    obsolescent for_each_cpu_mask(): making send_ipi_mask take the pointer
    seemed the most natural way to ensure all implementations used
    for_each_cpu.
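
    The shape of that change can be sketched in userspace like this: the
    sender takes a pointer to the mask and iterates with a for_each_cpu-style
    loop instead of receiving the mask by value. The macro, the counter
    array, and send_ipi_single() are simplified mock-ups assumed for the
    example, not the architecture code.

```c
#include <assert.h>

#define NR_CPUS 8   /* arbitrary for this sketch */

struct cpumask {
    unsigned long bits;
};

/* Mock iterator: visit every cpu whose bit is set in *mask. */
#define for_each_cpu(cpu, mask)                     \
    for ((cpu) = 0; (cpu) < NR_CPUS; (cpu)++)       \
        if ((mask)->bits & (1UL << (cpu)))

static int ipi_count[NR_CPUS];

/* Stand-in for the per-cpu IPI send; it only counts deliveries. */
static void send_ipi_single(int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    ipi_count[cpu]++;
}

/* Takes the mask by pointer, so no cpumask is copied onto the stack. */
void send_ipi_mask(const struct cpumask *mask)
{
    int cpu;

    for_each_cpu(cpu, mask)
        send_ipi_single(cpu);
}
```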

    Signed-off-by: Rusty Russell

    Rusty Russell
     
  • We're weaning the core code off handing cpumask's around on-stack.
    This introduces arch_send_call_function_ipi_mask(), and by defining
    it, the old arch_send_call_function_ipi is defined by the core code.

    We also take the chance to wean the implementations off the
    obsolescent for_each_cpu_mask(): making send_ipi_mask take the pointer
    seemed the most natural way to ensure all implementations used
    for_each_cpu.

    Signed-off-by: Rusty Russell

    Rusty Russell
     
  • We're weaning the core code off handing cpumask's around on-stack.
    This introduces arch_send_call_function_ipi_mask().

    We also take the chance to wean the send_ipi_message off the
    obsolescent for_each_cpu_mask(): making it take a pointer seemed the
    most natural way to do this.

    Signed-off-by: Rusty Russell

    Rusty Russell
     
  • These were replaced by topology_core_cpumask and topology_thread_cpumask.

    Signed-off-by: Rusty Russell

    Rusty Russell