18 Nov, 2011

1 commit


08 Nov, 2011

1 commit

  • Since commit 4a31a334, the name of this misc device is not initialized,
    which leads to a funny device named /dev/(null) being created and
    /proc/misc containing an entry with just a number but no name. The latter
    leads to complaints by cryptsetup, which caused me to investigate this
    matter.

    Signed-off-by: Dominik Brodowski
    Signed-off-by: Rafael J. Wysocki

    Dominik Brodowski
     

07 Nov, 2011

7 commits

  • …/kernel/git/jeremy/xen

    * 'upstream/jump-label-noearly' of git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen:
    jump-label: initialize jump-label subsystem much earlier
    x86/jump_label: add arch_jump_label_transform_static()
    s390/jump-label: add arch_jump_label_transform_static()
    jump_label: add arch_jump_label_transform_static() to optimise non-live code updates
    sparc/jump_label: drop arch_jump_label_text_poke_early()
    x86/jump_label: drop arch_jump_label_text_poke_early()
    jump_label: if a key has already been initialized, don't nop it out
    stop_machine: make stop_machine safe and efficient to call early
    jump_label: use proper atomic_t initializer

    Conflicts:
    - arch/x86/kernel/jump_label.c
    Added __init_or_module to arch_jump_label_text_poke_early vs
    removal of that function entirely
    - kernel/stop_machine.c
    same patch ("stop_machine: make stop_machine safe and efficient
    to call early") merged twice, with whitespace fix in one version

    Linus Torvalds
     
  • * 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux: (230 commits)
    Revert "tracing: Include module.h in define_trace.h"
    irq: don't put module.h into irq.h for tracking irqgen modules.
    bluetooth: macroize two small inlines to avoid module.h
    ip_vs.h: fix implicit use of module_get/module_put from module.h
    nf_conntrack.h: fix up fallout from implicit moduleparam.h presence
    include: replace linux/module.h with "struct module" wherever possible
    include: convert various register fcns to macros to avoid include chaining
    crypto.h: remove unused crypto_tfm_alg_modname() inline
    uwb.h: fix implicit use of asm/page.h for PAGE_SIZE
    pm_runtime.h: explicitly requires notifier.h
    linux/dmaengine.h: fix implicit use of bitmap.h and asm/page.h
    miscdevice.h: fix up implicit use of lists and types
    stop_machine.h: fix implicit use of smp.h for smp_processor_id
    of: fix implicit use of errno.h in include/linux/of.h
    of_platform.h: delete needless include
    acpi: remove module.h include from platform/aclinux.h
    miscdevice.h: delete unnecessary inclusion of module.h
    device_cgroup.h: delete needless include
    net: sch_generic remove redundant use of
    net: inet_timewait_sock doesnt need
    ...

    Fix up trivial conflicts (other header files, and removal of the ab3550 mfd driver) in
    - drivers/media/dvb/frontends/dibx000_common.c
    - drivers/media/video/{mt9m111.c,ov6650.c}
    - drivers/mfd/ab3550-core.c
    - include/linux/dmaengine.h

    Linus Torvalds
     
  • * 'writeback-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux:
    writeback: Add a 'reason' to wb_writeback_work
    writeback: send work item to queue_io, move_expired_inodes
    writeback: trace event balance_dirty_pages
    writeback: trace event bdi_dirty_ratelimit
    writeback: fix ppc compile warnings on do_div(long long, unsigned long)
    writeback: per-bdi background threshold
    writeback: dirty position control - bdi reserve area
    writeback: control dirty pause time
    writeback: limit max dirty pause time
    writeback: IO-less balance_dirty_pages()
    writeback: per task dirty rate limit
    writeback: stabilize bdi->dirty_ratelimit
    writeback: dirty rate control
    writeback: add bg_threshold parameter to __bdi_update_bandwidth()
    writeback: dirty position control
    writeback: account per-bdi accumulated dirtied pages

    Linus Torvalds
     
  • * git://github.com/rustyrussell/linux:
    module,bug: Add TAINT_OOT_MODULE flag for modules not built in-tree
    module: Enable dynamic debugging regardless of taint

    Linus Torvalds
     
  • * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (106 commits)
    powerpc/p3060qds: Add support for P3060QDS board
    powerpc/83xx: Add shutdown request support to MCU handling on MPC8349 MITX
    powerpc/85xx: Make kexec to interate over online cpus
    powerpc/fsl_booke: Fix comment in head_fsl_booke.S
    powerpc/85xx: issue 15 EOI after core reset for FSL CoreNet devices
    powerpc/8xxx: Fix interrupt handling in MPC8xxx GPIO driver
    powerpc/85xx: Add 'fsl,pq3-gpio' compatiable for GPIO driver
    powerpc/86xx: Correct Gianfar support for GE boards
    powerpc/cpm: Clear muram before it is in use.
    drivers/virt: add ioctl for 32-bit compat on 64-bit to fsl-hv-manager
    powerpc/fsl_msi: add support for "msi-address-64" property
    powerpc/85xx: Setup secondary cores PIR with hard SMP id
    powerpc/fsl-booke: Fix settlbcam for 64-bit
    powerpc/85xx: Adding DCSR node to dtsi device trees
    powerpc/85xx: clean up FPGA device tree nodes for Freecsale QorIQ boards
    powerpc/85xx: fix PHYS_64BIT selection for P1022DS
    powerpc/fsl-booke: Fix setup_initial_memory_limit to not blindly map
    powerpc: respect mem= setting for early memory limit setup
    powerpc: Update corenet64_smp_defconfig
    powerpc: Update mpc85xx/corenet 32-bit defconfigs
    ...

    Fix up trivial conflicts in:
    - arch/powerpc/configs/40x/hcu4_defconfig
    removed stale file, edited elsewhere
    - arch/powerpc/include/asm/udbg.h, arch/powerpc/kernel/udbg.c:
    added opal and gelic drivers vs added ePAPR driver
    - drivers/tty/serial/8250.c
    moved UPIO_TSI to powerpc vs removed UPIO_DWAPB support

    Linus Torvalds
     
  • Use of the GPL or a compatible licence doesn't necessarily make the code
    any good. We already consider staging modules to be suspect, and this
    should also be true for out-of-tree modules which may receive very
    little review.

    Signed-off-by: Ben Hutchings
    Reviewed-by: Dave Jones
    Acked-by: Greg Kroah-Hartman
    Signed-off-by: Rusty Russell (patched oops-tracing.txt)

    Ben Hutchings
     
  • Dynamic debugging is currently disabled for tainted modules, except
    for TAINT_CRAP. This prevents use of dynamic debugging for
    out-of-tree modules once the next patch is applied.

    This condition was apparently intended to avoid a crash if a force-
    loaded module has an incompatible definition of dynamic debug
    structures. However, an administrator that forces us to load a module
    is claiming that it *is* compatible even though it fails our version
    checks. If they are mistaken, there are any number of ways the module
    could crash the system.

    As a side-effect, proprietary and other tainted modules can now use
    dynamic_debug.

    Signed-off-by: Ben Hutchings
    Acked-by: Mathieu Desnoyers
    Signed-off-by: Rusty Russell

    Ben Hutchings
     

05 Nov, 2011

3 commits

  • …ASK_KILLABLE tasks too"

    Commit 27920651fe "PM / Freezer: Make fake_signal_wake_up() wake
    TASK_KILLABLE tasks too" updated fake_signal_wake_up() used by freezer
    to wake up KILLABLE tasks. Sending unsolicited wakeups to tasks in
    killable sleep is dangerous as there are code paths which depend on
    tasks not waking up spuriously from KILLABLE sleep.

    For example, sys_read() or page can sleep in TASK_KILLABLE assuming
    that wait/down/whatever _killable can only fail if we can not return
    to the usermode. TASK_TRACED is another obvious example.

    The previous patch updated wait_event_freezekillable() such that it
    doesn't depend on the spurious wakeup. This patch reverts the
    offending commit.

    Note that the spurious KILLABLE wakeup had other implicit effects in
    KILLABLE sleeps in nfs and cifs and those will need further updates to
    regain freezekillable behavior.

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>

    Tejun Heo
     
  • Remove an "if" check, that repeats an equivalent one 6 lines above.

    Signed-off-by: Guennadi Liakhovetski
    Signed-off-by: Rafael J. Wysocki

    Guennadi Liakhovetski
     
  • The CPU hotplug notifications sent out by the _cpu_up() and _cpu_down()
    functions depend on the value of the 'tasks_frozen' argument passed to them
    (which indicates whether tasks have been frozen or not).
    (Examples for such CPU hotplug notifications: CPU_ONLINE, CPU_ONLINE_FROZEN,
    CPU_DEAD, CPU_DEAD_FROZEN).

    Thus, it is essential that while the callbacks for those notifications are
    running, the state of the system with respect to the tasks being frozen or
    not remains unchanged, *throughout that duration*. Hence there is a need for
    synchronizing the CPU hotplug code with the freezer subsystem.

    Since the freezer is involved only in the Suspend/Hibernate call paths, this
    patch hooks the CPU hotplug code to the suspend/hibernate notifiers
    PM_[SUSPEND|HIBERNATE]_PREPARE and PM_POST_[SUSPEND|HIBERNATE] to prevent
    the race between CPU hotplug and freezer, thus ensuring that CPU hotplug
    notifications will always be run with the state of the system really being
    what the notifications indicate, _throughout_ their execution time.

    Signed-off-by: Srivatsa S. Bhat
    Signed-off-by: Rafael J. Wysocki

    Srivatsa S. Bhat
     

03 Nov, 2011

6 commits

  • This reverts commit 144060fee07e9c22e179d00819c83c86fbcbf82c.

    It causes a resume regression for Andi on his Acer Aspire 1830T post
    3.1. The screen just stays black after wakeup.

    Also, it really looks like the wrong way to suspend and resume perf
    events: I think they should be done as part of the CPU suspend and
    resume, rather than as a notifier that does smp_call_function().

    Reported-by: Andi Kleen
    Acked-by: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Rafael J. Wysocki
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     
  • While back-porting Johannes Weiner's patch "mm: memcg-aware global
    reclaim" for an internal effort, we noticed a significant performance
    regression during page-reclaim heavy workloads due to high contention of
    the ss->id_lock. This lock protects idr map, and serializes calls to
    idr_get_next() in css_get_next() (which is used during the memcg hierarchy
    walk).

    Since idr_get_next() is just doing a look up, we need only serialize it
    with respect to idr_remove()/idr_get_new(). By making the ss->id_lock a
    rwlock, contention is greatly reduced and performance improves.

    Tested: cat a 256m file from a ramdisk in a 128m container 50 times on
    each core (one file + container per core) in parallel on a NUMA machine.
    Result is the time for the test to complete in 1 of the containers.
    Both kernels included Johannes' memcg-aware global reclaim patches.

    Before rwlock patch: 1710.778s
    After rwlock patch: 152.227s

    Signed-off-by: Andrew Bresticker
    Cc: Paul Menage
    Cc: Li Zefan
    Acked-by: KAMEZAWA Hiroyuki
    Cc: Ying Han
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Bresticker
     
  • Adding support for poll() in sysctl fs allows userspace to receive
    notifications of changes in sysctl entries. This adds an infrastructure to
    allow files in sysctl fs to be pollable and implements it for hostname and
    domainname.

    [akpm@linux-foundation.org: s/declare/define/ for definitions]
    Signed-off-by: Lucas De Marchi
    Cc: Greg KH
    Cc: Kay Sievers
    Cc: Al Viro
    Cc: "Eric W. Biederman"
    Cc: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Lucas De Marchi
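
A minimal userspace sketch of how a client might wait on one of the pollable entries. The function name `wait_for_sysctl_change` is hypothetical, and the sketch assumes that a change is reported through `POLLERR | POLLPRI`; the commit message itself does not say which event bits are used.

```c
#include <assert.h>
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

/*
 * Wait until a pollable sysctl file reports a change.
 * Returns 1 on change, 0 on timeout, -1 on error.
 * Assumption: changes are signalled through POLLERR | POLLPRI.
 */
int wait_for_sysctl_change(const char *path, int timeout_ms)
{
    assert(path != NULL);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct pollfd pfd = { .fd = fd, .events = POLLPRI };
    int n = poll(&pfd, 1, timeout_ms);
    int result;

    if (n < 0)
        result = -1;                 /* poll() itself failed */
    else if (n == 0)
        result = 0;                  /* nothing changed before the timeout */
    else
        result = (pfd.revents & (POLLERR | POLLPRI)) ? 1 : 0;

    close(fd);
    return result;
}
```

For example, `wait_for_sysctl_change("/proc/sys/kernel/hostname", -1)` would block until the hostname is rewritten, on a kernel that carries this change.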
     
  • {get,put}_mems_allowed() exist so that general kernel code may locklessly
    access a task's set of allowable nodes without having the chance that a
    concurrent write will cause the nodemask to be empty on configurations
    where MAX_NUMNODES > BITS_PER_LONG.

    This could incur a significant delay, however, especially in low memory
    conditions because the page allocator is blocking and reclaim requires
    get_mems_allowed() itself. It is not atypical to see writes to
    cpuset.mems take over 2 seconds to complete, for example. In low memory
    conditions, this is problematic because it's one of the most important
    times to change cpuset.mems in the first place!

    The only way a task's set of allowable nodes may change is through cpusets
    by writing to cpuset.mems and when attaching a task to a generic code is
    not reading the nodemask with get_mems_allowed() at the same time, and
    then clearing all the old nodes. This prevents the possibility that a
    reader will see an empty nodemask at the same time the writer is storing a
    new nodemask.

    If at least one node remains unchanged, though, it's possible to simply
    set all new nodes and then clear all the old nodes. Changing a task's
    nodemask is protected by cgroup_mutex so it's guaranteed that two threads
    are not changing the same task's nodemask at the same time, so the
    nodemask is guaranteed to be stored before another thread changes it and
    determines whether a node remains set or not.

    Signed-off-by: David Rientjes
    Cc: Miao Xie
    Cc: KOSAKI Motohiro
    Cc: Nick Piggin
    Cc: Paul Menage
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Rientjes
     
  • If a task has exited to the point it has called cgroup_exit() already,
    then we can't migrate it to another cgroup anymore.

    This can happen when we are attaching a task to a new cgroup between the
    call to ->can_attach_task() on subsystems and the migration that is
    eventually tried in cgroup_task_migrate().

    In this case cgroup_task_migrate() returns -ESRCH and we don't want to
    attach the task to the subsystems because the attachment to the new cgroup
    itself failed.

    Fix this by only calling ->attach_task() on the subsystems if the cgroup
    migration succeeded.

    Reported-by: Oleg Nesterov
    Signed-off-by: Ben Blum
    Acked-by: Paul Menage
    Cc: Li Zefan
    Cc: Tejun Heo
    Signed-off-by: Frederic Weisbecker
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ben Blum
     
  • Fix unstable tasklist locking in cgroup_attach_proc.

    According to this thread - https://lkml.org/lkml/2011/7/27/243 - RCU is
    not sufficient to guarantee the tasklist is stable w.r.t. de_thread and
    exit. Taking tasklist_lock for reading, instead of rcu_read_lock, ensures
    proper exclusion.

    Signed-off-by: Ben Blum
    Acked-by: Paul Menage
    Cc: Oleg Nesterov
    Cc: Frederic Weisbecker
    Cc: "Paul E. McKenney"
    Cc: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ben Blum
     

02 Nov, 2011

1 commit

  • * 'next/dt' of git://git.linaro.org/people/arnd/arm-soc:
    ARM: gic: use module.h instead of export.h
    ARM: gic: fix irq_alloc_descs handling for sparse irq
    ARM: gic: add OF based initialization
    ARM: gic: add irq_domain support
    irq: support domains with non-zero hwirq base
    of/irq: introduce of_irq_init
    ARM: at91: add at91sam9g20 and Calao USB A9G20 DT support
    ARM: at91: dt: at91sam9g45 family and board device tree files
    arm/mx5: add device tree support for imx51 babbage
    arm/mx5: add device tree support for imx53 boards
    ARM: msm: Add devicetree support for msm8660-surf
    msm_serial: Add devicetree support
    msm_serial: Use relative resources for iomem

    Fix up conflicts in arch/arm/mach-at91/{at91sam9260.c,at91sam9g45.c}

    Linus Torvalds
     

01 Nov, 2011

14 commits

  • Quoth Andrew:

    - Most of MM. Still waiting for the powerpc guys to get off their
    butts and review some threaded hugepages patches.

    - alpha

    - vfs bits

    - drivers/misc

    - a few core kernel tweaks

    - printk() features

    - MAINTAINERS updates

    - backlight merge

    - leds merge

    - various lib/ updates

    - checkpatch updates

    * akpm: (127 commits)
    epoll: fix spurious lockdep warnings
    checkpatch: add a --strict check for utf-8 in commit logs
    kernel.h/checkpatch: mark strict_strto and simple_strto as obsolete
    llist-return-whether-list-is-empty-before-adding-in-llist_add-fix
    wireless: at76c50x: follow rename pack_hex_byte to hex_byte_pack
    fat: follow rename pack_hex_byte() to hex_byte_pack()
    security: follow rename pack_hex_byte() to hex_byte_pack()
    kgdb: follow rename pack_hex_byte() to hex_byte_pack()
    lib: rename pack_hex_byte() to hex_byte_pack()
    lib/string.c: fix strim() semantics for strings that have only blanks
    lib/idr.c: fix comment for ida_get_new_above()
    lib/percpu_counter.c: enclose hotplug only variables in hotplug ifdef
    lib/bitmap.c: quiet sparse noise about address space
    lib/spinlock_debug.c: print owner on spinlock lockup
    lib/kstrtox: common code between kstrto*() and simple_strto*() functions
    drivers/leds/leds-lp5521.c: check if reset is successful
    leds: turn the blink_timer off before starting to blink
    leds: save the delay values after a successful call to blink_set()
    drivers/leds/leds-gpio.c: use gpio_get_value_cansleep() when initializing
    drivers/leds/leds-lm3530.c: add __devexit_p where needed
    ...

    Linus Torvalds
     
  • There is no functional change.

    Signed-off-by: Andy Shevchenko
    Acked-by: Jesper Nilsson
    Cc: David Howells
    Cc: Koichi Yasutake
    Cc: Jason Wessel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andy Shevchenko
     
  • Currently log_prefix is testing that the first character of the log level
    and facility is less than '0' and greater than '9' (which is always
    false).

    Since the code being updated works because strtoul bombs out (endp isn't
    updated) and 0 is returned anyway just remove the check and don't change
    the behavior of the function.

    Signed-off-by: William Douglas
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    William Douglas
     
  • Currently log_prefix is testing that the first character of the log level
    and facility is less than '0' and greater than '9' (which is always
    false). It should be testing to see if the character is less than '0' or
    greater than '9' instead. This patch makes that change.

    The code being changed worked because strtoul bombs out (endp isn't
    updated) and 0 is returned anyway.

    Signed-off-by: William Douglas
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    William Douglas
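
The difference between the two conditions is easy to check in isolation. The helpers below are illustrative stand-ins, not the actual log_prefix code:

```c
#include <assert.h>
#include <stdbool.h>

/* The original test: no character is both below '0' and above '9'. */
bool not_digit_buggy(char c)
{
    return c < '0' && c > '9';
}

/* The intended test: the character lies outside the range '0'..'9'. */
bool not_digit_fixed(char c)
{
    return c < '0' || c > '9';
}
```

The first helper returns false for every input, so it never rejects anything; the second rejects exactly the non-digit characters.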
     
  • We are enabling some power features on medfield. To test suspend-2-RAM
    conveniently, we need to turn on/off console_suspend_enabled frequently.

    Add a module parameter, so users could change it by:
    /sys/module/printk/parameters/console_suspend

    Signed-off-by: Yanmin Zhang
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yanmin Zhang
     
  • We are enabling some power features on medfield. To test suspend-2-RAM
    conveniently, we need to turn on/off ignore_loglevel frequently without
    rebooting.

    Add a module parameter, so users can change it by:
    /sys/module/printk/parameters/ignore_loglevel

    Signed-off-by: Yanmin Zhang
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yanmin Zhang
     
  • Userspace needs to know the highest valid capability of the running
    kernel, which right now cannot reliably be retrieved from the header files
    only. The fact that this value cannot be determined properly right now
    creates various problems for libraries compiled on newer header files
    which are run on older kernels. They assume capabilities are available
    which actually aren't. libcap-ng is one example. And we ran into the
    same problem with systemd too.

    Now the capability is exported in /proc/sys/kernel/cap_last_cap.

    [akpm@linux-foundation.org: make cap_last_cap const, per Ulrich]
    Signed-off-by: Dan Ballard
    Cc: Randy Dunlap
    Cc: Ingo Molnar
    Cc: Lennart Poettering
    Cc: Kay Sievers
    Cc: Ulrich Drepper
    Cc: James Morris
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dan Ballard
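
A small sketch of how a library might consume the new file instead of trusting its build-time headers. The helper name `read_cap_last_cap` is hypothetical; only the path /proc/sys/kernel/cap_last_cap comes from the commit message.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Read the highest valid capability number from a sysctl-style file.
 * Returns the value, or -1 if the file is missing or malformed
 * (for example on a kernel that predates this interface).
 */
int read_cap_last_cap(const char *path)
{
    assert(path != NULL);

    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;

    int cap = -1;
    if (fscanf(f, "%d", &cap) != 1 || cap < 0)
        cap = -1;

    fclose(f);
    return cap;
}
```

A caller would pass "/proc/sys/kernel/cap_last_cap" and treat any capability number above the returned value as unavailable on the running kernel, falling back to its compiled-in limit when the result is -1.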
     
  • Fix compilation warnings for CONFIG_SYSCTL=n:

    fixed compilation warnings in case of disabled CONFIG_SYSCTL
    kernel/watchdog.c:483:13: warning: `watchdog_enable_all_cpus' defined but not used
    kernel/watchdog.c:500:13: warning: `watchdog_disable_all_cpus' defined but not used

    these functions are static and are used only in sysctl handler, so move
    them inside #ifdef CONFIG_SYSCTL too

    Signed-off-by: Vasily Averin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vasily Averin
     
  • Make stop_machine() safe to call early in boot, before SMP has been set
    up, by simply calling the callback function directly if there's only one
    CPU online.

    [ Fixes from AKPM:
    - add comment
    - local_irq_flags, not save_flags
    - also call hard_irq_disable() for systems which need it

    Tejun suggested using an explicit flag rather than just looking at
    the online cpu count. ]

    Cc: Tejun Heo
    Acked-by: Rusty Russell
    Cc: Peter Zijlstra
    Cc: H. Peter Anvin
    Cc: Ingo Molnar
    Cc: Steven Rostedt
    Acked-by: Tejun Heo
    Cc: Konrad Rzeszutek Wilk
    Signed-off-by: Jeremy Fitzhardinge
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jeremy Fitzhardinge
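
The control flow described above can be sketched in plain C. This is a userspace illustration under stated assumptions, not the kernel implementation: `stopper_ready`, `run_on_all_cpus_stopped` and `stop_machine_sketch` are hypothetical names, and the real code also has to disable interrupts around the direct call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef int (*stop_fn_t)(void *data);

/* Hypothetical flag: set once the multi-CPU machinery is ready. */
static bool stopper_ready = false;

/* Counts how often the cross-CPU path was taken (for illustration only). */
static int slow_path_calls = 0;

/* Stand-in for the real cross-CPU rendezvous; here it just runs fn. */
static int run_on_all_cpus_stopped(stop_fn_t fn, void *data)
{
    slow_path_calls++;
    return fn(data);
}

/* Call fn directly during early boot, otherwise use the full mechanism. */
int stop_machine_sketch(stop_fn_t fn, void *data)
{
    assert(fn != NULL);

    if (!stopper_ready) {
        /* Early boot: a single CPU is running, so a direct call is enough. */
        return fn(data);
    }
    return run_on_all_cpus_stopped(fn, data);
}

/* Example callback: increments the int that data points to. */
int bump(void *data)
{
    ++*(int *)data;
    return 0;
}
```

The sketch follows the explicit-flag variant that Tejun suggested; a check on the online CPU count alone would take the same direct path.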
     
  • Some kernel components pin user space memory (infiniband and perf) (by
    increasing the page count) and account that memory as "mlocked".

    The difference between mlocking and pinning is:

    A. mlocked pages are marked with PG_mlocked and are exempt from
    swapping. Page migration may move them around though.
    They are kept on a special LRU list.

    B. Pinned pages cannot be moved because something needs to
    directly access physical memory. They may not be on any
    LRU list.

    I recently saw an mlockalled process where mm->locked_vm became
    bigger than the virtual size of the process (!) because some
    memory was accounted for twice:

    Once when the page was mlocked and once when the Infiniband
    layer increased the refcount because it needed to pin the RDMA
    memory.

    This patch introduces a separate counter for pinned pages and
    accounts them separately.

    Signed-off-by: Christoph Lameter
    Cc: Mike Marciniszyn
    Cc: Roland Dreier
    Cc: Sean Hefty
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
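
The double accounting can be reproduced with a toy model. The struct and helpers below are illustrative only; the field name `pinned_vm` is an assumption, since the commit message does not name the new counter.

```c
#include <assert.h>
#include <stddef.h>

/* Toy per-process accounting; field names are illustrative. */
struct mm_counters {
    unsigned long total_vm;   /* pages mapped by the process */
    unsigned long locked_vm;  /* pages locked with mlock()/mlockall() */
    unsigned long pinned_vm;  /* pages pinned for direct hardware access */
};

/* mlock accounting is the same in both schemes. */
void mlock_pages(struct mm_counters *mm, unsigned long npages)
{
    assert(mm != NULL);
    mm->locked_vm += npages;
}

/* Old scheme: pinning is charged to locked_vm as well. */
void pin_pages_old(struct mm_counters *mm, unsigned long npages)
{
    assert(mm != NULL);
    mm->locked_vm += npages;
}

/* New scheme: pinning has its own counter. */
void pin_pages_new(struct mm_counters *mm, unsigned long npages)
{
    assert(mm != NULL);
    mm->pinned_vm += npages;
}
```

With 100 mapped pages that are all mlocked and 50 of them also pinned, the old scheme reports 150 locked pages, which exceeds the size of the process; the new scheme reports 100 locked and 50 pinned.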
     
  • This removes mm->oom_disable_count entirely since it's unnecessary and
    currently buggy. The counter was intended to be per-process but it's
    currently decremented in the exit path for each thread that exits, causing
    it to underflow.

    The count was originally intended to prevent oom killing threads that
    share memory with threads that cannot be killed since it doesn't lead to
    future memory freeing. The counter could be fixed to represent all
    threads sharing the same mm, but it's better to remove the count since:

    - it is possible that the OOM_DISABLE thread sharing memory with the
    victim is waiting on that thread to exit and will actually cause
    future memory freeing, and

    - there is no guarantee that a thread is disabled from oom killing just
    because another thread sharing its mm is oom disabled.

    Signed-off-by: David Rientjes
    Reported-by: Oleg Nesterov
    Reviewed-by: Oleg Nesterov
    Cc: Ying Han
    Cc: KOSAKI Motohiro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Rientjes
     
  • The basic idea behind cross memory attach is to allow MPI programs doing
    intra-node communication to do a single copy of the message rather than a
    double copy of the message via shared memory.

    The following patch attempts to achieve this by allowing a destination
    process, given an address and size from a source process, to copy memory
    directly from the source process into its own address space via a system
    call. There is also a symmetrical ability to copy from the current
    process's address space into a destination process's address space.

    - Use of /proc/pid/mem has been considered, but there are issues with
    using it:
    - Does not allow for specifying iovecs for both src and dest, assuming
    preadv or pwritev was implemented either the area read from or
    written to would need to be contiguous.
    - Currently mem_read allows only processes who are currently
    ptrace'ing the target and are still able to ptrace the target to read
    from the target. This check could possibly be moved to the open call,
    but it's not clear exactly what race this restriction is stopping
    (reason appears to have been lost)
    - Having to send the fd of /proc/self/mem via SCM_RIGHTS on unix
    domain socket is a bit ugly from a userspace point of view,
    especially when you may have hundreds if not (eventually) thousands
    of processes that all need to do this with each other
    - Doesn't allow for some future use of the interface we would like to
    consider adding in the future (see below)
    - Interestingly reading from /proc/pid/mem currently actually
    involves two copies! (But this could be fixed pretty easily)

    As mentioned previously use of vmsplice instead was considered, but has
    problems. Since you need the reader and writer working co-operatively if
    the pipe is not drained then you block. Which requires some wrapping to
    do non blocking on the send side or polling on the receive. In all to all
    communication it requires ordering otherwise you can deadlock. And in the
    example of many MPI tasks writing to one MPI task vmsplice serialises the
    copying.

    There are some cases of MPI collectives where even a single copy interface
    does not get us the performance gain we could. For example in an
    MPI_Reduce rather than copy the data from the source we would like to
    instead use it directly in a mathops (say the reduce is doing a sum) as
    this would save us doing a copy. We don't need to keep a copy of the data
    from the source. I haven't implemented this, but I think this interface
    could in the future do all this through the use of the flags - eg could
    specify the math operation and type and the kernel rather than just
    copying the data would apply the specified operation between the source
    and destination and store it in the destination.

    Although we don't have a "second user" of the interface (though I've had
    some nibbles from people who may be interested in using it for intra
    process messaging which is not MPI). This interface is something which
    hardware vendors are already doing for their custom drivers to implement
    fast local communication. And so in addition to this being useful for
    OpenMPI it would mean the driver maintainers don't have to fix things up
    when the mm changes.

    There was some discussion about how much faster a true zero copy would
    go. Here's a link back to the email with some testing I did on that:

    http://marc.info/?l=linux-mm&m=130105930902915&w=2

    There is a basic man page for the proposed interface here:

    http://ozlabs.org/~cyeoh/cma/process_vm_readv.txt

    This has been implemented for x86 and powerpc, other architectures should
    mainly (I think) just need to add syscall numbers for the process_vm_readv
    and process_vm_writev. There are 32 bit compatibility versions for
    64-bit kernels.

    For arch maintainers there are some simple tests to be able to quickly
    verify that the syscalls are working correctly here:

    http://ozlabs.org/~cyeoh/cma/cma-test-20110718.tgz

    Signed-off-by: Chris Yeoh
    Cc: Ingo Molnar
    Cc: "H. Peter Anvin"
    Cc: Thomas Gleixner
    Cc: Arnd Bergmann
    Cc: Paul Mackerras
    Cc: Benjamin Herrenschmidt
    Cc: David Howells
    Cc: James Morris
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christopher Yeoh
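
A minimal sketch of the read side as it is exposed by glibc (`process_vm_readv()` in `<sys/uio.h>`, which needs `_GNU_SOURCE`). The wrapper name `read_remote` is hypothetical, and the sketch copies one contiguous range only, although the interface accepts full iovec arrays on both sides.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Copy len bytes from address remote_addr in process pid into local_buf
 * with a single process_vm_readv() call.
 * Returns the number of bytes copied, or -1 with errno set.
 */
ssize_t read_remote(pid_t pid, const void *remote_addr,
                    void *local_buf, size_t len)
{
    assert(local_buf != NULL);

    struct iovec local = { .iov_base = local_buf, .iov_len = len };
    struct iovec remote = { .iov_base = (void *)remote_addr, .iov_len = len };

    /* The last argument is the flags word, which must currently be 0. */
    return process_vm_readv(pid, &local, 1, &remote, 1, 0);
}
```

In an MPI-style setup the receiver would learn the sender's pid, address and length over some existing channel and then call `read_remote()` once per message. The call can fail with -1, for instance when the caller lacks permission to inspect the target process.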
     
  • Recent commit "irq: Track the owner of irq descriptor" in
    commit ID b6873807a7143b7 placed module.h into linux/irq.h
    but we are trying to limit module.h inclusion to just C files
    that really need it, due to its size and number of children
    includes. This targets just reversing that include.

    Add in the basic "struct module" since that is all we really need
    to ensure things compile. In theory, b687380 should have added the
    module.h include to the irqdesc.h header as well, but the implicit
    module.h everywhere presence masked this from showing up. So give
    it the "struct module" as well.

    As for the C files, irqdesc.c is only using THIS_MODULE, so it
    does not need module.h - give it export.h instead. The C file
    irq/manage.c is now (as of b687380) using try_module_get and
    module_put and so it needs module.h (which it already has).

    Also convert the irq_alloc_descs variants to macros, since all
    they really do is call the __irq_alloc_descs primitive.
    This avoids including export.h and no debug info is lost.

    Signed-off-by: Paul Gortmaker

    Paul Gortmaker
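
Why a bare "struct module" is enough can be seen in a few lines of standalone C. The names `irq_desc_sketch` and `desc_has_owner` are hypothetical and only mirror the idea of a descriptor that records its owner:

```c
#include <assert.h>
#include <stddef.h>

/* Forward declaration: enough for any code that only handles pointers. */
struct module;

/* A descriptor that records its owner without knowing the owner's layout. */
struct irq_desc_sketch {
    unsigned int irq;
    struct module *owner;
};

/* Works on the pointer only, so the full definition is never needed here. */
int desc_has_owner(const struct irq_desc_sketch *desc)
{
    assert(desc != NULL);
    return desc->owner != NULL;
}
```

Only code that dereferences the pointer or calls helpers such as try_module_get needs the full header, which matches the split between irqdesc.c and irq/manage.c described above.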
     
    These files were getting <linux/module.h> via an implicit non-obvious
    path, but we want to crush those out of existence since they cost
    time during compiles of processing thousands of lines of headers
    for no reason. Give them the lightweight header that just contains
    the EXPORT_SYMBOL infrastructure.

    Signed-off-by: Paul Gortmaker

    Paul Gortmaker
     

31 Oct, 2011

7 commits

  • The file rcutiny.c does not need moduleparam.h header, as
    there are no modparams in this file.

    However rcutiny_plugin.h does define a module_init() and
    a module_exit() and it uses the various MODULE_ macros, so
    it really does need module.h included.

    Signed-off-by: Paul Gortmaker

    Paul Gortmaker
     
  • Through various other implicit include paths, some files were
    getting the full module.h file, and hence living the illusion
    that they really only needed moduleparam.h -- but the reality
    is that once you remove the module.h presence, these show up:

    kernel/params.c:583: warning: ‘struct module_kobject’ declared inside parameter list

    Such files really require module.h so simply make it so. As the
    file module.h grabs moduleparam.h on the fly, all will be well.

    Signed-off-by: Paul Gortmaker

    Paul Gortmaker
     
  • With the module.h usage cleanup, we'll get this:

    kernel/ksysfs.c:161: error: ‘S_IRUGO’ undeclared here (not in a function)
    make[2]: *** [kernel/ksysfs.o] Error 1

    Signed-off-by: Paul Gortmaker

    Paul Gortmaker
     
  • Up until now, this file was getting percpu.h because nearly every
    file was implicitly getting module.h (and all its sub-includes).
    But we want to clean that up, so call out percpu.h explicitly.
    Otherwise we'll get things like this on an ARM build:

    kernel/irq_work.c:48: error: expected declaration specifiers or '...' before 'irq_work_list'
    kernel/irq_work.c:48: warning: type defaults to 'int' in declaration of 'DEFINE_PER_CPU'

    The same thing was happening for builds on ARM for asm/processor.h

    kernel/irq_work.c: In function 'irq_work_sync':
    kernel/irq_work.c:166: error: implicit declaration of function 'cpu_relax'

    Signed-off-by: Paul Gortmaker

    Paul Gortmaker
     
    These files were implicitly relying on <linux/kmod.h> coming in via
    module.h, as without it we get things like:

    kernel/power/suspend.c:100: error: implicit declaration of function ‘usermodehelper_disable’
    kernel/power/suspend.c:109: error: implicit declaration of function ‘usermodehelper_enable’
    kernel/power/user.c:254: error: implicit declaration of function ‘usermodehelper_disable’
    kernel/power/user.c:261: error: implicit declaration of function ‘usermodehelper_enable’

    kernel/sys.c:317: error: implicit declaration of function ‘usermodehelper_disable’
    kernel/sys.c:1816: error: implicit declaration of function ‘call_usermodehelper_setup’
    kernel/sys.c:1822: error: implicit declaration of function ‘call_usermodehelper_setfns’
    kernel/sys.c:1824: error: implicit declaration of function ‘call_usermodehelper_exec’

    Signed-off-by: Paul Gortmaker

    Paul Gortmaker
     
  • These files are doing things like module_put and try_module_get
    so they need to call out the module.h for explicit inclusion,
    rather than getting it via which we ideally want
    to remove the module.h inclusion from.

    Signed-off-by: Paul Gortmaker

    Paul Gortmaker
     
  • The changed files were only including linux/module.h for the
    EXPORT_SYMBOL infrastructure, and nothing else. Revector them
    onto the isolated export header for faster compile times.

    Nothing to see here but a whole lot of instances of:

    -#include <linux/module.h>
    +#include <linux/export.h>

    This commit is only changing the kernel dir; next targets
    will probably be mm, fs, the arch dirs, etc.

    Signed-off-by: Paul Gortmaker

    Paul Gortmaker