17 Jan, 2012

1 commit


16 Jan, 2012

3 commits

  • Recent changes to kernel/module.c caused the following compile
    error:

    kernel/module.c: In function ‘show_taint’:
    kernel/module.c:1024:2: error: implicit declaration of function ‘module_flags_taint’ [-Werror=implicit-function-declaration]
    cc1: some warnings being treated as errors

    Correct this error by moving the definition of module_flags_taint
    outside of the #ifdef CONFIG_MODULE_UNLOAD section.

    Signed-off-by: Kevin Winchester
    Signed-off-by: Linus Torvalds

    Kevin Winchester
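
The failure mode described above can be sketched in plain C. This is a minimal userspace illustration, not the kernel's code: the `_sketch` names and the two flag values are hypothetical, and only the 'P' and 'O' letters are taken from this log. The point is that a helper called from always-built code must itself be defined outside the conditional section.

```c
#include <stddef.h>

/* Hypothetical stand-ins for two taint bits. */
#define TAINT_P_SKETCH 0x1u
#define TAINT_O_SKETCH 0x2u

/*
 * Defined outside any #ifdef, so every caller below can see it whether or
 * not CONFIG_MODULE_UNLOAD is set. Writes one letter per set flag and
 * returns the number of letters written.
 */
static size_t module_flags_taint_sketch(unsigned int taints, char *buf)
{
	size_t l = 0;

	if (taints & TAINT_P_SKETCH)
		buf[l++] = 'P';
	if (taints & TAINT_O_SKETCH)
		buf[l++] = 'O';
	return l;
}

#ifdef CONFIG_MODULE_UNLOAD
/* Unload-only code stays conditional, but may still use the helper. */
size_t unload_info_sketch(unsigned int taints, char *buf)
{
	return module_flags_taint_sketch(taints, buf);
}
#endif

/*
 * Always built. If the helper lived inside the #ifdef above, this call
 * would be an implicit declaration whenever the option is off.
 * buf must have room for at least 4 bytes.
 */
size_t show_taint_sketch(unsigned int taints, char *buf)
{
	size_t l = module_flags_taint_sketch(taints, buf);

	buf[l++] = '\n';
	buf[l] = '\0';
	return l;
}
```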
     
  • * 'for-3.3/core' of git://git.kernel.dk/linux-block: (37 commits)
    Revert "block: recursive merge requests"
    block: Stop using macro stubs for the bio data integrity calls
    blockdev: convert some macros to static inlines
    fs: remove unneeded plug in mpage_readpages()
    block: Add BLKROTATIONAL ioctl
    block: Introduce blk_set_stacking_limits function
    block: remove WARN_ON_ONCE() in exit_io_context()
    block: an exiting task should be allowed to create io_context
    block: ioc_cgroup_changed() needs to be exported
    block: recursive merge requests
    block, cfq: fix empty queue crash caused by request merge
    block, cfq: move icq creation and rq->elv.icq association to block core
    block, cfq: restructure io_cq creation path for io_context interface cleanup
    block, cfq: move io_cq exit/release to blk-ioc.c
    block, cfq: move icq cache management to block core
    block, cfq: move io_cq lookup to blk-ioc.c
    block, cfq: move cfqd->icq_list to request_queue and add request->elv.icq
    block, cfq: reorganize cfq_io_context into generic and cfq specific parts
    block: remove elevator_queue->ops
    block: reorder elevator switch sequence
    ...

    Fix up conflicts in:
    - block/blk-cgroup.c
    Switch from can_attach_task to can_attach
    - block/cfq-iosched.c
    conflict with now removed cic index changes (we now use q->id instead)

    Linus Torvalds
     
  • * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (39 commits)
    perf tools: Fix compile error on x86_64 Ubuntu
    perf report: Fix --stdio output alignment when --showcpuutilization used
    perf annotate: Get rid of field_sep check
    perf annotate: Fix usage string
    perf kmem: Fix a memory leak
    perf kmem: Add missing closedir() calls
    perf top: Add error message for EMFILE
    perf test: Change type of '-v' option to INCR
    perf script: Add missing closedir() calls
    tracing: Fix compile error when static ftrace is enabled
    recordmcount: Fix handling of elf64 big-endian objects.
    perf tools: Add const.h to MANIFEST to make perf-tar-src-pkg work again
    perf tools: Add support for guest/host-only profiling
    perf kvm: Do guest-only counting by default
    perf top: Don't update total_period on process_sample
    perf hists: Stop using 'self' for struct hist_entry
    perf hists: Rename total_session to total_period
    x86: Add counter when debug stack is used with interrupts enabled
    x86: Allow NMIs to hit breakpoints in i386
    x86: Keep current stack in NMI breakpoints
    ...

    Linus Torvalds
     

15 Jan, 2012

2 commits

  • * 'for-linus' of git://selinuxproject.org/~jmorris/linux-security:
    capabilities: remove __cap_full_set definition
    security: remove the security_netlink_recv hook as it is equivalent to capable()
    ptrace: do not audit capability check when outputing /proc/pid/stat
    capabilities: remove task_ns_* functions
    capabitlies: ns_capable can use the cap helpers rather than lsm call
    capabilities: style only - move capable below ns_capable
    capabilites: introduce new has_ns_capabilities_noaudit
    capabilities: call has_ns_capability from has_capability
    capabilities: remove all _real_ interfaces
    capabilities: introduce security_capable_noaudit
    capabilities: reverse arguments to security_capable
    capabilities: remove the task from capable LSM hook entirely
    selinux: sparse fix: fix several warnings in the security server cod
    selinux: sparse fix: fix warnings in netlink code
    selinux: sparse fix: eliminate warnings for selinuxfs
    selinux: sparse fix: declare selinux_disable() in security.h
    selinux: sparse fix: move selinux_complete_init
    selinux: sparse fix: make selinux_secmark_refcount static
    SELinux: Fix RCU deref check warning in sel_netport_insert()

    Manually fix up a semantic mis-merge wrt security_netlink_recv():

    - the interface was removed in commit fd7784615248 ("security: remove
    the security_netlink_recv hook as it is equivalent to capable()")

    - a new user of it appeared in commit a38f7907b926 ("crypto: Add
    userspace configuration API")

    causing no automatic merge conflict, but Eric Paris pointed out the
    issue.

    Linus Torvalds
     
  • Autogenerated GPG tag for Rusty D1ADB8F1: 15EE 8D6C AB0E 7F0C F999 BFCB D920 0E6C D1AD B8F1

    * tag 'for-linus' of git://github.com/rustyrussell/linux:
    module_param: check that bool parameters really are bool.
    intelfbdrv.c: bailearly is an int module_param
    paride/pcd: fix bool verbose module parameter.
    module_param: make bool parameters really bool (drivers & misc)
    module_param: make bool parameters really bool (arch)
    module_param: make bool parameters really bool (core code)
    kernel/async: remove redundant declaration.
    printk: fix unnecessary module_param_name.
    lirc_parallel: fix module parameter description.
    module_param: avoid bool abuse, add bint for special cases.
    module_param: check type correctness for module_param_array
    modpost: use linker section to generate table.
    modpost: use a table rather than a giant if/else statement.
    modules: sysfs - export: taint, coresize, initsize
    kernel/params: replace DEBUGP with pr_debug
    module: replace DEBUGP with pr_debug
    module: struct module_ref should contains long fields
    module: Fix performance regression on modules with large symbol tables
    module: Add comments describing how the "strmap" logic works

    Fix up conflicts in scripts/mod/file2alias.c due to the new linker-
    generated table approach to adding __mod_*_device_table entries. The
    ARM sa11x0 mcp bus needed to be converted to that too.

    Linus Torvalds
     

14 Jan, 2012

2 commits

  • For a compressed image, the space required is not known until
    we finish compressing and writing all pages.
    This patch drops the check. If swap space turns out to be
    insufficient, the system can still return to normal operation
    after the swap write fails for a compressed image.

    Signed-off-by: Barry Song
    Acked-by: Pavel Machek
    Signed-off-by: Rafael J. Wysocki

    Barry Song
     
  • After commit 1eb208aea3179dd2fc0cdeea45ef869d75b4fe70, "PM: Make
    CONFIG_PM depend on (CONFIG_PM_SLEEP || CONFIG_PM_RUNTIME)", the
    files under kernel/power are not built unless CONFIG_PM_SLEEP or
    CONFIG_PM_RUNTIME is set. In particular, this causes
    kernel/power/poweroff.c to be omitted, even though it should be
    compiled, because CONFIG_MAGIC_SYSRQ is set.

    Fix the problem by causing kernel/power/Makefile to be processed
    for CONFIG_PM unset too.

    Reported-and-tested-by: Phil Oester
    Signed-off-by: Rafael J. Wysocki

    Rafael J. Wysocki
     

13 Jan, 2012

19 commits

  • When we restore a task we need to set up text, data and data heap sizes
    from userspace to the values a task had at checkpoint time. This patch
    adds auxiliary prctl codes for that.

    While most of them are purely statistical (their values only feed into
    the calculation of the /proc/<pid>/statm output), the start_brk and brk
    values are used to compute the allowed size of program data segment
    expansion, which means arbitrary changes to these values can be
    dangerous. To restrict access, the following requirements apply to the
    prctl calls:

    - The process has to have CAP_SYS_ADMIN capability granted.
    - For all opcodes except start_brk/brk members an appropriate
    VMA area must exist and should fit certain VMA flags,
    such as:
    - code segment must be executable but not writable;
    - data segment must not be executable.

    start_brk/brk values must not intersect with data segment and must not
    exceed RLIMIT_DATA resource limit.

    Still the main guard is CAP_SYS_ADMIN capability check.

    Note the kernel should be compiled with CONFIG_CHECKPOINT_RESTORE support
    otherwise these prctl calls will return -EINVAL.

    [akpm@linux-foundation.org: cache current->mm in a local, saving 200 bytes text]
    Signed-off-by: Cyrill Gorcunov
    Reviewed-by: Kees Cook
    Cc: Tejun Heo
    Cc: Andrew Vagin
    Cc: Serge Hallyn
    Cc: Pavel Emelyanov
    Cc: Vasiliy Kulikov
    Cc: KAMEZAWA Hiroyuki
    Cc: Michael Kerrisk
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Cyrill Gorcunov
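
The start_brk rule stated above (no overlap with the data segment, and heap plus data within RLIMIT_DATA) can be sketched as a pure function. This is a hedged userspace model, not the kernel's implementation: `struct mm_sketch`, `RLIM_INFINITY_SKETCH` and `start_brk_ok()` are hypothetical names, and rejecting a start above the current brk is an assumption of the sketch. The CAP_SYS_ADMIN check is not modelled.

```c
#include <stdbool.h>

/* Hypothetical subset of the address-space fields involved. */
struct mm_sketch {
	unsigned long start_data, end_data;	/* data segment */
	unsigned long start_brk, brk;		/* heap */
};

#define RLIM_INFINITY_SKETCH (~0UL)

/* Would addr be acceptable as a new start_brk under the stated rules? */
bool start_brk_ok(const struct mm_sketch *mm, unsigned long addr,
		  unsigned long rlim_data)
{
	/* Must not intersect with the data segment. */
	if (addr <= mm->end_data)
		return false;

	/* Assumption of this sketch: the heap cannot have negative size. */
	if (addr > mm->brk)
		return false;

	/* Heap plus data must stay within the RLIMIT_DATA value. */
	if (rlim_data != RLIM_INFINITY_SKETCH &&
	    (mm->brk - addr) + (mm->end_data - mm->start_data) > rlim_data)
		return false;

	return true;
}
```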
     
  • When an oops causes a panic and panic prints another backtrace it's pretty
    common to have the original oops data be scrolled away on a 80x50 screen.

    The second backtrace is quite redundant and not needed anyways.

    So don't print the panic backtrace when oops_in_progress is true.

    [akpm@linux-foundation.org: add comment]
    Signed-off-by: Andi Kleen
    Cc: Michael Holzheu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andi Kleen
     
  • The sysctl works on the current task's pid namespace, getting and setting
    its last_pid field.

    Writing is allowed for CAP_SYS_ADMIN-capable tasks, which makes it
    possible to create a task with a desired pid value. This ability is
    badly needed for checkpoint/restore in userspace.

    This approach suits all the parties for now.

    Signed-off-by: Pavel Emelyanov
    Acked-by: Tejun Heo
    Cc: Oleg Nesterov
    Cc: Cyrill Gorcunov
    Cc: "Eric W. Biederman"
    Cc: Serge Hallyn
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelyanov
     
  • When two CPUs call panic at the same time there is a possible race
    condition that can stop kdump. The first CPU calls crash_kexec() and the
    second CPU calls smp_send_stop() in panic() before crash_kexec() finished
    on the first CPU. So the second CPU stops the first CPU and therefore
    kdump fails:

    1st CPU:
    panic()->crash_kexec()->mutex_trylock(&kexec_mutex)-> do kdump

    2nd CPU:
    panic()->crash_kexec()->kexec_mutex already held by 1st CPU
    ->smp_send_stop()-> stop 1st CPU (stop kdump)

    This patch fixes the problem by introducing a spinlock in panic that
    allows only one CPU to process crash_kexec() and the subsequent panic
    code.

    All other CPUs call the weak function panic_smp_self_stop() that stops the
    CPU itself. This function can be overloaded by architecture code. For
    example "tile" can use their lower-power "nap" instruction for that.

    Signed-off-by: Michael Holzheu
    Acked-by: Chris Metcalf
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michael Holzheu
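
The "only one CPU proceeds" idea can be modelled in userspace with a C11 atomic flag standing in for the spinlock. This is a sketch under that assumption, not the kernel code; `panic_lock_sketch` and `panic_try_enter()` are hypothetical names.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Stand-in for the lock taken at the top of panic(). */
static atomic_flag panic_lock_sketch = ATOMIC_FLAG_INIT;

/*
 * Returns true for exactly one caller, which would then continue to
 * crash_kexec() and the rest of the panic code. Every later caller gets
 * false and would park itself instead (the role the commit gives to
 * panic_smp_self_stop()), so it can no longer stop the first CPU.
 */
bool panic_try_enter(void)
{
	return !atomic_flag_test_and_set(&panic_lock_sketch);
}
```

The flag is never cleared, which matches the one-way nature of a panic: once a CPU has entered, no other CPU is ever admitted.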
     
  • Currently it is possible to set the crash_size via the sysfs
    /sys/kernel/kexec_crash_size even if no crash kernel memory has been
    defined with the "crashkernel" parameter. In this case "crashk_res" is
    not initialized and crashk_res.start = crashk_res.end = 0. Unfortunately
    resource_size(&crashk_res) returns 1 in this case. This breaks the s390
    implementation of crash_(un)map_reserved_pages().

    To fix the problem the correct "old_size" is now calculated in
    crash_shrink_memory(). "old_size" is set to "0" if crashk_res is not
    initialized. With this change crash_shrink_memory() will do nothing, when
    "crashk_res" is not initialized. It will return "0" for "echo 0 >
    /sys/kernel/kexec_crash_size" and -EINVAL for "echo [not zero] >
    /sys/kernel/kexec_crash_size".

    In addition to that this patch also simplifies the "ret = -EINVAL" vs.
    "ret = 0" logic as suggested by Simon Horman.

    Signed-off-by: Michael Holzheu
    Reviewed-by: Dave Young
    Reviewed-by: WANG Cong
    Reviewed-by: Simon Horman
    Cc: Vivek Goyal
    Cc: "Eric W. Biederman"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michael Holzheu
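
The arithmetic behind this fix can be shown with a small userspace sketch. The struct and function names are hypothetical; the behaviour modelled is only what the message states: an inclusive range with start = end = 0 has size 1, the corrected old_size is 0 in that case, and a request then yields 0 for "echo 0" and -EINVAL for anything else.

```c
#include <errno.h>

/* Hypothetical stand-in for a resource with an inclusive end address. */
struct resource_sketch {
	unsigned long start;
	unsigned long end;
};

/* Inclusive-range size: an all-zero resource reports 1, not 0. */
unsigned long resource_size_sketch(const struct resource_sketch *res)
{
	return res->end - res->start + 1;
}

/* Corrected size: an uninitialized (all-zero) region counts as 0. */
unsigned long crash_old_size_sketch(const struct resource_sketch *res)
{
	return (res->end == 0) ? 0 : res->end - res->start + 1;
}

/*
 * Returns 0 when nothing needs to change, -EINVAL when the request would
 * grow the region, and 1 when the caller should go on to release memory.
 */
int crash_shrink_check_sketch(const struct resource_sketch *res,
			      unsigned long new_size)
{
	unsigned long old_size = crash_old_size_sketch(res);

	if (new_size >= old_size)
		return (new_size == old_size) ? 0 : -EINVAL;
	return 1;
}
```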
     
  • When shrinking crashkernel memory using /sys/kernel/kexec_crash_size for
    the newly added memory no RAM resource is created at the moment.

    Example:

    $ cat /proc/iomem
    00000000-bfffffff : System RAM
      00000000-005b7ac3 : Kernel code
      005b7ac4-009743bf : Kernel data
      009bb000-00a85c33 : Kernel bss
    c0000000-cfffffff : Crash kernel
    d0000000-ffffffff : System RAM

    $ echo 0 > /sys/kernel/kexec_crash_size
    $ cat /proc/iomem
    00000000-bfffffff : System RAM
      00000000-005b7ac3 : Kernel code
      005b7ac4-009743bf : Kernel data
      009bb000-00a85c33 : Kernel bss
    ...

    $ echo 0 > /sys/kernel/kexec_crash_size
    $ cat /proc/iomem
    00000000-bfffffff : System RAM
      00000000-005b7ac3 : Kernel code
      005b7ac4-009743bf : Kernel data
      009bb000-00a85c33 : Kernel bss
    c0000000-cfffffff : System RAM
    ...

    Cc: Vivek Goyal
    Cc: "Eric W. Biederman"
    Cc: Heiko Carstens
    Cc: Martin Schwidefsky
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michael Holzheu
     
  • KMSG_DUMP_KEXEC is useless because we already save kernel messages inside
    /proc/vmcore, and it is unsafe to allow modules to do other things in a
    crash dump scenario.

    [akpm@linux-foundation.org: fix powerpc build]
    Signed-off-by: WANG Cong
    Reported-by: Vivek Goyal
    Acked-by: Vivek Goyal
    Acked-by: Jarod Wilson
    Cc: "Eric W. Biederman"
    Cc: KOSAKI Motohiro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    WANG Cong
     
  • It's a very old and now unused prototype marking so just delete it.

    Neaten panic pointer argument style to keep checkpatch quiet.

    Signed-off-by: Joe Perches
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Haavard Skinnemoen
    Cc: Hans-Christian Egtvedt
    Cc: Tony Luck
    Cc: Fenghua Yu
    Acked-by: Geert Uytterhoeven
    Acked-by: Ralf Baechle
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: Martin Schwidefsky
    Cc: Heiko Carstens
    Cc: Chris Metcalf
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joe Perches
     
  • Enabling DEBUG_STRICT_USER_COPY_CHECKS causes the following warning:

    In file included from arch/x86/include/asm/uaccess.h:573,
    from kernel/kprobes.c:55:
    In function 'copy_from_user',
    inlined from 'write_enabled_file_bool' at
    kernel/kprobes.c:2191:
    arch/x86/include/asm/uaccess_64.h:65:
    warning: call to 'copy_from_user_overflow' declared with attribute warning: copy_from_user() buffer size is not provably correct

    presumably due to buf_size being signed causing GCC to fail to see that
    buf_size can't become negative.

    Signed-off-by: Stephen Boyd
    Cc: Ananth N Mavinakayanahalli
    Cc: Anil S Keshavamurthy
    Cc: David S. Miller
    Acked-by: Masami Hiramatsu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Stephen Boyd
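
The message does not show the fix itself. Assuming the remedy is to keep the length in an unsigned type, the pattern looks roughly like this userspace sketch, where `bounded_copy()` is a hypothetical helper and `memcpy()` stands in for copy_from_user():

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy at most dst_size - 1 bytes from src and NUL-terminate dst.
 * Because the clamped length is a size_t, it cannot be negative, so the
 * compiler can prove that the copy fits in the destination buffer.
 * Returns the number of bytes copied (excluding the terminator).
 */
size_t bounded_copy(char *dst, size_t dst_size, const char *src, size_t count)
{
	size_t n;

	if (dst_size == 0)
		return 0;

	n = (count < dst_size - 1) ? count : dst_size - 1;
	memcpy(dst, src, n);
	dst[n] = '\0';
	return n;
}
```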
     
  • module_param(bool) used to counter-intuitively take an int. In
    fddd5201 (mid-2009) we allowed bool or int/unsigned int using a messy
    trick.

    It's time to remove the int/unsigned int option. For this version
    it'll simply give a warning, but it'll break next kernel version.

    Signed-off-by: Rusty Russell

    Rusty Russell
     
  • It's in linux/init.h, and I'm about to change it to a bool.

    Cc: Arjan van de Ven
    Signed-off-by: Rusty Russell

    Rusty Russell
     
  • You don't need module_param_name if the name is the same!

    Cc: Yanmin Zhang
    Cc: Andrew Morton
    Signed-off-by: Rusty Russell

    Rusty Russell
     
  • For historical reasons, we allow module_param(bool) to take an int (or
    an unsigned int). That's going away.

    A few drivers really want an int: they set it to -1 and a parameter
    will set it to 0 or 1. This sucks: reading them from sysfs will give
    'Y' for both -1 and 1, but if we change it to an int, then the users
    might be broken (if they did "param" instead of "param=1").

    Use a new 'bint' parser for them.

    (ntfs has a different problem: it needs an int for debug_msgs because
    it's also exposed via sysctl.)

    Cc: Steve Glendinning
    Cc: Jean Delvare
    Cc: Guenter Roeck
    Cc: Hoang-Nam Nguyen
    Cc: Christoph Raisch
    Cc: Roland Dreier
    Cc: Sean Hefty
    Cc: Hal Rosenstock
    Cc: linux390@de.ibm.com
    Cc: Anton Altaparmakov
    Cc: Jaroslav Kysela
    Cc: Takashi Iwai
    Cc: lm-sensors@lm-sensors.org
    Cc: linux-rdma@vger.kernel.org
    Cc: linux-s390@vger.kernel.org
    Cc: linux-ntfs-dev@lists.sourceforge.net
    Cc: alsa-devel@alsa-project.org
    Acked-by: Takashi Iwai (For the sound part)
    Acked-by: Guenter Roeck (For the hwmon driver)
    Signed-off-by: Rusty Russell

    Rusty Russell
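
A rough userspace model of what a 'bint' style parser does: it accepts boolean spellings on the command line but stores the result in an int, so a driver's -1 default survives until the user sets the parameter. The function names and the exact set of accepted spellings are assumptions of this sketch, not the kernel's API.

```c
#include <stdbool.h>
#include <stddef.h>

/* Parse a boolean spelling. Returns 0 on success, -1 on bad input. */
int parse_bool_sketch(const char *val, bool *res)
{
	/* A bare "param" with no "=value" is treated as true. */
	if (val == NULL) {
		*res = true;
		return 0;
	}

	switch (val[0]) {
	case 'y': case 'Y': case '1':
		*res = true;
		return 0;
	case 'n': case 'N': case '0':
		*res = false;
		return 0;
	default:
		return -1;
	}
}

/* Boolean syntax in, int storage out: only ever writes 0 or 1. */
int set_bint_sketch(const char *val, int *storage)
{
	bool b;

	if (parse_bool_sketch(val, &b) != 0)
		return -1;	/* storage is left untouched */

	*storage = b ? 1 : 0;
	return 0;
}
```

With this shape, a driver can still initialize its variable to -1 to mean "not set by the user", while "param" and "param=1" both end up as 1.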
     
  • Recent tools do not want to use /proc to retrieve module information. A few
    values are currently missing from sysfs to replace the information available
    in /proc/modules.

    This adds /sys/module/*/{coresize,initsize,taint} attributes.

    TAINT_PROPRIETARY_MODULE (P) and TAINT_OOT_MODULE (O) flags are both always
    shown now, and do no longer exclude each other, also in /proc/modules.

    Replace the open-coded sysfs attribute initializers with the __ATTR() macro.

    Add the new attributes to Documentation/ABI.

    Cc: Lucas De Marchi
    Signed-off-by: Kay Sievers
    Signed-off-by: Rusty Russell

    Kay Sievers
     
  • Use more flexible pr_debug. This allows:

    echo "module params +p" > /dbg/dynamic_debug/control

    to turn on debug messages when needed.

    Signed-off-by: Jim Cromie
    Signed-off-by: Rusty Russell

    Jim Cromie
     
  • Use more flexible pr_debug. This allows:

    echo "module module +p" > /dbg/dynamic_debug/control

    to turn on debug messages when needed.

    Signed-off-by: Jim Cromie
    Signed-off-by: Rusty Russell

    Jim Cromie
     
  • module_ref contains two "unsigned int" fields.

    That's now too small, since some machines can open more than 2^32 files.

    Check commit 518de9b39e8 (fs: allow for more than 2^31 files) for
    reference.

    We can add an aligned(2 * sizeof(unsigned long)) attribute to force
    alloc_percpu() allocating module_ref areas in single cache lines.

    Signed-off-by: Eric Dumazet
    CC: Rusty Russell
    CC: Tejun Heo
    CC: Robin Holt
    CC: David Miller
    Signed-off-by: Rusty Russell

    Eric Dumazet
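
The layout change can be sketched as follows. The field names `incs` and `decs` and the summing helper are assumptions of this sketch (the message only says there are two counters), and the attribute syntax is the GCC/Clang spelling. Aligning the struct to its own size means one instance never straddles a cache-line boundary.

```c
#include <stdalign.h>
#include <stddef.h>

/* Two long counters, aligned so that the pair sits in one cache line. */
struct module_ref_sketch {
	unsigned long incs;
	unsigned long decs;
} __attribute__((aligned(2 * sizeof(unsigned long))));

_Static_assert(sizeof(struct module_ref_sketch) == 2 * sizeof(unsigned long),
	       "no padding expected between or after the two counters");
_Static_assert(alignof(struct module_ref_sketch) == 2 * sizeof(unsigned long),
	       "struct is aligned to its own size");

/* Sum per-CPU counters into a single reference count. */
unsigned long module_refcount_sketch(const struct module_ref_sketch *refs,
				     size_t ncpus)
{
	unsigned long incs = 0, decs = 0;
	size_t i;

	for (i = 0; i < ncpus; i++) {
		incs += refs[i].incs;
		decs += refs[i].decs;
	}
	return incs - decs;
}
```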
     
  • Looking at /proc/kallsyms, one starts to ponder whether all of the extra
    strtab-related complexity in module.c is worth the memory savings.

    Instead of making the add_kallsyms() loop even more complex, I tried the
    other route of deleting the strmap logic and naively copying each string
    into core_strtab with no consideration for consolidating duplicates.

    Performance on an "already exists" insmod of nvidia.ko (runs
    add_kallsyms() but does not actually initialize the module):

    Original scheme: 1.230s
    With naive copying: 0.058s

    Extra space used: 35k (of a 408k module).

    Signed-off-by: Kevin Cernekee
    Signed-off-by: Rusty Russell
    LKML-Reference:

    Kevin Cernekee
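
The "naive copying" route amounts to appending every symbol name to the string table back to back, with no attempt to share storage between duplicates. A minimal userspace sketch, with hypothetical names and signature:

```c
#include <stddef.h>
#include <string.h>

/*
 * Append each name (including its NUL terminator) to strtab and record
 * where it landed. Duplicate names are simply stored twice.
 * Returns the number of bytes used, or 0 if the table is too small.
 */
size_t copy_names_naive(const char *const *names, size_t n,
			char *strtab, size_t cap, size_t *offsets)
{
	size_t used = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		size_t len = strlen(names[i]) + 1;

		if (len > cap - used)
			return 0;
		memcpy(strtab + used, names[i], len);
		offsets[i] = used;
		used += len;
	}
	return used;
}
```

The cost is one linear pass over the names, which is consistent with the trade reported above: some extra space in exchange for a much faster load.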
     
  • Signed-off-by: Kevin Cernekee
    Signed-off-by: Rusty Russell

    Kevin Cernekee
     

12 Jan, 2012

3 commits

  • * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    sched: Fix lockup by limiting load-balance retries on lock-break
    sched: Fix CONFIG_CGROUP_SCHED dependency
    sched: Remove empty #ifdefs

    Linus Torvalds
     
  • * 'x86-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    x86, reboot: Fix typo in nmi reboot path
    x86, NMI: Add to_cpumask() to silence compile warning
    x86, NMI: NMI selftest depends on the local apic
    x86: Add stack top margin for stack overflow checking
    x86, NMI: NMI-selftest should handle the UP case properly
    x86: Fix the 32-bit stackoverflow-debug build
    x86, NMI: Add knob to disable using NMI IPIs to stop cpus
    x86, NMI: Add NMI IPI selftest
    x86, reboot: Use NMI instead of REBOOT_VECTOR to stop cpus
    x86: Clean up the range of stack overflow checking
    x86: Panic on detection of stack overflow
    x86: Check stack overflow in detail

    Linus Torvalds
     
  • Eric and David reported dead machines and traced it to commit
    a195f004 ("sched: Fix load-balance lock-breaking"), it turns out
    there's still a scenario where we can end up re-trying forever.

    Since there is no strict forward progress guarantee in the
    load-balance iteration we can get stuck re-retrying the same
    task-set over and over.

    Creating a forward progress guarantee with the existing
    structure is somewhat non-trivial, for now simply terminate the
    retry loop after a few tries.

    Reported-by: Eric Dumazet
    Tested-by: Eric Dumazet
    Reported-by: David Ahern
    [ logic cleanup as suggested by Eric ]
    Signed-off-by: Peter Zijlstra
    Cc: Linus Torvalds
    Cc: Martin Schwidefsky
    Cc: Frederic Weisbecker
    Cc: Suresh Siddha
    Link: http://lkml.kernel.org/r/1326297936.2442.157.camel@twins
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

11 Jan, 2012

8 commits

  • * 'writeback-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux:
    writeback: move MIN_WRITEBACK_PAGES to fs-writeback.c
    writeback: balanced_rate cannot exceed write bandwidth
    writeback: do strict bdi dirty_exceeded
    writeback: avoid tiny dirty poll intervals
    writeback: max, min and target dirty pause time
    writeback: dirty ratelimit - think time compensation
    btrfs: fix dirtied pages accounting on sub-page writes
    writeback: fix dirtied pages accounting on redirty
    writeback: fix dirtied pages accounting on sub-page writes
    writeback: charge leaked page dirties to active tasks
    writeback: Include all dirty inodes in background writeback

    Linus Torvalds
     
  • Andrew elucidates:
    - First installment of MM. We have a HUGE number of MM patches this
    time. It's crazy.
    - MAINTAINERS updates
    - backlight updates
    - leds
    - checkpatch updates
    - misc ELF stuff
    - rtc updates
    - reiserfs
    - procfs
    - some misc other bits

    * akpm: (124 commits)
    user namespace: make signal.c respect user namespaces
    workqueue: make alloc_workqueue() take printf fmt and args for name
    procfs: add hidepid= and gid= mount options
    procfs: parse mount options
    procfs: introduce the /proc/<pid>/map_files/ directory
    procfs: make proc_get_link to use dentry instead of inode
    signal: add block_sigmask() for adding sigmask to current->blocked
    sparc: make SA_NOMASK a synonym of SA_NODEFER
    reiserfs: don't lock root inode searching
    reiserfs: don't lock journal_init()
    reiserfs: delay reiserfs lock until journal initialization
    reiserfs: delete comments referring to the BKL
    drivers/rtc/interface.c: fix alarm rollover when day or month is out-of-range
    drivers/rtc/rtc-twl.c: add DT support for RTC inside twl4030/twl6030
    drivers/rtc/: remove redundant spi driver bus initialization
    drivers/rtc/rtc-jz4740.c: make jz4740_rtc_driver static
    drivers/rtc/rtc-mc13xxx.c: make mc13xxx_rtc_idtable static
    rtc: convert drivers/rtc/* to use module_platform_driver()
    drivers/rtc/rtc-wm831x.c: convert to devm_kzalloc()
    drivers/rtc/rtc-wm831x.c: remove unused period IRQ handler
    ...

    Linus Torvalds
     
  • ipc/mqueue.c: for __SI_MESQ, convert the uid being sent to recipient's
    user namespace. (new, thanks Oleg)

    __send_signal: convert current's uid to the recipient's user namespace
    for any siginfo which is not SI_FROMKERNEL (patch from Oleg, thanks
    again :)

    do_notify_parent and do_notify_parent_cldstop: map task's uid to parent's
    user namespace

    ptrace_signal maps parent's uid into current's user namespace before
    including in signal to current. IIUC Oleg has argued that this shouldn't
    matter as the debugger will play with it, but it seems like not converting
    the value currently being set is misleading.

    Changelog:
    Sep 20: Inspired by Oleg's suggestion, define map_cred_ns() helper to
    simplify callers and help make clear what we are translating
    (which uid into which namespace). Passing the target task would
    make callers even easier to read, but we pass in user_ns because
    current_user_ns() != task_cred_xxx(current, user_ns).
    Sep 20: As recommended by Oleg, also put task_pid_vnr() under rcu_read_lock
    in ptrace_signal().
    Sep 23: In send_signal(), detect when (user) signal is coming from an
    ancestor or unrelated user namespace. Pass that on to __send_signal,
    which sets si_uid to 0 or overflowuid if needed.
    Oct 12: Base on Oleg's fixup_uid() patch. On top of that, handle all
    SI_FROMKERNEL cases at callers, because we can't assume sender is
    current in those cases.
    Nov 10: (mhelsley) rename fixup_uid to more meaningful userns_fixup_signal_uid
    Nov 10: (akpm) make the !CONFIG_USER_NS case clearer

    Signed-off-by: Serge Hallyn
    Cc: Oleg Nesterov
    Cc: Matt Helsley
    Cc: "Eric W. Biederman"
    From: Serge Hallyn
    Subject: __send_signal: pass q->info, not info, to userns_fixup_signal_uid (v2)

    Eric Biederman pointed out that passing info is a bug and could lead to a
    NULL pointer deref to boot.

    A collection of signal, securebits, filecaps, cap_bounds, and a few other
    ltp tests passed with this kernel.

    Changelog:
    Nov 18: previous patch missed a leading '&'

    Signed-off-by: Serge Hallyn
    Cc: "Eric W. Biederman"
    From: Dan Carpenter
    Subject: ipc/mqueue: lock() => unlock() typo

    There was a double lock typo introduced in b085f4bd6b21 "user namespace:
    make signal.c respect user namespaces"

    Signed-off-by: Dan Carpenter
    Cc: Oleg Nesterov
    Cc: Matt Helsley
    Cc: "Eric W. Biederman"
    Acked-by: Serge Hallyn
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Serge E. Hallyn
     
  • alloc_workqueue() currently expects the passed in @name pointer to remain
    accessible. This is inconvenient and a bit silly given that the whole wq
    is being dynamically allocated. This patch updates alloc_workqueue() and
    friends to take printf format string instead of opaque string and matching
    varargs at the end. The name is allocated together with the wq and
    formatted.

    alloc_ordered_workqueue() is converted to a macro to unify varargs
    handling with alloc_workqueue(), and, while at it, add comment to
    alloc_workqueue().

    None of the current in-kernel users pass in string with '%' as constant
    name and this change shouldn't cause any problem.

    [akpm@linux-foundation.org: use __printf]
    Signed-off-by: Tejun Heo
    Suggested-by: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tejun Heo
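
The "name is allocated together with the wq and formatted" idea can be sketched in userspace with a flexible array member and vsnprintf(). The struct, the function name and the flags field are hypothetical; the format attribute is the GCC/Clang equivalent of the __printf annotation mentioned above.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

struct wq_sketch {
	unsigned int flags;
	char name[];	/* formatted name is stored right after the struct */
};

struct wq_sketch *wq_alloc_sketch(unsigned int flags, const char *fmt, ...)
	__attribute__((format(printf, 2, 3)));

struct wq_sketch *wq_alloc_sketch(unsigned int flags, const char *fmt, ...)
{
	va_list args, args_copy;
	struct wq_sketch *wq;
	int len;

	va_start(args, fmt);
	va_copy(args_copy, args);

	/* First pass: measure how long the formatted name will be. */
	len = vsnprintf(NULL, 0, fmt, args);
	va_end(args);
	if (len < 0) {
		va_end(args_copy);
		return NULL;
	}

	/* One allocation holds both the struct and its name. */
	wq = malloc(sizeof(*wq) + (size_t)len + 1);
	if (wq != NULL) {
		wq->flags = flags;
		vsnprintf(wq->name, (size_t)len + 1, fmt, args_copy);
	}
	va_end(args_copy);
	return wq;
}
```

A caller no longer has to keep its own name buffer alive: `wq_alloc_sketch(0, "dev%d-wq", 3)` yields an object whose name is "dev3-wq" and which is released with a single free().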
     
  • Abstract the code sequence for adding a signal handler's sa_mask to
    current->blocked because the sequence is identical for all architectures.
    Furthermore, in the past some architectures actually got this code wrong,
    so introduce a wrapper that all architectures can use.

    Signed-off-by: Matt Fleming
    Signed-off-by: Oleg Nesterov
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: H. Peter Anvin
    Cc: Tejun Heo
    Cc: "David S. Miller"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matt Fleming
     
  • oom_score_adj is used to guard processes from the OOM killer. One
    problem is that it is inherited at fork(): when a daemon sets
    oom_score_adj and spawns children, it is hard to tell where the value
    was set.

    This patch adds three tracepoints that are useful for debugging:
    - creating new task
    - renaming a task (exec)
    - set oom_score_adj

    To debug, users need to enable some tracepoints. Filtering may be useful, for example:

    # EVENT=/sys/kernel/debug/tracing/events/task/
    # echo "oom_score_adj != 0" > $EVENT/task_newtask/filter
    # echo "oom_score_adj != 0" > $EVENT/task_rename/filter
    # echo 1 > $EVENT/enable
    # EVENT=/sys/kernel/debug/tracing/events/oom/
    # echo 1 > $EVENT/enable

    The output will look like this:
    # grep oom /sys/kernel/debug/tracing/trace
    bash-7699 [007] d..3 5140.744510: oom_score_adj_update: pid=7699 comm=bash oom_score_adj=-1000
    bash-7699 [007] ...1 5151.818022: task_newtask: pid=7729 comm=bash clone_flags=1200011 oom_score_adj=-1000
    ls-7729 [003] ...2 5151.818504: task_rename: pid=7729 oldcomm=bash newcomm=ls oom_score_adj=-1000
    bash-7699 [002] ...1 5175.701468: task_newtask: pid=7730 comm=bash clone_flags=1200011 oom_score_adj=-1000
    grep-7730 [007] ...2 5175.701993: task_rename: pid=7730 oldcomm=bash newcomm=grep oom_score_adj=-1000

    Signed-off-by: KAMEZAWA Hiroyuki
    Cc: KOSAKI Motohiro
    Acked-by: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     
  • When debugging with CONFIG_DEBUG_PAGEALLOC and debug_guardpage_minorder >
    0, we have a lot of free pages that are not marked as such. The snapshot
    code counts them as savable, which causes hibernate memory preallocation
    to fail.

    It is pretty hard to make the hibernate allocation succeed with
    debug_guardpage_minorder=1. This change at least makes it possible when
    the system has a relatively large amount of RAM.

    Signed-off-by: Stanislaw Gruszka
    Acked-by: Rafael J. Wysocki
    Cc: Andrea Arcangeli
    Cc: Christoph Lameter
    Cc: Mel Gorman
    Cc: Pekka Enberg
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Stanislaw Gruszka
     
  • * 'kvm-updates/3.3' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (74 commits)
    KVM: PPC: Whitespace fix for kvm.h
    KVM: Fix whitespace in kvm_para.h
    KVM: PPC: annotate kvm_rma_init as __init
    KVM: x86 emulator: implement RDPMC (0F 33)
    KVM: x86 emulator: fix RDPMC privilege check
    KVM: Expose the architectural performance monitoring CPUID leaf
    KVM: VMX: Intercept RDPMC
    KVM: SVM: Intercept RDPMC
    KVM: Add generic RDPMC support
    KVM: Expose a version 2 architectural PMU to a guests
    KVM: Expose kvm_lapic_local_deliver()
    KVM: x86 emulator: Use opcode::execute for Group 9 instruction
    KVM: x86 emulator: Use opcode::execute for Group 4/5 instructions
    KVM: x86 emulator: Use opcode::execute for Group 1A instruction
    KVM: ensure that debugfs entries have been created
    KVM: drop bsp_vcpu pointer from kvm struct
    KVM: x86: Consolidate PIT legacy test
    KVM: x86: Do not rely on implicit inclusions
    KVM: Make KVM_INTEL depend on CPU_SUP_INTEL
    KVM: Use memdup_user instead of kmalloc/copy_from_user
    ...

    Linus Torvalds
     

10 Jan, 2012

2 commits

  • Signed-off-by: Hiroshi Shimamoto
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/4F0B8525.8070901@ct.jp.nec.com
    Signed-off-by: Ingo Molnar

    Hiroshi Shimamoto
     
  • * 'for-3.3' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (21 commits)
    cgroup: fix to allow mounting a hierarchy by name
    cgroup: move assignement out of condition in cgroup_attach_proc()
    cgroup: Remove task_lock() from cgroup_post_fork()
    cgroup: add sparse annotation to cgroup_iter_start() and cgroup_iter_end()
    cgroup: mark cgroup_rmdir_waitq and cgroup_attach_proc() as static
    cgroup: only need to check oldcgrp==newgrp once
    cgroup: remove redundant get/put of task struct
    cgroup: remove redundant get/put of old css_set from migrate
    cgroup: Remove unnecessary task_lock before fetching css_set on migration
    cgroup: Drop task_lock(parent) on cgroup_fork()
    cgroups: remove redundant get/put of css_set from css_set_check_fetched()
    resource cgroups: remove bogus cast
    cgroup: kill subsys->can_attach_task(), pre_attach() and attach_task()
    cgroup, cpuset: don't use ss->pre_attach()
    cgroup: don't use subsys->can_attach_task() or ->attach_task()
    cgroup: introduce cgroup_taskset and use it in subsys->can_attach(), cancel_attach() and attach()
    cgroup: improve old cgroup handling in cgroup_attach_proc()
    cgroup: always lock threadgroup during migration
    threadgroup: extend threadgroup_lock() to cover exit and exec
    threadgroup: rename signal->threadgroup_fork_lock to ->group_rwsem
    ...

    Fix up conflict in kernel/cgroup.c due to commit e0197aae59e5: "cgroups:
    fix a css_set not found bug in cgroup_attach_proc" that already
    mentioned that the bug is fixed (differently) in Tejun's cgroup
    patchset. This one, in other words.

    Linus Torvalds