12 Sep, 2013

40 commits

  • The patch "s390/vmcore: Implement remap_oldmem_pfn_range for s390" now
    allows mmap to be used on s390 as well.

    So enable mmap for s390 again.

    Signed-off-by: Michael Holzheu
    Cc: HATAYAMA Daisuke
    Cc: Jan Willeke
    Cc: Vivek Goyal
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michael Holzheu
     
  • Introduce the s390 specific way to map pages from oldmem. The memory area
    below OLDMEM_SIZE is mapped with offset OLDMEM_BASE. The other old memory
    is mapped directly.
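
    As a rough illustration of the rule above, the address translation can be
    sketched in plain user-space C. This is not the s390 implementation: the
    function name sketch_oldmem_to_real() and its parameters are assumptions
    made for the example; only the OLDMEM_BASE/OLDMEM_SIZE rule comes from the
    description.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: translate an old-kernel address into the address
 * where its contents can be found now.
 * Addresses below oldmem_size are found at an offset of oldmem_base,
 * everything else is used directly.
 */
uint64_t sketch_oldmem_to_real(uint64_t addr, uint64_t oldmem_base,
                               uint64_t oldmem_size)
{
    if (addr < oldmem_size) {
        /* Guard against wrap-around in this sketch. */
        assert(oldmem_base <= UINT64_MAX - addr);
        return addr + oldmem_base;
    }
    return addr;
}
```

    With a base of 0x10000000 and a size of 0x8000000, address 0x1000 is
    read at 0x10001000, while address 0x8000000 is used unchanged.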

    Signed-off-by: Jan Willeke
    Signed-off-by: Michael Holzheu
    Cc: HATAYAMA Daisuke
    Cc: Vivek Goyal
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Willeke
     
  • For zfcpdump we can't map the HSA storage because it is only available via
    a read interface. Therefore, for the new vmcore mmap feature we have to
    introduce a new mechanism to create mappings on demand.

    This patch introduces a new architecture function remap_oldmem_pfn_range()
    that should be used to create mappings with remap_pfn_range() for oldmem
    areas that can be directly mapped. For zfcpdump this is everything
    except the HSA memory. For the areas that are not mapped by
    remap_oldmem_pfn_range(), a new generic vmcore fault handler
    mmap_vmcore_fault() is called.

    This handler works as follows:

    * Get already available or new page from page cache (find_or_create_page)
    * Check if /proc/vmcore page is filled with data (PageUptodate)
    * If yes:
        Return that page
    * If no:
        Fill page using __vmcore_read(), set PageUptodate, and return page
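
    The handler steps listed above can be modelled in user space roughly as
    follows. This is only a sketch under stated assumptions: the array stands
    in for the page cache, sketch_read() stands in for __vmcore_read(), and
    all sketch_* names and sizes are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096
#define SKETCH_NR_PAGES  4

/* Stand-in for a page cache entry; "uptodate" models PageUptodate. */
struct sketch_page {
    bool present;
    bool uptodate;
    unsigned char data[SKETCH_PAGE_SIZE];
};

static struct sketch_page sketch_cache[SKETCH_NR_PAGES];
int sketch_fill_count; /* how often the slow read path ran */

/* Stand-in for __vmcore_read(): fills the buffer with a per-page pattern. */
void sketch_read(unsigned char *buf, size_t index)
{
    memset(buf, (int)(index + 1), SKETCH_PAGE_SIZE);
    sketch_fill_count++;
}

/* Return the cached page for "index", filling it on first access only. */
struct sketch_page *sketch_fault(size_t index)
{
    struct sketch_page *page;

    assert(index < SKETCH_NR_PAGES);
    page = &sketch_cache[index];   /* find_or_create_page() analogue */
    page->present = true;
    if (!page->uptodate) {         /* PageUptodate() analogue */
        sketch_read(page->data, index);
        page->uptodate = true;     /* SetPageUptodate() analogue */
    }
    return page;
}
```

    A second fault on the same index returns the same page without calling
    the read path again, which is the point of the PageUptodate check.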

    Signed-off-by: Michael Holzheu
    Acked-by: Vivek Goyal
    Cc: HATAYAMA Daisuke
    Cc: Jan Willeke
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michael Holzheu
     
  • Replace the old relocation mechanism with the new arch function call
    override mechanism, which allows the ELF core header to be created in
    the 2nd kernel.

    Signed-off-by: Michael Holzheu
    Cc: HATAYAMA Daisuke
    Cc: Jan Willeke
    Cc: Vivek Goyal
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michael Holzheu
     
  • For s390 we want to use /proc/vmcore for our SCSI stand-alone dump
    (zfcpdump). We have support where the first HSA_SIZE bytes are saved into
    a hypervisor owned memory area (HSA) before the kdump kernel is booted.
    When the kdump kernel starts, it is restricted to use only HSA_SIZE bytes.

    The advantages of this mechanism are:

    * No crashkernel memory has to be defined in the old kernel.
    * Early boot problems (before kexec_load has been done) can be dumped.
    * Non-Linux systems can be dumped.

    We modify the s390 copy_oldmem_page() function to read from the HSA memory
    if memory below HSA_SIZE bytes is requested.

    Since we cannot use the kexec tool to load the kernel in this scenario,
    we have to build the ELF header in the 2nd (kdump/new) kernel.

    So with the following patch set we would like to introduce a new
    feature that allows the ELF header for /proc/vmcore to be created in
    the 2nd kernel's memory.

    The following steps are done during zfcpdump execution:

    1. Production system crashes
    2. User boots a SCSI disk that has been prepared with the zfcpdump tool
    3. Hypervisor saves CPU state of boot CPU and HSA_SIZE bytes of memory into HSA
    4. Boot loader loads kernel into low memory area
    5. Kernel boots and uses only HSA_SIZE bytes of memory
    6. Kernel saves registers of non-boot CPUs
    7. Kernel does memory detection for dump memory map
    8. Kernel creates ELF header for /proc/vmcore
    9. /proc/vmcore uses this header for initialization
    10. The zfcpdump user space reads /proc/vmcore to write dump to SCSI disk
    - copy_oldmem_page() copies from HSA for memory below HSA_SIZE
    - copy_oldmem_page() copies from real memory for memory above HSA_SIZE
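
    The two copy_oldmem_page() cases in step 10 can be illustrated with a
    small user-space model. It is a sketch, not the s390 code: the two
    arrays stand in for the HSA and for real memory, and the sketch_* names
    and the tiny sizes are assumptions chosen to keep the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_HSA_SIZE 16
#define SKETCH_MEM_SIZE 64

unsigned char sketch_hsa[SKETCH_HSA_SIZE];   /* saved copy of low memory */
unsigned char sketch_real[SKETCH_MEM_SIZE];  /* directly readable memory */

/*
 * Copy "count" bytes starting at old-memory address "src" into "dst".
 * Bytes below SKETCH_HSA_SIZE come from the HSA copy, the rest from
 * real memory. A request may straddle the boundary.
 */
void sketch_copy_oldmem(unsigned char *dst, size_t src, size_t count)
{
    assert(src <= SKETCH_MEM_SIZE && count <= SKETCH_MEM_SIZE - src);

    if (src < SKETCH_HSA_SIZE) {
        size_t from_hsa = SKETCH_HSA_SIZE - src;

        if (from_hsa > count)
            from_hsa = count;
        memcpy(dst, sketch_hsa + src, from_hsa);
        dst += from_hsa;
        src += from_hsa;
        count -= from_hsa;
    }
    if (count > 0)
        memcpy(dst, sketch_real + src, count);
}
```

    A read that starts just below the HSA boundary is served partly from
    the HSA copy and partly from real memory.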

    Currently for s390 we create the ELF core header in the 2nd kernel with a
    small trick. We relocate the addresses in the ELF header in a way that
    for the /proc/vmcore code it seems to be in the 1st kernel (old) memory
    and the read_from_oldmem() returns the correct data. This allows the
    /proc/vmcore code to use the ELF header in the 2nd kernel.

    This patch:

    Replace the old mechanism with the new and much cleaner function call
    override feature that now officially allows the ELF core header to be
    created in the 2nd kernel.

    To use the new feature, the following functions have to be defined
    by the architecture backend code to read from new memory:

    * elfcorehdr_alloc: Allocate ELF header
    * elfcorehdr_free: Free the memory of the ELF header
    * elfcorehdr_read: Read from ELF header
    * elfcorehdr_read_notes: Read from ELF notes

    Signed-off-by: Michael Holzheu
    Acked-by: Vivek Goyal
    Cc: HATAYAMA Daisuke
    Cc: Jan Willeke
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michael Holzheu
     
  • This code can never be reached, so remove the unnecessary return.

    Signed-off-by: Xishi Qiu
    Suggested-by: Zhang Yanfei
    Reviewed-by: Simon Horman
    Reviewed-by: Zhang Yanfei
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Xishi Qiu
     
  • The error handling and ret-from-loop look confusing and inconsistent.

    - "retval >= 0" simply returns

    - "!bprm->file" returns too but with read_unlock() because
    binfmt_lock was already re-acquired

    - "retval != -ENOEXEC || bprm->mm == NULL" does "break" and
    relies on the same check after the main loop

    Consolidate these checks into a single if/return statement.

    need_retry still checks "retval == -ENOEXEC", but this and -ENOENT before
    the main loop are not needed. This is only for the pathological and
    impossible list_empty(&formats) case.

    It is not clear why we check "bprm->mm == NULL"; probably this
    should be removed.
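
    The consolidated condition can be written as a single predicate. The
    following user-space sketch only mirrors the three checks quoted above;
    the function name and the boolean parameters (standing in for bprm->mm
    and bprm->file) are assumptions made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical predicate: after one ->load_binary() attempt, decide
 * whether the format search should return "retval" right away instead
 * of trying the next format.
 */
bool sketch_search_done(int retval, bool has_mm, bool has_file)
{
    return retval >= 0 || retval != -ENOEXEC || !has_mm || !has_file;
}
```

    Only the combination of -ENOEXEC with both mm and file still present
    lets the search continue.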

    Signed-off-by: Oleg Nesterov
    Acked-by: Kees Cook
    Cc: Al Viro
    Cc: Evgeniy Polyakov
    Cc: Zach Levis
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • A separate one-liner for better documentation.

    It doesn't make sense to retry if request_module() fails to exec
    /sbin/modprobe, add the additional "request_module() < 0" check.

    However, this logic still doesn't look exactly right:

    1. It would be better to check "request_module() != 0", the user
    space modprobe process should report the correct exit code.
    But I didn't dare to add the user-visible change.

    2. The whole ENOEXEC logic looks suboptimal. Suppose that we try
    to exec a "#!path-to-unsupported-binary" script. In this case
    request_module() + "retry" will be done twice: first by the
    "depth == 1" code, and then again by the "depth == 0" caller
    which doesn't make sense.

    3. And note that in the case above bprm->buf was already changed
    by load_script()->prepare_binprm(), so this looks even more
    ugly.

    Signed-off-by: Oleg Nesterov
    Acked-by: Kees Cook
    Cc: Al Viro
    Cc: Evgeniy Polyakov
    Cc: Zach Levis
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • search_binary_handler() uses "for (try=0; try
    Acked-by: Kees Cook
    Cc: Al Viro
    Cc: Evgeniy Polyakov
    Cc: Zach Levis
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • search_binary_handler() checks ->load_binary != NULL for no reason, this
    method should be always defined. Turn this check into WARN_ON() and move
    it into __register_binfmt().

    Also, kill the function pointer. The current code looks confusing, as if
    ->load_binary can go away after read_unlock(&binfmt_lock). But we rely on
    module_get(fmt->module), this fmt can't be changed or unregistered,
    otherwise this code is buggy anyway.

    Signed-off-by: Oleg Nesterov
    Acked-by: Kees Cook
    Cc: Al Viro
    Cc: Evgeniy Polyakov
    Cc: Zach Levis
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • When search_binary_handler() succeeds it does allow_write_access() and
    fput(), then it clears bprm->file to ensure the caller will not do the
    same.

    We can simply move this code to exec_binprm() which is called only once.
    In fact we could move this to free_bprm() and remove the same code in
    do_execve_common's error path.

    Signed-off-by: Oleg Nesterov
    Acked-by: Kees Cook
    Cc: Al Viro
    Cc: Evgeniy Polyakov
    Cc: Zach Levis
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • A separate one-liner with the minor fix.

    PROC_EVENT_EXEC reports the "exec" event, but this message is sent at
    least twice if search_binary_handler() is called by ->load_binary()
    recursively, say, load_script().

    Move it to exec_binprm(), this is "depth == 0" code too.

    Signed-off-by: Oleg Nesterov
    Acked-by: Kees Cook
    Cc: Al Viro
    Cc: Evgeniy Polyakov
    Cc: Zach Levis
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • Nobody except search_binary_handler() should touch ->recursion_depth, "int
    depth" buys nothing but complicates the code, kill it.

    Probably we should also kill "fn" and the !NULL check, ->load_binary
    should be always defined. And it can not go away after read_unlock() or
    this code is buggy anyway.

    Signed-off-by: Oleg Nesterov
    Acked-by: Kees Cook
    Cc: Al Viro
    Cc: Evgeniy Polyakov
    Cc: Zach Levis
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • task_pid_nr_ns() and trace/ptrace code in the middle of the recursive
    search_binary_handler() looks confusing and imho annoying. We only need
    this code if "depth == 0", let's add a simple helper which calls
    search_binary_handler() and does trace_sched_process_exec() +
    ptrace_event().

    The patch also moves the setting of task->did_exec, we need to do this
    only once.

    Note: we can kill either task->did_exec or PF_FORKNOEXEC.

    Signed-off-by: Oleg Nesterov
    Acked-by: Kees Cook
    Cc: Al Viro
    Cc: Evgeniy Polyakov
    Cc: Zach Levis
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • proc_fd_permission() says "process can still access /proc/self/fd after it
    has executed a setuid()", but the "task_pid() = proc_pid()" check only
    helps if the task is group leader, /proc/self points to
    /proc/.

    Change this check to use task_tgid() so that the whole thread group can
    access its /proc/self/fd or /proc//fd.

    Notes:
    - CLONE_THREAD does not require CLONE_FILES so task->files
    can differ, but I don't think this can lead to any security
    problem. And this matches same_thread_group() in
    __ptrace_may_access().

    - /proc/self should probably point to /proc/, but
    it is too late to change the rules. Perhaps it makes sense
    to add /proc/thread though.

    Test-case:

    #include <assert.h>
    #include <dirent.h>
    #include <pthread.h>

    void *tfunc(void *arg)
    {
        assert(opendir("/proc/self/fd"));
        return NULL;
    }

    int main(void)
    {
        pthread_t t;
        pthread_create(&t, NULL, tfunc, NULL);
        pthread_join(t, NULL);
        return 0;
    }

    fails if, say, this executable is not readable and suid_dumpable = 0.

    Signed-off-by: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • mpol_to_str() may fail (e.g. with -EINVAL) without filling the buffer, so
    its return value needs to be checked. Otherwise the buffer may not be
    zero-terminated and the next seq_printf() will cause problems.

    The failure return has to come after mpol_cond_put() to match
    get_vma_policy().

    Signed-off-by: Chen Gang
    Cc: Cyrill Gorcunov
    Cc: Mel Gorman
    Cc: Andi Kleen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Chen Gang
     
  • Fix mistake in the description of Committed_AS in kernel documentation.

    Signed-off-by: Minto Joseph
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minto Joseph
     
  • Cc: "Eric W. Biederman"
    Cc: Andrey Vagin
    Cc: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • Add a new %P variable to be used in core_pattern. This variable contains
    the global PID (PID in the init namespace) as %p contains the PID in the
    current namespace which isn't always what we want.

    The main use for this is to make it easier to handle crashes that happened
    within a container. With this new variable it's possible to have the
    crashes dumped into the container or forwarded to the host with the right
    PID (from the host's point of view).
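
    The difference between %p and %P can be shown with a small user-space
    expansion routine. This is a sketch, not the kernel's core_pattern
    formatter: sketch_expand() is a hypothetical name, it handles only
    these two specifiers, and every other character is copied unchanged.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Expand %p (PID in the current namespace) and %P (PID in the init
 * namespace) in "pattern". Returns 0 on success, -1 if "out" is too small.
 */
int sketch_expand(const char *pattern, int ns_pid, int global_pid,
                  char *out, size_t outlen)
{
    size_t used = 0;

    assert(pattern != NULL && out != NULL);
    if (outlen == 0)
        return -1;
    out[0] = '\0';

    for (const char *c = pattern; *c != '\0'; c++) {
        size_t room = outlen - used;
        int n;

        if (c[0] == '%' && (c[1] == 'p' || c[1] == 'P')) {
            n = snprintf(out + used, room, "%d",
                         c[1] == 'p' ? ns_pid : global_pid);
            c++; /* skip the specifier letter */
        } else {
            n = snprintf(out + used, room, "%c", *c);
        }
        if (n < 0 || (size_t)n >= room)
            return -1;
        used += (size_t)n;
    }
    return 0;
}
```

    For a process that is PID 7 inside its container and PID 4242 on the
    host, the pattern "core.%p.%P" expands to "core.7.4242".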

    Signed-off-by: Stéphane Graber
    Reported-by: Hans Feldt
    Cc: Alexander Viro
    Cc: Eric W. Biederman
    Cc: Andy Whitcroft
    Acked-by: Serge E. Hallyn
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Stéphane Graber
     
  • __ptrace_may_access() checks get_dumpable/ptrace_has_cap/etc if task !=
    current, this can lead to surprising results.

    For example, a sub-thread can't readlink("/proc/self/exe") if the
    executable is not readable. setup_new_exec()->would_dump() notices that
    inode_permission(MAY_READ) fails and then it does
    set_dumpable(suid_dumpable). After that get_dumpable() fails.

    (It is not clear why proc_pid_readlink() checks get_dumpable(), perhaps we
    could add PTRACE_MODE_NODUMPABLE)

    Change __ptrace_may_access() to use same_thread_group() instead of "task
    == current". Any security check is pointless when the tasks share the
    same ->mm.
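
    The effect of the change can be modelled with two tiny predicates. This
    is a simplified sketch: the kernel compares task structures, while the
    hypothetical struct sketch_task below uses plain pid/tgid numbers to
    stand in for "same task" and "same thread group".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal stand-in for a task: its own id and its thread group id. */
struct sketch_task {
    int pid;
    int tgid;
};

/* Old rule: the checks are skipped only for the very same task. */
bool sketch_skip_checks_old(const struct sketch_task *cur,
                            const struct sketch_task *target)
{
    assert(cur != NULL && target != NULL);
    return cur->pid == target->pid;
}

/* New rule: the checks are skipped for any task in the same group. */
bool sketch_skip_checks_new(const struct sketch_task *cur,
                            const struct sketch_task *target)
{
    assert(cur != NULL && target != NULL);
    return cur->tgid == target->tgid;
}
```

    A sub-thread looking at its group leader fails the old test but passes
    the new one, while unrelated processes are still subject to the checks.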

    Signed-off-by: Mark Grondona
    Signed-off-by: Ben Woodard
    Signed-off-by: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mark Grondona
     
  • Integrate the implemented POSIX ACL support into the hfsplus driver.

    Signed-off-by: Vyacheslav Dubeyko
    Cc: Al Viro
    Cc: Christoph Hellwig
    Cc: Hin-Tak Leung
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vyacheslav Dubeyko
     
  • Implement POSIX ACL support in the hfsplus driver.

    Signed-off-by: Vyacheslav Dubeyko
    Cc: Al Viro
    Cc: Christoph Hellwig
    Cc: Hin-Tak Leung
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vyacheslav Dubeyko
     
  • This patchset implements POSIX ACL support in the hfsplus driver.

    Mac OS X beginning with version 10.4 ("Tiger") supports NFSv4 ACLs, which
    are part of the NFSv4 standard. HFS+ stores ACLs in the form of
    specially named extended attributes (com.apple.system.Security).

    But this patchset doesn't use "com.apple.system.Security" extended
    attributes. It implements support of POSIX ACLs in the form of extended
    attributes with names "system.posix_acl_access" and
    "system.posix_acl_default". These xattrs are handled only under Linux.
    POSIX ACLs have no meaning under Mac OS X. This patch set therefore
    makes it possible to use POSIX ACLs under Linux on an HFS+
    filesystem.

    This patch:

    Add the CONFIG_HFSPLUS_FS_POSIX_ACL kernel configuration option, the
    DBG_ACL_MOD debugging flag and an acl.h file declaring the essential
    functions for supporting POSIX ACLs in the hfsplus driver.

    Signed-off-by: Vyacheslav Dubeyko
    Cc: Al Viro
    Cc: Christoph Hellwig
    Cc: Hin-Tak Leung
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vyacheslav Dubeyko
     
  • Convert the composition of devm_request_mem_region and devm_ioremap to a
    single call to devm_ioremap_resource. The associated call to
    platform_get_resource is also simplified and moved next to the new call
    to devm_ioremap_resource.

    This was done using a combination of the semantic patches
    devm_ioremap_resource.cocci and devm_request_and_ioremap.cocci, found in
    the scripts/coccinelle/api directory.

    In rtc-lpc32xx.c and rtc-mv.c, the local variable size is no longer needed.

    In rtc-ds1511.c the size field of the local structure is not useful any
    more, and is deleted.

    Signed-off-by: Julia Lawall
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Julia Lawall
     
  • Let RTC core decide if the retrieved time is invalid, instead of
    processing errors in the driver.

    Signed-off-by: Alexander Shiyan
    Cc: Jingoo Han
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexander Shiyan
     
  • Private field "rtc" is not used outside "probe", so there is no reason to
    keep it.

    Signed-off-by: Alexander Shiyan
    Cc: Jingoo Han
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexander Shiyan
     
  • Replace devm_request_mem_region() and devm_ioremap() with
    devm_ioremap_resource().

    Signed-off-by: Alexander Shiyan
    Cc: Jingoo Han
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexander Shiyan
     
  • enable_irq_wake() might fail. If it does, a kernel warning is triggered
    on resume, because the resume path always calls disable_irq_wake().

    WARNING: at kernel/irq/manage.c:529 irq_set_irq_wake+0xc4/0xf0()
    Unbalanced IRQ 52 wake disable
    Modules linked in: ipv6 libcomposite configfs
    CPU: 0 PID: 1591 Comm: ash Tainted: G W 3.10.0-00854-gdbd86d4-dirty #100
    (unwind_backtrace+0x0/0xf8) from (show_stack+0x10/0x14)
    (show_stack+0x10/0x14) from (warn_slowpath_common+0x54/0x68)
    (warn_slowpath_common+0x54/0x68) from (warn_slowpath_fmt+0x30/0x40)
    (warn_slowpath_fmt+0x30/0x40) from (irq_set_irq_wake+0xc4/0xf0)
    (irq_set_irq_wake+0xc4/0xf0) from (sirfsoc_rtc_restore+0x30/0x38)
    (sirfsoc_rtc_restore+0x30/0x38) from (platform_pm_restore+0x2c/0x50)
    (platform_pm_restore+0x2c/0x50) from (dpm_run_callback.clone.6+0x30/0xb0)
    (dpm_run_callback.clone.6+0x30/0xb0) from (device_resume+0x88/0x134)
    (device_resume+0x88/0x134) from (dpm_resume+0x114/0x230)
    (dpm_resume+0x114/0x230) from (hibernation_snapshot+0x178/0x1d0)
    (hibernation_snapshot+0x178/0x1d0) from (hibernate+0x130/0x1dc)
    (hibernate+0x130/0x1dc) from (state_store+0xb4/0xc0)
    (state_store+0xb4/0xc0) from (kobj_attr_store+0x14/0x20)
    (kobj_attr_store+0x14/0x20) from (sysfs_write_file+0xfc/0x17c)
    (sysfs_write_file+0xfc/0x17c) from (vfs_write+0xc8/0x194)
    (vfs_write+0xc8/0x194) from (SyS_write+0x40/0x6c)
    (SyS_write+0x40/0x6c) from (ret_fast_syscall+0x0/0x30)

    To avoid unbalanced "IRQ wake disable", ensure that disable_irq_wake() is
    called only when enable_irq_wake() has succeeded.
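
    The balancing idea can be sketched in user space with a flag that
    records whether the enable step succeeded. This is an illustration
    only: struct sketch_rtc, its fields and the sketch_* functions are
    hypothetical and merely model the enable/disable pairing.

```c
#include <assert.h>
#include <stdbool.h>

struct sketch_rtc {
    int wake_depth;     /* models the IRQ core's wake reference count */
    bool irq_wake;      /* set only if enabling wake succeeded */
    bool enable_fails;  /* knob for the example: make the enable step fail */
};

int sketch_enable_irq_wake(struct sketch_rtc *rtc)
{
    if (rtc->enable_fails)
        return -1;
    rtc->wake_depth++;
    return 0;
}

void sketch_disable_irq_wake(struct sketch_rtc *rtc)
{
    /* An unbalanced disable is what triggers the warning shown above. */
    assert(rtc->wake_depth > 0);
    rtc->wake_depth--;
}

void sketch_suspend(struct sketch_rtc *rtc)
{
    if (sketch_enable_irq_wake(rtc) == 0)
        rtc->irq_wake = true;
}

void sketch_resume(struct sketch_rtc *rtc)
{
    /* Undo only what suspend actually did. */
    if (rtc->irq_wake) {
        sketch_disable_irq_wake(rtc);
        rtc->irq_wake = false;
    }
}
```

    When the enable step fails, resume leaves the wake count untouched, so
    no unbalanced disable can occur.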

    Signed-off-by: Xianglong Du
    Signed-off-by: Barry Song
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Xianglong Du
     
  • check_rtc_access_enable() returns a pointer, thus NULL should be used
    instead of 0 in order to fix the following sparse warning:

    drivers/rtc/rtc-nuc900.c:102:16: warning: Using plain integer as NULL pointer

    Signed-off-by: Jingoo Han
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jingoo Han
     
  • Fix a read of the wrong register when checking whether the RTC timer has
    reached the alarm time.

    Signed-off-by: Sangjung Woo
    Signed-off-by: Myugnjoo Ham
    Reviewed-by: Jonghwa Lee
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sangjung Woo
     
  • Stop processing hid input when registering the RTC fails and handle a NULL
    returned from devm_rtc_device_register() as a failure too.

    Signed-off-by: Alexander Holler
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexander Holler
     
  • Palmas series devices such as TPS65913 and TPS80036 support a backup
    battery for powering the RTC when no other energy source is available.

    The backup battery is optional, connected to the VBACKUP pin, and can be
    nonrechargeable or rechargeable. The rechargeable battery can be charged
    from the system supply using the backup battery charger.

    Add support for enabling charging of this backup battery. Also add the DT
    binding document and the new properties to have this support.

    Signed-off-by: Laxman Dewangan
    Reviewed-by: Felipe Balbi
    Acked-by: Kumar Gala
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Laxman Dewangan
     
  • On some platforms (like AM33xx), a special register (RTC_IRQWAKEEN) is
    available to enable the Alarm Wakeup feature. This register needs to be
    handled properly for rtcwake to work.

    Platforms using such IP should set "ti,am3352-rtc" as the compatible
    string in the RTC device tree node.

    Signed-off-by: Hebbar Gururaja
    Acked-by: Kevin Hilman
    Acked-by: Sekhar Nori
    Cc: Grant Likely
    Cc: Rob Herring
    Cc: Rob Landley
    Cc: Alessandro Zummo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hebbar Gururaja
     
  • Add RTC driver for MOXA ART SoCs.

    Signed-off-by: Jonas Jensen
    Reviewed-by: Mark Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jonas Jensen
     
  • The 'remove' function is empty and does not do anything. Delete it.

    Signed-off-by: Sachin Kamat
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sachin Kamat
     
  • In order to get the module automatically loaded by hotplug mechanisms a
    MODULE_DEVICE_TABLE is needed.

    Therefore add one.

    This makes it also possible to use a module name other than
    HID-SENSOR-2000a0 which isn't very descriptive in kernel messages.

    Signed-off-by: Alexander Holler
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexander Holler
     
  • With the general-instruction extension facility (z10) a couple of
    instructions with a pc-relative long displacement were introduced. The
    kprobes support for these instructions however was never implemented.

    As a result, if anybody ever put a probe on any of these instructions the
    result would have been random behaviour after the instruction got executed
    within the insn slot.

    So let's add the missing handling for these instructions. Since all of the
    new instructions have a 32 bit signed displacement, the easiest solution is
    to allocate an insn slot that is within the same 2GB area as the
    original instruction and patch the displacement field.
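
    Patching the displacement amounts to keeping the branch or operand
    target fixed while the instruction moves. The user-space sketch below
    shows the arithmetic only; sketch_fixup_disp() is a hypothetical name,
    and it assumes that the displacement counts halfwords (2-byte units)
    and that both addresses are even and below 2^63.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Recompute the displacement of a pc-relative instruction that moves
 * from "orig_addr" to "slot_addr" so that it still refers to the same
 * target. Returns false if the new displacement does not fit into a
 * signed 32 bit field.
 */
bool sketch_fixup_disp(uint64_t orig_addr, uint64_t slot_addr,
                       int32_t disp, int32_t *new_disp)
{
    int64_t target, delta;

    assert((orig_addr & 1) == 0 && (slot_addr & 1) == 0);

    target = (int64_t)orig_addr + (int64_t)disp * 2;
    delta = (target - (int64_t)slot_addr) / 2;

    if (delta < INT32_MIN || delta > INT32_MAX)
        return false;
    *new_disp = (int32_t)delta;
    return true;
}
```

    If the slot is too far away from the original instruction, the new
    displacement overflows, which is why the slot has to be allocated
    close to the probed code.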

    Signed-off-by: Heiko Carstens
    Reviewed-by: Masami Hiramatsu
    Cc: Ananth N Mavinakayanahalli
    Cc: Ingo Molnar
    Cc: Martin Schwidefsky
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Heiko Carstens
     
  • The current two insn slot caches both use module_alloc/module_free to
    allocate and free insn slot cache pages.

    For s390 this is not sufficient since there is the need to allocate insn
    slots that are either within the vmalloc module area or within dma memory.

    Therefore add a mechanism which allows a custom allocator to be
    specified for each insn slot cache.
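
    A cache with a pluggable page allocator can be sketched in user space
    as a structure carrying two function pointers. This is not the kprobes
    code: struct sketch_insn_cache, the sketch_* functions and the fixed
    one-page "special area" are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define SKETCH_SLOT_PAGE 4096

/* Hypothetical cache descriptor with a per-cache page allocator. */
struct sketch_insn_cache {
    void *(*alloc)(void);
    void (*free)(void *page);
    int pages_in_use;
};

/* Default allocator: any memory will do. */
void *sketch_default_alloc(void) { return malloc(SKETCH_SLOT_PAGE); }
void sketch_default_free(void *page) { free(page); }

/* Custom allocator: hands out one page from a dedicated area. */
static unsigned char sketch_area[SKETCH_SLOT_PAGE];
static int sketch_area_used;

void *sketch_area_alloc(void)
{
    if (sketch_area_used)
        return NULL;
    sketch_area_used = 1;
    return sketch_area;
}

void sketch_area_free(void *page)
{
    assert(page == (void *)sketch_area);
    sketch_area_used = 0;
}

/* Generic code only goes through the function pointers. */
void *sketch_get_page(struct sketch_insn_cache *c)
{
    void *page = c->alloc();

    if (page != NULL)
        c->pages_in_use++;
    return page;
}

void sketch_put_page(struct sketch_insn_cache *c, void *page)
{
    assert(c->pages_in_use > 0);
    c->free(page);
    c->pages_in_use--;
}
```

    The generic get/put code stays the same for both caches; only the
    allocator pair stored in the descriptor differs.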

    Signed-off-by: Heiko Carstens
    Acked-by: Masami Hiramatsu
    Cc: Ananth N Mavinakayanahalli
    Cc: Ingo Molnar
    Cc: Martin Schwidefsky
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Heiko Carstens
     
  • The current kprobes insn caches allocate memory areas for insn slots
    with module_alloc(). The assumption is that the kernel image and module
    area are both within the same +/- 2GB memory area.

    This however is not true for s390 where the kernel image resides within
    the first 2GB (DMA memory area), but the module area is far away in the
    vmalloc area, usually somewhere close below the 4TB area.

    For new pc relative instructions s390 needs insn slots that are within
    +/- 2GB of each area. That way we can patch displacements of
    pc-relative instructions within the insn slots just like x86 and
    powerpc.

    The module area works already with the normal insn slot allocator,
    however there is currently no way to get insn slots that are within the
    first 2GB on s390 (aka DMA area).

    Therefore this patch set modifies the kprobes insn slot cache code so
    that a custom allocator can be specified for the insn slot cache
    pages. In addition, architectures can now have private insn slot caches
    without the need to modify common code.

    Patch 1 unifies and simplifies the current insn and optinsn caches
    implementation. This is a preparation which allows more insn caches
    to be added in a simple way.

    Patch 2 adds the possibility to specify a custom allocator.

    Patch 3 makes s390 use the new insn slot mechanisms and adds support for
    pc-relative instructions with long displacements.

    This patch (of 3):

    The two insn caches (insn and optinsn) each have their own mutex and
    alloc/free functions (get_[opt]insn_slot() / free_[opt]insn_slot()).

    Since there is a need for yet another insn cache which satisfies DMA
    allocations on s390, unify and simplify the current implementation:

    - Move the per insn cache mutex into struct kprobe_insn_cache.
    - Move the alloc/free functions to kprobe.h so they are simply
    wrappers for the generic __get_insn_slot/__free_insn_slot functions.
    The implementation is done with a DEFINE_INSN_CACHE_OPS() macro
    which provides the alloc/free functions for each cache if needed.
    - Move the struct kprobe_insn_cache to kprobe.h, which allows
    architecture specific insn slot caches to be generated outside of the
    core kprobes code.

    Signed-off-by: Heiko Carstens
    Cc: Masami Hiramatsu
    Cc: Ananth N Mavinakayanahalli
    Cc: Ingo Molnar
    Cc: Martin Schwidefsky
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Heiko Carstens
     
  • As reported by Joe Perches: OOM messages generally aren't useful.
    dmi_alloc is either a trivial front-end to kzalloc, and kzalloc already
    does a dump_stack() when OOM, or for x86, dmi_alloc uses extend_brk
    which BUGs when unsuccessful.

    So we can remove all 6 such log messages in the dmi_scan driver, to
    shrink the binary size (by 528 bytes on x86_64.)

    Signed-off-by: Jean Delvare
    Reported-by: Joe Perches
    Cc: Ben Hutchings
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jean Delvare