15 Nov, 2013

4 commits

  • All seq_printf() users are using "%n" for calculating padding size,
    convert them to use seq_setwidth() / seq_pad() pair.
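
    The set-width-then-pad idea can be sketched in userspace. This is a
    minimal illustration, not the kernel implementation: struct sk_out,
    sk_printf(), sk_setwidth() and sk_pad() are hypothetical stand-ins that
    record where a field must end and then fill with spaces up to that point.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical userspace stand-in for a seq_file-like output buffer. */
struct sk_out {
    char buf[256];
    size_t count;      /* bytes written so far */
    size_t pad_until;  /* position the current field is padded up to */
};

/* Append formatted text, truncating if the buffer runs out of room. */
static void sk_printf(struct sk_out *o, const char *fmt, ...)
{
    va_list ap;
    size_t room = sizeof(o->buf) - o->count;
    int n;

    va_start(ap, fmt);
    n = vsnprintf(o->buf + o->count, room, fmt, ap);
    va_end(ap);
    if (n < 0)
        return;
    o->count += ((size_t)n < room) ? (size_t)n : room - 1;
}

/* Remember where the field has to end: current position plus width. */
static void sk_setwidth(struct sk_out *o, size_t width)
{
    o->pad_until = o->count + width;
}

/* Fill with spaces up to the remembered position, then append c if non-zero. */
static void sk_pad(struct sk_out *o, char c)
{
    while (o->count < o->pad_until && o->count < sizeof(o->buf) - 1)
        o->buf[o->count++] = ' ';
    if (c && o->count < sizeof(o->buf) - 1)
        o->buf[o->count++] = c;
    o->buf[o->count] = '\0';
}
```

    The caller no longer needs "%n" to learn how many characters were
    printed; the buffer's own position counter carries that information.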

    Signed-off-by: Tetsuo Handa
    Signed-off-by: Kees Cook
    Cc: Joe Perches
    Cc: David Miller
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tetsuo Handa
     
  • Hugetlb supports multiple page sizes. We use split lock only for PMD
    level, but not for PUD.

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Naoya Horiguchi
    Signed-off-by: Kirill A. Shutemov
    Tested-by: Alex Thorlton
    Cc: Ingo Molnar
    Cc: "Eric W . Biederman"
    Cc: "Paul E . McKenney"
    Cc: Al Viro
    Cc: Andi Kleen
    Cc: Andrea Arcangeli
    Cc: Dave Hansen
    Cc: Dave Jones
    Cc: David Howells
    Cc: Frederic Weisbecker
    Cc: Johannes Weiner
    Cc: Kees Cook
    Cc: Mel Gorman
    Cc: Michael Kerrisk
    Cc: Oleg Nesterov
    Cc: Peter Zijlstra
    Cc: Rik van Riel
    Cc: Robin Holt
    Cc: Sedat Dilek
    Cc: Srikar Dronamraju
    Cc: Thomas Gleixner
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     
  • With split ptlock it's important to know which lock
    pmd_trans_huge_lock() took. This patch adds one more parameter to the
    function to return the lock.

    In most places the migration to the new API is trivial. The exception is
    move_huge_pmd(): we need to take two locks if the pmd tables are different.

    Signed-off-by: Naoya Horiguchi
    Signed-off-by: Kirill A. Shutemov
    Tested-by: Alex Thorlton
    Cc: Ingo Molnar
    Cc: "Eric W . Biederman"
    Cc: "Paul E . McKenney"
    Cc: Al Viro
    Cc: Andi Kleen
    Cc: Andrea Arcangeli
    Cc: Dave Hansen
    Cc: Dave Jones
    Cc: David Howells
    Cc: Frederic Weisbecker
    Cc: Johannes Weiner
    Cc: Kees Cook
    Cc: Mel Gorman
    Cc: Michael Kerrisk
    Cc: Oleg Nesterov
    Cc: Peter Zijlstra
    Cc: Rik van Riel
    Cc: Robin Holt
    Cc: Sedat Dilek
    Cc: Srikar Dronamraju
    Cc: Thomas Gleixner
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     
  • With split page table lock for PMD level we can't hold mm->page_table_lock
    while updating nr_ptes.

    Let's convert it to atomic_long_t to avoid races.
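
    A userspace analogue of the conversion, using C11 atomics. This is only
    a sketch: struct sk_mm and the sk_nr_ptes_*() helpers are hypothetical
    names, and atomic_long stands in for the kernel's atomic_long_t.

```c
#include <stdatomic.h>

/* Hypothetical stand-in for the per-mm page table counter. */
struct sk_mm {
    atomic_long nr_ptes;
};

/* Each update is a single atomic read-modify-write, so no lock is needed. */
static void sk_nr_ptes_inc(struct sk_mm *mm)
{
    atomic_fetch_add(&mm->nr_ptes, 1);
}

static void sk_nr_ptes_dec(struct sk_mm *mm)
{
    atomic_fetch_sub(&mm->nr_ptes, 1);
}

static long sk_nr_ptes_read(struct sk_mm *mm)
{
    return atomic_load(&mm->nr_ptes);
}
```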

    Signed-off-by: Kirill A. Shutemov
    Tested-by: Alex Thorlton
    Cc: Ingo Molnar
    Cc: Naoya Horiguchi
    Cc: "Eric W . Biederman"
    Cc: "Paul E . McKenney"
    Cc: Al Viro
    Cc: Andi Kleen
    Cc: Andrea Arcangeli
    Cc: Dave Hansen
    Cc: Dave Jones
    Cc: David Howells
    Cc: Frederic Weisbecker
    Cc: Johannes Weiner
    Cc: Kees Cook
    Cc: Mel Gorman
    Cc: Michael Kerrisk
    Cc: Oleg Nesterov
    Cc: Peter Zijlstra
    Cc: Rik van Riel
    Cc: Robin Holt
    Cc: Sedat Dilek
    Cc: Srikar Dronamraju
    Cc: Thomas Gleixner
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     

13 Nov, 2013

8 commits

  • Merge first patch-bomb from Andrew Morton:
    "Quite a lot of other stuff is banked up awaiting further
    next->mainline merging, but this batch contains:

    - Lots of random misc patches
    - OCFS2
    - Most of MM
    - backlight updates
    - lib/ updates
    - printk updates
    - checkpatch updates
    - epoll tweaking
    - rtc updates
    - hfs
    - hfsplus
    - documentation
    - procfs
    - update gcov to gcc-4.7 format
    - IPC"

    * emailed patches from Andrew Morton : (269 commits)
    ipc, msg: fix message length check for negative values
    ipc/util.c: remove unnecessary work pending test
    devpts: plug the memory leak in kill_sb
    ./Makefile: export initial ramdisk compression config option
    init/Kconfig: add option to disable kernel compression
    drivers: w1: make w1_slave::flags long to avoid memory corruption
    drivers/w1/masters/ds1wm.c: use dev_get_platdata()
    drivers/memstick/core/ms_block.c: fix unreachable state in h_msb_read_page()
    drivers/memstick/core/mspro_block.c: fix attributes array allocation
    drivers/pps/clients/pps-gpio.c: remove redundant of_match_ptr
    kernel/panic.c: reduce 1 byte usage for print tainted buffer
    gcov: reuse kbasename helper
    kernel/gcov/fs.c: use pr_warn()
    kernel/module.c: use pr_foo()
    gcov: compile specific gcov implementation based on gcc version
    gcov: add support for gcc 4.7 gcov format
    gcov: move gcov structs definitions to a gcc version specific file
    kernel/taskstats.c: return -ENOMEM when alloc memory fails in add_del_listener()
    kernel/taskstats.c: add nla_nest_cancel() for failure processing between nla_nest_start() and nla_nest_end()
    kernel/sysctl_binary.c: use scnprintf() instead of snprintf()
    ...

    Linus Torvalds
     
  • Pull vfs updates from Al Viro:
    "All kinds of stuff this time around; some more notable parts:

    - RCU'd vfsmounts handling
    - new primitives for coredump handling
    - files_lock is gone
    - Bruce's delegations handling series
    - exportfs fixes

    plus misc stuff all over the place"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (101 commits)
    ecryptfs: ->f_op is never NULL
    locks: break delegations on any attribute modification
    locks: break delegations on link
    locks: break delegations on rename
    locks: helper functions for delegation breaking
    locks: break delegations on unlink
    namei: minor vfs_unlink cleanup
    locks: implement delegations
    locks: introduce new FL_DELEG lock flag
    vfs: take i_mutex on renamed file
    vfs: rename I_MUTEX_QUOTA now that it's not used for quotas
    vfs: don't use PARENT/CHILD lock classes for non-directories
    vfs: pull ext4's double-i_mutex-locking into common code
    exportfs: fix quadratic behavior in filehandle lookup
    exportfs: better variable name
    exportfs: move most of reconnect_path to helper function
    exportfs: eliminate unused "noprogress" counter
    exportfs: stop retrying once we race with rename/remove
    exportfs: clear DISCONNECTED on all parents sooner
    exportfs: more detailed comment for path_reconnect
    ...

    Linus Torvalds
     
  • Under Pseudo filesystems, /proc/kcore support has no help.

    Fixes a portion of kernel bugzilla #52671:
    https://bugzilla.kernel.org/show_bug.cgi?id=52671

    Thanks to David Howells for the help text.

    Signed-off-by: Randy Dunlap
    Reported-by:
    Signed-off-by: David Howells
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Randy Dunlap
     
  • Clean up proc_reg_get_unmapped_area due to its 80-column limit
    violation.

    Signed-off-by: HATAYAMA Daisuke
    Tested-by: Michael Holzheu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    HATAYAMA Daisuke
     
  • The same calculation is currently done in three different places.
    Factor that code so future changes have to be made in only one place.

    [akpm@linux-foundation.org: uninline vm_commit_limit()]
    Signed-off-by: Jerome Marchand
    Cc: Dave Hansen
    Cc: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jerome Marchand
     
  • This flag shows that the VMA is "newly created" and thus represents
    "dirty" in the task's VM.

    You can clear it by "echo 4 > /proc/pid/clear_refs."

    Signed-off-by: Naoya Horiguchi
    Cc: Wu Fengguang
    Cc: Pavel Emelyanov
    Acked-by: Cyrill Gorcunov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Naoya Horiguchi
     
  • mpol_to_str() should not fail. Currently, it either fails because the
    string buffer is too small or because a string hasn't been defined for a
    mempolicy mode.

    If a new mempolicy mode is introduced and no string is defined for it,
    just warn and return "unknown".

    If the buffer is too small, just truncate the string and return, the
    same behavior as snprintf().

    This also fixes a bug where there was no NUL-byte termination when doing
    *p++ = '=' and *p++ = ':' and maxlen has been reached.
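
    The intended behavior (never fail, fall back to "unknown", truncate
    like snprintf() and always terminate) can be sketched in userspace as
    follows. The mode table, the output layout and sk_policy_to_str() are
    assumptions made for illustration; they are not the kernel's
    definitions.

```c
#include <stdio.h>
#include <string.h>

enum { SK_MODE_DEFAULT, SK_MODE_PREFER, SK_MODE_BIND, SK_MODE_MAX };

/* Hypothetical mode names; a mode without an entry prints as "unknown". */
static const char *const sk_mode_names[SK_MODE_MAX] = {
    [SK_MODE_DEFAULT] = "default",
    [SK_MODE_PREFER]  = "prefer",
    [SK_MODE_BIND]    = "bind",
};

/*
 * Format "<mode>[=<flag>][:<nodes>]" into buf. Never fails: an unknown
 * mode is reported as "unknown" and an undersized buffer gets a
 * truncated, NUL-terminated string, which is what snprintf() guarantees
 * for maxlen > 0.
 */
static void sk_policy_to_str(char *buf, size_t maxlen, int mode,
                             const char *flag, const char *nodes)
{
    const char *name = "unknown";

    if (maxlen == 0)
        return;
    if (mode >= 0 && mode < SK_MODE_MAX && sk_mode_names[mode])
        name = sk_mode_names[mode];

    snprintf(buf, maxlen, "%s%s%s%s%s", name,
             flag ? "=" : "", flag ? flag : "",
             nodes ? ":" : "", nodes ? nodes : "");
}
```

    Leaving the separators to snprintf() instead of writing them through a
    raw pointer avoids the missing-terminator case described above.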

    Signed-off-by: David Rientjes
    Cc: KOSAKI Motohiro
    Cc: Chen Gang
    Cc: Rik van Riel
    Cc: Dave Jones
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Rientjes
     
  • Use "pgdat_end_pfn()" instead of "pgdat->node_start_pfn +
    pgdat->node_spanned_pages". Simplify the code, no functional change.
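
    What such a helper computes can be shown with a reduced, hypothetical
    node descriptor that carries only the two fields named above; the
    "_sketch" names are not kernel identifiers.

```c
/* Reduced stand-in for the node descriptor: only the fields used here. */
struct pglist_data_sketch {
    unsigned long node_start_pfn;     /* first page frame of the node */
    unsigned long node_spanned_pages; /* number of frames the node spans */
};

/* First page frame number past the end of the node. */
static inline unsigned long
pgdat_end_pfn_sketch(const struct pglist_data_sketch *pgdat)
{
    return pgdat->node_start_pfn + pgdat->node_spanned_pages;
}
```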

    Signed-off-by: Xishi Qiu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Xishi Qiu
     

12 Nov, 2013

1 commit

  • Pull devicetree updates from Rob Herring:
    "DeviceTree updates for 3.13. This is a bit larger pull request than
    usual for this cycle with lots of clean-up.

    - Cross arch clean-up and consolidation of early DT scanning code.
    - Clean-up and removal of arch prom.h headers. Makes arch specific
    prom.h optional on all but Sparc.
    - Addition of interrupts-extended property for devices connected to
    multiple interrupt controllers.
    - Refactoring of DT interrupt parsing code in preparation for
    deferred probe of interrupts.
    - ARM cpu and cpu topology bindings documentation.
    - Various DT vendor binding documentation updates"

    * tag 'devicetree-for-3.13' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: (82 commits)
    powerpc: add missing explicit OF includes for ppc
    dt/irq: add empty of_irq_count for !OF_IRQ
    dt: disable self-tests for !OF_IRQ
    of: irq: Fix interrupt-map entry matching
    MIPS: Netlogic: replace early_init_devtree() call
    of: Add Panasonic Corporation vendor prefix
    of: Add Chunghwa Picture Tubes Ltd. vendor prefix
    of: Add AU Optronics Corporation vendor prefix
    of/irq: Fix potential buffer overflow
    of/irq: Fix bug in interrupt parsing refactor.
    of: set dma_mask to point to coherent_dma_mask
    of: add vendor prefix for PHYTEC Messtechnik GmbH
    DT: sort vendor-prefixes.txt
    of: Add vendor prefix for Cadence
    of: Add empty for_each_available_child_of_node() macro definition
    arm/versatile: Fix versatile irq specifications.
    of/irq: create interrupts-extended property
    microblaze/pci: Drop PowerPC-ism from irq parsing
    of/irq: Create of_irq_parse_and_map_pci() to consolidate arch code.
    of/irq: Use irq_of_parse_and_map()
    ...

    Linus Torvalds
     

01 Nov, 2013

1 commit

  • Resolve cherry-picking conflicts:

    Conflicts:
    mm/huge_memory.c
    mm/memory.c
    mm/mprotect.c

    See this upstream merge commit for more details:

    52469b4fcd4f Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

    Signed-off-by: Ingo Molnar

    Ingo Molnar
     

25 Oct, 2013

1 commit


17 Oct, 2013

3 commits

  • Commit c4fe24485729 ("sparc: fix PCI device proc file mmap(2)") added
    proc_reg_get_unmapped_area in proc_reg_file_ops and
    proc_reg_file_ops_no_compat, by which now mmap always returns EIO if
    get_unmapped_area method is not defined for the target procfs file,
    which causes regression of mmap on /proc/vmcore.

    To address this issue, like get_unmapped_area(), call default
    current->mm->get_unmapped_area on MMU-present architectures if
    pde->proc_fops->get_unmapped_area, i.e. the one in actual file
    operation in the procfs file, is not defined.
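
    The selection logic described above can be sketched with plain
    function pointers. All names here (sk_file_ops, sk_proc_get_area(),
    SK_EIO and the two example callbacks) are hypothetical; passing a NULL
    default models a build without an MMU.

```c
#include <stddef.h>

#define SK_EIO 5

typedef unsigned long (*sk_get_area_fn)(unsigned long addr, unsigned long len);

/* Stand-in for the file operations of one procfs file. */
struct sk_file_ops {
    sk_get_area_fn get_unmapped_area;  /* may be NULL */
};

/* Example callbacks that return fixed addresses so results are easy to tell apart. */
static unsigned long sk_default_area(unsigned long addr, unsigned long len)
{
    (void)addr; (void)len;
    return 0x10000UL;
}

static unsigned long sk_custom_area(unsigned long addr, unsigned long len)
{
    (void)addr; (void)len;
    return 0x20000UL;
}

/*
 * Prefer the file's own method, fall back to the per-mm default, and
 * return an error value only when neither is available.
 */
static unsigned long sk_proc_get_area(const struct sk_file_ops *fops,
                                      sk_get_area_fn mm_default,
                                      unsigned long addr, unsigned long len)
{
    sk_get_area_fn get_area = mm_default;

    if (fops->get_unmapped_area)
        get_area = fops->get_unmapped_area;
    if (!get_area)
        return (unsigned long)-SK_EIO;
    return get_area(addr, len);
}
```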

    Reported-by: Michael Holzheu
    Signed-off-by: HATAYAMA Daisuke
    Cc: Alexey Dobriyan
    Cc: David S. Miller
    Tested-by: Michael Holzheu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    HATAYAMA Daisuke
     
  • Currently, proc_reg_get_unmapped_area truncates upper 32-bit of the
    mapped virtual address returned from get_unmapped_area method in
    pde->proc_fops due to the variable rv of signed integer on x86_64. This
    is too small to hold a virtual address of type unsigned long on x86_64 since on
    x86_64, signed integer is of 4 bytes while unsigned long is of 8 bytes.
    To fix this issue, use unsigned long instead.
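
    The truncation can be reproduced in userspace with fixed-width types.
    The function names and the sample address are illustrative
    assumptions; int32_t and uint64_t stand in for int and unsigned long
    on x86_64.

```c
#include <stdint.h>

/* Buggy pattern: a 64-bit address passed through a 32-bit signed variable. */
static uint64_t sk_through_int(uint64_t addr)
{
    int32_t rv = (int32_t)(uint32_t)addr;  /* upper 32 bits are dropped here */

    return (uint64_t)(int64_t)rv;          /* widening cannot bring them back */
}

/* Fixed pattern: the variable is as wide as the address. */
static uint64_t sk_through_long(uint64_t addr)
{
    uint64_t rv = addr;

    return rv;
}
```

    For an address such as 0x00007f1234567000 the first function returns
    only the low 32 bits, while the second returns the address unchanged.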

    Fixes a regression added in commit c4fe24485729 ("sparc: fix PCI device
    proc file mmap(2)").

    Signed-off-by: HATAYAMA Daisuke
    Cc: Alexey Dobriyan
    Cc: David S. Miller
    Tested-by: Michael Holzheu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    HATAYAMA Daisuke
     
  • If a page we are inspecting is in swap we may occasionally report it as
    having soft dirty bit (even if it is clean). The pte_soft_dirty helper
    should be called on present pte only.

    Signed-off-by: Cyrill Gorcunov
    Cc: Pavel Emelyanov
    Cc: Andy Lutomirski
    Cc: Matt Mackall
    Cc: Xiao Guangrong
    Cc: Marcelo Tosatti
    Cc: KOSAKI Motohiro
    Cc: Stephen Rothwell
    Cc: Peter Zijlstra
    Cc: "Aneesh Kumar K.V"
    Reviewed-by: Naoya Horiguchi
    Cc: Mel Gorman
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Cyrill Gorcunov
     

10 Oct, 2013

1 commit

  • HAVE_ARCH_DEVTREE_FIXUPS appears to always be needed except for sparc,
    but it is only used for /proc/device-tree and sparc does not enable
    /proc/device-tree. So this option is redundant. Remove the option and
    always enable it. This has the side effect of fixing /proc/device-tree
    on arches such as arm64 which failed to define this option.

    Signed-off-by: Rob Herring
    Acked-by: Vineet Gupta
    Acked-by: Grant Likely
    Cc: Russell King
    Cc: James Hogan
    Cc: Michal Simek
    Cc: Jonas Bonn
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: "H. Peter Anvin"
    Cc: x86@kernel.org
    Cc: Chris Zankel
    Cc: Max Filippov

    Rob Herring
     

09 Oct, 2013

1 commit

  • It is desirable to model from userspace how the scheduler groups tasks
    over time. This patch adds an ID to the numa_group and reports it via
    /proc/PID/status.

    Signed-off-by: Mel Gorman
    Reviewed-by: Rik van Riel
    Cc: Andrea Arcangeli
    Cc: Johannes Weiner
    Cc: Srikar Dronamraju
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1381141781-10992-45-git-send-email-mgorman@suse.de
    Signed-off-by: Ingo Molnar

    Mel Gorman
     

13 Sep, 2013

1 commit

  • We use NR_ANON_PAGES as base for reporting AnonPages to user. There's
    not much sense in not accounting transparent huge pages there, but add
    them on printing to user.

    Let's account transparent huge pages in NR_ANON_PAGES in the first place.

    Signed-off-by: Kirill A. Shutemov
    Acked-by: Dave Hansen
    Cc: Andrea Arcangeli
    Cc: Al Viro
    Cc: Hugh Dickins
    Cc: Wu Fengguang
    Cc: Jan Kara
    Cc: Mel Gorman
    Cc: Andi Kleen
    Cc: Matthew Wilcox
    Cc: Hillf Danton
    Cc: Ning Qu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     

12 Sep, 2013

6 commits

  • The patch "s390/vmcore: Implement remap_oldmem_pfn_range for s390" now
    allows mmap to be used on s390 as well.

    So enable mmap for s390 again.

    Signed-off-by: Michael Holzheu
    Cc: HATAYAMA Daisuke
    Cc: Jan Willeke
    Cc: Vivek Goyal
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michael Holzheu
     
  • For zfcpdump we can't map the HSA storage because it is only available via
    a read interface. Therefore, for the new vmcore mmap feature we have to
    introduce a new mechanism to create mappings on demand.

    This patch introduces a new architecture function remap_oldmem_pfn_range()
    that should be used to create mappings with remap_pfn_range() for oldmem
    areas that can be directly mapped. For zfcpdump this is everything
    except the HSA memory. For the areas that are not mapped by
    remap_oldmem_pfn_range(), a new generic vmcore fault handler,
    mmap_vmcore_fault(), is called.

    This handler works as follows:

    * Get already available or new page from page cache (find_or_create_page)
    * Check if /proc/vmcore page is filled with data (PageUptodate)
    * If yes:
      Return that page
    * If no:
      Fill page using __vmcore_read(), set PageUptodate, and return page
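
    The handler steps above can be sketched in userspace with a tiny
    array-backed "page cache". Everything here is a hypothetical stand-in:
    sk_cache models the page cache, sk_read() models __vmcore_read(), and
    the uptodate flag models PageUptodate.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SK_PAGE_SIZE 16
#define SK_NPAGES    4

struct sk_page {
    bool uptodate;                     /* set once data has been filled */
    unsigned char data[SK_PAGE_SIZE];
};

static struct sk_page sk_cache[SK_NPAGES]; /* slots exist up front */
static int sk_fill_calls;                  /* counts reads of "old memory" */

/* Stand-in for the slow read path: fills the page with a per-index byte. */
static void sk_read(unsigned char *dst, size_t index)
{
    memset(dst, (int)('A' + index), SK_PAGE_SIZE);
    sk_fill_calls++;
}

/* Fault handler: look the page up, fill it on first use, return it. */
static struct sk_page *sk_fault(size_t index)
{
    struct sk_page *page;

    if (index >= SK_NPAGES)
        return NULL;
    page = &sk_cache[index];           /* models find_or_create_page */
    if (!page->uptodate) {
        sk_read(page->data, index);
        page->uptodate = true;
    }
    return page;
}
```

    A second fault on the same index returns the cached page without
    another read, which is the point of checking the up-to-date flag.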

    Signed-off-by: Michael Holzheu
    Acked-by: Vivek Goyal
    Cc: HATAYAMA Daisuke
    Cc: Jan Willeke
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michael Holzheu
     
  • For s390 we want to use /proc/vmcore for our SCSI stand-alone dump
    (zfcpdump). We have support where the first HSA_SIZE bytes are saved into
    a hypervisor owned memory area (HSA) before the kdump kernel is booted.
    When the kdump kernel starts, it is restricted to use only HSA_SIZE bytes.

    The advantages of this mechanism are:

    * No crashkernel memory has to be defined in the old kernel.
    * Early boot problems (before kexec_load has been done) can be dumped
    * Non-Linux systems can be dumped.

    We modify the s390 copy_oldmem_page() function to read from the HSA memory
    if memory below HSA_SIZE bytes is requested.

    Since we cannot use the kexec tool to load the kernel in this scenario,
    we have to build the ELF header in the 2nd (kdump/new) kernel.

    So with the following patch set we would like to introduce a new
    feature that allows the ELF header for /proc/vmcore to be created in
    the 2nd kernel's memory.

    The following steps are done during zfcpdump execution:

    1. Production system crashes
    2. User boots a SCSI disk that has been prepared with the zfcpdump tool
    3. Hypervisor saves CPU state of boot CPU and HSA_SIZE bytes of memory into HSA
    4. Boot loader loads kernel into low memory area
    5. Kernel boots and uses only HSA_SIZE bytes of memory
    6. Kernel saves registers of non-boot CPUs
    7. Kernel does memory detection for dump memory map
    8. Kernel creates ELF header for /proc/vmcore
    9. /proc/vmcore uses this header for initialization
    10. The zfcpdump user space reads /proc/vmcore to write dump to SCSI disk
    - copy_oldmem_page() copies from HSA for memory below HSA_SIZE
    - copy_oldmem_page() copies from real memory for memory above HSA_SIZE

    Currently for s390 we create the ELF core header in the 2nd kernel with a
    small trick. We relocate the addresses in the ELF header in a way that
    for the /proc/vmcore code it seems to be in the 1st kernel (old) memory
    and the read_from_oldmem() returns the correct data. This allows the
    /proc/vmcore code to use the ELF header in the 2nd kernel.

    This patch:

    Exchange the old mechanism with the new and much cleaner function call
    override feature that now officially allows the ELF core header to be
    created in the 2nd kernel.

    To use the new feature the following functions have to be defined
    by the architecture backend code to read from new memory:

    * elfcorehdr_alloc: Allocate ELF header
    * elfcorehdr_free: Free the memory of the ELF header
    * elfcorehdr_read: Read from ELF header
    * elfcorehdr_read_notes: Read from ELF notes

    Signed-off-by: Michael Holzheu
    Acked-by: Vivek Goyal
    Cc: HATAYAMA Daisuke
    Cc: Jan Willeke
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michael Holzheu
     
  • proc_fd_permission() says "process can still access /proc/self/fd after it
    has executed a setuid()", but the "task_pid() == proc_pid()" check only
    helps if the task is group leader, /proc/self points to
    /proc/.

    Change this check to use task_tgid() so that the whole thread group can
    access its /proc/self/fd or /proc//fd.

    Notes:
    - CLONE_THREAD does not require CLONE_FILES so task->files
    can differ, but I don't think this can lead to any security
    problem. And this matches same_thread_group() in
    __ptrace_may_access().

    - /proc/self should probably point to /proc/, but
    it is too late to change the rules. Perhaps it makes sense
    to add /proc/thread though.

    Test-case:

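    #include <assert.h>
    #include <dirent.h>
    #include <pthread.h>
    #include <stddef.h>

    void *tfunc(void *arg)
    {
        /* opendir() fails here when the thread is not the group leader */
        assert(opendir("/proc/self/fd"));
        return NULL;
    }

    int main(void)
    {
        pthread_t t;
        pthread_create(&t, NULL, tfunc, NULL);
        pthread_join(t, NULL);
        return 0;
    }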

    fails if, say, this executable is not readable and suid_dumpable = 0.

    Signed-off-by: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • mpol_to_str() may fail (e.g. with -EINVAL) without filling the buffer, so
    its return value needs to be checked; otherwise the buffer may not be
    NUL-terminated and the next seq_printf() will cause problems.

    The failure return needs to come after mpol_cond_put() to match
    get_vma_policy().

    Signed-off-by: Chen Gang
    Cc: Cyrill Gorcunov
    Cc: Mel Gorman
    Cc: Andi Kleen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Chen Gang
     
  • Pavel reported that if a vma area gets unmapped and then mapped (or
    expanded) in place, the soft dirty tracker won't be able to recognize this
    situation, since it works on the pte level and ptes get zapped on unmap,
    losing the soft dirty bit.

    To resolve this we need to track actions on the vma level; this is where
    the VM_SOFTDIRTY flag comes in. When a new vma area is created (or an old
    one is expanded) we set this bit, and keep it there until the application
    asks for the soft dirty bit to be cleared.

    Thus a user space application that tracks memory changes can now detect
    whether a vma area has been renewed.

    Reported-by: Pavel Emelyanov
    Signed-off-by: Cyrill Gorcunov
    Cc: Andy Lutomirski
    Cc: Matt Mackall
    Cc: Xiao Guangrong
    Cc: Marcelo Tosatti
    Cc: KOSAKI Motohiro
    Cc: Stephen Rothwell
    Cc: Peter Zijlstra
    Cc: "Aneesh Kumar K.V"
    Cc: Rob Landley
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Cyrill Gorcunov
     

08 Sep, 2013

1 commit

  • Pull namespace changes from Eric Biederman:
    "This is an assorted mishmash of small cleanups, enhancements and bug
    fixes.

    The major theme is user namespace mount restrictions. nsown_capable
    is killed as it encourages not thinking about details that need to be
    considered. A very hard to hit pid namespace exiting bug was finally
    tracked and fixed. A couple of cleanups to the basic namespace
    infrastructure.

    Finally there is an enhancement that makes per user namespace
    capabilities usable as capabilities, and an enhancement that allows
    the per userns root to nice other processes in the user namespace"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
    userns: Kill nsown_capable it makes the wrong thing easy
    capabilities: allow nice if we are privileged
    pidns: Don't have unshare(CLONE_NEWPID) imply CLONE_THREAD
    userns: Allow PR_CAPBSET_DROP in a user namespace.
    namespaces: Simplify copy_namespaces so it is clear what is going on.
    pidns: Fix hang in zap_pid_ns_processes by sending a potentially extra wakeup
    sysfs: Restrict mounting sysfs
    userns: Better restrictions on when proc and sysfs can be mounted
    vfs: Don't copy mount bind mounts of /proc//ns/mnt between namespaces
    kernel/nsproxy.c: Improving a snippet of code.
    proc: Restrict mounting the proc filesystem
    vfs: Lock in place mounts from more privileged users

    Linus Torvalds
     

06 Sep, 2013

2 commits

  • Pull sparc changes from David Miller:
    "Several bug fixes (from Kirill Tkhai, Geert Uytterhoeven, and Alexey
    Dobriyan) and some support for Fujitsu sparc64x chips (from Allen
    Pais)"

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
    sparc64: Export flush_ptrace_access() (needed by lustre)
    sparc: fix PCI device proc file mmap(2)
    sparc64: Remove RWSEM export leftovers
    sparc64: Fix off by one in trampoline TLB mapping installation loop.
    sparc64: Fix ITLB handler of null page
    esp_scsi: Fix tag state corruption when autosensing.
    sparc64: Fix not SRA'ed %o5 in 32-bit traced syscall
    sparc64: cleanup: Rename ret_from_syscall to ret_from_fork
    sparc32: Fix exit flag passed from traced sys_sigreturn
    sparc64: Fix wrong syscall return value passed to trace_sys_exit()
    support sparc64x chip type in cpumap.c
    cpu hw caps support for sparc64x

    Linus Torvalds
     
  • Commit 786d7e1612f0b0adb6046f19b906609e4fe8b1ba "Fix rmmod/read/write races in /proc entries"
    must have broken mmapping of PCI device proc files on Sparc.

    Notice how it adds wrapper around ->mmap but doesn't do it around ->get_unmapped_area.
    Add wrapper around ->get_unmapped_area.

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: David S. Miller

    Alexey Dobriyan
     

27 Aug, 2013

2 commits

  • Rely on the fact that another flavor of the filesystem is already
    mounted and do not rely on state in the user namespace.

    Verify that the mounted filesystem is not covered in any significant
    way. I would love to verify that the previously mounted filesystem
    has no mounts on top but there are at least the directories
    /proc/sys/fs/binfmt_misc and /sys/fs/cgroup/ that exist explicitly
    for other filesystems to mount on top of.

    Refactor the test into a function named fs_fully_visible and call that
    function from the mount routines of proc and sysfs. This makes this
    test local to the filesystems involved and the results current of when
    the mounts take place, removing a weird threading of the user
    namespace, the mount namespace and the filesystems themselves.

    Signed-off-by: "Eric W. Biederman"

    Eric W. Biederman
     
  • Don't allow mounting the proc filesystem unless the caller has
    CAP_SYS_ADMIN rights over the pid namespace. The principle here is if
    you create or have capabilities over it you can mount it, otherwise
    you get to live with what other people have mounted.

    Andy pointed out that this is needed to prevent users in a user
    namespace from remounting proc and specifying different hidepid and gid
    options on already existing proc mounts.

    Cc: stable@vger.kernel.org
    Reported-by: Andy Lutomirski
    Signed-off-by: "Eric W. Biederman"

    Eric W. Biederman
     

26 Aug, 2013

1 commit


25 Aug, 2013

1 commit


20 Aug, 2013

2 commits

  • In the previous commit, Richard Genoud fixed proc_root_readdir(), which
    had lost the check for whether all of the non-process /proc entries had
    been returned or not.

    But that in turn exposed _another_ bug, namely that the original readdir
    conversion patch had yet another problem: it had lost the return value
    of proc_readdir_de(), so now checking whether it had completed
    successfully or not didn't actually work right anyway.

    This reinstates the non-zero return for the "end of base entries" that
    had also gotten lost in commit f0c3b5093add ("[readdir] convert
    procfs"). So now you get all the base entries *and* you get all the
    process entries, regardless of getdents buffer size.

    (Side note: the Linux "getdents" manual page actually has a nice example
    application for testing getdents, which can be easily modified to use
    different buffers. Who knew? Man-pages can be useful)

    Reported-by: Emmanuel Benisty
    Reported-by: Marc Dionne
    Cc: Richard Genoud
    Cc: Al Viro
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     
  • Commit f0c3b5093add ("[readdir] convert procfs") introduced a bug on the
    listing of the proc file-system. The return value of proc_readdir()
    isn't tested anymore in the proc_root_readdir function.

    This led to an "interesting" behaviour when we are using the getdents()
    system call with a buffer too small: instead of failing, it returns the
    first entries of /proc (enough to fill the given buffer), plus the PID
    directories.

    This is not triggered on glibc (as getdents is called with a 32KB
    buffer), but on uclibc, the buffer size is only 1KB, thus some proc
    entries are missing.

    See https://lkml.org/lkml/2013/8/12/288 for more background.
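
    A small userspace check in the spirit of the getdents() example
    mentioned in these commits: read the same directory with two buffer
    sizes and compare the number of entries. This is a sketch under the
    assumption of a Linux system with the getdents64 system call;
    sk_dirent64 and sk_count_entries() are names made up for this example.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Layout of the records returned by getdents64(2). */
struct sk_dirent64 {
    uint64_t       d_ino;
    int64_t        d_off;
    unsigned short d_reclen;
    unsigned char  d_type;
    char           d_name[];
};

/* Count the entries of path using a buffer of bufsize bytes; -1 on error. */
static long sk_count_entries(const char *path, size_t bufsize)
{
    long total = 0;
    char *buf;
    int fd = open(path, O_RDONLY | O_DIRECTORY);

    if (fd < 0)
        return -1;
    buf = malloc(bufsize);
    if (!buf) {
        close(fd);
        return -1;
    }
    for (;;) {
        long n = syscall(SYS_getdents64, fd, buf, bufsize);
        long pos = 0;

        if (n < 0) {            /* e.g. buffer too small for one record */
            total = -1;
            break;
        }
        if (n == 0)             /* end of directory */
            break;
        while (pos < n) {
            struct sk_dirent64 *d = (struct sk_dirent64 *)(buf + pos);

            total++;
            pos += d->d_reclen;
        }
    }
    free(buf);
    close(fd);
    return total;
}
```

    On a correct kernel a 1KB buffer and a 32KB buffer yield the same
    count for a stable directory; with the bug described here, /proc read
    through the small buffer would come up short.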

    Signed-off-by: Richard Genoud
    Cc: Al Viro
    Cc: Andrew Morton
    Signed-off-by: Linus Torvalds

    Richard Genoud
     

14 Aug, 2013

3 commits

  • Recently we met quite a lot of random kernel panic issues after enabling
    CONFIG_PROC_PAGE_MONITOR. After debugging we found this has something
    to do with the following bug in pagemap:

    In struct pagemapread:

    struct pagemapread {
        int pos, len;
        pagemap_entry_t *buffer;
        bool v2;
    };

    pos is the number of PM_ENTRY_BYTES-sized entries in buffer, but len is
    the size of buffer in bytes. It is a mistake to compare pos and len in
    add_page_map() to check whether the buffer is full, and this can lead to
    a buffer overflow and random kernel panics.

    Correct len to be total number of PM_ENTRY_BYTES in buffer.
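
    The unit mismatch is easy to see in a userspace sketch. The sk_ names
    are hypothetical; the point is only that pos and len must both count
    entries before they are compared.

```c
#include <stdint.h>
#include <stdlib.h>

typedef struct { uint64_t pme; } sk_entry_t;
#define SK_ENTRY_BYTES sizeof(sk_entry_t)

struct sk_pagemapread {
    int pos, len;          /* both counted in entries, not bytes */
    sk_entry_t *buffer;
};

/* Allocate room for `bytes` bytes and record the capacity in entries. */
static int sk_init(struct sk_pagemapread *pm, size_t bytes)
{
    pm->pos = 0;
    pm->len = (int)(bytes / SK_ENTRY_BYTES);  /* entries; bytes here was the bug */
    pm->buffer = malloc((size_t)pm->len * SK_ENTRY_BYTES);
    return pm->buffer ? 0 : -1;
}

/* Store one entry; returns 1 once the buffer has become full, else 0. */
static int sk_add(struct sk_pagemapread *pm, uint64_t value)
{
    pm->buffer[pm->pos++].pme = value;
    if (pm->pos >= pm->len)
        return 1;
    return 0;
}
```

    Had len been left in bytes, the full check would only fire after
    eight times as many entries as the buffer can actually hold.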

    [akpm@linux-foundation.org: document pagemapread.pos and .len units, fix PM_ENTRY_BYTES definition]
    Signed-off-by: Yonghua Zheng
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    yonghua zheng
     
  • Andy reported that if a file page gets reclaimed we lose the soft-dirty bit
    if it was set, so save the _PAGE_BIT_SOFT_DIRTY bit when the page address
    gets encoded into the pte entry. Thus when a page fault (#pf) happens on
    such a non-present pte we can restore it.

    Reported-by: Andy Lutomirski
    Signed-off-by: Cyrill Gorcunov
    Acked-by: Pavel Emelyanov
    Cc: Matt Mackall
    Cc: Xiao Guangrong
    Cc: Marcelo Tosatti
    Cc: KOSAKI Motohiro
    Cc: Stephen Rothwell
    Cc: Peter Zijlstra
    Cc: "Aneesh Kumar K.V"
    Cc: Minchan Kim
    Cc: Wanpeng Li
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Cyrill Gorcunov
     
  • Andy Lutomirski reported that if a page with the _PAGE_SOFT_DIRTY bit set
    gets swapped out, the bit is lost and is no longer available when the
    pte is read back.

    To resolve this we introduce the _PTE_SWP_SOFT_DIRTY bit, which is saved in
    the pte entry for the page being swapped out. When such a page is read
    back from the swap cache we check for the bit's presence and, if it is
    there, clear it and restore the former _PAGE_SOFT_DIRTY bit.

    One of the problems was to find a place in the pte entry where we can save
    the _PTE_SWP_SOFT_DIRTY bit while the page is in swap. _PAGE_PSE was
    chosen for that, as it doesn't intersect with the swap entry format stored
    in the pte.
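
    The hand-over of the bit can be sketched with plain integers. The bit
    positions and the sk_ names below are assumptions chosen for
    illustration only; they are not taken from the kernel headers.

```c
#include <stdint.h>

/* Illustrative bit positions (assumed, not the kernel's definitions). */
#define SK_PRESENT        (UINT64_C(1) << 0)
#define SK_SOFT_DIRTY     (UINT64_C(1) << 11)
#define SK_SWP_SOFT_DIRTY (UINT64_C(1) << 7)

/* Build the non-present swap pte, carrying the soft-dirty state over. */
static uint64_t sk_swap_out(uint64_t pte, uint64_t swp_entry)
{
    uint64_t swp = swp_entry & ~(SK_PRESENT | SK_SWP_SOFT_DIRTY);

    if (pte & SK_SOFT_DIRTY)
        swp |= SK_SWP_SOFT_DIRTY;
    return swp;
}

/* Build the present pte on swap-in, restoring soft-dirty if it was saved. */
static uint64_t sk_swap_in(uint64_t swp_pte, uint64_t new_pte)
{
    new_pte |= SK_PRESENT;
    if (swp_pte & SK_SWP_SOFT_DIRTY)
        new_pte |= SK_SOFT_DIRTY;
    return new_pte;
}
```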

    Reported-by: Andy Lutomirski
    Signed-off-by: Cyrill Gorcunov
    Acked-by: Pavel Emelyanov
    Cc: Matt Mackall
    Cc: Xiao Guangrong
    Cc: Marcelo Tosatti
    Cc: KOSAKI Motohiro
    Cc: Stephen Rothwell
    Cc: Peter Zijlstra
    Cc: "Aneesh Kumar K.V"
    Reviewed-by: Minchan Kim
    Reviewed-by: Wanpeng Li
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Cyrill Gorcunov
     

18 Jul, 2013

1 commit

  • The kdump mmap patch series (git commit 83086978c63afd7c73e1c) directly
    maps the PT_LOADs to memory. On s390 this does not work because the
    copy_from_oldmem() function swaps [0,crashkernel size] with
    [crashkernel base, crashkernel base+crashkernel size]. The swap
    in copy_from_oldmem() was done in order to correctly implement /dev/oldmem.

    See: http://marc.info/?l=kexec&m=136940802511603&w=2

    Signed-off-by: Michael Holzheu
    Signed-off-by: Martin Schwidefsky

    Michael Holzheu