10 Dec, 2015

1 commit

  • commit b2f73922d119686323f14fbbe46587f863852328 upstream.

    So the /proc/PID/stat 'wchan' field (the 35th field, which contains
    the absolute kernel address of the kernel function a task is blocked in)
    leaks absolute kernel addresses to unprivileged user-space:

    seq_put_decimal_ull(m, ' ', wchan);

    The absolute address might also leak via /proc/PID/wchan, if
    KALLSYMS is turned off or if the symbol lookup fails for some reason:

    static int proc_pid_wchan(struct seq_file *m, struct pid_namespace *ns,
                              struct pid *pid, struct task_struct *task)
    {
            unsigned long wchan;
            char symname[KSYM_NAME_LEN];

            wchan = get_wchan(task);

            if (lookup_symbol_name(wchan, symname) < 0) {
                    if (!ptrace_may_access(task, PTRACE_MODE_READ))
                            return 0;
                    seq_printf(m, "%lu", wchan);
            } else {
                    seq_printf(m, "%s", symname);
            }

            return 0;
    }

    This isn't ideal, because for example it trivially leaks the KASLR offset
    to any local attacker:

    fomalhaut:~> printf "%016lx\n" $(cat /proc/$$/stat | cut -d' ' -f35)
    ffffffff8123b380

    Most real-life uses of wchan are symbolic:

    ps -eo pid:10,tid:10,wchan:30,comm

    and procps uses /proc/PID/wchan, not the absolute address in /proc/PID/stat:

    triton:~/tip> strace -f ps -eo pid:10,tid:10,wchan:30,comm 2>&1 | grep wchan | tail -1
    open("/proc/30833/wchan", O_RDONLY) = 6

    There's one compatibility quirk here: procps relies on whether the
    absolute value is non-zero - and we can provide that functionality
    by outputting "0" or "1" depending on whether the task is blocked
    (whether there's a wchan address).

    These days there appears to be very little legitimate reason
    user-space would be interested in the absolute address. The
    absolute address is mostly historic: from the days when we
    didn't have kallsyms and user-space procps had to do the
    decoding itself via the System.map.

    So this patch sets all numeric output to "0" or "1" and keeps only
    symbolic output, in /proc/PID/wchan.

    ( The absolute sleep address can generally still be profiled via
    perf, by tasks with sufficient privileges. )

    Reviewed-by: Thomas Gleixner
    Acked-by: Kees Cook
    Acked-by: Linus Torvalds
    Cc: Al Viro
    Cc: Alexander Potapenko
    Cc: Andrey Konovalov
    Cc: Andrey Ryabinin
    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Denys Vlasenko
    Cc: Dmitry Vyukov
    Cc: Kostya Serebryany
    Cc: Mike Galbraith
    Cc: Peter Zijlstra
    Cc: Sasha Levin
    Cc: kasan-dev
    Cc: linux-kernel@vger.kernel.org
    Link: http://lkml.kernel.org/r/20150930135917.GA3285@gmail.com
    Signed-off-by: Ingo Molnar
    Signed-off-by: Greg Kroah-Hartman

    Ingo Molnar
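
    The user-visible effect described above can be checked from user space
    with a small sketch like the one below. It is an illustration only, not
    part of the patch: the helper names stat_fields() and wchan_flag() are
    hypothetical, and the code assumes the proc(5) layout in which wchan is
    field 35 and the comm field is wrapped in parentheses.

```python
import os


def stat_fields(stat_line: str) -> list[str]:
    """Split one /proc/PID/stat line into fields (index 0 holds field 1)."""
    # Field 2 (comm) is wrapped in parentheses and may itself contain spaces
    # or ')', so cut at the first " (" and the last ") " instead of relying
    # on a plain whitespace split.
    head, _, rest = stat_line.strip().partition(" (")
    comm, _, tail = rest.rpartition(") ")
    return [head, comm] + tail.split()


def wchan_flag(stat_line: str) -> int:
    """Return the numeric wchan value (field 35, so list index 34)."""
    return int(stat_fields(stat_line)[34])


if __name__ == "__main__" and os.path.exists("/proc/self/stat"):
    try:
        with open("/proc/self/stat") as f:
            # On kernels carrying this change the value should be 0 or 1.
            print("stat wchan field:", wchan_flag(f.read()))
        with open("/proc/self/wchan") as f:
            # Symbolic name of the function the task is blocked in, if any.
            print("symbolic wchan:", f.read() or "(empty)")
    except (OSError, IndexError, ValueError) as exc:
        print("could not read wchan information:", exc)
```

    Splitting around the parentheses matters because a process name such as
    "my (odd) proc" would otherwise shift every later field, which is also
    why a plain cut -d' ' is only safe for simple process names.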
     

24 Apr, 2015

1 commit

  • Pull xfs update from Dave Chinner:
    "This update contains:

    - RENAME_WHITEOUT support

    - conversion of per-cpu superblock accounting to use generic counters

    - new inode mmap lock so that we can lock page faults out of
    truncate, hole punch and other direct extent manipulation functions
    to avoid racing mmap writes from causing data corruption

    - rework of direct IO submission and completion to solve data
    corruption issue when running concurrent extending DIO writes.
    Also solves problem of running IO completion transactions in
    interrupt context during size extending AIO writes.

    - FALLOC_FL_INSERT_RANGE support for inserting holes into a file via
    direct extent manipulation to avoid needing to copy data within the
    file

    - attribute block header field overflow fix for 64k block size
    filesystems

    - Lots of changes to log messaging to be more informative and concise
    when errors occur. Also prevent a lot of unnecessary log spamming
    due to cascading failures in error conditions.

    - lots of cleanups and bug fixes

    One thing of note is the direct IO fixes that we merged last week
    after the window opened. Even though a little late, they fix a user
    reported data corruption and have been pretty well tested. I figured
    there was not much point waiting another 2 weeks for -rc1 to be
    released just so I could send them to you..."

    * tag 'xfs-for-linus-4.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs: (49 commits)
    xfs: using generic_file_direct_write() is unnecessary
    xfs: direct IO EOF zeroing needs to drain AIO
    xfs: DIO write completion size updates race
    xfs: DIO writes within EOF don't need an ioend
    xfs: handle DIO overwrite EOF update completion correctly
    xfs: DIO needs an ioend for writes
    xfs: move DIO mapping size calculation
    xfs: factor DIO write mapping from get_blocks
    xfs: unlock i_mutex in xfs_break_layouts
    xfs: kill unnecessary firstused overflow check on attr3 leaf removal
    xfs: use larger in-core attr firstused field and detect overflow
    xfs: pass attr geometry to attr leaf header conversion functions
    xfs: disallow ro->rw remount on norecovery mount
    xfs: xfs_shift_file_space can be static
    xfs: Add support FALLOC_FL_INSERT_RANGE for fallocate
    fs: Add support FALLOC_FL_INSERT_RANGE for fallocate
    xfs: Fix incorrect positive ENOMEM return
    xfs: xfs_mru_cache_insert() should use GFP_NOFS
    xfs: %pF is only for function pointers
    xfs: fix shadow warning in xfs_da3_root_split()
    ...

    Linus Torvalds
     

23 Apr, 2015

1 commit

  • Pull InfiniBand/RDMA updates from Roland Dreier:

    - IPoIB fixes from Doug Ledford and Erez Shitrit

    - iSER updates from Sagi Grimberg

    - mlx4 GUID handling changes from Yishai Hadas

    - other misc fixes

    * tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (51 commits)
    mlx5: wrong page mask if CONFIG_ARCH_DMA_ADDR_T_64BIT enabled for 32Bit architectures
    IB/iser: Rewrite bounce buffer code path
    IB/iser: Bump version to 1.6
    IB/iser: Remove code duplication for a single DMA entry
    IB/iser: Pass struct iser_mem_reg to iser_fast_reg_mr and iser_reg_sig_mr
    IB/iser: Modify struct iser_mem_reg members
    IB/iser: Make fastreg pool cache friendly
    IB/iser: Move PI context alloc/free to routines
    IB/iser: Move fastreg descriptor pool get/put to helper functions
    IB/iser: Merge build page-vec into register page-vec
    IB/iser: Get rid of struct iser_rdma_regd
    IB/iser: Remove redundant assignments in iser_reg_page_vec
    IB/iser: Move memory reg/dereg routines to iser_memory.c
    IB/iser: Don't pass ib_device to fall_to_bounce_buff routine
    IB/iser: Remove a redundant struct iser_data_buf
    IB/iser: Remove redundant cmd_data_len calculation
    IB/iser: Fix wrong calculation of protection buffer length
    IB/iser: Handle fastreg/local_inv completion errors
    IB/iser: Fix unload during ep_poll wrong dereference
    ib_srpt: convert printk's to pr_* functions
    ...

    Linus Torvalds
     

18 Apr, 2015

2 commits

  • Pull f2fs updates from Jaegeuk Kim:
    "New features:
    - in-memory extent_cache
    - fs_shutdown to test power-off-recovery
    - use inline_data to store symlink path
    - show f2fs as a non-misc filesystem

    Major fixes:
    - avoid CPU stalls on sync_dirty_dir_inodes
    - fix some power-off-recovery procedure
    - fix handling of broken symlink correctly
    - fix missing dot and dotdot made by sudden power cuts
    - handle wrong data index during roll-forward recovery
    - preallocate data blocks for direct_io

    ... and a bunch of minor bug fixes and cleanups"

    * tag 'for-f2fs-4.1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (71 commits)
    f2fs: pass checkpoint reason on roll-forward recovery
    f2fs: avoid abnormal behavior on broken symlink
    f2fs: flush symlink path to avoid broken symlink after POR
    f2fs: change 0 to false for bool type
    f2fs: do not recover wrong data index
    f2fs: do not increase link count during recovery
    f2fs: assign parent's i_mode for empty dir
    f2fs: add F2FS_INLINE_DOTS to recover missing dot dentries
    f2fs: fix mismatching lock and unlock pages for roll-forward recovery
    f2fs: fix sparse warnings
    f2fs: limit b_size of mapped bh in f2fs_map_bh
    f2fs: persist system.advise into on-disk inode
    f2fs: avoid NULL pointer dereference in f2fs_xattr_advise_get
    f2fs: preallocate fallocated blocks for direct IO
    f2fs: enable inline data by default
    f2fs: preserve extent info for extent cache
    f2fs: initialize extent tree with on-disk extent info of inode
    f2fs: introduce __{find,grab}_extent_tree
    f2fs: split set_data_blkaddr from f2fs_update_extent_cache
    f2fs: enable fast symlink by utilizing inline data
    ...

    Linus Torvalds
     
  • Pull documentation updates from Jonathan Corbet:
    "Numerous fixes, the overdue removal of the i2o docs, some new Chinese
    translations, and, hopefully, the README fix that will end the flow of
    identical patches to that file"

    * tag 'docs-for-linus' of git://git.lwn.net/linux-2.6: (34 commits)
    Documentation/memcg: update memcg/kmem status
    Documentation: blackfin: Makefile: Typo building issue
    Documentation/vm/pagemap.txt: correct location of page-types tool
    Documentation/memory-barriers.txt: typo fix
    doc: Add guest_nice column to example output of `cat /proc/stat'
    Documentation/kernel-parameters: Move "eagerfpu" to its right place
    Documentation: gpio: Update ACPI part of the document to mention _DSD
    docs/completion.txt: Various tweaks and corrections
    doc: completion: context, scope and language fixes
    Documentation:Update Documentation/zh_CN/arm64/memory.txt
    Documentation:Update Documentation/zh_CN/arm64/booting.txt
    Documentation: Chinese translation of arm64/legacy_instructions.txt
    DocBook media: fix broken EIA hyperlink
    Documentation: tweak the maintainers entry
    README: Change gzip/bzip2 to xz compression format
    README: Update version number reference
    doc:pci: Fix typo in Documentation/PCI
    Documentation: drm: Use '->' when describing access through pointers.
    Documentation: Remove mentioning of block barriers
    Documentation/email-clients.txt: Fix one grammar mistake, add extra info about TB
    ...

    Linus Torvalds
     

17 Apr, 2015

4 commits

  • Merge third patchbomb from Andrew Morton:

    - various misc things

    - a couple of lib/ optimisations

    - provide DIV_ROUND_CLOSEST_ULL()

    - checkpatch updates

    - rtc tree

    - befs, nilfs2, hfs, hfsplus, fatfs, adfs, affs, bfs

    - ptrace fixes

    - fork() fixes

    - seccomp cleanups

    - more mmap_sem hold time reductions from Davidlohr

    * emailed patches from Andrew Morton: (138 commits)
    proc: show locks in /proc/pid/fdinfo/X
    docs: add missing and new /proc/PID/status file entries, fix typos
    drivers/rtc/rtc-at91rm9200.c: make IO endian agnostic
    Documentation/spi/spidev_test.c: fix warning
    drivers/rtc/rtc-s5m.c: allow usage on device type different than main MFD type
    .gitignore: ignore *.tar
    MAINTAINERS: add Mediatek SoC mailing list
    tomoyo: reduce mmap_sem hold for mm->exe_file
    powerpc/oprofile: reduce mmap_sem hold for exe_file
    oprofile: reduce mmap_sem hold for mm->exe_file
    mips: ip32: add platform data hooks to use DS1685 driver
    lib/Kconfig: fix up HAVE_ARCH_BITREVERSE help text
    x86: switch to using asm-generic for seccomp.h
    sparc: switch to using asm-generic for seccomp.h
    powerpc: switch to using asm-generic for seccomp.h
    parisc: switch to using asm-generic for seccomp.h
    mips: switch to using asm-generic for seccomp.h
    microblaze: use asm-generic for seccomp.h
    arm: use asm-generic for seccomp.h
    seccomp: allow COMPAT sigreturn overrides
    ...

    Linus Torvalds
     
  • Let's show locks which are associated with a file descriptor in
    its fdinfo file.

    Currently we don't have a reliable way to determine who holds a lock. We
    can find some information in /proc/locks, but the PID reported there
    can be wrong. For example, a process takes a lock, then forks a child and
    dies. In this case /proc/locks contains the parent's PID, which can be
    reused by another process.

    $ cat /proc/locks
    ...
    6: FLOCK ADVISORY WRITE 324 00:13:13431 0 EOF
    ...

    $ ps -C rpcbind
    PID TTY TIME CMD
    332 ? 00:00:00 rpcbind

    $ cat /proc/332/fdinfo/4
    pos: 0
    flags: 0100000
    mnt_id: 22
    lock: 1: FLOCK ADVISORY WRITE 324 00:13:13431 0 EOF

    $ ls -l /proc/332/fd/4
    lr-x------ 1 root root 64 Mar 5 14:43 /proc/332/fd/4 -> /run/rpcbind.lock

    $ ls -l /proc/324/fd/
    total 0
    lrwx------ 1 root root 64 Feb 27 14:50 0 -> /dev/pts/0
    lrwx------ 1 root root 64 Feb 27 14:50 1 -> /dev/pts/0
    lrwx------ 1 root root 64 Feb 27 14:49 2 -> /dev/pts/0

    You can see that the process with PID 324 doesn't hold the lock.

    This information is required for proper dumping and restoring file
    locks.

    Signed-off-by: Andrey Vagin
    Cc: Jonathan Corbet
    Cc: Alexander Viro
    Acked-by: Jeff Layton
    Acked-by: "J. Bruce Fields"
    Acked-by: Cyrill Gorcunov
    Cc: Pavel Emelyanov
    Cc: Joe Perches
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrey Vagin
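
    A tool that wants to use the new fdinfo line could parse it roughly as
    sketched below. This is a hedged illustration rather than code from the
    patch: parse_fdinfo_locks() is a hypothetical helper, and the meaning
    assigned to each column is an assumption based on the /proc/locks
    layout shown in the example output above.

```python
def parse_fdinfo_locks(fdinfo_text: str) -> list[dict]:
    """Extract the 'lock:' entries from the text of /proc/PID/fdinfo/FD."""
    locks = []
    for line in fdinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep or key.strip() != "lock":
            continue  # pos:, flags:, mnt_id: and other keys are skipped
        parts = value.split()
        if len(parts) < 8:
            continue  # not the column layout this sketch assumes
        locks.append({
            "id": parts[0].rstrip(":"),
            "type": parts[1],          # e.g. FLOCK or POSIX
            "mode": parts[2],          # e.g. ADVISORY
            "access": parts[3],        # e.g. READ or WRITE
            "pid": int(parts[4]),      # PID recorded when the lock was taken
            "device_inode": parts[5],  # major:minor:inode
            "start": parts[6],
            "end": parts[7],
        })
    return locks


if __name__ == "__main__":
    try:
        with open("/proc/self/fdinfo/0") as f:
            print(parse_fdinfo_locks(f.read()))
    except OSError as exc:
        print("fdinfo not available:", exc)
```

    Note that the PID column inside the lock line is still the recorded
    one (324 in the example); what identifies the real holder is the
    /proc/PID/fdinfo/FD path in which the line appears.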
     
  • docs: add missing and new /proc/PID/status file entries, fix typos

    Signed-off-by: Nathan Scott
    Signed-off-by: Chen Hanxiao
    Cc: Serge Hallyn
    Cc: "Eric W. Biederman"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nathan Scott
     
  • Pull third hunk of vfs changes from Al Viro:
    "This contains the ->direct_IO() changes from Omar + saner
    generic_write_checks() + dealing with fcntl()/{read,write}() races
    (mirroring O_APPEND/O_DIRECT into iocb->ki_flags and instead of
    repeatedly looking at ->f_flags, which can be changed by fcntl(2),
    check ->ki_flags - which cannot) + infrastructure bits for dhowells'
    d_inode annotations + Christoph's switch of /dev/loop to
    vfs_iter_write()"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (30 commits)
    block: loop: switch to VFS ITER_BVEC
    configfs: Fix inconsistent use of file_inode() vs file->f_path.dentry->d_inode
    VFS: Make pathwalk use d_is_reg() rather than S_ISREG()
    VFS: Fix up debugfs to use d_is_dir() in place of S_ISDIR()
    VFS: Combine inode checks with d_is_negative() and d_is_positive() in pathwalk
    NFS: Don't use d_inode as a variable name
    VFS: Impose ordering on accesses of d_inode and d_flags
    VFS: Add owner-filesystem positive/negative dentry checks
    nfs: generic_write_checks() shouldn't be done on swapout...
    ocfs2: use __generic_file_write_iter()
    mirror O_APPEND and O_DIRECT into iocb->ki_flags
    switch generic_write_checks() to iocb and iter
    ocfs2: move generic_write_checks() before the alignment checks
    ocfs2_file_write_iter: stop messing with ppos
    udf_file_write_iter: reorder and simplify
    fuse: ->direct_IO() doesn't need generic_write_checks()
    ext4_file_write_iter: move generic_write_checks() up
    xfs_file_aio_write_checks: switch to iocb/iov_iter
    generic_write_checks(): drop isblk argument
    blkdev_write_iter: expand generic_file_checks() call in there
    ...

    Linus Torvalds
     

16 Apr, 2015

3 commits

  • Merge second patchbomb from Andrew Morton:

    - the rest of MM

    - various misc bits

    - add ability to run /sbin/reboot at reboot time

    - printk/vsprintf changes

    - fiddle with seq_printf() return value

    * akpm: (114 commits)
    parisc: remove use of seq_printf return value
    lru_cache: remove use of seq_printf return value
    tracing: remove use of seq_printf return value
    cgroup: remove use of seq_printf return value
    proc: remove use of seq_printf return value
    s390: remove use of seq_printf return value
    cris fasttimer: remove use of seq_printf return value
    cris: remove use of seq_printf return value
    openrisc: remove use of seq_printf return value
    ARM: plat-pxa: remove use of seq_printf return value
    nios2: cpuinfo: remove use of seq_printf return value
    microblaze: mb: remove use of seq_printf return value
    ipc: remove use of seq_printf return value
    rtc: remove use of seq_printf return value
    power: wakeup: remove use of seq_printf return value
    x86: mtrr: if: remove use of seq_printf return value
    linux/bitmap.h: improve BITMAP_{LAST,FIRST}_WORD_MASK
    MAINTAINERS: CREDITS: remove Stefano Brivio from B43
    .mailmap: add Ricardo Ribalda
    CREDITS: add Ricardo Ribalda Delgado
    ...

    Linus Torvalds
     
  • This will allow a filesystem that uses VM_PFNMAP | VM_MIXEDMAP (no page
    structs) to get notified when an access is a write to a read-only PFN.

    This can happen if we mmap() a file, first mmap-read from it to
    page-in a read-only PFN, and then mmap-write to the same page.

    We need this functionality to fix a DAX bug, where in the scenario above
    we fail to set ctime/mtime though we modified the file. An xfstest is
    attached to this patchset that shows the failure and the fix. (A DAX
    patch will follow)

    This functionality is extra important for us, because upon dirtying of a
    pmem page we also want to RDMA the page to a remote cluster node.

    We define a new pfn_mkwrite and do not reuse page_mkwrite because:
    1 - The name ;-)
    2 - But mainly because it would take a very long and tedious
    audit of all page_mkwrite functions of VM_MIXEDMAP/VM_PFNMAP
    users to make sure they do not now crash. For example, the current
    DAX code (which this is for) would crash.
    If we wanted to reuse page_mkwrite, we would first need to
    patch all users so that they do not crash when there is no page,
    and only then enable this patch. But even if I did that I would
    not sleep so well at night.
    Adding a new vector is the safest thing to do, and is not that
    expensive: an extra pointer in a static function vector per driver.
    The new vector is also better for performance, because otherwise we
    would call all current kernel vectors just so they can check that
    there is no page, do nothing, and return.

    No need to call it from do_shared_fault because do_wp_page is called to
    change pte permissions anyway.

    Signed-off-by: Yigal Korman
    Signed-off-by: Boaz Harrosh
    Acked-by: Kirill A. Shutemov
    Cc: Matthew Wilcox
    Cc: Jan Kara
    Cc: Hugh Dickins
    Cc: Mel Gorman
    Cc: Dave Chinner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Boaz Harrosh
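
    The access pattern that triggers the problem (read from a shared
    mapping first, then write to the same page) can be reproduced from user
    space with a sketch like the following. It is not the xfstest mentioned
    above; read_then_write() is a hypothetical helper, and on an ordinary
    page-cache filesystem the timestamps are expected to be updated, so the
    sketch only shows the sequence of accesses.

```python
import mmap
import os
import tempfile


def read_then_write(path: str) -> tuple[int, int]:
    """Read the first page of a shared file mapping, then write to it.

    Returns the file's mtime in nanoseconds before and after the write.
    """
    with open(path, "r+b") as f:
        before = os.fstat(f.fileno()).st_mtime_ns
        # On Unix, mmap.mmap() defaults to a shared read-write mapping.
        with mmap.mmap(f.fileno(), 0) as m:
            _ = m[0]         # read access first: faults the page in
            m[0] = ord("X")  # then write to the same page
            m.flush()        # msync() the change back to the file
        after = os.fstat(f.fileno()).st_mtime_ns
    return before, after


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"a" * 4096)
    try:
        before, after = read_then_write(tmp.name)
        print("mtime before:", before, "after:", after)
    finally:
        os.unlink(tmp.name)
```

    Running it against a file on a DAX mount would be the interesting
    case: without a write notification for the PFN, the "after" timestamp
    is the one that fails to advance.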
     
  • The ifconfig command has been deprecated for many years.
    To encourage new users to learn iproute2 instead of continuing
    to use ifconfig, it should not be used in examples.

    Signed-off-by: Stephen Hemminger
    Signed-off-by: Doug Ledford

    Stephen Hemminger
     

12 Apr, 2015

3 commits


11 Apr, 2015

1 commit

  • Enable the inline_data feature by default, since it brings better
    performance and space utilization and is now stable.
    Add another option, noinline_data, to disable it at mount time.

    Suggested-by: Jaegeuk Kim
    Suggested-by: Chao Yu
    Signed-off-by: Wanpeng Li
    Signed-off-by: Jaegeuk Kim

    Wanpeng Li
     

04 Apr, 2015

1 commit


04 Mar, 2015

1 commit

  • This patch adds a mount option 'extent_cache' in f2fs.

    If this option is set, f2fs tries to use an rb-tree based extent cache
    to cache more mapping information with less memory; otherwise it uses
    the original single extent info cache.

    Suggested-by: Changman Lee
    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     

01 Mar, 2015

1 commit

  • We (the Ocfs2 project) recently moved the location of our ocfs2-tools
    git tree and project web page. The pertinent discussion can be seen
    here:

    https://oss.oracle.com/pipermail/ocfs2-devel/2015-February/010579.html

    The following patch updates the Ocfs2 documentation in MAINTAINERS,
    ocfs2.txt, and dlmfs.txt. I added our new official web page, changed
    the location of our tools git tree and removed the link to Joel's
    ancient kernel git tree - Andrew has handled our patches for a while
    now.

    Signed-off-by: Mark Fasheh
    Cc: Joel Becker
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mark Fasheh
     

24 Feb, 2015

1 commit


23 Feb, 2015

1 commit


20 Feb, 2015

1 commit


17 Feb, 2015

6 commits

  • The DAX code accesses the underlying storage through the kernel's linear
    mapping, which may not be cache-coherent with user mappings on ARM, MIPS
    or SPARC. Temporarily disable the DAX code until this problem is
    resolved.

    The original XIP code also had this problem, but it was never noticed.

    Signed-off-by: Matthew Wilcox
    Cc: Andreas Dilger
    Cc: Boaz Harrosh
    Cc: Christoph Hellwig
    Cc: Dave Chinner
    Cc: Jan Kara
    Cc: Jens Axboe
    Cc: Kirill A. Shutemov
    Cc: Mathieu Desnoyers
    Cc: Randy Dunlap
    Cc: Ross Zwisler
    Cc: Theodore Ts'o
    Cc: Ralf Baechle
    Cc: Russell King
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     
  • This is a port of the DAX functionality found in the current version of
    ext2.

    [matthew.r.wilcox@intel.com: heavily tweaked]
    [akpm@linux-foundation.org: remap_pages went away]
    Signed-off-by: Ross Zwisler
    Reviewed-by: Andreas Dilger
    Signed-off-by: Matthew Wilcox
    Cc: Boaz Harrosh
    Cc: Christoph Hellwig
    Cc: Dave Chinner
    Cc: Jan Kara
    Cc: Jens Axboe
    Cc: Kirill A. Shutemov
    Cc: Mathieu Desnoyers
    Cc: Randy Dunlap
    Cc: Theodore Ts'o
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ross Zwisler
     
  • This new function allows us to support hole-punch for DAX files by zeroing
    a partial page, as opposed to the dax_truncate_page() function which can
    only truncate to the end of the page. Reimplement dax_truncate_page() to
    call dax_zero_page_range().

    [ross.zwisler@linux.intel.com: ported to 3.13-rc2]
    [akpm@linux-foundation.org: fix typos in comments]
    Signed-off-by: Matthew Wilcox
    Signed-off-by: Ross Zwisler
    Cc: Andreas Dilger
    Cc: Boaz Harrosh
    Cc: Christoph Hellwig
    Cc: Dave Chinner
    Cc: Jan Kara
    Cc: Jens Axboe
    Cc: Kirill A. Shutemov
    Cc: Mathieu Desnoyers
    Cc: Randy Dunlap
    Cc: Theodore Ts'o
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     
  • To help people transition, accept the 'xip' mount option (and report it in
    /proc/mounts), but print a message encouraging people to switch over to
    the 'dax' option.

    Signed-off-by: Matthew Wilcox
    Reviewed-by: Mathieu Desnoyers
    Cc: Andreas Dilger
    Cc: Boaz Harrosh
    Cc: Christoph Hellwig
    Cc: Dave Chinner
    Cc: Jan Kara
    Cc: Jens Axboe
    Cc: Kirill A. Shutemov
    Cc: Randy Dunlap
    Cc: Ross Zwisler
    Cc: Theodore Ts'o
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
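
    Since the old option is still reported in /proc/mounts, an
    administrator could look for mounts that have not yet switched over
    with a sketch like this. mounts_with_option() is a hypothetical helper,
    and the device and mount point names in the usage example are made up
    for illustration.

```python
def mounts_with_option(mounts_text: str, option: str) -> list[str]:
    """Return the mount points in /proc/mounts text that use `option`."""
    hits = []
    for line in mounts_text.splitlines():
        # Columns: device, mount point, fs type, options, dump, pass.
        fields = line.split()
        if len(fields) < 4:
            continue
        if option in fields[3].split(","):
            hits.append(fields[1])
    return hits


if __name__ == "__main__":
    try:
        with open("/proc/mounts") as f:
            text = f.read()
    except OSError:
        # Made-up sample used when /proc/mounts is not available.
        text = "/dev/ram0 /mnt/old ext2 rw,relatime,xip 0 0\n"
    print("still using xip:", mounts_with_option(text, "xip"))
    print("using dax:", mounts_with_option(text, "dax"))
```

    Matching against the comma-separated option list, rather than doing a
    substring search, avoids false hits on options that merely contain
    the letters "xip" or "dax".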
     
  • All callers of get_xip_mem() are now gone. Remove checks for it,
    initialisers of it, documentation of it and the only implementation of it.
    Also remove mm/filemap_xip.c as it is now empty. Also remove
    documentation of the long-gone get_xip_page().

    Signed-off-by: Matthew Wilcox
    Cc: Andreas Dilger
    Cc: Boaz Harrosh
    Cc: Christoph Hellwig
    Cc: Dave Chinner
    Cc: Jan Kara
    Cc: Jens Axboe
    Cc: Kirill A. Shutemov
    Cc: Mathieu Desnoyers
    Cc: Randy Dunlap
    Cc: Ross Zwisler
    Cc: Theodore Ts'o
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     
  • Based on the original XIP documentation, this documents the current state
    of affairs, and includes instructions on how users can enable DAX if their
    devices and kernel support it.

    Signed-off-by: Matthew Wilcox
    Reviewed-by: Randy Dunlap
    Cc: Andreas Dilger
    Cc: Boaz Harrosh
    Cc: Christoph Hellwig
    Cc: Dave Chinner
    Cc: Jan Kara
    Cc: Jens Axboe
    Cc: Kirill A. Shutemov
    Cc: Mathieu Desnoyers
    Cc: Ross Zwisler
    Cc: Theodore Ts'o
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     

13 Feb, 2015

7 commits

  • Pull f2fs updates from Jaegeuk Kim:
    "Major changes are to:
    - add f2fs_io_tracer and F2FS_IOC_GETVERSION
    - fix wrong acl assignment from parent
    - fix accessing wrong data blocks
    - fix wrong condition check for f2fs_sync_fs
    - align start block address for direct_io
    - add and refactor the readahead flows of FS metadata
    - refactor atomic and volatile write policies

    But most of patches are for clean-ups and minor bug fixes. Some of
    them refactor old code too"

    * tag 'for-f2fs-3.20' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (64 commits)
    f2fs: use spinlock for segmap_lock instead of rwlock
    f2fs: fix accessing wrong indexed data blocks
    f2fs: avoid variable length array
    f2fs: fix sparse warnings
    f2fs: allocate data blocks in advance for f2fs_direct_IO
    f2fs: introduce macros to convert bytes and blocks in f2fs
    f2fs: call set_buffer_new for get_block
    f2fs: check node page contents all the time
    f2fs: avoid data offset overflow when lseeking huge file
    f2fs: fix to use highmem for pages of newly created directory
    f2fs: introduce a batched trim
    f2fs: merge {invalidate,release}page for meta/node/data pages
    f2fs: show the number of writeback pages in stat
    f2fs: keep PagePrivate during releasepage
    f2fs: should fail mount when trying to recover data on read-only dev
    f2fs: split UMOUNT and FASTBOOT flags
    f2fs: avoid write_checkpoint if f2fs is mounted readonly
    f2fs: support norecovery mount option
    f2fs: fix not to drop mount options when retrying fill_super
    f2fs: merge flags in struct f2fs_sb_info
    ...

    Linus Torvalds
     
  • Merge third set of updates from Andrew Morton:

    - the rest of MM

    [ This includes getting rid of the numa hinting bits, in favor of
    just generic protnone logic. Yay. - Linus ]

    - core kernel

    - procfs

    - some of lib/ (lots of lib/ material this time)

    * emailed patches from Andrew Morton: (104 commits)
    lib/lcm.c: replace include
    lib/percpu_ida.c: remove redundant includes
    lib/strncpy_from_user.c: replace module.h include
    lib/stmp_device.c: replace module.h include
    lib/sort.c: move include inside #if 0
    lib/show_mem.c: remove redundant include
    lib/radix-tree.c: change to simpler include
    lib/plist.c: remove redundant include
    lib/nlattr.c: remove redundant include
    lib/kobject_uevent.c: remove redundant include
    lib/llist.c: remove redundant include
    lib/md5.c: simplify include
    lib/list_sort.c: rearrange includes
    lib/genalloc.c: remove redundant include
    lib/idr.c: remove redundant include
    lib/halfmd4.c: simplify includes
    lib/dynamic_queue_limits.c: simplify includes
    lib/sort.c: use simpler includes
    lib/interval_tree.c: simplify includes
    hexdump: make it return number of bytes placed in buffer
    ...

    Linus Torvalds
     
  • The output of /proc/$pid/numa_maps is in terms of number of pages like
    anon=22 or dirty=54. Here's some output:

    7f4680000000 default file=/hugetlb/bigfile anon=50 dirty=50 N0=50
    7f7659600000 default file=/anon_hugepage\040(deleted) anon=50 dirty=50 N0=50
    7fff8d425000 default stack anon=50 dirty=50 N0=50

    Looks like we have a stack and a couple of anonymous hugetlbfs
    areas which all use the same amount of memory. They don't.

    The 'bigfile' uses 1GB pages and takes up ~50GB of space. The
    anon_hugepage uses 2MB pages and takes up ~100MB of space while the stack
    uses normal 4k pages. You can go over to smaps to figure out what the
    page size _really_ is with KernelPageSize or MMUPageSize. But, I think
    this is a pretty nasty and counterintuitive interface as it stands.

    This patch introduces a 'kernelpagesize_kB' line element to the
    /proc/<pid>/numa_maps report file in order to help identify the size of
    the pages that back memory areas mapped by a given task. This is
    especially useful for differentiating between HUGE and GIGANTIC page
    backed VMAs.

    This patch is based on Dave Hansen's proposal and reviewers' follow-ups
    taken from the following discussion threads:
    * https://lkml.org/lkml/2011/9/21/454
    * https://lkml.org/lkml/2014/12/20/66

    Signed-off-by: Rafael Aquini
    Cc: Johannes Weiner
    Cc: Dave Hansen
    Acked-by: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rafael Aquini
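
    With the new element, a reader of numa_maps can turn page counts into
    bytes without consulting smaps. The sketch below shows the arithmetic;
    numa_maps_bytes() is a hypothetical helper, and the sample lines are
    the ones from the message above with an assumed kernelpagesize_kB value
    appended (1048576 for 1GB pages, 2048 for 2MB pages, 4 for 4k pages).

```python
def numa_maps_bytes(line: str) -> dict[str, int]:
    """Convert the page counts on one numa_maps line into byte counts."""
    tokens = line.split()
    # tokens[0] is the start address and tokens[1] the memory policy; the
    # rest are key=value pairs or bare flags such as "stack" or "heap".
    pairs = dict(t.split("=", 1) for t in tokens[2:] if "=" in t)
    page_kb = int(pairs["kernelpagesize_kB"])
    sizes = {}
    for key in ("anon", "dirty", "mapped"):
        if key in pairs:
            sizes[key] = int(pairs[key]) * page_kb * 1024
    return sizes


if __name__ == "__main__":
    samples = [
        "7f4680000000 default file=/hugetlb/bigfile anon=50 dirty=50 "
        "N0=50 kernelpagesize_kB=1048576",
        r"7f7659600000 default file=/anon_hugepage\040(deleted) anon=50 "
        "dirty=50 N0=50 kernelpagesize_kB=2048",
        "7fff8d425000 default stack anon=50 dirty=50 N0=50 "
        "kernelpagesize_kB=4",
    ]
    for sample in samples:
        print(sample.split()[0], numa_maps_bytes(sample))
```

    The three areas all report anon=50, but multiplying by the page size
    gives 50 GiB, 100 MiB and 200 KiB respectively, which is the difference
    the message above points out.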
     
  • Add a small section to the proc.txt documentation to describe the
    /proc/pid/numa_maps interface. It does not introduce any functional
    changes, just documentation.

    Signed-off-by: Rafael Aquini
    Cc: Johannes Weiner
    Cc: Dave Hansen
    Cc: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rafael Aquini
     
  • Peak resident size of a process can be reset back to the process's
    current rss value by writing "5" to /proc/pid/clear_refs. The driving
    use-case for this would be getting the peak RSS value, which can be
    retrieved from the VmHWM field in /proc/pid/status, per benchmark
    iteration or test scenario.

    [akpm@linux-foundation.org: clarify behaviour in documentation]
    Signed-off-by: Petr Cermak
    Cc: Bjorn Helgaas
    Cc: Primiano Tucci
    Cc: Petr Cermak
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Petr Cermak
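
    The per-iteration use case described above might look like the
    following sketch. The helper names vm_hwm_kb() and reset_peak_rss()
    are hypothetical; the sketch assumes a kernel that accepts "5" in
    clear_refs and reports VmHWM in kB, as in /proc/pid/status.

```python
import os
from typing import Optional


def vm_hwm_kb(status_text: str) -> Optional[int]:
    """Return the VmHWM value in kB from /proc/PID/status text, if any."""
    for line in status_text.splitlines():
        if line.startswith("VmHWM:"):
            return int(line.split()[1])
    return None


def reset_peak_rss(pid: str = "self") -> None:
    """Reset the peak RSS of `pid` to its current RSS."""
    with open(f"/proc/{pid}/clear_refs", "w") as f:
        f.write("5")


def read_peak_kb(pid: str = "self") -> Optional[int]:
    with open(f"/proc/{pid}/status") as f:
        return vm_hwm_kb(f.read())


if __name__ == "__main__":
    try:
        for iteration in range(3):
            reset_peak_rss()
            # Stand-in for one benchmark iteration of varying size.
            workload = bytearray((iteration + 1) * 8 * 1024 * 1024)
            print("iteration", iteration, "peak RSS kB:", read_peak_kb())
            del workload
    except OSError as exc:
        print("clear_refs or status not usable here:", exc)
```

    Resetting before each iteration means the reported peak belongs to
    that iteration alone rather than to the largest one seen so far.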
     
  • Pull core block IO changes from Jens Axboe:
    "This contains:

    - A series from Christoph that cleans up and refactors various parts
    of the REQ_BLOCK_PC handling. Contributions in that series from
    Dongsu Park and Kent Overstreet as well.

    - CFQ:
    - A bug fix for cfq for realtime IO scheduling from Jeff Moyer.
    - A stable patch fixing a potential crash in CFQ in OOM
    situations. From Konstantin Khlebnikov.

    - blk-mq:
    - Add support for tag allocation policies, from Shaohua. This is
    a prep patch enabling libata (and other SCSI parts) to use the
    blk-mq tagging, instead of rolling their own.
    - Various little tweaks from Keith and Mike, in preparation for
    DM blk-mq support.
    - Minor little fixes or tweaks from me.
    - A double free error fix from Tony Battersby.

    - The partition 4k issue fixes from Matthew and Boaz.

    - Add support for zero+unprovision for blkdev_issue_zeroout() from
    Martin"

    * 'for-3.20/core' of git://git.kernel.dk/linux-block: (27 commits)
    block: remove unused function blk_bio_map_sg
    block: handle the null_mapped flag correctly in blk_rq_map_user_iov
    blk-mq: fix double-free in error path
    block: prevent request-to-request merging with gaps if not allowed
    blk-mq: make blk_mq_run_queues() static
    dm: fix multipath regression due to initializing wrong request
    cfq-iosched: handle failure of cfq group allocation
    block: Quiesce zeroout wrapper
    block: rewrite and split __bio_copy_iov()
    block: merge __bio_map_user_iov into bio_map_user_iov
    block: merge __bio_map_kern into bio_map_kern
    block: pass iov_iter to the BLOCK_PC mapping functions
    block: add a helper to free bio bounce buffer pages
    block: use blk_rq_map_user_iov to implement blk_rq_map_user
    block: simplify bio_map_kern
    block: mark blk-mq devices as stackable
    block: keep established cmd_flags when cloning into a blk-mq request
    block: add blk-mq support to blk_insert_cloned_request()
    block: require blk_rq_prep_clone() be given an initialized clone request
    blk-mq: add tag allocation policy
    ...

    Linus Torvalds
     
  • Pull nfsd updates from Bruce Fields:
    "The main change is the pNFS block server support from Christoph, which
    allows an NFS client connected to shared disk to do block IO to the
    shared disk in place of NFS reads and writes. This also requires xfs
    patches, which should arrive soon through the xfs tree, barring
    unexpected problems. Support for other filesystems is also possible
    if there's interest.

    Thanks also to Chuck Lever for continuing work to get NFS/RDMA into
    shape"

    * 'for-3.20' of git://linux-nfs.org/~bfields/linux: (32 commits)
    nfsd: default NFSv4.2 to on
    nfsd: pNFS block layout driver
    exportfs: add methods for block layout exports
    nfsd: add trace events
    nfsd: update documentation for pNFS support
    nfsd: implement pNFS layout recalls
    nfsd: implement pNFS operations
    nfsd: make find_any_file available outside nfs4state.c
    nfsd: make find/get/put file available outside nfs4state.c
    nfsd: make lookup/alloc/unhash_stid available outside nfs4state.c
    nfsd: add fh_fsid_match helper
    nfsd: move nfsd_fh_match to nfsfh.h
    fs: add FL_LAYOUT lease type
    fs: track fl_owner for leases
    nfs: add LAYOUT_TYPE_MAX enum value
    nfsd: factor out a helper to decode nfstime4 values
    sunrpc/lockd: fix references to the BKL
    nfsd: fix year-2038 nfs4 state problem
    svcrdma: Handle additional inline content
    svcrdma: Move read list XDR round-up logic
    ...

    Linus Torvalds
     

12 Feb, 2015

5 commits

  • Merge second set of updates from Andrew Morton:
    "More of MM"

    * emailed patches from Andrew Morton: (83 commits)
    mm/nommu.c: fix arithmetic overflow in __vm_enough_memory()
    mm/mmap.c: fix arithmetic overflow in __vm_enough_memory()
    vmstat: Reduce time interval to stat update on idle cpu
    mm/page_owner.c: remove unnecessary stack_trace field
    Documentation/filesystems/proc.txt: describe /proc/<pid>/map_files
    mm: incorporate read-only pages into transparent huge pages
    vmstat: do not use deferrable delayed work for vmstat_update
    mm: more aggressive page stealing for UNMOVABLE allocations
    mm: always steal split buddies in fallback allocations
    mm: when stealing freepages, also take pages created by splitting buddy page
    mincore: apply page table walker on do_mincore()
    mm: /proc/pid/clear_refs: avoid split_huge_page()
    mm: pagewalk: fix misbehavior of walk_page_range for vma(VM_PFNMAP)
    mempolicy: apply page table walker on queue_pages_range()
    arch/powerpc/mm/subpage-prot.c: use walk->vma and walk_page_vma()
    memcg: cleanup preparation for page table walk
    numa_maps: remove numa_maps->vma
    numa_maps: fix typo in gather_hugetbl_stats
    pagemap: use walk->vma instead of calling find_vma()
    clear_refs: remove clear_refs_private->vma and introduce clear_refs_test_walk()
    ...

    Linus Torvalds
     
  • Pull NFS client updates from Trond Myklebust:
    "Highlights include:

    Features:
    - Removing the forced serialisation of open()/close() calls in
    NFSv4.x (x>0) makes for a significant performance improvement in
    metadata intensive workloads.
    - Full support for the pNFS "flexible files" layout type
    - Further RPC/RDMA client improvements from Chuck

    Bugfixes:
    - Stable fix: NFSv4.1 backchannel calls blocking operations with !TASK_RUNNING
    - Stable fix: pnfs_generic_pg_init_read/write can be called with lseg == NULL
    - Stable fix: Fix an Oopsable condition when nsm_mon_unmon is called
    as part of the namespace cleanup
    - Stable fix: Ensure we reference the inode for return-on-close in
    delegreturn
    - Use SO_REUSEPORT to ensure that NFSv3 TCP connections can rebind to
    the same source address/port combination during a disconnect/
    reconnect event. This is a requirement imposed by most NFSv3
    server duplicate reply cache implementations.

    Optimisations:
    - Ask for no NFSv4.1 delegations on OPEN if using O_DIRECT

    Other:
    - Add Anna Schumaker as co-maintainer for the NFS client"

    * tag 'nfs-for-3.20-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (119 commits)
    SUNRPC: Cleanup to remove xs_tcp_close()
    pnfs: delete an unintended goto
    pnfs/flexfiles: Do not dprintk after the free
    SUNRPC: Fix stupid typo in xs_sock_set_reuseport
    SUNRPC: Define xs_tcp_fin_timeout only if CONFIG_SUNRPC_DEBUG
    SUNRPC: Handle connection reset more efficiently.
    SUNRPC: Remove the redundant XPRT_CONNECTION_CLOSE flag
    SUNRPC: Make xs_tcp_close() do a socket shutdown rather than a sock_release
    SUNRPC: Ensure xs_tcp_shutdown() requests a full close of the connection
    SUNRPC: Cleanup to remove remaining uses of XPRT_CONNECTION_ABORT
    SUNRPC: Remove TCP socket linger code
    SUNRPC: Remove TCP client connection reset hack
    SUNRPC: TCP/UDP always close the old socket before reconnecting
    SUNRPC: Add helpers to prevent socket create from racing
    SUNRPC: Ensure xs_reset_transport() resets the close connection flags
    SUNRPC: Do not clear the source port in xs_reset_transport
    SUNRPC: Handle EADDRINUSE on connect
    SUNRPC: Set SO_REUSEPORT socket option for TCP connections
    NFSv4.1: Fix pnfs_put_lseg races
    NFSv4.1: pnfs_send_layoutreturn should use GFP_NOFS
    ...

    Linus Torvalds
     
  • [akpm@linux-foundation.org: tweaks]
    Signed-off-by: Cyrill Gorcunov
    Cc: Kees Cook
    Cc: "Kirill A. Shutemov"
    Cc: Calvin Owens
    Cc: Pavel Emelyanov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Cyrill Gorcunov
     
  • This patch introduces a batched trimming feature, which submits discard
    commands in split batches.

    This avoids the long latency caused by huge trim commands.
    If fstrim is triggered over a range from 0 to the end of the device, we have
    to hold all the checkpoint-related mutexes, resulting in very long latency.
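
    The batching idea can be sketched in plain user-space C. This is an
    illustration under assumptions, not the f2fs implementation:
    `trim_in_batches`, `discard_fn`, and `count_blocks` are hypothetical names,
    and the batch size is an arbitrary caller-supplied value. The point is that
    any locking would be taken and released once per batch instead of once for
    the whole range.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical callback: issue one discard for the inclusive block range
 * [start, end]. Returns 0 on success or a negative error code.
 */
typedef int (*discard_fn)(uint64_t start, uint64_t end, void *ctx);

/*
 * Split the inclusive range [start, end] into pieces of at most `batch`
 * blocks and pass each piece to issue(). Returns the number of pieces
 * issued, -1 for invalid arguments, or the first error from issue().
 */
static long trim_in_batches(uint64_t start, uint64_t end, uint64_t batch,
                            discard_fn issue, void *ctx)
{
    long pieces = 0;

    if (batch == 0 || start > end || issue == NULL)
        return -1;

    for (;;) {
        /* end - start + 1 blocks remain; the last piece may be short. */
        uint64_t piece_end =
            (end - start < batch) ? end : start + batch - 1;
        int err;

        /* A real implementation would take its locks here ... */
        err = issue(start, piece_end, ctx);
        /* ... and drop them here, so other work can run between batches. */
        if (err < 0)
            return err;

        pieces++;
        if (piece_end == end)
            break;
        start = piece_end + 1;
    }

    return pieces;
}

/* Example callback: add the number of blocks covered to a counter in ctx. */
static int count_blocks(uint64_t start, uint64_t end, void *ctx)
{
    *(uint64_t *)ctx += end - start + 1;
    return 0;
}
```

    For example, trimming blocks 0 through 9 with a batch size of 4 issues
    three discards covering [0, 3], [4, 7], and [8, 9].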

    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     
  • This patch adds a mount option, norecovery, which is mostly the same as
    disable_roll_forward. The only difference is that norecovery must be used
    together with the read-only mount option.

    This can be used when a user wants to check whether f2fs is mountable
    without running any recovery process (e.g., xfstests/200).

    Reviewed-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim