09 May, 2017

26 commits

  • Merge more updates from Andrew Morton:

    - the rest of MM

    - various misc things

    - procfs updates

    - lib/ updates

    - checkpatch updates

    - kdump/kexec updates

    - add kvmalloc helpers, use them

    - time helper updates for Y2038 issues. We're almost ready to remove
    current_fs_time() but that awaits a btrfs merge.

    - add tracepoints to DAX

    * emailed patches from Andrew Morton: (114 commits)
    drivers/staging/ccree/ssi_hash.c: fix build with gcc-4.4.4
    selftests/vm: add a test for virtual address range mapping
    dax: add tracepoint to dax_insert_mapping()
    dax: add tracepoint to dax_writeback_one()
    dax: add tracepoints to dax_writeback_mapping_range()
    dax: add tracepoints to dax_load_hole()
    dax: add tracepoints to dax_pfn_mkwrite()
    dax: add tracepoints to dax_iomap_pte_fault()
    mtd: nand: nandsim: convert to memalloc_noreclaim_*()
    treewide: convert PF_MEMALLOC manipulations to new helpers
    mm: introduce memalloc_noreclaim_{save,restore}
    mm: prevent potential recursive reclaim due to clearing PF_MEMALLOC
    mm/huge_memory.c: deposit a pgtable for DAX PMD faults when required
    mm/huge_memory.c: use zap_deposited_table() more
    time: delete CURRENT_TIME_SEC and CURRENT_TIME
    gfs2: replace CURRENT_TIME with current_time
    apparmorfs: replace CURRENT_TIME with current_time()
    lustre: replace CURRENT_TIME macro
    fs: ubifs: replace CURRENT_TIME_SEC with current_time
    fs: ufs: use ktime_get_real_ts64() for birthtime
    ...

    Linus Torvalds
     
  • Add a tracepoint to dax_insert_mapping(), following the same logging
    conventions as the rest of DAX. This tracepoint, along with the one in
    dax_load_hole(), lets us know how a DAX PTE fault was serviced.

    Here is an example DAX fault that inserts a PTE mapping:

    small-1126 [007] ....
    145.451604: dax_pte_fault: dev 259:0 ino 0x1003 shared WRITE|ALLOW_RETRY|KILLABLE|USER address 0x10420000 pgoff 0x220

    small-1126 [007] ....
    145.452317: dax_insert_mapping: dev 259:0 ino 0x1003 shared write address 0x10420000 radix_entry 0x100006

    small-1126 [007] ....
    145.452399: dax_pte_fault_done: dev 259:0 ino 0x1003 shared WRITE|ALLOW_RETRY|KILLABLE|USER address 0x10420000 pgoff 0x220 MAJOR|NOPAGE
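
As a rough illustration (not part of the patch), the trace excerpt above could be post-processed to tell how a PTE fault was serviced. This is a minimal sketch that assumes the line layout shown in the example; the function name classify_pte_fault and the labels it returns are hypothetical.

```python
import re

# Matches the event name that follows the "<timestamp>:" field in the
# trace excerpts quoted in these commit messages (assumed layout).
EVENT_RE = re.compile(r"\d+\.\d+:\s+(\w+):")


def classify_pte_fault(lines):
    """Return 'mapping', 'hole', or 'unknown' for one fault's trace lines."""
    events = [m.group(1) for line in lines if (m := EVENT_RE.search(line))]
    if "dax_insert_mapping" in events:
        return "mapping"  # a PTE mapping was inserted
    if "dax_load_hole" in events:
        return "hole"     # the fault hit a hole in the file
    return "unknown"
```

Feeding it the three events above would yield "mapping"; the dax_load_hole() excerpt from the sibling patch would yield "hole".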

    Link: http://lkml.kernel.org/r/20170221195116.13278-7-ross.zwisler@linux.intel.com
    Signed-off-by: Ross Zwisler
    Reviewed-by: Jan Kara
    Cc: Alexander Viro
    Cc: Dan Williams
    Cc: Ingo Molnar
    Cc: Matthew Wilcox
    Cc: Steven Rostedt
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ross Zwisler
     
  • Add a tracepoint to dax_writeback_one(), following the same logging
    conventions as the rest of DAX.

    Here is an example range writeback which ends up flushing one PMD and
    one PTE:

    test-1265 [003] ....
    496.615250: dax_writeback_range: dev 259:0 ino 0x1003 pgoff 0x0-0x7ffffffffffff

    test-1265 [003] ....
    496.616263: dax_writeback_one: dev 259:0 ino 0x1003 pgoff 0x0 pglen 0x200

    test-1265 [003] ....
    496.616270: dax_writeback_one: dev 259:0 ino 0x1003 pgoff 0x305 pglen 0x1

    test-1265 [003] ....
    496.616272: dax_writeback_range_done: dev 259:0 ino 0x1003 pgoff 0x0-0x7ffffffffffff
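
To make the "one PMD and one PTE" reading of this trace concrete, here is a minimal sketch (not part of the patch) that converts the pgoff/pglen fields into byte offsets. It assumes 4 KiB base pages and 512 pages per PMD; the helper name flushed_entries is hypothetical.

```python
import re

PAGE_SIZE = 4096   # assumption: 4 KiB base pages
PMD_PAGES = 0x200  # assumption: 512 base pages per PMD (2 MiB)

# Only dax_writeback_one lines carry a pglen field in the example above.
ONE_RE = re.compile(
    r"dax_writeback_one: .*pgoff (0x[0-9a-f]+) pglen (0x[0-9a-f]+)"
)


def flushed_entries(lines):
    """Return (kind, byte_offset, byte_length) per dax_writeback_one line."""
    entries = []
    for line in lines:
        m = ONE_RE.search(line)
        if not m:
            continue  # range/range_done lines and task headers are skipped
        pgoff = int(m.group(1), 16)
        pglen = int(m.group(2), 16)
        if pglen == PMD_PAGES:
            kind = "PMD"
        elif pglen == 1:
            kind = "PTE"
        else:
            kind = "other"
        entries.append((kind, pgoff * PAGE_SIZE, pglen * PAGE_SIZE))
    return entries
```

Under those assumptions, pglen 0x200 is a 2 MiB PMD entry at byte offset 0, and pglen 0x1 is a single 4 KiB PTE entry at page offset 0x305.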

    [akpm@linux-foundation.org: struct blk_dax_ctl has disappeared]
    Link: http://lkml.kernel.org/r/20170221195116.13278-6-ross.zwisler@linux.intel.com
    Signed-off-by: Ross Zwisler
    Reviewed-by: Jan Kara
    Cc: Alexander Viro
    Cc: Dan Williams
    Cc: Ingo Molnar
    Cc: Matthew Wilcox
    Cc: Steven Rostedt
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ross Zwisler
     
  • Add tracepoints to dax_writeback_mapping_range(), following the same
    logging conventions as the rest of DAX.

    Here is an example writeback call:

    msync-1085 [006] ....
    200.902565: dax_writeback_range: dev 259:0 ino 0x1003 pgoff 0x200-0x2ff

    msync-1085 [006] ....
    200.902579: dax_writeback_range_done: dev 259:0 ino 0x1003 pgoff 0x200-0x2ff

    [ross.zwisler@linux.intel.com: fix regression in dax_writeback_mapping_range()]
    Link: http://lkml.kernel.org/r/20170314215358.31451-1-ross.zwisler@linux.intel.com
    Link: http://lkml.kernel.org/r/20170221195116.13278-5-ross.zwisler@linux.intel.com
    Signed-off-by: Ross Zwisler
    Reviewed-by: Jan Kara
    Cc: Alexander Viro
    Cc: Dan Williams
    Cc: Ingo Molnar
    Cc: Matthew Wilcox
    Cc: Steven Rostedt
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ross Zwisler
     
  • Add tracepoints to dax_load_hole(), following the same logging conventions
    as the rest of DAX.

    Here is the logging generated by a PTE read from a hole:

    read-1075 [002] ....
    62.362108: dax_pte_fault: dev 259:0 ino 0x1003 shared ALLOW_RETRY|KILLABLE|USER address 0x10480000 pgoff 0x280

    read-1075 [002] ....
    62.362140: dax_load_hole: dev 259:0 ino 0x1003 shared ALLOW_RETRY|KILLABLE|USER address 0x10480000 pgoff 0x280 NOPAGE

    read-1075 [002] ....
    62.362141: dax_pte_fault_done: dev 259:0 ino 0x1003 shared ALLOW_RETRY|KILLABLE|USER address 0x10480000 pgoff 0x280 NOPAGE

    Link: http://lkml.kernel.org/r/20170221195116.13278-4-ross.zwisler@linux.intel.com
    Signed-off-by: Ross Zwisler
    Reviewed-by: Jan Kara
    Cc: Alexander Viro
    Cc: Dan Williams
    Cc: Ingo Molnar
    Cc: Matthew Wilcox
    Cc: Steven Rostedt
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ross Zwisler
     
  • Add tracepoints to dax_pfn_mkwrite(), following the same logging
    conventions as the rest of DAX.

    Here is an example PTE fault followed by a pfn_mkwrite:

    small_aligned-1094 [002] ....
    374.084998: dax_pte_fault: dev 259:0 ino 0x1003 shared WRITE|ALLOW_RETRY|KILLABLE|USER address 0x10400000 pgoff 0x200

    small_aligned-1094 [002] ....
    374.085145: dax_pte_fault_done: dev 259:0 ino 0x1003 shared WRITE|ALLOW_RETRY|KILLABLE|USER address 0x10400000 pgoff 0x200 MAJOR|NOPAGE

    small_aligned-1094 [002] ....
    374.085165: dax_pfn_mkwrite: dev 259:0 ino 0x1003 shared WRITE|MKWRITE|ALLOW_RETRY|KILLABLE|USER address 0x10400000 pgoff 0x200 NOPAGE

    Link: http://lkml.kernel.org/r/20170221195116.13278-3-ross.zwisler@linux.intel.com
    Signed-off-by: Ross Zwisler
    Reviewed-by: Jan Kara
    Cc: Alexander Viro
    Cc: Dan Williams
    Cc: Ingo Molnar
    Cc: Matthew Wilcox
    Cc: Steven Rostedt
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ross Zwisler
     
  • Patch series "second round of tracepoints for DAX".

    This second round of DAX tracepoint patches adds tracing to the PTE
    fault path (dax_iomap_pte_fault(), dax_pfn_mkwrite(), dax_load_hole(),
    dax_insert_mapping()) and to the writeback path
    (dax_writeback_mapping_range(), dax_writeback_one()).

    The purpose of this tracing is to give us a high level view of what DAX
    is doing, whether faults are being serviced by PMDs or PTEs, and by real
    storage or by zero pages covering holes.

    I do have some patches nearly ready which also add tracing to
    grab_mapping_entry() and dax_insert_mapping_entry(). These are more
    targeted at logging how we are interacting with the radix tree, how we
    use empty entries for locking, whether we "downgrade" huge zero pages to
    4k PTE sized allocations, etc. In the end it seemed to me that this
    might be too detailed to have as constantly present tracepoints, but if
    anyone sees value in having tracepoints like this in the DAX code
    permanently (Jan?), please let me know and I'll add those last two
    patches.

    All these tracepoints were done to be consistent with the style of the
    XFS tracepoints and with the existing DAX PMD tracepoints.

    This patch (of 6):

    Add tracepoints to dax_iomap_pte_fault(), following the same logging
    conventions as the rest of DAX.

    Here is an example fault that initially tries to be serviced by the PMD
    fault handler but which falls back to PTEs because the VMA isn't large
    enough to hold a PMD:

    small-1086 [005] ....
    71.140014: xfs_filemap_huge_fault: dev 259:0 ino 0x1003

    small-1086 [005] ....
    71.140027: dax_pmd_fault: dev 259:0 ino 0x1003 shared WRITE|ALLOW_RETRY|KILLABLE|USER address 0x10420000 vm_start 0x10200000 vm_end 0x10500000 pgoff 0x220 max_pgoff 0x1400

    small-1086 [005] ....
    71.140028: dax_pmd_fault_done: dev 259:0 ino 0x1003 shared WRITE|ALLOW_RETRY|KILLABLE|USER address 0x10420000 vm_start 0x10200000 vm_end 0x10500000 pgoff 0x220 max_pgoff 0x1400 FALLBACK

    small-1086 [005] ....
    71.140035: dax_pte_fault: dev 259:0 ino 0x1003 shared WRITE|ALLOW_RETRY|KILLABLE|USER address 0x10420000 pgoff 0x220

    small-1086 [005] ....
    71.140396: dax_pte_fault_done: dev 259:0 ino 0x1003 shared WRITE|ALLOW_RETRY|KILLABLE|USER address 0x10420000 pgoff 0x220 MAJOR|NOPAGE
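
The fallback in this trace can be checked by hand from the logged vm_start, vm_end and address values. The sketch below models only the "VMA isn't large enough to hold a PMD" condition; it assumes 2 MiB PMDs, the real fault handler performs further checks, and the helper name pmd_fits_in_vma is hypothetical.

```python
PMD_SIZE = 2 * 1024 * 1024  # assumption: 2 MiB PMDs


def pmd_fits_in_vma(address, vm_start, vm_end, pmd_size=PMD_SIZE):
    """True if the PMD-aligned range around address lies inside the VMA."""
    pmd_start = address & ~(pmd_size - 1)  # round down to a PMD boundary
    pmd_end = pmd_start + pmd_size
    return vm_start <= pmd_start and pmd_end <= vm_end


# Values taken from the dax_pmd_fault line above.
fits = pmd_fits_in_vma(0x10420000, vm_start=0x10200000, vm_end=0x10500000)
```

Here the PMD-aligned range is 0x10400000-0x10600000, which runs past vm_end 0x10500000, so fits is False, consistent with the FALLBACK result in the trace.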

    Link: http://lkml.kernel.org/r/20170221195116.13278-2-ross.zwisler@linux.intel.com
    Signed-off-by: Ross Zwisler
    Reviewed-by: Jan Kara
    Cc: Alexander Viro
    Cc: Dan Williams
    Cc: Ingo Molnar
    Cc: Matthew Wilcox
    Cc: Steven Rostedt
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ross Zwisler
     
  • Link: http://lkml.kernel.org/r/20170420161852.0492bc3f@canb.auug.org.au
    Signed-off-by: Stephen Rothwell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Stephen Rothwell
     
  • CURRENT_TIME_SEC is not y2038 safe. current_time() will be transitioned
    to use 64 bit time along with vfs in a separate patch. There is no plan
    to transition CURRENT_TIME_SEC to use y2038 safe time interfaces.

    current_time() returns timestamps according to the granularities set in
    the inode's super_block. The granularity check to call
    current_fs_time() or CURRENT_TIME_SEC is not required.

    Use current_time() directly to update inode timestamp. Use
    timespec_trunc during file system creation, before the first inode is
    created.
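
For illustration only, truncating a timestamp to a filesystem's granularity can be modelled as below. This is a userspace sketch of the idea, not the kernel's timespec_trunc() itself; the function name trunc_timestamp and the tuple representation are assumptions.

```python
NSEC_PER_SEC = 1_000_000_000


def trunc_timestamp(sec, nsec, gran):
    """Truncate (sec, nsec) to a granularity given in nanoseconds."""
    if gran == 1:
        return sec, nsec              # nanosecond granularity: keep as-is
    if gran == NSEC_PER_SEC:
        return sec, 0                 # one-second granularity: drop nsec
    if 1 < gran < NSEC_PER_SEC:
        return sec, nsec - nsec % gran
    raise ValueError(f"illegal granularity: {gran}")
```

A filesystem that stores whole seconds would pass NSEC_PER_SEC; one that stores microseconds would pass 1000.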

    Link: http://lkml.kernel.org/r/1491613030-11599-9-git-send-email-deepa.kernel@gmail.com
    Signed-off-by: Deepa Dinamani
    Reviewed-by: Arnd Bergmann
    Cc: Richard Weinberger
    Cc: Artem Bityutskiy
    Cc: Adrian Hunter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Deepa Dinamani
     
  • CURRENT_TIME is not y2038 safe. Replace it with ktime_get_real_ts64().
    Inode time formats are already 64 bits long and accommodate time64_t.

    Link: http://lkml.kernel.org/r/1491613030-11599-6-git-send-email-deepa.kernel@gmail.com
    Signed-off-by: Deepa Dinamani
    Cc: Evgeniy Dushistov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Deepa Dinamani
     
  • CURRENT_TIME is not y2038 safe. The macro will be deleted and all the
    references to it will be replaced by ktime_get_* apis.

    struct timespec is also not y2038 safe. Retain timespec for timestamp
    representation here as ceph uses it internally everywhere. These
    references will be changed to use struct timespec64 in a separate patch.

    The current_fs_time() api is being changed to use vfs struct inode* as
    an argument instead of struct super_block*.

    Set the new mds client request r_stamp field using ktime_get_real_ts()
    instead of using current_fs_time().

    Also, since r_stamp is used as mtime on the server, use timespec_trunc()
    to truncate the timestamp, using the right granularity from the
    superblock.

    This api will be transitioned to be y2038 safe along with vfs.

    Link: http://lkml.kernel.org/r/1491613030-11599-5-git-send-email-deepa.kernel@gmail.com
    Signed-off-by: Deepa Dinamani
    Reviewed-by: Arnd Bergmann
    M: Ilya Dryomov
    M: "Yan, Zheng"
    M: Sage Weil
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Deepa Dinamani
     
  • CURRENT_TIME macro is not y2038 safe on 32 bit systems.
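
As a worked example of the limit in question: a signed 32-bit seconds counter runs out in January 2038. The sketch below is plain arithmetic and is independent of any kernel interface; the helper names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
INT32_MAX = 2**31 - 1


def to_utc(seconds):
    """Convert seconds since the Unix epoch to an aware UTC datetime."""
    return EPOCH + timedelta(seconds=seconds)


def wrap_to_int32(seconds):
    """Model what a signed 32-bit counter holds after overflow."""
    return (seconds + 2**31) % 2**32 - 2**31


last_ok = to_utc(INT32_MAX)                        # 2038-01-19 03:14:07 UTC
after_wrap = to_utc(wrap_to_int32(INT32_MAX + 1))  # 1901-12-13 20:45:52 UTC
```

One second past the limit, the wrapped value lands in December 1901, which is why 64-bit time types are needed.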

    The patch replaces all the uses of CURRENT_TIME by current_time() for
    filesystem times, and ktime_get_* functions for authentication
    timestamps and timezone calculations.

    This is also in preparation for the patch that transitions vfs
    timestamps to use 64 bit time and hence make them y2038 safe.

    CURRENT_TIME macro will be deleted before merging the aforementioned
    change.

    The inode timestamps read from the server are assumed to have correct
    granularity and range.

    The patch also assumes that the difference between server and client
    times lies in the range INT_MIN..INT_MAX. This is valid because this is
    the difference between current times between server and client, and the
    largest timezone difference is in the range of one day.

    All cifs timestamps currently use timespec representation internally.
    Authentication and timezone timestamps can also be transitioned into
    using timespec64 when all other timestamps for cifs are transitioned to
    use timespec64.

    Link: http://lkml.kernel.org/r/1491613030-11599-4-git-send-email-deepa.kernel@gmail.com
    Signed-off-by: Deepa Dinamani
    Reviewed-by: Arnd Bergmann
    Cc: Steve French
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Deepa Dinamani
     
  • CURRENT_TIME_SEC is not y2038 safe.

    Replace use of CURRENT_TIME_SEC with ktime_get_real_seconds in segment
    timestamps used by GC algorithm including the segment mtime timestamps.

    Link: http://lkml.kernel.org/r/1491613030-11599-2-git-send-email-deepa.kernel@gmail.com
    Signed-off-by: Deepa Dinamani
    Reviewed-by: Arnd Bergmann
    Cc: Jaegeuk Kim
    Cc: Chao Yu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Deepa Dinamani
     
  • Commit afddba49d18f ("fs: introduce write_begin, write_end, and
    perform_write aops") introduced AOP_FLAG_UNINTERRUPTIBLE flag which was
    checked in pagecache_write_begin(), but that check was removed by
    4e02ed4b4a2f ("fs: remove prepare_write/commit_write").

    Between these two commits, commit d9414774dc0c ("cifs: Convert cifs to
    new aops.") added a check in cifs_write_begin(), but that check was soon
    removed by commit a98ee8c1c707 ("[CIFS] fix regression in
    cifs_write_begin/cifs_write_end").

    Therefore, AOP_FLAG_UNINTERRUPTIBLE flag is checked nowhere. Let's
    remove this flag. This patch has no functionality changes.

    Link: http://lkml.kernel.org/r/1489294781-53494-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp
    Signed-off-by: Tetsuo Handa
    Reviewed-by: Jeff Layton
    Reviewed-by: Christoph Hellwig
    Cc: Nick Piggin
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tetsuo Handa
     
  • Fix typos and add the following to the scripts/spelling.txt:

    intialisation||initialisation
    intialised||initialised
    intialise||initialise

    This commit does not intend to change the British spelling itself.
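
The "wrong||right" entries above are easy to consume mechanically. The following sketch is not the checkpatch implementation; it assumes one entry per line and "#" comment lines, and the helper names are hypothetical.

```python
import re


def parse_spelling(text):
    """Parse 'wrong||right' lines into a dict, skipping blanks and comments."""
    fixes = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        wrong, sep, right = line.partition("||")
        if sep:
            fixes[wrong] = right
    return fixes


def fix_spelling(text, fixes):
    """Replace whole-word occurrences of each misspelling in text."""
    if not fixes:
        return text
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, fixes)) + r")\b")
    return pattern.sub(lambda m: fixes[m.group(1)], text)


ENTRIES = """intialisation||initialisation
intialised||initialised
intialise||initialise
"""
fixed = fix_spelling("lazy intialisation of intialised data", parse_spelling(ENTRIES))
```

With the three entries from this commit, fixed becomes "lazy initialisation of initialised data", and the British "-ise" spelling is left alone.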

    Link: http://lkml.kernel.org/r/1481573103-11329-18-git-send-email-yamada.masahiro@socionext.com
    Signed-off-by: Masahiro Yamada
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Masahiro Yamada
     
  • __vmalloc* allows users to provide gfp flags for the underlying
    allocation. This API is quite popular

    $ git grep "=[[:space:]]__vmalloc\|return[[:space:]]*__vmalloc" | wc -l
    77

    The only problem is that many people are not aware that they really want
    to give __GFP_HIGHMEM along with other flags because there is really no
    reason to consume precious lowmemory on CONFIG_HIGHMEM systems for pages
    which are mapped to the kernel vmalloc space. About half of the users
    don't use this flag, though. This signals that we make the API
    unnecessarily complex.

    This patch simply uses __GFP_HIGHMEM implicitly when allocating pages to
    be mapped to the vmalloc space. Current users which add __GFP_HIGHMEM
    are simplified and drop the flag.

    Link: http://lkml.kernel.org/r/20170307141020.29107-1-mhocko@kernel.org
    Signed-off-by: Michal Hocko
    Reviewed-by: Matthew Wilcox
    Cc: Al Viro
    Cc: Vlastimil Babka
    Cc: David Rientjes
    Cc: Cristopher Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
     
  • There are many code paths opencoding kvmalloc. Let's use the helper
    instead. The main difference to kvmalloc is that those users are
    usually not considering all the aspects of the memory allocator. E.g.
    allocation requests
    Reviewed-by: Boris Ostrovsky # Xen bits
    Acked-by: Kees Cook
    Acked-by: Vlastimil Babka
    Acked-by: Andreas Dilger # Lustre
    Acked-by: Christian Borntraeger # KVM/s390
    Acked-by: Dan Williams # nvdim
    Acked-by: David Sterba # btrfs
    Acked-by: Ilya Dryomov # Ceph
    Acked-by: Tariq Toukan # mlx4
    Acked-by: Leon Romanovsky # mlx5
    Cc: Martin Schwidefsky
    Cc: Heiko Carstens
    Cc: Herbert Xu
    Cc: Anton Vorontsov
    Cc: Colin Cross
    Cc: Tony Luck
    Cc: "Rafael J. Wysocki"
    Cc: Ben Skeggs
    Cc: Kent Overstreet
    Cc: Santosh Raspatur
    Cc: Hariprasad S
    Cc: Yishai Hadas
    Cc: Oleg Drokin
    Cc: "Yan, Zheng"
    Cc: Alexander Viro
    Cc: Alexei Starovoitov
    Cc: Eric Dumazet
    Cc: David Miller
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
     
    getxattr uses vmalloc to allocate memory if kzalloc fails. This buffer
    is filled by vfs_getxattr and then copied to userspace. vmalloc,
    however, doesn't zero out the memory, so if the specific implementation
    of the xattr handler is sloppy we can theoretically expose kernel
    memory. There is no sign that this is actually the case, but let's make
    sure it cannot happen and use vzalloc instead.

    Fixes: 779302e67835 ("fs/xattr.c:getxattr(): improve handling of allocation failures")
    Link: http://lkml.kernel.org/r/20170306103327.2766-1-mhocko@kernel.org
    Acked-by: Kees Cook
    Reported-by: Vlastimil Babka
    Signed-off-by: Michal Hocko
    Cc: [3.6+]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
     
  • Patch series "kvmalloc", v5.

    There are many open coded kmalloc with vmalloc fallback instances in the
    tree. Most of them are not careful enough or simply do not care about
    the underlying semantic of the kmalloc/page allocator, which means that
    a) some vmalloc fallbacks are basically unreachable because the kmalloc
    part will keep retrying until it succeeds, and b) the page allocator can
    invoke really disruptive steps like the OOM killer to move forward,
    which doesn't sound appropriate when we consider that the vmalloc
    fallback is available.

    As can be seen, implementing kvmalloc requires quite intimate
    knowledge of the page allocator and the memory reclaim internals, which
    strongly suggests that a helper should be implemented in the memory
    subsystem proper.

    Most callers I could find have been converted to use the helper
    instead. This is patch 6. There are some more relying on __GFP_REPEAT
    in the networking stack which I have converted as well, and Eric Dumazet
    was not opposed [2] to converting them.

    [1] http://lkml.kernel.org/r/20170130094940.13546-1-mhocko@kernel.org
    [2] http://lkml.kernel.org/r/1485273626.16328.301.camel@edumazet-glaptop3.roam.corp.google.com

    This patch (of 9):

    Using kmalloc with the vmalloc fallback for larger allocations is a
    common pattern in the kernel code. Yet we do not have any common helper
    for that and so users have invented their own helpers. Some of them are
    really creative when doing so. Let's just add kv[mz]alloc and make sure
    it is implemented properly. This implementation makes sure not to create
    large memory pressure for > PAGE_SIZE requests (__GFP_NORETRY) and also
    not to warn about allocation failures. This also rules out the OOM
    killer, as vmalloc is a more appropriate fallback than a disruptive
    user visible action.

    This patch also changes some existing users and removes helpers which
    are specific to them. In some cases this is not possible (e.g.
    ext4_kvmalloc, libcfs_kvzalloc) because those seem to be broken and
    require GFP_NO{FS,IO} context which is not vmalloc compatible in general
    (note that the page table allocation is GFP_KERNEL). Those need to be
    fixed separately.

    While we are at it, document the unsupported gfp mask for
    __vmalloc{_node}, because there seems to be a lot of confusion out there.
    kvmalloc_node will warn about GFP_KERNEL incompatible (which are not
    superset) flags to catch new abusers. Existing ones would have to die
    slowly.

    [sfr@canb.auug.org.au: f2fs fixup]
    Link: http://lkml.kernel.org/r/20170320163735.332e64b7@canb.auug.org.au
    Link: http://lkml.kernel.org/r/20170306103032.2540-2-mhocko@kernel.org
    Signed-off-by: Michal Hocko
    Signed-off-by: Stephen Rothwell
    Reviewed-by: Andreas Dilger [ext4 part]
    Acked-by: Vlastimil Babka
    Cc: John Hubbard
    Cc: David Miller
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
     
  • pid_ns_for_children set by a task is known only to the task itself, and
    it's impossible to identify it from outside.

    It's a big problem for checkpoint/restore software like CRIU, because it
    can't correctly handle tasks that call setns(CLONE_NEWPID) in the course
    of their work.

    This patch solves the problem by exposing pid_ns_for_children in the ns
    directory in the standard way, under the name "pid_for_children":

    ~# ls /proc/5531/ns -l | grep pid
    lrwxrwxrwx 1 root root 0 Jan 14 16:38 pid -> pid:[4026531836]
    lrwxrwxrwx 1 root root 0 Jan 14 16:38 pid_for_children -> pid:[4026532286]

    Link: http://lkml.kernel.org/r/149201123914.6007.2187327078064239572.stgit@localhost.localdomain
    Signed-off-by: Kirill Tkhai
    Cc: Andrei Vagin
    Cc: Andreas Gruenbacher
    Cc: Kees Cook
    Cc: Michael Kerrisk
    Cc: Al Viro
    Cc: Oleg Nesterov
    Cc: Paul Moore
    Cc: Eric Biederman
    Cc: Andy Lutomirski
    Cc: Ingo Molnar
    Cc: Serge Hallyn
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill Tkhai
     
  • Patch series "Expose task pid_ns_for_children to userspace".

    pid_ns_for_children set by a task is known only to the task itself, and
    it's impossible to identify it from outside.

    It's a big problem for checkpoint/restore software like CRIU, because it
    can't correctly handle tasks that call setns(CLONE_NEWPID) in the course
    of their work. If they have a custom pid_ns_for_children before dump,
    they must have the same ns after restore. Otherwise, the restored task
    ends up in an environment it does not expect.

    This patchset solves the problem. It exposes pid_ns_for_children in the
    ns directory in the standard way, under the name "pid_for_children":

    ~# ls /proc/5531/ns -l | grep pid
    lrwxrwxrwx 1 root root 0 Jan 14 16:38 pid -> pid:[4026531836]
    lrwxrwxrwx 1 root root 0 Jan 14 16:38 pid_for_children -> pid:[4026532286]

    This patch (of 2):

    Make it possible for the link content prefix yyy to differ from the
    link name xxx:

    $ readlink /proc/[pid]/ns/xxx
    yyy:[4026531838]

    This will be used in the next patch.
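
To show what a differing prefix looks like to a consumer, here is a minimal sketch (not part of the patch) that splits a link target such as "pid:[4026532286]" into its prefix and inode number. The helper names are hypothetical, and read_ns_links() assumes a Linux /proc layout.

```python
import os
import re

# Link targets have the form "<prefix>:[<inode>]", as in the example above.
NS_LINK_RE = re.compile(r"^(\w+):\[(\d+)\]$")


def parse_ns_link(target):
    """Split 'pid:[4026532286]' into ('pid', 4026532286)."""
    m = NS_LINK_RE.match(target)
    if not m:
        raise ValueError(f"unexpected ns link target: {target!r}")
    return m.group(1), int(m.group(2))


def read_ns_links(pid="self"):
    """Map each entry under /proc/<pid>/ns to its (prefix, inode) pair."""
    ns_dir = f"/proc/{pid}/ns"
    result = {}
    for name in sorted(os.listdir(ns_dir)):
        try:
            target = os.readlink(os.path.join(ns_dir, name))
            result[name] = parse_ns_link(target)
        except (OSError, ValueError):
            continue  # entry unreadable or in an unexpected format
    return result
```

A tool comparing namespaces would key on the (prefix, inode) pair rather than on the link name, since "pid" and "pid_for_children" share the "pid" prefix.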

    Link: http://lkml.kernel.org/r/149201120318.6007.7362655181033883000.stgit@localhost.localdomain
    Signed-off-by: Kirill Tkhai
    Reviewed-by: Cyrill Gorcunov
    Acked-by: Andrei Vagin
    Cc: Andreas Gruenbacher
    Cc: Kees Cook
    Cc: Michael Kerrisk
    Cc: Al Viro
    Cc: Oleg Nesterov
    Cc: Paul Moore
    Cc: Eric Biederman
    Cc: Andy Lutomirski
    Cc: Ingo Molnar
    Cc: Serge Hallyn
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill Tkhai
     
  • Prepare to mark sensitive kernel structures for randomization by making
    sure they're using designated initializers. These were identified
    during allyesconfig builds of x86, arm, and arm64, with most initializer
    fixes extracted from grsecurity.

    Link: http://lkml.kernel.org/r/20170329210419.GA40066@beast
    Signed-off-by: Kees Cook
    Cc: Jan Kara
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kees Cook
     
  • Coccinelle emits this warning:

    WARNING: casting value returned by memory allocation function to (struct proc_inode *) is useless.

    Remove unnecessary cast.

    Link: http://lkml.kernel.org/r/1487745720-16967-1-git-send-email-me@tobin.cc
    Signed-off-by: Tobin C. Harding
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tobin C. Harding
     
  • Pull f2fs updates from Jaegeuk Kim:
    "In this round, we've focused on enhancing performance with regard to
    block allocation, GC, and discard/in-place-update IO controls. There
    are a bunch of clean-ups as well as minor bug fixes.

    Enhancements:
    - disable heap-based allocation by default
    - issue small-sized discard commands by default
    - change the policy of data hotness for logging
    - distinguish IOs in terms of size and wbc type
    - start SSR earlier to avoid foreground GC
    - enhance data structures managing discard commands
    - enhance in-place update flow
    - add some more fault injection routines
    - secure one more xattr entry

    Bug fixes:
    - calculate victim cost for GC correctly
    - remain correct victim segment number for GC
    - race condition in nid allocator and initializer
    - stale pointer produced by atomic_writes
    - fix missing REQ_SYNC for flush commands
    - handle missing errors in more corner cases"

    * tag 'for-f2fs-4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (111 commits)
    f2fs: fix a mount fail for wrong next_scan_nid
    f2fs: enhance scalability of trace macro
    f2fs: relocate inode_{,un}lock in F2FS_IOC_SETFLAGS
    f2fs: Make flush bios explicitely sync
    f2fs: show available_nids in f2fs/status
    f2fs: flush dirty nats periodically
    f2fs: introduce CP_TRIMMED_FLAG to avoid unneeded discard
    f2fs: allow cpc->reason to indicate more than one reason
    f2fs: release cp and dnode lock before IPU
    f2fs: shrink size of struct discard_cmd
    f2fs: don't hold cmd_lock during waiting discard command
    f2fs: nullify fio->encrypted_page for each writes
    f2fs: sanity check segment count
    f2fs: introduce valid_ipu_blkaddr to clean up
    f2fs: lookup extent cache first under IPU scenario
    f2fs: reconstruct code to write a data page
    f2fs: introduce __wait_discard_cmd
    f2fs: introduce __issue_discard_cmd
    f2fs: enable small discard by default
    f2fs: delay awaking discard thread
    ...

    Linus Torvalds
     
  • Pull fscrypt updates from Ted Ts'o:
    "Only bug fixes and cleanups for this merge window"

    * tag 'fscrypt_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/fscrypt:
    fscrypt: correct collision claim for digested names
    MAINTAINERS: fscrypt: update mailing list, patchwork, and git
    ext4: clean up ext4_match() and callers
    f2fs: switch to using fscrypt_match_name()
    ext4: switch to using fscrypt_match_name()
    fscrypt: introduce helper function for filename matching
    fscrypt: avoid collisions when presenting long encrypted filenames
    f2fs: check entire encrypted bigname when finding a dentry
    ubifs: check for consistent encryption contexts in ubifs_lookup()
    f2fs: sync f2fs_lookup() with ext4_lookup()
    ext4: remove "nokey" check from ext4_lookup()
    fscrypt: fix context consistency check when key(s) unavailable
    fscrypt: Remove __packed from fscrypt_policy
    fscrypt: Move key structure and constants to uapi
    fscrypt: remove fscrypt_symlink_data_len()
    fscrypt: remove unnecessary checks for NULL operations

    Linus Torvalds
     
  • Pull ext4 updates from Ted Ts'o:

    - add GETFSMAP support

    - some performance improvements for very large file systems and for
    random write workloads into a preallocated file

    - bug fixes and cleanups.

    * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
    jbd2: cleanup write flags handling from jbd2_write_superblock()
    ext4: mark superblock writes synchronous for nobarrier mounts
    ext4: inherit encryption xattr before other xattrs
    ext4: replace BUG_ON with WARN_ONCE in ext4_end_bio()
    ext4: avoid unnecessary transaction stalls during writeback
    ext4: preload block group descriptors
    ext4: make ext4_shutdown() static
    ext4: support GETFSMAP ioctls
    vfs: add common GETFSMAP ioctl definitions
    ext4: evict inline data when writing to memory map
    ext4: remove ext4_xattr_check_entry()
    ext4: rename ext4_xattr_check_names() to ext4_xattr_check_entries()
    ext4: merge ext4_xattr_list() into ext4_listxattr()
    ext4: constify static data that is never modified
    ext4: trim return value and 'dir' argument from ext4_insert_dentry()
    jbd2: fix dbench4 performance regression for 'nobarrier' mounts
    jbd2: Fix lockdep splat with generic/270 test
    mm: retry writepages() on ENOMEM when doing an data integrity writeback

    Linus Torvalds
     

07 May, 2017

3 commits

  • Pull cifs fixes from Steve French:
    "Various fixes for stable for CIFS/SMB3 especially for better
    interoperability for SMB3 to Macs.

    It also includes Pavel's improvements to SMB3 async i/o support
    (which is much faster now)"

    * 'for-next' of git://git.samba.org/sfrench/cifs-2.6:
    CIFS: add misssing SFM mapping for doublequote
    SMB3: Work around mount failure when using SMB3 dialect to Macs
    cifs: fix CIFS_IOC_GET_MNT_INFO oops
    CIFS: fix mapping of SFM_SPACE and SFM_PERIOD
    CIFS: fix oplock break deadlocks
    cifs: fix CIFS_ENUMERATE_SNAPSHOTS oops
    cifs: fix leak in FSCTL_ENUM_SNAPS response handling
    Set unicode flag on cifs echo request to avoid Mac error
    CIFS: Add asynchronous write support through kernel AIO
    CIFS: Add asynchronous read support through kernel AIO
    CIFS: Add asynchronous context to support kernel AIO
    cifs: fix IPv6 link local, with scope id, address parsing
    cifs: small underflow in cnvrtDosUnixTm()

    Linus Torvalds
     
  • Pull xfs updates from Darrick Wong:
    "Here are the XFS changes for 4.12. The big new feature for this
    release is the new space mapping ioctl that we've been discussing
    since LSF2016, but other than that most of the patches are larger bug
    fixes, memory corruption prevention, and other cleanups.

    Summary:
    - various code cleanups
    - introduce GETFSMAP ioctl
    - various refactoring
    - avoid dio reads past eof
    - fix memory corruption and other errors with fragmented directory blocks
    - fix accidental userspace memory corruptions
    - publish fs uuid in superblock
    - make fstrim terminatable
    - fix race between quotaoff and in-core inode creation
    - avoid use-after-free when finishing up w/ buffer heads
    - reserve enough space to handle bmap tree resizing during cow remap"

    * tag 'xfs-4.12-merge-7' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (53 commits)
    xfs: fix use-after-free in xfs_finish_page_writeback
    xfs: reserve enough blocks to handle btree splits when remapping
    xfs: wait on new inodes during quotaoff dquot release
    xfs: update ag iterator to support wait on new inodes
    xfs: support ability to wait on new inodes
    xfs: publish UUID in struct super_block
    xfs: Allow user to kill fstrim process
    xfs: better log intent item refcount checking
    xfs: fix up quotacheck buffer list error handling
    xfs: remove xfs_trans_ail_delete_bulk
    xfs: don't use bool values in trace buffers
    xfs: fix getfsmap userspace memory corruption while setting OF_LAST
    xfs: fix __user annotations for xfs_ioc_getfsmap
    xfs: corruption needs to respect endianess too!
    xfs: use NULL instead of 0 to initialize a pointer in xfs_ioc_getfsmap
    xfs: use NULL instead of 0 to initialize a pointer in xfs_getfsmap
    xfs: simplify validation of the unwritten extent bit
    xfs: remove unused values from xfs_exntst_t
    xfs: remove the unused XFS_MAXLINK_1 define
    xfs: more do_div cleanups
    ...

    Linus Torvalds
     
  • Pull block fixes and updates from Jens Axboe:
    "Some fixes and followup features/changes that should go in, in this
    merge window. This contains:

    - Two fixes for lightnvm from Javier, fixing problems in the new code
    merged previously in this merge window.

    - A fix from Jan for the backing device changes, fixing an issue in
    NFS that causes a failure to mount on certain setups.

    - A change from Christoph, cleaning up the blk-mq init and exit
    request paths.

    - Remove elevator_change(), which is now unused. From Bart.

    - A fix for queue operation invocation on a dead queue, from Bart.

    - A series fixing up mtip32xx for blk-mq scheduling, removing a
    bandaid we previously had in place for this. From me.

    - A regression fix for this series, fixing a case where we wait on
    workqueue flushing from an invalid (non-blocking) context. From me.

    - A fix/optimization from Ming, ensuring that we don't both quiesce
    and freeze a queue at the same time.

    - A fix from Peter on lock ordering for CPU hotplug. Not a real
    problem right now, but will be once the CPU hotplug rework goes in.

    - A series from Omar, cleaning up our blk-mq debugfs support, and
    adding support for exporting info from schedulers in debugfs as
    well. This is really useful in debugging stalls or livelocks."

    * 'for-linus' of git://git.kernel.dk/linux-block: (28 commits)
    mq-deadline: add debugfs attributes
    kyber: add debugfs attributes
    blk-mq-debugfs: allow schedulers to register debugfs attributes
    blk-mq: untangle debugfs and sysfs
    blk-mq: move debugfs declarations to a separate header file
    blk-mq: Do not invoke queue operations on a dead queue
    blk-mq-debugfs: get rid of a bunch of boilerplate
    blk-mq-debugfs: rename hw queue directories from <n> to hctx<n>
    blk-mq-debugfs: don't open code strstrip()
    blk-mq-debugfs: error on long write to queue "state" file
    blk-mq-debugfs: clean up flag definitions
    blk-mq-debugfs: separate flags with |
    nfs: Fix bdi handling for cloned superblocks
    block/mq: Cure cpu hotplug lock inversion
    lightnvm: fix bad back free on error path
    lightnvm: create cmd before allocating request
    blk-mq: don't use sync workqueue flushing from drivers
    mtip32xx: convert internal commands to regular block infrastructure
    mtip32xx: cleanup internal tag assumptions
    block: don't call blk_mq_quiesce_queue() after queue is frozen
    ...

    Linus Torvalds
     

06 May, 2017

7 commits

  • Pull libnvdimm updates from Dan Williams:
    "The bulk of this has been in multiple -next releases. There were a few
    late breaking fixes and small features that got added in the last
    couple days, but the whole set has received a build success
    notification from the kbuild robot.

    Change summary:

    - Region media error reporting: A libnvdimm region device is the
    parent to one or more namespaces. To date, media errors have been
    reported via the "badblocks" attribute attached to pmem block
    devices for namespaces in "raw" or "memory" mode. Given that
    namespaces can be in "device-dax" or "btt-sector" mode this new
    interface reports media errors generically, i.e. independent of
    namespace modes or state.

    This subsequently allows userspace tooling to craft "ACPI 6.1
    Section 9.20.7.6 Function Index 4 - Clear Uncorrectable Error"
    requests and submit them via the ioctl path for NVDIMM root bus
    devices.

    - Introduce 'struct dax_device' and 'struct dax_operations': Prompted
    by a request from Linus and feedback from Christoph this allows for
    dax capable drivers to publish their own custom dax operations.
    This fixes the broken assumption that all dax operations are
    related to a persistent memory device, and makes it easier for
    other architectures and platforms to add customized persistent
    memory support.

    - 'libnvdimm' core updates: A new "deep_flush" sysfs attribute is
    available for storage appliance applications to manually trigger
    memory controllers to drain write-pending buffers that would
    otherwise be flushed automatically by the platform ADR
    (asynchronous-DRAM-refresh) mechanism at a power loss event.
    Support for "locked" DIMMs is included to prevent namespaces from
    surfacing when the namespace label data area is locked. Finally,
    fixes for various reported deadlocks and crashes, also tagged for
    -stable.

    - ACPI / nfit driver updates: General updates of the nfit driver to
    add DSM command overrides, ACPI 6.1 health state flags support, DSM
    payload debug available by default, and various fixes.

    Acknowledgements that came after the branch was pushed:

    - commit 565851c972b5 "device-dax: fix sysfs attribute deadlock":
    Tested-by: Yi Zhang

    - commit 23f498448362 "libnvdimm: rework region badblocks clearing"
    Tested-by: Toshi Kani"

    * tag 'libnvdimm-for-4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (52 commits)
    libnvdimm, pfn: fix 'npfns' vs section alignment
    libnvdimm: handle locked label storage areas
    libnvdimm: convert NDD_ flags to use bitops, introduce NDD_LOCKED
    brd: fix uninitialized use of brd->dax_dev
    block, dax: use correct format string in bdev_dax_supported
    device-dax: fix sysfs attribute deadlock
    libnvdimm: restore "libnvdimm: band aid btt vs clear poison locking"
    libnvdimm: fix nvdimm_bus_lock() vs device_lock() ordering
    libnvdimm: rework region badblocks clearing
    acpi, nfit: kill ACPI_NFIT_DEBUG
    libnvdimm: fix clear length of nvdimm_forget_poison()
    libnvdimm, pmem: fix a NULL pointer BUG in nd_pmem_notify
    libnvdimm, region: sysfs trigger for nvdimm_flush()
    libnvdimm: fix phys_addr for nvdimm_clear_poison
    x86, dax, pmem: remove indirection around memcpy_from_pmem()
    block: remove block_device_operations ->direct_access()
    block, dax: convert bdev_dax_supported() to dax_direct_access()
    filesystem-dax: convert to dax_direct_access()
    Revert "block: use DAX for partition table reads"
    ext2, ext4, xfs: retrieve dax_device for iomap operations
    ...

    Linus Torvalds
     
  • Pull GFS2 updates from Bob Peterson:
    "We've got ten GFS2 patches for this merge window.

    - Andreas Gruenbacher wrote a patch to replace the deprecated call to
    rhashtable_walk_init with rhashtable_walk_enter.

    - Andreas also wrote a patch to eliminate redundant code in two of
    our debugfs sequence files.

    - Andreas also cleaned up the rhashtable key ugliness Linus pointed
    out during this cycle, following Linus's suggestions.

    - Andreas also wrote a patch to take advantage of his new function
    rhashtable_lookup_get_insert_fast. This makes glock lookup faster
    and more bullet-proof.

    - Andreas also wrote a patch to revert a patch in the evict path that
    caused occasional deadlocks, and is no longer needed.

    - Andrew Price wrote a patch to re-enable fallocate for the rindex
    system file to enable gfs2_grow to grow properly on secondary file
    system grow operations.

    - I wrote a patch to initialize an inode number field to make certain
    kernel trace points more understandable.

    - I also wrote a patch that makes GFS2 file system "withdraw" work
    more like it should by ignoring operations after a withdraw that
    would formerly cause a BUG() and kernel panic.

    - I also reworked the entire truncate/delete algorithm, scrapping the
    old recursive algorithm in favor of a new non-recursive algorithm.
    This was done for performance: This way, GFS2 no longer needs to
    lock multiple resource groups while doing truncates and deletes of
    files that cross multiple resource group boundaries, allowing for
    better parallelism. It also solves a problem whereby deleting large
    files would request a large chunk of kernel memory, which resulted
    in a get_page_from_freelist warning.

    - Due to a regression found during testing, I added a new patch to
    correct 'GFS2: Prevent BUG from occurring when normal Withdraws
    occur'."

    * tag 'gfs2-4.12.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
    GFS2: Allow glocks to be unlocked after withdraw
    GFS2: Non-recursive delete
    gfs2: Re-enable fallocate for the rindex
    Revert "GFS2: Wait for iopen glock dequeues"
    gfs2: Switch to rhashtable_lookup_get_insert_fast
    GFS2: Temporarily zero i_no_addr when creating a dinode
    gfs2: Don't pack struct lm_lockname
    gfs2: Deduplicate gfs2_{glocks,glstats}_open
    gfs2: Replace rhashtable_walk_init with rhashtable_walk_enter
    GFS2: Prevent BUG from occurring when normal Withdraws occur

    Linus Torvalds
     
  • Pull orangefs updates from Mike Marshall:
    "Orangefs cleanups, fixes and statx support.

    Some cleanups:

    - remove unused get_fsid_from_ino
    - fix bounds check for listxattr
    - clean up oversize xattr validation
    - do not set getattr_time on orangefs_lookup
    - return from orangefs_devreq_read quickly if possible
    - do not wait for timeout if umounting
    - handle zero size write in debugfs

    Bug fixes:

    - do not check possibly stale size on truncate
    - ensure the userspace component is unmounted if mount fails
    - total reimplementation of dir.c

    New feature:

    - implement statx

    The new implementation of dir.c is kind of a big deal, all new code.
    It was posted to fs-devel during the previous rc period. We didn't
    get much review or feedback from there, but it has been reviewed
    very heavily here, so much so that we have two entire versions of
    the reimplementation.

    Not only does the new implementation fix some xfstests, but it passes
    all the new tests we made here that involve seeking and rewinding and
    giant directories and long file names. The new dir code has three
    patches itself:

    - skip forward to the next directory entry if seek is short
    - invalidate stored directory on seek
    - count directory pieces correctly"

    * tag 'for-linus-4.12-ofs-1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux:
    orangefs: count directory pieces correctly
    orangefs: invalidate stored directory on seek
    orangefs: skip forward to the next directory entry if seek is short
    orangefs: handle zero size write in debugfs
    orangefs: do not wait for timeout if umounting
    orangefs: return from orangefs_devreq_read quickly if possible
    orangefs: ensure the userspace component is unmounted if mount fails
    orangefs: do not check possibly stale size on truncate
    orangefs: implement statx
    orangefs: remove ORANGEFS_READDIR macros
    orangefs: support very large directories
    orangefs: support llseek on directories
    orangefs: rewrite readdir to fix several bugs
    orangefs: do not set getattr_time on orangefs_lookup
    orangefs: clean up oversize xattr validation
    orangefs: fix bounds check for listxattr
    orangefs: remove unused get_fsid_from_ino

    Linus Torvalds
     
  • Pull befs fix from Luis de Bethencourt:
    "One fix from Fabian Frederick making the nfs client still work after a
    cache drop"

    * tag 'befs-v4.12-rc1' of git://github.com/luisbg/linux-befs:
    befs: make export work with cold dcache

    Linus Torvalds
     
  • This patch fixes a regression introduced by patch 0d1c7ae9d8.

    The intent of the patch was to stop promoting glocks after a
    file system is withdrawn due to a variety of errors, because doing
    so results in a BUG(). (You should be able to unmount after a
    withdraw rather than having the kernel panic.)

    Unfortunately, it also stopped demotions, so glocks could not be
    unlocked after withdraw, which means the unmount would hang.

    This patch allows function do_xmote to demote locks to an
    unlocked state after a withdraw, but not promote them.
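
    The rule described above can be sketched in a few lines of userspace
    C. This is only an illustration, not the GFS2 code: the enum, the
    function name transition_allowed() and the withdrawn flag are all
    hypothetical stand-ins for the real glock state and withdraw
    tracking.

```c
#include <stdbool.h>

/* Hypothetical lock states, named for illustration only. */
enum lock_state { LOCK_UNLOCKED, LOCK_SHARED, LOCK_EXCLUSIVE };

/*
 * Decide whether a requested state change may proceed.
 * Before a withdraw every transition is allowed. After a withdraw
 * only a transition to the unlocked state is allowed, so locks can
 * still be released (and the unmount can finish) but never promoted.
 */
static bool transition_allowed(bool withdrawn, enum lock_state target)
{
    if (!withdrawn)
        return true;
    return target == LOCK_UNLOCKED;
}
```

    With this check, a request for LOCK_SHARED or LOCK_EXCLUSIVE after a
    withdraw is refused, while a request for LOCK_UNLOCKED goes through.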

    Signed-off-by: Bob Peterson

    Bob Peterson
     
  • Commit 28b783e47ad7 ("xfs: bufferhead chains are invalid after
    end_page_writeback") fixed one use-after-free issue by
    pre-calculating the loop conditionals before calling bh->b_end_io()
    in the end_io processing loop. However, it assigned the 'next'
    pointer before checking the end offset boundary and breaking out of
    the loop, at which point the bh might already have been freed,
    causing a use-after-free.

    This is caught by KASAN when running fstests generic/127 on sub-page
    block size XFS.

    [ 2517.244502] run fstests generic/127 at 2017-04-27 07:30:50
    [ 2747.868840] ==================================================================
    [ 2747.876949] BUG: KASAN: use-after-free in xfs_destroy_ioend+0x3d3/0x4e0 [xfs] at addr ffff8801395ae698
    ...
    [ 2747.918245] Call Trace:
    [ 2747.920975] dump_stack+0x63/0x84
    [ 2747.924673] kasan_object_err+0x21/0x70
    [ 2747.928950] kasan_report+0x271/0x530
    [ 2747.933064] ? xfs_destroy_ioend+0x3d3/0x4e0 [xfs]
    [ 2747.938409] ? end_page_writeback+0xce/0x110
    [ 2747.943171] __asan_report_load8_noabort+0x19/0x20
    [ 2747.948545] xfs_destroy_ioend+0x3d3/0x4e0 [xfs]
    [ 2747.953724] xfs_end_io+0x1af/0x2b0 [xfs]
    [ 2747.958197] process_one_work+0x5ff/0x1000
    [ 2747.962766] worker_thread+0xe4/0x10e0
    [ 2747.966946] kthread+0x2d3/0x3d0
    [ 2747.970546] ? process_one_work+0x1000/0x1000
    [ 2747.975405] ? kthread_create_on_node+0xc0/0xc0
    [ 2747.980457] ? syscall_return_slowpath+0xe6/0x140
    [ 2747.985706] ? do_page_fault+0x30/0x80
    [ 2747.989887] ret_from_fork+0x2c/0x40
    [ 2747.993874] Object at ffff8801395ae690, in cache buffer_head size: 104
    [ 2748.001155] Allocated:
    [ 2748.003782] PID = 8327
    [ 2748.006411] save_stack_trace+0x1b/0x20
    [ 2748.010688] save_stack+0x46/0xd0
    [ 2748.014383] kasan_kmalloc+0xad/0xe0
    [ 2748.018370] kasan_slab_alloc+0x12/0x20
    [ 2748.022648] kmem_cache_alloc+0xb8/0x1b0
    [ 2748.027024] alloc_buffer_head+0x22/0xc0
    [ 2748.031399] alloc_page_buffers+0xd1/0x250
    [ 2748.035968] create_empty_buffers+0x30/0x410
    [ 2748.040730] create_page_buffers+0x120/0x1b0
    [ 2748.045493] __block_write_begin_int+0x17a/0x1800
    [ 2748.050740] iomap_write_begin+0x100/0x2f0
    [ 2748.055308] iomap_zero_range_actor+0x253/0x5c0
    [ 2748.060362] iomap_apply+0x157/0x270
    [ 2748.064347] iomap_zero_range+0x5a/0x80
    [ 2748.068624] iomap_truncate_page+0x6b/0xa0
    [ 2748.073227] xfs_setattr_size+0x1f7/0xa10 [xfs]
    [ 2748.078312] xfs_vn_setattr_size+0x68/0x140 [xfs]
    [ 2748.083589] xfs_file_fallocate+0x4ac/0x820 [xfs]
    [ 2748.088838] vfs_fallocate+0x2cf/0x780
    [ 2748.093021] SyS_fallocate+0x48/0x80
    [ 2748.097006] do_syscall_64+0x18a/0x430
    [ 2748.101186] return_from_SYSCALL_64+0x0/0x6a
    [ 2748.105948] Freed:
    [ 2748.108189] PID = 8327
    [ 2748.110816] save_stack_trace+0x1b/0x20
    [ 2748.115093] save_stack+0x46/0xd0
    [ 2748.118788] kasan_slab_free+0x73/0xc0
    [ 2748.122969] kmem_cache_free+0x7a/0x200
    [ 2748.127247] free_buffer_head+0x41/0x80
    [ 2748.131524] try_to_free_buffers+0x178/0x250
    [ 2748.136316] xfs_vm_releasepage+0x2e9/0x3d0 [xfs]
    [ 2748.141563] try_to_release_page+0x100/0x180
    [ 2748.146325] invalidate_inode_pages2_range+0x7da/0xcf0
    [ 2748.152087] xfs_shift_file_space+0x37d/0x6e0 [xfs]
    [ 2748.157557] xfs_collapse_file_space+0x49/0x120 [xfs]
    [ 2748.163223] xfs_file_fallocate+0x2a7/0x820 [xfs]
    [ 2748.168462] vfs_fallocate+0x2cf/0x780
    [ 2748.172642] SyS_fallocate+0x48/0x80
    [ 2748.176629] do_syscall_64+0x18a/0x430
    [ 2748.180810] return_from_SYSCALL_64+0x0/0x6a

    Fix it by checking the offset against the end and breaking out
    first, dereferencing bh only if there are still bufferheads to
    process.
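
    The ordering described above can be sketched with a userspace mock.
    This is a simplified illustration under stated assumptions, not the
    XFS code: struct mock_bh, mock_end_io() and finish_range() are
    hypothetical names, and the circular 'next' link merely stands in
    for the per-page buffer head chain.

```c
#include <stddef.h>

/* Hypothetical stand-in for a buffer head on a circular per-page list. */
struct mock_bh {
    struct mock_bh *next;   /* circular link to the following buffer */
    int completed;          /* set once completion has run */
};

/* Stand-in for the completion callback. In the kernel, the buffer may
 * be freed any time after the last completion on the page. */
static void mock_end_io(struct mock_bh *bh)
{
    bh->completed = 1;
}

/*
 * Complete every buffer whose offset lies in [start, end] and return
 * how many were completed. The boundary check comes first, so a buffer
 * past the end of the range is never dereferenced.
 */
static int finish_range(struct mock_bh *head, unsigned int start,
                        unsigned int end, unsigned int bsize)
{
    struct mock_bh *bh = head, *next;
    unsigned int off = 0;
    int done = 0;

    do {
        if (off > end)
            break;              /* leave before touching bh again */
        next = bh->next;        /* safe: bh is still in the range */
        if (off >= start) {
            mock_end_io(bh);
            done++;
        }
        off += bsize;
    } while ((bh = next) != head);

    return done;
}
```

    The point of the sketch is only the order of operations inside the
    loop: compare the offset against the end, break if past it, and read
    the link to the following buffer afterwards.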

    Signed-off-by: Eryu Guan
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Eryu Guan
     
  • Pull namespace updates from Eric Biederman:
    "This is a set of small fixes that were mostly stumbled over during
    more significant development. This proc fix and the fix to
    posix-timers are the most significant of the lot.

    There is a lot of good development going on but unfortunately it
    didn't quite make the merge window"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
    proc: Fix unbalanced hard link numbers
    signal: Make kill_proc_info static
    rlimit: Properly call security_task_setrlimit
    signal: Remove unused definition of sig_user_definied
    ia64: Remove unused IA64_TASK_SIGHAND_OFFSET and IA64_SIGHAND_SIGLOCK_OFFSET
    ipc: Remove unused declaration of recompute_msgmni
    posix-timers: Correct sanity check in posix_cpu_nsleep
    sysctl: Remove dead register_sysctl_root

    Linus Torvalds
     

05 May, 2017

4 commits

  • SFM is mapping doublequote to 0xF020

    Without this patch, creating files whose names contain a doublequote
    fails against Windows/Mac servers.
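
    As a rough illustration of such a mapping, the sketch below remaps a
    doublequote to 0xF020 and back. Only the value 0xF020 comes from this
    commit message; the macro and function names are hypothetical, and
    the other reserved characters that SFM remaps are deliberately left
    out.

```c
#include <stdint.h>

/* Value taken from the commit message; the name is illustrative. */
#define MAP_DOUBLEQUOTE 0xF020

/* Map a UTF-16 code unit to the form sent to the server. */
static uint16_t to_remapped(uint16_t c)
{
    return c == '"' ? MAP_DOUBLEQUOTE : c;
}

/* Map a code unit received from the server back to the local form. */
static uint16_t from_remapped(uint16_t c)
{
    return c == MAP_DOUBLEQUOTE ? '"' : c;
}
```

    Every other code unit passes through unchanged in both directions, so
    a name survives the round trip.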

    Signed-off-by: Bjoern Jacke
    Signed-off-by: Steve French
    CC: stable

    Björn Jacke
     
  • Based on commit b3b42c0deaa1
    ("fs/affs: make export work with cold dcache")

    This adds a get_parent function so that the NFS client can still work
    after a cache drop (tested on NFS v4 with
    echo 3 > /proc/sys/vm/drop_caches)

    Signed-off-by: Fabian Frederick
    Signed-off-by: Luis de Bethencourt

    Fabian Frederick
     
  • Pull char/misc driver updates from Greg KH:
    "Here is the big set of new char/misc drivers and features for
    4.12-rc1.

    There are lots of new drivers added this time around, new firmware
    drivers from Google, more auxdisplay drivers, extcon drivers, fpga
    drivers, and a bunch of other driver updates. Nothing major, except if
    you happen to have the hardware for these drivers, and then you will
    be happy :)

    All of these have been in linux-next for a while with no reported
    issues"

    * tag 'char-misc-4.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (136 commits)
    firmware: google memconsole: Fix return value check in platform_memconsole_init()
    firmware: Google VPD: Fix return value check in vpd_platform_init()
    goldfish_pipe: fix build warning about using too much stack.
    goldfish_pipe: An implementation of more parallel pipe
    fpga fr br: update supported version numbers
    fpga: region: release FPGA region reference in error path
    fpga altera-hps2fpga: disable/unprepare clock on error in alt_fpga_bridge_probe()
    mei: drop the TODO from samples
    firmware: Google VPD sysfs driver
    firmware: Google VPD: import lib_vpd source files
    misc: lkdtm: Add volatile to intentional NULL pointer reference
    eeprom: idt_89hpesx: Add OF device ID table
    misc: ds1682: Add OF device ID table
    misc: tsl2550: Add OF device ID table
    w1: Remove unneeded use of assert() and remove w1_log.h
    w1: Use kernel common min() implementation
    uio_mf624: Align memory regions to page size and set correct offsets
    uio_mf624: Refactor memory info initialization
    uio: Allow handling of non page-aligned memory regions
    hangcheck-timer: Fix typo in comment
    ...

    Linus Torvalds
     
  • A large directory full of differently sized file names triggered this.
    Most directories, even very large directories with shorter names, would
    be lucky enough to fit in one server response.

    Signed-off-by: Martin Brandenburg
    Signed-off-by: Mike Marshall

    Martin Brandenburg