22 Feb, 2018

1 commit


06 Oct, 2017

1 commit

  • Pull overlayfs fixes from Miklos Szeredi:
    "Fix a regression in 4.14 and one in 4.13. The latter is a case when
    Docker is doing something it really shouldn't and gets away with it.
    We now print a warning instead of erroring out.

    There are also fixes to several error paths"

    * 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
    ovl: fix regression caused by exclusive upper/work dir protection
    ovl: fix missing unlock_rename() in ovl_do_copy_up()
    ovl: fix dentry leak in ovl_indexdir_cleanup()
    ovl: fix dput() of ERR_PTR in ovl_cleanup_index()
    ovl: fix error value printed in ovl_lookup_index()
    ovl: fix may_write_real() for overlayfs directories

    Linus Torvalds
     

05 Oct, 2017

1 commit

  • Enforcing exclusive ownership on upper/work dirs caused a docker
    regression: https://github.com/moby/moby/issues/34672.

    Euan spotted the regression and pointed to the offending commit.
    Vivek has brought the regression to my attention and provided this
    reproducer:

    Terminal 1:

    mount -t overlay -o workdir=work,lowerdir=lower,upperdir=upper none
    merged/

    Terminal 2:

    unshare -m

    Terminal 1:

    umount merged
    mount -t overlay -o workdir=work,lowerdir=lower,upperdir=upper none
    merged/
    mount: /root/overlay-testing/merged: none already mounted or mount point
    busy

    To fix the regression, I replaced the error with an alarming warning.
    With index feature enabled, mount does fail, but logs a suggestion to
    override exclusive dir protection by disabling index.
    Note that index=off mount does take the inuse locks, so a concurrent
    index=off will issue the warning and a concurrent index=on mount will fail.
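
    The locking rule described above can be sketched as a small model. This
    is only an illustration of the stated behaviour, not the kernel code: the
    in_use set, mount_overlay() and DirInUse are hypothetical names.

```python
class DirInUse(Exception):
    """An index=on mount found its upper/work dir already claimed."""


# Hypothetical stand-in for the kernel's in-use marks on upper/work dirs.
in_use = set()


def mount_overlay(upperdir, workdir, index=False):
    """Return "ok" or "warning"; raise DirInUse for a conflicting index=on mount."""
    busy = [d for d in (upperdir, workdir) if d in in_use]
    if busy and index:
        # With the index feature enabled the mount still fails.
        raise DirInUse(busy[0] + " is in use; index=off would override this check")
    # index=off mounts still take the locks, so later mounts can detect them.
    in_use.update((upperdir, workdir))
    return "warning" if busy else "ok"
```

    In this model a first index=off mount returns "ok", a concurrent
    index=off mount on the same dirs returns "warning", and a concurrent
    index=on mount raises DirInUse.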

    Documentation was updated to reflect this change.

    Fixes: 2cac0c00a6cd ("ovl: get exclusive ownership on upper/work dirs")
    Cc: # v4.13
    Reported-by: Euan Kemp
    Reported-by: Vivek Goyal
    Signed-off-by: Amir Goldstein
    Signed-off-by: Miklos Szeredi

    Amir Goldstein
     

03 Oct, 2017

1 commit

  • Pull driver core fixes from Greg KH:
    "Here are a few small fixes for 4.14-rc4.

    The removal of DRIVER_ATTR() was almost completed by 4.14-rc1, but one
    straggler made it in through some other tree (odds are, one of
    mine...) So there's a simple removal of the last user, and then
    finally the macro is removed from the tree.

    There's a fix for old crazy udev instances that insist on reloading a
    module when it is removed from the kernel due to the new uevents for
    bind/unbind. This fixes the reported regression, hopefully some year
    in the future we can drop the workaround, once users update to the
    latest version, but I'm not holding my breath.

    And then there's a build fix for a linker warning, and a buffer
    overflow fix to match the PCI fixes you took through the PCI tree in
    the same area.

    All of these have been in linux-next for a few weeks while I've been
    traveling, sorry for the delay"

    * tag 'driver-core-4.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
    driver core: remove DRIVER_ATTR
    fpga: altera-cvp: remove DRIVER_ATTR() usage
    driver core: platform: Don't read past the end of "driver_override" buffer
    base: arch_topology: fix section mismatch build warnings
    driver core: suppress sending MODALIAS in UNBIND uevents

    Linus Torvalds
     

19 Sep, 2017

2 commits

  • …ba.org/sfrench/cifs-2.6

    Pull cifs fixes from Steve French:
    "Convert default dialect to smb2.1 or later to allow connecting to
    Windows 7 for example, also includes some fixes for stable"

    * tag '4.14-smb3-multidialect-support-and-fixes-for-stable' of git://git.samba.org/sfrench/cifs-2.6:
    Update version of cifs module
    cifs: hide unused functions
    SMB3: Add support for multidialect negotiate (SMB2.1 and later)
    CIFS/SMB3: Update documentation to reflect SMB3 and various changes
    cifs: check rsp for NULL before dereferencing in SMB2_open

    Linus Torvalds
     
  • DRIVER_ATTR is no longer in use, and driver authors should be using
    DRIVER_ATTR_RW() or DRIVER_ATTR_RO() or DRIVER_ATTR_WO() instead in
    order to always get the permissions correct. So remove it so that no
    one can use it anymore.
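
    As a rough illustration of why the fixed-permission variants are
    preferred, the sketch below lists the file modes those helpers are
    understood to apply (0644, 0444 and 0200). pick_macro() is a hypothetical
    helper that exists only for this example.

```python
import stat

# Read permission for user, group and others (the kernel calls this S_IRUGO).
S_IRUGO = stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH

# Each variant hard-codes its mode, so a driver author cannot get it wrong.
FIXED_MODES = {
    "DRIVER_ATTR_RW": S_IRUGO | stat.S_IWUSR,  # 0644
    "DRIVER_ATTR_RO": S_IRUGO,                 # 0444
    "DRIVER_ATTR_WO": stat.S_IWUSR,            # 0200
}


def pick_macro(readable, writable):
    """Choose the variant from the callbacks a driver attribute provides."""
    if readable and writable:
        return "DRIVER_ATTR_RW"
    if readable:
        return "DRIVER_ATTR_RO"
    if writable:
        return "DRIVER_ATTR_WO"
    raise ValueError("attribute needs a show or store callback")
```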

    Acked-by: Alan Tull
    Reviewed-by: Moritz Fischer
    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     

17 Sep, 2017

1 commit


16 Sep, 2017

1 commit

  • Pull orangefs updates from Mike Marshall:
    "Some cleanups and a big bug fix for ACLs.

    When I was reviewing Jan Kara's ACL patch, I realized that Orangefs
    ACL code was busted, not just in the kernel module, but in the server
    as well. I've been working on the code in the server mostly, but
    here's one kernel patch, there will be more"

    * tag 'for-linus-4.14-ofs2' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux:
    orangefs: Adjust three checks for null pointers
    orangefs: Use kcalloc() in orangefs_prepare_cdm_array()
    orangefs: Delete error messages for a failed memory allocation in five functions
    orangefs: constify xattr_handler structure
    orangefs: don't call filemap_write_and_wait from fsync
    orangefs: off by ones in xattr size checks
    orangefs: documentation clean up
    orangefs: react properly to posix_acl_update_mode's aftermath.
    orangefs: Don't clear SGID when inheriting ACLs

    Linus Torvalds
     

15 Sep, 2017

2 commits

  • Pull mount flag updates from Al Viro:
    "Another chunk of fmount preparations from dhowells; only trivial
    conflicts for that part. It separates MS_... bits (very grotty
    mount(2) ABI) from the struct super_block ->s_flags (kernel-internal,
    only a small subset of MS_... stuff).

    This does *not* convert the filesystems to new constants; only the
    infrastructure is done here. The next step in that series is where the
    conflicts would be; that's the conversion of filesystems. It's purely
    mechanical and it's better done after the merge, so if you could run
    something like

    list=$(for i in MS_RDONLY MS_NOSUID MS_NODEV MS_NOEXEC MS_SYNCHRONOUS MS_MANDLOCK MS_DIRSYNC MS_NOATIME MS_NODIRATIME MS_SILENT MS_POSIXACL MS_KERNMOUNT MS_I_VERSION MS_LAZYTIME; do git grep -l $i fs drivers/staging/lustre drivers/mtd ipc mm include/linux; done|sort|uniq|grep -v '^fs/namespace.c$')

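    sed -i -e 's/\<MS_RDONLY\>/SB_RDONLY/g' \
    -e 's/\<MS_NOSUID\>/SB_NOSUID/g' \
    -e 's/\<MS_NODEV\>/SB_NODEV/g' \
    -e 's/\<MS_NOEXEC\>/SB_NOEXEC/g' \
    -e 's/\<MS_SYNCHRONOUS\>/SB_SYNCHRONOUS/g' \
    -e 's/\<MS_MANDLOCK\>/SB_MANDLOCK/g' \
    -e 's/\<MS_DIRSYNC\>/SB_DIRSYNC/g' \
    -e 's/\<MS_NOATIME\>/SB_NOATIME/g' \
    -e 's/\<MS_NODIRATIME\>/SB_NODIRATIME/g' \
    -e 's/\<MS_SILENT\>/SB_SILENT/g' \
    -e 's/\<MS_POSIXACL\>/SB_POSIXACL/g' \
    -e 's/\<MS_KERNMOUNT\>/SB_KERNMOUNT/g' \
    -e 's/\<MS_I_VERSION\>/SB_I_VERSION/g' \
    -e 's/\<MS_LAZYTIME\>/SB_LAZYTIME/g' \
    $list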

    and commit it with something along the lines of 'convert filesystems
    away from use of MS_... constants' as commit message, it would save
    quite a bit of headache next cycle"
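
    The mechanical rename requested above only works if each flag is matched
    as a whole token. A minimal sketch of that idea follows; convert() is a
    hypothetical helper, not part of the series, and the flag list is the one
    quoted in the message.

```python
import re

# Flags named in the conversion request above.
FLAGS = [
    "RDONLY", "NOSUID", "NODEV", "NOEXEC", "SYNCHRONOUS", "MANDLOCK",
    "DIRSYNC", "NOATIME", "NODIRATIME", "SILENT", "POSIXACL", "KERNMOUNT",
    "I_VERSION", "LAZYTIME",
]

# \b anchors ensure that only complete identifiers are rewritten.
PATTERN = re.compile(r"\bMS_(" + "|".join(FLAGS) + r")\b")


def convert(source):
    """Rewrite whole-token MS_<flag> names to SB_<flag>."""
    return PATTERN.sub(r"SB_\1", source)
```

    Identifiers such as MS_RMT_MASK, which are not in the list, are left
    untouched.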

    * 'work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    VFS: Differentiate mount flags (MS_*) from internal superblock flags
    VFS: Convert sb->s_flags & MS_RDONLY to sb_rdonly(sb)
    vfs: Add sb_rdonly(sb) to query the MS_RDONLY flag on s_flags

    Linus Torvalds
     
  • Signed-off-by: Mike Marshall

    Mike Marshall
     

14 Sep, 2017

1 commit

  • Pull overlayfs updates from Miklos Szeredi:
    "This fixes d_ino correctness in readdir, which brings overlayfs on par
    with normal filesystems regarding inode number semantics, as long as
    all layers are on the same filesystem.

    There are also some bug fixes, one in particular (random ioctl's
    shouldn't be able to modify lower layers) that touches some vfs code,
    but of course no-op for non-overlay fs"

    * 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
    ovl: fix false positive ESTALE on lookup
    ovl: don't allow writing ioctl on lower layer
    ovl: fix relatime for directories
    vfs: add flags to d_real()
    ovl: cleanup d_real for negative
    ovl: constant d_ino for non-merge dirs
    ovl: constant d_ino across copy up
    ovl: fix readdir error value
    ovl: check snprintf return

    Linus Torvalds
     

13 Sep, 2017

1 commit

  • Pull f2fs updates from Jaegeuk Kim:
    "In this round, we've mostly tuned f2fs to provide better user
    experience for Android. Especially, we've worked on atomic write
    feature again with SQLite community in order to support it officially.
    And we added or modified several facilities to analyze and enhance IO
    behaviors.

    Major changes include:
    - add app/fs io stat
    - add inode checksum feature
    - support project/journalled quota
    - enhance atomic write with new ioctl() which exposes feature set
    - enhance background gc/discard/fstrim flows with new gc_urgent mode
    - add F2FS_IOC_FS{GET,SET}XATTR
    - fix some quota flows"

    * tag 'f2fs-for-4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (63 commits)
    f2fs: hurry up to issue discard after io interruption
    f2fs: fix to show correct discard_granularity in sysfs
    f2fs: detect dirty inode in evict_inode
    f2fs: clear radix tree dirty tag of pages whose dirty flag is cleared
    f2fs: speed up gc_urgent mode with SSR
    f2fs: better to wait for fstrim completion
    f2fs: avoid race in between read xattr & write xattr
    f2fs: make get_lock_data_page to handle encrypted inode
    f2fs: use generic terms used for encrypted block management
    f2fs: introduce f2fs_encrypted_file for clean-up
    Revert "f2fs: add a new function get_ssr_cost"
    f2fs: constify super_operations
    f2fs: fix to wake up all sleeping flusher
    f2fs: avoid race in between atomic_read & atomic_inc
    f2fs: remove unneeded parameter of change_curseg
    f2fs: update i_flags correctly
    f2fs: don't check inode's checksum if it was dirtied or writebacked
    f2fs: don't need to update inode checksum for recovery
    f2fs: trigger fdatasync for non-atomic_write file
    f2fs: fix to avoid race in between aio and gc
    ...

    Linus Torvalds
     

07 Sep, 2017

2 commits

  • Patch series "Ranged pagevec lookup", v2.

    In this series I make pagevec_lookup() update the index (to be
    consistent with pagevec_lookup_tag() and also as a preparation for
    ranged lookups), provide ranged variant of pagevec_lookup() and use it
    in places where it makes sense. This not only removes some common code
    but is also a measurable performance win for some use cases (see patch
    4/10) where radix tree is sparse and searching & grabbing of a page after
    the end of the range has measurable overhead.
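
    The idea of a ranged lookup that also advances the index can be modelled
    roughly as follows. This is a sketch over a sorted list rather than a
    radix tree; lookup_range() and its batch parameter are hypothetical and
    do not mirror the kernel signatures.

```python
from bisect import bisect_left


def lookup_range(indices, start, end, batch=4):
    """Return (found, next_index) for present page indices in [start, end].

    `indices` is a sorted list of populated page indices.
    """
    lo = bisect_left(indices, start)
    found = []
    for idx in indices[lo:lo + batch]:
        if idx > end:
            # Stop at the end of the range instead of grabbing a page past it.
            break
        found.append(idx)
    if len(found) < batch:
        # Range exhausted: the caller can stop iterating.
        return found, end + 1
    # Batch full: continue right after the last page returned.
    return found, found[-1] + 1
```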

    This patch (of 10):

    The callback doesn't ever get called. Remove it.

    Link: http://lkml.kernel.org/r/20170726114704.7626-2-jack@suse.cz
    Signed-off-by: Jan Kara
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kara
     
  • When servicing mmap() reads from file holes the current DAX code
    allocates a page cache page of all zeroes and places the struct page
    pointer in the mapping->page_tree radix tree.

    This has three major drawbacks:

    1) It consumes memory unnecessarily. For every 4k page that is read via
    a DAX mmap() over a hole, we allocate a new page cache page. This
    means that if you read 1GiB worth of pages, you end up using 1GiB of
    zeroed memory. This is easily visible by looking at the overall
    memory consumption of the system or by looking at /proc/[pid]/smaps:

    7f62e72b3000-7f63272b3000 rw-s 00000000 103:00 12 /root/dax/data
    Size: 1048576 kB
    Rss: 1048576 kB
    Pss: 1048576 kB
    Shared_Clean: 0 kB
    Shared_Dirty: 0 kB
    Private_Clean: 1048576 kB
    Private_Dirty: 0 kB
    Referenced: 1048576 kB
    Anonymous: 0 kB
    LazyFree: 0 kB
    AnonHugePages: 0 kB
    ShmemPmdMapped: 0 kB
    Shared_Hugetlb: 0 kB
    Private_Hugetlb: 0 kB
    Swap: 0 kB
    SwapPss: 0 kB
    KernelPageSize: 4 kB
    MMUPageSize: 4 kB
    Locked: 0 kB

    2) It is slower than using a common zero page because each page fault
    has more work to do. Instead of just inserting a common zero page we
    have to allocate a page cache page, zero it, and then insert it. Here
    are the average latencies of dax_load_hole() as measured by ftrace on
    a random test box:

    Old method, using zeroed page cache pages: 3.4 us
    New method, using the common 4k zero page: 0.8 us

    This was the average latency over 1 GiB of sequential reads done by
    this simple fio script:

    [global]
    size=1G
    filename=/root/dax/data
    fallocate=none
    [io]
    rw=read
    ioengine=mmap

    3) The fact that we had to check for both DAX exceptional entries and
    for page cache pages in the radix tree made the DAX code more
    complex.
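
    The figures quoted above can be cross-checked with a little arithmetic.
    The sketch assumes exactly one fault per 4k page over the 1 GiB read,
    which is a simplification; the helper names are made up for the example.

```python
PAGE_SIZE = 4 * 1024   # bytes per page
GIB = 1024 ** 3


def pages_touched(nbytes):
    """Number of 4k pages covered by a read of nbytes."""
    return nbytes // PAGE_SIZE


def old_rss_kb(nbytes):
    """Memory pinned by the old method: one zeroed page per page read."""
    return pages_touched(nbytes) * PAGE_SIZE // 1024


def total_fault_seconds(nbytes, latency_us):
    """Time spent in dax_load_hole(), assuming one fault per page."""
    return pages_touched(nbytes) * latency_us / 1_000_000
```

    For 1 GiB this gives 262144 pages and 1048576 kB of Rss, matching the
    smaps output, and about 0.89 s of fault handling at 3.4 us versus about
    0.21 s at 0.8 us.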

    Solve these issues by following the lead of the DAX PMD code and using a
    common 4k zero page instead. As with the PMD code we will now insert a
    DAX exceptional entry into the radix tree instead of a struct page
    pointer which allows us to remove all the special casing in the DAX
    code.

    Note that we do still pretty aggressively check for regular pages in the
    DAX radix tree, especially where we take action based on the bits set in
    the page. If we ever find a regular page in our radix tree now that
    most likely means that someone besides DAX is inserting pages (which has
    happened lots of times in the past), and we want to find that out early
    and fail loudly.

    This solution also removes the extra memory consumption. Here is that
    same /proc/[pid]/smaps after 1GiB of reading from a hole with the new
    code:

    7f2054a74000-7f2094a74000 rw-s 00000000 103:00 12 /root/dax/data
    Size: 1048576 kB
    Rss: 0 kB
    Pss: 0 kB
    Shared_Clean: 0 kB
    Shared_Dirty: 0 kB
    Private_Clean: 0 kB
    Private_Dirty: 0 kB
    Referenced: 0 kB
    Anonymous: 0 kB
    LazyFree: 0 kB
    AnonHugePages: 0 kB
    ShmemPmdMapped: 0 kB
    Shared_Hugetlb: 0 kB
    Private_Hugetlb: 0 kB
    Swap: 0 kB
    SwapPss: 0 kB
    KernelPageSize: 4 kB
    MMUPageSize: 4 kB
    Locked: 0 kB

    Overall system memory consumption is similarly improved.

    Another major change is that we remove dax_pfn_mkwrite() from our fault
    flow, and instead rely on the page fault itself to make the PTE dirty
    and writeable. The following description from the patch adding the
    vm_insert_mixed_mkwrite() call explains this a little more:

    "To be able to use the common 4k zero page in DAX we need to have our
    PTE fault path look more like our PMD fault path where a PTE entry
    can be marked as dirty and writeable as it is first inserted rather
    than waiting for a follow-up dax_pfn_mkwrite() =>
    finish_mkwrite_fault() call.

    Right now we can rely on having a dax_pfn_mkwrite() call because we
    can distinguish between these two cases in do_wp_page():

    case 1: 4k zero page => writable DAX storage
    case 2: read-only DAX storage => writeable DAX storage

    This distinction is made via vm_normal_page(). vm_normal_page()
    returns false for the common 4k zero page, though, just as it does
    for DAX ptes. Instead of special casing the DAX + 4k zero page case
    we will simplify our DAX PTE page fault sequence so that it matches
    our DAX PMD sequence, and get rid of the dax_pfn_mkwrite() helper.
    We will instead use dax_iomap_fault() to handle write-protection
    faults.

    This means that insert_pfn() needs to follow the lead of
    insert_pfn_pmd() and allow us to pass in a 'mkwrite' flag. If
    'mkwrite' is set insert_pfn() will do the work that was previously
    done by wp_page_reuse() as part of the dax_pfn_mkwrite() call path"

    Link: http://lkml.kernel.org/r/20170724170616.25810-4-ross.zwisler@linux.intel.com
    Signed-off-by: Ross Zwisler
    Reviewed-by: Jan Kara
    Cc: "Darrick J. Wong"
    Cc: "Theodore Ts'o"
    Cc: Alexander Viro
    Cc: Andreas Dilger
    Cc: Christoph Hellwig
    Cc: Dan Williams
    Cc: Dave Chinner
    Cc: Ingo Molnar
    Cc: Jonathan Corbet
    Cc: Matthew Wilcox
    Cc: Steven Rostedt
    Cc: Kirill A. Shutemov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ross Zwisler
     

05 Sep, 2017

1 commit


27 Aug, 2017

1 commit

  • Currently there are no ->swap_{in,out} methods in the
    address_space_operations structure definition, so the statement that
    anything is going to be proxied through them is wrong.

    Signed-off-by: Nikolay Borisov
    Signed-off-by: Jonathan Corbet

    Nikolay Borisov
     

22 Aug, 2017

1 commit

  • This patch lets f2fs accept quota information through mount options:
    - {usr,grp,prj}jquota=
    - jqfmt=

    Then, in the ->mount flow, we can recover the quota files during log
    replay, which allows journalled quota to be supported.
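
    A rough sketch of how such options might be picked out of a mount option
    string is shown below. parse_quota_opts() is hypothetical, and the rule
    that a quota file name requires jqfmt is an assumption of this sketch,
    not something the message states.

```python
QUOTA_FILE_OPTS = ("usrjquota", "grpjquota", "prjjquota")


def parse_quota_opts(optstring):
    """Collect the journalled quota options from a comma-separated string."""
    quota = {}
    for opt in optstring.split(","):
        key, _, value = opt.partition("=")
        if key in QUOTA_FILE_OPTS or key == "jqfmt":
            quota[key] = value
    names_given = any(k in quota for k in QUOTA_FILE_OPTS)
    if names_given and "jqfmt" not in quota:
        # Assumed rule: a quota file is unusable without a format.
        raise ValueError("journalled quota format not specified")
    return quota
```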

    Signed-off-by: Chao Yu
    [Jaegeuk Kim: Fix wrong return values.]
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     

16 Aug, 2017

1 commit


01 Aug, 2017

1 commit


17 Jul, 2017

1 commit

  • Differentiate the MS_* flags passed to mount(2) from the internal flags set
    in the super_block's s_flags. s_flags are now called SB_*, with the names
    and the values for the moment mirroring the MS_* flags that they're
    equivalent to.

    In this patch, just the headers are altered and some kernel code where
    blind automated conversion isn't necessarily correct.

    Note that this shows up some interesting issues:

    (1) Some MS_* flags get translated to MNT_* flags (such as MS_NODEV ->
    MNT_NODEV) without passing this on to the filesystem, but some
    filesystems set such flags anyway.

    (2) The ->remount_fs() methods of some filesystems adjust the *flags
    argument by setting MS_* flags in it, such as MS_NOATIME - but these
    flags are then scrubbed by do_remount_sb() (only the occupants of
    MS_RMT_MASK are permitted: MS_RDONLY, MS_SYNCHRONOUS, MS_MANDLOCK,
    MS_I_VERSION and MS_LAZYTIME)
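
    Issue (2) can be illustrated with a small model of the masking step. The
    flag values are the mount(2) ABI values as I understand them, and
    apply_remount_flags() is a hypothetical stand-in for what
    do_remount_sb() does with the flags.

```python
MS_RDONLY = 1
MS_SYNCHRONOUS = 16
MS_MANDLOCK = 64
MS_NOATIME = 1024
MS_I_VERSION = 1 << 23
MS_LAZYTIME = 1 << 25

# Only these flags survive a remount request.
MS_RMT_MASK = (MS_RDONLY | MS_SYNCHRONOUS | MS_MANDLOCK |
               MS_I_VERSION | MS_LAZYTIME)


def apply_remount_flags(s_flags, requested):
    """Keep unmasked bits of s_flags, take masked bits from the request."""
    return (s_flags & ~MS_RMT_MASK) | (requested & MS_RMT_MASK)
```

    A filesystem that sets MS_NOATIME in *flags during ->remount_fs() sees
    it dropped here, because MS_NOATIME is not part of MS_RMT_MASK.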

    I'm not sure what's the best way to solve all these cases.

    Suggested-by: Al Viro
    Signed-off-by: David Howells

    David Howells
     

16 Jul, 2017

1 commit

  • Pull ->s_options removal from Al Viro:
    "Preparations for fsmount/fsopen stuff (coming next cycle). Everything
    gets moved to explicit ->show_options(), killing ->s_options off +
    some cosmetic bits around fs/namespace.c and friends. Basically, the
    stuff needed to work with fsmount series with minimum of conflicts
    with other work.

    It's not strictly required for this merge window, but it would reduce
    the PITA during the coming cycle, so it would be nice to have those
    bits and pieces out of the way"

    * 'work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    isofs: Fix isofs_show_options()
    VFS: Kill off s_options and helpers
    orangefs: Implement show_options
    9p: Implement show_options
    isofs: Implement show_options
    afs: Implement show_options
    affs: Implement show_options
    befs: Implement show_options
    spufs: Implement show_options
    bpf: Implement show_options
    ramfs: Implement show_options
    pstore: Implement show_options
    omfs: Implement show_options
    hugetlbfs: Implement show_options
    VFS: Don't use save/replace_mount_options if not using generic_show_options
    VFS: Provide empty name qstr
    VFS: Make get_filesystem() return the affected filesystem
    VFS: Clean up whitespace in fs/namespace.c and fs/super.c
    Provide a function to create a NUL-terminated string from unterminated data

    Linus Torvalds
     

13 Jul, 2017

2 commits

  • Since it is possible to have the same number in the tfd field (say a
    file is added, closed, then another file is dup'ed to the same number
    and added back), it is impossible to distinguish such target files
    solely by their numbers.

    Strictly speaking, regular applications don't need to recognize these
    targets at all, but for the sake of checkpoint/restore we need to collect
    targets to be able to push them back at the restore stage in a proper
    order.

    Thus let's add the file position, inode and device number where this
    target lies. These three fields can be used as a primary key for sorting,
    and together with kcmp they help CRIU find the exact target file (from
    the whole set of processes being checkpointed).
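
    A sketch of how a checkpoint tool might use the new fields as a sort key
    follows. The fdinfo line layout matched by TFD_LINE is an assumption
    made for this example, and parse_targets() is a hypothetical helper.

```python
import re

# Assumed layout: "tfd: N events: HEX data: HEX pos:N ino:HEX sdev:HEX"
TFD_LINE = re.compile(
    r"tfd:\s*(?P<tfd>\d+)\s+events:\s*(?P<events>[0-9a-f]+)\s+"
    r"data:\s*(?P<data>[0-9a-f]+)\s+pos:\s*(?P<pos>-?\d+)\s+"
    r"ino:\s*(?P<ino>[0-9a-f]+)\s+sdev:\s*(?P<sdev>[0-9a-f]+)"
)


def parse_targets(fdinfo_text):
    """Parse epoll target lines and sort them by (sdev, ino, pos)."""
    targets = []
    for line in fdinfo_text.splitlines():
        m = TFD_LINE.match(line.strip())
        if m:
            targets.append({
                "tfd": int(m.group("tfd")),
                "pos": int(m.group("pos")),
                "ino": int(m.group("ino"), 16),
                "sdev": int(m.group("sdev"), 16),
            })
    # The three new fields together act as the primary sort key.
    return sorted(targets, key=lambda t: (t["sdev"], t["ino"], t["pos"]))
```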

    Link: http://lkml.kernel.org/r/20170424154423.436491881@gmail.com
    Signed-off-by: Cyrill Gorcunov
    Acked-by: Andrei Vagin
    Cc: Al Viro
    Cc: Pavel Emelyanov
    Cc: Michael Kerrisk
    Cc: Jason Baron
    Cc: Andy Lutomirski
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Cyrill Gorcunov
     
  • Pull overlayfs updates from Miklos Szeredi:
    "This work from Amir introduces the inodes index feature, which
    provides:

    - hardlinks are not broken on copy up

    - infrastructure for overlayfs NFS export

    This also fixes constant st_ino for samefs case for lower hardlinks"

    * 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs: (33 commits)
    ovl: mark parent impure and restore timestamp on ovl_link_up()
    ovl: document copying layers restrictions with inodes index
    ovl: cleanup orphan index entries
    ovl: persistent overlay inode nlink for indexed inodes
    ovl: implement index dir copy up
    ovl: move copy up lock out
    ovl: rearrange copy up
    ovl: add flag for upper in ovl_entry
    ovl: use struct copy_up_ctx as function argument
    ovl: base tmpfile in workdir too
    ovl: factor out ovl_copy_up_inode() helper
    ovl: extract helper to get temp file in copy up
    ovl: defer upper dir lock to tempfile link
    ovl: hash overlay non-dir inodes by copy up origin
    ovl: cleanup bad and stale index entries on mount
    ovl: lookup index entry for copy up origin
    ovl: verify index dir matches upper dir
    ovl: verify upper root dir matches lower root dir
    ovl: introduce the inodes index dir feature
    ovl: generalize ovl_create_workdir()
    ...

    Linus Torvalds
     

11 Jul, 2017

3 commits

  • Kill off s_options, save/replace_mount_options() and generic_show_options()
    as all filesystems now implement ->show_options() for themselves. This
    should make it easier to implement a context-based mount where the mount
    options can be passed individually over a file descriptor.

    Signed-off-by: David Howells
    Signed-off-by: Al Viro

    David Howells
     
  • Pull f2fs updates from Jaegeuk Kim:
    "In this round, we've added new features such as disk quota and statx,
    and modified internal bio management flow to merge more IOs depending
    on block types. We've also made internal threads freezeable for
    Android battery life. In addition to them, there are some patches to
    avoid lock contention as well as a couple of deadlock conditions.

    Enhancements:
    - support usrquota, grpquota, and statx
    - manage DATA/NODE typed bios separately to serialize more IOs
    - modify f2fs_lock_op/wio_mutex to avoid lock contention
    - prevent lock contention in migratepage

    Bug fixes:
    - fix missing load of written inode flag
    - fix worst case victim selection in GC
    - freezeable GC and discard threads for Android battery life
    - sanitize f2fs metadata to deal with security hole
    - clean up sysfs-related code and docs"

    * tag 'for-f2fs-4.13' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (59 commits)
    f2fs: support plain user/group quota
    f2fs: avoid deadlock caused by lock order of page and lock_op
    f2fs: use spin_{,un}lock_irq{save,restore}
    f2fs: relax migratepage for atomic written page
    f2fs: don't count inode block in in-memory inode.i_blocks
    Revert "f2fs: fix to clean previous mount option when remount_fs"
    f2fs: do not set LOST_PINO for renamed dir
    f2fs: do not set LOST_PINO for newly created dir
    f2fs: skip ->writepages for {mete,node}_inode during recovery
    f2fs: introduce __check_sit_bitmap
    f2fs: stop gc/discard thread in prior during umount
    f2fs: introduce reserved_blocks in sysfs
    f2fs: avoid redundant f2fs_flush after remount
    f2fs: report # of free inodes more precisely
    f2fs: add ioctl to do gc with target block address
    f2fs: don't need to check encrypted inode for partial truncation
    f2fs: measure inode.i_blocks as generic filesystem
    f2fs: set CP_TRIMMED_FLAG correctly
    f2fs: require key for truncate(2) of encrypted file
    f2fs: move sysfs code from super.c to fs/f2fs/sysfs.c
    ...

    Linus Torvalds
     
  • Commit ac6424b981bc ("sched/wait: Rename wait_queue_t =>
    wait_queue_entry_t") had scripted the renaming incorrectly, and didn't
    actually check that the 'wait_queue_t' was a full token.

    As a result, it also triggered on 'wait_queue_token', and renamed that
    to 'wait_queue_entry_token' entry in the autofs4 packet structure
    definition too. That was entirely incorrect, and not intended.

    The end result built fine when building just the kernel - because
    everything had been renamed consistently there - but caused problems in
    user space because the "struct autofs_packet_missing" type is exported
    as part of the uapi.

    This scripts it all back again:

    git grep -lw wait_queue_entry_token |
    xargs sed -i 's/wait_queue_entry_token/wait_queue_token/g'

    and checks the end result.
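
    The failure mode is easy to reproduce in miniature. The two helpers
    below are hypothetical: one renames by plain substring, as the faulty
    script effectively did, and the other checks that the name is a full
    token.

```python
import re


def rename_substring(text):
    """Faulty approach: also rewrites longer names containing the string."""
    return text.replace("wait_queue_t", "wait_queue_entry_t")


def rename_token(text):
    """Whole-token approach: \\b keeps wait_queue_token intact."""
    return re.sub(r"\bwait_queue_t\b", "wait_queue_entry_t", text)
```

    Given "wait_queue_t wq; autofs_wqt_t wait_queue_token;", the first
    helper produces wait_queue_entry_token while the second leaves that
    field name alone.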

    Reported-by: Florian Fainelli
    Acked-by: Ingo Molnar
    Fixes: ac6424b981bc ("sched/wait: Rename wait_queue_t => wait_queue_entry_t")
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

09 Jul, 2017

1 commit

  • This patch adds support for plain user/group quota.

    Change Note by Jaegeuk Kim.

    - Use the f2fs page cache for quota files so that garbage collection is
    taken into account. As a result, quota files do not tolerate sudden
    power cuts, so the user needs to run quotacheck.

    - setattr() calls dquot_transfer which will transfer inode->i_blocks.
    We can't reclaim that during f2fs_evict_inode(). So, we need to count
    node blocks as well in order to match i_blocks with dquot's space.

    Note that, Chao wrote a patch to count inode->i_blocks without inode block.
    (f2fs: don't count inode block in in-memory inode.i_blocks)

    - in f2fs_remount, we need to switch to RW prior to dquot_resume.

    - handle fault_injection case during f2fs_quota_off_umount

    - TODO: Project quota

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     

08 Jul, 2017

1 commit

  • Pull Writeback error handling updates from Jeff Layton:
    "This pile represents the bulk of the writeback error handling fixes
    that I have for this cycle. Some of the earlier patches in this pile
    may look trivial but they are prerequisites for later patches in the
    series.

    The aim of this set is to improve how we track and report writeback
    errors to userland. Most applications that care about data integrity
    will periodically call fsync/fdatasync/msync to ensure that their
    writes have made it to the backing store.

    For a very long time, we have tracked writeback errors using two flags
    in the address_space: AS_EIO and AS_ENOSPC. Those flags are set when a
    writeback error occurs (via mapping_set_error) and are cleared as a
    side-effect of filemap_check_errors (as you noted yesterday). This
    model really sucks for userland.

    Only the first task to call fsync (or msync or fdatasync) will see the
    error. Any subsequent task calling fsync on a file will get back 0
    (unless another writeback error occurs in the interim). If I have
    several tasks writing to a file and calling fsync to ensure that their
    writes got stored, then I need to have them coordinate with one
    another. That's difficult enough, but in a world of containerized
    setups that coordination may even not be possible.

    But wait...it gets worse!

    The calls to filemap_check_errors can be buried pretty far down in the
    call stack, and there are internal callers of filemap_write_and_wait
    and the like that also end up clearing those errors. Many of those
    callers ignore the error return from that function or return it to
    userland at nonsensical times (e.g. truncate() or stat()). If I get
    back -EIO on a truncate, there is no reason to think that it was
    because some previous writeback failed, and a subsequent fsync() will
    (incorrectly) return 0.

    This pile aims to do three things:

    1) ensure that when a writeback error occurs that that error will be
    reported to userland on a subsequent fsync/fdatasync/msync call,
    regardless of what internal callers are doing

    2) report writeback errors on all file descriptions that were open at
    the time that the error occurred. This is a user-visible change,
    but I think most applications are written to assume this behavior
    anyway. Those that aren't are unlikely to be hurt by it.

    3) document what filesystems should do when there is a writeback
    error. Today, there is very little consistency between them, and a
    lot of cargo-cult copying. We need to make it very clear what
    filesystems should do in this situation.

    To achieve this, the set adds a new data type (errseq_t) and then
    builds new writeback error tracking infrastructure around that. Once
    all of that is in place, we change the filesystems to use the new
    infrastructure for reporting wb errors to userland.
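
    The difference between the two tracking models can be sketched as
    follows. This is a simplified illustration: the real errseq_t packs its
    state into a single integer, and the class and method names here are
    hypothetical.

```python
import errno


class FlagMapping:
    """Old model: one shared error flag, cleared by whoever checks first."""

    def __init__(self):
        self.error = 0

    def set_error(self, err):
        self.error = err

    def fsync(self):
        err, self.error = self.error, 0
        return err


class ErrseqMapping:
    """New model: each open file remembers the last error sequence it saw."""

    def __init__(self):
        self.seq = 0
        self.err = 0

    def open(self):
        # Sample the current sequence at open time.
        return {"since": self.seq}

    def set_error(self, err):
        self.seq += 1
        self.err = err

    def fsync(self, fd):
        if fd["since"] == self.seq:
            return 0
        fd["since"] = self.seq  # report each error once per file description
        return self.err
```

    With FlagMapping only the first fsync() caller sees -EIO. With
    ErrseqMapping every file description that was open when the error
    occurred sees it once.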

    Note that this is just the initial foray into cleaning up this mess.
    There is a lot of work remaining here:

    1) convert the rest of the filesystems in a similar fashion. Once the
    initial set is in, then I think most other fs' will be fairly
    simple to convert. Hopefully most of those can go in via individual
    filesystem trees.

    2) convert internal waiters on writeback to use errseq_t for
    detecting errors instead of relying on the AS_* flags. I have some
    draft patches for this for ext4, but they are not quite ready for
    prime time yet.

    This was a discussion topic this year at LSF/MM too. If you're
    interested in the gory details, LWN has some good articles about this:

    https://lwn.net/Articles/718734/
    https://lwn.net/Articles/724307/"

    * tag 'for-linus-v4.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux:
    btrfs: minimal conversion to errseq_t writeback error reporting on fsync
    xfs: minimal conversion to errseq_t writeback error reporting
    ext4: use errseq_t based error handling for reporting data writeback errors
    fs: convert __generic_file_fsync to use errseq_t based reporting
    block: convert to errseq_t based writeback error tracking
    dax: set errors in mapping when writeback fails
    Documentation: flesh out the section in vfs.txt on storing and reporting writeback errors
    mm: set both AS_EIO/AS_ENOSPC and errseq_t in mapping_set_error
    fs: new infrastructure for writeback error handling and reporting
    lib: add errseq_t type and infrastructure for handling it
    mm: don't TestClearPageError in __filemap_fdatawait_range
    mm: clear AS_EIO/AS_ENOSPC when writeback initiation fails
    jbd2: don't clear and reset errors after waiting on writeback
    buffer: set errors in mapping at the time that the error occurs
    fs: check for writeback errors after syncing out buffers in generic_file_fsync
    buffer: use mapping_set_error instead of setting the flag
    mm: fix mapping_set_error call in me_pagecache_dirty

    Linus Torvalds
     

06 Jul, 2017

1 commit


05 Jul, 2017

1 commit

  • The inodes index feature introduces a behavior change - on mount,
    upper root origin file handle is verified to match the lower root dir.
    This implies that copied layers cannot be mounted with the inodes index
    feature enabled, without explicitly removing the upper dir origin xattr
    and the index dir.

    The inodes index feature is required to support:
    - Preventing hardlinks from breaking on copy up
    - NFS export support (upcoming)
    - Overlayfs snapshots (POC)

    Signed-off-by: Amir Goldstein
    Signed-off-by: Miklos Szeredi

    Amir Goldstein
     

04 Jul, 2017

2 commits

  • Commit 73faec4d9935 ("f2fs: add mount option to select fault injection
    ratio") and commit 087968974fcd ("f2fs: add fault injection to sysfs")
    forgot to document the mount option and the sysfs file.

    This patch documents them.

    Signed-off-by: Chao Yu
    Signed-off-by: Jaegeuk Kim

    Chao Yu
     
  • Pull documentation updates from Jonathan Corbet:
    "There has been a fair amount of activity in the docs tree this time
    around. Highlights include:

    - Conversion of a bunch of security documentation into RST

    - The conversion of the remaining DocBook templates by The Amazing
    Mauro Machine. We can now drop the entire DocBook build chain.

    - The usual collection of fixes and minor updates"

    * tag 'docs-4.13' of git://git.lwn.net/linux: (90 commits)
    scripts/kernel-doc: handle DECLARE_HASHTABLE
    Documentation: atomic_ops.txt is core-api/atomic_ops.rst
    Docs: clean up some DocBook loose ends
    Make the main documentation title less Geocities
    Docs: Use kernel-figure in vidioc-g-selection.rst
    Docs: fix table problems in ras.rst
    Docs: Fix breakage with Sphinx 1.5 and upper
    Docs: Include the Latex "ifthen" package
    doc/kokr/howto: Only send regression fixes after -rc1
    docs-rst: fix broken links to dynamic-debug-howto in kernel-parameters
    doc: Document suitability of IBM Verse for kernel development
    Doc: fix a markup error in coding-style.rst
    docs: driver-api: i2c: remove some outdated information
    Documentation: DMA API: fix a typo in a function name
    Docs: Insert missing space to separate link from text
    doc/ko_KR/memory-barriers: Update control-dependencies example
    Documentation, kbuild: fix typo "minimun" -> "minimum"
    docs: Fix some formatting issues in request-key.rst
    doc: ReSTify keys-trusted-encrypted.txt
    doc: ReSTify keys-request-key.txt
    ...

    Linus Torvalds
     

20 Jun, 2017

1 commit

  • Rename:

    wait_queue_t => wait_queue_entry_t

    'wait_queue_t' was always a slight misnomer: its name implies that it's a "queue",
    but in reality it's a queue *entry*. The 'real' queue is the wait queue head,
    which had to carry the name.

    Start sorting this out by renaming it to 'wait_queue_entry_t'.

    This also allows the real structure name 'struct __wait_queue' to
    lose its double underscore and become 'struct wait_queue_entry',
    which is the more canonical nomenclature for such data types.

    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     

19 May, 2017

2 commits

  • Mauro says:

    This patch series converts the remaining DocBooks to ReST.

    The first version was originally sent as three patch series:

    [PATCH 00/36] Convert DocBook documents to ReST
    [PATCH 0/5] Convert more books to ReST
    [PATCH 00/13] Get rid of DocBook

    The lsm book was added as if it were a text file under
    Documentation. The plan is to merge it with another file
    under Documentation/security, after both this series and
    a security Documentation patch series get merged.

    It also adjusts some Sphinx-pedantic errors/warnings on
    some kernel-doc markups.

    I also added some patches here to add PDF output for all
    existing ReST books.

    Jonathan Corbet
     
  • Adjusts for ReST markup and moves under keys security devel index.

    Cc: David Howells
    Signed-off-by: Kees Cook
    Signed-off-by: Jonathan Corbet

    Kees Cook
     

16 May, 2017

3 commits


13 May, 2017

1 commit


11 May, 2017

1 commit

  • Pull NFS client updates from Trond Myklebust:
    "Highlights include:

    Stable bugfixes:
    - Fix use after free in write error path
    - Use GFP_NOIO for two allocations in writeback
    - Fix a hang in OPEN related to server reboot
    - Check the result of nfs4_pnfs_ds_connect
    - Fix an rcu lock leak

    Features:
    - Removal of the unmaintained and unused OSD pNFS layout
    - Cleanup and removal of lots of unnecessary dprintk()s
    - Cleanup and removal of some memory failure paths now that GFP_NOFS
    is guaranteed to never fail.
    - Remove the v3-only data server limitation on pNFS/flexfiles

    Bugfixes:
    - RPC/RDMA connection handling bugfixes
    - Copy offload: fixes to ensure the copied data is COMMITed to disk.
    - Readdir: switch back to using the ->iterate VFS interface
    - File locking fixes from Ben Coddington
    - Various use-after-free and deadlock issues in pNFS
    - Write path bugfixes"

    * tag 'nfs-for-4.12-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (89 commits)
    pNFS/flexfiles: Always attempt to call layoutstats when flexfiles is enabled
    NFSv4.1: Work around a Linux server bug...
    NFS append COMMIT after synchronous COPY
    NFSv4: Fix exclusive create attributes encoding
    NFSv4: Fix an rcu lock leak
    nfs: use kmap/kunmap directly
    NFS: always treat the invocation of nfs_getattr as cache hit when noac is on
    Fix nfs_client refcounting if kmalloc fails in nfs4_proc_exchange_id and nfs4_proc_async_renew
    NFSv4.1: RECLAIM_COMPLETE must handle NFS4ERR_CONN_NOT_BOUND_TO_SESSION
    pNFS: Fix NULL dereference in pnfs_generic_alloc_ds_commits
    pNFS: Fix a typo in pnfs_generic_alloc_ds_commits
    pNFS: Fix a deadlock when coalescing writes and returning the layout
    pNFS: Don't clear the layout return info if there are segments to return
    pNFS: Ensure we commit the layout if it has been invalidated
    pNFS: Don't send COMMITs to the DSes if the server invalidated our layout
    pNFS/flexfiles: Fix up the ff_layout_write_pagelist failure path
    pNFS: Ensure we check layout validity before marking it for return
    NFS4.1 handle interrupted slot reuse from ERR_DELAY
    NFSv4: check return value of xdr_inline_decode
    nfs/filelayout: fix NULL pointer dereference in fl_pnfs_update_layout()
    ...

    Linus Torvalds