08 Aug, 2016

2 commits

  • Since commit 63a4cc24867d, bio->bi_rw contains flags in the lower
    portion and the op code in the higher portions. This means that
    old code that relies on manually setting bi_rw is most likely
    going to be broken. Instead of letting that brokenness linger,
    rename the member, to force old and out-of-tree code to break
    at compile time instead of at runtime.

    No intended functional changes in this commit.

    Signed-off-by: Jens Axboe

    Jens Axboe
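
A minimal sketch of why hand-assigned values stop working once one word carries both the op code and the flags. The 3-bit op width and every `ex_`/`EX_` name below are assumptions made for illustration; they are not the kernel's definitions.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical layout: the top EX_OP_BITS bits hold the op, the rest hold flags. */
#define EX_OP_BITS   3u
#define EX_OP_SHIFT  (sizeof(unsigned int) * CHAR_BIT - EX_OP_BITS)
#define EX_FLAG_MASK ((1u << EX_OP_SHIFT) - 1u)

enum ex_op { EX_OP_READ = 0, EX_OP_WRITE = 1, EX_OP_DISCARD = 2 };

/* Combine an op code and flag bits into one word. */
static unsigned int ex_pack(enum ex_op op, unsigned int flags)
{
    assert((flags & ~EX_FLAG_MASK) == 0); /* flags must fit below the op bits */
    return ((unsigned int)op << EX_OP_SHIFT) | flags;
}

/* Extract the op code from the high bits. */
static enum ex_op ex_unpack_op(unsigned int word)
{
    return (enum ex_op)(word >> EX_OP_SHIFT);
}

/* Extract the flag bits from the low portion. */
static unsigned int ex_unpack_flags(unsigned int word)
{
    return word & EX_FLAG_MASK;
}
```

Under this layout, old-style code that stores the bare value `EX_OP_WRITE` into the word sets only a low flag bit, so `ex_unpack_op()` reads it back as `EX_OP_READ`. Renaming the member turns that silent misbehaviour into a compile error.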
     
  • Commit abf545484d31 changed it from an 'rw' flags type to the
    newer ops based interface, but now we're effectively leaking
    some bdev internals to the rest of the kernel. Since we only
    care about whether it's a read or a write at that level, just
    pass in a bool 'is_write' parameter instead.

    Then we can also move op_is_write() and friends back under
    CONFIG_BLOCK protection.

    Reviewed-by: Mike Christie
    Signed-off-by: Jens Axboe

    Jens Axboe
     

07 Aug, 2016

3 commits

  • Pull binfmt_misc update from James Bottomley:
    "This update is to allow architecture emulation containers to function
    such that the emulation binary can be housed outside the container
    itself. The container and fs parts both have acks from relevant
    experts.

    To use the new feature you have to add an F option to your binfmt_misc
    configuration"

    From the docs:
    "The usual behaviour of binfmt_misc is to spawn the binary lazily when
    the misc format file is invoked. However, this doesn't work very well
    in the face of mount namespaces and changeroots, so the F mode opens
    the binary as soon as the emulation is installed and uses the opened
    image to spawn the emulator, meaning it is always available once
    installed, regardless of how the environment changes"

    * tag 'binfmt-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/binfmt_misc:
    binfmt_misc: add F option description to documentation
    binfmt_misc: add persistent opened binary handler for containers
    fs: add filp_clone_open API

    Linus Torvalds
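
As a rough illustration of "adding an F option", the sketch below formats a registration line in the `:name:type:offset:magic:mask:interpreter:flags` layout described in the kernel's binfmt_misc documentation; such a line is what gets written to `/proc/sys/fs/binfmt_misc/register`. The helper name `ex_binfmt_line`, the entry name, and the interpreter path are hypothetical, and the magic shown is only the four-byte ELF prefix rather than a complete real-world match pattern.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Format a magic-based ('M') binfmt_misc registration line with an
 * empty offset field. Returns the snprintf() result, or -1 if a field
 * contains the ':' delimiter.
 */
static int ex_binfmt_line(char *buf, size_t len, const char *name,
                          const char *magic, const char *mask,
                          const char *interp, const char *flags)
{
    assert(buf != NULL && len > 0);
    /* ':' is the delimiter here, so it cannot appear inside these fields. */
    if (strchr(name, ':') || strchr(interp, ':'))
        return -1;
    return snprintf(buf, len, ":%s:M::%s:%s:%s:%s",
                    name, magic, mask, interp, flags);
}
```

Calling it with the flags string `"F"` yields a line ending in `:F`, which asks for the interpreter to be opened at registration time as the quoted documentation describes.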
     
  • In most cases, EPERM is returned on immutable inode, and there're only a
    few places returning EACCES. I noticed this when running LTP on
    overlayfs, setxattr03 failed due to unexpected EACCES on immutable
    inode.

    So convert all EACCES to EPERM on immutable inodes.

    Acked-by: Dave Chinner
    Signed-off-by: Eryu Guan
    Signed-off-by: Al Viro
    Signed-off-by: Linus Torvalds

    Eryu Guan
     
  • Pull more vfs updates from Al Viro:
    "Assorted cleanups and fixes.

    In the "trivial API change" department - ->d_compare() losing 'parent'
    argument"

    * 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    cachefiles: Fix race between inactivating and culling a cache object
    9p: use clone_fid()
    9p: fix braino introduced in "9p: new helper - v9fs_parent_fid()"
    vfs: make dentry_needs_remove_privs() internal
    vfs: remove file_needs_remove_privs()
    vfs: fix deadlock in file_remove_privs() on overlayfs
    get rid of 'parent' argument of ->d_compare()
    cifs, msdos, vfat, hfs+: don't bother with parent in ->d_compare()
    affs ->d_compare(): don't bother with ->d_inode
    fold _d_rehash() and __d_rehash() together
    fold dentry_rcuwalk_invalidate() into its only remaining caller

    Linus Torvalds
     

06 Aug, 2016

6 commits

  • …nel/git/dgc/linux-xfs

    Pull more xfs updates from Dave Chinner:
    "This is the second part of the XFS updates for this merge cycle, and
    contains the new reverse block mapping feature for XFS.

    Reverse mapping allows us to track the owner of a specific block on
    disk precisely. It is implemented as a set of btrees (one per
    allocation group) that track the owners of allocated extents.
    Effectively it is a "used space tree" that is updated when we allocate
    or free extents. i.e. it is coherent with the free space btrees we
    already maintain and never overlaps with them.

    This reverse mapping infrastructure is the building block of several
    upcoming features - reflink, copy-on-write data, dedupe, online
    metadata and data scrubbing, highly accurate bad sector/data loss
    reporting to users, and significantly improved reconstruction of
    damaged and corrupted filesystems. There's a lot of new stuff coming
    along in the next couple of cycles, and it all builds on the rmap
    infrastructure.

    As such, it's a huge chunk of new code with new on-disk format
    features and internal infrastructure. It warns at mount time as an
    experimental feature and that it may eat data (as we do with all new
    on-disk features until they stabilise). We have not released
    userspace support for it yet - userspace support currently requires
    download from Darrick's xfsprogs repo and build from source, so the
    access to this feature is really developer/tester only at this point.
    Initial userspace support will be released at the same time as the
    kernel with this code in it.

    The new rmap enabled code regresses 3 xfstests - all are ENOSPC
    related corner cases, one of which Darrick posted a fix for a few
    hours ago. The other two are fixed by infrastructure that is part of
    the upcoming reflink patchset. This new ENOSPC infrastructure
    requires an on-disk format tweak to keep mount times in
    check - we need to keep an on-disk count of allocated rmapbt blocks so
    we don't have to scan the entire btrees at mount time to count them.

    This is currently being tested and will be part of the fixes sent in
    the next week or two so users will not be exposed to this change"

    * tag 'xfs-rmap-for-linus-4.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs: (52 commits)
    xfs: move (and rename) the deferred bmap-free tracepoints
    xfs: collapse single use static functions
    xfs: remove unnecessary parentheses from log redo item recovery functions
    xfs: remove the extents array from the rmap update done log item
    xfs: in btree_lshift, only allocate temporary cursor when needed
    xfs: remove unnecesary lshift/rshift key initialization
    xfs: remove the get*keys and update_keys btree ops pointers
    xfs: enable the rmap btree functionality
    xfs: don't update rmapbt when fixing agfl
    xfs: disable XFS_IOC_SWAPEXT when rmap btree is enabled
    xfs: add rmap btree block detection to log recovery
    xfs: add rmap btree geometry feature flag
    xfs: propagate bmap updates to rmapbt
    xfs: enable the xfs_defer mechanism to process rmaps to update
    xfs: log rmap intent items
    xfs: create rmap update intent log items
    xfs: add rmap btree insert and delete helpers
    xfs: convert unwritten status of reverse mappings
    xfs: remove an extent from the rmap btree
    xfs: add an extent to the rmap btree
    ...

    Linus Torvalds
     
  • Pull qstr constification updates from Al Viro:
    "Fairly self-contained bunch - surprising lot of places passes struct
    qstr * as an argument when const struct qstr * would suffice; it
    complicates analysis for no good reason.

    I'd prefer to feed that separately from the assorted fixes (those are
    in #for-linus and with somewhat trickier topology)"

    * 'work.const-qstr' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    qstr: constify instances in adfs
    qstr: constify instances in lustre
    qstr: constify instances in f2fs
    qstr: constify instances in ext2
    qstr: constify instances in vfat
    qstr: constify instances in procfs
    qstr: constify instances in fuse
    qstr constify instances in fs/dcache.c
    qstr: constify instances in nfs
    qstr: constify instances in ocfs2
    qstr: constify instances in autofs4
    qstr: constify instances in hfs
    qstr: constify instances in hfsplus
    qstr: constify instances in logfs
    qstr: constify dentry_init_security

    Linus Torvalds
     
  • Pull pstore fixes from Kees Cook:
    "Fixes for pstore ramoops driver to catch bad kfree() and to use better
    DT bindings"

    * tag 'pstore-v4.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
    ramoops: use persistent_ram_free() instead of kfree() for freeing prz
    ramoops: use DT reserved-memory bindings

    Linus Torvalds
     
  • Pull block fixes from Jens Axboe:
    "Here's the second round of block updates for this merge window.

    It's a mix of fixes for changes that went in previously in this round,
    and fixes in general. This pull request contains:

    - Fixes for loop from Christoph

    - A bdi vs gendisk lifetime fix from Dan, worth two cookies.

    - A blk-mq timeout fix, when on frozen queues. From Gabriel.

    - Writeback fix from Jan, ensuring that __writeback_single_inode()
    does the right thing.

    - Fix for bio->bi_rw usage in f2fs from me.

    - Error path deadlock fix in blk-mq sysfs registration from me.

    - Floppy O_ACCMODE fix from Jiri.

    - Fix to the new bio op methods from Mike.

    One more followup will be coming here, ensuring that we don't
    propagate the block types outside of block. That, and a rename of
    bio->bi_rw is coming right after -rc1 is cut.

    - Various little fixes"

    * 'for-linus' of git://git.kernel.dk/linux-block:
    mm/block: convert rw_page users to bio op use
    loop: make do_req_filebacked more robust
    loop: don't try to use AIO for discards
    blk-mq: fix deadlock in blk_mq_register_disk() error path
    Include: blkdev: Removed duplicate 'struct request;' declaration.
    Fixup direct bi_rw modifiers
    block: fix bdi vs gendisk lifetime mismatch
    blk-mq: Allow timeouts to run while queue is freezing
    nbd: fix race in ioctl
    block: fix use-after-free in seq file
    f2fs: drop bio->bi_rw manual assignment
    block: add missing group association in bio-cloning functions
    blkcg: kill unused field nr_undestroyed_grps
    writeback: Write dirty times for WB_SYNC_ALL writeback
    floppy: fix open(O_ACCMODE) for ioctl-only open

    Linus Torvalds
     
  • persistent_ram_zone(=prz) structures are allocated by persistent_ram_new(),
    which includes vmap() or ioremap(). But they are currently freed by
    kfree(). Use persistent_ram_free() instead to correct this asymmetry.

    Signed-off-by: Hiraku Toyooka
    Signed-off-by: Nobuhiro Iwamatsu
    Cc: Mark Salyzyn
    Cc: Seiji Aguchi
    Signed-off-by: Kees Cook

    Hiraku Toyooka
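
The asymmetry can be shown with a user-space analogue: when a constructor acquires more than the struct itself, only the matching destructor releases everything. All names below are hypothetical, and `malloc()` stands in for the mapping that the commit message says is created with vmap() or ioremap().

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical zone that owns a second resource besides its own memory. */
struct ex_zone {
    void *mapping;
};

/* Counts mappings that have been created but not yet released. */
static int ex_live_mappings;

/* Constructor: allocates the struct and the mapping it owns. */
static struct ex_zone *ex_zone_new(size_t size)
{
    struct ex_zone *z;

    assert(size > 0);
    z = malloc(sizeof(*z));
    if (!z)
        return NULL;
    z->mapping = malloc(size);
    if (!z->mapping) {
        free(z);
        return NULL;
    }
    ex_live_mappings++;
    return z;
}

/* Matching destructor: releases the mapping first, then the struct. */
static void ex_zone_free(struct ex_zone *z)
{
    if (!z)
        return;
    free(z->mapping);
    ex_live_mappings--;
    free(z);
}
```

Calling plain `free(z)` instead of `ex_zone_free(z)` would leave the mapping behind, which is the kind of mismatch the commit removes.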
     
  • Instead of a ramoops-specific node, use a child node of /reserved-memory.
    This requires that of_platform_device_create() be explicitly called
    for the node, though, since "/reserved-memory" does not have its own
    "compatible" property.

    Suggested-by: Rob Herring
    Signed-off-by: Kees Cook
    Acked-by: Rob Herring

    Kees Cook
     

05 Aug, 2016

18 commits

  • Pull nfsd updates from Bruce Fields:
    "Highlights:

    - Trond made a change to the server's tcp logic that allows a fast
    client to better take advantage of high bandwidth networks, but may
    increase the risk that a single client could starve other clients;
    a new sunrpc.svc_rpc_per_connection_limit parameter should help
    mitigate this in the (hopefully unlikely) event this becomes a
    problem in practice.

    - Tom Haynes added a minimal flex-layout pnfs server, which is of no
    use in production for now--don't build it unless you're doing
    client testing or further server development"

    * tag 'nfsd-4.8' of git://linux-nfs.org/~bfields/linux: (32 commits)
    nfsd: remove some dead code in nfsd_create_locked()
    nfsd: drop unnecessary MAY_EXEC check from create
    nfsd: clean up bad-type check in nfsd_create_locked
    nfsd: remove unnecessary positive-dentry check
    nfsd: reorganize nfsd_create
    nfsd: check d_can_lookup in fh_verify of directories
    nfsd: remove redundant zero-length check from create
    nfsd: Make creates return EEXIST instead of EACCES
    SUNRPC: Detect immediate closure of accepted sockets
    SUNRPC: accept() may return sockets that are still in SYN_RECV
    nfsd: allow nfsd to advertise multiple layout types
    nfsd: Close race between nfsd4_release_lockowner and nfsd4_lock
    nfsd/blocklayout: Make sure calculate signature/designator length aligned
    xfs: abstract block export operations from nfsd layouts
    SUNRPC: Remove unused callback xpo_adjust_wspace()
    SUNRPC: Change TCP socket space reservation
    SUNRPC: Add a server side per-connection limit
    SUNRPC: Micro optimisation for svc_data_ready
    SUNRPC: Call the default socket callbacks instead of open coding
    SUNRPC: lock the socket while detaching it
    ...

    Linus Torvalds
     
  • Pull more btrfs updates from Chris Mason:
    "This is part two of my btrfs pull, which is some cleanups and a batch
    of fixes.

    Most of the code here is from Jeff Mahoney, making the pointers we
    pass around internally more consistent and less confusing overall. I
    noticed a small problem right before I sent this out yesterday, so I
    fixed it up and re-tested overnight"

    * 'for-linus-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (40 commits)
    Btrfs: fix __MAX_CSUM_ITEMS
    btrfs: btrfs_abort_transaction, drop root parameter
    btrfs: add btrfs_trans_handle->fs_info pointer
    btrfs: btrfs_relocate_chunk pass extent_root to btrfs_end_transaction
    btrfs: convert nodesize macros to static inlines
    btrfs: introduce BTRFS_MAX_ITEM_SIZE
    btrfs: cleanup, remove prototype for btrfs_find_root_ref
    btrfs: copy_to_sk drop unused root parameter
    btrfs: simpilify btrfs_subvol_inherit_props
    btrfs: tests, use BTRFS_FS_STATE_DUMMY_FS_INFO instead of dummy root
    btrfs: tests, require fs_info for root
    btrfs: tests, move initialization into tests/
    btrfs: btrfs_test_opt and friends should take a btrfs_fs_info
    btrfs: prefix fsid to all trace events
    btrfs: plumb fs_info into btrfs_work
    btrfs: remove obsolete part of comment in statfs
    btrfs: hide test-only member under ifdef
    btrfs: Ratelimit "no csum found" info message
    btrfs: Add ratelimit to btrfs printing
    Btrfs: fix unexpected balance crash due to BUG_ON
    ...

    Linus Torvalds
     
  • Pull UBI/UBIFS updates from Richard Weinberger:
    "This contains mostly cleanups and minor improvements of UBI and UBIFS"

    * tag 'upstream-4.8-rc1' of git://git.infradead.org/linux-ubifs:
    ubi: Use bitmaps in Fastmap self-check code
    ubi: Be more paranoid while seaching for the most recent Fastmap
    ubi: Check whether the Fastmap anchor matches the super block
    ubi: Rework Fastmap attach base code
    ubi: Fix whitespace issue in count_fastmap_pebs()
    ubi: Introduce vol_ignored()
    ubi: Fix scan_fast() comment
    ubifs: switch_gc_head: Remove redondant sync of wbuf
    ubi: Make volume resize power cut aware
    ubi: Fix early logging
    ubi: gluebi: Fix double refcounting
    ubifs: Silence early error messages if MS_SILENT is set
    ubi: Fix race condition between ubi device creation and udev
    ubifs: Update comment for ubifs_errc
    ubi: Only read necessary size when reading the VID header
    ubifs: Make xattr structures static
    ubifs: Silence error output if MS_SILENT is set

    Linus Torvalds
     
  • Pull UML updates from Richard Weinberger:
    "Beside of various fixes this also contains patches to enable features
    such as kcov, kmemleak and TRACE_IRQFLAGS_SUPPORT on UML"

    * 'for-linus-4.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml:
    hostfs: Freeing an ERR_PTR in hostfs_fill_sb_common()
    um: Support kcov
    um: Enable TRACE_IRQFLAGS_SUPPORT
    um: Use asm-generic/irqflags.h
    um: Fix possible deadlock in sig_handler_common()
    um: Select HAVE_DEBUG_KMEMLEAK
    um: Setup physical memory in setup_arch()
    um: Eliminate null test after alloc_bootmem

    Linus Torvalds
     
  • Pull m68knommu updates from Greg Ungerer:
    "This series is all about Nicolas flat format support for MMU systems.

    Traditional m68k no-MMU flat format binaries can now be run on m68k
    MMU enabled systems too. The series includes some nice cleanups of
    the binfmt_flat code and converts it to using proper user space
    accessor functions.

    With all this in place you can boot and run a complete no-MMU flat
    format based user space on an MMU enabled system"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu:
    m68k: enable binfmt_flat on systems with an MMU
    binfmt_flat: allow compressed flat binary format to work on MMU systems
    binfmt_flat: add MMU-specific support
    binfmt_flat: update libraries' data segment pointer with userspace accessors
    binfmt_flat: use clear_user() rather than memset() to clear .bss
    binfmt_flat: use proper user space accessors with old relocs code
    binfmt_flat: use proper user space accessors with relocs processing code
    binfmt_flat: clean up create_flat_tables() and stack accesses
    binfmt_flat: use generic transfer_args_to_stack()
    elf_fdpic_transfer_args_to_stack(): make it generic
    binfmt_flat: prevent kernel dammage from corrupted executable headers
    binfmt_flat: convert printk invocations to their modern form
    binfmt_flat: assorted cleanups
    m68k: use same start_thread() on MMU and no-MMU
    m68k: fix file path comment
    m68k: fix bFLT executable running on MMU enabled systems

    Linus Torvalds
     
  • We changed this around in f135af1041f ('nfsd: reorganize nfsd_create')
    so "dchild" can't be an error pointer any more. Also, dchild can't be
    NULL here (and dput would already handle this even if it was).

    Signed-off-by: Dan Carpenter
    Signed-off-by: J. Bruce Fields

    Dan Carpenter
     
  • We need an fh_verify to make sure we at least have a dentry, but actual
    permission checks happen later.

    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
     
  • Minor cleanup, no change in behavior.

    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
     
  • vfs_{create,mkdir,mknod} each begin with a call to may_create(), which
    returns EEXIST if the object already exists.

    This check is therefore unnecessary.

    (In the NFSv2 case, nfsd_proc_create also has such a check. Contrary to
    RFC 1094, our code seems to believe that a CREATE of an existing file
    should succeed. I'm leaving that behavior alone.)

    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
     
  • There's some odd logic in nfsd_create() that allows it to be called with
    the parent directory either locked or unlocked. The only already-locked
    caller is NFSv2's nfsd_proc_create(). It's less confusing to split out
    the unlocked case into a separate function which the NFSv2 code can call
    directly.

    Also fix some comments while we're here.

    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
     
  • Create and other nfsd ops generally assume we can call lookup_one_len on
    inodes with S_IFDIR set. Al says that this assumption isn't true in
    general, though it should be for the filesystem objects nfsd sees.

    Add a check just to make sure our assumption isn't violated.

    Remove a couple checks for i_op->lookup in create code.

    Cc: Al Viro
    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
     
  • lookup_one_len already has this check.

    The only effect of this patch is to return access instead of perm in the
    0-length-filename case. I actually prefer nfserr_perm (or _inval?), but
    I doubt anyone cares.

    The isdotent check seems redundant too, but I worry that some client
    might actually care about that strange nfserr_exist error.

    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
     
  • When doing a create (mkdir/mknod) on a name, it's worth
    checking the name exists first before returning EACCES in case
    the directory is not writeable by the user.
    This makes return values on the client more consistent
    regardless of whether the entry is cached in the local
    cache or not.
    Another positive side effect is certain programs only expect
    EEXIST in that case even despite POSIX allowing any valid
    error to be returned.

    Signed-off-by: Oleg Drokin
    Signed-off-by: J. Bruce Fields

    Oleg Drokin
     
  • The rw_page users were not converted to use bio/req ops. As a result
    bdev_write_page is not passing down REQ_OP_WRITE and the IOs will
    be sent down as reads.

    Signed-off-by: Mike Christie
    Fixes: 4e1b2d52a80d ("block, fs, drivers: remove REQ_OP compat defs and related code")

    Modified by me to:

    1) Drop op_flags passing into ->rw_page(), as we don't use it.
    2) Make op_is_write() and friends safe to use for !CONFIG_BLOCK

    Signed-off-by: Jens Axboe

    Mike Christie
     
  • Use bio_set_op_attrs() to set bi_rw instead of assigning it directly.

    Signed-off-by: Shaun Tancheff
    Cc: Chris Mason
    Cc: Josef Bacik
    Cc: David Sterba
    Cc: Mike Christie
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Shaun Tancheff
     
  • Merge 4fc29c1aa375 included this extra line, but it's not needed (or
    useful) since we'll bio_set_op_attrs() right after to properly set
    the op and flags for the bio.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • When a bio is cloned, the newly created bio must be associated with
    the same blkcg as the original bio (if BLK_CGROUP is enabled). If
    this operation is not performed, then the new bio is not associated
    with any group, and the group of the current task is returned when
    the group of the bio is requested.

    Depending on the cloning frequency, this may cause a large
    percentage of the bios belonging to a given group to be treated
    as if belonging to other groups (in most cases as if belonging to
    the root group). The expected group isolation may thereby be broken.

    This commit adds the missing association in bio-cloning functions.

    Fixes: da2f0f74cf7d ("Btrfs: add support for blkio controllers")
    Cc: stable@vger.kernel.org # v4.3+

    Signed-off-by: Paolo Valente
    Reviewed-by: Nikolay Borisov
    Reviewed-by: Jeff Moyer
    Acked-by: Tejun Heo
    Signed-off-by: Jens Axboe

    Paolo Valente
     
  • Currently we take care to handle I_DIRTY_TIME in vfs_fsync() and
    queue_io() so that inodes which have only dirty timestamps are properly
    written on fsync(2) and sync(2). However there are other call sites -
    most notably going through write_inode_now() - which expect inode to be
    clean after WB_SYNC_ALL writeback. This is not currently true as we do
    not clear I_DIRTY_TIME in __writeback_single_inode() even for
    WB_SYNC_ALL writeback in all the cases. This then resulted in the
    following oops because bdev_write_inode() did not clean the inode and
    writeback code later stumbled over a dirty inode with detached wb.

    general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN
    Modules linked in:
    CPU: 3 PID: 32 Comm: kworker/u10:1 Not tainted 4.6.0-rc3+ #349
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
    Workqueue: writeback wb_workfn (flush-11:0)
    task: ffff88006ccf1840 ti: ffff88006cda8000 task.ti: ffff88006cda8000
    RIP: 0010:[] []
    locked_inode_to_wb_and_lock_list+0xa2/0x750
    RSP: 0018:ffff88006cdaf7d0 EFLAGS: 00010246
    RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff88006ccf2050
    RDX: 0000000000000000 RSI: 000000114c8a8484 RDI: 0000000000000286
    RBP: ffff88006cdaf820 R08: ffff88006ccf1840 R09: 0000000000000000
    R10: 000229915090805f R11: 0000000000000001 R12: ffff88006a72f5e0
    R13: dffffc0000000000 R14: ffffed000d4e5eed R15: ffffffff8830cf40
    FS: 0000000000000000(0000) GS:ffff88006d500000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 0000000003301bf8 CR3: 000000006368f000 CR4: 00000000000006e0
    DR0: 0000000000001ec9 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000600
    Stack:
    ffff88006a72f680 ffff88006a72f768 ffff8800671230d8 03ff88006cdaf948
    ffff88006a72f668 ffff88006a72f5e0 ffff8800671230d8 ffff88006cdaf948
    ffff880065b90cc8 ffff880067123100 ffff88006cdaf970 ffffffff8188e12e
    Call Trace:
    [< inline >] inode_to_wb_and_lock_list fs/fs-writeback.c:309
    [] writeback_sb_inodes+0x4de/0x1250 fs/fs-writeback.c:1554
    [] __writeback_inodes_wb+0x104/0x1e0 fs/fs-writeback.c:1600
    [] wb_writeback+0x7ce/0xc90 fs/fs-writeback.c:1709
    [< inline >] wb_do_writeback fs/fs-writeback.c:1844
    [] wb_workfn+0x2f9/0x1000 fs/fs-writeback.c:1884
    [] process_one_work+0x78e/0x15c0 kernel/workqueue.c:2094
    [] worker_thread+0xdb/0xfc0 kernel/workqueue.c:2228
    [] kthread+0x23f/0x2d0 drivers/block/aoe/aoecmd.c:1303
    [] ret_from_fork+0x22/0x50 arch/x86/entry/entry_64.S:392
    Code: 05 94 4a a8 06 85 c0 0f 85 03 03 00 00 e8 07 15 d0 ff 41 80 3e
    00 0f 85 64 06 00 00 49 8b 9c 24 88 01 00 00 48 89 d8 48 c1 e8 03
    80 3c 28 00 0f 85 17 06 00 00 48 8b 03 48 83 c0 50 48 39 c3
    RIP [< inline >] wb_get include/linux/backing-dev-defs.h:212
    RIP [] locked_inode_to_wb_and_lock_list+0xa2/0x750
    fs/fs-writeback.c:281
    RSP
    ---[ end trace 986a4d314dcb2694 ]---

    Fix the problem by making sure __writeback_single_inode() also writes
    inodes that have only dirty timestamps in WB_SYNC_ALL mode.

    Reported-by: Dmitry Vyukov
    Tested-by: Laurent Dufour
    Signed-off-by: Jan Kara
    Signed-off-by: Jens Axboe

    Jan Kara
     

04 Aug, 2016

6 commits

  • The functionality for block device DAX was already removed with commit
    acc93d30d7d4 ("Revert "block: enable dax for raw block devices"")

    However, we still had a config option hanging around that was always
    disabled because it depended on CONFIG_BROKEN. This config option was
    introduced in commit 03cdadb04077 ("block: disable block device DAX by
    default")

    This change reverts that commit, removing the dead config option.

    Link: http://lkml.kernel.org/r/20160729182314.6368-1-ross.zwisler@linux.intel.com
    Signed-off-by: Ross Zwisler
    Cc: Dave Hansen
    Acked-by: Dan Williams
    Cc: Jens Axboe
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ross Zwisler
     
  • We can't pass error pointers to kfree() or it causes an oops.

    Fixes: 52b209f7b848 ('get rid of hostfs_read_inode()')
    Signed-off-by: Dan Carpenter
    Signed-off-by: Richard Weinberger

    Dan Carpenter
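
For readers unfamiliar with error pointers, here is a user-space sketch of the idea: a small negative errno is encoded in the pointer value itself, so such a value must never reach a deallocator. The `ex_` helpers imitate the general shape of the kernel's ERR_PTR()/IS_ERR() helpers but are written for this example, and the 4095 limit is an assumption of the sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define EX_MAX_ERRNO 4095

/* Encode a negative errno value as a pointer. */
static void *ex_err_ptr(long err)
{
    assert(err < 0 && err >= -EX_MAX_ERRNO);
    return (void *)(intptr_t)err;
}

/* True if the pointer value lies in the range reserved for errors. */
static int ex_is_err(const void *p)
{
    return (uintptr_t)p >= (uintptr_t)-EX_MAX_ERRNO;
}

/* Recover the errno value from an error pointer. */
static long ex_ptr_err(const void *p)
{
    return (long)(intptr_t)p;
}

/* Free only real allocations; an error pointer is not heap memory. */
static void ex_free_unless_err(void *p)
{
    if (!ex_is_err(p))
        free(p);
}
```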
     
  • Jeff Mahoney's cleanup commit (14a1e067b4) wasn't correct for csums on
    machines where the pagesize >= metadata blocksize.

    This just reverts the relevant hunks to bring the old math back.

    Signed-off-by: Chris Mason

    Chris Mason
     
  • There's a race between cachefiles_mark_object_inactive() and
    cachefiles_cull():

    (1) cachefiles_cull() can't delete a backing file until the cache object
    is marked inactive, but as soon as that's the case it's fair game.

    (2) cachefiles_mark_object_inactive() marks the object as being inactive
    and *only then* reads the i_blocks on the backing inode - but
    cachefiles_cull() might've managed to delete it by this point.

    Fix this by making sure cachefiles_mark_object_inactive() gets any data it
    needs from the backing inode before deactivating the object.

    Without this, the following oops may occur:

    BUG: unable to handle kernel NULL pointer dereference at 0000000000000098
    IP: [] cachefiles_mark_object_inactive+0x61/0xb0 [cachefiles]
    ...
    CPU: 11 PID: 527 Comm: kworker/u64:4 Tainted: G I ------------ 3.10.0-470.el7.x86_64 #1
    Hardware name: Hewlett-Packard HP Z600 Workstation/0B54h, BIOS 786G4 v03.19 03/11/2011
    Workqueue: fscache_object fscache_object_work_func [fscache]
    task: ffff880035edaf10 ti: ffff8800b77c0000 task.ti: ffff8800b77c0000
    RIP: 0010:[] cachefiles_mark_object_inactive+0x61/0xb0 [cachefiles]
    RSP: 0018:ffff8800b77c3d70 EFLAGS: 00010246
    RAX: 0000000000000000 RBX: ffff8800bf6cc400 RCX: 0000000000000034
    RDX: 0000000000000000 RSI: ffff880090ffc710 RDI: ffff8800bf761ef8
    RBP: ffff8800b77c3d88 R08: 2000000000000000 R09: 0090ffc710000000
    R10: ff51005d2ff1c400 R11: 0000000000000000 R12: ffff880090ffc600
    R13: ffff8800bf6cc520 R14: ffff8800bf6cc400 R15: ffff8800bf6cc498
    FS: 0000000000000000(0000) GS:ffff8800bb8c0000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
    CR2: 0000000000000098 CR3: 00000000019ba000 CR4: 00000000000007e0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
    Stack:
    ffff880090ffc600 ffff8800bf6cc400 ffff8800867df140 ffff8800b77c3db0
    ffffffffa06c48cb ffff880090ffc600 ffff880090ffc180 ffff880090ffc658
    ffff8800b77c3df0 ffffffffa085d846 ffff8800a96b8150 ffff880090ffc600
    Call Trace:
    [] cachefiles_drop_object+0x6b/0xf0 [cachefiles]
    [] fscache_drop_object+0xd6/0x1e0 [fscache]
    [] fscache_object_work_func+0xa5/0x200 [fscache]
    [] process_one_work+0x17b/0x470
    [] worker_thread+0x126/0x410
    [] ? rescuer_thread+0x460/0x460
    [] kthread+0xcf/0xe0
    [] ? kthread_create_on_node+0x140/0x140
    [] ret_from_fork+0x58/0x90
    [] ? kthread_create_on_node+0x140/0x140

    The oopsing code shows:

    callq 0xffffffff810af6a0
    mov 0xf8(%r12),%rax
    mov 0x30(%rax),%rax
    mov 0x98(%rax),%rax dentry)->i_blocks

    Fixes: a5b3a80b899bda0f456f1246c4c5a1191ea01519 (CacheFiles: Provide read-and-reset release counters for cachefilesd)
    Reported-by: Jianhong Yin
    Signed-off-by: David Howells
    Reviewed-by: Jeff Layton
    Reviewed-by: Steve Dickson
    cc: stable@vger.kernel.org
    Signed-off-by: Al Viro

    David Howells
     
  • Al Viro
     
  • With gcc < 4.2 (e.g. 4.1.2):

    CC fs/proc/task_mmu.o
    cc1: error: unrecognized command line option "-Wno-override-init"

    To fix this, only enable the compiler option when it is actually
    supported by the compiler.

    Fixes: ca52953f5f24 ("fs/proc/task_mmu.c: suppress compilation warnings with W=1")
    Signed-off-by: Geert Uytterhoeven
    Acked-by: Valdis Kletnieks
    Cc: Andrew Morton
    Signed-off-by: Linus Torvalds

    Geert Uytterhoeven
     

03 Aug, 2016

5 commits