14 Nov, 2018

1 commit

  • commit ccd3c4373eacb044eb3832966299d13d2631f66f upstream.

    The code cleaning transaction's lists of checkpoint buffers has a bug
    where it increases bh refcount only after releasing
    journal->j_list_lock. Thus the following race is possible:

    CPU0                                    CPU1
    jbd2_log_do_checkpoint()
                                            jbd2_journal_try_to_free_buffers()
                                              __journal_try_to_free_buffer(bh)
      ...
      while (transaction->t_checkpoint_io_list)
      ...
        if (buffer_locked(bh)) {

    <-- IO completes now, buffer gets unlocked -->

          spin_unlock(&journal->j_list_lock);
                                                spin_lock(&journal->j_list_lock);
                                                __jbd2_journal_remove_checkpoint(jh);
                                                spin_unlock(&journal->j_list_lock);
                                              try_to_free_buffers(page);
          get_bh(bh) <-- accesses freed bh

    Fix the problem by grabbing bh reference before unlocking
    journal->j_list_lock.
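
    The ordering issue can be modelled in a few lines of user-space C. This is
    only a single-threaded sketch under stated assumptions: struct buf,
    struct list_lock, try_free() and pin_then_unlock() are hypothetical
    stand-ins for the buffer head, j_list_lock, try_to_free_buffers() and the
    checkpoint path, not kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for struct buffer_head and j_list_lock. */
struct buf {
    int refcount;   /* models the buffer's reference count */
    bool freed;     /* set once the buffer has been released */
};

struct list_lock {
    bool held;
};

static void lock(struct list_lock *l)   { assert(!l->held); l->held = true; }
static void unlock(struct list_lock *l) { assert(l->held);  l->held = false; }

/* Models the freeing side: it only succeeds when nobody holds a reference. */
static bool try_free(struct buf *b)
{
    if (b->refcount > 0)
        return false;
    b->freed = true;
    return true;
}

/* Safe ordering: pin the buffer while the list lock is still held. */
static void pin_then_unlock(struct list_lock *l, struct buf *b)
{
    lock(l);
    b->refcount++;   /* get_bh()-style pin, taken under the lock */
    unlock(l);
    /* A concurrent try_free() from here on sees refcount > 0 and backs off. */
}
```

    With the pin taken before the unlock, the window between dropping the lock
    and touching the buffer no longer allows the buffer to be freed.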

    Fixes: dc6e8d669cf5 ("jbd2: don't call get_bh() before calling __jbd2_journal_remove_checkpoint()")
    Fixes: be1158cc615f ("jbd2: fold __process_buffer() into jbd2_log_do_checkpoint()")
    Reported-by: syzbot+7f4a27091759e2fe7453@syzkaller.appspotmail.com
    CC: stable@vger.kernel.org
    Reviewed-by: Lukas Czerner
    Signed-off-by: Jan Kara
    Signed-off-by: Theodore Ts'o
    Signed-off-by: Greg Kroah-Hartman

    Jan Kara
     

11 Jul, 2018

1 commit

  • commit e09463f220ca9a1a1ecfda84fcda658f99a1f12a upstream.

    Do not set the b_modified flag in the block's journal head until
    after we're sure that jbd2_journal_dirty_metadata() will not
    abort with an error due to there not being enough space reserved in
    the jbd2 handle.

    Otherwise, future attempts to modify the buffer may lead to a large
    number of spurious errors and warnings.
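
    A minimal sketch of the "check first, then mark" ordering, assuming
    simplified types: struct handle, struct jhead and dirty_metadata() are
    hypothetical models of the jbd2 handle, the journal head and
    jbd2_journal_dirty_metadata(), not the kernel implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

struct handle { int credits; };     /* models the credits left in a handle */
struct jhead  { bool modified; };   /* models jh->b_modified */

/* Returns 0 on success, -ENOSPC if the handle has no credits left. */
static int dirty_metadata(struct handle *h, struct jhead *jh)
{
    assert(h && jh);
    if (!jh->modified) {
        if (h->credits <= 0)
            return -ENOSPC;   /* fail before any state is changed */
        jh->modified = true;  /* only set once the call cannot fail */
        h->credits--;
    }
    return 0;
}
```

    Because the flag is untouched on the error path, a later call with a
    handle that does have credits is still charged for the buffer.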

    This addresses CVE-2018-10883.

    https://bugzilla.kernel.org/show_bug.cgi?id=200071

    Signed-off-by: Theodore Ts'o
    Cc: stable@kernel.org
    Signed-off-by: Greg Kroah-Hartman

    Theodore Ts'o
     

02 May, 2018

1 commit

  • commit b2569260d55228b617bd82aba6d0db2faeeb4116 upstream.

    If ext4 tries to start a reserved handle via
    jbd2_journal_start_reserved(), and the journal has been aborted, this
    can result in a NULL pointer dereference. This is because the fields
    h_journal and h_transaction in the handle structure share the same
    memory, via a union, so jbd2_journal_start_reserved() will clear
    h_journal before calling start_this_handle(). If this function fails
    due to an aborted handle, h_journal will still be NULL, and the call
    to jbd2_journal_free_reserved() will pass a NULL journal to
    sub_reserve_credits().
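
    The aliasing described above can be illustrated with a small user-space
    model. It is a sketch only: the struct layouts, start_handle_fails() and
    free_reserved() are hypothetical, and restoring h_journal on the error
    path is shown as one plausible way to keep the cleanup safe.

```c
#include <assert.h>
#include <stddef.h>

struct journal     { int reserved_credits; };
struct transaction { int id; };

struct handle {
    union {                              /* both pointers share storage */
        struct transaction *h_transaction;
        struct journal     *h_journal;
    };
    int h_buffer_credits;
};

/* Always fails, standing in for starting a handle on an aborted journal. */
static int start_handle_fails(struct handle *h)
{
    (void)h;
    return -1;
}

/* Models the cleanup that needs a valid journal pointer. */
static void free_reserved(struct handle *h)
{
    assert(h->h_journal != NULL);
    h->h_journal->reserved_credits -= h->h_buffer_credits;
}

static int start_reserved(struct handle *h)
{
    struct journal *journal = h->h_journal;   /* save before clearing */
    int ret;

    h->h_journal = NULL;        /* also clears h_transaction (same storage) */
    ret = start_handle_fails(h);
    if (ret < 0) {
        h->h_journal = journal; /* restore so the error path sees the journal */
        free_reserved(h);
    }
    return ret;
}
```

    Without the restore, free_reserved() would be reached with a NULL
    h_journal, which is the dereference the commit message describes.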

    This can be reproduced by running "kvm-xfstests -c dioread_nolock
    generic/475".

    Cc: stable@kernel.org # 3.11
    Fixes: 8f7d89f36829b ("jbd2: transaction reservation support")
    Signed-off-by: Theodore Ts'o
    Reviewed-by: Andreas Dilger
    Reviewed-by: Jan Kara
    Signed-off-by: Greg Kroah-Hartman

    Theodore Ts'o
     

24 Apr, 2018

2 commits

  • commit fb7c02445c497943e7296cd3deee04422b63acb8 upstream.

    Previously the jbd2 layer assumed that a file system check would be
    required after a journal abort. In the case of the deliberate file
    system shutdown, this should not be necessary. Allow the jbd2 layer
    to distinguish between these two cases by using the ESHUTDOWN errno.
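
    As a rough illustration of the distinction, assuming a hypothetical
    struct journal_state and journal_abort() helper (neither is a jbd2
    interface), a deliberate shutdown can be told apart from a real failure by
    the error code alone:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

struct journal_state {
    int  abort_errno;   /* 0 while the journal is healthy */
    bool needs_fsck;    /* set only for aborts caused by a real error */
};

/* Record an abort; err is a negative errno value. */
static void journal_abort(struct journal_state *j, int err)
{
    assert(err < 0);
    if (j->abort_errno == 0)
        j->abort_errno = err;   /* remember the first cause */
    if (err != -ESHUTDOWN)
        j->needs_fsck = true;   /* a deliberate shutdown needs no check */
}
```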

    Also add proper locking to __journal_abort_soft().

    Signed-off-by: Theodore Ts'o
    Cc: stable@vger.kernel.org
    Signed-off-by: Greg Kroah-Hartman

    Theodore Ts'o
     
  • commit 85e0c4e89c1b864e763c4e3bb15d0b6d501ad5d9 upstream.

    This updates the jbd2 superblock unnecessarily, and on an abort we
    shouldn't truncate the log.

    Signed-off-by: Theodore Ts'o
    Cc: stable@vger.kernel.org
    Signed-off-by: Greg Kroah-Hartman

    Theodore Ts'o
     

22 Feb, 2018

1 commit

  • commit f69120ce6c024aa634a8fc25787205e42f0ccbe6 upstream.

    Sphinx emits various (26) warnings when building make target 'htmldocs'.
    Currently struct definitions contain duplicate documentation, some as
    kernel-docs and some as standard c89 comments. We can reduce
    duplication while cleaning up the kernel docs.

    Move all kernel-docs to right above each struct member. Use the set of
    all existing comments (kernel-doc and c89). Add documentation for
    missing struct members and function arguments.

    Signed-off-by: Tobin C. Harding
    Signed-off-by: Theodore Ts'o
    Cc: stable@vger.kernel.org
    Signed-off-by: Greg Kroah-Hartman

    Tobin C. Harding
     

08 Jul, 2017

1 commit

  • Pull Writeback error handling updates from Jeff Layton:
    "This pile represents the bulk of the writeback error handling fixes
    that I have for this cycle. Some of the earlier patches in this pile
    may look trivial but they are prerequisites for later patches in the
    series.

    The aim of this set is to improve how we track and report writeback
    errors to userland. Most applications that care about data integrity
    will periodically call fsync/fdatasync/msync to ensure that their
    writes have made it to the backing store.

    For a very long time, we have tracked writeback errors using two flags
    in the address_space: AS_EIO and AS_ENOSPC. Those flags are set when a
    writeback error occurs (via mapping_set_error) and are cleared as a
    side-effect of filemap_check_errors (as you noted yesterday). This
    model really sucks for userland.

    Only the first task to call fsync (or msync or fdatasync) will see the
    error. Any subsequent task calling fsync on a file will get back 0
    (unless another writeback error occurs in the interim). If I have
    several tasks writing to a file and calling fsync to ensure that their
    writes got stored, then I need to have them coordinate with one
    another. That's difficult enough, but in a world of containerized
    setups that coordination may even not be possible.

    But wait...it gets worse!

    The calls to filemap_check_errors can be buried pretty far down in the
    call stack, and there are internal callers of filemap_write_and_wait
    and the like that also end up clearing those errors. Many of those
    callers ignore the error return from that function or return it to
    userland at nonsensical times (e.g. truncate() or stat()). If I get
    back -EIO on a truncate, there is no reason to think that it was
    because some previous writeback failed, and a subsequent fsync() will
    (incorrectly) return 0.

    This pile aims to do three things:

    1) ensure that when a writeback error occurs that that error will be
    reported to userland on a subsequent fsync/fdatasync/msync call,
    regardless of what internal callers are doing

    2) report writeback errors on all file descriptions that were open at
    the time that the error occurred. This is a user-visible change,
    but I think most applications are written to assume this behavior
    anyway. Those that aren't are unlikely to be hurt by it.

    3) document what filesystems should do when there is a writeback
    error. Today, there is very little consistency between them, and a
    lot of cargo-cult copying. We need to make it very clear what
    filesystems should do in this situation.

    To achieve this, the set adds a new data type (errseq_t) and then
    builds new writeback error tracking infrastructure around that. Once
    all of that is in place, we change the filesystems to use the new
    infrastructure for reporting wb errors to userland.

    Note that this is just the initial foray into cleaning up this mess.
    There is a lot of work remaining here:

    1) convert the rest of the filesystems in a similar fashion. Once the
    initial set is in, then I think most other fs' will be fairly
    simple to convert. Hopefully most of those can go in via individual
    filesystem trees.

    2) convert internal waiters on writeback to use errseq_t for
    detecting errors instead of relying on the AS_* flags. I have some
    draft patches for this for ext4, but they are not quite ready for
    prime time yet.

    This was a discussion topic this year at LSF/MM too. If you're
    interested in the gory details, LWN has some good articles about this:

    https://lwn.net/Articles/718734/
    https://lwn.net/Articles/724307/"

    * tag 'for-linus-v4.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux:
    btrfs: minimal conversion to errseq_t writeback error reporting on fsync
    xfs: minimal conversion to errseq_t writeback error reporting
    ext4: use errseq_t based error handling for reporting data writeback errors
    fs: convert __generic_file_fsync to use errseq_t based reporting
    block: convert to errseq_t based writeback error tracking
    dax: set errors in mapping when writeback fails
    Documentation: flesh out the section in vfs.txt on storing and reporting writeback errors
    mm: set both AS_EIO/AS_ENOSPC and errseq_t in mapping_set_error
    fs: new infrastructure for writeback error handling and reporting
    lib: add errseq_t type and infrastructure for handling it
    mm: don't TestClearPageError in __filemap_fdatawait_range
    mm: clear AS_EIO/AS_ENOSPC when writeback initiation fails
    jbd2: don't clear and reset errors after waiting on writeback
    buffer: set errors in mapping at the time that the error occurs
    fs: check for writeback errors after syncing out buffers in generic_file_fsync
    buffer: use mapping_set_error instead of setting the flag
    mm: fix mapping_set_error call in me_pagecache_dirty
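
    The per-file-description reporting described in the pull message can be
    sketched with a sequence counter. This is a deliberately simplified model
    and not the kernel's errseq_t: struct wb_err, struct open_file and the
    three helpers are hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <errno.h>

/* One per mapping: a counter that advances on every recorded error. */
struct wb_err {
    unsigned int seq;   /* bumped each time an error is recorded */
    int err;            /* most recent error (negative errno) */
};

/* One per open file description: the last sequence value it has seen. */
struct open_file {
    unsigned int seen;
};

static void wb_err_set(struct wb_err *e, int err)
{
    e->err = err;
    e->seq++;
}

static void file_open(const struct wb_err *e, struct open_file *f)
{
    f->seen = e->seq;   /* errors from before the open are not reported */
}

/* fsync-style check: report a new error once per file description. */
static int file_check_and_advance(const struct wb_err *e, struct open_file *f)
{
    if (f->seen == e->seq)
        return 0;
    f->seen = e->seq;
    return e->err;
}
```

    Unlike a flag that is cleared by the first reader, the shared state is
    never reset here, so every description open at the time of the error
    observes it exactly once.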

    Linus Torvalds
     

06 Jul, 2017

1 commit

  • Resetting this flag is almost certainly racy, and will be problematic
    with some coming changes.

    Make filemap_fdatawait_keep_errors return int, but not clear the flag(s).
    Have jbd2 call it instead of filemap_fdatawait and don't attempt to
    re-set the error flag if it fails.

    Reviewed-by: Jan Kara
    Reviewed-by: Carlos Maiolino
    Signed-off-by: Jeff Layton

    Jeff Layton
     

04 Jul, 2017

1 commit

  • Pull documentation updates from Jonathan Corbet:
    "There has been a fair amount of activity in the docs tree this time
    around. Highlights include:

    - Conversion of a bunch of security documentation into RST

    - The conversion of the remaining DocBook templates by The Amazing
    Mauro Machine. We can now drop the entire DocBook build chain.

    - The usual collection of fixes and minor updates"

    * tag 'docs-4.13' of git://git.lwn.net/linux: (90 commits)
    scripts/kernel-doc: handle DECLARE_HASHTABLE
    Documentation: atomic_ops.txt is core-api/atomic_ops.rst
    Docs: clean up some DocBook loose ends
    Make the main documentation title less Geocities
    Docs: Use kernel-figure in vidioc-g-selection.rst
    Docs: fix table problems in ras.rst
    Docs: Fix breakage with Sphinx 1.5 and upper
    Docs: Include the Latex "ifthen" package
    doc/kokr/howto: Only send regression fixes after -rc1
    docs-rst: fix broken links to dynamic-debug-howto in kernel-parameters
    doc: Document suitability of IBM Verse for kernel development
    Doc: fix a markup error in coding-style.rst
    docs: driver-api: i2c: remove some outdated information
    Documentation: DMA API: fix a typo in a function name
    Docs: Insert missing space to separate link from text
    doc/ko_KR/memory-barriers: Update control-dependencies example
    Documentation, kbuild: fix typo "minimun" -> "minimum"
    docs: Fix some formatting issues in request-key.rst
    doc: ReSTify keys-trusted-encrypted.txt
    doc: ReSTify keys-request-key.txt
    ...

    Linus Torvalds
     

20 Jun, 2017

1 commit


22 May, 2017

1 commit

  • When a transaction starts, start_this_handle() saves current
    PF_MEMALLOC_NOFS value so that it can be restored at journal stop time.
    Journal restart is a special case that calls start_this_handle() without
    stopping the transaction. start_this_handle() isn't aware that the
    original value is already stored so it overwrites it with current value.

    For instance, a call sequence like below leaves PF_MEMALLOC_NOFS flag set
    at the end:

    jbd2_journal_start()
    jbd2__journal_restart()
    jbd2_journal_stop()

    Make jbd2__journal_restart() restore the original value before calling
    start_this_handle().
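
    The save/restore problem can be reproduced in a user-space model. The
    sketch below assumes a global task_nofs flag standing in for
    PF_MEMALLOC_NOFS and simplified handle functions; all names are
    hypothetical models of the jbd2 calls listed above.

```c
#include <assert.h>
#include <stdbool.h>

static bool task_nofs;   /* models PF_MEMALLOC_NOFS on the current task */

struct handle { bool saved_nofs; };

/* Set the flag and return its previous value. */
static bool nofs_save(void)
{
    bool old = task_nofs;
    task_nofs = true;
    return old;
}

static void nofs_restore(bool old) { task_nofs = old; }

static void start_this_handle(struct handle *h)
{
    h->saved_nofs = nofs_save();   /* overwrites any earlier saved value */
}

static void journal_start(struct handle *h) { start_this_handle(h); }
static void journal_stop(struct handle *h)  { nofs_restore(h->saved_nofs); }

static void journal_restart(struct handle *h)
{
    nofs_restore(h->saved_nofs);   /* put the original value back first */
    start_this_handle(h);          /* so this saves the original again */
}
```

    If journal_restart() skipped the restore, start_this_handle() would save
    "true" and journal_stop() would leave the flag set.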

    Fixes: 81378da64de6 ("jbd2: mark the transaction context with the scope GFP_NOFS context")
    Signed-off-by: Tahsin Erdogan
    Signed-off-by: Theodore Ts'o
    Reviewed-by: Jan Kara

    Tahsin Erdogan
     

19 May, 2017

1 commit

  • Mauro says:

    This patch series converts the remaining DocBooks to ReST.

    The first version was originally sent as 3 patch series:

    [PATCH 00/36] Convert DocBook documents to ReST
    [PATCH 0/5] Convert more books to ReST
    [PATCH 00/13] Get rid of DocBook

    The lsm book was added as if it were a text file under
    Documentation. The plan is to merge it with another file
    under Documentation/security, after both this series and
    a security Documentation patch series get merged.

    It also adjusts some Sphinx-pedantic errors/warnings on
    some kernel-doc markups.

    I also added some patches here to add PDF output for all
    existing ReST books.

    Jonathan Corbet
     

16 May, 2017

2 commits


11 May, 2017

1 commit

  • Pull RCU updates from Ingo Molnar:
    "The main changes are:

    - Debloat RCU headers

    - Parallelize SRCU callback handling (plus overlapping patches)

    - Improve the performance of Tree SRCU on a CPU-hotplug stress test

    - Documentation updates

    - Miscellaneous fixes"

    * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (74 commits)
    rcu: Open-code the rcu_cblist_n_lazy_cbs() function
    rcu: Open-code the rcu_cblist_n_cbs() function
    rcu: Open-code the rcu_cblist_empty() function
    rcu: Separately compile large rcu_segcblist functions
    srcu: Debloat the header
    srcu: Adjust default auto-expediting holdoff
    srcu: Specify auto-expedite holdoff time
    srcu: Expedite first synchronize_srcu() when idle
    srcu: Expedited grace periods with reduced memory contention
    srcu: Make rcutorture writer stalls print SRCU GP state
    srcu: Exact tracking of srcu_data structures containing callbacks
    srcu: Make SRCU be built by default
    srcu: Fix Kconfig botch when SRCU not selected
    rcu: Make non-preemptive schedule be Tasks RCU quiescent state
    srcu: Expedite srcu_schedule_cbs_snp() callback invocation
    srcu: Parallelize callback handling
    kvm: Move srcu_struct fields to end of struct kvm
    rcu: Fix typo in PER_RCU_NODE_PERIOD header comment
    rcu: Use true/false in assignment to bool
    rcu: Use bool value directly
    ...

    Linus Torvalds
     

09 May, 2017

1 commit

  • Pull ext4 updates from Ted Ts'o:

    - add GETFSMAP support

    - some performance improvements for very large file systems and for
    random write workloads into a preallocated file

    - bug fixes and cleanups.

    * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
    jbd2: cleanup write flags handling from jbd2_write_superblock()
    ext4: mark superblock writes synchronous for nobarrier mounts
    ext4: inherit encryption xattr before other xattrs
    ext4: replace BUG_ON with WARN_ONCE in ext4_end_bio()
    ext4: avoid unnecessary transaction stalls during writeback
    ext4: preload block group descriptors
    ext4: make ext4_shutdown() static
    ext4: support GETFSMAP ioctls
    vfs: add common GETFSMAP ioctl definitions
    ext4: evict inline data when writing to memory map
    ext4: remove ext4_xattr_check_entry()
    ext4: rename ext4_xattr_check_names() to ext4_xattr_check_entries()
    ext4: merge ext4_xattr_list() into ext4_listxattr()
    ext4: constify static data that is never modified
    ext4: trim return value and 'dir' argument from ext4_insert_dentry()
    jbd2: fix dbench4 performance regression for 'nobarrier' mounts
    jbd2: Fix lockdep splat with generic/270 test
    mm: retry writepages() on ENOMEM when doing an data integrity writeback

    Linus Torvalds
     

04 May, 2017

3 commits

  • Currently jbd2_write_superblock() silently adds REQ_SYNC to flags with
    which journal superblock is written. Make this explicit by making flags
    passed down to jbd2_write_superblock() contain REQ_SYNC.

    CC: linux-ext4@vger.kernel.org
    Signed-off-by: Jan Kara
    Signed-off-by: Theodore Ts'o

    Jan Kara
     
  • kjournald2 is central to the transaction commit processing. As such any
    potential allocation from this kernel thread has to be GFP_NOFS. Make
    sure to mark the whole kernel thread GFP_NOFS by the memalloc_nofs_save.

    [akpm@linux-foundation.org: coding-style fixes]
    Link: http://lkml.kernel.org/r/20170306131408.9828-8-mhocko@kernel.org
    Signed-off-by: Michal Hocko
    Suggested-by: Jan Kara
    Reviewed-by: Jan Kara
    Cc: Dave Chinner
    Cc: Theodore Ts'o
    Cc: Chris Mason
    Cc: David Sterba
    Cc: Brian Foster
    Cc: Darrick J. Wong
    Cc: Nikolay Borisov
    Cc: Peter Zijlstra
    Cc: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
     
  • Now that we have memalloc_nofs_{save,restore} api we can mark the whole
    transaction context as implicitly GFP_NOFS. All allocations will
    automatically inherit GFP_NOFS this way. This means that we do not have
    to mark any of those requests with GFP_NOFS and moreover all the
    ext4_kv[mz]alloc(GFP_NOFS) are also safe now because even the hardcoded
    GFP_KERNEL allocations deep inside the vmalloc will be NOFS now.

    [akpm@linux-foundation.org: tweak comments]
    Link: http://lkml.kernel.org/r/20170306131408.9828-7-mhocko@kernel.org
    Signed-off-by: Michal Hocko
    Reviewed-by: Jan Kara
    Cc: Dave Chinner
    Cc: Theodore Ts'o
    Cc: Chris Mason
    Cc: David Sterba
    Cc: Brian Foster
    Cc: Darrick J. Wong
    Cc: Nikolay Borisov
    Cc: Peter Zijlstra
    Cc: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
     

30 Apr, 2017

2 commits

  • Commit b685d3d65ac7 "block: treat REQ_FUA and REQ_PREFLUSH as
    synchronous" removed REQ_SYNC flag from WRITE_FUA implementation. Since
    JBD2 strips REQ_FUA and REQ_FLUSH flags from submitted IO when the
    filesystem is mounted with nobarrier mount option, journal superblock
    writes ended up being async writes after this patch and that caused
    heavy performance regression for dbench4 benchmark with high number of
    processes. In my test setup with HP RAID array with non-volatile write
    cache and 32 GB ram, dbench4 runs with 8 processes regressed by ~25%.

    Fix the problem by making sure journal superblock writes are always
    treated as synchronous since they generally block progress of the
    journalling machinery and thus the whole filesystem.
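
    A small sketch of the flag handling, assuming made-up bit values (the
    kernel defines its own) and a hypothetical sb_write_flags() helper:

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative bit values only; they do not match the kernel's. */
#define REQ_SYNC     (1u << 0)
#define REQ_FUA      (1u << 1)
#define REQ_PREFLUSH (1u << 2)

/* Compute the flags for a journal superblock write. */
static unsigned int sb_write_flags(unsigned int flags, bool nobarrier)
{
    if (nobarrier)
        flags &= ~(REQ_FUA | REQ_PREFLUSH);  /* barriers disabled by mount option */
    return flags | REQ_SYNC;                 /* always treated as synchronous */
}
```

    Setting REQ_SYNC unconditionally means the write stays synchronous even
    after the barrier flags have been stripped.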

    Fixes: b685d3d65ac791406e0dfd8779cc9b3707fea5a3
    CC: stable@vger.kernel.org
    Signed-off-by: Jan Kara
    Signed-off-by: Theodore Ts'o

    Jan Kara
     
  • I've hit a lockdep splat with generic/270 test complaining that:

    3216.fsstress.b/3533 is trying to acquire lock:
    (jbd2_handle){++++..}, at: [] jbd2_log_wait_commit+0x0/0x150

    but task is already holding lock:
    (jbd2_handle){++++..}, at: [] start_this_handle+0x35b/0x850

    The underlying problem is that jbd2_journal_force_commit_nested()
    (called from ext4_should_retry_alloc()) may get called while a
    transaction handle is started. In such case it takes care to not wait
    for commit of the running transaction (which would deadlock) but only
    for a commit of a transaction that is already committing (which is safe
    as that doesn't wait for any filesystem locks).

    In fact there are also other callers of jbd2_log_wait_commit() that take
    care to pass tid of a transaction that is already committing and for
    those cases, the lockdep instrumentation is too restrictive and leading
    to false positive reports. Fix the problem by calling
    jbd2_might_wait_for_commit() from jbd2_log_wait_commit() only if the
    transaction isn't already committing.

    Fixes: 1eaa566d368b214d99cbb973647c1b0b8102a9ae
    Signed-off-by: Jan Kara
    Signed-off-by: Theodore Ts'o

    Jan Kara
     

23 Apr, 2017

1 commit


19 Apr, 2017

1 commit

  • A group of Linux kernel hackers reported chasing a bug that resulted
    from their assumption that SLAB_DESTROY_BY_RCU provided an existence
    guarantee, that is, that no block from such a slab would be reallocated
    during an RCU read-side critical section. Of course, that is not the
    case. Instead, SLAB_DESTROY_BY_RCU only prevents freeing of an entire
    slab of blocks.

    However, there is a phrase for this, namely "type safety". This commit
    therefore renames SLAB_DESTROY_BY_RCU to SLAB_TYPESAFE_BY_RCU in order
    to avoid future instances of this sort of confusion.

    Signed-off-by: Paul E. McKenney
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Cc: Andrew Morton
    Acked-by: Johannes Weiner
    Acked-by: Vlastimil Babka
    [ paulmck: Add comments mentioning the old name, as requested by Eric
    Dumazet, in order to help people familiar with the old name find
    the new one. ]
    Acked-by: David Rientjes

    Paul E. McKenney
     

16 Mar, 2017

1 commit

  • In journal_init_common(), if we failed to allocate the j_wbuf array, or
    if we failed to create the buffer_head for the journal superblock, we
    leaked the memory allocated for the revocation tables. Fix this.
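
    The usual way to avoid this kind of leak is a chain of cleanup labels.
    The following user-space sketch assumes hypothetical demo_* helpers and a
    live_allocs counter added purely so the behaviour can be checked; it is
    not the kernel's journal_init_common().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

static int live_allocs;   /* outstanding allocations, for the demo only */

/* Allocate size bytes, or pretend to fail when fail is true. */
static void *demo_alloc(size_t size, bool fail)
{
    void *p = fail ? NULL : malloc(size);
    if (p)
        live_allocs++;
    return p;
}

static void demo_free(void *p)
{
    if (p) {
        free(p);
        live_allocs--;
    }
}

struct demo_journal { void *revoke_table; void *wbuf; };

static struct demo_journal *demo_journal_init(bool fail_wbuf)
{
    struct demo_journal *j = demo_alloc(sizeof(*j), false);
    if (!j)
        return NULL;
    j->revoke_table = demo_alloc(64, false);
    if (!j->revoke_table)
        goto err_free_journal;
    j->wbuf = demo_alloc(64, fail_wbuf);
    if (!j->wbuf)
        goto err_free_revoke;   /* later failures undo earlier steps */
    return j;

err_free_revoke:
    demo_free(j->revoke_table);
err_free_journal:
    demo_free(j);
    return NULL;
}

static void demo_journal_destroy(struct demo_journal *j)
{
    demo_free(j->wbuf);
    demo_free(j->revoke_table);
    demo_free(j);
}
```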

    Cc: stable@vger.kernel.org # 4.9
    Fixes: f0c9fd5458bacf7b12a9a579a727dc740cbe047e
    Signed-off-by: Eric Biggers
    Signed-off-by: Theodore Ts'o
    Reviewed-by: Jan Kara

    Eric Biggers
     

21 Feb, 2017

1 commit

  • Pull ext4 updates from Ted Ts'o:
    "For this cycle we add support for the shutdown ioctl, which is
    primarily used for testing, but which can be useful on production
    systems when a scratch volume is being destroyed and the data on it
    doesn't need to be saved.

    This found (and we fixed) a number of bugs with ext4's recovery to
    corrupted file system --- the bugs increased the amount of data that
    could be potentially lost, and in the case of the inline data feature,
    could cause the kernel to BUG.

    Also included are a number of other bug fixes, including in ext4's
    fscrypt, DAX, inline data support"

    * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (26 commits)
    ext4: rename EXT4_IOC_GOINGDOWN to EXT4_IOC_SHUTDOWN
    ext4: fix fencepost in s_first_meta_bg validation
    ext4: don't BUG when truncating encrypted inodes on the orphan list
    ext4: do not use stripe_width if it is not set
    ext4: fix stripe-unaligned allocations
    dax: assert that i_rwsem is held exclusive for writes
    ext4: fix DAX write locking
    ext4: add EXT4_IOC_GOINGDOWN ioctl
    ext4: add shutdown bit and check for it
    ext4: rename s_resize_flags to s_ext4_flags
    ext4: return EROFS if device is r/o and journal replay is needed
    ext4: preserve the needs_recovery flag when the journal is aborted
    jbd2: don't leak modified metadata buffers on an aborted journal
    ext4: fix inline data error paths
    ext4: move halfmd4 into hash.c directly
    ext4: fix use-after-iput when fscrypt contexts are inconsistent
    jbd2: fix use after free in kjournald2()
    ext4: fix data corruption in data=journal mode
    ext4: trim allocation requests to group size
    ext4: replace BUG_ON with WARN_ON in mb_find_extent()
    ...

    Linus Torvalds
     

05 Feb, 2017

1 commit


02 Feb, 2017

1 commit

  • Below is the synchronization issue between unmount and kjournald2
    contexts, which results into use after free issue in kjournald2().
    Fix this issue by using journal->j_state_lock to synchronize the
    wait_event() done in journal_kill_thread() and the wake_up() done
    in kjournald2().

    TASK 1:
    umount cmd:
       |--jbd2_journal_destroy() {
           |--journal_kill_thread() {
                write_lock(&journal->j_state_lock);
                journal->j_flags |= JBD2_UNMOUNT;
                ...
                write_unlock(&journal->j_state_lock);
                wake_up(&journal->j_wait_commit);      TASK 2 wakes up here:
                                                       kjournald2() {
                                                         ...
                                                         checks JBD2_UNMOUNT flag and calls goto end-loop;
                                                         ...
                                                         end_loop:
                                                           write_unlock(&journal->j_state_lock);
                                                           journal->j_task = NULL; --> If this thread gets
                                                           pre-empted here, then TASK 1 wait_event will
                                                           exit even before this thread is completely
                                                           done.
                wait_event(journal->j_wait_done_commit, journal->j_task == NULL);
                ...
                write_lock(&journal->j_state_lock);
                write_unlock(&journal->j_state_lock);
              }
           |--kfree(journal);
         }
    }
                                                           wake_up(&journal->j_wait_done_commit); --> this step
                                                           now results into use after free issue.
                                                       }

    Signed-off-by: Sahitya Tummala
    Signed-off-by: Theodore Ts'o

    Sahitya Tummala
     

14 Jan, 2017

1 commit

  • When an ext4 fs is bogged down by a lot of metadata IOs (in the
    reported case, it was deletion of millions of files, but any massive
    amount of journal writes would do), after the journal is filled up,
    tasks which try to access the filesystem and aren't currently
    performing the journal writes end up waiting in
    __jbd2_log_wait_for_space() for journal->j_checkpoint_mutex.

    Because those mutex sleeps aren't marked as iowait, this condition can
    lead to misleadingly low iowait and /proc/stat:procs_blocked. While
    iowait propagation is far from strict, this condition can be triggered
    fairly easily and annotating these sleeps correctly helps initial
    diagnosis quite a bit.

    Use the new mutex_lock_io() for journal->j_checkpoint_mutex so that
    these sleeps are properly marked as iowait.

    Reported-by: Mingbo Wan
    Signed-off-by: Tejun Heo
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andreas Dilger
    Cc: Andrew Morton
    Cc: Jan Kara
    Cc: Jens Axboe
    Cc: Linus Torvalds
    Cc: Mike Galbraith
    Cc: Peter Zijlstra
    Cc: Theodore Ts'o
    Cc: Thomas Gleixner
    Cc: kernel-team@fb.com
    Link: http://lkml.kernel.org/r/1477673892-28940-5-git-send-email-tj@kernel.org
    Signed-off-by: Ingo Molnar

    Tejun Heo
     

25 Dec, 2016

1 commit


14 Dec, 2016

1 commit

  • Pull block layer updates from Jens Axboe:
    "This is the main block pull request this series. Contrary to previous
    release, I've kept the core and driver changes in the same branch. We
    always ended up having dependencies between the two for obvious
    reasons, so makes more sense to keep them together. That said, I'll
    probably try and keep more topical branches going forward, especially
    for cycles that end up being as busy as this one.

    The major parts of this pull request are:

    - Improved support for O_DIRECT on block devices, with a small
    private implementation instead of using the pig that is
    fs/direct-io.c. From Christoph.

    - Request completion tracking in a scalable fashion. This is utilized
    by two components in this pull, the new hybrid polling and the
    writeback queue throttling code.

    - Improved support for polling with O_DIRECT, adding a hybrid mode
    that combines pure polling with an initial sleep. From me.

    - Support for automatic throttling of writeback queues on the block
    side. This uses feedback from the device completion latencies to
    scale the queue on the block side up or down. From me.

    - Support for SMR drives in the block layer and for SD. From Hannes
    and Shaun.

    - Multi-connection support for nbd. From Josef.

    - Cleanup of request and bio flags, so we have a clear split between
    which are bio (or rq) private, and which ones are shared. From
    Christoph.

    - A set of patches from Bart, that improve how we handle queue
    stopping and starting in blk-mq.

    - Support for WRITE_ZEROES from Chaitanya.

    - Lightnvm updates from Javier/Matias.

    - Support for FC for the nvme-over-fabrics code. From James Smart.

    - A bunch of fixes from a whole slew of people, too many to name
    here"

    * 'for-4.10/block' of git://git.kernel.dk/linux-block: (182 commits)
    blk-stat: fix a few cases of missing batch flushing
    blk-flush: run the queue when inserting blk-mq flush
    elevator: make the rqhash helpers exported
    blk-mq: abstract out blk_mq_dispatch_rq_list() helper
    blk-mq: add blk_mq_start_stopped_hw_queue()
    block: improve handling of the magic discard payload
    blk-wbt: don't throttle discard or write zeroes
    nbd: use dev_err_ratelimited in io path
    nbd: reset the setup task for NBD_CLEAR_SOCK
    nvme-fabrics: Add FC LLDD loopback driver to test FC-NVME
    nvme-fabrics: Add target support for FC transport
    nvme-fabrics: Add host support for FC transport
    nvme-fabrics: Add FC transport LLDD api definitions
    nvme-fabrics: Add FC transport FC-NVME definitions
    nvme-fabrics: Add FC transport error codes to nvme.h
    Add type 0x28 NVME type code to scsi fc headers
    nvme-fabrics: patch target code in prep for FC transport support
    nvme-fabrics: set sqe.command_id in core not transports
    parser: add u64 number parser
    nvme-rdma: align to generic ib_event logging helper
    ...

    Linus Torvalds
     

01 Nov, 2016

1 commit


13 Oct, 2016

1 commit

  • When 'jh->b_transaction == transaction' (asserted by below)

    J_ASSERT_JH(jh, (jh->b_transaction == transaction || ...

    'journal->j_list_lock' will be incorrectly unlocked, since
    the lock is acquired only at the end of the if / else-if
    statements (missing the else case).
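
    The general shape of the problem, a shared unlock reached by a branch that
    never took the lock, can be shown with a toy lock that asserts on misuse.
    This is a generic sketch, not the kernel function in question:
    struct list_lock and process() are hypothetical.

```c
#include <assert.h>

struct list_lock { int depth; };   /* 1 while held, 0 otherwise */

static void lock(struct list_lock *l)   { assert(l->depth == 0); l->depth = 1; }
static void unlock(struct list_lock *l) { assert(l->depth == 1); l->depth = 0; }

/* Every branch that reaches the common unlock must have taken the lock. */
static void process(struct list_lock *l, int state)
{
    if (state == 0) {
        /* ... work specific to state 0 ... */
        lock(l);
    } else if (state == 1) {
        /* ... work specific to state 1 ... */
        lock(l);
    } else {
        /* the case that is easy to miss: it needs the lock too */
        lock(l);
    }
    unlock(l);   /* shared exit path */
}
```

    Dropping the lock() call from the final branch makes unlock() trip its
    assertion, which mirrors the unbalanced unlock described above.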

    Signed-off-by: Taesoo Kim
    Signed-off-by: Theodore Ts'o
    Reviewed-by: Andreas Dilger
    Fixes: 6e4862a5bb9d12be87e4ea5d9a60836ebed71d28
    Cc: stable@vger.kernel.org # 3.14+

    Taesoo Kim
     

12 Oct, 2016

1 commit

  • The mapping_set_error() helper sets the correct AS_ flag for the mapping
    so there is no reason to open code it. Use the helper directly.

    [akpm@linux-foundation.org: be honest about conversion from -ENXIO to -EIO]
    Link: http://lkml.kernel.org/r/20160912111608.2588-2-mhocko@kernel.org
    Signed-off-by: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
     

22 Sep, 2016

1 commit

  • Thomas has reported a lockdep splat hitting in
    add_transaction_credits(). The problem is that that function calls
    jbd2_might_wait_for_commit() while holding j_state_lock which is wrong
    (we do not really wait for transaction commit while holding that lock).

    Fix the problem by moving jbd2_might_wait_for_commit() into places where
    we are ready to wait for transaction commit and thus j_state_lock is
    unlocked.

    Cc: stable@vger.kernel.org
    Fixes: 1eaa566d368b214d99cbb973647c1b0b8102a9ae
    Reported-by: Thomas Gleixner
    Signed-off-by: Jan Kara
    Signed-off-by: Theodore Ts'o

    Jan Kara
     

16 Sep, 2016

1 commit

  • There is some repetitive code in jbd2_journal_init_dev() and
    jbd2_journal_init_inode(). This patch moves the common code into a
    journal_init_common() helper to simplify it, and also fixes the coding
    style warnings reported by checkpatch.pl.

    Signed-off-by: Geliang Tang
    Signed-off-by: Theodore Ts'o
    Reviewed-by: Jan Kara

    Geliang Tang
     

27 Jul, 2016

2 commits

  • Pull ext4 updates from Ted Ts'o:
    "The major change this cycle is deleting ext4's copy of the file system
    encryption code and switching things over to using the copies in
    fs/crypto. I've updated the MAINTAINERS file to add an entry for
    fs/crypto listing Jaeguk Kim and myself as the maintainers.

    There are also a number of bug fixes, most notably for some problems
    found by American Fuzzy Lop (AFL) courtesy of Vegard Nossum. Also
    fixed is a writeback deadlock detected by generic/130, and some
    potential races in the metadata checksum code"

    * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (21 commits)
    ext4: verify extent header depth
    ext4: short-cut orphan cleanup on error
    ext4: fix reference counting bug on block allocation error
    MAINTAINRES: fs-crypto maintainers update
    ext4 crypto: migrate into vfs's crypto engine
    ext2: fix filesystem deadlock while reading corrupted xattr block
    ext4: fix project quota accounting without quota limits enabled
    ext4: validate s_reserved_gdt_blocks on mount
    ext4: remove unused page_idx
    ext4: don't call ext4_should_journal_data() on the journal inode
    ext4: Fix WARN_ON_ONCE in ext4_commit_super()
    ext4: fix deadlock during page writeback
    ext4: correct error value of function verifying dx checksum
    ext4: avoid modifying checksum fields directly during checksum verification
    ext4: check for extents that wrap around
    jbd2: make journal y2038 safe
    jbd2: track more dependencies on transaction commit
    jbd2: move lockdep tracking to journal_s
    jbd2: move lockdep instrumentation for jbd2 handles
    ext4: respect the nobarrier mount option in nojournal mode
    ...

    Linus Torvalds
     
  • Pull core block updates from Jens Axboe:

    - the big change is the cleanup from Mike Christie, cleaning up our
    uses of command types and modified flags. This is what will throw
    some merge conflicts

    - regression fix for the above for btrfs, from Vincent

    - following up to the above, better packing of struct request from
    Christoph

    - a 2038 fix for blktrace from Arnd

    - a few trivial/spelling fixes from Bart Van Assche

    - a front merge check fix from Damien, which could cause issues on
    SMR drives

    - Atari partition fix from Gabriel

    - convert cfq to highres timers, since jiffies isn't granular enough
    for some devices these days. From Jan and Jeff

    - CFQ priority boost fix for idle classes, from me

    - cleanup series from Ming, improving our bio/bvec iteration

    - a direct issue fix for blk-mq from Omar

    - fix for plug merging not involving the IO scheduler, like we do for
    other types of merges. From Tahsin

    - expose DAX type internally and through sysfs. From Toshi and Yigal

    * 'for-4.8/core' of git://git.kernel.dk/linux-block: (76 commits)
    block: Fix front merge check
    block: do not merge requests without consulting with io scheduler
    block: Fix spelling in a source code comment
    block: expose QUEUE_FLAG_DAX in sysfs
    block: add QUEUE_FLAG_DAX for devices to advertise their DAX support
    Btrfs: fix comparison in __btrfs_map_block()
    block: atari: Return early for unsupported sector size
    Doc: block: Fix a typo in queue-sysfs.txt
    cfq-iosched: Charge at least 1 jiffie instead of 1 ns
    cfq-iosched: Fix regression in bonnie++ rewrite performance
    cfq-iosched: Convert slice_resid from u64 to s64
    block: Convert fifo_time from ulong to u64
    blktrace: avoid using timespec
    block/blk-cgroup.c: Declare local symbols static
    block/bio-integrity.c: Add #include "blk.h"
    block/partition-generic.c: Remove a set-but-not-used variable
    block: bio: kill BIO_MAX_SIZE
    cfq-iosched: temporarily boost queue priority for idle classes
    block: drbd: avoid to use BIO_MAX_SIZE
    block: bio: remove BIO_MAX_SECTORS
    ...

    Linus Torvalds
     

30 Jun, 2016

3 commits

  • The jbd2 journal stores the commit time in 64-bit seconds and 32-bit
    nanoseconds, which avoids an overflow in 2038, but it gets the numbers
    from current_kernel_time(), which uses 'long' seconds on 32-bit
    architectures.

    This simply changes the code to call current_kernel_time64() so
    we use 64-bit seconds consistently.

    Signed-off-by: Arnd Bergmann
    Signed-off-by: Theodore Ts'o
    Reviewed-by: Jan Kara
    Cc: stable@vger.kernel.org

    Arnd Bergmann
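
The overflow described in this commit can be illustrated with a small userspace sketch. This is not the jbd2 code: `struct commit_stamp`, `narrow_seconds()`, `stamp_from_narrow()` and `stamp_from_wide()` are hypothetical names, and the 32-bit 'long' of a 32-bit architecture is modelled here with `int32_t`. It only shows why a 64-bit on-disk seconds field does not help if the value passes through a 32-bit type on the way in.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a commit timestamp: 64-bit seconds and
 * 32-bit nanoseconds, as the commit message describes. */
struct commit_stamp {
	int64_t  sec;
	uint32_t nsec;
};

/* Model a 32-bit signed 'long' seconds value (assumption: two's
 * complement wraparound, written out explicitly so the sketch does not
 * rely on implementation-defined narrowing conversions). */
static int32_t narrow_seconds(int64_t sec)
{
	uint32_t u = (uint32_t)(uint64_t)sec;	/* keep the low 32 bits */

	if (u <= (uint32_t)INT32_MAX)
		return (int32_t)u;
	return (int32_t)(u - 0x80000000u) + INT32_MIN;
}

/* Seconds pass through a 32-bit value first: wraps after 2038-01-19. */
static struct commit_stamp stamp_from_narrow(int64_t sec, uint32_t nsec)
{
	struct commit_stamp s;

	assert(nsec < 1000000000u);
	s.sec = (int64_t)narrow_seconds(sec);
	s.nsec = nsec;
	return s;
}

/* Seconds stay 64-bit end to end: no wraparound. */
static struct commit_stamp stamp_from_wide(int64_t sec, uint32_t nsec)
{
	struct commit_stamp s;

	assert(nsec < 1000000000u);
	s.sec = sec;
	s.nsec = nsec;
	return s;
}
```

With these helpers, one second past the 32-bit limit (2147483648) comes out negative from `stamp_from_narrow()` but unchanged from `stamp_from_wide()`, which is the consistency the switch to 64-bit seconds is meant to provide.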
     
  • So far we were tracking only the dependency on transaction commit due to
    starting a new handle (which may require a commit in order to start a new
    transaction). Now also add tracking for the other cases where we wait for
    transaction commit. This way lockdep can catch deadlocks, e.g. because we
    call jbd2_journal_stop() for a synchronous handle with some locks held
    which rank below transaction start.

    Signed-off-by: Jan Kara
    Signed-off-by: Theodore Ts'o

    Jan Kara
     
  • Currently the lockdep map is tracked in each journal handle. To be able to
    expand lockdep support to also cover other cases where we depend on
    transaction commit and where a handle is not available, move the lockdep
    map into struct journal_s. Since this makes the lockdep map shared by all
    handles, we have to use rwsem_acquire_read() for acquisitions now.

    Signed-off-by: Jan Kara
    Signed-off-by: Theodore Ts'o

    Jan Kara