06 Apr, 2019

2 commits

  • [ Upstream commit 538bcaa6261b77e71d37f5596c33127c1a3ec3f7 ]

    The jbd2 superblock is lockless now, so there is probably a race
    condition between writing it so disk and modifing contents of it, which
    may lead to checksum error. The following race is the one case that we
    have captured.

    jbd2 fsstress
    jbd2_journal_commit_transaction
    jbd2_journal_update_sb_log_tail
    jbd2_write_superblock
    jbd2_superblock_csum_set jbd2_journal_revoke
    jbd2_journal_set_features(revork)
    modify superblock
    submit_bh(checksum incorrect)

    Fix this by locking the buffer head before modifing it. We always
    write the jbd2 superblock after we modify it, so this just means
    calling the lock_buffer() a little earlier.

    This checksum corruption problem can be reproduced by xfstests
    generic/475.

    Reported-by: zhangyi (F)
    Suggested-by: Jan Kara
    Signed-off-by: Theodore Ts'o
    Signed-off-by: Sasha Levin

    Theodore Ts'o
     
  • [ Upstream commit 6e876c3dd205d30b0db6850e97a03d75457df007 ]

    In jbd2_journal_commit_transaction(), if we are in abort mode,
    we may flush the buffer without setting descriptor block checksum
    by goto start_journal_io. Then fs is mounted,
    jbd2_descriptor_block_csum_verify() failed.

    [ 271.379811] EXT4-fs (vdd): shut down requested (2)
    [ 271.381827] Aborting journal on device vdd-8.
    [ 271.597136] JBD2: Invalid checksum recovering block 22199 in log
    [ 271.598023] JBD2: recovery failed
    [ 271.598484] EXT4-fs (vdd): error loading journal

    Fix this problem by keep setting descriptor block checksum if the
    descriptor buffer is not NULL.

    This checksum problem can be reproduced by xfstests generic/388.

    Signed-off-by: luojiajun
    Signed-off-by: Theodore Ts'o
    Reviewed-by: Jan Kara
    Signed-off-by: Sasha Levin

    luojiajun
     

24 Mar, 2019

2 commits

  • commit 01215d3edb0f384ddeaa5e4a22c1ae5ff634149f upstream.

    The jh pointer may be used uninitialized in the two cases below and the
    compiler complain about it when enabling JBUFFER_TRACE macro, fix them.

    In file included from fs/jbd2/transaction.c:19:0:
    fs/jbd2/transaction.c: In function ‘jbd2_journal_get_undo_access’:
    ./include/linux/jbd2.h:1637:38: warning: ‘jh’ is used uninitialized in this function [-Wuninitialized]
    #define JBUFFER_TRACE(jh, info) do { printk("%s: %d\n", __func__, jh->b_jcount);} while (0)
    ^
    fs/jbd2/transaction.c:1219:23: note: ‘jh’ was declared here
    struct journal_head *jh;
    ^
    In file included from fs/jbd2/transaction.c:19:0:
    fs/jbd2/transaction.c: In function ‘jbd2_journal_dirty_metadata’:
    ./include/linux/jbd2.h:1637:38: warning: ‘jh’ may be used uninitialized in this function [-Wmaybe-uninitialized]
    #define JBUFFER_TRACE(jh, info) do { printk("%s: %d\n", __func__, jh->b_jcount);} while (0)
    ^
    fs/jbd2/transaction.c:1332:23: note: ‘jh’ was declared here
    struct journal_head *jh;
    ^

    Signed-off-by: zhangyi (F)
    Signed-off-by: Theodore Ts'o
    Cc: stable@vger.kernel.org
    Reviewed-by: Jan Kara
    Signed-off-by: Greg Kroah-Hartman

    zhangyi (F)
     
  • commit 904cdbd41d749a476863a0ca41f6f396774f26e4 upstream.

    Now, we capture a data corruption problem on ext4 while we're truncating
    an extent index block. Imaging that if we are revoking a buffer which
    has been journaled by the committing transaction, the buffer's jbddirty
    flag will not be cleared in jbd2_journal_forget(), so the commit code
    will set the buffer dirty flag again after refile the buffer.

    fsx kjournald2
    jbd2_journal_commit_transaction
    jbd2_journal_revoke commit phase 1~5...
    jbd2_journal_forget
    belongs to older transaction commit phase 6
    jbddirty not clear __jbd2_journal_refile_buffer
    __jbd2_journal_unfile_buffer
    test_clear_buffer_jbddirty
    mark_buffer_dirty

    Finally, if the freed extent index block was allocated again as data
    block by some other files, it may corrupt the file data after writing
    cached pages later, such as during unmount time. (In general,
    clean_bdev_aliases() related helpers should be invoked after
    re-allocation to prevent the above corruption, but unfortunately we
    missed it when zeroout the head of extra extent blocks in
    ext4_ext_handle_unwritten_extents()).

    This patch mark buffer as freed and set j_next_transaction to the new
    transaction when it already belongs to the committing transaction in
    jbd2_journal_forget(), so that commit code knows it should clear dirty
    bits when it is done with the buffer.

    This problem can be reproduced by xfstests generic/455 easily with
    seeds (3246 3247 3248 3249).

    Signed-off-by: zhangyi (F)
    Signed-off-by: Theodore Ts'o
    Reviewed-by: Jan Kara
    Cc: stable@vger.kernel.org
    Signed-off-by: Greg Kroah-Hartman

    zhangyi (F)
     

14 Nov, 2018

1 commit

  • commit ccd3c4373eacb044eb3832966299d13d2631f66f upstream.

    The code cleaning transaction's lists of checkpoint buffers has a bug
    where it increases bh refcount only after releasing
    journal->j_list_lock. Thus the following race is possible:

    CPU0 CPU1
    jbd2_log_do_checkpoint()
    jbd2_journal_try_to_free_buffers()
    __journal_try_to_free_buffer(bh)
    ...
    while (transaction->t_checkpoint_io_list)
    ...
    if (buffer_locked(bh)) {

    spin_unlock(&journal->j_list_lock);
    spin_lock(&journal->j_list_lock);
    __jbd2_journal_remove_checkpoint(jh);
    spin_unlock(&journal->j_list_lock);
    try_to_free_buffers(page);
    get_bh(bh) j_list_lock.

    Fixes: dc6e8d669cf5 ("jbd2: don't call get_bh() before calling __jbd2_journal_remove_checkpoint()")
    Fixes: be1158cc615f ("jbd2: fold __process_buffer() into jbd2_log_do_checkpoint()")
    Reported-by: syzbot+7f4a27091759e2fe7453@syzkaller.appspotmail.com
    CC: stable@vger.kernel.org
    Reviewed-by: Lukas Czerner
    Signed-off-by: Jan Kara
    Signed-off-by: Theodore Ts'o
    Signed-off-by: Greg Kroah-Hartman

    Jan Kara
     

30 Jul, 2018

1 commit

  • jbd2 is one of the few callers of current_kernel_time64(), which
    is a wrapper around ktime_get_coarse_real_ts64(). This calls the
    latter directly for consistency with the rest of the kernel that
    is moving to the ktime_get_ family of time accessors.

    Reviewed-by: Andreas Dilger
    Reviewed-by: Jan Kara
    Signed-off-by: Arnd Bergmann
    Signed-off-by: Theodore Ts'o

    Arnd Bergmann
     

09 Jul, 2018

1 commit

  • Pull ext4 bugfixes from Ted Ts'o:
    "Bug fixes for ext4; most of which relate to vulnerabilities where a
    maliciously crafted file system image can result in a kernel OOPS or
    hang.

    At least one fix addresses an inline data bug could be triggered by
    userspace without the need of a crafted file system (although it does
    require that the inline data feature be enabled)"

    * tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
    ext4: check superblock mapped prior to committing
    ext4: add more mount time checks of the superblock
    ext4: add more inode number paranoia checks
    ext4: avoid running out of journal credits when appending to an inline file
    jbd2: don't mark block as modified if the handle is out of credits
    ext4: never move the system.data xattr out of the inode body
    ext4: clear i_data in ext4_inode_info when removing inline data
    ext4: include the illegal physical block in the bad map ext4_error msg
    ext4: verify the depth of extent tree in ext4_find_extent()
    ext4: only look at the bg_flags field if it is valid
    ext4: make sure bitmaps and the inode table don't overlap with bg descriptors
    ext4: always check block group bounds in ext4_init_block_bitmap()
    ext4: always verify the magic number in xattr blocks
    ext4: add corruption check in ext4_xattr_set_entry()
    ext4: add warn_on_error mount option

    Linus Torvalds
     

17 Jun, 2018

1 commit

  • Do not set the b_modified flag in block's journal head should not
    until after we're sure that jbd2_journal_dirty_metadat() will not
    abort with an error due to there not being enough space reserved in
    the jbd2 handle.

    Otherwise, future attempts to modify the buffer may lead a large
    number of spurious errors and warnings.

    This addresses CVE-2018-10883.

    https://bugzilla.kernel.org/show_bug.cgi?id=200071

    Signed-off-by: Theodore Ts'o
    Cc: stable@kernel.org

    Theodore Ts'o
     

13 Jun, 2018

1 commit

  • The kmalloc() function has a 2-factor argument form, kmalloc_array(). This
    patch replaces cases of:

    kmalloc(a * b, gfp)

    with:
    kmalloc_array(a * b, gfp)

    as well as handling cases of:

    kmalloc(a * b * c, gfp)

    with:

    kmalloc(array3_size(a, b, c), gfp)

    as it's slightly less ugly than:

    kmalloc_array(array_size(a, b), c, gfp)

    This does, however, attempt to ignore constant size factors like:

    kmalloc(4 * 1024, gfp)

    though any constants defined via macros get caught up in the conversion.

    Any factors with a sizeof() of "unsigned char", "char", and "u8" were
    dropped, since they're redundant.

    The tools/ directory was manually excluded, since it has its own
    implementation of kmalloc().

    The Coccinelle script used for this was:

    // Fix redundant parens around sizeof().
    @@
    type TYPE;
    expression THING, E;
    @@

    (
    kmalloc(
    - (sizeof(TYPE)) * E
    + sizeof(TYPE) * E
    , ...)
    |
    kmalloc(
    - (sizeof(THING)) * E
    + sizeof(THING) * E
    , ...)
    )

    // Drop single-byte sizes and redundant parens.
    @@
    expression COUNT;
    typedef u8;
    typedef __u8;
    @@

    (
    kmalloc(
    - sizeof(u8) * (COUNT)
    + COUNT
    , ...)
    |
    kmalloc(
    - sizeof(__u8) * (COUNT)
    + COUNT
    , ...)
    |
    kmalloc(
    - sizeof(char) * (COUNT)
    + COUNT
    , ...)
    |
    kmalloc(
    - sizeof(unsigned char) * (COUNT)
    + COUNT
    , ...)
    |
    kmalloc(
    - sizeof(u8) * COUNT
    + COUNT
    , ...)
    |
    kmalloc(
    - sizeof(__u8) * COUNT
    + COUNT
    , ...)
    |
    kmalloc(
    - sizeof(char) * COUNT
    + COUNT
    , ...)
    |
    kmalloc(
    - sizeof(unsigned char) * COUNT
    + COUNT
    , ...)
    )

    // 2-factor product with sizeof(type/expression) and identifier or constant.
    @@
    type TYPE;
    expression THING;
    identifier COUNT_ID;
    constant COUNT_CONST;
    @@

    (
    - kmalloc
    + kmalloc_array
    (
    - sizeof(TYPE) * (COUNT_ID)
    + COUNT_ID, sizeof(TYPE)
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - sizeof(TYPE) * COUNT_ID
    + COUNT_ID, sizeof(TYPE)
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - sizeof(TYPE) * (COUNT_CONST)
    + COUNT_CONST, sizeof(TYPE)
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - sizeof(TYPE) * COUNT_CONST
    + COUNT_CONST, sizeof(TYPE)
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - sizeof(THING) * (COUNT_ID)
    + COUNT_ID, sizeof(THING)
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - sizeof(THING) * COUNT_ID
    + COUNT_ID, sizeof(THING)
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - sizeof(THING) * (COUNT_CONST)
    + COUNT_CONST, sizeof(THING)
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - sizeof(THING) * COUNT_CONST
    + COUNT_CONST, sizeof(THING)
    , ...)
    )

    // 2-factor product, only identifiers.
    @@
    identifier SIZE, COUNT;
    @@

    - kmalloc
    + kmalloc_array
    (
    - SIZE * COUNT
    + COUNT, SIZE
    , ...)

    // 3-factor product with 1 sizeof(type) or sizeof(expression), with
    // redundant parens removed.
    @@
    expression THING;
    identifier STRIDE, COUNT;
    type TYPE;
    @@

    (
    kmalloc(
    - sizeof(TYPE) * (COUNT) * (STRIDE)
    + array3_size(COUNT, STRIDE, sizeof(TYPE))
    , ...)
    |
    kmalloc(
    - sizeof(TYPE) * (COUNT) * STRIDE
    + array3_size(COUNT, STRIDE, sizeof(TYPE))
    , ...)
    |
    kmalloc(
    - sizeof(TYPE) * COUNT * (STRIDE)
    + array3_size(COUNT, STRIDE, sizeof(TYPE))
    , ...)
    |
    kmalloc(
    - sizeof(TYPE) * COUNT * STRIDE
    + array3_size(COUNT, STRIDE, sizeof(TYPE))
    , ...)
    |
    kmalloc(
    - sizeof(THING) * (COUNT) * (STRIDE)
    + array3_size(COUNT, STRIDE, sizeof(THING))
    , ...)
    |
    kmalloc(
    - sizeof(THING) * (COUNT) * STRIDE
    + array3_size(COUNT, STRIDE, sizeof(THING))
    , ...)
    |
    kmalloc(
    - sizeof(THING) * COUNT * (STRIDE)
    + array3_size(COUNT, STRIDE, sizeof(THING))
    , ...)
    |
    kmalloc(
    - sizeof(THING) * COUNT * STRIDE
    + array3_size(COUNT, STRIDE, sizeof(THING))
    , ...)
    )

    // 3-factor product with 2 sizeof(variable), with redundant parens removed.
    @@
    expression THING1, THING2;
    identifier COUNT;
    type TYPE1, TYPE2;
    @@

    (
    kmalloc(
    - sizeof(TYPE1) * sizeof(TYPE2) * COUNT
    + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
    , ...)
    |
    kmalloc(
    - sizeof(TYPE1) * sizeof(THING2) * (COUNT)
    + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
    , ...)
    |
    kmalloc(
    - sizeof(THING1) * sizeof(THING2) * COUNT
    + array3_size(COUNT, sizeof(THING1), sizeof(THING2))
    , ...)
    |
    kmalloc(
    - sizeof(THING1) * sizeof(THING2) * (COUNT)
    + array3_size(COUNT, sizeof(THING1), sizeof(THING2))
    , ...)
    |
    kmalloc(
    - sizeof(TYPE1) * sizeof(THING2) * COUNT
    + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
    , ...)
    |
    kmalloc(
    - sizeof(TYPE1) * sizeof(THING2) * (COUNT)
    + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
    , ...)
    )

    // 3-factor product, only identifiers, with redundant parens removed.
    @@
    identifier STRIDE, SIZE, COUNT;
    @@

    (
    kmalloc(
    - (COUNT) * STRIDE * SIZE
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    kmalloc(
    - COUNT * (STRIDE) * SIZE
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    kmalloc(
    - COUNT * STRIDE * (SIZE)
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    kmalloc(
    - (COUNT) * (STRIDE) * SIZE
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    kmalloc(
    - COUNT * (STRIDE) * (SIZE)
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    kmalloc(
    - (COUNT) * STRIDE * (SIZE)
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    kmalloc(
    - (COUNT) * (STRIDE) * (SIZE)
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    kmalloc(
    - COUNT * STRIDE * SIZE
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    )

    // Any remaining multi-factor products, first at least 3-factor products,
    // when they're not all constants...
    @@
    expression E1, E2, E3;
    constant C1, C2, C3;
    @@

    (
    kmalloc(C1 * C2 * C3, ...)
    |
    kmalloc(
    - (E1) * E2 * E3
    + array3_size(E1, E2, E3)
    , ...)
    |
    kmalloc(
    - (E1) * (E2) * E3
    + array3_size(E1, E2, E3)
    , ...)
    |
    kmalloc(
    - (E1) * (E2) * (E3)
    + array3_size(E1, E2, E3)
    , ...)
    |
    kmalloc(
    - E1 * E2 * E3
    + array3_size(E1, E2, E3)
    , ...)
    )

    // And then all remaining 2 factors products when they're not all constants,
    // keeping sizeof() as the second factor argument.
    @@
    expression THING, E1, E2;
    type TYPE;
    constant C1, C2, C3;
    @@

    (
    kmalloc(sizeof(THING) * C2, ...)
    |
    kmalloc(sizeof(TYPE) * C2, ...)
    |
    kmalloc(C1 * C2 * C3, ...)
    |
    kmalloc(C1 * C2, ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - sizeof(TYPE) * (E2)
    + E2, sizeof(TYPE)
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - sizeof(TYPE) * E2
    + E2, sizeof(TYPE)
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - sizeof(THING) * (E2)
    + E2, sizeof(THING)
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - sizeof(THING) * E2
    + E2, sizeof(THING)
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - (E1) * E2
    + E1, E2
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - (E1) * (E2)
    + E1, E2
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - E1 * E2
    + E1, E2
    , ...)
    )

    Signed-off-by: Kees Cook

    Kees Cook
     

21 May, 2018

2 commits

  • The kmem_cache_destroy() function already checks for null pointers, so
    we can remove the check at the call site.

    This patch also sets jbd2_handle_cache and jbd2_inode_cache to be NULL
    after freeing them in jbd2_journal_destroy_handle_cache().

    Signed-off-by: Wang Long
    Signed-off-by: Theodore Ts'o
    Reviewed-by: Jan Kara

    Wang Long
     
  • See following dmesg output with jbd2 debug enabled:

    ...(start_this_handle, 313): New handle 00000000c88d6ceb going live.

    ...(start_this_handle, 383): Handle 00000000c88d6ceb given 53 credits (total 53, free 32681)

    ...(do_get_write_access, 838): journal_head 0000000002856fc0, force_copy 0

    ...(jbd2_journal_cancel_revoke, 421): journal_head 0000000002856fc0, cancelling revoke

    We have an extra line with every messages, this is a waste of buffer,
    we can fix it by removing "\n" in the caller or remove it in
    the __jbd2_debug(), i checked every jbd2_debug() passed '\n' explicitly.

    To avoid more lines, let's remove it inside __jbd2_debug().

    Signed-off-by: Wang Shilong
    Signed-off-by: Theodore Ts'o
    Reviewed-by: Jan Kara

    Wang Shilong
     

18 Apr, 2018

1 commit

  • If ext4 tries to start a reserved handle via
    jbd2_journal_start_reserved(), and the journal has been aborted, this
    can result in a NULL pointer dereference. This is because the fields
    h_journal and h_transaction in the handle structure share the same
    memory, via a union, so jbd2_journal_start_reserved() will clear
    h_journal before calling start_this_handle(). If this function fails
    due to an aborted handle, h_journal will still be NULL, and the call
    to jbd2_journal_free_reserved() will pass a NULL journal to
    sub_reserve_credits().

    This can be reproduced by running "kvm-xfstests -c dioread_nolock
    generic/475".

    Cc: stable@kernel.org # 3.11
    Fixes: 8f7d89f36829b ("jbd2: transaction reservation support")
    Signed-off-by: Theodore Ts'o
    Reviewed-by: Andreas Dilger
    Reviewed-by: Jan Kara

    Theodore Ts'o
     

20 Feb, 2018

1 commit


19 Feb, 2018

2 commits

  • Previously the jbd2 layer assumed that a file system check would be
    required after a journal abort. In the case of the deliberate file
    system shutdown, this should not be necessary. Allow the jbd2 layer
    to distinguish between these two cases by using the ESHUTDOWN errno.

    Also add proper locking to __journal_abort_soft().

    Signed-off-by: Theodore Ts'o
    Cc: stable@vger.kernel.org

    Theodore Ts'o
     
  • There were two error messages emitted by jbd2, one for a bad checksum
    for a jbd2 descriptor block, and one for a bad checksum for a jbd2
    data block. Change the data block checksum error so that the two can
    be disambiguated.

    Signed-off-by: Theodore Ts'o

    Theodore Ts'o
     

10 Jan, 2018

1 commit

  • Sphinx emits various (26) warnings when building make target 'htmldocs'.
    Currently struct definitions contain duplicate documentation, some as
    kernel-docs and some as standard c89 comments. We can reduce
    duplication while cleaning up the kernel docs.

    Move all kernel-docs to right above each struct member. Use the set of
    all existing comments (kernel-doc and c89). Add documentation for
    missing struct members and function arguments.

    Signed-off-by: Tobin C. Harding
    Signed-off-by: Theodore Ts'o
    Cc: stable@vger.kernel.org

    Tobin C. Harding
     

18 Dec, 2017

1 commit

  • A number of ext4 source files were skipped due because their copyright
    permission statements didn't match the expected text used by the
    automated conversion utilities. I've added SPDX tags for the rest.

    While looking at some of these files, I've noticed that we have quite
    a bit of variation on the licenses that were used --- in particular
    some of the Red Hat licenses on the jbd2 files use a GPL2+ license,
    and we have some files that have a LGPL-2.1 license (which was quite
    surprising).

    I've not attempted to do any license changes. Even if it is perfectly
    legal to relicense to GPL 2.0-only for consistency's sake, that should
    be done with ext4 developer community discussion.

    Signed-off-by: Theodore Ts'o

    Theodore Ts'o
     

18 Nov, 2017

1 commit

  • Pull libnvdimm and dax updates from Dan Williams:
    "Save for a few late fixes, all of these commits have shipped in -next
    releases since before the merge window opened, and 0day has given a
    build success notification.

    The ext4 touches came from Jan, and the xfs touches have Darrick's
    reviewed-by. An xfstest for the MAP_SYNC feature has been through
    a few round of reviews and is on track to be merged.

    - Introduce MAP_SYNC and MAP_SHARED_VALIDATE, a mechanism to enable
    'userspace flush' of persistent memory updates via filesystem-dax
    mappings. It arranges for any filesystem metadata updates that may
    be required to satisfy a write fault to also be flushed ("on disk")
    before the kernel returns to userspace from the fault handler.
    Effectively every write-fault that dirties metadata completes an
    fsync() before returning from the fault handler. The new
    MAP_SHARED_VALIDATE mapping type guarantees that the MAP_SYNC flag
    is validated as supported by the filesystem's ->mmap() file
    operation.

    - Add support for the standard ACPI 6.2 label access methods that
    replace the NVDIMM_FAMILY_INTEL (vendor specific) label methods.
    This enables interoperability with environments that only implement
    the standardized methods.

    - Add support for the ACPI 6.2 NVDIMM media error injection methods.

    - Add support for the NVDIMM_FAMILY_INTEL v1.6 DIMM commands for
    latch last shutdown status, firmware update, SMART error injection,
    and SMART alarm threshold control.

    - Cleanup physical address information disclosures to be root-only.

    - Fix revalidation of the DIMM "locked label area" status to support
    dynamic unlock of the label area.

    - Expand unit test infrastructure to mock the ACPI 6.2 Translate SPA
    (system-physical-address) command and error injection commands.

    Acknowledgements that came after the commits were pushed to -next:

    - 957ac8c421ad ("dax: fix PMD faults on zero-length files"):
    Reviewed-by: Ross Zwisler

    - a39e596baa07 ("xfs: support for synchronous DAX faults") and
    7b565c9f965b ("xfs: Implement xfs_filemap_pfn_mkwrite() using __xfs_filemap_fault()")
    Reviewed-by: Darrick J. Wong "

    * tag 'libnvdimm-for-4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (49 commits)
    acpi, nfit: add 'Enable Latch System Shutdown Status' command support
    dax: fix general protection fault in dax_alloc_inode
    dax: fix PMD faults on zero-length files
    dax: stop requiring a live device for dax_flush()
    brd: remove dax support
    dax: quiet bdev_dax_supported()
    fs, dax: unify IOMAP_F_DIRTY read vs write handling policy in the dax core
    tools/testing/nvdimm: unit test clear-error commands
    acpi, nfit: validate commands against the device type
    tools/testing/nvdimm: stricter bounds checking for error injection commands
    xfs: support for synchronous DAX faults
    xfs: Implement xfs_filemap_pfn_mkwrite() using __xfs_filemap_fault()
    ext4: Support for synchronous DAX faults
    ext4: Simplify error handling in ext4_dax_huge_fault()
    dax: Implement dax_finish_sync_fault()
    dax, iomap: Add support for synchronous faults
    mm: Define MAP_SYNC and VM_SYNC flags
    dax: Allow tuning whether dax_insert_mapping_entry() dirties entry
    dax: Allow dax_iomap_fault() to return pfn
    dax: Fix comment describing dax_iomap_fault()
    ...

    Linus Torvalds
     

03 Nov, 2017

1 commit

  • We return IOMAP_F_DIRTY flag from ext4_iomap_begin() when asked to
    prepare blocks for writing and the inode has some uncommitted metadata
    changes. In the fault handler ext4_dax_fault() we then detect this case
    (through VM_FAULT_NEEDDSYNC return value) and call helper
    dax_finish_sync_fault() to flush metadata changes and insert page table
    entry. Note that this will also dirty corresponding radix tree entry
    which is what we want - fsync(2) will still provide data integrity
    guarantees for applications not using userspace flushing. And
    applications using userspace flushing can avoid calling fsync(2) and
    thus avoid the performance overhead.

    Reviewed-by: Ross Zwisler
    Signed-off-by: Jan Kara
    Signed-off-by: Dan Williams

    Jan Kara
     

19 Oct, 2017

1 commit

  • In preparation for unconditionally passing the struct timer_list pointer to
    all timer callbacks, switch to using the new timer_setup() and from_timer()
    to pass the timer pointer explicitly.

    Signed-off-by: Kees Cook
    Signed-off-by: Theodore Ts'o
    Reviewed-by: Jan Kara
    Cc: linux-ext4@vger.kernel.org
    Cc: Thomas Gleixner

    Kees Cook
     

08 Jul, 2017

1 commit

  • Pull Writeback error handling updates from Jeff Layton:
    "This pile represents the bulk of the writeback error handling fixes
    that I have for this cycle. Some of the earlier patches in this pile
    may look trivial but they are prerequisites for later patches in the
    series.

    The aim of this set is to improve how we track and report writeback
    errors to userland. Most applications that care about data integrity
    will periodically call fsync/fdatasync/msync to ensure that their
    writes have made it to the backing store.

    For a very long time, we have tracked writeback errors using two flags
    in the address_space: AS_EIO and AS_ENOSPC. Those flags are set when a
    writeback error occurs (via mapping_set_error) and are cleared as a
    side-effect of filemap_check_errors (as you noted yesterday). This
    model really sucks for userland.

    Only the first task to call fsync (or msync or fdatasync) will see the
    error. Any subsequent task calling fsync on a file will get back 0
    (unless another writeback error occurs in the interim). If I have
    several tasks writing to a file and calling fsync to ensure that their
    writes got stored, then I need to have them coordinate with one
    another. That's difficult enough, but in a world of containerized
    setups that coordination may even not be possible.

    But wait...it gets worse!

    The calls to filemap_check_errors can be buried pretty far down in the
    call stack, and there are internal callers of filemap_write_and_wait
    and the like that also end up clearing those errors. Many of those
    callers ignore the error return from that function or return it to
    userland at nonsensical times (e.g. truncate() or stat()). If I get
    back -EIO on a truncate, there is no reason to think that it was
    because some previous writeback failed, and a subsequent fsync() will
    (incorrectly) return 0.

    This pile aims to do three things:

    1) ensure that when a writeback error occurs that that error will be
    reported to userland on a subsequent fsync/fdatasync/msync call,
    regardless of what internal callers are doing

    2) report writeback errors on all file descriptions that were open at
    the time that the error occurred. This is a user-visible change,
    but I think most applications are written to assume this behavior
    anyway. Those that aren't are unlikely to be hurt by it.

    3) document what filesystems should do when there is a writeback
    error. Today, there is very little consistency between them, and a
    lot of cargo-cult copying. We need to make it very clear what
    filesystems should do in this situation.

    To achieve this, the set adds a new data type (errseq_t) and then
    builds new writeback error tracking infrastructure around that. Once
    all of that is in place, we change the filesystems to use the new
    infrastructure for reporting wb errors to userland.

    Note that this is just the initial foray into cleaning up this mess.
    There is a lot of work remaining here:

    1) convert the rest of the filesystems in a similar fashion. Once the
    initial set is in, then I think most other fs' will be fairly
    simple to convert. Hopefully most of those can in via individual
    filesystem trees.

    2) convert internal waiters on writeback to use errseq_t for
    detecting errors instead of relying on the AS_* flags. I have some
    draft patches for this for ext4, but they are not quite ready for
    prime time yet.

    This was a discussion topic this year at LSF/MM too. If you're
    interested in the gory details, LWN has some good articles about this:

    https://lwn.net/Articles/718734/
    https://lwn.net/Articles/724307/"

    * tag 'for-linus-v4.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux:
    btrfs: minimal conversion to errseq_t writeback error reporting on fsync
    xfs: minimal conversion to errseq_t writeback error reporting
    ext4: use errseq_t based error handling for reporting data writeback errors
    fs: convert __generic_file_fsync to use errseq_t based reporting
    block: convert to errseq_t based writeback error tracking
    dax: set errors in mapping when writeback fails
    Documentation: flesh out the section in vfs.txt on storing and reporting writeback errors
    mm: set both AS_EIO/AS_ENOSPC and errseq_t in mapping_set_error
    fs: new infrastructure for writeback error handling and reporting
    lib: add errseq_t type and infrastructure for handling it
    mm: don't TestClearPageError in __filemap_fdatawait_range
    mm: clear AS_EIO/AS_ENOSPC when writeback initiation fails
    jbd2: don't clear and reset errors after waiting on writeback
    buffer: set errors in mapping at the time that the error occurs
    fs: check for writeback errors after syncing out buffers in generic_file_fsync
    buffer: use mapping_set_error instead of setting the flag
    mm: fix mapping_set_error call in me_pagecache_dirty

    Linus Torvalds
     

06 Jul, 2017

1 commit

  • Resetting this flag is almost certainly racy, and will be problematic
    with some coming changes.

    Make filemap_fdatawait_keep_errors return int, but not clear the flag(s).
    Have jbd2 call it instead of filemap_fdatawait and don't attempt to
    re-set the error flag if it fails.

    Reviewed-by: Jan Kara
    Reviewed-by: Carlos Maiolino
    Signed-off-by: Jeff Layton

    Jeff Layton
     

04 Jul, 2017

1 commit

  • Pull documentation updates from Jonathan Corbet:
    "There has been a fair amount of activity in the docs tree this time
    around. Highlights include:

    - Conversion of a bunch of security documentation into RST

    - The conversion of the remaining DocBook templates by The Amazing
    Mauro Machine. We can now drop the entire DocBook build chain.

    - The usual collection of fixes and minor updates"

    * tag 'docs-4.13' of git://git.lwn.net/linux: (90 commits)
    scripts/kernel-doc: handle DECLARE_HASHTABLE
    Documentation: atomic_ops.txt is core-api/atomic_ops.rst
    Docs: clean up some DocBook loose ends
    Make the main documentation title less Geocities
    Docs: Use kernel-figure in vidioc-g-selection.rst
    Docs: fix table problems in ras.rst
    Docs: Fix breakage with Sphinx 1.5 and upper
    Docs: Include the Latex "ifthen" package
    doc/kokr/howto: Only send regression fixes after -rc1
    docs-rst: fix broken links to dynamic-debug-howto in kernel-parameters
    doc: Document suitability of IBM Verse for kernel development
    Doc: fix a markup error in coding-style.rst
    docs: driver-api: i2c: remove some outdated information
    Documentation: DMA API: fix a typo in a function name
    Docs: Insert missing space to separate link from text
    doc/ko_KR/memory-barriers: Update control-dependencies example
    Documentation, kbuild: fix typo "minimun" -> "minimum"
    docs: Fix some formatting issues in request-key.rst
    doc: ReSTify keys-trusted-encrypted.txt
    doc: ReSTify keys-request-key.txt
    ...

    Linus Torvalds
     

20 Jun, 2017

1 commit


22 May, 2017

1 commit

  • When a transaction starts, start_this_handle() saves current
    PF_MEMALLOC_NOFS value so that it can be restored at journal stop time.
    Journal restart is a special case that calls start_this_handle() without
    stopping the transaction. start_this_handle() isn't aware that the
    original value is already stored so it overwrites it with current value.

    For instance, a call sequence like below leaves PF_MEMALLOC_NOFS flag set
    at the end:

    jbd2_journal_start()
    jbd2__journal_restart()
    jbd2_journal_stop()

    Make jbd2__journal_restart() restore the original value before calling
    start_this_handle().

    Fixes: 81378da64de6 ("jbd2: mark the transaction context with the scope GFP_NOFS context")
    Signed-off-by: Tahsin Erdogan
    Signed-off-by: Theodore Ts'o
    Reviewed-by: Jan Kara

    Tahsin Erdogan
     

19 May, 2017

1 commit

  • Mauro says:

    This patch series convert the remaining DocBooks to ReST.

    The first version was originally
    send as 3 patch series:

    [PATCH 00/36] Convert DocBook documents to ReST
    [PATCH 0/5] Convert more books to ReST
    [PATCH 00/13] Get rid of DocBook

    The lsm book was added as if it were a text file under
    Documentation. The plan is to merge it with another file
    under Documentation/security, after both this series and
    a security Documentation patch series gets merged.

    It also adjusts some Sphinx-pedantic errors/warnings on
    some kernel-doc markups.

    I also added some patches here to add PDF output for all
    existing ReST books.

    Jonathan Corbet
     

16 May, 2017

2 commits


11 May, 2017

1 commit

  • Pull RCU updates from Ingo Molnar:
    "The main changes are:

    - Debloat RCU headers

    - Parallelize SRCU callback handling (plus overlapping patches)

    - Improve the performance of Tree SRCU on a CPU-hotplug stress test

    - Documentation updates

    - Miscellaneous fixes"

    * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (74 commits)
    rcu: Open-code the rcu_cblist_n_lazy_cbs() function
    rcu: Open-code the rcu_cblist_n_cbs() function
    rcu: Open-code the rcu_cblist_empty() function
    rcu: Separately compile large rcu_segcblist functions
    srcu: Debloat the header
    srcu: Adjust default auto-expediting holdoff
    srcu: Specify auto-expedite holdoff time
    srcu: Expedite first synchronize_srcu() when idle
    srcu: Expedited grace periods with reduced memory contention
    srcu: Make rcutorture writer stalls print SRCU GP state
    srcu: Exact tracking of srcu_data structures containing callbacks
    srcu: Make SRCU be built by default
    srcu: Fix Kconfig botch when SRCU not selected
    rcu: Make non-preemptive schedule be Tasks RCU quiescent state
    srcu: Expedite srcu_schedule_cbs_snp() callback invocation
    srcu: Parallelize callback handling
    kvm: Move srcu_struct fields to end of struct kvm
    rcu: Fix typo in PER_RCU_NODE_PERIOD header comment
    rcu: Use true/false in assignment to bool
    rcu: Use bool value directly
    ...

    Linus Torvalds
     

09 May, 2017

1 commit

  • Pull ext4 updates from Ted Ts'o:

    - add GETFSMAP support

    - some performance improvements for very large file systems and for
    random write workloads into a preallocated file

    - bug fixes and cleanups.

    * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
    jbd2: cleanup write flags handling from jbd2_write_superblock()
    ext4: mark superblock writes synchronous for nobarrier mounts
    ext4: inherit encryption xattr before other xattrs
    ext4: replace BUG_ON with WARN_ONCE in ext4_end_bio()
    ext4: avoid unnecessary transaction stalls during writeback
    ext4: preload block group descriptors
    ext4: make ext4_shutdown() static
    ext4: support GETFSMAP ioctls
    vfs: add common GETFSMAP ioctl definitions
    ext4: evict inline data when writing to memory map
    ext4: remove ext4_xattr_check_entry()
    ext4: rename ext4_xattr_check_names() to ext4_xattr_check_entries()
    ext4: merge ext4_xattr_list() into ext4_listxattr()
    ext4: constify static data that is never modified
    ext4: trim return value and 'dir' argument from ext4_insert_dentry()
    jbd2: fix dbench4 performance regression for 'nobarrier' mounts
    jbd2: Fix lockdep splat with generic/270 test
    mm: retry writepages() on ENOMEM when doing an data integrity writeback

    Linus Torvalds
     

04 May, 2017

3 commits

  • Currently jbd2_write_superblock() silently adds REQ_SYNC to flags with
    which journal superblock is written. Make this explicit by making flags
    passed down to jbd2_write_superblock() contain REQ_SYNC.

    CC: linux-ext4@vger.kernel.org
    Signed-off-by: Jan Kara
    Signed-off-by: Theodore Ts'o

    Jan Kara
     
  • kjournald2 is central to the transaction commit processing. As such any
    potential allocation from this kernel thread has to be GFP_NOFS. Make
    sure to mark the whole kernel thread GFP_NOFS by the memalloc_nofs_save.

    [akpm@linux-foundation.org: coding-style fixes]
    Link: http://lkml.kernel.org/r/20170306131408.9828-8-mhocko@kernel.org
    Signed-off-by: Michal Hocko
    Suggested-by: Jan Kara
    Reviewed-by: Jan Kara
    Cc: Dave Chinner
    Cc: Theodore Ts'o
    Cc: Chris Mason
    Cc: David Sterba
    Cc: Brian Foster
    Cc: Darrick J. Wong
    Cc: Nikolay Borisov
    Cc: Peter Zijlstra
    Cc: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
     
  • now that we have memalloc_nofs_{save,restore} api we can mark the whole
    transaction context as implicitly GFP_NOFS. All allocations will
    automatically inherit GFP_NOFS this way. This means that we do not have
    to mark any of those requests with GFP_NOFS and moreover all the
    ext4_kv[mz]alloc(GFP_NOFS) are also safe now because even the hardcoded
    GFP_KERNEL allocations deep inside the vmalloc will be NOFS now.

    [akpm@linux-foundation.org: tweak comments]
    Link: http://lkml.kernel.org/r/20170306131408.9828-7-mhocko@kernel.org
    Signed-off-by: Michal Hocko
    Reviewed-by: Jan Kara
    Cc: Dave Chinner
    Cc: Theodore Ts'o
    Cc: Chris Mason
    Cc: David Sterba
    Cc: Brian Foster
    Cc: Darrick J. Wong
    Cc: Nikolay Borisov
    Cc: Peter Zijlstra
    Cc: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
     

30 Apr, 2017

2 commits

  • Commit b685d3d65ac7 "block: treat REQ_FUA and REQ_PREFLUSH as
    synchronous" removed REQ_SYNC flag from WRITE_FUA implementation. Since
    JBD2 strips REQ_FUA and REQ_FLUSH flags from submitted IO when the
    filesystem is mounted with nobarrier mount option, journal superblock
    writes ended up being async writes after this patch and that caused
    heavy performance regression for dbench4 benchmark with high number of
    processes. In my test setup with HP RAID array with non-volatile write
    cache and 32 GB ram, dbench4 runs with 8 processes regressed by ~25%.

    Fix the problem by making sure journal superblock writes are always
    treated as synchronous since they generally block progress of the
    journalling machinery and thus the whole filesystem.

    Fixes: b685d3d65ac791406e0dfd8779cc9b3707fea5a3
    CC: stable@vger.kernel.org
    Signed-off-by: Jan Kara
    Signed-off-by: Theodore Ts'o

    Jan Kara
     
  • I've hit a lockdep splat with generic/270 test complaining that:

    3216.fsstress.b/3533 is trying to acquire lock:
    (jbd2_handle){++++..}, at: [] jbd2_log_wait_commit+0x0/0x150

    but task is already holding lock:
    (jbd2_handle){++++..}, at: [] start_this_handle+0x35b/0x850

    The underlying problem is that jbd2_journal_force_commit_nested()
    (called from ext4_should_retry_alloc()) may get called while a
    transaction handle is started. In such case it takes care to not wait
    for commit of the running transaction (which would deadlock) but only
    for a commit of a transaction that is already committing (which is safe
    as that doesn't wait for any filesystem locks).

    In fact there are also other callers of jbd2_log_wait_commit() that take
    care to pass tid of a transaction that is already committing and for
    those cases, the lockdep instrumentation is too restrictive and leading
    to false positive reports. Fix the problem by calling
    jbd2_might_wait_for_commit() from jbd2_log_wait_commit() only if the
    transaction isn't already committing.

    Fixes: 1eaa566d368b214d99cbb973647c1b0b8102a9ae
    Signed-off-by: Jan Kara
    Signed-off-by: Theodore Ts'o

    Jan Kara
     

23 Apr, 2017

1 commit


19 Apr, 2017

1 commit

  • A group of Linux kernel hackers reported chasing a bug that resulted
    from their assumption that SLAB_DESTROY_BY_RCU provided an existence
    guarantee, that is, that no block from such a slab would be reallocated
    during an RCU read-side critical section. Of course, that is not the
    case. Instead, SLAB_DESTROY_BY_RCU only prevents freeing of an entire
    slab of blocks.

    However, there is a phrase for this, namely "type safety". This commit
    therefore renames SLAB_DESTROY_BY_RCU to SLAB_TYPESAFE_BY_RCU in order
    to avoid future instances of this sort of confusion.

    Signed-off-by: Paul E. McKenney
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Cc: Andrew Morton
    Cc:
    Acked-by: Johannes Weiner
    Acked-by: Vlastimil Babka
    [ paulmck: Add comments mentioning the old name, as requested by Eric
    Dumazet, in order to help people familiar with the old name find
    the new one. ]
    Acked-by: David Rientjes

    Paul E. McKenney
     

16 Mar, 2017

1 commit

  • In journal_init_common(), if we failed to allocate the j_wbuf array, or
    if we failed to create the buffer_head for the journal superblock, we
    leaked the memory allocated for the revocation tables. Fix this.

    Cc: stable@vger.kernel.org # 4.9
    Fixes: f0c9fd5458bacf7b12a9a579a727dc740cbe047e
    Signed-off-by: Eric Biggers
    Signed-off-by: Theodore Ts'o
    Reviewed-by: Jan Kara

    Eric Biggers
     

21 Feb, 2017

1 commit

  • Pull ext4 updates from Ted Ts'o:
    "For this cycle we add support for the shutdown ioctl, which is
    primarily used for testing, but which can be useful on production
    systems when a scratch volume is being destroyed and the data on it
    doesn't need to be saved.

    This found (and we fixed) a number of bugs with ext4's recovery to
    corrupted file system --- the bugs increased the amount of data that
    could be potentially lost, and in the case of the inline data feature,
    could cause the kernel to BUG.

    Also included are a number of other bug fixes, including in ext4's
    fscrypt, DAX, inline data support"

    * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (26 commits)
    ext4: rename EXT4_IOC_GOINGDOWN to EXT4_IOC_SHUTDOWN
    ext4: fix fencepost in s_first_meta_bg validation
    ext4: don't BUG when truncating encrypted inodes on the orphan list
    ext4: do not use stripe_width if it is not set
    ext4: fix stripe-unaligned allocations
    dax: assert that i_rwsem is held exclusive for writes
    ext4: fix DAX write locking
    ext4: add EXT4_IOC_GOINGDOWN ioctl
    ext4: add shutdown bit and check for it
    ext4: rename s_resize_flags to s_ext4_flags
    ext4: return EROFS if device is r/o and journal replay is needed
    ext4: preserve the needs_recovery flag when the journal is aborted
    jbd2: don't leak modified metadata buffers on an aborted journal
    ext4: fix inline data error paths
    ext4: move halfmd4 into hash.c directly
    ext4: fix use-after-iput when fscrypt contexts are inconsistent
    jbd2: fix use after free in kjournald2()
    ext4: fix data corruption in data=journal mode
    ext4: trim allocation requests to group size
    ext4: replace BUG_ON with WARN_ON in mb_find_extent()
    ...

    Linus Torvalds
     

05 Feb, 2017

1 commit