15 Nov, 2020

1 commit

  • Though the problem was found on an older 4.1.12 kernel, I think upstream
    has the same issue.

    On one node in the cluster, there is the following call trace:

    # cat /proc/21473/stack
    __ocfs2_cluster_lock.isra.36+0x336/0x9e0 [ocfs2]
    ocfs2_inode_lock_full_nested+0x121/0x520 [ocfs2]
    ocfs2_evict_inode+0x152/0x820 [ocfs2]
    evict+0xae/0x1a0
    iput+0x1c6/0x230
    ocfs2_orphan_filldir+0x5d/0x100 [ocfs2]
    ocfs2_dir_foreach_blk+0x490/0x4f0 [ocfs2]
    ocfs2_dir_foreach+0x29/0x30 [ocfs2]
    ocfs2_recover_orphans+0x1b6/0x9a0 [ocfs2]
    ocfs2_complete_recovery+0x1de/0x5c0 [ocfs2]
    process_one_work+0x169/0x4a0
    worker_thread+0x5b/0x560
    kthread+0xcb/0xf0
    ret_from_fork+0x61/0x90

    The above stack is not reasonable: the final iput() should not happen in
    ocfs2_orphan_filldir(). Looking at the code:

    2067 /* Skip inodes which are already added to recover list, since dio may
    2068  * happen concurrently with unlink/rename */
    2069 if (OCFS2_I(iter)->ip_next_orphan) {
    2070         iput(iter);
    2071         return 0;
    2072 }
    2073

    The logic assumes the inode is already in the recover list when it sees a
    non-NULL ip_next_orphan, so it skips this inode after dropping the
    reference that was taken in ocfs2_iget().

    However, if the inode were already in the recover list, it would hold
    another reference, and the iput() at line 2070 would not be the final
    iput (dropping the last reference). So I don't think the inode is really
    in the recover list (there is no vmcore to confirm).

    Note that ocfs2_queue_orphans(), though it does not show up in the call
    trace, holds the cluster lock on the orphan directory while looking up
    unlinked inodes. On-disk inode eviction can involve a lot of I/O that may
    take a long time to finish. That means this node could hold the cluster
    lock for a very long time, which can make lock requests from other nodes
    on the orphan directory hang for a long time.

    Looking more closely at ip_next_orphan, I found it is not initialized
    when a new ocfs2_inode_info structure is allocated.

    This causes reflink operations on some nodes to hang for a very long
    time waiting for the cluster lock on the orphan directory.

    Fix: initialize ip_next_orphan as NULL.
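    A minimal userspace sketch of why the explicit initialization matters,
    assuming a hypothetical trimmed-down struct fake_inode_info and a
    hypothetical alloc_inode_info() in place of ocfs2's real slab-backed
    allocation. The already_queued() test is a simplification of the
    filldir check quoted above, not the real recover-list logic.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct ocfs2_inode_info. */
struct fake_inode_info {
	unsigned long blkno;
	struct fake_inode_info *next_orphan; /* plays the role of ip_next_orphan */
};

/*
 * malloc(), like a slab cache allocation without zeroing, may return
 * recycled memory, so next_orphan is cleared explicitly rather than
 * inheriting a stale pointer from a previous user of the object.
 */
static struct fake_inode_info *alloc_inode_info(unsigned long blkno)
{
	struct fake_inode_info *oi = malloc(sizeof(*oi));

	if (!oi)
		return NULL;
	oi->blkno = blkno;
	oi->next_orphan = NULL; /* the fix: never leave this uninitialized */
	return oi;
}

/* Simplified filldir-style check: non-NULL means "already queued". */
static int already_queued(const struct fake_inode_info *oi)
{
	return oi->next_orphan != NULL;
}
```

    With garbage in next_orphan, already_queued() could return true for an
    inode that was never queued, which matches the misbehaviour described
    above.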

    Signed-off-by: Wengang Wang
    Signed-off-by: Andrew Morton
    Reviewed-by: Joseph Qi
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Junxiao Bi
    Cc: Changwei Ge
    Cc: Gang He
    Cc: Jun Piao
    Cc:
    Link: https://lkml.kernel.org/r/20201109171746.27884-1-wen.gang.wang@oracle.com
    Signed-off-by: Linus Torvalds

    Wengang Wang
     

07 Nov, 2020

1 commit

  • The on-disk superblock field sb->s_maxlen represents the total size of
    the journal, including the fast commit area, and is no longer the
    maximum number of blocks available for a transaction: that maximum is
    reduced by the number of fast commit blocks. So this patch renames
    j_maxlen to j_total_len to better represent its intent. It also adds a
    function to calculate the maximum number of buffers available for a
    transaction.
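    A hedged sketch of such a helper, using a hypothetical struct fc_journal
    rather than jbd2's journal_t. The text above only says the fast commit
    blocks are subtracted; the divide-by-4 factor is an assumption based on
    jbd2's long-standing rule of capping a transaction at a quarter of the
    journal.

```c
/* Hypothetical, simplified journal descriptor (not jbd2's journal_t). */
struct fc_journal {
	unsigned int total_len; /* like j_total_len: whole journal incl. fast commit area */
	unsigned int fc_blocks; /* blocks reserved for fast commits */
};

/*
 * Buffers usable by a regular transaction: subtract the fast commit
 * area first, then apply the (assumed) quarter-of-the-journal cap.
 */
static unsigned int max_txn_bufs(const struct fc_journal *j)
{
	if (j->fc_blocks >= j->total_len)
		return 0; /* guard against unsigned underflow */
	return (j->total_len - j->fc_blocks) / 4;
}
```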

    Suggested-by: Jan Kara
    Signed-off-by: Harshad Shirwadkar
    Link: https://lore.kernel.org/r/20201106035911.1942128-6-harshadshirwadkar@gmail.com
    Signed-off-by: Theodore Ts'o

    Harshad Shirwadkar
     

23 Oct, 2020

1 commit

  • Pull ext4 updates from Ted Ts'o:
    "The significant new ext4 feature this time around is Harshad's new
    fast_commit mode.

    In addition, thanks to Mauricio for fixing a race where mmap'ed pages
    that are being changed in parallel with a data=journal transaction
    commit could result in bad checksums in the journal, which could cause
    journal replays to fail.

    Also notable is Ritesh's buffered write optimization which can result
    in significant improvements on parallel write workloads. (The kernel
    test robot reported a 330.6% improvement on fio.write_iops on a 96
    core system using DAX)

    Besides that, we have the usual miscellaneous cleanups and bug fixes"

    Link: https://lore.kernel.org/r/20200925071217.GO28663@shao2-debian

    * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (46 commits)
    ext4: fix invalid inode checksum
    ext4: add fast commit stats in procfs
    ext4: add a mount opt to forcefully turn fast commits on
    ext4: fast commit recovery path
    jbd2: fast commit recovery path
    ext4: main fast-commit commit path
    jbd2: add fast commit machinery
    ext4 / jbd2: add fast commit initialization
    ext4: add fast_commit feature and handling for extended mount options
    doc: update ext4 and journalling docs to include fast commit feature
    ext4: Detect already used quota file early
    jbd2: avoid transaction reuse after reformatting
    ext4: use the normal helper to get the actual inode
    ext4: fix bs < ps issue reported with dioread_nolock mount opt
    ext4: data=journal: write-protect pages on j_submit_inode_data_buffers()
    ext4: data=journal: fixes for ext4_page_mkwrite()
    jbd2, ext4, ocfs2: introduce/use journal callbacks j_submit|finish_inode_data_buffers()
    jbd2: introduce/export functions jbd2_journal_submit|finish_inode_data_buffers()
    ext4: introduce ext4_sb_bread_unmovable() to replace sb_bread_unmovable()
    ext4: use ext4_sb_bread() instead of sb_bread()
    ...

    Linus Torvalds
     

18 Oct, 2020

1 commit

  • Introduce journal callbacks to allow different behaviors
    for an inode in journal_submit|finish_inode_data_buffers().

    The existing users of the current behavior (ext4, ocfs2)
    are adapted to use the previously exported functions
    that implement the current behavior.

    Users are callers of jbd2_journal_inode_ranged_write|wait(), which add
    the inode to the transaction's inode list with the JI_WRITE|WAIT_DATA
    flags. Only ext4 and ocfs2 do so in-tree.

    Both CONFIG_EXT4_FS and CONFIG_OCFS2_FS select CONFIG_JBD2, which builds
    fs/jbd2/commit.c and journal.c that define and export the functions, so
    we can call them directly in ext4/ocfs2.
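    The callback pattern can be sketched in userspace as follows. The types
    struct cb_jinode and struct cb_journal and the functions default_submit(),
    default_finish() and commit_inode_data() are hypothetical stand-ins; the
    real code uses struct jbd2_inode, journal_t and the exported
    jbd2_journal_submit|finish_inode_data_buffers() helpers. For brevity the
    sketch runs both phases for one inode, and treats an unset hook as a
    no-op.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct jbd2_inode. */
struct cb_jinode {
	int ino;
	int submitted;
	int waited;
};

/* Hypothetical journal with per-filesystem hooks. */
struct cb_journal {
	int (*submit_inode_data_buffers)(struct cb_jinode *);
	int (*finish_inode_data_buffers)(struct cb_jinode *);
};

/* "Current behaviour" defaults that ext4/ocfs2 would plug in. */
static int default_submit(struct cb_jinode *ji) { ji->submitted = 1; return 0; }
static int default_finish(struct cb_jinode *ji) { ji->waited = 1; return 0; }

/* Commit code calls through the hooks instead of hard-coding behaviour. */
static int commit_inode_data(struct cb_journal *j, struct cb_jinode *ji)
{
	int err = 0;

	if (j->submit_inode_data_buffers)
		err = j->submit_inode_data_buffers(ji);
	if (!err && j->finish_inode_data_buffers)
		err = j->finish_inode_data_buffers(ji);
	return err;
}
```

    A filesystem that needs different handling (for example, write-protecting
    pages for data=journal) would install its own functions instead of the
    defaults.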

    Signed-off-by: Mauricio Faria de Oliveira
    Suggested-by: Jan Kara
    Reviewed-by: Jan Kara
    Reviewed-by: Andreas Dilger
    Link: https://lore.kernel.org/r/20201006004841.600488-3-mfo@canonical.com
    Signed-off-by: Theodore Ts'o

    Mauricio Faria de Oliveira
     

14 Oct, 2020

2 commits

  • When we discard unused blocks on a mounted ocfs2 filesystem, fstrim
    handles each block group by locking and unlocking the global bitmap
    meta-file repeatedly. We should let the fstrim thread take a break (if
    needed) between unlock and lock. This avoids a potential soft lockup and
    also gives upper-level applications more I/O opportunities, so they are
    not blocked for too long when writing files.
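    The loop shape can be sketched in userspace, assuming hypothetical
    lock_global_bitmap()/unlock_global_bitmap() stand-ins for the cluster
    lock and a placeholder trim_group(). In the kernel the break would
    presumably be cond_resched(); sched_yield() is the nearest userspace
    analogue.

```c
#include <sched.h>

/* Hypothetical stand-ins for the global bitmap cluster lock. */
static int bitmap_locked;
static unsigned int lock_count;

static void lock_global_bitmap(void)   { bitmap_locked = 1; lock_count++; }
static void unlock_global_bitmap(void) { bitmap_locked = 0; }

/* Hypothetical per-group trim; returns clusters trimmed (placeholder). */
static unsigned long trim_group(unsigned int group)
{
	return group + 1;
}

/* Lock, trim one group, unlock, then yield before the next group. */
static unsigned long trim_all_groups(unsigned int ngroups)
{
	unsigned long total = 0;

	for (unsigned int g = 0; g < ngroups; g++) {
		lock_global_bitmap();
		total += trim_group(g);
		unlock_global_bitmap();
		sched_yield(); /* give other lock waiters and writers a chance */
	}
	return total;
}
```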

    Signed-off-by: Gang He
    Signed-off-by: Andrew Morton
    Reviewed-by: Joseph Qi
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Junxiao Bi
    Cc: Changwei Ge
    Cc: Jun Piao
    Link: https://lkml.kernel.org/r/20200927015815.14904-1-ghe@suse.com
    Signed-off-by: Linus Torvalds

    Gang He
     
  • Drop duplicated words {the, and} in comments.

    Signed-off-by: Randy Dunlap
    Signed-off-by: Andrew Morton
    Acked-by: Joseph Qi
    Cc: Mark Fasheh
    Cc: Joel Becker
    Link: https://lkml.kernel.org/r/20200811021845.25134-1-rdunlap@infradead.org
    Signed-off-by: Linus Torvalds

    Randy Dunlap
     

24 Sep, 2020

1 commit


24 Aug, 2020

1 commit

  • Replace the existing /* fall through */ comments and their variants with
    the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary
    fall-through markings where applicable.

    [1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through
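    A sketch of the pseudo-keyword in use. The macro definition approximates
    how the kernel spells it (the GCC/Clang statement attribute when
    available, otherwise an empty statement); steps_from() is a hypothetical
    example function.

```c
/* Approximation of the kernel's fallthrough pseudo-keyword. */
#if defined(__has_attribute)
# if __has_attribute(__fallthrough__)
#  define fallthrough __attribute__((__fallthrough__))
# endif
#endif
#ifndef fallthrough
# define fallthrough do {} while (0) /* fallthrough */
#endif

/* Count the setup steps that run when starting at a given level. */
static int steps_from(int level)
{
	int steps = 0;

	switch (level) {
	case 2:
		steps++;
		fallthrough; /* deliberate: level 2 also does level 1 work */
	case 1:
		steps++;
		fallthrough;
	case 0:
		steps++;
		break;
	default:
		steps = -1;
		break;
	}
	return steps;
}
```

    Unlike a comment, the macro is visible to the compiler, so
    -Wimplicit-fallthrough can tell a deliberate fall-through from a missing
    break.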

    Signed-off-by: Gustavo A. R. Silva

    Gustavo A. R. Silva
     

08 Aug, 2020

6 commits

  • Pull misc vfs updates from Al Viro:
    "No common topic whatsoever in those, sorry"

    * 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    fs: define inode flags using bit numbers
    iov_iter: Move unnecessary inclusion of crypto/hash.h
    dlmfs: clean up dlmfs_file_{read,write}() a bit

    Linus Torvalds
     
  • Depending on what fails, the function can return with nfs_sync_rwlock
    either locked or unlocked. That cannot be right.

    Always return with the lock unlocked on error.
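    One common way to guarantee this is a single exit label that every error
    path jumps to. In this sketch, sync_lock()/sync_unlock() are hypothetical
    stand-ins that only count lock depth, and step_a()/step_b() are
    hypothetical lookup steps; it does not reproduce the actual ocfs2
    function.

```c
#include <errno.h>

/* Hypothetical stand-ins for taking/dropping nfs_sync_rwlock. */
static int sync_lock_depth;
static void sync_lock(void)   { sync_lock_depth++; }
static void sync_unlock(void) { sync_lock_depth--; }

/* Hypothetical lookup steps; negative return is an errno-style error. */
static int step_a(int fail_at) { return fail_at == 1 ? -EIO : 0; }
static int step_b(int fail_at) { return fail_at == 2 ? -ESTALE : 0; }

/* Every failure funnels through one label, so the lock is always dropped. */
static int lookup_with_lock(int fail_at)
{
	int status;

	sync_lock();
	status = step_a(fail_at);
	if (status)
		goto unlock;
	status = step_b(fail_at);
	if (status)
		goto unlock;
	/* success path: remaining work done under the lock */
unlock:
	sync_unlock();
	return status;
}
```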

    Fixes: 4cd9973f9ff6 ("ocfs2: avoid inode removal while nfsd is accessing it")
    Signed-off-by: Pavel Machek (CIP)
    Signed-off-by: Andrew Morton
    Reviewed-by: Joseph Qi
    Reviewed-by: Andrew Morton
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Junxiao Bi
    Cc: Changwei Ge
    Cc: Gang He
    Cc: Jun Piao
    Link: http://lkml.kernel.org/r/20200724124443.GA28164@duo.ucw.cz
    Signed-off-by: Linus Torvalds

    Pavel Machek
     
  • Rationale:
    Reduces attack surface on kernel devs opening the links for MITM
    as HTTPS traffic is much harder to manipulate.

    Deterministic algorithm:
    For each file:
      If not .svg:
        For each line:
          If doesn't contain `xmlns`:
            For each link, `http://[^# ]*(?:\w|/)`:
              If neither `gnu\.org/license`, nor `mozilla\.org/MPL`:
                If both the HTTP and HTTPS versions
                return 200 OK and serve the same content:
                  Replace HTTP with HTTPS.
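    The per-line part of the algorithm can be sketched in C. This is a
    hypothetical illustration, not the script that was actually used: the
    ".svg" file filter is left to the caller, a link is approximated as
    running to the next space or '#', and the "both versions return 200 OK
    with the same content" check, which needs network access, is assumed to
    pass.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True if the link text [p, p + len) contains needle. */
static bool link_contains(const char *p, size_t len, const char *needle)
{
	size_t n = strlen(needle);

	for (size_t i = 0; i + n <= len; i++)
		if (memcmp(p + i, needle, n) == 0)
			return true;
	return false;
}

/*
 * Copy line to out, upgrading http:// links to https:// unless the line
 * contains "xmlns" or the link is a gnu.org/license or mozilla.org/MPL
 * URL. out needs room for strlen(line) + 1 plus one byte per upgrade.
 */
static void upgrade_line(const char *line, char *out)
{
	if (strstr(line, "xmlns")) {
		strcpy(out, line);
		return;
	}
	while (*line) {
		if (strncmp(line, "http://", 7) == 0) {
			size_t len = strcspn(line, " #"); /* approximate link end */

			if (!link_contains(line, len, "gnu.org/license") &&
			    !link_contains(line, len, "mozilla.org/MPL")) {
				memcpy(out, "https://", 8);
				out += 8;
				line += 7;
				continue;
			}
		}
		*out++ = *line++;
	}
	*out = '\0';
}
```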

    Signed-off-by: Alexander A. Klimov
    Signed-off-by: Andrew Morton
    Acked-by: Joseph Qi
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Junxiao Bi
    Cc: Changwei Ge
    Cc: Gang He
    Cc: Jun Piao
    Link: http://lkml.kernel.org/r/20200713174456.36596-1-grandmaster@al2klimov.de
    Signed-off-by: Linus Torvalds

    Alexander A. Klimov
     
  • Dan Carpenter reported the following static checker warning.

    fs/ocfs2/super.c:1269 ocfs2_parse_options() warn: '(-1)' 65535 can't fit into 32767 'mopt->slot'
    fs/ocfs2/suballoc.c:859 ocfs2_init_inode_steal_slot() warn: '(-1)' 65535 can't fit into 32767 'osb->s_inode_steal_slot'
    fs/ocfs2/suballoc.c:867 ocfs2_init_meta_steal_slot() warn: '(-1)' 65535 can't fit into 32767 'osb->s_meta_steal_slot'

    That's because OCFS2_INVALID_SLOT is (u16)-1. A slot number in ocfs2 can
    never be negative, so change s16 to u16.
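    A small userspace illustration of what the checker is warning about. It
    assumes the usual GCC/Clang behaviour for the implementation-defined
    conversion of 65535 to int16_t (the value wraps to -1); the helper names
    are hypothetical. Once the sentinel is stored in a signed 16-bit field,
    it no longer compares equal to OCFS2_INVALID_SLOT.

```c
#include <stdint.h>

/* Same shape as OCFS2_INVALID_SLOT: (u16)-1, i.e. 65535. */
#define INVALID_SLOT ((uint16_t)-1)

/* Both operands are promoted to int before comparing. */
static int slot_is_invalid_s16(int16_t slot)  { return slot == INVALID_SLOT; }
static int slot_is_invalid_u16(uint16_t slot) { return slot == INVALID_SLOT; }
```

    With an s16 field the comparison is effectively -1 == 65535, which is
    false; with a u16 field it is 65535 == 65535.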

    Fixes: 9277f8334ffc ("ocfs2: fix value of OCFS2_INVALID_SLOT")
    Reported-by: Dan Carpenter
    Signed-off-by: Junxiao Bi
    Signed-off-by: Andrew Morton
    Reviewed-by: Joseph Qi
    Reviewed-by: Gang He
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Junxiao Bi
    Cc: Changwei Ge
    Cc: Jun Piao
    Cc:
    Link: http://lkml.kernel.org/r/20200627001259.19757-1-junxiao.bi@oracle.com
    Signed-off-by: Linus Torvalds

    Junxiao Bi
     
  • Drop the repeated word "is" in a comment.

    Signed-off-by: Randy Dunlap
    Signed-off-by: Andrew Morton
    Acked-by: Joseph Qi
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Joseph Qi
    Link: http://lkml.kernel.org/r/20200720001421.28823-1-rdunlap@infradead.org
    Signed-off-by: Linus Torvalds

    Randy Dunlap
     
  • When the setfacl command is used to change a file's ACL, the user cannot
    get the latest ACL information for the file via the getfacl command
    until the file system is remounted.

    e.g.
    setfacl -m u:ivan:rw /ocfs2/ivan
    getfacl /ocfs2/ivan
    getfacl: Removing leading '/' from absolute path names
    # file: ocfs2/ivan
    # owner: root
    # group: root
    user::rw-
    group::r--
    mask::r--
    other::r--

    The latest ACL entry ("u:ivan:rw") is not returned by the getfacl
    command until the file system is remounted.

    Signed-off-by: Gang He
    Signed-off-by: Andrew Morton
    Reviewed-by: Joseph Qi
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Junxiao Bi
    Cc: Changwei Ge
    Cc: Jun Piao
    Link: http://lkml.kernel.org/r/20200717023751.9922-1-ghe@suse.com
    Signed-off-by: Linus Torvalds

    Gang He
     

17 Jul, 2020

1 commit

  • Using uninitialized_var() is dangerous as it papers over real bugs[1]
    (or can in the future), and suppresses unrelated compiler warnings
    (e.g. "unused variable"). If the compiler thinks it is uninitialized,
    either simply initialize the variable or make compiler changes.

    In preparation for removing[2] the[3] macro[4], remove all remaining
    needless uses with the following script:

    git grep '\buninitialized_var\b' | cut -d: -f1 | sort -u | \
    xargs perl -pi -e \
    's/\buninitialized_var\(([^\)]+)\)/\1/g;
    s:\s*/\* (GCC be quiet|to make compiler happy) \*/$::g;'

    drivers/video/fbdev/riva/riva_hw.c was manually tweaked to avoid
    pathological white-space.

    No outstanding warnings were found building allmodconfig with GCC 9.3.0
    for x86_64, i386, arm64, arm, powerpc, powerpc64le, s390x, mips, sparc64,
    alpha, and m68k.

    [1] https://lore.kernel.org/lkml/20200603174714.192027-1-glider@google.com/
    [2] https://lore.kernel.org/lkml/CA+55aFw+Vbj0i=1TGqCR5vQkCzWJ0QxK6CernOU6eedsudAixw@mail.gmail.com/
    [3] https://lore.kernel.org/lkml/CA+55aFwgbgqhbp1fkxvRKEpzyR5J8n1vKT1VZdz9knmPuXhOeg@mail.gmail.com/
    [4] https://lore.kernel.org/lkml/CA+55aFz2500WfbKXAx8s67wrm9=yVJu65TpLgN_ybYNv0VEOKA@mail.gmail.com/

    Reviewed-by: Leon Romanovsky # drivers/infiniband and mlx4/mlx5
    Acked-by: Jason Gunthorpe # IB
    Acked-by: Kalle Valo # wireless drivers
    Reviewed-by: Chao Yu # erofs
    Signed-off-by: Kees Cook

    Kees Cook
     

26 Jun, 2020

4 commits

  • In the ocfs2 disk layout the slot number is 16 bits, but in the ocfs2
    implementation it is 32 bits. Usually this does not cause any issue,
    because the slot number is converted from u16 to u32. However,
    OCFS2_INVALID_SLOT was defined as -1: when an invalid slot number was
    read from disk, its value was (u16)-1, i.e. 65535, and after conversion
    to u32 it stayed 65535 rather than becoming (u32)-1. As a result, the
    following check in get_local_system_inode() is always skipped:

    static struct inode **get_local_system_inode(struct ocfs2_super *osb,
                                                 int type,
                                                 u32 slot)
    {
            BUG_ON(slot == OCFS2_INVALID_SLOT);
            ...
    }
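    The conversion problem can be reproduced in a few lines of userspace C.
    The macro and helper names below are hypothetical; the "new" definition
    follows the (u16)-1 form that OCFS2_INVALID_SLOT has after the fix, as
    noted in the static-checker commit above.

```c
#include <stdint.h>

#define INVALID_SLOT_OLD (-1)            /* before: plain int -1 */
#define INVALID_SLOT_NEW ((uint16_t)-1)  /* after: on-disk width, 65535 */

/* A 16-bit on-disk slot widened to 32 bits, as the implementation does. */
static uint32_t widen_slot(uint16_t on_disk) { return on_disk; }

/* -1 converts to 0xFFFFFFFF here, so 65535 never matches. */
static int is_invalid_old(uint32_t slot) { return slot == INVALID_SLOT_OLD; }
/* (uint16_t)-1 promotes to 65535, which does match. */
static int is_invalid_new(uint32_t slot) { return slot == INVALID_SLOT_NEW; }
```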

    Link: http://lkml.kernel.org/r/20200616183829.87211-5-junxiao.bi@oracle.com
    Signed-off-by: Junxiao Bi
    Reviewed-by: Joseph Qi
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Changwei Ge
    Cc: Gang He
    Cc: Jun Piao
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Junxiao Bi
     
  • The following kernel panic was captured when running an NFS server over
    ocfs2. At that time, ocfs2_test_inode_bit() was checking whether the
    inode located at "blkno" 5, which is the ocfs2 root inode, was valid. Its
    "suballoc_slot" was OCFS2_INVALID_SLOT (65535) and it was allocated from
    //global_inode_alloc, but the code wrongly assumed it came from a per-slot
    inode allocator, which caused an array overflow and triggered a kernel
    panic.

    BUG: unable to handle kernel paging request at 0000000000001088
    IP: [] _raw_spin_lock+0x18/0xf0
    PGD 1e06ba067 PUD 1e9e7d067 PMD 0
    Oops: 0002 [#1] SMP
    CPU: 6 PID: 24873 Comm: nfsd Not tainted 4.1.12-124.36.1.el6uek.x86_64 #2
    Hardware name: Huawei CH121 V3/IT11SGCA1, BIOS 3.87 02/02/2018
    RIP: _raw_spin_lock+0x18/0xf0
    RSP: e02b:ffff88005ae97908 EFLAGS: 00010206
    RAX: ffff88005ae98000 RBX: 0000000000001088 RCX: 0000000000000000
    RDX: 0000000000020000 RSI: 0000000000000009 RDI: 0000000000001088
    RBP: ffff88005ae97928 R08: 0000000000000000 R09: ffff880212878e00
    R10: 0000000000007ff0 R11: 0000000000000000 R12: 0000000000001088
    R13: ffff8800063c0aa8 R14: ffff8800650c27d0 R15: 000000000000ffff
    FS: 0000000000000000(0000) GS:ffff880218180000(0000) knlGS:ffff880218180000
    CS: e033 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 0000000000001088 CR3: 00000002033d0000 CR4: 0000000000042660
    Call Trace:
    igrab+0x1e/0x60
    ocfs2_get_system_file_inode+0x63/0x3a0 [ocfs2]
    ocfs2_test_inode_bit+0x328/0xa00 [ocfs2]
    ocfs2_get_parent+0xba/0x3e0 [ocfs2]
    reconnect_path+0xb5/0x300
    exportfs_decode_fh+0xf6/0x2b0
    fh_verify+0x350/0x660 [nfsd]
    nfsd4_putfh+0x4d/0x60 [nfsd]
    nfsd4_proc_compound+0x3d3/0x6f0 [nfsd]
    nfsd_dispatch+0xe0/0x290 [nfsd]
    svc_process_common+0x412/0x6a0 [sunrpc]
    svc_process+0x123/0x210 [sunrpc]
    nfsd+0xff/0x170 [nfsd]
    kthread+0xcb/0xf0
    ret_from_fork+0x61/0x90
    Code: 83 c2 02 0f b7 f2 e8 18 dc 91 ff 66 90 eb bf 0f 1f 40 00 55 48 89 e5 41 56 41 55 41 54 53 0f 1f 44 00 00 48 89 fb ba 00 00 02 00 0f c1 17 89 d0 45 31 e4 45 31 ed c1 e8 10 66 39 d0 41 89 c6
    RIP _raw_spin_lock+0x18/0xf0
    CR2: 0000000000001088
    ---[ end trace 7264463cd1aac8f9 ]---
    Kernel panic - not syncing: Fatal exception

    Link: http://lkml.kernel.org/r/20200616183829.87211-4-junxiao.bi@oracle.com
    Signed-off-by: Junxiao Bi
    Reviewed-by: Joseph Qi
    Cc: Changwei Ge
    Cc: Gang He
    Cc: Joel Becker
    Cc: Jun Piao
    Cc: Mark Fasheh
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Junxiao Bi
     
  • Set global_inode_alloc as OCFS2_FIRST_ONLINE_SYSTEM_INODE so that it is
    loaded during mount. It can then be used to test whether some
    global/system inodes are valid. One use case is nfsd testing whether the
    root inode is valid.

    Link: http://lkml.kernel.org/r/20200616183829.87211-3-junxiao.bi@oracle.com
    Signed-off-by: Junxiao Bi
    Reviewed-by: Joseph Qi
    Cc: Changwei Ge
    Cc: Gang He
    Cc: Joel Becker
    Cc: Jun Piao
    Cc: Mark Fasheh
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Junxiao Bi
     
  • Patch series "ocfs2: fix nfsd over ocfs2 issues", v2.

    This is a series of patches to fix issues with nfsd over ocfs2. Patch 1
    avoids inode removal while nfsd is accessing the inode; patches 2 and 3
    fix a panic.

    This patch (of 4):

    When nfsd gets a file dentry from a file handle, or gets the parent
    dentry of some dentry, a cluster lock is used to prevent the inode from
    being removed by another node. However, the inode could still be removed
    from the local node, so use a rw lock to prevent this as well.

    Link: http://lkml.kernel.org/r/20200616183829.87211-1-junxiao.bi@oracle.com
    Link: http://lkml.kernel.org/r/20200616183829.87211-2-junxiao.bi@oracle.com
    Signed-off-by: Junxiao Bi
    Reviewed-by: Joseph Qi
    Cc: Changwei Ge
    Cc: Gang He
    Cc: Joel Becker
    Cc: Jun Piao
    Cc: Mark Fasheh
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Junxiao Bi
     

15 Jun, 2020

1 commit


12 Jun, 2020

1 commit

  • After commit 12abc5ee7873 ("tcp: add tcp_sock_set_nodelay") and commit
    c488aeadcbd0 ("tcp: add tcp_sock_set_user_timeout"), building the kernel
    with OCFS2_FS=y but without INET=y causes it to fail with:

    ld: fs/ocfs2/cluster/tcp.o: in function `o2net_accept_many':
    tcp.c:(.text+0x21b1): undefined reference to `tcp_sock_set_nodelay'
    ld: tcp.c:(.text+0x21c1): undefined reference to `tcp_sock_set_user_timeout'
    ld: fs/ocfs2/cluster/tcp.o: in function `o2net_start_connect':
    tcp.c:(.text+0x2633): undefined reference to `tcp_sock_set_nodelay'
    ld: tcp.c:(.text+0x2643): undefined reference to `tcp_sock_set_user_timeout'

    This is due to tcp_sock_set_nodelay() and tcp_sock_set_user_timeout()
    being declared in linux/tcp.h and defined in net/ipv4/tcp.c, which
    depend on TCP/IP being enabled.

    To fix this, make OCFS2_FS depend on INET=y which already requires
    NET=y.

    Fixes: 12abc5ee7873 ("tcp: add tcp_sock_set_nodelay")
    Fixes: c488aeadcbd0 ("tcp: add tcp_sock_set_user_timeout")
    Signed-off-by: Tom Seewald
    Signed-off-by: Andrew Morton
    Reviewed-by: Joseph Qi
    Acked-by: Christoph Hellwig
    Cc: Sagi Grimberg
    Cc: Jason Gunthorpe
    Cc: David S. Miller
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Junxiao Bi
    Cc: Changwei Ge
    Cc: Gang He
    Cc: Jun Piao
    Link: http://lkml.kernel.org/r/20200606190827.23954-1-tseewald@gmail.com
    Signed-off-by: Linus Torvalds

    Tom Seewald
     

11 Jun, 2020

1 commit

  • ./ocfs2/mmap.c:65: bebongs ==> belonging

    Signed-off-by: Keyur Patel
    Signed-off-by: Andrew Morton
    Reviewed-by: Joseph Qi
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Junxiao Bi
    Cc: Changwei Ge
    Cc: Gang He
    Cc: Jun Piao
    Link: http://lkml.kernel.org/r/20200608014818.102358-1-iamkeyur96@gmail.com
    Signed-off-by: Linus Torvalds

    Keyur Patel
     

06 Jun, 2020

1 commit

  • Pull ext4 updates from Ted Ts'o:
    "A lot of bug fixes and cleanups for ext4, including:

    - Fix performance problems found in dioread_nolock now that it is the
    default, caused by transaction leaks.

    - Clean up fiemap handling in ext4

    - Clean up and refactor multiple block allocator (mballoc) code

    - Fix a problem with mballoc on smaller file systems running out
    of blocks because they couldn't properly use blocks that had been
    reserved by inode preallocation.

    - Fixed a race in ext4_sync_parent() versus rename()

    - Simplify the error handling in the extent manipulation code

    - Make sure all metadata I/O errors are reflected to
    ext4_ext_dirty()'s and ext4_make_inode_dirty()'s callers.

    - Avoid passing an error pointer to brelse in ext4_xattr_set()

    - Fix a race which could result in freeing an inode on the dirty list
    in data=journal mode.

    - Fix refcount handling if ext4_iget() fails

    - Fix a crash in generic/019 caused by a corrupted extent node"

    * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (58 commits)
    ext4: avoid unnecessary transaction starts during writeback
    ext4: don't block for O_DIRECT if IOCB_NOWAIT is set
    ext4: remove the access_ok() check in ext4_ioctl_get_es_cache
    fs: remove the access_ok() check in ioctl_fiemap
    fs: handle FIEMAP_FLAG_SYNC in fiemap_prep
    fs: move fiemap range validation into the file systems instances
    iomap: fix the iomap_fiemap prototype
    fs: move the fiemap definitions out of fs.h
    fs: mark __generic_block_fiemap static
    ext4: remove the call to fiemap_check_flags in ext4_fiemap
    ext4: split _ext4_fiemap
    ext4: fix fiemap size checks for bitmap files
    ext4: fix EXT4_MAX_LOGICAL_BLOCK macro
    add comment for ext4_dir_entry_2 file_type member
    jbd2: avoid leaking transaction credits when unreserving handle
    ext4: drop ext4_journal_free_reserved()
    ext4: mballoc: use lock for checking free blocks while retrying
    ext4: mballoc: refactor ext4_mb_good_group()
    ext4: mballoc: introduce pcpu seqcnt for freeing PA to improve ENOSPC handling
    ext4: mballoc: refactor ext4_mb_discard_preallocations()
    ...

    Linus Torvalds
     

04 Jun, 2020

3 commits

  • By moving FIEMAP_FLAG_SYNC handling to fiemap_prep we ensure it is
    handled once instead of duplicated, but can still be done under fs locks,
    like xfs/iomap intended with its duplicate handling. Also make sure the
    error value of filemap_write_and_wait is propagated to user space.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Amir Goldstein
    Reviewed-by: Darrick J. Wong
    Link: https://lore.kernel.org/r/20200523073016.2944131-8-hch@lst.de
    Signed-off-by: Theodore Ts'o

    Christoph Hellwig
     
  • Replace fiemap_check_flags with a fiemap_prep helper that also takes the
    inode and mapped range, and performs the sanity check and truncation
    previously done in fiemap_check_range. This way the validation is inside
    the file system itself and thus properly works for the stacked overlayfs
    case as well.
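    The range validation and truncation described here can be sketched as a
    standalone function. fiemap_range_prep() is a hypothetical name and
    signature modelled on the behaviour the commit describes; the flag
    compatibility check and FIEMAP_FLAG_SYNC handling that fiemap_prep()
    also performs are omitted.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Reject empty or out-of-range requests and shrink *len so that
 * start + *len never exceeds maxbytes (written to avoid overflow).
 */
static int fiemap_range_prep(uint64_t maxbytes, uint64_t start, uint64_t *len)
{
	if (*len == 0)
		return -EINVAL;
	if (start > maxbytes)
		return -EFBIG;
	if (*len > maxbytes || maxbytes - *len < start)
		*len = maxbytes - start;
	return 0;
}
```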

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Amir Goldstein
    Reviewed-by: Darrick J. Wong
    Link: https://lore.kernel.org/r/20200523073016.2944131-7-hch@lst.de
    Signed-off-by: Theodore Ts'o

    Christoph Hellwig
     
  • Pull networking updates from David Miller:

    1) Allow setting bluetooth L2CAP modes via socket option, from Luiz
    Augusto von Dentz.

    2) Add GSO partial support to igc, from Sasha Neftin.

    3) Several cleanups and improvements to r8169 from Heiner Kallweit.

    4) Add IF_OPER_TESTING link state and use it when ethtool triggers a
    device self-test. From Andrew Lunn.

    5) Start moving away from custom driver versions, use the globally
    defined kernel version instead, from Leon Romanovsky.

    6) Support GRO via gro_cells in DSA layer, from Alexander Lobakin.

    7) Allow hard IRQ deferral during NAPI, from Eric Dumazet.

    8) Add sriov and vf support to hinic, from Luo bin.

    9) Support Media Redundancy Protocol (MRP) in the bridging code, from
    Horatiu Vultur.

    10) Support netmap in the nft_nat code, from Pablo Neira Ayuso.

    11) Allow UDPv6 encapsulation of ESP in the ipsec code, from Sabrina
    Dubroca. Also add ipv6 support for espintcp.

    12) Lots of ReST conversions of the networking documentation, from Mauro
    Carvalho Chehab.

    13) Support configuration of ethtool rxnfc flows in bcmgenet driver,
    from Doug Berger.

    14) Allow dumping the cgroup id and filtering by it in inet_diag code,
    from Dmitry Yakunin.

    15) Add infrastructure to export netlink attribute policies to
    userspace, from Johannes Berg.

    16) Several optimizations to sch_fq scheduler, from Eric Dumazet.

    17) Fall back to the default qdisc if qdisc init fails, because otherwise
    a packet scheduler init failure will make a device inoperative. From
    Jesper Dangaard Brouer.

    18) Several RISCV bpf jit optimizations, from Luke Nelson.

    19) Correct the return type of the ->ndo_start_xmit() method in several
    drivers, it's netdev_tx_t but many drivers were using
    'int'. From Yunjian Wang.

    20) Add an ethtool interface for PHY master/slave config, from Oleksij
    Rempel.

    21) Add BPF iterators, from Yonghong Song.

    22) Add cable test infrastructure, including ethtool interfaces, from
    Andrew Lunn. Marvell PHY driver is the first to support this
    facility.

    23) Remove zero-length arrays all over, from Gustavo A. R. Silva.

    24) Calculate and maintain an explicit frame size in XDP, from Jesper
    Dangaard Brouer.

    25) Add CAP_BPF, from Alexei Starovoitov.

    26) Support terse dumps in the packet scheduler, from Vlad Buslov.

    27) Support XDP_TX bulking in dpaa2 driver, from Ioana Ciornei.

    28) Add devm_register_netdev(), from Bartosz Golaszewski.

    29) Minimize qdisc resets, from Cong Wang.

    30) Get rid of kernel_getsockopt and kernel_setsockopt in order to
    eliminate set_fs/get_fs calls. From Christoph Hellwig.

    * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2517 commits)
    selftests: net: ip_defrag: ignore EPERM
    net_failover: fixed rollback in net_failover_open()
    Revert "tipc: Fix potential tipc_aead refcnt leak in tipc_crypto_rcv"
    Revert "tipc: Fix potential tipc_node refcnt leak in tipc_rcv"
    vmxnet3: allow rx flow hash ops only when rss is enabled
    hinic: add set_channels ethtool_ops support
    selftests/bpf: Add a default $(CXX) value
    tools/bpf: Don't use $(COMPILE.c)
    bpf, selftests: Use bpf_probe_read_kernel
    s390/bpf: Use bcr 0,%0 as tail call nop filler
    s390/bpf: Maintain 8-byte stack alignment
    selftests/bpf: Fix verifier test
    selftests/bpf: Fix sample_cnt shared between two threads
    bpf, selftests: Adapt cls_redirect to call csum_level helper
    bpf: Add csum_level helper for fixing up csum levels
    bpf: Fix up bpf_skb_adjust_room helper's skb csum setting
    sfc: add missing annotation for efx_ef10_try_update_nic_stats_vf()
    crypto/chtls: IPv6 support for inline TLS
    Crypto/chcr: Fixes a coccinile check error
    Crypto/chcr: Fixes compilations warnings
    ...

    Linus Torvalds
     

03 Jun, 2020

5 commits

  • Pull block updates from Jens Axboe:
    "Core block changes that have been queued up for this release:

    - Remove dead blk-throttle and blk-wbt code (Guoqing)

    - Include pid in blktrace note traces (Jan)

    - Don't spew I/O errors on wouldblock termination (me)

    - Zone append addition (Johannes, Keith, Damien)

    - IO accounting improvements (Konstantin, Christoph)

    - blk-mq hardware map update improvements (Ming)

    - Scheduler dispatch improvement (Salman)

    - Inline block encryption support (Satya)

    - Request map fixes and improvements (Weiping)

    - blk-iocost tweaks (Tejun)

    - Fix for timeout failing with error injection (Keith)

    - Queue re-run fixes (Douglas)

    - CPU hotplug improvements (Christoph)

    - Queue entry/exit improvements (Christoph)

    - Move DMA drain handling to the few drivers that use it (Christoph)

    - Partition handling cleanups (Christoph)"

    * tag 'for-5.8/block-2020-06-01' of git://git.kernel.dk/linux-block: (127 commits)
    block: mark bio_wouldblock_error() bio with BIO_QUIET
    blk-wbt: rename __wbt_update_limits to wbt_update_limits
    blk-wbt: remove wbt_update_limits
    blk-throttle: remove tg_drain_bios
    blk-throttle: remove blk_throtl_drain
    null_blk: force complete for timeout request
    blk-mq: drain I/O when all CPUs in a hctx are offline
    blk-mq: add blk_mq_all_tag_iter
    blk-mq: open code __blk_mq_alloc_request in blk_mq_alloc_request_hctx
    blk-mq: use BLK_MQ_NO_TAG in more places
    blk-mq: rename BLK_MQ_TAG_FAIL to BLK_MQ_NO_TAG
    blk-mq: move more request initialization to blk_mq_rq_ctx_init
    blk-mq: simplify the blk_mq_get_request calling convention
    blk-mq: remove the bio argument to ->prepare_request
    nvme: force complete cancelled requests
    blk-mq: blk-mq: provide forced completion method
    block: fix a warning when blkdev.h is included for !CONFIG_BLOCK builds
    block: blk-crypto-fallback: remove redundant initialization of variable err
    block: reduce part_stat_lock() scope
    block: use __this_cpu_add() instead of access by smp_processor_id()
    ...

    Linus Torvalds
     
  • Merge updates from Andrew Morton:
    "A few little subsystems and a start of a lot of MM patches.

    Subsystems affected by this patch series: squashfs, ocfs2, parisc,
    vfs. With mm subsystems: slab-generic, slub, debug, pagecache, gup,
    swap, memcg, pagemap, memory-failure, vmalloc, kasan"

    * emailed patches from Andrew Morton : (128 commits)
    kasan: move kasan_report() into report.c
    mm/mm_init.c: report kasan-tag information stored in page->flags
    ubsan: entirely disable alignment checks under UBSAN_TRAP
    kasan: fix clang compilation warning due to stack protector
    x86/mm: remove vmalloc faulting
    mm: remove vmalloc_sync_(un)mappings()
    x86/mm/32: implement arch_sync_kernel_mappings()
    x86/mm/64: implement arch_sync_kernel_mappings()
    mm/ioremap: track which page-table levels were modified
    mm/vmalloc: track which page-table levels were modified
    mm: add functions to track page directory modifications
    s390: use __vmalloc_node in stack_alloc
    powerpc: use __vmalloc_node in alloc_vm_stack
    arm64: use __vmalloc_node in arch_alloc_vmap_stack
    mm: remove vmalloc_user_node_flags
    mm: switch the test_vmalloc module to use __vmalloc_node
    mm: remove __vmalloc_node_flags_caller
    mm: remove both instances of __vmalloc_node_flags
    mm: remove the prot argument to __vmalloc_node
    mm: remove the pgprot argument to __vmalloc
    ...

    Linus Torvalds
     
  • Implement the new readahead aop and convert all callers (block_dev,
    exfat, ext2, fat, gfs2, hpfs, isofs, jfs, nilfs2, ocfs2, omfs, qnx6,
    reiserfs & udf).

    The callers are all trivial except for GFS2 & OCFS2.

    Signed-off-by: Matthew Wilcox (Oracle)
    Signed-off-by: Andrew Morton
    Reviewed-by: Junxiao Bi # ocfs2
    Reviewed-by: Joseph Qi # ocfs2
    Reviewed-by: Dave Chinner
    Reviewed-by: John Hubbard
    Reviewed-by: Christoph Hellwig
    Reviewed-by: William Kucharski
    Cc: Chao Yu
    Cc: Cong Wang
    Cc: Darrick J. Wong
    Cc: Eric Biggers
    Cc: Gao Xiang
    Cc: Jaegeuk Kim
    Cc: Michal Hocko
    Cc: Zi Yan
    Cc: Johannes Thumshirn
    Cc: Miklos Szeredi
    Link: http://lkml.kernel.org/r/20200414150233.24495-17-willy@infradead.org
    Signed-off-by: Linus Torvalds

    Matthew Wilcox (Oracle)
     
  • Usually we create and use an ocfs2 shared volume on top of an HA stack.
    A pcmk-based HA stack includes the DLM, corosync and pacemaker
    services.

    Customers complained that they could not mount an existing ocfs2
    volume on a single node without the HA stack, e.g. in a single-node
    backup/restore scenario.

    In cases like this, customers just want to access the data on the
    existing ocfs2 volume quickly, without restarting or setting up the HA
    stack.

    So I'd like to add a mount option "nocluster". If users mount an ocfs2
    shared volume with this option, the mount will not depend on the
    HA-related services; the existing ocfs2 volume is mounted directly
    (like a local mount), so there is no need to set up the HA stack.

    Signed-off-by: Gang He
    Signed-off-by: Andrew Morton
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Junxiao Bi
    Cc: Joseph Qi
    Cc: Changwei Ge
    Cc: Jun Piao
    Link: http://lkml.kernel.org/r/20200423053300.22661-1-ghe@suse.com
    Signed-off-by: Linus Torvalds

    Gang He
     
  • Sparse reports a warning at dlm_purge_lockres():

    warning: context imbalance in dlm_purge_lockres() - unexpected unlock

    The root cause is the missing annotation on dlm_purge_lockres().

    Add the missing __must_hold(&dlm->spinlock) annotation.

    Signed-off-by: Jules Irenge
    Signed-off-by: Andrew Morton
    Reviewed-by: Andrew Morton
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Junxiao Bi
    Cc: Joseph Qi
    Cc: Changwei Ge
    Cc: Gang He
    Cc: Jun Piao
    Link: http://lkml.kernel.org/r/20200403160505.2832-4-jbi.octave@gmail.com
    Signed-off-by: Linus Torvalds

    Jules Irenge
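The annotation fix above is easier to see in a small example. The following is a userspace sketch, not the ocfs2 code: it borrows the kernel's definition of __must_hold() and applies it to a hypothetical purge_lockres_sketch() that, as the sparse warning implies for dlm_purge_lockres(), is entered with the lock held, drops it in the middle and retakes it before returning. A pthread mutex stands in for the spinlock, and struct dlm_ctxt_sketch is invented for illustration. In the kernel, spin_lock()/spin_unlock() carry their own context annotations so sparse can balance them; the pthread calls here do not, so in this sketch the annotation only documents the contract, and a runtime assert checks it.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Mirrors the kernel's definition: under sparse (__CHECKER__) it declares
 * "this lock is held on entry and still held on exit"; for a normal
 * compiler it expands to nothing. */
#ifdef __CHECKER__
#define __must_hold(x) __attribute__((context(x, 1, 1)))
#else
#define __must_hold(x)
#endif

/* Hypothetical, heavily simplified stand-in for struct dlm_ctxt. */
struct dlm_ctxt_sketch {
	pthread_mutex_t spinlock;   /* a mutex here; a spinlock in the kernel */
	bool locked;                /* runtime mirror of the lock state */
	int purged;                 /* how many resources were "purged" */
};

/* Hypothetical analogue of dlm_purge_lockres(): entered with the lock held,
 * drops it around work that must not run under it, and retakes it before
 * returning. Without an annotation, sparse sees the unlock of a lock this
 * function never took and reports an "unexpected unlock". */
static void purge_lockres_sketch(struct dlm_ctxt_sketch *dlm)
	__must_hold(&dlm->spinlock)
{
	assert(dlm->locked);        /* runtime check of the same contract */

	dlm->locked = false;
	pthread_mutex_unlock(&dlm->spinlock);

	/* ... blocking work that must not run under the lock ... */

	pthread_mutex_lock(&dlm->spinlock);
	dlm->locked = true;
	dlm->purged++;
}
```

The caller takes the lock, calls the function, and still owns the lock afterwards; the annotation tells sparse that this net-zero change in lock context is intended.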
     

02 Jun, 2020

3 commits

  • Pull uaccess/__copy_to_user updates from Al Viro:
    "Getting rid of __copy_to_user() callers - stuff that doesn't fit into
    other series"

    * 'uaccess.__copy_to_user' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    dlmfs: convert dlmfs_file_read() to copy_to_user()
    esas2r: don't bother with __copy_to_user()

    Linus Torvalds
     
  • Pull uaccess/access_ok updates from Al Viro:
    "Removals of trivially pointless access_ok() calls.

    Note: the fiemap stuff was removed from the series, since they are
    duplicates with part of ext4 series carried in Ted's tree"

    * 'uaccess.access_ok' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    vmci_host: get rid of pointless access_ok()
    hfi1: get rid of pointless access_ok()
    usb: get rid of pointless access_ok() calls
    lpfc_debugfs: get rid of pointless access_ok()
    efi_test: get rid of pointless access_ok()
    drm_read(): get rid of pointless access_ok()
    via-pmu: don't bother with access_ok()
    drivers/crypto/ccp/sev-dev.c: get rid of pointless access_ok()
    omapfb: get rid of pointless access_ok() calls
    amifb: get rid of pointless access_ok() calls
    drivers/fpga/dfl-afu-dma-region.c: get rid of pointless access_ok()
    drivers/fpga/dfl-fme-pr.c: get rid of pointless access_ok()
    cm4000_cs.c cmm_ioctl(): get rid of pointless access_ok()
    nvram: drop useless access_ok()
    n_hdlc_tty_read(): remove pointless access_ok()
    tomoyo_write_control(): get rid of pointless access_ok()
    btrfs_ioctl_send(): don't bother with access_ok()
    fat_dir_ioctl(): hadn't needed that access_ok() for more than a decade...
    dlmfs_file_write(): get rid of pointless access_ok()

    Linus Torvalds
     
  • Pull documentation updates from Jonathan Corbet:
    "A fair amount of stuff this time around, dominated by yet another
    massive set from Mauro toward the completion of the RST conversion. I
    *really* hope we are getting close to the end of this. Meanwhile,
    those patches reach pretty far afield to update document references
    around the tree; there should be no actual code changes there. There
    will be, alas, more of the usual trivial merge conflicts.

    Beyond that we have more translations, improvements to the sphinx
    scripting, a number of additions to the sysctl documentation, and lots
    of fixes"

    * tag 'docs-5.8' of git://git.lwn.net/linux: (130 commits)
    Documentation: fixes to the maintainer-entry-profile template
    zswap: docs/vm: Fix typo accept_threshold_percent in zswap.rst
    tracing: Fix events.rst section numbering
    docs: acpi: fix old http link and improve document format
    docs: filesystems: add info about efivars content
    Documentation: LSM: Correct the basic LSM description
    mailmap: change email for Ricardo Ribalda
    docs: sysctl/kernel: document unaligned controls
    Documentation: admin-guide: update bug-hunting.rst
    docs: sysctl/kernel: document ngroups_max
    nvdimm: fixes to maintainter-entry-profile
    Documentation/features: Correct RISC-V kprobes support entry
    Documentation/features: Refresh the arch support status files
    Revert "docs: sysctl/kernel: document ngroups_max"
    docs: move locking-specific documents to locking/
    docs: move digsig docs to the security book
    docs: move the kref doc into the core-api book
    docs: add IRQ documentation at the core-api book
    docs: debugging-via-ohci1394.txt: add it to the core-api book
    docs: fix references for ipmi.rst file
    ...

    Linus Torvalds
     

29 May, 2020

2 commits

  • Add a helper to directly set the TCP_USER_TIMEOUT sockopt from kernel
    space without going through a fake uaccess.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: David S. Miller

    Christoph Hellwig
     
  • Add a helper to directly set the TCP_NODELAY sockopt from kernel space
    without going through a fake uaccess. Clean up the callers to avoid
    pointless wrappers now that this is a simple function call.

    Signed-off-by: Christoph Hellwig
    Acked-by: Sagi Grimberg
    Acked-by: Jason Gunthorpe
    Signed-off-by: David S. Miller

    Christoph Hellwig
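In mainline, the two helpers from these commits are tcp_sock_set_user_timeout() and tcp_sock_set_nodelay(); both are kernel-internal and cannot be called from userspace. To show what the two socket options themselves do, here is a rough userspace sketch that sets them with setsockopt(2) and reads them back with getsockopt(2). The names set_nodelay(), set_user_timeout() and get_tcp_opt() are hypothetical, and TCP_USER_TIMEOUT is Linux-specific.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

/* Userspace analogue of tcp_sock_set_nodelay(): disable Nagle's algorithm
 * so small writes are sent immediately. Returns 0 on success, -1 on error. */
static int set_nodelay(int fd)
{
	int one = 1;
	return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &one, sizeof(one));
}

/* Userspace analogue of tcp_sock_set_user_timeout(): abort the connection
 * if transmitted data stays unacknowledged for `ms` milliseconds.
 * Returns 0 on success, -1 on error. */
static int set_user_timeout(int fd, int ms)
{
	return setsockopt(fd, IPPROTO_TCP, TCP_USER_TIMEOUT, &ms, sizeof(ms));
}

/* Read back an int-sized TCP-level option; returns -1 on error. */
static int get_tcp_opt(int fd, int opt)
{
	int val = 0;
	socklen_t len = sizeof(val);

	if (getsockopt(fd, IPPROTO_TCP, opt, &val, &len) < 0)
		return -1;
	return val;
}
```

From userspace each option still costs a setsockopt() call with a pointer and a length; the point of the kernel helpers is that in-kernel users such as the ocfs2 cluster stack can set the same fields with a plain function call.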
     

22 May, 2020

1 commit


10 May, 2020

1 commit


24 Apr, 2020

2 commits