29 Apr, 2018

2 commits

  • Pull ext4 fixes from Ted Ts'o:
    "Fix misc bugs and a regression for ext4"

    * tag 'for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
    ext4: add MODULE_SOFTDEP to ensure crc32c is included in the initramfs
    ext4: fix bitmap position validation
    ext4: set h_journal if there is a failure starting a reserved handle
    ext4: prevent right-shifting extents beyond EXT_MAX_BLOCKS

    Linus Torvalds
     
  • Pull cifs fixes from Steve French:
    "A few security related fixes for SMB3, most importantly for SMB3.11
    encryption"

    * tag '4.17-rc2-smb3' of git://git.samba.org/sfrench/cifs-2.6:
    cifs: smbd: Avoid allocating iov on the stack
    cifs: smbd: Don't use RDMA read/write when signing is used
    SMB311: Fix reconnect
    SMB3: Fix 3.11 encryption to Windows and handle encrypted smb3 tcon
    CIFS: set *resp_buf_type to NO_BUFFER on error

    Linus Torvalds
     

26 Apr, 2018

4 commits

  • Fixes: a45403b51582 ("ext4: always initialize the crc32c checksum driver")
    Reported-by: François Valenduc
    Signed-off-by: Theodore Ts'o
    Cc: stable@vger.kernel.org

    Theodore Ts'o
     
  • It's not necessary to allocate another iov when going through the buffers
    in smbd_send() through RDMA send.

    Remove it to reduce stack size.

    Thanks to Matt for spotting a printk typo in the earlier version of this.

    CC: Matt Redfearn
    Signed-off-by: Long Li
    Acked-by: Ronnie Sahlberg
    Cc: stable@vger.kernel.org
    Signed-off-by: Steve French

    Long Li
     
  • SMB server will not sign data transferred through RDMA read/write. When
    signing is used, it's a good idea to have all the data signed.

    In this case, use RDMA send/recv for all data transfers. This will degrade
    performance as this is not generally configured in RDMA environment. So
    warn the user on signing and RDMA send/recv.

    Signed-off-by: Long Li
    Acked-by: Ronnie Sahlberg
    Cc: stable@vger.kernel.org
    Signed-off-by: Steve French

    Long Li
     
  • The preauth hash was not being recalculated properly on reconnect
    of SMB3.11 dialect mounts (which caused access denied repeatedly
    on auto-reconnect).

    Fixes: 8bd68c6e47ab ("CIFS: implement v3.11 preauth integrity")

    Signed-off-by: Steve French
    CC: Stable
    Reviewed-by: Ronnie Sahlberg

    Steve French
     

24 Apr, 2018

3 commits

  • Currently in ext4_valid_block_bitmap() we expect the bitmap to be
    positioned anywhere between 0 and s_blocksize clusters, but that's
    wrong because the bitmap can be placed anywhere in the block group. This
    causes false positives when validating bitmaps on perfectly valid file
    system layouts. Fix it by checking whether the bitmap is within the group
    boundary.
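
    A minimal sketch of the kind of group-boundary check described above.
    The helper name and its parameters are hypothetical and are not taken
    from ext4; real code would derive the group start and size from the
    superblock.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, not the kernel's code: report whether block
 * number blk lies inside the block group that starts at group_first
 * and spans blocks_per_group blocks.
 */
static bool block_in_group(uint64_t blk, uint64_t group_first,
                           uint64_t blocks_per_group)
{
    /* Subtract instead of adding so that group_first +
     * blocks_per_group can never overflow. */
    return blk >= group_first && blk - group_first < blocks_per_group;
}
```

    Under this check a bitmap block anywhere inside the group passes,
    while one just before or just past the group is rejected.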

    The problem can be reproduced using the following:

    mkfs -t ext3 -E stride=256 /dev/vdb1
    mount /dev/vdb1 /mnt/test
    cd /mnt/test
    wget https://cdn.kernel.org/pub/linux/kernel/v4.x/linux-4.16.3.tar.xz
    tar xf linux-4.16.3.tar.xz

    This results in warnings like the following in the logs:

    EXT4-fs error (device vdb1): ext4_validate_block_bitmap:399: comm tar: bg 84: block 2774529: invalid block bitmap

    [ Changed slightly for clarity and to not drop an overflow test -- TYT ]

    Signed-off-by: Lukas Czerner
    Signed-off-by: Theodore Ts'o
    Reported-by: Ilya Dryomov
    Fixes: 7dac4a1726a9 ("ext4: add validity checks for bitmap block numbers")
    Cc: stable@vger.kernel.org

    Lukas Czerner
     
  • Temporarily disable AES-GCM, as AES-CCM is currently the only
    mechanism enabled on the client side. This fixes SMB3.11
    encrypted mounts to Windows.

    Also the tree connect request itself should be encrypted if
    requested encryption ("seal" on mount), in addition we should be
    enabling encryption in 3.11 based on whether we got any valid
    encryption ciphers back in negprot (the corresponding session flag is
    not set as it is in 3.0 and 3.02)

    Signed-off-by: Steve French
    Reviewed-by: Pavel Shilovsky
    Reviewed-by: Ronnie Sahlberg
    CC: Stable

    Steve French
     
  • Dan Carpenter had pointed this out a while ago, but the code around
    this had changed, so it wasn't causing any problems since that field
    was not used in this error path.

    Still, it is cleaner to always initialize this field, so changing
    the error path to set it.

    Reviewed-by: Ronnie Sahlberg
    CC: Dan Carpenter
    Signed-off-by: Steve French

    Steve French
     

23 Apr, 2018

3 commits

  • If mds does not, return -EOPNOTSUPP.

    Link: http://tracker.ceph.com/issues/23491
    Signed-off-by: "Yan, Zheng"
    Signed-off-by: Ilya Dryomov

    Yan, Zheng
     
  • Pull cifs fixes from Steve French:
    "Various SMB3/CIFS fixes.

    There are three more security related fixes in progress that are not
    included in this set but they are still being tested and reviewed, so
    sending this unrelated set of smaller fixes now"

    * tag '4.17-rc1-SMB3-CIFS' of git://git.samba.org/sfrench/cifs-2.6:
    CIFS: fix typo in cifs_dbg
    cifs: do not allow creating sockets except with SMB1 posix exensions
    cifs: smbd: Dump SMB packet when configured
    cifs: smbd: Check for iov length on sending the last iov
    fs: cifs: Adding new return type vm_fault_t
    cifs: smb2ops: Fix NULL check in smb2_query_symlink

    Linus Torvalds
     
  • Pull btrfs fixes from David Sterba:
    "This contains a few fixups to the qgroup patches that were merged this
    dev cycle, unaligned access fix, blockgroup removal corner case fix
    and a small debugging output tweak"

    * tag 'for-4.17-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
    btrfs: print-tree: debugging output enhancement
    btrfs: Fix race condition between delayed refs and blockgroup removal
    btrfs: fix unaligned access in readdir
    btrfs: Fix wrong btrfs_delalloc_release_extents parameter
    btrfs: delayed-inode: Remove wrong qgroup meta reservation calls
    btrfs: qgroup: Use independent and accurate per inode qgroup rsv
    btrfs: qgroup: Commit transaction in advance to reduce early EDQUOT

    Linus Torvalds
     

21 Apr, 2018

16 commits

  • Commit 4ed28639519c ("fs, elf: drop MAP_FIXED usage from elf_map") is
    printing spurious messages under memory pressure due to map_addr == -ENOMEM.

    9794 (a.out): Uhuuh, elf segment at 00007f2e34738000(fffffffffffffff4) requested but the memory is mapped already
    14104 (a.out): Uhuuh, elf segment at 00007f34fd76c000(fffffffffffffff4) requested but the memory is mapped already
    16843 (a.out): Uhuuh, elf segment at 00007f930ecc7000(fffffffffffffff4) requested but the memory is mapped already

    Complain only if -EEXIST, and use %px for printing the address.
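
    The value in parentheses in the messages above, fffffffffffffff4, is
    an error code rather than an address. A small sketch of how such a
    value decodes, assuming 64-bit addresses and the Linux errno numbers
    (ENOMEM is 12, EEXIST is 17). The function name is hypothetical.

```c
#include <stdint.h>

#define MAX_ERRNO 4095  /* same bound the kernel's IS_ERR_VALUE() uses */

/*
 * Return the positive errno encoded in a 64-bit "address", or 0 when
 * the value is an ordinary address.
 */
static int encoded_errno(uint64_t addr)
{
    if (addr >= (uint64_t)-MAX_ERRNO)
        return (int)(0 - addr);  /* two's complement negation */
    return 0;
}
```

    With this decoding, fffffffffffffff4 yields 12 (ENOMEM), so the
    "memory is mapped already" message was misleading in those cases.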

    Link: http://lkml.kernel.org/r/201804182307.FAC17665.SFMOFJVFtHOLOQ@I-love.SAKURA.ne.jp
    Fixes: 4ed28639519c7bad ("fs, elf: drop MAP_FIXED usage from elf_map")
    Signed-off-by: Tetsuo Handa
    Acked-by: Michal Hocko
    Cc: Andrei Vagin
    Cc: Khalid Aziz
    Cc: Michael Ellerman
    Cc: Kees Cook
    Cc: Abdul Haleem
    Cc: Joel Stanley
    Cc: Anshuman Khandual
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tetsuo Handa
     
  • Commit 95846ecf9dac ("pid: replace pid bitmap implementation with IDR
    API") changed last field of /proc/loadavg (last pid allocated) to be off
    by one:

    # unshare -p -f --mount-proc cat /proc/loadavg
    0.00 0.00 0.00 1/60 2
    Cc: "Eric W. Biederman"
    Cc: Gargi Sharma
    Cc: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • task_dump_owner() has the following code:

    mm = task->mm;
    if (mm) {
        if (get_dumpable(mm) != SUID_DUMP_USER) {
            uid = ...
        }
    }

    Check for ->mm is buggy -- kernel thread might be borrowing mm
    and inode will go to some random uid:gid pair.

    Link: http://lkml.kernel.org/r/20180412220109.GA20978@avx2
    Signed-off-by: Alexey Dobriyan
    Cc: "Eric W. Biederman"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • The autofs file system mkdir inode operation blindly sets the created
    directory mode to S_IFDIR | 0555, ignoring the passed in mode, which can
    cause selinux dac_override denials.

    But the function also checks if the caller is the daemon (as no-one else
    should be able to do anything here) so there's no point in not honouring
    the passed in mode, allowing the daemon to set appropriate mode when
    required.

    Link: http://lkml.kernel.org/r/152361593601.8051.14014139124905996173.stgit@pluto.themaw.net
    Signed-off-by: Ian Kent
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ian Kent
     
  • lock_page_memcg()/unlock_page_memcg() use spin_lock_irqsave/restore() if
    the page's memcg is undergoing move accounting, which occurs when a
    process leaves its memcg for a new one that has
    memory.move_charge_at_immigrate set.

    unlocked_inode_to_wb_begin,end() use spin_lock_irq/spin_unlock_irq() if
    the given inode is switching writeback domains. Switches occur when
    enough writes are issued from a new domain.

    This existing pattern is thus suspicious:
        lock_page_memcg(page);
        unlocked_inode_to_wb_begin(inode, &locked);
        ...
        unlocked_inode_to_wb_end(inode, locked);
        unlock_page_memcg(page);

    If inode switch and process memcg migration are both in-flight then
    unlocked_inode_to_wb_end() will unconditionally enable interrupts while
    still holding the lock_page_memcg() irq spinlock. This suggests the
    possibility of deadlock if an interrupt occurs before unlock_page_memcg().
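
    The nesting problem can be modelled in userspace with a single flag
    standing in for the CPU interrupt state. This is a toy single-CPU
    model with hypothetical function names, not kernel code; it only
    shows why an inner unconditional enable is unsafe inside an outer
    save/restore pair.

```c
#include <stdbool.h>

/* One flag stands in for the CPU's interrupt-enable state. */
static bool irqs_enabled = true;

static void model_lock_irqsave(bool *flags)
{
    *flags = irqs_enabled;   /* remember the previous state */
    irqs_enabled = false;
}

static void model_unlock_irqrestore(bool flags)
{
    irqs_enabled = flags;    /* put back whatever was saved */
}

static void model_lock_irq(void)   { irqs_enabled = false; }
static void model_unlock_irq(void) { irqs_enabled = true; }  /* unconditional */

/*
 * Run the nested pattern and return the interrupt state observed
 * after the inner unlock, while the outer lock is still held.
 */
static bool irqs_on_after_inner_unlock(bool inner_saves_flags)
{
    bool outer, inner = false, seen;

    irqs_enabled = true;
    model_lock_irqsave(&outer);          /* outer: lock_page_memcg() */
    if (inner_saves_flags) {
        model_lock_irqsave(&inner);      /* inner pair keeps its own flags */
        model_unlock_irqrestore(inner);
    } else {
        model_lock_irq();                /* inner pair as described above */
        model_unlock_irq();
    }
    seen = irqs_enabled;
    model_unlock_irqrestore(outer);      /* outer: unlock_page_memcg() */
    return seen;
}
```

    In this model the inner irq/unlock_irq pair leaves interrupts enabled
    while the outer lock is held, whereas an inner pair that saves and
    restores its own flags does not.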

    truncate
    __cancel_dirty_page
    lock_page_memcg
    unlocked_inode_to_wb_begin
    unlocked_inode_to_wb_end


    end_page_writeback
    test_clear_page_writeback
    lock_page_memcg

    unlock_page_memcg

    Due to configuration limitations this deadlock is not currently possible
    because we don't mix cgroup writeback (a cgroupv2 feature) and
    memory.move_charge_at_immigrate (a cgroupv1 feature).

    If the kernel is hacked to always claim inode switching and memcg
    moving_account, then this script triggers lockup in less than a minute:

    cd /mnt/cgroup/memory
    mkdir a b
    echo 1 > a/memory.move_charge_at_immigrate
    echo 1 > b/memory.move_charge_at_immigrate
    (
        echo $BASHPID > a/cgroup.procs
        while true; do
            dd if=/dev/zero of=/mnt/big bs=1M count=256
        done
    ) &
    while true; do
        sync
    done &
    sleep 1h &
    SLEEP=$!
    while true; do
        echo $SLEEP > a/cgroup.procs
        echo $SLEEP > b/cgroup.procs
    done

    The deadlock does not seem possible, so it's debatable if there's any
    reason to modify the kernel. I suggest we should, to prevent future
    surprises. And Wang Long said "this deadlock occurs three times in our
    environment", so there's more reason to apply this, even to stable.
    Stable 4.4 has minor conflicts applying this patch. For a clean 4.4 patch
    see "[PATCH for-4.4] writeback: safer lock nesting"
    https://lkml.org/lkml/2018/4/11/146

    [gthelen@google.com: v4]
    Link: http://lkml.kernel.org/r/20180411084653.254724-1-gthelen@google.com
    [akpm@linux-foundation.org: comment tweaks, struct initialization simplification]
    Change-Id: Ibb773e8045852978f6207074491d262f1b3fb613
    Link: http://lkml.kernel.org/r/20180410005908.167976-1-gthelen@google.com
    Fixes: 682aa8e1a6a1 ("writeback: implement unlocked_inode_to_wb transaction and use it for stat updates")
    Signed-off-by: Greg Thelen
    Reported-by: Wang Long
    Acked-by: Wang Long
    Acked-by: Michal Hocko
    Reviewed-by: Andrew Morton
    Cc: Johannes Weiner
    Cc: Tejun Heo
    Cc: Nicholas Piggin
    Cc: [v4.2+]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Greg Thelen
     
  • The swap offset reported by /proc/<pid>/pagemap may not be correct for
    PMD migration entries. If addr passed into pagemap_pmd_range() isn't
    aligned with PMD start address, the swap offset reported doesn't
    reflect this. And in the loop to report information of each sub-page,
    the swap offset isn't increased accordingly as that for PFN.

    This may happen after opening /proc/<pid>/pagemap and seeking to a page
    whose address doesn't align with a PMD start address. I have verified
    this with a simple test program.
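
    The adjustment described above amounts to adding the sub-page index
    within the PMD to the offset stored in the entry. A sketch, assuming
    4 KiB pages and 2 MiB PMDs (the x86-64 values); the function name is
    hypothetical.

```c
#include <stdint.h>

#define PAGE_SHIFT 12                                   /* assumed: 4 KiB pages */
#define PMD_SHIFT  21                                   /* assumed: 2 MiB PMDs  */
#define PMD_MASK   (~((UINT64_C(1) << PMD_SHIFT) - 1))

/*
 * Swap offset to report for the sub-page at addr, given the offset
 * stored in the PMD entry, which describes the first sub-page.
 */
static uint64_t subpage_swap_offset(uint64_t pmd_entry_offset, uint64_t addr)
{
    uint64_t idx = (addr & ~PMD_MASK) >> PAGE_SHIFT;  /* index within the PMD */

    return pmd_entry_offset + idx;
}
```

    An address aligned to the PMD start reports the stored offset
    unchanged; each following sub-page reports one more than the last.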

    BTW: migration swap entries have PFN information, do we need to restrict
    whether to show them?

    [akpm@linux-foundation.org: fix typo, per Huang, Ying]
    Link: http://lkml.kernel.org/r/20180408033737.10897-1-ying.huang@intel.com
    Signed-off-by: "Huang, Ying"
    Cc: Michal Hocko
    Cc: "Kirill A. Shutemov"
    Cc: Andrei Vagin
    Cc: Dan Williams
    Cc: "Jerome Glisse"
    Cc: Daniel Colascione
    Cc: Zi Yan
    Cc: Naoya Horiguchi
    Cc: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Huang Ying
     
  • Signed-off-by: Aurelien Aptel
    Signed-off-by: Steve French
    Reported-by: Long Li

    Aurelien Aptel
     
  • RHBZ: 1453123

    Since at least the 3.10 kernel and likely a lot earlier we have
    not been able to create unix domain sockets in a cifs share
    when mounted using the SFU mount option (except when mounted
    with the cifs unix extensions to Samba e.g.)
    Trying to create a socket, for example using the af_unix command from
    xfstests will cause:
    BUG: unable to handle kernel NULL pointer dereference at 0000000000000040

    Since no one uses or depends on being able to create unix domain sockets
    on a cifs share the easiest fix to stop this vulnerability is to simply
    not allow creation of any other special files than char or block devices
    when sfu is used.

    Added update to Ronnie's patch to handle a tcon link leak, and
    to address a buf leak noticed by Gustavo and Colin.

    Acked-by: Gustavo A. R. Silva
    CC: Colin Ian King
    Reviewed-by: Pavel Shilovsky
    Reported-by: Eryu Guan
    Signed-off-by: Ronnie Sahlberg
    Signed-off-by: Steve French
    Cc: stable@vger.kernel.org

    Steve French
     
  • When sending through SMB Direct, also dump the packet in SMB send path.

    Also fixed a typo in debug message.

    Signed-off-by: Long Li
    Cc: stable@vger.kernel.org
    Signed-off-by: Steve French
    Reviewed-by: Ronnie Sahlberg

    Long Li
     
  • This patch enhances the following things:

    - tree block header
    * add generation and owner output for node and leaf
    - node pointer generation output
    - allow btrfs_print_tree() to not follow nodes
    * just like btrfs-progs

    Please note that, although function btrfs_print_tree() is not called by
    anyone right now, it's still a pretty useful function to debug kernel.
    So that function is still kept for later use.

    Signed-off-by: Qu Wenruo
    Reviewed-by: Lu Fengqi
    Signed-off-by: David Sterba

    Qu Wenruo
     
  • When the delayed refs for a head are all run, eventually
    cleanup_ref_head is called which (in case of deletion) obtains a
    reference for the relevant btrfs_space_info struct by querying the bg
    for the range. This is problematic because when the last extent of a
    bg is deleted a race window emerges between removal of that bg and the
    subsequent invocation of cleanup_ref_head. This can result in cache being null
    and either a null pointer dereference or assertion failure.

    task: ffff8d04d31ed080 task.stack: ffff9e5dc10cc000
    RIP: 0010:assfail.constprop.78+0x18/0x1a [btrfs]
    RSP: 0018:ffff9e5dc10cfbe8 EFLAGS: 00010292
    RAX: 0000000000000044 RBX: 0000000000000000 RCX: 0000000000000000
    RDX: ffff8d04ffc1f868 RSI: ffff8d04ffc178c8 RDI: ffff8d04ffc178c8
    RBP: ffff8d04d29e5ea0 R08: 00000000000001f0 R09: 0000000000000001
    R10: ffff9e5dc0507d58 R11: 0000000000000001 R12: ffff8d04d29e5ea0
    R13: ffff8d04d29e5f08 R14: ffff8d04efe29b40 R15: ffff8d04efe203e0
    FS: 00007fbf58ead500(0000) GS:ffff8d04ffc00000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 00007fe6c6975648 CR3: 0000000013b2a000 CR4: 00000000000006f0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    Call Trace:
    __btrfs_run_delayed_refs+0x10e7/0x12c0 [btrfs]
    btrfs_run_delayed_refs+0x68/0x250 [btrfs]
    btrfs_should_end_transaction+0x42/0x60 [btrfs]
    btrfs_truncate_inode_items+0xaac/0xfc0 [btrfs]
    btrfs_evict_inode+0x4c6/0x5c0 [btrfs]
    evict+0xc6/0x190
    do_unlinkat+0x19c/0x300
    do_syscall_64+0x74/0x140
    entry_SYSCALL_64_after_hwframe+0x3d/0xa2
    RIP: 0033:0x7fbf589c57a7

    To fix this, introduce a new flag "is_system" to head_ref structs,
    which is populated at insertion time. This allows decoupling the
    querying for the spaceinfo from querying the possibly deleted bg.

    Fixes: d7eae3403f46 ("Btrfs: rework delayed ref total_bytes_pinned accounting")
    CC: stable@vger.kernel.org # 4.14+
    Suggested-by: Omar Sandoval
    Signed-off-by: Nikolay Borisov
    Reviewed-by: Omar Sandoval
    Signed-off-by: David Sterba

    Nikolay Borisov
     
  • In do_mount() when the MS_* flags are being converted to MNT_* flags,
    MS_RDONLY got accidentally converted to SB_RDONLY.

    Undo this change.

    Fixes: e462ec50cb5f ("VFS: Differentiate mount flags (MS_*) from internal superblock flags")
    Signed-off-by: David Howells
    Signed-off-by: Linus Torvalds

    David Howells
     
  • AFS server records get removed from the net->fs_servers tree when
    they're deleted, but not from the net->fs_addresses{4,6} lists, which
    can lead to an oops in afs_find_server() when a server record has been
    removed, for instance during rmmod.

    Fix this by deleting the record from the by-address lists before posting
    it for RCU destruction.

    The reason this hasn't been noticed before is that the fileserver keeps
    probing the local cache manager, thereby keeping the service record
    alive, so the oops would only happen when a fileserver eventually gets
    bored and stops pinging or if the module gets rmmod'd and a call comes
    in from the fileserver during the window between the server records
    being destroyed and the socket being closed.

    The oops looks something like:

    BUG: unable to handle kernel NULL pointer dereference at 000000000000001c
    ...
    Workqueue: kafsd afs_process_async_call [kafs]
    RIP: 0010:afs_find_server+0x271/0x36f [kafs]
    ...
    Call Trace:
    afs_deliver_cb_init_call_back_state3+0x1f2/0x21f [kafs]
    afs_deliver_to_call+0x1ee/0x5e8 [kafs]
    afs_process_async_call+0x5b/0xd0 [kafs]
    process_one_work+0x2c2/0x504
    worker_thread+0x1d4/0x2ac
    kthread+0x11f/0x127
    ret_from_fork+0x24/0x30

    Fixes: d2ddc776a458 ("afs: Overhaul volume and server record caching and fileserver rotation")
    Signed-off-by: David Howells
    Signed-off-by: Linus Torvalds

    David Howells
     
  • Pull vfs fixes from Al Viro:
    "Assorted fixes.

    Some of that is only a matter with fault injection (broken handling of
    small allocation failure in various mount-related places), but the
    last one is a root-triggerable stack overflow, and combined with
    userns it gets really nasty ;-/"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    Don't leak MNT_INTERNAL away from internal mounts
    mm,vmscan: Allow preallocating memory for register_shrinker().
    rpc_pipefs: fix double-dput()
    orangefs_kill_sb(): deal with allocation failures
    jffs2_kill_sb(): deal with failed allocations
    hypfs_kill_super(): deal with failed allocations

    Linus Torvalds
     
  • …/git/tyhicks/ecryptfs

    Pull eCryptfs fixes from Tyler Hicks:
    "Minor cleanups and a bug fix to completely ignore unencrypted
    filenames in the lower filesystem when filename encryption is enabled
    at the eCryptfs layer"

    * tag 'ecryptfs-4.17-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs:
    eCryptfs: don't pass up plaintext names when using filename encryption
    ecryptfs: fix spelling mistake: "cadidate" -> "candidate"
    ecryptfs: lookup: Don't check if mount_crypt_stat is NULL

    Linus Torvalds
     
  • - isofs memory leak fix

    - two fsnotify fixes of event mask handling

    - udf fix of UTF-16 handling

    - couple other smaller cleanups

    * tag 'for_v4.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
    udf: Fix leak of UTF-16 surrogates into encoded strings
    fs: ext2: Adding new return type vm_fault_t
    isofs: fix potential memory leak in mount option parsing
    MAINTAINERS: add an entry for FSNOTIFY infrastructure
    fsnotify: fix typo in a comment about mark->g_list
    fsnotify: fix ignore mask logic in send_to_group()
    isofs compress: Remove VLA usage
    fs: quota: Replace GFP_ATOMIC with GFP_KERNEL in dquot_init
    fanotify: fix logic of events on child

    Linus Torvalds
     

20 Apr, 2018

1 commit

  • We want it only for the stuff created by SB_KERNMOUNT mounts, *not* for
    their copies. As it is, creating a deep stack of bindings of /proc/*/ns/*
    somewhere in a new namespace and exiting yields a stack overflow.

    Cc: stable@kernel.org
    Reported-by: Alexander Aring
    Bisected-by: Kirill Tkhai
    Tested-by: Kirill Tkhai
    Tested-by: Alexander Aring
    Signed-off-by: Al Viro

    Al Viro
     

19 Apr, 2018

2 commits

  • When sending the last iov that breaks into smaller buffers to fit the
    transfer size, it's necessary to check if this is the last iov.

    If this is the last iov, stop and proceed to send pages.

    Signed-off-by: Long Li
    Cc: stable@vger.kernel.org
    Signed-off-by: Steve French
    Reviewed-by: Ronnie Sahlberg

    Long Li
     
  • The last update to readdir introduced a temporary buffer to store the
    emitted readdir data, but as there are file names of variable length,
    there's a lot of unaligned access.
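
    When fixed-size fields are packed into a byte buffer next to
    variable-length names, they can land on odd addresses. One portable
    way to read and write such fields is to copy the bytes instead of
    dereferencing a cast pointer. The sketch below shows that general
    technique in userspace with hypothetical names; it is not the btrfs
    code.

```c
#include <stdint.h>
#include <string.h>

/* Store a 64-bit value at any address, aligned or not. */
static void store_u64(void *dst, uint64_t val)
{
    memcpy(dst, &val, sizeof(val));
}

/* Load a 64-bit value from any address, aligned or not. */
static uint64_t load_u64(const void *src)
{
    uint64_t val;

    memcpy(&val, src, sizeof(val));
    return val;
}

/* Write val at byte offset off (at most 24) in a scratch buffer
 * and read it back. */
static uint64_t roundtrip_at_offset(size_t off, uint64_t val)
{
    unsigned char buf[32] = {0};

    store_u64(buf + off, val);
    return load_u64(buf + off);
}
```

    Compilers typically turn these fixed-size copies into the best
    sequence the target allows, so strict-alignment machines such as
    sparc64 avoid the trap.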

    This was observed on a sparc64 machine:

    Kernel unaligned access at TPC[102f3080] btrfs_real_readdir+0x51c/0x718 [btrfs]

    Fixes: 23b5ec74943 ("btrfs: fix readdir deadlock with pagefault")
    CC: stable@vger.kernel.org # 4.14+
    Reported-and-tested-by: René Rebe
    Reviewed-by: Liu Bo
    Signed-off-by: David Sterba

    David Sterba
     

18 Apr, 2018

8 commits

  • If ext4 tries to start a reserved handle via
    jbd2_journal_start_reserved(), and the journal has been aborted, this
    can result in a NULL pointer dereference. This is because the fields
    h_journal and h_transaction in the handle structure share the same
    memory, via a union, so jbd2_journal_start_reserved() will clear
    h_journal before calling start_this_handle(). If this function fails
    due to an aborted handle, h_journal will still be NULL, and the call
    to jbd2_journal_free_reserved() will pass a NULL journal to
    sub_reserve_credits().
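
    The aliasing can be shown with a toy handle whose two pointers share
    storage through a union. The struct layouts and function names below
    are hypothetical stand-ins; only the h_journal and h_transaction
    member names come from the text above.

```c
#include <stdbool.h>
#include <stddef.h>

struct journal     { int id; };
struct transaction { int id; };

/* Toy handle: both pointers occupy the same memory. */
struct handle {
    union {
        struct transaction *h_transaction;
        struct journal     *h_journal;
    };
};

/* Stand-in for a start that fails because the journal was aborted. */
static int start_fails(struct handle *h)
{
    (void)h;
    return -1;
}

/* Return true if h_journal still points at the journal after a
 * failed start. */
static bool journal_survives_failure(bool restore_on_error)
{
    struct journal j = { 1 };
    struct handle h;
    struct journal *journal;

    h.h_journal = &j;
    journal = h.h_journal;   /* keep a copy before clearing */
    h.h_journal = NULL;      /* clears the shared storage */
    if (start_fails(&h) < 0 && restore_on_error)
        h.h_journal = journal;
    return h.h_journal == &j;
}
```

    Without the restore step, any cleanup code that reads h_journal after
    the failure sees NULL.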

    This can be reproduced by running "kvm-xfstests -c dioread_nolock
    generic/475".

    Cc: stable@kernel.org # 3.11
    Fixes: 8f7d89f36829b ("jbd2: transaction reservation support")
    Signed-off-by: Theodore Ts'o
    Reviewed-by: Andreas Dilger
    Reviewed-by: Jan Kara

    Theodore Ts'o
     
  • Commit 43b18595d660 ("btrfs: qgroup: Use separate meta reservation type
    for delalloc") merged into mainline is not the latest version submitted
    to mail list in Dec 2017.

    It has a fatally wrong @qgroup_free parameter, which results in increasing
    qgroup metadata pertrans reserved space and causes a lot of early EDQUOT.

    Fix it by applying the correct diff on top of current branch.

    Fixes: 43b18595d660 ("btrfs: qgroup: Use separate meta reservation type for delalloc")
    Signed-off-by: Qu Wenruo
    Signed-off-by: David Sterba

    Qu Wenruo
     
  • Commit 4f5427ccce5d ("btrfs: delayed-inode: Use new qgroup meta rsv for
    delayed inode and item") merged into mainline was not the latest version
    submitted to the mail list in Dec 2017.

    It lacks the following fixes:

    1) Remove btrfs_qgroup_convert_reserved_meta() call in
    btrfs_delayed_item_release_metadata()
    2) Remove btrfs_qgroup_reserve_meta_prealloc() call in
    btrfs_delayed_inode_reserve_metadata()

    Those fixes will resolve unexpected EDQUOT problems.

    Fixes: 4f5427ccce5d ("btrfs: delayed-inode: Use new qgroup meta rsv for delayed inode and item")
    Signed-off-by: Qu Wenruo
    Signed-off-by: David Sterba

    Qu Wenruo
     
  • Unlike reservation calculation used in inode rsv for metadata, qgroup
    doesn't really need to care about things like csum size or extent usage
    for the whole tree COW.

    Qgroups care more about net change of the extent usage.
    That's to say, if we're going to insert one file extent, it will mostly
    find its place in COWed tree block, leaving no change in extent usage.
    Or causing a leaf split, resulting in one new net extent and increasing
    qgroup number by nodesize.
    Or in an even more rare case, increase the tree level, increasing qgroup
    number by 2 * nodesize.

    So here instead of using the complicated calculation for extent
    allocator, which cares more about accuracy and no error, qgroup doesn't
    need that over-estimated reservation.

    This patch will maintain 2 new members in btrfs_block_rsv structure for
    qgroup, using much smaller calculation for qgroup rsv, reducing false
    EDQUOT.

    Signed-off-by: David Sterba
    Signed-off-by: Qu Wenruo

    Qu Wenruo
     
  • Unlike previous method that tries to commit transaction inside
    qgroup_reserve(), this time we will try to commit transaction using
    fs_info->transaction_kthread to avoid nested transaction and no need to
    worry about locking context.

    Since it's an asynchronous function call and we won't wait for
    transaction commit, unlike previous method, we must call it before we
    hit the qgroup limit.

    So this patch will use the ratio and size of qgroup meta_pertrans
    reservation as indicator to check if we should trigger a transaction
    commit. (meta_prealloc won't be cleaned in transaction commit, it's
    useless anyway)

    Signed-off-by: Qu Wenruo
    Signed-off-by: David Sterba

    Qu Wenruo
     
  • OSTA UDF specification does not mention whether the CS0 charset in case
    of two bytes per character encoding should be treated in UTF-16 or
    UCS-2. The sample code in the standard does not treat UTF-16 surrogates
    in any special way but on systems such as Windows which work in UTF-16
    internally, filenames would be treated as being in UTF-16 effectively.
    In Linux it is more difficult to handle characters outside of Base
    Multilingual plane (beyond 0xffff) as NLS framework works with 2-byte
    characters only. Just make sure we don't leak UTF-16 surrogates into the
    resulting string when loading names from the filesystem for now.
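
    UTF-16 surrogates are the 16-bit code units in the range 0xD800 to
    0xDFFF. A sketch of detecting them and keeping them out of a decoded
    name follows; the function names are hypothetical, and substituting
    '?' is an assumption of this sketch, since the text above does not
    say what replaces a surrogate.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True for both high (0xD800-0xDBFF) and low (0xDC00-0xDFFF) surrogates. */
static bool is_utf16_surrogate(uint16_t c)
{
    return c >= 0xD800 && c <= 0xDFFF;
}

/*
 * Replace every surrogate code unit in s with '?' (assumed
 * placeholder) and return how many were replaced.
 */
static size_t scrub_surrogates(uint16_t *s, size_t len)
{
    size_t replaced = 0;

    for (size_t i = 0; i < len; i++) {
        if (is_utf16_surrogate(s[i])) {
            s[i] = '?';
            replaced++;
        }
    }
    return replaced;
}
```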

    CC: stable@vger.kernel.org # >= v4.6
    Reported-by: Mingye Wang
    Signed-off-by: Jan Kara

    Jan Kara
     
  • Use new return type vm_fault_t for page_mkwrite
    handler.

    Signed-off-by: Souptick Joarder
    Reviewed-by: Matthew Wilcox
    Signed-off-by: Steve French

    Souptick Joarder
     
  • The current code null checks variable err_buf, which is always null
    when it is checked, hence utf16_path is free'd and the function
    returns -ENOENT every time it is called, making it impossible for the
    execution path to reach the following code:

    err_buf = err_iov.iov_base;

    Fix this by null checking err_iov.iov_base instead of err_buf. Also,
    notice that err_buf no longer needs to be initialized to NULL.
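
    The bug pattern reduces to testing a local that has not been assigned
    yet. A userspace sketch with a hypothetical struct and function
    names; only err_buf, err_iov, and iov_base come from the text above.

```c
#include <errno.h>
#include <stddef.h>

struct iov { void *iov_base; size_t iov_len; };

/* Buggy shape: err_buf is still NULL when tested, so this always fails. */
static int query_buggy(void *response)
{
    struct iov err_iov = { response, 0 };
    void *err_buf = NULL;

    if (!err_buf)
        return -ENOENT;
    err_buf = err_iov.iov_base;  /* never reached */
    (void)err_buf;
    return 0;
}

/* Fixed shape: test the buffer that was actually filled in. */
static int query_fixed(void *response)
{
    struct iov err_iov = { response, 0 };
    void *err_buf;

    if (!err_iov.iov_base)
        return -ENOENT;
    err_buf = err_iov.iov_base;
    (void)err_buf;
    return 0;
}
```

    The buggy variant returns -ENOENT even when a response buffer is
    present; the fixed variant fails only when it is missing.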

    Addresses-Coverity-ID: 1467876 ("Logically dead code")
    Fixes: 2d636199e400 ("cifs: Change SMB2_open to return an iov for the error parameter")
    Signed-off-by: Gustavo A. R. Silva
    Signed-off-by: Steve French
    Reviewed-by: Pavel Shilovsky

    Gustavo A. R. Silva
     

17 Apr, 2018

1 commit

  • Both ecryptfs_filldir() and ecryptfs_readlink_lower() use
    ecryptfs_decode_and_decrypt_filename() to translate lower filenames to
    upper filenames. The function correctly passes up lower filenames,
    unchanged, when filename encryption isn't in use. However, it was also
    passing up lower filenames when the filename wasn't encrypted or
    when decryption failed. Since 88ae4ab9802e, eCryptfs refuses to lookup
    lower plaintext names when filename encryption is enabled so this
    resulted in a situation where userspace would see lower plaintext
    filenames in calls to getdents(2) but then not be able to lookup those
    filenames.

    An example of this can be seen when enabling filename encryption on an
    eCryptfs mount at the root directory of an Ext4 filesystem:

    $ ls -1i /lower
    12 ECRYPTFS_FNEK_ENCRYPTED.FWYZD8TcW.5FV-TKTEYOHsheiHX9a-w.NURCCYIMjI8pn5BDB9-h3fXwrE--
    11 lost+found
    $ ls -1i /upper
    ls: cannot access '/upper/lost+found': No such file or directory
    ? lost+found
    12 test

    With this change, the lower lost+found dentry is ignored:

    $ ls -1i /lower
    12 ECRYPTFS_FNEK_ENCRYPTED.FWYZD8TcW.5FV-TKTEYOHsheiHX9a-w.NURCCYIMjI8pn5BDB9-h3fXwrE--
    11 lost+found
    $ ls -1i /upper
    12 test

    Additionally, some potentially noisy error/info messages in the related
    code paths are turned into debug messages so that the logs can't be
    easily filled.
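
    The listing rule can be sketched as a filter on lower names. The
    prefix string is the one visible in the listing above; the function
    names are hypothetical, and the sketch ignores the decryption-failure
    case, which is also no longer passed up.

```c
#include <stdbool.h>
#include <string.h>

#define FNEK_PREFIX "ECRYPTFS_FNEK_ENCRYPTED."

/* Does the lower name carry the encrypted-filename prefix? */
static bool has_fnek_prefix(const char *name)
{
    return strncmp(name, FNEK_PREFIX, strlen(FNEK_PREFIX)) == 0;
}

/* Should a lower name appear in the upper directory listing? */
static bool pass_up_lower_name(const char *name, bool filename_encryption)
{
    if (!filename_encryption)
        return true;               /* plaintext names pass through unchanged */
    return has_fnek_prefix(name);  /* plaintext names are hidden otherwise */
}
```

    With filename encryption enabled, lost+found is filtered out, which
    matches the second listing above.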

    Fixes: 88ae4ab9802e ("ecryptfs_lookup(): try either only encrypted or plaintext name")
    Reported-by: Guenter Roeck
    Cc: Al Viro
    Signed-off-by: Tyler Hicks

    Tyler Hicks