05 Apr, 2016

1 commit

  • PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced a *long* time
    ago with the promise that one day it would be possible to implement the
    page cache with bigger chunks than PAGE_SIZE.

    This promise never materialized, and it is unlikely that it ever will.

    We have many places where PAGE_CACHE_SIZE is assumed to be equal to
    PAGE_SIZE. And it's a constant source of confusion on whether a
    PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
    especially on the border between fs and mm.

    Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause too much
    breakage to be doable.

    Let's stop pretending that pages in page cache are special. They are
    not.

    The changes are pretty straight-forward:

    - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;

    - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;

    - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};

    - page_cache_get() -> get_page();

    - page_cache_release() -> put_page();
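
    As a minimal userspace sketch of why the shift rewrites above are safe:
    with PAGE_CACHE_SHIFT equal to PAGE_SHIFT, the shift amount is zero and
    the expression is unchanged. The 4 KiB page size and the helper names
    index_old(), index_new() and check_same() are assumptions made for
    illustration; they are not kernel code.

```c
#include <assert.h>

/* Assumption: 4 KiB pages. */
#define PAGE_SHIFT 12
#define PAGE_SIZE  (1UL << PAGE_SHIFT)

/* Before the patch, the page-cache macros were plain aliases. */
#define PAGE_CACHE_SHIFT PAGE_SHIFT
#define PAGE_CACHE_SIZE  PAGE_SIZE

/* Old style: scale a page-cache index to PAGE_SIZE units (shift by 0). */
unsigned long index_old(unsigned long index)
{
    return index << (PAGE_CACHE_SHIFT - PAGE_SHIFT);
}

/* New style: the no-op shift is simply dropped. */
unsigned long index_new(unsigned long index)
{
    return index;
}

/* Hypothetical helper: both forms agree for any index. */
void check_same(unsigned long index)
{
    assert(index_old(index) == index_new(index));
}
```

    Calling check_same() with any index passes, which is the property the
    mechanical conversion relies on.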

    This patch contains automated changes generated with coccinelle using
    script below. For some reason, coccinelle doesn't patch header files.
    I've called spatch for them manually.

    The only adjustment after coccinelle is a revert of the changes to the
    PAGE_CACHE_ALIGN definition: we are going to drop it later.

    There are a few places in the code that coccinelle didn't reach. I'll
    fix them manually in a separate patch. Comments and documentation will
    also be addressed in a separate patch.

    virtual patch

    @@
    expression E;
    @@
    - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
    + E

    @@
    expression E;
    @@
    - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
    + E

    @@
    @@
    - PAGE_CACHE_SHIFT
    + PAGE_SHIFT

    @@
    @@
    - PAGE_CACHE_SIZE
    + PAGE_SIZE

    @@
    @@
    - PAGE_CACHE_MASK
    + PAGE_MASK

    @@
    expression E;
    @@
    - PAGE_CACHE_ALIGN(E)
    + PAGE_ALIGN(E)

    @@
    expression E;
    @@
    - page_cache_get(E)
    + get_page(E)

    @@
    expression E;
    @@
    - page_cache_release(E)
    + put_page(E)

    Signed-off-by: Kirill A. Shutemov
    Acked-by: Michal Hocko
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     

18 Mar, 2016

1 commit

  • Pull GFS2 updates from Bob Peterson:
    "We only have six patches ready for this merge window:

    - Arnd Bergmann contributed a patch that fixes an uninitialized
    variable warning.

    - The second patch avoids a kernel panic due to referencing an iopen
    glock that may not be held, in an error path.

    - The third patch fixes a rounding error that caused xfstests direct
    IO write "fsx" tests to fail on GFS2.

    - The fourth patch tidies up the code path when glocks are being
    reused to recreate a dinode that was recently deleted.

    - The fifth reverts an ages-old patch that should no longer be
    needed, and which interfered with the transition of dinodes from
    unlinked to free.

    - And lastly, a patch to eliminate a function parameter that's not
    needed"

    * tag 'gfs2-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
    GFS2: Eliminate parameter non_block on gfs2_inode_lookup
    GFS2: Don't filter out I_FREEING inodes anymore
    GFS2: Prevent delete work from occurring on glocks used for create
    GFS2: Fix direct IO write rounding error
    gfs2: avoid uninitialized variable warning
    GFS2: Check if iopen is held when deleting inode

    Linus Torvalds
     

15 Mar, 2016

5 commits

  • Now that we're not filtering out I_FREEING inodes from our lookups
    anymore, we can eliminate the non_block parameter from the lookup
    function.

    Signed-off-by: Bob Peterson
    Acked-by: Steven Whitehouse

    Bob Peterson
     
  • This patch basically reverts a very old patch from 2008,
    7a9f53b3c1875bef22ad4588e818bc046ef183da, with the title
    "Alternate gfs2_iget to avoid looking up inodes being freed".
    The original patch was designed to avoid a deadlock caused by lock
    ordering with try_rgrp_unlink. The patch forced the function to not
    find inodes that were being removed by VFS. The problem is, that
    made it impossible for nodes to delete their own unlinked dinodes
    after a certain point in time, because the inode needed was not found
    by this filtering process. There is no longer a need for the patch,
    since function try_rgrp_unlink no longer locks the inode: All it does
    is queue the glock onto the delete work_queue, so there should be no
    more deadlock.

    Signed-off-by: Bob Peterson
    Signed-off-by: Steven Whitehouse

    Bob Peterson
     
  • This patch tries to prevent delete work (queued via iopen callback)
    from executing if the glock is currently being used to create
    a new inode.

    Signed-off-by: Bob Peterson
    Acked-by: Steven Whitehouse

    Bob Peterson
     
  • The fsx test in xfstests was failing because it was using direct IO
    writes which were using a bad calculation. It was using
    loff_t lstart = offset & (PAGE_CACHE_SIZE - 1); when it should be
    loff_t lstart = offset & ~(PAGE_CACHE_SIZE - 1);
    Thus, the write at offset 0x67e00 was calculating lstart to be
    0xe00, the address of our corruption. Instead, it should have been
    0x67000. This patch fixes the calculation.
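
    The arithmetic above can be reproduced with a small userspace sketch.
    It assumes 4 KiB pages (PAGE_CACHE_SIZE == 4096); the function names
    lstart_buggy() and lstart_fixed() are hypothetical, and long long
    stands in for loff_t.

```c
#include <assert.h>

/* Assumption: 4 KiB pages. */
#define PAGE_CACHE_SIZE 4096LL

/* Buggy form: keeps only the offset within the page. */
long long lstart_buggy(long long offset)
{
    assert(offset >= 0);
    return offset & (PAGE_CACHE_SIZE - 1);
}

/* Fixed form: clears the in-page bits, rounding down to the page start. */
long long lstart_fixed(long long offset)
{
    assert(offset >= 0);
    return offset & ~(PAGE_CACHE_SIZE - 1);
}
```

    For the offset 0x67e00 from the failing test, lstart_buggy() yields
    0xe00 and lstart_fixed() yields 0x67000.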

    Signed-off-by: Bob Peterson
    Acked-by: Steven Whitehouse

    Bob Peterson
     
  • We get a bogus warning about a potential uninitialized variable
    use in gfs2, because the compiler does not figure out that we
    never use the leaf number if get_leaf_nr() returns an error:

    fs/gfs2/dir.c: In function 'get_first_leaf':
    fs/gfs2/dir.c:802:9: warning: 'leaf_no' may be used uninitialized in this function [-Wmaybe-uninitialized]
    fs/gfs2/dir.c: In function 'dir_split_leaf':
    fs/gfs2/dir.c:1021:8: warning: 'leaf_no' may be used uninitialized in this function [-Wmaybe-uninitialized]

    Changing the 'if (!error)' to 'if (!IS_ERR_VALUE(error))' is
    sufficient to let gcc understand that this is exactly the same
    condition as in IS_ERR() so it can optimize the code path enough
    to understand it.
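
    A userspace sketch of the pattern follows. The IS_ERR_VALUE() macro
    here is a simplified stand-in for the kernel one (without unlikely()),
    and lookup_leaf_nr() and first_leaf() are hypothetical functions that
    only imitate the shape of the gfs2 code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MAX_ERRNO 4095
/* Simplified stand-in: true for values in the range [-MAX_ERRNO, -1]. */
#define IS_ERR_VALUE(x) ((unsigned long)(x) >= (unsigned long)-MAX_ERRNO)

/* Hypothetical lookup: fills *leaf_out only on success. */
int lookup_leaf_nr(int index, unsigned long *leaf_out)
{
    assert(leaf_out != NULL);
    if (index < 0)
        return -EIO;                 /* error: *leaf_out is left untouched */
    *leaf_out = 100UL + (unsigned long)index;
    return 0;
}

/* Caller: leaf_no is read only when the lookup did not fail. */
long first_leaf(int index)
{
    unsigned long leaf_no;
    int error = lookup_leaf_nr(index, &leaf_no);

    if (!IS_ERR_VALUE(error))
        return (long)leaf_no;
    return error;
}
```

    Whether a given compiler version still warns on such code depends on
    its inlining and analysis; the sketch only shows the control flow.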

    Signed-off-by: Arnd Bergmann
    Signed-off-by: Bob Peterson

    Arnd Bergmann
     

23 Jan, 2016

1 commit

  • New wrappers inode_{lock,unlock,trylock,is_locked,lock_nested},
    parallel to mutex_{lock,unlock,trylock,is_locked,lock_nested}, with
    inode_foo(inode) being mutex_foo(&inode->i_mutex).

    Please, use those for access to ->i_mutex; over the coming cycle
    ->i_mutex will become rwsem, with ->lookup() done with it held
    only shared.

    Signed-off-by: Al Viro

    Al Viro
     

18 Jan, 2016

1 commit

  • Pull security subsystem updates from James Morris:

    - EVM gains support for loading an x509 cert from the kernel
    (EVM_LOAD_X509), into the EVM trusted kernel keyring.

    - Smack implements 'file receive' process-based permission checking for
    sockets, rather than just depending on inode checks.

    - Misc enhancements for TPM & TPM2.

    - Cleanups and bugfixes for SELinux, Keys, and IMA.

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (41 commits)
    selinux: Inode label revalidation performance fix
    KEYS: refcount bug fix
    ima: ima_write_policy() limit locking
    IMA: policy can be updated zero times
    selinux: rate-limit netlink message warnings in selinux_nlmsg_perm()
    selinux: export validatetrans decisions
    gfs2: Invalid security labels of inodes when they go invalid
    selinux: Revalidate invalid inode security labels
    security: Add hook to invalidate inode security labels
    selinux: Add accessor functions for inode->i_security
    security: Make inode argument of inode_getsecid non-const
    security: Make inode argument of inode_getsecurity non-const
    selinux: Remove unused variable in selinux_inode_init_security
    keys, trusted: seal with a TPM2 authorization policy
    keys, trusted: select hash algorithm for TPM2 chips
    keys, trusted: fix: *do not* allow duplicate key options
    tpm_ibmvtpm: properly handle interrupted packet receptions
    tpm_tis: Tighten IRQ auto-probing
    tpm_tis: Refactor the interrupt setup
    tpm_tis: Get rid of the duplicate IRQ probing code
    ...

    Linus Torvalds
     

15 Jan, 2016

1 commit

  • Mark those kmem allocations that are known to be easily triggered from
    userspace as __GFP_ACCOUNT/SLAB_ACCOUNT, which makes them accounted to
    memcg. For the list, see below:

    - threadinfo
    - task_struct
    - task_delay_info
    - pid
    - cred
    - mm_struct
    - vm_area_struct and vm_region (nommu)
    - anon_vma and anon_vma_chain
    - signal_struct
    - sighand_struct
    - fs_struct
    - files_struct
    - fdtable and fdtable->full_fds_bits
    - dentry and external_name
    - inode for all filesystems. This is the most tedious part, because
    most filesystems overwrite the alloc_inode method.

    The list is far from complete, so feel free to add more objects.
    Nevertheless, it should be close to "account everything" approach and
    keep most workloads within bounds. Malevolent users will be able to
    breach the limit, but this was possible even with the former "account
    everything" approach (simply because it did not account everything in
    fact).

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Vladimir Davydov
    Acked-by: Johannes Weiner
    Acked-by: Michal Hocko
    Cc: Tejun Heo
    Cc: Greg Thelen
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vladimir Davydov
     

14 Jan, 2016

1 commit

  • This patch fixes an error condition in which an inode is partially
    created in gfs2_create_inode() but then some error is discovered,
    which causes it to fail and call iput() before the iopen glock is
    created or held. In that case, gfs2_delete_inode would try to
    unlock an iopen glock that doesn't yet exist. Therefore, we test
    its holder (which must exist) for the HIF_HOLDER bit before trying
    to dq it.

    Signed-off-by: Bob Peterson
    Acked-by: Steven Whitehouse

    Bob Peterson
     

13 Jan, 2016

2 commits

  • Pull GFS2 updates from Bob Peterson:
    "Here is a list of patches we've accumulated for GFS2 for the current
    upstream merge window. Last window's set was short, but I warned that
    this one would be bigger, and so it is. We've got 19 patches:

    - A patch from Abhi Das to propagate the GFS2_DIF_SYSTEM bit so that
    newly added journals don't get flagged, deleted, and recreated by
    fsck.gfs2.

    - Two patches from Andreas Gruenbacher to improve GFS2 performance
    where extended attributes are involved.

    - A patch from Andy Price to fix a suspicious rcu dereference error.

    - Two patches from Ben Marzinski that rework how GFS2's NFS cookies
    are managed. This fixes readdir problems with nfs-over-gfs2.

    - A patch from Ben Marzinski that fixes a race in unmounting GFS2.

    - A set of four patches from me to move the resource group
    reservations inside the gfs2 inode to improve performance and fix a
    bug whereby get_write_access improperly prevented some operations
    like chown.

    - A patch from me to spinlock-protect the setting of system statfs
    file data. This was causing small discrepancies between df and du.

    - A patch from me to reintroduce a timeout while clearing glocks
    which was accidentally dropped some time ago.

    - A patch from me to wait for iopen glock dequeues in order to
    improve deleting of files that were unlinked from a different
    cluster node.

    - A patch from me to ensure metadata address spaces get truncated
    when an inode is evicted.

    - A patch from me to fix a bug in which a memory leak could occur in
    some error cases when inodes were trying to be created.

    - A patch to consistently use iopen glocks to transition from the
    unlinked state to the deleted state.

    - A patch to fix a glock reference count error when inode creation
    fails.

    - A patch from Junxiao Bi to fix an flock panic.

    - A patch from Markus Elfring that removes an unnecessary if"

    * tag 'gfs2-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
    gfs2: fix flock panic issue
    GFS2: Don't do glock put on when inode creation fails
    GFS2: Always use iopen glock for gl_deletes
    GFS2: Release iopen glock in gfs2_create_inode error cases
    GFS2: Truncate address space mapping when deleting an inode
    GFS2: Wait for iopen glock dequeues
    gfs2: clear journal live bit in gfs2_log_flush
    gfs2: change gfs2 readdir cookie
    gfs2: keep offset when splitting dir leaf blocks
    GFS2: Reintroduce a timeout in function gfs2_gl_hash_clear
    GFS2: Update master statfs buffer with sd_statfs_spin locked
    GFS2: Reduce size of incore inode
    GFS2: Make rgrp reservations part of the gfs2_inode structure
    GFS2: Extract quota data from reservations structure (revert 5407e24)
    gfs2: Extended attribute readahead optimization
    gfs2: Extended attribute readahead
    GFS2: Use rht_for_each_entry_rcu in glock_hash_walk
    GFS2: Delete an unnecessary check before the function call "iput"
    gfs2: Automatically set GFS2_DIF_SYSTEM flag on system files

    Linus Torvalds
     
  • Pull misc vfs updates from Al Viro:
    "All kinds of stuff. That probably should've been 5 or 6 separate
    branches, but by the time I'd realized how large and mixed that bag
    had become it had been too close to -final to play with rebasing.

    Some fs/namei.c cleanups there, memdup_user_nul() introduction and
    switching open-coded instances, burying long-dead code, whack-a-mole
    of various kinds, several new helpers for ->llseek(), assorted
    cleanups and fixes from various people, etc.

    One piece probably deserves special mention - Neil's
    lookup_one_len_unlocked(). Similar to lookup_one_len(), but gets
    called without ->i_mutex and tries to avoid ever taking it. That, of
    course, means that it's not useful for any directory modifications,
    but things like getting inode attributes in nfsd readdirplus are fine
    with that. I really should've asked for moratorium on lookup-related
    changes this cycle, but since I hadn't done that early enough... I
    *am* asking for that for the coming cycle, though - I'm going to try
    and get conversion of i_mutex to rwsem with ->lookup() done under lock
    taken shared.

    There will be a patch closer to the end of the window, along the lines
    of the one Linus had posted last May - mechanical conversion of
    ->i_mutex accesses to inode_lock()/inode_unlock()/inode_trylock()/
    inode_is_locked()/inode_lock_nested(). To quote Linus back then:

    -----
    | This is an automated patch using
    |
    | sed 's/mutex_lock(&\(.*\)->i_mutex)/inode_lock(\1)/'
    | sed 's/mutex_unlock(&\(.*\)->i_mutex)/inode_unlock(\1)/'
    | sed 's/mutex_lock_nested(&\(.*\)->i_mutex,[ ]*I_MUTEX_\([A-Z0-9_]*\))/inode_lock_nested(\1, I_MUTEX_\2)/'
    | sed 's/mutex_is_locked(&\(.*\)->i_mutex)/inode_is_locked(\1)/'
    | sed 's/mutex_trylock(&\(.*\)->i_mutex)/inode_trylock(\1)/'
    |
    | with a very few manual fixups
    -----

    I'm going to send that once the ->i_mutex-affecting stuff in -next
    gets mostly merged (or when Linus says he's about to stop taking
    merges)"

    * 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (63 commits)
    nfsd: don't hold i_mutex over userspace upcalls
    fs:affs:Replace time_t with time64_t
    fs/9p: use fscache mutex rather than spinlock
    proc: add a reschedule point in proc_readfd_common()
    logfs: constify logfs_block_ops structures
    fcntl: allow to set O_DIRECT flag on pipe
    fs: __generic_file_splice_read retry lookup on AOP_TRUNCATED_PAGE
    fs: xattr: Use kvfree()
    [s390] page_to_phys() always returns a multiple of PAGE_SIZE
    nbd: use ->compat_ioctl()
    fs: use block_device name vsprintf helper
    lib/vsprintf: add %*pg format specifier
    fs: use gendisk->disk_name where possible
    poll: plug an unused argument to do_poll
    amdkfd: don't open-code memdup_user()
    cdrom: don't open-code memdup_user()
    rsxx: don't open-code memdup_user()
    mtip32xx: don't open-code memdup_user()
    [um] mconsole: don't open-code memdup_user_nul()
    [um] hostaudio: don't open-code memdup_user()
    ...

    Linus Torvalds
     

12 Jan, 2016

1 commit

  • Pull vfs xattr updates from Al Viro:
    "Andreas' xattr cleanup series.

    It's a followup to his xattr work that went in last cycle; -0.5KLoC"

    * 'work.xattr' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    xattr handlers: Simplify list operation
    ocfs2: Replace list xattr handler operations
    nfs: Move call to security_inode_listsecurity into nfs_listxattr
    xfs: Change how listxattr generates synthetic attributes
    tmpfs: listxattr should include POSIX ACL xattrs
    tmpfs: Use xattr handler infrastructure
    btrfs: Use xattr handler infrastructure
    vfs: Distinguish between full xattr names and proper prefixes
    posix acls: Remove duplicate xattr name definitions
    gfs2: Remove gfs2_xattr_acl_chmod
    vfs: Remove vfs_xattr_cmp

    Linus Torvalds
     

07 Jan, 2016

1 commit


31 Dec, 2015

1 commit


25 Dec, 2015

1 commit

  • When gfs2 releases the glock of an inode, it must invalidate all
    information cached for that inode, including the page cache and acls.
    Use the new security_inode_invalidate_secctx hook to also invalidate
    security labels in that case. These items will be reread from disk
    when needed after reacquiring the glock.

    Signed-off-by: Andreas Gruenbacher
    Acked-by: Bob Peterson
    Acked-by: Steven Whitehouse
    Cc: cluster-devel@redhat.com
    [PM: fixed spelling errors and description line lengths]
    Signed-off-by: Paul Moore

    Andreas Gruenbacher
     

22 Dec, 2015

1 commit

  • Commit 4f6563677ae8 ("Move locks API users to locks_lock_inode_wait()")
    moved the flock/posix lock identification code to
    locks_lock_inode_wait(), but failed to set fl_flags to FL_FLOCK, which
    causes a kernel panic in locks_lock_inode_wait().

    Fixes: 4f6563677ae8 ("Move locks API users to locks_lock_inode_wait()")
    Signed-off-by: Junxiao Bi
    Signed-off-by: Bob Peterson

    Junxiao Bi
     

19 Dec, 2015

5 commits

  • Currently the error path of function gfs2_inode_lookup calls function
    gfs2_glock_put corresponding to an earlier call to gfs2_glock_get for
    the inode glock. That's wrong because the error path also calls
    iget_failed() which eventually calls iput, which eventually calls
    gfs2_evict_inode, which does another gfs2_glock_put. This double-put
    can cause the glock reference count to become incorrect.

    Signed-off-by: Bob Peterson

    Bob Peterson
     
  • Before this patch, when function try_rgrp_unlink queued a glock for
    delete_work to reclaim the space, it used the inode glock to do so.
    That's different from the iopen callback which uses the iopen glock
    for the same purpose. We should be consistent and always use the
    iopen glock. This may also save us reference counting problems with
    the inode glock, since clear_glock does an extra glock_put() for the
    inode glock.

    Signed-off-by: Bob Peterson

    Bob Peterson
     
  • Some error cases in gfs2_create_inode were not unlocking the iopen
    glock, leaving the reference count incorrect. This adds the proper unlock.
    The error logic in function gfs2_create_inode was also convoluted,
    so this patch simplifies it. It also takes care of a bug in
    which gfs2_qa_delete() was not called in an error case.

    Signed-off-by: Bob Peterson

    Bob Peterson
     
  • In function gfs2_delete_inode() we write and flush the mapping for
    a glock, among other things. We truncate the mapping for the inode,
    but we never truncate the mapping for the glock. This patch makes it
    also truncate the metamapping. This avoids cases where the glock is
    reused by another process that is trying to recreate an inode in its
    place using the same block.

    Signed-off-by: Bob Peterson
    Acked-by: Steven Whitehouse

    Bob Peterson
     
  • This patch changes every glock_dq for iopen glocks into a dq_wait.
    This makes sure that iopen glocks do not outlive the inode itself.
    In turn, that ensures that anyone trying to unlink the glock will
    be able to find the inode when it receives a remote iopen callback.

    Signed-off-by: Bob Peterson
    Acked-by: Steven Whitehouse

    Bob Peterson
     

15 Dec, 2015

7 commits

  • When gfs2 was unmounting filesystems or changing them to read-only it
    was clearing the SDF_JOURNAL_LIVE bit before the final log flush. This
    caused a race. If an inode glock got demoted in the gap between
    clearing the bit and the shutdown flush, it would be unable to reserve
    log space to clear out the active items list in inode_go_sync, causing an
    error in inode_go_inval because the glock was still dirty.

    To solve this, the SDF_JOURNAL_LIVE bit is now cleared inside the
    shutdown log flush. This means that, because of the locking on the log
    blocks, either inode_go_sync will be able to reserve space to clean the
    glock before the shutdown flush, or the shutdown flush will clean the
    glock itself, before inode_go_sync fails to reserve the space. Either
    way, the glock will be clean before inode_go_inval.

    Signed-off-by: Benjamin Marzinski
    Signed-off-by: Bob Peterson

    Benjamin Marzinski
     
  • gfs2 currently returns 31 bits of filename hash as a cookie that readdir
    uses for an offset into the directory. When there are a large number of
    directory entries, the likelihood of a collision goes up way too
    quickly. GFS2 will now return cookies that are guaranteed unique for a
    while, and then fall back to using 30 bits of filename hash.
    Specifically, the directory leaf blocks are divided up into chunks based
    on the minimum size of a gfs2 directory entry (48 bytes). Each entry's
    cookie is based off the chunk where it starts, in the linked list of
    leaf blocks that it hashes to (there are 131072 hash buckets). Directory
    entries will have unique cookies until they reach chunk 8192.
    Assuming the largest filenames possible, and the least efficient spacing
    possible, this new method will still be able to return unique cookies when
    the previous method has statistically more than a 99% chance of a
    collision. The non-unique cookies it falls back to are guaranteed to not
    collide with the unique cookies.

    unique cookies will be in this format:
    - 1 bit "0" to make sure the returned cookie is positive
    - 17 bits for the hash table index
    - 1 bit for the mode "0"
    - 13 bits for the offset

    non-unique cookies will be in this format:
    - 1 bit "0" to make sure the returned cookie is positive
    - 17 bits for the hash table index
    - 1 bit for the mode "1"
    - 13 more bits of the name hash
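
    The two layouts above can be sketched as a 32-bit packing. This is an
    illustration only: it assumes the fields are laid out from the most
    significant bit down in the order listed, and the names pack_cookie(),
    cookie_hash_index(), cookie_is_unique() and cookie_low() are
    hypothetical, not the gfs2 implementation.

```c
#include <assert.h>
#include <stdint.h>

#define HASH_INDEX_BITS 17u   /* 131072 hash buckets */
#define LOW_BITS        13u   /* chunk offset, or 13 more bits of name hash */

/* Pack the fields; the top bit stays 0 so the cookie is positive. */
uint32_t pack_cookie(uint32_t hash_index, uint32_t non_unique, uint32_t low)
{
    assert(hash_index < (1u << HASH_INDEX_BITS));
    assert(non_unique <= 1u);
    assert(low < (1u << LOW_BITS));
    return (hash_index << (LOW_BITS + 1u)) | (non_unique << LOW_BITS) | low;
}

/* Hash table index: the 17 bits above the mode bit. */
uint32_t cookie_hash_index(uint32_t cookie)
{
    return cookie >> (LOW_BITS + 1u);
}

/* Mode bit 0 means a location-based (unique) cookie. */
int cookie_is_unique(uint32_t cookie)
{
    return ((cookie >> LOW_BITS) & 1u) == 0u;
}

/* Low 13 bits: the chunk offset or the extra hash bits. */
uint32_t cookie_low(uint32_t cookie)
{
    return cookie & ((1u << LOW_BITS) - 1u);
}
```

    Under this layout the largest possible value is 0x7fffffff, so every
    cookie fits in a positive signed 32-bit integer, and a unique cookie
    can never equal a non-unique one because the mode bit differs.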

    Another benefit of location-based cookies is that once a directory's
    exhash table is fully extended (so that multiple hash table indexes do
    not use the same leaf blocks), gfs2 can skip sorting the directory
    entries until it reaches the non-unique ones, and then it only needs to
    sort these. This provides a significant speed up for directory reads of
    very large directories.

    The only issue is that for these cookies to continue to point to the
    correct entry as files are added and removed from the directory, gfs2
    must keep the entries at the same offset in the leaf block when they are
    split (see my previous patch). This means that until all the nodes in a
    cluster are running with code that will split the directory leaf blocks
    this way, none of the nodes can use the new cookie code. To deal with
    this, gfs2 now has the mount option loccookie, which, if set, will make
    it return these new location based cookies. This option must not be set
    until all nodes in the cluster are at least running this version of the
    kernel code, and you have guaranteed that there are no outstanding
    cookies required by other software, such as NFS.

    gfs2 uses some of the extra space at the end of the gfs2_dirent
    structure to store the calculated readdir cookies. This keeps us from
    needing to allocate a separate array to hold these values. gfs2
    recomputes the cookie stored in de_cookie for every readdir call. The
    time it takes to do so is small, and if gfs2 expected this value to be
    saved on disk, the new code wouldn't work correctly on filesystems
    created with an earlier version of gfs2.

    One issue with adding de_cookie to the union in the gfs2_dirent
    structure is that it caused the union to align itself to a 4 byte
    boundary, instead of its previous 2 byte boundary. This changed the
    offset of de_rahead. To solve that, I pulled de_rahead out of the union,
    since it does not need to be there.

    Signed-off-by: Benjamin Marzinski
    Signed-off-by: Bob Peterson

    Benjamin Marzinski
     
  • Currently, when gfs2 splits a directory leaf block, the dirents that
    need to be copied to the new leaf block are packed into the start of it.
    This is good for space efficiency. However, if gfs2 were to copy those
    dirents into the exact same offset in the new leaf block as they had in
    the old block, it would be able to generate a readdir cookie based on
    the dirent location, that would be guaranteed to be unique up well past
    where the current code is statistically almost guaranteed to have
    collisions. So, gfs2 now keeps the dirent's offset in the block the
    same when it copies it to the new leaf block.

    Signed-off-by: Benjamin Marzinski
    Signed-off-by: Bob Peterson

    Benjamin Marzinski
     
  • At some point in the past, we used to have a timeout when GFS2 was
    unmounting, trying to clear out its glocks. If the timeout expired,
    it would dump the remaining glocks to the kernel messages so that
    developers could debug the problem. That timeout was eliminated,
    probably by accident. This patch reintroduces it.

    Signed-off-by: Bob Peterson

    Bob Peterson
     
  • Before this patch, function update_statfs called gfs2_statfs_change_out
    to update the master statfs buffer without the sd_statfs_spin held.
    In theory, another process could call gfs2_statfs_sync, which takes
    the sd_statfs_spin lock and re-reads m_sc from the buffer. So there's
    a theoretical timing window in which one process could write the
    master statfs buffer, then another comes along and re-reads it, wiping
    out the changes.

    Signed-off-by: Bob Peterson

    Bob Peterson
     
  • This patch makes no functional changes. Its goal is to reduce the
    size of the gfs2 inode in memory by rearranging structures and
    changing the size of some variables within the structure.

    Signed-off-by: Bob Peterson

    Bob Peterson
     
  • Before this patch, multi-block reservation structures were allocated
    from a special slab. This patch folds the structure into the gfs2_inode
    structure. The disadvantage is that the gfs2_inode needs more memory,
    even when a file is opened read-only. The advantages are: (a) we don't
    need the special slab and the extra time it takes to allocate and
    deallocate from it. (b) we no longer need to worry that the structure
    exists for things like quota management. (c) This also allows us to
    remove the calls to get_write_access and put_write_access since we
    know the structure will exist.

    Signed-off-by: Bob Peterson

    Bob Peterson
     

09 Dec, 2015

1 commit

  • new method: ->get_link(); replacement of ->follow_link(). The differences
    are:
    * inode and dentry are passed separately
    * might be called both in RCU and non-RCU mode;
    the former is indicated by passing it a NULL dentry.
    * when called that way it isn't allowed to block
    and should return ERR_PTR(-ECHILD) if it needs to be called
    in non-RCU mode.
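
    The calling convention can be sketched in userspace as follows.
    ERR_PTR(), PTR_ERR() and IS_ERR() are simplified stand-ins for the
    kernel helpers, the two structs are reduced to what the example needs,
    and example_get_link() with its "cached" flag is a hypothetical
    instance, not any in-tree filesystem.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MAX_ERRNO 4095
static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline int IS_ERR(const void *ptr)
{
    return (unsigned long)ptr >= (unsigned long)-MAX_ERRNO;
}

/* Reduced, hypothetical versions of the kernel structures. */
struct dentry { int unused; };
struct inode { const char *link_target; int cached; };

/*
 * A NULL dentry signals RCU mode, where blocking is not allowed.
 * If the target is not already available, ask to be called again
 * in non-RCU mode by returning ERR_PTR(-ECHILD).
 */
const char *example_get_link(struct dentry *dentry, struct inode *inode)
{
    assert(inode != NULL);
    if (!dentry && !inode->cached)
        return ERR_PTR(-ECHILD);
    return inode->link_target;
}
```

    A caller that receives -ECHILD is expected to retry with a real
    dentry, at which point the method is free to block.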

    It's a flagday change - the old method is gone, all in-tree instances
    converted. Conversion isn't hard; that said, so far very few instances
    do not immediately bail out when called in RCU mode. That'll change
    in the next commits.

    Signed-off-by: Al Viro

    Al Viro
     

07 Dec, 2015

2 commits


24 Nov, 2015

1 commit

  • This patch basically reverts the majority of patch 5407e24.
    That patch eliminated the gfs2_qadata structure in favor of just
    using the reservations structure. The problem with doing that is that
    it increases the size of the reservations structure. That is not an
    issue until it comes time to fold the reservations structure into the
    inode in memory so we know it's always there. By separating out the
    quota structure again, we aren't punishing the non-quota users by
    making all the inodes bigger, requiring more slab space. This patch
    creates a new slab area to allocate the quota stuff so it's managed
    a little more sanely.

    Signed-off-by: Bob Peterson

    Bob Peterson
     

19 Nov, 2015

1 commit

  • Instead of submitting a READ_SYNC bio for the inode and a READA bio for
    the inode's extended attributes through submit_bh, submit a single READ_SYNC
    bio for both through submit_bio when possible. This can be more
    efficient on some kinds of block devices.

    Signed-off-by: Andreas Gruenbacher
    Signed-off-by: Bob Peterson

    Andreas Gruenbacher
     

17 Nov, 2015

3 commits

  • When gfs2 allocates an inode and its extended attribute block next to
    each other at inode create time, the inode's directory entry indicates
    that in de_rahead. In that case, we can readahead the extended
    attribute block when we read in the inode.

    Signed-off-by: Andreas Gruenbacher
    Signed-off-by: Bob Peterson

    Andreas Gruenbacher
     
  • This lockdep splat was being triggered on umount:

    [55715.973122] ===============================
    [55715.980169] [ INFO: suspicious RCU usage. ]
    [55715.981021] 4.3.0-11553-g8d3de01-dirty #15 Tainted: G W
    [55715.982353] -------------------------------
    [55715.983301] fs/gfs2/glock.c:1427 suspicious rcu_dereference_protected() usage!

    The code it refers to is the rht_for_each_entry_safe usage in
    glock_hash_walk. The condition that triggers the warning is
    lockdep_rht_bucket_is_held(tbl, hash) which is checked in the
    __rcu_dereference_protected macro.

    The rhashtable buckets are not changed in glock_hash_walk so it's safe
    to rely on the rcu protection. Replace the rht_for_each_entry_safe()
    usage with rht_for_each_entry_rcu(), which doesn't care whether the
    bucket lock is held if the rcu read lock is held.

    Signed-off-by: Andrew Price
    Signed-off-by: Bob Peterson
    Acked-by: Steven Whitehouse

    Andrew Price
     
  • The iput() function tests whether its argument is NULL and then
    returns immediately. Thus the test around the call is not needed.

    This issue was detected by using the Coccinelle software.

    Signed-off-by: Markus Elfring
    Signed-off-by: Bob Peterson

    Markus Elfring
     

14 Nov, 2015

2 commits

  • Pull vfs xattr cleanups from Al Viro.

    * 'for-linus-3' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    f2fs: xattr simplifications
    squashfs: xattr simplifications
    9p: xattr simplifications
    xattr handlers: Pass handler to operations instead of flags
    jffs2: Add missing capability check for listing trusted xattrs
    hfsplus: Remove unused xattr handler list operations
    ubifs: Remove unused security xattr handler
    vfs: Fix the posix_acl_xattr_list return value
    vfs: Check attribute names in posix acl xattr handers

    Linus Torvalds
     
  • The xattr_handler operations are currently all passed a file system
    specific flags value which the operations can use to disambiguate between
    different handlers; some file systems use that to distinguish the xattr
    namespace, for example. In some operations, it would be useful to also have
    access to the handler prefix. To allow that, pass a pointer to the handler
    to operations instead of the flags value alone.

    Signed-off-by: Andreas Gruenbacher
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Andreas Gruenbacher