26 Jun, 2018

2 commits

  • commit 7f54910fa8dfe504f2e1563f4f6ddc3294dfbf3a upstream.

    OrangeFS formerly failed to set attributes_mask with the result that
    software could not see immutable and append flags present in the
    filesystem.

    Reported-by: Becky Ligon
    Signed-off-by: Martin Brandenburg
    Fixes: 68a24a6cc4a6 ("orangefs: implement statx")
    Cc: stable@vger.kernel.org
    Cc: hubcap@omnibond.com
    Signed-off-by: Mike Marshall
    Signed-off-by: Greg Kroah-Hartman

    Martin Brandenburg
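
The role of attributes_mask in the commit above can be illustrated with a small userspace sketch. It assumes a caller that trusts a bit in stx_attributes only when the same bit is set in stx_attributes_mask; the helper name and the enum are hypothetical, and the two flag values mirror include/uapi/linux/stat.h.

```c
#include <stdint.h>

/* Values mirror include/uapi/linux/stat.h; defined here only if the
 * headers already in scope did not provide them. */
#ifndef STATX_ATTR_IMMUTABLE
#define STATX_ATTR_IMMUTABLE 0x00000010
#endif
#ifndef STATX_ATTR_APPEND
#define STATX_ATTR_APPEND 0x00000020
#endif

enum flag_state { FLAG_UNSUPPORTED, FLAG_CLEAR, FLAG_SET };

/* Hypothetical helper: interpret one attribute flag from a statx result.
 * A flag that is absent from attributes_mask is reported as unsupported,
 * whatever the attributes word itself says. */
static enum flag_state statx_flag_state(uint64_t attributes,
                                        uint64_t attributes_mask,
                                        uint64_t flag)
{
    if (!(attributes_mask & flag))
        return FLAG_UNSUPPORTED;
    return (attributes & flag) ? FLAG_SET : FLAG_CLEAR;
}
```

With a mask of zero, as before the fix, such a caller would treat both flags as unsupported even when the filesystem had set them.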
     
  • commit f6a4b4c9d07dda90c7c29dae96d6119ac6425dca upstream.

    As long as a symlink inode remains in-core, the destination (and
    therefore size) will not be re-fetched from the server, as it cannot
    change. The original implementation of the attribute cache assumed that
    setting the expiry time in the past was sufficient to cause a re-fetch
    of all attributes on the next getattr. That does not work in this case.

    The bug manifested itself as follows. When the command sequence

    touch foo; ln -s foo bar; ls -l bar

    is run, the output was

    lrwxrwxrwx. 1 fedora fedora 4906 Apr 24 19:10 bar -> foo

    However, after a re-mount, ls -l bar produces

    lrwxrwxrwx. 1 fedora fedora 3 Apr 24 19:10 bar -> foo

    After this commit, even before a re-mount, the output is

    lrwxrwxrwx. 1 fedora fedora 3 Apr 24 19:10 bar -> foo

    Reported-by: Becky Ligon
    Signed-off-by: Martin Brandenburg
    Fixes: 71680c18c8f2 ("orangefs: Cache getattr results.")
    Cc: stable@vger.kernel.org
    Cc: hubcap@omnibond.com
    Signed-off-by: Mike Marshall
    Signed-off-by: Greg Kroah-Hartman

    Martin Brandenburg
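
The expected size of 3 in the listing above is simply the length of the link target "foo". A minimal userspace check, assuming a writable /tmp and a filesystem that reports symlink sizes as POSIX describes; the function name and scratch path are illustrative only.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Create a symlink to `target` in a scratch directory and return the
 * st_size that lstat() reports for the link, or -1 on any failure. */
static long symlink_reported_size(const char *target)
{
    char dir[] = "/tmp/symsizeXXXXXX";
    char path[64];
    struct stat st;
    long size = -1;

    if (!mkdtemp(dir))
        return -1;
    snprintf(path, sizeof(path), "%s/bar", dir);
    if (symlink(target, path) == 0) {
        if (lstat(path, &st) == 0)
            size = (long)st.st_size;   /* length of the stored target */
        unlink(path);
    }
    rmdir(dir);
    return size;
}
```

The target does not need to exist; only the stored path length matters for the reported size.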
     

30 May, 2018

1 commit

  • commit 1e2e547a93a00ebc21582c06ca3c6cfea2a309ee upstream.

    For anything NFS-exported we do _not_ want to unlock new inode
    before it has grown an alias; original set of fixes got the
    ordering right, but missed the nasty complication in case of
    lockdep being enabled - unlock_new_inode() does
    lockdep_annotate_inode_mutex_key(inode)
    which can only be done before anyone gets a chance to touch
    ->i_mutex. Unfortunately, flipping the order and doing
    unlock_new_inode() before d_instantiate() opens a window when
    mkdir can race with open-by-fhandle on a guessed fhandle, leading
    to multiple aliases for a directory inode and all the breakage
    that follows from that.

    Correct solution: a new primitive (d_instantiate_new())
    combining these two in the right order - lockdep annotate, then
    d_instantiate(), then the rest of unlock_new_inode(). All
    combinations of d_instantiate() with unlock_new_inode() should
    be converted to that.

    Cc: stable@kernel.org # 2.6.29 and later
    Tested-by: Mike Marshall
    Reviewed-by: Andreas Dilger
    Signed-off-by: Al Viro
    Signed-off-by: Greg Kroah-Hartman

    Al Viro
     

24 Apr, 2018

1 commit


31 Jan, 2018

3 commits

  • commit 6793f1c450b1533a5e9c2493490de771d38b24f9 upstream.

    After do_readv_writev, the inode cache is invalidated anyway, so i_size
    will never be read. It will be fetched from the server which will also
    know about updates from other machines.

    Fixes deadlock on 32-bit SMP.

    See https://marc.info/?l=linux-fsdevel&m=151268557427760&w=2

    Signed-off-by: Martin Brandenburg
    Cc: Al Viro
    Cc: Mike Marshall
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Martin Brandenburg
     
  • commit a0ec1ded22e6a6bc41981fae22406835b006a66e upstream.

    In orangefs_devreq_read, there is a loop which picks an op off the list
    of pending ops. If the loop fails to find an op, there is nothing to
    read, and it returns EAGAIN. If the op has been given up on, the loop
    is restarted via a goto. The bug is that the variable which the found
    op is written to is not reinitialized, so if there are no more eligible
    ops on the list, the code runs again on the already handled op.

    This is triggered by interrupting a process while the op is being copied
    to the client-core. It's a fairly small window, but it's there.

    Signed-off-by: Martin Brandenburg
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Martin Brandenburg
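
The restart bug described above can be modelled in a few lines of userspace C. This is a simplified sketch, not the driver code: the struct, its fields, and the function name are hypothetical, and locking is omitted. The point is the reset of cur_op at the restart label.

```c
#include <stddef.h>

struct op {
    int serviced;   /* already taken off the pending list */
    int given_up;   /* the waiting process was interrupted */
};

/* Hypothetical model of the read loop: return the next op to hand to
 * the client, or NULL when nothing is pending (the EAGAIN case). */
static struct op *next_op(struct op *ops, size_t n)
{
    struct op *cur_op;
    size_t i;

restart:
    cur_op = NULL;  /* reset on every pass, not only on entry */
    for (i = 0; i < n; i++) {
        if (!ops[i].serviced) {
            cur_op = &ops[i];
            cur_op->serviced = 1;
            break;
        }
    }
    if (!cur_op)
        return NULL;
    if (cur_op->given_up)
        goto restart;  /* abandoned op: look for another one */
    return cur_op;
}
```

If the assignment were made only once before the label, an empty second pass would leave cur_op pointing at the abandoned op, and the code after the search would act on it again.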
     
  • commit 0afc0decf247f65b7aba666a76a0a68adf4bc435 upstream.

    set_op_state_purged can delete the op.

    Signed-off-by: Martin Brandenburg
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Martin Brandenburg
     

02 Nov, 2017

1 commit

  • Many source files in the tree are missing licensing information, which
    makes it harder for compliance tools to determine the correct license.

    By default all files without license information are under the default
    license of the kernel, which is GPL version 2.

    Update the files which contain no license information with the 'GPL-2.0'
    SPDX license identifier. The SPDX identifier is a legally binding
    shorthand, which can be used instead of the full boiler plate text.

    This patch is based on work done by Thomas Gleixner and Kate Stewart and
    Philippe Ombredanne.

    How this work was done:

    Patches were generated and checked against linux-4.14-rc6 for a subset of
    the use cases:
    - file had no licensing information in it,
    - file was a */uapi/* one with no licensing information in it,
    - file was a */uapi/* one with existing licensing information,

    Further patches will be generated in subsequent months to fix up cases
    where non-standard license headers were used, and references to license
    had to be inferred by heuristics based on keywords.

    The analysis to determine which SPDX License Identifier to be applied to
    a file was done in a spreadsheet of side by side results from the
    output of two independent scanners (ScanCode & Windriver) producing SPDX
    tag:value files created by Philippe Ombredanne. Philippe prepared the
    base worksheet, and did an initial spot review of a few 1000 files.

    The 4.13 kernel was the starting point of the analysis with 60,537 files
    assessed. Kate Stewart did a file by file comparison of the scanner
    results in the spreadsheet to determine which SPDX license identifier(s)
    to be applied to the file. She confirmed any determination that was not
    immediately clear with lawyers working with the Linux Foundation.

    Criteria used to select files for SPDX license identifier tagging were:
    - Files considered eligible had to be source code files.
    - Make and config files were included as candidates if they contained >5
    lines of source
    - File already had some variant of a license header in it (even if
    Reviewed-by: Philippe Ombredanne
    Reviewed-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     

15 Sep, 2017

8 commits

  • The script “checkpatch.pl” pointed out information like the following.

    Comparison to NULL could be written !…

    Thus fix affected source code places.

    Signed-off-by: Markus Elfring
    Signed-off-by: Mike Marshall

    Markus Elfring
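
For readers unfamiliar with this checkpatch.pl hint, the style change amounts to testing the pointer directly instead of comparing it with NULL. A minimal sketch with a hypothetical function:

```c
#include <stddef.h>
#include <string.h>

/* Return the length of `s`, or 0 when no string was supplied. */
static size_t length_or_zero(const char *s)
{
    if (!s)            /* rather than: if (s == NULL) */
        return 0;
    return strlen(s);
}
```

Both forms behave identically; the shorter one is the spelling the kernel coding style prefers.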
     
  • * A multiplication for the size determination of a memory allocation
    indicated that an array data structure should be processed.
    Thus use the corresponding function "kcalloc".

    This issue was detected by using the Coccinelle software.

    * Replace the specification of a data structure by a pointer dereference
    to make the corresponding size determination a bit safer according to
    the Linux coding style convention.

    Signed-off-by: Markus Elfring
    Signed-off-by: Mike Marshall

    Markus Elfring
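
Both points of the commit above have a direct userspace analogue, sketched here with calloc() standing in for kcalloc(). The struct and function names are hypothetical.

```c
#include <stdlib.h>

struct slot {
    int id;
    char tag[8];
};

/* Allocate `count` zeroed slots. The element count and element size are
 * passed separately instead of being multiplied by hand, and
 * sizeof(*slots) follows the pointer's type if that type ever changes. */
static struct slot *alloc_slots(size_t count)
{
    struct slot *slots;

    slots = calloc(count, sizeof(*slots));
    return slots;  /* NULL on failure; the caller handles the error */
}
```

Writing sizeof(struct slot) instead would compile equally well, but it would silently go stale if the type of the pointer were changed later.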
     
  • Omit an extra message for a memory allocation failure in these functions.

    This issue was detected by using the Coccinelle software.

    Signed-off-by: Markus Elfring
    Signed-off-by: Mike Marshall

    Markus Elfring
     
  • The xattr_handler structure is only stored in an array of const
    structures. Thus the xattr_handler structure itself can be
    const.

    Signed-off-by: Julia Lawall
    Signed-off-by: Mike Marshall

    Julia Lawall
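
The constification argument can be seen in a small userspace model. The struct below is a hypothetical stand-in for xattr_handler, with made-up fields; what matters is that the object is only reachable through a table of pointers to const.

```c
#include <stddef.h>
#include <string.h>

struct handler {
    const char *prefix;
    int (*get)(const char *name);
};

static int default_get(const char *name)
{
    return (int)strlen(name);
}

/* Only ever referenced through the const table below, so the object
 * itself can be const and be placed in read-only data. */
static const struct handler default_handler = {
    .prefix = "",
    .get = default_get,
};

/* NULL-terminated table of pointers to const handlers. */
static const struct handler *const handlers[] = {
    &default_handler,
    NULL,
};
```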
     
  • Orangefs doesn't do buffered writes yet, so there's no point in
    initiating and waiting for writeback.

    Signed-off-by: Jeff Layton
    Signed-off-by: Mike Marshall

    Jeff Layton
     
  • A previous patch which claimed to remove off by ones actually introduced
    them.

    strlen() returns the length of the string not including the NUL
    character. We are using strcpy() to copy "name" into a buffer which is
    ORANGEFS_MAX_XATTR_NAMELEN characters long. We should make sure to
    leave space for the NUL, otherwise we're writing one character beyond
    the end of the buffer.

    Fixes: e675c5ec51fe ("orangefs: clean up oversize xattr validation")
    Signed-off-by: Dan Carpenter
    Signed-off-by: Mike Marshall

    Dan Carpenter
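
The off-by-one described above comes down to the choice between ">" and ">=" in the length check. A minimal sketch, assuming a small hypothetical limit in place of ORANGEFS_MAX_XATTR_NAMELEN and an error code picked for illustration:

```c
#include <errno.h>
#include <string.h>

#define MAX_NAME_LEN 8  /* hypothetical stand-in for the real limit */

/* Copy `name` into a MAX_NAME_LEN-byte buffer, rejecting any name whose
 * terminating NUL would not fit. */
static int copy_name(char dst[MAX_NAME_LEN], const char *name)
{
    if (strlen(name) >= MAX_NAME_LEN)  /* ">" would allow a 1-byte overflow */
        return -EINVAL;
    strcpy(dst, name);
    return 0;
}
```

A name of exactly MAX_NAME_LEN characters passes a ">" check, and strcpy() then writes MAX_NAME_LEN + 1 bytes including the NUL.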
     
  • posix_acl_update_mode checks to see if the permissions
    described by the ACL can be encoded into the
    object's mode. If so, it sets "acl" to NULL
    and "mode" to the new desired value. Prior to this patch
    we failed to actually propagate the new mode back to the
    server.

    Signed-off-by: Mike Marshall

    Mike Marshall
     
  • When new directory 'DIR1' is created in a directory 'DIR0' with SGID bit
    set, DIR1 is expected to have SGID bit set (and owning group equal to
    the owning group of 'DIR0'). However when 'DIR0' also has some default
    ACLs that 'DIR1' inherits, setting these ACLs will result in the SGID bit
    on 'DIR1' being cleared if the user is not a member of the owning group.

    Fix the problem by creating __orangefs_set_acl() function that does not
    call posix_acl_update_mode() and use it when inheriting ACLs. That
    prevents SGID bit clearing and the mode has been properly set by
    posix_acl_create() anyway.

    Fixes: 073931017b49d9458aa351605b43a7e34598caef
    CC: stable@vger.kernel.org
    CC: Mike Marshall
    CC: pvfs2-developers@beowulf-underground.org
    Signed-off-by: Jan Kara
    Signed-off-by: Mike Marshall

    Jan Kara
     

16 Jul, 2017

1 commit

  • Pull ->s_options removal from Al Viro:
    "Preparations for fsmount/fsopen stuff (coming next cycle). Everything
    gets moved to explicit ->show_options(), killing ->s_options off +
    some cosmetic bits around fs/namespace.c and friends. Basically, the
    stuff needed to work with fsmount series with minimum of conflicts
    with other work.

    It's not strictly required for this merge window, but it would reduce
    the PITA during the coming cycle, so it would be nice to have those
    bits and pieces out of the way"

    * 'work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    isofs: Fix isofs_show_options()
    VFS: Kill off s_options and helpers
    orangefs: Implement show_options
    9p: Implement show_options
    isofs: Implement show_options
    afs: Implement show_options
    affs: Implement show_options
    befs: Implement show_options
    spufs: Implement show_options
    bpf: Implement show_options
    ramfs: Implement show_options
    pstore: Implement show_options
    omfs: Implement show_options
    hugetlbfs: Implement show_options
    VFS: Don't use save/replace_mount_options if not using generic_show_options
    VFS: Provide empty name qstr
    VFS: Make get_filesystem() return the affected filesystem
    VFS: Clean up whitespace in fs/namespace.c and fs/super.c
    Provide a function to create a NUL-terminated string from unterminated data

    Linus Torvalds
     

11 Jul, 2017

1 commit

  • Implement the show_options superblock op for orangefs as part of a bid to
    get rid of s_options and generic_show_options() to make it easier to implement
    a context-based mount where the mount options can be passed individually
    over a file descriptor.

    Signed-off-by: David Howells
    cc: Mike Marshall
    cc: pvfs2-developers@beowulf-underground.org
    Signed-off-by: Al Viro

    David Howells
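
The idea behind an explicit show_options op is that the option string is rebuilt from live state each time it is requested, rather than replayed from a saved copy of the mount options. A rough userspace model follows; the flag bits, option names, and function signature are placeholders for illustration, not the driver's definitions.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical option bits; the real flag names live in the driver. */
#define OPT_ACL        0x1u
#define OPT_INTR       0x2u
#define OPT_LOCAL_LOCK 0x4u

/* Append `opt` to `out` without writing past `len` bytes. */
static void append_opt(char *out, size_t len, const char *opt)
{
    strncat(out, opt, len - strlen(out) - 1);
}

/* Emit every active option as ",name". `len` must be at least 1. */
static void show_options(unsigned int flags, char *out, size_t len)
{
    out[0] = '\0';
    if (flags & OPT_ACL)
        append_opt(out, len, ",acl");
    if (flags & OPT_INTR)
        append_opt(out, len, ",intr");
    if (flags & OPT_LOCAL_LOCK)
        append_opt(out, len, ",local_lock");
}
```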
     

20 Jun, 2017

2 commits

  • So I've noticed a number of instances where it was not obvious from the
    code whether ->task_list was for a wait-queue head or a wait-queue entry.

    Furthermore, there's a number of wait-queue users where the lists are
    not for 'tasks' but other entities (poll tables, etc.), in which case
    the 'task_list' name is actively confusing.

    To clear this all up, name the wait-queue head and entry list structure
    fields unambiguously:

    struct wait_queue_head::task_list => ::head
    struct wait_queue_entry::task_list => ::entry

    For example, this code:

    rqw->wait.task_list.next != &wait->task_list

    ... it was pretty unclear (to me) what it's doing, while now it's written this way:

    rqw->wait.head.next != &wait->entry

    ... which makes it pretty clear that we are iterating a list until we see the head.

    Other examples are:

    list_for_each_entry_safe(pos, next, &x->task_list, task_list) {
    list_for_each_entry(wq, &fence->wait.task_list, task_list) {

    ... where it's unclear (to me) what we are iterating, and during review it's
    hard to tell whether it's trying to walk a wait-queue entry (which would be
    a bug), while now it's written as:

    list_for_each_entry_safe(pos, next, &x->head, entry) {
    list_for_each_entry(wq, &fence->wait.head, entry) {

    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Ingo Molnar

    Ingo Molnar
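
The head/entry distinction from the commit above can be shown with a minimal circular list in userspace C. The types and helpers are a hypothetical reduction of the kernel's wait-queue structures, kept only to make the naming visible.

```c
#include <stddef.h>

struct list_node {
    struct list_node *next, *prev;
};

struct wq_head {                 /* the queue itself */
    struct list_node head;
};

struct wq_entry {                /* one waiter on the queue */
    int id;
    struct list_node entry;
};

static void wq_init(struct wq_head *q)
{
    q->head.next = q->head.prev = &q->head;
}

static void wq_add_tail(struct wq_head *q, struct wq_entry *w)
{
    w->entry.prev = q->head.prev;
    w->entry.next = &q->head;
    q->head.prev->next = &w->entry;
    q->head.prev = &w->entry;
}

/* Count waiters: follow ->next from the head until we are back at it. */
static int wq_count(const struct wq_head *q)
{
    const struct list_node *pos;
    int n = 0;

    for (pos = q->head.next; pos != &q->head; pos = pos->next)
        n++;
    return n;
}
```

With distinct field names, a loop that starts from an entry instead of the head stands out immediately in review.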
     
  • Rename:

    wait_queue_t => wait_queue_entry_t

    'wait_queue_t' was always a slight misnomer: its name implies that it's a "queue",
    but in reality it's a queue *entry*. The 'real' queue is the wait queue head,
    which had to carry the name.

    Start sorting this out by renaming it to 'wait_queue_entry_t'.

    This also allows the real structure name 'struct __wait_queue' to
    lose its double underscore and become 'struct wait_queue_entry',
    which is the more canonical nomenclature for such data types.

    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     

06 May, 2017

1 commit

  • Pull orangefs updates from Mike Marshall:
    "Orangefs cleanups, fixes and statx support.

    Some cleanups:

    - remove unused get_fsid_from_ino
    - fix bounds check for listxattr
    - clean up oversize xattr validation
    - do not set getattr_time on orangefs_lookup
    - return from orangefs_devreq_read quickly if possible
    - do not wait for timeout if umounting
    - handle zero size write in debugfs

    Bug fixes:

    - do not check possibly stale size on truncate
    - ensure the userspace component is unmounted if mount fails
    - total reimplementation of dir.c

    New feature:

    - implement statx

    The new implementation of dir.c is kind of a big deal, all new code.
    It has been posted to fs-devel during the previous rc period, we
    didn't get much review or feedback from there, but it has been
    reviewed very heavily here, so much so that we have two entire
    versions of the reimplementation.

    Not only does the new implementation fix some xfstests, but it passes
    all the new tests we made here that involve seeking and rewinding and
    giant directories and long file names. The new dir code has three
    patches itself:

    - skip forward to the next directory entry if seek is short
    - invalidate stored directory on seek
    - count directory pieces correctly"

    * tag 'for-linus-4.12-ofs-1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux:
    orangefs: count directory pieces correctly
    orangefs: invalidate stored directory on seek
    orangefs: skip forward to the next directory entry if seek is short
    orangefs: handle zero size write in debugfs
    orangefs: do not wait for timeout if umounting
    orangefs: return from orangefs_devreq_read quickly if possible
    orangefs: ensure the userspace component is unmounted if mount fails
    orangefs: do not check possibly stale size on truncate
    orangefs: implement statx
    orangefs: remove ORANGEFS_READDIR macros
    orangefs: support very large directories
    orangefs: support llseek on directories
    orangefs: rewrite readdir to fix several bugs
    orangefs: do not set getattr_time on orangefs_lookup
    orangefs: clean up oversize xattr validation
    orangefs: fix bounds check for listxattr
    orangefs: remove unused get_fsid_from_ino

    Linus Torvalds
     

05 May, 2017

3 commits

  • A large directory full of differently sized file names triggered this.
    Most directories, even very large directories with shorter names, would
    be lucky enough to fit in one server response.

    Signed-off-by: Martin Brandenburg
    Signed-off-by: Mike Marshall

    Martin Brandenburg
     
  • If an application seeks to a position before the point which has been
    read, it must want updates which have been made to the directory. So
    delete the copy stored in the kernel so it will be fetched again.

    Signed-off-by: Martin Brandenburg
    Signed-off-by: Mike Marshall

    Martin Brandenburg
     
  • If userspace seeks to a position in the stream which is not correct, it
    would have returned EIO because the data in the buffer at that offset
    would be incorrect. This and the userspace daemon returning a corrupt
    directory are indistinguishable.

    Now if the data does not look right, skip forward to the next chunk and
    try again. The motivation is that if the directory changes, an
    application may seek to a position that was valid and no longer is valid.

    It is not yet possible for a directory to change.

    Signed-off-by: Martin Brandenburg
    Signed-off-by: Mike Marshall

    Martin Brandenburg
     

27 Apr, 2017

14 commits

  • If we write zero bytes to this debugfs file, then it will cause an
    underflow when we do copy_from_user(buf, ubuf, count - 1). Debugfs can
    normally only be written to by root so the impact of this is low.

    Signed-off-by: Dan Carpenter
    Signed-off-by: Mike Marshall

    Dan Carpenter
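
The underflow happens because count is unsigned, so count - 1 wraps to the largest possible value when count is zero. A minimal userspace sketch of the guard, with memcpy() standing in for copy_from_user(); the function name and the error code returned are assumptions made for illustration.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of a write handler that drops the final byte of the
 * input (typically a newline). Returns the number of bytes stored, or a
 * negative errno value. */
static long store_without_last_byte(char *buf, size_t buflen,
                                    const char *ubuf, size_t count)
{
    if (count == 0)            /* count - 1 would wrap to SIZE_MAX */
        return -EINVAL;
    if (count - 1 >= buflen)   /* leave room for the NUL terminator */
        return -EINVAL;
    memcpy(buf, ubuf, count - 1);
    buf[count - 1] = '\0';
    return (long)(count - 1);
}
```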
     
  • When the computer is turned off, all the processes are killed and then
    all the filesystems are umounted. OrangeFS should not wait for the
    userspace daemon to come back in that case.

    This only works for plain umount(2). To actually take advantage of this
    interactively, `umount -f' is needed; otherwise umount will issue a
    statfs first, which will wait for the userspace daemon to come back.

    Signed-off-by: Martin Brandenburg
    Signed-off-by: Mike Marshall

    Martin Brandenburg
     
  • It is not necessary to take the lock and search through the request list
    if the list is empty.

    Signed-off-by: Martin Brandenburg
    Signed-off-by: Mike Marshall

    Martin Brandenburg
     
  • If the mount is aborted after userspace has been asked to mount,
    userspace must be told to unmount.

    Ordinarily orangefs_kill_sb does the unmount. However it cannot be
    called if the superblock has not been set up. This is a very narrow
    window.

    The NULL fs_id is not unmounted.

    Signed-off-by: Martin Brandenburg
    Signed-off-by: Mike Marshall

    Martin Brandenburg
     
  • Let the server figure this out because our size might be out of date or
    not present.

    The bug was that

    xfs_io -f -t -c "pread -v 0 100" /mnt/foo
    echo "Test" > /mnt/foo
    xfs_io -f -t -c "pread -v 0 100" /mnt/foo

    fails because the second truncate did not happen if nothing had
    requested the size after the write in echo. Thus i_size was zero (not
    present) and the orangefs_setattr thought i_size was zero and there was
    nothing to do.

    Signed-off-by: Martin Brandenburg
    Cc: stable@vger.kernel.org
    Signed-off-by: Mike Marshall

    Martin Brandenburg
     
  • Fortunately OrangeFS has had a getattr request mask for a long time.

    The server basically has two difficulty levels for attributes. Fetching
    any attribute except size requires communicating with the metadata
    server for that handle. Since all the attributes are right there, it
    makes sense to return them all. Fetching the size requires
    communicating with every I/O server (that the file is distributed
    across). Therefore if asked for anything except size, get everything
    except size, and if asked for size, get everything.

    Signed-off-by: Martin Brandenburg
    Signed-off-by: Mike Marshall

    Martin Brandenburg
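
The two-level policy described above reduces to a single test on the request mask. In this sketch every macro and function name is hypothetical; the real request bits and server masks are defined by statx and the OrangeFS protocol respectively.

```c
#include <stdint.h>

/* Hypothetical request bits, named for illustration only. */
#define REQ_MODE  0x1u
#define REQ_MTIME 0x2u
#define REQ_SIZE  0x4u

/* Everything the metadata server can answer on its own. */
#define FETCH_ALL_NOSIZE (REQ_MODE | REQ_MTIME)
/* Everything, including the size gathered from every I/O server. */
#define FETCH_ALL        (FETCH_ALL_NOSIZE | REQ_SIZE)

/* Asked for size: fetch everything. Otherwise: everything except size. */
static uint32_t server_fetch_mask(uint32_t request_mask)
{
    return (request_mask & REQ_SIZE) ? FETCH_ALL : FETCH_ALL_NOSIZE;
}
```

The cheap path never contacts the I/O servers, which is why size is the only attribute that selects the expensive one.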
     
  • They are clones of the ORANGEFS_ITERATE macros in use elsewhere. Delete
    ORANGEFS_ITERATE_NEXT which is a hack previously used by readdir.

    Signed-off-by: Martin Brandenburg
    Signed-off-by: Mike Marshall

    Martin Brandenburg
     
  • This works by maintaining a linked list of pages which the directory
    has been read into rather than one giant fixed-size buffer.

    This replaces code which limits the total directory size to the total
    amount that could be returned in one server request. Since filenames
    are usually considerably shorter than the maximum, the old code could
    usually handle several server requests before running out of space.

    Signed-off-by: Martin Brandenburg
    Signed-off-by: Mike Marshall

    Martin Brandenburg
     
  • This and the previous commit fix xfstests generic/257.

    Signed-off-by: Martin Brandenburg
    Signed-off-by: Mike Marshall

    Martin Brandenburg
     
  • In the past, readdir assumed that the user buffer will be large enough
    that all entries from the server will fit. If this was not true,
    entries would be skipped.

    Since it works now, request 512 entries rather than 96 per server
    operation.

    Signed-off-by: Martin Brandenburg
    Signed-off-by: Mike Marshall

    Martin Brandenburg
     
  • Since orangefs_lookup calls orangefs_iget which calls
    orangefs_inode_getattr, getattr_time will get set.

    Signed-off-by: Martin Brandenburg
    Cc: stable@vger.kernel.org
    Signed-off-by: Mike Marshall

    Martin Brandenburg
     
  • Also don't check flags as this has been validated by the VFS already.

    Fix an off-by-one error in the max size checking.

    Stop logging just because userspace wants to write attributes which do
    not fit.

    This and the previous commit fix xfstests generic/020.

    Signed-off-by: Martin Brandenburg
    Cc: stable@vger.kernel.org
    Signed-off-by: Mike Marshall

    Martin Brandenburg
     
  • Signed-off-by: Martin Brandenburg
    Cc: stable@vger.kernel.org
    Signed-off-by: Mike Marshall

    Martin Brandenburg
     
  • Signed-off-by: Martin Brandenburg
    Signed-off-by: Mike Marshall

    Martin Brandenburg
     

22 Apr, 2017

1 commit


18 Apr, 2017

1 commit