12 Aug, 2020

1 commit


05 Aug, 2020

1 commit


29 Jul, 2020

1 commit

  • Al Viro pointed out that I broke some acl functionality...

    * ACLs could not be fully removed
    * posix_acl_chmod would be called while the old ACL was still cached
    * new mode propagated to orangefs server before ACL.

    ... when I tried to make sure that modes that got changed as a
    result of ACL-sets would be sent back to the orangefs server.

    Not wanting to try and change the code without having some cases to
    test it with, I began to hunt for setfacl examples that were expressible
    in pure mode. Along the way I found examples like the following
    which confused me:

    user A had a file (/home/A/asdf) with mode 740
    user B was in user A's group
    user C was not in user A's group

    setfacl -m u:C:rwx /home/A/asdf

    The above setfacl caused ls -l /home/A/asdf to show a mode of 770,
    making it appear that all users in user A's group now had full access
    to /home/A/asdf; however, user B still only had read access. Madness.

    Anywho, I finally found that the above (wacky as it is) appears to
    be "posixly on purpose" and is explained in acl(5):

    If the ACL has an ACL_MASK entry, the group permissions correspond
    to the permissions of the ACL_MASK entry.

    Signed-off-by: Mike Marshall

    Mike Marshall
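The acl(5) rule quoted above can be modeled in a few lines. When an ACL has a mask entry, ls -l shows the mask in the group position, and every group-class entry (owning group, named users, named groups) only grants what the mask also allows. By default setfacl recalculates the mask as the union of the group-class entries. The sketch below uses 3-bit permission triples (r=4, w=2, x=1); struct model_acl and the model_* functions are hypothetical illustrations, not the kernel's or libacl's API.

```c
#include <stddef.h>

/* Hypothetical ACL model: permission triples as 3-bit values (r=4, w=2, x=1). */
struct model_acl {
    unsigned owner;           /* ACL_USER_OBJ  */
    unsigned owning_group;    /* ACL_GROUP_OBJ */
    unsigned other;           /* ACL_OTHER     */
    const unsigned *named;    /* ACL_USER / ACL_GROUP entries */
    size_t n_named;
};

/* setfacl's default mask recalculation: the union of all group-class
 * entries (owning group plus every named user/group entry). */
static unsigned model_recalc_mask(const struct model_acl *acl)
{
    unsigned mask = acl->owning_group;
    for (size_t i = 0; i < acl->n_named; i++)
        mask |= acl->named[i];
    return mask;
}

/* With a mask present, the group bits reported by ls -l are the mask. */
static unsigned model_ls_mode(const struct model_acl *acl)
{
    return (acl->owner << 6) | (model_recalc_mask(acl) << 3) | acl->other;
}

/* A group-class entry only grants what the mask also allows. */
static unsigned model_effective(unsigned entry, unsigned mask)
{
    return entry & mask;
}
```

For the example in the message (mode 740, then setfacl -m u:C:rwx), the mask becomes rwx, so ls -l shows 770, while user B, matched by the owning-group entry r--, still gets r-- & rwx = r--.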
     

24 Jun, 2020

1 commit


12 Jun, 2020

1 commit


06 Jun, 2020

1 commit


03 Jun, 2020

1 commit

  • Since a new pair of functions (attach_page_private() and
    detach_page_private()) has been introduced, we can call them to
    clean up the code in orangefs.

    Signed-off-by: Guoqing Jiang
    Signed-off-by: Andrew Morton
    Tested-by: Mike Marshall
    Reviewed-by: Andrew Morton
    Cc: Martin Brandenburg
    Link: http://lkml.kernel.org/r/20200517214718.468-9-guoqing.jiang@cloud.ionos.com
    Signed-off-by: Linus Torvalds

    Guoqing Jiang
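The "new pair" here is, presumably, attach_page_private() and detach_page_private() from the linked series. Each one bundles three steps that filesystems used to open-code: taking or dropping a page reference, setting or clearing page->private, and setting or clearing the PagePrivate flag. The userspace model below mimics that bundling; struct model_page and the model_* functions are hypothetical stand-ins, not kernel code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for struct page; NOT the kernel definition. */
struct model_page {
    int refcount;
    bool private_flag;        /* models PG_private */
    unsigned long private;    /* models page->private */
};

/* Models attach_page_private(): take a reference, store the pointer, set the flag. */
static void model_attach_page_private(struct model_page *page, void *data)
{
    page->refcount++;                       /* get_page() */
    page->private = (unsigned long)data;    /* set_page_private() */
    page->private_flag = true;              /* SetPagePrivate() */
}

/* Models detach_page_private(): undo all three steps and hand back the pointer. */
static void *model_detach_page_private(struct model_page *page)
{
    void *data = (void *)page->private;

    if (!page->private_flag)
        return NULL;                        /* nothing attached */
    page->private_flag = false;             /* ClearPagePrivate() */
    page->private = 0;                      /* set_page_private(page, 0) */
    page->refcount--;                       /* put_page() */
    return data;
}
```

Because the reference count and the flag always move together, a caller cannot forget one half of the pairing, which is the kind of cleanup the commit describes.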
     

30 May, 2020

2 commits

  • This code was using get_user_pages*(), in a "Case 1" scenario
    (Direct IO), using the categorization from [1]. That means that it's
    time to convert the get_user_pages*() + put_page() calls to
    pin_user_pages*() + unpin_user_pages() calls.

    There is some helpful background in [2]: basically, this is a small
    part of fixing a long-standing disconnect between pinning pages, and
    file systems' use of those pages.

    [1] Documentation/core-api/pin_user_pages.rst

    [2] "Explicit pinning of user-space pages":
    https://lwn.net/Articles/807108/

    Cc: Mike Marshall
    Cc: Martin Brandenburg
    Cc: devel@lists.orangefs.org
    Cc: linux-fsdevel@vger.kernel.org
    Signed-off-by: John Hubbard
    Signed-off-by: Mike Marshall

    John Hubbard
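The pin_user_pages.rst document referenced in [1] describes pins as ordinary page references added in units of a large bias (GUP_PIN_COUNTING_BIAS, 1024 at the time), so code can later ask whether a page may be pinned for DMA. The userspace model below illustrates why get/put and pin/unpin calls must be paired with each other and not mixed; struct model_page and the model_* names are hypothetical, and the bias is copied in rather than taken from kernel headers.

```c
#include <stdbool.h>

/* Copied value of the kernel's GUP_PIN_COUNTING_BIAS (1 << 10) for this model. */
#define MODEL_PIN_BIAS (1u << 10)

/* Hypothetical stand-in for a page's reference count. */
struct model_page { unsigned refcount; };

/* get_user_pages() + put_page(): a plain reference, indistinguishable from others. */
static void model_get(struct model_page *p) { p->refcount += 1; }
static void model_put(struct model_page *p) { p->refcount -= 1; }

/* pin_user_pages() + unpin_user_page(): references added in units of the
 * bias, so a pinned page stands out from normally referenced pages. */
static void model_pin(struct model_page *p)   { p->refcount += MODEL_PIN_BIAS; }
static void model_unpin(struct model_page *p) { p->refcount -= MODEL_PIN_BIAS; }

/* Like page_maybe_dma_pinned(): false positives are possible (a page with
 * 1024+ plain references), false negatives are not. */
static bool model_maybe_dma_pinned(const struct model_page *p)
{
    return p->refcount >= MODEL_PIN_BIAS;
}
```

Releasing a pinned page with put_page() in this model would subtract 1 instead of 1024 and leave the page looking pinned forever, which is why the conversion swaps both halves of each pair.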
     
  • The variable ret is being initialized with a value that is
    never read and it is being updated later with a new value. The
    initialization is redundant and can be removed.

    Addresses-Coverity: ("Unused value")
    Signed-off-by: Colin Ian King
    Signed-off-by: Mike Marshall

    Colin Ian King
     

11 Apr, 2020

1 commit


08 Apr, 2020

2 commits

  • Christoph Hellwig noticed that we were doing some unnecessary
    work in orangefs_flush:

    orangefs_flush just writes out data on every close(2) call. There is
    no need to change anything about the dirty state, especially as
    orangefs doesn't treat I_DIRTY_TIMES specially in any way. The code
    seems to come from partially open coding vfs_fsync.

    He sent in a patch with the above commit message and also a
    patch that was a reversion of another Orangefs patch I had
    sent upstream a while ago. I had to fix his reversion patch
    so that it would compile, which caused his "don't mess with
    I_DIRTY_TIMES" patch to fail to apply. So here I have just
    remade his patch and applied it after the fixed reversion patch.

    Signed-off-by: Mike Marshall

    Mike Marshall
     
  • Christoph Hellwig sent in a reversion of "orangefs: remember count
    when reading." because:

    ->read_iter calls can race with each other and one or
    more ->flush calls. Remove the scheme to store the read
    count in the file private data as it is completely racy and
    can cause use after free or double free conditions.

    Christoph's reversion caused Orangefs not to work or to compile. I
    added a patch that fixed that, but Intel's kbuild test robot pointed
    out that sending Christoph's patch upstream followed by my patch
    would break bisection, because the tree would fail to compile in
    between. So I have combined the reversion plus my patch... here's
    the commit message that was in my patch:

    Logically, optimal Orangefs "pages" are 4 megabytes. Reading
    large Orangefs files 4096 bytes at a time is like trying to
    kick a dead whale down the beach. Before Christoph's "Revert
    orangefs: remember count when reading." I tried to give users
    a knob whereby they could, for example, use "count" in
    read(2) or bs with dd(1) to get whatever they considered an
    appropriate number of bytes at a time from Orangefs and fill
    as many page cache pages as they could at once.

    Without the racy code that Christoph reverted, Orangefs won't
    even compile, much less work. So this replaces the logic that
    used the private file data that Christoph reverted with
    a static number of bytes to read from Orangefs.

    I ran tests like the following to determine what a
    reasonable static number of bytes might be:

    dd if=/pvfsmnt/asdf of=/dev/null count=128 bs=4194304
    dd if=/pvfsmnt/asdf of=/dev/null count=256 bs=2097152
    dd if=/pvfsmnt/asdf of=/dev/null count=512 bs=1048576
    .
    .
    .
    dd if=/pvfsmnt/asdf of=/dev/null count=4194304 bs=128

    Reads seem faster using the static number, so my "knob code"
    wasn't just racy, it wasn't even a good idea...

    Signed-off-by: Mike Marshall
    Reported-by: kbuild test robot

    Mike Marshall
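The dd runs listed above all read the same total, since count × bs is constant: 128 × 4194304 = 256 × 2097152 = 4194304 × 128 = 536,870,912 bytes (512 MiB). The helper below checks that arithmetic and rebuilds a command line for a given block size; it only computes the listed runs and is not the test script that was actually used. It also shows how many 4096-byte page cache pages one read of a given size could fill. All names are hypothetical.

```c
#include <stdio.h>
#include <stddef.h>

/* Total bytes each listed run reads: 128 * 4 MiB = 512 MiB. */
#define TOTAL_BYTES (128ul * 4194304ul)
/* Page cache page size assumed for this sketch. */
#define MODEL_PAGE_SIZE 4096ul

/* dd count that keeps the total constant for a given bs. */
static unsigned long dd_count_for(unsigned long bs)
{
    return TOTAL_BYTES / bs;
}

/* Format the dd command for a given bs into buf; returns snprintf's length. */
static int dd_command(char *buf, size_t len, unsigned long bs)
{
    return snprintf(buf, len, "dd if=/pvfsmnt/asdf of=/dev/null count=%lu bs=%lu",
                    dd_count_for(bs), bs);
}

/* How many page cache pages a single read of bs bytes could fill. */
static unsigned long pages_per_read(unsigned long bs)
{
    return bs / MODEL_PAGE_SIZE;
}
```

For example, one 4 MiB read (a logically optimal Orangefs "page") could fill 1024 page cache pages, versus a single page for a 4096-byte read.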
     

10 Feb, 2020

1 commit


05 Feb, 2020

1 commit


09 Dec, 2019

1 commit


04 Dec, 2019

1 commit

  • Orangefs has no open, and orangefs checks file permissions
    on each file access. POSIX requires that file permissions
    be checked on open and nowhere else. Orangefs-through-the-kernel
    needs to seem POSIX-compliant.

    The VFS opens files, even if the filesystem provides no
    method. We can see if a file was successfully opened for
    read and/or for write by looking at file->f_mode.

    When writes are flowing from the page cache, file is no
    longer available. We can trust the VFS to have checked
    file->f_mode before writing to the page cache.

    The mode of a file might change between when it is opened
    and IO commences, or it might be created with an arbitrary mode.

    We'll make sure we don't hit EACCES during the IO stage by
    using UID 0. Some of the time we have access without changing
    to UID 0 - how to check?

    Signed-off-by: Mike Marshall

    Mike Marshall
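The idea of checking permissions once, at open, and then trusting file->f_mode can be sketched in userspace. Below, "open" checks only the owner bits of the mode at open time and records the result; later IO consults only the recorded f_mode, so a chmod after open does not cause EACCES. The FMODE values mirror the kernel's FMODE_READ (0x1) and FMODE_WRITE (0x2); struct model_file and the model_* functions are hypothetical, and the UID 0 detail on the Orangefs side is not modeled.

```c
#include <stdbool.h>
#include <sys/types.h>

/* Model copies of the kernel's FMODE_READ / FMODE_WRITE bits. */
#define MODEL_FMODE_READ  0x1u
#define MODEL_FMODE_WRITE 0x2u

/* Hypothetical stand-in for struct file. */
struct model_file { unsigned f_mode; };

/* "Open": check permissions once against the mode at open time (owner
 * bits only, for brevity) and record what was granted in f_mode. */
static bool model_open(struct model_file *f, mode_t mode_at_open,
                       bool want_read, bool want_write)
{
    if (want_read && !(mode_at_open & 0400))
        return false;   /* would be EACCES at open */
    if (want_write && !(mode_at_open & 0200))
        return false;
    f->f_mode = (want_read ? MODEL_FMODE_READ : 0) |
                (want_write ? MODEL_FMODE_WRITE : 0);
    return true;
}

/* Later IO consults only f_mode, never the inode's current mode. */
static bool model_may_write(const struct model_file *f)
{
    return (f->f_mode & MODEL_FMODE_WRITE) != 0;
}
```

A file opened for writing while its mode was 0600 stays writable through that file even if the mode later becomes 0400, which is the POSIX behavior the commit is after.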
     

06 Nov, 2019

1 commit

  • Add a flags option to the get xattr method so that a bit flag of
    XATTR_NOSECURITY can be passed to it. XATTR_NOSECURITY is then
    generally set in the __vfs_getxattr path when called by security
    infrastructure.

    This handles the case of a union filesystem driver that is being
    requested by the security layer to report back the xattr data.

    This is for the use case where access is to be blocked by the
    security layer.

    The path then could be security(dentry) ->
    __vfs_getxattr(dentry...XATTR_NOSECURITY) ->
    handler->get(dentry...XATTR_NOSECURITY) ->
    __vfs_getxattr(lower_dentry...XATTR_NOSECURITY) ->
    lower_handler->get(lower_dentry...XATTR_NOSECURITY),
    which would report data and success back through the chain as
    expected; the logging security layer at the top would then have the
    data to determine the access permissions and report back the target
    context that was blocked.

    Without the get handler flag, the path on a union filesystem would be
    the errant security(dentry) -> __vfs_getxattr(dentry) ->
    handler->get(dentry) -> vfs_getxattr(lower_dentry) -> nested ->
    security(lower_dentry, log off) -> lower_handler->get(lower_dentry),
    which would report no data and -EACCES back through the chain.

    For SELinux, in both cases this would translate to a correctly
    determined blocked access. In the first case, with this change, a correct
    avc log would be reported; in the second, legacy case, an incorrect avc log
    would be reported against an uninitialized u:object_r:unlabeled:s0
    context, making the logs cosmetically useless for audit2allow.

    This patch series is inert: it is the widespread addition of the
    flags option for xattr functions, and a replacement of __vfs_getxattr
    with __vfs_getxattr(...XATTR_NOSECURITY).

    Signed-off-by: Mark Salyzyn
    Reviewed-by: Jan Kara
    Acked-by: Jan Kara
    Acked-by: Jeff Layton
    Acked-by: David Sterba
    Acked-by: Darrick J. Wong
    Acked-by: Mike Marshall
    Cc: Stephen Smalley
    Cc: linux-kernel@vger.kernel.org
    Cc: kernel-team@android.com
    Cc: linux-security-module@vger.kernel.org

    (cherry picked from (rejected from archive because of too many recipients))
    Signed-off-by: Mark Salyzyn
    Bug: 133515582
    Bug: 136124883
    Bug: 129319403
    Change-Id: Iabbb8771939d5f66667a26bb23ddf4c562c349a1

    Mark Salyzyn
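The two call chains described above differ only in whether the nested security check on the lower dentry runs. The sketch below models a union filesystem's get handler: with the no-security flag it fetches the lower xattr directly, so data reaches the logging layer at the top; without it, the nested hook denies access and only -EACCES comes back. The flag value MODEL_XATTR_NOSECURITY is hypothetical and not taken from any header, and all model_* functions are illustrative stand-ins.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical flag value for this sketch only. */
#define MODEL_XATTR_NOSECURITY 0x4

/* Stand-in for a security hook that may deny access to the lower dentry. */
static int model_security_check(bool deny)
{
    return deny ? -EACCES : 0;
}

/* Lower filesystem's get handler: fills *value, returns its size. */
static int model_lower_get(int *value)
{
    *value = 42;                    /* pretend xattr payload */
    return (int)sizeof(*value);
}

/* Union fs upper handler. With the flag, skip the nested hook (the caller
 * is the security layer itself); without it, the nested vfs_getxattr()
 * path runs the hook again on the lower dentry. */
static int model_union_get(int *value, int flags, bool lower_denied)
{
    if (!(flags & MODEL_XATTR_NOSECURITY)) {
        int err = model_security_check(lower_denied);
        if (err)
            return err;             /* no data reaches the top-level logger */
    }
    return model_lower_get(value);
}
```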
     

20 Sep, 2019

1 commit

  • Pull orangefs updates from Mike Marshall:
    "A fix and a cleanup.

    The fix: way back in the stone age (2003) mode was set to the magic
    number "755" in what is now fs/orangefs/namei.c(orangefs_symlink).
    Łukasz Wrochna reported it and Artur Świgoń sent in a patch to change
    it to octal. Maybe it shouldn't be a magic number at all but rather
    something like "S_IRWXU | S_IRGRP | S_IXGRP | S_IROTH | S_IXOTH"...

    cleanup: Colin Ian King found a redundant assignment and sent in a
    patch to remove it"

    [ And no, octal numbers for permissions are a lot more legible than a
    binary 'or' of some line noise macros. So 0755 is preferred over
    trying to spell it out using "helpful" macros - Linus ]

    * tag 'for-linus-5.4-ofs1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux:
    orangefs: remove redundant assignment to err
    orangefs: Add octal zero prefix

    Linus Torvalds
     

13 Sep, 2019

2 commits


01 Aug, 2019

2 commits

  • This file has its own proper style, except that, after a while,
    the coding style gets violated and whitespace is placed in
    different ways.

    As Sphinx and ReST are very sensitive to whitespace differences,
    I had to choose whether each entry after required/mandatory/... fields
    should start with zero spaces or with a tab. I opted to start them
    all from the zero position, in order to avoid needing to break lines
    with more than 80 columns, which would make them harder to review.

    Most of the other changes to porting.rst were made to use a unified
    notation which works nicely as a text file while also producing good
    HTML output after being parsed.

    Signed-off-by: Mauro Carvalho Chehab
    Signed-off-by: Jonathan Corbet

    Mauro Carvalho Chehab
     
  • There are 3 remaining files without an extension inside the fs docs
    dir.

    Manually convert them to ReST.

    In the case of the nfs/exporting.rst file, as the nfs docs
    aren't ported yet, I opted to convert it and add an :orphan: there,
    which should be removed when it gets added into an nfs-specific
    part of the fs documentation.

    Signed-off-by: Mauro Carvalho Chehab
    Signed-off-by: Jonathan Corbet

    Mauro Carvalho Chehab
     

17 Jul, 2019

1 commit


13 Jul, 2019

1 commit

  • Pull common SETFLAGS/FSSETXATTR parameter checking from Darrick Wong:
    "Here's a patch series that sets up common parameter checking functions
    for the FS_IOC_SETFLAGS and FS_IOC_FSSETXATTR ioctl implementations.

    The goal here is to reduce the amount of behavioral variance between
    the filesystems where those ioctls originated (ext2 and XFS,
    respectively) and everybody else.

    - Standardize parameter checking for the SETFLAGS and FSSETXATTR
    ioctls (which were the file attribute setters for ext4 and xfs and
    have now been hoisted to the vfs)

    - Only allow the DAX flag to be set on files and directories"

    * tag 'vfs-fix-ioctl-checking-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
    vfs: only allow FSSETXATTR to set DAX flag on files and dirs
    vfs: teach vfs_ioc_fssetxattr_check to check extent size hints
    vfs: teach vfs_ioc_fssetxattr_check to check project id info
    vfs: create a generic checking function for FS_IOC_FSSETXATTR
    vfs: create a generic checking and prep function for FS_IOC_SETFLAGS

    Linus Torvalds
     

12 Jul, 2019

2 commits


04 Jul, 2019

1 commit

  • Stephen writes:
    After merging the driver-core tree, today's linux-next build (x86_64
    allmodconfig) produced this warning:

    fs/orangefs/orangefs-debugfs.c: In function 'orangefs_debugfs_init':
    fs/orangefs/orangefs-debugfs.c:193:1: warning: label 'out' defined but not used [-Wunused-label]
    out:
    ^~~
    fs/orangefs/orangefs-debugfs.c: In function 'orangefs_kernel_debug_init':
    fs/orangefs/orangefs-debugfs.c:204:17: warning: unused variable 'ret' [-Wunused-variable]
    struct dentry *ret;
    ^~~
    Fix this up and change the return type of the function to void as it
    cannot fail, which cleans up some more code and variables as well.

    Cc: Mike Marshall
    Cc: Martin Brandenburg
    Cc: devel@lists.orangefs.org
    Reported-by: Stephen Rothwell
    Fixes: f095adba36bb ("orangefs: no need to check return value of debugfs_create functions")
    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     

03 Jul, 2019

1 commit

  • When calling debugfs functions, there is no need to ever check the
    return value. The function can work or not, but the code logic should
    never do something different based on this.

    Cc: Mike Marshall
    Cc: Martin Brandenburg
    Cc: devel@lists.orangefs.org
    Signed-off-by: Greg Kroah-Hartman
    Link: https://lore.kernel.org/r/20190612152204.GA17511@kroah.com
    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     

01 Jul, 2019

1 commit

  • Create a generic function to check incoming FS_IOC_SETFLAGS flag values
    and later prepare the inode for updates so that we can standardize the
    implementations that follow ext4's flag values.

    Note that the efivarfs implementation no longer fails a no-op SETFLAGS
    without CAP_LINUX_IMMUTABLE since that's the behavior in ext*.

    Signed-off-by: Darrick J. Wong
    Reviewed-by: Jan Kara
    Reviewed-by: Christoph Hellwig
    Acked-by: David Sterba
    Reviewed-by: Bob Peterson

    Darrick J. Wong
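The core of the generic FS_IOC_SETFLAGS check is that changing the append-only or immutable flag requires CAP_LINUX_IMMUTABLE, while re-setting the flags a file already has does not; that is the ext* behavior efivarfs now follows for a no-op SETFLAGS. The sketch below models that rule; the flag values copy FS_IMMUTABLE_FL and FS_APPEND_FL from <linux/fs.h>, and model_setflags_check() is a hypothetical name with the capability passed in as a plain bool.

```c
#include <errno.h>
#include <stdbool.h>

/* Copies of FS_IMMUTABLE_FL / FS_APPEND_FL from <linux/fs.h>. */
#define MODEL_FS_IMMUTABLE_FL 0x00000010u
#define MODEL_FS_APPEND_FL    0x00000020u

/* Changing IMMUTABLE or APPEND needs CAP_LINUX_IMMUTABLE; a no-op does not. */
static int model_setflags_check(unsigned int oldflags, unsigned int flags,
                                bool has_cap_linux_immutable)
{
    unsigned int changed = oldflags ^ flags;

    if ((changed & (MODEL_FS_APPEND_FL | MODEL_FS_IMMUTABLE_FL)) &&
        !has_cap_linux_immutable)
        return -EPERM;
    return 0;
}
```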
     

21 May, 2019

2 commits


15 May, 2019

1 commit

  • To facilitate additional options to get_user_pages_fast(), change the
    singular write parameter to be gup_flags.

    This patch does not change any functionality. New functionality will
    follow in subsequent patches.

    Some of the get_user_pages_fast() call sites were unchanged because they
    already passed FOLL_WRITE or 0 for the write parameter.

    NOTE: It was suggested to change the ordering of the get_user_pages_fast()
    arguments to ensure that callers were converted. This breaks the current
    GUP call site convention of having the returned pages be the final
    parameter. So the suggestion was rejected.

    Link: http://lkml.kernel.org/r/20190328084422.29911-4-ira.weiny@intel.com
    Link: http://lkml.kernel.org/r/20190317183438.2057-4-ira.weiny@intel.com
    Signed-off-by: Ira Weiny
    Reviewed-by: Mike Marshall
    Cc: Aneesh Kumar K.V
    Cc: Benjamin Herrenschmidt
    Cc: Borislav Petkov
    Cc: Dan Williams
    Cc: "David S. Miller"
    Cc: Heiko Carstens
    Cc: Ingo Molnar
    Cc: James Hogan
    Cc: Jason Gunthorpe
    Cc: John Hubbard
    Cc: "Kirill A. Shutemov"
    Cc: Martin Schwidefsky
    Cc: Michal Hocko
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Ralf Baechle
    Cc: Rich Felker
    Cc: Thomas Gleixner
    Cc: Yoshinori Sato
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ira Weiny
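The conversion from a boolean write argument to gup_flags is mechanical, and since FOLL_WRITE was 0x01 at the time, a caller that passed 1 (or FOLL_WRITE) or 0 already produced the right flags value, which is presumably why some call sites needed no change. The sketch below shows the caller-side mapping and the callee-side test; MODEL_FOLL_WRITE copies the kernel value and both functions are hypothetical.

```c
#include <stdbool.h>

/* Copy of FOLL_WRITE (0x01 in the kernel's mm.h at the time). */
#define MODEL_FOLL_WRITE 0x01u

/* Caller side: an old boolean "write" argument becomes a flags word. */
static unsigned int model_write_to_gup_flags(int write)
{
    return write ? MODEL_FOLL_WRITE : 0;
}

/* Callee side: the old "if (write)" becomes a flag test, leaving room for
 * additional flags in later patches. */
static bool model_wants_write(unsigned int gup_flags)
{
    return (gup_flags & MODEL_FOLL_WRITE) != 0;
}
```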
     

10 May, 2019

1 commit

  • Pull orangefs updates from Mike Marshall:
    "This includes one fix and our "Orangefs through the pagecache" patch
    series which greatly improves our small IO performance and helps us
    pass more xfstests than before.

    Fix:
    - orangefs: truncate before updating size

    Pagecache series:
    - all the rest"

    * tag 'for-linus-5.2-ofs1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux: (23 commits)
    orangefs: truncate before updating size
    orangefs: copy Orangefs-sized blocks into the pagecache if possible.
    orangefs: pass slot index back to readpage.
    orangefs: remember count when reading.
    orangefs: add orangefs_revalidate_mapping
    orangefs: implement writepages
    orangefs: write range tracking
    orangefs: avoid fsync service operation on flush
    orangefs: skip inode writeout if nothing to write
    orangefs: move do_readv_writev to direct_IO
    orangefs: do not return successful read when the client-core disappeared
    orangefs: implement writepage
    orangefs: migrate to generic_file_read_iter
    orangefs: service ops done for writeback are not killable
    orangefs: remove orangefs_readpages
    orangefs: reorganize setattr functions to track attribute changes
    orangefs: let setattr write to cached inode
    orangefs: set up and use backing_dev_info
    orangefs: hold i_lock during inode_getattr
    orangefs: update attributes rather than relying on server
    ...

    Linus Torvalds
     

04 May, 2019

7 commits

  • Otherwise we race with orangefs_writepage/orangefs_writepages,
    which do not expect i_size < page_offset.

    Fixes xfstests generic/129.

    Signed-off-by: Martin Brandenburg
    Signed-off-by: Mike Marshall

    Martin Brandenburg
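To see why a writepage path cares about i_size < page_offset, consider how it computes how many bytes of a dirty page lie inside the file. The sketch below includes a defensive guard for illustration; the patch itself instead fixes the ordering (truncate first, then update the size) so that writeback does not see a dirty page beyond the new size. model_bytes_to_write() and the 4096-byte page size are assumptions.

```c
#include <stdint.h>

/* Page size assumed for this sketch. */
#define MODEL_PAGE_SIZE 4096ull

/* Bytes of the page at page_offset that lie inside a file of i_size bytes.
 * Without the guard, i_size - page_offset would wrap around (unsigned) when
 * the size has already dropped below the page's offset. */
static uint64_t model_bytes_to_write(uint64_t i_size, uint64_t page_offset)
{
    if (i_size <= page_offset)
        return 0;
    uint64_t remaining = i_size - page_offset;
    return remaining < MODEL_PAGE_SIZE ? remaining : MODEL_PAGE_SIZE;
}
```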
     
  • ->readpage looks in file->private_data to try and find out how the
    userspace program set "count" in read(2) or with "dd bs=" or whatever.

    ->readpage uses "count" and inode->i_size to calculate how much
    data Orangefs should deposit in the Orangefs shared buffer, and
    remembers which slot the data is in.

    After copying data from the Orangefs shared buffer slot into
    "the page", readpage tries to increment through the pagecache index
    and fill as many pages as it can from the extra data in the shared
    buffer. Hopefully these extra pages will soon be needed by the vfs,
    and they'll be in the pagecache already.

    Signed-off-by: Mike Marshall
    Signed-off-by: Martin Brandenburg

    Mike Marshall
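The readpage arithmetic described above can be sketched as two steps: clamp the reader's "count" so the request does not run past end of file, then see how many page cache pages that much data covers (the requested page plus extras, with a partial last page still counting as one page). This is a hypothetical model assuming count is at least one page and 4096-byte pages; it is not the Orangefs code.

```c
#include <stdint.h>

/* Page size assumed for this sketch. */
#define MODEL_PAGE_SIZE 4096ull

/* Bytes to request from Orangefs at file position pos: the reader's count,
 * clamped to what remains before i_size. */
static uint64_t model_read_len(uint64_t count, uint64_t i_size, uint64_t pos)
{
    if (pos >= i_size)
        return 0;
    uint64_t left = i_size - pos;
    return count < left ? count : left;
}

/* Page cache pages that len bytes can fill, rounding a partial page up. */
static uint64_t model_pages_filled(uint64_t len)
{
    return (len + MODEL_PAGE_SIZE - 1) / MODEL_PAGE_SIZE;
}
```

With dd bs=4194304 on a large file, one trip to the shared buffer could fill 1024 pages; near end of file, a 10000-byte remainder fills three.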
     
  • When userspace deposits more than a page of data into the shared buffer,
    we'll need to know which slot it is in when we get back to readpage
    so that we can try to use the extra data to fill some extra pages.

    Signed-off-by: Mike Marshall
    Signed-off-by: Martin Brandenburg

    Mike Marshall
     
  • Orangefs wins when it can do IO on large (up to four meg) blocks at a time,
    and loses when it has to do tiny "small io" reads and writes. Accessing
    Orangefs through the pagecache with the kernel module helps with small io,
    both reading and writing, a great deal. Readpage generally tries to fetch a
    page (four k) at a time. We'll let users use "count" (as in read(2) or
    pread(2) for example) as a knob to control how much data they get from
    Orangefs at a time and we'll try to use the data to fill extra
    pagecache pages when we get to ->readpage, hopefully resulting in
    fewer calls to readpage and Orangefs userspace.

    We need a way to remember how they set count so that we can still have
    it available when we get to ->readpage.

    - We'll use file->private_data to keep track of "count".
    We'll wrap generic_file_open with orangefs_file_open and
    initialize private_data to NULL there.

    - In ->read_iter we have access to both "count" and file, so
    we'll kmalloc some space onto file->private_data and store
    "count" there.

    - We'll kfree file->private_data each time we visit ->flush and
    reinitialize it to NULL.

    Signed-off-by: Mike Marshall
    Signed-off-by: Martin Brandenburg

    Mike Marshall
     
  • This is modeled after NFS, except our method is different. We use a
    simple timer to determine whether to invalidate the page cache. This
    is bound to perform well.

    This adds a sysfs parameter cache_timeout_msecs which controls the time
    between page cache invalidations.

    Signed-off-by: Martin Brandenburg
    Signed-off-by: Mike Marshall

    Martin Brandenburg
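A simple timer-based revalidation like the one described can be sketched as: trust the page cache until cache_timeout_msecs have elapsed since the last validation, then invalidate and restart the timer. The struct, field, and function names below are hypothetical, the clock is assumed monotonic, and the ">=" boundary is a choice made for this sketch rather than taken from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-inode state for timer-based page cache invalidation. */
struct model_inode {
    uint64_t last_validated_ms;   /* monotonic time of the last (re)validation */
};

/* Stale once cache_timeout_msecs have passed since the last validation. */
static bool model_should_invalidate(const struct model_inode *inode,
                                    uint64_t now_ms, uint64_t cache_timeout_msecs)
{
    return now_ms - inode->last_validated_ms >= cache_timeout_msecs;
}

/* Invalidate if stale and restart the timer; returns true if invalidated. */
static bool model_revalidate_mapping(struct model_inode *inode,
                                     uint64_t now_ms, uint64_t cache_timeout_msecs)
{
    if (!model_should_invalidate(inode, now_ms, cache_timeout_msecs))
        return false;
    /* a real implementation would drop the cached pages here */
    inode->last_validated_ms = now_ms;
    return true;
}
```

Compared with NFS-style attribute comparison, a pure timer trades some unnecessary invalidations for never asking the server whether the file changed.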
     
  • Go through pages and look for a consecutive writable region. After
    finding a number of consecutive writable pages or when finding that
    the next page's dirty range is not contiguous and cannot be written
    as one request, send the write to the server.

    The number of pages is determined by the client-core's buffer size.

    Signed-off-by: Martin Brandenburg
    Signed-off-by: Mike Marshall

    Martin Brandenburg
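The grouping rule above can be modeled as a single pass over dirty pages in index order: keep extending the current write request while the next page's dirty range starts exactly where the previous one ended and the request is still under a page limit (standing in for the client-core buffer size); otherwise send what has been collected and start a new request. struct model_range and model_count_writes() are hypothetical, and the real code also has other reasons to split a request that this sketch ignores.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-page dirty range, in file offsets. */
struct model_range { uint64_t pos; uint64_t len; };

/* Count the write requests needed for n dirty pages, capped at max_pages
 * pages per request. */
static size_t model_count_writes(const struct model_range *r, size_t n,
                                 size_t max_pages)
{
    size_t requests = 0, in_req = 0;
    uint64_t end = 0;

    for (size_t i = 0; i < n; i++) {
        int contiguous = in_req > 0 && r[i].pos == end;
        if (!contiguous || in_req == max_pages) {
            requests++;          /* start a new request */
            in_req = 0;
        }
        in_req++;
        end = r[i].pos + r[i].len;
    }
    return requests;
}
```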
     
  • Attach the actual range of bytes written to plus the responsible uid/gid
    to each dirty page. This information must be sent to the server when
    the page is written out.

    Now write_begin, page_mkwrite, and invalidatepage keep up with this
    information. There are several conditions where they must write out the
    page immediately to store the new range. Two non-contiguous ranges
    cannot be stored on a single page.

    Signed-off-by: Martin Brandenburg
    Signed-off-by: Mike Marshall

    Martin Brandenburg
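Because one page can carry only one dirty range, a new write either extends the recorded range or forces the page to be written out first. The sketch below assumes the simplest rule, extending only when the new write starts exactly where the recorded range ends and comes from the same uid/gid; the patch's actual merge conditions may differ. struct model_write_range and both functions are hypothetical, and "writing out" is modeled as incrementing a counter.

```c
#include <stdbool.h>
#include <stdint.h>
#include <sys/types.h>

/* Hypothetical per-page record: dirty byte range plus responsible uid/gid. */
struct model_write_range {
    uint64_t pos;
    uint64_t len;
    uid_t uid;
    gid_t gid;
};

/* Extend the range if the write is adjacent and from the same writer. */
static bool model_extend_range(struct model_write_range *wr, uint64_t pos,
                               uint64_t len, uid_t uid, gid_t gid)
{
    if (wr->pos + wr->len != pos || wr->uid != uid || wr->gid != gid)
        return false;
    wr->len += len;
    return true;
}

/* Record a write; if it cannot be merged, simulate writing the page out
 * (incrementing *writeouts) and start a fresh range for the new write. */
static void model_record_write(struct model_write_range *wr, uint64_t pos,
                               uint64_t len, uid_t uid, gid_t gid,
                               unsigned *writeouts)
{
    if (model_extend_range(wr, pos, len, uid, gid))
        return;
    (*writeouts)++;
    wr->pos = pos;
    wr->len = len;
    wr->uid = uid;
    wr->gid = gid;
}
```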