25 Sep, 2019

1 commit

  • Patch series "Make working with compound pages easier", v2.

    These three patches add three helpers and convert the appropriate
    places to use them.

    This patch (of 3):

    It's unnecessarily hard to find out the size of a potentially huge page.
    Replace 'PAGE_SIZE << compound_order(page)' with page_size(page).

    Link: http://lkml.kernel.org/r/20190721104612.19120-2-willy@infradead.org
    Signed-off-by: Matthew Wilcox (Oracle)
    Acked-by: Michal Hocko
    Reviewed-by: Andrew Morton
    Reviewed-by: Ira Weiny
    Acked-by: Kirill A. Shutemov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox (Oracle)
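
    A minimal sketch of the helper this patch introduces, equivalent to the
    open-coded expression it replaces (a sketch, not the full kernel header):

        /* Size of a page, taking compound (huge) pages into account. */
        static inline unsigned long page_size(struct page *page)
        {
                return PAGE_SIZE << compound_order(page);
        }

        /* Call sites change from
         *     len = PAGE_SIZE << compound_order(page);
         * to
         *     len = page_size(page);
         */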
     

01 Jun, 2019

1 commit


21 May, 2019

1 commit

  • Add SPDX license identifiers to all files which:

    - Have no license information of any form

    - Have EXPORT_.*_SYMBOL_GPL inside, which was used in the
    initial scan/conversion to ignore the file

    These files fall under the project license, GPL v2 only. The resulting SPDX
    license identifier is:

    GPL-2.0-only

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
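
    In practice each touched file gains a single identifier line at the top;
    for a C source file it looks like this:

        // SPDX-License-Identifier: GPL-2.0-only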
     

15 May, 2019

1 commit

  • To facilitate additional options to get_user_pages_fast(), change the
    singular write parameter to be gup_flags.

    This patch does not change any functionality. New functionality will
    follow in subsequent patches.

    Some of the get_user_pages_fast() call sites were unchanged because they
    already passed FOLL_WRITE or 0 for the write parameter.

    NOTE: It was suggested to change the ordering of the get_user_pages_fast()
    arguments to ensure that callers were converted. This breaks the current
    GUP call site convention of having the returned pages be the final
    parameter. So the suggestion was rejected.

    Link: http://lkml.kernel.org/r/20190328084422.29911-4-ira.weiny@intel.com
    Link: http://lkml.kernel.org/r/20190317183438.2057-4-ira.weiny@intel.com
    Signed-off-by: Ira Weiny
    Reviewed-by: Mike Marshall
    Cc: Aneesh Kumar K.V
    Cc: Benjamin Herrenschmidt
    Cc: Borislav Petkov
    Cc: Dan Williams
    Cc: "David S. Miller"
    Cc: Heiko Carstens
    Cc: Ingo Molnar
    Cc: James Hogan
    Cc: Jason Gunthorpe
    Cc: John Hubbard
    Cc: "Kirill A. Shutemov"
    Cc: Martin Schwidefsky
    Cc: Michal Hocko
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Ralf Baechle
    Cc: Rich Felker
    Cc: Thomas Gleixner
    Cc: Yoshinori Sato
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ira Weiny
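
    A hedged sketch of the conversion at a typical call site (the surrounding
    code is illustrative); note that the returned pages stay the final
    parameter:

        /* Before: 'write' was a boolean-style flag. */
        nr = get_user_pages_fast(start, nr_pages, 1, pages);

        /* After: the same request expressed through gup_flags. */
        nr = get_user_pages_fast(start, nr_pages, FOLL_WRITE, pages);

        /* Read-only callers pass 0 for gup_flags, exactly as before. */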
     

04 Apr, 2019

1 commit

  • If CONFIG_CRYPTO is not set, or is set to m, the build fails with:

    lib/iov_iter.o: In function `hash_and_copy_to_iter':
    iov_iter.c:(.text+0x9129): undefined reference to `crypto_stats_get'
    iov_iter.c:(.text+0x9152): undefined reference to `crypto_stats_ahash_update'

    Reported-by: Hulk Robot
    Fixes: d05f443554b3 ("iov_iter: introduce hash_and_copy_to_iter helper")
    Suggested-by: Al Viro
    Signed-off-by: YueHaibing
    Signed-off-by: Al Viro

    YueHaibing
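
    A hedged sketch of one conventional way such an undefined-reference error
    is resolved: only compile the crypto-dependent helper when the hash API is
    available (the exact config symbol and fix shown here are assumptions, not
    a quote of the actual patch):

        #ifdef CONFIG_CRYPTO_HASH
        size_t hash_and_copy_to_iter(const void *addr, size_t bytes,
                                     void *hashp, struct iov_iter *i)
        {
                /* ... crypto_ahash-based copy-and-hash ... */
        }
        #endif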
     

27 Feb, 2019

1 commit

  • Avoid cache line miss dereferencing struct page if we can.

    page_copy_sane() mostly deals with order-0 pages.

    Extra cache line miss is visible on TCP recvmsg() calls dealing
    with GRO packets (typically 45 page frags are attached to one skb).

    Bringing the 45 struct pages into cpu cache while copying the data
    is not free, since the freeing of the skb (and associated
    page frags put_page()) can happen after cache lines have been evicted.

    Signed-off-by: Eric Dumazet
    Cc: Al Viro
    Signed-off-by: Al Viro

    Eric Dumazet
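
    A hedged sketch of the fast path being described: when the copy fits in a
    single order-0 page, the sanity check can succeed without touching struct
    page at all (the WARN_ON on failure is omitted here):

        static bool page_copy_sane(struct page *page, size_t offset, size_t n)
        {
                struct page *head;
                size_t v = n + offset;

                /* Order-0 fast path: no struct page dereference, no cache miss. */
                if (n <= v && v <= PAGE_SIZE)
                        return true;

                /* Compound page: recompute the offset relative to the head page. */
                head = compound_head(page);
                v += (page - head) << PAGE_SHIFT;
                return n <= v && v <= (size_t)(PAGE_SIZE << compound_order(head));
        }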
     

06 Jan, 2019

1 commit


04 Jan, 2019

1 commit

  • Nobody has actually used the type (VERIFY_READ vs VERIFY_WRITE) argument
    of the user address range verification function since we got rid of the
    old racy i386-only code to walk page tables by hand.

    It existed because the original 80386 would not honor the write protect
    bit when in kernel mode, so you had to do COW by hand before doing any
    user access. But we haven't supported that in a long time, and these
    days the 'type' argument is a purely historical artifact.

    A discussion about extending 'user_access_begin()' to do the range
    checking resulted in this patch, because there is no way we're going to
    move the old VERIFY_xyz interface to that model. And it's best done at
    the end of the merge window when I've done most of my merges, so let's
    just get this done once and for all.

    This patch was mostly done with a sed-script, with manual fix-ups for
    the cases that weren't of the trivial 'access_ok(VERIFY_xyz' form.

    There were a couple of notable cases:

    - csky still had the old "verify_area()" name as an alias.

    - the iter_iov code had magical hardcoded knowledge of the actual
    values of VERIFY_{READ,WRITE} (not that they mattered, since nothing
    really used it)

    - microblaze used the type argument for a debug printout

    but other than those oddities this should be a total no-op patch.

    I tried to fix up all architectures, did fairly extensive grepping for
    access_ok() uses, and the changes are trivial, but I may have missed
    something. Any missed conversion should be trivially fixable, though.

    Signed-off-by: Linus Torvalds

    Linus Torvalds
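
    The bulk of the sed-script conversion amounts to dropping the first
    argument; a typical call site changes like this:

        /* Before: */
        if (!access_ok(VERIFY_WRITE, buf, count))
                return -EFAULT;

        /* After: */
        if (!access_ok(buf, count))
                return -EFAULT;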
     

13 Dec, 2018

2 commits

  • Allow consumers that want to use the iov iterator helpers to also update
    a predefined hash calculation online while copying data. This is useful
    when copying incoming network buffers to a local iterator and calculating
    a digest over the incoming stream. The nvme-tcp host driver, which will be
    introduced in following patches, is the first consumer, via
    skb_copy_and_hash_datagram_iter.

    Acked-by: David S. Miller
    Signed-off-by: Sagi Grimberg
    Signed-off-by: Christoph Hellwig

    Sagi Grimberg
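
    A hedged usage sketch of the new helper (the surrounding setup is
    illustrative): the caller owns an ahash request, and the digest is updated
    with the same bytes that are copied into the iterator:

        /* 'hash' is a struct ahash_request set up by the caller beforehand. */
        size_t copied = hash_and_copy_to_iter(buf, len, hash, &iter);

        /* 'copied' bytes went both into the iterator and into the running digest. */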
     
  • The single caller to csum_and_copy_to_iter is skb_copy_and_csum_datagram
    and we are trying to unite its logic with skb_copy_datagram_iter by passing
    a callback to the copy function that we want to apply. Thus, we need
    to make the checksum pointer private to the function.

    Acked-by: David S. Miller
    Signed-off-by: Sagi Grimberg
    Signed-off-by: Christoph Hellwig

    Sagi Grimberg
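
    A hedged sketch of the unification pattern this is heading towards: one
    datagram-copy walker parameterised by a per-chunk copy callback (the names
    and exact signature here are illustrative, not a quote of the kernel API):

        /* Common walker: copies 'len' bytes of the skb into 'to', applying
         * 'cb' (plain copy, copy-and-csum, copy-and-hash, ...) to each chunk;
         * any state such as a checksum accumulator travels via 'data'. */
        static int __skb_datagram_iter(const struct sk_buff *skb, int offset,
                                       struct iov_iter *to, int len,
                                       size_t (*cb)(const void *addr, size_t bytes,
                                                    void *data, struct iov_iter *i),
                                       void *data);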
     

28 Nov, 2018

1 commit


26 Nov, 2018

1 commit


24 Oct, 2018

3 commits

  • Add a new iterator, ITER_DISCARD, that can only be used in READ mode and
    just discards any data copied to it.

    This is useful in a network filesystem for discarding any unwanted data
    sent by a server.

    Signed-off-by: David Howells

    David Howells
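
    A hedged usage sketch: a discard iterator is set up with nothing but a
    direction and a byte count, and whatever is "copied" into it vanishes:

        struct iov_iter iter;

        /* Throw away the next 'count' bytes sent by the server. */
        iov_iter_discard(&iter, READ, count);
        skb_copy_datagram_iter(skb, 0, &iter, count);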
     
  • In the iov_iter struct, separate the iterator type from the iterator
    direction and use accessor functions to access them in most places.

    Convert a bunch of places to use switch-statements to access them rather
    than chains of bitwise-AND statements. This makes it easier to add further
    iterator types. It can also be more efficient: for a switch over small
    contiguous integers, the compiler can use roughly 50% fewer compare
    instructions than it needs for the equivalent bitwise-AND chains.

    Further, cease passing the iterator type into the iterator setup function.
    The iterator function can set that itself. Only the direction is required.

    Signed-off-by: David Howells

    David Howells
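
    A hedged sketch of the accessor-plus-switch style described above,
    replacing chains of bitwise-AND tests on i->type:

        enum iter_type type = iov_iter_type(i);     /* ITER_IOVEC, ITER_KVEC, ... */
        bool want_write     = iov_iter_rw(i) == WRITE;

        switch (type) {
        case ITER_IOVEC:
                /* user-space iovec backed */
                break;
        case ITER_KVEC:
        case ITER_BVEC:
        case ITER_PIPE:
        case ITER_DISCARD:
                break;
        }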
     
  • Use accessor functions to access an iterator's type and direction. This
    allows for the possibility of using some other method of determining the
    type of iterator than if-chains with bitwise-AND conditions.

    Signed-off-by: David Howells

    David Howells
     

16 Jul, 2018

3 commits

  • By mistake the ITER_PIPE early-exit / warning from copy_from_iter() was
    cargo-culted in _copy_to_iter_mcsafe() rather than a machine-check-safe
    version of copy_to_iter_pipe().

    Implement copy_pipe_to_iter_mcsafe() being careful to return the
    indication of short copies due to a CPU exception.

    Without this regression-fix all splice reads to dax-mode files fail.

    Reported-by: Ross Zwisler
    Tested-by: Ross Zwisler
    Signed-off-by: Dan Williams
    Acked-by: Al Viro
    Cc: Andrew Morton
    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Tony Luck
    Fixes: 8780356ef630 ("x86/asm/memcpy_mcsafe: Define copy_to_iter_mcsafe()")
    Link: http://lkml.kernel.org/r/153108277278.37979.3327916996902264102.stgit@dwillia2-desk3.amr.corp.intel.com
    Signed-off-by: Ingo Molnar

    Dan Williams
     
  • Add some theory of operation documentation to _copy_to_iter_flushcache().

    Reported-by: Al Viro
    Signed-off-by: Dan Williams
    Cc: Andrew Morton
    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Tony Luck
    Link: http://lkml.kernel.org/r/153108276767.37979.9462477994086841699.stgit@dwillia2-desk3.amr.corp.intel.com
    Signed-off-by: Ingo Molnar

    Dan Williams
     
  • Add some theory of operation documentation to _copy_to_iter_mcsafe().

    Reported-by: Al Viro
    Signed-off-by: Dan Williams
    Cc: Andrew Morton
    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Tony Luck
    Link: http://lkml.kernel.org/r/153108276256.37979.1689794213845539316.stgit@dwillia2-desk3.amr.corp.intel.com
    Signed-off-by: Ingo Molnar

    Dan Williams
     

05 Jun, 2018

1 commit

  • Pull x86 dax updates from Ingo Molnar:
    "This contains x86 memcpy_mcsafe() fault handling improvements the
    nvdimm tree would like to make more use of"

    * 'x86-dax-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    x86/asm/memcpy_mcsafe: Define copy_to_iter_mcsafe()
    x86/asm/memcpy_mcsafe: Add write-protection-fault handling
    x86/asm/memcpy_mcsafe: Return bytes remaining
    x86/asm/memcpy_mcsafe: Add labels for __memcpy_mcsafe() write fault handling
    x86/asm/memcpy_mcsafe: Remove loop unrolling

    Linus Torvalds
     

15 May, 2018

1 commit

  • Use the updated memcpy_mcsafe() implementation to define
    copy_user_mcsafe() and copy_to_iter_mcsafe(). The most significant
    difference from typical copy_to_iter() is that the ITER_KVEC and
    ITER_BVEC iterator types can fail to complete a full transfer.

    Signed-off-by: Dan Williams
    Cc: Al Viro
    Cc: Andrew Morton
    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Tony Luck
    Cc: hch@lst.de
    Cc: linux-fsdevel@vger.kernel.org
    Cc: linux-nvdimm@lists.01.org
    Link: http://lkml.kernel.org/r/152539239150.31796.9189779163576449784.stgit@dwillia2-desk3.amr.corp.intel.com
    Signed-off-by: Ingo Molnar

    Dan Williams
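
    A hedged usage sketch of the key difference called out above: unlike a
    regular copy_to_iter(), the mcsafe variant may legitimately stop short
    when poisoned memory is consumed, so callers must handle a partial
    transfer:

        size_t copied = copy_to_iter_mcsafe(src, len, &iter);

        if (copied < len) {
                /* A machine check truncated the copy: account for the bytes
                 * that made it and fail (or retry) the rest, rather than
                 * pretending the whole transfer succeeded. */
        }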
     

03 May, 2018

2 commits


12 Oct, 2017

1 commit


21 Sep, 2017

1 commit

  • The issue is that if the data crosses a page boundary inside a compound
    page, the copy to/from page sanity check will incorrectly trigger a
    WARN_ON.

    To fix this, compute the order using the head of the compound page and
    adjust the offset to be relative to that head.

    Fixes: 72e809ed81ed ("iov_iter: sanity checks for copy to/from page
    primitives")

    Signed-off-by: Petar Penkov
    CC: Al Viro
    CC: Eric Dumazet
    Signed-off-by: Al Viro

    Petar Penkov
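
    A hedged sketch of the adjustment described above: the size check is done
    against the head of the compound page, with the offset recomputed relative
    to that head:

        struct page *head = compound_head(page);
        size_t v = n + offset + ((page - head) << PAGE_SHIFT);

        /* Compare against the whole compound page, not just the tail page
         * that the caller happened to pass in. */
        bool sane = n <= v && v <= (size_t)(PAGE_SIZE << compound_order(head));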
     

08 Jul, 2017

1 commit

  • Pull iov_iter hardening from Al Viro:
    "This is the iov_iter/uaccess/hardening pile.

    For one thing, it trims the inline part of copy_to_user/copy_from_user
    to the minimum that *does* need to be inlined - object size checks,
    basically. For another, it sanitizes the checks for iov_iter
    primitives. There are 4 groups of checks: access_ok(), might_fault(),
    object size and KASAN.

    - access_ok() had been verified by whoever had set the iov_iter up.
    However, that has happened in a function far away, so proving that
    there's no path to actual copying bypassing those checks is hard
    and proving that iov_iter has not been buggered in the meanwhile is
    also not pleasant. So we want those redone in actual
    copyin/copyout.

    - might_fault() is better off consolidated - we know whether it needs
    to be checked as soon as we enter iov_iter primitive and observe
    the iov_iter flavour. No need to wait until the copyin/copyout. The
    call chains are short enough to make sure we won't miss anything -
    in fact, it's more robust that way, since there are cases where we
    do e.g. forced fault-in before getting to copyin/copyout. It's not
    quite what we need to check (in particular, combination of
    iovec-backed and set_fs(KERNEL_DS) is almost certainly a bug, not a
    cause to skip checks), but that's for later series. For now let's
    keep might_fault().

    - KASAN checks belong in copyin/copyout - at the same level where
    other iov_iter flavours would've hit them in memcpy().

    - object size checks should apply to *all* iov_iter flavours, not
    just iovec-backed ones.

    There are two groups of primitives - one gets the kernel object
    described as pointer + size (copy_to_iter(), etc.) while another gets
    it as page + offset + size (copy_page_to_iter(), etc.)

    For the first group the checks are best done where we actually have a
    chance to find the object size. In other words, those belong in inline
    wrappers in uio.h, before calling into iov_iter.c. Same kind as we
    have for inlined part of copy_to_user().

    For the second group there is no object to look at - offset in page is
    just a number, it bears no type information. So we do them in the
    common helper called by iov_iter.c primitives of that kind. All it
    currently does is checking that we are not trying to access outside of
    the compound page; eventually we might want to add some sanity checks
    on the page involved.

    So the things we need in copyin/copyout part of iov_iter.c do not
    quite match anything in uaccess.h (we want no zeroing, we *do* want
    access_ok() and KASAN and we want no might_fault() or object size
    checks done on that level). OTOH, these needs are simple enough to
    provide a couple of helpers (static in iov_iter.c) doing just what we
    need..."

    * 'uaccess-work.iov_iter' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    iov_iter: saner checks on copyin/copyout
    iov_iter: sanity checks for copy to/from page primitives
    iov_iter/hardening: move object size checks to inlined part
    copy_{to,from}_user(): consolidate object size checks
    copy_{from,to}_user(): move kasan checks and might_fault() out-of-line

    Linus Torvalds
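
    A hedged sketch of the "inline wrapper in uio.h" arrangement described for
    the pointer+size group of primitives: the object-size check runs where the
    kernel object is still visible to the compiler, before calling into
    iov_iter.c (helper names follow the kernel's convention but are recalled
    from memory, not quoted from the patch):

        static __always_inline __must_check
        size_t copy_to_iter(const void *addr, size_t bytes, struct iov_iter *i)
        {
                /* Object size check happens here, where 'addr' still has a type. */
                if (unlikely(!check_copy_size(addr, bytes, true)))
                        return 0;
                return _copy_to_iter(addr, bytes, i);
        }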
     

07 Jul, 2017

1 commit

  • * might_fault() is better checked in caller (and e.g. fault-in + kmap_atomic
    codepath also needs might_fault() coverage)
    * we have already done object size checks
    * we have *NOT* done access_ok() recently enough; we rely upon the
    iovec array having passed sanity checks back when it had been created
    and nothing having buggered it since. However, that's very much
    non-local, so we'd better recheck that.

    So the thing we want does not match anything in uaccess - we need
    access_ok + kasan checks + raw copy without any zeroing. Just define
    such helpers and use them here.

    Signed-off-by: Al Viro

    Al Viro
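
    A hedged sketch of the helpers being described: access_ok() plus KASAN
    checks plus a raw copy, with no zeroing and no might_fault() at this level
    (access_ok() still takes the legacy VERIFY_* argument in this era):

        static int copyout(void __user *to, const void *from, size_t n)
        {
                if (access_ok(VERIFY_WRITE, to, n)) {
                        kasan_check_read(from, n);
                        n = raw_copy_to_user(to, from, n);
                }
                return n;
        }

        static int copyin(void *to, const void __user *from, size_t n)
        {
                if (access_ok(VERIFY_READ, from, n)) {
                        kasan_check_write(to, n);
                        n = raw_copy_from_user(to, from, n);
                }
                return n;
        }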
     

30 Jun, 2017

2 commits


10 Jun, 2017

1 commit

  • The pmem driver has a need to transfer data with a persistent memory
    destination and be able to rely on the fact that the destination writes are not
    cached. It is sufficient for the writes to be flushed to a cpu-store-buffer
    (non-temporal / "movnt" in x86 terms), as we expect userspace to call fsync()
    to ensure data-writes have reached a power-fail-safe zone in the platform. The
    fsync() triggers a REQ_FUA or REQ_FLUSH to the pmem driver which will turn
    around and fence previous writes with an "sfence".

    Implement a __copy_from_user_inatomic_flushcache, memcpy_page_flushcache, and
    memcpy_flushcache, that guarantee that the destination buffer is not dirty in
    the cpu cache on completion. The new copy_from_iter_flushcache and sub-routines
    will be used to replace the "pmem api" (include/linux/pmem.h +
    arch/x86/include/asm/pmem.h). The availability of copy_from_iter_flushcache()
    and memcpy_flushcache() is gated by the CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE
    config symbol; without it they fall back to copy_from_iter_nocache() and
    plain memcpy().

    This is meant to satisfy the concern from Linus that if a driver wants to do
    something beyond the normal nocache semantics it should be something private to
    that driver [1], and Al's concern that anything uaccess related belongs with
    the rest of the uaccess code [2].

    The first consumer of this interface is a new 'copy_from_iter' dax operation so
    that pmem can inject cache maintenance operations without imposing this
    overhead on other dax-capable drivers.

    [1]: https://lists.01.org/pipermail/linux-nvdimm/2017-January/008364.html
    [2]: https://lists.01.org/pipermail/linux-nvdimm/2017-April/009942.html

    Cc: Jan Kara
    Cc: Jeff Moyer
    Cc: Ingo Molnar
    Cc: Christoph Hellwig
    Cc: Toshi Kani
    Cc: "H. Peter Anvin"
    Cc: Al Viro
    Cc: Thomas Gleixner
    Cc: Matthew Wilcox
    Reviewed-by: Ross Zwisler
    Signed-off-by: Dan Williams

    Dan Williams
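
    A hedged sketch of the config gating and fallback described above (the
    exact wrapper layout is an assumption):

        #ifdef CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE
        /* Writes reach the destination via non-temporal stores / explicit
         * flushes, so they are not left dirty in the CPU cache. */
        size_t _copy_from_iter_flushcache(void *addr, size_t bytes, struct iov_iter *i);
        #else
        /* Architectures without the primitive fall back to the nocache copy. */
        #define _copy_from_iter_flushcache _copy_from_iter_nocache
        #endif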
     

10 May, 2017

1 commit


09 May, 2017

2 commits

  • There are many code paths opencoding kvmalloc. Let's use the helper
    instead. The main difference to kvmalloc is that those users are
    usually not considering all the aspects of the memory allocator. E.g.
    allocation requests
    Reviewed-by: Boris Ostrovsky # Xen bits
    Acked-by: Kees Cook
    Acked-by: Vlastimil Babka
    Acked-by: Andreas Dilger # Lustre
    Acked-by: Christian Borntraeger # KVM/s390
    Acked-by: Dan Williams # nvdim
    Acked-by: David Sterba # btrfs
    Acked-by: Ilya Dryomov # Ceph
    Acked-by: Tariq Toukan # mlx4
    Acked-by: Leon Romanovsky # mlx5
    Cc: Martin Schwidefsky
    Cc: Heiko Carstens
    Cc: Herbert Xu
    Cc: Anton Vorontsov
    Cc: Colin Cross
    Cc: Tony Luck
    Cc: "Rafael J. Wysocki"
    Cc: Ben Skeggs
    Cc: Kent Overstreet
    Cc: Santosh Raspatur
    Cc: Hariprasad S
    Cc: Yishai Hadas
    Cc: Oleg Drokin
    Cc: "Yan, Zheng"
    Cc: Alexander Viro
    Cc: Alexei Starovoitov
    Cc: Eric Dumazet
    Cc: David Miller
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
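
    A hedged sketch of the typical conversion: the open-coded kmalloc-then-
    vmalloc fallback is replaced by the helper, which also picks appropriate
    GFP flags for the fallback path:

        /* Open-coded pattern being replaced: */
        table = kmalloc(size, GFP_KERNEL | __GFP_NOWARN);
        if (!table)
                table = vmalloc(size);

        /* Converted call site: */
        table = kvmalloc(size, GFP_KERNEL);

        /* Freed the same way in both cases. */
        kvfree(table);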
     
  • The caller passed an argument with the wrong sign to iov_iter_revert().
    Unfortunately, it slipped through testing, since most of the time we don't
    do anything to the iterator afterwards, and a potential oops from walking
    iter->iov too far backwards is too infrequent to be easily triggered.

    Add a sanity check in iov_iter_revert() to catch bugs like this one;
    fortunately, the same braino hadn't happened in other callers, but we'd
    better have a warning if such thing crops up.

    Signed-off-by: Al Viro

    Al Viro
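
    A hedged sketch of the two sides of this fix: the caller must revert by a
    positive byte count, and iov_iter_revert() itself now refuses obviously
    bogus values (the exact bound shown is an assumption):

        /* Caller side: undo a partial copy of 'copied' bytes. */
        iov_iter_revert(&iter, copied);            /* not -copied */

        /* Inside iov_iter_revert(): warn and bail on nonsensical requests. */
        if (WARN_ON(unroll > MAX_RW_COUNT))
                return;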
     

02 May, 2017

1 commit

  • Pull uaccess unification updates from Al Viro:
    "This is the uaccess unification pile. It's _not_ the end of uaccess
    work, but the next batch of that will go into the next cycle. This one
    mostly takes copy_from_user() and friends out of arch/* and gets the
    zero-padding behaviour in sync for all architectures.

    Dealing with the nocache/writethrough mess is for the next cycle;
    fortunately, that's x86-only. Same for cleanups in iov_iter.c (I am
    sold on access_ok() in there, BTW; just not in this pile), same for
    reducing __copy_... callsites, strn*... stuff, etc. - there will be a
    pile about as large as this one in the next merge window.

    This one sat in -next for weeks. -3KLoC"

    * 'work.uaccess' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (96 commits)
    HAVE_ARCH_HARDENED_USERCOPY is unconditional now
    CONFIG_ARCH_HAS_RAW_COPY_USER is unconditional now
    m32r: switch to RAW_COPY_USER
    hexagon: switch to RAW_COPY_USER
    microblaze: switch to RAW_COPY_USER
    get rid of padding, switch to RAW_COPY_USER
    ia64: get rid of copy_in_user()
    ia64: sanitize __access_ok()
    ia64: get rid of 'segment' argument of __do_{get,put}_user()
    ia64: get rid of 'segment' argument of __{get,put}_user_check()
    ia64: add extable.h
    powerpc: get rid of zeroing, switch to RAW_COPY_USER
    esas2r: don't open-code memdup_user()
    alpha: fix stack smashing in old_adjtimex(2)
    don't open-code kernel_setsockopt()
    mips: switch to RAW_COPY_USER
    mips: get rid of tail-zeroing in primitives
    mips: make copy_from_user() zero tail explicitly
    mips: clean and reorder the forest of macros...
    mips: consolidate __invoke_... wrappers
    ...

    Linus Torvalds
     

30 Apr, 2017

1 commit


03 Apr, 2017

1 commit


29 Mar, 2017

2 commits


15 Jan, 2017

1 commit

  • The logic in pipe_advance() used to release all buffers past the new
    position failed when the number of buffers to release was equal
    to pipe->buffers. If that happened, none of them were released,
    leaving the pipe full. Worse, it was trivial to trigger, and we ended up
    with a pipe full of uninitialized pages. IOW, it's an infoleak.

    Cc: stable@vger.kernel.org # v4.9
    Reported-by: "Alan J. Wylie"
    Tested-by: "Alan J. Wylie"
    Signed-off-by: Al Viro

    Al Viro
     

23 Dec, 2016

1 commit

  • The problem is similar to the ones dealt with in "fold checks into
    iterate_and_advance()" and its followups, except that in this case we
    really want to do nothing when asked for a zero-length operation - unlike
    zero-length iterate_and_advance(), zero-length iterate_all_kinds() has no
    side effects, and callers are simpler that way.

    That got exposed when copy_from_iter_full() had been used by tipc, which
    builds an msghdr with zero payload and (now) feeds it to a primitive
    based on iterate_all_kinds() instead of iterate_and_advance().

    Reported-by: Jon Maloy
    Tested-by: Jon Maloy
    Signed-off-by: Al Viro

    Al Viro
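
    A hedged sketch of the behaviour this establishes, expressed as the guard
    a primitive built on iterate_all_kinds() effectively gains (the actual fix
    lives in the iteration machinery itself; this is illustrative only):

        if (unlikely(!bytes))
                return 0;       /* nothing to copy, and nothing to validate */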
     

17 Dec, 2016

1 commit

  • Pull vfs updates from Al Viro:

    - more ->d_init() stuff (work.dcache)

    - pathname resolution cleanups (work.namei)

    - a few missing iov_iter primitives - copy_from_iter_full() and
    friends. Either copy the full requested amount, advance the iterator
    and return true, or fail, return false and do _not_ advance the
    iterator. Quite a few open-coded callers converted (and became more
    readable and harder to fuck up that way) (work.iov_iter)

    - several assorted patches, the big one being logfs removal

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    logfs: remove from tree
    vfs: fix put_compat_statfs64() does not handle errors
    namei: fold should_follow_link() with the step into not-followed link
    namei: pass both WALK_GET and WALK_MORE to should_follow_link()
    namei: invert WALK_PUT logics
    namei: shift interpretation of LOOKUP_FOLLOW inside should_follow_link()
    namei: saner calling conventions for mountpoint_last()
    namei.c: get rid of user_path_parent()
    switch getfrag callbacks to ..._full() primitives
    make skb_add_data,{_nocache}() and skb_copy_to_page_nocache() advance only on success
    [iov_iter] new primitives - copy_from_iter_full() and friends
    don't open-code file_inode()
    ceph: switch to use of ->d_init()
    ceph: unify dentry_operations instances
    lustre: switch to use of ->d_init()

    Linus Torvalds
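
    A hedged usage sketch of the _full() semantics spelled out above: on
    success the full amount is copied and the iterator advances; on failure
    nothing is consumed, so the caller can bail out cleanly (the structure
    name is hypothetical):

        struct wire_header hdr;    /* hypothetical on-the-wire header */

        if (!copy_from_iter_full(&hdr, sizeof(hdr), &msg->msg_iter))
                return -EFAULT;    /* iterator left exactly where it was */

        /* here the iterator has advanced past sizeof(hdr) bytes */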