22 Nov, 2018

1 commit

  • It returns EINVAL when the operation is not supported by the
    filesystem. Fix it to return EOPNOTSUPP to be consistent with
    the man page and clone_file_range().

    Clean up the inconsistent error return handling while I'm there.
    (I know, lipstick on a pig, but every little bit helps...)

    Signed-off-by: Dave Chinner
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Dave Chinner
     

03 Nov, 2018

1 commit

  • Pull vfs dedup fixes from Dave Chinner:
    "This reworks the vfs data cloning infrastructure.

    We discovered many issues with these interfaces late in the 4.19 cycle
    - the worst of them (data corruption, setuid stripping) were fixed for
    XFS in 4.19-rc8, but a larger rework of the infrastructure fixing all
    the problems was needed. That rework is the contents of this pull
    request.

    Rework the vfs_clone_file_range and vfs_dedupe_file_range
    infrastructure to use a common .remap_file_range method and supply
    generic bounds and sanity checking functions that are shared with the
    data write path. The current VFS infrastructure has problems with
    rlimit, LFS file sizes, file time stamps, maximum filesystem file
    sizes, stripping setuid bits, etc and so they are addressed in these
    commits.

    We also introduce the ability for the ->remap_file_range methods to
    return short clones so that clones for vfs_copy_file_range() don't get
    rejected if the entire range can't be cloned. It also allows
    filesystems to silently skip deduplication of partial EOF blocks if
    they are not capable of doing so without requiring errors to be thrown
    to userspace.

    Existing filesystems are converted to use the new remap_file_range
    method, and both XFS and ocfs2 are modified to make use of the new
    generic checking infrastructure"

    * tag 'xfs-4.20-merge-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (28 commits)
    xfs: remove [cm]time update from reflink calls
    xfs: remove xfs_reflink_remap_range
    xfs: remove redundant remap partial EOF block checks
    xfs: support returning partial reflink results
    xfs: clean up xfs_reflink_remap_blocks call site
    xfs: fix pagecache truncation prior to reflink
    ocfs2: remove ocfs2_reflink_remap_range
    ocfs2: support partial clone range and dedupe range
    ocfs2: fix pagecache truncation prior to reflink
    ocfs2: truncate page cache for clone destination file before remapping
    vfs: clean up generic_remap_file_range_prep return value
    vfs: hide file range comparison function
    vfs: enable remap callers that can handle short operations
    vfs: plumb remap flags through the vfs dedupe functions
    vfs: plumb remap flags through the vfs clone functions
    vfs: make remap_file_range functions take and return bytes completed
    vfs: remap helper should update destination inode metadata
    vfs: pass remap flags to generic_remap_checks
    vfs: pass remap flags to generic_remap_file_range_prep
    vfs: combine the clone and dedupe into a single remap_file_range
    ...

    Linus Torvalds
     

02 Nov, 2018

1 commit

  • Pull misc vfs updates from Al Viro:
    "No common topic, really - a handful of assorted stuff; the least
    trivial bits are Mark's dedupe patches"

    * 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    fs/exofs: only use true/false for asignment of bool type variable
    fs/exofs: fix potential memory leak in mount option parsing
    Delete invalid assignment statements in do_sendfile
    iomap: remove duplicated include from iomap.c
    vfs: dedupe should return EPERM if permission is not granted
    vfs: allow dedupe of user owned read-only files
    ntfs: don't open-code ERR_CAST
    ext4: don't open-code ERR_CAST

    Linus Torvalds
     

30 Oct, 2018

17 commits

  • Since the remap prep function can update the length of the remap
    request, we can change this function to return the usual return status
    instead of the odd behavior it has now.

    Signed-off-by: Darrick J. Wong
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Dave Chinner

    Darrick J. Wong
     
  • There are no callers of vfs_dedupe_file_range_compare, so we might as
    well make it a static helper and remove the export.

    Signed-off-by: Darrick J. Wong
    Reviewed-by: Amir Goldstein
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Dave Chinner

    Darrick J. Wong
     
  • Plumb in a remap flag that enables the filesystem remap handler to
    shorten remapping requests for callers that can handle it. Now
    copy_file_range can report partial success (in case we run up against
    alignment problems, resource limits, etc.).

    We also enable CAN_SHORTEN for fideduperange to maintain existing
    userspace-visible behavior where xfs/btrfs shorten the dedupe range to
    avoid stale post-eof data exposure.
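
The shortening rule described above can be sketched as a small userspace model. This is not the kernel implementation: the function name, the flag names and the choice of -EINVAL for every rejection are assumptions made for illustration. It only shows the idea that an unaligned tail is trimmed when the caller opted in, and rejected otherwise.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Stand-in flag names for this sketch; not copied from kernel headers. */
#define MODEL_REMAP_DEDUP       (1u << 0)
#define MODEL_REMAP_CAN_SHORTEN (1u << 1)

/*
 * Userspace model: decide how many bytes of a remap request may proceed.
 * blksize must be a power of two. Returns 0 (possibly after shortening
 * *len) or a negative errno when the request has to be rejected.
 */
static int model_remap_check_len(int64_t dst_size, int64_t pos_out,
                                 int64_t *len, int64_t blksize,
                                 unsigned int flags)
{
    assert(blksize > 0 && (blksize & (blksize - 1)) == 0);

    int64_t blkmask = blksize - 1;
    int64_t new_len = *len;

    /* Block-aligned lengths never need trimming. */
    if ((*len & blkmask) == 0)
        return 0;

    /*
     * An unaligned tail is only kept when cloning to or past the
     * destination EOF; dedupe and mid-file clones drop the partial block.
     */
    if ((flags & MODEL_REMAP_DEDUP) || pos_out + *len < dst_size)
        new_len &= ~blkmask;

    if (new_len == *len)
        return 0;

    /* Callers that can handle a short result get the trimmed length. */
    if (flags & MODEL_REMAP_CAN_SHORTEN) {
        *len = new_len;
        return 0;
    }

    /* Simplification: this model uses -EINVAL for every rejection. */
    return -EINVAL;
}
```

With a 4096-byte block size, a 5000-byte request landing inside the destination file is cut to 4096 bytes when the shorten flag is set and rejected when it is not; the same request landing at the destination EOF goes through unchanged.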

    Signed-off-by: Darrick J. Wong
    Reviewed-by: Amir Goldstein
    Signed-off-by: Dave Chinner

    Darrick J. Wong
     
  • Plumb a remap_flags argument through the vfs_dedupe_file_range_one
    functions so that dedupe can take advantage of it.

    Signed-off-by: Darrick J. Wong
    Reviewed-by: Amir Goldstein
    Signed-off-by: Dave Chinner

    Darrick J. Wong
     
  • Plumb a remap_flags argument through the {do,vfs}_clone_file_range
    functions so that clone can take advantage of it.

    Signed-off-by: Darrick J. Wong
    Reviewed-by: Amir Goldstein
    Signed-off-by: Dave Chinner

    Darrick J. Wong
     
  • Change the remap_file_range functions to take a number of bytes to
    operate upon and return the number of bytes they operated on. This is a
    requirement for allowing fs implementations to return short clone/dedupe
    results to the user, which will enable us to obey resource limits in a
    graceful manner.

    A subsequent patch will enable copy_file_range to signal to the
    ->clone_file_range implementation that it can handle a short length,
    which will be returned in the function's return value. For now the
    short return is not implemented anywhere so the behavior won't change --
    either copy_file_range manages to clone the entire range or it tries an
    alternative.

    Neither clone ioctl can take advantage of this, alas.

    Signed-off-by: Darrick J. Wong
    Reviewed-by: Amir Goldstein
    Signed-off-by: Dave Chinner

    Darrick J. Wong
     
  • Extend generic_remap_file_range_prep to handle inode metadata updates
    when remapping into a file. If the operation can possibly alter the
    file contents, we must update the ctime and mtime and remove security
    privileges, just like we do for regular file writes.

    Signed-off-by: Darrick J. Wong
    Reviewed-by: Amir Goldstein
    Signed-off-by: Dave Chinner

    Darrick J. Wong
     
  • Pass the same remap flags to generic_remap_checks for consistency.

    Signed-off-by: Darrick J. Wong
    Reviewed-by: Amir Goldstein
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Dave Chinner

    Darrick J. Wong
     
  • Plumb the remap flags through the filesystem from the vfs function
    dispatcher all the way to the prep function to prepare for behavior
    changes in subsequent patches.

    Signed-off-by: Darrick J. Wong
    Reviewed-by: Amir Goldstein
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Dave Chinner

    Darrick J. Wong
     
  • Combine the clone_file_range and dedupe_file_range operations into a
    single remap_file_range file operation dispatch since they're
    fundamentally the same operation. The differences between the two can
    be made in the prep functions.

    Signed-off-by: Darrick J. Wong
    Reviewed-by: Amir Goldstein
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Dave Chinner

    Darrick J. Wong
     
  • Since we use clone_verify_area for both clone and dedupe range checks,
    rename the function to make it clear that it's for both.

    Signed-off-by: Darrick J. Wong
    Reviewed-by: Amir Goldstein
    Signed-off-by: Dave Chinner

    Darrick J. Wong
     
  • The vfs_clone_file_prep is a generic function to be called by filesystem
    implementations only. Rename the prefix to generic_ and make it more
    clear that it applies to remap operations, not just clones.

    Signed-off-by: Darrick J. Wong
    Reviewed-by: Amir Goldstein
    Signed-off-by: Dave Chinner

    Darrick J. Wong
     
  • Don't bother calling the filesystem for a zero-length dedupe request;
    we can return zero and exit.

    Signed-off-by: Darrick J. Wong
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Amir Goldstein
    Signed-off-by: Dave Chinner

    Darrick J. Wong
     
  • A deduplication data corruption is exposed in XFS and btrfs. It is
    caused by extending the block match range to include the partial EOF
    block, but then allowing unknown data beyond EOF to be considered a
    "match" to data in the destination file because the comparison is only
    made to the end of the source file. This corrupts the destination file
    when the source extent is shared with it.

    The VFS remapping prep functions only support whole block dedupe, but
    we still need to appear to support whole file dedupe correctly. Hence
    if the dedupe request includes the last block of the source file, don't
    include it in the actual dedupe operation. If the rest of the range
    dedupes successfully, then reject the entire request. A subsequent
    patch will enable us to shorten dedupe requests correctly.

    When reflinking sub-file ranges, a data corruption can occur when the
    source file range includes a partial EOF block. This shares the unknown
    data beyond EOF into the second file at a position inside EOF, exposing
    stale data in the second file.

    If the reflink request includes the last block of the source file, only
    proceed with the reflink operation if it lands at or past the
    destination file's current EOF. If it lands within the destination file
    EOF, reject the entire request with -EINVAL and make the caller go the
    hard way. A subsequent patch will enable us to shorten reflink requests
    correctly.

    Signed-off-by: Darrick J. Wong
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Dave Chinner

    Darrick J. Wong
     
  • If a remap caller asks us to remap to the source file's EOF and the
    source file length leaves us with a zero byte request, exit early.

    Signed-off-by: Darrick J. Wong
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Dave Chinner

    Darrick J. Wong
     
  • Move the file range checks from vfs_clone_file_prep into a separate
    generic_remap_checks function so that all the checks are collected in a
    central location. This forms the basis for adding more checks from
    generic_write_checks that will make cloning's input checking more
    consistent with write input checking.

    Signed-off-by: Darrick J. Wong
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Amir Goldstein
    Signed-off-by: Dave Chinner

    Darrick J. Wong
     
  • vfs_clone_file_prep_inodes cannot return 0 if it is asked to remap from
    a zero byte file because that's what btrfs does.

    Signed-off-by: Darrick J. Wong
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Dave Chinner

    Darrick J. Wong
     

26 Oct, 2018

1 commit

  • Pull timekeeping updates from Thomas Gleixner:
    "The timers and timekeeping department provides:

    - Another large y2038 update with further preparations for providing
    the y2038 safe timespecs closer to the syscalls.

    - An overhaul of the SHCMT clocksource driver

    - SPDX license identifier updates

    - Small cleanups and fixes all over the place"

    * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (31 commits)
    tick/sched : Remove redundant cpu_online() check
    clocksource/drivers/dw_apb: Add reset control
    clocksource: Remove obsolete CLOCKSOURCE_OF_DECLARE
    clocksource/drivers: Unify the names to timer-* format
    clocksource/drivers/sh_cmt: Add R-Car gen3 support
    dt-bindings: timer: renesas: cmt: document R-Car gen3 support
    clocksource/drivers/sh_cmt: Properly line-wrap sh_cmt_of_table[] initializer
    clocksource/drivers/sh_cmt: Fix clocksource width for 32-bit machines
    clocksource/drivers/sh_cmt: Fixup for 64-bit machines
    clocksource/drivers/sh_tmu: Convert to SPDX identifiers
    clocksource/drivers/sh_mtu2: Convert to SPDX identifiers
    clocksource/drivers/sh_cmt: Convert to SPDX identifiers
    clocksource/drivers/renesas-ostm: Convert to SPDX identifiers
    clocksource: Convert to using %pOFn instead of device_node.name
    tick/broadcast: Remove redundant check
    RISC-V: Request newstat syscalls
    y2038: signal: Change rt_sigtimedwait to use __kernel_timespec
    y2038: socket: Change recvmmsg to use __kernel_timespec
    y2038: sched: Change sched_rr_get_interval to use __kernel_timespec
    y2038: utimes: Rework #ifdef guards for compat syscalls
    ...

    Linus Torvalds
     

18 Oct, 2018

3 commits

  • Assigning value -EINVAL to "retval" here, but that stored value is
    overwritten before it can be used.

    retval = -EINVAL;
    ....
    retval = rw_verify_area(WRITE, out.file, &out_pos, count);

    value_overwrite: Overwriting previous write to "retval" with value
    from rw_verify_area

    delete invalid assignment statements

    Signed-off-by: n00202754
    Signed-off-by: Al Viro

    nixiaoming
     
  • Right now we return EINVAL if a process does not have permission to dedupe a
    file. This was an oversight on my part. EPERM gives a true description of
    the nature of our error, and EINVAL is already used for the case that the
    filesystem does not support dedupe.

    Signed-off-by: Mark Fasheh
    Reviewed-by: Darrick J. Wong
    Acked-by: David Sterba
    Signed-off-by: Al Viro

    Mark Fasheh
     
  • The permission check in vfs_dedupe_file_range_one() is too coarse - We only
    allow dedupe of the destination file if the user is root, or they have the
    file open for write.

    This effectively limits a non-root user from deduping their own read-only
    files. In addition, the write file descriptor that the user is forced to
    hold open can prevent execution of files. As file data during a dedupe
    does not change, the behavior is unexpected and this has caused a number of
    issue reports. For an example, see:

    https://github.com/markfasheh/duperemove/issues/129

    So change the check so we allow dedupe on the target if:

    - the root or admin is asking for it
    - the process has write access
    - the owner of the file is asking for the dedupe
    - the process could get write access

    That way users can open read-only and still get dedupe.
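
The four conditions listed above amount to a simple "any of these is enough" test. A minimal sketch follows, assuming the relevant facts have already been gathered into a plain struct; the struct, its field names and the function name are hypothetical, and the -EPERM result follows the companion change in this series that replaces EINVAL for the permission failure.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical snapshot of the facts the check depends on. */
struct dedupe_ctx {
    bool is_admin;          /* root or an admin-capable process */
    bool opened_for_write;  /* destination descriptor is writable */
    bool is_owner;          /* caller owns the destination file */
    bool could_write;       /* a write-permission check would pass */
};

/* Returns 0 when dedupe into the target is allowed, -EPERM otherwise. */
static int model_dedupe_permission(const struct dedupe_ctx *c)
{
    if (c->is_admin || c->opened_for_write ||
        c->is_owner || c->could_write)
        return 0;
    return -EPERM;
}
```

Under this model an owner holding only a read-only descriptor is allowed, which is the case the commit sets out to fix, while an unrelated user with no write access is refused.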

    Signed-off-by: Mark Fasheh
    Signed-off-by: Al Viro

    Mark Fasheh
     

24 Sep, 2018

1 commit

  • Commit 031a072a0b8a ("vfs: call vfs_clone_file_range() under freeze
    protection") created a wrapper do_clone_file_range() around
    vfs_clone_file_range(), moving the freeze protection to the former, so
    overlayfs could call the latter.

    The more common vfs practice is to call do_xxx helpers from vfs_xxx
    helpers, where freeze protection is taken in the vfs_xxx helper, so
    this anomaly could be a source of confusion.

    It seems that commit 8ede205541ff ("ovl: add reflink/copyfile/dedup
    support") may have fallen a victim to this confusion -
    ovl_clone_file_range() calls the vfs_clone_file_range() helper in the
    hope of getting freeze protection on upper fs, but in fact results in
    overlayfs allowing the upper fs freeze protection to be bypassed.

    Swap the names of the two helpers to conform to common vfs practice
    and call the correct helpers from overlayfs and nfsd.

    Signed-off-by: Amir Goldstein
    Signed-off-by: Miklos Szeredi

    Amir Goldstein
     

29 Aug, 2018

1 commit

  • The sys_llseek system call is needed on all 32-bit architectures and
    none of the 64-bit ones, so we can remove the __ARCH_WANT_SYS_LLSEEK guard
    and simplify the include/asm-generic/unistd.h header further.

    Since 32-bit tasks can run either natively or in compat mode on 64-bit
    architectures, we have to check for both !CONFIG_64BIT and CONFIG_COMPAT.

    There are a few 64-bit architectures that also reference sys_llseek
    in their 64-bit ABI (e.g. sparc), but I verified that those all
    select CONFIG_COMPAT, so the #if check is still correct here. It's
    a bit odd to include it in the syscall table though, as it's the
    same as sys_lseek() on 64-bit, but with strange calling conventions.
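
The "strange calling conventions" come from passing a 64-bit offset as two 32-bit halves, which is what 32-bit callers need. A minimal sketch of how such halves combine into one offset; the helper name is hypothetical and this models only the arithmetic, not the syscall itself.

```c
#include <stdint.h>

/*
 * Hypothetical helper: rebuild the 64-bit file offset that an
 * llseek-style call receives as separate high and low 32-bit words.
 */
static int64_t model_llseek_offset(uint32_t offset_high, uint32_t offset_low)
{
    return (int64_t)(((uint64_t)offset_high << 32) | offset_low);
}
```

On a 64-bit ABI a single register already holds the whole offset, so the split form adds nothing over a plain lseek.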

    Acked-by: Geert Uytterhoeven
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Arnd Bergmann

    Arnd Bergmann
     

18 Jul, 2018

1 commit


07 Jul, 2018

4 commits


13 Jun, 2018

1 commit

  • The kmalloc() function has a 2-factor argument form, kmalloc_array(). This
    patch replaces cases of:

    kmalloc(a * b, gfp)

    with:

    kmalloc_array(a, b, gfp)

    as well as handling cases of:

    kmalloc(a * b * c, gfp)

    with:

    kmalloc(array3_size(a, b, c), gfp)

    as it's slightly less ugly than:

    kmalloc_array(array_size(a, b), c, gfp)

    This does, however, attempt to ignore constant size factors like:

    kmalloc(4 * 1024, gfp)

    though any constants defined via macros get caught up in the conversion.

    Any factors with a sizeof() of "unsigned char", "char", and "u8" were
    dropped, since they're redundant.

    The tools/ directory was manually excluded, since it has its own
    implementation of kmalloc().
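
The point of the conversion is that a bare product such as a * b can wrap around and produce an allocation that is too small. A minimal userspace sketch of the two ideas follows; the model_ names are hypothetical stand-ins built on malloc() and a GCC/Clang builtin, not the kernel helpers themselves, and the assumption is that the 3-factor helper saturates to SIZE_MAX so that the allocation that follows fails.

```c
#include <stdint.h>
#include <stdlib.h>

/* Stand-in for a 2-factor allocator: fail instead of wrapping around. */
static void *model_alloc_array(size_t n, size_t size)
{
    size_t bytes;

    if (__builtin_mul_overflow(n, size, &bytes))
        return NULL;
    return malloc(bytes);
}

/*
 * Stand-in for a 3-factor size helper: saturate to SIZE_MAX on overflow
 * so that a subsequent allocation of that size cannot succeed.
 */
static size_t model_array3_size(size_t a, size_t b, size_t c)
{
    size_t bytes;

    if (__builtin_mul_overflow(a, b, &bytes) ||
        __builtin_mul_overflow(bytes, c, &bytes))
        return SIZE_MAX;
    return bytes;
}
```

A call such as model_alloc_array(count, sizeof(struct item)) then returns NULL for an absurd count rather than a short buffer, which is the behavior the count-and-size argument form is meant to provide.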

    The Coccinelle script used for this was:

    // Fix redundant parens around sizeof().
    @@
    type TYPE;
    expression THING, E;
    @@

    (
    kmalloc(
    - (sizeof(TYPE)) * E
    + sizeof(TYPE) * E
    , ...)
    |
    kmalloc(
    - (sizeof(THING)) * E
    + sizeof(THING) * E
    , ...)
    )

    // Drop single-byte sizes and redundant parens.
    @@
    expression COUNT;
    typedef u8;
    typedef __u8;
    @@

    (
    kmalloc(
    - sizeof(u8) * (COUNT)
    + COUNT
    , ...)
    |
    kmalloc(
    - sizeof(__u8) * (COUNT)
    + COUNT
    , ...)
    |
    kmalloc(
    - sizeof(char) * (COUNT)
    + COUNT
    , ...)
    |
    kmalloc(
    - sizeof(unsigned char) * (COUNT)
    + COUNT
    , ...)
    |
    kmalloc(
    - sizeof(u8) * COUNT
    + COUNT
    , ...)
    |
    kmalloc(
    - sizeof(__u8) * COUNT
    + COUNT
    , ...)
    |
    kmalloc(
    - sizeof(char) * COUNT
    + COUNT
    , ...)
    |
    kmalloc(
    - sizeof(unsigned char) * COUNT
    + COUNT
    , ...)
    )

    // 2-factor product with sizeof(type/expression) and identifier or constant.
    @@
    type TYPE;
    expression THING;
    identifier COUNT_ID;
    constant COUNT_CONST;
    @@

    (
    - kmalloc
    + kmalloc_array
    (
    - sizeof(TYPE) * (COUNT_ID)
    + COUNT_ID, sizeof(TYPE)
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - sizeof(TYPE) * COUNT_ID
    + COUNT_ID, sizeof(TYPE)
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - sizeof(TYPE) * (COUNT_CONST)
    + COUNT_CONST, sizeof(TYPE)
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - sizeof(TYPE) * COUNT_CONST
    + COUNT_CONST, sizeof(TYPE)
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - sizeof(THING) * (COUNT_ID)
    + COUNT_ID, sizeof(THING)
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - sizeof(THING) * COUNT_ID
    + COUNT_ID, sizeof(THING)
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - sizeof(THING) * (COUNT_CONST)
    + COUNT_CONST, sizeof(THING)
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - sizeof(THING) * COUNT_CONST
    + COUNT_CONST, sizeof(THING)
    , ...)
    )

    // 2-factor product, only identifiers.
    @@
    identifier SIZE, COUNT;
    @@

    - kmalloc
    + kmalloc_array
    (
    - SIZE * COUNT
    + COUNT, SIZE
    , ...)

    // 3-factor product with 1 sizeof(type) or sizeof(expression), with
    // redundant parens removed.
    @@
    expression THING;
    identifier STRIDE, COUNT;
    type TYPE;
    @@

    (
    kmalloc(
    - sizeof(TYPE) * (COUNT) * (STRIDE)
    + array3_size(COUNT, STRIDE, sizeof(TYPE))
    , ...)
    |
    kmalloc(
    - sizeof(TYPE) * (COUNT) * STRIDE
    + array3_size(COUNT, STRIDE, sizeof(TYPE))
    , ...)
    |
    kmalloc(
    - sizeof(TYPE) * COUNT * (STRIDE)
    + array3_size(COUNT, STRIDE, sizeof(TYPE))
    , ...)
    |
    kmalloc(
    - sizeof(TYPE) * COUNT * STRIDE
    + array3_size(COUNT, STRIDE, sizeof(TYPE))
    , ...)
    |
    kmalloc(
    - sizeof(THING) * (COUNT) * (STRIDE)
    + array3_size(COUNT, STRIDE, sizeof(THING))
    , ...)
    |
    kmalloc(
    - sizeof(THING) * (COUNT) * STRIDE
    + array3_size(COUNT, STRIDE, sizeof(THING))
    , ...)
    |
    kmalloc(
    - sizeof(THING) * COUNT * (STRIDE)
    + array3_size(COUNT, STRIDE, sizeof(THING))
    , ...)
    |
    kmalloc(
    - sizeof(THING) * COUNT * STRIDE
    + array3_size(COUNT, STRIDE, sizeof(THING))
    , ...)
    )

    // 3-factor product with 2 sizeof(variable), with redundant parens removed.
    @@
    expression THING1, THING2;
    identifier COUNT;
    type TYPE1, TYPE2;
    @@

    (
    kmalloc(
    - sizeof(TYPE1) * sizeof(TYPE2) * COUNT
    + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
    , ...)
    |
    kmalloc(
    - sizeof(TYPE1) * sizeof(THING2) * (COUNT)
    + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
    , ...)
    |
    kmalloc(
    - sizeof(THING1) * sizeof(THING2) * COUNT
    + array3_size(COUNT, sizeof(THING1), sizeof(THING2))
    , ...)
    |
    kmalloc(
    - sizeof(THING1) * sizeof(THING2) * (COUNT)
    + array3_size(COUNT, sizeof(THING1), sizeof(THING2))
    , ...)
    |
    kmalloc(
    - sizeof(TYPE1) * sizeof(THING2) * COUNT
    + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
    , ...)
    |
    kmalloc(
    - sizeof(TYPE1) * sizeof(THING2) * (COUNT)
    + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
    , ...)
    )

    // 3-factor product, only identifiers, with redundant parens removed.
    @@
    identifier STRIDE, SIZE, COUNT;
    @@

    (
    kmalloc(
    - (COUNT) * STRIDE * SIZE
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    kmalloc(
    - COUNT * (STRIDE) * SIZE
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    kmalloc(
    - COUNT * STRIDE * (SIZE)
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    kmalloc(
    - (COUNT) * (STRIDE) * SIZE
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    kmalloc(
    - COUNT * (STRIDE) * (SIZE)
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    kmalloc(
    - (COUNT) * STRIDE * (SIZE)
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    kmalloc(
    - (COUNT) * (STRIDE) * (SIZE)
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    kmalloc(
    - COUNT * STRIDE * SIZE
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    )

    // Any remaining multi-factor products, first at least 3-factor products,
    // when they're not all constants...
    @@
    expression E1, E2, E3;
    constant C1, C2, C3;
    @@

    (
    kmalloc(C1 * C2 * C3, ...)
    |
    kmalloc(
    - (E1) * E2 * E3
    + array3_size(E1, E2, E3)
    , ...)
    |
    kmalloc(
    - (E1) * (E2) * E3
    + array3_size(E1, E2, E3)
    , ...)
    |
    kmalloc(
    - (E1) * (E2) * (E3)
    + array3_size(E1, E2, E3)
    , ...)
    |
    kmalloc(
    - E1 * E2 * E3
    + array3_size(E1, E2, E3)
    , ...)
    )

    // And then all remaining 2 factors products when they're not all constants,
    // keeping sizeof() as the second factor argument.
    @@
    expression THING, E1, E2;
    type TYPE;
    constant C1, C2, C3;
    @@

    (
    kmalloc(sizeof(THING) * C2, ...)
    |
    kmalloc(sizeof(TYPE) * C2, ...)
    |
    kmalloc(C1 * C2 * C3, ...)
    |
    kmalloc(C1 * C2, ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - sizeof(TYPE) * (E2)
    + E2, sizeof(TYPE)
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - sizeof(TYPE) * E2
    + E2, sizeof(TYPE)
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - sizeof(THING) * (E2)
    + E2, sizeof(THING)
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - sizeof(THING) * E2
    + E2, sizeof(THING)
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - (E1) * E2
    + E1, E2
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - (E1) * (E2)
    + E1, E2
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - E1 * E2
    + E1, E2
    , ...)
    )

    Signed-off-by: Kees Cook

    Kees Cook
     

16 Apr, 2018

1 commit


03 Apr, 2018

4 commits

  • Using the ksys_p{read,write}64() wrappers allows us to get rid of
    in-kernel calls to the sys_pread64() and sys_pwrite64() syscalls.
    The ksys_ prefix denotes that this function is meant as a drop-in
    replacement for the syscall. In particular, it uses the same calling
    convention as sys_p{read,write}64().

    This patch is part of a series which removes in-kernel calls to syscalls.
    On this basis, the syscall entry path can be streamlined. For details, see
    http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net

    Cc: Al Viro
    Cc: Andrew Morton
    Signed-off-by: Dominik Brodowski

    Dominik Brodowski
     
  • Using this helper allows us to avoid the in-kernel calls to the
    sys_read() syscall. The ksys_ prefix denotes that this function
    is meant as a drop-in replacement for the syscall. In particular, it
    uses the same calling convention as sys_read().

    This patch is part of a series which removes in-kernel calls to syscalls.
    On this basis, the syscall entry path can be streamlined. For details, see
    http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net

    Cc: Alexander Viro
    Signed-off-by: Dominik Brodowski

    Dominik Brodowski
     
  • Using this helper allows us to avoid the in-kernel calls to the
    sys_lseek() syscall. The ksys_ prefix denotes that this function
    is meant as a drop-in replacement for the syscall. In particular, it
    uses the same calling convention as sys_lseek().

    This patch is part of a series which removes in-kernel calls to syscalls.
    On this basis, the syscall entry path can be streamlined. For details, see
    http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net

    Cc: Alexander Viro
    Signed-off-by: Dominik Brodowski

    Dominik Brodowski
     
  • Using this helper allows us to avoid the in-kernel calls to the sys_write()
    syscall. The ksys_ prefix denotes that this function is meant as a drop-in
    replacement for the syscall. In particular, it uses the same calling
    convention as sys_write().

    In the near future, the do_mounts / initramfs callers of ksys_write()
    should be converted to use filp_open() and vfs_write() instead.

    This patch is part of a series which removes in-kernel calls to syscalls.
    On this basis, the syscall entry path can be streamlined. For details, see
    http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net

    Cc: Alexander Viro
    Cc: linux-s390@vger.kernel.org
    Signed-off-by: Dominik Brodowski

    Dominik Brodowski
     

18 Nov, 2017

1 commit

  • Pull iov_iter updates from Al Viro:

    - bio_{map,copy}_user_iov() series; those are cleanups - fixes from the
    same pile went into mainline (and stable) in late September.

    - fs/iomap.c iov_iter-related fixes

    - new primitive - iov_iter_for_each_range(), which applies a function
    to kernel-mapped segments of an iov_iter.

    Usable for kvec and bvec ones, the latter does kmap()/kunmap() around
    the callback. _Not_ usable for iovec- or pipe-backed iov_iter; the
    latter is not hard to fix if the need ever appears, the former is by
    design.

    Another related primitive will have to wait for the next cycle - it
    passes page + offset + size instead of pointer + size, and that one
    will be usable for everything _except_ kvec. Unfortunately, that one
    didn't get exposure in -next yet, so...

    - a bit more lustre iov_iter work, including a use case for
    iov_iter_for_each_range() (checksum calculation)

    - vhost/scsi leak fix in failure exit

    - misc cleanups and detritectomy...

    * 'work.iov_iter' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (21 commits)
    iomap_dio_actor(): fix iov_iter bugs
    switch ksocknal_lib_recv_...() to use of iov_iter_for_each_range()
    lustre: switch struct ksock_conn to iov_iter
    vhost/scsi: switch to iov_iter_get_pages()
    fix a page leak in vhost_scsi_iov_to_sgl() error recovery
    new primitive: iov_iter_for_each_range()
    lnet_return_rx_credits_locked: don't abuse list_entry
    xen: don't open-code iov_iter_kvec()
    orangefs: remove detritus from struct orangefs_kiocb_s
    kill iov_shorten()
    bio_alloc_map_data(): do bmd->iter setup right there
    bio_copy_user_iov(): saner bio size calculation
    bio_map_user_iov(): get rid of copying iov_iter
    bio_copy_from_iter(): get rid of copying iov_iter
    move more stuff down into bio_copy_user_iov()
    blk_rq_map_user_iov(): move iov_iter_advance() down
    bio_map_user_iov(): get rid of the iov_for_each()
    bio_map_user_iov(): move alignment check into the main loop
    don't rely upon subsequent bio_add_pc_page() calls failing
    ... and with iov_iter_get_pages_alloc() it becomes even simpler
    ...

    Linus Torvalds
     

02 Nov, 2017

1 commit

  • Many source files in the tree are missing licensing information, which
    makes it harder for compliance tools to determine the correct license.

    By default all files without license information are under the default
    license of the kernel, which is GPL version 2.

    Update the files which contain no license information with the 'GPL-2.0'
    SPDX license identifier. The SPDX identifier is a legally binding
    shorthand, which can be used instead of the full boiler plate text.

    This patch is based on work done by Thomas Gleixner and Kate Stewart and
    Philippe Ombredanne.

    How this work was done:

    Patches were generated and checked against linux-4.14-rc6 for a subset of
    the use cases:
    - file had no licensing information in it,
    - file was a */uapi/* one with no licensing information in it,
    - file was a */uapi/* one with existing licensing information.

    Further patches will be generated in subsequent months to fix up cases
    where non-standard license headers were used, and references to license
    had to be inferred by heuristics based on keywords.

    The analysis to determine which SPDX License Identifier to be applied to
    a file was done in a spreadsheet of side by side results from the
    output of two independent scanners (ScanCode & Windriver) producing SPDX
    tag:value files created by Philippe Ombredanne. Philippe prepared the
    base worksheet, and did an initial spot review of a few 1000 files.

    The 4.13 kernel was the starting point of the analysis with 60,537 files
    assessed. Kate Stewart did a file by file comparison of the scanner
    results in the spreadsheet to determine which SPDX license identifier(s)
    to be applied to the file. She confirmed any determination that was not
    immediately clear with lawyers working with the Linux Foundation.

    Criteria used to select files for SPDX license identifier tagging were:
    - Files considered eligible had to be source code files.
    - Make and config files were included as candidates if they contained >5
    lines of source
    - File already had some variant of a license header in it (even if
    Reviewed-by: Philippe Ombredanne
    Reviewed-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     

12 Oct, 2017

1 commit