15 Sep, 2017

1 commit

  • Pull zstd support from Chris Mason:
    "Nick Terrell's patch series to add zstd support to the kernel has been
    floating around for a while. After talking with Dave Sterba, Herbert
    and Phillip, we decided to send the whole thing in as one pull
    request.

    zstd is a big win in speed over zlib and in compression ratio over
    lzo, and the compression team here at FB has gotten great results
    using it in production. Nick will continue to update the kernel side
    with new improvements from the open source zstd userland code.

    Nick has a number of benchmarks for the main zstd code in his lib/zstd
    commit:

    I ran the benchmarks on an Ubuntu 14.04 VM with 2 cores and 4 GiB
    of RAM. The VM is running on a MacBook Pro with a 3.1 GHz Intel
    Core i7 processor, 16 GB of RAM, and an SSD. I benchmarked using
    `silesia.tar` [3], which is 211,988,480 B. Run the following
    commands for the benchmark:

    sudo modprobe zstd_compress_test
    sudo mknod zstd_compress_test c 245 0
    time sudo cp silesia.tar zstd_compress_test

    The time is the wall-clock time of the userland `cp`, as reported by `time`.
    The MB/s is computed with

    211,988,480 B / time(method)

    which includes the time to copy from userland.
    The Adjusted MB/s is computed with

    211,988,480 B / (time(method) - time(none)).

    The memory reported is the amount of memory the compressor
    requests.

    | Method | Size (B) | Time (s) | Ratio | MB/s | Adj MB/s | Mem (MB) |
    |----------|----------|----------|-------|---------|----------|----------|
    | none | 211988480 | 0.100 | 1 | 2119.88 | - | - |
    | zstd -1 | 73645762 | 1.044 | 2.878 | 203.05 | 224.56 | 1.23 |
    | zstd -3 | 66988878 | 1.761 | 3.165 | 120.38 | 127.63 | 2.47 |
    | zstd -5 | 65001259 | 2.563 | 3.261 | 82.71 | 86.07 | 2.86 |
    | zstd -10 | 60165346 | 13.242 | 3.523 | 16.01 | 16.13 | 13.22 |
    | zstd -15 | 58009756 | 47.601 | 3.654 | 4.45 | 4.46 | 21.61 |
    | zstd -19 | 54014593 | 102.835 | 3.925 | 2.06 | 2.06 | 60.15 |
    | zlib -1 | 77260026 | 2.895 | 2.744 | 73.23 | 75.85 | 0.27 |
    | zlib -3 | 72972206 | 4.116 | 2.905 | 51.50 | 52.79 | 0.27 |
    | zlib -6 | 68190360 | 9.633 | 3.109 | 22.01 | 22.24 | 0.27 |
    | zlib -9 | 67613382 | 22.554 | 3.135 | 9.40 | 9.44 | 0.27 |

    I benchmarked zstd decompression using the same method on the same
    machine. The benchmark file is located in the upstream zstd repo
    under `contrib/linux-kernel/zstd_decompress_test.c` [4]. The
    memory reported is the amount of memory required to decompress
    data compressed with the given compression level. If you know the
    maximum size of your input, you can reduce the memory usage of
    decompression irrespective of the compression level.

    | Method | Time (s) | MB/s | Adjusted MB/s | Memory (MB) |
    |----------|----------|---------|---------------|-------------|
    | none | 0.025 | 8479.54 | - | - |
    | zstd -1 | 0.358 | 592.15 | 636.60 | 0.84 |
    | zstd -3 | 0.396 | 535.32 | 571.40 | 1.46 |
    | zstd -5 | 0.396 | 535.32 | 571.40 | 1.46 |
    | zstd -10 | 0.374 | 566.81 | 607.42 | 2.51 |
    | zstd -15 | 0.379 | 559.34 | 598.84 | 4.61 |
    | zstd -19 | 0.412 | 514.54 | 547.77 | 8.80 |
    | zlib -1 | 0.940 | 225.52 | 231.68 | 0.04 |
    | zlib -3 | 0.883 | 240.08 | 247.07 | 0.04 |
    | zlib -6 | 0.844 | 251.17 | 258.84 | 0.04 |
    | zlib -9 | 0.837 | 253.27 | 261.07 | 0.04 |

    I ran a long series of tests and benchmarks on the btrfs side and the
    gains are very similar to the core benchmarks Nick ran"

    * 'zstd-minimal' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
    squashfs: Add zstd support
    btrfs: Add zstd support
    lib: Add zstd modules
    lib: Add xxhash module

    Linus Torvalds
     

12 Sep, 2017

1 commit

  • Pull libnvdimm from Dan Williams:
    "A rework of media error handling in the BTT driver and other updates.
    It has appeared in a few -next releases and collected some late-
    breaking build-error and warning fixups as a result.

    Summary:

    - Media error handling support in the Block Translation Table (BTT)
    driver is reworked to address sleeping-while-atomic locking and
    memory-allocation-context conflicts.

    - The dax_device lookup overhead for xfs and ext4 is moved out of the
    iomap hot-path to a mount-time lookup.

    - A new 'ecc_unit_size' sysfs attribute is added to advertise the
    read-modify-write boundary property of a persistent memory range.

    - Preparatory fix-ups for arm and powerpc pmem support are included
    along with other miscellaneous fixes"

    * tag 'libnvdimm-for-4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (26 commits)
    libnvdimm, btt: fix format string warnings
    libnvdimm, btt: clean up warning and error messages
    ext4: fix null pointer dereference on sbi
    libnvdimm, nfit: move the check on nd_reserved2 to the endpoint
    dax: fix FS_DAX=n BLOCK=y compilation
    libnvdimm: fix integer overflow static analysis warning
    libnvdimm, nd_blk: remove mmio_flush_range()
    libnvdimm, btt: rework error clearing
    libnvdimm: fix potential deadlock while clearing errors
    libnvdimm, btt: cache sector_size in arena_info
    libnvdimm, btt: ensure that flags were also unchanged during a map_read
    libnvdimm, btt: refactor map entry operations with macros
    libnvdimm, btt: fix a missed NVDIMM_IO_ATOMIC case in the write path
    libnvdimm, nfit: export an 'ecc_unit_size' sysfs attribute
    ext4: perform dax_device lookup at mount
    ext2: perform dax_device lookup at mount
    xfs: perform dax_device lookup at mount
    dax: introduce a fs_dax_get_by_bdev() helper
    libnvdimm, btt: check memory allocation failure
    libnvdimm, label: fix index block size calculation
    ...

    Linus Torvalds
     

09 Sep, 2017

1 commit

  • [akpm@linux-foundation.org: minor tweaks]
    Link: http://lkml.kernel.org/r/20170720184539.31609-3-willy@infradead.org
    Signed-off-by: Matthew Wilcox
    Cc: "H. Peter Anvin"
    Cc: "James E.J. Bottomley"
    Cc: "Martin K. Petersen"
    Cc: David Miller
    Cc: Ingo Molnar
    Cc: Ivan Kokshaysky
    Cc: Matt Turner
    Cc: Michael Ellerman
    Cc: Minchan Kim
    Cc: Ralf Baechle
    Cc: Richard Henderson
    Cc: Russell King
    Cc: Sam Ravnborg
    Cc: Sergey Senozhatsky
    Cc: Thomas Gleixner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     

01 Sep, 2017

1 commit

  • mmio_flush_range() suffers from a lack of clearly-defined semantics,
    and is somewhat ambiguous to port to other architectures where the
    scope of the writeback implied by "flush" and ordering might matter,
    but MMIO would tend to imply non-cacheable anyway. Per the rationale
    in 67a3e8fe9015 ("nd_blk: change aperture mapping from WC to WB"), the
    only existing use is actually to invalidate clean cache lines for
    ARCH_MEMREMAP_PMEM type mappings *without* writeback. Since the recent
    cleanup of the pmem API, that also now happens to be the exact purpose
    of arch_invalidate_pmem(), which would be a far better-defined tool
    for the job.

    Rather than risk potentially inconsistent implementations of
    mmio_flush_range() for the sake of one callsite, streamline things by
    removing it entirely and instead move the ARCH_MEMREMAP_PMEM related
    definitions up to the libnvdimm level, so they can be shared by NFIT
    as well. This allows NFIT to be enabled for arm64.

    Signed-off-by: Robin Murphy
    Signed-off-by: Dan Williams

    Robin Murphy
     

16 Aug, 2017

2 commits

  • Add zstd compression and decompression kernel modules.
    zstd offers a wide variety of compression speed and quality trade-offs.
    It can compress at speeds approaching lz4, and quality approaching lzma.
    zstd decompresses at speeds more than twice as fast as zlib, and
    decompression speed remains roughly the same across all compression levels.

    The code was ported from the upstream zstd source repository. The
    `linux/zstd.h` header was modified to match linux kernel style.
    The cross-platform and allocation code was stripped out. Instead, zstd
    requires the caller to pass a preallocated workspace. The source files
    were clang-formatted [1] to match the Linux Kernel style as much as
    possible. Otherwise, the code was unmodified. We would like to avoid
    further manual modification of the source code as much as possible, so
    that it will be easier to keep the kernel zstd up to date.
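    The "caller passes a preallocated workspace" design can be illustrated
    with a minimal userland sketch. Everything in it is hypothetical
    (`struct ws_ctx`, `ws_ctx_workspace_bound()`, `ws_ctx_init()`); it is not
    the kernel zstd API, only the pattern: ask how many bytes a context needs,
    let the caller obtain that memory however it likes, then build the context
    inside it without the library ever allocating.

    ```c
    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>
    #include <stdlib.h> /* malloc()/free() in the caller, not in the "library" */

    /* Hypothetical context that lives entirely inside caller-provided memory. */
    struct ws_ctx {
        size_t table_len;
        uint32_t *table; /* carved from the same workspace, right after the struct */
    };

    /* Hypothetical bound: bytes a caller must supply for a given table size. */
    static size_t ws_ctx_workspace_bound(size_t table_len)
    {
        return sizeof(struct ws_ctx) + table_len * sizeof(uint32_t);
    }

    /*
     * Build a context inside [workspace, workspace + size). Never allocates.
     * Assumes the workspace is suitably aligned (malloc'd memory is).
     * Returns NULL if the workspace is missing or too small.
     */
    static struct ws_ctx *ws_ctx_init(void *workspace, size_t size, size_t table_len)
    {
        struct ws_ctx *ctx;
        size_t i;

        if (workspace == NULL || size < ws_ctx_workspace_bound(table_len))
            return NULL;
        ctx = workspace;
        ctx->table_len = table_len;
        ctx->table = (uint32_t *)(ctx + 1);
        for (i = 0; i < table_len; i++)
            ctx->table[i] = 0;
        return ctx;
    }
    ```

    The caller decides where the memory comes from and can reuse one
    workspace across many operations; the library code itself contains no
    allocation calls, which is what makes the approach fit kernel contexts.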

    I benchmarked zstd compression as a special character device. I ran zstd
    and zlib compression at several levels, as well as performing no
    compression, which measures the time spent copying the data to kernel space.
    Data is passed to the compressor 4096 B at a time. The benchmark file is
    located in the upstream zstd source repository under
    `contrib/linux-kernel/zstd_compress_test.c` [2].

    I ran the benchmarks on an Ubuntu 14.04 VM with 2 cores and 4 GiB of RAM.
    The VM is running on a MacBook Pro with a 3.1 GHz Intel Core i7 processor,
    16 GB of RAM, and an SSD. I benchmarked using `silesia.tar` [3], which is
    211,988,480 B. Run the following commands for the benchmark:

    sudo modprobe zstd_compress_test
    sudo mknod zstd_compress_test c 245 0
    time sudo cp silesia.tar zstd_compress_test

    The time is the wall-clock time of the userland `cp`, as reported by `time`.
    The MB/s is computed with

    211,988,480 B / time(method)

    which includes the time to copy from userland.
    The Adjusted MB/s is computed with

    211,988,480 B / (time(method) - time(none)).

    The memory reported is the amount of memory the compressor requests.

    | Method | Size (B) | Time (s) | Ratio | MB/s | Adj MB/s | Mem (MB) |
    |----------|----------|----------|-------|---------|----------|----------|
    | none | 211988480 | 0.100 | 1 | 2119.88 | - | - |
    | zstd -1 | 73645762 | 1.044 | 2.878 | 203.05 | 224.56 | 1.23 |
    | zstd -3 | 66988878 | 1.761 | 3.165 | 120.38 | 127.63 | 2.47 |
    | zstd -5 | 65001259 | 2.563 | 3.261 | 82.71 | 86.07 | 2.86 |
    | zstd -10 | 60165346 | 13.242 | 3.523 | 16.01 | 16.13 | 13.22 |
    | zstd -15 | 58009756 | 47.601 | 3.654 | 4.45 | 4.46 | 21.61 |
    | zstd -19 | 54014593 | 102.835 | 3.925 | 2.06 | 2.06 | 60.15 |
    | zlib -1 | 77260026 | 2.895 | 2.744 | 73.23 | 75.85 | 0.27 |
    | zlib -3 | 72972206 | 4.116 | 2.905 | 51.50 | 52.79 | 0.27 |
    | zlib -6 | 68190360 | 9.633 | 3.109 | 22.01 | 22.24 | 0.27 |
    | zlib -9 | 67613382 | 22.554 | 3.135 | 9.40 | 9.44 | 0.27 |
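    The MB/s, Adjusted MB/s, and Ratio columns follow from dividing the size
    of `silesia.tar` (211,988,480 B) by the measured times or the compressed
    size. A small C sketch for re-deriving them; the helper names are
    hypothetical, and treating 1 MB as 10^6 B is an assumption that happens
    to reproduce the table.

    ```c
    #include <assert.h>

    /* Size of silesia.tar in bytes, as stated in the text. */
    #define SILESIA_BYTES 211988480.0

    /* Assumption inferred from the table values: 1 MB = 10^6 B. */
    #define BYTES_PER_MB 1e6

    /* Throughput including the copy from userland: bytes / time(method). */
    static double mb_per_s(double bytes, double seconds)
    {
        assert(seconds > 0.0);
        return bytes / seconds / BYTES_PER_MB;
    }

    /* Adjusted throughput: subtract the copy-only ("none") time first. */
    static double adjusted_mb_per_s(double bytes, double seconds, double seconds_none)
    {
        assert(seconds > seconds_none);
        return bytes / (seconds - seconds_none) / BYTES_PER_MB;
    }

    /* Compression ratio: original size / compressed size. */
    static double ratio(double original_bytes, double compressed_bytes)
    {
        assert(compressed_bytes > 0.0);
        return original_bytes / compressed_bytes;
    }
    ```

    For the `zstd -1` row, `mb_per_s(SILESIA_BYTES, 1.044)` gives about
    203.05, `adjusted_mb_per_s(SILESIA_BYTES, 1.044, 0.100)` about 224.56, and
    `ratio(SILESIA_BYTES, 73645762)` about 2.878, matching the table.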

    I benchmarked zstd decompression using the same method on the same machine.
    The benchmark file is located in the upstream zstd repo under
    `contrib/linux-kernel/zstd_decompress_test.c` [4]. The memory reported is
    the amount of memory required to decompress data compressed with the given
    compression level. If you know the maximum size of your input, you can
    reduce the memory usage of decompression irrespective of the compression
    level.

    | Method | Time (s) | MB/s | Adjusted MB/s | Memory (MB) |
    |----------|----------|---------|---------------|-------------|
    | none | 0.025 | 8479.54 | - | - |
    | zstd -1 | 0.358 | 592.15 | 636.60 | 0.84 |
    | zstd -3 | 0.396 | 535.32 | 571.40 | 1.46 |
    | zstd -5 | 0.396 | 535.32 | 571.40 | 1.46 |
    | zstd -10 | 0.374 | 566.81 | 607.42 | 2.51 |
    | zstd -15 | 0.379 | 559.34 | 598.84 | 4.61 |
    | zstd -19 | 0.412 | 514.54 | 547.77 | 8.80 |
    | zlib -1 | 0.940 | 225.52 | 231.68 | 0.04 |
    | zlib -3 | 0.883 | 240.08 | 247.07 | 0.04 |
    | zlib -6 | 0.844 | 251.17 | 258.84 | 0.04 |
    | zlib -9 | 0.837 | 253.27 | 261.07 | 0.04 |

    Tested in userland using the test-suite in the zstd repo under
    `contrib/linux-kernel/test/UserlandTest.cpp` [5] by mocking the kernel
    functions. Fuzz tested using libfuzzer [6] with the fuzz harnesses under
    `contrib/linux-kernel/test/{RoundTripCrash.c,DecompressCrash.c}` [7] [8]
    with ASAN, UBSAN, and MSAN. Additionally, it was tested while testing the
    BtrFS and SquashFS patches that follow in this series.

    [1] https://clang.llvm.org/docs/ClangFormat.html
    [2] https://github.com/facebook/zstd/blob/dev/contrib/linux-kernel/zstd_compress_test.c
    [3] http://sun.aei.polsl.pl/~sdeor/index.php?page=silesia
    [4] https://github.com/facebook/zstd/blob/dev/contrib/linux-kernel/zstd_decompress_test.c
    [5] https://github.com/facebook/zstd/blob/dev/contrib/linux-kernel/test/UserlandTest.cpp
    [6] http://llvm.org/docs/LibFuzzer.html
    [7] https://github.com/facebook/zstd/blob/dev/contrib/linux-kernel/test/RoundTripCrash.c
    [8] https://github.com/facebook/zstd/blob/dev/contrib/linux-kernel/test/DecompressCrash.c

    zstd source repository: https://github.com/facebook/zstd

    Signed-off-by: Nick Terrell
    Signed-off-by: Chris Mason

    Nick Terrell
     
  • Adds xxhash kernel module with xxh32 and xxh64 hashes. xxhash is an
    extremely fast non-cryptographic hash algorithm for checksumming.
    The zstd compression and decompression modules added in the next patch
    require xxhash. I extracted it from zstd since it is useful on its
    own. I copied the code from the upstream XXHash source repository and
    translated it into kernel style. I ran benchmarks and tests in the kernel
    and tests in userland.
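    For readers unfamiliar with xxh32, here is a compact userland sketch of
    the published XXH32 algorithm: four 32-bit accumulators over 16-byte
    stripes, a tail loop, and a final avalanche. The kernel module exposes a
    function of the shape `xxh32(input, length, seed)`; the sketch is named
    `xxh32_sketch` because it is an independent re-implementation for
    illustration, not the kernel's code.

    ```c
    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    #define PRIME32_1 2654435761U
    #define PRIME32_2 2246822519U
    #define PRIME32_3 3266489917U
    #define PRIME32_4 668265263U
    #define PRIME32_5 374761393U

    static uint32_t rotl32(uint32_t x, int r)
    {
        return (x << r) | (x >> (32 - r));
    }

    /* Read 4 bytes as little-endian regardless of host byte order. */
    static uint32_t read32le(const uint8_t *p)
    {
        return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
               (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
    }

    /* Mix one 32-bit lane of input into an accumulator. */
    static uint32_t xxh32_round(uint32_t acc, uint32_t input)
    {
        acc += input * PRIME32_2;
        acc = rotl32(acc, 13);
        return acc * PRIME32_1;
    }

    static uint32_t xxh32_sketch(const void *input, size_t len, uint32_t seed)
    {
        assert(input != NULL || len == 0);
        const uint8_t *p = input;
        const uint8_t *end = p + len;
        uint32_t h32;

        if (len >= 16) {
            /* Four independent accumulators, one per 4-byte lane of a stripe. */
            const uint8_t *limit = end - 16;
            uint32_t v1 = seed + PRIME32_1 + PRIME32_2;
            uint32_t v2 = seed + PRIME32_2;
            uint32_t v3 = seed;
            uint32_t v4 = seed - PRIME32_1;

            do {
                v1 = xxh32_round(v1, read32le(p));
                v2 = xxh32_round(v2, read32le(p + 4));
                v3 = xxh32_round(v3, read32le(p + 8));
                v4 = xxh32_round(v4, read32le(p + 12));
                p += 16;
            } while (p <= limit);
            h32 = rotl32(v1, 1) + rotl32(v2, 7) + rotl32(v3, 12) + rotl32(v4, 18);
        } else {
            h32 = seed + PRIME32_5;
        }
        h32 += (uint32_t)len;

        /* Remaining whole 4-byte words, then single bytes. */
        while ((size_t)(end - p) >= 4) {
            h32 += read32le(p) * PRIME32_3;
            h32 = rotl32(h32, 17) * PRIME32_4;
            p += 4;
        }
        while (p < end) {
            h32 += (*p) * PRIME32_5;
            h32 = rotl32(h32, 11) * PRIME32_1;
            p++;
        }

        /* Final avalanche so every input bit affects every output bit. */
        h32 ^= h32 >> 15;
        h32 *= PRIME32_2;
        h32 ^= h32 >> 13;
        h32 *= PRIME32_3;
        h32 ^= h32 >> 16;
        return h32;
    }
    ```

    As a sanity check, `xxh32_sketch("", 0, 0)` should return 0x02CC5D05,
    the standard XXH32 value for an empty input with seed 0.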

    I benchmarked xxhash as a special character device. I ran it in four modes:
    no-op, xxh32, xxh64, and crc32. The no-op mode simply copies the data to
    kernel space and ignores it. The xxh32, xxh64, and crc32 modes compute
    hashes on the copied data. I also ran it with four different buffer sizes.
    The benchmark file is located in the upstream zstd source repository under
    `contrib/linux-kernel/xxhash_test.c` [1].

    I ran the benchmarks on an Ubuntu 14.04 VM with 2 cores and 4 GiB of RAM.
    The VM is running on a MacBook Pro with a 3.1 GHz Intel Core i7 processor,
    16 GB of RAM, and an SSD. I benchmarked using the file `filesystem.squashfs`
    from `ubuntu-16.10-desktop-amd64.iso`, which is 1,536,217,088 B.
    Run the following commands for the benchmark:

    modprobe xxhash_test
    mknod xxhash_test c 245 0
    time cp filesystem.squashfs xxhash_test

    The time is the wall-clock time of the userland `cp`, as reported by `time`.
    The GB/s is computed with

    1,536,217,088 B / time(buffer size, hash)

    which includes the time to copy from userland.
    The Adjusted GB/s is computed with

    1,536,217,088 B / (time(buffer size, hash) - time(buffer size, none)).

    | Buffer Size (B) | Hash | Time (s) | GB/s | Adjusted GB/s |
    |-----------------|-------|----------|------|---------------|
    | 1024 | none | 0.408 | 3.77 | - |
    | 1024 | xxh32 | 0.649 | 2.37 | 6.37 |
    | 1024 | xxh64 | 0.542 | 2.83 | 11.46 |
    | 1024 | crc32 | 1.290 | 1.19 | 1.74 |
    | 4096 | none | 0.380 | 4.04 | - |
    | 4096 | xxh32 | 0.645 | 2.38 | 5.79 |
    | 4096 | xxh64 | 0.500 | 3.07 | 12.80 |
    | 4096 | crc32 | 1.168 | 1.32 | 1.95 |
    | 8192 | none | 0.351 | 4.38 | - |
    | 8192 | xxh32 | 0.614 | 2.50 | 5.84 |
    | 8192 | xxh64 | 0.464 | 3.31 | 13.60 |
    | 8192 | crc32 | 1.163 | 1.32 | 1.89 |
    | 16384 | none | 0.346 | 4.43 | - |
    | 16384 | xxh32 | 0.590 | 2.60 | 6.30 |
    | 16384 | xxh64 | 0.466 | 3.30 | 12.80 |
    | 16384 | crc32 | 1.183 | 1.30 | 1.84 |

    Tested in userland using the test-suite in the zstd repo under
    `contrib/linux-kernel/test/XXHashUserlandTest.cpp` [2] by mocking the
    kernel functions. To check the test-suite's coverage, a line in each branch
    of every function in `xxhash.c` was commented out in turn, confirming that
    the test-suite then fails. Additionally tested while testing zstd and with
    SMHasher [3].

    [1] https://github.com/facebook/zstd/blob/dev/contrib/linux-kernel/xxhash_test.c
    [2] https://github.com/facebook/zstd/blob/dev/contrib/linux-kernel/test/XXHashUserlandTest.cpp
    [3] https://github.com/aappleby/smhasher

    zstd source repository: https://github.com/facebook/zstd
    XXHash source repository: https://github.com/cyan4973/xxhash

    Signed-off-by: Nick Terrell
    Signed-off-by: Chris Mason

    Nick Terrell
     

08 Jul, 2017

1 commit

  • Pull libnvdimm updates from Dan Williams:
    "libnvdimm updates for the latest ACPI and UEFI specifications. This
    pull request also includes new 'struct dax_operations' that make it
    possible to undo the abuse of copy_user_nocache() for copy operations to pmem.

    The dax work originally missed 4.12 to address concerns raised by Al.

    Summary:

    - Introduce the _flushcache() family of memory copy helpers and use
    them for persistent memory write operations on x86. The
    _flushcache() semantic indicates that the cache is either bypassed
    for the copy operation (movnt) or any lines dirtied by the copy
    operation are written back (clwb, clflushopt, or clflush).

    - Extend dax_operations with ->copy_from_iter() and ->flush()
    operations. These operations and other infrastructure updates allow
    all persistent memory specific dax functionality to be pushed into
    libnvdimm and the pmem driver directly. It also allows dax-specific
    sysfs attributes to be linked to a host device, for example:
    /sys/block/pmem0/dax/write_cache

    - Add support for the new NVDIMM platform/firmware mechanisms
    introduced in ACPI 6.2 and UEFI 2.7. This support includes the v1.2
    namespace label format, extensions to the address-range-scrub
    command set, new error injection commands, and a new BTT
    (block-translation-table) layout. These updates support inter-OS
    and pre-OS compatibility.

    - Fix a longstanding memory corruption bug in nfit_test.

    - Make the pmem and nvdimm-region 'badblocks' sysfs files poll(2)
    capable.

    - Miscellaneous fixes and small updates across libnvdimm and the nfit
    driver.

    Acknowledgements that came after the branch was pushed: commit
    6aa734a2f38e ("libnvdimm, region, pmem: fix 'badblocks'
    sysfs_get_dirent() reference lifetime") was reviewed by Toshi Kani
    "

    * tag 'libnvdimm-for-4.13' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (42 commits)
    libnvdimm, namespace: record 'lbasize' for pmem namespaces
    acpi/nfit: Issue Start ARS to retrieve existing records
    libnvdimm: New ACPI 6.2 DSM functions
    acpi, nfit: Show bus_dsm_mask in sysfs
    libnvdimm, acpi, nfit: Add bus level dsm mask for pass thru.
    acpi, nfit: Enable DSM pass thru for root functions.
    libnvdimm: passthru functions clear to send
    libnvdimm, btt: convert some info messages to warn/err
    libnvdimm, region, pmem: fix 'badblocks' sysfs_get_dirent() reference lifetime
    libnvdimm: fix the clear-error check in nsio_rw_bytes
    libnvdimm, btt: fix btt_rw_page not returning errors
    acpi, nfit: quiet invalid block-aperture-region warnings
    libnvdimm, btt: BTT updates for UEFI 2.7 format
    acpi, nfit: constify *_attribute_group
    libnvdimm, pmem: disable dax flushing when pmem is fronting a volatile region
    libnvdimm, pmem, dax: export a cache control attribute
    dax: convert to bitmask for flags
    dax: remove default copy_from_iter fallback
    libnvdimm, nfit: enable support for volatile ranges
    libnvdimm, pmem: fix persistence warning
    ...

    Linus Torvalds
     

10 Jun, 2017

1 commit

  • The pmem driver needs to transfer data to a persistent memory destination
    and be able to rely on the fact that the destination writes are not
    cached. It is sufficient for the writes to be flushed to a cpu-store-buffer
    (non-temporal / "movnt" in x86 terms), as we expect userspace to call fsync()
    to ensure data-writes have reached a power-fail-safe zone in the platform. The
    fsync() triggers a REQ_FUA or REQ_FLUSH to the pmem driver which will turn
    around and fence previous writes with an "sfence".

    Implement a __copy_from_user_inatomic_flushcache, memcpy_page_flushcache, and
    memcpy_flushcache, that guarantee that the destination buffer is not dirty in
    the cpu cache on completion. The new copy_from_iter_flushcache and sub-routines
    will be used to replace the "pmem api" (include/linux/pmem.h +
    arch/x86/include/asm/pmem.h). The availability of copy_from_iter_flushcache()
    and memcpy_flushcache() is gated by the CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE
    config symbol; without it they fall back to copy_from_iter_nocache() and
    plain memcpy().
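    A minimal userland sketch of the idea, assuming an x86 CPU with SSE2:
    the aligned body is copied with non-temporal `movnti` stores through the
    `_mm_stream_si32()` intrinsic, the few unaligned head and tail bytes use
    ordinary stores followed by `_mm_clflush()`, and `_mm_sfence()` orders the
    non-temporal stores before returning. The name `copy_flushcache_sketch()`
    is hypothetical; this is not the kernel implementation, which also has to
    handle user-space sources. Without SSE2 it degrades to plain `memcpy()`.

    ```c
    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>
    #if defined(__SSE2__)
    #include <emmintrin.h> /* _mm_stream_si32, _mm_clflush, _mm_sfence */
    #endif

    /*
     * Hypothetical userland analogue of memcpy_flushcache(): when it returns,
     * no line of the destination is left dirty in the CPU cache.
     */
    static void copy_flushcache_sketch(void *dst, const void *src, size_t n)
    {
        assert(dst != NULL || n == 0);
    #if defined(__SSE2__)
        unsigned char *d = dst;
        const unsigned char *s = src;
        unsigned char *head = d, *tail;
        int head_dirty = 0, tail_dirty;

        /* Unaligned head (at most 3 bytes, one cache line): ordinary stores. */
        while (n && ((uintptr_t)d & 3)) {
            *d++ = *s++;
            n--;
            head_dirty = 1;
        }
        /* Aligned body: 4-byte non-temporal stores (movnti) bypass the cache. */
        while (n >= 4) {
            int v;
            memcpy(&v, s, sizeof(v)); /* the source may be unaligned */
            _mm_stream_si32((int *)(void *)d, v);
            d += 4;
            s += 4;
            n -= 4;
        }
        /* Tail (at most 3 bytes, one cache line): ordinary stores. */
        tail = d;
        tail_dirty = n != 0;
        while (n) {
            *d++ = *s++;
            n--;
        }
        /* Write back and invalidate the lines touched by ordinary stores. */
        if (head_dirty)
            _mm_clflush(head);
        if (tail_dirty)
            _mm_clflush(tail);
        /* Make the weakly ordered non-temporal stores globally visible. */
        _mm_sfence();
    #else
        /* Fallback when non-temporal stores are unavailable: plain memcpy(). */
        memcpy(dst, src, n);
    #endif
    }
    ```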

    This is meant to satisfy the concern from Linus that if a driver wants to do
    something beyond the normal nocache semantics it should be something private to
    that driver [1], and Al's concern that anything uaccess related belongs with
    the rest of the uaccess code [2].

    The first consumer of this interface is a new 'copy_from_iter' dax operation so
    that pmem can inject cache maintenance operations without imposing this
    overhead on other dax-capable drivers.

    [1]: https://lists.01.org/pipermail/linux-nvdimm/2017-January/008364.html
    [2]: https://lists.01.org/pipermail/linux-nvdimm/2017-April/009942.html

    Cc:
    Cc: Jan Kara
    Cc: Jeff Moyer
    Cc: Ingo Molnar
    Cc: Christoph Hellwig
    Cc: Toshi Kani
    Cc: "H. Peter Anvin"
    Cc: Al Viro
    Cc: Thomas Gleixner
    Cc: Matthew Wilcox
    Reviewed-by: Ross Zwisler
    Signed-off-by: Dan Williams

    Dan Williams
     

09 Jun, 2017

1 commit

  • Add a little helper for crc4 calculations. This works 4 bits at a time,
    using a simple table approach.

    We will need this in the FSI core code, as well as any master
    implementations that need to calculate CRCs in software.
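    A minimal C sketch of the 4-bits-at-a-time table approach. The polynomial
    x^4 + x^2 + x + 1 (0x7, with the x^4 term implicit) and all function
    names are assumptions for illustration, not necessarily what the kernel
    helper uses. A bit-at-a-time version over the same bits is included so
    the table can be cross-checked.

    ```c
    #include <assert.h>
    #include <stdint.h>

    /* Assumed polynomial x^4 + x^2 + x + 1; the x^4 term is implicit. */
    #define CRC4_POLY 0x7u

    static uint8_t crc4_tab[16];

    /* Table entry v = CRC state after feeding the 4-bit value v, MSB first. */
    static void crc4_init_table(void)
    {
        for (unsigned v = 0; v < 16; v++) {
            unsigned c = v;
            for (int i = 0; i < 4; i++)
                c = (c & 0x8) ? ((c << 1) ^ CRC4_POLY) & 0xf : (c << 1) & 0xf;
            crc4_tab[v] = (uint8_t)c;
        }
    }

    /* Table-driven: CRC over the low `bits` bits of x, one nibble per step. */
    static uint8_t crc4_sketch(uint8_t c, uint64_t x, int bits)
    {
        assert(bits > 0 && bits <= 64);
        if (bits < 64)
            x &= ((uint64_t)1 << bits) - 1;
        bits = (bits + 3) & ~3; /* zero-extend to whole nibbles */
        c &= 0xf;
        for (int i = bits - 4; i >= 0; i -= 4)
            c = crc4_tab[(c ^ (unsigned)(x >> i)) & 0xf];
        return c;
    }

    /* Bit-at-a-time reference over the same bits, to cross-check the table. */
    static uint8_t crc4_bitwise(uint8_t c, uint64_t x, int bits)
    {
        assert(bits > 0 && bits <= 64);
        if (bits < 64)
            x &= ((uint64_t)1 << bits) - 1;
        bits = (bits + 3) & ~3;
        c &= 0xf;
        for (int i = bits - 1; i >= 0; i--) {
            unsigned feedback = ((c >> 3) ^ (unsigned)(x >> i)) & 1;
            c = (uint8_t)((c << 1) & 0xf);
            if (feedback)
                c ^= CRC4_POLY;
        }
        return c;
    }
    ```

    Call `crc4_init_table()` once before use; the table lookup then replaces
    four shift-and-xor steps per nibble.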

    Signed-off-by: Jeremy Kerr
    Signed-off-by: Chris Bostic
    Signed-off-by: Joel Stanley
    Signed-off-by: Greg Kroah-Hartman

    Jeremy Kerr
     

01 Mar, 2017

1 commit

  • Pull networking fixes from David Miller:

    1) Don't save TIPC header values before the header has been validated,
    from Jon Paul Maloy.

    2) Fix memory leak in RDS, from Zhu Yanjun.

    3) We fail to initialize the UID in the flow key in some paths, from
    Julian Anastasov.

    4) Fix latent TOS masking bug in the routing cache removal from years
    ago, also from Julian.

    5) We forget to set the sockaddr port in sctp_copy_local_addr_list(),
    fix from Xin Long.

    6) Missing module ref count drop in packet scheduler actions, from
    Roman Mashak.

    7) Fix RCU annotations in rht_bucket_nested, from Herbert Xu.

    8) Fix use after free which happens because L2TP's ipv4 support returns
    non-zero values from its backlog_rcv function, which ipv4 interprets
    as protocol values. Fix from Paul Hüber.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (35 commits)
    qed: Don't use attention PTT for configuring BW
    qed: Fix race with multiple VFs
    l2tp: avoid use-after-free caused by l2tp_ip_backlog_recv
    xfrm: provide correct dst in xfrm_neigh_lookup
    rhashtable: Fix RCU dereference annotation in rht_bucket_nested
    rhashtable: Fix use before NULL check in bucket_table_free
    net sched actions: do not overwrite status of action creation.
    rxrpc: Kernel calls get stuck in recvmsg
    net sched actions: decrement module reference count after table flush.
    lib: Allow compile-testing of parman
    ipv6: check sk sk_type and protocol early in ip_mroute_set/getsockopt
    sctp: set sin_port for addr param when checking duplicate address
    net/mlx4_en: fix overflow in mlx4_en_init_timestamp()
    netfilter: nft_set_bitmap: incorrect bitmap size
    net: s2io: fix typo argumnet argument
    net: vxge: fix typo argumnet argument
    netfilter: nf_ct_expect: Change __nf_ct_expect_check() return value.
    ipv4: mask tos for input route
    ipv4: add missing initialization for flowi4_uid
    lib: fix spelling mistake: "actualy" -> "actually"
    ...

    Linus Torvalds
     

27 Feb, 2017

1 commit


26 Feb, 2017

1 commit

  • Pull rdma DMA mapping updates from Doug Ledford:
    "Drop IB DMA mapping code and use core DMA code instead.

    Bart Van Assche noted that the ib DMA mapping code was similar enough
    to the core DMA mapping code that with a few changes it
    was possible to remove the IB DMA mapping code entirely and switch the
    RDMA stack to use the core DMA mapping code.

    This resulted in a nice set of cleanups, but touched the entire tree
    and has been kept separate for that reason."

    * tag 'for-next-dma_ops' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (37 commits)
    IB/rxe, IB/rdmavt: Use dma_virt_ops instead of duplicating it
    IB/core: Remove ib_device.dma_device
    nvme-rdma: Switch from dma_device to dev.parent
    RDS: net: Switch from dma_device to dev.parent
    IB/srpt: Modify a debug statement
    IB/srp: Switch from dma_device to dev.parent
    IB/iser: Switch from dma_device to dev.parent
    IB/IPoIB: Switch from dma_device to dev.parent
    IB/rxe: Switch from dma_device to dev.parent
    IB/vmw_pvrdma: Switch from dma_device to dev.parent
    IB/usnic: Switch from dma_device to dev.parent
    IB/qib: Switch from dma_device to dev.parent
    IB/qedr: Switch from dma_device to dev.parent
    IB/ocrdma: Switch from dma_device to dev.parent
    IB/nes: Remove a superfluous assignment statement
    IB/mthca: Switch from dma_device to dev.parent
    IB/mlx5: Switch from dma_device to dev.parent
    IB/mlx4: Switch from dma_device to dev.parent
    IB/i40iw: Remove a superfluous assignment statement
    IB/hns: Switch from dma_device to dev.parent
    ...

    Linus Torvalds
     

25 Feb, 2017

2 commits

  • Extract the glob test code into its own source file, so that it can be
    built either as a loadable module or into the kernel.

    Link: http://lkml.kernel.org/r/1483470276-10517-2-git-send-email-geert@linux-m68k.org
    Signed-off-by: Geert Uytterhoeven
    Reviewed-by: Andy Shevchenko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Geert Uytterhoeven
     
  • Extract the crc32 test code into its own source file, so that it can be
    built either as a loadable module or into the kernel.

    Link: http://lkml.kernel.org/r/1483470276-10517-1-git-send-email-geert@linux-m68k.org
    Signed-off-by: Geert Uytterhoeven
    Reviewed-by: Andy Shevchenko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Geert Uytterhoeven
     

24 Feb, 2017

2 commits

  • Pull drm updates from Dave Airlie:
    "This is the main drm pull request for v4.11.

    Nothing too major, the tinydrm and mmu-less support should make
    writing smaller drivers easier for some of the simpler platforms, and
    there are a bunch of documentation updates.

    Intel grew displayport MST audio support which is hopefully useful to
    people, and FBC is on by default for GEN9+ (so people know where to
    look for regressions). AMDGPU has a lot of fixes that would like new
    firmware files installed for some GPUs.

    Other than that it's pretty scattered all over.

    I may have a follow up pull request as I know BenH has a bunch of AST
    rework and fixes and I'd like to get those in once they've been tested
    by AST, and I've got at least one pull request I'm just trying to get
    the author to fix up.

    Core:
    - drm_mm reworked
    - Connector list locking and iterators
    - Documentation updates
    - Format handling rework
    - MMU-less support for fbdev helpers
    - drm_crtc_from_index helper
    - Core CRC API
    - Remove drm_framebuffer_unregister_private
    - Debugfs cleanup
    - EDID/Infoframe fixes
    - Release callback
    - Tinydrm support (smaller drivers for simple hw)

    panel:
    - Add support for some new simple panels

    i915:
    - FBC by default for gen9+
    - Shared dpll cleanups and docs
    - GEN8 powerdomain cleanup
    - DMC support on GLK
    - DP MST audio support
    - HuC loading support
    - GVT init ordering fixes
    - GVT IOMMU workaround fix

    amdgpu/radeon:
    - Power/clockgating improvements
    - Preliminary SR-IOV support
    - TTM buffer priority and eviction fixes
    - SI DPM quirks removed due to firmware fixes
    - Powerplay improvements
    - VCE/UVD powergating fixes
    - Cleanup SI GFX code to match CI/VI
    - Support for > 2 displays on 3/5 crtc asics
    - SI headless fixes

    nouveau:
    - Rework secure boot code in prep for GP10x secure boot
    - Channel recovery improvements
    - Initial power budget code
    - MMU rework preparation

    vmwgfx:
    - Bunch of fixes and cleanups

    exynos:
    - Runtime PM support for MIC driver
    - Cleanups to use atomic helpers
    - UHD Support for TM2/TM2E boards
    - Trigger mode fix for Rinato board

    etnaviv:
    - Shader performance fix
    - Command stream validator fixes
    - Command buffer suballocator

    rockchip:
    - CDN DisplayPort support
    - IOMMU support for arm64 platform

    imx-drm:
    - Fix i.MX5 TV encoder probing
    - Remove lower fb size limits

    msm:
    - Support for HW cursor on MDP5 devices
    - DSI encoder cleanup
    - GPU DT bindings cleanup

    sti:
    - stih410 cleanups
    - Create fbdev at binding
    - HQVDP fixes
    - Remove stih416 chip functionality
    - DVI/HDMI mode selection fixes
    - FPS statistic reporting

    omapdrm:
    - IRQ code cleanup

    dw-hdmi bridge:
    - Cleanups and fixes

    adv-bridge:
    - Updates for nexus

    sii8520 bridge:
    - Add interlace mode support
    - Rework HDMI and lots of fixes

    qxl:
    - probing/teardown cleanups

    ZTE drm:
    - HDMI audio via SPDIF interface
    - Video Layer overlay plane support
    - Add TV encoder output device

    atmel-hlcdc:
    - Rework fbdev creation logic

    tegra:
    - OF node fix

    fsl-dcu:
    - Minor fixes

    mali-dp:
    - Assorted fixes

    sunxi:
    - Minor fix"

    [ This was the "fixed" pull, that still had build warnings due to people
    not even having build-tested the result. I'm not a happy camper.

    I've fixed the things I noticed up in this merge. - Linus ]

    * tag 'drm-for-v4.11-less-shouty' of git://people.freedesktop.org/~airlied/linux: (1177 commits)
    lib/Kconfig: make PRIME_NUMBERS not user selectable
    drm/tinydrm: helpers: Properly fix backlight dependency
    drm/tinydrm: mipi-dbi: Fix field width specifier warning
    drm/tinydrm: mipi-dbi: Silence: ‘cmd’ may be used uninitialized
    drm/sti: fix build warnings in sti_drv.c and sti_vtg.c files
    drm/amd/powerplay: fix PSI feature on Polars12
    drm/amdgpu: refuse to reserve io mem for split VRAM buffers
    drm/ttm: fix use-after-free races in vm fault handling
    drm/tinydrm: Add support for Multi-Inno MI0283QT display
    dt-bindings: Add Multi-Inno MI0283QT binding
    dt-bindings: display/panel: Add common rotation property
    of: Add vendor prefix for Multi-Inno
    drm/tinydrm: Add MIPI DBI support
    drm/tinydrm: Add helper functions
    drm: Add DRM support for tiny LCD displays
    drm/amd/amdgpu: post card if there is real hw resetting performed
    drm/nouveau/tmr: provide backtrace when a timeout is hit
    drm/nouveau/pci/g92: Fix rearm
    drm/nouveau/drm/therm/fan: add a fallback if no fan control is specified in the vbios
    drm/nouveau/hwmon: expose power_max and power_crit
    ...

    Linus Torvalds
     
  • Linus doesn't like it user selectable, so kill it until
    someone needs it for something else.

    Signed-off-by: Dave Airlie

    Dave Airlie
     

23 Feb, 2017

1 commit

  • As reported by Geert, remove the string so the user does not see this
    config option. The option is explicitly selected only as a dependency of
    in-kernel users.

    Reported-by: Geert Uytterhoeven
    Fixes: 44091d29f207 ("lib: Introduce priority array area manager")
    Signed-off-by: Jiri Pirko
    Tested-by: Geert Uytterhoeven
    Signed-off-by: David S. Miller

    Jiri Pirko
     

04 Feb, 2017

1 commit

  • This introduces an infrastructure for management of linear priority
    areas. Priority order in an array matters; however, the order of items
    inside a priority group does not.

    As an initial implementation, the L-sort algorithm is used. It is quite
    trivial. A more advanced algorithm called P-sort will be introduced as a
    follow-up. The infrastructure is prepared for other algorithms.

    Alongside this, a testing module is introduced as well.

    Signed-off-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Jiri Pirko
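The L-sort step described above can be modelled in a few lines. This is a minimal Python sketch, not lib/parman.c. The class `LSortArray`, its fields, and the convention that a lower priority value sorts earlier are all assumptions made for illustration.

The sketch shows the property the commit relies on. Because order inside a priority group does not matter, shifting a whole group down one slot takes only one move: that group's first item goes into the free slot just past its last item.

```python
class LSortArray:
    """Toy linear priority array kept in order with the L-sort idea.

    Items of one priority occupy a contiguous chunk of slots; chunks are
    ordered by ascending priority value (an assumption for this sketch).
    Order of items inside a chunk is not maintained.
    """

    def __init__(self, size):
        self.slots = [None] * size
        self.groups = {}  # priority -> [start_index, item_count]
        self.count = 0

    def add(self, prio, item):
        """Insert item into its priority chunk; return how many items moved."""
        if self.count == len(self.slots):
            raise OverflowError("priority array is full")
        if prio not in self.groups:
            # An empty chunk starts where the next chunk starts, or at the
            # end of the used region if no chunk follows it.
            later = [p for p in self.groups if p > prio]
            start = self.groups[min(later)][0] if later else self.count
            self.groups[prio] = [start, 0]
        moves = 0
        # Shift every following chunk down by one slot, last chunk first.
        # One move per chunk suffices: its first item goes to the free slot
        # right after its last item, because order inside a chunk is free.
        for p in sorted(self.groups, reverse=True):
            if p == prio:
                break
            start, cnt = self.groups[p]
            if cnt:
                self.slots[start + cnt] = self.slots[start]
                self.slots[start] = None
                moves += 1
            self.groups[p][0] = start + 1
        start, cnt = self.groups[prio]
        self.slots[start + cnt] = item
        self.groups[prio][1] = cnt + 1
        self.count += 1
        return moves
```

With this scheme an insert costs at most one move per following priority group, rather than one move per following item.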
     

25 Jan, 2017

2 commits

  • Several RDMA drivers (hfi1, qib and rxe) expect that ib_sge.addr
    is a virtual address. Provide DMA mapping operations that are
    suitable for these drivers.

    Signed-off-by: Bart Van Assche
    Cc: Christian Borntraeger
    Cc: Joerg Roedel
    Cc: Andy Lutomirski
    Cc: Michael S. Tsirkin
    Signed-off-by: Doug Ledford

    Bart Van Assche
     
  • Reduce the kernel size by only building dma_noop_ops for those
    architectures that actually use it. This was suggested by
    Christoph Hellwig.

    Signed-off-by: Bart Van Assche
    Cc: Christian Borntraeger
    Cc: Joerg Roedel
    Cc: Andy Lutomirski
    Cc: Michael S. Tsirkin
    Cc: Christoph Hellwig
    Signed-off-by: Doug Ledford

    Bart Van Assche
     

27 Dec, 2016

1 commit

  • Prime numbers are interesting for testing components that use multiplies
    and divides, such as testing DRM's struct drm_mm alignment computations.

    v2: Move to lib/, add selftest
    v3: Fix initial constants (exclude 0/1 from being primes)
    v4: More RCU markup to keep 0day/sparse happy
    v5: Fix RCU unwind on module exit, add to kselftests
    v6: Tidy computation of bitmap size
    v7: for_each_prime_number_from()
    v8: Compose small-primes using BIT() for easier verification
    v9: Move rcu dance entirely into callers.
    v10: Improve quote for Bertrand's Postulate (aka Chebyshev's theorem)

    Signed-off-by: Chris Wilson
    Cc: Lukas Wunner
    Reviewed-by: Joonas Lahtinen
    Signed-off-by: Daniel Vetter
    Link: http://patchwork.freedesktop.org/patch/msgid/20161222144514.3911-1-chris@chris-wilson.co.uk

    Chris Wilson
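A sieve plus Bertrand's postulate can answer "next prime after x" from a bounded table. The postulate says that for every n >= 1 there is a prime p with n < p <= 2n.

The sketch below is a Python illustration of that idea under stated assumptions. `PrimeCache`, its initial table size of 64, and the grow-on-demand policy are hypothetical. This is not the code in lib/prime_numbers.c.

```python
import math


def sieve(limit):
    """Return a list where entry n is True iff n is prime, for 0 <= n <= limit."""
    is_prime = [False, False] + [True] * max(limit - 1, 0)
    for n in range(2, math.isqrt(limit) + 1):
        if is_prime[n]:
            for m in range(n * n, limit + 1, n):
                is_prime[m] = False
    return is_prime


class PrimeCache:
    """Hypothetical grow-on-demand prime table."""

    def __init__(self, initial_limit=64):
        self._bitmap = sieve(initial_limit)

    def _ensure(self, limit):
        # Re-sieve with a larger limit only when the table is too small.
        if limit >= len(self._bitmap):
            self._bitmap = sieve(limit)

    def is_prime(self, x):
        if x < 2:
            return False
        self._ensure(x)
        return self._bitmap[x]

    def next_prime(self, x):
        """Return the smallest prime strictly greater than x."""
        # Bertrand's postulate: for n >= 1 some prime p has n < p <= 2n,
        # so a table reaching 2 * max(x, 1) always contains the answer.
        self._ensure(2 * max(x, 1))
        for n in range(max(x + 1, 2), len(self._bitmap)):
            if self._bitmap[n]:
                return n
        raise AssertionError("unreachable by Bertrand's postulate")
```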
     

08 Oct, 2016

2 commits

  • Merge updates from Andrew Morton:

    - fsnotify updates

    - ocfs2 updates

    - all of MM

    * emailed patches from Andrew Morton : (127 commits)
    console: don't prefer first registered if DT specifies stdout-path
    cred: simpler, 1D supplementary groups
    CREDITS: update Pavel's information, add GPG key, remove snail mail address
    mailmap: add Johan Hovold
    .gitattributes: set git diff driver for C source code files
    uprobes: remove function declarations from arch/{mips,s390}
    spelling.txt: "modeled" is spelt correctly
    nmi_backtrace: generate one-line reports for idle cpus
    arch/tile: adopt the new nmi_backtrace framework
    nmi_backtrace: do a local dump_stack() instead of a self-NMI
    nmi_backtrace: add more trigger_*_cpu_backtrace() methods
    min/max: remove sparse warnings when they're nested
    Documentation/filesystems/proc.txt: add more description for maps/smaps
    mm, proc: fix region lost in /proc/self/smaps
    proc: fix timerslack_ns CAP_SYS_NICE check when adjusting self
    proc: add LSM hook checks to /proc//timerslack_ns
    proc: relax /proc//timerslack_ns capability requirements
    meminfo: break apart a very long seq_printf with #ifdefs
    seq/proc: modify seq_put_decimal_[u]ll to take a const char *, not char
    proc: faster /proc/*/status
    ...

    Linus Torvalds
     
  • This came to light when implementing native 64-bit atomics for ARCv2.

    The atomic64 self-test code uses CONFIG_ARCH_HAS_ATOMIC64_DEC_IF_POSITIVE
    to check whether atomic64_dec_if_positive() is available. It seems it
    was needed when not every arch defined it. However, as of the current
    code, the Kconfig option seems needless:

    - for CONFIG_GENERIC_ATOMIC64 it is auto-enabled in lib/Kconfig, and a
    generic definition of the API is present in lib/atomic64.c
    - arches with native 64-bit atomics select it in arch/*/Kconfig and
    define the API in their headers

    So I see no point in keeping the Kconfig option.

    Compile tested for:
    - blackfin (CONFIG_GENERIC_ATOMIC64)
    - x86 (!CONFIG_GENERIC_ATOMIC64)
    - ia64

    Link: http://lkml.kernel.org/r/1473703083-8625-3-git-send-email-vgupta@synopsys.com
    Signed-off-by: Vineet Gupta
    Cc: Richard Henderson
    Cc: Ivan Kokshaysky
    Cc: Matt Turner
    Cc: Russell King
    Cc: Catalin Marinas
    Cc: Will Deacon
    Cc: Ralf Baechle
    Cc: "James E.J. Bottomley"
    Cc: Helge Deller
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: Michael Ellerman
    Cc: Martin Schwidefsky
    Cc: Heiko Carstens
    Cc: "David S. Miller"
    Cc: Chris Metcalf
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: "H. Peter Anvin"
    Cc: Vineet Gupta
    Cc: Zhaoxiu Zeng
    Cc: Linus Walleij
    Cc: Alexander Potapenko
    Cc: Andrey Ryabinin
    Cc: Herbert Xu
    Cc: Ming Lin
    Cc: Arnd Bergmann
    Cc: Geert Uytterhoeven
    Cc: Peter Zijlstra
    Cc: Borislav Petkov
    Cc: Andi Kleen
    Cc: Boqun Feng
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vineet Gupta
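The helper the self-test relies on has a slightly unusual contract. It decrements only when the result stays non-negative, and it returns the old value minus one either way.

Below is a minimal Python model of that contract. It assumes a compare-and-swap retry loop of the kind architectures commonly use. The names `Atomic64` and `dec_if_positive` are hypothetical, not kernel symbols, and the lock only stands in for a hardware atomic.

```python
import threading


class Atomic64:
    """Toy 64-bit atomic built on a lock-protected compare-and-swap."""

    def __init__(self, value=0):
        self._value = value
        self._lock = threading.Lock()

    def read(self):
        with self._lock:
            return self._value

    def cmpxchg(self, old, new):
        """Set the value to new if it equals old; return the previous value."""
        with self._lock:
            prev = self._value
            if prev == old:
                self._value = new
            return prev


def dec_if_positive(v):
    """Decrement v only if the result stays >= 0; return old value - 1 either way."""
    c = v.read()
    while True:
        dec = c - 1
        if dec < 0:
            return dec  # would go negative: leave v untouched
        prev = v.cmpxchg(c, dec)
        if prev == c:
            return dec  # our update won
        c = prev  # someone else changed v; retry with the fresh value
```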
     

17 Sep, 2016

1 commit

  • This is a generally useful data structure, so make it available to
    anyone else who might want to use it. It's also a nice cleanup
    separating the allocation logic from the rest of the tag handling logic.

    The code is behind a new Kconfig option, CONFIG_SBITMAP, which is only
    selected by CONFIG_BLOCK for now.

    This should be a complete noop functionality-wise.

    Signed-off-by: Omar Sandoval
    Signed-off-by: Jens Axboe

    Omar Sandoval
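The core idea can be sketched as a bitmap split into separate words. Each caller starts its search from a hint, so concurrent allocations tend to land in different words.

This is a simplified single-threaded Python model. `ScalableBitmap`, the 8-bit word size, and the word-granular hint handling are assumptions, not the kernel's sbitmap API.

```python
class ScalableBitmap:
    """Toy bitmap split into fixed-size words, allocated from a start hint."""

    def __init__(self, depth, bits_per_word=8):
        self.depth = depth
        self.bits_per_word = bits_per_word
        nwords = (depth + bits_per_word - 1) // bits_per_word
        self.words = [0] * nwords

    def _word_depth(self, index):
        # The last word may be only partially used.
        if index == len(self.words) - 1:
            return self.depth - index * self.bits_per_word
        return self.bits_per_word

    def get(self, hint=0):
        """Return a free bit number, searching from hint's word and wrapping, or -1."""
        nwords = len(self.words)
        start_word = hint // self.bits_per_word
        for i in range(nwords):
            w = (start_word + i) % nwords
            for bit in range(self._word_depth(w)):
                if not (self.words[w] >> bit) & 1:
                    self.words[w] |= 1 << bit
                    return w * self.bits_per_word + bit
        return -1

    def clear(self, nr):
        w, bit = divmod(nr, self.bits_per_word)
        self.words[w] &= ~(1 << bit)
```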
     

21 May, 2016

1 commit

  • I've been receiving increasingly concerned notes from 0day about how
    much my recent changes have been bloating the radix tree. Make it
    happier by only including multiorder support if
    CONFIG_TRANSPARENT_HUGEPAGE is set.

    This is an independent Kconfig option, so other radix tree users can
    also set it if they have a need.

    Signed-off-by: Matthew Wilcox
    Reviewed-by: Ross Zwisler
    Cc: Konstantin Khlebnikov
    Cc: Kirill Shutemov
    Cc: Jan Kara
    Cc: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     

16 Apr, 2016

1 commit


26 Mar, 2016

1 commit

  • Implement the stack depot and provide CONFIG_STACKDEPOT. Stack depot
    will allow KASAN to store allocation/deallocation stack traces for memory
    chunks. The stack traces are stored in a hash table and referenced by
    handles, which reside in the kasan_alloc_meta and kasan_free_meta
    structures in the allocated memory chunks.

    IRQ stack traces are cut below the IRQ entry point to avoid unnecessary
    duplication.

    Right now stackdepot support is only enabled in the SLAB allocator. Once
    KASAN features in SLAB are on par with those in SLUB, we can switch SLUB
    to stackdepot as well, thus removing the dependency on SLUB stack
    bookkeeping, which wastes a lot of memory.

    This patch is based on the "mm: kasan: stack depots" patch originally
    prepared by Dmitry Chernenkov.

    Joonsoo has said that he plans to reuse the stackdepot code for the
    mm/page_owner.c debugging facility.

    [akpm@linux-foundation.org: s/depot_stack_handle/depot_stack_handle_t]
    [aryabinin@virtuozzo.com: comment style fixes]
    Signed-off-by: Alexander Potapenko
    Signed-off-by: Andrey Ryabinin
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Cc: Andrey Konovalov
    Cc: Dmitry Vyukov
    Cc: Steven Rostedt
    Cc: Konstantin Serebryany
    Cc: Dmitry Chernenkov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexander Potapenko
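Deduplicating stack traces behind handles can be modelled with a dictionary standing in for the depot's hash table. The sketch also cuts an IRQ trace below the entry point, so traces that differ only in the interrupted context share one handle.

This Python sketch rests on several assumptions. Frames are listed innermost first, and handle 0 is reserved to mean "no stack". The names `StackDepot`, `save`, `fetch`, and `irq_entry` are hypothetical. The kernel's handle encoding and storage layout are different.

```python
class StackDepot:
    """Toy stack depot: stores each distinct trace once and returns a handle."""

    def __init__(self):
        self._by_trace = {}  # trace tuple -> handle (stands in for the hash table)
        self._traces = []    # handle - 1 -> stored trace

    def save(self, frames, irq_entry=None):
        trace = tuple(frames)
        if irq_entry is not None and irq_entry in trace:
            # Keep frames up to and including the IRQ entry point; the
            # interrupted context below it varies and would only duplicate.
            trace = trace[: trace.index(irq_entry) + 1]
        handle = self._by_trace.get(trace)
        if handle is None:
            self._traces.append(trace)
            handle = len(self._traces)  # handles start at 1; 0 means "none"
            self._by_trace[trace] = handle
        return handle

    def fetch(self, handle):
        return list(self._traces[handle - 1])
```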
     

24 Jan, 2016

1 commit

  • Pull rdma updates from Doug Ledford:
    "Initial roundup of 4.5 merge window patches

    - Remove usage of ib_query_device and instead store attributes in
    ib_device struct

    - Move iopoll out of block and into lib, rename to irqpoll, and use
    in several places in the rdma stack as our new completion queue
    polling library mechanism. Update the other block drivers that
    already used iopoll to use the new mechanism too.

    - Replace the per-entry GID table locks with a single GID table lock

    - IPoIB multicast cleanup

    - Cleanups to the IB MR facility

    - Add support for 64bit extended IB counters

    - Fix for netlink oops while parsing RDMA nl messages

    - RoCEv2 support for the core IB code

    - mlx4 RoCEv2 support

    - mlx5 RoCEv2 support

    - Cross Channel support for mlx5

    - Timestamp support for mlx5

    - Atomic support for mlx5

    - Raw QP support for mlx5

    - MAINTAINERS update for mlx4/mlx5

    - Misc ocrdma, qib, nes, usNIC, cxgb3, cxgb4, mlx4, mlx5 updates

    - Add support for remote invalidate to the iSER driver (pushed
    through the RDMA tree due to dependencies, acknowledged by nab)

    - Update to NFSoRDMA (pushed through the RDMA tree due to
    dependencies, acknowledged by Bruce)"

    * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (169 commits)
    IB/mlx5: Unify CQ create flags check
    IB/mlx5: Expose Raw Packet QP to user space consumers
    {IB, net}/mlx5: Move the modify QP operation table to mlx5_ib
    IB/mlx5: Support setting Ethernet priority for Raw Packet QPs
    IB/mlx5: Add Raw Packet QP query functionality
    IB/mlx5: Add create and destroy functionality for Raw Packet QP
    IB/mlx5: Refactor mlx5_ib_qp to accommodate other QP types
    IB/mlx5: Allocate a Transport Domain for each ucontext
    net/mlx5_core: Warn on unsupported events of QP/RQ/SQ
    net/mlx5_core: Add RQ and SQ event handling
    net/mlx5_core: Export transport objects
    IB/mlx5: Expose CQE version to user-space
    IB/mlx5: Add CQE version 1 support to user QPs and SRQs
    IB/mlx5: Fix data validation in mlx5_ib_alloc_ucontext
    IB/sa: Fix netlink local service GFP crash
    IB/srpt: Remove redundant wc array
    IB/qib: Improve ipoib UD performance
    IB/mlx4: Advertise RoCE v2 support
    IB/mlx4: Create and use another QP1 for RoCEv2
    IB/mlx4: Enable send of RoCE QP1 packets with IP/UDP headers
    ...

    Linus Torvalds
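A completion-queue polling library typically follows a budgeted pattern. Each scheduled instance may reap at most its weight per call. An instance that still has work is requeued at the tail, and a global budget bounds one run.

This is a Python model for illustration only. `IrqPoll`, `run_softirq`, and the exact requeue rule are assumptions, not the lib/irq_poll.c interface.

```python
from collections import deque


class IrqPoll:
    """Toy pollable instance, e.g. one completion queue."""

    def __init__(self, name, pending, weight):
        self.name = name
        self.pending = pending  # completions waiting to be reaped
        self.weight = weight    # max completions handled per poll call

    def poll(self, budget):
        done = min(self.pending, budget)
        self.pending -= done
        return done


def run_softirq(queue, total_budget):
    """Poll scheduled instances round-robin until idle or the budget is spent."""
    log = []
    while queue and total_budget > 0:
        iop = queue.popleft()
        work = iop.poll(min(iop.weight, total_budget))
        total_budget -= work
        log.append((iop.name, work))
        if iop.pending:
            queue.append(iop)  # still busy: requeue behind the others
    return log
```

The per-instance weight keeps one busy queue from starving the others. The global budget bounds how long a single polling run can take.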
     

23 Jan, 2016

1 commit

  • Pull crypto fixes from Herbert Xu:
    "This fixes the following issues:

    API:
    - A large number of bug fixes for the af_alg interface, credit goes
    to Dmitry Vyukov for discovering and reporting these issues.

    Algorithms:
    - sw842 needs to select crc32.
    - The soft dependency on crc32c is now in the correct spot.

    Drivers:
    - The atmel AES driver needs HAS_DMA.
    - The atmel AES driver had a missing break statement; fortunately
    it's only in a debug function.
    - A number of bug fixes for the Intel qat driver"

    * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (24 commits)
    crypto: algif_skcipher - sendmsg SG marking is off by one
    crypto: crc32c - Fix crc32c soft dependency
    crypto: algif_skcipher - Load TX SG list after waiting
    crypto: atmel-aes - Add missing break to atmel_aes_reg_name
    crypto: algif_skcipher - Fix race condition in skcipher_check_key
    crypto: algif_hash - Fix race condition in hash_check_key
    crypto: CRYPTO_DEV_ATMEL_AES should depend on HAS_DMA
    lib: sw842: select crc32
    crypto: af_alg - Forbid bind(2) when nokey child sockets are present
    crypto: algif_skcipher - Remove custom release parent function
    crypto: algif_hash - Remove custom release parent function
    crypto: af_alg - Allow af_af_alg_release_parent to be called on nokey path
    crypto: qat - update init_esram for C3xxx dev type
    crypto: qat - fix timeout issues
    crypto: qat - remove to call get_sram_bar_id for qat_c3xxx
    crypto: algif_skcipher - Add key check exception for cipher_null
    crypto: skcipher - Add crypto_skcipher_has_setkey
    crypto: algif_hash - Require setkey before accept(2)
    crypto: hash - Add crypto_ahash_has_setkey
    crypto: algif_skcipher - Add nokey compatibility path
    ...

    Linus Torvalds
     

18 Jan, 2016

1 commit

  • The sw842 library code was merged in linux-4.1 and causes a very rare randconfig
    failure when CONFIG_CRC32 is not set:

    lib/built-in.o: In function `sw842_compress':
    oid_registry.c:(.text+0x12ddc): undefined reference to `crc32_be'
    lib/built-in.o: In function `sw842_decompress':
    oid_registry.c:(.text+0x137e4): undefined reference to `crc32_be'

    This adds an explicit 'select CRC32' statement, similar to what the other
    users of the crc32 code have. In practice, CRC32 is always enabled anyway
    because over 100 other symbols select it.

    Cc: stable@vger.kernel.org
    Signed-off-by: Arnd Bergmann
    Fixes: 2da572c959dd ("lib: add software 842 compression/decompression")
    Acked-by: Dan Streetman
    Signed-off-by: Herbert Xu

    Arnd Bergmann
     

12 Dec, 2015

1 commit


08 Dec, 2015

1 commit


17 Oct, 2015

1 commit

  • lib/built-in.o: In function `__bitrev32':
    deftree.c:(.text+0x1e799): undefined reference to `byte_rev_table'
    deftree.c:(.text+0x1e7a0): undefined reference to `byte_rev_table'
    deftree.c:(.text+0x1e7b4): undefined reference to `byte_rev_table'
    deftree.c:(.text+0x1e7c1): undefined reference to `byte_rev_table'

    Anything which uses bitrevX() has to select BITREVERSE, to grab
    lib/bitrev.o.

    Reported-by: Jim Davis
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
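The undefined `byte_rev_table` references make sense once you see how the generic helpers are layered. `__bitrev16` and `__bitrev32` are built on a 256-entry byte table, and that table lives in lib/bitrev.o, which is only linked in when BITREVERSE is selected.

Below is a Python model of that layering. Here the table is computed at import time rather than written out as a static array.

```python
def _build_byte_rev_table():
    """Entry b holds the 8-bit value b with its bit order reversed."""
    table = []
    for b in range(256):
        r = 0
        for i in range(8):
            if (b >> i) & 1:
                r |= 1 << (7 - i)
        table.append(r)
    return table


BYTE_REV_TABLE = _build_byte_rev_table()


def bitrev8(x):
    return BYTE_REV_TABLE[x & 0xFF]


def bitrev16(x):
    # Reverse each byte, then swap the two bytes.
    return (bitrev8(x & 0xFF) << 8) | bitrev8((x >> 8) & 0xFF)


def bitrev32(x):
    # Reverse each 16-bit half, then swap the halves.
    return (bitrev16(x & 0xFFFF) << 16) | bitrev16((x >> 16) & 0xFFFF)
```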
     

09 Sep, 2015

1 commit

  • Pull libnvdimm updates from Dan Williams:
    "This update has successfully completed a 0day-kbuild run and has
    appeared in a linux-next release. The changes outside of the typical
    drivers/nvdimm/ and drivers/acpi/nfit.[ch] paths are related to the
    removal of IORESOURCE_CACHEABLE, the introduction of memremap(), and
    the introduction of ZONE_DEVICE + devm_memremap_pages().

    Summary:

    - Introduce ZONE_DEVICE and devm_memremap_pages() as a generic
    mechanism for adding device-driver-discovered memory regions to the
    kernel's direct map.

    This facility is used by the pmem driver to enable pfn_to_page()
    operations on the page frames returned by DAX ('direct_access' in
    'struct block_device_operations').

    For now, the 'memmap' allocation for these "device" pages comes
    from "System RAM". Support for allocating the memmap from device
    memory will arrive in a later kernel.

    - Introduce memremap() to replace usages of ioremap_cache() and
    ioremap_wt(). memremap() drops the __iomem annotation for these
    mappings to memory that do not have i/o side effects. The
    replacement of ioremap_cache() with memremap() is limited to the
    pmem driver to ease merging the api change in v4.3.

    Completion of the conversion is targeted for v4.4.

    - Similar to the usage of memcpy_to_pmem() + wmb_pmem() in the pmem
    driver, update the VFS DAX implementation and PMEM api to provide
    persistence guarantees for kernel operations on a DAX mapping.

    - Convert the ACPI NFIT 'BLK' driver to map the block apertures as
    cacheable to improve performance.

    - Miscellaneous updates and fixes to libnvdimm including support for
    issuing "address range scrub" commands, clarifying the optimal
    'sector size' of pmem devices, a clarification of the usage of the
    ACPI '_STA' (status) property for DIMM devices, and other minor
    fixes"

    * tag 'libnvdimm-for-4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (34 commits)
    libnvdimm, pmem: direct map legacy pmem by default
    libnvdimm, pmem: 'struct page' for pmem
    libnvdimm, pfn: 'struct page' provider infrastructure
    x86, pmem: clarify that ARCH_HAS_PMEM_API implies PMEM mapped WB
    add devm_memremap_pages
    mm: ZONE_DEVICE for "device memory"
    mm: move __phys_to_pfn and __pfn_to_phys to asm/generic/memory_model.h
    dax: drop size parameter to ->direct_access()
    nd_blk: change aperture mapping from WC to WB
    nvdimm: change to use generic kvfree()
    pmem, dax: have direct_access use __pmem annotation
    dax: update I/O path to do proper PMEM flushing
    pmem: add copy_from_iter_pmem() and clear_pmem()
    pmem, x86: clean up conditional pmem includes
    pmem: remove layer when calling arch_has_wmb_pmem()
    pmem, x86: move x86 PMEM API to new pmem.h header
    libnvdimm, e820: make CONFIG_X86_PMEM_LEGACY a tristate option
    pmem: switch to devm_ allocations
    devres: add devm_memremap
    libnvdimm, btt: write and validate parent_uuid
    ...

    Linus Torvalds
     

06 Sep, 2015

1 commit

  • Pull vfs updates from Al Viro:
    "In this one:

    - d_move fixes (Eric Biederman)

    - UFS fixes (me; locking is mostly sane now, a bunch of bugs in error
    handling ought to be fixed)

    - switch of sb_writers to percpu rwsem (Oleg Nesterov)

    - superblock scalability (Josef Bacik and Dave Chinner)

    - swapon(2) race fix (Hugh Dickins)"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (65 commits)
    vfs: Test for and handle paths that are unreachable from their mnt_root
    dcache: Reduce the scope of i_lock in d_splice_alias
    dcache: Handle escaped paths in prepend_path
    mm: fix potential data race in SyS_swapon
    inode: don't softlockup when evicting inodes
    inode: rename i_wb_list to i_io_list
    sync: serialise per-superblock sync operations
    inode: convert inode_sb_list_lock to per-sb
    inode: add hlist_fake to avoid the inode hash lock in evict
    writeback: plug writeback at a high level
    change sb_writers to use percpu_rw_semaphore
    shift percpu_counter_destroy() into destroy_super_work()
    percpu-rwsem: kill CONFIG_PERCPU_RWSEM
    percpu-rwsem: introduce percpu_rwsem_release() and percpu_rwsem_acquire()
    percpu-rwsem: introduce percpu_down_read_trylock()
    document rwsem_release() in sb_wait_write()
    fix the broken lockdep logic in __sb_start_write()
    introduce __sb_writers_{acquired,release}() helpers
    ufs_inode_get{frag,block}(): get rid of 'phys' argument
    ufs_getfrag_block(): tidy up a bit
    ...

    Linus Torvalds
     

03 Sep, 2015

1 commit

  • Pull networking updates from David Miller:
    "Another merge window, another set of networking changes. I've heard
    rumblings that the lightweight tunnels infrastructure has been voted
    networking change of the year. But what do I know?

    1) Add conntrack support to openvswitch, from Joe Stringer.

    2) Initial support for VRF (Virtual Routing and Forwarding), which
    allows the segmentation of routing paths without using multiple
    devices. There are some semantic kinks to work out still, but
    this is a reasonably strong foundation. From David Ahern.

    3) Remove spinlock from act_bpf fast path, from Alexei Starovoitov.

    4) Ignore route nexthops with a link down state in ipv6, just like
    ipv4. From Andy Gospodarek.

    5) Remove spinlock from fast path of act_gact and act_mirred, from
    Eric Dumazet.

    6) Document the DSA layer, from Florian Fainelli.

    7) Add netconsole support to bcmgenet, systemport, and DSA. Also
    from Florian Fainelli.

    8) Add Mellanox Switch Driver and core infrastructure, from Jiri
    Pirko.

    9) Add support for "light weight tunnels", which allow for
    encapsulation and decapsulation without bearing the overhead of a
    full blown netdevice. From Thomas Graf, Jiri Benc, and a cast of
    others.

    10) Add Identifier Locator Addressing support for ipv6, from Tom
    Herbert.

    11) Support fragmented SKBs in iwlwifi, from Johannes Berg.

    12) Allow perf PMUs to be accessed from eBPF programs, from Kaixu Xia.

    13) Add BQL support to 3c59x driver, from Loganaden Velvindron.

    14) Stop using a zero TX queue length to mean that a device shouldn't
    have a qdisc attached, use an explicit flag instead. From Phil
    Sutter.

    15) Use generic geneve netdevice infrastructure in openvswitch, from
    Pravin B Shelar.

    16) Add infrastructure to avoid re-forwarding a packet in software
    that was already forwarded by a hardware switch. From Scott
    Feldman.

    17) Allow AF_PACKET fanout function to be implemented in a bpf
    program, from Willem de Bruijn"

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1458 commits)
    netfilter: nf_conntrack: make nf_ct_zone_dflt built-in
    netfilter: nf_dup{4, 6}: fix build error when nf_conntrack disabled
    net: fec: clear receive interrupts before processing a packet
    ipv6: fix exthdrs offload registration in out_rt path
    xen-netback: add support for multicast control
    bgmac: Update fixed_phy_register()
    sock, diag: fix panic in sock_diag_put_filterinfo
    flow_dissector: Use 'const' where possible.
    flow_dissector: Fix function argument ordering dependency
    ixgbe: Resolve "initialized field overwritten" warnings
    ixgbe: Remove bimodal SR-IOV disabling
    ixgbe: Add support for reporting 2.5G link speed
    ixgbe: fix bounds checking in ixgbe_setup_tc for 82598
    ixgbe: support for ethtool set_rxfh
    ixgbe: Avoid needless PHY access on copper phys
    ixgbe: cleanup to use cached mask value
    ixgbe: Remove second instance of lan_id variable
    ixgbe: use kzalloc for allocating one thing
    flow: Move __get_hash_from_flowi{4,6} into flow_dissector.c
    ixgbe: Remove unused PCI bus types
    ...

    Linus Torvalds
     

28 Aug, 2015

1 commit

  • This should result in a pretty sizeable performance gain for reads. For
    rough comparison I did some simple read testing using PMEM to compare
    reads of write combining (WC) mappings vs write-back (WB). This was
    done on a random lab machine.

    PMEM reads from a write combining mapping:
    # dd of=/dev/null if=/dev/pmem0 bs=4096 count=100000
    100000+0 records in
    100000+0 records out
    409600000 bytes (410 MB) copied, 9.2855 s, 44.1 MB/s

    PMEM reads from a write-back mapping:
    # dd of=/dev/null if=/dev/pmem0 bs=4096 count=1000000
    1000000+0 records in
    1000000+0 records out
    4096000000 bytes (4.1 GB) copied, 3.44034 s, 1.2 GB/s

    To be able to safely support a write-back aperture I needed to add
    support for the "read flush" _DSM flag, as outlined in the DSM spec:

    http://pmem.io/documents/NVDIMM_DSM_Interface_Example.pdf

    This flag tells the ND BLK driver that it needs to flush the cache lines
    associated with the aperture after the aperture is moved but before any
    new data is read. This ensures that any stale cache lines from the
    previous contents of the aperture will be discarded from the processor
    cache, and the new data will be read properly from the DIMM. We know
    that the cache lines are clean and will be discarded without any
    writeback because either a) the previous aperture operation was a read,
    and we never modified the contents of the aperture, or b) the previous
    aperture operation was a write and we must have written back the dirtied
    contents of the aperture to the DIMM before the I/O was completed.

    In order to add support for the "read flush" flag I needed to add a
    generic routine to invalidate cache lines, mmio_flush_range(). This is
    protected by the ARCH_HAS_MMIO_FLUSH Kconfig variable, and is currently
    only supported on x86.

    Signed-off-by: Ross Zwisler
    Signed-off-by: Dan Williams

    Ross Zwisler
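A toy model shows why the read flush matters. A cache sits in front of a fixed aperture window, and moving the aperture changes which DIMM block the window exposes without touching the cache.

This Python sketch is not the nd_blk driver or mmio_flush_range(). It assumes byte-granular "cache lines", uses a dict in place of the processor cache, and introduces the hypothetical class `BlockAperture`.

```python
class BlockAperture:
    """Toy model of a cached (write-back) block aperture over DIMM storage."""

    def __init__(self, dimm, read_flush):
        self.dimm = dimm              # block number -> bytes held on the DIMM
        self.read_flush = read_flush  # honour the "read flush" requirement?
        self.block = None             # DIMM block currently behind the window
        self.cache = {}               # window offset -> cached byte value

    def move(self, block):
        """Point the fixed aperture window at a different DIMM block."""
        self.block = block
        if self.read_flush:
            # Invalidate after the move and before any new read. The lines
            # are clean, so dropping them discards nothing unwritten.
            self.cache.clear()

    def read(self, length):
        data = bytearray()
        for off in range(length):
            if off not in self.cache:  # miss: fetch from the DIMM
                self.cache[off] = self.dimm[self.block][off]
            data.append(self.cache[off])
        return bytes(data)
```

Without the flush, the second read hits lines cached from the previous block and returns stale data.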
     

25 Aug, 2015

1 commit

  • Sometimes a scatter-gather list has to be split into several chunks, or
    sub scatter lists. This happens, for example, if a scatter list will be
    handled by multiple DMA channels, each one filling a part of it.

    A concrete example comes with the media V4L2 API, where the scatter list
    is allocated from userspace to hold an image without knowing how many
    DMAs will fill it:
    - in a simple RGB565 case, one DMA will pump data from the camera ISP
    to memory
    - in the trickier YUV422 case, 3 DMAs will pump data from the camera
    ISP pipes, one for pipe Y, one for pipe U and one for pipe V

    For these cases, it is necessary to split the original scatter list into
    multiple scatter lists, which is the purpose of this patch.

    The guarantees that are required for this patch are:
    - the intersection of the spans of any two resulting scatter lists is
    empty.
    - the union of the spans of all resulting scatter lists is a subrange of
    the span of the original scatter list.
    - streaming DMA API operations (mapping, unmapping) must not be
    performed on both the resulting and the original scatter lists; use one
    or the other, never both.
    - the caller is responsible for calling kfree() on the resulting
    scatterlists.

    Signed-off-by: Robert Jarzmik
    Signed-off-by: Jens Axboe

    Robert Jarzmik
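The splitting rules listed above can be illustrated on plain (address, length) pairs. The sketch produces disjoint sub-lists whose union is a subrange of the original span.

This Python sketch ignores DMA mapping entirely. `split_segments`, its `skip` argument, and the `sizes` list are hypothetical; this is not the signature of the kernel's sg_split().

```python
def split_segments(segments, skip, sizes):
    """Split (address, length) segments into one sub-list per entry in sizes.

    The first `skip` bytes of the original list are left out; sub-list i
    then covers the next sizes[i] bytes. The caller owns the returned lists.
    """
    total = sum(length for _, length in segments)
    if skip < 0 or any(s < 0 for s in sizes) or skip + sum(sizes) > total:
        raise ValueError("requested split does not fit in the original list")
    idx, off = 0, skip
    # Step over whole segments consumed by the initial skip.
    while idx < len(segments) and off >= segments[idx][1]:
        off -= segments[idx][1]
        idx += 1
    out = []
    for size in sizes:
        chunk = []
        while size > 0:
            addr, length = segments[idx]
            take = min(size, length - off)
            if take:
                chunk.append((addr + off, take))
            size -= take
            off += take
            if off == length:  # this segment is used up; move to the next
                idx, off = idx + 1, 0
        out.append(chunk)
    return out
```

In the YUV422 case from the message above, the three entries in `sizes` would correspond to the Y, U and V planes, one sub-list per DMA channel.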
     

21 Aug, 2015

1 commit


15 Aug, 2015

1 commit