02 Nov, 2017

1 commit

  • Many source files in the tree are missing licensing information, which
    makes it harder for compliance tools to determine the correct license.

    By default all files without license information are under the default
    license of the kernel, which is GPL version 2.

    Update the files which contain no license information with the 'GPL-2.0'
    SPDX license identifier. The SPDX identifier is a legally binding
    shorthand, which can be used instead of the full boiler plate text.
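
    As an illustration of what such a tag looks like to a compliance tool,
    the following minimal Python sketch searches the top of a file for an
    `SPDX-License-Identifier:` line. The helper name `find_spdx_id` and the
    ten-line search window are assumptions made for this example; they are
    not part of the patch or of any scanner mentioned here.

```python
import re

# Matches the tag in any comment style, for example
# "// SPDX-License-Identifier: GPL-2.0" or "/* SPDX-License-Identifier: GPL-2.0 */".
SPDX_RE = re.compile(r"SPDX-License-Identifier:\s*(.+?)\s*(?:\*/)?\s*$")


def find_spdx_id(text, max_lines=10):
    """Return the license expression from the first SPDX tag found, or None."""
    for line in text.splitlines()[:max_lines]:
        match = SPDX_RE.search(line)
        if match:
            return match.group(1)
    return None
```

    A file without the tag yields None, which is the case this patch
    addresses by adding the 'GPL-2.0' identifier.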

    This patch is based on work done by Thomas Gleixner and Kate Stewart and
    Philippe Ombredanne.

    How this work was done:

    Patches were generated and checked against linux-4.14-rc6 for a subset of
    the use cases:
    - file had no licensing information in it,
    - file was a */uapi/* one with no licensing information in it,
    - file was a */uapi/* one with existing licensing information,

    Further patches will be generated in subsequent months to fix up cases
    where non-standard license headers were used, and references to license
    had to be inferred by heuristics based on keywords.

    The analysis to determine which SPDX License Identifier to be applied to
    a file was done in a spreadsheet of side by side results from the
    output of two independent scanners (ScanCode & Windriver) producing SPDX
    tag:value files created by Philippe Ombredanne. Philippe prepared the
    base worksheet, and did an initial spot review of a few 1000 files.

    The 4.13 kernel was the starting point of the analysis with 60,537 files
    assessed. Kate Stewart did a file by file comparison of the scanner
    results in the spreadsheet to determine which SPDX license identifier(s)
    to be applied to the file. She confirmed any determination that was not
    immediately clear with lawyers working with the Linux Foundation.

    Criteria used to select files for SPDX license identifier tagging were:
    - Files considered eligible had to be source code files.
    - Make and config files were included as candidates if they contained >5
    lines of source
    - File already had some variant of a license header in it (even if
    Reviewed-by: Philippe Ombredanne
    Reviewed-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     

15 Sep, 2017

1 commit

  • Pull zstd support from Chris Mason:
    "Nick Terrell's patch series to add zstd support to the kernel has been
    floating around for a while. After talking with Dave Sterba, Herbert
    and Phillip, we decided to send the whole thing in as one pull
    request.

    zstd is a big win in speed over zlib and in compression ratio over
    lzo, and the compression team here at FB has gotten great results
    using it in production. Nick will continue to update the kernel side
    with new improvements from the open source zstd userland code.

    Nick has a number of benchmarks for the main zstd code in his lib/zstd
    commit:

    I ran the benchmarks on an Ubuntu 14.04 VM with 2 cores and 4 GiB
    of RAM. The VM is running on a MacBook Pro with a 3.1 GHz Intel
    Core i7 processor, 16 GB of RAM, and an SSD. I benchmarked using
    `silesia.tar` [3], which is 211,988,480 B large. Run the following
    commands for the benchmark:

    sudo modprobe zstd_compress_test
    sudo mknod zstd_compress_test c 245 0
    sudo cp silesia.tar zstd_compress_test

    The time is reported by the time of the userland `cp`.
    The MB/s is computed with

    211,988,480 B / time(method)

    which includes the time to copy from userland.
    The Adjusted MB/s is computed with

    211,988,480 B / (time(method) - time(none)).

    The memory reported is the amount of memory the compressor
    requests.

    | Method | Size (B) | Time (s) | Ratio | MB/s | Adj MB/s | Mem (MB) |
    |----------|----------|----------|-------|---------|----------|----------|
    | none | 211988480 | 0.100 | 1 | 2119.88 | - | - |
    | zstd -1 | 73645762 | 1.044 | 2.878 | 203.05 | 224.56 | 1.23 |
    | zstd -3 | 66988878 | 1.761 | 3.165 | 120.38 | 127.63 | 2.47 |
    | zstd -5 | 65001259 | 2.563 | 3.261 | 82.71 | 86.07 | 2.86 |
    | zstd -10 | 60165346 | 13.242 | 3.523 | 16.01 | 16.13 | 13.22 |
    | zstd -15 | 58009756 | 47.601 | 3.654 | 4.45 | 4.46 | 21.61 |
    | zstd -19 | 54014593 | 102.835 | 3.925 | 2.06 | 2.06 | 60.15 |
    | zlib -1 | 77260026 | 2.895 | 2.744 | 73.23 | 75.85 | 0.27 |
    | zlib -3 | 72972206 | 4.116 | 2.905 | 51.50 | 52.79 | 0.27 |
    | zlib -6 | 68190360 | 9.633 | 3.109 | 22.01 | 22.24 | 0.27 |
    | zlib -9 | 67613382 | 22.554 | 3.135 | 9.40 | 9.44 | 0.27 |

    I benchmarked zstd decompression using the same method on the same
    machine. The benchmark file is located in the upstream zstd repo
    under `contrib/linux-kernel/zstd_decompress_test.c` [4]. The
    memory reported is the amount of memory required to decompress
    data compressed with the given compression level. If you know the
    maximum size of your input, you can reduce the memory usage of
    decompression irrespective of the compression level.

    | Method | Time (s) | MB/s | Adjusted MB/s | Memory (MB) |
    |----------|----------|---------|---------------|-------------|
    | none | 0.025 | 8479.54 | - | - |
    | zstd -1 | 0.358 | 592.15 | 636.60 | 0.84 |
    | zstd -3 | 0.396 | 535.32 | 571.40 | 1.46 |
    | zstd -5 | 0.396 | 535.32 | 571.40 | 1.46 |
    | zstd -10 | 0.374 | 566.81 | 607.42 | 2.51 |
    | zstd -15 | 0.379 | 559.34 | 598.84 | 4.61 |
    | zstd -19 | 0.412 | 514.54 | 547.77 | 8.80 |
    | zlib -1 | 0.940 | 225.52 | 231.68 | 0.04 |
    | zlib -3 | 0.883 | 240.08 | 247.07 | 0.04 |
    | zlib -6 | 0.844 | 251.17 | 258.84 | 0.04 |
    | zlib -9 | 0.837 | 253.27 | 261.07 | 0.04 |

    I ran a long series of tests and benchmarks on the btrfs side and the
    gains are very similar to the core benchmarks Nick ran"

    * 'zstd-minimal' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
    squashfs: Add zstd support
    btrfs: Add zstd support
    lib: Add zstd modules
    lib: Add xxhash module

    Linus Torvalds
     

09 Sep, 2017

1 commit

  • Add a test module that allows testing that CONFIG_DEBUG_VIRTUAL works
    correctly, at least that it can catch invalid calls to virt_to_phys()
    against the non-linear kernel virtual address map.

    Link: http://lkml.kernel.org/r/20170808164035.26725-1-f.fainelli@gmail.com
    Signed-off-by: Florian Fainelli
    Cc: "Luis R. Rodriguez"
    Cc: Kees Cook
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Florian Fainelli
     

16 Aug, 2017

2 commits

  • Add zstd compression and decompression kernel modules.
    zstd offers a wide variety of compression speed and quality trade-offs.
    It can compress at speeds approaching lz4, and quality approaching lzma.
    zstd decompresses at speeds more than twice as fast as zlib, and
    decompression speed remains roughly the same across all compression levels.

    The code was ported from the upstream zstd source repository. The
    `linux/zstd.h` header was modified to match linux kernel style.
    The cross-platform and allocation code was stripped out. Instead zstd
    requires the caller to pass a preallocated workspace. The source files
    were clang-formatted [1] to match the Linux Kernel style as much as
    possible. Otherwise, the code was unmodified. We would like to avoid
    as much further manual modification to the source code as possible, so it
    will be easier to keep the kernel zstd up to date.

    I benchmarked zstd compression as a special character device. I ran zstd
    and zlib compression at several levels, as well as performing no
    compression, which measures the time spent copying the data to kernel space.
    Data is passed to the compressor 4096 B at a time. The benchmark file is
    located in the upstream zstd source repository under
    `contrib/linux-kernel/zstd_compress_test.c` [2].
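
    The pattern of handing the compressor 4096 B at a time can be sketched
    in userland with Python's standard zlib module. This only illustrates
    chunked streaming compression with zlib, one of the benchmarked methods;
    it is not the kernel benchmark, and the function name
    `compress_in_chunks` and the sample data are assumptions.

```python
import zlib

CHUNK = 4096  # bytes handed to the compressor per call, as in the benchmark


def compress_in_chunks(data, level=6):
    """Compress data by feeding a streaming compressor CHUNK bytes at a time."""
    compressor = zlib.compressobj(level)
    pieces = []
    for start in range(0, len(data), CHUNK):
        pieces.append(compressor.compress(data[start:start + CHUNK]))
    pieces.append(compressor.flush())  # emit whatever is still buffered
    return b"".join(pieces)


# Illustrative input; the real benchmark uses silesia.tar.
sample = b"zstd and zlib trade speed for ratio. " * 2000
packed = compress_in_chunks(sample)
ratio = len(sample) / len(packed)
```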

    I ran the benchmarks on an Ubuntu 14.04 VM with 2 cores and 4 GiB of RAM.
    The VM is running on a MacBook Pro with a 3.1 GHz Intel Core i7 processor,
    16 GB of RAM, and an SSD. I benchmarked using `silesia.tar` [3], which is
    211,988,480 B large. Run the following commands for the benchmark:

    sudo modprobe zstd_compress_test
    sudo mknod zstd_compress_test c 245 0
    sudo cp silesia.tar zstd_compress_test

    The time is reported by the time of the userland `cp`.
    The MB/s is computed with

    211,988,480 B / time(method)

    which includes the time to copy from userland.
    The Adjusted MB/s is computed with

    211,988,480 B / (time(method) - time(none)).

    The memory reported is the amount of memory the compressor requests.

    | Method | Size (B) | Time (s) | Ratio | MB/s | Adj MB/s | Mem (MB) |
    |----------|----------|----------|-------|---------|----------|----------|
    | none | 211988480 | 0.100 | 1 | 2119.88 | - | - |
    | zstd -1 | 73645762 | 1.044 | 2.878 | 203.05 | 224.56 | 1.23 |
    | zstd -3 | 66988878 | 1.761 | 3.165 | 120.38 | 127.63 | 2.47 |
    | zstd -5 | 65001259 | 2.563 | 3.261 | 82.71 | 86.07 | 2.86 |
    | zstd -10 | 60165346 | 13.242 | 3.523 | 16.01 | 16.13 | 13.22 |
    | zstd -15 | 58009756 | 47.601 | 3.654 | 4.45 | 4.46 | 21.61 |
    | zstd -19 | 54014593 | 102.835 | 3.925 | 2.06 | 2.06 | 60.15 |
    | zlib -1 | 77260026 | 2.895 | 2.744 | 73.23 | 75.85 | 0.27 |
    | zlib -3 | 72972206 | 4.116 | 2.905 | 51.50 | 52.79 | 0.27 |
    | zlib -6 | 68190360 | 9.633 | 3.109 | 22.01 | 22.24 | 0.27 |
    | zlib -9 | 67613382 | 22.554 | 3.135 | 9.40 | 9.44 | 0.27 |
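
    The Ratio, MB/s and Adj MB/s columns can be reproduced from the Size and
    Time columns together with the 211,988,480 B size of `silesia.tar`
    stated above. A minimal Python sketch, in which the function names are
    illustrative and 1 MB is taken to be 10^6 B:

```python
INPUT_BYTES = 211_988_480  # size of silesia.tar
BASELINE_S = 0.100         # "none" row: time spent only copying from userland


def mb_per_s(seconds):
    """Raw throughput, including the copy from userland."""
    return INPUT_BYTES / seconds / 1e6


def adjusted_mb_per_s(seconds):
    """Throughput with the copy-only baseline time subtracted."""
    return INPUT_BYTES / (seconds - BASELINE_S) / 1e6


def ratio(compressed_bytes):
    """Compression ratio: uncompressed size over compressed size."""
    return INPUT_BYTES / compressed_bytes
```

    For the zstd -1 row (73645762 B, 1.044 s) these give roughly 203.05 MB/s,
    224.56 MB/s and a ratio of 2.878, matching the table.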

    I benchmarked zstd decompression using the same method on the same machine.
    The benchmark file is located in the upstream zstd repo under
    `contrib/linux-kernel/zstd_decompress_test.c` [4]. The memory reported is
    the amount of memory required to decompress data compressed with the given
    compression level. If you know the maximum size of your input, you can
    reduce the memory usage of decompression irrespective of the compression
    level.

    | Method | Time (s) | MB/s | Adjusted MB/s | Memory (MB) |
    |----------|----------|---------|---------------|-------------|
    | none | 0.025 | 8479.54 | - | - |
    | zstd -1 | 0.358 | 592.15 | 636.60 | 0.84 |
    | zstd -3 | 0.396 | 535.32 | 571.40 | 1.46 |
    | zstd -5 | 0.396 | 535.32 | 571.40 | 1.46 |
    | zstd -10 | 0.374 | 566.81 | 607.42 | 2.51 |
    | zstd -15 | 0.379 | 559.34 | 598.84 | 4.61 |
    | zstd -19 | 0.412 | 514.54 | 547.77 | 8.80 |
    | zlib -1 | 0.940 | 225.52 | 231.68 | 0.04 |
    | zlib -3 | 0.883 | 240.08 | 247.07 | 0.04 |
    | zlib -6 | 0.844 | 251.17 | 258.84 | 0.04 |
    | zlib -9 | 0.837 | 253.27 | 261.07 | 0.04 |

    Tested in userland using the test-suite in the zstd repo under
    `contrib/linux-kernel/test/UserlandTest.cpp` [5] by mocking the kernel
    functions. Fuzz tested using libfuzzer [6] with the fuzz harnesses under
    `contrib/linux-kernel/test/{RoundTripCrash.c,DecompressCrash.c}` [7] [8]
    with ASAN, UBSAN, and MSAN. Additionally, it was tested while testing the
    BtrFS and SquashFS patches coming next.

    [1] https://clang.llvm.org/docs/ClangFormat.html
    [2] https://github.com/facebook/zstd/blob/dev/contrib/linux-kernel/zstd_compress_test.c
    [3] http://sun.aei.polsl.pl/~sdeor/index.php?page=silesia
    [4] https://github.com/facebook/zstd/blob/dev/contrib/linux-kernel/zstd_decompress_test.c
    [5] https://github.com/facebook/zstd/blob/dev/contrib/linux-kernel/test/UserlandTest.cpp
    [6] http://llvm.org/docs/LibFuzzer.html
    [7] https://github.com/facebook/zstd/blob/dev/contrib/linux-kernel/test/RoundTripCrash.c
    [8] https://github.com/facebook/zstd/blob/dev/contrib/linux-kernel/test/DecompressCrash.c

    zstd source repository: https://github.com/facebook/zstd

    Signed-off-by: Nick Terrell
    Signed-off-by: Chris Mason

    Nick Terrell
     
  • Adds xxhash kernel module with xxh32 and xxh64 hashes. xxhash is an
    extremely fast non-cryptographic hash algorithm for checksumming.
    The zstd compression and decompression modules added in the next patch
    require xxhash. I extracted it out from zstd since it is useful on its
    own. I copied the code from the upstream XXHash source repository and
    translated it into kernel style. I ran benchmarks and tests in the kernel
    and tests in userland.
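
    For readers unfamiliar with the hash, the following Python sketch follows
    the published XXH32 algorithm. It is a slow reference model for
    illustration only, not the kernel implementation; the function and
    helper names are chosen for this example.

```python
import struct

PRIME32_1 = 0x9E3779B1
PRIME32_2 = 0x85EBCA77
PRIME32_3 = 0xC2B2AE3D
PRIME32_4 = 0x27D4EB2F
PRIME32_5 = 0x165667B1
MASK = 0xFFFFFFFF  # keep every intermediate value within 32 bits


def rotl32(x, r):
    return ((x << r) | (x >> (32 - r))) & MASK


def xxh32_round(acc, lane):
    acc = (acc + lane * PRIME32_2) & MASK
    return (rotl32(acc, 13) * PRIME32_1) & MASK


def xxh32(data, seed=0):
    n = len(data)
    i = 0
    if n >= 16:
        # Four accumulators each consume one little-endian 32-bit lane per stripe.
        v1 = (seed + PRIME32_1 + PRIME32_2) & MASK
        v2 = (seed + PRIME32_2) & MASK
        v3 = seed & MASK
        v4 = (seed - PRIME32_1) & MASK
        while i + 16 <= n:
            l1, l2, l3, l4 = struct.unpack_from("<4I", data, i)
            v1, v2 = xxh32_round(v1, l1), xxh32_round(v2, l2)
            v3, v4 = xxh32_round(v3, l3), xxh32_round(v4, l4)
            i += 16
        h = (rotl32(v1, 1) + rotl32(v2, 7) + rotl32(v3, 12) + rotl32(v4, 18)) & MASK
    else:
        h = (seed + PRIME32_5) & MASK
    h = (h + n) & MASK
    while i + 4 <= n:  # remaining whole 32-bit words
        (lane,) = struct.unpack_from("<I", data, i)
        h = (rotl32((h + lane * PRIME32_3) & MASK, 17) * PRIME32_4) & MASK
        i += 4
    while i < n:  # remaining single bytes
        h = (rotl32((h + data[i] * PRIME32_5) & MASK, 11) * PRIME32_1) & MASK
        i += 1
    # Final avalanche so that every input bit affects every output bit.
    h ^= h >> 15
    h = (h * PRIME32_2) & MASK
    h ^= h >> 13
    h = (h * PRIME32_3) & MASK
    h ^= h >> 16
    return h
```

    With seed 0 the empty input hashes to 0x02CC5D05, which is a convenient
    sanity check for any port.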

    I benchmarked xxhash as a special character device. I ran in four modes,
    no-op, xxh32, xxh64, and crc32. The no-op mode simply copies the data to
    kernel space and ignores it. The xxh32, xxh64, and crc32 modes compute
    hashes on the copied data. I also ran it with four different buffer sizes.
    The benchmark file is located in the upstream zstd source repository under
    `contrib/linux-kernel/xxhash_test.c` [1].

    I ran the benchmarks on an Ubuntu 14.04 VM with 2 cores and 4 GiB of RAM.
    The VM is running on a MacBook Pro with a 3.1 GHz Intel Core i7 processor,
    16 GB of RAM, and an SSD. I benchmarked using the file `filesystem.squashfs`
    from `ubuntu-16.10-desktop-amd64.iso`, which is 1,536,217,088 B large.
    Run the following commands for the benchmark:

    modprobe xxhash_test
    mknod xxhash_test c 245 0
    time cp filesystem.squashfs xxhash_test

    The time is reported by the time of the userland `cp`.
    The GB/s is computed with

    1,536,217,088 B / time(buffer size, hash)

    which includes the time to copy from userland.
    The Adjusted GB/s is computed with

    1,536,217,088 B / (time(buffer size, hash) - time(buffer size, none)).

    | Buffer Size (B) | Hash | Time (s) | GB/s | Adjusted GB/s |
    |-----------------|-------|----------|------|---------------|
    | 1024 | none | 0.408 | 3.77 | - |
    | 1024 | xxh32 | 0.649 | 2.37 | 6.37 |
    | 1024 | xxh64 | 0.542 | 2.83 | 11.46 |
    | 1024 | crc32 | 1.290 | 1.19 | 1.74 |
    | 4096 | none | 0.380 | 4.04 | - |
    | 4096 | xxh32 | 0.645 | 2.38 | 5.79 |
    | 4096 | xxh64 | 0.500 | 3.07 | 12.80 |
    | 4096 | crc32 | 1.168 | 1.32 | 1.95 |
    | 8192 | none | 0.351 | 4.38 | - |
    | 8192 | xxh32 | 0.614 | 2.50 | 5.84 |
    | 8192 | xxh64 | 0.464 | 3.31 | 13.60 |
    | 8192 | crc32 | 1.163 | 1.32 | 1.89 |
    | 16384 | none | 0.346 | 4.43 | - |
    | 16384 | xxh32 | 0.590 | 2.60 | 6.30 |
    | 16384 | xxh64 | 0.466 | 3.30 | 12.80 |
    | 16384 | crc32 | 1.183 | 1.30 | 1.84 |

    Tested in userland using the test-suite in the zstd repo under
    `contrib/linux-kernel/test/XXHashUserlandTest.cpp` [2] by mocking the
    kernel functions. A line in each branch of every function in `xxhash.c`
    was commented out to ensure that the test-suite fails. Additionally
    tested while testing zstd and with SMHasher [3].

    [1] https://phabricator.intern.facebook.com/P57526246
    [2] https://github.com/facebook/zstd/blob/dev/contrib/linux-kernel/test/XXHashUserlandTest.cpp
    [3] https://github.com/aappleby/smhasher

    zstd source repository: https://github.com/facebook/zstd
    XXHash source repository: https://github.com/cyan4973/xxhash

    Signed-off-by: Nick Terrell
    Signed-off-by: Chris Mason

    Nick Terrell
     

15 Jul, 2017

1 commit

  • This adds a new stress test driver for kmod: the kernel module loader.
    The new stress test driver, test_kmod, is only enabled as a module right
    now. It should be possible to load this as built-in and load tests
    early (refer to the force_init_test module parameter), however since a
    lot of tests can get a system out of memory fast we leave this disabled
    for now.

    Using a system with 1024 MiB of RAM can *easily* get your kernel OOM
    fast with this test driver.

    The test_kmod driver exposes API knobs for us to fine tune simple
    request_module() and get_fs_type() calls. Since these API calls each
    allow only one parameter, a test driver for them is rather simple.
    Other factors that can help out the test driver, though, are the number
    of calls we issue and knowing the current limitations of each. This exposes
    configuration as much as possible through userspace to be able to build
    tests directly from userspace.

    Since it allows multiple misc devices it will eventually (once we add a
    knob to let us create new devices at will) also be possible to perform
    more tests in parallel, provided you have enough memory.

    We only enable tests we know work as of right now.

    Demo screenshots:

    # tools/testing/selftests/kmod/kmod.sh
    kmod_test_0001_driver: OK! - loading kmod test
    kmod_test_0001_driver: OK! - Return value: 256 (MODULE_NOT_FOUND), expected MODULE_NOT_FOUND
    kmod_test_0001_fs: OK! - loading kmod test
    kmod_test_0001_fs: OK! - Return value: -22 (-EINVAL), expected -EINVAL
    kmod_test_0002_driver: OK! - loading kmod test
    kmod_test_0002_driver: OK! - Return value: 256 (MODULE_NOT_FOUND), expected MODULE_NOT_FOUND
    kmod_test_0002_fs: OK! - loading kmod test
    kmod_test_0002_fs: OK! - Return value: -22 (-EINVAL), expected -EINVAL
    kmod_test_0003: OK! - loading kmod test
    kmod_test_0003: OK! - Return value: 0 (SUCCESS), expected SUCCESS
    kmod_test_0004: OK! - loading kmod test
    kmod_test_0004: OK! - Return value: 0 (SUCCESS), expected SUCCESS
    kmod_test_0005: OK! - loading kmod test
    kmod_test_0005: OK! - Return value: 0 (SUCCESS), expected SUCCESS
    kmod_test_0006: OK! - loading kmod test
    kmod_test_0006: OK! - Return value: 0 (SUCCESS), expected SUCCESS
    kmod_test_0005: OK! - loading kmod test
    kmod_test_0005: OK! - Return value: 0 (SUCCESS), expected SUCCESS
    kmod_test_0006: OK! - loading kmod test
    kmod_test_0006: OK! - Return value: 0 (SUCCESS), expected SUCCESS
    XXX: add test result for 0007
    Test completed

    You can also request specific tests:

    # tools/testing/selftests/kmod/kmod.sh -t 0001
    kmod_test_0001_driver: OK! - loading kmod test
    kmod_test_0001_driver: OK! - Return value: 256 (MODULE_NOT_FOUND), expected MODULE_NOT_FOUND
    kmod_test_0001_fs: OK! - loading kmod test
    kmod_test_0001_fs: OK! - Return value: -22 (-EINVAL), expected -EINVAL
    Test completed

    Lastly, the current available number of tests:

    # tools/testing/selftests/kmod/kmod.sh --help
    Usage: tools/testing/selftests/kmod/kmod.sh [ -t ]
    Valid tests: 0001-0009

    0001 - Simple test - 1 thread for empty string
    0002 - Simple test - 1 thread for modules/filesystems that do not exist
    0003 - Simple test - 1 thread for get_fs_type() only
    0004 - Simple test - 2 threads for get_fs_type() only
    0005 - multithreaded tests with default setup - request_module() only
    0006 - multithreaded tests with default setup - get_fs_type() only
    0007 - multithreaded tests with default setup test request_module() and get_fs_type()
    0008 - multithreaded - push kmod_concurrent over max_modprobes for request_module()
    0009 - multithreaded - push kmod_concurrent over max_modprobes for get_fs_type()

    The following test cases currently fail, as such they are not currently
    enabled by default:

    # tools/testing/selftests/kmod/kmod.sh -t 0008
    # tools/testing/selftests/kmod/kmod.sh -t 0009

    To be sure to run them as intended please unload both of the modules:

    o test_module
    o xfs

    And ensure they are not loaded on your system prior to testing them. If
    you use these partitions for your rootfs you can change the default test
    driver used for get_fs_type() by exporting it into your environment. For
    examples of other test defaults you can override, refer to kmod.sh
    allow_user_defaults().

    Behind the scenes this is how we fine tune a test case prior to
    hitting a trigger to run it:

    cat /sys/devices/virtual/misc/test_kmod0/config
    echo -n "2" > /sys/devices/virtual/misc/test_kmod0/config_test_case
    echo -n "ext4" > /sys/devices/virtual/misc/test_kmod0/config_test_fs
    echo -n "80" > /sys/devices/virtual/misc/test_kmod0/config_num_threads
    cat /sys/devices/virtual/misc/test_kmod0/config
    echo -n "1" > /sys/devices/virtual/misc/test_kmod0/config_num_threads

    Finally to trigger:

    echo -n "1" > /sys/devices/virtual/misc/test_kmod0/trigger_config

    The kmod.sh script uses the above constructs to build different test cases.
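
    The same sequence can be driven from a script. The sketch below is a
    hypothetical Python helper, not part of kmod.sh; it writes the attribute
    files named above under a configurable directory, so it can be tried
    against a scratch directory on a machine where test_kmod is not loaded.
    The function name `run_kmod_test` and its parameters are assumptions.

```python
from pathlib import Path

# Default device directory taken from the commands above.
DEFAULT_DIR = Path("/sys/devices/virtual/misc/test_kmod0")


def run_kmod_test(test_case, fs, threads, base=DEFAULT_DIR):
    """Configure one test case and trigger it by writing the attribute files."""
    base = Path(base)
    # Each write mirrors: echo -n "<value>" > <attribute>
    (base / "config_test_case").write_text(str(test_case))
    (base / "config_test_fs").write_text(fs)
    (base / "config_num_threads").write_text(str(threads))
    # Writing "1" to trigger_config starts the configured test.
    (base / "trigger_config").write_text("1")
```

    On a real system this needs root and the test_kmod module loaded, for
    example `run_kmod_test(2, "ext4", 80)`.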

    A bit of interpretation of the current failures follows, first two
    premises:

    a) When request_module() is used userspace figures out an optimized
    version of module order for us. Once it finds the modules it needs, as
    per depmod symbol dep map, it will finit_module() the respective
    modules which are needed for the original request_module() request.

    b) We have an optimization in place whereby if a kernel uses
    request_module() on a module already loaded we never bother userspace
    as the module already is loaded. This is all handled by kernel/kmod.c.

    A few things to consider to help identify root causes of issues:

    0) kmod 19 has a broken heuristic for modules being assumed to be
    built-in to your kernel and will return 0 even though request_module()
    failed. Upgrade to a newer version of kmod.

    1) A get_fs_type() call for "xfs" will request_module() for "fs-xfs",
    not for "xfs". The optimization in kernel described in b) fails to
    catch if we have a lot of consecutive get_fs_type() calls. The reason
    is the optimization in place does not look for aliases. This means two
    consecutive get_fs_type() calls will bump kmod_concurrent, whereas
    request_module() will not.

    This is one explanation why test case 0009 fails at least once for
    get_fs_type().

    2) If a module fails to load --- for whatever reason (kmod_concurrent
    limit reached, file not yet present due to rootfs switch, out of
    memory) we have a period of time during which module requests for the
    same name either with request_module() or get_fs_type() will *also*
    fail to load even if the file for the module is ready.

    This explains why *multiple* NULLs are possible on test 0009.

    3) finit_module() consumes quite a bit of memory.

    4) Filesystems typically also have more dependent modules than other
    modules, it's important to note though that even though a get_fs_type()
    call does not incur additional kmod_concurrent bumps, since userspace
    loads dependencies it finds it needs via finit_module_fd(), it *will*
    take much more memory to load a module with a lot of dependencies.

    Because of 3) and 4) we will easily run into out of memory failures with
    certain tests. For instance test 0006 fails on qemu with 1024 MiB of RAM.
    It panics a box after reaping all userspace processes and still not
    having enough memory to reap.

    [arnd@arndb.de: add dependencies for test module]
    Link: http://lkml.kernel.org/r/20170630154834.3689272-1-arnd@arndb.de
    Link: http://lkml.kernel.org/r/20170628223155.26472-3-mcgrof@kernel.org
    Signed-off-by: Luis R. Rodriguez
    Cc: Jessica Yu
    Cc: Shuah Khan
    Cc: Rusty Russell
    Cc: Michal Marek
    Cc: Petr Mladek
    Cc: Greg Kroah-Hartman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Luis R. Rodriguez
     

13 Jul, 2017

1 commit

  • The existing tools/testing/selftests/sysctl/ tests include two test
    cases, but these use existing production kernel sysctl interfaces. We
    want to expand test coverage but we can't just be looking for random
    safe production values to poke at, that's just insane!

    Instead just dedicate a test driver for debugging purposes and port the
    existing scripts to use it. This will make it easier for further tests
    to be added.

    Subsequent patches will extend our test coverage for sysctl.

    The stress test driver uses a new license (GPL on Linux, copyleft-next
    outside of Linux). Linus was fine with this [0] and later due to Ted's
    and Alan's request ironed out an "or" language clause to use [1] which
    is already present upstream.

    [0] https://lkml.kernel.org/r/CA+55aFyhxcvD+q7tp+-yrSFDKfR0mOHgyEAe=f_94aKLsOu0Og@mail.gmail.com
    [1] https://lkml.kernel.org/r/1495234558.7848.122.camel@linux.intel.com

    Link: http://lkml.kernel.org/r/20170630224431.17374-2-mcgrof@kernel.org
    Signed-off-by: Luis R. Rodriguez
    Acked-by: Kees Cook
    Cc: "Eric W. Biederman"
    Cc: Shuah Khan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Luis R. Rodriguez
     

08 Jul, 2017

1 commit

  • Pull Writeback error handling updates from Jeff Layton:
    "This pile represents the bulk of the writeback error handling fixes
    that I have for this cycle. Some of the earlier patches in this pile
    may look trivial but they are prerequisites for later patches in the
    series.

    The aim of this set is to improve how we track and report writeback
    errors to userland. Most applications that care about data integrity
    will periodically call fsync/fdatasync/msync to ensure that their
    writes have made it to the backing store.

    For a very long time, we have tracked writeback errors using two flags
    in the address_space: AS_EIO and AS_ENOSPC. Those flags are set when a
    writeback error occurs (via mapping_set_error) and are cleared as a
    side-effect of filemap_check_errors (as you noted yesterday). This
    model really sucks for userland.

    Only the first task to call fsync (or msync or fdatasync) will see the
    error. Any subsequent task calling fsync on a file will get back 0
    (unless another writeback error occurs in the interim). If I have
    several tasks writing to a file and calling fsync to ensure that their
    writes got stored, then I need to have them coordinate with one
    another. That's difficult enough, but in a world of containerized
    setups that coordination may even not be possible.

    But wait...it gets worse!

    The calls to filemap_check_errors can be buried pretty far down in the
    call stack, and there are internal callers of filemap_write_and_wait
    and the like that also end up clearing those errors. Many of those
    callers ignore the error return from that function or return it to
    userland at nonsensical times (e.g. truncate() or stat()). If I get
    back -EIO on a truncate, there is no reason to think that it was
    because some previous writeback failed, and a subsequent fsync() will
    (incorrectly) return 0.

    This pile aims to do three things:

    1) ensure that when a writeback error occurs that that error will be
    reported to userland on a subsequent fsync/fdatasync/msync call,
    regardless of what internal callers are doing

    2) report writeback errors on all file descriptions that were open at
    the time that the error occurred. This is a user-visible change,
    but I think most applications are written to assume this behavior
    anyway. Those that aren't are unlikely to be hurt by it.

    3) document what filesystems should do when there is a writeback
    error. Today, there is very little consistency between them, and a
    lot of cargo-cult copying. We need to make it very clear what
    filesystems should do in this situation.

    To achieve this, the set adds a new data type (errseq_t) and then
    builds new writeback error tracking infrastructure around that. Once
    all of that is in place, we change the filesystems to use the new
    infrastructure for reporting wb errors to userland.

    Note that this is just the initial foray into cleaning up this mess.
    There is a lot of work remaining here:

    1) convert the rest of the filesystems in a similar fashion. Once the
    initial set is in, then I think most other fs' will be fairly
    simple to convert. Hopefully most of those can go in via individual
    filesystem trees.

    2) convert internal waiters on writeback to use errseq_t for
    detecting errors instead of relying on the AS_* flags. I have some
    draft patches for this for ext4, but they are not quite ready for
    prime time yet.

    This was a discussion topic this year at LSF/MM too. If you're
    interested in the gory details, LWN has some good articles about this:

    https://lwn.net/Articles/718734/
    https://lwn.net/Articles/724307/"

    * tag 'for-linus-v4.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux:
    btrfs: minimal conversion to errseq_t writeback error reporting on fsync
    xfs: minimal conversion to errseq_t writeback error reporting
    ext4: use errseq_t based error handling for reporting data writeback errors
    fs: convert __generic_file_fsync to use errseq_t based reporting
    block: convert to errseq_t based writeback error tracking
    dax: set errors in mapping when writeback fails
    Documentation: flesh out the section in vfs.txt on storing and reporting writeback errors
    mm: set both AS_EIO/AS_ENOSPC and errseq_t in mapping_set_error
    fs: new infrastructure for writeback error handling and reporting
    lib: add errseq_t type and infrastructure for handling it
    mm: don't TestClearPageError in __filemap_fdatawait_range
    mm: clear AS_EIO/AS_ENOSPC when writeback initiation fails
    jbd2: don't clear and reset errors after waiting on writeback
    buffer: set errors in mapping at the time that the error occurs
    fs: check for writeback errors after syncing out buffers in generic_file_fsync
    buffer: use mapping_set_error instead of setting the flag
    mm: fix mapping_set_error call in me_pagecache_dirty

    Linus Torvalds
     

06 Jul, 2017

1 commit

  • An errseq_t is a way of recording errors in one place, and allowing any
    number of "subscribers" to tell whether an error has been set again
    since a previous time.

    It's implemented as an unsigned 32-bit value that is managed with atomic
    operations. The low order bits are designated to hold an error code
    (max size of MAX_ERRNO). The upper bits are used as a counter.

    The API works with consumers sampling an errseq_t value at a particular
    point in time. Later, that value can be used to tell whether new errors
    have been set since that time.

    Note that there is a 1 in 512k risk of collisions here if new errors
    are being recorded frequently, since we have so few bits to use as a
    counter. To mitigate this, one bit is used as a flag to tell whether the
    value has been sampled since a new value was recorded. That allows
    us to avoid bumping the counter if no one has sampled it since it
    was last bumped.
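
    The scheme described above can be modelled in a few lines of Python.
    This is a simplified, single-threaded sketch: the real type is updated
    with atomic operations, and the exact bit positions used here (12 bits
    for the error code, one flag bit, 19 counter bits) are assumptions
    derived from MAX_ERRNO being 4095 and the 1 in 512k figure. The class
    and method names are illustrative.

```python
MAX_ERRNO = 4095        # largest errno value, fits in the low 12 bits
ERR_MASK = MAX_ERRNO    # low-order bits: the error code
SEEN = 1 << 12          # assumed position of the "has been sampled" flag
CTR_INC = 1 << 13       # assumed position of the counter's lowest bit
COUNTER_BITS = 32 - 13  # 19 bits, so 2**19 = 524288 ("512k") counter values
U32 = 0xFFFFFFFF


class ErrSeq:
    def __init__(self):
        self.value = 0

    def set(self, err):
        """Record a negative errno; bump the counter only if it was sampled."""
        if not (-MAX_ERRNO <= err < 0):
            return
        new = (self.value & ~(ERR_MASK | SEEN)) | -err
        if self.value & SEEN:
            new += CTR_INC
        self.value = new & U32

    def sample(self):
        """Return a cursor for later comparison, marking the value as seen."""
        if self.value:
            self.value |= SEEN
        return self.value

    def check_and_advance(self, since):
        """Return (err, new_cursor); err is 0 if nothing changed since `since`."""
        if self.value == since:
            return 0, since
        self.value |= SEEN
        return -(self.value & ERR_MASK), self.value
```

    Two subscribers that sampled before an error was set each get that error
    once, and a repeat of the same error code is still detected because the
    counter was bumped after the value had been sampled.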

    Later patches will build on this infrastructure to change how writeback
    errors are tracked in the kernel.

    Signed-off-by: Jeff Layton
    Reviewed-by: NeilBrown
    Reviewed-by: Jan Kara

    Jeff Layton
     

04 Jul, 2017

1 commit

  • Pull char/misc updates from Greg KH:
    "Here is the "big" char/misc driver patchset for 4.13-rc1.

    Lots of stuff in here, a large thunderbolt update, w1 driver header
    reorg, the new mux driver subsystem, google firmware driver updates,
    and a raft of other smaller things. Full details in the shortlog.

    All of these have been in linux-next for a while with the only
    reported issue being a merge problem with this tree and the jc-docs
    tree in the w1 documentation area"

    * tag 'char-misc-4.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (147 commits)
    misc: apds990x: Use sysfs_match_string() helper
    mei: drop unreachable code in mei_start
    mei: validate the message header only in first fragment.
    DocBook: w1: Update W1 file locations and names in DocBook
    mux: adg792a: always require I2C support
    nvmem: rockchip-efuse: add support for rk322x-efuse
    nvmem: core: add locking to nvmem_find_cell
    nvmem: core: Call put_device() in nvmem_unregister()
    nvmem: core: fix leaks on registration errors
    nvmem: correct Broadcom OTP controller driver writes
    w1: Add subsystem kernel public interface
    drivers/fsi: Add module license to core driver
    drivers/fsi: Use asynchronous slave mode
    drivers/fsi: Add hub master support
    drivers/fsi: Add SCOM FSI client device driver
    drivers/fsi/gpio: Add tracepoints for GPIO master
    drivers/fsi: Add GPIO based FSI master
    drivers/fsi: Document FSI master sysfs files in ABI
    drivers/fsi: Add error handling for slave
    drivers/fsi: Add tracepoints for low-level operations
    ...

    Linus Torvalds
     

09 Jun, 2017

2 commits

  • Add a little helper for crc4 calculations. This works 4-bits-at-a-time,
    using a simple table approach.
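
    A minimal sketch of a nibble-at-a-time, table-driven CRC4 follows. It
    is an illustration rather than the helper added by this commit: the
    name crc4_sketch is hypothetical, and the generator polynomial
    (x^4 + x^2 + x + 1, i.e. 0x7) is an assumption made for the example.
    The sketch also requires the bit count to be a multiple of four.

```c
#include <assert.h>
#include <stdint.h>

/* Remainders of each 4-bit value for the assumed polynomial 0x7. */
static const uint8_t crc4_table[16] = {
    0x0, 0x7, 0xe, 0x9, 0xb, 0xc, 0x5, 0x2,
    0x1, 0x6, 0xf, 0x8, 0xa, 0xd, 0x4, 0x3,
};

/* Fold the low `bits` bits of `data` into `crc`, most significant
 * nibble first, one table lookup per nibble. */
static uint8_t crc4_sketch(uint8_t crc, uint64_t data, int bits)
{
    int shift;

    assert(bits > 0 && bits <= 64 && bits % 4 == 0);
    assert(crc < 16);

    for (shift = bits - 4; shift >= 0; shift -= 4)
        crc = crc4_table[crc ^ ((data >> shift) & 0xf)];
    return crc;
}
```

    A receiver can check a message by running the same function over the
    payload followed by the transmitted CRC nibble; a result of zero means
    the two agree.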

    We will need this in the FSI core code, as well as any master
    implementations that need to calculate CRCs in software.

    Signed-off-by: Jeremy Kerr
    Signed-off-by: Chris Bostic
    Signed-off-by: Joel Stanley
    Signed-off-by: Greg Kroah-Hartman

    Jeremy Kerr
     
  • The sparse-based checking for non-RCU accesses to RCU-protected pointers
    has been around for a very long time, and it is now the only type of
    sparse-based checking that is optional. This commit therefore makes
    it unconditional.

    Reported-by: Ingo Molnar
    Signed-off-by: Paul E. McKenney
    Cc: Fengguang Wu

    Paul E. McKenney
     

09 May, 2017

1 commit

  • Extract the linked list sorting test code into its own source file, so
    that it can be compiled either as a loadable module or built into the
    kernel.

    Link: http://lkml.kernel.org/r/1488287219-15832-4-git-send-email-geert@linux-m68k.org
    Signed-off-by: Geert Uytterhoeven
    Reviewed-by: Andy Shevchenko
    Cc: Arnd Bergmann
    Cc: Paul Gortmaker
    Cc: Shuah Khan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Geert Uytterhoeven
     

03 May, 2017

1 commit

  • Pull crypto updates from Herbert Xu:
    "Here is the crypto update for 4.12:

    API:
    - Add batch registration for acomp/scomp
    - Change acomp testing to non-unique compressed result
    - Extend algorithm name limit to 128 bytes
    - Require setkey before accept(2) in algif_aead

    Algorithms:
    - Add support for deflate rfc1950 (zlib)

    Drivers:
    - Add accelerated crct10dif for powerpc
    - Add crc32 in stm32
    - Add sha384/sha512 in ccp
    - Add 3des/gcm(aes) for v5 devices in ccp
    - Add Queue Interface (QI) backend support in caam
    - Add new Exynos RNG driver
    - Add ThunderX ZIP driver
    - Add driver for hardware random generator on MT7623 SoC"

    * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (101 commits)
    crypto: stm32 - Fix OF module alias information
    crypto: algif_aead - Require setkey before accept(2)
    crypto: scomp - add support for deflate rfc1950 (zlib)
    crypto: scomp - allow registration of multiple scomps
    crypto: ccp - Change ISR handler method for a v5 CCP
    crypto: ccp - Change ISR handler method for a v3 CCP
    crypto: crypto4xx - rename ce_ring_contol to ce_ring_control
    crypto: testmgr - Allow ecb(cipher_null) in FIPS mode
    Revert "crypto: arm64/sha - Add constant operand modifier to ASM_EXPORT"
    crypto: ccp - Disable interrupts early on unload
    crypto: ccp - Use only the relevant interrupt bits
    hwrng: mtk - Add driver for hardware random generator on MT7623 SoC
    dt-bindings: hwrng: Add Mediatek hardware random generator bindings
    crypto: crct10dif-vpmsum - Fix missing preempt_disable()
    crypto: testmgr - replace compression known answer test
    crypto: acomp - allow registration of multiple acomps
    hwrng: n2 - Use devm_kcalloc() in n2rng_probe()
    crypto: chcr - Fix error handling related to 'chcr_alloc_shash'
    padata: get_next is never NULL
    crypto: exynos - Add new Exynos RNG driver
    ...

    Linus Torvalds
     

27 Apr, 2017

1 commit


29 Mar, 2017

1 commit


24 Mar, 2017

1 commit

  • The md5_transform function is no longer used anywhere in the tree,
    except for the crypto API's actual implementation of md5, so we can drop
    the function from lib and put it as a static function of the crypto
    file, where it belongs. There should be no new users of md5_transform,
    anyway, since there are more modern ways of doing what it once achieved.

    Signed-off-by: Jason A. Donenfeld
    Reviewed-by: Eric Biggers
    Signed-off-by: Herbert Xu

    Jason A. Donenfeld
     

01 Mar, 2017

2 commits

  • Pull IDR rewrite from Matthew Wilcox:
    "The most significant part of the following is the patch to rewrite the
    IDR & IDA to be clients of the radix tree. But there's much more,
    including an enhancement of the IDA to be significantly more space
    efficient, an IDR & IDA test suite, some improvements to the IDR API
    (and driver changes to take advantage of those improvements), several
    improvements to the radix tree test suite and RCU annotations.

    The IDR & IDA rewrite had a good spin in linux-next and Andrew's tree
    for most of the last cycle. Coupled with the IDR test suite, I feel
    pretty confident that any remaining bugs are quite hard to hit. 0-day
    did a great job of watching my git tree and pointing out problems; as
    it hit them, I added new test-cases to be sure not to be caught the
    same way twice"

    Willy goes on to expand a bit on the IDR rewrite rationale:
    "The radix tree and the IDR use very similar data structures.

    Merging the two codebases lets us share the memory allocation pools,
    and results in a net deletion of 500 lines of code. It also opens up
    the possibility of exposing more of the features of the radix tree to
    users of the IDR (and I have some interesting patches along those
    lines waiting for 4.12)

    It also shrinks the size of the 'struct idr' from 40 bytes to 24 which
    will shrink a fair few data structures that embed an IDR"

    * 'idr-4.11' of git://git.infradead.org/users/willy/linux-dax: (32 commits)
    radix tree test suite: Add config option for map shift
    idr: Add missing __rcu annotations
    radix-tree: Fix __rcu annotations
    radix-tree: Add rcu_dereference and rcu_assign_pointer calls
    radix tree test suite: Run iteration tests for longer
    radix tree test suite: Fix split/join memory leaks
    radix tree test suite: Fix leaks in regression2.c
    radix tree test suite: Fix leaky tests
    radix tree test suite: Enable address sanitizer
    radix_tree_iter_resume: Fix out of bounds error
    radix-tree: Store a pointer to the root in each node
    radix-tree: Chain preallocated nodes through ->parent
    radix tree test suite: Dial down verbosity with -v
    radix tree test suite: Introduce kmalloc_verbose
    idr: Return the deleted entry from idr_remove
    radix tree test suite: Build separate binaries for some tests
    ida: Use exceptional entries for small IDAs
    ida: Move ida_bitmap to a percpu variable
    Reimplement IDR and IDA using the radix tree
    radix-tree: Add radix_tree_iter_delete
    ...

    Linus Torvalds
     
  • Pull locking fixes from Ingo Molnar:
    "The main change is the uninlining of large refcount_t APIs, plus a
    header dependency fix.

    Note that the uninlining allowed us to enable the underflow/overflow
    warnings unconditionally and remove the debug Kconfig switch: this
    might trigger new warnings in buggy code and turn
    crashes/use-after-free bugs into less harmful memory leaks"

    * 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    locking/refcounts: Add missing kernel.h header to have UINT_MAX defined
    locking/refcounts: Out-of-line everything

    Linus Torvalds
     

26 Feb, 2017

1 commit

  • Pull rdma DMA mapping updates from Doug Ledford:
    "Drop IB DMA mapping code and use core DMA code instead.

    Bart Van Assche noted that the IB DMA mapping code was similar enough
    to the core DMA mapping code that, with a few changes, it was possible
    to remove the IB DMA mapping code entirely and switch the RDMA stack
    to use the core DMA mapping code.

    This resulted in a nice set of cleanups, but touched the entire tree
    and has been kept separate for that reason."

    * tag 'for-next-dma_ops' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (37 commits)
    IB/rxe, IB/rdmavt: Use dma_virt_ops instead of duplicating it
    IB/core: Remove ib_device.dma_device
    nvme-rdma: Switch from dma_device to dev.parent
    RDS: net: Switch from dma_device to dev.parent
    IB/srpt: Modify a debug statement
    IB/srp: Switch from dma_device to dev.parent
    IB/iser: Switch from dma_device to dev.parent
    IB/IPoIB: Switch from dma_device to dev.parent
    IB/rxe: Switch from dma_device to dev.parent
    IB/vmw_pvrdma: Switch from dma_device to dev.parent
    IB/usnic: Switch from dma_device to dev.parent
    IB/qib: Switch from dma_device to dev.parent
    IB/qedr: Switch from dma_device to dev.parent
    IB/ocrdma: Switch from dma_device to dev.parent
    IB/nes: Remove a superfluous assignment statement
    IB/mthca: Switch from dma_device to dev.parent
    IB/mlx5: Switch from dma_device to dev.parent
    IB/mlx4: Switch from dma_device to dev.parent
    IB/i40iw: Remove a superfluous assignment statement
    IB/hns: Switch from dma_device to dev.parent
    ...

    Linus Torvalds
     

25 Feb, 2017

3 commits

  • Along with the addition made to Kconfig.debug, the prior existing but
    permanently disabled test function has been slightly refactored.

    Patch has been tested using QEMU 2.1.2 with a .config obtained through
    'make defconfig' (x86_64) and manually enabling the option.

    [arnd@arndb.de: move sort self-test into a separate file]
    Link: http://lkml.kernel.org/r/20170112110657.3123790-1-arnd@arndb.de
    Link: http://lkml.kernel.org/r/HE1PR09MB0394B0418D504DCD27167D4FD49B0@HE1PR09MB0394.eurprd09.prod.outlook.com
    Signed-off-by: Kostenzer Felix
    Signed-off-by: Arnd Bergmann
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kostenzer Felix
     
  • Extract the glob test code into its own source file, so that it can be
    compiled either as a loadable module or built into the kernel.

    Link: http://lkml.kernel.org/r/1483470276-10517-2-git-send-email-geert@linux-m68k.org
    Signed-off-by: Geert Uytterhoeven
    Reviewed-by: Andy Shevchenko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Geert Uytterhoeven
     
  • Extract the crc32 test code into its own source file, so that it can be
    compiled either as a loadable module or built into the kernel.

    Link: http://lkml.kernel.org/r/1483470276-10517-1-git-send-email-geert@linux-m68k.org
    Signed-off-by: Geert Uytterhoeven
    Reviewed-by: Andy Shevchenko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Geert Uytterhoeven
     

24 Feb, 2017

2 commits

  • Linus asked to please make this real C code.

    And since size then isn't an issue whatsoever anymore, remove the
    debug knob and make all WARN()s unconditional.

    Suggested-by: Linus Torvalds
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: dwindsor@gmail.com
    Cc: elena.reshetova@intel.com
    Cc: gregkh@linuxfoundation.org
    Cc: ishkamiel@gmail.com
    Cc: keescook@chromium.org
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Pull drm updates from Dave Airlie:
    "This is the main drm pull request for v4.11.

    Nothing too major, the tinydrm and mmu-less support should make
    writing smaller drivers easier for some of the simpler platforms, and
    there are a bunch of documentation updates.

    Intel grew displayport MST audio support which is hopefully useful to
    people, and FBC is on by default for GEN9+ (so people know where to
    look for regressions). AMDGPU has a lot of fixes that would like new
    firmware files installed for some GPUs.

    Other than that it's pretty scattered all over.

    I may have a follow up pull request as I know BenH has a bunch of AST
    rework and fixes and I'd like to get those in once they've been tested
    by AST, and I've got at least one pull request I'm just trying to get
    the author to fix up.

    Core:
    - drm_mm reworked
    - Connector list locking and iterators
    - Documentation updates
    - Format handling rework
    - MMU-less support for fbdev helpers
    - drm_crtc_from_index helper
    - Core CRC API
    - Remove drm_framebuffer_unregister_private
    - Debugfs cleanup
    - EDID/Infoframe fixes
    - Release callback
    - Tinydrm support (smaller drivers for simple hw)

    panel:
    - Add support for some new simple panels

    i915:
    - FBC by default for gen9+
    - Shared dpll cleanups and docs
    - GEN8 powerdomain cleanup
    - DMC support on GLK
    - DP MST audio support
    - HuC loading support
    - GVT init ordering fixes
    - GVT IOMMU workaround fix

    amdgpu/radeon:
    - Power/clockgating improvements
    - Preliminary SR-IOV support
    - TTM buffer priority and eviction fixes
    - SI DPM quirks removed due to firmware fixes
    - Powerplay improvements
    - VCE/UVD powergating fixes
    - Cleanup SI GFX code to match CI/VI
    - Support for > 2 displays on 3/5 crtc asics
    - SI headless fixes

    nouveau:
    - Rework secure boot code in prep for GP10x secure boot
    - Channel recovery improvements
    - Initial power budget code
    - MMU rework preparation

    vmwgfx:
    - Bunch of fixes and cleanups

    exynos:
    - Runtime PM support for MIC driver
    - Cleanups to use atomic helpers
    - UHD Support for TM2/TM2E boards
    - Trigger mode fix for Rinato board

    etnaviv:
    - Shader performance fix
    - Command stream validator fixes
    - Command buffer suballocator

    rockchip:
    - CDN DisplayPort support
    - IOMMU support for arm64 platform

    imx-drm:
    - Fix i.MX5 TV encoder probing
    - Remove lower fb size limits

    msm:
    - Support for HW cursor on MDP5 devices
    - DSI encoder cleanup
    - GPU DT bindings cleanup

    sti:
    - stih410 cleanups
    - Create fbdev at binding
    - HQVDP fixes
    - Remove stih416 chip functionality
    - DVI/HDMI mode selection fixes
    - FPS statistic reporting

    omapdrm:
    - IRQ code cleanup

    dwi-hdmi bridge:
    - Cleanups and fixes

    adv-bridge:
    - Updates for nexus

    sii8520 bridge:
    - Add interlace mode support
    - Rework HDMI and lots of fixes

    qxl:
    - probing/teardown cleanups

    ZTE drm:
    - HDMI audio via SPDIF interface
    - Video Layer overlay plane support
    - Add TV encoder output device

    atmel-hlcdc:
    - Rework fbdev creation logic

    tegra:
    - OF node fix

    fsl-dcu:
    - Minor fixes

    mali-dp:
    - Assorted fixes

    sunxi:
    - Minor fix"

    [ This was the "fixed" pull, that still had build warnings due to people
    not even having build tested the result. I'm not a happy camper

    I've fixed the things I noticed up in this merge. - Linus ]

    * tag 'drm-for-v4.11-less-shouty' of git://people.freedesktop.org/~airlied/linux: (1177 commits)
    lib/Kconfig: make PRIME_NUMBERS not user selectable
    drm/tinydrm: helpers: Properly fix backlight dependency
    drm/tinydrm: mipi-dbi: Fix field width specifier warning
    drm/tinydrm: mipi-dbi: Silence: ‘cmd’ may be used uninitialized
    drm/sti: fix build warnings in sti_drv.c and sti_vtg.c files
    drm/amd/powerplay: fix PSI feature on Polars12
    drm/amdgpu: refuse to reserve io mem for split VRAM buffers
    drm/ttm: fix use-after-free races in vm fault handling
    drm/tinydrm: Add support for Multi-Inno MI0283QT display
    dt-bindings: Add Multi-Inno MI0283QT binding
    dt-bindings: display/panel: Add common rotation property
    of: Add vendor prefix for Multi-Inno
    drm/tinydrm: Add MIPI DBI support
    drm/tinydrm: Add helper functions
    drm: Add DRM support for tiny LCD displays
    drm/amd/amdgpu: post card if there is real hw resetting performed
    drm/nouveau/tmr: provide backtrace when a timeout is hit
    drm/nouveau/pci/g92: Fix rearm
    drm/nouveau/drm/therm/fan: add a fallback if no fan control is specified in the vbios
    drm/nouveau/hwmon: expose power_max and power_crit
    ..

    Linus Torvalds
     

23 Feb, 2017

1 commit

  • Pull networking updates from David Miller:
    "Highlights:

    1) Support TX_RING in AF_PACKET TPACKET_V3 mode, from Sowmini
    Varadhan.

    2) Simplify classifier state on sk_buff in order to shrink it a bit.
    From Willem de Bruijn.

    3) Introduce SIPHASH and its usage for secure sequence numbers and
    syncookies. From Jason A. Donenfeld.

    4) Reduce CPU usage for ICMP replies we are going to limit or
    suppress, from Jesper Dangaard Brouer.

    5) Introduce Shared Memory Communications socket layer, from Ursula
    Braun.

    6) Add RACK loss detection and allow it to actually trigger fast
    recovery instead of just assisting after other algorithms have
    triggered it. From Yuchung Cheng.

    7) Add xmit_more and BQL support to mvneta driver, from Simon Guinot.

    8) skb_cow_data avoidance in esp4 and esp6, from Steffen Klassert.

    9) Export MPLS packet stats via netlink, from Robert Shearman.

    10) Significantly improve inet port bind conflict handling, especially
    when an application is restarted and changes its setting of
    reuseport. From Josef Bacik.

    11) Implement TX batching in vhost_net, from Jason Wang.

    12) Extend the dummy device so that VF (virtual function) features,
    such as configuration, can be more easily tested. From Phil
    Sutter.

    13) Avoid two atomic ops per page on x86 in bnx2x driver, from Eric
    Dumazet.

    14) Add new bpf MAP, implementing a longest prefix match trie. From
    Daniel Mack.

    15) Packet sample offloading support in mlxsw driver, from Yotam Gigi.

    16) Add new aquantia driver, from David VomLehn.

    17) Add bpf tracepoints, from Daniel Borkmann.

    18) Add support for port mirroring to b53 and bcm_sf2 drivers, from
    Florian Fainelli.

    19) Remove custom busy polling in many drivers; it has been done in
    core networking since 4.5. From Eric Dumazet.

    20) Support XDP adjust_head in virtio_net, from John Fastabend.

    21) Fix several major holes in neighbour entry confirmation, from
    Julian Anastasov.

    22) Add XDP support to bnxt_en driver, from Michael Chan.

    23) VXLAN offloads for enic driver, from Govindarajulu Varadarajan.

    24) Add IPVTAP driver (IP-VLAN based tap driver) from Sainath Grandhi.

    25) Support GRO in IPSEC protocols, from Steffen Klassert"

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1764 commits)
    Revert "ath10k: Search SMBIOS for OEM board file extension"
    net: socket: fix recvmmsg not returning error from sock_error
    bnxt_en: use eth_hw_addr_random()
    bpf: fix unlocking of jited image when module ronx not set
    arch: add ARCH_HAS_SET_MEMORY config
    net: napi_watchdog() can use napi_schedule_irqoff()
    tcp: Revert "tcp: tcp_probe: use spin_lock_bh()"
    net/hsr: use eth_hw_addr_random()
    net: mvpp2: enable building on 64-bit platforms
    net: mvpp2: switch to build_skb() in the RX path
    net: mvpp2: simplify MVPP2_PRS_RI_* definitions
    net: mvpp2: fix indentation of MVPP2_EXT_GLOBAL_CTRL_DEFAULT
    net: mvpp2: remove unused register definitions
    net: mvpp2: simplify mvpp2_bm_bufs_add()
    net: mvpp2: drop useless fields in mvpp2_bm_pool and related code
    net: mvpp2: remove unused 'tx_skb' field of 'struct mvpp2_tx_queue'
    net: mvpp2: release reference to txq_cpu[] entry after unmapping
    net: mvpp2: handle too large value in mvpp2_rx_time_coal_set()
    net: mvpp2: handle too large value handling in mvpp2_rx_pkts_coal_set()
    net: mvpp2: remove useless arguments in mvpp2_rx_{pkts, time}_coal_set
    ...

    Linus Torvalds
     

14 Feb, 2017

2 commits

  • Where we use the radix tree iteration macros, we need to annotate 'slot'
    with __rcu. Make sure we don't forget any new places in the future with
    the same CFLAGS check used for radix-tree.c.

    Signed-off-by: Matthew Wilcox

    Matthew Wilcox
     
  • Many places were missing __rcu annotations. A few places needed a few
    lines of explanation about why it was safe to not use RCU accessors.
    Add a custom CFLAGS setting to the Makefile to ensure that new patches
    don't miss RCU annotations.

    Signed-off-by: Matthew Wilcox

    Matthew Wilcox
     

04 Feb, 2017

1 commit

  • This introduces an infrastructure for management of linear priority
    areas. Priority order in an array matters, however order of items inside
    a priority group does not matter.

    As an initial implementation, the L-sort algorithm is used. It is quite
    trivial. A more advanced algorithm called P-sort will be introduced as a
    follow-up. The infrastructure is prepared for other algos.

    Alongside this, a testing module is introduced as well.

    Signed-off-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Jiri Pirko
     

03 Feb, 2017

1 commit

  • The "half md4" transform should not be used by any new code. And
    fortunately, it's only used now by ext4. Since ext4 supports several
    hashing methods, at some point it might be desirable to move to
    something like SipHash. As an intermediate step, remove half md4 from
    cryptohash.h and lib, and make it just a local function in ext4's
    hash.c. There's precedent for doing this; the other function ext4 can use
    for its hashes -- TEA -- is also implemented in the same place. Also, by
    being a local function, this might allow gcc to perform some additional
    optimizations.

    Signed-off-by: Jason A. Donenfeld
    Reviewed-by: Andreas Dilger
    Cc: Theodore Ts'o
    Signed-off-by: Theodore Ts'o

    Jason A. Donenfeld
     

25 Jan, 2017

2 commits

  • Several RDMA drivers (hfi1, qib and rxe) expect that ib_sge.addr
    is a virtual address. Provide DMA mapping operations that are
    suitable for these drivers.

    Signed-off-by: Bart Van Assche
    Cc: Christian Borntraeger
    Cc: Joerg Roedel
    Cc: Andy Lutomirski
    Cc: Michael S. Tsirkin
    Signed-off-by: Doug Ledford

    Bart Van Assche
     
  • Reduce the kernel size by only building dma_noop_ops for those
    architectures that actually use it. This was suggested by
    Christoph Hellwig.

    Signed-off-by: Bart Van Assche
    Cc: Christian Borntraeger
    Cc: Joerg Roedel
    Cc: Andy Lutomirski
    Cc: Michael S. Tsirkin
    Cc: Christoph Hellwig
    Signed-off-by: Doug Ledford

    Bart Van Assche
     

10 Jan, 2017

1 commit

  • SipHash is a 64-bit keyed hash function that is actually a
    cryptographically secure PRF, like HMAC. Except SipHash is super fast,
    and is meant to be used as a hashtable keyed lookup function, or as a
    general PRF for short input use cases, such as sequence numbers or RNG
    chaining.

    For the first usage:

    There are a variety of attacks known as "hashtable poisoning" in which an
    attacker crafts inputs that all hash to the same value, and then
    proceeds to fill up all entries of a single hash bucket. This is
    a realistic and well-known denial-of-service vector. Currently
    hashtables use jhash, which is fast but not secure, and some kind of
    rotating key scheme (or none at all, which isn't good). SipHash is meant
    as a replacement for jhash in these cases.

    There are a number of places in the kernel that are vulnerable to
    hashtable poisoning attacks, either via userspace vectors or network
    vectors, and there's not a reliable mechanism inside the kernel at the
    moment to fix it. The first step toward fixing these issues is actually
    getting a secure primitive into the kernel for developers to use. Then
    we can, bit by bit, port things over to it as deemed appropriate.

    While SipHash is extremely fast for a cryptographically secure function,
    it is likely a bit slower than the insecure jhash, and so replacements
    will be evaluated on a case-by-case basis based on whether or not the
    difference in speed is negligible and whether or not the current jhash usage
    poses a real security risk.

    For the second usage:

    A few places in the kernel are using MD5 or SHA1 for creating secure
    sequence numbers, syn cookies, port numbers, or fast random numbers.
    SipHash is a faster, more fitting, and more secure replacement for MD5
    in those situations. Replacing MD5 and SHA1 with SipHash for these uses is
    obvious and straight-forward, and so is submitted along with this patch
    series. There shouldn't be much of a debate over its efficacy.

    Dozens of languages are already using this internally for their hash
    tables and PRFs. Some of the BSDs already use this in their kernels.
    SipHash is a widely known high-speed solution to a widely known set of
    problems, and it's time we catch up.
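
    For readers unfamiliar with the primitive, here is a minimal,
    portable sketch of SipHash-2-4 (two compression rounds per 8-byte word,
    four finalization rounds) as publicly specified. It is not the code
    added by this patch; the function name siphash24_sketch and its
    signature are hypothetical, and the 128-bit key is passed as two
    little-endian 64-bit halves, k0 and k1.

```c
#include <stddef.h>
#include <stdint.h>

static uint64_t rotl64(uint64_t x, int b)
{
    return (x << b) | (x >> (64 - b));
}

/* Read 8 bytes as a little-endian 64-bit word, independent of host order. */
static uint64_t load_le64(const uint8_t *p)
{
    uint64_t v = 0;
    int i;

    for (i = 7; i >= 0; i--)
        v = (v << 8) | p[i];
    return v;
}

#define SIPROUND do {                                              \
    v0 += v1; v1 = rotl64(v1, 13); v1 ^= v0; v0 = rotl64(v0, 32);  \
    v2 += v3; v3 = rotl64(v3, 16); v3 ^= v2;                       \
    v0 += v3; v3 = rotl64(v3, 21); v3 ^= v0;                       \
    v2 += v1; v1 = rotl64(v1, 17); v1 ^= v2; v2 = rotl64(v2, 32);  \
} while (0)

static uint64_t siphash24_sketch(const uint8_t *data, size_t len,
                                 uint64_t k0, uint64_t k1)
{
    /* State is the key mixed with four fixed constants. */
    uint64_t v0 = k0 ^ 0x736f6d6570736575ULL;
    uint64_t v1 = k1 ^ 0x646f72616e646f6dULL;
    uint64_t v2 = k0 ^ 0x6c7967656e657261ULL;
    uint64_t v3 = k1 ^ 0x7465646279746573ULL;
    const uint8_t *end = data + (len - len % 8);
    uint64_t last = (uint64_t)len << 56;   /* length goes in the top byte */
    uint64_t m;
    size_t i;

    /* Compression: absorb each full 8-byte word with two rounds. */
    for (; data != end; data += 8) {
        m = load_le64(data);
        v3 ^= m;
        SIPROUND; SIPROUND;
        v0 ^= m;
    }

    /* Final word: the 0..7 leftover bytes plus the length byte. */
    for (i = 0; i < len % 8; i++)
        last |= (uint64_t)data[i] << (8 * i);
    v3 ^= last;
    SIPROUND; SIPROUND;
    v0 ^= last;

    /* Finalization: four rounds, then fold the state to 64 bits. */
    v2 ^= 0xff;
    SIPROUND; SIPROUND; SIPROUND; SIPROUND;
    return v0 ^ v1 ^ v2 ^ v3;
}
```

    A hashtable would typically reduce the 64-bit result to a bucket index
    and keep the key secret and random, so that an attacker cannot predict
    which inputs collide.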

    Signed-off-by: Jason A. Donenfeld
    Reviewed-by: Jean-Philippe Aumasson
    Cc: Linus Torvalds
    Cc: Eric Biggers
    Cc: David Laight
    Cc: Eric Dumazet
    Signed-off-by: David S. Miller

    Jason A. Donenfeld
     

09 Jan, 2017

1 commit

  • First -misc pull for 4.11:
    - drm_mm rework + lots of selftests (Chris Wilson)
    - new connector_list locking+iterators
    - plenty of kerneldoc updates
    - format handling rework from Ville
    - atomic helper changes from Maarten for better plane corner-case handling
    in drivers, plus the i915 legacy cursor patch that needs this
    - bridge cleanup from Laurent
    - plus plenty of small stuff all over
    - also contains a merge of the 4.10 docs tree so that we could apply the
    dma-buf kerneldoc patches

    It's a lot more than usual, but due to the merge window blackout it also
    covers about 4 weeks, so all in line again on a per-week basis. The more
    annoying part with no pull request for 4 weeks is managing cross-tree
    work. The -intel pull request I'll follow up with does conflict quite a
    bit with -misc here. Longer-term (if drm-misc keeps growing) a
    drm-next-queued to accept pull request for the next merge window during
    this time might be useful.

    I'd also like to backmerge -rc2+this into drm-intel next week, we have
    quite a pile of patches waiting for the stuff in here.

    * tag 'drm-misc-next-2016-12-30' of git://anongit.freedesktop.org/git/drm-misc: (126 commits)
    drm: Add kerneldoc markup for new @scan parameters in drm_mm
    drm/mm: Document locking rules
    drm: Use drm_mm_insert_node_in_range_generic() for everyone
    drm: Apply range restriction after color adjustment when allocation
    drm: Wrap drm_mm_node.hole_follows
    drm: Apply tight eviction scanning to color_adjust
    drm: Simplify drm_mm scan-list manipulation
    drm: Optimise power-of-two alignments in drm_mm_scan_add_block()
    drm: Compute tight evictions for drm_mm_scan
    drm: Fix application of color vs range restriction when scanning drm_mm
    drm: Unconditionally do the range check in drm_mm_scan_add_block()
    drm: Rename prev_node to hole in drm_mm_scan_add_block()
    drm: Fix O= out-of-tree builds for selftests
    drm: Extract struct drm_mm_scan from struct drm_mm
    drm: Add asserts to catch overflow in drm_mm_init() and drm_mm_init_scan()
    drm: Simplify drm_mm_clean()
    drm: Detect overflow in drm_mm_reserve_node()
    drm: Fix kerneldoc for drm_mm_scan_remove_block()
    drm: Promote drm_mm alignment to u64
    drm: kselftest for drm_mm and restricted color eviction
    ...

    Dave Airlie
     

27 Dec, 2016

1 commit

  • Prime numbers are interesting for testing components that use multiplies
    and divides, such as testing DRM's struct drm_mm alignment computations.

    v2: Move to lib/, add selftest
    v3: Fix initial constants (exclude 0/1 from being primes)
    v4: More RCU markup to keep 0day/sparse happy
    v5: Fix RCU unwind on module exit, add to kselftests
    v6: Tidy computation of bitmap size
    v7: for_each_prime_number_from()
    v8: Compose small-primes using BIT() for easier verification
    v9: Move rcu dance entirely into callers.
    v10: Improve quote for Bertrand's Postulate (aka Chebyshev's theorem)
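
    As a rough illustration of the kind of helper a test could use to walk
    prime values, here is a small sketch. It uses plain trial division, not
    the bitmap approach the changelog above refers to, and every name in it
    (is_prime_sketch, next_prime_sketch, count_primes_sketch) is
    hypothetical.

```c
#include <stdbool.h>

/* Trial division; 0 and 1 are deliberately not treated as primes. */
static bool is_prime_sketch(unsigned long n)
{
    unsigned long d;

    if (n < 2)
        return false;
    for (d = 2; d <= n / d; d++)
        if (n % d == 0)
            return false;
    return true;
}

/* Smallest prime strictly greater than n. By Bertrand's postulate there
 * is always one below 2n for n > 1, so the search is short. Overflow
 * near ULONG_MAX is not handled in this sketch. */
static unsigned long next_prime_sketch(unsigned long n)
{
    do {
        n++;
    } while (!is_prime_sketch(n));
    return n;
}

/* Count the primes in [2, max] by stepping from one prime to the next,
 * the way a test loop over prime-sized inputs might. */
static unsigned long count_primes_sketch(unsigned long max)
{
    unsigned long p, count = 0;

    for (p = next_prime_sketch(1); p <= max; p = next_prime_sketch(p))
        count++;
    return count;
}
```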

    Signed-off-by: Chris Wilson
    Cc: Lukas Wunner
    Reviewed-by: Joonas Lahtinen
    Signed-off-by: Daniel Vetter
    Link: http://patchwork.freedesktop.org/patch/msgid/20161222144514.3911-1-chris@chris-wilson.co.uk

    Chris Wilson
     

25 Dec, 2016

1 commit

  • hotcpu_notifier(), cpu_notifier(), __hotcpu_notifier(), __cpu_notifier(),
    register_hotcpu_notifier(), register_cpu_notifier(),
    __register_hotcpu_notifier(), __register_cpu_notifier(),
    unregister_hotcpu_notifier(), unregister_cpu_notifier(),
    __unregister_hotcpu_notifier(), __unregister_cpu_notifier()

    are unused now. Remove them and all related code.

    Remove also the now pointless cpu notifier error injection mechanism. The
    states can be executed step by step and error rollback is the same as cpu
    down, so any state transition can be tested w/o requiring the notifier
    error injection.

    Some CPU hotplug states are kept as they are (ab)used for hotplug state
    tracking.

    Signed-off-by: Sebastian Andrzej Siewior
    Signed-off-by: Thomas Gleixner
    Cc: Peter Zijlstra
    Cc: rt@linutronix.de
    Link: http://lkml.kernel.org/r/20161221192112.005642358@linutronix.de
    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     

12 Oct, 2016

1 commit

  • There's no point in collecting coverage from lib/stackdepot.c, as it is
    not a function of syscall inputs. Disabling kcov instrumentation for that
    file will reduce the coverage noise level.

    Link: http://lkml.kernel.org/r/1474640972-104131-1-git-send-email-glider@google.com
    Signed-off-by: Alexander Potapenko
    Acked-by: Dmitry Vyukov
    Cc: Kostya Serebryany
    Cc: Andrey Konovalov
    Cc: syzkaller
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexander Potapenko
     

08 Oct, 2016

1 commit

  • Pull block layer updates from Jens Axboe:
    "This is the main pull request for block layer changes in 4.9.

    As mentioned at the last merge window, I've changed things up and now
    do just one branch for core block layer changes, and driver changes.
    This avoids dependencies between the two branches. Outside of this
    main pull request, there are two topical branches coming as well.

    This pull request contains:

    - A set of fixes, and a conversion to blk-mq, of nbd. From Josef.

    - Set of fixes and updates for lightnvm from Matias, Simon, and Arnd.
    Followup dependency fix from Geert.

    - General fixes from Bart, Baoyou, Guoqing, and Linus W.

    - CFQ async write starvation fix from Glauber.

    - Add support for delayed kick of the requeue list, from Mike.

    - Pull out the scalable bitmap code from blk-mq-tag.c and make it
    generally available under the name of sbitmap. Only blk-mq-tag uses
    it for now, but the blk-mq scheduling bits will use it as well.
    From Omar.

    - bdev thaw error propagation from Pierre.

    - Improve the blk polling statistics, and allow the user to clear
    them. From Stephen.

    - Set of minor cleanups from Christoph in block/blk-mq.

    - Set of cleanups and optimizations from me for block/blk-mq.

    - Various nvme/nvmet/nvmeof fixes from the various folks"

    * 'for-4.9/block' of git://git.kernel.dk/linux-block: (54 commits)
    fs/block_dev.c: return the right error in thaw_bdev()
    nvme: Pass pointers, not dma addresses, to nvme_get/set_features()
    nvme/scsi: Remove power management support
    nvmet: Make dsm number of ranges zero based
    nvmet: Use direct IO for writes
    admin-cmd: Added smart-log command support.
    nvme-fabrics: Add host_traddr options field to host infrastructure
    nvme-fabrics: revise host transport option descriptions
    nvme-fabrics: rework nvmf_get_address() for variable options
    nbd: use BLK_MQ_F_BLOCKING
    blkcg: Annotate blkg_hint correctly
    cfq: fix starvation of asynchronous writes
    blk-mq: add flag for drivers wanting blocking ->queue_rq()
    blk-mq: remove non-blocking pass in blk_mq_map_request
    blk-mq: get rid of manual run of queue with __blk_mq_run_hw_queue()
    block: export bio_free_pages to other modules
    lightnvm: propagate device_add() error code
    lightnvm: expose device geometry through sysfs
    lightnvm: control life of nvm_dev in driver
    blk-mq: register device instead of disk
    ...

    Linus Torvalds
     

21 Sep, 2016

1 commit

  • This commit introduces a generic library to estimate either the min or
    max value of a time-varying variable over a recent time window. This
    is code originally from Kathleen Nichols. The current form of the code
    is from Van Jacobson.

    A single struct minmax_sample will track the estimated windowed-max
    value of the series if you call minmax_running_max() or the estimated
    windowed-min value of the series if you call minmax_running_min().

    Nearly equivalent code is already in place for minimum RTT estimation
    in the TCP stack. This commit extracts that code and generalizes it to
    handle both min and max. Moving the code here reduces the footprint
    and complexity of the TCP code base and makes the filter generally
    available for other parts of the codebase, including an upcoming TCP
    congestion control module.

    This library works well for time series where the measurements are
    smoothly increasing or decreasing.
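
    As a rough illustration of the idea, here is a minimal userspace
    sketch of a windowed-max filter that keeps the best, 2nd-best, and
    3rd-best samples seen in the window. It is not the code added by this
    commit: the names wsample, wmax, wmax_reset(), wmax_update(), and
    wmax_example() are hypothetical, and the quarter- and half-window
    refresh rules are an assumption about how such a filter can keep its
    fallback candidates fresh.

```c
#include <assert.h>
#include <stdint.h>

/* One measurement: timestamp and value (illustrative names). */
struct wsample {
	uint32_t t;
	uint32_t v;
};

/* s[0] is the largest sample in the window; s[1] and s[2] are the
 * 2nd and 3rd best, each at least as recent as the one before it. */
struct wmax {
	struct wsample s[3];
};

/* Forget all history and restart the filter from a single sample. */
static uint32_t wmax_reset(struct wmax *m, uint32_t t, uint32_t v)
{
	struct wsample val = { t, v };

	m->s[0] = m->s[1] = m->s[2] = val;
	return m->s[0].v;
}

/* Feed one measurement; returns the estimated max over the last win units. */
static uint32_t wmax_update(struct wmax *m, uint32_t win, uint32_t t, uint32_t v)
{
	struct wsample val = { t, v };
	uint32_t dt;

	/* New maximum, or every stored sample has aged out of the window. */
	if (v >= m->s[0].v || t - m->s[2].t > win)
		return wmax_reset(m, t, v);

	if (v >= m->s[1].v)
		m->s[2] = m->s[1] = val;
	else if (v >= m->s[2].v)
		m->s[2] = val;

	dt = t - m->s[0].t;
	if (dt > win) {
		/* Best sample expired: promote the 2nd and 3rd choices. */
		m->s[0] = m->s[1];
		m->s[1] = m->s[2];
		m->s[2] = val;
		if (t - m->s[0].t > win) {
			m->s[0] = m->s[1];
			m->s[1] = m->s[2];
			m->s[2] = val;
		}
	} else if (m->s[1].t == m->s[0].t && dt > win / 4) {
		/* No distinct 2nd choice after a quarter window: take a new one. */
		m->s[2] = m->s[1] = val;
	} else if (m->s[2].t == m->s[1].t && dt > win / 2) {
		/* No distinct 3rd choice after half a window: take a new one. */
		m->s[2] = val;
	}
	return m->s[0].v;
}

/* Usage example with a window of 10 time units. */
void wmax_example(void)
{
	struct wmax m;

	assert(wmax_reset(&m, 0, 5) == 5);
	assert(wmax_update(&m, 10, 1, 3) == 5);
	assert(wmax_update(&m, 10, 2, 9) == 9);  /* new max resets the filter */
	assert(wmax_update(&m, 10, 5, 4) == 9);
	assert(wmax_update(&m, 10, 8, 6) == 9);
	assert(wmax_update(&m, 10, 13, 2) == 6); /* the 9 at t=2 has expired */
}
```

    A windowed-min filter would follow the same pattern with the
    comparisons reversed. State stays at three samples no matter how long
    the window is, which is why the result is an estimate rather than an
    exact maximum.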

    Signed-off-by: Van Jacobson
    Signed-off-by: Neal Cardwell
    Signed-off-by: Yuchung Cheng
    Signed-off-by: Nandita Dukkipati
    Signed-off-by: Eric Dumazet
    Signed-off-by: Soheil Hassas Yeganeh
    Signed-off-by: David S. Miller

    Neal Cardwell
     

17 Sep, 2016

1 commit

  • This is a generally useful data structure, so make it available to
    anyone else who might want to use it. It's also a nice cleanup
    separating the allocation logic from the rest of the tag handling logic.

    The code is behind a new Kconfig option, CONFIG_SBITMAP, which is only
    selected by CONFIG_BLOCK for now.

    This should be a complete noop functionality-wise.
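
    For readers unfamiliar with bitmap-based allocation, the general
    technique can be sketched in userspace C as below. This is only an
    illustration under stated assumptions, not the sbitmap API: the names
    tag_map, tag_map_init(), tag_get(), tag_put(), and tag_example() are
    hypothetical, and the per-caller hint used to start the search in
    different words is an assumed way of reducing contention.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>

#define TAG_WORDS 4
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Each bit is one tag; a set bit means the tag is in use. */
struct tag_map {
	atomic_ulong word[TAG_WORDS];
};

static void tag_map_init(struct tag_map *map)
{
	for (unsigned int i = 0; i < TAG_WORDS; i++)
		atomic_init(&map->word[i], 0UL);
}

/* Allocate a free tag, starting the search at word (hint % TAG_WORDS).
 * Returns the tag number, or -1 if every bit is already set. */
static int tag_get(struct tag_map *map, unsigned int hint)
{
	for (unsigned int i = 0; i < TAG_WORDS; i++) {
		unsigned int w = (hint + i) % TAG_WORDS;

		for (unsigned int b = 0; b < BITS_PER_WORD; b++) {
			unsigned long mask = 1UL << b;

			/* Atomically set the bit; we own it if it was clear. */
			if (!(atomic_fetch_or(&map->word[w], mask) & mask))
				return (int)(w * BITS_PER_WORD + b);
		}
	}
	return -1;
}

/* Release a tag previously returned by tag_get(). */
static void tag_put(struct tag_map *map, int tag)
{
	unsigned int w = (unsigned int)tag / BITS_PER_WORD;
	unsigned long mask = 1UL << ((unsigned int)tag % BITS_PER_WORD);

	atomic_fetch_and(&map->word[w], ~mask);
}

/* Usage example: allocate, release, and reallocate a tag. */
void tag_example(void)
{
	struct tag_map map;

	tag_map_init(&map);
	assert(tag_get(&map, 0) == 0);
	assert(tag_get(&map, 0) == 1);
	assert(tag_get(&map, 1) == (int)BITS_PER_WORD); /* first bit of word 1 */
	tag_put(&map, 0);
	assert(tag_get(&map, 0) == 0);                  /* freed tag is reused */
}
```

    Keeping the allocator in its own structure like this is what lets
    other code reuse it without pulling in any tag handling logic.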

    Signed-off-by: Omar Sandoval
    Signed-off-by: Jens Axboe

    Omar Sandoval