12 Oct, 2016

1 commit

  • There's no point in collecting coverage from lib/stackdepot.c, as it is
    not a function of syscall inputs. Disabling kcov instrumentation for that
    file will reduce the coverage noise level.

    Link: http://lkml.kernel.org/r/1474640972-104131-1-git-send-email-glider@google.com
    Signed-off-by: Alexander Potapenko
    Acked-by: Dmitry Vyukov
    Cc: Kostya Serebryany
    Cc: Andrey Konovalov
    Cc: syzkaller
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexander Potapenko
     

08 Oct, 2016

1 commit

  • Pull block layer updates from Jens Axboe:
    "This is the main pull request for block layer changes in 4.9.

    As mentioned at the last merge window, I've changed things up and now
    do just one branch for core block layer changes, and driver changes.
    This avoids dependencies between the two branches. Outside of this
    main pull request, there are two topical branches coming as well.

    This pull request contains:

    - A set of fixes, and a conversion to blk-mq, of nbd. From Josef.

    - Set of fixes and updates for lightnvm from Matias, Simon, and Arnd.
    Followup dependency fix from Geert.

    - General fixes from Bart, Baoyou, Guoqing, and Linus W.

    - CFQ async write starvation fix from Glauber.

    - Add supprot for delayed kick of the requeue list, from Mike.

    - Pull out the scalable bitmap code from blk-mq-tag.c and make it
    generally available under the name of sbitmap. Only blk-mq-tag uses
    it for now, but the blk-mq scheduling bits will use it as well.
    From Omar.

    - bdev thaw error progagation from Pierre.

    - Improve the blk polling statistics, and allow the user to clear
    them. From Stephen.

    - Set of minor cleanups from Christoph in block/blk-mq.

    - Set of cleanups and optimizations from me for block/blk-mq.

    - Various nvme/nvmet/nvmeof fixes from the various folks"

    * 'for-4.9/block' of git://git.kernel.dk/linux-block: (54 commits)
    fs/block_dev.c: return the right error in thaw_bdev()
    nvme: Pass pointers, not dma addresses, to nvme_get/set_features()
    nvme/scsi: Remove power management support
    nvmet: Make dsm number of ranges zero based
    nvmet: Use direct IO for writes
    admin-cmd: Added smart-log command support.
    nvme-fabrics: Add host_traddr options field to host infrastructure
    nvme-fabrics: revise host transport option descriptions
    nvme-fabrics: rework nvmf_get_address() for variable options
    nbd: use BLK_MQ_F_BLOCKING
    blkcg: Annotate blkg_hint correctly
    cfq: fix starvation of asynchronous writes
    blk-mq: add flag for drivers wanting blocking ->queue_rq()
    blk-mq: remove non-blocking pass in blk_mq_map_request
    blk-mq: get rid of manual run of queue with __blk_mq_run_hw_queue()
    block: export bio_free_pages to other modules
    lightnvm: propagate device_add() error code
    lightnvm: expose device geometry through sysfs
    lightnvm: control life of nvm_dev in driver
    blk-mq: register device instead of disk
    ...

    Linus Torvalds
     

21 Sep, 2016

1 commit

  • This commit introduces a generic library to estimate either the min or
    max value of a time-varying variable over a recent time window. This
    is code originally from Kathleen Nichols. The current form of the code
    is from Van Jacobson.

    A single struct minmax_sample will track the estimated windowed-max
    value of the series if you call minmax_running_max() or the estimated
    windowed-min value of the series if you call minmax_running_min().

    Nearly equivalent code is already in place for minimum RTT estimation
    in the TCP stack. This commit extracts that code and generalizes it to
    handle both min and max. Moving the code here reduces the footprint
    and complexity of the TCP code base and makes the filter generally
    available for other parts of the codebase, including an upcoming TCP
    congestion control module.

    This library works well for time series where the measurements are
    smoothly increasing or decreasing.

    Signed-off-by: Van Jacobson
    Signed-off-by: Neal Cardwell
    Signed-off-by: Yuchung Cheng
    Signed-off-by: Nandita Dukkipati
    Signed-off-by: Eric Dumazet
    Signed-off-by: Soheil Hassas Yeganeh
    Signed-off-by: David S. Miller

    Neal Cardwell
     

17 Sep, 2016

1 commit

  • This is a generally useful data structure, so make it available to
    anyone else who might want to use it. It's also a nice cleanup
    separating the allocation logic from the rest of the tag handling logic.

    The code is behind a new Kconfig option, CONFIG_SBITMAP, which is only
    selected by CONFIG_BLOCK for now.

    This should be a complete noop functionality-wise.

    Signed-off-by: Omar Sandoval
    Signed-off-by: Jens Axboe

    Omar Sandoval
     

31 Aug, 2016

1 commit

  • There are three usercopy warnings which are currently being silenced for
    gcc 4.6 and newer:

    1) "copy_from_user() buffer size is too small" compile warning/error

    This is a static warning which happens when object size and copy size
    are both const, and copy size > object size. I didn't see any false
    positives for this one. So the function warning attribute seems to
    be working fine here.

    Note this scenario is always a bug and so I think it should be
    changed to *always* be an error, regardless of
    CONFIG_DEBUG_STRICT_USER_COPY_CHECKS.

    2) "copy_from_user() buffer size is not provably correct" compile warning

    This is another static warning which happens when I enable
    __compiletime_object_size() for new compilers (and
    CONFIG_DEBUG_STRICT_USER_COPY_CHECKS). It happens when object size
    is const, but copy size is *not*. In this case there's no way to
    compare the two at build time, so it gives the warning. (Note the
    warning is a byproduct of the fact that gcc has no way of knowing
    whether the overflow function will be called, so the call isn't dead
    code and the warning attribute is activated.)

    So this warning seems to only indicate "this is an unusual pattern,
    maybe you should check it out" rather than "this is a bug".

    I get 102(!) of these warnings with allyesconfig and the
    __compiletime_object_size() gcc check removed. I don't know if there
    are any real bugs hiding in there, but from looking at a small
    sample, I didn't see any. According to Kees, it does sometimes find
    real bugs. But the false positive rate seems high.

    3) "Buffer overflow detected" runtime warning

    This is a runtime warning where object size is const, and copy size >
    object size.

    All three warnings (both static and runtime) were completely disabled
    for gcc 4.6 with the following commit:

    2fb0815c9ee6 ("gcc4: disable __compiletime_object_size for GCC 4.6+")

    That commit mistakenly assumed that the false positives were caused by a
    gcc bug in __compiletime_object_size(). But in fact,
    __compiletime_object_size() seems to be working fine. The false
    positives were instead triggered by #2 above. (Though I don't have an
    explanation for why the warnings supposedly only started showing up in
    gcc 4.6.)

    So remove warning #2 to get rid of all the false positives, and re-enable
    warnings #1 and #3 by reverting the above commit.

    Furthermore, since #1 is a real bug which is detected at compile time,
    upgrade it to always be an error.

    Having done all that, CONFIG_DEBUG_STRICT_USER_COPY_CHECKS is no longer
    needed.

    Signed-off-by: Josh Poimboeuf
    Cc: Kees Cook
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: "H . Peter Anvin"
    Cc: Andy Lutomirski
    Cc: Steven Rostedt
    Cc: Brian Gerst
    Cc: Peter Zijlstra
    Cc: Frederic Weisbecker
    Cc: Byungchul Park
    Cc: Nilay Vaish
    Signed-off-by: Linus Torvalds

    Josh Poimboeuf
     

28 Jul, 2016

1 commit

  • Pull random driver updates from Ted Ts'o:
    "A number of improvements for the /dev/random driver; the most
    important is the use of a ChaCha20-based CRNG for /dev/urandom, which
    is faster, more efficient, and easier to make scalable for
    silly/abusive userspace programs that want to read from /dev/urandom
    in a tight loop on NUMA systems.

    This set of patches also improves entropy gathering on VM's running on
    Microsoft Azure, and will take advantage of a hw random number
    generator (if present) to initialize the /dev/urandom pool"

    (It turns out that the random tree hadn't been in linux-next this time
    around, because it had been dropped earlier as being too quiet. Oh
    well).

    * tag 'random_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random:
    random: strengthen input validation for RNDADDTOENTCNT
    random: add backtracking protection to the CRNG
    random: make /dev/urandom scalable for silly userspace programs
    random: replace non-blocking pool with a Chacha20-based CRNG
    random: properly align get_random_int_hash
    random: add interrupt callback to VMBus IRQ handler
    random: print a warning for the first ten uninitialized random users
    random: initialize the non-blocking pool via add_hwgenerator_randomness()

    Linus Torvalds
     

03 Jul, 2016

1 commit


08 Jun, 2016

1 commit

  • People complained about ARCH_HWEIGHT_CFLAGS and how it throws a wrench
    into kcov, lto, etc, experimentations.

    Add asm versions for __sw_hweight{32,64}() and do explicit saving and
    restoring of clobbered registers. This gets rid of the special calling
    convention. We get to call those functions on !X86_FEATURE_POPCNT CPUs.

    We still need to hardcode POPCNT and register operands as some old gas
    versions which we support, do not know about POPCNT.

    Btw, remove redundant REX prefix from 32-bit POPCNT because alternatives
    can do padding now.

    Suggested-by: H. Peter Anvin
    Signed-off-by: Borislav Petkov
    Acked-by: Peter Zijlstra (Intel)
    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Brian Gerst
    Cc: Denys Vlasenko
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/1464605787-20603-1-git-send-email-bp@alien8.de
    Signed-off-by: Ingo Molnar

    Borislav Petkov
     

31 May, 2016

1 commit

  • It appears that somehow I missed a test of the latest UUID rework which
    landed in the kernel. Present a small test module to avoid such cases
    in the future.

    Signed-off-by: Andy Shevchenko
    Signed-off-by: Linus Torvalds

    Andy Shevchenko
     

29 May, 2016

2 commits

  • Pull string hash improvements from George Spelvin:
    "This series does several related things:

    - Makes the dcache hash (fs/namei.c) useful for general kernel use.

    (Thanks to Bruce for noticing the zero-length corner case)

    - Converts the string hashes in to use the
    above.

    - Avoids 64-bit multiplies in hash_64() on 32-bit platforms. Two
    32-bit multiplies will do well enough.

    - Rids the world of the bad hash multipliers in hash_32.

    This finishes the job started in commit 689de1d6ca95 ("Minimal
    fix-up of bad hashing behavior of hash_64()")

    The vast majority of Linux architectures have hardware support for
    32x32-bit multiply and so derive no benefit from "simplified"
    multipliers.

    The few processors that do not (68000, h8/300 and some models of
    Microblaze) have arch-specific implementations added. Those
    patches are last in the series.

    - Overhauls the dcache hash mixing.

    The patch in commit 0fed3ac866ea ("namei: Improve hash mixing if
    CONFIG_DCACHE_WORD_ACCESS") was an off-the-cuff suggestion.
    Replaced with a much more careful design that's simultaneously
    faster and better. (My own invention, as there was noting suitable
    in the literature I could find. Comments welcome!)

    - Modify the hash_name() loop to skip the initial HASH_MIX(). This
    would let us salt the hash if we ever wanted to.

    - Sort out partial_name_hash().

    The hash function is declared as using a long state, even though
    it's truncated to 32 bits at the end and the extra internal state
    contributes nothing to the result. And some callers do odd things:

    - fs/hfs/string.c only allocates 32 bits of state
    - fs/hfsplus/unicode.c uses it to hash 16-bit unicode symbols not bytes

    - Modify bytemask_from_count to handle inputs of 1..sizeof(long)
    rather than 0..sizeof(long)-1. This would simplify users other
    than full_name_hash"

    Special thanks to Bruce Fields for testing and finding bugs in v1. (I
    learned some humbling lessons about "obviously correct" code.)

    On the arch-specific front, the m68k assembly has been tested in a
    standalone test harness, I've been in contact with the Microblaze
    maintainers who mostly don't care, as the hardware multiplier is never
    omitted in real-world applications, and I haven't heard anything from
    the H8/300 world"

    * 'hash' of git://ftp.sciencehorizons.net/linux:
    h8300: Add
    microblaze: Add
    m68k: Add
    : Add support for architecture-specific functions
    fs/namei.c: Improve dcache hash function
    Eliminate bad hash multipliers from hash_32() and hash_64()
    Change hash_64() return value to 32 bits
    : Define hash_str() in terms of hashlen_string()
    fs/namei.c: Add hashlen_string() function
    Pull out string hash to

    Linus Torvalds
     
  • This is just the infrastructure; there are no users yet.

    This is modelled on CONFIG_ARCH_RANDOM; a CONFIG_ symbol declares
    the existence of .

    That file may define its own versions of various functions, and define
    HAVE_* symbols (no CONFIG_ prefix!) to suppress the generic ones.

    Included is a self-test (in lib/test_hash.c) that verifies the basics.
    It is NOT in general required that the arch-specific functions compute
    the same thing as the generic, but if a HAVE_* symbol is defined with
    the value 1, then equality is tested.

    Signed-off-by: George Spelvin
    Cc: Geert Uytterhoeven
    Cc: Greg Ungerer
    Cc: Andreas Schwab
    Cc: Philippe De Muyter
    Cc: linux-m68k@lists.linux-m68k.org
    Cc: Alistair Francis
    Cc: Michal Simek
    Cc: Yoshinori Sato
    Cc: uclinux-h8-devel@lists.sourceforge.jp

    George Spelvin
     

20 May, 2016

1 commit

  • Lots of code does

    node = next_node(node, XXX);
    if (node == MAX_NUMNODES)
    node = first_node(XXX);

    so create next_node_in() to do this and use it in various places.

    [mhocko@suse.com: use next_node_in() helper]
    Acked-by: Vlastimil Babka
    Acked-by: Michal Hocko
    Signed-off-by: Michal Hocko
    Cc: Xishi Qiu
    Cc: Joonsoo Kim
    Cc: David Rientjes
    Cc: Naoya Horiguchi
    Cc: Laura Abbott
    Cc: Hui Zhu
    Cc: Wang Xiaoqiang
    Cc: Johannes Weiner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     

19 May, 2016

1 commit

  • Pull SCSI updates from James Bottomley:
    "First round of SCSI updates for the 4.6+ merge window.

    This batch includes the usual quota of driver updates (bnx2fc, mp3sas,
    hpsa, ncr5380, lpfc, hisi_sas, snic, aacraid, megaraid_sas). There's
    also a multiqueue update for scsi_debug, assorted bug fixes and a few
    other minor updates (refactor of scsi_sg_pools into generic code, alua
    and VPD updates, and struct timeval conversions)"

    * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (138 commits)
    mpt3sas: Used "synchronize_irq()"API to synchronize timed-out IO & TMs
    mpt3sas: Set maximum transfer length per IO to 4MB for VDs
    mpt3sas: Updating mpt3sas driver version to 13.100.00.00
    mpt3sas: Fix initial Reference tag field for 4K PI drives.
    mpt3sas: Handle active cable exception event
    mpt3sas: Update MPI header to 2.00.42
    Revert "lpfc: Delete unnecessary checks before the function call mempool_destroy"
    eata_pio: missing break statement
    hpsa: Fix type ZBC conditional checks
    scsi_lib: Decode T10 vendor IDs
    scsi_dh_alua: do not fail for unknown VPD identification
    scsi_debug: use locally assigned naa
    scsi_debug: uuid for lu name
    scsi_debug: vpd and mode page work
    scsi_debug: add multiple queue support
    bfa: fix bfa_fcb_itnim_alloc() error handling
    megaraid_sas: Downgrade two success messages to info
    cxlflash: Fix to resolve dead-lock during EEH recovery
    scsi_debug: rework resp_report_luns
    scsi_debug: use pdt constants
    ...

    Linus Torvalds
     

16 Apr, 2016

1 commit


01 Apr, 2016

1 commit

  • By accident I stumbled across code that is no longer used. According
    to git grep, the global functions in lib/proportions.c are not used
    anywhere. This patch removes the old, unused code.

    Peter Zijlstra further commented:

    "Ah indeed, that got replaced with the flex proportion code a while back."

    Signed-off-by: Richard Cochran
    Acked-by: Peter Zijlstra (Intel)
    Cc: Andrew Morton
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/4265b49bed713fbe3faaf8c05da0e1792f09c0b3.1459432020.git.rcochran@linutronix.de
    Signed-off-by: Ingo Molnar

    Richard Cochran
     

26 Mar, 2016

1 commit

  • Implement the stack depot and provide CONFIG_STACKDEPOT. Stack depot
    will allow KASAN store allocation/deallocation stack traces for memory
    chunks. The stack traces are stored in a hash table and referenced by
    handles which reside in the kasan_alloc_meta and kasan_free_meta
    structures in the allocated memory chunks.

    IRQ stack traces are cut below the IRQ entry point to avoid unnecessary
    duplication.

    Right now stackdepot support is only enabled in SLAB allocator. Once
    KASAN features in SLAB are on par with those in SLUB we can switch SLUB
    to stackdepot as well, thus removing the dependency on SLUB stack
    bookkeeping, which wastes a lot of memory.

    This patch is based on the "mm: kasan: stack depots" patch originally
    prepared by Dmitry Chernenkov.

    Joonsoo has said that he plans to reuse the stackdepot code for the
    mm/page_owner.c debugging facility.

    [akpm@linux-foundation.org: s/depot_stack_handle/depot_stack_handle_t]
    [aryabinin@virtuozzo.com: comment style fixes]
    Signed-off-by: Alexander Potapenko
    Signed-off-by: Andrey Ryabinin
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Cc: Andrey Konovalov
    Cc: Dmitry Vyukov
    Cc: Steven Rostedt
    Cc: Konstantin Serebryany
    Cc: Dmitry Chernenkov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexander Potapenko
     

23 Mar, 2016

1 commit

  • kcov provides code coverage collection for coverage-guided fuzzing
    (randomized testing). Coverage-guided fuzzing is a testing technique
    that uses coverage feedback to determine new interesting inputs to a
    system. A notable user-space example is AFL
    (http://lcamtuf.coredump.cx/afl/). However, this technique is not
    widely used for kernel testing due to missing compiler and kernel
    support.

    kcov does not aim to collect as much coverage as possible. It aims to
    collect more or less stable coverage that is function of syscall inputs.
    To achieve this goal it does not collect coverage in soft/hard
    interrupts and instrumentation of some inherently non-deterministic or
    non-interesting parts of kernel is disbled (e.g. scheduler, locking).

    Currently there is a single coverage collection mode (tracing), but the
    API anticipates additional collection modes. Initially I also
    implemented a second mode which exposes coverage in a fixed-size hash
    table of counters (what Quentin used in his original patch). I've
    dropped the second mode for simplicity.

    This patch adds the necessary support on kernel side. The complimentary
    compiler support was added in gcc revision 231296.

    We've used this support to build syzkaller system call fuzzer, which has
    found 90 kernel bugs in just 2 months:

    https://github.com/google/syzkaller/wiki/Found-Bugs

    We've also found 30+ bugs in our internal systems with syzkaller.
    Another (yet unexplored) direction where kcov coverage would greatly
    help is more traditional "blob mutation". For example, mounting a
    random blob as a filesystem, or receiving a random blob over wire.

    Why not gcov. Typical fuzzing loop looks as follows: (1) reset
    coverage, (2) execute a bit of code, (3) collect coverage, repeat. A
    typical coverage can be just a dozen of basic blocks (e.g. an invalid
    input). In such context gcov becomes prohibitively expensive as
    reset/collect coverage steps depend on total number of basic
    blocks/edges in program (in case of kernel it is about 2M). Cost of
    kcov depends only on number of executed basic blocks/edges. On top of
    that, kernel requires per-thread coverage because there are always
    background threads and unrelated processes that also produce coverage.
    With inlined gcov instrumentation per-thread coverage is not possible.

    kcov exposes kernel PCs and control flow to user-space which is
    insecure. But debugfs should not be mapped as user accessible.

    Based on a patch by Quentin Casasnovas.

    [akpm@linux-foundation.org: make task_struct.kcov_mode have type `enum kcov_mode']
    [akpm@linux-foundation.org: unbreak allmodconfig]
    [akpm@linux-foundation.org: follow x86 Makefile layout standards]
    Signed-off-by: Dmitry Vyukov
    Reviewed-by: Kees Cook
    Cc: syzkaller
    Cc: Vegard Nossum
    Cc: Catalin Marinas
    Cc: Tavis Ormandy
    Cc: Will Deacon
    Cc: Quentin Casasnovas
    Cc: Kostya Serebryany
    Cc: Eric Dumazet
    Cc: Alexander Potapenko
    Cc: Kees Cook
    Cc: Bjorn Helgaas
    Cc: Sasha Levin
    Cc: David Drysdale
    Cc: Ard Biesheuvel
    Cc: Andrey Ryabinin
    Cc: Kirill A. Shutemov
    Cc: Jiri Slaby
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: "H. Peter Anvin"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dmitry Vyukov
     

21 Mar, 2016

1 commit

  • Pull virtio/vhost updates from Michael Tsirkin:
    "New features, performance improvements, cleanups:

    - basic polling support for vhost
    - rework virtio to optionally use DMA API, fixing it on Xen
    - balloon stats gained a new entry
    - using the new napi_alloc_skb speeds up virtio net
    - virtio blk stats can now be read while another VCPU is busy
    inflating or deflating the balloon

    plus misc cleanups in various places"

    * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
    virtio_net: replace netdev_alloc_skb_ip_align() with napi_alloc_skb()
    vhost_net: basic polling support
    vhost: introduce vhost_vq_avail_empty()
    vhost: introduce vhost_has_work()
    virtio_balloon: Allow to resize and update the balloon stats in parallel
    virtio_balloon: Use a workqueue instead of "vballoon" kthread
    virtio/s390: size of SET_IND payload
    virtio/s390: use dev_to_virtio
    vhost: rename vhost_init_used()
    vhost: rename cross-endian helpers
    virtio_blk: VIRTIO_BLK_F_WCE->VIRTIO_BLK_F_FLUSH
    vring: Use the DMA API on Xen
    virtio_pci: Use the DMA API if enabled
    virtio_mmio: Use the DMA API if enabled
    virtio: Add improved queue allocation API
    virtio_ring: Support DMA APIs
    vring: Introduce vring_use_dma_api()
    s390/dma: Allow per device dma ops
    alpha/dma: use common noop dma ops
    dma: Provide simple noop dma ops

    Linus Torvalds
     

02 Mar, 2016

1 commit

  • We are going to require dma_ops for several common drivers, even for
    systems that do have an identity mapping. Lets provide some minimal
    no-op dma_ops that can be used for that purpose.

    Signed-off-by: Christian Borntraeger
    Reviewed-by: Joerg Roedel
    Signed-off-by: Andy Lutomirski
    Signed-off-by: Michael S. Tsirkin

    Christian Borntraeger
     

20 Feb, 2016

1 commit


24 Jan, 2016

1 commit

  • Pull rdma updates from Doug Ledford:
    "Initial roundup of 4.5 merge window patches

    - Remove usage of ib_query_device and instead store attributes in
    ib_device struct

    - Move iopoll out of block and into lib, rename to irqpoll, and use
    in several places in the rdma stack as our new completion queue
    polling library mechanism. Update the other block drivers that
    already used iopoll to use the new mechanism too.

    - Replace the per-entry GID table locks with a single GID table lock

    - IPoIB multicast cleanup

    - Cleanups to the IB MR facility

    - Add support for 64bit extended IB counters

    - Fix for netlink oops while parsing RDMA nl messages

    - RoCEv2 support for the core IB code

    - mlx4 RoCEv2 support

    - mlx5 RoCEv2 support

    - Cross Channel support for mlx5

    - Timestamp support for mlx5

    - Atomic support for mlx5

    - Raw QP support for mlx5

    - MAINTAINERS update for mlx4/mlx5

    - Misc ocrdma, qib, nes, usNIC, cxgb3, cxgb4, mlx4, mlx5 updates

    - Add support for remote invalidate to the iSER driver (pushed
    through the RDMA tree due to dependencies, acknowledged by nab)

    - Update to NFSoRDMA (pushed through the RDMA tree due to
    dependencies, acknowledged by Bruce)"

    * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (169 commits)
    IB/mlx5: Unify CQ create flags check
    IB/mlx5: Expose Raw Packet QP to user space consumers
    {IB, net}/mlx5: Move the modify QP operation table to mlx5_ib
    IB/mlx5: Support setting Ethernet priority for Raw Packet QPs
    IB/mlx5: Add Raw Packet QP query functionality
    IB/mlx5: Add create and destroy functionality for Raw Packet QP
    IB/mlx5: Refactor mlx5_ib_qp to accommodate other QP types
    IB/mlx5: Allocate a Transport Domain for each ucontext
    net/mlx5_core: Warn on unsupported events of QP/RQ/SQ
    net/mlx5_core: Add RQ and SQ event handling
    net/mlx5_core: Export transport objects
    IB/mlx5: Expose CQE version to user-space
    IB/mlx5: Add CQE version 1 support to user QPs and SRQs
    IB/mlx5: Fix data validation in mlx5_ib_alloc_ucontext
    IB/sa: Fix netlink local service GFP crash
    IB/srpt: Remove redundant wc array
    IB/qib: Improve ipoib UD performance
    IB/mlx4: Advertise RoCE v2 support
    IB/mlx4: Create and use another QP1 for RoCEv2
    IB/mlx4: Enable send of RoCE QP1 packets with IP/UDP headers
    ...

    Linus Torvalds
     

21 Jan, 2016

3 commits

  • UBSAN uses compile-time instrumentation to catch undefined behavior
    (UB). Compiler inserts code that perform certain kinds of checks before
    operations that could cause UB. If check fails (i.e. UB detected)
    __ubsan_handle_* function called to print error message.

    So the most of the work is done by compiler. This patch just implements
    ubsan handlers printing errors.

    GCC has this capability since 4.9.x [1] (see -fsanitize=undefined
    option and its suboptions).
    However GCC 5.x has more checkers implemented [2].
    Article [3] has a bit more details about UBSAN in the GCC.

    [1] - https://gcc.gnu.org/onlinedocs/gcc-4.9.0/gcc/Debugging-Options.html
    [2] - https://gcc.gnu.org/onlinedocs/gcc/Debugging-Options.html
    [3] - http://developerblog.redhat.com/2014/10/16/gcc-undefined-behavior-sanitizer-ubsan/

    Issues which UBSAN has found thus far are:

    Found bugs:

    * out-of-bounds access - 97840cb67ff5 ("netfilter: nfnetlink: fix
    insufficient validation in nfnetlink_bind")

    undefined shifts:

    * d48458d4a768 ("jbd2: use a better hash function for the revoke
    table")

    * 10632008b9e1 ("clockevents: Prevent shift out of bounds")

    * 'x << -1' shift in ext4 -
    http://lkml.kernel.org/r/

    * undefined rol32(0) -
    http://lkml.kernel.org/r/

    * undefined dirty_ratelimit calculation -
    http://lkml.kernel.org/r/

    * undefined roundown_pow_of_two(0) -
    http://lkml.kernel.org/r/

    * [WONTFIX] undefined shift in __bpf_prog_run -
    http://lkml.kernel.org/r/

    WONTFIX here because it should be fixed in bpf program, not in kernel.

    signed overflows:

    * 32a8df4e0b33f ("sched: Fix odd values in effective_load()
    calculations")

    * mul overflow in ntp -
    http://lkml.kernel.org/r/

    * incorrect conversion into rtc_time in rtc_time64_to_tm() -
    http://lkml.kernel.org/r/

    * unvalidated timespec in io_getevents() -
    http://lkml.kernel.org/r/

    * [NOTABUG] signed overflow in ktime_add_safe() -
    http://lkml.kernel.org/r/

    [akpm@linux-foundation.org: fix unused local warning]
    [akpm@linux-foundation.org: fix __int128 build woes]
    Signed-off-by: Andrey Ryabinin
    Cc: Peter Zijlstra
    Cc: Sasha Levin
    Cc: Randy Dunlap
    Cc: Rasmus Villemoes
    Cc: Jonathan Corbet
    Cc: Michal Marek
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: "H. Peter Anvin"
    Cc: Yury Gribov
    Cc: Dmitry Vyukov
    Cc: Konstantin Khlebnikov
    Cc: Kostya Serebryany
    Cc: Johannes Berg
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrey Ryabinin
     
  • The clz table (__clz_tab) in lib/clz_tab.c is also provided as part of
    libgcc.a, and many architectures link against libgcc. To allow the
    linker to avoid a multiple-definition link failure, clz_tab.o has to be
    in lib/lib.a rather than lib/builtin.o. The specific issue is that
    libgcc.a comes before lib/builtin.o on vmlinux.o's link command line, so
    its _clz.o is pulled to satisfy __clz_tab, and then when the remainder
    of lib/builtin.o is pulled in to satisfy all the other dependencies, the
    __clz_tab symbols conflict. By putting clz_tab.o in lib.a, the linker
    can simply avoid pulling it into vmlinux.o when this situation arises.

    The definitions of __clz_tab are the same in libgcc.a and in the kernel;
    arguably we could also simply rename the kernel version, but it's
    unlikely the libgcc version will ever change to become incompatible, so
    just using it seems reasonably safe.

    Signed-off-by: Chris Metcalf
    Acked-by: David S. Miller
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Chris Metcalf
     
  • The test suite currently doesn't cover many corner cases when
    hex_dump_to_buffer() runs into overflow. Refactor and amend test suite
    to cover most of the cases.

    This patch (of 9):

    Just to follow the scheme that most of the test modules are using.

    There is no fuctional change.

    Signed-off-by: Andy Shevchenko
    Acked-by: Rasmus Villemoes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andy Shevchenko
     

12 Dec, 2015

1 commit


02 Dec, 2015

1 commit

  • This module allows to insert errors in some of netdevice's notifier
    events. All network drivers use these notifiers to signal various events
    and to check if they are allowed, e.g. PRECHANGEMTU and CHANGEMTU
    afterwards. Until recently I had to run failure tests by injecting
    a custom module, but now this infrastructure makes it trivial to test
    these failure paths. Some of the recent bugs I fixed were found using
    this module.
    Here's an example:
    $ cd /sys/kernel/debug/notifier-error-inject/netdev
    $ echo -22 > actions/NETDEV_CHANGEMTU/error
    $ ip link set eth0 mtu 1024
    RTNETLINK answers: Invalid argument

    CC: Akinobu Mita
    CC: "David S. Miller"
    CC: netdev
    Signed-off-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller

    Nikolay Aleksandrov
     

07 Nov, 2015

1 commit

  • This adds a simple module for testing the kernel's printf facilities.
    Previously, some %p extensions have caused a wrong return value in case
    the entire output didn't fit and/or been unusable in kasprintf(). This
    should help catch such issues. Also, it should help ensure that changes
    to the formatting algorithms don't break anything.

    I'm not sure if we have a struct dentry or struct file lying around at
    boot time or if we can fake one, but most %p extensions should be
    testable, as should the ordinary number and string formatting.

    The nature of vararg functions means we can't use a more conventional
    table-driven approach.

    For now, this is mostly a skeleton; contributions are very
    welcome. Some tests are/will be slightly annoying to write, since the
    expected output depends on stuff like CONFIG_*, sizeof(long), runtime
    values etc.

    Signed-off-by: Rasmus Villemoes
    Reviewed-by: Kees Cook
    Cc: Andy Shevchenko
    Cc: Martin Kletzander
    Cc: Rasmus Villemoes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rasmus Villemoes
     

08 Oct, 2015

1 commit

  • There's no good reason why users outside of networking should not
    be using this facility, f.e. for initializing their seeds.

    Therefore, make it accessible from there as get_random_once().

    Signed-off-by: Hannes Frederic Sowa
    Signed-off-by: Daniel Borkmann
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Hannes Frederic Sowa
     

09 Sep, 2015

1 commit

  • Pull NMI backtrace update from Russell King:
    "These changes convert the x86 NMI handling to be a library
    implementation which other architectures can make use of. Thomas
    Gleixner has reviewed and tested these changes, and wishes me to send
    these rather than taking them through the tip tree.

    The final patch in the set adds an initial implementation using this
    infrastructure to ARM, even though it doesn't send the IPI at "NMI"
    level. Patches are in progress to add the ARM equivalent of NMI, but
    we still need the IRQ-level fallback for systems where the "NMI" isn't
    available due to secure firmware denying access to it"

    * 'nmi' of git://ftp.arm.linux.org.uk/~rmk/linux-arm:
    ARM: add basic support for on-demand backtrace of other CPUs
    nmi: x86: convert to generic nmi handler
    nmi: create generic NMI backtrace implementation

    Linus Torvalds
     

04 Sep, 2015

1 commit

  • Pull locking and atomic updates from Ingo Molnar:
    "Main changes in this cycle are:

    - Extend atomic primitives with coherent logic op primitives
    (atomic_{or,and,xor}()) and deprecate the old partial APIs
    (atomic_{set,clear}_mask())

    The old ops were incoherent with incompatible signatures across
    architectures and with incomplete support. Now every architecture
    supports the primitives consistently (by Peter Zijlstra)

    - Generic support for 'relaxed atomics':

    - _acquire/release/relaxed() flavours of xchg(), cmpxchg() and {add,sub}_return()
    - atomic_read_acquire()
    - atomic_set_release()

    This came out of porting qwrlock code to arm64 (by Will Deacon)

    - Clean up the fragile static_key APIs that were causing repeat bugs,
    by introducing a new one:

    DEFINE_STATIC_KEY_TRUE(name);
    DEFINE_STATIC_KEY_FALSE(name);

    which define a key of different types with an initial true/false
    value.

    Then allow:

    static_branch_likely()
    static_branch_unlikely()

    to take a key of either type and emit the right instruction for the
    case. To be able to know the 'type' of the static key we encode it
    in the jump entry (by Peter Zijlstra)

    - Static key self-tests (by Jason Baron)

    - qrwlock optimizations (by Waiman Long)

    - small futex enhancements (by Davidlohr Bueso)

    - ... and misc other changes"

    * 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (63 commits)
    jump_label/x86: Work around asm build bug on older/backported GCCs
    locking, ARM, atomics: Define our SMP atomics in terms of _relaxed() operations
    locking, include/llist: Use linux/atomic.h instead of asm/cmpxchg.h
    locking/qrwlock: Make use of _{acquire|release|relaxed}() atomics
    locking/qrwlock: Implement queue_write_unlock() using smp_store_release()
    locking/lockref: Remove homebrew cmpxchg64_relaxed() macro definition
    locking, asm-generic: Add _{relaxed|acquire|release}() variants for 'atomic_long_t'
    locking, asm-generic: Rework atomic-long.h to avoid bulk code duplication
    locking/atomics: Add _{acquire|release|relaxed}() variants of some atomic operations
    locking, compiler.h: Cast away attributes in the WRITE_ONCE() magic
    locking/static_keys: Make verify_keys() static
    jump label, locking/static_keys: Update docs
    locking/static_keys: Provide a selftest
    jump_label: Provide a self-test
    s390/uaccess, locking/static_keys: employ static_branch_likely()
    x86, tsc, locking/static_keys: Employ static_branch_likely()
    locking/static_keys: Add selftest
    locking/static_keys: Add a new static_key interface
    locking/static_keys: Rework update logic
    locking/static_keys: Add static_key_{en,dis}able() helpers
    ...

    Linus Torvalds
     

03 Sep, 2015

1 commit

  • Pull networking updates from David Miller:
    "Another merge window, another set of networking changes. I've heard
    rumblings that the lightweight tunnels infrastructure has been voted
    networking change of the year. But what do I know?

    1) Add conntrack support to openvswitch, from Joe Stringer.

    2) Initial support for VRF (Virtual Routing and Forwarding), which
    allows the segmentation of routing paths without using multiple
    devices. There are some semantic kinks to work out still, but
    this is a reasonably strong foundation. From David Ahern.

    3) Remove spinlock fro act_bpf fast path, from Alexei Starovoitov.

    4) Ignore route nexthops with a link down state in ipv6, just like
    ipv4. From Andy Gospodarek.

    5) Remove spinlock from fast path of act_gact and act_mirred, from
    Eric Dumazet.

    6) Document the DSA layer, from Florian Fainelli.

    7) Add netconsole support to bcmgenet, systemport, and DSA. Also
    from Florian Fainelli.

    8) Add Mellanox Switch Driver and core infrastructure, from Jiri
    Pirko.

    9) Add support for "light weight tunnels", which allow for
    encapsulation and decapsulation without bearing the overhead of a
    full blown netdevice. From Thomas Graf, Jiri Benc, and a cast of
    others.

    10) Add Identifier Locator Addressing support for ipv6, from Tom
    Herbert.

    11) Support fragmented SKBs in iwlwifi, from Johannes Berg.

    12) Allow perf PMUs to be accessed from eBPF programs, from Kaixu Xia.

    13) Add BQL support to 3c59x driver, from Loganaden Velvindron.

    14) Stop using a zero TX queue length to mean that a device shouldn't
    have a qdisc attached, use an explicit flag instead. From Phil
    Sutter.

    15) Use generic geneve netdevice infrastructure in openvswitch, from
    Pravin B Shelar.

    16) Add infrastructure to avoid re-forwarding a packet in software
    that was already forwarded by a hardware switch. From Scott
    Feldman.

    17) Allow AF_PACKET fanout function to be implemented in a bpf
    program, from Willem de Bruijn"

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1458 commits)
    netfilter: nf_conntrack: make nf_ct_zone_dflt built-in
    netfilter: nf_dup{4, 6}: fix build error when nf_conntrack disabled
    net: fec: clear receive interrupts before processing a packet
    ipv6: fix exthdrs offload registration in out_rt path
    xen-netback: add support for multicast control
    bgmac: Update fixed_phy_register()
    sock, diag: fix panic in sock_diag_put_filterinfo
    flow_dissector: Use 'const' where possible.
    flow_dissector: Fix function argument ordering dependency
    ixgbe: Resolve "initialized field overwritten" warnings
    ixgbe: Remove bimodal SR-IOV disabling
    ixgbe: Add support for reporting 2.5G link speed
    ixgbe: fix bounds checking in ixgbe_setup_tc for 82598
    ixgbe: support for ethtool set_rxfh
    ixgbe: Avoid needless PHY access on copper phys
    ixgbe: cleanup to use cached mask value
    ixgbe: Remove second instance of lan_id variable
    ixgbe: use kzalloc for allocating one thing
    flow: Move __get_hash_from_flowi{4,6} into flow_dissector.c
    ixgbe: Remove unused PCI bus types
    ...

    Linus Torvalds
     

27 Aug, 2015

1 commit


25 Aug, 2015

1 commit

  • Sometimes a scatter-gather has to be split into several chunks, or sub
    scatter lists. This happens for example if a scatter list will be
    handled by multiple DMA channels, each one filling a part of it.

    A concrete example comes with the media V4L2 API, where the scatter list
    is allocated from userspace to hold an image, regardless of the
    knowledge of how many DMAs will fill it :
    - in a simple RGB565 case, one DMA will pump data from the camera ISP
    to memory
    - in the trickier YUV422 case, 3 DMAs will pump data from the camera
    ISP pipes, one for pipe Y, one for pipe U and one for pipe V

    For these cases, it is necessary to split the original scatter list into
    multiple scatter lists, which is the purpose of this patch.

    The guarantees that are required for this patch are :
    - the intersection of spans of any couple of resulting scatter lists is
    empty.
    - the union of spans of all resulting scatter lists is a subrange of
    the span of the original scatter list.
    - streaming DMA API operations (mapping, unmapping) should not happen
    both on both the resulting and the original scatter list. It's either
    the first or the later ones.
    - the caller is reponsible to call kfree() on the resulting
    scatterlists.

    Signed-off-by: Robert Jarzmik
    Signed-off-by: Jens Axboe

    Robert Jarzmik
     

03 Aug, 2015

2 commits

  • The 'jump label' self-test is in reality testing static keys - rename things
    accordingly.

    Also prettify the code in various places while at it.

    Acked-by: Peter Zijlstra (Intel)
    Cc: Andrew Morton
    Cc: Jason Baron
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Shuah Khan
    Cc: Thomas Gleixner
    Cc: benh@kernel.crashing.org
    Cc: bp@alien8.de
    Cc: davem@davemloft.net
    Cc: ddaney@caviumnetworks.com
    Cc: heiko.carstens@de.ibm.com
    Cc: linux-kernel@vger.kernel.org
    Cc: liuj97@gmail.com
    Cc: luto@amacapital.net
    Cc: michael@ellerman.id.au
    Cc: rabin@rab.in
    Cc: ralf@linux-mips.org
    Cc: rostedt@goodmis.org
    Cc: vbabka@suse.cz
    Cc: will.deacon@arm.com
    Link: http://lkml.kernel.org/r/0c091ecebd78a879ed8a71835d205a691a75ab4e.1438227999.git.jbaron@akamai.com
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • Signed-off-by: Jason Baron
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andrew Morton
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: benh@kernel.crashing.org
    Cc: bp@alien8.de
    Cc: davem@davemloft.net
    Cc: ddaney@caviumnetworks.com
    Cc: heiko.carstens@de.ibm.com
    Cc: linux-kernel@vger.kernel.org
    Cc: liuj97@gmail.com
    Cc: luto@amacapital.net
    Cc: michael@ellerman.id.au
    Cc: rabin@rab.in
    Cc: ralf@linux-mips.org
    Cc: rostedt@goodmis.org
    Cc: shuahkh@osg.samsung.com
    Cc: vbabka@suse.cz
    Cc: will.deacon@arm.com
    Link: http://lkml.kernel.org/r/0c091ecebd78a879ed8a71835d205a691a75ab4e.1438227999.git.jbaron@akamai.com
    Signed-off-by: Ingo Molnar

    Jason Baron
     

17 Jul, 2015

1 commit

  • x86s NMI backtrace implementation (for arch_trigger_all_cpu_backtrace())
    is fairly generic in nature - the only architecture specific bits are
    the act of raising the NMI to other CPUs, and reporting the status of
    the NMI handler.

    These are fairly simple to factor out, and produce a generic
    implementation which can be shared between ARM and x86.

    Reviewed-by: Thomas Gleixner
    Signed-off-by: Russell King

    Russell King
     

03 Jul, 2015

1 commit

  • Pull kbuild updates from Michal Marek:
    "Just a few kbuild core commits this time:

    - kallsyms fix for CONFIG_XIP_KERNEL

    - bashisms in scripts/link-vmlinux.sh fixed

    - workaround to make DEBUG_INFO_REDUCED more useful yet still space
    efficient

    - clang is not wrongly detected when cross-compiling"

    * 'kbuild' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild:
    kbuild: include core debug info when DEBUG_INFO_REDUCED
    scripts: link-vmlinux: Don't pass page offset to kallsyms if XIP Kernel
    scripts: fix link-vmlinux.sh bash-ism
    Makefile: Fix detection of clang when cross-compiling

    Linus Torvalds
     

19 Jun, 2015

1 commit


11 Jun, 2015

1 commit

  • With CONFIG_DEBUG_INFO_REDUCED, we do get quite a lot of debug info
    (around 22.7 MB for a defconfig+DEBUG_INFO_REDUCED). However, the
    "basenames must match" rule used by -femit-struct-debug-baseonly
    option means that we miss some core data structures, such as struct
    {device, file, inode, mm_struct, page} etc.

    We can easily get these included as well, while still getting the
    benefits of CONFIG_DEBUG_INFO_REDUCED (faster build times and smaller
    individual object files): All it takes is a dummy translation unit
    including a few strategic headers and compiled with a flag overriding
    -femit-struct-debug-baseonly.

    This increases the size of .debug_info by ~0.3%, but these 90 KB
    contain some rather useful info.

    Signed-off-by: Rasmus Villemoes
    Signed-off-by: Michal Marek

    Rasmus Villemoes
     

11 May, 2015

1 commit

  • Add 842-format software compression and decompression functions.
    Update the MAINTAINERS 842 section to include the new files.

    The 842 compression function can compress any input data into the 842
    compression format. The 842 decompression function can decompress any
    standard-format 842 compressed data - specifically, either a compressed
    data buffer created by the 842 software compression function, or a
    compressed data buffer created by the 842 hardware compressor (located
    in PowerPC coprocessors).

    The 842 compressed data format is explained in the header comments.

    This is used in a later patch to provide a full software 842 compression
    and decompression crypto interface.

    Signed-off-by: Dan Streetman
    Signed-off-by: Herbert Xu

    Dan Streetman