31 Jan, 2020

2 commits


30 Jan, 2020

1 commit


29 Jan, 2020

3 commits

  • Pull networking updates from David Miller:

    1) Add WireGuard

    2) Add HE and TWT support to ath11k driver, from John Crispin.

    3) Add ESP in TCP encapsulation support, from Sabrina Dubroca.

    4) Add variable window congestion control to TIPC, from Jon Maloy.

    5) Add BCM84881 PHY driver, from Russell King.

    6) Start adding netlink support for ethtool operations, from Michal
    Kubecek.

    7) Add XDP drop and TX action support to ena driver, from Sameeh
    Jubran.

    8) Add new ipv4 route notifications so that mlxsw driver does not have
    to handle identical routes itself. From Ido Schimmel.

    9) Add BPF dynamic program extensions, from Alexei Starovoitov.

    10) Support RX and TX timestamping in igc, from Vinicius Costa Gomes.

    11) Add support for macsec HW offloading, from Antoine Tenart.

    12) Add initial support for MPTCP protocol, from Christoph Paasch,
    Matthieu Baerts, Florian Westphal, Peter Krystad, and many others.

    13) Add Octeontx2 PF support, from Sunil Goutham, Geetha sowjanya, Linu
    Cherian, and others.

    * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1469 commits)
    net: phy: add default ARCH_BCM_IPROC for MDIO_BCM_IPROC
    udp: segment looped gso packets correctly
    netem: change mailing list
    qed: FW 8.42.2.0 debug features
    qed: rt init valid initialization changed
    qed: Debug feature: ilt and mdump
    qed: FW 8.42.2.0 Add fw overlay feature
    qed: FW 8.42.2.0 HSI changes
    qed: FW 8.42.2.0 iscsi/fcoe changes
    qed: Add abstraction for different hsi values per chip
    qed: FW 8.42.2.0 Additional ll2 type
    qed: Use dmae to write to widebus registers in fw_funcs
    qed: FW 8.42.2.0 Parser offsets modified
    qed: FW 8.42.2.0 Queue Manager changes
    qed: FW 8.42.2.0 Expose new registers and change windows
    qed: FW 8.42.2.0 Internal ram offsets modifications
    MAINTAINERS: Add entry for Marvell OcteonTX2 Physical Function driver
    Documentation: net: octeontx2: Add RVU HW and drivers overview
    octeontx2-pf: ethtool RSS config support
    octeontx2-pf: Add basic ethtool support
    ...

    Linus Torvalds
     
  • Pull crypto updates from Herbert Xu:
    "API:
    - Removed CRYPTO_TFM_RES flags
    - Extended spawn grabbing to all algorithm types
    - Moved hash descsize verification into API code

    Algorithms:
    - Fixed recursive pcrypt dead-lock
    - Added new 32 and 64-bit generic versions of poly1305
    - Added cryptogams implementation of x86/poly1305

    Drivers:
    - Added support for i.MX8M Mini in caam
    - Added support for i.MX8M Nano in caam
    - Added support for i.MX8M Plus in caam
    - Added support for A33 variant of SS in sun4i-ss
    - Added TEE support for Raven Ridge in ccp
    - Added in-kernel API to submit TEE commands in ccp
    - Added AMD-TEE driver
    - Added support for BCM2711 in iproc-rng200
    - Added support for AES256-GCM based ciphers for chtls
    - Added aead support on SEC2 in hisilicon"

    * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (244 commits)
    crypto: arm/chacha - fix build failured when kernel mode NEON is disabled
    crypto: caam - add support for i.MX8M Plus
    crypto: x86/poly1305 - emit does base conversion itself
    crypto: hisilicon - fix spelling mistake "disgest" -> "digest"
    crypto: chacha20poly1305 - add back missing test vectors and test chunking
    crypto: x86/poly1305 - fix .gitignore typo
    tee: fix memory allocation failure checks on drv_data and amdtee
    crypto: ccree - erase unneeded inline funcs
    crypto: ccree - make cc_pm_put_suspend() void
    crypto: ccree - split overloaded usage of irq field
    crypto: ccree - fix PM race condition
    crypto: ccree - fix FDE descriptor sequence
    crypto: ccree - cc_do_send_request() is void func
    crypto: ccree - fix pm wrongful error reporting
    crypto: ccree - turn errors to debug msgs
    crypto: ccree - fix AEAD decrypt auth fail
    crypto: ccree - fix typo in comment
    crypto: ccree - fix typos in error msgs
    crypto: atmel-{aes,sha,tdes} - Retire crypto_platform_data
    crypto: x86/sha - Eliminate casts on asm implementations
    ...

    Linus Torvalds
     
  • Pull scheduler updates from Ingo Molnar:
    "These were the main changes in this cycle:

    - More -rt motivated separation of CONFIG_PREEMPT and
    CONFIG_PREEMPTION.

    - Add more low level scheduling topology sanity checks and warnings
    to filter out nonsensical topologies that break scheduling.

    - Extend uclamp constraints to influence wakeup CPU placement

    - Make the RT scheduler more aware of asymmetric topologies and CPU
    capacities, via uclamp metrics, if CONFIG_UCLAMP_TASK=y

    - Make idle CPU selection more consistent

    - Various fixes, smaller cleanups, updates and enhancements - please
    see the git log for details"

    * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (58 commits)
    sched/fair: Define sched_idle_cpu() only for SMP configurations
    sched/topology: Assert non-NUMA topology masks don't (partially) overlap
    idle: fix spelling mistake "iterrupts" -> "interrupts"
    sched/fair: Remove redundant call to cpufreq_update_util()
    sched/psi: create /proc/pressure and /proc/pressure/{io|memory|cpu} only when psi enabled
    sched/fair: Fix sgc->{min,max}_capacity calculation for SD_OVERLAP
    sched/fair: calculate delta runnable load only when it's needed
    sched/cputime: move rq parameter in irqtime_account_process_tick
    stop_machine: Make stop_cpus() static
    sched/debug: Reset watchdog on all CPUs while processing sysrq-t
    sched/core: Fix size of rq::uclamp initialization
    sched/uclamp: Fix a bug in propagating uclamp value in new cgroups
    sched/fair: Load balance aggressively for SCHED_IDLE CPUs
    sched/fair : Improve update_sd_pick_busiest for spare capacity case
    watchdog: Remove soft_lockup_hrtimer_cnt and related code
    sched/rt: Make RT capacity-aware
    sched/fair: Make EAS wakeup placement consider uclamp restrictions
    sched/fair: Make task_fits_capacity() consider uclamp restrictions
    sched/uclamp: Rename uclamp_util_with() into uclamp_rq_util_with()
    sched/uclamp: Make uclamp util helpers use and return UL values
    ...

    Linus Torvalds
     

28 Jan, 2020

6 commits

  • Pull timer updates from Thomas Gleixner:
    "The timekeeping and timers department provides:

    - Time namespace support:

    If a container migrates from one host to another then it expects
    that clocks based on MONOTONIC and BOOTTIME are not subject to
    disruption. Due to different boot time and non-suspended runtime
    these clocks can differ significantly on two hosts, in the worst
    case time goes backwards which is a violation of the POSIX
    requirements.

    The time namespace addresses this problem. It allows setting offsets
    for clock MONOTONIC and BOOTTIME once after creation and before
    tasks are associated with the namespace. These offsets are taken
    into account by timers and timekeeping including the VDSO.

    Offsets for wall clock based clocks (REALTIME/TAI) are not provided
    by this mechanism. While in theory possible, the overhead and code
    complexity would be immense and not justified by the esoteric
    potential use cases which were discussed at Plumbers '18.

    The overhead for tasks in the root namespace (ie where host time
    offsets = 0) is in the noise and great effort was made to ensure
    that especially in the VDSO. If time namespace is disabled in the
    kernel configuration the code is compiled out.

    Kudos to Andrei Vagin and Dmitry Safonov who implemented this
    feature and kept on for more than a year addressing review
    comments, finding better solutions. A pleasant experience.

    - Overhaul of the alarmtimer device dependency handling to ensure
    that the init/suspend/resume ordering is correct.

    - A new clocksource/event driver for Microchip PIT64

    - Suspend/resume support for the Hyper-V clocksource

    - The usual pile of fixes, updates and improvements mostly in the
    driver code"

    * tag 'timers-core-2020-01-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (71 commits)
    alarmtimer: Make alarmtimer_get_rtcdev() a stub when CONFIG_RTC_CLASS=n
    alarmtimer: Use wakeup source from alarmtimer platform device
    alarmtimer: Make alarmtimer platform device child of RTC device
    alarmtimer: Update alarmtimer_get_rtcdev() docs to reflect reality
    hrtimer: Add missing sparse annotation for __run_timer()
    lib/vdso: Only read hrtimer_res when needed in __cvdso_clock_getres()
    MIPS: vdso: Define BUILD_VDSO32 when building a 32bit kernel
    clocksource/drivers/hyper-v: Set TSC clocksource as default w/ InvariantTSC
    clocksource/drivers/hyper-v: Untangle stimers and timesync from clocksources
    clocksource/drivers/timer-microchip-pit64b: Fix sparse warning
    clocksource/drivers/exynos_mct: Rename Exynos to lowercase
    clocksource/drivers/timer-ti-dm: Fix uninitialized pointer access
    clocksource/drivers/timer-ti-dm: Switch to platform_get_irq
    clocksource/drivers/timer-ti-dm: Convert to devm_platform_ioremap_resource
    clocksource/drivers/em_sti: Fix variable declaration in em_sti_probe
    clocksource/drivers/em_sti: Convert to devm_platform_ioremap_resource
    clocksource/drivers/bcm2835_timer: Fix memory leak of timer
    clocksource/drivers/cadence-ttc: Use ttc driver as platform driver
    clocksource/drivers/timer-microchip-pit64b: Add Microchip PIT64B support
    clocksource/drivers/hyper-v: Reserve PAGE_SIZE space for tsc page
    ...

    Linus Torvalds
     
  • Pull debugobjects update from Thomas Gleixner:
    "A single commit for debug objects which fixes a pile of potential data
    races detected by KCSAN"

    * tag 'core-debugobjects-2020-01-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    debugobjects: Fix various data races

    Linus Torvalds
     
  • Pull ioremap updates from Christoph Hellwig:
    "Remove the ioremap_nocache API (plus wrappers) that are always
    identical to ioremap"

    * tag 'ioremap-5.6' of git://git.infradead.org/users/hch/ioremap:
    remove ioremap_nocache and devm_ioremap_nocache
    MIPS: define ioremap_nocache to ioremap

    Linus Torvalds
     
  • Pull block driver updates from Jens Axboe:
    "Like the core side, not a lot of changes here, just two main items:

    - Series of patches (via Coly) with fixes for bcache (Coly,
    Christoph)

    - MD pull request from Song"

    * tag 'for-5.6/drivers-2020-01-27' of git://git.kernel.dk/linux-block: (31 commits)
    bcache: reap from tail of c->btree_cache in bch_mca_scan()
    bcache: reap c->btree_cache_freeable from the tail in bch_mca_scan()
    bcache: remove member accessed from struct btree
    bcache: print written and keys in trace_bcache_btree_write
    bcache: avoid unnecessary btree nodes flushing in btree_flush_write()
    bcache: add code comments for state->pool in __btree_sort()
    lib: crc64: include <linux/crc64.h> for 'crc64_be'
    bcache: use read_cache_page_gfp to read the superblock
    bcache: store a pointer to the on-disk sb in the cache and cached_dev structures
    bcache: return a pointer to the on-disk sb from read_super
    bcache: transfer the sb_page reference to register_{bdev,cache}
    bcache: fix use-after-free in register_bcache()
    bcache: properly initialize 'path' and 'err' in register_bcache()
    bcache: rework error unwinding in register_bcache
    bcache: use a separate data structure for the on-disk super block
    bcache: cached_dev_free needs to put the sb page
    md/raid1: introduce wait_for_serialization
    md/raid1: use bucket based mechanism for IO serialization
    md: introduce a new struct for IO serialization
    md: don't destroy serial_info_pool if serialize_policy is true
    ...

    Linus Torvalds
     
  • Pull livepatching updates from Jiri Kosina:
    "Fixes for selftests and samples for 'shadow variables' livepatching
    feature, from Petr Mladek"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/livepatching/livepatching:
    livepatch: Handle allocation failure in the sample of shadow variable API
    livepatch/samples/selftest: Use klp_shadow_alloc() API correctly
    livepatch/selftest: Clean up shadow variable names and type
    livepatch/sample: Use the right type for the leaking data pointer

    Linus Torvalds
     
  • Pull arm64 updates from Will Deacon:
    "The changes are a real mixed bag this time around.

    The only scary looking one from the diffstat is the uapi change to
    asm-generic/mman-common.h, but this has been acked by Arnd and is
    actually just adding a pair of comments in an attempt to prevent
    allocation of some PROT values which tend to get used for
    arch-specific purposes. We'll be using them for Branch Target
    Identification (a CFI-like hardening feature), which is currently
    under review on the mailing list.

    New architecture features:

    - Support for Armv8.5 E0PD, which benefits KASLR in the same way as
    KPTI but without the overhead. This allows KPTI to be disabled on
    CPUs that are not affected by Meltdown, even if KASLR is enabled.

    - Initial support for the Armv8.5 RNG instructions, which claim to
    provide access to a high bandwidth, cryptographically secure
    hardware random number generator. As well as exposing these to
    userspace, we also use them as part of the KASLR seed and to seed
    the crng once all CPUs have come online.

    - Advertise a bunch of new instructions to userspace, including
    support for Data Gathering Hint, Matrix Multiply and 16-bit
    floating point.

    Kexec:

    - Cleanups in preparation for relocating with the MMU enabled

    - Support for loading crash dump kernels with kexec_file_load()

    Perf and PMU drivers:

    - Cleanups and non-critical fixes for a couple of system PMU drivers

    FPU-less (aka broken) CPU support:

    - Considerable fixes to support CPUs without the FP/SIMD extensions,
    including their presence in heterogeneous systems. Good luck
    finding a 64-bit userspace that handles this.

    Modern assembly function annotations:

    - Start migrating our use of ENTRY() and ENDPROC() over to the
    new-fangled SYM_{CODE,FUNC}_{START,END} macros, which are intended
    to aid debuggers

    Kbuild:

    - Cleanup detection of LSE support in the assembler by introducing
    'as-instr'

    - Remove compressed Image files when building clean targets

    IP checksumming:

    - Implement optimised IPv4 checksumming routine when hardware offload
    is not in use. An IPv6 version is in the works, pending testing.

    Hardware errata:

    - Work around Cortex-A55 erratum #1530923

    Shadow call stack:

    - Work around some issues with Clang's integrated assembler not
    liking our perfectly reasonable assembly code

    - Avoid allocating the X18 register, so that it can be used to hold
    the shadow call stack pointer in future

    ACPI:

    - Fix ID count checking in IORT code. This may regress broken
    firmware that happened to work with the old implementation, in
    which case we'll have to revert it and try something else

    - Fix DAIF corruption on return from GHES handler with pseudo-NMIs

    Miscellaneous:

    - Whitelist some CPUs that are unaffected by Spectre-v2

    - Reduce frequency of ASID rollover when KPTI is compiled in but
    inactive

    - Reserve a couple of arch-specific PROT flags that are already used
    by Sparc and PowerPC and are planned for later use with BTI on
    arm64

    - Preparatory cleanup of our entry assembly code in preparation for
    moving more of it into C later on

    - Refactoring and cleanup"

    * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (73 commits)
    arm64: acpi: fix DAIF manipulation with pNMI
    arm64: kconfig: Fix alignment of E0PD help text
    arm64: Use v8.5-RNG entropy for KASLR seed
    arm64: Implement archrandom.h for ARMv8.5-RNG
    arm64: kbuild: remove compressed images on 'make ARCH=arm64 (dist)clean'
    arm64: entry: Avoid empty alternatives entries
    arm64: Kconfig: select HAVE_FUTEX_CMPXCHG
    arm64: csum: Fix pathological zero-length calls
    arm64: entry: cleanup sp_el0 manipulation
    arm64: entry: cleanup el0 svc handler naming
    arm64: entry: mark all entry code as notrace
    arm64: assembler: remove smp_dmb macro
    arm64: assembler: remove inherit_daif macro
    ACPI/IORT: Fix 'Number of IDs' handling in iort_id_map()
    mm: Reserve asm-generic prot flags 0x10 and 0x20 for arch use
    arm64: Use macros instead of hard-coded constants for MAIR_EL1
    arm64: Add KRYO{3,4}XX CPU cores to spectre-v2 safe list
    arm64: kernel: avoid x18 in __cpu_soft_restart
    arm64: kvm: stop treating register x18 as caller save
    arm64/lib: copy_page: avoid x18 register in assembler code
    ...

    Linus Torvalds
     

27 Jan, 2020

2 commits


25 Jan, 2020

1 commit

  • The range passed to user_access_begin() by strncpy_from_user() and
    strnlen_user() starts at 'src' and goes up to the limit of userspace
    although reads will be limited by the 'count' param.

    On 32-bit powerpc (book3s/32), access has to be granted for each
    256 MB segment, and the cost increases with the number of segments to
    unlock.

    Limit the range with 'count' param.
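
    The effect of clamping the range can be sketched in plain C. This is a
    userspace illustration only, not the kernel patch: access_range(),
    segments_touched() and the USER_LIMIT value are hypothetical names and
    numbers chosen for the example. Only the 256 MB segment size comes from
    the description above.

```c
#include <assert.h>

/* Hypothetical top of the user address range, for illustration only. */
#define USER_LIMIT   0xC0000000UL
/* Segment size on book3s/32, as stated in the commit message. */
#define SEGMENT_SIZE (256UL << 20)

/* Bytes to unlock: never more than 'count', never past the user limit. */
static unsigned long access_range(unsigned long src, unsigned long count)
{
    unsigned long max;

    assert(src < USER_LIMIT);
    max = USER_LIMIT - src;
    return count < max ? count : max;
}

/* Number of 256 MB segments covered by [addr, addr + len). */
static unsigned long segments_touched(unsigned long addr, unsigned long len)
{
    if (len == 0)
        return 0;
    return (addr + len - 1) / SEGMENT_SIZE - addr / SEGMENT_SIZE + 1;
}
```

    With these assumed values, a 64-byte read starting at 0x10000000 covers
    one segment when the range is clamped to 'count', against eleven when
    the range runs up to the limit.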

    Fixes: 594cc251fdd0 ("make 'user_access_begin()' do 'access_ok()'")
    Signed-off-by: Christophe Leroy
    Signed-off-by: Linus Torvalds

    Christophe Leroy
     

24 Jan, 2020

2 commits

  • Pull XArray fixes from Matthew Wilcox:
    "Primarily bugfixes, mostly around handling index wrap-around
    correctly.

    A couple of doc fixes and adding missing APIs.

    I had an oops live on stage at linux.conf.au this year, and it turned
    out to be a bug in xas_find() which I can't prove isn't triggerable in
    the current codebase. Then in looking for the bug, I spotted two more
    bugs.

    The bots have had a few days to chew on this with no problems
    reported, and it passes the test-suite (which now has more tests to
    make sure these problems don't come back)"

    * tag 'xarray-5.5' of git://git.infradead.org/users/willy/linux-dax:
    XArray: Add xa_for_each_range
    XArray: Fix xas_find returning too many entries
    XArray: Fix xa_find_after with multi-index entries
    XArray: Fix infinite loop with entry at ULONG_MAX
    XArray: Add wrappers for nested spinlocks
    XArray: Improve documentation of search marks
    XArray: Fix xas_pause at ULONG_MAX
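
    The index wrap-around class of bug mentioned above can be illustrated
    without any XArray code. The sketch below is generic C, not taken from
    the XArray sources, and count_range() is a hypothetical helper: a loop
    written as "for (i = first; i <= last; i++)" never terminates when last
    is ULONG_MAX, so the end test has to come before the increment.

```c
#include <assert.h>
#include <limits.h>

/* Count the indices in [first, last] without wrapping at ULONG_MAX. */
static unsigned long count_range(unsigned long first, unsigned long last)
{
    unsigned long n = 0;
    unsigned long i = first;

    assert(first <= last);
    for (;;) {
        n++;
        if (i == last)  /* test before incrementing: i + 1 may wrap to 0 */
            break;
        i++;
    }
    return n;
}
```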

    Linus Torvalds
     
  • The crc64_be() is declared in <linux/crc64.h> so include
    this where the symbol is defined to avoid the following
    warning:

    lib/crc64.c:43:12: warning: symbol 'crc64_be' was not declared. Should it be static?

    Signed-off-by: Ben Dooks (Codethink)
    Reviewed-by: Andy Shevchenko
    Signed-off-by: Coly Li
    Signed-off-by: Jens Axboe

    Ben Dooks (Codethink)
     

22 Jan, 2020

1 commit

  • When this was originally ported, the 12-byte nonce vectors were left out
    to keep things simple. I agree that we don't need nor want a library
    interface for 12-byte nonces. But these test vectors were specially
    crafted to look at issues in the underlying primitives and related
    interactions. Therefore, we actually want to keep around all of the
    test vectors, and simply have a helper function to test them with.

    Secondly, the sglist-based chunking code in the library interface is
    rather complicated, so this adds a developer-only test for ensuring that
    all the bookkeeping is correct, across a wide array of possibilities.

    Signed-off-by: Jason A. Donenfeld
    Signed-off-by: Herbert Xu

    Jason A. Donenfeld
     

20 Jan, 2020

1 commit


18 Jan, 2020

3 commits


17 Jan, 2020

4 commits

  • The counters obj_pool_free and obj_nr_tofree, and the flag obj_freeing, are
    read locklessly outside the pool_lock critical sections. If read with plain
    accesses, this would result in data races.

    This is addressed as follows:

    * reads outside critical sections become READ_ONCE()s (pairing with
    WRITE_ONCE()s added);

    * writes become WRITE_ONCE()s (pairing with READ_ONCE()s added); since
    writes happen inside critical sections, only the write and not the read
    of RMWs needs to be atomic, thus WRITE_ONCE(var, var +/- X) is
    sufficient.
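
    The pairing described above can be sketched in userspace C. The macros
    below are simplified stand-ins for the kernel's READ_ONCE() and
    WRITE_ONCE() (the real definitions are more elaborate), and pool_free,
    pool_add() and pool_needs_refill() are hypothetical names, not the
    debugobjects code. The lock itself is left out and only assumed.

```c
#include <assert.h>

/* Simplified userspace stand-ins; assumes GCC or Clang for __typeof__. */
#define READ_ONCE(x)       (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))

/* Hypothetical counter, updated only with a lock held (lock not shown). */
static int pool_free;

/*
 * Writer: runs inside the critical section, so the read half of the
 * read-modify-write is already serialized and only the store is marked.
 */
static void pool_add(int n)
{
    assert(pool_free + n >= 0);
    WRITE_ONCE(pool_free, pool_free + n);
}

/* Lockless reader: one marked load, paired with the marked store above. */
static int pool_needs_refill(int min_level)
{
    return READ_ONCE(pool_free) < min_level;
}
```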

    The data races were reported by KCSAN:

    BUG: KCSAN: data-race in __free_object / fill_pool

    write to 0xffffffff8beb04f8 of 4 bytes by interrupt on cpu 1:
    __free_object+0x1ee/0x8e0 lib/debugobjects.c:404
    __debug_check_no_obj_freed+0x199/0x330 lib/debugobjects.c:969
    debug_check_no_obj_freed+0x3c/0x44 lib/debugobjects.c:994
    slab_free_hook mm/slub.c:1422 [inline]

    read to 0xffffffff8beb04f8 of 4 bytes by task 1 on cpu 2:
    fill_pool+0x3d/0x520 lib/debugobjects.c:135
    __debug_object_init+0x3c/0x810 lib/debugobjects.c:536
    debug_object_init lib/debugobjects.c:591 [inline]
    debug_object_activate+0x228/0x320 lib/debugobjects.c:677
    debug_rcu_head_queue kernel/rcu/rcu.h:176 [inline]

    BUG: KCSAN: data-race in __debug_object_init / fill_pool

    read to 0xffffffff8beb04f8 of 4 bytes by task 10 on cpu 6:
    fill_pool+0x3d/0x520 lib/debugobjects.c:135
    __debug_object_init+0x3c/0x810 lib/debugobjects.c:536
    debug_object_init_on_stack+0x39/0x50 lib/debugobjects.c:606
    init_timer_on_stack_key kernel/time/timer.c:742 [inline]

    write to 0xffffffff8beb04f8 of 4 bytes by task 1 on cpu 3:
    alloc_object lib/debugobjects.c:258 [inline]
    __debug_object_init+0x717/0x810 lib/debugobjects.c:544
    debug_object_init lib/debugobjects.c:591 [inline]
    debug_object_activate+0x228/0x320 lib/debugobjects.c:677
    debug_rcu_head_queue kernel/rcu/rcu.h:176 [inline]

    BUG: KCSAN: data-race in free_obj_work / free_object

    read to 0xffffffff9140c190 of 4 bytes by task 10 on cpu 6:
    free_object+0x4b/0xd0 lib/debugobjects.c:426
    debug_object_free+0x190/0x210 lib/debugobjects.c:824
    destroy_timer_on_stack kernel/time/timer.c:749 [inline]

    write to 0xffffffff9140c190 of 4 bytes by task 93 on cpu 1:
    free_obj_work+0x24f/0x480 lib/debugobjects.c:313
    process_one_work+0x454/0x8d0 kernel/workqueue.c:2264
    worker_thread+0x9a/0x780 kernel/workqueue.c:2410

    Reported-by: Qian Cai
    Signed-off-by: Marco Elver
    Signed-off-by: Thomas Gleixner
    Link: https://lore.kernel.org/r/20200116185529.11026-1-elver@google.com

    Marco Elver
     
  • The commit e91c2518a5d22a ("livepatch: Initialize shadow variables
    safely by a custom callback") leads to the following static checker
    warning:

    samples/livepatch/livepatch-shadow-fix1.c:86 livepatch_fix1_dummy_alloc()
    error: 'klp_shadow_alloc()' 'leak' too small (4 vs 8)

    It is because klp_shadow_alloc() is used the wrong way:

    int *leak;
    shadow_leak = klp_shadow_alloc(d, SV_LEAK, sizeof(leak), GFP_KERNEL,
    shadow_leak_ctor, leak);

    The code is supposed to store the "leak" pointer into the shadow variable.
    The 3rd parameter correctly passes the size of the data (size of a pointer).
    But the 6th parameter is wrong. It should pass a pointer to the data (pointer
    to the pointer) but it passes the pointer directly.

    It works because shadow_leak_ctor() handles "ctor_data" as the data
    instead of a pointer to the data. But it is semantically wrong and
    confusing.

    The same problem is also in the module used by selftests. In this case,
    "pvX" variables are introduced. They represent the data stored in
    the shadow variables.
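
    The difference between passing the data and passing a pointer to the
    data can be shown with a small userspace mock. Nothing below is the
    livepatch API: leak_ctor() only imitates the shape of a shadow variable
    constructor, and store_leak_pointer() is a hypothetical caller that
    plays the role of the allocation.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Mock constructor: 'shadow_data' is the new storage and 'ctor_data'
 * points to the data to store. The data here is itself an 'int *', so
 * both arguments are really 'int **'.
 */
static int leak_ctor(void *obj, void *shadow_data, void *ctor_data)
{
    int **shadow_leak = shadow_data;
    int **leak = ctor_data;

    (void)obj;
    *shadow_leak = *leak;
    return 0;
}

/* Store the pointer 'leak': pass its address, not the pointer itself. */
static int *store_leak_pointer(int *leak)
{
    int *shadow = NULL;     /* storage that will hold an 'int *' */
    int ret;

    ret = leak_ctor(NULL, &shadow, &leak);
    assert(ret == 0);
    return shadow;
}
```

    Passing 'leak' instead of '&leak' would make the mock constructor read
    the pointed-to integer as if it were a pointer.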

    Reported-by: Dan Carpenter
    Signed-off-by: Petr Mladek
    Reviewed-by: Joe Lawrence
    Acked-by: Miroslav Benes
    Reviewed-by: Kamalesh Babulal
    Signed-off-by: Jiri Kosina

    Petr Mladek
     
  • The shadow variable selftest is quite tricky. It is especially hard
    to understand what values are stored, returned, and printed.

    Make it easier to understand by using "int *var, **sv" variables
    consistently everywhere instead of the generic "void *", "ret",
    and "ctor_data".

    Signed-off-by: Petr Mladek
    Reviewed-by: Joe Lawrence
    Acked-by: Miroslav Benes
    Reviewed-by: Kamalesh Babulal
    Signed-off-by: Jiri Kosina

    Petr Mladek
     
  • Only perform READ_ONCE(vd[CS_HRES_COARSE].hrtimer_res) for
    HRES and RAW clocks.

    Signed-off-by: Christophe Leroy
    Signed-off-by: Thomas Gleixner
    Link: https://lore.kernel.org/r/7ac2f0d21652f95e2bbdfa6bd514ae6c7caf53ab.1579196675.git.christophe.leroy@c-s.fr

    Christophe Leroy
     

16 Jan, 2020

6 commits

  • If CRYPTO_CURVE25519 is y, CRYPTO_LIB_CURVE25519_GENERIC will be
    y, but CRYPTO_LIB_CURVE25519 may be set to m, which causes build
    errors:

    lib/crypto/curve25519-selftest.o: In function `curve25519':
    curve25519-selftest.c:(.text.unlikely+0xc): undefined reference to `curve25519_arch'
    lib/crypto/curve25519-selftest.o: In function `curve25519_selftest':
    curve25519-selftest.c:(.init.text+0x17e): undefined reference to `curve25519_base_arch'

    This is because the curve25519 self-test code is being controlled
    by the GENERIC option rather than the overall CURVE25519 option,
    as is the case with blake2s. To recap, the GENERIC and ARCH options
    for CURVE25519 are internal only and selected by users such as
    the Crypto API, or the externally visible CURVE25519 option which
    in turn is selected by wireguard. The self-test is specific to the
    external CURVE25519 option and should not be enabled by the
    Crypto API.

    This patch fixes this by splitting the GENERIC module from the
    CURVE25519 module with the latter now containing just the self-test.

    Reported-by: Hulk Robot
    Fixes: aa127963f1ca ("crypto: lib/curve25519 - re-add selftests")
    Signed-off-by: Herbert Xu
    Reviewed-by: Jason A. Donenfeld
    Signed-off-by: Herbert Xu

    Herbert Xu
     
  • These x86_64 vectorized implementations support AVX, AVX-2, and AVX-512F.
    The AVX-512F implementation is disabled on Skylake, due to throttling,
    but it is quite fast on >= Cannonlake.

    On the left are cycle counts on a Core i7 6700HQ using the AVX-2
    codepath, comparing this implementation ("new") to the implementation in
    the current crypto api ("old"). On the right are benchmarks on a Xeon
    Gold 5120 using the AVX-512 codepath. The new implementation is faster
    on all benchmarks.

    AVX-2                  AVX-512
    ----------------       ----------------

    size   old   new       size   old   new
    ----  ----  ----       ----  ----  ----
       0    70    68          0    74    70
      16    92    90         16    96    92
      32   134   104         32   136   106
      48   172   120         48   184   124
      64   218   136         64   218   138
      80   254   158         80   260   160
      96   298   174         96   300   176
     112   342   192        112   342   194
     128   388   212        128   384   212
     144   428   228        144   420   226
     160   466   246        160   464   248
     176   510   264        176   504   264
     192   550   282        192   544   282
     208   594   302        208   582   300
     224   628   316        224   624   318
     240   676   334        240   662   338
     256   716   354        256   708   358
     272   764   374        272   748   372
     288   802   352        288   788   358
     304   420   366        304   422   370
     320   428   360        320   432   364
     336   484   378        336   486   380
     352   426   384        352   434   390
     368   478   400        368   480   408
     384   488   394        384   490   398
     400   542   408        400   542   412
     416   486   416        416   492   426
     432   534   430        432   538   436
     448   544   422        448   546   432
     464   600   438        464   600   448
     480   540   448        480   548   456
     496   594   464        496   594   476
     512   602   456        512   606   470
     528   656   476        528   656   480
     544   600   480        544   606   498
     560   650   494        560   652   512
     576   664   490        576   662   508
     592   714   508        592   716   522
     608   656   514        608   664   538
     624   708   532        624   710   552
     640   716   524        640   720   516
     656   770   536        656   772   526
     672   716   548        672   722   544
     688   770   562        688   768   556
     704   774   552        704   778   556
     720   826   568        720   832   568
     736   768   574        736   780   584
     752   822   592        752   826   600
     768   830   584        768   836   560
     784   884   602        784   888   572
     800   828   610        800   838   588
     816   884   628        816   884   604
     832   888   618        832   894   598
     848   942   632        848   946   612
     864   884   644        864   896   628
     880   936   660        880   942   644
     896   948   652        896   952   608
     912  1000   664        912  1004   616
     928   942   676        928   954   634
     944   994   690        944  1000   646
     960  1002   680        960  1008   646
     976  1054   694        976  1062   658
     992  1002   706        992  1012   674
    1008  1052   720       1008  1058   690

    This commit wires in the prior implementation from Andy, and makes the
    following changes to be suitable for kernel land.

    - Some cosmetic and structural changes, like renaming labels to
    .Lname, constants, and other Linux conventions, as well as making
    the code easy for us to maintain moving forward.

    - CPU feature checking is done in C by the glue code.

    - We avoid jumping into the middle of functions, to appease objtool,
    and instead parameterize shared code.

    - We maintain frame pointers so that stack traces make sense.

    - We remove the dependency on the perl xlate code, which transforms
    the output into things that assemblers we don't care about use.

    Importantly, none of our changes affect the arithmetic or core code, but
    just involve the differing environment of kernel space.

    Signed-off-by: Jason A. Donenfeld
    Signed-off-by: Samuel Neves
    Co-developed-by: Samuel Neves
    Signed-off-by: Herbert Xu

    Jason A. Donenfeld
     
  • These two C implementations from Zinc -- a 32x32 one and a 64x64 one,
    depending on the platform -- come from Andrew Moon's public domain
    poly1305-donna portable code, modified for usage in the kernel. The
    precomputation in the 32-bit version and the use of 64x64 multiplies in
    the 64-bit version make these perform better than the code it replaces.
    Moon's code is also very widespread and has received many eyeballs of
    scrutiny.
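
    The two multiply widths named above can be sketched as follows. This is
    not the poly1305-donna code, only the arithmetic primitive each variant
    is built on, with hypothetical helper names. The 64x64 helper assumes
    the unsigned __int128 extension of GCC and Clang on a 64-bit target.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 32x32 -> 64 multiply: the primitive available on 32-bit platforms. */
static uint64_t mul32x32_64(uint32_t a, uint32_t b)
{
    return (uint64_t)a * b;
}

/* 64x64 -> 128 multiply, returned as separate high and low halves. */
static void mul64x64_128(uint64_t a, uint64_t b, uint64_t *hi, uint64_t *lo)
{
    unsigned __int128 p = (unsigned __int128)a * b;

    assert(hi != NULL && lo != NULL);
    *lo = (uint64_t)p;
    *hi = (uint64_t)(p >> 64);
}
```

    A wider multiply covers more bits of the accumulator per operation,
    which is one plausible reason the 64x64 variant gains more.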

    There's a bit of interference with the x86 implementation, which
    relies on internal details of the old scalar implementation. In the next
    commit, the x86 implementation will be replaced with a faster one that
    doesn't rely on this, so none of this matters much. But for now, to keep
    this passing the tests, we inline the bits of the old implementation
    that the x86 implementation relied on. Also, since we now support a
    slightly larger key space, via the union, some offsets had to be fixed
    up.

    Nonce calculation was folded in with the emit function, to take
    advantage of 64x64 arithmetic. However, Adiantum appeared to rely on no
    nonce handling in emit, so this path was conditionalized. We also
    introduced a new struct, poly1305_core_key, to represent the precise
    amount of space that particular implementation uses.

    Testing with kbench9000, depending on the CPU, the update function for
    the 32x32 version has been improved by 4%-7%, and for the 64x64 by
    19%-30%. The 32x32 gains are small, but I think there's great value in
    having a parallel implementation to the 64x64 one so that the two can be
    compared side-by-side as nice stand-alone units.

    Signed-off-by: Jason A. Donenfeld
    Signed-off-by: Herbert Xu

    Jason A. Donenfeld
     
  • In order to do kernel builds with the bounds checker individually
    available, introduce CONFIG_UBSAN_BOUNDS, with the remaining options
    under CONFIG_UBSAN_MISC.

    For example, using this, we can start to expand the coverage syzkaller is
    providing. Right now, all of UBSan is disabled for syzbot builds because
    taken as a whole, it is too noisy. This will let us focus on one feature
    at a time.

    For the bounds checker specifically, this provides a mechanism to
    eliminate an entire class of array overflows with close to zero
    performance overhead (I cannot measure a difference). In my (mostly)
    defconfig, enabling bounds checking adds ~4200 checks to the kernel.
    Performance changes are in the noise, likely due to the branch predictors
    optimizing for the non-fail path.

    Some notes on the bounds checker:

    - it does not instrument {mem,str}*()-family functions, it only
    instruments direct indexed accesses (e.g. "foo[i]"). Dealing with
    the {mem,str}*()-family functions is a work-in-progress around
    CONFIG_FORTIFY_SOURCE[1].

    - it ignores flexible array members, including the very old single
    byte (e.g. "int foo[1];") declarations. (Note that GCC's
    implementation appears to ignore _all_ trailing arrays, but Clang only
    ignores empty, 0, and 1 byte arrays[2].)

    [1] https://github.com/KSPP/linux/issues/6
    [2] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92589
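
    As a rough illustration of what instrumenting a direct indexed access
    amounts to, the sketch below writes the check out by hand. It is not
    taken from the patch: checked_read() and TABLE_LEN are hypothetical
    names, and the real check is emitted by the compiler (via
    -fsanitize=bounds) and calls a handler rather than returning an error.

```c
#include <stddef.h>

#define TABLE_LEN 4 /* hypothetical array size, for illustration only */

static const int table[TABLE_LEN] = { 10, 20, 30, 40 };

/*
 * Hand-written equivalent of an instrumented "table[i]" access.
 * Returns 0 and stores the element on success, -1 when the index is
 * outside the array. The compiler-generated check would report or
 * trap instead of returning an error code.
 */
static int checked_read(size_t i, int *out)
{
	if (i >= TABLE_LEN)
		return -1;
	*out = table[i];
	return 0;
}
```

    A call such as checked_read(4, &v) fails here, whereas the same index
    passed through memcpy() or strcpy() would not be covered by this kind
    of check.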

    Bug: 136249967
    Link: https://lore.kernel.org/kernel-hardening/20191121181519.28637-3-keescook@chromium.org/
    Suggested-by: Elena Petrova
    Signed-off-by: Kees Cook
    Reviewed-by: Andrey Ryabinin
    Signed-off-by: Elena Petrova
    Change-Id: I1f79faea7386af1bc50faaf8b399ea6448611d5a

    Kees Cook
     
  • The Undefined Behavior Sanitizer can operate in two modes: warning
    reporting mode via lib/ubsan.c handler calls, or trap mode, which uses
    __builtin_trap() as the handler. Using lib/ubsan.c means the kernel
    image is about 5% larger (due to all the debugging text and reporting
    structures to capture details about the warning conditions). Using the
    trap mode, the image size changes are much smaller, though at the loss
    of the "warning only" mode.

    In order to give greater flexibility to system builders that want
    minimal changes to image size and are prepared to deal with kernel
    threads being killed, this introduces CONFIG_UBSAN_TRAP. The resulting
    image sizes comparison:

        text    data      bss      dec     hex filename
    19533663 6183037 18554956 44271656 2a38828 vmlinux.stock
    19991849 7618513 18874448 46484810 2c54d4a vmlinux.ubsan
    19712181 6284181 18366540 44362902 2a4ec96 vmlinux.ubsan-trap

    CONFIG_UBSAN=y: image +4.8% (text +2.3%, data +18.9%)
    CONFIG_UBSAN_TRAP=y: image +0.2% (text +0.9%, data +1.6%)
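
    The difference between the two modes can be sketched in user space as
    follows. This is only an illustration under assumptions: the
    SKETCH_UBSAN_TRAP switch, sketch_report() and sketch_index() are
    hypothetical names standing in for the config option and the
    compiler-generated checks. Only __builtin_trap() is the real builtin
    named above.

```c
#include <stdio.h>

/* Hypothetical switch standing in for CONFIG_UBSAN_TRAP. */
#ifdef SKETCH_UBSAN_TRAP
/* Trap mode: no message text, no reporting code, the task is killed. */
#define sketch_report(msg) __builtin_trap()
#else
/* Warning mode: needs strings and a reporting path, then continues. */
static unsigned int sketch_warnings;
#define sketch_report(msg)                                         \
	do {                                                       \
		fprintf(stderr, "UBSAN-like warning: %s\n", msg);  \
		sketch_warnings++;                                 \
	} while (0)
#endif

/* Bounds-checked read that reports through whichever mode was built. */
static int sketch_index(const int *arr, unsigned int len, unsigned int i)
{
	if (i >= len) {
		sketch_report("index out of range");
		return 0; /* reached only in warning mode */
	}
	return arr[i];
}
```

    In the warning build every check site carries a message and a call into
    the reporting code, which is where the extra image size comes from. The
    trap build reduces each site to a single trapping instruction.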

    Bug: 136249967
    Link: https://lore.kernel.org/kernel-hardening/20191121181519.28637-2-keescook@chromium.org/
    Suggested-by: Elena Petrova
    Signed-off-by: Kees Cook
    Signed-off-by: Elena Petrova
    Change-Id: Ifa36d25f9649958cfc7b78e21777390f128db165

    Kees Cook
     
  • Casting the comparison function to a different type trips indirect call
    Control-Flow Integrity (CFI) checking. Remove the additional consts from
    cmp_func, and the now unneeded casts.
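
    A minimal sketch of the idea, assuming a comparator type without the
    extra consts. The simplified struct list_head, struct item, item_of()
    and compare_nodes() are illustrative stand-ins, not the lib/list_sort.c
    code. The point is that the pointer type used for the indirect call
    matches the callee's prototype exactly, so no cast is needed.

```c
#include <stddef.h>

/* Simplified stand-in for the kernel's list node. */
struct list_head {
	struct list_head *next, *prev;
};

/* Assumed comparator type: no additional consts on the node pointers. */
typedef int (*cmp_func)(void *priv, struct list_head *a, struct list_head *b);

struct item {
	int key;
	struct list_head node;
};

/* Recover the containing item from its embedded list node. */
#define item_of(ptr) \
	((struct item *)((char *)(ptr) - offsetof(struct item, node)))

/* Comparator whose prototype is identical to cmp_func. */
static int item_cmp(void *priv, struct list_head *a, struct list_head *b)
{
	(void)priv;
	return item_of(a)->key - item_of(b)->key;
}

/* Indirect call through a pointer of the callee's exact type. */
static int compare_nodes(cmp_func cmp, void *priv,
			 struct list_head *a, struct list_head *b)
{
	return cmp(priv, a, b);
}
```

    Had item_cmp() been cast to a pointer type with different parameter
    qualifiers, an indirect call CFI scheme that compares function types
    could reject the call even though it works without CFI.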

    Fixes: 043b3f7b6388 ("lib/list_sort: simplify and remove MAX_LIST_LENGTH_BITS")
    (am from https://lore.kernel.org/patchwork/patch/1178059/)
    Link: https://lore.kernel.org/lkml/20200110225602.91663-1-samitolvanen@google.com
    Bug: 147506196
    Change-Id: I329b1a454c30af78f9851db6a38c3f060499ec0d
    Signed-off-by: Sami Tolvanen
    Signed-off-by: Todd Kjos

    Sami Tolvanen
     

14 Jan, 2020

8 commits

  • To support time namespaces in the vdso with a minimal impact on regular non
    time namespace affected tasks, the namespace handling needs to be hidden in
    a slow path.

    The most obvious place is vdso_seq_begin(). If a task belongs to a time
    namespace then the VVAR page which contains the system wide vdso data is
    replaced with a namespace specific page which has the same layout as the
    VVAR page. That page has vdso_data->seq set to 1 to enforce the slow path
    and vdso_data->clock_mode set to VCLOCK_TIMENS to enforce the time
    namespace handling path.

    The extra check in the case that vdso_data->seq is odd, e.g. a concurrent
    update of the vdso data is in progress, is not really affecting regular
    tasks which are not part of a time namespace as the task is spin waiting
    for the update to finish and vdso_data->seq to become even again.

    If a time namespace task hits that code path, it invokes the corresponding
    time getter function which retrieves the real VVAR page, reads host time
    and then adds the offset for the requested clock which is stored in the
    special VVAR page.

    If VDSO time namespace support is disabled the whole magic is compiled out.

    Initial testing shows that the disabled case is almost identical to the
    host case which does not take the slow timens path. With the special timens
    page installed the performance hit is constant time and in the range of
    5-7%.

    For the vdso functions which are not using the sequence count an
    unconditional check for vdso_data->clock_mode is added which switches to
    the real vdso when the clock_mode is VCLOCK_TIMENS.
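
    The control flow described above can be sketched roughly as below. This
    is a single-threaded simplification under assumptions: all sketch_*
    names are hypothetical, there are no memory barriers, and an in-flight
    update is reported as -1 where the real code would spin until the
    sequence count becomes even.

```c
#include <stdint.h>

#define SKETCH_CLOCK_NORMAL 0
#define SKETCH_CLOCK_TIMENS 1 /* stands in for VCLOCK_TIMENS */

struct sketch_vdso_data {
	uint32_t seq;        /* odd: update in progress, or timens page */
	int      clock_mode;
	int64_t  value;      /* host time on the real page, offset on the timens page */
};

/*
 * vd:   the page mapped into the task (real page or timens page)
 * host: the real system-wide page
 */
static int64_t sketch_gettime(const struct sketch_vdso_data *vd,
			      const struct sketch_vdso_data *host)
{
	/* Fast path: an even sequence count means plain host data. */
	if (!(vd->seq & 1))
		return vd->value;

	/* Slow path: the timens page keeps seq at 1, so it always lands here. */
	if (vd->clock_mode == SKETCH_CLOCK_TIMENS)
		return host->value + vd->value;

	/* A real update is in flight; the kernel code would retry here. */
	return -1;
}
```

    Tasks outside a time namespace only pay for the extra clock_mode test
    when they are already waiting for an update to finish.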

    [avagin: Make do_hres_timens() work with raw clocks too: choose vdso_data
    pointer by CS_RAW offset.]

    Suggested-by: Andy Lutomirski
    Signed-off-by: Thomas Gleixner
    Signed-off-by: Andrei Vagin
    Signed-off-by: Dmitry Safonov
    Signed-off-by: Thomas Gleixner
    Link: https://lore.kernel.org/r/20191112012724.250792-21-dima@arista.com

    Thomas Gleixner
     
  • Performance numbers for Intel(R) Core(TM) i5-6300U CPU @ 2.40GHz
    (more clock_gettime() cycles is better):

    clock            | before    | after     | diff
    -----------------------------------------------
    monotonic        | 153222105 | 166775025 | 8.8%
    monotonic-coarse | 671557054 | 691513017 | 3.0%
    monotonic-raw    | 147116067 | 161057395 | 9.5%
    boottime         | 153446224 | 166962668 | 9.1%

    The improvement for arm64 for monotonic and boottime is around 3.5%.

    clock            | before   | after    | diff
    =============================================
    monotonic        | 17326692 | 17951770 | 3.6%
    monotonic-coarse | 43624027 | 44215292 | 1.3%
    monotonic-raw    | 17541809 | 17554932 | 0.1%
    boottime         | 17334982 | 17954361 | 3.5%

    [ tglx: Avoid the goto ]

    Signed-off-by: Andrei Vagin
    Signed-off-by: Dmitry Safonov
    Signed-off-by: Thomas Gleixner
    Link: https://lore.kernel.org/r/20191112012724.250792-3-dima@arista.com

    Andrei Vagin
     
  • VDSO_HRES and VDSO_RAW clocks are handled the same way.

    Avoid the code duplication.

    Signed-off-by: Christophe Leroy
    Signed-off-by: Thomas Gleixner
    Reviewed-by: Andy Lutomirski
    Link: https://lore.kernel.org/r/fdf1a968a8f7edd61456f1689ac44082ebb19c15.1577111367.git.christophe.leroy@c-s.fr

    Christophe Leroy
     
  • do_coarse() is similar to do_hres() except that it never fails.

    Change its type to int instead of void and let it always return success (0)
    to simplify the call site.
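
    A small sketch of why the uniform return type helps. The sketch_*
    functions and the fixed time values are made up for illustration and
    are not the lib/vdso code. Once both helpers return int, the caller
    can return either result directly.

```c
enum sketch_kind { SKETCH_HRES, SKETCH_COARSE };

/* May fail, modelled here by a "usable clocksource" flag. */
static int sketch_do_hres(int usable, long *out)
{
	if (!usable)
		return -1;
	*out = 123456789L;
	return 0;
}

/* Never fails, but returns int so both helpers share one convention. */
static int sketch_do_coarse(long *out)
{
	*out = 123000000L;
	return 0;
}

static int sketch_clock_gettime(enum sketch_kind kind, int usable, long *out)
{
	if (kind == SKETCH_HRES)
		return sketch_do_hres(usable, out);
	return sketch_do_coarse(out); /* no separate "return 0" branch */
}
```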

    Signed-off-by: Christophe Leroy
    Signed-off-by: Thomas Gleixner
    Link: https://lore.kernel.org/r/21e8afa38c02ca8672c2690307383507fe63b454.1577111367.git.christophe.leroy@c-s.fr

    Christophe Leroy
     
  • Since all the architectures that support the generic vDSO library have
    been converted to support the 32 bit fallbacks it is not required
    anymore to check the return value of __cvdso_clock_get*time32_common()
    before updating the old_timespec fields.

    Remove the related checks from the generic vdso library.

    References: c60a32ea4f45 ("lib/vdso/32: Provide legacy syscall fallbacks")
    Signed-off-by: Vincenzo Frascino
    Signed-off-by: Thomas Gleixner
    Link: https://lore.kernel.org/r/20190830135902.20861-6-vincenzo.frascino@arm.com

    Vincenzo Frascino
     
  • VDSO_HAS_32BIT_FALLBACK was introduced to address a regression which
    caused seccomp to deny applications access to clock_gettime64() and
    clock_getres64() because they are not enabled in the existing
    filters.

    The purpose of VDSO_HAS_32BIT_FALLBACK was to simplify the conditional
    implementation of __cvdso_clock_get*time32() variants.

    Now that all the architectures that support the generic vDSO library
    have been converted to support the 32 bit fallbacks the conditional
    can be removed.

    Signed-off-by: Vincenzo Frascino
    Signed-off-by: Thomas Gleixner
    Link: https://lore.kernel.org/r/20190830135902.20861-5-vincenzo.frascino@arm.com

    References: c60a32ea4f45 ("lib/vdso/32: Provide legacy syscall fallbacks")

    Vincenzo Frascino
     
  • clock_gettime32 and clock_getres_time32 should be compiled only with a
    32 bit vdso library.

    Exclude these symbols when BUILD_VDSO32 is not defined.
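
    The guard can be pictured as in the sketch below. Only the
    BUILD_VDSO32 macro comes from the description above. The sketch_*
    functions, the struct and the fixed time values are hypothetical and
    do not reproduce the generic vDSO code.

```c
#include <stdint.h>

/* Hypothetical stand-in for the old 32-bit timespec layout. */
struct sketch_old_timespec32 {
	int32_t tv_sec;
	int32_t tv_nsec;
};

/* 64-bit getter that every build provides (fixed values for the sketch). */
static int sketch_clock_gettime64(int64_t *sec, int64_t *nsec)
{
	*sec = 1580000000;
	*nsec = 500;
	return 0;
}

#ifdef BUILD_VDSO32
/* Compiled only for a 32-bit vDSO build; absent everywhere else. */
static int sketch_clock_gettime32(struct sketch_old_timespec32 *ts)
{
	int64_t sec, nsec;
	int ret = sketch_clock_gettime64(&sec, &nsec);

	ts->tv_sec = (int32_t)sec;
	ts->tv_nsec = (int32_t)nsec;
	return ret;
}
#endif
```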

    Signed-off-by: Vincenzo Frascino
    Signed-off-by: Thomas Gleixner
    Reviewed-by: Andy Lutomirski
    Link: https://lore.kernel.org/r/20190830135902.20861-3-vincenzo.frascino@arm.com

    Vincenzo Frascino
     
  • There are several algorithms available for raid6 to generate xor and syndrome
    parity, including the basic int1, int2 ... int32 and SIMD optimized
    implementations like sse and neon. To test and choose the best algorithm at
    the initial stage, we need to provide enough disk data to feed the
    algorithms. However, the number of disks we provide depends on the page size
    and the gfmul table, as shown below:

    const int disks = (65536/PAGE_SIZE) + 2;

    So with a 64K PAGE_SIZE there is only one data disk plus 2 parity disks, and
    as a result the chosen algorithm is not reliable. For example, on my arm64
    machine with 64K pages enabled, it will choose intx32 as the best one, although
    the NEON implementation is better.

    This patch tries to fix the problem by defining a constant raid6 disk number
    to support arbitrary page sizes.
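
    To make the arithmetic concrete, the helper below evaluates the old
    formula for a given page size. sketch_old_disks() is a hypothetical
    name used only for this illustration.

```c
/* Total disk count (data + 2 parity) produced by the old formula. */
static int sketch_old_disks(unsigned long page_size)
{
	return (int)(65536UL / page_size) + 2;
}
```

    With 4K pages this gives 18 disks (16 data plus 2 parity), with 16K
    pages 6 disks, and with 64K pages only 3 disks, i.e. a single data
    disk, which is the case the benchmark cannot evaluate reliably.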

    Suggested-by: H. Peter Anvin
    Signed-off-by: Zhengyuan Liu
    Signed-off-by: Song Liu

    Zhengyuan Liu