04 Sep, 2020

1 commit

  • syzbot reports,

    WARNING: inconsistent lock state
    5.9.0-rc2-syzkaller #0 Not tainted
    --------------------------------
    inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage.
    syz-executor.0/26715 takes:
    (padata_works_lock){+.?.}-{2:2}, at: padata_do_parallel kernel/padata.c:220
    {IN-SOFTIRQ-W} state was registered at:
    spin_lock include/linux/spinlock.h:354 [inline]
    padata_do_parallel kernel/padata.c:220
    ...
    __do_softirq kernel/softirq.c:298
    ...
    sysvec_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1091
    asm_sysvec_apic_timer_interrupt arch/x86/include/asm/idtentry.h:581

    Possible unsafe locking scenario:

    CPU0
    ----
    lock(padata_works_lock);
    <Interrupt>
      lock(padata_works_lock);

    *** DEADLOCK ***

    padata_do_parallel() takes padata_works_lock with softirqs enabled, so
    a deadlock is possible if, on the same CPU, the lock is acquired in
    process context and an interrupt then runs softirq handling that
    reaches the same path.

    Fix by leaving softirqs disabled while do_parallel holds
    padata_works_lock.
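
    A minimal sketch of the resulting pattern (not the literal kernel
    diff): a lock that is also taken from softirq context may only be
    held while softirqs are off on this CPU.

        local_bh_disable();               /* softirqs stay off ... */
        spin_lock(&padata_works_lock);    /* ... while the lock is held */
        /* grab a free padata_work, assign the job a sequence number */
        spin_unlock(&padata_works_lock);
        local_bh_enable();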

    Reported-by: syzbot+f4b9f49e38e25eb4ef52@syzkaller.appspotmail.com
    Fixes: 4611ce2246889 ("padata: allocate work structures for parallel jobs from a pool")
    Signed-off-by: Daniel Jordan
    Cc: Herbert Xu
    Cc: Steffen Klassert
    Cc: linux-crypto@vger.kernel.org
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Herbert Xu

    Daniel Jordan
     

23 Jul, 2020

6 commits

  • Only the reorder field of struct padata_parallel_queue is actually
    used now, so remove the struct and embed @reorder directly in
    parallel_data.

    No functional change, just a cleanup.

    Signed-off-by: Daniel Jordan
    Cc: Herbert Xu
    Cc: Steffen Klassert
    Cc: linux-crypto@vger.kernel.org
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Herbert Xu

    Daniel Jordan
     
  • There's no reason to have two allocation interfaces when there's only
    one caller. Removing padata_alloc_possible() saves text and
    simplifies future changes.

    Signed-off-by: Daniel Jordan
    Cc: Herbert Xu
    Cc: Steffen Klassert
    Cc: linux-crypto@vger.kernel.org
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Herbert Xu

    Daniel Jordan
     
  • A padata instance has effective cpumasks that store the user-supplied
    masks ANDed with the online mask, but this middleman is unnecessary:
    parallel_data already keeps the same information around. Removing the
    effective masks saves text and avoids code churn in future changes.

    Signed-off-by: Daniel Jordan
    Cc: Herbert Xu
    Cc: Steffen Klassert
    Cc: linux-crypto@vger.kernel.org
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Herbert Xu

    Daniel Jordan
     
  • pd_setup_cpumasks() has only one caller. Move its contents inline to
    prepare for the next cleanup.

    Signed-off-by: Daniel Jordan
    Cc: Herbert Xu
    Cc: Steffen Klassert
    Cc: linux-crypto@vger.kernel.org
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Herbert Xu

    Daniel Jordan
     
  • padata_stop() has two callers and is unnecessary in both cases. When
    pcrypt calls it before padata_free(), it's being unloaded so there are
    no outstanding padata jobs[0]. When __padata_free() calls it, it's
    either along the same path or else pcrypt initialization failed, which
    of course means there are also no outstanding jobs.

    Removing it simplifies padata and saves text.

    [0] https://lore.kernel.org/linux-crypto/20191119225017.mjrak2fwa5vccazl@gondor.apana.org.au/

    Signed-off-by: Daniel Jordan
    Cc: Herbert Xu
    Cc: Steffen Klassert
    Cc: linux-crypto@vger.kernel.org
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Herbert Xu

    Daniel Jordan
     
  • padata_start() is only used right after pcrypt allocates an instance
    with all possible CPUs, when PADATA_INVALID can't happen, so there's no
    need for a separate "start" step. It can be done during allocation to
    save text, make using padata easier, and avoid unneeded calls in the
    future.

    Signed-off-by: Daniel Jordan
    Cc: Herbert Xu
    Cc: Steffen Klassert
    Cc: linux-crypto@vger.kernel.org
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Herbert Xu

    Daniel Jordan
     

18 Jun, 2020

1 commit

  • A 5.7 kernel hangs during a tcrypt test of padata that waits for an AEAD
    request to finish. This is only seen on large machines running many
    concurrent requests.

    The issue is that padata never serializes the request. The removal of
    the reorder_objects atomic missed that the memory barrier in
    padata_do_serial() depends on it: smp_mb__after_atomic() only orders
    against a preceding atomic RMW operation, and on architectures such
    as x86 it compiles away entirely, so removing the atomic left no
    barrier at all.

    Upgrade the barrier from smp_mb__after_atomic to smp_mb to get correct
    ordering again.

    Fixes: 3facced7aeed1 ("padata: remove reorder_objects")
    Signed-off-by: Daniel Jordan
    Cc: Steffen Klassert
    Cc: linux-kernel@vger.kernel.org
    Cc:
    Signed-off-by: Herbert Xu

    Daniel Jordan
     

04 Jun, 2020

4 commits

  • Sometimes the kernel doesn't take full advantage of system memory
    bandwidth, leading to a single CPU spending excessive time in
    initialization paths where the data scales with memory size.

    Multithreading naturally addresses this problem.

    Extend padata, a framework that handles many parallel yet singlethreaded
    jobs, to also handle multithreaded jobs by adding support for splitting up
    the work evenly, specifying a minimum amount of work that's appropriate
    for one helper thread to do, load balancing between helpers, and
    coordinating them.
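
    The job description this support adds looks roughly like the
    following sketch (treat field names as approximate):

        struct padata_mt_job {
                void (*thread_fn)(unsigned long start, unsigned long end,
                                  void *arg);
                void            *fn_arg;     /* passed to thread_fn */
                unsigned long   start;       /* start of the job */
                unsigned long   size;        /* units of work in the job */
                unsigned long   align;       /* chunk boundary alignment */
                unsigned long   min_chunk;   /* least work per helper */
                int             max_threads; /* cap on helper threads */
        };

        void padata_do_multithreaded(struct padata_mt_job *job);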

    This is inspired by work from Pavel Tatashin and Steve Sistare.

    Signed-off-by: Daniel Jordan
    Signed-off-by: Andrew Morton
    Tested-by: Josh Triplett
    Cc: Alexander Duyck
    Cc: Alex Williamson
    Cc: Dan Williams
    Cc: Dave Hansen
    Cc: David Hildenbrand
    Cc: Herbert Xu
    Cc: Jason Gunthorpe
    Cc: Jonathan Corbet
    Cc: Kirill Tkhai
    Cc: Michal Hocko
    Cc: Pavel Machek
    Cc: Pavel Tatashin
    Cc: Peter Zijlstra
    Cc: Randy Dunlap
    Cc: Robert Elliott
    Cc: Shile Zhang
    Cc: Steffen Klassert
    Cc: Steven Sistare
    Cc: Tejun Heo
    Cc: Zi Yan
    Link: http://lkml.kernel.org/r/20200527173608.2885243-5-daniel.m.jordan@oracle.com
    Signed-off-by: Linus Torvalds

    Daniel Jordan
     
  • padata allocates per-CPU, per-instance work structs for parallel jobs. A
    do_parallel call assigns a job to a sequence number and hashes the number
    to a CPU, where the job will eventually run using the corresponding work.

    This approach fit with how padata used to bind a job to each CPU
    round-robin, but it makes less sense after commit bfde23ce200e6
    ("padata: unbind parallel jobs from specific CPUs") because a work
    isn't bound to a particular CPU anymore, and isn't needed at all for
    multithreaded jobs because they don't have sequence numbers.

    Replace the per-CPU works with a preallocated pool, which allows sharing
    them between existing padata users and the upcoming multithreaded user.
    The pool will also facilitate setting NUMA-aware concurrency limits with
    later users.

    The pool is sized according to the number of possible CPUs. With this
    limit, MAX_OBJ_NUM no longer makes sense, so remove it.

    If the global pool is exhausted, a parallel job is run in the current task
    instead to throttle a system trying to do too much in parallel.
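
    The fallback amounts to the following sketch (helper names assumed
    from this series):

        pw = padata_work_alloc();       /* NULL once the pool is empty */
        if (pw) {
                padata_work_init(pw, padata_parallel_worker, padata, 0);
                queue_work(pinst->parallel_wq, &pw->pw_work);
        } else {
                /* Works limit exceeded: throttle by running it here. */
                padata->parallel(padata);
        }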

    Signed-off-by: Daniel Jordan
    Signed-off-by: Andrew Morton
    Tested-by: Josh Triplett
    Cc: Alexander Duyck
    Cc: Alex Williamson
    Cc: Dan Williams
    Cc: Dave Hansen
    Cc: David Hildenbrand
    Cc: Herbert Xu
    Cc: Jason Gunthorpe
    Cc: Jonathan Corbet
    Cc: Kirill Tkhai
    Cc: Michal Hocko
    Cc: Pavel Machek
    Cc: Pavel Tatashin
    Cc: Peter Zijlstra
    Cc: Randy Dunlap
    Cc: Robert Elliott
    Cc: Shile Zhang
    Cc: Steffen Klassert
    Cc: Steven Sistare
    Cc: Tejun Heo
    Cc: Zi Yan
    Link: http://lkml.kernel.org/r/20200527173608.2885243-4-daniel.m.jordan@oracle.com
    Signed-off-by: Linus Torvalds

    Daniel Jordan
     
  • padata will soon initialize the system's struct pages in parallel, so it
    needs to be ready by page_alloc_init_late().

    The error return from padata_driver_init() triggers an initcall warning,
    so add a warning to padata_init() to avoid silent failure.

    Signed-off-by: Daniel Jordan
    Signed-off-by: Andrew Morton
    Tested-by: Josh Triplett
    Cc: Alexander Duyck
    Cc: Alex Williamson
    Cc: Dan Williams
    Cc: Dave Hansen
    Cc: David Hildenbrand
    Cc: Herbert Xu
    Cc: Jason Gunthorpe
    Cc: Jonathan Corbet
    Cc: Kirill Tkhai
    Cc: Michal Hocko
    Cc: Pavel Machek
    Cc: Pavel Tatashin
    Cc: Peter Zijlstra
    Cc: Randy Dunlap
    Cc: Robert Elliott
    Cc: Shile Zhang
    Cc: Steffen Klassert
    Cc: Steven Sistare
    Cc: Tejun Heo
    Cc: Zi Yan
    Link: http://lkml.kernel.org/r/20200527173608.2885243-3-daniel.m.jordan@oracle.com
    Signed-off-by: Linus Torvalds

    Daniel Jordan
     
  • Patch series "padata: parallelize deferred page init", v3.

    Deferred struct page init is a bottleneck in kernel boot--the biggest for
    us and probably others. Optimizing it maximizes availability for
    large-memory systems and allows spinning up short-lived VMs as needed
    without having to leave them running. It also benefits bare metal
    machines hosting VMs that are sensitive to downtime. In projects such as
    VMM Fast Restart[1], where guest state is preserved across kexec reboot,
    it helps prevent application and network timeouts in the guests.

    So, multithread deferred init to take full advantage of system memory
    bandwidth.

    Extend padata, a framework that handles many parallel singlethreaded jobs,
    to handle multithreaded jobs as well by adding support for splitting up
    the work evenly, specifying a minimum amount of work that's appropriate
    for one helper thread to do, load balancing between helpers, and
    coordinating them. More documentation in patches 4 and 8.

    This series is the first step in a project to address other memory
    proportional bottlenecks in the kernel such as pmem struct page init, vfio
    page pinning, hugetlb fallocate, and munmap. Deferred page init doesn't
    require concurrency limits, resource control, or priority adjustments like
    these other users will because it happens during boot when the system is
    otherwise idle and waiting for page init to finish.

    This has been run on a variety of x86 systems and speeds up kernel boot by
    4% to 49%, saving up to 1.6 out of 4 seconds. Patch 6 has more numbers.

    This patch (of 8):

    padata_driver_exit() is unnecessary because padata isn't built as a module
    and doesn't exit.

    padata's init routine will soon allocate memory, so getting rid of the
    exit function now avoids pointless code to free it.

    Signed-off-by: Daniel Jordan
    Signed-off-by: Andrew Morton
    Tested-by: Josh Triplett
    Cc: Alexander Duyck
    Cc: Alex Williamson
    Cc: Dan Williams
    Cc: Dave Hansen
    Cc: David Hildenbrand
    Cc: Herbert Xu
    Cc: Jason Gunthorpe
    Cc: Jonathan Corbet
    Cc: Kirill Tkhai
    Cc: Michal Hocko
    Cc: Pavel Machek
    Cc: Pavel Tatashin
    Cc: Peter Zijlstra
    Cc: Randy Dunlap
    Cc: Robert Elliott
    Cc: Shile Zhang
    Cc: Steffen Klassert
    Cc: Steven Sistare
    Cc: Tejun Heo
    Cc: Zi Yan
    Link: http://lkml.kernel.org/r/20200527173608.2885243-1-daniel.m.jordan@oracle.com
    Link: http://lkml.kernel.org/r/20200527173608.2885243-2-daniel.m.jordan@oracle.com
    Signed-off-by: Linus Torvalds

    Daniel Jordan
     

30 Apr, 2020

1 commit

  • Removing the pcrypt module triggers this:

    general protection fault, probably for non-canonical
    address 0xdead000000000122
    CPU: 5 PID: 264 Comm: modprobe Not tainted 5.6.0+ #2
    Hardware name: QEMU Standard PC
    RIP: 0010:__cpuhp_state_remove_instance+0xcc/0x120
    Call Trace:
    padata_sysfs_release+0x74/0xce
    kobject_put+0x81/0xd0
    padata_free+0x12/0x20
    pcrypt_exit+0x43/0x8ee [pcrypt]

    padata instances wrongly use the same hlist node for the online and dead
    states, so __padata_free()'s second cpuhp remove call chokes on the node
    that the first poisoned.

    cpuhp multi-instance callbacks only walk forward in cpuhp_step->list and
    the same node is linked in both the online and dead lists, so the list
    corruption that results from padata_alloc() adding the node to a second
    list without removing it from the first doesn't cause problems as long
    as no instances are freed.

    Avoid the issue by giving each state its own node.
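
    The fix boils down to this sketch (field names assumed):

        struct padata_instance {
                /* ... */
                struct hlist_node       cpu_online_node; /* online state */
                struct hlist_node       cpu_dead_node;   /* dead state */
                /* ... */
        };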

    Fixes: 894c9ef9780c ("padata: validate cpumask without removed CPU during offline")
    Signed-off-by: Daniel Jordan
    Cc: Herbert Xu
    Cc: Steffen Klassert
    Cc: linux-crypto@vger.kernel.org
    Cc: linux-kernel@vger.kernel.org
    Cc: stable@vger.kernel.org # v5.4+
    Signed-off-by: Herbert Xu

    Daniel Jordan
     

06 Mar, 2020

1 commit

  • Simplify the error handling in pcrypt_create_aead() by taking advantage
    of crypto_grab_aead() now handling an ERR_PTR() name and by taking
    advantage of crypto_drop_aead() now accepting (as a no-op) a spawn that
    hasn't been grabbed yet.

    This required also making padata_free_shell() accept a NULL argument.

    Signed-off-by: Eric Biggers
    Signed-off-by: Herbert Xu

    Eric Biggers
     

22 Feb, 2020

1 commit

  • According to Geert's report[0],

    kernel/padata.c: warning: 'err' may be used uninitialized in this
    function [-Wuninitialized]: => 539:2

    Warning is seen only with older compilers on certain archs. The
    runtime effect is potentially returning garbage down the stack when
    padata's cpumasks are modified before any pcrypt requests have run.

    Simplest fix is to initialize err to the success value.

    [0] http://lkml.kernel.org/r/20200210135506.11536-1-geert@linux-m68k.org

    Reported-by: Geert Uytterhoeven
    Fixes: bbefa1dd6a6d ("crypto: pcrypt - Avoid deadlock by using per-instance padata queues")
    Signed-off-by: Daniel Jordan
    Cc: Herbert Xu
    Cc: Steffen Klassert
    Cc: linux-crypto@vger.kernel.org
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Herbert Xu

    Daniel Jordan
     

11 Dec, 2019

8 commits

  • Remove references to unused functions, standardize language, update to
    reflect new functionality, migrate to rst format, and fix all kernel-doc
    warnings.

    Fixes: 815613da6a67 ("kernel/padata.c: removed unused code")
    Signed-off-by: Daniel Jordan
    Cc: Eric Biggers
    Cc: Herbert Xu
    Cc: Jonathan Corbet
    Cc: Steffen Klassert
    Cc: linux-crypto@vger.kernel.org
    Cc: linux-doc@vger.kernel.org
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Daniel Jordan
    Signed-off-by: Herbert Xu

    Daniel Jordan
     
  • reorder_objects is unused since the rework of padata's flushing, so
    remove it.

    Signed-off-by: Daniel Jordan
    Cc: Eric Biggers
    Cc: Herbert Xu
    Cc: Steffen Klassert
    Cc: linux-crypto@vger.kernel.org
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Herbert Xu

    Daniel Jordan
     
  • Since commit 63d3578892dc ("crypto: pcrypt - remove padata cpumask
    notifier") this feature is unused, so get rid of it.

    Signed-off-by: Daniel Jordan
    Cc: Eric Biggers
    Cc: Herbert Xu
    Cc: Jonathan Corbet
    Cc: Steffen Klassert
    Cc: linux-crypto@vger.kernel.org
    Cc: linux-doc@vger.kernel.org
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Herbert Xu

    Daniel Jordan
     
  • lockdep complains when padata's paths to update cpumasks via CPU hotplug
    and sysfs are both taken:

    # echo 0 > /sys/devices/system/cpu/cpu1/online
    # echo ff > /sys/kernel/pcrypt/pencrypt/parallel_cpumask

    ======================================================
    WARNING: possible circular locking dependency detected
    5.4.0-rc8-padata-cpuhp-v3+ #1 Not tainted
    ------------------------------------------------------
    bash/205 is trying to acquire lock:
    ffffffff8286bcd0 (cpu_hotplug_lock.rw_sem){++++}, at: padata_set_cpumask+0x2b/0x120

    but task is already holding lock:
    ffff8880001abfa0 (&pinst->lock){+.+.}, at: padata_set_cpumask+0x26/0x120

    which lock already depends on the new lock.

    padata doesn't take cpu_hotplug_lock and pinst->lock in a consistent
    order. Which should be first? CPU hotplug calls into padata with
    cpu_hotplug_lock already held, so it should have priority.
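
    Sketched, every path now takes the locks in the same order:

        get_online_cpus();              /* cpu_hotplug_lock first */
        mutex_lock(&pinst->lock);
        /* ... apply the new cpumask ... */
        mutex_unlock(&pinst->lock);
        put_online_cpus();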

    Fixes: 6751fb3c0e0c ("padata: Use get_online_cpus/put_online_cpus")
    Signed-off-by: Daniel Jordan
    Cc: Eric Biggers
    Cc: Herbert Xu
    Cc: Steffen Klassert
    Cc: linux-crypto@vger.kernel.org
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Herbert Xu

    Daniel Jordan
     
  • Configuring an instance's parallel mask without any online CPUs...

    echo 2 > /sys/kernel/pcrypt/pencrypt/parallel_cpumask
    echo 0 > /sys/devices/system/cpu/cpu1/online

    ...makes tcrypt mode=215 crash like this:

    divide error: 0000 [#1] SMP PTI
    CPU: 4 PID: 283 Comm: modprobe Not tainted 5.4.0-rc8-padata-doc-v2+ #2
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20191013_105130-anatol 04/01/2014
    RIP: 0010:padata_do_parallel+0x114/0x300
    Call Trace:
    pcrypt_aead_encrypt+0xc0/0xd0 [pcrypt]
    crypto_aead_encrypt+0x1f/0x30
    do_mult_aead_op+0x4e/0xdf [tcrypt]
    test_mb_aead_speed.constprop.0.cold+0x226/0x564 [tcrypt]
    do_test+0x28c2/0x4d49 [tcrypt]
    tcrypt_mod_init+0x55/0x1000 [tcrypt]
    ...

    cpumask_weight() in padata_cpu_hash() returns 0 because the mask has no
    CPUs. The problem is __padata_remove_cpu() checks for valid masks too
    early and so doesn't mark the instance PADATA_INVALID as expected, which
    would have made padata_do_parallel() return error before doing the
    division.

    Fix by introducing a second padata CPU hotplug state before
    CPUHP_BRINGUP_CPU so that __padata_remove_cpu() sees the online mask
    without @cpu. No need for the second argument to padata_replace() since
    @cpu is now already missing from the online mask.

    Fixes: 33e54450683c ("padata: Handle empty padata cpumasks")
    Signed-off-by: Daniel Jordan
    Cc: Eric Biggers
    Cc: Herbert Xu
    Cc: Sebastian Andrzej Siewior
    Cc: Steffen Klassert
    Cc: Thomas Gleixner
    Cc: linux-crypto@vger.kernel.org
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Herbert Xu

    Daniel Jordan
     
  • If the pcrypt template is used multiple times in an algorithm, then a
    deadlock occurs because all pcrypt instances share the same
    padata_instance, which completes requests in the order submitted. That
    is, the inner pcrypt request waits for the outer pcrypt request while
    the outer request is already waiting for the inner.

    This patch fixes this by allocating a set of queues for each pcrypt
    instance instead of using two global queues. In order to maintain
    the existing user-space interface, the pinst structure remains global
    so any sysfs modifications will apply to every pcrypt instance.

    Note that when an update occurs we have to allocate memory for
    every pcrypt instance. Should one of the allocations fail we
    will abort the update without rolling back changes already made.

    The new per-instance data structure is called padata_shell and is
    essentially a wrapper around parallel_data.

    Reproducer:

    #include <linux/if_alg.h>
    #include <sys/socket.h>
    #include <unistd.h>

    int main()
    {
            struct sockaddr_alg addr = {
                    .salg_type = "aead",
                    .salg_name = "pcrypt(pcrypt(rfc4106-gcm-aesni))"
            };
            int algfd, reqfd;
            char buf[32] = { 0 };

            algfd = socket(AF_ALG, SOCK_SEQPACKET, 0);
            bind(algfd, (void *)&addr, sizeof(addr));
            setsockopt(algfd, SOL_ALG, ALG_SET_KEY, buf, 20);
            reqfd = accept(algfd, 0, 0);
            write(reqfd, buf, 32);
            read(reqfd, buf, 16);
    }

    Reported-by: syzbot+56c7151cad94eec37c521f0e47d2eee53f9361c4@syzkaller.appspotmail.com
    Fixes: 5068c7a883d1 ("crypto: pcrypt - Add pcrypt crypto parallelization wrapper")
    Signed-off-by: Herbert Xu
    Tested-by: Eric Biggers
    Signed-off-by: Herbert Xu

    Herbert Xu
     
  • The function padata_remove_cpu was supposed to have been removed
    along with padata_add_cpu but somehow it remained behind. Let's
    kill it now as it doesn't even have a prototype anymore.

    Fixes: 815613da6a67 ("kernel/padata.c: removed unused code")
    Signed-off-by: Herbert Xu
    Reviewed-by: Daniel Jordan
    Signed-off-by: Herbert Xu

    Herbert Xu
     
  • The function padata_flush_queues is fundamentally broken because
    it cannot force padata users to complete the request that is
    underway. IOW padata has to passively wait for the completion
    of any outstanding work.

    As it stands flushing is used in two places. Its use in padata_stop
    is simply unnecessary because nothing depends on the queues to
    be flushed afterwards.

    The other use in padata_replace is more substantial as we depend
    on it to free the old pd structure. This patch instead uses the
    pd->refcnt to dynamically free the pd structure once all requests
    are complete.
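
    A sketch of the scheme (assumed names): each outstanding request
    holds a reference on the pd, and the last completion frees it:

        if (atomic_sub_and_test(cnt, &pd->refcnt))
                padata_free_pd(pd);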

    Fixes: 2b73b07ab8a4 ("padata: Flush the padata queues actively")
    Cc:
    Signed-off-by: Herbert Xu
    Reviewed-by: Daniel Jordan
    Signed-off-by: Herbert Xu

    Herbert Xu
     

13 Sep, 2019

6 commits

  • With the removal of the ENODATA case from padata_get_next, the cpu_index
    field is no longer useful, so it can go away.

    Signed-off-by: Daniel Jordan
    Acked-by: Steffen Klassert
    Cc: Herbert Xu
    Cc: Lai Jiangshan
    Cc: Peter Zijlstra
    Cc: Tejun Heo
    Cc: linux-crypto@vger.kernel.org
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Herbert Xu

    Daniel Jordan
     
  • Padata binds the parallel part of a job to a single CPU and round-robins
    over all CPUs in the system for each successive job. Though the serial
    parts rely on per-CPU queues for correct ordering, they're not necessary
    for parallel work, and it improves performance to run the job locally on
    NUMA machines and let the scheduler pick the CPU within a node on a busy
    system.

    So, make the parallel workqueue unbound.

    Update the parallel workqueue's cpumask when the instance's parallel
    cpumask changes.

    Now that parallel jobs no longer run on max_active=1 workqueues, two or
    more parallel works that hash to the same CPU may run simultaneously,
    finish out of order, and so be serialized out of order. Prevent this by
    keeping the works sorted on the reorder list by sequence number and
    checking that in the reordering logic.

    padata_get_next becomes padata_find_next so it can be reused for the end
    of padata_reorder, where it's used to avoid uselessly queueing work when
    the next job by sequence number isn't finished yet but a later job that
    hashed to the same CPU has.

    The ENODATA case in padata_find_next no longer makes sense because
    parallel jobs aren't bound to specific CPUs. The EINPROGRESS case takes
    care of the scenario where a parallel job is potentially running on the
    same CPU as padata_find_next, and with only one error code left, just
    use NULL instead.
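
    The sorted insert in padata_do_serial looks roughly like this sketch:

        /* Sort in ascending order of sequence number. */
        list_for_each_entry_reverse(cur, &pqueue->reorder.list, list)
                if (cur->seq_nr < padata->seq_nr)
                        break;
        list_add(&padata->list, &cur->list);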

    Signed-off-by: Daniel Jordan
    Cc: Herbert Xu
    Cc: Lai Jiangshan
    Cc: Peter Zijlstra
    Cc: Steffen Klassert
    Cc: Tejun Heo
    Cc: linux-crypto@vger.kernel.org
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Herbert Xu

    Daniel Jordan
     
  • padata currently uses one per-CPU workqueue per instance for all work.

    Prepare for running parallel jobs on an unbound workqueue by introducing
    dedicated workqueues for parallel and serial work.

    Signed-off-by: Daniel Jordan
    Acked-by: Steffen Klassert
    Cc: Herbert Xu
    Cc: Lai Jiangshan
    Cc: Peter Zijlstra
    Cc: Tejun Heo
    Cc: linux-crypto@vger.kernel.org
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Herbert Xu

    Daniel Jordan
     
  • With pcrypt's cpumask no longer used, take the CPU hotplug lock inside
    padata_alloc_possible.

    Useful later in the series for avoiding nested acquisition of the CPU
    hotplug lock in padata when padata_alloc_possible is allocating an
    unbound workqueue.

    Without this patch, this nested acquisition would happen later in the
    series:

    pcrypt_init_padata
      get_online_cpus
      alloc_padata_possible
        alloc_padata
          alloc_workqueue(WQ_UNBOUND)   // later in the series
            alloc_and_link_pwqs
              apply_wqattrs_lock
                get_online_cpus         // recursive rwsem acquisition

    Signed-off-by: Daniel Jordan
    Acked-by: Steffen Klassert
    Cc: Herbert Xu
    Cc: Lai Jiangshan
    Cc: Peter Zijlstra
    Cc: Tejun Heo
    Cc: linux-crypto@vger.kernel.org
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Herbert Xu

    Daniel Jordan
     
  • padata_do_parallel currently returns -EINVAL if the callback CPU isn't
    in the callback cpumask.

    pcrypt tries to prevent this situation by keeping its own callback
    cpumask in sync with padata's and checks that the callback CPU it passes
    to padata is valid. Make padata handle this instead.

    padata_do_parallel now takes a pointer to the callback CPU and updates
    it for the caller if an alternate CPU is used. Overall behavior in
    terms of which callback CPUs are chosen stays the same.
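
    The resulting interface, roughly:

        /* *cb_cpu is updated if padata chooses an alternate CPU. */
        int padata_do_parallel(struct padata_instance *pinst,
                               struct padata_priv *padata, int *cb_cpu);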

    Prepares for removal of the padata cpumask notifier in pcrypt, which
    will fix a lockdep complaint about nested acquisition of the CPU hotplug
    lock later in the series.

    Signed-off-by: Daniel Jordan
    Acked-by: Steffen Klassert
    Cc: Herbert Xu
    Cc: Lai Jiangshan
    Cc: Peter Zijlstra
    Cc: Tejun Heo
    Cc: linux-crypto@vger.kernel.org
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Herbert Xu

    Daniel Jordan
     
  • Move workqueue allocation inside of padata to prepare for further
    changes to how padata uses workqueues.

    Guarantees the workqueue is created with max_active=1, which padata
    relies on to work correctly. No functional change.

    Signed-off-by: Daniel Jordan
    Acked-by: Steffen Klassert
    Cc: Herbert Xu
    Cc: Jonathan Corbet
    Cc: Lai Jiangshan
    Cc: Peter Zijlstra
    Cc: Tejun Heo
    Cc: linux-crypto@vger.kernel.org
    Cc: linux-doc@vger.kernel.org
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Herbert Xu

    Daniel Jordan
     

09 Aug, 2019

1 commit

  • Exercising CPU hotplug on a 5.2 kernel with recent padata fixes from
    cryptodev-2.6.git in an 8-CPU kvm guest...

    # modprobe tcrypt alg="pcrypt(rfc4106(gcm(aes)))" type=3
    # echo 0 > /sys/devices/system/cpu/cpu1/online
    # echo c > /sys/kernel/pcrypt/pencrypt/parallel_cpumask
    # modprobe tcrypt mode=215

    ...caused the following crash:

    BUG: kernel NULL pointer dereference, address: 0000000000000000
    #PF: supervisor read access in kernel mode
    #PF: error_code(0x0000) - not-present page
    PGD 0 P4D 0
    Oops: 0000 [#1] SMP PTI
    CPU: 2 PID: 134 Comm: kworker/2:2 Not tainted 5.2.0-padata-base+ #7
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-
    Workqueue: pencrypt padata_parallel_worker
    RIP: 0010:padata_reorder+0xcb/0x180
    ...
    Call Trace:
    padata_do_serial+0x57/0x60
    pcrypt_aead_enc+0x3a/0x50 [pcrypt]
    padata_parallel_worker+0x9b/0xe0
    process_one_work+0x1b5/0x3f0
    worker_thread+0x4a/0x3c0
    ...

    In padata_alloc_pd, pd->cpu is set using the user-supplied cpumask
    instead of the effective cpumask, and in this case cpumask_first picked
    an offline CPU.

    The offline CPU's reorder->list.next is NULL in padata_reorder because
    the list wasn't initialized in padata_init_pqueues, which only operates
    on CPUs in the effective mask.

    Fix by using the effective mask in padata_alloc_pd.
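
    The fix is essentially one line (sketch):

        pd->cpu = cpumask_first(pd->cpumask.pcpu); /* effective mask */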

    Fixes: 6fc4dbcf0276 ("padata: Replace delayed timer with immediate workqueue in padata_reorder")
    Signed-off-by: Daniel Jordan
    Cc: Herbert Xu
    Cc: Steffen Klassert
    Cc: linux-crypto@vger.kernel.org
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Herbert Xu

    Daniel Jordan
     

27 Jul, 2019

2 commits

  • With the removal of the padata timer, padata_do_serial no longer
    needs special CPU handling, so remove it.

    Signed-off-by: Daniel Jordan
    Cc: Herbert Xu
    Cc: Steffen Klassert
    Cc: linux-crypto@vger.kernel.org
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Herbert Xu

    Daniel Jordan
     
  • The function padata_reorder will use a timer when it cannot progress
    while completed jobs are outstanding (pd->reorder_objects > 0). This
    is suboptimal as if we do end up using the timer then it would have
    introduced a gratuitous delay of one second.

    In fact we can easily distinguish between whether completed jobs
    are outstanding and whether we can make progress. All we have to
    do is look at the next pqueue list.

    This patch does that by replacing pd->processed with pd->cpu so
    that the next pqueue is more accessible.
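
    Sketch of the bookkeeping: pd->cpu names the CPU whose queue is next
    in line, and reordering advances it with a wrapping walk of the pcpu
    mask:

        pd->cpu = cpumask_next_wrap(cpu, pd->cpumask.pcpu, -1, false);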

    A work queue is used instead of the original try_again to avoid
    hogging the CPU.

    Note that we don't bother removing the work queue in
    padata_flush_queues because the whole premise is broken. You
    cannot flush async crypto requests so it makes no sense to even
    try. A subsequent patch will fix it by replacing it with a ref
    counting scheme.

    Signed-off-by: Herbert Xu

    Herbert Xu
     

18 Jul, 2019

1 commit

  • Testing padata with the tcrypt module on a 5.2 kernel...

    # modprobe tcrypt alg="pcrypt(rfc4106(gcm(aes)))" type=3
    # modprobe tcrypt mode=211 sec=1

    ...produces this splat:

    INFO: task modprobe:10075 blocked for more than 120 seconds.
    Not tainted 5.2.0-base+ #16
    modprobe D 0 10075 10064 0x80004080
    Call Trace:
    ? __schedule+0x4dd/0x610
    ? ring_buffer_unlock_commit+0x23/0x100
    schedule+0x6c/0x90
    schedule_timeout+0x3b/0x320
    ? trace_buffer_unlock_commit_regs+0x4f/0x1f0
    wait_for_common+0x160/0x1a0
    ? wake_up_q+0x80/0x80
    { crypto_wait_req } # entries in braces added by hand
    { do_one_aead_op }
    { test_aead_jiffies }
    test_aead_speed.constprop.17+0x681/0xf30 [tcrypt]
    do_test+0x4053/0x6a2b [tcrypt]
    ? 0xffffffffa00f4000
    tcrypt_mod_init+0x50/0x1000 [tcrypt]
    ...

    The second modprobe command never finishes because in padata_reorder,
    CPU0's load of reorder_objects is executed before the unlocking store in
    spin_unlock_bh(pd->lock), causing CPU0 to miss CPU1's increment:

    CPU0                                  CPU1

    padata_reorder                        padata_do_serial
      LOAD reorder_objects  // 0
                                            INC reorder_objects  // 1
                                            padata_reorder
                                              TRYLOCK pd->lock  // failed
      UNLOCK pd->lock

    CPU0 deletes the timer before returning from padata_reorder and since no
    other job is submitted to padata, modprobe waits indefinitely.

    Add a pair of full barriers to guarantee proper ordering:

    CPU0                                  CPU1

    padata_reorder                        padata_do_serial
      UNLOCK pd->lock
      smp_mb()
      LOAD reorder_objects
                                            INC reorder_objects
                                            smp_mb__after_atomic()
                                            padata_reorder
                                              TRYLOCK pd->lock

    smp_mb__after_atomic is needed so the read part of the trylock operation
    comes after the INC, as Andrea points out. Thanks also to Andrea for
    help with writing a litmus test.

    Fixes: 16295bec6398 ("padata: Generic parallelization/serialization interface")
    Signed-off-by: Daniel Jordan
    Cc:
    Cc: Andrea Parri
    Cc: Boqun Feng
    Cc: Herbert Xu
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Steffen Klassert
    Cc: linux-arch@vger.kernel.org
    Cc: linux-crypto@vger.kernel.org
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Herbert Xu

    Daniel Jordan
     

26 Apr, 2019

1 commit

  • The kobj_type default_attrs field is being replaced by the
    default_groups field. Replace padata_attr_type's default_attrs field
    with default_groups and use the ATTRIBUTE_GROUPS macro to create
    padata_default_groups.

    This patch was tested by loading the pcrypt module and verifying that
    the sysfs files for the attributes in the default groups were created.

    Signed-off-by: Kimberly Brown
    Signed-off-by: Greg Kroah-Hartman

    Kimberly Brown
     

22 Nov, 2017

1 commit

  • This converts all remaining cases of the old setup_timer() API into using
    timer_setup(), where the callback argument is the structure already
    holding the struct timer_list. These should have no behavioral changes,
    since they just change which pointer is passed into the callback with
    the same available pointers after conversion. It handles the following
    examples, in addition to some other variations.

    Casting from unsigned long:

        void my_callback(unsigned long data)
        {
                struct something *ptr = (struct something *)data;
                ...
        }
        ...
        setup_timer(&ptr->my_timer, my_callback, ptr);

    and forced object casts:

        void my_callback(struct something *ptr)
        {
                ...
        }
        ...
        setup_timer(&ptr->my_timer, my_callback, (unsigned long)ptr);

    become:

        void my_callback(struct timer_list *t)
        {
                struct something *ptr = from_timer(ptr, t, my_timer);
                ...
        }
        ...
        timer_setup(&ptr->my_timer, my_callback, 0);

    Direct function assignments:

        void my_callback(unsigned long data)
        {
                struct something *ptr = (struct something *)data;
                ...
        }
        ...
        ptr->my_timer.function = my_callback;

    have a temporary cast added, along with converting the args:

        void my_callback(struct timer_list *t)
        {
                struct something *ptr = from_timer(ptr, t, my_timer);
                ...
        }
        ...
        ptr->my_timer.function = (TIMER_FUNC_TYPE)my_callback;

    And finally, callbacks without a data assignment:

        void my_callback(unsigned long data)
        {
                ...
        }
        ...
        setup_timer(&ptr->my_timer, my_callback, 0);

    have their argument renamed to verify they're unused during conversion:

        void my_callback(struct timer_list *unused)
        {
                ...
        }
        ...
        timer_setup(&ptr->my_timer, my_callback, 0);

    The conversion is done with the following Coccinelle script:

    spatch --very-quiet --all-includes --include-headers \
    -I ./arch/x86/include -I ./arch/x86/include/generated \
    -I ./include -I ./arch/x86/include/uapi \
    -I ./arch/x86/include/generated/uapi -I ./include/uapi \
    -I ./include/generated/uapi --include ./include/linux/kconfig.h \
    --dir . \
    --cocci-file ~/src/data/timer_setup.cocci

    @fix_address_of@
    expression e;
    @@

    setup_timer(
    -&(e)
    +&e
    , ...)

    // Update any raw setup_timer() usages that have a NULL callback, but
    // would otherwise match change_timer_function_usage, since the latter
    // will update all function assignments done in the face of a NULL
    // function initialization in setup_timer().
    @change_timer_function_usage_NULL@
    expression _E;
    identifier _timer;
    type _cast_data;
    @@

    (
    -setup_timer(&_E->_timer, NULL, _E);
    +timer_setup(&_E->_timer, NULL, 0);
    |
    -setup_timer(&_E->_timer, NULL, (_cast_data)_E);
    +timer_setup(&_E->_timer, NULL, 0);
    |
    -setup_timer(&_E._timer, NULL, &_E);
    +timer_setup(&_E._timer, NULL, 0);
    |
    -setup_timer(&_E._timer, NULL, (_cast_data)&_E);
    +timer_setup(&_E._timer, NULL, 0);
    )

    @change_timer_function_usage@
    expression _E;
    identifier _timer;
    struct timer_list _stl;
    identifier _callback;
    type _cast_func, _cast_data;
    @@

    (
    -setup_timer(&_E->_timer, _callback, _E);
    +timer_setup(&_E->_timer, _callback, 0);
    |
    -setup_timer(&_E->_timer, &_callback, _E);
    +timer_setup(&_E->_timer, _callback, 0);
    |
    -setup_timer(&_E->_timer, _callback, (_cast_data)_E);
    +timer_setup(&_E->_timer, _callback, 0);
    |
    -setup_timer(&_E->_timer, &_callback, (_cast_data)_E);
    +timer_setup(&_E->_timer, _callback, 0);
    |
    -setup_timer(&_E->_timer, (_cast_func)_callback, _E);
    +timer_setup(&_E->_timer, _callback, 0);
    |
    -setup_timer(&_E->_timer, (_cast_func)&_callback, _E);
    +timer_setup(&_E->_timer, _callback, 0);
    |
    -setup_timer(&_E->_timer, (_cast_func)_callback, (_cast_data)_E);
    +timer_setup(&_E->_timer, _callback, 0);
    |
    -setup_timer(&_E->_timer, (_cast_func)&_callback, (_cast_data)_E);
    +timer_setup(&_E->_timer, _callback, 0);
    |
    -setup_timer(&_E._timer, _callback, (_cast_data)_E);
    +timer_setup(&_E._timer, _callback, 0);
    |
    -setup_timer(&_E._timer, _callback, (_cast_data)&_E);
    +timer_setup(&_E._timer, _callback, 0);
    |
    -setup_timer(&_E._timer, &_callback, (_cast_data)_E);
    +timer_setup(&_E._timer, _callback, 0);
    |
    -setup_timer(&_E._timer, &_callback, (_cast_data)&_E);
    +timer_setup(&_E._timer, _callback, 0);
    |
    -setup_timer(&_E._timer, (_cast_func)_callback, (_cast_data)_E);
    +timer_setup(&_E._timer, _callback, 0);
    |
    -setup_timer(&_E._timer, (_cast_func)_callback, (_cast_data)&_E);
    +timer_setup(&_E._timer, _callback, 0);
    |
    -setup_timer(&_E._timer, (_cast_func)&_callback, (_cast_data)_E);
    +timer_setup(&_E._timer, _callback, 0);
    |
    -setup_timer(&_E._timer, (_cast_func)&_callback, (_cast_data)&_E);
    +timer_setup(&_E._timer, _callback, 0);
    |
    _E->_timer@_stl.function = _callback;
    |
    _E->_timer@_stl.function = &_callback;
    |
    _E->_timer@_stl.function = (_cast_func)_callback;
    |
    _E->_timer@_stl.function = (_cast_func)&_callback;
    |
    _E._timer@_stl.function = _callback;
    |
    _E._timer@_stl.function = &_callback;
    |
    _E._timer@_stl.function = (_cast_func)_callback;
    |
    _E._timer@_stl.function = (_cast_func)&_callback;
    )

    // callback(unsigned long arg)
    @change_callback_handle_cast
    depends on change_timer_function_usage@
    identifier change_timer_function_usage._callback;
    identifier change_timer_function_usage._timer;
    type _origtype;
    identifier _origarg;
    type _handletype;
    identifier _handle;
    @@

    void _callback(
    -_origtype _origarg
    +struct timer_list *t
    )
    {
    (
    ... when != _origarg
    _handletype *_handle =
    -(_handletype *)_origarg;
    +from_timer(_handle, t, _timer);
    ... when != _origarg
    |
    ... when != _origarg
    _handletype *_handle =
    -(void *)_origarg;
    +from_timer(_handle, t, _timer);
    ... when != _origarg
    |
    ... when != _origarg
    _handletype *_handle;
    ... when != _handle
    _handle =
    -(_handletype *)_origarg;
    +from_timer(_handle, t, _timer);
    ... when != _origarg
    |
    ... when != _origarg
    _handletype *_handle;
    ... when != _handle
    _handle =
    -(void *)_origarg;
    +from_timer(_handle, t, _timer);
    ... when != _origarg
    )
    }

    // callback(unsigned long arg) without existing variable
    @change_callback_handle_cast_no_arg
    depends on change_timer_function_usage &&
    !change_callback_handle_cast@
    identifier change_timer_function_usage._callback;
    identifier change_timer_function_usage._timer;
    type _origtype;
    identifier _origarg;
    type _handletype;
    @@

    void _callback(
    -_origtype _origarg
    +struct timer_list *t
    )
    {
    + _handletype *_origarg = from_timer(_origarg, t, _timer);
    +
    ... when != _origarg
    - (_handletype *)_origarg
    + _origarg
    ... when != _origarg
    }

    // Avoid already converted callbacks.
    @match_callback_converted
    depends on change_timer_function_usage &&
    !change_callback_handle_cast &&
    !change_callback_handle_cast_no_arg@
    identifier change_timer_function_usage._callback;
    identifier t;
    @@

    void _callback(struct timer_list *t)
    { ... }

    // callback(struct something *handle)
    @change_callback_handle_arg
    depends on change_timer_function_usage &&
    !match_callback_converted &&
    !change_callback_handle_cast &&
    !change_callback_handle_cast_no_arg@
    identifier change_timer_function_usage._callback;
    identifier change_timer_function_usage._timer;
    type _handletype;
    identifier _handle;
    @@

    void _callback(
    -_handletype *_handle
    +struct timer_list *t
    )
    {
    + _handletype *_handle = from_timer(_handle, t, _timer);
    ...
    }

    // If change_callback_handle_arg ran on an empty function, remove
    // the added handler.
    @unchange_callback_handle_arg
    depends on change_timer_function_usage &&
    change_callback_handle_arg@
    identifier change_timer_function_usage._callback;
    identifier change_timer_function_usage._timer;
    type _handletype;
    identifier _handle;
    identifier t;
    @@

    void _callback(struct timer_list *t)
    {
    - _handletype *_handle = from_timer(_handle, t, _timer);
    }

    // We only want to refactor the setup_timer() data argument if we've found
    // the matching callback. This undoes changes in change_timer_function_usage.
    @unchange_timer_function_usage
    depends on change_timer_function_usage &&
    !change_callback_handle_cast &&
    !change_callback_handle_cast_no_arg &&
    !change_callback_handle_arg@
    expression change_timer_function_usage._E;
    identifier change_timer_function_usage._timer;
    identifier change_timer_function_usage._callback;
    type change_timer_function_usage._cast_data;
    @@

    (
    -timer_setup(&_E->_timer, _callback, 0);
    +setup_timer(&_E->_timer, _callback, (_cast_data)_E);
    |
    -timer_setup(&_E._timer, _callback, 0);
    +setup_timer(&_E._timer, _callback, (_cast_data)&_E);
    )

    // If we fixed a callback from a .function assignment, fix the
    // assignment cast now.
    @change_timer_function_assignment
    depends on change_timer_function_usage &&
    (change_callback_handle_cast ||
    change_callback_handle_cast_no_arg ||
    change_callback_handle_arg)@
    expression change_timer_function_usage._E;
    identifier change_timer_function_usage._timer;
    identifier change_timer_function_usage._callback;
    type _cast_func;
    typedef TIMER_FUNC_TYPE;
    @@

    (
    _E->_timer.function =
    -_callback
    +(TIMER_FUNC_TYPE)_callback
    ;
    |
    _E->_timer.function =
    -&_callback
    +(TIMER_FUNC_TYPE)_callback
    ;
    |
    _E->_timer.function =
    -(_cast_func)_callback;
    +(TIMER_FUNC_TYPE)_callback
    ;
    |
    _E->_timer.function =
    -(_cast_func)&_callback
    +(TIMER_FUNC_TYPE)_callback
    ;
    |
    _E._timer.function =
    -_callback
    +(TIMER_FUNC_TYPE)_callback
    ;
    |
    _E._timer.function =
    -&_callback;
    +(TIMER_FUNC_TYPE)_callback
    ;
    |
    _E._timer.function =
    -(_cast_func)_callback
    +(TIMER_FUNC_TYPE)_callback
    ;
    |
    _E._timer.function =
    -(_cast_func)&_callback
    +(TIMER_FUNC_TYPE)_callback
    ;
    )

    // Sometimes timer functions are called directly. Replace matched args.
    @change_timer_function_calls
    depends on change_timer_function_usage &&
    (change_callback_handle_cast ||
    change_callback_handle_cast_no_arg ||
    change_callback_handle_arg)@
    expression _E;
    identifier change_timer_function_usage._timer;
    identifier change_timer_function_usage._callback;
    type _cast_data;
    @@

    _callback(
    (
    -(_cast_data)_E
    +&_E->_timer
    |
    -(_cast_data)&_E
    +&_E._timer
    |
    -_E
    +&_E->_timer
    )
    )

    // If a timer has been configured without a data argument, it can be
    // converted without regard to the callback argument, since it is unused.
    @match_timer_function_unused_data@
    expression _E;
    identifier _timer;
    identifier _callback;
    @@

    (
    -setup_timer(&_E->_timer, _callback, 0);
    +timer_setup(&_E->_timer, _callback, 0);
    |
    -setup_timer(&_E->_timer, _callback, 0L);
    +timer_setup(&_E->_timer, _callback, 0);
    |
    -setup_timer(&_E->_timer, _callback, 0UL);
    +timer_setup(&_E->_timer, _callback, 0);
    |
    -setup_timer(&_E._timer, _callback, 0);
    +timer_setup(&_E._timer, _callback, 0);
    |
    -setup_timer(&_E._timer, _callback, 0L);
    +timer_setup(&_E._timer, _callback, 0);
    |
    -setup_timer(&_E._timer, _callback, 0UL);
    +timer_setup(&_E._timer, _callback, 0);
    |
    -setup_timer(&_timer, _callback, 0);
    +timer_setup(&_timer, _callback, 0);
    |
    -setup_timer(&_timer, _callback, 0L);
    +timer_setup(&_timer, _callback, 0);
    |
    -setup_timer(&_timer, _callback, 0UL);
    +timer_setup(&_timer, _callback, 0);
    |
    -setup_timer(_timer, _callback, 0);
    +timer_setup(_timer, _callback, 0);
    |
    -setup_timer(_timer, _callback, 0L);
    +timer_setup(_timer, _callback, 0);
    |
    -setup_timer(_timer, _callback, 0UL);
    +timer_setup(_timer, _callback, 0);
    )

    @change_callback_unused_data
    depends on match_timer_function_unused_data@
    identifier match_timer_function_unused_data._callback;
    type _origtype;
    identifier _origarg;
    @@

    void _callback(
    -_origtype _origarg
    +struct timer_list *unused
    )
    {
    ... when != _origarg
    }

    Signed-off-by: Kees Cook

    Kees Cook
     

07 Oct, 2017

3 commits

  • If the algorithm we're parallelizing is asynchronous we might change
    CPUs between padata_do_parallel() and padata_do_serial(). However, we
    don't expect this to happen as we need to enqueue the padata object into
    the per-cpu reorder queue we took it from, i.e. the same-cpu's parallel
    queue.

    Ensure we're not switching CPUs for a given padata object by tracking
    the CPU within the padata object. If the serial callback gets called on
    the wrong CPU, defer invoking padata_reorder() via a kernel worker on
    the CPU we're expected to run on.

    Signed-off-by: Mathias Krause
    Signed-off-by: Herbert Xu

    Mathias Krause
     
  • The reorder timer function runs on the CPU where the timer interrupt was
    handled which is not necessarily one of the CPUs of the 'pcpu' CPU mask
    set.

    Ensure the padata_reorder() callback runs on the correct CPU, which is
    one in the 'pcpu' CPU mask set and, preferably, the next expected one.
    Do so by comparing the current CPU with the expected target CPU. If they
    match, call padata_reorder() right away. If they differ, schedule a work
    item on the target CPU that does the padata_reorder() call for us.

    Signed-off-by: Mathias Krause
    Signed-off-by: Herbert Xu

    Mathias Krause
     
  • The parallel queue per-cpu data structure gets initialized only for CPUs
    in the 'pcpu' CPU mask set. This is not sufficient as the reorder timer
    may run on a different CPU and might wrongly decide it's the target CPU
    for the next reorder item as per-cpu memory gets memset(0) and we might
    be waiting for the first CPU in cpumask.pcpu, i.e. cpu_index 0.

    Make the '__this_cpu_read(pd->pqueue->cpu_index) == next_queue->cpu_index'
    compare in padata_get_next() fail in this case by initializing the
    cpu_index member of all per-cpu parallel queues. Use -1 for unused ones.
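
    Sketch of the fix: mark every possible CPU's queue unused first, so
    stale zero-initialized cpu_index values can't match; queues in the
    pcpu mask then get their real index assigned as before:

        for_each_possible_cpu(cpu)
                per_cpu_ptr(pd->pqueue, cpu)->cpu_index = -1;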

    Signed-off-by: Mathias Krause
    Signed-off-by: Herbert Xu

    Mathias Krause