15 Dec, 2016

12 commits

  • The AUDIT_KERNEL event is not following name=value format. This causes
    some information to get lost. The event has been reformatted to follow
    the convention. Additionally the audit_enabled value was added for
    troubleshooting purposes. The following is an example of the new event:

    type=KERNEL audit(1480621249.833:1): state=initialized
    audit_enabled=0 res=1

    Signed-off-by: Steve Grubb
    [PM: commit tweaks to make checkpatch.pl happy]
    Signed-off-by: Paul Moore

    Steve Grubb
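
As an illustration of why the name=value convention matters to consumers of the record, here is a minimal sketch of a tokenizing parser applied to the example event above. `parse_audit_fields` is a hypothetical helper written for this note; it is not part of the kernel or of any audit userspace tool.

```python
def parse_audit_fields(record):
    """Collect every name=value token of a record into a dict.

    Tokens without '=' (such as the audit(...) timestamp) are skipped,
    which is exactly how information gets lost when an event does not
    follow the convention.
    """
    fields = {}
    for token in record.split():
        if "=" in token:
            name, _, value = token.partition("=")
            fields[name] = value
    return fields


record = ("type=KERNEL audit(1480621249.833:1): state=initialized "
          "audit_enabled=0 res=1")
fields = parse_audit_fields(record)
print(fields["state"], fields["audit_enabled"], fields["res"])
```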
     
  • Resetting audit_sock appears to be racy.

    audit_sock was being copied and dereferenced without using a refcount on
    the source sock.

    Bump the refcount on the underlying sock when we store a reference in
    audit_sock and release it when we reset audit_sock. audit_sock
    modification needs the audit_cmd_mutex.

    See: https://lkml.org/lkml/2016/11/26/232

    Thanks to Eric Dumazet and Cong Wang
    on ideas how to fix it.

    Signed-off-by: Richard Guy Briggs
    Reviewed-by: Cong Wang
    [PM: fixed the comment block text formatting for auditd_reset()]
    Signed-off-by: Paul Moore

    Richard Guy Briggs
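
The hold-on-store, release-on-reset discipline described above can be sketched with a toy model. This is Python, not kernel code: `Sock`, `AuditState`, and their methods are hypothetical names, and the `threading.Lock` merely stands in for audit_cmd_mutex.

```python
import threading


class Sock:
    """Toy stand-in for a sock with a reference count."""

    def __init__(self):
        self.refcnt = 1  # the creator's own reference

    def hold(self):
        self.refcnt += 1

    def put(self):
        assert self.refcnt > 0, "refcount underflow"
        self.refcnt -= 1


class AuditState:
    def __init__(self):
        self.cmd_mutex = threading.Lock()  # models audit_cmd_mutex
        self.sock = None                   # models audit_sock

    def set_sock(self, sock):
        # Take a reference for as long as the pointer is stored.
        with self.cmd_mutex:
            if self.sock is not None:
                self.sock.put()
            sock.hold()
            self.sock = sock

    def reset(self):
        # Drop the stored reference when the pointer is cleared.
        with self.cmd_mutex:
            if self.sock is not None:
                self.sock.put()
                self.sock = None
```

With the stored pointer owning its own reference, the sock cannot be freed underneath a reader that obtained it from the shared state.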
     
  • Bring back commit bc51dddf98c9 ("netns: avoid disabling irq for netns
    id") now that we've fixed some audit multicast issues that caused
    problems with the original attempt. Additional information, and history,
    can be found in the links below:

    * https://github.com/linux-audit/audit-kernel/issues/22
    * https://github.com/linux-audit/audit-kernel/issues/23

    Signed-off-by: Cong Wang
    Signed-off-by: Paul Moore

    Paul Moore
     
  • Sleeping on a command record/message in audit_log_start() could delay
    a task, e.g. auditd, in doing something important, e.g. a clean
    shutdown, which could present problems on a heavily loaded system.
    This patch allows tasks to bypass any queue restrictions if they are
    logging a command record/message.

    Signed-off-by: Paul Moore

    Paul Moore
     
  • When auditd stops cleanly it sets 'auditd_pid' to 0 with an
    AUDIT_SET message; in this case we should reset our backlog
    queues via the auditd_reset() function. This patch also adds
    an 'auditd_pid' check to the top of kauditd_send_unicast_skb()
    so we can fail quicker.

    Signed-off-by: Paul Moore

    Paul Moore
     
  • This patch was suggested by Richard Briggs back in 2015, see the link
    to the mail archive below. Unfortunately, that patch is no longer
    even remotely valid due to other changes to the code.

    * https://www.redhat.com/archives/linux-audit/2015-October/msg00075.html

    Suggested-by: Richard Guy Briggs
    Signed-off-by: Paul Moore

    Paul Moore
     
  • The backlog queue handling in audit_log_start() is a little odd with
    some questionable design decisions, this patch attempts to rectify
    this with the following changes:

    * Never make auditd wait, ignore any backlog limits as we need auditd
    awake so it can drain the backlog queue.

    * When we hit a backlog limit and start dropping records, don't wake
    all the tasks sleeping on the backlog, that's silly. Instead, let
    kauditd_thread() take care of waking everyone once it has had a chance
    to drain the backlog queue.

    * Don't keep a global backlog timeout countdown, make it per-task. A
    per-task timer means we won't have all the sleeping tasks waking at
    the same time and hammering on an already stressed backlog queue.

    Signed-off-by: Paul Moore

    Paul Moore
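
The three rules above can be summarised as a small decision function. This is only a sketch of the policy as the message describes it: `backlog_action` and its parameters are hypothetical, and the real audit_log_start() logic has more cases than this.

```python
def backlog_action(is_auditd, backlog_len, backlog_limit, task_wait_left):
    """Decide what a task that wants to log a record should do.

    Returns "queue", "sleep", or "drop".
    task_wait_left is this task's own remaining wait budget; it is kept
    per task rather than as one global countdown.
    """
    if is_auditd:
        # auditd must stay awake so it can drain the backlog queue.
        return "queue"
    if backlog_len < backlog_limit:
        return "queue"
    if task_wait_left > 0:
        return "sleep"
    # Over the limit and out of time: drop the record, and do not wake
    # the other sleepers; the kauditd thread wakes them after draining.
    return "drop"
```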
     
  • The audit record backlog queue has always been a bit of a mess, and
    moving the multicast send into kauditd_thread() from
    audit_log_end() only makes things worse. This patch attempts to fix
    the backlog queue with a better design that should hold up better
    under load and have less of a performance impact at syscall
    invocation time.

    While it looks like there is a lot going on in this patch, the main
    change is the move from a single backlog queue to three queues:

    * A queue for holding records generated from audit_log_end() that
    haven't been consumed by kauditd_thread() (audit_queue).

    * A queue for holding records that have been sent via multicast but
    had a temporary failure when sending via unicast and need a resend
    (audit_retry_queue).

    * A queue for holding records that haven't been sent via unicast
    because no one is listening (audit_hold_queue).

    Special care is taken in this patch to ensure that the proper
    record ordering is preserved, e.g. we send everything in the hold
    queue first, then the retry queue, and finally the main queue.

    Signed-off-by: Paul Moore

    Paul Moore
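
The drain order described above (hold queue, then retry queue, then main queue) can be sketched as follows. The deques and `drain_in_order` are hypothetical illustrations; the sketch leaves out what the kernel thread does when a send fails partway through.

```python
from collections import deque


def drain_in_order(hold_queue, retry_queue, main_queue):
    """Yield records from the hold queue, then the retry queue, then the main queue."""
    for queue in (hold_queue, retry_queue, main_queue):
        while queue:
            yield queue.popleft()


hold = deque(["r1", "r2"])    # never sent via unicast: nobody was listening
retry = deque(["r3"])         # unicast send failed temporarily
main = deque(["r4", "r5"])    # fresh records not yet consumed
print(list(drain_in_order(hold, retry, main)))
```

Emptying each queue completely before moving to the next is what keeps older records ahead of newer ones in this model.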
     
  • The audit queue names can be shortened and the record sending
    helpers associated with the kauditd task could be named better, do
    these small cleanups now to make life easier once we start reworking
    the queues and kauditd code.

    Signed-off-by: Paul Moore

    Paul Moore
     
  • Sending audit netlink multicast messages is bad for all the same
    reasons that sending audit netlink unicast messages is bad, so this
    patch reworks things so that we don't do the multicast send in
    audit_log_end(), we do it from the dedicated kauditd_thread thread just
    as we do for unicast messages.

    See the GitHub issues below for more information/history:

    * https://github.com/linux-audit/audit-kernel/issues/23
    * https://github.com/linux-audit/audit-kernel/issues/22

    Signed-off-by: Paul Moore

    Paul Moore
     
  • Make sure everything is initialized before we start the kauditd_thread
    and don't emit the "initialized" record until everything is finished.
    We also panic with a descriptive message if we can't start the
    kauditd_thread.

    Signed-off-by: Paul Moore

    Paul Moore
     
  • Richard made this change some time ago but Eric backed it out because
    the rest of the supporting code wasn't ready. In order to move the
    netlink multicast send to kauditd_thread we need to ensure the
    kauditd_thread is always running, so restore commit 6ff5e459 ("audit:
    move kaudit thread start from auditd registration to kaudit init").

    Signed-off-by: Richard Guy Briggs
    [PM: brought forward and merged based on Richard's old patch]
    Signed-off-by: Paul Moore

    Richard Guy Briggs
     

30 Nov, 2016

1 commit

  • Define AUDIT_SESSIONID in the uapi and add support for specifying user
    filters based on the session ID. Also add the new session ID filter
    to the feature bitmap so userspace knows it is available.

    https://github.com/linux-audit/audit-kernel/issues/4
    RFE: add a session ID filter to the kernel's user filter

    Signed-off-by: Richard Guy Briggs
    [PM: combine multiple patches from Richard into this one]
    Signed-off-by: Paul Moore

    Richard Guy Briggs
     

21 Nov, 2016

2 commits


15 Nov, 2016

1 commit


04 Nov, 2016

1 commit


03 Oct, 2016

7 commits

  • Linus Torvalds
     
  • Pull ARM fixes from Russell King:
    "Three relatively small fixes for ARM:

    - Roger noticed that dma_max_pfn() was calculating the upper limit
    wrongly, by adding the PFN offset of memory twice.

    - A fix from Robin to correct parsing of MPIDR values when the
    address size is larger than one BE32 unit.

    - A fix from Srinivas to ensure that we do not rely on the boot
    loader (or previous Linux kernel) setting the translation table
    base register a certain way in the decompressor, which can lead to
    crashes"

    * 'fixes' of git://git.armlinux.org.uk/~rmk/linux-arm:
    ARM: 8618/1: decompressor: reset ttbcr fields to use TTBR0 on ARMv7
    ARM: 8617/1: dma: fix dma_max_pfn()
    ARM: 8616/1: dt: Respect property size when parsing CPUs

    Linus Torvalds
     
  • If the bootloader uses the long descriptor format and jumps to
    kernel decompressor code, TTBCR may not be in a right state.
    Before enabling the MMU, it is required to clear the TTBCR.PD0
    field to use TTBR0 for translation table walks.

    The commit dbece45894d3a ("ARM: 7501/1: decompressor:
    reset ttbcr for VMSA ARMv7 cores") does the reset of TTBCR.N, but
    doesn't consider all the bits for the size of TTBCR.N.

    Clear TTBCR.PD0 field and reset all the three bits of TTBCR.N to
    indicate the use of TTBR0 and the correct base address width.

    Fixes: dbece45894d3 ("ARM: 7501/1: decompressor: reset ttbcr for VMSA ARMv7 cores")
    Acked-by: Robin Murphy
    Signed-off-by: Srinivas Ramana
    Signed-off-by: Russell King

    Srinivas Ramana
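
The bit manipulation described above can be sketched in a few lines. The bit positions used here (TTBCR.N in bits [2:0], TTBCR.PD0 in bit 4) come from the ARMv7 short-descriptor register layout rather than from the commit text, and the two-bit mask in the usage note is an assumption used only to show what a too-narrow mask leaves behind.

```python
TTBCR_N_MASK = 0x7 << 0   # TTBCR.N occupies bits [2:0]
TTBCR_PD0 = 1 << 4        # TTBCR.PD0 is bit 4


def clear_bits(value, mask):
    """Clear the bits in mask from a 32-bit register value."""
    return value & ~mask & 0xFFFFFFFF


def reset_ttbcr_for_ttbr0(ttbcr):
    """Clear PD0 and all three N bits so table walks use TTBR0."""
    return clear_bits(ttbcr, TTBCR_N_MASK | TTBCR_PD0)
```

For a register value of 0b10111, `reset_ttbcr_for_ttbr0` returns 0, whereas `clear_bits(0b10111, 0x3)` returns 0b10100: bit 2 of N and PD0 both survive a mask that covers only the low two bits.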
     
  • Pull x86 fixes from Thomas Gleixner:
    "The last regression fixes for 4.8 final:

    - Two patches addressing the fallout of the CR4 optimizations which
    caused CR4-less machines to fail.

    - Fix the VDSO build on big endian machines

    - Take care of FPU initialization if no CPUID is available otherwise
    task struct size ends up being zero

    - Fix up context tracking in case load_gs_index fails"

    * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    x86/entry/64: Fix context tracking state warning when load_gs_index fails
    x86/boot: Initialize FPU and X86_FEATURE_ALWAYS even if we don't have CPUID
    x86/vdso: Fix building on big endian host
    x86/boot: Fix another __read_cr4() case on 486
    x86/init: Fix cr4_init_shadow() on CR4-less machines

    Linus Torvalds
     
  • Pull MIPS fixes from Ralf Baechle:
    "Another round of fixes:

    - CM: Fix mips_cm_max_vp_width for non-MT kernels on MT systems
    - CPS: Avoid BUG() when offlining pre-r6 CPUs
    - DEC: Avoid gas warnings due to suspicious instruction scheduling by
    manually expanding assembler macros.
    - FTLB: Fix configuration by moving configuration after probing
    - FTLB: clear execution hazard after changing FTLB enable
    - Highmem: Fix detection of unsupported highmem with cache aliases
    - I6400: Don't touch FTLBP chicken bits
    - microMIPS: Fix BUILD_ROLLBACK_PROLOGUE
    - Malta: Fix IOCU disable switch read for MIPS64
    - Octeon: Fix probing of devices attached to GPIO lines
    - uprobes: Misc small fixes"

    * 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus:
    MIPS: CM: Fix mips_cm_max_vp_width for non-MT kernels on MT systems
    MIPS: Fix detection of unsupported highmem with cache aliases
    MIPS: Malta: Fix IOCU disable switch read for MIPS64
    MIPS: Fix BUILD_ROLLBACK_PROLOGUE for microMIPS
    MIPS: clear execution hazard after changing FTLB enable
    MIPS: Configure FTLB after probing TLB sizes from config4
    MIPS: Stop setting I6400 FTLBP
    MIPS: DEC: Avoid la pseudo-instruction in delay slots
    MIPS: Octeon: mark GPIO controller node not populated after IRQ init.
    MIPS: uprobes: fix use of uninitialised variable
    MIPS: uprobes: remove incorrect set_orig_insn
    MIPS: fix uretprobe implementation
    MIPS: smp-cps: Avoid BUG() when offlining pre-r6 CPUs

    Linus Torvalds
     
  • Pull sparc fixes from David Miller:

    1) Fix section mismatches in some builds, from Paul Gortmaker.

    2) Need to count huge zero page mappings when doing TSB sizing, from
    Mike Kravetz.

    3) Fix handling of cpu_possible_mask when nr_cpus module option is
    specified, from Atish Patra.

    4) Don't allocate irq stacks until nr_irqs has been processed, also
    from Atish Patra.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
    sparc64: Fix non-SMP build.
    sparc64: Fix irq stack bootmem allocation.
    sparc64: Fix cpu_possible_mask if nr_cpus is set
    sparc64 mm: Fix more TSB sizing issues
    sparc64: fix section mismatch in find_numa_latencies_for_group

    Linus Torvalds
     
  • Pull networking fixes from David Miller:

    1) Fix wrong TCP checksums on MTU probing when checksum offloading is
    disabled, from Douglas Caetano dos Santos.

    2) Fix qdisc backlog updates in qfq and sfb schedulers, from Cong Wang.

    3) Route lookup flow key protocol value is wrong in ip6gre_xmit_other(),
    fix from Lance Richardson.

    4) Scheduling while atomic in multicast routing code of ipv4 and ipv6,
    fix from Nikolay Aleksandrov.

    5) Fix packet alignment in fec driver, from Eric Nelson.

    6) Fix perf regression in sctp due to struct layout and cache misses,
    from Xin Long.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
    sctp: fix the issue sctp_diag uses lock_sock in rcu_read_lock
    sctp: change to check peer prsctp_capable when using prsctp polices
    sctp: remove prsctp_param from sctp_chunk
    sctp: move sent_count to the memory hole in sctp_chunk
    tg3: Avoid NULL pointer dereference in tg3_io_error_detected()
    act_ife: Fix false encoding
    act_ife: Fix external mac header on encode
    VSOCK: Don't dec ack backlog twice for rejected connections
    Revert "net: ethernet: bcmgenet: use phydev from struct net_device"
    net: fec: align IP header in hardware
    net: fec: remove QUIRK_HAS_RACC from i.mx27
    net: fec: remove QUIRK_HAS_RACC from i.mx25
    ipmr, ip6mr: fix scheduling while atomic and a deadlock with ipmr_get_route
    ip6_gre: fix flowi6_proto value in ip6gre_xmit_other()
    tcp: fix a compile error in DBGUNDO()
    tcp: fix wrong checksum calculation on MTU probing
    sch_sfb: keep backlog updated with qlen
    sch_qfq: keep backlog updated with qlen
    can: dev: fix deadlock reported after bus-off

    Linus Torvalds
     

02 Oct, 2016

1 commit

  • When discovering the number of VPEs per core, smp_num_siblings will be
    incorrect for kernels built without support for the MIPS MultiThreading
    (MT) ASE running on systems which implement said ASE. This leads to
    accesses to VPEs in secondary cores being performed incorrectly since
    mips_cm_vp_id calculates the wrong ID to write to the local "other"
    registers. Fix this by examining the number of VPEs in the core as
    reported by the CM.

    This patch presumes that the number of VPEs will be the same in each
    core of the system. As this path only applies to systems with CM version
    2.5 or lower, and this property is true of all such known systems, this
    is likely to be fine but is described in a comment for good measure.

    Signed-off-by: Paul Burton
    Cc: linux-mips@linux-mips.org
    Patchwork: https://patchwork.linux-mips.org/patch/14338/
    Signed-off-by: Ralf Baechle

    Paul Burton
     

01 Oct, 2016

7 commits

  • Pull SCSI fix from James Bottomley:
    "One final fix before 4.8.

    There was a memory leak triggered by turning scsi mq off due to the
    fact that we assume on host release that the already running hosts
    weren't mq based because that's the state of the global flag (even
    though they were).

    Fix it by tracking this on a per-host basis"

    * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
    scsi: Avoid that toggling use_blk_mq triggers a memory leak

    Linus Torvalds
     
  • Pull input fix from Dmitry Torokhov:
    "One small change to make joydev (which is used by older games) bind
    to devices that export Z axis but not X or Y (such as TRC rudder)"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
    Input: joydev - recognize devices with Z axis as joysticks

    Linus Torvalds
     
  • Merge more fixes from Andrew Morton:
    "Three fixes"

    * emailed patches from Andrew Morton:
    include/linux/property.h: fix typo/compile error
    ocfs2: fix deadlock on mmapped page in ocfs2_write_begin_nolock()
    mm: workingset: fix crash in shadow node shrinker caused by replace_page_cache_page()

    Linus Torvalds
     
  • This fixes commit d76eebfa175e ("include/linux/property.h: fix build
    issues with gcc-4.4.4").

    With that commit we get the following compile error when using the
    PROPERTY_ENTRY_INTEGER_ARRAY macro.

    include/linux/property.h:201:39: error: `u32_data' undeclared (first
    use in this function)
    PROPERTY_ENTRY_INTEGER_ARRAY(_name_, u32, _val_)
    ^
    include/linux/property.h:193:17: note: in definition of macro
    `PROPERTY_ENTRY_INTEGER_ARRAY'
    { .pointer = { _type_##_data = _val_ } }, \
    ^

    This needs a '.' to reference the union member. It seems this was just
    overlooked here since it is done correctly in similar constructs in
    other parts of the original commit.

    This fix is in preparation of upcoming commits that will use this macro.

    Fixes: commit d76eebfa175e ("include/linux/property.h: fix build issues with gcc-4.4.4")
    Link: http://lkml.kernel.org/r/2de3b929290d88a723ed829a3e3cbd02044714df.1475114627.git.johnyoun@synopsys.com
    Signed-off-by: John Youn
    Cc: "Rafael J. Wysocki"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    John Youn
     
  • The testcase "mmaptruncate" of ocfs2-test deadlocks occasionally.

    In this testcase, we create a 2*CLUSTER_SIZE file and mmap() it;
    two processes then repeatedly perform the following operations:
    one does memset(mmaped_addr + 2*CLUSTER_SIZE - 1, 'a', 1), while the
    other calls ftruncate(fd, 2*CLUSTER_SIZE) and then
    ftruncate(fd, CLUSTER_SIZE) again and again.

    This is the backtrace when the deadlock happens:

    __wait_on_bit_lock+0x50/0xa0
    __lock_page+0xb7/0xc0
    ocfs2_write_begin_nolock+0x163f/0x1790 [ocfs2]
    ocfs2_page_mkwrite+0x1c7/0x2a0 [ocfs2]
    do_page_mkwrite+0x66/0xc0
    handle_mm_fault+0x685/0x1350
    __do_page_fault+0x1d8/0x4d0
    trace_do_page_fault+0x37/0xf0
    do_async_page_fault+0x19/0x70
    async_page_fault+0x28/0x30

    In ocfs2_write_begin_nolock(), we first grab the pages and then allocate
    disk space for this write; ocfs2_try_to_free_truncate_log() will be
    called if -ENOSPC is returned; if we're lucky to get enough clusters,
    which is usually the case, we start over again.

    But in ocfs2_free_write_ctxt() the target page isn't unlocked, so we
    will deadlock when trying to grab the target page again.

    Also, -ENOMEM might be returned in ocfs2_grab_pages_for_write().
    Another deadlock will happen in __do_page_mkwrite() if
    ocfs2_page_mkwrite() returns a value other than VM_FAULT_LOCKED
    while the target page is still locked.

    These two errors fail on the same path, so fix them by unlocking the
    target page manually before ocfs2_free_write_ctxt().

    Jan Kara helped me clear out the JBD2 part and suggested the hint for
    the root cause.

    Changes since v1:
    1. Also put ENOMEM error case into consideration.

    Link: http://lkml.kernel.org/r/1474173902-32075-1-git-send-email-zren@suse.com
    Signed-off-by: Eric Ren
    Reviewed-by: He Gang
    Acked-by: Joseph Qi
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Junxiao Bi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric Ren
     
  • Antonio reports the following crash when using fuse under memory pressure:

    kernel BUG at /build/linux-a2WvEb/linux-4.4.0/mm/workingset.c:346!
    invalid opcode: 0000 [#1] SMP
    Modules linked in: all of them
    CPU: 2 PID: 63 Comm: kswapd0 Not tainted 4.4.0-36-generic #55-Ubuntu
    Hardware name: System manufacturer System Product Name/P8H67-M PRO, BIOS 3904 04/27/2013
    task: ffff88040cae6040 ti: ffff880407488000 task.ti: ffff880407488000
    RIP: shadow_lru_isolate+0x181/0x190
    Call Trace:
    __list_lru_walk_one.isra.3+0x8f/0x130
    list_lru_walk_one+0x23/0x30
    scan_shadow_nodes+0x34/0x50
    shrink_slab.part.40+0x1ed/0x3d0
    shrink_zone+0x2ca/0x2e0
    kswapd+0x51e/0x990
    kthread+0xd8/0xf0
    ret_from_fork+0x3f/0x70

    which corresponds to the following sanity check in the shadow node
    tracking:

    BUG_ON(node->count & RADIX_TREE_COUNT_MASK);

    The workingset code tracks radix tree nodes that exclusively contain
    shadow entries of evicted pages in them, and this (somewhat obscure)
    line checks whether there are real pages left that would interfere with
    reclaim of the radix tree node under memory pressure.

    While discussing ways how fuse might sneak pages into the radix tree
    past the workingset code, Miklos pointed to replace_page_cache_page(),
    and indeed there is a problem there: it properly accounts for the old
    page being removed - __delete_from_page_cache() does that - but then
    does a raw radix_tree_insert(), not accounting for the replacement
    page. Eventually the page count bits in node->count underflow while
    leaving the node incorrectly linked to the shadow node LRU.

    To address this, make sure replace_page_cache_page() uses the tracked
    page insertion code, page_cache_tree_insert(). This fixes the page
    accounting and makes sure page-containing nodes are properly unlinked
    from the shadow node LRU again.

    Also, make the sanity checks a bit less obscure by using the helpers for
    checking the number of pages and shadows in a radix tree node.

    Fixes: 449dd6984d0e ("mm: keep page cache radix tree nodes in check")
    Link: http://lkml.kernel.org/r/20160919155822.29498-1-hannes@cmpxchg.org
    Signed-off-by: Johannes Weiner
    Reported-by: Antonio SJ Musumeci
    Debugged-by: Miklos Szeredi
    Cc: [3.15+]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
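
The accounting imbalance described above can be reproduced with a toy model. This is a sketch only: `RadixNode` and the helper functions are hypothetical, and a plain integer stands in for the page count bits of node->count.

```python
class RadixNode:
    def __init__(self):
        self.page_count = 0  # what the accounting believes
        self.slots = {}      # what is actually stored in the node


def tracked_insert(node, index, page):
    node.slots[index] = page
    node.page_count += 1


def raw_insert(node, index, page):
    node.slots[index] = page  # the slot is filled, the counter is not updated


def delete_page(node, index):
    del node.slots[index]
    node.page_count -= 1


def count_after_replace_and_evict(insert):
    """Insert a page, replace it using `insert`, then evict the replacement."""
    node = RadixNode()
    tracked_insert(node, 0, "old")
    delete_page(node, 0)          # the old page is accounted on removal
    insert(node, 0, "new")        # the replacement goes in
    delete_page(node, 0)          # later eviction of the replacement
    return node.page_count
```

With `raw_insert` the count ends at -1, the model's equivalent of the underflow; with `tracked_insert` it returns to 0.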
     
  • Change my email address to my kernel.org account instead of the ARM one.

    Signed-off-by: Javi Merino
    Signed-off-by: Linus Torvalds

    Javi Merino
     

30 Sep, 2016

8 commits

  • This warning:

    WARNING: CPU: 0 PID: 3331 at arch/x86/entry/common.c:45 enter_from_user_mode+0x32/0x50
    CPU: 0 PID: 3331 Comm: ldt_gdt_64 Not tainted 4.8.0-rc7+ #13
    Call Trace:
    dump_stack+0x99/0xd0
    __warn+0xd1/0xf0
    warn_slowpath_null+0x1d/0x20
    enter_from_user_mode+0x32/0x50
    error_entry+0x6d/0xc0
    ? general_protection+0x12/0x30
    ? native_load_gs_index+0xd/0x20
    ? do_set_thread_area+0x19c/0x1f0
    SyS_set_thread_area+0x24/0x30
    do_int80_syscall_32+0x7c/0x220
    entry_INT80_compat+0x38/0x50

    ... can be reproduced by running the GS testcase of the ldt_gdt test unit in
    the x86 selftests.

    do_int80_syscall_32() will call enter_from_user_mode() to convert context
    tracking state from user state to kernel state. The load_gs_index() call
    can fail with user gsbase; if this happens, gsbase will be fixed up and
    execution will proceed.

    However, enter_from_user_mode() will be called again in the fixed up path
    even though the context tracking state is already kernel state.

    This patch fixes it by just fixing up gsbase and telling lockdep that IRQs
    are off once load_gs_index() failed with user gsbase.

    Signed-off-by: Wanpeng Li
    Acked-by: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Brian Gerst
    Cc: Denys Vlasenko
    Cc: H. Peter Anvin
    Cc: Josh Poimboeuf
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/1475197266-3440-1-git-send-email-wanpeng.li@hotmail.com
    Signed-off-by: Ingo Molnar

    Wanpeng Li
     
  • Otherwise arch_task_struct_size == 0 and we die. While we're at it,
    set X86_FEATURE_ALWAYS, too.

    Reported-by: David Saggiorato
    Tested-by: David Saggiorato
    Signed-off-by: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Brian Gerst
    Cc: Dave Hansen
    Cc: Denys Vlasenko
    Cc: H. Peter Anvin
    Cc: Josh Poimboeuf
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: stable@vger.kernel.org
    Fixes: aaeb5c01c5b ("x86/fpu, sched: Introduce CONFIG_ARCH_WANTS_DYNAMIC_TASK_STRUCT and use it on x86")
    Link: http://lkml.kernel.org/r/8de723afbf0811071185039f9088733188b606c9.1475103911.git.luto@kernel.org
    Signed-off-by: Ingo Molnar

    Andy Lutomirski
     
  • We need to call GET_LE to read hdr->e_type.

    Fixes: 57f90c3dfc75 ("x86/vdso: Error out if the vDSO isn't a valid DSO")
    Reported-by: Paul Gortmaker
    Signed-off-by: Segher Boessenkool
    Acked-by: Andy Lutomirski
    Cc: Stephen Rothwell
    Cc: linux-next@vger.kernel.org
    Link: http://lkml.kernel.org/r/20160929193442.GA16617@gate.crashing.org
    Signed-off-by: Thomas Gleixner

    Segher Boessenkool
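
To show why the byte order of the read matters, the following sketch reads the two-byte e_type field of a little-endian ELF header both ways. The offset (16, directly after e_ident) and the value ET_DYN = 3 come from the ELF format; the function names are hypothetical, and the "native big-endian" read only imitates what a big-endian build host would see without a conversion such as GET_LE.

```python
ET_DYN = 3           # ELF object type for shared objects
E_TYPE_OFFSET = 16   # e_type follows the 16-byte e_ident array


def read_e_type_le(header):
    """Read e_type as little-endian, regardless of the host byte order."""
    raw = header[E_TYPE_OFFSET:E_TYPE_OFFSET + 2]
    return int.from_bytes(raw, "little")


def read_e_type_as_big_endian_host(header):
    """Imitate an unconverted read on a big-endian host."""
    raw = header[E_TYPE_OFFSET:E_TYPE_OFFSET + 2]
    return int.from_bytes(raw, "big")


# Minimal fake header: magic, zero padding up to offset 16, then e_type.
header = b"\x7fELF" + bytes(12) + ET_DYN.to_bytes(2, "little")
print(read_e_type_le(header) == ET_DYN)
print(read_e_type_as_big_endian_host(header) == ET_DYN)
```

The unconverted read yields 0x0300 instead of 3, so a comparison against ET_DYN fails on such a host even though the file is a valid DSO.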
     
  • The condition for reading CR4 was wrong: there are some CPUs with
    CPUID but not CR4. Rather than trying to make the condition exact,
    use __read_cr4_safe().

    Fixes: 18bc7bd523e0 ("x86/boot: Synchronize trampoline_cr4_features and mmu_cr4_features directly")
    Reported-by: david@saggiorato.net
    Signed-off-by: Andy Lutomirski
    Reviewed-by: Borislav Petkov
    Cc: Brian Gerst
    Link: http://lkml.kernel.org/r/8c453a61c4f44ab6ff43c29780ba04835234d2e5.1475178369.git.luto@kernel.org
    Signed-off-by: Thomas Gleixner

    Andy Lutomirski
     
  • When sctp dumps all the ep->assocs, it needs to lock_sock first,
    but now it locks sock in rcu_read_lock, and lock_sock may sleep,
    which would break rcu_read_lock.

    This patch gets and holds one sock when traversing the list. After
    that, it leaves rcu_read_lock, then locks the sock and dumps it. It
    then traverses the list again to get the next one until all sctp
    socks are dumped.

    For sctp_diag_dump_one, it fixes this issue by holding asoc and
    moving cb() out of rcu_read_lock in sctp_transport_lookup_process.

    Fixes: 8f840e47f190 ("sctp: add the sctp_diag.c file")
    Signed-off-by: Xin Long
    Signed-off-by: David S. Miller

    Xin Long
     
  • Xin Long says:

    ====================
    sctp: a bunch of fixes for prsctp policies

    This patchset fixes 2 issues with the prsctp policies:

    1. patches 1 and 2 fix the "netperf-Throughput_Mbps -37.2% regression"
    issue when overloading the CPU.

    2. patch 3 fixes "prsctp policies should check both sides'
    prsctp_capable, instead of only the local side".
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Currently, before using prsctp policies, sctp uses asoc->prsctp_enable to
    check whether prsctp is enabled. However, asoc->prsctp_enable being set
    only means that the local host supports prsctp; sctp should not abandon
    packets if the peer host doesn't enable prsctp.

    So this patch uses asoc->peer.prsctp_capable instead of
    asoc->prsctp_enable to check whether prsctp is enabled on both sides,
    as asoc's peer.prsctp_capable is set only when both the local host and
    the peer enable prsctp.

    Fixes: a6c2f792873a ("sctp: implement prsctp TTL policy")
    Signed-off-by: Xin Long
    Signed-off-by: David S. Miller

    Xin Long
     
  • Currently sctp uses chunk->prsctp_param to save the prsctp param for all
    the prsctp policies, but we didn't need to introduce prsctp_param to
    sctp_chunk. We can just use chunk->sinfo.sinfo_timetolive for the RTX and
    BUF policies, and reuse msg->expires_at for the TTL policy, as the prsctp
    policies and the old expires policy are mutually exclusive.

    This patch removes prsctp_param from sctp_chunk, and reuses msg's
    expires_at for TTL and chunk's sinfo.sinfo_timetolive for the RTX and BUF
    policies.

    Note that sctp can't use chunk's sinfo.sinfo_timetolive for the TTL
    policy, as it needs a u64 variable to save the expires_at time.

    This one also fixes the "netperf-Throughput_Mbps -37.2% regression"
    issue.

    Fixes: a6c2f792873a ("sctp: implement prsctp TTL policy")
    Signed-off-by: Xin Long
    Signed-off-by: David S. Miller

    Xin Long