26 Feb, 2017

14 commits

  • [ Upstream commit e623a9e9dec29ae811d11f83d0074ba254aba374 ]

    Commit 34b88a68f26a ("net: Fix use after free in the recvmmsg exit path"),
    changed the exit path of recvmmsg to always return the datagrams
    variable and modified the error paths to set the variable to the error
    code returned by recvmsg if necessary.

    However in the case sock_error returned an error, the error code was
    then ignored, and recvmmsg returned 0.

    Change the error path of recvmmsg to correctly return the error code
    of sock_error.

    The bug was triggered by using recvmmsg on a CAN interface which was
    not up. Linux 4.6 and later return 0 in this case while earlier
    releases returned -ENETDOWN.

    Fixes: 34b88a68f26a ("net: Fix use after free in the recvmmsg exit path")
    Signed-off-by: Maxime Jayat
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Maxime Jayat
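
The control-flow change described in this commit can be modelled in userspace C. This is a minimal sketch under stated assumptions, not the kernel source: the function names are hypothetical, and `pending_sock_err` stands in for whatever `sock_error()` would return (0 or a negative errno).

```c
#include <errno.h>

/*
 * Userspace model of the recvmmsg() exit path; names are hypothetical.
 * Before the fix: a pending socket error jumps to the exit path, but the
 * exit path only returns `datagrams`, which is still 0.
 */
int recvmmsg_exit_before_fix(int pending_sock_err)
{
	int datagrams = 0;              /* nothing received yet */
	int err = pending_sock_err;     /* models sock_error(sk) */

	if (err)
		goto out;               /* err is lost here */
	/* ... the receive loop would run here ... */
out:
	return datagrams;
}

/* After the fix: the error code is copied into the returned variable. */
int recvmmsg_exit_after_fix(int pending_sock_err)
{
	int datagrams = 0;
	int err = pending_sock_err;

	if (err) {
		datagrams = err;        /* propagate the socket error */
		goto out;
	}
	/* ... the receive loop would run here ... */
out:
	return datagrams;
}
```

With a pending `-ENETDOWN`, the first variant returns 0 and the second returns `-ENETDOWN`, which matches the behaviour change the commit message describes for a CAN interface that is down.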
     
  • [ Upstream commit ca4ef4574f1ee5252e2cd365f8f5d5bafd048f32 ]

    The skbs processed by ip_cmsg_recv() are not guaranteed to
    be linear, e.g. when sending UDP packets over loopback with
    MSG_MORE.
    Using csum_partial() on [potentially] the whole skb len
    is dangerous; instead be on the safe side and use skb_checksum().

    Thanks to the syzkaller team for detecting the issue and providing
    the reproducer.

    v1 -> v2:
    - move the variable declaration in a tighter scope

    Fixes: ad6f939ab193 ("ip: Add offset parameter to ip_cmsg_recv")
    Reported-by: Andrey Konovalov
    Signed-off-by: Paolo Abeni
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Paolo Abeni
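
Why a non-linear buffer needs a fragment-aware checksum can be shown with a small userspace model. This is a sketch only: `struct frag`, `csum_linear()` and `csum_frags()` are hypothetical names, not the kernel's `csum_partial()` or `skb_checksum()`, and the sum is a plain 16-bit ones' complement sum without the final inversion.

```c
#include <stddef.h>
#include <stdint.h>

/* One piece of a buffer whose payload is scattered across several areas. */
struct frag {
	const uint8_t *data;
	size_t len;
};

/* Add bytes to a running sum; `offset` is the byte position in the whole
 * payload, so odd/even alignment is kept across fragment boundaries. */
static uint64_t sum_bytes(uint64_t sum, const uint8_t *p, size_t len,
			  size_t offset)
{
	for (size_t i = 0; i < len; i++, offset++)
		sum += (offset & 1) ? p[i] : (uint64_t)p[i] << 8;
	return sum;
}

/* Fold the carries back into 16 bits (ones' complement addition). */
static uint16_t fold(uint64_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* Checksum of one contiguous area: only valid if ALL bytes live there. */
uint16_t csum_linear(const uint8_t *p, size_t len)
{
	return fold(sum_bytes(0, p, len, 0));
}

/* Checksum that walks every fragment instead of assuming contiguity. */
uint16_t csum_frags(const struct frag *frags, size_t nfrags)
{
	uint64_t sum = 0;
	size_t offset = 0;

	for (size_t i = 0; i < nfrags; i++) {
		sum = sum_bytes(sum, frags[i].data, frags[i].len, offset);
		offset += frags[i].len;
	}
	return fold(sum);
}
```

Walking the fragments gives the same result as a checksum over the reassembled payload. Running the linear routine over the first area with the total length would instead read past the end of that area, which is the hazard the commit message points at.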
     
  • [ Upstream commit e71695307114335be1ed912f4a347396c2ed0e69 ]

    Resizing currently drops consumer lock. This can cause entries to be
    reordered, which isn't good in itself. More importantly, consumer can
    detect a false ring empty condition and block forever.

    Further, nesting of consumer within producer lock is problematic for
    tun, since it produces entries in a BH, which causes a lock order
    reversal:

           CPU0                    CPU1
           ----                    ----
      consume:
      lock(&(&r->consumer_lock)->rlock);
                                   resize:
                                   local_irq_disable();
                                   lock(&(&r->producer_lock)->rlock);
                                   lock(&(&r->consumer_lock)->rlock);
      <Interrupt>
      produce:
      lock(&(&r->producer_lock)->rlock);

    To fix, nest producer lock within consumer lock during resize,
    and keep consumer lock during the whole swap operation.

    Reported-by: Dmitry Vyukov
    Cc: stable@vger.kernel.org
    Cc: "David S. Miller"
    Acked-by: Jason Wang
    Signed-off-by: Michael S. Tsirkin
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Michael S. Tsirkin
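
The lock nesting described in this commit can be sketched in userspace with two pthread mutexes. This is a simplified model under stated assumptions, not the kernel's ptr_ring: `struct ring` and the `ring_*()` functions are hypothetical, a NULL slot means "empty", and entries that do not fit into a smaller ring are simply dropped.

```c
#include <pthread.h>
#include <stdlib.h>

struct ring {
	pthread_mutex_t consumer_lock;
	pthread_mutex_t producer_lock;
	void **queue;                   /* NULL slot == empty */
	int size, producer, consumer;
};

int ring_init(struct ring *r, int size)
{
	if (size <= 0 || !(r->queue = calloc(size, sizeof(void *))))
		return -1;
	r->size = size;
	r->producer = r->consumer = 0;
	pthread_mutex_init(&r->consumer_lock, NULL);
	pthread_mutex_init(&r->producer_lock, NULL);
	return 0;
}

int ring_produce(struct ring *r, void *ptr)
{
	int ret = -1;                   /* -1: ring full */

	pthread_mutex_lock(&r->producer_lock);
	if (!r->queue[r->producer]) {
		r->queue[r->producer] = ptr;
		r->producer = (r->producer + 1) % r->size;
		ret = 0;
	}
	pthread_mutex_unlock(&r->producer_lock);
	return ret;
}

void *ring_consume(struct ring *r)
{
	void *ptr;

	pthread_mutex_lock(&r->consumer_lock);
	ptr = r->queue[r->consumer];
	if (ptr) {
		r->queue[r->consumer] = NULL;
		r->consumer = (r->consumer + 1) % r->size;
	}
	pthread_mutex_unlock(&r->consumer_lock);
	return ptr;
}

int ring_resize(struct ring *r, int new_size)
{
	void **queue, **old;
	int producer = 0;

	if (new_size <= 0 || !(queue = calloc(new_size, sizeof(void *))))
		return -1;

	/* Consumer lock is the outer lock, producer lock nests inside it. */
	pthread_mutex_lock(&r->consumer_lock);
	pthread_mutex_lock(&r->producer_lock);

	/* Both locks stay held for the whole swap, so order is preserved. */
	while (producer < new_size && r->queue[r->consumer]) {
		queue[producer++] = r->queue[r->consumer];
		r->queue[r->consumer] = NULL;
		r->consumer = (r->consumer + 1) % r->size;
	}
	old = r->queue;
	r->queue = queue;
	r->size = new_size;
	r->producer = producer % new_size;
	r->consumer = 0;

	pthread_mutex_unlock(&r->producer_lock);
	pthread_mutex_unlock(&r->consumer_lock);
	free(old);
	return 0;
}
```

Because the consumer lock is never released during the swap, a concurrent consumer cannot observe a half-moved ring and mistake it for an empty one.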
     
  • [ Upstream commit 4c03b862b12f980456f9de92db6d508a4999b788 ]

    A nested lock depth was added to the hashbin_delete() code but it
    doesn't actually work so well and results in tons of lockdep splats.

    Fix the code instead to properly drop the lock around the operation
    and just keep peeking the head of the hashbin queue.

    Reported-by: Dmitry Vyukov
    Tested-by: Dmitry Vyukov
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    David S. Miller
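
The "drop the lock around the operation and keep peeking the head" pattern can be sketched in userspace as follows. This is a model only: `struct bin`, `bin_drain()` and `count_free()` are hypothetical names, a pthread mutex stands in for the hashbin spinlock, and a single list stands in for the hash buckets.

```c
#include <pthread.h>
#include <stddef.h>

struct node {
	struct node *next;
};

struct bin {
	pthread_mutex_t lock;
	struct node *head;
};

/* Remove and return the first entry; the caller must hold bin->lock. */
struct node *dequeue_first(struct bin *b)
{
	struct node *n = b->head;

	if (n)
		b->head = n->next;
	return n;
}

/* Drain the queue, calling free_func on each entry WITHOUT the lock held,
 * so the callback may take other locks or re-enter the bin safely. */
void bin_drain(struct bin *b, void (*free_func)(struct node *, void *),
	       void *ctx)
{
	pthread_mutex_lock(&b->lock);
	for (;;) {
		struct node *n = dequeue_first(b);  /* always re-read the head */

		if (!n)
			break;
		if (free_func) {
			pthread_mutex_unlock(&b->lock);
			free_func(n, ctx);
			pthread_mutex_lock(&b->lock);
		}
	}
	pthread_mutex_unlock(&b->lock);
}

/* Example callback: counts how many entries were handed over. */
void count_free(struct node *n, void *ctx)
{
	(void)n;
	++*(int *)ctx;
}
```

Re-reading the head after every callback, instead of iterating with a saved "next" pointer, keeps the loop correct even if the list changed while the lock was dropped.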
     
  • [ Upstream commit 22f0708a718daea5e79de2d29b4829de016a4ff4 ]

    Since the commit 0c1d70af924b ("net: use dst_cache for vxlan device")
    vxlan_fill_metadata_dst() calls vxlan_get_route() passing a NULL
    dst_cache pointer, so the latter should explicitly check for
    valid dst_cache ptr. Unfortunately the commit d71785ffc7e7 ("net: add
    dst_cache to ovs vxlan lwtunnel") removed said check.

    As a result, it is possible to trigger a NULL pointer dereference by
    calling vxlan_fill_metadata_dst(), e.g. with:

    ovs-vsctl add-br ovs-br0
    ovs-vsctl add-port ovs-br0 vxlan0 -- set interface vxlan0 \
    type=vxlan options:remote_ip=192.168.1.1 \
    options:key=1234 options:dst_port=4789 ofport_request=10
    ip address add dev ovs-br0 172.16.1.2/24
    ovs-vsctl set Bridge ovs-br0 ipfix=@i -- --id=@i create IPFIX \
    targets=\"172.16.1.1:1234\" sampling=1
    iperf -c 172.16.1.1 -u -l 1000 -b 10M -t 1 -p 1234

    This commit addresses the issue by passing to vxlan_get_route() the
    dst_cache already available in the lwt info processed by
    vxlan_fill_metadata_dst().

    Fixes: d71785ffc7e7 ("net: add dst_cache to ovs vxlan lwtunnel")
    Signed-off-by: Paolo Abeni
    Acked-by: Jiri Benc
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Paolo Abeni
     
  • [ Upstream commit 5edabca9d4cff7f1f2b68f0bac55ef99d9798ba4 ]

    In the current DCCP implementation an skb for a DCCP_PKT_REQUEST packet
    is forcibly freed via __kfree_skb in dccp_rcv_state_process if
    dccp_v6_conn_request successfully returns.

    However, if IPV6_RECVPKTINFO is set on a socket, the address of the skb
    is saved to ireq->pktopts and the ref count for skb is incremented in
    dccp_v6_conn_request, so skb is still in use. Nevertheless, it gets freed
    in dccp_rcv_state_process.

    Fix by calling consume_skb instead of doing goto discard and therefore
    calling __kfree_skb.

    Similar fixes for TCP:

    fb7e2399ec17f1004c0e0ccfd17439f8759ede01 [TCP]: skb is unexpectedly freed.
    0aea76d35c9651d55bbaf746e7914e5f9ae5a25d tcp: SYN packets are now
    simply consumed

    Signed-off-by: Andrey Konovalov
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Andrey Konovalov
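
The difference between dropping one reference and freeing unconditionally can be modelled with a toy reference count. This is a sketch under stated assumptions: `struct buf` and its helpers are hypothetical, with `buf_put()` standing in for a refcount-aware release such as consume_skb() and `buf_force_free()` for an unconditional free such as __kfree_skb().

```c
/* Toy buffer: `freed` records whether the memory would have been released. */
struct buf {
	int refcnt;
	int freed;
};

/* Take an extra reference, as done when the buffer address is saved
 * for later use (ireq->pktopts in the commit message). */
void buf_get(struct buf *b)
{
	b->refcnt++;
}

/* Drop one reference; release only when the last holder is gone. */
void buf_put(struct buf *b)
{
	if (--b->refcnt == 0)
		b->freed = 1;
}

/* Release unconditionally, ignoring any other holders. */
void buf_force_free(struct buf *b)
{
	b->freed = 1;
}
```

If a second holder has taken a reference, `buf_put()` leaves the buffer alive for it, whereas `buf_force_free()` releases it while `refcnt` is still 2 and leaves the second holder with a dangling pointer.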
     
  • [ Upstream commit 7627ae6030f56a9a91a5b3867b21f35d79c16e64 ]

    When setting a neigh related sysctl parameter, we always send a
    NETEVENT_DELAY_PROBE_TIME_UPDATE netevent. For instance, when
    executing

    sysctl net.ipv6.neigh.wlp3s0.retrans_time_ms=2000

    a NETEVENT_DELAY_PROBE_TIME_UPDATE netevent is generated.

    This is caused by commit 2a4501ae18b5 ("neigh: Send a
    notification when DELAY_PROBE_TIME changes"). According to the
    commit's description, it was intended to generate such an event
    when setting the "delay_first_probe_time" sysctl parameter.

    In order to fix this, only generate this event when actually
    setting the "delay_first_probe_time" sysctl parameter. This fix
    should not have any unintended side-effects, because all but one
    registered netevent callbacks check for other netevent event
    types (the registered callbacks were obtained by grepping for
    "register_netevent_notifier"). The only callback that uses the
    NETEVENT_DELAY_PROBE_TIME_UPDATE event is
    mlxsw_sp_router_netevent_event() (in
    drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c): in case
    of this event, it only accesses the DELAY_PROBE_TIME of the
    passed neigh_parms.

    Fixes: 2a4501ae18b5 ("neigh: Send a notification when DELAY_PROBE_TIME changes")
    Signed-off-by: Marcus Huewe
    Reviewed-by: Ido Schimmel
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Marcus Huewe
     
  • [ Upstream commit 2bd624b4611ffee36422782d16e1c944d1351e98 ]

    Commit 6664498280cf ("packet: call fanout_release, while UNREGISTERING a
    netdev"), unfortunately, introduced the following issues.

    1. calling mutex_lock(&fanout_mutex) (fanout_release()) from inside an
    RCU read-side critical section. rcu_read_lock() usually disables
    preemption, which prohibits calling sleeping functions.

    [ ] include/linux/rcupdate.h:560 Illegal context switch in RCU read-side critical section!
    [ ]
    [ ] rcu_scheduler_active = 1, debug_locks = 0
    [ ] 4 locks held by ovs-vswitchd/1969:
    [ ] #0: (cb_lock){++++++}, at: genl_rcv+0x19/0x40
    [ ] #1: (ovs_mutex){+.+.+.}, at: ovs_vport_cmd_del+0x4a/0x100 [openvswitch]
    [ ] #2: (rtnl_mutex){+.+.+.}, at: rtnl_lock+0x17/0x20
    [ ] #3: (rcu_read_lock){......}, at: packet_notifier+0x5/0x3f0
    [ ]
    [ ] Call Trace:
    [ ] dump_stack+0x85/0xc4
    [ ] lockdep_rcu_suspicious+0x107/0x110
    [ ] ___might_sleep+0x57/0x210
    [ ] __might_sleep+0x70/0x90
    [ ] mutex_lock_nested+0x3c/0x3a0
    [ ] ? vprintk_default+0x1f/0x30
    [ ] ? printk+0x4d/0x4f
    [ ] fanout_release+0x1d/0xe0
    [ ] packet_notifier+0x2f9/0x3f0

    2. calling mutex_lock(&fanout_mutex) inside spin_lock(&po->bind_lock).
    "sleeping function called from invalid context"

    [ ] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:620
    [ ] in_atomic(): 1, irqs_disabled(): 0, pid: 1969, name: ovs-vswitchd
    [ ] INFO: lockdep is turned off.
    [ ] Call Trace:
    [ ] dump_stack+0x85/0xc4
    [ ] ___might_sleep+0x202/0x210
    [ ] __might_sleep+0x70/0x90
    [ ] mutex_lock_nested+0x3c/0x3a0
    [ ] fanout_release+0x1d/0xe0
    [ ] packet_notifier+0x2f9/0x3f0

    3. calling dev_remove_pack(&fanout->prot_hook), from inside
    spin_lock(&po->bind_lock) or rcu_read-side critical-section. dev_remove_pack()
    -> synchronize_net(), which might sleep.

    [ ] BUG: scheduling while atomic: ovs-vswitchd/1969/0x00000002
    [ ] INFO: lockdep is turned off.
    [ ] Call Trace:
    [ ] dump_stack+0x85/0xc4
    [ ] __schedule_bug+0x64/0x73
    [ ] __schedule+0x6b/0xd10
    [ ] schedule+0x6b/0x80
    [ ] schedule_timeout+0x38d/0x410
    [ ] synchronize_sched_expedited+0x53d/0x810
    [ ] synchronize_rcu_expedited+0xe/0x10
    [ ] synchronize_net+0x35/0x50
    [ ] dev_remove_pack+0x13/0x20
    [ ] fanout_release+0xbe/0xe0
    [ ] packet_notifier+0x2f9/0x3f0

    4. fanout_release() races with calls from different CPUs.

    To fix the above problems, remove the call to fanout_release() under
    rcu_read_lock(). Instead, call __dev_remove_pack(&fanout->prot_hook) and
    netdev_run_todo will be happy that &dev->ptype_specific list is empty. In order
    to achieve this, I moved dev_{add,remove}_pack() out of fanout_{add,release} to
    __fanout_{link,unlink}. So, call to {,__}unregister_prot_hook() will make sure
    fanout->prot_hook is removed as well.

    Fixes: 6664498280cf ("packet: call fanout_release, while UNREGISTERING a netdev")
    Reported-by: Eric Dumazet
    Signed-off-by: Anoob Soman
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Anoob Soman
     
  • [ Upstream commit d199fab63c11998a602205f7ee7ff7c05c97164b ]

    Multiple threads can call fanout_add() at the same time.

    We need to grab fanout_mutex earlier to avoid races that could
    lead to one thread freeing po->rollover that was set by another thread.

    Do the same in fanout_release(), for peace of mind, and to help us
    find lockdep issues earlier.

    Fixes: dc99f600698d ("packet: Add fanout support.")
    Fixes: 0648ab70afe6 ("packet: rollover prepare: per-socket state")
    Signed-off-by: Eric Dumazet
    Cc: Willem de Bruijn
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Eric Dumazet
     
  • [ Upstream commit a60ced990e309666915d21445e95347d12406694 ]

    A copy-paste error silently breaks resume for the CPSW driver:
    netdev_priv() was replaced with ndev_to_cpsw(ndev) in suspend,
    but left unchanged in resume.

    Fixes: 606f39939595a4d4540406bfc11f265b2036af6d
    (ti: cpsw: move platform data and slaves info to cpsw_common)

    Reported-by: Alexey Starikovskiy
    Signed-off-by: Ivan Khoronzhuk
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Ivan Khoronzhuk
     
  • [ Upstream commit 8b74d439e1697110c5e5c600643e823eb1dd0762 ]

    It seems nobody used LLC since linux-3.12.

    Fortunately fuzzers like syzkaller still know how to run this code,
    otherwise it would be no fun.

    Setting skb->sk without skb->destructor leads to all kinds of
    bugs, we now prefer to be very strict about it.

    Ideally here we would use skb_set_owner() but this helper does not exist yet,
    only CAN seems to have a private helper for that.

    Fixes: 376c7311bdb6 ("net: add a temporary sanity check in skb_orphan()")
    Signed-off-by: Eric Dumazet
    Reported-by: Andrey Konovalov
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Eric Dumazet
     
  • [ Upstream commit fed06ee89b78d3af32e235e0e89ad0d946fcb95d ]

    When called by HW offloading drivers, the TC action (e.g.
    net/sched/act_mirred.c) code uses this_cpu logic, e.g.

    _bstats_cpu_update(this_cpu_ptr(a->cpu_bstats), bytes, packets)

    Per the kernel documentation, preemption should be disabled; add that.

    Before the fix, when running with CONFIG_PREEMPT set, we get a

    BUG: using smp_processor_id() in preemptible [00000000] code: tc/3793

    assertion from the TC action (mirred) stats_update callback.

    Fixes: aad7e08d39bd ('net/mlx5e: Hardware offloaded flower filter statistics support')
    Signed-off-by: Or Gerlitz
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Or Gerlitz
     
  • [ Upstream commit cd27b96bc13841ee7af25837a6ae86fee87273d6 ]

    In commit 98e3862ca2b1 ("kcm: fix 0-length case for kcm_sendmsg()")
    I tried to avoid skb allocation for 0-length case, but missed
    a check for NULL pointer in the non EOR case.

    Fixes: 98e3862ca2b1 ("kcm: fix 0-length case for kcm_sendmsg()")
    Reported-by: Dmitry Vyukov
    Cc: Tom Herbert
    Signed-off-by: Cong Wang
    Acked-by: Tom Herbert
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    WANG Cong
     
  • [ Upstream commit 98e3862ca2b1ae595a13805dcab4c3a6d7718f4d ]

    Dmitry reported a kernel warning:

    WARNING: CPU: 3 PID: 2936 at net/kcm/kcmsock.c:627
    kcm_write_msgs+0x12e3/0x1b90 net/kcm/kcmsock.c:627
    CPU: 3 PID: 2936 Comm: a.out Not tainted 4.10.0-rc6+ #209
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
    Call Trace:
    __dump_stack lib/dump_stack.c:15 [inline]
    dump_stack+0x2ee/0x3ef lib/dump_stack.c:51
    panic+0x1fb/0x412 kernel/panic.c:179
    __warn+0x1c4/0x1e0 kernel/panic.c:539
    warn_slowpath_null+0x2c/0x40 kernel/panic.c:582
    kcm_write_msgs+0x12e3/0x1b90 net/kcm/kcmsock.c:627
    kcm_sendmsg+0x163a/0x2200 net/kcm/kcmsock.c:1029
    sock_sendmsg_nosec net/socket.c:635 [inline]
    sock_sendmsg+0xca/0x110 net/socket.c:645
    sock_write_iter+0x326/0x600 net/socket.c:848
    new_sync_write fs/read_write.c:499 [inline]
    __vfs_write+0x483/0x740 fs/read_write.c:512
    vfs_write+0x187/0x530 fs/read_write.c:560
    SYSC_write fs/read_write.c:607 [inline]
    SyS_write+0xfb/0x230 fs/read_write.c:599
    entry_SYSCALL_64_fastpath+0x1f/0xc2

    when calling syscall(__NR_write, sock2, 0x208aaf27ul, 0x0ul) on a KCM
    seqpacket socket. It appears that kcm_sendmsg() does not handle the len==0
    case correctly, which causes an empty skb to be allocated and queued.
    Fix this by skipping the skb allocation for the len==0 case.

    Reported-by: Dmitry Vyukov
    Cc: Tom Herbert
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    WANG Cong
     

24 Feb, 2017

23 commits

  • Greg Kroah-Hartman
     
  • commit 35879ee4769099905fa3bda0b21e73d434e2df6a upstream.

    This reverts commit 7e0739cd9c40 ("[media] videodev2.h: fix
    sYCC/AdobeYCC default quantization range").

    The problem is that many drivers can convert R'G'B' content (often
    from sensors) to Y'CbCr, but they all produce limited range Y'CbCr.

    To stay backwards compatible the default quantization range for
    sRGB and AdobeRGB Y'CbCr encoding should be limited range, not full
    range, even though the corresponding standards specify full range.

    Update the V4L2_MAP_QUANTIZATION_DEFAULT define accordingly and
    also update the documentation.

    Fixes: 7e0739cd9c40 ("[media] videodev2.h: fix sYCC/AdobeYCC default quantization range")
    Signed-off-by: Hans Verkuil
    Signed-off-by: Mauro Carvalho Chehab
    Signed-off-by: Greg Kroah-Hartman

    Hans Verkuil
     
  • commit be628be09563f8f6e81929efbd7cf3f45c344416 upstream.

    Signed-off-by: Kent Overstreet
    Cc: Coly Li
    Signed-off-by: Greg Kroah-Hartman

    Kent Overstreet
     
  • commit 8fcd0950c021d7be8493280541332b924b9de962 upstream.

    Fix typo causing ntb_transport_create_queue to select the first
    queue every time, instead of using the next free queue.

    Signed-off-by: Thomas VanSelus
    Signed-off-by: Aaron Sierra
    Acked-by: Allen Hubbe
    Fixes: fce8a7bb5 ("PCI-Express Non-Transparent Bridge Support")
    Signed-off-by: Jon Mason
    Signed-off-by: Greg Kroah-Hartman

    Thomas VanSelus
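
The commit message does not show the typo itself, so the following is only an illustration of the symptom it describes. It is a userspace sketch under stated assumptions: all names are hypothetical, and the "buggy" variant assumes the selection consulted a bitmap that is never updated when a queue is claimed.

```c
#include <stdint.h>

/* Return the index of the lowest set bit in map, or -1 if no bit is set. */
int lowest_set_bit(uint32_t map)
{
	for (int i = 0; i < 32; i++)
		if (map & (UINT32_C(1) << i))
			return i;
	return -1;
}

/* Symptom: selecting from a bitmap of all existing queues, which never
 * changes, yields the first queue on every call. */
int pick_queue_buggy(uint32_t all_queues)
{
	return lowest_set_bit(all_queues);
}

/* Intended behaviour: select from the bitmap of FREE queues and clear the
 * chosen bit, so the next call moves on to the next free queue. */
int claim_free_queue(uint32_t *free_queues)
{
	int q = lowest_set_bit(*free_queues);

	if (q >= 0)
		*free_queues &= ~(UINT32_C(1) << q);
	return q;
}
```

With three queues available, repeated calls to `claim_free_queue()` hand out queues 0, 1 and 2 and then report that none is left, while `pick_queue_buggy()` keeps returning queue 0.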
     
  • commit 9644347c5240d0ee3ba7472ef332aaa4ff4db398 upstream.

    In the normal I/O execution path, ntb_perf is missing a call to
    dmaengine_unmap_put() after submission. That causes us to leak
    unmap objects.

    Signed-off-by: Dave Jiang
    Fixes: 8a7b6a77 ("ntb: ntb perf tool")
    Signed-off-by: Jon Mason
    Signed-off-by: Greg Kroah-Hartman

    Dave Jiang
     
  • commit dd62245e73de9138333cb0e7a42c8bc1215c3ce6 upstream.

    The call to debugfs_remove_recursive(qp->debugfs_dir) of the sub-level
    directory must not be later than
    debugfs_remove_recursive(nt_debugfs_dir) of the top-level directory.
    Otherwise, the sub-level directory will not exist, and it would be
    invalid (panic) to attempt to remove it. This removes the top-level
    directory last, after sub-level directories have been cleaned up.

    Signed-off-by: Allen Hubbe
    Fixes: e26a5843f ("NTB: Split ntb_hw_intel and ntb_transport drivers")
    Signed-off-by: Jon Mason
    Signed-off-by: Greg Kroah-Hartman

    Allen Hubbe
     
  • commit f222449c9dfad7c9bb8cb53e64c5c407b172ebbc upstream.

    We cannot do printk() from tk_debug_account_sleep_time(), because
    tk_debug_account_sleep_time() is called under tk_core seq lock.
    The reason why printk() is unsafe there is that console_sem may
    invoke scheduler (up()->wake_up_process()->activate_task()), which,
    in turn, can return back to timekeeping code, for instance, via
    get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

    [ 48.950592] ======================================================
    [ 48.950622] [ INFO: possible circular locking dependency detected ]
    [ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
    [ 48.950622] -------------------------------------------------------
    [ 48.950622] kworker/0:0/3 is trying to acquire lock:
    [ 48.950653] (tk_core){----..}, at: retrigger_next_event+0x4c/0x90
    [ 48.950683]
    but task is already holding lock:
    [ 48.950683] (hrtimer_bases.lock){-.-...}, at: retrigger_next_event+0x38/0x90
    [ 48.950714]
    which lock already depends on the new lock.

    [ 48.950714]
    the existing dependency chain (in reverse order) is:
    [ 48.950714]
    -> #5 (hrtimer_bases.lock){-.-...}:
    [ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
    [ 48.950775] lock_hrtimer_base+0x28/0x58
    [ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
    [ 48.950775] __enqueue_rt_entity+0x320/0x360
    [ 48.950805] enqueue_rt_entity+0x2c/0x44
    [ 48.950805] enqueue_task_rt+0x24/0x94
    [ 48.950836] ttwu_do_activate+0x54/0xc0
    [ 48.950836] try_to_wake_up+0x248/0x5c8
    [ 48.950836] __setup_irq+0x420/0x5f0
    [ 48.950836] request_threaded_irq+0xdc/0x184
    [ 48.950866] devm_request_threaded_irq+0x58/0xa4
    [ 48.950866] omap_i2c_probe+0x530/0x6a0
    [ 48.950897] platform_drv_probe+0x50/0xb0
    [ 48.950897] driver_probe_device+0x1f8/0x2cc
    [ 48.950897] __driver_attach+0xc0/0xc4
    [ 48.950927] bus_for_each_dev+0x6c/0xa0
    [ 48.950927] bus_add_driver+0x100/0x210
    [ 48.950927] driver_register+0x78/0xf4
    [ 48.950958] do_one_initcall+0x3c/0x16c
    [ 48.950958] kernel_init_freeable+0x20c/0x2d8
    [ 48.950958] kernel_init+0x8/0x110
    [ 48.950988] ret_from_fork+0x14/0x24
    [ 48.950988]
    -> #4 (&rt_b->rt_runtime_lock){-.-...}:
    [ 48.951019] _raw_spin_lock+0x40/0x50
    [ 48.951019] rq_offline_rt+0x9c/0x2bc
    [ 48.951019] set_rq_offline.part.2+0x2c/0x58
    [ 48.951049] rq_attach_root+0x134/0x144
    [ 48.951049] cpu_attach_domain+0x18c/0x6f4
    [ 48.951049] build_sched_domains+0xba4/0xd80
    [ 48.951080] sched_init_smp+0x68/0x10c
    [ 48.951080] kernel_init_freeable+0x160/0x2d8
    [ 48.951080] kernel_init+0x8/0x110
    [ 48.951080] ret_from_fork+0x14/0x24
    [ 48.951110]
    -> #3 (&rq->lock){-.-.-.}:
    [ 48.951110] _raw_spin_lock+0x40/0x50
    [ 48.951141] task_fork_fair+0x30/0x124
    [ 48.951141] sched_fork+0x194/0x2e0
    [ 48.951141] copy_process.part.5+0x448/0x1a20
    [ 48.951171] _do_fork+0x98/0x7e8
    [ 48.951171] kernel_thread+0x2c/0x34
    [ 48.951171] rest_init+0x1c/0x18c
    [ 48.951202] start_kernel+0x35c/0x3d4
    [ 48.951202] 0x8000807c
    [ 48.951202]
    -> #2 (&p->pi_lock){-.-.-.}:
    [ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
    [ 48.951232] try_to_wake_up+0x30/0x5c8
    [ 48.951232] up+0x4c/0x60
    [ 48.951263] __up_console_sem+0x2c/0x58
    [ 48.951263] console_unlock+0x3b4/0x650
    [ 48.951263] vprintk_emit+0x270/0x474
    [ 48.951293] vprintk_default+0x20/0x28
    [ 48.951293] printk+0x20/0x30
    [ 48.951324] kauditd_hold_skb+0x94/0xb8
    [ 48.951324] kauditd_thread+0x1a4/0x56c
    [ 48.951324] kthread+0x104/0x148
    [ 48.951354] ret_from_fork+0x14/0x24
    [ 48.951354]
    -> #1 ((console_sem).lock){-.....}:
    [ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
    [ 48.951385] down_trylock+0xc/0x2c
    [ 48.951385] __down_trylock_console_sem+0x24/0x80
    [ 48.951385] console_trylock+0x10/0x8c
    [ 48.951416] vprintk_emit+0x264/0x474
    [ 48.951416] vprintk_default+0x20/0x28
    [ 48.951416] printk+0x20/0x30
    [ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
    [ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
    [ 48.951446] timekeeping_resume+0x218/0x23c
    [ 48.951477] syscore_resume+0x94/0x42c
    [ 48.951477] suspend_enter+0x554/0x9b4
    [ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
    [ 48.951507] enter_state+0x934/0xbd4
    [ 48.951507] pm_suspend+0x14/0x70
    [ 48.951507] state_store+0x68/0xc8
    [ 48.951538] kernfs_fop_write+0xf4/0x1f8
    [ 48.951538] __vfs_write+0x1c/0x114
    [ 48.951538] vfs_write+0xa0/0x168
    [ 48.951568] SyS_write+0x3c/0x90
    [ 48.951568] __sys_trace_return+0x0/0x10
    [ 48.951568]
    -> #0 (tk_core){----..}:
    [ 48.951599] lock_acquire+0xe0/0x294
    [ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
    [ 48.951629] retrigger_next_event+0x4c/0x90
    [ 48.951629] on_each_cpu+0x40/0x7c
    [ 48.951629] clock_was_set_work+0x14/0x20
    [ 48.951660] process_one_work+0x2b4/0x808
    [ 48.951660] worker_thread+0x3c/0x550
    [ 48.951660] kthread+0x104/0x148
    [ 48.951690] ret_from_fork+0x14/0x24
    [ 48.951690]
    other info that might help us debug this:

    [ 48.951690] Chain exists of:
    tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

    [ 48.951721] Possible unsafe locking scenario:

    [ 48.951721]        CPU0                    CPU1
    [ 48.951721]        ----                    ----
    [ 48.951721]   lock(hrtimer_bases.lock);
    [ 48.951751]                                lock(&rt_b->rt_runtime_lock);
    [ 48.951751]                                lock(hrtimer_bases.lock);
    [ 48.951751]   lock(tk_core);
    [ 48.951782]
    *** DEADLOCK ***

    [ 48.951782] 3 locks held by kworker/0:0/3:
    [ 48.951782] #0: ("events"){.+.+.+}, at: process_one_work+0x1f8/0x808
    [ 48.951812] #1: (hrtimer_work){+.+...}, at: process_one_work+0x1f8/0x808
    [ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: retrigger_next_event+0x38/0x90
    [ 48.951843] stack backtrace:
    [ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
    [ 48.951904] Workqueue: events clock_was_set_work
    [ 48.951904] (unwind_backtrace) from (show_stack+0x10/0x14)
    [ 48.951934] (show_stack) from (dump_stack+0xac/0xe0)
    [ 48.951934] (dump_stack) from (print_circular_bug+0x1d0/0x308)
    [ 48.951965] (print_circular_bug) from (validate_chain+0xf50/0x1324)
    [ 48.951965] (validate_chain) from (__lock_acquire+0x468/0x7e8)
    [ 48.951995] (__lock_acquire) from (lock_acquire+0xe0/0x294)
    [ 48.951995] (lock_acquire) from (ktime_get_update_offsets_now+0x5c/0x1d4)
    [ 48.952026] (ktime_get_update_offsets_now) from (retrigger_next_event+0x4c/0x90)
    [ 48.952026] (retrigger_next_event) from (on_each_cpu+0x40/0x7c)
    [ 48.952056] (on_each_cpu) from (clock_was_set_work+0x14/0x20)
    [ 48.952056] (clock_was_set_work) from (process_one_work+0x2b4/0x808)
    [ 48.952087] (process_one_work) from (worker_thread+0x3c/0x550)
    [ 48.952087] (worker_thread) from (kthread+0x104/0x148)
    [ 48.952087] (kthread) from (ret_from_fork+0x14/0x24)

    Replace printk() with printk_deferred(), which does not call into
    the scheduler.

    Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
    Reported-and-tested-by: Tony Lindgren
    Signed-off-by: Sergey Senozhatsky
    Cc: Petr Mladek
    Cc: Sergey Senozhatsky
    Cc: Peter Zijlstra
    Cc: "Rafael J . Wysocki"
    Cc: Steven Rostedt
    Cc: John Stultz
    Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
    Signed-off-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Sergey Senozhatsky
     
  • commit fc98c3c8c9dcafd67adcce69e6ce3191d5306c9c upstream.

    Use rcuidle console tracepoint because, apparently, it may be issued
    from an idle CPU:

    hw-breakpoint: Failed to enable monitor mode on CPU 0.
    hw-breakpoint: CPU 0 failed to disable vector catch

    ===============================
    [ ERR: suspicious RCU usage. ]
    4.10.0-rc8-next-20170215+ #119 Not tainted
    -------------------------------
    ./include/trace/events/printk.h:32 suspicious rcu_dereference_check() usage!

    other info that might help us debug this:

    RCU used illegally from idle CPU!
    rcu_scheduler_active = 2, debug_locks = 0
    RCU used illegally from extended quiescent state!
    2 locks held by swapper/0/0:
    #0: (cpu_pm_notifier_lock){......}, at: cpu_pm_exit+0x10/0x54
    #1: (console_lock){+.+.+.}, at: vprintk_emit+0x264/0x474

    stack backtrace:
    CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.10.0-rc8-next-20170215+ #119
    Hardware name: Generic OMAP4 (Flattened Device Tree)
    console_unlock
    vprintk_emit
    vprintk_default
    printk
    reset_ctrl_regs
    dbg_cpu_pm_notify
    notifier_call_chain
    cpu_pm_exit
    omap_enter_idle_coupled
    cpuidle_enter_state
    cpuidle_enter_state_coupled
    do_idle
    cpu_startup_entry
    start_kernel

    This RCU warning, however, is suppressed by lockdep_off() in printk().
    lockdep_off() increments the ->lockdep_recursion counter and thus
    disables RCU_LOCKDEP_WARN() and debug_lockdep_rcu_enabled(), which want
    lockdep to be enabled "current->lockdep_recursion == 0".

    Link: http://lkml.kernel.org/r/20170217015932.11898-1-sergey.senozhatsky@gmail.com
    Signed-off-by: Sergey Senozhatsky
    Reported-by: Tony Lindgren
    Tested-by: Tony Lindgren
    Acked-by: Paul E. McKenney
    Acked-by: Steven Rostedt (VMware)
    Cc: Petr Mladek
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Tony Lindgren
    Cc: Russell King
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Sergey Senozhatsky
     
  • commit afe3e4d11bdf50a4c3965eb6465ba6bebbcf5dcf upstream.

    In addition to making PME non-modular, d7def2040077 ("PCI/PME: Make
    explicitly non-modular") removed the pcie_pme_driver .remove() method,
    pcie_pme_remove().

    pcie_pme_remove() freed the PME IRQ that was requested in pcie_pme_probe().
    The fact that we don't free the IRQ after d7def2040077 causes the following
    crash when removing a PCIe port device via /sys:

    ------------[ cut here ]------------
    kernel BUG at drivers/pci/msi.c:370!
    invalid opcode: 0000 [#1] SMP
    Modules linked in:
    CPU: 1 PID: 14509 Comm: sh Tainted: G W 4.8.0-rc1-yh-00012-gd29438d
    RIP: 0010:free_msi_irqs+0x65/0x190
    ...
    Call Trace:
    pci_disable_msi+0x34/0x40
    cleanup_service_irqs+0x27/0x30
    pcie_port_device_remove+0x2a/0x40
    pcie_portdrv_remove+0x40/0x50
    pci_device_remove+0x4b/0xc0
    __device_release_driver+0xb6/0x150
    device_release_driver+0x25/0x40
    pci_stop_bus_device+0x74/0xa0
    pci_stop_and_remove_bus_device_locked+0x1a/0x30
    remove_store+0x50/0x70
    dev_attr_store+0x18/0x30
    sysfs_kf_write+0x44/0x60
    kernfs_fop_write+0x10e/0x190
    __vfs_write+0x28/0x110
    ? percpu_down_read+0x44/0x80
    ? __sb_start_write+0xa7/0xe0
    ? __sb_start_write+0xa7/0xe0
    vfs_write+0xc4/0x180
    SyS_write+0x49/0xa0
    do_syscall_64+0xa6/0x1b0
    entry_SYSCALL64_slow_path+0x25/0x25
    ...
    RIP free_msi_irqs+0x65/0x190
    RSP
    ---[ end trace f4505e1dac5b95d3 ]---
    Segmentation fault

    Restore pcie_pme_remove().

    [bhelgaas: changelog]
    Fixes: d7def2040077 ("PCI/PME: Make explicitly non-modular")
    Signed-off-by: Yinghai Lu
    Signed-off-by: Bjorn Helgaas
    Acked-by: Rafael J. Wysocki
    Signed-off-by: Greg Kroah-Hartman

    Yinghai Lu
     
  • commit 12688dc21f71f4dcc9e2b8b5556b0c6cc8df1491 upstream.

    This reverts commit 63d0f0a6952a1a02bc4f116b7da7c7887e46efa3.

    It caused a regression on platforms where the I2C controller is synthesized
    with dynamic TAR update disabled. The detection code tests whether bit
    DW_IC_CON_10BITADDR_MASTER in register DW_IC_CON is read-only, but fails to
    restore the original value if the bit is read-write.

    Instead of fixing this we revert the commit since it was preparation for
    the commit 0317e6c0f1dc ("i2c: designware: do not disable adapter after
    transfer") which was also reverted.

    Reported-by: Shah Nehal-Bakulchandra
    Reported-by: Suravee Suthikulpanit
    Acked-By: Lucas De Marchi
    Fixes: 63d0f0a6952a ("i2c: designware: detect when dynamic tar update is possible")
    Signed-off-by: Jarkko Nikula
    Signed-off-by: Wolfram Sang
    Signed-off-by: Greg Kroah-Hartman

    Jarkko Nikula
     
  • commit 9e3440481845b2ec22508f60837ee2cab2b6054f upstream.

    The 64-bit get_user() wasn't clearing the high word due to a typo in the
    error handler. The exception handler entry was already correct, though.
    Noticed during recent usercopy test additions in lib/test_user_copy.c.

    Signed-off-by: Kees Cook
    Signed-off-by: Russell King
    Signed-off-by: Greg Kroah-Hartman

    Kees Cook
     
  • commit 25f71d1c3e98ef0e52371746220d66458eac75bc upstream.

    The UEVENT user mode helper is enabled before the initcalls are executed
    and is available when the root filesystem has been mounted.

    The user mode helper is triggered by device init calls and the executable
    might use the futex syscall.

    futex_init() is marked __initcall which maps to device_initcall, but there
    is no guarantee that futex_init() is invoked _before_ the first device init
    call which triggers the UEVENT user mode helper.

    If the user mode helper uses the futex syscall before futex_init() then the
    syscall crashes with a NULL pointer dereference because the futex subsystem
    has not been initialized yet.

    Move futex_init() to core_initcall so futexes are initialized before the
    root filesystem is mounted and the usermode helper becomes available.

    [ tglx: Rewrote changelog ]

    Signed-off-by: Yang Yang
    Cc: jiang.biao2@zte.com.cn
    Cc: jiang.zhengxiong@zte.com.cn
    Cc: zhong.weidong@zte.com.cn
    Cc: deng.huali@zte.com.cn
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1483085875-6130-1-git-send-email-yang.yang29@zte.com.cn
    Signed-off-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Yang Yang
     
  • commit bb08c04dc867b5f392caec635c097d5d5fcd8c9f upstream.

    100% reproducible issue found on SKL SkullCanyon NUC with two external
    DP daisy-chained monitors in DP/MST mode. When turning off or changing
    the input of the second monitor the machine stops with a kernel
    oops. This issue happened with 4.8.8 as well as drm/drm-intel-nightly.

    This issue is traced to an inconsistent control flow in
    drm_dp_update_payload_part1(): the 'port' pointer is set to NULL at the
    same time as 'req_payload.num_slots' is set to zero, but the pointer is
    dereferenced even when req_payload.num_slots is zero.

    The problematic dereference was introduced in commit dfda0df34
    ("drm/mst: rework payload table allocation to conform better") and may
    impact all versions since v3.18.

    The fix suggested by Chris Wilson removes the kernel oops and was found to
    work well after 10 minutes of monkey-testing with the second monitor's
    power and input buttons.

    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98990
    Fixes: dfda0df34264 ("drm/mst: rework payload table allocation to conform better.")
    Cc: Dave Airlie
    Cc: Chris Wilson
    Cc: Nathan D Ciobanu
    Cc: Dhinakaran Pandiyan
    Cc: Sean Paul
    Tested-by: Nathan D Ciobanu
    Reviewed-by: Dhinakaran Pandiyan
    Signed-off-by: Pierre-Louis Bossart
    Signed-off-by: Jani Nikula
    Link: http://patchwork.freedesktop.org/patch/msgid/1487076561-2169-1-git-send-email-jani.nikula@intel.com
    Signed-off-by: Greg Kroah-Hartman

    Pierre-Louis Bossart
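
    The control-flow rule this changelog relies on (never touch 'port' when no slots are requested) can be sketched as follows. This is an illustration only, not the actual patch: the struct and function names are hypothetical, and the only assumption taken from the text is that the pointer is NULL exactly when the slot count is zero.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the port and payload structures. */
struct model_port {
    int vcpi;
};

struct model_payload {
    int num_slots;
    int vcpi;
};

/* Fill req from port. A NULL port means no slots are requested. */
static void fill_request(struct model_payload *req,
                         const struct model_port *port, int slots)
{
    req->num_slots = port ? slots : 0;
    req->vcpi = 0;

    /* Dereference port only when slots were requested, so a NULL
     * pointer is never followed. */
    if (req->num_slots)
        req->vcpi = port->vcpi;
}
```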
     
  • commit d74c67dd7800fc7aae381f272875c337f268806c upstream.

    The crtc_h/vdisplay fields may not match the CRTC viewport dimensions
    with special modes such as interlaced ones.

    Fixes the HW cursor disappearing in the bottom half of the screen with
    interlaced modes.

    Fixes: 6b16cf7785a4 ("drm/radeon: Hide the HW cursor while it's out of bounds")
    Reported-by: Ashutosh Kumar
    Tested-by: Sonny Jiang
    Reviewed-by: Alex Deucher
    Signed-off-by: Michel Dänzer
    Signed-off-by: Alex Deucher
    Signed-off-by: Greg Kroah-Hartman

    Michel Dänzer
     
  • commit 722c5ac708b4f5c1fcfad5fed4c95234c8b06590 upstream.

    ELAN0605 has been confirmed to be a variant of ELAN0600, which is
    blacklisted in the hid-core to be managed by elan_i2c. This device can be
    found in Lenovo ideapad 310s (80U4000).

    Signed-off-by: Hiroka IHARA
    Signed-off-by: Dmitry Torokhov
    Signed-off-by: Greg Kroah-Hartman

    IHARA Hiroka
     
  • commit 137d01df511b3afe1f05499aea05f3bafc0fb221 upstream.

    What happens is that a write to /dev/sg is given a request with non-zero
    ->iovec_count combined with zero ->dxfer_len. Or with ->dxferp pointing
    to an array full of empty iovecs.

    Having write permission to /dev/sg shouldn't be equivalent to the
    ability to trigger BUG_ON() while holding spinlocks...

    Found by Dmitry Vyukov and syzkaller.

    [ The BUG_ON() got changed to a WARN_ON_ONCE(), but this fixes the
    underlying issue. - Linus ]

    Signed-off-by: Al Viro
    Reported-by: Dmitry Vyukov
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Al Viro
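
    The two bad inputs named above (a non-zero iovec count with a zero transfer length, or an array of empty iovecs) both amount to "nothing to transfer". A minimal user-space sketch of such a check, assuming a hypothetical helper check_transfer() that is not part of the sg driver, could look like this:

```c
#include <errno.h>
#include <stddef.h>
#include <sys/uio.h>

/* Hypothetical check: count the bytes the iovec array can carry, capped
 * at dxfer_len. Returns -EINVAL when iovecs were supplied but nothing
 * would be transferred, otherwise the byte count. Overflow of the sum
 * is ignored in this sketch. */
static long check_transfer(const struct iovec *iov, int iovec_count,
                           size_t dxfer_len)
{
    size_t total = 0;

    for (int i = 0; i < iovec_count; i++)
        total += iov[i].iov_len;

    if (total > dxfer_len)
        total = dxfer_len;

    if (iovec_count > 0 && total == 0)
        return -EINVAL;

    return (long)total;
}
```

    Rejecting the request up front means later code never sees an empty transfer, so it has no reason to assert on one.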
     
  • commit fd3fc0b4d7305fa7246622dcc0dec69c42443f45 upstream.

    Don't crash the machine just because of an empty transfer. Use WARN_ON()
    combined with returning an error.

    Found by Dmitry Vyukov and syzkaller.

    [ Changed to "WARN_ON_ONCE()". Al has a patch that should fix the root
    cause, but a BUG_ON() is not acceptable in any case, and a WARN_ON()
    might still be a cause of excessive log spamming.

    NOTE! If this warning ever triggers, we may end up leaking resources,
    since this doesn't bother to try to clean the command up. So this
    WARN_ON_ONCE() triggering does imply real problems. But BUG_ON() is
    much worse.

    People really need to stop using BUG_ON() for "this shouldn't ever
    happen". It makes pretty much any bug worse. - Linus ]

    Signed-off-by: Johannes Thumshirn
    Reported-by: Dmitry Vyukov
    Cc: James Bottomley
    Cc: Al Viro
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Johannes Thumshirn
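
    The pattern argued for here (warn once, return an error, keep running) can be imitated in user space. In the sketch below, warn_once() and submit_transfer() are hypothetical names, and the choice of -EINVAL as the error code is an assumption made for illustration.

```c
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

static int warn_count;  /* number of warnings actually printed */

/* Returns cond. Logs only the first time cond is true for the call site
 * that owns the *warned flag. */
static int warn_once(int cond, int *warned, const char *what)
{
    if (cond && !*warned) {
        *warned = 1;
        warn_count++;
        fprintf(stderr, "WARNING: %s\n", what);
    }
    return cond;
}

/* Rejects an empty transfer with an error instead of aborting. */
static int submit_transfer(size_t len)
{
    static int warned;

    if (warn_once(len == 0, &warned, "empty transfer"))
        return -EINVAL;

    return 0;
}
```

    Repeated empty transfers each fail with the error, but the log receives a single warning, which addresses the log-spamming concern raised above.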
     
  • commit 3f91a89d424a79f8082525db5a375e438887bb3e upstream.

    Currently, if the kernel is running on a POWER9 processor under a
    hypervisor, it may try to use the radix MMU even though it doesn't have
    the necessary code to do so (it doesn't negotiate use of radix, and it
    doesn't do the H_REGISTER_PROC_TBL hcall). If the hypervisor supports
    both radix and HPT, then it will set up the guest to use HPT (since the
    guest doesn't request radix in the CAS call), but if the radix feature
    bit is set in the ibm,pa-features property (which is valid, since
    ibm,pa-features is defined to represent the capabilities of the
    processor) the guest will try to use radix, resulting in a crash when
    it turns the MMU on.

    This makes the minimal fix for the current code, which is to disable
    radix unless we are running in hypervisor mode.

    Fixes: 2bfd65e45e87 ("powerpc/mm/radix: Add radix callbacks for early init routines")
    Signed-off-by: Paul Mackerras
    Signed-off-by: Michael Ellerman
    Signed-off-by: Greg Kroah-Hartman

    Paul Mackerras
     
  • commit 3d4ef329757cfd5e0b23cce97cdeca7e2df89c99 upstream.

    Commit 577fb13199b1 ("mmc: rework selection of bus speed mode")
    refactored bus width selection code to mmc_select_bus_width().

    However, it also altered the behavior to not call the selection code in
    non-high-speed modes anymore.

    This causes 1-bit mode to always be used when high-speed mode is not
    enabled, even though 4-bit and 8-bit are valid bus widths in the
    backwards-compatibility (legacy) mode as well (see e.g. 5.3.2 Bus Speed
    Modes in JEDEC 84-B50). This results in a significant regression in
    transfer speeds.

    Fix the code to allow 4-bit and 8-bit widths even without high-speed
    mode, as before.

    Tested with a Zynq-7000 PicoZed 7020 board.

    Fixes: 577fb13199b1 ("mmc: rework selection of bus speed mode")
    Signed-off-by: Anssi Hannula
    Signed-off-by: Ulf Hansson
    Signed-off-by: Greg Kroah-Hartman

    Anssi Hannula
     
  • commit 84588a93d097bace24b9233930f82511d4f34210 upstream.

    Signed-off-by: Miklos Szeredi
    Fixes: d82718e348fe ("fuse_dev_splice_read(): switch to add_to_pipe()")
    Signed-off-by: Greg Kroah-Hartman

    Miklos Szeredi
     
  • commit 6ba4d2722d06960102c981322035239cd66f7316 upstream.

    There is a potential race between fuse_dev_do_write()
    and request_wait_answer() contexts as shown below:

    TASK 1:
    __fuse_request_send():
    |--spin_lock(&fiq->waitq.lock);
    |--queue_request();
    |--spin_unlock(&fiq->waitq.lock);
    |--request_wait_answer():
    |--if (test_bit(FR_SENT, &req->flags))

    TASK 2:
    fuse_dev_do_write():
    |--clears bit FR_SENT,
    |--request_end():
    |--sets bit FR_FINISHED
    |--spin_lock(&fiq->waitq.lock);
    |--list_del_init(&req->intr_entry);
    |--spin_unlock(&fiq->waitq.lock);
    |--fuse_put_request();
    TASK 1 (continued, in request_wait_answer()):
    |--queue_interrupt();
    |--wake_up_locked(&fiq->waitq);
    |--wait_event_freezable();

    Now the next fuse_dev_do_read() sees that the interrupts list is not
    empty and calls fuse_read_interrupt(), which tries to access the request
    that has already been freed and hits the crash below:

    [11432.401266] Unable to handle kernel paging request at virtual address
    6b6b6b6b6b6b6b6b
    ...
    [11432.418518] Kernel BUG at ffffff80083720e0
    [11432.456168] PC is at __list_del_entry+0x6c/0xc4
    [11432.463573] LR is at fuse_dev_do_read+0x1ac/0x474
    ...
    [11432.679999] [] __list_del_entry+0x6c/0xc4
    [11432.687794] [] fuse_dev_do_read+0x1ac/0x474
    [11432.693180] [] fuse_dev_read+0x6c/0x78
    [11432.699082] [] __vfs_read+0xc0/0xe8
    [11432.704459] [] vfs_read+0x90/0x108
    [11432.709406] [] SyS_read+0x58/0x94

    Since the FR_FINISHED bit is set before intr_entry is deleted under the
    input queue lock in the request completion path, test this flag and
    queue the request atomically under the same lock in queue_interrupt().

    Signed-off-by: Sahitya Tummala
    Signed-off-by: Miklos Szeredi
    Fixes: fd22d62ed0c3 ("fuse: no fc->lock for iqueue parts")
    Signed-off-by: Greg Kroah-Hartman

    Sahitya Tummala
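
    The locking rule in this fix can be modelled with a mutex in user space. The sketch below is a simplified illustration rather than the fuse code: model_req, model_request_end() and model_queue_interrupt() are hypothetical, a pthread mutex stands in for the input queue lock, and an atomic flag stands in for FR_FINISHED.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

struct model_req {
    atomic_bool finished;  /* models FR_FINISHED */
    bool on_intr_list;     /* models membership via intr_entry */
};

static pthread_mutex_t queue_lock = PTHREAD_MUTEX_INITIALIZER;

/* Completion path: set the flag first, then unlink under the lock. */
static void model_request_end(struct model_req *req)
{
    atomic_store(&req->finished, true);

    pthread_mutex_lock(&queue_lock);
    req->on_intr_list = false;  /* models list_del_init() */
    pthread_mutex_unlock(&queue_lock);
}

/* Interrupt path: test the flag and queue under the same lock, so a
 * finished request is never put back on the list. Returns true if the
 * request was queued. */
static bool model_queue_interrupt(struct model_req *req)
{
    bool queued = false;

    pthread_mutex_lock(&queue_lock);
    if (!atomic_load(&req->finished)) {
        req->on_intr_list = true;
        queued = true;
    }
    pthread_mutex_unlock(&queue_lock);

    return queued;
}
```

    If the interrupt path takes the lock first, the completion path unlinks the request afterwards. If the completion path wins, the flag is already visible and the request is not queued. Either way no freed request stays on the list.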
     
  • commit f9c85ee67164b37f9296eab3b754e543e4e96a1c upstream.

    Reported as a Kaffeine bug:
    https://bugs.kde.org/show_bug.cgi?id=375811

    USB control messages require DMA to work. We cannot pass a
    stack-allocated buffer, as there is no guarantee that the stack
    lies in a DMA-capable area.

    As of kernel 4.9, DMA on the stack is no longer accepted by default
    on the x86 architecture. On other architectures, avoiding it has been
    a requirement since kernel 2.2. So, after this patch, this driver
    should likely work fine on all archs.

    Tested with USB ID 2040:5510: Hauppauge Windham

    Signed-off-by: Mauro Carvalho Chehab
    Signed-off-by: Greg Kroah-Hartman

    Mauro Carvalho Chehab
     
  • commit 5a81e6a171cdbd1fa8bc1fdd80c23d3d71816fac upstream.

    Flags (PIPE_BUF_FLAG_PACKET, PIPE_BUF_FLAG_GIFT) could remain on the
    unused part of the pipe ring buffer. Previously splice_to_pipe() left
    the flags value alone, which could result in incorrect behavior.

    The uninitialized flags appear to have been present since the
    introduction of the splice syscall.

    Signed-off-by: Miklos Szeredi
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Miklos Szeredi
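
    Stale flags in a reused ring slot are easy to reproduce in a small model. The sketch below assumes a hypothetical four-slot ring with made-up flag values. It is not the pipe implementation, only an illustration of why every push should overwrite the flags field.

```c
#include <stddef.h>

#define MODEL_RING_SIZE   4
#define MODEL_FLAG_GIFT   0x1u  /* hypothetical flag values */
#define MODEL_FLAG_PACKET 0x2u

struct model_buf {
    const void *data;
    size_t len;
    unsigned int flags;
};

struct model_pipe {
    struct model_buf bufs[MODEL_RING_SIZE];
    size_t head;   /* index of the oldest occupied slot */
    size_t count;  /* number of occupied slots */
};

/* Append a buffer. Returns 0 on success, -1 if the ring is full. */
static int model_push(struct model_pipe *p, const void *data, size_t len,
                      unsigned int flags)
{
    struct model_buf *b;

    if (p->count == MODEL_RING_SIZE)
        return -1;

    b = &p->bufs[(p->head + p->count) % MODEL_RING_SIZE];
    b->data = data;
    b->len = len;
    b->flags = flags;  /* always overwrite: the slot may hold stale flags */
    p->count++;
    return 0;
}

/* Drop the oldest buffer. The slot contents are deliberately left
 * untouched, as in a ring that only moves its indices. */
static void model_pop(struct model_pipe *p)
{
    if (p->count) {
        p->head = (p->head + 1) % MODEL_RING_SIZE;
        p->count--;
    }
}
```

    Without the assignment to b->flags, a slot that once carried a gifted buffer would keep that flag for whatever is stored in it next.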
     

18 Feb, 2017

3 commits

  • Greg Kroah-Hartman
     
  • commit dffba9a31c7769be3231c420d4b364c92ba3f1ac upstream.

    The compacted-format XSAVES area is determined at boot time and
    never changes afterward. The field xsave.header.xcomp_bv indicates
    which components are in the fixed XSAVES format.

    In fpstate_init() we did not set xcomp_bv to reflect the XSAVES
    format, since at that time there is no valid data.

    However, after we do copy_init_fpstate_to_fpregs() in fpu__clear(),
    as in commit:

    b22cbe404a9c x86/fpu: Fix invalid FPU ptrace state after execve()

    and when __fpu_restore_sig() does fpu__restore() for a COMPAT-mode
    app, a #GP occurs. This can be easily triggered by doing valgrind on
    a COMPAT-mode "Hello World," as reported by Joakim Tjernlund and
    others:

    https://bugzilla.kernel.org/show_bug.cgi?id=190061

    Fix it by setting xcomp_bv correctly.

    This patch also moves the xcomp_bv initialization to the proper
    place, which was in copyin_to_xsaves() as of:

    4c833368f0bf x86/fpu: Set the xcomp_bv when we fake up a XSAVES area

    which fixed the bug too, but it's more efficient and cleaner to
    initialize things once per boot, not for every signal handling
    operation.

    Reported-by: Kevin Hao
    Reported-by: Joakim Tjernlund
    Signed-off-by: Yu-cheng Yu
    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Dave Hansen
    Cc: Fenghua Yu
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Ravi V. Shankar
    Cc: Thomas Gleixner
    Cc: haokexin@gmail.com
    Link: http://lkml.kernel.org/r/1485212084-4418-1-git-send-email-yu-cheng.yu@intel.com
    [ Combined it with 4c833368f0bf. ]
    Signed-off-by: Ingo Molnar
    Signed-off-by: Greg Kroah-Hartman

    Yu-cheng Yu
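
    Setting xcomp_bv once at init time can be sketched as below. The struct and function names are hypothetical and the header is reduced to two fields. The sketch assumes only that bit 63 of xcomp_bv marks the compacted format and that the low bits list the components present.

```c
#include <stdint.h>

/* Bit 63 of xcomp_bv marks the compacted XSAVES format. */
#define MODEL_XCOMP_BV_COMPACTED (1ULL << 63)

/* Hypothetical, reduced model of the xstate header. */
struct model_xstate_header {
    uint64_t xfeatures;
    uint64_t xcomp_bv;
};

/* Initialise the header once. With the compacted format in use,
 * xcomp_bv must describe the layout even though no component holds
 * valid data yet. */
static void model_fpstate_init(struct model_xstate_header *h,
                               uint64_t feature_mask, int use_xsaves)
{
    h->xfeatures = 0;
    h->xcomp_bv = use_xsaves ? (MODEL_XCOMP_BV_COMPACTED | feature_mask) : 0;
}
```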
     
  • commit 92e55f412cffd016cc245a74278cb4d7b89bb3bc upstream.

    Unlike ipv4, this control socket is shared by all cpus so we cannot use
    it as scratchpad area to annotate the mark that we pass to ip6_xmit().

    Add a new parameter to ip6_xmit() to indicate the mark. The SCTP socket
    family caches the flowi6 structure in the sctp_transport structure, so
    we cannot use it to carry the mark unless we later reset it, an option
    I discarded since it looks ugly to me.

    Fixes: bf99b4ded5f8 ("tcp: fix mark propagation with fwmark_reflect enabled")
    Suggested-by: Eric Dumazet
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Pablo Neira