01 Nov, 2017
1 commit
-
MIPS will soon not be a part of Imagination Technologies, and as such
many @imgtec.com email addresses will no longer be valid. This patch
updates the addresses for those who:- Have 10 or more patches in mainline authored using an @imgtec.com
email address, or any patches dated within the past year.- Are still with Imagination but leaving as part of the MIPS business
unit, as determined from an internal email address list.- Haven't already updated their email address (ie. JamesH) or expressed
a desire to be excluded (ie. Maciej).- Acked v2 or earlier of this patch, which leaves Deng-Cheng, Matt &
myself.New addresses are of the form firstname.lastname@mips.com, and all
verified against an internal email address list. An entry is added to
.mailmap for each person such that get_maintainer.pl will report the new
addresses rather than @imgtec.com addresses which will soon be dead.Instances of the affected addresses throughout the tree are then
mechanically replaced with the new @mips.com address.Signed-off-by: Paul Burton
Cc: Deng-Cheng Zhu
Cc: Deng-Cheng Zhu
Acked-by: Dengcheng Zhu
Cc: Matt Redfearn
Cc: Matt Redfearn
Acked-by: Matt Redfearn
Cc: Andrew Morton
Cc: linux-kernel@vger.kernel.org
Cc: linux-mips@linux-mips.org
Cc: trivial@kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/17540/
Signed-off-by: James Hogan
29 Oct, 2017
4 commits
-
Pull networking fixes from David Miller:
1) Fix route leak in xfrm_bundle_create().
2) In mac80211, validate user rate mask before configuring it. From
Johannes Berg.3) Properly enforce memory limits in fair queueing code, from Toke
Hoiland-Jorgensen.4) Fix lockdep splat in inet_csk_route_req(), from Eric Dumazet.
5) Fix TSO header allocation and management in mvpp2 driver, from Yan
Markman.6) Don't take socket lock in BH handler in strparser code, from Tom
Herbert.7) Don't show sockets from other namespaces in AF_UNIX code, from
Andrei Vagin.8) Fix double free in error path of tap_open(), from Girish Moodalbail.
9) Fix TX map failure path in igb and ixgbe, from Jean-Philippe Brucker
and Alexander Duyck.10) Fix DCB mode programming in stmmac driver, from Jose Abreu.
11) Fix err_count handling in various tunnels (ipip, ip6_gre). From Xin
Long.12) Properly align SKB head before building SKB in tuntap, from Jason
Wang.13) Avoid matching qdiscs with a zero handle during lookups, from Cong
Wang.14) Fix various endianness bugs in sctp, from Xin Long.
15) Fix tc filter callback races and add selftests which trigger the
problem, from Cong Wang.* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (73 commits)
selftests: Introduce a new test case to tc testsuite
selftests: Introduce a new script to generate tc batch file
net_sched: fix call_rcu() race on act_sample module removal
net_sched: add rtnl assertion to tcf_exts_destroy()
net_sched: use tcf_queue_work() in tcindex filter
net_sched: use tcf_queue_work() in rsvp filter
net_sched: use tcf_queue_work() in route filter
net_sched: use tcf_queue_work() in u32 filter
net_sched: use tcf_queue_work() in matchall filter
net_sched: use tcf_queue_work() in fw filter
net_sched: use tcf_queue_work() in flower filter
net_sched: use tcf_queue_work() in flow filter
net_sched: use tcf_queue_work() in cgroup filter
net_sched: use tcf_queue_work() in bpf filter
net_sched: use tcf_queue_work() in basic filter
net_sched: introduce a workqueue for RCU callbacks of tc filter
sctp: fix some type cast warnings introduced since very beginning
sctp: fix a type cast warnings that causes a_rwnd gets the wrong value
sctp: fix some type cast warnings introduced by transport rhashtable
sctp: fix some type cast warnings introduced by stream reconf
... -
Pull input fixes from Dmitry Torokhov:
- fix gtco tablet driver, tightening parsing of HID descriptors
- add ACPI ID added to Elan driver to be able to handle touchpads found
in Lenovo Ideapad 320/520- fix the Symaptics RMI4 driver to adjust handling of buttons
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: synaptics-rmi4 - limit the range of what GPIOs are buttons
Input: gtco - fix potential out-of-bound access
Input: elan_i2c - add ELAN0611 to the ACPI table -
Pull drm fixes from Dave Airlie:
"Two amd fixes, one i915 core and a few i915 GVT fixes, things seem
fairly quiet"* tag 'drm-fixes-for-v4.14-rc7' of git://people.freedesktop.org/~airlied/linux:
drm/i915/gvt: Adding ACTHD mmio read handler
drm/i915/gvt: Extract mmio_read_from_hw() common function
drm/i915/gvt: Refine MMIO_RING_F()
drm/i915/gvt: properly check per_ctx bb valid state
drm/i915/perf: fix perf enable/disable ioctls with 32bits userspace
drm/amd/amdgpu: Remove workaround check for UVD6 on APUs
drm/amd/powerplay: fix uninitialized variable -
Pull SCSI fixes from James Bottomley:
"Six fixes for mostly minor issues, most of which have small race
windows for occurring"* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: Suppress a kernel warning in case the prep function returns BLKPREP_DEFER
scsi: sg: Re-fix off by one in sg_fill_request_table()
scsi: aacraid: Fix controller initialization failure
scsi: hpsa: Fix configured_logical_drive_count·check
scsi: qla2xxx: Initialize Work element before requesting IRQs
scsi: zfcp: fix erp_action use-before-initialize in REC action trace
28 Oct, 2017
6 commits
-
The commit 9a393b5d5988 ("tap: tap as an independent module") created a
separate tap module that implements tap functionality and exports
interfaces that will be used by macvtap and ipvtap modules to create
create respective tap devices.However, that patch introduced a regression wherein the modules macvtap
and ipvtap can be removed (through modprobe -r) while there are
applications using the respective /dev/tapX devices. These applications
cause kernel to hold reference to /dev/tapX through 'struct cdev
macvtap_cdev' and 'struct cdev ipvtap_dev' defined in macvtap and ipvtap
modules respectively. So, when the application is later closed the
kernel panics because we are referencing KVA that is present in the
unloaded modules.----------8
BUG: unable to handle kernel paging request at ffffffffa038c500
IP: cdev_put+0xf/0x30
----------8
Signed-off-by: David S. Miller -
An unaligned alloc_frag->offset caused by previous allocation will
result an unaligned skb->head. This will lead unaligned
skb_shared_info and then unaligned dataref which requires to be
aligned for accessing on some architecture. Fix this by aligning
alloc_frag->offset before the frag refilling.Fixes: 0bbd7dad34f8 ("tun: make tun_build_skb() thread safe")
Cc: Eric Dumazet
Cc: Willem de Bruijn
Cc: Wei Wei
Cc: Dmitry Vyukov
Cc: Mark Rutland
Reported-by: Wei Wei
Signed-off-by: Jason Wang
Signed-off-by: David S. Miller -
Pull xen fixes from Juergen Gross:
- a fix for the Xen gntdev device repairing an issue in case of partial
failure of mapping multiple pages of another domain- a fix of a regression in the Xen balloon driver introduced in 4.13
- a build fix for Xen on ARM which will trigger e.g. for Linux RT
- a maintainers update for pvops (not really Xen, but carrying through
this tree just for convenience)* tag 'for-linus-4.14c-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
maintainers: drop Chris Wright from pvops
arm/xen: don't inclide rwlock.h directly.
xen: fix booting ballooned down hvm guest
xen/gntdev: avoid out of bounds access in case of partial gntdev_mmap() -
Pull EFI fixes from Ingo Molnar:
"Two fixes: an ARM fix for KASLR interaction with hibernation, plus an
efi_test crash fix"* 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
efi/libstub/arm: Don't randomize runtime regions when CONFIG_HIBERNATION=y
efi/efi_test: Prevent an Oops in efi_runtime_query_capsulecaps() -
By convention the first 6 bits of F30 Ctrl 2 and 3 are used to signify
GPIOs which are connected to buttons. Additional GPIOs may be used as
input GPIOs to signal the touch controller of some event
(ie disable touchpad). These additional GPIOs may meet the criteria of
a button in rmi_f30_is_valid_button() but should not be considered
buttons. This patch limits the GPIOs which are mapped to buttons to just
the first 6.Signed-off-by: Andrew Duggan
Reported-by: Daniel Martin
Tested-by: Daniel Martin
Acked-By: Benjamin Tissoires
Signed-off-by: Dmitry Torokhov -
parse_hid_report_descriptor() has a while (i < length) loop, which
only guarantees that there's at least 1 byte in the buffer, but the
loop body can read multiple bytes which causes out-of-bounds access.Reported-by: Andrey Konovalov
Reviewed-by: Andrey Konovalov
Cc: stable@vger.kernel.org
Signed-off-by: Dmitry Torokhov
27 Oct, 2017
11 commits
-
Jeff Kirsher says:
====================
Intel Wired LAN Driver Updates 2017-10-26This series contains fixes to e1000, igb, ixgbe and i40e.
Vincenzo Maffione fixes a potential race condition which would result in
the interface being up but transmits are disabled in the hardware.Colin Ian King fixes a possible NULL pointer dereference in e1000, which
was found by Coverity.Jean-Philippe Brucker fixes a possible kernel panic when a driver cannot
map a transmit buffer, which is caused by an erroneous test.Alex provides a fix for ixgbe, which is a partial revert of the commit
ffed21bcee7a ("ixgbe: Don't bother clearing buffer memory for descriptor rings")
because the previous commit messed up the exception handling path by
adding the count back in when we did not need to. Also fixed a typo,
where the transmit ITR setting was being used to determine if we were
using adaptive receive interrupt moderation or not. Lastly, fixed a
memory leak by including programming descriptors in the cleaned count.
====================Signed-off-by: David S. Miller
-
According to DWMAC databook the first queue operating mode
must always be in DCB.As MTL_QUEUE_DCB = 1, we need to always set the first queue
operating mode to DCB otherwise driver will think that queue
is in AVB mode (because MTL_QUEUE_AVB = 0).Signed-off-by: Jose Abreu
Cc: Joao Pinto
Cc: David S. Miller
Cc: Giuseppe Cavallaro
Cc: Alexandre Torgue
Signed-off-by: David S. Miller -
According to DT bindings documentation we are expecting a
property called "snps,read-requests" but we are parsing
instead a property called "read,read-requests".This is clearly a typo. Fix it.
Signed-off-by: Jose Abreu
Cc: Joao Pinto
Cc: David S. Miller
Cc: Giuseppe Cavallaro
Cc: Alexandre Torgue
Signed-off-by: David S. Miller -
Saeed Mahameed says:
====================
Mellanox, mlx5 fixes 2017-10-26The series includes some misc fixes for mlx5 core and etherent driver.
Please pull and let me know if there's any problem.For -Stable:
net/mlx5e: Properly deal with encap flows add/del under neigh update (kernels >= 4.12)
net/mlx5: Fix health work queue spin lock to IRQ safe (kernels >= 4.13)
====================Signed-off-by: David S. Miller
-
One fix for stable:
- fix perf enable/disable ioctls for 32bits (Lionel)
Plus GVT fixes:
- Fix per_ctx_bb check (Zhenyu)
- Fix GPU hang of Linux guest (Xion)
- Refine MMIO_RING_F to check for presence of VCS2 ring (Zhi)* tag 'drm-intel-fixes-2017-10-26' of git://anongit.freedesktop.org/drm/drm-intel:
drm/i915/gvt: Adding ACTHD mmio read handler
drm/i915/gvt: Extract mmio_read_from_hw() common function
drm/i915/gvt: Refine MMIO_RING_F()
drm/i915/gvt: properly check per_ctx bb valid state -
Pull rdma fix from Doug Ledford:
"Fix an oops issue in the new RDMA netlink code"* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma:
RDMA/netlink: OOPs in rdma_nl_rcv_msg() from misinterpreted flag -
When a workload is too heavy to finish it in gpu hang check timer
intervals(1.5), gpu hang check function will check ACTHD register
value to decide whether gpu is real dead or not. On real hw,
ACTHD is updated by HW when workload is running, then host kernel
won't think it is gpu hang. while guest kernel always read a constant
ACTHD value as GVT doesn't supply ACTHD emulate handler, then
guest kernel detects a fake gpu hang.To remove such guest fake gpu hang, this patch supply ACTHD
mmio read handler which read real HW ACTHD register directly.Signed-off-by: Xiong Zhang
Signed-off-by: Zhi Wang
Signed-off-by: Rodrigo Vivi
Link: https://patchwork.freedesktop.org/patch/msgid/b4c9a097-3e62-124e-6856-b0c37764df7b@intel.com -
The mmio read handler for ring timestmap / instdone register are same
as reading hw value directly.Extract it as common function to reduce code duplications.
Signed-off-by: Xiong Zhang
Signed-off-by: Zhi Wang -
Inspect if the host has VCS2 ring by host i915 macro in MMIO_RING_F().
Also this helps on reducing some LOCs.Signed-off-by: Zhi Wang
-
Need to check valid state for per_ctx bb and bypass batch buffer
combine for scan if necessary. Otherwise adding invalid MI batch
buffer start cmd for per_ctx bb will cause scan failure, which is
taken as -EFAULT now so vGPU would be put in failsafe. This trys
to fix that by checking per_ctx bb valid state. Also remove old
invalid WARNING that indirect ctx bb shouldn't depend on valid
per_ctx bb.Signed-off-by: Zhenyu Wang
Signed-off-by: Zhi Wang -
Pull power management fix from Rafael Wysocki:
"This fixes a device power management quality of service (PM QoS)
framework implementation issue causing 'no restriction' requests for
device resume latency, including 'no restriction' set by user space,
to effectively override requests with specific device resume latency
requirements.It is late in the cycle, but the bug in question is in the 'user space
can trigger unexpected behavior' category and the fix is
stable-candidate, so here it goes"* tag 'pm-4.14-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
PM / QoS: Fix device resume latency PM QoS
26 Oct, 2017
18 commits
-
Pull block fixes from Jens Axboe:
"A few select fixes that should go into this series. Mainly for NVMe,
but also a single stable fix for nbd from Josef"* 'for-linus' of git://git.kernel.dk/linux-block:
nbd: handle interrupted sendmsg with a sndtimeo set
nvme-rdma: Fix error status return in tagset allocation failure
nvme-rdma: Fix possible double free in reconnect flow
nvmet: synchronize sqhd update
nvme-fc: retry initial controller connections 3 times
nvme-fc: fix iowait hang -
Pull spi fixes from Mark Brown:
"There are a bunch of device specific fixes (more than I'd like, I've
been lax sending these) plus one important core fix for the conversion
to use an IDR for bus number allocation which avoids issues with
collisions when some but not all of the buses in the system have a
fixed bus number specified.The Armada changes are rather large, specificially "spi: armada-3700:
Fix padding when sending not 4-byte aligned data", but it's a storage
corruption issue and there's things like indentation changes which
make it look bigger than it really is. It's been cooking in -next for
quite a while now and is part of the reason for the delay"* tag 'spi-fix-v4.14-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
spi: fix IDR collision on systems with both fixed and dynamic SPI bus numbers
spi: bcm-qspi: Fix use after free in bcm_qspi_probe() in error path
spi: a3700: Return correct value on timeout detection
spi: uapi: spidev: add missing ioctl header
spi: stm32: Fix logical error in stm32_spi_prepare_mbr()
spi: armada-3700: Fix padding when sending not 4-byte aligned data
spi: armada-3700: Fix failing commands with quad-SPI -
This patch updates the i40e driver to include programming descriptors in
the cleaned_count. Without this change it becomes possible for us to leak
memory as we don't trigger a large enough allocation when the time comes to
allocate new buffers and we end up overwriting a number of rx_buffers equal
to the number of programming descriptors we encountered.Fixes: 0e626ff7ccbf ("i40e: Fix support for flow director programming status")
Signed-off-by: Alexander Duyck
Tested-by: Anders K. Pedersen
Signed-off-by: Jeff Kirsher -
It looks like there was either a copy/paste error or just a typo that
resulted in the Tx ITR setting being used to determine if we were using
adaptive Rx interrupt moderation or not.This patch fixes the typo.
Fixes: 65e87c0398f5 ("i40evf: support queue-specific settings for interrupt moderation")
Signed-off-by: Alexander Duyck
Tested-by: Andrew Bowers
Signed-off-by: Jeff Kirsher -
This patch is a partial revert of "ixgbe: Don't bother clearing buffer
memory for descriptor rings". Specifically I messed up the exception
handling path a bit and this resulted in us incorrectly adding the count
back in when we didn't need to.In order to make this simpler I am reverting most of the exception handling
path change and instead just replacing the bit that was handled by the
unmap_and_free call.Fixes: ffed21bcee7a ("ixgbe: Don't bother clearing buffer memory for descriptor rings")
Signed-off-by: Alexander Duyck
Tested-by: Andrew Bowers
Signed-off-by: Jeff Kirsher -
When the driver cannot map a TX buffer, instead of rolling back
gracefully and retrying later, we currently get a panic:[ 159.885994] igb 0000:00:00.0: TX DMA map failed
[ 159.886588] Unable to handle kernel paging request at virtual address ffff00000a08c7a8
...
[ 159.897031] PC is at igb_xmit_frame_ring+0x9c8/0xcb8Fix the erroneous test that leads to this situation.
Signed-off-by: Jean-Philippe Brucker
Tested-by: Andrew Bowers
Signed-off-by: Jeff Kirsher -
Currently if the stat type is invalid then data[i] is being set
either by dereferencing a null pointer p, or it is reading from
an incorrect previous location if we had a valid stat type
previously. Fix this by skipping over the read of p on an invalid
stat type.Detected by CoverityScan, CID#113385 ("Explicit null dereferenced")
Signed-off-by: Colin Ian King
Reviewed-by: Alexander Duyck
Tested-by: Aaron Brown
Signed-off-by: Jeff Kirsher -
This patch fixes a race condition that can result into the interface being
up and carrier on, but with transmits disabled in the hardware.
The bug may show up by repeatedly IFF_DOWN+IFF_UP the interface, which
allows e1000_watchdog() interleave with e1000_down().CPU x CPU y
--------------------------------------------------------------------
e1000_down():
netif_carrier_off()
e1000_watchdog():
if (carrier == off) {
netif_carrier_on();
enable_hw_transmit();
}
disable_hw_transmit();
e1000_watchdog():
/* carrier on, do nothing */Signed-off-by: Vincenzo Maffione
Tested-by: Aaron Brown
Signed-off-by: Jeff Kirsher -
Commit 96edd61dcf44362d3ef0bed1a5361e0ac7886a63 ("xen/balloon: don't
online new memory initially") introduced a regression when booting a
HVM domain with memory less than mem-max: instead of ballooning down
immediately the system would try to use the memory up to mem-max
resulting in Xen crashing the domain.For HVM domains the current size will be reflected in Xenstore node
memory/static-max instead of memory/target.Additionally we have to trigger the ballooning process at once.
Cc: # 4.13
Fixes: 96edd61dcf44362d3ef0bed1a5361e0ac7886a63 ("xen/balloon: don't
online new memory initially")Reported-by: Simon Gaiser
Suggested-by: Boris Ostrovsky
Signed-off-by: Juergen Gross
Reviewed-by: Boris Ostrovsky
Signed-off-by: Boris Ostrovsky -
Double free of skb_array in tap module is causing kernel panic. When
tap_set_queue() fails we free skb_array right away by calling
skb_array_cleanup(). However, later on skb_array_cleanup() is called
again by tap_sock_destruct through sock_put(). This patch fixes that
issue.Fixes: 362899b8725b35e3 (macvtap: switch to use skb array)
Signed-off-by: Girish Moodalbail
Acked-by: Jason Wang
Signed-off-by: David S. Miller -
…nux/kernel/git/mkl/linux-can
Marc Kleine-Budde says:
====================
pull-request: can 2017-10-24here's another pull request for net/master.
The patch by Gerhard Bertelsmann fixes the CAN_CTRLMODE_LOOPBACK in the
sun4i driver. Two patches by Jimmy Assarsson for the kvaser_usb driver
fix a print in the error path of the kvaser_usb_close() and remove a
wrong warning message with the Leaf v2 firmware version v4.1.844.
====================Signed-off-by: David S. Miller <davem@davemloft.net>
-
This patch replaces GFP_KERNEL by GFP_ATOMIC to avoid sleeping in the
ndo_set_rx_mode() call which is called with BH disabled.Fixes: 3f518509dedc ("ethernet: Add new driver for Marvell Armada 375 network unit")
Signed-off-by: Antoine Tenart
Signed-off-by: David S. Miller -
When calling mvpp2_prs_mac_multi_set() from mvpp2_prs_mac_init(), two
parameters (the port index and the table index) are inverted. Fixes
this.Fixes: 3f518509dedc ("ethernet: Add new driver for Marvell Armada 375 network unit")
Signed-off-by: Antoine Tenart
Signed-off-by: David S. Miller -
This patch fixes a typo in the mvpp2_prs_tcam_data_cmp() function, as
the shift value is inverted with the data.Fixes: 3f518509dedc ("ethernet: Add new driver for Marvell Armada 375 network unit")
Signed-off-by: Antoine Tenart
Signed-off-by: David S. Miller -
Previously, tc with ets type and zero bandwidth is not accepted
by driver. This behavior does not follow the IEEE802.1qaz spec.If there are tcs with ets type and zero bandwidth, these tcs are
assigned to the lowest priority tc_group #0. We equally distribute
100% bw of the tc_group #0 to these zero bandwidth ets tcs.
Also, the non zero bandwidth ets tcs are assigned to tc_group #1.If there is no zero bandwidth ets tc, the non zero bandwidth ets tcs
are assigned to tc_group #0.Fixes: cdcf11212b22 ("net/mlx5e: Validate BW weight values of ETS")
Signed-off-by: Huy Nguyen
Reviewed-by: Parav Pandit
Signed-off-by: Saeed Mahameed -
Currently, the encap action offload is handled in the actions parse
function and not in mlx5e_tc_add_fdb_flow() where we deal with all
the other aspects of offloading actions (vlan, modify header) and
the rule itself.When the neigh update code (mlx5e_tc_encap_flows_add()) recreates the
encap entry and offloads the related flows, we wrongly call again into
mlx5e_tc_add_fdb_flow(), this for itself would cause us to handle
again the offloading of vlans and header re-write which puts things
in non consistent state and step on freed memory (e.g the modify
header parse buffer which is already freed).Since on error, mlx5e_tc_add_fdb_flow() detaches and may release the
encap entry, it causes a corruption at the neigh update code which goes
over the list of flows associated with this encap entry, or double free
when the tc flow is later deleted by user-space.When neigh update (mlx5e_tc_encap_flows_del()) unoffloads the flows related
to an encap entry which is now invalid, we do a partial repeat of the eswitch
flow removal code which is wrong too.To fix things up we do the following:
(1) handle the encap action offload in the eswitch flow add function
mlx5e_tc_add_fdb_flow() as done for the other actions and the rule itself.(2) modify the neigh update code (mlx5e_tc_encap_flows_add/del) to only
deal with the encap entry and rules delete/add and not with any of
the other offloaded actions.Fixes: 232c001398ae ('net/mlx5e: Add support to neighbour update flow')
Signed-off-by: Or Gerlitz
Reviewed-by: Paul Blakey
Signed-off-by: Saeed Mahameed -
mlx5_ib_add is called during mlx5_pci_resume after a pci error.
Before mlx5_ib_add completes, there are multiple events which trigger
function mlx5_ib_event. This cause kernel panic because mlx5_ib_event
accesses unitialized resources.The fix is to extend Erez Shitrit's patch
("net/mlx5: Delay events till ib registration ends") to cover
the pci resume code path.Trace:
mlx5_core 0001:01:00.6: mlx5_pci_resume was called
mlx5_core 0001:01:00.6: firmware version: 16.20.1011
mlx5_core 0001:01:00.6: mlx5_attach_interface:164:(pid 779):
mlx5_ib_event:2996:(pid 34777): warning: event on port 1
mlx5_ib_event:2996:(pid 34782): warning: event on port 1
Unable to handle kernel paging request for data at address 0x0001c104
Faulting instruction address: 0xd000000008f411fc
Oops: Kernel access of bad area, sig: 11 [#1]
...
...
Call Trace:
[c000000fff77bb70] [d000000008f4119c] mlx5_ib_event+0x64/0x470 [mlx5_ib] (unreliable)
[c000000fff77bc60] [d000000008e67130] mlx5_core_event+0xb8/0x210 [mlx5_core]
[c000000fff77bd10] [d000000008e4bd00] mlx5_eq_int+0x528/0x860[mlx5_core]Fixes: 97834eba7c19 ("net/mlx5: Delay events till ib registration ends")
Signed-off-by: Huy Nguyen
Reviewed-by: Saeed Mahameed
Signed-off-by: Saeed Mahameed -
spin_lock/unlock of health->wq_lock should be IRQ safe.
It was changed to spin_lock_irqsave since adding commit 0179720d6be2
("net/mlx5: Introduce trigger_health_work function") which uses
spin_lock from asynchronous event (IRQ) context.
Thus, all spin_lock/unlock of health->wq_lock should have been moved
to IRQ safe mode.
However, one occurrence on new code using this lock missed that
change, resulting in possible deadlock:
kernel: Possible unsafe locking scenario:
kernel: CPU0
kernel: ----
kernel: lock(&(&health->wq_lock)->rlock);
kernel:
kernel: lock(&(&health->wq_lock)->rlock);
kernel: #012 *** DEADLOCK ***Fixes: 2a0165a034ac ("net/mlx5: Cancel delayed recovery work when unloading the driver")
Signed-off-by: Moshe Shemesh
Signed-off-by: Saeed Mahameed