05 Jul, 2015
5 commits
-
Pull more vfs updates from Al Viro:
"Assorted VFS fixes and related cleanups (IMO the most interesting in
that part are f_path-related things and Eric's descriptor-related
stuff). UFS regression fixes (it got broken last cycle). 9P fixes.
fs-cache series, DAX patches, Jan's file_remove_suid() work"[ I'd say this is much more than "fixes and related cleanups". The
file_table locking rule change by Eric Dumazet is a rather big and
fundamental update even if the patch isn't huge. - Linus ]* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (49 commits)
9p: cope with bogus responses from server in p9_client_{read,write}
p9_client_write(): avoid double p9_free_req()
9p: forgetting to cancel request on interrupted zero-copy RPC
dax: bdev_direct_access() may sleep
block: Add support for DAX reads/writes to block devices
dax: Use copy_from_iter_nocache
dax: Add block size note to documentation
fs/file.c: __fget() and dup2() atomicity rules
fs/file.c: don't acquire files->file_lock in fd_install()
fs:super:get_anon_bdev: fix race condition could cause dev exceed its upper limitation
vfs: avoid creation of inode number 0 in get_next_ino
namei: make set_root_rcu() return void
make simple_positive() public
ufs: use dir_pages instead of ufs_dir_pages()
pagemap.h: move dir_pages() over there
remove the pointless include of lglock.h
fs: cleanup slight list_entry abuse
xfs: Correctly lock inode when removing suid and file capabilities
fs: Call security_ops->inode_killpriv on truncate
fs: Provide function telling whether file_remove_privs() will do anything
... -
Commit 835a6a2f8603 ("Bluetooth: Stop sabotaging list poisoning")
thought that the code was sabotaging the list poisoning when NULL'ing
out the list pointers and removed it.But what was going on was that the bluetooth code was using NULL
pointers for the list as a way to mark it empty, and that commit just
broke it (and replaced the test with NULL with a "list_empty()" test on
a uninitialized list instead, breaking things even further).So fix it all up to use the regular and real list_empty() handling
(which does not use NULL, but a pointer to itself), also making sure to
initialize the list properly (the previous NULL case was initialized
implicitly by the session being allocated with kzalloc())This is a combination of patches by Marcel Holtmann and Tedd Ho-Jeong
An.[ I would normally expect to get this through the bt tree, but I'm going
to release -rc1, so I'm just committing this directly - Linus ]Reported-and-tested-by: Jörg Otte
Cc: Alexey Dobriyan
Original-by: Tedd Ho-Jeong An
Original-by: Marcel Holtmann :
Signed-off-by: Linus Torvalds -
if server claims to have written/read more than we'd told it to,
warn and cap the claimed byte count to avoid advancing more than
we are ready to. -
Braino in "9p: switch p9_client_write() to passing it struct iov_iter *";
if response is impossible to parse and we discard the request, get the
out of the loop right there.Cc: stable@vger.kernel.org
Signed-off-by: Al Viro -
If we'd already sent a request and decide to abort it, we *must*
issue TFLUSH properly and not just blindly reuse the tag, or
we'll get seriously screwed when response eventually arrives
and we confuse it for response to later request that had reused
the same tag.Cc: stable@vger.kernel.org # v3.2 and later
Signed-off-by: Al Viro
03 Jul, 2015
3 commits
-
Pull Ceph updates from Sage Weil:
"We have a pile of bug fixes from Ilya, including a few patches that
sync up the CRUSH code with the latest from userspace.There is also a long series from Zheng that fixes various issues with
snapshots, inline data, and directory fsync, some simplification and
improvement in the cap release code, and a rework of the caching of
directory contents.To top it off there are a few small fixes and cleanups from Benoit and
Hong"* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (40 commits)
rbd: use GFP_NOIO in rbd_obj_request_create()
crush: fix a bug in tree bucket decode
libceph: Fix ceph_tcp_sendpage()'s more boolean usage
libceph: Remove spurious kunmap() of the zero page
rbd: queue_depth map option
rbd: store rbd_options in rbd_device
rbd: terminate rbd_opts_tokens with Opt_err
ceph: fix ceph_writepages_start()
rbd: bump queue_max_segments
ceph: rework dcache readdir
crush: sync up with userspace
crush: fix crash from invalid 'take' argument
ceph: switch some GFP_NOFS memory allocation to GFP_KERNEL
ceph: pre-allocate data structure that tracks caps flushing
ceph: re-send flushing caps (which are revoked) in reconnect stage
ceph: send TID of the oldest pending caps flush to MDS
ceph: track pending caps flushing globally
ceph: track pending caps flushing accurately
libceph: fix wrong name "Ceph filesystem for Linux"
ceph: fix directory fsync
... -
Pull NFS client updates from Trond Myklebust:
"Highlights include:Stable patches:
- Fix a crash in the NFSv4 file locking code.
- Fix an fsync() regression, where we were failing to retry I/O in
some circumstances.
- Fix an infinite loop in NFSv4.0 OPEN stateid recovery
- Fix a memory leak when an attempted pnfs fails.
- Fix a memory leak in the backchannel code
- Large hostnames were not supported correctly in NFSv4.1
- Fix a pNFS/flexfiles bug that was impeding error reporting on I/O.
- Fix a couple of credential issues in pNFS/flexfilesBugfixes + cleanups:
- Open flag sanity checks in the NFSv4 atomic open codepath
- More NFSv4 delegation related bugfixes
- Various NFSv4.1 backchannel bugfixes and cleanups
- Fix the NFS swap socket code
- Various cleanups of the NFSv4 SETCLIENTID and EXCHANGE_ID code
- Fix a UDP transport deadlock issueFeatures:
- More RDMA client transport improvements
- NFSv4.2 LAYOUTSTATS functionality for pnfs flexfiles"* tag 'nfs-for-4.2-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (87 commits)
nfs: Remove invalid tk_pid from debug message
nfs: Remove invalid NFS_ATTR_FATTR_V4_REFERRAL checking in nfs4_get_rootfh
nfs: Drop bad comment in nfs41_walk_client_list()
nfs: Remove unneeded micro checking of CONFIG_PROC_FS
nfs: Don't setting FILE_CREATED flags always
nfs: Use remove_proc_subtree() instead remove_proc_entry()
nfs: Remove unused argument in nfs_server_set_fsinfo()
nfs: Fix a memory leak when meeting an unsupported state protect
nfs: take extra reference to fl->fl_file when running a LOCKU operation
NFSv4: When returning a delegation, don't reclaim an incompatible open mode.
NFSv4.2: LAYOUTSTATS is optional to implement
NFSv4.2: Fix up a decoding error in layoutstats
pNFS/flexfiles: Fix the reset of struct pgio_header when resending
pNFS/flexfiles: Turn off layoutcommit for servers that don't need it
pnfs/flexfiles: protect ktime manipulation with mirror lock
nfs: provide pnfs_report_layoutstat when NFS42 is disabled
nfs: verify open flags before allowing open
nfs: always update creds in mirror, even when we have an already connected ds
nfs: fix potential credential leak in ff_layout_update_mirror_cred
pnfs/flexfiles: report layoutstat regularly
... -
…scm/linux/kernel/git/paulg/linux
Pull module_init replacement part two from Paul Gortmaker:
"Replace module_init with appropriate alternate initcall in non
modules.This series converts non-modular code that is using the module_init()
call to hook itself into the system to instead use one of our
alternate priority initcalls.Unlike the previous series that used device_initcall and hence was a
runtime no-op, these commits change to one of the alternate initcalls,
because (a) we have them and (b) it seems like the right thing to do.For example, it would seem logical to use arch_initcall for arch
specific setup code and fs_initcall for filesystem setup code.This does mean however, that changes in the init ordering will be
taking place, and so there is a small risk that some kind of implicit
init ordering issue may lie uncovered. But I think it is still better
to give these ones sensible priorities than to just assign them all to
device_initcall in order to exactly preserve the old ordering.Thad said, we have already made similar changes in core kernel code in
commit c96d6660dc65 ("kernel: audit/fix non-modular users of
module_init in core code") without any regressions reported, so this
type of change isn't without precedent. It has also got the same
local testing and linux-next coverage as all the other pull requests
that I'm sending for this merge window have got.Once again, there is an unused module_exit function removal that shows
up as an outlier upon casual inspection of the diffstat"* tag 'module_init-alternate_initcall-v4.1-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux:
x86: perf_event_intel_pt.c: use arch_initcall to hook in enabling
x86: perf_event_intel_bts.c: use arch_initcall to hook in enabling
mm/page_owner.c: use late_initcall to hook in enabling
lib/list_sort: use late_initcall to hook in self tests
arm: use subsys_initcall in non-modular pl320 IPC code
powerpc: don't use module_init for non-modular core hugetlb code
powerpc: use subsys_initcall for Freescale Local Bus
x86: don't use module_init for non-modular core bootflag code
netfilter: don't use module_init/exit in core IPV4 code
fs/notify: don't use module_init for non-modular inotify_user code
mm: replace module_init usages with subsys_initcall in nommu.c
02 Jul, 2015
2 commits
-
Pull networking fixes from David Miller:
1) mlx4 driver bug fixes (TX queue wakeups, csum complete indications)
from Ido Shamay, Eran Ben Elisha, and Or Gerlitz.2) Missing unlock in error path of PTP support in renesas driver, from
Dan Carpenter.3) Add Vitesse 8641 phy IDs to vitesse PHY driver, from Shaohui Xie.
4) Bnx2x driver bug fixes (linearization of encap packets, scratchpad
parity error notifications, flow-control and speed settings) from
Yuval Mintz, Manish Chopra, Shahed Shaikh, and Ariel Elior.5) ipv6 extension header parsing in the igb chip has a HW errata,
disable it. Frm Todd Fujinaka.6) Fix PCI link state locking issue in e1000e driver, from Yanir
Lubetkin.7) Cure panics during MTU change in i40e, from Mitch Williams.
8) Don't leak promisc refs in DSA slave driver, from Gilad Ben-Yossef.
9) Add missing HAS_DMA dep to VIA Rhine driver, from Geery
Uytterhoeven.10) Make sure DMA map/unmap calls are symmetric in bnx2x driver, from
Michal Schmidt.11) Workaround for MDIO access problems in bcm7xxx devices, from FLorian
Fainelli.12) Fix races in SCTP protocol between OTTB responses and route
removals, from Alexander Sverdlin.13) Fix jumbo frame checksum issue with some mvneta devices, from Simon
Guinot.* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (58 commits)
sock_diag: don't broadcast kernel sockets
net: mvneta: disable IP checksum with jumbo frames for Armada 370
ARM: mvebu: update Ethernet compatible string for Armada XP
net: mvneta: introduce compatible string "marvell, armada-xp-neta"
api: fix compatibility of linux/in.h with netinet/in.h
net: icplus: fix typo in constant name
sis900: Trivial: Fix typos in enums
stmmac: Trivial: fix typo in constant name
sctp: Fix race between OOTB responce and route removal
net-Liquidio: Delete unnecessary checks before the function call "vfree"
vmxnet3: Bump up driver version number
amd-xgbe: Add the __GFP_NOWARN flag to Rx buffer allocation
net: phy: mdio-bcm-unimac: workaround initial read failures for integrated PHYs
net: bcmgenet: workaround initial read failures for integrated PHYs
net: phy: bcm7xxx: workaround MDIO management controller initial read
bnx2x: fix DMA API usage
net: via: VIA_RHINE and VIA_VELOCITY should depend on HAS_DMA
net/phy: tune get_phy_c45_ids to support more c45 phy
bnx2x: fix lockdep splat
net: fec: don't access RACC register when not available
... -
Pull module updates from Rusty Russell:
"Main excitement here is Peter Zijlstra's lockless rbtree optimization
to speed module address lookup. He found some abusers of the module
lock doing that too.A little bit of parameter work here too; including Dan Streetman's
breaking up the big param mutex so writing a parameter can load
another module (yeah, really). Unfortunately that broke the usual
suspects, !CONFIG_MODULES and !CONFIG_SYSFS, so those fixes were
appended too"* tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux: (26 commits)
modules: only use mod->param_lock if CONFIG_MODULES
param: fix module param locks when !CONFIG_SYSFS.
rcu: merge fix for Convert ACCESS_ONCE() to READ_ONCE() and WRITE_ONCE()
module: add per-module param_lock
module: make perm const
params: suppress unused variable error, warn once just in case code changes.
modules: clarify CONFIG_MODULE_COMPRESS help, suggest 'N'.
kernel/module.c: avoid ifdefs for sig_enforce declaration
kernel/workqueue.c: remove ifdefs over wq_power_efficient
kernel/params.c: export param_ops_bool_enable_only
kernel/params.c: generalize bool_enable_only
kernel/module.c: use generic module param operaters for sig_enforce
kernel/params: constify struct kernel_param_ops uses
sysfs: tightened sysfs permission checks
module: Rework module_addr_{min,max}
module: Use __module_address() for module_address_lookup()
module: Make the mod_tree stuff conditional on PERF_EVENTS || TRACING
module: Optimize __module_address() using a latched RB-tree
rbtree: Implement generic latch_tree
seqlock: Introduce raw_read_seqcount_latch()
...
01 Jul, 2015
2 commits
-
struct crush_bucket_tree::num_nodes is u8, so ceph_decode_8_safe()
should be used. -Wconversion catches this, but I guess it went
unnoticed in all the noise it spews. The actual problem (at least for
common crushmaps) isn't the u32 -> u8 truncation though - it's the
advancement by 4 bytes instead of 1 in the crushmap buffer.Fixes: http://tracker.ceph.com/issues/2759
Cc: stable@vger.kernel.org
Signed-off-by: Ilya Dryomov
Reviewed-by: Josh Durgin -
Kernel sockets do not hold a reference for the network namespace to
which they point. Socket destruction broadcasting relies on the
network namespace and will cause the splat below when a kernel socket
is destroyed.This fix simply ignores kernel sockets when they are destroyed.
Reported as:
general protection fault: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
CPU: 1 PID: 9130 Comm: kworker/1:1 Not tainted 4.1.0-gelk-debug+ #1
Workqueue: sock_diag_events sock_diag_broadcast_destroy_work
Stack:
ffff8800b9c586c0 ffff8800b9c586c0 ffff8800ac4692c0 ffff8800936d4a90
ffff8800352efd38 ffffffff8469a93e ffff8800352efd98 ffffffffc09b9b90
ffff8800352efd78 ffff8800ac4692c0 ffff8800b9c586c0 ffff8800831b6ab8
Call Trace:
[] ? mutex_unlock+0xe/0x10
[] ? inet_diag_handler_get_info+0x110/0x1fb [inet_diag]
[] netlink_broadcast+0x1d/0x20
[] ? mutex_unlock+0xe/0x10
[] sock_diag_broadcast_destroy_work+0xd5/0x160
[] process_one_work+0x147/0x420
[] worker_thread+0x69/0x470
[] ? preempt_count_sub+0xa3/0xf0
[] ? rescuer_thread+0x320/0x320
[] kthread+0x107/0x120
[] ? kthread_create_on_node+0x1b0/0x1b0
[] ret_from_fork+0x3f/0x70
[] ? kthread_create_on_node+0x1b0/0x1b0Tested:
Using a debug kernel while 'ss -E' is running:
ip netns add test-ns
ip netns delete test-nsFixes: eb4cb008529c sock_diag: define destruction multicast groups
Fixes: 26abe14379f8 net: Modify sk_alloc to not reference count the
netns of kernel sockets.
Reported-by: Dave Jones
Suggested-by: Eric DumazetSigned-off-by: Craig Gallek
Signed-off-by: David S. Miller
30 Jun, 2015
2 commits
-
From struct ceph_msg_data_cursor in include/linux/ceph/messenger.h:
bool last_piece; /* current is last piece */
In ceph_msg_data_next():
*last_piece = cursor->last_piece;
A call to ceph_msg_data_next() is followed by:
ret = ceph_tcp_sendpage(con->sock, page, page_offset,
length, last_piece);while ceph_tcp_sendpage() is:
static int ceph_tcp_sendpage(struct socket *sock, struct page *page,
int offset, size_t size, bool more)The logic is inverted: correct it.
Signed-off-by: Benoît Canet
Reviewed-by: Alex Elder
Signed-off-by: Ilya Dryomov -
There is NULL pointer dereference possible during statistics update if the route
used for OOTB responce is removed at unfortunate time. If the route exists when
we receive OOTB packet and we finally jump into sctp_packet_transmit() to send
ABORT, but in the meantime route is removed under our feet, we take "no_route"
path and try to update stats with IP_INC_STATS(sock_net(asoc->base.sk), ...).But sctp_ootb_pkt_new() used to prepare responce packet doesn't call
sctp_transport_set_owner() and therefore there is no asoc associated with this
packet. Probably temporary asoc just for OOTB responces is overkill, so just
introduce a check like in all other places in sctp_packet_transmit(), where
"asoc" is dereferenced.To reproduce this, one needs to
0. ensure that sctp module is loaded (otherwise ABORT is not generated)
1. remove default route on the machine
2. while true; do
ip route del [interface-specific route]
ip route add [interface-specific route]
done
3. send enough OOTB packets (i.e. HB REQs) from another host to trigger ABORT
responceOn x86_64 the crash looks like this:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
IP: [] sctp_packet_transmit+0x63c/0x730 [sctp]
PGD 0
Oops: 0000 [#1] PREEMPT SMP
Modules linked in: ...
CPU: 0 PID: 0 Comm: swapper/0 Tainted: G O 4.0.5-1-ARCH #1
Hardware name: ...
task: ffffffff818124c0 ti: ffffffff81800000 task.ti: ffffffff81800000
RIP: 0010:[] [] sctp_packet_transmit+0x63c/0x730 [sctp]
RSP: 0018:ffff880127c037b8 EFLAGS: 00010296
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 00000015ff66b480
RDX: 00000015ff66b400 RSI: ffff880127c17200 RDI: ffff880123403700
RBP: ffff880127c03888 R08: 0000000000017200 R09: ffffffff814625af
R10: ffffea00047e4680 R11: 00000000ffffff80 R12: ffff8800b0d38a28
R13: ffff8800b0d38a28 R14: ffff8800b3e88000 R15: ffffffffa05f24e0
FS: 0000000000000000(0000) GS:ffff880127c00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000020 CR3: 00000000c855b000 CR4: 00000000000007f0
Stack:
ffff880127c03910 ffff8800b0d38a28 ffffffff8189d240 ffff88011f91b400
ffff880127c03828 ffffffffa05c94c5 0000000000000000 ffff8800baa1c520
0000000000000000 0000000000000001 0000000000000000 0000000000000000
Call Trace:
[] ? sctp_sf_tabort_8_4_8.isra.20+0x85/0x140 [sctp]
[] ? sctp_transport_put+0x52/0x80 [sctp]
[] sctp_do_sm+0xb8c/0x19a0 [sctp]
[] ? trigger_load_balance+0x90/0x210
[] ? update_process_times+0x59/0x60
[] ? timerqueue_add+0x60/0xb0
[] ? enqueue_hrtimer+0x29/0xa0
[] ? read_tsc+0x9/0x10
[] ? put_page+0x55/0x60
[] ? clockevents_program_event+0x6d/0x100
[] ? skb_free_head+0x58/0x80
[] ? chksum_update+0x1b/0x27 [crc32c_generic]
[] ? crypto_shash_update+0xce/0xf0
[] sctp_endpoint_bh_rcv+0x113/0x280 [sctp]
[] sctp_inq_push+0x46/0x60 [sctp]
[] sctp_rcv+0x880/0x910 [sctp]
[] ? sctp_packet_transmit_chunk+0xb0/0xb0 [sctp]
[] ? sctp_csum_update+0x20/0x20 [sctp]
[] ? ip_route_input_noref+0x235/0xd30
[] ? ack_ioapic_level+0x7b/0x150
[] ip_local_deliver_finish+0xae/0x210
[] ip_local_deliver+0x35/0x90
[] ip_rcv_finish+0xf5/0x370
[] ip_rcv+0x2b8/0x3a0
[] __netif_receive_skb_core+0x763/0xa50
[] __netif_receive_skb+0x18/0x60
[] netif_receive_skb_internal+0x40/0xd0
[] napi_gro_receive+0xe8/0x120
[] rtl8169_poll+0x2da/0x660 [r8169]
[] net_rx_action+0x21a/0x360
[] __do_softirq+0xe1/0x2d0
[] irq_exit+0xad/0xb0
[] do_IRQ+0x58/0xf0
[] common_interrupt+0x6d/0x6d
[] ? hrtimer_start+0x18/0x20
[] ? sctp_transport_destroy_rcu+0x29/0x30 [sctp]
[] ? mwait_idle+0x60/0xa0
[] arch_cpu_idle+0xf/0x20
[] cpu_startup_entry+0x3ec/0x480
[] rest_init+0x85/0x90
[] start_kernel+0x48b/0x4ac
[] ? early_idt_handlers+0x120/0x120
[] x86_64_start_reservations+0x2a/0x2c
[] x86_64_start_kernel+0x161/0x184
Code: 90 48 8b 80 b8 00 00 00 48 89 85 70 ff ff ff 48 83 bd 70 ff ff ff 00 0f 85 cd fa ff ff 48 89 df 31 db e8 18 63 e7 e0 48 8b 45 80 8b 40 20 48 8b 40 30 48 8b 80 68 01 00 00 65 48 ff 40 78 e9
RIP [] sctp_packet_transmit+0x63c/0x730 [sctp]
RSP
CR2: 0000000000000020
---[ end trace 5aec7fd2dc983574 ]---
Kernel panic - not syncing: Fatal exception in interrupt
Kernel Offset: 0x0 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffff9fffffff)
drm_kms_helper: panic occurred, switching back to text console
---[ end Kernel panic - not syncing: Fatal exception in interruptSigned-off-by: Alexander Sverdlin
Acked-by: Neil Horman
Acked-by: Marcelo Ricardo Leitner
Acked-by: Vlad Yasevich
Signed-off-by: David S. Miller
29 Jun, 2015
6 commits
-
DSA master netdev promiscuity counter was not being properly
decremented on slave device open error path.Signed-off-by: Gilad Ben-Yossef
CC: Gilad Ben-Yossef
CC: David S. Miller
CC: Florian Fainelli
CC: Guenter Roeck
CC: Andrew Lunn
CC: Scott Feldman
Acked-by: Andrew Lunn
Acked-by: Florian Fainelli
Signed-off-by: David S. Miller -
No more users, so it can now be removed.
Signed-off-by: David S. Miller
-
Just make a ax25_sock structure that provides the ax25_cb pointer.
Signed-off-by: David S. Miller
-
net/core/flow_dissector.c: In function ‘__skb_flow_dissect’:
net/core/flow_dissector.c:132: warning: ‘ip_proto’ may be used uninitialized in this functionSigned-off-by: Geert Uytterhoeven
Acked-by: Tom Herbert
Signed-off-by: David S. Miller -
The following lockdep splat was seen due to the wrong context for
grabbing in_dev.===============================
[ INFO: suspicious RCU usage. ]
4.1.0-next-20150626-dbg-00020-g54a6d91-dirty #244 Not tainted
-------------------------------
include/linux/inetdevice.h:205 suspicious rcu_dereference_check() usage!other info that might help us debug this:
rcu_scheduler_active = 1, debug_locks = 0
2 locks held by ip/403:
#0: (rtnl_mutex){+.+.+.}, at: [] rtnl_lock+0x17/0x19
#1: ((inetaddr_chain).rwsem){.+.+.+}, at: [] __blocking_notifier_call_chain+0x35/0x6astack backtrace:
CPU: 2 PID: 403 Comm: ip Not tainted 4.1.0-next-20150626-dbg-00020-g54a6d91-dirty #244
0000000000000001 ffff8800b189b728 ffffffff8150a542 ffffffff8107a8b3
ffff880037bbea40 ffff8800b189b758 ffffffff8107cb74 ffff8800379dbd00
ffff8800bec85800 ffff8800bf9e13c0 00000000000000ff ffff8800b189b7d8
Call Trace:
[] dump_stack+0x4c/0x6e
[] ? up+0x39/0x3e
[] lockdep_rcu_suspicious+0xf7/0x100
[] fib_dump_info+0x227/0x3e2
[] rtmsg_fib+0xa6/0x116
[] fib_table_insert+0x316/0x355
[] fib_magic+0xb7/0xc7
[] fib_add_ifaddr+0xb1/0x13b
[] fib_inetaddr_event+0x36/0x90
[] notifier_call_chain+0x4c/0x71
[] __blocking_notifier_call_chain+0x4e/0x6a
[] blocking_notifier_call_chain+0x14/0x16
[] __inet_insert_ifa+0x1a5/0x1b3
[] inet_rtm_newaddr+0x350/0x35f
[] rtnetlink_rcv_msg+0x17b/0x18a
[] ? trace_hardirqs_on+0xd/0xf
[] ? netlink_deliver_tap+0x1cb/0x1f7
[] ? rtnl_newlink+0x72a/0x72a
...This patch resolves that splat.
Signed-off-by: Andy Gospodarek
Reported-by: Sergey Senozhatsky
Signed-off-by: David S. Miller -
In commit 1f66d161ab3d8b518903fa6c3f9c1f48d6919e74
("tipc: introduce starvation free send algorithm")
we introduced a counter per priority level for buffers
in the link backlog queue. We also introduced a new
function tipc_link_purge_backlog(), to reset these
counters to zero when the link is reset.Unfortunately, we missed to call this function when
the broadcast link is reset, with the result that the
values of these counters might be permanently skewed
when new nodes are attached. This may in the worst case
lead to permananent, but spurious, broadcast link
congestion, where no broadcast packets can be sent at
all.We fix this bug with this commit.
Signed-off-by: Jon Maloy
Signed-off-by: David S. Miller
28 Jun, 2015
1 commit
-
Pull nfsd updates from Bruce Fields:
"A relatively quiet cycle, with a mix of cleanup and smaller bugfixes"* 'for-4.2' of git://linux-nfs.org/~bfields/linux: (24 commits)
sunrpc: use sg_init_one() in krb5_rc4_setup_enc/seq_key()
nfsd: wrap too long lines in nfsd4_encode_read
nfsd: fput rd_file from XDR encode context
nfsd: take struct file setup fully into nfs4_preprocess_stateid_op
nfsd: refactor nfs4_preprocess_stateid_op
nfsd: clean up raparams handling
nfsd: use swap() in sort_pacl_range()
rpcrdma: Merge svcrdma and xprtrdma modules into one
svcrdma: Add a separate "max data segs macro for svcrdma
svcrdma: Replace GFP_KERNEL in a loop with GFP_NOFAIL
svcrdma: Keep rpcrdma_msg fields in network byte-order
svcrdma: Fix byte-swapping in svc_rdma_sendto.c
nfsd: Update callback sequnce id only CB_SEQUENCE success
nfsd: Reset cb_status in nfsd4_cb_prepare() at retrying
svcrdma: Remove svc_rdma_xdr_decode_deferred_req()
SUNRPC: Move EXPORT_SYMBOL for svc_process
uapi/nfs: Add NFSv4.1 ACL definitions
nfsd: Remove dead declarations
nfsd: work around a gcc-5.1 warning
nfsd: Checking for acl support does not require fetching any acls
...
25 Jun, 2015
11 commits
-
ceph_tcp_sendpage already does the work of mapping/unmapping
the zero page if needed.Signed-off-by: Benoît Canet
Reviewed-by: Alex Elder
Signed-off-by: Ilya Dryomov -
Fix typo in the validation rules for flower's attributes
Signed-off-by: Jamal Hadi Salim
Signed-off-by: David S. Miller -
.. up to ceph.git commit 1db1abc8328d ("crush: eliminate ad hoc diff
between kernel and userspace"). This fixes a bunch of recently pulled
coding style issues and makes includes a bit cleaner.A patch "crush:Make the function crush_ln static" from Nicholas Krause
is folded in as crush_ln() has been made static
in userspace as well.Signed-off-by: Ilya Dryomov
-
Verify that the 'take' argument is a valid device or bucket.
Otherwise ignore it (do not add the value to the working vector).Reflects ceph.git commit 9324d0a1af61e1c234cc48e2175b4e6320fff8f4.
Signed-off-by: Ilya Dryomov
-
modinfo libceph prints the module name "Ceph filesystem for Linux",
which is same as the real fs module ceph. It's confusing.Signed-off-by: Hong Zhiguo
Signed-off-by: Ilya Dryomov -
- return -ETIMEDOUT instead of -EIO in case of timeout
- wait_event_interruptible_timeout() returns time left until timeout
and since it can be almost LONG_MAX we had better assign it to longSigned-off-by: Ilya Dryomov
Reviewed-by: Alex Elder -
There are currently three libceph-level timeouts that the user can
specify on mount: mount_timeout, osd_idle_ttl and osdkeepalive. All of
these are in seconds and no checking is done on user input: negative
values are accepted, we multiply them all by HZ which may or may not
overflow, arbitrarily large jiffies then get added together, etc.There is also a bug in the way mount_timeout=0 is handled. It's
supposed to mean "infinite timeout", but that's not how wait.h APIs
treat it and so __ceph_open_session() for example will busy loop
without much chance of being interrupted if none of ceph-mons are
there.Fix all this by verifying user input, storing timeouts capped by
msecs_to_jiffies() in jiffies and using the new ceph_timeout_jiffies()
helper for all user-specified waits to handle infinite timeouts
correctly.Signed-off-by: Ilya Dryomov
Reviewed-by: Alex Elder -
This one sneaked in through vfs tree with commit 2b777c9dd9eb
("ceph_sync_read: stop poking into iov_iter guts").Signed-off-by: Ilya Dryomov
-
Signed-off-by: Yan, Zheng
Reviewed-by: Alex Elder -
Signed-off-by: Yan, Zheng
Reviewed-by: Alex Elder -
Pull networking updates from David Miller:
1) Add TX fast path in mac80211, from Johannes Berg.
2) Add TSO/GRO support to ibmveth, from Thomas Falcon
3) Move away from cached routes in ipv6, just like ipv4, from Martin
KaFai Lau.4) Lots of new rhashtable tests, from Thomas Graf.
5) Run ingress qdisc lockless, from Alexei Starovoitov.
6) Allow servers to fetch TCP packet headers for SYN packets of new
connections, for fingerprinting. From Eric Dumazet.7) Add mode parameter to pktgen, for testing receive. From Alexei
Starovoitov.8) Cache access optimizations via simplifications of build_skb(), from
Alexander Duyck.9) Move page frag allocator under mm/, also from Alexander.
10) Add xmit_more support to hv_netvsc, from KY Srinivasan.
11) Add a counter guard in case we try to perform endless reclassify
loops in the packet scheduler.12) Extern flow dissector to be programmable and use it in new "Flower"
classifier. From Jiri Pirko.13) AF_PACKET fanout rollover fixes, performance improvements, and new
statistics. From Willem de Bruijn.14) Add netdev driver for GENEVE tunnels, from John W Linville.
15) Add ingress netfilter hooks and filtering, from Pablo Neira Ayuso.
16) Fix handling of epoll edge triggers in TCP, from Eric Dumazet.
17) Add an ECN retry fallback for the initial TCP handshake, from Daniel
Borkmann.18) Add tail call support to BPF, from Alexei Starovoitov.
19) Add several pktgen helper scripts, from Jesper Dangaard Brouer.
20) Add zerocopy support to AF_UNIX, from Hannes Frederic Sowa.
21) Favor even port numbers for allocation to connect() requests, and
odd port numbers for bind(0), in an effort to help avoid
ip_local_port_range exhaustion. From Eric Dumazet.22) Add Cavium ThunderX driver, from Sunil Goutham.
23) Allow bpf programs to access skb_iif and dev->ifindex SKB metadata,
from Alexei Starovoitov.24) Add support for T6 chips in cxgb4vf driver, from Hariprasad Shenai.
25) Double TCP Small Queues default to 256K to accomodate situations
like the XEN driver and wireless aggregation. From Wei Liu.26) Add more entropy inputs to flow dissector, from Tom Herbert.
27) Add CDG congestion control algorithm to TCP, from Kenneth Klette
Jonassen.28) Convert ipset over to RCU locking, from Jozsef Kadlecsik.
29) Track and act upon link status of ipv4 route nexthops, from Andy
Gospodarek.* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1670 commits)
bridge: vlan: flush the dynamically learned entries on port vlan delete
bridge: multicast: add a comment to br_port_state_selection about blocking state
net: inet_diag: export IPV6_V6ONLY sockopt
stmmac: troubleshoot unexpected bits in des0 & des1
net: ipv4 sysctl option to ignore routes when nexthop link is down
net: track link-status of ipv4 nexthops
net: switchdev: ignore unsupported bridge flags
net: Cavium: Fix MAC address setting in shutdown state
drivers: net: xgene: fix for ACPI support without ACPI
ip: report the original address of ICMP messages
net/mlx5e: Prefetch skb data on RX
net/mlx5e: Pop cq outside mlx5e_get_cqe
net/mlx5e: Remove mlx5e_cq.sqrq back-pointer
net/mlx5e: Remove extra spaces
net/mlx5e: Avoid TX CQE generation if more xmit packets expected
net/mlx5e: Avoid redundant dev_kfree_skb() upon NOP completion
net/mlx5e: Remove re-assignment of wq type in mlx5e_enable_rq()
net/mlx5e: Use skb_shinfo(skb)->gso_segs rather than counting them
net/mlx5e: Static mapping of netdev priv resources to/from netdev TX queues
net/mlx4_en: Use HW counters for rx/tx bytes/packets in PF device
...
24 Jun, 2015
8 commits
-
Add a new argument to br_fdb_delete_by_port which allows to specify a
vid to match when flushing entries and use it in nbp_vlan_delete() to
flush the dynamically learned entries of the vlan/port pair when removing
a vlan from a port. Before this patch only the local mac was being
removed and the dynamically learned ones were left to expire.
Note that the do_all argument is still respected and if specified, the
vid will be ignored.Signed-off-by: Nikolay Aleksandrov
Signed-off-by: David S. Miller -
Add a comment to explain why we're not disabling port's multicast when it
goes in blocking state. Since there's a check in the timer's function which
bypasses the timer if the port's in blocking/disabled state, the timer will
simply expire and stop without sending more queries.Suggested-by: Herbert Xu
Signed-off-by: Nikolay Aleksandrov
Acked-by: Herbert Xu
Signed-off-by: David S. Miller -
Conflicts:
drivers/net/ethernet/mellanox/mlx4/main.c
net/packet/af_packet.cBoth conflicts were cases of simple overlapping changes.
Signed-off-by: David S. Miller
-
For AF_INET6 sockets, the value of struct ipv6_pinfo.ipv6only is
exported to userspace. It indicates whether a socket bound to in6addr_any
listens on IPv4 as well as IPv6. Since the socket is natively IPv6, it is not
listed by e.g. 'ss -l -4'.This patch is accompanied by an appropriate one for iproute2 to enable
the additional information in 'ss -e'.Signed-off-by: Phil Sutter
Signed-off-by: David S. Miller -
This feature is only enabled with the new per-interface or ipv4 global
sysctls called 'ignore_routes_with_linkdown'.net.ipv4.conf.all.ignore_routes_with_linkdown = 0
net.ipv4.conf.default.ignore_routes_with_linkdown = 0
net.ipv4.conf.lo.ignore_routes_with_linkdown = 0
...When the above sysctls are set, will report to userspace that a route is
dead and will no longer resolve to this nexthop when performing a fib
lookup. This will signal to userspace that the route will not be
selected. The signalling of a RTNH_F_DEAD is only passed to userspace
if the sysctl is enabled and link is down. This was done as without it
the netlink listeners would have no idea whether or not a nexthop would
be selected. The kernel only sets RTNH_F_DEAD internally if the
interface has IFF_UP cleared.With the new sysctl set, the following behavior can be observed
(interface p8p1 is link-down):default via 10.0.5.2 dev p9p1
10.0.5.0/24 dev p9p1 proto kernel scope link src 10.0.5.15
70.0.0.0/24 dev p7p1 proto kernel scope link src 70.0.0.1
80.0.0.0/24 dev p8p1 proto kernel scope link src 80.0.0.1 dead linkdown
90.0.0.0/24 via 80.0.0.2 dev p8p1 metric 1 dead linkdown
90.0.0.0/24 via 70.0.0.2 dev p7p1 metric 2
90.0.0.1 via 70.0.0.2 dev p7p1 src 70.0.0.1
cache
local 80.0.0.1 dev lo src 80.0.0.1
cache
80.0.0.2 via 10.0.5.2 dev p9p1 src 10.0.5.15
cacheWhile the route does remain in the table (so it can be modified if
needed rather than being wiped away as it would be if IFF_UP was
cleared), the proper next-hop is chosen automatically when the link is
down. Now interface p8p1 is linked-up:default via 10.0.5.2 dev p9p1
10.0.5.0/24 dev p9p1 proto kernel scope link src 10.0.5.15
70.0.0.0/24 dev p7p1 proto kernel scope link src 70.0.0.1
80.0.0.0/24 dev p8p1 proto kernel scope link src 80.0.0.1
90.0.0.0/24 via 80.0.0.2 dev p8p1 metric 1
90.0.0.0/24 via 70.0.0.2 dev p7p1 metric 2
192.168.56.0/24 dev p2p1 proto kernel scope link src 192.168.56.2
90.0.0.1 via 80.0.0.2 dev p8p1 src 80.0.0.1
cache
local 80.0.0.1 dev lo src 80.0.0.1
cache
80.0.0.2 dev p8p1 src 80.0.0.1
cacheand the output changes to what one would expect.
If the sysctl is not set, the following output would be expected when
p8p1 is down:default via 10.0.5.2 dev p9p1
10.0.5.0/24 dev p9p1 proto kernel scope link src 10.0.5.15
70.0.0.0/24 dev p7p1 proto kernel scope link src 70.0.0.1
80.0.0.0/24 dev p8p1 proto kernel scope link src 80.0.0.1 linkdown
90.0.0.0/24 via 80.0.0.2 dev p8p1 metric 1 linkdown
90.0.0.0/24 via 70.0.0.2 dev p7p1 metric 2Since the dead flag does not appear, there should be no expectation that
the kernel would skip using this route due to link being down.v2: Split kernel changes into 2 patches, this actually makes a
behavioral change if the sysctl is set. Also took suggestion from Alex
to simplify code by only checking sysctl during fib lookup and
suggestion from Scott to add a per-interface sysctl.v3: Code clean-ups to make it more readable and efficient as well as a
reverse path check fix.v4: Drop binary sysctl
v5: Whitespace fixups from Dave
v6: Style changes from Dave and checkpatch suggestions
v7: One more checkpatch fixup
Signed-off-by: Andy Gospodarek
Signed-off-by: Dinesh Dutt
Acked-by: Scott Feldman
Signed-off-by: David S. Miller -
Add a fib flag called RTNH_F_LINKDOWN to any ipv4 nexthops that are
reachable via an interface where carrier is off. No action is taken,
but additional flags are passed to userspace to indicate carrier status.This also includes a cleanup to fib_disable_ip to more clearly indicate
what event made the function call to replace the more cryptic force
option previously used.v2: Split out kernel functionality into 2 patches, this patch simply
sets and clears new nexthop flag RTNH_F_LINKDOWN.v3: Cleanups suggested by Alex as well as a bug noticed in
fib_sync_down_dev and fib_sync_up when multipath was not enabled.v5: Whitespace and variable declaration fixups suggested by Dave.
v6: Style fixups noticed by Dave; ran checkpatch to be sure I got them
all.Signed-off-by: Andy Gospodarek
Signed-off-by: Dinesh Dutt
Acked-by: Scott Feldman
Signed-off-by: David S. Miller -
switchdev_port_bridge_getlink() queries SWITCHDEV_ATTR_PORT_BRIDGE_FLAGS
attributes, but a driver doesn't need to implement this in order to get
bridge link information.So error out only on errors different than -EOPNOTSUPP.
(This is a follow-up patch for 7d4f8d8.)
Fixes: 8793d0a664a8 ("switchdev: add new switchdev_port_bridge_getlink")
Signed-off-by: Vivien Didelot
Acked-by: Jiri Pirko
Acked-by: Scott Feldman
Signed-off-by: David S. Miller -
ICMP messages can trigger ICMP and local errors. In this case
serr->port is 0 and starting from Linux 4.0 we do not return
the original target address to the error queue readers.
Add function to define which errors provide addr_offset.
With this fix my ping command is not silent anymore.Fixes: c247f0534cc5 ("ip: fix error queue empty skb handling")
Signed-off-by: Julian Anastasov
Acked-by: Willem de Bruijn
Signed-off-by: David S. Miller