Eric Lee / smarc-fsl-linux-kernel

05 May, 2015

1 commit

9507271d9 svcrpc: fix potential GSSX_ACCEPT_SEC_CONTEXT decoding failures ... Browse Code »

In an environment where the KDC is running Active Directory, the
exported composite name field returned in the context could be large
enough to span a page boundary. Attaching a scratch buffer to the
decoding xdr_stream helps deal with those cases.

The case where we saw this was actually due to behavior that's been
fixed in newer gss-proxy versions, but we're fixing it here too.

Signed-off-by: Scott Mayhew
Cc: stable@vger.kernel.org
Reviewed-by: Simo Sorce
Signed-off-by: J. Bruce Fields

Scott Mayhew
2015-05-05 00:02:40 +0800

02 May, 2015

1 commit

a134f083e ipv4: Missing sk_nulls_node_init() in ping_unhash(). ... Browse Code »

If we don't do that, then the poison value is left in the ->pprev
backlink.

This can cause crashes if we do a disconnect, followed by a connect().

Tested-by: Linus Torvalds
Reported-by: Wen Xu
Signed-off-by: David S. Miller

David S. Miller
2015-05-02 10:02:47 +0800

30 Apr, 2015

7 commits

50d4964f1 net: dsa: Fix scope of eeprom-length property ... Browse Code »

eeprom-length is a switch property, not a dsa property, and thus
needs to be attached to the switch node, not to the dsa node.

Reported-by: Andrew Lunn
Fixes: 6793abb4e849 ("net: dsa: Add support for switch EEPROM access")
Signed-off-by: Guenter Roeck
Acked-by: Andrew Lunn
Signed-off-by: David S. Miller

Guenter Roeck
2015-04-30 03:35:04 +0800
0d699f28e tipc: fix problem with parallel link synchronization mechanism ... Browse Code »

Currently, we try to accumulate arrived packets in the links's
'deferred' queue during the parallel link syncronization phase.

This entails two problems:

- With an unlucky combination of arriving packets the algorithm
may go into a lockstep with the out-of-sequence handling function,
where the synch mechanism is adding a packet to the deferred queue,
while the out-of-sequence handling is retrieving it again, thus
ending up in a loop inside the node_lock scope.

- Even if this is avoided, the link will very often send out
unnecessary protocol messages, in the worst case leading to
redundant retransmissions.

We fix this by just dropping arriving packets on the upcoming link
during the synchronization phase, thus relying on the retransmission
protocol to resolve the situation once the two links have arrived to
a synchronized state.

Reviewed-by: Erik Hugne
Reviewed-by: Ying Xue
Signed-off-by: Jon Maloy
Signed-off-by: David S. Miller

Jon Paul Maloy
2015-04-30 03:08:59 +0800
f2f67390a tipc: remove wrong use of NLM_F_MULTI ... Browse Code »

NLM_F_MULTI must be used only when a NLMSG_DONE message is sent. In fact,
it is sent only at the end of a dump.

Libraries like libnl will wait forever for NLMSG_DONE.

Fixes: 35b9dd7607f0 ("tipc: add bearer get/dump to new netlink api")
Fixes: 7be57fc69184 ("tipc: add link get/dump to new netlink api")
Fixes: 46f15c6794fb ("tipc: add media get/dump to new netlink api")
CC: Richard Alpe
CC: Jon Maloy
CC: Ying Xue
CC: tipc-discussion@lists.sourceforge.net
Signed-off-by: Nicolas Dichtel
Signed-off-by: David S. Miller

Nicolas Dichtel
2015-04-30 02:59:17 +0800
46c264daa bridge/nl: remove wrong use of NLM_F_MULTI ... Browse Code »

NLM_F_MULTI must be used only when a NLMSG_DONE message is sent. In fact,
it is sent only at the end of a dump.

Libraries like libnl will wait forever for NLMSG_DONE.

Fixes: e5a55a898720 ("net: create generic bridge ops")
Fixes: 815cccbf10b2 ("ixgbe: add setlink, getlink support to ixgbe and ixgbevf")
CC: John Fastabend
CC: Sathya Perla
CC: Subbu Seetharaman
CC: Ajit Khaparde
CC: Jeff Kirsher
CC: intel-wired-lan@lists.osuosl.org
CC: Jiri Pirko
CC: Scott Feldman
CC: Stephen Hemminger
CC: bridge@lists.linux-foundation.org
Signed-off-by: Nicolas Dichtel
Signed-off-by: David S. Miller

Nicolas Dichtel
2015-04-30 02:59:16 +0800
821996795 bridge/mdb: remove wrong use of NLM_F_MULTI ... Browse Code »

NLM_F_MULTI must be used only when a NLMSG_DONE message is sent. In fact,
it is sent only at the end of a dump.

Libraries like libnl will wait forever for NLMSG_DONE.

Fixes: 37a393bc4932 ("bridge: notify mdb changes via netlink")
CC: Cong Wang
CC: Stephen Hemminger
CC: bridge@lists.linux-foundation.org
Signed-off-by: Nicolas Dichtel
Signed-off-by: David S. Miller

Nicolas Dichtel
2015-04-30 02:59:16 +0800
2b70fe5ab net: sched: act_connmark: don't zap skb->nfct ... Browse Code »

This action is meant to be passive, i.e. we should not alter
skb->nfct: If nfct is present just leave it alone.

Compile tested only.

Cc: Jamal Hadi Salim
Signed-off-by: Florian Westphal
Acked-by: Pablo Neira Ayuso
Signed-off-by: David S. Miller

Florian Westphal
2015-04-30 02:56:40 +0800
cb6ccf09d route: Use ipv4_mtu instead of raw rt_pmtu ... Browse Code »

The commit 3cdaa5be9e81a914e633a6be7b7d2ef75b528562 ("ipv4: Don't
increase PMTU with Datagram Too Big message") broke PMTU in cases
where the rt_pmtu value has expired but is smaller than the new
PMTU value.

This obsolete rt_pmtu then prevents the new PMTU value from being
installed.

Fixes: 3cdaa5be9e81 ("ipv4: Don't increase PMTU with Datagram Too Big message")
Reported-by: Gerd v. Egidy
Signed-off-by: Herbert Xu
Signed-off-by: David S. Miller

Herbert Xu
2015-04-30 02:43:22 +0800

28 Apr, 2015

3 commits

39376ccb1 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf ... Browse Code »

Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The following patchset contains Netfilter fixes for your net tree,
they are:

1) Fix a crash in nf_tables when dictionaries are used from the ruleset,
due to memory corruption, from Florian Westphal.

2) Fix another crash in nf_queue when used with br_netfilter. Also from
Florian.

Both fixes are related to new stuff that got in 4.0-rc.
====================

Signed-off-by: David S. Miller

David S. Miller
2015-04-28 11:12:34 +0800
2decb2682 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net ... Browse Code »

Pull networking fixes from David Miller:

1) mlx4 doesn't check fully for supported valid RSS hash function, fix
from Amir Vadai

2) Off by one in ibmveth_change_mtu(), from David Gibson

3) Prevent altera chip from reporting false error interrupts in some
circumstances, from Chee Nouk Phoon

4) Get rid of that stupid endless loop trying to allocate a FIN packet
in TCP, and in the process kill deadlocks. From Eric Dumazet

5) Fix get_rps_cpus() crash due to wrong invalid-cpu value, also from
Eric Dumazet

6) Fix two bugs in async rhashtable resizing, from Thomas Graf

7) Fix topology server listener socket namespace bug in TIPC, from Ying
Xue

8) Add some missing HAS_DMA kconfig dependencies, from Geert
Uytterhoeven

9) bgmac driver intends to force re-polling but does so by returning
the wrong value from it's ->poll() handler. Fix from Rafał Miłecki

10) When the creater of an rhashtable configures a max size for it,
don't bark in the logs and drop insertions when that is exceeded.
Fix from Johannes Berg

11) Recover from out of order packets in ppp mppe properly, from Sylvain
Rochet

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (41 commits)
bnx2x: really disable TPA if 'disable_tpa' option is set
net:treewide: Fix typo in drivers/net
net/mlx4_en: Prevent setting invalid RSS hash function
mdio-mux-gpio: use new gpiod_get_array and gpiod_put_array functions
netfilter; Add some missing default cases to switch statements in nft_reject.
ppp: mppe: discard late packet in stateless mode
ppp: mppe: sanity error path rework
net/bonding: Make DRV macros private
net: rfs: fix crash in get_rps_cpus()
altera tse: add support for fixed-links.
pxa168: fix double deallocation of managed resources
net: fix crash in build_skb()
net: eth: altera: Resolve false errors from MSGDMA to TSE
ehea: Fix memory hook reference counting crashes
net/tg3: Release IRQs on permanent error
net: mdio-gpio: support access that may sleep
inet: fix possible panic in reqsk_queue_unlink()
rhashtable: don't attempt to grow when at max_size
bgmac: fix requests for extra polling calls from NAPI
tcp: avoid looping in tcp_send_fin()
...

Linus Torvalds
2015-04-28 05:05:19 +0800
129d23a56 netfilter; Add some missing default cases to switch statements in nft_reject. ... Browse Code »

This fixes:

====================
net/netfilter/nft_reject.c: In function ‘nft_reject_dump’:
net/netfilter/nft_reject.c:61:2: warning: enumeration value ‘NFT_REJECT_TCP_RST’ not handled in switch [-Wswitch]
switch (priv->type) {
^
net/netfilter/nft_reject.c:61:2: warning: enumeration value ‘NFT_REJECT_ICMPX_UNREACH’ not handled in switch [-Wswi\
tch]
net/netfilter/nft_reject_inet.c: In function ‘nft_reject_inet_dump’:
net/netfilter/nft_reject_inet.c:105:2: warning: enumeration value ‘NFT_REJECT_TCP_RST’ not handled in switch [-Wswi\
tch]
switch (priv->type) {
^
====================

Signed-off-by: David S. Miller

David S. Miller
2015-04-28 01:20:34 +0800

27 Apr, 2015

3 commits

59953fba8 Merge tag 'nfs-for-4.1-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs ... Browse Code »

Pull NFS client updates from Trond Myklebust:
"Another set of mainly bugfixes and a couple of cleanups. No new
functionality in this round.

Highlights include:

Stable patches:
- Fix a regression in /proc/self/mountstats
- Fix the pNFS flexfiles O_DIRECT support
- Fix high load average due to callback thread sleeping

Bugfixes:
- Various patches to fix the pNFS layoutcommit support
- Do not cache pNFS deviceids unless server notifications are enabled
- Fix a SUNRPC transport reconnection regression
- make debugfs file creation failure non-fatal in SUNRPC
- Another fix for circular directory warnings on NFSv4 "junctioned"
mountpoints
- Fix locking around NFSv4.2 fallocate() support
- Truncating NFSv4 file opens should also sync O_DIRECT writes
- Prevent infinite loop in rpcrdma_ep_create()

Features:
- Various improvements to the RDMA transport code's handling of
memory registration
- Various code cleanups"

* tag 'nfs-for-4.1-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (55 commits)
fs/nfs: fix new compiler warning about boolean in switch
nfs: Remove unneeded casts in nfs
NFS: Don't attempt to decode missing directory entries
Revert "nfs: replace nfs_add_stats with nfs_inc_stats when add one"
NFS: Rename idmap.c to nfs4idmap.c
NFS: Move nfs_idmap.h into fs/nfs/
NFS: Remove CONFIG_NFS_V4 checks from nfs_idmap.h
NFS: Add a stub for GETDEVICELIST
nfs: remove WARN_ON_ONCE from nfs_direct_good_bytes
nfs: fix DIO good bytes calculation
nfs: Fetch MOUNTED_ON_FILEID when updating an inode
sunrpc: make debugfs file creation failure non-fatal
nfs: fix high load average due to callback thread sleeping
NFS: Reduce time spent holding the i_mutex during fallocate()
NFS: Don't zap caches on fallocate()
xprtrdma: Make rpcrdma_{un}map_one() into inline functions
xprtrdma: Handle non-SEND completions via a callout
xprtrdma: Add "open" memreg op
xprtrdma: Add "destroy MRs" memreg op
xprtrdma: Add "reset MRs" memreg op
...

Linus Torvalds
2015-04-27 08:33:59 +0800
9ec3a646f Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs ... Browse Code »

Pull fourth vfs update from Al Viro:
"d_inode() annotations from David Howells (sat in for-next since before
the beginning of merge window) + four assorted fixes"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
RCU pathwalk breakage when running into a symlink overmounting something
fix I_DIO_WAKEUP definition
direct-io: only inc/dec inode->i_dio_count for file systems
fs/9p: fix readdir()
VFS: assorted d_backing_inode() annotations
VFS: fs/inode.c helpers: d_inode() annotations
VFS: fs/cachefiles: d_backing_inode() annotations
VFS: fs library helpers: d_inode() annotations
VFS: assorted weird filesystems: d_inode() annotations
VFS: normal filesystems (and lustre): d_inode() annotations
VFS: security/: d_inode() annotations
VFS: security/: d_backing_inode() annotations
VFS: net/: d_inode() annotations
VFS: net/unix: d_backing_inode() annotations
VFS: kernel/: d_inode() annotations
VFS: audit: d_backing_inode() annotations
VFS: Fix up some ->d_inode accesses in the chelsio driver
VFS: Cachefiles should perform fs modifications on the top layer only
VFS: AF_UNIX sockets should call mknod on the top layer only

Linus Torvalds
2015-04-27 08:22:07 +0800
a31196b07 net: rfs: fix crash in get_rps_cpus() ... Browse Code »

Commit 567e4b79731c ("net: rfs: add hash collision detection") had one
mistake :

RPS_NO_CPU is no longer the marker for invalid cpu in set_rps_cpu()
and get_rps_cpu(), as @next_cpu was the result of an AND with
rps_cpu_mask

This bug showed up on a host with 72 cpus :
next_cpu was 0x7f, and the code was trying to access percpu data of an
non existent cpu.

In a follow up patch, we might get rid of compares against nr_cpu_ids,
if we init the tables with 0. This is silly to test for a very unlikely
condition that exists only shortly after table initialization, as
we got rid of rps_reset_sock_flow() and similar functions that were
writing this RPS_NO_CPU magic value at flow dismantle : When table is
old enough, it never contains this value anymore.

Fixes: 567e4b79731c ("net: rfs: add hash collision detection")
Signed-off-by: Eric Dumazet
Cc: Tom Herbert
Cc: Ben Hutchings
Signed-off-by: David S. Miller

Eric Dumazet
2015-04-27 04:07:57 +0800

26 Apr, 2015

1 commit

2ea2f62c8 net: fix crash in build_skb() ... Browse Code »

When I added pfmemalloc support in build_skb(), I forgot netlink
was using build_skb() with a vmalloc() area.

In this patch I introduce __build_skb() for netlink use,
and build_skb() is a wrapper handling both skb->head_frag and
skb->pfmemalloc

This means netlink no longer has to hack skb->head_frag

[ 1567.700067] kernel BUG at arch/x86/mm/physaddr.c:26!
[ 1567.700067] invalid opcode: 0000 [#1] PREEMPT SMP KASAN
[ 1567.700067] Dumping ftrace buffer:
[ 1567.700067] (ftrace buffer empty)
[ 1567.700067] Modules linked in:
[ 1567.700067] CPU: 9 PID: 16186 Comm: trinity-c182 Not tainted 4.0.0-next-20150424-sasha-00037-g4796e21 #2167
[ 1567.700067] task: ffff880127efb000 ti: ffff880246770000 task.ti: ffff880246770000
[ 1567.700067] RIP: __phys_addr (arch/x86/mm/physaddr.c:26 (discriminator 3))
[ 1567.700067] RSP: 0018:ffff8802467779d8 EFLAGS: 00010202
[ 1567.700067] RAX: 000041000ed8e000 RBX: ffffc9008ed8e000 RCX: 000000000000002c
[ 1567.700067] RDX: 0000000000000004 RSI: 0000000000000000 RDI: ffffffffb3fd6049
[ 1567.700067] RBP: ffff8802467779f8 R08: 0000000000000019 R09: ffff8801d0168000
[ 1567.700067] R10: ffff8801d01680c7 R11: ffffed003a02d019 R12: ffffc9000ed8e000
[ 1567.700067] R13: 0000000000000f40 R14: 0000000000001180 R15: ffffc9000ed8e000
[ 1567.700067] FS: 00007f2a7da3f700(0000) GS:ffff8801d1000000(0000) knlGS:0000000000000000
[ 1567.700067] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1567.700067] CR2: 0000000000738308 CR3: 000000022e329000 CR4: 00000000000007e0
[ 1567.700067] Stack:
[ 1567.700067] ffffc9000ed8e000 ffff8801d0168000 ffffc9000ed8e000 ffff8801d0168000
[ 1567.700067] ffff880246777a28 ffffffffad7c0a21 0000000000001080 ffff880246777c08
[ 1567.700067] ffff88060d302e68 ffff880246777b58 ffff880246777b88 ffffffffad9a6821
[ 1567.700067] Call Trace:
[ 1567.700067] build_skb (include/linux/mm.h:508 net/core/skbuff.c:316)
[ 1567.700067] netlink_sendmsg (net/netlink/af_netlink.c:1633 net/netlink/af_netlink.c:2329)
[ 1567.774369] ? sched_clock_cpu (kernel/sched/clock.c:311)
[ 1567.774369] ? netlink_unicast (net/netlink/af_netlink.c:2273)
[ 1567.774369] ? netlink_unicast (net/netlink/af_netlink.c:2273)
[ 1567.774369] sock_sendmsg (net/socket.c:614 net/socket.c:623)
[ 1567.774369] sock_write_iter (net/socket.c:823)
[ 1567.774369] ? sock_sendmsg (net/socket.c:806)
[ 1567.774369] __vfs_write (fs/read_write.c:479 fs/read_write.c:491)
[ 1567.774369] ? get_lock_stats (kernel/locking/lockdep.c:249)
[ 1567.774369] ? default_llseek (fs/read_write.c:487)
[ 1567.774369] ? vtime_account_user (kernel/sched/cputime.c:701)
[ 1567.774369] ? rw_verify_area (fs/read_write.c:406 (discriminator 4))
[ 1567.774369] vfs_write (fs/read_write.c:539)
[ 1567.774369] SyS_write (fs/read_write.c:586 fs/read_write.c:577)
[ 1567.774369] ? SyS_read (fs/read_write.c:577)
[ 1567.774369] ? __this_cpu_preempt_check (lib/smp_processor_id.c:63)
[ 1567.774369] ? trace_hardirqs_on_caller (kernel/locking/lockdep.c:2594 kernel/locking/lockdep.c:2636)
[ 1567.774369] ? trace_hardirqs_on_thunk (arch/x86/lib/thunk_64.S:42)
[ 1567.774369] system_call_fastpath (arch/x86/kernel/entry_64.S:261)

Fixes: 79930f5892e ("net: do not deplete pfmemalloc reserve")
Signed-off-by: Eric Dumazet
Reported-by: Sasha Levin
Signed-off-by: David S. Miller

Eric Dumazet
2015-04-26 03:49:49 +0800

25 Apr, 2015

1 commit

4c4ed0748 netfilter: nf_tables: fix wrong length for jump/goto verdicts ... Browse Code »

NFT_JUMP/GOTO erronously sets length to sizeof(void *).

We then allocate insufficient memory when such element is added to a vmap.

Suggested-by: Patrick McHardy
Signed-off-by: Florian Westphal
Signed-off-by: Pablo Neira Ayuso

Florian Westphal
2015-04-25 02:51:23 +0800

24 Apr, 2015

6 commits

b357a364c inet: fix possible panic in reqsk_queue_unlink() ... Browse Code »

[ 3897.923145] BUG: unable to handle kernel NULL pointer dereference at
0000000000000080
[ 3897.931025] IP: [] reqsk_timer_handler+0x1a6/0x243

There is a race when reqsk_timer_handler() and tcp_check_req() call
inet_csk_reqsk_queue_unlink() on the same req at the same time.

Before commit fa76ce7328b2 ("inet: get rid of central tcp/dccp listener
timer"), listener spinlock was held and race could not happen.

To solve this bug, we change reqsk_queue_unlink() to not assume req
must be found, and we return a status, to conditionally release a
refcount on the request sock.

This also means tcp_check_req() in non fastopen case might or not
consume req refcount, so tcp_v6_hnd_req() & tcp_v4_hnd_req() have
to properly handle this.

(Same remark for dccp_check_req() and its callers)

inet_csk_reqsk_queue_drop() is now too big to be inlined, as it is
called 4 times in tcp and 3 times in dccp.

Fixes: fa76ce7328b2 ("inet: get rid of central tcp/dccp listener timer")
Signed-off-by: Eric Dumazet
Reported-by: Yuchung Cheng
Signed-off-by: David S. Miller

Eric Dumazet
2015-04-24 23:39:15 +0800
845704a53 tcp: avoid looping in tcp_send_fin() ... Browse Code »

Presence of an unbound loop in tcp_send_fin() had always been hard
to explain when analyzing crash dumps involving gigantic dying processes
with millions of sockets.

Lets try a different strategy :

In case of memory pressure, try to add the FIN flag to last packet
in write queue, even if packet was already sent. TCP stack will
be able to deliver this FIN after a timeout event. Note that this
FIN being delivered by a retransmit, it also carries a Push flag
given our current implementation.

By checking sk_under_memory_pressure(), we anticipate that cooking
many FIN packets might deplete tcp memory.

In the case we could not allocate a packet, even with __GFP_WAIT
allocation, then not sending a FIN seems quite reasonable if it allows
to get rid of this socket, free memory, and not block the process from
eventually doing other useful work.

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2015-04-24 23:06:48 +0800
f139b6c67 Merge tag 'nfs-rdma-for-4.1-1' of git://git.linux-nfs.org/projects/anna/nfs-rdma ... Browse Code »

NFS: NFSoRDMA Client Changes

This patch series creates an operation vector for each of the different
memory registration modes. This should make it easier to one day increase
credit limit, rsize, and wsize.

Signed-off-by: Anna Schumaker

Trond Myklebust
2015-04-24 03:16:37 +0800
21330b667 Merge branch 'bugfixes' ... Browse Code »

* bugfixes:
NFSv4: Return delegations synchronously in evict_inode
SUNRPC: Fix a regression when reconnecting
NFS: remount with security change should return EINVAL
nfs: do not export discarded symbols
NFSv4.1: don't export static symbol

Trond Myklebust
2015-04-24 03:16:27 +0800
3f9400981 sunrpc: make debugfs file creation failure non-fatal ... Browse Code »

v2: gracefully handle the case where some dentry pointers end up NULL
and be more dilligent about zeroing out dentry pointers

We currently have a problem that SELinux policy is being enforced when
creating debugfs files. If a debugfs file is created as a side effect of
doing some syscall, then that creation can fail if the SELinux policy
for that process prevents it.

This seems wrong. We don't do that for files under /proc, for instance,
so Bruce has proposed a patch to fix that.

While discussing that patch however, Greg K.H. stated:

"No kernel code should care / fail if a debugfs function fails, so
please fix up the sunrpc code first."

This patch converts all of the sunrpc debugfs setup code to be void
return functins, and the callers to not look for errors from those
functions.

This should allow rpc_clnt and rpc_xprt creation to work, even if the
kernel fails to create debugfs files for some reason.

Cc: Greg Kroah-Hartman
Acked-by: "J. Bruce Fields"
Signed-off-by: Jeff Layton
Signed-off-by: Trond Myklebust

Jeff Layton
2015-04-24 02:42:27 +0800
d1ab39f17 net: unix: garbage: fixed several comment and whitespace style issues ... Browse Code »

fixed several comment and whitespace style issues

Signed-off-by: Jason Eastman
Signed-off-by: David S. Miller

Jason Eastman
2015-04-24 01:15:20 +0800

23 Apr, 2015

10 commits

73a317377 tipc: fix node refcount issue ... Browse Code »

When link statistics is dumped over netlink, we iterate over
the list of peer nodes and append each links statistics to
the netlink msg. In the case where the dump is resumed after
filling up a nlmsg, the node refcnt is decremented without
having been incremented previously which may cause the node
reference to be freed. When this happens, the following
info/stacktrace will be generated, followed by a crash or
undefined behavior.
We fix this by removing the erroneous call to tipc_node_put
inside the loop that iterates over nodes.

[ 384.312303] INFO: trying to register non-static key.
[ 384.313110] the code is fine but needs lockdep annotation.
[ 384.313290] turning off the locking correctness validator.
[ 384.313290] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.0.0+ #13
[ 384.313290] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[ 384.313290] ffff88003c6d0290 ffff88003cc03ca8 ffffffff8170adf1 0000000000000007
[ 384.313290] ffffffff82728730 ffff88003cc03d38 ffffffff810a6a6d 00000000001d7200
[ 384.313290] ffff88003c6d0ab0 ffff88003cc03ce8 0000000000000285 0000000000000001
[ 384.313290] Call Trace:
[ 384.313290] [] dump_stack+0x4c/0x65
[ 384.313290] [] __lock_acquire+0xf3d/0xf50
[ 384.313290] [] lock_acquire+0xd5/0x290
[ 384.313290] [] ? link_timeout+0x1c/0x170 [tipc]
[ 384.313290] [] ? link_state_event+0x4e0/0x4e0 [tipc]
[ 384.313290] [] _raw_spin_lock_bh+0x40/0x80
[ 384.313290] [] ? link_timeout+0x1c/0x170 [tipc]
[ 384.313290] [] link_timeout+0x1c/0x170 [tipc]
[ 384.313290] [] call_timer_fn+0xb8/0x490
[ 384.313290] [] ? process_timeout+0x10/0x10
[ 384.313290] [] run_timer_softirq+0x21c/0x420
[ 384.313290] [] ? link_state_event+0x4e0/0x4e0 [tipc]
[ 384.313290] [] __do_softirq+0xf4/0x630
[ 384.313290] [] irq_exit+0x5d/0x60
[ 384.313290] [] smp_apic_timer_interrupt+0x41/0x50
[ 384.313290] [] apic_timer_interrupt+0x70/0x80
[ 384.313290] [] ? default_idle+0x20/0x210
[ 384.313290] [] ? default_idle+0x1e/0x210
[ 384.313290] [] arch_cpu_idle+0xa/0x10
[ 384.313290] [] cpu_startup_entry+0x2c3/0x530
[ 384.313290] [] ? clockevents_register_device+0x113/0x200
[ 384.313290] [] start_secondary+0x13f/0x170

Fixes: 8a0f6ebe8494 ("tipc: involve reference counter for node structure")
Signed-off-by: Erik Hugne
Signed-off-by: Jon Maloy
Signed-off-by: David S. Miller

Erik Hugne
2015-04-23 23:50:34 +0800
9871b27f6 tipc: fix random link reset problem ... Browse Code »

In the function tipc_sk_rcv(), the stack variable 'err'
is only initialized to TIPC_ERR_NO_PORT for the first
iteration over the link input queue. If a chain of messages
are received from a link, failure to lookup the socket for
any but the first message will cause the message to bounce back
out on a random link.
We fix this by properly initializing err.

Signed-off-by: Erik Hugne
Signed-off-by: Jon Maloy
Signed-off-by: David S. Miller

Erik Hugne
2015-04-23 23:50:34 +0800
def81f69b tipc: fix topology server broken issue ... Browse Code »

When a new topology server is launched in a new namespace, its
listening socket is inserted into the "init ns" namespace's socket
hash table rather than the one owned by the new namespace. Although
the socket's namespace is forcedly changed to the new namespace later,
the socket is still stored in the socket hash table of "init ns"
namespace. When a client created in the new namespace connects
its own topology server, the connection is failed as its server's
socket could not be found from its own namespace's socket table.

If __sock_create() instead of original sock_create_kern() is used
to create the server's socket through specifying an expected namesapce,
the socket will be inserted into the specified namespace's socket
table, thereby avoiding to the topology server broken issue.

Fixes: 76100a8a64bc ("tipc: fix netns refcnt leak")

Reported-by: Erik Hugne
Signed-off-by: Ying Xue
Signed-off-by: Jon Maloy
Signed-off-by: David S. Miller

Ying Xue
2015-04-23 23:50:34 +0800
79930f589 net: do not deplete pfmemalloc reserve ... Browse Code »

build_skb() should look at the page pfmemalloc status.
If set, this means page allocator allocated this page in the
expectation it would help to free other pages. Networking
stack can do that only if skb->pfmemalloc is also set.

Also, we must refrain using high order pages from the pfmemalloc
reserve, so __page_frag_refill() must also use __GFP_NOMEMALLOC for
them. Under memory pressure, using order-0 pages is probably the best
strategy.

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2015-04-23 04:24:59 +0800
26349c71b ip6_gre: use netdev_alloc_pcpu_stats() ... Browse Code »

The code there just open-codes the same, so use the provided macro instead.

Signed-off-by: Johannes Berg
Signed-off-by: David S. Miller

Johannes Berg
2015-04-23 03:39:05 +0800
1204c4644 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client ... Browse Code »

Pull Ceph updates from Sage Weil:
"This time around we have a collection of CephFS fixes from Zheng
around MDS failure handling and snapshots, support for a new CRUSH
straw2 algorithm (to sync up with userspace) and several RBD cleanups
and fixes from Ilya, an error path leak fix from Taesoo, and then an
assorted collection of cleanups from others"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (28 commits)
rbd: rbd_wq comment is obsolete
libceph: announce support for straw2 buckets
crush: straw2 bucket type with an efficient 64-bit crush_ln()
crush: ensuring at most num-rep osds are selected
crush: drop unnecessary include from mapper.c
ceph: fix uninline data function
ceph: rename snapshot support
ceph: fix null pointer dereference in send_mds_reconnect()
ceph: hold on to exclusive caps on complete directories
libceph: simplify our debugfs attr macro
ceph: show non-default options only
libceph: expose client options through debugfs
libceph, ceph: split ceph_show_options()
rbd: mark block queue as non-rotational
libceph: don't overwrite specific con error msgs
ceph: cleanup unsafe requests when reconnecting is denied
ceph: don't zero i_wrbuffer_ref when reconnecting is denied
ceph: don't mark dirty caps when there is no auth cap
ceph: keep i_snap_realm while there are writers
libceph: osdmap.h: Add missing format newlines
...

Linus Torvalds
2015-04-23 02:30:10 +0800
5a9ab0176 mpls: Prevent use of implicit NULL label as outgoing label ... Browse Code »

The reserved implicit-NULL label isn't allowed to appear in the label
stack for packets, so make it an error for the control plane to
specify it as an outgoing label.

Suggested-by: "Eric W. Biederman"
Signed-off-by: Robert Shearman
Reviewed-by: "Eric W. Biederman"
Signed-off-by: David S. Miller

Robert Shearman
2015-04-23 02:24:54 +0800
37bde7997 mpls: Per-device enabling of packet input ... Browse Code »

An MPLS network is a single trust domain where the edges must be in
control of what labels make their way into the core. The simplest way
of ensuring this is for the edge device to always impose the labels,
and not allow forward labeled traffic from untrusted neighbours. This
is achieved by allowing a per-device configuration of whether MPLS
traffic input from that interface should be processed or not.

To be secure by default, the default state is changed to MPLS being
disabled on all interfaces unless explicitly enabled and no global
option is provided to change the default. Whilst this differs from
other protocols (e.g. IPv6), network operators are used to explicitly
enabling MPLS forwarding on interfaces, and with the number of links
to the MPLS core typically fairly low this doesn't present too much of
a burden on operators.

Cc: "Eric W. Biederman"
Signed-off-by: Robert Shearman
Reviewed-by: "Eric W. Biederman"
Signed-off-by: David S. Miller

Robert Shearman
2015-04-23 02:24:54 +0800
03c57747a mpls: Per-device MPLS state ... Browse Code »

Add per-device MPLS state to supported interfaces. Use the presence of
this state in mpls_route_add to determine that this is a supported
interface.

Use the presence of mpls_dev to drop packets that arrived on an
unsupported interface - previously they were allowed through.

Cc: "Eric W. Biederman"
Signed-off-by: Robert Shearman
Reviewed-by: "Eric W. Biederman"
Signed-off-by: David S. Miller

Robert Shearman
2015-04-23 02:24:54 +0800
d83769a58 tcp: fix possible deadlock in tcp_send_fin() ... Browse Code »

Using sk_stream_alloc_skb() in tcp_send_fin() is dangerous in
case a huge process is killed by OOM, and tcp_mem[2] is hit.

To be able to free memory we need to make progress, so this
patch allows FIN packets to not care about tcp_mem[2], if
skb allocation succeeded.

In a follow-up patch, we might abort tcp_send_fin() infinite loop
in case TIF_MEMDIE is set on this thread, as memory allocator
did its best getting extra memory already.

This patch reverts d22e15371811 ("tcp: fix tcp fin memory accounting")

Fixes: d22e15371811 ("tcp: fix tcp fin memory accounting")
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2015-04-23 02:13:11 +0800

22 Apr, 2015

5 commits

958a27658 crush: straw2 bucket type with an efficient 64-bit crush_ln() ... Browse Code »

This is an improved straw bucket that correctly avoids any data movement
between items A and B when neither A nor B's weights are changed. Said
differently, if we adjust the weight of item C (including adding it anew
or removing it completely), we will only see inputs move to or from C,
never between other items in the bucket.

Notably, there is not intermediate scaling factor that needs to be
calculated. The mapping function is a simple function of the item weights.

The below commits were squashed together into this one (mostly to avoid
adding and then yanking a ~6000 lines worth of crush_ln_table):

- crush: add a straw2 bucket type
- crush: add crush_ln to calculate nature log efficently
- crush: improve straw2 adjustment slightly
- crush: change crush_ln to provide 32 more digits
- crush: fix crush_get_bucket_item_weight and bucket destroy for straw2
- crush/mapper: fix divide-by-0 in straw2
(with div64_s64() for draw = ln / w and INT64_MIN -> S64_MIN - need
to create a proper compat.h in ceph.git)

Reflects ceph.git commits 242293c908e923d474910f2b8203fa3b41eb5a53,
32a1ead92efcd351822d22a5fc37d159c65c1338,
6289912418c4a3597a11778bcf29ed5415117ad9,
35fcb04e2945717cf5cfe150b9fa89cb3d2303a1,
6445d9ee7290938de1e4ee9563912a6ab6d8ee5f,
b5921d55d16796e12d66ad2c4add7305f9ce2353.

Signed-off-by: Ilya Dryomov

Ilya Dryomov
2015-04-22 23:33:43 +0800
45002267e crush: ensuring at most num-rep osds are selected ... Browse Code »

Crush temporary buffers are allocated as per replica size configured
by the user. When there are more final osds (to be selected as per
rule) than the replicas, buffer overlaps and it causes crash. Now, it
ensures that at most num-rep osds are selected even if more number of
osds are allowed by the rule.

Reflects ceph.git commits 6b4d1aa99718e3b367496326c1e64551330fabc0,
234b066ba04976783d15ff2abc3e81b6cc06fb10.

Signed-off-by: Ilya Dryomov

Ilya Dryomov
2015-04-22 23:33:42 +0800
9be6df215 crush: drop unnecessary include from mapper.c ... Browse Code »

Signed-off-by: Ilya Dryomov

Ilya Dryomov
2015-04-22 23:33:42 +0800
8aaa51b63 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net ... Browse Code »

Pull networking fixes from David Miller:
"Just a few fixes trickling in at this point.

1) If we see an attached socket on an skb in the ipv4 forwarding path,
bail. This can happen due to races with FIB rule addition, and
deletion, and we should just drop such frames. From Sebastian
Pöhn.

2) pppoe receive should only accept packets destined for this hosts's
MAC address. From Joakim Tjernlund.

3) Handle checksum unwrapping properly in ppp receive properly when
it's encapsulated in UDP in some way, fix from Tom Herbert.

4) Fix some bugs in mv88e6xxx DSA driver resulting from the conversion
from register offset constants to mnenomic macros. From Vivien
Didelot.

5) Fix handling of HCA max message size in mlx4 adapters, from Eran
Ben ELisha"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
net/mlx4_core: Fix reading HCA max message size in mlx4_QUERY_DEV_CAP
tcp: add memory barriers to write space paths
altera tse: Error-Bit on tx-avalon-stream always set.
net: dsa: mv88e6xxx: use PORT_DEFAULT_VLAN
net: dsa: mv88e6xxx: fix setup of port control 1
ppp: call skb_checksum_complete_unset in ppp_receive_frame
net: add skb_checksum_complete_unset
pppoe: Lacks DST MAC address check
ip_forward: Drop frames with attached skb->sk

Linus Torvalds
2015-04-22 13:37:27 +0800
3c7151275 tcp: add memory barriers to write space paths ... Browse Code »

Ensure that we either see that the buffer has write space
in tcp_poll() or that we perform a wakeup from the input
side. Did not run into any actual problem here, but thought
that we should make things explicit.

Signed-off-by: Jason Baron
Signed-off-by: David S. Miller

jbaron@akamai.com
2015-04-22 03:57:34 +0800

21 Apr, 2015

1 commit

2ab957492 ip_forward: Drop frames with attached skb->sk ... Browse Code »

Initial discussion was:
[FYI] xfrm: Don't lookup sk_policy for timewait sockets

Forwarded frames should not have a socket attached. Especially
tw sockets will lead to panics later-on in the stack.

This was observed with TPROXY assigning a tw socket and broken
policy routing (misconfigured). As a result frame enters
forwarding path instead of input. We cannot solve this in
TPROXY as it cannot know that policy routing is broken.

v2:
Remove useless comment

Signed-off-by: Sebastian Poehn
Signed-off-by: David S. Miller

Sebastian Pöhn
2015-04-21 02:07:33 +0800

20 Apr, 2015

1 commit

5cf7bd301 libceph: expose client options through debugfs ... Browse Code »

Add a client_options attribute for showing libceph options.

Signed-off-by: Ilya Dryomov

Ilya Dryomov
2015-04-20 23:55:39 +0800