Eric Lee / smarc-fsl-linux-kernel

27 Apr, 2015

2 commits

59953fba8 Merge tag 'nfs-for-4.1-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs ... Browse Code »

Pull NFS client updates from Trond Myklebust:
"Another set of mainly bugfixes and a couple of cleanups. No new
functionality in this round.

Highlights include:

Stable patches:
- Fix a regression in /proc/self/mountstats
- Fix the pNFS flexfiles O_DIRECT support
- Fix high load average due to callback thread sleeping

Bugfixes:
- Various patches to fix the pNFS layoutcommit support
- Do not cache pNFS deviceids unless server notifications are enabled
- Fix a SUNRPC transport reconnection regression
- make debugfs file creation failure non-fatal in SUNRPC
- Another fix for circular directory warnings on NFSv4 "junctioned"
mountpoints
- Fix locking around NFSv4.2 fallocate() support
- Truncating NFSv4 file opens should also sync O_DIRECT writes
- Prevent infinite loop in rpcrdma_ep_create()

Features:
- Various improvements to the RDMA transport code's handling of
memory registration
- Various code cleanups"

* tag 'nfs-for-4.1-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (55 commits)
fs/nfs: fix new compiler warning about boolean in switch
nfs: Remove unneeded casts in nfs
NFS: Don't attempt to decode missing directory entries
Revert "nfs: replace nfs_add_stats with nfs_inc_stats when add one"
NFS: Rename idmap.c to nfs4idmap.c
NFS: Move nfs_idmap.h into fs/nfs/
NFS: Remove CONFIG_NFS_V4 checks from nfs_idmap.h
NFS: Add a stub for GETDEVICELIST
nfs: remove WARN_ON_ONCE from nfs_direct_good_bytes
nfs: fix DIO good bytes calculation
nfs: Fetch MOUNTED_ON_FILEID when updating an inode
sunrpc: make debugfs file creation failure non-fatal
nfs: fix high load average due to callback thread sleeping
NFS: Reduce time spent holding the i_mutex during fallocate()
NFS: Don't zap caches on fallocate()
xprtrdma: Make rpcrdma_{un}map_one() into inline functions
xprtrdma: Handle non-SEND completions via a callout
xprtrdma: Add "open" memreg op
xprtrdma: Add "destroy MRs" memreg op
xprtrdma: Add "reset MRs" memreg op
...

Linus Torvalds
2015-04-27 08:33:59 +0800
9ec3a646f Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs ... Browse Code »

Pull fourth vfs update from Al Viro:
"d_inode() annotations from David Howells (sat in for-next since before
the beginning of merge window) + four assorted fixes"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
RCU pathwalk breakage when running into a symlink overmounting something
fix I_DIO_WAKEUP definition
direct-io: only inc/dec inode->i_dio_count for file systems
fs/9p: fix readdir()
VFS: assorted d_backing_inode() annotations
VFS: fs/inode.c helpers: d_inode() annotations
VFS: fs/cachefiles: d_backing_inode() annotations
VFS: fs library helpers: d_inode() annotations
VFS: assorted weird filesystems: d_inode() annotations
VFS: normal filesystems (and lustre): d_inode() annotations
VFS: security/: d_inode() annotations
VFS: security/: d_backing_inode() annotations
VFS: net/: d_inode() annotations
VFS: net/unix: d_backing_inode() annotations
VFS: kernel/: d_inode() annotations
VFS: audit: d_backing_inode() annotations
VFS: Fix up some ->d_inode accesses in the chelsio driver
VFS: Cachefiles should perform fs modifications on the top layer only
VFS: AF_UNIX sockets should call mknod on the top layer only

Linus Torvalds
2015-04-27 08:22:07 +0800

24 Apr, 2015

3 commits

f139b6c67 Merge tag 'nfs-rdma-for-4.1-1' of git://git.linux-nfs.org/projects/anna/nfs-rdma ... Browse Code »

NFS: NFSoRDMA Client Changes

This patch series creates an operation vector for each of the different
memory registration modes. This should make it easier to one day increase
credit limit, rsize, and wsize.

Signed-off-by: Anna Schumaker

Trond Myklebust
2015-04-24 03:16:37 +0800
21330b667 Merge branch 'bugfixes' ... Browse Code »

* bugfixes:
NFSv4: Return delegations synchronously in evict_inode
SUNRPC: Fix a regression when reconnecting
NFS: remount with security change should return EINVAL
nfs: do not export discarded symbols
NFSv4.1: don't export static symbol

Trond Myklebust
2015-04-24 03:16:27 +0800
3f9400981 sunrpc: make debugfs file creation failure non-fatal ... Browse Code »

v2: gracefully handle the case where some dentry pointers end up NULL
and be more dilligent about zeroing out dentry pointers

We currently have a problem that SELinux policy is being enforced when
creating debugfs files. If a debugfs file is created as a side effect of
doing some syscall, then that creation can fail if the SELinux policy
for that process prevents it.

This seems wrong. We don't do that for files under /proc, for instance,
so Bruce has proposed a patch to fix that.

While discussing that patch however, Greg K.H. stated:

"No kernel code should care / fail if a debugfs function fails, so
please fix up the sunrpc code first."

This patch converts all of the sunrpc debugfs setup code to be void
return functins, and the callers to not look for errors from those
functions.

This should allow rpc_clnt and rpc_xprt creation to work, even if the
kernel fails to create debugfs files for some reason.

Cc: Greg Kroah-Hartman
Acked-by: "J. Bruce Fields"
Signed-off-by: Jeff Layton
Signed-off-by: Trond Myklebust

Jeff Layton
2015-04-24 02:42:27 +0800

23 Apr, 2015

1 commit

1204c4644 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client ... Browse Code »

Pull Ceph updates from Sage Weil:
"This time around we have a collection of CephFS fixes from Zheng
around MDS failure handling and snapshots, support for a new CRUSH
straw2 algorithm (to sync up with userspace) and several RBD cleanups
and fixes from Ilya, an error path leak fix from Taesoo, and then an
assorted collection of cleanups from others"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (28 commits)
rbd: rbd_wq comment is obsolete
libceph: announce support for straw2 buckets
crush: straw2 bucket type with an efficient 64-bit crush_ln()
crush: ensuring at most num-rep osds are selected
crush: drop unnecessary include from mapper.c
ceph: fix uninline data function
ceph: rename snapshot support
ceph: fix null pointer dereference in send_mds_reconnect()
ceph: hold on to exclusive caps on complete directories
libceph: simplify our debugfs attr macro
ceph: show non-default options only
libceph: expose client options through debugfs
libceph, ceph: split ceph_show_options()
rbd: mark block queue as non-rotational
libceph: don't overwrite specific con error msgs
ceph: cleanup unsafe requests when reconnecting is denied
ceph: don't zero i_wrbuffer_ref when reconnecting is denied
ceph: don't mark dirty caps when there is no auth cap
ceph: keep i_snap_realm while there are writers
libceph: osdmap.h: Add missing format newlines
...

Linus Torvalds
2015-04-23 02:30:10 +0800

22 Apr, 2015

5 commits

958a27658 crush: straw2 bucket type with an efficient 64-bit crush_ln() ... Browse Code »

This is an improved straw bucket that correctly avoids any data movement
between items A and B when neither A nor B's weights are changed. Said
differently, if we adjust the weight of item C (including adding it anew
or removing it completely), we will only see inputs move to or from C,
never between other items in the bucket.

Notably, there is not intermediate scaling factor that needs to be
calculated. The mapping function is a simple function of the item weights.

The below commits were squashed together into this one (mostly to avoid
adding and then yanking a ~6000 lines worth of crush_ln_table):

- crush: add a straw2 bucket type
- crush: add crush_ln to calculate nature log efficently
- crush: improve straw2 adjustment slightly
- crush: change crush_ln to provide 32 more digits
- crush: fix crush_get_bucket_item_weight and bucket destroy for straw2
- crush/mapper: fix divide-by-0 in straw2
(with div64_s64() for draw = ln / w and INT64_MIN -> S64_MIN - need
to create a proper compat.h in ceph.git)

Reflects ceph.git commits 242293c908e923d474910f2b8203fa3b41eb5a53,
32a1ead92efcd351822d22a5fc37d159c65c1338,
6289912418c4a3597a11778bcf29ed5415117ad9,
35fcb04e2945717cf5cfe150b9fa89cb3d2303a1,
6445d9ee7290938de1e4ee9563912a6ab6d8ee5f,
b5921d55d16796e12d66ad2c4add7305f9ce2353.

Signed-off-by: Ilya Dryomov

Ilya Dryomov
2015-04-22 23:33:43 +0800
45002267e crush: ensuring at most num-rep osds are selected ... Browse Code »

Crush temporary buffers are allocated as per replica size configured
by the user. When there are more final osds (to be selected as per
rule) than the replicas, buffer overlaps and it causes crash. Now, it
ensures that at most num-rep osds are selected even if more number of
osds are allowed by the rule.

Reflects ceph.git commits 6b4d1aa99718e3b367496326c1e64551330fabc0,
234b066ba04976783d15ff2abc3e81b6cc06fb10.

Signed-off-by: Ilya Dryomov

Ilya Dryomov
2015-04-22 23:33:42 +0800
9be6df215 crush: drop unnecessary include from mapper.c ... Browse Code »

Signed-off-by: Ilya Dryomov

Ilya Dryomov
2015-04-22 23:33:42 +0800
8aaa51b63 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net ... Browse Code »

Pull networking fixes from David Miller:
"Just a few fixes trickling in at this point.

1) If we see an attached socket on an skb in the ipv4 forwarding path,
bail. This can happen due to races with FIB rule addition, and
deletion, and we should just drop such frames. From Sebastian
Pöhn.

2) pppoe receive should only accept packets destined for this hosts's
MAC address. From Joakim Tjernlund.

3) Handle checksum unwrapping properly in ppp receive properly when
it's encapsulated in UDP in some way, fix from Tom Herbert.

4) Fix some bugs in mv88e6xxx DSA driver resulting from the conversion
from register offset constants to mnenomic macros. From Vivien
Didelot.

5) Fix handling of HCA max message size in mlx4 adapters, from Eran
Ben ELisha"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
net/mlx4_core: Fix reading HCA max message size in mlx4_QUERY_DEV_CAP
tcp: add memory barriers to write space paths
altera tse: Error-Bit on tx-avalon-stream always set.
net: dsa: mv88e6xxx: use PORT_DEFAULT_VLAN
net: dsa: mv88e6xxx: fix setup of port control 1
ppp: call skb_checksum_complete_unset in ppp_receive_frame
net: add skb_checksum_complete_unset
pppoe: Lacks DST MAC address check
ip_forward: Drop frames with attached skb->sk

Linus Torvalds
2015-04-22 13:37:27 +0800
3c7151275 tcp: add memory barriers to write space paths ... Browse Code »

Ensure that we either see that the buffer has write space
in tcp_poll() or that we perform a wakeup from the input
side. Did not run into any actual problem here, but thought
that we should make things explicit.

Signed-off-by: Jason Baron
Signed-off-by: David S. Miller

jbaron@akamai.com
2015-04-22 03:57:34 +0800

21 Apr, 2015

1 commit

2ab957492 ip_forward: Drop frames with attached skb->sk ... Browse Code »

Initial discussion was:
[FYI] xfrm: Don't lookup sk_policy for timewait sockets

Forwarded frames should not have a socket attached. Especially
tw sockets will lead to panics later-on in the stack.

This was observed with TPROXY assigning a tw socket and broken
policy routing (misconfigured). As a result frame enters
forwarding path instead of input. We cannot solve this in
TPROXY as it cannot know that policy routing is broken.

v2:
Remove useless comment

Signed-off-by: Sebastian Poehn
Signed-off-by: David S. Miller

Sebastian Pöhn
2015-04-21 02:07:33 +0800

20 Apr, 2015

3 commits

5cf7bd301 libceph: expose client options through debugfs ... Browse Code »

Add a client_options attribute for showing libceph options.

Signed-off-by: Ilya Dryomov

Ilya Dryomov
2015-04-20 23:55:39 +0800
ff40f9ae9 libceph, ceph: split ceph_show_options() ... Browse Code »

Split ceph_show_options() into two pieces and move the piece
responsible for printing client (libceph) options into net/ceph. This
way people adding a libceph option wouldn't have to remember to update
code in fs/ceph.

Signed-off-by: Ilya Dryomov

Ilya Dryomov
2015-04-20 23:55:38 +0800
67c64eb74 libceph: don't overwrite specific con error msgs ... Browse Code »

- specific con->error_msg messages (e.g. "protocol version mismatch")
end up getting overwritten by a catch-all "socket error on read
/ write", introduced in commit 3a140a0d5c4b ("libceph: report socket
read/write error message")
- "bad message sequence # for incoming message" loses to "bad crc" due
to the fact that -EBADMSG is used for both

Fix it, and tidy up con->error_msg assignments and pr_errs while at it.

Signed-off-by: Ilya Dryomov

Ilya Dryomov
2015-04-20 23:55:37 +0800

19 Apr, 2015

1 commit

dba94f215 Merge tag 'for-linus-4.1-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs ... Browse Code »

Pull 9pfs updates from Eric Van Hensbergen:
"Some accumulated cleanup patches for kerneldoc and unused variables as
well as some lock bug fixes and adding privateport option for RDMA"

* tag 'for-linus-4.1-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs:
net/9p: add a privport option for RDMA transport.
fs/9p: Initialize status in v9fs_file_do_lock.
net/9p: Initialize opts->privport as it should be.
net/9p: use memcpy() instead of snprintf() in p9_mount_tag_show()
9p: use unsigned integers for nwqid/count
9p: do not crash on unknown lock status code
9p: fix error handling in v9fs_file_do_lock
9p: remove unused variable in p9_fd_create()
9p: kerneldoc warning fixes

Linus Torvalds
2015-04-19 05:45:30 +0800

18 Apr, 2015

7 commits

1f5014d6a Bluetooth: hidp: Fix regression with older userspace and flags validation ... Browse Code »

While it is not used by newer userspace anymore, the older userspace was
utilizing HIDP_VIRTUAL_CABLE_UNPLUG and HIDP_BOOT_PROTOCOL_MODE flags
when adding a new HIDP connection.

The flags validation is important, but we can not break older userspace
and with that allow providing these flags even if newer userspace does
not use them anymore.

Reported-and-tested-by: Jörg Otte
Signed-off-by: Marcel Holtmann
Signed-off-by: Linus Torvalds

Marcel Holtmann
2015-04-18 23:01:08 +0800
388f99762 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net ... Browse Code »

Pull networking fixes from David Miller:

1) Fix verifier memory corruption and other bugs in BPF layer, from
Alexei Starovoitov.

2) Add a conservative fix for doing BPF properly in the BPF classifier
of the packet scheduler on ingress. Also from Alexei.

3) The SKB scrubber should not clear out the packet MARK and security
label, from Herbert Xu.

4) Fix oops on rmmod in stmmac driver, from Bryan O'Donoghue.

5) Pause handling is not correct in the stmmac driver because it
doesn't take into consideration the RX and TX fifo sizes. From
Vince Bridgers.

6) Failure path missing unlock in FOU driver, from Wang Cong.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (44 commits)
net: dsa: use DEVICE_ATTR_RW to declare temp1_max
netns: remove BUG_ONs from net_generic()
IB/ipoib: Fix ndo_get_iflink
sfc: Fix memcpy() with const destination compiler warning.
altera tse: Fix network-delays and -retransmissions after high throughput.
net: remove unused 'dev' argument from netif_needs_gso()
act_mirred: Fix bogus header when redirecting from VLAN
inet_diag: fix access to tcp cc information
tcp: tcp_get_info() should fetch socket fields once
net: dsa: mv88e6xxx: Add missing initialization in mv88e6xxx_set_port_state()
skbuff: Do not scrub skb mark within the same name space
Revert "net: Reset secmark when scrubbing packet"
bpf: fix two bugs in verification logic when accessing 'ctx' pointer
bpf: fix bpf helpers to use skb->mac_header relative offsets
stmmac: Configure Flow Control to work correctly based on rxfifo size
stmmac: Enable unicast pause frame detect in GMAC Register 6
stmmac: Read tx-fifo-depth and rx-fifo-depth from the devicetree
stmmac: Add defines and documentation for enabling flow control
stmmac: Add properties for transmit and receive fifo sizes
stmmac: fix oops on rmmod after assigning ip addr
...

Linus Torvalds
2015-04-18 04:31:08 +0800
e3122b7fa net: dsa: use DEVICE_ATTR_RW to declare temp1_max ... Browse Code »

Since commit da4759c (sysfs: Use only return value from is_visible for
the file mode), it is possible to reduce the permissions of a file.

So declare temp1_max with the DEVICE_ATTR_RW macro and remove the write
permission in dsa_hwmon_attrs_visible if set_temp_limit isn't provided.

Signed-off-by: Vivien Didelot
Reviewed-by: Guenter Roeck
Signed-off-by: David S. Miller

Vivien Didelot
2015-04-18 03:58:37 +0800
8b86a61da net: remove unused 'dev' argument from netif_needs_gso() ... Browse Code »

In commit 04ffcb255f22 ("net: Add ndo_gso_check") Tom originally
added the 'dev' argument to be able to call ndo_gso_check().

Then later, when generalizing this in commit 5f35227ea34b
("net: Generalize ndo_gso_check to ndo_features_check")
Jesse removed the call to ndo_gso_check() in netif_needs_gso()
by calling the new ndo_features_check() in a different place.
This made the 'dev' argument unused.

Remove the unused argument and go back to the code as before.

Cc: Tom Herbert
Cc: Jesse Gross
Signed-off-by: Johannes Berg
Signed-off-by: David S. Miller

Johannes Berg
2015-04-18 01:29:41 +0800
f40ae9130 act_mirred: Fix bogus header when redirecting from VLAN ... Browse Code »

When you redirect a VLAN device to any device, you end up with
crap in af_packet on the xmit path because hard_header_len is
not equal to skb->mac_len. So the redirected packet contains
four extra bytes at the start which then gets interpreted as
part of the MAC address.

This patch fixes this by only pushing skb->mac_len. We also
need to fix ifb because it tries to undo the pushing done by
act_mirred.

Signed-off-by: Herbert Xu
Acked-by: Alexei Starovoitov
Signed-off-by: David S. Miller

Herbert Xu
2015-04-18 01:29:28 +0800
521f1cf1d inet_diag: fix access to tcp cc information ... Browse Code »

Two different problems are fixed here :

1) inet_sk_diag_fill() might be called without socket lock held.
icsk->icsk_ca_ops can change under us and module be unloaded.
-> Access to freed memory.
Fix this using rcu_read_lock() to prevent module unload.

2) Some TCP Congestion Control modules provide information
but again this is not safe against icsk->icsk_ca_ops
change and nla_put() errors were ignored. Some sockets
could not get the additional info if skb was almost full.

Fix this by returning a status from get_info() handlers and
using rcu protection as well.

Signed-off-by: Eric Dumazet
Acked-by: Daniel Borkmann
Signed-off-by: David S. Miller

Eric Dumazet
2015-04-18 01:28:31 +0800
fad9dfefe tcp: tcp_get_info() should fetch socket fields once ... Browse Code »

tcp_get_info() can be called without holding socket lock,
so any socket fields can change under us.

Use READ_ONCE() to fetch sk_pacing_rate and sk_max_pacing_rate

Fixes: 977cb0ecf82e ("tcp: add pacing_rate information into tcp_info")
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2015-04-18 01:28:31 +0800

17 Apr, 2015

5 commits

213dd74ae skbuff: Do not scrub skb mark within the same name space ... Browse Code »

On Wed, Apr 15, 2015 at 05:41:26PM +0200, Nicolas Dichtel wrote:
> Le 15/04/2015 15:57, Herbert Xu a écrit :
> >On Wed, Apr 15, 2015 at 06:22:29PM +0800, Herbert Xu wrote:
> [snip]
> >Subject: skbuff: Do not scrub skb mark within the same name space
> >
> >The commit ea23192e8e577dfc51e0f4fc5ca113af334edff9 ("tunnels:
> Maybe add a Fixes tag?
> Fixes: ea23192e8e57 ("tunnels: harmonize cleanup done on skb on rx path")
>
> >harmonize cleanup done on skb on rx path") broke anyone trying to
> >use netfilter marking across IPv4 tunnels. While most of the
> >fields that are cleared by skb_scrub_packet don't matter, the
> >netfilter mark must be preserved.
> >
> >This patch rearranges skb_scurb_packet to preserve the mark field.
> nit: s/scurb/scrub
>
> Else it's fine for me.

Sure.

PS I used the wrong email for James the first time around. So
let me repeat the question here. Should secmark be preserved
or cleared across tunnels within the same name space? In fact,
do our security models even support name spaces?

---8
Acked-by: Thomas Graf
Signed-off-by: David S. Miller

Herbert Xu
2015-04-17 02:20:40 +0800
4c0ee414e Revert "net: Reset secmark when scrubbing packet" ... Browse Code »

This patch reverts commit b8fb4e0648a2ab3734140342002f68fb0c7d1602
because the secmark must be preserved even when a packet crosses
namespace boundaries. The reason is that security labels apply to
the system as a whole and is not per-namespace.

Signed-off-by: Herbert Xu
Signed-off-by: David S. Miller

Herbert Xu
2015-04-17 02:19:02 +0800
a166151cb bpf: fix bpf helpers to use skb->mac_header relative offsets ... Browse Code »

For the short-term solution, lets fix bpf helper functions to use
skb->mac_header relative offsets instead of skb->data in order to
get the same eBPF programs with cls_bpf and act_bpf work on ingress
and egress qdisc path. We need to ensure that mac_header is set
before calling into programs. This is effectively the first option
from below referenced discussion.

More long term solution for LD_ABS|LD_IND instructions will be more
intrusive but also more beneficial than this, and implemented later
as it's too risky at this point in time.

I.e., we plan to look into the option of moving skb_pull() out of
eth_type_trans() and into netif_receive_skb() as has been suggested
as second option. Meanwhile, this solution ensures ingress can be
used with eBPF, too, and that we won't run into ABI troubles later.
For dealing with negative offsets inside eBPF helper functions,
we've implemented bpf_skb_clone_unwritable() to test for unwriteable
headers.

Reference: http://thread.gmane.org/gmane.linux.network/359129/focus=359694
Fixes: 608cd71a9c7c ("tc: bpf: generalize pedit action")
Fixes: 91bc4822c3d6 ("tc: bpf: add checksum helpers")
Signed-off-by: Alexei Starovoitov
Signed-off-by: Daniel Borkmann
Signed-off-by: David S. Miller

Alexei Starovoitov
2015-04-17 02:08:49 +0800
5a950ad58 netns: remove duplicated include from net_namespace.c ... Browse Code »

Remove duplicated include.

Signed-off-by: Wei Yongjun
Acked-by: Nicolas Dichtel
Signed-off-by: David S. Miller

Wei Yongjun
2015-04-17 00:14:24 +0800
540207ae6 fou: avoid missing unlock in failure path ... Browse Code »

Fixes: 7a6c8c34e5b7 ("fou: implement FOU_CMD_GET")
Reported-by: Dan Carpenter
Cc: Dan Carpenter
Signed-off-by: Cong Wang
Signed-off-by: David S. Miller

WANG Cong
2015-04-17 00:11:19 +0800

16 Apr, 2015

8 commits

eea3a0026 Merge branch 'akpm' (patches from Andrew) ... Browse Code »

Merge second patchbomb from Andrew Morton:

- the rest of MM

- various misc bits

- add ability to run /sbin/reboot at reboot time

- printk/vsprintf changes

- fiddle with seq_printf() return value

* akpm: (114 commits)
parisc: remove use of seq_printf return value
lru_cache: remove use of seq_printf return value
tracing: remove use of seq_printf return value
cgroup: remove use of seq_printf return value
proc: remove use of seq_printf return value
s390: remove use of seq_printf return value
cris fasttimer: remove use of seq_printf return value
cris: remove use of seq_printf return value
openrisc: remove use of seq_printf return value
ARM: plat-pxa: remove use of seq_printf return value
nios2: cpuinfo: remove use of seq_printf return value
microblaze: mb: remove use of seq_printf return value
ipc: remove use of seq_printf return value
rtc: remove use of seq_printf return value
power: wakeup: remove use of seq_printf return value
x86: mtrr: if: remove use of seq_printf return value
linux/bitmap.h: improve BITMAP_{LAST,FIRST}_WORD_MASK
MAINTAINERS: CREDITS: remove Stefano Brivio from B43
.mailmap: add Ricardo Ribalda
CREDITS: add Ricardo Ribalda Delgado
...

Linus Torvalds
2015-04-16 07:39:15 +0800
41416f233 lib/string_helpers.c: change semantics of string_escape_mem ... Browse Code »

The current semantics of string_escape_mem are inadequate for one of its
current users, vsnprintf(). If that is to honour its contract, it must
know how much space would be needed for the entire escaped buffer, and
string_escape_mem provides no way of obtaining that (short of allocating a
large enough buffer (~4 times input string) to let it play with, and
that's definitely a big no-no inside vsnprintf).

So change the semantics for string_escape_mem to be more snprintf-like:
Return the size of the output that would be generated if the destination
buffer was big enough, but of course still only write to the part of dst
it is allowed to, and (contrary to snprintf) don't do '\0'-termination.
It is then up to the caller to detect whether output was truncated and to
append a '\0' if desired. Also, we must output partial escape sequences,
otherwise a call such as snprintf(buf, 3, "%1pE", "\123") would cause
printf to write a \0 to buf[2] but leaving buf[0] and buf[1] with whatever
they previously contained.

This also fixes a bug in the escaped_string() helper function, which used
to unconditionally pass a length of "end-buf" to string_escape_mem();
since the latter doesn't check osz for being insanely large, it would
happily write to dst. For example, kasprintf(GFP_KERNEL, "something and
then %pE", ...); is an easy way to trigger an oops.

In test-string_helpers.c, the -ENOMEM test is replaced with testing for
getting the expected return value even if the buffer is too small. We
also ensure that nothing is written (by relying on a NULL pointer deref)
if the output size is 0 by passing NULL - this has to work for
kasprintf("%pE") to work.

In net/sunrpc/cache.c, I think qword_add still has the same semantics.
Someone should definitely double-check this.

In fs/proc/array.c, I made the minimum possible change, but longer-term it
should stop poking around in seq_file internals.

[andriy.shevchenko@linux.intel.com: simplify qword_add]
[andriy.shevchenko@linux.intel.com: add missed curly braces]
Signed-off-by: Rasmus Villemoes
Acked-by: Andy Shevchenko
Signed-off-by: Andy Shevchenko
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Rasmus Villemoes
2015-04-16 07:35:24 +0800
2813893f8 kernel: conditionally support non-root users, groups and capabilities ... Browse Code »

There are a lot of embedded systems that run most or all of their
functionality in init, running as root:root. For these systems,
supporting multiple users is not necessary.

This patch adds a new symbol, CONFIG_MULTIUSER, that makes support for
non-root users, non-root groups, and capabilities optional. It is enabled
under CONFIG_EXPERT menu.

When this symbol is not defined, UID and GID are zero in any possible case
and processes always have all capabilities.

The following syscalls are compiled out: setuid, setregid, setgid,
setreuid, setresuid, getresuid, setresgid, getresgid, setgroups,
getgroups, setfsuid, setfsgid, capget, capset.

Also, groups.c is compiled out completely.

In kernel/capability.c, capable function was moved in order to avoid
adding two ifdef blocks.

This change saves about 25 KB on a defconfig build. The most minimal
kernels have total text sizes in the high hundreds of kB rather than
low MB. (The 25k goes down a bit with allnoconfig, but not that much.

The kernel was booted in Qemu. All the common functionalities work.
Adding users/groups is not possible, failing with -ENOSYS.

Bloat-o-meter output:
add/remove: 7/87 grow/shrink: 19/397 up/down: 1675/-26325 (-24650)

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Iulia Manda
Reviewed-by: Josh Triplett
Acked-by: Geert Uytterhoeven
Tested-by: Paul E. McKenney
Reviewed-by: Paul E. McKenney
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Iulia Manda
2015-04-16 07:35:22 +0800
fa927894b Merge branch 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs ... Browse Code »

Pull second vfs update from Al Viro:
"Now that net-next went in... Here's the next big chunk - killing
->aio_read() and ->aio_write().

There'll be one more pile today (direct_IO changes and
generic_write_checks() cleanups/fixes), but I'd prefer to keep that
one separate"

* 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (37 commits)
->aio_read and ->aio_write removed
pcm: another weird API abuse
infinibad: weird APIs switched to ->write_iter()
kill do_sync_read/do_sync_write
fuse: use iov_iter_get_pages() for non-splice path
fuse: switch to ->read_iter/->write_iter
switch drivers/char/mem.c to ->read_iter/->write_iter
make new_sync_{read,write}() static
coredump: accept any write method
switch /dev/loop to vfs_iter_write()
serial2002: switch to __vfs_read/__vfs_write
ashmem: use __vfs_read()
export __vfs_read()
autofs: switch to __vfs_write()
new helper: __vfs_write()
switch hugetlbfs to ->read_iter()
coda: switch to ->read_iter/->write_iter
ncpfs: switch to ->read_iter/->write_iter
net/9p: remove (now-)unused helpers
p9_client_attach(): set fid->uid correctly
...

Linus Torvalds
2015-04-16 04:22:56 +0800
c5ef60352 VFS: net/: d_inode() annotations ... Browse Code »

socket inodes and sunrpc filesystems - inodes owned by that code

Signed-off-by: David Howells
Signed-off-by: Al Viro

David Howells
2015-04-16 03:06:56 +0800
a25b376bd VFS: net/unix: d_backing_inode() annotations ... Browse Code »

places where we are dealing with S_ISSOCK file creation/lookups.

Signed-off-by: David Howells
Signed-off-by: Al Viro

David Howells
2015-04-16 03:06:56 +0800
ee8ac4d61 VFS: AF_UNIX sockets should call mknod on the top layer only ... Browse Code »

AF_UNIX sockets should call mknod on the top layer only and should not attempt
to modify the lower layer in a layered filesystem such as overlayfs.

Signed-off-by: David Howells
Signed-off-by: Al Viro

David Howells
2015-04-16 03:06:54 +0800
6c373ca89 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next ... Browse Code »

Pull networking updates from David Miller:

1) Add BQL support to via-rhine, from Tino Reichardt.

2) Integrate SWITCHDEV layer support into the DSA layer, so DSA drivers
can support hw switch offloading. From Floria Fainelli.

3) Allow 'ip address' commands to initiate multicast group join/leave,
from Madhu Challa.

4) Many ipv4 FIB lookup optimizations from Alexander Duyck.

5) Support EBPF in cls_bpf classifier and act_bpf action, from Daniel
Borkmann.

6) Remove the ugly compat support in ARP for ugly layers like ax25,
rose, etc. And use this to clean up the neigh layer, then use it to
implement MPLS support. All from Eric Biederman.

7) Support L3 forwarding offloading in switches, from Scott Feldman.

8) Collapse the LOCAL and MAIN ipv4 FIB tables when possible, to speed
up route lookups even further. From Alexander Duyck.

9) Many improvements and bug fixes to the rhashtable implementation,
from Herbert Xu and Thomas Graf. In particular, in the case where
an rhashtable user bulk adds a large number of items into an empty
table, we expand the table much more sanely.

10) Don't make the tcp_metrics hash table per-namespace, from Eric
Biederman.

11) Extend EBPF to access SKB fields, from Alexei Starovoitov.

12) Split out new connection request sockets so that they can be
established in the main hash table. Much less false sharing since
hash lookups go direct to the request sockets instead of having to
go first to the listener then to the request socks hashed
underneath. From Eric Dumazet.

13) Add async I/O support for crytpo AF_ALG sockets, from Tadeusz Struk.

14) Support stable privacy address generation for RFC7217 in IPV6. From
Hannes Frederic Sowa.

15) Hash network namespace into IP frag IDs, also from Hannes Frederic
Sowa.

16) Convert PTP get/set methods to use 64-bit time, from Richard
Cochran.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1816 commits)
fm10k: Bump driver version to 0.15.2
fm10k: corrected VF multicast update
fm10k: mbx_update_max_size does not drop all oversized messages
fm10k: reset head instead of calling update_max_size
fm10k: renamed mbx_tx_dropped to mbx_tx_oversized
fm10k: update xcast mode before synchronizing multicast addresses
fm10k: start service timer on probe
fm10k: fix function header comment
fm10k: comment next_vf_mbx flow
fm10k: don't handle mailbox events in iov_event path and always process mailbox
fm10k: use separate workqueue for fm10k driver
fm10k: Set PF queues to unlimited bandwidth during virtualization
fm10k: expose tx_timeout_count as an ethtool stat
fm10k: only increment tx_timeout_count in Tx hang path
fm10k: remove extraneous "Reset interface" message
fm10k: separate PF only stats so that VF does not display them
fm10k: use hw->mac.max_queues for stats
fm10k: only show actual queues, not the maximum in hardware
fm10k: allow creation of VLAN on default vid
fm10k: fix unused warnings
...

Linus Torvalds
2015-04-16 00:00:47 +0800

15 Apr, 2015

4 commits

1dcf58d6e Merge branch 'akpm' (patches from Andrew) ... Browse Code »

Merge first patchbomb from Andrew Morton:

- arch/sh updates

- ocfs2 updates

- kernel/watchdog feature

- about half of mm/

* emailed patches from Andrew Morton : (122 commits)
Documentation: update arch list in the 'memtest' entry
Kconfig: memtest: update number of test patterns up to 17
arm: add support for memtest
arm64: add support for memtest
memtest: use phys_addr_t for physical addresses
mm: move memtest under mm
mm, hugetlb: abort __get_user_pages if current has been oom killed
mm, mempool: do not allow atomic resizing
memcg: print cgroup information when system panics due to panic_on_oom
mm: numa: remove migrate_ratelimited
mm: fold arch_randomize_brk into ARCH_HAS_ELF_RANDOMIZE
mm: split ET_DYN ASLR from mmap ASLR
s390: redefine randomize_et_dyn for ELF_ET_DYN_BASE
mm: expose arch_mmap_rnd when available
s390: standardize mmap_rnd() usage
powerpc: standardize mmap_rnd() usage
mips: extract logic for mmap_rnd()
arm64: standardize mmap_rnd() usage
x86: standardize mmap_rnd() usage
arm: factor out mmap ASLR into mmap_rnd
...

Linus Torvalds
2015-04-15 07:49:17 +0800
4167e9b2c mm: remove GFP_THISNODE ... Browse Code »

NOTE: this is not about __GFP_THISNODE, this is only about GFP_THISNODE.

GFP_THISNODE is a secret combination of gfp bits that have different
behavior than expected. It is a combination of __GFP_THISNODE,
__GFP_NORETRY, and __GFP_NOWARN and is special-cased in the page
allocator slowpath to fail without trying reclaim even though it may be
used in combination with __GFP_WAIT.

An example of the problem this creates: commit e97ca8e5b864 ("mm: fix
GFP_THISNODE callers and clarify") fixed up many users of GFP_THISNODE
that really just wanted __GFP_THISNODE. The problem doesn't end there,
however, because even it was a no-op for alloc_misplaced_dst_page(),
which also sets __GFP_NORETRY and __GFP_NOWARN, and
migrate_misplaced_transhuge_page(), where __GFP_NORETRY and __GFP_NOWAIT
is set in GFP_TRANSHUGE. Converting GFP_THISNODE to __GFP_THISNODE is a
no-op in these cases since the page allocator special-cases
__GFP_THISNODE && __GFP_NORETRY && __GFP_NOWARN.

It's time to just remove GFP_THISNODE entirely. We leave __GFP_THISNODE
to restrict an allocation to a local node, but remove GFP_THISNODE and
its obscurity. Instead, we require that a caller clear __GFP_WAIT if it
wants to avoid reclaim.

This allows the aforementioned functions to actually reclaim as they
should. It also enables any future callers that want to do
__GFP_THISNODE but also __GFP_NORETRY && __GFP_NOWARN to reclaim. The
rule is simple: if you don't want to reclaim, then don't set __GFP_WAIT.

Aside: ovs_flow_stats_update() really wants to avoid reclaim as well, so
it is unchanged.

Signed-off-by: David Rientjes
Acked-by: Vlastimil Babka
Cc: Christoph Lameter
Acked-by: Pekka Enberg
Cc: Joonsoo Kim
Acked-by: Johannes Weiner
Cc: Mel Gorman
Cc: Pravin Shelar
Cc: Jarno Rajahalme
Cc: Li Zefan
Cc: Greg Thelen
Cc: Tejun Heo
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

David Rientjes
2015-04-15 07:49:03 +0800
bae97d841 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next ... Browse Code »

Pablo Neira Ayuso says:

====================
Netfilter updates for net-next

A final pull request, I know it's very late but this time I think it's worth a
bit of rush.

The following patchset contains Netfilter/nf_tables updates for net-next, more
specifically concatenation support and dynamic stateful expression
instantiation.

This also comes with a couple of small patches. One to fix the ebtables.h
userspace header and another to get rid of an obsolete example file in tree
that describes a nf_tables expression.

This time, I decided to paste the original descriptions. This will result in a
rather large commit description, but I think these bytes to keep.

Patrick McHardy says:

====================
netfilter: nf_tables: concatenation support

The following patches add support for concatenations, which allow multi
dimensional exact matches in O(1).

The basic idea is to split the data registers, currently consisting of
4 registers of 16 bytes each, into smaller units, 16 registers of 4
bytes each, and making sure each register store always leaves the
full 32 bit in a well defined state, meaning smaller stores will
zero the remaining bits.

Based on that, we can load multiple adjacent registers with different
values, thereby building a concatenated bigger value, and use that
value for set lookups.

Sets are changed to use variable sized extensions for their key and
data values, removing the fixed limit of 16 bytes while saving memory
if less space is needed.

As a side effect, these patches will allow some nice optimizations in
the future, like using jhash2 in nft_hash, removing the masking in
nft_cmp_fast, optimized data comparison using 32 bit word size etc.
These are not done so far however.

The patches are split up as follows:

* the first five patches add length validation to register loads and
stores to make sure we stay within bounds and prepare the validation
functions for the new addressing mode

* the next patches prepare for changing to 32 bit addressing by
introducing a struct nft_regs, which holds the verdict register as
well as the data registers. The verdict members are moved to a new
struct nft_verdict to allow to pull struct nft_data out of the stack.

* the next patches contain preparatory conversions of expressions and
sets to use 32 bit addressing

* the next patch introduces so far unused register conversion helpers
for parsing and dumping register numbers over netlink

* following is the real conversion to 32 bit addressing, consisting of
replacing struct nft_data in struct nft_regs by an array of u32s and
actually translating and validating the new register numbers.

* the final two patches add support for variable sized data items and
variable sized keys / data in set elements

The patches have been verified to work correctly with nft binaries using
both old and new addressing.
====================

Patrick McHardy says:

====================
netfilter: nf_tables: dynamic stateful expression instantiation

The following patches are the grand finale of my nf_tables set work,
using all the building blocks put in place by the previous patches
to support something like iptables hashlimit, but a lot more powerful.

Sets are extended to allow attaching expressions to set elements.
The dynset expression dynamically instantiates these expressions
based on a template when creating new set elements and evaluates
them for all new or updated set members.

In combination with concatenations this effectively creates state
tables for arbitrary combinations of keys, using the existing
expression types to maintain that state. Regular set GC takes care
of purging expired states.

We currently support two different stateful expressions, counter
and limit. Using limit as a template we can express the functionality
of hashlimit, but completely unrestricted in the combination of keys.
Using counter we can perform accounting for arbitrary flows.

The following examples from patch 5/5 show some possibilities.
Userspace syntax is still WIP, especially the listing of state
tables will most likely be seperated from normal set listings
and use a more structured format:

1. Limit the rate of new SSH connections per host, similar to iptables
hashlimit:

flow ip saddr timeout 60s \
limit 10/second \
accept

2. Account network traffic between each set of /24 networks:

flow ip saddr & 255.255.255.0 . ip daddr & 255.255.255.0 \
counter

3. Account traffic to each host per user:

flow skuid . ip daddr \
counter

4. Account traffic for each combination of source address and TCP flags:

flow ip saddr . tcp flags \
counter

The resulting set content after a Xmas-scan look like this:

{
192.168.122.1 . fin | psh | urg : counter packets 1001 bytes 40040,
192.168.122.1 . ack : counter packets 74 bytes 3848,
192.168.122.1 . psh | ack : counter packets 35 bytes 3144
}

In the future the "expressions attached to elements" will be extended
to also support user created non-stateful expressions to allow to
efficiently select beween a set of parameter sets, f.i. a set of log
statements with different prefixes based on the interface, which currently
require one rule each. This will most likely have to wait until the next
kernel version though.
====================

Signed-off-by: David S. Miller

David S. Miller
2015-04-15 06:51:19 +0800
ca2ec3265 Merge branch 'for-linus-1' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs ... Browse Code »

Pull vfs update from Al Viro:
"Part one:

- struct filename-related cleanups

- saner iov_iter_init() replacements (and switching the syscalls to
use of those)

- ntfs switch to ->write_iter() (Anton)

- aio cleanups and splitting iocb into common and async parts
(Christoph)

- assorted fixes (me, bfields, Andrew Elble)

There's a lot more, including the completion of switchover to
->{read,write}_iter(), d_inode/d_backing_inode annotations, f_flags
race fixes, etc, but that goes after #for-davem merge. David has
pulled it, and once it's in I'll send the next vfs pull request"

* 'for-linus-1' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (35 commits)
sg_start_req(): use import_iovec()
sg_start_req(): make sure that there's not too many elements in iovec
blk_rq_map_user(): use import_single_range()
sg_io(): use import_iovec()
process_vm_access: switch to {compat_,}import_iovec()
switch keyctl_instantiate_key_common() to iov_iter
switch {compat_,}do_readv_writev() to {compat_,}import_iovec()
aio_setup_vectored_rw(): switch to {compat_,}import_iovec()
vmsplice_to_user(): switch to import_iovec()
kill aio_setup_single_vector()
aio: simplify arguments of aio_setup_..._rw()
aio: lift iov_iter_init() into aio_setup_..._rw()
lift iov_iter into {compat_,}do_readv_writev()
NFS: fix BUG() crash in notify_change() with patch to chown_common()
dcache: return -ESTALE not -EBUSY on distributed fs race
NTFS: Version 2.1.32 - Update file write from aio_write to write_iter.
VFS: Add iov_iter_fault_in_multipages_readable()
drop bogus check in file_open_root()
switch security_inode_getattr() to struct path *
constify tomoyo_realpath_from_path()
...

Linus Torvalds
2015-04-15 06:31:03 +0800