Eric Lee / smarc-fsl-linux-kernel

27 Mar, 2016

1 commit

d5a38f6e4 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client ... Browse Code »

Pull Ceph updates from Sage Weil:
"There is quite a bit here, including some overdue refactoring and
cleanup on the mon_client and osd_client code from Ilya, scattered
writeback support for CephFS and a pile of bug fixes from Zheng, and a
few random cleanups and fixes from others"

[ I already decided not to pull this because of it having been rebased
recently, but ended up changing my mind after all. Next time I'll
really hold people to it. Oh well. - Linus ]

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (34 commits)
libceph: use KMEM_CACHE macro
ceph: use kmem_cache_zalloc
rbd: use KMEM_CACHE macro
ceph: use lookup request to revalidate dentry
ceph: kill ceph_get_dentry_parent_inode()
ceph: fix security xattr deadlock
ceph: don't request vxattrs from MDS
ceph: fix mounting same fs multiple times
ceph: remove unnecessary NULL check
ceph: avoid updating directory inode's i_size accidentally
ceph: fix race during filling readdir cache
libceph: use sizeof_footer() more
ceph: kill ceph_empty_snapc
ceph: fix a wrong comparison
ceph: replace CURRENT_TIME by current_fs_time()
ceph: scattered page writeback
libceph: add helper that duplicates last extent operation
libceph: enable large, variable-sized OSD requests
libceph: osdc->req_mempool should be backed by a slab pool
libceph: make r_request msg_size calculation clearer
...

Linus Torvalds
2016-03-27 06:53:16 +0800

26 Mar, 2016

17 commits

5ee61e95b libceph: use KMEM_CACHE macro ... Browse Code »

Use KMEM_CACHE() instead of kmem_cache_create() to simplify the code.

Signed-off-by: Geliang Tang
Signed-off-by: Ilya Dryomov

Geliang Tang
2016-03-26 01:51:57 +0800
89f081730 libceph: use sizeof_footer() more ... Browse Code »

Don't open-code sizeof_footer() in read_partial_message() and
ceph_msg_revoke(). Also, after switching to sizeof_footer(), it's now
possible to use con_out_kvec_add() in prepare_write_message_footer().

Signed-off-by: Ilya Dryomov
Reviewed-by: Alex Elder

Ilya Dryomov
2016-03-26 01:51:53 +0800
2c63f49a7 libceph: add helper that duplicates last extent operation ... Browse Code »

This helper duplicates last extent operation in OSD request, then
adjusts the new extent operation's offset and length. The helper
is for scatterd page writeback, which adds nonconsecutive dirty
pages to single OSD request.

Signed-off-by: Yan, Zheng
Signed-off-by: Ilya Dryomov

Yan, Zheng
2016-03-26 01:51:43 +0800
3f1af42ad libceph: enable large, variable-sized OSD requests ... Browse Code »

Turn r_ops into a flexible array member to enable large, consisting of
up to 16 ops, OSD requests. The use case is scattered writeback in
cephfs and, as far as the kernel client is concerned, 16 is just a made
up number.

r_ops had size 3 for copyup+hint+write, but copyup is really a special
case - it can only happen once. ceph_osd_request_cache is therefore
stuffed with num_ops=2 requests, anything bigger than that is allocated
with kmalloc(). req_mempool is backed by ceph_osd_request_cache, which
means either num_ops=1 or num_ops=2 for use_mempool=true - all existing
users (ceph_writepages_start(), ceph_osdc_writepages()) are fine with
that.

Signed-off-by: Ilya Dryomov

Ilya Dryomov
2016-03-26 01:51:43 +0800
9e767adbd libceph: osdc->req_mempool should be backed by a slab pool ... Browse Code »

ceph_osd_request_cache was introduced a long time ago. Also, osd_req
is about to get a flexible array member, which ceph_osd_request_cache
is going to be aware of.

Signed-off-by: Ilya Dryomov

Ilya Dryomov
2016-03-26 01:51:43 +0800
ae458f5a1 libceph: make r_request msg_size calculation clearer ... Browse Code »

Although msg_size is calculated correctly, the terms are grouped in
a misleading way - snaps appears to not have room for a u32 length.
Move calculation closer to its use and regroup terms.

No functional change.

Signed-off-by: Ilya Dryomov

Ilya Dryomov
2016-03-26 01:51:42 +0800
7665d85b7 libceph: move r_reply_op_{len,result} into struct ceph_osd_req_op ... Browse Code »

This avoids defining large array of r_reply_op_{len,result} in
in struct ceph_osd_request.

Signed-off-by: Yan, Zheng
Signed-off-by: Ilya Dryomov

Yan, Zheng
2016-03-26 01:51:42 +0800
de2aa102e libceph: rename ceph_osd_req_op::payload_len to indata_len ... Browse Code »

Follow userspace nomenclature on this - the next commit adds
outdata_len.

Signed-off-by: Ilya Dryomov

Ilya Dryomov
2016-03-26 01:51:41 +0800
b5d91704f libceph: behave in mon_fault() if cur_mon < 0 ... Browse Code »

This can happen if __close_session() in ceph_monc_stop() races with
a connection reset. We need to ignore such faults, otherwise it's
likely we would take !hunting, call __schedule_delayed() and end up
with delayed_work() executing on invalid memory, among other things.

The (two!) con->private tests are useless, as nothing ever clears
con->private. Nuke them.

Signed-off-by: Ilya Dryomov

Ilya Dryomov
2016-03-26 01:51:40 +0800
bee3a37c4 libceph: reschedule tick in mon_fault() ... Browse Code »

Doing __schedule_delayed() in the hunting branch is pointless, as the
tick will have already been scheduled by then.

What we need to do instead is *reschedule* it in the !hunting branch,
after reopen_session() changes hunt_mult, which affects the delay.
This helps with spacing out connection attempts and avoiding things
like two back-to-back attempts followed by a longer period of waiting
around.

Signed-off-by: Ilya Dryomov

Ilya Dryomov
2016-03-26 01:51:40 +0800
1752b50ca libceph: introduce and switch to reopen_session() ... Browse Code »

hunting is now set in __open_session() and cleared in finish_hunting(),
instead of all around. The "session lost" message is printed not only
on connection resets, but also on keepalive timeouts.

Signed-off-by: Ilya Dryomov

Ilya Dryomov
2016-03-26 01:51:39 +0800
168b9090c libceph: monc hunt rate is 3s with backoff up to 30s ... Browse Code »

Unless we are in the process of setting up a client (i.e. connecting to
the monitor cluster for the first time), apply a backoff: every time we
want to reopen a session, increase our timeout by a multiple (currently
2); when we complete the connection, reduce that multipler by 50%.

Mirrors ceph.git commit 794c86fd289bd62a35ed14368fa096c46736e9a2.

Signed-off-by: Ilya Dryomov

Ilya Dryomov
2016-03-26 01:51:39 +0800
58d81b129 libceph: monc ping rate is 10s ... Browse Code »

Split ping interval and ping timeout: ping interval is 10s; keepalive
timeout is 30s.

Make monc_ping_timeout a constant while at it - it's not actually
exported as a mount option (and the rest of tick-related settings won't
be either), so it's got no place in ceph_options.

Signed-off-by: Ilya Dryomov

Ilya Dryomov
2016-03-26 01:51:39 +0800
0e04dc26c libceph: pick a different monitor when reconnecting ... Browse Code »

Don't try to reconnect to the same monitor when we fail to establish
a session within a timeout or it's lost.

For that, pick_new_mon() needs to see the old value of cur_mon, so
don't clear it in __close_session() - all calls to __close_session()
but one are followed by __open_session() anyway. __open_session() is
only called when a new session needs to be established, so the "already
open?" branch, which is now in the way, is simply dropped.

Signed-off-by: Ilya Dryomov

Ilya Dryomov
2016-03-26 01:51:38 +0800
82dcabad7 libceph: revamp subs code, switch to SUBSCRIBE2 protocol ... Browse Code »

It is currently hard-coded in the mon_client that mdsmap and monmap
subs are continuous, while osdmap sub is always "onetime". To better
handle full clusters/pools in the osd_client, we need to be able to
issue continuous osdmap subs. Revamp subs code to allow us to specify
for each sub whether it should be continuous or not.

Although not strictly required for the above, switch to SUBSCRIBE2
protocol while at it, eliminating the ambiguity between a request for
"every map since X" and a request for "just the latest" when we don't
have a map yet (i.e. have epoch 0). SUBSCRIBE2 feature bit is now
required - it's been supported since pre-argonaut (2010).

Move "got mdsmap" call to the end of ceph_mdsc_handle_map() - calling
in before we validate the epoch and successfully install the new map
can mess up mon_client sub state.

Signed-off-by: Ilya Dryomov

Ilya Dryomov
2016-03-26 01:51:38 +0800
0f9af169a libceph: decouple hunting and subs management ... Browse Code »

Coupling hunting state with subscribe state is not a good idea. Clear
hunting when we complete the authentication handshake.

Signed-off-by: Ilya Dryomov

Ilya Dryomov
2016-03-26 01:51:38 +0800
02ac956c4 libceph: move debugfs initialization into __ceph_open_session() ... Browse Code »

Our debugfs dir name is a concatenation of cluster fsid and client
unique ID ("global_id"). It used to be the case that we learned
global_id first, nowadays we always learn fsid first - the monmap is
sent before any auth replies are. ceph_debugfs_client_init() call in
ceph_monc_handle_map() is therefore never executed and can be removed.

Its counterpart in handle_auth_reply() doesn't really belong there
either: having to do monc->client and unlocking early to work around
lockdep is a testament to that. Move it into __ceph_open_session(),
where it can be called unconditionally.

Signed-off-by: Ilya Dryomov

Ilya Dryomov
2016-03-26 01:51:37 +0800

25 Mar, 2016

1 commit

5b1e167d8 Merge tag 'nfsd-4.6' of git://linux-nfs.org/~bfields/linux ... Browse Code »

Pull nfsd updates from Bruce Fields:
"Various bugfixes, a RDMA update from Chuck Lever, and support for a
new pnfs layout type from Christoph Hellwig. The new layout type is a
variant of the block layout which uses SCSI features to offer improved
fencing and device identification.

(Also: note this pull request also includes the client side of SCSI
layout, with Trond's permission.)"

* tag 'nfsd-4.6' of git://linux-nfs.org/~bfields/linux:
sunrpc/cache: drop reference when sunrpc_cache_pipe_upcall() detects a race
nfsd: recover: fix memory leak
nfsd: fix deadlock secinfo+readdir compound
nfsd4: resfh unused in nfsd4_secinfo
svcrdma: Use new CQ API for RPC-over-RDMA server send CQs
svcrdma: Use new CQ API for RPC-over-RDMA server receive CQs
svcrdma: Remove close_out exit path
svcrdma: Hook up the logic to return ERR_CHUNK
svcrdma: Use correct XID in error replies
svcrdma: Make RDMA_ERROR messages work
rpcrdma: Add RPCRDMA_HDRLEN_ERR
svcrdma: svc_rdma_post_recv() should close connection on error
svcrdma: Close connection when a send error occurs
nfsd: Lower NFSv4.1 callback message size limit
svcrdma: Do not send Write chunk XDR pad with inline content
svcrdma: Do not write xdr_buf::tail in a Write chunk
svcrdma: Find client-provided write and reply chunks once per reply
nfsd: Update NFS server comments related to RDMA support
nfsd: Fix a memory leak when meeting unsupported state_protect_how4
nfsd4: fix bad bounds checking

Linus Torvalds
2016-03-25 01:41:00 +0800

24 Mar, 2016

3 commits

aca04ce5d Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net ... Browse Code »

Pull networking bugfixes from David Miller:
"Several bug fixes rolling in, some for changes introduced in this
merge window, and some for problems that have existed for some time:

1) Fix prepare_to_wait() handling in AF_VSOCK, from Claudio Imbrenda.

2) The new DST_CACHE should be a silent config option, from Dave
Jones.

3) inet_current_timestamp() unintentionally truncates timestamps to
16-bit, from Deepa Dinamani.

4) Missing reference to netns in ppp, from Guillaume Nault.

5) Free memory reference in hv_netvsc driver, from Haiyang Zhang.

6) Missing kernel doc documentation for function arguments in various
spots around the networking, from Luis de Bethencourt.

7) UDP stopped receiving broadcast packets properly, due to
overzealous multicast checks, fix from Paolo Abeni"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (59 commits)
net: ping: make ping_v6_sendmsg static
hv_netvsc: Fix the order of num_sc_offered decrement
net: Fix typos and whitespace.
hv_netvsc: Fix the array sizes to be max supported channels
hv_netvsc: Fix accessing freed memory in netvsc_change_mtu()
ppp: take reference on channels netns
net: Reset encap_level to avoid resetting features on inner IP headers
net: mediatek: fix checking for NULL instead of IS_ERR() in .probe
net: phy: at803x: Request 'reset' GPIO only for AT8030 PHY
at803x: fix reset handling
AF_VSOCK: Shrink the area influenced by prepare_to_wait
Revert "vsock: Fix blocking ops call in prepare_to_wait"
macb: fix PHY reset
ipv4: initialize flowi4_flags before calling fib_lookup()
fsl/fman: Workaround for Errata A-007273
ipv4: fix broadcast packets reception
net: hns: bug fix about the overflow of mss
net: hns: adds limitation for debug port mtu
net: hns: fix the bug about mtu setting
net: hns: fixes a bug of RSS
...

Linus Torvalds
2016-03-24 14:25:14 +0800
6579a023a net: ping: make ping_v6_sendmsg static ... Browse Code »

As ping_v6_sendmsg is used only in this file,
making it static

The body of "pingv6_prot" and "pingv6_protosw" were
moved at the middle of the file, to avoid having to
declare some static prototypes.

Signed-off-by: Haishuang Yan
Signed-off-by: David S. Miller

Haishuang Yan
2016-03-24 10:09:58 +0800
5197f3499 net: Reset encap_level to avoid resetting features on inner IP headers ... Browse Code »

This patch corrects an oversight in which we were allowing the encap_level
value to pass from the outer headers to the inner headers. As a result we
were incorrectly identifying UDP or GRE tunnels as also making use of ipip
or sit when the second header actually represented a tunnel encapsulated in
either a UDP or GRE tunnel which already had the features masked.

Fixes: 76443456227097179c1482 ("net: Move GSO csum into SKB_GSO_CB")
Reported-by: Tom Herbert
Signed-off-by: Alexander Duyck
Acked-by: Tom Herbert
Signed-off-by: David S. Miller

Alexander Duyck
2016-03-24 02:19:08 +0800

23 Mar, 2016

10 commits

a24e3d414 Merge branch 'akpm' (patches from Andrew) ... Browse Code »

Merge third patch-bomb from Andrew Morton:

- more ocfs2 changes

- a few hotfixes

- Andy's compat cleanups

- misc fixes to fatfs, ptrace, coredump, cpumask, creds, eventfd,
panic, ipmi, kgdb, profile, kfifo, ubsan, etc.

- many rapidio updates: fixes, new drivers.

- kcov: kernel code coverage feature. Like gcov, but not
"prohibitively expensive".

- extable code consolidation for various archs

* emailed patches from Andrew Morton : (81 commits)
ia64/extable: use generic search and sort routines
x86/extable: use generic search and sort routines
s390/extable: use generic search and sort routines
alpha/extable: use generic search and sort routines
kernel/...: convert pr_warning to pr_warn
drivers: dma-coherent: use memset_io for DMA_MEMORY_IO mappings
drivers: dma-coherent: use MEMREMAP_WC for DMA_MEMORY_MAP
memremap: add MEMREMAP_WC flag
memremap: don't modify flags
kernel/signal.c: add compile-time check for __ARCH_SI_PREAMBLE_SIZE
mm/mprotect.c: don't imply PROT_EXEC on non-exec fs
ipc/sem: make semctl setting sempid consistent
ubsan: fix tree-wide -Wmaybe-uninitialized false positives
kfifo: fix sparse complaints
scripts/gdb: account for changes in module data structure
scripts/gdb: add cmdline reader command
scripts/gdb: add version command
kernel: add kcov code coverage
profile: hide unused functions when !CONFIG_PROC_FS
hpwdt: use nmi_panic() when kernel panics in NMI handler
...

Linus Torvalds
2016-03-23 08:09:14 +0800
b8ba45268 Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma ... Browse Code »

Pull more rdma updates from Doug Ledford:
"Round two of 4.6 merge window patches.

This is a monster pull request. I held off on the hfi1 driver updates
(the hfi1 driver is intimately tied to the qib driver and the new
rdmavt software library that was created to help both of them) in my
first pull request. The hfi1/qib/rdmavt update is probably 90% of
this pull request. The hfi1 driver is being left in staging so that
it can be fixed up in regards to the API that Al and yourself didn't
like. Intel has agreed to do the work, but in the meantime, this
clears out 300+ patches in the backlog queue and brings my tree and
their tree closer to sync.

This also includes about 10 patches to the core and a few to mlx5 to
create an infrastructure for configuring SRIOV ports on IB devices.
That series includes one patch to the net core that we sent to netdev@
and Dave Miller with each of the three revisions to the series. We
didn't get any response to the patch, so we took that as implicit
approval.

Finally, this series includes Intel's new iWARP driver for their x722
cards. It's not nearly the beast as the hfi1 driver. It also has a
linux-next merge issue, but that has been resolved and it now passes
just fine.

Summary:

- A few minor core fixups needed for the next patch series

- The IB SRIOV series. This has bounced around for several versions.
Of note is the fact that the first patch in this series effects the
net core. It was directed to netdev and DaveM for each iteration
of the series (three versions total). Dave did not object, but did
not respond either. I've taken this as permission to move forward
with the series.

- The new Intel X722 iWARP driver

- A huge set of updates to the Intel hfi1 driver. Of particular
interest here is that we have left the driver in staging since it
still has an API that people object to. Intel is working on a fix,
but getting these patches in now helps keep me sane as the upstream
and Intel's trees were over 300 patches apart"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (362 commits)
IB/ipoib: Allow mcast packets from other VFs
IB/mlx5: Implement callbacks for manipulating VFs
net/mlx5_core: Implement modify HCA vport command
net/mlx5_core: Add VF param when querying vport counter
IB/ipoib: Add ndo operations for configuring VFs
IB/core: Add interfaces to control VF attributes
IB/core: Support accessing SA in virtualized environment
IB/core: Add subnet prefix to port info
IB/mlx5: Fix decision on using MAD_IFC
net/core: Add support for configuring VF GUIDs
IB/{core, ulp} Support above 32 possible device capability flags
IB/core: Replace setting the zero values in ib_uverbs_ex_query_device
net/mlx5_core: Introduce offload arithmetic hardware capabilities
net/mlx5_core: Refactor device capability function
net/mlx5_core: Fix caching ATOMIC endian mode capability
ib_srpt: fix a WARN_ON() message
i40iw: Replace the obsolete crypto hash interface with shash
IB/hfi1: Add SDMA cache eviction algorithm
IB/hfi1: Switch to using the pin query function
IB/hfi1: Specify mm when releasing pages
...

Linus Torvalds
2016-03-23 06:48:44 +0800
2bf8c4762 net/xfrm_user: use in_compat_syscall to deny compat syscalls ... Browse Code »

The code wants to prevent compat code from receiving messages. Use
in_compat_syscall for this.

Signed-off-by: Andy Lutomirski
Cc: Steffen Klassert
Cc: Herbert Xu
Cc: "David S. Miller"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Andy Lutomirski
2016-03-23 06:36:02 +0800
96c0e0a90 net/sctp: use in_compat_syscall for sctp_getsockopt_connectx3 ... Browse Code »

SCTP unfortunately has a different ABI for SCTP_SOCKOPT_CONNECTX3 for
32-bit and 64-bit callers. Use in_compat_syscall to correctly
distinguish them on all architectures.

Signed-off-by: Andy Lutomirski
Cc: Vlad Yasevich
Cc: Neil Horman
Cc: David Miller
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Andy Lutomirski
2016-03-23 06:36:02 +0800
f7f9b5e7f AF_VSOCK: Shrink the area influenced by prepare_to_wait ... Browse Code »

When a thread is prepared for waiting by calling prepare_to_wait, sleeping
is not allowed until either the wait has taken place or finish_wait has
been called. The existing code in af_vsock imposed unnecessary no-sleep
assumptions to a broad list of backend functions.
This patch shrinks the influence of prepare_to_wait to the area where it
is strictly needed, therefore relaxing the no-sleep restriction there.

Signed-off-by: Claudio Imbrenda
Signed-off-by: David S. Miller

Claudio Imbrenda
2016-03-23 04:18:41 +0800
6f57e56a1 Revert "vsock: Fix blocking ops call in prepare_to_wait" ... Browse Code »

This reverts commit 5988818008257ca42010d6b43a3e0e48afec9898 ("vsock: Fix
blocking ops call in prepare_to_wait")

The commit reverted with this patch caused us to potentially miss wakeups.
Since the condition is not checked between the prepare_to_wait and the
schedule(), if a wakeup happens after the condition is checked but before
the sleep happens, we will miss it. ( A description of the problem can be
found here: http://www.makelinux.net/ldd3/chp-6-sect-2 ).

By reverting the patch, the behaviour is still incorrect (since we
shouldn't sleep between the prepare_to_wait and the schedule) but at least
it will not miss wakeups.

The next patch in the series actually fixes the behaviour.

Signed-off-by: Claudio Imbrenda
Signed-off-by: David S. Miller

Claudio Imbrenda
2016-03-23 04:18:41 +0800
01cde1538 Merge tag 'nfs-for-4.6-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs ... Browse Code »

Pull NFS client updates from Trond Myklebust:
"Highlights include:

Features:
- Add support for multiple NFSv4.1 callbacks in flight
- Initial patchset for RPC multipath support
- Adapt RPC/RDMA to use the new completion queue API

Bugfixes and cleanups:
- nfs4: nfs4_ff_layout_prepare_ds should return NULL if connection failed
- Cleanups to remove nfs_inode_dio_wait and nfs4_file_fsync
- Fix RPC/RDMA credit accounting
- Properly handle RDMA_ERROR replies
- xprtrdma: Do not wait if ib_post_send() fails
- xprtrdma: Segment head and tail XDR buffers on page boundaries
- xprtrdma cleanups for dprintk, physical_op_map and unused macros"

* tag 'nfs-for-4.6-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (35 commits)
nfs/blocklayout: make sure making a aligned read request
nfs4: nfs4_ff_layout_prepare_ds should return NULL if connection failed
nfs: remove nfs_inode_dio_wait
nfs: remove nfs4_file_fsync
xprtrdma: Use new CQ API for RPC-over-RDMA client send CQs
xprtrdma: Use an anonymous union in struct rpcrdma_mw
xprtrdma: Use new CQ API for RPC-over-RDMA client receive CQs
xprtrdma: Serialize credit accounting again
xprtrdma: Properly handle RDMA_ERROR replies
rpcrdma: Add RPCRDMA_HDRLEN_ERR
xprtrdma: Do not wait if ib_post_send() fails
xprtrdma: Segment head and tail XDR buffers on page boundaries
xprtrdma: Clean up dprintk format string containing a newline
xprtrdma: Clean up physical_op_map()
xprtrdma: Clean up unused RPCRDMA_INLINE_PAD_THRESH macro
NFS add callback_ops to nfs4_proc_bind_conn_to_session_callback
pnfs/NFSv4.1: Add multipath capabilities to pNFS flexfiles servers over NFSv3
SUNRPC: Allow addition of new transports to a struct rpc_clnt
NFSv4.1: nfs4_proc_bind_conn_to_session must iterate over all connections
SUNRPC: Make NFS swap work with multipath
...

Linus Torvalds
2016-03-23 04:16:21 +0800
4cfc86f3d ipv4: initialize flowi4_flags before calling fib_lookup() ... Browse Code »

Field fl4.flowi4_flags is not initialized in fib_compute_spec_dst()
before calling fib_lookup(), which means fib_table_lookup() is
using non-deterministic data at this line:

if (!(flp->flowi4_flags & FLOWI_FLAG_SKIP_NH_OIF)) {

Fix by initializing the entire fl4 structure, which will prevent
similar issues as fields are added in the future by ensuring that
all fields are initialized to zero unless explicitly initialized
to another value.

Fixes: 58189ca7b2741 ("net: Fix vti use case with oif in dst lookups")
Suggested-by: David Ahern
Signed-off-by: Lance Richardson
Acked-by: David Ahern
Signed-off-by: David S. Miller

Lance Richardson
2016-03-23 03:59:23 +0800
ad0ea1989 ipv4: fix broadcast packets reception ... Browse Code »

Currently, ingress ipv4 broadcast datagrams are dropped since,
in udp_v4_early_demux(), ip_check_mc_rcu() is invoked even on
bcast packets.

This patch addresses the issue, invoking ip_check_mc_rcu()
only for mcast packets.

Fixes: 6e5403093261 ("ipv4/udp: Verify multicast group is ours in upd_v4_early_demux()")
Signed-off-by: Paolo Abeni
Acked-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller

Paolo Abeni
2016-03-23 03:53:50 +0800
025c68186 netlink: add support for NIC driver ioctls ... Browse Code »

By returning -ENOIOCTLCMD, sock_do_ioctl() falls back to calling
dev_ioctl(), which provides support for NIC driver ioctls, which
includes ethtool support. This is similar to the way ioctls are handled
in udp.c or tcp.c.

This removes the requirement that ethtool for example be tied to the
support of a specific L3 protocol (ethtool uses an AF_INET socket
today).

Signed-off-by: David Decotigny
Signed-off-by: David S. Miller

David Decotigny
2016-03-23 03:45:44 +0800

22 Mar, 2016

6 commits

3ba9d300c net: ipv4: Fix truncated timestamp returned by inet_current_timestamp() ... Browse Code »

The millisecond timestamps returned by the function is
converted to network byte order by making a call to htons().
htons() only returns __be16 while __be32 is required here.

This was identified by the sparse warning from the buildbot:
net/ipv4/af_inet.c:1405:16: sparse: incorrect type in return
expression (different base types)
net/ipv4/af_inet.c:1405:16: expected restricted __be32
net/ipv4/af_inet.c:1405:16: got restricted __be16 [usertype]

Change the function to use htonl() to return the correct __be32 type
instead so that the millisecond value doesn't get truncated.

Signed-off-by: Deepa Dinamani
Cc: "David S. Miller"
Cc: Alexey Kuznetsov
Cc: Hideaki YOSHIFUJI
Cc: James Morris
Cc: Patrick McHardy
Cc: Arnd Bergmann
Fixes: 822c868532ca ("net: ipv4: Convert IP network timestamps to be y2038 safe")
Reported-by: Fengguang Wu [0-day test robot]
Signed-off-by: David S. Miller

Deepa Dinamani
2016-03-22 10:56:38 +0800
9b246841f Make DST_CACHE a silent config option ... Browse Code »

commit 911362c70d ("net: add dst_cache support") added a new
kconfig option that gets selected by other networking options.
It seems the intent wasn't to offer this as a user-selectable
option given the lack of help text, so this patch converts it
to a silent option.

Signed-off-by: Dave Jones
Signed-off-by: David S. Miller

Dave Jones
2016-03-22 10:56:38 +0800
cc8e27cc9 net/core: Add support for configuring VF GUIDs ... Browse Code »

Add two new NLAs to support configuration of Infiniband node or port
GUIDs. New applications can choose to use this interface to configure
GUIDs with iproute2 with commands such as:

ip link set dev ib0 vf 0 node_guid 00:02:c9:03:00:21:6e:70
ip link set dev ib0 vf 0 port_guid 00:02:c9:03:00:21:6e:78

A new ndo, ndo_sef_vf_guid is introduced to notify the net device of the
request to change the GUID.

Signed-off-by: Eli Cohen
Reviewed-by: Or Gerlitz
Signed-off-by: Doug Ledford

Eli Cohen
2016-03-22 04:34:06 +0800
ae74f1006 bridge: update max_gso_segs and max_gso_size ... Browse Code »

It can be useful to lower max_gso_segs on NIC with very low
number of TX descriptors like bcmgenet.

However, this is defeated by bridge since it does not propagate
the lower value of max_gso_segs and max_gso_size.

Signed-off-by: Eric Dumazet
Cc: Petri Gynther
Cc: Stephen Hemminger
Signed-off-by: David S. Miller

Eric Dumazet
2016-03-22 01:35:56 +0800
c70ce028e net/rtnetlink: add IFLA_GSO_MAX_SEGS and IFLA_GSO_MAX_SIZE attributes ... Browse Code »

It can be useful to report dev->gso_max_segs and dev->gso_max_size
so that "ip -d link" can display them to help debugging.

For the moment, these attributes are read-only.

Signed-off-by: Eric Dumazet
Cc: Petri Gynther
Cc: Stephen Hemminger
Signed-off-by: David S. Miller

Eric Dumazet
2016-03-22 01:35:56 +0800
ed49e6503 net: add description for len argument of dev_get_phys_port_name ... Browse Code »

When the function dev_get_phys_port_name was added it missed a description
for it's len argument. Adding it.

Fixes: db24a9044ee1 ("net: add support for phys_port_name")
Signed-off-by: Luis de Bethencourt
Signed-off-by: David S. Miller

Luis de Bethencourt
2016-03-22 01:28:31 +0800

21 Mar, 2016

2 commits

643ad15d4 Merge branch 'mm-pkeys-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip ... Browse Code »

Pull x86 protection key support from Ingo Molnar:
"This tree adds support for a new memory protection hardware feature
that is available in upcoming Intel CPUs: 'protection keys' (pkeys).

There's a background article at LWN.net:

https://lwn.net/Articles/643797/

The gist is that protection keys allow the encoding of
user-controllable permission masks in the pte. So instead of having a
fixed protection mask in the pte (which needs a system call to change
and works on a per page basis), the user can map a (handful of)
protection mask variants and can change the masks runtime relatively
cheaply, without having to change every single page in the affected
virtual memory range.

This allows the dynamic switching of the protection bits of large
amounts of virtual memory, via user-space instructions. It also
allows more precise control of MMU permission bits: for example the
executable bit is separate from the read bit (see more about that
below).

This tree adds the MM infrastructure and low level x86 glue needed for
that, plus it adds a high level API to make use of protection keys -
if a user-space application calls:

mmap(..., PROT_EXEC);

or

mprotect(ptr, sz, PROT_EXEC);

(note PROT_EXEC-only, without PROT_READ/WRITE), the kernel will notice
this special case, and will set a special protection key on this
memory range. It also sets the appropriate bits in the Protection
Keys User Rights (PKRU) register so that the memory becomes unreadable
and unwritable.

So using protection keys the kernel is able to implement 'true'
PROT_EXEC on x86 CPUs: without protection keys PROT_EXEC implies
PROT_READ as well. Unreadable executable mappings have security
advantages: they cannot be read via information leaks to figure out
ASLR details, nor can they be scanned for ROP gadgets - and they
cannot be used by exploits for data purposes either.

We know about no user-space code that relies on pure PROT_EXEC
mappings today, but binary loaders could start making use of this new
feature to map binaries and libraries in a more secure fashion.

There is other pending pkeys work that offers more high level system
call APIs to manage protection keys - but those are not part of this
pull request.

Right now there's a Kconfig that controls this feature
(CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS) that is default enabled
(like most x86 CPU feature enablement code that has no runtime
overhead), but it's not user-configurable at the moment. If there's
any serious problem with this then we can make it configurable and/or
flip the default"

* 'mm-pkeys-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (38 commits)
x86/mm/pkeys: Fix mismerge of protection keys CPUID bits
mm/pkeys: Fix siginfo ABI breakage caused by new u64 field
x86/mm/pkeys: Fix access_error() denial of writes to write-only VMA
mm/core, x86/mm/pkeys: Add execute-only protection keys support
x86/mm/pkeys: Create an x86 arch_calc_vm_prot_bits() for VMA flags
x86/mm/pkeys: Allow kernel to modify user pkey rights register
x86/fpu: Allow setting of XSAVE state
x86/mm: Factor out LDT init from context init
mm/core, x86/mm/pkeys: Add arch_validate_pkey()
mm/core, arch, powerpc: Pass a protection key in to calc_vm_flag_bits()
x86/mm/pkeys: Actually enable Memory Protection Keys in the CPU
x86/mm/pkeys: Add Kconfig prompt to existing config option
x86/mm/pkeys: Dump pkey from VMA in /proc/pid/smaps
x86/mm/pkeys: Dump PKRU with other kernel registers
mm/core, x86/mm/pkeys: Differentiate instruction fetches
x86/mm/pkeys: Optimize fault handling in access_error()
mm/core: Do not enforce PKEY permissions on remote mm access
um, pkeys: Add UML arch_*_access_permitted() methods
mm/gup, x86/mm/pkeys: Check VMAs and PTEs for protection keys
x86/mm/gup: Simplify get_user_pages() PTE bit handling
...

Linus Torvalds
2016-03-21 10:08:56 +0800
e9fc2f052 net: sched: Add description for cpu_bstats argument ... Browse Code »

Commit 22e0f8b9322c ("net: sched: make bstats per cpu and estimator RCU safe")
added the argument cpu_bstats to functions gen_new_estimator and
gen_replace_estimator and now the descriptions of these are missing for the
documentation. Adding them.

Signed-off-by: Luis de Bethencourt
Signed-off-by: David S. Miller

Luis de Bethencourt
2016-03-21 04:48:07 +0800