Doug / smarc-fsl-linux-kernel | Embedian Git Server

12 Sep, 2013

1 commit

3ddc5b46a kernel-wide: fix missing validations on __get/__put/__copy_to/__copy_from_user() ... Browse Code »

I found the following pattern that leads in to interesting findings:

grep -r "ret.*|=.*__put_user" *
grep -r "ret.*|=.*__get_user" *
grep -r "ret.*|=.*__copy" *

The __put_user() calls in compat_ioctl.c, ptrace compat, signal compat,
since those appear in compat code, we could probably expect the kernel
addresses not to be reachable in the lower 32-bit range, so I think they
might not be exploitable.

For the "__get_user" cases, I don't think those are exploitable: the worse
that can happen is that the kernel will copy kernel memory into in-kernel
buffers, and will fail immediately afterward.

The alpha csum_partial_copy_from_user() seems to be missing the
access_ok() check entirely. The fix is inspired from x86. This could
lead to information leak on alpha. I also noticed that many architectures
map csum_partial_copy_from_user() to csum_partial_copy_generic(), but I
wonder if the latter is performing the access checks on every
architectures.

Signed-off-by: Mathieu Desnoyers
Cc: Richard Henderson
Cc: Ivan Kokshaysky
Cc: Matt Turner
Cc: Jens Axboe
Cc: Oleg Nesterov
Cc: David Miller
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Mathieu Desnoyers
2013-09-12 06:58:18 +0800

11 Sep, 2013

1 commit

cf596766f Merge branch 'nfsd-next' of git://linux-nfs.org/~bfields/linux ... Browse Code »

Pull nfsd updates from Bruce Fields:
"This was a very quiet cycle! Just a few bugfixes and some cleanup"

* 'nfsd-next' of git://linux-nfs.org/~bfields/linux:
rpc: let xdr layer allocate gssproxy receieve pages
rpc: fix huge kmalloc's in gss-proxy
rpc: comment on linux_cred encoding, treat all as unsigned
rpc: clean up decoding of gssproxy linux creds
svcrpc: remove unused rq_resused
nfsd4: nfsd4_create_clid_dir prints uninitialized data
nfsd4: fix leak of inode reference on delegation failure
Revert "nfsd: nfs4_file_get_access: need to be more careful with O_RDWR"
sunrpc: prepare NFS for 2038
nfsd4: fix setlease error return
nfsd: nfs4_file_get_access: need to be more careful with O_RDWR

Linus Torvalds
2013-09-11 11:04:59 +0800

10 Sep, 2013

2 commits

bf97293eb Merge tag 'nfs-for-3.12-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs ... Browse Code »

Pull NFS client updates from Trond Myklebust:
"Highlights include:

- Fix NFSv4 recovery so that it doesn't recover lost locks in cases
such as lease loss due to a network partition, where doing so may
result in data corruption. Add a kernel parameter to control
choice of legacy behaviour or not.
- Performance improvements when 2 processes are writing to the same
file.
- Flush data to disk when an RPCSEC_GSS session timeout is imminent.
- Implement NFSv4.1 SP4_MACH_CRED state protection to prevent other
NFS clients from being able to manipulate our lease and file
locking state.
- Allow sharing of RPCSEC_GSS caches between different rpc clients.
- Fix the broken NFSv4 security auto-negotiation between client and
server.
- Fix rmdir() to wait for outstanding sillyrename unlinks to complete
- Add a tracepoint framework for debugging NFSv4 state recovery
issues.
- Add tracing to the generic NFS layer.
- Add tracing for the SUNRPC socket connection state.
- Clean up the rpc_pipefs mount/umount event management.
- Merge more patches from Chuck in preparation for NFSv4 migration
support"

* tag 'nfs-for-3.12-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (107 commits)
NFSv4: use mach cred for SECINFO_NO_NAME w/ integrity
NFS: nfs_compare_super shouldn't check the auth flavour unless 'sec=' was set
NFSv4: Allow security autonegotiation for submounts
NFSv4: Disallow security negotiation for lookups when 'sec=' is specified
NFSv4: Fix security auto-negotiation
NFS: Clean up nfs_parse_security_flavors()
NFS: Clean up the auth flavour array mess
NFSv4.1 Use MDS auth flavor for data server connection
NFS: Don't check lock owner compatability unless file is locked (part 2)
NFS: Don't check lock owner compatibility in writes unless file is locked
nfs4: Map NFS4ERR_WRONG_CRED to EPERM
nfs4.1: Add SP4_MACH_CRED write and commit support
nfs4.1: Add SP4_MACH_CRED stateid support
nfs4.1: Add SP4_MACH_CRED secinfo support
nfs4.1: Add SP4_MACH_CRED cleanup support
nfs4.1: Add state protection handler
nfs4.1: Minimal SP4_MACH_CRED implementation
SUNRPC: Replace pointer values with task->tk_pid and rpc_clnt->cl_clid
SUNRPC: Add an identifier for struct rpc_clnt
SUNRPC: Ensure rpc_task->tk_pid is available for tracepoints
...

Linus Torvalds
2013-09-10 00:19:15 +0800
6cccc7d30 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client ... Browse Code »

Pull ceph updates from Sage Weil:
"This includes both the first pile of Ceph patches (which I sent to
torvalds@vger, sigh) and a few new patches that add support for
fscache for Ceph. That includes a few fscache core fixes that David
Howells asked go through the Ceph tree. (Thanks go to Milosz Tanski
for putting this feature together)

This first batch of patches (included here) had (has) several
important RBD bug fixes, hole punch support, several different
cleanups in the page cache interactions, improvements in the truncate
code (new truncate mutex to avoid shenanigans with i_mutex), and a
series of fixes in the synchronous striping read/write code.

On top of that is a random collection of small fixes all across the
tree (error code checks and error path cleanup, obsolete wq flags,
etc)"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (43 commits)
ceph: use d_invalidate() to invalidate aliases
ceph: remove ceph_lookup_inode()
ceph: trivial buildbot warnings fix
ceph: Do not do invalidate if the filesystem is mounted nofsc
ceph: page still marked private_2
ceph: ceph_readpage_to_fscache didn't check if marked
ceph: clean PgPrivate2 on returning from readpages
ceph: use fscache as a local presisent cache
fscache: Netfs function for cleanup post readpages
FS-Cache: Fix heading in documentation
CacheFiles: Implement interface to check cache consistency
FS-Cache: Add interface to check consistency of a cached object
rbd: fix null dereference in dout
rbd: fix buffer size for writes to images with snapshots
libceph: use pg_num_mask instead of pgp_num_mask for pg.seed calc
rbd: fix I/O error propagation for reads
ceph: use vfs __set_page_dirty_nobuffers interface instead of doing it inside filesystem
ceph: allow sync_read/write return partial successed size of read/write.
ceph: fix bugs about handling short-read for sync read mode.
ceph: remove useless variable revoked_rdcache
...

Linus Torvalds
2013-09-10 00:13:22 +0800

08 Sep, 2013

2 commits

c7c4591db Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace ... Browse Code »

Pull namespace changes from Eric Biederman:
"This is an assorted mishmash of small cleanups, enhancements and bug
fixes.

The major theme is user namespace mount restrictions. nsown_capable
is killed as it encourages not thinking about details that need to be
considered. A very hard to hit pid namespace exiting bug was finally
tracked and fixed. A couple of cleanups to the basic namespace
infrastructure.

Finally there is an enhancement that makes per user namespace
capabilities usable as capabilities, and an enhancement that allows
the per userns root to nice other processes in the user namespace"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
userns: Kill nsown_capable it makes the wrong thing easy
capabilities: allow nice if we are privileged
pidns: Don't have unshare(CLONE_NEWPID) imply CLONE_THREAD
userns: Allow PR_CAPBSET_DROP in a user namespace.
namespaces: Simplify copy_namespaces so it is clear what is going on.
pidns: Fix hang in zap_pid_ns_processes by sending a potentially extra wakeup
sysfs: Restrict mounting sysfs
userns: Better restrictions on when proc and sysfs can be mounted
vfs: Don't copy mount bind mounts of /proc//ns/mnt between namespaces
kernel/nsproxy.c: Improving a snippet of code.
proc: Restrict mounting the proc filesystem
vfs: Lock in place mounts from more privileged users

Linus Torvalds
2013-09-08 05:35:32 +0800
0ffb01d9d Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net ... Browse Code »

Pull networking fixes from David Miller:
"A quick set of fixes, some to deal with fallout from yesterday's
net-next merge.

1) Fix compilation of bnx2x driver with CONFIG_BNX2X_SRIOV disabled,
from Dmitry Kravkov.

2) Fix a bnx2x regression caused by one of Dave Jones's mistaken
braces changes, from Eilon Greenstein.

3) Add some protective filtering in the netlink tap code, from Daniel
Borkmann.

4) Fix TCP congestion window growth regression after timeouts, from
Yuchung Cheng.

5) Correctly adjust TCP's rcv_ssthresh for out of order packets, from
Eric Dumazet"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
tcp: properly increase rcv_ssthresh for ofo packets
net: add documentation for BQL helpers
mlx5: remove unused MLX5_DEBUG param in Kconfig
bnx2x: Restore a call to config_init
bnx2x: fix broken compilation with CONFIG_BNX2X_SRIOV is not set
tcp: fix no cwnd growth after timeout
net: netlink: filter particular protocols from analyzers

Linus Torvalds
2013-09-08 05:27:46 +0800

07 Sep, 2013

6 commits

4e4f1fc22 tcp: properly increase rcv_ssthresh for ofo packets ... Browse Code »

TCP receive window handling is multi staged.

A socket has a memory budget, static or dynamic, in sk_rcvbuf.

Because we do not really know how this memory budget translates to
a TCP window (payload), TCP announces a small initial window
(about 20 MSS).

When a packet is received, we increase TCP rcv_win depending
on the payload/truesize ratio of this packet. Good citizen
packets give a hint that it's reasonable to have rcv_win = sk_rcvbuf/2

This heuristic takes place in tcp_grow_window()

Problem is : We currently call tcp_grow_window() only for in-order
packets.

This means that reorders or packet losses stop proper grow of
rcv_win, and senders are unable to benefit from fast recovery,
or proper reordering level detection.

Really, a packet being stored in OFO queue is not a bad citizen.
It should be part of the game as in-order packets.

In our traces, we very often see sender is limited by linux small
receive windows, even if linux hosts use autotuning (DRS) and should
allow rcv_win to grow to ~3MB.

Signed-off-by: Eric Dumazet
Acked-by: Neal Cardwell
Signed-off-by: David S. Miller

Eric Dumazet
2013-09-07 02:43:49 +0800
16edfe7ee tcp: fix no cwnd growth after timeout ... Browse Code »

In commit 0f7cc9a3 "tcp: increase throughput when reordering is high",
it only allows cwnd to increase in Open state. This mistakenly disables
slow start after timeout (CA_Loss). Moreover cwnd won't grow if the
state moves from Disorder to Open later in tcp_fastretrans_alert().

Therefore the correct logic should be to allow cwnd to grow as long
as the data is received in order in Open, Loss, or even Disorder state.

Signed-off-by: Yuchung Cheng
Acked-by: Neal Cardwell
Signed-off-by: David S. Miller

Yuchung Cheng
2013-09-07 02:43:49 +0800
5ffd5cddd net: netlink: filter particular protocols from analyzers ... Browse Code »

Fix finer-grained control and let only a whitelist of allowed netlink
protocols pass, in our case related to networking. If later on, other
subsystems decide they want to add their protocol as well to the list
of allowed protocols they shall simply add it. While at it, we also
need to tell what protocol is in use otherwise BPF_S_ANC_PROTOCOL can
not pick it up (as it's not filled out).

Signed-off-by: Daniel Borkmann
Signed-off-by: David S. Miller

Daniel Borkmann
2013-09-07 02:43:48 +0800
cd0a2df68 Merge tag 'fscache-fixes-for-ceph' into wip-fscache ... Browse Code »

Patches for Ceph FS-Cache support

Milosz Tanski
2013-09-07 00:41:20 +0800
2e515bf09 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial ... Browse Code »

Pull trivial tree from Jiri Kosina:
"The usual trivial updates all over the tree -- mostly typo fixes and
documentation updates"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (52 commits)
doc: Documentation/cputopology.txt fix typo
treewide: Convert retrun typos to return
Fix comment typo for init_cma_reserved_pageblock
Documentation/trace: Correcting and extending tracepoint documentation
mm/hotplug: fix a typo in Documentation/memory-hotplug.txt
power: Documentation: Update s2ram link
doc: fix a typo in Documentation/00-INDEX
Documentation/printk-formats.txt: No casts needed for u64/s64
doc: Fix typo "is is" in Documentations
treewide: Fix printks with 0x%#
zram: doc fixes
Documentation/kmemcheck: update kmemcheck documentation
doc: documentation/hwspinlock.txt fix typo
PM / Hibernate: add section for resume options
doc: filesystems : Fix typo in Documentations/filesystems
scsi/megaraid fixed several typos in comments
ppc: init_32: Fix error typo "CONFIG_START_KERNEL"
treewide: Add __GFP_NOWARN to k.alloc calls with v.alloc fallbacks
page_isolation: Fix a comment typo in test_pages_isolated()
doc: fix a typo about irq affinity
...

Linus Torvalds
2013-09-07 00:36:28 +0800
22e04f6b4 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid ... Browse Code »

Pull HID updates from Jiri Kosina:
"Highlights:

- conversion of HID subsystem to use devm-based resource management,
from Benjamin Tissoires

- i2c-hid support for DT bindings, from Benjamin Tissoires

- much improved support for Win8-multitouch devices, from Benjamin
Tissoires

- cleanup of core code using common hidinput_input_event(), from
David Herrmann

- fix for bug in implement() access to the bit stream (causing oops)
that has been present in the code for ages, but devices that are
able to trigger it have started to appear only now, from Jiri
Kosina

- fixes for CVE-2013-2899, CVE-2013-2898, CVE-2013-2896,
CVE-2013-2892, CVE-2013-2888 (all triggerable only by specially
crafted malicious HW devices plugged into the system), from Kees
Cook

- hidraw oops fix, from Manoj Chourasia

- various smaller fixes here and there, support for a bunch of new
devices by various contributors"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid: (53 commits)
HID: MAINTAINERS: add roccat drivers
HID: hid-sensor-hub: change kmalloc + memcpy by kmemdup
HID: hid-sensor-hub: move to devm_kzalloc
HID: hid-sensor-hub: fix indentation accross the code
HID: move HID_REPORT_TYPES closer to the report-definitions
HID: check for NULL field when setting values
HID: picolcd_core: validate output report details
HID: sensor-hub: validate feature report details
HID: ntrig: validate feature report details
HID: pantherlord: validate output report details
HID: hid-wiimote: print small buffers via %*phC
HID: uhid: improve uhid example client
HID: Correct the USB IDs for the new Macbook Air 6
HID: wiimote: add support for Guitar-Hero guitars
HID: wiimote: add support for Guitar-Hero drums
Input: introduce BTN/ABS bits for drums and guitars
HID: battery: don't do DMA from stack
HID: roccat: add support for KonePureOptical v2
HID: picolcd: Prevent NULL pointer dereference on _remove()
HID: usbhid: quirk for N-Trig DuoSense Touch Screen
...

Linus Torvalds
2013-09-07 00:30:36 +0800

06 Sep, 2013

13 commits

d4a516560 rpc: let xdr layer allocate gssproxy receieve pages ... Browse Code »

In theory the linux cred in a gssproxy reply can include up to
NGROUPS_MAX data, 256K of data. In the common case we expect it to be
shorter. So do as the nfsv3 ACL code does and let the xdr code allocate
the pages as they come in, instead of allocating a lot of pages that
won't typically be used.

Tested-by: Simo Sorce
Signed-off-by: J. Bruce Fields

J. Bruce Fields
2013-09-06 23:45:58 +0800
9dfd87da1 rpc: fix huge kmalloc's in gss-proxy ... Browse Code »

The reply to a gssproxy can include up to NGROUPS_MAX gid's, which will
take up more than a page. We therefore need to allocate an array of
pages to hold the reply instead of trying to allocate a single huge
buffer.

Tested-by: Simo Sorce
Signed-off-by: J. Bruce Fields

J. Bruce Fields
2013-09-06 23:45:58 +0800
6a36978e6 rpc: comment on linux_cred encoding, treat all as unsigned ... Browse Code »

The encoding of linux creds is a bit confusing.

Also: I think in practice it doesn't really matter whether we treat any
of these things as signed or unsigned, but unsigned seems more
straightforward: uid_t/gid_t are unsigned and it simplifies the ngroups
overflow check.

Tested-by: Simo Sorce
Signed-off-by: J. Bruce Fields

J. Bruce Fields
2013-09-06 23:45:57 +0800
778e512bb rpc: clean up decoding of gssproxy linux creds ... Browse Code »

We can use the normal coding infrastructure here.

Two minor behavior changes:

- we're assuming no wasted space at the end of the linux cred.
That seems to match gss-proxy's behavior, and I can't see why
it would need to do differently in the future.

- NGROUPS_MAX check added: note groups_alloc doesn't do this,
this is the caller's responsibility.

Tested-by: Simo Sorce
Signed-off-by: J. Bruce Fields

J. Bruce Fields
2013-09-06 23:45:56 +0800
cc998ff88 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next ... Browse Code »

Pull networking changes from David Miller:
"Noteworthy changes this time around:

1) Multicast rejoin support for team driver, from Jiri Pirko.

2) Centralize and simplify TCP RTT measurement handling in order to
reduce the impact of bad RTO seeding from SYN/ACKs. Also, when
both timestamps and local RTT measurements are available prefer
the later because there are broken middleware devices which
scramble the timestamp.

From Yuchung Cheng.

3) Add TCP_NOTSENT_LOWAT socket option to limit the amount of kernel
memory consumed to queue up unsend user data. From Eric Dumazet.

4) Add a "physical port ID" abstraction for network devices, from
Jiri Pirko.

5) Add a "suppress" operation to influence fib_rules lookups, from
Stefan Tomanek.

6) Add a networking development FAQ, from Paul Gortmaker.

7) Extend the information provided by tcp_probe and add ipv6 support,
from Daniel Borkmann.

8) Use RCU locking more extensively in openvswitch data paths, from
Pravin B Shelar.

9) Add SCTP support to openvswitch, from Joe Stringer.

10) Add EF10 chip support to SFC driver, from Ben Hutchings.

11) Add new SYNPROXY netfilter target, from Patrick McHardy.

12) Compute a rate approximation for sending in TCP sockets, and use
this to more intelligently coalesce TSO frames. Furthermore, add
a new packet scheduler which takes advantage of this estimate when
available. From Eric Dumazet.

13) Allow AF_PACKET fanouts with random selection, from Daniel
Borkmann.

14) Add ipv6 support to vxlan driver, from Cong Wang"

Resolved conflicts as per discussion.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1218 commits)
openvswitch: Fix alignment of struct sw_flow_key.
netfilter: Fix build errors with xt_socket.c
tcp: Add missing braces to do_tcp_setsockopt
caif: Add missing braces to multiline if in cfctrl_linkup_request
bnx2x: Add missing braces in bnx2x:bnx2x_link_initialize
vxlan: Fix kernel panic on device delete.
net: mvneta: implement ->ndo_do_ioctl() to support PHY ioctls
net: mvneta: properly disable HW PHY polling and ensure adjust_link() works
icplus: Use netif_running to determine device state
ethernet/arc/arc_emac: Fix huge delays in large file copies
tuntap: orphan frags before trying to set tx timestamp
tuntap: purge socket error queue on detach
qlcnic: use standard NAPI weights
ipv6:introduce function to find route for redirect
bnx2x: VF RSS support - VF side
bnx2x: VF RSS support - PF side
vxlan: Notify drivers for listening UDP port changes
net: usbnet: update addr_assign_type if appropriate
driver/net: enic: update enic maintainers and driver
driver/net: enic: Exposing symbols for Cisco's low latency driver
...

Linus Torvalds
2013-09-06 05:54:29 +0800
0d40f75bd openvswitch: Fix alignment of struct sw_flow_key. ... Browse Code »

sw_flow_key alignment was declared as " __aligned(__alignof__(long))".
However, this breaks on the m68k architecture where long is 32 bit in
size but 16 bit aligned by default. This aligns to the size of a long to
ensure that we can always do comparsions in full long-sized chunks. It
also adds an additional build check to catch any reduction in alignment.

CC: Andy Zhou
Reported-by: Fengguang Wu
Reported-by: Geert Uytterhoeven
Signed-off-by: Jesse Gross
Signed-off-by: David S. Miller

Jesse Gross
2013-09-06 03:54:37 +0800
06c54055b Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net ... Browse Code »

Conflicts:
drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
net/bridge/br_multicast.c
net/ipv6/sit.c

The conflicts were minor:

1) sit.c changes overlap with change to ip_tunnel_xmit() signature.

2) br_multicast.c had an overlap between computing max_delay using
msecs_to_jiffies and turning MLDV2_MRC() into an inline function
with a name using lowercase instead of uppercase letters.

3) stmmac had two overlapping changes, one which conditionally allocated
and hooked up a dma_cfg based upon the presence of the pbl OF property,
and another one handling store-and-forward DMA made. The latter of
which should not go into the new of_find_property() basic block.

Signed-off-by: David S. Miller

David S. Miller
2013-09-06 02:58:52 +0800
1a5bbfc3d netfilter: Fix build errors with xt_socket.c ... Browse Code »

As reported by Randy Dunlap:

====================
when CONFIG_IPV6=m
and CONFIG_NETFILTER_XT_MATCH_SOCKET=y:

net/built-in.o: In function `socket_mt6_v1_v2':
xt_socket.c:(.text+0x51b55): undefined reference to `udp6_lib_lookup'
net/built-in.o: In function `socket_mt_init':
xt_socket.c:(.init.text+0x1ef8): undefined reference to `nf_defrag_ipv6_enable'
====================

Like several other modules under net/netfilter/ we have to
have a dependency "IPV6 disabled or set compatibly with this
module" clause.

Reported-by: Randy Dunlap
Signed-off-by: David S. Miller

David S. Miller
2013-09-06 02:38:03 +0800
e2e5c4c07 tcp: Add missing braces to do_tcp_setsockopt ... Browse Code »

Signed-off-by: Dave Jones
Acked-by: Neal Cardwell
Signed-off-by: David S. Miller

Dave Jones
2013-09-06 02:31:02 +0800
0c1db731b caif: Add missing braces to multiline if in cfctrl_linkup_request ... Browse Code »

The indentation here implies this was meant to be a multi-line if.

Introduced several years back in commit c85c2951d4da1236e32f1858db418221e624aba5
("caif: Handle dev_queue_xmit errors.")

Signed-off-by: Dave Jones
Signed-off-by: David S. Miller

Dave Jones
2013-09-06 02:31:02 +0800
b55b76b22 ipv6:introduce function to find route for redirect ... Browse Code »

RFC 4861 says that the IP source address of the Redirect is the
same as the current first-hop router for the specified ICMP
Destination Address, so the gateway should be taken into
consideration when we find the route for redirect.

There was once a check in commit
a6279458c534d01ccc39498aba61c93083ee0372 ("NDISC: Search over
all possible rules on receipt of redirect.") and the check
went away in commit b94f1c0904da9b8bf031667afc48080ba7c3e8c9
("ipv6: Use icmpv6_notify() to propagate redirect, instead of
rt6_redirect()").

The bug is only "exploitable" on layer-2 because the source
address of the redirect is checked to be a valid link-local
address but it makes spoofing a lot easier in the same L2
domain nonetheless.

Thanks very much for Hannes's help.

Signed-off-by: Duan Jiong
Acked-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller

Duan Jiong
2013-09-06 00:44:31 +0800
3c3769e63 bridge: apply multicast snooping to IPv6 link-local, too ... Browse Code »

The multicast snooping code should have matured enough to be safely
applicable to IPv6 link-local multicast addresses (excluding the
link-local all nodes address, ff02::1), too.

Signed-off-by: Linus Lüssing
Signed-off-by: David S. Miller

Linus Lüssing
2013-09-06 00:35:53 +0800
8fad9c39f bridge: prevent flooding IPv6 packets that do not have a listener ... Browse Code »

Currently if there is no listener for a certain group then IPv6 packets
for that group are flooded on all ports, even though there might be no
host and router interested in it on a port.

With this commit they are only forwarded to ports with a multicast
router.

Just like commit bd4265fe36 ("bridge: Only flood unregistered groups
to routers") did for IPv4, let's do the same for IPv6 with the same
reasoning.

Signed-off-by: Linus Lüssing
Signed-off-by: David S. Miller

Linus Lüssing
2013-09-06 00:35:41 +0800

05 Sep, 2013

15 commits

2f048db46 SUNRPC: Add an identifier for struct rpc_clnt ... Browse Code »

Add an identifier in order to aid debugging.

Signed-off-by: Trond Myklebust

Trond Myklebust
2013-09-05 22:13:15 +0800
27703bb4a Merge tag 'PTR_RET-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux ... Browse Code »

Pull PTR_RET() removal patches from Rusty Russell:
"PTR_RET() is a weird name, and led to some confusing usage. We ended
up with PTR_ERR_OR_ZERO(), and replacing or fixing all the usages.

This has been sitting in linux-next for a whole cycle"

[ There are still some PTR_RET users scattered about, with some of them
possibly being new, but most of them existing in Rusty's tree too. We
have that

#define PTR_RET(p) PTR_ERR_OR_ZERO(p)

thing in , so they continue to work for now - Linus ]

* tag 'PTR_RET-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
GFS2: Replace PTR_RET with PTR_ERR_OR_ZERO
Btrfs: volume: Replace PTR_RET with PTR_ERR_OR_ZERO
drm/cma: Replace PTR_RET with PTR_ERR_OR_ZERO
sh_veu: Replace PTR_RET with PTR_ERR_OR_ZERO
dma-buf: Replace PTR_RET with PTR_ERR_OR_ZERO
drivers/rtc: Replace PTR_RET with PTR_ERR_OR_ZERO
mm/oom_kill: remove weird use of ERR_PTR()/PTR_ERR().
staging/zcache: don't use PTR_RET().
remoteproc: don't use PTR_RET().
pinctrl: don't use PTR_RET().
acpi: Replace weird use of PTR_RET.
s390: Replace weird use of PTR_RET.
PTR_RET is now PTR_ERR_OR_ZERO(): Replace most.
PTR_RET is now PTR_ERR_OR_ZERO

Linus Torvalds
2013-09-05 08:31:11 +0800
b4af8def5 net: ipv6: mld: introduce mld_{gq, ifc, dad}_stop_timer functions ... Browse Code »

We already have mld_{gq,ifc,dad}_start_timer() functions, so introduce
mld_{gq,ifc,dad}_stop_timer() functions to reduce code size and make it
more readable.

Signed-off-by: Daniel Borkmann
Cc: Hannes Frederic Sowa
Acked-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller

Daniel Borkmann
2013-09-05 02:53:21 +0800
2b7c121f8 net: ipv6: mld: refactor query processing into v1/v2 functions ... Browse Code »

Make igmp6_event_query() a bit easier to read by refactoring code
parts into mld_process_v1() and mld_process_v2().

Signed-off-by: Daniel Borkmann
Cc: Hannes Frederic Sowa
Acked-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller

Daniel Borkmann
2013-09-05 02:53:21 +0800
cc7f7ab75 net: ipv6: mld: similarly to MLDv2 have min max_delay of 1 ... Browse Code »

Similarly as we do in MLDv2 queries, set a forged MLDv1 query with
0 ms mld_maxdelay to minimum timer shot time of 1 jiffies. This is
eventually done in igmp6_group_queried() anyway, so we can simplify
a check there.

Signed-off-by: Daniel Borkmann
Cc: Hannes Frederic Sowa
Acked-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller

Daniel Borkmann
2013-09-05 02:53:21 +0800
58c0ecfd8 net: ipv6: mld: implement RFC3810 MLDv2 mode only ... Browse Code »

RFC3810, 10. Security Considerations says under subsection 10.1.
Query Message:

A forged Version 1 Query message will put MLDv2 listeners on that
link in MLDv1 Host Compatibility Mode. This scenario can be avoided
by providing MLDv2 hosts with a configuration option to ignore
Version 1 messages completely.

Hence, implement a MLDv2-only mode that will ignore MLDv1 traffic:

echo 2 > /proc/sys/net/ipv6/conf/ethX/force_mld_version or
echo 2 > /proc/sys/net/ipv6/conf/all/force_mld_version

Note that device has a higher precedence as it was previously
also the case in the macro MLD_V1_SEEN() that would "short-circuit"
if condition on case.

Signed-off-by: Daniel Borkmann
Cc: Hannes Frederic Sowa
Acked-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller

Daniel Borkmann
2013-09-05 02:53:20 +0800
e3f5b1704 net: ipv6: mld: get rid of MLDV2_MRC and simplify calculation ... Browse Code »

Get rid of MLDV2_MRC and use our new macros for mantisse and
exponent to calculate Maximum Response Delay out of the Maximum
Response Code.

Signed-off-by: Daniel Borkmann
Cc: Hannes Frederic Sowa
Acked-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller

Daniel Borkmann
2013-09-05 02:53:20 +0800
6c567b78c net: ipv6: mld: clean up MLD_V1_SEEN macro ... Browse Code »

Replace the macro with a function to make it more readable. GCC will
eventually decide whether to inline this or not (also, that's not
fast-path anyway).

Signed-off-by: Daniel Borkmann
Cc: Hannes Frederic Sowa
Acked-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller

Daniel Borkmann
2013-09-05 02:53:20 +0800
89225d1ce net: ipv6: mld: fix v1/v2 switchback timeout to rfc3810, 9.12. ... Browse Code »

i) RFC3810, 9.2. Query Interval [QI] says:

The Query Interval variable denotes the interval between General
Queries sent by the Querier. Default value: 125 seconds. [...]

ii) RFC3810, 9.3. Query Response Interval [QRI] says:

The Maximum Response Delay used to calculate the Maximum Response
Code inserted into the periodic General Queries. Default value:
10000 (10 seconds) [...] The number of seconds represented by the
[Query Response Interval] must be less than the [Query Interval].

iii) RFC3810, 9.12. Older Version Querier Present Timeout [OVQPT] says:

The Older Version Querier Present Timeout is the time-out for
transitioning a host back to MLDv2 Host Compatibility Mode. When an
MLDv1 query is received, MLDv2 hosts set their Older Version Querier
Present Timer to [Older Version Querier Present Timeout].

This value MUST be ([Robustness Variable] times (the [Query Interval]
in the last Query received)) plus ([Query Response Interval]).

Hence, on *default* the timeout results in:

[RV] = 2, [QI] = 125sec, [QRI] = 10sec
[OVQPT] = [RV] * [QI] + [QRI] = 260sec

Having that said, we currently calculate [OVQPT] (here given as 'switchback'
variable) as ...

switchback = (idev->mc_qrv + 1) * max_delay

RFC3810, 9.12. says "the [Query Interval] in the last Query received". In
section "9.14. Configuring timers", it is said:

This section is meant to provide advice to network administrators on
how to tune these settings to their network. Ambitious router
implementations might tune these settings dynamically based upon
changing characteristics of the network. [...]

iv) RFC38010, 9.14.2. Query Interval:

The overall level of periodic MLD traffic is inversely proportional
to the Query Interval. A longer Query Interval results in a lower
overall level of MLD traffic. The value of the Query Interval MUST
be equal to or greater than the Maximum Response Delay used to
calculate the Maximum Response Code inserted in General Query
messages.

I assume that was why switchback is calculated as is (3 * max_delay), although
this setting seems to be meant for routers only to configure their [QI]
interval for non-default intervals. So usage here like this is clearly wrong.

Concluding, the current behaviour in IPv6's multicast code is not conform
to the RFC as switch back is calculated wrongly. That is, it has a too small
value, so MLDv2 hosts switch back again to MLDv2 way too early, i.e. ~30secs
instead of ~260secs on default.

Hence, introduce necessary helper functions and fix this up properly as it
should be.

Introduced in 06da92283 ("[IPV6]: Add MLDv2 support."). Credits to Hannes
Frederic Sowa who also had a hand in this as well. Also thanks to Hangbin Liu
who did initial testing.

Signed-off-by: Daniel Borkmann
Cc: David Stevens
Cc: Hannes Frederic Sowa
Acked-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller

Daniel Borkmann
2013-09-05 02:53:20 +0800
8d1018c77 SUNRPC: Ensure rpc_task->tk_pid is available for tracepoints ... Browse Code »

Signed-off-by: Trond Myklebust

Trond Myklebust
2013-09-05 02:45:13 +0800
3a1c75659 net: ipv6: tcp: fix potential use after free in tcp_v6_do_rcv ... Browse Code »

In tcp_v6_do_rcv() code, when processing pkt options, we soley work
on our skb clone opt_skb that we've created earlier before entering
tcp_rcv_established() on our way. However, only in condition ...

if (np->rxopt.bits.rxtclass)
np->rcv_tclass = ipv6_get_dsfield(ipv6_hdr(skb));

... we work on skb itself. As we extract every other information out
of opt_skb in ipv6_pktoptions path, this seems wrong, since skb can
already be released by tcp_rcv_established() earlier on. When we try
to access it in ipv6_hdr(), we will dereference freed skb.

[ Bug added by commit 4c507d2897bd9b ("net: implement IP_RECVTOS for
IP_PKTOPTIONS") ]

Signed-off-by: Daniel Borkmann
Cc: Eric Dumazet
Acked-by: Eric Dumazet
Acked-by: Jiri Benc
Signed-off-by: David S. Miller

Daniel Borkmann
2013-09-05 02:44:41 +0800
52f20e655 tcp: better comments for RTO initiallization ... Browse Code »

Commit 1b7fdd2ab585("tcp: do not use cached RTT for RTT estimation")
removes important comments on how RTO is initialized and updated.
Hopefully this patch puts those information back.

Signed-off-by: Yuchung Cheng
Acked-by: Neal Cardwell
Acked-by: Eric Dumazet
Signed-off-by: David S. Miller

Yuchung Cheng
2013-09-05 02:41:55 +0800
25a6e6b84 ipv6: Don't depend on per socket memory for neighbour discovery messages ... Browse Code »

Allocating skbs when sending out neighbour discovery messages
currently uses sock_alloc_send_skb() based on a per net namespace
socket and thus share a socket wmem buffer space.

If a netdevice is temporarily unable to transmit due to carrier
loss or for other reasons, the queued up ndisc messages will cosnume
all of the wmem space and will thus prevent from any more skbs to
be allocated even for netdevices that are able to transmit packets.

The number of neighbour discovery messages sent is very limited,
use of alloc_skb() bypasses the socket wmem buffer size enforcement
while the manual call to skb_set_owner_w() maintains the socket
reference needed for the IPv6 output path.

This patch has orginally been posted by Eric Dumazet in a modified
form.

Signed-off-by: Thomas Graf
Cc: Eric Dumazet
Cc: Hannes Frederic Sowa
Cc: Stephen Warren
Cc: Fabio Estevam
Tested-by: Fabio Estevam
Tested-by: Stephen Warren
Acked-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller

Thomas Graf
2013-09-05 02:37:41 +0800
639739b5e ipv6: fix null pointer dereference in __ip6addrlbl_add ... Browse Code »

Commit b67bfe0d42cac56c512dd5da4b1b347a23f4b70a ("hlist: drop
the node parameter from iterators") changed the behavior of
hlist_for_each_entry_safe to leave the p argument NULL.

Fix this up by tracking the last argument.

Reported-by: Michele Baldessari
Cc: Hideaki YOSHIFUJI
Cc: Sasha Levin
Signed-off-by: Hannes Frederic Sowa
Tested-by: Michele Baldessari
Signed-off-by: David S. Miller

Hannes Frederic Sowa
2013-09-05 02:14:53 +0800
c08751c85 net: sctp: Fix data chunk fragmentation for MTU values which are not multiple of 4 ... Browse Code »

net: sctp: Fix data chunk fragmentation for MTU values which are not multiple of 4

Initially the problem was observed with ipsec, but later it became clear that
SCTP data chunk fragmentation algorithm has problems with MTU values which are
not multiple of 4. Test program was used which just transmits 2000 bytes long
packets to other host. tcpdump was used to observe re-fragmentation in IP layer
after SCTP already fragmented data chunks.

With MTU 1500:
12:54:34.082904 IP (tos 0x2,ECT(0), ttl 64, id 0, offset 0, flags [DF], proto SCTP (132), length 1500)
10.151.38.153.39303 > 10.151.24.91.54321: sctp (1) [DATA] (B) [TSN: 2366088589] [SID: 0] [SSEQ 1] [PPID 0x0]
12:54:34.082933 IP (tos 0x2,ECT(0), ttl 64, id 0, offset 0, flags [DF], proto SCTP (132), length 596)
10.151.38.153.39303 > 10.151.24.91.54321: sctp (1) [DATA] (E) [TSN: 2366088590] [SID: 0] [SSEQ 1] [PPID 0x0]
12:54:34.090576 IP (tos 0x2,ECT(0), ttl 63, id 0, offset 0, flags [DF], proto SCTP (132), length 48)
10.151.24.91.54321 > 10.151.38.153.39303: sctp (1) [SACK] [cum ack 2366088590] [a_rwnd 79920] [#gap acks 0] [#dup tsns 0]

With MTU 1499:
13:02:49.955220 IP (tos 0x2,ECT(0), ttl 64, id 48215, offset 0, flags [+], proto SCTP (132), length 1492)
10.151.38.153.39084 > 10.151.24.91.54321: sctp[|sctp]
13:02:49.955249 IP (tos 0x2,ECT(0), ttl 64, id 48215, offset 1472, flags [none], proto SCTP (132), length 28)
10.151.38.153 > 10.151.24.91: ip-proto-132
13:02:49.955262 IP (tos 0x2,ECT(0), ttl 64, id 0, offset 0, flags [DF], proto SCTP (132), length 600)
10.151.38.153.39084 > 10.151.24.91.54321: sctp (1) [DATA] (E) [TSN: 404355346] [SID: 0] [SSEQ 1] [PPID 0x0]
13:02:49.956770 IP (tos 0x2,ECT(0), ttl 63, id 0, offset 0, flags [DF], proto SCTP (132), length 48)
10.151.24.91.54321 > 10.151.38.153.39084: sctp (1) [SACK] [cum ack 404355346] [a_rwnd 79920] [#gap acks 0] [#dup tsns 0]

Here problem in data portion limit calculation leads to re-fragmentation in IP,
which is sub-optimal. The problem is max_data initial value, which doesn't take
into account the fact, that data chunk must be padded to 4-bytes boundary.
It's enough to correct max_data, because all later adjustments are correctly
aligned to 4-bytes boundary.

After the fix is applied, everything is fragmented correctly for uneven MTUs:
15:16:27.083881 IP (tos 0x2,ECT(0), ttl 64, id 0, offset 0, flags [DF], proto SCTP (132), length 1496)
10.151.38.153.53417 > 10.151.24.91.54321: sctp (1) [DATA] (B) [TSN: 3077098183] [SID: 0] [SSEQ 1] [PPID 0x0]
15:16:27.083907 IP (tos 0x2,ECT(0), ttl 64, id 0, offset 0, flags [DF], proto SCTP (132), length 600)
10.151.38.153.53417 > 10.151.24.91.54321: sctp (1) [DATA] (E) [TSN: 3077098184] [SID: 0] [SSEQ 1] [PPID 0x0]
15:16:27.085640 IP (tos 0x2,ECT(0), ttl 63, id 0, offset 0, flags [DF], proto SCTP (132), length 48)
10.151.24.91.54321 > 10.151.38.153.53417: sctp (1) [SACK] [cum ack 3077098184] [a_rwnd 79920] [#gap acks 0] [#dup tsns 0]

The bug was there for years already, but
- is a performance issue, the packets are still transmitted
- doesn't show up with default MTU 1500, but possibly with ipsec (MTU 1438)

Signed-off-by: Alexander Sverdlin
Acked-by: Vlad Yasevich
Acked-by: Neil Horman
Signed-off-by: David S. Miller

Alexander Sverdlin
2013-09-05 01:20:27 +0800