12 Oct, 2014
1 commit
-
Pull file locking related changes from Jeff Layton:
"This release is a little more busy for file locking changes than the
last:- a set of patches from Kinglong Mee to fix the lockowner handling in
knfsd
- a pile of cleanups to the internal file lease API. This should get
us a bit closer to allowing for setlease methods that can block.There are some dependencies between mine and Bruce's trees this cycle,
and I based my tree on top of the requisite patches in Bruce's tree"* tag 'locks-v3.18-1' of git://git.samba.org/jlayton/linux: (26 commits)
locks: fix fcntl_setlease/getlease return when !CONFIG_FILE_LOCKING
locks: flock_make_lock should return a struct file_lock (or PTR_ERR)
locks: set fl_owner for leases to filp instead of current->files
locks: give lm_break a return value
locks: __break_lease cleanup in preparation of allowing direct removal of leases
locks: remove i_have_this_lease check from __break_lease
locks: move freeing of leases outside of i_lock
locks: move i_lock acquisition into generic_*_lease handlers
locks: define a lm_setup handler for leases
locks: plumb a "priv" pointer into the setlease routines
nfsd: don't keep a pointer to the lease in nfs4_file
locks: clean up vfs_setlease kerneldoc comments
locks: generic_delete_lease doesn't need a file_lock at all
nfsd: fix potential lease memory leak in nfs4_setlease
locks: close potential race in lease_get_mtime
security: make security_file_set_fowner, f_setown and __f_setown void return
locks: consolidate "nolease" routines
locks: remove lock_may_read and lock_may_write
lockd: rip out deferred lock handling from testlock codepath
NFSD: Get reference of lockowner when coping file_lock
...
24 Sep, 2014
1 commit
-
Conflicts:
arch/mips/net/bpf_jit.c
drivers/net/can/flexcan.cBoth the flexcan and MIPS bpf_jit conflicts were cases of simple
overlapping changes.Signed-off-by: David S. Miller
10 Sep, 2014
3 commits
-
Linux manpage for recvmsg and sendmsg calls does not explicitly mention setting msg_namelen to 0 when
msg_name passed set as NULL. When developers don't set msg_namelen member in msghdr, it might contain garbage
value which will fail the validation check and sendmsg and recvmsg calls from kernel will return EINVAL. This will
break old binaries and any code for which there is no access to source code.
To fix this, we set msg_namelen to 0 when msg_name is passed as NULL from userland.Signed-off-by: Ani Sinha
Signed-off-by: David S. Miller -
Few packets have timestamping enabled. Exit sock_tx_timestamp quickly
in this common case.Signed-off-by: Willem de Bruijn
Signed-off-by: David S. Miller -
security_file_set_fowner always returns 0, so make it f_setown and
__f_setown void return functions and fix up the error handling in the
callers.Cc: linux-security-module@vger.kernel.org
Signed-off-by: Jeff Layton
Reviewed-by: Christoph Hellwig
06 Sep, 2014
2 commits
-
This patch fix spelling typo found in DocBook/networking.xml.
It is because the neworking.xml is generated from comments
in the source, I have to fix typo in comments within the source.Signed-off-by: Masanari Iida
Acked-by: Randy Dunlap
Signed-off-by: David S. Miller -
The timestamping API has separate bits for generating and reporting
timestamps. A software timestamp should only be reported for a packet
when the packet has the relevant generation flag (SKBTX_..) set
and the socket has reporting bit SOF_TIMESTAMPING_SOFTWARE set.The second check was accidentally removed. Reinstitute the original
behavior.Tested:
Without this patch, Documentation/networking/txtimestamp reports
timestamps regardless of whether SOF_TIMESTAMPING_SOFTWARE is set.
After the patch, it only reports them when the flag is set.Fixes: f24b9be5957b ("net-timestamp: extend SCM_TIMESTAMPING ancillary data struct")
Signed-off-by: Willem de Bruijn
Signed-off-by: David S. Miller
07 Aug, 2014
1 commit
-
sock_tx_timestamp() should not ignore initial *tx_flags value, as TCP
stack can store SKBTX_SHARED_FRAG in it.Also first argument (struct sock *) can be const.
Signed-off-by: Eric Dumazet
Fixes: 4ed2d765dfac ("net-timestamp: TCP timestamping")
Cc: Willem de Bruijn
Acked-by: Willem de Bruijn
Signed-off-by: David S. Miller
06 Aug, 2014
4 commits
-
Add SOF_TIMESTAMPING_TX_ACK, a request for a tstamp when the last byte
in the send() call is acknowledged. It implements the feature for TCP.The timestamp is generated when the TCP socket cumulative ACK is moved
beyond the tracked seqno for the first time. The feature ignores SACK
and FACK, because those acknowledge the specific byte, but not
necessarily the entire contents of the buffer up to that byte.Signed-off-by: Willem de Bruijn
Signed-off-by: David S. Miller -
Kernel transmit latency is often incurred in the packet scheduler.
Introduce a new timestamp on transmission just before entering the
scheduler. When data travels through multiple devices (bonding,
tunneling, ...) each device will export an individual timestamp.Signed-off-by: Willem de Bruijn
Signed-off-by: David S. Miller -
sk_flags is reaching its limit. New timestamping options will not fit.
Move all of them into a new field sk->sk_tsflags.Added benefit is that this removes boilerplate code to convert between
SOF_TIMESTAMPING_.. and SOCK_TIMESTAMPING_.. in getsockopt/setsockopt.SOCK_TIMESTAMPING_RX_SOFTWARE is also used to toggle the receive
timestamp logic (netstamp_needed). That can be simplified and this
last key removed, but will leave that for a separate patch.Signed-off-by: Willem de Bruijn
----
The u16 in sock can be moved into a 16-bit hole below sk_gso_max_segs,
though that scatters tstamp fields throughout the struct.
Signed-off-by: David S. Miller -
Applications that request kernel tx timestamps with SO_TIMESTAMPING
read timestamps as recvmsg() ancillary data. The response is defined
implicitly as timespec[3].1) define struct scm_timestamping explicitly and
2) add support for new tstamp types. On tx, scm_timestamping always
accompanies a sock_extended_err. Define previously unused field
ee_info to signal the type of ts[0]. Introduce SCM_TSTAMP_SND to
define the existing behavior.The reception path is not modified. On rx, no struct similar to
sock_extended_err is passed along with SCM_TIMESTAMPING.Signed-off-by: Willem de Bruijn
Signed-off-by: David S. Miller
30 Jul, 2014
1 commit
-
The SO_TIMESTAMPING API defines three types of timestamps: software,
hardware in raw format (hwtstamp) and hardware converted to system
format (syststamp). The last has been deprecated in favor of combining
hwtstamp with a PTP clock driver. There are no active users in the
kernel.The option was device driver dependent. If set, but without hardware
support, the correct behavior is to return zero in the relevant field
in the SCM_TIMESTAMPING ancillary message. Without device drivers
implementing the option, this field is effectively always zero.Remove the internal plumbing to dissuage new drivers from implementing
the feature. Keep the SOF_TIMESTAMPING_SYS_HARDWARE flag, however, to
avoid breaking existing applications that request the timestamp.Signed-off-by: Willem de Bruijn
Signed-off-by: David S. Miller
17 Apr, 2014
1 commit
-
Make sys_recv a first class citizen by using the SYSCALL_DEFINEx
macro. Besides being cleaner this will also generate meta data
for the system call so tracing tools like ftrace or LTTng can
resolve this system call.Signed-off-by: Jan Glauber
Signed-off-by: David S. Miller
02 Apr, 2014
1 commit
-
This commit fixes a build error reported by Fengguang, that is
triggered when CONFIG_NETWORK_PHY_TIMESTAMPING is not set:ERROR: "ptp_classify_raw" [drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe.ko] undefined!
The fix is to introduce its own file for the PTP BPF classifier,
so that PTP_1588_CLOCK and/or NETWORK_PHY_TIMESTAMPING can select
it independently from each other. IXP4xx driver on ARM needs to
select it as well since it does not seem to select PTP_1588_CLOCK
or similar that would pull it in automatically.This also allows for hiding all of the internals of the BPF PTP
program inside that file, and only exporting relevant API bits
to drivers.This patch also adds a kdoc documentation of ptp_classify_raw()
API to make it clear that it can return PTP_CLASS_* defines. Also,
the BPF program has been translated into bpf_asm code, so that it
can be more easily read and altered (extensively documented in [1]).In the kernel tree under tools/net/ we have bpf_asm and bpf_dbg
tools, so the commented program can simply be translated via
`./bpf_asm -c prog` where prog is a file that contains the
commented code. This makes it easily readable/verifiable and when
there's a need to change something, jump offsets etc do not need
to be replaced manually which can be very error prone. Instead,
a newly translated version via bpf_asm can simply replace the old
code. I have checked opcode diffs before/after and it's the very
same filter.[1] Documentation/networking/filter.txt
Fixes: 164d8c666521 ("net: ptp: do not reimplement PTP/BPF classifier")
Reported-by: Fengguang Wu
Signed-off-by: Daniel Borkmann
Signed-off-by: Alexei Starovoitov
Cc: Richard Cochran
Cc: Jiri Benc
Acked-by: Richard Cochran
Signed-off-by: David S. Miller
15 Mar, 2014
1 commit
-
Conflicts:
drivers/net/usb/r8152.c
drivers/net/xen-netback/netback.cBoth the r8152 and netback conflicts were simple overlapping
changes.Signed-off-by: David S. Miller
14 Mar, 2014
1 commit
-
Pull networking fixes from David Miller:
"I know this is a bit more than you want to see, and I've told the
wireless folks under no uncertain terms that they must severely scale
back the extent of the fixes they are submitting this late in the
game.Anyways:
1) vmxnet3's netpoll doesn't perform the equivalent of an ISR, which
is the correct implementation, like it should. Instead it does
something like a NAPI poll operation. This leads to crashes.From Neil Horman and Arnd Bergmann.
2) Segmentation of SKBs requires proper socket orphaning of the
fragments, otherwise we might access stale state released by the
release callbacks.This is a 5 patch fix, but the initial patches are giving
variables and such significantly clearer names such that the
actual fix itself at the end looks trivial.From Michael S. Tsirkin.
3) TCP control block release can deadlock if invoked from a timer on
an already "owned" socket. Fix from Eric Dumazet.4) In the bridge multicast code, we must validate that the
destination address of general queries is the link local all-nodes
multicast address. From Linus Lüssing.5) The x86 BPF JIT support for negative offsets puts the parameter
for the helper function call in the wrong register. Fix from
Alexei Starovoitov.6) The descriptor type used for RTL_GIGA_MAC_VER_17 chips in the
r8169 driver is incorrect. Fix from Hayes Wang.7) The xen-netback driver tests skb_shinfo(skb)->gso_type bits to see
if a packet is a GSO frame, but that's not the correct test. It
should use skb_is_gso(skb) instead. Fix from Wei Liu.8) Negative msg->msg_namelen values should generate an error, from
Matthew Leach.9) at86rf230 can deadlock because it takes the same lock from it's
ISR and it's hard_start_xmit method, without disabling interrupts
in the latter. Fix from Alexander Aring.10) The FEC driver's restart doesn't perform operations in the correct
order, so promiscuous settings can get lost. Fix from Stefan
Wahren.11) Fix SKB leak in SCTP cookie handling, from Daniel Borkmann.
12) Reference count and memory leak fixes in TIPC from Ying Xue and
Erik Hugne.13) Forced eviction in inet_frag_evictor() must strictly make sure all
frags are deleted, otherwise module unload (f.e. 6lowpan) can
crash. Fix from Florian Westphal.14) Remove assumptions in AF_UNIX's use of csum_partial() (which it
uses as a hash function), which breaks on PowerPC. From Anton
Blanchard.The main gist of the issue is that csum_partial() is defined only
as a value that, once folded (f.e. via csum_fold()) produces a
correct 16-bit checksum. It is legitimate, therefore, for
csum_partial() to produce two different 32-bit values over the
same data if their respective alignments are different.15) Fix endiannes bug in MAC address handling of ibmveth driver, also
from Anton Blanchard.16) Error checks for ipv6 exthdrs offload registration are reversed,
from Anton Nayshtut.17) Externally triggered ipv6 addrconf routes should count against the
garbage collection threshold. Fix from Sabrina Dubroca.18) The PCI shutdown handler added to the bnx2 driver can wedge the
chip if it was not brought up earlier already, which in particular
causes the firmware to shut down the PHY. Fix from Michael Chan.19) Adjust the sanity WARN_ON_ONCE() in qdisc_list_add() because as
currently coded it can and does trigger in legitimate situations.
From Eric Dumazet.20) BNA driver fails to build on ARM because of a too large udelay()
call, fix from Ben Hutchings.21) Fair-Queue qdisc holds locks during GFP_KERNEL allocations, fix
from Eric Dumazet.22) The vlan passthrough ops added in the previous release causes a
regression in source MAC address setting of outgoing headers in
some circumstances. Fix from Peter Boström"* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (70 commits)
ipv6: Avoid unnecessary temporary addresses being generated
eth: fec: Fix lost promiscuous mode after reconnecting cable
bonding: set correct vlan id for alb xmit path
at86rf230: fix lockdep splats
net/mlx4_en: Deregister multicast vxlan steering rules when going down
vmxnet3: fix building without CONFIG_PCI_MSI
MAINTAINERS: add networking selftests to NETWORKING
net: socket: error on a negative msg_namelen
MAINTAINERS: Add tools/net to NETWORKING [GENERAL]
packet: doc: Spelling s/than/that/
net/mlx4_core: Load the IB driver when the device supports IBoE
net/mlx4_en: Handle vxlan steering rules for mac address changes
net/mlx4_core: Fix wrong dump of the vxlan offloads device capability
xen-netback: use skb_is_gso in xenvif_start_xmit
r8169: fix the incorrect tx descriptor version
tools/net/Makefile: Define PACKAGE to fix build problems
x86: bpf_jit: support negative offsets
bridge: multicast: enable snooping on general queries only
bridge: multicast: add sanity check for general query destination
tcp: tcp_release_cb() should release socket ownership
...
13 Mar, 2014
1 commit
-
When copying in a struct msghdr from the user, if the user has set the
msg_namelen parameter to a negative value it gets clamped to a valid
size due to a comparison between signed and unsigned values.Ensure the syscall errors when the user passes in a negative value.
Signed-off-by: Matthew Leach
Signed-off-by: David S. Miller
10 Mar, 2014
1 commit
-
Signed-off-by: Al Viro
14 Feb, 2014
1 commit
-
Prefer pr_*(...) to printk(KERN_* ...).
Signed-off-by: Yang Yingliang
Signed-off-by: David S. Miller
11 Dec, 2013
1 commit
-
This patch makes socketpair() use error paths which do not
rely on heavy-weight call to sys_close(): it's better to try
to push the file descriptor to userspace before installing
the socket file to the file descriptor, so that errors are
catched earlier and being easier to handle.Using sys_close() seems to be the exception, while writing the
file descriptor before installing it look like it's more or less
the norm: eg. except for code used in init/, error handling
involve fput() and put_unused_fd(), but not sys_close().This make socketpair() usage of sys_close() quite unusual.
So it deserves to be replaced by the common pattern relying on
fput() and put_unused_fd() just like, for example, the one used
in pipe(2) or recvmsg(2).Three distinct error paths are still needed since calling
fput() on file structure returned by sock_alloc_file() will
implicitly call sock_release() on the associated socket
structure.Cc: David S. Miller
Cc: Al Viro
Signed-off-by: Yann Droneaud
Link: http://marc.info/?i=1385979146-13825-1-git-send-email-ydroneaud@opteya.com
Signed-off-by: David S. Miller
06 Dec, 2013
1 commit
-
Ben Hutchings says:
====================
SIOCGHWTSTAMP ioctl1. Add the SIOCGHWTSTAMP ioctl and update the timestamping
documentation.
2. Implement SIOCGHWTSTAMP in most drivers that support SIOCSHWTSTAMP.
3. Add a test program to exercise SIOC{G,S}HWTSTAMP.
====================Signed-off-by: David S. Miller
30 Nov, 2013
1 commit
-
If kmsg->msg_namelen > sizeof(struct sockaddr_storage) then in the
original code that would lead to memory corruption in the kernel if you
had audit configured. If you didn't have audit configured it was
harmless.There are some programs such as beta versions of Ruby which use too
large of a buffer and returning an error code breaks them. We should
clamp the ->msg_namelen value instead.Fixes: 1661bf364ae9 ("net: heap overflow in __audit_sockaddr()")
Reported-by: Eric Wong
Signed-off-by: Dan Carpenter
Tested-by: Eric Wong
Acked-by: Eric Dumazet
Signed-off-by: David S. Miller
21 Nov, 2013
2 commits
-
In that case it is probable that kernel code overwrote part of the
stack. So we should bail out loudly here.The BUG_ON may be removed in future if we are sure all protocols are
conformant.Suggested-by: Eric Dumazet
Signed-off-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller -
This patch now always passes msg->msg_namelen as 0. recvmsg handlers must
set msg_namelen to the proper size
Suggested-by: Eric Dumazet
Signed-off-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller
20 Nov, 2013
1 commit
-
SIOCSHWTSTAMP returns the real configuration to the application
using it, but there is currently no way for any other
application to find out the configuration non-destructively.
Add a new ioctl for this, making it unprivileged.Signed-off-by: Ben Hutchings
19 Nov, 2013
2 commits
-
Signed-off-by: Ben Hutchings
-
We don't need to check that ifr_data itself is a valid user pointer,
but we should check &ifr_data is. Thankfully the copy of ifr_name is
checked, so this can only leak a few bytes from immediately above the
user address limit.Signed-off-by: Ben Hutchings
04 Oct, 2013
1 commit
-
We need to cap ->msg_namelen or it leads to a buffer overflow when we
to the memcpy() in __audit_sockaddr(). It requires CAP_AUDIT_CONTROL to
exploit this bug.The call tree is:
___sys_recvmsg()
move_addr_to_user()
audit_sockaddr()
__audit_sockaddr()Reported-by: Jüri Aedla
Signed-off-by: Dan Carpenter
Signed-off-by: David S. Miller
14 Sep, 2013
1 commit
-
Pull aio changes from Ben LaHaise:
"First off, sorry for this pull request being late in the merge window.
Al had raised a couple of concerns about 2 items in the series below.
I addressed the first issue (the race introduced by Gu's use of
mm_populate()), but he has not provided any further details on how he
wants to rework the anon_inode.c changes (which were sent out months
ago but have yet to be commented on).The bulk of the changes have been sitting in the -next tree for a few
months, with all the issues raised being addressed"* git://git.kvack.org/~bcrl/aio-next: (22 commits)
aio: rcu_read_lock protection for new rcu_dereference calls
aio: fix race in ring buffer page lookup introduced by page migration support
aio: fix rcu sparse warnings introduced by ioctx table lookup patch
aio: remove unnecessary debugging from aio_free_ring()
aio: table lookup: verify ctx pointer
staging/lustre: kiocb->ki_left is removed
aio: fix error handling and rcu usage in "convert the ioctx list to table lookup v3"
aio: be defensive to ensure request batching is non-zero instead of BUG_ON()
aio: convert the ioctx list to table lookup v3
aio: double aio_max_nr in calculations
aio: Kill ki_dtor
aio: Kill ki_users
aio: Kill unneeded kiocb members
aio: Kill aio_rw_vect_retry()
aio: Don't use ctx->tail unnecessarily
aio: io_cancel() no longer returns the io_event
aio: percpu ioctx refcount
aio: percpu reqs_available
aio: reqs_active -> reqs_available
aio: fix build when migration is disabled
...
12 Sep, 2013
1 commit
-
I found the following pattern that leads in to interesting findings:
grep -r "ret.*|=.*__put_user" *
grep -r "ret.*|=.*__get_user" *
grep -r "ret.*|=.*__copy" *The __put_user() calls in compat_ioctl.c, ptrace compat, signal compat,
since those appear in compat code, we could probably expect the kernel
addresses not to be reachable in the lower 32-bit range, so I think they
might not be exploitable.For the "__get_user" cases, I don't think those are exploitable: the worse
that can happen is that the kernel will copy kernel memory into in-kernel
buffers, and will fail immediately afterward.The alpha csum_partial_copy_from_user() seems to be missing the
access_ok() check entirely. The fix is inspired from x86. This could
lead to information leak on alpha. I also noticed that many architectures
map csum_partial_copy_from_user() to csum_partial_copy_generic(), but I
wonder if the latter is performing the access checks on every
architectures.Signed-off-by: Mathieu Desnoyers
Cc: Richard Henderson
Cc: Ivan Kokshaysky
Cc: Matt Turner
Cc: Jens Axboe
Cc: Oleg Nesterov
Cc: David Miller
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
02 Aug, 2013
1 commit
-
Eliezer renames several *ll_poll to *busy_poll, but forgets
CONFIG_NET_LL_RX_POLL, so in case of confusion, rename it too.Cc: Eliezer Tamir
Cc: David S. Miller
Signed-off-by: Cong Wang
Signed-off-by: David S. Miller
30 Jul, 2013
2 commits
-
sock_aio_dtor() is dead code - and stuff that does need to do cleanup
can simply do it before calling aio_complete().Signed-off-by: Kent Overstreet
Cc: Zach Brown
Cc: Felipe Balbi
Cc: Greg Kroah-Hartman
Cc: Mark Fasheh
Cc: Joel Becker
Cc: Rusty Russell
Cc: Jens Axboe
Cc: Asai Thambi S P
Cc: Selvan Mani
Cc: Sam Bradshaw
Cc: Jeff Moyer
Cc: Al Viro
Cc: Benjamin LaHaise
Cc: Theodore Ts'o
Signed-off-by: Benjamin LaHaise -
This code doesn't serve any purpose anymore, since the aio retry
infrastructure has been removed.This change should be safe because aio_read/write are also used for
synchronous IO, and called from do_sync_read()/do_sync_write() - and
there's no looping done in the sync case (the read and write syscalls).Signed-off-by: Kent Overstreet
Cc: Zach Brown
Cc: Felipe Balbi
Cc: Greg Kroah-Hartman
Cc: Mark Fasheh
Cc: Joel Becker
Cc: Rusty Russell
Cc: Jens Axboe
Cc: Asai Thambi S P
Cc: Selvan Mani
Cc: Sam Bradshaw
Cc: Jeff Moyer
Cc: Al Viro
Cc: Benjamin LaHaise
Signed-off-by: Benjamin LaHaise
11 Jul, 2013
2 commits
-
Rename LL_SO to BUSY_POLL_SO
Rename sysctl_net_ll_{read,poll} to sysctl_busy_{read,poll}
Fix up users of these variables.
Fix documentation for sysctl.a patch for the socket.7 man page will follow separately,
because of limitations of my mail setup.Signed-off-by: Eliezer Tamir
Signed-off-by: David S. Miller -
Rename the file and correct all the places where it is included.
Signed-off-by: Eliezer Tamir
Signed-off-by: David S. Miller
09 Jul, 2013
1 commit
-
Rename functions in include/net/ll_poll.h to busy wait.
Clarify documentation about expected power use increase.
Rename POLL_LL to POLL_BUSY_LOOP.
Add need_resched() testing to poll/select busy loops.Note, that in select and poll can_busy_poll is dynamic and is
updated continuously to reflect the existence of supported
sockets with valid queue information.Signed-off-by: Eliezer Tamir
Signed-off-by: David S. Miller
26 Jun, 2013
1 commit
-
select/poll busy-poll support.
Split sysctl value into two separate ones, one for read and one for poll.
updated Documentation/sysctl/net.txtAdd a new poll flag POLL_LL. When this flag is set, sock_poll will call
sk_poll_ll if possible. sock_poll sets this flag in its return value
to indicate to select/poll when a socket that can busy poll is found.When poll/select have nothing to report, call the low-level
sock_poll again until we are out of time or we find something.Once the system call finds something, it stops setting POLL_LL, so it can
return the result to the user ASAP.Signed-off-by: Eliezer Tamir
Signed-off-by: David S. Miller
18 Jun, 2013
2 commits
-
adds a socket option for low latency polling.
This allows overriding the global sysctl value with a per-socket one.
Unexport sysctl_net_ll_poll since for now it's not needed in modules.Signed-off-by: Eliezer Tamir
Signed-off-by: David S. Miller -
There is no reason for sysctl_net_ll_poll to be an unsigned long.
Change it into an unsigned int.
Fix the proc handler.Signed-off-by: Eliezer Tamir
Signed-off-by: David S. Miller