14 Oct, 2013
1 commit
-
[ Upstream commit bd784a140712fd06674f2240eecfc4ccae421129 ]
DCCP shouldn't be setting sk_err on redirects as it
isn't an error condition. It should be doing exactly
what TCP is doing and leaving the error handler without
touching the socket.
Signed-off-by: Duan Jiong
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman
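The behaviour described in this commit can be sketched with a toy error handler. This is only an illustration: `toy_sock`, `toy_icmp_type` and `toy_err_handler` are hypothetical names, not the kernel's types or functions, and the real handler does considerably more work.

```c
#include <assert.h>

/* Hypothetical stand-ins for illustration; not kernel definitions. */
enum toy_icmp_type { TOY_ICMP_DEST_UNREACH, TOY_ICMP_REDIRECT };

struct toy_sock {
    int sk_err;     /* pending error that would be reported to the user */
    int redirects;  /* number of redirects processed */
};

/* Handle an ICMP message for a socket. A redirect is not an error, so
 * the handler returns before sk_err can be touched. */
void toy_err_handler(struct toy_sock *sk, enum toy_icmp_type type, int err)
{
    if (type == TOY_ICMP_REDIRECT) {
        sk->redirects++;   /* routing state would be updated here */
        return;
    }
    sk->sk_err = err;      /* only genuine errors reach this point */
}
```

Calling `toy_err_handler()` with `TOY_ICMP_REDIRECT` leaves `sk_err` at its previous value, which is the property the commit restores for DCCP.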
18 Mar, 2013
1 commit
-
TCPCT uses option-number 253, which is reserved for experimental use and
should not be used in production environments.
Further, TCPCT does not fully implement RFC 6013.

As a nice side-effect, removing TCPCT increases TCP's performance for
very short flows:

Doing an apache-benchmark with -c 100 -n 100000, sending HTTP-requests
for files of 1KB size.

before this patch:
average (among 7 runs) of 20845.5 Requests/Second
after:
average (among 7 runs) of 21403.6 Requests/Second
Signed-off-by: Christoph Paasch
Signed-off-by: David S. Miller
22 Feb, 2013
1 commit
-
Pull driver core patches from Greg Kroah-Hartman:
"Here is the big driver core merge for 3.9-rc1

There are two major series here, both of which touch lots of drivers
all over the kernel, and will cause you some merge conflicts:

- add a new function called devm_ioremap_resource() to properly be
able to check return values.
- remove CONFIG_EXPERIMENTAL

Other than those patches, there's not much here, some minor fixes and
updates"

Fix up trivial conflicts
* tag 'driver-core-3.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (221 commits)
base: memory: fix soft/hard_offline_page permissions
drivercore: Fix ordering between deferred_probe and exiting initcalls
backlight: fix class_find_device() arguments
TTY: mark tty_get_device call with the proper const values
driver-core: constify data for class_find_device()
firmware: Ignore abort check when no user-helper is used
firmware: Reduce ifdef CONFIG_FW_LOADER_USER_HELPER
firmware: Make user-mode helper optional
firmware: Refactoring for splitting user-mode helper code
Driver core: treat unregistered bus_types as having no devices
watchdog: Convert to devm_ioremap_resource()
thermal: Convert to devm_ioremap_resource()
spi: Convert to devm_ioremap_resource()
power: Convert to devm_ioremap_resource()
mtd: Convert to devm_ioremap_resource()
mmc: Convert to devm_ioremap_resource()
mfd: Convert to devm_ioremap_resource()
media: Convert to devm_ioremap_resource()
iommu: Convert to devm_ioremap_resource()
drm: Convert to devm_ioremap_resource()
...
19 Feb, 2013
2 commits
-
proc_net_remove is only used to remove proc entries
under /proc/net; it is not a general function for
removing proc entries of a netns. If we want to remove
proc entries under /proc/net/stat/, we still
need to call remove_proc_entry.

This patch uses remove_proc_entry to replace proc_net_remove.
We can remove proc_net_remove after this patch.
Signed-off-by: Gao feng
Signed-off-by: David S. Miller
-
Right now, some modules such as bonding use proc_create
to create proc entries under /proc/net/, and other modules
such as ipv4 use proc_net_fops_create. It looks a little chaotic.

This patch changes all uses of
proc_net_fops_create to proc_create. We can remove
proc_net_fops_create after this patch.
Signed-off-by: Gao feng
Signed-off-by: David S. Miller
12 Jan, 2013
2 commits
-
The CONFIG_EXPERIMENTAL config item has not carried much meaning for a
while now and is almost always enabled by default. As agreed during the
Linux kernel summit, remove it from any "depends on" lines in Kconfigs.
CC: Gerrit Renker
CC: "David S. Miller"
Signed-off-by: Kees Cook
Acked-by: David S. Miller
Acked-by: Gerrit Renker
-
The CONFIG_EXPERIMENTAL config item has not carried much meaning for a
while now and is almost always enabled by default. As agreed during the
Linux kernel summit, remove it from any "depends on" lines in Kconfigs.
CC: Gerrit Renker
CC: "David S. Miller"
Signed-off-by: Kees Cook
Acked-by: David S. Miller
Acked-by: Gerrit Renker
15 Dec, 2012
1 commit
-
If in either of the above functions inet_csk_route_child_sock() or
__inet_inherit_port() fails, the newsk will not be freed:

unreferenced object 0xffff88022e8a92c0 (size 1592):
comm "softirq", pid 0, jiffies 4294946244 (age 726.160s)
hex dump (first 32 bytes):
0a 01 01 01 0a 01 01 02 00 00 00 00 a7 cc 16 00 ................
02 00 03 01 00 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace:
[] kmemleak_alloc+0x21/0x3e
[] kmem_cache_alloc+0xb5/0xc5
[] sk_prot_alloc.isra.53+0x2b/0xcd
[] sk_clone_lock+0x16/0x21e
[] inet_csk_clone_lock+0x10/0x7b
[] tcp_create_openreq_child+0x21/0x481
[] tcp_v4_syn_recv_sock+0x3a/0x23b
[] tcp_check_req+0x29f/0x416
[] tcp_v4_do_rcv+0x161/0x2bc
[] tcp_v4_rcv+0x6c9/0x701
[] ip_local_deliver_finish+0x70/0xc4
[] ip_local_deliver+0x4e/0x7f
[] ip_rcv_finish+0x1fc/0x233
[] ip_rcv+0x217/0x267
[] __netif_receive_skb+0x49e/0x553
[] netif_receive_skb+0x50/0x82

This happens, because sk_clone_lock initializes sk_refcnt to 2, and thus
a single sock_put() is not enough to free the memory. Additionally, things
like xfrm, memcg, cookie_values,... may have been initialized.
We have to free them properly.

This is fixed by forcing a call to tcp_done(), ending up in
inet_csk_destroy_sock, doing the final sock_put(). tcp_done() is necessary,
because it ends up doing all the cleanup on xfrm, memcg, cookie_values,...

Before calling tcp_done, we have to set the socket to SOCK_DEAD, to
force it entering inet_csk_destroy_sock. To avoid the warning in
inet_csk_destroy_sock, inet_num has to be set to 0.
As inet_csk_destroy_sock does a dec on orphan_count, we first have to
increase it.

Calling tcp_done() allows us to remove the calls to
tcp_clear_xmit_timer() and tcp_cleanup_congestion_control().

A similar approach is taken for dccp by calling dccp_done().
This is in the kernel since 093d282321 (tproxy: fix hash locking issue
when using port redirection in __inet_inherit_port()), thus since
version >= 2.6.37.
Signed-off-by: Christoph Paasch
Signed-off-by: David S. Miller
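The reference-count reasoning in this commit can be illustrated with a toy model. This is a simplified sketch under stated assumptions: `toy_sock`, `toy_clone`, `toy_put` and `toy_done` are hypothetical names, and the only detail taken from the commit is that a cloned socket starts with a count of 2, so one put cannot free it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy socket; it only tracks a reference count. */
struct toy_sock {
    int refcnt;
    bool freed;
};

/* Mirrors the commit's observation: a clone starts with two references. */
void toy_clone(struct toy_sock *sk)
{
    sk->refcnt = 2;
    sk->freed = false;
}

/* Drop one reference; the object is released only when the count hits 0. */
void toy_put(struct toy_sock *sk)
{
    assert(sk->refcnt > 0);
    if (--sk->refcnt == 0)
        sk->freed = true;
}

/* Stand-in for the forced teardown path: cleanup of attached state
 * would happen here, followed by the final put. */
void toy_done(struct toy_sock *sk)
{
    toy_put(sk);
}
```

After `toy_clone()`, a single `toy_put()` leaves the object allocated, which corresponds to the leak kmemleak reported. Only the additional `toy_done()` releases it.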
04 Nov, 2012
1 commit
-
For passive TCP connections using TCP_DEFER_ACCEPT facility,
we incorrectly increment req->retrans each time timeout triggers
while no SYNACK is sent.

SYNACK are not sent for TCP_DEFER_ACCEPT that were established (for
which we received the ACK from client). Only the last SYNACK is sent
so that we can receive again an ACK from client, to move the req into
accept queue. We plan to change this later to avoid the useless
retransmit (and potential problem as this SYNACK could be lost)

TCP_INFO later gives wrong information to user, claiming imaginary
retransmits.

Decouple req->retrans field into two independent fields :
num_retrans : number of retransmit
num_timeout : number of timeouts

num_timeout is the counter that is incremented at each timeout,
regardless of actual SYNACK being sent or not, and used to
compute the exponential timeout.

Introduce inet_rtx_syn_ack() helper to increment num_retrans
only if ->rtx_syn_ack() succeeded.

Use inet_rtx_syn_ack() from tcp_check_req() to increment num_retrans
when we re-send a SYNACK in answer to a (retransmitted) SYN.
Prior to this patch, we were not counting these retransmits.

Change tcp_v[46]_rtx_synack() to increment TCP_MIB_RETRANSSEGS
only if a synack packet was successfully queued.
Reported-by: Yuchung Cheng
Signed-off-by: Eric Dumazet
Cc: Julian Anastasov
Cc: Vijay Subramanian
Cc: Elliott Hughes
Cc: Neal Cardwell
Signed-off-by: David S. Miller
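The split into two counters can be sketched as follows. This is a minimal userspace illustration, not the kernel code: `toy_req`, `toy_rtx_syn_ack`, `toy_on_timeout` and the two send hooks are hypothetical, and only the field names `num_retrans` and `num_timeout` come from the commit message.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical request structure carrying the two decoupled counters. */
struct toy_req {
    unsigned int num_retrans;  /* SYNACKs actually retransmitted */
    unsigned int num_timeout;  /* timer expirations, sent or not */
};

/* Hypothetical send hook: returns 0 on success, nonzero on failure. */
typedef int (*toy_rtx_fn)(struct toy_req *req);

/* Count a retransmit only if the send hook reports success. */
int toy_rtx_syn_ack(struct toy_req *req, toy_rtx_fn rtx)
{
    int err = rtx(req);

    if (!err)
        req->num_retrans++;
    return err;
}

/* Timer expiry: always counts a timeout, retransmits only when asked. */
void toy_on_timeout(struct toy_req *req, toy_rtx_fn rtx, bool send_synack)
{
    if (send_synack)
        toy_rtx_syn_ack(req, rtx);
    req->num_timeout++;
}

/* Two trivial hooks used to exercise both outcomes. */
int toy_send_ok(struct toy_req *req)   { (void)req; return 0; }
int toy_send_fail(struct toy_req *req) { (void)req; return -1; }
```

With this shape, a deferred-accept timeout that sends nothing advances only `num_timeout`, so a statistic derived from `num_retrans` no longer reports retransmits that never happened.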
16 Aug, 2012
2 commits
-
The CCID3 code fails to initialize the trailing padding bytes of struct
tfrc_tx_info added for alignment on 64 bit architectures. It therefore
potentially leaks four bytes of kernel stack via the getsockopt() syscall.
Add an explicit memset(0) before filling the structure to avoid the
info leak.
Signed-off-by: Mathias Krause
Cc: Gerrit Renker
Signed-off-by: David S. Miller
-
ccid_hc_rx_getsockopt() and ccid_hc_tx_getsockopt() might be called with
a NULL ccid pointer leading to a NULL pointer dereference. This could
lead to a privilege escalation if the attacker is able to map page 0 and
prepare it with a fake ccid_ops pointer.
Signed-off-by: Mathias Krause
Cc: Gerrit Renker
Cc: stable@vger.kernel.org
Signed-off-by: David S. Miller
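The two CCID fixes from this date (zeroing a structure before filling it, and guarding against a NULL ccid pointer) can be sketched in plain C. Everything here is hypothetical: `toy_info` is not the kernel's `tfrc_tx_info`, the `toy_ccid` types only imitate an ops table, and returning `-ENOPROTOOPT` for a missing handler is an assumption made for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical struct. On common 64-bit ABIs the 8-byte member makes
 * sizeof 16, which leaves 4 trailing padding bytes after rtt. */
struct toy_info {
    uint64_t rate;
    uint32_t rtt;
};

/* Zero the whole object first so the padding cannot carry stale data
 * when the structure is later copied out byte for byte. */
void toy_fill_info(struct toy_info *out, uint64_t rate, uint32_t rtt)
{
    memset(out, 0, sizeof(*out));
    out->rate = rate;
    out->rtt = rtt;
}

/* Hypothetical ops table and object, loosely modelled on a ccid. */
struct toy_ccid_ops { int (*getsockopt)(int optname); };
struct toy_ccid { const struct toy_ccid_ops *ops; };

/* Check every pointer before dereferencing; the error code is assumed. */
int toy_ccid_getsockopt(const struct toy_ccid *ccid, int optname)
{
    if (ccid == NULL || ccid->ops == NULL || ccid->ops->getsockopt == NULL)
        return -ENOPROTOOPT;
    return ccid->ops->getsockopt(optname);
}

/* Trivial handler used to exercise the non-NULL path. */
int toy_opt_handler(int optname) { return optname + 1; }
```

The `memset()` has to come before the member assignments. Assigning the members alone leaves whatever bytes were previously in the padding.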
24 Jul, 2012
1 commit
-
Use inet_iif() consistently, and for TCP record the input interface of
cached RX dst in inet sock.

rt->rt_iif is going to be encoded differently, so that we can
legitimately cache input routes in the FIB info more aggressively.

When the input interface is "use SKB device index" the rt->rt_iif will
be set to zero.

This forces us to move the TCP RX dst cache installation into the ipv4
specific code, and as well it should since doing the route caching for
ipv6 is pointless at the moment since it is not inspected in the ipv6
input paths yet.

Also, remove the unlikely on dst->obsolete, all ipv4 dsts have
obsolete set to a non-zero value to force invocation of the check
callback.
Signed-off-by: David S. Miller
21 Jul, 2012
1 commit
-
Signed-off-by: David S. Miller
17 Jul, 2012
1 commit
-
This will be used so that we can compose a full flow key.
Even though we have a route in this context, we need more. In the
future the routes will be without destination address, source address,
etc. keying. One ipv4 route will cover entire subnets, etc.

In this environment we have to have a way to possess persistent storage
for redirects and PMTU information. This persistent storage will exist
in the FIB tables, and that's why we'll need to be able to rebuild a
full lookup flow key here. Using that flow key will do a fib_lookup()
and create/update the persistent entry.
Signed-off-by: David S. Miller
16 Jul, 2012
2 commits
-
This is the ipv6 version of inet_csk_update_pmtu().
Signed-off-by: David S. Miller
-
This abstracts away the call to dst_ops->update_pmtu() so that we can
transparently handle the fact that, in the future, the dst itself can
be invalidated by the PMTU update (when we have non-host routes cached
in sockets).

So we try to rebuild the socket cached route after the method
invocation if necessary.

This isn't used by SCTP because it needs to cache dsts per-transport,
and thus will need its own local version of this helper.
Signed-off-by: David S. Miller
12 Jul, 2012
3 commits
-
No longer necessary.
Signed-off-by: David S. Miller
-
Signed-off-by: David S. Miller
-
Signed-off-by: David S. Miller
11 Jul, 2012
1 commit
-
Fix incorrect start markers, wrapped summary lines, missing section
breaks, incorrect separators, and some name mismatches.
Signed-off-by: Ben Hutchings
Signed-off-by: David S. Miller
05 Jul, 2012
1 commit
-
opt always equals np->opts, so it is meaningless to define opt, and
check if opt does not equal np->opts and then try to free opt.
Signed-off-by: RongQing.Li
Acked-by: Eric Dumazet
Signed-off-by: David S. Miller
23 Jun, 2012
1 commit
-
Don't cache output dst for syncookies, as this adds pressure on IP route
cache and rcu subsystem for no gain.
Signed-off-by: Eric Dumazet
Cc: Hans Schillstrom
Signed-off-by: Jesper Dangaard Brouer
Signed-off-by: David S. Miller
16 Jun, 2012
1 commit
-
One tricky issue on the ipv6 side vs. ipv4 is that the ICMP callouts
to handle the error pass the 32-bit info cookie in network byte order
whereas ipv4 passes it around in host byte order.

Like the ipv4 side, we have two helper functions. One for when we
have a socket context and one for when we do not.

ip6ip6 tunnels are not handled here, because they handle PMTU events
by essentially relaying another ICMP packet-too-big message back to
the original sender.

This patch allows us to get rid of rt6_do_pmtu_disc(). It handles all
kinds of situations that simply cannot happen when we do the PMTU
update directly using a fully resolved route.

In fact, the "plen == 128" check in ip6_rt_update_pmtu() can very
likely be removed or changed into a BUG_ON() check. We should never
have a prefixed ipv6 route when we get there.

Another piece of strange history here is that TCP and DCCP, unlike in
ipv4, never invoke the update_pmtu() method from their ICMP error
handlers. This is incredibly astonishing since this is the context
where we have the most accurate context in which to make a PMTU
update, namely we have a fully connected socket and associated cached
socket route.
Signed-off-by: David S. Miller
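The byte-order difference mentioned at the start of this commit message can be sketched in userspace C. The two helper names are hypothetical and only show where the conversion would sit. `ntohl()` and `htonl()` are the standard POSIX conversions from `<arpa/inet.h>`.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper for the ipv4-style convention described above:
 * the 32-bit info value is already in host byte order. */
uint32_t toy_mtu_from_v4_info(uint32_t info_host)
{
    return info_host;
}

/* Hypothetical helper for the ipv6-style convention: the value arrives
 * in network byte order and must be converted before use. */
uint32_t toy_mtu_from_v6_info(uint32_t info_net)
{
    return ntohl(info_net);
}
```

Passing a network-order value to the ipv4-style helper would silently yield a byte-swapped MTU on little-endian machines, which is why the commit calls this a tricky issue.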
17 May, 2012
1 commit
-
bool/const conversions where possible
__inline__ -> inline
space cleanups
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
21 Apr, 2012
2 commits
-
This results in code with less boiler plate that is a bit easier
to read.

Additionally stops us from using compatibility code in the sysctl
core, hastening the day when the compatibility code can be removed.
Signed-off-by: Eric W. Biederman
Acked-by: Pavel Emelyanov
Signed-off-by: David S. Miller
-
This makes it clearer which sysctls are relative to your current network
namespace.

This makes it a little less error prone by not exposing sysctls for the
initial network namespace in other namespaces.

This is the same way we handle all of our other network interfaces to
userspace and I can't honestly remember why we didn't do this for
sysctls right from the start.
Signed-off-by: Eric W. Biederman
Acked-by: Pavel Emelyanov
Signed-off-by: David S. Miller
20 Apr, 2012
1 commit
-
When we need to clone skb, we don't drop a packet.
Call consume_skb() to not confuse dropwatch.
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
16 Apr, 2012
1 commit
-
Use of "unsigned int" is preferred to bare "unsigned" in net tree.
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
15 Apr, 2012
1 commit
-
There are two struct request_sock_ops providers, tcp and dccp.
inet_csk_reqsk_queue_prune() can avoid testing syn_ack_timeout being
NULL if we make it non NULL like syn_ack_timeout
Signed-off-by: Eric Dumazet
Cc: Gerrit Renker
Cc: dccp@vger.kernel.org
Signed-off-by: David S. Miller
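The idea of making a callback always non-NULL so the caller can skip the test can be sketched like this. The names `toy_req`, `toy_req_ops`, `toy_noop_timeout`, `toy_counting_timeout` and `toy_prune` are hypothetical. Only the member name `syn_ack_timeout` is taken from the commit message.

```c
#include <assert.h>

/* Hypothetical request with a counter that a real handler might bump. */
struct toy_req {
    int timeouts;
};

/* Hypothetical ops table with a single timeout callback. */
struct toy_req_ops {
    void (*syn_ack_timeout)(struct toy_req *req);
};

/* Do-nothing callback for providers that have no work to do here. */
void toy_noop_timeout(struct toy_req *req)
{
    (void)req;
}

/* Callback for a provider that wants to record the event. */
void toy_counting_timeout(struct toy_req *req)
{
    req->timeouts++;
}

/* Because every provider supplies a callback, the caller can invoke it
 * unconditionally, with no NULL test on the hot path. */
void toy_prune(const struct toy_req_ops *ops, struct toy_req *req)
{
    ops->syn_ack_timeout(req);
}
```

The trade-off is one extra indirect call for providers that use the no-op, in exchange for removing a branch from every invocation.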
04 Mar, 2012
2 commits
-
This fixes a bug in the sequence number validation during the initial handshake.
The code did not treat the initial sequence numbers ISS and ISR as read-only and
did not keep state for GSR and GSS as required by the specification. This causes
problems with retransmissions during the initial handshake, causing the
budding connection to be reset.

This patch now treats ISS/ISR as read-only and tracks GSS/GSR as required.
Signed-off-by: Samuel Jero
Signed-off-by: Gerrit Renker
-
This replaces an unjustified BUG_ON(), which could get triggered under normal
conditions: X_calc can be 0 when p > 0. X would in this case be set to the
minimum, s/t_mbi. Its replacement avoids t_ipi = 0 (unbounded sending rate).

Thanks to Jordi, Victor and Xavier who reported this.
Signed-off-by: Gerrit Renker
Acked-by: Ian McDonald
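The clamping described in this commit can be sketched with integer arithmetic. This is an illustration under assumptions, not the kernel code: the function names are hypothetical, the rate formula follows the TFRC form X = max(min(X_calc, 2*X_recv), s/t_mbi) from RFC 3448, t_mbi is taken as 64 seconds from that RFC, and rates are plain bytes per second rather than the kernel's scaled units.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed from RFC 3448: maximum back-off interval of 64 seconds. */
#define TOY_T_MBI_SEC 64u

/* Allowed sending rate in bytes/second for packet size s (bytes).
 * A computed rate of 0 falls back to the minimum s/t_mbi instead of
 * being treated as a fatal condition. */
uint64_t toy_allowed_rate(uint64_t x_calc, uint64_t x_recv, uint32_t s)
{
    uint64_t floor = s / TOY_T_MBI_SEC;
    uint64_t x = x_calc < 2 * x_recv ? x_calc : 2 * x_recv;

    if (floor == 0)
        floor = 1;  /* keep the rate strictly positive for tiny packets */
    return x > floor ? x : floor;
}

/* Inter-packet interval in microseconds: t_ipi = s / X. */
uint64_t toy_t_ipi_us(uint32_t s, uint64_t x)
{
    assert(x > 0);  /* guaranteed by toy_allowed_rate() */
    return (uint64_t)s * 1000000u / x;
}
```

With s = 1460 and X_calc = 0, the rate falls back to 1460 / 64 = 22 bytes per second (integer division), so the resulting interval is long but never zero.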
12 Jan, 2012
1 commit
-
Signed-off-by: Pavel Emelyanov
Signed-off-by: David S. Miller
20 Dec, 2011
2 commits
-
module_param(bool) used to counter-intuitively take an int. In
fddd5201 (mid-2009) we allowed bool or int/unsigned int using a messy
trick.

It's time to remove the int/unsigned int option. For this version
it'll simply give a warning, but it'll break next kernel version.

(Thanks to Joe Perches for suggesting coccinelle for 0/1 -> true/false).
Cc: "David S. Miller"
Cc: netdev@vger.kernel.org
Signed-off-by: Rusty Russell
Signed-off-by: David S. Miller
-
DaveM said:
Please, this kind of stuff rots forever and not using bool properly
drives me crazy.

Joe Perches gave me the spatch script:

@@
bool b;
@@
-b = 0
+b = false
@@
bool b;
@@
-b = 1
+b = true

I merely installed coccinelle, read the documentation and took credit.
Signed-off-by: Rusty Russell
Signed-off-by: David S. Miller
17 Dec, 2011
1 commit
-
I've made a mistake when fixing the sock_/inet_diag aliases :(
1. The sock_diag layer should request the family-based alias,
not just the IPPROTO_IP one;
2. The inet_diag layer should request the AF_INET+protocol alias,
not just the protocol one.

Thus fix this.
Signed-off-by: Pavel Emelyanov
Signed-off-by: David S. Miller
12 Dec, 2011
1 commit
-
Instead of testing defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
10 Dec, 2011
2 commits
-
Introduce two callbacks in inet_diag_handler -- one for dumping all
sockets (with filters) and the other one for dumping a single sk.

Replace direct calls to icsk handlers with indirect calls to callbacks
provided by handlers.

Make existing TCP and DCCP handlers use provided helpers for icsk-s.
The UDP diag module will provide its own.
Signed-off-by: Pavel Emelyanov
Signed-off-by: David S. Miller
-
There's an info_size value stored on inet_diag_handler, but for existing
code this value is effectively constant, so just use sizeof(struct tcp_info)
where required.
Signed-off-by: Pavel Emelyanov
Signed-off-by: David S. Miller
07 Dec, 2011
2 commits
-
Sorry, but the vger didn't let this message go to the list. Re-sending it with
less spam-filter-prone subject.

When dumping the AF_INET/AF_INET6 sockets user will also specify the protocol,
so prepare the protocol diag handlers to work with IPPROTO_ constants.
Signed-off-by: Pavel Emelyanov
Signed-off-by: David S. Miller
-
The ultimate goal is to get the sock_diag module, that works in
family+protocol terms. Currently this is suitable to do on the
inet_diag basis, so rename parts of the code. It will be moved
to sock_diag.c later.
Signed-off-by: Pavel Emelyanov
Signed-off-by: David S. Miller