19 Oct, 2011
1 commit
-
The transparent socket option setting was not copied to the time wait
socket when an inet socket was being replaced by a time wait socket. This
broke the --transparent option of the socket match and may have caused
that FIN packets belonging to sockets in FIN_WAIT2 or TIME_WAIT state
were being dropped by the packet filter.Signed-off-by: KOVACS Krisztian
Signed-off-by: David S. Miller
05 Oct, 2011
2 commits
-
lost_skb_hint is used by tcp_mark_head_lost() to mark the first unhandled skb.
lost_cnt_hint is the number of packets or sacked packets before the lost_skb_hint;
When shifting a skb that is before the lost_skb_hint, if tcp_is_fack() is ture,
the skb has already been counted in the lost_cnt_hint; if tcp_is_fack() is false,
tcp_sacktag_one() will increase the lost_cnt_hint. So tcp_shifted_skb() does not
need to adjust the lost_cnt_hint by itself. When shifting a skb that is equal to
lost_skb_hint, the shifted packets will not be counted by tcp_mark_head_lost().
So tcp_shifted_skb() should adjust the lost_cnt_hint even tcp_is_fack(tp) is true.Signed-off-by: Zheng Yan
Signed-off-by: David S. Miller -
tcp_v4_clear_md5_list() assumes that multiple tcp md5sig peers
only hold one reference to md5sig_pool. but tcp_v4_md5_do_add()
increases use count of md5sig_pool for each peer. This patch
makes tcp_v4_md5_do_add() only increases use count for the first
tcp md5sig peer.Signed-off-by: Zheng Yan
Signed-off-by: David S. Miller
19 Sep, 2011
1 commit
-
D-SACK is allowed to reside below snd_una. But the corresponding check
in tcp_is_sackblock_valid() is the exact opposite. It looks like a typo.Signed-off-by: Zheng Yan
Acked-by: Eric Dumazet
Signed-off-by: David S. Miller
17 Sep, 2011
1 commit
-
Commit 4670994d(net,rcu: convert call_rcu(fc_rport_free_rcu) to
kfree_rcu()) introduced a memory leak. This patch reverts it.Signed-off-by: Zheng Yan
Signed-off-by: David S. Miller
16 Sep, 2011
2 commits
-
"Possible SYN flooding on port xxxx " messages can fill logs on servers.
Change logic to log the message only once per listener, and add two new
SNMP counters to track :TCPReqQFullDoCookies : number of times a SYNCOOKIE was replied to client
TCPReqQFullDrop : number of times a SYN request was dropped because
syncookies were not enabled.Based on a prior patch from Tom Herbert, and suggestions from David.
Signed-off-by: Eric Dumazet
CC: Tom Herbert
Signed-off-by: David S. Miller
31 Aug, 2011
2 commits
-
commit d0733d2e29b65 (Check for mistakenly passed in non-IPv4 address)
added regression on legacy apps that use bind() with AF_UNSPEC family.Relax the check, but make sure the bind() is done on INADDR_ANY
addresses, as AF_UNSPEC has probably no sane meaning for other
addresses.Bugzilla reference : https://bugzilla.kernel.org/show_bug.cgi?id=42012
Signed-off-by: Eric Dumazet
Reported-and-bisected-by: Rene Meier
CC: Marcus Meissner
Signed-off-by: David S. Miller
30 Aug, 2011
1 commit
-
A userspace listener may send (bogus) NF_STOLEN verdict, which causes skb leak.
This problem was previously fixed via
64507fdbc29c3a622180378210ecea8659b14e40 (netfilter:
nf_queue: fix NF_STOLEN skb leak) but this had to be reverted because
NF_STOLEN can also be returned by a netfilter hook when iterating the
rules in nf_reinject.Reject userspace NF_STOLEN verdict, as suggested by Michal Miroslaw.
This is complementary to commit fad54440438a7c231a6ae347738423cbabc936d9
(netfilter: avoid double free in nf_reinject).Cc: Julian Anastasov
Cc: Eric Dumazet
Signed-off-by: Florian Westphal
Signed-off-by: Patrick McHardy
25 Aug, 2011
1 commit
-
Should check use count of include mode filter instead of total number
of include mode filters.Signed-off-by: Zheng Yan
Signed-off-by: David S. Miller
11 Aug, 2011
2 commits
-
As rt_iif represents input device even for packets
coming from loopback with output route, it is not an unique
key specific to input routes. Now rt_route_iif has such role,
it was fl.iif in 2.6.38, so better to change the checks at
some places to save CPU cycles and to restore 2.6.38 semantics.compare_keys:
- input routes: only rt_route_iif matters, rt_iif is same
- output routes: only rt_oif matters, rt_iif is not
used for matching in __ip_route_output_key
- now we are back to 2.6.38 stateip_route_input_common:
- matching rt_route_iif implies input route
- compared to 2.6.38 we eliminated one rth->fl.oif check
because it was not needed even for 2.6.38compare_hash_inputs:
Only the change here is not an optimization, it has
effect only for output routes. I assume I'm restoring
the original intention to ignore oif, it was using fl.iif
- now we are back to 2.6.38 stateSigned-off-by: Julian Anastasov
Signed-off-by: David S. Miller -
Using a gcc 4.4.3, warnings are emitted for a possibly uninitialized use
of ecn_ok.This can happen if cookie_check_timestamp() returns due to not having
seen a timestamp. Defaulting to ecn off seems like a reasonable thing
to do in this case, so initialized ecn_ok to false.Signed-off-by: Mike Waychison
Signed-off-by: David S. Miller
08 Aug, 2011
5 commits
-
Make sure skb dst has reference when moving to
another context. Currently, I don't see protocols that can
hit it when sending broadcasts/multicasts to loopback using
noref dsts, so it is just a precaution.Signed-off-by: Julian Anastasov
Signed-off-by: David S. Miller -
The raw sockets can provide source address for
routing but their privileges are not considered. We
can provide non-local source address, make sure the
FLOWI_FLAG_ANYSRC flag is set if socket has privileges
for this, i.e. based on hdrincl (IP_HDRINCL) and
transparent flags.Signed-off-by: Julian Anastasov
Signed-off-by: David S. Miller -
TCP in some cases uses different global (raw) socket
to send RST and ACK. The transparent flag is not set there.
Currently, it is a problem for rerouting after the previous
change.Fix it by simplifying the checks in ip_route_me_harder
and use FLOWI_FLAG_ANYSRC even for sockets. It looks safe
because the initial routing allowed this source address to
be used and now we just have to make sure the packet is rerouted.As a side effect this also allows rerouting for normal
raw sockets that use spoofed source addresses which was not possible
even before we eliminated the ip_route_input call.Signed-off-by: Julian Anastasov
Signed-off-by: David S. Miller -
IP_PKTOPTIONS is broken for 32-bit applications running
in COMPAT mode on 64-bit kernels.This happens because msghdr's msg_flags field is always
set to zero. When running in COMPAT mode this should be
set to MSG_CMSG_COMPAT instead.Signed-off-by: Tiberiu Szocs-Mihai
Signed-off-by: Daniel Baluta
Signed-off-by: David S. Miller -
compare_keys and ip_route_input_common rely on
rt_oif for distinguishing of input and output routes
with same keys values. But sometimes the input route has
also same hash chain (keyed by iif != 0) with the output
routes (keyed by orig_oif=0). Problem visible if running
with small number of rhash_entries.Fix them to use rt_route_iif instead. By this way
input route can not be returned to users that request
output route.The patch fixes the ip_rt_bug errors that were
reported in ip_local_out context, mostly for 255.255.255.255
destinations.Signed-off-by: Julian Anastasov
Signed-off-by: David S. Miller
07 Aug, 2011
1 commit
-
Computers have become a lot faster since we compromised on the
partial MD4 hash which we use currently for performance reasons.MD5 is a much safer choice, and is inline with both RFC1948 and
other ISS generators (OpenBSD, Solaris, etc.)Furthermore, only having 24-bits of the sequence number be truly
unpredictable is a very serious limitation. So the periodic
regeneration and 8-bit counter have been removed. We compute and
use a full 32-bit sequence number.For ipv6, DCCP was found to use a 32-bit truncated initial sequence
number (it needs 43-bits) and that is fixed here as well.Reported-by: Dan Kaminsky
Tested-by: Willy Tarreau
Signed-off-by: David S. Miller
03 Aug, 2011
1 commit
-
Gergely Kalman reported crashes in check_peer_redir().
It appears commit f39925dbde778 (ipv4: Cache learned redirect
information in inetpeer.) added a race, leading to possible NULL ptr
dereference.Since we can now change dst neighbour, we should make sure a reader can
safely use a neighbour.Add RCU protection to dst neighbour, and make sure check_peer_redir()
can be called safely by different cpus in parallel.As neighbours are already freed after one RCU grace period, this patch
should not add typical RCU penalty (cache cold effects)Many thanks to Gergely for providing a pretty report pointing to the
bug.Reported-by: Gergely Kalman
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
01 Aug, 2011
1 commit
-
Convert array index from the loop bound to the loop index.
A simplified version of the semantic patch that fixes this problem is as
follows: (http://coccinelle.lip6.fr/)//
@@
expression e1,e2,ar;
@@for(e1 = 0; e1 < e2; e1++) { }
//Signed-off-by: Julia Lawall
Signed-off-by: David S. Miller
29 Jul, 2011
1 commit
-
ipq_build_packet_message() in net/ipv4/netfilter/ip_queue.c and
net/ipv6/netfilter/ip6_queue.c contain a small potential mem leak as
far as I can tell.We allocate memory for 'skb' with alloc_skb() annd then call
nlh = NLMSG_PUT(skb, 0, 0, IPQM_PACKET, size - sizeof(*nlh));NLMSG_PUT is a macro
NLMSG_PUT(skb, pid, seq, type, len) \
NLMSG_NEW(skb, pid, seq, type, len, 0)that expands to NLMSG_NEW, which is also a macro which expands to:
NLMSG_NEW(skb, pid, seq, type, len, flags) \
({ if (unlikely(skb_tailroom(skb) < (int)NLMSG_SPACE(len))) \
goto nlmsg_failure; \
__nlmsg_put(skb, pid, seq, type, len, flags); })If we take the true branch of the 'if' statement and 'goto
nlmsg_failure', then we'll, at that point, return from
ipq_build_packet_message() without having assigned 'skb' to anything
and we'll leak the memory we allocated for it when it goes out of
scope.Fix this by placing a 'kfree(skb)' at 'nlmsg_failure'.
I admit that I do not know how likely this to actually happen or even
if there's something that guarantees that it will never happen - I'm
not that familiar with this code, but if that is so, I've not been
able to spot it.Signed-off-by: Jesper Juhl
Signed-off-by: Patrick McHardy
28 Jul, 2011
1 commit
-
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (32 commits)
tg3: Remove 5719 jumbo frames and TSO blocks
tg3: Break larger frags into 4k chunks for 5719
tg3: Add tx BD budgeting code
tg3: Consolidate code that calls tg3_tx_set_bd()
tg3: Add partial fragment unmapping code
tg3: Generalize tg3_skb_error_unmap()
tg3: Remove short DMA check for 1st fragment
tg3: Simplify tx bd assignments
tg3: Reintroduce tg3_tx_ring_info
ASIX: Use only 11 bits of header for data size
ASIX: Simplify condition in rx_fixup()
Fix cdc-phonet build
bonding: reduce noise during init
bonding: fix string comparison errors
net: Audit drivers to identify those needing IFF_TX_SKB_SHARING cleared
net: add IFF_SKB_TX_SHARED flag to priv_flags
net: sock_sendmsg_nosec() is static
forcedeth: fix vlans
gianfar: fix bug caused by 87c288c6e9aa31720b72e2bc2d665e24e1653c3e
gro: Only reset frag0 when skb can be pulled
...
27 Jul, 2011
1 commit
-
This allows us to move duplicated code in
(atomic_inc_not_zero() for now) toSigned-off-by: Arun Sharma
Reviewed-by: Eric Dumazet
Cc: Ingo Molnar
Cc: David Miller
Cc: Eric Dumazet
Acked-by: Mike Frysinger
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
26 Jul, 2011
1 commit
-
If a device event generates gratuitous ARP messages, only primary
address is used for sending. This patch iterates through the whole
list. Tested with 2 IP addresses configuration on bonding interface.Signed-off-by: Zoltan Kiss
Signed-off-by: David S. Miller
24 Jul, 2011
2 commits
-
Fix improper protocol err_handler, current implementation is fully
unapplicable and may cause kernel crash due to double kfree_skb.Signed-off-by: Dmitry Kozlov
Acked-by: Eric Dumazet
Signed-off-by: David S. Miller -
rt_tos was changed to iph->tos but it must be filtered by RT_TOS
Signed-off-by: Julian Anastasov
Signed-off-by: David S. Miller
22 Jul, 2011
6 commits
-
icmp_route_lookup() uses the wrong flow parameters if the reverse
session route lookup isn't used.So do not commit to the re-decoded flow until we actually make a
final decision to use a real route saved in 'rt2'.Reported-by: Florian Westphal
Signed-off-by: David S. Miller -
Because the ip fragment offset field counts 8-byte chunks, ip
fragments other than the last must contain a multiple of 8 bytes of
payload. ip_ufo_append_data wasn't respecting this constraint and,
depending on the MTU and ip option sizes, could create malformed
non-final fragments.Google-Bug-Id: 5009328
Signed-off-by: Bill Sommerfeld
Signed-off-by: David S. Miller -
IPv6 fragment identification generation is way beyond what we use for
IPv4 : It uses a single generator. Its not scalable and allows DOS
attacks.Now inetpeer is IPv6 aware, we can use it to provide a more secure and
scalable frag ident generator (per destination, instead of system wide)This patch :
1) defines a new secure_ipv6_id() helper
2) extends inet_getid() to provide 32bit results
3) extends ipv6_select_ident() with a new dest parameterReported-by: Fernando Gont
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller -
- remove useless vlan parameters and pointers
Signed-off-by: Jiri Pirko
Signed-off-by: David S. Miller -
Signed-off-by: Jiri Pirko
Signed-off-by: David S. Miller -
no longer used
Signed-off-by: Jiri Pirko
Signed-off-by: David S. Miller
19 Jul, 2011
1 commit
-
Compiler is not smart enough to avoid double BSWAP instructions in
ntohl(inet_make_mask(plen)).Lets cache this value in struct leaf_info, (fill a hole on 64bit arches)
With route cache disabled, this saves ~2% of cpu in udpflood bench on
x86_64 machine.Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
18 Jul, 2011
3 commits
-
In the future dst entries will be neigh-less. In that environment we
need to have an easy transition point for current users of
dst->neighbour outside of the packet output fast path.Signed-off-by: David S. Miller
-
dst_{get,set}_neighbour()
Signed-off-by: David S. Miller
-
This will get us closer to being able to do "neigh stuff"
completely independent of the underlying dst_entry for
protocols (ipv4/ipv6) that wish to do so.We will also be able to make dst entries neigh-less.
Signed-off-by: David S. Miller
17 Jul, 2011
3 commits
-
It is always dev_queue_xmit().
Signed-off-by: David S. Miller
-
It's just taking on one of two possible values, either
neigh_ops->output or dev_queue_xmit(). And this is purely depending
upon whether nud_state has NUD_CONNECTED set or not.Signed-off-by: David S. Miller
-
It's always dev_queue_xmit().
Signed-off-by: David S. Miller