Eric Lee / smarc-fsl-linux-kernel

10 Dec, 2011

1 commit

fce823381 udp: Export code sk lookup routines

The UDP diag get_exact handler will require them to find a
socket by provided net, [sd]addr-s, [sd]ports and device.

Signed-off-by: Pavel Emelyanov
Signed-off-by: David S. Miller

Pavel Emelyanov
2011-12-10 03:14:08 +0800

03 Dec, 2011

1 commit

b3613118e Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

David S. Miller
2011-12-03 02:49:21 +0800

02 Dec, 2011

1 commit

59c2cdae2 Revert "udp: remove redundant variable"

This reverts commit 81d54ec8479a2c695760da81f05b5a9fb2dbe40a.

If we take the "try_again" goto, due to a checksum error,
the 'len' has already been truncated. So we won't compute
the same values as the original code did.

Reported-by: paul bilke
Signed-off-by: David S. Miller

David S. Miller
2011-12-02 03:12:55 +0800

17 Nov, 2011

1 commit

c8f44affb net: introduce and use netdev_features_t for device features sets

v2: add a couple of missing conversions in drivers
    split unexporting netdev_fix_features()
    implemented %pNF
    convert sock::sk_route_(no?)caps

Signed-off-by: Michał Mirosław
Signed-off-by: David S. Miller

Michał Mirosław
2011-11-17 06:43:10 +0800

10 Nov, 2011

1 commit

d826eb14e ipv4: PKTINFO doesnt need dst reference

On Monday, 07 November 2011 at 15:33 +0100, Eric Dumazet wrote:

> At least, in recent kernels we don't change dst->refcnt in the forwarding
> path (using NOREF skb->dst)
>
> One particular point is the atomic_inc(dst->refcnt) we have to perform
> when queuing a UDP packet if the socket asked for PKTINFO stuff (for example a
> typical DNS server has to set up this option)
>
> I have one patch somewhere that stores the information in skb->cb[] and
> avoids the atomic_{inc|dec}(dst->refcnt).
>

OK, I found it. I did some extra tests and believe it's ready.

[PATCH net-next] ipv4: IP_PKTINFO doesnt need dst reference

When a socket uses IP_PKTINFO notifications, we currently force a dst
reference for each received skb. The reader has to access dst to get the
needed information (rt_iif & rt_spec_dst) and must release the dst reference.

We also forced a dst reference if the skb was put in the socket backlog, even
without IP_PKTINFO handling. This happens under stress/load.

We can instead store the needed information in skb->cb[], so that only
the softirq handler actually accesses dst, improving cache hit ratios.

This removes two atomic operations per packet, and false sharing as
well.
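
A minimal userspace sketch of the skb->cb[] idea described above. All names here (`skb_cb_sketch`, `pktinfo_sketch`, the two helpers) are hypothetical and the 48-byte scratch size is an assumption for illustration; this is not the patch's code. The producer copies the two values the reader needs into the per-packet scratch area, so the reader never has to touch (or take a reference on) the route entry.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for an skb: only the per-packet scratch area. */
struct skb_cb_sketch {
    char cb[48];              /* assumed size of the control block */
};

/* The two values the reader needs (named after rt_iif / rt_spec_dst). */
struct pktinfo_sketch {
    int      ifindex;
    uint32_t spec_dst;
};

/* Producer side: runs once, while the route entry is still at hand. */
static void pktinfo_prepare_sketch(struct skb_cb_sketch *skb,
                                   int iif, uint32_t spec_dst)
{
    struct pktinfo_sketch info = { .ifindex = iif, .spec_dst = spec_dst };

    assert(sizeof(info) <= sizeof(skb->cb));
    memcpy(skb->cb, &info, sizeof(info));   /* memcpy avoids alignment issues */
}

/* Reader side: needs only the packet, not the route entry. */
static struct pktinfo_sketch pktinfo_read_sketch(const struct skb_cb_sketch *skb)
{
    struct pktinfo_sketch info;

    memcpy(&info, skb->cb, sizeof(info));
    return info;
}
```
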

On a benchmark using a single-threaded receiver (doing only recvmsg()
calls), I can reach 720,000 pps instead of 570,000 pps.

IP_PKTINFO is typically used by DNS servers and by any multihomed-aware
UDP application.

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2011-11-10 05:36:27 +0800

02 Nov, 2011

2 commits

0ad92ad03 udp: fix a race in encap_rcv handling

udp_queue_rcv_skb() has a possible race in encap_rcv handling, since
this pointer can be changed anytime.

We should use ACCESS_ONCE() to close the race.
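
A minimal userspace sketch of the pattern, assuming GCC or Clang (for `__typeof__`). The macro `ACCESS_ONCE_SKETCH`, the struct and the function names are hypothetical stand-ins, not the kernel's code. The point is that the pointer is loaded exactly once into a local, so the NULL test and the call see the same value even if another thread changes the field in between.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for ACCESS_ONCE(): the volatile
 * cast forces the compiler to perform exactly one load of x. */
#define ACCESS_ONCE_SKETCH(x) (*(volatile __typeof__(x) *)&(x))

typedef int (*encap_rcv_fn)(int pkt);

struct udp_sock_sketch {
    encap_rcv_fn encap_rcv;   /* may be changed by another thread */
};

static int double_pkt(int pkt)
{
    return pkt * 2;
}

/* Returns the handler's result, or -1 if no handler is installed. */
static int queue_rcv_sketch(struct udp_sock_sketch *up, int pkt)
{
    /* Load the pointer once; test and call the same local copy. */
    encap_rcv_fn encap_rcv = ACCESS_ONCE_SKETCH(up->encap_rcv);

    if (encap_rcv != NULL)
        return encap_rcv(pkt);
    return -1;
}
```

Without the single load, the compiler would be free to read `up->encap_rcv` once for the test and again for the call, and the second read could observe NULL.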

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2011-11-02 12:51:27 +0800
73cb88ecb net: make the tcp and udp file_operations for the /proc stuff const

The tcp and udp code creates a set of struct file_operations at runtime,
although this can also be done at compile time, with the added benefit that
these file operations can then be const.

The trickiest part was to get the "THIS_MODULE" reference right; the naive
method of declaring a struct in the place of registration would not work
for this reason.

Signed-off-by: Arjan van de Ven
Signed-off-by: David S. Miller

Arjan van de Ven
2011-11-02 05:56:14 +0800

18 Aug, 2011

1 commit

bdeab9919 rps: Add flag to skb to indicate rxhash is based on L4 tuple

The l4_rxhash flag was added to the skb structure to indicate
that the rxhash value was computed over the 4-tuple for the
packet, which includes the port information in the encapsulated
transport packet. This is used by the stack to preserve the
rxhash value in __skb_rx_tunnel.

Signed-off-by: Tom Herbert
Signed-off-by: David S. Miller

Tom Herbert
2011-08-18 11:06:03 +0800

12 Aug, 2011

1 commit

33d480ce6 net: cleanup some rcu_dereference_raw

The RCU API has been completed, and rcu_access_pointer() or
rcu_dereference_protected() are better than the generic
rcu_dereference_raw().

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2011-08-12 17:55:28 +0800

14 Jul, 2011

1 commit

6a7ebdf2f Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6

Conflicts:
net/bluetooth/l2cap_core.c

David S. Miller
2011-07-14 22:56:40 +0800

07 Jul, 2011

1 commit

f03d78db6 net: refine {udp|tcp|sctp}_mem limits

Current tcp/udp/sctp global memory limits do not take hugepage
allocations into account, and allow 50% of RAM to be used by buffers of a
single protocol [ not counting space used by sockets / inodes ...]

Let's use nr_free_buffer_pages() and allow a default of 1/8 of kernel RAM
per protocol, and a minimum of 128 pages.
Sysadmins of heavy-duty machines probably need to tweak limits anyway.
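
As a worked example of the rule stated above (1/8 of the buffer pages, never below 128 pages), here is a small sketch. The function name and its parameter are hypothetical; `free_buffer_pages` stands in for whatever nr_free_buffer_pages() would return, and how the kernel derives the individual sysctl values from this limit is not shown.

```c
#include <assert.h>

/* Default per-protocol limit in pages: one eighth of the buffer
 * pages, clamped to a minimum of 128 pages. */
static unsigned long proto_mem_limit_sketch(unsigned long free_buffer_pages)
{
    unsigned long limit = free_buffer_pages / 8;

    return limit < 128UL ? 128UL : limit;
}
```

Assuming 4 KiB pages, a machine with 1048576 buffer pages (4 GiB) gets a limit of 131072 pages (512 MiB), while a tiny system with 512 buffer pages is raised to the 128-page floor.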

References: https://bugzilla.stlinux.com/show_bug.cgi?id=38032
Reported-by: starlight
Suggested-by: Andrew Morton
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2011-07-07 15:27:05 +0800

06 Jul, 2011

1 commit

e12fe68ce Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6

David S. Miller
2011-07-06 14:23:37 +0800

22 Jun, 2011

2 commits

9cfaa8def udp/recvmsg: Clear MSG_TRUNC flag when starting over for a new packet

Consider this scenario: when the size of the first received UDP packet
is bigger than the receive buffer, the MSG_TRUNC bit is set in msg->msg_flags.
However, if a checksum error happens and this is a blocking socket, the code
jumps back to try_again to receive the next packet. If the size of the
next UDP packet is smaller than the receive buffer, the MSG_TRUNC flag should
not be set, but because the MSG_TRUNC bit is not cleared in msg->msg_flags
before receiving the next packet, MSG_TRUNC is still set, which is wrong.

Fix this problem by clearing MSG_TRUNC flag when starting over for a
new packet.
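
The scenario can be sketched in userspace as follows. Everything here is a simplified, hypothetical model (the packet queue, `recv_sketch`, and the flag constant `MSG_TRUNC_SKETCH`), not the kernel's udp_recvmsg(). It only shows why the flag has to be cleared before retrying.

```c
#include <assert.h>
#include <stddef.h>

#define MSG_TRUNC_SKETCH 0x20   /* hypothetical flag value for this sketch */

struct pkt_sketch {
    size_t len;       /* payload length */
    int    csum_ok;   /* non-zero if the checksum verifies */
};

/* Returns the number of bytes delivered, or -1 if the queue runs dry.
 * On return, *flags has MSG_TRUNC_SKETCH set only if the packet that
 * was actually delivered had to be truncated. */
static long recv_sketch(const struct pkt_sketch *q, size_t n,
                        size_t buflen, int *flags)
{
    size_t i = 0;
    size_t copied;

try_again:
    if (i >= n)
        return -1;

    copied = q[i].len;
    if (copied > buflen) {
        copied = buflen;
        *flags |= MSG_TRUNC_SKETCH;
    }

    if (!q[i].csum_ok) {
        /* Drop the bad packet and start over. Clearing the flag here
         * is the fix: it belonged to the packet being discarded. */
        i++;
        *flags &= ~MSG_TRUNC_SKETCH;
        goto try_again;
    }
    return (long)copied;
}
```
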

Signed-off-by: Xufeng Zhang
Signed-off-by: Paul Gortmaker
Signed-off-by: David S. Miller

Xufeng Zhang
2011-06-22 13:34:27 +0800
296f7ea75 udp: add tracepoints for queueing skb to rcvbuf

This patch adds a tracepoint to __udp_queue_rcv_skb to get the
return value of ip_queue_rcv_skb. It indicates why the kernel drops
a packet at this point.

ip_queue_rcv_skb returns the following values in the packet drop case:

rcvbuf is full : -ENOMEM
sk_filter returns error : -EINVAL, -EACCES, -ENOMEM, etc.
__sk_mem_schedule returns error: -ENOBUFS
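
A consumer of such a tracepoint might translate the return value into a readable reason along these lines. This is only a sketch: `drop_reason_sketch` is a hypothetical helper, the mapping simply restates the table above using the `<errno.h>` spellings (`ENOBUFS`, `EACCES`), and since -ENOMEM can come from two sources the label says so.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Map a queueing return code to a human-readable drop reason. */
static const char *drop_reason_sketch(int rc)
{
    switch (rc) {
    case 0:
        return "queued";
    case -ENOMEM:
        return "rcvbuf full or sk_filter error";   /* ambiguous by design */
    case -ENOBUFS:
        return "__sk_mem_schedule failed";
    default:
        /* Any other negative value is attributed to the socket filter. */
        return rc < 0 ? "sk_filter error" : "unknown";
    }
}
```
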

Signed-off-by: Satoru Moriya
Acked-by: Neil Horman
Signed-off-by: David S. Miller

Satoru Moriya
2011-06-22 07:06:10 +0800

24 May, 2011

1 commit

71338aa7d net: convert %p usage to %pK

The %pK format specifier is designed to hide exposed kernel pointers,
specifically via /proc interfaces. Exposing these pointers provides an
easy target for kernel write vulnerabilities, since they reveal the
locations of writable structures containing easily triggerable function
pointers. The behavior of %pK depends on the kptr_restrict sysctl.

If kptr_restrict is set to 0, no deviation from the standard %p behavior
occurs. If kptr_restrict is set to 1 (the default) and the current user
(intended to be a reader via seq_printf(), etc.) does not have CAP_SYSLOG
(currently in the LSM tree), kernel pointers using %pK are printed as 0's.
If kptr_restrict is set to 2, kernel pointers using %pK are printed as
0's regardless of privileges. Replacing with 0's was chosen over the
default "(null)", which cannot be parsed by userland %p, which expects
"(nil)".

The supporting code for kptr_restrict and %pK is currently in the -mm
tree. This patch converts users of %p in net/ to %pK. Cases of printing
pointers to the syslog are not covered, since this would eliminate useful
information for postmortem debugging and the reading of the syslog is
already optionally protected by the dmesg_restrict sysctl.
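
The three kptr_restrict cases described above can be summarised in a small userspace sketch. The function `format_pk_sketch` and its parameters are hypothetical; the real logic lives in the kernel's format code and takes the capability check from the current task rather than from an argument.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Format ptr the way the text describes %pK:
 *   kptr_restrict == 0: print the real value
 *   kptr_restrict == 1: print zeros unless the reader has CAP_SYSLOG
 *   kptr_restrict == 2: always print zeros */
static void format_pk_sketch(char *buf, size_t len, const void *ptr,
                             int kptr_restrict, bool has_cap_syslog)
{
    bool hide = (kptr_restrict == 2) ||
                (kptr_restrict == 1 && !has_cap_syslog);
    uintptr_t value = hide ? 0 : (uintptr_t)ptr;

    /* Fixed width: two hex digits per byte of a pointer. */
    snprintf(buf, len, "%0*jx", (int)(2 * sizeof(void *)), (uintmax_t)value);
}
```
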

Signed-off-by: Dan Rosenberg
Cc: James Morris
Cc: Eric Dumazet
Cc: Thomas Graf
Cc: Eugene Teo
Cc: Kees Cook
Cc: Ingo Molnar
Cc: David S. Miller
Cc: Peter Zijlstra
Cc: Eric Paris
Signed-off-by: Andrew Morton
Signed-off-by: David S. Miller

Dan Rosenberg
2011-05-24 13:13:12 +0800

11 May, 2011

1 commit

79ab05314 ipv4: udp: Eliminate remaining uses of rt->rt_src

We already track and pass around the correct flow key,
so simply use it in udp_send_skb().

Signed-off-by: David S. Miller

David S. Miller
2011-05-11 04:32:47 +0800

09 May, 2011

3 commits

f5fca6086 ipv4: Pass flow key down into ip_append_*().

This way rt->rt_dst accesses are unnecessary.

Signed-off-by: David S. Miller

David S. Miller
2011-05-09 12:24:07 +0800
77968b782 ipv4: Pass flow keys down into datagram packet building engine.

This way ip_output.c no longer needs rt->rt_{src,dst}.

We already have these keys sitting, ready and waiting, on the stack or
in a socket structure.

Signed-off-by: David S. Miller

David S. Miller
2011-05-09 12:24:06 +0800
e474995f2 udp: Use flow key information instead of rt->rt_{src,dst}

We have two cases.

Either the socket is in TCP_ESTABLISHED state and connect() filled
in the inet socket cork flow, or we looked up the route here and
used an on-stack flow.

Track which one it was, and use it to obtain src/dst addrs.

Signed-off-by: David S. Miller

David S. Miller
2011-05-09 12:12:48 +0800

29 Apr, 2011

1 commit

f6d8bd051 inet: add RCU protection to inet->opt

We lack proper synchronization to manipulate inet->opt ip_options.

The problem is that ip_make_skb() calls ip_setup_cork(), and
ip_setup_cork() possibly makes a copy of ipc->opt (struct ip_options),
without any protection against another thread manipulating inet->opt.

Another thread can change inet->opt pointer and free old one under us.

Use RCU to protect inet->opt (changed to inet->inet_opt).

Instead of handling atomic refcounts, just copy ip_options when
necessary, to avoid cache line dirtying.

We can't insert an rcu_head in struct ip_options since it's included in
skb->cb[], so this patch is large because I had to introduce a new
ip_options_rcu structure.

Signed-off-by: Eric Dumazet
Cc: Herbert Xu
Signed-off-by: David S. Miller

Eric Dumazet
2011-04-29 04:16:35 +0800

23 Apr, 2011

1 commit

b71d1d426 inet: constify ip headers and in6_addr

Add const qualifiers to structs iphdr, ipv6hdr and in6_addr pointers
where possible, to make code intention more obvious.

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2011-04-23 02:04:14 +0800

12 Apr, 2011

1 commit

1c01a80cf Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6

Conflicts:
drivers/net/smsc911x.c

David S. Miller
2011-04-12 04:44:25 +0800

31 Mar, 2011

2 commits

25985edce Fix common misspellings

Fixes generated by 'codespell' and manually reviewed.

Signed-off-by: Lucas De Marchi

Lucas De Marchi
2011-03-31 22:26:23 +0800
c0951cbcf ipv4: Use flowi4_init_output() in udp_sendmsg()

Signed-off-by: David S. Miller

David S. Miller
2011-03-31 19:54:27 +0800

13 Mar, 2011

5 commits

9cce96df5 net: Put fl4_* macros to struct flowi4 and use them again.

Signed-off-by: David S. Miller

David S. Miller
2011-03-13 07:08:54 +0800
b6f21b268 ipv4: Use flowi4 in UDP

Signed-off-by: David S. Miller

David S. Miller
2011-03-13 07:08:50 +0800
9d6ec9380 ipv4: Use flowi4 in public route lookup interfaces.

Signed-off-by: David S. Miller

David S. Miller
2011-03-13 07:08:48 +0800
6281dcc94 net: Make flowi ports AF dependent.

Create two sets of port member accessors, one set prefixed by fl4_*
and the other prefixed by fl6_*

This will let us create AF optimal flow instances.

It will work because every context in which we access the ports,
we have to be fully aware of which AF the flowi is anyways.

Signed-off-by: David S. Miller

David S. Miller
2011-03-13 07:08:46 +0800
1d28f42c1 net: Put flowi_* prefix on AF independent members of struct flowi

I intend to turn struct flowi into a union of AF specific flowi
structs. There will be a common structure that each variant includes
first, much like struct sock_common.

This is the first step to move in that direction.
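
The intended layout can be sketched as below. All type and member names carrying a `_sketch` suffix are hypothetical and the member lists are trimmed for illustration; they are not the kernel's definitions. Each AF-specific variant embeds the common part first, so the AF-independent members can be reached through any view of the union.

```c
#include <assert.h>
#include <stdint.h>

/* AF-independent members, carrying the flowi_ prefix. */
struct flowi_common_sketch {
    int     flowi_oif;
    uint8_t flowi_proto;
};

/* IPv4 variant: common part first, then fl4_-prefixed members. */
struct flowi4_sketch {
    struct flowi_common_sketch common;
    uint32_t daddr, saddr;
    uint16_t fl4_dport, fl4_sport;
};

/* IPv6 variant: common part first, then fl6_-prefixed members. */
struct flowi6_sketch {
    struct flowi_common_sketch common;
    uint8_t  daddr[16], saddr[16];
    uint16_t fl6_dport, fl6_sport;
};

/* The generic flow key is a union of the per-AF variants. */
union flowi_sketch {
    struct flowi_common_sketch common;
    struct flowi4_sketch       ip4;
    struct flowi6_sketch       ip6;
};
```

A caller that only handles IPv4 can allocate the smaller `struct flowi4_sketch` on its own, which is the space saving the AF-specific split is after.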

Signed-off-by: David S. Miller

David S. Miller
2011-03-13 07:08:44 +0800

04 Mar, 2011

1 commit

06dc94b1e ipv4: Fix crash in dst_release when udp_sendmsg route lookup fails.

As reported by Eric:

[11483.697233] IP: [] dst_release+0x18/0x60
...
[11483.697741] Call Trace:
[11483.697764] [] udp_sendmsg+0x282/0x6e0
[11483.697790] [] ? memcpy_toiovec+0x51/0x70
[11483.697818] [] ? ip_generic_getfrag+0x0/0xb0

The pointer passed to dst_release() is -EINVAL, that's because
we leave an error pointer in the local variable "rt" by accident.

NULL it out to fix the bug.
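
A userspace sketch of the bug class and the fix. The `*_sketch` helpers imitate the kernel's error-pointer convention but are hypothetical, as are the structure and the control flow; this is not the actual udp_sendmsg() code. A failed lookup returns an errno encoded as a pointer, and the shared cleanup path must never see that value.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO_SKETCH 4095

/* Hypothetical stand-ins for ERR_PTR(), IS_ERR() and PTR_ERR(). */
static void *err_ptr_sketch(long err) { return (void *)(intptr_t)err; }
static int is_err_sketch(const void *p)
{
    return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO_SKETCH;
}
static long ptr_err_sketch(const void *p) { return (long)(intptr_t)p; }

struct rtable_sketch { int refcnt; };

/* Drops one reference; like the real release path it tolerates NULL
 * but would dereference an error pointer. */
static void release_sketch(struct rtable_sketch *rt)
{
    if (rt)
        rt->refcnt--;
}

static struct rtable_sketch *lookup_sketch(struct rtable_sketch *table, int fail)
{
    if (fail)
        return err_ptr_sketch(-EINVAL);
    table->refcnt++;
    return table;
}

static int sendmsg_sketch(struct rtable_sketch *table, int fail)
{
    struct rtable_sketch *rt = lookup_sketch(table, fail);
    int err = 0;

    if (is_err_sketch(rt)) {
        err = (int)ptr_err_sketch(rt);
        rt = NULL;            /* the fix: do not leave the error pointer in rt */
        goto out;
    }
    /* ... build and transmit the datagram ... */
out:
    release_sketch(rt);       /* safe: rt is either valid or NULL */
    return err;
}
```
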

Reported-by: Eric Dumazet
Signed-off-by: David S. Miller

David S. Miller
2011-03-04 02:38:01 +0800

03 Mar, 2011

1 commit

b23dd4fe4 ipv4: Make output route lookup return rtable directly.

Instead of on the stack.

Signed-off-by: David S. Miller

David S. Miller
2011-03-03 06:31:35 +0800

02 Mar, 2011

5 commits

273447b35 ipv4: Kill can_sleep arg to ip_route_output_flow()

This boolean state is now available in the flow flags.

Signed-off-by: David S. Miller

David S. Miller
2011-03-02 06:27:04 +0800
5df65e556 net: Add FLOWI_FLAG_CAN_SLEEP.

And set it in contexts where the route resolution can sleep.

Signed-off-by: David S. Miller

David S. Miller
2011-03-02 06:22:19 +0800
420d44daa ipv4: Make final arg to ip_route_output_flow to be boolean "can_sleep"

Since that is what the current vague "flags" argument means.

Signed-off-by: David S. Miller

David S. Miller
2011-03-02 06:19:23 +0800
903ab86d1 udp: Add lockless transmit path

The UDP transmit path has been running under the socket lock
for a long time because of the corking feature. This means that
transmitting to the same socket in multiple threads does not
scale at all.

However, as most users don't actually use corking, the locking
can be removed in the common case.

This patch creates a lockless fast path where corking is not used.

Please note that this does create a slight inaccuracy in the
enforcement of socket send buffer limits. In particular, we
may exceed the socket limit by up to (number of CPUs) * (packet
size) because of the way the limit is computed.

As the primary purpose of socket buffers is to indicate congestion,
this should not be a great problem for now.
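
To make the stated bound concrete, here is the arithmetic as a tiny sketch. The helper name and the example numbers are hypothetical; the formula is just the "(number of CPUs) * (packet size)" overshoot quoted above added to the configured limit.

```c
#include <assert.h>
#include <stddef.h>

/* Upper bound on send-buffer usage on the lockless path, per the
 * commit text: the limit plus one packet per CPU. */
static size_t worst_case_sndbuf_usage_sketch(size_t sndbuf,
                                             unsigned int ncpus,
                                             size_t pkt_size)
{
    return sndbuf + (size_t)ncpus * pkt_size;
}
```

For example, with a 212992-byte limit, 8 CPUs and 1500-byte packets, the bound is 212992 + 8 * 1500 = 224992 bytes, an overshoot of under 6%.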

Signed-off-by: Herbert Xu
Acked-by: Eric Dumazet
Signed-off-by: David S. Miller

Herbert Xu
2011-03-02 04:35:42 +0800
f6b9664f8 udp: Switch to ip_finish_skb

This patch converts UDP to use the new ip_finish_skb API. This
will then allow us to more easily use ip_make_skb, which allows
UDP to run without a socket lock.

Signed-off-by: Herbert Xu
Acked-by: Eric Dumazet
Signed-off-by: David S. Miller

Herbert Xu
2011-03-02 04:35:03 +0800

25 Jan, 2011

1 commit

04ed3e741 net: change netdev->features to u32

Quoting Ben Hutchings: we presumably won't be defining features that
can only be enabled on 64-bit architectures.

Occurrences found by `grep -r` on net/, drivers/net, include/

[ Move features and vlan_features next to each other in
struct netdev, as per Eric Dumazet's suggestion -DaveM ]

Signed-off-by: Michał Mirosław
Signed-off-by: David S. Miller

Michał Mirosław
2011-01-25 07:32:47 +0800

18 Dec, 2010

1 commit

b4aa9e05a Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6

Conflicts:
drivers/net/bnx2x/bnx2x.h
drivers/net/wireless/iwlwifi/iwl-1000.c
drivers/net/wireless/iwlwifi/iwl-6000.c
drivers/net/wireless/iwlwifi/iwl-core.h
drivers/vhost/vhost.c

David S. Miller
2010-12-18 04:27:22 +0800

17 Dec, 2010

2 commits

55508d601 net: Use skb_checksum_start_offset()

Replace skb->csum_start - skb_headroom(skb) with skb_checksum_start_offset().

Note for usb/smsc95xx: skb->data - skb->head == skb_headroom(skb).
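
The replaced expression can be illustrated with a stripped-down model. The struct and the `*_sketch` helpers are hypothetical stand-ins with only the fields the expression needs, and csum_start is assumed to be an offset from the start of the buffer. The helper returns where checksumming starts relative to the packet data.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an skb with only the fields used here. */
struct skb_csum_sketch {
    unsigned char *head;        /* start of the allocated buffer */
    unsigned char *data;        /* start of the packet data */
    unsigned int   csum_start;  /* checksum start, as offset from head */
};

/* Headroom is the gap between the buffer start and the packet data. */
static unsigned int skb_headroom_sketch(const struct skb_csum_sketch *skb)
{
    return (unsigned int)(skb->data - skb->head);
}

/* csum_start - headroom: the checksum start relative to skb->data. */
static int skb_checksum_start_offset_sketch(const struct skb_csum_sketch *skb)
{
    return (int)(skb->csum_start - skb_headroom_sketch(skb));
}
```
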

Signed-off-by: Michał Mirosław
Signed-off-by: David S. Miller

Michał Mirosław
2010-12-17 06:43:14 +0800
fcbdf09d9 net: fix nulls list corruptions in sk_prot_alloc

Special care is taken inside sk_prot_alloc to avoid overwriting
skc_node/skc_nulls_node. We should also avoid overwriting
skc_bind_node/skc_portaddr_node.

The patch fixes the following crash:

BUG: unable to handle kernel paging request at fffffffffffffff0
IP: [] udp4_lib_lookup2+0xad/0x370
[] __udp4_lib_lookup+0x282/0x360
[] __udp4_lib_rcv+0x31e/0x700
[] ? ip_local_deliver_finish+0x65/0x190
[] ? ip_local_deliver+0x88/0xa0
[] udp_rcv+0x15/0x20
[] ip_local_deliver_finish+0x65/0x190
[] ip_local_deliver+0x88/0xa0
[] ip_rcv_finish+0x32d/0x6f0
[] ? netif_receive_skb+0x99c/0x11c0
[] ip_rcv+0x2bb/0x350
[] netif_receive_skb+0x99c/0x11c0

Signed-off-by: Leonard Crestez
Signed-off-by: Octavian Purdila
Acked-by: Eric Dumazet
Signed-off-by: David S. Miller

Octavian Purdila
2010-12-17 06:26:56 +0800