17 Jun, 2009

1 commit

  • * 'for-linus2' of git://git.kernel.org/pub/scm/linux/kernel/git/vegard/kmemcheck: (39 commits)
    signal: fix __send_signal() false positive kmemcheck warning
    fs: fix do_mount_root() false positive kmemcheck warning
    fs: introduce __getname_gfp()
    trace: annotate bitfields in struct ring_buffer_event
    net: annotate struct sock bitfield
    c2port: annotate bitfield for kmemcheck
    net: annotate inet_timewait_sock bitfields
    ieee1394/csr1212: fix false positive kmemcheck report
    ieee1394: annotate bitfield
    net: annotate bitfields in struct inet_sock
    net: use kmemcheck bitfields API for skbuff
    kmemcheck: introduce bitfield API
    kmemcheck: add opcode self-testing at boot
    x86: unify pte_hidden
    x86: make _PAGE_HIDDEN conditional
    kmemcheck: make kconfig accessible for other architectures
    kmemcheck: enable in the x86 Kconfig
    kmemcheck: add hooks for the page allocator
    kmemcheck: add hooks for page- and sg-dma-mappings
    kmemcheck: don't track page tables
    ...

    Linus Torvalds
     

15 Jun, 2009

2 commits

  • The use of bitfields here would lead to false positive warnings with
    kmemcheck. Silence them.

    (Additionally, one erroneous comment related to the bitfield was also
    fixed.)

    Signed-off-by: Vegard Nossum

    Vegard Nossum
     
  • During trie_rebalance(), resize(), inflate() and halve() RCU-free
    tnodes before updating their parents. This depends on RCU delaying
    the real destruction, but if RCU readers start after call_rcu() and
    before the parent update, they could access freed memory.

    This is currently prevented with preempt_disable() on the update
    side, but that is not safe, except maybe for classic RCU, and it
    also conflicts with the GFP_KERNEL memory allocations used from
    these functions.

    This patch explicitly delays freeing of tnodes by adding them to the
    list, which is flushed after the update is finished.

    Reported-by: Yan Zheng
    Signed-off-by: Jarek Poplawski
    Signed-off-by: David S. Miller

    Jarek Poplawski
     

14 Jun, 2009

3 commits

  • IPv4:
    - make PIM register vifs netns local
    - set the netns when a PIM register vif is created
    - make PIM available in all network namespaces (if CONFIG_IP_PIMSM_V2)
    by adding the protocol handler when multicast routing is initialized

    IPv6:
    - make PIM register vifs netns local
    - make PIM available in all network namespaces (if CONFIG_IPV6_PIMSM_V2)
    by adding the protocol handler when multicast routing is initialized

    Signed-off-by: Tom Goff
    Signed-off-by: David S. Miller

    Tom Goff
     
  • Remove the statements about ARP cache size, as this config option
    does not affect it; the cache size is controlled by the neigh_table
    gc thresholds.

    Also remove the experimental and obsolete markings, as the API
    originally intended for ARP caching is useful for implementing
    ARP-like protocols (e.g. NHRP) in user space and has been there
    long enough.

    Signed-off-by: Timo Teras
    Signed-off-by: David S. Miller

    Timo Teräs
     
  • For the sake of power-saving lovers, use a deferrable timer to fire
    rt_check_expire().

    As some big routers' cache equilibrium depends on garbage collection
    being done in time, we take into account the elapsed time between
    two rt_check_expire() invocations to adjust the number of slots we
    have to check.

    Based on an initial idea and patch from Tero Kristo

    Signed-off-by: Eric Dumazet
    Signed-off-by: Tero Kristo
    Signed-off-by: David S. Miller

    Eric Dumazet
     

12 Jun, 2009

1 commit

  • Fix build error introduced by commit bb70dfa5 (netfilter: xtables:
    consolidate comefrom debug cast access):

    net/ipv4/netfilter/ip_tables.c: In function 'ipt_do_table':
    net/ipv4/netfilter/ip_tables.c:421: error: 'comefrom' undeclared (first use in this function)
    net/ipv4/netfilter/ip_tables.c:421: error: (Each undeclared identifier is reported only once
    net/ipv4/netfilter/ip_tables.c:421: error: for each function it appears in.)

    Signed-off-by: Patrick McHardy

    Patrick McHardy
     

11 Jun, 2009

2 commits

  • Patrick McHardy
     
  • One of the problems with sock memory accounting is that it uses a
    pair of sock_hold()/sock_put() calls for each transmitted packet.

    This slows down bidirectional flows, because the receive path also
    needs to take a refcount on the socket and might run on a different
    cpu than the transmit path or the transmit completion path, so
    these two atomic operations also trigger cache line bounces.

    We can see this in tx or tx/rx workloads (media gateways, for
    example), where sock_wfree() can be among the top five functions in
    profiles.

    We use this sock_hold()/sock_put() pair so that socket freeing is
    delayed until all tx packets have completed.

    As we also update sk_wmem_alloc, we can offset sk_wmem_alloc by one
    unit at init time, until sk_free() is called. Once sk_free() is
    called, we atomic_dec_and_test(sk_wmem_alloc) to decrement the
    initial offset and atomically check whether any packets are in
    flight.

    skb_set_owner_w() doesn't call sock_hold() anymore.

    sock_wfree() doesn't call sock_put() anymore, but checks whether
    sk_wmem_alloc has reached 0 to perform the final freeing.

    The drawback is that an skb->truesize error could lead to
    unfreeable sockets, or even worse, to prematurely calling
    __sk_free() on a live socket.

    Nice speedups on SMP: tbench, for example, goes from 2691 MB/s to
    2711 MB/s on my 8-cpu dev machine, even though tbench was not
    really hitting the sk_refcnt contention point. A 5% speedup on a
    UDP transmit workload (depending on the number of flows), lowering
    TX completion cpu usage.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

10 Jun, 2009

1 commit


09 Jun, 2009

2 commits


08 Jun, 2009

1 commit

  • Current conntrack code kills the ICMP conntrack entry as soon as
    the first reply is received. This is incorrect, as we then see only
    the first ICMP echo reply out of several possible duplicates as
    ESTABLISHED, while the rest will be INVALID. Also this unnecessarily
    increases the conntrackd traffic on H-A firewalls.

    Make all the ICMP conntrack entries (including the replied ones)
    last for the default of nf_conntrack_icmp{,v6}_timeout seconds.

    Signed-off-by: Jan "Yenya" Kasprzak
    Signed-off-by: Patrick McHardy

    Jan Kasprzak
     

05 Jun, 2009

1 commit

  • The lock "protects" an assignment and a comparison of an integer.
    When the caller of device_cmp() evaluates the result,
    nat->masq_index may already have been changed (regardless of
    whether the lock is held).

    So the lock either has to be held during nf_ct_iterate_cleanup(),
    or it can be removed.

    This patch does the latter.

    Signed-off-by: Florian Westphal
    Signed-off-by: Patrick McHardy

    Florian Westphal
     

04 Jun, 2009

2 commits


03 Jun, 2009

4 commits

  • Define three accessors to get/set dst attached to a skb

    struct dst_entry *skb_dst(const struct sk_buff *skb)

    void skb_dst_set(struct sk_buff *skb, struct dst_entry *dst)

    void skb_dst_drop(struct sk_buff *skb)
    This one should replace occurrences of:
    dst_release(skb->dst);
    skb->dst = NULL;

    Delete skb->dst field

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • Define skb_rtable(const struct sk_buff *skb) accessor to get rtable from skb

    Delete skb->rtable field

    Setting rtable is not allowed, just set dst instead as rtable is an alias.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • Conflicts:
    drivers/net/forcedeth.c

    David S. Miller
     
  • This patch simplifies the conntrack event caching system by removing
    several events:

    * IPCT_[*]_VOLATILE, IPCT_HELPINFO and IPCT_NATINFO have been
    deleted, since they have no clients.
    * IPCT_COUNTER_FILLING, which is a leftover from the 32-bit counter
    days.
    * IPCT_REFRESH, which is not of any use since we always include the
    timeout in the messages.

    After this patch, the existing events are:

    * IPCT_NEW, IPCT_RELATED and IPCT_DESTROY, which identify the
    addition and deletion of entries.
    * IPCT_STATUS, which notes that the status bits have changed,
    e.g. IPS_SEEN_REPLY and IPS_ASSURED.
    * IPCT_PROTOINFO, which reports that internal protocol information
    has changed, e.g. the TCP, DCCP and SCTP protocol state.
    * IPCT_HELPER, which reports that a helper has been assigned to or
    unassigned from this entry.
    * IPCT_MARK and IPCT_SECMARK, which report that the mark has
    changed; this covers the case when a mark is set to zero.
    * IPCT_NATSEQADJ, which reports that there are updates to the NAT
    sequence adjustment.

    Signed-off-by: Pablo Neira Ayuso

    Pablo Neira Ayuso
     

02 Jun, 2009

3 commits


30 May, 2009

1 commit

  • Somewhat luckily, I was looking into these parts with a very fine
    comb, because I had made somewhat similar changes in the same area
    (the conflicts that arose weren't that lucky, though). The loop
    was very much overengineered recently in commit 915219441d566
    (tcp: Use SKB queue and list helpers instead of doing it
    by-hand), while it basically just wants to know if there are
    skbs after 'skb'.

    It also got broken because skb1 = skb->next was improperly
    translated into skb1 = skb1->next (though abstracted). Note that
    'skb1' points to the sk_buff previous to 'skb', or NULL if at the
    head. Two things went wrong:
    - We'd kfree 'skb' on the first iteration instead of the skbuff
    following 'skb' (recovering would require SACK reneging, I think).
    - The list-head case where 'skb1' is NULL is checked too early,
    and the loop won't execute, whereas it previously did.

    In conclusion, mostly revert the recent changes, which makes the
    cset look very messy, but use the proper accessors in the
    previous-like version.

    The effective changes against the original can be viewed with:
    git-diff 915219441d566f1da0caa0e262be49b666159e17^ \
    net/ipv4/tcp_input.c | sed -n -e '57,70 p'

    Signed-off-by: Ilpo Järvinen
    Signed-off-by: David S. Miller

    Ilpo Järvinen
     

29 May, 2009

3 commits


27 May, 2009

6 commits


26 May, 2009

1 commit


25 May, 2009

1 commit


22 May, 2009

1 commit

  • It seems we can fix this by disabling preemption while we
    re-balance the trie. This is with CONFIG_CLASSIC_RCU. It's been
    stress-tested at high loads, continuously taking a full BGP table
    up/down via iproute -batch.

    Note: fib_trie is not updated for CONFIG_PREEMPT_RCU.

    Reported-by: Andrei Popa
    Signed-off-by: Robert Olsson
    Signed-off-by: David S. Miller

    Robert Olsson
     

21 May, 2009

3 commits

  • The netlink message header (struct nlmsghdr) is an unused parameter in
    fill method of fib_rules_ops struct. This patch removes this
    parameter from this method and fixes the places where this method is
    called.

    (include/net/fib_rules.h)

    Signed-off-by: Rami Rosen
    Signed-off-by: David S. Miller

    Rami Rosen
     
  • Alexander V. Lukyanov found a regression in 2.6.29 and made a complete
    analysis, available at http://bugzilla.kernel.org/show_bug.cgi?id=13339
    Quoted here because it's a perfect one:

    begin_of_quotation
    2.6.29 patch has introduced flexible route cache rebuilding. Unfortunately the
    patch has at least one critical flaw, and another problem.

    rt_intern_hash calculates rthi pointer, which is later used for new entry
    insertion. The same loop calculates cand pointer which is used to clean the
    list. If the pointers are the same, rtable leak occurs, as first the cand is
    removed then the new entry is appended to it.

    This leak leads to unregister_netdevice problem (usage count > 0).

    Another problem of the patch is that it tries to insert the entries in certain
    order, to facilitate counting of entries distinct by all but QoS parameters.
    Unfortunately, referencing an existing rtable entry moves it to list beginning,
    to speed up further lookups, so the carefully built order is destroyed.

    For the first problem the simplest patch is to set rthi=0 when
    rthi==cand, but it will also destroy the ordering.
    end_of_quotation

    Problematic commit is 1080d709fb9d8cd4392f93476ee46a9d6ea05a5b
    (net: implement emergency route cache rebulds when gc_elasticity is exceeded)

    Trying to keep dst_entries ordered is too complex and breaks the fact that
    order should depend on the frequency of use for garbage collection.

    A possible fix is to make rt_intern_hash() simpler, and only make
    rt_check_expire() a little bit smarter, able to cope with an
    arbitrary entry order. The added loop runs on cache-hot data while
    the cpu is prefetching the next object, so it should go unnoticed.

    Reported-and-analyzed-by: Alexander V. Lukyanov
    Signed-off-by: Eric Dumazet
    Acked-by: Neil Horman
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • rt_check_expire() computes the average and standard deviation of
    chain lengths, but does not correctly reset the length to 0 at the
    beginning of each chain. This probably gives overflows for sum2
    (and sum) on loaded machines, instead of meaningful results.

    Signed-off-by: Eric Dumazet
    Acked-by: Neil Horman
    Signed-off-by: David S. Miller

    Eric Dumazet
     

20 May, 2009

1 commit

  • The DHCP spec allows the server to specify the MTU. This can be useful
    for netbooting with UDP-based NFS-root on a network using jumbo frames.
    This patch allows the kernel IP autoconfiguration to handle this option
    correctly.

    It would be possible to use initramfs and add a script to set the MTU,
    but that seems like a complicated solution if no initramfs is otherwise
    necessary, and would bloat the kernel image more than this code would.

    This patch was originally submitted to LKML in 2003 by Hans-Peter Jansen.

    Signed-off-by: Chris Friesen
    Signed-off-by: David S. Miller

    Chris Friesen