11 Sep, 2012

1 commit

  • It is a frequent mistake to confuse the netlink port identifier with a
    process identifier. Try to reduce this confusion by renaming the fields
    that hold port identifiers to portid instead of pid.

    I have carefully avoided changing the structures exported to
    userspace to avoid changing the userspace API.
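
    For example, in-kernel netlink users now refer to the sender's port id
    roughly as shown below (field name per this rename; the usage line is
    illustrative only):

        NETLINK_CB(skb).portid    /* formerly NETLINK_CB(skb).pid */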

    I have successfully built an allyesconfig kernel with this change.

    Signed-off-by: "Eric W. Biederman"
    Acked-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Eric W. Biederman
     

09 Sep, 2012

2 commits

  • This patch defines netlink_kernel_create as a wrapper function of
    __netlink_kernel_create to hide the struct module *me parameter
    (which seems to be THIS_MODULE in all existing netlink subsystems).
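
    A rough sketch of what such a wrapper looks like (simplified; the exact
    placement and form in the tree may differ):

        static inline struct sock *
        netlink_kernel_create(struct net *net, int unit,
                              struct netlink_kernel_cfg *cfg)
        {
                return __netlink_kernel_create(net, unit, THIS_MODULE, cfg);
        }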

    Suggested by David S. Miller.

    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: David S. Miller

    Pablo Neira Ayuso
     
  • Replace netlink_set_nonroot with a new `flags' field in
    struct netlink_kernel_cfg that is passed to netlink_kernel_create.

    This patch also renames NL_NONROOT_* to NL_CFG_F_NONROOT_* since
    now the flags field in nl_table is generic (so we can add more
    flags if needed in the future).

    Also adjust all callers in the net-next tree to use these flags
    instead of netlink_set_nonroot.
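
    An illustrative caller after this change (a sketch only; the unit, the
    handler name and the exact create signature here are assumptions):

        struct netlink_kernel_cfg cfg = {
                .groups = 1,
                .flags  = NL_CFG_F_NONROOT_RECV,
                .input  = my_netlink_rcv,   /* hypothetical input handler */
        };

        struct sock *sk = netlink_kernel_create(net, NETLINK_USERSOCK, &cfg);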

    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: David S. Miller

    Pablo Neira Ayuso
     

08 Sep, 2012

2 commits

  • Since the route cache deletion (commit 89aef8921bfbac22f), delay is no
    longer used. Remove it.

    Signed-off-by: Nicolas Dichtel
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Nicolas Dichtel
     
  • Passing uids and gids on NETLINK_CB from a process in one user
    namespace to a process in another user namespace can result in the
    wrong uid or gid being presented to userspace. Avoid that problem by
    passing kuids and kgids instead.

    - Define struct scm_creds, for use in scm_cookie and netlink_skb_parms,
    which holds uid and gid information as kuid_t and kgid_t (see the sketch
    after this list).

    - Modify scm_set_cred to fill out scm_creds by hand instead of using
    cred_to_ucred to fill out struct ucred. This conversion ensures
    userspace does not get incorrect uid or gid values to look at.

    - Modify scm_recv to convert from struct scm_creds to struct ucred
    before copying credential values to userspace.

    - Modify __scm_send to populate struct scm_creds in the scm_cookie,
    instead of just copying struct ucred from userspace.

    - Modify netlink_sendmsg to copy scm_creds instead of struct ucred
    into the NETLINK_CB.
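
    A sketch of the new structure as described above (the exact field layout
    may differ in the actual patch):

        struct scm_creds {
                u32     pid;
                kuid_t  uid;
                kgid_t  gid;
        };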

    Signed-off-by: "Eric W. Biederman"
    Signed-off-by: David S. Miller

    Eric W. Biederman
     

06 Sep, 2012

3 commits

  • When adding a blackhole or a prohibit route, it was handled like a classic
    route. Moreover, it was only possible to add this kind of route by specifying
    an interface.

    Bug already reported here:
    http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=498498

    Before the patch:
    $ ip route add blackhole 2001::1/128
    RTNETLINK answers: No such device
    $ ip route add blackhole 2001::1/128 dev eth0
    $ ip -6 route | grep 2001
    2001::1 dev eth0 metric 1024

    After:
    $ ip route add blackhole 2001::1/128
    $ ip -6 route | grep 2001
    blackhole 2001::1 dev lo metric 1024 error -22

    v2: wrong patch
    v3: add a field fc_type in struct fib6_config to store RTN_* type

    Signed-off-by: Nicolas Dichtel
    Signed-off-by: David S. Miller

    Nicolas Dichtel
     
  • It seems we need to provide the ability for stacked devices
    to use a specific lock_class_key for sch->busylock.

    We could instead default l2tpeth tx_queue_len to 0 (no qdisc), but
    a user might use a qdisc anyway.

    (So the same fix is probably needed on other non-LLTX stacked drivers.)
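
    One plausible shape for such an annotation, sketched from the description
    above (the qdisc_tx_busylock field name and the init hook used here are
    assumptions and may not match the actual patch):

        /* give this stacked device's qdisc busylock its own lockdep class */
        static struct lock_class_key l2tp_eth_tx_busylock;

        static int l2tp_eth_dev_init(struct net_device *dev)
        {
                dev->qdisc_tx_busylock = &l2tp_eth_tx_busylock;
                return 0;
        }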

    Noticed while stressing an L2TPv3 setup:

    ======================================================
    [ INFO: possible circular locking dependency detected ]
    3.6.0-rc3+ #788 Not tainted
    -------------------------------------------------------
    netperf/4660 is trying to acquire lock:
    (l2tpsock){+.-...}, at: [] l2tp_xmit_skb+0x172/0xa50 [l2tp_core]

    but task is already holding lock:
    (&(&sch->busylock)->rlock){+.-...}, at: [] dev_queue_xmit+0xd75/0xe00

    which lock already depends on the new lock.

    the existing dependency chain (in reverse order) is:

    -> #1 (&(&sch->busylock)->rlock){+.-...}:
    [] lock_acquire+0x90/0x200
    [] _raw_spin_lock_irqsave+0x4c/0x60
    [] __wake_up+0x32/0x70
    [] tty_wakeup+0x3e/0x80
    [] pty_write+0x73/0x80
    [] tty_put_char+0x3c/0x40
    [] process_echoes+0x142/0x330
    [] n_tty_receive_buf+0x8fb/0x1230
    [] flush_to_ldisc+0x142/0x1c0
    [] process_one_work+0x198/0x760
    [] worker_thread+0x186/0x4b0
    [] kthread+0x93/0xa0
    [] kernel_thread_helper+0x4/0x10

    -> #0 (l2tpsock){+.-...}:
    [] __lock_acquire+0x1628/0x1b10
    [] lock_acquire+0x90/0x200
    [] _raw_spin_lock+0x41/0x50
    [] l2tp_xmit_skb+0x172/0xa50 [l2tp_core]
    [] l2tp_eth_dev_xmit+0x32/0x60 [l2tp_eth]
    [] dev_hard_start_xmit+0x502/0xa70
    [] sch_direct_xmit+0xfe/0x290
    [] dev_queue_xmit+0x1e5/0xe00
    [] ip_finish_output+0x3d0/0x890
    [] ip_output+0x59/0xf0
    [] ip_local_out+0x2d/0xa0
    [] ip_queue_xmit+0x1c3/0x680
    [] tcp_transmit_skb+0x402/0xa60
    [] tcp_write_xmit+0x1f4/0xa30
    [] tcp_push_one+0x30/0x40
    [] tcp_sendmsg+0xe82/0x1040
    [] inet_sendmsg+0x125/0x230
    [] sock_sendmsg+0xdc/0xf0
    [] sys_sendto+0xfe/0x130
    [] system_call_fastpath+0x16/0x1b
    Possible unsafe locking scenario:

    CPU0                                 CPU1
    ----                                 ----
    lock(&(&sch->busylock)->rlock);
                                         lock(l2tpsock);
                                         lock(&(&sch->busylock)->rlock);
    lock(l2tpsock);

    *** DEADLOCK ***

    5 locks held by netperf/4660:
    #0: (sk_lock-AF_INET){+.+.+.}, at: [] tcp_sendmsg+0x2c/0x1040
    #1: (rcu_read_lock){.+.+..}, at: [] ip_queue_xmit+0x0/0x680
    #2: (rcu_read_lock_bh){.+....}, at: [] ip_finish_output+0x135/0x890
    #3: (rcu_read_lock_bh){.+....}, at: [] dev_queue_xmit+0x0/0xe00
    #4: (&(&sch->busylock)->rlock){+.-...}, at: [] dev_queue_xmit+0xd75/0xe00

    stack backtrace:
    Pid: 4660, comm: netperf Not tainted 3.6.0-rc3+ #788
    Call Trace:
    [] print_circular_bug+0x1fb/0x20c
    [] __lock_acquire+0x1628/0x1b10
    [] ? check_usage+0x9b/0x4d0
    [] ? __lock_acquire+0x2e4/0x1b10
    [] lock_acquire+0x90/0x200
    [] ? l2tp_xmit_skb+0x172/0xa50 [l2tp_core]
    [] _raw_spin_lock+0x41/0x50
    [] ? l2tp_xmit_skb+0x172/0xa50 [l2tp_core]
    [] l2tp_xmit_skb+0x172/0xa50 [l2tp_core]
    [] l2tp_eth_dev_xmit+0x32/0x60 [l2tp_eth]
    [] dev_hard_start_xmit+0x502/0xa70
    [] ? dev_hard_start_xmit+0x5e/0xa70
    [] ? dev_queue_xmit+0x141/0xe00
    [] sch_direct_xmit+0xfe/0x290
    [] dev_queue_xmit+0x1e5/0xe00
    [] ? dev_hard_start_xmit+0xa70/0xa70
    [] ip_finish_output+0x3d0/0x890
    [] ? ip_finish_output+0x135/0x890
    [] ip_output+0x59/0xf0
    [] ip_local_out+0x2d/0xa0
    [] ip_queue_xmit+0x1c3/0x680
    [] ? ip_local_out+0xa0/0xa0
    [] tcp_transmit_skb+0x402/0xa60
    [] ? tcp_md5_do_lookup+0x18e/0x1a0
    [] tcp_write_xmit+0x1f4/0xa30
    [] tcp_push_one+0x30/0x40
    [] tcp_sendmsg+0xe82/0x1040
    [] inet_sendmsg+0x125/0x230
    [] ? inet_create+0x6b0/0x6b0
    [] ? sock_update_classid+0xc2/0x3b0
    [] ? sock_update_classid+0x130/0x3b0
    [] sock_sendmsg+0xdc/0xf0
    [] ? fget_light+0x3f9/0x4f0
    [] sys_sendto+0xfe/0x130
    [] ? trace_hardirqs_on+0xd/0x10
    [] ? _raw_spin_unlock_irq+0x30/0x50
    [] ? finish_task_switch+0x83/0xf0
    [] ? finish_task_switch+0x46/0xf0
    [] ? sysret_check+0x1b/0x56
    [] system_call_fastpath+0x16/0x1b

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • Add support for the genl family "tcp_metrics". No locking
    is changed, only that now we can unlink and delete
    entries after a grace period. We implement get/del for a
    single entry and dump to support show/flush filtering
    in user space. A del without an address attribute causes a
    flush of all addresses, sadly under genl_mutex.

    v2:
    - remove rcu_assign_pointer as suggested by Eric Dumazet,
    it is not needed because there are no other writes under lock
    - move the flushing code in tcp_metrics_flush_all

    v3:
    - remove synchronize_rcu on flush as suggested by Eric Dumazet
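
    For reference, such a generic netlink family is typically declared along
    these lines (a simplified sketch; everything except the family name is an
    assumption here, not taken from the patch):

        static struct genl_family tcp_metrics_nl_family = {
                .id       = GENL_ID_GENERATE,
                .hdrsize  = 0,
                .name     = "tcp_metrics",
                .version  = 1,
                .maxattr  = TCP_METRICS_ATTR_MAX,
                .netnsok  = true,
        };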

    Signed-off-by: Julian Anastasov
    Signed-off-by: David S. Miller

    Julian Anastasov
     

04 Sep, 2012

2 commits

  • David S. Miller
     
  • Use proportional rate reduction (PRR) algorithm to reduce cwnd in CWR state,
    in addition to Recovery state. Retire the current rate-halving in CWR.
    When losses are detected via ACKs in CWR state, the sender enters Recovery
    state but the cwnd reduction continues and does not restart.

    Rename and refactor cwnd reduction functions since both CWR and Recovery
    use the same algorithm:
    tcp_init_cwnd_reduction() is new and initializes the reduction state variables.
    tcp_cwnd_reduction() is previously tcp_update_cwnd_in_recovery().
    tcp_ends_cwnd_reduction() is previously tcp_complete_cwr().

    The rate halving functions and logic such as tcp_cwnd_down(), tcp_min_cwnd(),
    and the cwnd moderation inside tcp_enter_cwr() are removed. The unused
    parameter, flag, in tcp_cwnd_reduction() is also removed.
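
    For reference, the core of the PRR update run on each ACK during the
    reduction looks roughly like this (paraphrasing the PRR specification;
    the variable names follow the spec, not the kernel implementation):

        prr_delivered += newly_acked;     /* data newly ACKed/SACKed */
        if (pipe > ssthresh)              /* above target: reduce proportionally */
                sndcnt = DIV_ROUND_UP(prr_delivered * ssthresh, RecoverFS)
                         - prr_out;
        else                              /* at/below target: grow carefully */
                sndcnt = min(ssthresh - pipe,
                             max(prr_delivered - prr_out, newly_acked) + mss);
        cwnd = pipe + sndcnt;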

    Signed-off-by: Yuchung Cheng
    Acked-by: Neal Cardwell
    Signed-off-by: David S. Miller

    Yuchung Cheng
     

03 Sep, 2012

2 commits


01 Sep, 2012

6 commits

  • This patch builds on top of the previous patch to add the support
    for TFO listeners. This includes -

    1. allocating, properly initializing, and managing the per listener
    fastopen_queue structure when TFO is enabled

    2. changes to the inet_csk_accept code to support TFO. E.g., the
    request_sock can no longer be freed upon accept(), not until 3WHS
    finishes

    3. allowing a TCP_SYN_RECV socket to properly poll() and sendmsg()
    if it's a TFO socket

    4. properly closing a TFO listener, and a TFO socket before 3WHS
    finishes

    5. supporting TCP_FASTOPEN socket option

    6. modifying tcp_check_req() to check a TFO socket as well
    as a request_sock

    7. supporting TCP's TFO cookie option

    8. adding a new SYN-ACK retransmit handler to use the timer directly
    off the TFO socket rather than the listener socket. Note that TFO
    server side will not retransmit anything other than SYN-ACK until
    the 3WHS is completed.

    The patch also contains an important function
    "reqsk_fastopen_remove()" to manage the somewhat complex relation
    between a listener, its request_sock, and the corresponding child
    socket. See the comment above the function for the details.

    Signed-off-by: H.K. Jerry Chu
    Cc: Yuchung Cheng
    Cc: Neal Cardwell
    Cc: Eric Dumazet
    Cc: Tom Herbert
    Signed-off-by: David S. Miller

    Jerry Chu
     
  • This patch adds all the necessary data structure and support
    functions to implement TFO server side. It also documents a number
    of flags for the sysctl_tcp_fastopen knob, and adds a few Linux
    extension MIBs.

    In addition, it includes the following:

    1. a new TCP_FASTOPEN socket option an application must call to
    supply the max backlog allowed, in order to enable TFO on its listener
    (see the usage sketch after this list).

    2. A number of key data structures:
    "fastopen_rsk" in tcp_sock - for a big socket to access its
    request_sock for retransmission and ack processing purposes. It is
    non-NULL iff the 3WHS has not completed.

    "fastopenq" in request_sock_queue - points to a per Fast Open
    listener data structure "fastopen_queue" to keep track of qlen (# of
    outstanding Fast Open requests) and max_qlen, among other things.

    "listener" in tcp_request_sock - to point to the original listener
    for book-keeping purposes, i.e., to maintain qlen against max_qlen
    as part of the defense against IP spoofing attacks.

    3. various data structures and functions, many in tcp_fastopen.c, to
    support server side Fast Open cookie operations, including
    /proc/sys/net/ipv4/tcp_fastopen_key to allow manual rekeying.
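
    A minimal usage sketch for the listener-side socket option described in
    point 1 (userspace side; the fallback define is only needed with older
    headers, and the backlog value is arbitrary):

        #include <sys/socket.h>
        #include <netinet/in.h>
        #include <netinet/tcp.h>

        #ifndef TCP_FASTOPEN
        #define TCP_FASTOPEN 23
        #endif

        /* enable TCP Fast Open on an already-bound listening socket */
        static int enable_tfo(int listen_fd)
        {
                int qlen = 5;   /* max number of pending TFO requests */

                return setsockopt(listen_fd, IPPROTO_TCP, TCP_FASTOPEN,
                                  &qlen, sizeof(qlen));
        }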

    Signed-off-by: H.K. Jerry Chu
    Cc: Yuchung Cheng
    Cc: Neal Cardwell
    Cc: Eric Dumazet
    Cc: Tom Herbert
    Signed-off-by: David S. Miller

    Jerry Chu
     
  • This patch removes bus_id from the mdio platform data. The reason for
    removing bus_id is that the stmmac mdio bus_id is always the same as the
    stmmac bus_id, so there is no point in passing it in a separate variable.
    Also, the stmmac ethernet driver connects to the phy using the bus_id
    passed in its platform data.
    So, having a single bus_id is much simpler.

    Signed-off-by: Srinivas Kandagatla
    Signed-off-by: David S. Miller

    Srinivas Kandagatla
     
  • Commit 9ad7c049 ("tcp: RFC2988bis + taking RTT sample from 3WHS for
    the passive open side") changed the initRTO from 3secs to 1sec in
    accordance to RFC6298 (former RFC2988bis). This reduced the time till
    the last SYN retransmission packet gets sent from 93secs to 31secs.

    RFC1122 states that the retransmission should be done for at least 3
    minutes, but this seems to be quite high.

    "However, the values of R1 and R2 may be different for SYN
    and data segments. In particular, R2 for a SYN segment MUST
    be set large enough to provide retransmission of the segment
    for at least 3 minutes. The application can close the
    connection (i.e., give up on the open attempt) sooner, of
    course."

    This patch increases TCP_SYN_RETRIES to 6,
    providing a retransmission window of 63secs.
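
    (For reference: with the 1 second initial RTO and exponential backoff, each
    SYN retransmission is sent 1, 2, 4, 8, 16 and 32 seconds after the previous
    one, so the last of 5 retries goes out 1+2+4+8+16 = 31 seconds after the
    initial SYN, and the last of 6 retries 1+2+4+8+16+32 = 63 seconds after it.)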

    The comments for SYN and SYNACK retries have also been updated to
    describe the current settings. The same goes for the documentation file
    "Documentation/networking/ip-sysctl.txt".

    Signed-off-by: Alexander Bergmann
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Alex Bergmann
     
  • Merge the 'net' tree to get the recent set of netfilter bug fixes in
    order to assist with some merge hassles Pablo is going to have to deal
    with for upcoming changes.

    Signed-off-by: David S. Miller

    David S. Miller
     
  • David S. Miller
     

31 Aug, 2012

3 commits

  • Existing code assumes that del_timer returns true for alive conntrack
    entries. However, this is not true if reliable events are enabled.
    In that case, del_timer may return true for entries that were
    just inserted into the dying list. Note that packets / ctnetlink may
    hold references to conntrack entries that were just inserted into that
    list.

    This patch fixes the issue by adding an independent timer for
    event delivery. This increases the size of the ecache extension.
    Still we can revisit this later and use variable size extensions
    to allocate this area on demand.

    Tested-by: Oliver Smith
    Signed-off-by: Pablo Neira Ayuso

    Pablo Neira Ayuso
     
  • Commit c3def943c7117d42caaed3478731ea7c3c87190e added support for the
    new PCI IDs of the 57840 board, but failed to change the obsolete value
    in 'pci_ids.h'.
    This patch does so, allowing such devices to be probed.

    Signed-off-by: Yuval Mintz
    Signed-off-by: Eilon Greenstein
    Signed-off-by: David S. Miller

    Yuval Mintz
     
  • This patch adds dummy functions to of_mdio.h, so that drivers need not
    ifdef their code with CONFIG_OF.
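
    The usual shape of such stubs, using of_mdiobus_register as an example
    (an illustrative sketch; the stub's return value is an assumption):

        #if defined(CONFIG_OF)
        int of_mdiobus_register(struct mii_bus *mdio, struct device_node *np);
        #else
        static inline int of_mdiobus_register(struct mii_bus *mdio,
                                              struct device_node *np)
        {
                /* behave as if OF MDIO support is simply absent */
                return -ENOSYS;
        }
        #endif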

    Signed-off-by: Srinivas Kandagatla
    Signed-off-by: David S. Miller

    Srinivas Kandagatla
     

30 Aug, 2012

8 commits

  • Signed-off-by: Patrick McHardy

    Patrick McHardy
     
  • Add IPv6 support to the SIP NAT helper. There are no functional differences
    to IPv4 NAT, just different formats for addresses.

    Signed-off-by: Patrick McHardy

    Patrick McHardy
     
  • Signed-off-by: Patrick McHardy

    Patrick McHardy
     
  • Signed-off-by: Patrick McHardy

    Patrick McHardy
     
  • Add inet_proto_csum_replace16 for incrementally updating IPv6 pseudo header
    checksums for IPv6 NAT.
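
    Presumably this mirrors the existing 4-byte helper; a declaration sketch
    (the exact signature may differ):

        void inet_proto_csum_replace16(__sum16 *sum, struct sk_buff *skb,
                                       const __be32 *from, const __be32 *to,
                                       int pseudohdr);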

    Signed-off-by: Patrick McHardy
    Acked-by: David S. Miller

    Patrick McHardy
     
  • Convert the IPv4 NAT implementation to a protocol independent core and
    address family specific modules.

    Signed-off-by: Patrick McHardy

    Patrick McHardy
     
  • For mangling IPv6 packets the protocol header offset needs to be known
    by the NAT packet mangling functions. Add a so far unused protoff argument
    and convert the conntrack and NAT helpers to use it in preparation for
    IPv6 NAT.

    Signed-off-by: Patrick McHardy

    Patrick McHardy
     
  • The IPv6 conntrack fragmentation currently has a couple of shortcomings.
    Fragments are collected in PREROUTING/OUTPUT, are defragmented, the
    defragmented packet is then passed to conntrack, the resulting conntrack
    information is attached to each original fragment and the fragments then
    continue their way through the stack.

    Helper invocation occurs in the POSTROUTING hook, at which point only
    the original fragments are available. The result of this is that
    fragmented packets are never passed to helpers.

    This patch improves the situation in the following way:

    - If a reassembled packet belongs to a connection that has a helper
    assigned, the reassembled packet is passed through the stack instead
    of the original fragments.

    - During defragmentation, the largest received fragment size is stored.
    On output, the packet is refragmented if required. If the largest
    received fragment size exceeds the outgoing MTU, a "packet too big"
    message is generated, thus behaving as if the original fragments
    were passed through the stack from an outside point of view.

    - The ipv6_helper() hook function can't receive fragments anymore for
    connections using a helper, so it is switched to use ipv6_skip_exthdr()
    instead of the netfilter specific nf_ct_ipv6_skip_exthdr() and the
    reassembled packets are passed to connection tracking helpers.

    The result of this is that we can properly track fragmented packets, but
    still generate ICMPv6 Packet too big messages where we would have before.

    This patch is also required as a precondition for IPv6 NAT, where NAT
    helpers might enlarge packets up to a point that they require
    fragmentation. In that case we can't generate Packet too big messages
    since the proper MTU can't be calculated in all cases (e.g. when
    changing the textual representation of a variable number of addresses),
    so the packet is transparently fragmented iff the original packet or
    fragments would have fit the outgoing MTU.

    IPVS parts by Jesper Dangaard Brouer.

    Signed-off-by: Patrick McHardy

    Patrick McHardy
     

27 Aug, 2012

1 commit

  • IPv4 conntrack defragments incoming packets at the PRE_ROUTING hook and
    (in the case of forwarded packets) refragments them at POST_ROUTING
    independent of the IP_DF flag. Refragmentation uses the dst_mtu() of
    the local route without caring about the original fragment sizes,
    thereby breaking PMTUD.

    This patch fixes this by keeping track of the largest received fragment
    with IP_DF set and generating an ICMP fragmentation required error during
    refragmentation if that size exceeds the MTU.

    Signed-off-by: Patrick McHardy
    Acked-by: Eric Dumazet
    Acked-by: David S. Miller

    Patrick McHardy
     

25 Aug, 2012

5 commits

  • This is an initial merge in of Eric Biederman's work to start adding
    user namespace support to the networking.

    Signed-off-by: David S. Miller

    David S. Miller
     
  • John W. Linville says:

    ====================
    This is a batch of updates intended for 3.7. The bulk of it is
    mac80211 changes, including some mesh work from Thomas Pederson and
    some multi-channel work from Johannes. A variety of driver updates
    and other bits are scattered in there as well.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • also, remove unused vlan_info definition from header

    CC: Patrick McHardy
    Signed-off-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Jiri Pirko
     
  • The operstate of a device is initially IF_OPER_UNKNOWN and is updated
    asynchronously by linkwatch after each change of carrier state
    reported by the driver. The default carrier state of a net device is
    on, and this will never be changed on drivers that do not support
    carrier detection, thus the operstate remains IF_OPER_UNKNOWN.

    For devices that do support carrier detection, the driver must set the
    carrier state to off initially, then poll the hardware state when the
    device is opened. However, we must not activate linkwatch for a
    unregistered device, and commit b473001 ('net: Do not fire linkwatch
    events until the device is registered.') ensured that we don't. But
    this means that the operstate for many devices that support carrier
    detection remains IF_OPER_UNKNOWN when it should be IF_OPER_DOWN.

    The same issue exists with the dormant state.

    The proper initialisation sequence, avoiding a race with opening of
    the device, is:

    rtnl_lock();
    rc = register_netdevice(dev);
    if (rc)
            goto out_unlock;
    netif_carrier_off(dev); /* or netif_dormant_on(dev) */
    rtnl_unlock();

    but it seems silly that this should have to be repeated in so many
    drivers. Further, the operstate seen immediately after opening the
    device may still be IF_OPER_UNKNOWN due to the asynchronous nature of
    linkwatch.

    Commit 22604c8 ('net: Fix for initial link state in 2.6.28') attempted
    to fix this by setting the operstate synchronously, but it was
    reverted as it could lead to deadlock.

    This initialises the operstate synchronously at registration time
    only.

    Signed-off-by: Ben Hutchings
    Signed-off-by: David S. Miller

    Ben Hutchings
     
  • …wireless-next into for-davem

    John W. Linville
     

24 Aug, 2012

1 commit

  • This patch fixes a broken build due to a missing header:
    ...
    CC net/ipv4/proc.o
    In file included from include/net/net_namespace.h:15,
    from net/ipv4/proc.c:35:
    include/net/netns/packet.h:11: error: field 'sklist_lock' has incomplete type
    ...

    The netns_packet lock was changed by a recent patch from a spinlock to a
    mutex, so the included header needs to be changed from linux/spinlock.h
    to linux/mutex.h as well.

    See commit 0fa7fa98dbcc2789409ed24e885485e645803d7f
    ("packet: Protect packet sk list with mutex (v2)").
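
    With the fix, include/net/netns/packet.h would look roughly like this
    (a sketch):

        #include <linux/rculist.h>
        #include <linux/mutex.h>

        struct netns_packet {
                struct mutex            sklist_lock;
                struct hlist_head       sklist;
        };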

    Signed-off-by: Rami Rosen
    Signed-off-by: David S. Miller

    Rami Rosen
     

23 Aug, 2012

4 commits

  • John W. Linville
     
  • Change since v1:

    * Fixed inuse counters access spotted by Eric

    In patch eea68e2f (packet: Report socket mclist info via diag module) I've
    introduced a "scheduling while atomic" problem in the packet diag module --
    the socket list is traversed under rcu_read_lock(), while the sk mclist
    access performed under it requires the rtnl lock (i.e. a mutex) to be taken.

    [152363.820563] BUG: scheduling while atomic: crtools/12517/0x10000002
    [152363.820573] 4 locks held by crtools/12517:
    [152363.820581] #0: (sock_diag_mutex){+.+.+.}, at: [] sock_diag_rcv+0x1f/0x3e
    [152363.820613] #1: (sock_diag_table_mutex){+.+.+.}, at: [] sock_diag_rcv_msg+0xdb/0x11a
    [152363.820644] #2: (nlk->cb_mutex){+.+.+.}, at: [] netlink_dump+0x23/0x1ab
    [152363.820693] #3: (rcu_read_lock){.+.+..}, at: [] packet_diag_dump+0x0/0x1af

    A similar thing was then re-introduced by further packet diag patches (fanout
    mutex and pgvec mutex for rings) :(

    Apart from being terribly sorry for the above, I propose to change the packet
    sk list protection from spinlock to mutex. This lock currently protects two
    modifications:

    * sklist
    * prot inuse counters

    The sklist modifications can simply be reprotected with the mutex since they
    already occur in a sleeping context. The inuse counters modifications are
    trickier -- the __this_cpu_* helpers are used inside, thus requiring the
    caller to handle the potential context issues itself. Since packet sockets'
    counters are modified in only two places (packet_create and packet_release)
    we only need to protect the context
    from being preempted. BH disabling is not required in this case.
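
    A sketch of the resulting pattern in packet_create() (simplified): the list
    update goes under the new mutex, while the per-cpu inuse counter only needs
    preemption disabled.

        mutex_lock(&net->packet.sklist_lock);
        sk_add_node_rcu(sk, &net->packet.sklist);
        mutex_unlock(&net->packet.sklist_lock);

        preempt_disable();
        sock_prot_inuse_add(net, &packet_proto, 1);
        preempt_enable();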

    Signed-off-by: Pavel Emelyanov
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Pavel Emelyanov
     
  • The helper functions which translate IEEE MDIO Manageable Device (MMD)
    Energy-Efficient Ethernet (EEE) registers 3.20, 7.60 and 7.61 to and from
    the comparable ethtool supported/advertised settings will be needed by
    drivers other than those in PHYLIB (e.g. e1000e in a follow-on patch).

    In the same fashion as similar translation functions in linux/mii.h, move
    these functions from the PHYLIB core to the linux/mdio.h header file so the
    code will not have to be duplicated in each driver needing MMD-to-ethtool
    (and vice-versa) translations. The function and some variable names have
    been renamed to be more descriptive.
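
    The moved helpers end up with prototypes roughly along these lines (a
    sketch; the renamed identifiers may differ in detail):

        u32 mmd_eee_cap_to_ethtool_sup_t(u16 eee_cap);
        u32 mmd_eee_adv_to_ethtool_adv_t(u16 eee_adv);
        u16 ethtool_adv_to_mmd_eee_adv_t(u32 adv);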

    Not tested on the only hardware that currently calls the related functions,
    stmmac, because I don't have access to any. Has been compile tested and
    the translations have been tested on a locally modified version of e1000e.

    Signed-off-by: Bruce Allan
    Cc: Giuseppe Cavallaro
    Signed-off-by: David S. Miller

    Allan, Bruce W
     
  • I noticed an extra one-second delay in device dismantle, tracked down to
    a call to dst_dev_event() while some call_rcu() callbacks are still in RCU
    queues.

    These call_rcu() were posted by rt_free(struct rtable *rt) calls.

    We then wait a little (one second) in netdev_wait_allrefs() before
    kicking NETDEV_UNREGISTER again.

    As the call_rcu() are now completed, dst_dev_event() can do the needed
    device swap on busy dst.

    To solve this problem, add a new NETDEV_UNREGISTER_FINAL, called
    after a rcu_barrier(), but outside of RTNL lock.

    Use NETDEV_UNREGISTER_FINAL with care !

    Change dst_dev_event() handler to react to NETDEV_UNREGISTER_FINAL

    Also remove NETDEV_UNREGISTER_BATCH, as it's not used anymore after the
    IP cache removal.
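
    An illustrative sketch of a notifier reacting to the new event (the
    dst_flush_dev() helper here is hypothetical, used only to show the shape
    of the handler):

        static int dst_dev_event(struct notifier_block *this,
                                 unsigned long event, void *ptr)
        {
                struct net_device *dev = ptr;

                switch (event) {
                case NETDEV_UNREGISTER_FINAL:
                case NETDEV_DOWN:
                        /* the rcu_barrier() has run, so pending rt_free()
                         * callbacks are done and busy dsts can be moved off
                         * the dying device safely */
                        dst_flush_dev(dev);     /* hypothetical helper */
                        break;
                }
                return NOTIFY_DONE;
        }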

    With help from Gao feng

    Signed-off-by: Eric Dumazet
    Cc: Tom Herbert
    Cc: Mahesh Bandewar
    Cc: "Eric W. Biederman"
    Cc: Gao feng
    Signed-off-by: David S. Miller

    Eric Dumazet