01 Sep, 2010

1 commit


19 Aug, 2010

1 commit

  • Since
    commit 1dacc76d0014a034b8aca14237c127d7c19d7726
    Author: Johannes Berg
    Date: Wed Jul 1 11:26:02 2009 +0000

    net/compat/wext: send different messages to compat tasks

    we had a race condition when setting and then
    restoring frag_list. Eric attempted to fix it,
    but the fix created even worse problems.

    However, the original motivation I had when I
    added the code that turned out to be racy is
    no longer clear to me, since we only copy up
    to skb->len to userspace, which doesn't include
    the frag_list length. As a result, not doing
    any frag_list clearing and restoring avoids
    the race condition, while not introducing any
    other problems.

    Additionally, while preparing this patch I found
    that since none of the remaining netlink code is
    really aware of the frag_list, we need to use the
    original skb's information for packet information
    and credentials. This fixes, for example, the
    group information received by compat tasks.

    Cc: Eric Dumazet
    Cc: stable@kernel.org [2.6.31+, for 2.6.35 revert 1235f504aa]
    Signed-off-by: Johannes Berg
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Johannes Berg
     

16 Aug, 2010

1 commit


27 Jul, 2010

3 commits


21 Jul, 2010

1 commit

  • Convert a few calls from kfree_skb to consume_skb

    Noticed while I was working on dropwatch that I was detecting lots of internal
    skb drops in several places. While some are legitimate, several were not,
    freeing skbs that were at the end of their life, rather than being discarded due
    to an error. This patch converts those call sites from using kfree_skb to
    consume_skb, which keeps the in-kernel drop_monitor code from detecting them as
    drops. Tested successfully by myself.

    Signed-off-by: Neil Horman
    Signed-off-by: David S. Miller

    Neil Horman
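
A minimal userspace sketch of the distinction described above, not the kernel implementation. The names struct buf, buf_drop(), buf_consume() and drop_events are hypothetical stand-ins for struct sk_buff, kfree_skb(), consume_skb() and the drop monitor's tally. Both paths release the buffer; only the drop path is counted.

```c
#include <stdlib.h>

/* Hypothetical userspace stand-in for struct sk_buff. */
struct buf {
    void *data;
};

/* Plays the role of the drop monitor's tally (assumption for illustration). */
unsigned long drop_events;

struct buf *buf_alloc(size_t len)
{
    struct buf *b = malloc(sizeof(*b));

    if (!b)
        return NULL;
    b->data = malloc(len);
    if (!b->data) {
        free(b);
        return NULL;
    }
    return b;
}

/* Shared release path used by both free variants. */
static void buf_release(struct buf *b)
{
    if (!b)
        return;
    free(b->data);
    free(b);
}

/* Analogue of kfree_skb(): free the buffer and record a drop. */
void buf_drop(struct buf *b)
{
    if (b)
        drop_events++;
    buf_release(b);
}

/* Analogue of consume_skb(): free the buffer at its normal end of life. */
void buf_consume(struct buf *b)
{
    buf_release(b);
}
```

A call site that is done with a successfully processed buffer would use buf_consume(); buf_drop() is reserved for error paths, so the counter only reflects real discards.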
     

17 Jun, 2010

1 commit


22 May, 2010

1 commit

  • When netlink sockets are used to convey data that is in a namespace
    we need a way to select a subset of the listening sockets to deliver
    the packet to. For the network namespace we have been doing this
    by only transmitting packets in the correct network namespace.

    For data belonging to other namespaces netlink_broadcast_filtered
    provides a mechanism that allows us to examine the destination
    socket and to decide if we should transmit the specified packet
    to it.

    Signed-off-by: Eric W. Biederman
    Acked-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Eric W. Biederman
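
The filtering idea above can be sketched in userspace as follows. This is an illustration under stated assumptions, not the kernel code: struct listener, broadcast_filtered() and skip_other_ns() are hypothetical names, and the convention that a nonzero filter result means "skip this destination" is assumed here.

```c
#include <stddef.h>

/* Hypothetical stand-in for a listening socket; ns_id models some
 * non-network namespace the listener belongs to. */
struct listener {
    int ns_id;
    int delivered;
};

/* Filter callback: nonzero means "skip this destination" (assumed convention). */
typedef int (*tx_filter_fn)(const struct listener *dst, void *data);

/* Deliver to every listener the filter accepts; returns the delivery count. */
int broadcast_filtered(struct listener *ls, size_t n,
                       tx_filter_fn filter, void *data)
{
    int sent = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        if (filter && filter(&ls[i], data))
            continue;           /* examined the destination, decided to skip */
        ls[i].delivered++;
        sent++;
    }
    return sent;
}

/* Example filter: skip listeners outside the namespace id passed in data. */
int skip_other_ns(const struct listener *dst, void *data)
{
    return dst->ns_id != *(const int *)data;
}
```

Passing a NULL filter degenerates to a plain broadcast, which is why the filtered variant can sit alongside the unfiltered one without changing existing callers.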
     

12 Apr, 2010

1 commit


07 Apr, 2010

1 commit


06 Apr, 2010

1 commit

  • * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (37 commits)
    smc91c92_cs: fix the problem of "Unable to find hardware address"
    r8169: clean up my printk uglyness
    net: Hook up cxgb4 to Kconfig and Makefile
    cxgb4: Add main driver file and driver Makefile
    cxgb4: Add remaining driver headers and L2T management
    cxgb4: Add packet queues and packet DMA code
    cxgb4: Add HW and FW support code
    cxgb4: Add register, message, and FW definitions
    netlabel: Fix several rcu_dereference() calls used without RCU read locks
    bonding: fix potential deadlock in bond_uninit()
    net: check the length of the socket address passed to connect(2)
    stmmac: add documentation for the driver.
    stmmac: fix kconfig for crc32 build error
    be2net: fix bug in vlan rx path for big endian architecture
    be2net: fix flashing on big endian architectures
    be2net: fix a bug in flashing the redboot section
    bonding: bond_xmit_roundrobin() fix
    drivers/net: Add missing unlock
    net: gianfar - align BD ring size console messages
    net: gianfar - initialize per-queue statistics
    ...

    Linus Torvalds
     

04 Apr, 2010

1 commit


02 Apr, 2010

1 commit

  • check the length of the socket address passed to connect(2).

    Check the length of the socket address passed to connect(2). If the
    length is invalid, -EINVAL will be returned.

    Signed-off-by: Changli Gao
    ----
    net/bluetooth/l2cap.c | 3 ++-
    net/bluetooth/rfcomm/sock.c | 3 ++-
    net/bluetooth/sco.c | 3 ++-
    net/can/bcm.c | 3 +++
    net/ieee802154/af_ieee802154.c | 3 +++
    net/ipv4/af_inet.c | 5 +++++
    net/netlink/af_netlink.c | 3 +++
    7 files changed, 20 insertions(+), 3 deletions(-)
    Signed-off-by: David S. Miller

    Changli Gao
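
A rough sketch of the kind of length check described above, written as a userspace helper. check_addr_len() is a hypothetical name; the per-protocol minimum length (for instance the size of the address family field, or of the protocol's own sockaddr structure) is passed in by the caller.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: reject a connect(2) address whose length cannot
 * hold the 'need' bytes the protocol is about to read from it.
 * Returns 0 on success, -EINVAL otherwise.
 */
int check_addr_len(int addr_len, size_t need)
{
    if (addr_len < 0 || (size_t)addr_len < need)
        return -EINVAL;
    return 0;
}
```

A handler would call this before dereferencing any field of the address, e.g. check_addr_len(alen, sizeof(sa_family_t)) before looking at sa_family.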
     

30 Mar, 2010

1 commit

  • …it slab.h inclusion from percpu.h

    percpu.h is included by sched.h and module.h and thus ends up being
    included when building most .c files. percpu.h includes slab.h which
    in turn includes gfp.h making everything defined by the two files
    universally available and complicating inclusion dependencies.

    percpu.h -> slab.h dependency is about to be removed. Prepare for
    this change by updating users of gfp and slab facilities to include those
    headers directly instead of assuming availability. As this conversion
    needs to touch large number of source files, the following script is
    used as the basis of conversion.

    http://userweb.kernel.org/~tj/misc/slabh-sweep.py

    The script does the following.

    * Scan files for gfp and slab usages and update includes such that
    only the necessary includes are there. ie. if only gfp is used,
    gfp.h, if slab is used, slab.h.

    * When the script inserts a new include, it looks at the include
    blocks and tries to put the new include such that its order conforms
    to its surrounding. It's put in the include block which contains
    core kernel includes, in the same order that the rest are ordered -
    alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
    doesn't seem to be any matching order.

    * If the script can't find a place to put a new include (mostly
    because the file doesn't have fitting include block), it prints out
    an error message indicating which .h file needs to be added to the
    file.

    The conversion was done in the following steps.

    1. The initial automatic conversion of all .c files updated slightly
    over 4000 files, deleting around 700 includes and adding ~480 gfp.h
    and ~3000 slab.h inclusions. The script emitted errors for ~400
    files.

    2. Each error was manually checked. Some didn't need the inclusion,
    some needed manual addition while adding it to implementation .h or
    embedding .c file was more appropriate for others. This step added
    inclusions to around 150 files.

    3. The script was run again and the output was compared to the edits
    from #2 to make sure no file was left behind.

    4. Several build tests were done and a couple of problems were fixed.
    e.g. lib/decompress_*.c used malloc/free() wrappers around slab
    APIs requiring slab.h to be added manually.

    5. The script was run on all .h files but without automatically
    editing them as sprinkling gfp.h and slab.h inclusions around .h
    files could easily lead to inclusion dependency hell. Most gfp.h
    inclusion directives were ignored as stuff from gfp.h was usually
    widely available and often used in preprocessor macros. Each
    slab.h inclusion directive was examined and added manually as
    necessary.

    6. percpu.h was updated not to include slab.h.

    7. Build tests were done on the following configurations and failures
    were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
    distributed build env didn't work with gcov compiles) and a few
    more options had to be turned off depending on archs to make things
    build (like ipr on powerpc/64 which failed due to missing writeq).

    * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
    * powerpc and powerpc64 SMP allmodconfig
    * sparc and sparc64 SMP allmodconfig
    * ia64 SMP allmodconfig
    * s390 SMP allmodconfig
    * alpha SMP allmodconfig
    * um on x86_64 SMP allmodconfig

    8. percpu.h modifications were reverted so that it could be applied as
    a separate patch and serve as bisection point.

    Given the fact that I had only a couple of failures from tests on step
    6, I'm fairly confident about the coverage of this conversion patch.
    If there is a breakage, it's likely to be something in one of the arch
    headers which should be easily discoverable on most builds of
    the specific arch.

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

    Tejun Heo
     

27 Mar, 2010

1 commit

  • This was included in OpenVZ kernels but wasn't integrated upstream.
    From git://git.openvz.org/pub/linux-2.6.24-openvz:

    commit 5c69402f18adf7276352e051ece2cf31feefab02
    Author: Alexey Dobriyan
    Date: Mon Dec 24 14:37:45 2007 +0300

    netlink: fixup ->tgid to work in multiple PID namespaces

    Signed-off-by: Tom Goff
    Acked-by: Alexey Dobriyan
    Signed-off-by: David S. Miller

    Tom Goff
     

21 Mar, 2010

1 commit

  • Currently, ENOBUFS errors are reported to the socket via
    netlink_set_err() even if NETLINK_RECV_NO_ENOBUFS is set. However,
    that should not happen. This fixes this problem and it changes the
    prototype of netlink_set_err() to return the number of sockets that
    have set the NETLINK_RECV_NO_ENOBUFS socket option. This return
    value is used in the next patch in these bugfix series.

    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: David S. Miller

    Pablo Neira Ayuso
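
A minimal sketch of the behaviour described above, assuming a flat array of sockets. struct fake_sock, set_err() and the RECV_NO_ENOBUFS bit are hypothetical stand-ins for the kernel's socket structures, netlink_set_err() and NETLINK_RECV_NO_ENOBUFS.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical flag bit standing in for NETLINK_RECV_NO_ENOBUFS. */
#define RECV_NO_ENOBUFS 0x1u

/* Hypothetical stand-in for a netlink socket. */
struct fake_sock {
    unsigned int flags;
    int err;
};

/*
 * Report 'code' to every socket, except that ENOBUFS is not reported to
 * sockets that opted out of it. Returns how many sockets were skipped
 * for that reason, so the caller can act on the count.
 */
int set_err(struct fake_sock *socks, size_t n, int code)
{
    int skipped = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        if (code == ENOBUFS && (socks[i].flags & RECV_NO_ENOBUFS)) {
            skipped++;
            continue;
        }
        socks[i].err = code;
    }
    return skipped;
}
```

Errors other than ENOBUFS still reach every socket; the opt-out only suppresses the one error code it names.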
     

28 Feb, 2010

1 commit

    The Inode field in /proc/net/{tcp,udp,packet,raw,...} is useful for finding out the types of
    file descriptors associated with a process. The lsof utility, for one, uses the field.
    Unfortunately, unlike /proc/net/{tcp,udp,packet,raw,...}, /proc/net/netlink doesn't have the field.
    This patch adds the field to /proc/net/netlink.

    Signed-off-by: Masatake YAMATO
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Masatake YAMATO
     

04 Feb, 2010

2 commits

  • David S. Miller
     
    Netlink code autoloads a module if the protocol userspace is asking for is
    not ready. However, the module can disappear right after it was autoloaded.
    Example: modprobe/rmmod stress-testing and xfrm_user.ko providing NETLINK_XFRM.

    netlink_create() in such a situation _will_ create the userspace socket and
    _will_not_ pin the module. If the module is then removed, we end up calling
    ->netlink_rcv into nothing:

    BUG: unable to handle kernel paging request at ffffffffa02f842a
    ^^^^^^^^^^^^^^^^
    modules are loaded near these addresses here

    IP: [] 0xffffffffa02f842a
    PGD 161f067 PUD 1623063 PMD baa12067 PTE 0
    Oops: 0010 [#1] PREEMPT SMP DEBUG_PAGEALLOC
    last sysfs file: /sys/devices/pci0000:00/0000:00:1f.2/host0/target0:0:0/0:0:0:0/block/sda/uevent
    CPU 1
    Pid: 11515, comm: ip Not tainted 2.6.33-rc5-netns-00594-gaaa5728-dirty #6 P5E/P5E
    RIP: 0010:[] [] 0xffffffffa02f842a
    RSP: 0018:ffff8800baa3db48 EFLAGS: 00010292
    RAX: ffff8800baa3dfd8 RBX: ffff8800be353640 RCX: 0000000000000000
    RDX: ffffffff81959380 RSI: ffff8800bab7f130 RDI: 0000000000000001
    RBP: ffff8800baa3db58 R08: 0000000000000001 R09: 0000000000000000
    R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000000011
    R13: ffff8800be353640 R14: ffff8800bcdec240 R15: ffff8800bd488010
    FS: 00007f93749656f0(0000) GS:ffff880002300000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
    CR2: ffffffffa02f842a CR3: 00000000ba82b000 CR4: 00000000000006e0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
    Process ip (pid: 11515, threadinfo ffff8800baa3c000, task ffff8800bab7eb30)
    Stack:
    ffffffff813637c0 ffff8800bd488000 ffff8800baa3dba8 ffffffff8136397d
    0000000000000000 ffffffff81344adc 7fffffffffffffff 0000000000000000
    ffff8800baa3ded8 ffff8800be353640 ffff8800bcdec240 0000000000000000
    Call Trace:
    [] ? netlink_unicast+0x100/0x2d0
    [] netlink_unicast+0x2bd/0x2d0

    netlink_unicast_kernel:
    nlk->netlink_rcv(skb);

    [] ? memcpy_fromiovec+0x6c/0x90
    [] netlink_sendmsg+0x1d3/0x2d0
    [] sock_sendmsg+0xbb/0xf0
    [] ? __lock_acquire+0x27b/0xa60
    [] ? might_fault+0x73/0xd0
    [] ? might_fault+0x73/0xd0
    [] ? __lock_release+0x82/0x170
    [] ? might_fault+0xbe/0xd0
    [] ? might_fault+0x73/0xd0
    [] ? verify_iovec+0x47/0xd0
    [] sys_sendmsg+0x1a9/0x360
    [] ? _raw_spin_unlock_irqrestore+0x65/0x70
    [] ? trace_hardirqs_on+0xd/0x10
    [] ? _raw_spin_unlock_irqrestore+0x42/0x70
    [] ? __up_read+0x84/0xb0
    [] ? trace_hardirqs_on_caller+0x145/0x190
    [] ? trace_hardirqs_on_thunk+0x3a/0x3f
    [] system_call_fastpath+0x16/0x1b
    Code: Bad RIP value.
    RIP [] 0xffffffffa02f842a
    RSP
    CR2: ffffffffa02f842a

    Return -EPROTONOSUPPORT if module was quickly removed after autoloading.

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: David S. Miller

    Alexey Dobriyan
     

14 Jan, 2010

1 commit


26 Nov, 2009

1 commit

  • Generated with the following semantic patch

    @@
    struct net *n1;
    struct net *n2;
    @@
    - n1 == n2
    + net_eq(n1, n2)

    @@
    struct net *n1;
    struct net *n2;
    @@
    - n1 != n2
    + !net_eq(n1, n2)

    applied over {include,net,drivers/net}.

    Signed-off-by: Octavian Purdila
    Signed-off-by: David S. Miller

    Octavian Purdila
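
For readers unfamiliar with the helper the semantic patch switches to, here is a rough userspace sketch of why a wrapper is preferable to a raw pointer comparison. It assumes the helper compares pointers when namespaces are configured and is constant-true otherwise; the struct net below is a hypothetical stand-in, and CONFIG_NET_NS is defined by hand for illustration.

```c
/* Defined by hand for this sketch; remove to model a build without
 * network namespace support. */
#define CONFIG_NET_NS 1

/* Hypothetical stand-in for the kernel's struct net. */
struct net {
    int id;
};

#ifdef CONFIG_NET_NS
/* With namespaces, two references are equal only if they are the same object. */
int net_eq(const struct net *a, const struct net *b)
{
    return a == b;
}
#else
/* Without namespaces there is only one instance, so the comparison can
 * be folded to a constant by the compiler. */
int net_eq(const struct net *a, const struct net *b)
{
    (void)a;
    (void)b;
    return 1;
}
#endif
```

Call sites written as net_eq(n1, n2) or !net_eq(n1, n2) then compile to the cheapest correct form for either configuration.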
     

17 Nov, 2009

1 commit

  • The netlink URELEASE notifier doesn't notify for
    sockets that have been used to receive multicast
    but it should be called for such sockets as well
    since they might _also_ be used for sending and
    not solely for receiving multicast. We will need
    that for nl80211 (generic netlink sockets) in the
    future.

    Signed-off-by: Johannes Berg
    Cc: Patrick McHardy
    Signed-off-by: David S. Miller

    Johannes Berg
     

11 Nov, 2009

1 commit


06 Nov, 2009

1 commit

  • The generic __sock_create function has a kern argument which allows the
    security system to make decisions based on if a socket is being created by
    the kernel or by userspace. This patch passes that flag to the
    net_proto_family specific create function, so it can do the same thing.

    Signed-off-by: Eric Paris
    Acked-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Eric Paris
     

18 Oct, 2009

2 commits

  • 1. GENL_MIN_ID is a valid id -> no need to start at
    GENL_MIN_ID + 1.
    2. Avoid going through the ids two times: If we start at
    GENL_MIN_ID+1 (*or bigger*) and all ids are over!, the
    code iterates through the list twice (*or lesser*).
    3. Simplify code - no need to start at idx=0 which gets
    reset to GENL_MIN_ID.

    Patch on net-next-2.6. Reboot test shows that first id
    passed to genl_register_family was 16, next two were
    GENL_ID_GENERATE and genl_generate_id returned 17 & 18
    (user level testing of same code shows expected values
    across entire range of MIN/MAX).

    Signed-off-by: Krishna Kumar
    Signed-off-by: David S. Miller

    Krishna Kumar
     
  • genl_register_family() doesn't need to call genl_family_find_byid
    when GENL_ID_GENERATE is passed during register.

    Patch on net-next-2.6, compile and reboot testing only.

    Signed-off-by: Krishna Kumar
    Signed-off-by: David S. Miller

    Krishna Kumar
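
The id search described in the first of the two commits above can be sketched roughly as below. This is a userspace illustration, not the kernel function: id_in_use[], mark_id_used() and generate_id() are hypothetical, and the range constants are assumptions chosen to match the ids reported in the test notes (16, 17, 18).

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed range for illustration; 16 matches the first id in the test notes. */
#define GENL_MIN_ID 16
#define GENL_MAX_ID 1023

/* Hypothetical registry standing in for the family lookup. */
static bool id_in_use[GENL_MAX_ID + 1];

void mark_id_used(uint16_t id)
{
    if (id <= GENL_MAX_ID)
        id_in_use[id] = true;
}

/*
 * Return a free id, or 0 if none is left. The search starts at
 * GENL_MIN_ID itself and visits each id in the range at most once,
 * wrapping from GENL_MAX_ID back to GENL_MIN_ID.
 */
uint16_t generate_id(void)
{
    static uint16_t next_id = GENL_MIN_ID;
    int i;

    for (i = 0; i <= GENL_MAX_ID - GENL_MIN_ID; i++) {
        if (!id_in_use[next_id])
            return next_id;
        if (++next_id > GENL_MAX_ID)
            next_id = GENL_MIN_ID;
    }
    return 0;
}
```

Bounding the loop by the size of the range, rather than by a second pass over it, is what guarantees a single traversal even when every id is taken.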
     

07 Oct, 2009

1 commit


01 Oct, 2009

1 commit

  • This provides safety against negative optlen at the type
    level instead of depending upon (sometimes non-trivial)
    checks against this sprinkled all over the place, in
    each and every implementation.

    Based upon work done by Arjan van de Ven and feedback
    from Linus Torvalds.

    Signed-off-by: David S. Miller

    David S. Miller
     

27 Sep, 2009

1 commit


25 Sep, 2009

1 commit

  • Similar to commit d136f1bd366fdb7e747ca7e0218171e7a00a98a5,
    there's a bug when unregistering a generic netlink family,
    which is caught by the might_sleep() added in that commit:

    BUG: sleeping function called from invalid context at net/netlink/af_netlink.c:183
    in_atomic(): 1, irqs_disabled(): 0, pid: 1510, name: rmmod
    2 locks held by rmmod/1510:
    #0: (genl_mutex){+.+.+.}, at: [] genl_unregister_family+0x2b/0x130
    #1: (rcu_read_lock){.+.+..}, at: [] __genl_unregister_mc_group+0x1c/0x120
    Pid: 1510, comm: rmmod Not tainted 2.6.31-wl #444
    Call Trace:
    [] __might_sleep+0x119/0x150
    [] netlink_table_grab+0x21/0x100
    [] netlink_clear_multicast_users+0x23/0x60
    [] __genl_unregister_mc_group+0x71/0x120
    [] genl_unregister_family+0x56/0x130
    [] nl80211_exit+0x15/0x20 [cfg80211]
    [] cfg80211_exit+0x1a/0x40 [cfg80211]

    Fix in the same way by grabbing the netlink table lock
    before doing rcu_read_lock().

    Signed-off-by: Johannes Berg
    Signed-off-by: David S. Miller

    Johannes Berg
     

22 Sep, 2009

1 commit

  • Sizing of memory allocations shouldn't depend on the number of physical
    pages found in a system, as that generally includes (perhaps a huge amount
    of) non-RAM pages. The amount of what actually is usable as storage
    should instead be used as a basis here.

    Some of the calculations (i.e. those not intending to use high memory)
    should likely even use (totalram_pages - totalhigh_pages).

    Signed-off-by: Jan Beulich
    Acked-by: Rusty Russell
    Acked-by: Ingo Molnar
    Cc: Dave Airlie
    Cc: Kyle McMartin
    Cc: Jeremy Fitzhardinge
    Cc: Pekka Enberg
    Cc: Hugh Dickins
    Cc: "David S. Miller"
    Cc: Patrick McHardy
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Beulich
     

15 Sep, 2009

1 commit

  • Since my commits introducing netns awareness into
    genetlink we can get this problem:

    BUG: scheduling while atomic: modprobe/1178/0x00000002
    2 locks held by modprobe/1178:
    #0: (genl_mutex){+.+.+.}, at: [] genl_register_mc_grou
    #1: (rcu_read_lock){.+.+..}, at: [] genl_register_mc_g
    Pid: 1178, comm: modprobe Not tainted 2.6.31-rc8-wl-34789-g95cb731-dirty #
    Call Trace:
    [] __schedule_bug+0x85/0x90
    [] schedule+0x108/0x588
    [] netlink_table_grab+0xa1/0xf0
    [] netlink_change_ngroups+0x47/0x100
    [] genl_register_mc_group+0x12f/0x290

    because I overlooked that netlink_table_grab() will
    schedule, thinking it was just the rwlock. However,
    in the contention case, that isn't actually true.

    Fix this by letting the code grab the netlink table
    lock first and then the RCU for netns protection.

    Signed-off-by: Johannes Berg
    Signed-off-by: David S. Miller

    Johannes Berg
     

11 Sep, 2009

1 commit


05 Sep, 2009

1 commit

  • CC net/netlink/genetlink.o
    net/netlink/genetlink.c: In function ‘genl_register_mc_group’:
    net/netlink/genetlink.c:139: warning: ‘err’ may be used uninitialized in this function

    From following the code 'err' is initialized, but set it to zero to
    silence the warning.

    Signed-off-by: Brian Haley
    Signed-off-by: David S. Miller

    Brian Haley
     

25 Aug, 2009

1 commit


15 Jul, 2009

1 commit

  • Wireless extensions have the unfortunate problem that events
    are multicast netlink messages, and are not independent of
    pointer size. Thus, currently 32-bit tasks on 64-bit platforms
    cannot properly receive events and fail with all kinds of
    strange problems, for instance wpa_supplicant never notices
    disassociations, due to the way the 64-bit event looks (to a
    32-bit process), the fact that the address is all zeroes is
    lost, it thinks instead it is 00:00:00:00:01:00.

    The same problem existed with the ioctls, until David Miller
    fixed those some time ago in a heroic effort.

    A different problem caused by this is that we cannot send the
    ASSOCREQIE/ASSOCRESPIE events because sending them causes a
    32-bit wpa_supplicant on a 64-bit system to overwrite its
    internal information, which is worse than it not getting the
    information at all -- so we currently resort to sending a
    custom string event that it then parses. This, however, has a
    severe size limitation we are frequently hitting with modern
    access points; this limitation can be lifted after this
    patch by sending the correct binary, not custom, event.

    A similar problem apparently happens for some other netlink
    users on x86_64 with 32-bit tasks due to the alignment for
    64-bit quantities.

    In order to fix these problems, I have implemented a way to
    send compat messages to tasks. When sending an event, we send
    the non-compat event data together with a compat event data in
    skb_shinfo(main_skb)->frag_list. Then, when the event is read
    from the socket, the netlink code makes sure to pass out only
    the skb that is compatible with the task. This approach was
    suggested by David Miller, my original approach required
    always sending two skbs but that had various small problems.

    To determine whether compat is needed or not, I have used the
    MSG_CMSG_COMPAT flag, and adjusted the call path for recv and
    recvfrom to include it, even if those calls do not have a cmsg
    parameter.

    I have not solved one small part of the problem, and I don't
    think it is necessary to: if a 32-bit application uses read()
    rather than any form of recvmsg() it will still get the wrong
    (64-bit) event. However, neither do applications actually do
    this, nor would it be a regression.

    Signed-off-by: Johannes Berg
    Signed-off-by: David S. Miller

    Johannes Berg
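
The selection step described above (native event carried in the main buffer, compat event attached to it, reader gets whichever matches) can be sketched in userspace like this. struct msg, its compat pointer, FAKE_MSG_COMPAT and pick_for_reader() are hypothetical names for illustration; the compat pointer plays the role of frag_list and the flag plays the role of MSG_CMSG_COMPAT.

```c
#include <stddef.h>

/* Hypothetical flag standing in for MSG_CMSG_COMPAT. */
#define FAKE_MSG_COMPAT 0x1

/* Hypothetical message: 'compat' plays the role of the frag_list entry
 * carrying the 32-bit layout of the same event. */
struct msg {
    const char *payload;
    const struct msg *compat;
};

/*
 * Pick the representation that matches the reading task: the attached
 * compat message for a compat reader when one exists, else the main one.
 */
const struct msg *pick_for_reader(const struct msg *main_msg, int flags)
{
    if ((flags & FAKE_MSG_COMPAT) && main_msg->compat)
        return main_msg->compat;
    return main_msg;
}
```

Messages that never had a compat variant attached are handed out unchanged, so senders that do not care about layout differences need no modification.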
     

13 Jul, 2009

3 commits

  • This makes generic netlink network namespace aware. No
    generic netlink families except for the controller family
    are made namespace aware, they need to be checked one by
    one and then set the family->netnsok member to true.

    A new function genlmsg_multicast_netns() is introduced to
    allow sending a multicast message in a given namespace,
    for example when it applies to an object that lives in
    that namespace, along with a new function
    genlmsg_multicast_allns() to send a message to all network
    namespaces (for objects that do not have an associated netns).

    The function genlmsg_multicast() is changed to multicast
    the message in just init_net, which is currently correct
    for all generic netlink families since they only work in
    init_net right now. Some will later want to work in all
    net namespaces because they do not care about the netns
    at all -- those will have to be converted to use one of
    the new functions genlmsg_multicast_allns() or
    genlmsg_multicast_netns() whenever they are made netns
    aware in some way.

    After this patch families can easily decide whether or
    not they should be available in all net namespaces. Many
    genl families use it for objects not related to networking
    and should therefore be available in all namespaces, but
    that will have to be done on a per family basis.

    Note that this doesn't touch on the checkpoint/restart
    problem where network namespaces could be used, genl
    families and multicast groups are numbered globally and
    I see no easy way of changing that, especially since it
    must be possible to multicast to all network namespaces
    for those families that do not care about netns.

    Signed-off-by: Johannes Berg
    Signed-off-by: David S. Miller

    Johannes Berg
     
  • For the network namespace work in generic netlink I need
    to be able to call this function under rcu_read_lock(),
    otherwise the locking becomes a nightmare and more locks
    would be needed. Instead, just embed a struct rcu_head
    (actually a struct listeners_rcu_head that also carries
    the pointer to the memory block) into the listeners
    memory so we can use call_rcu() instead of synchronising
    and then freeing. No rcu_barrier() is needed since this
    code cannot be modular.

    Signed-off-by: Johannes Berg
    Signed-off-by: David S. Miller

    Johannes Berg
     
  • I added those myself in commits b4ff4f04 and 84659eb5,
    but I see no reason now why they should be exported,
    only generic netlink uses them which cannot be modular.

    Signed-off-by: Johannes Berg
    Signed-off-by: David S. Miller

    Johannes Berg