02 Apr, 2010

1 commit

  • check the length of the socket address passed to connect(2).

    Check the length of the socket address passed to connect(2). If the
    length is invalid, -EINVAL will be returned.
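
    The check described above can be sketched in userspace C. This is only
    an illustration: check_connect_addr_len() is a hypothetical helper, and
    comparing against the size of the address-family field is an assumption,
    since each protocol touched by the patch may require a different minimum
    length.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>

/*
 * Hypothetical helper, not the kernel's code: reject an address buffer
 * that is too short to hold even the address-family field.
 */
static int check_connect_addr_len(const struct sockaddr *addr, int addr_len)
{
    if (addr == NULL || addr_len < (int)sizeof(addr->sa_family))
        return -EINVAL;   /* invalid length: report it to the caller */
    return 0;             /* long enough to inspect sa_family safely */
}
```

    A connect handler would run such a check before reading any field of
    the address, so that a short buffer is never dereferenced.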

    Signed-off-by: Changli Gao
    ----
    net/bluetooth/l2cap.c | 3 ++-
    net/bluetooth/rfcomm/sock.c | 3 ++-
    net/bluetooth/sco.c | 3 ++-
    net/can/bcm.c | 3 +++
    net/ieee802154/af_ieee802154.c | 3 +++
    net/ipv4/af_inet.c | 5 +++++
    net/netlink/af_netlink.c | 3 +++
    7 files changed, 20 insertions(+), 3 deletions(-)
    Signed-off-by: David S. Miller

    Changli Gao
     

21 Mar, 2010

1 commit

  • Currently, ENOBUFS errors are reported to the socket via
    netlink_set_err() even if NETLINK_RECV_NO_ENOBUFS is set. However,
    that should not happen. This patch fixes the problem and changes the
    prototype of netlink_set_err() to return the number of sockets that
    have set the NETLINK_RECV_NO_ENOBUFS socket option. This return
    value is used by the next patch in this bugfix series.

    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: David S. Miller

    Pablo Neira Ayuso
     

28 Feb, 2010

1 commit

  • The Inode field in /proc/net/{tcp,udp,packet,raw,...} is useful for identifying the types of
    file descriptors associated with a process; the lsof utility, for one, uses the field.
    Unfortunately, unlike /proc/net/{tcp,udp,packet,raw,...}, /proc/net/netlink lacks the field.
    This patch adds the field to /proc/net/netlink.

    Signed-off-by: Masatake YAMATO
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Masatake YAMATO
     

04 Feb, 2010

1 commit

  • Netlink code autoloads a module if the protocol userspace is asking for is
    not ready. However, the module can disappear right after it was autoloaded.
    Example: modprobe/rmmod stress-testing and xfrm_user.ko providing NETLINK_XFRM.

    In such a situation netlink_create() _will_ create the userspace socket and
    _will_not_ pin the module. If the module is then removed, we end up calling
    ->netlink_rcv into nothing:

    BUG: unable to handle kernel paging request at ffffffffa02f842a
                                                   ^^^^^^^^^^^^^^^^
                    modules are loaded near these addresses here

    IP: [] 0xffffffffa02f842a
    PGD 161f067 PUD 1623063 PMD baa12067 PTE 0
    Oops: 0010 [#1] PREEMPT SMP DEBUG_PAGEALLOC
    last sysfs file: /sys/devices/pci0000:00/0000:00:1f.2/host0/target0:0:0/0:0:0:0/block/sda/uevent
    CPU 1
    Pid: 11515, comm: ip Not tainted 2.6.33-rc5-netns-00594-gaaa5728-dirty #6 P5E/P5E
    RIP: 0010:[] [] 0xffffffffa02f842a
    RSP: 0018:ffff8800baa3db48 EFLAGS: 00010292
    RAX: ffff8800baa3dfd8 RBX: ffff8800be353640 RCX: 0000000000000000
    RDX: ffffffff81959380 RSI: ffff8800bab7f130 RDI: 0000000000000001
    RBP: ffff8800baa3db58 R08: 0000000000000001 R09: 0000000000000000
    R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000000011
    R13: ffff8800be353640 R14: ffff8800bcdec240 R15: ffff8800bd488010
    FS: 00007f93749656f0(0000) GS:ffff880002300000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
    CR2: ffffffffa02f842a CR3: 00000000ba82b000 CR4: 00000000000006e0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
    Process ip (pid: 11515, threadinfo ffff8800baa3c000, task ffff8800bab7eb30)
    Stack:
    ffffffff813637c0 ffff8800bd488000 ffff8800baa3dba8 ffffffff8136397d
    0000000000000000 ffffffff81344adc 7fffffffffffffff 0000000000000000
    ffff8800baa3ded8 ffff8800be353640 ffff8800bcdec240 0000000000000000
    Call Trace:
    [] ? netlink_unicast+0x100/0x2d0
    [] netlink_unicast+0x2bd/0x2d0

    netlink_unicast_kernel:
    nlk->netlink_rcv(skb);

    [] ? memcpy_fromiovec+0x6c/0x90
    [] netlink_sendmsg+0x1d3/0x2d0
    [] sock_sendmsg+0xbb/0xf0
    [] ? __lock_acquire+0x27b/0xa60
    [] ? might_fault+0x73/0xd0
    [] ? might_fault+0x73/0xd0
    [] ? __lock_release+0x82/0x170
    [] ? might_fault+0xbe/0xd0
    [] ? might_fault+0x73/0xd0
    [] ? verify_iovec+0x47/0xd0
    [] sys_sendmsg+0x1a9/0x360
    [] ? _raw_spin_unlock_irqrestore+0x65/0x70
    [] ? trace_hardirqs_on+0xd/0x10
    [] ? _raw_spin_unlock_irqrestore+0x42/0x70
    [] ? __up_read+0x84/0xb0
    [] ? trace_hardirqs_on_caller+0x145/0x190
    [] ? trace_hardirqs_on_thunk+0x3a/0x3f
    [] system_call_fastpath+0x16/0x1b
    Code: Bad RIP value.
    RIP [] 0xffffffffa02f842a
    RSP
    CR2: ffffffffa02f842a

    Return -EPROTONOSUPPORT if the module was quickly removed after autoloading.
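
    The idea of the fix can be modelled in userspace as a re-check after
    autoloading. Everything below is a hypothetical sketch: proto_table,
    sketch_create() and the two loader callbacks are invented for
    illustration, and the real code also needs locking, which is omitted.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_MAX_PROTO 32

/* Hypothetical model of a protocol table entry. */
struct proto_entry {
    bool registered;   /* provider module currently present */
    int  module_refs;  /* references pinning the provider */
};

static struct proto_entry proto_table[SKETCH_MAX_PROTO];

/* Stand-in for module autoloading. */
typedef void (*autoload_fn)(int protocol);

/* Loader whose module stays loaded. */
static void load_and_stay(int protocol)
{
    proto_table[protocol].registered = true;
}

/* Loader whose module is removed again before the caller looks. */
static void load_then_vanish(int protocol)
{
    proto_table[protocol].registered = true;
    proto_table[protocol].registered = false;  /* simulated rmmod */
}

static int sketch_create(int protocol, autoload_fn autoload)
{
    if (protocol < 0 || protocol >= SKETCH_MAX_PROTO)
        return -EPROTONOSUPPORT;

    if (!proto_table[protocol].registered && autoload != NULL)
        autoload(protocol);

    /* Re-check after autoloading: the module may already be gone. */
    if (!proto_table[protocol].registered)
        return -EPROTONOSUPPORT;

    proto_table[protocol].module_refs++;  /* pin the provider */
    return 0;
}
```

    The point of the sketch is that a socket is only created once the
    provider has been found and pinned; a loader that "succeeded" is not
    taken as proof that the module is still there.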

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: David S. Miller

    Alexey Dobriyan
     

26 Nov, 2009

1 commit

  • Generated with the following semantic patch

    @@
    struct net *n1;
    struct net *n2;
    @@
    - n1 == n2
    + net_eq(n1, n2)

    @@
    struct net *n1;
    struct net *n2;
    @@
    - n1 != n2
    + !net_eq(n1, n2)

    applied over {include,net,drivers/net}.
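
    For readers unfamiliar with the helper, the sketch below shows what a
    converted call site looks like. It is a userspace approximation: struct
    net is reduced to a placeholder, SKETCH_NET_NS_DISABLED is a made-up
    switch, and same_namespace() is a hypothetical caller. The assumption
    is that net_eq() is pointer identity when namespaces are enabled and
    always true when there is only one namespace.

```c
#include <stdbool.h>

/* Placeholder; the real struct net is far larger. */
struct net {
    int id;
};

#ifdef SKETCH_NET_NS_DISABLED
/* Only one namespace can exist, so any two are "equal". */
static inline bool net_eq(const struct net *a, const struct net *b)
{
    (void)a;
    (void)b;
    return true;
}
#else
/* With namespaces enabled, equality is pointer identity. */
static inline bool net_eq(const struct net *a, const struct net *b)
{
    return a == b;
}
#endif

/* Hypothetical call site after the semantic patch has been applied. */
static bool same_namespace(const struct net *sock_net,
                           const struct net *dev_net)
{
    return net_eq(sock_net, dev_net);   /* was: sock_net == dev_net */
}
```

    Routing every comparison through one helper lets the compiler drop the
    test entirely in configurations where it can never be false.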

    Signed-off-by: Octavian Purdila
    Signed-off-by: David S. Miller

    Octavian Purdila
     

17 Nov, 2009

1 commit

  • The netlink URELEASE notifier doesn't notify for
    sockets that have been used to receive multicast
    but it should be called for such sockets as well
    since they might _also_ be used for sending and
    not solely for receiving multicast. We will need
    that for nl80211 (generic netlink sockets) in the
    future.

    Signed-off-by: Johannes Berg
    Cc: Patrick McHardy
    Signed-off-by: David S. Miller

    Johannes Berg
     

11 Nov, 2009

1 commit


06 Nov, 2009

1 commit

  • The generic __sock_create function has a kern argument which allows the
    security system to make decisions based on whether a socket is being created by
    the kernel or by userspace. This patch passes that flag to the
    net_proto_family specific create function, so it can do the same thing.

    Signed-off-by: Eric Paris
    Acked-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Eric Paris
     

07 Oct, 2009

1 commit


01 Oct, 2009

1 commit

  • This provides safety against negative optlen at the type
    level instead of depending upon (sometimes non-trivial)
    checks against this sprinkled all over the place, in
    each and every implementation.
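
    A small userspace sketch of the difference, with hypothetical handler
    names (set_opt_signed, set_opt_unsigned) that do not appear in the
    patch: with a signed length every handler must remember the negative
    check, while with an unsigned length a single lower-bound comparison
    is enough.

```c
#include <errno.h>
#include <string.h>

/* Old style: optlen is signed, so each handler needs its own guard. */
static int set_opt_signed(int *dst, const void *optval, int optlen)
{
    if (optlen < 0 || (size_t)optlen < sizeof(int))
        return -EINVAL;
    memcpy(dst, optval, sizeof(int));
    return 0;
}

/* New style: a negative length cannot be expressed in the type. */
static int set_opt_unsigned(int *dst, const void *optval,
                            unsigned int optlen)
{
    if (optlen < sizeof(int))
        return -EINVAL;
    memcpy(dst, optval, sizeof(int));
    return 0;
}
```

    If the explicit optlen < 0 test in the signed variant were forgotten,
    a later size computation could go wrong; the unsigned variant has no
    such check to forget.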

    Based upon work done by Arjan van de Ven and feedback
    from Linus Torvalds.

    Signed-off-by: David S. Miller

    David S. Miller
     

27 Sep, 2009

1 commit


25 Sep, 2009

1 commit

  • Similar to commit d136f1bd366fdb7e747ca7e0218171e7a00a98a5,
    there's a bug when unregistering a generic netlink family,
    which is caught by the might_sleep() added in that commit:

    BUG: sleeping function called from invalid context at net/netlink/af_netlink.c:183
    in_atomic(): 1, irqs_disabled(): 0, pid: 1510, name: rmmod
    2 locks held by rmmod/1510:
    #0: (genl_mutex){+.+.+.}, at: [] genl_unregister_family+0x2b/0x130
    #1: (rcu_read_lock){.+.+..}, at: [] __genl_unregister_mc_group+0x1c/0x120
    Pid: 1510, comm: rmmod Not tainted 2.6.31-wl #444
    Call Trace:
    [] __might_sleep+0x119/0x150
    [] netlink_table_grab+0x21/0x100
    [] netlink_clear_multicast_users+0x23/0x60
    [] __genl_unregister_mc_group+0x71/0x120
    [] genl_unregister_family+0x56/0x130
    [] nl80211_exit+0x15/0x20 [cfg80211]
    [] cfg80211_exit+0x1a/0x40 [cfg80211]

    Fix in the same way by grabbing the netlink table lock
    before doing rcu_read_lock().

    Signed-off-by: Johannes Berg
    Signed-off-by: David S. Miller

    Johannes Berg
     

22 Sep, 2009

1 commit

  • Sizing of memory allocations shouldn't depend on the number of physical
    pages found in a system, as that generally includes (perhaps a huge amount
    of) non-RAM pages. The amount of what actually is usable as storage
    should instead be used as a basis here.

    Some of the calculations (i.e. those not intending to use high memory)
    should likely even use (totalram_pages - totalhigh_pages).

    Signed-off-by: Jan Beulich
    Acked-by: Rusty Russell
    Acked-by: Ingo Molnar
    Cc: Dave Airlie
    Cc: Kyle McMartin
    Cc: Jeremy Fitzhardinge
    Cc: Pekka Enberg
    Cc: Hugh Dickins
    Cc: "David S. Miller"
    Cc: Patrick McHardy
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Beulich
     

15 Sep, 2009

1 commit

  • Since my commits introducing netns awareness into
    genetlink we can get this problem:

    BUG: scheduling while atomic: modprobe/1178/0x00000002
    2 locks held by modprobe/1178:
    #0: (genl_mutex){+.+.+.}, at: [] genl_register_mc_grou
    #1: (rcu_read_lock){.+.+..}, at: [] genl_register_mc_g
    Pid: 1178, comm: modprobe Not tainted 2.6.31-rc8-wl-34789-g95cb731-dirty #
    Call Trace:
    [] __schedule_bug+0x85/0x90
    [] schedule+0x108/0x588
    [] netlink_table_grab+0xa1/0xf0
    [] netlink_change_ngroups+0x47/0x100
    [] genl_register_mc_group+0x12f/0x290

    because I overlooked that netlink_table_grab() will
    schedule, thinking it was just the rwlock. However,
    in the contention case, that isn't actually true.

    Fix this by letting the code grab the netlink table
    lock first and then the RCU for netns protection.

    Signed-off-by: Johannes Berg
    Signed-off-by: David S. Miller

    Johannes Berg
     

25 Aug, 2009

1 commit


15 Jul, 2009

1 commit

  • Wireless extensions have the unfortunate problem that events
    are multicast netlink messages, and are not independent of
    pointer size. Thus, currently 32-bit tasks on 64-bit platforms
    cannot properly receive events and fail with all kinds of
    strange problems. For instance, wpa_supplicant never notices
    disassociations: because of the way the 64-bit event looks to a
    32-bit process, the fact that the address is all zeroes is
    lost, and it thinks the address is 00:00:00:00:01:00 instead.

    The same problem existed with the ioctls, until David Miller
    fixed those some time ago in an heroic effort.

    A different problem caused by this is that we cannot send the
    ASSOCREQIE/ASSOCRESPIE events because sending them causes a
    32-bit wpa_supplicant on a 64-bit system to overwrite its
    internal information, which is worse than it not getting the
    information at all -- so we currently resort to sending a
    custom string event that it then parses. This, however, has a
    severe size limitation we are frequently hitting with modern
    access points; this limitation can be lifted after this
    patch by sending the correct binary, not custom, event.

    A similar problem apparently happens for some other netlink
    users on x86_64 with 32-bit tasks due to the alignment for
    64-bit quantities.

    In order to fix these problems, I have implemented a way to
    send compat messages to tasks. When sending an event, we send
    the non-compat event data together with a compat event data in
    skb_shinfo(main_skb)->frag_list. Then, when the event is read
    from the socket, the netlink code makes sure to pass out only
    the skb that is compatible with the task. This approach was
    suggested by David Miller; my original approach required
    always sending two skbs, but that had various small problems.

    To determine whether compat is needed or not, I have used the
    MSG_CMSG_COMPAT flag, and adjusted the call path for recv and
    recvfrom to include it, even if those calls do not have a cmsg
    parameter.
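
    The selection step can be sketched in userspace as follows. This is a
    simplified model, not the netlink code: struct sketch_event stands in
    for an skb whose frag_list carries the compat variant,
    SKETCH_RECV_COMPAT is a hypothetical stand-in for MSG_CMSG_COMPAT, and
    sketch_pick_event() is an invented name.

```c
#include <stddef.h>

/* Hypothetical stand-in for the MSG_CMSG_COMPAT receive flag. */
#define SKETCH_RECV_COMPAT 0x1u

/* Simplified model of an event that may carry a compat twin. */
struct sketch_event {
    const void *data;
    size_t len;
    const struct sketch_event *compat;  /* NULL if no compat variant */
};

/* Hand out only the variant that matches the receiving task. */
static const struct sketch_event *
sketch_pick_event(const struct sketch_event *ev, unsigned int recv_flags)
{
    if ((recv_flags & SKETCH_RECV_COMPAT) && ev->compat != NULL)
        return ev->compat;
    return ev;
}
```

    Because the choice is made at receive time, the sender builds the
    message once and does not need to know which kind of task is
    listening.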

    I have not solved one small part of the problem, and I don't
    think it is necessary to: if a 32-bit application uses read()
    rather than any form of recvmsg() it will still get the wrong
    (64-bit) event. However, applications do not actually do
    this, and it would not be a regression either.

    Signed-off-by: Johannes Berg
    Signed-off-by: David S. Miller

    Johannes Berg
     

13 Jul, 2009

2 commits

  • For the network namespace work in generic netlink I need
    to be able to call this function under rcu_read_lock(),
    otherwise the locking becomes a nightmare and more locks
    would be needed. Instead, just embed a struct rcu_head
    (actually a struct listeners_rcu_head that also carries
    the pointer to the memory block) into the listeners
    memory so we can use call_rcu() instead of synchronising
    and then freeing. No rcu_barrier() is needed since this
    code cannot be modular.

    Signed-off-by: Johannes Berg
    Signed-off-by: David S. Miller

    Johannes Berg
     
  • I added those myself in commits b4ff4f04 and 84659eb5,
    but I see no reason now why they should be exported:
    only generic netlink uses them, and it cannot be modular.

    Signed-off-by: Johannes Berg
    Signed-off-by: David S. Miller

    Johannes Berg
     

18 Jun, 2009

1 commit

  • commit 2b85a34e911bf483c27cfdd124aeb1605145dc80
    (net: No more expensive sock_hold()/sock_put() on each tx)
    changed initial sk_wmem_alloc value.

    We need to take this offset into account when reporting
    sk_wmem_alloc to the user, in PROC_FS files or various
    ioctls (SIOCOUTQ/TIOCOUTQ).
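
    A minimal sketch of the accounting, assuming the initial value
    introduced by that commit is 1 (the message above only says the value
    changed). The struct and function names are hypothetical, and a plain
    int is used where the kernel uses an atomic counter.

```c
/* Assumption: the counter starts at 1 rather than 0. */
#define SKETCH_WMEM_BIAS 1

/* Hypothetical, heavily reduced socket. */
struct sketch_sock {
    int wmem_alloc;   /* bytes queued for transmit, plus the bias */
};

static void sketch_sock_init(struct sketch_sock *sk)
{
    sk->wmem_alloc = SKETCH_WMEM_BIAS;
}

/* Account for bytes handed to the transmit path. */
static void sketch_charge(struct sketch_sock *sk, int bytes)
{
    sk->wmem_alloc += bytes;
}

/* Value to report to userspace: subtract the bias again. */
static int sketch_wmem_alloc_get(const struct sketch_sock *sk)
{
    return sk->wmem_alloc - SKETCH_WMEM_BIAS;
}
```

    Without the subtraction, an idle socket would report one byte
    outstanding instead of zero.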

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

25 Mar, 2009

2 commits

  • This patch adds the NETLINK_NO_ENOBUFS socket flag. This flag can
    be used by unicast and broadcast listeners to avoid receiving
    ENOBUFS errors.

    Generally speaking, ENOBUFS errors are useful to notify two things
    to the listener:

    a) You may increase the receiver buffer size via setsockopt().
    b) You have lost messages, you may be out of sync.

    In some cases, ignoring ENOBUFS errors can be useful. For example:

    a) nfnetlink_queue: this subsystem does not have any sort of resync
    method and you can decide to ignore ENOBUFS once you have set a
    given buffer size.

    b) ctnetlink: you can use this together with the socket flag
    NETLINK_BROADCAST_SEND_ERROR to stop getting ENOBUFS errors as
    you do not need to resync (packets whose events are not delivered
    are dropped to provide reliable logging and state-synchronization).

    Moreover, the use of NETLINK_NO_ENOBUFS also reduces a "go up, go down"
    effect in terms of performance which is due to the netlink congestion
    control when the listener cannot back off. The effect is the following:

    1) throughput rate goes up and netlink messages are inserted in the
    receiver buffer.
    2) Then, netlink buffer fills and overruns (set on nlk->state bit 0).
    3) While the listener empties the receiver buffer, netlink keeps
    dropping messages. Thus, throughput goes dramatically down.
    4) Then, once the listener has emptied the buffer (nlk->state
    bit 0 is cleared), goto step 1.

    This effect is easy to trigger with netlink broadcast under heavy
    load, and it is more noticeable when using a big receiver buffer.
    You can find some results in [1] that show this problem.

    [1] http://1984.lsi.us.es/linux/netlink/

    This patch also includes the use of sk_drop to account for the number of
    netlink messages dropped due to overrun. This value is shown in
    /proc/net/netlink.
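
    From userspace, a listener would enable the flag with setsockopt().
    The helper below is a sketch: netlink_ignore_enobufs() is a
    hypothetical name, and the fallback definition of SOL_NETLINK is
    there only for C libraries that do not provide it.

```c
#include <errno.h>
#include <sys/socket.h>
#include <linux/netlink.h>

#ifndef SOL_NETLINK
#define SOL_NETLINK 270   /* fallback for older C library headers */
#endif

/*
 * Hypothetical helper: ask the kernel not to report ENOBUFS on this
 * netlink socket. Returns 0 on success or a negative errno value.
 */
static int netlink_ignore_enobufs(int fd)
{
    int on = 1;

    if (setsockopt(fd, SOL_NETLINK, NETLINK_NO_ENOBUFS,
                   &on, sizeof(on)) < 0)
        return -errno;
    return 0;
}
```

    A listener would typically call this once, right after creating the
    socket and setting its receive buffer size. Messages can still be
    dropped on overrun; the drop counter mentioned above remains the way
    to notice that.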

    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: David S. Miller

    Pablo Neira Ayuso
     
  • David S. Miller
     

23 Mar, 2009

1 commit


05 Mar, 2009

1 commit


04 Mar, 2009

1 commit

  • The callers of netlink_set_err() currently pass a negative value
    as parameter for the error code. However, sk->sk_err wants a
    positive error value. Without this patch, skb_recv_datagram() called
    by netlink_recvmsg() may return a positive value to report an error.

    Another choice to fix this is to change callers to pass a positive
    error value, but this seems a bit inconsistent and error prone
    to me. Indeed, the callers of netlink_set_err() assumed that the
    (usual) negative value for error codes was fine before this patch :).

    This patch also includes some documentation in docbook format
    for netlink_set_err() to avoid this sort of confusion.
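
    The sign convention can be illustrated with a reduced model. The names
    below are hypothetical, and the assumption is that the fix negates the
    caller's value once, inside the function, rather than at every call
    site.

```c
#include <errno.h>

/* Hypothetical, heavily reduced socket. */
struct sketch_sock {
    int sk_err;   /* pending error, stored as a positive errno */
};

/* Callers pass the usual negative code; store it as a positive value. */
static void sketch_set_err(struct sketch_sock *sk, int code)
{
    sk->sk_err = -code;
}

/* Receive path: report a pending error as a negative return value. */
static int sketch_recv(struct sketch_sock *sk)
{
    int err = sk->sk_err;

    sk->sk_err = 0;          /* the error is consumed once reported */
    return err ? -err : 0;
}
```

    If sketch_set_err() stored the negative value unchanged, sketch_recv()
    would hand back a positive number, which a caller could mistake for a
    byte count.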

    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: David S. Miller

    Pablo Neira Ayuso
     

27 Feb, 2009

1 commit


25 Feb, 2009

1 commit

  • This patch changes the return value of nlmsg_notify() as follows:

    If NETLINK_BROADCAST_ERROR is set by any of the listeners and
    an error in the delivery happened, return the broadcast error;
    else if there are no listeners apart from the socket that
    requested a change with the echo flag, return the result of the
    unicast notification. Thus, with this patch, the unicast
    notification is handled in the same way as a broadcast listener
    that has set the NETLINK_BROADCAST_ERROR socket flag.

    This patch is useful when the caller of nlmsg_notify()
    wants to know the result of the delivery of a netlink notification
    (including the broadcast delivery) and take action if
    the delivery failed. For example, ctnetlink can drop packets
    if the event delivery failed, providing reliable logging and
    state-synchronization at the cost of dropping packets.

    This patch also modifies the rtnetlink code to ignore the return
    value of rtnl_notify() in all callers. The function rtnl_notify()
    (before this patch) returned the error of the unicast notification,
    which makes rtnl_set_sk_err() report errors to all listeners. This
    is not of any help since the origin of the change (the socket that
    requested the echoing) notices the ENOBUFS error if the notification
    fails and should resync itself.

    Signed-off-by: Pablo Neira Ayuso
    Acked-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Pablo Neira Ayuso
     

20 Feb, 2009

1 commit

  • This patch adds NETLINK_BROADCAST_ERROR which is a netlink
    socket option that the listener can set to make netlink_broadcast()
    return errors in the delivery to the caller. This option is useful
    if the caller of netlink_broadcast() does something with the result
    of the message delivery, as in ctnetlink, where it drops a network
    packet if the event delivery failed; this is used to enable reliable
    logging and state-synchronization. If this socket option is not set,
    netlink_broadcast() only reports ESRCH errors and silently ignores
    ENOBUFS errors, which is what most netlink_broadcast() callers
    should do.

    This socket option is based on a suggestion from Patrick McHardy.
    Patrick McHardy can exchange this patch for a beer from me ;).

    Signed-off-by: Pablo Neira Ayuso
    Acked-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Pablo Neira Ayuso
     

06 Feb, 2009

1 commit

  • Currently, netlink_broadcast() reports errors to the caller if no
    messages at all were delivered:

    1) If at least one message has been delivered correctly, return 0.
    2) Otherwise, if no messages at all were delivered due to skb_clone()
    failure, return -ENOBUFS.
    3) Otherwise, if there are no listeners, return -ESRCH.

    With this patch, the caller knows if the delivery of any of the
    messages to the listeners has failed:

    1) If it fails to deliver any message (for whatever reason), return
    -ENOBUFS.
    2) Otherwise, if all messages were delivered OK, return 0.
    3) Otherwise, if no listeners, return -ESRCH.
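
    The two rule sets can be written as pure functions of how many
    listeners exist and how many deliveries succeeded. This is a sketch
    with hypothetical names; the real function derives these facts while
    walking the listener list.

```c
#include <errno.h>

/* Rules before the patch: any single success hides all failures. */
static int sketch_broadcast_result_old(unsigned int listeners,
                                       unsigned int delivered)
{
    if (delivered > 0)
        return 0;
    if (listeners > 0)
        return -ENOBUFS;   /* listeners exist but nothing got through */
    return -ESRCH;         /* nobody is listening */
}

/* Rules after the patch: any single failure is reported. */
static int sketch_broadcast_result_new(unsigned int listeners,
                                       unsigned int delivered)
{
    if (delivered < listeners)
        return -ENOBUFS;   /* at least one delivery failed */
    if (delivered > 0)
        return 0;          /* every listener got the message */
    return -ESRCH;         /* nobody is listening */
}
```

    The two versions differ only for partial delivery: with three
    listeners and two successful deliveries, the old rules return 0 and
    the new rules return -ENOBUFS.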

    In the current ctnetlink code and in Netfilter in general, we can add
    reliable logging and connection tracking event delivery by dropping the
    packets whose events were not successfully delivered over Netlink. Of
    course, this option would be settable via /proc as this approach reduces
    performance (in terms of filtered connections per second by a stateful
    firewall) but provides reliable logging and event delivery (for
    conntrackd) in return.

    This patch also changes some clients of netlink_broadcast() that
    may report ENOBUFS errors via printk. This error handling is not
    of any help. Instead, the userspace daemons that are listening to
    those netlink messages should resync themselves with the kernel-side
    if they hit ENOBUFS.

    BTW, netlink_broadcast() clients include those that call
    cn_netlink_send(), nlmsg_multicast() and genlmsg_multicast() since they
    internally call netlink_broadcast() and return its error value.

    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: David S. Miller

    Pablo Neira Ayuso
     

25 Nov, 2008

1 commit


24 Nov, 2008

2 commits


17 Oct, 2008

1 commit


14 Oct, 2008

1 commit

  • Clean up the various different email addresses of mine listed in the code
    to a single current and valid address. As Dave says his network merges
    for 2.6.28 are now done, this seems a good point to send them in, where
    they won't risk disrupting real changes.

    Signed-off-by: Alan Cox
    Signed-off-by: David S. Miller

    Alan Cox
     

26 Jul, 2008

1 commit

  • Removes a legacy reinvent-the-wheel type of thing. The generic
    machinery integrates much better with automated debugging aids
    such as kerneloops.org (and others), and is unambiguous due to
    better naming. Non-intuitively, BUG_TRAP() is actually equal to
    WARN_ON() rather than BUG_ON(), though some might actually be
    promoted to BUG_ON(); I left that for the future.

    I could make at least one BUILD_BUG_ON conversion.

    Signed-off-by: Ilpo Järvinen
    Signed-off-by: David S. Miller

    Ilpo Järvinen
     

06 Jul, 2008

1 commit


02 Jul, 2008

1 commit


06 Jun, 2008

1 commit


28 Apr, 2008

1 commit

  • Previously I added sessionid output to all audit messages where it was
    available but we still didn't know the sessionid of the sender of
    netlink messages. This patch adds that information to netlink messages
    so we can audit who sent netlink messages.

    Signed-off-by: Eric Paris
    Signed-off-by: Al Viro

    Eric Paris
     

19 Apr, 2008

2 commits

  • …s/security-testing-2.6

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6:
    security: fix up documentation for security_module_enable
    Security: Introduce security= boot parameter
    Audit: Final renamings and cleanup
    SELinux: use new audit hooks, remove redundant exports
    Audit: internally use the new LSM audit hooks
    LSM/Audit: Introduce generic Audit LSM hooks
    SELinux: remove redundant exports
    Netlink: Use generic LSM hook
    Audit: use new LSM hooks instead of SELinux exports
    SELinux: setup new inode/ipc getsecid hooks
    LSM: Introduce inode_getsecid and ipc_getsecid hooks

    Linus Torvalds
     
  • Don't use SELinux exported selinux_get_task_sid symbol.
    Use the generic LSM equivalent instead.

    Signed-off-by: Casey Schaufler
    Signed-off-by: Ahmed S. Darwish
    Acked-by: James Morris
    Acked-by: David S. Miller
    Reviewed-by: Paul Moore

    Ahmed S. Darwish