23 Sep, 2010

1 commit


22 May, 2010

1 commit

  • When netlink sockets are used to convey data that is in a namespace,
    we need a way to select a subset of the listening sockets to deliver
    the packet to. For the network namespace we have been doing this
    by only transmitting packets in the correct network namespace.

    For data belonging to other namespaces, netlink_broadcast_filtered()
    provides a mechanism that allows us to examine each destination
    socket and decide whether we should transmit the specified packet
    to it.

    Signed-off-by: Eric W. Biederman
    Acked-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Eric W. Biederman
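The filter-callback idea can be illustrated with a small userspace model. This is a sketch, not kernel code: `struct listener`, `broadcast_filtered()` and `same_ns_filter()` are hypothetical names. The only part meant to mirror netlink_broadcast_filtered() is the convention that a non-zero return from the filter means "do not transmit to this socket".

```c
#include <stddef.h>

/* Hypothetical model of a listening socket. */
struct listener {
	int ns_id;      /* namespace the listening socket belongs to */
	int received;   /* number of packets delivered to it */
};

/* Non-zero return means: skip this destination socket. */
typedef int (*tx_filter_fn)(const struct listener *dst, void *data);

/* Deliver to every listener the filter does not reject.
 * Returns the number of sockets that received the packet. */
static int broadcast_filtered(struct listener *socks, size_t n,
			      tx_filter_fn filter, void *filter_data)
{
	int delivered = 0;

	for (size_t i = 0; i < n; i++) {
		if (filter && filter(&socks[i], filter_data))
			continue;	/* filter said: do not transmit */
		socks[i].received++;
		delivered++;
	}
	return delivered;
}

/* Example filter: reject sockets that live in a different namespace. */
static int same_ns_filter(const struct listener *dst, void *data)
{
	const int *ns_id = data;
	return dst->ns_id != *ns_id;
}
```

Passing a NULL filter delivers to every listener, which corresponds to an unfiltered broadcast.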
     

21 Mar, 2010

1 commit

  • Currently, ENOBUFS errors are reported to the socket via
    netlink_set_err() even if NETLINK_RECV_NO_ENOBUFS is set, which
    should not happen. This patch fixes the problem and changes the
    prototype of netlink_set_err() to return the number of sockets that
    have set the NETLINK_RECV_NO_ENOBUFS flag (through the
    NETLINK_NO_ENOBUFS socket option). This return value is used in the
    next patch of this bugfix series.

    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: David S. Miller

    Pablo Neira Ayuso
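A simplified userspace model of the fixed behaviour follows. `struct listener` and `set_err_all()` are hypothetical, and the `no_enobufs` field stands in for the NETLINK_RECV_NO_ENOBUFS flag. As an assumption of this sketch, only ENOBUFS is suppressed for opted-out listeners; any other error code is still reported to everyone.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical model of a listening socket. */
struct listener {
	int no_enobufs;  /* models the NETLINK_RECV_NO_ENOBUFS flag */
	int sk_err;      /* pending socket error, 0 if none */
};

/* Report `code` to every listener, except that ENOBUFS is not reported
 * to listeners that opted out. Returns how many listeners were skipped
 * because they set the flag, mirroring the new return value. */
static int set_err_all(struct listener *socks, size_t n, int code)
{
	int skipped = 0;

	for (size_t i = 0; i < n; i++) {
		if (code == ENOBUFS && socks[i].no_enobufs) {
			skipped++;
			continue;
		}
		socks[i].sk_err = code;
	}
	return skipped;
}
```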
     

05 Nov, 2009

1 commit

  • This cleanup patch puts struct/union/enum opening braces
    on the first line to ease grep games.

    struct something
    {

    becomes:

    struct something {

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

25 Sep, 2009

1 commit

  • Similar to commit d136f1bd366fdb7e747ca7e0218171e7a00a98a5,
    there's a bug when unregistering a generic netlink family,
    which is caught by the might_sleep() added in that commit:

    BUG: sleeping function called from invalid context at net/netlink/af_netlink.c:183
    in_atomic(): 1, irqs_disabled(): 0, pid: 1510, name: rmmod
    2 locks held by rmmod/1510:
    #0: (genl_mutex){+.+.+.}, at: [] genl_unregister_family+0x2b/0x130
    #1: (rcu_read_lock){.+.+..}, at: [] __genl_unregister_mc_group+0x1c/0x120
    Pid: 1510, comm: rmmod Not tainted 2.6.31-wl #444
    Call Trace:
    [] __might_sleep+0x119/0x150
    [] netlink_table_grab+0x21/0x100
    [] netlink_clear_multicast_users+0x23/0x60
    [] __genl_unregister_mc_group+0x71/0x120
    [] genl_unregister_family+0x56/0x130
    [] nl80211_exit+0x15/0x20 [cfg80211]
    [] cfg80211_exit+0x1a/0x40 [cfg80211]

    Fix in the same way by grabbing the netlink table lock
    before doing rcu_read_lock().

    Signed-off-by: Johannes Berg
    Signed-off-by: David S. Miller

    Johannes Berg
     

15 Sep, 2009

1 commit

  • Since my commits introducing netns awareness into
    genetlink we can get this problem:

    BUG: scheduling while atomic: modprobe/1178/0x00000002
    2 locks held by modprobe/1178:
    #0: (genl_mutex){+.+.+.}, at: [] genl_register_mc_grou
    #1: (rcu_read_lock){.+.+..}, at: [] genl_register_mc_g
    Pid: 1178, comm: modprobe Not tainted 2.6.31-rc8-wl-34789-g95cb731-dirty #
    Call Trace:
    [] __schedule_bug+0x85/0x90
    [] schedule+0x108/0x588
    [] netlink_table_grab+0xa1/0xf0
    [] netlink_change_ngroups+0x47/0x100
    [] genl_register_mc_group+0x12f/0x290

    because I overlooked that netlink_table_grab() can
    schedule; I thought it was just the rwlock, but in
    the contended case that is not true.

    Fix this by letting the code grab the netlink table
    lock first and then take the RCU read lock for netns
    protection.

    Signed-off-by: Johannes Berg
    Signed-off-by: David S. Miller

    Johannes Berg
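The lock-ordering rule behind this fix (and the similar unregister fix above) can be modelled in userspace. In this hypothetical sketch, `atomic_depth` stands in for being inside rcu_read_lock(), where sleeping is forbidden, and `model_table_grab()` stands in for netlink_table_grab(), which may schedule under contention. None of these names are kernel APIs.

```c
#include <stdbool.h>

static int atomic_depth;        /* > 0 means "inside an atomic section" */
static bool slept_in_atomic;    /* records a might_sleep() violation */

static void model_might_sleep(void)
{
	if (atomic_depth > 0)
		slept_in_atomic = true;	/* the kernel would print a BUG here */
}

static void model_rcu_read_lock(void)   { atomic_depth++; }
static void model_rcu_read_unlock(void) { atomic_depth--; }

/* Like netlink_table_grab(), this may block when the lock is contended. */
static void model_table_grab(void)      { model_might_sleep(); }
static void model_table_ungrab(void)    { }

/* Buggy order: enter the atomic section, then take a lock that may sleep. */
static void buggy_update(void)
{
	model_rcu_read_lock();
	model_table_grab();
	model_table_ungrab();
	model_rcu_read_unlock();
}

/* Fixed order: take the sleeping lock first, then enter the RCU section. */
static void fixed_update(void)
{
	model_table_grab();
	model_rcu_read_lock();
	model_rcu_read_unlock();
	model_table_ungrab();
}
```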
     

25 Aug, 2009

1 commit


25 Mar, 2009

1 commit

  • This patch adds the NETLINK_NO_ENOBUFS socket flag. This flag can
    be used by unicast and broadcast listeners to avoid receiving
    ENOBUFS errors.

    Generally speaking, ENOBUFS errors are useful to notify the
    listener of two things:

    a) You may increase the receiver buffer size via setsockopt().
    b) You have lost messages; you may be out of sync.

    In some cases, ignoring ENOBUFS errors can be useful. For example:

    a) nfnetlink_queue: this subsystem does not have any sort of resync
    method, so you can decide to ignore ENOBUFS once you have set a
    given buffer size.

    b) ctnetlink: you can use this together with the socket flag
    NETLINK_BROADCAST_SEND_ERROR to stop getting ENOBUFS errors, as
    you do not need to resync (packets whose events are not delivered
    are dropped to provide reliable logging and state-synchronization).

    Moreover, the use of NETLINK_NO_ENOBUFS also reduces a "go up, go down"
    effect in terms of performance which is due to the netlink congestion
    control when the listener cannot back off. The effect is the following:

    1) throughput rate goes up and netlink messages are inserted in the
    receiver buffer.
    2) Then, the netlink buffer fills and overruns (nlk->state bit 0 is set).
    3) While the listener empties the receiver buffer, netlink keeps
    dropping messages. Thus, throughput goes dramatically down.
    4) Then, once the listener has emptied the buffer (nlk->state
    bit 0 is cleared), go to step 1.

    This effect is easy to trigger with netlink broadcast under heavy
    load, and it is more noticeable when using a big receiver buffer.
    You can find some results in [1] that show this problem.

    [1] http://1984.lsi.us.es/linux/netlink/

    This patch also uses sk_drops to account for the number of
    netlink messages dropped due to overrun. This value is shown in
    /proc/net/netlink.

    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: David S. Miller

    Pablo Neira Ayuso
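The four-step "go up, go down" cycle described in this entry can be reproduced with a small simulation. It is a simplified model, not netlink code: the buffer is counted in messages rather than bytes, `congested` stands in for nlk->state bit 0, `dropped` for sk_drops, and every name is hypothetical.

```c
#include <stdbool.h>

struct rx_model {
	int capacity;       /* stands in for sk_rcvbuf */
	int queued;         /* messages waiting in the receive queue */
	bool congested;     /* stands in for nlk->state bit 0 */
	bool no_enobufs;    /* NETLINK_NO_ENOBUFS set by the listener */
	int delivered;
	int dropped;        /* stands in for sk_drops */
	int enobufs_errors;
};

static void rx_deliver(struct rx_model *m)
{
	if (m->queued < m->capacity && !m->congested) {
		m->queued++;		/* step 1: message fits, deliver it */
		m->delivered++;
		return;
	}
	if (!m->no_enobufs && !m->congested) {
		m->congested = true;	/* step 2: overrun, report ENOBUFS once */
		m->enobufs_errors++;
	}
	m->dropped++;			/* step 3: drop while full or congested */
}

static void rx_read(struct rx_model *m, int n)
{
	m->queued = (n > m->queued) ? 0 : m->queued - n;
	if (m->queued == 0)
		m->congested = false;	/* step 4: queue emptied, bit cleared */
}

/* Each round, `arrive` messages are sent, then the reader consumes `drain`. */
static void rx_simulate(struct rx_model *m, int rounds, int arrive, int drain)
{
	for (int r = 0; r < rounds; r++) {
		for (int i = 0; i < arrive; i++)
			rx_deliver(m);
		rx_read(m, drain);
	}
}
```

With a 10-message buffer, 3 arrivals and 2 reads per round over 20 rounds, the model without the flag reports one ENOBUFS error and then delivers nothing for four rounds while the queue drains; with the flag set it keeps delivering two messages per round.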
     

20 Feb, 2009

1 commit

  • This patch adds NETLINK_BROADCAST_ERROR which is a netlink
    socket option that the listener can set to make netlink_broadcast()
    return errors in the delivery to the caller. This option is useful
    if the caller of netlink_broadcast() does something with the result
    of the message delivery, like ctnetlink, which drops a network
    packet if the event delivery failed; this is used to enable reliable
    logging and state-synchronization. If this socket option is not set,
    netlink_broadcast() only reports ESRCH errors and silently ignores
    ENOBUFS errors, which is what most netlink_broadcast() callers
    should do.

    This socket option is based on a suggestion from Patrick McHardy.
    Patrick McHardy can exchange this patch for a beer from me ;).

    Signed-off-by: Pablo Neira Ayuso
    Acked-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Pablo Neira Ayuso
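From userspace, a listener could enable this option roughly as follows. This is a sketch assuming a Linux system with <linux/netlink.h>; NETLINK_ROUTE is used only as an example protocol (ctnetlink listeners would use NETLINK_NETFILTER), and `open_reliable_listener()` is a hypothetical helper. It optionally also sets NETLINK_NO_ENOBUFS, the option described in the later entry above.

```c
#include <sys/socket.h>
#include <linux/netlink.h>
#include <unistd.h>

#ifndef SOL_NETLINK
#define SOL_NETLINK 270   /* Linux value; some older libcs lack the macro */
#endif

/* Open a netlink socket that asks for broadcast delivery errors to be
 * reported to the sender (NETLINK_BROADCAST_ERROR) and, optionally,
 * suppresses ENOBUFS on this socket (NETLINK_NO_ENOBUFS).
 * Returns the socket descriptor, or -1 on failure. */
static int open_reliable_listener(int no_enobufs)
{
	int one = 1;
	int fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);

	if (fd < 0)
		return -1;
	if (setsockopt(fd, SOL_NETLINK, NETLINK_BROADCAST_ERROR,
		       &one, sizeof(one)) < 0)
		goto fail;
	if (no_enobufs &&
	    setsockopt(fd, SOL_NETLINK, NETLINK_NO_ENOBUFS,
		       &one, sizeof(one)) < 0)
		goto fail;
	return fd;
fail:
	close(fd);
	return -1;
}
```

A real listener would then bind() to the multicast groups it needs before reading events.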
     

20 Nov, 2008

1 commit


06 Jun, 2008

1 commit


28 Apr, 2008

1 commit

  • Previously I added sessionid output to all audit messages where it was
    available but we still didn't know the sessionid of the sender of
    netlink messages. This patch adds that information to netlink messages
    so we can audit who sent netlink messages.

    Signed-off-by: Eric Paris
    Signed-off-by: Al Viro

    Eric Paris
     

01 Feb, 2008

1 commit

  • Normally during a dump the key of the last dumped entry is used for
    continuation, but since the lock is dropped, that entry might be gone.
    In that case, fall back to the old counter-based N^2 behaviour. This
    means the dump will end up skipping some routes, which matches what
    FIB_HASH does.

    Signed-off-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Stephen Hemminger
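The continuation strategy can be sketched over a sorted array of keys. This is a hypothetical model (`struct dump_cursor`, `dump_resume_index()`), not the FIB code: it resumes right after the saved key when that key still exists, and otherwise falls back to skipping as many entries as were already emitted.

```c
#include <stddef.h>

struct dump_cursor {
	int last_key;    /* key of the last entry emitted (valid if count > 0) */
	size_t count;    /* number of entries emitted so far */
};

/* Return the index at which the next batch of the dump should start. */
static size_t dump_resume_index(const int *keys, size_t n,
				const struct dump_cursor *cur)
{
	if (cur->count == 0)
		return 0;	/* first batch: start at the beginning */
	for (size_t i = 0; i < n; i++)
		if (keys[i] == cur->last_key)
			return i + 1;	/* key still present: continue after it */
	/* Key lost while the lock was dropped: counter-based fallback. */
	return cur->count < n ? cur->count : n;
}
```

If the table was {10, 20, 30, 40, 50} and the dump stopped after 20, removing 20 before the next batch makes the fallback resume at 40, so 30 is never emitted; that is the skipping the commit message accepts.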
     

29 Jan, 2008

2 commits


07 Nov, 2007

1 commit

  • Commit ed6dcf4a in the history.git tree broke netlink_unicast timeouts
    by moving the schedule_timeout() call to a new function that doesn't
    propagate the remaining timeout back to the caller. This means on each
    retry we start with the full timeout again.

    ipc/mqueue.c seems to actually want to wait indefinitely so this
    behaviour is retained.

    Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Patrick McHardy
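The fix amounts to carrying the remaining timeout across retries instead of restarting from the full value. The sketch below models this in userspace; `model_wait()` stands in for schedule_timeout(), which returns the time left, and both function names are hypothetical.

```c
#include <errno.h>

/* Each wait consumes `per_wait` units and returns what is left. */
static long model_wait(long timeo, long per_wait)
{
	return timeo > per_wait ? timeo - per_wait : 0;
}

/* Wait until buffer space frees up after `ready_after` waits. The timeout
 * is passed by pointer, so every retry continues from the remaining
 * budget rather than from the caller's original value. */
static int send_with_timeout(long *timeo, long per_wait, int ready_after)
{
	for (int attempt = 0; attempt < ready_after; attempt++) {
		if (*timeo == 0)
			return -EAGAIN;	/* budget exhausted */
		*timeo = model_wait(*timeo, per_wait);
	}
	return 0;
}
```

Had the timeout been passed by value and reset on every retry, the second case in the test below would never give up.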
     

11 Oct, 2007

4 commits

  • This patch makes the processing of netlink user -> kernel messages
    synchronous. This change was inspired by a talk with Alexey Kuznetsov
    about current netlink message processing. He says that he was badly
    wrong when he introduced asynchronous user -> kernel communication.

    netlink_unicast() is the only path for sending a message to a kernel
    netlink socket. Unfortunately, it is also used to send data to the
    user.

    Before this change, the user message was attached to the socket queue
    and sk->sk_data_ready was called. The process was blocked until all
    pending messages were processed. The bad thing is that this processing
    could occur in an arbitrary process context.

    This patch changes the nlk->data_ready callback to take a single skb
    and forces packet processing right inside netlink_unicast().

    The kernel -> user path in netlink_unicast() remains untouched.

    EINTR processing in netlink_run_queue was changed. It forces an
    rtnl_lock drop, but the process remains in the loop until the message
    is fully processed, so there is no need for these kludges now.

    Signed-off-by: Denis V. Lunev
    Acked-by: Alexey Kuznetsov
    Signed-off-by: David S. Miller

    Denis V. Lunev
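The difference between the old queued path and the new synchronous one can be shown with a tiny model. In this hypothetical sketch (`struct kernel_sock`, `unicast_to_kernel()`, `count_input()`), the kernel socket's input callback runs directly inside the send call, so the message has been processed by the time the sender returns, in the sender's own context.

```c
#include <stddef.h>

struct msg { int type; };

struct kernel_sock {
	void (*input)(struct kernel_sock *ks, const struct msg *m);
	int processed;
};

/* Synchronous user -> kernel delivery: no queueing, no data_ready wakeup. */
static int unicast_to_kernel(struct kernel_sock *ks, const struct msg *m)
{
	if (ks->input == NULL)
		return -1;
	ks->input(ks, m);	/* processed here, before we return */
	return 0;
}

/* Example input handler that just counts messages. */
static void count_input(struct kernel_sock *ks, const struct msg *m)
{
	(void)m;
	ks->processed++;
}
```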
     
  • netlink_sendskb does not use its third argument. Remove it and save a
    couple of bytes.

    Signed-off-by: Denis V. Lunev
    Acked-by: Alexey Kuznetsov
    Signed-off-by: David S. Miller

    Denis V. Lunev
     
  • This change allows the generic attribute interface to be used within
    the netfilter subsystem where this flag was initially introduced.

    The byte-order flag is not used yet; its intended use is to
    allow automatic byte-order conversions for all atomic types.

    Signed-off-by: Thomas Graf
    Signed-off-by: David S. Miller

    Thomas Graf
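The nested and byte-order flags live in the top two bits of an attribute's 16-bit type field, so a receiver masks them off to get the attribute type proper. The values below match NLA_F_NESTED, NLA_F_NET_BYTEORDER and NLA_TYPE_MASK in <linux/netlink.h> and are only defined here if missing; the helper functions are hypothetical.

```c
#include <stdint.h>

#ifndef NLA_F_NESTED
#define NLA_F_NESTED        (1 << 15)
#endif
#ifndef NLA_F_NET_BYTEORDER
#define NLA_F_NET_BYTEORDER (1 << 14)
#endif
#ifndef NLA_TYPE_MASK
#define NLA_TYPE_MASK       ~(NLA_F_NESTED | NLA_F_NET_BYTEORDER)
#endif

/* Strip the flag bits to recover the attribute type. */
static uint16_t attr_type(uint16_t nla_type)
{
	return nla_type & NLA_TYPE_MASK;
}

static int attr_is_nested(uint16_t nla_type)
{
	return (nla_type & NLA_F_NESTED) != 0;
}

static int attr_is_net_byteorder(uint16_t nla_type)
{
	return (nla_type & NLA_F_NET_BYTEORDER) != 0;
}
```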
     
  • Each netlink socket will live in exactly one network namespace;
    this includes the controlling kernel sockets.

    This patch updates all of the existing netlink protocols
    to only support the initial network namespace. Requests
    by clients in other namespaces will get -ECONNREFUSED,
    as they would if the kernel did not have support for
    that netlink protocol compiled in.

    As each netlink protocol is updated to be multiple network
    namespace safe it can register multiple kernel sockets
    to acquire a presence in the rest of the network namespaces.

    The implementation in af_netlink is a simple filter applied
    at hash table insertion and lookup time.

    Signed-off-by: Eric W. Biederman
    Signed-off-by: David S. Miller

    Eric W. Biederman
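The "filter at insertion and lookup time" approach can be sketched by keying sockets on both namespace and port ID. This hypothetical model uses a linear table in place of the real hash table; `struct nl_entry`, `nl_lookup()` and `nl_can_insert()` are illustrative names only.

```c
#include <stddef.h>
#include <stdint.h>

struct nl_entry {
	int net;          /* network namespace identifier */
	uint32_t portid;  /* netlink port ID */
};

/* A lookup only matches sockets in the caller's namespace. */
static const struct nl_entry *nl_lookup(const struct nl_entry *tab, size_t n,
					int net, uint32_t portid)
{
	for (size_t i = 0; i < n; i++)
		if (tab[i].net == net && tab[i].portid == portid)
			return &tab[i];
	return NULL;
}

/* Insertion applies the same filter: a port ID may be reused in another
 * namespace, but not twice within one namespace. */
static int nl_can_insert(const struct nl_entry *tab, size_t n,
			 int net, uint32_t portid)
{
	return nl_lookup(tab, n, net, portid) == NULL;
}
```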
     

19 Jul, 2007

2 commits


06 May, 2007

1 commit

  • People treating the *_pid fields in netlink as a process ID has caused
    endless confusion over the years. The fact that our own netlink.h
    does this only adds to the confusion.

    So here is a patch that changes the comments to refer to it as the
    port ID, which will hopefully make the real purpose of the fields
    clear.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
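The port ID is easiest to see from userspace: bind with nl_pid set to 0, let the kernel choose, and read the result back. The value identifies the socket, not the process; it may happen to equal the process ID for a process's first netlink socket, which is a source of the confusion mentioned here. This sketch assumes Linux; NETLINK_ROUTE is just an example protocol and `bound_port_id()` is a hypothetical helper.

```c
#include <string.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <unistd.h>

/* Returns 0 and stores the kernel-assigned port ID, or -1 on failure. */
static int bound_port_id(unsigned int *portid)
{
	struct sockaddr_nl addr;
	socklen_t len = sizeof(addr);
	int fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);

	if (fd < 0)
		return -1;
	memset(&addr, 0, sizeof(addr));
	addr.nl_family = AF_NETLINK;
	addr.nl_pid = 0;	/* 0: let the kernel assign a port ID */
	if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
	    getsockname(fd, (struct sockaddr *)&addr, &len) < 0) {
		close(fd);
		return -1;
	}
	*portid = addr.nl_pid;	/* a port ID, not necessarily a PID */
	close(fd);
	return 0;
}
```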
     

26 Apr, 2007

5 commits


13 Feb, 2007

1 commit

  • This is the transport code for public key functionality in eCryptfs. It
    manages encryption/decryption request queues with a transport mechanism.
    Currently, netlink is the only implemented transport.

    Each inode has a unique File Encryption Key (FEK). In passphrase mode, a File
    Encryption Key Encryption Key (FEKEK) is generated from a salt/passphrase
    combination at mount time. This FEKEK is used to encrypt each FEK, which is
    written into the header of each file using the packet format specified in
    RFC 2440. This is all symmetric key encryption, so it can all be done via the
    kernel crypto API.

    These new patches introduce public key encryption of the FEK. There is no
    asymmetric key encryption support in the kernel crypto API, so eCryptfs pushes
    the FEK encryption and decryption out to a userspace daemon. After
    considering our requirements and determining the complexity of using various
    transport mechanisms, we settled on netlink for this communication.

    eCryptfs stores authentication tokens into the kernel keyring. These tokens
    correlate with individual keys. For passphrase mode of operation, the
    authentication token contains the symmetric FEKEK. For public key, the
    authentication token contains a PKI type and an opaque data blob managed by
    individual PKI modules in userspace.

    Each user who opens a file under an eCryptfs partition mounted in public key
    mode must be running a daemon. That daemon has the user's credentials and has
    access to all of the keys to which the user should have access. The daemon,
    when started, initializes the pluggable PKI modules available on the system
    and registers itself with the eCryptfs kernel module. Userspace utilities
    register public key authentication tokens into the user session keyring.
    These authentication tokens correlate key signatures with PKI modules and PKI
    blobs. The PKI blobs contain PKI-specific information necessary for the PKI
    module to carry out asymmetric key encryption and decryption.

    When the eCryptfs module parses the header of an existing file and finds a Tag
    1 (Public Key) packet (see RFC 2440), it reads in the public key identifier
    (signature). The asymmetrically encrypted FEK is in the Tag 1 packet;
    eCryptfs puts together a decrypt request packet containing the signature and
    the encrypted FEK, then it passes it to the daemon registered for the
    current->euid via a netlink unicast to the PID of the daemon, which was
    registered at the time the daemon was started by the user.

    The daemon actually just makes calls to libecryptfs, which implements request
    packet parsing and manages PKI modules. libecryptfs grabs the public key
    authentication token for the given signature from the user session keyring.
    This auth tok tells libecryptfs which PKI module should receive the request.
    libecryptfs then makes a decrypt() call to the PKI module, and it passes along
    the PKI blob from the auth tok. The PKI module uses the blob to figure out how it
    should decrypt the data passed to it; it performs the decryption and passes
    the decrypted data back to libecryptfs. libecryptfs then puts together a
    reply packet with the decrypted FEK and passes that back to the eCryptfs
    module.

    The eCryptfs module manages these request callouts to userspace code via
    message context structs. The module maintains an array of message context
    structs and places the elements of the array on two lists: a free and an
    allocated list. When eCryptfs wants to make a request, it moves a msg ctx
    from the free list to the allocated list, sets its state to pending, and fires
    off the message to the user's registered daemon.

    When eCryptfs receives a netlink message (via the callback), it correlates the
    msg ctx struct in the alloc list with the data in the message itself. The
    msg->index field contains the offset into the array of msg ctx structs. It verifies
    that the registered daemon PID is the same as the PID of the process that sent
    the message. It also validates a sequence number between the received packet
    and the msg ctx. Then, it copies the contents of the message (the reply
    packet) into the msg ctx struct, sets the state in the msg ctx to done, and
    wakes up the process that was sleeping while waiting for the reply.

    The sleeping process was whatever was performing the sys_open(). This process
    originally called ecryptfs_send_message(); it is now in
    ecryptfs_wait_for_response(). When it wakes up and sees that the msg ctx
    state was set to done, it returns a pointer to the message contents (the reply
    packet). If all went well, this packet contains the decrypted
    FEK, which is then copied into the crypt_stat struct, and life continues as
    normal.

    The case for creation of a new file is very similar, only instead of a decrypt
    request, eCryptfs sends out an encrypt request.

    > - We have a great clod of key mangement code in-kernel. Why is that
    > not suitable (or growable) for public key management?

    eCryptfs uses Howells' keyring to store persistent key data and PKI state
    information. It defers public key cryptographic transformations to userspace
    code. The userspace data manipulation request really is orthogonal to key
    management in and of itself. What eCryptfs basically needs is a secure way to
    communicate with a particular daemon for a particular task doing a syscall,
    based on the UID. Nothing running under another UID should be able to access
    that channel of communication.

    > - Is it appropriate that new infrastructure for public key
    > management be private to a particular fs?

    The messaging.c file contains a lot of code that, perhaps, could be extracted
    into a separate kernel service. In essence, this would be a sort of
    request/reply mechanism that would involve a userspace daemon. I am not aware
    of anything that does quite what eCryptfs does, so I was not aware of any
    existing tools to do just what we wanted.

    > What happens if one of these daemons exits without sending a quit
    > message?

    There is a stale uid-to-pid association in the hash table for that user. When
    the user registers a new daemon, eCryptfs cleans up the old association and
    generates a new one. See ecryptfs_process_helo().

    > - _why_ does it use netlink?

    Netlink provides the transport mechanism that would minimize the complexity of
    the implementation, given that we can have multiple daemons (one per user). I
    explored the possibility of using relayfs, but that would involve having to
    introduce control channels and a protocol for creating and tearing down
    channels for the daemons. We do not have to worry about any of that with
    netlink.

    Signed-off-by: Michael Halcrow
    Cc: David Howells
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michael Halcrow
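The message-context bookkeeping described in this entry (pending and done states, lookup by index, daemon PID and sequence-number checks) can be sketched as follows. This is a hypothetical userspace model, not the eCryptfs messaging.c code: a per-slot state field replaces the free and allocated lists, and an int stands in for the reply packet.

```c
#include <stddef.h>
#include <stdint.h>

enum ctx_state { CTX_FREE, CTX_PENDING, CTX_DONE };

struct msg_ctx {
	enum ctx_state state;
	uint32_t seq;        /* sequence number of the outstanding request */
	int daemon_pid;      /* daemon registered for this user */
	int reply;           /* stands in for the copied reply packet */
};

#define NUM_CTX 4

/* Claim a free slot for a new request; returns its index or -1. */
static int ctx_alloc(struct msg_ctx *ctx, uint32_t seq, int daemon_pid)
{
	for (int i = 0; i < NUM_CTX; i++) {
		if (ctx[i].state == CTX_FREE) {
			ctx[i].state = CTX_PENDING;
			ctx[i].seq = seq;
			ctx[i].daemon_pid = daemon_pid;
			return i;
		}
	}
	return -1;
}

/* Validate an incoming reply and, if it matches, mark the slot done. */
static int ctx_receive(struct msg_ctx *ctx, int index, uint32_t seq,
		       int sender_pid, int reply)
{
	if (index < 0 || index >= NUM_CTX || ctx[index].state != CTX_PENDING)
		return -1;
	if (ctx[index].daemon_pid != sender_pid || ctx[index].seq != seq)
		return -1;	/* wrong daemon or stale reply */
	ctx[index].reply = reply;
	ctx[index].state = CTX_DONE;	/* the waiting process would be woken here */
	return 0;
}
```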
     

03 Dec, 2006

2 commits


03 Sep, 2006

1 commit

  • This patch formally adds support for the posting of FC events via netlink.
    It is a followup to the original RFC at:
    http://marc.theaimsgroup.com/?l=linux-scsi&m=114530667923464&w=2
    and the initial posting at:
    http://marc.theaimsgroup.com/?l=linux-scsi&m=115507374832500&w=2

    The patch has been updated to optimize the send path, per the discussions
    in the initial posting.

    Per discussions at the Storage Summit and at OLS, we are to use netlink for
    async events from transports. Also per those discussions, to avoid a netlink
    protocol per transport, I've created a single NETLINK_SCSITRANSPORT protocol,
    which can then be used by all transports.

    This patch:
    - Creates new files scsi_netlink.c and scsi_netlink.h, which contain the
    single, shared definitions for the SCSI transport. They are tied into the
    base SCSI subsystem initialization.
    They contain a single interface routine, scsi_send_transport_event(), for a
    transport to send an event (via multicast to a protocol-specific group).
    - Creates a new scsi_netlink_fc.h file, which contains the FC netlink event
    messages
    - Adds 3 new routines to the fc transport:
    fc_get_event_number() - to get an FC event number
    fc_host_post_event() - to send a simple FC event (32 bits of data)
    fc_host_post_vendor_event() - to send a Vendor unique event, with
    arbitrary amounts of data.

    Note: the separation of the event number allows an LLD to send a standard
    event, followed by vendor-specific data for the event.

    Note: This patch assumes 2 prior fc transport patches have been installed:
    http://marc.theaimsgroup.com/?l=linux-scsi&m=115555807316329&w=2
    http://marc.theaimsgroup.com/?l=linux-scsi&m=115581614930261&w=2

    Sorry - next time I'll do something like making these individual
    patches of the same posting when I know they'll be posted closely
    together.

    Signed-off-by: James Smart

    Tidy up configuration not to make SCSI always select NET

    Signed-off-by: James Bottomley

    James Smart
     

23 Jun, 2006

1 commit


01 May, 2006

1 commit

  • The below patch should be applied after the inode and ipc sid patches.
    This patch is a reworking of Tim's patch that has been updated to match
    the inode and ipc patches, since it's similar.

    [updated:
    > Stephen Smalley also wanted to change a variable from isec to tsec in the
    > user sid patch. ]

    Signed-off-by: Steve Grubb
    Signed-off-by: Al Viro

    Steve Grubb
     

21 Mar, 2006

1 commit


10 Feb, 2006

1 commit

  • Netlink overrun handling was broken during the netlink improvements.
    The destination socket is used where the source socket was meant,
    so overrun is now never reported to user netlink sockets when it should be,
    and it can even be set on a kernel socket, which results in a complete
    deadlock of rtnetlink.

    The suggested fix is to restore the status quo by passing the source
    socket as an additional argument to netlink_attachskb().

    A little explanation: overrun is set on a socket when it failed
    to receive some message and the sender of this message does not,
    or even has no way to, handle this error. This happens in two cases:
    1. when the kernel sends something. The kernel never retransmits and
    cannot wait for buffer space.
    2. when a user sends a broadcast and the message was not delivered
    to some recipients.

    Signed-off-by: Alexey Kuznetsov
    Signed-off-by: David S. Miller

    Alexey Kuznetsov
     

10 Nov, 2005

1 commit

  • Introduces a new type-safe interface for netlink message and
    attributes handling. The interface is fully binary compatible
    with the old interface towards userspace. Besides type safety,
    this interface features attribute validation capabilities,
    simplified message construction, and documentation.

    The resulting netlink code should be smaller, less error prone
    and easier to understand.

    Signed-off-by: Thomas Graf
    Signed-off-by: David S. Miller

    Thomas Graf
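For a feel of the attribute layout this interface manages, the sketch below appends one 32-bit attribute to a buffer using struct nlattr and the NLA_ALIGN/NLA_HDRLEN macros from <linux/netlink.h>. It assumes a Linux userspace build; `put_u32_attr()` is a hypothetical helper, not the in-kernel nla_put_u32().

```c
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/netlink.h>

/* Append a u32 attribute at `off` in `buf` (capacity `size`).
 * Returns the new, aligned offset, or 0 if the attribute does not fit. */
static size_t put_u32_attr(unsigned char *buf, size_t size, size_t off,
			   uint16_t type, uint32_t value)
{
	struct nlattr hdr;
	size_t total = NLA_ALIGN(NLA_HDRLEN + sizeof(value));

	if (off + total > size)
		return 0;
	hdr.nla_len = NLA_HDRLEN + sizeof(value);   /* header + payload, unpadded */
	hdr.nla_type = type;
	memset(buf + off, 0, total);                /* zero any padding */
	memcpy(buf + off, &hdr, sizeof(hdr));
	memcpy(buf + off + NLA_HDRLEN, &value, sizeof(value));
	return off + total;
}
```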
     

09 Oct, 2005

1 commit

  • - added typedef unsigned int __nocast gfp_t;

    - replaced __nocast uses for gfp flags with gfp_t - it gives exactly
    the same warnings as far as sparse is concerned, doesn't change
    generated code (from gcc's point of view we replaced unsigned int with
    a typedef) and documents what's going on far better.

    Signed-off-by: Al Viro
    Signed-off-by: Linus Torvalds

    Al Viro
     

15 Sep, 2005

1 commit


12 Sep, 2005

1 commit

  • Kernel connector: a new, easy-to-use communication module between
    userspace and kernelspace, which implements a bidirectional message
    bus using netlink as its backend. Connector was created to eliminate
    complex skb handling in both the send and receive directions.

    The connector driver makes it possible to connect various agents
    using a netlink-based network as one of its backends. One must
    register a callback and an identifier. When the driver receives a
    special netlink message with the appropriate identifier, the
    corresponding callback is called.

    From the userspace point of view it's quite straightforward:

    socket();
    bind();
    send();
    recv();

    But if kernelspace wants to use the full power of such connections, the
    driver writer must create special sockets and must know about struct
    sk_buff handling... Connector allows any kernelspace agent to use
    netlink-based networking for inter-process communication in a
    significantly easier way:

    int cn_add_callback(struct cb_id *id, char *name, void (*callback) (void *));
    void cn_netlink_send(struct cn_msg *msg, u32 __groups, int gfp_mask);

    struct cb_id
    {
        __u32 idx;
        __u32 val;
    };

    idx and val are unique identifiers which must be registered in
    connector.h for in-kernel usage. void (*callback) (void *) is a
    callback function which will be called when a message with the above
    idx.val is received by the connector core.

    Using connector completely hides the low-level transport layer from
    its users.

    Connector uses the new netlink ability to have many groups in one socket.

    [ Incorporating many cleanups and fixes by myself and
    Andrew Morton -DaveM ]

    Signed-off-by: Evgeniy Polyakov
    Signed-off-by: Andrew Morton
    Signed-off-by: David S. Miller

    Evgeniy Polyakov
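The (idx, val) dispatch described in this entry can be modelled in a few lines. This is a hypothetical userspace sketch, not the kernel's cn_* implementation: `model_add_callback()` and `model_dispatch()` are illustrative names, and the callback takes only a data pointer as in the prototype quoted above.

```c
#include <stddef.h>
#include <stdint.h>

/* Mirrors the shape of struct cb_id from the text. */
struct cb_id {
	uint32_t idx;
	uint32_t val;
};

typedef void (*cn_callback)(void *data);

struct cn_entry {
	struct cb_id id;
	cn_callback cb;
};

#define MAX_CB 8
static struct cn_entry registry[MAX_CB];
static size_t registry_len;

/* Register a callback under an (idx, val) pair; returns 0 or -1 if full. */
static int model_add_callback(struct cb_id id, cn_callback cb)
{
	if (registry_len == MAX_CB)
		return -1;
	registry[registry_len].id = id;
	registry[registry_len].cb = cb;
	registry_len++;
	return 0;
}

/* Hand a message to the matching callback; returns -1 if none matched. */
static int model_dispatch(struct cb_id id, void *data)
{
	for (size_t i = 0; i < registry_len; i++) {
		if (registry[i].id.idx == id.idx && registry[i].id.val == id.val) {
			registry[i].cb(data);
			return 0;
		}
	}
	return -1;
}

/* Example callback: increments the integer it is given. */
static void count_cb(void *data)
{
	(*(int *)data)++;
}
```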