19 Oct, 2012

1 commit

  • On some suspend/resume operations involving a wimax device, we have
    noticed some intermittent memory corruptions in netlink code.

    Stéphane Marchesin tracked this corruption in netlink_update_listeners()
    and suggested a patch.

    It appears netlink_release() should use kfree_rcu() instead of kfree()
    for the listeners structure as it may be used by other cpus using RCU
    protection.

    netlink_release() must set the listeners pointer to NULL when
    it is about to be freed.

    We also have to guard netlink_update_listeners() and
    netlink_has_listeners() against a NULL listeners pointer.

    Add a nl_deref_protected() lockdep helper to properly document which
    locks protect us.

    Reported-by: Jonathan Kliegman
    Signed-off-by: Eric Dumazet
    Cc: Stéphane Marchesin
    Cc: Sam Leffler
    Signed-off-by: David S. Miller

    Eric Dumazet
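
    The NULL guard described above can be pictured with a small userspace
    model. This is only a sketch: struct nl_table_entry, struct listeners and
    has_listeners() are hypothetical stand-ins, not the kernel's definitions,
    and the RCU read-side protection is reduced to a comment.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_GROUPS 32u   /* assumption: one 32-bit word of group bits */

/* Hypothetical stand-in for the RCU-managed listeners bitmap. */
struct listeners {
    unsigned long masks[1];   /* bit (group - 1) set => someone listens */
};

/* Hypothetical stand-in for one per-protocol table entry. */
struct nl_table_entry {
    struct listeners *listeners;   /* NULL once the protocol was released */
    unsigned int groups;           /* number of valid multicast groups */
};

/* Returns 1 if any socket listens on `group`, 0 otherwise.
 * The NULL test models the guard the commit message asks for. */
static int has_listeners(const struct nl_table_entry *e, unsigned int group)
{
    const struct listeners *l;

    assert(e != NULL);
    l = e->listeners;   /* the kernel would use rcu_dereference() here */
    if (l == NULL || group == 0 || group > e->groups || group > MAX_GROUPS)
        return 0;
    return (int)((l->masks[0] >> (group - 1)) & 1UL);
}
```

    With the guard in place, a reader that races with the release path sees
    "no listeners" instead of dereferencing a freed or NULL pointer.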
     

07 Oct, 2012

1 commit

  • I get a panic when I use ss -a and rmmod inet_diag at the
    same time.

    It's because netlink_dump uses inet_diag_dump which belongs to module
    inet_diag.

    I searched the code and found that many modules have the same problem.
    We need to take a reference on the module that cb->dump belongs to.

    Thanks for all the help from Stephen, Jan, Eric, Steffen and Pablo.

    Changes from v3:
    make netlink_dump_start inline, as suggested by Pablo and Eric.

    Changes from v2:
    delete netlink_dump_done, and call module_put in netlink_dump
    and netlink_sock_destruct.

    Signed-off-by: Gao feng
    Signed-off-by: David S. Miller

    Gao feng
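
    The idea of pinning the module that owns cb->dump can be sketched in
    userspace as follows. Everything here is a simplified model under stated
    assumptions: struct fake_module, fake_module_get(), fake_module_put() and
    dump_run() are hypothetical names, and the real code drops the reference
    in netlink_dump and netlink_sock_destruct rather than in one function.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct module. */
struct fake_module {
    int unloading;   /* set once removal of the module has started */
    int refcnt;      /* number of users currently pinning the module */
};

struct dump_cb {
    int (*dump)(void *ctx);      /* callback living in the owner module */
    struct fake_module *owner;   /* module that provides ->dump */
};

/* Models try_module_get(): no new reference once unloading has begun. */
static int fake_module_get(struct fake_module *m)
{
    if (m->unloading)
        return 0;
    m->refcnt++;
    return 1;
}

/* Models module_put(). */
static void fake_module_put(struct fake_module *m)
{
    assert(m->refcnt > 0);
    m->refcnt--;
}

/* Example callback used for illustration: counts how often it ran. */
static int count_dump(void *ctx)
{
    ++*(int *)ctx;
    return 0;
}

/* Pin the owner, run the callback, unpin. Returns -1 if pinning failed. */
static int dump_run(struct dump_cb *cb, void *ctx)
{
    int ret;

    assert(cb != NULL && cb->dump != NULL && cb->owner != NULL);
    if (!fake_module_get(cb->owner))
        return -1;
    ret = cb->dump(ctx);
    fake_module_put(cb->owner);
    return ret;
}
```

    While the reference is held the module cannot go away, so the callback
    never runs from unloaded code; a dump requested after removal has started
    is refused instead.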
     

11 Sep, 2012

1 commit

  • It is a frequent mistake to confuse the netlink port identifier with a
    process identifier. Try to reduce this confusion by renaming fields
    that hold port identifiers from pid to portid.

    I have carefully avoided changing the structures exported to
    userspace to avoid changing the userspace API.

    I have successfully built an allyesconfig kernel with this change.

    Signed-off-by: "Eric W. Biederman"
    Acked-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Eric W. Biederman
     

09 Sep, 2012

2 commits

  • This patch defines netlink_kernel_create as a wrapper function of
    __netlink_kernel_create to hide the struct module *me parameter
    (which seems to be THIS_MODULE in all existing netlink subsystems).

    Suggested by David S. Miller.

    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: David S. Miller

    Pablo Neira Ayuso
     
  • Replace netlink_set_nonroot by one new field `flags' in
    struct netlink_kernel_cfg that is passed to netlink_kernel_create.

    This patch also renames NL_NONROOT_* to NL_CFG_F_NONROOT_* since
    now the flags field in nl_table is generic (so we can add more
    flags if needed in the future).

    Also adjust all callers in the net-next tree to use these flags
    instead of netlink_set_nonroot.

    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: David S. Miller

    Pablo Neira Ayuso
     

08 Sep, 2012

1 commit

  • Passing uids and gids on NETLINK_CB from a process in one user
    namespace to a process in another user namespace can result in the
    wrong uid or gid being presented to userspace. Avoid that problem by
    passing kuids and kgids instead.

    - define struct scm_creds for use in scm_cookie and netlink_skb_parms
    that holds uid and gid information in kuid_t and kgid_t.

    - Modify scm_set_cred to fill out scm_creds by hand instead of using
    cred_to_ucred to fill out struct ucred. This conversion ensures
    userspace does not get incorrect uid or gid values to look at.

    - Modify scm_recv to convert from struct scm_creds to struct ucred
    before copying credential values to userspace.

    - Modify __scm_send to populate struct scm_creds in the scm_cookie,
    instead of just copying struct ucred from userspace.

    - Modify netlink_sendmsg to copy scm_creds instead of struct ucred
    into the NETLINK_CB.

    Signed-off-by: "Eric W. Biederman"
    Signed-off-by: David S. Miller

    Eric W. Biederman
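
    Why a raw uid cannot simply be copied between user namespaces can be
    illustrated with a toy mapping. This is a sketch under assumptions:
    kuid_model, struct uid_extent and from_kuid_model() are hypothetical
    names, the map is a flat array of extents, and unmapped ids fall back to
    65534, the value assumed here for the overflow uid.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a kernel-internal uid (what kuid_t wraps). */
typedef struct { unsigned int val; } kuid_model;

/* One extent of a namespace's uid map: ids [first, first + count) inside
 * the namespace correspond to kernel ids [lower_first, lower_first + count). */
struct uid_extent {
    unsigned int first;
    unsigned int lower_first;
    unsigned int count;
};

#define OVERFLOW_UID 65534u   /* assumption: fallback for unmapped ids */

/* Translate a kernel-internal uid into the id seen inside a namespace. */
static unsigned int from_kuid_model(const struct uid_extent *map, size_t n,
                                    kuid_model k)
{
    size_t i;

    assert(map != NULL || n == 0);
    for (i = 0; i < n; i++) {
        if (k.val >= map[i].lower_first &&
            k.val - map[i].lower_first < map[i].count)
            return map[i].first + (k.val - map[i].lower_first);
    }
    return OVERFLOW_UID;
}
```

    The same kernel-internal id yields different numbers depending on the
    receiver's namespace, which is why the conversion has to happen at the
    point where the value is handed to userspace.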
     

01 Sep, 2012

1 commit


25 Aug, 2012

2 commits

  • This is an initial merge in of Eric Biederman's work to start adding
    user namespace support to the networking.

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Non-root user-space processes can send Netlink messages to other
    processes that are well-known for being subscribed to Netlink
    asynchronous notifications. This allows an illegitimate non-root
    process to send forged messages to Netlink subscribers.

    The userspace process usually verifies the legitimate origin in
    two ways:

    a) Socket credentials. If UID != 0, then the message comes from
    some illegitimate process and the message needs to be dropped.

    b) Netlink portID. In general, portID == 0 means that the message
    originates from the kernel, so any message not coming from the
    kernel is discarded.

    However, ctnetlink sets the portID in event messages that have
    been triggered by some user-space process, e.g. the conntrack utility.
    So other processes subscribed to ctnetlink events, e.g. conntrackd,
    know that the event was triggered by some user-space action.

    Neither of the two ways to discard illegitimate messages coming
    from non-root processes can help for ctnetlink.

    This patch adds capability validation when dst_pid is set
    in netlink_sendmsg(). This approach is aggressive since existing
    applications using any Netlink bus to deliver messages between
    two user-space processes will break. Note that the exception is
    NETLINK_USERSOCK, since it is reserved for netlink-to-netlink
    userspace communication.

    Still, anyone who wants a Netlink bus to allow netlink-to-netlink
    userspace communication can set NL_NONROOT_SEND. However, by default,
    I don't think it makes sense to allow NETLINK_ROUTE to be used for
    communication between two processes that exchange information
    unrelated to link/neighbouring/routing. They should be using
    NETLINK_USERSOCK instead for that.

    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: David S. Miller

    Pablo Neira Ayuso
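
    The resulting send policy can be summarised as a small decision function.
    This is a userspace sketch, not the kernel code: struct send_ctx and
    may_send() are hypothetical, the privilege check is reduced to a boolean
    that stands for CAP_NET_ADMIN, and the flag values are assumed to match
    the NL_NONROOT_* definitions of that era.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NL_NONROOT_RECV 0x1
#define NL_NONROOT_SEND 0x2

/* Per-protocol policy bits plus the sender's privilege, as plain values. */
struct send_ctx {
    unsigned int proto_nonroot;   /* NL_NONROOT_* bits set for the protocol */
    bool has_cap_net_admin;       /* assumption: models capable(CAP_NET_ADMIN) */
};

/* Returns true if a message addressed to (dst_portid, dst_group) may be
 * sent. Messages to the kernel (both zero) are not restricted here. */
static bool may_send(const struct send_ctx *c, unsigned int dst_portid,
                     unsigned int dst_group)
{
    assert(c != NULL);
    if (dst_portid == 0 && dst_group == 0)
        return true;
    return (c->proto_nonroot & NL_NONROOT_SEND) != 0 || c->has_cap_net_admin;
}
```

    A protocol such as NETLINK_USERSOCK would carry the NL_NONROOT_SEND bit in
    this model, while an unprivileged sender on NETLINK_ROUTE would be refused
    as soon as it names a destination port or group.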
     

22 Aug, 2012

1 commit

  • Pablo Neira Ayuso discovered that avahi and
    potentially NetworkManager accept spoofed Netlink messages because of a
    kernel bug. The kernel passes all-zero SCM_CREDENTIALS ancillary data
    to the receiver if the sender did not provide such data, instead of not
    including any such data at all or including the correct data from the
    peer (as it is the case with AF_UNIX).

    This bug was introduced in commit 16e572626961
    (af_unix: dont send SCM_CREDENTIALS by default)

    This patch forces passing credentials for netlink, as
    before the regression.

    Another fix would be to not add SCM_CREDENTIALS in
    netlink messages if not provided by the sender, but it
    might break some programs.

    With help from Florian Weimer & Petr Matousek

    This issue is designated as CVE-2012-3520

    Signed-off-by: Eric Dumazet
    Cc: Petr Matousek
    Cc: Florian Weimer
    Cc: Pablo Neira Ayuso
    Signed-off-by: David S. Miller

    Eric Dumazet
     

15 Aug, 2012

1 commit

  • The sending socket of an skb is already available by its port id
    in the NETLINK_CB. If you want to know more, for example to examine the
    credentials on the sending socket, you have to look up the sending
    socket by its port id, and all of the needed functions and data
    structures are static inside of af_netlink.c. So do the simple
    thing and pass the sending socket to the receivers in the NETLINK_CB.

    I intend to use this to get the user namespace of the sending socket
    in inet_diag so that I can report uids in the context of the process
    that opened the socket, the same way I report uids in the context
    of the process that opens files.

    Acked-by: David S. Miller
    Acked-by: Serge Hallyn
    Signed-off-by: Eric W. Biederman

    Eric W. Biederman
     

24 Jul, 2012

1 commit


11 Jul, 2012

1 commit


30 Jun, 2012

2 commits

  • This patch adds a hook in the binding path of netlink.

    This is used by ctnetlink to allow module autoloading for the case
    in which one user executes:

    conntrack -E

    So far, this resulted in nfnetlink loaded, but not
    nf_conntrack_netlink.

    I have received many complaints about this behaviour in the past.

    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: David S. Miller

    Pablo Neira Ayuso
     
  • This patch adds the following structure:

    struct netlink_kernel_cfg {
            unsigned int groups;
            void (*input)(struct sk_buff *skb);
            struct mutex *cb_mutex;
    };

    It can be passed to netlink_kernel_create to set optional configurations
    for netlink kernel sockets.

    I've populated this structure by looking for NULL and zero parameters in the
    existing code. The remaining parameters that always need to be set are still
    left in the original interface.

    The structure thus holds the optional parameters for netlink socket creation,
    which allows easy extensibility of this interface in the future.

    This patch also adapts all callers to use this new interface.

    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: David S. Miller

    Pablo Neira Ayuso
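
    The optional-parameter pattern behind the structure quoted above can be
    tried out in userspace. This is a sketch only: kernel_cfg_model,
    sock_model, create_model() and drop_input() are hypothetical names,
    cb_mutex is a plain void pointer because struct mutex is kernel-only, and
    the default of 32 groups is an assumption of this model.

```c
#include <assert.h>
#include <stddef.h>

struct msg;   /* opaque stand-in for struct sk_buff */

/* Userspace mirror of the three optional fields. */
struct kernel_cfg_model {
    unsigned int groups;
    void (*input)(struct msg *m);
    void *cb_mutex;
};

/* What the hypothetical create function hands back. */
struct sock_model {
    int unit;
    unsigned int groups;
    void (*input)(struct msg *m);
};

/* Example input handler used for illustration: ignores the message. */
static void drop_input(struct msg *m)
{
    (void)m;
}

/* Required parameters stay positional; optional ones come from cfg.
 * A NULL cfg means "use the defaults for everything". */
static struct sock_model create_model(int unit,
                                      const struct kernel_cfg_model *cfg)
{
    struct sock_model s = { unit, 32, NULL };

    assert(unit >= 0);
    if (cfg != NULL) {
        if (cfg->groups > s.groups)
            s.groups = cfg->groups;
        s.input = cfg->input;
    }
    return s;
}
```

    A caller fills in only what it needs, for example
    `struct kernel_cfg_model cfg = { .input = drop_input };`, and a field
    added to the structure later does not force existing callers to change.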
     

30 May, 2012

1 commit

  • Generic netlink searches for -type-<x> formatted aliases when requesting a module to
    fulfill a protocol request (i.e. net-pf-16-proto-16-type-<x>, where x is a type
    value). However, generic netlink protocols have no well-defined type numbers;
    they have string names. Modify genl_ctrl_getfamily to request an alias in the
    format net-pf-16-proto-16-family-<x> instead, where x is a generic string, and
    add a macro that builds on the previously added MODULE_ALIAS_NET_PF_PROTO_NAME
    macro to allow modules to specify those generic strings.

    Note, l2tp previously hacked together a net-pf-16-proto-16-type-l2tp alias
    using the MODULE_ALIAS macro; with these updates we can convert that to use the
    PROTO_NAME macro.

    Signed-off-by: Neil Horman
    CC: Eric Dumazet
    CC: James Chapman
    CC: David Miller
    Signed-off-by: David S. Miller

    Neil Horman
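
    How the family-based alias string is put together can be shown with a few
    lines of userspace C. This is an illustrative sketch: genl_family_alias()
    and alias_matches() are hypothetical helpers, and the two 16s are the
    values taken from the alias format quoted in the message above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define PF_NETLINK_NUM      16   /* the "pf-16" part of the alias */
#define NETLINK_GENERIC_NUM 16   /* the "proto-16" part of the alias */

/* Writes "net-pf-16-proto-16-family-<family>" into buf.
 * Returns 0 on success, -1 if buf is too small. */
static int genl_family_alias(char *buf, size_t len, const char *family)
{
    int n;

    assert(buf != NULL && family != NULL);
    n = snprintf(buf, len, "net-pf-%d-proto-%d-family-%s",
                 PF_NETLINK_NUM, NETLINK_GENERIC_NUM, family);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Models the lookup: does a module that declared the alias `declared`
 * satisfy a request for the generic netlink family `family`? */
static int alias_matches(const char *declared, const char *family)
{
    char want[64];

    if (genl_family_alias(want, sizeof want, family) != 0)
        return 0;
    return strcmp(declared, want) == 0;
}
```

    In this model a module that still declares the older -type-l2tp alias is
    not found by a family-based request, which is why l2tp has to be converted
    along with the lookup.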
     

24 Apr, 2012

2 commits


20 Apr, 2012

1 commit


11 Apr, 2012

1 commit


06 Apr, 2012

1 commit


02 Apr, 2012

1 commit


27 Feb, 2012

2 commits

  • This patch allows you to pass a data pointer that can be
    accessed from the dump callback.

    Netfilter is going to use this patch to provide filtered dumps
    to user-space. This is specifically interesting in ctnetlink that
    may handle lots of conntrack entries. We can save precious
    cycles by skipping the conversion to TLV format of conntrack
    entries that are not interesting for user-space.

    More specifically, ctnetlink will include one operation to allow
    to filter the dumping of conntrack entries by ctmark values.

    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: David S. Miller

    Pablo Neira Ayuso
     
  • Davem considers that the argument list of this interface is getting
    out of control. This patch tries to address this issue following
    his proposal:

    struct netlink_dump_control c = { .dump = dump, .done = done, ... };

    netlink_dump_start(..., &c);

    Suggested by David S. Miller.

    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: David S. Miller

    Pablo Neira Ayuso
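
    The control-struct calling convention, combined with the callback data
    pointer from the entry before it, can be modelled in userspace. This is a
    sketch under assumptions: dump_control_model, dump_ctx, dump_by_mark() and
    dump_start_model() are hypothetical, and "emitting" an entry is reduced to
    counting it.

```c
#include <assert.h>
#include <stddef.h>

struct entry { unsigned int mark; };

/* State visible to the callbacks while a dump runs. */
struct dump_ctx {
    const struct entry *entries;
    size_t n;
    size_t emitted;
    void *data;   /* opaque pointer supplied by the caller */
};

/* All optional knobs travel in one struct instead of a long argument list. */
struct dump_control_model {
    int (*dump)(struct dump_ctx *ctx);
    int (*done)(struct dump_ctx *ctx);
    void *data;
};

/* Example callback: count only entries whose mark equals the value that
 * data points to; a NULL data pointer means "no filter". */
static int dump_by_mark(struct dump_ctx *ctx)
{
    const unsigned int *want = ctx->data;
    size_t i;

    for (i = 0; i < ctx->n; i++)
        if (want == NULL || ctx->entries[i].mark == *want)
            ctx->emitted++;
    return 0;
}

/* Hypothetical start function: runs dump, then done if it is set. */
static int dump_start_model(const struct entry *entries, size_t n,
                            const struct dump_control_model *c,
                            size_t *emitted)
{
    struct dump_ctx ctx = { entries, n, 0, NULL };
    int ret;

    assert(c != NULL && c->dump != NULL && emitted != NULL);
    ctx.data = c->data;
    ret = c->dump(&ctx);
    if (c->done != NULL)
        c->done(&ctx);
    *emitted = ctx.emitted;
    return ret;
}
```

    Filtering inside the callback is what saves the work: entries that do not
    match are skipped before any per-entry conversion would take place.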
     

31 Jan, 2012

1 commit

  •    text    data     bss      dec    hex filename
    8455963  532732 1810804 10799499 a4c98b vmlinux.o.before
    8448899  532732 1810804 10792435 a4adf3 vmlinux.o

    This change also removes commented-out copy of __nlmsg_put
    which was last touched in 2005 with "Enable once all users
    have been converted" comment on top.

    Changes in v2: rediffed against net-next.

    Signed-off-by: Denys Vlasenko
    Signed-off-by: David S. Miller

    Denys Vlasenko
     

15 Jan, 2012

1 commit

  • * 'for-linus' of git://selinuxproject.org/~jmorris/linux-security:
    capabilities: remove __cap_full_set definition
    security: remove the security_netlink_recv hook as it is equivalent to capable()
    ptrace: do not audit capability check when outputing /proc/pid/stat
    capabilities: remove task_ns_* functions
    capabitlies: ns_capable can use the cap helpers rather than lsm call
    capabilities: style only - move capable below ns_capable
    capabilites: introduce new has_ns_capabilities_noaudit
    capabilities: call has_ns_capability from has_capability
    capabilities: remove all _real_ interfaces
    capabilities: introduce security_capable_noaudit
    capabilities: reverse arguments to security_capable
    capabilities: remove the task from capable LSM hook entirely
    selinux: sparse fix: fix several warnings in the security server cod
    selinux: sparse fix: fix warnings in netlink code
    selinux: sparse fix: eliminate warnings for selinuxfs
    selinux: sparse fix: declare selinux_disable() in security.h
    selinux: sparse fix: move selinux_complete_init
    selinux: sparse fix: make selinux_secmark_refcount static
    SELinux: Fix RCU deref check warning in sel_netport_insert()

    Manually fix up a semantic mis-merge wrt security_netlink_recv():

    - the interface was removed in commit fd7784615248 ("security: remove
    the security_netlink_recv hook as it is equivalent to capable()")

    - a new user of it appeared in commit a38f7907b926 ("crypto: Add
    userspace configuration API")

    causing no automatic merge conflict, but Eric Paris pointed out the
    issue.

    Linus Torvalds
     

06 Jan, 2012

1 commit


29 Dec, 2011

1 commit

  • When testing L2TP support, I discovered that the l2tp module is not autoloaded
    as other netlink interfaces are. This is because genetlink lacks a hook to call
    request_module and load the module.

    Signed-off-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Stephen Hemminger
     

24 Dec, 2011

1 commit

  • We can't do this without propagating the const to nlk_sk()
    too, otherwise:

    net/netlink/af_netlink.c: In function ‘netlink_is_kernel’:
    net/netlink/af_netlink.c:103:2: warning: passing argument 1 of ‘nlk_sk’ discards ‘const’ qualifier from pointer target type [enabled by default]
    net/netlink/af_netlink.c:96:36: note: expected ‘struct sock *’ but argument is of type ‘const struct sock *’

    Signed-off-by: David S. Miller

    David S. Miller
     

23 Dec, 2011

2 commits


04 Dec, 2011

2 commits

  • Open vSwitch uses genl_mutex locking to protect datapath
    data structures such as the flow table and flow actions. The following patch adds
    lockdep_genl_is_held(), which is used for RCU annotation to prove
    locking.

    Signed-off-by: Pravin B Shelar
    Signed-off-by: Jesse Gross

    Pravin B Shelar
     
  • Open vSwitch uses the Generic Netlink interface for communication
    between userspace and the kernel module. genl_notify() is used
    for sending notifications back to userspace.

    genl_notify() is analogous to rtnl_notify() but uses genl_sock
    instead of rtnl.

    Signed-off-by: Pravin B Shelar
    Signed-off-by: Jesse Gross

    Pravin B Shelar
     

29 Sep, 2011

1 commit

  • Since commit 7361c36c5224 (af_unix: Allow credentials to work across
    user and pid namespaces) af_unix performance dropped a lot.

    This is because we now take a reference on pid and cred in each write(),
    and release them in read(), usually done from another process,
    eventually from another cpu. This triggers false sharing.

    # Events: 154K cycles
    #
    # Overhead Command Shared Object Symbol
    # ........ ....... .................. .........................
    #
    10.40% hackbench [kernel.kallsyms] [k] put_pid
    8.60% hackbench [kernel.kallsyms] [k] unix_stream_recvmsg
    7.87% hackbench [kernel.kallsyms] [k] unix_stream_sendmsg
    6.11% hackbench [kernel.kallsyms] [k] do_raw_spin_lock
    4.95% hackbench [kernel.kallsyms] [k] unix_scm_to_skb
    4.87% hackbench [kernel.kallsyms] [k] pid_nr_ns
    4.34% hackbench [kernel.kallsyms] [k] cred_to_ucred
    2.39% hackbench [kernel.kallsyms] [k] unix_destruct_scm
    2.24% hackbench [kernel.kallsyms] [k] sub_preempt_count
    1.75% hackbench [kernel.kallsyms] [k] fget_light
    1.51% hackbench [kernel.kallsyms] [k] __mutex_lock_interruptible_slowpath
    1.42% hackbench [kernel.kallsyms] [k] sock_alloc_send_pskb

    This patch includes SCM_CREDENTIALS information in an af_unix message/skb
    only if requested by the sender (see man 7 unix for details on how to include
    ancillary data using the sendmsg() system call).

    Note: This might break buggy applications that expected SCM_CREDENTIALS
    from an unaware write() system call, with the receiver not using the
    SO_PASSCRED socket option.

    If SOCK_PASSCRED is set on source or destination socket, we still
    include credentials for mere write() syscalls.

    Performance boost in hackbench : more than 50% gain on a 16 thread
    machine (2 quad-core cpus, 2 threads per core)

    hackbench 20 thread 2000

    4.228 sec instead of 9.102 sec

    Signed-off-by: Eric Dumazet
    Acked-by: Tim Chen
    Signed-off-by: David S. Miller

    Eric Dumazet
     

12 Aug, 2011

1 commit


25 Jun, 2011

1 commit


23 Jun, 2011

1 commit

  • Consider the following situation:
    * a dump that would show 8 entries, four in the first
    round, and four in the second
    * between the first and second rounds, 6 entries are
    removed
    * now the second round will not show any entry, and
    even if there is a sequence/generation counter the
    application will not know

    To solve this problem, add a new flag NLM_F_DUMP_INTR
    to the netlink header that indicates the dump wasn't
    consistent. This flag can also be set on the MSG_DONE
    message that terminates the dump, so the above
    situation can be detected.

    To achieve this, add a sequence counter to the netlink
    callback struct. Of course, netlink code still needs
    to use this new functionality. The correct way to do
    that is to always set cb->seq when a dumpit callback
    is invoked and call nl_dump_check_consistent() for
    each new message. The core code will also call this
    function for the final MSG_DONE message.

    To make it usable with generic netlink, a new function
    genlmsg_nlhdr() is needed to obtain the netlink header
    from the genetlink user header.

    Signed-off-by: Johannes Berg
    Acked-by: David S. Miller
    Signed-off-by: John W. Linville

    Johannes Berg
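
    The consistency check can be modelled in a few lines of userspace C. This
    is a sketch, not the kernel helper: cb_model, hdr_model and
    dump_check_consistent() are hypothetical names, the flag value 0x10 is
    assumed to match NLM_F_DUMP_INTR, and a previous sequence of 0 is taken to
    mean "no earlier round to compare with".

```c
#include <assert.h>
#include <stddef.h>

#define DUMP_INTR_FLAG 0x10   /* assumption: value of NLM_F_DUMP_INTR */

/* Sequence state kept across the rounds of one dump. */
struct cb_model {
    unsigned int seq;        /* generation counter set by the dumpit callback */
    unsigned int prev_seq;   /* value seen for the previous message, 0 = none */
};

/* Stand-in for the flags field of a netlink message header. */
struct hdr_model {
    unsigned short flags;
};

/* Mark the message as interrupted if the generation changed since the
 * previous message of the same dump, then remember the current value. */
static void dump_check_consistent(struct cb_model *cb, struct hdr_model *h)
{
    assert(cb != NULL && h != NULL);
    if (cb->prev_seq != 0 && cb->seq != cb->prev_seq)
        h->flags |= DUMP_INTR_FLAG;
    cb->prev_seq = cb->seq;
}
```

    In the scenario above, the removal of entries between the two rounds
    bumps the generation counter, so the first message of the second round, or
    failing that the final MSG_DONE, carries the flag and the application
    knows to retry the dump.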
     

17 Jun, 2011

1 commit


10 Jun, 2011

1 commit

  • The message size allocated for rtnl ifinfo dumps was limited to
    a single page. This is not enough for additional interface info
    available with devices that support SR-IOV and caused a bug in
    which VF info would not be displayed if more than approximately
    40 VFs were created per interface.

    Implement a new function pointer for the rtnl_register service that will
    calculate the amount of data required for the ifinfo dump and allocate
    enough data to satisfy the request.

    Signed-off-by: Greg Rose
    Signed-off-by: Jeff Kirsher

    Greg Rose
     

24 May, 2011

1 commit

  • The %pK format specifier is designed to hide exposed kernel pointers,
    specifically via /proc interfaces. Exposing these pointers provides an
    easy target for kernel write vulnerabilities, since they reveal the
    locations of writable structures containing easily triggerable function
    pointers. The behavior of %pK depends on the kptr_restrict sysctl.

    If kptr_restrict is set to 0, no deviation from the standard %p behavior
    occurs. If kptr_restrict is set to 1 (the default) and the current user
    (intended to be a reader via seq_printf(), etc.) does not have CAP_SYSLOG
    (currently in the LSM tree), kernel pointers using %pK are printed as 0's.
    If kptr_restrict is set to 2, kernel pointers using %pK are printed as
    0's regardless of privileges. Replacing with 0's was chosen over the
    default "(null)", which cannot be parsed by userland %p, which expects
    "(nil)".

    The supporting code for kptr_restrict and %pK is currently in the -mm
    tree. This patch converts users of %p in net/ to %pK. Cases of printing
    pointers to the syslog are not covered, since this would eliminate useful
    information for postmortem debugging and the reading of the syslog is
    already optionally protected by the dmesg_restrict sysctl.

    Signed-off-by: Dan Rosenberg
    Cc: James Morris
    Cc: Eric Dumazet
    Cc: Thomas Graf
    Cc: Eugene Teo
    Cc: Kees Cook
    Cc: Ingo Molnar
    Cc: David S. Miller
    Cc: Peter Zijlstra
    Cc: Eric Paris
    Signed-off-by: Andrew Morton
    Signed-off-by: David S. Miller

    Dan Rosenberg
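
    The three kptr_restrict settings described above reduce to a small
    decision table. The function below is a userspace sketch of that table
    only: pk_value() is a hypothetical name, and the capability check is
    passed in as a boolean standing for CAP_SYSLOG.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Returns the value that would be shown for `ptr` when printed with %pK:
 *   kptr_restrict == 0: the pointer itself, as with plain %p
 *   kptr_restrict == 1: the pointer only if the reader has the capability
 *   kptr_restrict == 2: always zero, regardless of privileges */
static uintptr_t pk_value(uintptr_t ptr, int kptr_restrict,
                          bool has_cap_syslog)
{
    assert(kptr_restrict >= 0 && kptr_restrict <= 2);
    switch (kptr_restrict) {
    case 0:
        return ptr;
    case 1:
        return has_cap_syslog ? ptr : 0;
    default:
        return 0;
    }
}
```

    Because the privilege of the reader matters for setting 1, the decision
    has to be made when the file is read, not when the pointer is stored.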