24 Apr, 2012

2 commits


20 Apr, 2012

1 commit


11 Apr, 2012

1 commit


06 Apr, 2012

1 commit


02 Apr, 2012

1 commit


27 Feb, 2012

2 commits

  • This patch allows you to pass a data pointer that can be
    accessed from the dump callback.

    Netfilter is going to use this patch to provide filtered dumps
    to user-space. This is especially useful for ctnetlink, which
    may handle lots of conntrack entries. We can save precious
    cycles by skipping the conversion to TLV format of conntrack
    entries that are not interesting for user-space.

    More specifically, ctnetlink will add an operation that filters
    the dump of conntrack entries by ctmark value.

    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: David S. Miller

    Pablo Neira Ayuso
     
  • Davem considers that the argument list of this interface is getting
    out of control. This patch tries to address this issue following
    his proposal:

    struct netlink_dump_control c = { .dump = dump, .done = done, ... };

    netlink_dump_start(..., &c);

    Suggested by David S. Miller.

    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: David S. Miller

    Pablo Neira Ayuso
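
The two 27 Feb, 2012 entries above fit together: the dump parameters move into one control structure, and a data pointer travels with it to the dump callback. A minimal userspace sketch of that pattern follows. Every name in it (`demo_dump_control`, `demo_dump_start`, `demo_entry`, `mark_filter`, `dump_by_mark`) is hypothetical and only models the idea; it is not the kernel's netlink_dump_start() signature.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a conntrack entry; only the mark matters here. */
struct demo_entry {
    unsigned int mark;
};

/* Hypothetical control block: callbacks and private data travel together. */
struct demo_dump_control {
    int (*dump)(const struct demo_entry *e, void *data);
    void (*done)(void *data);
    void *data; /* opaque pointer handed back to both callbacks */
};

/* Private state for the mark filter below (an assumption of this sketch). */
struct mark_filter {
    unsigned int mark;
    size_t matched;
    int finished;
};

/* Dump callback: returns 1 if the entry was emitted, 0 if it was skipped. */
static int dump_by_mark(const struct demo_entry *e, void *data)
{
    struct mark_filter *f = data;

    if (e->mark != f->mark)
        return 0; /* skipped before any costly conversion would happen */
    f->matched++;
    return 1;
}

/* Completion callback: records that the dump finished. */
static void filter_done(void *data)
{
    struct mark_filter *f = data;

    f->finished = 1;
}

/* Walks the table, calls c->dump on each entry, then c->done if set.
 * Returns the number of entries the dump callback emitted. */
static size_t demo_dump_start(const struct demo_entry *tbl, size_t n,
                              const struct demo_dump_control *c)
{
    size_t emitted = 0;

    assert(c != NULL && c->dump != NULL);
    for (size_t i = 0; i < n; i++)
        emitted += (size_t)c->dump(&tbl[i], c->data);
    if (c->done)
        c->done(c->data);
    return emitted;
}
```

A caller fills in only the fields it needs, for example `{ .dump = dump_by_mark, .done = filter_done, .data = &filter }`, so adding a new field later does not change the call site's argument list.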
     

31 Jan, 2012

1 commit

  •    text    data     bss      dec    hex filename
    8455963  532732 1810804 10799499 a4c98b vmlinux.o.before
    8448899  532732 1810804 10792435 a4adf3 vmlinux.o

    This change also removes the commented-out copy of __nlmsg_put,
    which was last touched in 2005 and carried an "Enable once all
    users have been converted" comment on top.

    Changes in v2: rediffed against net-next.

    Signed-off-by: Denys Vlasenko
    Signed-off-by: David S. Miller

    Denys Vlasenko
     

15 Jan, 2012

1 commit

  • * 'for-linus' of git://selinuxproject.org/~jmorris/linux-security:
    capabilities: remove __cap_full_set definition
    security: remove the security_netlink_recv hook as it is equivalent to capable()
    ptrace: do not audit capability check when outputing /proc/pid/stat
    capabilities: remove task_ns_* functions
    capabitlies: ns_capable can use the cap helpers rather than lsm call
    capabilities: style only - move capable below ns_capable
    capabilites: introduce new has_ns_capabilities_noaudit
    capabilities: call has_ns_capability from has_capability
    capabilities: remove all _real_ interfaces
    capabilities: introduce security_capable_noaudit
    capabilities: reverse arguments to security_capable
    capabilities: remove the task from capable LSM hook entirely
    selinux: sparse fix: fix several warnings in the security server code
    selinux: sparse fix: fix warnings in netlink code
    selinux: sparse fix: eliminate warnings for selinuxfs
    selinux: sparse fix: declare selinux_disable() in security.h
    selinux: sparse fix: move selinux_complete_init
    selinux: sparse fix: make selinux_secmark_refcount static
    SELinux: Fix RCU deref check warning in sel_netport_insert()

    Manually fix up a semantic mis-merge wrt security_netlink_recv():

    - the interface was removed in commit fd7784615248 ("security: remove
    the security_netlink_recv hook as it is equivalent to capable()")

    - a new user of it appeared in commit a38f7907b926 ("crypto: Add
    userspace configuration API")

    causing no automatic merge conflict, but Eric Paris pointed out the
    issue.

    Linus Torvalds
     

06 Jan, 2012

1 commit


29 Dec, 2011

1 commit

  • When testing L2TP support, I discovered that the l2tp module is not autoloaded
    the way other netlink interfaces are. This is because genetlink lacks a hook to call
    request_module and load the module.

    Signed-off-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Stephen Hemminger
     

24 Dec, 2011

1 commit

  • We can't do this without propagating the const to nlk_sk()
    too, otherwise:

    net/netlink/af_netlink.c: In function ‘netlink_is_kernel’:
    net/netlink/af_netlink.c:103:2: warning: passing argument 1 of ‘nlk_sk’ discards ‘const’ qualifier from pointer target type [enabled by default]
    net/netlink/af_netlink.c:96:36: note: expected ‘struct sock *’ but argument is of type ‘const struct sock *’

    Signed-off-by: David S. Miller

    David S. Miller
     

23 Dec, 2011

2 commits


04 Dec, 2011

2 commits

  • Open vSwitch uses genl_mutex locking to protect datapath
    data structures such as the flow table and flow actions. This patch
    adds lockdep_genl_is_held(), which is used for RCU annotation to
    prove locking.

    Signed-off-by: Pravin B Shelar
    Signed-off-by: Jesse Gross

    Pravin B Shelar
     
  • Open vSwitch uses the Generic Netlink interface for communication
    between userspace and the kernel module. genl_notify() is used
    for sending notifications back to userspace.

    genl_notify() is analogous to rtnl_notify() but uses genl_sock
    instead of rtnl.

    Signed-off-by: Pravin B Shelar
    Signed-off-by: Jesse Gross

    Pravin B Shelar
     

29 Sep, 2011

1 commit

  • Since commit 7361c36c5224 (af_unix: Allow credentials to work across
    user and pid namespaces) af_unix performance dropped a lot.

    This is because we now take a reference on pid and cred in each write()
    and release them in read(), which is usually done from another process
    and possibly on another CPU. This triggers false sharing.

    # Events: 154K cycles
    #
    # Overhead  Command    Shared Object      Symbol
    # ........  .........  .................  ...................................
    #
        10.40%  hackbench  [kernel.kallsyms]  [k] put_pid
         8.60%  hackbench  [kernel.kallsyms]  [k] unix_stream_recvmsg
         7.87%  hackbench  [kernel.kallsyms]  [k] unix_stream_sendmsg
         6.11%  hackbench  [kernel.kallsyms]  [k] do_raw_spin_lock
         4.95%  hackbench  [kernel.kallsyms]  [k] unix_scm_to_skb
         4.87%  hackbench  [kernel.kallsyms]  [k] pid_nr_ns
         4.34%  hackbench  [kernel.kallsyms]  [k] cred_to_ucred
         2.39%  hackbench  [kernel.kallsyms]  [k] unix_destruct_scm
         2.24%  hackbench  [kernel.kallsyms]  [k] sub_preempt_count
         1.75%  hackbench  [kernel.kallsyms]  [k] fget_light
         1.51%  hackbench  [kernel.kallsyms]  [k] __mutex_lock_interruptible_slowpath
         1.42%  hackbench  [kernel.kallsyms]  [k] sock_alloc_send_pskb

    This patch includes SCM_CREDENTIALS information in an af_unix message/skb
    only if requested by the sender (see man 7 unix for details on how to
    include ancillary data using the sendmsg() system call).

    Note: This might break buggy applications that expected SCM_CREDENTIALS
    from an unaware write() system call, with the receiver not using the
    SO_PASSCRED socket option.

    If SOCK_PASSCRED is set on source or destination socket, we still
    include credentials for mere write() syscalls.

    Performance boost in hackbench : more than 50% gain on a 16 thread
    machine (2 quad-core cpus, 2 threads per core)

    hackbench 20 thread 2000

    4.228 sec instead of 9.102 sec

    Signed-off-by: Eric Dumazet
    Acked-by: Tim Chen
    Signed-off-by: David S. Miller

    Eric Dumazet
     

12 Aug, 2011

1 commit


25 Jun, 2011

1 commit


23 Jun, 2011

1 commit

  • Consider the following situation:
    * a dump that would show 8 entries, four in the first
      round, and four in the second
    * between the first and second rounds, 6 entries are
      removed
    * now the second round will not show any entry, and
      even if there is a sequence/generation counter the
      application will not know

    To solve this problem, add a new flag NLM_F_DUMP_INTR
    to the netlink header that indicates the dump wasn't
    consistent. This flag can also be set on the MSG_DONE
    message that terminates the dump, so the above
    situation can be detected.

    To achieve this, add a sequence counter to the netlink
    callback struct. Of course, netlink code still needs
    to use this new functionality. The correct way to do
    that is to always set cb->seq when a dumpit callback
    is invoked and call nl_dump_check_consistent() for
    each new message. The core code will also call this
    function for the final MSG_DONE message.

    To make it usable with generic netlink, a new function
    genlmsg_nlhdr() is needed to obtain the netlink header
    from the genetlink user header.

    Signed-off-by: Johannes Berg
    Acked-by: David S. Miller
    Signed-off-by: John W. Linville

    Johannes Berg
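
The consistency check described in the entry above can be modelled in a few lines of userspace C. This is a sketch under stated assumptions, not the kernel code: the struct and function names (`demo_callback`, `demo_msghdr`, `demo_check_consistent`, `demo_round`) are hypothetical, and `DEMO_F_DUMP_INTR` is a local illustrative flag value. It also assumes generation counters are nonzero, since zero is used here to mean "no previous round".

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_F_DUMP_INTR 0x10 /* local flag bit, for illustration only */

/* Hypothetical per-dump state, standing in for the netlink callback struct. */
struct demo_callback {
    unsigned int seq;      /* generation seen by the current dump round */
    unsigned int prev_seq; /* generation seen by the previous round, 0 if none */
};

/* Hypothetical message header carrying only the flags field. */
struct demo_msghdr {
    unsigned int flags;
};

/* Flag the message if the generation changed since the previous round,
 * then remember the current generation for the next comparison. */
static void demo_check_consistent(struct demo_callback *cb,
                                  struct demo_msghdr *h)
{
    assert(cb != NULL && h != NULL);
    if (cb->prev_seq && cb->seq != cb->prev_seq)
        h->flags |= DEMO_F_DUMP_INTR;
    cb->prev_seq = cb->seq;
}

/* One dump round: record the table's generation, check one message,
 * and return the flags that message would carry. */
static unsigned int demo_round(struct demo_callback *cb,
                               unsigned int generation)
{
    struct demo_msghdr h = { 0 };

    cb->seq = generation;
    demo_check_consistent(cb, &h);
    return h.flags;
}
```

In this model a reader that sees the flag on any message, including the final one, knows the table changed between rounds and can restart the dump.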
     

17 Jun, 2011

1 commit


10 Jun, 2011

1 commit

  • The message size allocated for rtnl ifinfo dumps was limited to
    a single page. This is not enough for additional interface info
    available with devices that support SR-IOV and caused a bug in
    which VF info would not be displayed if more than approximately
    40 VFs were created per interface.

    Implement a new function pointer for the rtnl_register service that will
    calculate the amount of data required for the ifinfo dump and allocate
    enough data to satisfy the request.

    Signed-off-by: Greg Rose
    Signed-off-by: Jeff Kirsher

    Greg Rose
     

24 May, 2011

1 commit

  • The %pK format specifier is designed to hide exposed kernel pointers,
    specifically via /proc interfaces. Exposing these pointers provides an
    easy target for kernel write vulnerabilities, since they reveal the
    locations of writable structures containing easily triggerable function
    pointers. The behavior of %pK depends on the kptr_restrict sysctl.

    If kptr_restrict is set to 0, no deviation from the standard %p behavior
    occurs. If kptr_restrict is set to 1, the default, if the current user
    (intended to be a reader via seq_printf(), etc.) does not have CAP_SYSLOG
    (currently in the LSM tree), kernel pointers using %pK are printed as 0's.
    If kptr_restrict is set to 2, kernel pointers using %pK are printed as
    0's regardless of privileges. Replacing with 0's was chosen over the
    default "(null)", which cannot be parsed by userland %p, which expects
    "(nil)".

    The supporting code for kptr_restrict and %pK is currently in the -mm
    tree. This patch converts users of %p in net/ to %pK. Cases of printing
    pointers to the syslog are not covered, since this would eliminate useful
    information for postmortem debugging and the reading of the syslog is
    already optionally protected by the dmesg_restrict sysctl.

    Signed-off-by: Dan Rosenberg
    Cc: James Morris
    Cc: Eric Dumazet
    Cc: Thomas Graf
    Cc: Eugene Teo
    Cc: Kees Cook
    Cc: Ingo Molnar
    Cc: David S. Miller
    Cc: Peter Zijlstra
    Cc: Eric Paris
    Signed-off-by: Andrew Morton
    Signed-off-by: David S. Miller

    Dan Rosenberg
     

08 May, 2011

1 commit


04 Mar, 2011

3 commits


01 Mar, 2011

1 commit

  • netlink_dump() may fail, but nobody handles its error.
    It generates output data once a previous portion has been returned to
    user space. This mechanism is used when all the data does not fit in
    one skb. If we enter netlink_recvmsg() and there is no skb in the
    receive queue, netlink_dump() will not be executed. So if netlink_dump()
    fails once, new data never appears and the reader sleeps forever.

    netlink_dump() is called from two places:

    1. from netlink_sendmsg->...->netlink_dump_start().
    Here we can report the error directly and it will be returned
    by sendmsg().

    2. from netlink_recvmsg
    Here we can't report the error directly, because we have a portion of
    valid output data and call netlink_dump() to prepare the next portion.
    If netlink_dump() fails, the socket is marked as being in error and the
    next recvmsg will fail.

    Signed-off-by: Andrey Vagin
    Signed-off-by: David S. Miller

    Andrey Vagin
     

20 Jan, 2011

1 commit


10 Jan, 2011

1 commit

  • Because NLM_F_DUMP is composed of two bits, NLM_F_ROOT | NLM_F_MATCH,
    "if (x & NLM_F_DUMP)" tests for _either_ of the bits being set.
    Because NLM_F_MATCH's value overlaps with NLM_F_EXCL, non-dump
    requests with NLM_F_EXCL set are mistaken for dump requests.

    Change the condition to test for _all_ bits being set.

    Signed-off-by: Jan Engelhardt
    Acked-by: Pablo Neira Ayuso
    Signed-off-by: David S. Miller

    Jan Engelhardt
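
The difference between the two tests in the entry above is easy to see in isolation. The sketch below uses local `DEMO_`-prefixed macros whose values mirror the netlink flag bits; the macro and function names are assumptions made for this illustration.

```c
#include <assert.h>

/* Local copies of the flag values; the DEMO_ prefix marks them as such. */
#define DEMO_NLM_F_ROOT  0x100
#define DEMO_NLM_F_MATCH 0x200
#define DEMO_NLM_F_DUMP  (DEMO_NLM_F_ROOT | DEMO_NLM_F_MATCH)
#define DEMO_NLM_F_EXCL  0x200 /* same bit as MATCH, used by non-dump requests */

/* Old test: true if EITHER bit of the dump mask is set. */
static int is_dump_either_bit(unsigned int flags)
{
    return (flags & DEMO_NLM_F_DUMP) != 0;
}

/* Fixed test: true only if ALL bits of the dump mask are set. */
static int is_dump_all_bits(unsigned int flags)
{
    return (flags & DEMO_NLM_F_DUMP) == DEMO_NLM_F_DUMP;
}

/* Self-check: a request carrying only EXCL fools the old test
 * but not the fixed one; a real dump request passes both. */
static void demo_flag_check(void)
{
    assert(is_dump_either_bit(DEMO_NLM_F_EXCL));
    assert(!is_dump_all_bits(DEMO_NLM_F_EXCL));
    assert(is_dump_either_bit(DEMO_NLM_F_DUMP));
    assert(is_dump_all_bits(DEMO_NLM_F_DUMP));
}
```

The general rule is that a multi-bit mask needs `(x & MASK) == MASK` whenever any of its bits is shared with an unrelated flag.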
     

25 Oct, 2010

1 commit

  • commit 6c04bb18ddd633 (netlink: use call_rcu for netlink_change_ngroups)
    used a somewhat convoluted and racy way to perform call_rcu().

    The old block of memory is freed after a grace period, but the rcu_head
    used to track it is located in the new block.

    This can clash if netlink_change_ngroups() is called two or more times
    and one block is freed before another. call_rcu() called on different
    cpus makes no guarantee about the order of callbacks.

    Fix this using a more standard approach: each block of memory
    contains its own rcu_head, so that no use-after-free can happen.

    Signed-off-by: Eric Dumazet
    CC: Johannes Berg
    CC: Paul E. McKenney
    Signed-off-by: David S. Miller

    Eric Dumazet
     

09 Oct, 2010

1 commit


06 Oct, 2010

1 commit

  • Each family may have some amount of boilerplate
    locking code that applies to most, or even all,
    commands.

    This allows a family to handle such things in
    a more generic way, by allowing it to
    a) include private flags in each operation,
    b) specify a pre_doit hook that is called
       before an operation's doit() callback and
       may return an error directly,
    c) specify a post_doit hook that can undo
       locking or similar things done by pre_doit,
       and finally
    d) include two private pointers in each info
       struct passed between all these operations
       including doit(). (It's two because I'll
       need two in nl80211 -- can be extended.)

    Signed-off-by: Johannes Berg
    Acked-by: David S. Miller
    Signed-off-by: John W. Linville

    Johannes Berg
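
The hook ordering described in the entry above can be sketched as a small userspace dispatcher. All names here (`demo_family`, `demo_op`, `demo_info`, `demo_dispatch`, `DEMO_FLAG_NEED_LOCK`, and the example hooks) are hypothetical, and the integer "lock depth" merely stands in for real locking; this models the idea and is not the generic netlink implementation.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_FLAG_NEED_LOCK 0x1 /* hypothetical private per-operation flag */

struct demo_info {
    void *user_ptr[2]; /* two private pointers shared by the hooks and doit */
};

struct demo_op {
    unsigned int internal_flags;
    int (*doit)(struct demo_info *info);
};

struct demo_family {
    int (*pre_doit)(const struct demo_op *op, struct demo_info *info);
    void (*post_doit)(const struct demo_op *op, struct demo_info *info);
};

/* Run pre_doit, then doit, then post_doit. A pre_doit error is returned
 * directly, in which case neither doit nor post_doit runs. */
static int demo_dispatch(const struct demo_family *fam,
                         const struct demo_op *op, struct demo_info *info)
{
    int err;

    assert(fam != NULL && op != NULL && op->doit != NULL && info != NULL);
    if (fam->pre_doit) {
        err = fam->pre_doit(op, info);
        if (err)
            return err;
    }
    err = op->doit(info);
    if (fam->post_doit)
        fam->post_doit(op, info);
    return err;
}

static int demo_lock_depth; /* stands in for a real lock */

/* Example pre hook: "take the lock" if the operation asks for it. */
static int demo_pre(const struct demo_op *op, struct demo_info *info)
{
    if (op->internal_flags & DEMO_FLAG_NEED_LOCK)
        demo_lock_depth++;
    info->user_ptr[0] = &demo_lock_depth;
    return 0;
}

/* Example post hook: undo what the pre hook did. */
static void demo_post(const struct demo_op *op, struct demo_info *info)
{
    (void)info;
    if (op->internal_flags & DEMO_FLAG_NEED_LOCK)
        demo_lock_depth--;
}

/* Example operation: reports the lock depth it observed while running. */
static int demo_doit(struct demo_info *info)
{
    int *depth = info->user_ptr[0];

    return *depth;
}
```

With these hooks, an operation flagged `DEMO_FLAG_NEED_LOCK` observes the lock as held inside `demo_doit()` and released again once `demo_dispatch()` returns, without any locking code in the operation itself.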
     

01 Sep, 2010

1 commit


19 Aug, 2010

1 commit

  • Since
    commit 1dacc76d0014a034b8aca14237c127d7c19d7726
    Author: Johannes Berg
    Date: Wed Jul 1 11:26:02 2009 +0000

    net/compat/wext: send different messages to compat tasks

    we had a race condition when setting and then
    restoring frag_list. Eric attempted to fix it,
    but the fix created even worse problems.

    However, the original motivation I had when I
    added the code that turned out to be racy is
    no longer clear to me, since we only copy up
    to skb->len to userspace, which doesn't include
    the frag_list length. As a result, not doing
    any frag_list clearing and restoring avoids
    the race condition, while not introducing any
    other problems.

    Additionally, while preparing this patch I found
    that since none of the remaining netlink code is
    really aware of the frag_list, we need to use the
    original skb's information for packet information
    and credentials. This fixes, for example, the
    group information received by compat tasks.

    Cc: Eric Dumazet
    Cc: stable@kernel.org [2.6.31+, for 2.6.35 revert 1235f504aa]
    Signed-off-by: Johannes Berg
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Johannes Berg
     

16 Aug, 2010

1 commit


27 Jul, 2010

3 commits