19 Mar, 2019

1 commit

  • [ Upstream commit 6d2b0f02f5a07a4bf02e4cbc90d7eaa85cac2986 ]

    proc_exit_connector() reads ->real_parent without holding a lock. This
    is not safe because the parent can go away at any moment, so use RCU to
    protect the access and ensure that the parent task is not released.

    [ 747.624551] ==================================================================
    [ 747.632946] BUG: KASAN: use-after-free in proc_exit_connector+0x1f7/0x310
    [ 747.640686] Read of size 4 at addr ffff88a0276988e0 by task sshd/2882
    [ 747.648032]
    [ 747.649804] CPU: 11 PID: 2882 Comm: sshd Tainted: G E 4.19.26-rc2 #11
    [ 747.658629] Hardware name: IBM x3550M4 -[7914OFV]-/00AM544, BIOS -[D7E142BUS-1.71]- 07/31/2014
    [ 747.668419] Call Trace:
    [ 747.671269] dump_stack+0xf0/0x19b
    [ 747.675186] ? show_regs_print_info+0x5/0x5
    [ 747.679988] ? kmsg_dump_rewind_nolock+0x59/0x59
    [ 747.685302] print_address_description+0x6a/0x270
    [ 747.691162] kasan_report+0x258/0x380
    [ 747.695835] ? proc_exit_connector+0x1f7/0x310
    [ 747.701402] proc_exit_connector+0x1f7/0x310
    [ 747.706767] ? proc_coredump_connector+0x2d0/0x2d0
    [ 747.712715] ? _raw_write_unlock_irq+0x29/0x50
    [ 747.718270] ? _raw_write_unlock_irq+0x29/0x50
    [ 747.723820] ? ___preempt_schedule+0x16/0x18
    [ 747.729193] ? ___preempt_schedule+0x16/0x18
    [ 747.734574] do_exit+0xa11/0x14f0
    [ 747.738880] ? mm_update_next_owner+0x590/0x590
    [ 747.744525] ? debug_show_all_locks+0x3c0/0x3c0
    [ 747.761448] ? ktime_get_coarse_real_ts64+0xeb/0x1c0
    [ 747.767589] ? lockdep_hardirqs_on+0x1a6/0x290
    [ 747.773154] ? check_chain_key+0x139/0x1f0
    [ 747.778345] ? check_flags.part.35+0x240/0x240
    [ 747.783908] ? __lock_acquire+0x2300/0x2300
    [ 747.789171] ? _raw_spin_unlock_irqrestore+0x59/0x70
    [ 747.795316] ? _raw_spin_unlock_irqrestore+0x59/0x70
    [ 747.801457] ? do_raw_spin_unlock+0x10f/0x1e0
    [ 747.806914] ? do_raw_spin_trylock+0x120/0x120
    [ 747.812481] ? preempt_count_sub+0x14/0xc0
    [ 747.817645] ? _raw_spin_unlock+0x2e/0x50
    [ 747.822708] ? __handle_mm_fault+0x12db/0x1fa0
    [ 747.828367] ? __pmd_alloc+0x2d0/0x2d0
    [ 747.833143] ? check_noncircular+0x50/0x50
    [ 747.838309] ? match_held_lock+0x7f/0x340
    [ 747.843380] ? check_noncircular+0x50/0x50
    [ 747.848561] ? handle_mm_fault+0x21a/0x5f0
    [ 747.853730] ? check_flags.part.35+0x240/0x240
    [ 747.859290] ? check_chain_key+0x139/0x1f0
    [ 747.864474] ? __do_page_fault+0x40f/0x760
    [ 747.869655] ? __audit_syscall_entry+0x4b/0x1f0
    [ 747.875319] ? syscall_trace_enter+0x1d5/0x7b0
    [ 747.880877] ? trace_raw_output_preemptirq_template+0x90/0x90
    [ 747.887895] ? trace_raw_output_sys_exit+0x80/0x80
    [ 747.893860] ? up_read+0x3b/0x90
    [ 747.898142] ? stop_critical_timings+0x260/0x260
    [ 747.903909] do_group_exit+0xe0/0x1c0
    [ 747.908591] ? __x64_sys_exit+0x30/0x30
    [ 747.913460] ? trace_raw_output_preemptirq_template+0x90/0x90
    [ 747.920485] ? tracer_hardirqs_on+0x270/0x270
    [ 747.925956] __x64_sys_exit_group+0x28/0x30
    [ 747.931214] do_syscall_64+0x117/0x400
    [ 747.935988] ? syscall_return_slowpath+0x2f0/0x2f0
    [ 747.941931] ? trace_hardirqs_off_thunk+0x1a/0x1c
    [ 747.947788] ? trace_hardirqs_on_caller+0x1d0/0x1d0
    [ 747.953838] ? lockdep_sys_exit+0x16/0x8e
    [ 747.958915] ? trace_hardirqs_off_thunk+0x1a/0x1c
    [ 747.964784] entry_SYSCALL_64_after_hwframe+0x49/0xbe
    [ 747.971021] RIP: 0033:0x7f572f154c68
    [ 747.975606] Code: Bad RIP value.
    [ 747.979791] RSP: 002b:00007ffed2dfaa58 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
    [ 747.989324] RAX: ffffffffffffffda RBX: 00007f572f431840 RCX: 00007f572f154c68
    [ 747.997910] RDX: 0000000000000001 RSI: 000000000000003c RDI: 0000000000000001
    [ 748.006495] RBP: 0000000000000001 R08: 00000000000000e7 R09: fffffffffffffee0
    [ 748.015079] R10: 00007f572f4387e8 R11: 0000000000000246 R12: 00007f572f431840
    [ 748.023664] R13: 000055a7f90f2c50 R14: 000055a7f96e2310 R15: 000055a7f96e2310
    [ 748.032287]
    [ 748.034509] Allocated by task 2300:
    [ 748.038982] kasan_kmalloc+0xa0/0xd0
    [ 748.043562] kmem_cache_alloc_node+0xf5/0x2e0
    [ 748.049018] copy_process+0x1781/0x4790
    [ 748.053884] _do_fork+0x166/0x9a0
    [ 748.058163] do_syscall_64+0x117/0x400
    [ 748.062943] entry_SYSCALL_64_after_hwframe+0x49/0xbe
    [ 748.069180]
    [ 748.071405] Freed by task 15395:
    [ 748.075591] __kasan_slab_free+0x130/0x180
    [ 748.080752] kmem_cache_free+0xc2/0x310
    [ 748.085619] free_task+0xea/0x130
    [ 748.089901] __put_task_struct+0x177/0x230
    [ 748.095063] finish_task_switch+0x51b/0x5d0
    [ 748.100315] __schedule+0x506/0xfa0
    [ 748.104791] schedule+0xca/0x260
    [ 748.108978] futex_wait_queue_me+0x27e/0x420
    [ 748.114333] futex_wait+0x251/0x550
    [ 748.118814] do_futex+0x75b/0xf80
    [ 748.123097] __x64_sys_futex+0x231/0x2a0
    [ 748.128065] do_syscall_64+0x117/0x400
    [ 748.132835] entry_SYSCALL_64_after_hwframe+0x49/0xbe
    [ 748.139066]
    [ 748.141289] The buggy address belongs to the object at ffff88a027698000
    [ 748.141289] which belongs to the cache task_struct of size 12160
    [ 748.156589] The buggy address is located 2272 bytes inside of
    [ 748.156589] 12160-byte region [ffff88a027698000, ffff88a02769af80)
    [ 748.171114] The buggy address belongs to the page:
    [ 748.177055] page:ffffea00809da600 count:1 mapcount:0 mapping:ffff888107d01e00 index:0x0 compound_mapcount: 0
    [ 748.189136] flags: 0x57ffffc0008100(slab|head)
    [ 748.194688] raw: 0057ffffc0008100 ffffea00809a3200 0000000300000003 ffff888107d01e00
    [ 748.204424] raw: 0000000000000000 0000000000020002 00000001ffffffff 0000000000000000
    [ 748.214146] page dumped because: kasan: bad access detected
    [ 748.220976]
    [ 748.223197] Memory state around the buggy address:
    [ 748.229128] ffff88a027698780: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
    [ 748.238271] ffff88a027698800: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
    [ 748.247414] >ffff88a027698880: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
    [ 748.256564] ^
    [ 748.264267] ffff88a027698900: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
    [ 748.273493] ffff88a027698980: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
    [ 748.282630] ==================================================================

    Fixes: b086ff87251b4a4 ("connector: add parent pid and tgid to coredump and exit events")
    Signed-off-by: Zhang Yu
    Signed-off-by: Li RongQing
    Acked-by: Evgeniy Polyakov
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Li RongQing
     

08 Jul, 2018

1 commit

  • Fix a build warning in connector.c when CONFIG_PROC_FS is not enabled
    by marking the unused function as __maybe_unused.

    ../drivers/connector/connector.c:242:12: warning: 'cn_proc_show' defined but not used [-Wunused-function]

    Signed-off-by: Randy Dunlap
    Cc: Evgeniy Polyakov
    Cc: netdev@vger.kernel.org
    Signed-off-by: David S. Miller

    Randy Dunlap
     

07 Jun, 2018

1 commit

  • Pull networking updates from David Miller:

    1) Add Maglev hashing scheduler to IPVS, from Inju Song.

    2) Lots of new TC subsystem tests from Roman Mashak.

    3) Add TCP zero copy receive and fix delayed acks and autotuning with
    SO_RCVLOWAT, from Eric Dumazet.

    4) Add XDP_REDIRECT support to mlx5 driver, from Jesper Dangaard
    Brouer.

    5) Add ttl inherit support to vxlan, from Hangbin Liu.

    6) Properly separate ipv6 routes into their logically independent
    components. fib6_info for the routing table, and fib6_nh for sets of
    nexthops, which thus can be shared. From David Ahern.

    7) Add bpf_xdp_adjust_tail helper, which can be used to generate ICMP
    messages from XDP programs. From Nikita V. Shirokov.

    8) Lots of long overdue cleanups to the r8169 driver, from Heiner
    Kallweit.

    9) Add BTF ("BPF Type Format"), from Martin KaFai Lau.

    10) Add traffic condition monitoring to iwlwifi, from Luca Coelho.

    11) Plumb extack down into fib_rules, from Roopa Prabhu.

    12) Add Flower classifier offload support to igb, from Vinicius Costa
    Gomes.

    13) Add UDP GSO support, from Willem de Bruijn.

    14) Add documentation for eBPF helpers, from Quentin Monnet.

    15) Add TLS tx offload to mlx5, from Ilya Lesokhin.

    16) Allow applications to be given the number of bytes available to read
    on a socket via a control message returned from recvmsg(), from
    Soheil Hassas Yeganeh.

    17) Add x86_32 eBPF JIT compiler, from Wang YanQing.

    18) Add AF_XDP sockets, with zerocopy support infrastructure as well.
    From Björn Töpel.

    19) Remove indirect load support from all of the BPF JITs and handle
    these operations in the verifier by translating them into native BPF
    instead. From Daniel Borkmann.

    20) Add GRO support to ipv6 gre tunnels, from Eran Ben Elisha.

    21) Allow XDP programs to do lookups in the main kernel routing tables
    for forwarding. From David Ahern.

    22) Allow drivers to store hardware state into an ELF section of kernel
    dump vmcore files, and use it in cxgb4. From Rahul Lakkireddy.

    23) Various RACK and loss detection improvements in TCP, from Yuchung
    Cheng.

    24) Add TCP SACK compression, from Eric Dumazet.

    25) Add User Mode Helper support and basic bpfilter infrastructure, from
    Alexei Starovoitov.

    26) Support ports and protocol values in RTM_GETROUTE, from Roopa
    Prabhu.

    27) Support bulking in ->ndo_xdp_xmit() API, from Jesper Dangaard
    Brouer.

    28) Add lots of forwarding selftests, from Petr Machata.

    29) Add generic network device failover driver, from Sridhar Samudrala.

    * ra.kernel.org:/pub/scm/linux/kernel/git/davem/net-next: (1959 commits)
    strparser: Add __strp_unpause and use it in ktls.
    rxrpc: Fix terminal retransmission connection ID to include the channel
    net: hns3: Optimize PF CMDQ interrupt switching process
    net: hns3: Fix for VF mailbox receiving unknown message
    net: hns3: Fix for VF mailbox cannot receiving PF response
    bnx2x: use the right constant
    Revert "net: sched: cls: Fix offloading when ingress dev is vxlan"
    net: dsa: b53: Fix for brcm tag issue in Cygnus SoC
    enic: fix UDP rss bits
    netdev-FAQ: clarify DaveM's position for stable backports
    rtnetlink: validate attributes in do_setlink()
    mlxsw: Add extack messages for port_{un, }split failures
    netdevsim: Add extack error message for devlink reload
    devlink: Add extack to reload and port_{un, }split operations
    net: metrics: add proper netlink validation
    ipmr: fix error path when ipmr_new_table fails
    ip6mr: only set ip6mr_table from setsockopt when ip6mr_new_table succeeds
    net: hns3: remove unused hclgevf_cfg_func_mta_filter
    netfilter: provide udp*_lib_lookup for nf_tproxy
    qed*: Utilize FW 8.37.2.0
    ...

    Linus Torvalds
     

16 May, 2018

1 commit


02 May, 2018

1 commit

  • The intention is to get notified of process failures (e.g. in some
    process manager) as soon as possible, before a possible core dump,
    which could take a very long time. Coredump and exit process events
    are perfect for such use cases (see 2b5faa4c553f "connector: Added
    coredumping event to the process connector").

    The problem is that, for now, the process manager cannot learn the
    parent of a dying process using connectors. This is useful when the
    process manager should monitor failures only for children of certain
    parents, so that coredump and exit events can be filtered by parent
    process and/or thread ID.

    Add parent pid and tgid to coredump and exit process connectors event
    data.
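
A minimal userspace sketch of the filtering this commit enables, assuming UAPI headers recent enough to carry the parent_pid/parent_tgid fields in the exit and coredump event data. The helper name event_has_parent is hypothetical and not part of any kernel or library API.

```c
#include <stdbool.h>
#include <sys/types.h>
#include <linux/cn_proc.h>

/*
 * Hypothetical helper: returns true when ev is an exit or coredump event
 * whose parent thread group ID equals wanted_parent_tgid.
 */
bool event_has_parent(const struct proc_event *ev, pid_t wanted_parent_tgid)
{
    switch (ev->what) {
    case PROC_EVENT_EXIT:
        return ev->event_data.exit.parent_tgid == wanted_parent_tgid;
    case PROC_EVENT_COREDUMP:
        return ev->event_data.coredump.parent_tgid == wanted_parent_tgid;
    default:
        /* Other event types are not filtered by parent in this sketch. */
        return false;
    }
}
```

A process manager could call such a helper on every received proc_event and ignore events for which it returns false.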

    Signed-off-by: Stefan Strogin
    Acked-by: Evgeniy Polyakov
    Signed-off-by: David S. Miller

    Stefan Strogin
     

22 Oct, 2017

1 commit

  • atomic_t variables are currently used to implement reference
    counters with the following properties:
    - counter is initialized to 1 using atomic_set()
    - a resource is freed upon counter reaching zero
    - once counter reaches zero, its further
    increments aren't allowed
    - counter schema uses basic atomic operations
    (set, inc, inc_not_zero, dec_and_test, etc.)

    Such atomic variables should be converted to a newly provided
    refcount_t type and API that prevents accidental counter overflows
    and underflows. This is important since overflows and underflows
    can lead to use-after-free situation and be exploitable.

    The variable cn_callback_entry.refcnt is used as pure reference counter.
    Convert it to refcount_t and fix up the operations.
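
The properties listed above can be illustrated with a small userspace model built on C11 atomics. This is only a sketch of the idea (refuse to increment from zero, saturate instead of wrapping); it is not the kernel's refcount_t implementation, and the names struct ref, ref_init, ref_inc_not_zero and ref_dec_and_test are hypothetical.

```c
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace model of a saturating reference counter. */
struct ref {
    atomic_uint count;
};

void ref_init(struct ref *r)
{
    atomic_init(&r->count, 1);   /* the creator holds the first reference */
}

/* Take a reference unless the object is already dead (count == 0). */
bool ref_inc_not_zero(struct ref *r)
{
    unsigned int old = atomic_load(&r->count);

    do {
        if (old == 0)
            return false;        /* never resurrect a freed object */
        if (old == UINT_MAX)
            return true;         /* saturated: stay pinned, never wrap */
    } while (!atomic_compare_exchange_weak(&r->count, &old, old + 1));
    return true;
}

/* Drop a reference; returns true when the caller must free the object. */
bool ref_dec_and_test(struct ref *r)
{
    unsigned int old = atomic_load(&r->count);

    do {
        if (old == 0 || old == UINT_MAX)
            return false;        /* underflow attempt or saturated: leak, do not free */
    } while (!atomic_compare_exchange_weak(&r->count, &old, old - 1));
    return old == 1;             /* old is the value before the decrement */
}
```

Leaking a saturated object is a deliberate trade-off in this model: a leak is less harmful than a wrapped counter that leads to a use-after-free.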

    Suggested-by: Kees Cook
    Reviewed-by: David Windsor
    Reviewed-by: Hans Liljestrand
    Signed-off-by: Elena Reshetova
    Signed-off-by: David S. Miller

    Elena Reshetova
     

06 Jul, 2016

1 commit

  • The Kconfig controlling build of this code is currently:

    drivers/connector/Kconfig:config PROC_EVENTS
    drivers/connector/Kconfig: bool "Report process events to userspace"

    ...meaning that it currently is not being built as a module by anyone.
    Let's remove the two modular references, so that when reading the driver
    there is no doubt it is builtin-only.

    Since module_init translates to device_initcall in the non-modular
    case, the init ordering remains unchanged with this commit.

    Cc: Evgeniy Polyakov
    Cc: netdev@vger.kernel.org
    Signed-off-by: Paul Gortmaker
    Signed-off-by: David S. Miller

    Paul Gortmaker
     

28 Jun, 2016

1 commit

  • The proc connector messages include a sequence number, allowing userspace
    programs to detect lost messages. However, performing this detection is
    currently more difficult than necessary, since netlink messages can be
    delivered to the application out-of-order. To fix this, leave pre-emption
    disabled during cn_netlink_send(), and use GFP_NOWAIT.

    The following was written as a test case. Building the kernel w/ make -j32
    proved a reliable way to generate out-of-order cn_proc messages.

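    #define _GNU_SOURCE             /* for CPU_SETSIZE in <sched.h> */
    #include <sched.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/socket.h>
    #include <linux/netlink.h>
    #include <linux/connector.h>
    #include <linux/cn_proc.h>

    int
    main(int argc, char *argv[])
    {
        static uint32_t last_seq[CPU_SETSIZE], seq;
        int cpu, fd;
        struct sockaddr_nl sa;
        /* Receive buffer: netlink header + connector header + proc event. */
        struct __attribute__((aligned(NLMSG_ALIGNTO))) {
            struct nlmsghdr nl_hdr;
            struct __attribute__((__packed__)) {
                struct cn_msg cn_msg;
                struct proc_event cn_proc;
            };
        } rmsg;
        /* Send buffer: netlink header + connector header + mcast op. */
        struct __attribute__((aligned(NLMSG_ALIGNTO))) {
            struct nlmsghdr nl_hdr;
            struct __attribute__((__packed__)) {
                struct cn_msg cn_msg;
                enum proc_cn_mcast_op cn_mcast;
            };
        } smsg;

        fd = socket(PF_NETLINK, SOCK_DGRAM, NETLINK_CONNECTOR);
        if (fd < 0) {
            perror("socket");
            return 1;
        }

        /* Join the process-events multicast group. */
        memset(&sa, 0, sizeof(sa));
        sa.nl_family = AF_NETLINK;
        sa.nl_groups = CN_IDX_PROC;
        sa.nl_pid = getpid();
        if (bind(fd, (struct sockaddr *)&sa, sizeof(sa)) < 0) {
            perror("bind");
            return 1;
        }

        /* Ask the kernel to start sending process events. */
        memset(&smsg, 0, sizeof(smsg));
        smsg.nl_hdr.nlmsg_len = sizeof(smsg);
        smsg.nl_hdr.nlmsg_pid = getpid();
        smsg.nl_hdr.nlmsg_type = NLMSG_DONE;
        smsg.cn_msg.id.idx = CN_IDX_PROC;
        smsg.cn_msg.id.val = CN_VAL_PROC;
        smsg.cn_msg.len = sizeof(enum proc_cn_mcast_op);
        smsg.cn_mcast = PROC_CN_MCAST_LISTEN;
        if (send(fd, &smsg, sizeof(smsg), 0) != sizeof(smsg)) {
            perror("send");
            return 1;
        }

        while (recv(fd, &rmsg, sizeof(rmsg), 0) == sizeof(rmsg)) {
            cpu = rmsg.cn_proc.cpu;
            /* Skip events without a valid CPU and guard the array index. */
            if (cpu < 0 || cpu >= CPU_SETSIZE) {
                continue;
            }
            seq = rmsg.cn_msg.seq;
            /* Sequence numbers are expected to increase by one per CPU. */
            if ((last_seq[cpu] != 0) && (seq != last_seq[cpu] + 1)) {
                printf("out-of-order seq=%u on cpu=%d\n", seq, cpu);
            }
            last_seq[cpu] = seq;
        }

        /* NOTREACHED */

        perror("recv");

        return -1;
    }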

    Signed-off-by: Aaron Campbell
    Signed-off-by: David S. Miller

    Aaron Campbell
     

05 Jan, 2016

1 commit


07 Nov, 2015

1 commit

  • …d avoiding waking kswapd

    __GFP_WAIT has been used to identify atomic context in callers that hold
    spinlocks or are in interrupts. They are expected to be high priority and
    have access to one of two watermarks lower than "min" which can be referred
    to as the "atomic reserve". __GFP_HIGH users get access to the first
    lower watermark and can be called the "high priority reserve".

    Over time, callers had a requirement to not block when fallback options
    were available. Some have abused __GFP_WAIT leading to a situation where
    an optimistic allocation with a fallback option can access atomic
    reserves.

    This patch uses __GFP_ATOMIC to identify callers that are truly atomic,
    cannot sleep and have no alternative. High priority users continue to use
    __GFP_HIGH. __GFP_DIRECT_RECLAIM identifies callers that can sleep and
    are willing to enter direct reclaim. __GFP_KSWAPD_RECLAIM identifies
    callers that want to wake kswapd for background reclaim. __GFP_WAIT is
    redefined as a caller that is willing to enter direct reclaim and wake
    kswapd for background reclaim.

    This patch then converts a number of sites:

    o __GFP_ATOMIC is used by callers that are high priority and have memory
    pools for those requests. GFP_ATOMIC uses this flag.

    o Callers that have a limited mempool to guarantee forward progress clear
    __GFP_DIRECT_RECLAIM but keep __GFP_KSWAPD_RECLAIM. bio allocations fall
    into this category where kswapd will still be woken but atomic reserves
    are not used as there is a one-entry mempool to guarantee progress.

    o Callers that are checking if they are non-blocking should use the
    helper gfpflags_allow_blocking() where possible. This is because
    checking for __GFP_WAIT as was done historically now can trigger false
    positives. Some exceptions like dm-crypt.c exist where the code intent
    is clearer if __GFP_DIRECT_RECLAIM is used instead of the helper due to
    flag manipulations.

    o Callers that built their own GFP flags instead of starting with GFP_KERNEL
    and friends now also need to specify __GFP_KSWAPD_RECLAIM.

    The first key hazard to watch out for is callers that removed __GFP_WAIT
    and were depending on access to atomic reserves for inconspicuous reasons.
    In some cases it may be appropriate for them to use __GFP_HIGH.

    The second key hazard is callers that assembled their own combination of
    GFP flags instead of starting with something like GFP_KERNEL. They may
    now wish to specify __GFP_KSWAPD_RECLAIM. It's almost certainly harmless
    if it's missed in most cases as other activity will wake kswapd.
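
The flag semantics described above can be modelled in a few lines of userspace C. This is a sketch only: the SKETCH_ names and bit positions are made up for illustration, and the composite masks reflect an assumption about how GFP_ATOMIC, GFP_NOWAIT and GFP_KERNEL are built from the individual flags after this patch. The blocking test mirrors what the message says about gfpflags_allow_blocking(): a caller may block only if it allows direct reclaim.

```c
#include <stdbool.h>

/* Illustrative bit positions only; not the kernel's actual values. */
enum {
    SKETCH_GFP_HIGH           = 1 << 0,  /* may use the high priority reserve */
    SKETCH_GFP_ATOMIC         = 1 << 1,  /* truly atomic, cannot sleep */
    SKETCH_GFP_DIRECT_RECLAIM = 1 << 2,  /* caller may enter direct reclaim */
    SKETCH_GFP_KSWAPD_RECLAIM = 1 << 3,  /* caller wants kswapd woken */
    SKETCH_GFP_IO             = 1 << 4,
    SKETCH_GFP_FS             = 1 << 5,

    /* Assumed composition of the common masks. */
    SKETCH_ATOMIC = SKETCH_GFP_HIGH | SKETCH_GFP_ATOMIC | SKETCH_GFP_KSWAPD_RECLAIM,
    SKETCH_NOWAIT = SKETCH_GFP_KSWAPD_RECLAIM,
    SKETCH_KERNEL = SKETCH_GFP_DIRECT_RECLAIM | SKETCH_GFP_KSWAPD_RECLAIM |
                    SKETCH_GFP_IO | SKETCH_GFP_FS,
};

/* Model of the blocking check: blocking is allowed only with direct reclaim. */
bool sketch_allow_blocking(unsigned int flags)
{
    return (flags & SKETCH_GFP_DIRECT_RECLAIM) != 0;
}

/* Does an allocation with these flags wake kswapd for background reclaim? */
bool sketch_wakes_kswapd(unsigned int flags)
{
    return (flags & SKETCH_GFP_KSWAPD_RECLAIM) != 0;
}

/* Only callers flagged as truly atomic may dip into the atomic reserve. */
bool sketch_uses_atomic_reserve(unsigned int flags)
{
    return (flags & SKETCH_GFP_ATOMIC) != 0;
}
```

In this model a non-blocking allocation with a fallback (SKETCH_NOWAIT) still wakes kswapd but no longer touches the atomic reserve, which is the separation the patch is after.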

    Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
    Acked-by: Vlastimil Babka <vbabka@suse.cz>
    Acked-by: Michal Hocko <mhocko@suse.com>
    Acked-by: Johannes Weiner <hannes@cmpxchg.org>
    Cc: Christoph Lameter <cl@linux.com>
    Cc: David Rientjes <rientjes@google.com>
    Cc: Vitaly Wool <vitalywool@gmail.com>
    Cc: Rik van Riel <riel@redhat.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

    Mel Gorman
     

07 Jan, 2015

1 commit


27 Nov, 2014

1 commit

  • The struct cn_msg len field comes from userspace and needs to be
    validated. More logical to do so here where the cn_msg pointer is
    pulled out of the sk_buff than the callback which is passed cn_msg *
    and might assume no validation is needed.
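
A minimal sketch of this kind of validation, written as userspace C against the UAPI struct cn_msg. The function name cn_msg_len_ok and the payload_len parameter are assumptions for illustration; the kernel code performs its check against the sk_buff and netlink header instead.

```c
#include <stdbool.h>
#include <stddef.h>
#include <linux/connector.h>

/*
 * Hypothetical helper: payload_len is the number of bytes available
 * starting at msg. Returns true only if the header fits and the
 * sender-supplied msg->len does not run past the end of the buffer.
 */
bool cn_msg_len_ok(const struct cn_msg *msg, size_t payload_len)
{
    if (payload_len < sizeof(*msg))
        return false;                       /* header itself is truncated */
    return msg->len <= payload_len - sizeof(*msg);
}
```

Doing the check once, where the message is extracted, means callbacks can treat msg->len as trusted.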

    Reported-by: Dan Carpenter
    Acked-by: Evgeniy Polyakov
    Signed-off-by: David Fries
    Signed-off-by: Greg Kroah-Hartman

    David Fries
     

24 Jul, 2014

1 commit


03 Jun, 2014

1 commit

  • …gregkh/char-misc into next

    Pull char/misc driver patches from Greg KH:
    "Here is the big char / misc driver update for 3.16-rc1.

    Lots of different driver updates for a variety of different drivers
    and minor driver subsystems.

    All have been in linux-next with no reported issues"

    * tag 'char-misc-3.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (79 commits)
    hv: use correct order when freeing monitor_pages
    spmi: of: fixup generic SPMI devicetree binding example
    applicom: dereferencing NULL on error path
    misc: genwqe: fix uninitialized return value in genwqe_free_sync_sgl()
    miscdevice.h: Simple syntax fix to make pointers consistent.
    MAINTAINERS: Add miscdevice.h to file list for char/misc drivers.
    mcb: Add support for shared PCI IRQs
    drivers: Remove duplicate conditionally included subdirs
    misc: atmel_pwm: only build for supported platforms
    mei: me: move probe quirk to cfg structure
    mei: add per device configuration
    mei: me: read H_CSR after asserting reset
    mei: me: drop harmful wait optimization
    mei: me: fix hw ready reset flow
    mei: fix memory leak of mei_clients array
    uio: fix vma io range check in mmap
    drivers: uio_dmem_genirq: Fix memory leak in uio_dmem_genirq_probe()
    w1: do not unlock unheld list_mutex in __w1_remove_master_device()
    w1: optional bundling of netlink kernel replies
    connector: allow multiple messages to be sent in one packet
    ...

    Linus Torvalds
     

28 May, 2014

1 commit

  • This increases the amount of bundling to reduce the number of packets
    sent. For 1-Wire (w1) use, there can be multiple struct
    w1_netlink_cmd in a struct w1_netlink_msg and multiple of those in a
    struct cn_msg; with this change there can also be multiple struct
    cn_msg in a struct nlmsghdr. At each level, the len field determines
    whether more than one of the next-level structures follows.
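
A receiver of such bundled packets has to walk the payload record by record. The sketch below counts struct cn_msg records in a netlink payload, assuming they are packed back to back with no padding between one record's data and the next header; the function name cn_count_bundled is hypothetical.

```c
#include <stddef.h>
#include <string.h>
#include <linux/connector.h>

/*
 * Hypothetical helper: count the cn_msg records stored back to back in
 * payload. Returns -1 if a header or its data is truncated.
 */
int cn_count_bundled(const unsigned char *payload, size_t payload_len)
{
    size_t off = 0;
    int count = 0;

    while (off < payload_len) {
        struct cn_msg hdr;

        if (payload_len - off < sizeof(hdr))
            return -1;                      /* not enough room for a header */
        memcpy(&hdr, payload + off, sizeof(hdr));   /* avoids unaligned reads */
        if (hdr.len > payload_len - off - sizeof(hdr))
            return -1;                      /* data runs past the payload */
        off += sizeof(hdr) + hdr.len;       /* step to the next record */
        count++;
    }
    return count;
}
```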

    Signed-off-by: David Fries
    Acked-by: Evgeniy Polyakov
    Signed-off-by: Greg Kroah-Hartman

    David Fries
     

25 Apr, 2014

1 commit

  • It is possible to pass a netlink socket to a more privileged
    executable and then fool that executable into writing data to the
    socket that happens to be a valid netlink message, making it do
    something the privileged executable did not intend to do.

    To keep this from happening, replace bare capable and ns_capable calls
    with netlink_capable, netlink_net_capable and netlink_ns_capable calls,
    which act the same as the previous calls except that they also verify
    that the opener of the socket had the desired permissions.

    Reported-by: Andy Lutomirski
    Signed-off-by: "Eric W. Biederman"
    Signed-off-by: David S. Miller

    Eric W. Biederman
     

03 Apr, 2014

1 commit

  • Pull networking updates from David Miller:
    "Here is my initial pull request for the networking subsystem during
    this merge window:

    1) Support for ESN in AH (RFC 4302) from Fan Du.

    2) Add full kernel doc for ethtool command structures, from Ben
    Hutchings.

    3) Add BCM7xxx PHY driver, from Florian Fainelli.

    4) Export computed TCP rate information in netlink socket dumps, from
    Eric Dumazet.

    5) Allow IPSEC SA to be dumped partially using a filter, from Nicolas
    Dichtel.

    6) Convert many drivers to pci_enable_msix_range(), from Alexander
    Gordeev.

    7) Record SKB timestamps more efficiently, from Eric Dumazet.

    8) Switch to microsecond resolution for TCP round trip times, also
    from Eric Dumazet.

    9) Clean up and fix 6lowpan fragmentation handling by making use of
    the existing inet_frag api for its implementation.

    10) Add TX grant mapping to xen-netback driver, from Zoltan Kiss.

    11) Auto size SKB lengths when composing netlink messages based upon
    past message sizes used, from Eric Dumazet.

    12) qdisc dumps can take a long time, add a cond_resched(), From Eric
    Dumazet.

    13) Sanitize netpoll core and drivers wrt. SKB handling semantics.
    Get rid of never-used-in-tree netpoll RX handling. From Eric W
    Biederman.

    14) Support inter-address-family and namespace changing in VTI tunnel
    driver(s). From Steffen Klassert.

    15) Add Altera TSE driver, from Vince Bridgers.

    16) Optimizing csum_replace2() so that it doesn't adjust the checksum
    by checksumming the entire header, from Eric Dumazet.

    17) Expand BPF internal implementation for faster interpreting, more
    direct translations into JIT'd code, and much cleaner uses of BPF
    filtering in non-socket contexts. From Daniel Borkmann and Alexei
    Starovoitov"

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1976 commits)
    netpoll: Use skb_irq_freeable to make zap_completion_queue safe.
    net: Add a test to see if a skb is freeable in irq context
    qlcnic: Fix build failure due to undefined reference to `vxlan_get_rx_port'
    net: ptp: move PTP classifier in its own file
    net: sxgbe: make "core_ops" static
    net: sxgbe: fix logical vs bitwise operation
    net: sxgbe: sxgbe_mdio_register() frees the bus
    Call efx_set_channels() before efx->type->dimension_resources()
    xen-netback: disable rogue vif in kthread context
    net/mlx4: Set proper build dependancy with vxlan
    be2net: fix build dependency on VxLAN
    mac802154: make csma/cca parameters per-wpan
    mac802154: allow only one WPAN to be up at any given time
    net: filter: minor: fix kdoc in __sk_run_filter
    netlink: don't compare the nul-termination in nla_strcmp
    can: c_can: Avoid led toggling for every packet.
    can: c_can: Simplify TX interrupt cleanup
    can: c_can: Store dlc private
    can: c_can: Reduce register access
    can: c_can: Make the code readable
    ...

    Linus Torvalds
     

04 Mar, 2014

1 commit


08 Feb, 2014

1 commit


15 Nov, 2013

1 commit

  • In af3e095a1fb4, Erik Jacobsen fixed one type of unaligned access
    bug for ia64 by converting a 64-bit write to use put_unaligned().
    Unfortunately, since gcc will convert a short memset() to a series
    of appropriately-aligned stores, the problem is now visible again
    on tilegx, where the memset that zeros out proc_event is converted
    to three 64-bit stores, causing an unaligned access panic.

    A better fix for the original problem is to ensure that proc_event
    is aligned to 8 bytes here. We can do that relatively easily by
    arranging to start the struct cn_msg aligned to 8 bytes and then
    offset by 4 bytes. Doing so means that the immediately following
    proc_event structure is then correctly aligned to 8 bytes.

    The result is that the memset() stores are now aligned, and as an
    added benefit, we can remove the put_unaligned() calls in the code.
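
The alignment arithmetic can be checked in userspace. The sketch below assumes, as the message implies, that struct cn_msg is 20 bytes, so starting it 4 bytes into an 8-byte-aligned buffer puts the following proc_event at offset 24. The names SKETCH_MSG_SIZE, struct sketch_buf and sketch_buf_to_cn_msg are hypothetical, and the aligned attribute is GCC/Clang syntax.

```c
#include <stdint.h>
#include <linux/connector.h>
#include <linux/cn_proc.h>

/* Room for the 4-byte lead-in, the connector header and the event. */
#define SKETCH_MSG_SIZE (4 + sizeof(struct cn_msg) + sizeof(struct proc_event))

/* The +4 trick only works if the connector header is 20 bytes long. */
_Static_assert(sizeof(struct cn_msg) == 20, "offset assumes a 20-byte cn_msg");

/* Backing storage aligned to 8 bytes. */
struct sketch_buf {
    uint8_t bytes[SKETCH_MSG_SIZE] __attribute__((aligned(8)));
};

/*
 * Start cn_msg at offset 4 so that its data[] member, where the
 * proc_event lives, lands at offset 24, a multiple of 8.
 */
struct cn_msg *sketch_buf_to_cn_msg(struct sketch_buf *buf)
{
    return (struct cn_msg *)(buf->bytes + 4);
}
```

With the event 8-byte aligned, a compiler is free to turn a short memset() of it into 64-bit stores without causing unaligned accesses.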

    Signed-off-by: Chris Metcalf
    Signed-off-by: David S. Miller

    Chris Metcalf
     

03 Oct, 2013

3 commits

  • We calculated the size for the netlink message buffer as size. Use size
    in the memcpy() call as well instead of recalculating it.

    Signed-off-by: Mathias Krause
    Signed-off-by: David S. Miller

    Mathias Krause
     
  • The current code tests whether the whole netlink message is long
    enough to fit a cn_msg. This is wrong as nlmsg_len includes
    the length of the netlink message header. Use nlmsg_len() instead to
    fix this "off-by-NLMSG_HDRLEN" size check.
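
The difference between the two checks can be shown in userspace with the UAPI headers. In this sketch the payload length is computed by subtracting NLMSG_HDRLEN by hand, which is assumed to match what the kernel-only nlmsg_len() helper returns; the function names fits_cn_msg_buggy and fits_cn_msg are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/connector.h>

/* Old check: wrongly counts the netlink header as part of the payload. */
bool fits_cn_msg_buggy(const struct nlmsghdr *nlh)
{
    return nlh->nlmsg_len >= sizeof(struct cn_msg);
}

/* Corrected check: only the payload after the header may hold the cn_msg. */
bool fits_cn_msg(const struct nlmsghdr *nlh)
{
    size_t hdrlen = (size_t)NLMSG_HDRLEN;

    if (nlh->nlmsg_len < hdrlen)
        return false;                       /* avoid unsigned underflow */
    return nlh->nlmsg_len - hdrlen >= sizeof(struct cn_msg);
}
```

A message whose nlmsg_len is exactly sizeof(struct cn_msg) passes the old check even though only part of a cn_msg can follow the header.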

    Cc: stable@vger.kernel.org # v2.6.14+
    Signed-off-by: Mathias Krause
    Signed-off-by: David S. Miller

    Mathias Krause
     
  • Initialize event_data for all possible message types to prevent leaking
    kernel stack contents to userland (up to 20 bytes). Also set the flags
    member of the connector message to 0 to prevent leaking two more stack
    bytes this way.
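
The leak arises because event_data is a union: an event type that fills only a small member leaves the remaining bytes holding whatever was on the stack. A minimal sketch of the remedy follows, using the UAPI struct proc_event. For simplicity it zeroes the whole structure rather than just event_data, and the helper name sketch_fill_exec_event is hypothetical.

```c
#include <string.h>
#include <linux/cn_proc.h>

/*
 * Hypothetical helper: build an exec event in caller-provided storage.
 * Zeroing first guarantees that the union bytes not covered by the
 * exec member carry no stale data to the receiver.
 */
void sketch_fill_exec_event(struct proc_event *ev, int pid, int tgid)
{
    memset(ev, 0, sizeof(*ev));
    ev->what = PROC_EVENT_EXEC;
    ev->event_data.exec.process_pid = pid;
    ev->event_data.exec.process_tgid = tgid;
}
```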

    Cc: stable@vger.kernel.org # v2.6.15+
    Signed-off-by: Mathias Krause
    Signed-off-by: David S. Miller

    Mathias Krause
     

29 Mar, 2013

1 commit


21 Mar, 2013

1 commit

  • Process connector can now also detect coredumping events.

    The main aim of this patch is to get notified at the start of
    coredumping, instead of having to wait for it to finish and then being
    notified through the EXIT event.

    This could be used, for instance, by process managers that want to be
    notified as soon as possible about process failures, rather than
    only after the coredump, which could take on the order of minutes
    depending on the size of the coredump, piping and so on.

    Signed-off-by: Jesper Derehag
    Signed-off-by: David S. Miller

    Jesper Derehag
     

28 Feb, 2013

1 commit

  • While PROC_CN_MCAST_LISTEN/IGNORE is entirely advisory, it was possible
    for an unprivileged user to turn off notifications for all listeners by
    sending PROC_CN_MCAST_IGNORE. Instead, require the same privileges as
    required for a multicast bind.

    Signed-off-by: Kees Cook
    Cc: Evgeniy Polyakov
    Cc: Matt Helsley
    Cc: stable@vger.kernel.org
    Acked-by: Evgeniy Polyakov
    Acked-by: Matt Helsley
    Signed-off-by: David S. Miller

    Kees Cook
     

19 Feb, 2013

2 commits

  • proc_net_remove is only used to remove proc entries
    under /proc/net; it is not a general function for
    removing proc entries of a netns. If we want to remove
    proc entries under /proc/net/stat/, we still
    need to call remove_proc_entry.

    This patch uses remove_proc_entry to replace proc_net_remove.
    We can remove proc_net_remove after this patch.

    Signed-off-by: Gao feng
    Signed-off-by: David S. Miller

    Gao feng
     
  • Right now, some modules such as bonding use proc_create
    to create proc entries under /proc/net/, and other modules
    such as ipv4 use proc_net_fops_create.

    This is a little chaotic. This patch changes all uses of
    proc_net_fops_create to proc_create. We can remove
    proc_net_fops_create after this patch.

    Signed-off-by: Gao feng
    Signed-off-by: David S. Miller

    Gao feng
     

04 Jan, 2013

1 commit

  • CONFIG_HOTPLUG is going away as an option. As a result, the __dev*
    markings need to be removed.

    This change removes the use of __devinit, __devexit_p, __devinitdata,
    __devinitconst, and __devexit from these drivers.

    Based on patches originally written by Bill Pemberton, but redone by me
    in order to handle some of the coding style issues better, by hand.

    Cc: Bill Pemberton
    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     

03 Oct, 2012

1 commit

  • Pull networking changes from David Miller:

    1) GRE now works over ipv6, from Dmitry Kozlov.

    2) Make SCTP more network namespace aware, from Eric Biederman.

    3) TEAM driver now works with non-ethernet devices, from Jiri Pirko.

    4) Make openvswitch network namespace aware, from Pravin B Shelar.

    5) IPV6 NAT implementation, from Patrick McHardy.

    6) Server side support for TCP Fast Open, from Jerry Chu and others.

    7) Packet BPF filter supports MOD and XOR, from Eric Dumazet and Daniel
    Borkmann.

    8) Increase the loopback default MTU to 64K, from Eric Dumazet.

    9) Use a per-task rather than per-socket page fragment allocator for
    outgoing networking traffic. This benefits processes that have very
    many mostly idle sockets, which is quite common.

    From Eric Dumazet.

    10) Use up to 32K for page fragment allocations, with fallbacks to
    smaller sizes when higher order page allocations fail. Benefits are
    a) less segments for driver to process b) less calls to page
    allocator c) less waste of space.

    From Eric Dumazet.

    11) Allow GRO to be used on GRE tunnels, from Eric Dumazet.

    12) VXLAN device driver, one way to handle VLAN issues such as the
    limitation of 4096 VLAN IDs yet still have some level of isolation.
    From Stephen Hemminger.

    13) As usual there is a large boatload of driver changes, with the scale
    perhaps tilted towards the wireless side this time around.

    Fix up various fairly trivial conflicts, mostly caused by the user
    namespace changes.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1012 commits)
    hyperv: Add buffer for extended info after the RNDIS response message.
    hyperv: Report actual status in receive completion packet
    hyperv: Remove extra allocated space for recv_pkt_list elements
    hyperv: Fix page buffer handling in rndis_filter_send_request()
    hyperv: Fix the missing return value in rndis_filter_set_packet_filter()
    hyperv: Fix the max_xfer_size in RNDIS initialization
    vxlan: put UDP socket in correct namespace
    vxlan: Depend on CONFIG_INET
    sfc: Fix the reported priorities of different filter types
    sfc: Remove EFX_FILTER_FLAG_RX_OVERRIDE_IP
    sfc: Fix loopback self-test with separate_tx_channels=1
    sfc: Fix MCDI structure field lookup
    sfc: Add parentheses around use of bitfield macro arguments
    sfc: Fix null function pointer in efx_sriov_channel_type
    vxlan: virtual extensible lan
    igmp: export symbol ip_mc_leave_group
    netlink: add attributes to fdb interface
    tg3: unconditionally select HWMON support when tg3 is enabled.
    Revert "net: ti cpsw ethernet: allow reading phy interface mode from DT"
    gre: fix sparse warning
    ...

    Linus Torvalds
     

09 Sep, 2012

1 commit


07 Sep, 2012

1 commit


17 Jul, 2012

1 commit


30 Jun, 2012

1 commit

  • This patch adds the following structure:

    struct netlink_kernel_cfg {
    unsigned int groups;
    void (*input)(struct sk_buff *skb);
    struct mutex *cb_mutex;
    };

    That can be passed to netlink_kernel_create to set optional configurations
    for netlink kernel sockets.

    I've populated this structure by looking for NULL and zero parameters at the
    existing code. The remaining parameters that always need to be set are still
    left in the original interface.

    That includes optional parameters for the netlink socket creation. This allows
    easy extensibility of this interface in the future.

    This patch also adapts all callers to use this new interface.

    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: David S. Miller

    Pablo Neira Ayuso
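
The optional-configuration pattern described in the entry above can be sketched in plain userspace C. Only the three fields of struct netlink_kernel_cfg come from the commit message. The names mock_sock, mock_netlink_create(), mock_input() and MOCK_DEFAULT_GROUPS are hypothetical stand-ins invented for illustration, and the default of 32 groups is an assumption of this mock rather than something the commit states.

```c
#include <assert.h>
#include <stddef.h>

/* Opaque stand-ins for kernel types; only pointers to them are used here. */
struct sk_buff;
struct mutex;

/* Same fields as the structure quoted in the commit message. */
struct netlink_kernel_cfg {
	unsigned int groups;
	void (*input)(struct sk_buff *skb);
	struct mutex *cb_mutex;
};

/* Hypothetical mock of the state a created socket would carry. */
struct mock_sock {
	int unit;
	unsigned int groups;
	void (*input)(struct sk_buff *skb);
	struct mutex *cb_mutex;
};

/* Assumed default for this mock only; not taken from the commit. */
#define MOCK_DEFAULT_GROUPS 32u

/*
 * Hypothetical userspace stand-in for netlink_kernel_create().
 * The mandatory argument (unit) stays positional; everything optional
 * is read from cfg, and a NULL cfg means "use the defaults".
 */
struct mock_sock mock_netlink_create(int unit,
				     const struct netlink_kernel_cfg *cfg)
{
	struct mock_sock sk = { .unit = unit, .groups = MOCK_DEFAULT_GROUPS };

	assert(unit >= 0); /* the mandatory argument must be valid */
	if (cfg != NULL) {
		if (cfg->groups > sk.groups)
			sk.groups = cfg->groups;
		sk.input = cfg->input;
		sk.cb_mutex = cfg->cb_mutex;
	}
	return sk;
}

/* Example receive callback; it ignores the buffer. */
void mock_input(struct sk_buff *skb)
{
	(void)skb;
}
```

A caller that only needs a receive callback would declare `struct netlink_kernel_cfg cfg = { .input = mock_input };` and pass `&cfg`. Fields it leaves out stay zero, which is what allows further optional fields to be added later without touching existing callers.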
     

27 Jun, 2012

1 commit


29 Sep, 2011

1 commit

  • Add an event to monitor comm value changes of tasks. Such an event
    becomes vital if someone wants to control the threads of a process in
    different ways.

    A natural characteristic of a thread is its comm value, and helpfully
    application developers can change it at runtime. Reporting such changes
    via the proc connector allows fine-grained monitoring and control. For
    instance, a process control daemon that listens to the proc connector
    and follows comm value policies can place specific threads into
    assigned cgroup partitions.

    A pale, partial, one-shot approximation might be possible without this
    update: an application changes the comm value of a thread generator
    task beforehand, a new thread is then cloned, and the proc connector
    listener gets the fork event and reads the new thread's comm value
    from the procfs stat file. This change visibly simplifies and extends
    that approach.

    Signed-off-by: Vladimir Zapolskiy
    Acked-by: Evgeniy Polyakov
    Cc: David Miller
    Signed-off-by: Andrew Morton
    Signed-off-by: David S. Miller

    Vladimir Zapolskiy
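
For context on the application side of the entry above, here is a minimal sketch of how a thread can change its own comm value at runtime, assuming the documented Linux prctl(2) operations PR_SET_NAME and PR_GET_NAME. Presumably this is one of the changes the new event reports, but the commit message does not say which code paths emit it. The helper name set_and_read_comm() and the COMM_LEN constant are hypothetical.

```c
#include <string.h>
#include <sys/prctl.h>

/* A comm value holds at most 15 characters plus the terminating NUL. */
#define COMM_LEN 16

/*
 * Hypothetical helper: set the calling thread's comm value, then read it
 * back into out, which must have room for COMM_LEN bytes.
 * Returns 0 on success and -1 if either prctl() call fails.
 */
int set_and_read_comm(const char *name, char out[COMM_LEN])
{
	memset(out, 0, COMM_LEN);
	/* Names longer than 15 characters are silently truncated. */
	if (prctl(PR_SET_NAME, (unsigned long)name, 0, 0, 0) != 0)
		return -1;
	if (prctl(PR_GET_NAME, (unsigned long)out, 0, 0, 0) != 0)
		return -1;
	return 0;
}
```

A daemon of the kind the commit message describes would sit on the other side: it would receive the comm event through the proc connector and apply its policy, for example by moving the renamed thread into a cgroup.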
     

29 Jul, 2011

1 commit

  • proc_fork_connector() reads ->real_parent without holding a lock. This
    is not safe if copy_process() was called with CLONE_THREAD or
    CLONE_PARENT: in that case the parent != current and can go away at any
    moment.

    Signed-off-by: Oleg Nesterov
    Cc: Vladimir Zapolskiy
    Cc: "David S. Miller"
    Cc: Evgeniy Polyakov
    Signed-off-by: Andrew Morton
    Signed-off-by: David S. Miller

    Oleg Nesterov
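
The entry above does not spell out the remedy, but the later proc_exit_connector() fix at the top of this log names RCU as the protection for ->real_parent. The shape of such a read-side section can be sketched in userspace as follows. The rcu_* macros here are placeholder mocks, not the kernel primitives, so this fragment shows only where the calls go and provides no real protection. struct task, struct fork_ids, read_parent_ids() and mock_rcu_depth are hypothetical names.

```c
/* Minimal stand-in for the fields of task_struct used here. */
struct task {
	int pid;
	int tgid;
	struct task *real_parent;
};

/* Mock read-side primitives: they only track nesting depth. */
int mock_rcu_depth;
#define rcu_read_lock()    do { mock_rcu_depth++; } while (0)
#define rcu_read_unlock()  do { mock_rcu_depth--; } while (0)
#define rcu_dereference(p) (p)

/* The parent identifiers a fork event would report. */
struct fork_ids {
	int parent_pid;
	int parent_tgid;
};

/*
 * Copy the parent's identifiers while inside the read-side section.
 * The parent pointer must not be used after rcu_read_unlock(), because
 * in the kernel the parent may be freed once the section ends.
 */
struct fork_ids read_parent_ids(const struct task *task)
{
	struct fork_ids ids;
	const struct task *parent;

	rcu_read_lock();
	parent = rcu_dereference(task->real_parent);
	ids.parent_pid = parent->pid;
	ids.parent_tgid = parent->tgid;
	rcu_read_unlock();

	return ids;
}
```

The point of the pattern is that the values are copied out while the section is held, so nothing dereferences the parent after it may have gone away.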
     

27 Jul, 2011

1 commit

  • This allows us to move duplicated code in
    (atomic_inc_not_zero() for now) to

    Signed-off-by: Arun Sharma
    Reviewed-by: Eric Dumazet
    Cc: Ingo Molnar
    Cc: David Miller
    Cc: Eric Dumazet
    Acked-by: Mike Frysinger
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arun Sharma
     

26 Jul, 2011

1 commit


23 Jul, 2011

1 commit

  • * 'ptrace' of git://git.kernel.org/pub/scm/linux/kernel/git/oleg/misc: (39 commits)
    ptrace: do_wait(traced_leader_killed_by_mt_exec) can block forever
    ptrace: fix ptrace_signal() && STOP_DEQUEUED interaction
    connector: add an event for monitoring process tracers
    ptrace: dont send SIGSTOP on auto-attach if PT_SEIZED
    ptrace: mv send-SIGSTOP from do_fork() to ptrace_init_task()
    ptrace_init_task: initialize child->jobctl explicitly
    has_stopped_jobs: s/task_is_stopped/SIGNAL_STOP_STOPPED/
    ptrace: make former thread ID available via PTRACE_GETEVENTMSG after PTRACE_EVENT_EXEC stop
    ptrace: wait_consider_task: s/same_thread_group/ptrace_reparented/
    ptrace: kill real_parent_is_ptracer() in favor of ptrace_reparented()
    ptrace: ptrace_reparented() should check same_thread_group()
    redefine thread_group_leader() as exit_signal >= 0
    do not change dead_task->exit_signal
    kill task_detached()
    reparent_leader: check EXIT_DEAD instead of task_detached()
    make do_notify_parent() __must_check, update the callers
    __ptrace_detach: avoid task_detached(), check do_notify_parent()
    kill tracehook_notify_death()
    make do_notify_parent() return bool
    ptrace: s/tracehook_tracer_task()/ptrace_parent()/
    ...

    Linus Torvalds