05 Jun, 2018

1 commit

  • Pull aio updates from Al Viro:
    "Majority of AIO stuff this cycle. aio-fsync and aio-poll, mostly.

    The only thing I'm holding back for a day or so is Adam's aio ioprio -
    his last-minute fixup is trivial (missing stub in !CONFIG_BLOCK case),
    but let it sit in -next for decency sake..."

    * 'work.aio-1' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (46 commits)
    aio: sanitize the limit checking in io_submit(2)
    aio: fold do_io_submit() into callers
    aio: shift copyin of iocb into io_submit_one()
    aio_read_events_ring(): make a bit more readable
    aio: all callers of aio_{read,write,fsync,poll} treat 0 and -EIOCBQUEUED the same way
    aio: take list removal to (some) callers of aio_complete()
    aio: add missing break for the IOCB_CMD_FDSYNC case
    random: convert to ->poll_mask
    timerfd: convert to ->poll_mask
    eventfd: switch to ->poll_mask
    pipe: convert to ->poll_mask
    crypto: af_alg: convert to ->poll_mask
    net/rxrpc: convert to ->poll_mask
    net/iucv: convert to ->poll_mask
    net/phonet: convert to ->poll_mask
    net/nfc: convert to ->poll_mask
    net/caif: convert to ->poll_mask
    net/bluetooth: convert to ->poll_mask
    net/sctp: convert to ->poll_mask
    net/tipc: convert to ->poll_mask
    ...

    Linus Torvalds
     

26 May, 2018

1 commit


23 May, 2018

1 commit

  • Syzbot reported a use-after-free in timer_is_static_object() [1].

    This can happen because the structure for the rto timer (ccid2_hc_tx_sock)
    is removed in dccp_disconnect(), and ccid2_hc_tx_rto_expire() can be
    called after that.

    The report [1] is similar to the one in commit 120e9dabaf55 ("dccp:
    defer ccid_hc_tx_delete() at dismantle time"), and the fix is the same:
    delay freeing the ccid2_hc_tx_sock structure so that it is freed in
    dccp_sk_destruct().

    [1]

    ==================================================================
    BUG: KASAN: use-after-free in timer_is_static_object+0x80/0x90
    kernel/time/timer.c:607
    Read of size 8 at addr ffff8801bebb5118 by task syz-executor2/25299

    CPU: 1 PID: 25299 Comm: syz-executor2 Not tainted 4.17.0-rc5+ #54
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
    Google 01/01/2011
    Call Trace:

    __dump_stack lib/dump_stack.c:77 [inline]
    dump_stack+0x1b9/0x294 lib/dump_stack.c:113
    print_address_description+0x6c/0x20b mm/kasan/report.c:256
    kasan_report_error mm/kasan/report.c:354 [inline]
    kasan_report.cold.7+0x242/0x2fe mm/kasan/report.c:412
    __asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:433
    timer_is_static_object+0x80/0x90 kernel/time/timer.c:607
    debug_object_activate+0x2d9/0x670 lib/debugobjects.c:508
    debug_timer_activate kernel/time/timer.c:709 [inline]
    debug_activate kernel/time/timer.c:764 [inline]
    __mod_timer kernel/time/timer.c:1041 [inline]
    mod_timer+0x4d3/0x13b0 kernel/time/timer.c:1102
    sk_reset_timer+0x22/0x60 net/core/sock.c:2742
    ccid2_hc_tx_rto_expire+0x587/0x680 net/dccp/ccids/ccid2.c:147
    call_timer_fn+0x230/0x940 kernel/time/timer.c:1326
    expire_timers kernel/time/timer.c:1363 [inline]
    __run_timers+0x79e/0xc50 kernel/time/timer.c:1666
    run_timer_softirq+0x4c/0x70 kernel/time/timer.c:1692
    __do_softirq+0x2e0/0xaf5 kernel/softirq.c:285
    invoke_softirq kernel/softirq.c:365 [inline]
    irq_exit+0x1d1/0x200 kernel/softirq.c:405
    exiting_irq arch/x86/include/asm/apic.h:525 [inline]
    smp_apic_timer_interrupt+0x17e/0x710 arch/x86/kernel/apic/apic.c:1052
    apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:863

    ...
    Allocated by task 25374:
    save_stack+0x43/0xd0 mm/kasan/kasan.c:448
    set_track mm/kasan/kasan.c:460 [inline]
    kasan_kmalloc+0xc4/0xe0 mm/kasan/kasan.c:553
    kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:490
    kmem_cache_alloc+0x12e/0x760 mm/slab.c:3554
    ccid_new+0x25b/0x3e0 net/dccp/ccid.c:151
    dccp_hdlr_ccid+0x27/0x150 net/dccp/feat.c:44
    __dccp_feat_activate+0x184/0x270 net/dccp/feat.c:344
    dccp_feat_activate_values+0x3a7/0x819 net/dccp/feat.c:1538
    dccp_create_openreq_child+0x472/0x610 net/dccp/minisocks.c:128
    dccp_v4_request_recv_sock+0x12c/0xca0 net/dccp/ipv4.c:408
    dccp_v6_request_recv_sock+0x125d/0x1f10 net/dccp/ipv6.c:415
    dccp_check_req+0x455/0x6a0 net/dccp/minisocks.c:197
    dccp_v4_rcv+0x7b8/0x1f3f net/dccp/ipv4.c:841
    ip_local_deliver_finish+0x2e3/0xd80 net/ipv4/ip_input.c:215
    NF_HOOK include/linux/netfilter.h:288 [inline]
    ip_local_deliver+0x1e1/0x720 net/ipv4/ip_input.c:256
    dst_input include/net/dst.h:450 [inline]
    ip_rcv_finish+0x81b/0x2200 net/ipv4/ip_input.c:396
    NF_HOOK include/linux/netfilter.h:288 [inline]
    ip_rcv+0xb70/0x143d net/ipv4/ip_input.c:492
    __netif_receive_skb_core+0x26f5/0x3630 net/core/dev.c:4592
    __netif_receive_skb+0x2c/0x1e0 net/core/dev.c:4657
    process_backlog+0x219/0x760 net/core/dev.c:5337
    napi_poll net/core/dev.c:5735 [inline]
    net_rx_action+0x7b7/0x1930 net/core/dev.c:5801
    __do_softirq+0x2e0/0xaf5 kernel/softirq.c:285

    Freed by task 25374:
    save_stack+0x43/0xd0 mm/kasan/kasan.c:448
    set_track mm/kasan/kasan.c:460 [inline]
    __kasan_slab_free+0x11a/0x170 mm/kasan/kasan.c:521
    kasan_slab_free+0xe/0x10 mm/kasan/kasan.c:528
    __cache_free mm/slab.c:3498 [inline]
    kmem_cache_free+0x86/0x2d0 mm/slab.c:3756
    ccid_hc_tx_delete+0xc3/0x100 net/dccp/ccid.c:190
    dccp_disconnect+0x130/0xc66 net/dccp/proto.c:286
    dccp_close+0x3bc/0xe60 net/dccp/proto.c:1045
    inet_release+0x104/0x1f0 net/ipv4/af_inet.c:427
    inet6_release+0x50/0x70 net/ipv6/af_inet6.c:460
    sock_release+0x96/0x1b0 net/socket.c:594
    sock_close+0x16/0x20 net/socket.c:1149
    __fput+0x34d/0x890 fs/file_table.c:209
    ____fput+0x15/0x20 fs/file_table.c:243
    task_work_run+0x1e4/0x290 kernel/task_work.c:113
    tracehook_notify_resume include/linux/tracehook.h:191 [inline]
    exit_to_usermode_loop+0x2bd/0x310 arch/x86/entry/common.c:166
    prepare_exit_to_usermode arch/x86/entry/common.c:196 [inline]
    syscall_return_slowpath arch/x86/entry/common.c:265 [inline]
    do_syscall_64+0x6ac/0x800 arch/x86/entry/common.c:290
    entry_SYSCALL_64_after_hwframe+0x49/0xbe

    The buggy address belongs to the object at ffff8801bebb4cc0
    which belongs to the cache ccid2_hc_tx_sock of size 1240
    The buggy address is located 1112 bytes inside of
    1240-byte region [ffff8801bebb4cc0, ffff8801bebb5198)
    The buggy address belongs to the page:
    page:ffffea0006faed00 count:1 mapcount:0 mapping:ffff8801bebb41c0
    index:0xffff8801bebb5240 compound_mapcount: 0
    flags: 0x2fffc0000008100(slab|head)
    raw: 02fffc0000008100 ffff8801bebb41c0 ffff8801bebb5240 0000000100000003
    raw: ffff8801cdba3138 ffffea0007634120 ffff8801cdbaab40 0000000000000000
    page dumped because: kasan: bad access detected
    ...
    ==================================================================

    Reported-by: syzbot+5d47e9ec91a6f15dbd6f@syzkaller.appspotmail.com
    Signed-off-by: Alexey Kodanev
    Signed-off-by: David S. Miller

    Alexey Kodanev
     

08 Mar, 2018

1 commit

  • dccp_disconnect() sets 'dp->dccps_hc_tx_ccid' tx handler to NULL,
    therefore if DCCP socket is disconnected and dccp_sendmsg() is
    called after it, it will cause a NULL pointer dereference in
    dccp_write_xmit().

    This crash and the reproducer were reported by syzbot. Looks like
    it is reproduced if commit 69c64866ce07 ("dccp: CVE-2017-8824:
    use-after-free in DCCP code") is applied.

    Reported-by: syzbot+f99ab3887ab65d70f816@syzkaller.appspotmail.com
    Signed-off-by: Alexey Kodanev
    Signed-off-by: David S. Miller

    Alexey Kodanev
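
The failure mode described above can be modelled in user space. This is a minimal sketch, not the kernel code: the `mock_*` names and the choice of returning -ENOTCONN from a closed-state check are assumptions made for illustration, since the message here describes the crash rather than the exact fix.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the DCCP socket state and its TX handler. */
enum mock_state { MOCK_OPEN, MOCK_CLOSED };

struct mock_ccid {
    int packets_sent;
};

struct mock_dccp_sock {
    enum mock_state state;
    struct mock_ccid *hc_tx_ccid;   /* set to NULL by mock_disconnect() */
};

/* Models the disconnect path: the TX handler pointer is cleared. */
void mock_disconnect(struct mock_dccp_sock *dp)
{
    dp->hc_tx_ccid = NULL;
    dp->state = MOCK_CLOSED;
}

/*
 * Models a send path that refuses to run on a closed socket, so the
 * cleared handler pointer is never dereferenced.
 * Returns the number of bytes accepted, or a negative errno value.
 */
int mock_sendmsg(struct mock_dccp_sock *dp, size_t len)
{
    if (dp->state == MOCK_CLOSED)
        return -ENOTCONN;           /* assumed error code for this sketch */

    assert(dp->hc_tx_ccid != NULL); /* invariant while the socket is open */
    dp->hc_tx_ccid->packets_sent++;
    return (int)len;
}
```

Without the state check at the top of `mock_sendmsg()`, a call made after `mock_disconnect()` would dereference the NULL `hc_tx_ccid` pointer, which is the pattern the report describes.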
     

12 Feb, 2018

1 commit

  • This is the mindless scripted replacement of kernel use of POLL*
    variables as described by Al, done by this script:

    for V in IN OUT PRI ERR RDNORM RDBAND WRNORM WRBAND HUP RDHUP NVAL MSG; do
    L=`git grep -l -w POLL$V | grep -v '^t' | grep -v /um/ | grep -v '^sa' | grep -v '/poll.h$'|grep -v '^D'`
    for f in $L; do sed -i "-es/^\([^\"]*\)\(\<POLL$V\>\)/\\1E\\2/" $f; done
    done

    with de-mangling cleanups yet to come.

    NOTE! On almost all architectures, the EPOLL* constants have the same
    values as the POLL* constants do. But the keyword here is "almost".
    For various bad reasons they aren't the same, and epoll() doesn't
    actually work quite correctly in some cases due to this on Sparc et al.

    The next patch from Al will sort out the final differences, and we
    should be all done.

    Scripted-by: Al Viro
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

01 Feb, 2018

1 commit

  • Pull networking updates from David Miller:

    1) Significantly shrink the core networking routing structures. Result
    of http://vger.kernel.org/~davem/seoul2017_netdev_keynote.pdf

    2) Add netdevsim driver for testing various offloads, from Jakub
    Kicinski.

    3) Support cross-chip FDB operations in DSA, from Vivien Didelot.

    4) Add a 2nd listener hash table for TCP, similar to what was done for
    UDP. From Martin KaFai Lau.

    5) Add eBPF based queue selection to tun, from Jason Wang.

    6) Lockless qdisc support, from John Fastabend.

    7) SCTP stream interleave support, from Xin Long.

    8) Smoother TCP receive autotuning, from Eric Dumazet.

    9) Lots of erspan tunneling enhancements, from William Tu.

    10) Add true function call support to BPF, from Alexei Starovoitov.

    11) Add explicit support for GRO HW offloading, from Michael Chan.

    12) Support extack generation in more netlink subsystems. From Alexander
    Aring, Quentin Monnet, and Jakub Kicinski.

    13) Add 1000BaseX, flow control, and EEE support to mvneta driver. From
    Russell King.

    14) Add flow table abstraction to netfilter, from Pablo Neira Ayuso.

    15) Many improvements and simplifications to the NFP driver bpf JIT,
    from Jakub Kicinski.

    16) Support for ipv6 non-equal cost multipath routing, from Ido
    Schimmel.

    17) Add resource abstraction to devlink, from Arkadi Sharshevsky.

    18) Packet scheduler classifier shared filter block support, from Jiri
    Pirko.

    19) Avoid locking in act_csum, from Davide Caratti.

    20) devinet_ioctl() simplifications from Al Viro.

    21) More TCP bpf improvements from Lawrence Brakmo.

    22) Add support for onlink ipv6 route flag, similar to ipv4, from David
    Ahern.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1925 commits)
    tls: Add support for encryption using async offload accelerator
    ip6mr: fix stale iterator
    net/sched: kconfig: Remove blank help texts
    openvswitch: meter: Use 64-bit arithmetic instead of 32-bit
    tcp_nv: fix potential integer overflow in tcpnv_acked
    r8169: fix RTL8168EP take too long to complete driver initialization.
    qmi_wwan: Add support for Quectel EP06
    rtnetlink: enable IFLA_IF_NETNSID for RTM_NEWLINK
    ipmr: Fix ptrdiff_t print formatting
    ibmvnic: Wait for device response when changing MAC
    qlcnic: fix deadlock bug
    tcp: release sk_frag.page in tcp_disconnect
    ipv4: Get the address of interface correctly.
    net_sched: gen_estimator: fix lockdep splat
    net: macb: Handle HRESP error
    net/mlx5e: IPoIB, Fix copy-paste bug in flow steering refactoring
    ipv6: addrconf: break critical section in addrconf_verify_rtnl()
    ipv6: change route cache aging logic
    i40e/i40evf: Update DESC_NEEDED value to reflect larger value
    bnxt_en: cleanup DIM work on device shutdown
    ...

    Linus Torvalds
     

31 Jan, 2018

1 commit

  • Pull poll annotations from Al Viro:
    "This introduces a __bitwise type for POLL### bitmap, and propagates
    the annotations through the tree. Most of that stuff is as simple as
    'make ->poll() instances return __poll_t and do the same to local
    variables used to hold the future return value'.

    Some of the obvious brainos found in process are fixed (e.g. POLLIN
    misspelled as POLL_IN). At that point the amount of sparse warnings is
    low and most of them are for genuine bugs - e.g. ->poll() instance
    deciding to return -EINVAL instead of a bitmap. I hadn't touched those
    in this series - it's large enough as it is.

    Another problem it has caught was eventpoll() ABI mess; select.c and
    eventpoll.c assumed that corresponding POLL### and EPOLL### were
    equal. That's true for some, but not all of them - EPOLL### are
    arch-independent, but POLL### are not.

    The last commit in this series separates userland POLL### values from
    the (now arch-independent) kernel-side ones, converting between them
    in the few places where they are copied to/from userland. AFAICS, this
    is the least disruptive fix preserving poll(2) ABI and making epoll()
    work on all architectures.

    As it is, it's simply broken on sparc - try to give it EPOLLWRNORM and
    it will trigger only on what would've triggered EPOLLWRBAND on other
    architectures. EPOLLWRBAND and EPOLLRDHUP, OTOH, are never triggered
    at all on sparc. With this patch they should work consistently on all
    architectures"

    * 'misc.poll' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (37 commits)
    make kernel-side POLL... arch-independent
    eventpoll: no need to mask the result of epi_item_poll() again
    eventpoll: constify struct epoll_event pointers
    debugging printk in sg_poll() uses %x to print POLL... bitmap
    annotate poll(2) guts
    9p: untangle ->poll() mess
    ->si_band gets POLL... bitmap stored into a user-visible long field
    ring_buffer_poll_wait() return value used as return value of ->poll()
    the rest of drivers/*: annotate ->poll() instances
    media: annotate ->poll() instances
    fs: annotate ->poll() instances
    ipc, kernel, mm: annotate ->poll() instances
    net: annotate ->poll() instances
    apparmor: annotate ->poll() instances
    tomoyo: annotate ->poll() instances
    sound: annotate ->poll() instances
    acpi: annotate ->poll() instances
    crypto: annotate ->poll() instances
    block: annotate ->poll() instances
    x86: annotate ->poll() instances
    ...

    Linus Torvalds
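
The separation of userland POLL### values from kernel-side ones can be illustrated with a small user-space sketch. The `K_`/`U_` constants, the `MAP_FLAG` macro and both function names are hypothetical; the sparc-style userland values are written from memory of the uapi headers and should be treated as assumptions, and only a few flags are covered.

```c
#include <assert.h>
#include <stdint.h>

/* Arch-independent kernel-side values (EPOLL* numbering). */
#define K_EPOLLIN     0x0001u
#define K_EPOLLOUT    0x0004u
#define K_EPOLLWRNORM 0x0100u
#define K_EPOLLWRBAND 0x0200u
#define K_EPOLLRDHUP  0x2000u

/* Sparc-style userland values; assumed here for illustration only. */
#define U_POLLIN      0x0001u
#define U_POLLOUT     0x0004u
#define U_POLLWRNORM  U_POLLOUT
#define U_POLLWRBAND  0x0100u
#define U_POLLRDHUP   0x0800u

/* Yield `to` when every bit of `from` is set in `v`, else 0. */
#define MAP_FLAG(v, from, to) ((((v) & (from)) == (from)) ? (to) : 0u)

/* Convert a userland poll bitmap to the kernel-side representation. */
uint32_t user_to_kernel(uint32_t v)
{
    return MAP_FLAG(v, U_POLLIN, K_EPOLLIN) |
           MAP_FLAG(v, U_POLLOUT, K_EPOLLOUT) |
           MAP_FLAG(v, U_POLLWRNORM, K_EPOLLWRNORM) |
           MAP_FLAG(v, U_POLLWRBAND, K_EPOLLWRBAND) |
           MAP_FLAG(v, U_POLLRDHUP, K_EPOLLRDHUP);
}

/* Convert a kernel-side bitmap back to the userland representation. */
uint32_t kernel_to_user(uint32_t v)
{
    return MAP_FLAG(v, K_EPOLLIN, U_POLLIN) |
           MAP_FLAG(v, K_EPOLLOUT, U_POLLOUT) |
           MAP_FLAG(v, K_EPOLLWRNORM, U_POLLWRNORM) |
           MAP_FLAG(v, K_EPOLLWRBAND, U_POLLWRBAND) |
           MAP_FLAG(v, K_EPOLLRDHUP, U_POLLRDHUP);
}
```

With these assumed values, the userland "write band" bit has the same numeric value as the kernel-side "write normal" bit, which is the kind of collision the message describes. Converting at the user/kernel boundary, rather than assuming the two sets are equal, removes the ambiguity.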
     

03 Jan, 2018

1 commit


21 Dec, 2017

1 commit


06 Dec, 2017

1 commit


28 Nov, 2017

1 commit


17 Aug, 2017

1 commit

  • syzkaller team reported another problem in DCCP [1]

    Problem here is that the structure holding RTO timer
    (ccid2_hc_tx_rto_expire() handler) is freed too soon.

    We cannot use del_timer_sync() to cancel the timer, since this timer
    wants to grab the socket lock (that would risk a deadlock).

    Solution is to defer the freeing of memory until all references to
    the socket have been released. Socket timers do own a reference, so this
    should fix the issue.

    [1]

    ==================================================================
    BUG: KASAN: use-after-free in ccid2_hc_tx_rto_expire+0x51c/0x5c0 net/dccp/ccids/ccid2.c:144
    Read of size 4 at addr ffff8801d2660540 by task kworker/u4:7/3365

    CPU: 1 PID: 3365 Comm: kworker/u4:7 Not tainted 4.13.0-rc4+ #3
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    Workqueue: events_unbound call_usermodehelper_exec_work
    Call Trace:

    __dump_stack lib/dump_stack.c:16 [inline]
    dump_stack+0x194/0x257 lib/dump_stack.c:52
    print_address_description+0x73/0x250 mm/kasan/report.c:252
    kasan_report_error mm/kasan/report.c:351 [inline]
    kasan_report+0x24e/0x340 mm/kasan/report.c:409
    __asan_report_load4_noabort+0x14/0x20 mm/kasan/report.c:429
    ccid2_hc_tx_rto_expire+0x51c/0x5c0 net/dccp/ccids/ccid2.c:144
    call_timer_fn+0x233/0x830 kernel/time/timer.c:1268
    expire_timers kernel/time/timer.c:1307 [inline]
    __run_timers+0x7fd/0xb90 kernel/time/timer.c:1601
    run_timer_softirq+0x21/0x80 kernel/time/timer.c:1614
    __do_softirq+0x2f5/0xba3 kernel/softirq.c:284
    invoke_softirq kernel/softirq.c:364 [inline]
    irq_exit+0x1cc/0x200 kernel/softirq.c:405
    exiting_irq arch/x86/include/asm/apic.h:638 [inline]
    smp_apic_timer_interrupt+0x76/0xa0 arch/x86/kernel/apic/apic.c:1044
    apic_timer_interrupt+0x93/0xa0 arch/x86/entry/entry_64.S:702
    RIP: 0010:arch_local_irq_enable arch/x86/include/asm/paravirt.h:824 [inline]
    RIP: 0010:__raw_write_unlock_irq include/linux/rwlock_api_smp.h:267 [inline]
    RIP: 0010:_raw_write_unlock_irq+0x56/0x70 kernel/locking/spinlock.c:343
    RSP: 0018:ffff8801cd50eaa8 EFLAGS: 00000286 ORIG_RAX: ffffffffffffff10
    RAX: dffffc0000000000 RBX: ffffffff85a090c0 RCX: 0000000000000006
    RDX: 1ffffffff0b595f3 RSI: 1ffff1003962f989 RDI: ffffffff85acaf98
    RBP: ffff8801cd50eab0 R08: 0000000000000001 R09: 0000000000000000
    R10: 0000000000000000 R11: 0000000000000000 R12: ffff8801cc96ea60
    R13: dffffc0000000000 R14: ffff8801cc96e4c0 R15: ffff8801cc96e4c0

    release_task+0xe9e/0x1a40 kernel/exit.c:220
    wait_task_zombie kernel/exit.c:1162 [inline]
    wait_consider_task+0x29b8/0x33c0 kernel/exit.c:1389
    do_wait_thread kernel/exit.c:1452 [inline]
    do_wait+0x441/0xa90 kernel/exit.c:1523
    kernel_wait4+0x1f5/0x370 kernel/exit.c:1665
    SYSC_wait4+0x134/0x140 kernel/exit.c:1677
    SyS_wait4+0x2c/0x40 kernel/exit.c:1673
    call_usermodehelper_exec_sync kernel/kmod.c:286 [inline]
    call_usermodehelper_exec_work+0x1a0/0x2c0 kernel/kmod.c:323
    process_one_work+0xbf3/0x1bc0 kernel/workqueue.c:2097
    worker_thread+0x223/0x1860 kernel/workqueue.c:2231
    kthread+0x35e/0x430 kernel/kthread.c:231
    ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:425

    Allocated by task 21267:
    save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
    save_stack+0x43/0xd0 mm/kasan/kasan.c:447
    set_track mm/kasan/kasan.c:459 [inline]
    kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:551
    kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:489
    kmem_cache_alloc+0x127/0x750 mm/slab.c:3561
    ccid_new+0x20e/0x390 net/dccp/ccid.c:151
    dccp_hdlr_ccid+0x27/0x140 net/dccp/feat.c:44
    __dccp_feat_activate+0x142/0x2a0 net/dccp/feat.c:344
    dccp_feat_activate_values+0x34e/0xa90 net/dccp/feat.c:1538
    dccp_rcv_request_sent_state_process net/dccp/input.c:472 [inline]
    dccp_rcv_state_process+0xed1/0x1620 net/dccp/input.c:677
    dccp_v4_do_rcv+0xeb/0x160 net/dccp/ipv4.c:679
    sk_backlog_rcv include/net/sock.h:911 [inline]
    __release_sock+0x124/0x360 net/core/sock.c:2269
    release_sock+0xa4/0x2a0 net/core/sock.c:2784
    inet_wait_for_connect net/ipv4/af_inet.c:557 [inline]
    __inet_stream_connect+0x671/0xf00 net/ipv4/af_inet.c:643
    inet_stream_connect+0x58/0xa0 net/ipv4/af_inet.c:682
    SYSC_connect+0x204/0x470 net/socket.c:1642
    SyS_connect+0x24/0x30 net/socket.c:1623
    entry_SYSCALL_64_fastpath+0x1f/0xbe

    Freed by task 3049:
    save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
    save_stack+0x43/0xd0 mm/kasan/kasan.c:447
    set_track mm/kasan/kasan.c:459 [inline]
    kasan_slab_free+0x71/0xc0 mm/kasan/kasan.c:524
    __cache_free mm/slab.c:3503 [inline]
    kmem_cache_free+0x77/0x280 mm/slab.c:3763
    ccid_hc_tx_delete+0xc5/0x100 net/dccp/ccid.c:190
    dccp_destroy_sock+0x1d1/0x2b0 net/dccp/proto.c:225
    inet_csk_destroy_sock+0x166/0x3f0 net/ipv4/inet_connection_sock.c:833
    dccp_done+0xb7/0xd0 net/dccp/proto.c:145
    dccp_time_wait+0x13d/0x300 net/dccp/minisocks.c:72
    dccp_rcv_reset+0x1d1/0x5b0 net/dccp/input.c:160
    dccp_rcv_state_process+0x8fc/0x1620 net/dccp/input.c:663
    dccp_v4_do_rcv+0xeb/0x160 net/dccp/ipv4.c:679
    sk_backlog_rcv include/net/sock.h:911 [inline]
    __sk_receive_skb+0x33e/0xc00 net/core/sock.c:521
    dccp_v4_rcv+0xef1/0x1c00 net/dccp/ipv4.c:871
    ip_local_deliver_finish+0x2e2/0xba0 net/ipv4/ip_input.c:216
    NF_HOOK include/linux/netfilter.h:248 [inline]
    ip_local_deliver+0x1ce/0x6d0 net/ipv4/ip_input.c:257
    dst_input include/net/dst.h:477 [inline]
    ip_rcv_finish+0x8db/0x19c0 net/ipv4/ip_input.c:397
    NF_HOOK include/linux/netfilter.h:248 [inline]
    ip_rcv+0xc3f/0x17d0 net/ipv4/ip_input.c:488
    __netif_receive_skb_core+0x19af/0x33d0 net/core/dev.c:4417
    __netif_receive_skb+0x2c/0x1b0 net/core/dev.c:4455
    process_backlog+0x203/0x740 net/core/dev.c:5130
    napi_poll net/core/dev.c:5527 [inline]
    net_rx_action+0x792/0x1910 net/core/dev.c:5593
    __do_softirq+0x2f5/0xba3 kernel/softirq.c:284

    The buggy address belongs to the object at ffff8801d2660100
    which belongs to the cache ccid2_hc_tx_sock of size 1240
    The buggy address is located 1088 bytes inside of
    1240-byte region [ffff8801d2660100, ffff8801d26605d8)
    The buggy address belongs to the page:
    page:ffffea0007499800 count:1 mapcount:0 mapping:ffff8801d2660100 index:0x0 compound_mapcount: 0
    flags: 0x200000000008100(slab|head)
    raw: 0200000000008100 ffff8801d2660100 0000000000000000 0000000100000005
    raw: ffffea00075271a0 ffffea0007538820 ffff8801d3aef9c0 0000000000000000
    page dumped because: kasan: bad access detected

    Memory state around the buggy address:
    ffff8801d2660400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
    ffff8801d2660480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
    >ffff8801d2660500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
    ^
    ffff8801d2660580: fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc fc
    ffff8801d2660600: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
    ==================================================================

    Signed-off-by: Eric Dumazet
    Reported-by: Dmitry Vyukov
    Cc: Gerrit Renker
    Signed-off-by: David S. Miller

    Eric Dumazet
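
The idea of deferring the free until the last socket reference is dropped can be modelled in user space. This is a minimal sketch under stated assumptions, not the DCCP code: all `mock_*` names are hypothetical, the refcount is a plain int with no locking, and the socket struct itself is deliberately left allocated so a caller can inspect the `destructed` flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the per-socket TX state that holds the timer. */
struct tx_state {
    int rto_fired;              /* touched by the timer handler */
};

struct mock_sock {
    int refcnt;                 /* the owner and each armed timer hold one */
    struct tx_state *tx;        /* freed only in mock_sock_destruct() */
    bool destructed;
};

struct mock_sock *mock_sock_new(void)
{
    struct mock_sock *sk = calloc(1, sizeof(*sk));

    assert(sk != NULL);
    sk->tx = calloc(1, sizeof(*sk->tx));
    assert(sk->tx != NULL);
    sk->refcnt = 1;             /* reference held by the owner */
    return sk;
}

/* Runs once, when the last reference goes away. */
void mock_sock_destruct(struct mock_sock *sk)
{
    free(sk->tx);
    sk->tx = NULL;
    sk->destructed = true;
}

void mock_sock_put(struct mock_sock *sk)
{
    assert(sk->refcnt > 0);
    if (--sk->refcnt == 0)
        mock_sock_destruct(sk);
}

/* Arming the timer takes a reference on the socket. */
void mock_arm_timer(struct mock_sock *sk)
{
    sk->refcnt++;
}

/* The handler may still touch tx: its own reference keeps it alive. */
void mock_timer_fire(struct mock_sock *sk)
{
    sk->tx->rto_fired++;
    mock_sock_put(sk);
}

/* Closing drops the owner's reference but does not free tx directly. */
void mock_close(struct mock_sock *sk)
{
    mock_sock_put(sk);
}
```

Because `mock_close()` only drops a reference, a timer that fires after the close still finds valid TX state, and the state is released when the handler drops the final reference.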
     

15 Aug, 2017

1 commit

  • syzkaller reported that DCCP could have a non empty
    write queue at dismantle time.

    WARNING: CPU: 1 PID: 2953 at net/core/stream.c:199 sk_stream_kill_queues+0x3ce/0x520 net/core/stream.c:199
    Kernel panic - not syncing: panic_on_warn set ...

    CPU: 1 PID: 2953 Comm: syz-executor0 Not tainted 4.13.0-rc4+ #2
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    Call Trace:
    __dump_stack lib/dump_stack.c:16 [inline]
    dump_stack+0x194/0x257 lib/dump_stack.c:52
    panic+0x1e4/0x417 kernel/panic.c:180
    __warn+0x1c4/0x1d9 kernel/panic.c:541
    report_bug+0x211/0x2d0 lib/bug.c:183
    fixup_bug+0x40/0x90 arch/x86/kernel/traps.c:190
    do_trap_no_signal arch/x86/kernel/traps.c:224 [inline]
    do_trap+0x260/0x390 arch/x86/kernel/traps.c:273
    do_error_trap+0x120/0x390 arch/x86/kernel/traps.c:310
    do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:323
    invalid_op+0x1e/0x30 arch/x86/entry/entry_64.S:846
    RIP: 0010:sk_stream_kill_queues+0x3ce/0x520 net/core/stream.c:199
    RSP: 0018:ffff8801d182f108 EFLAGS: 00010297
    RAX: ffff8801d1144140 RBX: ffff8801d13cb280 RCX: 0000000000000000
    RDX: 0000000000000000 RSI: ffffffff85137b00 RDI: ffff8801d13cb280
    RBP: ffff8801d182f148 R08: 0000000000000001 R09: 0000000000000000
    R10: 0000000000000000 R11: 0000000000000000 R12: ffff8801d13cb4d0
    R13: ffff8801d13cb3b8 R14: ffff8801d13cb300 R15: ffff8801d13cb3b8
    inet_csk_destroy_sock+0x175/0x3f0 net/ipv4/inet_connection_sock.c:835
    dccp_close+0x84d/0xc10 net/dccp/proto.c:1067
    inet_release+0xed/0x1c0 net/ipv4/af_inet.c:425
    sock_release+0x8d/0x1e0 net/socket.c:597
    sock_close+0x16/0x20 net/socket.c:1126
    __fput+0x327/0x7e0 fs/file_table.c:210
    ____fput+0x15/0x20 fs/file_table.c:246
    task_work_run+0x18a/0x260 kernel/task_work.c:116
    exit_task_work include/linux/task_work.h:21 [inline]
    do_exit+0xa32/0x1b10 kernel/exit.c:865
    do_group_exit+0x149/0x400 kernel/exit.c:969
    get_signal+0x7e8/0x17e0 kernel/signal.c:2330
    do_signal+0x94/0x1ee0 arch/x86/kernel/signal.c:808
    exit_to_usermode_loop+0x21c/0x2d0 arch/x86/entry/common.c:157
    prepare_exit_to_usermode arch/x86/entry/common.c:194 [inline]
    syscall_return_slowpath+0x3a7/0x450 arch/x86/entry/common.c:263

    Signed-off-by: Eric Dumazet
    Reported-by: Dmitry Vyukov
    Signed-off-by: David S. Miller

    Eric Dumazet
     

04 Nov, 2016

1 commit

  • Andrey reported the following warning while fuzzing with syzkaller:

    WARNING: CPU: 1 PID: 21072 at net/dccp/proto.c:83 dccp_set_state+0x229/0x290
    Kernel panic - not syncing: panic_on_warn set ...

    CPU: 1 PID: 21072 Comm: syz-executor Not tainted 4.9.0-rc1+ #293
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
    ffff88003d4c7738 ffffffff81b474f4 0000000000000003 dffffc0000000000
    ffffffff844f8b00 ffff88003d4c7804 ffff88003d4c7800 ffffffff8140c06a
    0000000041b58ab3 ffffffff8479ab7d ffffffff8140beae ffffffff8140cd00
    Call Trace:
    [< inline >] __dump_stack lib/dump_stack.c:15
    dump_stack+0xb3/0x10f lib/dump_stack.c:51
    panic+0x1bc/0x39d kernel/panic.c:179
    __warn+0x1cc/0x1f0 kernel/panic.c:542
    warn_slowpath_null+0x2c/0x40 kernel/panic.c:585
    dccp_set_state+0x229/0x290 net/dccp/proto.c:83
    dccp_close+0x612/0xc10 net/dccp/proto.c:1016
    inet_release+0xef/0x1c0 net/ipv4/af_inet.c:415
    sock_release+0x8e/0x1d0 net/socket.c:570
    sock_close+0x16/0x20 net/socket.c:1017
    __fput+0x29d/0x720 fs/file_table.c:208
    ____fput+0x15/0x20 fs/file_table.c:244
    task_work_run+0xf8/0x170 kernel/task_work.c:116
    [< inline >] exit_task_work include/linux/task_work.h:21
    do_exit+0x883/0x2ac0 kernel/exit.c:828
    do_group_exit+0x10e/0x340 kernel/exit.c:931
    get_signal+0x634/0x15a0 kernel/signal.c:2307
    do_signal+0x8d/0x1a30 arch/x86/kernel/signal.c:807
    exit_to_usermode_loop+0xe5/0x130 arch/x86/entry/common.c:156
    [< inline >] prepare_exit_to_usermode arch/x86/entry/common.c:190
    syscall_return_slowpath+0x1a8/0x1e0 arch/x86/entry/common.c:259
    entry_SYSCALL_64_fastpath+0xc0/0xc2
    Dumping ftrace buffer:
    (ftrace buffer empty)
    Kernel Offset: disabled

    Fix this the same way we did for TCP in commit 565b7b2d2e63
    ("tcp: do not send reset to already closed sockets")

    Signed-off-by: Eric Dumazet
    Reported-by: Andrey Konovalov
    Tested-by: Andrey Konovalov
    Signed-off-by: David S. Miller

    Eric Dumazet
     

02 Dec, 2015

1 commit

  • This patch is a cleanup to make the following patch easier to
    review.

    Goal is to move SOCK_ASYNC_NOSPACE and SOCK_ASYNC_WAITDATA
    from (struct socket)->flags to a (struct socket_wq)->flags
    to benefit from RCU protection in sock_wake_async()

    To ease backports, we rename both constants.

    Two new helpers, sk_set_bit(int nr, struct sock *sk)
    and sk_clear_bit(int nr, struct sock *sk) are added so that
    the following patch can change their implementation.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

27 Jul, 2015

1 commit

  • Currently, tcp_recvmsg enters a busy loop in sk_wait_data if called
    with flags = MSG_WAITALL | MSG_PEEK.

    sk_wait_data waits for sk_receive_queue not empty, but in this case,
    the receive queue is not empty, but does not contain any skb that we
    can use.

    Add a "last skb seen on receive queue" argument to sk_wait_data, so
    that it sleeps until the receive queue has new skbs.

    Link: https://bugzilla.kernel.org/show_bug.cgi?id=99461
    Link: https://sourceware.org/bugzilla/show_bug.cgi?id=18493
    Link: https://bugzilla.redhat.com/show_bug.cgi?id=1205258
    Reported-by: Enrico Scholz
    Reported-by: Dan Searle
    Signed-off-by: Sabrina Dubroca
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Sabrina Dubroca
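
The difference between the two wake-up conditions can be shown with a small user-space model. This is only a sketch: the `mock_*` types and both predicate names are hypothetical, and the real helper sleeps on the socket's wait queue rather than evaluating a predicate once.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical minimal packet buffer and receive queue. */
struct mock_skb {
    struct mock_skb *next;
};

struct mock_queue {
    struct mock_skb *head;
    struct mock_skb *tail;
};

/* Append one buffer to the tail of the queue. */
void mock_enqueue(struct mock_queue *q, struct mock_skb *skb)
{
    skb->next = NULL;
    if (q->tail)
        q->tail->next = skb;
    else
        q->head = skb;
    q->tail = skb;
}

/* Old condition: stop waiting as soon as the queue is non-empty. */
bool wake_if_nonempty(const struct mock_queue *q)
{
    return q->head != NULL;
}

/* New condition: stop waiting only when the tail is not the last skb seen. */
bool wake_if_new_skb(const struct mock_queue *q,
                     const struct mock_skb *last_seen)
{
    return q->tail != last_seen;
}
```

When a peeking reader has already examined every queued buffer, `wake_if_nonempty()` stays true and the caller spins, whereas `wake_if_new_skb()` stays false until another buffer is appended.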
     

03 Mar, 2015

1 commit

  • Now that TIPC no longer depends on the iocb argument in its internal
    implementations of the sendmsg() and recvmsg() hooks defined in the proto
    structure, nothing uses the iocb argument in those hooks any more.
    We can therefore drop the redundant iocb argument from all
    implementations of sendmsg() and recvmsg() in the entire
    networking stack.

    Cc: Christoph Hellwig
    Suggested-by: Al Viro
    Signed-off-by: Ying Xue
    Signed-off-by: David S. Miller

    Ying Xue
     

11 Dec, 2014

1 commit


24 Nov, 2014

1 commit


06 Nov, 2014

1 commit

  • This encapsulates all of the skb_copy_datagram_iovec() callers
    with call argument signature "skb, offset, msghdr->msg_iov, length".

    When we move to iov_iters in the networking, the iov_iter object will
    sit in the msghdr.

    Having a helper like this means there will be fewer places to touch
    during that transformation.

    Based upon descriptions and patch from Al Viro.

    Signed-off-by: David S. Miller

    David S. Miller
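
The encapsulation described above can be sketched with mock types. Everything here is hypothetical and simplified (a single iovec, a flat data buffer, -1 as the only error value); it only illustrates why a wrapper taking the msghdr leaves fewer call sites to change later.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct mock_iovec {
    void *iov_base;
    size_t iov_len;
};

struct mock_msghdr {
    struct mock_iovec *msg_iov;
};

struct mock_skb {
    const unsigned char *data;
    size_t len;
};

/* Long-form call: copy len bytes at offset into one iovec; 0 or -1. */
int mock_copy_datagram_iovec(const struct mock_skb *skb, size_t offset,
                             struct mock_iovec *to, size_t len)
{
    if (offset > skb->len || len > skb->len - offset || len > to->iov_len)
        return -1;
    memcpy(to->iov_base, skb->data + offset, len);
    return 0;
}

/*
 * Helper: callers pass the msghdr itself, so only this function knows
 * that the destination currently lives in msg->msg_iov.
 */
int mock_copy_datagram_msg(const struct mock_skb *skb, size_t offset,
                           struct mock_msghdr *msg, size_t len)
{
    return mock_copy_datagram_iovec(skb, offset, msg->msg_iov, len);
}
```

If the destination later moves to a different field of the msghdr, only the body of `mock_copy_datagram_msg()` changes and its callers stay as they are.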
     

10 Oct, 2014

1 commit

  • Pull percpu updates from Tejun Heo:
    "A lot of activities on percpu front. Notable changes are...

    - percpu allocator now can take @gfp. If @gfp doesn't contain
    GFP_KERNEL, it tries to allocate from what's already available to
    the allocator and a work item tries to keep the reserve around
    certain level so that these atomic allocations usually succeed.

    This will replace the ad-hoc percpu memory pool used by
    blk-throttle and also be used by the planned blkcg support for
    writeback IOs.

    Please note that I noticed a bug in how @gfp is interpreted while
    preparing this pull request and applied the fix 6ae833c7fe0c
    ("percpu: fix how @gfp is interpreted by the percpu allocator")
    just now.

    - percpu_ref now uses longs for percpu and global counters instead of
    ints. It leads to more sparse packing of the percpu counters on
    64bit machines but the overhead should be negligible and this
    allows using percpu_ref for refcnting pages and in-memory objects
    directly.

    - The switching between percpu and single counter modes of a
    percpu_ref is made independent of putting the base ref and a
    percpu_ref can now optionally be initialized in single or killed
    mode. This allows avoiding percpu shutdown latency for cases where
    the refcounted objects may be synchronously created and destroyed
    in rapid succession with only a fraction of them reaching fully
    operational status (SCSI probing does this when combined with
    blk-mq support). It's also planned to be used to implement forced
    single mode to detect underflow more timely for debugging.

    There's a separate branch percpu/for-3.18-consistent-ops which cleans
    up the duplicate percpu accessors. That branch causes a number of
    conflicts with s390 and other trees. I'll send a separate pull
    request w/ resolutions once other branches are merged"

    * 'for-3.18' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (33 commits)
    percpu: fix how @gfp is interpreted by the percpu allocator
    blk-mq, percpu_ref: start q->mq_usage_counter in atomic mode
    percpu_ref: make INIT_ATOMIC and switch_to_atomic() sticky
    percpu_ref: add PERCPU_REF_INIT_* flags
    percpu_ref: decouple switching to percpu mode and reinit
    percpu_ref: decouple switching to atomic mode and killing
    percpu_ref: add PCPU_REF_DEAD
    percpu_ref: rename things to prepare for decoupling percpu/atomic mode switch
    percpu_ref: replace pcpu_ prefix with percpu_
    percpu_ref: minor code and comment updates
    percpu_ref: relocate percpu_ref_reinit()
    Revert "blk-mq, percpu_ref: implement a kludge for SCSI blk-mq stall during probe"
    Revert "percpu: free percpu allocation info for uniprocessor system"
    percpu-refcount: make percpu_ref based on longs instead of ints
    percpu-refcount: improve WARN messages
    percpu: fix locking regression in the failure path of pcpu_alloc()
    percpu-refcount: add @gfp to percpu_ref_init()
    proportions: add @gfp to init functions
    percpu_counter: add @gfp to percpu_counter_init()
    percpu_counter: make percpu_counters_lock irq-safe
    ...

    Linus Torvalds
     

09 Oct, 2014

1 commit

  • Pull networking updates from David Miller:
    "Most notable changes in here:

    1) By far the biggest accomplishment, thanks to a large range of
    contributors, is the addition of multi-send for transmit. This is
    the result of discussions back in Chicago, and the hard work of
    several individuals.

    Now, when the ->ndo_start_xmit() method of a driver sees
    skb->xmit_more as true, it can choose to defer the doorbell
    telling the driver to start processing the new TX queue entries.

    skb->xmit_more means that the generic networking is guaranteed to
    call the driver immediately with another SKB to send.

    There is logic added to the qdisc layer to dequeue multiple
    packets at a time, and the handling of mis-predicted offloads in
    software is now done with no locks held.

    Finally, pktgen is extended to have a "burst" parameter that can
    be used to test a multi-send implementation.

    Several drivers have xmit_more support: i40e, igb, ixgbe, mlx4,
    virtio_net

    Adding support is almost trivial, so expect more drivers to
    support this optimization soon.

    I want to thank, in no particular or implied order, Jesper
    Dangaard Brouer, Eric Dumazet, Alexander Duyck, Tom Herbert, Jamal
    Hadi Salim, John Fastabend, Florian Westphal, Daniel Borkmann,
    David Tat, Hannes Frederic Sowa, and Rusty Russell.

    2) PTP and timestamping support in bnx2x, from Michal Kalderon.

    3) Allow adjusting the rx_copybreak threshold for a driver via
    ethtool, and add rx_copybreak support to enic driver. From
    Govindarajulu Varadarajan.

    4) Significant enhancements to the generic PHY layer and the bcm7xxx
    driver in particular (EEE support, auto power down, etc.) from
    Florian Fainelli.

    5) Allow raw buffers to be used for flow dissection, allowing drivers
    to determine the optimal "linear pull" size for devices that DMA
    into pools of pages. The objective is to get exactly the
    necessary amount of headers into the linear SKB area pre-pulled,
    but no more. The new interface drivers use is eth_get_headlen().
    From WANG Cong, with driver conversions (several had their own
    by-hand duplicated implementations) by Alexander Duyck and Eric
    Dumazet.

    6) Support checksumming more smoothly and efficiently for
    encapsulations, and add "foo over UDP" facility. From Tom
    Herbert.

    7) Add Broadcom SF2 switch driver to DSA layer, from Florian
    Fainelli.

    8) eBPF now can load programs via a system call and has an extensive
    testsuite. Alexei Starovoitov and Daniel Borkmann.

    9) Major overhaul of the packet scheduler to use RCU in several major
    areas such as the classifiers and rate estimators. From John
    Fastabend.

    10) Add driver for Intel FM10000 Ethernet Switch, from Alexander
    Duyck.

    11) Rearrange TCP_SKB_CB() to reduce cache line misses, from Eric
    Dumazet.

    12) Add Datacenter TCP congestion control algorithm support, from
    Florian Westphal.

    13) Reorganize sk_buff so that __copy_skb_header() is significantly
    faster. From Eric Dumazet"
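
    The doorbell deferral described in item 1) can be sketched in user
    space as follows. This is only a model under stated assumptions:
    struct mock_txq, struct mock_tx_pkt and mock_start_xmit() are
    hypothetical names, and the "doorbell" is a counter rather than a
    device register write.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical packet: only the hint that another packet follows immediately. */
struct mock_tx_pkt {
    bool xmit_more;
};

/* Hypothetical TX queue state. */
struct mock_txq {
    unsigned int posted;     /* descriptors written to the ring */
    unsigned int doorbells;  /* number of (expensive) hardware notifications */
    unsigned int seen_by_hw; /* descriptors the hardware has been told about */
};

/* Queue one packet; notify the hardware only when no further packet is promised. */
void mock_start_xmit(struct mock_txq *q, const struct mock_tx_pkt *pkt)
{
    q->posted++;
    if (!pkt->xmit_more) {
        q->doorbells++;
        q->seen_by_hw = q->posted;
    }
}
```

    For a burst of three packets with xmit_more set on the first two,
    this model rings the doorbell once instead of three times. A real
    driver would presumably also have to flush when its queue is
    stopped; that case is left out here.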

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1558 commits)
    netlabel: directly return netlbl_unlabel_genl_init()
    net: add netdev_txq_bql_{enqueue, complete}_prefetchw() helpers
    net: description of dma_cookie cause make xmldocs warning
    cxgb4: clean up a type issue
    cxgb4: potential shift wrapping bug
    i40e: skb->xmit_more support
    net: fs_enet: Add NAPI TX
    net: fs_enet: Remove non NAPI RX
    r8169:add support for RTL8168EP
    net_sched: copy exts->type in tcf_exts_change()
    wimax: convert printk to pr_foo()
    af_unix: remove 0 assignment on static
    ipv6: Do not warn for informational ICMP messages, regardless of type.
    Update Intel Ethernet Driver maintainers list
    bridge: Save frag_max_size between PRE_ROUTING and POST_ROUTING
    tipc: fix bug in multicast congestion handling
    net: better IFF_XMIT_DST_RELEASE support
    net/mlx4_en: remove NETDEV_TX_BUSY
    3c59x: fix bad split of cpu_to_le32(pci_map_single())
    net: bcmgenet: fix Tx ring priority programming
    ...

    Linus Torvalds
     

08 Oct, 2014

1 commit

  • Pull dmaengine updates from Dan Williams:
    "Even though this has fixes marked for -stable, given the size and the
    needed conflict resolutions this is 3.18-rc1/merge-window material.

    These patches have been languishing in my tree for a long while. The
    fact that I do not have the time to do proper/prompt maintenance of
    this tree is a primary factor in the decision to step down as
    dmaengine maintainer. That and the fact that the bulk of drivers/dma/
    activity is going through Vinod these days.

    The net_dma removal has not been in -next. It has developed simple
    conflicts against mainline and net-next (for-3.18).

    Continuing thanks to Vinod for staying on top of drivers/dma/.

    Summary:

    1/ Step down as dmaengine maintainer see commit 08223d80df38
    "dmaengine maintainer update"

    2/ Removal of net_dma, as it has been marked 'broken' since 3.13
    (commit 77873803363c "net_dma: mark broken"), without reports of
    performance regression.

    3/ Miscellaneous fixes"

    * tag 'dmaengine-3.17' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/dmaengine:
    net: make tcp_cleanup_rbuf private
    net_dma: revert 'copied_early'
    net_dma: simple removal
    dmaengine maintainer update
    dmatest: prevent memory leakage on error path in thread
    ioat: Use time_before_jiffies()
    dmaengine: fix xor sources continuation
    dma: mv_xor: Rename __mv_xor_slot_cleanup() to mv_xor_slot_cleanup()
    dma: mv_xor: Remove all callers of mv_xor_slot_cleanup()
    dma: mv_xor: Remove unneeded mv_xor_clean_completed_slots() call
    ioat: Use pci_enable_msix_exact() instead of pci_enable_msix()
    drivers: dma: Include appropriate header file in dca.c
    drivers: dma: Mark functions as static in dma_v3.c
    dma: mv_xor: Add DMA API error checks
    ioat/dca: Use dev_is_pci() to check whether it is pci device

    Linus Torvalds
     

02 Oct, 2014

1 commit


28 Sep, 2014

1 commit

  • Per commit "77873803363c net_dma: mark broken" net_dma is no longer used
    and there is no plan to fix it.

    This is the mechanical removal of bits in CONFIG_NET_DMA ifdef guards.
    Reverting the remainder of the net_dma induced changes is deferred to
    subsequent patches.

    Marked for stable due to Roman's report of a memory leak in
    dma_pin_iovec_pages():

    https://lkml.org/lkml/2014/9/3/177

    Cc: Dave Jiang
    Cc: Vinod Koul
    Cc: David Whipple
    Cc: Alexander Duyck
    Reported-by: Roman Gushchin
    Acked-by: David S. Miller
    Signed-off-by: Dan Williams

    Dan Williams
     

08 Sep, 2014

1 commit

  • Percpu allocator now supports allocation mask. Add @gfp to
    percpu_counter_init() so that !GFP_KERNEL allocation masks can be used
    with percpu_counters too.

    We could have left percpu_counter_init() alone and added
    percpu_counter_init_gfp(); however, the number of users isn't that
    high and introducing _gfp variants to all percpu data structures would
    be quite ugly, so let's just do the conversion. This is the one with
    the most users. Other percpu data structures are a lot easier to
    convert.

    This patch doesn't make any functional difference.

    Signed-off-by: Tejun Heo
    Acked-by: Jan Kara
    Acked-by: "David S. Miller"
    Cc: x86@kernel.org
    Cc: Jens Axboe
    Cc: "Theodore Ts'o"
    Cc: Alexander Viro
    Cc: Andrew Morton

    Tejun Heo
     

08 May, 2014

1 commit

  • commit 8f0ea0fe3a036a47767f9c80e (snmp: reduce percpu needs by 50%)
    reduced snmp array size to 1, so technically it doesn't have to be
    an array any more. What's more, after the following commit:

    commit 933393f58fef9963eac61db8093689544e29a600
    Date: Thu Dec 22 11:58:51 2011 -0600

    percpu: Remove irqsafe_cpu_xxx variants

    We simply say that regular this_cpu use must be safe regardless of
    preemption and interrupt state. That has no material change for x86
    and s390 implementations of this_cpu operations. However, arches that
    do not provide their own implementation for this_cpu operations will
    now get code generated that disables interrupts instead of preemption.

    probably no arch wants to have SNMP_ARRAY_SZ == 2. At least after
    almost 3 years, no one complains.

    So, just convert the array to a single pointer and remove snmp_mib_init()
    and snmp_mib_free() as well.

    Cc: Christoph Lameter
    Cc: Eric Dumazet
    Cc: David S. Miller
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller

    WANG Cong
     

09 Oct, 2013

1 commit

  • TCP listener refactoring, part 3 :

    Our goal is to hash SYN_RECV sockets into main ehash for fast lookup,
    and parallel SYN processing.

    Current inet_ehash_bucket contains two chains, one for ESTABLISH (and
    friend states) sockets, another for TIME_WAIT sockets only.

    As the hash table is sized to get at most one socket per bucket, it
    makes little sense to have separate twchain, as it makes the lookup
    slightly more complicated, and doubles hash table memory usage.

    If we make sure all socket types have the lookup keys at the same
    offsets, we can use a generic and faster lookup. It turns out TIME_WAIT
    and ESTABLISHED sockets already have common lookup fields for IPv4.

    [ INET_TW_MATCH() is no longer needed ]

    I'll provide a follow-up to factorize IPv6 lookup as well, to remove
    INET6_TW_MATCH()

    This way, SYN_RECV pseudo sockets will be supported the same.

    A new sock_gen_put() helper is added, doing either a sock_put() or
    inet_twsk_put() [ and will support SYN_RECV later ].

    Note this helper should only be called in real slow path, when rcu
    lookup found a socket that was moved to another identity (freed/reused
    immediately), but could eventually be used in other contexts, like
    sock_edemux()

    Before patch :

    dmesg | grep "TCP established"

    TCP established hash table entries: 524288 (order: 11, 8388608 bytes)

    After patch :

    TCP established hash table entries: 524288 (order: 10, 4194304 bytes)
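
    The halving shown above follows from the bucket layout. A minimal
    sketch, assuming each chain head is a single pointer (the mock_*
    types below are hypothetical stand-ins, not the kernel structures):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical chain head: one pointer to the first node. */
struct mock_head {
    void *first;
};

/* Bucket layout before the patch: separate chains for ESTABLISHED and TIME_WAIT. */
struct mock_bucket_before {
    struct mock_head chain;
    struct mock_head twchain;
};

/* Bucket layout after the patch: one shared chain. */
struct mock_bucket_after {
    struct mock_head chain;
};

/* Total memory used by a table of fixed-size buckets. */
size_t mock_table_bytes(size_t entries, size_t bucket_size)
{
    return entries * bucket_size;
}
```

    With 8-byte pointers, 524288 buckets take 524288 * 16 = 8388608
    bytes before and 524288 * 8 = 4194304 bytes after, matching the
    dmesg lines.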

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

25 Jul, 2013

1 commit

  • Several call sites use the hardcoded following condition :

    sk_stream_wspace(sk) >= sk_stream_min_wspace(sk)

    Let's use a helper because TCP_NOTSENT_LOWAT support will change this
    condition for TCP sockets.
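
    A user-space sketch of wrapping the condition in a helper. The
    struct and the mock_* function names are hypothetical, and the two
    definitions (free space is sndbuf minus queued bytes, minimum is
    half of the queued bytes) are assumptions made for illustration
    rather than a copy of the kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical socket with just the fields this check needs. */
struct mock_sock {
    int sndbuf;      /* send buffer size in bytes */
    int wmem_queued; /* bytes currently queued for sending */
};

/* Assumed definition: free space left in the send buffer. */
int mock_stream_wspace(const struct mock_sock *sk)
{
    return sk->sndbuf - sk->wmem_queued;
}

/* Assumed definition: minimum free space before the socket counts as writeable. */
int mock_stream_min_wspace(const struct mock_sock *sk)
{
    return sk->wmem_queued >> 1;
}

/* The helper: one place to change when the writeable condition changes. */
bool mock_stream_is_writeable(const struct mock_sock *sk)
{
    return mock_stream_wspace(sk) >= mock_stream_min_wspace(sk);
}
```

    Call sites then test mock_stream_is_writeable(sk) instead of
    repeating the comparison, so a later change to the condition
    touches one function.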

    Signed-off-by: Eric Dumazet
    Cc: Neal Cardwell
    Cc: Yuchung Cheng
    Acked-by: Neal Cardwell
    Signed-off-by: David S. Miller

    Eric Dumazet
     

17 May, 2012

1 commit


20 Dec, 2011

1 commit

  • module_param(bool) used to counter-intuitively take an int. In
    fddd5201 (mid-2009) we allowed bool or int/unsigned int using a messy
    trick.

    It's time to remove the int/unsigned int option. For this version
    it'll simply give a warning, but it'll break next kernel version.

    (Thanks to Joe Perches for suggesting coccinelle for 0/1 -> true/false).

    Cc: "David S. Miller"
    Cc: netdev@vger.kernel.org
    Signed-off-by: Rusty Russell
    Signed-off-by: David S. Miller

    Rusty Russell
     

01 Aug, 2011

1 commit

  • This uses the new feature-negotiation framework to signal Ack Ratio changes,
    as required by RFC 4341, sec. 6.1.2.

    That raises some problems with CCID-2, which at the moment can not cope
    gracefully with Ack Ratios > 1. Since these issues are not directly related
    to feature negotiation, they are marked by a FIXME.

    Signed-off-by: Gerrit Renker
    Signed-off-by: Samuel Jero
    Acked-by: Ian McDonald

    Gerrit Renker
     

07 Dec, 2010

2 commits

  • Ensure that cmsg->cmsg_type value is valid for qpolicy
    that is currently in use.

    Signed-off-by: Tomasz Grobelny
    Signed-off-by: Gerrit Renker

    Tomasz Grobelny
     
  • This patch adds a generic infrastructure for policy-based dequeueing of
    TX packets and provides two policies:
    * a simple FIFO policy (which is the default) and
    * a priority based policy (set via socket options).
    Both policies honour the tx_qlen sysctl for the maximum size of the write
    queue (can be overridden via socket options).

    The priority policy uses skb->priority internally to assign a u32 priority
    identifier, using the same ranking as SO_PRIORITY. The skb->priority field
    is set to 0 when the packet leaves DCCP. The priority is supplied as ancillary
    data using cmsg(3), the patch also provides the requisite parsing routines.
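
    The difference between the two policies can be sketched in user
    space as below. This is an illustration only: the queue is a plain
    array, all mock_* names are hypothetical, and the assumption that
    the priority policy dequeues the packet with the numerically
    highest priority (oldest first on ties) is mine, not taken from
    the text above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical queued packet. */
struct mock_queued_pkt {
    unsigned int priority; /* larger value is assumed to mean more urgent */
    int id;                /* label used only to identify the packet */
};

enum mock_qpolicy {
    MOCK_QPOLICY_FIFO, /* default: dequeue in arrival order */
    MOCK_QPOLICY_PRIO, /* dequeue the highest-priority packet first */
};

/* Return the index of the next packet to dequeue, or -1 if the queue is empty. */
int mock_qpolicy_pick(const struct mock_queued_pkt *q, size_t len,
                      enum mock_qpolicy policy)
{
    size_t best = 0;
    size_t i;

    if (len == 0)
        return -1;
    if (policy == MOCK_QPOLICY_PRIO) {
        /* Strict '>' keeps the oldest packet among equal priorities. */
        for (i = 1; i < len; i++)
            if (q[i].priority > q[best].priority)
                best = i;
    }
    return (int)best;
}
```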

    Signed-off-by: Tomasz Grobelny
    Signed-off-by: Gerrit Renker

    Tomasz Grobelny
     

29 Oct, 2010

1 commit

  • This extends the existing wait-for-ccid routine so that it may be used with
    different types of CCID, addressing the following problems:

    1) The queue-drain mechanism only works with rate-based CCIDs. If CCID-2 for
    example has a full TX queue and becomes network-limited just as the
    application wants to close, then waiting for CCID-2 to become unblocked
    could lead to an indefinite delay (i.e., application "hangs").
    2) Since each TX CCID in turn uses a feedback mechanism, there may be changes
    in its sending policy while the queue is being drained. This can lead to
    further delays during which the application will not be able to terminate.
    3) The minimum wait time for CCID-3/4 can be expected to be the queue length
    times the current inter-packet delay. For example if tx_qlen=100 and a delay
    of 15 ms is used for each packet, then the application would have to wait
    for a minimum of 1.5 seconds before being allowed to exit.
    4) There is no way for the user/application to control this behaviour. It would
    be good to use the timeout argument of dccp_close() as an upper bound. Then
    the maximum time that an application is willing to wait for its CCIDs can
    be set via the SO_LINGER option.

    These problems are addressed by giving the CCID a grace period of up to the
    `timeout' value.
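
    The arithmetic in 3) and the effect of the upper bound from 4) can
    be worked through with a small sketch. The function names are
    hypothetical and the model ignores changes in sending rate while
    the queue drains.

```c
#include <assert.h>

/* Lower bound on drain time: one inter-packet delay per queued packet. */
long mock_min_drain_ms(long tx_qlen, long delay_ms)
{
    return tx_qlen * delay_ms;
}

/* Time spent waiting when the caller's timeout caps the grace period. */
long mock_bounded_wait_ms(long drain_ms, long timeout_ms)
{
    return drain_ms < timeout_ms ? drain_ms : timeout_ms;
}
```

    With tx_qlen=100 and a 15 ms delay the drain takes at least
    100 * 15 = 1500 ms; with a timeout of 1000 ms the wait ends after
    1000 ms and whatever is still queued is left to the final purge.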

    The wait-for-ccid function is, as before, used when the application
    (a) has read all the data in its receive buffer and
    (b) if SO_LINGER was set with a non-zero linger time, or
    (c) the socket is either in the OPEN (active close) or in the PASSIVE_CLOSEREQ
    state (client application closes after receiving CloseReq).

    In addition, there is a catch-all case of __skb_queue_purge() after waiting for
    the CCID. This is necessary since the write queue may still have data when
    (a) the host has been passively-closed,
    (b) abnormal termination (unread data, zero linger time),
    (c) wait-for-ccid could not finish within the given time limit.

    Signed-off-by: Gerrit Renker
    Signed-off-by: David S. Miller

    Gerrit Renker
     

12 Oct, 2010

1 commit

  • This omits the redundant "DCCP:" in warning messages, since DCCP_WARN() already
    echoes the function name, avoiding messages like

    kernel: [10988.766503] dccp_close: DCCP: ABORT -- 209 bytes unread

    Signed-off-by: Gerrit Renker

    Gerrit Renker
     

07 Oct, 2010

1 commit


26 Jun, 2010

1 commit

  • In preparation for 64bit snmp counters for some mibs,
    add an 'align' parameter to snmp_mib_init(), instead
    of assuming mibs only contain 'unsigned long' fields.

    Callers can use __alignof__(type) to provide correct
    alignment.

    Signed-off-by: Eric Dumazet
    CC: Herbert Xu
    CC: Arnaldo Carvalho de Melo
    CC: Hideaki YOSHIFUJI
    CC: Vlad Yasevich
    Signed-off-by: David S. Miller

    Eric Dumazet
     

31 May, 2010

1 commit

  • Use memdup_user when user data is immediately copied into the
    allocated region.

    The semantic patch that makes this change is as follows:
    (http://coccinelle.lip6.fr/)

    //
    @@
    expression from,to,size,flag;
    position p;
    identifier l1,l2;
    @@

    - to = \(kmalloc@p\|kzalloc@p\)(size,flag);
    + to = memdup_user(from,size);
    if (
    - to==NULL
    + IS_ERR(to)
    || ...) {

    }
    - if (copy_from_user(to, from, size) != 0) {
    -
    - }
    //

    Signed-off-by: Julia Lawall
    Acked-by: Gerrit Renker
    Signed-off-by: David S. Miller

    Julia Lawall
     

21 Apr, 2010

1 commit

  • Define a new function to return the waitqueue of a "struct sock".

    static inline wait_queue_head_t *sk_sleep(struct sock *sk)
    {
            return sk->sk_sleep;
    }

    Change all read occurrences of sk_sleep by a call to this function.

    Needed for a future RCU conversion. sk_sleep won't be a field directly
    available.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet