23 Jan, 2019

1 commit

  • In addition to snooping IGMP/MLD queries, RFC 4541, section 2.1.1.a)
    recommends snooping multicast router advertisements to detect
    multicast routers.

    Multicast router advertisements are sent to an "all-snoopers"
    multicast address. To be able to receive them reliably, we need to
    join this group.

    Otherwise other snooping switches might refrain from forwarding these
    advertisements to us.
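
    As a userspace illustration only (the commit itself changes in-kernel
    code), joining the group could be sketched as below. The address
    ff02::6a is the IPv6 link-local all-snoopers group from RFC 4286; the
    helper names and the ifindex parameter are assumptions made for this
    example.

```c
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Link-local all-snoopers group from RFC 4286 (ff02::6a). */
static const unsigned char ALL_SNOOPERS_V6[16] = {
    0xff, 0x02, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0x6a
};

/* Fill a join request for the all-snoopers group on interface ifindex. */
void fill_all_snoopers_mreq(struct ipv6_mreq *mreq, unsigned int ifindex)
{
    memset(mreq, 0, sizeof(*mreq));
    memcpy(mreq->ipv6mr_multiaddr.s6_addr, ALL_SNOOPERS_V6, 16);
    mreq->ipv6mr_interface = ifindex;
}

/* Join the group on an already opened AF_INET6 socket; returns 0 on success. */
int join_all_snoopers(int fd, unsigned int ifindex)
{
    struct ipv6_mreq mreq;

    fill_all_snoopers_mreq(&mreq, ifindex);
    return setsockopt(fd, IPPROTO_IPV6, IPV6_JOIN_GROUP, &mreq, sizeof(mreq));
}
```

    Once the join succeeds, snooping switches on the path have a listener
    to forward the advertisements to.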

    Signed-off-by: Linus Lüssing
    Signed-off-by: David S. Miller

    Linus Lüssing
     

20 Oct, 2018

1 commit

  • net/sched/cls_api.c has overlapping changes to a call to
    nlmsg_parse(), one (from 'net') added rtm_tca_policy instead of NULL
    to the 5th argument, and another (from 'net-next') added cb->extack
    instead of NULL to the 6th argument.

    net/ipv4/ipmr_base.c is a case of a bug fix in 'net' being done to
    code which moved (to mr_table_dump) in 'net-next'. Thanks to David
    Ahern for the heads up.

    Signed-off-by: David S. Miller

    David S. Miller
     

16 Oct, 2018

1 commit

  • syzbot found a use-after-free in inet6_mc_check [1]

    The problem here is that inet6_mc_check() relies on RCU
    and read_lock(&iml->sflock).

    So the fact that ip6_mc_leave_src() is called under RTNL
    and the socket lock does not help us; we need to acquire
    iml->sflock in write mode.

    In the future, we should convert all this stuff to RCU.

    [1]
    BUG: KASAN: use-after-free in ipv6_addr_equal include/net/ipv6.h:521 [inline]
    BUG: KASAN: use-after-free in inet6_mc_check+0xae7/0xb40 net/ipv6/mcast.c:649
    Read of size 8 at addr ffff8801ce7f2510 by task syz-executor0/22432

    CPU: 1 PID: 22432 Comm: syz-executor0 Not tainted 4.19.0-rc7+ #280
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    Call Trace:
    __dump_stack lib/dump_stack.c:77 [inline]
    dump_stack+0x1c4/0x2b4 lib/dump_stack.c:113
    print_address_description.cold.8+0x9/0x1ff mm/kasan/report.c:256
    kasan_report_error mm/kasan/report.c:354 [inline]
    kasan_report.cold.9+0x242/0x309 mm/kasan/report.c:412
    __asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:433
    ipv6_addr_equal include/net/ipv6.h:521 [inline]
    inet6_mc_check+0xae7/0xb40 net/ipv6/mcast.c:649
    __raw_v6_lookup+0x320/0x3f0 net/ipv6/raw.c:98
    ipv6_raw_deliver net/ipv6/raw.c:183 [inline]
    raw6_local_deliver+0x3d3/0xcb0 net/ipv6/raw.c:240
    ip6_input_finish+0x467/0x1aa0 net/ipv6/ip6_input.c:345
    NF_HOOK include/linux/netfilter.h:289 [inline]
    ip6_input+0xe9/0x600 net/ipv6/ip6_input.c:426
    ip6_mc_input+0x48a/0xd20 net/ipv6/ip6_input.c:503
    dst_input include/net/dst.h:450 [inline]
    ip6_rcv_finish+0x17a/0x330 net/ipv6/ip6_input.c:76
    NF_HOOK include/linux/netfilter.h:289 [inline]
    ipv6_rcv+0x120/0x640 net/ipv6/ip6_input.c:271
    __netif_receive_skb_one_core+0x14d/0x200 net/core/dev.c:4913
    __netif_receive_skb+0x2c/0x1e0 net/core/dev.c:5023
    netif_receive_skb_internal+0x12c/0x620 net/core/dev.c:5126
    napi_frags_finish net/core/dev.c:5664 [inline]
    napi_gro_frags+0x75a/0xc90 net/core/dev.c:5737
    tun_get_user+0x3189/0x4250 drivers/net/tun.c:1923
    tun_chr_write_iter+0xb9/0x154 drivers/net/tun.c:1968
    call_write_iter include/linux/fs.h:1808 [inline]
    do_iter_readv_writev+0x8b0/0xa80 fs/read_write.c:680
    do_iter_write+0x185/0x5f0 fs/read_write.c:959
    vfs_writev+0x1f1/0x360 fs/read_write.c:1004
    do_writev+0x11a/0x310 fs/read_write.c:1039
    __do_sys_writev fs/read_write.c:1112 [inline]
    __se_sys_writev fs/read_write.c:1109 [inline]
    __x64_sys_writev+0x75/0xb0 fs/read_write.c:1109
    do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
    entry_SYSCALL_64_after_hwframe+0x49/0xbe
    RIP: 0033:0x457421
    Code: 75 14 b8 14 00 00 00 0f 05 48 3d 01 f0 ff ff 0f 83 34 b5 fb ff c3 48 83 ec 08 e8 1a 2d 00 00 48 89 04 24 b8 14 00 00 00 0f 05 8b 3c 24 48 89 c2 e8 63 2d 00 00 48 89 d0 48 83 c4 08 48 3d 01
    RSP: 002b:00007f2d30ecaba0 EFLAGS: 00000293 ORIG_RAX: 0000000000000014
    RAX: ffffffffffffffda RBX: 000000000000003e RCX: 0000000000457421
    RDX: 0000000000000001 RSI: 00007f2d30ecabf0 RDI: 00000000000000f0
    RBP: 0000000020000500 R08: 00000000000000f0 R09: 0000000000000000
    R10: 0000000000000000 R11: 0000000000000293 R12: 00007f2d30ecb6d4
    R13: 00000000004c4890 R14: 00000000004d7b90 R15: 00000000ffffffff

    Allocated by task 22437:
    save_stack+0x43/0xd0 mm/kasan/kasan.c:448
    set_track mm/kasan/kasan.c:460 [inline]
    kasan_kmalloc+0xc7/0xe0 mm/kasan/kasan.c:553
    __do_kmalloc mm/slab.c:3718 [inline]
    __kmalloc+0x14e/0x760 mm/slab.c:3727
    kmalloc include/linux/slab.h:518 [inline]
    sock_kmalloc+0x15a/0x1f0 net/core/sock.c:1983
    ip6_mc_source+0x14dd/0x1960 net/ipv6/mcast.c:427
    do_ipv6_setsockopt.isra.9+0x3afb/0x45d0 net/ipv6/ipv6_sockglue.c:743
    ipv6_setsockopt+0xbd/0x170 net/ipv6/ipv6_sockglue.c:933
    rawv6_setsockopt+0x59/0x140 net/ipv6/raw.c:1069
    sock_common_setsockopt+0x9a/0xe0 net/core/sock.c:3038
    __sys_setsockopt+0x1ba/0x3c0 net/socket.c:1902
    __do_sys_setsockopt net/socket.c:1913 [inline]
    __se_sys_setsockopt net/socket.c:1910 [inline]
    __x64_sys_setsockopt+0xbe/0x150 net/socket.c:1910
    do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
    entry_SYSCALL_64_after_hwframe+0x49/0xbe

    Freed by task 22430:
    save_stack+0x43/0xd0 mm/kasan/kasan.c:448
    set_track mm/kasan/kasan.c:460 [inline]
    __kasan_slab_free+0x102/0x150 mm/kasan/kasan.c:521
    kasan_slab_free+0xe/0x10 mm/kasan/kasan.c:528
    __cache_free mm/slab.c:3498 [inline]
    kfree+0xcf/0x230 mm/slab.c:3813
    __sock_kfree_s net/core/sock.c:2004 [inline]
    sock_kfree_s+0x29/0x60 net/core/sock.c:2010
    ip6_mc_leave_src+0x11a/0x1d0 net/ipv6/mcast.c:2448
    __ipv6_sock_mc_close+0x20b/0x4e0 net/ipv6/mcast.c:310
    ipv6_sock_mc_close+0x158/0x1d0 net/ipv6/mcast.c:328
    inet6_release+0x40/0x70 net/ipv6/af_inet6.c:452
    __sock_release+0xd7/0x250 net/socket.c:579
    sock_close+0x19/0x20 net/socket.c:1141
    __fput+0x385/0xa30 fs/file_table.c:278
    ____fput+0x15/0x20 fs/file_table.c:309
    task_work_run+0x1e8/0x2a0 kernel/task_work.c:113
    tracehook_notify_resume include/linux/tracehook.h:193 [inline]
    exit_to_usermode_loop+0x318/0x380 arch/x86/entry/common.c:166
    prepare_exit_to_usermode arch/x86/entry/common.c:197 [inline]
    syscall_return_slowpath arch/x86/entry/common.c:268 [inline]
    do_syscall_64+0x6be/0x820 arch/x86/entry/common.c:293
    entry_SYSCALL_64_after_hwframe+0x49/0xbe

    The buggy address belongs to the object at ffff8801ce7f2500
    which belongs to the cache kmalloc-192 of size 192
    The buggy address is located 16 bytes inside of
    192-byte region [ffff8801ce7f2500, ffff8801ce7f25c0)
    The buggy address belongs to the page:
    page:ffffea000739fc80 count:1 mapcount:0 mapping:ffff8801da800040 index:0x0
    flags: 0x2fffc0000000100(slab)
    raw: 02fffc0000000100 ffffea0006f6e548 ffffea000737b948 ffff8801da800040
    raw: 0000000000000000 ffff8801ce7f2000 0000000100000010 0000000000000000
    page dumped because: kasan: bad access detected

    Memory state around the buggy address:
    ffff8801ce7f2400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
    ffff8801ce7f2480: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
    >ffff8801ce7f2500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
    ^
    ffff8801ce7f2580: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
    ffff8801ce7f2600: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

    Signed-off-by: Eric Dumazet
    Reported-by: syzbot
    Signed-off-by: David S. Miller

    Eric Dumazet
     

13 Sep, 2018

1 commit

  • The socket option is enabled by default so that current behaviour
    does not change. The same holds for the IPv4 version.

    A socket bound to in6addr_any and a specific port receives all traffic
    on that port. Analogous to IP_MULTICAST_ALL, disabling this option
    changes that behaviour: if one or more multicast groups were joined
    (using said socket), only multicast traffic from groups that were
    explicitly joined via this socket is passed on.

    Unless this option is disabled, a socket (or even a system) joined to
    multiple multicast groups is very hard to get right. Filtering by
    destination address has to take place in user space to avoid receiving
    multicast traffic from other multicast groups, which might have traffic
    on the same port.

    The IP_MULTICAST_ALL socket option is deliberately not extended to also
    apply to IPv6, to avoid changing the behaviour of current applications.
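
    The user-space filtering that this option makes unnecessary could look
    roughly like the sketch below. It assumes the application already knows
    each packet's destination address (for example from IPV6_PKTINFO
    ancillary data) and keeps its own array of joined groups; the function
    name is hypothetical.

```c
#include <stddef.h>
#include <string.h>
#include <netinet/in.h>

/* Return 1 if dst is one of the n groups this socket joined, else 0. */
int dst_is_joined_group(const struct in6_addr *dst,
                        const struct in6_addr *joined, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (memcmp(dst, &joined[i], sizeof(*dst)) == 0)
            return 1;
    }
    return 0;
}
```

    A receive loop would drop every datagram for which this returns 0.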

    Signed-off-by: Andre Naujoks
    Acked-By: YOSHIFUJI Hideaki
    Signed-off-by: David S. Miller

    Andre Naujoks
     

25 Jul, 2018

1 commit


22 Jul, 2018

2 commits

  • There are two scenarios in which we restore deleted records. The first
    is when a device goes down and up (or is unmapped/remapped). In this
    scenario the new filter mode is the same as the previous one, because we
    get it from in_dev->mc_list and do not touch it while the device goes
    down and up.

    The other scenario is when a new socket joins a group that was just
    deleted and has not finished sending status reports. In this scenario,
    we should use the current filter mode instead of restoring the old one.
    There are 4 cases in total.

    old_socket  new_socket  before_fix  after_fix
    IN(A)       IN(A)       ALLOW(A)    ALLOW(A)
    IN(A)       EX( )       TO_IN( )    TO_EX( )
    EX( )       IN(A)       TO_EX( )    ALLOW(A)
    EX( )       EX( )       TO_EX( )    TO_EX( )
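
    The four cases can be transcribed as a pair of lookup tables. This is
    only a restatement of the table for illustration, not the kernel code;
    all enum and function names are made up for the example.

```c
enum filter_mode { MODE_INCLUDE, MODE_EXCLUDE };
enum record_type { REC_ALLOW, REC_TO_IN, REC_TO_EX };

/* Rows: mode of the old socket; columns: mode of the new socket. */
static const enum record_type before_fix[2][2] = {
    /* old IN */ { REC_ALLOW, REC_TO_IN },
    /* old EX */ { REC_TO_EX, REC_TO_EX },
};

static const enum record_type after_fix[2][2] = {
    /* old IN */ { REC_ALLOW, REC_TO_EX },
    /* old EX */ { REC_ALLOW, REC_TO_EX },
};

/* Record sent before the fix: the restored old mode leaks into the result. */
enum record_type record_before_fix(enum filter_mode old_mode,
                                   enum filter_mode new_mode)
{
    return before_fix[old_mode][new_mode];
}

/* Record sent after the fix: only the new socket's mode matters. */
enum record_type record_after_fix(enum filter_mode old_mode,
                                  enum filter_mode new_mode)
{
    return after_fix[old_mode][new_mode];
}
```

    In the after_fix table both rows are identical, which is the point of
    the change: the old socket's mode no longer influences the record.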

    Fixes: 24803f38a5c0b (igmp: do not remove igmp souce list info when set link down)
    Fixes: 1666d49e1d416 (mld: do not remove mld souce list info when set link down)
    Signed-off-by: Hangbin Liu
    Signed-off-by: David S. Miller

    Hangbin Liu
     
  • Remove the mode parameter for igmp/igmp6_group_added as we can get it
    from the first parameter.

    Fixes: 6e2059b53f988 (ipv4/igmp: init group mode as INCLUDE when join source group)
    Fixes: c7ea20c9da5b9 (ipv6/mcast: init as INCLUDE when join SSM INCLUDE group)
    Signed-off-by: Hangbin Liu
    Signed-off-by: David S. Miller

    Hangbin Liu
     

17 Jul, 2018

1 commit

  • This is the IPv6 version of the patch "ipv4/igmp: init group mode as
    INCLUDE when join source group". From RFC 3810, section 6.1:

    If no per-interface state existed for that
    multicast address before the change (i.e., the change consisted of
    creating a new per-interface record), or if no state exists after the
    change (i.e., the change consisted of deleting a per-interface
    record), then the "non-existent" state is considered to have an
    INCLUDE filter mode and an empty source list.

    Which means a new multicast group should start with state IN(). Currently,
    for MLDv2 SSM JOIN_SOURCE_GROUP mode, we first call ipv6_sock_mc_join(),
    then ip6_mc_source(), which will trigger a TO_IN() message instead of
    ALLOW().

    The issue was exposed by commit a052517a8ff65 ("net/multicast: should not
    send source list records when have filter mode change"). Before this change,
    we sent both ALLOW(A) and TO_IN(A). Now, we only send TO_IN(A).

    Fix it by adding a new parameter to init group mode. Also add some wrapper
    functions to avoid changing too much code.

    v1 -> v2:
    In the first version I only cleared the group change record, but this is
    not enough. When a new group is joined, it is initialized as EXCLUDE and
    triggers a filter mode change in ip/ip6_mc_add_src(), which clears the
    sf_crcount of all source addresses. This prevents addresses that joined
    earlier from sending state change records if multiple source addresses
    are joined at the same time.

    In the v2 patch, I fixed it by directly initializing the mode to INCLUDE
    for SSM JOIN_SOURCE_GROUP. I also split the original patch into two
    separate patches for IPv4 and IPv6.

    There is also a difference between the v4 and v6 versions. For IPv6,
    when the interface goes down and up, ipv6_mc_up() sends the correct
    state change record with the unspecified IPv6 address (::). But after
    DAD is completed, we resend the change record as TO_IN() in
    mld_send_initial_cr(). Fix it by sending ALLOW() for INCLUDE mode in
    mld_send_initial_cr().

    Fixes: a052517a8ff65 ("net/multicast: should not send source list records when have filter mode change")
    Reviewed-by: Stefano Brivio
    Signed-off-by: Hangbin Liu
    Signed-off-by: David S. Miller

    Hangbin Liu
     

23 Jun, 2018

1 commit

  • After receiving MLD queries, we update idev->mc_maxdelay with max_delay
    from the query header. This makes later unsolicited reports use
    mc_maxdelay as their interval, which means we may send unsolicited
    reports at a long interval instead of the default configured interval.

    Also, since we do not call ipv6_mc_reset() after the device comes up,
    the issue persists even after leaving the group and joining other
    groups.

    Fixes: fc4eba58b4c14 ("ipv6: make unsolicited report intervals configurable for mld")
    Signed-off-by: Hangbin Liu
    Signed-off-by: David S. Miller

    Hangbin Liu
     

16 May, 2018

1 commit

  • Variants of proc_create{,_data} that directly take a struct seq_operations
    and deal with network namespaces in ->open and ->release. All callers of
    proc_create + seq_open_net converted over, and seq_{open,release}_net are
    removed entirely.

    Signed-off-by: Christoph Hellwig

    Christoph Hellwig
     

28 Mar, 2018

1 commit


27 Mar, 2018

1 commit

  • Prefer the direct use of octal for permissions.

    Done with checkpatch -f --types=SYMBOLIC_PERMS --fix-inplace
    and some typing.
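
    For readers unfamiliar with the two spellings, the sketch below compares
    symbolic and octal permission bits using the POSIX macros from
    <sys/stat.h>. The kernel's combined macros such as S_IRUGO are not
    available in userspace, so the PERM_* names here are made up for the
    example.

```c
#include <sys/stat.h>

/* Symbolic spelling of "readable by owner, group and others". */
#define PERM_SYMBOLIC_RO (S_IRUSR | S_IRGRP | S_IROTH)
/* The same permission bits written directly in octal. */
#define PERM_OCTAL_RO 0444

/* Symbolic and octal spelling of "owner may read and write". */
#define PERM_SYMBOLIC_RW_OWNER (S_IRUSR | S_IWUSR)
#define PERM_OCTAL_RW_OWNER 0600

/* Returns 1 when both spellings describe the same mode bits. */
int perms_match(void)
{
    return PERM_SYMBOLIC_RO == PERM_OCTAL_RO &&
           PERM_SYMBOLIC_RW_OWNER == PERM_OCTAL_RW_OWNER;
}
```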

    Miscellanea:

    o Whitespace neatening around these conversions.

    Signed-off-by: Joe Perches
    Signed-off-by: David S. Miller

    Joe Perches
     

05 Mar, 2018

1 commit

  • IPv6 does path selection for multipath routes deep in the lookup
    functions. The next patch adds L4 hash option and needs the skb
    for the forward path. To get the skb to the relevant FIB lookup
    functions it needs to go through the fib rules layer, so add a
    lookup_data argument to the fib_lookup_arg struct.

    Signed-off-by: David Ahern
    Reviewed-by: Ido Schimmel
    Reviewed-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller

    David Ahern
     

20 Feb, 2018

1 commit

  • These pernet_operations create and destroy net::ipv6.icmp_sk
    socket, used to send ICMP or error reply.

    Nobody can dereference the socket to handle a packet before
    net is initialized, as there is no routing; nobody can do
    that in parallel with exit, as all devices are moved
    to init_net or destroyed and there are no packets in flight.
    So, it's possible to mark these pernet_operations as async.

    The same for ndisc_net_ops and for igmp6_net_ops. The last
    one also creates and destroys /proc entries.

    Signed-off-by: Kirill Tkhai
    Signed-off-by: David S. Miller

    Kirill Tkhai
     

07 Feb, 2018

1 commit

  • This macro is only used by net/ipv6/mcast.c, but there is no reason
    why it must be BUILD_BUG_ON_NULL().

    Replace it with BUILD_BUG_ON_ZERO(), and remove BUILD_BUG_ON_NULL()
    definition from .
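
    A rough sketch of how such a compile-time check works, written as a
    standalone macro with a TOY_ prefix so it is not mistaken for the kernel
    definition. It relies on GNU C behaviour: a negative bit-field width is
    a compile error, while a zero-width one yields a struct of size 0.

```c
#include <stddef.h>

/* Compile error if e is non-zero; otherwise evaluates to (size_t)0. */
#define TOY_BUILD_BUG_ON_ZERO(e) (sizeof(struct { int : (-!!(e)); }))

/* The check adds 0 to the array size, so the array keeps 4 elements. */
static int toy_table[4 + TOY_BUILD_BUG_ON_ZERO(sizeof(int) < 2)];

/* Number of elements in toy_table. */
size_t toy_table_len(void)
{
    return sizeof(toy_table) / sizeof(toy_table[0]);
}
```

    Because the result is an integer zero, it can be used anywhere an
    integer constant expression is expected, so a pointer-typed variant is
    not needed for this purpose.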

    Link: http://lkml.kernel.org/r/1515121833-3174-3-git-send-email-yamada.masahiro@socionext.com
    Signed-off-by: Masahiro Yamada
    Cc: Ian Abbott
    Cc: Masahiro Yamada
    Cc: Hideaki YOSHIFUJI
    Cc: Alexey Kuznetsov
    Cc: "David S. Miller"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Masahiro Yamada
     

20 Jan, 2018

1 commit


17 Jan, 2018

1 commit

  • /proc has been ignoring struct file_operations::owner field for 10 years.
    Specifically, it started with commit 786d7e1612f0b0adb6046f19b906609e4fe8b1ba
    ("Fix rmmod/read/write races in /proc entries"). Notice the chunk where
    inode->i_fop is initialized with proxy struct file_operations for
    regular files:

    -       if (de->proc_fops)
    -               inode->i_fop = de->proc_fops;
    +       if (de->proc_fops) {
    +               if (S_ISREG(inode->i_mode))
    +                       inode->i_fop = &proc_reg_file_ops;
    +               else
    +                       inode->i_fop = de->proc_fops;
    +       }

    VFS stopped pinning module at this point.

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: David S. Miller

    Alexey Dobriyan
     

14 Dec, 2017

1 commit

  • syzkaller reported crashes in IPv6 stack [1]

    Xin Long found that lo MTU was set to silly values.

    The IPv6 stack reacts to a change to a small MTU by disabling itself
    under RTNL.

    But there is a window where threads not using RTNL can see a wrong
    device mtu. This can lead to surprises in mld code, where it is assumed
    the mtu is suitable.

    Fix this by reading device mtu once and checking IPv6 minimal MTU.
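
    The pattern can be sketched in userspace as follows. The struct and
    function names are hypothetical, the volatile field merely stands in
    for a value another thread may change, and 1280 is the IPv6 minimum
    link MTU.

```c
#define TOY_IPV6_MIN_MTU 1280 /* minimum link MTU required by IPv6 */

struct toy_dev {
    volatile unsigned int mtu; /* may be changed concurrently */
};

/*
 * Return the room left for payload after hdr_len bytes of headers,
 * or -1 if the device MTU is too small for IPv6.
 * Assumes hdr_len is smaller than TOY_IPV6_MIN_MTU.
 */
int toy_mld_room(const struct toy_dev *dev, unsigned int hdr_len)
{
    /* Read the MTU exactly once; every later use sees the same value. */
    unsigned int mtu = dev->mtu;

    if (mtu < TOY_IPV6_MIN_MTU)
        return -1;
    return (int)(mtu - hdr_len);
}
```

    Checking and using the same local copy closes the window in which the
    check passes on one value and the size computation uses another.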

    [1]
    skbuff: skb_over_panic: text:0000000010b86b8d len:196 put:20
    head:000000003b477e60 data:000000000e85441e tail:0xd4 end:0xc0 dev:lo
    ------------[ cut here ]------------
    kernel BUG at net/core/skbuff.c:104!
    invalid opcode: 0000 [#1] SMP KASAN
    Dumping ftrace buffer:
    (ftrace buffer empty)
    Modules linked in:
    CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.15.0-rc2-mm1+ #39
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    RIP: 0010:skb_panic+0x15c/0x1f0 net/core/skbuff.c:100
    RSP: 0018:ffff8801db307508 EFLAGS: 00010286
    RAX: 0000000000000082 RBX: ffff8801c517e840 RCX: 0000000000000000
    RDX: 0000000000000082 RSI: 1ffff1003b660e61 RDI: ffffed003b660e95
    RBP: ffff8801db307570 R08: 1ffff1003b660e23 R09: 0000000000000000
    R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff85bd4020
    R13: ffffffff84754ed2 R14: 0000000000000014 R15: ffff8801c4e26540
    FS: 0000000000000000(0000) GS:ffff8801db300000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 0000000000463610 CR3: 00000001c6698000 CR4: 00000000001406e0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    Call Trace:

    skb_over_panic net/core/skbuff.c:109 [inline]
    skb_put+0x181/0x1c0 net/core/skbuff.c:1694
    add_grhead.isra.24+0x42/0x3b0 net/ipv6/mcast.c:1695
    add_grec+0xa55/0x1060 net/ipv6/mcast.c:1817
    mld_send_cr net/ipv6/mcast.c:1903 [inline]
    mld_ifc_timer_expire+0x4d2/0x770 net/ipv6/mcast.c:2448
    call_timer_fn+0x23b/0x840 kernel/time/timer.c:1320
    expire_timers kernel/time/timer.c:1357 [inline]
    __run_timers+0x7e1/0xb60 kernel/time/timer.c:1660
    run_timer_softirq+0x4c/0xb0 kernel/time/timer.c:1686
    __do_softirq+0x29d/0xbb2 kernel/softirq.c:285
    invoke_softirq kernel/softirq.c:365 [inline]
    irq_exit+0x1d3/0x210 kernel/softirq.c:405
    exiting_irq arch/x86/include/asm/apic.h:540 [inline]
    smp_apic_timer_interrupt+0x16b/0x700 arch/x86/kernel/apic/apic.c:1052
    apic_timer_interrupt+0xa9/0xb0 arch/x86/entry/entry_64.S:920

    Signed-off-by: Eric Dumazet
    Reported-by: syzbot
    Tested-by: Xin Long
    Signed-off-by: David S. Miller

    Eric Dumazet
     

22 Nov, 2017

1 commit

  • This converts all remaining cases of the old setup_timer() API into using
    timer_setup(), where the callback argument is the structure already
    holding the struct timer_list. These should have no behavioral changes,
    since they just change which pointer is passed into the callback with
    the same available pointers after conversion. It handles the following
    examples, in addition to some other variations.

    Casting from unsigned long:

    void my_callback(unsigned long data)
    {
        struct something *ptr = (struct something *)data;
        ...
    }
    ...
    setup_timer(&ptr->my_timer, my_callback, ptr);

    and forced object casts:

    void my_callback(struct something *ptr)
    {
        ...
    }
    ...
    setup_timer(&ptr->my_timer, my_callback, (unsigned long)ptr);

    become:

    void my_callback(struct timer_list *t)
    {
        struct something *ptr = from_timer(ptr, t, my_timer);
        ...
    }
    ...
    timer_setup(&ptr->my_timer, my_callback, 0);

    Direct function assignments:

    void my_callback(unsigned long data)
    {
        struct something *ptr = (struct something *)data;
        ...
    }
    ...
    ptr->my_timer.function = my_callback;

    have a temporary cast added, along with converting the args:

    void my_callback(struct timer_list *t)
    {
        struct something *ptr = from_timer(ptr, t, my_timer);
        ...
    }
    ...
    ptr->my_timer.function = (TIMER_FUNC_TYPE)my_callback;

    And finally, callbacks without a data assignment:

    void my_callback(unsigned long data)
    {
        ...
    }
    ...
    setup_timer(&ptr->my_timer, my_callback, 0);

    have their argument renamed to verify they're unused during conversion:

    void my_callback(struct timer_list *unused)
    {
        ...
    }
    ...
    timer_setup(&ptr->my_timer, my_callback, 0);
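
    To make the from_timer() step concrete, here is a self-contained
    userspace sketch of the underlying container_of() idea. The struct
    timer_list below is a stand-in that models only the callback pointer,
    and the toy_* names are invented for this example; the kernel
    definitions differ in detail.

```c
#include <stddef.h>

/* Stand-in for the kernel's struct timer_list; only the callback is modelled. */
struct timer_list {
    void (*function)(struct timer_list *t);
};

/* Member pointer -> pointer to the enclosing structure. */
#define toy_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Same shape as from_timer(var, t, field): var only supplies the type. */
#define toy_from_timer(var, t, field) \
    toy_container_of(t, __typeof__(*var), field)

struct something {
    int fired;
    struct timer_list my_timer;
};

/* Converted callback: recovers the container from the timer pointer. */
void my_callback(struct timer_list *t)
{
    struct something *ptr = toy_from_timer(ptr, t, my_timer);

    ptr->fired++;
}

/* Toy timer_setup(): records the callback; the flags argument is ignored. */
void toy_timer_setup(struct timer_list *t,
                     void (*fn)(struct timer_list *), unsigned int flags)
{
    (void)flags;
    t->function = fn;
}
```

    Since the callback receives the embedded timer itself, no separate
    data argument and no cast through unsigned long is needed.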

    The conversion is done with the following Coccinelle script:

    spatch --very-quiet --all-includes --include-headers \
    -I ./arch/x86/include -I ./arch/x86/include/generated \
    -I ./include -I ./arch/x86/include/uapi \
    -I ./arch/x86/include/generated/uapi -I ./include/uapi \
    -I ./include/generated/uapi --include ./include/linux/kconfig.h \
    --dir . \
    --cocci-file ~/src/data/timer_setup.cocci

    @fix_address_of@
    expression e;
    @@

    setup_timer(
    -&(e)
    +&e
    , ...)

    // Update any raw setup_timer() usages that have a NULL callback, but
    // would otherwise match change_timer_function_usage, since the latter
    // will update all function assignments done in the face of a NULL
    // function initialization in setup_timer().
    @change_timer_function_usage_NULL@
    expression _E;
    identifier _timer;
    type _cast_data;
    @@

    (
    -setup_timer(&_E->_timer, NULL, _E);
    +timer_setup(&_E->_timer, NULL, 0);
    |
    -setup_timer(&_E->_timer, NULL, (_cast_data)_E);
    +timer_setup(&_E->_timer, NULL, 0);
    |
    -setup_timer(&_E._timer, NULL, &_E);
    +timer_setup(&_E._timer, NULL, 0);
    |
    -setup_timer(&_E._timer, NULL, (_cast_data)&_E);
    +timer_setup(&_E._timer, NULL, 0);
    )

    @change_timer_function_usage@
    expression _E;
    identifier _timer;
    struct timer_list _stl;
    identifier _callback;
    type _cast_func, _cast_data;
    @@

    (
    -setup_timer(&_E->_timer, _callback, _E);
    +timer_setup(&_E->_timer, _callback, 0);
    |
    -setup_timer(&_E->_timer, &_callback, _E);
    +timer_setup(&_E->_timer, _callback, 0);
    |
    -setup_timer(&_E->_timer, _callback, (_cast_data)_E);
    +timer_setup(&_E->_timer, _callback, 0);
    |
    -setup_timer(&_E->_timer, &_callback, (_cast_data)_E);
    +timer_setup(&_E->_timer, _callback, 0);
    |
    -setup_timer(&_E->_timer, (_cast_func)_callback, _E);
    +timer_setup(&_E->_timer, _callback, 0);
    |
    -setup_timer(&_E->_timer, (_cast_func)&_callback, _E);
    +timer_setup(&_E->_timer, _callback, 0);
    |
    -setup_timer(&_E->_timer, (_cast_func)_callback, (_cast_data)_E);
    +timer_setup(&_E->_timer, _callback, 0);
    |
    -setup_timer(&_E->_timer, (_cast_func)&_callback, (_cast_data)_E);
    +timer_setup(&_E->_timer, _callback, 0);
    |
    -setup_timer(&_E._timer, _callback, (_cast_data)_E);
    +timer_setup(&_E._timer, _callback, 0);
    |
    -setup_timer(&_E._timer, _callback, (_cast_data)&_E);
    +timer_setup(&_E._timer, _callback, 0);
    |
    -setup_timer(&_E._timer, &_callback, (_cast_data)_E);
    +timer_setup(&_E._timer, _callback, 0);
    |
    -setup_timer(&_E._timer, &_callback, (_cast_data)&_E);
    +timer_setup(&_E._timer, _callback, 0);
    |
    -setup_timer(&_E._timer, (_cast_func)_callback, (_cast_data)_E);
    +timer_setup(&_E._timer, _callback, 0);
    |
    -setup_timer(&_E._timer, (_cast_func)_callback, (_cast_data)&_E);
    +timer_setup(&_E._timer, _callback, 0);
    |
    -setup_timer(&_E._timer, (_cast_func)&_callback, (_cast_data)_E);
    +timer_setup(&_E._timer, _callback, 0);
    |
    -setup_timer(&_E._timer, (_cast_func)&_callback, (_cast_data)&_E);
    +timer_setup(&_E._timer, _callback, 0);
    |
    _E->_timer@_stl.function = _callback;
    |
    _E->_timer@_stl.function = &_callback;
    |
    _E->_timer@_stl.function = (_cast_func)_callback;
    |
    _E->_timer@_stl.function = (_cast_func)&_callback;
    |
    _E._timer@_stl.function = _callback;
    |
    _E._timer@_stl.function = &_callback;
    |
    _E._timer@_stl.function = (_cast_func)_callback;
    |
    _E._timer@_stl.function = (_cast_func)&_callback;
    )

    // callback(unsigned long arg)
    @change_callback_handle_cast
    depends on change_timer_function_usage@
    identifier change_timer_function_usage._callback;
    identifier change_timer_function_usage._timer;
    type _origtype;
    identifier _origarg;
    type _handletype;
    identifier _handle;
    @@

    void _callback(
    -_origtype _origarg
    +struct timer_list *t
    )
    {
    (
    ... when != _origarg
    _handletype *_handle =
    -(_handletype *)_origarg;
    +from_timer(_handle, t, _timer);
    ... when != _origarg
    |
    ... when != _origarg
    _handletype *_handle =
    -(void *)_origarg;
    +from_timer(_handle, t, _timer);
    ... when != _origarg
    |
    ... when != _origarg
    _handletype *_handle;
    ... when != _handle
    _handle =
    -(_handletype *)_origarg;
    +from_timer(_handle, t, _timer);
    ... when != _origarg
    |
    ... when != _origarg
    _handletype *_handle;
    ... when != _handle
    _handle =
    -(void *)_origarg;
    +from_timer(_handle, t, _timer);
    ... when != _origarg
    )
    }

    // callback(unsigned long arg) without existing variable
    @change_callback_handle_cast_no_arg
    depends on change_timer_function_usage &&
    !change_callback_handle_cast@
    identifier change_timer_function_usage._callback;
    identifier change_timer_function_usage._timer;
    type _origtype;
    identifier _origarg;
    type _handletype;
    @@

    void _callback(
    -_origtype _origarg
    +struct timer_list *t
    )
    {
    + _handletype *_origarg = from_timer(_origarg, t, _timer);
    +
    ... when != _origarg
    - (_handletype *)_origarg
    + _origarg
    ... when != _origarg
    }

    // Avoid already converted callbacks.
    @match_callback_converted
    depends on change_timer_function_usage &&
    !change_callback_handle_cast &&
    !change_callback_handle_cast_no_arg@
    identifier change_timer_function_usage._callback;
    identifier t;
    @@

    void _callback(struct timer_list *t)
    { ... }

    // callback(struct something *handle)
    @change_callback_handle_arg
    depends on change_timer_function_usage &&
    !match_callback_converted &&
    !change_callback_handle_cast &&
    !change_callback_handle_cast_no_arg@
    identifier change_timer_function_usage._callback;
    identifier change_timer_function_usage._timer;
    type _handletype;
    identifier _handle;
    @@

    void _callback(
    -_handletype *_handle
    +struct timer_list *t
    )
    {
    + _handletype *_handle = from_timer(_handle, t, _timer);
    ...
    }

    // If change_callback_handle_arg ran on an empty function, remove
    // the added handler.
    @unchange_callback_handle_arg
    depends on change_timer_function_usage &&
    change_callback_handle_arg@
    identifier change_timer_function_usage._callback;
    identifier change_timer_function_usage._timer;
    type _handletype;
    identifier _handle;
    identifier t;
    @@

    void _callback(struct timer_list *t)
    {
    - _handletype *_handle = from_timer(_handle, t, _timer);
    }

    // We only want to refactor the setup_timer() data argument if we've found
    // the matching callback. This undoes changes in change_timer_function_usage.
    @unchange_timer_function_usage
    depends on change_timer_function_usage &&
    !change_callback_handle_cast &&
    !change_callback_handle_cast_no_arg &&
    !change_callback_handle_arg@
    expression change_timer_function_usage._E;
    identifier change_timer_function_usage._timer;
    identifier change_timer_function_usage._callback;
    type change_timer_function_usage._cast_data;
    @@

    (
    -timer_setup(&_E->_timer, _callback, 0);
    +setup_timer(&_E->_timer, _callback, (_cast_data)_E);
    |
    -timer_setup(&_E._timer, _callback, 0);
    +setup_timer(&_E._timer, _callback, (_cast_data)&_E);
    )

    // If we fixed a callback from a .function assignment, fix the
    // assignment cast now.
    @change_timer_function_assignment
    depends on change_timer_function_usage &&
    (change_callback_handle_cast ||
    change_callback_handle_cast_no_arg ||
    change_callback_handle_arg)@
    expression change_timer_function_usage._E;
    identifier change_timer_function_usage._timer;
    identifier change_timer_function_usage._callback;
    type _cast_func;
    typedef TIMER_FUNC_TYPE;
    @@

    (
    _E->_timer.function =
    -_callback
    +(TIMER_FUNC_TYPE)_callback
    ;
    |
    _E->_timer.function =
    -&_callback
    +(TIMER_FUNC_TYPE)_callback
    ;
    |
    _E->_timer.function =
    -(_cast_func)_callback;
    +(TIMER_FUNC_TYPE)_callback
    ;
    |
    _E->_timer.function =
    -(_cast_func)&_callback
    +(TIMER_FUNC_TYPE)_callback
    ;
    |
    _E._timer.function =
    -_callback
    +(TIMER_FUNC_TYPE)_callback
    ;
    |
    _E._timer.function =
    -&_callback;
    +(TIMER_FUNC_TYPE)_callback
    ;
    |
    _E._timer.function =
    -(_cast_func)_callback
    +(TIMER_FUNC_TYPE)_callback
    ;
    |
    _E._timer.function =
    -(_cast_func)&_callback
    +(TIMER_FUNC_TYPE)_callback
    ;
    )

    // Sometimes timer functions are called directly. Replace matched args.
    @change_timer_function_calls
    depends on change_timer_function_usage &&
    (change_callback_handle_cast ||
    change_callback_handle_cast_no_arg ||
    change_callback_handle_arg)@
    expression _E;
    identifier change_timer_function_usage._timer;
    identifier change_timer_function_usage._callback;
    type _cast_data;
    @@

    _callback(
    (
    -(_cast_data)_E
    +&_E->_timer
    |
    -(_cast_data)&_E
    +&_E._timer
    |
    -_E
    +&_E->_timer
    )
    )

    // If a timer has been configured without a data argument, it can be
    // converted without regard to the callback argument, since it is unused.
    @match_timer_function_unused_data@
    expression _E;
    identifier _timer;
    identifier _callback;
    @@

    (
    -setup_timer(&_E->_timer, _callback, 0);
    +timer_setup(&_E->_timer, _callback, 0);
    |
    -setup_timer(&_E->_timer, _callback, 0L);
    +timer_setup(&_E->_timer, _callback, 0);
    |
    -setup_timer(&_E->_timer, _callback, 0UL);
    +timer_setup(&_E->_timer, _callback, 0);
    |
    -setup_timer(&_E._timer, _callback, 0);
    +timer_setup(&_E._timer, _callback, 0);
    |
    -setup_timer(&_E._timer, _callback, 0L);
    +timer_setup(&_E._timer, _callback, 0);
    |
    -setup_timer(&_E._timer, _callback, 0UL);
    +timer_setup(&_E._timer, _callback, 0);
    |
    -setup_timer(&_timer, _callback, 0);
    +timer_setup(&_timer, _callback, 0);
    |
    -setup_timer(&_timer, _callback, 0L);
    +timer_setup(&_timer, _callback, 0);
    |
    -setup_timer(&_timer, _callback, 0UL);
    +timer_setup(&_timer, _callback, 0);
    |
    -setup_timer(_timer, _callback, 0);
    +timer_setup(_timer, _callback, 0);
    |
    -setup_timer(_timer, _callback, 0L);
    +timer_setup(_timer, _callback, 0);
    |
    -setup_timer(_timer, _callback, 0UL);
    +timer_setup(_timer, _callback, 0);
    )

    @change_callback_unused_data
    depends on match_timer_function_unused_data@
    identifier match_timer_function_unused_data._callback;
    type _origtype;
    identifier _origarg;
    @@

    void _callback(
    -_origtype _origarg
    +struct timer_list *unused
    )
    {
    ... when != _origarg
    }

    Signed-off-by: Kees Cook

    Kees Cook
     

04 Jul, 2017

1 commit

  • The refcount_t type and its corresponding API should be
    used instead of atomic_t when a variable is used as
    a reference counter. This avoids accidental
    refcounter overflows that might lead to use-after-free
    situations.
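
    The overflow hazard can be sketched in user space. This is a simplified
    model, not the kernel's refcount_t implementation: plain_inc(),
    model_refcount_inc() and MODEL_SATURATED are hypothetical names, and the
    saturate-at-maximum policy is an assumption made for illustration.

```c
#include <limits.h>

/* Hypothetical ceiling for the model; the kernel's refcount_t differs in detail. */
#define MODEL_SATURATED UINT_MAX

/* Plain counter: unsigned arithmetic wraps silently from UINT_MAX to 0,
 * so a later "put" could free an object that is still referenced. */
static unsigned int plain_inc(unsigned int v)
{
	return v + 1;
}

/* Saturating counter: once at the ceiling it stays there, so the object
 * is leaked instead of being freed while still in use. */
static unsigned int model_refcount_inc(unsigned int v)
{
	return v == MODEL_SATURATED ? MODEL_SATURATED : v + 1;
}
```

    Calling plain_inc(UINT_MAX) wraps to zero, whereas
    model_refcount_inc(UINT_MAX) stays pinned at the ceiling.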

    Signed-off-by: Elena Reshetova
    Signed-off-by: Hans Liljestrand
    Signed-off-by: Kees Cook
    Signed-off-by: David Windsor
    Signed-off-by: David S. Miller

    Reshetova, Elena
     

16 Jun, 2017

3 commits

  • It seems like a historic accident that these return unsigned char *,
    and in many places that means casts are required, more often than not.

    Make these functions (skb_put, __skb_put and pskb_put) return void *
    and remove all the casts across the tree, adding a (u8 *) cast only
    where the unsigned char pointer was used directly, all done with the
    following spatch:

    @@
    expression SKB, LEN;
    typedef u8;
    identifier fn = { skb_put, __skb_put };
    @@
    - *(fn(SKB, LEN))
    + *(u8 *)fn(SKB, LEN)

    @@
    expression E, SKB, LEN;
    identifier fn = { skb_put, __skb_put };
    type T;
    @@
    - E = ((T *)(fn(SKB, LEN)))
    + E = fn(SKB, LEN)

    which actually doesn't cover pskb_put since there are only three
    users overall.

    A handful of stragglers were converted manually, notably a macro in
    drivers/isdn/i4l/isdn_bsdcomp.c and, oddly enough, one of the many
    instances in net/bluetooth/hci_sock.c. In the former file, I also
    had to fix one whitespace problem spatch introduced.
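
    The effect on callers can be sketched with a user-space stand-in.
    struct mock_skb, mock_skb_put(), struct mock_hdr and build() are
    hypothetical names used only for illustration; the real skb_put()
    operates on struct sk_buff and checks for overruns, which this
    sketch does not.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct sk_buff: a fixed buffer plus a tail offset. */
struct mock_skb {
	uint8_t buf[64];
	size_t tail;
};

/* Mimics the new prototype: extend the data area by len bytes and
 * return its start as void *. No bounds checking in this sketch. */
static void *mock_skb_put(struct mock_skb *skb, size_t len)
{
	void *start = skb->buf + skb->tail;

	skb->tail += len;
	return start;
}

struct mock_hdr {
	uint8_t type;
	uint8_t code;
};

static void build(struct mock_skb *skb)
{
	/* void * converts implicitly, so no (struct mock_hdr *) cast is needed. */
	struct mock_hdr *hdr = mock_skb_put(skb, sizeof(*hdr));

	hdr->type = 1;
	hdr->code = 2;

	/* A direct dereference is the one case that still needs a cast. */
	*(uint8_t *)mock_skb_put(skb, 1) = 0xff;
}
```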

    Signed-off-by: Johannes Berg
    Signed-off-by: David S. Miller

    Johannes Berg
     
  • A common pattern with skb_put() is to memcpy() some data
    into the new space; introduce skb_put_data() for
    this.

    An spatch similar to the one for skb_put_zero() converts many
    of the places using it:

    @@
    identifier p, p2;
    expression len, skb, data;
    type t, t2;
    @@
    (
    -p = skb_put(skb, len);
    +p = skb_put_data(skb, data, len);
    |
    -p = (t)skb_put(skb, len);
    +p = skb_put_data(skb, data, len);
    )
    (
    p2 = (t2)p;
    -memcpy(p2, data, len);
    |
    -memcpy(p, data, len);
    )

    @@
    type t, t2;
    identifier p, p2;
    expression skb, data;
    @@
    t *p;
    ...
    (
    -p = skb_put(skb, sizeof(t));
    +p = skb_put_data(skb, data, sizeof(t));
    |
    -p = (t *)skb_put(skb, sizeof(t));
    +p = skb_put_data(skb, data, sizeof(t));
    )
    (
    p2 = (t2)p;
    -memcpy(p2, data, sizeof(*p));
    |
    -memcpy(p, data, sizeof(*p));
    )

    @@
    expression skb, len, data;
    @@
    -memcpy(skb_put(skb, len), data, len);
    +skb_put_data(skb, data, len);

    (again, manually post-processed to retain some comments)
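
    What the helper folds together can be sketched in user space,
    following the last rule of the spatch above. struct mock_skb,
    mock_skb_put() and mock_skb_put_data() are hypothetical stand-ins,
    not the kernel definitions, and the sketch does no bounds checking.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for struct sk_buff: a fixed buffer plus a tail offset. */
struct mock_skb {
	uint8_t buf[64];
	size_t tail;
};

/* Extend the data area by len bytes and return a pointer to its start. */
static void *mock_skb_put(struct mock_skb *skb, size_t len)
{
	void *start = skb->buf + skb->tail;

	skb->tail += len;
	return start;
}

/* Reserve len bytes at the tail and copy data into them in one call,
 * replacing the open-coded skb_put() + memcpy() pair. */
static void *mock_skb_put_data(struct mock_skb *skb, const void *data, size_t len)
{
	void *tmp = mock_skb_put(skb, len);

	memcpy(tmp, data, len);
	return tmp;
}
```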

    Reviewed-by: Stephen Hemminger
    Signed-off-by: Johannes Berg
    Signed-off-by: David S. Miller

    Johannes Berg
     
  • There were many places that my previous spatch didn't find,
    as pointed out by yuan linyu in various patches.

    The following spatch found many more and also removes the
    now unnecessary casts:

    @@
    identifier p, p2;
    expression len;
    expression skb;
    type t, t2;
    @@
    (
    -p = skb_put(skb, len);
    +p = skb_put_zero(skb, len);
    |
    -p = (t)skb_put(skb, len);
    +p = skb_put_zero(skb, len);
    )
    ... when != p
    (
    p2 = (t2)p;
    -memset(p2, 0, len);
    |
    -memset(p, 0, len);
    )

    @@
    type t, t2;
    identifier p, p2;
    expression skb;
    @@
    t *p;
    ...
    (
    -p = skb_put(skb, sizeof(t));
    +p = skb_put_zero(skb, sizeof(t));
    |
    -p = (t *)skb_put(skb, sizeof(t));
    +p = skb_put_zero(skb, sizeof(t));
    )
    ... when != p
    (
    p2 = (t2)p;
    -memset(p2, 0, sizeof(*p));
    |
    -memset(p, 0, sizeof(*p));
    )

    @@
    expression skb, len;
    @@
    -memset(skb_put(skb, len), 0, len);
    +skb_put_zero(skb, len);

    Apply it to the tree (with one manual fixup to keep the
    comment in vxlan.c, which spatch removed.)

    Signed-off-by: Johannes Berg
    Signed-off-by: David S. Miller

    Johannes Berg
     

29 Mar, 2017

1 commit


10 Feb, 2017

1 commit

  • In function igmpv3/mld_add_delrec() we allocate pmc and put it in
    idev->mc_tomb, so we should free it when we don't need it in del_delrec().
    But I removed kfree(pmc) incorrectly in the latest two patches. Now fix it.

    Fixes: 24803f38a5c0 ("igmp: do not remove igmp souce list info when ...")
    Fixes: 1666d49e1d41 ("mld: do not remove mld souce list info when ...")
    Reported-by: Daniel Borkmann
    Signed-off-by: Hangbin Liu
    Signed-off-by: David S. Miller

    Hangbin Liu
     

17 Jan, 2017

1 commit

  • This is an IPv6 version of commit 24803f38a5c0 ("igmp: do not remove igmp
    souce list..."). In mld_del_delrec(), we now restore all source filter
    info instead of flushing it.

    Move mld_clear_delrec() from ipv6_mc_down() to ipv6_mc_destroy_dev() since
    we should not remove source list info when the link is set down. Remove
    igmp6_group_dropped() in ipv6_mc_destroy_dev() since we have called it in
    ipv6_mc_down().

    Also clear all source info after igmp6_group_dropped() instead of in it
    because ipv6_mc_down() will call igmp6_group_dropped().

    Signed-off-by: Hangbin Liu
    Signed-off-by: David S. Miller

    Hangbin Liu
     

21 Oct, 2016

1 commit

  • Baozeng reported this deadlock case:

    CPU0                    CPU1
    ----                    ----
    lock(sk_lock-AF_INET6);
                            lock(rtnl_mutex);
                            lock(sk_lock-AF_INET6);
    lock(rtnl_mutex);

    Similar to commit 87e9f0315952
    ("ipv4: fix a potential deadlock in mcast getsockopt() path"),
    this happens because we still have a case, ipv6_sock_mc_close(),
    where we acquire sk_lock before rtnl_lock. Close this deadlock
    with a similar solution: always acquire the rtnl lock first.
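
    The ordering rule can be modeled as a rank check. This is a sketch
    under stated assumptions, not kernel code: enum lock_rank,
    struct lock_ctx and order_ok_acquire() are hypothetical names, and
    no real locks are taken.

```c
#include <stdbool.h>

/* Hypothetical ranks: a lower rank must always be taken first. */
enum lock_rank {
	RANK_RTNL = 1,	/* stands in for rtnl_lock */
	RANK_SOCK = 2,	/* stands in for the socket lock */
};

/* Tracks the highest-ranked lock held by one execution context. */
struct lock_ctx {
	int highest_held;
};

/* Returns true if taking a lock of this rank respects the ordering. */
static bool order_ok_acquire(struct lock_ctx *ctx, enum lock_rank rank)
{
	if ((int)rank <= ctx->highest_held)
		return false;	/* inverted order: potential ABBA deadlock */
	ctx->highest_held = rank;
	return true;
}
```

    A context that takes RANK_RTNL and then RANK_SOCK passes both checks;
    one that takes RANK_SOCK first is rejected when it asks for RANK_RTNL,
    which is the ipv6_sock_mc_close() pattern described above.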

    Fixes: baf606d9c9b1 ("ipv4,ipv6: grab rtnl before locking the socket")
    Reported-by: Baozeng Ding
    Tested-by: Baozeng Ding
    Cc: Marcelo Ricardo Leitner
    Signed-off-by: Cong Wang
    Reviewed-by: Marcelo Ricardo Leitner
    Signed-off-by: David S. Miller

    WANG Cong
     

09 Aug, 2016

1 commit

  • Based on RFC3376 5.1 and RFC3810 6.1

    If the per-interface listening change that triggers the new report is
    a filter mode change, then the next [Robustness Variable] State
    Change Reports will include a Filter Mode Change Record. This
    applies even if any number of source list changes occur in that
    period.

    Old State         New State         State Change Record Sent
    ---------         ---------         ------------------------
    INCLUDE (A)       EXCLUDE (B)       TO_EX (B)
    EXCLUDE (A)       INCLUDE (B)       TO_IN (B)

    So we should not send a source-list change if there is a filter-mode change.

    Here are two scenarios:
    1. The group is deleted and the filter mode is EXCLUDE, which means we need
       to send a TO_IN { }.
    2. The group is not deleted but has pmc->crcount, which means we need to
       send a normal filter-mode change.

    At the same time, if the type is ALLOW or BLOCK and psf->sf_crcount is set,
    we stop adding records and decrement sf_crcount directly.
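
    The state table above can be sketched as a small mapping. The record
    type values are the ones assigned by RFC 3376 and RFC 3810;
    enum filter_mode and mode_change_record() are hypothetical names for
    illustration and do not appear in the kernel.

```c
/* Hypothetical representation of the per-interface filter mode. */
enum filter_mode { MODE_INCLUDE, MODE_EXCLUDE };

/* Filter-mode-change record types, numbered as in RFC 3376 / RFC 3810. */
enum record_type {
	CHANGE_TO_INCLUDE_MODE = 3,	/* TO_IN */
	CHANGE_TO_EXCLUDE_MODE = 4,	/* TO_EX */
};

/* Returns the filter-mode-change record for a transition,
 * or 0 if the filter mode did not change. */
static int mode_change_record(enum filter_mode old_mode,
			      enum filter_mode new_mode)
{
	if (old_mode == new_mode)
		return 0;
	return new_mode == MODE_EXCLUDE ? CHANGE_TO_EXCLUDE_MODE
					: CHANGE_TO_INCLUDE_MODE;
}
```

    Scenario 1 corresponds to an EXCLUDE to INCLUDE transition with an
    empty source list, which yields TO_IN.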

    Reference: https://www.ietf.org/mail-archive/web/magma/current/msg01274.html

    Signed-off-by: Hangbin Liu
    Acked-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller

    Hangbin Liu
     

04 Mar, 2016

1 commit

  • The current reserved_tailroom calculation fails to take hlen and tlen into
    account.

    skb:
    [__hlen__|__data____________|__tlen___|__extra__]
    ^                                               ^
    head                                            skb_end_offset

    In this representation, hlen + data + tlen is the size passed to alloc_skb.
    "extra" is the extra space made available in __alloc_skb because of
    rounding up by kmalloc. We can reorder the representation like so:

    [__hlen__|__data____________|__extra__|__tlen___]
    ^                                               ^
    head                                            skb_end_offset

    The maximum space available for ip headers and payload without
    fragmentation is min(mtu, data + extra). Therefore,
    reserved_tailroom
      = data + extra + tlen - min(mtu, data + extra)
      = skb_end_offset - hlen - min(mtu, skb_end_offset - hlen - tlen)
      = skb_tailroom - min(mtu, skb_tailroom - tlen) ; after skb_reserve(hlen)

    Compare the second line to the current expression:
      reserved_tailroom = skb_end_offset - min(mtu, skb_end_offset)
    and we can see that hlen and tlen are not taken into account.

    The min() in the third line can be expanded into:
    if mtu < skb_tailroom - tlen:
        reserved_tailroom = skb_tailroom - mtu
    else:
        reserved_tailroom = tlen

    Depending on hlen, tlen, mtu and the number of multicast address records,
    the current code may output skbs that have less tailroom than
    dev->needed_tailroom or it may output more skbs than needed because not all
    space available is used.
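
    The expanded min() can be written as a small C helper. This is a
    user-space sketch of the arithmetic only: reserved_tailroom() here is
    a hypothetical free function taking plain byte counts, and it assumes
    tailroom >= tlen (the allocation already included tlen).

```c
/* tailroom: skb_tailroom after skb_reserve(hlen), in bytes
 * mtu:      path MTU in bytes
 * tlen:     tailroom the device needs, in bytes
 * Assumes tailroom >= tlen so the unsigned subtraction cannot wrap. */
static unsigned int reserved_tailroom(unsigned int tailroom,
				      unsigned int mtu,
				      unsigned int tlen)
{
	if (mtu < tailroom - tlen)
		return tailroom - mtu;	/* MTU is the limit: reserve the rest */
	return tlen;			/* buffer is the limit: reserve only tlen */
}
```

    With tailroom = 2000, mtu = 1500 and tlen = 16 the MTU is the limit
    and 500 bytes are reserved; with tailroom = 1000 only tlen is reserved.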

    Fixes: 4c672e4b ("ipv6: mld: fix add_grhead skb_over_panic for devs with large MTUs")
    Signed-off-by: Benjamin Poirier
    Acked-by: Hannes Frederic Sowa
    Acked-by: Daniel Borkmann
    Signed-off-by: David S. Miller

    Benjamin Poirier
     

17 Nov, 2015

1 commit

  • The OUTMCAST stat is double incremented, getting bumped once in the mcast code
    itself, and again in the common ip output path. Remove the mcast bump, as it's
    not needed.

    Validated by the reporter, with good results

    Signed-off-by: Neil Horman
    Reported-by: Claus Jensen
    CC: Claus Jensen
    CC: David Miller
    Signed-off-by: David S. Miller

    Neil Horman
     

08 Oct, 2015

1 commit


18 Sep, 2015

3 commits

  • This is immediately motivated by the bridge code that chains functions that
    call into netfilter. Without passing net into the okfns the bridge code would
    need to guess about the best expression for the network namespace to process
    packets in.

    As net is frequently one of the first things computed in continuation functions
    after netfilter has done its job, passing in the desired network namespace is in
    many cases a code simplification.

    To support this change the function dst_output_okfn is introduced to
    simplify passing dst_output as an okfn. For the moment dst_output_okfn
    just silently drops the struct net.

    Signed-off-by: "Eric W. Biederman"
    Signed-off-by: David S. Miller

    Eric W. Biederman
     
  • Pass a network namespace parameter into the netfilter hooks. At the
    call site of the netfilter hooks the path a packet is taking through
    the network stack is well known, which allows the network namespace to
    be determined easily and reliably.

    This allows the replacement of magic code like
    "dev_net(state->in?:state->out)" that appears at the start of most
    netfilter hooks with "state->net".

    In almost all cases the network namespace passed in is derived
    from the first network device passed in, guaranteeing those
    paths will not see any changes in practice.

    The exceptions are:
    xfrm/xfrm_output.c:xfrm_output_resume()          xs_net(skb_dst(skb)->xfrm)
    ipvs/ip_vs_xmit.c:ip_vs_nat_send_or_cont()       ip_vs_conn_net(cp)
    ipvs/ip_vs_xmit.c:ip_vs_send_or_cont()           ip_vs_conn_net(cp)
    ipv4/raw.c:raw_send_hdrinc()                     sock_net(sk)
    ipv6/ip6_output.c:ip6_xmit()                     sock_net(sk)
    ipv6/ndisc.c:ndisc_send_skb()                    dev_net(skb->dev) not dev_net(dst->dev)
    ipv6/raw.c:raw6_send_hdrinc()                    sock_net(sk)
    br_netfilter_hooks.c:br_nf_pre_routing_finish()  dev_net(skb->dev) before skb->dev is set to nf_bridge->physindev

    In all cases these exceptions seem to be a better expression for the
    network namespace the packet is being processed in than the historic
    "dev_net(in?in:out)". I am documenting them in case something odd
    pops up and someone starts trying to track down what happened.

    Signed-off-by: "Eric W. Biederman"
    Signed-off-by: David S. Miller

    Eric W. Biederman
     
  • Add a sock parameter to dst_output, making dst_output_sk superfluous.
    Add a skb->sk parameter to all of the callers of dst_output.
    Have the callers of dst_output_sk call dst_output.

    Signed-off-by: "Eric W. Biederman"
    Signed-off-by: David S. Miller

    Eric W. Biederman
     

08 Apr, 2015

1 commit

  • On the output paths in particular, we have to sometimes deal with two
    socket contexts. First, and usually skb->sk, is the local socket that
    generated the frame.

    The second is potentially the socket used to control a tunneling
    socket, such as one that encapsulates using UDP.

    We do not want to disassociate skb->sk when encapsulating in order
    to fix this, because that would break socket memory accounting.

    The most extreme case where this can cause huge problems is an
    AF_PACKET socket transmitting over a vxlan device. We hit code
    paths doing checks that assume they are dealing with an ipv4
    socket, but are actually operating upon the AF_PACKET one.

    Signed-off-by: David S. Miller

    David Miller
     

01 Apr, 2015

2 commits

  • The ipv6 code uses a mixture of coding styles. In some instances check for NULL
    pointer is done as x != NULL and sometimes as x. x is preferred according to
    checkpatch and this patch makes the code consistent by adopting the latter
    form.

    No changes detected by objdiff.

    Signed-off-by: Ian Morris
    Signed-off-by: David S. Miller

    Ian Morris
     
  • The ipv6 code uses a mixture of coding styles. In some instances check for NULL
    pointer is done as x == NULL and sometimes as !x. !x is preferred according to
    checkpatch and this patch makes the code consistent by adopting the latter
    form.

    No changes detected by objdiff.

    Signed-off-by: Ian Morris
    Signed-off-by: David S. Miller

    Ian Morris
     

19 Mar, 2015

1 commit

  • in favor of their inner __ ones, which don't grab rtnl.

    As these functions need to operate on a locked socket, we can't be
    grabbing rtnl by then. It's too late and doing so causes reversed
    locking.

    So this patch:
    - moves rtnl handling to the callers, while also fixing some
      reversed locking situations, such as in the vxlan and ipvs code.
    - renames the __ ones to drop the __ prefix:
      __ip_mc_{join,leave}_group -> ip_mc_{join,leave}_group
      __ipv6_sock_mc_{join,drop} -> ipv6_sock_mc_{join,drop}

    Signed-off-by: Marcelo Ricardo Leitner
    Acked-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller

    Marcelo Ricardo Leitner
     

28 Feb, 2015

2 commits

  • Joining a multicast group at the Ethernet level via the "ip maddr" command
    does not work if we have an Ethernet switch that does IGMP snooping, since
    the switch will not replicate multicast packets on ports that did not
    have IGMP reports for the multicast addresses.

    Linux vxlan interfaces created via "ip link add vxlan" have the group option
    that enables them to do the required join.

    By extending ip address command with option "autojoin" we can get similar
    functionality for openvswitch vxlan interfaces as well as other tunneling
    mechanisms that need to receive multicast traffic. The kernel code is
    structured similar to how the vxlan driver does a group join / leave.

    example:
    ip address add 224.1.1.10/24 dev eth5 autojoin
    ip address del 224.1.1.10/24 dev eth5

    Signed-off-by: Madhu Challa
    Signed-off-by: David S. Miller

    Madhu Challa
     
  • Based on the igmp v4 changes from Eric Dumazet:
    959d10f6bbf6 ("igmp: add __ip_mc_{join|leave}_group()")

    These changes are needed to perform igmp v6 join/leave while
    RTNL is held.

    Make ipv6_sock_mc_join and ipv6_sock_mc_drop wrappers around
    __ipv6_sock_mc_join and __ipv6_sock_mc_drop to avoid
    proliferation of work queues.

    Signed-off-by: Madhu Challa
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Madhu Challa