28 Mar, 2018

1 commit


27 Mar, 2018

2 commits

  • Remove local ADBG macro and use netdev_dbg/pr_debug

    Miscellanea:

    o Remove the unnecessary debug message after an allocation failure, as
    there is already a dump_stack() on the failure paths
    o Leave the allocation failure message on snmp6_alloc_dev as there
    is one code path that does not do a dump_stack()

    Signed-off-by: Joe Perches
    Signed-off-by: David S. Miller

    Joe Perches
     
  • Prefer the direct use of octal for permissions.

    Done with checkpatch -f --types=SYMBOLIC_PERMS --fix-inplace
    and some typing.

    Miscellanea:

    o Whitespace neatening around these conversions.

    Signed-off-by: Joe Perches
    Signed-off-by: David S. Miller

    Joe Perches
     

16 Mar, 2018

2 commits

  • Look up the L3 master device for the passed-in device. Only consider
    addresses on netdevs with the same master device. If the device is
    not enslaved or is NULL, then the l3mdev is NULL, which means only
    devices not enslaved (i.e., in the default domain) are considered.
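    The filter can be modeled in userspace as shown below. The device structure is simplified, and its hypothetical l3mdev field points at the VRF master device (NULL when the device is not enslaved). The helpers l3mdev_of() and same_l3_domain() are illustrative stand-ins, not kernel functions. Passing a NULL device selects the default domain, as the text describes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for struct net_device: only the L3 master
 * (e.g. a VRF device) matters here; NULL means "not enslaved". */
struct dev {
    const char *name;
    const struct dev *l3mdev;   /* hypothetical field */
};

/* L3 master of a device, or NULL if the device is NULL or not enslaved. */
static const struct dev *l3mdev_of(const struct dev *d)
{
    return d ? d->l3mdev : NULL;
}

/* An address on addr_dev is only considered if it lives in the same
 * L3 domain as the device the caller passed in. */
static bool same_l3_domain(const struct dev *addr_dev, const struct dev *dev)
{
    return l3mdev_of(addr_dev) == l3mdev_of(dev);
}
```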

    Signed-off-by: David Ahern
    Reviewed-by: Ido Schimmel
    Signed-off-by: David S. Miller

    David Ahern
     
  • ipv6_chk_addr_and_flags determines if an address is a local address and
    optionally if it is an address on a specific device. For example, it is
    called by ip6_route_info_create to determine if a given gateway address
    is a local address. The address check currently does not consider L3
    domains and as a result does not allow a route to be added in one VRF
    if the nexthop points to an address in a second VRF. e.g.,

    $ ip route add 2001:db8:1::/64 vrf r2 via 2001:db8:102::23
    Error: Invalid gateway address.

    where 2001:db8:102::23 is an address on an interface in vrf r1.

    ipv6_chk_addr_and_flags needs to let callers always pass in a device,
    with a separate argument indicating that the address should not be
    limited to that specific device. The device is used to determine the
    L3 domain of interest.

    To that end add an argument to skip the device check and update callers
    to always pass a device where possible and use the new argument to mean
    any address in the domain.
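    The new calling convention can be sketched in userspace as follows. This is a hypothetical model, not the kernel's ipv6_chk_addr_and_flags(). Addresses are plain strings, struct dev carries only an assumed l3mdev pointer, and chk_addr() and skip_dev_check are illustrative names. The device always selects the L3 domain, and the flag only relaxes the per-device match.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct dev {
    const struct dev *l3mdev;   /* VRF master; NULL = default domain */
};

struct ifaddr {
    const char *addr;           /* textual address, for simplicity */
    const struct dev *dev;      /* device the address is configured on */
};

/* dev picks the L3 domain; skip_dev_check means "any device in that
 * domain" rather than "only on dev itself". */
static bool chk_addr(const struct ifaddr *tbl, size_t n, const char *addr,
                     const struct dev *dev, bool skip_dev_check)
{
    const struct dev *domain = dev ? dev->l3mdev : NULL;

    for (size_t i = 0; i < n; i++) {
        if (strcmp(tbl[i].addr, addr) != 0)
            continue;
        if (tbl[i].dev->l3mdev != domain)
            continue;           /* other VRF: not local in this domain */
        if (!skip_dev_check && dev && tbl[i].dev != dev)
            continue;           /* caller asked for this device only */
        return true;
    }
    return false;
}
```

    In this model, looking up 2001:db8:102::23 (configured on a device in vrf r1) from a device in vrf r2 returns false. The address is therefore not treated as local in r2, which is the behavior the route example above needs.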

    Update a handful of users of ipv6_chk_addr with a NULL dev argument. This
    patch handles the change to these callers without adding the domain check.

    ip6_validate_gw needs to handle two cases: one where the device is given
    as part of the nexthop spec and the other where the device is resolved.
    There is at least 1 VRF case where deferring the check to only after
    the route lookup has resolved the device fails with an unintuitive error
    "RTNETLINK answers: No route to host" as opposed to the preferred
    "Error: Gateway can not be a local address." The 'no route to host'
    error is because of the fallback to a full lookup. The check is done
    twice to avoid this error.

    Signed-off-by: David Ahern
    Reviewed-by: Ido Schimmel
    Signed-off-by: David S. Miller

    David Ahern
     

02 Mar, 2018

1 commit

  • According to RFC 4429 (section 3.1), adding new IPv6 addresses as
    optimistic addresses is acceptable, as long as the implementation
    follows some rules:

    * Optimistic DAD SHOULD only be used when the implementation is aware
    that the address is based on a most likely unique interface
    identifier (such as in [RFC2464]), generated randomly [RFC3041],
    or by a well-distributed hash function [RFC3972] or assigned by
    Dynamic Host Configuration Protocol for IPv6 (DHCPv6) [RFC3315].
    Optimistic DAD SHOULD NOT be used for manually entered
    addresses.

    Thus, it seems reasonable to allow userspace to set the optimistic flag
    when adding new addresses.

    We must not let userspace set NODAD + OPTIMISTIC, since if the kernel is
    not performing DAD we would never clear the optimistic flag. We must
    also ignore userspace's request to add OPTIMISTIC flag to addresses that
    have already completed DAD (addresses that don't have the TENTATIVE
    flag, or that have the DADFAILED flag).

    Then we also need to clear the OPTIMISTIC flag on permanent addresses
    when DAD fails. Otherwise, IFA_F_OPTIMISTIC addresses added by userspace
    can still be used after DAD has failed, because in
    ipv6_chk_addr_and_flags(), IFA_F_OPTIMISTIC overrides IFA_F_TENTATIVE.

    Setting IFA_F_OPTIMISTIC from userspace is conditional on
    CONFIG_IPV6_OPTIMISTIC_DAD and the optimistic_dad sysctl.
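    The flag rules above can be modeled in a few lines of userspace C. The IFA_F_* values match the UAPI header linux/if_addr.h, and the helper names are hypothetical. The gating on CONFIG_IPV6_OPTIMISTIC_DAD and the optimistic_dad sysctl is left out. addr_usable() is a simplified reading of how IFA_F_OPTIMISTIC overrides IFA_F_TENTATIVE, not the exact kernel check.

```c
#include <assert.h>
#include <stdbool.h>

/* Flag values as defined in linux/if_addr.h. */
#define IFA_F_NODAD      0x02
#define IFA_F_OPTIMISTIC 0x04
#define IFA_F_DADFAILED  0x08
#define IFA_F_TENTATIVE  0x40

/* Reject NODAD + OPTIMISTIC: without DAD, nothing would ever clear
 * the optimistic flag. */
static bool optimistic_request_ok(unsigned int req_flags)
{
    return !((req_flags & IFA_F_NODAD) && (req_flags & IFA_F_OPTIMISTIC));
}

/* Honor a requested OPTIMISTIC flag only while DAD is still pending:
 * ignore it if DAD completed (no TENTATIVE) or failed (DADFAILED). */
static unsigned int apply_optimistic(unsigned int cur, unsigned int req)
{
    if ((req & IFA_F_OPTIMISTIC) && (cur & IFA_F_TENTATIVE) &&
        !(cur & IFA_F_DADFAILED))
        cur |= IFA_F_OPTIMISTIC;
    return cur;
}

/* When DAD fails, OPTIMISTIC must be cleared too. */
static unsigned int on_dad_failure(unsigned int cur)
{
    return (cur | IFA_F_DADFAILED) & ~IFA_F_OPTIMISTIC;
}

/* Simplified usability check: OPTIMISTIC overrides TENTATIVE. */
static bool addr_usable(unsigned int flags)
{
    return !(flags & IFA_F_TENTATIVE) || (flags & IFA_F_OPTIMISTIC);
}
```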

    Signed-off-by: Sabrina Dubroca
    Reviewed-by: Stefano Brivio
    Signed-off-by: David S. Miller

    Sabrina Dubroca
     

20 Feb, 2018

1 commit


13 Feb, 2018

1 commit

  • These pernet_operations (un)register sysctls, which
    are not touched by anybody else.

    So, it's safe to make them async.

    Signed-off-by: Kirill Tkhai
    Acked-by: Andrei Vagin
    Signed-off-by: David S. Miller

    Kirill Tkhai
     

30 Jan, 2018

2 commits

  • Heiner reported a lockdep splat [1]

    This is caused by attempting a GFP_KERNEL allocation while the RCU lock
    is held and BHs are disabled.

    We believe that addrconf_verify_rtnl() could run for a long period,
    so instead of using GFP_ATOMIC here as Ido suggested, we should break
    the critical section and restart it after the allocation.
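    The "break the critical section and restart" approach can be shown with a userspace model. A boolean stands in for rcu_read_lock_bh(), and an assertion plays the role of the might_sleep() check. All names are hypothetical. The model shows the control flow: leave the section, do the sleeping allocation, then rescan from the top, because the state may have changed while the section was open.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Stand-in for "inside rcu_read_lock_bh()": sleeping allocations are
 * not allowed while this is set. */
static bool in_atomic_section;
static int sleeping_allocs;

static void section_enter(void) { in_atomic_section = true; }
static void section_exit(void)  { in_atomic_section = false; }

/* Models a GFP_KERNEL-style allocation that may sleep. */
static void *alloc_may_sleep(size_t sz)
{
    assert(!in_atomic_section);     /* the condition lockdep flagged */
    sleeping_allocs++;
    return malloc(sz);
}

/* Scan the items; when one needs an allocation, mark it handled, leave
 * the section, allocate, and restart the whole scan. Returns the
 * number of restarts. */
static int verify_items(bool *needs_alloc, void **slot, int n)
{
    int restarts = 0;
restart:
    section_enter();
    for (int i = 0; i < n; i++) {
        if (needs_alloc[i]) {
            needs_alloc[i] = false;
            section_exit();
            slot[i] = alloc_may_sleep(16);
            restarts++;
            goto restart;
        }
    }
    section_exit();
    return restarts;
}
```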

    [1]
    [86220.125562] =============================
    [86220.125586] WARNING: suspicious RCU usage
    [86220.125612] 4.15.0-rc7-next-20180110+ #7 Not tainted
    [86220.125641] -----------------------------
    [86220.125666] kernel/sched/core.c:6026 Illegal context switch in RCU-bh read-side critical section!
    [86220.125711]
    other info that might help us debug this:

    [86220.125755]
    rcu_scheduler_active = 2, debug_locks = 1
    [86220.125792] 4 locks held by kworker/0:2/1003:
    [86220.125817] #0: ((wq_completion)"%s"("ipv6_addrconf")){+.+.}, at: [] process_one_work+0x1de/0x680
    [86220.125895] #1: ((addr_chk_work).work){+.+.}, at: [] process_one_work+0x1de/0x680
    [86220.125959] #2: (rtnl_mutex){+.+.}, at: [] rtnl_lock+0x12/0x20
    [86220.126017] #3: (rcu_read_lock_bh){....}, at: [] addrconf_verify_rtnl+0x1e/0x510 [ipv6]
    [86220.126111]
    stack backtrace:
    [86220.126142] CPU: 0 PID: 1003 Comm: kworker/0:2 Not tainted 4.15.0-rc7-next-20180110+ #7
    [86220.126185] Hardware name: ZOTAC ZBOX-CI321NANO/ZBOX-CI321NANO, BIOS B246P105 06/01/2015
    [86220.126250] Workqueue: ipv6_addrconf addrconf_verify_work [ipv6]
    [86220.126288] Call Trace:
    [86220.126312] dump_stack+0x70/0x9e
    [86220.126337] lockdep_rcu_suspicious+0xce/0xf0
    [86220.126365] ___might_sleep+0x1d3/0x240
    [86220.126390] __might_sleep+0x45/0x80
    [86220.126416] kmem_cache_alloc_trace+0x53/0x250
    [86220.126458] ? ipv6_add_addr+0xfe/0x6e0 [ipv6]
    [86220.126498] ipv6_add_addr+0xfe/0x6e0 [ipv6]
    [86220.126538] ipv6_create_tempaddr+0x24d/0x430 [ipv6]
    [86220.126580] ? ipv6_create_tempaddr+0x24d/0x430 [ipv6]
    [86220.126623] addrconf_verify_rtnl+0x339/0x510 [ipv6]
    [86220.126664] ? addrconf_verify_rtnl+0x339/0x510 [ipv6]
    [86220.126708] addrconf_verify_work+0xe/0x20 [ipv6]
    [86220.126738] process_one_work+0x258/0x680
    [86220.126765] worker_thread+0x35/0x3f0
    [86220.126790] kthread+0x124/0x140
    [86220.126813] ? process_one_work+0x680/0x680
    [86220.126839] ? kthread_create_worker_on_cpu+0x40/0x40
    [86220.126869] ? umh_complete+0x40/0x40
    [86220.126893] ? call_usermodehelper_exec_async+0x12a/0x160
    [86220.126926] ret_from_fork+0x4b/0x60
    [86220.126999] BUG: sleeping function called from invalid context at mm/slab.h:420
    [86220.127041] in_atomic(): 1, irqs_disabled(): 0, pid: 1003, name: kworker/0:2
    [86220.127082] 4 locks held by kworker/0:2/1003:
    [86220.127107] #0: ((wq_completion)"%s"("ipv6_addrconf")){+.+.}, at: [] process_one_work+0x1de/0x680
    [86220.127179] #1: ((addr_chk_work).work){+.+.}, at: [] process_one_work+0x1de/0x680
    [86220.127242] #2: (rtnl_mutex){+.+.}, at: [] rtnl_lock+0x12/0x20
    [86220.127300] #3: (rcu_read_lock_bh){....}, at: [] addrconf_verify_rtnl+0x1e/0x510 [ipv6]
    [86220.127414] CPU: 0 PID: 1003 Comm: kworker/0:2 Not tainted 4.15.0-rc7-next-20180110+ #7
    [86220.127463] Hardware name: ZOTAC ZBOX-CI321NANO/ZBOX-CI321NANO, BIOS B246P105 06/01/2015
    [86220.127528] Workqueue: ipv6_addrconf addrconf_verify_work [ipv6]
    [86220.127568] Call Trace:
    [86220.127591] dump_stack+0x70/0x9e
    [86220.127616] ___might_sleep+0x14d/0x240
    [86220.127644] __might_sleep+0x45/0x80
    [86220.127672] kmem_cache_alloc_trace+0x53/0x250
    [86220.127717] ? ipv6_add_addr+0xfe/0x6e0 [ipv6]
    [86220.127762] ipv6_add_addr+0xfe/0x6e0 [ipv6]
    [86220.127807] ipv6_create_tempaddr+0x24d/0x430 [ipv6]
    [86220.127854] ? ipv6_create_tempaddr+0x24d/0x430 [ipv6]
    [86220.127903] addrconf_verify_rtnl+0x339/0x510 [ipv6]
    [86220.127950] ? addrconf_verify_rtnl+0x339/0x510 [ipv6]
    [86220.127998] addrconf_verify_work+0xe/0x20 [ipv6]
    [86220.128032] process_one_work+0x258/0x680
    [86220.128063] worker_thread+0x35/0x3f0
    [86220.128091] kthread+0x124/0x140
    [86220.128117] ? process_one_work+0x680/0x680
    [86220.128146] ? kthread_create_worker_on_cpu+0x40/0x40
    [86220.128180] ? umh_complete+0x40/0x40
    [86220.128207] ? call_usermodehelper_exec_async+0x12a/0x160
    [86220.128243] ret_from_fork+0x4b/0x60

    Fixes: f3d9832e56c4 ("ipv6: addrconf: cleanup locking in ipv6_add_addr")
    Signed-off-by: Eric Dumazet
    Reported-by: Heiner Kallweit
    Reviewed-by: Ido Schimmel
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • Unsolicited IPv6 neighbor advertisements should be sent after DAD
    completes. Update ndisc_send_unsol_na to skip tentative, non-optimistic
    addresses and have those sent by addrconf_dad_completed after DAD.
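    The rule reduces to a predicate on the address flags. The sketch assumes the IFA_F_* values from linux/if_addr.h and uses a hypothetical helper name. When it returns false, the NA would be sent later, once DAD completes.

```c
#include <assert.h>
#include <stdbool.h>

/* Values as in linux/if_addr.h. */
#define IFA_F_OPTIMISTIC 0x04
#define IFA_F_TENTATIVE  0x40

/* Send an unsolicited NA now only if DAD has finished or the address
 * is optimistic; tentative, non-optimistic addresses are deferred. */
static bool send_unsol_na_now(unsigned int flags)
{
    return !(flags & IFA_F_TENTATIVE) || (flags & IFA_F_OPTIMISTIC);
}
```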

    Fixes: 4a6e3c5def13c ("net: ipv6: send unsolicited NA on admin up")
    Reported-by: Vivek Venkatraman
    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern
     

17 Jan, 2018

1 commit

  • /proc has been ignoring the struct file_operations::owner field for 10 years.
    Specifically, it started with commit 786d7e1612f0b0adb6046f19b906609e4fe8b1ba
    ("Fix rmmod/read/write races in /proc entries"). Notice the chunk where
    inode->i_fop is initialized with proxy struct file_operations for
    regular files:

    -    if (de->proc_fops)
    -        inode->i_fop = de->proc_fops;
    +    if (de->proc_fops) {
    +        if (S_ISREG(inode->i_mode))
    +            inode->i_fop = &proc_reg_file_ops;
    +        else
    +            inode->i_fop = de->proc_fops;
    +    }

    VFS stopped pinning the module at this point.

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: David S. Miller

    Alexey Dobriyan
     

08 Jan, 2018

3 commits

  • Similar to IPv4, when the carrier of a netdev changes we should toggle
    the 'linkdown' flag on all the nexthops using it as their nexthop
    device.

    This will later allow us to test for the presence of this flag during
    route lookup and dump.

    Up until commit 4832c30d5458 ("net: ipv6: put host and anycast routes on
    device with address") host and anycast routes used the loopback netdev
    as their nexthop device and thus were not marked with the 'linkdown'
    flag. The patch preserves this behavior and allows one to ping the local
    address even when the nexthop device does not have a carrier and the
    'ignore_routes_with_linkdown' sysctl is set.
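    Below is a rough model of the carrier-change handling. It uses a hypothetical nexthop structure with an is_local marker for host and anycast routes, plus an assumed NH_LINKDOWN bit. The kernel reports this state through the RTNH_F_LINKDOWN nexthop flag on its real route structures. Local routes are skipped so the local address stays reachable, as described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NH_LINKDOWN 0x1u    /* hypothetical flag bit */

struct nexthop {
    int ifindex;            /* nexthop device */
    bool is_local;          /* host/anycast route on the address's device */
    unsigned int flags;
};

/* On a carrier change of ifindex, set or clear 'linkdown' on every
 * nexthop using that device, except local routes. Returns how many
 * nexthops were updated. */
static int sync_linkdown(struct nexthop *nh, size_t n, int ifindex,
                         bool carrier_up)
{
    int touched = 0;
    for (size_t i = 0; i < n; i++) {
        if (nh[i].ifindex != ifindex || nh[i].is_local)
            continue;
        if (carrier_up)
            nh[i].flags &= ~NH_LINKDOWN;
        else
            nh[i].flags |= NH_LINKDOWN;
        touched++;
    }
    return touched;
}
```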

    Signed-off-by: Ido Schimmel
    Acked-by: David Ahern
    Signed-off-by: David S. Miller

    Ido Schimmel
     
  • To make IPv6 more in line with IPv4 we need to be able to respond
    differently to different netdev events. For example, when a netdev is
    unregistered all the routes using it as their nexthop device should be
    flushed, whereas when the netdev's carrier changes only the 'linkdown'
    flag should be toggled.

    Currently, this is not possible, as the function that traverses the
    routing tables is not aware of the triggering event.

    Propagate the triggering event down, so that it could be used in later
    patches.
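    Once the table walker knows the triggering event, the per-route decision can branch on it. The enum values and action names below are hypothetical stand-ins for the NETDEV_* events and the route handling. Only the unregister, carrier-change, and up cases discussed in these patches are modeled.

```c
#include <assert.h>

/* Hypothetical stand-ins for the relevant NETDEV_* notifier events. */
enum dev_event { EV_UNREGISTER, EV_CHANGE, EV_UP };

enum nh_action { NH_FLUSH, NH_TOGGLE_LINKDOWN, NH_CLEAR_FLAGS };

/* Pick the route treatment from the event that triggered the walk. */
static enum nh_action action_for_event(enum dev_event ev)
{
    switch (ev) {
    case EV_UNREGISTER:
        return NH_FLUSH;            /* device gone: remove its routes */
    case EV_CHANGE:
        return NH_TOGGLE_LINKDOWN;  /* carrier change: flag only */
    case EV_UP:
    default:
        return NH_CLEAR_FLAGS;      /* device back up: clear flags */
    }
}
```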

    Signed-off-by: Ido Schimmel
    Acked-by: David Ahern
    Signed-off-by: David S. Miller

    Ido Schimmel
     
  • The previous patch marked nexthops with the 'dead' and 'linkdown' flags.
    Clear these flags when the netdev comes back up.

    Signed-off-by: Ido Schimmel
    Acked-by: David Ahern
    Signed-off-by: David S. Miller

    Ido Schimmel
     

05 Dec, 2017

2 commits


22 Nov, 2017

1 commit

  • This converts all remaining cases of the old setup_timer() API into using
    timer_setup(), where the callback argument is the structure already
    holding the struct timer_list. These should have no behavioral changes,
    since they just change which pointer is passed into the callback with
    the same available pointers after conversion. It handles the following
    examples, in addition to some other variations.

    Casting from unsigned long:

        void my_callback(unsigned long data)
        {
            struct something *ptr = (struct something *)data;
            ...
        }
        ...
        setup_timer(&ptr->my_timer, my_callback, ptr);

    and forced object casts:

        void my_callback(struct something *ptr)
        {
            ...
        }
        ...
        setup_timer(&ptr->my_timer, my_callback, (unsigned long)ptr);

    become:

        void my_callback(struct timer_list *t)
        {
            struct something *ptr = from_timer(ptr, t, my_timer);
            ...
        }
        ...
        timer_setup(&ptr->my_timer, my_callback, 0);

    Direct function assignments:

        void my_callback(unsigned long data)
        {
            struct something *ptr = (struct something *)data;
            ...
        }
        ...
        ptr->my_timer.function = my_callback;

    have a temporary cast added, along with converting the args:

        void my_callback(struct timer_list *t)
        {
            struct something *ptr = from_timer(ptr, t, my_timer);
            ...
        }
        ...
        ptr->my_timer.function = (TIMER_FUNC_TYPE)my_callback;

    And finally, callbacks without a data assignment:

        void my_callback(unsigned long data)
        {
            ...
        }
        ...
        setup_timer(&ptr->my_timer, my_callback, 0);

    have their argument renamed to verify they're unused during conversion:

        void my_callback(struct timer_list *unused)
        {
            ...
        }
        ...
        timer_setup(&ptr->my_timer, my_callback, 0);
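    Here is a self-contained userspace emulation of how from_timer() recovers the enclosing structure. struct timer_list, timer_setup(), and from_timer() are minimal stand-ins written for this sketch; the real ones are in include/linux/timer.h. from_timer() is built on container_of() and the GCC/Clang __typeof__ extension, following the callback pattern in the examples above.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-in for the kernel's struct timer_list. */
struct timer_list {
    void (*function)(struct timer_list *);
};

/* Recover a pointer to the structure that embeds `member`. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* from_timer(var, t, field): the object of var's type embedding t. */
#define from_timer(var, callback_timer, timer_fieldname) \
    container_of(callback_timer, __typeof__(*var), timer_fieldname)

/* Sketch of timer_setup(): store the callback; flags are ignored here. */
static void timer_setup(struct timer_list *timer,
                        void (*fn)(struct timer_list *), unsigned int flags)
{
    (void)flags;
    timer->function = fn;
}

struct something {
    int fired;
    struct timer_list my_timer;
};

/* New-style callback: receives the timer, derives its owner. */
static void my_callback(struct timer_list *t)
{
    struct something *ptr = from_timer(ptr, t, my_timer);
    ptr->fired++;
}
```

    Because the callback derives its object from the timer pointer, the separate data argument becomes unnecessary. That is why the converted timer_setup() calls pass 0 in the third position.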

    The conversion is done with the following Coccinelle script:

    spatch --very-quiet --all-includes --include-headers \
    -I ./arch/x86/include -I ./arch/x86/include/generated \
    -I ./include -I ./arch/x86/include/uapi \
    -I ./arch/x86/include/generated/uapi -I ./include/uapi \
    -I ./include/generated/uapi --include ./include/linux/kconfig.h \
    --dir . \
    --cocci-file ~/src/data/timer_setup.cocci

    @fix_address_of@
    expression e;
    @@

    setup_timer(
    -&(e)
    +&e
    , ...)

    // Update any raw setup_timer() usages that have a NULL callback, but
    // would otherwise match change_timer_function_usage, since the latter
    // will update all function assignments done in the face of a NULL
    // function initialization in setup_timer().
    @change_timer_function_usage_NULL@
    expression _E;
    identifier _timer;
    type _cast_data;
    @@

    (
    -setup_timer(&_E->_timer, NULL, _E);
    +timer_setup(&_E->_timer, NULL, 0);
    |
    -setup_timer(&_E->_timer, NULL, (_cast_data)_E);
    +timer_setup(&_E->_timer, NULL, 0);
    |
    -setup_timer(&_E._timer, NULL, &_E);
    +timer_setup(&_E._timer, NULL, 0);
    |
    -setup_timer(&_E._timer, NULL, (_cast_data)&_E);
    +timer_setup(&_E._timer, NULL, 0);
    )

    @change_timer_function_usage@
    expression _E;
    identifier _timer;
    struct timer_list _stl;
    identifier _callback;
    type _cast_func, _cast_data;
    @@

    (
    -setup_timer(&_E->_timer, _callback, _E);
    +timer_setup(&_E->_timer, _callback, 0);
    |
    -setup_timer(&_E->_timer, &_callback, _E);
    +timer_setup(&_E->_timer, _callback, 0);
    |
    -setup_timer(&_E->_timer, _callback, (_cast_data)_E);
    +timer_setup(&_E->_timer, _callback, 0);
    |
    -setup_timer(&_E->_timer, &_callback, (_cast_data)_E);
    +timer_setup(&_E->_timer, _callback, 0);
    |
    -setup_timer(&_E->_timer, (_cast_func)_callback, _E);
    +timer_setup(&_E->_timer, _callback, 0);
    |
    -setup_timer(&_E->_timer, (_cast_func)&_callback, _E);
    +timer_setup(&_E->_timer, _callback, 0);
    |
    -setup_timer(&_E->_timer, (_cast_func)_callback, (_cast_data)_E);
    +timer_setup(&_E->_timer, _callback, 0);
    |
    -setup_timer(&_E->_timer, (_cast_func)&_callback, (_cast_data)_E);
    +timer_setup(&_E->_timer, _callback, 0);
    |
    -setup_timer(&_E._timer, _callback, (_cast_data)_E);
    +timer_setup(&_E._timer, _callback, 0);
    |
    -setup_timer(&_E._timer, _callback, (_cast_data)&_E);
    +timer_setup(&_E._timer, _callback, 0);
    |
    -setup_timer(&_E._timer, &_callback, (_cast_data)_E);
    +timer_setup(&_E._timer, _callback, 0);
    |
    -setup_timer(&_E._timer, &_callback, (_cast_data)&_E);
    +timer_setup(&_E._timer, _callback, 0);
    |
    -setup_timer(&_E._timer, (_cast_func)_callback, (_cast_data)_E);
    +timer_setup(&_E._timer, _callback, 0);
    |
    -setup_timer(&_E._timer, (_cast_func)_callback, (_cast_data)&_E);
    +timer_setup(&_E._timer, _callback, 0);
    |
    -setup_timer(&_E._timer, (_cast_func)&_callback, (_cast_data)_E);
    +timer_setup(&_E._timer, _callback, 0);
    |
    -setup_timer(&_E._timer, (_cast_func)&_callback, (_cast_data)&_E);
    +timer_setup(&_E._timer, _callback, 0);
    |
    _E->_timer@_stl.function = _callback;
    |
    _E->_timer@_stl.function = &_callback;
    |
    _E->_timer@_stl.function = (_cast_func)_callback;
    |
    _E->_timer@_stl.function = (_cast_func)&_callback;
    |
    _E._timer@_stl.function = _callback;
    |
    _E._timer@_stl.function = &_callback;
    |
    _E._timer@_stl.function = (_cast_func)_callback;
    |
    _E._timer@_stl.function = (_cast_func)&_callback;
    )

    // callback(unsigned long arg)
    @change_callback_handle_cast
    depends on change_timer_function_usage@
    identifier change_timer_function_usage._callback;
    identifier change_timer_function_usage._timer;
    type _origtype;
    identifier _origarg;
    type _handletype;
    identifier _handle;
    @@

    void _callback(
    -_origtype _origarg
    +struct timer_list *t
    )
    {
    (
    ... when != _origarg
    _handletype *_handle =
    -(_handletype *)_origarg;
    +from_timer(_handle, t, _timer);
    ... when != _origarg
    |
    ... when != _origarg
    _handletype *_handle =
    -(void *)_origarg;
    +from_timer(_handle, t, _timer);
    ... when != _origarg
    |
    ... when != _origarg
    _handletype *_handle;
    ... when != _handle
    _handle =
    -(_handletype *)_origarg;
    +from_timer(_handle, t, _timer);
    ... when != _origarg
    |
    ... when != _origarg
    _handletype *_handle;
    ... when != _handle
    _handle =
    -(void *)_origarg;
    +from_timer(_handle, t, _timer);
    ... when != _origarg
    )
    }

    // callback(unsigned long arg) without existing variable
    @change_callback_handle_cast_no_arg
    depends on change_timer_function_usage &&
    !change_callback_handle_cast@
    identifier change_timer_function_usage._callback;
    identifier change_timer_function_usage._timer;
    type _origtype;
    identifier _origarg;
    type _handletype;
    @@

    void _callback(
    -_origtype _origarg
    +struct timer_list *t
    )
    {
    + _handletype *_origarg = from_timer(_origarg, t, _timer);
    +
    ... when != _origarg
    - (_handletype *)_origarg
    + _origarg
    ... when != _origarg
    }

    // Avoid already converted callbacks.
    @match_callback_converted
    depends on change_timer_function_usage &&
    !change_callback_handle_cast &&
    !change_callback_handle_cast_no_arg@
    identifier change_timer_function_usage._callback;
    identifier t;
    @@

    void _callback(struct timer_list *t)
    { ... }

    // callback(struct something *handle)
    @change_callback_handle_arg
    depends on change_timer_function_usage &&
    !match_callback_converted &&
    !change_callback_handle_cast &&
    !change_callback_handle_cast_no_arg@
    identifier change_timer_function_usage._callback;
    identifier change_timer_function_usage._timer;
    type _handletype;
    identifier _handle;
    @@

    void _callback(
    -_handletype *_handle
    +struct timer_list *t
    )
    {
    + _handletype *_handle = from_timer(_handle, t, _timer);
    ...
    }

    // If change_callback_handle_arg ran on an empty function, remove
    // the added handler.
    @unchange_callback_handle_arg
    depends on change_timer_function_usage &&
    change_callback_handle_arg@
    identifier change_timer_function_usage._callback;
    identifier change_timer_function_usage._timer;
    type _handletype;
    identifier _handle;
    identifier t;
    @@

    void _callback(struct timer_list *t)
    {
    - _handletype *_handle = from_timer(_handle, t, _timer);
    }

    // We only want to refactor the setup_timer() data argument if we've found
    // the matching callback. This undoes changes in change_timer_function_usage.
    @unchange_timer_function_usage
    depends on change_timer_function_usage &&
    !change_callback_handle_cast &&
    !change_callback_handle_cast_no_arg &&
    !change_callback_handle_arg@
    expression change_timer_function_usage._E;
    identifier change_timer_function_usage._timer;
    identifier change_timer_function_usage._callback;
    type change_timer_function_usage._cast_data;
    @@

    (
    -timer_setup(&_E->_timer, _callback, 0);
    +setup_timer(&_E->_timer, _callback, (_cast_data)_E);
    |
    -timer_setup(&_E._timer, _callback, 0);
    +setup_timer(&_E._timer, _callback, (_cast_data)&_E);
    )

    // If we fixed a callback from a .function assignment, fix the
    // assignment cast now.
    @change_timer_function_assignment
    depends on change_timer_function_usage &&
    (change_callback_handle_cast ||
    change_callback_handle_cast_no_arg ||
    change_callback_handle_arg)@
    expression change_timer_function_usage._E;
    identifier change_timer_function_usage._timer;
    identifier change_timer_function_usage._callback;
    type _cast_func;
    typedef TIMER_FUNC_TYPE;
    @@

    (
    _E->_timer.function =
    -_callback
    +(TIMER_FUNC_TYPE)_callback
    ;
    |
    _E->_timer.function =
    -&_callback
    +(TIMER_FUNC_TYPE)_callback
    ;
    |
    _E->_timer.function =
    -(_cast_func)_callback
    +(TIMER_FUNC_TYPE)_callback
    ;
    |
    _E->_timer.function =
    -(_cast_func)&_callback
    +(TIMER_FUNC_TYPE)_callback
    ;
    |
    _E._timer.function =
    -_callback
    +(TIMER_FUNC_TYPE)_callback
    ;
    |
    _E._timer.function =
    -&_callback
    +(TIMER_FUNC_TYPE)_callback
    ;
    |
    _E._timer.function =
    -(_cast_func)_callback
    +(TIMER_FUNC_TYPE)_callback
    ;
    |
    _E._timer.function =
    -(_cast_func)&_callback
    +(TIMER_FUNC_TYPE)_callback
    ;
    )

    // Sometimes timer functions are called directly. Replace matched args.
    @change_timer_function_calls
    depends on change_timer_function_usage &&
    (change_callback_handle_cast ||
    change_callback_handle_cast_no_arg ||
    change_callback_handle_arg)@
    expression _E;
    identifier change_timer_function_usage._timer;
    identifier change_timer_function_usage._callback;
    type _cast_data;
    @@

    _callback(
    (
    -(_cast_data)_E
    +&_E->_timer
    |
    -(_cast_data)&_E
    +&_E._timer
    |
    -_E
    +&_E->_timer
    )
    )

    // If a timer has been configured without a data argument, it can be
    // converted without regard to the callback argument, since it is unused.
    @match_timer_function_unused_data@
    expression _E;
    identifier _timer;
    identifier _callback;
    @@

    (
    -setup_timer(&_E->_timer, _callback, 0);
    +timer_setup(&_E->_timer, _callback, 0);
    |
    -setup_timer(&_E->_timer, _callback, 0L);
    +timer_setup(&_E->_timer, _callback, 0);
    |
    -setup_timer(&_E->_timer, _callback, 0UL);
    +timer_setup(&_E->_timer, _callback, 0);
    |
    -setup_timer(&_E._timer, _callback, 0);
    +timer_setup(&_E._timer, _callback, 0);
    |
    -setup_timer(&_E._timer, _callback, 0L);
    +timer_setup(&_E._timer, _callback, 0);
    |
    -setup_timer(&_E._timer, _callback, 0UL);
    +timer_setup(&_E._timer, _callback, 0);
    |
    -setup_timer(&_timer, _callback, 0);
    +timer_setup(&_timer, _callback, 0);
    |
    -setup_timer(&_timer, _callback, 0L);
    +timer_setup(&_timer, _callback, 0);
    |
    -setup_timer(&_timer, _callback, 0UL);
    +timer_setup(&_timer, _callback, 0);
    |
    -setup_timer(_timer, _callback, 0);
    +timer_setup(_timer, _callback, 0);
    |
    -setup_timer(_timer, _callback, 0L);
    +timer_setup(_timer, _callback, 0);
    |
    -setup_timer(_timer, _callback, 0UL);
    +timer_setup(_timer, _callback, 0);
    )

    @change_callback_unused_data
    depends on match_timer_function_unused_data@
    identifier match_timer_function_unused_data._callback;
    type _origtype;
    identifier _origarg;
    @@

    void _callback(
    -_origtype _origarg
    +struct timer_list *unused
    )
    {
    ... when != _origarg
    }

    Signed-off-by: Kees Cook

    Kees Cook
     

15 Nov, 2017

1 commit

  • With commits 35e015e1f577 and a2d3f3e33853, the global 'accept_dad' flag
    is also taken into account (default value is 1). If either global or
    per-interface flag is non-zero, DAD will be enabled on a given interface.

    This is not backward compatible: before those patches, the user could
    disable DAD just by setting the per-interface flag to 0. Now, the
    user instead needs to set both flags to 0 to actually disable DAD.

    Restore the previous behaviour by setting the default for the global
    'accept_dad' flag to 0. This way, DAD is still enabled by default,
    as per-interface flags are set to 1 on device creation, but setting
    them to 0 is enough to disable DAD on a given interface.

    - Before 35e015e1f577 and a2d3f3e33853 (X = any value):

                  global   per-interface   DAD enabled
      [default]     1            1             yes
                    X            0             no
                    X            1             yes

    - After 35e015e1f577 and a2d3f3e33853:

                  global   per-interface   DAD enabled
      [default]     1            1             yes
                    0            0             no
                    0            1             yes
                    1            0             yes

    - After this fix:

                  global   per-interface   DAD enabled
                    1            1             yes
                    0            0             no
      [default]     0            1             yes
                    1            0             yes

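    The rule stated above fits in one line: DAD runs if either the global or the per-interface flag is non-zero. dad_enabled() is a hypothetical helper, not a kernel function.

```c
#include <assert.h>
#include <stdbool.h>

/* Since 35e015e1f577/a2d3f3e33853, DAD is enabled on an interface when
 * either net.ipv6.conf.all.accept_dad or the per-interface flag is set. */
static bool dad_enabled(int global_accept_dad, int dev_accept_dad)
{
    return global_accept_dad != 0 || dev_accept_dad != 0;
}
```

    With the new default global value of 0, dad_enabled(0, 1) is true and dad_enabled(0, 0) is false. Clearing only the per-interface flag is therefore enough to disable DAD again.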
    Fixes: 35e015e1f577 ("ipv6: fix net.ipv6.conf.all interface DAD handlers")
    Fixes: a2d3f3e33853 ("ipv6: fix net.ipv6.conf.all.accept_dad behaviour for real")
    CC: Stefano Brivio
    CC: Matteo Croce
    CC: Erik Kline
    Signed-off-by: Nicolas Dichtel
    Acked-by: Stefano Brivio
    Signed-off-by: David S. Miller

    Nicolas Dichtel
     

11 Nov, 2017

1 commit

  • Add a per-device sysctl to specify the default traffic class to use for
    kernel-originated IPv6 Neighbour Discovery packets.

    Currently this includes:

    - Router Solicitation (ICMPv6 type 133)
    ndisc_send_rs() -> ndisc_send_skb() -> ip6_nd_hdr()

    - Neighbour Solicitation (ICMPv6 type 135)
    ndisc_send_ns() -> ndisc_send_skb() -> ip6_nd_hdr()

    - Neighbour Advertisement (ICMPv6 type 136)
    ndisc_send_na() -> ndisc_send_skb() -> ip6_nd_hdr()

    - Redirect (ICMPv6 type 137)
    ndisc_send_redirect() -> ndisc_send_skb() -> ip6_nd_hdr()

    and if the kernel ever gets around to generating RAs,
    it would presumably also include:

    - Router Advertisement (ICMPv6 type 134)
    (radvd daemon could pick up on the kernel setting and use it)

    Interface drivers may examine the Traffic Class value and translate
    the DiffServ Code Point into a link-layer appropriate traffic
    prioritization scheme. An example of mapping IETF DSCP values to
    IEEE 802.11 User Priority values can be found here:

    https://tools.ietf.org/html/draft-ietf-tsvwg-ieee-802-11

    The expected primary use case is to properly prioritize ND over wifi.
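    The sketch below shows where the configured value ends up. The Traffic Class occupies bits 20..27 of the first 32-bit word of the IPv6 header, and its upper six bits are the DSCP. The helper names are hypothetical. The 0..255 range check mirrors the values accepted in the testing output below.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Traffic Class is an 8-bit field, so only 0..255 is meaningful. */
static bool ndisc_tclass_valid(long v)
{
    return v >= 0 && v <= 255;
}

/* First 32-bit IPv6 header word in host byte order:
 * version (4 bits) | traffic class (8 bits) | flow label (20 bits). */
static uint32_t ip6_first_word(uint8_t tclass, uint32_t flowlabel)
{
    return (6u << 28) | ((uint32_t)tclass << 20) | (flowlabel & 0xFFFFFu);
}

/* The DiffServ Code Point is the upper six bits of the Traffic Class. */
static uint8_t tclass_dscp(uint8_t tclass)
{
    return (uint8_t)(tclass >> 2);
}
```

    With the 0xDC value used in the test, the first header word becomes 0x6dc00000 and the DSCP field is 0x37.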

    Testing:
    jzem22:~# cat /proc/sys/net/ipv6/conf/eth0/ndisc_tclass
    0
    jzem22:~# echo -1 > /proc/sys/net/ipv6/conf/eth0/ndisc_tclass
    -bash: echo: write error: Invalid argument
    jzem22:~# echo 256 > /proc/sys/net/ipv6/conf/eth0/ndisc_tclass
    -bash: echo: write error: Invalid argument
    jzem22:~# echo 0 > /proc/sys/net/ipv6/conf/eth0/ndisc_tclass
    jzem22:~# echo 255 > /proc/sys/net/ipv6/conf/eth0/ndisc_tclass
    jzem22:~# cat /proc/sys/net/ipv6/conf/eth0/ndisc_tclass
    255
    jzem22:~# echo 34 > /proc/sys/net/ipv6/conf/eth0/ndisc_tclass
    jzem22:~# cat /proc/sys/net/ipv6/conf/eth0/ndisc_tclass
    34

    jzem22:~# echo $[0xDC] > /proc/sys/net/ipv6/conf/eth0/ndisc_tclass
    jzem22:~# tcpdump -v -i eth0 icmp6 and src host jzem22.pgc and dst host fe80::1
    tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
    IP6 (class 0xdc, hlim 255, next-header ICMPv6 (58) payload length: 24)
    jzem22.pgc > fe80::1: [icmp6 sum ok] ICMP6, neighbor advertisement,
    length 24, tgt is jzem22.pgc, Flags [solicited]

    (based on original change written by Erik Kline, with minor changes)

    v2: fix 'suspicious rcu_dereference_check() usage'
    by explicitly grabbing the rcu_read_lock.

    Cc: Lorenzo Colitti
    Signed-off-by: Erik Kline
    Signed-off-by: Maciej Żenczykowski
    Signed-off-by: David S. Miller

    Maciej Żenczykowski
     

07 Nov, 2017

1 commit

  • Fix a case where a GFP_ATOMIC allocation must be used instead of a
    GFP_KERNEL one.

    [ 54.891146] lock_acquire+0xb3/0x2f0
    [ 54.891153] ? fs_reclaim_acquire.part.60+0x5/0x30
    [ 54.891165] fs_reclaim_acquire.part.60+0x29/0x30
    [ 54.891170] ? fs_reclaim_acquire.part.60+0x5/0x30
    [ 54.891178] kmem_cache_alloc_trace+0x3f/0x500
    [ 54.891186] ? cyc2ns_read_end+0x1e/0x30
    [ 54.891196] ipv6_add_addr+0x15a/0xc30
    [ 54.891217] ? ipv6_create_tempaddr+0x2ea/0x5d0
    [ 54.891223] ipv6_create_tempaddr+0x2ea/0x5d0
    [ 54.891238] ? manage_tempaddrs+0x195/0x220
    [ 54.891249] ? addrconf_prefix_rcv_add_addr+0x1c0/0x4f0
    [ 54.891255] addrconf_prefix_rcv_add_addr+0x1c0/0x4f0
    [ 54.891268] addrconf_prefix_rcv+0x2e5/0x9b0
    [ 54.891279] ? neigh_update+0x446/0xb90
    [ 54.891298] ? ndisc_router_discovery+0x5ab/0xf00
    [ 54.891303] ndisc_router_discovery+0x5ab/0xf00
    [ 54.891311] ? retint_kernel+0x2d/0x2d
    [ 54.891331] ndisc_rcv+0x1b6/0x270
    [ 54.891340] icmpv6_rcv+0x6aa/0x9f0
    [ 54.891345] ? ipv6_chk_mcast_addr+0x176/0x530
    [ 54.891351] ? do_csum+0x17b/0x260
    [ 54.891360] ip6_input_finish+0x194/0xb20
    [ 54.891372] ip6_input+0x5b/0x2c0
    [ 54.891380] ? ip6_rcv_finish+0x320/0x320
    [ 54.891389] ip6_mc_input+0x15a/0x250
    [ 54.891396] ipv6_rcv+0x772/0x1050
    [ 54.891403] ? consume_skb+0xbe/0x2d0
    [ 54.891412] ? ip6_make_skb+0x2a0/0x2a0
    [ 54.891418] ? ip6_input+0x2c0/0x2c0
    [ 54.891425] __netif_receive_skb_core+0xa0f/0x1600
    [ 54.891436] ? process_backlog+0xac/0x400
    [ 54.891441] process_backlog+0xfa/0x400
    [ 54.891448] ? net_rx_action+0x145/0x1130
    [ 54.891456] net_rx_action+0x310/0x1130
    [ 54.891524] ? RTUSBBulkReceive+0x11d/0x190 [mt7610u_sta]
    [ 54.891538] __do_softirq+0x140/0xaba
    [ 54.891553] irq_exit+0x10b/0x160
    [ 54.891561] do_IRQ+0xbb/0x1b0

    Fixes: f3d9832e56c4 ("ipv6: addrconf: cleanup locking in ipv6_add_addr")
    Signed-off-by: Eric Dumazet
    Reported-by: Valdis Kletnieks
    Acked-by: David Ahern
    Tested-by: David Ahern
    Signed-off-by: David S. Miller

    Eric Dumazet
     

05 Nov, 2017

1 commit


02 Nov, 2017

1 commit


01 Nov, 2017

2 commits

  • In the (unlikely) event fixup_permanent_addr() returns a failure,
    addrconf_permanent_addr() calls ipv6_del_addr() without the
    mandatory call to in6_ifa_hold(), leading to a refcount error,
    spotted by syzkaller:

    WARNING: CPU: 1 PID: 3142 at lib/refcount.c:227 refcount_dec+0x4c/0x50
    lib/refcount.c:227
    Kernel panic - not syncing: panic_on_warn set ...

    CPU: 1 PID: 3142 Comm: ip Not tainted 4.14.0-rc4-next-20171009+ #33
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
    Google 01/01/2011
    Call Trace:
    __dump_stack lib/dump_stack.c:16 [inline]
    dump_stack+0x194/0x257 lib/dump_stack.c:52
    panic+0x1e4/0x41c kernel/panic.c:181
    __warn+0x1c4/0x1e0 kernel/panic.c:544
    report_bug+0x211/0x2d0 lib/bug.c:183
    fixup_bug+0x40/0x90 arch/x86/kernel/traps.c:178
    do_trap_no_signal arch/x86/kernel/traps.c:212 [inline]
    do_trap+0x260/0x390 arch/x86/kernel/traps.c:261
    do_error_trap+0x120/0x390 arch/x86/kernel/traps.c:298
    do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:311
    invalid_op+0x18/0x20 arch/x86/entry/entry_64.S:905
    RIP: 0010:refcount_dec+0x4c/0x50 lib/refcount.c:227
    RSP: 0018:ffff8801ca49e680 EFLAGS: 00010286
    RAX: 000000000000002c RBX: ffff8801d07cfcdc RCX: 0000000000000000
    RDX: 000000000000002c RSI: 1ffff10039493c90 RDI: ffffed0039493cc4
    RBP: ffff8801ca49e688 R08: ffff8801ca49dd70 R09: 0000000000000000
    R10: ffff8801ca49df58 R11: 0000000000000000 R12: 1ffff10039493cd9
    R13: ffff8801ca49e6e8 R14: ffff8801ca49e7e8 R15: ffff8801d07cfcdc
    __in6_ifa_put include/net/addrconf.h:369 [inline]
    ipv6_del_addr+0x42b/0xb60 net/ipv6/addrconf.c:1208
    addrconf_permanent_addr net/ipv6/addrconf.c:3327 [inline]
    addrconf_notify+0x1c66/0x2190 net/ipv6/addrconf.c:3393
    notifier_call_chain+0x136/0x2c0 kernel/notifier.c:93
    __raw_notifier_call_chain kernel/notifier.c:394 [inline]
    raw_notifier_call_chain+0x2d/0x40 kernel/notifier.c:401
    call_netdevice_notifiers_info+0x32/0x60 net/core/dev.c:1697
    call_netdevice_notifiers net/core/dev.c:1715 [inline]
    __dev_notify_flags+0x15d/0x430 net/core/dev.c:6843
    dev_change_flags+0xf5/0x140 net/core/dev.c:6879
    do_setlink+0xa1b/0x38e0 net/core/rtnetlink.c:2113
    rtnl_newlink+0xf0d/0x1a40 net/core/rtnetlink.c:2661
    rtnetlink_rcv_msg+0x733/0x1090 net/core/rtnetlink.c:4301
    netlink_rcv_skb+0x216/0x440 net/netlink/af_netlink.c:2408
    rtnetlink_rcv+0x1c/0x20 net/core/rtnetlink.c:4313
    netlink_unicast_kernel net/netlink/af_netlink.c:1273 [inline]
    netlink_unicast+0x4e8/0x6f0 net/netlink/af_netlink.c:1299
    netlink_sendmsg+0xa4a/0xe70 net/netlink/af_netlink.c:1862
    sock_sendmsg_nosec net/socket.c:633 [inline]
    sock_sendmsg+0xca/0x110 net/socket.c:643
    ___sys_sendmsg+0x75b/0x8a0 net/socket.c:2049
    __sys_sendmsg+0xe5/0x210 net/socket.c:2083
    SYSC_sendmsg net/socket.c:2094 [inline]
    SyS_sendmsg+0x2d/0x50 net/socket.c:2090
    entry_SYSCALL_64_fastpath+0x1f/0xbe
    RIP: 0033:0x7fa9174d3320
    RSP: 002b:00007ffe302ae9e8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
    RAX: ffffffffffffffda RBX: 00007ffe302b2ae0 RCX: 00007fa9174d3320
    RDX: 0000000000000000 RSI: 00007ffe302aea20 RDI: 0000000000000016
    RBP: 0000000000000082 R08: 0000000000000000 R09: 000000000000000f
    R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffe302b32a0
    R13: 0000000000000000 R14: 00007ffe302b2ab8 R15: 00007ffe302b32b8

    Fixes: f1705ec197e7 ("net: ipv6: Make address flushing on ifdown optional")
    Signed-off-by: Eric Dumazet
    Cc: David Ahern
    Acked-by: David Ahern
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • This patch updates the error messages displayed in the kernel log to
    include the hardware address of the source machine that caused IPv6
    duplicate address detection (DAD) failures.

    Examples:

    a) When we receive an NA packet from another machine advertising our
    address:

    ICMPv6: NA: 34:ab:cd:56:11:e8 advertised our address 2001:db8:: on eth0!

    b) When we detect DAD failure during address assignment to an interface:

    IPv6: eth0: IPv6 duplicate address 2001:db8:: used by 34:ab:cd:56:11:e8
    detected!

    v2:
    Changed %pI6 to %pI6c in ndisc_recv_na()
    Changed the v6 address in the commit message to 2001:db8::

    Suggested-by: Igor Lubashev
    Signed-off-by: Vishwanath Pai
    Acked-by: David Ahern
    Signed-off-by: David S. Miller

    Vishwanath Pai
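    The reference-counting rule that the addrconf_permanent_addr() fix restores can be sketched in userspace C. This is a minimal model built on one assumption, which the commit message and trace above suggest. The assumption is that ipv6_del_addr() first drops the address list's reference through __in6_ifa_put(), and that put must never reach zero. It then consumes one more reference, which the caller is expected to have taken with in6_ifa_hold(). struct addr and the addr_*() helpers are hypothetical stand-ins, not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for a refcounted address entry
 * (not the kernel's struct inet6_ifaddr). */
struct addr {
    int refcnt;   /* number of references currently held */
    bool linked;  /* still on the owner's address list */
    bool freed;   /* set once the last reference is dropped */
    bool warned;  /* set if a non-final put hit zero */
};

/* Like in6_ifa_hold(): only legal on an object that is still alive. */
static void addr_hold(struct addr *a)
{
    assert(a->refcnt > 0);
    a->refcnt++;
}

/* Like __in6_ifa_put()/refcount_dec(): must never drop the last reference. */
static void addr_put_nonfinal(struct addr *a)
{
    if (--a->refcnt == 0) {
        a->warned = true;
        fprintf(stderr, "refcount: decrement hit 0\n");
    }
}

/* Like in6_ifa_put(): dropping the last reference frees the object.
 * In this sketch, puts on an already-dead object are simply ignored. */
static void addr_put(struct addr *a)
{
    if (a->refcnt > 0 && --a->refcnt == 0)
        a->freed = true;
}

/* Like ipv6_del_addr(): drops the list's reference, then consumes the
 * reference the caller is required to have taken beforehand. */
static void del_addr(struct addr *a)
{
    a->linked = false;
    addr_put_nonfinal(a); /* the list's reference */
    addr_put(a);          /* the caller's reference */
}
```

    Suppose del_addr() is called on an entry that only the list references. The non-final put then hits zero and sets the warning, mirroring the path in the syzkaller report. If the caller first takes a reference with addr_hold(), the final put frees the entry cleanly.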
     

24 Oct, 2017

7 commits


20 Oct, 2017

3 commits

  • Add extack to in_validator_info and in6_validator_info. Update the one
    user of each, ipvlan, to return an error message for failures.

    In the IPv6 code path, extack is plumbed through only for manual
    configuration of an address.

    Signed-off-by: David Ahern
    Reviewed-by: Ido Schimmel
    Signed-off-by: David S. Miller

    David Ahern
     
  • The inet6addr_validator chain was added by commit 3ad7d2468f79f ("Ipvlan
    should return an error when an address is already in use") to allow
    address validation before changes are committed and to be able to
    fail the address change with an error back to the user. Address
    validation is not done for addresses received from router
    advertisements.

    Handling RAs in softirq context is the only reason for the notifier
    chain to be atomic rather than blocking. Since ipvlan, the only current
    user of the validator chain, ignores softirq context, the notifier can
    be made blocking and simply not invoked for the softirq path.

    The blocking option is needed by spectrum, for example, to validate
    resources for adding an address to an interface.

    Signed-off-by: David Ahern
    Reviewed-by: Ido Schimmel
    Signed-off-by: David S. Miller

    David Ahern
     
  • ipv6_add_addr is called in process context with rtnl lock held
    (e.g., manual config of an address) or during softirq processing
    (e.g., autoconf addresses derived from a router advertisement).

    Currently, ipv6_add_addr calls rcu_read_lock_bh shortly after entry
    and does not call unlock until exit, except around the address
    validator notifier. Similarly, addrconf_hash_lock is taken after the
    validator notifier and held until exit. This forces the allocation of
    inet6_ifaddr to always be atomic.

    Refactor ipv6_add_addr as follows:
    1. Add an input boolean to discriminate the call path (process context
    or softirq). This new flag controls whether the allocation is done
    with GFP_KERNEL or GFP_ATOMIC.

    2. Move the rcu_read_lock_bh and unlock calls so that they surround only
    the functions that do RCU updates.

    3. Remove the in6_dev_hold and put added by 3ad7d2468f79f ("Ipvlan should
    return an error when an address is already in use."). These were added
    presumably because rcu_read_unlock_bh needs to be called before calling
    the validator. Since rcu_read_lock is not needed before the validator
    runs, revert the hold and put added by 3ad7d2468f79f and only do the
    hold when setting ifp->idev.

    4. Move the duplicate address check and the insertion of the new address
    into the global address hash into a helper. The helper is called after
    an ifa is allocated and filled in.

    This allows the ifa for manually configured addresses to be allocated
    with GFP_KERNEL, and it reduces the overall time spent with rcu_read_lock
    held and with the hash table spinlock held.

    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern
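    Steps 1 and 4 of the ipv6_add_addr refactor can be illustrated with a small userspace sketch. The entry is allocated and filled in with no lock held. A helper then takes the hash lock only for the duplicate check and the insertion. All names here (add_addr(), addr_hash_insert(), enum alloc_mode) are hypothetical. A C11 atomic_flag spinlock stands in for addrconf_hash_lock, and calloc() stands in for the kernel allocator. As a result, the mode field merely records whether GFP_KERNEL or GFP_ATOMIC would have been chosen.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define HASH_SIZE 16

/* Hypothetical: records which kernel GFP flag would have been used. */
enum alloc_mode { ALLOC_MAY_SLEEP /* ~GFP_KERNEL */, ALLOC_ATOMIC /* ~GFP_ATOMIC */ };

struct addr {
    char name[64];         /* textual address, used as the hash key */
    enum alloc_mode mode;  /* allocation mode chosen by the caller */
    struct addr *next;     /* hash bucket chain */
};

static struct addr *hash_table[HASH_SIZE];
static atomic_flag hash_lock = ATOMIC_FLAG_INIT; /* stands in for addrconf_hash_lock */

static void hash_lock_acquire(void)
{
    while (atomic_flag_test_and_set_explicit(&hash_lock, memory_order_acquire))
        ; /* spin until the flag is released */
}

static void hash_lock_release(void)
{
    atomic_flag_clear_explicit(&hash_lock, memory_order_release);
}

static unsigned int hash_key(const char *s)
{
    unsigned int h = 5381;

    while (*s)
        h = h * 33 + (unsigned char)*s++;
    return h % HASH_SIZE;
}

/* Step 4: only the duplicate check and the insertion run under the lock. */
static int addr_hash_insert(struct addr *a)
{
    int err = 0;

    assert(a != NULL);
    unsigned int b = hash_key(a->name);

    hash_lock_acquire();
    for (struct addr *p = hash_table[b]; p; p = p->next) {
        if (strcmp(p->name, a->name) == 0) {
            err = -EEXIST; /* address already present */
            break;
        }
    }
    if (!err) {
        a->next = hash_table[b];
        hash_table[b] = a;
    }
    hash_lock_release();
    return err;
}

/* Step 1: the caller states whether it may block; the entry is allocated
 * and filled in before any lock is taken. */
static int add_addr(const char *name, bool can_block, struct addr **out)
{
    struct addr *a = calloc(1, sizeof(*a));
    int err;

    if (!a)
        return -ENOMEM;
    a->mode = can_block ? ALLOC_MAY_SLEEP : ALLOC_ATOMIC;
    strncpy(a->name, name, sizeof(a->name) - 1);

    err = addr_hash_insert(a);
    if (err) {
        free(a); /* duplicate: discard the unpublished entry */
        return err;
    }
    if (out)
        *out = a;
    return 0;
}
```

    Because nothing is published until addr_hash_insert() runs, a duplicate can simply be freed without any extra locking. Only the short lookup-and-link step executes with the lock held.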
     

12 Oct, 2017

2 commits


10 Oct, 2017

1 commit


09 Oct, 2017

3 commits