18 Mar, 2015

1 commit

  • As a follow-on to patch 70006af95515 ("bpf: allow eBPF access skb fields"),
    this patch makes the 'protocol', 'vlan_present' and 'vlan_tci' fields
    accessible from extended BPF programs.

    The usage of 'protocol', 'vlan_present' and 'vlan_tci' fields is the same as
    corresponding SKF_AD_PROTOCOL, SKF_AD_VLAN_TAG_PRESENT and SKF_AD_VLAN_TAG
    accesses in classic BPF.
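
As a rough illustration of how a program might consume the VLAN fields, the sketch below uses a hypothetical userspace stand-in (`struct skb_view`, which is not the kernel's `struct __sk_buff`) and a hypothetical helper `vlan_id_matches()`. It assumes the 802.1Q layout in which the VLAN ID occupies the low 12 bits of the TCI.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the fields named above; not the kernel layout. */
struct skb_view {
	uint32_t protocol;
	uint32_t vlan_present;
	uint32_t vlan_tci;
};

#define VLAN_VID_MASK_SKETCH 0x0fffu /* low 12 bits of the TCI (802.1Q) */

/* Return 1 if the packet carries a VLAN tag whose ID equals want_vid. */
static int vlan_id_matches(const struct skb_view *skb, uint32_t want_vid)
{
	assert(skb != NULL);
	if (!skb->vlan_present)
		return 0; /* untagged packet: nothing to compare */
	return (skb->vlan_tci & VLAN_VID_MASK_SKETCH) == want_vid;
}
```

A filter built this way would check `vlan_present` before trusting `vlan_tci`, since the TCI of an untagged packet carries no meaning.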

    Signed-off-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Alexei Starovoitov
     

17 Mar, 2015

2 commits

  • reqsk_put() is the generic function that should be used
    to release a refcount (and automatically call reqsk_free())

    reqsk_free() might be called if refcount is known to be 0
    or undefined.

    refcnt is set to one in inet_csk_reqsk_queue_add().

    As request socks are not yet in the global ehash table,
    I added temporary debugging checks in reqsk_put() and reqsk_free().
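
The put/free split described above can be sketched in userspace with C11 atomics. Everything here (`struct req_sketch`, `req_sketch_put()`, `req_sketch_free()`) is a hypothetical analogue, not the kernel code; the "refcount undefined" case mentioned above is not modelled.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical userspace analogue of a request sock; not the kernel struct. */
struct req_sketch {
	atomic_int refcnt;
	int freed; /* set instead of releasing memory, for illustration */
};

/* Analogue of reqsk_free(): only legal once no reference remains. */
static void req_sketch_free(struct req_sketch *req)
{
	assert(atomic_load(&req->refcnt) == 0); /* stand-in for a debugging check */
	req->freed = 1;
}

/* Analogue of reqsk_put(): drop one reference, free on the last one. */
static void req_sketch_put(struct req_sketch *req)
{
	assert(req != NULL);
	/* atomic_fetch_sub() returns the value held before the decrement. */
	if (atomic_fetch_sub(&req->refcnt, 1) == 1)
		req_sketch_free(req);
}
```

Callers that hold a reference would only ever call the put function; the free function is reserved for paths that know no reference exists.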

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • We have many places where we want to check if a socket is
    not a timewait or request socket. Use a helper to avoid
    hard coding this.
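
One way such a helper could look is a single bitmask test on the socket state. The state numbers and the name `is_full_sock_sketch()` below are assumptions made for this sketch, not the kernel's definitions.

```c
#include <assert.h>

/* Hypothetical state numbers for this sketch; not the kernel's values. */
enum sk_state_sketch {
	ST_ESTABLISHED = 1,
	ST_TIME_WAIT   = 2,
	ST_REQUEST     = 3,
	ST_LISTEN      = 4,
	ST_MAX
};

/* Return 1 for a full socket, 0 for a timewait or request socket. */
static int is_full_sock_sketch(int state)
{
	const unsigned int mini = (1u << ST_TIME_WAIT) | (1u << ST_REQUEST);

	assert(state > 0 && state < ST_MAX);
	return ((1u << state) & mini) == 0;
}
```

Centralising the test means a new "mini socket" state only has to be added to one mask rather than to every call site.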

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

16 Mar, 2015

5 commits

  • Signed-off-by: Scott Feldman
    Signed-off-by: David S. Miller

    Scott Feldman
     
  • As discussed at netconf, introduce swdev_ops as a first step toward moving
    switchdev ops from ndo to swdev. This will keep switchdev from cluttering
    up the ndo ops space.

    Signed-off-by: Scott Feldman
    Signed-off-by: David S. Miller

    Scott Feldman
     
  • introduce user accessible mirror of in-kernel 'struct sk_buff':
    struct __sk_buff {
        __u32 len;
        __u32 pkt_type;
        __u32 mark;
        __u32 queue_mapping;
    };

    bpf programs can do:

    int bpf_prog(struct __sk_buff *skb)
    {
        __u32 var = skb->pkt_type;

    which will be compiled to bpf assembler as:

    dst_reg = *(u32 *)(src_reg + 4) // 4 == offsetof(struct __sk_buff, pkt_type)

    bpf verifier will check validity of access and will convert it to:

    dst_reg = *(u8 *)(src_reg + offsetof(struct sk_buff, __pkt_type_offset))
    dst_reg &= 7

    since skb->pkt_type is a bitfield.
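
The rewritten access amounts to a byte load followed by a mask. The userspace sketch below emulates just that step; `load_pkt_type_sketch()` is a hypothetical name, and the sketch assumes the field sits in the low three bits of the loaded byte, as the `&= 7` above implies.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PKT_TYPE_MASK_SKETCH 7u /* three low bits, as in "dst_reg &= 7" */

/* Emulate the rewritten access: load one byte, then mask out the field. */
static uint32_t load_pkt_type_sketch(const uint8_t *flags_byte)
{
	uint32_t dst;

	assert(flags_byte != NULL);
	dst = *flags_byte;           /* dst_reg = *(u8 *)(src_reg + off) */
	dst &= PKT_TYPE_MASK_SKETCH; /* dst_reg &= 7 */
	return dst;
}
```

The other bits that share the byte are discarded by the mask, so the program only ever sees the packet type.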

    Signed-off-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Alexei Starovoitov
     
  • This patch adds the possibility to obtain raw_smp_processor_id() in
    eBPF. Currently, this is only possible in classic BPF where commit
    da2033c28226 ("filter: add SKF_AD_RXHASH and SKF_AD_CPU") has added
    facilities for this.

    Perhaps most importantly, this would also allow us to track per CPU
    statistics with eBPF maps, or to implement a poor-man's per CPU data
    structure through eBPF maps.

    An example function prototype looks like:

    u32 (*smp_processor_id)(void) = (void *)BPF_FUNC_get_smp_processor_id;
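
The per-CPU statistics idea can be sketched in plain C as an array with one slot per CPU id. The names and the `NR_CPUS_SKETCH` bound are hypothetical; in an eBPF program the CPU id would come from the helper above and the slots would live in a map.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS_SKETCH 8 /* hypothetical upper bound on CPU ids */

/* Stand-in for an array-style map: one value slot per CPU id. */
struct percpu_stats_sketch {
	uint64_t packets[NR_CPUS_SKETCH];
};

/* Count one packet in the slot of the CPU the program runs on. */
static void count_packet(struct percpu_stats_sketch *st, uint32_t cpu)
{
	assert(cpu < NR_CPUS_SKETCH);
	st->packets[cpu]++;
}

/* Sum all per-CPU slots to obtain the global total. */
static uint64_t total_packets(const struct percpu_stats_sketch *st)
{
	uint64_t sum = 0;

	for (uint32_t i = 0; i < NR_CPUS_SKETCH; i++)
		sum += st->packets[i];
	return sum;
}
```

Because each CPU only touches its own slot, the update path needs no shared counter; the reader pays the cost of summing instead.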

    Signed-off-by: Daniel Borkmann
    Signed-off-by: David S. Miller

    Daniel Borkmann
     
  • This work is similar to commit 4cd3675ebf74 ("filter: added BPF
    random opcode") and adds a possibility for packet sampling in eBPF.

    Currently, this is only possible in classic BPF. It is useful for
    combining sampling with e.g. packet sockets, and possibly also with tc.

    An example function prototype looks like:

    u32 (*prandom_u32)(void) = (void *)BPF_FUNC_get_prandom_u32;
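
A sampling decision built on such a random value might look like the following. `should_sample()` and its `one_in_n` parameter are hypothetical; the random number is passed in so the sketch stays deterministic.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Keep roughly one packet in every one_in_n.
 * rnd is the value a pseudo-random helper would return.
 */
static int should_sample(uint32_t rnd, uint32_t one_in_n)
{
	assert(one_in_n > 0);
	return rnd % one_in_n == 0;
}
```

With `one_in_n` set to 1 every packet is kept, which gives a convenient way to switch sampling off without changing the program.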

    Signed-off-by: Daniel Borkmann
    Signed-off-by: David S. Miller

    Daniel Borkmann
     

15 Mar, 2015

6 commits

  • This patch moves future_tbl to open up the possibility of having
    multiple rehashes on the same table.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     
  • This patch adds a rehash counter to bucket_table to indicate
    the last bucket that has been rehashed. This serves two purposes:

    1. Any bucket that has been rehashed can never gain a new object.
    2. If the rehash counter reaches the size of the table, the table
    will forever remain empty.

    This patch also downsizes bucket_table->size to an unsigned int
    since we do not support sizes greater than 32 bits yet.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     
  • There is in fact no need to wait for an RCU grace period in the
    rehash function, since all insertions are guaranteed to go into
    the new table through spin locks.

    This patch uses call_rcu to free the old/rehashed table at our
    leisure.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     
  • Previously, whenever the walker encountered a resize, it simply
    snapped back to the beginning and started again. However, this only
    works if the rehash started and completed while the walker was
    idle.

    If the walker attempts to restart while the rehash is still ongoing,
    we may miss objects that we shouldn't have.

    This patch fixes this by making the walker walk the old table
    followed by the new table just like all other readers. If a
    rehash is detected we will still signal our caller of the fact
    so they can prepare for duplicates but we will simply continue
    the walk onto the new table after the old one is finished either
    by us or by the rehasher.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     
  • …etooth/bluetooth-next

    Johan Hedberg says:

    ====================
    Here's another set of Bluetooth & ieee802154 patches intended for 4.1:

    - Added support for QCA ROME chipset family in the btusb driver
    - at86rf230 driver fixes & cleanups
    - ieee802154 cleanups
    - Refactoring of Bluetooth mgmt API to allow new users
    - New setting for static Bluetooth address exposed to user space
    - Refactoring of hci_dev flags to remove limit of 32
    - Remove unnecessary fast-connectable setting usage restrictions
    - Fix behavior to be consistent when trying to pair already paired device
    - Service discovery corner-case fixes

    Please let me know if there are any issues pulling. Thanks.
    ====================

    Signed-off-by: David S. Miller <davem@davemloft.net>

    David S. Miller
     
  • This patch fixes the max SIFS size correction when the
    IEEE802154_HW_TX_OMIT_CKSUM flag is set. With this flag the sk_buff
    doesn't contain the CRC, because the transceiver will add the CRC
    while transmitting.

    Also add some defines for the max SIFS frame size value and frame check
    sequence according to the 802.15.4 standard.
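
The correction can be sketched as follows. The constant values (18 octets for the maximum SIFS frame size, 2 octets for the frame check sequence) are assumptions taken from the 802.15.4 standard rather than from this commit, and `needs_lifs_sketch()` is a hypothetical name.

```c
#include <assert.h>

#define MAX_SIFS_FRAME_SIZE_SKETCH 18u /* assumed value, in octets */
#define FCS_LEN_SKETCH              2u /* assumed CRC length, in octets */

static_assert(MAX_SIFS_FRAME_SIZE_SKETCH > FCS_LEN_SKETCH,
	      "limit must stay positive after the CRC is subtracted");

/*
 * Return 1 if a frame of buf_len octets exceeds the SIFS limit and so
 * needs the long interframe spacing. When the hardware appends the CRC
 * itself, the buffer is FCS_LEN_SKETCH octets shorter than the frame on
 * the air, so the limit is lowered by the same amount.
 */
static int needs_lifs_sketch(unsigned int buf_len, int hw_omits_cksum)
{
	unsigned int max_sifs = MAX_SIFS_FRAME_SIZE_SKETCH;

	if (hw_omits_cksum)
		max_sifs -= FCS_LEN_SKETCH;
	return buf_len > max_sifs;
}
```

A 17-octet buffer illustrates the difference: it fits the short spacing when the CRC is already included, but not when the transceiver will append two more octets.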

    Signed-off-by: Alexander Aring
    Acked-by: Marc Kleine-Budde
    Signed-off-by: Marcel Holtmann

    Alexander Aring
     

14 Mar, 2015

2 commits


13 Mar, 2015

19 commits

  • Instead of manually coding test_and_set_bit on hdev->dev_flags all the
    time, use hci_dev_test_and_set_flag helper macro.

    Signed-off-by: Marcel Holtmann
    Signed-off-by: Johan Hedberg

    Marcel Holtmann
     
  • Instead of manually coding test_and_clear_bit on hdev->dev_flags all the
    time, use hci_dev_test_and_clear_flag helper macro.

    Signed-off-by: Marcel Holtmann
    Signed-off-by: Johan Hedberg

    Marcel Holtmann
     
  • Instead of manually coding test_and_change_bit on hdev->dev_flags all the
    time, use hci_dev_test_and_change_flag helper macro.

    Signed-off-by: Marcel Holtmann
    Signed-off-by: Johan Hedberg

    Marcel Holtmann
     
  • Instead of manually coding change_bit on hdev->dev_flags all the time,
    use hci_dev_change_flag helper macro.

    Signed-off-by: Marcel Holtmann
    Signed-off-by: Johan Hedberg

    Marcel Holtmann
     
  • Instead of manually coding clear_bit on hdev->dev_flags all the time,
    use hci_dev_clear_flag helper macro.

    Signed-off-by: Marcel Holtmann
    Signed-off-by: Johan Hedberg

    Marcel Holtmann
     
  • Instead of manually coding set_bit on hdev->dev_flags all the time,
    use hci_dev_set_flag helper macro.

    Signed-off-by: Marcel Holtmann
    Signed-off-by: Johan Hedberg

    Marcel Holtmann
     
  • Instead of manually coding test_bit on hdev->dev_flags all the time,
    use hci_dev_test_flag helper macro.
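
The helper-macro pattern used throughout this series can be sketched as below. These are hypothetical, non-atomic userspace stand-ins (`dev_sketch_*`), not the Bluetooth macros themselves; the kernel versions wrap the atomic bit operations.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical flag numbers for the sketch. */
enum { FLAG_UP_SKETCH, FLAG_SETUP_SKETCH, FLAG_MAX_SKETCH };

struct dev_sketch {
	unsigned long dev_flags;
};

/* All flags must fit into the single word used by this sketch. */
static_assert(FLAG_MAX_SKETCH <= sizeof(unsigned long) * CHAR_BIT,
	      "flags must fit into dev_flags");

/* Call sites name the device and the flag; the storage stays hidden. */
#define dev_sketch_test_flag(d, nr)  ((((d)->dev_flags) >> (nr)) & 1UL)
#define dev_sketch_set_flag(d, nr)   ((d)->dev_flags |= 1UL << (nr))
#define dev_sketch_clear_flag(d, nr) ((d)->dev_flags &= ~(1UL << (nr)))
```

Once every call site goes through the macros, the representation of the flags can change in one place without touching the users.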

    Signed-off-by: Marcel Holtmann
    Signed-off-by: Johan Hedberg

    Marcel Holtmann
     
  • The patch adds a second advertising setting that allows switching of the
    controller into connectable mode independent of the global connectable
    setting.

    Signed-off-by: Marcel Holtmann
    Signed-off-by: Johan Hedberg

    Marcel Holtmann
     
  • Now that all of the operations are safe on a single hash table
    across network namespaces, allocate a single global hash table
    and update the code to use it.

    Signed-off-by: "Eric W. Biederman"
    Signed-off-by: David S. Miller

    Eric W. Biederman
     
  • Commit c0c09bfdc415 ("rhashtable: avoid unnecessary wakeup for worker
    queue") changed ht->shift to be atomic, which is actually unnecessary.

    Instead of leaving the current shift in the core rhashtable structure,
    it can be cached inside the individual bucket tables.

    There, it will only be initialized once during a new table allocation
    in the shrink/expansion slow path, and from then onward it stays immutable
    for the rest of the bucket table lifetime.

    That allows shift to be non-atomic. The patch also moves hash_rnd
    management into the table setup. The rhashtable structure now consumes
    3 instead of 4 cachelines.

    Signed-off-by: Daniel Borkmann
    Cc: Ying Xue
    Acked-by: Thomas Graf
    Signed-off-by: David S. Miller

    Daniel Borkmann
     
  • Before inserting request socks into general hash table,
    fill their socket family.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • sock_edemux() & sock_gen_put() should be ready to cope with request socks.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • Once request socks are in ehash, they'll need to be refcounted.

    This patch adds rsk_refcnt/ireq_refcnt macros, and adds the
    reqsk_put() function, but nothing uses them yet.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • We need to identify request socks once they are visible in the
    global ehash table.

    ireq_state is an alias to req.__req_common.skc_state.

    Its value is set to TCP_NEW_SYN_RECV.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • TCP_SYN_RECV state is currently used by fast open sockets.

    Initial TCP requests (the pseudo sockets created when a SYN is received)
    are not yet associated to a state. They are attached to their parent,
    and the parent is in TCP_LISTEN state.

    This commit adds TCP_NEW_SYN_RECV state, so that we can convert
    the TCP stack to a different scheme gradually.

    This state is not exported to user space.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • I forgot to update dccp_v6_conn_request() & cookie_v6_check().
    They both need to set ireq->ireq_net and ireq->ir_cookie.

    Let's clear ireq->ir_cookie in inet_reqsk_alloc().

    Signed-off-by: Eric Dumazet
    Fixes: 33cf7c90fe2f ("net: add real socket cookies")
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • I noticed that a helper function with argument type ARG_ANYTHING does
    not need to have an initialized value (register).

    In the worst case this can lead to unintended stack memory leakage in
    future helper functions if they are not carefully designed, or to
    unintended application behaviour if the application developer was not
    careful enough to match the correct helper function signature in the API.

    The underlying issue is that ARG_ANYTHING should actually be split
    into two different semantics:

    1) ARG_DONTCARE for function arguments that the helper function
    does not care about (in other words: the default for unused
    function arguments), and

    2) ARG_ANYTHING that is an argument actually being used by a
    helper function and *guaranteed* to be an initialized register.

    The current risk is low: ARG_ANYTHING is only used for the 'flags'
    argument (r4) in bpf_map_update_elem() that internally does strict
    checking.
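
The split into two argument semantics can be sketched as a small check. The enum and function names below are hypothetical and much simpler than the verifier's real data structures; the point is only that an unused argument accepts any register, while a used one requires an initialized register.

```c
#include <assert.h>

/* Hypothetical, simplified stand-ins for the verifier's types. */
enum arg_type_sketch  { ARG_DONTCARE_SKETCH = 0, ARG_ANYTHING_SKETCH };
enum reg_state_sketch { REG_NOT_INIT = 0, REG_INITIALIZED };

/* Return 0 if the register is acceptable for the argument, -1 otherwise. */
static int check_arg_sketch(enum arg_type_sketch type,
			    enum reg_state_sketch reg)
{
	if (type == ARG_DONTCARE_SKETCH)
		return 0; /* helper ignores this argument entirely */

	assert(type == ARG_ANYTHING_SKETCH);
	/* The helper reads the value, so the register must be initialized. */
	return reg == REG_INITIALIZED ? 0 : -1;
}
```

Under this rule an uninitialized register is rejected before the helper runs, so a helper can no longer observe leftover register or stack contents through an argument it actually uses.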

    Fixes: 17a5267067f3 ("bpf: verifier (add verifier core)")
    Signed-off-by: Daniel Borkmann
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Daniel Borkmann
     
  • Having to say
    > #ifdef CONFIG_NET_NS
    > struct net *net;
    > #endif

    in structures is a little bit wordy and a little bit error prone.

    Instead it is possible to say:
    > typedef struct {
    > #ifdef CONFIG_NET_NS
    > struct net *net;
    > #endif
    > } possible_net_t;

    And then in a header say:

    > possible_net_t net;

    Which is cleaner and easier to use and easier to test, as the
    possible_net_t is always there no matter what the compile options.

    Further this allows read_pnet and write_pnet to be functions in all
    cases which is better at catching typos.

    This change adds possible_net_t, updates the definitions of read_pnet
    and write_pnet, updates the optional struct net * variables that
    write_pnet is used on to have the type possible_net_t, and finally
    fixes up the b0rked users of read_pnet and write_pnet.
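
A simplified userspace sketch of the idea follows. `struct net` and `init_net_sketch` are stand-ins, the accessor bodies are an assumption about how such functions could be written, and `CONFIG_NET_NS` is defined inside the sketch so that it builds as standard C (an empty struct relies on a compiler extension).

```c
#include <assert.h>
#include <stddef.h>

#define CONFIG_NET_NS 1 /* defined here only so the sketch is self-contained */

struct net { int id; };              /* stand-in for the kernel struct */
struct net init_net_sketch = { 0 };  /* fallback when namespaces are off */

typedef struct {
#ifdef CONFIG_NET_NS
	struct net *net;
#endif
} possible_net_t;

/* Always a real function, so a mistyped argument fails to compile. */
static void write_pnet(possible_net_t *pnet, struct net *net)
{
	assert(pnet != NULL);
#ifdef CONFIG_NET_NS
	pnet->net = net;
#else
	(void)net;
#endif
}

static struct net *read_pnet(const possible_net_t *pnet)
{
	assert(pnet != NULL);
#ifdef CONFIG_NET_NS
	return pnet->net;
#else
	return &init_net_sketch;
#endif
}
```

The `#ifdef` now lives in three places instead of in every structure that carries a namespace pointer, and callers compile the same way under either configuration.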

    Signed-off-by: "Eric W. Biederman"
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric W. Biederman
     
  • hold_net and release_net were an idea that turned out to be useless.
    The code has been disabled since 2008. Kill the code; it is long past due.

    Signed-off-by: "Eric W. Biederman"
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric W. Biederman
     

12 Mar, 2015

5 commits

  • This makes it possible to retain the route preference when RAs are handled in
    userspace.

    Signed-off-by: Lubomir Rintel
    Reviewed-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Lubomir Rintel
     
  • Flags are used in the return path rather than the return patch.

    Fixes: af33c1adae1e ("vxlan: Eliminate dependency on UDP socket in transmit path")
    Signed-off-by: Simon Horman
    Signed-off-by: David S. Miller

    Simon Horman
     
  • A long standing problem in netlink socket dumps is the use
    of kernel socket addresses as cookies.

    1) It is a security concern.

    2) Sockets can be reused quite quickly, so there is
    no guarantee that a cookie is used once and identifies
    a flow.

    3) The request sock, established sock, and timewait sock
    for a given flow have different cookies.

    Part of our effort to bring better TCP statistics requires
    switching to a different allocator.

    In this patch, I chose to use a per network namespace 64bit generator,
    and to use it only in the case a socket needs to be dumped to netlink.
    (This might be refined later if needed)

    Note that I tried to carry cookies from the request sock to the
    established sock, then to the timewait socket.
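
Lazy allocation from a per-namespace 64-bit generator can be sketched with C11 atomics. The structures and `sock_cookie_sketch()` are hypothetical userspace analogues, and the sketch assumes that zero means "no cookie assigned yet".

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical analogues; not the kernel structures. */
struct netns_sketch {
	atomic_uint_fast64_t cookie_gen; /* per-namespace generator */
};

struct sock_sketch {
	struct netns_sketch *ns;
	atomic_uint_fast64_t cookie;     /* 0 means "not assigned yet" */
};

/* Return the socket's cookie, allocating one on first use. */
static uint64_t sock_cookie_sketch(struct sock_sketch *sk)
{
	assert(sk != NULL && sk->ns != NULL);
	for (;;) {
		uint_fast64_t cur = atomic_load(&sk->cookie);
		if (cur != 0)
			return cur;

		/* Draw a fresh value; fetch_add returns the old one. */
		uint_fast64_t fresh = atomic_fetch_add(&sk->ns->cookie_gen, 1) + 1;
		uint_fast64_t expected = 0;

		/* Install it only if nobody else did; then re-read. */
		atomic_compare_exchange_strong(&sk->cookie, &expected, fresh);
	}
}
```

If two threads race, one generator value is wasted, but both end up returning the same cookie for the socket, and no kernel address is ever exposed.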

    Signed-off-by: Eric Dumazet
    Cc: Eric Salo
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • Export of_mdio_parse_addr(), which parses the MDIO address of a given
    Ethernet PHY node, verifies that it is within the allowed range, and
    returns its value. This is going to be useful for the DSA code, which
    needs to deal with multiple layers of MDIO buses.
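
The range check at the heart of such a function can be sketched as below. `check_mdio_addr_sketch()` is a hypothetical name, and the limit of 32 is an assumption based on MDIO addresses being 5 bits wide; reading the address from the device tree is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define MDIO_MAX_ADDR_SKETCH 32 /* assumed: 5-bit addresses, 0..31 */

static_assert(MDIO_MAX_ADDR_SKETCH == (1 << 5),
	      "limit should match a 5-bit address field");

/* Return the address if it is in range, or -EINVAL otherwise. */
static int check_mdio_addr_sketch(uint32_t addr)
{
	if (addr >= MDIO_MAX_ADDR_SKETCH)
		return -EINVAL;
	return (int)addr;
}
```

Returning either a valid address or a negative error code lets callers use a single `if (ret < 0)` test regardless of which MDIO bus layer they are probing.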

    Signed-off-by: Florian Fainelli
    Acked-by: Rob Herring
    Signed-off-by: David S. Miller

    Florian Fainelli
     
  • Currently hash_rnd is a parameter that users can set. However,
    no existing users set this parameter. It is also something that
    people are unlikely to want to set directly since it's just a
    random number.

    In preparation for allowing the reseeding/rehashing of rhashtable,
    this patch moves hash_rnd into bucket_table so that it's now an
    internal state rather than a parameter.

    Signed-off-by: Herbert Xu
    Acked-by: Thomas Graf
    Signed-off-by: David S. Miller

    Herbert Xu