28 May, 2015

1 commit


21 May, 2015

1 commit

  • As we're now always including the high bits of the sequence number
    in the IV generation process we need to ensure that they don't
    contain crap.

    This patch ensures that the high sequence bits are always zeroed
    so that we don't leak random data into the IV.

    Signed-off-by: Herbert Xu
    Signed-off-by: Steffen Klassert

    Herbert Xu
     

29 Apr, 2015

1 commit

  • The returned xfrm_state should be held before unlocking xfrm_state_lock;
    otherwise the returned xfrm_state may be released.

    Fixes: c454997e6[{pktgen, xfrm} Introduce xfrm_state_lookup_byspi..]
    Cc: Fan Du
    Signed-off-by: Li RongQing
    Acked-by: Fan Du
    Signed-off-by: Steffen Klassert

    Li RongQing
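
The ordering described in this commit (take the reference while the lock is still held) can be sketched in userspace C. The `state` type, `state_hold()`, `lookup_and_hold()` and the pthread mutex are illustrative stand-ins chosen for this sketch, not the kernel's xfrm API.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a refcounted object such as xfrm_state. */
struct state {
    unsigned int spi;
    int refcnt;              /* protected by table_lock in this sketch */
};

static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;
static struct state table[] = { { 0x100, 1 }, { 0x200, 1 } };

static void state_hold(struct state *s)
{
    assert(s->refcnt > 0);   /* a live entry always has a reference */
    s->refcnt++;
}

/* Look up by SPI and take a reference BEFORE dropping the lock, so a
 * concurrent release cannot free the entry between unlock and return. */
static struct state *lookup_and_hold(unsigned int spi)
{
    struct state *found = NULL;

    pthread_mutex_lock(&table_lock);
    for (size_t i = 0; i < sizeof(table) / sizeof(table[0]); i++) {
        if (table[i].spi == spi) {
            found = &table[i];
            state_hold(found);   /* still under the lock */
            break;
        }
    }
    pthread_mutex_unlock(&table_lock);
    return found;
}
```

A caller that receives a non-NULL pointer owns one reference and is expected to drop it when done.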
     

15 Apr, 2015

1 commit


10 Apr, 2015

1 commit


08 Apr, 2015

1 commit

  • On the output paths in particular, we have to sometimes deal with two
    socket contexts. First, and usually skb->sk, is the local socket that
    generated the frame.

    The second is potentially the socket used to control a tunneling
    socket, such as one that encapsulates using UDP.

    We do not want to disassociate skb->sk when encapsulating in order
    to fix this, because that would break socket memory accounting.

    The most extreme case where this can cause huge problems is an
    AF_PACKET socket transmitting over a vxlan device. We hit code
    paths doing checks that assume they are dealing with an ipv4
    socket, but are actually operating upon the AF_PACKET one.

    Signed-off-by: David S. Miller

    David Miller
     

07 Apr, 2015

1 commit

  • https://bugzilla.kernel.org/show_bug.cgi?id=95211

    Commit 70be6c91c86596ad2b60c73587880b47df170a41
    ("xfrm: Add xfrm_tunnel_skb_cb to the skb common buffer") added a check
    that dereferences ->outer_mode too early, but larval SAs don't have
    this pointer set (yet). So check for tunnel stuff later.

    Mike Noordermeer reported this bug and patiently applied all the debugging.

    Technically this is a remote-oops-in-interrupt-context type of thing.

    BUG: unable to handle kernel NULL pointer dereference at 0000000000000034
    IP: [] xfrm_input+0x3c2/0x5a0
    ...
    [] ? xfrm4_esp_rcv+0x36/0x70
    [] ? ip_local_deliver_finish+0x9a/0x200
    [] ? __netif_receive_skb_core+0x6f3/0x8f0
    ...

    RIP [] xfrm_input+0x3c2/0x5a0
    Kernel panic - not syncing: Fatal exception in interrupt

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Steffen Klassert

    Alexey Dobriyan
     

01 Apr, 2015

1 commit

  • In many places, the a6 field is typecast to struct in6_addr. As the
    fields are in a union anyway, just add an in6_addr member to the union
    and get rid of the typecasting.

    Modifying the uapi header is okay, as the union still has the same size.

    Signed-off-by: Jiri Benc
    Signed-off-by: David S. Miller

    Jiri Benc
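
A small userspace sketch of the idea: adding a `struct in6_addr` member to a union that already holds four 32-bit words leaves the size unchanged and removes the need for a cast. The union name `addr_sketch` and the two comparison helpers are hypothetical and only modelled on the description above; this is not the uapi definition.

```c
#include <assert.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>

/* Illustrative union modelled on the commit text. */
union addr_sketch {
    uint32_t        a4;
    uint32_t        a6[4];
    struct in6_addr in6;     /* added member: the same 16 bytes as a6 */
};

/* Adding the member must not change the size (ABI) of the union. */
static_assert(sizeof(union addr_sketch) == sizeof(uint32_t[4]),
              "union size unchanged");

/* Before: callers cast the array to the address type. */
static int same_addr_cast(const union addr_sketch *x, const struct in6_addr *y)
{
    return memcmp((const struct in6_addr *)x->a6, y, sizeof(*y)) == 0;
}

/* After: callers simply name the member. */
static int same_addr_member(const union addr_sketch *x, const struct in6_addr *y)
{
    return memcmp(&x->in6, y, sizeof(*y)) == 0;
}
```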
     

17 Mar, 2015

1 commit

  • Steffen Klassert says:

    ====================
    pull request (net): ipsec 2015-03-16

    1) Fix the network header offset in _decode_session6
    when multiple IPv6 extension headers are present.
    From Hajime Tazaki.

    2) Fix an interfamily tunnel crash. We set outer mode
    protocol too early and may dispatch to the wrong
    address family. Move the setting of the outer mode
    protocol behind the last accessing of the inner mode
    to fix the crash.

    3) Most callers of xfrm_lookup() expect that dst_orig
    is released on error. But xfrm_lookup_route() may
    need dst_orig to handle certain error cases. So
    introduce a flag that tells what should be done in
    case of error. From Huaibin Wang.

    Please pull or let me know if there are problems.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     

03 Mar, 2015

1 commit

  • Structures like xfrm_usersa_info and xfrm_userpolicy_info have
    different sizes when compiled for 32 bits and 64 bits, because
    their definitions lack a packed attribute.
    This results in broken SA and SP information when a user
    tries to configure them through the netlink interface.

    Inform userland about this situation instead of staying
    silent, so that upper-level test scripts can behave accordingly.

    Signed-off-by: Fan Du
    Signed-off-by: Steffen Klassert

    Fan Du
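
The size mismatch comes from alignment: a 64-bit field is aligned to 8 bytes on x86-64 but to 4 bytes on i386, so padding differs between the two builds. The sketch below uses two hypothetical structs (`msg_native`, `msg_packed`), not the real xfrm structures, to show how an explicit layout removes that dependency. It only illustrates the cause; the commit itself reports the situation to userland rather than changing the layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: a 32-bit field followed by a 64-bit one.
 * Padding after `id` depends on the ABI's alignment of uint64_t. */
struct msg_native {
    uint32_t id;
    uint64_t bytes;
};

/* Same fields with an explicit layout: identical on every ABI. */
struct msg_packed {
    uint32_t id;
    uint64_t bytes;
} __attribute__((packed));

static_assert(sizeof(struct msg_packed) == 12, "packed layout is fixed");

/* Number of padding bytes the native layout adds on this build. */
static size_t native_padding(void)
{
    return sizeof(struct msg_native) - sizeof(struct msg_packed);
}
```

On a build where `native_padding()` is non-zero, a peer compiled for an ABI with different alignment would disagree about where `bytes` starts.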
     

12 Feb, 2015

1 commit

  • dst_orig should be released on error. Functions like __xfrm_route_forward()
    expect that behavior.
    Since a recent commit, xfrm_lookup() may also be called by xfrm_lookup_route(),
    which expects the opposite.
    Let's introduce a new flag (XFRM_LOOKUP_KEEP_DST_REF) to tell what should be
    done in case of error.

    Fixes: f92ee61982d ("xfrm: Generate blackhole routes only from route lookup functions")
    Signed-off-by: huaibin Wang
    Signed-off-by: Nicolas Dichtel
    Signed-off-by: Steffen Klassert

    huaibin Wang
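
A minimal model of a flag that decides who keeps the original reference on error. `LOOKUP_KEEP_DST_REF`, `struct dst`, `dst_release()` and `lookup()` are simplified stand-ins written for this sketch; the flag value and the error code are assumptions, not the kernel's.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define LOOKUP_KEEP_DST_REF 0x1   /* illustrative value only */

struct dst { int refcnt; };

static void dst_release(struct dst *d)
{
    assert(d->refcnt > 0);
    d->refcnt--;
}

/* Model of a lookup that fails when `fail` is set. On error the
 * original reference is dropped unless the caller asked to keep it. */
static int lookup(struct dst *dst_orig, int flags, int fail, struct dst **out)
{
    if (fail) {
        if (!(flags & LOOKUP_KEEP_DST_REF))
            dst_release(dst_orig);
        *out = NULL;
        return -ENOENT;
    }
    *out = dst_orig;
    return 0;
}
```

Callers that expect the release on error pass no flag; a caller that still needs `dst_orig` for its own error handling passes the flag and releases the reference itself.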
     

18 Jan, 2015

1 commit

  • Contrary to common expectations for an "int" return, these functions
    return only a positive value -- if used correctly they cannot even
    return 0 because the message header will necessarily be in the skb.

    This makes the very common pattern of

    if (genlmsg_end(...) < 0) { ... }

    be a whole bunch of dead code. Many places also simply do

    return nlmsg_end(...);

    and the caller is expected to deal with it.

    This also commonly (at least for me) causes errors, because it is very
    common to write

    if (my_function(...))
    /* error condition */

    and if my_function() does "return nlmsg_end()" this is of course wrong.

    Additionally, there's not a single place in the kernel that actually
    needs the message length returned, and if anyone needs it later then
    it'll be very easy to just use skb->len there.

    Remove this, and make the functions void. This removes a bunch of dead
    code as described above. The patch adds lines because I did

    - return nlmsg_end(...);
    + nlmsg_end(...);
    + return 0;

    I could have preserved all the function's return values by returning
    skb->len, but instead I've audited all the places calling the affected
    functions and found that none cared. A few places actually compared
    the return value with < 0 with no change in behaviour, so I opted for the more
    efficient version.

    One instance of the error I've made numerous times now is also present
    in net/phonet/pn_netlink.c in the route_dumpit() function - it didn't
    check for
    Signed-off-by: David S. Miller

    Johannes Berg
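
The pitfall described in this commit can be reproduced with a toy message type. `struct msg`, `msg_end_old()`, `msg_end_new()` and the two fill functions are hypothetical names for this sketch; only the return-value convention mirrors the text.

```c
#include <assert.h>

struct msg { int len; };

/* Old style: finalise the message and return its total length, which
 * is always positive because the header is already in the buffer. */
static int msg_end_old(struct msg *m)
{
    assert(m->len > 0);
    return m->len;
}

/* New style: nothing useful to return, so return nothing. */
static void msg_end_new(struct msg *m)
{
    (void)m;
}

/* Old pattern: a successful fill returns a positive number. */
static int fill_old(struct msg *m)
{
    m->len = 20;
    return msg_end_old(m);
}

/* New pattern: a successful fill returns 0. */
static int fill_new(struct msg *m)
{
    m->len = 20;
    msg_end_new(m);
    return 0;
}
```

With the old pattern, `if (fill_old(&m)) { /* error */ }` takes the error branch on success; with the new one the usual "non-zero means error" reading is correct.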
     

13 Jan, 2015

1 commit


09 Dec, 2014

1 commit

  • Steffen Klassert says:

    ====================
    pull request (net-next): ipsec-next 2014-12-03

    1) Fix a set but not used warning. From Fabian Frederick.

    2) Currently we make sequence number values available to userspace
    only if we use ESN. Make the sequence number values also available
    for non ESN states. From Zhi Ding.

    3) Remove socket policy hashing. We don't need it because socket
    policies are always looked up via a linked list. From Herbert Xu.

    4) After removing socket policy hashing, we can use __xfrm_policy_link
    in xfrm_policy_insert. From Herbert Xu.

    5) Add a lookup method for vti6 tunnels with wildcard endpoints.
    I forgot this when I initially implemented vti6.

    Please pull or let me know if there are problems.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     

13 Nov, 2014

2 commits

  • For a long time we couldn't actually use __xfrm_policy_link in
    xfrm_policy_insert because the latter wanted to do hashing at
    a specific position.

    Now that __xfrm_policy_link no longer does hashing it can now
    be safely used in xfrm_policy_insert to kill some duplicate code,
    finally reuniting general policies with socket policies.

    Signed-off-by: Herbert Xu
    Signed-off-by: Steffen Klassert

    Herbert Xu
     
  • Back in 2003 when I added policy expiration, I half-heartedly
    did a clean-up and renamed xfrm_sk_policy_link/xfrm_sk_policy_unlink
    to __xfrm_policy_link/__xfrm_policy_unlink, because the latter
    could be reused for all policies. I never actually got around
    to using __xfrm_policy_link for non-socket policies.

    Later on hashing was added to all xfrm policies, including socket
    policies. In fact, we don't need hashing on socket policies at
    all since they're always looked up via a linked list.

    This patch restores xfrm_sk_policy_link/xfrm_sk_policy_unlink
    as wrappers around __xfrm_policy_link/__xfrm_policy_unlink so
    that it's obvious we're dealing with socket policies.

    This patch also removes hashing from __xfrm_policy_link as for
    now it's only used by socket policies which do not need to be
    hashed. Ironically this will in fact allow us to use this helper
    for non-socket policies which I shall do later.

    Signed-off-by: Herbert Xu
    Signed-off-by: Steffen Klassert

    Herbert Xu
     

03 Nov, 2014

1 commit

  • After this commit, the attribute XFRMA_REPLAY_VAL is added when no ESN replay
    value is defined. Thus sequence number values are always notified to userspace.

    Signed-off-by: dingzhi
    Signed-off-by: Adrien Mazarguil
    Signed-off-by: Nicolas Dichtel
    Signed-off-by: Steffen Klassert

    dingzhi
     

31 Oct, 2014

1 commit

  • Some drivers are unable to perform TX completions in a bound time.
    They instead call skb_orphan()

    Problem is skb_fclone_busy() has to detect this case, otherwise
    we block TCP retransmits and can freeze unlucky tcp sessions on
    mostly idle hosts.

    Signed-off-by: Eric Dumazet
    Fixes: 1f3279ae0c13 ("tcp: avoid retransmits of TCP packets hanging in host queues")
    Signed-off-by: David S. Miller

    Eric Dumazet
     

27 Oct, 2014

1 commit


21 Oct, 2014

1 commit

  • skb_gso_segment has three possible return values:
    1. a pointer to the first segmented skb
    2. an errno value (IS_ERR())
    3. NULL. This can happen when GSO is used for header verification.

    However, several callers currently test IS_ERR instead of IS_ERR_OR_NULL
    and would oops when NULL is returned.

    Note that these call sites should never actually see such a NULL return
    value; all callers mask out the GSO bits in the feature argument.

    However, there have been issues with some protocol handlers erroneously not
    respecting the specified feature mask in some cases.

    It is preferable to get 'have to turn off hw offloading, else slow' reports
    rather than 'kernel crashes'.

    Signed-off-by: Florian Westphal
    Signed-off-by: David S. Miller

    Florian Westphal
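
The difference between the two checks is easy to see in a userspace re-creation of the error-pointer convention. The macros below imitate the kernel helpers of the same names under the assumption that errno values occupy the top 4095 addresses; `struct seg` and the two `usable_*` helpers are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Encode a negative errno value in a pointer. */
static inline void *ERR_PTR(long err)
{
    assert(err < 0 && err >= -MAX_ERRNO);
    return (void *)(intptr_t)err;
}

/* True for pointers in the last MAX_ERRNO addresses; NULL is not one. */
static inline int IS_ERR(const void *p)
{
    return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}

static inline int IS_ERR_OR_NULL(const void *p)
{
    return !p || IS_ERR(p);
}

struct seg { int len; };

/* Returns 1 if the caller would go on to dereference the result. */
static int usable_weak(const struct seg *s)   { return !IS_ERR(s); }
static int usable_strict(const struct seg *s) { return !IS_ERR_OR_NULL(s); }
```

A NULL result passes the weak check and would be dereferenced; the strict check rejects it.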
     

02 Oct, 2014

1 commit

  • Let's use a proper structure to clearly document and implement
    skb fast clones.

    Then we can more easily experiment with alternative layouts.

    This patch adds a new skb_fclone_busy() helper, used by tcp and xfrm,
    to stop leaking of implementation details.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
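
The sketch below shows the general technique of naming a paired allocation with a structure and hiding its layout behind a helper. `struct buf`, `struct buf_fclones` and `buf_fclone_busy()` are hypothetical simplifications; the real helper's conditions are not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct buf { bool in_use; };

/* Original and its fast clone allocated together as one named
 * structure, instead of relying on "the clone follows the original". */
struct buf_fclones {
    struct buf orig;
    struct buf clone;
};

#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Helper that hides the layout: is the companion clone still in use?
 * Only valid for a `struct buf` embedded as `orig` in a buf_fclones. */
static bool buf_fclone_busy(const struct buf *orig)
{
    const struct buf_fclones *pair =
        container_of(orig, struct buf_fclones, orig);

    return pair->clone.in_use;
}
```

Callers ask the helper and never compute the clone's address themselves, so the layout can change without touching them.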
     

29 Sep, 2014

1 commit

  • Steffen Klassert says:

    ====================
    pull request (net-next): ipsec-next 2014-09-25

    1) Remove useless hash_resize_mutex in xfrm_hash_resize().
    This mutex is used only there, but xfrm_hash_resize()
    can't be called concurrently at all. From Ying Xue.

    2) Extend policy hashing to prefixed policies based on
    prefix length thresholds. From Christophe Gouault.

    3) Make the policy hash table thresholds configurable
    via netlink. From Christophe Gouault.

    4) Remove the maximum authentication length for AH.
    This was needed to limit stack usage. We switched
    already to allocate space, so no need to keep the
    limit. From Herbert Xu.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     

24 Sep, 2014

1 commit


18 Sep, 2014

1 commit

  • While tracking down the MAX_AH_AUTH_LEN crash in an old kernel
    I thought that this limit was rather arbitrary and we should
    just get rid of it.

    In fact it seems that we've already done all the work needed
    to remove it apart from actually removing it. This limit was
    there in order to limit stack usage. Since we've already
    switched over to allocating scratch space using kmalloc, there
    is no longer any need to limit the authentication length.

    This patch kills all references to it, including the BUG_ONs
    that led me here.

    Signed-off-by: Herbert Xu
    Signed-off-by: Steffen Klassert

    Herbert Xu
     

16 Sep, 2014

2 commits

  • Currently we generate a queueing route if we have matching policies
    but cannot resolve the states and the sysctl xfrm_larval_drop is
    disabled. Here we assume that dst_output() is called to kill the
    queued packets. Unfortunately this assumption is not true in all
    cases, so it is possible that these packets leave the system unwanted.

    We fix this by generating queueing routes only from the
    route lookup functions, here we can guarantee a call to
    dst_output() afterwards.

    Fixes: a0073fe18e71 ("xfrm: Add a state resolution packet queue")
    Reported-by: Konstantinos Kolelis
    Signed-off-by: Steffen Klassert

    Steffen Klassert
     
  • Currently we generate a blackhole route whenever we have
    matching policies but cannot resolve the states. Here we assume
    that dst_output() is called to kill the blackholed packets.
    Unfortunately this assumption is not true in all cases, so
    it is possible that these packets leave the system unwanted.

    We fix this by generating blackhole routes only from the
    route lookup functions, here we can guarantee a call to
    dst_output() afterwards.

    Fixes: 2774c131b1d ("xfrm: Handle blackhole route creation via afinfo.")
    Reported-by: Konstantinos Kolelis
    Signed-off-by: Steffen Klassert

    Steffen Klassert
     

10 Sep, 2014

1 commit


02 Sep, 2014

2 commits

  • Allow local and remote prefix length thresholds for the policy hash
    table to be specified via a netlink XFRM_MSG_NEWSPDINFO message.

    Prefix length thresholds are specified by the optional attributes
    XFRMA_SPD_IPV4_HTHRESH and XFRMA_SPD_IPV6_HTHRESH (struct xfrmu_spdhthresh).

    Example:

    /* Fragment: assumes libnl-3 and a struct nl_sock *sk connected to NETLINK_XFRM. */
    struct xfrmu_spdhthresh thresh4 = {
        .lbits = 0,
        .rbits = 24,
    };
    struct xfrmu_spdhthresh thresh6 = {
        .lbits = 0,
        .rbits = 56,
    };
    struct nlmsghdr *hdr;
    struct nl_msg *msg;

    msg = nlmsg_alloc();
    /* The netlink message type is XFRM_MSG_NEWSPDINFO; the thresholds travel as attributes. */
    hdr = nlmsg_put(msg, NL_AUTO_PORT, NL_AUTO_SEQ, XFRM_MSG_NEWSPDINFO, sizeof(__u32), NLM_F_REQUEST);
    nla_put(msg, XFRMA_SPD_IPV4_HTHRESH, sizeof(thresh4), &thresh4);
    nla_put(msg, XFRMA_SPD_IPV6_HTHRESH, sizeof(thresh6), &thresh6);
    nl_send_auto(sk, msg);

    The numbers are the policy selector minimum prefix lengths to put a
    policy in the hash table.

    - lbits is the local threshold (source address for out policies,
    destination address for in and fwd policies).

    - rbits is the remote threshold (destination address for out
    policies, source address for in and fwd policies).

    The default values are:

    XFRMA_SPD_IPV4_HTHRESH: 32 32
    XFRMA_SPD_IPV6_HTHRESH: 128 128

    Dynamic re-building of the SPD is performed when the thresholds values
    are changed.

    The current thresholds can be read via a XFRM_MSG_GETSPDINFO request:
    the kernel replies to XFRM_MSG_GETSPDINFO requests by an
    XFRM_MSG_NEWSPDINFO message, with both attributes
    XFRMA_SPD_IPV4_HTHRESH and XFRMA_SPD_IPV6_HTHRESH.

    Signed-off-by: Christophe Gouault
    Signed-off-by: Steffen Klassert

    Christophe Gouault
     
  • The idea is an extension of the current policy hashing.

    Today only non-prefixed policies are stored in a hash table. This
    patch relaxes the constraints, and hashes policies whose prefix
    lengths are greater than or equal to a configurable threshold.

    Each hash table (one per direction) maintains its own set of IPv4 and
    IPv6 thresholds (dbits4, sbits4, dbits6, sbits6), by default (32, 32,
    128, 128).

    For example, if the output hash table is configured with values (16, 24,
    56, 64):

    ip xfrm policy add dir out src 10.22.0.0/20 dst 10.24.1.0/24 ... => hashed
    ip xfrm policy add dir out src 10.22.0.0/16 dst 10.24.1.1/32 ... => hashed
    ip xfrm policy add dir out src 10.22.0.0/16 dst 10.24.0.0/16 ... => unhashed

    ip xfrm policy add dir out \
    src 3ffe:304:124:2200::/60 dst 3ffe:304:124:2401::/64 ... => hashed
    ip xfrm policy add dir out \
    src 3ffe:304:124:2200::/56 dst 3ffe:304:124:2401::2/128 ... => hashed
    ip xfrm policy add dir out \
    src 3ffe:304:124:2200::/56 dst 3ffe:304:124:2400::/56 ... => unhashed

    The high order bits of the addresses (up to the threshold) are used to
    compute the hash key.

    Signed-off-by: Christophe Gouault
    Signed-off-by: Steffen Klassert

    Christophe Gouault
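
The hashed/unhashed decision in the examples above can be sketched as a pair of comparisons. This sketch assumes the example values are read so that the listed outcomes hold, that is a source threshold of 16 and a destination threshold of 24 for IPv4 (56 and 64 for IPv6). The function names are hypothetical, and the real hash function is not shown.

```c
#include <assert.h>
#include <stdint.h>

/* A policy goes into the hash table only if both selector prefix
 * lengths reach their thresholds; otherwise it stays unhashed. */
static int policy_is_hashed(unsigned src_plen, unsigned dst_plen,
                            unsigned sbits, unsigned dbits)
{
    return src_plen >= sbits && dst_plen >= dbits;
}

/* Keep only the `bits` high-order bits of a host-order IPv4 address;
 * these are the bits that would feed the hash key. */
static uint32_t high_bits(uint32_t addr, unsigned bits)
{
    assert(bits <= 32);
    return bits == 0 ? 0 : addr & (0xffffffffu << (32 - bits));
}
```

For instance, `policy_is_hashed(20, 24, 16, 24)` corresponds to the first IPv4 example (src /20, dst /24), and `high_bits()` with a threshold of 16 keeps only the `10.22` part of `10.22.x.y`.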
     

29 Aug, 2014

1 commit

  • In xfrm_state.c, hash_resize_mutex is defined as a local variable
    and only used in xfrm_hash_resize(), which is declared as the work
    handler of xfrm.state_hash_work. When the xfrm.state_hash_work
    work is put on the global workqueue (system_wq) with schedule_work(),
    it is inserted only if it was not already queued; otherwise it is
    left in the same position on the global workqueue. This means the
    xfrm_hash_resize() work handler is executed only once at any time,
    no matter how many times its work is scheduled. That is,
    xfrm_hash_resize() is never called concurrently, so
    hash_resize_mutex is redundant.

    Cc: Christophe Gouault
    Cc: Steffen Klassert
    Signed-off-by: Ying Xue
    Acked-by: David S. Miller
    Signed-off-by: Steffen Klassert

    Ying Xue
     

07 Aug, 2014

1 commit

  • All other add functions for lists have the new item as first argument
    and the position where it is added as second argument. This was changed
    for no good reason in this function and makes using it unnecessarily
    confusing.

    The name was changed to hlist_add_behind() to cause unconverted code to
    generate a compile error instead of using the wrong parameter order.

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Ken Helias
    Cc: "Paul E. McKenney"
    Acked-by: Jeff Kirsher [intel driver bits]
    Cc: Hugh Dickins
    Cc: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ken Helias
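
The argument convention (new item first, position second) can be shown on a plain singly linked list. `struct node` and `add_behind()` are hypothetical and much simpler than the kernel's hlist; only the parameter order mirrors the commit text.

```c
#include <assert.h>
#include <stddef.h>

struct node {
    int val;
    struct node *next;
};

/* New item first, position second, matching the other list-add helpers. */
static void add_behind(struct node *n, struct node *prev)
{
    assert(n && prev);
    n->next = prev->next;
    prev->next = n;
}
```

Because both parameters have the same type, swapping them compiles silently, which is why renaming the function forces every unconverted caller to be revisited.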
     

30 Jun, 2014

1 commit

  • The SPI check introduced in ea9884b3acf3311c8a11db67bfab21773f6f82ba
    was intended for IPComp SAs but actually prevented AH SAs from getting
    installed (depending on the SPI).

    Fixes: ea9884b3acf3 ("xfrm: check user specified spi for IPComp")
    Cc: Fan Du
    Signed-off-by: Tobias Brunner
    Signed-off-by: Steffen Klassert

    Tobias Brunner
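
One way to scope such a check to IPComp only is sketched below. It assumes the check is a range limit motivated by IPComp's 16-bit identifier (the CPI); `check_user_spi()` is a hypothetical function for this sketch and not the kernel's code.

```c
#include <errno.h>
#include <netinet/in.h>
#include <stdint.h>

/* Validate a user-supplied SPI for the given IPsec protocol. */
static int check_user_spi(int proto, uint32_t spi)
{
    switch (proto) {
    case IPPROTO_AH:
    case IPPROTO_ESP:
        /* AH and ESP SPIs are full 32-bit values: no range limit,
         * and control must not reach the IPComp check below. */
        break;
    case IPPROTO_COMP:
        /* The IPComp identifier is 16 bits wide. */
        if (spi >= 0x10000)
            return -EINVAL;
        break;
    default:
        return -EINVAL;
    }
    return 0;
}
```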
     

26 Jun, 2014

1 commit

  • xfrm_lookup must return a dst_entry with a refcount for the caller.
    Git commit 1a1ccc96abb ("xfrm: Remove caching of xfrm_policy_sk_bundles")
    removed this refcount for the socket policy case accidentally.
    This patch restores it and sets the DST_NOCACHE flag to make sure
    that the dst_entry is freed when the refcount drops to zero.

    Fixes: 1a1ccc96abb ("xfrm: Remove caching of xfrm_policy_sk_bundles")
    Signed-off-by: Steffen Klassert

    Steffen Klassert
     

04 Jun, 2014

2 commits

  • Conflicts:
    include/net/inetpeer.h
    net/ipv6/output_core.c

    Changes in net were fixing bugs in code removed in net-next.

    Signed-off-by: David S. Miller

    David S. Miller
     
  • The xfrm_user module registers its pernet init/exit after xfrm
    itself so that its net exit function xfrm_user_net_exit() is
    executed before xfrm_net_exit() which calls xfrm_state_fini() to
    cleanup the SA's (xfrm states). This opens a window between
    zeroing net->xfrm.nlsk pointer and deleting all xfrm_state
    instances which may access it (via the timer). If an xfrm state
    expires in this window, xfrm_exp_state_notify() will pass null
    pointer as socket to nlmsg_multicast().

    As the notifications are called inside rcu_read_lock() block, it
    is sufficient to retrieve the nlsk socket with rcu_dereference()
    and check it for null.

    Signed-off-by: Michal Kubecek
    Signed-off-by: David S. Miller

    Michal Kubecek
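
The "load once, then check for null" pattern can be modelled in userspace with C11 atomics. An acquire load stands in for rcu_dereference() here, and `struct sock_sketch`, `nlsk` and `notify()` are hypothetical names; real RCU read-side protection is not modelled.

```c
#include <stdatomic.h>
#include <stddef.h>

struct sock_sketch { int delivered; };

/* Published pointer; the exit path stores NULL before states are torn down. */
static _Atomic(struct sock_sketch *) nlsk;

/* Notification path: read the pointer exactly once and bail out if it
 * is already gone, instead of passing NULL on to the sender. */
static int notify(void)
{
    struct sock_sketch *sk =
        atomic_load_explicit(&nlsk, memory_order_acquire);

    if (!sk)
        return -1;           /* socket already removed: skip delivery */
    sk->delivered++;
    return 0;
}
```

Reading the pointer into a local first matters: checking the shared pointer and then reading it again would reopen the same window.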
     

23 May, 2014

1 commit

  • Steffen Klassert says:

    ====================
    pull request (net-next): ipsec-next 2014-05-22

    This is the last ipsec pull request before I leave for
    a three weeks vacation tomorrow. David, can you please
    take urgent ipsec patches directly into net/net-next
    during this time?

    I'll continue to run the ipsec/ipsec-next trees as soon
    as I'm back.

    1) Simplify the xfrm audit handling, from Tetsuo Handa.

    2) Coding style cleanup for xfrm_output, from Fabian Frederick.

    Please pull or let me know if there are problems.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     

13 May, 2014

2 commits

  • Fix checkpatch warning:
    "WARNING: EXPORT_SYMBOL(foo); should immediately follow its function/variable"

    Cc: Steffen Klassert
    Cc: Herbert Xu
    Cc: Andrew Morton
    Signed-off-by: Fabian Frederick
    Signed-off-by: Steffen Klassert

    Fabian Frederick
     
  • Conflicts:
    drivers/net/ethernet/altera/altera_sgdma.c
    net/netlink/af_netlink.c
    net/sched/cls_api.c
    net/sched/sch_api.c

    The netlink conflict dealt with moving to netlink_capable() and
    netlink_ns_capable() in the 'net' tree vs. supporting 'tc' operations
    in non-init namespaces. These were simple transformations from
    netlink_capable to netlink_ns_capable.

    The Altera driver conflict was simply code removal overlapping some
    void pointer cast cleanups in net-next.

    Signed-off-by: David S. Miller

    David S. Miller
     

08 May, 2014

1 commit

  • commit 8f0ea0fe3a036a47767f9c80e (snmp: reduce percpu needs by 50%)
    reduced snmp array size to 1, so technically it doesn't have to be
    an array any more. What's more, after the following commit:

    commit 933393f58fef9963eac61db8093689544e29a600
    Date: Thu Dec 22 11:58:51 2011 -0600

    percpu: Remove irqsafe_cpu_xxx variants

    We simply say that regular this_cpu use must be safe regardless of
    preemption and interrupt state. That has no material change for x86
    and s390 implementations of this_cpu operations. However, arches that
    do not provide their own implementation for this_cpu operations will
    now get code generated that disables interrupts instead of preemption.

    probably no arch wants to have SNMP_ARRAY_SZ == 2. At least after
    almost 3 years, no one complains.

    So, just convert the array to a single pointer and remove snmp_mib_init()
    and snmp_mib_free() as well.

    Cc: Christoph Lameter
    Cc: Eric Dumazet
    Cc: David S. Miller
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller

    WANG Cong
     

25 Apr, 2014

1 commit

  • By passing a netlink socket to a more privileged executable and then
    fooling that executable into writing data to the socket that happens
    to be a valid netlink message, it is possible to make the privileged
    executable do something it did not intend to do.

    To keep this from happening, replace bare capable and ns_capable calls
    with netlink_capable, netlink_net_capable and netlink_ns_capable calls,
    which act the same as the previous calls except that they also verify
    that the opener of the socket had the desired permissions.

    Reported-by: Andy Lutomirski
    Signed-off-by: "Eric W. Biederman"
    Signed-off-by: David S. Miller

    Eric W. Biederman
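
The difference between the old and new checks can be modelled with two booleans: one for the task writing to the socket now, one recorded for the task that opened it. All names below are hypothetical; the real helpers work on credentials and capabilities rather than flags.

```c
#include <stdbool.h>

/* What was recorded about the task that opened the socket. */
struct nl_socket_sketch { bool opener_privileged; };

/* The task that is writing to the socket right now. */
struct task_sketch { bool privileged; };

/* Old check: only the current writer matters. */
static bool bare_capable(const struct task_sketch *cur)
{
    return cur->privileged;
}

/* New-style check: both the writer and the opener must be privileged. */
static bool socket_capable(const struct nl_socket_sketch *sk,
                           const struct task_sketch *cur)
{
    return sk->opener_privileged && bare_capable(cur);
}
```

A socket opened by an unprivileged user and handed to a privileged helper passes the old check but fails the new one.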