23 Mar, 2016

2 commits


21 Mar, 2016

1 commit

  • Pull x86 protection key support from Ingo Molnar:
    "This tree adds support for a new memory protection hardware feature
    that is available in upcoming Intel CPUs: 'protection keys' (pkeys).

    There's a background article at LWN.net:

    https://lwn.net/Articles/643797/

    The gist is that protection keys allow the encoding of
    user-controllable permission masks in the pte. So instead of having a
    fixed protection mask in the pte (which needs a system call to change
    and works on a per page basis), the user can map a (handful of)
    protection mask variants and can change the masks at runtime relatively
    cheaply, without having to change every single page in the affected
    virtual memory range.

    This allows the dynamic switching of the protection bits of large
    amounts of virtual memory, via user-space instructions. It also
    allows more precise control of MMU permission bits: for example the
    executable bit is separate from the read bit (see more about that
    below).

    This tree adds the MM infrastructure and low level x86 glue needed for
    that, plus it adds a high level API to make use of protection keys -
    if a user-space application calls:

    mmap(..., PROT_EXEC);

    or

    mprotect(ptr, sz, PROT_EXEC);

    (note PROT_EXEC-only, without PROT_READ/WRITE), the kernel will notice
    this special case, and will set a special protection key on this
    memory range. It also sets the appropriate bits in the Protection
    Keys User Rights (PKRU) register so that the memory becomes unreadable
    and unwritable.

    So using protection keys the kernel is able to implement 'true'
    PROT_EXEC on x86 CPUs: without protection keys PROT_EXEC implies
    PROT_READ as well. Unreadable executable mappings have security
    advantages: they cannot be read via information leaks to figure out
    ASLR details, nor can they be scanned for ROP gadgets - and they
    cannot be used by exploits for data purposes either.

    We know of no user-space code that relies on pure PROT_EXEC
    mappings today, but binary loaders could start making use of this new
    feature to map binaries and libraries in a more secure fashion.

    There is other pending pkeys work that offers more high level system
    call APIs to manage protection keys - but those are not part of this
    pull request.

    Right now there's a Kconfig that controls this feature
    (CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS) that is enabled by default
    (like most x86 CPU feature enablement code that has no runtime
    overhead), but it's not user-configurable at the moment. If there's
    any serious problem with this then we can make it configurable and/or
    flip the default"

    * 'mm-pkeys-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (38 commits)
    x86/mm/pkeys: Fix mismerge of protection keys CPUID bits
    mm/pkeys: Fix siginfo ABI breakage caused by new u64 field
    x86/mm/pkeys: Fix access_error() denial of writes to write-only VMA
    mm/core, x86/mm/pkeys: Add execute-only protection keys support
    x86/mm/pkeys: Create an x86 arch_calc_vm_prot_bits() for VMA flags
    x86/mm/pkeys: Allow kernel to modify user pkey rights register
    x86/fpu: Allow setting of XSAVE state
    x86/mm: Factor out LDT init from context init
    mm/core, x86/mm/pkeys: Add arch_validate_pkey()
    mm/core, arch, powerpc: Pass a protection key in to calc_vm_flag_bits()
    x86/mm/pkeys: Actually enable Memory Protection Keys in the CPU
    x86/mm/pkeys: Add Kconfig prompt to existing config option
    x86/mm/pkeys: Dump pkey from VMA in /proc/pid/smaps
    x86/mm/pkeys: Dump PKRU with other kernel registers
    mm/core, x86/mm/pkeys: Differentiate instruction fetches
    x86/mm/pkeys: Optimize fault handling in access_error()
    mm/core: Do not enforce PKEY permissions on remote mm access
    um, pkeys: Add UML arch_*_access_permitted() methods
    mm/gup, x86/mm/pkeys: Check VMAs and PTEs for protection keys
    x86/mm/gup: Simplify get_user_pages() PTE bit handling
    ...

    Linus Torvalds
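    A minimal user-space sketch of the execute-only case described in this pull, assuming Linux and glibc: it maps one anonymous page, leaves it writable while a loader would copy code in, then calls mprotect() with PROT_EXEC alone. The helper name map_exec_only_page() is hypothetical. The mprotect() call normally succeeds on any x86 system; only with pkeys-capable hardware and CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS enabled does the kernel also make the page unreadable.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Map one anonymous page, keep it writable while code is copied in,
 * then drop it to PROT_EXEC only (no PROT_READ, no PROT_WRITE).
 * Returns the page, or NULL on failure; the length goes to *len_out.
 *
 * Assumption: the page only becomes truly unreadable when the CPU has
 * protection keys and the kernel option is enabled; otherwise PROT_EXEC
 * still implies PROT_READ on x86.
 */
void *map_exec_only_page(size_t *len_out)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return NULL;

    void *p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;

    /* ... a loader would copy machine code into p here ... */

    /* Execute-only request: the special case the kernel detects. */
    if (mprotect(p, (size_t)page, PROT_EXEC) != 0) {
        munmap(p, (size_t)page);
        return NULL;
    }

    *len_out = (size_t)page;
    return p;
}
```

    Callers should not read through the returned pointer: on pkeys hardware such a read is expected to fault, while elsewhere it silently succeeds. Release the page with munmap(p, len).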
     

20 Mar, 2016

1 commit

  • Pull networking updates from David Miller:
    "Highlights:

    1) Support more Realtek wireless chips, from Jes Sorenson.

    2) New BPF types for per-cpu hash and array maps, from Alexei
    Starovoitov.

    3) Make several TCP sysctls per-namespace, from Nikolay Borisov.

    4) Allow the use of SO_REUSEPORT in order to do per-thread processing
    of incoming TCP/UDP connections. The muxing can be done using a
    BPF program which hashes the incoming packet. From Craig Gallek.

    5) Add a multiplexer for TCP streams, to provide a message-based
    interface. BPF programs can be used to determine the message
    boundaries. From Tom Herbert.

    6) Add 802.1AE MACSEC support, from Sabrina Dubroca.

    7) Avoid factorial complexity when taking down an inetdev interface
    with lots of configured addresses. We were doing things like
    traversing the entire address list for each address removed, and
    flushing the entire netfilter conntrack table for every address as
    well.

    8) Add and use SKB bulk free infrastructure, from Jesper Brouer.

    9) Allow offloading u32 classifiers to hardware, and implement for
    ixgbe, from John Fastabend.

    10) Allow configuring IRQ coalescing parameters on a per-queue basis,
    from Kan Liang.

    11) Extend ethtool so that larger link mode masks can be supported.
    From David Decotigny.

    12) Introduce devlink, which can be used to configure port link types
    (ethernet vs Infiniband, etc.), port splitting, and switch device
    level attributes as a whole. From Jiri Pirko.

    13) Hardware offload support for flower classifiers, from Amir Vadai.

    14) Add "Local Checksum Offload". Basically, for a tunneled packet
    the checksum of the outer header is 'constant' (because with the
    checksum field filled into the inner protocol header, the payload
    of the outer frame checksums to 'zero'), and we can take advantage
    of that in various ways. From Edward Cree"

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1548 commits)
    bonding: fix bond_get_stats()
    net: bcmgenet: fix dma api length mismatch
    net/mlx4_core: Fix backward compatibility on VFs
    phy: mdio-thunder: Fix some Kconfig typos
    lan78xx: add ndo_get_stats64
    lan78xx: handle statistics counter rollover
    RDS: TCP: Remove unused constant
    RDS: TCP: Add sysctl tunables for sndbuf/rcvbuf on rds-tcp socket
    net: smc911x: convert pxa dma to dmaengine
    team: remove duplicate set of flag IFF_MULTICAST
    bonding: remove duplicate set of flag IFF_MULTICAST
    net: fix a comment typo
    ethernet: micrel: fix some error codes
    ip_tunnels, bpf: define IP_TUNNEL_OPTS_MAX and use it
    bpf, dst: add and use dst_tclassid helper
    bpf: make skb->tc_classid also readable
    net: mvneta: bm: clarify dependencies
    cls_bpf: reset class and reuse major in da
    ldmvsw: Checkpatch sunvnet.c and sunvnet_common.c
    ldmvsw: Add ldmvsw.c driver code
    ...

    Linus Torvalds
     

19 Mar, 2016

11 commits

  • Merge second patch-bomb from Andrew Morton:

    - a couple of hotfixes

    - the rest of MM

    - a new timer slack control in procfs

    - a couple of procfs fixes

    - a few misc things

    - some printk tweaks

    - lib/ updates, notably to radix-tree.

    - add my and Nick Piggin's old userspace radix-tree test harness to
    tools/testing/radix-tree/. Matthew said it was a godsend during the
    radix-tree work he did.

    - a few code-size improvements, switching to __always_inline where gcc
    screwed up.

    - partially implement character sets in sscanf

    * emailed patches from Andrew Morton: (118 commits)
    sscanf: implement basic character sets
    lib/bug.c: use common WARN helper
    param: convert some "on"/"off" users to strtobool
    lib: add "on"/"off" support to kstrtobool
    lib: update single-char callers of strtobool()
    lib: move strtobool() to kstrtobool()
    include/linux/unaligned: force inlining of byteswap operations
    include/uapi/linux/byteorder, swab: force inlining of some byteswap operations
    include/asm-generic/atomic-long.h: force inlining of some atomic_long operations
    usb: common: convert to use match_string() helper
    ide: hpt366: convert to use match_string() helper
    ata: hpt366: convert to use match_string() helper
    power: ab8500: convert to use match_string() helper
    power: charger_manager: convert to use match_string() helper
    drm/edid: convert to use match_string() helper
    pinctrl: convert to use match_string() helper
    device property: convert to use match_string() helper
    lib/string: introduce match_string() helper
    radix-tree tests: add test for radix_tree_iter_next
    radix-tree tests: add regression3 test
    ...

    Linus Torvalds
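    The match_string() conversions listed above replace open-coded loops over string tables. A user-space sketch of the helper's contract, written for illustration: return the index of the first entry equal to the string, stop early at a NULL entry, and report a miss as -EINVAL. It mirrors the kernel helper's behaviour as we understand it, not its exact source.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * User-space re-implementation of a match_string()-style lookup:
 * index of `string` in `array`, or -EINVAL if it is not present.
 * Scanning stops after n entries or at the first NULL entry.
 */
int match_string(const char *const *array, size_t n, const char *string)
{
    for (size_t i = 0; i < n; i++) {
        if (!array[i])
            break;
        if (strcmp(array[i], string) == 0)
            return (int)i;
    }
    return -EINVAL;
}
```

    For a hypothetical table { "host", "peripheral", "otg", NULL }, looking up "otg" with n = 4 yields 2, and an unknown string yields -EINVAL.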
     
  • RDS_TCP_DEFAULT_BUFSIZE has been unused since commit 1edd6a14d24f
    ("RDS-TCP: Do not bloat sndbuf/rcvbuf in rds_tcp_tune").

    Signed-off-by: Sowmini Varadhan
    Signed-off-by: David S. Miller

    Sowmini Varadhan
     
  • Add per-net sysctl tunables to set the size of sndbuf and
    rcvbuf on the kernel tcp socket.

    The tunables are added at /proc/sys/net/rds/tcp/rds_tcp_sndbuf
    and /proc/sys/net/rds/tcp/rds_tcp_rcvbuf.

    These values must be set before accept() or connect(),
    and there may be an arbitrary number of existing rds-tcp
    sockets when the tunable is modified. To make sure that all
    connections in the netns pick up the same value for the tunable,
    we reset existing rds-tcp connections in the netns, so that
    they can reconnect with the new parameters.

    Signed-off-by: Sowmini Varadhan
    Signed-off-by: David S. Miller

    Sowmini Varadhan
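    Setting one of the new tunables from a program amounts to writing a decimal value into its sysctl file. A minimal sketch in C, assuming the paths named in the commit; the helper names are hypothetical. Writing the real files requires privileges and a loaded rds_tcp module, and since the value must be in place before accept() or connect(), it is best set before RDS traffic starts, because existing rds-tcp connections in the netns are reset when it changes.

```c
#include <stdio.h>

/*
 * Write an integer to a sysctl file such as
 * /proc/sys/net/rds/tcp/rds_tcp_sndbuf. Returns 0 on success, -1 on error.
 */
int write_sysctl_long(const char *path, long value)
{
    FILE *f = fopen(path, "w");
    if (!f)
        return -1;
    int ok = fprintf(f, "%ld\n", value) > 0;
    /* The kernel may only report a rejected value when the file is closed. */
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/* Read the current integer value back. Returns 0 on success, -1 on error. */
int read_sysctl_long(const char *path, long *out)
{
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    int ok = fscanf(f, "%ld", out) == 1;
    fclose(f);
    return ok ? 0 : -1;
}
```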
     
  • eBPF defines this as BPF_TUNLEN_MAX and OVS just uses the hard-coded
    value inside struct sw_flow_key. Thus, add and use IP_TUNNEL_OPTS_MAX
    for this, which makes the code a bit more generic and allows to remove
    BPF_TUNLEN_MAX from eBPF code.

    Signed-off-by: Daniel Borkmann
    Signed-off-by: David S. Miller

    Daniel Borkmann
     
  • We can just add a small helper dst_tclassid() for retrieving the
    dst->tclassid value. It makes the code a bit better in that we can
    get rid of the ifdef from filter.c by moving this into the header.

    Signed-off-by: Daniel Borkmann
    Signed-off-by: David S. Miller

    Daniel Borkmann
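    The helper pattern can be sketched in user space as follows. The structures are simplified stand-ins (not the kernel's real sk_buff or dst_entry layouts), and CONFIG_IP_ROUTE_CLASSID is defined by hand to mimic the Kconfig symbol that guards tclassid. The point is that the #ifdef sits inside the inline helper, so callers such as filter.c need none.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed enabled for this sketch; stands in for the Kconfig symbol. */
#define CONFIG_IP_ROUTE_CLASSID 1

/* Simplified stand-ins for the kernel structures. */
struct dst_entry {
#ifdef CONFIG_IP_ROUTE_CLASSID
    uint32_t tclassid;
#endif
};

struct sk_buff {
    struct dst_entry *dst;
};

static inline struct dst_entry *skb_dst(const struct sk_buff *skb)
{
    return skb->dst;
}

/* Header-style helper: the config check lives here, not in callers.
 * Returns 0 when there is no dst or when classid support is off. */
static inline uint32_t dst_tclassid(const struct sk_buff *skb)
{
#ifdef CONFIG_IP_ROUTE_CLASSID
    const struct dst_entry *dst = skb_dst(skb);

    if (dst)
        return dst->tclassid;
#endif
    return 0;
}
```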
     
  • Currently, the tc_classid from eBPF skb context is write-only, but there's
    no good reason for tc programs to limit it to write-only. For example,
    it can be used to transfer its state via tail calls where the resulting
    tc_classid gets filled gradually.

    Signed-off-by: Daniel Borkmann
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Daniel Borkmann
     
  • There are two issues with the current code. First one is that we need
    to set res->class to 0 in case we use non-default classid matching.

    This is important for the case where cls_bpf was initially set up with
    an optional binding to a default class with tcf_bind_filter(), where
    the underlying qdisc implements bind_tcf() that fills res->class and
    tests for it later on when doing the classification. Convention for
    these cases is that after tc_classify() was called, such qdiscs (atm,
    drr, qfq, cbq, hfsc, htb) first test class, and if 0, then they look up
    based on classid.

    Second, there's a bug with da mode, where res->classid is only assigned
    a 16-bit minor, but it needs to expand to the full 32-bit major/minor
    combination instead, therefore we need to expand with the bound major.
    This is fine as classes belonging to a classful qdisc must share the
    same major.

    Fixes: 045efa82ff56 ("cls_bpf: introduce integrated actions")
    Signed-off-by: Daniel Borkmann
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Daniel Borkmann
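    The classid arithmetic behind the second fix can be illustrated in a few lines. A tc handle packs a 16-bit major in the upper half and a 16-bit minor in the lower half (class 1:5 is 0x00010005), so a bare minor has to be combined with the major of the bound qdisc. This is a sketch of that arithmetic, not the patch itself; the helper names are hypothetical and mirror TC_H_MAJ()/TC_H_MIN() from <linux/pkt_sched.h>.

```c
#include <stdint.h>

/* Same bit layout as TC_H_MAJ()/TC_H_MIN(): major in the upper 16 bits,
 * minor in the lower 16 bits. */
static inline uint32_t tc_major(uint32_t h) { return h & 0xFFFF0000U; }
static inline uint32_t tc_minor(uint32_t h) { return h & 0x0000FFFFU; }

/*
 * Combine the major of the qdisc the filter is bound to with the 16-bit
 * minor chosen by the program, giving a full major:minor classid.
 */
static inline uint32_t expand_classid(uint32_t bound, uint32_t minor)
{
    return tc_major(bound) | tc_minor(minor);
}
```

    With a filter bound under qdisc 1:, a program result of 5 becomes 0x00010005 (class 1:5) instead of the bare 0x00000005.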
     
  • Currently output of MPLS packets on tunnel vports is not allowed by Open
    vSwitch. This is because historically encapsulation was done in such a way
    that the inner_protocol field of the skb needed to hold the inner protocol
    for both MPLS and tunnel encapsulation in order for GSO segmentation to be
    performed correctly.

    Since b2acd1dc3949 ("openvswitch: Use regular GRE net_device instead of
    vport") Open vSwitch makes use of lwt to output to tunnel netdevs which
    perform encapsulation. As no drivers expose support for MPLS offloads this
    means that GSO packets are segmented in software by validate_xmit_skb(),
    which is called from __dev_queue_xmit(), before tunnel encapsulation occurs.
    This means that the inner protocol of MPLS is no longer needed by the time
    encapsulation occurs and the contention on the inner_protocol field of the
    skb no longer occurs.

    Thus it is now safe to output MPLS to tunnel vports.

    Signed-off-by: Simon Horman
    Reviewed-by: Jesse Gross
    Signed-off-by: David S. Miller

    Simon Horman
     
  • Signed-off-by: Fengguang Wu
    Signed-off-by: David S. Miller

    Wu Fengguang
     
  • Signed-off-by: Fengguang Wu
    Signed-off-by: David S. Miller

    Wu Fengguang
     
  • Pull rdma updates from Doug Ledford:
    "Initial roundup of 4.6 merge window patches.

    This is the first of two pull requests. It is the smaller request,
    but touches far more different things (this is everything but what is
    in or going into staging). The pull request for the code in
    staging/rdma is on hold until after we decide what to do on the
    write/writev API issue and may be partially deferred until 4.7 as a
    result.

    Summary:

    - cxgb4 updates
    - nes updates
    - unification of iwarp portmapper code to core
    - add drain_cq API
    - various ib_core updates
    - minor ipoib updates
    - minor mlx4 updates
    - more significant mlx5 updates (including a minor merge conflict
    with net-next tree...merge is simple to resolve and Stephen's
    resolution was confirmed by Mellanox)
    - trivial net/9p rdma conversion
    - ocrdma RoCEv2 update
    - srpt updates"

    * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (85 commits)
    iwpm: crash fix for large connections test
    iw_cxgb3: support for iWARP port mapping
    iw_cxgb4: remove port mapper related code
    iw_nes: remove port mapper related code
    iwcm: common code for port mapper
    net/9p: convert to new CQ API
    IB/mlx5: Add support for don't trap rules
    net/mlx5_core: Introduce forward to next priority action
    net/mlx5_core: Create anchor of last flow table
    iser: Accept arbitrary sg lists mapping if the device supports it
    mlx5: Add arbitrary sg list support
    IB/core: Add arbitrary sg_list support
    IB/mlx5: Expose correct max_fast_reg_page_list_len
    IB/mlx5: Make coding style more consistent
    IB/mlx5: Convert UMR CQ to new CQ API
    IB/ocrdma: Skip using unneeded intermediate variable
    IB/ocrdma: Skip using unneeded intermediate variable
    IB/ocrdma: Delete unnecessary variable initialisations in 11 functions
    IB/core: Documentation fix in the MAD header file
    IB/core: trivial prink cleanup.
    ...

    Linus Torvalds
     

18 Mar, 2016

6 commits

  • Pull trivial tree updates from Jiri Kosina.

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial:
    drivers/rtc: broken link fix
    drm/i915 Fix typos in i915_gem_fence.c
    Docs: fix missing word in REPORTING-BUGS
    lib+mm: fix few spelling mistakes
    MAINTAINERS: add git URL for APM driver
    treewide: Fix typo in printk

    Linus Torvalds
     
  • Now SYN_RECV request sockets are installed in ehash table, an ICMP
    handler can find a request socket while another cpu handles an incoming
    packet transforming this SYN_RECV request socket into an ESTABLISHED
    socket.

    We need to remove the now obsolete WARN_ON(req->sk), since req->sk
    is set when a new child is created and added into listener accept queue.

    If this race happens, the ICMP will do nothing special.

    Fixes: 079096f103fa ("tcp/dccp: install syn_recv requests into ehash table")
    Signed-off-by: Eric Dumazet
    Reported-by: Ben Lazarus
    Reported-by: Neal Cardwell
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • vlan drivers lack proper propagation of gso_max_segs from
    lower device.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
    The success of CMA allocation largely depends on the success of
    migration, and a key factor in that is the page reference count. Until
    now, the page reference count has been manipulated by calling atomic
    functions directly, so we cannot track who manipulates it and where.
    This makes it hard to find the actual reason for a CMA allocation
    failure. CMA allocation should be guaranteed to succeed, so finding
    the offending place is really important.

    In this patch, call sites where the page reference count is
    manipulated are converted to the newly introduced wrapper functions.
    This is a preparation step to add a tracepoint to each page reference
    manipulation function. With this facility, we can easily find the
    reason for a CMA allocation failure. There is no functional change in
    this patch.

    In addition, this patch also converts reference read sites. This will
    help a second step that renames page._count to something else and
    prevents later attempts to access it directly (suggested by Andrew).

    Signed-off-by: Joonsoo Kim
    Acked-by: Michal Nazarewicz
    Acked-by: Vlastimil Babka
    Cc: Minchan Kim
    Cc: Mel Gorman
    Cc: "Kirill A. Shutemov"
    Cc: Sergey Senozhatsky
    Cc: Steven Rostedt
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joonsoo Kim
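    The wrapper approach can be sketched in user space: every read or update of the count goes through a small inline function, so a single hook can later observe all of them. The function names follow the page_ref_*() style the patch introduces, but the bodies here are a simplified mock using C11 atomics, and the counter in page_ref_trace() is a hypothetical placeholder for a tracepoint.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Simplified stand-in for struct page: only the reference count. */
struct page {
    atomic_int _count;
};

/* Hypothetical hook: counts modifications instead of emitting a tracepoint. */
static int page_ref_events;

static inline void page_ref_trace(struct page *page, int v)
{
    (void)page;
    (void)v;
    page_ref_events++;
}

/* Read site: no direct access to _count outside these helpers. */
static inline int page_ref_count(struct page *page)
{
    return atomic_load(&page->_count);
}

static inline void set_page_count(struct page *page, int v)
{
    atomic_store(&page->_count, v);
    page_ref_trace(page, v);
}

static inline void page_ref_inc(struct page *page)
{
    int v = atomic_fetch_add(&page->_count, 1) + 1;
    page_ref_trace(page, v);
}

/* Returns true when this call dropped the count to zero. */
static inline bool page_ref_dec_and_test(struct page *page)
{
    int v = atomic_fetch_sub(&page->_count, 1) - 1;
    page_ref_trace(page, v);
    return v == 0;
}
```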
     
  • Pull tty/serial updates from Greg KH:
    "Here's the big tty/serial driver pull request for 4.6-rc1.

    Lots of changes in here, Peter has been on a tear again, with lots of
    refactoring and bug fixes, many thanks to the great work he has been
    doing. Lots of driver updates and fixes as well, full details in the
    shortlog.

    All have been in linux-next for a while with no reported issues"

    * tag 'tty-4.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (220 commits)
    serial: 8250: describe CONFIG_SERIAL_8250_RSA
    serial: samsung: optimize UART rx fifo access routine
    serial: pl011: add mark/space parity support
    serial: sa1100: make sa1100_register_uart_fns a function
    tty: serial: 8250: add MOXA Smartio MUE boards support
    serial: 8250: convert drivers to use up_to_u8250p()
    serial: 8250/mediatek: fix building with SERIAL_8250=m
    serial: 8250/ingenic: fix building with SERIAL_8250=m
    serial: 8250/uniphier: fix modular build
    Revert "drivers/tty/serial: make 8250/8250_ingenic.c explicitly non-modular"
    Revert "drivers/tty/serial: make 8250/8250_mtk.c explicitly non-modular"
    serial: mvebu-uart: initial support for Armada-3700 serial port
    serial: mctrl_gpio: Add missing module license
    serial: ifx6x60: avoid uninitialized variable use
    tty/serial: at91: fix bad offset for UART timeout register
    tty/serial: at91: restore dynamic driver binding
    serial: 8250: Add hardware dependency to RT288X option
    TTY, devpts: document pty count limiting
    tty: goldfish: support platform_device with id -1
    drivers: tty: goldfish: Add device tree bindings
    ...

    Linus Torvalds
     
  • Pull crypto update from Herbert Xu:
    "Here is the crypto update for 4.6:

    API:
    - Convert remaining crypto_hash users to shash or ahash, also convert
    blkcipher/ablkcipher users to skcipher.
    - Remove crypto_hash interface.
    - Remove crypto_pcomp interface.
    - Add crypto engine for async cipher drivers.
    - Add akcipher documentation.
    - Add skcipher documentation.

    Algorithms:
    - Rename crypto/crc32 to avoid name clash with lib/crc32.
    - Fix bug in keywrap where we zero the wrong pointer.

    Drivers:
    - Support T5/M5, T7/M7 SPARC CPUs in n2 hwrng driver.
    - Add PIC32 hwrng driver.
    - Support BCM6368 in bcm63xx hwrng driver.
    - Pack structs for 32-bit compat users in qat.
    - Use crypto engine in omap-aes.
    - Add support for sama5d2x SoCs in atmel-sha.
    - Make atmel-sha available again.
    - Make sahara hashing available again.
    - Make ccp hashing available again.
    - Make sha1-mb available again.
    - Add support for multiple devices in ccp.
    - Improve DMA performance in caam.
    - Add hashing support to rockchip"

    * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (116 commits)
    crypto: qat - remove redundant arbiter configuration
    crypto: ux500 - fix checks of error code returned by devm_ioremap_resource()
    crypto: atmel - fix checks of error code returned by devm_ioremap_resource()
    crypto: qat - Change the definition of icp_qat_uof_regtype
    hwrng: exynos - use __maybe_unused to hide pm functions
    crypto: ccp - Add abstraction for device-specific calls
    crypto: ccp - CCP versioning support
    crypto: ccp - Support for multiple CCPs
    crypto: ccp - Remove check for x86 family and model
    crypto: ccp - memset request context to zero during import
    lib/mpi: use "static inline" instead of "extern inline"
    lib/mpi: avoid assembler warning
    hwrng: bcm63xx - fix non device tree compatibility
    crypto: testmgr - allow rfc3686 aes-ctr variants in fips mode.
    crypto: qat - The AE id should be less than the maximal AE number
    lib/mpi: Endianness fix
    crypto: rockchip - add hash support for crypto engine in rk3288
    crypto: xts - fix compile errors
    crypto: doc - add skcipher API documentation
    crypto: doc - update AEAD AD handling
    ...

    Linus Torvalds
     

17 Mar, 2016

2 commits


16 Mar, 2016

1 commit

  • $ make tags
    GEN tags
    ctags: Warning: drivers/acpi/processor_idle.c:64: null expansion of name pattern "\1"
    ctags: Warning: drivers/xen/events/events_2l.c:41: null expansion of name pattern "\1"
    ctags: Warning: kernel/locking/lockdep.c:151: null expansion of name pattern "\1"
    ctags: Warning: kernel/rcu/rcutorture.c:133: null expansion of name pattern "\1"
    ctags: Warning: kernel/rcu/rcutorture.c:135: null expansion of name pattern "\1"
    ctags: Warning: kernel/workqueue.c:323: null expansion of name pattern "\1"
    ctags: Warning: net/ipv4/syncookies.c:53: null expansion of name pattern "\1"
    ctags: Warning: net/ipv6/syncookies.c:44: null expansion of name pattern "\1"
    ctags: Warning: net/rds/page.c:45: null expansion of name pattern "\1"

    Which are all the result of the DEFINE_PER_CPU pattern:

    scripts/tags.sh:200: '/\
    Acked-by: David S. Miller
    Acked-by: Rafael J. Wysocki
    Cc: Tejun Heo
    Cc: "Paul E. McKenney"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     

15 Mar, 2016

16 commits

  • Pablo Neira Ayuso says:

    ====================
    Netfilter/IPVS/OVS updates for net-next

    The following patchset contains Netfilter/IPVS fixes and OVS NAT
    support, more specifically this batch is composed of:

    1) Fix a crash in ipset when performing a parallel flush/dump with
    set:list type, from Jozsef Kadlecsik.

    2) Make sure NFACCT_FILTER_* netlink attributes are in place before
    accessing them, from Phil Turnbull.

    3) Check return error code from ip_vs_fill_iph_skb_off() in IPVS SIP
    helper, from Arnd Bergmann.

    4) Add workaround to IPVS to reschedule existing connections to a new
    destination server by dropping the packet and waiting for retransmission
    of the TCP SYN packet, from Julian Anastasov.

    5) Allow connection rescheduling in IPVS when in CLOSE state, also
    from Julian.

    6) Fix wrong offset of SIP Call-ID in IPVS helper, from Marco Angaroni.

    7) Validate IPSET_ATTR_ETHER netlink attribute length, from Jozsef.

    8) Check match/targetinfo netlink attribute size in nft_compat,
    patch from Florian Westphal.

    9) Check for integer overflow on 32-bit systems in x_tables, from
    Florian Westphal.

    Several patches from Jarno Rajahalme to prepare the introduction of
    NAT support to OVS based on the Netfilter infrastructure:

    10) Schedule IP_CT_NEW_REPLY definition for removal in
    nf_conntrack_common.h.

    11) Simplify checksumming recalculation in nf_nat.

    12) Add comments to the openvswitch conntrack code, from Jarno.

    13) Update the CT state key only after successful nf_conntrack_in()
    invocation.

    14) Find existing conntrack entry after upcall.

    15) Handle NF_REPEAT case due to templates in nf_conntrack_in().

    16) Call the conntrack helper functions once the conntrack has been
    confirmed.

    17) And finally, add the NAT interface to OVS.

    The batch closes with:

    18) Cleanup to use spin_unlock_wait() instead of
    spin_lock()/spin_unlock(), from Nicholas Mc Guire.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
    The spin_lock()/spin_unlock() pair synchronizes on
    nf_conntrack_locks_all_lock, which is equivalent to
    spin_unlock_wait(), but the latter should be more efficient.

    Signed-off-by: Nicholas Mc Guire
    Signed-off-by: Pablo Neira Ayuso

    Nicholas Mc Guire
     
    On loaded TCP servers, looking at millions of sockets can hold the
    cpu for many seconds if the lookup condition is very narrow
    (e.g. ss dst 1.2.3.4).

    It is better to add a cond_resched() to allow other processes to
    access the cpu.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • Extend OVS conntrack interface to cover NAT. New nested
    OVS_CT_ATTR_NAT attribute may be used to include NAT with a CT action.
    A bare OVS_CT_ATTR_NAT only mangles existing and expected connections.
    If OVS_NAT_ATTR_SRC or OVS_NAT_ATTR_DST is included within the nested
    attributes, new (non-committed/non-confirmed) connections are mangled
    according to the rest of the nested attributes.

    The corresponding OVS userspace patch series includes test cases (in
    tests/system-traffic.at) that also serve as example uses.

    This work extends on a branch by Thomas Graf at
    https://github.com/tgraf/ovs/tree/nat.

    Signed-off-by: Jarno Rajahalme
    Acked-by: Thomas Graf
    Acked-by: Joe Stringer
    Signed-off-by: Pablo Neira Ayuso

    Jarno Rajahalme
     
  • There is no need to help connections that are not confirmed, so we can
    delay helping new connections to the time when they are confirmed.
    This change is needed for NAT support, and having this as a separate
    patch will make the following NAT patch a bit easier to review.

    Signed-off-by: Jarno Rajahalme
    Acked-by: Joe Stringer
    Signed-off-by: Pablo Neira Ayuso

    Jarno Rajahalme
     
  • Repeat the nf_conntrack_in() call when it returns NF_REPEAT. This
    avoids dropping a SYN packet re-opening an existing TCP connection.

    Signed-off-by: Jarno Rajahalme
    Acked-by: Joe Stringer
    Signed-off-by: Pablo Neira Ayuso

    Jarno Rajahalme
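    The retry loop can be sketched with a scripted stand-in for nf_conntrack_in(). The names fake_conntrack_in(), struct fake_ct and do_conntrack() are hypothetical, and the verdict values are copied from <linux/netfilter.h>. An NF_REPEAT verdict triggers another pass instead of being treated as a failure that drops the packet.

```c
/* Netfilter verdict values as defined in <linux/netfilter.h>. */
enum { NF_DROP = 0, NF_ACCEPT = 1, NF_STOLEN = 2, NF_QUEUE = 3,
       NF_REPEAT = 4 };

/* Hypothetical stand-in for nf_conntrack_in(): returns scripted verdicts. */
struct fake_ct {
    const int *verdicts;
    int n;
    int calls;
};

static int fake_conntrack_in(struct fake_ct *ct)
{
    int v = ct->calls < ct->n ? ct->verdicts[ct->calls] : NF_ACCEPT;
    ct->calls++;
    return v;
}

/* Repeat the conntrack call while it asks for a repeat; anything other
 * than NF_ACCEPT afterwards is reported as an error (-1 in this sketch). */
static int do_conntrack(struct fake_ct *ct)
{
    int err;

    do {
        err = fake_conntrack_in(ct);
    } while (err == NF_REPEAT);

    return err == NF_ACCEPT ? 0 : -1;
}
```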
     
    Add a new function ovs_ct_find_existing() to find an existing
    conntrack entry to which this packet was already applied. This is
    only to be called when there is evidence that the packet was already
    tracked and committed, but we lost the ct reference due to a
    userspace upcall.

    ovs_ct_find_existing() is called from skb_nfct_cached(), which can now
    hide the fact that the ct reference may have been lost due to an
    upcall. This allows ovs_ct_commit() to be simplified.

    This patch is needed by the later "openvswitch: Interface with NAT"
    patch, as we need to be able to pass the packet through NAT using the
    original ct reference even after the reference has been lost due to
    an upcall.

    Signed-off-by: Jarno Rajahalme
    Acked-by: Joe Stringer
    Signed-off-by: Pablo Neira Ayuso

    Jarno Rajahalme
     
    Only a successful nf_conntrack_in() call can effect a connection state
    change, so it suffices to update the key only after
    nf_conntrack_in() returns.

    This change is needed for the later NAT patches.

    Signed-off-by: Jarno Rajahalme
    Acked-by: Joe Stringer
    Signed-off-by: Pablo Neira Ayuso

    Jarno Rajahalme
     
  • This makes the code easier to understand and the following patches
    more focused.

    Signed-off-by: Jarno Rajahalme
    Acked-by: Joe Stringer
    Signed-off-by: Pablo Neira Ayuso

    Jarno Rajahalme
     
  • NAT checksum recalculation code assumes existence of skb_dst, which
    becomes a problem for a later patch in the series ("openvswitch:
    Interface with NAT."). Simplify this by removing the check on
    skb_dst, as the checksum will be dealt with later in the stack.

    Suggested-by: Pravin Shelar
    Signed-off-by: Jarno Rajahalme
    Signed-off-by: Pablo Neira Ayuso

    Jarno Rajahalme
     
  • Remove the definition of IP_CT_NEW_REPLY from the kernel as it does
    not make sense. This allows the definition of IP_CT_NUMBER to be
    simplified as well.

    Signed-off-by: Jarno Rajahalme
    Signed-off-by: Pablo Neira Ayuso

    Jarno Rajahalme
     
  • Rework the netdev event handler, similar to what the Mellanox Spectrum
    driver does, to easily welcome more events later (for example
    NETDEV_PRECHANGEUPPER) and use netdev helpers (such as
    netif_is_bridge_master).

    Signed-off-by: Vivien Didelot
    Acked-by: Jiri Pirko
    Acked-by: Ido Schimmel
    Signed-off-by: David S. Miller

    Vivien Didelot
     
    netdev_upper_dev_unlink(), which notifies NETDEV_CHANGEUPPER, returns
    void, as does del_nbp(). So there's no advantage to catching a possible
    error from the port_bridge_leave routine at the DSA level.

    Make this routine void for the DSA layer and its existing drivers.

    Signed-off-by: Vivien Didelot
    Acked-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Vivien Didelot
     
  • Rename DSA port_join_bridge and port_leave_bridge routines to
    respectively port_bridge_join and port_bridge_leave in order to respect
    an implicit Port::Bridge namespace.

    Signed-off-by: Vivien Didelot
    Acked-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Vivien Didelot
     
  • Zefir Kurtisi reported kernel panic with an openwrt specific patch.
    However, it turns out that mainline has a similar bug waiting to happen.

    Once NF_HOOK() returns the skb is in undefined state and must not be
    used. Moreover, the okfn must consume the skb to support async
    processing (NF_QUEUE).

    The current okfn in this spot doesn't consume it, and the caller assumes
    that the NF_HOOK return value tells us whether the skb was freed or not,
    but that's wrong.

    It "works" because no in-tree user registers a NFPROTO_BRIDGE hook at
    LOCAL_IN that returns STOLEN or NF_QUEUE verdicts.

    Once we add NF_QUEUE support for nftables bridge this will break:
    NF_QUEUE holds the skb for async processing, the caller will erroneously
    return RX_HANDLER_PASS, and on reinject netfilter will access a freed skb.

    Fix this by pushing skb up the stack in the okfn instead.

    NB: It also seems dubious to use LOCAL_IN while bypassing PRE_ROUTING
    completely in this case, but this is how it's been forever, so it seems
    preferable not to change this.

    Cc: Felix Fietkau
    Cc: Zefir Kurtisi
    Signed-off-by: Florian Westphal
    Tested-by: Zefir Kurtisi
    Signed-off-by: David S. Miller

    Florian Westphal
     
  • …etooth/bluetooth-next

    Johan Hedberg says:

    ====================
    pull request: bluetooth-next 2016-03-12

    Here's the last bluetooth-next pull request for the 4.6 kernel.

    - New USB ID for AR3012 in btusb
    - New BCM2E55 ACPI ID
    - Buffer overflow fix for the Add Advertising command
    - Support for a new Bluetooth LE limited privacy mode
    - Fix for firmware activation in btmrvl_sdio
    - Cleanups to mac802154 & 6lowpan code

    Please let me know if there are any issues pulling. Thanks.
    ====================

    Signed-off-by: David S. Miller <davem@davemloft.net>

    David S. Miller