06 Oct, 2020

3 commits

  • Rejecting non-native endian BTF overlapped with the addition
    of support for it.

    The rest were more simple overlapping changes, except the
    renesas ravb binding update, which had to follow a file
    move as well as a YAML conversion.

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Pull x86 platform driver fixes from Andy Shevchenko:
    "We have some fixes for Tablet Mode reporting in particular, that users
    are complaining a lot about.

    Summary:

    - Attempt #3 of enabling Tablet Mode reporting w/o regressions

    - Improve battery recognition code in ASUS WMI driver

    - Fix Kconfig dependency warning for Fujitsu and LG laptop drivers

    - Add fixes in Thinkpad ACPI driver for _BCL method and NVRAM polling

    - Fix power supply extended topology in Mellanox driver

    - Fix memory leak in OLPC EC driver

    - Avoid static struct device in Intel PMC core driver

    - Add support for the touchscreen found in MPMAN Converter9 2-in-1

    - Update MAINTAINERS to reflect the real state of affairs"

    * tag 'platform-drivers-x86-v5.9-2' of git://git.infradead.org/linux-platform-drivers-x86:
    platform/x86: thinkpad_acpi: re-initialize ACPI buffer size when reuse
    MAINTAINERS: Add Mark Gross and Hans de Goede as x86 platform drivers maintainers
    platform/x86: intel-vbtn: Switch to an allow-list for SW_TABLET_MODE reporting
    platform/x86: intel-vbtn: Revert "Fix SW_TABLET_MODE always reporting 1 on the HP Pavilion 11 x360"
    platform/x86: intel_pmc_core: do not create a static struct device
    platform/x86: mlx-platform: Fix extended topology configuration for power supply units
    platform/x86: pcengines-apuv2: Fix typo on define of AMD_FCH_GPIO_REG_GPIO55_DEVSLP0
    platform/x86: fix kconfig dependency warning for FUJITSU_LAPTOP
    platform/x86: fix kconfig dependency warning for LG_LAPTOP
    platform/x86: thinkpad_acpi: initialize tp_nvram_state variable
    platform/x86: intel-vbtn: Fix SW_TABLET_MODE always reporting 1 on the HP Pavilion 11 x360
    platform/x86: asus-wmi: Add BATC battery name to the list of supported
    platform/x86: asus-nb-wmi: Revert "Do not load on Asus T100TA and T200TA"
    platform/x86: touchscreen_dmi: Add info for the MPMAN Converter9 2-in-1
    Documentation: laptops: thinkpad-acpi: fix underline length build warning
    Platform: OLPC: Fix memleak in olpc_ec_probe

    Linus Torvalds
     
  • Pull networking fixes from David Miller:

    1) Make sure SKB control block is in the proper state during IPSEC
    ESP-in-TCP encapsulation. From Sabrina Dubroca.

    2) Various kinds of attributes were not being cloned properly when we
    build new xfrm_state objects from existing ones. Fix from Antony
    Antony.

    3) Make sure to keep BTF sections, from Tony Ambardar.

    4) TX DMA channels need proper locking in lantiq driver, from Hauke
    Mehrtens.

    5) Honour route MTU during forwarding, always. From Maciej
    Żenczykowski.

    6) Fix races in kTLS which can result in crashes, from Rohit
    Maheshwari.

    7) Skip TCP DSACKs with ridiculous sequence ranges, from Priyaranjan
    Jha.

    8) Use correct address family in xfrm state lookups, from Herbert Xu.

    9) A bridge FDB flush should not clear out user managed fdb entries
    with the ext_learn flag set, from Nikolay Aleksandrov.

    10) Fix nested locking of netdev address lists, from Taehee Yoo.

    11) Fix handling of 32-bit DATA_FIN values in mptcp, from Mat Martineau.

    12) Fix r8169 data corruptions on RTL8402 chips, from Heiner Kallweit.

    13) Don't free command entries in mlx5 while comp handler could still be
    running, from Eran Ben Elisha.

    14) Error flow of request_irq() in mlx5 is busted, due to an off by one
    we try to free an IRQ never allocated. From Maor Gottlieb.

    15) Fix leak when dumping netlink policies, from Johannes Berg.

    16) Sendpage cannot be performed when a page is a slab page, or the page
    count is < 1. Some subsystems such as nvme were doing so. Create a
    "sendpage_ok()" helper and use it as needed, from Coly Li.

    17) Don't leak request socket when using syncookies with mptcp, from
    Paolo Abeni.

    * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (111 commits)
    net/core: check length before updating Ethertype in skb_mpls_{push,pop}
    net: mvneta: fix double free of txq->buf
    net_sched: check error pointer in tcf_dump_walker()
    net: team: fix memory leak in __team_options_register
    net: typhoon: Fix a typo Typoon --> Typhoon
    net: hinic: fix DEVLINK build errors
    net: stmmac: Modify configuration method of EEE timers
    tcp: fix syn cookied MPTCP request socket leak
    libceph: use sendpage_ok() in ceph_tcp_sendpage()
    scsi: libiscsi: use sendpage_ok() in iscsi_tcp_segment_map()
    drbd: code cleanup by using sendpage_ok() to check page for kernel_sendpage()
    tcp: use sendpage_ok() to detect misused .sendpage
    nvme-tcp: check page by sendpage_ok() before calling kernel_sendpage()
    net: add WARN_ONCE in kernel_sendpage() for improper zero-copy send
    net: introduce helper sendpage_ok() in include/linux/net.h
    net: usb: pegasus: Proper error handing when setting pegasus' MAC address
    net: core: document two new elements of struct net_device
    netlink: fix policy dump leak
    net/mlx5e: Fix race condition on nhe->n pointer in neigh update
    net/mlx5e: Fix VLAN create flow
    ...

    Linus Torvalds
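
    Item 16 above describes the condition that the new sendpage_ok() helper
    checks. A minimal user-space sketch of that check follows. The struct
    page_model type, its fields, and choose_send_path() are hypothetical
    stand-ins for illustration only; the kernel works on struct page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for the kernel's struct page. */
struct page_model {
    bool is_slab;   /* models "the page is a slab page" */
    int refcount;   /* models the page count */
};

/* A page may be handed to a zero-copy send only if it is not a slab page
 * and its count is at least 1. */
static bool sendpage_ok_model(const struct page_model *page)
{
    assert(page != NULL);
    return !page->is_slab && page->refcount >= 1;
}

/* Caller-side pattern (an assumption about typical use): take the
 * zero-copy path only when the check passes, otherwise copy the data. */
enum send_path { SEND_ZERO_COPY, SEND_COPY };

static enum send_path choose_send_path(const struct page_model *page)
{
    return sendpage_ok_model(page) ? SEND_ZERO_COPY : SEND_COPY;
}
```

    With this model, a slab page or a page whose count is 0 is routed to the
    copying path instead of the zero-copy one.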
     

05 Oct, 2020

37 commits

  • Convert m88e1318_get_wol() to use the well implemented phy_read_paged()
    instead of open coding it.

    Signed-off-by: Jisheng Zhang
    Reviewed-by: Marek Behún
    Reviewed-by: Andrew Lunn
    Signed-off-by: David S. Miller

    Jisheng Zhang
     
  • A driver may refuse to enable VLAN filtering for any reason beyond what
    the DSA framework cares about, such as:
    - having tc-flower rules that rely on the switch being VLAN-aware
    - the particular switch does not support VLAN, even if the driver does
    (the DSA framework just checks for the presence of the .port_vlan_add
    and .port_vlan_del pointers)
    - simply not supporting this configuration to be toggled at runtime

    Currently, when a driver rejects a configuration it cannot support, it
    does this from the commit phase, which triggers various warnings in
    switchdev.

    So propagate the prepare phase to drivers, to give them the ability to
    refuse invalid configurations cleanly and avoid the warnings.

    Since we need to modify all function prototypes and check for the
    prepare phase from within the drivers, take that opportunity and move
    the existing driver restrictions within the prepare phase where that is
    possible and easy.

    Cc: Florian Fainelli
    Cc: Martin Blumenstingl
    Cc: Hauke Mehrtens
    Cc: Woojung Huh
    Cc: Microchip Linux Driver Support
    Cc: Sean Wang
    Cc: Landen Chao
    Cc: Andrew Lunn
    Cc: Vivien Didelot
    Cc: Jonathan McDowell
    Cc: Linus Walleij
    Cc: Alexandre Belloni
    Cc: Claudiu Manoil
    Signed-off-by: Vladimir Oltean
    Signed-off-by: David S. Miller

    Vladimir Oltean
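
    The prepare/commit split described above can be sketched as follows.
    This is a simplified model, not the DSA API: struct switch_model, the
    hook signature, and the rejection rule are assumptions chosen to show
    why refusing in the prepare phase is clean.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical switch state; not the real struct dsa_switch. */
struct switch_model {
    bool vlan_filtering;
    bool has_vlan_aware_rules;  /* e.g. tc-flower rules needing VLAN awareness */
};

/* Driver hook called once per phase. In the prepare phase it only
 * validates and may refuse; in the commit phase it applies the change
 * and is not expected to fail. */
static int port_vlan_filtering(struct switch_model *sw, bool enable, bool prepare)
{
    assert(sw != NULL);
    if (prepare) {
        if (!enable && sw->has_vlan_aware_rules)
            return -EOPNOTSUPP;  /* refuse before anything has changed */
        return 0;
    }
    sw->vlan_filtering = enable;
    return 0;
}

/* Core-side helper: the commit phase runs only if prepare succeeded. */
static int set_vlan_filtering(struct switch_model *sw, bool enable)
{
    int err = port_vlan_filtering(sw, enable, true);

    if (err)
        return err;
    return port_vlan_filtering(sw, enable, false);
}
```

    Because the refusal happens before any state is touched, the caller
    sees an ordinary error return and nothing has to be rolled back.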
     
  • Evaluating ACPI _BCL could fail, in which case the ACPI buffer size is set
    to 0. When this ACPI buffer is then reused, AE_BUFFER_OVERFLOW is triggered.

    Re-initializing the buffer size lets the ACPI evaluation succeed.

    Fixes: 46445b6b896fd ("thinkpad-acpi: fix handle locate for video and query of _BCL")
    Signed-off-by: Aaron Ma
    Signed-off-by: Andy Shevchenko

    Aaron Ma
     
  • Rikard Falkeborn says:

    ====================
    net: Constify struct genl_small_ops

    Make a couple of static struct genl_small_ops const to allow the compiler
    to put them in read-only memory. Patches are independent.

    v2: Rebase on net-next, genl_ops -> genl_small_ops
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • The only usage of these is to assign their address to the small_ops field
    in the genl_family struct, which is a const pointer, and applying
    ARRAY_SIZE() on them. Make them const to allow the compiler to put them
    in read-only memory.

    Signed-off-by: Rikard Falkeborn
    Signed-off-by: David S. Miller

    Rikard Falkeborn
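
    The pattern the commit above relies on can be illustrated in plain C.
    The struct and function names below are hypothetical, trimmed-down
    stand-ins for genl_small_ops and genl_family; only the use of const and
    ARRAY_SIZE() mirrors the commit text.

```c
#include <assert.h>
#include <stddef.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical stand-in for an operations table entry. */
struct small_op_model {
    int cmd;
    int (*doit)(int arg);
};

/* Hypothetical stand-in for the family; small_ops is a const pointer. */
struct family_model {
    const struct small_op_model *small_ops;
    size_t n_small_ops;
};

static int echo_doit(int arg) { return arg; }
static int negate_doit(int arg) { return -arg; }

/* Declaring the table const allows the compiler to place it in
 * read-only memory; nothing below ever writes to it. */
static const struct small_op_model demo_ops[] = {
    { .cmd = 1, .doit = echo_doit },
    { .cmd = 2, .doit = negate_doit },
};

static const struct family_model demo_family = {
    .small_ops = demo_ops,
    .n_small_ops = ARRAY_SIZE(demo_ops),
};

/* Look up a command and run its handler; returns 0 if cmd is unknown. */
static int dispatch(const struct family_model *fam, int cmd, int arg)
{
    assert(fam != NULL);
    for (size_t i = 0; i < fam->n_small_ops; i++)
        if (fam->small_ops[i].cmd == cmd)
            return fam->small_ops[i].doit(arg);
    return 0;
}
```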
     
  • The only usages of mptcp_pm_ops are assigning its address to the small_ops
    field of the genl_family struct, which is a const pointer, and applying
    ARRAY_SIZE() on it. Make it const to allow the compiler to put it in
    read-only memory.

    Signed-off-by: Rikard Falkeborn
    Signed-off-by: David S. Miller

    Rikard Falkeborn
     
  • Linus Torvalds
     
  • 1. Keep the code for the normal (non-error) flow at the lowest
    indentation level. And use "goto drop" for all error handling.

    2. Replace code that pads short Ethernet frames with a "__skb_pad" call.

    3. Change "dev_kfree_skb" to "kfree_skb" in error handling code.
    "kfree_skb" is the correct function to call when dropping an skb due to
    an error. "dev_kfree_skb", which is an alias of "consume_skb", is for
    dropping skbs normally (not due to an error).

    Cc: Krzysztof Halasa
    Cc: Stephen Hemminger
    Signed-off-by: Xie He
    Signed-off-by: David S. Miller

    Xie He
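
    The three points of the commit above can be shown together in a small
    user-space sketch. Everything here is hypothetical (struct frame_model,
    struct tx_stats, xmit_frame(), the buffer capacity); the value 60 is the
    minimum Ethernet frame length without FCS.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MIN_FRAME_LEN 60   /* minimum Ethernet frame length without FCS */
#define BUF_CAPACITY 128   /* hypothetical buffer size */

struct frame_model {
    unsigned char data[BUF_CAPACITY];
    size_t len;
};

struct tx_stats {
    unsigned long sent;
    unsigned long dropped;
};

/* Normal flow stays at the lowest indentation level; every error jumps
 * to the single "drop" label. */
static bool xmit_frame(struct frame_model *f, bool link_up, struct tx_stats *st)
{
    assert(f != NULL && st != NULL);

    if (!link_up)
        goto drop;
    if (f->len == 0 || f->len > BUF_CAPACITY)
        goto drop;

    /* Pad short frames with zeros (the role __skb_pad plays in the commit). */
    if (f->len < MIN_FRAME_LEN) {
        memset(f->data + f->len, 0, MIN_FRAME_LEN - f->len);
        f->len = MIN_FRAME_LEN;
    }

    st->sent++;     /* normal release: corresponds to consume_skb() */
    return true;

drop:
    st->dropped++;  /* release due to an error: corresponds to kfree_skb() */
    return false;
}
```

    Keeping the two release paths separate is what lets drop accounting
    distinguish frames freed after normal use from frames freed on error.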
     
  • Openvswitch allows dropping a packet's Ethernet header, therefore
    skb_mpls_push() and skb_mpls_pop() might be called with ethernet=true
    and mac_len=0. In that case the pointer passed to skb_mod_eth_type()
    doesn't point to an Ethernet header and the new Ethertype is written at
    unexpected locations.

    Fix this by verifying that mac_len is big enough to contain an Ethernet
    header.

    Fixes: fa4e0f8855fc ("net/sched: fix corrupted L2 header with MPLS 'push' and 'pop' actions")
    Signed-off-by: Guillaume Nault
    Acked-by: Davide Caratti
    Signed-off-by: David S. Miller

    Guillaume Nault
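
    The length check described above can be sketched on a raw byte buffer.
    mod_eth_type_checked() is a hypothetical helper, not the kernel
    function; it only shows the guard "mac_len is big enough to contain an
    Ethernet header" in front of the Ethertype write.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ETH_HLEN 14  /* destination MAC (6) + source MAC (6) + Ethertype (2) */

/* Overwrite the Ethertype only when the frame really starts with an
 * Ethernet header. Returns true if the field was updated. */
static bool mod_eth_type_checked(uint8_t *mac_header, size_t mac_len,
                                 bool ethernet, uint16_t ethertype)
{
    assert(mac_header != NULL);

    if (!ethernet || mac_len < ETH_HLEN)
        return false;  /* no Ethernet header present: leave the bytes alone */

    /* The Ethertype occupies bytes 12 and 13, in network byte order. */
    mac_header[12] = (uint8_t)(ethertype >> 8);
    mac_header[13] = (uint8_t)(ethertype & 0xff);
    return true;
}
```

    With ethernet=true and mac_len=0, the case from the commit message, the
    helper returns without writing anything.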
     
  • clang static analysis reports this problem:

    drivers/net/ethernet/marvell/mvneta.c:3465:2: warning:
    Attempt to free released memory
    kfree(txq->buf);
    ^~~~~~~~~~~~~~~

    When mvneta_txq_sw_init() fails to alloc txq->tso_hdrs,
    it frees without poisoning txq->buf. The error is caught
    in the mvneta_setup_txqs() caller which handles the error
    by cleaning up all of the txqs with a call to
    mvneta_txq_sw_deinit which also frees txq->buf.

    Since mvneta_txq_sw_deinit() is a general cleaner, all of the
    partial cleaning in mvneta_txq_sw_init()'s error handling
    is not needed.

    Fixes: 2adb719d74f6 ("net: mvneta: Implement software TSO")
    Signed-off-by: Tom Rix
    Signed-off-by: David S. Miller

    Tom Rix
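
    The ownership rule behind the fix above is sketched here with malloc()
    and free(). The names are hypothetical and fail_tso is a fault-injection
    switch added for the example. The init function does no partial cleanup
    on failure and leaves all freeing to the general cleaner.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a TX queue's software state. */
struct txq_model {
    void *buf;
    void *tso_hdrs;
};

/* General cleaner: safe on a partially initialised queue, because
 * free(NULL) is a no-op and each pointer is reset after release. */
static void txq_sw_deinit(struct txq_model *txq)
{
    assert(txq != NULL);
    free(txq->tso_hdrs);
    txq->tso_hdrs = NULL;
    free(txq->buf);
    txq->buf = NULL;
}

/* On failure this returns without freeing anything; the caller is
 * expected to run txq_sw_deinit(), so nothing is freed twice. */
static int txq_sw_init(struct txq_model *txq, bool fail_tso)
{
    assert(txq != NULL);
    txq->buf = NULL;
    txq->tso_hdrs = NULL;

    txq->buf = malloc(64);
    if (!txq->buf)
        return -1;

    txq->tso_hdrs = fail_tso ? NULL : malloc(64);
    if (!txq->tso_hdrs)
        return -1;

    return 0;
}

/* Caller: a single cleanup call handles every failure point. */
static int setup_txq(struct txq_model *txq, bool fail_tso)
{
    int err = txq_sw_init(txq, fail_tso);

    if (err)
        txq_sw_deinit(txq);
    return err;
}
```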
     
  • Although we take RTNL on dump path, it is possible to
    skip RTNL on insertion path. So the following race condition
    is possible:

    rtnl_lock()                     // no rtnl lock
                                    mutex_lock(&idrinfo->lock);
                                    // insert ERR_PTR(-EBUSY)
                                    mutex_unlock(&idrinfo->lock);
    tc_dump_action()
    rtnl_unlock()

    So we have to skip those temporary -EBUSY entries on dump path
    too.

    Reported-and-tested-by: syzbot+b47bc4f247856fb4d9e1@syzkaller.appspotmail.com
    Fixes: 0fedc63fadf0 ("net_sched: commit action insertions together")
    Cc: Vlad Buslov
    Cc: Jamal Hadi Salim
    Cc: Jiri Pirko
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller

    Cong Wang
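
    Skipping the temporary placeholders on the dump path can be sketched
    with a user-space re-creation of the kernel's error-pointer idiom. The
    ERR_PTR() and IS_ERR() definitions below are local approximations, and
    struct action_model and dump_actions() are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Encode a negative errno value in a pointer. */
static inline void *ERR_PTR(long error)
{
    return (void *)(intptr_t)error;
}

/* True if the pointer value lies in the top MAX_ERRNO addresses. */
static inline bool IS_ERR(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct action_model {
    int index;
};

/* Count the dumpable actions, skipping empty slots and the temporary
 * ERR_PTR(-EBUSY) placeholders left by an insertion still in flight. */
static size_t dump_actions(struct action_model *const *slots, size_t n)
{
    size_t dumped = 0;

    assert(slots != NULL);
    for (size_t i = 0; i < n; i++) {
        if (slots[i] == NULL || IS_ERR(slots[i]))
            continue;
        dumped++;
    }
    return dumped;
}
```

    Without the IS_ERR() test, the walker would treat the placeholder as a
    real action and dereference an invalid address.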
     
  • The variable "i" isn't initialized back correctly after the first loop
    under the label inst_rollback gets executed.

    The value of "i" is assigned to be option_count - 1, and the ensuing
    loop (under alloc_rollback) begins by initializing i--.
    Thus, the value of i when the loop begins execution will now become
    i = option_count - 2.

    Thus, when kfree(dst_opts[i]) is called in the second loop in this
    order (i.e., inst_rollback followed by alloc_rollback),
    dst_opts[option_count - 2] is the first element freed, and
    dst_opts[option_count - 1] does not get freed, which causes a memory
    leak.

    This memory leak can be fixed, by assigning i = option_count (instead of
    option_count - 1).

    Fixes: 80f7c6683fe0 ("team: add support for per-port options")
    Reported-by: syzbot+69b804437cfec30deac3@syzkaller.appspotmail.com
    Tested-by: syzbot+69b804437cfec30deac3@syzkaller.appspotmail.com
    Signed-off-by: Anant Thazhemadam
    Signed-off-by: David S. Miller

    Anant Thazhemadam
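
    The off-by-one described above is easy to reproduce in isolation. In
    this sketch rollback_free() is a hypothetical helper whose loop has the
    same shape as the one in the commit message (it starts with i--), so the
    starting value decides whether the last element is freed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Frees opts[start - 1] down to opts[0], mirroring a rollback loop of
 * the form "for (i--; i >= 0; i--)". Returns the number of slots freed. */
static int rollback_free(void **opts, int start)
{
    int freed = 0;
    int i = start;

    assert(opts != NULL);
    for (i--; i >= 0; i--) {
        free(opts[i]);
        opts[i] = NULL;
        freed++;
    }
    return freed;
}
```

    Entering the loop with i = option_count - 1 skips the last slot, while
    entering it with i = option_count releases all of them.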
     
  • Michael Chan says:

    ====================
    bnxt_en: net-next updates.

    This series starts off with the usual update of the firmware interface
    spec. A new firmware status bit in the interface will be used in a later
    patch. The following patches add the infrastructure to read the firmware
    status very early during driver probe, and this will allow patch #4 to do
    the recovery if needed.

    The rest of the patches add improvements to the current RX reset
    logic by localizing the reset to the affected RX ring only and to
    reset only if firmware has determined that the RX ring is in permanent
    error state.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Currently, the driver will schedule RX ring reset when we get a buffer
    error in the RX completion record. These RX buffer errors can be due
    to normal out-of-buffer conditions or a permanent error in the RX
    ring. Because the driver cannot distinguish between these 2
    conditions, we assume all these buffer errors require reset.

    This is very disruptive when it is just a normal out-of-buffer
    condition. Newer firmware will now monitor the rings for the permanent
    failure and will send a notification to the driver when it happens.
    This allows the driver to reset only when such a notification is
    received. In environments where we have predominantly out-of-buffer
    conditions, we now can avoid these unnecessary resets.

    Reviewed-by: Edwin Peer
    Signed-off-by: Michael Chan
    Signed-off-by: David S. Miller

    Michael Chan
     
  • There is logic in the RX path to detect unexpected handles in the
    RX completion. We'll print a warning and schedule a reset. The
    next expected handle is then set to 0xffff which is guaranteed to
    not match any valid handle. This will force all remaining packets in
    the ring to be discarded before the reset. There can be hundreds of
    these packets remaining in the ring and there is no need to print the
    warnings for these forced errors.

    Reviewed-by: Pavan Chebbi
    Reviewed-by: Edwin Peer
    Signed-off-by: Michael Chan
    Signed-off-by: David S. Miller

    Michael Chan
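
    The warning suppression described above can be modelled with a sentinel
    comparison. The struct, the field names, and the handle sequencing in
    this sketch are hypothetical; only the 0xffff sentinel and the "warn
    once, then discard quietly" behaviour come from the commit text. Valid
    handles are assumed to be below 0xffff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define INVALID_HANDLE 0xffff  /* sentinel that matches no valid handle */

struct rx_ring_model {
    uint16_t next_expected;
    bool reset_scheduled;
    unsigned int warnings;
    unsigned int discarded;
};

/* Returns true if the completion is accepted, false if it is discarded. */
static bool rx_check_handle(struct rx_ring_model *r, uint16_t handle)
{
    assert(r != NULL);

    if (handle == r->next_expected) {
        r->next_expected = (uint16_t)(handle + 1);  /* hypothetical sequencing */
        return true;
    }

    /* Warn and schedule the reset only for the first, genuine mismatch.
     * Once the sentinel is in place, later mismatches are forced ones. */
    if (r->next_expected != INVALID_HANDLE) {
        r->warnings++;
        r->reset_scheduled = true;
        r->next_expected = INVALID_HANDLE;
    }
    r->discarded++;
    return false;
}
```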
     
  • Add a per ring rx_resets counter to count these RX resets.

    Signed-off-by: Michael Chan
    Signed-off-by: David S. Miller

    Michael Chan
     
  • On some older chips, it is necessary to do a reset when we get buffer
    errors associated with an RX ring. These buffer errors may become
    frequent if the RX ring underruns under heavy traffic. The current
    code does a global reset of all resources when this happens. This
    works but creates a big disruption of all rings when one RX ring is
    having a problem. This patch implements a localized RX ring reset of
    just the RX ring having the issue. All other rings including all
    TX rings will not be affected by this single RX ring reset.

    Only the older chips prior to the P5 class support this reset.
    Because it is not a global reset, packets may still be arriving
    while we are calling firmware to reset that ring. We need to be
    sure that we don't post any buffers during this time while the
    ring is undergoing reset. After firmware completes successfully,
    the ring will be in the reset state with no buffers and we can start
    filling it with new buffers and posting them.

    Reviewed-by: Pavan Chebbi
    Signed-off-by: Edwin Peer
    Signed-off-by: Michael Chan
    Signed-off-by: David S. Miller

    Michael Chan
     
  • bnxt_init_one_rx_ring() includes logic to initialize the BDs for one RX
    ring and to allocate the buffers. Separate the allocation logic into a
    new bnxt_alloc_one_rx_ring() function. The allocation function will be
    used later to allocate new buffers for one specified RX ring when we
    reset that RX ring.

    Reviewed-by: Pavan Chebbi
    Signed-off-by: Michael Chan
    Signed-off-by: David S. Miller

    Michael Chan
     
  • bnxt_free_rx_skbs() frees all the allocated buffers and SKBs for
    every RX ring. Refactor this function by calling a new function
    bnxt_free_one_rx_ring_skbs() to free these buffers on one specified
    RX ring at a time. This is preparation work for resetting one RX
    ring during run-time.

    Reviewed-by: Pavan Chebbi
    Signed-off-by: Michael Chan
    Signed-off-by: David S. Miller

    Michael Chan
     
  • If firmware does not come out of reset, log FW health status info
    to provide more information on firmware status.

    Signed-off-by: Michael Chan
    Signed-off-by: David S. Miller

    Michael Chan
     
  • The NS3 SoC platforms require assistance from the OP-TEE to recover
    firmware if a crash occurs while no driver is bound. The
    CRASHED_NO_MASTER condition is recorded in the firmware status register
    during the crash to indicate when driver intervention is needed to
    coordinate a firmware reload. This condition is detected during early
    driver initialization in order to effect a firmware fastboot on
    supported platforms when necessary.

    Reviewed-by: Vasundhara Volam
    Signed-off-by: Edwin Peer
    Signed-off-by: Michael Chan
    Signed-off-by: David S. Miller

    Edwin Peer
     
  • Firmware now supports device independent discovery of the status
    register location. This status register can provide more detailed
    information about firmware errors, especially if problems occur
    before the HWRM interface is functioning. Attempt to map this
    register if it is present and report the firmware status on firmware
    init failures.

    Signed-off-by: Edwin Peer
    Signed-off-by: Michael Chan
    Signed-off-by: David S. Miller

    Edwin Peer
     
  • The allocator for the firmware health structure conflates allocation
    and capability checks, limiting the reusability of the code. This patch
    separates out the capability check and disablement and improves the
    warning message to better describe the consequences of an allocation
    failure.

    Signed-off-by: Edwin Peer
    Signed-off-by: Michael Chan
    Signed-off-by: David S. Miller

    Edwin Peer
     
  • The main changes are to extend hwrm_nvm_get_dev_info_output() for stored
    firmware versions and to add a new flag to fw_status_reg.

    Reviewed-by: Edwin Peer
    Signed-off-by: Vasundhara Volam
    Signed-off-by: Michael Chan
    Signed-off-by: David S. Miller

    Vasundhara Volam
     
  • Andrew Lunn says:

    ====================
    mv88e6xxx: Add per port devlink regions

    This patchset extends devlink regions to support per port regions, and
    then makes use of them to support the ports of the mv88e6xxx switches.

    root@rap:~# devlink region show
    mdio_bus/gpio-0:00/global1: size 64 snapshot []
    mdio_bus/gpio-0:00/global2: size 64 snapshot []
    mdio_bus/gpio-0:00/atu: size 49152 snapshot []
    mdio_bus/gpio-0:00/0/port: size 64 snapshot []
    mdio_bus/gpio-0:00/1/port: size 64 snapshot []
    mdio_bus/gpio-0:00/2/port: size 64 snapshot []
    mdio_bus/gpio-0:00/3/port: size 64 snapshot []
    mdio_bus/gpio-0:00/4/port: size 64 snapshot []
    mdio_bus/gpio-0:00/5/port: size 64 snapshot []
    mdio_bus/gpio-0:00/6/port: size 64 snapshot []
    mdio_bus/gpio-0:00/7/port: size 64 snapshot []
    mdio_bus/gpio-0:00/8/port: size 64 snapshot []
    mdio_bus/gpio-0:00/9/port: size 64 snapshot []
    mdio_bus/gpio-0:00/10/port: size 64 snapshot []

    root@rap:~# devlink region new mdio_bus/gpio-0:00/1/port snapshot 42
    root@rap:~# devlink region dump mdio_bus/gpio-0:00/1/port snapshot 42
    0000000000000000 4f 1e 3e 20 00 01 01 39 3f 05 00 00 fd 07 00 00
    0000000000000010 80 00 01 00 00 00 00 00 00 00 00 00 00 00 00 91
    0000000000000020 00 00 00 00 00 00 00 00 00 00 00 00 22 00 00 00
    0000000000000030 07 3e 00 00 00 00 00 80 00 00 00 00 00 00 5b 00

    In order to support all ports of the switch, a new devlink flavour has
    been added for unused ports:

    mdio_bus/gpio-0:00/0: type notset flavour unused splittable false
    mdio_bus/gpio-0:00/1: type notset flavour cpu port 1 splittable false
    mdio_bus/gpio-0:00/2: type eth netdev red flavour physical port 2 splittable fae
    mdio_bus/gpio-0:00/3: type eth netdev blue flavour physical port 3 splittable fe
    mdio_bus/gpio-0:00/4: type eth netdev green flavour physical port 4 splittable e
    mdio_bus/gpio-0:00/5: type notset flavour unused splittable false
    mdio_bus/gpio-0:00/6: type notset flavour unused splittable false
    mdio_bus/gpio-0:00/7: type notset flavour unused splittable false
    mdio_bus/gpio-0:00/8: type eth netdev waic0 flavour physical port 8 splittable e
    mdio_bus/gpio-0:00/9: type notset flavour unused splittable false
    mdio_bus/gpio-0:00/10: type notset flavour unused splittable false

    The DSA core now creates the devlink port instances earlier, so that
    the driver setup function can make use of them.

    v3:
    Whitespace cleanup
    Added justification for devlink unused flavour
    Added Tested-by, Reviewed-by:
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Add a devlink region to return the per port registers.

    Signed-off-by: Andrew Lunn
    Reviewed-by: Florian Fainelli
    Reviewed-by: Vladimir Oltean
    Signed-off-by: David S. Miller

    Andrew Lunn
     
  • Hide away from DSA drivers how devlink works.

    Signed-off-by: Andrew Lunn
    Reviewed-by: Florian Fainelli
    Reviewed-by: Vladimir Oltean
    Signed-off-by: David S. Miller

    Andrew Lunn
     
  • Allow DSA drivers to make use of devlink port regions, via simple
    wrappers.

    Reviewed-by: Vladimir Oltean
    Tested-by: Vladimir Oltean
    Signed-off-by: Andrew Lunn
    Reviewed-by: Florian Fainelli
    Signed-off-by: David S. Miller

    Andrew Lunn
     
  • Allow regions to be registered to a devlink port. The same netlink API
    is used, but the port index is provided to indicate when a region is a
    port region as opposed to a device region.

    Reviewed-by: Vladimir Oltean
    Tested-by: Vladimir Oltean
    Signed-off-by: Andrew Lunn
    Signed-off-by: David S. Miller

    Andrew Lunn
     
  • DSA drivers want to create regions on devlink ports as well as the
    devlink device instance, in order to export registers and other tables
    per port. To keep all this code together in the drivers, have the
    devlink ports registered early, so the setup() method can setup both
    device and port devlink regions.

    v3:
    Remove dp->setup
    Move common code out of switch statement.
    Fix wrong goto

    Signed-off-by: Andrew Lunn
    Reviewed-by: Florian Fainelli
    Reviewed-by: Vladimir Oltean
    Tested-by: Vladimir Oltean
    Signed-off-by: David S. Miller

    Andrew Lunn
     
  • If a port is unused, still create a devlink port for it, but set the
    flavour to unused. This allows us to attach devlink regions to the
    port, etc.

    Reviewed-by: Vladimir Oltean
    Tested-by: Vladimir Oltean
    Signed-off-by: Andrew Lunn
    Reviewed-by: Florian Fainelli
    Signed-off-by: David S. Miller

    Andrew Lunn
     
  • Not all ports of a switch need to be used, particularly in embedded
    systems. Add a port flavour for ports which physically exist in the
    switch, but are not connected to the front panel etc, and so are
    unused. By having unused ports present in devlink, it gives a more
    accurate representation of the hardware. It also allows regions to be
    associated with such ports, which makes it possible, for example, to
    check that unused ports are correctly powered off, or to compare the
    probable reset defaults of unused ports against used ports that are
    experiencing issues.

    Actually registering unused ports and setting the flavour to unused is
    optional. The DSA core will register all such switch ports, but such
    ports are expected to be limited in number. Bigger ASICs may decide
    not to list unused ports.

    v2:
    Expand the description about why it is useful

    Reviewed-by: Vladimir Oltean
    Tested-by: Vladimir Oltean
    Signed-off-by: Andrew Lunn
    Reviewed-by: Florian Fainelli
    Signed-off-by: David S. Miller

    Andrew Lunn
     
  • Pablo Neira Ayuso says:

    ====================
    Netfilter updates for net-next

    The following patchset contains Netfilter updates for net-next:

    1) Rename 'searched' column to 'clashres' in conntrack /proc/ stats
    to amend a recent patch, from Florian Westphal.

    2) Remove unused nft_data_debug(), from YueHaibing.

    3) Remove unused definitions in IPVS, also from YueHaibing.

    4) Fix user data memleak in tables and objects, this is also amending
    a recent patch, from Jose M. Guisado.

    5) Use nla_memdup() to allocate user data in table and objects, also
    from Jose M. Guisado

    6) User data support for chains, from Jose M. Guisado

    7) Remove unused definition in nf_tables_offload, from YueHaibing.

    8) Use kvzalloc() in ip_set_alloc(), from Vasily Averin.

    9) Fix false positive reported by lockdep in nfnetlink mutexes,
    from Florian Westphal.

    10) Extend fast variant of cmp for neq operation, from Phil Sutter.

    11) Implement fast bitwise variant, also from Phil Sutter.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • A typical use of bitwise expression is to mask out parts of an IP
    address when matching on the network part only. Optimize for this common
    use with a fast variant for NFT_BITWISE_BOOL-type expressions operating
    on 32bit-sized values.

    Signed-off-by: Phil Sutter
    Signed-off-by: Pablo Neira Ayuso

    Phil Sutter
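
    The operation that the fast variant above specialises is a 32-bit
    mask-then-xor. A minimal sketch, with hypothetical struct and function
    names and a small ipv4() helper added for readability:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical holder for the two 32-bit operands of a boolean
 * bitwise expression. */
struct bitwise_fast_model {
    uint32_t mask;
    uint32_t xor_val;
};

/* dst = (src & mask) ^ xor, evaluated on a single 32-bit word. */
static uint32_t bitwise_fast_eval(const struct bitwise_fast_model *e, uint32_t src)
{
    assert(e != NULL);
    return (src & e->mask) ^ e->xor_val;
}

/* Build a host-order IPv4 address from its dotted-quad parts. */
static uint32_t ipv4(uint8_t a, uint8_t b, uint8_t c, uint8_t d)
{
    return ((uint32_t)a << 24) | ((uint32_t)b << 16) | ((uint32_t)c << 8) | d;
}
```

    For the use case named in the commit, a mask of 255.255.255.0 with a
    zero xor value reduces 192.168.1.77 to its network part, 192.168.1.0.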
     
  • Add a boolean indicating NFT_CMP_NEQ. To include it into the match
    decision, it is sufficient to XOR it with the data comparison's result.

    While at it, store the mask that is calculated during expression
    init and free the eval routine from having to recalculate it each time.

    Signed-off-by: Phil Sutter
    Signed-off-by: Pablo Neira Ayuso

    Phil Sutter
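
    Both ideas in the commit above, the XOR with an inversion flag and the
    mask computed once at init, fit in a few lines. The names below are
    hypothetical and the mask derivation from a bit length is an assumption
    made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical private data of a fast compare expression. */
struct cmp_fast_model {
    uint32_t data;
    uint32_t mask;  /* computed once at init, reused on every evaluation */
    bool inv;       /* true for "not equal" */
};

/* Init: derive the mask from the compared length in bits (1..32). */
static struct cmp_fast_model cmp_fast_init(uint32_t data, unsigned int len_bits, bool neq)
{
    struct cmp_fast_model c;

    assert(len_bits >= 1 && len_bits <= 32);
    c.mask = ~0U >> (32 - len_bits);
    c.data = data & c.mask;
    c.inv = neq;
    return c;
}

/* Eval: XOR the equality result with the inversion flag. */
static bool cmp_fast_match(const struct cmp_fast_model *c, uint32_t reg)
{
    assert(c != NULL);
    return ((reg & c->mask) == c->data) ^ c->inv;
}
```

    The same evaluation routine therefore serves both operators: with inv
    false it matches on equality, with inv true it matches on inequality.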
     
  • From time to time there are lockdep reports similar to this one:

    WARNING: possible circular locking dependency detected
    ------------------------------------------------------
    000000004f61aa56 (&table[i].mutex){+.+.}, at: nfnl_lock [nfnetlink]
    but task is already holding lock:
    [..] (&net->nft.commit_mutex){+.+.}, at: nf_tables_valid_genid [nf_tables]
    which lock already depends on the new lock.
    the existing dependency chain (in reverse order) is:
    -> #1 (&net->nft.commit_mutex){+.+.}:
    [..]
    nf_tables_valid_genid+0x18/0x60 [nf_tables]
    nfnetlink_rcv_batch+0x24c/0x620 [nfnetlink]
    nfnetlink_rcv+0x110/0x140 [nfnetlink]
    netlink_unicast+0x12c/0x1e0
    [..]
    sys_sendmsg+0x18/0x40
    linux_sparc_syscall+0x34/0x44
    -> #0 (&table[i].mutex){+.+.}:
    [..]
    nfnl_lock+0x24/0x40 [nfnetlink]
    ip_set_nfnl_get_byindex+0x19c/0x280 [ip_set]
    set_match_v1_checkentry+0x14/0xc0 [xt_set]
    xt_check_match+0x238/0x260 [x_tables]
    __nft_match_init+0x160/0x180 [nft_compat]
    [..]
    sys_sendmsg+0x18/0x40
    linux_sparc_syscall+0x34/0x44
    other info that might help us debug this:
    Possible unsafe locking scenario:
    CPU0                                CPU1
    ----                                ----
    lock(&net->nft.commit_mutex);
                                        lock(&table[i].mutex);
                                        lock(&net->nft.commit_mutex);
    lock(&table[i].mutex);

    Lockdep considers this an ABBA deadlock because the different nfnl subsys
    mutexes reside in the same lockdep class, but this is a false positive.

    CPU1 table[i] refers to the nftables subsys mutex, whereas CPU0 locks
    the ipset subsys mutex.

    Yi Che reported a similar lockdep splat, this time between ipset and
    ctnetlink subsys mutexes.

    Time to place them in distinct classes to avoid these warnings.

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     
  • Currently a netadmin inside a non-trusted container can quickly allocate
    a whole node's memory by requesting a huge ipset hashtable.
    Other ipset-related memory allocations should be restricted too.

    v2: fixed typo ALLOC -> ACCOUNT

    Signed-off-by: Vasily Averin
    Signed-off-by: Pablo Neira Ayuso

    Vasily Averin