06 Mar, 2020

1 commit

  • The current codebase makes use of the zero-length array language
    extension to the C90 standard, but the preferred mechanism to declare
    variable-length types such as these ones is a flexible array member[1][2],
    introduced in C99:

    struct foo {
        int stuff;
        struct boo array[];
    };

    By making use of the mechanism above, we will get a compiler warning
    in case the flexible array does not occur last in the structure, which
    will help us prevent some kinds of undefined behavior bugs from being
    inadvertently introduced[3] to the codebase from now on.

    Also, notice that dynamic memory allocations won't be affected by
    this change:

    "Flexible array members have incomplete type, and so the sizeof operator
    may not be applied. As a quirk of the original implementation of
    zero-length arrays, sizeof evaluates to zero."[1]
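    A minimal userspace sketch of the allocation pattern, assuming a hypothetical `struct boo` with a single `int` member and a hypothetical helper `foo_alloc()`. Because `sizeof(struct foo)` covers only the fixed members, the storage for the trailing array is added explicitly, exactly as with the old zero-length array idiom. In-kernel code would typically compute the same size with the `struct_size()` helper from `<linux/overflow.h>`, which also guards against multiplication overflow.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical element type: the commit message does not define struct boo. */
struct boo {
	int value;
};

struct foo {
	int stuff;
	struct boo array[];	/* C99 flexible array member: must come last */
};

/*
 * Hypothetical helper: allocate a struct foo with room for n elements.
 * sizeof(struct foo) covers only the fixed members, so the storage for
 * the trailing array is added explicitly.
 */
static struct foo *foo_alloc(size_t n)
{
	struct foo *f = malloc(sizeof(*f) + n * sizeof(f->array[0]));

	if (!f)
		return NULL;
	f->stuff = (int)n;
	for (size_t i = 0; i < n; i++)
		f->array[i].value = (int)i;
	return f;
}
```

    Moving `array` anywhere but last in `struct foo` makes the compiler reject or warn about the declaration, which is the safety property the commit relies on.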

    This issue was found with the help of Coccinelle.

    [1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
    [2] https://github.com/KSPP/linux/issues/21
    [3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour")

    Signed-off-by: Gustavo A. R. Silva
    Reviewed-by: Horia Geantă
    Signed-off-by: Herbert Xu

    Gustavo A. R. Silva
     

28 Feb, 2020

1 commit


22 Feb, 2020

3 commits

  • Hardware registers of devices under control of power management cannot
    be accessed at all times. If such a device is suspended, register
    accesses may lead to undefined behavior, like reading bogus values, or
    causing exceptions or system lock-ups.

    Extend struct debugfs_regset32 with an optional field to let device
    drivers specify the device the registers in the set belong to. This
    allows debugfs_show_regset32() to make sure the device is resumed while
    its registers are being read.
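    A userspace model of the idea, under stated assumptions: `struct model_device`, `struct model_regset`, `model_get()` and `model_put()` are hypothetical stand-ins, not the kernel's types or runtime PM calls. The optional `dev` pointer mirrors the new field: when it is set, the device is resumed around the register reads; when it is NULL, reads happen regardless of power state and may return bogus values.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a power-managed device; not a kernel type. */
struct model_device {
	bool suspended;
	int usage;		/* reference count, like a runtime PM usage count */
	uint32_t regs[4];	/* contents only readable while resumed */
};

/* Model of a register set; dev is optional, as in the commit. */
struct model_regset {
	struct model_device *hw;	/* device that owns the registers */
	int nregs;
	struct model_device *dev;	/* if set, keep it resumed while reading */
};

/* A suspended device yields a bogus value instead of register contents. */
static uint32_t model_readl(const struct model_device *d, int i)
{
	return d->suspended ? 0xffffffffu : d->regs[i];
}

static void model_get(struct model_device *d)
{
	if (d->usage++ == 0)
		d->suspended = false;	/* resume on first reference */
}

static void model_put(struct model_device *d)
{
	if (--d->usage == 0)
		d->suspended = true;	/* suspend again once idle */
}

/* Read all registers, resuming rs->dev around the access when it is set. */
static void model_show_regset(const struct model_regset *rs, uint32_t *out)
{
	if (rs->dev)
		model_get(rs->dev);
	for (int i = 0; i < rs->nregs; i++)
		out[i] = model_readl(rs->hw, i);
	if (rs->dev)
		model_put(rs->dev);
}
```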

    Signed-off-by: Geert Uytterhoeven
    Reviewed-by: Niklas Söderlund
    Reviewed-by: Greg Kroah-Hartman
    Acked-by: Rafael J. Wysocki
    Signed-off-by: Herbert Xu

    Geert Uytterhoeven
     
  • Register qm with the uacce framework for the user crypto driver

    Reviewed-by: Greg Kroah-Hartman
    Reviewed-by: Jonathan Cameron
    Signed-off-by: Zhangfei Gao
    Signed-off-by: Zhou Wang
    Signed-off-by: Herbert Xu

    Zhangfei Gao
     
  • Uacce (Unified/User-space-access-intended Accelerator Framework) aims to
    provide Shared Virtual Addressing (SVA) between accelerators and processes,
    so that an accelerator can access any data structure of the main CPU.
    This differs from the data sharing between a CPU and an I/O device, which
    share only data content rather than addresses.
    Because the address space is unified, the hardware and the user-space
    process can use the same virtual addresses when communicating.

    Uacce creates a chrdev for every registration; a queue is allocated to
    the process when the chrdev is opened. The process can then access the
    hardware resource by interacting with the queue file. By mmapping the
    queue file space into user space, the process can submit requests
    directly to the hardware without a syscall into kernel space.

    The IOMMU core only tracks mm<->device bonds at the moment, because it
    only needs to handle IOTLB invalidation and PASID table entries. However,
    uacce needs a finer granularity since multiple queues from the same
    device can be bound to an mm. When the mm exits, all bound queues must
    be stopped so that the IOMMU can safely clear the PASID table entry and
    reallocate the PASID.

    An intermediate struct uacce_mm links uacce devices and queues.
    Note that an mm may be bound to multiple devices but an uacce_mm
    structure only ever belongs to a single device, because we don't need
    anything more complex (if multiple devices are bound to one mm, then
    we'll create one uacce_mm for each bond).

    uacce_device --+-- uacce_mm --+-- uacce_queue
                   |              '-- uacce_queue
                   |
                   '-- uacce_mm --+-- uacce_queue
                                  +-- uacce_queue
                                  '-- uacce_queue
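    A userspace sketch of the bookkeeping in the diagram, assuming hypothetical struct and field names (`model_uacce_device`, `model_uacce_mm`, `model_queue`) that are not the kernel's definitions. Each device keeps one `uacce_mm` per bound mm, each `uacce_mm` lists its queues, and an mm exit stops every queue bound to that mm on that device so the PASID can be released.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Illustrative model only; not the kernel's uacce structures. */
struct model_queue {
	bool running;
	struct model_queue *next;	/* sibling queue in the same uacce_mm */
};

struct model_uacce_mm {
	const void *mm;			/* process address space (opaque) */
	struct model_queue *queues;
	struct model_uacce_mm *next;	/* sibling bond on the same device */
};

struct model_uacce_device {
	struct model_uacce_mm *mms;
};

/* Find the device's uacce_mm for this mm, creating one on first bond. */
static struct model_uacce_mm *find_or_create_mm(struct model_uacce_device *dev,
						const void *mm)
{
	struct model_uacce_mm *m;

	for (m = dev->mms; m; m = m->next)
		if (m->mm == mm)
			return m;
	m = calloc(1, sizeof(*m));
	if (!m)
		return NULL;
	m->mm = mm;
	m->next = dev->mms;
	dev->mms = m;
	return m;
}

/* Bind a queue to an mm on this device and start it. */
static int bind_queue(struct model_uacce_device *dev, const void *mm,
		      struct model_queue *q)
{
	struct model_uacce_mm *m = find_or_create_mm(dev, mm);

	if (!m)
		return -1;
	q->running = true;
	q->next = m->queues;
	m->queues = q;
	return 0;
}

/* On mm exit, stop all queues bound to it and drop the bond. */
static int mm_exit(struct model_uacce_device *dev, const void *mm)
{
	struct model_uacce_mm **pp, *m;
	int stopped = 0;

	for (pp = &dev->mms; (m = *pp) != NULL; pp = &m->next) {
		if (m->mm != mm)
			continue;
		for (struct model_queue *q = m->queues; q; q = q->next) {
			q->running = false;
			stopped++;
		}
		*pp = m->next;	/* unlink before freeing */
		free(m);
		break;
	}
	return stopped;
}
```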

    Reviewed-by: Greg Kroah-Hartman
    Reviewed-by: Jonathan Cameron
    Signed-off-by: Kenneth Lee
    Signed-off-by: Zaibo Xu
    Signed-off-by: Zhou Wang
    Signed-off-by: Jean-Philippe Brucker
    Signed-off-by: Zhangfei Gao
    Signed-off-by: Herbert Xu

    Kenneth Lee
     

10 Feb, 2020

4 commits

  • Pull new zonefs file system from Damien Le Moal:
    "Zonefs is a very simple file system exposing each zone of a zoned
    block device as a file.

    Unlike a regular file system with native zoned block device support
    (e.g. f2fs or the on-going btrfs effort), zonefs does not hide the
    sequential write constraint of zoned block devices from the user. As a
    result, zonefs is not a POSIX compliant file system. Its goal is to
    simplify the implementation of zoned block devices support in
    applications by replacing raw block device file accesses with a richer
    file based API, avoiding relying on direct block device file ioctls
    which may be more obscure to developers.

    One example of this approach is the implementation of LSM
    (log-structured merge) tree structures (such as used in RocksDB and
    LevelDB) on zoned block devices by allowing SSTables to be stored in a
    zone file similarly to a regular file system rather than as a range of
    sectors of a zoned device. The introduction of the higher level
    construct "one file is one zone" can help reduce the amount of
    changes needed in the application while at the same time allowing the
    use of zoned block devices with various programming languages other
    than C.

    Zonefs IO management implementation uses the new iomap generic code.
    Zonefs has been successfully tested using a functional test suite
    (available with zonefs userland format tool on github) and a prototype
    implementation of LevelDB on top of zonefs"

    * tag 'zonefs-5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs:
    zonefs: Add documentation
    fs: New zonefs file system
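    A toy model of the sequential write constraint that zonefs exposes, assuming a hypothetical 16-byte zone and a hypothetical `model_zone_pwrite()` helper. The file size doubles as the zone write pointer, and a write is accepted only when it starts exactly there; the error codes are illustrative and not necessarily what zonefs returns.

```c
#include <errno.h>
#include <string.h>
#include <sys/types.h>

#define MODEL_ZONE_CAP 16	/* hypothetical zone capacity in bytes */

struct model_seq_zone {
	char data[MODEL_ZONE_CAP];
	size_t wp;		/* write pointer == current file size */
};

/* Accept only appending writes that fit in the remaining zone space. */
static ssize_t model_zone_pwrite(struct model_seq_zone *z, const void *buf,
				 size_t len, size_t off)
{
	if (off != z->wp)
		return -EINVAL;		/* not at the write pointer */
	if (len > MODEL_ZONE_CAP - z->wp)
		return -ENOSPC;		/* zone would overflow */
	memcpy(z->data + z->wp, buf, len);
	z->wp += len;
	return (ssize_t)len;
}
```

    An application such as an LSM-tree store fits this model naturally, since SSTables are written once, front to back.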

    Linus Torvalds
     
  • Pull x86 fixes from Thomas Gleixner:
    "A set of fixes for X86:

    - Ensure that the PIT is set up when the local APIC is disabled or
    configured in legacy mode. This is caused by an ordering issue
    introduced in the recent changes which skip PIT initialization when
    the TSC and APIC frequencies are already known.

    - Handle malformed SRAT tables during early ACPI parsing which caused
    an infinite loop and a boot hang.

    - Fix a long standing race in the affinity setting code which affects
    PCI devices with non-maskable MSI interrupts. The problem is caused
    by the non-atomic writes of the MSI address (destination APIC id)
    and data (vector) fields which the device uses to construct the MSI
    message. The non-atomic writes are mandated by PCI.

    If both fields change and the device raises an interrupt after
    writing address and before writing data, then the MSI block
    constructs an inconsistent message which causes interrupts to be
    lost and subsequent malfunction of the device.

    The fix is to redirect the interrupt to the new vector on the
    current CPU first and then switch it over to the new target CPU.
    This allows an interrupt raised during the transitional stage (old
    CPU, new vector) to be observed in the APIC IRR and retriggered on
    the new target CPU and the new vector.

    The potential spurious interrupts caused by this are harmless and
    can in the worst case expose a buggy driver (all handlers have to
    be able to deal with spurious interrupts as they can and do happen
    for various reasons).

    - Add the missing suspend/resume mechanism for the HYPERV hypercall
    page; its absence prevents resuming from hibernation on HYPERV guests. This
    change got lost before the merge window.

    - Mask the IOAPIC before disabling the local APIC to prevent
    potentially stale IOAPIC remote IRR bits which cause stale
    interrupt lines after resume"

    * tag 'x86-urgent-2020-02-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    x86/apic: Mask IOAPIC entries when disabling the local APIC
    x86/hyperv: Suspend/resume the hypercall page for hibernation
    x86/apic/msi: Plug non-maskable MSI affinity race
    x86/boot: Handle malformed SRAT tables during early ACPI parsing
    x86/timer: Don't skip PIT setup when APIC is disabled or in legacy mode
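    A toy model of the MSI affinity race described above, not kernel code: the MSI "address" is reduced to a CPU number and the "data" to a vector, written as two separate stores. The table `ok` marks the (CPU, vector) pairs where a raised interrupt is not lost; treating (old CPU, new vector) as safe is an assumption based on the description that such an interrupt is observed in the IRR and retriggered, a mechanism the model does not simulate.

```c
#include <stdbool.h>

/* Hypothetical model: address selects a CPU, data selects a vector. */
struct msi_model_msg {
	int cpu;	/* address field, written first */
	int vector;	/* data field, written second */
};

/* (cpu, vector) pairs where a raised interrupt is observed, not lost. */
struct vector_table {
	bool ok[4][8];
};

static bool delivered(const struct vector_table *t, struct msi_model_msg m)
{
	return t->ok[m.cpu][m.vector];
}

/*
 * Write a new message as two separate stores (address, then data) and
 * count the intermediate states in which an interrupt would be lost.
 */
static int write_msg_nonatomic(const struct vector_table *t,
			       struct msi_model_msg *cur,
			       struct msi_model_msg next)
{
	int lost = 0;

	cur->cpu = next.cpu;		/* address write */
	if (!delivered(t, *cur))
		lost++;
	cur->vector = next.vector;	/* data write */
	if (!delivered(t, *cur))
		lost++;
	return lost;
}

/*
 * Fixed ordering: new vector on the current CPU first, then the new CPU.
 * Each step changes only one field, so a torn (new CPU, old vector)
 * message is never visible.
 */
static int move_two_step(const struct vector_table *t,
			 struct msi_model_msg *cur,
			 struct msi_model_msg target)
{
	struct msi_model_msg mid = { cur->cpu, target.vector };
	int lost = write_msg_nonatomic(t, cur, mid);

	return lost + write_msg_nonatomic(t, cur, target);
}
```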

    Linus Torvalds
     
  • Pull perf fixes from Thomas Gleixner:
    "A set of fixes and improvements for the perf subsystem:

    Kernel fixes:

    - Install cgroup events to the correct CPU context to prevent a
    potential list double add

    - Prevent an integer underflow in the perf mlock accounting

    - Add a missing prototype for arch_perf_update_userpage()

    Tooling:

    - Add a missing unlock in the error path of maps__insert() in perf
    maps.

    - Fix the build with the latest libbfd

    - Fix the perf parser so it does not delete parse event terms, which
    caused a regression for using perf with the ARM CoreSight as the
    sink configuration was missing due to the deletion.

    - Fix the double free in the perf CPU map merging test case

    - Add the missing ustring support for the perf probe command"

    * tag 'perf-urgent-2020-02-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    perf maps: Add missing unlock to maps__insert() error case
    perf probe: Add ustring support for perf probe command
    perf: Make perf able to build with latest libbfd
    perf test: Fix test case Merge cpu map
    perf parse: Copy string to perf_evsel_config_term
    perf parse: Refactor 'struct perf_evsel_config_term'
    kernel/events: Add a missing prototype for arch_perf_update_userpage()
    perf/cgroups: Install cgroup events to correct cpuctx
    perf/core: Fix mlock accounting in perf_mmap()

    Linus Torvalds
     
  • Pull interrupt fixes from Thomas Gleixner:
    "A set of fixes for the interrupt subsystem:

    - Provision only ACPI enabled redistributors on GICv3

    - Use the proper command columns when building the INVALL command for
    the GICv3-ITS

    - Ensure the allocation of the L2 vPE table for GICv4.1

    - Correct the GICv4.1 VPROPBASER programming so it uses the proper
    size

    - A set of small GICv4.1 tidy up patches

    - Configuration cleanup for C-SKY interrupt chip

    - Clarify the function documentation for irq_set_wake() to document
    that the wakeup functionality is orthogonal to the irq
    disable/enable mechanism"

    * tag 'irq-urgent-2020-02-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    irqchip/gic-v3-its: Rename VPENDBASER/VPROPBASER accessors
    irqchip/gic-v3-its: Remove superfluous WARN_ON
    irqchip/gic-v4.1: Drop 'tmp' in inherit_vpe_l1_table_from_rd()
    irqchip/gic-v4.1: Ensure L2 vPE table is allocated at RD level
    irqchip/gic-v4.1: Set vpe_l1_base for all redistributors
    irqchip/gic-v4.1: Fix programming of GICR_VPROPBASER_4_1_SIZE
    genirq: Clarify that irq wake state is orthogonal to enable/disable
    irqchip/gic-v3-its: Reference to its_invall_cmd descriptor when building INVALL
    irqchip: Some Kconfig cleanup for C-SKY
    irqchip/gic-v3: Only provision redistributors that are enabled in ACPI

    Linus Torvalds
     

09 Feb, 2020

8 commits

  • Pull networking fixes from David Miller:

    1) Unbalanced locking in mwifiex_process_country_ie, from Brian Norris.

    2) Fix thermal zone registration in iwlwifi, from Andrei
    Otcheretianski.

    3) Fix double free_irq in sgi ioc3 eth, from Thomas Bogendoerfer.

    4) Use after free in mptcp, from Florian Westphal.

    5) Use after free in wireguard's root_remove_peer_lists, from Eric
    Dumazet.

    6) Properly access packet heads in bonding alb code, from Eric
    Dumazet.

    7) Fix data race in skb_queue_len(), from Qian Cai.

    8) Fix regression in r8169 on some chips, from Heiner Kallweit.

    9) Fix XDP program ref counting in hv_netvsc, from Haiyang Zhang.

    10) Certain kinds of set link netlink operations can cause a NULL deref
    in the ipv6 addrconf code. Fix from Eric Dumazet.

    11) Don't cancel uninitialized work queue in drop monitor, from Ido
    Schimmel.

    * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (84 commits)
    net: thunderx: use proper interface type for RGMII
    mt76: mt7615: fix max_nss in mt7615_eeprom_parse_hw_cap
    bpf: Improve bucket_log calculation logic
    selftests/bpf: Test freeing sockmap/sockhash with a socket in it
    bpf, sockhash: Synchronize_rcu before free'ing map
    bpf, sockmap: Don't sleep while holding RCU lock on tear-down
    bpftool: Don't crash on missing xlated program instructions
    bpf, sockmap: Check update requirements after locking
    drop_monitor: Do not cancel uninitialized work item
    mlxsw: spectrum_dpipe: Add missing error path
    mlxsw: core: Add validation of hardware device types for MGPIR register
    mlxsw: spectrum_router: Clear offload indication from IPv6 nexthops on abort
    selftests: mlxsw: Add test cases for local table route replacement
    mlxsw: spectrum_router: Prevent incorrect replacement of local table routes
    net: dsa: microchip: enable module autoprobe
    ipv6/addrconf: fix potential NULL deref in inet6_set_link_af()
    dpaa_eth: support all modes with rate adapting PHYs
    net: stmmac: update pci platform data to use phy_interface
    net: stmmac: xgmac: fix missing IFF_MULTICAST checki in dwxgmac2_set_filter
    net: stmmac: fix missing IFF_MULTICAST check in dwmac4_set_filter
    ...

    Linus Torvalds
     
  • Pull ARM SoC late updates from Olof Johansson:
    "This is some material that we picked up into our tree late, or that
    had more complex dependencies on more than one topic branch that makes
    sense to keep separately.

    - TI support for secure accelerators and hwrng on OMAP4/5

    - TI camera changes for dra7 and am437x and SGX improvement due to
    better reset control support on am335x, am437x and dra7

    - Davinci moves to proper clocksource on DM365, and regulator/audio
    improvements for DM365 and DM644x eval boards"

    * tag 'armsoc-late' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (32 commits)
    ARM: dts: omap4-droid4: Enable hdq for droid4 ds250x 1-wire battery nvmem
    ARM: dts: motorola-cpcap-mapphone: Configure calibration interrupt
    ARM: dts: Configure interconnect target module for am437x sgx
    ARM: dts: Configure sgx for dra7
    ARM: dts: Configure rstctrl reset for am335x SGX
    ARM: dts: dra7: Add ti-sysc node for VPE
    ARM: dts: dra7: add vpe clkctrl node
    ARM: dts: am43x-epos-evm: Add VPFE and OV2659 entries
    ARM: dts: am437x-sk-evm: Add VPFE and OV2659 entries
    ARM: dts: am43xx: add support for clkout1 clock
    arm: dts: dra76-evm: Add CAL and OV5640 nodes
    arm: dtsi: dra76x: Add CAL dtsi node
    arm: dts: dra72-evm-common: Add entries for the CSI2 cameras
    ARM: dts: DRA72: Add CAL dtsi node
    ARM: dts: dra7-l4: Add ti-sysc node for CAM
    ARM: OMAP: DRA7xx: Make CAM clock domain SWSUP only
    ARM: dts: dra7: add cam clkctrl node
    ARM: OMAP2+: Drop legacy platform data for omap4 des
    ARM: OMAP2+: Drop legacy platform data for omap4 sham
    ARM: OMAP2+: Drop legacy platform data for omap4 aes
    ...

    Linus Torvalds
     
  • Pull ARM SoC-related driver updates from Olof Johansson:
    "Various driver updates for platforms:

    - Nvidia: Fuse support for Tegra194, continued memory controller
    pieces for Tegra30

    - NXP/FSL: Refactorings of QuickEngine drivers to support
    ARM/ARM64/PPC

    - NXP/FSL: i.MX8MP SoC driver pieces

    - TI Keystone: ring accelerator driver

    - Qualcomm: SCM driver cleanup/refactoring + support for new SoCs.

    - Xilinx ZynqMP: feature checking interface for firmware. Mailbox
    communication for power management

    - Overall support patch set for cpuidle on more complex hierarchies
    (PSCI-based)

    and misc cleanups, refactorings of Marvell, TI, other platforms"

    * tag 'armsoc-drivers' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (166 commits)
    drivers: soc: xilinx: Use mailbox IPI callback
    dt-bindings: power: reset: xilinx: Add bindings for ipi mailbox
    drivers: soc: ti: knav_qmss_queue: Pass lockdep expression to RCU lists
    MAINTAINERS: Add brcmstb PCIe controller entry
    soc/tegra: fuse: Unmap registers once they are not needed anymore
    soc/tegra: fuse: Correct straps' address for older Tegra124 device trees
    soc/tegra: fuse: Warn if straps are not ready
    soc/tegra: fuse: Cache values of straps and Chip ID registers
    memory: tegra30-emc: Correct error message for timed out auto calibration
    memory: tegra30-emc: Firm up hardware programming sequence
    memory: tegra30-emc: Firm up suspend/resume sequence
    soc/tegra: regulators: Do nothing if voltage is unchanged
    memory: tegra: Correct reset value of xusb_hostr
    soc/tegra: fuse: Add APB DMA dependency for Tegra20
    bus: tegra-aconnect: Remove PM_CLK dependency
    dt-bindings: mediatek: add MT6765 power dt-bindings
    soc: mediatek: cmdq: delete not used define
    memory: tegra: Add support for the Tegra194 memory controller
    memory: tegra: Only include support for enabled SoCs
    memory: tegra: Support DVFS on Tegra186 and later
    ...

    Linus Torvalds
     
  • Pull ARM Device-tree updates from Olof Johansson:
    "New SoCs:

    - Atmel/Microchip SAM9X60 (ARM926 SoC)

    - OMAP 37xx gets split into AM3703/AM3715/DM3725, which are all
    variants of it with different GPU/media IP configurations.

    - ST stm32mp15 SoCs (1-2 Cortex-A7, CAN, GPU depending on SKU)

    - ST Ericsson ab8505 (variant of ab8500) and db8520 (variant of
    db8500)

    - Unisoc SC9863A SoC (8x Cortex-A55 mobile chipset w/ GPU, modem)

    - Qualcomm SC7180 (8-core 64bit SoC, unnamed CPU class)

    New boards:

    - Allwinner:
    + Emlid Neutis SoM (H3 variant)
    + Libre Computer ALL-H3-IT
    + PineH64 Model B

    - Amlogic:
    + Libretech Amlogic GX PC (s905d and s912-based variants)

    - Atmel/Microchip:
    + Kizboxmini, sam9x60 EK, sama5d27 Wireless SOM (wlsom1)

    - Marvell:
    + Armada 385-based SolidRun Clearfog GTR

    - NXP:
    + Gateworks GW59xx boards based on i.MX6/6Q/6QDL
    + Tolino Shine 3 eBook reader (i.MX6sl)
    + Embedded Artists COM (i.MX7ULP)
    + SolidRun Clearfog CX/ITX and HoneyComb (LX2160A-based systems)
    + Google Coral Edge TPU (i.MX8MQ)

    - Rockchip:
    + Radxa Dalang Carrier (supports rk3288 and rk3399 SOMs)
    + Radxa Rock Pi N10 (RK3399Pro-based)
    + VMARC RK3399Pro SOM

    - ST:
    + Reference boards for stm32mp15

    - ST Ericsson:
    + Samsung Galaxy S III mini (GT-I8190)
    + HREF520 reference board for DB8520

    - TI OMAP:
    + Gen1 Amazon Echo (OMAP3630-based)

    - Qualcomm:
    + Inforce 6640 Single Board Computer (msm8996-based)
    + SC7180 IDP (SC7180-based)"

    * tag 'armsoc-dt' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (623 commits)
    dt-bindings: fix compilation error of the example in marvell,mmp3-hsic-phy.yaml
    arm64: dts: ti: k3-am654-base-board: Add CSI2 OV5640 camera
    arm64: dts: ti: k3-am65-main Add CAL node
    arm64: dts: ti: k3-j721e-main: Add McASP nodes
    arm64: dts: ti: k3-am654-main: Add McASP nodes
    arm64: dts: ti: k3-j721e: DMA support
    arm64: dts: ti: k3-j721e-main: Move secure proxy and smmu under main_navss
    arm64: dts: ti: k3-j721e-main: Correct main NAVSS representation
    arm64: dts: ti: k3-j721e: Correct the address for MAIN NAVSS
    arm64: dts: ti: k3-am65: DMA support
    arm64: dts: ti: k3-am65-main: Move secure proxy under cbass_main_navss
    arm64: dts: ti: k3-am65-main: Correct main NAVSS representation
    ARM: dts: aspeed: rainier: Add UCD90320 power sequencer
    ARM: dts: aspeed: rainier: Switch PSUs to unknown version
    arm64: dts: rockchip: Kill off "simple-panel" compatibles
    ARM: dts: rockchip: Kill off "simple-panel" compatibles
    arm64: dts: rockchip: rename dwmmc node names to mmc
    ARM: dts: rockchip: rename dwmmc node names to mmc
    arm64: dts: exynos: Rename Samsung and Exynos to lowercase
    arm64: dts: uniphier: add reset-names to NAND controller node
    ...

    Linus Torvalds
     
  • Pull vfs file system parameter updates from Al Viro:
    "Saner fs_parser.c guts and data structures. The system-wide registry
    of syntax types (string/enum/int32/oct32/.../etc.) is gone and so is
    the horror switch() in fs_parse() that would have to grow another case
    every time something got added to that system-wide registry.

    New syntax types can be added by filesystems easily now, and their
    namespace is that of functions - not of system-wide enum members. IOW,
    they can be shared or kept private and if some turn out to be widely
    useful, we can make them common library helpers, etc., without having
    to do anything whatsoever to fs_parse() itself.

    And we already get that kind of requests - the thing that finally
    pushed me into doing that was "oh, and let's add one for timeouts -
    things like 15s or 2h". If some filesystem really wants that, let them
    do it. Without somebody having to play gatekeeper for the variants
    blessed by direct support in fs_parse(), TYVM.

    Quite a bit of boilerplate is gone. And IMO the data structures make a
    lot more sense now. -200LoC, while we are at it"

    * 'merge.nfs-fs_parse.1' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (25 commits)
    tmpfs: switch to use of invalfc()
    cgroup1: switch to use of errorfc() et.al.
    procfs: switch to use of invalfc()
    hugetlbfs: switch to use of invalfc()
    cramfs: switch to use of errofc() et.al.
    gfs2: switch to use of errorfc() et.al.
    fuse: switch to use errorfc() et.al.
    ceph: use errorfc() and friends instead of spelling the prefix out
    prefix-handling analogues of errorf() and friends
    turn fs_param_is_... into functions
    fs_parse: handle optional arguments sanely
    fs_parse: fold fs_parameter_desc/fs_parameter_spec
    fs_parser: remove fs_parameter_description name field
    add prefix to fs_context->log
    ceph_parse_param(), ceph_parse_mon_ips(): switch to passing fc_log
    new primitive: __fs_parse()
    switch rbd and libceph to p_log-based primitives
    struct p_log, variants of warnf() et.al. taking that one instead
    teach logfc() to handle prefices, give it saner calling conventions
    get rid of cg_invalf()
    ...
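    A userspace sketch of the design change, assuming entirely hypothetical names (`model_param_spec`, `model_parse()` and the two type functions are not the kernel's fs_parser API). Each parameter's syntax type is just a function pointer, so a filesystem can define a private "timeout" type accepting values such as "15s" or "2h" without adding a case to a central switch.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Illustrative model only: a syntax type is a parsing function. */
typedef int (*model_param_type)(const char *str, unsigned long *result);

struct model_param_spec {
	const char *name;
	model_param_type type;
};

/* A shared type: plain decimal number. */
static int model_param_is_uint(const char *str, unsigned long *result)
{
	char *end;

	errno = 0;
	*result = strtoul(str, &end, 10);
	return (end == str || *end != '\0' || errno) ? -EINVAL : 0;
}

/* A filesystem-private type: durations such as "15s" or "2h", in seconds. */
static int model_param_is_timeout(const char *str, unsigned long *result)
{
	char *end;
	unsigned long n;

	errno = 0;
	n = strtoul(str, &end, 10);
	if (end == str || errno || end[0] == '\0' || end[1] != '\0')
		return -EINVAL;	/* need digits followed by exactly one unit */
	switch (end[0]) {
	case 's': *result = n; return 0;
	case 'm': *result = n * 60; return 0;
	case 'h': *result = n * 3600; return 0;
	default: return -EINVAL;
	}
}

/* Generic lookup: dispatches through the spec, no switch over types. */
static int model_parse(const struct model_param_spec *specs, const char *key,
		       const char *val, unsigned long *result)
{
	for (; specs->name; specs++)
		if (strcmp(specs->name, key) == 0)
			return specs->type(val, result);
	return -ENOENT;		/* unknown parameter name */
}
```

    Adding another type means writing one more function; `model_parse()` itself never changes.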

    Linus Torvalds
     
  • Pull misc vfs updates from Al Viro:

    - bmap series from cmaiolino

    - getting rid of convolutions in copy_mount_options() (use a couple of
    copy_from_user() instead of the __get_user() crap)

    * 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    saner copy_mount_options()
    fibmap: Reject negative block numbers
    fibmap: Use bmap instead of ->bmap method in ioctl_fibmap
    ecryptfs: drop direct calls to ->bmap
    cachefiles: drop direct usage of ->bmap method.
    fs: Enable bmap() function to properly return errors

    Linus Torvalds
     
  • Merge thundering herd avoidance on pipe IO.

    This would have been applied for 5.5 already, but got delayed because of
    a user-space race condition in the GNU make jobserver code. Now that
    there's a new GNU make 4.3 release, and most distributions seem to have
    at least applied the (almost three year old) fix for the problem, let's
    see if people notice.

    And it might have been just bad random timing luck on my machine.

    If you do hit the race condition, things will still work, but the
    symptom is that you don't get nearly the expected parallelism when using
    "make -j".

    The jobserver bug can definitely happen without this patch too, but
    seems to be easier to trigger when we no longer wake up pipe waiters
    unnecessarily.

    * pipe-exclusive-wakeup:
    pipe: use exclusive waits when reading or writing

    Linus Torvalds
     
  • This makes the pipe code use separate wait-queues and exclusive waiting
    for readers and writers, avoiding a nasty thundering herd problem when
    there are lots of readers waiting for data on a pipe (or, less commonly,
    lots of writers waiting for a pipe to have space).

    While this isn't a common occurrence in the traditional "use a pipe as a
    data transport" case, where you typically only have a single reader and
    a single writer process, there is one common special case: using a pipe
    as a source of "locking tokens" rather than for data communication.

    In particular, the GNU make jobserver code ends up using a pipe as a way
    to limit parallelism, where each job consumes a token by reading a byte
    from the jobserver pipe, and releases the token by writing a byte back
    to the pipe.

    This pattern is fairly traditional on Unix, and works very well, but
    will waste a lot of time waking up a lot of processes when only a single
    reader needs to be woken up when a writer releases a new token.
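    A minimal sketch of the token protocol described above, assuming hypothetical helper names (`tokens_init()`, `token_acquire()`, `token_release()`); this is not GNU make's code. The pipe is pre-filled with one byte per token, a job takes a token by reading a byte and returns it by writing one back. With many processes blocked in `token_acquire()` on the same pipe, each `token_release()` is exactly where waking every reader wastes time, since only one of them can get the byte.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Create a token pipe pre-filled with ntokens one-byte tokens. */
static int tokens_init(int fd[2], int ntokens)
{
	if (pipe(fd) < 0)
		return -1;
	for (int i = 0; i < ntokens; i++)
		if (write(fd[1], "+", 1) != 1)
			return -1;
	return 0;
}

/* Take one token; a blocking read would sleep here until one is released. */
static int token_acquire(int rfd)
{
	char c;

	return read(rfd, &c, 1) == 1 ? 0 : -1;
}

/* Give a token back, which wakes a reader waiting on the pipe. */
static int token_release(int wfd)
{
	return write(wfd, "+", 1) == 1 ? 0 : -1;
}
```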

    A simplified test-case of just this pipe interaction is to create 64
    processes, and then pass a single token around between them (this
    test-case also intentionally passes another token that gets ignored to
    test the "wake up next" logic too, in case anybody wonders about it):

    #include <unistd.h>

    int main(int argc, char **argv)
    {
        int fd[2], counters[2];

        pipe(fd);
        counters[0] = 0;    /* the real token, passed around */
        counters[1] = -1;   /* an extra token that readers ignore */
        write(fd[1], counters, sizeof(counters));

        /* 64 processes */
        fork(); fork(); fork(); fork(); fork(); fork();

        do {
            int i;
            read(fd[0], &i, sizeof(i));
            if (i < 0)
                continue;
            counters[0] = i+1;
            /* when i is odd, the ignored -1 token is written as well */
            write(fd[1], counters, (1+(i & 1)) *sizeof(int));
        } while (counters[0] < 1000000);
        return 0;
    }

    and in a perfect world, passing that token around should only cause one
    context switch per transfer, when the writer of a token causes a
    directed wakeup of just a single reader.

    But with the "writer wakes all readers" model we traditionally had, on
    my test box the above case causes more than an order of magnitude more
    scheduling: instead of the expected ~1M context switches, "perf stat"
    shows

    231,852.37 msec task-clock # 15.857 CPUs utilized
    11,250,961 context-switches # 0.049 M/sec
    616,304 cpu-migrations # 0.003 M/sec
    1,648 page-faults # 0.007 K/sec
    1,097,903,998,514 cycles # 4.735 GHz
    120,781,778,352 instructions # 0.11 insn per cycle
    27,997,056,043 branches # 120.754 M/sec
    283,581,233 branch-misses # 1.01% of all branches

    14.621273891 seconds time elapsed

    0.018243000 seconds user
    3.611468000 seconds sys

    before this commit.

    After this commit, I get

    5,229.55 msec task-clock # 3.072 CPUs utilized
    1,212,233 context-switches # 0.232 M/sec
    103,951 cpu-migrations # 0.020 M/sec
    1,328 page-faults # 0.254 K/sec
    21,307,456,166 cycles # 4.074 GHz
    12,947,819,999 instructions # 0.61 insn per cycle
    2,881,985,678 branches # 551.096 M/sec
    64,267,015 branch-misses # 2.23% of all branches

    1.702148350 seconds time elapsed

    0.004868000 seconds user
    0.110786000 seconds sys

    instead. Much better.
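    A toy cost model of why exclusive waits help, not kernel code: the names are hypothetical, and the wakeup rule only loosely follows the kernel's exclusive-wait semantics (a wakeup wakes every non-exclusive waiter but stops after `nr_exclusive` exclusive ones). It idealizes by assuming every reader is asleep each time a token is released, so the ratio it produces equals the number of waiters; the measurements above show a smaller but still large gap, so treat it as intuition rather than a prediction.

```c
#include <stdbool.h>

#define MODEL_MAX_WAITERS 64

struct model_waiter {
	bool exclusive;
	bool woken;
};

/* Wake waiters in order, stopping after nr_exclusive exclusive ones. */
static int model_wake_up(struct model_waiter *q, int n, int nr_exclusive)
{
	int woken = 0;

	for (int i = 0; i < n; i++) {
		q[i].woken = true;
		woken++;
		if (q[i].exclusive && --nr_exclusive == 0)
			break;
	}
	return woken;
}

/*
 * Release ntokens tokens one at a time to n sleeping readers; the readers
 * that lose the race go back to sleep. Returns the total number of wakeups.
 */
static long model_pass_tokens(int n, bool exclusive, int ntokens)
{
	struct model_waiter q[MODEL_MAX_WAITERS];
	long total = 0;

	if (n < 0 || n > MODEL_MAX_WAITERS)
		return -1;
	for (int t = 0; t < ntokens; t++) {
		for (int i = 0; i < n; i++) {
			q[i].exclusive = exclusive;
			q[i].woken = false;	/* everyone is sleeping again */
		}
		total += model_wake_up(q, n, 1);
	}
	return total;
}
```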

    [ Note! This kernel improvement seems to be very good at triggering a
    race condition in the make jobserver (in GNU make 4.2.1) for me. It's
    a long known bug that was fixed back in June 2017 by GNU make commit
    b552b0525198 ("[SV 51159] Use a non-blocking read with pselect to
    avoid hangs.").

    But there wasn't a new release of GNU make until 4.3 on Jan 19 2020,
    so a number of distributions may still have the buggy version. Some
    have backported the fix to their 4.2.1 release, though, and even
    without the fix it's quite timing-dependent whether the bug actually
    is hit. ]

    Josh Triplett says:
    "I've been hammering on your pipe fix patch (switching to exclusive
    wait queues) for a month or so, on several different systems, and I've
    run into no issues with it. The patch *substantially* improves
    parallel build times on large (~100 CPU) systems, both with parallel
    make and with other things that use make's pipe-based jobserver.

    All current distributions (including stable and long-term stable
    distributions) have versions of GNU make that no longer have the
    jobserver bug"

    Tested-by: Josh Triplett
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

08 Feb, 2020

20 commits

  • …/maz/arm-platforms into irq/urgent

    Pull irqchip fixes for 5.6, take #1 from Marc Zyngier:

    - Guarantee allocation of L2 vPE table for GICv4.1
    - Fix GICv4.1 VPROPBASER programming
    - Numerous GICv4.1 tidy ups
    - Fix disabled GICv3 redistributor provisioning with ACPI
    - KConfig cleanup for C-SKY

    Thomas Gleixner
     
  • Daniel Borkmann says:

    ====================
    pull-request: bpf 2020-02-07

    The following pull-request contains BPF updates for your *net* tree.

    We've added 15 non-merge commits during the last 10 day(s) which contain
    a total of 12 files changed, 114 insertions(+), 31 deletions(-).

    The main changes are:

    1) Various BPF sockmap fixes related to RCU handling in the map's
    tear-down code, from Jakub Sitnicki.

    2) Fix macro state explosion in BPF sk_storage map when calculating its
    bucket_log on allocation, from Martin KaFai Lau.

    3) Fix potential BPF sockmap update race by rechecking socket's established
    state under lock, from Lorenz Bauer.

    4) Fix crash in bpftool on missing xlated instructions when kptr_restrict
    sysctl is set, from Toke Høiland-Jørgensen.

    5) Fix i40e's XSK wakeup code to return proper error in busy state and
    various misc fixes in xdpsock BPF sample code, from Maciej Fijalkowski.

    6) Fix the way modifiers are skipped in BTF in the verifier while walking
    pointers to avoid program rejection, from Alexei Starovoitov.

    7) Fix Makefile for runqslower BPF tool to i) rebuild on libbpf changes and
    ii) to fix undefined reference linker errors for older gcc versions due to
    order of passed gcc parameters, from Yulia Kartseva and Song Liu.

    8) Fix a trampoline_count BPF kselftest warning about missing braces around
    initializer, from Andrii Nakryiko.

    9) Fix up redundant "HAVE" prefix from large INSN limit kernel probe in
    bpftool, from Michal Rostecki.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Currently, we will not set vpe_l1_page for the current RD if we can
    inherit the vPE configuration table from another RD (or ITS), which
    results in an inconsistency between RDs within the same CommonLPIAff
    group.

    Let's rename it to vpe_l1_base to indicate the base address of the
    vPE configuration table of this RD, and set it properly for *all*
    v4.1 redistributors.

    Signed-off-by: Zenghui Yu
    Signed-off-by: Marc Zyngier
    Link: https://lore.kernel.org/r/20200206075711.1275-3-yuzenghui@huawei.com

    Zenghui Yu
     
  • Pull NFS client updates from Anna Schumaker:
    "Stable bugfixes:
    - Fix memory leaks and corruption in readdir # v2.6.37+
    - Directory page cache needs to be locked when read # v2.6.37+

    New features:
    - Convert NFS to use the new mount API
    - Add "softreval" mount option to let clients use cache if server goes down
    - Add a config option to compile without UDP support
    - Limit the number of inactive delegations the client can cache at once
    - Improved readdir concurrency using iterate_shared()

    Other bugfixes and cleanups:
    - More 64-bit time conversions
    - Add additional diagnostic tracepoints
    - Check for holes in swapfiles, and add dependency on CONFIG_SWAP
    - Various xprtrdma cleanups to prepare for 5.7's changes
    - Several fixes for NFS writeback and commit handling
    - Fix acls over krb5i/krb5p mounts
    - Recover from premature loss of openstateids
    - Fix NFS v3 chacl and chmod bug
    - Compare creds using cred_fscmp()
    - Use kmemdup_nul() in more places
    - Optimize readdir cache page invalidation
    - Lease renewal and recovery fixes"

    * tag 'nfs-for-5.6-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (93 commits)
    NFSv4.0: nfs4_do_fsinfo() should not do implicit lease renewals
    NFSv4: try lease recovery on NFS4ERR_EXPIRED
    NFS: Fix memory leaks
    nfs: optimise readdir cache page invalidation
    NFS: Switch readdir to using iterate_shared()
    NFS: Use kmemdup_nul() in nfs_readdir_make_qstr()
    NFS: Directory page cache pages need to be locked when read
    NFS: Fix memory leaks and corruption in readdir
    SUNRPC: Use kmemdup_nul() in rpc_parse_scope_id()
    NFS: Replace various occurrences of kstrndup() with kmemdup_nul()
    NFSv4: Limit the total number of cached delegations
    NFSv4: Add accounting for the number of active delegations held
    NFSv4: Try to return the delegation immediately when marked for return on close
    NFS: Clear NFS_DELEGATION_RETURN_IF_CLOSED when the delegation is returned
    NFSv4: nfs_inode_evict_delegation() should set NFS_DELEGATION_RETURNING
    NFS: nfs_find_open_context() should use cred_fscmp()
    NFS: nfs_access_get_cached_rcu() should use cred_fscmp()
    NFSv4: pnfs_roc() must use cred_fscmp() to compare creds
    NFS: remove unused macros
    nfs: Return EINVAL rather than ERANGE for mount parse errors
    ...

    Linus Torvalds
     
  • Pull i2c updates from Wolfram Sang:
    "i2c core:

    - huge improvements and refactorings of the Linux I2C
    documentation (lots of thanks to Luca for doing it and Jean for the
    careful review)

    - subsystem wide API conversion to i2c_new_client_device()

    - remove obsolete parport-light driver

    - smaller core updates (removal of 'extern', enabling more compile
    testing, use more helper macros)

    - and quite a bunch of driver updates (new IDs, simplifications,
    better PM, support of atomic transfers and other improvements)

    i2c-mux:

    - The main feature is the idle-state rework of the pca954x driver
    from Biwen Li

    at24 driver:

    - minor maintenance: update the license tag, sort headers

    - move support for the write-protect pin into nvmem core

    - add a reference to the new wp-gpios property in nvmem to at25
    bindings

    - add support for regulator and pm_runtime control"

    * 'i2c/for-5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: (91 commits)
    i2c: cros-ec-tunnel: Fix ACPI identifier
    i2c: cros-ec-tunnel: Fix slave device enumeration
    i2c: stm32f7: add PM_SLEEP suspend/resume support
    i2c: cadence: Fix wording in i2c-cadence driver
    i2c: cadence: Fix power management order of operations
    i2c: cadence: Fix error printing in case of defer
    i2c: cadence: Handle transfer_size rollover
    i2c: i801: Add support for Intel Comet Lake PCH-V
    docs: i2c: writing-clients: properly name the stop condition
    docs: i2c: i2c-protocol: use same wording as smbus-protocol
    docs: i2c: rename sections so the overall picture is clearer
    docs: i2c: old-module-parameters: use monospace instead of ""
    docs: i2c: old-module-parameters: clarify this is for obsolete kernels
    docs: i2c: old-module-parameters: fix internal hyperlink
    docs: i2c: instantiating-devices: use monospace for sysfs attributes
    docs: i2c: instantiating-devices: rearrange static instatiation
    docs: i2c: instantiating-devices: fix internal hyperlink
    docs: i2c: smbus-protocol: improve I2C Block transactions description
    docs: i2c: smbus-protocol: fix punctuation
    docs: i2c: smbus-protocol: fix typo
    ...

    Linus Torvalds
     
  • Pull drm fixes from Dave Airlie:
    "Just some fixes for this merge window: the tegra changes fix some
    regressions in the merge, nouveau has a few modesetting fixes.

    The amdgpu fixes are a bit bigger, but they contain a couple of weeks of
    fixes, and don't seem to contain anything that isn't really a fix.

    Summary:

    tegra:
    - merge window regression fixes

    nouveau:
    - couple of volta/turing modesetting fixes

    amdgpu:
    - EDC fixes for Arcturus
    - GDDR6 memory training fixes
    - Fix for reading gfx clockgating registers while in GFXOFF state
    - i2c freq fixes
    - Misc display fixes
    - TLB invalidation fix when using semaphores
    - VCN 2.5 instancing fixes
    - Switch raven1 gfxoff to a blacklist
    - Coreboot workaround for KV/KB
    - Root cause dongle fixes for display and revert workaround
    - Enable GPU reset for renoir and navi
    - Navi overclocking fixes
    - Fix up confusing warnings in display clock validation on raven

    amdkfd:
    - SDMA fix

    radeon:
    - Misc LUT fixes"

    * tag 'drm-next-2020-02-07' of git://anongit.freedesktop.org/drm/drm: (90 commits)
    gpu: host1x: Set DMA direction only for DMA-mapped buffer objects
    drm/tegra: Reuse IOVA mapping where possible
    drm/tegra: Relax IOMMU usage criteria on old Tegra
    drm/amd/dm/mst: Ignore payload update failures
    drm/amdgpu: update default voltage for boot od table for navi1x
    drm/amdgpu/smu10: fix smu10_get_clock_by_type_with_voltage
    drm/amdgpu/smu10: fix smu10_get_clock_by_type_with_latency
    drm/amdgpu/display: handle multiple numbers of fclks in dcn_calcs.c (v2)
    drm/amdgpu: fetch default VDDC curve voltages (v2)
    drm/amdgpu/smu_v11_0: Correct behavior of restoring default tables (v2)
    drm/amdgpu/navi10: add OD_RANGE for navi overclocking
    drm/amdgpu/navi: fix index for OD MCLK
    drm/amd/display: Fix HW/SW state mismatch
    drm/amd/display: Fix a typo when computing dsc configuration
    drm/amd/powerplay: fix navi10 system intermittent reboot issue V2
    drm/amdkfd: Fix a bug in SDMA RLC queue counting under HWS mode
    drm/amd/display: Only enable cursor on pipes that need it
    drm/nouveau/kms/gv100-: avoid sending a core update until the first modeset
    drm/nouveau/kms/gv100-: move window ownership setup into modesetting path
    drm/nouveau/disp/gv100-: halt NV_PDISP_FE_RM_INTR_STAT_CTRL_DISP_ERROR storms
    ...

    Linus Torvalds
     
  • Pull clk fixes from Stephen Boyd:
    "A collection of fixes:

    - Make of_clk.h self contained

    - Fix new qcom DT bindings that just merged to match the DTS files

    - Fix qcom clk driver to properly detect DFS clk frequencies

    - Fix the ls1028a driver to not deref a pointer before assigning it"

    * tag 'clk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
    of: clk: Make self-contained
    clk: qcom: Use ARRAY_SIZE in videocc-sc7180 for parent clocks
    clk: qcom: Get rid of the test clock for videocc-sc7180
    dt-bindings: clock: Cleanup qcom,videocc bindings for sdm845/sc7180
    clk: qcom: Use ARRAY_SIZE in gpucc-sc7180 for parent clocks
    clk: qcom: Get rid of the test clock for gpucc-sc7180
    dt-bindings: clock: Fix qcom,gpucc bindings for sdm845/sc7180/msm8998
    clk: qcom: Use ARRAY_SIZE in dispcc-sc7180 for parent clocks
    clk: qcom: Get rid of the test clock for dispcc-sc7180
    clk: qcom: Get rid of fallback global names for dispcc-sc7180
    dt-bindings: clock: Fix qcom,dispcc bindings for sdm845/sc7180
    clk: qcom: rcg2: Don't crash if our parent can't be found; return an error
    clk: ls1028a: fix a dereference of pointer 'parent' before a null check
    dt-bindings: clk: qcom: Fix self-validation, split, and clean cruft
    clk: qcom: Don't overwrite 'cfg' in clk_rcg2_dfs_populate_freq()

    Linus Torvalds
     
  • Pull watchdog updates from Wim Van Sebroeck:

    - add IT8786 chipset ID

    - addition of sam9x60 compatible watchdog

    - da9062 improvements

    - fix UAF in reboot notifier handling in watchdog core code

    - other fixes and small improvements

    * tag 'linux-watchdog-5.6-rc1' of git://www.linux-watchdog.org/linux-watchdog:
    watchdog: da9062: make restart handler atomic safe
    watchdog: mtk_wdt: mt2712: Add reset controller
    watchdog: mtk_wdt: mt8183: Add reset controller
    dt-bindings: mediatek: mt2712: Add #reset-cells
    dt-bindings: mediatek: mt8183: Add #reset-cells
    dt-bindings: watchdog: da9062: add suspend disable option
    watchdog: it87_wdt: add IT8786 ID
    watchdog: dw_wdt: ping watchdog to reset countdown before start
    watchdog: fix UAF in reboot notifier handling in watchdog core code
    watchdog: cadence: Skip printing pointer value
    watchdog: qcom: Use platform_get_irq_optional() for bark irq
    watchdog: da9062: add power management ops
    watchdog: make DesignWare watchdog allow users to set bigger timeout value
    drivers: watchdog: stm32_iwdg: set WDOG_HW_RUNNING at probe
    watchdog: sama5d4_wdt: addition of sam9x60 compatible watchdog

    Linus Torvalds
     
  • called errorfc/infofc/warnfc/invalfc

    Signed-off-by: Al Viro

    Al Viro
     
  • Signed-off-by: Al Viro

    Al Viro
     
  • Don't bother with "mixed" options that would allow both the
    form with and without argument (i.e. both -o foo and -o foo=bar).
    Rather than trying to shove both into a single fs_parameter_spec,
    allow having with-argument and no-argument specs with the same
    name and teach fs_parse to handle that.

    There are very few options of that sort, and they are actually
    easier to handle that way - callers end up with less postprocessing.
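    To illustrate the idea of two same-named specs (one flag form, one with-argument form) resolved by whether a value was supplied, here is a minimal sketch. The struct opt_spec type, the Opt_* codes and lookup_opt() are hypothetical stand-ins, not the kernel's fs_parameter_spec or fs_parse() API.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified option spec; not the kernel's fs_parameter_spec. */
struct opt_spec {
	const char *name;
	bool takes_arg;   /* true for "-o foo=bar", false for "-o foo" */
	int opt;          /* result code reported to the caller */
};

enum { Opt_foo_flag, Opt_foo_value };

/* Two entries share the name "foo"; the table ends with a NULL name. */
static const struct opt_spec specs[] = {
	{ "foo", false, Opt_foo_flag },
	{ "foo", true,  Opt_foo_value },
	{ NULL,  false, 0 },
};

/*
 * Return the opt code of the spec whose name matches and whose
 * with/without-argument form matches whether a value was supplied,
 * or -1 if no spec matches.
 */
static int lookup_opt(const struct opt_spec *tbl, const char *name,
		      const char *value)
{
	bool has_value = value != NULL;

	for (const struct opt_spec *p = tbl; p->name; p++) {
		if (strcmp(p->name, name) == 0 && p->takes_arg == has_value)
			return p->opt;
	}
	return -1;
}
```

    The caller gets a distinct code for each form, so it does not have to inspect the value afterwards to tell "-o foo" from "-o foo=bar".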

    Signed-off-by: Al Viro

    Al Viro
     
  • The former contains nothing but a pointer to an array of the latter...

    Signed-off-by: Al Viro

    Al Viro
     
  • Unused now.

    Signed-off-by: Eric Sandeen
    Acked-by: David Howells
    Signed-off-by: Al Viro

    Eric Sandeen
     
  • ... turning it into struct p_log embedded into fs_context. Initialize
    the prefix with fs_type->name, turning fs_parse() into a trivial
    inline wrapper for __fs_parse().

    This makes fs_parameter_description->name completely unused.
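    A minimal sketch of the pattern described above: a logger carrying a prefix is embedded in the context, the prefix is initialized from the filesystem type name, and the context-based entry point is a trivial wrapper around a p_log-based one. The struct fields and the names p_errorf(), fs_ctx_init(), parse_with_log() and parse_with_ctx() are hypothetical simplifications, not the kernel's definitions.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified stand-ins; not the kernel's definitions. */
struct p_log {
	const char *prefix;   /* e.g. the filesystem type name */
	char last[128];       /* last formatted message, kept for inspection */
};

struct fs_ctx {
	const char *fs_type_name;
	struct p_log log;     /* logger embedded in the context */
};

/* Format "prefix: message" into plog->last and print it to stderr. */
static void p_errorf(struct p_log *plog, const char *fmt, ...)
{
	char msg[96];
	va_list ap;

	va_start(ap, fmt);
	vsnprintf(msg, sizeof(msg), fmt, ap);
	va_end(ap);
	snprintf(plog->last, sizeof(plog->last), "%s: %s",
		 plog->prefix ? plog->prefix : "", msg);
	fprintf(stderr, "%s\n", plog->last);
}

/* Initialize the embedded logger's prefix from the filesystem type name. */
static void fs_ctx_init(struct fs_ctx *fc, const char *fs_type_name)
{
	fc->fs_type_name = fs_type_name;
	fc->log.prefix = fs_type_name;
	fc->log.last[0] = '\0';
}

/* Core parser works on a p_log only (stub: rejects every key). */
static int parse_with_log(struct p_log *plog, const char *key)
{
	p_errorf(plog, "Unknown parameter '%s'", key);
	return -1;
}

/* Context-based entry point is a trivial wrapper around the core parser. */
static inline int parse_with_ctx(struct fs_ctx *fc, const char *key)
{
	return parse_with_log(&fc->log, key);
}
```

    Because the core routine only needs a p_log, callers that have no fs_context at all can supply their own logger instead of passing NULL.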

    Signed-off-by: Al Viro

    Al Viro
     
    ... and now errorf() et al. are never called with NULL fs_context,
    so we can get rid of conditional in those.

    Signed-off-by: Al Viro

    Al Viro
     
  • fs_parse() analogue taking p_log instead of fs_context.
    fs_parse() turned into a wrapper, callers in ceph_common and rbd
    switched to __fs_parse().

    As a result, fs_parse() never gets NULL fs_context and neither
    do fs_context-based logging primitives.

    Signed-off-by: Al Viro

    Al Viro
     
  • primitives for prefixed logging

    Signed-off-by: Al Viro

    Al Viro
     
  • Signed-off-by: Al Viro

    Al Viro
     
  • Its behaviour is identical to that of fs_value_is_filename.
    It makes no sense, anyway - LOOKUP_EMPTY affects nothing
    whatsoever once the pathname has been imported from userland.
    And both fs_value_is_filename and fs_value_is_filename_empty
    carry an already imported pathname.

    Signed-off-by: Al Viro

    Al Viro
     
    Have the arrays of constant_table self-terminated (by NULL ->name
    in the final entry). Simplifies lookup_constant() and allows reusing
    the search for enum params as well.
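    A minimal sketch of a sentinel-terminated table and its lookup, modeled loosely on struct constant_table and lookup_constant(). This is an illustration rather than the kernel's code; the cache_modes table and its values are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Simplified sketch modeled on struct constant_table. */
struct constant_table {
	const char *name;
	int value;
};

/* Hypothetical table: the final entry has a NULL name, so no length is stored. */
static const struct constant_table cache_modes[] = {
	{ "none",    0 },
	{ "loose",   1 },
	{ "fscache", 2 },
	{ NULL,      0 },
};

/* Walk the table until the NULL-name sentinel; return not_found on a miss. */
static int lookup_constant(const struct constant_table *tbl,
			   const char *name, int not_found)
{
	for (const struct constant_table *p = tbl; p->name; p++) {
		if (strcmp(p->name, name) == 0)
			return p->value;
	}
	return not_found;
}
```

    Since the table carries its own terminator, the same search works for any table without passing a separate element count.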

    Signed-off-by: Al Viro

    Al Viro
     

07 Feb, 2020

3 commits

  • zonefs is a very simple file system exposing each zone of a zoned block
    device as a file. Unlike a regular file system with zoned block device
    support (e.g. f2fs), zonefs does not hide the sequential write
    constraint of zoned block devices from the user. Files representing
    sequential write zones of the device must be written sequentially
    starting from the end of the file (append-only writes).

    As such, zonefs is in essence closer to a raw block device access
    interface than to a full featured POSIX file system. The goal of zonefs
    is to simplify the implementation of zoned block device support in
    applications by replacing raw block device file accesses with a richer
    file API, avoiding reliance on direct block device file ioctls, which
    may be more obscure to developers. One example of this approach is the
    implementation of LSM (log-structured merge) tree structures (such as
    used in RocksDB and LevelDB) on zoned block devices by allowing SSTables
    to be stored in a zone file similarly to a regular file system rather
    than as a range of sectors of a zoned device. The introduction of the
    higher level construct "one file is one zone" can help reduce the
    amount of changes needed in the application, as well as help introduce
    support for different application programming languages.

    Zonefs on-disk metadata is reduced to an immutable super block to
    persistently store a magic number and optional feature flags and
    values. On mount, zonefs uses blkdev_report_zones() to obtain the device
    zone configuration and populates the mount point with a static file tree
    solely based on this information. E.g. file sizes come from the device
    zone type and write pointer offset managed by the device itself.
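    The size derivation above can be sketched as follows, assuming a zone report in 512-byte sectors. The struct zone_info type, its field names and zone_file_size() are hypothetical, and the sketch ignores special zone conditions (for example full or offline zones, whose write pointer may not be meaningful).

```c
#include <stdint.h>

/* Hypothetical zone descriptor; positions and lengths in 512-byte sectors. */
enum zone_type { ZONE_CONVENTIONAL, ZONE_SEQ_WRITE };

struct zone_info {
	enum zone_type type;
	uint64_t start;  /* first sector of the zone */
	uint64_t len;    /* zone length in sectors */
	uint64_t wp;     /* write pointer sector (sequential zones only) */
};

#define SECTOR_SHIFT 9   /* 512-byte sectors */

/*
 * Derive a zone file's size purely from the zone report:
 * conventional zone files always have the full zone size, while
 * sequential zone files are as large as the data written so far,
 * i.e. the write pointer offset from the zone start.
 */
static uint64_t zone_file_size(const struct zone_info *z)
{
	if (z->type == ZONE_CONVENTIONAL)
		return z->len << SECTOR_SHIFT;
	return (z->wp - z->start) << SECTOR_SHIFT;
}
```

    Nothing beyond the zone report is consulted, which matches the point that no per-file metadata needs to be stored on disk.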

    The zone files created on mount have the following characteristics.
    1) Files representing zones of the same type are grouped together
    under a common sub-directory:
    * For conventional zones, the sub-directory "cnv" is used.
    * For sequential write zones, the sub-directory "seq" is used.
    These two directories are the only directories that exist in zonefs.
    Users cannot create other directories and cannot rename nor delete
    the "cnv" and "seq" sub-directories.
    2) The name of zone files is the number of the file within the zone
    type sub-directory, in order of increasing zone start sector.
    3) The size of conventional zone files is fixed to the device zone size.
    Conventional zone files cannot be truncated.
    4) The size of sequential zone files represents the file's zone write
    pointer position relative to the zone start sector. These files can
    only be truncated to 0, in which case the zone is reset to rewind the
    zone write pointer position to the start of the zone, or to the zone
    size, in which case the file's zone is transitioned to the FULL state
    (finish zone operation).
    5) Read and write operations are not allowed beyond the file zone
    size. Any access exceeding the zone size fails with the -EFBIG
    error.
    6) Creating, deleting, renaming or modifying any attribute of files and
    sub-directories is not allowed.
    7) There are no restrictions on the type of read and write operations
    that can be issued to conventional zone files. Buffered, direct and
    mmap read & write operations are accepted. For sequential zone files,
    there are no restrictions on read operations, but all write
    operations must be direct IO append writes. mmap write of sequential
    files is not allowed.
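    Rules 4, 5 and 7 for sequential zone files can be summarized as two checks. This is a hypothetical sketch, not zonefs code: -EFBIG comes from rule 5, but the -EINVAL returned for non-direct or non-append writes and the -EPERM returned for other truncation sizes are assumed error codes chosen for illustration.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical write check for a sequential zone file.
 * 'size' is the current file size (write pointer offset) and
 * 'zone_size' the zone capacity, both in bytes.
 */
static int seq_check_write(uint64_t size, uint64_t zone_size, uint64_t pos,
			   uint64_t count, bool direct)
{
	/* Rule 5: no access may extend past the end of the zone. */
	if (pos > zone_size || count > zone_size - pos)
		return -EFBIG;
	/* Rule 7: only direct IO writes are accepted (error code assumed). */
	if (!direct)
		return -EINVAL;
	/* Writes must start exactly at the write pointer (error code assumed). */
	if (pos != size)
		return -EINVAL;
	return 0;
}

/*
 * Hypothetical truncate check (rule 4): only 0 (zone reset) or the
 * zone size (zone finish) is accepted; the error code is assumed.
 */
static int seq_check_truncate(uint64_t new_size, uint64_t zone_size)
{
	if (new_size == 0 || new_size == zone_size)
		return 0;
	return -EPERM;
}
```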

    Several optional features of zonefs can be enabled at format time.
    * Conventional zone aggregation: ranges of contiguous conventional
    zones can be aggregated into a single larger file instead of the
    default one file per zone.
    * File ownership: The owner UID and GID of zone files are 0 (root) by
    default but can be changed to any valid UID/GID.
    * File access permissions: the default 640 access permissions can be
    changed.

    The mkzonefs tool is used to format zoned block devices for use with
    zonefs. This tool is available on GitHub at:

    git@github.com:damien-lemoal/zonefs-tools.git.

    zonefs-tools also includes a test suite which can be run against any
    zoned block device, including null_blk block device created with zoned
    mode.

    Example: the following formats a 15TB host-managed SMR HDD with 256 MB
    zones with the conventional zones aggregation feature enabled.

    $ sudo mkzonefs -o aggr_cnv /dev/sdX
    $ sudo mount -t zonefs /dev/sdX /mnt
    $ ls -l /mnt/
    total 0
    dr-xr-xr-x 2 root root 1 Nov 25 13:23 cnv
    dr-xr-xr-x 2 root root 55356 Nov 25 13:23 seq

    The size of each zone file sub-directory indicates the number of files
    it contains for that zone type. In this example, there is only one
    conventional zone file (all conventional zones are aggregated under a
    single file).

    $ ls -l /mnt/cnv
    total 137101312
    -rw-r----- 1 root root 140391743488 Nov 25 13:23 0

    This aggregated conventional zone file can be used as a regular file.

    $ sudo mkfs.ext4 /mnt/cnv/0
    $ sudo mount -o loop /mnt/cnv/0 /data

    In this example, the "seq" sub-directory, which groups the files for
    sequential write zones, contains 55356 files.

    $ ls -lv /mnt/seq
    total 14511243264
    -rw-r----- 1 root root 0 Nov 25 13:23 0
    -rw-r----- 1 root root 0 Nov 25 13:23 1
    -rw-r----- 1 root root 0 Nov 25 13:23 2
    ...
    -rw-r----- 1 root root 0 Nov 25 13:23 55354
    -rw-r----- 1 root root 0 Nov 25 13:23 55355

    For sequential write zone files, the file size changes as data is
    appended at the end of the file, similarly to any regular file system.

    $ dd if=/dev/zero of=/mnt/seq/0 bs=4K count=1 conv=notrunc oflag=direct
    1+0 records in
    1+0 records out
    4096 bytes (4.1 kB, 4.0 KiB) copied, 0.000452219 s, 9.1 MB/s

    $ ls -l /mnt/seq/0
    -rw-r----- 1 root root 4096 Nov 25 13:23 /mnt/seq/0

    The written file can be truncated to the zone size, preventing any
    further write operation.

    $ truncate -s 268435456 /mnt/seq/0
    $ ls -l /mnt/seq/0
    -rw-r----- 1 root root 268435456 Nov 25 13:49 /mnt/seq/0

    Truncating the file to size 0 frees the file zone storage space and
    allows append writes to the file to restart.

    $ truncate -s 0 /mnt/seq/0
    $ ls -l /mnt/seq/0
    -rw-r----- 1 root root 0 Nov 25 13:49 /mnt/seq/0

    Since files are statically mapped to zones on the disk, the number of
    blocks of a file as reported by stat() and fstat() indicates the size
    of the file zone.

    $ stat /mnt/seq/0
    File: /mnt/seq/0
    Size: 0 Blocks: 524288 IO Block: 4096 regular empty file
    Device: 870h/2160d Inode: 50431 Links: 1
    Access: (0640/-rw-r-----) Uid: ( 0/ root) Gid: ( 0/ root)
    Access: 2019-11-25 13:23:57.048971997 +0900
    Modify: 2019-11-25 13:52:25.553805765 +0900
    Change: 2019-11-25 13:52:25.553805765 +0900
    Birth: -

    The number of blocks of the file ("Blocks") in units of 512B blocks
    gives the maximum file size of 524288 * 512 B = 256 MB, corresponding
    to the device zone size in this example. Note that the "IO Block"
    field always indicates the minimum IO size for writes and corresponds
    to the device physical sector size.
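    The arithmetic above (524288 blocks * 512 B = 268435456 B, i.e. 256 MiB) can be reproduced from a program with stat(). This is a sketch: POSIX leaves the unit of st_blocks unspecified, and the 512-byte unit assumed here is what Linux reports. On a zonefs file the result is the zone capacity; on an ordinary filesystem it is only the allocated space. The helper names are hypothetical.

```c
#include <stdint.h>
#include <stdio.h>
#include <sys/stat.h>

/* Linux reports st_blocks in 512-byte units (POSIX leaves this unspecified). */
static uint64_t blocks_to_bytes(uint64_t blocks)
{
	return blocks * 512;
}

/* Print the file size and the zone capacity implied by stat(). */
static int print_zone_capacity(const char *path)
{
	struct stat st;

	if (stat(path, &st) != 0) {
		perror(path);
		return -1;
	}
	printf("%s: size %lld bytes, zone capacity %llu bytes\n", path,
	       (long long)st.st_size,
	       (unsigned long long)blocks_to_bytes((uint64_t)st.st_blocks));
	return 0;
}
```

    For the empty /mnt/seq/0 shown above, this would report a size of 0 and a capacity of 268435456 bytes.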

    This code contains contributions from:
    * Johannes Thumshirn,
    * Darrick J. Wong,
    * Christoph Hellwig,
    * Chaitanya Kulkarni and
    * Ting Yao.

    Signed-off-by: Damien Le Moal
    Reviewed-by: Dave Chinner

    Damien Le Moal
     
  • no real difference now

    Signed-off-by: Al Viro

    Al Viro
     
    Don't do a single array; attach them to the fsparam_enum() entries
    instead. And don't bother trying to embed the names into those -
    it actually loses memory, with no real speedup worth mentioning.

    Simplifies validation as well.
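    A minimal sketch of attaching each enum parameter's value table to its own spec entry instead of using one shared array. The struct param_spec type, the Opt_* codes, the cache/sec tables and their values, and parse_enum() are all hypothetical, not the kernel's fsparam_enum() machinery.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical simplified types; not the kernel's fs_parameter_spec. */
struct constant_table {
	const char *name;
	int value;
};

struct param_spec {
	const char *name;
	const struct constant_table *data;  /* enum values for this param only */
	int opt;
};

/* Hypothetical per-parameter tables, each ending with a NULL name. */
static const struct constant_table cache_values[] = {
	{ "none", 0 }, { "loose", 1 }, { NULL, 0 }
};

static const struct constant_table sec_values[] = {
	{ "sys", 1 }, { "krb5", 2 }, { NULL, 0 }
};

enum { Opt_cache, Opt_sec };

static const struct param_spec params[] = {
	{ "cache", cache_values, Opt_cache },
	{ "sec",   sec_values,   Opt_sec },
	{ NULL,    NULL,         0 },
};

/*
 * Resolve "name=value" for an enum parameter: find the spec, then search
 * only the table attached to it. Returns 0 and fills *opt and *result,
 * or -1 for an unknown parameter or value.
 */
static int parse_enum(const char *name, const char *value, int *opt,
		      int *result)
{
	for (const struct param_spec *p = params; p->name; p++) {
		if (strcmp(p->name, name) != 0)
			continue;
		for (const struct constant_table *c = p->data; c->name; c++) {
			if (strcmp(c->name, value) == 0) {
				*opt = p->opt;
				*result = c->value;
				return 0;
			}
		}
		return -1;  /* known parameter, unknown value */
	}
	return -1;          /* unknown parameter */
}
```

    Each lookup only scans the values that belong to the parameter being parsed, and a value valid for one parameter cannot be accepted for another.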

    Signed-off-by: Al Viro

    Al Viro