24 Oct, 2020

1 commit


02 Sep, 2020

1 commit


07 Aug, 2020

1 commit


08 Jul, 2020

1 commit

  • Add support for NVM Express Zoned Namespaces (ZNS) Command Set defined
    in NVM Express TP4053. Zoned namespaces are discovered based on their
    Command Set Identifier reported in the namespaces Namespace
    Identification Descriptor list. A successfully discovered Zoned
    Namespace will be registered with the block layer as a host managed
    zoned block device with Zone Append command support. A namespace that
    does not support append is not supported by the driver.

    Reviewed-by: Martin K. Petersen
    Reviewed-by: Johannes Thumshirn
    Reviewed-by: Hannes Reinecke
    Reviewed-by: Sagi Grimberg
    Reviewed-by: Javier González
    Reviewed-by: Himanshu Madhani
    Signed-off-by: Hans Holmberg
    Signed-off-by: Dmitry Fomichev
    Signed-off-by: Ajay Joshi
    Signed-off-by: Aravind Ramesh
    Signed-off-by: Niklas Cassel
    Signed-off-by: Matias Bjørling
    Signed-off-by: Damien Le Moal
    Signed-off-by: Keith Busch
    Signed-off-by: Christoph Hellwig

    Keith Busch
     

25 Jun, 2020

1 commit


18 Jun, 2020

1 commit


14 Jun, 2020

1 commit

  • Since commit 84af7a6194e4 ("checkpatch: kconfig: prefer 'help' over
    '---help---'"), the number of '---help---' has been gradually
    decreasing, but there are still more than 2400 instances.

    This commit finishes the conversion. While I touched the lines,
    I also fixed the indentation.

    There are a variety of indentation styles found.

    a) 4 spaces + '---help---'
    b) 7 spaces + '---help---'
    c) 8 spaces + '---help---'
    d) 1 space + 1 tab + '---help---'
    e) 1 tab + '---help---' (correct indentation)
    f) 1 tab + 1 space + '---help---'
    g) 1 tab + 2 spaces + '---help---'

    In order to convert all of them to 1 tab + 'help', I ran the
    following commend:

    $ find . -name 'Kconfig*' | xargs sed -i 's/^[[:space:]]*---help---/\thelp/'

    Signed-off-by: Masahiro Yamada

    Masahiro Yamada
     

14 May, 2020

2 commits

  • Blk-crypto delegates crypto operations to inline encryption hardware
    when available. The separately configurable blk-crypto-fallback contains
    a software fallback to the kernel crypto API - when enabled, blk-crypto
    will use this fallback for en/decryption when inline encryption hardware
    is not available.

    This lets upper layers not have to worry about whether or not the
    underlying device has support for inline encryption before deciding to
    specify an encryption context for a bio. It also allows for testing
    without actual inline encryption hardware - in particular, it makes it
    possible to test the inline encryption code in ext4 and f2fs simply by
    running xfstests with the inlinecrypt mount option, which in turn allows
    for things like the regular upstream regression testing of ext4 to cover
    the inline encryption code paths.

    For more details, refer to Documentation/block/inline-encryption.rst.

    Signed-off-by: Satya Tangirala
    Reviewed-by: Eric Biggers
    Signed-off-by: Jens Axboe

    Satya Tangirala
     
  • Inline Encryption hardware allows software to specify an encryption context
    (an encryption key, crypto algorithm, data unit num, data unit size) along
    with a data transfer request to a storage device, and the inline encryption
    hardware will use that context to en/decrypt the data. The inline
    encryption hardware is part of the storage device, and it conceptually sits
    on the data path between system memory and the storage device.

    Inline Encryption hardware implementations often function around the
    concept of "keyslots". These implementations often have a limited number
    of "keyslots", each of which can hold a key (we say that a key can be
    "programmed" into a keyslot). Requests made to the storage device may have
    a keyslot and a data unit number associated with them, and the inline
    encryption hardware will en/decrypt the data in the requests using the key
    programmed into that associated keyslot and the data unit number specified
    with the request.

    As keyslots are limited, and programming keys may be expensive in many
    implementations, and multiple requests may use exactly the same encryption
    contexts, we introduce a Keyslot Manager to efficiently manage keyslots.

    We also introduce a blk_crypto_key, which will represent the key that's
    programmed into keyslots managed by keyslot managers. The keyslot manager
    also functions as the interface that upper layers will use to program keys
    into inline encryption hardware. For more information on the Keyslot
    Manager, refer to documentation found in block/keyslot-manager.c and
    linux/keyslot-manager.h.

    Co-developed-by: Eric Biggers
    Signed-off-by: Eric Biggers
    Signed-off-by: Satya Tangirala
    Reviewed-by: Eric Biggers
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Satya Tangirala
     

01 May, 2020

1 commit

  • On each IO completion, iocost decides whether the IO met or missed its latency
    target. Currently, the targets are fixed numbers per IO type. While this can be
    good enough for loose latency targets way higher than typical completion
    latencies, the effect of IO size makes it difficult to tighten the latency
    target - a target adequate for 4k IOs might be too tight for 512k IOs and
    vice-versa.

    iocost already has all the necessary information to account for different IO
    sizes when testing whether the latency target is met as iocost can calculate the
    size vtime cost of a given IO. This patch updates the completion path to
    calculate the size vtime cost of the IO, deduct the nsec equivalent from the
    observed latency and use the adjusted value to decide whether the target is met.

    This makes latency targets independent from IO size and enables determining
    adequate latency targets with fixed size fio runs.

    Signed-off-by: Tejun Heo
    Cc: Andy Newell
    Signed-off-by: Jens Axboe

    Tejun Heo
     

30 Jan, 2020

2 commits


13 Jan, 2020

1 commit

  • Changes v5 => v6:
    - Blk-crypto's kernel crypto API fallback is no longer restricted to
    8-byte DUNs. It's also now separately configurable from blk-crypto, and
    can be disabled entirely, while still allowing the kernel to use inline
    encryption hardware. Further, struct bio_crypt_ctx takes up less space,
    and no longer contains the information needed by the crypto API
    fallback - the fallback allocates the required memory when necessary.
    - Blk-crypto now supports all file content encryption modes supported by
    fscrypt.
    - Fixed bio merging logic in blk-merge.c
    - Fscrypt now supports inline encryption with the direct key policy, since
    blk-crypto now has support for larger DUNs.
    - Keyslot manager now uses a hashtable to lookup which keyslot contains
    any particular key (thanks Eric!)
    - Fscrypt support for inline encryption now handles filesystems with
    multiple underlying block devices (thanks Eric!)
    - Numerous cleanups

    Bug: 137270441
    Test: refer to I26376479ee38259b8c35732cb3a1d7e15f9b05a3
    Change-Id: I13e2e327e0b4784b394cb1e7cf32a04856d95f01
    Link: https://lore.kernel.org/linux-block/20191218145136.172774-1-satyat@google.com/
    Signed-off-by: Satya Tangirala

    Satya Tangirala
     

07 Jan, 2020

1 commit

  • Currently t10-pi can only be built into the block layer which via
    crc-t10dif pulls in a whole chunk of the Crypto API. In fact all
    users of t10-pi work as modules and there is no reason for it to
    always be built-in.

    This patch adds a new hidden option for t10-pi that is selected
    automatically based on BLK_DEV_INTEGRITY and whether the users
    of t10-pi are built-in or not.

    Signed-off-by: Herbert Xu
    Signed-off-by: Jens Axboe

    Herbert Xu
     

09 Dec, 2019

1 commit


08 Nov, 2019

1 commit


31 Oct, 2019

2 commits

  • We introduce blk-crypto, which manages programming keyslots for struct
    bios. With blk-crypto, filesystems only need to call bio_crypt_set_ctx with
    the encryption key, algorithm and data_unit_num; they don't have to worry
    about getting a keyslot for each encryption context, as blk-crypto handles
    that. Blk-crypto also makes it possible for layered devices like device
    mapper to make use of inline encryption hardware.

    Blk-crypto delegates crypto operations to inline encryption hardware when
    available, and also contains a software fallback to the kernel crypto API.
    For more details, refer to Documentation/block/inline-encryption.rst.

    Bug: 137270441
    Test: tested as series; see Ie1b77f7615d6a7a60fdc9105c7ab2200d17636a8
    Change-Id: I7df59fef0c1e90043b1899c5a95973e23afac0c5
    Signed-off-by: Satya Tangirala
    Link: https://patchwork.kernel.org/patch/11214731/

    Satya Tangirala
     
  • Inline Encryption hardware allows software to specify an encryption context
    (an encryption key, crypto algorithm, data unit num, data unit size, etc.)
    along with a data transfer request to a storage device, and the inline
    encryption hardware will use that context to en/decrypt the data. The
    inline encryption hardware is part of the storage device, and it
    conceptually sits on the data path between system memory and the storage
    device.

    Inline Encryption hardware implementations often function around the
    concept of "keyslots". These implementations often have a limited number
    of "keyslots", each of which can hold an encryption context (we say that
    an encryption context can be "programmed" into a keyslot). Requests made
    to the storage device may have a keyslot associated with them, and the
    inline encryption hardware will en/decrypt the data in the requests using
    the encryption context programmed into that associated keyslot. As
    keyslots are limited, and programming keys may be expensive in many
    implementations, and multiple requests may use exactly the same encryption
    contexts, we introduce a Keyslot Manager to efficiently manage keyslots.
    The keyslot manager also functions as the interface that upper layers will
    use to program keys into inline encryption hardware. For more information
    on the Keyslot Manager, refer to documentation found in
    block/keyslot-manager.c and linux/keyslot-manager.h.

    Bug: 137270441
    Test: tested as series; see Ie1b77f7615d6a7a60fdc9105c7ab2200d17636a8
    Change-Id: Iea1ee5a7eec46cb50d33cf1e2d20dfb7335af4ed
    Signed-off-by: Satya Tangirala
    Link: https://patchwork.kernel.org/patch/11214713/

    Satya Tangirala
     

29 Aug, 2019

2 commits

  • This patchset implements IO cost model based work-conserving
    proportional controller.

    While io.latency provides the capability to comprehensively prioritize
    and protect IOs depending on the cgroups, its protection is binary -
    the lowest latency target cgroup which is suffering is protected at
    the cost of all others. In many use cases including stacking multiple
    workload containers in a single system, it's necessary to distribute
    IO capacity with better granularity.

    One challenge of controlling IO resources is the lack of trivially
    observable cost metric. The most common metrics - bandwidth and iops
    - can be off by orders of magnitude depending on the device type and
    IO pattern. However, the cost isn't a complete mystery. Given
    several key attributes, we can make fairly reliable predictions on how
    expensive a given stream of IOs would be, at least compared to other
    IO patterns.

    The function which determines the cost of a given IO is the IO cost
    model for the device. This controller distributes IO capacity based
    on the costs estimated by such model. The more accurate the cost
    model the better but the controller adapts based on IO completion
    latency and as long as the relative costs across differents IO
    patterns are consistent and sensible, it'll adapt to the actual
    performance of the device.

    Currently, the only implemented cost model is a simple linear one with
    a few sets of default parameters for different classes of device.
    This covers most common devices reasonably well. All the
    infrastructure to tune and add different cost models is already in
    place and a later patch will also allow using bpf progs for cost
    models.

    Please see the top comment in blk-iocost.c and documentation for
    more details.

    v2: Rebased on top of RQ_ALLOC_TIME changes and folded in Rik's fix
    for a divide-by-zero bug in current_hweight() triggered by zero
    inuse_sum.

    Signed-off-by: Tejun Heo
    Cc: Andy Newell
    Cc: Josef Bacik
    Cc: Rik van Riel
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • There are currently two start time timestamps - start_time_ns and
    io_start_time_ns. The former marks the request allocation and and the
    second issue-to-device time. The planned io.weight controller needs
    to measure the total time bios take to execute after it leaves rq_qos
    including the time spent waiting for request to become available,
    which can easily dominate on saturated devices.

    This patch adds request->alloc_time_ns which records when the request
    allocation attempt started. As it isn't used for the usual stats,
    make it optional behind CONFIG_BLK_RQ_ALLOC_TIME and
    QUEUE_FLAG_RQ_ALLOC_TIME so that it can be compiled out when there are
    no users and it's active only on queues which need it even when
    compiled in.

    v2: s/pre_start_time/alloc_time/ and add CONFIG_BLK_RQ_ALLOC_TIME
    gating as suggested by Jens.

    Signed-off-by: Tejun Heo
    Signed-off-by: Jens Axboe

    Tejun Heo
     

15 Jul, 2019

2 commits


09 Jul, 2019

1 commit


15 Jun, 2019

1 commit

  • Convert the cgroup-v1 files to ReST format, in order to
    allow a later addition to the admin-guide.

    The conversion is actually:
    - add blank lines and identation in order to identify paragraphs;
    - fix tables markups;
    - add some lists markups;
    - mark literal blocks;
    - adjust title markups.

    At its new index.rst, let's add a :orphan: while this is not linked to
    the main index.rst file, in order to avoid build warnings.

    Signed-off-by: Mauro Carvalho Chehab
    Acked-by: Tejun Heo
    Signed-off-by: Tejun Heo

    Mauro Carvalho Chehab
     

13 Jun, 2019

1 commit

  • In most use cases of zoned block devices (aka SMR disks), the
    mq-deadline scheduler is mandatory as it implements sequential write
    command processing guarantees with zone write locking. So make sure that
    this scheduler is always enabled if CONFIG_BLK_DEV_ZONED is selected.

    Tested-by: Chaitanya Kulkarni
    Reviewed-by: Chaitanya Kulkarni
    Signed-off-by: Damien Le Moal
    Reviewed-by: Ming Lei
    Signed-off-by: Jens Axboe

    Damien Le Moal
     

07 Apr, 2019

1 commit

  • Currently support for 64-bit sector_t and blkcnt_t is optional on 32-bit
    architectures. These types are required to support block device and/or
    file sizes larger than 2 TiB, and have generally defaulted to on for
    a long time. Enabling the option only increases the i386 tinyconfig
    size by 145 bytes, and many data structures already always use
    64-bit values for their in-core and on-disk data structures anyway,
    so there should not be a large change in dynamic memory usage either.

    Dropping this option removes a somewhat weird non-default config that
    has cause various bugs or compiler warnings when actually used.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

30 Dec, 2018

1 commit

  • Pull Kconfig updates from Masahiro Yamada:

    - support -y option for merge_config.sh to avoid downgrading =y to =m

    - remove S_OTHER symbol type, and touch include/config/*.h files correctly

    - fix file name and line number in lexer warnings

    - fix memory leak when EOF is encountered in quotation

    - resolve all shift/reduce conflicts of the parser

    - warn no new line at end of file

    - make 'source' statement more strict to take only string literal

    - rewrite the lexer and remove the keyword lookup table

    - convert to SPDX License Identifier

    - compile C files independently instead of including them from zconf.y

    - fix various warnings of gconfig

    - misc cleanups

    * tag 'kconfig-v4.21' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (39 commits)
    kconfig: surround dbg_sym_flags with #ifdef DEBUG to fix gconf warning
    kconfig: split images.c out of qconf.cc/gconf.c to fix gconf warnings
    kconfig: add static qualifiers to fix gconf warnings
    kconfig: split the lexer out of zconf.y
    kconfig: split some C files out of zconf.y
    kconfig: convert to SPDX License Identifier
    kconfig: remove keyword lookup table entirely
    kconfig: update current_pos in the second lexer
    kconfig: switch to ASSIGN_VAL state in the second lexer
    kconfig: stop associating kconf_id with yylval
    kconfig: refactor end token rules
    kconfig: stop supporting '.' and '/' in unquoted words
    treewide: surround Kconfig file paths with double quotes
    microblaze: surround string default in Kconfig with double quotes
    kconfig: use T_WORD instead of T_VARIABLE for variables
    kconfig: use specific tokens instead of T_ASSIGN for assignments
    kconfig: refactor scanning and parsing "option" properties
    kconfig: use distinct tokens for type and default properties
    kconfig: remove redundant token defines
    kconfig: rename depends_list to comment_option_list
    ...

    Linus Torvalds
     

21 Dec, 2018

1 commit

  • The Kconfig lexer supports special characters such as '.' and '/' in
    the parameter context. In my understanding, the reason is just to
    support bare file paths in the source statement.

    I do not see a good reason to complicate Kconfig for the room of
    ambiguity.

    The majority of code already surrounds file paths with double quotes,
    and it makes sense since file paths are constant string literals.

    Make it treewide consistent now.

    Signed-off-by: Masahiro Yamada
    Acked-by: Wolfram Sang
    Acked-by: Geert Uytterhoeven
    Acked-by: Ingo Molnar

    Masahiro Yamada
     

08 Nov, 2018

1 commit


11 Oct, 2018

1 commit

  • 'default n' is the default value for any bool or tristate Kconfig
    setting so there is no need to write it explicitly.

    Also since commit f467c5640c29 ("kconfig: only write '# CONFIG_FOO
    is not set' for visible symbols") the Kconfig behavior is the same
    regardless of 'default n' being present or not:

    ...
    One side effect of (and the main motivation for) this change is making
    the following two definitions behave exactly the same:

    config FOO
    bool

    config FOO
    bool
    default n

    With this change, neither of these will generate a
    '# CONFIG_FOO is not set' line (assuming FOO isn't selected/implied).
    That might make it clearer to people that a bare 'default n' is
    redundant.
    ...

    Signed-off-by: Bartlomiej Zolnierkiewicz
    Signed-off-by: Jens Axboe

    Bartlomiej Zolnierkiewicz
     

27 Sep, 2018

1 commit

  • Move the code for runtime power management from blk-core.c into the
    new source file blk-pm.c. Move the corresponding declarations from
    into . For CONFIG_PM=n, leave out
    the declarations of the functions that are not used in that mode.
    This patch not only reduces the number of #ifdefs in the block layer
    core code but also reduces the size of header file
    and hence should help to reduce the build time of the Linux kernel
    if CONFIG_PM is not defined.

    Signed-off-by: Bart Van Assche
    Reviewed-by: Ming Lei
    Reviewed-by: Christoph Hellwig
    Cc: Jianchao Wang
    Cc: Hannes Reinecke
    Cc: Johannes Thumshirn
    Cc: Alan Stern
    Signed-off-by: Jens Axboe

    Bart Van Assche
     

09 Jul, 2018

2 commits

  • Current IO controllers for the block layer are less than ideal for our
    use case. The io.max controller is great at hard limiting, but it is
    not work conserving. This patch introduces io.latency. You provide a
    latency target for your group and we monitor the io in short windows to
    make sure we are not exceeding those latency targets. This makes use of
    the rq-qos infrastructure and works much like the wbt stuff. There are
    a few differences from wbt

    - It's bio based, so the latency covers the whole block layer in addition to
    the actual io.
    - We will throttle all IO types that comes in here if we need to.
    - We use the mean latency over the 100ms window. This is because writes can
    be particularly fast, which could give us a false sense of the impact of
    other workloads on our protected workload.
    - By default there's no throttling, we set the queue_depth to INT_MAX so that
    we can have as many outstanding bio's as we're allowed to. Only at
    throttle time do we pay attention to the actual queue depth.
    - We backcharge cgroups for root cg issued IO and induce artificial
    delays in order to deal with cases like metadata only or swap heavy
    workloads.

    In testing this has worked out relatively well. Protected workloads
    will throttle noisy workloads down to 1 io at time if they are doing
    normal IO on their own, or induce up to a 1 second delay per syscall if
    they are doing a lot of root issued IO (metadata/swap IO).

    Our testing has revolved mostly around our production web servers where
    we have hhvm (the web server application) in a protected group and
    everything else in another group. We see slightly higher requests per
    second (RPS) on the test tier vs the control tier, and much more stable
    RPS across all machines in the test tier vs the control tier.

    Another test we run is a slow memory allocator in the unprotected group.
    Before this would eventually push us into swap and cause the whole box
    to die and not recover at all. With these patches we see slight RPS
    drops (usually 10-15%) before the memory consumer is properly killed and
    things recover within seconds.

    Signed-off-by: Josef Bacik
    Acked-by: Tejun Heo
    Signed-off-by: Jens Axboe

    Josef Bacik
     
  • Exclude zoned block device members from struct request_queue for
    CONFIG_BLK_DEV_ZONED == n. Avoid breaking the build by only building
    the code that uses these struct request_queue members if
    CONFIG_BLK_DEV_ZONED != n.

    Signed-off-by: Bart Van Assche
    Reviewed-by: Damien Le Moal
    Cc: Matias Bjorling
    Cc: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Bart Van Assche
     

16 Jun, 2018

1 commit

  • As we move stuff around, some doc references are broken. Fix some of
    them via this script:
    ./scripts/documentation-file-ref-check --fix

    Manually checked if the produced result is valid, removing a few
    false-positives.

    Acked-by: Takashi Iwai
    Acked-by: Masami Hiramatsu
    Acked-by: Stephen Boyd
    Acked-by: Charles Keepax
    Acked-by: Mathieu Poirier
    Reviewed-by: Coly Li
    Signed-off-by: Mauro Carvalho Chehab
    Acked-by: Jonathan Corbet

    Mauro Carvalho Chehab
     

02 Nov, 2017

1 commit

  • Many source files in the tree are missing licensing information, which
    makes it harder for compliance tools to determine the correct license.

    By default all files without license information are under the default
    license of the kernel, which is GPL version 2.

    Update the files which contain no license information with the 'GPL-2.0'
    SPDX license identifier. The SPDX identifier is a legally binding
    shorthand, which can be used instead of the full boiler plate text.

    This patch is based on work done by Thomas Gleixner and Kate Stewart and
    Philippe Ombredanne.

    How this work was done:

    Patches were generated and checked against linux-4.14-rc6 for a subset of
    the use cases:
    - file had no licensing information it it.
    - file was a */uapi/* one with no licensing information in it,
    - file was a */uapi/* one with existing licensing information,

    Further patches will be generated in subsequent months to fix up cases
    where non-standard license headers were used, and references to license
    had to be inferred by heuristics based on keywords.

    The analysis to determine which SPDX License Identifier to be applied to
    a file was done in a spreadsheet of side by side results from of the
    output of two independent scanners (ScanCode & Windriver) producing SPDX
    tag:value files created by Philippe Ombredanne. Philippe prepared the
    base worksheet, and did an initial spot review of a few 1000 files.

    The 4.13 kernel was the starting point of the analysis with 60,537 files
    assessed. Kate Stewart did a file by file comparison of the scanner
    results in the spreadsheet to determine which SPDX license identifier(s)
    to be applied to the file. She confirmed any determination that was not
    immediately clear with lawyers working with the Linux Foundation.

    Criteria used to select files for SPDX license identifier tagging was:
    - Files considered eligible had to be source code files.
    - Make and config files were included as candidates if they contained >5
    lines of source
    - File already had some variant of a license header in it (even if
    Reviewed-by: Philippe Ombredanne
    Reviewed-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     

09 Aug, 2017

1 commit

  • Like pci and virtio, we add a rdma helper for affinity
    spreading. This achieves optimal mq affinity assignments
    according to the underlying rdma device affinity maps.

    Reviewed-by: Jens Axboe
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Max Gurtovoy
    Signed-off-by: Sagi Grimberg
    Signed-off-by: Doug Ledford

    Sagi Grimberg
     

13 May, 2017

1 commit

  • Pull libnvdimm fixes from Dan Williams:
    "Incremental fixes and a small feature addition on top of the main
    libnvdimm 4.12 pull request:

    - Geert noticed that tinyconfig was bloated by BLOCK selecting DAX.
    The size regression is fixed by moving all dax helpers into the
    dax-core and only specifying "select DAX" for FS_DAX and
    dax-capable drivers. He also asked for clarification of the
    NR_DEV_DAX config option which, on closer look, does not need to be
    a config option at all. Mike also throws in a DEV_DAX_PMEM fixup
    for good measure.

    - Ben's attention to detail on -stable patch submissions caught a
    case where the recent fixes to arch_copy_from_iter_pmem() missed a
    condition where we strand dirty data in the cache. This is tagged
    for -stable and will also be included in the rework of the pmem api
    to a proposed {memcpy,copy_user}_flushcache() interface for 4.13.

    - Vishal adds a feature that missed the initial pull due to pending
    review feedback. It allows the kernel to clear media errors when
    initializing a BTT (atomic sector update driver) instance on a pmem
    namespace.

    - Ross noticed that the dax_device + dax_operations conversion broke
    __dax_zero_page_range(). The nvdimm unit tests fail to check this
    path, but xfstests immediately trips over it. No excuse for missing
    this before submitting the 4.12 pull request.

    These all pass the nvdimm unit tests and an xfstests spot check. The
    set has received a build success notification from the kbuild robot"

    * 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
    filesystem-dax: fix broken __dax_zero_page_range() conversion
    libnvdimm, btt: ensure that initializing metadata clears poison
    libnvdimm: add an atomic vs process context flag to rw_bytes
    x86, pmem: Fix cache flushing for iovec write < 8 bytes
    device-dax: kill NR_DEV_DAX
    block, dax: move "select DAX" from BLOCK to FS_DAX
    device-dax: Tell kbuild DEV_DAX_PMEM depends on DEV_DAX

    Linus Torvalds
     

09 May, 2017

1 commit

  • For configurations that do not enable DAX filesystems or drivers, do not
    require the DAX core to be built.

    Given that the 'direct_access' method has been removed from
    'block_device_operations', we can also go ahead and remove the
    block-related dax helper functions from fs/block_dev.c to
    drivers/dax/super.c. This keeps dax details out of the block layer and
    lets the DAX core be built as a module in the FS_DAX=n case.

    Filesystems need to include dax.h to call bdev_dax_supported().

    Cc: linux-xfs@vger.kernel.org
    Cc: Jens Axboe
    Cc: "Theodore Ts'o"
    Cc: Matthew Wilcox
    Cc: Alexander Viro
    Cc: "Darrick J. Wong"
    Cc: Ross Zwisler
    Reviewed-by: Jan Kara
    Reported-by: Geert Uytterhoeven
    Signed-off-by: Dan Williams

    Dan Williams
     

06 May, 2017

1 commit

  • Pull libnvdimm updates from Dan Williams:
    "The bulk of this has been in multiple -next releases. There were a few
    late breaking fixes and small features that got added in the last
    couple days, but the whole set has received a build success
    notification from the kbuild robot.

    Change summary:

    - Region media error reporting: A libnvdimm region device is the
    parent to one or more namespaces. To date, media errors have been
    reported via the "badblocks" attribute attached to pmem block
    devices for namespaces in "raw" or "memory" mode. Given that
    namespaces can be in "device-dax" or "btt-sector" mode this new
    interface reports media errors generically, i.e. independent of
    namespace modes or state.

    This subsequently allows userspace tooling to craft "ACPI 6.1
    Section 9.20.7.6 Function Index 4 - Clear Uncorrectable Error"
    requests and submit them via the ioctl path for NVDIMM root bus
    devices.

    - Introduce 'struct dax_device' and 'struct dax_operations': Prompted
    by a request from Linus and feedback from Christoph this allows for
    dax capable drivers to publish their own custom dax operations.
    This fixes the broken assumption that all dax operations are
    related to a persistent memory device, and makes it easier for
    other architectures and platforms to add customized persistent
    memory support.

    - 'libnvdimm' core updates: A new "deep_flush" sysfs attribute is
    available for storage appliance applications to manually trigger
    memory controllers to drain write-pending buffers that would
    otherwise be flushed automatically by the platform ADR
    (asynchronous-DRAM-refresh) mechanism at a power loss event.
    Support for "locked" DIMMs is included to prevent namespaces from
    surfacing when the namespace label data area is locked. Finally,
    fixes for various reported deadlocks and crashes, also tagged for
    -stable.

    - ACPI / nfit driver updates: General updates of the nfit driver to
    add DSM command overrides, ACPI 6.1 health state flags support, DSM
    payload debug available by default, and various fixes.

    Acknowledgements that came after the branch was pushed:

    - commmit 565851c972b5 "device-dax: fix sysfs attribute deadlock":
    Tested-by: Yi Zhang

    - commit 23f498448362 "libnvdimm: rework region badblocks clearing"
    Tested-by: Toshi Kani "

    * tag 'libnvdimm-for-4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (52 commits)
    libnvdimm, pfn: fix 'npfns' vs section alignment
    libnvdimm: handle locked label storage areas
    libnvdimm: convert NDD_ flags to use bitops, introduce NDD_LOCKED
    brd: fix uninitialized use of brd->dax_dev
    block, dax: use correct format string in bdev_dax_supported
    device-dax: fix sysfs attribute deadlock
    libnvdimm: restore "libnvdimm: band aid btt vs clear poison locking"
    libnvdimm: fix nvdimm_bus_lock() vs device_lock() ordering
    libnvdimm: rework region badblocks clearing
    acpi, nfit: kill ACPI_NFIT_DEBUG
    libnvdimm: fix clear length of nvdimm_forget_poison()
    libnvdimm, pmem: fix a NULL pointer BUG in nd_pmem_notify
    libnvdimm, region: sysfs trigger for nvdimm_flush()
    libnvdimm: fix phys_addr for nvdimm_clear_poison
    x86, dax, pmem: remove indirection around memcpy_from_pmem()
    block: remove block_device_operations ->direct_access()
    block, dax: convert bdev_dax_supported() to dax_direct_access()
    filesystem-dax: convert to dax_direct_access()
    Revert "block: use DAX for partition table reads"
    ext2, ext4, xfs: retrieve dax_device for iomap operations
    ...

    Linus Torvalds
     

21 Apr, 2017

1 commit

  • Replace bdev_direct_access() with dax_direct_access() that uses
    dax_device and dax_operations instead of a block_device and
    block_device_operations for dax. Once all consumers of the old api have
    been converted bdev_direct_access() will be deleted.

    Given that block device partitioning decisions can cause dax page
    alignment constraints to be violated this also introduces the
    bdev_dax_pgoff() helper. It handles calculating a logical pgoff relative
    to the dax_device and also checks for page alignment.

    Signed-off-by: Dan Williams

    Dan Williams