12 Jan, 2016

25 commits

  • Now that a device can be managed using the system blocks, a method to
    reset the device is necessary as well. This patch introduces logic to
    reset the device easily to factory state and exposes it through an
    ioctl.

    The ioctl takes the following flags:

    NVM_FACTORY_ERASE_ONLY_USER
    By default, all blocks except host-reserved blocks are erased upon
    factory reset. Instead of this, only erase host-reserved blocks.
    NVM_FACTORY_RESET_HOST_BLKS
    Mark host-reserved blocks to be erased and set their type to free.
    NVM_FACTORY_RESET_GRWN_BBLKS
    Mark "grown bad blocks" to be erased and set their type to free.
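
    A minimal sketch of how the flag word might be validated and
    interpreted. The bit positions, the FACTORY_ALL_FLAGS mask, the blk_mark
    enum and both helper functions are assumptions made for illustration;
    only the three flag names come from the patch description.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions are assumed for illustration, not taken from the patch. */
enum {
    NVM_FACTORY_ERASE_ONLY_USER  = 1u << 0,
    NVM_FACTORY_RESET_HOST_BLKS  = 1u << 1,
    NVM_FACTORY_RESET_GRWN_BBLKS = 1u << 2,
    FACTORY_ALL_FLAGS            = (1u << 3) - 1  /* hypothetical mask */
};

/* Hypothetical block marks, mirroring the block types the flags mention. */
enum blk_mark { BLK_FREE, BLK_HOST_RESERVED, BLK_GROWN_BAD };

/* Reject any bit outside the three documented flags. */
static int factory_flags_valid(uint32_t flags)
{
    return (flags & ~(uint32_t)FACTORY_ALL_FLAGS) == 0;
}

/* Return 1 if a block carrying this mark is reset to free under `flags`. */
static int factory_resets_mark(uint32_t flags, enum blk_mark mark)
{
    assert(factory_flags_valid(flags));
    switch (mark) {
    case BLK_HOST_RESERVED:
        return (flags & NVM_FACTORY_RESET_HOST_BLKS) != 0;
    case BLK_GROWN_BAD:
        return (flags & NVM_FACTORY_RESET_GRWN_BBLKS) != 0;
    default:
        return 0;
    }
}
```

    Flags combine with bitwise OR, so a caller could request both resets
    with NVM_FACTORY_RESET_HOST_BLKS | NVM_FACTORY_RESET_GRWN_BBLKS.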

    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Matias Bjørling
     
  • Use system block information to register the appropriate media manager.
    This enables the LightNVM subsystem to instantiate a media manager
    selected by the user, instead of relying on automatic detection by each
    media manager loaded in the kernel.

    A device must now be initialized before it can proceed to initialize its
    media manager. Upon initialization, the configured media manager is
    automatically initialized as well.

    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Matias Bjørling
     
  • Based on the previous patch, we now introduce an ioctl to initialize the
    device using nvm_init_sysblock and create the necessary system blocks.
    The user may specify the media manager that they wish to instantiate on
    top. Default from user-space will be "gennvm".

    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Matias Bjørling
     
  • An Open-Channel SSD shall be initialized before use. To initialize, we
    define an on-disk format, that keeps a small set of metadata to bring up
    the media manager on top of the device.

    The initial step is introduced to allow a user to format the disks for a
    given media manager. During format, a system block is stored on one to
    three separate luns on the device. Each lun has the system block
    duplicated. During initialization, the system block can be retrieved and
    the appropriate media manager can be initialized.

    The on-disk format currently covers (struct nvm_system_block):

    - Magic value "NVMS".
    - Monotonically increasing sequence number.
    - The physical block erase count.
    - Version of the system block format.
    - Media manager type.
    - Media manager superblock physical address.

    The interface provides three functions to manage the system block:

    int nvm_init_sysblock(struct nvm_dev *, struct nvm_sb_info *)
    int nvm_get_sysblock(struct nvm_dev *dev, struct nvm_sb_info *)
    int nvm_update_sysblock(struct nvm_dev *dev, struct nvm_sb_info *)

    Each implements a part of the logic to manage the system block. The
    initialization creates the first system blocks and marks them on the
    device. Get retrieves the latest system block by scanning all pages in
    the associated system blocks. Update writes new metadata and allocates
    a new block if necessary.
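
    The "get" step can be sketched as a scan that keeps the valid page
    with the highest sequence number. The struct name, field names and field
    widths below are assumptions for illustration; only the list of fields
    and the "NVMS" magic come from the description above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical in-memory view of one system block page. */
struct sysblk_page {
    char     magic[4];   /* "NVMS" */
    uint32_t seqnr;      /* monotonically increasing sequence number */
    uint32_t erase_cnt;  /* physical block erase count */
    uint16_t version;    /* version of the system block format */
    char     mmtype[8];  /* media manager type, e.g. "gennvm" */
    uint64_t fs_ppa;     /* media manager superblock physical address */
};

/* Return the valid page with the highest sequence number, or NULL. */
static const struct sysblk_page *
sysblk_latest(const struct sysblk_page *pages, size_t n)
{
    const struct sysblk_page *best = NULL;

    for (size_t i = 0; i < n; i++) {
        if (memcmp(pages[i].magic, "NVMS", 4) != 0)
            continue;  /* unwritten or foreign page */
        if (!best || pages[i].seqnr > best->seqnr)
            best = &pages[i];
    }
    return best;
}
```

    Because the sequence number only grows, the newest copy wins no matter
    which lun or page it was found on.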

    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Matias Bjørling
     
  • NAND MLC memories have both lower and upper pages. When programming,
    both of these must be written before data can be read. However,
    these lower and upper pages might not be placed at even and odd flash
    pages, but can be skipped. Therefore each flash memory has its lower
    pages defined, which can then be used when programming and to know when
    padding is necessary.

    This patch implements the lower page definition in the specification,
    and exposes it through a simple lookup table at dev->lptbl.
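
    As a rough illustration of a lower page lookup, the sketch below
    assumes the table is a sorted array of lower page numbers. That layout,
    the element type and the helper name is_lower_page are assumptions; the
    patch text only states that a lookup table exists at dev->lptbl.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return 1 if page `pg` appears in the sorted lower page table. */
static int is_lower_page(const uint8_t *lptbl, size_t n, uint8_t pg)
{
    size_t lo = 0, hi = n;

    while (lo < hi) {               /* binary search over [lo, hi) */
        size_t mid = lo + (hi - lo) / 2;

        if (lptbl[mid] == pg)
            return 1;
        if (lptbl[mid] < pg)
            lo = mid + 1;
        else
            hi = mid;
    }
    return 0;
}
```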

    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Matias Bjørling
     
  • Some flash media have extended capabilities, such as programming SLC
    pages on MLC/TLC flash, erase/program suspend, scramble and encryption.
    MCCAP is introduced to detect support for these capabilities in the
    command set.

    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Matias Bjørling
     
  • LightNVM targets need to know the state of the flash block when doing
    flash optimizations. An example is implementing a write buffer to
    respect the flash page size. Currently, block state is not accounted
    for; the media manager only differentiates among free, bad and in-use
    blocks.

    This patch adds the logic in the generic media manager to enable
    targets to manage open and closed blocks separately, and it implements
    such management in rrpc. It also adds a set of flags to describe the
    state of the block (open, closed, free, bad).

    In order to avoid taking two locks (nvm_lun and rrpc_lun) consecutively,
    we introduce lockless get_/put_block primitives so that the open and
    close list locks and future common logic are handled within the nvm_lun
    lock.

    Signed-off-by: Javier González
    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Javier González
     
  • The get/set bad block interface defines good block, factory bad block,
    grown bad block, device reserved block, and host reserved block.
    Unfortunately the grown bad block was missing, leaving the offsets wrong
    for device and host side reserved blocks.

    This patch adds the missing type and corrects the offsets.

    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Matias Bjørling
     
  • Currently, a rrpc block only points to its nvm_lun. If a user wants to
    find the associated rrpc lun, it will have to calculate the index and
    look it up manually. By referencing the rrpc lun directly, this step can
    be omitted, at the cost of a larger memory footprint.

    This is important for upcoming patches that implement write buffering in
    rrpc.

    Signed-off-by: Javier González
    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Javier González
     
  • Internal logic for both core and media managers does not have a
    backing bio for issuing I/Os. Introduce nvm_submit_ppa to allow raw
    I/Os to be submitted to the underlying device driver.

    The function takes the device, ppa, data buffer and its length and
    will submit the I/O synchronously to the device. The return value may
    therefore be used to detect any errors regarding the issued I/O.

    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Matias Bjørling
     
  • Instead of passing request error into the LightNVM modules, incorporate
    it into the nvm_rq.

    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Matias Bjørling
     
  • Sometimes a user wants to erase multiple PPAs at the same time. Extend
    nvm_erase_ppa to take multiple ppas and number of ppas to be erased.

    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Matias Bjørling
     
  • There is no need to check whether the device's pages per block is
    beyond rrpc support every time we init a lun; we only need
    to check it once before entering the lun init loop.

    Signed-off-by: Wenwei Tao
    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Wenwei Tao
     
  • The Westlake controller requires that the PPA list has sectors defined
    sequentially. Currently, the PPA list is created with planes first, then
    sectors. Change this to sectors first, then planes.
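
    One reading of "sectors first, then planes" is that all sectors of a
    plane occupy consecutive list entries before the next plane starts. The
    sketch below follows that reading; the reduced ppa_entry struct and the
    function name are hypothetical and not the kernel's address format.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reduced address: only the plane and sector fields. */
struct ppa_entry {
    unsigned pl;
    unsigned sec;
};

/*
 * Fill `list` so that the sectors of each plane are consecutive.
 * `list` must hold nr_planes * nr_secs entries; returns that count.
 */
static size_t ppa_list_sectors_first(struct ppa_entry *list,
                                     unsigned nr_planes, unsigned nr_secs)
{
    for (unsigned pl = 0; pl < nr_planes; pl++) {
        for (unsigned sec = 0; sec < nr_secs; sec++) {
            list[pl * nr_secs + sec].pl = pl;
            list[pl * nr_secs + sec].sec = sec;
        }
    }
    return (size_t)nr_planes * nr_secs;
}
```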

    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Matias Bjørling
     
  • This patch fixes two issues in rrpc_lun_gc:

    1. prio_list is protected by rrpc_lun's lock, not nvm_lun's, so
    acquire rlun's lock instead of lun's before operating on the list.

    2. We delete the block from prio_list before allocating gcb, but gcb
    allocation may fail, in which case we return without putting the block
    back on the list, so it will never be reclaimed. To solve this issue,
    delete the block only after gcb allocation succeeds.
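
    The second fix follows a general ordering rule: acquire everything
    that can fail before unlinking. A minimal userspace sketch, assuming a
    singly linked list and an injectable allocator so a failure can be
    simulated; all type and function names are hypothetical, and locking is
    left out for brevity.

```c
#include <assert.h>
#include <stdlib.h>

struct blk {
    struct blk *next;
    int id;
};

struct gc_work {
    struct blk *blk;
};

/* Hypothetical allocator hook so an allocation failure can be simulated. */
typedef void *(*alloc_fn)(size_t size);

/* Take the head of the priority list only once the work item exists. */
static struct gc_work *gc_take_head(struct blk **prio_list, alloc_fn alloc)
{
    struct gc_work *gcb;

    if (!*prio_list)
        return NULL;

    gcb = alloc(sizeof(*gcb));
    if (!gcb)
        return NULL;              /* the block is still on the list */

    gcb->blk = *prio_list;
    *prio_list = gcb->blk->next;  /* unlink only after allocation succeeded */
    gcb->blk->next = NULL;
    return gcb;
}

/* Stand-in allocator that always fails. */
static void *failing_alloc(size_t size)
{
    (void)size;
    return NULL;
}
```

    With this ordering a failed allocation leaves the list untouched, so
    a later garbage collection pass can retry the same block.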

    Signed-off-by: Wenwei Tao
    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Wenwei Tao
     
  • We delete a block from the gc list before reclaiming it, so
    put it back on the list if its reclaim fails; otherwise
    this block will not get reclaimed and become programmable
    in the future.

    Signed-off-by: Wenwei Tao
    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Wenwei Tao
     
  • We should check last io completion status before
    starting another one.

    Signed-off-by: Wenwei Tao
    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Wenwei Tao
     
  • During get_bb_tbl, a callback is used to allow a user-specific scan
    function to be called. The callback may return an error, and in that
    case, the return value is overridden. However, the callback error is
    needed when the fault is a user error and not a kernel error. For
    example, when a user tries to initialize the same device twice. The
    get_bb_tbl callback should be able to communicate this.

    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Matias Bjørling
     
  • To implement sync I/O support within the LightNVM core, the end_io
    functions are refactored to take an end_io function pointer instead of
    testing for initialized media manager, followed by calling its end_io
    function.

    Sync I/O can then be implemented using a callback that signals I/O
    completion. This is similar to the logic found in blk_to_execute_io().
    By implementing it this way, the underlying device I/O submission logic
    is abstracted away from core, targets, and media managers.
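
    The callback-based approach can be sketched in userspace as follows.
    The request carries its own end_io pointer and error field, and the
    synchronous wrapper supplies a callback that records completion. All
    names are hypothetical, and the stand-in device completes immediately,
    whereas a kernel implementation would block until the callback fires.

```c
#include <assert.h>
#include <stddef.h>

struct rq;
typedef void (*end_io_fn)(struct rq *rq);

struct rq {
    end_io_fn end_io;  /* completion callback chosen by the submitter */
    int error;         /* completion status carried in the request */
    void *priv;        /* submitter's private data */
};

struct sync_wait {
    int done;
};

/* Completion callback used for synchronous submission. */
static void sync_end_io(struct rq *rq)
{
    ((struct sync_wait *)rq->priv)->done = 1;
}

/* Stand-in for a device driver: completes at once with `status`. */
static void fake_dev_submit(struct rq *rq, int status)
{
    rq->error = status;
    rq->end_io(rq);
}

/* Submit one request and return its completion status. */
static int submit_sync(int simulated_status)
{
    struct sync_wait wait = { 0 };
    struct rq rq = { sync_end_io, 0, &wait };

    fake_dev_submit(&rq, simulated_status);
    assert(wait.done);  /* a real implementation would wait here */
    return rq.error;
}
```

    The submitter never inspects the driver; it only sees the callback
    and the error field, which is the abstraction the message describes.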

    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Matias Bjørling
     
  • A device may be driven in single, double or quad plane mode. In that
    case, the rqd must have either one, two, or four PPAs set for a single
    PPA sent to the device. Refactor this logic into its own
    functions to be shared by program/erase/read in the core.

    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Matias Bjørling
     
  • A device may function in single, dual or quad plane mode. The gennvm
    media manager manages this with explicit helpers. They convert a single
    ppa to 1, 2 or 4 separate ppas in a ppa list. To aid implementation of
    recovery and system blocks, this functionality can be moved directly
    into the core.
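
    A minimal sketch of such a helper, assuming a reduced address with
    only block and plane fields. The plane_ppa struct and the function name
    are hypothetical; the text only states that one ppa becomes 1, 2 or 4
    ppas in a list.

```c
#include <assert.h>

/* Hypothetical reduced address: block number plus plane index. */
struct plane_ppa {
    unsigned blk;
    unsigned pl;
};

/*
 * Expand one address into plane_cnt entries that differ only in the
 * plane field. Returns the number of entries written, or -1 if the
 * plane count is not 1, 2 or 4.
 */
static int ppa_expand_planes(struct plane_ppa base, unsigned plane_cnt,
                             struct plane_ppa out[4])
{
    if (plane_cnt != 1 && plane_cnt != 2 && plane_cnt != 4)
        return -1;

    for (unsigned pl = 0; pl < plane_cnt; pl++) {
        out[pl] = base;
        out[pl].pl = pl;
    }
    return (int)plane_cnt;
}
```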

    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Matias Bjørling
     
  • When rrpc_write_ppalist_rq and rrpc_read_ppalist_rq succeed, we set up
    rq correctly, but nvm_submit_io may afterward fail since it cannot
    allocate request or nvme_nvm_command; we return error but forget to
    clean up the previous work.

    Signed-off-by: Wenwei Tao
    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Wenwei Tao
     
  • The mempool allocation might fail. Make sure to return error when it
    does, instead of causing a kernel panic.
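
    The pattern is the usual NULL check on an allocation result. A
    userspace sketch, with an injectable allocator standing in for the
    mempool so the failure path can be exercised; the struct and function
    names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical allocator hook standing in for a mempool allocation. */
typedef void *(*alloc_fn)(size_t size);

struct req {
    int nr_pages;
};

/* Allocate and initialize a request; return -ENOMEM instead of crashing. */
static int req_setup(alloc_fn alloc, struct req **out)
{
    struct req *rq = alloc(sizeof(*rq));

    if (!rq)
        return -ENOMEM;  /* report the failure to the caller */

    rq->nr_pages = 0;
    *out = rq;
    return 0;
}

/* Stand-in allocator that always fails. */
static void *failing_alloc(size_t size)
{
    (void)size;
    return NULL;
}
```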

    Signed-off-by: Javier Gonzalez
    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Javier Gonzalez
     
  • When initializing the bad block list in gennvm_block_bb, once we move a
    bad block from free_list to bb_list, we should maintain both stat
    counters, nr_free_blocks and nr_bad_blocks. This patch adds the missing
    update of nr_free_blocks.
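
    In sketch form, both counters have to change together when a block
    moves between the lists. The lun_stats struct and lun_mark_bad helper
    are hypothetical; only the two counter names come from the message.

```c
#include <assert.h>

struct lun_stats {
    unsigned nr_free_blocks;
    unsigned nr_bad_blocks;
};

/* Account for one block moving from the free list to the bad block list. */
static void lun_mark_bad(struct lun_stats *s)
{
    assert(s->nr_free_blocks > 0);
    s->nr_free_blocks--;  /* the update that was missing */
    s->nr_bad_blocks++;
}
```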

    Signed-off-by: Chao Yu
    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Chao Yu
     
  • Put the bio when submission fails, since we get it
    before submission. Also return an error when the backend
    device driver doesn't provide a submit_io method,
    so that we can end the I/O properly.

    Signed-off-by: Wenwei Tao
    Signed-off-by: Matias Bjørling
    Signed-off-by: Jens Axboe

    Wenwei Tao
     

04 Jan, 2016

3 commits


01 Jan, 2016

5 commits

  • Pull PCI bugfix from Bjorn Helgaas:
    "Here's another fix for v4.4.

    This fixes 32-bit config reads for the HiSilicon driver. Obviously
    the driver is completely broken without this fix (apparently it
    actually was tested internally, but got broken somehow in the process
    of upstreaming it).

    Summary:

    HiSilicon host bridge driver
    Fix 32-bit config reads (Dongdong Liu)"

    * tag 'pci-v4.4-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
    PCI: hisi: Fix hisi_pcie_cfg_read() 32-bit reads

    Linus Torvalds
     
  • Pull sparc fixes from David Miller:
    "Just some missing syscall wire ups"

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
    sparc: Wire up mlock2 system call.
    sparc: Add all necessary direct socket system calls.

    Linus Torvalds
     
  • Pull networking fixes from David Miller:

    1) Prevent XFRM per-cpu counter updates for one namespace from being
    applied to another namespace. Fix from Dan Streetman.

    2) Fix RCU de-reference in iwl_mvm_get_key_sta_id(), from Johannes
    Berg.

    3) Remove ethernet header assumption in nft_do_chain_netdev(), from
    Pablo Neira Ayuso.

    4) Fix cpsw PHY ident with multiple slaves and fixed-phy, from Pascal
    Speck.

    5) Fix use after free in sixpack_close and mkiss_close.

    6) Fix VXLAN fw assertion on bnx2x, from Yuval Mintz.

    7) natsemi doesn't check for DMA mapping errors, from Alexey
    Khoroshilov.

    8) Fix inverted test in ip6addrlbl_get(), from Andrey Ryabinin.

    9) Missing initialization of needed_headroom in geneve tunnel driver,
    from Paolo Abeni.

    10) Fix conntrack template leak in openvswitch, from Joe Stringer.

    11) Missing initialization of wq->flags in sock_alloc_inode(), from
    Nicolai Stange.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (35 commits)
    sctp: sctp should release assoc when sctp_make_abort_user return NULL in sctp_close
    net, socket, socket_wq: fix missing initialization of flags
    drivers: net: cpsw: fix error return code
    openvswitch: Fix template leak in error cases.
    sctp: label accepted/peeled off sockets
    sctp: use GFP_USER for user-controlled kmalloc
    qlcnic: fix a loop exit condition better
    net: cdc_ncm: avoid changing RX/TX buffers on MTU changes
    geneve: initialize needed_headroom
    ipv6: honor ifindex in case we receive ll addresses in router advertisements
    addrconf: always initialize sysctl table data
    ipv6/addrlabel: fix ip6addrlbl_get()
    switchdev: bridge: Pass ageing time as clock_t instead of jiffies
    sh_eth: fix 16-bit descriptor field access endianness too
    veth: don’t modify ip_summed; doing so treats packets with bad checksums as good.
    net: usb: cdc_ncm: Adding Dell DW5813 LTE AT&T Mobile Broadband Card
    net: usb: cdc_ncm: Adding Dell DW5812 LTE Verizon Mobile Broadband Card
    natsemi: add checks for dma mapping errors
    rhashtable: Kill harmless RCU warning in rhashtable_walk_init
    openvswitch: correct encoding of set tunnel action attributes
    ...

    Linus Torvalds
     
  • Signed-off-by: David S. Miller

    David S. Miller
     
  • The GLIBC folks would like to eliminate socketcall support
    eventually, and this makes sense regardless so wire them
    all up.

    Signed-off-by: David S. Miller

    David S. Miller
     

31 Dec, 2015

4 commits

  • In sctp_close, sctp_make_abort_user may return NULL because of memory
    allocation failure. If this happens, it will bypass any state change
    and never free the assoc. The assoc has no chance to be freed and it
    will be kept in memory with the state it had even after the socket is
    closed by sctp_close().

    So if sctp_make_abort_user fails to allocate memory, we should abort
    the asoc via sctp_primitive_ABORT as well. Just like the annotation in
    sctp_sf_cookie_wait_prm_abort and sctp_sf_do_9_1_prm_abort said,
    "Even if we can't send the ABORT due to low memory delete the TCB.
    This is a departure from our typical NOMEM handling".

    But then the chunk is NULL (low memory) and the SCTP_CMD_REPLY cmd would
    dereference the chunk pointer, and crash the system. So we should add
    SCTP_CMD_REPLY cmd only when the chunk is not NULL, just like other
    places where it adds SCTP_CMD_REPLY cmd.

    Signed-off-by: Xin Long
    Acked-by: Marcelo Ricardo Leitner
    Signed-off-by: David S. Miller

    Xin Long
     
  • …m/linux/kernel/git/kvalo/wireless-drivers

    Kalle Valo says:

    ====================
    iwlwifi

    * don't load firmware that won't exist for 7260
    * fix RCU splat
    ====================

    Signed-off-by: David S. Miller <davem@davemloft.net>

    David S. Miller
     
  • Commit ceb5d58b2170 ("net: fix sock_wake_async() rcu protection") from
    the current 4.4 release cycle introduced a new flags member in
    struct socket_wq and moved SOCKWQ_ASYNC_NOSPACE and SOCKWQ_ASYNC_WAITDATA
    from struct socket's flags member into that new place.

    Unfortunately, the new flags field is never initialized properly, at least
    not for the struct socket_wq instance created in sock_alloc_inode().

    One particular issue I encountered because of this is that my GNU Emacs
    failed to draw anything on my desktop -- i.e. what I got is a transparent
    window, including the title bar. Bisection led to the commit mentioned
    above and further investigation by means of strace told me that Emacs
    is indeed speaking to my Xorg through an O_ASYNC AF_UNIX socket. This is
    reproducible 100% of the time and the fact that properly initializing the
    struct socket_wq ->flags fixes the issue leads me to the conclusion that
    somehow SOCKWQ_ASYNC_WAITDATA got set in the uninitialized ->flags,
    preventing my Emacs from receiving any SIGIO's due to data becoming
    available and it got stuck.

    Make sock_alloc_inode() set the newly created struct socket_wq's ->flags
    member to zero.

    Fixes: ceb5d58b2170 ("net: fix sock_wake_async() rcu protection")
    Signed-off-by: Nicolai Stange
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Nicolai Stange
     
  • Pull block fixes from Jens Axboe:
    "Make the block layer great again.

    Basically three amazing fixes in this pull request, split into 4
    patches. Believe me, they should go into 4.4. Two of them fix a
    regression, the third and last fixes an easy-to-trigger bug.

    - Fix a bad irq enable through null_blk, for queue_mode=1 and using
    timer completions. Add a block helper to restart a queue
    asynchronously, and use that from null_blk. From me.

    - Fix a performance issue in NVMe. Some devices (Intel Pxxxx) expose
    a stripe boundary, and performance suffers if we cross it. We took
    that into account for merging, but not for the newer splitting
    code. Fix from Keith.

    - Fix a kernel oops in lightnvm with multiple channels. From Matias"

    * 'for-linus' of git://git.kernel.dk/linux-block:
    lightnvm: wrong offset in bad blk lun calculation
    null_blk: use async queue restart helper
    block: add blk_start_queue_async()
    block: Split bios on chunk boundaries

    Linus Torvalds
     

30 Dec, 2015

3 commits

  • The total HDMI hotplug detection delay of 30ms is sometimes not enough
    for the HDMI live status to come up with specific HDMI monitors on the
    BSW platform.

    Experiments with the following monitors show that at least 80ms is
    needed in the worst cases.

    Lenovo L246 1xwA (4 failed, necessary hot-plug delay: 58/40/60/40ms)
    Philips HH2AP (9 failed, necessary hot-plug delay: 80/50/50/60/46/40/58/58/39ms)
    BENQ ET-0035-N (6 failed, necessary hot-plug delay: 60/50/50/80/80/40ms)
    DELL U2713HM (2 failed, necessary hot-plug delay: 58/59ms)
    HP HP-LP2475w (5 failed, necessary hot-plug delay: 70/50/40/60/40ms)

    It looks like the BSW platform needs 70-80 ms in some bad cases with the
    monitors at this end (8 delays at most). Keep the total below 100ms so
    that an HDCP pulse of HPD low (at least 100ms) is still handled as a
    plug out.
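
    A bounded polling loop of this kind can be sketched as follows. The
    10ms step is an assumption chosen so that 8 delays give the 80ms total
    mentioned above; the callback types, function names and stand-in
    helpers are hypothetical and are not the i915 driver code.

```c
#include <assert.h>
#include <stdbool.h>

typedef bool (*probe_fn)(void *ctx);
typedef void (*sleep_ms_fn)(unsigned ms);

/*
 * Probe once immediately, then up to max_delays more times, sleeping
 * delay_ms before each retry. Returns true as soon as a probe succeeds.
 */
static bool wait_live_status(probe_fn probe, void *ctx, sleep_ms_fn sleep_ms,
                             unsigned max_delays, unsigned delay_ms)
{
    for (unsigned attempt = 0; attempt <= max_delays; attempt++) {
        if (attempt)
            sleep_ms(delay_ms);
        if (probe(ctx))
            return true;
    }
    return false;
}

/* Stand-in sleep that only records how long the loop would have waited. */
static unsigned total_slept_ms;

static void fake_sleep_ms(unsigned ms)
{
    total_slept_ms += ms;
}

/* Stand-in probe: reports "connected" once the countdown in ctx hits zero. */
static bool probe_countdown(void *ctx)
{
    unsigned *left = ctx;

    if (*left == 0)
        return true;
    (*left)--;
    return false;
}
```

    Passing the sleep function as a parameter keeps the worst-case wait
    (max_delays * delay_ms) easy to check without actually sleeping.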

    Reviewed-by: Cooper Chiou
    Tested-by: Gary Wang
    Cc: Gavin Hindman
    Cc: Sonika Jindal
    Cc: Shashank Sharma
    Cc: Shobhit Kumar
    Signed-off-by: Gary Wang
    Link: http://patchwork.freedesktop.org/patch/msgid/1450858295-12804-1-git-send-email-gary.c.wang@intel.com
    Tested-by: Shobhit Kumar
    Cc: drm-intel-fixes@lists.freedesktop.org
    Fixes: 237ed86c693d ("drm/i915: Check live status before reading edid")
    Signed-off-by: Daniel Vetter
    (cherry picked from commit f8d03ea0053b23de42c828d559016eabe0b91523)
    [Jani: undo the file mode change of the original commit]
    Signed-off-by: Jani Nikula

    Gary Wang
     
  • Merge misc fixes from Andrew Morton:
    "9 fixes"

    * emailed patches from Andrew Morton:
    mm/vmstat: fix overflow in mod_zone_page_state()
    ocfs2/dlm: clear migration_pending when migration target goes down
    mm/memory_hotplug.c: check for missing sections in test_pages_in_a_zone()
    ocfs2: fix flock panic issue
    m32r: add io*_rep helpers
    m32r: fix build failure
    arch/x86/xen/suspend.c: include xen/xen.h
    mm: memcontrol: fix possible memcg leak due to interrupted reclaim
    ocfs2: fix BUG when calculate new backup super

    Linus Torvalds
     
  • Pull vfs fix from Al Viro:
    "Fix for 3.15 breakage of fcntl64() in arm OABI compat. -stable
    fodder"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    [PATCH] arm: fix handling of F_OFD_... in oabi_fcntl64()

    Linus Torvalds