06 Sep, 2014

40 commits

  • Greg Kroah-Hartman
     
  • commit a9ef803d740bfadf5e505fbc57efa57692e27025 upstream.

    commit bdd405d2a528 ("usb: hub: Prevent hub autosuspend if
    usbcore.autosuspend is -1") causes a build error if CONFIG_PM_RUNTIME is
    disabled. Fix that with a simple #ifdef guard around the offending code
    (a minimal sketch of the pattern follows this entry).

    Reported-by: Stephen Rothwell
    Reported-by: kbuild test robot
    Cc: Roger Quadros
    Cc: Michael Welling
    Cc: Alan Stern
    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
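
    A minimal sketch of the guard pattern described above, with a hypothetical
    function name standing in for the runtime-PM-only code; the point is only
    that such code must be compiled out when CONFIG_PM_RUNTIME is not set:

        #ifdef CONFIG_PM_RUNTIME
                /* only compiled (and only meaningful) with runtime PM support */
                runtime_pm_only_setup(hdev);    /* hypothetical helper */
        #endif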
     
  • commit 4449a51a7c281602d3a385044ab928322a122a02 upstream.

    Aleksei hit a soft lockup while reading /proc/PID/smaps. David
    investigated the problem and suggested the right fix.

    while_each_thread() is racy and should die; this patch updates
    vm_is_stack() accordingly (a hedged sketch of the replacement pattern
    follows this entry).

    Signed-off-by: Oleg Nesterov
    Reported-by: Aleksei Besogonov
    Tested-by: Aleksei Besogonov
    Suggested-by: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Oleg Nesterov
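
    A hedged sketch of the replacement pattern, not the actual mm/util.c hunk:
    walk a task's threads with for_each_thread() under rcu_read_lock() instead
    of the racy while_each_thread() loop; check_thread() is a hypothetical
    predicate standing in for the real per-thread test.

        #include <linux/sched.h>
        #include <linux/rcupdate.h>

        static pid_t find_matching_thread(struct task_struct *task)
        {
                struct task_struct *t;
                pid_t ret = 0;

                rcu_read_lock();
                for_each_thread(task, t) {
                        if (check_thread(t)) {  /* hypothetical predicate */
                                ret = t->pid;
                                break;
                        }
                }
                rcu_read_unlock();
                return ret;
        }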
     
  • commit aee7af356e151494d5014f57b33460b162f181b5 upstream.

    In the presence of delegations, we can no longer assume that the
    state->n_rdwr, state->n_rdonly, state->n_wronly reflect the open
    stateid share mode, and so we need to calculate the initial value
    for calldata->arg.fmode using the state->flags.

    Reported-by: James Drews
    Fixes: 88069f77e1ac5 (NFSv41: Fix a potential state leakage when...)
    Signed-off-by: Trond Myklebust
    Signed-off-by: Greg Kroah-Hartman

    Trond Myklebust
     
  • commit 412f6c4c26fb1eba8844290663837561ac53fa6e upstream.

    If we did an OPEN_DOWNGRADE, then the right thing to do on success, is
    to apply the new open mode to the struct nfs4_state. Instead, we were
    unconditionally clearing the state, making it appear to our state
    machinery as if we had just performed a CLOSE.

    Fixes: 226056c5c312b (NFSv4: Use correct locking when updating nfs4_state...)
    Signed-off-by: Trond Myklebust
    Signed-off-by: Greg Kroah-Hartman

    Trond Myklebust
     
  • commit f87d928f6d98644d39809a013a22f981d39017cf upstream.

    When creating a new object on the NFS server, we should not be sending
    posix setacl requests unless the preceding posix_acl_create returned a
    non-trivial acl. Doing so causes Solaris servers in particular to
    return EINVAL.

    Fixes: 013cdf1088d72 (nfs: use generic posix ACL infrastructure...)
    Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1132786
    Signed-off-by: Trond Myklebust
    Signed-off-by: Greg Kroah-Hartman

    Trond Myklebust
     
  • commit 3c45ddf823d679a820adddd53b52c6699c9a05ac upstream.

    The current code always selects XPRT_TRANSPORT_BC_TCP for the back
    channel, even when the forward channel was not TCP (eg, RDMA). When
    a 4.1 mount is attempted with RDMA, the server panics in the TCP BC
    code when trying to send CB_NULL.

    Instead, construct the transport protocol number from the forward
    channel's transport OR'd with XPRT_TRANSPORT_BC. Transports that do
    not support bi-directional RPC will not have registered a "BC"
    transport, causing create_backchannel_client() to fail immediately
    (a hedged sketch of the construction follows this entry).

    Fixes: https://bugzilla.linux-nfs.org/show_bug.cgi?id=265
    Signed-off-by: Chuck Lever
    Signed-off-by: J. Bruce Fields
    Signed-off-by: Greg Kroah-Hartman

    Chuck Lever
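
    A hedged sketch of that construction, with an assumption-named parameter
    rather than the actual nfsd fields: the back-channel transport identifier
    is just the forward channel's registered transport identifier OR'd with
    XPRT_TRANSPORT_BC, so a transport without a registered "BC" variant makes
    client creation fail instead of panicking in the TCP back-channel path.

        #include <linux/sunrpc/xprt.h>

        static unsigned int backchannel_transport_id(unsigned int fwd_ident)
        {
                /* fwd_ident: forward channel's transport ident (assumed) */
                return fwd_ident | XPRT_TRANSPORT_BC;
        }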
     
  • commit 71a6ec8ac587418ceb6b420def1ca44b334c1ff7 upstream.

    Commit c8e47028 made it possible to change resvport/noresvport and
    sharecache/nosharecache via a remount operation, neither of which should be
    allowed.

    Signed-off-by: Scott Mayhew
    Fixes: c8e47028 (nfs: Apply NFS_MOUNT_CMP_FLAGMASK to nfs_compare_remount_data)
    Signed-off-by: Trond Myklebust
    Signed-off-by: Greg Kroah-Hartman

    Scott Mayhew
     
  • commit 7a9e75a185e6b3a3860e6a26fb6e88691fc2c9d9 upstream.

    The code only checked that the result was not NULL, but get_acl() may
    return NULL, an ERR_PTR, or an actual pointer. The purpose of the
    function being changed is to list ACLs only when they are available, so
    an error from get_acl() must not be propagated; returning 0 there is
    still valid (a hedged sketch of the check follows this entry).

    Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=81111
    Signed-off-by: Andrey Utkin
    Reviewed-by: Christoph Hellwig
    Fixes: 74adf83f5d77 (nfs: only show Posix ACLs in listxattr if actually...)
    Signed-off-by: Trond Myklebust
    Signed-off-by: Greg Kroah-Hartman

    Andrey Utkin
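
    A hedged sketch of the corrected check, not the exact NFS hunk: both NULL
    and ERR_PTR results from get_acl() are treated as "no ACL to list" and
    contribute nothing, instead of being propagated as errors.

        #include <linux/fs.h>
        #include <linux/err.h>
        #include <linux/posix_acl.h>

        static ssize_t acl_listxattr_len(struct inode *inode, int type)
        {
                struct posix_acl *acl = get_acl(inode, type);

                if (IS_ERR_OR_NULL(acl))
                        return 0;   /* not available: list nothing, no error */
                posix_acl_release(acl);
                return 1;           /* placeholder for the real name length */
        }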
     
  • commit d9499a95716db0d4bc9b67e88fd162133e7d6b08 upstream.

    A memory allocation failure could cause nfsd_startup_generic to fail, in
    which case nfsd_users would be incorrectly left elevated.

    After nfsd restarts nfsd_startup_generic will then succeed without doing
    anything--the first consequence is likely nfs4_start_net finding a bad
    laundry_wq and crashing.

    Signed-off-by: Kinglong Mee
    Fixes: 4539f14981ce "nfsd: replace boolean nfsd_up flag by users counter"
    Signed-off-by: J. Bruce Fields
    Signed-off-by: Greg Kroah-Hartman

    Kinglong Mee
     
  • commit dd5f5006d1035547559c8a90781a7e249787a7a2 upstream.

    The commit [5ee0f803cc3a: usbcore: don't log on consecutive debounce
    failures of the same port] added the check of the reliable port, but
    it also replaced the device argument to dev_err() wrongly, which leads
    to a NULL dereference.

    This patch restores the right device, port_dev->dev. Also, since
    dev_err() itself already shows the port number, drop the redundant port
    number from the error message, essentially reverting to the state before
    commit 5ee0f803cc3a.

    [The fix suggested by Hannes, and the error message cleanup suggested
    by Alan Stern]

    Fixes: 5ee0f803cc3a ('usbcore: don't log on consecutive debounce failures of the same port')
    Reported-by: Hannes Reinecke
    Signed-off-by: Takashi Iwai
    Signed-off-by: Greg Kroah-Hartman

    Takashi Iwai
     
  • commit bdd405d2a5287bdb9b04670ea255e1f122138e66 upstream.

    If the user specifies that USB autosuspend must be disabled by the module
    parameter "usbcore.autosuspend=-1", then we must prevent autosuspend of
    USB hub devices as well.

    commit 596d789a211d, introduced in v3.8, changed the original behaviour
    and stopped respecting the usbcore.autosuspend parameter for hubs.

    Fixes: 596d789a211d "USB: set hub's default autosuspend delay as 0"

    Signed-off-by: Roger Quadros
    Tested-by: Michael Welling
    Acked-by: Alan Stern
    Signed-off-by: Greg Kroah-Hartman

    Roger Quadros
     
  • commit 5cbcc35e5bf0eae3c7494ce3efefffc9977827ae upstream.

    The roothub's index per controller starts from 0, but the hub port index
    per hub starts from 1. This patch fixes the "can't find device at
    roothub" problem when connecting a test fixture at the roothub while
    doing the USB-IF Embedded Host High-Speed Electrical Test.

    This patch is for v3.12+.

    Signed-off-by: Peter Chen
    Acked-by: Alan Stern
    Signed-off-by: Greg Kroah-Hartman

    Peter Chen
     
  • commit 6817ae225cd650fb1c3295d769298c38b1eba818 upstream.

    This patch fixes a potential security issue in the whiteheat USB driver
    which might allow a local attacker to cause kernel memory corruption.
    This is due to an unchecked memcpy into a fixed-size buffer (of 64
    bytes). On EHCI and XHCI busses it is possible to craft responses
    greater than 64 bytes, leading to a buffer overflow (a generic sketch of
    the bounds check follows this entry).

    Signed-off-by: James Forshaw
    Signed-off-by: Greg Kroah-Hartman

    James Forshaw
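
    A generic sketch of the bounds check, with hypothetical names rather than
    the actual whiteheat code: a device-supplied length must be validated
    before it is memcpy'd into a fixed-size buffer.

        #include <linux/types.h>
        #include <linux/string.h>
        #include <linux/errno.h>

        static int copy_command_response(u8 *dst, size_t dst_size,
                                         const u8 *resp, size_t resp_len)
        {
                if (resp_len > dst_size)        /* device sent more than fits */
                        return -EOVERFLOW;
                memcpy(dst, resp, resp_len);
                return 0;
        }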
     
  • commit 646907f5bfb0782c731ae9ff6fb63471a3566132 upstream.

    Add support to the ftdi_sio driver for the ekey Converter USB, which
    uses an FT232BM chip.

    Signed-off-by: Jaša Bartelj
    Signed-off-by: Johan Hovold
    Signed-off-by: Greg Kroah-Hartman

    Jaša Bartelj
     
  • commit 6552cc7f09261db2aeaae389aa2c05a74b3a93b4 upstream.

    Add device id for Basic Micro ATOM Nano USB2Serial adapters.

    Reported-by: Nicolas Alt
    Tested-by: Nicolas Alt
    Signed-off-by: Johan Hovold
    Signed-off-by: Greg Kroah-Hartman

    Johan Hovold
     
  • commit cc824534d4fef0e46e4486d5c1e10d3c6b1ebadc upstream.

    Looks like MUSB cable removal can cause wake-up interrupts to
    stop working for device tree based booting at least for UART3
    even as nothing is dynamically remuxed. This can be fixed by
    calling reconfigure_io_chain() for device tree based booting
    in hwmod code. Note that we already do that for legacy booting
    if the legacy mux is configured.

    My guess is that this is related to UART3 and MUSB ULPI
    hsusb0_data0 and hsusb0_data1 support for Carkit mode that
    somehow affect the configured IO chain for UART3 and require
    rearming the wake-up interrupts.

    In general, for device tree based booting, pinctrl-single
    calls the rearm hook that in turn calls reconfigure_io_chain
    so calling reconfigure_io_chain should not be needed from the
    hwmod code for other events.

    So let's limit the hwmod rearming of iochain only to
    HWMOD_FORCE_MSTANDBY where MUSB is currently the only user
    of it. If we see other devices needing similar changes we can
    add more checks for it.

    Cc: Paul Walmsley
    Signed-off-by: Tony Lindgren
    Signed-off-by: Greg Kroah-Hartman

    Tony Lindgren
     
  • commit e21eba05afd288a227320f797864ddd859397eed upstream.

    This is a bit bigger hammer than I would like to use for this, but for
    now it will have to do. I'm working on getting my hands on one of these
    so that I can try to get streams to work (with a quirk flag if
    necessary) and then we can re-enable them.

    For now this at least makes UAS-capable disk enclosures work again by
    forcing a fallback to the usb-storage driver.

    https://bugzilla.kernel.org/show_bug.cgi?id=79511

    Signed-off-by: Hans de Goede
    Acked-by: Mathias Nyman
    Signed-off-by: Greg Kroah-Hartman

    Hans de Goede
     
  • commit 365038d83313951d6ace15342eb24624bbef1666 upstream.

    When we manually need to move the TR dequeue pointer, we need to set the
    correct cycle bit as well. Previously we used the TRB pointer from the
    last event received as a base, but this was changed in
    commit 1f81b6d22a59 ("usb: xhci: Prefer endpoint context dequeue pointer")
    to use the dequeue pointer from the endpoint context instead.

    It turns out some Asmedia controllers advance the dequeue pointer
    stored in the endpoint context past the event-triggering TRB, and
    this messed up the way the cycle bit was calculated.

    Instead of adding a quirk or complicating the already hard-to-follow
    cycle bit code, the whole cycle bit calculation is now simplified and
    adapted to handle event and endpoint context dequeue pointer differences.

    Fixes: 1f81b6d22a59 ("usb: xhci: Prefer endpoint context dequeue pointer")
    Reported-by: Maciej Puzio
    Reported-by: Evan Langlois
    Reviewed-by: Julius Werner
    Tested-by: Maciej Puzio
    Tested-by: Evan Langlois
    Signed-off-by: Mathias Nyman
    Signed-off-by: Greg Kroah-Hartman

    Mathias Nyman
     
  • commit 2597fe99bb0259387111d0431691f5daac84f5a5 upstream.

    AMD xHC also needs the short-TX quirk; this was confirmed by testing on
    most chipset generations, which show the same incorrect behavior as the
    Fresco Logic host. See the messages below, logged with a USB webcam
    attached to the xHC host:

    [ 139.262944] xhci_hcd 0000:00:10.0: WARN Successful completion on short TX: needs XHCI_TRUST_TX_LENGTH quirk?
    [ 139.266934] xhci_hcd 0000:00:10.0: WARN Successful completion on short TX: needs XHCI_TRUST_TX_LENGTH quirk?
    [ 139.270913] xhci_hcd 0000:00:10.0: WARN Successful completion on short TX: needs XHCI_TRUST_TX_LENGTH quirk?
    [ 139.274937] xhci_hcd 0000:00:10.0: WARN Successful completion on short TX: needs XHCI_TRUST_TX_LENGTH quirk?
    [ 139.278914] xhci_hcd 0000:00:10.0: WARN Successful completion on short TX: needs XHCI_TRUST_TX_LENGTH quirk?
    [ 139.282936] xhci_hcd 0000:00:10.0: WARN Successful completion on short TX: needs XHCI_TRUST_TX_LENGTH quirk?
    [ 139.286915] xhci_hcd 0000:00:10.0: WARN Successful completion on short TX: needs XHCI_TRUST_TX_LENGTH quirk?
    [ 139.290938] xhci_hcd 0000:00:10.0: WARN Successful completion on short TX: needs XHCI_TRUST_TX_LENGTH quirk?
    [ 139.294913] xhci_hcd 0000:00:10.0: WARN Successful completion on short TX: needs XHCI_TRUST_TX_LENGTH quirk?
    [ 139.298917] xhci_hcd 0000:00:10.0: WARN Successful completion on short TX: needs XHCI_TRUST_TX_LENGTH quirk?

    Reported-by: Arindam Nath
    Tested-by: Shriraj-Rai P
    Signed-off-by: Huang Rui
    Signed-off-by: Mathias Nyman
    Signed-off-by: Greg Kroah-Hartman

    Huang Rui
     
  • commit 9a54886342e227433aebc9d374f8ae268a836475 upstream.

    When using a Renesas uPD720231 chipset USB-3 UAS to SATA bridge with a
    120G Crucial M500 SSD (model string: Crucial_ CT120M500SSD1), together
    with the integrated Intel xHCI controller on a Haswell laptop:

    00:14.0 USB controller [0c03]: Intel Corporation 8 Series USB xHCI HC [8086:9c31] (rev 04)

    The following error gets logged to dmesg:

    xhci error: Transfer event TRB DMA ptr not part of current TD

    Treating COMP_STOP the same as COMP_STOP_INVAL when no event_seg gets found
    fixes this.

    Signed-off-by: Hans de Goede
    Signed-off-by: Mathias Nyman
    Signed-off-by: Greg Kroah-Hartman

    Hans de Goede
     
  • commit a2fa6721c7237b5a666f16f732628c0c09c0b954 upstream.

    The Elecom WDC-150SU2M uses this chip.

    Reported-by: Hiroki Kondo
    Signed-off-by: Larry Finger
    Signed-off-by: Greg Kroah-Hartman

    Larry Finger
     
  • commit 8626d524ef08f10fccc0c41e5f75aef8235edf47 upstream.

    The stick is not recognized: this dongle uses r8188eu, but its USB ID is
    missing from the driver (observed on 3.16.0).

    Signed-off-by: Holger Paradies
    Signed-off-by: Larry Finger
    Signed-off-by: Greg Kroah-Hartman

    Holger Paradies
     
  • commit ec0a38bf8b28b036202070cf3ef271e343d9eafc upstream.

    Fix two reported bugs, caused by et131x_adapter->phydev->addr being accessed
    before it is initialised, by:

    - letting et131x_mii_write() take a phydev address, instead of using the one
    stored in adapter by default. This is so et131x_mdio_write() can use its own
    addr value.
    - removing implementation of et131x_mdio_reset(), as it's not needed.
    - moving a call to et131x_disable_phy_coma() in et131x_pci_setup(), which uses
    phydev->addr, until after the mdiobus has been registered.

    Link: https://bugzilla.kernel.org/show_bug.cgi?id=80751
    Link: https://bugzilla.kernel.org/show_bug.cgi?id=77121
    Signed-off-by: Mark Einon
    Signed-off-by: Greg Kroah-Hartman

    Mark Einon
     
  • commit e409842a03b0c2c41c0959fef8a7563208af36c1 upstream.

    The following patch fixes a build error on sparc32. I think it should go to
    stable 3.16.

    Remove a circular dependency on atomic.h header file which leads to compilation
    failure on sparc32 as reported here:
    http://kisskb.ellerman.id.au/kisskb/buildresult/11340509/

    The specific dependency is as follows:

    In file included from arch/sparc/include/asm/smp_32.h:24:0,
    from arch/sparc/include/asm/smp.h:6,
    from arch/sparc/include/asm/switch_to_32.h:4,
    from arch/sparc/include/asm/switch_to.h:6,
    from arch/sparc/include/asm/ptrace.h:84,
    from arch/sparc/include/asm/processor_32.h:16,
    from arch/sparc/include/asm/processor.h:6,
    from arch/sparc/include/asm/barrier_32.h:4,
    from arch/sparc/include/asm/barrier.h:6,
    from arch/sparc/include/asm/atomic_32.h:17,
    from arch/sparc/include/asm/atomic.h:6,
    from drivers/staging/lustre/lustre/obdclass/class_obd.c:38

    Signed-off-by: Pranith Kumar
    Signed-off-by: Greg Kroah-Hartman

    Pranith Kumar
     
  • commit db9ee220361de03ee86388f9ea5e529eaad5323c upstream.

    It turns out that there are some serious problems with the on-disk
    format of journal checksum v2. The foremost is that the function to
    calculate descriptor tag size returns sizes that are too big. This
    causes alignment issues on some architectures and is compounded by the
    fact that some parts of jbd2 use the structure size (incorrectly) to
    determine the presence of a 64bit journal instead of checking the
    feature flags.

    Therefore, introduce journal checksum v3, which enlarges the
    descriptor block tag format to allow for full 32-bit checksums of
    journal blocks, fix the journal tag function to return the correct
    sizes, and fix the jbd2 recovery code to use feature flags to
    determine 64bitness.

    Add a few function helpers so we don't have to open-code quite so
    many pieces.

    Switching to a 16-byte block size was found to increase journal size
    overhead by a maximum of 0.1% when converting a 32-bit journal with no
    checksumming to a 32-bit journal with checksum v3 enabled.

    Signed-off-by: Darrick J. Wong
    Reported-by: TR Reardon
    Signed-off-by: Theodore Ts'o
    Signed-off-by: Greg Kroah-Hartman

    Darrick J. Wong
     
  • commit 022eaa7517017efe4f6538750c2b59a804dc7df7 upstream.

    When recovering the journal, don't fall into an infinite loop if we
    encounter a corrupt journal block. Instead, just skip the block and
    return an error, which fails the mount and thus forces the user to run
    a full filesystem fsck.

    Signed-off-by: Darrick J. Wong
    Signed-off-by: Theodore Ts'o
    Signed-off-by: Greg Kroah-Hartman

    Darrick J. Wong
     
  • commit d80d448c6c5bdd32605b78a60fe8081d82d4da0f upstream.

    When performing a same-directory rename, it's possible that adding or
    setting the new directory entry will cause the directory to overflow
    the inline data area, which causes the directory to be converted to an
    extent-based directory. Under this circumstance it is necessary to
    re-read the directory when deleting the old dirent because the "old
    directory" context still points to i_block in the inode table, which
    is now an extent tree root! The delete fails with an FS error, and
    the subsequent fsck complains about incorrect link counts and
    hardlinked directories.

    Test case (originally found with flat_dir_test in the metadata_csum
    test program):

    # mkfs.ext4 -O inline_data /dev/sda
    # mount /dev/sda /mnt
    # mkdir /mnt/x
    # touch /mnt/x/changelog.gz /mnt/x/copyright /mnt/x/README.Debian
    # sync
    # for i in /mnt/x/*; do mv $i $i.longer; done
    # ls -la /mnt/x/
    total 0
    -rw-r--r-- 1 root root 0 Aug 25 12:03 changelog.gz.longer
    -rw-r--r-- 1 root root 0 Aug 25 12:03 copyright
    -rw-r--r-- 1 root root 0 Aug 25 12:03 copyright.longer
    -rw-r--r-- 1 root root 0 Aug 25 12:03 README.Debian.longer

    (Hey! Why are there four files now??)

    Signed-off-by: Darrick J. Wong
    Signed-off-by: Theodore Ts'o
    Signed-off-by: Greg Kroah-Hartman

    Darrick J. Wong
     
  • commit 6603120e96eae9a5d6228681ae55c7fdc998d1bb upstream.

    In the delalloc case i_disksize may be less than i_size, so we have to
    update i_disksize each time we allocate and submit blocks beyond
    i_disksize. We were not doing this on the error paths, so fix this.

    testcase: xfstest generic/019

    Signed-off-by: Dmitry Monakhov
    Signed-off-by: Theodore Ts'o
    Signed-off-by: Greg Kroah-Hartman

    Dmitry Monakhov
     
  • commit c174e6d6979a04b7b77b93f244396be4b81f8bfb upstream.

    After commit f282ac19d86f we use different transactions for
    preallocation and the i_disksize update, which results in complaints
    from fsck after a power failure (spotted by generic/019). IMHO this is a
    regression because the fs becomes inconsistent; even worse, 'e2fsck -p'
    no longer works (which drives admins crazy). The same-transaction
    requirement also applies to the ctime/mtime updates.

    testcase: xfstest generic/019

    Signed-off-by: Dmitry Monakhov
    Signed-off-by: Theodore Ts'o
    Signed-off-by: Greg Kroah-Hartman

    Dmitry Monakhov
     
  • commit 69dc9536405213c1d545fcace1fc15c481d00aae upstream.

    Currently we reserve only 4 blocks, but in the worst-case scenario
    ext4_zero_partial_blocks() may want to zero out and convert two
    non-adjacent blocks.

    Signed-off-by: Dmitry Monakhov
    Signed-off-by: Theodore Ts'o
    Signed-off-by: Greg Kroah-Hartman

    Dmitry Monakhov
     
  • commit 4631dbf677ded0419fee35ca7408285dabfaef1a upstream.

    Signed-off-by: Dmitry Monakhov
    Signed-off-by: Theodore Ts'o
    Signed-off-by: Greg Kroah-Hartman

    Dmitry Monakhov
     
  • commit 8e8248b1369c97c7bb6f8bcaee1f05deeabab8ef upstream.

    The NFC code will leak the buffer if the send fails. Use a single exit
    point that does the freeing (a generic sketch of the idiom follows this
    entry).

    Signed-off-by: Alexander Usyskin
    Signed-off-by: Tomas Winkler
    Signed-off-by: Greg Kroah-Hartman

    Alexander Usyskin
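
    A generic sketch of the single-exit-point idiom with hypothetical names,
    not the actual mei code: every failure after the allocation funnels
    through one label that frees the buffer, so a failed send can no longer
    leak it.

        #include <linux/device.h>
        #include <linux/slab.h>

        static int send_nfc_message(struct device *dev, size_t len)
        {
                u8 *buf = kzalloc(len, GFP_KERNEL);
                int ret;

                if (!buf)
                        return -ENOMEM;

                ret = build_message(buf, len);          /* hypothetical helper */
                if (ret)
                        goto out;

                ret = do_send(dev, buf, len);           /* hypothetical helper */
        out:
                kfree(buf);
                return ret;
        }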
     
  • commit 73ab4232388b7a08f17c8d08141ff2099fa0b161 upstream.

    If the connect request is queued (e.g. the device is in pg), set the
    client state to initializing, thus avoiding a premature exit from the
    wait if the current state is disconnected.

    This is regression from:

    commit e4d8270e604c3202131bac607969605ac397b893
    Author: Alexander Usyskin
    mei: set connecting state just upon connection request is sent to the fw

    Signed-off-by: Alexander Usyskin
    Signed-off-by: Tomas Winkler
    Signed-off-by: Greg Kroah-Hartman

    Alexander Usyskin
     
  • commit 9e0af23764344f7f1b68e4eefbe7dc865018b63d upstream.

    This has been reported and discussed for a long time, and this hang occurs in
    both 3.15 and 3.16.

    Btrfs has now migrated to the kernel workqueue implementation, but the
    migration introduced this hang problem.

    Btrfs has a kind of work that is queued in an ordered way, which means
    that its ordered_func() must be processed FIFO, so it usually looks
    like --

    normal_work_helper(arg)
        work = container_of(arg, struct btrfs_work, normal_work);

        work->func()
        for each work on ordered_list:
            ordered_work->ordered_func()
            ordered_work->ordered_free()

    The hang is a rare case: first, when we find free space, we get an
    uncached block group, then we go to read its free space cache inode for
    free space information, so it will

    file a readahead request
        btrfs_readpages()
            for a page that is not in the page cache
                __do_readpage()
                    submit_extent_page()
                        btrfs_submit_bio_hook()
                            btrfs_bio_wq_end_io()
                                submit_bio()
    end_workqueue_bio()                  current_work = arg;  (normal_work)
        worker->current_func(arg)
            normal_work_helper(arg)
                A = container_of(arg, struct btrfs_work, normal_work);

                A->func()
                A->ordered_func()
                A->ordered_free()        ordered_func()
                                             submit_compressed_extents()
                                                 find_free_extent()
                                                     load_free_space_inode()
                                                         ...
                                         ordered_free()

    If work A has a high priority in wq->ordered_list and there are more
    ordered works queued after it, such as B->ordered_func(), its memory
    could have been freed before normal_work_helper() returns, which means
    that the kernel workqueue code's worker_thread() still has
    worker->current_work pointing to work A->normal_work, i.e. arg's address.

    Meanwhile, work C is allocated after work A is freed, and work
    C->normal_work and work A->normal_work are likely to share the same
    address (I confirmed this with ftrace output, so I'm not just guessing;
    it is rare though).

    When another kthread picks up work C->normal_work to process, and finds
    our kthread is processing it (see find_worker_executing_work()), it will
    treat work C as a collision and skip it, which ends up with nobody
    processing work C.

    So the situation is that our kthread is waiting forever on work C.

    Besides, there are other cases that can lead to deadlock, but the real
    problem is that all btrfs workqueues share one work->func,
    normal_work_helper. So this patch gives each workqueue its own helper
    function, which is only a wrapper of normal_work_helper.

    With this patch, I no longer hit the above hang.

    Signed-off-by: Liu Bo
    Signed-off-by: Chris Mason
    Signed-off-by: Greg Kroah-Hartman

    Liu Bo
     
  • commit f6dc45c7a93a011dff6eb9b2ffda59c390c7705a upstream.

    We should only be flushing on close if the file was flagged as needing
    it during truncate. I broke this with my ordered data vs transaction
    commit deadlock fix.

    Thanks to Miao Xie for catching this.

    Signed-off-by: Chris Mason
    Reported-by: Miao Xie
    Reported-by: Fengguang Wu
    Signed-off-by: Greg Kroah-Hartman

    Chris Mason
     
  • commit 38c1c2e44bacb37efd68b90b3f70386a8ee370ee upstream.

    The crash is

    ------------[ cut here ]------------
    kernel BUG at fs/btrfs/extent_io.c:2124!
    [...]
    Workqueue: btrfs-endio normal_work_helper [btrfs]
    RIP: 0010:[] [] end_bio_extent_readpage+0xb45/0xcd0 [btrfs]

    This is in fact a regression.

    It is because we forgot to increase @offset properly when reading a
    corrupted block, so @offset stays the same, which leads to checksum
    errors while reading the remaining blocks queued up in the same bio, and
    we end up hitting the above BUG_ON.

    Reported-by: Chris Murphy
    Signed-off-by: Liu Bo
    Signed-off-by: Chris Mason
    Signed-off-by: Greg Kroah-Hartman

    Liu Bo
     
  • commit 8d875f95da43c6a8f18f77869f2ef26e9594fecc upstream.

    Truncates and renames are often used to replace old versions of a file
    with new versions. Applications often expect this to be an atomic
    replacement, even if they haven't done anything to make sure the new
    version is fully on disk.

    Btrfs has strict flushing in place to make sure that renaming over an
    old file with a new file will fully flush out the new file before
    allowing the transaction commit with the rename to complete.

    This ordering means the commit code needs to be able to lock file pages,
    and there are a few paths in the filesystem where we will try to end a
    transaction with the page lock held. It's rare, but these things can
    deadlock.

    This patch removes the ordered flushes and switches to a best-effort
    filemap_flush like ext4 uses. It's not perfect, but it should fix the
    deadlocks (a minimal sketch of the flush call follows this entry).

    Signed-off-by: Chris Mason
    Signed-off-by: Greg Kroah-Hartman

    Chris Mason
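
    A minimal sketch of the best-effort flush, under the assumption that the
    caller only wants writeback started without waiting or holding page
    locks; this is not the actual btrfs hunk.

        #include <linux/fs.h>

        static void start_async_writeback(struct inode *inode)
        {
                /* filemap_flush() starts writeback but does not wait for it */
                filemap_flush(inode->i_mapping);
        }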
     
  • commit ce62003f690dff38d3164a632ec69efa15c32cbf upstream.

    When failing to allocate space for the whole compressed extent, we'll
    fallback to uncompressed IO, but we've forgotten to redirty the pages
    which belong to this compressed extent, and these 'clean' pages will
    simply skip 'submit' part and go to endio directly, at last we got data
    corruption as we write nothing.

    Signed-off-by: Liu Bo
    Tested-By: Martin Steigerwald
    Signed-off-by: Chris Mason
    Signed-off-by: Greg Kroah-Hartman

    Liu Bo
     
  • commit 6f7ff6d7832c6be13e8c95598884dbc40ad69fb7 upstream.

    Before processing the extent buffer, acquire a read lock on it, so
    that we're safe against concurrent updates on the extent buffer.

    Signed-off-by: Filipe Manana
    Signed-off-by: Chris Mason
    Signed-off-by: Greg Kroah-Hartman

    Filipe Manana