25 Sep, 2013

13 commits

  • Commit 1b3a5d02ee07 ("reboot: move arch/x86 reboot= handling to generic
    kernel") did some cleanup for reboot= command line, but it made the
    reboot_default inoperative.

    The default value of the variable reboot_default should be 1, so that if
    reboot= is not set on the command line, the system uses the default
    reboot mode.

    [akpm@linux-foundation.org: fix comment layout]
    Signed-off-by: Li Fei
    Signed-off-by: liu chuansheng
    Acked-by: Robin Holt
    Cc: [3.11.x]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Chuansheng Liu
     
  • After commit 829199197a43 ("kernel/audit.c: avoid negative sleep
    durations") audit emitters will block forever if userspace daemon cannot
    handle backlog.

    After the timeout the waiting loop turns into a busy loop and runs until
    the daemon dies or gets back to work. This is a minimal patch for that
    bug.

    Signed-off-by: Konstantin Khlebnikov
    Cc: Luiz Capitulino
    Cc: Richard Guy Briggs
    Cc: Eric Paris
    Cc: Chuck Anderson
    Cc: Dan Duval
    Cc: Dave Kleikamp
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Konstantin Khlebnikov
     
  • Revert commit 3b38722efd9f ("memcg, vmscan: integrate soft reclaim
    tighter with zone shrinking code")

    I merged this prematurely - Michal and Johannes still disagree about the
    overall design direction and the future remains unclear.

    Cc: Michal Hocko
    Cc: Johannes Weiner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • Revert commit e883110aad71 ("memcg: get rid of soft-limit tree
    infrastructure")

    I merged this prematurely - Michal and Johannes still disagree about the
    overall design direction and the future remains unclear.

    Cc: Michal Hocko
    Cc: Johannes Weiner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • Revert commit a5b7c87f9207 ("vmscan, memcg: do softlimit reclaim also
    for targeted reclaim")

    I merged this prematurely - Michal and Johannes still disagree about the
    overall design direction and the future remains unclear.

    Cc: Michal Hocko
    Cc: Johannes Weiner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • Revert commit de57780dc659 ("memcg: enhance memcg iterator to support
    predicates")

    I merged this prematurely - Michal and Johannes still disagree about the
    overall design direction and the future remains unclear.

    Cc: Michal Hocko
    Cc: Johannes Weiner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • Revert commit 7d910c054be4 ("memcg: track children in soft limit excess
    to improve soft limit")

    I merged this prematurely - Michal and Johannes still disagree about the
    overall design direction and the future remains unclear.

    Cc: Michal Hocko
    Cc: Johannes Weiner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • Revert commit e839b6a1c8d0 ("memcg, vmscan: do not attempt soft limit
    reclaim if it would not scan anything")

    I merged this prematurely - Michal and Johannes still disagree about the
    overall design direction and the future remains unclear.

    Cc: Michal Hocko
    Cc: Johannes Weiner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • Revert commit 1be171d60bdd ("memcg: track all children over limit in the
    root")

    I merged this prematurely - Michal and Johannes still disagree about the
    overall design direction and the future remains unclear.

    Cc: Michal Hocko
    Cc: Johannes Weiner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • Revert commit e975de998b96 ("memcg, vmscan: do not fall into reclaim-all
    pass too quickly")

    I merged this prematurely - Michal and Johannes still disagree about the
    overall design direction and the future remains unclear.

    Cc: Michal Hocko
    Cc: Johannes Weiner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • While printing 32-bit node numbers, an 8-byte string is not enough.
    Increase the size of the string to 12 chars.

    This got left out in commit 49fa8140e487 ("fs/ocfs2/super.c: Use bigger
    nodestr to accomodate 32-bit node numbers").
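
    As a rough userspace illustration of the sizing argument (not the ocfs2
    code itself; the helper names and the plain decimal format are
    assumptions), the number of bytes needed for a 32-bit node number can be
    checked like this:

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Bytes needed to hold node_num as decimal text, including the NUL. */
static int nodestr_bytes_needed(uint32_t node_num)
{
    /* snprintf(NULL, 0, ...) returns the text length without the NUL. */
    int len = snprintf(NULL, 0, "%" PRIu32, node_num);

    assert(len > 0);
    return len + 1;
}

/* Hypothetical helper: does node_num fit in a buffer of bufsize bytes? */
static int nodestr_fits(uint32_t node_num, size_t bufsize)
{
    return (size_t)nodestr_bytes_needed(node_num) <= bufsize;
}
```

    With this sketch, nodestr_fits(UINT32_MAX, 8) is false while
    nodestr_fits(UINT32_MAX, 12) is true, which matches the motivation for
    the larger buffer.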

    Signed-off-by: Goldwyn Rodrigues
    Cc: Joel Becker
    Cc: Mark Fasheh
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Goldwyn Rodrigues
     
  • watchdog_thresh controls how often nmi perf event counter checks per-cpu
    hrtimer_interrupts counter and blows up if the counter hasn't changed
    since the last check. The counter is updated by per-cpu
    watchdog_hrtimer hrtimer which is scheduled with 2/5 watchdog_thresh
    period which guarantees that hrtimer is scheduled 2 times per the main
    period. Both hrtimer and perf event are started together when the
    watchdog is enabled.

    So far so good. But...

    But what happens when watchdog_thresh is updated from sysctl handler?

    proc_dowatchdog will set a new sampling period and hrtimer callback
    (watchdog_timer_fn) will use the new value in the next round. The
    problem, however, is that nobody tells the perf event that the sampling
    period has changed so it is ticking with the period configured when it
    has been set up.

    This might result in an ear ripping dissonance between perf and hrtimer
    parts if the watchdog_thresh is increased. And even worse it might lead
    to KABOOM if the watchdog is configured to panic on such a spurious
    lockup.
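
    The period arithmetic described above can be sketched in plain C. This is
    an illustration only, not the kernel implementation: the helper names are
    hypothetical, and treating the NMI check period as exactly
    watchdog_thresh seconds is a simplifying assumption.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* hrtimer period: 2/5 of watchdog_thresh (in seconds), as nanoseconds. */
static uint64_t hrtimer_period_ns(unsigned int watchdog_thresh)
{
    return (uint64_t)watchdog_thresh * 2 * (NSEC_PER_SEC / 5);
}

/* NMI check period, assumed here to be watchdog_thresh seconds. */
static uint64_t nmi_period_ns(unsigned int watchdog_thresh)
{
    return (uint64_t)watchdog_thresh * NSEC_PER_SEC;
}

/*
 * Whole hrtimer ticks guaranteed inside one NMI check window when the
 * perf event still runs with nmi_thresh but the hrtimer already uses
 * hrtimer_thresh.
 */
static uint64_t ticks_per_check(unsigned int nmi_thresh,
                                unsigned int hrtimer_thresh)
{
    assert(hrtimer_thresh != 0);    /* a zero period is not modelled */
    return nmi_period_ns(nmi_thresh) / hrtimer_period_ns(hrtimer_thresh);
}
```

    With both sides at 10 seconds, ticks_per_check(10, 10) gives 2. If only
    the hrtimer picks up a new threshold of 60, ticks_per_check(10, 60) gives
    0: the perf event can fire before the counter has moved at all, which is
    the spurious lockup described above.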

    This patch fixes the issue by updating both nmi perf event counter and
    hrtimers if the threshold value has changed.

    The nmi one is disabled and then reinitialized from scratch. This has
    an unpleasant side effect that the allocation of the new event might
    fail theoretically so the hard lockup detector would be disabled for
    such cpus. On the other hand such a memory allocation failure is very
    unlikely because the original event is deallocated right before.

    It would be much nicer if we just changed perf event period but there
    doesn't seem to be any API to do that right now. It is also unfortunate
    that perf_event_alloc uses GFP_KERNEL allocation unconditionally so we
    cannot use on_each_cpu() and do the same thing from the per-cpu context.
    The update from the current CPU should be safe because
    perf_event_disable removes the event atomically before it clears the
    per-cpu watchdog_ev so it cannot change anything under running handler
    feet.

    The hrtimer is simply restarted (thanks to Don Zickus who has pointed
    this out) if it is queued because we cannot rely on it firing and
    adapting to the new sampling period before a new nmi event triggers
    (when the threshold is decreased).

    [akpm@linux-foundation.org: the UP version of __smp_call_function_single ended up in the wrong place]
    Signed-off-by: Michal Hocko
    Acked-by: Don Zickus
    Cc: Frederic Weisbecker
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: Fabio Estevam
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
     
  • proc_dowatchdog doesn't synchronize multiple callers, so two parallel
    callers might race between watchdog_enable_all_cpus and
    watchdog_disable_all_cpus (e.g. the watchdog gets enabled even if
    watchdog_thresh was already set to 0).

    This patch adds a local mutex which synchronizes callers to the sysctl
    handler.

    Signed-off-by: Michal Hocko
    Cc: Frederic Weisbecker
    Acked-by: Don Zickus
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
     

24 Sep, 2013

3 commits

  • Linus Torvalds
     
  • Pull staging fixes from Greg KH:
    "Here are a number of small staging tree and iio driver fixes. Nothing
    major, just lots of little things"

    * tag 'staging-3.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: (34 commits)
    iio:buffer_cb: Add missing iio_buffer_init()
    iio: Prevent race between IIO chardev opening and IIO device free
    iio: fix: Keep a reference to the IIO device for open file descriptors
    iio: Stop sampling when the device is removed
    iio: Fix crash when scan_bytes is computed with active_scan_mask == NULL
    iio: Fix mcp4725 dev-to-indio_dev conversion in suspend/resume
    iio: Fix bma180 dev-to-indio_dev conversion in suspend/resume
    iio: Fix tmp006 dev-to-indio_dev conversion in suspend/resume
    iio: iio_device_add_event_sysfs() bugfix
    staging: iio: ade7854-spi: Fix return value
    staging:iio:hmc5843: Fix measurement conversion
    iio: isl29018: Fix uninitialized value
    staging:iio:dummy fix kfifo_buf kconfig dependency issue if kfifo modular and buffer enabled for built in dummy driver.
    iio: at91: fix adc_clk overflow
    staging: line6: add bounds check in snd_toneport_source_put()
    Staging: comedi: Fix dependencies for drivers misclassified as PCI
    staging: r8188eu: Adjust RX gain
    staging: r8188eu: Fix smatch warning in core/rtw_ieee80211.
    staging: r8188eu: Fix smatch error in core/rtw_mlme_ext.c
    staging: r8188eu: Fix Smatch off-by-one warning in hal/rtl8188e_hal_init.c
    ...

    Linus Torvalds
     
  • Pull USB fixes from Greg KH:
    "Here are a number of small USB fixes for 3.12-rc2.

    One is a revert of a EHCI change that isn't quite ready for 3.12.
    Others are minor things, gadget fixes, Kconfig fixes, and some quirks
    and documentation updates.

    All have been in linux-next for a bit"

    * tag 'usb-3.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
    USB: pl2303: distinguish between original and cloned HX chips
    USB: Faraday fotg210: fix email addresses
    USB: fix typo in usb serial simple driver Kconfig
    Revert "USB: EHCI: support running URB giveback in tasklet context"
    usb: s3c-hsotg: do not disconnect gadget when receiving ErlySusp intr
    usb: s3c-hsotg: fix unregistration function
    usb: gadget: f_mass_storage: reset endpoint driver data when disabled
    usb: host: fsl-mph-dr-of: Staticize local symbols
    usb: gadget: f_eem: Staticize eem_alloc
    usb: gadget: f_ecm: Staticize ecm_alloc
    usb: phy: omap-usb3: Fix return value
    usb: dwc3: gadget: avoid memory leak when failing to allocate all eps
    usb: dwc3: remove extcon dependency
    usb: gadget: add '__ref' for rndis_config_register() and cdc_config_register()
    usb: dwc3: pci: add support for BayTrail
    usb: gadget: cdc2: fix conversion to new interface of f_ecm
    usb: gadget: fix a bug and a WARN_ON in dummy-hcd
    usb: gadget: mv_u3d_core: fix violation of locking discipline in mv_u3d_ep_disable()

    Linus Torvalds
     

23 Sep, 2013

4 commits

  • Pull drm fixes from Dave Airlie:
    - some small fixes for msm and exynos
    - a regression revert affecting nouveau users with old userspace
    - intel pageflip deadlock and gpu hang fixes, hsw modesetting hangs

    * 'drm-fixes' of git://people.freedesktop.org/~airlied/linux: (22 commits)
    Revert "drm: mark context support as a legacy subsystem"
    drm/i915: Don't enable the cursor on a disable pipe
    drm/i915: do not update cursor in crtc mode set
    drm/exynos: fix return value check in lowlevel_buffer_allocate()
    drm/exynos: Fix address space warnings in exynos_drm_fbdev.c
    drm/exynos: Fix address space warning in exynos_drm_buf.c
    drm/exynos: Remove redundant OF dependency
    drm/msm: drop unnecessary set_need_resched()
    drm/i915: kill set_need_resched
    drm/msm: fix potential NULL pointer dereference
    drm/i915/dvo: set crtc timings again for panel fixed modes
    drm/i915/sdvo: Robustify the dtddrm_mode conversions
    drm/msm: workaround for missing irq
    drm/msm: return -EBUSY if bo still active
    drm/msm: fix return value check in ERR_PTR()
    drm/msm: fix cmdstream size check
    drm/msm: hangcheck harder
    drm/msm: handle read vs write fences
    drm/i915/sdvo: Fully translate sync flags in the dtd->mode conversion
    drm/i915: Use proper print format for debug prints
    ...

    Linus Torvalds
     
  • Pull block IO fixes from Jens Axboe:
    "After merge window, no new stuff this time only a collection of neatly
    confined and simple fixes"

    * 'for-3.12/core' of git://git.kernel.dk/linux-block:
    cfq: explicitly use 64bit divide operation for 64bit arguments
    block: Add nr_bios to block_rq_remap tracepoint
    If the queue is dying then we only call the rq->end_io callout. This leaves bios setup on the request, because the caller assumes when the blk_execute_rq_nowait/blk_execute_rq call has completed that the rq->bios have been cleaned up.
    bio-integrity: Fix use of bs->bio_integrity_pool after free
    blkcg: relocate root_blkg setting and clearing
    block: Convert kmalloc_node(...GFP_ZERO...) to kzalloc_node(...)
    block: trace all devices plug operation

    Linus Torvalds
     
  • Pull btrfs fixes from Chris Mason:
    "These are mostly bug fixes and two small performance fixes. The
    most important of the bunch are Josef's fix for a snapshotting
    regression and Mark's update to fix compile problems on arm"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (25 commits)
    Btrfs: create the uuid tree on remount rw
    btrfs: change extent-same to copy entire argument struct
    Btrfs: dir_inode_operations should use btrfs_update_time also
    btrfs: Add btrfs: prefix to kernel log output
    btrfs: refuse to remount read-write after abort
    Btrfs: btrfs_ioctl_default_subvol: Revert back to toplevel subvolume when arg is 0
    Btrfs: don't leak transaction in btrfs_sync_file()
    Btrfs: add the missing mutex unlock in write_all_supers()
    Btrfs: iput inode on allocation failure
    Btrfs: remove space_info->reservation_progress
    Btrfs: kill delay_iput arg to the wait_ordered functions
    Btrfs: fix worst case calculator for space usage
    Revert "Btrfs: rework the overcommit logic to be based on the total size"
    Btrfs: improve replacing nocow extents
    Btrfs: drop dir i_size when adding new names on replay
    Btrfs: replay dir_index items before other items
    Btrfs: check roots last log commit when checking if an inode has been logged
    Btrfs: actually log directory we are fsync()'ing
    Btrfs: actually limit the size of delalloc range
    Btrfs: allocate the free space by the existed max extent size when ENOSPC
    ...

    Linus Torvalds
     
  • 'samples' is a 64-bit operand, but the second parameter of do_div() is
    32 bits wide. do_div() silently truncates the high 32 bits, so the
    calculated result is invalid.

    If the low 32 bits of 'samples' are all zero, do_div() crashes the
    kernel.
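
    A userspace sketch of the truncation, for illustration only. do_div() is
    a kernel macro and is merely modelled here; the helper names are
    hypothetical, and the sketch returns UINT64_MAX at the point where the
    kernel would divide by zero.

```c
#include <assert.h>
#include <stdint.h>

/* Model of a 64-by-32 divide: the divisor is only 32 bits wide. */
static uint64_t div_u64_by_u32(uint64_t n, uint32_t base)
{
    assert(base != 0);      /* the kernel has no such guard */
    return n / base;
}

/* What a caller gets when it passes a 64-bit divisor to such a divide. */
static uint64_t truncated_divide(uint64_t n, uint64_t samples)
{
    uint32_t low = (uint32_t)samples;   /* high 32 bits silently dropped */

    if (low == 0)
        return UINT64_MAX;              /* stand-in for the crash */
    return div_u64_by_u32(n, low);
}

/* Full 64-by-64 divide, the result the caller actually wanted. */
static uint64_t full_divide(uint64_t n, uint64_t samples)
{
    return n / samples;
}
```

    For n = 1 << 34 and samples = (1 << 32) + 2, the truncated version
    divides by 2 and returns 1 << 33, while the full divide returns 3.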

    Signed-off-by: Anatol Pomozov
    Acked-by: Tejun Heo
    Signed-off-by: Jens Axboe

    Anatol Pomozov
     

22 Sep, 2013

3 commits

  • …/jic23/iio into staging-linus

    Jonathan writes:

    First round of IIO fixes for 3.12

    A series of wrong 'struct dev' assumptions in suspend/resume callbacks
    following on from this issue being identified in a new driver review.
    One to watch out for in future.

    A number of driver specific fixes
    1) at91 - fix an overflow in clock rate computation
    2) dummy - Kconfig dependency issue
    3) isl29018 - uninitialized value
    4) hmc5843 - measurement conversion bug introduced by recent cleanup.
    5) ade7854-spi - wrong return value.

    Some IIO core fixes
    1) Wrong value picked up for event code creation for a modified channel
    2) A null dereference on failure to initialize a buffer after no buffer has
    been in use, when using the available_scan_masks approach.
    3) Sampling not stopped when a device is removed. Affects forced removal
    such as hot unplugging.
    4) Prevent device going away if a chrdev is still open in userspace.
    5) Prevent race on chardev opening and device being freed.
    6) Add a missing iio_buffer_init in the call back buffer.

    These last few are the first part of a set from Lars-Peter Clausen who
    has been taking a closer look at our removal paths and buffer handling
    than anyone has for quite some time.

    Greg Kroah-Hartman
     
  • Pull NFS client bugfix from Trond Myklebust:
    "Fix a regression due to incorrect sharing of gss auth caches"

    * tag 'nfs-for-3.12-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
    RPCSEC_GSS: fix crash on destroying gss auth

    Linus Torvalds
     
  • Adding the number of bios in a remapped request to 'block_rq_remap'
    tracepoint.

    Request remapper clones bios in a request to track the completion
    status of each bio. So the number of bios can be useful information
    for investigation.

    Related discussions:
    http://www.redhat.com/archives/dm-devel/2013-August/msg00084.html
    http://www.redhat.com/archives/dm-devel/2013-September/msg00024.html

    Signed-off-by: Jun'ichi Nomura
    Acked-by: Mike Snitzer
    Cc: Jens Axboe
    Signed-off-by: Jens Axboe

    Jun'ichi Nomura
     

21 Sep, 2013

17 commits

  • Users have been complaining of the uuid tree stuff warning that there is no uuid
    root when trying to do snapshot operations. This is because if you mount -o ro
    we will not create the uuid tree. But then if you mount -o rw,remount we will
    still not create it and then any subsequent snapshot/subvol operations you try
    to do will fail gloriously. Fix this by creating the uuid_root on remount rw if
    it was not already there. Thanks,

    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Josef Bacik
     
  • btrfs_ioctl_file_extent_same() uses __put_user_unaligned() to copy some data
    back to its argument struct. Unfortunately, not all architectures provide
    __put_user_unaligned(), so compiles break on them if btrfs is selected.

    Instead, just copy the whole struct in / out at the start and end of
    operations, respectively.

    Signed-off-by: Mark Fasheh
    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Mark Fasheh
     
  • Commit 2bc5565286121d2a77ccd728eb3484dff2035b58 (Btrfs: don't update atime on
    RO subvolumes) ensures that the access time of an inode is not updated when
    the inode lives in a read-only subvolume.
    However, if a directory on a read-only subvolume is accessed, the atime is
    updated. This results in a write operation to a read-only subvolume. I
    believe that access times should never be updated on read-only subvolumes.

    To reproduce:

    # mkfs.btrfs -f /dev/dm-3
    (...)
    # mount /dev/dm-3 /mnt
    # btrfs subvol create /mnt/sub
    Create subvolume '/mnt/sub'
    # mkdir /mnt/sub/dir
    # echo "abc" > /mnt/sub/dir/file
    # btrfs subvol snapshot -r /mnt/sub /mnt/rosnap
    Create a readonly snapshot of '/mnt/sub' in '/mnt/rosnap'
    # stat /mnt/rosnap/dir
    File: `/mnt/rosnap/dir'
    Size: 8 Blocks: 0 IO Block: 4096 directory
    Device: 16h/22d Inode: 257 Links: 1
    Access: (0755/drwxr-xr-x) Uid: ( 0/ root) Gid: ( 0/ root)
    Access: 2013-09-11 07:21:49.389157126 -0400
    Modify: 2013-09-11 07:22:02.330156079 -0400
    Change: 2013-09-11 07:22:02.330156079 -0400
    # ls /mnt/rosnap/dir
    file
    # stat /mnt/rosnap/dir
    File: `/mnt/rosnap/dir'
    Size: 8 Blocks: 0 IO Block: 4096 directory
    Device: 16h/22d Inode: 257 Links: 1
    Access: (0755/drwxr-xr-x) Uid: ( 0/ root) Gid: ( 0/ root)
    Access: 2013-09-11 07:22:56.797151670 -0400
    Modify: 2013-09-11 07:22:02.330156079 -0400
    Change: 2013-09-11 07:22:02.330156079 -0400

    Reported-by: Koen De Wit
    Signed-off-by: Guangyu Sun
    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Guangyu Sun
     
  • The kernel log entries for device label %s and device fsid %pU
    are missing the btrfs: prefix. Add those here.

    Signed-off-by: Frank Holton
    Reviewed-by: David Sterba
    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Frank Holton
     
  • It's still possible to flip the filesystem into RW mode after it's
    remounted RO due to an abort. There are lots of places that check for
    the superblock error bit and will not write data, but we should not let
    the filesystem appear read-write.

    Signed-off-by: David Sterba
    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    David Sterba
     
  • This patch makes it possible to set BTRFS_FS_TREE_OBJECTID as the default
    subvolume by passing a subvolume id of 0.

    Signed-off-by: chandan
    Reviewed-by: David Sterba
    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    chandan
     
  • In btrfs_sync_file(), if the call to btrfs_log_dentry_safe() returns
    a negative error (e.g. -ENOMEM via btrfs_log_inode()), we would
    return without ending/freeing the transaction.

    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Filipe David Borba Manana
     
  • The BUG() was replaced by btrfs_error() and return -EIO with the
    patch "get rid of one BUG() in write_all_supers()", but the missing
    mutex_unlock() was overlooked.

    The 0-DAY kernel build service from Intel reported the missing
    unlock which was found by the coccinelle tool:

    fs/btrfs/disk-io.c:3422:2-8: preceding lock on line 3374
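
    The class of bug reported here can be sketched in userspace with
    pthreads. This is not the btrfs code: the lock, the function names and
    the fail_early flag are all hypothetical, and only the
    unlock-on-error-path pattern is the point.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

static pthread_mutex_t sketch_lock = PTHREAD_MUTEX_INITIALIZER;

/* Stand-in for a function that can fail while holding a lock. */
static int write_supers_sketch(int fail_early)
{
    int ret = 0;
    int rc = pthread_mutex_lock(&sketch_lock);

    assert(rc == 0);
    (void)rc;

    if (fail_early) {
        ret = -EIO;
        goto out_unlock;    /* a bare "return -EIO" would leak the lock */
    }
    /* ... normal work under the lock would go here ... */

out_unlock:
    pthread_mutex_unlock(&sketch_lock);
    return ret;
}

/* Returns 1 if the lock is free, i.e. no path left it held. */
static int lock_is_free(void)
{
    if (pthread_mutex_trylock(&sketch_lock) != 0)
        return 0;
    pthread_mutex_unlock(&sketch_lock);
    return 1;
}
```

    Routing every exit through a single unlock label keeps the error path
    and the success path symmetric, which is the property the coccinelle
    check above is looking for.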

    Signed-off-by: Stefan Behrens
    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Stefan Behrens
     
  • We don't do the iput when we fail to allocate our delayed delalloc work in
    __start_delalloc_inodes, fix this.

    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Josef Bacik
     
  • This isn't used for anything anymore, just remove it.

    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Josef Bacik
     
  • This is a leftover from how we used to wait for ordered extents, which was to
    grab the inode and then run filemap flush on it. However if we have an ordered
    extent then we already are holding a ref on the inode, and we just use
    btrfs_start_ordered_extent anyway, so there is no reason to have an extra ref on
    the inode to start work on the ordered extent. Thanks,

    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Josef Bacik
     
  • Forever ago I made the worst case calculator say that we could potentially split
    into 3 blocks for every level on the way down, which isn't right. If we split
    we're only going to get two new blocks, the one we originally cow'ed and the new
    one we're going to split. Thanks,

    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Josef Bacik
     
  • This reverts commit 70afa3998c9baed4186df38988246de1abdab56d. It is causing
    performance issues and wasn't actually correct. There were problems with the
    way we flushed delalloc and that was the real cause of the early enospc.
    Thanks,

    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Josef Bacik
     
  • Various people have hit a deadlock when running btrfs/011. This is because when
    replacing nocow extents we will take the i_mutex to make sure nobody messes with
    the file while we are replacing the extent. The problem is we are already
    holding a transaction open, which is a locking inversion, so instead we need to
    save these inodes we find and then process them outside of the transaction.

    Further we can't just lock the inode and assume we are good to go. We need to
    lock the extent range and then read back the extent cache for the inode to make
    sure the extent really still points at the physical block we want. If it
    doesn't we don't have to copy it. Thanks,

    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Josef Bacik
     
  • So if we have dir_index items in the log that means we also have the inode item
    as well, which means that the inode's i_size is correct. However when we
    process dir_index'es we call btrfs_add_link() which will increase the
    directory's i_size for the new entry. To fix this we need to just set the
    directory inode's i_size to 0, and then as we find dir_index items we adjust
    the i_size.
    btrfs_add_link() will do it for new entries, and if the entry already exists we
    can just add the name_len to the i_size ourselves. Thanks,

    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Josef Bacik
     
  • A user reported a bug where his log would not replay because he was getting
    -EEXIST back. This was because he had a file moved into a directory that was
    logged. What happens is the file had a lower inode number, and so it is
    processed first when replaying the log, and so we add the inode ref in for the
    directory it was moved to. But then we process the directory's DIR_INDEX item
    and try to add the inode ref for that inode and it fails because we already
    added it when we replayed the inode. To solve this problem we need to just
    process any DIR_INDEX items we have in the log first so this all is taken care
    of, and then we can replay the rest of the items. With this patch my reproducer
    can remount the file system properly instead of erroring out. Thanks,

    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Josef Bacik
     
  • Liu introduced a local copy of the last log commit for an inode to make sure we
    actually log an inode even if a log commit has already taken place. In order to
    make sure we didn't relog the same inode multiple times he set this local copy
    to the current trans when we log the inode, because usually we log the inode and
    then sync the log. The exception to this is during rename, we will relog an
    inode if the name changed and it is already in the log. The problem with this
    is then we go to sync the inode, and our check to see if the inode has already
    been logged is tripped and we don't sync the log. To fix this we need to _also_
    check against the root's last log commit, because it could be less than what is
    in our local copy of the log commit. This fixes a bug where we rename a file
    into a directory and then fsync the directory and then on remount the directory
    is no longer there. Thanks,

    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Josef Bacik