23 Jan, 2013

2 commits

  • Pull fuse fixes from Miklos Szeredi:
    "This contains a bugfix for CUSE and miscellaneous small fixes"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
    fuse: remove unused variable in fuse_try_move_page()
    fuse: make fuse_file_fallocate() static
    fuse: Move CUSE Kconfig entry from fs/Kconfig into fs/fuse/Kconfig
    cuse: fix uninitialized variable warnings
    cuse: do not register multiple devices with identical names
    cuse: use mutex as registration lock instead of spinlocks

    Linus Torvalds
     
  • Pull f2fs fixes from Jaegeuk Kim:
    o Support swap file and link generic_file_remap_pages
    o Enhance the bio streaming flow and free section control
    o Major bug fix on recovery routine
    o Minor bug/warning fixes and code cleanups

    * tag 'f2fs-for-3.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (22 commits)
    f2fs: use _safe() version of list_for_each
    f2fs: add comments of start_bidx_of_node
    f2fs: avoid issuing small bios due to several dirty node pages
    f2fs: support swapfile
    f2fs: add remap_pages as generic_file_remap_pages
    f2fs: add __init to functions in init_f2fs_fs
    f2fs: fix the debugfs entry creation path
    f2fs: add global mutex_lock to protect f2fs_stat_list
    f2fs: remove the blk_plug usage in f2fs_write_data_pages
    f2fs: avoid redundant time update for parent directory in f2fs_delete_entry
    f2fs: remove redundant call to set_blocksize in f2fs_fill_super
    f2fs: move f2fs_balance_fs to punch_hole
    f2fs: add f2fs_balance_fs in several interfaces
    f2fs: revisit the f2fs_gc flow
    f2fs: check return value during recovery
    f2fs: avoid null dereference in f2fs_acl_from_disk
    f2fs: initialize newly allocated dnode structure
    f2fs: update f2fs partition info about SIT/NAT layout
    f2fs: update f2fs document to reflect SIT/NAT layout correctly
    f2fs: remove unneeded INIT_LIST_HEAD at few places
    ...

    Linus Torvalds
     

22 Jan, 2013

6 commits

  • This is calling list_del() inside a loop, which is a problem when we try
    to move to the next item on the list. I've converted it to use the _safe
    version. Also, as a cleanup, I've converted it to use
    list_for_each_entry instead of list_for_each.

    Signed-off-by: Dan Carpenter
    Reviewed-by: Dmitry Torokhov
    Signed-off-by: Jaegeuk Kim

    Dan Carpenter
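
The hazard described in this commit message can be illustrated outside the kernel. The following is a minimal userspace sketch, not the f2fs code: `struct node`, `push()`, `remove_odd()` and `length()` are hypothetical names, and a plain singly linked list stands in for the kernel's `struct list_head`. The point it shows is the one behind the _safe iterators: the successor is saved before the current entry is unlinked and freed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical singly linked list; a stand-in for struct list_head. */
struct node {
    int value;
    struct node *next;
};

/* Push a new node at the head and return the new head. */
static struct node *push(struct node *head, int value)
{
    struct node *n = malloc(sizeof(*n));

    assert(n != NULL);
    n->value = value;
    n->next = head;
    return n;
}

/*
 * Remove and free every node holding an odd value.
 * 'next' is read BEFORE the node may be freed, so the loop never
 * touches n->next after n has been deleted.
 */
static struct node *remove_odd(struct node *head)
{
    struct node **link = &head;
    struct node *n, *next;

    for (n = head; n != NULL; n = next) {
        next = n->next;          /* saved before any deletion */
        if (n->value % 2 != 0) {
            *link = next;        /* unlink n from the list */
            free(n);
        } else {
            link = &n->next;
        }
    }
    return head;
}

/* Count the nodes in the list. */
static int length(const struct node *head)
{
    int len = 0;

    for (; head != NULL; head = head->next)
        len++;
    return len;
}
```

Advancing with `n = n->next` after `free(n)` would read freed memory, which is the same class of bug as calling list_del() inside a non-safe list iterator.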
     
  • The caller of start_bidx_of_node() should pass proper node offsets that
    point only to direct node blocks. Anything else is a bug in the caller.
    This patch adds comments to make that clear.

    Reported-by: Dan Carpenter
    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     
  • If small bios of dirty node pages are issued during sequential data
    writes, they can split the otherwise well-formed consecutive data bios,
    resulting in performance degradation.
    So, let's collect dirty node pages until their number reaches a threshold.
    By default, I set the threshold to 2MB, the size of a segment.

    This improves sequential write performance on i5, 512GB SSD (830 w/ SATA2) as
    follows.
    Before: 231 MB/s -> After: 255 MB/s

    Signed-off-by: Jaegeuk Kim
    Reviewed-by: Namjae Jeon

    Jaegeuk Kim
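
The batching idea in this commit message can be sketched as a simple counter. This is an illustration only, not the f2fs implementation: `struct node_batch` and `add_dirty_node_page()` are hypothetical names, and the 4KB page size is an assumption (the message only states the 2MB threshold).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PAGE_BYTES          4096u                  /* assumed page size */
#define SEGMENT_BYTES       (2u * 1024u * 1024u)   /* 2MB threshold from the message */
#define NODE_PAGE_THRESHOLD (SEGMENT_BYTES / PAGE_BYTES)

/* Hypothetical bookkeeping for pending dirty node pages. */
struct node_batch {
    size_t dirty_pages;   /* node pages collected so far */
    size_t flushes;       /* how many batched writes were issued */
};

/*
 * Record one more dirty node page. Returns true when the batch has
 * reached the threshold and should be written out as one large I/O,
 * false while pages are still being collected.
 */
static bool add_dirty_node_page(struct node_batch *b)
{
    b->dirty_pages++;
    if (b->dirty_pages < NODE_PAGE_THRESHOLD)
        return false;
    b->dirty_pages = 0;
    b->flushes++;
    return true;
}
```

Under these assumptions a flush is triggered once per 512 dirty node pages, so node writes arrive as one segment-sized burst rather than as many small bios interleaved with data.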
     
  • This patch adds f2fs_bmap operation to the data address space.
    This enables f2fs to support swapfile.

    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     
  • This was added for all the file systems before.

    See the following commit.

    commit id: 0b173bc4daa8f8ec03a85abf5e47b23502ff80af

    [PATCH] mm: kill vma flag VM_CAN_NONLINEAR

    This patch moves actual ptes filling for non-linear file mappings
    into special vma operation: ->remap_pages().

    A file system must implement this method to get support for non-linear
    mappings; if it uses filemap_fault(), then generic_file_remap_pages() can be used.

    Now device drivers can implement this method and obtain nonlinear vma support.

    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     
  • Add __init to functions in init_f2fs_fs for code consistency.

    Signed-off-by: Namjae Jeon
    Signed-off-by: Amit Sahrawat
    Signed-off-by: Jaegeuk Kim

    Namjae Jeon
     

17 Jan, 2013

14 commits

  • The variables mapping and index are initialized but never used
    afterwards, so remove them.

    dpatch engine is used to auto generate this patch.
    (https://github.com/weiyj/dpatch)

    Signed-off-by: Wei Yongjun
    Signed-off-by: Miklos Szeredi

    Wei Yongjun
     
  • Fix the following sparse warning:

    fs/fuse/file.c:2249:6: warning: symbol 'fuse_file_fallocate' was not declared. Should it be static?

    Signed-off-by: Miklos Szeredi

    Miklos Szeredi
     
  • Given that CUSE depends on FUSE, it only makes sense to move its
    Kconfig entry into the FUSE Kconfig file. Also, add a few grammatical
    and semantic touchups.

    Signed-off-by: Robert P. J. Day
    Signed-off-by: Miklos Szeredi

    Robert P. J. Day
     
  • Fix the following compiler warnings:

    fs/fuse/cuse.c: In function 'cuse_process_init_reply':
    fs/fuse/cuse.c:288:24: warning: 'val' may be used uninitialized in this function [-Wmaybe-uninitialized]
    fs/fuse/cuse.c:272:14: note: 'val' was declared here
    fs/fuse/cuse.c:284:10: warning: 'key' may be used uninitialized in this function [-Wmaybe-uninitialized]
    fs/fuse/cuse.c:272:8: note: 'key' was declared here

    Signed-off-by: Miklos Szeredi

    Miklos Szeredi
     
  • Sysfs doesn't allow two devices with the same name, but we register a
    sysfs entry for each cuse device without checking for name collisions.
    This extends the registration to first check whether the name was already
    registered.

    To avoid race-conditions between the name-check and linking the device, we
    need to protect the whole registration with a mutex.

    Signed-off-by: David Herrmann
    Acked-by: Tejun Heo
    Signed-off-by: Miklos Szeredi

    David Herrmann
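
The check-then-register pattern described here can be sketched in userspace. This is not the CUSE code: `register_device()`, the fixed-size `registry` table and its limits are hypothetical, and a pthread mutex stands in for the kernel mutex. What it shows is why the lookup and the insertion must sit inside the same locked region.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <string.h>

#define MAX_DEVICES 8    /* hypothetical table size */
#define NAME_LEN    32   /* hypothetical name limit, including the NUL */

static pthread_mutex_t registry_lock = PTHREAD_MUTEX_INITIALIZER;
static char registry[MAX_DEVICES][NAME_LEN];
static int registry_count;

/*
 * Register a device name. Returns 0 on success, -EEXIST if the name
 * is already taken, -ENOSPC if the table is full.
 * The duplicate check and the insertion happen under one lock, so two
 * callers cannot both pass the check for the same name.
 */
static int register_device(const char *name)
{
    int i, ret = 0;

    if (strlen(name) >= NAME_LEN)
        return -ENAMETOOLONG;

    pthread_mutex_lock(&registry_lock);
    for (i = 0; i < registry_count; i++) {
        if (strcmp(registry[i], name) == 0) {
            ret = -EEXIST;
            goto out;
        }
    }
    if (registry_count == MAX_DEVICES) {
        ret = -ENOSPC;
        goto out;
    }
    strcpy(registry[registry_count++], name);
out:
    pthread_mutex_unlock(&registry_lock);
    return ret;
}
```

A sleeping lock matters in the kernel case because, as the companion commit notes, the protected region has to cover more expensive or sleeping code paths, which a spinlock cannot do.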
     
  • We need to check for name-collisions during cuse-device registration. To
    avoid race-conditions, this needs to be protected during the whole device
    registration. Therefore, replace the spinlocks by mutexes first so we can
    safely extend the locked regions to include more expensive or sleeping
    code paths.

    Signed-off-by: David Herrmann
    Acked-by: Tejun Heo
    Signed-off-by: Miklos Szeredi

    David Herrmann
     
  • Pull xfs bugfixes from Ben Myers:

    - fix(es) for compound buffers

    - fix for dquot soft timer asserts due to overflow of d_blk_softlimit

    - fix for regression in dir v2 code introduced in commit 20f7e9f3726a
    ("xfs: factor dir2 block read operations")

    * tag 'for-linus-v3.8-rc4' of git://oss.sgi.com/xfs/xfs:
    xfs: recalculate leaf entry pointer after compacting a dir2 block
    xfs: remove int casts from debug dquot soft limit timer asserts
    xfs: fix the multi-segment log buffer format
    xfs: fix segment in xfs_buf_item_format_segment
    xfs: rename bli_format to avoid confusion with bli_formats
    xfs: use b_maps[] for discontiguous buffers

    Linus Torvalds
     
  • Dave Jones hit this assert when doing a compile on recent git, with
    CONFIG_XFS_DEBUG enabled:

    XFS: Assertion failed: (char *)dup - (char *)hdr == be16_to_cpu(*xfs_dir2_data_unused_tag_p(dup)), file: fs/xfs/xfs_dir2_data.c, line: 828

    Upon further digging, the tag found by xfs_dir2_data_unused_tag_p(dup)
    contained "2" and not the proper offset, and I found that this value was
    changed after the memmoves under "Use a stale leaf for our new entry."
    in xfs_dir2_block_addname(), i.e.

    memmove(&blp[mid + 1], &blp[mid],
    (highstale - mid) * sizeof(*blp));

    overwrote it.

    What has happened is that the previous call to xfs_dir2_block_compact()
    has rearranged things; it changes btp->count as well as the
    blp array. So after we make that call, we must recalculate the
    proper pointer to the leaf entries by making another call to
    xfs_dir2_block_leaf_p().

    Dave provided a metadump image which led to a simple reproducer
    (create a particular filename in the affected directory) and this
    resolves the testcase as well as the bug on his live system.

    Thanks also to dchinner for looking at this one with me.

    Signed-off-by: Eric Sandeen
    Tested-by: Dave Jones
    Reviewed-by: Dave Chinner
    Reviewed-by: Mark Tinguely
    Signed-off-by: Ben Myers

    Eric Sandeen
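
The stale-pointer problem in this commit message can be reproduced with a toy layout. The sketch below is not the XFS on-disk format: `struct block`, `leaf_p()` and `compact()` are hypothetical, and the only property borrowed from the description is that the leaf array is located from the entry count, so compaction moves its start.

```c
#include <assert.h>
#include <stddef.h>

#define BLOCK_SLOTS 16
#define STALE       (-1)

/* Hypothetical block: leaf entries are packed at the end of slots[]. */
struct block {
    int slots[BLOCK_SLOTS];
    int count;              /* leaf entries, including stale ones */
};

/* Locate the first leaf entry; the result depends on the current count. */
static int *leaf_p(struct block *b)
{
    return &b->slots[BLOCK_SLOTS - b->count];
}

/* Drop stale entries, keep live ones packed at the end, update count. */
static void compact(struct block *b)
{
    int from, to = BLOCK_SLOTS - 1;

    for (from = BLOCK_SLOTS - 1; from >= BLOCK_SLOTS - b->count; from--) {
        if (b->slots[from] != STALE)
            b->slots[to--] = b->slots[from];
    }
    b->count = BLOCK_SLOTS - 1 - to;
}
```

A pointer obtained from `leaf_p()` before `compact()` still points at the old start of the array, so index arithmetic or a memmove based on it touches the wrong slots. Calling `leaf_p()` again after compaction is the equivalent of the extra xfs_dir2_block_leaf_p() call the fix adds.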
     
  • The int casts here make it easy to trigger an assert with a large
    soft limit. For example, set a >4TB soft limit on an empty volume
    to reproduce a (0 > -x) comparison due to an overflow of
    d_blk_softlimit.

    Signed-off-by: Brian Foster
    Reviewed-by: Ben Myers
    Signed-off-by: Ben Myers

    Brian Foster
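
The (0 > -x) comparison mentioned above follows from narrowing a 64-bit value to a 32-bit signed integer. The sketch below is an illustration with hypothetical helper names and an arbitrary example value, not the XFS assert itself. It models the narrowing explicitly, because converting an out-of-range value to `int` is implementation-defined in C.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Keep the low 32 bits of v and reinterpret them as a signed value,
 * which is what an (int) cast does on common platforms.
 */
static int32_t truncate_to_i32(uint64_t v)
{
    uint32_t low = (uint32_t)v;   /* well-defined: reduces modulo 2^32 */
    int32_t out;

    memcpy(&out, &low, sizeof(out));
    return out;
}

/* Buggy form: both operands are narrowed before the comparison. */
static int over_soft_limit_with_int_cast(uint64_t used, uint64_t soft)
{
    return truncate_to_i32(used) > truncate_to_i32(soft);
}

/* Fixed form: compare the full 64-bit values. */
static int over_soft_limit(uint64_t used, uint64_t soft)
{
    return used > soft;
}
```

With nothing in use and a large limit whose low 32 bits have the top bit set, the narrowed comparison claims the limit is exceeded while the 64-bit comparison does not.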
     
  • Per Dave Chinner's suggestion, this patch:
    1) Corrects the detection of whether a multi-segment buffer is
    still tracking data.
    2) Clears all the buffer log formats for a multi-segment buffer.

    Signed-off-by: Mark Tinguely
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Dave Chinner
    Signed-off-by: Ben Myers

    Mark Tinguely
     
  • Not every segment in a multi-segment buffer is dirty in a
    transaction, and the clean segments will not be output. The assert in
    xfs_buf_item_format_segment() that checks for at least
    one chunk of data in the segment to be used is not necessarily
    true for multi-segment buffers.

    Signed-off-by: Mark Tinguely
    Reviewed-by: Dave Chinner
    Signed-off-by: Ben Myers

    Mark Tinguely
     
  • Rename the bli_format structure to __bli_format to avoid
    accidentally confusing it with the bli_formats pointer.

    Signed-off-by: Mark Tinguely
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Dave Chinner
    Signed-off-by: Ben Myers

    Mark Tinguely
     
  • Commits starting at 77c1a08 introduced multi-segment support
    to xfs_buf. xfs_trans_buf_item_match() could not find a multi-segment
    buffer in the transaction because it was looking at the single-segment
    block number rather than the multi-segment b_maps[0].bm_bn. This
    results in a recursive buffer lock that can never be satisfied.

    This patch:
    1) Changes the remaining b_map accesses to be b_maps[0] accesses.
    2) Renames the single-segment b_map structure to __b_map to avoid
    future confusion.

    Signed-off-by: Mark Tinguely
    Reviewed-by: Dave Chinner
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Ben Myers

    Mark Tinguely
     
  • Pull ext3 and udf fixes from Jan Kara:
    "One ext3 performance regression fix and one udf regression fix (oops
    on interrupted mount)."

    * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
    UDF: Fix a null pointer dereference in udf_sb_free_partitions
    jbd: don't wake kjournald unnecessarily

    Linus Torvalds
     

15 Jan, 2013

7 commits

  • The "status" debugfs entry is maintained for the entire F2FS filesystem,
    irrespective of the number of partitions.
    So, we can move its initialization to the init part of f2fs and its
    destruction to the exit part. After this change, the entry creation code
    is no longer executed for each individual partition mount.

    Signed-off-by: Jianpeng Ma
    Signed-off-by: Namjae Jeon
    Signed-off-by: Amit Sahrawat
    Signed-off-by: Jaegeuk Kim

    Namjae Jeon
     
  • There is a race condition between umounting f2fs and reading f2fs/status, which
    results in an oops.

    For example:
    Thread A                              Thread B
    umount f2fs                           cat f2fs/status

    f2fs_destroy_stats() {                stat_show() {
                                            list_for_each_entry_safe(&f2fs_stat_list)
      list_del(&si->stat_list);
      mutex_lock(&si->stat_lock);
      si->sbi = NULL;
      mutex_unlock(&si->stat_lock);
      kfree(sbi->stat_info);
    }                                       mutex_lock(&si->stat_lock) <- si is gone.
                                            ...
                                          }

    Solution with a global lock, f2fs_stat_mutex:
    Thread A                              Thread B
    umount f2fs                           cat f2fs/status

    f2fs_destroy_stats() {                stat_show() {
      mutex_lock(&f2fs_stat_mutex);
      list_del(&si->stat_list);
      mutex_unlock(&f2fs_stat_mutex);
      kfree(sbi->stat_info);                mutex_lock(&f2fs_stat_mutex);
    }                                       list_for_each_entry_safe(&f2fs_stat_list)
                                            ...
                                            mutex_unlock(&f2fs_stat_mutex);
                                          }

    Signed-off-by: Jianpeng Ma
    [jaegeuk.kim@samsung.com: fix typos, description, and remove the existing lock]
    Signed-off-by: Jaegeuk Kim

    majianpeng
     
  • Let's consider the usage of blk_plug in f2fs_write_data_pages().
    We can come up with the two issues: lock contention and task awareness.

    1. Merging bios prior to grabbing the "queue lock"
    f2fs merges consecutive IOs at the file system level before
    submitting any bios, which is similar to the back merge done by the
    plugging mechanism in attempt_plug_merge(). Neither of them needs to
    acquire the queue lock.

    2. Merging policy with respect to tasks
    The f2fs merges IOs as much as possible regardless of tasks, while
    blk-plugging is conducted on a basis of tasks. As we can understand
    there are trade-offs, f2fs tries to maximize the write performance with
    well-merged bios.

    As a result, if f2fs produced many consecutive but separate bios in
    writepages(), it would be good to use blk-plugging, since f2fs would be
    able to avoid queue lock contention in the block layer by merging them.
    But f2fs merges IOs and submits one bio, which means that there are not
    many chances to merge bios in attempt_plug_merge().

    However, f2fs has already been using blk_plug by triggering generic_writepages()
    in f2fs_write_data_pages().
    So, for overall code consistency, I'd like to remove blk_plug there.

    Signed-off-by: Namjae Jeon
    Signed-off-by: Amit Sahrawat
    Signed-off-by: Jaegeuk Kim

    Namjae Jeon
     
  • This patch fixes a regression caused by commit bff943af6fe "udf: Fix memory
    leak when mounting", which triggered a kernel null pointer
    dereference on an interrupted mount or when allocating memory for
    sbi->s_partmaps failed in udf_sb_alloc_partition_maps().

    Reported-and-tested-by: James Hogan
    Signed-off-by: Namjae Jeon
    Signed-off-by: Ashish Sangwan
    Signed-off-by: Jan Kara

    Namjae Jeon
     
  • Don't send an extra wakeup to kjournald in the case where we
    already have the proper target in j_commit_request, i.e. that
    transaction has already been requested for commit.

    commit d9b0193 "jbd: fix fsync() tid wraparound bug" changed
    the logic leading to a wakeup, but it caused some extra wakeups
    which were found to lead to a measurable performance regression.

    Signed-off-by: Eric Sandeen
    Signed-off-by: Jan Kara

    Eric Sandeen
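
The early return described in this commit message can be sketched as follows. This is a simplified illustration, not the jbd code: `struct journal`, `request_commit()` and the `wakeups` counter are hypothetical, with the counter standing in for the wakeup of the commit thread.

```c
#include <assert.h>

/* Hypothetical, reduced journal state. */
struct journal {
    unsigned int commit_request;  /* transaction id already requested */
    int wakeups;                  /* stand-in for waking the commit thread */
};

/*
 * Ask for transaction 'target' to be committed.
 * Returns 1 if a wakeup was sent, 0 if the same target had already
 * been requested and no further wakeup is needed.
 */
static int request_commit(struct journal *j, unsigned int target)
{
    if (j->commit_request == target)
        return 0;            /* already requested: skip the wakeup */

    j->commit_request = target;
    j->wakeups++;
    return 1;
}
```

Repeated requests for the same transaction then cost one comparison instead of one wakeup each, which is the source of the extra wakeups the message refers to.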
     
  • Andrew Morton pointed this out a month ago, and then I completely forgot
    about it.

    If we read a partial last page of a block device, we will zero out the
    end of the page, but since that page can then be mapped into user space,
    we should also make sure to flush the cache on architectures that have
    virtual caches. We have the flush_dcache_page() function for this, so
    use it.

    Now, in practice this really never matters, because nobody sane uses
    virtual caches to begin with, and they largely exist on old broken RISC
    architectures.

    And even if you did run on one of those obsolete CPUs, the whole "mmap
    and access the last partial page of a block device" behavior probably
    doesn't actually exist. The normal IO functions (read/write) will never
    see the zeroed-out part of the page that might not be coherent in the
    cache, because they honor the size of the device.

    So I'm marking this for stable (3.7 only), but I'm not sure anybody will
    ever care.

    Pointed-out-by: Andrew Morton
    Cc: stable@vger.kernel.org # 3.7
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     
  • Pull driver core fixes from Greg Kroah-Hartman:
    "Here are two patches for 3.8-rc3.

    One removes the __dev* defines from init.h now that all usages of them
    are gone from your tree. The other fix is for debugfs's parameter
    that was using the wrong base for the option.

    Signed-off-by: Greg Kroah-Hartman "

    * tag 'driver-core-3.8-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
    debugfs: convert gid= argument from decimal, not octal
    Remove __dev* markings from init.h

    Linus Torvalds
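
The effect of parsing a numeric mount option with the wrong base is easy to see with the standard library. The sketch below is only an illustration of the decimal versus octal difference named in the commit title above. `parse_id()` is a hypothetical helper and does not reflect how debugfs parses its options.

```c
#include <assert.h>
#include <stdlib.h>

/* Parse a numeric option string in the given base. */
static unsigned long parse_id(const char *s, int base)
{
    return strtoul(s, NULL, base);
}
```

The same string yields different ids depending on the base: "1000" is 1000 in decimal but 512 when read as octal, so a group id given in decimal would silently map to a different group.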
     

14 Jan, 2013

2 commits


12 Jan, 2013

1 commit

  • The tricky problem is this check:

    if (i++ >= max)

    icc (mis)optimizes this check as:

    if (++i > max)

    The check now becomes a no-op since max is MAX_ARG_STRINGS (0x7FFFFFFF).

    This is "allowed" by the C standard, assuming i++ never overflows,
    because signed integer overflow is undefined behavior. This
    optimization effectively reverts the previous commit 362e6663ef23
    ("exec.c, compat.c: fix count(), compat_count() bounds checking") that
    tries to fix the check.

    This patch simply moves ++ after the check.

    Signed-off-by: Xi Wang
    Cc: Jason Baron
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Xi Wang
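
The reordering described in this commit message can be sketched as a standalone counting loop. This is not the kernel's count() function: `count_args()` is a hypothetical userspace analogue that walks a NULL-terminated array. It shows the shape of the fix, where the comparison happens before the increment so the counter is never incremented past the limit.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Count the entries of a NULL-terminated argument array.
 * Returns the count, or -E2BIG if there are more than 'max' entries.
 */
static int count_args(const char *const *argv, int max)
{
    int i = 0;

    if (argv == NULL)
        return 0;

    while (argv[i] != NULL) {
        if (i >= max)        /* compare first ... */
            return -E2BIG;
        ++i;                 /* ... then increment; i never exceeds max */
    }
    return i;
}
```

Because `i` is only incremented after `i >= max` has been ruled out, it can never overflow, so a compiler has no undefined behavior to exploit and cannot drop the check.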
     

11 Jan, 2013

4 commits

  • This patch technically breaks userspace, but I suspect that anyone who
    actually used this flag would have encountered this brokenness, declared
    it lunacy, and already sent a patch.

    Signed-off-by: Dave Reisner
    Reviewed-by: Vasiliy Kulikov
    Acked-by: Kees Cook
    Signed-off-by: Greg Kroah-Hartman

    Dave Reisner
     
  • The f2fs_fallocate() has two operations: punch_hole and expand_size.

    Only in the case of punch_hole, dirty node pages can be produced, so let's
    trigger f2fs_balance_fs() in this case only.
    Furthermore, let's trigger it at every data truncation routine.

    Signed-off-by: Namjae Jeon
    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     
  • f2fs_balance_fs() checks the number of free sections and decides whether
    cleaning is needed. If there are not enough free sections, the
    cleaning job should be started.

    In order to control the number of free sections even under high utilization, f2fs
    should call f2fs_balance_fs() at all the VFS interfaces that are able to produce
    dirty pages.
    This patch adds the function calls in the missing interfaces as follows.

    1. f2fs_setxattr()
    f2fs_setxattr() produces dirty node pages, so we should call
    f2fs_balance_fs() there too, as is done in other VFS interfaces such as
    f2fs_lookup(), f2fs_mkdir(), and so on.

    2. f2fs_sync_file()
    We should guarantee that free sections are available for syncing metadata during fsync.
    Previously, there was no space check before triggering checkpoint and
    sync_node_pages.
    Therefore, if a bunch of fsync calls are triggered at 100% FS utilization,
    f2fs can run out of free sections, resulting in BUG_ON().

    3. f2fs_sync_fs()
    Before calling write_checkpoint(), we should guarantee that there is a minimum
    number of free sections.

    4. f2fs_write_inode()
    f2fs_write_inode() is also able to produce dirty node pages.

    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
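
The role of the balancing call described above can be sketched with a toy model. Nothing here is the f2fs implementation: `struct fs_state`, `not_enough_free_sections()`, `clean_one_section()` and `balance_fs()` are hypothetical, and the assumption that one cleaning pass reclaims exactly one section is made only to keep the example small.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified view of the free-space state. */
struct fs_state {
    unsigned int free_sections;
    unsigned int reserved_sections;  /* minimum we want to keep free */
    unsigned int gc_runs;            /* cleaning passes performed */
};

static bool not_enough_free_sections(const struct fs_state *fs)
{
    return fs->free_sections <= fs->reserved_sections;
}

/* One cleaning pass; assumed to reclaim one section per victim. */
static bool clean_one_section(struct fs_state *fs, unsigned int *victims)
{
    if (*victims == 0)
        return false;
    (*victims)--;
    fs->free_sections++;
    fs->gc_runs++;
    return true;
}

/* Meant to run at the start of any operation that can dirty pages. */
static void balance_fs(struct fs_state *fs, unsigned int *victims)
{
    while (not_enough_free_sections(fs)) {
        if (!clean_one_section(fs, victims))
            break;   /* nothing left to reclaim */
    }
}
```

Calling such a check at every entry point that can dirty pages keeps the later flush paths from starting with too few free sections, which is the failure mode the fsync item above describes.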
     
  • Fix kernel-doc warnings in fs/seq_file.c:

    Warning(fs/seq_file.c:304): No description found for parameter 'whence'
    Warning(fs/seq_file.c:304): Excess function parameter 'origin' description in 'seq_lseek'

    Signed-off-by: Randy Dunlap
    Cc: Alexander Viro
    Signed-off-by: Linus Torvalds

    Randy Dunlap
     

10 Jan, 2013

1 commit

  • I'd like to revisit the f2fs_gc flow and rewrite as follows.

    1. In practice, the nGC parameter of f2fs_gc is meaningless. So, let's
    remove it.
    2. Background GC marks victim blocks as dirty one at a time.
    3. Foreground GC should keep cleaning until enough free sections
    are acquired. Afterwards, it needs to do a checkpoint.

    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     

08 Jan, 2013

3 commits

  • Pull networking fixes from David Miller:

    1) New sysctl ndisc_notify needs some documentation, from Hanns
    Frederic Sowa.

    2) Netfilter REJECT target doesn't set transport header of SKB
    correctly, from Mukund Jampala.

    3) Forcedeth driver needs to check for DMA mapping failures, from Larry
    Finger.

    4) brcmsmac driver can't use usleep_range while holding locks, use
    udelay instead. From Niels Ole Salscheider.

    5) Fix unregister of netlink bridge multicast database handlers, from
    Vlad Yasevich and Rami Rosen.

    6) Fix checksum calculations in netfilter's ipv6 network prefix
    translation module.

    7) Fix high order page allocation failures in netfilter xt_recent, from
    Eric Dumazet.

    8) mac802154 needs to use netif_rx_ni() instead of netif_rx() because
    mac802154_process_data() can execute in process rather than
    interrupt context. From Alexander Aring.

    9) Fix splice handling of MSG_SENDPAGE_NOTLAST, otherwise we elide one
    tcp_push() too many. From Eric Dumazet and Willy Tarreau.

    10) Fix skb->truesize tracking in XEN netfront driver, from Ian
    Campbell.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (46 commits)
    xen/netfront: improve truesize tracking
    ipv4: fix NULL checking in devinet_ioctl()
    tcp: fix MSG_SENDPAGE_NOTLAST logic
    net/ipv4/ipconfig: really display the BOOTP/DHCP server's address.
    ip-sysctl: fix spelling errors
    mac802154: fix NOHZ local_softirq_pending 08 warning
    ipv6: document ndisc_notify in networking/ip-sysctl.txt
    ath9k: Fix Kconfig for ATH9K_HTC
    netfilter: xt_recent: avoid high order page allocations
    netfilter: fix missing dependencies for the NOTRACK target
    netfilter: ip6t_NPT: fix IPv6 NTP checksum calculation
    bridge: add empty br_mdb_init() and br_mdb_uninit() definitions.
    vxlan: allow live mac address change
    bridge: Correctly unregister MDB rtnetlink handlers
    brcmfmac: fix parsing rsn ie for ap mode.
    brcmsmac: add copyright information for Canonical
    rtlwifi: rtl8723ae: Fix warning for unchecked pci_map_single() call
    rtlwifi: rtl8192se: Fix warning for unchecked pci_map_single() call
    rtlwifi: rtl8192de: Fix warning for unchecked pci_map_single() call
    rtlwifi: rtl8192ce: Fix warning for unchecked pci_map_single() call
    ...

    Linus Torvalds
     
  • Pull CIFS fixes from Steve French:
    "Misc small cifs fixes"

    * 'for-next' of git://git.samba.org/sfrench/cifs-2.6:
    CIFS: Don't let read only caching for mandatory byte-range locked files
    CIFS: Fix write after setting a read lock for read oplock files
    Revert "CIFS: Fix write after setting a read lock for read oplock files"
    cifs: adjust sequence number downward after signing NT_CANCEL request
    cifs: move check for NULL socket into smb_send_rqst

    Linus Torvalds
     
  • Pull ext4 regression fixes from Ted Ts'o:
    "Bug fixes, including two regressions introduced in v3.8. The most
    serious of these regressions is a buffer cache leak."

    * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
    ext4: remove duplicate call to ext4_bread() in ext4_init_new_dir()
    ext4: release buffer in failed path in dx_probe()
    ext4: fix configuration dependencies for ext4 ACLs and security labels

    Linus Torvalds