25 Mar, 2011

1 commit

  • * 'for-2.6.39/core' of git://git.kernel.dk/linux-2.6-block: (65 commits)
    Documentation/iostats.txt: bit-size reference etc.
    cfq-iosched: removing unnecessary think time checking
    cfq-iosched: Don't clear queue stats when preempt.
    blk-throttle: Reset group slice when limits are changed
    blk-cgroup: Only give unaccounted_time under debug
    cfq-iosched: Don't set active queue in preempt
    block: fix non-atomic access to genhd inflight structures
    block: attempt to merge with existing requests on plug flush
    block: NULL dereference on error path in __blkdev_get()
    cfq-iosched: Don't update group weights when on service tree
    fs: assign sb->s_bdi to default_backing_dev_info if the bdi is going away
    block: Require subsystems to explicitly allocate bio_set integrity mempool
    jbd2: finish conversion from WRITE_SYNC_PLUG to WRITE_SYNC and explicit plugging
    jbd: finish conversion from WRITE_SYNC_PLUG to WRITE_SYNC and explicit plugging
    fs: make fsync_buffers_list() plug
    mm: make generic_writepages() use plugging
    blk-cgroup: Add unaccounted time to timeslice_used.
    block: fixup plugging stubs for !CONFIG_BLOCK
    block: remove obsolete comments for blkdev_issue_zeroout.
    blktrace: Use rq->cmd_flags directly in blk_add_trace_rq.
    ...

    Fix up conflicts in fs/{aio.c,super.c}

    Linus Torvalds
     

24 Mar, 2011

1 commit


21 Mar, 2011

1 commit

  • * 'trivial' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild-2.6: (25 commits)
    video: change to new flag variable
    scsi: change to new flag variable
    rtc: change to new flag variable
    rapidio: change to new flag variable
    pps: change to new flag variable
    net: change to new flag variable
    misc: change to new flag variable
    message: change to new flag variable
    memstick: change to new flag variable
    isdn: change to new flag variable
    ieee802154: change to new flag variable
    ide: change to new flag variable
    hwmon: change to new flag variable
    dma: change to new flag variable
    char: change to new flag variable
    fs: change to new flag variable
    xtensa: change to new flag variable
    um: change to new flag variables
    s390: change to new flag variable
    mips: change to new flag variable
    ...

    Fix up trivial conflict in drivers/hwmon/Makefile

    Linus Torvalds
     

17 Mar, 2011

2 commits

  • Replace EXTRA_CFLAGS with ccflags-y. And change ntfs-objs to ntfs-y
    for cleaner conditional inclusion.

    Signed-off-by: matt mooney
    Acked-by: WANG Cong
    Signed-off-by: Michal Marek

    matt mooney
     
  • …s/security-testing-2.6

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6: (33 commits)
    AppArmor: kill unused macros in lsm.c
    AppArmor: cleanup generated files correctly
    KEYS: Add an iovec version of KEYCTL_INSTANTIATE
    KEYS: Add a new keyctl op to reject a key with a specified error code
    KEYS: Add a key type op to permit the key description to be vetted
    KEYS: Add an RCU payload dereference macro
    AppArmor: Cleanup make file to remove cruft and make it easier to read
    SELinux: implement the new sb_remount LSM hook
    LSM: Pass -o remount options to the LSM
    SELinux: Compute SID for the newly created socket
    SELinux: Socket retains creator role and MLS attribute
    SELinux: Auto-generate security_is_socket_class
    TOMOYO: Fix memory leak upon file open.
    Revert "selinux: simplify ioctl checking"
    selinux: drop unused packet flow permissions
    selinux: Fix packet forwarding checks on postrouting
    selinux: Fix wrong checks for selinux_policycap_netpeer
    selinux: Fix check for xfrm selinux context algorithm
    ima: remove unnecessary call to ima_must_measure
    IMA: remove IMA imbalance checking
    ...

    Linus Torvalds
     

16 Mar, 2011

2 commits

  • * git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw:
    GFS2: Don't use _raw version of RCU dereference
    GFS2: Adding missing unlock_page()
    GFS2: Update to AIL list locking
    GFS2: introduce AIL lock
    GFS2: fix block allocation check for fallocate
    GFS2: Optimize glock multiple-dequeue code
    GFS2: Remove potential race in flock code
    GFS2: Fix glock deallocation race
    GFS2: quota allows exceeding hard limit
    GFS2: deallocation performance patch
    GFS2: panics on quotacheck update
    GFS2: Improve cluster mmap scalability
    GFS2: Fix glock queue trace point
    GFS2: Post-VFS scale update for RCU path walk
    GFS2: Use RCU for glock hash table

    Linus Torvalds
     
  • James Morris
     

15 Mar, 2011

1 commit


14 Mar, 2011

3 commits

  • gfs2_write_begin() calls grab_cache_page_write_begin() that returns *locked*
    page. Correspondent error-handling path lacks for unlock_page() call:

    > out:
    > if (error == 0)
    > return 0;
    >
    > page_cache_release(page);

    The whole system hangs if gfs2_unstuff_dinode() called from gfs2_write_begin()
    failed for some reason.

    Reported-by: Maxim
    Signed-off-by: Maxim
    Signed-off-by: Steven Whitehouse

    Maxim
     
  • The exportfs encode handle function should return the minimum required
    handle size. This helps user to find out the handle size by passing 0
    handle size in the first step and then redoing to the call again with
    the returned handle size value.

    Acked-by: Serge Hallyn
    Signed-off-by: Aneesh Kumar K.V
    Signed-off-by: Al Viro

    Aneesh Kumar K.V
     
  • The previous patch missed a couple of places where the AIL list
    needed locking, so this fixes up those places, plus a comment
    is corrected too.

    Signed-off-by: Steven Whitehouse
    Cc: Dave Chinner

    Steven Whitehouse
     

11 Mar, 2011

3 commits

  • The log lock is currently used to protect the AIL lists and
    the movements of buffers into and out of them. The lists
    are self contained and no log specific items outside the
    lists are accessed when starting or emptying the AIL lists.

    Hence the operation of the AIL does not require the protection
    of the log lock so split them out into a new AIL specific lock
    to reduce the amount of traffic on the log lock. This will
    also reduce the amount of serialisation that occurs when
    the gfs2_logd pushes on the AIL to move it forward.

    This reduces the impact of log pushing on sequential write
    throughput.

    Signed-off-by: Dave Chinner
    Signed-off-by: Steven Whitehouse

    Dave Chinner
     
  • GFS2 fallocate wasn't properly checking if a blocks were already allocated.
    In write_empty_blocks(), if a page didn't have buffer_heads attached, GFS2
    was always treating it as if there were no blocks allocated for that page.
    GFS2 now calls gfs2_block_map() to check if the blocks are allocated before
    writing them out.

    Signed-off-by: Benjamin Marzinski
    Signed-off-by: Steven Whitehouse

    Benjamin Marzinski
     
  • This is a small patch that optimizes multiple glock dequeue
    operations. It changes the unlock order to be more efficient
    and makes it easier for lock debugging tools to unravel. It
    also eliminates the need for the temp variable x, although
    that would likely be optimized out.

    Signed-off-by: Bob Peterson
    Signed-off-by: Steven Whitehouse

    Bob Peterson
     

10 Mar, 2011

4 commits


09 Mar, 2011

3 commits

  • This patch ensures that we always wait for glock demotion when
    dropping flocks on a file in order to prevent any race
    conditions associated with further flock calls or closing
    the file.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     
  • This patch fixes a race in deallocating glocks which was introduced
    in the RCU glock patch. We need to ensure that the glock count is
    kept correct even in the case that there is a race to add a new
    glock into the hash table. Also, to avoid having to wait for an
    RCU grace period, the glock counter can be decremented before
    call_rcu() is called.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     
  • Immediately after being synced to disk, cached quotas are zeroed out and a
    subsequent access of the cached quotas results in incorrect zero values. This
    meant that gfs2 assumed the actual usage to be the zero (or near-zero) usage
    values it found in the cached quotas and comparison against warn/limits never
    triggered a quota violation.

    This patch adds a new flag QDF_REFRESH that is set after a sync so that the
    cached quotas are forcefully refreshed from disk on a subsequent access on
    seeing this flag set.

    Resolves: rhbz#675944
    Signed-off-by: Abhi Das
    Signed-off-by: Steven Whitehouse

    Abhijith Das
     

08 Mar, 2011

1 commit


24 Feb, 2011

2 commits

  • This patch is a performance improvement to GFS2's dealloc code.
    Rather than update the quota file and statfs file for every
    single block that's stripped off in unlink function do_strip,
    this patch keeps track and updates them once for every layer
    that's stripped. This is done entirely inside the existing
    transaction, so there should be no risk of corruption.
    The other functions that deallocate blocks will be unaffected
    because they are using wrapper functions that do the same
    thing that they do today.

    I tested this code on my roth cluster by creating 200
    files in a directory, each of which is 100MB, then on
    four nodes, I simultaneously deleted the files, thus competing
    for GFS2 resources (but different files). The commands
    I used were:

    [root@roth-01]# time for i in `seq 1 4 200` ; do rm /mnt/gfs2/bigdir/gfs2.$i; done
    [root@roth-02]# time for i in `seq 2 4 200` ; do rm /mnt/gfs2/bigdir/gfs2.$i; done
    [root@roth-03]# time for i in `seq 3 4 200` ; do rm /mnt/gfs2/bigdir/gfs2.$i; done
    [root@roth-05]# time for i in `seq 4 4 200` ; do rm /mnt/gfs2/bigdir/gfs2.$i; done

    The performance increase was significant:

    roth-01 roth-02 roth-03 roth-05
    --------- --------- --------- ---------
    old: real 0m34.027 0m25.021s 0m23.906s 0m35.646s
    new: real 0m22.379s 0m24.362s 0m24.133s 0m18.562s

    Total time spent deleting:
    old: 118.6s
    new: 89.4

    For this particular case, this showed a 25% performance increase for
    GFS2 unlinks.

    Signed-off-by: Bob Peterson
    Signed-off-by: Steven Whitehouse

    Bob Peterson
     
  • Michael Leun reported that running parallel opens on a fuse filesystem
    can trigger a "kernel BUG at mm/truncate.c:475"

    Gurudas Pai reported the same bug on NFS.

    The reason is, unmap_mapping_range() is not prepared for more than
    one concurrent invocation per inode. For example:

    thread1: going through a big range, stops in the middle of a vma and
    stores the restart address in vm_truncate_count.

    thread2: comes in with a small (e.g. single page) unmap request on
    the same vma, somewhere before restart_address, finds that the
    vma was already unmapped up to the restart address and happily
    returns without doing anything.

    Another scenario would be two big unmap requests, both having to
    restart the unmapping and each one setting vm_truncate_count to its
    own value. This could go on forever without any of them being able to
    finish.

    Truncate and hole punching already serialize with i_mutex. Other
    callers of unmap_mapping_range() do not, and it's difficult to get
    i_mutex protection for all callers. In particular ->d_revalidate(),
    which calls invalidate_inode_pages2_range() in fuse, may be called
    with or without i_mutex.

    This patch adds a new mutex to 'struct address_space' to prevent
    running multiple concurrent unmap_mapping_range() on the same mapping.

    [ We'll hopefully get rid of all this with the upcoming mm
    preemptibility series by Peter Zijlstra, the "mm: Remove i_mmap_mutex
    lockbreak" patch in particular. But that is for 2.6.39 ]

    Signed-off-by: Miklos Szeredi
    Reported-by: Michael Leun
    Reported-by: Gurudas Pai
    Tested-by: Gurudas Pai
    Acked-by: Hugh Dickins
    Cc: stable@kernel.org
    Signed-off-by: Linus Torvalds

    Miklos Szeredi
     

17 Feb, 2011

1 commit

  • There are two spellings in use for 'freeze' + 'able' - 'freezable' and
    'freezeable'. The former is the more prominent one. The latter is
    mostly used by workqueue and in a few other odd places. Unify the
    spelling to 'freezable'.

    Signed-off-by: Tejun Heo
    Reported-by: Alan Stern
    Acked-by: "Rafael J. Wysocki"
    Acked-by: Greg Kroah-Hartman
    Acked-by: Dmitry Torokhov
    Cc: David Woodhouse
    Cc: Alex Dubov
    Cc: "David S. Miller"
    Cc: Steven Whitehouse

    Tejun Heo
     

08 Feb, 2011

1 commit

  • Handle block allocation for forceful unstuffing of quota dinode during quota
    update using quotactl(). Also fix block reservation for special cases when
    quotas cross over block boundaries and update 2 blocks instead of 1.

    Signed-off-by: Abhi Das
    Signed-off-by: Steven Whitehouse

    Abhijith Das
     

02 Feb, 2011

2 commits

  • The mmap system call grabs a glock when an update to atime maybe
    required. It does this in order to ensure that the flags on the
    inode are uptodate, but since it will only mark atime for a future
    update, an exclusive lock is not required here (one will be taken
    later when the actual update is performed).

    Also, the lock can be skipped when the mount is marked noatime in
    addition to the original check which only looked at the noatime
    flag for the inode itself.

    This should increase the scalability of the mmap call when multiple
    nodes are all mmaping the same file.

    Reported-by: Scooter Morris
    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     
  • SELinux would like to implement a new labeling behavior of newly created
    inodes. We currently label new inodes based on the parent and the creating
    process. This new behavior would also take into account the name of the
    new object when deciding the new label. This is not the (supposed) full path,
    just the last component of the path.

    This is very useful because creating /etc/shadow is different than creating
    /etc/passwd but the kernel hooks are unable to differentiate these
    operations. We currently require that userspace realize it is doing some
    difficult operation like that and than userspace jumps through SELinux hoops
    to get things set up correctly. This patch does not implement new
    behavior, that is obviously contained in a seperate SELinux patch, but it
    does pass the needed name down to the correct LSM hook. If no such name
    exists it is fine to pass NULL.

    Signed-off-by: Eric Paris

    Eric Paris
     

31 Jan, 2011

1 commit


21 Jan, 2011

2 commits

  • We can allow a few more cases to use RCU path walking than
    originally allowed. It should be possible to also enable
    RCU path walking when the glock is already cached. Thats
    a bit more complicated though, so left for a future patch.

    Signed-off-by: Steven Whitehouse
    Cc: Nick Piggin

    Steven Whitehouse
     
  • This has a number of advantages:

    - Reduces contention on the hash table lock
    - Makes the code smaller and simpler
    - Should speed up glock dumps when under load
    - Removes ref count changing in examine_bucket
    - No longer need hash chain lock in glock_put() in common case

    There are some further changes which this enables and which
    we may do in the future. One is to look at using SLAB_RCU,
    and another is to look at using a per-cpu counter for the
    per-sb glock counter, since that is touched twice in the
    lifetime of each glock (but only used at umount time).

    Signed-off-by: Steven Whitehouse
    Cc: Paul E. McKenney

    Steven Whitehouse
     

18 Jan, 2011

2 commits

  • In the (impossible, except if there is fs corruption) error path
    in gfs2_lookup_by_inum() if the call to gfs2_inode_refresh()
    fails, it was leaving the function by calling iput() rather
    than iget_failed(). This would cause future lookups of the same
    inode to block forever.

    This patch fixes the problem by moving the call to gfs2_inode_refresh()
    into gfs2_inode_lookup() where iget_failed() is part of the error path
    already. Also this cleans up some unreachable code and makes
    gfs2_set_iop() static.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     
  • When a file gets deleted on GFS2, if a node can't get an exclusive lock on the
    file's iopen glock, it punts on actually freeing up the space, because another
    node is using the file. When it does this, it needs to drop the iopen glock
    from its cache so that the other node can get an exclusive lock on it. Now,
    gfs2_delete_inode() sets GL_NOCACHE before dropping the shared lock on the
    iopen glock in preparation for grabbing it in the exclusive state. Since the
    node needs the glock in the exclusive state, dropping the shared lock from the
    cache doesn't slow down the case where no other nodes are using the file.

    Signed-off-by: Benjamin Marzinski
    Signed-off-by: Steven Whitehouse

    Benjamin Marzinski
     

17 Jan, 2011

2 commits

  • Currently all filesystems except XFS implement fallocate asynchronously,
    while XFS forced a commit. Both of these are suboptimal - in case of O_SYNC
    I/O we really want our allocation on disk, especially for the !KEEP_SIZE
    case where we actually grow the file with user-visible zeroes. On the
    other hand always commiting the transaction is a bad idea for fast-path
    uses of fallocate like for example in recent Samba versions. Given
    that block allocation is a data plane operation anyway change it from
    an inode operation to a file operation so that we have the file structure
    available that lets us check for O_SYNC.

    This also includes moving the code around for a few of the filesystems,
    and remove the already unnedded S_ISDIR checks given that we only wire
    up fallocate for regular files.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     
  • Instead of various home grown checks that might need updates for new
    flags just check for any bit outside the mask of the features supported
    by the filesystem. This makes the check future proof for any newly
    added flag.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     

14 Jan, 2011

1 commit

  • * 'for-2.6.38/core' of git://git.kernel.dk/linux-2.6-block: (43 commits)
    block: ensure that completion error gets properly traced
    blktrace: add missing probe argument to block_bio_complete
    block cfq: don't use atomic_t for cfq_group
    block cfq: don't use atomic_t for cfq_queue
    block: trace event block fix unassigned field
    block: add internal hd part table references
    block: fix accounting bug on cross partition merges
    kref: add kref_test_and_get
    bio-integrity: mark kintegrityd_wq highpri and CPU intensive
    block: make kblockd_workqueue smarter
    Revert "sd: implement sd_check_events()"
    block: Clean up exit_io_context() source code.
    Fix compile warnings due to missing removal of a 'ret' variable
    fs/block: type signature of major_to_index(int) to major_to_index(unsigned)
    block: convert !IS_ERR(p) && p to !IS_ERR_NOR_NULL(p)
    cfq-iosched: don't check cfqg in choose_service_tree()
    fs/splice: Pull buf->ops->confirm() from splice_from_pipe actors
    cdrom: export cdrom_check_events()
    sd: implement sd_check_events()
    sr: implement sr_check_events()
    ...

    Linus Torvalds
     

13 Jan, 2011

2 commits


11 Jan, 2011

1 commit


08 Jan, 2011

1 commit

  • …t/npiggin/linux-npiggin

    * 'vfs-scale-working' of git://git.kernel.org/pub/scm/linux/kernel/git/npiggin/linux-npiggin: (57 commits)
    fs: scale mntget/mntput
    fs: rename vfsmount counter helpers
    fs: implement faster dentry memcmp
    fs: prefetch inode data in dcache lookup
    fs: improve scalability of pseudo filesystems
    fs: dcache per-inode inode alias locking
    fs: dcache per-bucket dcache hash locking
    bit_spinlock: add required includes
    kernel: add bl_list
    xfs: provide simple rcu-walk ACL implementation
    btrfs: provide simple rcu-walk ACL implementation
    ext2,3,4: provide simple rcu-walk ACL implementation
    fs: provide simple rcu-walk generic_check_acl implementation
    fs: provide rcu-walk aware permission i_ops
    fs: rcu-walk aware d_revalidate method
    fs: cache optimise dentry and inode for rcu-walk
    fs: dcache reduce branches in lookup path
    fs: dcache remove d_mounted
    fs: fs_struct use seqlock
    fs: rcu-walk for path lookup
    ...

    Linus Torvalds