23 Feb, 2012

12 commits

  • Reviewed-by: Mark Tinguely
    Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner
    Signed-off-by: Ben Myers

    Christoph Hellwig
     
  • Reviewed-by: Mark Tinguely
    Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner
    Signed-off-by: Ben Myers

    Christoph Hellwig
     
  • Reviewed-by: Mark Tinguely
    Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner
    Signed-off-by: Ben Myers

    Christoph Hellwig
     
  • Reviewed-by: Mark Tinguely
    Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner
    Signed-off-by: Ben Myers

    Christoph Hellwig
     
  • Reviewed-by: Mark Tinguely
    Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner
    Signed-off-by: Ben Myers

    Christoph Hellwig
     
  • Add a new data structure to allow sharing code between the log grant and
    regrant code.

    Reviewed-by: Mark Tinguely
    Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner
    Signed-off-by: Ben Myers

    Christoph Hellwig
     
  • The tic->t_wait waitqueues can never have more than a single waiter
    on them, so we can easily replace them with a task_struct pointer
    and wake_up_process.

    Reviewed-by: Mark Tinguely
    Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner
    Signed-off-by: Ben Myers

    Christoph Hellwig
     
  • Remove the now unused opportunistic parameter, and use the
    xlog_writeq_wake and xlog_reserveq_wake helpers now that we don't have
    to care about the opportunistic wakeups.

    Reviewed-by: Mark Tinguely
    Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner
    Signed-off-by: Ben Myers

    Christoph Hellwig
     
  • There is no reason to wake up log space waiters when unlocking inodes or
    dquots, and the commit log has no explanation for this function either.

    Given that we now have exact log space wakeups everywhere we can assume
    the reason for this function was to paper over log space races in earlier
    XFS versions.

    Reviewed-by: Mark Tinguely
    Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner
    Signed-off-by: Ben Myers

    Christoph Hellwig
     
  • The only reason that xfs_log_space_wake had to do opportunistic wakeups
    was that the old xfs_log_move_tail calling convention didn't allow for
    exact wakeups when not updating the log tail LSN. Since this issue has
    been fixed we can do exact wakeups now.

    Reviewed-by: Mark Tinguely
    Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner
    Signed-off-by: Ben Myers

    Christoph Hellwig
     
  • Currently xfs_log_move_tail has a tail_lsn argument that is horribly
    overloaded: it may contain either an actual lsn to assign to the log tail,
    0 as a special case to use the last sync LSN, or 1 to indicate that no tail
    LSN assignment should be performed, and we should opportunistically wake up
    a task waiting for log space even if we did not move the LSN.

    Remove the tail LSN assignment from xfs_log_move_tail and make the two callers
    use xlog_assign_tail_lsn instead of the current variant of partially using
    the code in xfs_log_move_tail and partially open-coding it. Note that means
    we grow an additional lock roundtrip on the AIL lock for each bulk update
    or delete, which is still far less than what we had before introducing the
    bulk operations. If this proves to be a problem we can still add a variant
    of xlog_assign_tail_lsn that expects the lock to be held already.

    Also rename the remainder of xfs_log_move_tail to xfs_log_space_wake as
    that name describes its functionality much better.

    Reviewed-by: Mark Tinguely
    Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner
    Signed-off-by: Ben Myers

    Christoph Hellwig
     
  • This patch cleans up the quota checks for disk block and inode
    reservations. It changes them as follows.

    (1) Add a total_count variable to store the sum of the current usage
    and the new reservation, for disk blocks and for inodes respectively.

    (2) Make the checks of whether the local variables softlimit and
    hardlimit are positive more readable:
    if (softlimit > 0ULL) -> if (softlimit)
    if (hardlimit > 0ULL) -> if (hardlimit)
    This is safe because they are defined as xfs_qcnt_t, which is unsigned.

    Signed-off-by: Mitsuo Hayasaka
    Cc: Ben Myers
    Cc: Alex Elder
    Cc: Christoph Hellwig
    Reviewed-by: Mark Tinguely
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Ben Myers

    Mitsuo Hayasaka
     

22 Feb, 2012

2 commits

  • XFS checks quota when reserving disk blocks and inodes. For a block
    reservation, it checks whether the total number of blocks, including
    current usage and the new reservation, exceeds the quota. For an inode
    reservation, it checks only the current usage, without the new
    reservation. This inode quota check still works in practice, since the
    caller of xfs_trans_dquot() always sets the number of new inode
    reservations to 1 or 0 and inodes are reserved one at a time in
    current XFS.

    To make it more general, this patch changes the inode check to work the
    same way as the block quota check.

    Signed-off-by: Mitsuo Hayasaka
    Cc: Ben Myers
    Cc: Alex Elder
    Cc: Christoph Hellwig
    Reviewed-by: Mark Tinguely
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Ben Myers

    (cherry picked from commit c922bbc819324558e61402a7a76c10c550ca61bc)

    Mitsuo Hayasaka
     
  • In general, quota allows us to use disk blocks and inodes up to each
    limit; that is, they are available as long as usage does not exceed the limit.
    Current XFS sets the available range to stop below the limit, except in the
    disk inode quota check. This patch changes the ranges so that usage up to
    the limit itself is allowed.

    Signed-off-by: Mitsuo Hayasaka
    Cc: Ben Myers
    Cc: Alex Elder
    Cc: Christoph Hellwig
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Mark Tinguely
    Signed-off-by: Ben Myers

    (cherry picked from commit 20f12d8ac01917d96860f352f67eddd912df0afb)

    Mitsuo Hayasaka
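
The quota commits above (the total_count cleanup, the inode check that now includes the new reservation, and the inclusive limit) can be illustrated with a small user-space sketch. This is not the XFS code: `qcnt_t` is a stand-in for `xfs_qcnt_t`, `quota_res_allowed` is a hypothetical helper, treating a limit of 0 as "no limit" is an assumption of the sketch, and soft-limit timers and overflow handling are left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for xfs_qcnt_t, which is an unsigned type. */
typedef uint64_t qcnt_t;

/*
 * Hypothetical helper: may `delta` more units (blocks or inodes) be
 * reserved on top of `cur` without going beyond `limit`?
 *
 * A limit of 0 is treated as "no limit" (an assumption of this sketch).
 * Because qcnt_t is unsigned, `if (limit)` is the same test as
 * `if (limit > 0ULL)`.
 */
static bool quota_res_allowed(qcnt_t cur, qcnt_t delta, qcnt_t limit)
{
    /* Current usage plus the new reservation, checked the same way
     * for blocks and for inodes. */
    qcnt_t total_count = cur + delta;

    /* Reaching the limit exactly is allowed; only exceeding it fails. */
    if (limit && total_count > limit)
        return false;
    return true;
}
```

With a limit of 10, `quota_res_allowed(9, 1, 10)` succeeds because the total lands exactly on the limit, while `quota_res_allowed(10, 1, 10)` fails.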
     

14 Feb, 2012

1 commit


11 Feb, 2012

1 commit

  • Stop reusing dquots from the freelist when allocating new ones directly, and
    implement a shrinker that actually follows the specifications for the
    interface. The shrinker implementation is still highly suboptimal at this
    point, but we can gradually work on it.

    This also fixes a bug in the previous lock ordering, where we would take
    the hash and dqlist locks inside of the freelist lock against the normal
    lock ordering. This is only solvable by introducing the dispose list,
    and thus not when using direct reclaim of unused dquots for new allocations.

    As a side-effect the quota upper bound and used to free ratio values in
    /proc/fs/xfs/xqm are set to 0 as these values don't make any sense in the
    new world order.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Ben Myers

    (cherry picked from commit 04da0c8196ac0b12fb6b84f4b7a51ad2fa56d869)

    Christoph Hellwig
     

04 Feb, 2012

3 commits


03 Feb, 2012

2 commits

  • Create a new function xfs_this_quota_on() that takes a xfs_mount
    data structure and a disk quota type and returns true if the specified
    type of quota is ON in the xfs_mount data structure.

    Signed-off-by: Chandra Seetharaman
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Ben Myers

    Chandra Seetharaman
     
  • Remove the macro, as it is no longer needed in the code.
    I tried to find where it was last used, but its usage
    seems to have been dropped a long time ago.

    Signed-off-by: Amit Sahrawat
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Ben Myers

    Amit Sahrawat
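
A helper in the spirit of the xfs_this_quota_on() commit above might look roughly like the following. This is a sketch under stated assumptions, not the kernel function: `struct mount_sketch`, `enum quota_type`, the `QFLAG_*` bits, and `this_quota_on` are all hypothetical names, and the real xfs_mount layout and flag values are not reproduced here.

```c
#include <stdbool.h>

/* Hypothetical quota types and per-mount flag bits (illustrative only). */
enum quota_type { QUOTA_USER, QUOTA_GROUP, QUOTA_PROJECT };

#define QFLAG_USER_ON    0x1u
#define QFLAG_GROUP_ON   0x2u
#define QFLAG_PROJECT_ON 0x4u

/* Stand-in for the mount structure; only the quota flags are modelled. */
struct mount_sketch {
    unsigned int qflags;
};

/* Return true if the given type of quota is turned on for this mount. */
static bool this_quota_on(const struct mount_sketch *mp, enum quota_type type)
{
    switch (type) {
    case QUOTA_USER:
        return (mp->qflags & QFLAG_USER_ON) != 0;
    case QUOTA_GROUP:
        return (mp->qflags & QFLAG_GROUP_ON) != 0;
    case QUOTA_PROJECT:
        return (mp->qflags & QFLAG_PROJECT_ON) != 0;
    default:
        return false;
    }
}
```

Callers can then ask one question per quota type instead of testing flag bits inline at every call site.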
     

01 Feb, 2012

2 commits

  • When a system tries to mount a filesystem (FS) by UUID, XFS
    returns -EINVAL and prints a message if an FS with the same UUID is
    already mounted. It is useful to include the duplicate UUID
    in that message.

    Signed-off-by: Mitsuo Hayasaka
    Reviewed-by: Christoph Hellwig
    Cc: Ben Myers
    Cc: Alex Elder
    Cc: Christoph Hellwig
    Signed-off-by: Ben Myers

    Mitsuo Hayasaka
     
  • kmem_realloc() in XFS takes KM_* memory allocation flags, and it
    allocates memory using kmalloc() after converting them to gfp_mask
    flags. In xlog_recover_add_to_cont_trans(), 0u is passed to kmem_realloc()
    instead of a KM_* flag. I guess it is preferable to use one, and here the
    allocation must succeed but does not have to be done with GFP_ATOMIC, so
    this patch changes the argument to KM_SLEEP.

    Signed-off-by: Mitsuo Hayasaka
    Cc: Ben Myers
    Cc: Alex Elder
    Cc: Christoph Hellwig
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Ben Myers

    Mitsuo Hayasaka
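
The duplicate-UUID commit above can be pictured with a user-space sketch that keeps a small table of mounted UUIDs and names the offending UUID when a second mount is refused. Everything here is an assumption for illustration: the table, `register_uuid`, `format_uuid`, the table size, and the message wording are hypothetical and are not taken from the XFS source.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

#define UUID_LEN   16
#define MAX_MOUNTS 8

/* Hypothetical table of UUIDs of currently mounted filesystems. */
static unsigned char mounted[MAX_MOUNTS][UUID_LEN];
static int nr_mounted;

/* Format 16 raw bytes as the usual 8-4-4-4-12 hex string (36 chars + NUL). */
static void format_uuid(const unsigned char *u, char out[37])
{
    snprintf(out, 37,
             "%02x%02x%02x%02x-%02x%02x-%02x%02x-%02x%02x-"
             "%02x%02x%02x%02x%02x%02x",
             u[0], u[1], u[2], u[3], u[4], u[5], u[6], u[7],
             u[8], u[9], u[10], u[11], u[12], u[13], u[14], u[15]);
}

/* Record a UUID at mount time; refuse duplicates and say which UUID it was. */
static int register_uuid(const unsigned char *uuid)
{
    char buf[37];

    for (int i = 0; i < nr_mounted; i++) {
        if (memcmp(mounted[i], uuid, UUID_LEN) == 0) {
            format_uuid(uuid, buf);
            fprintf(stderr, "duplicate UUID %s, refusing to mount\n", buf);
            return -EINVAL;
        }
    }
    if (nr_mounted == MAX_MOUNTS)
        return -ENOSPC;   /* table full (sketch-only limitation) */
    memcpy(mounted[nr_mounted++], uuid, UUID_LEN);
    return 0;
}
```

Including the UUID in the message lets an administrator see which filesystem is involved without inspecting each device by hand.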
     

26 Jan, 2012

1 commit


20 Jan, 2012

5 commits


18 Jan, 2012

11 commits

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit: (29 commits)
    audit: no leading space in audit_log_d_path prefix
    audit: treat s_id as an untrusted string
    audit: fix signedness bug in audit_log_execve_info()
    audit: comparison on interprocess fields
    audit: implement all object interfield comparisons
    audit: allow interfield comparison between gid and ogid
    audit: complex interfield comparison helper
    audit: allow interfield comparison in audit rules
    Kernel: Audit Support For The ARM Platform
    audit: do not call audit_getname on error
    audit: only allow tasks to set their loginuid if it is -1
    audit: remove task argument to audit_set_loginuid
    audit: allow audit matching on inode gid
    audit: allow matching on obj_uid
    audit: remove audit_finish_fork as it can't be called
    audit: reject entry,always rules
    audit: inline audit_free to simplify the look of generic code
    audit: drop audit_set_macxattr as it doesn't do anything
    audit: inline checks for not needing to collect aux records
    audit: drop some potentially inadvisable likely notations
    ...

    Use evil merge to fix up grammar mistakes in Kconfig file.

    Bad speling and horrible grammar (and copious swearing) is to be
    expected, but let's keep it to commit messages and comments, rather than
    expose it to users in config help texts or printouts.

    Linus Torvalds
     
  • * 'for-linus' of git://oss.sgi.com/xfs/xfs:
    xfs: cleanup xfs_file_aio_write
    xfs: always return with the iolock held from xfs_file_aio_write_checks
    xfs: remove the i_new_size field in struct xfs_inode
    xfs: remove the i_size field in struct xfs_inode
    xfs: replace i_pin_wait with a bit waitqueue
    xfs: replace i_flock with a sleeping bitlock
    xfs: make i_flags an unsigned long
    xfs: remove the if_ext_max field in struct xfs_ifork
    xfs: remove the unused dm_attrs structure
    xfs: cleanup xfs_iomap_eof_align_last_fsb
    xfs: remove xfs_itruncate_data

    Linus Torvalds
     
  • * 'btrfs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    btrfs: take allocation of ->tree_root into open_ctree()
    btrfs: let ->s_fs_info point to fs_info, not root...
    btrfs: consolidate failure exits in btrfs_mount() a bit
    btrfs: make free_fs_info() call ->kill_sb() unconditional
    btrfs: merge free_fs_info() calls on fill_super failures
    btrfs: kill pointless reassignment of ->s_fs_info in btrfs_fill_super()
    btrfs: make open_ctree() return int
    btrfs: sanitizing ->fs_info, part 5
    btrfs: sanitizing ->fs_info, part 4
    btrfs: sanitizing ->fs_info, part 3
    btrfs: sanitizing ->fs_info, part 2
    btrfs: sanitizing ->fs_info, part 1
    btrfs: fix a deadlock in btrfs_scan_one_device()
    btrfs: fix mount/umount race
    btrfs: get ->kill_sb() of its own
    btrfs: preparation to fixing mount/umount race

    Linus Torvalds
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (62 commits)
    Btrfs: use larger system chunks
    Btrfs: add a delalloc mutex to inodes for delalloc reservations
    Btrfs: space leak tracepoints
    Btrfs: protect orphan block rsv with spin_lock
    Btrfs: add allocator tracepoints
    Btrfs: don't call btrfs_throttle in file write
    Btrfs: release space on error in page_mkwrite
    Btrfs: fix btrfsck error 400 when truncating a compressed
    Btrfs: do not use btrfs_end_transaction_throttle everywhere
    Btrfs: add balance progress reporting
    Btrfs: allow for resuming restriper after it was paused
    Btrfs: allow for canceling restriper
    Btrfs: allow for pausing restriper
    Btrfs: add skip_balance mount option
    Btrfs: recover balance on mount
    Btrfs: save balance parameters to disk
    Btrfs: soft profile changing mode (aka soft convert)
    Btrfs: implement online profile changing
    Btrfs: do not reduce profile in do_chunk_alloc()
    Btrfs: virtual address space subset filter
    ...

    Fix up trivial conflict in fs/btrfs/ioctl.c due to the use of the new
    mnt_drop_write_file() helper.

    Linus Torvalds
     
  • Jüri Aedla reported that the /proc/<pid>/mem handling really isn't very
    robust, and it also doesn't match the permission checking of any of the
    other related files.

    This changes it to do the permission checks at open time, and instead of
    tracking the process, it tracks the VM at the time of the open. That
    simplifies the code a lot, but does mean that if you hold the file
    descriptor open over an execve(), you'll continue to read from the _old_
    VM.

    That is different from our previous behavior, but much simpler. If
    somebody actually finds a load where this matters, we'll need to revert
    this commit.

    I suspect that nobody will ever notice - because the process mapping
    addresses will also have changed as part of the execve. So you cannot
    actually usefully access the fd across a VM change simply because all
    the offsets for IO would have changed too.

    Reported-by: Jüri Aedla
    Cc: Al Viro
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     
  • Don't log a message for set_nlink(0).

    Signed-off-by: Miklos Szeredi
    Signed-off-by: Al Viro

    Miklos Szeredi
     
  • dd slept infinitely when fsfreeze failed because of EIO.
    To fix this problem, if ->freeze_fs fails, freeze_super() wakes up
    the tasks waiting for the filesystem to become unfrozen.

    When s_frozen isn't SB_UNFROZEN in __generic_file_aio_write(),
    the function sleeps until FITHAW ioctl wakes up s_wait_unfrozen.

    However, if ->freeze_fs fails, s_frozen is set to SB_UNFROZEN and then
    freeze_super() returns an error number. In this case, FITHAW ioctl returns
    EINVAL because s_frozen is already SB_UNFROZEN. There is no way to wake up
    s_wait_unfrozen, so __generic_file_aio_write() sleeps infinitely.

    Signed-off-by: Kazuya Mio
    Signed-off-by: Al Viro

    Kazuya Mio
     
  • Just a code cleanup really. We don't need to make a function call just for
    it to return on error. This also makes the VFS function even easier to follow
    and removes a conditional on a hot path.

    Signed-off-by: Eric Paris

    Eric Paris
     
  • At the moment we allow tasks to set their loginuid if they have
    CAP_AUDIT_CONTROL. In reality we want tasks to set the loginuid when they
    log in and it be impossible to ever reset. We had to make it mutable even
    after it was once set (with the CAP) because on update an admin might have
    to restart sshd. Now sshd would get his loginuid and the next user which
    logged in using ssh would not be able to set his loginuid.

    Systemd has changed how userspace works and allowed us to make the kernel
    work the way it should. With systemd users (even admins) are not supposed
    to restart services directly. The system will restart the service for
    them. Thus since systemd is going to loginuid==-1, sshd would get -1, and
    sshd would be allowed to set a new loginuid without special permissions.

    If an admin in this system were to manually start an sshd he is inserting
    himself into the system chain of trust and thus, logically, it's his
    loginuid that should be used! Since we have old systems I make this a
    Kconfig option.

    Signed-off-by: Eric Paris

    Eric Paris
     
  • The function always deals with current. Don't expose an option
    pretending one can use it for something. You can't.

    Signed-off-by: Eric Paris

    Eric Paris
     
  • With all the size field updates out of the way xfs_file_aio_write can
    be further simplified by pushing all iolock handling into
    xfs_file_dio_aio_write and xfs_file_buffered_aio_write and using
    the generic generic_write_sync helper for synchronous writes.

    Reviewed-by: Dave Chinner
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Ben Myers

    Christoph Hellwig
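
The loginuid commit in this group (audit: only allow tasks to set their loginuid if it is -1) describes a set-once rule, which can be sketched in user space as below. This is a simplified illustration, not the kernel implementation: `struct task_sketch`, `LOGINUID_UNSET`, and `set_loginuid_once` are hypothetical names, and the Kconfig option that keeps the old CAP_AUDIT_CONTROL behaviour for older systems is not modelled.

```c
#include <errno.h>
#include <stdint.h>

/* The "never set" value; -1 stored in an unsigned 32-bit field. */
#define LOGINUID_UNSET ((uint32_t)-1)

/* Stand-in for the per-task state; only the loginuid is modelled. */
struct task_sketch {
    uint32_t loginuid;
};

/*
 * Set the loginuid only if it has never been set. Once a value is in
 * place, further attempts fail, whatever privileges the caller holds.
 */
static int set_loginuid_once(struct task_sketch *task, uint32_t new_uid)
{
    if (task->loginuid != LOGINUID_UNSET)
        return -EPERM;
    task->loginuid = new_uid;
    return 0;
}
```

A service started with the unset value, as the commit describes for sshd under systemd, can assign a loginuid for a new session once, and nothing later can reset it.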