06 Jan, 2009

40 commits

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/dlm:
    dlm: fs/dlm/ast.c: fix warning
    dlm: add new debugfs entry
    dlm: add time stamp of blocking callback
    dlm: change lock time stamping
    dlm: improve how bast mode handling
    dlm: remove extra blocking callback check
    dlm: replace schedule with cond_resched
    dlm: remove kmap/kunmap
    dlm: trivial annotation of be16 value
    dlm: fix up memory allocation flags

    Linus Torvalds
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw: (27 commits)
    GFS2: Use DEFINE_SPINLOCK
    GFS2: Fix use-after-free bug on umount (try #2)
    Revert "GFS2: Fix use-after-free bug on umount"
    GFS2: Streamline alloc calculations for writes
    GFS2: Send useful information with uevent messages
    GFS2: Fix use-after-free bug on umount
    GFS2: Remove ancient, unused code
    GFS2: Move four functions from super.c
    GFS2: Fix bug in gfs2_lock_fs_check_clean()
    GFS2: Send some sensible sysfs stuff
    GFS2: Kill two daemons with one patch
    GFS2: Move gfs2_recoverd into recovery.c
    GFS2: Fix "truncate in progress" hang
    GFS2: Clean up & move gfs2_quotad
    GFS2: Add more detail to debugfs glock dumps
    GFS2: Banish struct gfs2_rgrpd_host
    GFS2: Move rg_free from gfs2_rgrpd_host to gfs2_rgrpd
    GFS2: Move rg_igeneration into struct gfs2_rgrpd
    GFS2: Banish struct gfs2_dinode_host
    GFS2: Move i_size from gfs2_dinode_host and rename it to i_disksize
    ...

    Linus Torvalds
     
  • * 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2: (138 commits)
    ocfs2: Access the right buffer_head in ocfs2_merge_rec_left.
    ocfs2: use min_t in ocfs2_quota_read()
    ocfs2: remove unneeded lvb casts
    ocfs2: Add xattr support checking in init_security
    ocfs2: alloc xattr bucket in ocfs2_xattr_set_handle
    ocfs2: calculate and reserve credits for xattr value in mknod
    ocfs2/xattr: fix credits calculation during index create
    ocfs2/xattr: Always updating ctime during xattr set.
    ocfs2/xattr: Remove extend_trans call and add its credits from the beginning
    ocfs2/dlm: Fix race during lockres mastery
    ocfs2/dlm: Fix race in adding/removing lockres' to/from the tracking list
    ocfs2/dlm: Hold off sending lockres drop ref message while lockres is migrating
    ocfs2/dlm: Clean up errors in dlm_proxy_ast_handler()
    ocfs2/dlm: Fix a race between migrate request and exit domain
    ocfs2: One more hamming code optimization.
    ocfs2: Another hamming code optimization.
    ocfs2: Don't hand-code xor in ocfs2_hamming_encode().
    ocfs2: Enable metadata checksums.
    ocfs2: Validate superblock with checksum and ecc.
    ocfs2: Checksum and ECC for directory blocks.
    ...

    Linus Torvalds
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
    inotify: fix type errors in interfaces
    fix breakage in reiserfs_new_inode()
    fix the treatment of jfs special inodes
    vfs: remove duplicate code in get_fs_type()
    add a vfs_fsync helper
    sys_execve and sys_uselib do not call into fsnotify
    zero i_uid/i_gid on inode allocation
    inode->i_op is never NULL
    ntfs: don't NULL i_op
    isofs check for NULL ->i_op in root directory is dead code
    affs: do not zero ->i_op
    kill suid bit only for regular files
    vfs: lseek(fd, 0, SEEK_CUR) race condition

    Linus Torvalds
     
  • The problems lie in the types used for some inotify interfaces, both at the kernel level and at the glibc level. This mail addresses the kernel problem. I will follow up with some suggestions for glibc changes.

    For the sys_inotify_rm_watch() interface, the type of the 'wd' argument is
    currently 'u32'; it should be '__s32'. That is Robert's suggestion, and
    it is consistent with the other declarations of watch descriptors in the
    kernel source, in particular the inotify_event structure in
    include/linux/inotify.h:

    struct inotify_event {
            __s32 wd;       /* watch descriptor */
            __u32 mask;     /* watch mask */
            __u32 cookie;   /* cookie to synchronize two events */
            __u32 len;      /* length (including nulls) of name */
            char name[0];   /* stub for possible name */
    };

    The patch makes the changes needed for inotify_rm_watch().
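
    Why the signedness matters can be sketched in user space. The helper
    names below are hypothetical and not part of the patch; the sketch only
    relies on the convention that inotify_add_watch() reports failure as -1.

```c
#include <stdint.h>

/* A watch descriptor held in a signed type can be tested for failure. */
int wd_is_error(int32_t wd)
{
    return wd < 0;
}

/*
 * The same value pushed through an unsigned 32-bit type: -1 turns into
 * 4294967295, which no longer looks like an error to a "< 0" test.
 */
uint32_t wd_as_unsigned(int32_t wd)
{
    return (uint32_t)wd;
}
```

    Declaring every interface that carries a watch descriptor with the same
    signed type avoids this kind of silent conversion.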

    Signed-off-by: Michael Kerrisk
    Cc: Robert Love
    Cc: Vegard Nossum
    Cc: Ulrich Drepper
    Signed-off-by: Andrew Morton
    Signed-off-by: Al Viro

    Michael Kerrisk
     
  • now that we use ih.key earlier, we need to do all its setup early enough

    Signed-off-by: Al Viro

    Al Viro
     
  • We used to put them on a single list, without any locking. Racy.

    Signed-off-by: Al Viro

    Al Viro
     
  • save 14 bytes:

       text    data     bss     dec     hex filename
       1354      32       4    1390     56e fs/filesystems.o.before
       1340      32       4    1376     560 fs/filesystems.o

    Signed-off-by: Li Zefan
    Signed-off-by: Al Viro

    Li Zefan
     
  • Fsync currently has a fdatawrite/fdatawait pair around the method call,
    and a mutex_lock/unlock of the inode mutex. All callers of fsync have
    to duplicate this, but we have a few and most of them don't quite get
    it right. This patch adds a new vfs_fsync that takes care of this.
    It's a little more complicated than usual, as ->fsync might get a NULL
    file pointer and just a dentry from nfsd, but otherwise gets a file, and
    we want to take the mapping and file operations from it when it is there.

    Notes on the fsync callers:

    - ecryptfs wasn't calling filemap_fdatawrite / filemap_fdatawait on the
    lower file
    - coda wasn't calling filemap_fdatawrite / filemap_fdatawait on the host
    file, and returning 0 when ->fsync was missing
    - shm was neither calling filemap_fdatawrite / filemap_fdatawait nor
    taking i_mutex. Given that shared memory doesn't have disk backing,
    not doing anything in fsync seems fine, and I left it out of the
    vfs_fsync conversion for now. In that case, though, we might just
    call the no-op simple_sync_file directly instead of passing the
    call through to the lower file at all.

    [and now actually export vfs_fsync]
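
    The idea of one helper owning the whole sequence can be sketched with a
    user-space mock. Everything below (struct mock_file, the mock_* helpers)
    is hypothetical and is not the kernel code. The exact nesting of the
    steps and the -EINVAL return for a missing method are assumptions made
    for the sketch; the message only says that the write/wait pair and the
    mutex surround the method call.

```c
#include <errno.h>
#include <string.h>

/* Hypothetical stand-in for a file; not the kernel's struct file. */
struct mock_file {
    int (*fsync)(struct mock_file *f);  /* may be NULL, like a missing ->fsync */
    char log[64];                       /* records the order of the steps */
};

static void note(struct mock_file *f, const char *step)
{
    strncat(f->log, step, sizeof(f->log) - strlen(f->log) - 1);
}

/* Mock steps; each one only records that it ran and reports success. */
static int mock_fdatawrite(struct mock_file *f) { note(f, "write,"); return 0; }
static int mock_fdatawait(struct mock_file *f)  { note(f, "wait,");  return 0; }
static void mock_lock(struct mock_file *f)      { note(f, "lock,"); }
static void mock_unlock(struct mock_file *f)    { note(f, "unlock,"); }

/* Example ->fsync method used by the mock. */
int mock_fsync_method(struct mock_file *f) { note(f, "fsync,"); return 0; }

/* One helper owns the whole sequence, so callers cannot get it wrong. */
int mock_vfs_fsync(struct mock_file *f)
{
    int ret, err;

    if (!f->fsync)
        return -EINVAL;         /* assumption: a missing method is an error */

    ret = mock_fdatawrite(f);   /* start writeback of dirty data */
    mock_lock(f);
    err = f->fsync(f);          /* filesystem-specific method */
    if (!ret)
        ret = err;
    mock_unlock(f);
    err = mock_fdatawait(f);    /* wait for the writeback to finish */
    if (!ret)
        ret = err;              /* report the first error seen */
    return ret;
}
```

    A caller then only needs mock_vfs_fsync(&file) instead of repeating the
    five steps itself.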

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     
  • sys_execve and sys_uselib do not call into fsnotify so inotify does not get
    open events for these types of syscalls. This patch simply makes the
    requisite fsnotify calls.

    Signed-off-by: Eric Paris
    Signed-off-by: Al Viro

    Eric Paris
     
  • ... and don't bother in callers. Don't bother with zeroing i_blocks,
    while we are at it - it's already been zeroed.

    i_mode is not worth the effort; it has no common default value.

    Signed-off-by: Al Viro

    Al Viro
     
  • We used to have a rather schizophrenic set of checks for NULL ->i_op even
    though it had been eliminated years ago. You'd need to go out of your
    way to set it to NULL explicitly _and_ a bunch of code would die on
    such inodes anyway. After killing two remaining places that still
    did that bogosity, all that crap can go away.

    Signed-off-by: Al Viro

    Al Viro
     
  • it's already set to empty table (and no, ntfs doesn't have any explicit
    checks for NULL ->i_op or NULL ->i_fop)

    Signed-off-by: Al Viro

    Al Viro
     
  • for one thing it never happens, for another we check that inode
    is a directory right after that place anyway (and we'd already
    checked that reading it from disk has not failed).

    Signed-off-by: Al Viro

    Al Viro
     
  • it is already set to empty table and should never be NULL

    Signed-off-by: Al Viro

    Al Viro
     
  • This patch fixes a race condition in lseek. While it is expected that
    unpredictable behaviour may result while repositioning the offset of a
    file descriptor concurrently with reading/writing to the same file
    descriptor, this should not happen when merely *reading* the file
    descriptor's offset.

    Unfortunately, the only portable way in Unix to read a file
    descriptor's offset is lseek(fd, 0, SEEK_CUR); however executing this
    concurrently with read/write may mess up the position.

    [with fixes from akpm]

    Signed-off-by: Alain Knaff
    Signed-off-by: Andrew Morton
    Signed-off-by: Al Viro

    Alain Knaff
     
  • In commit "ocfs2: Use metadata-specific ocfs2_journal_access_*()
    functions", the wrong buffer_head is accessed. So change it
    to the right buffer_head.

    Signed-off-by: Tao Ma
    Acked-by: Joel Becker
    Signed-off-by: Mark Fasheh

    Tao Ma
     
  • This is preferred to min().
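
    As a rough illustration: the kernel's min() insists that both operands
    have the same type, while min_t() casts both to a named type first.
    MIN_T and bytes_to_copy() below are simplified, hypothetical stand-ins
    and not the kernel macro or the ocfs2 code.

```c
#include <stddef.h>

/*
 * Simplified stand-in for min_t(type, x, y): both operands are cast to
 * one named type before the comparison. Unlike the kernel macro, this
 * version may evaluate its arguments more than once.
 */
#define MIN_T(type, x, y) ((type)(x) < (type)(y) ? (type)(x) : (type)(y))

/* Hypothetical helper: never copy more than what is left in the block. */
size_t bytes_to_copy(size_t wanted, unsigned int left_in_block)
{
    return MIN_T(size_t, wanted, left_in_block);
}
```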

    Signed-off-by: Mark Fasheh

    Mark Fasheh
     
  • dlmglue.c has lots of code which casts the return value of ocfs2_dlm_lvb().
    This is pointless however, as ocfs2_dlm_lvb() returns void *.
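
    A small stand-alone sketch of the point: in C, a void * converts to any
    object pointer type without a cast. struct demo_lvb and the demo_*
    functions are hypothetical and only mimic an accessor that returns
    void * the way ocfs2_dlm_lvb() does.

```c
#include <stdint.h>

/* Hypothetical value block; not the real ocfs2 LVB layout. */
struct demo_lvb {
    uint32_t version;
    uint32_t seq;
};

static struct demo_lvb storage = { 5, 42 };

/* Hypothetical accessor that returns void *. */
void *demo_get_lvb(void)
{
    return &storage;
}

uint32_t demo_lvb_seq(void)
{
    /* No cast needed: the void * converts implicitly in C. */
    struct demo_lvb *lvb = demo_get_lvb();

    return lvb->seq;
}
```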

    Signed-off-by: Mark Fasheh

    Mark Fasheh
     
  • We must check whether the ocfs2 volume supports xattrs in init_security;
    if it does not and security is enabled, mknod would fail.

    Signed-off-by: Tiger Yang
    Signed-off-by: Mark Fasheh

    Tiger Yang
     
  • In extreme situations, an xattr bucket may be needed for setting the
    security entry and acl entries during mknod. This only happens when
    the block size is too small.

    Signed-off-by: Tiger Yang
    Signed-off-by: Mark Fasheh

    Tiger Yang
     
  • Previously, we extended the credits for an xattr's large value in
    set_value_outside; this can give rise to a credits issue when we set one
    security entry and two acl entries during mknod. As we remove
    extend_trans from set_value_outside, we must calculate and reserve the
    credits for the xattr's large value in mknod.

    Signed-off-by: Tiger Yang
    Signed-off-by: Mark Fasheh

    Tiger Yang
     
  • When creating an xattr index block, the old calculation forgot
    to add credits for the metadata change of the alloc file. So add
    more credits and more comments to explain it.

    Signed-off-by: Tao Ma
    Signed-off-by: Mark Fasheh

    Tao Ma
     
  • In xattr set, we should always update ctime if the operation succeeds.
    The old code mistakenly did this in ocfs2_xattr_set_entry, which is only
    called when we set an xattr in the inode or xattr block. As a side
    benefit, this resolves bug 1052: in that scenario,
    ocfs2_calc_xattr_set_need only calculates the xattr set credits, while
    ocfs2_xattr_set_entry also updates the inode, which is not part of
    the xattr set process.

    Signed-off-by: Tao Ma
    Signed-off-by: Mark Fasheh

    Tao Ma
     
  • When setting a new xattr value, we know the credits needed from the very
    beginning, unlike the extension of a bucket, where we can't figure them
    out in advance. So remove ocfs2_extend_trans from that function and
    calculate the credits before the transaction. This also relieves acl
    operations of worrying about the side effects of ocfs2_extend_trans.

    Signed-off-by: Tao Ma
    Signed-off-by: Mark Fasheh

    Tao Ma
     
  • dlm_get_lock_resource() is supposed to return a lock resource with a proper
    master. If multiple concurrent threads attempt to look up the lockres for the
    same lockid while the lock mastery is underway, one or more threads are likely
    to return a lockres without a proper master.

    This patch makes the threads wait in dlm_get_lock_resource() while the mastery
    is underway, ensuring all threads return the lockres with a proper master.

    This issue is known to be limited to users using the flock() syscall. For all
    other fs operations, the ocfs2 dlmglue layer serializes the dlm op for each
    lockid.

    Users encountering this bug will see flock() return EINVAL and dmesg have the
    following error:
    ERROR: Dlm error "DLM_BADARGS" while calling dlmlock on resource : bad api args

    Reported-by: Coly Li
    Signed-off-by: Sunil Mushran
    Signed-off-by: Mark Fasheh

    Sunil Mushran
     
  • This patch adds a new lock, dlm->tracking_lock, to protect adding/removing
    lockres' to/from the dlm->tracking_list. We were previously using dlm->spinlock
    for the same, but that proved inadequate as we could be freeing a lockres from
    a context that did not hold that lock. As the new lock only protects this list,
    we can explicitly take it when removing the lockres from the tracking list.

    This bug was exposed when testing multiple processes concurrently calling
    flock() on the same file.

    Signed-off-by: Sunil Mushran
    Signed-off-by: Mark Fasheh

    Sunil Mushran
     
  • During lockres purge, o2dlm sends a drop reference message to the lockres
    master. This patch delays the message if the lockres is being migrated.

    Fixes oss bugzilla#1012
    http://oss.oracle.com/bugzilla/show_bug.cgi?id=1012

    Signed-off-by: Sunil Mushran
    Signed-off-by: Mark Fasheh

    Sunil Mushran
     
  • This patch cleans up the errors printed in dlm_proxy_ast_handler(). The
    errors now include the node number that sent the (b)ast. It also reduces
    the number of endian swaps of the cookie.

    Signed-off-by: Sunil Mushran
    Signed-off-by: Mark Fasheh

    Sunil Mushran
     
  • This patch addresses a race between a migrate request message and an exit domain message.
    Instead of blocking exit domains for the duration of the migrate, we ignore
    failure to deliver that message. This is because an exiting domain should
    not have any active locks and thus has no role to play in the migration.

    Signed-off-by: Sunil Mushran
    Signed-off-by: Mark Fasheh

    Sunil Mushran
     
  • The previous optimization used a fast find-highest-bit-set operation to
    give us a good starting point in calc_code_bit(). This version lets the
    caller cache the previous code buffer bit offset. Thus, the next call
    always starts where the last one left off.

    This reduces the calculation time by another 39%, for a total 80% reduction from
    the original, naive implementation. At least, on my machine. This also
    brings the parity calculation to within an order of magnitude of the
    crc32 calculation.
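
    The caching idea can be sketched outside the kernel as follows. This is
    an illustration under assumptions, not the ocfs2 source: the function
    names are hypothetical, and the cached variant is only valid when data
    bits are visited in increasing order, starting with a cache of 0.

```c
/* Reference: 1-based code bit position for 0-based data bit i. */
unsigned int code_bit_naive(unsigned int i)
{
    unsigned int b = i + 1, p;

    /* Every power-of-two position is a parity bit, so step over it. */
    for (p = 0; (1u << p) < (b + 1); p++)
        b++;
    return b;
}

/*
 * Cached variant: *cache remembers how many parity positions the
 * previous call already stepped over, so the loop resumes from there.
 */
unsigned int code_bit_cached(unsigned int i, unsigned int *cache)
{
    unsigned int p = *cache;
    unsigned int b = i + 1 + p;

    for (; (1u << p) < (b + 1); p++)
        b++;
    *cache = p;
    return b;
}

/* Returns 1 if both variants agree for data bits 0..n-1 taken in order. */
int cached_matches_naive(unsigned int n)
{
    unsigned int cache = 0, i;

    for (i = 0; i < n; i++)
        if (code_bit_cached(i, &cache) != code_bit_naive(i))
            return 0;
    return 1;
}
```

    With the cache, most calls run zero loop iterations; the loop body only
    executes when the next position crosses a power of two.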

    Signed-off-by: Joel Becker
    Signed-off-by: Mark Fasheh

    Joel Becker
     
  • In the calc_code_bit() function, we must find all powers of two beneath
    the code bit number, *after* it's shifted by those powers of two. This
    requires a loop to see where it ends up.

    We can optimize it by starting at its most significant bit. This shaves
    32% off the time, for a total of 67.6% shaved off of the original, naive
    implementation.
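
    One way to picture starting at the most significant bit, as a
    user-space sketch rather than the kernel code: every power of two up to
    the highest set bit of the starting value is certain to be skipped, so
    those loop iterations can be replaced by one addition. fls_portable()
    and the code_bit_* names are hypothetical; the kernel has its own fls().

```c
/* 1-based index of the highest set bit of x; 0 when x is 0. */
static unsigned int fls_portable(unsigned int x)
{
    unsigned int n = 0;

    while (x) {
        n++;
        x >>= 1;
    }
    return n;
}

/* Plain loop: start from the lowest power of two every time. */
unsigned int code_bit_loop(unsigned int i)
{
    unsigned int b = i + 1, p;

    for (p = 0; (1u << p) < (b + 1); p++)
        b++;
    return b;
}

/* Start at the most significant bit and only loop over what is left. */
unsigned int code_bit_msb(unsigned int i)
{
    unsigned int b = i + 1;
    unsigned int p = fls_portable(b);

    b += p;     /* the first p powers of two are all below b */
    for (; (1u << p) < (b + 1); p++)
        b++;
    return b;
}

/* Returns 1 if both variants agree for data bits 0..n-1. */
int msb_matches_loop(unsigned int n)
{
    unsigned int i;

    for (i = 0; i < n; i++)
        if (code_bit_msb(i) != code_bit_loop(i))
            return 0;
    return 1;
}
```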

    Signed-off-by: Joel Becker
    Signed-off-by: Mark Fasheh

    Joel Becker
     
  • When I wrote ocfs2_hamming_encode(), I was following documentation of
    the algorithm and didn't have quite the (possibly still imperfect) grasp
    of it I do now. As part of this, I literally hand-coded xor. I would
    test a bit, and then add that bit via xor to the parity word.

    I can, of course, just do a single xor of the parity word and the source
    word (the code buffer bit offset). This cuts CPU usage by 53% on a
    mostly populated buffer (an inode containing utmp.h inline).
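
    The equivalence described above can be checked in isolation. Both
    function names are hypothetical; the first mirrors the "test a bit, then
    xor that bit in" approach, the second is the single xor.

```c
#include <stdint.h>

/* Hand-coded form: test each bit of src and fold it into parity. */
uint32_t xor_bit_by_bit(uint32_t parity, uint32_t src)
{
    unsigned int j;

    for (j = 0; j < 32; j++)
        if (src & ((uint32_t)1 << j))
            parity ^= ((uint32_t)1 << j);
    return parity;
}

/* The same result in one operation. */
uint32_t xor_direct(uint32_t parity, uint32_t src)
{
    return parity ^ src;
}
```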

    Joel

    Signed-off-by: Joel Becker
    Signed-off-by: Mark Fasheh

    Joel Becker
     
  • Add OCFS2_FEATURE_INCOMPAT_META_ECC to the list of supported features.

    Signed-off-by: Joel Becker
    Signed-off-by: Mark Fasheh

    Joel Becker
     
  • The superblock is read via a raw call. Validate it after we find it
    from its signature.

    Signed-off-by: Joel Becker
    Signed-off-by: Mark Fasheh

    Joel Becker
     
  • Use the db_check field of ocfs2_dir_block_trailer to crc/ecc the
    dirblocks.

    Signed-off-by: Joel Becker
    Signed-off-by: Mark Fasheh

    Joel Becker
     
  • Future ocfs2 features metaecc and indexed directories need to store a
    little bit of data in each dirblock. For compatibility, we place this
    in a trailer at the end of the dirblock. The trailer presents itself as an
    empty dirent, so that if the features are turned off, it can be reused
    without requiring a tunefs scan.

    This code adds the trailer and validates it when the block is read in.

    [ Mark is the original author, but I reinserted this code before his
    dir index work. -- Joel ]

    Signed-off-by: Joel Becker
    Signed-off-by: Mark Fasheh

    Mark Fasheh
     
  • Change the rest of the naked ocfs2_journal_access() calls in
    fs/ocfs2/xattr.c to use the appropriate ocfs2_journal_access_*() call
    for their metadata type.

    Signed-off-by: Joel Becker
    Signed-off-by: Mark Fasheh

    Joel Becker
     
  • ocfs2_remove_value_outside() needs to know the type of buffer it is
    looking at. Pass in an ocfs2_xattr_value_buf.

    Signed-off-by: Joel Becker
    Signed-off-by: Mark Fasheh

    Joel Becker
     
  • ocfs2_xattr_set_entry is the function that knows what type of block it
    is setting into. This is what we wanted from ocfs2_xattr_value_buf.
    Plus, moving the value buf up into ocfs2_xattr_set_entry() allows us to
    pass it into ocfs2_xattr_set_value_outside() and ocfs2_xattr_cleanup().

    Signed-off-by: Joel Becker
    Signed-off-by: Mark Fasheh

    Joel Becker