08 Aug, 2011

1 commit


07 Aug, 2011

12 commits

  • After commit 3567866bf261: "RCUify freeing acls, let check_acl() go ahead in
    RCU mode if acl is cached" posix_acl_permission is being called with an
    unsupported flag and the permission check fails. This patch fixes the issue.

    Signed-off-by: Ari Savolainen
    Signed-off-by: Al Viro

    Ari Savolainen
     
  • * 'for-linus' of git://git.open-osd.org/linux-open-osd:
    ore: Make ore its own module
    exofs: Rename raid engine from exofs/ios.c => ore
    exofs: ios: Move to a per inode components & device-table
    exofs: Move exofs specific osd operations out of ios.c
    exofs: Add offset/length to exofs_get_io_state
    exofs: Fix truncate for the raid-groups case
    exofs: Small cleanup of exofs_fill_super
    exofs: BUG: Avoid sbi realloc
    exofs: Remove pnfs-osd private definitions
    nfs_xdr: Move nfs4_string definition out of #ifdef CONFIG_NFS_V4

    Linus Torvalds
     
  • The inode structure layout is largely random, and some of the vfs paths
    really do care. The path lookup in particular is already quite D$
    intensive, and profiles show that accessing the 'inode->i_op->xyz'
    fields is quite costly.

    We already optimized the dcache to not unnecessarily load the d_op
    structure for members that are often NULL using the DCACHE_OP_xyz bits
    in dentry->d_flags, and this does something very similar for the inode
    ops that are used during pathname lookup.

    It also re-orders the fields so that the fields accessed by 'stat' are
    together at the beginning of the inode structure, and roughly in the
    order accessed.

    The effect of this seems to be in the 1-2% range for an empty kernel
    "make -j" run (which is fairly kernel-intensive, mostly in filename
    lookup), so it's visible. The numbers are fairly noisy, though, and
    likely depend a lot on exact microarchitecture. So there's more tuning
    to be done.

    Signed-off-by: Linus Torvalds

    Linus Torvalds
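
The idea in this commit message can be sketched in userspace C: record "is this operation implemented?" as a bit in the object's own flags word, so the hot path tests a field it already has in cache instead of dereferencing the ops table. This is only an illustration under assumptions; `struct node`, `NODE_HAS_LOOKUP`, `node_set_ops()` and the other names are hypothetical and are not the kernel's actual definitions.

```c
#include <stddef.h>

/* Hypothetical names throughout; this is not the kernel's struct inode. */
struct node;

struct node_ops {
    int (*lookup)(struct node *n);      /* may be NULL */
};

#define NODE_HAS_LOOKUP 0x0001u         /* cached: ops->lookup != NULL */

struct node {
    unsigned int flags;                 /* kept next to other hot fields */
    const struct node_ops *ops;         /* the table itself lives elsewhere */
};

/* Compute the cached bit once, when the ops table is attached. */
static void node_set_ops(struct node *n, const struct node_ops *ops)
{
    n->ops = ops;
    n->flags &= ~NODE_HAS_LOOKUP;
    if (ops && ops->lookup)
        n->flags |= NODE_HAS_LOOKUP;
}

/* Hot path: test the flag first and touch n->ops only when needed. */
static int node_lookup(struct node *n)
{
    if (!(n->flags & NODE_HAS_LOOKUP))
        return -1;                      /* no lookup method */
    return n->ops->lookup(n);
}

/* Trivial operation used only to exercise the sketch. */
static int sample_lookup(struct node *n)
{
    (void)n;
    return 42;
}
```

The cost of this design is that the flag must be recomputed whenever the ops pointer changes, which is why the sketch funnels every assignment through `node_set_ops()`.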
     
  • Gcc tends to generate better code with small integers, including the
    DCACHE_xyz flag tests - so move the common ones to be first in the list.
    Also just remove the unused DCACHE_INOTIFY_PARENT_WATCHED and
    DCACHE_AUTOFS_PENDING values, their users no longer exist in the source
    tree.

    And add an "unlikely()" to the DCACHE_OP_COMPARE test, since we want the
    common case to be a nice straight-line fall-through.

    Signed-off-by: Linus Torvalds

    Linus Torvalds
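
A rough sketch of the "unlikely()" pattern mentioned above, assuming a GCC- or Clang-compatible compiler that provides `__builtin_expect`. The flag name `ENTRY_OP_COMPARE`, its value, and `struct entry` are hypothetical stand-ins, not the real dcache definitions.

```c
#include <string.h>

/* Branch hints built on the GCC/Clang builtin (assumed to be available). */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

#define ENTRY_OP_COMPARE 0x0002u        /* illustrative value only */

struct entry {
    unsigned int flags;
    int (*compare)(const char *a, const char *b);   /* custom comparison */
};

/* Returns nonzero when the two names differ. */
static int names_differ(const struct entry *e, const char *a, const char *b)
{
    if (unlikely(e->flags & ENTRY_OP_COMPARE))
        return e->compare(a, b);        /* rare: custom comparison */
    return strcmp(a, b) != 0;           /* common: straight-line path */
}

/* Custom comparison used only to exercise the rare branch. */
static int always_same(const char *a, const char *b)
{
    (void)a;
    (void)b;
    return 0;
}
```

The hint changes only how the compiler lays out the branches; the result of `names_differ()` is the same with or without it.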
     
  • Export everything from ore that needs exporting. Change Kbuild and Kconfig
    to build ore.ko as an independent module. Import ore from exofs.

    Signed-off-by: Boaz Harrosh

    Boaz Harrosh
     
  • ORE stands for "Objects Raid Engine"

    This patch is a mechanical rename of everything that was in ios.c
    and its API declaration to an ore.c and an osd_ore.h header. The ore
    engine will later be used by the pnfs objects layout driver.

    * File ios.c => ore.c

    * Declaration of types and API are moved from exofs.h to a new
    osd_ore.h

    * All used types are renamed from their exofs_ prefix to an ore_ prefix.

    * Shift includes from exofs.h to osd_ore.h so osd_ore.h is
    independent, include it from exofs.h.

    Other than a pure rename there are no other changes. Next patch
    will move the ore into its own module and will export the API
    to be used by exofs and later the layout driver.

    Signed-off-by: Boaz Harrosh

    Boaz Harrosh
     
  • Exofs raid engine was saving on memory space by having a single layout-info,
    single pid, and a single device-table, global to the filesystem. Then passing
    a credential and object_id info at the io_state level, private for each
    inode. It would also devise this contraption of rotating the device table
    view for each inode->ino to spread out the device usage.

    This is not compatible with the pnfs-objects standard, demanding that
    each inode can have its own layout-info, device-table, and each object
    component its own pid, oid and creds.

    So: Bring exofs raid engine to be usable for generic pnfs-objects use by:

    * Define an exofs_comp structure that holds obj_id and credential info.

    * Break up exofs_layout struct to an exofs_components structure that holds a
    possible array of exofs_comp and the array of devices + the size of the
    arrays.

    * Add a "comps" parameter to get_io_state() that specifies the ids creds
    and device array to use for each IO.

    This makes it possible to keep the layout global, but the device-table
    view, creds and IDs at the inode level. It only adds two 64-bit values to
    each inode, since some of these members already existed in another form.

    * ios raid engine now accesses layout-info and comps-info through the passed
    pointers. Everything is pre-prepared by caller for generic access of
    these structures and arrays.

    At the exofs Level:

    * Super block holds an exofs_components struct that holds the device
    array, previously in layout. The devices there are in device-table
    order. The device-array is twice as big and repeats the device-table
    twice so now each inode's device array can point to a random device
    and have a round-robin view of the table, making it compatible to
    previous exofs versions.

    * Each inode has an exofs_components struct that is initialized at
    load time, with its own view of the device table IDs and creds.
    When doing IO this gets passed to the io_state together with the
    layout.

    While performing this change, bugs were found where credentials with the
    wrong IDs were used to access the different SB objects (super.c), as well
    as some dead code. It was never noticed because the target we use does not
    check the credentials.

    Signed-off-by: Boaz Harrosh

    Boaz Harrosh
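
The doubled device array described above can be illustrated with a small sketch: because the table is stored twice back to back, a pointer into the first half always has a full table's worth of entries after it, which gives each inode a rotated view without copying. The types and the `ino % n` starting offset are assumptions made for illustration; the message only says that the view is rotated per inode.

```c
#include <stddef.h>

/* Hypothetical device entry; the real structures are not shown here. */
struct dev {
    int id;
};

/* Fill a 2*n array with the n-entry device table repeated twice. */
static void build_doubled_table(const struct dev *table, size_t n,
                                struct dev *doubled)
{
    for (size_t i = 0; i < 2 * n; i++)
        doubled[i] = table[i % n];
}

/*
 * An inode's view is a window of n entries into the doubled array.
 * Starting at ino % n is an assumption used for this sketch.
 */
static const struct dev *inode_dev_view(const struct dev *doubled, size_t n,
                                        unsigned long ino)
{
    return &doubled[ino % n];
}
```

With a four-entry table holding ids 10 to 13, an inode whose offset works out to 2 sees the devices in the order 12, 13, 10, 11.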
     
  • ios.c will be moving to an external library, for use by the
    objects-layout-driver. Remove from it some exofs specific functions.

    Also g_attr_logical_length is used by both inode.c and ios.c;
    move its definition to the latter, to keep it independent.

    Signed-off-by: Boaz Harrosh

    Boaz Harrosh
     
  • In future raid code we will need to know the IO offset/length
    and if it's a read or write to determine some of the array
    sizes we'll need.

    So add a new exofs_get_rw_state() API for use when
    writing/reading. All other simple cases are left using the
    old way.

    The major change to this is that now we need to call
    exofs_get_io_state later at inode.c::read_exec and
    inode.c::write_exec when we actually know these things. So this
    patch is kept separate so I can test things apart from other
    changes.

    Signed-off-by: Boaz Harrosh

    Boaz Harrosh
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6:
    cifs: cope with negative dentries in cifs_get_root
    cifs: convert prefixpath delimiters in cifs_build_path_to_root
    CIFS: Fix missing a decrement of inFlight value
    cifs: demote DFS referral lookup errors to cFYI
    Revert "cifs: advertise the right receive buffer size to the server"

    Linus Torvalds
     
  • The CLOEXEC bit is magical, and for performance (and semantic) reasons we
    don't actually maintain it in the file descriptor itself, but in a
    separate bit array. Which means that when we show f_flags, the CLOEXEC
    status is shown incorrectly: we show the status not as it is now, but as
    it was when the file was opened.

    Fix that by looking up the bit properly in the 'fdt->close_on_exec' bit
    array.

    Uli needs this in order to re-implement the pfiles program:

    "For normal file descriptors (not sockets) this was the last piece of
    information which wasn't available. This is all part of my 'give
    Solaris users no reason to not switch' effort. I intend to offer the
    code to the util-linux-ng maintainers."

    Requested-by: Ulrich Drepper
    Signed-off-by: Linus Torvalds

    Linus Torvalds
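
A minimal userspace sketch of the situation described above: the close-on-exec state lives in a separate bitmap, so the flags recorded at open time can go stale, and an accurate report has to consult the bitmap instead. `struct fd_table`, `FLAG_CLOEXEC`, and the helper names are hypothetical and do not reflect the kernel's real data structures or flag values.

```c
#include <limits.h>     /* CHAR_BIT */
#include <stddef.h>

#define MAX_FDS      64
#define WORD_BITS    (sizeof(unsigned long) * CHAR_BIT)
#define FLAG_CLOEXEC 0x80000u           /* illustrative value only */

struct fd_table {
    unsigned int open_flags[MAX_FDS];   /* flags as recorded at open time */
    unsigned long close_on_exec[(MAX_FDS + WORD_BITS - 1) / WORD_BITS];
};

/* Set or clear the close-on-exec bit for fd in the separate bitmap. */
static void set_cloexec(struct fd_table *t, unsigned int fd, int on)
{
    unsigned long mask = 1UL << (fd % WORD_BITS);

    if (on)
        t->close_on_exec[fd / WORD_BITS] |= mask;
    else
        t->close_on_exec[fd / WORD_BITS] &= ~mask;
}

/* Report flags with the close-on-exec status taken from the bitmap. */
static unsigned int current_flags(const struct fd_table *t, unsigned int fd)
{
    unsigned int flags = t->open_flags[fd] & ~FLAG_CLOEXEC;

    if (t->close_on_exec[fd / WORD_BITS] & (1UL << (fd % WORD_BITS)))
        flags |= FLAG_CLOEXEC;
    return flags;
}
```

The stale open-time bit is masked out first, so the reported value reflects only the bitmap, whichever way the status changed after the file was opened.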
     
  • WARN_ONCE() is very annoying, in that it shows the stack trace that we
    don't care about at all, and also triggers various user-level "kernel
    oopsed" logic that we really don't care about. And it's not like the
    user can do anything about the applications (sshd) in question, it's a
    distro issue.

    Requested-by: Andi Kleen (and many others)
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

05 Aug, 2011

7 commits

  • The loop around lookup_one_len doesn't handle the case where it might
    return a negative dentry, which can cause an oops on the next pass
    through the loop. Check for that and break out of the loop with an
    error of -ENOENT if there is one.

    Fixes the panic reported here:

    https://bugzilla.redhat.com/show_bug.cgi?id=727927

    Reported-by: TR Bentley
    Reported-by: Iain Arnell
    Cc: Al Viro
    Cc: stable@kernel.org
    Signed-off-by: Jeff Layton
    Signed-off-by: Steve French

    Jeff Layton
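
The shape of this fix can be sketched outside the kernel: a component-by-component walk where the lookup helper may hand back an entry that has a name but no backing object (a "negative" entry), and the loop must stop with -ENOENT before using such an entry as the next directory. Everything here (`struct entry`, `lookup_child()`, `walk()`) is a hypothetical model and not the cifs code.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

struct entry {
    const char *name;
    int has_inode;              /* 0 means negative: name known, no object */
    struct entry *child;        /* a single child keeps the sketch short */
};

/* Stand-in lookup helper: returns the child if the name matches. */
static struct entry *lookup_child(struct entry *dir, const char *name)
{
    if (dir->child && strcmp(dir->child->name, name) == 0)
        return dir->child;
    return NULL;
}

/* Walk n components from root; on success store the final entry in *out. */
static int walk(struct entry *root, const char *const *names, size_t n,
                struct entry **out)
{
    struct entry *cur = root;

    for (size_t i = 0; i < n; i++) {
        struct entry *next = lookup_child(cur, names[i]);

        if (!next)
            return -ENOENT;
        if (!next->has_inode)   /* negative entry: do not descend into it */
            return -ENOENT;
        cur = next;
    }
    *out = cur;
    return 0;
}
```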
     
  • Regression from 2.6.39...

    The delimiters in the prefixpath are not being converted based on
    whether posix paths are in effect. Fixes:

    https://bugzilla.redhat.com/show_bug.cgi?id=727834

    Reported-and-Tested-by: Iain Arnell
    Reported-by: Patrick Oltmann
    Cc: Pavel Shilovsky
    Cc: stable@kernel.org
    Signed-off-by: Jeff Layton
    Signed-off-by: Steve French

    Jeff Layton
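
As an illustration of delimiter conversion of this kind, the sketch below rewrites a path in place so that it uses '/' when POSIX paths are in effect and '\\' otherwise. The function names and the `posix_paths` parameter are assumptions for this example and are not claimed to match the cifs helpers.

```c
/* Pick the directory separator for the current mode. */
static char dir_separator(int posix_paths)
{
    return posix_paths ? '/' : '\\';
}

/* Replace every occurrence of the other separator with delim, in place. */
static void convert_delimiters(char *path, char delim)
{
    char other = (delim == '/') ? '\\' : '/';

    for (char *p = path; *p; p++)
        if (*p == other)
            *p = delim;
}
```

For example, converting "/share/sub" with the non-POSIX separator yields "\share\sub", and converting that result back with the POSIX separator restores the original string.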
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
    RCUify freeing acls, let check_acl() go ahead in RCU mode if acl is cached
    get rid of boilerplate switches in posix_acl.h
    fix block device fallout from ->fsync() changes

    Linus Torvalds
     
  • In the general raid-group case the truncate was wrong in that
    it did not also fix the object length of the neighboring groups.

    There are two bad cases in the old code:
    1. Space that should be freed was not.
    2. If a file that was big is truncated small, then made bigger
    again, the holes would not contain zeros but could expose old data.
    (If the growing of the file expands to more than a full
    groups cycle + group size (> S + T))

    Signed-off-by: Boaz Harrosh

    Boaz Harrosh
     
  • Small cleanup that unifies duplicated code used in both the
    error and success cases

    Signed-off-by: Boaz Harrosh

    Boaz Harrosh
     
  • Since the beginning we realloced the sbi structure when a device table
    bigger than one was specified. (I know that was really stupid).

    Then much later when "register bdi" was added (By Jens) it was
    registering the pointer to sbi->bdi before the realloc.

    We never saw this problem because up till now the realloc did not
    do anything since the device table was small enough to fit in the
    original allocation. But once we started testing with large device
    tables (bigger than 28) we noticed the crash of writeback operating
    on a deallocated pointer.

    * Avoid the whole mess by allocating the device-table as a second array
    and get rid of the variable-sized structure and the rest of this
    mess.
    * Take the chance to clean nearby structures and comments.
    * Add a needed dprint on startup to indicate the loaded layout.
    * Also move the bdi registration to the very end because it will
    only fail in a low-memory situation, which will probably cause a
    failure beforehand. There are many more likely causes to not load
    before that. This way the error handling is made simpler. (Just
    doing this would be enough to fix the BUG)

    Signed-off-by: Boaz Harrosh

    Boaz Harrosh
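
The first bullet of the fix can be sketched as follows: when the device table is a separate allocation rather than a variable-sized tail of the main structure, resizing the table never moves the main structure, so pointers to its members that were handed out earlier stay valid. `struct sb_info`, its members, and `sb_set_dev_table()` are hypothetical names for this userspace illustration.

```c
#include <stdlib.h>

struct dev_entry {
    int id;
};

struct sb_info {
    int bdi;                    /* stand-in for a member whose address is
                                   registered elsewhere */
    struct dev_entry *devs;     /* separate allocation, not a trailing array */
    size_t numdevs;
};

/* Replace the device table; the sb_info itself is never reallocated. */
static int sb_set_dev_table(struct sb_info *sbi, size_t numdevs)
{
    struct dev_entry *devs = calloc(numdevs, sizeof(*devs));

    if (!devs)
        return -1;
    free(sbi->devs);            /* free(NULL) is a no-op */
    sbi->devs = devs;
    sbi->numdevs = numdevs;
    return 0;
}
```

With a trailing variable-sized array, growing the table would mean calling realloc() on the whole structure, which may move it and leave any previously registered member pointer dangling.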
     
  • Now that pnfs-osd has hit mainline we can remove exofs's
    private header. (And the FIXME comment)

    Signed-off-by: Boaz Harrosh

    Boaz Harrosh
     

04 Aug, 2011

6 commits


03 Aug, 2011

4 commits

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: (31 commits)
    Btrfs: don't call writepages from within write_full_page
    Btrfs: Remove unused variable 'last_index' in file.c
    Btrfs: clean up for find_first_extent_bit()
    Btrfs: clean up for wait_extent_bit()
    Btrfs: clean up for insert_state()
    Btrfs: remove unused members from struct extent_state
    Btrfs: clean up code for merging extent maps
    Btrfs: clean up code for extent_map lookup
    Btrfs: clean up search_extent_mapping()
    Btrfs: remove redundant code for dir item lookup
    Btrfs: make acl functions really no-op if acl is not enabled
    Btrfs: remove remaining ref-cache code
    Btrfs: remove a BUG_ON() in btrfs_commit_transaction()
    Btrfs: use wait_event()
    Btrfs: check the nodatasum flag when writing compressed files
    Btrfs: copy string correctly in INO_LOOKUP ioctl
    Btrfs: don't print the leaf if we had an error
    btrfs: make btrfs_set_root_node void
    Btrfs: fix oops while writing data to SSD partitions
    Btrfs: Protect the readonly flag of block group
    ...

    Fix up trivial conflicts (due to acl and writeback cleanups) in
    - fs/btrfs/acl.c
    - fs/btrfs/ctree.h
    - fs/btrfs/extent_io.c

    Linus Torvalds
     
  • Signed-off-by: Al Viro

    Al Viro
     
  • cifs: demote DFS referral lookup errors to cFYI

    Now that we call into this routine on every mount, anyone who doesn't
    have the upcall configured will get multiple printks about failed lookups.

    Reported-and-Tested-by: Martijn Uffing
    Signed-off-by: Jeff Layton
    Signed-off-by: Steve French

    Jeff Layton
     
  • This reverts commit c4d3396b261473ded6f370edd1e79ba34e089d7e.

    Problems discovered with readdir to Samba due to
    not accounting for header size properly with this change

    Steve French
     

02 Aug, 2011

10 commits

  • blkdev_fsync() needs to write pages in pagecache...

    Signed-off-by: Rafael J. Wysocki
    Signed-off-by: Al Viro

    Rafael J. Wysocki
     
  • * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (60 commits)
    ext4: prevent memory leaks from ext4_mb_init_backend() on error path
    ext4: use EXT4_BAD_INO for buddy cache to avoid colliding with valid inode #
    ext4: use ext4_msg() instead of printk in mballoc
    ext4: use ext4_kvzalloc()/ext4_kvmalloc() for s_group_desc and s_group_info
    ext4: introduce ext4_kvmalloc(), ext4_kzalloc(), and ext4_kvfree()
    ext4: use the correct error exit path in ext4_init_inode_table()
    ext4: add missing kfree() on error return path in add_new_gdb()
    ext4: change umode_t in tracepoint headers to be an explicit __u16
    ext4: fix races in ext4_sync_parent()
    ext4: Fix overflow caused by missing cast in ext4_fallocate()
    ext4: add action of moving index in ext4_ext_rm_idx for Punch Hole
    ext4: simplify parameters of reserve_backup_gdb()
    ext4: simplify parameters of add_new_gdb()
    ext4: remove lock_buffer in bclean() and setup_new_group_blocks()
    ext4: simplify journal handling in setup_new_group_blocks()
    ext4: let setup_new_group_blocks() set multiple bits at a time
    ext4: fix a typo in ext4_group_extend()
    ext4: let ext4_group_add_blocks() handle 0 blocks quickly
    ext4: let ext4_group_add_blocks() return an error code
    ext4: rename ext4_add_groupblocks() to ext4_group_add_blocks()
    ...

    Fix up conflict in fs/ext4/inode.c: commit aacfc19c626e ("fs: simplify
    the blockdev_direct_IO prototype") had changed the ext4_ind_direct_IO()
    function for the new simplified calling convention, while commit
    dae1e52cb126 ("ext4: move ext4_ind_* functions from inode.c to
    indirect.c") moved the function to another file.

    Linus Torvalds
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
    xfs: Fix build breakage in xfs_iops.c when CONFIG_FS_POSIX_ACL is not set
    VFS: Reorganise shrink_dcache_for_umount_subtree() after demise of dcache_lock
    VFS: Remove dentry->d_lock locking from shrink_dcache_for_umount_subtree()
    VFS: Remove detached-dentry counter from shrink_dcache_for_umount_subtree()
    switch posix_acl_chmod() to umode_t
    switch posix_acl_from_mode() to umode_t
    switch posix_acl_equiv_mode() to umode_t *
    switch posix_acl_create() to umode_t *
    block: initialise bd_super in bdget()
    vfs: avoid call to inode_lru_list_del() if possible
    vfs: avoid taking inode_hash_lock on pipes and sockets
    vfs: conditionally call inode_wb_list_del()
    VFS: Fix automount for negative autofs dentries
    Btrfs: load the key from the dir item in readdir into a fake dentry
    devtmpfs: missing initialialization in never-hit case
    hppfs: missing include

    Linus Torvalds
     
  • * 'pstore-efi' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6:
    efivars: Introduce PSTORE_EFI_ATTRIBUTES
    efivars: Use string functions in pstore_write
    efivars: introduce utf16_strncmp
    efivars: String functions
    efi: Add support for using efivars as a pstore backend
    pstore: Allow the user to explicitly choose a backend
    pstore: Make "part" unsigned
    pstore: Add extra context for writes and erases
    pstore: Extend API for more flexibility in new backends

    Linus Torvalds
     
  • In ext4_mb_init(), if the s_locality_group allocation fails it will
    currently cause the allocations made in ext4_mb_init_backend() to
    be leaked. Moving the ext4_mb_init_backend() allocation after the
    s_locality_group allocation avoids that problem.

    Signed-off-by: Yu Jian
    Signed-off-by: Andreas Dilger
    Signed-off-by: "Theodore Ts'o"

    Yu Jian
     
  • Signed-off-by: Yu Jian
    Signed-off-by: Andreas Dilger
    Signed-off-by: "Theodore Ts'o"

    Yu Jian
     
  • Signed-off-by: "Theodore Ts'o"

    Theodore Ts'o
     
  • When doing a writepage we call writepages to try and write out any other dirty
    pages in the area. This could cause problems where we commit a transaction and
    then have somebody else dirtying metadata in the area as we could end up writing
    out a lot more than we care about, which could cause latency on anybody who is
    waiting for the transaction to completely finish committing. Thanks,

    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Josef Bacik
     
  • The variable 'last_index' is calculated in the __btrfs_buffered_write
    function and passed as a parameter to the prepare_pages function,
    but is not used anywhere in the prepare_pages function.

    Remove instances of 'last_index' in these functions.

    Signed-off-by: Mitch Harder
    Signed-off-by: Chris Mason

    Mitch Harder
     
  • find_first_extent_bit() and find_first_extent_bit_state() share
    most of the code, and we can just make the former call the latter.

    Signed-off-by: Xiao Guangrong
    Signed-off-by: Li Zefan
    Signed-off-by: Chris Mason

    Xiao Guangrong