04 Jun, 2011

1 commit

  • Caching "we have already removed suid/caps" was overenthusiastic as merged.
    On network filesystems we might have had suid/caps set on another client,
    silently picked up by this client on revalidate, all of that *without* clearing
    the S_NOSEC flag.

    AFAICS, the only reasonably sane way to deal with that is
    * new superblock flag; unless set, S_NOSEC is not going to be set.
    * local block filesystems set it in their ->mount() (more accurately,
    mount_bdev() does, as does btrfs ->mount(); users of mount_bdev() other than
    local block ones clear it)
    * if any network filesystem (or a cluster one) wants to use S_NOSEC,
    it'll need to set MS_NOSEC in sb->s_flags *AND* take care to clear S_NOSEC when
    inode attribute changes are picked from other clients.

    It's not an earth-shattering hole (anybody that can set suid on another client
    will almost certainly be able to write to the file before doing that anyway),
    but it's a bug that needs fixing.

    Signed-off-by: Al Viro

    Al Viro
     

28 May, 2011

1 commit


27 May, 2011

1 commit

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (25 commits)
    cifs: remove unnecessary dentry_unhash on rmdir/rename_dir
    ocfs2: remove unnecessary dentry_unhash on rmdir/rename_dir
    exofs: remove unnecessary dentry_unhash on rmdir/rename_dir
    nfs: remove unnecessary dentry_unhash on rmdir/rename_dir
    ext2: remove unnecessary dentry_unhash on rmdir/rename_dir
    ext3: remove unnecessary dentry_unhash on rmdir/rename_dir
    ext4: remove unnecessary dentry_unhash on rmdir/rename_dir
    btrfs: remove unnecessary dentry_unhash in rmdir/rename_dir
    ceph: remove unnecessary dentry_unhash calls
    vfs: clean up vfs_rename_other
    vfs: clean up vfs_rename_dir
    vfs: clean up vfs_rmdir
    vfs: fix vfs_rename_dir for FS_RENAME_DOES_D_MOVE filesystems
    libfs: drop unneeded dentry_unhash
    vfs: update dentry_unhash() comment
    vfs: push dentry_unhash on rename_dir into file systems
    vfs: push dentry_unhash on rmdir into file systems
    vfs: remove dget() from dentry_unhash()
    vfs: dentry_unhash immediately prior to rmdir
    vfs: Block mmapped writes while the fs is frozen
    ...

    Linus Torvalds
     

26 May, 2011

2 commits


10 May, 2011

1 commit


31 Mar, 2011

1 commit


25 Mar, 2011

1 commit

  • * 'for-2.6.39/core' of git://git.kernel.dk/linux-2.6-block: (65 commits)
    Documentation/iostats.txt: bit-size reference etc.
    cfq-iosched: removing unnecessary think time checking
    cfq-iosched: Don't clear queue stats when preempt.
    blk-throttle: Reset group slice when limits are changed
    blk-cgroup: Only give unaccounted_time under debug
    cfq-iosched: Don't set active queue in preempt
    block: fix non-atomic access to genhd inflight structures
    block: attempt to merge with existing requests on plug flush
    block: NULL dereference on error path in __blkdev_get()
    cfq-iosched: Don't update group weights when on service tree
    fs: assign sb->s_bdi to default_backing_dev_info if the bdi is going away
    block: Require subsystems to explicitly allocate bio_set integrity mempool
    jbd2: finish conversion from WRITE_SYNC_PLUG to WRITE_SYNC and explicit plugging
    jbd: finish conversion from WRITE_SYNC_PLUG to WRITE_SYNC and explicit plugging
    fs: make fsync_buffers_list() plug
    mm: make generic_writepages() use plugging
    blk-cgroup: Add unaccounted time to timeslice_used.
    block: fixup plugging stubs for !CONFIG_BLOCK
    block: remove obsolete comments for blkdev_issue_zeroout.
    blktrace: Use rq->cmd_flags directly in blk_add_trace_rq.
    ...

    Fix up conflicts in fs/{aio.c,super.c}

    Linus Torvalds
     

23 Mar, 2011

2 commits

  • This function, replace_page_cache_page(), basically does:

    remove_from_page_cache(old);
    page_cache_release(old);
    add_to_page_cache_locked(new);

    Except it does this atomically, so there's no possibility for the "add" to
    fail because of a race.

    If memory cgroups are enabled, then the memory cgroup charge is also moved
    from the old page to the new.

    This function is currently used by fuse to move pages into the page cache
    on read, instead of copying the page contents.

    [minchan.kim@gmail.com: add freepage() hook to replace_page_cache_page()]
    Signed-off-by: Miklos Szeredi
    Acked-by: Rik van Riel
    Acked-by: KAMEZAWA Hiroyuki
    Cc: Mel Gorman
    Signed-off-by: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Miklos Szeredi
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
    fuse: make fuse_dentry_revalidate() RCU aware
    fuse: make fuse_permission() RCU aware
    fuse: wakeup pollers on connection release/abort
    fuse: reduce size of struct fuse_request

    Linus Torvalds
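The "remove old, add new" sequence that replace_page_cache_page() performs atomically can be sketched with a toy cache in userspace. Both halves happen under one lock, so there is no window in which another thread could claim the slot and make the add fail; a toy charge field is moved across to stand in for the memory cgroup charge. toy_cache, toy_page and toy_replace_page() are hypothetical, and a C11 spinlock stands in for the kernel's locking.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Toy stand-ins, not the kernel's struct page or address_space. */
struct toy_page {
    int refcount;
    long charge; /* stands in for the memory cgroup charge */
};

struct toy_cache {
    atomic_flag lock;
    struct toy_page *slots[16]; /* index -> cached page */
};

static void toy_lock(struct toy_cache *c)
{
    while (atomic_flag_test_and_set(&c->lock))
        ; /* spin */
}

static void toy_unlock(struct toy_cache *c)
{
    atomic_flag_clear(&c->lock);
}

/* Swap new_page into the slot holding old_page inside one critical
 * section, so nothing can slip in between "remove" and "add".
 * The caller guarantees old_page is cached at index (in the kernel,
 * the old page is expected to be locked by the caller). */
static void toy_replace_page(struct toy_cache *c, size_t index,
                             struct toy_page *old_page,
                             struct toy_page *new_page)
{
    assert(index < sizeof c->slots / sizeof c->slots[0]);
    toy_lock(c);
    assert(c->slots[index] == old_page);
    c->slots[index] = new_page;
    new_page->refcount++;                /* cache now references new_page */
    new_page->charge = old_page->charge; /* move the charge across */
    old_page->charge = 0;
    toy_unlock(c);
    old_page->refcount--;                /* drop the cache's old reference */
}
```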
     

21 Mar, 2011

4 commits


19 Mar, 2011

1 commit

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (47 commits)
    doc: CONFIG_UNEVICTABLE_LRU doesn't exist anymore
    Update cpuset info & webiste for cgroups
    dcdbas: force SMI to happen when expected
    arch/arm/Kconfig: remove one to many l's in the word.
    asm-generic/user.h: Fix spelling in comment
    drm: fix printk typo 'sracth'
    Remove one to many n's in a word
    Documentation/filesystems/romfs.txt: fixing link to genromfs
    drivers:scsi Change printk typo initate -> initiate
    serial, pch uart: Remove duplicate inclusion of linux/pci.h header
    fs/eventpoll.c: fix spelling
    mm: Fix out-of-date comments which refers non-existent functions
    drm: Fix printk typo 'failled'
    coh901318.c: Change initate to initiate.
    mbox-db5500.c Change initate to initiate.
    edac: correct i82975x error-info reported
    edac: correct i82975x mci initialisation
    edac: correct commented info
    fs: update comments to point correct document
    target: remove duplicate include of target/target_core_device.h from drivers/target/target_core_hba.c
    ...

    Trivial conflict in fs/eventpoll.c (spelling vs addition)

    Linus Torvalds
     

14 Mar, 2011

1 commit

  • The exportfs encode handle function should return the minimum required
    handle size. This lets the user find out the handle size by passing a
    handle size of 0 in the first call and then repeating the call with
    the returned handle size value.

    Acked-by: Serge Hallyn
    Signed-off-by: Aneesh Kumar K.V
    Signed-off-by: Al Viro

    Aneesh Kumar K.V
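The two-step size query can be sketched as follows, assuming a hypothetical encoder toy_encode_handle() that reports the required size in *len and fails with -EOVERFLOW when the buffer is too small. From userspace, name_to_handle_at(2) follows the same pattern: with handle_bytes set too small it fails with EOVERFLOW and stores the required size in handle_bytes.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical encoder: copies fid into buf. If *len is too small it
 * stores the minimum required size in *len and fails, so a caller can
 * probe with *len == 0 and retry with the returned size. */
static int toy_encode_handle(const unsigned char *fid, size_t fid_len,
                             unsigned char *buf, size_t *len)
{
    assert(len != NULL);
    if (*len < fid_len) {
        *len = fid_len;        /* report the minimum required size */
        return -EOVERFLOW;
    }
    if (fid_len > 0)
        memcpy(buf, fid, fid_len);
    *len = fid_len;
    return 0;
}
```

A caller first passes a length of 0 (and no buffer), allocates the size that comes back, then repeats the call.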
     

10 Mar, 2011

3 commits


25 Feb, 2011

2 commits

  • Commit e1181ee6 "vfs: pass struct file to do_truncate on O_TRUNC
    opens" broke the behavior of open(O_TRUNC|O_RDONLY) in fuse. Fuse
    assumed that when called from open, a truncate() will be done, not an
    ftruncate().

    Fix by restoring the old behavior, based on the ATTR_OPEN flag.

    Signed-off-by: Miklos Szeredi

    Miklos Szeredi
     
  • Single-threaded NTFS-3G could get stuck if a delayed RELEASE reply
    triggered a DESTROY request via path_put().

    Fix this by

    a) making RELEASE requests synchronous, whenever possible, on fuseblk
    filesystems

    b) if not possible (triggered by an asynchronous read/write) then do
    the path_put() in a separate thread with schedule_work().

    Reported-by: Oliver Neukum
    Cc: stable@kernel.org
    Signed-off-by: Miklos Szeredi

    Miklos Szeredi
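The shape of the RELEASE fix (drop the reference synchronously when the caller may block, otherwise hand the final put to a worker, as schedule_work() does) can be modelled in a few lines. This is a single-threaded toy: toy_release, toy_finish_release() and toy_run_worker() are hypothetical, and the deferred queue is drained explicitly rather than by a kernel worker thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model: the final reference drop may trigger a blocking DESTROY,
 * so it must not run in a context that cannot block. */
struct toy_release {
    int refs;
    bool destroyed;
};

#define TOY_MAX_DEFERRED 8
static struct toy_release *toy_deferred[TOY_MAX_DEFERRED];
static int toy_ndeferred;

static void toy_put(struct toy_release *r)
{
    if (--r->refs == 0)
        r->destroyed = true; /* stands in for DESTROY via path_put() */
}

/* Synchronous when the caller may block; otherwise queue the put for
 * a worker instead of doing it in the reply-handling path. */
static void toy_finish_release(struct toy_release *r, bool caller_can_block)
{
    if (caller_can_block) {
        toy_put(r);
        return;
    }
    assert(toy_ndeferred < TOY_MAX_DEFERRED);
    toy_deferred[toy_ndeferred++] = r;
}

/* What the worker would run later, in its own context. */
static void toy_run_worker(void)
{
    while (toy_ndeferred > 0)
        toy_put(toy_deferred[--toy_ndeferred]);
}
```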
     

15 Feb, 2011

1 commit


13 Jan, 2011

1 commit


10 Jan, 2011

1 commit


07 Jan, 2011

4 commits

  • Signed-off-by: Nick Piggin

    Nick Piggin
     
  • Require filesystems be aware of .d_revalidate being called in rcu-walk
    mode (nd->flags & LOOKUP_RCU). For now do a simple push down, returning
    -ECHILD from all implementations.

    Signed-off-by: Nick Piggin

    Nick Piggin
     
  • Reduce some branches and memory accesses in dcache lookup by adding dentry
    flags to indicate common d_ops are set, rather than having to check them.
    This saves a pointer memory access (dentry->d_op) in common path lookup
    situations, and saves another pointer load and branch in cases where we
    have d_op but not the particular operation.

    Patched with:

    git grep -E '[.>]([[:space:]])*d_op([[:space:]])*=' | xargs sed -e 's/\([^\t ]*\)->d_op = \(.*\);/d_set_d_op(\1, \2);/' -e 's/\([^\t ]*\)\.d_op = \(.*\);/d_set_d_op(\&\1, \2);/' -i

    Signed-off-by: Nick Piggin

    Nick Piggin
     
  • RCU free the struct inode. This will allow:

    - Subsequent store-free path walking patch. The inode must be consulted for
    permissions when walking, so an RCU inode reference is a must.
    - sb_inode_list_lock to be moved inside i_lock because sb list walkers who want
    to take i_lock no longer need to take sb_inode_list_lock to walk the list in
    the first place. This will simplify and optimize locking.
    - Could remove some nested trylock loops in dcache code
    - Could potentially simplify things a bit in VM land. Do not need to take the
    page lock to follow page->mapping.

    The downside of this is the performance cost of using RCU. In a simple
    creat/unlink microbenchmark, performance drops by about 10% due to inability to
    reuse cache-hot slab objects. As iterations increase and RCU freeing starts
    kicking over, this increases to about 20%.

    In cases where inode lifetimes are longer (i.e. many inodes may be allocated
    during the average life span of a single inode), a lot of this cache reuse is
    not applicable, so the regression caused by this patch is smaller.

    The cache-hot regression could largely be avoided by using SLAB_DESTROY_BY_RCU,
    however this adds some complexity to list walking and store-free path walking,
    so I prefer to implement this at a later date, if it is shown to be a win in
    real situations. I haven't found a regression in any non-micro benchmark so I
    doubt it will be a problem.

    Signed-off-by: Nick Piggin

    Nick Piggin
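The dentry-flag optimisation in the 07 Jan commits amounts to computing, once when the ops table is set, which hooks are present and caching that in the flags word, so the lookup path tests a bit instead of loading dentry->d_op. A sketch with hypothetical types and bit values (toy_dentry, TOY_OP_*), not the kernel's own definitions:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag bits mirroring the idea, not the kernel's values. */
#define TOY_OP_HASH       (1u << 0)
#define TOY_OP_COMPARE    (1u << 1)
#define TOY_OP_REVALIDATE (1u << 2)

struct toy_dentry;

struct toy_dentry_ops {
    int (*d_hash)(struct toy_dentry *);
    int (*d_compare)(struct toy_dentry *);
    int (*d_revalidate)(struct toy_dentry *);
};

struct toy_dentry {
    unsigned d_flags;
    const struct toy_dentry_ops *d_op;
};

/* Example hook: reports the dentry as no longer valid. */
static int toy_always_invalid(struct toy_dentry *d)
{
    (void)d;
    return 0;
}

/* Set the ops table once and record which hooks exist in d_flags. */
static void toy_set_d_op(struct toy_dentry *d, const struct toy_dentry_ops *op)
{
    assert(d->d_op == NULL); /* ops are set once per dentry */
    d->d_op = op;
    d->d_flags &= ~(TOY_OP_HASH | TOY_OP_COMPARE | TOY_OP_REVALIDATE);
    if (op == NULL)
        return;
    if (op->d_hash)
        d->d_flags |= TOY_OP_HASH;
    if (op->d_compare)
        d->d_flags |= TOY_OP_COMPARE;
    if (op->d_revalidate)
        d->d_flags |= TOY_OP_REVALIDATE;
}

/* Hot path: a single flag test; d_op is only loaded if the hook exists. */
static int toy_maybe_revalidate(struct toy_dentry *d)
{
    if (!(d->d_flags & TOY_OP_REVALIDATE))
        return 1; /* no hook: treat as valid */
    return d->d_op->d_revalidate(d);
}
```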
     

08 Dec, 2010

4 commits

  • In kernel ABI version 7.16 and later, a FUSE_IOCTL_RETRY reply to an
    unrestricted IOCTL request shall return an array of 'struct
    fuse_ioctl_iovec' instead of 'struct iovec'. This fixes the 32bit
    vs. 64bit ABI ambiguity.

    Reported-by: "ccmail111"
    Signed-off-by: Miklos Szeredi
    CC: Tejun Heo

    Miklos Szeredi
     
  • Terje Malmedal reports that a fuse filesystem with 32 million inodes
    on a machine with lots of memory can take up to 30 minutes to process
    FORGET requests when all those inodes are evicted from the icache.

    To solve this, create a BATCH_FORGET request that allows up to about
    8000 FORGET requests to be sent in a single message.

    This request is only sent if userspace supports interface version 7.16
    or later, otherwise fall back to sending individual FORGET messages.

    Reported-by: Terje Malmedal
    Signed-off-by: Miklos Szeredi

    Miklos Szeredi
     
  • Terje Malmedal reports that a fuse filesystem with 32 million inodes
    on a machine with lots of memory can go unresponsive for up to 30
    minutes when all those inodes are evicted from the icache.

    The reason is that FORGET messages, sent when the inode is evicted,
    are queued up together with regular filesystem requests, and while the
    huge queue of FORGET messages is processed, no other filesystem
    operation can proceed.

    Since a full fuse request structure is allocated for each inode, these
    take up quite a bit of memory as well.

    To solve these issues, create a slim 'fuse_forget_link' structure
    containing just the minimum of information required to send the FORGET
    request and chain these on a separate queue.

    When userspace asks for a request, make sure that FORGET and
    non-FORGET requests are selected fairly: for every 8 non-FORGET requests,
    allow 16 FORGET requests. This will make sure FORGETs do not pile up, yet
    other requests are also allowed to proceed while the queued FORGETs
    are processed.

    Reported-by: Terje Malmedal
    Signed-off-by: Miklos Szeredi

    Miklos Szeredi
     
  • Get rid of unnecessary page_address()-es.

    Signed-off-by: Miklos Szeredi
    CC: Tejun Heo

    Miklos Szeredi
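The 16:8 FORGET fairness rule from the 08 Dec commits can be sketched as a picker that alternates bursts while both queues have work and simply drains whichever queue remains otherwise. The burst sizes come from the text; the state machine, toy_queues and toy_pick() are a hypothetical illustration, not the fuse device code.

```c
#include <assert.h>
#include <stddef.h>

enum toy_kind { TOY_NONE, TOY_FORGET, TOY_OTHER };

#define TOY_FORGET_BURST 16 /* FORGETs per round (from the text) */
#define TOY_OTHER_BURST   8 /* other requests per round (from the text) */

struct toy_queues {
    size_t forgets;      /* queued FORGET requests */
    size_t others;       /* queued regular requests */
    enum toy_kind phase; /* which queue the current burst serves */
    int left;            /* picks remaining in the current burst */
};

static void toy_init(struct toy_queues *q, size_t forgets, size_t others)
{
    assert(q != NULL);
    q->forgets = forgets;
    q->others = others;
    q->phase = TOY_OTHER; /* first pick switches to a FORGET burst */
    q->left = 0;
}

/* Return the kind of request handed to userspace next. */
static enum toy_kind toy_pick(struct toy_queues *q)
{
    if (q->forgets == 0 && q->others == 0)
        return TOY_NONE;
    if (q->forgets == 0) { /* only one queue has work: drain it */
        q->others--;
        return TOY_OTHER;
    }
    if (q->others == 0) {
        q->forgets--;
        return TOY_FORGET;
    }
    if (q->left == 0) { /* burst used up: switch queues */
        q->phase = (q->phase == TOY_FORGET) ? TOY_OTHER : TOY_FORGET;
        q->left = (q->phase == TOY_FORGET) ? TOY_FORGET_BURST
                                           : TOY_OTHER_BURST;
    }
    q->left--;
    if (q->phase == TOY_FORGET) {
        q->forgets--;
        return TOY_FORGET;
    }
    q->others--;
    return TOY_OTHER;
}
```

With both queues busy, the first 16 picks are FORGETs, the next 8 are regular requests, and the cycle repeats.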
     

30 Nov, 2010

2 commits

  • Verify that the total length of the iovec returned in FUSE_IOCTL_RETRY
    doesn't overflow iov_length().

    Signed-off-by: Miklos Szeredi
    CC: Tejun Heo
    CC: [2.6.31+]

    Miklos Szeredi
     
  • If a 32bit CUSE server is run on a 64bit kernel, EIO is
    returned to the caller.

    The reason is that FUSE_IOCTL_RETRY reply was defined to use 'struct
    iovec', which is different on 32bit and 64bit archs.

    Work around this by looking at the size of the reply to determine
    which struct was used. This is only needed if CONFIG_COMPAT is
    defined.

    A more permanent fix for the interface will be to use the same struct
    on both 32bit and 64bit.

    Reported-by: "ccmail111"
    Signed-off-by: Miklos Szeredi
    CC: Tejun Heo
    CC: [2.6.31+]

    Miklos Szeredi
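One overflow-safe way to check that a list of iovec lengths fits a limit is to subtract each length from a remaining budget rather than summing them, so no intermediate value can wrap. toy_verify_iov() is hypothetical, and the limit passed as max is an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/uio.h>

/* Return 0 if the iovec lengths add up to at most max, -1 otherwise.
 * Subtracting from the remaining budget means the total is never
 * computed, so it cannot overflow size_t. */
static int toy_verify_iov(const struct iovec *iov, size_t count, size_t max)
{
    assert(iov != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (iov[i].iov_len > max)
            return -1;
        max -= iov[i].iov_len;
    }
    return 0;
}
```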
     

25 Nov, 2010

1 commit

  • The attribute cache for a file was not being cleared when a file is opened
    with O_TRUNC.

    If the filesystem's open operation truncates the file ("atomic_o_trunc"
    feature flag is set) then the kernel should invalidate the cached st_mtime
    and st_ctime attributes.

    Also, i_size should explicitly be set to zero, as it is sometimes used
    without refreshing the cache.

    Signed-off-by: Ken Sumrall
    Cc: Anfei
    Cc: "Anand V. Avati"
    Signed-off-by: Miklos Szeredi
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ken Sumrall
     

29 Oct, 2010

3 commits


28 Oct, 2010

1 commit

  • Replace iterated page_cache_release() with release_pages(), which is
    faster and shorter.

    Needs release_pages() to be exported to modules.

    Suggested-by: Andrew Morton
    Signed-off-by: Miklos Szeredi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Miklos Szeredi
     

27 Oct, 2010

1 commit

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (52 commits)
    split invalidate_inodes()
    fs: skip I_FREEING inodes in writeback_sb_inodes
    fs: fold invalidate_list into invalidate_inodes
    fs: do not drop inode_lock in dispose_list
    fs: inode split IO and LRU lists
    fs: switch bdev inode bdi's correctly
    fs: fix buffer invalidation in invalidate_list
    fsnotify: use dget_parent
    smbfs: use dget_parent
    exportfs: use dget_parent
    fs: use RCU read side protection in d_validate
    fs: clean up dentry lru modification
    fs: split __shrink_dcache_sb
    fs: improve DCACHE_REFERENCED usage
    fs: use percpu counter for nr_dentry and nr_dentry_unused
    fs: simplify __d_free
    fs: take dcache_lock inside __d_path
    fs: do not assign default i_ino in new_inode
    fs: introduce a per-cpu last_ino allocator
    new helper: ihold()
    ...

    Linus Torvalds