03 Dec, 2008

1 commit

  • * 'linux-next' of git://git.infradead.org/ubifs-2.6:
    UBIFS: pre-allocate bulk-read buffer
    UBIFS: do not allocate too much
    UBIFS: do not print scary memory allocation warnings
    UBIFS: allow for gaps when dirtying the LPT
    UBIFS: fix compilation warnings
    MAINTAINERS: change UBI/UBIFS git tree URLs
    UBIFS: endian handling fixes and annotations
    UBIFS: remove printk

    Linus Torvalds
     

02 Dec, 2008

7 commits

  • kernel-doc handles macros now (it has for quite some time), so change the
    ntfs_debug() macro's kernel-doc to be just before the macro instead of
    before a phony function prototype.

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Randy Dunlap
    Cc: Anton Altaparmakov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Randy Dunlap
     
  • It has been thought that the per-user file descriptors limit would also
    limit the resources that a normal user can request via the epoll
    interface. Vegard Nossum reported a very simple program (a modified
    version attached) that can make a normal user to request a pretty large
    amount of kernel memory, well within the its maximum number of fds. To
    solve such problem, default limits are now imposed, and /proc based
    configuration has been introduced. A new directory has been created,
    named /proc/sys/fs/epoll/ and inside there, there are two configuration
    points:

    max_user_instances = Maximum number of devices - per user

    max_user_watches = Maximum number of "watched" fds - per user

    The current default for "max_user_watches" limits the memory used by epoll
    to store "watches", to 1/32 of the amount of the low RAM. As example, a
    256MB 32bit machine, will have "max_user_watches" set to roughly 90000.
    That should be enough to not break existing heavy epoll users. The
    default value for "max_user_instances" is set to 128, that should be
    enough too.

    This also changes the userspace, because a new error code can now come out
    from EPOLL_CTL_ADD (-ENOSPC). The EMFILE from epoll_create() was already
    listed, so that should be ok.

    [akpm@linux-foundation.org: use get_current_user()]
    Signed-off-by: Davide Libenzi
    Cc: Michael Kerrisk
    Cc:
    Cc: Cyrill Gorcunov
    Reported-by: Vegard Nossum
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davide Libenzi
     
  • We're panicing in ocfs2_read_blocks_sync() if a jbd-managed buffer is seen.
    At first glance, this seems ok but in reality it can happen. My test case
    was to just run 'exorcist'. A struct inode is being pushed out of memory but
    is then re-read at a later time, before the buffer has been checkpointed by
    jbd. This causes a BUG to be hit in ocfs2_read_blocks_sync().

    Reviewed-by: Joel Becker
    Signed-off-by: Mark Fasheh

    Mark Fasheh
     
  • In init_dlmfs_fs(), if calling kmem_cache_create() failed, the code will use return value from
    calling bdi_init(). The correct behavior should be set status as -ENOMEM before going to "bail:".

    Signed-off-by: Coly Li
    Acked-by: Sunil Mushran
    Signed-off-by: Mark Fasheh

    Coly Li
     
  • In ocfs2_unlock_ast(), call wake_up() on lockres before releasing
    the spin lock on it. As soon as the spin lock is released, the
    lockres can be freed.

    Signed-off-by: David Teigland
    Signed-off-by: Mark Fasheh

    David Teigland
     
  • The locking_state dump, ocfs2_dlm_seq_show, reads the lvb on locks where it
    has not yet been initialized by a lock call.

    Signed-off-by: David Teigland
    Acked-by: Joel Becker
    Signed-off-by: Mark Fasheh

    David Teigland
     
  • This patch fixes two typos in comments of ocfs2.

    Signed-off-by: Coly Li
    Signed-off-by: Mark Fasheh

    Coly Li
     

01 Dec, 2008

1 commit


28 Nov, 2008

1 commit

  • udf_clear_inode() can leave behind buffers on mapping's i_private list (when
    we truncated preallocation). Call invalidate_inode_buffers() so that the list
    is properly cleaned-up before we return from udf_clear_inode(). This is ugly
    and suggest that we should cleanup preallocation earlier than in clear_inode()
    but currently there's no such call available since drop_inode() is called under
    inode lock and thus is unusable for disk operations.

    Signed-off-by: Jan Kara

    Jan Kara
     

27 Nov, 2008

1 commit

  • The conversion to write_begin/write_end interfaces had a bug where we
    were passing a bad parameter to cifs_readpage_worker. Rather than
    passing the page offset of the start of the write, we needed to pass the
    offset of the beginning of the page. This was reliably showing up as
    data corruption in the fsx-linux test from LTP.

    It also became evident that this code was occasionally doing unnecessary
    read calls. Optimize those away by using the PG_checked flag to indicate
    that the unwritten part of the page has been initialized.

    CC: Nick Piggin
    Acked-by: Dave Kleikamp
    Signed-off-by: Jeff Layton
    Signed-off-by: Steve French

    Jeff Layton
     

22 Nov, 2008

3 commits

  • To avoid memory allocation failure during bulk-read, pre-allocate
    a bulk-read buffer, so that if there is only one bulk-reader at
    a time, it would just use the pre-allocated buffer and would not
    do any memory allocation. However, if there are more than 1 bulk-
    reader, then only one reader would use the pre-allocated buffer,
    while the other reader would allocate the buffer for itself.

    Signed-off-by: Artem Bityutskiy

    Artem Bityutskiy
     
  • Bulk-read allocates 128KiB or more using kmalloc. The allocation
    starts failing often when the memory gets fragmented. UBIFS still
    works fine in this case, because it falls-back to standard
    (non-optimized) read method, though. This patch teaches bulk-read
    to allocate exactly the amount of memory it needs, instead of
    allocating 128KiB every time.

    This patch is also a preparation to the further fix where we'll
    have a pre-allocated bulk-read buffer as well. For example, now
    the @bu object is prepared in 'ubifs_bulk_read()', so we could
    path either pre-allocated or allocated information to
    'ubifs_do_bulk_read()' later. Or teaching 'ubifs_do_bulk_read()'
    not to allocate 'bu->buf' if it is already there.

    Signed-off-by: Artem Bityutskiy

    Artem Bityutskiy
     
  • Bulk-read allocates a lot of memory with 'kmalloc()', and when it
    is/gets fragmented 'kmalloc()' fails with a scarry warning. But
    because bulk-read is just an optimization, UBIFS keeps working fine.
    Supress the warning by passing __GFP_NOWARN option to 'kmalloc()'.

    This patch also introduces a macro for the magic 128KiB constant.
    This is just neater.

    Note, this is not really fixes the problem we had, but just hides
    the warnings. The further patches fix the problem.

    Signed-off-by: Artem Bityutskiy

    Artem Bityutskiy
     

21 Nov, 2008

2 commits


20 Nov, 2008

3 commits

  • fs/hostfs/hostfs_user.c defines do_readlink() as non-static, and so does
    fs/xfs/linux-2.6/xfs_ioctl.c when CONFIG_XFS_DEBUG=y. So rename
    do_readlink() in hostfs to hostfs_do_readlink().

    I think it's better if XFS guys will also rename their do_readlink(),
    it's not necessary to use such a general name.

    Signed-off-by: WANG Cong
    Cc: Jeff Dike
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    WANG Cong
     
  • Peter Cordes is sorry that he rm'ed his swapfiles while they were in use,
    he then had no pathname to swapoff. It's a curious little oversight, but
    not one worth a lot of hackery. Kudos to Willy Tarreau for turning this
    around from a discussion of synthetic pathnames to how to prevent unlink.
    Mimic immutable: prohibit unlinking an active swapfile in may_delete()
    (and don't worry my little head over the tiny race window).

    Signed-off-by: Hugh Dickins
    Cc: Willy Tarreau
    Acked-by: Christoph Hellwig
    Cc: Peter Cordes
    Cc: Bodo Eggert
    Cc: David Newall
    Cc: Peter Zijlstra
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • I have received some reports of out-of-memory errors on some older AMD
    architectures. These errors are what I would expect to see if
    crypt_stat->key were split between two separate pages. eCryptfs should
    not assume that any of the memory sent through virt_to_scatterlist() is
    all contained in a single page, and so this patch allocates two
    scatterlist structs instead of one when processing keys. I have received
    confirmation from one person affected by this bug that this patch resolves
    the issue for him, and so I am submitting it for inclusion in a future
    stable release.

    Note that virt_to_scatterlist() runs sg_init_table() on the scatterlist
    structs passed to it, so the calls to sg_init_table() in
    decrypt_passphrase_encrypted_session_key() are redundant.

    Signed-off-by: Michael Halcrow
    Reported-by: Paulo J. S. Silva
    Cc: "Leon Woestenberg"
    Cc: Tim Gardner
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michael Halcrow
     

19 Nov, 2008

1 commit


18 Nov, 2008

7 commits

  • Block ext devt conversion missed md_autodetect_dev() call in
    rescan_partitions() leaving md autodetect unable to see partitions.
    Fix it.

    Signed-off-by: Tejun Heo
    Cc: Neil Brown
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • Make add_partition() return pointer to the new hd_struct on success
    and ERR_PTR() value on failure. This change will be used to fix md
    autodetection bug.

    Signed-off-by: Tejun Heo
    Cc: Neil Brown
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • Partition stats structure was not freed on devt allocation failure
    path. Fix it.

    Signed-off-by: Tejun Heo
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6:
    prevent cifs_writepages() from skipping unwritten pages
    Fixed parsing of mount options when doing DFS submount
    [CIFS] Fix check for tcon seal setting and fix oops on failed mount from earlier patch
    [CIFS] Fix build break
    cifs: reinstate sharing of tree connections
    [CIFS] minor cleanup to cifs_mount
    cifs: reinstate sharing of SMB sessions sans races
    cifs: disable sharing session and tcon and add new TCP sharing code
    [CIFS] clean up server protocol handling
    [CIFS] remove unused list, add new cifs sock list to prepare for mount/umount fix
    [CIFS] Fix cifs reconnection flags
    [CIFS] Can't rely on iov length and base when kernel_recvmsg returns error

    Linus Torvalds
     
  • Fixes a data corruption under heavy stress in which pages could be left
    dirty after all open instances of a inode have been closed.

    In order to write contiguous pages whenever possible, cifs_writepages()
    asks pagevec_lookup_tag() for more pages than it may write at one time.
    Normally, it then resets index just past the last page written before calling
    pagevec_lookup_tag() again.

    If cifs_writepages() can't write the first page returned, it wasn't resetting
    index, and the next call to pagevec_lookup_tag() resulted in skipping all of
    the pages it previously returned, even though cifs_writepages() did nothing
    with them. This can result in data loss when the file descriptor is about
    to be closed.

    This patch ensures that index gets set back to the next returned page so
    that none get skipped.

    Signed-off-by: Dave Kleikamp
    Acked-by: Jeff Layton
    Cc: Shirish S Pargaonkar
    Signed-off-by: Steve French

    Dave Kleikamp
     
  • Since these hit the same routines, and are relatively small, it is easier to review
    them as one patch.

    Fixed incorrect handling of the last option in some cases
    Fixed prefixpath handling convert path_consumed into host depended string length (in bytes)
    Use non default separator if it is provided in the original mount options

    Acked-by: Jeff Layton
    Signed-off-by: Igor Mammedov
    Signed-off-by: Steve French

    Igor Mammedov
     
  • set tcon->ses earlier

    If the inital tree connect fails, we'll end up calling cifs_put_smb_ses
    with a NULL pointer. Fix it by setting the tcon->ses earlier.

    Acked-by: Jeff Layton
    Signed-off-by: Steve French

    Steve French
     

17 Nov, 2008

3 commits


16 Nov, 2008

1 commit

  • Inotify watch removals suck violently.

    To kick the watch out we need (in this order) inode->inotify_mutex and
    ih->mutex. That's fine if we have a hold on inode; however, for all
    other cases we need to make damn sure we don't race with umount. We can
    *NOT* just grab a reference to a watch - inotify_unmount_inodes() will
    happily sail past it and we'll end with reference to inode potentially
    outliving its superblock.

    Ideally we just want to grab an active reference to superblock if we
    can; that will make sure we won't go into inotify_umount_inodes() until
    we are done. Cleanup is just deactivate_super().

    However, that leaves a messy case - what if we *are* racing with
    umount() and active references to superblock can't be acquired anymore?
    We can bump ->s_count, grab ->s_umount, which will almost certainly wait
    until the superblock is shut down and the watch in question is pining
    for fjords. That's fine, but there is a problem - we might have hit the
    window between ->s_active getting to 0 / ->s_count - below S_BIAS (i.e.
    the moment when superblock is past the point of no return and is heading
    for shutdown) and the moment when deactivate_super() acquires
    ->s_umount.

    We could just do drop_super() yield() and retry, but that's rather
    antisocial and this stuff is luser-triggerable. OTOH, having grabbed
    ->s_umount and having found that we'd got there first (i.e. that
    ->s_root is non-NULL) we know that we won't race with
    inotify_umount_inodes().

    So we could grab a reference to watch and do the rest as above, just
    with drop_super() instead of deactivate_super(), right? Wrong. We had
    to drop ih->mutex before we could grab ->s_umount. So the watch
    could've been gone already.

    That still can be dealt with - we need to save watch->wd, do idr_find()
    and compare its result with our pointer. If they match, we either have
    the damn thing still alive or we'd lost not one but two races at once,
    the watch had been killed and a new one got created with the same ->wd
    at the same address. That couldn't have happened in inotify_destroy(),
    but inotify_rm_wd() could run into that. Still, "new one got created"
    is not a problem - we have every right to kill it or leave it alone,
    whatever's more convenient.

    So we can use idr_find(...) == watch && watch->inode->i_sb == sb as
    "grab it and kill it" check. If it's been our original watch, we are
    fine, if it's a newcomer - nevermind, just pretend that we'd won the
    race and kill the fscker anyway; we are safe since we know that its
    superblock won't be going away.

    And yes, this is far beyond mere "not very pretty"; so's the entire
    concept of inotify to start with.

    Signed-off-by: Al Viro
    Acked-by: Greg KH
    Signed-off-by: Linus Torvalds

    Al Viro
     

15 Nov, 2008

3 commits

  • Signed-off-by: Steve French

    Steve French
     
  • We do this by abandoning the global list of SMB sessions and instead
    moving to a per-server list. This entails adding a new list head to the
    TCP_Server_Info struct. The refcounting for the cifsSesInfo is moved to
    a non-atomic variable. We have to protect it by a lock anyway, so there's
    no benefit to making it an atomic. The list and refcount are protected
    by the global cifs_tcp_ses_lock.

    The patch also adds a new routines to find and put SMB sessions and
    that properly take and put references under the lock.

    Signed-off-by: Jeff Layton
    Signed-off-by: Steve French

    Jeff Layton
     
  • The code that allows these structs to be shared is extremely racy.
    Disable the sharing of SMB and tcon structs for now until we can
    come up with a way to do this that's race free.

    We want to continue to share TCP sessions, however since they are
    required for multiuser mounts. For that, implement a new (hopefully
    race-free) scheme. Add a new global list of TCP sessions, and take
    care to get a reference to it whenever we're dealing with one.

    Signed-off-by: Jeff Layton
    Signed-off-by: Steve French

    Jeff Layton
     

14 Nov, 2008

5 commits

  • We're currently declaring both a sockaddr_in and sockaddr6_in on the
    stack, but we really only need storage for one of them. Declare a
    sockaddr struct and cast it to the proper type. Also, eliminate the
    protocolType field in the TCP_Server_Info struct. It's redundant since
    we have a sa_family field in the sockaddr anyway.

    We may need to revisit this if SCTP is ever implemented, but for now
    this will simplify the code.

    CIFS over IPv6 also has a number of problems currently. This fixes all
    of them that I found. Eventually, it would be nice to move more of the
    code to be protocol independent, but this is a start.

    Signed-off-by: Jeff Layton
    Signed-off-by: Steve French

    Steve French
     
  • Also adds two lines missing from the previous patch (for the need reconnect flag in the
    /proc/fs/cifs/DebugData handling)

    The new global_cifs_sock_list is added, and initialized in init_cifs but not used yet.
    Jeff Layton will be adding code in to use that and to remove the GlobalTcon and GlobalSMBSession
    lists.

    CC: Jeff Layton
    CC: Shirish Pargaonkar
    Signed-off-by: Steve French

    Steve French
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/dlm:
    dlm: fix shutdown cleanup

    Linus Torvalds
     
  • In preparation for Jeff's big umount/mount fixes to remove the possibility of
    various races in cifs mount and linked list handling of sessions, sockets and
    tree connections, this patch cleans up some repetitive code in cifs_mount,
    and addresses a problem with ses->status and tcon->tidStatus in which we
    were overloading the "need_reconnect" state with other status in that
    field. So the "need_reconnect" flag has been broken out from those
    two state fields (need reconnect was not mutually exclusive from some of the
    other possible tid and ses states). In addition, a few exit cases in
    cifs_mount were cleaned up, and a problem with a tcon flag (for lease support)
    was not being set consistently for the 2nd mount of the same share

    CC: Jeff Layton
    CC: Shirish Pargaonkar
    Signed-off-by: Steve French

    Steve French
     
  • Fixes a regression from commit 0f8e0d9a317406612700426fad3efab0b7bbc467,
    "dlm: allow multiple lockspace creates".

    An extraneous 'else' slipped into a code fragment being moved from
    release_lockspace() to dlm_release_lockspace(). The result of the
    unwanted 'else' is that dlm threads and structures are not stopped
    and cleaned up when the final dlm lockspace is removed. Trying to
    create a new lockspace again afterward will fail with
    "kmem_cache_create: duplicate cache dlm_conn" because the cache
    was not previously destroyed.

    Signed-off-by: David Teigland

    David Teigland
     

13 Nov, 2008

1 commit