05 Dec, 2008

1 commit


04 Dec, 2008

2 commits


03 Dec, 2008

1 commit

  • * 'linux-next' of git://git.infradead.org/ubifs-2.6:
    UBIFS: pre-allocate bulk-read buffer
    UBIFS: do not allocate too much
    UBIFS: do not print scary memory allocation warnings
    UBIFS: allow for gaps when dirtying the LPT
    UBIFS: fix compilation warnings
    MAINTAINERS: change UBI/UBIFS git tree URLs
    UBIFS: endian handling fixes and annotations
    UBIFS: remove printk

    Linus Torvalds
     

02 Dec, 2008

7 commits

  • kernel-doc handles macros now (it has for quite some time), so change the
    ntfs_debug() macro's kernel-doc to be just before the macro instead of
    before a phony function prototype.

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Randy Dunlap
    Cc: Anton Altaparmakov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Randy Dunlap
     
  • It has been thought that the per-user file descriptors limit would also
    limit the resources that a normal user can request via the epoll
    interface. Vegard Nossum reported a very simple program (a modified
    version attached) that can make a normal user request a pretty large
    amount of kernel memory, well within its maximum number of fds. To
    solve this problem, default limits are now imposed, and /proc based
    configuration has been introduced. A new directory has been created,
    named /proc/sys/fs/epoll/ and inside there, there are two configuration
    points:

    max_user_instances = Maximum number of devices - per user

    max_user_watches = Maximum number of "watched" fds - per user

    The current default for "max_user_watches" limits the memory used by epoll
    to store "watches" to 1/32 of the amount of low RAM. As an example, a
    256MB 32bit machine will have "max_user_watches" set to roughly 90000.
    That should be enough to not break existing heavy epoll users. The
    default value for "max_user_instances" is set to 128, which should be
    enough too.

    This also changes the userspace, because a new error code can now come out
    from EPOLL_CTL_ADD (-ENOSPC). The EMFILE from epoll_create() was already
    listed, so that should be ok.

    [akpm@linux-foundation.org: use get_current_user()]
    Signed-off-by: Davide Libenzi
    Cc: Michael Kerrisk
    Cc:
    Cc: Cyrill Gorcunov
    Reported-by: Vegard Nossum
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davide Libenzi
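
The default quoted above for a 256MB 32bit machine can be reproduced with simple arithmetic. The following is only a sketch: the per-watch cost of 90 bytes is an assumption taken from the figure quoted in the epoll(7) manual page for 32-bit kernels, not from this commit, and the function and enum names are hypothetical.

```c
#include <assert.h>
#include <errno.h>

/* Assumption (not from the commit): approximate cost of one watch,
 * as quoted in epoll(7) for 32-bit kernels. */
#define ASSUMED_WATCH_BYTES_32BIT 90UL

/* Hypothetical names for the limit a failed call most likely ran into. */
enum epoll_limit {
    EPOLL_LIMIT_NONE,
    EPOLL_LIMIT_WATCHES,
    EPOLL_LIMIT_INSTANCES_OR_FDS
};

/* Estimate the default max_user_watches: 1/32 of low RAM divided by
 * the assumed per-watch cost. */
unsigned long estimate_max_user_watches(unsigned long lowmem_bytes,
                                        unsigned long watch_bytes)
{
    assert(watch_bytes > 0);
    return (lowmem_bytes / 32) / watch_bytes;
}

/* Map the errno values named in the commit message to a limit. */
enum epoll_limit epoll_limit_hit(int err)
{
    switch (err) {
    case ENOSPC:  /* new error from EPOLL_CTL_ADD */
        return EPOLL_LIMIT_WATCHES;
    case EMFILE:  /* epoll_create(): instance limit or ordinary fd limit */
        return EPOLL_LIMIT_INSTANCES_OR_FDS;
    default:
        return EPOLL_LIMIT_NONE;
    }
}
```

With 256MB of low RAM and the assumed 90 bytes per watch, the estimate comes out at 93206, which matches the "roughly 90000" given in the commit message.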
     
  • We're panicking in ocfs2_read_blocks_sync() if a jbd-managed buffer is seen.
    At first glance, this seems ok but in reality it can happen. My test case
    was to just run 'exorcist'. A struct inode is being pushed out of memory but
    is then re-read at a later time, before the buffer has been checkpointed by
    jbd. This causes a BUG to be hit in ocfs2_read_blocks_sync().

    Reviewed-by: Joel Becker
    Signed-off-by: Mark Fasheh

    Mark Fasheh
     
  • In init_dlmfs_fs(), if the call to kmem_cache_create() fails, the code
    returns the value from the earlier call to bdi_init(). The correct
    behavior is to set status to -ENOMEM before going to "bail:".

    Signed-off-by: Coly Li
    Acked-by: Sunil Mushran
    Signed-off-by: Mark Fasheh

    Coly Li
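
The error-path pattern described above can be sketched in user space. This is not the dlmfs code: fake_bdi_init() and fake_cache_create() are hypothetical stand-ins for bdi_init() and kmem_cache_create(), and malloc() stands in for the slab cache.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

static void *g_cache;  /* hypothetical stand-in for the slab cache pointer */

/* Hypothetical stand-in for bdi_init(): always succeeds here. */
int fake_bdi_init(void)
{
    return 0;
}

/* Hypothetical stand-in for kmem_cache_create(): can be told to fail. */
void *fake_cache_create(int should_fail)
{
    return should_fail ? NULL : malloc(1);
}

int init_fs_sketch(int cache_should_fail)
{
    int status;

    status = fake_bdi_init();
    if (status)
        return status;

    g_cache = fake_cache_create(cache_should_fail);
    if (!g_cache) {
        /* Without this assignment, status would still hold the 0
         * returned by fake_bdi_init() and the caller would see success. */
        status = -ENOMEM;
        goto bail;
    }

    /* ... further registration steps would go here ... */
    status = 0;
bail:
    return status;
}

void exit_fs_sketch(void)
{
    free(g_cache);
    g_cache = NULL;
}
```

Calling init_fs_sketch(1) exercises the failure path and returns -ENOMEM rather than the stale success value.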
     
  • In ocfs2_unlock_ast(), call wake_up() on lockres before releasing
    the spin lock on it. As soon as the spin lock is released, the
    lockres can be freed.

    Signed-off-by: David Teigland
    Signed-off-by: Mark Fasheh

    David Teigland
     
  • The locking_state dump, ocfs2_dlm_seq_show, reads the lvb on locks where it
    has not yet been initialized by a lock call.

    Signed-off-by: David Teigland
    Acked-by: Joel Becker
    Signed-off-by: Mark Fasheh

    David Teigland
     
  • This patch fixes two typos in comments of ocfs2.

    Signed-off-by: Coly Li
    Signed-off-by: Mark Fasheh

    Coly Li
     

01 Dec, 2008

1 commit


29 Nov, 2008

1 commit


28 Nov, 2008

1 commit

  • udf_clear_inode() can leave behind buffers on mapping's i_private list (when
    we truncated preallocation). Call invalidate_inode_buffers() so that the list
    is properly cleaned up before we return from udf_clear_inode(). This is ugly
    and suggests that we should clean up preallocation earlier than in clear_inode()
    but currently there's no such call available since drop_inode() is called under
    inode lock and thus is unusable for disk operations.

    Signed-off-by: Jan Kara

    Jan Kara
     

27 Nov, 2008

1 commit

  • The conversion to write_begin/write_end interfaces had a bug where we
    were passing a bad parameter to cifs_readpage_worker. Rather than
    passing the page offset of the start of the write, we needed to pass the
    offset of the beginning of the page. This was reliably showing up as
    data corruption in the fsx-linux test from LTP.

    It also became evident that this code was occasionally doing unnecessary
    read calls. Optimize those away by using the PG_checked flag to indicate
    that the unwritten part of the page has been initialized.

    CC: Nick Piggin
    Acked-by: Dave Kleikamp
    Signed-off-by: Jeff Layton
    Signed-off-by: Steve French

    Jeff Layton
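
The difference between the two offsets mentioned above is plain bit arithmetic. A minimal sketch, assuming a 4096-byte page; the helper names are hypothetical and not taken from the cifs code:

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4096-byte pages. */
#define SKETCH_PAGE_SIZE 4096ULL

/* Offset of the write position inside its page (what was passed by mistake). */
uint64_t offset_within_page(uint64_t pos)
{
    return pos & (SKETCH_PAGE_SIZE - 1);
}

/* File offset of the beginning of the page containing pos (what was needed). */
uint64_t page_start(uint64_t pos)
{
    return pos & ~(SKETCH_PAGE_SIZE - 1);
}
```

For a write starting at file position 10000, the page begins at 8192 while the offset inside the page is 1808, so reading the page from the wrong one of these two values fills it with data from the wrong place in the file.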
     

26 Nov, 2008

2 commits

  • Port to the new tracepoints API: split DEFINE_TRACE() and DECLARE_TRACE()
    sites. Spread them out to the usage sites, as suggested by
    Mathieu Desnoyers.

    Signed-off-by: Ingo Molnar
    Acked-by: Mathieu Desnoyers

    Ingo Molnar
     
  • This was a forward port of work done by Mathieu Desnoyers. I changed it to
    encode the 'what' parameter in the tracepoint name, so that one can register
    interest in specific events and not on classes of events to then check the
    'what' parameter.

    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: Jens Axboe
    Signed-off-by: Ingo Molnar

    Arnaldo Carvalho de Melo
     

25 Nov, 2008

3 commits

  • Since commit c98451bd, the loop in nlm_lookup_host() unconditionally
    compares the host's h_srcaddr field to the incoming source address.
    For client-side nlm_host entries, both are always AF_UNSPEC, so this
    check is unnecessary.

    Since commit 781b61a6, which added support for AF_INET6 addresses to
    nlm_cmp_addr(), nlm_cmp_addr() now returns FALSE for AF_UNSPEC
    addresses, which causes nlm_lookup_host() to create a fresh nlm_host
    entry every time it is called on the client.

    These extra entries will eventually expire once the server is
    unmounted, so the impact of this regression, introduced with lockd
    IPv6 support in 2.6.28, should be minor.

    We could fix this by adding an arm in nlm_cmp_addr() for AF_UNSPEC
    addresses, but really, nlm_lookup_host() shouldn't be matching on the
    srcaddr field for client-side nlm_host lookups.

    Signed-off-by: Chuck Lever
    Signed-off-by: J. Bruce Fields

    Chuck Lever
     
  • Thanks to Matthew Dodd for this bug report:

    A file label issue while running SELinux in MLS mode provoked the
    following bug, which is a result of use before init on a 'struct list_head'.

    In nfsd4_list_rec_dir() if the call to dentry_open() fails the 'goto
    out' skips INIT_LIST_HEAD() which results in the normally improbable
    case where list_entry() returns NULL.

    Trace follows.

    NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
    SELinux: Context unconfined_t:object_r:var_lib_nfs_t:s0 is not valid
    (left unmapped).
    type=1400 audit(1227298063.609:282): avc: denied { read } for
    pid=1890 comm="rpc.nfsd" name="v4recovery" dev=dm-0 ino=148726
    scontext=system_u:system_r:nfsd_t:s0-s15:c0.c1023
    tcontext=system_u:object_r:unlabeled_t:s15:c0.c1023 tclass=dir
    BUG: unable to handle kernel NULL pointer dereference at 00000004
    IP: [] list_del+0x6/0x60
    *pde = 0d9ce067 *pte = 00000000
    Oops: 0000 [#1] SMP
    Modules linked in: nfsd lockd nfs_acl auth_rpcgss exportfs autofs4
    sunrpc ipv6 dm_multipath scsi_dh ppdev parport_pc sg parport floppy
    ata_piix pata_acpi ata_generic libata pcnet32 i2c_piix4 mii pcspkr
    i2c_core dm_snapshot dm_zero dm_mirror dm_log dm_mod BusLogic sd_mod
    scsi_mod crc_t10dif ext3 jbd mbcache uhci_hcd ohci_hcd ehci_hcd [last
    unloaded: microcode]

    Pid: 1890, comm: rpc.nfsd Not tainted (2.6.27.5-37.fc9.i686 #1)
    EIP: 0060:[] EFLAGS: 00010217 CPU: 0
    EIP is at list_del+0x6/0x60
    EAX: 00000000 EBX: 00000000 ECX: 00000000 EDX: cd99e480
    ESI: cf9caed8 EDI: 00000000 EBP: cf9caebc ESP: cf9caeb8
    DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
    Process rpc.nfsd (pid: 1890, ti=cf9ca000 task=cf4de580 task.ti=cf9ca000)
    Stack: 00000000 cf9caef0 d0a9f139 c0496d04 d0a9f217 fffffff3 00000000
    00000000
    00000000 00000000 cf32b220 00000000 00000008 00000801 cf9caefc
    d0a9f193
    00000000 cf9caf08 d0a9b6ea 00000000 cf9caf1c d0a874f2 cf9c3004
    00000008
    Call Trace:
    [] ? nfsd4_list_rec_dir+0xf3/0x13a [nfsd]
    [] ? do_path_lookup+0x12d/0x175
    [] ? load_recdir+0x0/0x26 [nfsd]
    [] ? nfsd4_recdir_load+0x13/0x34 [nfsd]
    [] ? nfs4_state_start+0x2a/0xc5 [nfsd]
    [] ? nfsd_svc+0x51/0xff [nfsd]
    [] ? write_svc+0x0/0x1e [nfsd]
    [] ? write_svc+0x1b/0x1e [nfsd]
    [] ? nfsctl_transaction_write+0x3a/0x61 [nfsd]
    [] ? sys_nfsservctl+0x116/0x154
    [] ? putname+0x24/0x2f
    [] ? putname+0x24/0x2f
    [] ? do_sys_open+0xad/0xb7
    [] ? filp_close+0x50/0x5a
    [] ? sys_open+0x1e/0x26
    [] ? syscall_call+0x7/0xb
    [] ? init_cyrix+0x185/0x490
    =======================
    Code: 75 e1 8b 53 08 8d 4b 04 8d 46 04 e8 75 00 00 00 8b 53 10 8d 4b 0c
    8d 46 0c e8 67 00 00 00 5b 5e 5f 5d c3 90 90 55 89 e5 53 89 c3 40
    04 8b 00 39 d8 74 16 50 53 68 3e d6 6f c0 6a 30 68 78 d6
    EIP: [] list_del+0x6/0x60 SS:ESP 0068:cf9caeb8
    ---[ end trace a89c4ad091c4ad53 ]---

    Cc: Matthew N. Dodd
    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
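
The use-before-init pattern behind this oops can be illustrated in user space. The sketch below uses a minimal hand-written list head rather than the kernel's list.h, and list_dir_sketch() is a hypothetical stand-in for the shape of nfsd4_list_rec_dir(), not its actual code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct list_head {
    struct list_head *next, *prev;
};

void init_list_head(struct list_head *h)
{
    h->next = h;
    h->prev = h;
}

int list_is_empty(const struct list_head *h)
{
    return h->next == h;
}

/* Returns 0 or a negative errno; *cleaned counts entries freed at 'out'. */
int list_dir_sketch(int open_fails, int *cleaned)
{
    struct list_head names;
    int status = 0;

    /* Initialize before the first 'goto out', so the cleanup loop
     * always sees a valid (possibly empty) list. */
    init_list_head(&names);
    *cleaned = 0;

    if (open_fails) {
        status = -EACCES;
        goto out;
    }

    /* ... directory entries would be appended to 'names' here ... */
out:
    while (!list_is_empty(&names)) {
        struct list_head *e = names.next;

        e->prev->next = e->next;
        e->next->prev = e->prev;
        (*cleaned)++;
    }
    return status;
}
```

With the initialization placed first, the failing path reaches the cleanup loop with an empty list and simply returns the error.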
     
  • If nfsd was shut down before the grace period ended, we could end up
    with a freed object still on grace_list. Thanks to Jeff Moyer for
    reporting the resulting list corruption warnings.

    Signed-off-by: J. Bruce Fields
    Tested-by: Jeff Moyer

    J. Bruce Fields
     

24 Nov, 2008

1 commit


23 Nov, 2008

1 commit


22 Nov, 2008

3 commits

  • To avoid memory allocation failure during bulk-read, pre-allocate
    a bulk-read buffer, so that if there is only one bulk-reader at
    a time, it would just use the pre-allocated buffer and would not
    do any memory allocation. However, if there is more than one
    bulk-reader, then only one reader would use the pre-allocated buffer,
    while the other readers would allocate a buffer for themselves.

    Signed-off-by: Artem Bityutskiy

    Artem Bityutskiy
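
The buffer-selection policy described above can be sketched as follows. This is a single-threaded user-space illustration with hypothetical names; the 'busy' flag stands in for whatever lock guards the pre-allocated buffer, and real concurrent readers would need an actual trylock.

```c
#include <assert.h>
#include <stdlib.h>

struct bulk_ctx {
    void *prealloc;        /* buffer allocated once, up front */
    size_t prealloc_size;
    int prealloc_busy;     /* stand-in for a trylock on the buffer */
};

int bulk_ctx_init(struct bulk_ctx *c, size_t size)
{
    c->prealloc = malloc(size);
    c->prealloc_size = size;
    c->prealloc_busy = 0;
    return c->prealloc ? 0 : -1;
}

/* The first reader gets the pre-allocated buffer; others allocate. */
void *bulk_buf_get(struct bulk_ctx *c, size_t size)
{
    if (!c->prealloc_busy && size <= c->prealloc_size) {
        c->prealloc_busy = 1;
        return c->prealloc;
    }
    /* May return NULL; a caller would then fall back to a normal read. */
    return malloc(size);
}

void bulk_buf_put(struct bulk_ctx *c, void *buf)
{
    if (buf == c->prealloc)
        c->prealloc_busy = 0;
    else
        free(buf);
}

void bulk_ctx_destroy(struct bulk_ctx *c)
{
    free(c->prealloc);
    c->prealloc = NULL;
}
```

A single reader never allocates in this scheme, so the common case cannot fail because of fragmentation; only additional simultaneous readers take the allocation path.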
     
  • Bulk-read allocates 128KiB or more using kmalloc. The allocation
    starts failing often when the memory gets fragmented. UBIFS still
    works fine in this case, because it falls-back to standard
    (non-optimized) read method, though. This patch teaches bulk-read
    to allocate exactly the amount of memory it needs, instead of
    allocating 128KiB every time.

    This patch also prepares for a further fix where we'll
    have a pre-allocated bulk-read buffer as well. For example, now
    the @bu object is prepared in 'ubifs_bulk_read()', so we could
    pass either pre-allocated or allocated information to
    'ubifs_do_bulk_read()' later, or teach 'ubifs_do_bulk_read()'
    not to allocate 'bu->buf' if it is already there.

    Signed-off-by: Artem Bityutskiy

    Artem Bityutskiy
     
  • Bulk-read allocates a lot of memory with 'kmalloc()', and when it
    is/gets fragmented 'kmalloc()' fails with a scary warning. But
    because bulk-read is just an optimization, UBIFS keeps working fine.
    Suppress the warning by passing __GFP_NOWARN option to 'kmalloc()'.

    This patch also introduces a macro for the magic 128KiB constant.
    This is just neater.

    Note, this does not really fix the problem we had, but just hides
    the warnings. The further patches fix the problem.

    Signed-off-by: Artem Bityutskiy

    Artem Bityutskiy
     

21 Nov, 2008

2 commits


20 Nov, 2008

3 commits

  • fs/hostfs/hostfs_user.c defines do_readlink() as non-static, and so does
    fs/xfs/linux-2.6/xfs_ioctl.c when CONFIG_XFS_DEBUG=y. So rename
    do_readlink() in hostfs to hostfs_do_readlink().

    I think it would be better if the XFS guys also renamed their
    do_readlink(); it's not necessary to use such a general name.

    Signed-off-by: WANG Cong
    Cc: Jeff Dike
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    WANG Cong
     
  • Peter Cordes is sorry that he rm'ed his swapfiles while they were in use:
    he then had no pathname to swapoff. It's a curious little oversight, but
    not one worth a lot of hackery. Kudos to Willy Tarreau for turning this
    around from a discussion of synthetic pathnames to how to prevent unlink.
    Mimic immutable: prohibit unlinking an active swapfile in may_delete()
    (and don't worry my little head over the tiny race window).

    Signed-off-by: Hugh Dickins
    Cc: Willy Tarreau
    Acked-by: Christoph Hellwig
    Cc: Peter Cordes
    Cc: Bodo Eggert
    Cc: David Newall
    Cc: Peter Zijlstra
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
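
The "mimic immutable" check can be pictured as one extra flag in the permission test. The flag values and function name below are hypothetical and only illustrate the shape of the check, not the actual may_delete() code.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical inode flag bits for this sketch. */
#define SK_APPEND    0x1u
#define SK_IMMUTABLE 0x2u
#define SK_SWAPFILE  0x4u  /* set while the file is an active swapfile */

/* Refuse to unlink a victim carrying any of the protecting flags. */
int may_delete_sketch(unsigned int victim_flags)
{
    if (victim_flags & (SK_APPEND | SK_IMMUTABLE | SK_SWAPFILE))
        return -EPERM;
    return 0;
}
```

An active swapfile is thus refused with -EPERM in the same way as an immutable file, and becomes deletable again once swapoff clears the flag.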
     
  • I have received some reports of out-of-memory errors on some older AMD
    architectures. These errors are what I would expect to see if
    crypt_stat->key were split between two separate pages. eCryptfs should
    not assume that any of the memory sent through virt_to_scatterlist() is
    all contained in a single page, and so this patch allocates two
    scatterlist structs instead of one when processing keys. I have received
    confirmation from one person affected by this bug that this patch resolves
    the issue for him, and so I am submitting it for inclusion in a future
    stable release.

    Note that virt_to_scatterlist() runs sg_init_table() on the scatterlist
    structs passed to it, so the calls to sg_init_table() in
    decrypt_passphrase_encrypted_session_key() are redundant.

    Signed-off-by: Michael Halcrow
    Reported-by: Paulo J. S. Silva
    Cc: "Leon Woestenberg"
    Cc: Tim Gardner
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michael Halcrow
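
Why two scatterlist entries can be needed follows from counting the pages a buffer touches. A minimal sketch, assuming 4096-byte pages; pages_spanned() is a hypothetical helper, not an eCryptfs function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 4096-byte pages. */
#define SKETCH_PAGE_SIZE 4096u

/* Number of pages touched by the buffer [addr, addr + len). */
size_t pages_spanned(uintptr_t addr, size_t len)
{
    uintptr_t first, last;

    if (len == 0)
        return 0;
    first = addr / SKETCH_PAGE_SIZE;
    last = (addr + len - 1) / SKETCH_PAGE_SIZE;
    return (size_t)(last - first + 1);
}
```

A 16-byte key that starts 6 bytes before a page boundary spans two pages, so it needs one scatterlist entry per page; the same key placed at the start of a page needs only one.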
     

19 Nov, 2008

1 commit


18 Nov, 2008

7 commits

  • Block ext devt conversion missed md_autodetect_dev() call in
    rescan_partitions() leaving md autodetect unable to see partitions.
    Fix it.

    Signed-off-by: Tejun Heo
    Cc: Neil Brown
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • Make add_partition() return pointer to the new hd_struct on success
    and ERR_PTR() value on failure. This change will be used to fix md
    autodetection bug.

    Signed-off-by: Tejun Heo
    Cc: Neil Brown
    Signed-off-by: Jens Axboe

    Tejun Heo
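
Returning either a valid pointer or an encoded error from one function is the ERR_PTR() convention. Below is a user-space sketch of that convention; the helper names are lower-case re-implementations written for this example, and struct part with add_part_sketch() is a hypothetical stand-in for hd_struct and add_partition().

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Largest errno value that is encoded in a pointer in this sketch. */
#define SKETCH_MAX_ERRNO 4095

void *err_ptr(long error)
{
    return (void *)(intptr_t)error;
}

long ptr_err(const void *ptr)
{
    return (long)(intptr_t)ptr;
}

/* Error pointers occupy the top SKETCH_MAX_ERRNO addresses. */
int is_err(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-SKETCH_MAX_ERRNO;
}

struct part {
    int partno;
};

/* Returns the new partition on success, an encoded errno on failure. */
struct part *add_part_sketch(int partno)
{
    struct part *p;

    if (partno < 0)
        return err_ptr(-EINVAL);
    p = malloc(sizeof(*p));
    if (!p)
        return err_ptr(-ENOMEM);
    p->partno = partno;
    return p;
}
```

A caller checks is_err() on the result and either uses the returned structure directly or extracts the error code with ptr_err().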
     
  • Partition stats structure was not freed on devt allocation failure
    path. Fix it.

    Signed-off-by: Tejun Heo
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6:
    prevent cifs_writepages() from skipping unwritten pages
    Fixed parsing of mount options when doing DFS submount
    [CIFS] Fix check for tcon seal setting and fix oops on failed mount from earlier patch
    [CIFS] Fix build break
    cifs: reinstate sharing of tree connections
    [CIFS] minor cleanup to cifs_mount
    cifs: reinstate sharing of SMB sessions sans races
    cifs: disable sharing session and tcon and add new TCP sharing code
    [CIFS] clean up server protocol handling
    [CIFS] remove unused list, add new cifs sock list to prepare for mount/umount fix
    [CIFS] Fix cifs reconnection flags
    [CIFS] Can't rely on iov length and base when kernel_recvmsg returns error

    Linus Torvalds
     
  • Fixes a data corruption under heavy stress in which pages could be left
    dirty after all open instances of an inode have been closed.

    In order to write contiguous pages whenever possible, cifs_writepages()
    asks pagevec_lookup_tag() for more pages than it may write at one time.
    Normally, it then resets index just past the last page written before calling
    pagevec_lookup_tag() again.

    If cifs_writepages() can't write the first page returned, it wasn't resetting
    index, and the next call to pagevec_lookup_tag() resulted in skipping all of
    the pages it previously returned, even though cifs_writepages() did nothing
    with them. This can result in data loss when the file descriptor is about
    to be closed.

    This patch ensures that index gets set back to the next returned page so
    that none get skipped.

    Signed-off-by: Dave Kleikamp
    Acked-by: Jeff Layton
    Cc: Shirish S Pargaonkar
    Signed-off-by: Steve French

    Dave Kleikamp
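
The skipping behaviour described above can be reproduced with a small simulation. Everything here is a simplified model under stated assumptions: ten dirty pages, a lookup that returns up to five at a time and advances the index past the last page it returned, and a single "busy" page that cannot be written. None of the names come from the cifs code.

```c
#include <assert.h>
#include <stddef.h>

#define NPAGES 10
#define BATCH  5

/* Collect up to BATCH dirty page numbers >= *index and advance *index
 * past the last page returned. */
size_t lookup_dirty(const int *dirty, size_t *index, size_t *out)
{
    size_t n = 0, i;

    for (i = *index; i < NPAGES && n < BATCH; i++)
        if (dirty[i])
            out[n++] = i;
    if (n > 0)
        *index = out[n - 1] + 1;
    return n;
}

/* Write back all pages except 'busy_page'. Returns how many writable
 * pages were left dirty. reset_index selects the fixed behaviour. */
size_t writepages_sketch(int reset_index, size_t busy_page)
{
    int dirty[NPAGES];
    size_t batch[BATCH];
    size_t index = 0, n, i, left = 0;

    assert(busy_page < NPAGES);
    for (i = 0; i < NPAGES; i++)
        dirty[i] = 1;

    while ((n = lookup_dirty(dirty, &index, batch)) > 0) {
        if (batch[0] == busy_page) {
            /* Nothing from this batch was written. The fix: continue
             * from the next returned page, not past the whole batch. */
            if (reset_index)
                index = batch[0] + 1;
            continue;
        }
        for (i = 0; i < n && batch[i] != busy_page; i++)
            dirty[batch[i]] = 0;
        index = batch[i - 1] + 1;  /* just past the last page written */
    }

    for (i = 0; i < NPAGES; i++)
        if (dirty[i] && i != busy_page)
            left++;
    return left;
}
```

In this model, with page 0 busy, the variant that does not reset the index leaves pages 1 to 4 dirty, while the variant that resets it writes every writable page.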
     
  • Since these hit the same routines, and are relatively small, it is easier to review
    them as one patch.

    Fixed incorrect handling of the last option in some cases
    Fixed prefixpath handling: convert path_consumed into host-dependent string length (in bytes)
    Use non-default separator if it is provided in the original mount options

    Acked-by: Jeff Layton
    Signed-off-by: Igor Mammedov
    Signed-off-by: Steve French

    Igor Mammedov
     
  • set tcon->ses earlier

    If the initial tree connect fails, we'll end up calling cifs_put_smb_ses
    with a NULL pointer. Fix it by setting the tcon->ses earlier.

    Acked-by: Jeff Layton
    Signed-off-by: Steve French

    Steve French
     

17 Nov, 2008

2 commits