08 Feb, 2012

1 commit

  • * git://git.samba.org/sfrench/cifs-2.6:
    cifs: Fix oops in session setup code for null user mounts
    [CIFS] Update cifs Kconfig title to match removal of experimental dependency
    cifs: fix printk format warnings
    cifs: check offset in decode_ntlmssp_challenge()
    cifs: NULL dereference on allocation failure

    Linus Torvalds
     

07 Feb, 2012

1 commit

  • Setting the task name is done within setup_new_exec() by accessing
    bprm->filename. However this happens after flush_old_exec().
    This may result in a use-after-free bug: flush_old_exec() may
    "complete" vfork_done, which will wake up the parent which in turn
    may free the passed in filename.
    To fix this add a new tcomm field in struct linux_binprm which
    contains the now early generated task name until it is used.

    Fixes this bug on s390:

    Unable to handle kernel pointer dereference at virtual kernel address 0000000039768000
    Process kworker/u:3 (pid: 245, task: 000000003a3dc840, ksp: 0000000039453818)
    Krnl PSW : 0704000180000000 0000000000282e94 (setup_new_exec+0xa0/0x374)
    Call Trace:
    ([] setup_new_exec+0x38/0x374)
    [] load_elf_binary+0x402/0x1bf4
    [] search_binary_handler+0x38e/0x5bc
    [] do_execve_common+0x410/0x514
    [] do_execve+0x46/0x58
    [] kernel_execve+0x28/0x70
    [] ____call_usermodehelper+0x102/0x140
    [] kernel_thread_starter+0x6/0xc
    [] kernel_thread_starter+0x0/0xc
    Last Breaking-Event-Address:
    [] setup_new_exec+0x2fc/0x374

    Kernel panic - not syncing: Fatal exception: panic_on_oops

    Reported-by: Sebastian Ott
    Signed-off-by: Heiko Carstens
    Signed-off-by: Linus Torvalds

    Heiko Carstens
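
The idea behind the new tcomm field can be illustrated outside the kernel. The following is a minimal user-space sketch, not the kernel code: `struct binprm_sketch`, `save_task_name()` and `COMM_LEN` are hypothetical names chosen for illustration. It copies the task name into a private buffer early, so that later users no longer touch the caller-owned filename.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define COMM_LEN 16 /* assumed buffer size for this sketch */

/* Hypothetical stand-in for the exec state; not the kernel structure. */
struct binprm_sketch {
    const char *filename;   /* owned by the caller, may go away early */
    char tcomm[COMM_LEN];   /* private copy of the task name */
};

/* Copy the last path component of filename into tcomm, truncating. */
static void save_task_name(struct binprm_sketch *b)
{
    const char *base;
    size_t n;

    assert(b != NULL && b->filename != NULL);
    base = strrchr(b->filename, '/');
    base = base ? base + 1 : b->filename;
    n = strlen(base);
    if (n >= COMM_LEN)
        n = COMM_LEN - 1;
    memcpy(b->tcomm, base, n);
    b->tcomm[n] = '\0';
}
```

Once `save_task_name()` has run, the name can be read from `tcomm` even if the storage behind `filename` has been reused or released.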
     

04 Feb, 2012

1 commit

  • - Fix a regression in 16-bit Atmel NAND flash which was introduced in 3.1
    - Fix breakage with MTD suspend caused by the API rework
    - Fix a problem with resetting the MX28 BCH module
    - A couple of other trivial fixes

    * tag 'for-linus-3.3-20120204' of git://git.infradead.org/~dwmw2/mtd-3.3:
    Revert "mtd: atmel_nand: optimize read/write buffer functions"
    mtd: fix MTD suspend
    jffs2: do not initialize variable unnecessarily
    mtd: gpmi-nand bugfix: reset the BCH module when it is not MX23
    mtd: nand: fix typo in comment

    Linus Torvalds
     

03 Feb, 2012

6 commits

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
    rbd: fix safety of rbd_put_client()
    rbd: fix a memory leak in rbd_get_client()
    ceph: create a new session lock to avoid lock inversion
    ceph: fix length validation in parse_reply_info()
    ceph: initialize client debugfs outside of monc->mutex
    ceph: change "ceph.layout" xattr to be "ceph.file.layout"

    Linus Torvalds
     
  • For null user mounts, do not invoke string length function
    during session setup.

    Cc:
    Acked-by: Jeff Layton
    Signed-off-by: Shirish Pargaonkar
    Signed-off-by: Steve French

    Shirish Pargaonkar
     
  • This fixes the race in process_vm_core found by Oleg (see

    http://article.gmane.org/gmane.linux.kernel/1235667/

    for details).

    This has been updated since I last sent it as the creation of the new
    mm_access() function did almost exactly the same thing as parts of the
    previous version of this patch did.

    In order to use mm_access() even when /proc isn't enabled, we move it to
    kernel/fork.c where other related process mm access functions already
    are.

    Signed-off-by: Chris Yeoh
    Signed-off-by: Linus Torvalds

    Christopher Yeoh
     
  • Lockdep was reporting a possible circular lock dependency in
    dentry_lease_is_valid(). That function needs to sample the
    session's s_cap_gen and s_cap_ttl fields coherently, but needs
    to do so while holding a dentry lock. The s_cap_lock field was
    being used to protect the two fields, but that can't be taken while
    holding a lock on a dentry within the session.

    In most cases, the s_cap_gen and s_cap_ttl fields only get operated
    on separately. But in three cases they need to be updated together.
    Implement a new lock to protect the spots where updating both fields
    atomically is required.

    Signed-off-by: Alex Elder
    Reviewed-by: Sage Weil

    Alex Elder
     
  • "len" is read from network and thus needs validation. Otherwise, given
    a bogus "len" value, p+len could be an out-of-bounds pointer, which is
    used in further parsing.

    Signed-off-by: Xi Wang
    Signed-off-by: Sage Weil

    Xi Wang
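
The kind of validation described above can be sketched as follows. This is an illustrative user-space helper, not the ceph code: `len_fits()` is a hypothetical name, and the check compares `len` against the bytes remaining instead of forming `p + len` first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Return 1 if `len` bytes starting at `p` fit before `end`, else 0.
 * Comparing against the remaining size avoids computing p + len,
 * which would itself be out of bounds for a bogus len.
 */
static int len_fits(const uint8_t *p, const uint8_t *end, uint32_t len)
{
    assert(p <= end);
    return (size_t)(end - p) >= len;
}
```

A parser would call such a check right after reading `len` from the wire and bail out with an error before using `p + len`.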
     
  • The virtual extended attribute named "ceph.layout" is meaningful
    only for regular files. Change its name to be "ceph.file.layout" to
    more directly reflect that in the ceph xattr namespace. Preserve
    the old "ceph.layout" name for the time being (until we decide it's
    safe to get rid of it entirely).

    Add a missing initializer for "readonly" in the terminating entry.

    Signed-off-by: Alex Elder
    Reviewed-by: Sage Weil

    Alex Elder
     

02 Feb, 2012

4 commits

  • Once /proc/pid/mem is opened, the memory can't be released until
    mem_release() even if its owner exits.

    Change mem_open() to do atomic_inc(mm_count) + mmput(), this only
    pins mm_struct. Change mem_rw() to do atomic_inc_not_zero(mm_count)
    before access_remote_vm(), this verifies that this mm is still alive.

    I am not sure what mem_rw() should return if atomic_inc_not_zero()
    fails. With this patch it returns zero to match the "mm == NULL" case;
    maybe it should return -EINVAL like it did before e268337d.

    Perhaps it makes sense to add the additional fatal_signal_pending()
    check into the main loop, to ensure we do not hold this memory if
    the target task was oom-killed.

    Cc: stable@kernel.org
    Signed-off-by: Oleg Nesterov
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
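
The "increment unless zero" pattern that the message relies on can be sketched with C11 atomics. This is a user-space illustration only; `ref_get_unless_zero()` is a hypothetical name and the kernel's own atomic_inc_not_zero() is implemented differently.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Take a reference only if the count has not already dropped to zero. */
static bool ref_get_unless_zero(atomic_int *count)
{
    int old;

    assert(count != NULL);
    old = atomic_load(count);
    while (old != 0) {
        /* On failure, old is reloaded with the current value. */
        if (atomic_compare_exchange_weak(count, &old, old + 1))
            return true;
    }
    return false;
}
```

A caller that gets `false` knows the object is already on its way out and must not be accessed, which is the situation the message describes for a dead mm.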
     
  • No functional changes, cleanup and preparation.

    mem_read() and mem_write() are very similar. Move this code into the
    new common helper, mem_rw(), which takes the additional "int write"
    argument.

    Cc: stable@kernel.org
    Signed-off-by: Oleg Nesterov
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • mem_release() can hit mm == NULL, add the necessary check.

    Cc: stable@kernel.org
    Signed-off-by: Oleg Nesterov
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • This patch fixes merge conflict resolution breakage introduced by merge
    d3712b9dfcf4 ("Merge tag 'for-linus' of git://github.com/prasad-joshi/logfs_upstream").

    The commit changed 'mtd_can_have_bb()' function and made it always
    return zero, which is incorrect. Instead, we need it to return whether
    the underlying flash device can have bad eraseblocks or not. UBI needs
    this information because it affects how it handles the underlying flash.
    E.g., if the underlying flash is NOR, it cannot have bad blocks and any
    write or erase error is fatal, and all we can do is to switch to R/O
    mode. We do not need to reserve a pool of good eraseblocks for bad
    eraseblocks handling, and so on.

    This patch also removes 'mtd_can_have_bb()' invocations from Logfs to
    ensure correct Logfs behavior.

    I've tested that with this patch UBI works on top of NOR and NAND
    flashes emulated by mtdram and nandsim correspondingly.

    This patch is based on patch from Linus Torvalds.

    Signed-off-by: Artem Bityutskiy
    Acked-by: Jörn Engel
    Acked-by: Prasad Joshi
    Acked-by: Brian Norris
    Signed-off-by: Linus Torvalds

    Artem Bityutskiy
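
The distinction the message draws can be sketched in user space. Everything here is hypothetical (`struct flash_dev`, `can_have_bb()`, `block_isbad()`), and it assumes that the presence of a bad-block query method is what signals that a device can have bad eraseblocks; the log does not show how the actual patch decides this.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced device descriptor for this sketch only. */
struct flash_dev {
    /* NULL when the device type cannot have bad eraseblocks (e.g. NOR). */
    int (*block_isbad)(const struct flash_dev *dev, long long offset);
};

/* Report whether the device can have bad eraseblocks at all. */
static int can_have_bb(const struct flash_dev *dev)
{
    assert(dev != NULL);
    return dev->block_isbad != NULL;
}

/* Treat a missing method as "all blocks are good". */
static int block_isbad(const struct flash_dev *dev, long long offset)
{
    assert(dev != NULL);
    if (dev->block_isbad == NULL)
        return 0;
    return dev->block_isbad(dev, offset);
}

/* Example method for a NAND-like device: pretend one block is bad. */
static int nand_like_isbad(const struct flash_dev *dev, long long offset)
{
    (void)dev;
    return offset == 0x20000;
}
```

With this split, a caller can ask "can this device have bad blocks?" separately from "is this block bad?", so a NOR-like device answers no to both.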
     

01 Feb, 2012

2 commits

  • Removed the dependency on CONFIG_EXPERIMENTAL but forgot to update
    the text description to be consistent.

    Signed-off-by: Steve French

    Steve French
     
  • There are a few important bug fixes for LogFS

    * tag 'for-linus' of git://github.com/prasad-joshi/logfs_upstream:
    Logfs: Allow NULL block_isbad() methods
    logfs: Grow inode in delete path
    logfs: Free areas before calling generic_shutdown_super()
    logfs: remove useless BUG_ON
    MAINTAINERS: Add Prasad Joshi in LogFS maintiners
    logfs: Propagate page parameter to __logfs_write_inode
    logfs: set superblock shutdown flag after generic sb shutdown
    logfs: take write mutex lock during fsync and sync
    logfs: Prevent memory corruption
    logfs: update page reference count for pined pages

    Fix up conflict in fs/logfs/dev_mtd.c due to semantic change in what
    "mtd->block_isbad" means in commit f2933e86ad93: "Logfs: Allow NULL
    block_isbad() methods" clashing with the abstraction changes in the
    commits 7086c19d0742: "mtd: introduce mtd_block_isbad interface" and
    d58b27ed58a3: "logfs: do not use 'mtd->block_isbad' directly".

    This resolution takes the semantics from commit f2933e86ad93, and just
    makes mtd_block_isbad() return zero (false) if the 'block_isbad'
    function is NULL. But that also means that now "mtd_can_have_bb()"
    always returns 0.

    Now, "mtd_block_markbad()" will obviously return an error if the
    low-level driver doesn't support bad blocks, so this is somewhat
    non-symmetric, but it actually makes sense if a NULL "block_isbad"
    function is considered to mean "I assume that all my blocks are always
    good".

    Linus Torvalds
     

31 Jan, 2012

2 commits

  • Fix printk format warnings for ssize_t variables:

    fs/cifs/connect.c:2145:3: warning: format '%ld' expects type 'long int', but argument 3 has type 'ssize_t'
    fs/cifs/connect.c:2152:3: warning: format '%ld' expects type 'long int', but argument 3 has type 'ssize_t'
    fs/cifs/connect.c:2160:3: warning: format '%ld' expects type 'long int', but argument 3 has type 'ssize_t'
    fs/cifs/connect.c:2170:3: warning: format '%ld' expects type 'long int', but argument 3 has type 'ssize_t'

    Signed-off-by: Randy Dunlap
    Acked-by: Jeff Layton
    Cc: linux-cifs@vger.kernel.org

    Randy Dunlap
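
Warnings of this kind usually go away once the format uses the `z` length modifier, which matches size_t-sized types. The log does not show which specifier the patch chose, so the following user-space sketch is an assumption; `format_len()` is a hypothetical helper.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>

/* Format an ssize_t with the matching 'z' length modifier. */
static int format_len(char *buf, size_t size, ssize_t len)
{
    assert(buf != NULL && size > 0);
    return snprintf(buf, size, "len=%zd", len);
}
```

Unlike '%ld', '%zd' stays correct whether ssize_t is defined as int or long on the target.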
     
  • We should check that we're not copying memory from beyond the end of the
    blob.

    Signed-off-by: Dan Carpenter
    Reviewed-by: Jeff Layton

    Dan Carpenter
     

29 Jan, 2012

2 commits

  • …ernel/git/gregkh/driver-core

    Here are some patches for the 3.3-rc1 tree.

    It contains the removal of the sysdev code, now that all users of it are
    gone, as well as some sysfs bugfixes that have been reported by users.
    There are also some documentation updates here as well.

    * tag 'driver-core-3.3-rc1-bugfixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
    sysfs: Complain bitterly about attempts to remove files from nonexistent directories.
    stable: update documentation to ask for kernel version
    base/core.c:fix typo in comment in function device_add
    Documentation: devres: add allocation functions to list of supported calls
    Documentation update for the driver model core
    kernel-doc: fix new warnings in driver-core
    kernel-doc: fix new warnings in debugfs
    kernel-doc: fix new warnings in device.h
    driver core: remove drivers/base/sys.c and include/linux/sysdev.h

    Linus Torvalds
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
    Btrfs: fix reservations in btrfs_page_mkwrite
    Btrfs: advance window_start if we're using a bitmap
    btrfs: mask out gfp flags in releasepage
    Btrfs: fix enospc error caused by wrong checks of the chunk
    Btrfs: do not defrag a file partially
    Btrfs: fix warning for 32-bit build of fs/btrfs/check-integrity.c
    Btrfs: use cluster->window_start when allocating from a cluster bitmap
    Btrfs: Check for NULL page in extent_range_uptodate
    btrfs: Fix busyloops in transaction waiting code
    Btrfs: make sure a bitmap has enough bytes
    Btrfs: fix uninit warning in backref.c

    Linus Torvalds
     

28 Jan, 2012

9 commits

  • Not all mtd drivers define block_isbad(). Let's assume no bad blocks
    instead of refusing to mount.

    Signed-off-by: Joern Engel

    Joern Engel
     
  • Can be necessary if an inode gets deleted (through -ENOSPC) before being
    written. Might be better to move this into logfs_write_rec(), but for
    now go with the stupid&safe patch.

    Signed-off-by: Joern Engel

    Joern Engel
     
  • Or hit an assertion in map_invalidatepage() instead.

    Signed-off-by: Joern Engel

    Joern Engel
     
  • It prevents write sizes >4k.

    Signed-off-by: Joern Engel

    Joern Engel
     
  • During GC LogFS has to rewrite each valid block to a separate segment.
    Rewrite operation reads data from an old segment and writes it to a
    newly allocated segment. Since every write operation changes data
    block pointers maintained in inode, inode should also be rewritten.

    In GC path to avoid AB-BA deadlock LogFS marks a page with
    PG_pre_locked in addition to locking the page (PG_locked). The page
    lock is ignored iff the page is pre-locked.

    LogFS uses a special file called segment file. The segment file
    maintains an 8-byte entry for every segment. It keeps track of erase
    count, level etc. for every segment.

    Bad things happen when a segment belonging to the segment file is GCed:

    ------------[ cut here ]------------
    kernel BUG at /home/prasad/logfs/readwrite.c:297!
    invalid opcode: 0000 [#1] SMP
    Modules linked in: logfs joydev usbhid hid psmouse e1000 i2c_piix4
    serio_raw [last unloaded: logfs]
    Pid: 20161, comm: mount Not tainted 3.1.0-rc3+ #3 innotek GmbH
    VirtualBox
    EIP: 0060:[] EFLAGS: 00010292 CPU: 0
    EIP is at logfs_lock_write_page+0x6a/0x70 [logfs]
    EAX: 00000027 EBX: f73f5b20 ECX: c16007c8 EDX: 00000094
    ESI: 00000000 EDI: e59be6e4 EBP: c7337b28 ESP: c7337b18
    DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
    Process mount (pid: 20161, ti=c7336000 task=eb323f70 task.ti=c7336000)
    Stack:
    f8099a3d c7337b24 f73f5b20 00001002 c7337b50 f8091f6d f8099a4d f80994e4
    00000003 00000000 c7337b68 00000000 c67e4400 00001000 c7337b80 f80935e5
    00000000 00000000 00000000 00000000 e1fcf000 0000000f e59be618 c70bf900
    Call Trace:
    [] logfs_get_write_page.clone.16+0xdd/0x100 [logfs]
    [] logfs_mod_segment_entry+0x55/0x110 [logfs]
    [] logfs_get_segment_entry+0x1d/0x20 [logfs]
    [] ? logfs_cleanup_journal+0x50/0x50 [logfs]
    [] ostore_get_erase_count+0x1b/0x40 [logfs]
    [] logfs_open_area+0xc8/0x150 [logfs]
    [] ? kmemleak_alloc+0x2c/0x60
    [] __logfs_segment_write.clone.16+0x4e/0x1b0 [logfs]
    [] ? mempool_kmalloc+0x13/0x20
    [] ? mempool_kmalloc+0x13/0x20
    [] logfs_segment_write+0x17f/0x1d0 [logfs]
    [] logfs_write_i0+0x11c/0x180 [logfs]
    [] logfs_write_direct+0x45/0x90 [logfs]
    [] __logfs_write_buf+0xbd/0xf0 [logfs]
    [] ? kmap_atomic_prot+0x4e/0xe0
    [] logfs_write_buf+0x3b/0x60 [logfs]
    [] __logfs_write_inode+0xa9/0x110 [logfs]
    [] logfs_rewrite_block+0xc0/0x110 [logfs]
    [] ? get_mapping_page+0x10/0x60 [logfs]
    [] ? logfs_load_object_aliases+0x2e0/0x2f0 [logfs]
    [] logfs_gc_segment+0x2ad/0x310 [logfs]
    [] __logfs_gc_once+0x4a/0x80 [logfs]
    [] logfs_gc_pass+0x683/0x6a0 [logfs]
    [] logfs_mount+0x5a9/0x680 [logfs]
    [] mount_fs+0x21/0xd0
    [] ? __alloc_percpu+0xf/0x20
    [] ? alloc_vfsmnt+0xb1/0x130
    [] vfs_kern_mount+0x4b/0xa0
    [] do_kern_mount+0x3e/0xe0
    [] do_mount+0x34d/0x670
    [] ? strndup_user+0x49/0x70
    [] sys_mount+0x6b/0xa0
    [] syscall_call+0x7/0xb
    Code: f8 e8 8b 93 39 c9 8b 45 f8 3e 0f ba 28 00 19 d2 85 d2 74 ca eb d0 0f 0b 8d 45 fc 89 44 24 04 c7 04 24 3d 9a 09 f8 e8 09 92 39 c9 0b 8d 74 26 00 55 89 e5 3e 8d 74 26 00 8b 10 80 e6 01 74 09
    EIP: [] logfs_lock_write_page+0x6a/0x70 [logfs] SS:ESP 0068:c7337b18
    ---[ end trace 96e67d5b3aa3d6ca ]---

    The patch passes locked page to __logfs_write_inode. It calls function
    logfs_get_wblocks() to pre-lock the page. This ensures any further
    attempts to lock the page are ignored (esp from get_erase_count).

    Acked-by: Joern Engel
    Signed-off-by: Prasad Joshi

    Prasad Joshi
     
  • While unmounting the file system LogFS calls generic_shutdown_super.
    The function does file system independent superblock shutdown.
    However, it might result in a call to file-system-specific inode eviction.

    LogFS marks the FS as shutting down by setting bit LOGFS_SB_FLAG_SHUTDOWN in
    super->s_flags. Since inode eviction might call truncate on the inode, the
    following BUG is observed when the file system is unmounted:

    ------------[ cut here ]------------
    kernel BUG at /home/prasad/logfs/segment.c:362!
    invalid opcode: 0000 [#1] PREEMPT SMP
    CPU 3
    Modules linked in: logfs binfmt_misc ppdev virtio_blk parport_pc lp
    parport psmouse floppy virtio_pci serio_raw virtio_ring virtio

    Pid: 1933, comm: umount Not tainted 3.0.0+ #4 Bochs Bochs
    RIP: 0010:[] []
    logfs_segment_write+0x211/0x230 [logfs]
    RSP: 0018:ffff880062d7b9e8 EFLAGS: 00010202
    RAX: 000000000000000e RBX: ffff88006eca9000 RCX: 0000000000000000
    RDX: ffff88006fd87c40 RSI: ffffea00014ff468 RDI: ffff88007b68e000
    RBP: ffff880062d7ba48 R08: 8000000020451430 R09: 0000000000000000
    R10: dead000000100100 R11: 0000000000000000 R12: ffff88006fd87c40
    R13: ffffea00014ff468 R14: ffff88005ad0a460 R15: 0000000000000000
    FS: 00007f25d50ea760(0000) GS:ffff88007fd80000(0000)
    knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
    CR2: 0000000000d05e48 CR3: 0000000062c72000 CR4: 00000000000006e0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
    Process umount (pid: 1933, threadinfo ffff880062d7a000,
    task ffff880070b44500)
    Stack:
    ffff880062d7ba38 ffff88005ad0a508 0000000000001000 0000000000000000
    8000000020451430 ffffea00014ff468 ffff880062d7ba48 ffff88005ad0a460
    ffff880062d7bad8 ffffea00014ff468 ffff88006fd87c40 0000000000000000
    Call Trace:
    [] logfs_write_i0+0x12e/0x190 [logfs]
    [] __logfs_write_rec+0x140/0x220 [logfs]
    [] __logfs_write_rec+0xf2/0x220 [logfs]
    [] logfs_write_rec+0x64/0xd0 [logfs]
    [] __logfs_write_buf+0x106/0x110 [logfs]
    [] logfs_write_buf+0x4e/0x80 [logfs]
    [] __logfs_write_inode+0x98/0x110 [logfs]
    [] logfs_truncate+0x54/0x290 [logfs]
    [] logfs_evict_inode+0xdc/0x190 [logfs]
    [] evict+0x85/0x170
    [] iput+0xe6/0x1b0
    [] shrink_dcache_for_umount_subtree+0x218/0x280
    [] shrink_dcache_for_umount+0x51/0x90
    [] generic_shutdown_super+0x2c/0x100
    [] logfs_kill_sb+0x57/0xf0 [logfs]
    [] deactivate_locked_super+0x45/0x70
    [] deactivate_super+0x4a/0x70
    [] mntput_no_expire+0xa4/0xf0
    [] sys_umount+0x6f/0x380
    [] system_call_fastpath+0x16/0x1b
    Code: 55 c8 49 8d b6 a8 00 00 00 45 89 f9 45 89 e8 4c 89 e1 4c 89 55
    b8 c7 04 24 00 00 00 00 e8 68 fc ff ff 4c 8b 55 b8 e9 3c ff ff ff
    0b 0f 0b c7 45 c0 00 00 00 00 e9 44 fe ff ff 66 66 66 66 66
    RIP [] logfs_segment_write+0x211/0x230 [logfs]
    RSP
    ---[ end trace fe6b040cea952290 ]---

    Therefore, move super->s_flags setting after the fs-independent work
    has been finished.

    Reviewed-by: Joern Engel
    Signed-off-by: Prasad Joshi

    Prasad Joshi
     
  • LogFS uses super->s_write_mutex while writing data to disk. Taking the
    same mutex lock in sync and fsync code path solves the following BUG:

    ------------[ cut here ]------------
    kernel BUG at /home/prasad/logfs/dev_bdev.c:134!

    Pid: 2387, comm: flush-253:16 Not tainted 3.0.0+ #4 Bochs Bochs
    RIP: 0010:[] []
    bdev_writeseg+0x25d/0x270 [logfs]
    Call Trace:
    [] logfs_open_area+0x91/0x150 [logfs]
    [] ? find_level.clone.9+0x62/0x100
    [] __logfs_segment_write.clone.20+0x5c/0x190 [logfs]
    [] ? mempool_kmalloc+0x15/0x20
    [] ? mempool_alloc+0x53/0x130
    [] logfs_segment_write+0x1d4/0x230 [logfs]
    [] logfs_write_i0+0x12e/0x190 [logfs]
    [] __logfs_write_rec+0x140/0x220 [logfs]
    [] logfs_write_rec+0x64/0xd0 [logfs]
    [] __logfs_write_buf+0x106/0x110 [logfs]
    [] logfs_write_buf+0x4e/0x80 [logfs]
    [] __logfs_writepage+0x23/0x80 [logfs]
    [] logfs_writepage+0xdc/0x110 [logfs]
    [] __writepage+0x17/0x40
    [] write_cache_pages+0x208/0x4f0
    [] ? set_page_dirty+0x70/0x70
    [] generic_writepages+0x4a/0x70
    [] do_writepages+0x21/0x40
    [] writeback_single_inode+0x101/0x250
    [] writeback_sb_inodes+0xed/0x1c0
    [] writeback_inodes_wb+0x7b/0x1e0
    [] wb_writeback+0x4c3/0x530
    [] ? sub_preempt_count+0x9d/0xd0
    [] wb_do_writeback+0xdb/0x290
    [] ? sub_preempt_count+0x9d/0xd0
    [] ? _raw_spin_unlock_irqrestore+0x18/0x40
    [] ? del_timer+0x8a/0x120
    [] bdi_writeback_thread+0x8c/0x2e0
    [] ? wb_do_writeback+0x290/0x290
    [] kthread+0x96/0xa0
    [] kernel_thread_helper+0x4/0x10
    [] ? kthread_worker_fn+0x190/0x190
    [] ? gs_change+0xb/0xb
    RIP [] bdev_writeseg+0x25d/0x270 [logfs]
    ---[ end trace 0211ad60a57657c4 ]---

    Reviewed-by: Joern Engel
    Signed-off-by: Prasad Joshi

    Prasad Joshi
     
  • This is a bad one. I wonder whether we were so far protected by
    no_free_segments(sb) usually being smaller than LOGFS_NO_AREAS.

    Found by Dan Carpenter using smatch.

    Signed-off-by: Joern Engel
    Signed-off-by: Prasad Joshi

    Joern Engel
     
  • LogFS sets PG_private flag to indicate a pinned page. We assumed that
    marking a page as private is enough to ensure its existence. But
    instead it is necessary to hold a reference count to the page.

    The change resolves the following BUG

    BUG: Bad page state in process flush-253:16 pfn:6a6d0
    page flags: 0x100000000000808(uptodate|private)

    Suggested-and-Acked-by: Joern Engel
    Signed-off-by: Prasad Joshi

    Prasad Joshi
     

27 Jan, 2012

11 commits

  • Josef fixed btrfs_page_mkwrite to properly release reserved
    extents if there was an error. But if we fail to get a reservation
    and we fail to dirty the inode (for ENOSPC reasons), we'll end up
    trying to release a reservation we never had.

    This makes sure we only release if we were able to reserve.

    Signed-off-by: Chris Mason

    Chris Mason
     
  • If we span a long area in a bitmap we could end up taking a lot of time
    searching to the next free area if we're searching from the original
    window_start, so advance window_start in order to make sure we don't do any
    superfluous searching. Thanks,

    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Josef Bacik
     
  • btree_releasepage is a callback and can be passed unknown gfp flags and then
    they may end up in kmem_cache_alloc called from alloc_extent_state, slab
    allocator will BUG_ON when there is HIGHMEM or DMA32 flag set.

    This may happen when btrfs is mounted from a loop device, which masks out
    __GFP_IO flag. The check in try_release_extent_state

        if ((mask & GFP_NOFS) == GFP_NOFS)
            mask = GFP_NOFS;

    will not work and passes unfiltered flags further resulting in crash at
    mm/slab.c:2963

    [] cache_alloc_refill+0x3b4/0x5c8
    [] kmem_cache_alloc+0x204/0x294
    [] mempool_alloc+0x52/0x170
    [] alloc_extent_state+0x40/0xd4 [btrfs]
    [] __clear_extent_bit+0x38a/0x4cc [btrfs]
    [] try_release_extent_state+0x9c/0xd4 [btrfs]
    [] btree_releasepage+0x7e/0xd0 [btrfs]
    [] shrink_page_list+0x6a0/0x724
    [] shrink_inactive_list+0x230/0x578
    [] shrink_list+0x6c/0x120
    [] shrink_zone+0x1e2/0x228
    [] shrink_zones+0x90/0x254
    [] do_try_to_free_pages+0xac/0x420
    [] try_to_free_pages+0x13c/0x1b0
    [] __alloc_pages_nodemask+0x5b4/0x9a8
    [] grab_cache_page_write_begin+0x7e/0xe8

    Signed-off-by: David Sterba
    Signed-off-by: Chris Mason

    David Sterba
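
Why the quoted check lets stray flags through can be shown with a small bitmask sketch. The bit values and names below are made up for illustration and do not match the kernel's GFP definitions; `strip_zone_bits()` only illustrates the general idea of masking flags out before they reach an allocator, not the exact mask the patch uses.

```c
#include <assert.h>

/* Made-up bit values for this sketch; the kernel's real values differ. */
#define F_WAIT      0x01u
#define F_IO        0x02u
#define F_HIGHMEM   0x10u
#define F_DMA32     0x20u
#define F_NOFS      (F_WAIT | F_IO)
#define F_ZONE_BITS (F_HIGHMEM | F_DMA32)

/* The style of check quoted above: normalizes only if every NOFS bit is set. */
static unsigned int normalize_if_nofs(unsigned int mask)
{
    if ((mask & F_NOFS) == F_NOFS)
        mask = F_NOFS;
    return mask;
}

/* Sketch of masking out zone bits before handing flags to an allocator. */
static unsigned int strip_zone_bits(unsigned int mask)
{
    unsigned int out = mask & ~F_ZONE_BITS;

    assert((out & F_ZONE_BITS) == 0);
    return out;
}
```

When the IO bit has been cleared, as in the loop device case, the equality test fails and the zone bits survive unless they are masked out explicitly.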
     
  • When we did sysbench test for inline files, enospc error happened easily though
    there was lots of free disk space which could be allocated for new chunks.

    Reproduce steps:
    # mkfs.btrfs -b $((2 * 1024 * 1024 * 1024))
    # mount /mnt
    # ulimit -n 102400
    # cd /mnt
    # sysbench --num-threads=1 --test=fileio --file-num=81920 \
    > --file-total-size=80M --file-block-size=1K --file-io-mode=sync \
    > --file-test-mode=seqwr prepare
    # sysbench --num-threads=1 --test=fileio --file-num=81920 \
    > --file-total-size=80M --file-block-size=1K --file-io-mode=sync \
    > --file-test-mode=seqwr run

    The reason of this bug is:
    Now, we can reserve space which is larger than the free space in the chunks if
    we have enough free disk space which can be used for new chunks. By this way,
    the space allocator should allocate a new chunk by force if there is no free
    space in the free space cache. But there are two wrong checks which break this
    operation.

    One is
        if (ret == -ENOSPC && num_bytes > min_alloc_size)
    in btrfs_reserve_extent(). It is wrong: we should try to allocate a new chunk
    even if we fail to allocate free space by the minimum allocable size.

    The other is
        if (space_info->force_alloc)
            force = space_info->force_alloc;
    in do_chunk_alloc(). It makes the allocator ignore CHUNK_ALLOC_FORCE if someone
    sets ->force_alloc to CHUNK_ALLOC_LIMITED, and makes the enospc error happen.

    Fix these two wrong checks. Especially the second one, we fix it by changing
    the value of CHUNK_ALLOC_LIMITED and CHUNK_ALLOC_FORCE, and make
    CHUNK_ALLOC_FORCE greater than CHUNK_ALLOC_LIMITED since CHUNK_ALLOC_FORCE has
    higher priority. And if the value which is passed in by the caller is greater
    than ->force_alloc, use the passed value.

    Signed-off-by: Miao Xie
    Signed-off-by: Chris Mason

    Miao Xie
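
The priority rule in the last paragraph can be sketched as a "take the stronger request" helper. The enum values and names below are assumptions for illustration (the message only says that the force value must be greater than the limited one); `effective_force()` is hypothetical.

```c
#include <assert.h>

/* Assumed ordering for this sketch: higher value means higher priority. */
enum chunk_alloc {
    ALLOC_NO_FORCE = 0,
    ALLOC_LIMITED  = 1,
    ALLOC_FORCE    = 2
};

/* Pick the stronger of the caller's request and the pending request. */
static enum chunk_alloc effective_force(enum chunk_alloc requested,
                                        enum chunk_alloc pending)
{
    assert(requested <= ALLOC_FORCE && pending <= ALLOC_FORCE);
    return requested > pending ? requested : pending;
}
```

With the values ordered this way, a pending "limited" request can no longer downgrade a caller that asked for a forced allocation.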
     
  • xfstests 218 complains that btrfs defrags a file partially:
    After: 1
    Write backwards sync, but contiguous - should defrag to 1 extent
    Before: 10
    -After: 1
    +After: 2

    To fix this, we need to set max_to_defrag count properly.

    Signed-off-by: Liu Bo
    Signed-off-by: Chris Mason

    Liu Bo
     
  • There have been 4 warnings on 32-bit build, they are herewith fixed.

    Signed-off-by: Stefan Behrens
    Signed-off-by: Chris Mason

    Stefan Behrens
     
  • We specifically set window_start in the cluster struct to indicate where the
    cluster starts in a bitmap, but we've been using min_start to indicate where
    we're searching from. This is usually the start of the blockgroup, so
    essentially means we're constantly searching from the start of any bitmap we
    find, which completely negates all the trouble we go to in order to setup a
    cluster. So start using window_start to make sure we actually use the area we
    found. Thanks,

    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Josef Bacik
     
  • A user has encountered a NULL pointer kernel oops in btrfs when
    encountering media errors. The problem has been identified
    as an unhandled NULL pointer returned from find_get_page().
    This modification simply checks for a NULL page, and returns
    with an error if found (the extent_range_uptodate() function
    returns 1 on errors).

    After testing this patch, the user reported that the error with
    the NULL pointer oops was solved. However, there is still a
    remaining problem with a thread becoming stuck in
    wait_on_page_locked(page) in the read_extent_buffer_pages(...)
    function in extent_io.c

        for (i = start_i; i < num_pages; i++) {
            page = extent_buffer_page(eb, i);
            wait_on_page_locked(page);
            if (!PageUptodate(page))
                ret = -EIO;
        }

    This patch leaves the issue with the locked page yet to be resolved.

    Signed-off-by: Mitch Harder
    Signed-off-by: Chris Mason

    Mitch Harder
     
  • wait_log_commit() and wait_for_writer() were using slightly different
    conditions for deciding whether they should call schedule() and whether they
    should continue in the wait loop. Thus it could happen that we busylooped when
    the first condition was not true while the second one was. That is burning CPU
    cycles needlessly and is deadly on UP machines...

    Signed-off-by: Jan Kara
    Signed-off-by: Chris Mason

    Jan Kara
     
  • We have only been checking for min_bytes available in bitmap entries, but we
    won't successfully setup a bitmap cluster unless it has at least bytes in the
    bitmap, so in the common case min_bytes is 4k and we want something like 2MB, so
    if there are a bunch of bitmap entries with less than 2mb's in them, we'll
    search all them anyway, which is suboptimal. Fix this check. Thanks,

    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Josef Bacik
     
  • Added initialization with the declaration of ret. It isn't set later on the
    switch-default branch (which should never be taken).

    Signed-off-by: Jan Schmidt
    Signed-off-by: Chris Mason

    Jan Schmidt
     

26 Jan, 2012

1 commit

  • Quoth Ben Myers:
    "Please pull in the following bugfix for xfs. We forgot to drop a lock on
    error in xfs_readlink. It hasn't been through -next yet, but there is no
    -next tree tomorrow. The fix is clear so I'm sending this request today."

    * 'for-linus' of git://oss.sgi.com/xfs/xfs:
    xfs: Fix missing xfs_iunlock() on error recovery path in xfs_readlink()

    Linus Torvalds