19 Mar, 2011

1 commit

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (47 commits)
    doc: CONFIG_UNEVICTABLE_LRU doesn't exist anymore
    Update cpuset info & webiste for cgroups
    dcdbas: force SMI to happen when expected
    arch/arm/Kconfig: remove one to many l's in the word.
    asm-generic/user.h: Fix spelling in comment
    drm: fix printk typo 'sracth'
    Remove one to many n's in a word
    Documentation/filesystems/romfs.txt: fixing link to genromfs
    drivers:scsi Change printk typo initate -> initiate
    serial, pch uart: Remove duplicate inclusion of linux/pci.h header
    fs/eventpoll.c: fix spelling
    mm: Fix out-of-date comments which refers non-existent functions
    drm: Fix printk typo 'failled'
    coh901318.c: Change initate to initiate.
    mbox-db5500.c Change initate to initiate.
    edac: correct i82975x error-info reported
    edac: correct i82975x mci initialisation
    edac: correct commented info
    fs: update comments to point correct document
    target: remove duplicate include of target/target_core_device.h from drivers/target/target_core_hba.c
    ...

    Trivial conflict in fs/eventpoll.c (spelling vs addition)

    Linus Torvalds
     

17 Mar, 2011

1 commit

  • …s/security-testing-2.6

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6: (33 commits)
    AppArmor: kill unused macros in lsm.c
    AppArmor: cleanup generated files correctly
    KEYS: Add an iovec version of KEYCTL_INSTANTIATE
    KEYS: Add a new keyctl op to reject a key with a specified error code
    KEYS: Add a key type op to permit the key description to be vetted
    KEYS: Add an RCU payload dereference macro
    AppArmor: Cleanup make file to remove cruft and make it easier to read
    SELinux: implement the new sb_remount LSM hook
    LSM: Pass -o remount options to the LSM
    SELinux: Compute SID for the newly created socket
    SELinux: Socket retains creator role and MLS attribute
    SELinux: Auto-generate security_is_socket_class
    TOMOYO: Fix memory leak upon file open.
    Revert "selinux: simplify ioctl checking"
    selinux: drop unused packet flow permissions
    selinux: Fix packet forwarding checks on postrouting
    selinux: Fix wrong checks for selinux_policycap_netpeer
    selinux: Fix check for xfrm selinux context algorithm
    ima: remove unnecessary call to ima_must_measure
    IMA: remove IMA imbalance checking
    ...

    Linus Torvalds
     

16 Mar, 2011

1 commit

  • * 'for-2.6.39' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
    workqueue: fix build failure introduced by s/freezeable/freezable/
    workqueue: add system_freezeable_wq
    rds/ib: use system_wq instead of rds_ib_fmr_wq
    net/9p: replace p9_poll_task with a work
    net/9p: use system_wq instead of p9_mux_wq
    xfs: convert to alloc_workqueue()
    reiserfs: make commit_wq use the default concurrency level
    ocfs2: use system_wq instead of ocfs2_quota_wq
    ext4: convert to alloc_workqueue()
    scsi/scsi_tgt_lib: scsi_tgtd isn't used in memory reclaim path
    scsi/be2iscsi,qla2xxx: convert to alloc_workqueue()
    misc/iwmc3200top: use system_wq instead of dedicated workqueues
    i2o: use alloc_workqueue() instead of create_workqueue()
    acpi: kacpi*_wq don't need WQ_MEM_RECLAIM
    fs/aio: aio_wq isn't used in memory reclaim path
    input/tps6507x-ts: use system_wq instead of dedicated workqueue
    cpufreq: use system_wq instead of dedicated workqueues
    wireless/ipw2x00: use system_wq instead of dedicated workqueues
    arm/omap: use system_wq in mailbox
    workqueue: use WQ_MEM_RECLAIM instead of WQ_RESCUER

    Linus Torvalds
     

15 Mar, 2011

2 commits


08 Mar, 2011

1 commit


21 Feb, 2011

1 commit


15 Feb, 2011

2 commits


12 Feb, 2011

2 commits

  • ext4 has a data corruption case when doing non-block-aligned
    asynchronous direct IO into a sparse file, as demonstrated
    by xfstest 240.

    The root cause is that while ext4 preallocates space in the
    hole, mappings of that space still look "new" and
    dio_zero_block() will zero out the unwritten portions. When
    more than one AIO thread is going, they both find this "new"
    block and race to zero out their portion; this is uncoordinated
    and causes data corruption.

    Dave Chinner fixed this for xfs by simply serializing all
    unaligned asynchronous direct IO. I've done the same here.
    The difference is that we only wait on conversions, not all IO.
    This is a very big hammer, and I'm not very pleased with
    stuffing this into ext4_file_write(). But since ext4 is
    DIO_LOCKING, we need to serialize it at this high level.

    I tried to move this into ext4_ext_direct_IO, but by then
    we have the i_mutex already, and we will wait on the
    work queue to do conversions - which must also take the
    i_mutex. So that won't work.

    This was originally exposed by qemu-kvm installing to
    a raw disk image with a normal sector-63 alignment. I've
    tested a backport of this patch with qemu, and it does
    avoid the corruption. It is also quite a lot slower
    (14 min for package installs, vs. 8 min for well-aligned)
    but I'll take slow correctness over fast corruption any day.

    Mingming suggested that we can track outstanding
    conversions, and wait on those so that non-sparse
    files won't be affected, and I've implemented that here;
    unaligned AIO to nonsparse files won't take a perf hit.

    [tytso@mit.edu: Keep the mutex as a hashed array instead
    of bloating the ext4 inode]

    [tytso@mit.edu: Fix up namespace issues so that global
    variables are protected with an "ext4_" prefix.]
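
    As a rough illustration of the two ideas above (detecting an unaligned
    write, and picking a mutex from a small hashed array rather than storing
    one per inode), here is a userspace sketch. All names and the table size
    are hypothetical, and this is not the code from the patch:

```c
#include <stdint.h>

#define DEMO_NR_AIO_MUTEXES 8	/* hypothetical table size */

/*
 * Nonzero when the write does not both start and end on a block
 * boundary. blocksize is assumed to be a power of two.
 */
static int demo_is_unaligned(uint64_t pos, uint64_t len, uint32_t blocksize)
{
	uint64_t mask = (uint64_t)blocksize - 1;

	return ((pos & mask) != 0) || (((pos + len) & mask) != 0);
}

/* Map an inode address to a slot in a fixed array of mutexes. */
static unsigned demo_aio_mutex_index(const void *inode)
{
	return (unsigned)(((uintptr_t)inode >> 4) % DEMO_NR_AIO_MUTEXES);
}
```

    A write starting at sector 63 (byte offset 63 * 512) with a 4096-byte
    block size is flagged as unaligned by this check, which matches the
    qemu-kvm case described above.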

    Signed-off-by: Eric Sandeen
    Signed-off-by: "Theodore Ts'o"

    Eric Sandeen
     
  • In 2.6.37 I was running into oopses with repeated module
    loads & unloads. I tracked this down to:

    fb1813f4 ext4: use dedicated slab caches for group_info structures

    (this was in addition to the features advert unload problem)

    The kstrdup & subsequent kfree of the cache name was causing
    a double free. In slub, at least, if I read it right it allocates
    & frees the name itself, slab seems to do something different...
    so in slub I think we were leaking -our- cachep->name, and double
    freeing the one allocated by slub.

    After getting lost in slab/slub/slob a bit, I just looked at other
    sized-caches that get allocated. jbd2, biovec, sgpool all do it
    more or less the way jbd2 does. Below patch follows the jbd2
    method of dynamically allocating a cache at mount time from
    a list of static names.
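
    The "list of static names" approach can be sketched in userspace roughly
    as follows. The names, the table size, and the struct are all made up for
    illustration; the real patch uses the kernel slab API, which is not shown
    here:

```c
#include <stddef.h>
#include <stdlib.h>

#define DEMO_NR_CACHES 4

/* Static strings: never duplicated, so never freed by anyone. */
static const char *const demo_cache_names[DEMO_NR_CACHES] = {
	"demo_groupinfo_1k", "demo_groupinfo_2k",
	"demo_groupinfo_4k", "demo_groupinfo_8k",
};

struct demo_cache {
	const char *name;	/* points into demo_cache_names */
	size_t obj_size;
};

static struct demo_cache *demo_caches[DEMO_NR_CACHES];

/* Create the cache for slot idx on first use ("at mount time"). */
static struct demo_cache *demo_get_cache(unsigned idx, size_t obj_size)
{
	if (idx >= DEMO_NR_CACHES)	/* >= rather than >, to avoid an off-by-one */
		return NULL;
	if (!demo_caches[idx]) {
		struct demo_cache *c = malloc(sizeof(*c));

		if (!c)
			return NULL;
		c->name = demo_cache_names[idx];
		c->obj_size = obj_size;
		demo_caches[idx] = c;
	}
	return demo_caches[idx];
}
```

    Because the name is a static string, teardown only frees the cache object
    itself, so there is no name to leak or to free twice. This sketch does no
    locking; concurrent callers would need to serialize creation.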

    (This might also possibly fix a race creating the caches with
    parallel mounts running).

    [Folded in a fix from Dan Carpenter which fixed an off-by-one error in
    the original patch]

    Cc: stable@kernel.org
    Signed-off-by: Eric Sandeen
    Signed-off-by: "Theodore Ts'o"

    Eric Sandeen
     

08 Feb, 2011

1 commit

  • This fixes a corruption problem with the multi-block
    writepages submittal change for ext4, from commit
    bd2d0210cf22f2bd0cef72eb97cf94fc7d31d8cc ("ext4: use bio
    layer instead of buffer layer in mpage_da_submit_io").

    (Note that this corruption is not present in 2.6.37 on
    ext4, because the corruption was detected after the
    feature was merged in 2.6.37-rc1, and so it was turned
    off by adding a non-default mount option,
    mblk_io_submit. With this commit, which hopefully
    fixes the last of the bugs with this feature, we'll be
    able to turn on this performance feature by default in
    2.6.38, and remove the mblk_io_submit option.)

    The ext4 code path to bundle multiple pages for
    writeback in ext4_bio_write_page() had a bug: we should
    be clearing buffer head dirty flags *before* we submit
    the bio, not in the completion routine.

    The patch below was tested on 2.6.37 under KVM with the
    postgresql script which was submitted by Jon Nelson as
    documented in commit 1449032be1.

    Without the patch, I'd hit the corruption problem about
    50-70% of the time. With the patch, I executed the
    script > 100 times with no corruption seen.

    I also fixed a bug to make sure ext4_end_bio() doesn't
    dereference the bio after the bio_put() call.

    Reported-by: Jon Nelson
    Reported-by: Matthias Bayer
    Signed-off-by: Curt Wohlgemuth
    Signed-off-by: "Theodore Ts'o"
    Cc: stable@kernel.org

    Curt Wohlgemuth
     

04 Feb, 2011

3 commits

  • Make sure the correct cleanup happens if we die while trying to
    load the ext4 file system.

    Signed-off-by: "Theodore Ts'o"

    Theodore Ts'o
     
  • The ext4 features interface was not properly unregistered, which led to
    problems while unloading/reloading the ext4 module. This commit fixes that by
    adding proper kobject unregistration code to ext4_exit_fs() as well as to the
    fail path of ext4_init_fs().

    Reported-by: Eric Sandeen
    Signed-off-by: Lukas Czerner
    Signed-off-by: "Theodore Ts'o"
    Cc: stable@kernel.org

    Lukas Czerner
     
  • https://bugzilla.kernel.org/show_bug.cgi?id=27652

    If the lazyinit thread is running, the teardown function
    ext4_destroy_lazyinit_thread() has problems:

    ext4_clear_request_list();
    while (ext4_li_info->li_task) {
            wake_up(&ext4_li_info->li_wait_daemon);
            wait_event(ext4_li_info->li_wait_task,
                       ext4_li_info->li_task == NULL);
    }

    Clearing the request list will cause the thread to exit and free
    ext4_li_info, so then we're waiting on something which is getting
    freed.

    Fix this up by making the thread respond to kthread_stop, and exit,
    without the need to wait for that exit in some other homegrown way.

    Cc: stable@kernel.org
    Reported-and-Tested-by: Tao Ma
    Signed-off-by: Eric Sandeen
    Signed-off-by: "Theodore Ts'o"

    Eric Sandeen
     

02 Feb, 2011

1 commit

  • SELinux would like to implement a new labeling behavior of newly created
    inodes. We currently label new inodes based on the parent and the creating
    process. This new behavior would also take into account the name of the
    new object when deciding the new label. This is not the (supposed) full path,
    just the last component of the path.

    This is very useful because creating /etc/shadow is different than creating
    /etc/passwd but the kernel hooks are unable to differentiate these
    operations. We currently require that userspace realize it is doing some
    difficult operation like that and then userspace jumps through SELinux hoops
    to get things set up correctly. This patch does not implement new
    behavior, that is obviously contained in a separate SELinux patch, but it
    does pass the needed name down to the correct LSM hook. If no such name
    exists it is fine to pass NULL.

    Signed-off-by: Eric Paris

    Eric Paris
     

01 Feb, 2011

1 commit


21 Jan, 2011

2 commits


17 Jan, 2011

2 commits

  • Currently all filesystems except XFS implement fallocate asynchronously,
    while XFS forced a commit. Both of these are suboptimal - in case of O_SYNC
    I/O we really want our allocation on disk, especially for the !KEEP_SIZE
    case where we actually grow the file with user-visible zeroes. On the
    other hand, always committing the transaction is a bad idea for fast-path
    uses of fallocate, for example in recent Samba versions. Given
    that block allocation is a data plane operation anyway, change it from
    an inode operation to a file operation so that we have the file structure
    available that lets us check for O_SYNC.

    This also includes moving the code around for a few of the filesystems,
    and removing the already unneeded S_ISDIR checks given that we only wire
    up fallocate for regular files.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     
  • Instead of various home grown checks that might need updates for new
    flags just check for any bit outside the mask of the features supported
    by the filesystem. This makes the check future proof for any newly
    added flag.
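
    A minimal sketch of the mask check described above, written as plain
    userspace C. The flag values and names are hypothetical stand-ins, not
    the kernel's definitions:

```c
#include <errno.h>

/* Hypothetical flag bits, for illustration only. */
#define DEMO_FALLOC_FL_KEEP_SIZE	0x01
#define DEMO_FALLOC_FL_PUNCH_HOLE	0x02

/* The set of flags this hypothetical filesystem supports. */
#define DEMO_SUPPORTED_MODES	(DEMO_FALLOC_FL_KEEP_SIZE)

/* Reject any bit outside the supported mask, known or not. */
static int demo_check_fallocate_mode(int mode)
{
	if (mode & ~DEMO_SUPPORTED_MODES)
		return -EOPNOTSUPP;
	return 0;
}
```

    A flag added later is rejected automatically until it is added to the
    supported mask, which is what makes the check future proof.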

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     

14 Jan, 2011

4 commits

  • * 'for-2.6.38/core' of git://git.kernel.dk/linux-2.6-block: (43 commits)
    block: ensure that completion error gets properly traced
    blktrace: add missing probe argument to block_bio_complete
    block cfq: don't use atomic_t for cfq_group
    block cfq: don't use atomic_t for cfq_queue
    block: trace event block fix unassigned field
    block: add internal hd part table references
    block: fix accounting bug on cross partition merges
    kref: add kref_test_and_get
    bio-integrity: mark kintegrityd_wq highpri and CPU intensive
    block: make kblockd_workqueue smarter
    Revert "sd: implement sd_check_events()"
    block: Clean up exit_io_context() source code.
    Fix compile warnings due to missing removal of a 'ret' variable
    fs/block: type signature of major_to_index(int) to major_to_index(unsigned)
    block: convert !IS_ERR(p) && p to !IS_ERR_NOR_NULL(p)
    cfq-iosched: don't check cfqg in choose_service_tree()
    fs/splice: Pull buf->ops->confirm() from splice_from_pipe actors
    cdrom: export cdrom_check_events()
    sd: implement sd_check_events()
    sr: implement sr_check_events()
    ...

    Linus Torvalds
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (41 commits)
    fs: add documentation on fallocate hole punching
    Gfs2: fail if we try to use hole punch
    Btrfs: fail if we try to use hole punch
    Ext4: fail if we try to use hole punch
    Ocfs2: handle hole punching via fallocate properly
    XFS: handle hole punching via fallocate properly
    fs: add hole punching to fallocate
    vfs: pass struct file to do_truncate on O_TRUNC opens (try #2)
    fix signedness mess in rw_verify_area() on 64bit architectures
    fs: fix kernel-doc for dcache::prepend_path
    fs: fix kernel-doc for dcache::d_validate
    sanitize ecryptfs ->mount()
    switch afs
    move internal-only parts of ncpfs headers to fs/ncpfs
    switch ncpfs
    switch 9p
    pass default dentry_operations to mount_pseudo()
    switch hostfs
    switch affs
    switch configfs
    ...

    Linus Torvalds
     
  • * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (43 commits)
    Documentation/trace/events.txt: Remove obsolete sched_signal_send.
    writeback: fix global_dirty_limits comment runtime -> real-time
    ppc: fix comment typo singal -> signal
    drivers: fix comment typo diable -> disable.
    m68k: fix comment typo diable -> disable.
    wireless: comment typo fix diable -> disable.
    media: comment typo fix diable -> disable.
    remove doc for obsolete dynamic-printk kernel-parameter
    remove extraneous 'is' from Documentation/iostats.txt
    Fix spelling milisec -> ms in snd_ps3 module parameter description
    Fix spelling mistakes in comments
    Revert conflicting V4L changes
    i7core_edac: fix typos in comments
    mm/rmap.c: fix comment
    sound, ca0106: Fix assignment to 'channel'.
    hrtimer: fix a typo in comment
    init/Kconfig: fix typo
    anon_inodes: fix wrong function name in comment
    fix comment typos concerning "consistent"
    poll: fix a typo in comment
    ...

    Fix up trivial conflicts in:
    - drivers/net/wireless/iwlwifi/iwl-core.c (moved to iwl-legacy.c)
    - fs/ext4/ext4.h

    Also fix missed 'diabled' typo in drivers/net/bnx2x/bnx2x.h while at it.

    Linus Torvalds
     
  • pr_warning_ratelimited() doesn't exist.

    Also include printk.h, which defines these things.

    Cc: Theodore Ts'o
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     

13 Jan, 2011

2 commits

  • Ext4 doesn't have the ability to punch holes yet, so make sure we return
    EOPNOTSUPP if we try to use hole punching through fallocate. This support can
    be added later. Thanks,

    Acked-by: Jan Kara
    Signed-off-by: Josef Bacik
    Signed-off-by: Al Viro

    Josef Bacik
     
  • As Al Viro pointed out, path resolution during Q_QUOTAON calls to quotactl
    is prone to deadlocks. We hold the s_umount semaphore for reading during the
    path resolution, and the resolution itself may need to acquire the semaphore
    for writing when, e.g., an autofs mountpoint is passed.

    Solve the problem by performing the resolution before we get hold of the
    superblock (and thus the s_umount semaphore). The whole thing is complicated
    by the fact that some filesystems (OCFS2) ignore the path argument. So to
    distinguish between filesystems which want the path and those which do not, we
    introduce a new .quota_on_meta callback which does not get the path. OCFS2
    then uses this callback instead of the old .quota_on.

    CC: Al Viro
    CC: Christoph Hellwig
    CC: Ted Ts'o
    CC: Joel Becker
    Signed-off-by: Jan Kara

    Jan Kara
     

12 Jan, 2011

3 commits

  • * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (44 commits)
    ext4: fix trimming starting with block 0 with small blocksize
    ext4: revert buggy trim overflow patch
    ext4: don't pass entire map to check_eofblocks_fl
    ext4: fix memory leak in ext4_free_branches
    ext4: remove ext4_mb_return_to_preallocation()
    ext4: flush the i_completed_io_list during ext4_truncate
    ext4: add error checking to calls to ext4_handle_dirty_metadata()
    ext4: fix trimming of a single group
    ext4: fix uninitialized variable in ext4_register_li_request
    ext4: dynamically allocate the jbd2_inode in ext4_inode_info as necessary
    ext4: drop i_state_flags on architectures with 64-bit longs
    ext4: reorder ext4_inode_info structure elements to remove unneeded padding
    ext4: drop ec_type from the ext4_ext_cache structure
    ext4: use ext4_lblk_t instead of sector_t for logical blocks
    ext4: replace i_delalloc_reserved_flag with EXT4_STATE_DELALLOC_RESERVED
    ext4: fix 32bit overflow in ext4_ext_find_goal()
    ext4: add more error checks to ext4_mkdir()
    ext4: ext4_ext_migrate should use NULL not 0
    ext4: Use ext4_error_file() to print the pathname to the corrupted inode
    ext4: use IS_ERR() to check for errors in ext4_error_file
    ...

    Linus Torvalds
     
  • When s_first_data_block is not zero (which happens e.g. when block size is 1KB)
    and trim ioctl is called to start trimming from block 0, the math in
    ext4_get_group_no_and_offset() overflows. The overall result is that ioctl
    returns EINVAL which is kind of unexpected and we probably don't want
    userspace tools to bother with internal details of filesystem structure.
    So just silently increase starting offset (and shorten length) when starting
    block is below s_first_data_block.
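
    The adjustment described above amounts to clamping the range. A small
    userspace sketch, with a hypothetical struct and function name and all
    values counted in blocks:

```c
#include <stdint.h>

struct demo_trim_range {
	uint64_t start;	/* first block to trim */
	uint64_t len;	/* number of blocks to trim */
};

/*
 * If the range starts below the first data block, move the start up
 * and shorten the length by the number of blocks skipped.
 */
static void demo_clamp_trim_range(struct demo_trim_range *r,
				  uint64_t first_data_block)
{
	if (r->start < first_data_block) {
		uint64_t skipped = first_data_block - r->start;

		r->len = (r->len > skipped) ? r->len - skipped : 0;
		r->start = first_data_block;
	}
}
```

    With a first data block of 1, a request for blocks starting at 0 with
    length 100 becomes a request starting at 1 with length 99, so the end of
    the range is unchanged.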

    CC: Lukas Czerner
    Signed-off-by: Jan Kara
    Signed-off-by: "Theodore Ts'o"

    Jan Kara
     
  • This reverts commit 4f531501e44: ext4: fix possible overflow in
    ext4_trim_fs()

    Signed-off-by: "Theodore Ts'o"

    Theodore Ts'o
     

11 Jan, 2011

10 commits

  • Since check_eofblocks_fl() only uses the m_lblk portion of the map
    structure, we may as well pass that directly, rather than passing the
    entire map, which IMHO obfuscates what parameters check_eofblocks_fl()
    cares about. Not a big deal, but seems tidier and less confusing, to
    me.

    Signed-off-by: Eric Sandeen
    Signed-off-by: "Theodore Ts'o"

    Eric Sandeen
     
  • Commit 40389687 moved a call to ext4_forget() out of
    ext4_free_branches and let ext4_free_blocks() handle calling
    bforget(). But that change unfortunately did not replace the call to
    ext4_forget() with brelse(), which was needed to drop the in-use count
    of the indirect block's buffer head, which led to a memory leak when
    deleting files that used indirect blocks. Fix this.

    Thanks to Hugh Dickins for pointing this out.

    Cc: stable@kernel.org
    Signed-off-by: "Theodore Ts'o"

    Theodore Ts'o
     
  • This function was never implemented, except for a BUG_ON which was
    tripping when ext4 is run without a journal. The problem is that
    although the comment asserts that "truncate (which is the only way to
    free block) discards all preallocations", ext4_free_blocks() is also
    called in various error recovery paths when blocks have been
    allocated, but for various reasons, we were not able to use those data
    blocks (for example, because we ran out of memory while trying to
    manipulate the extent tree, or some other similar situation).

    In addition to the fact that this function isn't implemented except
    for the incorrect BUG_ON, the single caller of this function,
    ext4_free_blocks(), doesn't use it at all if the journal is enabled.

    So remove the (stub) function entirely for now. If we decide it's
    better to add it back, it's only going to be useful with a relatively
    large number of code changes anyway.

    Google-Bug-Id: 3236408

    Cc: Jiaying Zhang
    Signed-off-by: "Theodore Ts'o"

    Theodore Ts'o
     
  • Ted first found the bug when running a 2.6.36 kernel with the dioread_nolock
    mount option: xfstests #13 complained about a wrong file size during fsck.
    However, the bug exists in older kernels as well, although it is
    somewhat harder to trigger.

    The problem is that ext4_end_io_work() can happen after we have truncated an
    inode to a smaller size. Then when ext4_end_io_work() calls
    ext4_convert_unwritten_extents(), we may reallocate some blocks that have
    been truncated, so the inode size becomes inconsistent with the allocated
    blocks.

    The following patch flushes the i_completed_io_list during truncate to reduce
    the risk that some pending end_io requests are executed later and convert
    already truncated blocks to initialized.

    Note that although the fix helps reduce the problem a lot there may still
    be a race window between vmtruncate() and ext4_end_io_work(). The fundamental
    problem is that if vmtruncate() is called without either i_mutex or i_alloc_sem
    held, it can race with an ongoing write request so that the io_end request is
    processed later when the corresponding blocks have been truncated.

    Ted and I have discussed the problem offline and we saw a few ways to fix
    the race completely:

    a) We guarantee that i_mutex lock and i_alloc_sem write lock are both held
    whenever vmtruncate() is called. The i_mutex lock prevents any new write
    requests from entering writeback and the i_alloc_sem prevents the race
    from ext4_page_mkwrite(). Currently we hold both locks if vmtruncate()
    is called from do_truncate(), which is probably the most common case.
    However, there are places where we may call vmtruncate() without holding
    either i_mutex or i_alloc_sem. I would like to ask for other people's
    opinions on what locks are expected to be held before calling vmtruncate().
    There seems to be disagreement among the callers of that function.

    b) We change the ext4 write path so that we change the extent tree to contain
    the newly allocated blocks and update i_size both at the same time --- when
    the write of the data blocks is completed.

    c) We add some additional locking to synchronize vmtruncate() and
    ext4_end_io_work(). This approach may have performance implications so we
    need to be careful.

    All of the above proposals may require more substantial changes, so
    we may consider taking the following patch as a band-aid.

    Signed-off-by: Jiaying Zhang
    Signed-off-by: "Theodore Ts'o"

    Jiaying Zhang
     
  • Call ext4_std_error() in various places when we can't bail out
    cleanly, so the file system can be marked as in error.

    Signed-off-by: "Theodore Ts'o"

    Theodore Ts'o
     
  • When ext4_trim_fs() is called to trim a part of a single group, the
    logic will wrongly set last block of the interval to 'len' instead
    of 'first_block + len'. Thus a shorter interval is possibly trimmed.
    Fix it.
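
    A worked example of the arithmetic, using hypothetical helper names. It
    only contrasts the two expressions quoted above and is not the actual
    ext4_trim_fs() code:

```c
typedef unsigned long long demo_blk_t;

/* The buggy form: the end of the interval ignores where it starts. */
static demo_blk_t demo_last_block_buggy(demo_blk_t first_block, demo_blk_t len)
{
	(void)first_block;
	return len;
}

/* The fixed form: the end is relative to the first block. */
static demo_blk_t demo_last_block_fixed(demo_blk_t first_block, demo_blk_t len)
{
	return first_block + len;
}
```

    For first_block = 100 and len = 50, the buggy form yields 50, which lies
    below the start of the interval, while the fixed form yields 150.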

    CC: Lukas Czerner
    Cc: stable@kernel.org
    Signed-off-by: Jan Kara
    Signed-off-by: "Theodore Ts'o"

    Jan Kara
     
  • fs/ext4/super.c: In function 'ext4_register_li_request':
    fs/ext4/super.c:2936: warning: 'ret' may be used uninitialized in this function

    It looks buggy to me, too.

    Cc: Lukas Czerner
    Cc: stable@kernel.org
    Signed-off-by: Andrew Morton
    Signed-off-by: "Theodore Ts'o"

    Andrew Morton
     
  • Replace the jbd2_inode structure (which is 48 bytes) with a pointer
    and only allocate the jbd2_inode when it is needed --- that is, when
    the file system has a journal present and the inode has been opened
    for writing. This allows us to further slim down the ext4_inode_info
    structure.
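
    The embed-versus-pointer trade-off can be sketched in userspace as
    below. The struct names and the 48-byte payload are illustrative
    assumptions, and a real implementation would also need locking against
    concurrent openers, which this sketch omits:

```c
#include <stdlib.h>

/* Stand-in for the 48-byte journal bookkeeping structure. */
struct demo_journal_inode {
	char state[48];
};

struct demo_inode {
	int has_journal;			/* file system has a journal */
	struct demo_journal_inode *jinode;	/* NULL until first needed */
};

/* Allocate the journal state only when opening for write with a journal. */
static int demo_open_for_write(struct demo_inode *inode)
{
	if (!inode->has_journal || inode->jinode)
		return 0;
	inode->jinode = calloc(1, sizeof(*inode->jinode));
	return inode->jinode ? 0 : -1;
}

/* Release the journal state when the inode is torn down. */
static void demo_release(struct demo_inode *inode)
{
	free(inode->jinode);
	inode->jinode = NULL;
}
```

    Inodes that are only ever read, or that live on a file system without a
    journal, pay for one pointer instead of the full embedded structure.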

    Signed-off-by: "Theodore Ts'o"

    Theodore Ts'o
     
  • We can store the dynamic inode state flags in the high bits of
    EXT4_I(inode)->i_flags, and eliminate i_state_flags. This saves 8
    bytes from the size of ext4_inode_info structure, which when
    multiplied by the number of inodes in the inode cache, can save
    a lot of memory.
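
    The idea of keeping a second set of flags in the upper half of a 64-bit
    word can be sketched as follows. The helper names and the shift of 32 are
    assumptions for illustration, and these helpers are not atomic, which
    kernel code touching shared flags would need to be:

```c
#include <stdint.h>

/* Dynamic state bits live above the low 32 bits of the flags word. */
#define DEMO_STATE_SHIFT 32

/* bit must be in the range 0..31 for all three helpers. */
static void demo_set_state(uint64_t *flags, unsigned bit)
{
	*flags |= UINT64_C(1) << (bit + DEMO_STATE_SHIFT);
}

static void demo_clear_state(uint64_t *flags, unsigned bit)
{
	*flags &= ~(UINT64_C(1) << (bit + DEMO_STATE_SHIFT));
}

static int demo_test_state(const uint64_t *flags, unsigned bit)
{
	return (int)((*flags >> (bit + DEMO_STATE_SHIFT)) & 1);
}
```

    Setting or clearing a state bit leaves the low 32 bits untouched, so the
    ordinary flags and the state flags can share one word.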

    Signed-off-by: "Theodore Ts'o"

    Theodore Ts'o
     
  • By reordering the elements in the ext4_inode_info structure, we can
    reduce the padding needed on an x86_64 system by 16 bytes.
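
    How member order affects padding can be seen with two toy structures.
    These are not the ext4_inode_info members, just a minimal illustration
    with made-up names:

```c
#include <stddef.h>

/* Small and large members interleaved: padding after each char. */
struct demo_original {
	char a;
	long b;
	char c;
	long d;
};

/* Same members, large ones first: only tail padding remains. */
struct demo_reordered {
	long b;
	long d;
	char a;
	char c;
};
```

    On a typical LP64 ABI such as x86_64, where long is 8 bytes with 8-byte
    alignment, the first layout occupies 32 bytes and the second 24. Exact
    sizes depend on the compiler and target.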

    Signed-off-by: "Theodore Ts'o"

    Theodore Ts'o