11 Jan, 2012

2 commits

  • Conflicts:
    fs/ext4/ioctl.c

    Theodore Ts'o
     
  • Commit 503358ae01b70ce6909d19dd01287093f6b6271c ("ext4: avoid divide by
    zero when trying to mount a corrupted file system") fixes CVE-2009-4307
    by performing a sanity check on s_log_groups_per_flex, since it can be
    set to a bogus value by an attacker.

    sbi->s_log_groups_per_flex = sbi->s_es->s_log_groups_per_flex;
    groups_per_flex = 1 << sbi->s_log_groups_per_flex;

    if (groups_per_flex < 2) { ... }

    This patch fixes two potential issues in the previous commit.

    1) The sanity check might only work on architectures like PowerPC.
    On x86, 5 bits are used for the shifting amount. That means, given a
    large s_log_groups_per_flex value like 36, groups_per_flex = 1 << 36
    is essentially 1 << 4 = 16, rather than 0. This will bypass the check,
    leaving s_log_groups_per_flex and groups_per_flex inconsistent.

    2) The sanity check relies on undefined behavior, i.e., an oversized
    shift. A standards-conforming C compiler could rewrite the check in
    unexpected ways. Consider the following equivalent form, assuming
    groups_per_flex is unsigned for simplicity.

    groups_per_flex = 1 << sbi->s_log_groups_per_flex;
    if (groups_per_flex == 0 || groups_per_flex == 1) {

    We compile the code snippet using Clang 3.0 and GCC 4.6. Clang will
    completely optimize away the check groups_per_flex == 0, leaving the
    patched code as vulnerable as the original. GCC keeps the check, but
    there is no guarantee that future versions will do the same.

    Signed-off-by: Xi Wang
    Signed-off-by: "Theodore Ts'o"
    Cc: stable@vger.kernel.org

    Xi Wang
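
One way to avoid both issues described above is to validate the shift amount itself before shifting, so the shift is never oversized. The following is a minimal user-space sketch under that assumption, not the kernel's actual code; the helper name `groups_per_flex_checked` and its return convention are hypothetical.

```c
#include <stdint.h>

/*
 * Hypothetical helper: check the log2 value first, then shift.
 * Returns 0 and stores the result on success, -1 if the value is out of range.
 */
static int groups_per_flex_checked(unsigned int log_groups_per_flex,
                                   unsigned int *groups_per_flex)
{
    /*
     * A value of 0 would give groups_per_flex == 1, which the original
     * check (groups_per_flex < 2) also rejects. A shift of 32 or more on
     * a 32-bit operand is undefined behavior, so reject it up front.
     */
    if (log_groups_per_flex < 1 || log_groups_per_flex > 31)
        return -1;

    *groups_per_flex = 1U << log_groups_per_flex;
    return 0;
}
```

With this shape, a bogus value such as 36 is rejected regardless of how the hardware or the compiler treats an oversized shift.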
     

10 Jan, 2012

1 commit

  • a) leaking root dentry is bad
    b) in case of failed ext4_mb_init() we don't want to do ext4_mb_release()
    c) OTOH, in the same case we *do* want ext4_ext_release()

    Signed-off-by: Al Viro

    Al Viro
     

09 Jan, 2012

1 commit

  • * 'pm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (76 commits)
    PM / Hibernate: Implement compat_ioctl for /dev/snapshot
    PM / Freezer: fix return value of freezable_schedule_timeout_killable()
    PM / shmobile: Allow the A4R domain to be turned off at run time
    PM / input / touchscreen: Make st1232 use device PM QoS constraints
    PM / QoS: Introduce dev_pm_qos_add_ancestor_request()
    PM / shmobile: Remove the stay_on flag from SH7372's PM domains
    PM / shmobile: Don't include SH7372's INTCS in syscore suspend/resume
    PM / shmobile: Add support for the sh7372 A4S power domain / sleep mode
    PM: Drop generic_subsys_pm_ops
    PM / Sleep: Remove forward-only callbacks from AMBA bus type
    PM / Sleep: Remove forward-only callbacks from platform bus type
    PM: Run the driver callback directly if the subsystem one is not there
    PM / Sleep: Make pm_op() and pm_noirq_op() return callback pointers
    PM/Devfreq: Add Exynos4-bus device DVFS driver for Exynos4210/4212/4412.
    PM / Sleep: Merge internal functions in generic_ops.c
    PM / Sleep: Simplify generic system suspend callbacks
    PM / Hibernate: Remove deprecated hibernation snapshot ioctls
    PM / Sleep: Fix freezer failures due to racy usermodehelper_is_disabled()
    ARM: S3C64XX: Implement basic power domain support
    PM / shmobile: Use common always on power domain governor
    ...

    Fix up trivial conflict in fs/xfs/xfs_buf.c due to removal of unused
    XBT_FORCE_SLEEP bit

    Linus Torvalds
     

07 Jan, 2012

2 commits


05 Jan, 2012

1 commit


04 Jan, 2012

1 commit

  • Seeing that just about every destructor got that INIT_LIST_HEAD() copied into
    it, there is no point whatsoever keeping this INIT_LIST_HEAD in inode_init_once();
    the cost of taking it into inode_init_always() will be negligible for pipes
    and sockets and negative for everything else. Not to mention the removal of
    boilerplate code from ->destroy_inode() instances...

    Signed-off-by: Al Viro

    Al Viro
     

22 Dec, 2011

1 commit

  • * master: (848 commits)
    SELinux: Fix RCU deref check warning in sel_netport_insert()
    binary_sysctl(): fix memory leak
    mm/vmalloc.c: remove static declaration of va from __get_vm_area_node
    ipmi_watchdog: restore settings when BMC reset
    oom: fix integer overflow of points in oom_badness
    memcg: keep root group unchanged if creation fails
    nilfs2: potential integer overflow in nilfs_ioctl_clean_segments()
    nilfs2: unbreak compat ioctl
    cpusets: stall when updating mems_allowed for mempolicy or disjoint nodemask
    evm: prevent racing during tfm allocation
    evm: key must be set once during initialization
    mmc: vub300: fix type of firmware_rom_wait_states module parameter
    Revert "mmc: enable runtime PM by default"
    mmc: sdhci: remove "state" argument from sdhci_suspend_host
    x86, dumpstack: Fix code bytes breakage due to missing KERN_CONT
    IB/qib: Correct sense on freectxts increment and decrement
    RDMA/cma: Verify private data length
    cgroups: fix a css_set not found bug in cgroup_attach_proc
    oprofile: Fix uninitialized memory access when writing to writing to oprofilefs
    Revert "xen/pv-on-hvm kexec: add xs_reset_watches to shutdown watches from old kernel"
    ...

    Conflicts:
    kernel/cgroup_freezer.c

    Rafael J. Wysocki
     

19 Dec, 2011

1 commit


13 Dec, 2011

1 commit


24 Nov, 2011

1 commit

  • * 'pm-freezer' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc: (24 commits)
    freezer: fix wait_event_freezable/__thaw_task races
    freezer: kill unused set_freezable_with_signal()
    dmatest: don't use set_freezable_with_signal()
    usb_storage: don't use set_freezable_with_signal()
    freezer: remove unused @sig_only from freeze_task()
    freezer: use lock_task_sighand() in fake_signal_wake_up()
    freezer: restructure __refrigerator()
    freezer: fix set_freezable[_with_signal]() race
    freezer: remove should_send_signal() and update frozen()
    freezer: remove now unused TIF_FREEZE
    freezer: make freezing() test freeze conditions in effect instead of TIF_FREEZE
    cgroup_freezer: prepare for removal of TIF_FREEZE
    freezer: clean up freeze_processes() failure path
    freezer: kill PF_FREEZING
    freezer: test freezable conditions while holding freezer_lock
    freezer: make freezing indicate freeze condition in effect
    freezer: use dedicated lock instead of task_lock() + memory barrier
    freezer: don't distinguish nosig tasks on thaw
    freezer: remove racy clear_freeze_flag() and set PF_NOFREEZE on dead tasks
    freezer: rename thaw_process() to __thaw_task() and simplify the implementation
    ...

    Rafael J. Wysocki
     

22 Nov, 2011

1 commit

  • There is no reason to export two functions for entering the
    refrigerator. Calling refrigerator() instead of try_to_freeze()
    doesn't save anything noticeable or remove any race condition.

    * Rename refrigerator() to __refrigerator() and make it return bool
    indicating whether it scheduled out for freezing.

    * Update try_to_freeze() to return bool and relay the return value of
    __refrigerator() if freezing().

    * Convert all refrigerator() users to try_to_freeze().

    * Update documentation accordingly.

    * While at it, add might_sleep() to try_to_freeze().

    Signed-off-by: Tejun Heo
    Cc: Samuel Ortiz
    Cc: Chris Mason
    Cc: "Theodore Ts'o"
    Cc: Steven Whitehouse
    Cc: Andrew Morton
    Cc: Jan Kara
    Cc: KONISHI Ryusuke
    Cc: Christoph Hellwig

    Tejun Heo
     

07 Nov, 2011

2 commits


27 Oct, 2011

1 commit

  • This patch introduces a fast path in ext4_ext_convert_to_initialized()
    for the case when the conversion can be performed by transferring
    the newly initialized blocks from the uninitialized extent into
    an adjacent initialized extent. Doing so removes the expensive
    invocations of memmove() which occur during extent insertion and
    the subsequent merge.

    In practice this should be the common case for clients performing
    append writes into files pre-allocated via
    fallocate(FALLOC_FL_KEEP_SIZE). In such a workload performed via
    direct IO and when using a suboptimal implementation of memmove()
    (x86_64 prior to the 2.6.39 rewrite), this patch reduces kernel CPU
    consumption by 32%.

    Two new trace points are added to ext4_ext_convert_to_initialized()
    to offer visibility into its operations. No exit trace point has
    been added due to the multiplicity of return points. This can be
    revisited once the upstream cleanup is backported.

    Signed-off-by: Eric Gouriou
    Signed-off-by: "Theodore Ts'o"

    Eric Gouriou
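
The fast path described above can be pictured with a simplified extent model. This is only a sketch under stated assumptions: `struct extent_sketch` and `try_transfer_to_prev` are hypothetical names, and the real function also has to consider physical block contiguity and maximum extent length, which are omitted here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified, hypothetical extent: logical start, length in blocks, state. */
struct extent_sketch {
    uint32_t start;
    uint32_t len;
    bool initialized;
};

/*
 * Try to move the first `count` blocks of the uninitialized extent into
 * the adjacent, preceding initialized extent. Returns true if the fast
 * path applied; false means the caller must take the slow path.
 */
static bool try_transfer_to_prev(struct extent_sketch *prev,
                                 struct extent_sketch *uninit,
                                 uint32_t write_start, uint32_t count)
{
    if (!prev->initialized || uninit->initialized)
        return false;
    if (prev->start + prev->len != uninit->start)
        return false;               /* extents must be logically adjacent */
    if (write_start != uninit->start)
        return false;               /* write must begin at the boundary */
    if (count == 0 || count >= uninit->len)
        return false;               /* leave whole-extent conversion to the slow path */

    prev->len += count;             /* grow the initialized extent */
    uninit->start += count;         /* shrink the uninitialized one */
    uninit->len -= count;
    return true;
}
```

Only the two extent headers change, so no entries need to be shifted within the extent array, which is where the memmove() savings in an append-write workload would come from.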
     

09 Oct, 2011

2 commits

  • This fixes a bug which was introduced in dd68314ccf3fb. The problem
    came from the test of the return value of proc_mkdir, which is always
    false without procfs, and this would break initialization of ext4.

    Signed-off-by: Fabrice Jouhaud
    Signed-off-by: "Theodore Ts'o"

    Fabrice Jouhaud
     
  • For a long time now orlov has been the default block allocator in
    ext4. It performs better than the old one and no one seems to claim
    otherwise, so we can safely drop the old allocator and deprecate the
    oldalloc and orlov mount options.

    This is part of the effort to reduce the number of ext4 options and
    hence the test matrix.

    Signed-off-by: Lukas Czerner
    Signed-off-by: "Theodore Ts'o"

    Lukas Czerner
     

07 Oct, 2011

1 commit


10 Sep, 2011

9 commits


04 Sep, 2011

1 commit

  • If the user explicitly specifies conflicting mount options for
    delalloc or dioread_nolock and data=journal, fail the mount instead
    of printing a warning and continuing (since many users won't look at
    dmesg and notice the warning).

    Also, print a single warning that data=journal implies that delayed
    allocation is not on by default (since it's not supported), and
    furthermore that O_DIRECT is not supported. Improve the text in
    Documentation/filesystems/ext4.txt so this is clear there as well.

    Similarly, if the dioread_nolock mount option is specified when the
    file system block size != PAGE_SIZE, fail the mount instead of
    printing a warning message and ignoring the mount option.

    Signed-off-by: "Theodore Ts'o"

    Theodore Ts'o
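
The rules in this message amount to a small validation step at mount time. The sketch below is a user-space illustration only; `struct mount_opts_sketch`, `validate_mount_opts`, and the choice of -EINVAL as the failure code are assumptions, not taken from the patch.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of the options the message talks about. */
struct mount_opts_sketch {
    bool data_journal;        /* data=journal was requested */
    bool delalloc_explicit;   /* delalloc was explicitly requested */
    bool dioread_nolock;      /* dioread_nolock was requested */
    size_t block_size;        /* file system block size */
    size_t page_size;         /* PAGE_SIZE on this machine */
};

/* Returns 0 if the combination is acceptable, a negative errno otherwise. */
static int validate_mount_opts(const struct mount_opts_sketch *o)
{
    /* data=journal conflicts with explicit delalloc and with dioread_nolock. */
    if (o->data_journal && (o->delalloc_explicit || o->dioread_nolock))
        return -EINVAL;

    /* dioread_nolock requires block size == page size. */
    if (o->dioread_nolock && o->block_size != o->page_size)
        return -EINVAL;

    return 0;
}
```

Failing early like this makes the conflict visible to the caller of mount(2) rather than only in the kernel log.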
     

14 Aug, 2011

1 commit

  • Flush inode's i_completed_io_list before calling ext4_io_wait to
    prevent the following deadlock scenario: A page fault happens while
    some process is writing inode A. During page fault,
    shrink_icache_memory is called that in turn evicts another inode
    B. Inode B has some pending io_end work so it calls ext4_ioend_wait()
    that waits for inode B's i_ioend_count to become zero. However, inode
    B's ioend work was queued behind some of inode A's ioend work on the
    same cpu's ext4-dio-unwritten workqueue. As the ext4-dio-unwritten
    thread on that cpu is processing inode A's ioend work, it tries to
    grab inode A's i_mutex lock. Since the i_mutex lock of inode A has
    been held since before the page fault happened, we enter a deadlock.

    Also move ext4_flush_completed_IO and ext4_ioend_wait from
    ext4_destroy_inode() to ext4_evict_inode(). During inode deletion,
    ext4_evict_inode() is called before ext4_destroy_inode() and in
    ext4_evict_inode(), we may call ext4_truncate() without holding
    i_mutex lock. As a result, there is a race between flush_completed_IO
    that is called from ext4_ext_truncate() and ext4_end_io_work, which
    may cause corruption on an io_end structure. This change moves
    ext4_flush_completed_IO and ext4_ioend_wait from ext4_destroy_inode()
    to ext4_evict_inode() to resolve the race between ext4_truncate() and
    ext4_end_io_work during inode deletion.

    Signed-off-by: Jiaying Zhang
    Signed-off-by: "Theodore Ts'o"
    Cc: stable@kernel.org

    Jiaying Zhang
     

04 Aug, 2011

1 commit

  • Commit 9933fc0i (ext4: introduce ext4_kvmalloc(), ext4_kzalloc(), and
    ext4_kvfree()) introduced wrappers around k*alloc/vmalloc but contained
    a typo in ext4_kzalloc(): it called kmalloc() instead of kzalloc().

    Signed-off-by: Mathias Krause
    Signed-off-by: "Theodore Ts'o"

    Mathias Krause
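
The difference matters because callers of a "zalloc" wrapper rely on the memory being zeroed. As a rough user-space analogy only (using calloc() in place of kzalloc(), with the hypothetical name `sketch_kzalloc`), the intended contract looks like this:

```c
#include <stdlib.h>

/*
 * Hypothetical user-space stand-in for a zeroing allocator wrapper.
 * calloc() returns zero-filled memory, whereas malloc() (the analogue
 * of the kmalloc() typo) would leave the contents indeterminate.
 */
static void *sketch_kzalloc(size_t size)
{
    return calloc(1, size);
}
```

A caller that skips explicit initialization because the wrapper's name promises zeroed memory would read indeterminate data if the wrapper did not zero it.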
     

01 Aug, 2011

2 commits


27 Jul, 2011

1 commit

  • Before this patch, parallel resizers were allowed and protected by a
    mutex lock. There is actually no need to support parallel resizers, so
    this patch prevents them by using atomic bit ops, like lock_page() and
    unlock_page() do.

    To do this, the patch removes the mutex lock s_resize_lock from struct
    ext4_sb_info and adds an unsigned long field named s_resize_flags
    which indicates whether there is a resizer.

    Signed-off-by: Yongqiang Yang
    Signed-off-by: "Theodore Ts'o"

    Yongqiang Yang
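
The "one resizer at a time" idea can be modeled in user space with C11 atomics. This is a sketch of the technique (atomic test-and-set of a flag bit), not the kernel code; `struct sb_info_sketch`, `RESIZING_BIT`, `resize_begin`, and `resize_end` are hypothetical names, and only the field name s_resize_flags comes from the message above.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define RESIZING_BIT 0   /* hypothetical bit number inside s_resize_flags */

struct sb_info_sketch {
    atomic_ulong s_resize_flags;
};

/* Returns true if the caller became the resizer, false if one is already active. */
static bool resize_begin(struct sb_info_sketch *sbi)
{
    unsigned long mask = 1UL << RESIZING_BIT;
    /* Atomically set the bit and learn whether it was already set. */
    unsigned long old = atomic_fetch_or_explicit(&sbi->s_resize_flags, mask,
                                                 memory_order_acquire);
    return (old & mask) == 0;
}

/* Clears the bit so that a later resizer may start. */
static void resize_end(struct sb_info_sketch *sbi)
{
    unsigned long mask = 1UL << RESIZING_BIT;
    atomic_fetch_and_explicit(&sbi->s_resize_flags, ~mask,
                              memory_order_release);
}
```

Unlike a mutex, a second caller does not wait here: it sees the bit already set and can return an error immediately.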
     

18 Jul, 2011

1 commit

  • If the stripe width was set to 1, then this patch will ignore
    that stripe width and ext4 will act as if the stripe width
    were 0 with respect to optimizing allocations.

    Signed-off-by: Dan Ehrenberg
    Signed-off-by: "Theodore Ts'o"

    Dan Ehrenberg
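
Expressed as code, the rule is a simple normalization of the configured value. The helper name `effective_stripe` is hypothetical; this is a sketch of the described behavior, not the patch itself.

```c
/*
 * Hypothetical helper: a stripe width of 1 carries no alignment
 * information, so treat it the same as 0 (no stripe optimization).
 */
static unsigned long effective_stripe(unsigned long stripe_width)
{
    return stripe_width <= 1 ? 0 : stripe_width;
}
```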
     

11 Jul, 2011

1 commit


06 Jun, 2011

1 commit

  • Kazuya Mio reported that he was able to hit BUG_ON(next == lblock)
    in ext4_ext_put_gap_in_cache() while creating a sparse file in extent
    format and filling the tail of the file up to its end. We hit the
    BUG_ON when we write the last block (2^32-1) into the sparse file.

    The root cause of the problem lies in the fact that we specifically set
    s_maxbytes so that the block at s_maxbytes fits into the on-disk extent
    format, which is 32 bits long. However, we are not storing start and
    end block numbers, but rather the start block number and the length in
    blocks. This means that in order to cover an extent from 0 to
    EXT_MAX_BLOCK we need EXT_MAX_BLOCK+1 to fit into len (because we are
    counting block 0 as well), and it does not.

    The only way to fix it without changing the meaning of the struct
    ext4_extent members is, as Kazuya Mio suggested, to lower s_maxbytes
    by one fs block so we can cover the whole extent we can get by the
    on-disk extent format.

    Also, in many places EXT_MAX_BLOCK is used as a length instead of the
    maximum logical block number that the name suggests, which is all a bit
    messy. So this commit renames it to EXT_MAX_BLOCKS and changes its usage
    in some places to actually be the maximum number of blocks in the extent.

    The bug which this commit fixes can be reproduced as follows:

    dd if=/dev/zero of=/mnt/mp1/file bs= count=1 seek=$((2**32-2))
    sync
    dd if=/dev/zero of=/mnt/mp1/file bs= count=1 seek=$((2**32-1))

    Reported-by: Kazuya Mio
    Signed-off-by: Lukas Czerner
    Signed-off-by: "Theodore Ts'o"

    Lukas Czerner
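
The arithmetic behind the root cause can be checked directly. The sketch below treats both the start block and the length as 32-bit quantities, as the message describes; the helper name `range_fits_32bit_len` is hypothetical and it assumes first <= last.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Can the block range [first, last] be described by a 32-bit start and a
 * 32-bit length? The length is computed in 64 bits so it cannot wrap.
 */
static bool range_fits_32bit_len(uint32_t first, uint32_t last)
{
    uint64_t len = (uint64_t)last - first + 1;
    return len <= UINT32_MAX;
}
```

Covering blocks 0 through 2^32-1 needs a length of 2^32, which does not fit, while stopping one block earlier needs 2^32-1, which does. That is why lowering s_maxbytes by one file system block is sufficient.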
     

27 May, 2011

2 commits

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/djm/tmem:
    xen: cleancache shim to Xen Transcendent Memory
    ocfs2: add cleancache support
    ext4: add cleancache support
    btrfs: add cleancache support
    ext3: add cleancache support
    mm/fs: add hooks to support cleancache
    mm: cleancache core ops functions and config
    fs: add field to superblock to support cleancache
    mm/fs: cleancache documentation

    Fix up trivial conflict in fs/btrfs/extent_io.c due to includes

    Linus Torvalds
     
  • This seventh patch of eight in this cleancache series "opts-in"
    cleancache for ext4. Filesystems must explicitly enable cleancache
    by calling cleancache_init_fs anytime an instance of the filesystem
    is mounted. For ext4, all other cleancache hooks are in
    the VFS layer including the matching cleancache_flush_fs
    hook which must be called on unmount.

    Details and a FAQ can be found in Documentation/vm/cleancache.txt

    [v6-v8: no changes]
    [v5: jeremy@goop.org: simplify init hook and any future fs init changes]
    Signed-off-by: Dan Magenheimer
    Reviewed-by: Jeremy Fitzhardinge
    Reviewed-by: Konrad Rzeszutek Wilk
    Acked-by: Andreas Dilger
    Cc: Ted Ts'o
    Cc: Andrew Morton
    Cc: Al Viro
    Cc: Matthew Wilcox
    Cc: Nick Piggin
    Cc: Mel Gorman
    Cc: Rik Van Riel
    Cc: Jan Beulich
    Cc: Chris Mason
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Nitin Gupta

    Dan Magenheimer
     

25 May, 2011

1 commit

  • Prevent an ext4 filesystem from being mounted multiple times.
    A sequence number is stored on disk and is periodically updated (every 5
    seconds by default) by a mounted filesystem.
    At mount time, we now wait for s_mmp_update_interval seconds to make sure
    that the MMP sequence does not change.
    In case of failure, the nodename, bdevname and the time at which the MMP
    block was last updated are displayed.

    Signed-off-by: Andreas Dilger
    Signed-off-by: Johann Lombardi
    Signed-off-by: "Theodore Ts'o"

    Johann Lombardi