28 Apr, 2008

40 commits

  • Annoying gcc warning:

    fs/fat/inode.c: In function 'fat_fill_super':
    fs/fat/inode.c:1222: warning: comparison is always false due to limited range of data type

    Change it to compare with 4K instead of PAGE_CACHE_SIZE, as suggested
    by OGAWA-san.

    [FAT spec says: logical_sector_size should be 512, 1024, 2048, or 4096.]
    So, at least for now, we limit it to 4096.

    Signed-off-by: Olof Johansson
    Signed-off-by: OGAWA Hirofumi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Olof Johansson
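
A minimal sketch of the kind of range check described above, assuming the on-disk sector size is a 16-bit field (which is presumably why a comparison against a 64K page cache size can never be true). The name `toy_fat_sector_size_ok` and the macro are hypothetical and are not taken from fs/fat/inode.c.

```c
#include <stdbool.h>
#include <stdint.h>

/* Fixed upper bound from the commit text (4K); hypothetical macro name. */
#define TOY_FAT_MAX_SECTOR_SIZE 4096u

/*
 * Returns true if a 16-bit on-disk sector size is a power of two
 * between 512 and 4096 bytes inclusive.
 */
static bool toy_fat_sector_size_ok(uint16_t logical_sector_size)
{
    unsigned int size = logical_sector_size;

    if (size == 0 || (size & (size - 1)) != 0)
        return false;   /* zero or not a power of two */

    return size >= 512 && size <= TOY_FAT_MAX_SECTOR_SIZE;
}
```

Comparing against a fixed constant that fits in 16 bits keeps the test meaningful regardless of the page size of the machine.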
     
  • I received a complaint that some FAT-formatted media (e.g. SD memory cards)
    trigger an "unknown partition table" message even though they have no
    partition table and work correctly, while in general (e.g. when formatted
    with mkdosfs or even Windows Vista) this message is not shown.

    Currently this seems to happen only when the media are formatted with
    Windows XP (and possibly Windows 2000). In that case the boot indicator
    byte contains garbage (part of a text message), as do the other fields
    checked by msdos_partition, which later triggers this message.

    References: novell bug #364365

    Most FAT-formatted media without a partition table contain zeros in the
    boot indicator and the other tested bytes, and so fall through the checks
    in msdos_partition, which then returns 1 (all is fine).

    But some FAT-formatted media (e.g. formatted by Windows XP) do not use
    boot_ind, so the check fails and causes an "unknown partition table"
    warning even though there is no partition table and everything would be
    fine.

    This additional check directly verifies whether the medium is FAT-formatted
    and has no partition table.

    Signed-off-by: Frank Seidel
    Cc: Andreas Dilger
    Acked-by: OGAWA Hirofumi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Frank Seidel
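
The idea of recognising a FAT boot sector directly, instead of relying on the boot indicator byte, can be sketched as follows. This is an illustrative heuristic only, not the check the patch adds: the function names are hypothetical, and the field offsets (bytes per sector at 11, number of FATs at 16, media descriptor at 21) come from the published FAT boot sector layout.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Field offsets in the FAT boot sector (BPB). */
#define TOY_BPB_BYTES_PER_SECTOR 11  /* 16-bit little-endian */
#define TOY_BPB_NUM_FATS         16
#define TOY_BPB_MEDIA            21

/* Valid FAT media descriptors are 0xF0 and 0xF8..0xFF. */
static bool toy_valid_media(uint8_t media)
{
    return media >= 0xf8 || media == 0xf0;
}

/* Heuristic: does this sector plausibly start a FAT volume? */
static bool toy_looks_like_fat(const uint8_t *sector, size_t len)
{
    unsigned int bps;

    if (sector == NULL || len < 512)
        return false;

    bps = sector[TOY_BPB_BYTES_PER_SECTOR] |
          ((unsigned int)sector[TOY_BPB_BYTES_PER_SECTOR + 1] << 8);
    if (bps < 512 || bps > 4096 || (bps & (bps - 1)) != 0)
        return false;   /* sector size must be 512, 1024, 2048 or 4096 */

    if (sector[TOY_BPB_NUM_FATS] == 0)
        return false;   /* a FAT volume has at least one FAT */

    return toy_valid_media(sector[TOY_BPB_MEDIA]);
}
```

A sector that passes such a test can be treated as an unpartitioned FAT volume, so the partition parser has no reason to complain about it.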
     
  • The on-disk media specification field in FAT is only 8-bits, so testing for

    Acked-by: OGAWA Hirofumi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • __getname() is faster than __get_free_page(). Use it.

    Signed-off-by: OGAWA Hirofumi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    OGAWA Hirofumi
     
  • This patch fixes the problem that the buffer allocated for converting
    Unicode to UTF-8 in fat/dir.c is too small and cannot handle a filename
    of 255 Asian characters when mounted with the utf8 option.

    It also fixes the filename length check in vfat/namei.c: the length
    should be checked against the number of converted Unicode characters,
    not against the length before NLS/UTF-8 conversion.

    Signed-off-by: Keith Mok
    Signed-off-by: OGAWA Hirofumi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Keith Mok
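
To see why the buffer size matters, here is a small user-space sketch, assuming each 16-bit name unit is a Basic Multilingual Plane character (surrogate pairs are ignored). Such a unit needs up to three UTF-8 bytes, so 255 units can need 765 bytes plus a terminator. All names are hypothetical and this is not the code in fat/dir.c.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_NAME_MAX_UNITS    255  /* long-name limit, in 16-bit units */
#define TOY_UTF8_MAX_PER_UNIT 3    /* worst case for one BMP code unit */
#define TOY_UTF8_NAME_BUF     (TOY_NAME_MAX_UNITS * TOY_UTF8_MAX_PER_UNIT + 1)

/*
 * Convert 16-bit units to NUL-terminated UTF-8.
 * Returns the number of bytes written (excluding the NUL),
 * or (size_t)-1 if the output buffer is too small.
 */
static size_t toy_ucs2_to_utf8(const uint16_t *in, size_t units,
                               unsigned char *out, size_t cap)
{
    size_t n = 0;

    if (cap == 0)
        return (size_t)-1;

    for (size_t i = 0; i < units; i++) {
        uint16_t c = in[i];
        size_t need = c < 0x80 ? 1 : c < 0x800 ? 2 : 3;

        if (n + need + 1 > cap)     /* keep room for the NUL */
            return (size_t)-1;

        if (need == 1) {
            out[n++] = (unsigned char)c;
        } else if (need == 2) {
            out[n++] = (unsigned char)(0xC0 | (c >> 6));
            out[n++] = (unsigned char)(0x80 | (c & 0x3F));
        } else {
            out[n++] = (unsigned char)(0xE0 | (c >> 12));
            out[n++] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
            out[n++] = (unsigned char)(0x80 | (c & 0x3F));
        }
    }
    out[n] = '\0';
    return n;
}
```

With a buffer of `TOY_UTF8_NAME_BUF` bytes a name of 255 three-byte characters fits. The length limit itself is applied to `units`, the converted character count, rather than to the byte length.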
     
  • On some systems, an ftruncate() that expands a file on FAT caused an
    OOM. cont_expand_zero() filled all memory with dirty pages, and since
    the disk is very slow, the page scanning limit was exceeded, which
    triggered the OOM.

    This adds balance_dirty_pages_ratelimited() to avoid filling memory
    with dirty pages.

    Signed-off-by: OGAWA Hirofumi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    OGAWA Hirofumi
     
  • This removes unneeded fat_clusters_flush().

    Signed-off-by: OGAWA Hirofumi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    OGAWA Hirofumi
     
  • Currently, free_clusters is not updated until it is trusted, because
    Windows doesn't update it correctly.

    But if the user is using the Linux FAT driver, it updates free_clusters
    correctly. So this patch updates it even if it is untrusted; if
    free_clusters was correct, it now stays correct.

    Signed-off-by: OGAWA Hirofumi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    OGAWA Hirofumi
     
  • Normally utime(2) checks that the current process is the owner of the
    file or has the CAP_FOWNER capability. But the FAT filesystem doesn't
    store uid/gid on disk, so the normal check is too inflexible.

    With this option you can relax it.

    Signed-off-by: OGAWA Hirofumi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    OGAWA Hirofumi
     
  • Fix fat_setattr() for the showexec option case. If the user specified
    the showexec option, inode->i_mode may not have S_IXUGO. This just uses
    inode->i_mode to fix it.

    Also, with this patch we no longer allow chmod() to change the in-memory
    inode only, which is just bad behaviour. IOW, we allow changing only
    S_IWUGO, which can be stored on disk.

    Signed-off-by: OGAWA Hirofumi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    OGAWA Hirofumi
     
  • - Rename fat_notify_change() to fat_setattr()
    - check_mode() cleanup
    - Change layout of code

    Signed-off-by: OGAWA Hirofumi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    OGAWA Hirofumi
     
  • FAT doesn't need to check bad inode anymore.

    Signed-off-by: OGAWA Hirofumi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    OGAWA Hirofumi
     
  • Quota files cannot have tails because the quota_write and quota_read
    functions do not support them. So far, when a quota file did have a tail,
    we just refused to turn quotas on for it. Sadly, this check was wrong, so
    there are now plenty of installations where quota files don't have the
    NOTAIL flag set, and after fixing the check they suddenly fail to turn
    quotas on. Since it's easy to unpack the tail from the kernel, do this
    from reiserfs_quota_on(), which solves the problem and is generally nicer
    to users anyway.

    Signed-off-by: Jan Kara
    Reported-by:
    Cc: Jeff Mahoney
    Cc: Chris Mason
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kara
     
  • Call dquot_drop() from reiserfs_dquot_drop() even if we fail to start a
    transaction. Otherwise we never get to dropping references to quota
    structures from the inode and umount will hang indefinitely.

    Signed-off-by: Jan Kara
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kara
     
  • __FUNCTION__ is gcc-specific, use __func__

    Signed-off-by: Harvey Harrison
    Cc: Chris Mason
    Cc: Jeff Mahoney
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Harvey Harrison
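
For readers unfamiliar with the identifier: `__func__` is the predefined name standardised in C99, while `__FUNCTION__` is a compiler extension. A minimal sketch of a trace helper built on it, with hypothetical names `TOY_TRACE` and `toy_mount`:

```c
#include <stdio.h>

/* Format "<function>: <msg>" into buf using the standard __func__. */
#define TOY_TRACE(buf, len, msg) \
    snprintf((buf), (len), "%s: %s", __func__, (msg))

/* Example caller: the prefix is this function's own name. */
static int toy_mount(char *buf, size_t len)
{
    return TOY_TRACE(buf, len, "mounting");
}
```

Because `__func__` is expanded inside the calling function, the macro picks up the caller's name without it being passed explicitly.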
     
  • fs/reiserfs/do_balan.c:1467:10: warning: symbol 'ret_val' shadows an earlier one
    fs/reiserfs/do_balan.c:275:6: originally declared here
    fs/reiserfs/do_balan.c:1471:23: warning: symbol 'ih' shadows an earlier one
    fs/reiserfs/do_balan.c:249:67: originally declared here

    Signed-off-by: Harvey Harrison
    Cc: Chris Mason
    Cc: Jeff Mahoney
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Harvey Harrison
     
  • fs/reiserfs/journal.c:4319:2: warning: returning void-valued expression

    Signed-off-by: Harvey Harrison
    Cc: Chris Mason
    Cc: Jeff Mahoney
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Harvey Harrison
     
  • replace all:
    little_endian_variable = cpu_to_leX(leX_to_cpu(little_endian_variable) +
    expression_in_cpu_byteorder);
    with:
    leX_add_cpu(&little_endian_variable, expression_in_cpu_byteorder);
    generated with semantic patch

    Signed-off-by: Marcin Slusarz
    Cc: Jeff Mahoney
    Cc: Chris Mason
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Marcin Slusarz
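
The transformation above can be sketched in portable user-space C. The helpers below are stand-ins written for this illustration (the `toy_` prefix marks them as hypothetical) and use byte-wise access, so they behave the same on little-endian and big-endian hosts.

```c
#include <stdint.h>

/* Read a little-endian 32-bit value into CPU byte order. */
static uint32_t toy_le32_to_cpu(uint32_t le)
{
    const unsigned char *p = (const unsigned char *)&le;

    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Store a CPU-order value as little-endian. */
static uint32_t toy_cpu_to_le32(uint32_t v)
{
    uint32_t le;
    unsigned char *p = (unsigned char *)&le;

    p[0] = (unsigned char)(v & 0xff);
    p[1] = (unsigned char)((v >> 8) & 0xff);
    p[2] = (unsigned char)((v >> 16) & 0xff);
    p[3] = (unsigned char)((v >> 24) & 0xff);
    return le;
}

/* Add a CPU-order value to a little-endian variable in place. */
static void toy_le32_add_cpu(uint32_t *var, uint32_t val)
{
    *var = toy_cpu_to_le32(toy_le32_to_cpu(*var) + val);
}
```

The helper names the variable once instead of twice, which removes the chance of converting one variable and assigning to another.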
     
  • Let's use bsize instead.
    fs/udf/namei.c:960:12: warning: symbol 'elen' shadows an earlier one
    fs/udf/namei.c:937:15: originally declared here

    Signed-off-by: Harvey Harrison
    Cc: Jan Kara
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Harvey Harrison
     
  • Signed-off-by: Harvey Harrison
    Cc: Evgeniy Dushistov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Harvey Harrison
     
  • remove fs64_add and fs64_sub - they probably weren't ever used because
    their prototypes used u32 instead of __fs64

    Signed-off-by: Marcin Slusarz
    Cc: Evgeniy Dushistov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Marcin Slusarz
     
  • __FUNCTION__ is gcc-specific, use __func__

    Signed-off-by: Harvey Harrison
    Cc: Jan Engelhardt
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Harvey Harrison
     
  • replace all:
    big/little_endian_variable = cpu_to_[bl]eX([bl]eX_to_cpu(big/little_endian_variable) +
    expression_in_cpu_byteorder);
    with:
    [bl]eX_add_cpu(&big/little_endian_variable, expression_in_cpu_byteorder);
    generated with semantic patch

    Signed-off-by: Marcin Slusarz
    Cc: Evgeniy Dushistov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Marcin Slusarz
     
  • __FUNCTION__ is gcc-specific, use __func__

    Signed-off-by: Harvey Harrison
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Harvey Harrison
     
  • __FUNCTION__ is gcc-specific, use __func__

    Signed-off-by: Harvey Harrison
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Harvey Harrison
     
  • When quota is disabled, we should not print 'journaled quota not supported'
    when user tried to mount non-journaled quota. Also fix typo in the message.

    Signed-off-by: Jan Kara
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kara
     
  • If the block allocator gets blocks out of the system zone, ext3 calls
    ext3_error. But if the file system is mounted with errors=continue, it
    retries block allocation. We need to mark the system zone blocks as in
    use to make sure the retry doesn't pick them again.

    The system zone is the block range mapping the block bitmap, inode bitmap
    and inode table.

    [akpm@linux-foundation.org: fix typo in comment]
    Signed-off-by: Aneesh Kumar K.V
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Aneesh Kumar K.V
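
The retry behaviour can be illustrated with a toy 64-block bitmap allocator. This is a sketch of the idea only: the structure and function names are hypothetical and are not taken from ext3. When the bitmap wrongly offers a system zone block, the block stays marked as in use, so the next attempt moves past it.

```c
#include <stdbool.h>
#include <stdint.h>

struct toy_group {
    uint64_t bitmap;          /* bit i set => block i is in use */
    unsigned int zone_start;  /* system zone is [zone_start, zone_end) */
    unsigned int zone_end;
};

static bool toy_in_system_zone(const struct toy_group *g, unsigned int blk)
{
    return blk >= g->zone_start && blk < g->zone_end;
}

/* Lowest block whose bit is clear, or -1 if the group is full. */
static int toy_find_free(const struct toy_group *g)
{
    for (int i = 0; i < 64; i++)
        if (!((g->bitmap >> i) & 1))
            return i;
    return -1;
}

/* Allocate one block, never handing out a system zone block. */
static int toy_alloc(struct toy_group *g)
{
    for (;;) {
        int blk = toy_find_free(g);

        if (blk < 0)
            return -1;
        g->bitmap |= UINT64_C(1) << blk;
        if (!toy_in_system_zone(g, (unsigned int)blk))
            return blk;
        /* Corrupt bitmap: leave the block marked in use and retry. */
    }
}
```

Without the marking step the retry would find the same free bit again and loop forever on the same bad block.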
     
  • Call dquot_drop() from ext3_dquot_drop() even if we fail to start a
    transaction. Otherwise we never get to dropping references to quota
    structures from the inode and umount will hang indefinitely. Thanks to
    Payphone LIOU for spotting the problem.

    Signed-off-by: Jan Kara
    Cc: Payphone LIOU
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kara
     
  • Make ext3 update the mtime and ctime of the directory into which we move
    a file even if the directory entry already exists.

    Signed-off-by: Jan Kara
    Cc: Al Viro
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kara
     
  • There are several cases where the running transaction can get buffers added to
    its BJ_Metadata list which it never dirtied, which makes its t_nr_buffers
    counter end up larger than its t_outstanding_credits counter.

    This causes issues when starting new transactions: while we are logging
    buffers we decrement t_outstanding_credits, so when t_outstanding_credits
    goes negative, we report that we need less space in the journal than we
    actually need, and transactions are started even though there may not be
    enough room for them. In the worst case (which admittedly is almost
    impossible to reproduce) this results in the journal running out of space.

    The fix is to refile buffers from the committing transaction to the
    running transaction's BJ_Modified list only when b_modified is set on that
    journal, which is the only way to be sure that the running transaction
    has modified that buffer.

    This patch also fixes an accounting error in journal_forget. It is
    possible to call journal_forget on a buffer without having modified it,
    having only gotten write access to it, so instead of always freeing a
    credit, we do so only if the buffer was modified. The assert will help
    catch this problem if it occurs. Without these two patches I could hit
    this assert within minutes of running postmark; with them this issue no
    longer arises.

    Signed-off-by: Josef Bacik
    Cc:
    Acked-by: Jan Kara
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Josef Bacik
     
  • Currently at the start of a journal commit we loop through all of the buffers
    on the committing transaction and clear the b_modified flag (the flag that is
    set when a transaction modifies the buffer) under the j_list_lock.

    The problem is that everywhere else this flag is modified only under the
    jbd lock buffer flag, so the clearing races with a running transaction,
    which could set the flag and then have it unset by the committing
    transaction.

    This is also a big waste: you may be clearing the modified flag on
    several thousand buffers when you do not need to. This patch removes
    this code and instead clears the b_modified flag upon entering
    do_get_write_access/journal_get_create_access, so if that transaction
    does indeed use the buffer it will be accounted for properly, and if it
    does not then we know we didn't use it.

    That will be important for the next patch in this series. Tested thoroughly
    by myself using postmark/iozone/bonnie++.

    Signed-off-by: Josef Bacik
    Cc:
    Acked-by: Jan Kara
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Josef Bacik
     
  • if (...) BUG(); should be replaced with BUG_ON(...) when the test has no
    side-effects to allow a definition of BUG_ON that drops the code completely.

    The semantic patch that makes this change is as follows:
    (http://www.emn.fr/x-info/coccinelle/)

    //
    @ disable unlikely @ expression E,f; @@

    (
    if () { BUG(); }
    |
    - if (unlikely(E)) { BUG(); }
    + BUG_ON(E);
    )

    @@ expression E,f; @@

    (
    if () { BUG(); }
    |
    - if (E) { BUG(); }
    + BUG_ON(E);
    )
    //

    Signed-off-by: Julia Lawall
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Julia Lawall
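
Why the side-effect condition matters can be shown with a toy pair of macro definitions. This is a sketch with hypothetical `TOY_` names, not the kernel's own BUG_ON: when checks are compiled out, the condition is never evaluated, so any side effect inside it silently disappears.

```c
#include <stdlib.h>

#define TOY_BUG() abort()

#ifdef TOY_CHECKS
#define TOY_BUG_ON(cond) do { if (cond) TOY_BUG(); } while (0)
#else
/* sizeof keeps the expression type-checked but never evaluates it. */
#define TOY_BUG_ON(cond) do { (void)sizeof(cond); } while (0)
#endif

static int toy_refs;

/* A function with a side effect: unsafe inside TOY_BUG_ON(). */
static int toy_get_ref(void)
{
    return ++toy_refs;
}

/* A pure test: safe to convert from if (...) TOY_BUG(). */
static int toy_double(int len)
{
    TOY_BUG_ON(len < 0);
    return len * 2;
}
```

Writing `TOY_BUG_ON(toy_get_ref() > 100)` would take a reference only in builds with `TOY_CHECKS` defined, which is the kind of call the semantic patch deliberately leaves as an explicit `if`.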
     
  • Check ext3_journal_get_write_access() errors.

    Signed-off-by: Akinobu Mita
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Akinobu Mita
     
  • Use ext3_get_group_desc()

    Signed-off-by: Akinobu Mita
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Akinobu Mita
     
  • Add missing ext3_journal_stop() in error handling.

    Signed-off-by: Akinobu Mita
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Akinobu Mita
     
  • Use ext3_group_first_block_no()

    Signed-off-by: Akinobu Mita
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Akinobu Mita
     
  • Make the needlessly global ext3_xattr_list() static.

    Signed-off-by: Adrian Bunk
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Adrian Bunk
     
  • Convert the byte order of the constant instead of the variable, so the
    conversion can be done at compile time (vs. run time).

    Signed-off-by: Marcin Slusarz
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Marcin Slusarz
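
A small sketch of the idea, assuming a GCC- or Clang-style compiler that predefines `__BYTE_ORDER__` (a host without that macro is treated as little-endian here, which is an assumption of this example). The magic value and all `TOY_` names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Byte swap written as a constant expression. */
#define TOY_SWAB32(x) ((uint32_t)( \
    (((uint32_t)(x) & 0x000000ffu) << 24) | \
    (((uint32_t)(x) & 0x0000ff00u) << 8)  | \
    (((uint32_t)(x) & 0x00ff0000u) >> 8)  | \
    (((uint32_t)(x) & 0xff000000u) >> 24)))

#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
#define TOY_CPU_TO_LE32(x) TOY_SWAB32(x)
#else
#define TOY_CPU_TO_LE32(x) ((uint32_t)(x))
#endif

#define TOY_MAGIC 0x12345678u   /* hypothetical on-disk magic number */

/*
 * Compare a raw little-endian field with the magic number.
 * The constant is converted, so the field needs no run-time swap.
 */
static bool toy_magic_matches(uint32_t on_disk_le)
{
    return on_disk_le == TOY_CPU_TO_LE32(TOY_MAGIC);
}
```

The alternative form, converting `on_disk_le` to CPU order and comparing with `TOY_MAGIC`, gives the same answer but costs a swap on every call on big-endian hosts.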
     
  • Currently fdatasync is identical to fsync in ext3.

    I think fdatasync should skip the journal flush in data=ordered and
    data=writeback modes when it overwrites already-instantiated blocks on
    the HDD. When the I_DIRTY_DATASYNC flag is not set, fdatasync should skip
    the journal writeout because this indicates only atime and/or mtime updates.

    The following patch takes the same approach as ext2's fsync code
    (ext2_sync_file).

    I did a performance test using sysbench.

    # sysbench --num-threads=128 --max-requests=50000 --test=fileio --file-total-size=128G \
        --file-test-mode=rndwr --file-fsync-mode=fdatasync run

    The result on ext3 was:

    -2.6.24
    Operations performed: 0 Read, 50080 Write, 59600 Other = 109680 Total
    Read 0b Written 782.5Mb Total transferred 782.5Mb (12.116Mb/sec)
    775.45 Requests/sec executed

    Test execution summary:
    total time: 64.5814s
    total number of events: 50080
    total time taken by event execution: 3713.9836
    per-request statistics:
    min: 0.0000s
    avg: 0.0742s
    max: 0.9375s
    approx. 95 percentile: 0.2901s

    Threads fairness:
    events (avg/stddev): 391.2500/23.26
    execution time (avg/stddev): 29.0155/1.99

    -2.6.24-patched
    Operations performed: 0 Read, 50009 Write, 61596 Other = 111605 Total
    Read 0b Written 781.39Mb Total transferred 781.39Mb (16.419Mb/sec)
    1050.83 Requests/sec executed

    Test execution summary:
    total time: 47.5900s
    total number of events: 50009
    total time taken by event execution: 2934.5768
    per-request statistics:
    min: 0.0000s
    avg: 0.0587s
    max: 0.8938s
    approx. 95 percentile: 0.1993s

    Threads fairness:
    events (avg/stddev): 390.6953/22.64
    execution time (avg/stddev): 22.9264/1.17

    Filesystem I/O throughput improved from 12.116Mb/sec to 16.419Mb/sec
    (775.45 to 1050.83 requests/sec).

    Signed-off-by: Hisashi Hifumi
    Acked-by: Jan Kara
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hisashi Hifumi
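
The decision described above can be sketched as a pure function. The flag values and names below are hypothetical stand-ins for the kernel's inode state bits, and the sketch ignores data=journal mode, where the journal always has to be flushed.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's inode dirty-state bits. */
#define TOY_I_DIRTY_SYNC     (1u << 0)  /* e.g. timestamp-only changes */
#define TOY_I_DIRTY_DATASYNC (1u << 1)  /* metadata needed to read the data */
#define TOY_I_DIRTY          (TOY_I_DIRTY_SYNC | TOY_I_DIRTY_DATASYNC)

/*
 * Decide whether a sync call has to force a journal commit.
 * datasync is true for fdatasync() and false for fsync().
 */
static bool toy_needs_journal_commit(unsigned int i_state, bool datasync)
{
    if (!(i_state & TOY_I_DIRTY))
        return false;   /* inode is clean: nothing to write */

    if (datasync && !(i_state & TOY_I_DIRTY_DATASYNC))
        return false;   /* only timestamps changed: skip for fdatasync */

    return true;
}
```

For an overwrite of already-allocated blocks only the timestamp bit is set, so fdatasync() avoids the commit while fsync() still performs it.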
     
  • If the block allocator gets blocks out of the system zone, ext2 calls
    ext2_error. But if the file system is mounted with errors=continue, it
    retries block allocation. We need to mark the system zone blocks as in
    use to make sure the retry doesn't pick them again.

    The system zone is the block range mapping the block bitmap, inode bitmap
    and inode table.

    [akpm@linux-foundation.org: fix typo in comment]
    Signed-off-by: Aneesh Kumar K.V
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Aneesh Kumar K.V