03 Jun, 2009

6 commits


02 Jun, 2009

9 commits

  • … when we use cls_cgroup

    This patch fixes a bug in which an unconfigured struct tcf_proto stays
    chained in tc_ctl_tfilter(), and avoids a kernel panic in
    cls_cgroup_classify() when we use cls_cgroup.

    When we execute 'tc filter add', tcf_proto is allocated, initialized
    by classifier's init(), and chained. After it's chained,
    tc_ctl_tfilter() calls classifier's change(). When classifier's
    change() fails, tc_ctl_tfilter() does not free tcf_proto and leaves it
    chained.

    In addition, cls_cgroup is initialized in change(), not in init(). It
    accesses the unconfigured struct tcf_proto, which is chained before
    change(), and then hits an Oops.

    Signed-off-by: Minoru Usui <usui@mxm.nes.nec.co.jp>
    Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
    Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
    Tested-by: Minoru Usui <usui@mxm.nes.nec.co.jp>
    Signed-off-by: David S. Miller <davem@davemloft.net>

    Minoru Usui
     
  • Patch to fix bad length checking in e1000. E1000 by default does two
    things:

    1) Spans rx descriptors for packets that don't fit into 1 skb on receive
    2) Strips the crc from a frame by subtracting 4 bytes from the length prior to
    doing an skb_put

    Since the e1000 driver isn't written to support receiving packets that span
    multiple rx buffers, it checks the End of Packet bit of every frame, and
    discards it if it's not set. This places us in a situation where, if we have a
    spanning packet, the first part is discarded, but the second part is not (since
    it is the end of packet, and it passes the EOP bit test). If the second part of
    the frame is small (4 bytes or less), we subtract 4 from it to remove its crc,
    underflow the length, and wind up in skb_over_panic, when we try to skb_put a
    huge number of bytes into the skb. This amounts to a remote DoS attack through
    careful selection of frame size in relation to interface MTU. The fix for this
    is already in the e1000e driver, as well as the e1000 sourceforge driver, but no
    one ever pushed it to e1000. This is lifted straight from e1000e, and prevents
    small frames from causing the underflow described above.

    Signed-off-by: Neil Horman
    Tested-by: Andy Gospodarek
    Signed-off-by: David S. Miller

    Neil Horman
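
    The underflow described in this commit message can be sketched in a few
    lines of plain C. This is an illustration only, not the driver's code:
    the helper names and the ETH_CRC_LEN constant are hypothetical, and the
    guard shown (reject any buffer of 4 bytes or less) is an assumption
    drawn from the "4 bytes or less" wording above.

```c
#include <stdbool.h>
#include <stdint.h>

#define ETH_CRC_LEN 4u /* assumed size of the trailing Ethernet CRC */

/* Unguarded subtraction: an unsigned length below 4 wraps to a huge value. */
static uint32_t strip_crc_unchecked(uint32_t length)
{
    return (uint32_t)(length - ETH_CRC_LEN);
}

/*
 * Guarded variant (hypothetical helper): refuse buffers that are too short
 * to carry any payload in front of the CRC, so the subtraction cannot wrap.
 * Returns false when the caller should drop the buffer.
 */
static bool strip_crc_checked(uint32_t length, uint32_t *payload_len)
{
    if (length <= ETH_CRC_LEN)
        return false;
    *payload_len = length - ETH_CRC_LEN;
    return true;
}
```

    A 2-byte trailing fragment passed to the unchecked helper yields a
    length close to 4 GiB, which is the kind of value that would then be
    handed to skb_put().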
     
  • Add a phy_power_down parameter to forcedeth: set to 1 to power down the
    phy and disable the link when an interface goes down; set to 0 to always
    leave the phy powered up.

    The phy power state persists across reboots; Windows, some BIOSes, and
    older versions of Linux don't bother to power up the phy again, forcing
    users to remove all power to get the interface working (see
    http://bugzilla.kernel.org/show_bug.cgi?id=13072). Leaving the phy
    powered on is the safest default behavior. Users accustomed to seeing
    the link state reflect the interface state and/or wanting to minimize
    power consumption can set phy_power_down=1 if compatibility with other
    OSes is not an issue.

    Signed-off-by: Ed Swierk
    Signed-off-by: David S. Miller

    Ed Swierk
     
  • It's possible to recurse into the filesystem from a memory
    allocation, which deadlocks in xfs_qm_shake(). Add a check
    for __GFP_FS, and bail out if it is not set.

    Signed-off-by: Felix Blyakher
    Signed-off-by: Hedi Berriche
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Andi Kleen
    Signed-off-by: Felix Blyakher

    Felix Blyakher
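
    The shape of such a check can be sketched as follows. This is a
    userspace illustration under stated assumptions: SKETCH_GFP_FS is a
    made-up flag value standing in for the kernel's __GFP_FS, and
    shake_sketch() is a hypothetical stand-in for a reclaim callback, not
    the actual xfs_qm_shake() body.

```c
/* Hypothetical flag value; the real __GFP_FS is defined by the kernel. */
#define SKETCH_GFP_FS 0x80u

/*
 * Stand-in for a reclaim callback. When the allocation context does not
 * permit filesystem recursion, do nothing and report nothing reclaimed.
 */
static int shake_sketch(unsigned int gfp_mask, int reclaimable)
{
    if (!(gfp_mask & SKETCH_GFP_FS))
        return 0;       /* caller may already hold fs locks: bail out */
    return reclaimable; /* placeholder for the real reclaim work */
}
```

    Bailing out early trades a little reclaim opportunity for the
    guarantee that the callback never re-enters filesystem code on behalf
    of an allocation made inside the filesystem.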
     
  • In the case where growing a filesystem would leave the last AG
    too small, the fixup code has an overflow in the calculation
    of the new size with one fewer AG, because "nagcount" is a 32-bit
    number. If the new filesystem has > 2^32 blocks in it,
    this causes a problem resulting in an EINVAL return from growfs:

    # xfs_io -f -c "truncate 19998630180864" fsfile
    # mkfs.xfs -f -bsize=4096 -dagsize=76288719b,size=3905982455b fsfile
    # mount -o loop fsfile /mnt
    # xfs_growfs /mnt

    meta-data=/dev/loop0             isize=256    agcount=52, agsize=76288719 blks
             =                       sectsz=512   attr=2
    data     =                       bsize=4096   blocks=3905982455, imaxpct=5
             =                       sunit=0      swidth=0 blks
    naming   =version 2              bsize=4096   ascii-ci=0
    log      =internal               bsize=4096   blocks=32768, version=2
             =                       sectsz=512   sunit=0 blks, lazy-count=0
    realtime =none                   extsz=4096   blocks=0, rtextents=0
    xfs_growfs: XFS_IOC_FSGROWFSDATA xfsctl failed: Invalid argument

    Reported-by: richard.ems@cape-horn-eng.com
    Signed-off-by: Eric Sandeen
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Felix Blyakher
    Signed-off-by: Felix Blyakher

    Eric Sandeen
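
    The arithmetic behind this overflow can be reproduced in isolation.
    The sketch below is not the XFS code: the function names are
    hypothetical, the AG size is the one from the reproducer above, and
    the AG count of 64 plus the 32-bit width of the AG size are
    assumptions made for illustration.

```c
#include <stdint.h>

/* Product computed in 32 bits first: wraps modulo 2^32 before widening. */
static uint64_t new_size_32bit(uint32_t nagcount, uint32_t agblocks)
{
    uint32_t product = (uint32_t)(nagcount * agblocks);
    return product;
}

/* Widen one operand first so the multiplication is done in 64 bits. */
static uint64_t new_size_64bit(uint32_t nagcount, uint32_t agblocks)
{
    return (uint64_t)nagcount * agblocks;
}
```

    With 64 AGs of 76288719 blocks each, the 64-bit product is 4882478016
    blocks, while the 32-bit product wraps to 587510720. A wrapped value
    like that is smaller than the current 3905982455 blocks, so a grow
    request computed this way would look like a shrink and be rejected.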
     
  • Regression from commit ef8f7fc, which rearranged the code in
    xfs_swap_extents(), leading to a double unlock of the xfs inode ilock.
    That resulted in xfs_fsr deadlocking itself on platforms that
    don't handle a double unlock of an rw_semaphore nicely. It caused the
    count to go negative, which represents a write holder, without
    really having one. ia64 is one of the platforms where the deadlock
    was easily reproduced and the fix was tested.

    Signed-off-by: Eric Sandeen
    Reviewed-by: Eric Sandeen
    Signed-off-by: Felix Blyakher

    Felix Blyakher
     
  • This mostly adds back AppleTouch support and adds CONFIG_HIGHMEM
    by default.

    Signed-off-by: Benjamin Herrenschmidt

    Benjamin Herrenschmidt
     
  • David S. Miller
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
    crypto: hash - Fix handling of sg entry that crosses page boundary

    Linus Torvalds
     

01 Jun, 2009

7 commits


14 May, 2009

1 commit

  • These struct buffer_heads are allocated on the stack (and hence are
    initialized with stack garbage). They are only used to call a
    get_blocks() function, so that's mostly OK, but b_state must be
    initialized to be 0 so we don't have any unexpected BH_* flags set by
    accident, such as BH_Unwritten or BH_Delay.

    Signed-off-by: Aneesh Kumar K.V
    Signed-off-by: "Theodore Ts'o"

    Aneesh Kumar K.V
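
    A minimal sketch of the pattern, assuming a cut-down structure: struct
    bh_sketch and the SKETCH_BH_* bits below are hypothetical stand-ins
    for struct buffer_head and its BH_* flags, and prepare_lookup_bh() is
    an illustrative helper, not an ext4 function.

```c
#include <string.h>

/* Hypothetical, reduced stand-in for struct buffer_head. */
struct bh_sketch {
    unsigned long b_state;       /* flag bits, must start out clear */
    unsigned long long b_blocknr;
    size_t b_size;
};

#define SKETCH_BH_UNWRITTEN (1UL << 0) /* made-up bit positions */
#define SKETCH_BH_DELAY     (1UL << 1)

/* Prepare an on-stack buffer head for a block lookup call. */
static void prepare_lookup_bh(struct bh_sketch *bh, size_t size)
{
    bh->b_state = 0; /* clear stale flag bits left behind on the stack */
    bh->b_size = size;
}
```

    Only the fields the lookup reads need defined values; the point is
    that b_state is one of them, because stray bits would be interpreted
    as flags.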
     

13 May, 2009

2 commits

  • Setting BH_Unwritten buffer_heads as BH_Mapped avoids multiple
    (unnecessary) calls to get_block() during the call to the write(2)
    system call. Setting BH_Unwritten buffer heads as BH_Mapped requires
    that the writepages() functions can handle BH_Unwritten buffer_heads.

    After this commit, things work as follows:

    ext4_ext_get_block() returns unmapped, unwritten, buffer head when
    called with create = 0 for prealloc space. This makes sure we handle
    the read path and non-delayed allocation case correctly. Even though
    the buffer head is marked unmapped we have valid b_blocknr and b_bdev
    values in the buffer_head.

    ext4_da_get_block_prep() called for block reservation will now return
    mapped, unwritten, new buffer_head for prealloc space. This avoids
    multiple calls to get_block() for write to same offset. By marking such
    buffers as BH_New, we also ensure that sub-block zeroing of buffered
    writes happens correctly.

    Signed-off-by: Aneesh Kumar K.V
    Signed-off-by: "Theodore Ts'o"

    Aneesh Kumar K.V
     
  • The BH_Delay and BH_Unwritten flags should never leak out to
    submit_bh(). So add some BUG_ON() checks to submit_bh so we can get a
    stack trace and determine how and why this might have happened.

    (Note that only XFS and ext4 use these buffer head flags, and XFS does
    not use submit_bh(). So this patch should only modify behavior for
    ext4.)

    Signed-off-by: Aneesh Kumar K.V
    Signed-off-by: "Theodore Ts'o"
    Cc: linux-fsdevel@vger.kernel.org

    Aneesh Kumar K.V
     

04 May, 2009

1 commit


02 May, 2009

5 commits

  • Move the function prototypes in group.h into ext4.h so they are all
    defined in one place.

    Signed-off-by: "Theodore Ts'o"

    Theodore Ts'o
     
  • The fs/ext4/namei.h header file had only a single function
    declaration, and should have never been a standalone file. Move it
    into ext4.h, where it should have been from the beginning.

    Signed-off-by: "Theodore Ts'o"

    Theodore Ts'o
     
  • There is no longer a reason for a separate ext4_i.h header file, so
    move it into ext4.h just to make life easier for developers to find
    the relevant data structures and typedefs. Should also speed up
    compiles slightly, too.

    Signed-off-by: "Theodore Ts'o"

    Theodore Ts'o
     
  • By avoiding the use of not-yet-used block groups (i.e., block groups
    with the BLOCK_UNINIT flag), mballoc had a tendency to create large
    files with large non-contiguous gaps. In addition avoiding the use of
    new block groups had a tendency to push regular file data into the
    first block group in a flex_bg group, which slows down the speed of
    e2fsck pass 2, since it has a tendency to seek much more. For
    example:

                    Before Patch                    After Patch
                  Time in seconds                 Time in seconds
             Real /  User /  Sys    MB/s      Real /  User /  Sys    MB/s
    Pass 1   8.52 /  2.21 / 0.46   20.43      8.84 /  4.97 / 1.11   19.68
    Pass 2  21.16 /  1.02 / 1.86   11.30      6.54 /  1.77 / 1.78   36.39
    Pass 3   0.01 /  0.00 / 0.00  139.00      0.01 /  0.01 / 0.00  128.90
    Pass 4   0.16 /  0.15 / 0.00    0.00      0.17 /  0.17 / 0.00    0.00
    Pass 5   2.52 /  1.99 / 0.09    0.79      2.31 /  1.78 / 0.06    0.86
    Total   32.40 /  5.11 / 2.49   12.81     17.99 /  8.75 / 2.98   23.01

    This was on a sample 80 gig root filesystem which was approximately
    50% full. Note the improved e2fsck pass 2 performance, by over a
    factor of 3, due to a decreased number of seeks. (The total amount of
    I/O in pass 2 was unchanged; the layout of the directory blocks was
    simply much better from e2fsck's perspective.)

    Other changes as a result of this patch on this sample filesystem:

                                   Before Patch   After Patch
    # of non-contig files               762           779
    # of non-contig directories         571           570
    # of BLOCK_UNINIT bg's              307           293
    # of INODE_UNINIT bg's              503           503

    Out of 640 block groups, of which 333 were in use, this patch caused
    an extra 14 block groups to be utilized. The number of non-contiguous
    files did go up slightly, but when measured against the 99.9% of the
    files (603,154) which were contiguously allocated, this is pretty
    insignificant.

    Signed-off-by: "Theodore Ts'o"
    Signed-off-by: Andreas Dilger

    Theodore Ts'o
     
  • By using a separate super_operations structure for filesystems that
    have and don't have journals, we can simplify ext4_write_super() ---
    which is only needed when no journal is present --- and ext4_freeze(),
    ext4_unfreeze(), and ext4_sync_fs(), which are only needed when the
    journal is present.

    Signed-off-by: "Theodore Ts'o"

    Theodore Ts'o
     

01 May, 2009

4 commits

  • The function ext4_mark_recovery_complete() is called from two call
    paths: either (a) while mounting the filesystem, in which case there's
    no danger of any other CPU calling write_super() until the mount is
    completed, or (b) while remounting the filesystem read-write, in
    which case the fs core has already locked the superblock. This also
    allows us to take out a very vile unlock_super()/lock_super() pair in
    ext4_remount().

    Signed-off-by: "Theodore Ts'o"

    Theodore Ts'o
     
  • Ext4's on-line resizing adds a new block group and then, only at the
    last step, adjusts s_groups_count. However, it's possible on SMP
    systems that another CPU could see the updated s_groups_count and
    not see the newly initialized data structures for the just-added block
    group. For this reason, it's important to insert an SMP read barrier
    after reading s_groups_count and before reading any data (for example,
    the new block group descriptors) allowed by the increased value of
    s_groups_count.

    Unfortunately, we rather blatantly violate this locking protocol
    documented in fs/ext4/resize.c. Fortunately, (1) on-line resizes
    happen relatively rarely, and (2) it seems rare that the filesystem
    code will immediately try to use a just-added block group before any
    memory ordering issues resolve themselves. So apparently problems
    here are relatively hard to hit, since ext3 has been vulnerable to the
    same issue for years with no one apparently complaining.

    Signed-off-by: "Theodore Ts'o"

    Theodore Ts'o
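
    The ordering requirement can be sketched in portable C11. Note the
    substitution: the kernel expresses this with its own SMP barrier
    primitives, whereas the sketch below uses C11 release/acquire atomics
    to get the same publish-then-read ordering. struct sb_sketch,
    add_group(), and last_group_desc() are hypothetical names, not ext4
    code.

```c
#include <stdatomic.h>
#include <stdint.h>

#define MAX_GROUPS 8 /* arbitrary capacity for the sketch */

/* Hypothetical stand-in for the per-filesystem group bookkeeping. */
struct sb_sketch {
    uint32_t group_desc[MAX_GROUPS];
    _Atomic uint32_t groups_count;
};

/* Resizer: initialize the new group first, publish the count last. */
static int add_group(struct sb_sketch *sb, uint32_t desc)
{
    uint32_t n = atomic_load_explicit(&sb->groups_count, memory_order_relaxed);

    if (n >= MAX_GROUPS)
        return -1;
    sb->group_desc[n] = desc;
    /* Release: the descriptor write is visible before the new count. */
    atomic_store_explicit(&sb->groups_count, n + 1, memory_order_release);
    return 0;
}

/* Reader: load the count with acquire before touching any descriptor. */
static uint32_t last_group_desc(struct sb_sketch *sb)
{
    uint32_t n = atomic_load_explicit(&sb->groups_count, memory_order_acquire);

    return n ? sb->group_desc[n - 1] : 0;
}
```

    The acquire load plays the role of the read barrier described above:
    a reader that observes the increased count is also guaranteed to
    observe the descriptor written before it.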
     
  • The s_dirt flag wasn't completely handled correctly, but it didn't
    really matter when journalling was enabled. It turns out that when
    ext4 runs without a journal, we don't clear s_dirt in places where we
    should have, with the result that the high-level write_super()
    function was writing the superblock when it wasn't necessary.

    So we fix this by making ext4_commit_super() clear the s_dirt flag,
    and removing many of the other places where s_dirt is manipulated.
    When journalling is enabled, the s_dirt flag might be left set more
    often, but s_dirt really doesn't matter when journalling is enabled.

    Signed-off-by: "Theodore Ts'o"

    Theodore Ts'o
     
  • The ext4_commit_super() function took both a struct super_block * and
    a struct ext4_super_block *, but the struct ext4_super_block can be
    derived from the struct super_block.

    Signed-off-by: "Theodore Ts'o"

    Theodore Ts'o
     

28 Apr, 2009

1 commit

  • For very large filesystems, the s_flex_groups array can get quite big.
    For example, a filesystem that can be resized up to 16TB will have
    8192 flex groups (assuming the default flex_bg size of 16), so the
    array is 96k, which is *very* marginal for kmalloc(). On the other
    hand, a 160GB filesystem without the resize_inode feature will only
    require 960 bytes. So we try to allocate the array first using
    kmalloc(), and if that fails, we'll try to use vmalloc() instead.

    Signed-off-by: "Theodore Ts'o"

    Theodore Ts'o
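
    The sizes quoted in this message can be reproduced with a short
    calculation, and the try-then-fall-back allocation can be sketched in
    userspace. Assumptions not stated in the message: 4 KiB blocks, 32768
    blocks per block group, binary units (TiB/GiB), and a 12-byte
    per-flex-group entry made of three 32-bit counters. All names below
    are hypothetical, and plain function pointers stand in for kmalloc()
    and vmalloc().

```c
#include <stdint.h>
#include <stdlib.h>

/* Assumed 12-byte per-flex-group entry (three 32-bit counters). */
struct flex_groups_sketch {
    uint32_t free_inodes, free_blocks, used_dirs;
};

/* Bytes needed for the flex group array of a filesystem of fs_bytes. */
static size_t flex_array_bytes(uint64_t fs_bytes, uint32_t block_size,
                               uint32_t blocks_per_group,
                               uint32_t groups_per_flex)
{
    uint64_t group_bytes = (uint64_t)block_size * blocks_per_group;
    uint64_t groups = (fs_bytes + group_bytes - 1) / group_bytes;
    uint64_t flex = (groups + groups_per_flex - 1) / groups_per_flex;

    return (size_t)(flex * sizeof(struct flex_groups_sketch));
}

typedef void *(*alloc_fn)(size_t);

/* Stand-in for an allocator that cannot satisfy the request. */
static void *failing_alloc(size_t size)
{
    (void)size;
    return NULL;
}

/* Try the primary allocator first; use the fallback only if it fails. */
static void *alloc_with_fallback(size_t size, alloc_fn primary,
                                 alloc_fn fallback)
{
    void *p = primary(size);

    return p ? p : fallback(size);
}
```

    Under these assumptions a 16 TiB filesystem has 131072 block groups
    and 8192 flex groups, giving 98304 bytes (96k), while a 160 GiB
    filesystem has 80 flex groups and needs 960 bytes, matching the
    figures above. In the kernel the release path has to match whichever
    allocator actually succeeded, a detail the sketch leaves out.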
     

26 Apr, 2009

3 commits


25 Apr, 2009

1 commit