09 Jul, 2008

4 commits


08 Jul, 2008

1 commit


06 Jul, 2008

2 commits

  • Fix some issues in pagemap_read noted by Alexey:

    - initialize pagemap_walk.mm to "mm", so the code starts working as
    advertised

    - initialize ->private to "&pm" so it wouldn't immediately oops in
    pagemap_pte_hole()

    - unstatic struct pagemap_walk, so two threads won't fsckup each other
    (including those started by root, including flipping ->mm when you don't
    have permissions)

    - pagemap_read() contains two calls to ptrace_may_attach(), second one
    looks unneeded.

    - avoid possible kmalloc(0) and integer wraparound.

    Cc: Alexey Dobriyan
    Cc: Matt Mackall
    Signed-off-by: Andrew Morton
    [ Personally, I'd just remove the functionality entirely - Linus ]
    Signed-off-by: Linus Torvalds

    Andrew Morton
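
The last item in the list above (avoiding kmalloc(0) and integer wraparound) follows a common pattern that can be sketched in user space. This is a minimal illustration under stated assumptions, not the code from the patch: alloc_entries() is a hypothetical helper, and malloc() stands in for kmalloc().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not the kernel's code): allocate room for `count`
 * entries of `entry_size` bytes each. Returns NULL for a zero-length
 * request or when count * entry_size would wrap around.
 */
void *alloc_entries(size_t count, size_t entry_size)
{
    assert(entry_size != 0);        /* a zero entry size is a caller bug */

    if (count == 0)
        return NULL;                /* avoid a zero-byte allocation */
    if (count > SIZE_MAX / entry_size)
        return NULL;                /* the multiplication would wrap */

    return malloc(count * entry_size);
}
```

The division-based check rejects the request before the multiplication happens, so the wrapped value is never computed.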
     
  • Don't use a static entry, so as to prevent races during concurrent use
    of this function.

    Reported-by: Alexey Dobriyan
    Cc: Matt Mackall
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     

05 Jul, 2008

7 commits


03 Jul, 2008

1 commit

  • The legacy protocol's open operation doesn't handle an append operation
    (it is expected that the client take care of it). We were incorrectly
    passing the extended protocol's flag through even in legacy mode. This
    was reported in bugzilla report #10689. This patch fixes the problem
    by disallowing extended protocol open modes from being passed in legacy
    mode and implementing append functionality on the client side by adding
    a seek after the open.

    Signed-off-by: Eric Van Hensbergen

    Eric Van Hensbergen
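
The "seek after the open" idea can be illustrated with ordinary POSIX calls. This is only a user-space sketch of the technique, assuming a plain file; open_for_append_legacy() is a hypothetical name and none of this is the 9p client code.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: emulate append mode for a backend that does not
 * understand an append flag. The file is opened for writing without
 * O_APPEND, then the offset is moved to the end of the file.
 * Returns the descriptor, or -1 on failure.
 */
int open_for_append_legacy(const char *path)
{
    assert(path != NULL);

    int fd = open(path, O_WRONLY);          /* note: no O_APPEND */
    if (fd < 0)
        return -1;

    if (lseek(fd, 0, SEEK_END) == (off_t)-1) {
        close(fd);
        return -1;
    }
    return fd;
}
```

Unlike a real O_APPEND open, this positions the offset only once, so it does not guarantee that later writes land at the end if another writer extends the file in the meantime.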
     

01 Jul, 2008

1 commit

  • fsync_buffers_list() and sync_dirty_buffer() both issue async writes and
    then immediately wait on them. Conceptually, that makes them sync writes
    and we should treat them as such so that the IO schedulers can handle
    them appropriately.

    This patch fixes a write starvation issue that Lin Ming reported, where
    xx is stuck for more than 2 minutes because of a large amount of
    synchronous IO in the system:

    INFO: task kjournald:20558 blocked for more than 120 seconds.
    "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
    message.
    kjournald D ffff810010820978 6712 20558 2
    ffff81022ddb1d10 0000000000000046 ffff81022e7baa10 ffffffff803ba6f2
    ffff81022ecd0000 ffff8101e6dc9160 ffff81022ecd0348 000000008048b6cb
    0000000000000086 ffff81022c4e8d30 0000000000000000 ffffffff80247537
    Call Trace:
    [] kobject_get+0x12/0x17
    [] getnstimeofday+0x2f/0x83
    [] sync_buffer+0x0/0x3f
    [] io_schedule+0x5d/0x9f
    [] sync_buffer+0x3b/0x3f
    [] __wait_on_bit+0x40/0x6f
    [] sync_buffer+0x0/0x3f
    [] out_of_line_wait_on_bit+0x6c/0x78
    [] wake_bit_function+0x0/0x23
    [] sync_dirty_buffer+0x98/0xcb
    [] journal_commit_transaction+0x97d/0xcb6
    [] lock_timer_base+0x26/0x4b
    [] kjournald+0xc1/0x1fb
    [] autoremove_wake_function+0x0/0x2e
    [] kjournald+0x0/0x1fb
    [] kthread+0x47/0x74
    [] schedule_tail+0x28/0x5d
    [] child_rip+0xa/0x12
    [] kthread+0x0/0x74
    [] child_rip+0x0/0x12

    Lin Ming confirms that this patch fixes the issue. I've run tests with
    it for the past week and no ill effects have been observed, so I'm
    proposing it for inclusion into 2.6.26.

    Signed-off-by: Jens Axboe

    Jens Axboe
     

30 Jun, 2008

2 commits

  • * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-udf-2.6:
    udf: Fix regression in UDF anchor block detection

    Linus Torvalds
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
    [patch 2/3] vfs: dcache cleanups
    [patch 1/3] vfs: dcache sparse fixes
    [patch 3/3] vfs: make d_path() consistent across mount operations
    [patch 4/4] flock: remove unused fields from file_lock_operations
    [patch 3/4] vfs: fix ERR_PTR abuse in generic_readlink
    [patch 2/4] fs: make struct file arg to d_path const
    [patch 1/4] vfs: path_{get,put}() cleanups
    [patch for 2.6.26 4/4] vfs: utimensat(): fix write access check for futimens()
    [patch for 2.6.26 3/4] vfs: utimensat(): fix error checking for {UTIME_NOW,UTIME_OMIT} case
    [patch for 2.6.26 1/4] vfs: utimensat(): ignore tv_sec if tv_nsec == UTIME_OMIT or UTIME_NOW
    [patch for 2.6.26 2/4] vfs: utimensat(): be consistent with utime() for immutable and append-only files
    [PATCH] fix cgroup-inflicted breakage in block_dev.c

    Linus Torvalds
     

25 Jun, 2008

1 commit

  • This patch fixes bz 450641.

    This patch changes the computation for zero_metapath_length(), which it
    renames to metapath_branch_start(). When you are extending the metadata
    tree, the indirect blocks that point to the new data block must diverge
    from the existing tree either at the inode or at the first indirect
    block. They can diverge at the first indirect block because the
    inode has room for 483 pointers while the indirect blocks have room for
    509 pointers, so when the tree is grown, there is some free space in the
    first indirect block. What metapath_branch_start() now computes is the
    height where the first indirect block for the new data block is located.
    It can either be 1 (if the indirect block diverges from the inode) or 2
    (if it diverges from the first indirect block).

    Signed-off-by: Benjamin Marzinski
    Signed-off-by: Steven Whitehouse

    Benjamin Marzinski
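
The arithmetic behind the 483 and 509 figures can be worked through in a few lines. This is a sketch under assumptions: it takes the two pointer counts from the message above, assumes that a tree of height h addresses 483 * 509^(h-1) data blocks, and uses hypothetical names (max_data_blocks(), spare_slots_after_growth()) that do not come from the gfs2 sources.

```c
#include <assert.h>
#include <stdint.h>

enum {
    DINODE_PTRS   = 483,    /* pointers that fit in the inode (from the message) */
    INDIRECT_PTRS = 509     /* pointers that fit in an indirect block */
};

/*
 * Assumed capacity model: height 1 means the inode points straight at data
 * blocks; each extra level multiplies the capacity by INDIRECT_PTRS.
 * Height 0 is treated here as "no block pointers".
 */
uint64_t max_data_blocks(unsigned height)
{
    assert(height <= 7);            /* larger heights overflow 64 bits */

    if (height == 0)
        return 0;

    uint64_t capacity = DINODE_PTRS;
    for (unsigned level = 1; level < height; level++)
        capacity *= INDIRECT_PTRS;
    return capacity;
}

/*
 * When the tree grows by one level, the inode's 483 pointers fit into a
 * 509-slot indirect block, which leaves this many slots unused.
 */
unsigned spare_slots_after_growth(void)
{
    return INDIRECT_PTRS - DINODE_PTRS;
}
```

Those spare slots are why a new branch can start at the first indirect block (height 2) rather than only at the inode (height 1).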
     

24 Jun, 2008

9 commits

  • This patch fixes bugzilla bug bz448866: gfs2: BUG: unable to
    handle kernel paging request at ffff81002690e000.

    Signed-off-by: Bob Peterson
    Signed-off-by: Steven Whitehouse

    Bob Peterson
     
  • Jan Kara
     
  • In some cases a block could pass the test in udf_check_anchor_block()
    even though udf_read_tagged() later refused to read it (e.g. because
    the checksum was not correct). This patch makes
    udf_check_anchor_block() use udf_read_tagged() so that the checking is
    stricter.

    This fixes the regression (certain disks unmountable) caused by commit
    423cf6dc04eb79d441bfda2b127bc4b57134b41d.

    Signed-off-by: Tomas Janousek
    Signed-off-by: Jan Kara

    Tomas Janousek
     
  • Signed-off-by: Trond Myklebust

    Trond Myklebust
     
  • Fix a sign issue in xdr_decode_fhstatus3()
    Fix incorrect comparison in nfs_validate_mount_data()

    Signed-off-by: Trond Myklebust

    Trond Myklebust
     
  • This appears to fix the Oops reported in
    http://bugzilla.kernel.org/show_bug.cgi?id=10826

    Signed-off-by: Trond Myklebust

    Trond Myklebust
     
  • Comment from Al Viro: add prepend_name() wrapper.

    Signed-off-by: Miklos Szeredi
    Signed-off-by: Al Viro

    Miklos Szeredi
     
  • Fix the following sparse warnings:

    fs/dcache.c:2183:19: warning: symbol 'filp_cachep' was not declared. Should it be static?
    fs/dcache.c:115:3: warning: context imbalance in 'dentry_iput' - unexpected unlock
    fs/dcache.c:188:2: warning: context imbalance in 'dput' - different lock contexts for basic block
    fs/dcache.c:400:2: warning: context imbalance in 'prune_one_dentry' - different lock contexts for basic block
    fs/dcache.c:431:22: warning: context imbalance in 'prune_dcache' - different lock contexts for basic block
    fs/dcache.c:563:2: warning: context imbalance in 'shrink_dcache_sb' - different lock contexts for basic block
    fs/dcache.c:1385:6: warning: context imbalance in 'd_delete' - wrong count at exit
    fs/dcache.c:1636:2: warning: context imbalance in '__d_unalias' - unexpected unlock
    fs/dcache.c:1735:2: warning: context imbalance in 'd_materialise_unique' - different lock contexts for basic block

    Signed-off-by: Miklos Szeredi
    Reviewed-by: Matthew Wilcox
    Acked-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Miklos Szeredi
     
  • The path that __d_path() computes can become slightly inconsistent when it
    races with mount operations: it grabs the vfsmount_lock when traversing mount
    points but immediately drops it again, only to re-grab it when it reaches the
    next mount point. The result is that the filename computed is not always
    consistent, and the file may never have had that name. (This is unlikely, but
    still possible.)

    Fix this by grabbing the vfsmount_lock for the whole duration of
    __d_path().

    Signed-off-by: Andreas Gruenbacher
    Signed-off-by: John Johansen
    Signed-off-by: Miklos Szeredi
    Acked-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Andreas Gruenbacher
     

23 Jun, 2008

10 commits

  • fl_insert and fl_remove are not used right now in the kernel. Remove them.

    Signed-off-by: Denis V. Lunev
    Cc: Matthew Wilcox
    Cc: Alexander Viro
    Cc: "J. Bruce Fields"
    Signed-off-by: Andrew Morton
    Signed-off-by: Al Viro

    Denis V. Lunev
     
  • generic_readlink passes both negative and positive values through
    ERR_PTR() (vfs_readlink returns the length of the link on success).
    A positive length is not an errno, so it should not do this, and it
    does not need to.

    Signed-off-by: Marcin Slusarz
    Cc: Al Viro
    Cc: Christoph Hellwig
    Acked-by: Miklos Szeredi
    Signed-off-by: Andrew Morton
    Signed-off-by: Al Viro

    Marcin Slusarz
     
  • Signed-off-by: Jan Engelhardt
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Al Viro

    Jan Engelhardt
     
  • Here are some more places where path_{get,put}() can be used instead of
    dput()/mntput() pair.

    Signed-off-by: Jan Blunck
    Cc: Al Viro
    Cc: Jens Axboe
    Signed-off-by: Andrew Morton
    Signed-off-by: Al Viro

    Jan Blunck
     
  • The POSIX.1 draft spec for futimens()/utimensat() says:

    Only a process with the effective user ID equal to the
    user ID of the file, *or with write access to the file*,
    or with appropriate privileges may use futimens() or
    utimensat() with a null pointer as the times argument
    or with both tv_nsec fields set to the special value
    UTIME_NOW.

    The important piece here is "with write access to the file", and
    this matters for futimens(), which deals with an argument that
    is a file descriptor referring to the file whose timestamps are
    being updated. The standard is saying that the "writability"
    check is based on the file permissions, not the access mode with
    which the file is opened. (This behavior is consistent with the
    semantics of FreeBSD's futimes().) However, Linux is currently
    doing the latter -- futimens(fd, times) is a library
    function implemented as

    utimensat(fd, NULL, times, 0)

    and within the utimensat() implementation we have the code:

    f = fget(dfd); // dfd is 'fd'
    ...
    if (f) {
            if (!(f->f_mode & FMODE_WRITE))
                    goto mnt_drop_write_and_out;

    The check should instead be based on the file permissions.

    Thanks to Miklos for pointing out how to do this check.
    Miklos also pointed out a simplification that could be
    made to my first version of this patch, since the checks
    for the pathname and file descriptor cases can now be
    conflated.

    Acked-by: Miklos Szeredi
    Cc: Al Viro
    Cc: Ulrich Drepper
    Signed-off-by: Michael Kerrisk
    Signed-off-by: Andrew Morton
    Signed-off-by: Al Viro

    Michael Kerrisk
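
The behavior this message argues for can be exercised from user space. The following is a small sketch, assuming a file the caller owns or may write to; touch_via_readonly_fd() is a hypothetical helper name, while open(), futimens() and close() are the standard calls.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: open `path` read-only and ask for both timestamps
 * to be set to the current time. With the permission-based check described
 * above, this should succeed whenever the caller owns the file or has
 * write permission on it, even though the descriptor is not writable.
 * Returns 0 on success, -1 on failure.
 */
int touch_via_readonly_fd(const char *path)
{
    assert(path != NULL);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    int rc = futimens(fd, NULL);    /* NULL means "set both to now" */
    close(fd);
    return rc;
}
```

On a kernel that still checks the open mode instead, the same call would be expected to fail for a read-only descriptor.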
     
  • The POSIX.1 draft spec for utimensat() says:

    Only a process with the effective user ID equal to the
    user ID of the file or with appropriate privileges may use
    futimens() or utimensat() with a non-null times argument
    that does not have both tv_nsec fields set to UTIME_NOW
    and does not have both tv_nsec fields set to UTIME_OMIT.

    If this condition is violated, then the error EPERM should result.
    However, the current implementation does not generate EPERM if
    one tv_nsec field is UTIME_NOW while the other is UTIME_OMIT.
    It should give this error for that case.

    This patch:

    a) Repairs that problem.
    b) Removes the now unneeded nsec_special() helper function.
    c) Adds some comments to explain the checks that are being
    performed.

    Thanks to Miklos, who provided comments on the previous iteration
    of this patch. As a result, this version is a little simpler and
    its logic is better structured.

    Miklos suggested an alternative idea, migrating the
    is_owner_or_cap() checks into fs/attr.c:inode_change_ok() via
    the use of an ATTR_OWNER_CHECK flag. Maybe we could do that
    later, but for now I've gone with this version, which is
    IMO simpler, and can be more easily read as being correct.

    Acked-by: Miklos Szeredi
    Cc: Al Viro
    Cc: Ulrich Drepper
    Signed-off-by: Michael Kerrisk
    Signed-off-by: Andrew Morton
    Signed-off-by: Al Viro

    Michael Kerrisk
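
The condition quoted from the spec can be written as a small predicate. This is an illustrative sketch, not the kernel's implementation; needs_owner_or_privilege() is a hypothetical name.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <fcntl.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/stat.h>
#include <time.h>

/*
 * Hypothetical predicate: returns true when the caller must be the file
 * owner (or privileged) according to the quoted text, i.e. when `times`
 * is non-NULL and is neither "both UTIME_NOW" nor "both UTIME_OMIT".
 */
bool needs_owner_or_privilege(const struct timespec *times)
{
    if (times == NULL)
        return false;

    bool both_now  = times[0].tv_nsec == UTIME_NOW &&
                     times[1].tv_nsec == UTIME_NOW;
    bool both_omit = times[0].tv_nsec == UTIME_OMIT &&
                     times[1].tv_nsec == UTIME_OMIT;

    return !both_now && !both_omit;
}
```

The mixed case (one field UTIME_NOW, the other UTIME_OMIT) returns true here, which is the case the message says previously escaped the EPERM check.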
     
  • The POSIX.1 draft spec for utimensat() says that if a times[n].tv_nsec
    field is UTIME_OMIT or UTIME_NOW, then the value in the corresponding
    tv_sec field is ignored. See the last sentence of this para, from
    the spec:

    If the tv_nsec field of a timespec structure has
    the special value UTIME_NOW, the file's relevant
    timestamp shall be set to the greatest value
    supported by the file system that is not greater than
    the current time. If the tv_nsec field has the
    special value UTIME_OMIT, the file's relevant
    timestamp shall not be changed. In either case,
    the tv_sec field shall be ignored.

    However the current Linux implementation requires the tv_sec value to be
    zero (or the EINVAL error results). This requirement should be removed.

    Acked-by: Miklos Szeredi
    Cc: Al Viro
    Cc: Ulrich Drepper
    Signed-off-by: Michael Kerrisk
    Signed-off-by: Andrew Morton
    Signed-off-by: Al Viro

    Michael Kerrisk
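
A caller can rely on the rule above as in the following sketch. It assumes a kernel that already ignores tv_sec for the special tv_nsec values; set_mtime_now_keep_atime() is a hypothetical helper, and the nonzero tv_sec values are deliberately arbitrary.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/stat.h>
#include <time.h>

/*
 * Hypothetical helper: leave the access time alone and set the
 * modification time to "now". The tv_sec fields hold junk on purpose;
 * per the spec text quoted above they must be ignored.
 * Returns 0 on success, -1 on failure.
 */
int set_mtime_now_keep_atime(const char *path)
{
    assert(path != NULL);

    struct timespec times[2];
    times[0].tv_sec  = 12345;       /* ignored because of UTIME_OMIT */
    times[0].tv_nsec = UTIME_OMIT;  /* access time: unchanged */
    times[1].tv_sec  = 67890;       /* ignored because of UTIME_NOW */
    times[1].tv_nsec = UTIME_NOW;   /* modification time: current time */

    return utimensat(AT_FDCWD, path, times, 0);
}
```

With the old requirement that tv_sec be zero, this call would be expected to fail with EINVAL.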
     
  • …e and append-only files

    This patch fixes utimensat() to make its behavior consistent
    with that of utime()/utimes() when dealing with files marked
    immutable and append-only.

    The current utimensat() implementation also returns EPERM if
    'times' is non-NULL and the tv_nsec fields are both UTIME_NOW.
    For consistency, the

    (times != NULL && times[0].tv_nsec == UTIME_NOW &&
    times[1].tv_nsec == UTIME_NOW)

    case should be treated like the traditional utimes() case where
    'times' is NULL. That is, the call should succeed for a file
    marked append-only and should give the error EACCES if the file
    is marked as immutable.

    The simple way to do this is to set 'times' to NULL
    if (times[0].tv_nsec == UTIME_NOW && times[1].tv_nsec == UTIME_NOW).

    This is also the natural approach, since POSIX.1 semantics consider the
    times == {{x, UTIME_NOW}, {y, UTIME_NOW}}
    to be exactly equivalent to the case for
    times == NULL.

    (Thanks to Miklos for pointing this out.)

    Patch 3 in this series relies on the simplification provided
    by this patch.

    Acked-by: Miklos Szeredi <miklos@szeredi.hu>
    Cc: Al Viro <viro@zeniv.linux.org.uk>
    Cc: Ulrich Drepper <drepper@redhat.com>
    Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

    Michael Kerrisk
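
The simplification described above (treating "both UTIME_NOW" as if 'times' were NULL) can be sketched as a tiny normalization step. This is not the patch itself; normalize_times() is a hypothetical name.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <fcntl.h>
#include <stddef.h>
#include <sys/stat.h>
#include <time.h>

/*
 * Hypothetical helper: map the {{x, UTIME_NOW}, {y, UTIME_NOW}} case onto
 * the times == NULL case, so that later checks only have to handle one
 * "set both timestamps to now" form. Any other input is returned as-is.
 */
const struct timespec *normalize_times(const struct timespec *times)
{
    if (times != NULL &&
        times[0].tv_nsec == UTIME_NOW &&
        times[1].tv_nsec == UTIME_NOW)
        return NULL;

    return times;
}
```

After this step, the immutable and append-only handling can follow the same path as the traditional utimes() call with a NULL argument.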
     
  • devcgroup_inode_permission() expects MAY_FOO, not FMODE_FOO; kindly
    keep your misdesign consistent if you positively have to inflict it
    on the kernel.

    Signed-off-by: Al Viro

    Al Viro
     
  • Christian Borntraeger reported that reinstating cond_resched() with
    CONFIG_PREEMPT caused a performance regression on lmbench:

    For example select file 500:
    23 microseconds
    32 microseconds

    and that's really because we totally unnecessarily do the cond_resched()
    in the innermost loop of select(), which is just silly.

    This moves it out from the innermost loop (which only ever loops over the
    bits in a single "unsigned long" anyway), which makes the performance
    regression go away.

    Reported-and-tested-by: Christian Borntraeger
    Signed-off-by: Linus Torvalds

    Linus Torvalds
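
The shape of this change can be shown with a user-space analogue. This is only a sketch of the loop structure, not the select() code: scan_bitmap() and struct scan_result are hypothetical, and sched_yield() stands in for cond_resched().

```c
#include <assert.h>
#include <limits.h>
#include <sched.h>
#include <stddef.h>

struct scan_result {
    size_t set_bits;    /* number of bits found set */
    size_t yields;      /* number of times the scheduler was offered the CPU */
};

/*
 * Hypothetical scan over a bitmap stored as an array of unsigned long.
 * The inner loop only walks the bits of one word, so the yield point sits
 * in the outer loop: once per word instead of once per bit.
 */
struct scan_result scan_bitmap(const unsigned long *words, size_t nwords)
{
    assert(words != NULL || nwords == 0);

    const size_t bits_per_word = sizeof(unsigned long) * CHAR_BIT;
    struct scan_result result = { 0, 0 };

    for (size_t i = 0; i < nwords; i++) {
        for (size_t bit = 0; bit < bits_per_word; bit++) {
            if ((words[i] >> bit) & 1UL)
                result.set_bits++;
        }
        sched_yield();              /* stand-in for cond_resched() */
        result.yields++;
    }
    return result;
}
```

For a bitmap of N words this offers to reschedule N times rather than N times the word width, which is the kind of saving the message describes.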
     

22 Jun, 2008

1 commit


20 Jun, 2008

1 commit

  • This is the patch for the group descriptor table corruption during
    online resize pointed out by Theodore Tso. The problem was caused by
    the fact that the ext4 group descriptor can be either 32 or 64 bytes
    long. Only the 64 bytes structure was taken into account.

    Signed-off-by: Frederic Bohe
    Signed-off-by: Mingming Cao
    Signed-off-by: "Theodore Ts'o"

    Frederic Bohe
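
Why the descriptor size matters can be seen from the index arithmetic. The following is a sketch under assumptions: locate_group_desc() and struct desc_location are hypothetical, and the block index is counted from the start of the descriptor table rather than taken from the ext4 sources.

```c
#include <assert.h>
#include <stddef.h>

struct desc_location {
    size_t block;   /* block index within the group descriptor table */
    size_t offset;  /* byte offset of the descriptor inside that block */
};

/*
 * Hypothetical helper: find where the descriptor for `group` lives when
 * descriptors are `desc_size` bytes each (32 or 64, per the message above).
 */
struct desc_location locate_group_desc(size_t block_size, size_t desc_size,
                                       size_t group)
{
    assert(desc_size == 32 || desc_size == 64);
    assert(block_size >= desc_size && block_size % desc_size == 0);

    size_t per_block = block_size / desc_size;
    struct desc_location loc = {
        group / per_block,
        (group % per_block) * desc_size
    };
    return loc;
}
```

With 4096-byte blocks, group 130 sits in block 1 at offset 64 for 32-byte descriptors but in block 2 at offset 128 for 64-byte descriptors, so code that assumes only one size reads or writes the wrong location.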