15 Oct, 2010

3 commits

  • If you build aout support as a module, you'll want these exported.

    Reported-by: Tetsuo Handa
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     
  • Tony Luck reports that the addition of the access_ok() check in commit
    0eead9ab41da ("Don't dump task struct in a.out core-dumps") broke the
    ia64 compile due to missing the necessary header file includes.

    Rather than add yet another include () to make everything
    happy, just uninline the silly core dump helper functions and move the
    bodies to fs/exec.c where they make a lot more sense.

    dump_seek() in particular was too big to be an inline function anyway,
    and none of them are in any way performance-critical. And we really
    don't need to mess up our include file headers more than they already
    are.

    Reported-and-tested-by: Tony Luck
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     
  • akiphie points out that a.out core-dumps have that odd task struct
    dumping that was never used and was never really a good idea (it goes
    back into the mists of history, probably the original core-dumping
    code). Just remove it.

    Also do the access_ok() check on dump_write(). It probably doesn't
    matter (since normal filesystems all seem to do it anyway), but he
    points out that it's normally done by the VFS layer, so ...

    [ I suspect that we should possibly do "vfs_write()" instead of
    calling ->write directly. That also does the whole fsnotify and write
    statistics thing, which may or may not be a good idea. ]

    And just to be anal, do this all for the x86-64 32-bit a.out emulation
    code too, even though it's not enabled (and won't currently even
    compile)

    Reported-by: akiphie
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

14 Oct, 2010

2 commits


12 Oct, 2010

1 commit

  • This patch disables the fanotify syscalls by just not building them and
    letting the cond_syscall() statements in kernel/sys_ni.c redirect them
    to sys_ni_syscall().

    It was pointed out by Tvrtko Ursulin that the fanotify interface did not
    include an explicit prioritization between groups. This is necessary
    for fanotify to be usable for hierarchical storage management software,
    as they must get first access to the file, before inotify-like notifiers
    see the file.

    This feature can be added in an ABI compatible way in the next release
    (by using a number of bits in the flags field to carry the info) but it
    was suggested by Alan that maybe we should just hold off and do it in
    the next cycle, likely with an (new) explicit argument to the syscall.
    I don't like this approach best as I know people are already starting to
    use the current interface, but Alan is all wise and no one on list backed
    me up with just using what we have. I feel this is needlessly ripping
    the rug out from under people at the last minute, but if others think it
    needs to be a new argument it might be the best way forward.

    Three choices:
    1. Go with what we got (and implement the new feature next cycle).
    2. Add a new field right now (and implement the new feature next cycle).
    3. Wait till next cycle to release the ABI (and implement the new
       feature next cycle).
    This is number 3.

    Signed-off-by: Eric Paris
    Signed-off-by: Linus Torvalds

    Eric Paris
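
The redirection described in this commit relies on weak symbols: an entry point that was never built falls back to a common "not implemented" stub. A minimal userspace sketch of that idea follows, assuming GCC or Clang on an ELF target. The names `demo_ni_syscall` and `demo_fanotify_init` are made up for illustration and are not the kernel's macros or symbols.

```c
#include <errno.h>

/* Fallback used for every entry point that was not built. */
long demo_ni_syscall(void)
{
    return -ENOSYS;
}

/*
 * Weak alias: if no other object file provides a strong definition of
 * demo_fanotify_init(), calls to it resolve to demo_ni_syscall().
 */
long demo_fanotify_init(void)
    __attribute__((weak, alias("demo_ni_syscall")));
```

With no strong definition linked in, a call to `demo_fanotify_init()` returns `-ENOSYS`, which is the effect of simply not building the real implementation.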
     

10 Oct, 2010

2 commits


08 Oct, 2010

2 commits


07 Oct, 2010

9 commits

  • We need to update the issue_seq on any grant operation, be it via an MDS
    reply or a separate grant message. The update in the grant path was
    missing. This broke cap release for inodes for which the MDS sent an
    explicit grant message that was not followed soon after by a successful
    MDS reply on the same inode.

    Also fix the signedness on seq locals.

    Signed-off-by: Sage Weil

    Sage Weil
     
  • If an MDS tries to revoke caps that we don't have, we want to send
    releases early since they probably contain the caps message the MDS
    is looking for.

    Previously, we only sent the messages if we didn't have the inode either. But
    in a multi-mds system we can retain the inode after dropping all caps for
    a single MDS.

    Signed-off-by: Greg Farnum
    Signed-off-by: Sage Weil

    Greg Farnum
     
  • encode_fh on error should update max_len with the minimum required
    size, so that the caller can redo the call with a reallocated buffer.
    This is required by the open-by-handle patch series.

    Signed-off-by: Aneesh Kumar K.V
    Signed-off-by: Sage Weil

    Aneesh Kumar K.V
     
  • The encode_fh function should return 255 on error, as other file
    systems do, to indicate EOVERFLOW. Also, max_len is in sizeof(u32)
    units and not in bytes.

    Signed-off-by: Aneesh Kumar K.V
    Signed-off-by: Sage Weil

    Aneesh Kumar K.V
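
The two encode_fh commits above describe one calling convention: sizes are counted in 32-bit words, and a too-small buffer yields 255 plus the required size so the caller can retry. The sketch below models that convention in userspace. `demo_encode_fh`, `struct demo_inode`, the handle types 1 and 2, and the word counts 2 and 4 are assumptions for illustration, not the filesystem's actual layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_FH_OVERFLOW 255    /* "buffer too small" marker, as in the text */

struct demo_inode {
    uint32_t ino, gen;                  /* the file itself */
    uint32_t parent_ino, parent_gen;    /* its parent directory */
};

/*
 * Encode a handle into fh[]. *max_len is counted in 32-bit words, not bytes.
 * Returns a handle type (1 = file only, 2 = file + parent) on success.
 * If the buffer is too small, stores the required word count in *max_len
 * and returns DEMO_FH_OVERFLOW so the caller can reallocate and retry.
 */
int demo_encode_fh(const struct demo_inode *inode, uint32_t *fh,
                   int *max_len, int want_parent)
{
    int need = want_parent ? 4 : 2;

    assert(inode != NULL && max_len != NULL);
    if (*max_len < need) {
        *max_len = need;
        return DEMO_FH_OVERFLOW;
    }
    fh[0] = inode->ino;
    fh[1] = inode->gen;
    if (want_parent) {
        fh[2] = inode->parent_ino;
        fh[3] = inode->parent_gen;
    }
    *max_len = need;
    return want_parent ? 2 : 1;
}
```

A caller that starts with a one-word buffer gets 255 back with `*max_len` set to 4, and can then call again with a buffer of that size.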
     
  • If we interrupt an osd request, we call __cancel_request, but it wasn't
    verifying that req->r_osd was non-NULL before dereferencing it. This could
    cause a crash if osds were flapping and we aborted a request on said osd.

    Reported-by: Henry C Chang
    Signed-off-by: Sage Weil

    Sage Weil
     
  • Fix argument order.

    Signed-off-by: Henry C Chang
    Signed-off-by: Sage Weil

    Henry C Chang
     
  • When marking an inode reclaimable, a per-AG counter is increased, the
    inode is tagged reclaimable in its per-AG tree, and, when this is the
    first reclaimable inode in the AG, the AG entry in the per-mount tree
    is also tagged.

    When an inode is finally reclaimed, however, it is only deleted from
    the per-AG tree. The counter is not decreased, and the parent tree's
    AG entry is not untagged properly.

    Since the tags in the per-mount tree are not cleared, the inode
    shrinker iterates over all AGs that have had reclaimable inodes at one
    point in time.

    The counters on the other hand signal an increasing amount of slab
    objects to reclaim. Since "70e60ce xfs: convert inode shrinker to
    per-filesystem context" this is not a real issue anymore because the
    shrinker bails out after one iteration.

    But the problem was observable on a machine running v2.6.34, where the
    reclaimable work increased and each process going into direct reclaim
    eventually got stuck on the xfs inode shrinking path, trying to scan
    several million objects.

    Fix this by properly unwinding the reclaimable-state tracking of an
    inode when it is reclaimed.

    Signed-off-by: Johannes Weiner
    Cc: stable@kernel.org
    Reviewed-by: Dave Chinner
    Signed-off-by: Alex Elder

    Johannes Weiner
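
The bookkeeping in this commit can be pictured as a counter and a tag per AG that must be unwound symmetrically. The sketch below is a simplified userspace model: plain arrays stand in for the per-AG and per-mount trees, the per-inode tag is left out, and all `demo_` names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_AG_COUNT 4

struct demo_mount {
    int  ag_reclaimable[DEMO_AG_COUNT];  /* per-AG counter */
    bool ag_tagged[DEMO_AG_COUNT];       /* per-mount tag for each AG */
};

/* Mark one inode in AG `agno` reclaimable. */
void demo_set_reclaim_tag(struct demo_mount *mp, int agno)
{
    assert(agno >= 0 && agno < DEMO_AG_COUNT);
    if (mp->ag_reclaimable[agno]++ == 0)
        mp->ag_tagged[agno] = true;      /* first reclaimable inode in this AG */
}

/* Undo the above when the inode is finally reclaimed. */
void demo_clear_reclaim_tag(struct demo_mount *mp, int agno)
{
    assert(agno >= 0 && agno < DEMO_AG_COUNT);
    assert(mp->ag_reclaimable[agno] > 0);
    if (--mp->ag_reclaimable[agno] == 0)
        mp->ag_tagged[agno] = false;     /* last one gone: untag the AG */
}

/* Number of AGs a shrinker walking the tagged AGs would have to visit. */
int demo_tagged_ags(const struct demo_mount *mp)
{
    int n = 0;

    for (int i = 0; i < DEMO_AG_COUNT; i++)
        n += mp->ag_tagged[i];
    return n;
}
```

Without the clear step, the counters only grow and every AG that ever held a reclaimable inode stays tagged, which matches the symptom described in the commit message.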
     
  • * 'for-linus' of git://git.kernel.dk/linux-2.6-block:
    writeback: always use sb->s_bdi for writeback purposes

    Linus Torvalds
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
    fuse: Initialize total_len in fuse_retrieve()

    Linus Torvalds
     

04 Oct, 2010

2 commits

  • We currently use struct backing_dev_info for various different purposes.
    Originally it was introduced to describe a backing device which includes
    an unplug and congestion function and various bits of readahead information
    and VM-relevant flags. We're also using it for tracking dirty inodes for
    writeback.

    To make writeback properly find all inodes we need to only access the
    per-filesystem backing_device pointed to by the superblock in ->s_bdi
    inside the writeback code, and not the instances pointed to by
    inode->i_mapping->backing_dev which can be overridden by special devices
    or might not be set at all by some filesystems.

    Long term we should split out the writeback-relevant bits of struct
    backing_device_info (which includes more than the current bdi_writeback)
    and only point to it from the superblock while leaving the traditional
    backing device as a separate structure that can be overridden by devices.

    The one exception for now is the block device filesystem which really
    wants different writeback contexts for its different (internal) inodes
    to handle the writeout more efficiently. For now we do this with
    a hack in fs-writeback.c because we're so late in the cycle, but in
    the future I plan to replace this with a superblock method that allows
    for multiple writeback contexts per filesystem.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
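
The selection rule described in this commit (use the superblock's backing device, except for the block device filesystem) can be sketched as a small helper. This is a userspace model with hypothetical `demo_` types. Matching on the filesystem name "bdev" is an assumption about how the "hack" might tell the block device filesystem apart, not a statement of what the patch does.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct demo_bdi { const char *name; };

struct demo_super_block {
    const char *fs_name;        /* filesystem type name */
    struct demo_bdi *s_bdi;     /* per-filesystem backing device */
};

struct demo_inode {
    struct demo_super_block *i_sb;
    struct demo_bdi *mapping_bdi;   /* stands in for the per-mapping pointer */
};

/*
 * Pick the bdi that writeback should use for an inode: always the
 * superblock's, except for the block device filesystem, whose inodes
 * keep their own writeback context.
 */
struct demo_bdi *demo_inode_to_bdi(const struct demo_inode *inode)
{
    const struct demo_super_block *sb = inode->i_sb;

    assert(sb != NULL);
    if (strcmp(sb->fs_name, "bdev") == 0)
        return inode->mapping_bdi;
    return sb->s_bdi;
}
```

Note that an inode whose per-mapping pointer is unset still resolves to a usable bdi, which is the case the commit message calls out for filesystems that never set it.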
     
  • fs/fuse/dev.c:1357: warning: ‘total_len’ may be used uninitialized in this
    function

    Initialize total_len to zero, else its value will be undefined.

    Signed-off-by: Geert Uytterhoeven
    Signed-off-by: Miklos Szeredi

    Geert Uytterhoeven
     

02 Oct, 2010

5 commits

  • * git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6:
    cifs: prevent infinite recursion in cifs_reconnect_tcon
    cifs: set backing_dev_info on new S_ISREG inodes

    Linus Torvalds
     
  • Prevent recursive locking of the reiserfs lock in reiserfs_unpack()
    because we may call journal_begin() that requires the lock to be taken
    only once, otherwise it won't be able to release the lock while taking
    other mutexes, ending up in inverted dependencies between the journal
    mutex and the reiserfs lock for example.

    This fixes:

    =======================================================
    [ INFO: possible circular locking dependency detected ]
    2.6.35.4.4a #3
    -------------------------------------------------------
    lilo/1620 is trying to acquire lock:
    (&journal->j_mutex){+.+...}, at: [] do_journal_begin_r+0x7f/0x340 [reiserfs]

    but task is already holding lock:
    (&REISERFS_SB(s)->lock){+.+.+.}, at: [] reiserfs_write_lock+0x28/0x40 [reiserfs]

    which lock already depends on the new lock.

    the existing dependency chain (in reverse order) is:

    -> #1 (&REISERFS_SB(s)->lock){+.+.+.}:
    [] lock_acquire+0x67/0x80
    [] __mutex_lock_common+0x4d/0x410
    [] mutex_lock_nested+0x18/0x20
    [] reiserfs_write_lock+0x28/0x40 [reiserfs]
    [] do_journal_begin_r+0x86/0x340 [reiserfs]
    [] journal_begin+0x77/0x140 [reiserfs]
    [] reiserfs_remount+0x224/0x530 [reiserfs]
    [] do_remount_sb+0x60/0x110
    [] do_mount+0x625/0x790
    [] sys_mount+0x84/0xb0
    [] syscall_call+0x7/0xb

    -> #0 (&journal->j_mutex){+.+...}:
    [] __lock_acquire+0x1026/0x1180
    [] lock_acquire+0x67/0x80
    [] __mutex_lock_common+0x4d/0x410
    [] mutex_lock_nested+0x18/0x20
    [] do_journal_begin_r+0x7f/0x340 [reiserfs]
    [] journal_begin+0x77/0x140 [reiserfs]
    [] reiserfs_persistent_transaction+0x41/0x90 [reiserfs]
    [] reiserfs_get_block+0x22c/0x1530 [reiserfs]
    [] __block_prepare_write+0x1bb/0x3a0
    [] block_prepare_write+0x26/0x40
    [] reiserfs_prepare_write+0x88/0x170 [reiserfs]
    [] reiserfs_unpack+0xe6/0x120 [reiserfs]
    [] reiserfs_ioctl+0x272/0x320 [reiserfs]
    [] vfs_ioctl+0x28/0xa0
    [] do_vfs_ioctl+0x32d/0x5c0
    [] sys_ioctl+0x63/0x70
    [] syscall_call+0x7/0xb

    other info that might help us debug this:

    2 locks held by lilo/1620:
    #0: (&sb->s_type->i_mutex_key#8){+.+.+.}, at: [] reiserfs_unpack+0x6a/0x120 [reiserfs]
    #1: (&REISERFS_SB(s)->lock){+.+.+.}, at: [] reiserfs_write_lock+0x28/0x40 [reiserfs]

    stack backtrace:
    Pid: 1620, comm: lilo Not tainted 2.6.35.4.4a #3
    Call Trace:
    [] __lock_acquire+0x1026/0x1180
    [] lock_acquire+0x67/0x80
    [] __mutex_lock_common+0x4d/0x410
    [] mutex_lock_nested+0x18/0x20
    [] do_journal_begin_r+0x7f/0x340 [reiserfs]
    [] journal_begin+0x77/0x140 [reiserfs]
    [] reiserfs_persistent_transaction+0x41/0x90 [reiserfs]
    [] reiserfs_get_block+0x22c/0x1530 [reiserfs]
    [] __block_prepare_write+0x1bb/0x3a0
    [] block_prepare_write+0x26/0x40
    [] reiserfs_prepare_write+0x88/0x170 [reiserfs]
    [] reiserfs_unpack+0xe6/0x120 [reiserfs]
    [] reiserfs_ioctl+0x272/0x320 [reiserfs]
    [] vfs_ioctl+0x28/0xa0
    [] do_vfs_ioctl+0x32d/0x5c0
    [] sys_ioctl+0x63/0x70
    [] syscall_call+0x7/0xb

    Reported-by: Jarek Poplawski
    Tested-by: Jarek Poplawski
    Signed-off-by: Frederic Weisbecker
    Cc: Jeff Mahoney
    Cc: All since 2.6.32
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Frederic Weisbecker
     
  • The reiserfs mutex already depends on the inode mutex, so we can't lock
    the inode mutex in reiserfs_unpack() without using the safe locking API,
    because reiserfs_unpack() is always called with the reiserfs mutex locked.

    This fixes:

    =======================================================
    [ INFO: possible circular locking dependency detected ]
    2.6.35c #13
    -------------------------------------------------------
    lilo/1606 is trying to acquire lock:
    (&sb->s_type->i_mutex_key#8){+.+.+.}, at: [] reiserfs_unpack+0x60/0x110 [reiserfs]

    but task is already holding lock:
    (&REISERFS_SB(s)->lock){+.+.+.}, at: [] reiserfs_write_lock+0x28/0x40 [reiserfs]

    which lock already depends on the new lock.

    the existing dependency chain (in reverse order) is:

    -> #1 (&REISERFS_SB(s)->lock){+.+.+.}:
    [] lock_acquire+0x67/0x80
    [] __mutex_lock_common+0x4d/0x410
    [] mutex_lock_nested+0x18/0x20
    [] reiserfs_write_lock+0x28/0x40 [reiserfs]
    [] reiserfs_lookup_privroot+0x2a/0x90 [reiserfs]
    [] reiserfs_fill_super+0x941/0xe60 [reiserfs]
    [] get_sb_bdev+0x117/0x170
    [] get_super_block+0x21/0x30 [reiserfs]
    [] vfs_kern_mount+0x6a/0x1b0
    [] do_kern_mount+0x39/0xe0
    [] do_mount+0x340/0x790
    [] sys_mount+0x84/0xb0
    [] syscall_call+0x7/0xb

    -> #0 (&sb->s_type->i_mutex_key#8){+.+.+.}:
    [] __lock_acquire+0x1026/0x1180
    [] lock_acquire+0x67/0x80
    [] __mutex_lock_common+0x4d/0x410
    [] mutex_lock_nested+0x18/0x20
    [] reiserfs_unpack+0x60/0x110 [reiserfs]
    [] reiserfs_ioctl+0x272/0x320 [reiserfs]
    [] vfs_ioctl+0x28/0xa0
    [] do_vfs_ioctl+0x32d/0x5c0
    [] sys_ioctl+0x63/0x70
    [] syscall_call+0x7/0xb

    other info that might help us debug this:

    1 lock held by lilo/1606:
    #0: (&REISERFS_SB(s)->lock){+.+.+.}, at: [] reiserfs_write_lock+0x28/0x40 [reiserfs]

    stack backtrace:
    Pid: 1606, comm: lilo Not tainted 2.6.35c #13
    Call Trace:
    [] __lock_acquire+0x1026/0x1180
    [] lock_acquire+0x67/0x80
    [] __mutex_lock_common+0x4d/0x410
    [] mutex_lock_nested+0x18/0x20
    [] reiserfs_unpack+0x60/0x110 [reiserfs]
    [] reiserfs_ioctl+0x272/0x320 [reiserfs]
    [] vfs_ioctl+0x28/0xa0
    [] do_vfs_ioctl+0x32d/0x5c0
    [] sys_ioctl+0x63/0x70
    [] syscall_call+0x7/0xb

    Reported-by: Jarek Poplawski
    Tested-by: Jarek Poplawski
    Signed-off-by: Frederic Weisbecker
    Cc: Jeff Mahoney
    Cc: [2.6.32 and later]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Frederic Weisbecker
     
  • Having the limits file world readable will ease the task of system
    management on systems where root privileges might be restricted.

    An admin with restricted root privileges could not otherwise check
    other users' process limits.

    Also it'd align with most of the /proc stat files.

    Signed-off-by: Jiri Olsa
    Acked-by: Neil Horman
    Cc: Eugene Teo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jiri Olsa
     
  • cifs_reconnect_tcon is called from smb_init. After a successful
    reconnect, cifs_reconnect_tcon will call reset_cifs_unix_caps. That
    function will, in turn call CIFSSMBQFSUnixInfo and CIFSSMBSetFSUnixInfo.
    Those functions also call smb_init.

    It's possible for the session and tcon reconnect to succeed, and then
    for another cifs_reconnect to occur before CIFSSMBQFSUnixInfo or
    CIFSSMBSetFSUnixInfo is called. That'll cause those functions to call
    smb_init and cifs_reconnect_tcon again, ad infinitum...

    Break the infinite recursion by having those functions use a new
    smb_init variant that doesn't attempt to perform a reconnect.

    Reported-and-Tested-by: Michal Suchanek
    Signed-off-by: Jeff Layton
    Signed-off-by: Steve French

    Jeff Layton
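
The recursion and its fix can be modelled in a few lines. In the sketch below, `demo_smb_init()` reconnects when needed, while the post-reconnect step uses a variant that never reconnects. The `flaps_left` field simulates the link dropping again right after a reconnect. All names and the `-EAGAIN` return value are illustrative assumptions.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct demo_tcon {
    bool need_reconnect;    /* set when the connection was lost */
    int  reconnects;        /* how many reconnect attempts ran */
    int  flaps_left;        /* simulated: link drops again this many times */
};

static int demo_reconnect(struct demo_tcon *tcon);

/* Variant for callers that must never trigger a reconnect themselves. */
int demo_smb_init_no_reconnect(struct demo_tcon *tcon)
{
    return tcon->need_reconnect ? -EAGAIN : 0;
}

/* Normal entry point: reconnect first if needed, then set up the request. */
int demo_smb_init(struct demo_tcon *tcon)
{
    assert(tcon != NULL);
    if (tcon->need_reconnect)
        return demo_reconnect(tcon);
    return 0;
}

static int demo_reconnect(struct demo_tcon *tcon)
{
    tcon->reconnects++;
    tcon->need_reconnect = false;
    /* Simulate the race: the link drops again right after reconnecting. */
    if (tcon->flaps_left > 0) {
        tcon->flaps_left--;
        tcon->need_reconnect = true;
    }
    /*
     * Post-reconnect capability reset. Calling demo_smb_init() here would
     * recurse for as long as the link keeps flapping; the no-reconnect
     * variant fails fast instead, and its result is ignored.
     */
    (void)demo_smb_init_no_reconnect(tcon);
    return 0;
}
```

However often the simulated link flaps, one call to `demo_smb_init()` performs at most one reconnect, so the call depth stays bounded.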
     

30 Sep, 2010

3 commits


29 Sep, 2010

1 commit

  • I have been seeing occasional pauses in transaction throughput up to
    30s long under heavy parallel workloads. The only notable thing was
    that the xfsaild was trying to be active during the pauses, but
    making no progress. It was running exactly 20 times a second (on the
    50ms no-progress backoff), and the number of pushbuf events was
    constant across this time as well. IOWs, the xfsaild appeared to be
    stuck on buffers that it could not push out.

    Further investigation indicated that it was trying to push out inode
    buffers that were pinned and/or locked. The xfsbufd was also getting
    woken at the same frequency (by the xfsaild, no doubt) to push out
    delayed write buffers. The xfsbufd was not making any progress
    because all the buffers in the delwri queue were pinned. This
    scan-and-make-no-progress dance went on in the trace for some seconds,
    before the xfssyncd came along and issued a log force, and then
    things started going again.

    However, I noticed something strange about the log force - there
    were way too many IO's issued. 516 log buffers were written, to be
    exact. That added up to 129MB of log IO, which got me very
    interested because it's almost exactly 25% of the size of the log.
    The delayed logging code is supposed to aggregate the minimum of 25%
    of the log or 8MB worth of changes before flushing. That's what
    really puzzled me - why did a log force write 129MB instead of only
    8MB?

    Essentially what has happened is that no CIL pushes had occurred
    since the previous tail push which cleared out 25% of the log space.
    That caused all the new transactions to block because there wasn't
    log space for them, but they kick the xfsaild to push the tail.
    However, the xfsaild was not making progress because there were
    buffers it could not lock and flush, and the xfsbufd could not flush
    them because they were pinned. As a result, both the xfsaild and the
    xfsbufd could not move the tail of the log forward without the CIL
    first committing.

    The cause of the problem was that the background CIL push, which
    should happen when 8MB of aggregated changes have been committed, is
    being held off by the concurrent transaction commit load. The
    background push does a down_write_trylock() which will fail if there
    is a concurrent transaction commit holding the push lock in read
    mode. With 8 CPUs all doing transactions as fast as they can, there
    were enough concurrent transaction commits to hold off the background
    push until tail-pushing could no longer free log space, and the halt
    would occur.

    It should be noted that there is no reason why it would halt at 25%
    of log space used by a single CIL checkpoint. This bug could
    definitely violate the "no transaction should be larger than half
    the log" requirement and hence result in corruption if the system
    crashed under heavy load. This sort of bug is exactly the reason why
    delayed logging was tagged as experimental....

    The fix is to start blocking background pushes once the threshold
    has been exceeded. Rework the threshold calculations to keep the
    amount of log space a CIL checkpoint can use to below that of the
    AIL push threshold to avoid the problem completely.

    Signed-off-by: Dave Chinner
    Reviewed-by: Alex Elder
    Reviewed-by: Christoph Hellwig

    Dave Chinner
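
One way to read the fix is as a three-way decision on how the background push takes the push lock. The sketch below is a rough model under that reading. The split into a "soft" and a "hard" limit, and the names used, are assumptions for illustration and not taken from the commit message.

```c
#include <assert.h>

enum demo_push_mode {
    DEMO_PUSH_NONE,      /* below the background threshold: nothing to do */
    DEMO_PUSH_TRYLOCK,   /* opportunistic push; give up if commits hold the lock */
    DEMO_PUSH_BLOCKING,  /* wait for the lock so the push cannot be starved */
};

/*
 * Decide how a background push should take the push lock.
 * soft_limit and hard_limit are illustrative byte counts.
 */
enum demo_push_mode demo_cil_push_mode(long space_used, long soft_limit,
                                       long hard_limit)
{
    assert(soft_limit > 0 && soft_limit <= hard_limit);
    if (space_used < soft_limit)
        return DEMO_PUSH_NONE;
    if (space_used < hard_limit)
        return DEMO_PUSH_TRYLOCK;
    return DEMO_PUSH_BLOCKING;
}
```

A trylock-only policy corresponds to never reaching the blocking case, which is how a sustained commit load could hold off the push until 129MB had accumulated.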
     

25 Sep, 2010

1 commit

  • * 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2:
    o2dlm: force free mles during dlm exit
    ocfs2: Sync inode flags with ext2.
    ocfs2: Move 'wanted' into parens of ocfs2_resmap_resv_bits.
    ocfs2: Use cpu_to_le16 for e_leaf_clusters in ocfs2_bg_discontig_add_extent.
    ocfs2: update ctime when changing the file's permission by setfacl
    ocfs2/net: fix uninitialized ret in o2net_send_message_vec()
    Ocfs2: Handle empty list in lockres_seq_start() for dlmdebug.c
    Ocfs2: Re-access the journal after ocfs2_insert_extent() in dxdir codes.
    ocfs2: Fix lockdep warning in reflink.
    ocfs2/lockdep: Move ip_xattr_sem out of ocfs2_xattr_get_nolock.

    Linus Torvalds
     

24 Sep, 2010

5 commits

  • While umounting, a block mle doesn't get freed if dlm is shutdown after
    master request is received but before assert master. This results in unclean
    shutdown of dlm domain.

    This patch frees all mles that lie around after other nodes were notified about
    exiting the dlm and marking dlm state as leaving. Only block mles are expected
    to be around, so we log ERROR for other mles but still free them.

    Signed-off-by: Srinivas Eeda
    Signed-off-by: Joel Becker

    Srinivas Eeda
     
  • We sync our inode flags with ext2 and define them as hex
    values. However, commit 3669567 (4 years ago) moved all of
    these values to include/linux/fs.h, so we should use them
    the way ext2 does. Sync our inode flags with ext2 by using
    FS_*.

    Signed-off-by: Tao Ma
    Signed-off-by: Joel Becker

    Tao Ma
     
  • The first time I read the function ocfs2_resmap_resv_bits, I wondered
    what 'wanted' was used for and puzzled over the comments.
    Then I found it is only used if the reservation is empty. ;)

    So we'd better move it inside the parens to make the code more
    readable. What's more, ocfs2_resmap_resv_bits is called so frequently
    that we should save some CPU cycles.

    Acked-by: Mark Fasheh
    Signed-off-by: Tao Ma
    Signed-off-by: Joel Becker

    Tao Ma
     
  • e_leaf_clusters is a le16, so use cpu_to_le16 instead
    of cpu_to_le32.

    What's more, we change 'clusters' to unsigned int to
    signify that the size of 'clusters' isn't important here.

    Signed-off-by: Tao Ma
    Signed-off-by: Joel Becker

    Tao Ma
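
Why the width of the conversion matters: a 32-bit conversion whose result is stored in a 16-bit field would be truncated, which on a big-endian host keeps the wrong bytes. The sketch below uses a portable userspace stand-in for `cpu_to_le16()`. `demo_cpu_to_le16`, `struct demo_extent_rec` and `demo_set_leaf_clusters` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Portable stand-in for cpu_to_le16(): low byte first, on any host. */
uint16_t demo_cpu_to_le16(uint16_t v)
{
    uint8_t bytes[2] = { (uint8_t)(v & 0xff), (uint8_t)(v >> 8) };
    uint16_t out;

    memcpy(&out, bytes, sizeof(out));
    return out;
}

struct demo_extent_rec {
    uint16_t e_leaf_clusters;   /* stored little-endian on disk */
};

void demo_set_leaf_clusters(struct demo_extent_rec *rec, unsigned int clusters)
{
    assert(clusters <= UINT16_MAX);  /* must fit the 16-bit on-disk field */
    rec->e_leaf_clusters = demo_cpu_to_le16((uint16_t)clusters);
}
```

Taking `clusters` as a plain `unsigned int` and narrowing it only at the point of storage mirrors the commit's remark that the variable's own size is not important.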
     
  • In commit 30e2bab, ext3 fixed it. So change it accordingly in ocfs2.

    Steps to reproduce:
    # touch aaa
    # stat -c %Z aaa
    1283760364
    # setfacl -m 'u::x,g::x,o::x' aaa
    # stat -c %Z aaa
    1283760364

    Signed-off-by: Tao Ma
    Signed-off-by: Joel Becker

    Tao Ma
     

23 Sep, 2010

4 commits

  • Currently, /proc//smaps has wrong dirty pages accounting.
    Shared_Dirty and Private_Dirty count only pte-dirty pages and ignore
    the PG_dirty page flag. This differs from the documentation and is also
    inconsistent with the Referenced field, which checks both pte and
    page flags.

    This patch fixes it.

    Test program:

    large-array.c
    ---------------------------------------------------
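    #include <string.h>    /* memset() */
    #include <unistd.h>    /* pause() */

    char array[1*1024*1024*1024L];

    int main(void)
    {
        /* Touch every byte so the whole 1 GiB array becomes resident and dirty. */
        memset(array, 1, sizeof(array));
        /* Sleep until a signal arrives so smaps can be inspected. */
        pause();

        return 0;
    }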
    ---------------------------------------------------

    Test case:
    1. run ./large-array
    2. cat /proc/`pidof large-array`/smaps
    3. swapoff -a
    4. cat /proc/`pidof large-array`/smaps again

    Test result:

    00601000-40601000 rw-p 00000000 00:00 0
    Size: 1048576 kB
    Rss: 1048576 kB
    Pss: 1048576 kB
    Shared_Clean: 0 kB
    Shared_Dirty: 0 kB
    Private_Clean: 218992 kB

    00601000-40601000 rw-p 00000000 00:00 0
    Size: 1048576 kB
    Rss: 1048576 kB
    Pss: 1048576 kB
    Shared_Clean: 0 kB
    Shared_Dirty: 0 kB
    Private_Clean: 0 kB
    Private_Dirty: 1048576 kB
    Acked-by: Hugh Dickins
    Cc: Matt Mackall
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KOSAKI Motohiro
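
The accounting rule this commit describes (a page counts as dirty if either the pte or the page flag says so) can be sketched as follows. `struct demo_page` and `demo_count_dirty` are illustrative stand-ins, not kernel structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct demo_page {
    bool pte_dirty;     /* dirty bit in the page table entry */
    bool page_dirty;    /* dirty flag on the page itself (PG_dirty) */
};

/* Count dirty pages with either bit taken into account. */
size_t demo_count_dirty(const struct demo_page *pages, size_t n)
{
    size_t dirty = 0;

    assert(pages != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (pages[i].pte_dirty || pages[i].page_dirty)
            dirty++;
    return dirty;
}
```

Counting only `pte_dirty` would report a page with a clean pte but a set page flag as clean, which is the mismatch visible in the first test result above.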
     
  • OCFS2 can return ERESTARTSYS from its write function when the process is
    signalled while waiting for a cluster lock (and the filesystem is mounted
    with intr mount option). Generally, it seems reasonable to allow
    filesystems to return this error code from their IO functions. As we must
    not leak ERESTARTSYS (and similar error codes) to userspace as a result of
    an AIO operation, we have to properly convert it to EINTR inside AIO code
    (restarting the syscall isn't really an option because other AIO could
    have been already submitted by the same io_submit syscall).

    Signed-off-by: Jan Kara
    Reviewed-by: Jeff Moyer
    Cc: Christoph Hellwig
    Cc: Zach Brown
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kara
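
The conversion described in this commit amounts to a small mapping applied to the result of the filesystem's IO method. A userspace sketch follows. The restart codes are kernel-internal and absent from userspace `<errno.h>`, so they are defined locally with a `DEMO_` prefix, with numeric values given only for illustration. Which codes count as "similar" is also an assumption.

```c
#include <errno.h>

/* Kernel-internal restart codes; userspace <errno.h> does not define them. */
#define DEMO_ERESTARTSYS            512
#define DEMO_ERESTARTNOINTR         513
#define DEMO_ERESTARTNOHAND         514
#define DEMO_ERESTART_RESTARTBLOCK  516

/* Map the result of a filesystem's I/O method to what AIO should report. */
long demo_aio_fixup_result(long ret)
{
    switch (ret) {
    case -DEMO_ERESTARTSYS:
    case -DEMO_ERESTARTNOINTR:
    case -DEMO_ERESTARTNOHAND:
    case -DEMO_ERESTART_RESTARTBLOCK:
        /* The syscall cannot be restarted, so report an interruption. */
        return -EINTR;
    default:
        return ret;     /* byte counts and ordinary errors pass through */
    }
}
```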
     
  • Commit 73296bc611 ("procfs: Use generic_file_llseek in /proc/vmcore")
    broke seeking on /proc/vmcore. This changes it back to use default_llseek
    in order to restore the original behaviour.

    The problem with generic_file_llseek is that it only allows seeks up to
    inode->i_sb->s_maxbytes, which is zero on procfs and some other virtual
    file systems. We should merge generic_file_llseek and default_llseek some
    day and clean this up in a proper way, but for 2.6.35/36, reverting vmcore
    is the safer solution.

    Signed-off-by: Arnd Bergmann
    Cc: Frederic Weisbecker
    Reported-by: CAI Qian
    Tested-by: CAI Qian
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arnd Bergmann
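
The effect of a zero s_maxbytes is easy to see in a simplified model of the two seek paths. The sketch below covers only absolute (SEEK_SET style) offsets, and the `demo_` functions are illustrative rather than the kernel's implementations.

```c
#include <errno.h>

/* Absolute seek with an upper bound, as a size-limited llseek applies it. */
long long demo_llseek_bounded(long long offset, long long maxbytes)
{
    if (offset < 0 || offset > maxbytes)
        return -EINVAL;
    return offset;
}

/* The same without the upper bound. */
long long demo_llseek_unbounded(long long offset)
{
    if (offset < 0)
        return -EINVAL;
    return offset;
}
```

With `maxbytes` equal to zero, the bounded variant rejects every offset except 0, which is why seeking on a file backed by such a filesystem stops working.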
     
  • In 32-bit compatibility mode, the error handling for
    compat_do_readv_writev() may free an uninitialized pointer, potentially
    leading to all sorts of ugly memory corruption. This is reliably
    triggerable by unprivileged users by invoking the readv()/writev()
    syscalls with an invalid iovec pointer. The below patch fixes this to
    emulate the non-compat version.

    Introduced by commit b83733639a49 ("compat: factor out
    compat_rw_copy_check_uvector from compat_do_readv_writev")

    Signed-off-by: Dan Rosenberg
    Cc: stable@kernel.org (2.6.35)
    Cc: Al Viro
    Signed-off-by: Linus Torvalds

    Dan Rosenberg
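
The bug class described here, an error path freeing a pointer that was never initialized, is usually avoided by pointing the variable at the on-stack array before anything can fail. The sketch below shows that pattern in userspace. `demo_readv_check`, `struct demo_iovec` and `DEMO_FAST_IOV` are hypothetical, and a NULL `user` argument stands in for an invalid iovec pointer.

```c
#include <errno.h>
#include <stdlib.h>

#define DEMO_FAST_IOV 8

struct demo_iovec { void *base; size_t len; };

/*
 * Copy and validate `count` entries of a caller-supplied vector and return
 * the total length. Small vectors use the on-stack array; larger ones are
 * heap-allocated.
 */
long demo_readv_check(const struct demo_iovec *user, size_t count)
{
    struct demo_iovec iovstack[DEMO_FAST_IOV];
    struct demo_iovec *iov = iovstack;  /* initialized before any error path */
    long ret = -EFAULT;

    if (user == NULL)
        goto out;       /* early failure: iov still points at the stack */
    if (count > DEMO_FAST_IOV) {
        iov = malloc(count * sizeof(*iov));
        if (iov == NULL) {
            iov = iovstack;
            ret = -ENOMEM;
            goto out;
        }
    }
    ret = 0;
    for (size_t i = 0; i < count; i++) {
        iov[i] = user[i];
        ret += (long)iov[i].len;
    }
out:
    if (iov != iovstack)    /* only free what was actually allocated */
        free(iov);
    return ret;
}
```

Because `iov` always holds either the stack array or a successful allocation, the cleanup at `out:` is safe on every path, including the early failure.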