16 Apr, 2008

2 commits

  • Describe debug parameters with their names (and not their values).

    Signed-off-by: Paul Bolle
    Signed-off-by: David Howells
    Signed-off-by: Linus Torvalds

    Paul Bolle
     
  • mb_cache_entry_alloc() was allocating cache entries with GFP_KERNEL. But
    filesystems are calling this function while holding xattr_sem, so a possible
    recursion into the fs violates the lock ordering of xattr_sem versus
    transaction start / i_mutex for ext2-4. Change mb_cache_entry_alloc() so that
    filesystems can specify the desired gfp mask, and use GFP_NOFS from all of them.

    Signed-off-by: Jan Kara
    Reported-by: Dave Jones
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kara
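
The gfp-mask change above can be illustrated with a small user-space model. This is a sketch under stated assumptions, not kernel code. sketch_gfp_t and the SKETCH_GFP_* flags are simplified stand-ins for gfp_t. The "xattr_sem held" flag, the reclaim callback and all helper names are hypothetical, and they only record whether an allocation made under xattr_sem would have recursed into the filesystem.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Simplified stand-in for the kernel's gfp_t; flag values are illustrative. */
typedef unsigned int sketch_gfp_t;
#define SKETCH_GFP_FS     0x1u          /* allocator may recurse into the fs */
#define SKETCH_GFP_KERNEL SKETCH_GFP_FS /* ordinary allocation */
#define SKETCH_GFP_NOFS   0x0u          /* fs recursion forbidden */

struct sketch_cache_entry { int key; };

/* Hypothetical state: is "xattr_sem" held, and how many times did
   reclaim re-enter the fs while it was held? */
static bool xattr_sem_held;
static int lock_order_violations;

/* Models reclaim calling back into the filesystem, which would need to
   start a transaction / take i_mutex: wrong order if xattr_sem is held. */
static void sketch_fs_reclaim(void)
{
    if (xattr_sem_held)
        lock_order_violations++;
}

/* Like the patched mb_cache_entry_alloc(), the caller passes the gfp mask.
   This model assumes memory pressure, so reclaim runs whenever allowed. */
static struct sketch_cache_entry *sketch_cache_entry_alloc(sketch_gfp_t gfp)
{
    if (gfp & SKETCH_GFP_FS)
        sketch_fs_reclaim();
    return malloc(sizeof(struct sketch_cache_entry));
}

/* Hypothetical xattr path: allocates a cache entry with xattr_sem held. */
static struct sketch_cache_entry *sketch_xattr_cache_insert(sketch_gfp_t gfp)
{
    xattr_sem_held = true;
    struct sketch_cache_entry *e = sketch_cache_entry_alloc(gfp);
    xattr_sem_held = false;
    return e;
}
```

With SKETCH_GFP_KERNEL the model records one ordering violation. With SKETCH_GFP_NOFS it records none, which is why each filesystem is made to pass GFP_NOFS.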
     

15 Apr, 2008

2 commits

  • This fixes a regression introduced in commit
    205c109a7a96d9a3d8ffe64c4068b70811fef5e8 when switching to
    write_begin/write_end operations in JFFS2.

    The page offset is miscalculated, leading to corruption of the fragment
    lists and subsequently to memory corruption and panics.

    [ Side note: the bug is a fairly direct result of the naming. Nick was
    likely misled by the use of "offs", since we tend to use the notion of
    "offset" not as an absolute position, but as an offset _within_ a page
    or allocation.

    Alternatively, a "pgoff_t" is a page index, but not a byte offset -
    our VM naming can be a bit confusing.

    So in this case, a VM person would likely have called this a "pos",
    not an "offs", or perhaps talked about byte offsets rather than page
    offsets (since it's counted in bytes, not pages). - Linus ]

    Signed-off-by: Alexey Korolev
    Signed-off-by: Vasiliy Leonenko
    Signed-off-by: David Woodhouse
    Signed-off-by: Linus Torvalds

    Alexey Korolev
     
  • Miklos Szeredi found the bug:

    "Basically what happens is that on the server nlm_fopen() calls
    nfsd_open() which returns -EACCES, to which nlm_fopen() returns
    NLM_LCK_DENIED.

    "On the client this will turn into a -EAGAIN (nlm_stat_to_errno()),
    which in turn will cause fcntl_setlk() to retry forever."

    So, for example, opening a file on an nfs filesystem, changing
    permissions to forbid further access, then trying to lock the file,
    could result in an infinite loop.

    And Trond Myklebust identified the culprit, from Marc Eshel and me:

    7723ec9777d9832849b76475b1a21a2872a40d20 "locks: factor out
    generic/filesystem switch from setlock code"

    That commit claimed to just be reshuffling code, but actually introduced
    a behavioral change by calling the lock method repeatedly as long as it
    returned -EAGAIN.

    We assumed this would be safe, since we assumed a lock of type SETLKW
    would only return with either success or an error other than -EAGAIN.
    However, nfs can in fact return -EAGAIN in this situation, and
    independently of whether that behavior is correct or not, we don't
    actually need this change, and it seems far safer not to depend on such
    assumptions about the filesystem's ->lock method.

    Therefore, revert the problematic part of the original commit. This
    leaves vfs_lock_file() and its other callers unchanged, while returning
    fcntl_setlk and fcntl_setlk64 to their former behavior.

    Signed-off-by: J. Bruce Fields
    Tested-by: Miklos Szeredi
    Cc: Trond Myklebust
    Cc: Marc Eshel
    Signed-off-by: Linus Torvalds

    J. Bruce Fields
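
Linus's side note on naming separates three quantities that are easy to mix up:
- an absolute byte position ("pos")
- a page index (pgoff_t)
- a byte offset within a page

The helpers below are a minimal sketch of that distinction. They assume 4 KiB pages (the real PAGE_SHIFT is architecture-dependent), and all names are hypothetical. This is not the JFFS2 code or the fix itself.

```c
#include <stdint.h>

/* Assumed 4 KiB pages for illustration only. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE  (1UL << SKETCH_PAGE_SHIFT)

/* A page index counts pages, not bytes (cf. pgoff_t). */
typedef unsigned long sketch_pgoff_t;

/* pos is an absolute byte position in the file. */
static sketch_pgoff_t sketch_page_index(int64_t pos)
{
    return (sketch_pgoff_t)(pos >> SKETCH_PAGE_SHIFT);
}

/* Byte offset *within* the page: always 0 .. PAGE_SIZE-1. */
static uint32_t sketch_offset_in_page(int64_t pos)
{
    return (uint32_t)(pos & (SKETCH_PAGE_SIZE - 1));
}

/* Absolute byte position of the start of the page containing pos. */
static int64_t sketch_page_start_pos(int64_t pos)
{
    return (int64_t)sketch_page_index(pos) << SKETCH_PAGE_SHIFT;
}
```

For pos = 8193 these give page index 2, in-page offset 1, and page start 8192. Substituting one for another is exactly the kind of slip the side note warns about.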
     

12 Apr, 2008

2 commits

  • * 'docs' of git://git.lwn.net/linux-2.6:
    Add additional examples in Documentation/spinlocks.txt
    Move sched-rt-group.txt to scheduler/
    Documentation: move rpc-cache.txt to filesystems/
    Documentation: move nfsroot.txt to filesystems/
    Spell out behavior of atomic_dec_and_lock() in kerneldoc
    Fix a typo in highres.txt
    Fixes to the seq_file document
    Fill out information on patch tags in SubmittingPatches
    Add the seq_file documentation

    Linus Torvalds
     
  • Documentation/ is a little large, and filesystems/ seems an obvious
    place for this file.

    Signed-off-by: J. Bruce Fields
    Signed-off-by: Jonathan Corbet

    J. Bruce Fields
     

11 Apr, 2008

7 commits

  • Michael Kerrisk found out that signalfd was not reporting back user data
    pushed using sigqueue:

    http://groups.google.com/group/linux.kernel/msg/9397cab8551e3123

    The following patch makes signalfd report back the ssi_ptr and ssi_int members
    of the signalfd_siginfo structure.

    Signed-off-by: Davide Libenzi
    Acked-by: Michael Kerrisk
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davide Libenzi
     
  • Jeff Roberson discovered a race when using kaio eventfd-based notifications.
    When it occurs, it can lead to missed wakeups and hung userspace.

    This patch fixes the race by moving the notification inside the spinlocked
    section of kaio. The operation is safe since the eventfd spinlock and the
    kaio one are unrelated.

    Signed-off-by: Davide Libenzi
    Cc: Zach Brown
    Cc: Jeff Roberson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davide Libenzi
     
  • Use asmlinkage_protect in sys_io_getevents, because GCC for i386 with
    CONFIG_FRAME_POINTER=n can decide to clobber an argument word on the
    stack, i.e. the user struct pt_regs. Here the problem is not a tail
    call, but just the compiler's use of the stack when it inlines and
    optimizes the body of the called function. This seems to avoid it.

    Signed-off-by: Roland McGrath
    Signed-off-by: Linus Torvalds

    Roland McGrath
     
  • The prevent_tail_call() macro works around the problem of the compiler
    clobbering argument words on the stack, which for asmlinkage functions
    is the caller's (user's) struct pt_regs. The tail/sibling-call
    optimization is not the only way that the compiler can decide to use
    stack argument words as scratch space, which we have to prevent.
    Other optimizations can do it too.

    Until we have new compiler support to make "asmlinkage" binding on the
    compiler's own use of the stack argument frame, we have to work around all
    the manifestations of this issue that crop up.

    More cases seem to be prevented by also keeping the incoming argument
    variables live at the end of the function. This makes their original
    stack slots attractive places to leave those variables, so the compiler
    tends not to clobber them for something else. It's still no guarantee, but
    it handles some observed cases that prevent_tail_call() did not.

    Signed-off-by: Roland McGrath
    Signed-off-by: Linus Torvalds

    Roland McGrath
     
  • * 'for-linus' of git://oss.sgi.com:8090/xfs/xfs-2.6:
    [XFS] Ensure "both" features2 slots are consistent
    [XFS] Fix superblock features2 field alignment problem
    [XFS] remove shouting-indirection macros from xfs_sb.h

    Linus Torvalds
     
  • * 'for-linus' of git://git.kernel.dk/linux-2.6-block:
    cfq-iosched: do not leak ioc_data across iosched switches
    splice: fix infinite loop in generic_file_splice_read()

    Linus Torvalds
     
  • Some time ago, while attempting to handle invalid link counts, I botched
    the unlinking of links itself. This patch now fixes it correctly, so that
    only the link count of nodes that don't point to links is ignored.
    Thanks to Vlado Plaga for notifying me of this problem.

    Signed-off-by: Roman Zippel
    Signed-off-by: Linus Torvalds

    Roman Zippel
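
The signalfd fix can be observed from user space. Queue a signal with a payload using sigqueue(), then read it back through a signalfd. This is a minimal Linux-only sketch, and the helper name is hypothetical. On kernels without this fix, the ssi_int field would not carry the queued value.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <sys/signalfd.h>
#include <unistd.h>

/* Hypothetical helper: queue SIGUSR1 to this process carrying `value`,
   then read it back through a signalfd. Returns ssi_int, or -1 on error. */
static int sketch_sigqueue_roundtrip(int value)
{
    sigset_t mask;
    sigemptyset(&mask);
    sigaddset(&mask, SIGUSR1);

    /* Block SIGUSR1 so it stays pending for the signalfd instead of
       being delivered to a handler. */
    if (sigprocmask(SIG_BLOCK, &mask, NULL) == -1)
        return -1;

    int fd = signalfd(-1, &mask, 0);
    if (fd == -1)
        return -1;

    /* Attach the integer payload, as sigqueue() callers do. */
    union sigval sv;
    sv.sival_int = value;
    if (sigqueue(getpid(), SIGUSR1, sv) == -1) {
        close(fd);
        return -1;
    }

    /* Each read returns one struct signalfd_siginfo; with the fix the
       kernel fills in ssi_int (and ssi_ptr) from the queued sigval. */
    struct signalfd_siginfo si;
    ssize_t n = read(fd, &si, sizeof si);
    close(fd);
    if (n != (ssize_t)sizeof si)
        return -1;
    return (int)si.ssi_int;
}
```

On a kernel that includes this fix, sketch_sigqueue_roundtrip(42) should return 42.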
     

10 Apr, 2008

4 commits

  • Since older kernels may look in the sb_bad_features2 slot for flags,
    rather than zeroing it out on fixup, we should make it equal to the
    sb_features2 value.

    Also, if the ATTR2 flag was not found prior to features2 fixup, it was not
    set in the mount flags, so re-check after the fixup so that the current
    session will use the feature.

    Also fix up the comments to reflect these changes.

    SGI-PV: 980085
    SGI-Modid: xfs-linux-melb:xfs-kern:30778a

    Signed-off-by: Eric Sandeen
    Signed-off-by: David Chinner
    Signed-off-by: Lachlan McIlroy

    Eric Sandeen
     
  • Because the xfs_dsb_t structure is not 64-bit aligned, the last field of
    the on-disk superblock can vary in location. This causes problems when the
    filesystem gets moved to a different platform, or when there is a 32-bit
    userspace and a 64-bit kernel.

    This patch detects the defect at mount time, logs a warning such as:

    XFS: correcting sb_features alignment problem

    in dmesg, and corrects the problem so that everything is OK. It also
    blacklists the bad field in the superblock so it does not get used for
    something else later on.

    SGI-PV: 977636
    SGI-Modid: xfs-linux-melb:xfs-kern:30539a

    Signed-off-by: David Chinner
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Eric Sandeen
    Signed-off-by: Lachlan McIlroy

    David Chinner
     
  • Remove macro-to-small-function indirection from xfs_sb.h, and remove some
    which are completely unused.

    SGI-PV: 976035
    SGI-Modid: xfs-linux-melb:xfs-kern:30528a

    Signed-off-by: Eric Sandeen
    Signed-off-by: Donald Douwsma
    Signed-off-by: Lachlan McIlroy

    Eric Sandeen
     
  • There's a quirky loop in generic_file_splice_read() that could go
    on indefinitely, if the file splice returns 0 permanently (and not
    just as a temporary condition). Get rid of the loop and pass
    back -EAGAIN correctly from __generic_file_splice_read(), so we
    handle that condition properly as well.

    Signed-off-by: Jens Axboe

    Jens Axboe
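
The sb_bad_features2 commit above describes a two-step fixup: merge the two slots, then keep them equal. ATTR2 is checked only after the fixup. The sketch below models only that step on a reduced two-field structure. The struct, the SKETCH_FEATURES2_ATTR2 value and the function names are illustrative assumptions, not the XFS definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical superblock reduced to the two slots discussed above. */
struct sketch_sb {
    uint32_t sb_features2;
    uint32_t sb_bad_features2;
};

/* Illustrative stand-in for the ATTR2 feature bit; value assumed. */
#define SKETCH_FEATURES2_ATTR2 0x8u

/* Merge any flags found in the misaligned slot, then make both slots equal
   (instead of zeroing the bad one) so older kernels reading either slot
   see the same flags. Returns true if the superblock was modified. */
static bool sketch_fixup_features2(struct sketch_sb *sb)
{
    if (sb->sb_features2 == sb->sb_bad_features2)
        return false;
    sb->sb_features2 |= sb->sb_bad_features2;
    sb->sb_bad_features2 = sb->sb_features2;
    return true;
}

/* Mount-time check: decide whether ATTR2 is in use only after the fixup,
   so a flag recovered from the bad slot takes effect this session. */
static bool sketch_mount_uses_attr2(struct sketch_sb *sb)
{
    sketch_fixup_features2(sb);
    return (sb->sb_features2 & SKETCH_FEATURES2_ATTR2) != 0;
}
```

Checking ATTR2 before the fixup would miss a flag that was only present in the bad slot. That is the case the commit's re-check addresses.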
     

09 Apr, 2008

2 commits


05 Apr, 2008

1 commit

  • Mikulas Patocka noted that the optimization where we check if a buffer
    was already dirty (and we avoid re-dirtying it) was not really SMP-safe.

    Since the read of the old status was not synchronized with anything, an
    aggressive CPU re-ordering of memory accesses might have moved that read
    up to before the data was even written to the buffer, and another CPU
    that cleaned it again, causing the newly dirty state to never actually
    hit the disk.

    Admittedly this would probably never trigger in practice, but it's still
    wrong.

    Mikulas sent a patch that fixed the problem, but I dislike the subtlety
    of the whole optimization, so this is an alternate fix that is more
    explicit about the particular SMP ordering for the optimization, and
    separates out the speculative reads of the buffer state into its own
    conditional (and makes the memory barrier only happen if we are likely
    to actually hit the optimized case in the first place).

    I considered removing the optimization entirely, but Andrew argued for
    its continued existence. I'm a push-over.

    Cc: Mikulas Patocka
    Cc: Andrew Morton
    Signed-off-by: Linus Torvalds

    Linus Torvalds
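
The alternate fix has three steps:
1. A cheap speculative check of the dirty state.
2. A full barrier, taken only if that check suggests the fast path.
3. A re-check before skipping the re-dirtying.

This shape can be sketched with C11 atomics. It is an analogue under stated assumptions, not the buffer-head code. atomic_thread_fence(memory_order_seq_cst) stands in for the kernel's full memory barrier, and the struct and function names are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical buffer: a dirty flag plus a counter for the slow path
   (standing in for follow-up work such as dirtying the owning page). */
struct sketch_buffer {
    atomic_bool dirty;
    atomic_int slow_path_count;
};

/* Returns true if this call moved the buffer from clean to dirty. */
static bool sketch_mark_buffer_dirty(struct sketch_buffer *b)
{
    /* Speculative, unordered read: only if the buffer already looks dirty
       do we pay for the barrier below. */
    if (atomic_load_explicit(&b->dirty, memory_order_relaxed)) {
        /* Full fence, intended to keep the re-read from being performed
           before the caller's earlier stores to the buffer data. */
        atomic_thread_fence(memory_order_seq_cst);
        if (atomic_load_explicit(&b->dirty, memory_order_relaxed))
            return false;   /* still dirty: skip re-dirtying */
    }
    /* Slow path: atomically set the flag; only the caller that actually
       flips it from clean to dirty does the follow-up work. */
    if (!atomic_exchange(&b->dirty, true)) {
        atomic_fetch_add(&b->slow_path_count, 1);
        return true;
    }
    return false;
}

/* Models writeback cleaning the buffer. */
static void sketch_clean_buffer(struct sketch_buffer *b)
{
    atomic_store(&b->dirty, false);
}
```

The barrier sits inside the first conditional, so callers that find the buffer clean go straight to the atomic exchange without paying for it. This mirrors the description of making the barrier happen only when the optimized case is likely.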
     

04 Apr, 2008

2 commits


03 Apr, 2008

1 commit


02 Apr, 2008

1 commit


31 Mar, 2008

3 commits


29 Mar, 2008

3 commits

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
    [PATCH] mnt_expire is protected by namespace_sem, no need for vfsmount_lock
    [PATCH] do shrink_submounts() for all fs types
    [PATCH] sanitize locking in mark_mounts_for_expiry() and shrink_submounts()
    [PATCH] count ghost references to vfsmounts
    [PATCH] reduce stack footprint in namespace.c

    Linus Torvalds
     
  • kafs doesn't check if the cell already exists - so if you do an echo "add
    newcell.org 1.2.3.4" >/proc/fs/afs/cells it will try to create this cell
    again. kobject will also complain about a double registration. To prevent
    such problems, return -EEXIST in that case.

    Signed-off-by: Sven Schnelle
    Signed-off-by: David Howells
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sven Schnelle
     
  • The current nobh_write_end() implementation ignores the partial-write case
    (copied < len) if the page was fully mapped, and simply marks the page
    Uptodate. This is wrong, because the area [pos+copied, pos+len) was not
    explicitly updated by the preceding write_begin call; it simply contains
    garbage from the pagecache, which results in a data leak.

    #TEST_CASE_BEGIN:
    ~~~~~~~~~~~~~~~~
    The issue is triggered by the classic test case:
    open("/mnt/test", O_RDWR|O_CREAT|O_TRUNC, 0666) = 3
    ftruncate(3, 409600) = 0
    writev(3, [{"a", 1}, {NULL, 4095}], 2) = 1
    ##TESTCASE_SOURCE:
    ~~~~~~~~~~~~~~~~~
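    #include <errno.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/types.h>
    #include <sys/uio.h>
    #include <unistd.h>
    int main(int argc, char **argv)
    {
        int fd, ret;
        struct iovec iov[2];
        if (argc < 2) {
            fprintf(stderr, "usage: %s <file>\n", argv[0]);
            return 1;
        }
        /* Create an empty file and extend it so the first page lies inside i_size. */
        fd = open(argv[1], O_RDWR|O_CREAT|O_TRUNC, 0666);
        if (fd < 0) {
            perror("open");
            return 1;
        }
        ftruncate(fd, 409600);
        /* The first segment copies 1 byte; the NULL second segment cannot be
           copied, so writev() returns a short count (a partial write). */
        iov[0].iov_base = "a";
        iov[0].iov_len = 1;
        iov[1].iov_base = NULL;
        iov[1].iov_len = 4096;
        ret = writev(fd, iov, sizeof(iov)/sizeof(struct iovec));
        printf("writev = %d, err = %d\n", ret, errno);
        return 0;
    }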
    ##TESTCASE RESULT:
    ~~~~~~~~~~~~~~~~~~
    [root@ts63 ~]# mount | grep mnt2
    /dev/mapper/test on /mnt2 type ext2 (rw,nobh)
    [root@ts63 ~]# /tmp/writev /mnt2/test
    writev = 1, err = 0
    [root@ts63 ~]# hexdump -C /mnt2/test

    00000000 61 65 62 6f 6f 74 00 00 f0 b9 b4 59 3a 00 00 00 |aeboot.....Y:...|
    00000010 20 00 00 00 00 00 00 00 21 00 00 00 00 00 00 00 | .......!.......|
    00000020 df df df df df df df df df df df df df df df df |................|
    00000030 3a 00 00 00 2a 00 00 00 21 00 00 00 00 00 00 00 |:...*...!.......|
    00000040 60 c0 8c 00 00 00 00 00 40 4a 8d 00 00 00 00 00 |`.......@J......|
    00000050 00 00 00 00 00 00 00 00 41 00 00 00 00 00 00 00 |........A.......|
    00000060 74 69 6d 65 20 64 64 20 69 66 3d 2f 64 65 76 2f |time dd if=/dev/|
    00000070 6c 6f 6f 70 30 20 20 6f 66 3d 2f 64 65 76 2f 6e |loop0 of=/dev/n|
    skip..
    00000f50 00 00 00 00 00 00 00 00 31 00 00 00 00 00 00 00 |........1.......|
    00000f60 6d 6b 66 73 2e 65 78 74 33 20 2f 64 65 76 2f 76 |mkfs.ext3 /dev/v|
    00000f70 7a 76 67 2f 74 65 73 74 20 2d 62 34 30 39 36 00 |zvg/test -b4096.|
    00000f80 a0 fe 8c 00 00 00 00 00 21 00 00 00 00 00 00 00 |........!.......|
    00000f90 23 31 32 30 35 39 35 30 34 30 34 00 3a 00 00 00 |#1205950404.:...|
    00000fa0 20 00 8d 00 00 00 00 00 21 00 00 00 00 00 00 00 | .......!.......|
    00000fb0 d0 cf 8c 00 00 00 00 00 10 d0 8c 00 00 00 00 00 |................|
    00000fc0 00 00 00 00 00 00 00 00 41 00 00 00 00 00 00 00 |........A.......|
    00000fd0 6d 6f 75 6e 74 20 2f 64 65 76 2f 76 7a 76 67 2f |mount /dev/vzvg/|
    00000fe0 74 65 73 74 20 20 2f 76 7a 20 2d 6f 20 64 61 74 |test /vz -o dat|
    00000ff0 61 3d 77 72 69 74 65 62 61 63 6b 00 00 00 00 00 |a=writeback.....|
    00001000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|

    As you can see, the file's page contains garbage from the pagecache instead of zeros.
    #TEST_CASE_END

    Attached patch:
    - Add a sanity-check BUG_ON in order to prevent incorrect usage by the caller.
    This is a function invariant, because a page cannot have buffers and a non-NULL
    *fsdata pointer at the same time.
    - Always attach buffers to the page in the partial-write case.
    - Always switch back to generic_write_end if the page has buffers.
    This is reasonable because if the page already has buffers, then generic_write_begin
    was called previously.

    Signed-off-by: Dmitri Monakhov
    Reviewed-by: Nick Piggin
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dmitri Monakhov
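
The kafs fix is a check before create: look the cell name up and fail with -EEXIST instead of registering it twice. A minimal user-space sketch follows. It assumes a hypothetical fixed-size table in place of kafs's real cell list and omits the locking a kernel implementation would need.

```c
#include <errno.h>
#include <string.h>

#define SKETCH_MAX_CELLS     16
#define SKETCH_CELL_NAME_MAX 64

/* Hypothetical in-memory cell list standing in for kafs's cell registry. */
static char sketch_cells[SKETCH_MAX_CELLS][SKETCH_CELL_NAME_MAX];
static int sketch_ncells;

/* Add a cell by name. Returns 0 on success, -EEXIST if the name is already
   registered, -EINVAL for an over-long name, -ENOSPC if the table is full.
   A real implementation would hold a lock across the lookup and the insert
   so two concurrent writers cannot both pass the check. */
static int sketch_cell_create(const char *name)
{
    if (strlen(name) >= SKETCH_CELL_NAME_MAX)
        return -EINVAL;
    for (int i = 0; i < sketch_ncells; i++)
        if (strcmp(sketch_cells[i], name) == 0)
            return -EEXIST;
    if (sketch_ncells == SKETCH_MAX_CELLS)
        return -ENOSPC;
    strcpy(sketch_cells[sketch_ncells++], name);
    return 0;
}
```

A second "add newcell.org ..." request would then get -EEXIST back rather than attempting a second registration.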
     

28 Mar, 2008

5 commits


25 Mar, 2008

2 commits


23 Mar, 2008

3 commits