05 Nov, 2013

1 commit


18 Oct, 2013

4 commits

  • commit 118b23022512eb2f41ce42db70dc0568d00be4ba upstream.

    dynamic_dname() is both too much and too little for those - the
    output may be well in excess of the 64 bytes dynamic_dname() assumes
    to be enough (thanks to ashmem feeding really long names to
    shmem_file_setup()) and vsnprintf() is overkill for those
    guys.

    Signed-off-by: Al Viro
    Cc: Colin Cross
    Signed-off-by: Greg Kroah-Hartman

    Al Viro
     
  • commit 6e4ea8e33b2057b85d75175dd89b93f5e26de3bc upstream.

    If we take the 2nd retry path in ext4_expand_extra_isize_ea, we
    potentially return from the function without having freed these
    allocations. If we don't do the return, we overwrite the previous
    allocation pointers, so we leak either way. (A minimal sketch of the
    fix follows this entry.)

    Spotted with Coverity.

    [ Fixed by tytso to set is and bs to NULL after freeing these
    pointers, in case in the retry loop we later end up triggering an
    error causing a jump to cleanup, at which point we could have a double
    free bug. -- Ted ]

    Signed-off-by: Dave Jones
    Signed-off-by: "Theodore Ts'o"
    Reviewed-by: Eric Sandeen
    Signed-off-by: Greg Kroah-Hartman

    Dave Jones
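
    A minimal sketch of the retry-path cleanup described above, assuming the
    local pointer names is, bs and bh used by ext4_expand_extra_isize_ea() in
    fs/ext4/xattr.c of that era:

        brelse(bh);
        kfree(is);
        kfree(bs);
        is = NULL;      /* reset so a later jump to cleanup cannot free them again */
        bs = NULL;
        goto retry;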
     
  • commit 4871c1588f92c6c13f4713a7009f25f217055807 upstream.

    btrfs_rename was using the root of the old dir instead of the root of the
    new dir when checking for a hash collision, so if you tried to move a file
    into a subvol it would freak out because it would see the file you are
    trying to move in its current root. This fixes the bug where the following
    sequence would fail:

    btrfs subvol create test1
    btrfs subvol create test2
    mv test1 test2.

    Thanks to Chris Murphy for catching this.

    Reported-by: Chris Murphy
    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason
    Signed-off-by: Greg Kroah-Hartman

    Josef Bacik
     
  • commit 9d05746e7b16d8565dddbe3200faa1e669d23bbf upstream.

    Olga reported that file descriptors opened with O_PATH do not work with
    fstatfs(), found during further development of ksh93's thread support.

    There is no reason not to allow O_PATH file descriptors here (fstatfs is
    very much a path operation), so use "fdget_raw()"; a minimal sketch follows
    this entry. See commit 55815f70147d ("vfs: make O_PATH file descriptors
    usable for 'fstat()'") for a very similar issue reported for fstat() by the
    same team.

    Reported-and-tested-by: ольга крыжановская
    Acked-by: Al Viro
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Linus Torvalds
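
    A minimal sketch of the change described above, assuming the fd_statfs()
    helper in fs/statfs.c of that era:

        int fd_statfs(int fd, struct kstatfs *st)
        {
                struct fd f = fdget_raw(fd);    /* was fdget(); the raw variant also accepts O_PATH fds */
                int error = -EBADF;

                if (f.file) {
                        error = vfs_statfs(&f.file->f_path, st);
                        fdput(f);
                }
                return error;
        }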
     

14 Oct, 2013

9 commits

  • commit b8d0c69b9469ffd33df30fee3e990f2d4aa68a09 upstream.

    A user was reporting weird warnings from btrfs_put_delayed_ref() and I
    noticed that we were doing this list_del_init() on our head ref outside of
    delayed_refs->lock. This is a problem: if there are still entries on the
    list, we could end up modifying stale pointers and such. Fix this by
    removing us from the list before we do our run_delayed_ref on our head ref
    (a minimal sketch follows this entry). Thanks,

    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason
    Signed-off-by: Greg Kroah-Hartman

    Josef Bacik
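
    A minimal sketch of the ordering described above; the field name 'cluster'
    for the head ref's list linkage follows fs/btrfs/extent-tree.c of that era:

        spin_lock(&delayed_refs->lock);
        list_del_init(&locked_ref->cluster);    /* unhook the head ref while the lock is still held */
        spin_unlock(&delayed_refs->lock);
        /* only now run the delayed ref for the head, with no list linkage left to corrupt */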
     
  • commit a05254143cd183b18002cbba7759a1e4629aa762 upstream.

    We have logic to see if we've already created a parent directory by
    checking whether an inode inside of that directory has a lower inode
    number than the one we are currently processing. The logic is that if
    there is a lower inode number, then we would already have had to make sure
    the directory was created at that previous point. The problem is that a
    subvol's inode numbers count from the lowest objectid in the root tree,
    which may be less than our current progress. So just skip if our dir item
    key is a root item. This fixes the original test and the xfstest version I
    made that added an extra subvol create. Thanks,

    Reported-by: Emil Karlson
    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason
    Signed-off-by: Greg Kroah-Hartman

    Josef Bacik
     
  • commit b6c60c8018c4e9beb2f83fc82c09f9d033766571 upstream.

    Previously we only added blocks to the list to have their backrefs checked
    if the level of the block is right above the one we are searching for.
    This is because we want to make sure we don't add the entire path up to
    the root to the lists, so that we process things one at a time. This
    assumes that if any blocks in the path to the root are not going to be
    checked (shared, in other words) then they will be in the level right
    above the current block on up. This isn't quite right though, since we can
    have blocks higher up the path that are shared because they are attached
    to a reloc root. But we won't add this block to be checked, and then later
    on we will BUG_ON(!upper->checked). So instead keep track of whether or
    not we've queued a block to be checked in this current search, and if we
    haven't, go ahead and queue it to be checked. This patch fixes the panic I
    was seeing where we BUG_ON(!upper->checked). Thanks,

    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason
    Signed-off-by: Greg Kroah-Hartman

    Josef Bacik
     
  • commit 997def25e4b9cee3b01609e18a52f926bca8bd2b upstream.

    Commit f5ea1100 cleans up the disk to host conversions for
    node directory entries, but because a variable is reused in
    xfs_node_toosmall() the next node is not correctly found.
    If the original node is small enough, this triggers an assertion failure
    (< BBTOB(bp->b_length),
    file: /root/newest/xfs/fs/xfs/xfs_trans_buf.c, line: 569).

    Keep the original node header to get the correct forward node.

    (When a node is considered for a merge with a sibling, it overwrites the
    sibling pointers of the original incore nodehdr with the sibling's
    pointers. This leads to the loop considering the original node as a merge
    candidate with itself in the second pass, and so it incorrectly
    determines that a merge should occur.)

    [v3: added Dave Chinner's (slightly modified) suggestion to the commit header,
    cleaned up whitespace. -bpm]

    Signed-off-by: Mark Tinguely
    Reviewed-by: Ben Myers
    Signed-off-by: Ben Myers
    Signed-off-by: Greg Kroah-Hartman

    Mark Tinguely
     
  • commit 52b26a3e1bb3e065c32b3febdac1e1f117d88e15 upstream.

    - Fix an Oops when nfs4_ds_connect() returns an error.
    - Always check the device status after waiting for a connect to complete.

    Reported-by: Andy Adamson
    Reported-by: Jeff Layton
    Signed-off-by: Trond Myklebust
    Signed-off-by: Greg Kroah-Hartman

    Trond Myklebust
     
  • commit 7f42ec3941560f0902fe3671e36f2c20ffd3af0a upstream.

    Many NILFS2 users have reported strange file system corruption, for
    example:

    NILFS: bad btree node (blocknr=185027): level = 0, flags = 0x0, nchildren = 768
    NILFS error (device sda4): nilfs_bmap_last_key: broken bmap (inode number=11540)

    But such error messages are a consequence of a file system issue that
    takes place much earlier. Fortunately, Jerome Poulin and Anton Eliasson
    have also reported another issue. These reports describe a crash of the
    segctor thread:

    BUG: unable to handle kernel paging request at 0000000000004c83
    IP: nilfs_end_page_io+0x12/0xd0 [nilfs2]

    Call Trace:
    nilfs_segctor_do_construct+0xf25/0x1b20 [nilfs2]
    nilfs_segctor_construct+0x17b/0x290 [nilfs2]
    nilfs_segctor_thread+0x122/0x3b0 [nilfs2]
    kthread+0xc0/0xd0
    ret_from_fork+0x7c/0xb0

    These two issues share one root cause, and that root cause can give rise
    to a third issue as well: the segctor thread hangs while eating 100% CPU.

    REPRODUCING PATH:

    One possible way of reproducing the issue was described by
    Jerome Poulin:

    1. init S to get to single user mode.
    2. sysrq+E to make sure only my shell is running
    3. start network-manager to get my wifi connection up
    4. login as root and launch "screen"
    5. cd /boot/log/nilfs which is a ext3 mount point and can log when NILFS dies.
    6. lscp | xz -9e > lscp.txt.xz
    7. mount my snapshot using mount -o cp=3360839,ro /dev/vgUbuntu/root /mnt/nilfs
    8. start a screen to dump /proc/kmsg to text file since rsyslog is killed
    9. start a screen and launch strace -f -o find-cat.log -t find
    /mnt/nilfs -type f -exec cat {} > /dev/null \;
    10. start a screen and launch strace -f -o apt-get.log -t apt-get update
    11. launch the last command again as it did not crash the first time
    12. apt-get crashes
    13. ps aux > ps-aux-crashed.log
    14. sysrq+W
    15. sysrq+E wait for everything to terminate
    16. sysrq+SUSB

    A simpler way of reproducing the issue is to start a kernel compilation
    and "apt-get update" in parallel.

    REPRODUCIBILITY:

    The issue does not reproduce reliably [60% - 80%]. It is very important to
    have a proper environment for reproducing the issue. The critical
    conditions for successful reproduction:

    (1) There should be a big file modified via mmap().

    (2) From time to time during processing, this file's count of dirty blocks
    should be greater than several segments in size (for example, two or
    three).

    (3) There should be intensive background file-modification activity in
    another thread.

    INVESTIGATION:

    First of all, it is possible to see that the reason for the crash is an
    invalid page address:

    NILFS [nilfs_segctor_complete_write]:2100 bh->b_count 0, bh->b_blocknr 13895680, bh->b_size 13897727, bh->b_page 0000000000001a82
    NILFS [nilfs_segctor_complete_write]:2101 segbuf->sb_segnum 6783

    Moreover, the value of b_page (0x1a82) is 6786 in decimal. This value
    looks like a segment number, and the b_blocknr and b_size values look like
    block numbers. So the buffer_head's pointer holds an improper address
    value.

    Detailed investigation of the issue revealed the following picture:

    [-----------------------------SEGMENT 6783-------------------------------]
    NILFS [nilfs_segctor_do_construct]:2310 nilfs_segctor_begin_construction
    NILFS [nilfs_segctor_do_construct]:2321 nilfs_segctor_collect
    NILFS [nilfs_segctor_do_construct]:2336 nilfs_segctor_assign
    NILFS [nilfs_segctor_do_construct]:2367 nilfs_segctor_update_segusage
    NILFS [nilfs_segctor_do_construct]:2371 nilfs_segctor_prepare_write
    NILFS [nilfs_segctor_do_construct]:2376 nilfs_add_checksums_on_logs
    NILFS [nilfs_segctor_do_construct]:2381 nilfs_segctor_write
    NILFS [nilfs_segbuf_submit_bio]:464 bio->bi_sector 111149024, segbuf->sb_segnum 6783

    [-----------------------------SEGMENT 6784-------------------------------]
    NILFS [nilfs_segctor_do_construct]:2310 nilfs_segctor_begin_construction
    NILFS [nilfs_segctor_do_construct]:2321 nilfs_segctor_collect
    NILFS [nilfs_lookup_dirty_data_buffers]:782 bh->b_count 1, bh->b_page ffffea000709b000, page->index 0, i_ino 1033103, i_size 25165824
    NILFS [nilfs_lookup_dirty_data_buffers]:783 bh->b_assoc_buffers.next ffff8802174a6798, bh->b_assoc_buffers.prev ffff880221cffee8
    NILFS [nilfs_segctor_do_construct]:2336 nilfs_segctor_assign
    NILFS [nilfs_segctor_do_construct]:2367 nilfs_segctor_update_segusage
    NILFS [nilfs_segctor_do_construct]:2371 nilfs_segctor_prepare_write
    NILFS [nilfs_segctor_do_construct]:2376 nilfs_add_checksums_on_logs
    NILFS [nilfs_segctor_do_construct]:2381 nilfs_segctor_write
    NILFS [nilfs_segbuf_submit_bh]:575 bh->b_count 1, bh->b_page ffffea000709b000, page->index 0, i_ino 1033103, i_size 25165824
    NILFS [nilfs_segbuf_submit_bh]:576 segbuf->sb_segnum 6784
    NILFS [nilfs_segbuf_submit_bh]:577 bh->b_assoc_buffers.next ffff880218a0d5f8, bh->b_assoc_buffers.prev ffff880218bcdf50
    NILFS [nilfs_segbuf_submit_bio]:464 bio->bi_sector 111150080, segbuf->sb_segnum 6784, segbuf->sb_nbio 0
    [----------] ditto
    NILFS [nilfs_segbuf_submit_bio]:464 bio->bi_sector 111164416, segbuf->sb_segnum 6784, segbuf->sb_nbio 15

    [-----------------------------SEGMENT 6785-------------------------------]
    NILFS [nilfs_segctor_do_construct]:2310 nilfs_segctor_begin_construction
    NILFS [nilfs_segctor_do_construct]:2321 nilfs_segctor_collect
    NILFS [nilfs_lookup_dirty_data_buffers]:782 bh->b_count 2, bh->b_page ffffea000709b000, page->index 0, i_ino 1033103, i_size 25165824
    NILFS [nilfs_lookup_dirty_data_buffers]:783 bh->b_assoc_buffers.next ffff880219277e80, bh->b_assoc_buffers.prev ffff880221cffc88
    NILFS [nilfs_segctor_do_construct]:2367 nilfs_segctor_update_segusage
    NILFS [nilfs_segctor_do_construct]:2371 nilfs_segctor_prepare_write
    NILFS [nilfs_segctor_do_construct]:2376 nilfs_add_checksums_on_logs
    NILFS [nilfs_segctor_do_construct]:2381 nilfs_segctor_write
    NILFS [nilfs_segbuf_submit_bh]:575 bh->b_count 2, bh->b_page ffffea000709b000, page->index 0, i_ino 1033103, i_size 25165824
    NILFS [nilfs_segbuf_submit_bh]:576 segbuf->sb_segnum 6785
    NILFS [nilfs_segbuf_submit_bh]:577 bh->b_assoc_buffers.next ffff880218a0d5f8, bh->b_assoc_buffers.prev ffff880222cc7ee8
    NILFS [nilfs_segbuf_submit_bio]:464 bio->bi_sector 111165440, segbuf->sb_segnum 6785, segbuf->sb_nbio 0
    [----------] ditto
    NILFS [nilfs_segbuf_submit_bio]:464 bio->bi_sector 111177728, segbuf->sb_segnum 6785, segbuf->sb_nbio 12

    NILFS [nilfs_segctor_do_construct]:2399 nilfs_segctor_wait
    NILFS [nilfs_segbuf_wait]:676 segbuf->sb_segnum 6783
    NILFS [nilfs_segbuf_wait]:676 segbuf->sb_segnum 6784
    NILFS [nilfs_segbuf_wait]:676 segbuf->sb_segnum 6785

    NILFS [nilfs_segctor_complete_write]:2100 bh->b_count 0, bh->b_blocknr 13895680, bh->b_size 13897727, bh->b_page 0000000000001a82

    BUG: unable to handle kernel paging request at 0000000000001a82
    IP: [] nilfs_end_page_io+0x12/0xd0 [nilfs2]

    Usually, for every segment we collect dirty files in a list. Then, dirty
    blocks are gathered for every dirty file, prepared for write and submitted
    by means of a nilfs_segbuf_submit_bh() call. Finally, the complete-write
    phase takes place after nilfs_end_bio_write() is called on the block
    layer. In this final phase buffers/pages are marked as not dirty and the
    processed files are removed from the list of dirty files.

    It is possible to see that we had three prepare_write and submit_bio
    phases before the segbuf_wait and complete_write phase. Moreover, segments
    compete with each other for dirty blocks, because on every iteration of
    segment processing dirty buffer_heads are added to several lists of
    payload_buffers:

    [SEGMENT 6784]: bh->b_assoc_buffers.next ffff880218a0d5f8, bh->b_assoc_buffers.prev ffff880218bcdf50
    [SEGMENT 6785]: bh->b_assoc_buffers.next ffff880218a0d5f8, bh->b_assoc_buffers.prev ffff880222cc7ee8

    The next pointer is the same but the prev pointer has changed. It means
    that the buffer_head has its next pointer from one list but its prev
    pointer from another. Such a modification can be made several times and
    can finally result in various issues: (1) segctor hanging, (2) segctor
    crashing, (3) file system metadata corruption.

    FIX:
    This patch adds (a minimal sketch follows this entry):

    (1) setting of the BH_Async_Write flag in nilfs_segctor_prepare_write()
    for every processed dirty block;

    (2) checking of the BH_Async_Write flag in
    nilfs_lookup_dirty_data_buffers() and
    nilfs_lookup_dirty_node_buffers();

    (3) clearing of the BH_Async_Write flag in nilfs_segctor_complete_write(),
    nilfs_abort_logs(), nilfs_forget_buffer(), nilfs_clear_dirty_page().

    Reported-by: Jerome Poulin
    Reported-by: Anton Eliasson
    Cc: Paul Fertser
    Cc: ARAI Shun-ichi
    Cc: Piotr Szymaniak
    Cc: Juan Barry Manuel Canham
    Cc: Zahid Chowdhury
    Cc: Elmer Zhang
    Cc: Kenneth Langga
    Signed-off-by: Vyacheslav Dubeyko
    Acked-by: Ryusuke Konishi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Vyacheslav Dubeyko
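
    A minimal sketch of the three points above, using the standard
    BH_Async_Write buffer-flag helpers from <linux/buffer_head.h>:

        /* (1) nilfs_segctor_prepare_write(): claim the block for this segment */
        set_buffer_async_write(bh);

        /* (2) nilfs_lookup_dirty_data_buffers() / nilfs_lookup_dirty_node_buffers():
         *     skip blocks already claimed by an in-flight segment construction */
        if (unlikely(buffer_async_write(bh)))
                continue;

        /* (3) nilfs_segctor_complete_write(), nilfs_abort_logs(),
         *     nilfs_forget_buffer(), nilfs_clear_dirty_page(): drop the claim */
        clear_buffer_async_write(bh);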
     
  • commit 0ab08f576b9e6a6b689fc6b4e632079b978e619b upstream.

    An earlier patch introducing the FUSE_I_SIZE_UNSTABLE flag provided a
    detailed description of races between ftruncate and anyone who can extend
    i_size:

    > 1. As in the previous scenario fuse_dentry_revalidate() discovered that i_size
    > changed (due to our own fuse_do_setattr()) and is going to call
    > truncate_pagecache() for some 'new_size' it believes valid right now. But by
    > the time that particular truncate_pagecache() is called ...
    > 2. fuse_do_setattr() returns (either having called truncate_pagecache() or
    > not -- it doesn't matter).
    > 3. The file is extended either by write(2) or ftruncate(2) or fallocate(2).
    > 4. mmap-ed write makes a page in the extended region dirty.

    This patch adds the necessary bits to fuse_file_fallocate() to protect
    against that race (a minimal sketch follows this entry).

    Signed-off-by: Maxim Patlasov
    Signed-off-by: Miklos Szeredi
    Signed-off-by: Greg Kroah-Hartman

    Maxim Patlasov
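
    A minimal sketch of the protection described above, assuming the flag lives
    in the fuse_inode state bitmap introduced by the earlier FUSE_I_SIZE_UNSTABLE
    patch and is only needed when the operation may change i_size:

        struct fuse_inode *fi = get_fuse_inode(inode);

        if (!(mode & FALLOC_FL_KEEP_SIZE))
                set_bit(FUSE_I_SIZE_UNSTABLE, &fi->state);  /* warn racing attribute updates */

        /* ... send the FUSE_FALLOCATE request and update i_size ... */

        if (!(mode & FALLOC_FL_KEEP_SIZE))
                clear_bit(FUSE_I_SIZE_UNSTABLE, &fi->state);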
     
  • commit bde52788bdb755b9e4b75db6c434f30e32a0ca0b upstream.

    The patch fixes a race between mmap-ed write and fallocate(PUNCH_HOLE):

    1) A user makes a page dirty via mmap-ed write.
    2) The user performs fallocate(2) with mode == PUNCH_HOLE|KEEP_SIZE
    and covering the page.
    3) Before the truncate_pagecache_range call from fuse_file_fallocate,
    the page goes to writeback. The page is fully processed by fuse_writepage
    (including end_page_writeback on the page), but fuse_flush_writepages did
    nothing because fi->writectr < 0.
    4) truncate_pagecache_range is called and fuse_file_fallocate finishes
    by calling fuse_release_nowrite. The latter triggers processing of the
    queued write-back request, which will soon write stale data to the hole.

    Changed in v2 (thanks to Brian for the suggestion):
    - Do not truncate the page cache until FUSE_FALLOCATE has succeeded.
    Otherwise, we can end up returning -ENOTSUPP while user data has already
    been punched from the page cache. Use filemap_write_and_wait_range()
    instead.
    Changed in v3 (thanks to Miklos for the suggestion):
    - fuse_wait_on_writeback() is prone to livelocks; use fuse_set_nowrite()
    instead. Since we only need a dirty-page barrier, fuse_sync_writes()
    should be enough.
    - rebased to the for-linus branch of fuse.git

    Signed-off-by: Maxim Patlasov
    Signed-off-by: Miklos Szeredi
    Signed-off-by: Greg Kroah-Hartman

    Maxim Patlasov
     
  • commit 72023656961b8c81a168a7a6762d589339d0d7ec upstream.

    A high setting of max_map_count, and a process core-dumping with a large
    enough vm_map_count, could result in an NT_FILE note not being written and
    the kernel crashing immediately afterwards because it had assumed
    otherwise.

    Reproduction of the oops-causing bug described here:

    https://lkml.org/lkml/2013/8/30/50

    The issue originated in commit 2aa362c49c31 ("coredump: extend core dump
    note section to contain file names of mapped files") from Oct 4, 2012.

    This patch makes that section optional in that case. fill_files_note()
    should signify the error, and the info struct in elf_core_dump() is
    zero-initialized so that we can check for the optionally written note.

    [akpm@linux-foundation.org: avoid abusing E2BIG, remove a couple of not-really-needed local variables]
    [akpm@linux-foundation.org: fix sparse warning]
    Signed-off-by: Dan Aloni
    Cc: Al Viro
    Cc: Denys Vlasenko
    Reported-by: Martin MOKREJS
    Tested-by: Martin MOKREJS
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Dan Aloni
     

05 Oct, 2013

2 commits

  • commit 49475555848d396a0c78fb2f8ecceb3f3f263ef1 upstream.

    The superblock lock replaced lock_super()/unlock_super() when those were
    removed, but was left uninitialized for the Seventh Edition UNIX
    filesystem in the following commit (3.7):
    c07cb01 sysv: drop lock/unlock super
    A minimal sketch of the fix follows this entry.

    Signed-off-by: Lubomir Rintel
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro
    Signed-off-by: Greg Kroah-Hartman

    Lubomir Rintel
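
    A minimal sketch of the fix, assuming the lock in question is the s_lock
    mutex in the sysv in-core superblock info and that it is set up in
    v7_fill_super():

        mutex_init(&sbi->s_lock);   /* the lock that replaced lock_super(); V7 forgot to initialize it */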
     
  • commit 2f6cf0de0281d210061ce976f2d42d246adc75bb upstream.

    The memcpy() in bio_copy_data() was using the wrong offset vars, leading
    to data corruption in weird, unusual setups (an illustrative sketch
    follows this entry).

    Signed-off-by: Kent Overstreet
    Cc: Jens Axboe
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Kent Overstreet
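
    An illustrative sketch of the fix's intent (the variable names here are
    illustrative, not the exact upstream ones): each side of the copy must be
    advanced by its own running offset.

        memcpy(dst_p + dst_offset,      /* destination advances by the destination offset */
               src_p + src_offset,      /* source advances by the source offset */
               bytes);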
     

02 Oct, 2013

4 commits

  • commit adbe6991efd36104ac9eaf751993d35eaa7f493a upstream.

    This fixes a copy and paste error introduced by 9f060e2231
    ("block: Convert integrity to bvec_alloc_bs()").

    Found by Coverity (CID 1020654).

    Signed-off-by: Bjorn Helgaas
    Acked-by: Kent Overstreet
    Signed-off-by: Jens Axboe
    Cc: Jonghwan Choi
    Signed-off-by: Greg Kroah-Hartman

    Bjorn Helgaas
     
  • commit e729eac6f65e11c5f03b09adcc84bd5bcb230467 upstream.

    Refuse RW mount of a udf filesystem. So far we just silently changed it to
    an RO mount, but when the media is writeable, the block layer won't notice
    this change and thus will think the device is used RW and will block the
    eject button of the drive. That is unexpected by users because for
    non-writeable media the eject button works just fine.

    The userspace mount(8) command handles this just fine and retries mounting
    with MS_RDONLY set, so userspace shouldn't see any regression. Plus any
    tool mounting udf is likely confronted with the case of read-only media,
    where the block layer already refuses to mount the filesystem without
    MS_RDONLY set, so our behavior shouldn't be anything new for it.

    Reported-by: Hui Wang
    Signed-off-by: Jan Kara
    Signed-off-by: Greg Kroah-Hartman

    Jan Kara
     
  • commit d759bfa4e7919b89357de50a2e23817079889195 upstream.

    Change all functions used in filesystem discovery during mount to use
    standard kernel return values - -errno on error, 0 on success - instead of
    1 on failure and 0 on success. This allows us to pass an error number (not
    just failure / success) so we can abort device scanning earlier in case of
    errors like EIO or ENOMEM. It also lets us return EROFS in case a
    writeable mount is requested but writing isn't supported.

    Signed-off-by: Jan Kara
    Cc: Hui Wang
    Signed-off-by: Greg Kroah-Hartman

    Jan Kara
     
  • commit dfb1d61b0e9f9e2c542e9adc8d970689f4114ff6 upstream.

    If an error occurs after having called finish_open() then fput() needs to
    be called on the already opened file.

    Signed-off-by: Miklos Szeredi
    Cc: Steve French
    Signed-off-by: Al Viro
    Signed-off-by: Greg Kroah-Hartman

    Miklos Szeredi
     

27 Sep, 2013

13 commits

  • commit efeb9e60d48f7778fdcad4a0f3ad9ea9b19e5dfd upstream.

    Userspace can add names containing a slash character to the directory
    listing. Don't allow this, as it could cause all sorts of trouble (a
    minimal sketch of the check follows this entry).

    Signed-off-by: Miklos Szeredi
    Signed-off-by: Greg Kroah-Hartman

    Miklos Szeredi
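
    A minimal sketch of the check, assuming it is applied where the readdir
    reply from userspace is parsed and that an offending name is treated as a
    protocol error:

        if (memchr(dirent->name, '/', dirent->namelen) != NULL)
                return -EIO;    /* a name supplied by the userspace server must not contain '/' */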
     
  • commit 06a7c3c2781409af95000c60a5df743fd4e2f8b4 upstream.

    The way fuse calls truncate_pagecache() from fuse_change_attributes() is
    completely wrong, because without i_mutex held we are never sure whether
    'oldsize' and 'attr->size' are still valid by the time
    truncate_pagecache(inode, oldsize, attr->size) executes. In fact, as soon
    as we release fc->lock in the middle of fuse_change_attributes(), we
    completely lose control of the actions which may happen to the given inode
    until we reach truncate_pagecache. The list of potentially dangerous
    actions includes mmap-ed reads and writes, ftruncate(2) and write(2)
    extending the file size.

    The typical outcome of doing truncate_pagecache() with outdated arguments
    is data corruption from the user's point of view. This is (in some sense)
    acceptable in cases when the issue is triggered by a change of the file on
    the server (i.e. externally with respect to the fuse operation), but it is
    absolutely intolerable in scenarios when a single fuse client modifies a
    file without any external intervention. A real-life case I discovered with
    fsx-linux looked like this:

    1. Shrinking ftruncate(2) comes to fuse_do_setattr(). The latter sends
    FUSE_SETATTR to the server synchronously, but before getting fc->lock ...
    2. fuse_dentry_revalidate() is asynchronously called. It sends FUSE_LOOKUP
    to the server synchronously, then calls fuse_change_attributes(). The
    latter updates i_size, releases fc->lock, but before comparing oldsize vs
    attr->size..
    3. fuse_do_setattr() from the first step proceeds by acquiring fc->lock and
    updating attributes and i_size, but now oldsize is equal to
    outarg.attr.size because i_size has just been updated (step 2). Hence,
    fuse_do_setattr() returns w/o calling truncate_pagecache().
    4. As soon as ftruncate(2) completes, the user extends file size by
    write(2) making a hole in the middle of file, then reads data from the hole
    either by read(2) or mmap-ed read. The user expects to get zero data from
    the hole, but gets stale data because truncate_pagecache() is not executed
    yet.

    The scenario above illustrates one side of the problem: not truncating the
    page cache even though we should. The other side corresponds to truncating
    the page cache too late, when the state of the inode has changed
    significantly.
    Theoretically, the following is possible:

    1. As in the previous scenario fuse_dentry_revalidate() discovered that
    i_size changed (due to our own fuse_do_setattr()) and is going to call
    truncate_pagecache() for some 'new_size' it believes valid right now. But
    by the time that particular truncate_pagecache() is called ...
    2. fuse_do_setattr() returns (either having called truncate_pagecache() or
    not -- it doesn't matter).
    3. The file is extended either by write(2) or ftruncate(2) or fallocate(2).
    4. mmap-ed write makes a page in the extended region dirty.

    The result will be the loss of the data the user wrote in the fourth step.

    The patch is a hotfix resolving the issue in a simplistic way: skip the
    dangerous i_size update and truncate_pagecache if an operation changing
    the file size is in progress (a minimal sketch follows this entry). This
    simplistic approach looks correct for the cases without external changes.
    To handle those properly, more sophisticated and intrusive techniques
    (e.g. an NFS-like one) would be required. I'd like to postpone that until
    the issue has been well discussed on the mailing list(s).

    Changed in v2:
    - improved patch description to cover both sides of the issue.

    Signed-off-by: Maxim Patlasov
    Signed-off-by: Miklos Szeredi
    Signed-off-by: Greg Kroah-Hartman

    Maxim Patlasov
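
    A minimal, illustrative sketch of the skip described above, assuming the
    in-progress state is tracked via the FUSE_I_SIZE_UNSTABLE bit and using the
    three-argument truncate_pagecache() of kernels of that era (locking and the
    matching i_size update are elided):

        if (!test_bit(FUSE_I_SIZE_UNSTABLE, &fi->state) && oldsize != attr->size)
                truncate_pagecache(inode, oldsize, attr->size);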
     
  • commit d331a415aef98717393dda0be69b7947da08eba3 upstream.

    Calls like setxattr and removexattr result in an update of ctime.
    Therefore invalidate the inode attributes to force a refresh (a minimal
    sketch follows this entry).

    Signed-off-by: Anand Avati
    Reviewed-by: Brian Foster
    Signed-off-by: Miklos Szeredi
    Signed-off-by: Greg Kroah-Hartman

    Anand Avati
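
    A minimal sketch, assuming the invalidation is done right after a
    successful setxattr/removexattr request completes:

        if (!err)
                fuse_invalidate_attr(inode);    /* ctime changed on the server; drop cached attributes */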
     
  • commit 4a4ac4eba1010ef9a804569058ab29e3450c0315 upstream.

    The patch fixes a race between ftruncate(2), mmap-ed write and write(2):

    1) A user makes a page dirty via mmap-ed write.
    2) The user performs a shrinking truncate(2) intended to purge the page.
    3) Before fuse_do_setattr calls truncate_pagecache, the page goes to
    writeback. fuse_writepage_locked fills a FUSE_WRITE request and releases
    the original page by end_page_writeback.
    4) fuse_do_setattr() completes and successfully returns. From now on,
    i_mutex is free.
    5) An ordinary write(2) extends i_size back to cover the page. Note that
    fuse_send_write_pages does wait for fuse writeback, but for another
    page->index.
    6) fuse_writepage_locked proceeds by queueing the FUSE_WRITE request.
    fuse_send_writepage is supposed to crop inarg->size of the request,
    but it doesn't because i_size has already been extended back.

    Moving end_page_writeback to the end of fuse_writepage_locked fixes the
    race because now the fact that truncate_pagecache has successfully
    returned implies that fuse_writepage_locked has already called
    end_page_writeback. And this, in turn, implies that fuse_flush_writepages
    has already called fuse_send_writepage, and the latter used the valid
    (shrunk) i_size. write(2) could not have extended it because of the
    i_mutex held by ftruncate(2).

    Signed-off-by: Maxim Patlasov
    Signed-off-by: Miklos Szeredi
    Signed-off-by: Greg Kroah-Hartman

    Maxim Patlasov
     
  • commit 494ddd11be3e2621096bb425eed2886f8e8446d4 upstream.

    Signed-off-by: Jianpeng Ma
    Reviewed-by: Sage Weil
    Signed-off-by: Greg Kroah-Hartman

    majianpeng
     
  • commit 17b7f7cf58926844e1dd40f5eb5348d481deca6a upstream.

    Refuse RW mount of an isofs filesystem. So far we just silently changed it
    to an RO mount, but when the media is writeable, the block layer won't
    notice this change and thus will think the device is used RW and will
    block the eject button of the drive. That is unexpected by users because
    for non-writeable media the eject button works just fine.

    The userspace mount(8) command handles this just fine and retries mounting
    with MS_RDONLY set, so userspace shouldn't see any regression. Plus any
    tool mounting isofs is likely confronted with the case of read-only media,
    where the block layer already refuses to mount the filesystem without
    MS_RDONLY set, so our behavior shouldn't be anything new for it.

    Reported-by: Hui Wang
    Signed-off-by: Jan Kara
    Signed-off-by: Greg Kroah-Hartman

    Jan Kara
     
  • commit aee1c13dd0f6c2fc56e0e492b349ee8ac655880f upstream.

    Don't allow mounting the proc filesystem unless the caller has
    CAP_SYS_ADMIN rights over the pid namespace (a minimal sketch of the check
    follows this entry). The principle here is that if you created it, or have
    capabilities over it, you can mount it; otherwise you get to live with
    what other people have mounted.

    Andy pointed out that this is needed to prevent users in a user
    namespace from remounting proc and specifying different hidepid and gid
    options on already existing proc mounts.

    Reported-by: Andy Lutomirski
    Signed-off-by: "Eric W. Biederman"
    Signed-off-by: Greg Kroah-Hartman

    Eric W. Biederman
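
    A minimal sketch of the check, assuming it sits in proc's mount path once
    the pid namespace 'ns' has been resolved:

        if (!ns_capable(ns->user_ns, CAP_SYS_ADMIN))
                return ERR_PTR(-EPERM);     /* only the namespace's admin may (re)mount its proc */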
     
  • commit 28e8be31803b19d0d8f76216cb11b480b8a98bec upstream.

    Calling the fiemap ioctl(2) with a given start offset as well as a desired
    mapping range should show extents if possible. However, we somehow figure
    out the end offset of the mapping via 'mapping_end -= cpos' before
    iterating the extent records, which causes problems if the given fiemap
    length is smaller than a cluster size, e.g.:

    Cluster size 4096:
    debugfs.ocfs2 1.6.3
    Block Size Bits: 12 Cluster Size Bits: 12

    The extended fiemap test utility from David:
    https://gist.github.com/anonymous/6172331

    # dd if=/dev/urandom of=/ocfs2/test_file bs=1M count=1000
    # ./fiemap /ocfs2/test_file 4096 10
    start: 4096, length: 10
    File /ocfs2/test_file has 0 extents:
    # Logical Physical Length Flags
    ^^^^^
    Reported-by: David Weber
    Tested-by: David Weber
    Cc: Sunil Mushran
    Cc: Mark Fasheh
    Cc: Joel Becker
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Jie Liu
     
  • commit bbb651e469d99f0088e286fdeb54acca7bb4ad4e upstream.

    If you start the replace procedure on a read-only filesystem, at
    the end the procedure fails to write the updated dev_items to the
    chunk tree. The problem is that this error is not indicated except
    for a WARN_ON(). If the user now thinks that everything was done
    as expected and destroys the source device (with mkfs or with a
    hammer), the next mount fails with "failed to read chunk root" and
    the filesystem is gone.

    This commit adds code to fail the attempt to start the replace
    procedure if the filesystem is mounted read-only (a minimal sketch
    follows this entry).

    Signed-off-by: Stefan Behrens
    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason
    Signed-off-by: Greg Kroah-Hartman

    Stefan Behrens
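
    A minimal sketch of the added check at the start of the replace procedure:

        if (fs_info->sb->s_flags & MS_RDONLY)
                return -EROFS;      /* dev_items cannot be committed on a read-only filesystem */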
     
  • commit 5208386c501276df18fee464e21d3c58d2d79517 upstream.

    Merge the conditions in ext4_setattr() handling inode size changes, and
    move the ext4_begin_ordered_truncate() call somewhat earlier because it
    simplifies error recovery in case of failure. Also add error handling in
    case the i_disksize update fails.

    Signed-off-by: Jan Kara
    Signed-off-by: "Theodore Ts'o"
    Signed-off-by: Greg Kroah-Hartman

    Jan Kara
     
  • commit 933d4b36576c951d0371bbfed05ec0135d516a6e upstream.

    If a server sends a lease break to a connection that doesn't have any
    opens with the lease key specified in the server response, we can't find
    an open file to send an ack. Fix this by walking through all the
    connections we have.

    Signed-off-by: Pavel Shilovsky
    Signed-off-by: Steve French
    Signed-off-by: Greg Kroah-Hartman

    Pavel Shilovsky
     
  • commit 1a05096de82f3cd672c76389f63964952678506f upstream.

    This happens when we receive a lease break from a server, then find an
    appropriate lease key in opened files and schedule the oplock_break slow
    work. The lw pointer isn't freed in this case.

    Signed-off-by: Pavel Shilovsky
    Signed-off-by: Steve French
    Signed-off-by: Greg Kroah-Hartman

    Pavel Shilovsky
     
  • commit 73e216a8a42c0ef3d08071705c946c38fdbe12b0 upstream.

    Oleksii reported that he had seen an oops similar to this:

    BUG: unable to handle kernel NULL pointer dereference at 0000000000000088
    IP: [] sock_sendmsg+0x93/0xd0
    PGD 0
    Oops: 0000 [#1] PREEMPT SMP
    Modules linked in: ipt_MASQUERADE xt_REDIRECT xt_tcpudp iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack ip_tables x_tables carl9170 ath usb_storage f2fs nfnetlink_log nfnetlink md4 cifs dns_resolver hid_generic usbhid hid af_packet uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_core videodev rfcomm btusb bnep bluetooth qmi_wwan qcserial cdc_wdm usb_wwan usbnet usbserial mii snd_hda_codec_hdmi snd_hda_codec_realtek iwldvm mac80211 coretemp intel_powerclamp kvm_intel kvm iwlwifi snd_hda_intel cfg80211 snd_hda_codec xhci_hcd e1000e ehci_pci snd_hwdep sdhci_pci snd_pcm ehci_hcd microcode psmouse sdhci thinkpad_acpi mmc_core i2c_i801 pcspkr usbcore hwmon snd_timer snd_page_alloc snd ptp rfkill pps_core soundcore evdev usb_common vboxnetflt(O) vboxdrv(O)Oops#2 Part8
    loop tun binfmt_misc fuse msr acpi_call(O) ipv6 autofs4
    CPU: 0 PID: 21612 Comm: kworker/0:1 Tainted: G W O 3.10.1SIGN #28
    Hardware name: LENOVO 2306CTO/2306CTO, BIOS G2ET92WW (2.52 ) 02/22/2013
    Workqueue: cifsiod cifs_echo_request [cifs]
    task: ffff8801e1f416f0 ti: ffff880148744000 task.ti: ffff880148744000
    RIP: 0010:[] [] sock_sendmsg+0x93/0xd0
    RSP: 0000:ffff880148745b00 EFLAGS: 00010246
    RAX: 0000000000000000 RBX: ffff880148745b78 RCX: 0000000000000048
    RDX: ffff880148745c90 RSI: ffff880181864a00 RDI: ffff880148745b78
    RBP: ffff880148745c48 R08: 0000000000000048 R09: 0000000000000000
    R10: 0000000000000000 R11: 0000000000000000 R12: ffff880181864a00
    R13: ffff880148745c90 R14: 0000000000000048 R15: 0000000000000048
    FS: 0000000000000000(0000) GS:ffff88021e200000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 0000000000000088 CR3: 000000020c42c000 CR4: 00000000001407b0
    Oops#2 Part7
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
    Stack:
    ffff880148745b30 ffffffff810c4af9 0000004848745b30 ffff880181864a00
    ffffffff81ffbc40 0000000000000000 ffff880148745c90 ffffffff810a5aab
    ffff880148745bc0 ffffffff81ffbc40 ffff880148745b60 ffffffff815a9fb8
    Call Trace:
    [] ? finish_task_switch+0x49/0xe0
    [] ? lock_timer_base.isra.36+0x2b/0x50
    [] ? _raw_spin_unlock_irqrestore+0x18/0x40
    [] ? try_to_del_timer_sync+0x4f/0x70
    [] ? _raw_spin_unlock_bh+0x1f/0x30
    [] kernel_sendmsg+0x37/0x50
    [] smb_send_kvec+0xd0/0x1d0 [cifs]
    [] smb_send_rqst+0x83/0x1f0 [cifs]
    [] cifs_call_async+0xec/0x1b0 [cifs]
    [] ? free_rsp_buf+0x40/0x40 [cifs]
    Oops#2 Part6
    [] SMB2_echo+0x8e/0xb0 [cifs]
    [] cifs_echo_request+0x79/0xa0 [cifs]
    [] process_one_work+0x173/0x4a0
    [] worker_thread+0x121/0x3a0
    [] ? manage_workers.isra.27+0x2b0/0x2b0
    [] kthread+0xc0/0xd0
    [] ? kthread_create_on_node+0x120/0x120
    [] ret_from_fork+0x7c/0xb0
    [] ? kthread_create_on_node+0x120/0x120
    Code: 84 24 b8 00 00 00 4c 89 f1 4c 89 ea 4c 89 e6 48 89 df 4c 89 60 18 48 c7 40 28 00 00 00 00 4c 89 68 30 44 89 70 14 49 8b 44 24 28 90 88 00 00 00 3d ef fd ff ff 74 10 48 8d 65 e0 5b 41 5c 41
    RIP [] sock_sendmsg+0x93/0xd0
    RSP
    CR2: 0000000000000088

    The client was in the middle of trying to send a frame when the
    server->ssocket pointer got zeroed out. In most places where we access
    that pointer, the srv_mutex is held. There's only one spot I see where the
    server->ssocket pointer gets set without the srv_mutex held. This patch
    corrects that (a minimal sketch follows this entry).

    The upstream bug report was here:

    https://bugzilla.kernel.org/show_bug.cgi?id=60557

    Reported-by: Oleksii Shevchuk
    Signed-off-by: Jeff Layton
    Signed-off-by: Steve French
    Signed-off-by: Greg Kroah-Hartman

    Jeff Layton
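
    A minimal sketch of the fix's intent in the reconnect path, assuming the
    socket is (re)established by generic_ip_connect() as in fs/cifs/connect.c
    of that era:

        mutex_lock(&server->srv_mutex);
        rc = generic_ip_connect(server);    /* server->ssocket is now only assigned under srv_mutex */
        mutex_unlock(&server->srv_mutex);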
     

08 Sep, 2013

1 commit

  • commit 44512449c0ab368889dd13ae0031fba74ee7e1d2 upstream.

    NFSv4 reserves readdir cookie values 0-2 for special entries (. and ..),
    but jfs allows a value of 2 for a non-special entry. This incompatibility
    can result in the nfs client reporting a readdir loop.

    This patch doesn't change the value stored internally, but adds one to
    the value exposed to the iterate method.

    Signed-off-by: Dave Kleikamp
    [bwh: Backported to 3.2:
    - Adjust context
    - s/ctx->pos/filp->f_pos/]
    Tested-by: Christian Kujau
    Signed-off-by: Ben Hutchings
    Signed-off-by: Greg Kroah-Hartman

    Dave Kleikamp
     

30 Aug, 2013

4 commits

  • commit 35dc248383bbab0a7203fca4d722875bc81ef091 upstream.

    There is a nasty bug in the SCSI SG_IO ioctl that in some circumstances
    leads to one process writing data into the address space of some other
    random unrelated process if the ioctl is interrupted by a signal.
    What happens is the following:

    - A process issues an SG_IO ioctl with direction DXFER_FROM_DEV (ie the
    underlying SCSI command will transfer data from the SCSI device to
    the buffer provided in the ioctl)

    - Before the command finishes, a signal is sent to the process waiting
    in the ioctl. This will end up waking up the sg_ioctl() code:

    result = wait_event_interruptible(sfp->read_wait,
    (srp_done(sfp, srp) || sdp->detached));

    but neither srp_done() nor sdp->detached is true, so we end up just
    setting srp->orphan and returning to userspace:

    srp->orphan = 1;
    write_unlock_irq(&sfp->rq_list_lock);
    return result; /* -ERESTARTSYS because signal hit process */

    At this point the original process is done with the ioctl and
    blithely goes ahead handling the signal, reissuing the ioctl, etc.

    - Eventually, the SCSI command issued by the first ioctl finishes and
    ends up in sg_rq_end_io(). At the end of that function, we run through:

    write_lock_irqsave(&sfp->rq_list_lock, iflags);
    if (unlikely(srp->orphan)) {
    if (sfp->keep_orphan)
    srp->sg_io_owned = 0;
    else
    done = 0;
    }
    srp->done = done;
    write_unlock_irqrestore(&sfp->rq_list_lock, iflags);

    if (likely(done)) {
    /* Now wake up any sg_read() that is waiting for this
    * packet.
    */
    wake_up_interruptible(&sfp->read_wait);
    kill_fasync(&sfp->async_qp, SIGPOLL, POLL_IN);
    kref_put(&sfp->f_ref, sg_remove_sfp);
    } else {
    INIT_WORK(&srp->ew.work, sg_rq_end_io_usercontext);
    schedule_work(&srp->ew.work);
    }

    Since srp->orphan *is* set, we set done to 0 (assuming the
    userspace app has not set keep_orphan via an SG_SET_KEEP_ORPHAN
    ioctl), and therefore we end up scheduling sg_rq_end_io_usercontext()
    to run in a workqueue.

    - In workqueue context we go through sg_rq_end_io_usercontext() ->
    sg_finish_rem_req() -> blk_rq_unmap_user() -> ... ->
    bio_uncopy_user() -> __bio_copy_iov() -> copy_to_user().

    The key point here is that we are doing copy_to_user() on a
    workqueue -- that is, we're on a kernel thread with current->mm
    equal to whatever random previous user process was scheduled before
    this kernel thread. So we end up copying whatever data the SCSI
    command returned to the virtual address of the buffer passed into
    the original ioctl, but it's quite likely we do this copying into a
    different address space!

    As suggested by James Bottomley, add a check for current->mm (which is
    NULL if we're on a kernel thread without a real userspace address space)
    in bio_uncopy_user(), and skip the copy if we're on a kernel thread (a
    minimal sketch follows this entry).

    There's no reason that I can think of for any caller of bio_uncopy_user()
    to want to do copying on a kernel thread with a random active userspace
    address space.

    Huge thanks to Costa Sapuntzakis for the
    original pointer to this bug in the sg code.

    Signed-off-by: Roland Dreier
    Tested-by: David Milburn
    Cc: Jens Axboe
    Signed-off-by: James Bottomley
    Signed-off-by: Greg Kroah-Hartman

    Roland Dreier
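
    A minimal sketch of the added guard; the surrounding bio_uncopy_user()
    logic and the copy-back call are elided, and the label name is
    illustrative:

        if (!current->mm) {
                /* kernel thread (e.g. a workqueue): any active mm belongs to an
                 * unrelated process, so do not copy the bounce buffer back */
                goto out_skip_copy;
        }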
     
  • commit 4bf93b50fd04118ac7f33a3c2b8a0a1f9fa80bc9 upstream.

    Fix the issue with improperly counting the number of in-flight bio
    requests for the BIO_EOPNOTSUPP error detection case.

    sb_nbio must be incremented exactly the same number of times as the
    complete() function was called (or will be called), because
    nilfs_segbuf_wait() will call wait_for_completion() the number of times
    set in sb_nbio:

    do {
    wait_for_completion(&segbuf->sb_bio_event);
    } while (--segbuf->sb_nbio > 0);

    Two functions complete() and wait_for_completion() must be called the
    same number of times for the same sb_bio_event. Otherwise,
    wait_for_completion() will hang or leak.

    Signed-off-by: Vyacheslav Dubeyko
    Cc: Dan Carpenter
    Acked-by: Ryusuke Konishi
    Tested-by: Ryusuke Konishi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Vyacheslav Dubeyko
     
  • commit 2df37a19c686c2d7c4e9b4ce1505b5141e3e5552 upstream.

    Remove the double call of bio_put() in nilfs_end_bio_write() for the case
    of BIO_EOPNOTSUPP error detection. The issue was found by Dan Carpenter,
    who also suggested the first version of the fix.

    Signed-off-by: Vyacheslav Dubeyko
    Reported-by: Dan Carpenter
    Acked-by: Ryusuke Konishi
    Tested-by: Ryusuke Konishi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Vyacheslav Dubeyko
     
  • commit 52e220d357a38cb29fa2e29f34ed94c1d66357f4 upstream.

    This should actually be returning an ERR_PTR on error instead of NULL.
    That is how it was designed, and all the callers expect it (a minimal
    sketch follows this entry).

    [AV: actually, that's what "VFS: Make clone_mnt()/copy_tree()/collect_mounts()
    return errors" missed - originally collect_mounts() was expected to return
    NULL on failure]

    Signed-off-by: Dan Carpenter
    Signed-off-by: Al Viro
    Signed-off-by: Greg Kroah-Hartman

    Dan Carpenter
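
    A minimal sketch of the corrected return path in collect_mounts(), assuming
    the namespace locking helpers and copy_tree() flags of kernels of that era:

        struct mount *tree;

        namespace_lock();
        tree = copy_tree(real_mount(path->mnt), path->dentry,
                         CL_COPY_ALL | CL_PRIVATE);
        namespace_unlock();
        if (IS_ERR(tree))
                return ERR_CAST(tree);  /* was: return NULL, which callers never check for */
        return &tree->mnt;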
     

20 Aug, 2013

2 commits

  • commit 91aa11fae1cf8c2fd67be0609692ea9741cdcc43 upstream.

    When jbd2_journal_dirty_metadata() returns an error,
    __ext4_handle_dirty_metadata() stops the handle. However, callers of this
    function do not account for that fact and still happily use the now-freed
    handle. This use-after-free can result in various issues, but very likely
    we oops soon.

    The motivation for adding __ext4_journal_stop() to
    __ext4_handle_dirty_metadata() in commit 9ea7a0df seems to have been only
    to improve error reporting. So replace __ext4_journal_stop() with
    ext4_journal_abort_handle(), which was there before that commit, and add
    WARN_ON_ONCE() to dump the stack and provide useful information.

    Reported-by: Sage Weil
    Signed-off-by: Jan Kara
    Signed-off-by: "Theodore Ts'o"
    Signed-off-by: Greg Kroah-Hartman

    Jan Kara
     
  • commit 2b047252d087be7f2ba088b4933cd904f92e6fce upstream.

    Ben Tebulin reported:

    "Since v3.7.2 on two independent machines a very specific Git
    repository fails in 9/10 cases on git-fsck due to an SHA1/memory
    failures. This only occurs on a very specific repository and can be
    reproduced stably on two independent laptops. Git mailing list ran
    out of ideas and for me this looks like some very exotic kernel issue"

    and bisected the failure to the backport of commit 53a59fc67f97 ("mm:
    limit mmu_gather batching to fix soft lockups on !CONFIG_PREEMPT").

    That commit itself is not actually buggy, but what it does is to make it
    much more likely to hit the partial TLB invalidation case, since it
    introduces a new case in tlb_next_batch() that previously only ever
    happened when running out of memory.

    The real bug is that the TLB gather virtual memory range setup is subtly
    buggered. It was introduced in commit 597e1c3580b7 ("mm/mmu_gather:
    enable tlb flush range in generic mmu_gather"), and the range handling
    was already fixed at least once in commit e6c495a96ce0 ("mm: fix the TLB
    range flushed when __tlb_remove_page() runs out of slots"), but that fix
    was not complete.

    The problem with the TLB gather virtual address range is that it isn't
    set up by the initial tlb_gather_mmu() initialization (which didn't get
    the TLB range information), but it is set up ad-hoc later by the
    functions that actually flush the TLB. And so any such case that forgot
    to update the TLB range entries would potentially miss TLB invalidates.

    Rather than try to figure out exactly which particular ad-hoc range
    setup was missing (I personally suspect it's the hugetlb case in
    zap_huge_pmd(), which didn't have the same logic as zap_pte_range()
    did), this patch just gets rid of the problem at the source: make the
    TLB range information available to tlb_gather_mmu(), and initialize it
    when initializing all the other tlb gather fields (a minimal sketch of the
    changed interface follows this entry).

    This makes the patch larger, but conceptually much simpler. And the end
    result is much more understandable; even if you want to play games with
    partial ranges when invalidating the TLB contents in chunks, now the
    range information is always there, and anybody who doesn't want to
    bother with it won't introduce subtle bugs.

    Ben verified that this fixes his problem.

    Reported-bisected-and-tested-by: Ben Tebulin
    Build-testing-by: Stephen Rothwell
    Build-testing-by: Richard Weinberger
    Reviewed-by: Michal Hocko
    Acked-by: Peter Zijlstra
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Linus Torvalds
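
    A minimal sketch of the changed interface for one caller, assuming the
    common unmap path of mm/memory.c in kernels of that era:

        struct mmu_gather tlb;

        tlb_gather_mmu(&tlb, mm, start, end);   /* the VA range is supplied at init time now, */
        unmap_vmas(&tlb, vma, start, end);      /* so a forgotten ad-hoc range update can no   */
        tlb_finish_mmu(&tlb, start, end);       /* longer shrink what finally gets flushed     */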