09 Oct, 2014

1 commit

  • schedule_delayed_work() happening when the work is already pending is
    a cheap no-op. Don't bother with ->wbuf_queued logic - it's both
    broken (cancelling ->wbuf_dwork leaves it set, as spotted by Jeff Harris)
    and pointless. It's cheaper to let schedule_delayed_work() handle that
    case.
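
    The "cheap no-op" behaviour can be modelled in user space. This is a
    sketch, not kernel code: model_dwork, model_schedule() and
    schedule_twice() are hypothetical names, and a C11 atomic_flag stands in
    for the pending state that is checked before a work item is queued.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space model of a delayed work item. */
struct model_dwork {
    atomic_flag pending;   /* set while the work is queued */
    int times_queued;      /* how often it was really queued */
};

/* Queue the work unless it is already pending; returns true if queued. */
static bool model_schedule(struct model_dwork *w)
{
    if (atomic_flag_test_and_set(&w->pending))
        return false;      /* already pending: nothing to do */
    w->times_queued++;
    return true;
}

/* Schedule twice in a row and report how many times the work was queued. */
static int schedule_twice(void)
{
    struct model_dwork w = { ATOMIC_FLAG_INIT, 0 };

    model_schedule(&w);
    model_schedule(&w);    /* second call is the no-op case */
    return w.times_queued;
}
```

    Because the second call returns early on its own, a separate "already
    queued" flag kept by the caller adds nothing and can only go stale.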

    Reported-by: Jeff Harris
    Tested-by: Jeff Harris
    Cc: stable@vger.kernel.org
    Signed-off-by: Al Viro

    Al Viro
     

09 Aug, 2014

2 commits

  • Pull MTD updates from Brian Norris:
    "AMD-compatible CFI driver:
    - Support OTP programming for Micron M29EW family
    - Increase buffer write timeout, according to detected flash
    parameter info

    NAND
    - Add helpers for retrieving ONFI timing modes
    - GPMI: provide option to disable bad block marker swapping (required
    for Ka-On electronics platforms)

    SPI NOR
    - EON EN25QH128 support
    - Support new Flag Status Register (FSR) on a few Micron flash

    Common
    - New sysfs entries for bad block and ECC stats

    And a few miscellaneous refactorings, cleanups, and driver
    improvements"

    * tag 'for-linus-20140808' of git://git.infradead.org/linux-mtd: (31 commits)
    mtd: gpmi: make blockmark swapping optional
    mtd: gpmi: remove line breaks from error messages and improve wording
    mtd: gpmi: remove useless (void *) type casts and spaces between type casts and variables
    mtd: atmel_nand: NFC: support multiple interrupt handling
    mtd: atmel_nand: implement the nfc_device_ready() by checking the R/B bit
    mtd: atmel_nand: add NFC status error check
    mtd: atmel_nand: make ecc parameters same as definition
    mtd: nand: add ONFI timing mode to nand_timings converter
    mtd: nand: define struct nand_timings
    mtd: cfi_cmdset_0002: fix do_write_buffer() timeout error
    mtd: denali: use 8 bytes for READID command
    mtd/ftl: fix the double free of the buffers allocated in build_maps()
    mtd: phram: Fix whitespace issues
    mtd: spi-nor: add support for EON EN25QH128
    mtd: cfi_cmdset_0002: Add support for locking OTP memory
    mtd: cfi_cmdset_0002: Add support for writing OTP memory
    mtd: cfi_cmdset_0002: Invalidate cache after entering/exiting OTP memory
    mtd: cfi_cmdset_0002: Add support for reading OTP
    mtd: spi-nor: add support for flag status register on Micron chips
    mtd: Account for BBT blocks when a partition is being allocated
    ...

    Linus Torvalds
     
  • Now with 64bit bzImage and kexec tools, we support ramdisks whose size
    is bigger than 2G, as we can put them above 4G.

    We found that a compressed initramfs image could not be decompressed
    properly. It turns out that the image length is an int during
    decompression detection, and it becomes < 0 when the length is more
    than 2G. Furthermore, during decompression len is used as an int for
    the inbuf count, which has the same problem.

    Change len to long, that should be ok as on 32 bit platform long is
    32bits.
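
    The size problem can be checked with a small user-space sketch. The
    helper names fits_in_int() and fits_in_long() are hypothetical and are
    not taken from the patch; they only test whether a byte count survives
    being stored in each type.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Can a byte count be stored in an int without going negative? */
static int fits_in_int(uint64_t nbytes)
{
    return nbytes <= (uint64_t)INT_MAX;
}

/* Same question for long, whose width depends on the platform. */
static int fits_in_long(uint64_t nbytes)
{
    return nbytes <= (uint64_t)LONG_MAX;
}
```

    For example, the 3561095057-byte image from the table below does not
    fit in an int, but it fits in a long wherever long is 64 bits wide.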

    Tested with the following compressed initramfs images as root with
    kexec: gzip, bzip2, xz, lzma, lzop, lz4.
    run time for populate_rootfs():
       size        name            Nehalem-EX  Westmere-EX  Ivybridge-EX
     9034400256    root_img      :     26s         24s          30s
     3561095057    root_img.lz4  :     28s         27s          27s
     3459554629    root_img.lzo  :     29s         29s          28s
     3219399480    root_img.gz   :     64s         62s          49s
     2251594592    root_img.xz   :    262s        260s         183s
     2226366598    root_img.lzma :    386s        376s         277s
     2901482513    root_img.bz2  :    635s        599s

    Signed-off-by: Yinghai Lu
    Cc: "H. Peter Anvin"
    Cc: Ingo Molnar
    Cc: Rashika Kheria
    Cc: Josh Triplett
    Cc: Kyungsik Lee
    Cc: P J P
    Cc: Al Viro
    Cc: Tetsuo Handa
    Cc: "Daniel M. Weeks"
    Cc: Alexandre Courbot
    Cc: Jan Beulich
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yinghai Lu
     

03 Jul, 2014

2 commits


13 Jun, 2014

1 commit

  • Pull vfs updates from Al Viro:
    "This is the bunch that sat in -next + lock_parent() fix. This is the
    minimal set; there's more pending stuff.

    In particular, I really hope to get acct.c fixes merged this cycle -
    we need that to deal sanely with delayed-mntput stuff. In the next
    pile, hopefully - that series is fairly short and localized
    (kernel/acct.c, fs/super.c and fs/namespace.c). In this pile: more
    iov_iter work. Most of prereqs for ->splice_write with sane locking
    order are there and Kent's dio rewrite would also fit nicely on top of
    this pile"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (70 commits)
    lock_parent: don't step on stale ->d_parent of all-but-freed one
    kill generic_file_splice_write()
    ceph: switch to iter_file_splice_write()
    shmem: switch to iter_file_splice_write()
    nfs: switch to iter_splice_write_file()
    fs/splice.c: remove unneeded exports
    ocfs2: switch to iter_file_splice_write()
    ->splice_write() via ->write_iter()
    bio_vec-backed iov_iter
    optimize copy_page_{to,from}_iter()
    bury generic_file_aio_{read,write}
    lustre: get rid of messing with iovecs
    ceph: switch to ->write_iter()
    ceph_sync_direct_write: stop poking into iov_iter guts
    ceph_sync_read: stop poking into iov_iter guts
    new helper: copy_page_from_iter()
    fuse: switch to ->write_iter()
    btrfs: switch to ->write_iter()
    ocfs2: switch to ->write_iter()
    xfs: switch to ->write_iter()
    ...

    Linus Torvalds
     

07 Jun, 2014

1 commit

  • jffs2_garbage_collect_thread() does disallow_signal(SIGHUP) around
    jffs2_garbage_collect_pass() and the comment says "We don't want SIGHUP
    to interrupt us".

    But disallow_signal() can't ensure that jffs2_garbage_collect_pass()
    won't be interrupted by SIGHUP, the problem is that SIGHUP can be
    already pending when disallow_signal() is called, and in this case any
    interruptible sleep won't block.

    Note: this is in fact because disallow_signal() is buggy and should be
    fixed, see the next changes.

    But there is another reason why disallow_signal() is wrong: SIG_IGN set
    by disallow_signal() silently discards any SIGHUP which can be sent
    before the next allow_signal(SIGHUP).

    Change this code to use sigprocmask(SIG_UNBLOCK/SIG_BLOCK, SIGHUP).
    This even matches the old (and wrong) semantics allow/disallow had when
    this logic was written.
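
    The difference between blocking and ignoring a signal can be seen from
    user space. The following is a sketch using the POSIX signal API, not
    the kernel-thread code touched by this patch; the function names are
    hypothetical and it assumes a single-threaded process.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t hup_seen;

static void on_hup(int sig) { (void)sig; hup_seen = 1; }

/* Set the SIGHUP disposition to fn (a handler or SIG_IGN). */
static void install_handler(void (*fn)(int))
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = fn;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGHUP, &sa, NULL);
}

/* Block SIGHUP, send it, unblock: the signal stays pending, then arrives. */
static int blocked_sighup_is_delivered(void)
{
    sigset_t hup, pending;
    int was_pending;

    hup_seen = 0;
    install_handler(on_hup);
    sigemptyset(&hup);
    sigaddset(&hup, SIGHUP);

    sigprocmask(SIG_BLOCK, &hup, NULL);
    raise(SIGHUP);
    sigpending(&pending);
    was_pending = sigismember(&pending, SIGHUP);
    sigprocmask(SIG_UNBLOCK, &hup, NULL);   /* handler runs here */

    return was_pending == 1 && hup_seen == 1;
}

/* Ignore SIGHUP, send it, restore the handler: the signal is discarded. */
static int ignored_sighup_is_lost(void)
{
    hup_seen = 0;
    install_handler(SIG_IGN);
    raise(SIGHUP);
    install_handler(on_hup);
    return hup_seen == 0;
}
```

    Blocking only defers delivery, so a SIGHUP sent during the protected
    region is still seen afterwards; with SIG_IGN it is silently dropped.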

    Signed-off-by: Oleg Nesterov
    Cc: Peter Zijlstra
    Cc: Al Viro
    Cc: David Woodhouse
    Cc: Frederic Weisbecker
    Cc: Geert Uytterhoeven
    Cc: Ingo Molnar
    Cc: Mathieu Desnoyers
    Cc: Richard Weinberger
    Cc: Steven Rostedt
    Cc: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     

07 May, 2014

2 commits


08 Apr, 2014

1 commit

  • Pull MTD updates from Brian Norris:
    - A few SPI NOR ID definitions
    - Kill the NAND "max pagesize" restriction
    - Fix some x16 bus-width NAND support
    - Add NAND JEDEC parameter page support
    - DT bindings for NAND ECC
    - GPMI NAND updates (subpage reads)
    - More OMAP NAND refactoring
    - New STMicro SPI NOR driver (now in 40 patches!)
    - A few other random bugfixes

    * tag 'for-linus-20140405' of git://git.infradead.org/linux-mtd: (120 commits)
    Fix index regression in nand_read_subpage
    mtd: diskonchip: mem resource name is not optional
    mtd: nand: fix mention to CONFIG_MTD_NAND_ECC_BCH
    mtd: nand: fix GET/SET_FEATURES address on 16-bit devices
    mtd: omap2: Use devm_ioremap_resource()
    mtd: denali_dt: Use devm_ioremap_resource()
    mtd: devices: elm: update DRIVER_NAME as "omap-elm"
    mtd: devices: elm: configure parallel channels based on ecc_steps
    mtd: devices: elm: clean elm_load_syndrome
    mtd: devices: elm: check for hardware engine's design constraints
    mtd: st_spi_fsm: Succinctly reorganise .remove()
    mtd: st_spi_fsm: Allow loop to run at least once before giving up CPU
    mtd: st_spi_fsm: Correct vendor name spelling issue - missing "M"
    mtd: st_spi_fsm: Avoid duplicating MTD core code
    mtd: st_spi_fsm: Remove useless consts from function arguments
    mtd: st_spi_fsm: Convert ST SPI FSM (NOR) Flash driver to new DT partitions
    mtd: st_spi_fsm: Move runtime configurable msg sequences into device's struct
    mtd: st_spi_fsm: Supply the W25Qxxx chip specific configuration call-back
    mtd: st_spi_fsm: Supply the S25FLxxx chip specific configuration call-back
    mtd: st_spi_fsm: Supply the MX25xxx chip specific configuration call-back
    ...

    Linus Torvalds
     

05 Apr, 2014

1 commit

  • Pull ext4 updates from Ted Ts'o:
    "Major changes for 3.14 include support for the newly added ZERO_RANGE
    and COLLAPSE_RANGE fallocate operations, and scalability improvements
    in the jbd2 layer and in xattr handling when the extended attributes
    spill over into an external block.

    Other than that, the usual clean ups and minor bug fixes"

    * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (42 commits)
    ext4: fix premature freeing of partial clusters split across leaf blocks
    ext4: remove unneeded test of ret variable
    ext4: fix comment typo
    ext4: make ext4_block_zero_page_range static
    ext4: atomically set inode->i_flags in ext4_set_inode_flags()
    ext4: optimize Hurd tests when reading/writing inodes
    ext4: kill i_version support for Hurd-castrated file systems
    ext4: each filesystem creates and uses its own mb_cache
    fs/mbcache.c: doucple the locking of local from global data
    fs/mbcache.c: change block and index hash chain to hlist_bl_node
    ext4: Introduce FALLOC_FL_ZERO_RANGE flag for fallocate
    ext4: refactor ext4_fallocate code
    ext4: Update inode i_size after the preallocation
    ext4: fix partial cluster handling for bigalloc file systems
    ext4: delete path dealloc code in ext4_ext_handle_uninitialized_extents
    ext4: only call sync_filesystm() when remounting read-only
    fs: push sync_filesystem() down to the file system's remount_fs()
    jbd2: improve error messages for inconsistent journal heads
    jbd2: minimize region locked by j_list_lock in jbd2_journal_forget()
    jbd2: minimize region locked by j_list_lock in journal_get_create_access()
    ...

    Linus Torvalds
     

04 Apr, 2014

2 commits

  • This patch removes read_cache_page_async() which wasn't really needed
    anywhere and simplifies the code around it a bit.

    read_cache_page_async() is useful when we want to read a page into the
    cache without waiting for it to complete. This happens when the
    appropriate callback 'filler' does not complete its read operation and
    release the page lock immediately, but instead queues a different
    completion routine to do that. This never actually happened anywhere in
    the code.

    read_cache_page_async() had 3 different callers:

    - read_cache_page() which is the sync version, it would just wait for
    the requested read to complete using wait_on_page_read().

    - JFFS2 would call it from jffs2_gc_fetch_page(), but the filler
    function it supplied doesn't do any async reads, and would complete
    before the filler function returns - making it actually a sync read.

    - CRAMFS would call it using the read_mapping_page_async() wrapper, with
    a similar story to JFFS2 - the filler function doesn't do anything that
    resembles async reads and would always complete before the filler
    function returns.

    To sum it up, the code in mm/filemap.c never took advantage of having
    read_cache_page_async(). While there are filler callbacks that do async
    reads (such as the block one), we always called them through
    read_cache_page().

    This patch adds a mandatory wait for read to complete when adding a new
    page to the cache, and removes read_cache_page_async() and its wrappers.

    Signed-off-by: Sasha Levin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sasha Levin
     
  • Reclaim will be leaving shadow entries in the page cache radix tree upon
    evicting the real page. As those pages are found from the LRU, an
    iput() can lead to the inode being freed concurrently. At this point,
    reclaim must no longer install shadow pages because the inode freeing
    code needs to ensure the page tree is really empty.

    Add an address_space flag, AS_EXITING, that the inode freeing code sets
    under the tree lock before doing the final truncate. Reclaim will check
    for this flag before installing shadow pages.

    Signed-off-by: Johannes Weiner
    Reviewed-by: Rik van Riel
    Reviewed-by: Minchan Kim
    Cc: Andrea Arcangeli
    Cc: Bob Liu
    Cc: Christoph Hellwig
    Cc: Dave Chinner
    Cc: Greg Thelen
    Cc: Hugh Dickins
    Cc: Jan Kara
    Cc: KOSAKI Motohiro
    Cc: Luigi Semenzato
    Cc: Mel Gorman
    Cc: Metin Doslu
    Cc: Michel Lespinasse
    Cc: Ozgun Erdogan
    Cc: Peter Zijlstra
    Cc: Roman Gushchin
    Cc: Ryan Mallon
    Cc: Tejun Heo
    Cc: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     

13 Mar, 2014

1 commit

  • Previously, the no-op "mount -o remount /dev/xxx" operation when the
    file system is already mounted read-write causes an implied,
    unconditional syncfs(). This seems pretty stupid, and it's certainly
    not documented or guaranteed to do this, nor is it particularly useful,
    except in the case where the file system was mounted rw and is getting
    remounted read-only.

    However, it's possible that there might be some file systems that are
    actually depending on this behavior. In most file systems, it's
    probably fine to only call sync_filesystem() when transitioning from
    read-write to read-only, and there are some file systems where this is
    not needed at all (for example, for a pseudo-filesystem or something
    like romfs).
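
    The policy described above, syncing only on a read-write to read-only
    transition, can be sketched as a small predicate. This illustrates the
    idea and is not the code from the patch: SKETCH_RDONLY is a hypothetical
    stand-in for the read-only mount flag, and remount_needs_sync() is an
    assumed name.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_RDONLY 0x1UL   /* hypothetical stand-in for the read-only flag */

/* True only when a read-write file system is being remounted read-only. */
static bool remount_needs_sync(unsigned long old_flags, unsigned long new_flags)
{
    bool was_rw = (old_flags & SKETCH_RDONLY) == 0;
    bool now_ro = (new_flags & SKETCH_RDONLY) != 0;

    return was_rw && now_ro;
}
```

    A no-op remount of a read-write file system, or any remount of a
    read-only one, returns false here and so would skip the sync.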

    Signed-off-by: "Theodore Ts'o"
    Cc: linux-fsdevel@vger.kernel.org
    Cc: Christoph Hellwig
    Cc: Artem Bityutskiy
    Cc: Adrian Hunter
    Cc: Evgeniy Dushistov
    Cc: Jan Kara
    Cc: OGAWA Hirofumi
    Cc: Anders Larsen
    Cc: Phillip Lougher
    Cc: Kees Cook
    Cc: Mikulas Patocka
    Cc: Petr Vandrovec
    Cc: xfs@oss.sgi.com
    Cc: linux-btrfs@vger.kernel.org
    Cc: linux-cifs@vger.kernel.org
    Cc: samba-technical@lists.samba.org
    Cc: codalist@coda.cs.cmu.edu
    Cc: linux-ext4@vger.kernel.org
    Cc: linux-f2fs-devel@lists.sourceforge.net
    Cc: fuse-devel@lists.sourceforge.net
    Cc: cluster-devel@redhat.com
    Cc: linux-mtd@lists.infradead.org
    Cc: jfs-discussion@lists.sourceforge.net
    Cc: linux-nfs@vger.kernel.org
    Cc: linux-nilfs@vger.kernel.org
    Cc: linux-ntfs-dev@lists.sourceforge.net
    Cc: ocfs2-devel@oss.oracle.com
    Cc: reiserfs-devel@vger.kernel.org

    Theodore Ts'o
     

11 Mar, 2014

5 commits

  • Mounting a JFFS2 partition sometimes crashes with this call trace:

    [ 1322.240000] Kernel bug detected[#1]:
    [ 1322.244000] Cpu 2
    [ 1322.244000] $ 0 : 0000000000000000 0000000000000018 000000003ff00070 0000000000000001
    [ 1322.252000] $ 4 : 0000000000000000 c0000000f3980150 0000000000000000 0000000000010000
    [ 1322.260000] $ 8 : ffffffffc09cd5f8 0000000000000001 0000000000000088 c0000000ed300de8
    [ 1322.268000] $12 : e5e19d9c5f613a45 ffffffffc046d464 0000000000000000 66227ba5ea67b74e
    [ 1322.276000] $16 : c0000000f1769c00 c0000000ed1e0200 c0000000f3980150 0000000000000000
    [ 1322.284000] $20 : c0000000f3a80000 00000000fffffffc c0000000ed2cfbd8 c0000000f39818f0
    [ 1322.292000] $24 : 0000000000000004 0000000000000000
    [ 1322.300000] $28 : c0000000ed2c0000 c0000000ed2cfab8 0000000000010000 ffffffffc039c0b0
    [ 1322.308000] Hi : 000000000000023c
    [ 1322.312000] Lo : 000000000003f802
    [ 1322.316000] epc : ffffffffc039a9f8 check_tn_node+0x88/0x3b0
    [ 1322.320000] Not tainted
    [ 1322.324000] ra : ffffffffc039c0b0 jffs2_do_read_inode_internal+0x1250/0x1e48
    [ 1322.332000] Status: 5400f8e3 KX SX UX KERNEL EXL IE
    [ 1322.336000] Cause : 00800034
    [ 1322.340000] PrId : 000c1004 (Netlogic XLP)
    [ 1322.344000] Modules linked in:
    [ 1322.348000] Process jffs2_gcd_mtd7 (pid: 264, threadinfo=c0000000ed2c0000, task=c0000000f0e68dd8, tls=0000000000000000)
    [ 1322.356000] Stack : c0000000f1769e30 c0000000ed010780 c0000000ed010780 c0000000ed300000
    c0000000f1769c00 c0000000f3980150 c0000000f3a80000 00000000fffffffc
    c0000000ed2cfbd8 ffffffffc039c0b0 ffffffffc09c6340 0000000000001000
    0000000000000dec ffffffffc016c9d8 c0000000f39805a0 c0000000f3980180
    0000008600000000 0000000000000000 0000000000000000 0000000000000000
    0001000000000dec c0000000f1769d98 c0000000ed2cfb18 0000000000010000
    0000000000010000 0000000000000044 c0000000f3a80000 c0000000f1769c00
    c0000000f3d207a8 c0000000f1769d98 c0000000f1769de0 ffffffffc076f9c0
    0000000000000009 0000000000000000 0000000000000000 ffffffffc039cf90
    0000000000000017 ffffffffc013fbdc 0000000000000001 000000010003e61c
    ...
    [ 1322.424000] Call Trace:
    [ 1322.428000] [] check_tn_node+0x88/0x3b0
    [ 1322.432000] [] jffs2_do_read_inode_internal+0x1250/0x1e48
    [ 1322.440000] [] jffs2_do_crccheck_inode+0x70/0xd0
    [ 1322.448000] [] jffs2_garbage_collect_pass+0x160/0x870
    [ 1322.452000] [] jffs2_garbage_collect_thread+0xdc/0x1f0
    [ 1322.460000] [] kthread+0xb8/0xc0
    [ 1322.464000] [] kernel_thread_helper+0x10/0x18
    [ 1322.472000]
    [ 1322.472000]
    Code: 67bd0050 94a4002c 2c830001 de050218 2403fffc 0080a82d 00431824 24630044
    [ 1322.480000] ---[ end trace b052bb90e97dfbf5 ]---

    The variable csize in structure jffs2_tmp_dnode_info is of type uint16_t, but it
    is used to hold the compressed data length(csize) which is declared as uint32_t.
    So, when the value of csize exceeds 16bits, it gets truncated when assigned to
    tn->csize. This is causing a kernel BUG.
    Changing the definition of csize in jffs2_tmp_dnode_info to uint32_t fixes the issue.
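
    The truncation is easy to reproduce in user space. In this sketch the
    two structs are simplified, hypothetical stand-ins for
    jffs2_tmp_dnode_info before and after the change; only the csize field
    is modelled.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-ins: only the width of csize differs. */
struct tn_old { uint16_t csize; };
struct tn_new { uint32_t csize; };

/* Store a 32-bit compressed length in the old, 16-bit field. */
static uint32_t stored_in_old(uint32_t csize)
{
    struct tn_old tn;

    tn.csize = (uint16_t)csize;   /* upper 16 bits are dropped */
    return tn.csize;
}

/* Store the same length in the widened field. */
static uint32_t stored_in_new(uint32_t csize)
{
    struct tn_new tn;

    tn.csize = csize;
    return tn.csize;
}
```

    A length of 0x10000, for instance, is stored as 0 in the 16-bit field
    and is preserved in the 32-bit one.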

    Signed-off-by: Ajesh Kunhipurayil Vijayan
    Signed-off-by: Kamlakant Patel
    Cc:
    Signed-off-by: Brian Norris

    Ajesh Kunhipurayil Vijayan
     
  • Creating a large file on a JFFS2 partition sometimes crashes with this call
    trace:

    [ 306.476000] CPU 13 Unable to handle kernel paging request at virtual address c0000000dfff8002, epc == ffffffffc03a80a8, ra == ffffffffc03a8044
    [ 306.488000] Oops[#1]:
    [ 306.488000] Cpu 13
    [ 306.492000] $ 0 : 0000000000000000 0000000000000000 0000000000008008 0000000000008007
    [ 306.500000] $ 4 : c0000000dfff8002 000000000000009f c0000000e0007cde c0000000ee95fa58
    [ 306.508000] $ 8 : 0000000000000001 0000000000008008 0000000000010000 ffffffffffff8002
    [ 306.516000] $12 : 0000000000007fa9 000000000000ff0e 000000000000ff0f 80e55930aebb92bb
    [ 306.524000] $16 : c0000000e0000000 c0000000ee95fa5c c0000000efc80000 ffffffffc09edd70
    [ 306.532000] $20 : ffffffffc2b60000 c0000000ee95fa58 0000000000000000 c0000000efc80000
    [ 306.540000] $24 : 0000000000000000 0000000000000004
    [ 306.548000] $28 : c0000000ee950000 c0000000ee95f738 0000000000000000 ffffffffc03a8044
    [ 306.556000] Hi : 00000000000574a5
    [ 306.560000] Lo : 6193b7a7e903d8c9
    [ 306.564000] epc : ffffffffc03a80a8 jffs2_rtime_compress+0x98/0x198
    [ 306.568000] Tainted: G W
    [ 306.572000] ra : ffffffffc03a8044 jffs2_rtime_compress+0x34/0x198
    [ 306.580000] Status: 5000f8e3 KX SX UX KERNEL EXL IE
    [ 306.584000] Cause : 00800008
    [ 306.588000] BadVA : c0000000dfff8002
    [ 306.592000] PrId : 000c1100 (Netlogic XLP)
    [ 306.596000] Modules linked in:
    [ 306.596000] Process dd (pid: 170, threadinfo=c0000000ee950000, task=c0000000ee6e0858, tls=0000000000c47490)
    [ 306.608000] Stack : 7c547f377ddc7ee4 7ffc7f967f5d7fae 7f617f507fc37ff4 7e7d7f817f487f5f
    7d8e7fec7ee87eb3 7e977ff27eec7f9e 7d677ec67f917f67 7f3d7e457f017ed7
    7fd37f517f867eb2 7fed7fd17ca57e1d 7e5f7fe87f257f77 7fd77f0d7ede7fdb
    7fba7fef7e197f99 7fde7fe07ee37eb5 7f5c7f8c7fc67f65 7f457fb87f847e93
    7f737f3e7d137cd9 7f8e7e9c7fc47d25 7dbb7fac7fb67e52 7ff17f627da97f64
    7f6b7df77ffa7ec5 80057ef17f357fb3 7f767fa27dfc7fd5 7fe37e8e7fd07e53
    7e227fcf7efb7fa1 7f547e787fa87fcc 7fcb7fc57f5a7ffb 7fc07f6c7ea97e80
    7e2d7ed17e587ee0 7fb17f9d7feb7f31 7f607e797e887faa 7f757fdd7c607ff3
    7e877e657ef37fbd 7ec17fd67fe67ff7 7ff67f797ff87dc4 7eef7f3a7c337fa6
    7fe57fc97ed87f4b 7ebe7f097f0b8003 7fe97e2a7d997cba 7f587f987f3c7fa9
    ...
    [ 306.676000] Call Trace:
    [ 306.680000] [] jffs2_rtime_compress+0x98/0x198
    [ 306.684000] [] jffs2_selected_compress+0x110/0x230
    [ 306.692000] [] jffs2_compress+0x5c/0x388
    [ 306.696000] [] jffs2_write_inode_range+0xd8/0x388
    [ 306.704000] [] jffs2_write_end+0x16c/0x2d0
    [ 306.708000] [] generic_file_buffered_write+0xf8/0x2b8
    [ 306.716000] [] __generic_file_aio_write+0x1ac/0x350
    [ 306.720000] [] generic_file_aio_write+0x80/0x168
    [ 306.728000] [] do_sync_write+0x94/0xf8
    [ 306.732000] [] vfs_write+0xa4/0x1a0
    [ 306.736000] [] SyS_write+0x50/0x90
    [ 306.744000] [] handle_sys+0x180/0x1a0
    [ 306.748000]
    [ 306.748000]
    Code: 020b202d 0205282d 90a50000 14a40038 00000000 0060602d 0000282d 016c5823
    [ 306.760000] ---[ end trace 79dd088435be02d0 ]---
    Segmentation fault

    This crash occurs because 'positions' is declared as an array of signed
    short. The value of a position is in the range 0..65535, and it is
    converted to a negative number when the position is greater than 32767,
    which causes corruption and a crash. Changing the definition to
    'unsigned short' fixes this issue.
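
    The sign problem can be shown with a short user-space sketch. The helper
    names are hypothetical, and 0x8002 is only an illustrative position
    above 32767. Converting an out-of-range value to signed short is
    implementation-defined in C; on the usual two's-complement compilers it
    wraps to a negative number.

```c
#include <assert.h>

/* Store a position in a signed short, as the old array did. */
static int position_as_signed(unsigned int pos)
{
    short s = (short)pos;            /* wraps negative above 32767 */

    return s;
}

/* Store the same position in an unsigned short, as the fix does. */
static int position_as_unsigned(unsigned int pos)
{
    unsigned short u = (unsigned short)pos;

    return u;
}
```

    Used as an offset, the negative value points before the start of the
    buffer, whereas the unsigned variant keeps the position intact.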

    Signed-off-by: Jayachandran C
    Signed-off-by: Kamlakant Patel
    Cc:
    Signed-off-by: Brian Norris

    Kamlakant Patel
     
  • If jffs2_new_inode() succeeds, it returns with f->sem held, and the caller
    is responsible for releasing the lock. If it fails, it still returns with
    the lock held, but the caller won't release the lock, which will lead to
    deadlock.

    Fix it by releasing the lock in jffs2_new_inode() on error.
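
    The resulting convention, "lock held on success, released on error",
    can be sketched with a toy lock. Everything here is hypothetical:
    model_lock only records whether it is held and stands in for f->sem,
    and model_new_inode() stands in for a constructor that follows the
    fixed behaviour.

```c
#include <assert.h>
#include <errno.h>

/* Stand-in for a mutex such as f->sem: just records whether it is held. */
struct model_lock { int held; };

static void model_lock_acquire(struct model_lock *l)
{
    assert(!l->held);
    l->held = 1;
}

static void model_lock_release(struct model_lock *l)
{
    assert(l->held);
    l->held = 0;
}

/*
 * Hypothetical constructor following the fixed convention: returns 0 with
 * the lock held, or a negative errno with the lock already released.
 */
static int model_new_inode(struct model_lock *sem, int fail)
{
    model_lock_acquire(sem);
    if (fail) {
        model_lock_release(sem);   /* error path drops the lock itself */
        return -ENOMEM;
    }
    return 0;                      /* success: caller must release */
}

/* Caller pattern: unlock only on success. Returns the final lock state. */
static int caller_leaves_lock_held(int fail)
{
    struct model_lock sem = { 0 };

    if (model_new_inode(&sem, fail) == 0)
        model_lock_release(&sem);
    return sem.held;
}
```

    With this convention the caller's existing "unlock only on success"
    pattern leaves the lock free on both paths.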

    Signed-off-by: Wang Guoli
    Signed-off-by: Wang Nan
    Cc: Artem Bityutskiy
    Cc: David Woodhouse
    Cc: Wang Guoli
    Signed-off-by: Andrew Morton
    [Brian: not marked for stable; no one observed deadlock, and I don't
    think it can happen here]
    Signed-off-by: Brian Norris

    Wang Guoli
     
  • We triggered a soft lockup under a stress test on a 2.6.34 kernel.

    BUG: soft lockup - CPU#1 stuck for 60009ms! [lockf2.test:14488]
    ...
    [] (jffs2_do_reserve_space+0x420/0x440 [jffs2])
    [] (jffs2_reserve_space_gc+0x34/0x78 [jffs2])
    [] (jffs2_garbage_collect_dnode.isra.3+0x264/0x478 [jffs2])
    [] (jffs2_garbage_collect_pass+0x9c0/0xe4c [jffs2])
    [] (jffs2_reserve_space+0x104/0x2a8 [jffs2])
    [] (jffs2_write_inode_range+0x5c/0x4d4 [jffs2])
    [] (jffs2_write_end+0x198/0x2c0 [jffs2])
    [] (generic_file_buffered_write+0x158/0x200)
    [] (__generic_file_aio_write+0x3a4/0x414)
    [] (generic_file_aio_write+0x5c/0xbc)
    [] (do_sync_write+0x98/0xd4)
    [] (vfs_write+0xa8/0x150)
    [] (sys_write+0x3c/0xc0)]

    Fix this by adding a cond_resched() in the while loop.

    [akpm@linux-foundation.org: don't initialize `ret']
    Signed-off-by: Li Zefan
    Cc: David Woodhouse
    Cc: Artem Bityutskiy
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Brian Norris

    Li Zefan
     
  • @wait is a local variable, so if we don't remove it from the wait queue
    list, later wake_up() may end up accessing invalid memory.

    This was spotted by eyes.

    Signed-off-by: Li Zefan
    Cc: David Woodhouse
    Cc: Artem Bityutskiy
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Brian Norris

    Li Zefan
     

29 Jan, 2014

2 commits

  • Pull MTD updates from Brian Norris:
    - Add me (Brian Norris) as an additional MTD maintainer (it'd be nice to get
    David's "ack" for this; I'm sure he approves, but he's been pretty silent
    lately)
    - Add Ezequiel Garcia as maintainer for the pxa3xx NAND driver
    - Last (?) round of pxa3xx improvements for supporting Armada 370/XP
    - Typical churn in driver boilerplate (OOM messages, printk()'s, devm_*, etc.)
    - Quad read mode support for SPI NOR driver (m25p80)
    - Update Davinci NAND driver to prepare for use on new platforms
    - Begin to kill off NAND_MAX_{PAGE,OOB}SIZE macros; more work is pending
    - Miscellaneous NAND device support (new IDs)
    - Add READ RETRY support for Micron MLC NAND
    - Support new GPMI NAND ECC layout device-tree binding
    - Avoid mapping stack/vmalloc() memory for GPMI NAND DMA

    * tag 'for-linus-20140127' of git://git.infradead.org/linux-mtd: (151 commits)
    mtd: gpmi: add sanity check when mapping DMA for read_buf/write_buf
    mtd: gpmi: allocate a proper buffer for non ECC read/write
    mtd: m25p80: Set rx_nbits for Quad SPI transfers
    mtd: m25p80: Enable Quad SPI read transfers for s25fl512s
    mtd: s3c2410: Merge plat/regs-nand.h into s3c2410.c
    mtd: mtdram: add missing 'const'
    mtd: m25p80: assign default read command
    mtd: nuc900_nand: remove redundant return value check of platform_get_resource()
    mtd: plat_nand: remove redundant return value check of platform_get_resource()
    mtd: nand: add Intel manufacturer ID
    mtd: nand: add SanDisk manufacturer ID
    mtd: nand: add support for Samsung K9LCG08U0B
    mtd: nand: pxa3xx: Add support for 2048 bytes page size devices
    mtd: m25p80: Use OPCODE_QUAD_READ_4B for 4-byte addressing
    mtd: nand: don't use {read,write}_buf for 8-bit transfers
    mtd: nand: use __packed shorthand
    mtd: nand: support Micron READ RETRY
    mtd: nand: add generic READ RETRY support
    mtd: nand: add ONFI vendor block for Micron
    mtd: nand: localize ECC failures per page
    ...

    Linus Torvalds
     
  • Pull vfs updates from Al Viro:
    "Assorted stuff; the biggest pile here is Christoph's ACL series. Plus
    assorted cleanups and fixes all over the place...

    There will be another pile later this week"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (43 commits)
    __dentry_path() fixes
    vfs: Remove second variable named error in __dentry_path
    vfs: Is mounted should be testing mnt_ns for NULL or error.
    Fix race when checking i_size on direct i/o read
    hfsplus: remove can_set_xattr
    nfsd: use get_acl and ->set_acl
    fs: remove generic_acl
    nfs: use generic posix ACL infrastructure for v3 Posix ACLs
    gfs2: use generic posix ACL infrastructure
    jfs: use generic posix ACL infrastructure
    xfs: use generic posix ACL infrastructure
    reiserfs: use generic posix ACL infrastructure
    ocfs2: use generic posix ACL infrastructure
    jffs2: use generic posix ACL infrastructure
    hfsplus: use generic posix ACL infrastructure
    f2fs: use generic posix ACL infrastructure
    ext2/3/4: use generic posix ACL infrastructure
    btrfs: use generic posix ACL infrastructure
    fs: make posix_acl_create more useful
    fs: make posix_acl_chmod more useful
    ...

    Linus Torvalds
     

26 Jan, 2014

3 commits


24 Jan, 2014

1 commit


04 Jan, 2014

1 commit


28 Oct, 2013

1 commit


29 Jun, 2013

1 commit


04 Mar, 2013

1 commit

  • Modify the request_module to prefix the file system type with "fs-"
    and add aliases to all of the filesystems that can be built as modules
    to match.

    A common practice is to build all of the kernel code and leave code
    that is not commonly needed as modules, with the result that many
    users are exposed to any bug anywhere in the kernel.

    Looking for filesystems with a fs- prefix limits the pool of possible
    modules that can be loaded by mount to just filesystems, trivially
    making things safer with no real cost.

    Using aliases means user space can control the policy of which
    filesystem modules are auto-loaded by editing /etc/modprobe.d/*.conf
    with blacklist and alias directives, allowing simple, safe,
    well-understood workarounds for known problematic software.

    This also addresses a rare but unfortunate problem where the filesystem
    name is not the same as its module name and module auto-loading
    would not work. While writing this patch I saw a handful of such
    cases. The most significant being autofs that lives in the module
    autofs4.
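
    The naming scheme can be sketched in user space as a string-building
    helper. fs_module_alias() is a hypothetical name used only for this
    illustration; the "fs-" prefix is the one described above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Build the "fs-<type>" module alias for a filesystem type name.
 * Returns a pointer to a static buffer (overwritten by the next call),
 * or NULL if the name does not fit.
 */
static const char *fs_module_alias(const char *fstype)
{
    static char buf[64];
    int n = snprintf(buf, sizeof(buf), "fs-%s", fstype);

    if (n < 0 || (size_t)n >= sizeof(buf))
        return NULL;
    return buf;
}
```

    A mount of type "autofs" would thus ask for "fs-autofs", which an alias
    can map to the autofs4 module even though the two names differ.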

    This is relevant to user namespaces because we can reach the request
    module in get_fs_type() without having any special permissions, and
    people get uncomfortable when a user specified string (in this case
    the filesystem type) goes all of the way to request_module.

    After having looked at this issue I don't think there is any
    particular reason to perform any filtering or permission checks beyond
    making it clear in the module request that we want a filesystem
    module. The common pattern in the kernel is to call request_module()
    without regard to the user's permissions. In general, all a filesystem
    module does once loaded is call register_filesystem() and go to sleep,
    which means there is not much attack surface exposed by loading a
    filesystem module unless the filesystem is mounted. In a user
    namespace, filesystems are not mounted unless .fs_flags = FS_USERNS_MOUNT,
    which most filesystems do not set today.

    Acked-by: Serge Hallyn
    Acked-by: Kees Cook
    Reported-by: Kees Cook
    Signed-off-by: "Eric W. Biederman"

    Eric W. Biederman
     

27 Feb, 2013

1 commit

  • Pull vfs pile (part one) from Al Viro:
    "Assorted stuff - cleaning namei.c up a bit, fixing ->d_name/->d_parent
    locking violations, etc.

    The most visible changes here are death of FS_REVAL_DOT (replaced with
    "has ->d_weak_revalidate()") and a new helper getting from struct file
    to inode. Some bits of preparation to xattr method interface changes.

    Misc patches by various people sent this cycle *and* ocfs2 fixes from
    several cycles ago that should've been upstream right then.

    PS: the next vfs pile will be xattr stuff."

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (46 commits)
    saner proc_get_inode() calling conventions
    proc: avoid extra pde_put() in proc_fill_super()
    fs: change return values from -EACCES to -EPERM
    fs/exec.c: make bprm_mm_init() static
    ocfs2/dlm: use GFP_ATOMIC inside a spin_lock
    ocfs2: fix possible use-after-free with AIO
    ocfs2: Fix oops in ocfs2_fast_symlink_readpage() code path
    get_empty_filp()/alloc_file() leave both ->f_pos and ->f_version zero
    target: writev() on single-element vector is pointless
    export kernel_write(), convert open-coded instances
    fs: encode_fh: return FILEID_INVALID if invalid fid_type
    kill f_vfsmnt
    vfs: kill FS_REVAL_DOT by adding a d_weak_revalidate dentry op
    nfsd: handle vfs_getattr errors in acl protocol
    switch vfs_getattr() to struct path
    default SET_PERSONALITY() in linux/elf.h
    ceph: prepopulate inodes only when request is aborted
    d_hash_and_lookup(): export, switch open-coded instances
    9p: switch v9fs_set_create_acl() to inode+fid, do it before d_instantiate()
    9p: split dropping the acls from v9fs_set_create_acl()
    ...

    Linus Torvalds
     

23 Feb, 2013

1 commit


22 Jan, 2013

1 commit

  • The CONFIG_EXPERIMENTAL config item has not carried much meaning for a
    while now and is almost always enabled by default. As agreed during the
    Linux kernel summit, remove it from any "depends on" lines in Kconfigs.

    CC: David Woodhouse
    Cc: Al Viro
    Signed-off-by: Kees Cook
    Signed-off-by: Greg Kroah-Hartman

    Kees Cook
     

18 Nov, 2012

1 commit

  • Users of jffs2_do_reserve_space() expect that they still hold
    erase_completion_lock after a call to it. But there is a path
    where jffs2_do_reserve_space() leaves erase_completion_lock unlocked.
    The patch fixes it.

    Found by Linux Driver Verification project (linuxtesting.org).

    Signed-off-by: Alexey Khoroshilov
    Cc: stable@vger.kernel.org
    Signed-off-by: Artem Bityutskiy

    Alexey Khoroshilov
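
The locking contract described in the commit above (enter with the lock held, return with the lock held on every path) can be sketched in user space. This is only a model under stated assumptions: `struct model_lock`, `model_reserve_space()` and the `flush_fails` parameter are hypothetical names invented for illustration, and the sketch does not reproduce the actual code path that the patch changed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for erase_completion_lock; it only tracks whether it is held. */
struct model_lock {
    bool held;
};

void model_lock_acquire(struct model_lock *l)
{
    assert(!l->held);   /* taking it twice would self-deadlock on a real spinlock */
    l->held = true;
}

void model_lock_release(struct model_lock *l)
{
    assert(l->held);
    l->held = false;
}

/*
 * Hypothetical reserve routine. Contract: called with the lock held,
 * returns with the lock held, whatever the return value is.
 * `flush_fails` simulates an error in the step that runs unlocked.
 */
int model_reserve_space(struct model_lock *l, bool flush_fails)
{
    int ret = 0;

    assert(l->held);

    /* Slow work must not run under the lock, so drop it here. */
    model_lock_release(l);
    if (flush_fails)
        ret = -1;

    /* Single exit: re-take the lock on success and on error alike. */
    model_lock_acquire(l);
    return ret;
}
```

Routing both outcomes through one re-acquire point is a design choice of this sketch: an early `return` on the error branch would be exactly the kind of path that hands the caller an unlocked lock.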
     

09 Nov, 2012

1 commit

  • jffs2_write_begin() first acquires the page lock, then f->sem. This
    causes an AB-BA deadlock with jffs2_garbage_collect_live(), which first
    acquires f->sem, then the page lock:

    jffs2_garbage_collect_live
      mutex_lock(&f->sem)                     (A)
      jffs2_garbage_collect_dnode
        jffs2_gc_fetch_page
          read_cache_page_async
            do_read_cache_page
              lock_page(page)                 (B)

    jffs2_write_begin
      grab_cache_page_write_begin
        find_lock_page
          lock_page(page)                     (B)
      mutex_lock(&f->sem)                     (A)

    We fix this by restructuring jffs2_write_begin() to take f->sem before
    the page lock. However, we make sure that f->sem is not held when
    calling jffs2_reserve_space(), as this is not permitted by the locking
    rules.

    The deadlock above was observed multiple times on an SoC with a dual
    ARMv7 (Cortex-A9), running the long-term 3.4.11 kernel; it occurred
    when using scp to copy files from a host system to the ARM target
    system. The fix was heavily tested on the same target system.

    Cc: stable@vger.kernel.org
    Signed-off-by: Thomas Betker
    Acked-by: Joakim Tjernlund
    Signed-off-by: Artem Bityutskiy

    Thomas Betker
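
The AB-BA inversion described in the commit above can be illustrated with a tiny lock-order recorder. This is a user-space sketch, not kernel code: `enum lock_class`, `struct order_graph` and `record_order()` are hypothetical names, and the two lock classes merely stand in for f->sem and the page lock.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical lock classes standing in for f->sem (A) and the page lock (B). */
enum lock_class { LOCK_F_SEM, LOCK_PAGE, LOCK_CLASS_COUNT };

/* seen[a][b] is true once some path acquired b while holding a. */
struct order_graph {
    bool seen[LOCK_CLASS_COUNT][LOCK_CLASS_COUNT];
};

/*
 * Record that a path takes `first`, then `second` while still holding `first`.
 * Returns true if the opposite order was recorded earlier (AB-BA inversion).
 */
bool record_order(struct order_graph *g, enum lock_class first,
                  enum lock_class second)
{
    assert(first != second);
    g->seen[first][second] = true;
    return g->seen[second][first];
}

/* Before the fix: GC takes A then B, write_begin takes B then A. */
bool old_ordering_inverts(void)
{
    struct order_graph g;
    bool gc, wb;

    memset(&g, 0, sizeof(g));
    gc = record_order(&g, LOCK_F_SEM, LOCK_PAGE);   /* garbage collector */
    wb = record_order(&g, LOCK_PAGE, LOCK_F_SEM);   /* old write_begin */
    return gc || wb;
}

/* After the fix: both paths take A (f->sem) before B (page lock). */
bool fixed_ordering_inverts(void)
{
    struct order_graph g;
    bool gc, wb;

    memset(&g, 0, sizeof(g));
    gc = record_order(&g, LOCK_F_SEM, LOCK_PAGE);   /* garbage collector */
    wb = record_order(&g, LOCK_F_SEM, LOCK_PAGE);   /* restructured write_begin */
    return gc || wb;
}
```

In this model `old_ordering_inverts()` reports an inversion and `fixed_ordering_inverts()` does not. The recorder only checks pairwise order; it does not model the additional rule that f->sem must not be held across jffs2_reserve_space().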
     

09 Oct, 2012

2 commits


03 Oct, 2012

2 commits

  • Pull vfs update from Al Viro:

    "- big one - consolidation of descriptor-related logic; almost all of
    that is moved to fs/file.c

    (BTW, I'm seriously tempted to rename the result to fd.c. As it is,
    we have a situation when file_table.c is about handling of struct
    file and file.c is about handling of descriptor tables; the reasons
    are historical - file_table.c used to be about a static array of
    struct file we used to have way back).

    A lot of stray ends got cleaned up and converted to saner primitives,
    disgusting mess in android/binder.c is still disgusting, but at least
    doesn't poke so much in descriptor table guts anymore. A bunch of
    relatively minor races got fixed in process, plus an ext4 struct file
    leak.

    - related thing - fget_light() partially unuglified; see fdget() in
    there (and yes, it generates the code as good as we used to have).

    - also related - bits of Cyrill's procfs stuff that got entangled into
    that work; _not_ all of it, just the initial move to fs/proc/fd.c and
    switch of fdinfo to seq_file.

    - Alex's fs/coredump.c splitoff - the same story, had been easier to
    take that commit than mess with conflicts. The rest is a separate
    pile, this was just a mechanical code movement.

    - a few misc patches all over the place. Not all for this cycle,
    there'll be more (and quite a few currently sit in akpm's tree)."

    Fix up trivial conflicts in the android binder driver, and some fairly
    simple conflicts due to two different changes to the sock_alloc_file()
    interface ("take descriptor handling from sock_alloc_file() to callers"
    vs "net: Providing protocol type via system.sockprotoname xattr of
    /proc/PID/fd entries" adding a dentry name to the socket)

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (72 commits)
    MAX_LFS_FILESIZE should be a loff_t
    compat: fs: Generic compat_sys_sendfile implementation
    fs: push rcu_barrier() from deactivate_locked_super() to filesystems
    btrfs: reada_extent doesn't need kref for refcount
    coredump: move core dump functionality into its own file
    coredump: prevent double-free on an error path in core dumper
    usb/gadget: fix misannotations
    fcntl: fix misannotations
    ceph: don't abuse d_delete() on failure exits
    hypfs: ->d_parent is never NULL or negative
    vfs: delete surplus inode NULL check
    switch simple cases of fget_light to fdget
    new helpers: fdget()/fdput()
    switch o2hb_region_dev_write() to fget_light()
    proc_map_files_readdir(): don't bother with grabbing files
    make get_file() return its argument
    vhost_set_vring(): turn pollstart/pollstop into bool
    switch prctl_set_mm_exe_file() to fget_light()
    switch xfs_find_handle() to fget_light()
    switch xfs_swapext() to fget_light()
    ...

    Linus Torvalds
     
  • There's no reason to call rcu_barrier() on every
    deactivate_locked_super(). We only need to make sure that all delayed
    RCU-freed inodes are flushed before we destroy the related cache.

    Removing rcu_barrier() from deactivate_locked_super() affects some fast
    paths. E.g. on my machine exit_group() of the last process in an IPC
    namespace takes 0.07538s. rcu_barrier() takes 0.05188s of that time.

    Signed-off-by: Kirill A. Shutemov
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Al Viro

    Kirill A. Shutemov
     

29 Sep, 2012

2 commits

  • JFFS2 was designed without thought for OOB bitflips, it seems, but they
    can occur and will be reported to JFFS2 via mtd_read_oob()[1]. We don't
    want to fail on these transactions, since the data was corrected.

    [1] Few drivers report bitflips for OOB-only transactions. With such
    drivers, this patch should have no effect.

    Signed-off-by: Brian Norris
    Cc: stable@vger.kernel.org
    Signed-off-by: Artem Bityutskiy
    Signed-off-by: David Woodhouse

    Brian Norris
     
  • This patch fixes a regression introduced by
    "8bdc81c jffs2: get rid of jffs2_sync_super". We submit a delayed work in order
    to make sure the write-buffer is synchronized at some point. But we do not
    flush it when we unmount, which causes an oops when we unmount the file-system
    and then the delayed work is executed.

    This patch fixes the issue by adding a "cancel_delayed_work_sync()" invocation
    in the '->sync_fs()' handler. This will make sure the delayed work is canceled
    on sync, unmount and re-mount. And because VFS always calls 'sync_fs()' before
    unmounting or remounting, this fixes the issue.

    Reported-by: Ludovic Desroches
    Cc: stable@vger.kernel.org [3.5+]
    Signed-off-by: Artem Bityutskiy
    Tested-by: Ludovic Desroches
    Signed-off-by: David Woodhouse

    Artem Bityutskiy
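
The sequence described in the commit above (delayed work queued, filesystem unmounted, work fires afterwards) can be modeled in user space. This is a sketch under stated assumptions: `struct sb_model` and every `sbm_*` function are hypothetical, a plain flag stands in for the delayed work item, and the real fix calls cancel_delayed_work_sync() from the '->sync_fs()' handler rather than clearing a flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a superblock with one delayed work item. */
struct sb_model {
    bool work_pending;  /* delayed write-buffer flush is queued */
    bool alive;         /* false once the filesystem is unmounted */
    int flushes;        /* flushes performed on a live superblock */
    int stale_runs;     /* work runs after unmount (the oops case) */
};

/* Dirty the write-buffer: queue the delayed flush. */
void sbm_dirty(struct sb_model *sb)
{
    assert(sb->alive);  /* dirtying after unmount is a model error */
    sb->work_pending = true;
}

/* Timer expiry: run the work if it is still queued. */
void sbm_timer_fires(struct sb_model *sb)
{
    if (!sb->work_pending)
        return;
    sb->work_pending = false;
    if (sb->alive)
        sb->flushes++;
    else
        sb->stale_runs++;
}

/* Models ->sync_fs(): optionally cancel queued work, then flush synchronously. */
void sbm_sync_fs(struct sb_model *sb, bool cancel_work)
{
    if (cancel_work)
        sb->work_pending = false;
    sb->flushes++;
}

/* Models unmount: sync first, then the superblock goes away. */
void sbm_unmount(struct sb_model *sb, bool cancel_work)
{
    sbm_sync_fs(sb, cancel_work);
    sb->alive = false;
}

/* Runs dirty -> unmount -> timer and returns the number of stale runs. */
int sbm_scenario(bool cancel_work)
{
    struct sb_model sb = { .work_pending = false, .alive = true,
                           .flushes = 0, .stale_runs = 0 };

    sbm_dirty(&sb);
    sbm_unmount(&sb, cancel_work);
    sbm_timer_fires(&sb);
    return sb.stale_runs;
}
```

Without the cancel step the model records one run against a dead superblock; with it, none. Placing the cancel in the sync path covers unmount and remount in this model only because unmount always goes through `sbm_sync_fs()`, mirroring the commit's argument about the VFS.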