08 Apr, 2014

1 commit

  • Pull MTD updates from Brian Norris:
    - A few SPI NOR ID definitions
    - Kill the NAND "max pagesize" restriction
    - Fix some x16 bus-width NAND support
    - Add NAND JEDEC parameter page support
    - DT bindings for NAND ECC
    - GPMI NAND updates (subpage reads)
    - More OMAP NAND refactoring
    - New STMicro SPI NOR driver (now in 40 patches!)
    - A few other random bugfixes

    * tag 'for-linus-20140405' of git://git.infradead.org/linux-mtd: (120 commits)
    Fix index regression in nand_read_subpage
    mtd: diskonchip: mem resource name is not optional
    mtd: nand: fix mention to CONFIG_MTD_NAND_ECC_BCH
    mtd: nand: fix GET/SET_FEATURES address on 16-bit devices
    mtd: omap2: Use devm_ioremap_resource()
    mtd: denali_dt: Use devm_ioremap_resource()
    mtd: devices: elm: update DRIVER_NAME as "omap-elm"
    mtd: devices: elm: configure parallel channels based on ecc_steps
    mtd: devices: elm: clean elm_load_syndrome
    mtd: devices: elm: check for hardware engine's design constraints
    mtd: st_spi_fsm: Succinctly reorganise .remove()
    mtd: st_spi_fsm: Allow loop to run at least once before giving up CPU
    mtd: st_spi_fsm: Correct vendor name spelling issue - missing "M"
    mtd: st_spi_fsm: Avoid duplicating MTD core code
    mtd: st_spi_fsm: Remove useless consts from function arguments
    mtd: st_spi_fsm: Convert ST SPI FSM (NOR) Flash driver to new DT partitions
    mtd: st_spi_fsm: Move runtime configurable msg sequences into device's struct
    mtd: st_spi_fsm: Supply the W25Qxxx chip specific configuration call-back
    mtd: st_spi_fsm: Supply the S25FLxxx chip specific configuration call-back
    mtd: st_spi_fsm: Supply the MX25xxx chip specific configuration call-back
    ...

    Linus Torvalds
     

05 Apr, 2014

1 commit

  • Pull ext4 updates from Ted Ts'o:
    "Major changes for 3.14 include support for the newly added ZERO_RANGE
    and COLLAPSE_RANGE fallocate operations, and scalability improvements
    in the jbd2 layer and in xattr handling when the extended attributes
    spill over into an external block.

    Other than that, the usual clean ups and minor bug fixes"

    * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (42 commits)
    ext4: fix premature freeing of partial clusters split across leaf blocks
    ext4: remove unneeded test of ret variable
    ext4: fix comment typo
    ext4: make ext4_block_zero_page_range static
    ext4: atomically set inode->i_flags in ext4_set_inode_flags()
    ext4: optimize Hurd tests when reading/writing inodes
    ext4: kill i_version support for Hurd-castrated file systems
    ext4: each filesystem creates and uses its own mb_cache
    fs/mbcache.c: doucple the locking of local from global data
    fs/mbcache.c: change block and index hash chain to hlist_bl_node
    ext4: Introduce FALLOC_FL_ZERO_RANGE flag for fallocate
    ext4: refactor ext4_fallocate code
    ext4: Update inode i_size after the preallocation
    ext4: fix partial cluster handling for bigalloc file systems
    ext4: delete path dealloc code in ext4_ext_handle_uninitialized_extents
    ext4: only call sync_filesystm() when remounting read-only
    fs: push sync_filesystem() down to the file system's remount_fs()
    jbd2: improve error messages for inconsistent journal heads
    jbd2: minimize region locked by j_list_lock in jbd2_journal_forget()
    jbd2: minimize region locked by j_list_lock in journal_get_create_access()
    ...

    Linus Torvalds
     

04 Apr, 2014

2 commits

  • This patch removes read_cache_page_async(), which wasn't really needed
    anywhere, and simplifies the code around it a bit.

    read_cache_page_async() is useful when we want to read a page into the
    cache without waiting for the read to complete. That only matters when
    the supplied 'filler' callback does not complete its read operation and
    release the page lock immediately, but instead queues a separate
    completion routine to do that. This never actually happened anywhere in
    the code.

    read_cache_page_async() had 3 different callers:

    - read_cache_page(), which is the sync version; it would just wait for
    the requested read to complete using wait_on_page_read().

    - JFFS2 would call it from jffs2_gc_fetch_page(), but the filler
    function it supplied doesn't do any async reads, and would complete
    before the filler function returns - making it actually a sync read.

    - CRAMFS would call it using the read_mapping_page_async() wrapper, with
    a similar story to JFFS2 - the filler function doesn't do anything that
    resembles an async read and would always complete before the filler
    function returns.

    To sum it up, the code in mm/filemap.c never took advantage of having
    read_cache_page_async(). While there are filler callbacks that do async
    reads (such as the block one), we always called it through
    read_cache_page().

    This patch adds a mandatory wait for read to complete when adding a new
    page to the cache, and removes read_cache_page_async() and its wrappers.

    Signed-off-by: Sasha Levin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sasha Levin
     
  • Reclaim will be leaving shadow entries in the page cache radix tree upon
    evicting the real page. As those pages are found from the LRU, an
    iput() can lead to the inode being freed concurrently. At this point,
    reclaim must no longer install shadow pages because the inode freeing
    code needs to ensure the page tree is really empty.

    Add an address_space flag, AS_EXITING, that the inode freeing code sets
    under the tree lock before doing the final truncate. Reclaim will check
    for this flag before installing shadow pages.

    Signed-off-by: Johannes Weiner
    Reviewed-by: Rik van Riel
    Reviewed-by: Minchan Kim
    Cc: Andrea Arcangeli
    Cc: Bob Liu
    Cc: Christoph Hellwig
    Cc: Dave Chinner
    Cc: Greg Thelen
    Cc: Hugh Dickins
    Cc: Jan Kara
    Cc: KOSAKI Motohiro
    Cc: Luigi Semenzato
    Cc: Mel Gorman
    Cc: Metin Doslu
    Cc: Michel Lespinasse
    Cc: Ozgun Erdogan
    Cc: Peter Zijlstra
    Cc: Roman Gushchin
    Cc: Ryan Mallon
    Cc: Tejun Heo
    Cc: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
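    A minimal user-space sketch of the check described above, assuming a
    pthread mutex as a stand-in for the page cache tree lock and a plain
    bool as a stand-in for AS_EXITING. The struct and function names are
    hypothetical, not kernel APIs; the point is only that the flag is set
    and tested under the same lock, so reclaim cannot install a shadow
    entry once the freeing side has started its final truncate.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical model of an address_space: tree_lock stands in for the
 * page cache tree lock, exiting for the AS_EXITING flag. */
struct mapping_model {
    pthread_mutex_t tree_lock;
    bool exiting;
    int nr_shadows;        /* shadow entries installed so far */
};

/* Inode-freeing side: set the flag under the tree lock, then truncate. */
void mapping_begin_exit(struct mapping_model *m)
{
    pthread_mutex_lock(&m->tree_lock);
    m->exiting = true;
    pthread_mutex_unlock(&m->tree_lock);
    /* ... the final truncate would run here ... */
}

/* Reclaim side: install a shadow entry only if the mapping is not
 * exiting. Returns true if an entry was installed. */
bool reclaim_install_shadow(struct mapping_model *m)
{
    bool installed = false;

    pthread_mutex_lock(&m->tree_lock);
    if (!m->exiting) {
        m->nr_shadows++;
        installed = true;
    }
    pthread_mutex_unlock(&m->tree_lock);
    return installed;
}
```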
     

13 Mar, 2014

1 commit

  • Previously, the no-op "mount -o remount /dev/xxx" operation when the
    file system is already mounted read-write caused an implied,
    unconditional syncfs(). This seems pretty stupid, and it's certainly not
    documented or guaranteed to do this, nor is it particularly useful,
    except in the case where the file system was mounted rw and is getting
    remounted read-only.

    However, it's possible that there might be some file systems that are
    actually depending on this behavior. In most file systems, it's
    probably fine to only call sync_filesystem() when transitioning from
    read-write to read-only, and there are some file systems where this is
    not needed at all (for example, for a pseudo-filesystem or something
    like romfs).

    Signed-off-by: "Theodore Ts'o"
    Cc: linux-fsdevel@vger.kernel.org
    Cc: Christoph Hellwig
    Cc: Artem Bityutskiy
    Cc: Adrian Hunter
    Cc: Evgeniy Dushistov
    Cc: Jan Kara
    Cc: OGAWA Hirofumi
    Cc: Anders Larsen
    Cc: Phillip Lougher
    Cc: Kees Cook
    Cc: Mikulas Patocka
    Cc: Petr Vandrovec
    Cc: xfs@oss.sgi.com
    Cc: linux-btrfs@vger.kernel.org
    Cc: linux-cifs@vger.kernel.org
    Cc: samba-technical@lists.samba.org
    Cc: codalist@coda.cs.cmu.edu
    Cc: linux-ext4@vger.kernel.org
    Cc: linux-f2fs-devel@lists.sourceforge.net
    Cc: fuse-devel@lists.sourceforge.net
    Cc: cluster-devel@redhat.com
    Cc: linux-mtd@lists.infradead.org
    Cc: jfs-discussion@lists.sourceforge.net
    Cc: linux-nfs@vger.kernel.org
    Cc: linux-nilfs@vger.kernel.org
    Cc: linux-ntfs-dev@lists.sourceforge.net
    Cc: ocfs2-devel@oss.oracle.com
    Cc: reiserfs-devel@vger.kernel.org

    Theodore Ts'o
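    The "sync only on a read-write to read-only transition" policy can be
    sketched as below. This is a user-space model under stated assumptions:
    MODEL_RDONLY, model_remount() and model_sync_filesystem() are
    hypothetical stand-ins for a file system's read-only flag, its
    remount_fs() handler and sync_filesystem(), not the kernel's own code.

```c
#include <stdbool.h>

/* Hypothetical flag bit standing in for the read-only mount flag. */
#define MODEL_RDONLY 0x1u

/* Counts how many times the modelled sync_filesystem() ran. */
static int sync_calls;

static void model_sync_filesystem(void)
{
    sync_calls++;
}

/* Remount handler sketch: sync only when going from read-write to
 * read-only, instead of on every remount. */
void model_remount(unsigned int old_flags, unsigned int new_flags)
{
    bool was_rw = !(old_flags & MODEL_RDONLY);
    bool will_be_ro = (new_flags & MODEL_RDONLY) != 0;

    if (was_rw && will_be_ro)
        model_sync_filesystem();
    /* ... apply the remaining remount options here ... */
}
```

    A rw-to-rw remount (the no-op case above) no longer syncs; only the
    rw-to-ro transition does.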
     

11 Mar, 2014

5 commits

  • Mounting a JFFS2 partition sometimes crashes with this call trace:

    [ 1322.240000] Kernel bug detected[#1]:
    [ 1322.244000] Cpu 2
    [ 1322.244000] $ 0 : 0000000000000000 0000000000000018 000000003ff00070 0000000000000001
    [ 1322.252000] $ 4 : 0000000000000000 c0000000f3980150 0000000000000000 0000000000010000
    [ 1322.260000] $ 8 : ffffffffc09cd5f8 0000000000000001 0000000000000088 c0000000ed300de8
    [ 1322.268000] $12 : e5e19d9c5f613a45 ffffffffc046d464 0000000000000000 66227ba5ea67b74e
    [ 1322.276000] $16 : c0000000f1769c00 c0000000ed1e0200 c0000000f3980150 0000000000000000
    [ 1322.284000] $20 : c0000000f3a80000 00000000fffffffc c0000000ed2cfbd8 c0000000f39818f0
    [ 1322.292000] $24 : 0000000000000004 0000000000000000
    [ 1322.300000] $28 : c0000000ed2c0000 c0000000ed2cfab8 0000000000010000 ffffffffc039c0b0
    [ 1322.308000] Hi : 000000000000023c
    [ 1322.312000] Lo : 000000000003f802
    [ 1322.316000] epc : ffffffffc039a9f8 check_tn_node+0x88/0x3b0
    [ 1322.320000] Not tainted
    [ 1322.324000] ra : ffffffffc039c0b0 jffs2_do_read_inode_internal+0x1250/0x1e48
    [ 1322.332000] Status: 5400f8e3 KX SX UX KERNEL EXL IE
    [ 1322.336000] Cause : 00800034
    [ 1322.340000] PrId : 000c1004 (Netlogic XLP)
    [ 1322.344000] Modules linked in:
    [ 1322.348000] Process jffs2_gcd_mtd7 (pid: 264, threadinfo=c0000000ed2c0000, task=c0000000f0e68dd8, tls=0000000000000000)
    [ 1322.356000] Stack : c0000000f1769e30 c0000000ed010780 c0000000ed010780 c0000000ed300000
    c0000000f1769c00 c0000000f3980150 c0000000f3a80000 00000000fffffffc
    c0000000ed2cfbd8 ffffffffc039c0b0 ffffffffc09c6340 0000000000001000
    0000000000000dec ffffffffc016c9d8 c0000000f39805a0 c0000000f3980180
    0000008600000000 0000000000000000 0000000000000000 0000000000000000
    0001000000000dec c0000000f1769d98 c0000000ed2cfb18 0000000000010000
    0000000000010000 0000000000000044 c0000000f3a80000 c0000000f1769c00
    c0000000f3d207a8 c0000000f1769d98 c0000000f1769de0 ffffffffc076f9c0
    0000000000000009 0000000000000000 0000000000000000 ffffffffc039cf90
    0000000000000017 ffffffffc013fbdc 0000000000000001 000000010003e61c
    ...
    [ 1322.424000] Call Trace:
    [ 1322.428000] check_tn_node+0x88/0x3b0
    [ 1322.432000] jffs2_do_read_inode_internal+0x1250/0x1e48
    [ 1322.440000] jffs2_do_crccheck_inode+0x70/0xd0
    [ 1322.448000] jffs2_garbage_collect_pass+0x160/0x870
    [ 1322.452000] jffs2_garbage_collect_thread+0xdc/0x1f0
    [ 1322.460000] kthread+0xb8/0xc0
    [ 1322.464000] kernel_thread_helper+0x10/0x18
    [ 1322.472000]
    [ 1322.472000]
    Code: 67bd0050 94a4002c 2c830001 de050218 2403fffc 0080a82d 00431824 24630044
    [ 1322.480000] ---[ end trace b052bb90e97dfbf5 ]---

    The csize field in struct jffs2_tmp_dnode_info is of type uint16_t, but it
    is used to hold the compressed data length (csize), which is declared as
    uint32_t. So when the value of csize exceeds 16 bits, it gets truncated when
    assigned to tn->csize. This is causing a kernel BUG.
    Changing the definition of csize in jffs2_tmp_dnode_info to uint32_t fixes the issue.

    Signed-off-by: Ajesh Kunhipurayil Vijayan
    Signed-off-by: Kamlakant Patel
    Cc:
    Signed-off-by: Brian Norris

    Ajesh Kunhipurayil Vijayan
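    The truncation can be reproduced in isolation. The two structs below are
    illustrative stand-ins, not the real JFFS2 definitions; they only show
    that assigning a 32-bit length above 0xFFFF to a uint16_t field keeps
    the low 16 bits, while a uint32_t field preserves the value.

```c
#include <stdint.h>

/* Illustrative stand-ins for the old and fixed field widths. */
struct tmp_dnode_info_old { uint16_t csize; };
struct tmp_dnode_info_new { uint32_t csize; };

uint32_t store_old(uint32_t csize)
{
    struct tmp_dnode_info_old tn;

    tn.csize = csize;      /* silently reduced modulo 2^16 */
    return tn.csize;
}

uint32_t store_new(uint32_t csize)
{
    struct tmp_dnode_info_new tn;

    tn.csize = csize;      /* full 32-bit value preserved */
    return tn.csize;
}
```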
     
  • Creating a large file on a JFFS2 partition sometimes crashes with this call
    trace:

    [ 306.476000] CPU 13 Unable to handle kernel paging request at virtual address c0000000dfff8002, epc == ffffffffc03a80a8, ra == ffffffffc03a8044
    [ 306.488000] Oops[#1]:
    [ 306.488000] Cpu 13
    [ 306.492000] $ 0 : 0000000000000000 0000000000000000 0000000000008008 0000000000008007
    [ 306.500000] $ 4 : c0000000dfff8002 000000000000009f c0000000e0007cde c0000000ee95fa58
    [ 306.508000] $ 8 : 0000000000000001 0000000000008008 0000000000010000 ffffffffffff8002
    [ 306.516000] $12 : 0000000000007fa9 000000000000ff0e 000000000000ff0f 80e55930aebb92bb
    [ 306.524000] $16 : c0000000e0000000 c0000000ee95fa5c c0000000efc80000 ffffffffc09edd70
    [ 306.532000] $20 : ffffffffc2b60000 c0000000ee95fa58 0000000000000000 c0000000efc80000
    [ 306.540000] $24 : 0000000000000000 0000000000000004
    [ 306.548000] $28 : c0000000ee950000 c0000000ee95f738 0000000000000000 ffffffffc03a8044
    [ 306.556000] Hi : 00000000000574a5
    [ 306.560000] Lo : 6193b7a7e903d8c9
    [ 306.564000] epc : ffffffffc03a80a8 jffs2_rtime_compress+0x98/0x198
    [ 306.568000] Tainted: G W
    [ 306.572000] ra : ffffffffc03a8044 jffs2_rtime_compress+0x34/0x198
    [ 306.580000] Status: 5000f8e3 KX SX UX KERNEL EXL IE
    [ 306.584000] Cause : 00800008
    [ 306.588000] BadVA : c0000000dfff8002
    [ 306.592000] PrId : 000c1100 (Netlogic XLP)
    [ 306.596000] Modules linked in:
    [ 306.596000] Process dd (pid: 170, threadinfo=c0000000ee950000, task=c0000000ee6e0858, tls=0000000000c47490)
    [ 306.608000] Stack : 7c547f377ddc7ee4 7ffc7f967f5d7fae 7f617f507fc37ff4 7e7d7f817f487f5f
    7d8e7fec7ee87eb3 7e977ff27eec7f9e 7d677ec67f917f67 7f3d7e457f017ed7
    7fd37f517f867eb2 7fed7fd17ca57e1d 7e5f7fe87f257f77 7fd77f0d7ede7fdb
    7fba7fef7e197f99 7fde7fe07ee37eb5 7f5c7f8c7fc67f65 7f457fb87f847e93
    7f737f3e7d137cd9 7f8e7e9c7fc47d25 7dbb7fac7fb67e52 7ff17f627da97f64
    7f6b7df77ffa7ec5 80057ef17f357fb3 7f767fa27dfc7fd5 7fe37e8e7fd07e53
    7e227fcf7efb7fa1 7f547e787fa87fcc 7fcb7fc57f5a7ffb 7fc07f6c7ea97e80
    7e2d7ed17e587ee0 7fb17f9d7feb7f31 7f607e797e887faa 7f757fdd7c607ff3
    7e877e657ef37fbd 7ec17fd67fe67ff7 7ff67f797ff87dc4 7eef7f3a7c337fa6
    7fe57fc97ed87f4b 7ebe7f097f0b8003 7fe97e2a7d997cba 7f587f987f3c7fa9
    ...
    [ 306.676000] Call Trace:
    [ 306.680000] jffs2_rtime_compress+0x98/0x198
    [ 306.684000] jffs2_selected_compress+0x110/0x230
    [ 306.692000] jffs2_compress+0x5c/0x388
    [ 306.696000] jffs2_write_inode_range+0xd8/0x388
    [ 306.704000] jffs2_write_end+0x16c/0x2d0
    [ 306.708000] generic_file_buffered_write+0xf8/0x2b8
    [ 306.716000] __generic_file_aio_write+0x1ac/0x350
    [ 306.720000] generic_file_aio_write+0x80/0x168
    [ 306.728000] do_sync_write+0x94/0xf8
    [ 306.732000] vfs_write+0xa4/0x1a0
    [ 306.736000] SyS_write+0x50/0x90
    [ 306.744000] handle_sys+0x180/0x1a0
    [ 306.748000]
    [ 306.748000]
    Code: 020b202d 0205282d 90a50000 14a40038 00000000 0060602d 0000282d 016c5823
    [ 306.760000] ---[ end trace 79dd088435be02d0 ]---
    Segmentation fault

    This crash is caused because the 'positions' array is declared as signed
    short. A position value is in the range 0..65535, and is converted to a
    negative number when it is greater than 32767, which leads to corruption
    and the crash. Changing the definition to 'unsigned short' fixes this
    issue.

    Signed-off-by: Jayachandran C
    Signed-off-by: Kamlakant Patel
    Cc:
    Signed-off-by: Brian Norris

    Kamlakant Patel
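    A minimal illustration of this bug class, not the actual rtime code:
    the same position is stored in a signed short slot and in an unsigned
    short slot. Converting a value above SHRT_MAX to signed short is
    implementation-defined (negative on common two's-complement compilers),
    so the stored position no longer matches; the unsigned slot covers the
    whole 0..65535 range.

```c
/* Store a position in a signed short slot and read it back. */
int roundtrip_signed(int pos)
{
    short slot = (short)pos;   /* out of range above 32767 */
    return slot;
}

/* Store a position in an unsigned short slot and read it back. */
int roundtrip_unsigned(int pos)
{
    unsigned short slot = (unsigned short)pos;  /* exact for 0..65535 */
    return slot;
}
```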
     
  • If jffs2_new_inode() succeeds, it returns with f->sem held, and the caller
    is responsible for releasing the lock. If it fails, it still returns with
    the lock held, but the caller won't release the lock, which will lead to
    deadlock.

    Fix it by releasing the lock in jffs2_new_inode() on error.

    Signed-off-by: Wang Guoli
    Signed-off-by: Wang Nan
    Cc: Artem Bityutskiy
    Cc: David Woodhouse
    Cc: Wang Guoli
    Signed-off-by: Andrew Morton
    [Brian: not marked for stable; no one observed deadlock, and I don't
    think it can happen here]
    Signed-off-by: Brian Norris

    Wang Guoli
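    The locking contract can be modelled in user space as below, assuming a
    pthread mutex stands in for f->sem. model_new_inode() and its -1 error
    value are hypothetical; the sketch only shows the shape of the fix:
    the lock is held on return only on success, and the error path drops
    it before returning.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-inode info holding f->sem. */
struct inode_info_model {
    pthread_mutex_t sem;
};

/* Returns 0 on success with f->sem held (caller unlocks), or -1 on
 * failure with f->sem already released. */
int model_new_inode(struct inode_info_model *f, bool fail)
{
    pthread_mutex_lock(&f->sem);
    if (fail) {
        pthread_mutex_unlock(&f->sem);  /* the fix: release on error */
        return -1;
    }
    return 0;  /* caller is now responsible for unlocking */
}
```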
     
  • We triggered a soft lockup under a stress test on a 2.6.34 kernel.

    BUG: soft lockup - CPU#1 stuck for 60009ms! [lockf2.test:14488]
    ...
    (jffs2_do_reserve_space+0x420/0x440 [jffs2])
    (jffs2_reserve_space_gc+0x34/0x78 [jffs2])
    (jffs2_garbage_collect_dnode.isra.3+0x264/0x478 [jffs2])
    (jffs2_garbage_collect_pass+0x9c0/0xe4c [jffs2])
    (jffs2_reserve_space+0x104/0x2a8 [jffs2])
    (jffs2_write_inode_range+0x5c/0x4d4 [jffs2])
    (jffs2_write_end+0x198/0x2c0 [jffs2])
    (generic_file_buffered_write+0x158/0x200)
    (__generic_file_aio_write+0x3a4/0x414)
    (generic_file_aio_write+0x5c/0xbc)
    (do_sync_write+0x98/0xd4)
    (vfs_write+0xa8/0x150)
    (sys_write+0x3c/0xc0)

    Fix this by adding a cond_resched() in the while loop.

    [akpm@linux-foundation.org: don't initialize `ret']
    Signed-off-by: Li Zefan
    Cc: David Woodhouse
    Cc: Artem Bityutskiy
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Brian Norris

    Li Zefan
     
  • @wait is a local variable, so if we don't remove it from the wait queue
    list, later wake_up() may end up accessing invalid memory.

    This was spotted by code inspection.

    Signed-off-by: Li Zefan
    Cc: David Woodhouse
    Cc: Artem Bityutskiy
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Brian Norris

    Li Zefan
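    Why the on-stack entry must be unlinked can be sketched with a tiny
    singly linked wait queue. The types and helpers below are hypothetical
    and much simpler than the kernel's wait_queue_head_t; the sketch shows
    only that a function which links a local entry into a shared queue has
    to remove it before returning, otherwise a later wake-up walking the
    queue would dereference a dead stack frame.

```c
#include <stddef.h>

/* Hypothetical minimal wait queue; not the kernel's API. */
struct wait_entry { struct wait_entry *next; };
struct wait_head { struct wait_entry *first; };

static void add_wait(struct wait_head *q, struct wait_entry *w)
{
    w->next = q->first;
    q->first = w;
}

static void remove_wait(struct wait_head *q, struct wait_entry *w)
{
    struct wait_entry **pp = &q->first;

    while (*pp && *pp != w)
        pp = &(*pp)->next;
    if (*pp)
        *pp = w->next;     /* unlink w */
}

/* Waits (trivially, in this model) and always unlinks its local entry
 * before returning. */
void wait_for_event(struct wait_head *q)
{
    struct wait_entry wait;    /* lives only for this call, like @wait */

    add_wait(q, &wait);
    /* ... sleep until woken ... */
    remove_wait(q, &wait);
}
```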
     

29 Jan, 2014

2 commits

  • Pull MTD updates from Brian Norris:
    - Add me (Brian Norris) as an additional MTD maintainer (it'd be nice to get
    David's "ack" for this; I'm sure he approves, but he's been pretty silent
    lately)
    - Add Ezequiel Garcia as maintainer for the pxa3xx NAND driver
    - Last (?) round of pxa3xx improvements for supporting Armada 370/XP
    - Typical churn in driver boilerplate (OOM messages, printk()'s, devm_*, etc.)
    - Quad read mode support for SPI NOR driver (m25p80)
    - Update Davinci NAND driver to prepare for use on new platforms
    - Begin to kill off NAND_MAX_{PAGE,OOB}SIZE macros; more work is pending
    - Miscellaneous NAND device support (new IDs)
    - Add READ RETRY support for Micron MLC NAND
    - Support new GPMI NAND ECC layout device-tree binding
    - Avoid mapping stack/vmalloc() memory for GPMI NAND DMA

    * tag 'for-linus-20140127' of git://git.infradead.org/linux-mtd: (151 commits)
    mtd: gpmi: add sanity check when mapping DMA for read_buf/write_buf
    mtd: gpmi: allocate a proper buffer for non ECC read/write
    mtd: m25p80: Set rx_nbits for Quad SPI transfers
    mtd: m25p80: Enable Quad SPI read transfers for s25fl512s
    mtd: s3c2410: Merge plat/regs-nand.h into s3c2410.c
    mtd: mtdram: add missing 'const'
    mtd: m25p80: assign default read command
    mtd: nuc900_nand: remove redundant return value check of platform_get_resource()
    mtd: plat_nand: remove redundant return value check of platform_get_resource()
    mtd: nand: add Intel manufacturer ID
    mtd: nand: add SanDisk manufacturer ID
    mtd: nand: add support for Samsung K9LCG08U0B
    mtd: nand: pxa3xx: Add support for 2048 bytes page size devices
    mtd: m25p80: Use OPCODE_QUAD_READ_4B for 4-byte addressing
    mtd: nand: don't use {read,write}_buf for 8-bit transfers
    mtd: nand: use __packed shorthand
    mtd: nand: support Micron READ RETRY
    mtd: nand: add generic READ RETRY support
    mtd: nand: add ONFI vendor block for Micron
    mtd: nand: localize ECC failures per page
    ...

    Linus Torvalds
     
  • Pull vfs updates from Al Viro:
    "Assorted stuff; the biggest pile here is Christoph's ACL series. Plus
    assorted cleanups and fixes all over the place...

    There will be another pile later this week"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (43 commits)
    __dentry_path() fixes
    vfs: Remove second variable named error in __dentry_path
    vfs: Is mounted should be testing mnt_ns for NULL or error.
    Fix race when checking i_size on direct i/o read
    hfsplus: remove can_set_xattr
    nfsd: use get_acl and ->set_acl
    fs: remove generic_acl
    nfs: use generic posix ACL infrastructure for v3 Posix ACLs
    gfs2: use generic posix ACL infrastructure
    jfs: use generic posix ACL infrastructure
    xfs: use generic posix ACL infrastructure
    reiserfs: use generic posix ACL infrastructure
    ocfs2: use generic posix ACL infrastructure
    jffs2: use generic posix ACL infrastructure
    hfsplus: use generic posix ACL infrastructure
    f2fs: use generic posix ACL infrastructure
    ext2/3/4: use generic posix ACL infrastructure
    btrfs: use generic posix ACL infrastructure
    fs: make posix_acl_create more useful
    fs: make posix_acl_chmod more useful
    ...

    Linus Torvalds
     

26 Jan, 2014

3 commits


24 Jan, 2014

1 commit


04 Jan, 2014

1 commit


28 Oct, 2013

1 commit


29 Jun, 2013

1 commit


04 Mar, 2013

1 commit

  • Modify the request_module to prefix the file system type with "fs-"
    and add aliases to all of the filesystems that can be built as modules
    to match.

    A common practice is to build all of the kernel code and leave code
    that is not commonly needed as modules, with the result that many
    users are exposed to any bug anywhere in the kernel.

    Looking for filesystems with a fs- prefix limits the pool of possible
    modules that can be loaded by mount to just filesystems, trivially
    making things safer with no real cost.

    Using aliases means user space can control the policy of which
    filesystem modules are auto-loaded by editing /etc/modprobe.d/*.conf
    with blacklist and alias directives, allowing simple, safe,
    well-understood work-arounds to known problematic software.

    This also addresses a rare but unfortunate problem where the filesystem
    name is not the same as its module name and module auto-loading
    would not work. While writing this patch I saw a handful of such
    cases, the most significant being autofs, which lives in the module
    autofs4.

    This is relevant to user namespaces because we can reach
    request_module() in get_fs_type() without having any special
    permissions, and people get uncomfortable when a user-specified string
    (in this case the filesystem type) goes all of the way to
    request_module().

    After having looked at this issue I don't think there is any
    particular reason to perform any filtering or permission checks beyond
    making it clear in the module request that we want a filesystem
    module. The common pattern in the kernel is to call request_module()
    without regard to the user's permissions. In general, all a filesystem
    module does once loaded is call register_filesystem() and go to sleep,
    which means there is not much attack surface exposed by loading a
    filesystem module unless the filesystem is mounted. In a user
    namespace, filesystems are not mounted unless .fs_flags includes
    FS_USERNS_MOUNT, which most filesystems do not set today.

    Acked-by: Serge Hallyn
    Acked-by: Kees Cook
    Reported-by: Kees Cook
    Signed-off-by: "Eric W. Biederman"

    Eric W. Biederman
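    The naming scheme can be sketched in user space: the module name
    requested for a filesystem type gets an "fs-" prefix, so only modules
    that declare a matching "fs-<type>" alias are candidates. The helper
    fs_module_request_name() and its buffer handling are hypothetical
    illustrations, not the code in get_fs_type(). On the module side, the
    kernel's MODULE_ALIAS_FS("name") macro produces the matching
    "fs-name" alias.

```c
#include <stdio.h>
#include <string.h>

/* Builds "fs-<fstype>" into buf. Returns 0 on success, or -1 if the
 * result would be truncated or snprintf() fails. */
int fs_module_request_name(const char *fstype, char *buf, size_t len)
{
    int n = snprintf(buf, len, "fs-%s", fstype);

    /* Refuse to request a truncated (and therefore wrong) module name. */
    if (n < 0 || (size_t)n >= len)
        return -1;
    return 0;
}
```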
     

27 Feb, 2013

1 commit

  • Pull vfs pile (part one) from Al Viro:
    "Assorted stuff - cleaning namei.c up a bit, fixing ->d_name/->d_parent
    locking violations, etc.

    The most visible changes here are death of FS_REVAL_DOT (replaced with
    "has ->d_weak_revalidate()") and a new helper getting from struct file
    to inode. Some bits of preparation to xattr method interface changes.

    Misc patches by various people sent this cycle *and* ocfs2 fixes from
    several cycles ago that should've been upstream right then.

    PS: the next vfs pile will be xattr stuff."

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (46 commits)
    saner proc_get_inode() calling conventions
    proc: avoid extra pde_put() in proc_fill_super()
    fs: change return values from -EACCES to -EPERM
    fs/exec.c: make bprm_mm_init() static
    ocfs2/dlm: use GFP_ATOMIC inside a spin_lock
    ocfs2: fix possible use-after-free with AIO
    ocfs2: Fix oops in ocfs2_fast_symlink_readpage() code path
    get_empty_filp()/alloc_file() leave both ->f_pos and ->f_version zero
    target: writev() on single-element vector is pointless
    export kernel_write(), convert open-coded instances
    fs: encode_fh: return FILEID_INVALID if invalid fid_type
    kill f_vfsmnt
    vfs: kill FS_REVAL_DOT by adding a d_weak_revalidate dentry op
    nfsd: handle vfs_getattr errors in acl protocol
    switch vfs_getattr() to struct path
    default SET_PERSONALITY() in linux/elf.h
    ceph: prepopulate inodes only when request is aborted
    d_hash_and_lookup(): export, switch open-coded instances
    9p: switch v9fs_set_create_acl() to inode+fid, do it before d_instantiate()
    9p: split dropping the acls from v9fs_set_create_acl()
    ...

    Linus Torvalds
     

23 Feb, 2013

1 commit


22 Jan, 2013

1 commit

  • The CONFIG_EXPERIMENTAL config item has not carried much meaning for a
    while now and is almost always enabled by default. As agreed during the
    Linux kernel summit, remove it from any "depends on" lines in Kconfigs.

    CC: David Woodhouse
    Cc: Al Viro
    Signed-off-by: Kees Cook
    Signed-off-by: Greg Kroah-Hartman

    Kees Cook
     

18 Nov, 2012

1 commit

  • Callers of jffs2_do_reserve_space() expect that they still hold
    erase_completion_lock after the call returns. But there is a path
    where jffs2_do_reserve_space() leaves erase_completion_lock unlocked.
    This patch fixes it.

    Found by Linux Driver Verification project (linuxtesting.org).

    Signed-off-by: Alexey Khoroshilov
    Cc: stable@vger.kernel.org
    Signed-off-by: Artem Bityutskiy

    Alexey Khoroshilov
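    The invariant ("entered with the lock held, returns with the lock held
    on every path") can be modelled as below. This is a sketch under stated
    assumptions: a pthread mutex stands in for the erase_completion_lock
    spinlock, and model_do_reserve_space() with its -1 return is
    hypothetical. Any path that drops the lock to wait must re-take it
    before returning.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the superblock info holding the lock. */
struct sb_info_model {
    pthread_mutex_t erase_completion_lock;
};

/* Must be called with erase_completion_lock held; returns with it held. */
int model_do_reserve_space(struct sb_info_model *c, bool need_to_wait)
{
    if (need_to_wait) {
        pthread_mutex_unlock(&c->erase_completion_lock);
        /* ... wait or do work without the lock ... */
        pthread_mutex_lock(&c->erase_completion_lock);  /* re-take first */
        return -1;  /* hypothetical "try again" result */
    }
    return 0;
}
```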
     

09 Nov, 2012

1 commit

  • jffs2_write_begin() first acquires the page lock, then f->sem. This
    causes an AB-BA deadlock with jffs2_garbage_collect_live(), which first
    acquires f->sem, then the page lock:

    jffs2_garbage_collect_live
        mutex_lock(&f->sem)                         (A)
        jffs2_garbage_collect_dnode
            jffs2_gc_fetch_page
                read_cache_page_async
                    do_read_cache_page
                        lock_page(page)             (B)

    jffs2_write_begin
        grab_cache_page_write_begin
            find_lock_page
                lock_page(page)                     (B)
        mutex_lock(&f->sem)                         (A)

    We fix this by restructuring jffs2_write_begin() to take f->sem before
    the page lock. However, we make sure that f->sem is not held when
    calling jffs2_reserve_space(), as this is not permitted by the locking
    rules.

    The deadlock above was observed multiple times on an SoC with a dual
    ARMv7 (Cortex-A9), running the long-term 3.4.11 kernel; it occurred
    when using scp to copy files from a host system to the ARM target
    system. The fix was heavily tested on the same target system.

    Cc: stable@vger.kernel.org
    Signed-off-by: Thomas Betker
    Acked-by: Joakim Tjernlund
    Signed-off-by: Artem Bityutskiy

    Thomas Betker
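    The principle behind the fix, a single global lock order, can be shown
    with two threads in user space. This is a simplified model, not the
    JFFS2 code: lock_a stands in for f->sem and lock_b for the page lock,
    and both the "GC" thread and the "write_begin" thread take A before B,
    so neither can hold B while waiting for A. It does not model the
    separate rule that f->sem must not be held across
    jffs2_reserve_space().

```c
#include <pthread.h>
#include <stddef.h>

static pthread_mutex_t lock_a = PTHREAD_MUTEX_INITIALIZER;  /* f->sem */
static pthread_mutex_t lock_b = PTHREAD_MUTEX_INITIALIZER;  /* page lock */
static long work_done;

/* Both paths run this body, so every thread acquires A before B and an
 * AB-BA cycle cannot form. */
static void *locked_worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < 10000; i++) {
        pthread_mutex_lock(&lock_a);
        pthread_mutex_lock(&lock_b);
        work_done++;                    /* protected by both locks */
        pthread_mutex_unlock(&lock_b);
        pthread_mutex_unlock(&lock_a);
    }
    return NULL;
}

/* Runs a "GC" thread and a "write_begin" thread concurrently and returns
 * the number of critical sections completed, or -1 on thread failure. */
long run_both_paths(void)
{
    pthread_t gc, writer;

    if (pthread_create(&gc, NULL, locked_worker, NULL) != 0)
        return -1;
    if (pthread_create(&writer, NULL, locked_worker, NULL) != 0) {
        pthread_join(gc, NULL);
        return -1;
    }
    pthread_join(gc, NULL);
    pthread_join(writer, NULL);
    return work_done;
}
```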
     

09 Oct, 2012

2 commits


03 Oct, 2012

2 commits

  • Pull vfs update from Al Viro:

    - big one - consolidation of descriptor-related logics; almost all of
    that is moved to fs/file.c

    (BTW, I'm seriously tempted to rename the result to fd.c. As it is,
    we have a situation when file_table.c is about handling of struct
    file and file.c is about handling of descriptor tables; the reasons
    are historical - file_table.c used to be about a static array of
    struct file we used to have way back).

    A lot of stray ends got cleaned up and converted to saner primitives;
    the disgusting mess in android/binder.c is still disgusting, but at least
    it doesn't poke so much into descriptor table guts anymore. A bunch of
    relatively minor races got fixed in the process, plus an ext4 struct file
    leak.

    - related thing - fget_light() partially unuglified; see fdget() in
    there (and yes, it generates the code as good as we used to have).

    - also related - bits of Cyrill's procfs stuff that got entangled into
    that work; _not_ all of it, just the initial move to fs/proc/fd.c and
    switch of fdinfo to seq_file.

    - Alex's fs/coredump.c split-off - the same story, it was easier to
    take that commit than mess with conflicts. The rest is a separate
    pile; this was just a mechanical code movement.

    - a few misc patches all over the place. Not all for this cycle,
    there'll be more (and quite a few currently sit in akpm's tree)."

    Fix up trivial conflicts in the android binder driver, and some fairly
    simple conflicts due to two different changes to the sock_alloc_file()
    interface ("take descriptor handling from sock_alloc_file() to callers"
    vs "net: Providing protocol type via system.sockprotoname xattr of
    /proc/PID/fd entries" adding a dentry name to the socket)

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (72 commits)
    MAX_LFS_FILESIZE should be a loff_t
    compat: fs: Generic compat_sys_sendfile implementation
    fs: push rcu_barrier() from deactivate_locked_super() to filesystems
    btrfs: reada_extent doesn't need kref for refcount
    coredump: move core dump functionality into its own file
    coredump: prevent double-free on an error path in core dumper
    usb/gadget: fix misannotations
    fcntl: fix misannotations
    ceph: don't abuse d_delete() on failure exits
    hypfs: ->d_parent is never NULL or negative
    vfs: delete surplus inode NULL check
    switch simple cases of fget_light to fdget
    new helpers: fdget()/fdput()
    switch o2hb_region_dev_write() to fget_light()
    proc_map_files_readdir(): don't bother with grabbing files
    make get_file() return its argument
    vhost_set_vring(): turn pollstart/pollstop into bool
    switch prctl_set_mm_exe_file() to fget_light()
    switch xfs_find_handle() to fget_light()
    switch xfs_swapext() to fget_light()
    ...

    Linus Torvalds
     
  • There's no reason to call rcu_barrier() on every
    deactivate_locked_super(). We only need to make sure that all delayed rcu
    free inodes are flushed before we destroy related cache.

    Calling rcu_barrier() from deactivate_locked_super() slows down some fast
    paths. E.g. on my machine, exit_group() of the last process in an IPC
    namespace takes 0.07538s, and rcu_barrier() accounts for 0.05188s of that
    time.

    Signed-off-by: Kirill A. Shutemov
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Al Viro

    Kirill A. Shutemov
     

29 Sep, 2012

2 commits

  • JFFS2 was designed without thought for OOB bitflips, it seems, but they
    can occur and will be reported to JFFS2 via mtd_read_oob()[1]. We don't
    want to fail on these transactions, since the data was corrected.

    [1] Few drivers report bitflips for OOB-only transactions. With such
    drivers, this patch should have no effect.

    Signed-off-by: Brian Norris
    Cc: stable@vger.kernel.org
    Signed-off-by: Artem Bityutskiy
    Signed-off-by: David Woodhouse

    Brian Norris
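    The intended handling can be sketched as a small result filter,
    assuming, as MTD's mtd_is_bitflip() helper does, that corrected
    bitflips are reported as -EUCLEAN. oob_read_result() is a hypothetical
    name; it treats that code as success because the returned data was
    corrected, and passes every other error through unchanged.

```c
#include <errno.h>

#ifndef EUCLEAN
#define EUCLEAN 117  /* Linux value; defined only if the libc lacks it */
#endif

/* Map an OOB read return code: corrected bitflips become success. */
int oob_read_result(int ret)
{
    if (ret == -EUCLEAN)
        return 0;
    return ret;
}
```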
     
  • This patch fixes a regression introduced by
    "8bdc81c jffs2: get rid of jffs2_sync_super". We submit a delayed work in order
    to make sure the write-buffer is synchronized at some point. But we do not
    flush it when we unmount, which causes an oops when we unmount the file-system
    and the delayed work is executed afterwards.

    This patch fixes the issue by adding a "cancel_delayed_work_sync()" invocation
    in the '->sync_fs()' handler. This makes sure the delayed work is canceled
    on sync, unmount and re-mount. And because the VFS always calls 'sync_fs()' before
    unmounting or remounting, this fixes the issue.

    Reported-by: Ludovic Desroches
    Cc: stable@vger.kernel.org [3.5+]
    Signed-off-by: Artem Bityutskiy
    Tested-by: Ludovic Desroches
    Signed-off-by: David Woodhouse

    Artem Bityutskiy
     

21 Sep, 2012

1 commit


18 Sep, 2012

1 commit

  • - Pass posix_acl_from_xattr the user namespace in which the uid and gid
    values in the xattr are stored.

    - Pass posix_acl_to_xattr the user namespace into which kuid and kgid
    values should be converted when storing uid and gid values in an xattr.

    - Modify all callers of posix_acl_from_xattr and posix_acl_to_xattr to
    pass in &init_user_ns.

    In the short term this change is not strictly needed but it makes the
    code clearer. In the longer term this change is necessary to be able to
    mount filesystems outside of the initial user namespace that natively
    store posix acls in the linux xattr format.

    Cc: Theodore Tso
    Cc: Andrew Morton
    Cc: Andreas Dilger
    Cc: Jan Kara
    Cc: Al Viro
    Signed-off-by: "Eric W. Biederman"

    Eric W. Biederman
     
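The uid and gid values stored in an xattr only have meaning relative to a user namespace. The sketch below, assuming an ID map laid out like the lines of /proc/<pid>/uid_map (inside ID, outside ID, length), shows why passing &init_user_ns keeps current behavior: the initial namespace maps IDs to themselves, while a child namespace shifts them. map_down() and map_up() are hypothetical helpers loosely analogous to make_kuid() and from_kuid(), not the kernel implementation.

```c
#include <assert.h>
#include <stdint.h>

/* One extent of an ID map, in the same order as a uid_map line:
 * first ID inside the namespace, first ID outside it, extent length. */
struct id_extent {
	uint32_t inside;
	uint32_t outside;
	uint32_t count;
};

#define INVALID_ID ((uint32_t)-1)

/* Namespace-local id -> kernel id (cf. make_kuid()). The unsigned
 * subtraction wraps for id < inside, so one comparison checks both ends. */
static uint32_t map_down(const struct id_extent *map, int n, uint32_t id)
{
	for (int i = 0; i < n; i++)
		if (id - map[i].inside < map[i].count)
			return map[i].outside + (id - map[i].inside);
	return INVALID_ID;
}

/* Kernel id -> namespace-local id (cf. from_kuid()). */
static uint32_t map_up(const struct id_extent *map, int n, uint32_t kid)
{
	for (int i = 0; i < n; i++)
		if (kid - map[i].outside < map[i].count)
			return map[i].inside + (kid - map[i].outside);
	return INVALID_ID;
}
```

For example, with a child map of { 0, 100000, 65536 }, a uid of 1000 stored in the xattr corresponds to kernel uid 101000, whereas under the identity map of the initial namespace it stays 1000.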

23 Jul, 2012

1 commit


14 Jul, 2012

2 commits

  • A boolean "does it have to be exclusive?" flag is passed instead;
    local filesystems should just ignore it, since the object is guaranteed
    not to be there yet.

    Signed-off-by: Al Viro

    Al Viro
     
  • Just the flags; only NFS cares even about that, but there are
    legitimate uses for such argument. And getting rid of that
    completely would require splitting ->lookup() into a couple
    of methods (at least), so let's leave that alone for now...

    Signed-off-by: Al Viro

    Al Viro
     
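A simplified sketch of the ->create() change, assuming a toy in-memory directory: the method receives a plain 'bool excl' instead of a nameidata. The local variant ignores the flag, as the commit says local filesystems should; the remote variant is a hypothetical illustration of why a network filesystem may care (another client can create the name on the server between lookup and create), not NFS code. All names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>

#define SKETCH_MAX_NAMES 8

/* Toy directory: just a list of names. */
struct sketch_dir {
	const char *names[SKETCH_MAX_NAMES];
	int n;
};

/* Local filesystem: the VFS has already established that @name does
 * not exist, so @excl can be ignored. */
static int local_create(struct sketch_dir *dir, const char *name, bool excl)
{
	(void)excl;
	assert(dir->n < SKETCH_MAX_NAMES);
	dir->names[dir->n++] = name;
	return 0;
}

/* Remote filesystem: the name may have appeared on the server since the
 * local lookup, so @excl decides between failing and reusing it. */
static int remote_create(struct sketch_dir *dir, const char *name, bool excl)
{
	for (int i = 0; i < dir->n; i++)
		if (strcmp(dir->names[i], name) == 0)
			return excl ? -EEXIST : 0;
	assert(dir->n < SKETCH_MAX_NAMES);
	dir->names[dir->n++] = name;
	return 0;
}
```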

02 Jun, 2012

1 commit

  • Pull mtd update from David Woodhouse:
    - More robust parsing especially of xattr data in JFFS2
    - Updates to mxc_nand and gpmi drivers to support new boards and device tree
    - Improve consistency of information about ECC strength in NAND devices
    - Clean up partition handling of plat_nand
    - Support NAND drivers without dedicated access to OOB area
    - BCH hardware ECC support for OMAP
    - Other fixes and cleanups, and a few new device IDs

    Fixed trivial conflict in drivers/mtd/nand/gpmi-nand/gpmi-nand.c due to
    added include files next to each other.

    * tag 'for-linus-3.5-20120601' of git://git.infradead.org/linux-mtd: (75 commits)
    mtd: mxc_nand: move ecc strengh setup before nand_scan_tail
    mtd: block2mtd: fix recursive call of mtd_writev
    mtd: gpmi-nand: define ecc.strength
    mtd: of_parts: fix breakage in Kconfig
    mtd: nand: fix scan_read_raw_oob
    mtd: docg3 fix in-middle of blocks reads
    mtd: cfi_cmdset_0002: Slight cleanup of fixup messages
    mtd: add fixup for S29NS512P NOR flash.
    jffs2: allow to complete xattr integrity check on first GC scan
    jffs2: allow to discriminate between recoverable and non-recoverable errors
    mtd: nand: omap: add support for hardware BCH ecc
    ARM: OMAP3: gpmc: add BCH ecc api and modes
    mtd: nand: check the return code of 'read_oob/read_oob_raw'
    mtd: nand: remove 'sndcmd' parameter of 'read_oob/read_oob_raw'
    mtd: m25p80: Add support for Winbond W25Q80BW
    jffs2: get rid of jffs2_sync_super
    jffs2: remove unnecessary GC pass on sync
    jffs2: remove unnecessary GC pass on umount
    jffs2: remove lock_super
    mtd: gpmi: add gpmi support for mx6q
    ...

    Linus Torvalds
     

31 May, 2012

3 commits

  • Currently the JFFS2 file-system maps the VFS "superblock" abstraction to the
    write-buffer. Namely, it uses VFS services to synchronize the write-buffer
    periodically.

    The whole "superblock write-out" VFS infrastructure is served by the
    'sync_supers()' kernel thread, which wakes up every 5 seconds (by default) and
    writes out all dirty superblocks using the '->write_super()' call-back. The
    problem with this thread is that it wastes power by waking up the system every
    5 seconds no matter what. So we want to kill it completely, and to do that we
    need to make file-systems stop using the '->write_super' VFS service, and then
    remove it together with the kernel thread.

    This patch switches the JFFS2 write-buffer management from
    '->write_super()'/'->s_dirt' to a delayed work. Instead of setting the 's_dirt'
    flag we just schedule a delayed work for synchronizing the write-buffer.

    Signed-off-by: Artem Bityutskiy
    Signed-off-by: Al Viro

    Artem Bityutskiy
     
  • We do not need to call 'jffs2_write_super()' on sync. This function
    causes a GC pass to make sure the current contents are pushed out with
    the data which we already have on the media.

    But this is not needed on sync and only slows sync down unnecessarily.
    It is enough to just sync the write-buffer.

    This call was added by one of the generic VFS rework patch-sets,
    see d579ed00aa96a7f7486978540a0d7cecaff742ae.

    Signed-off-by: Artem Bityutskiy
    Signed-off-by: Al Viro

    Artem Bityutskiy
     
  • We do not need to call 'jffs2_write_super()' on unmount. This function
    causes a GC pass to make sure the current contents are pushed out with
    the data which we already have on the media.

    But this is not needed on unmount and only slows unmount down unnecessarily.
    It is enough to just sync the write-buffer.

    This call was added by one of the generic VFS rework patch-sets,
    see 8c85e125124a473d6f3e9bb187b0b84207f81d91.

    Signed-off-by: Artem Bityutskiy
    Signed-off-by: Al Viro

    Artem Bityutskiy
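
The switch from '->write_super()'/'s_dirt' to a delayed work can be sketched as a schedule-once pattern, assuming a single-threaded model with an abstract clock. wbuf_dirtied() plays the role of setting 's_dirt' and, like queue_delayed_work(), does nothing if the work is already queued; wbuf_tick() stands in for the timer firing. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of an on-demand write-buffer sync. */
struct wbuf_sync {
	bool pending;        /* a sync is already scheduled */
	unsigned long due;   /* time at which it should run */
	unsigned long delay; /* write-back delay, abstract time units */
	int syncs;           /* number of times the buffer was synced */
};

/* Called where 's_dirt' used to be set. Returns false if a sync was
 * already queued, so repeated writes schedule only one sync. */
static bool wbuf_dirtied(struct wbuf_sync *w, unsigned long now)
{
	if (w->pending)
		return false;
	w->pending = true;
	w->due = now + w->delay;
	return true;
}

/* Stands in for the timer firing: runs the sync once it is due. */
static void wbuf_tick(struct wbuf_sync *w, unsigned long now)
{
	if (w->pending && now >= w->due) {
		w->pending = false;
		w->syncs++;
	}
}
```

Once the buffer has been synced nothing stays queued, so an idle system gets no wake-ups, unlike the fixed 5-second 'sync_supers()' cycle.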