09 Jan, 2015

2 commits

  • commit 4e2024624e678f0ebb916e6192bd23c1f9fdf696 upstream.

    We didn't check length of rock ridge ER records before printing them.
    Thus corrupted isofs image can cause us to access and print some memory
    behind the buffer with obvious consequences.

    Reported-and-tested-by: Carl Henrik Lunde
    Signed-off-by: Jan Kara
    Signed-off-by: Greg Kroah-Hartman

    Jan Kara
     
  • commit f54e18f1b831c92f6512d2eedb224cd63d607d3d upstream.

    Rock Ridge extensions define so called Continuation Entries (CE) which
    define where is further space with Rock Ridge data. Corrupted isofs
    image can contain arbitrarily long chain of these, including a one
    containing loop and thus causing kernel to end in an infinite loop when
    traversing these entries.

    Limit the traversal to 32 entries which should be more than enough space
    to store all the Rock Ridge data.

    Reported-by: P J P
    Signed-off-by: Jan Kara
    Signed-off-by: Greg Kroah-Hartman

    Jan Kara
     

20 Nov, 2014

1 commit

  • With the isofs_hash() function removed, isofs_hash_ms() is the only user
    of isofs_hash_common(), but it's defined inside of an #ifdef, which triggers
    this gcc warning in ARM axm55xx_defconfig starting with v3.18-rc3:

    fs/isofs/inode.c:177:1: warning: 'isofs_hash_common' defined but not used [-Wunused-function]

    This patch moves the function inside of the same #ifdef section to avoid that
    warning, which seems the best compromise of a relatively harmless patch for
    a late -rc.

    Signed-off-by: Arnd Bergmann
    Fixes: b0afd8e5db7b ("isofs: don't bother with ->d_op for normal case")
    Signed-off-by: Al Viro

    Arnd Bergmann
     

31 Oct, 2014

1 commit


29 Oct, 2014

1 commit


14 Oct, 2014

1 commit

  • The kernel used to contain two functions for length-delimited,
    case-insensitive string comparison, strnicmp with correct semantics and
    a slightly buggy strncasecmp. The latter is the POSIX name, so strnicmp
    was renamed to strncasecmp, and strnicmp made into a wrapper for the new
    strncasecmp to avoid breaking existing users.

    To allow the compat wrapper strnicmp to be removed at some point in the
    future, and to avoid the extra indirection cost, do
    s/strnicmp/strncasecmp/g.

    Signed-off-by: Rasmus Villemoes
    Reviewed-by: Jan Kara
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rasmus Villemoes
     

20 Aug, 2014

1 commit

  • We did not check relocated directory in any way when processing Rock
    Ridge 'CL' tag. Thus a corrupted isofs image can possibly have a CL
    entry pointing to another CL entry leading to possibly unbounded
    recursion in kernel code and thus stack overflow or deadlocks (if there
    is a loop created from CL entries).

    Fix the problem by not allowing CL entry to point to a directory entry
    with CL entry (such use makes no good sense anyway) and by checking
    whether CL entry doesn't point to itself.

    CC: stable@vger.kernel.org
    Reported-by: Chris Evans
    Signed-off-by: Jan Kara

    Jan Kara
     

09 Aug, 2014

1 commit

  • Now with 64bit bzImage and kexec tools, we support ramdisk that size is
    bigger than 2g, as we could put it above 4G.

    Found compressed initramfs image could not be decompressed properly. It
    turns out that image length is int during decompress detection, and it
    will become < 0 when length is more than 2G. Furthermore, during
    decompressing len as int is used for inbuf count, that has problem too.

    Change len to long, that should be ok as on 32 bit platform long is
    32bits.

    Tested with following compressed initramfs image as root with kexec.
    gzip, bzip2, xz, lzma, lzop, lz4.
    run time for populate_rootfs():
    size name Nehalem-EX Westmere-EX Ivybridge-EX
    9034400256 root_img : 26s 24s 30s
    3561095057 root_img.lz4 : 28s 27s 27s
    3459554629 root_img.lzo : 29s 29s 28s
    3219399480 root_img.gz : 64s 62s 49s
    2251594592 root_img.xz : 262s 260s 183s
    2226366598 root_img.lzma: 386s 376s 277s
    2901482513 root_img.bz2 : 635s 599s

    Signed-off-by: Yinghai Lu
    Cc: "H. Peter Anvin"
    Cc: Ingo Molnar
    Cc: Rashika Kheria
    Cc: Josh Triplett
    Cc: Kyungsik Lee
    Cc: P J P
    Cc: Al Viro
    Cc: Tetsuo Handa
    Cc: "Daniel M. Weeks"
    Cc: Alexandre Courbot
    Cc: Jan Beulich
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yinghai Lu
     

08 Apr, 2014

1 commit

  • Pull ext3 improvements, cleanups, reiserfs fix from Jan Kara:
    "various cleanups for ext2, ext3, udf, isofs, a documentation update
    for quota, and a fix of a race in reiserfs readdir implementation"

    * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
    reiserfs: fix race in readdir
    ext2: acl: remove unneeded include of linux/capability.h
    ext3: explicitly remove inode from orphan list after failed direct io
    fs/isofs/inode.c add __init to init_inodecache()
    ext3: Speedup WB_SYNC_ALL pass
    fs/quota/Kconfig: Update filesystems
    ext3: Update outdated comment before ext3_ordered_writepage()
    ext3: Update PF_MEMALLOC handling in ext3_write_inode()
    ext2/3: use prandom_u32() instead of get_random_bytes()
    ext3: remove an unneeded check in ext3_new_blocks()
    ext3: remove unneeded check in ext3_ordered_writepage()
    fs: Mark function as static in ext3/xattr_security.c
    fs: Mark function as static in ext3/dir.c
    fs: Mark function as static in ext2/xattr_security.c
    ext3: Add __init macro to init_inodecache
    ext2: Add __init macro to init_inodecache
    udf: Add __init macro to init_inodecache
    fs: udf: parse_options: blocksize check

    Linus Torvalds
     

13 Mar, 2014

2 commits

  • Previously, the no-op "mount -o mount /dev/xxx" operation when the
    file system is already mounted read-write causes an implied,
    unconditional syncfs(). This seems pretty stupid, and it's certainly
    documented or guaraunteed to do this, nor is it particularly useful,
    except in the case where the file system was mounted rw and is getting
    remounted read-only.

    However, it's possible that there might be some file systems that are
    actually depending on this behavior. In most file systems, it's
    probably fine to only call sync_filesystem() when transitioning from
    read-write to read-only, and there are some file systems where this is
    not needed at all (for example, for a pseudo-filesystem or something
    like romfs).

    Signed-off-by: "Theodore Ts'o"
    Cc: linux-fsdevel@vger.kernel.org
    Cc: Christoph Hellwig
    Cc: Artem Bityutskiy
    Cc: Adrian Hunter
    Cc: Evgeniy Dushistov
    Cc: Jan Kara
    Cc: OGAWA Hirofumi
    Cc: Anders Larsen
    Cc: Phillip Lougher
    Cc: Kees Cook
    Cc: Mikulas Patocka
    Cc: Petr Vandrovec
    Cc: xfs@oss.sgi.com
    Cc: linux-btrfs@vger.kernel.org
    Cc: linux-cifs@vger.kernel.org
    Cc: samba-technical@lists.samba.org
    Cc: codalist@coda.cs.cmu.edu
    Cc: linux-ext4@vger.kernel.org
    Cc: linux-f2fs-devel@lists.sourceforge.net
    Cc: fuse-devel@lists.sourceforge.net
    Cc: cluster-devel@redhat.com
    Cc: linux-mtd@lists.infradead.org
    Cc: jfs-discussion@lists.sourceforge.net
    Cc: linux-nfs@vger.kernel.org
    Cc: linux-nilfs@vger.kernel.org
    Cc: linux-ntfs-dev@lists.sourceforge.net
    Cc: ocfs2-devel@oss.oracle.com
    Cc: reiserfs-devel@vger.kernel.org

    Theodore Ts'o
     
  • init_inodecache is only called by __init init_iso9660_fs

    Signed-off-by: Fabian Frederick
    Signed-off-by: Jan Kara

    Fabian Frederick
     

25 Oct, 2013

1 commit


01 Aug, 2013

1 commit

  • Refuse RW mount of isofs filesystem. So far we just silently changed it
    to RO mount but when the media is writeable, block layer won't notice
    this change and thus will think device is used RW and will block eject
    button of the drive. That is unexpected by users because for
    non-writeable media eject button works just fine.

    Userspace mount(8) command handles this just fine and retries mounting
    with MS_RDONLY set so userspace shouldn't see any regression. Plus any
    tool mounting isofs is likely confronted with the case of read-only
    media where block layer already refuses to mount the filesystem without
    MS_RDONLY set so our behavior shouldn't be anything new for it.

    Reported-by: Hui Wang
    Signed-off-by: Jan Kara

    Jan Kara
     

29 Jun, 2013

2 commits

  • Instances either don't look at it at all (the majority of cases) or
    only want it to find the superblock (which can be had as dentry->d_sb).
    A few cases that want more are actually safe with dentry->d_inode -
    the only precaution needed is the check that it hadn't been replaced with
    NULL by rmdir() or by overwriting rename(), which case should be simply
    treated as cache miss.

    Signed-off-by: Linus Torvalds
    Signed-off-by: Al Viro

    Linus Torvalds
     
  • Signed-off-by: Al Viro

    Al Viro
     

13 Mar, 2013

1 commit

  • I had assumed that the only use of module aliases for filesystems
    prior to "fs: Limit sys_mount to only request filesystem modules."
    was in request_module. It turns out I was wrong. At least mkinitcpio
    in Arch linux uses these aliases.

    So readd the preexising aliases, to keep from breaking userspace.

    Userspace eventually will have to follow and use the same aliases the
    kernel does. So at some point we may be delete these aliases without
    problems. However that day is not today.

    Signed-off-by: "Eric W. Biederman"

    Eric W. Biederman
     

04 Mar, 2013

1 commit

  • Modify the request_module to prefix the file system type with "fs-"
    and add aliases to all of the filesystems that can be built as modules
    to match.

    A common practice is to build all of the kernel code and leave code
    that is not commonly needed as modules, with the result that many
    users are exposed to any bug anywhere in the kernel.

    Looking for filesystems with a fs- prefix limits the pool of possible
    modules that can be loaded by mount to just filesystems trivially
    making things safer with no real cost.

    Using aliases means user space can control the policy of which
    filesystem modules are auto-loaded by editing /etc/modprobe.d/*.conf
    with blacklist and alias directives. Allowing simple, safe,
    well understood work-arounds to known problematic software.

    This also addresses a rare but unfortunate problem where the filesystem
    name is not the same as it's module name and module auto-loading
    would not work. While writing this patch I saw a handful of such
    cases. The most significant being autofs that lives in the module
    autofs4.

    This is relevant to user namespaces because we can reach the request
    module in get_fs_type() without having any special permissions, and
    people get uncomfortable when a user specified string (in this case
    the filesystem type) goes all of the way to request_module.

    After having looked at this issue I don't think there is any
    particular reason to perform any filtering or permission checks beyond
    making it clear in the module request that we want a filesystem
    module. The common pattern in the kernel is to call request_module()
    without regards to the users permissions. In general all a filesystem
    module does once loaded is call register_filesystem() and go to sleep.
    Which means there is not much attack surface exposed by loading a
    filesytem module unless the filesystem is mounted. In a user
    namespace filesystems are not mounted unless .fs_flags = FS_USERNS_MOUNT,
    which most filesystems do not set today.

    Acked-by: Serge Hallyn
    Acked-by: Kees Cook
    Reported-by: Kees Cook
    Signed-off-by: "Eric W. Biederman"

    Eric W. Biederman
     

26 Feb, 2013

1 commit


23 Feb, 2013

1 commit


10 Oct, 2012

1 commit

  • Fuzzing with trinity oopsed on the 1st instruction of shmem_fh_to_dentry(),
    u64 inum = fid->raw[2];
    which is unhelpfully reported as at the end of shmem_alloc_inode():

    BUG: unable to handle kernel paging request at ffff880061cd3000
    IP: [] shmem_alloc_inode+0x40/0x40
    Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
    Call Trace:
    [] ? exportfs_decode_fh+0x79/0x2d0
    [] do_handle_open+0x163/0x2c0
    [] sys_open_by_handle_at+0xc/0x10
    [] tracesys+0xe1/0xe6

    Right, tmpfs is being stupid to access fid->raw[2] before validating that
    fh_len includes it: the buffer kmalloc'ed by do_sys_name_to_handle() may
    fall at the end of a page, and the next page not be present.

    But some other filesystems (ceph, gfs2, isofs, reiserfs, xfs) are being
    careless about fh_len too, in fh_to_dentry() and/or fh_to_parent(), and
    could oops in the same way: add the missing fh_len checks to those.

    Reported-by: Sasha Levin
    Signed-off-by: Hugh Dickins
    Cc: Al Viro
    Cc: Sage Weil
    Cc: Steven Whitehouse
    Cc: Christoph Hellwig
    Cc: stable@vger.kernel.org
    Signed-off-by: Al Viro

    Hugh Dickins
     

03 Oct, 2012

2 commits

  • Pull vfs update from Al Viro:

    - big one - consolidation of descriptor-related logics; almost all of
    that is moved to fs/file.c

    (BTW, I'm seriously tempted to rename the result to fd.c. As it is,
    we have a situation when file_table.c is about handling of struct
    file and file.c is about handling of descriptor tables; the reasons
    are historical - file_table.c used to be about a static array of
    struct file we used to have way back).

    A lot of stray ends got cleaned up and converted to saner primitives,
    disgusting mess in android/binder.c is still disgusting, but at least
    doesn't poke so much in descriptor table guts anymore. A bunch of
    relatively minor races got fixed in process, plus an ext4 struct file
    leak.

    - related thing - fget_light() partially unuglified; see fdget() in
    there (and yes, it generates the code as good as we used to have).

    - also related - bits of Cyrill's procfs stuff that got entangled into
    that work; _not_ all of it, just the initial move to fs/proc/fd.c and
    switch of fdinfo to seq_file.

    - Alex's fs/coredump.c spiltoff - the same story, had been easier to
    take that commit than mess with conflicts. The rest is a separate
    pile, this was just a mechanical code movement.

    - a few misc patches all over the place. Not all for this cycle,
    there'll be more (and quite a few currently sit in akpm's tree)."

    Fix up trivial conflicts in the android binder driver, and some fairly
    simple conflicts due to two different changes to the sock_alloc_file()
    interface ("take descriptor handling from sock_alloc_file() to callers"
    vs "net: Providing protocol type via system.sockprotoname xattr of
    /proc/PID/fd entries" adding a dentry name to the socket)

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (72 commits)
    MAX_LFS_FILESIZE should be a loff_t
    compat: fs: Generic compat_sys_sendfile implementation
    fs: push rcu_barrier() from deactivate_locked_super() to filesystems
    btrfs: reada_extent doesn't need kref for refcount
    coredump: move core dump functionality into its own file
    coredump: prevent double-free on an error path in core dumper
    usb/gadget: fix misannotations
    fcntl: fix misannotations
    ceph: don't abuse d_delete() on failure exits
    hypfs: ->d_parent is never NULL or negative
    vfs: delete surplus inode NULL check
    switch simple cases of fget_light to fdget
    new helpers: fdget()/fdput()
    switch o2hb_region_dev_write() to fget_light()
    proc_map_files_readdir(): don't bother with grabbing files
    make get_file() return its argument
    vhost_set_vring(): turn pollstart/pollstop into bool
    switch prctl_set_mm_exe_file() to fget_light()
    switch xfs_find_handle() to fget_light()
    switch xfs_swapext() to fget_light()
    ...

    Linus Torvalds
     
  • There's no reason to call rcu_barrier() on every
    deactivate_locked_super(). We only need to make sure that all delayed rcu
    free inodes are flushed before we destroy related cache.

    Removing rcu_barrier() from deactivate_locked_super() affects some fast
    paths. E.g. on my machine exit_group() of a last process in IPC
    namespace takes 0.07538s. rcu_barrier() takes 0.05188s of that time.

    Signed-off-by: Kirill A. Shutemov
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Al Viro

    Kirill A. Shutemov
     

21 Sep, 2012

1 commit


25 Jul, 2012

1 commit

  • Pull misc udf, ext2, ext3, and isofs fixes from Jan Kara:
    "Assorted, mostly trivial, fixes for udf, ext2, ext3, and isofs. I'm
    on vacation and scarcely checking email since we are expecting baby
    any day now but these fixes should be safe to go in and I don't want
    to delay them unnecessarily."

    * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
    udf: avoid info leak on export
    isofs: avoid info leak on export
    udf: Improve table length check to avoid possible overflow
    ext3: Check return value of blkdev_issue_flush()
    jbd: Check return value of blkdev_issue_flush()
    udf: Do not decrement i_blocks when freeing indirect extent block
    udf: Fix memory leak when mounting
    ext2: cleanup the confused goto label
    UDF: Remove unnecessary variable "offset" from udf_fill_inode
    udf: stop using s_dirt
    ext3: force ro mount if ext3_setup_super() fails
    quota: fix checkpatch.pl warning by replacing with

    Linus Torvalds
     

14 Jul, 2012

1 commit

  • Just the flags; only NFS cares even about that, but there are
    legitimate uses for such argument. And getting rid of that
    completely would require splitting ->lookup() into a couple
    of methods (at least), so let's leave that alone for now...

    Signed-off-by: Al Viro

    Al Viro
     

13 Jul, 2012

1 commit


30 May, 2012

1 commit

  • pass inode + parent's inode or NULL instead of dentry + bool saying
    whether we want the parent or not.

    NOTE: that needs ceph fix folded in.

    Signed-off-by: Al Viro

    Al Viro
     

21 Mar, 2012

1 commit


09 Jan, 2012

1 commit


04 Jan, 2012

2 commits

  • situation with mount options is the same as for udf

    Signed-off-by: Al Viro

    Al Viro
     
  • Seeing that just about every destructor got that INIT_LIST_HEAD() copied into
    it, there is no point whatsoever keeping this INIT_LIST_HEAD in inode_init_once();
    the cost of taking it into inode_init_always() will be negligible for pipes
    and sockets and negative for everything else. Not to mention the removal of
    boilerplate code from ->destroy_inode() instances...

    Signed-off-by: Al Viro

    Al Viro
     

03 Nov, 2011

2 commits

  • Says Andrew:

    "60 patches. That's good enough for -rc1 I guess. I have quite a lot
    of detritus to be rechecked, work through maintainers, etc.

    - most of the remains of MM
    - rtc
    - various misc
    - cgroups
    - memcg
    - cpusets
    - procfs
    - ipc
    - rapidio
    - sysctl
    - pps
    - w1
    - drivers/misc
    - aio"

    * akpm: (60 commits)
    memcg: replace ss->id_lock with a rwlock
    aio: allocate kiocbs in batches
    drivers/misc/vmw_balloon.c: fix typo in code comment
    drivers/misc/vmw_balloon.c: determine page allocation flag can_sleep outside loop
    w1: disable irqs in critical section
    drivers/w1/w1_int.c: multiple masters used same init_name
    drivers/power/ds2780_battery.c: fix deadlock upon insertion and removal
    drivers/power/ds2780_battery.c: add a nolock function to w1 interface
    drivers/power/ds2780_battery.c: create central point for calling w1 interface
    w1: ds2760 and ds2780, use ida for id and ida_simple_get() to get it
    pps gpio client: add missing dependency
    pps: new client driver using GPIO
    pps: default echo function
    include/linux/dma-mapping.h: add dma_zalloc_coherent()
    sysctl: make CONFIG_SYSCTL_SYSCALL default to n
    sysctl: add support for poll()
    RapidIO: documentation update
    drivers/net/rionet.c: fix ethernet address macros for LE platforms
    RapidIO: fix potential null deref in rio_setup_device()
    RapidIO: add mport driver for Tsi721 bridge
    ...

    Linus Torvalds
     
  • Use mpage_readpages() instead of multiple calls to isofs_readpage() to
    reduce the CPU utilization and make performance higher.

    Signed-off-by: Namjae Jeon
    Cc: Al Viro
    Cc: Jan Kara
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Namjae Jeon
     

02 Nov, 2011

1 commit


23 Jul, 2011

1 commit

  • sbi->s_mutex isn't needed for isofs at all so we can just remove it. Generally,
    since isofs is always mounted read-only, filesystem structure cannot change
    under us. So buffer_head contents stays constant after it's filled in. That
    leaves us with possible changes of global data structures. Superblock changes
    only during filesystem mount (even remount does not change it), inodes are only
    filled in during reading from disk. So there are no changes of these structures
    to bother about.

    Arguments why sbi->s_mutex can be removed at each place:
    isofs_readdir: Accesses sb, inode, filp, local variables => s_mutex not needed
    isofs_lookup: Protected by directory's i_mutex. Accesses sb, inode, dentry,
    local variables => s_mutex not needed
    rock_ridge_symlink_readpage: Protected by page lock. Accesses sb, inode,
    local variables => s_mutex not needed.

    Signed-off-by: Jan Kara
    Signed-off-by: Al Viro

    Jan Kara
     

20 Jul, 2011

1 commit


18 Jun, 2011

1 commit


25 Mar, 2011

1 commit

  • * 'for-2.6.39/core' of git://git.kernel.dk/linux-2.6-block: (65 commits)
    Documentation/iostats.txt: bit-size reference etc.
    cfq-iosched: removing unnecessary think time checking
    cfq-iosched: Don't clear queue stats when preempt.
    blk-throttle: Reset group slice when limits are changed
    blk-cgroup: Only give unaccounted_time under debug
    cfq-iosched: Don't set active queue in preempt
    block: fix non-atomic access to genhd inflight structures
    block: attempt to merge with existing requests on plug flush
    block: NULL dereference on error path in __blkdev_get()
    cfq-iosched: Don't update group weights when on service tree
    fs: assign sb->s_bdi to default_backing_dev_info if the bdi is going away
    block: Require subsystems to explicitly allocate bio_set integrity mempool
    jbd2: finish conversion from WRITE_SYNC_PLUG to WRITE_SYNC and explicit plugging
    jbd: finish conversion from WRITE_SYNC_PLUG to WRITE_SYNC and explicit plugging
    fs: make fsync_buffers_list() plug
    mm: make generic_writepages() use plugging
    blk-cgroup: Add unaccounted time to timeslice_used.
    block: fixup plugging stubs for !CONFIG_BLOCK
    block: remove obsolete comments for blkdev_issue_zeroout.
    blktrace: Use rq->cmd_flags directly in blk_add_trace_rq.
    ...

    Fix up conflicts in fs/{aio.c,super.c}

    Linus Torvalds
     

14 Mar, 2011

1 commit

  • The exportfs encode handle function should return the minimum required
    handle size. This helps user to find out the handle size by passing 0
    handle size in the first step and then redoing to the call again with
    the returned handle size value.

    Acked-by: Serge Hallyn
    Signed-off-by: Aneesh Kumar K.V
    Signed-off-by: Al Viro

    Aneesh Kumar K.V
     

10 Mar, 2011

1 commit

  • Code has been converted over to the new explicit on-stack plugging,
    and delay users have been converted to use the new API for that.
    So lets kill off the old plugging along with aops->sync_page().

    Signed-off-by: Jens Axboe

    Jens Axboe