03 Jan, 2013

4 commits

  • Pull ecryptfs fixes from Tyler Hicks:
    "Two self-explanatory fixes and a third patch which improves
    performance: when overwriting a full page in the eCryptfs page cache,
    skip reading in and decrypting the corresponding lower page."

    * tag 'ecryptfs-3.8-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs:
    fs/ecryptfs/crypto.c: make ecryptfs_encode_for_filename() static
    eCryptfs: fix to use list_for_each_entry_safe() when delete items
    eCryptfs: Avoid unnecessary disk read and data decryption during writing

    Linus Torvalds
     
  • Pull ext4 bug fixes from Ted Ts'o:
    "Various bug fixes for ext4. Perhaps the most serious bug fixed is one
    which could cause file system corruptions when performing file punch
    operations."

    * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
    ext4: avoid hang when mounting non-journal filesystems with orphan list
    ext4: lock i_mutex when truncating orphan inodes
    ext4: do not try to write superblock on ro remount w/o journal
    ext4: include journal blocks in df overhead calcs
    ext4: remove unaligned AIO warning printk
    ext4: fix an incorrect comment about i_mutex
    ext4: fix deadlock in journal_unmap_buffer()
    ext4: split off ext4_journalled_invalidatepage()
    jbd2: fix assertion failure in jbd2_journal_flush()
    ext4: check dioread_nolock on remount
    ext4: fix extent tree corruption caused by hole punch

    Linus Torvalds
     
  • Remove the unused argument (formerly no_context) from mpol_parse_str()
    and from mpol_to_str().

    Signed-off-by: Hugh Dickins
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • EPOLL_CTL_MOD sets the interest mask before calling f_op->poll() to
    ensure events are not missed. Since the modifications to the interest
    mask are not protected by the same lock as ep_poll_callback, we need to
    ensure the change is visible to other CPUs calling ep_poll_callback.

    We also need to ensure f_op->poll() has an up-to-date view of past
    events which occurred before we modified the interest mask. So this
    barrier also pairs with the barrier in wq_has_sleeper().

    This should guarantee either ep_poll_callback or f_op->poll() (or both)
    will notice the readiness of a recently-ready/modified item.

    This issue was encountered by Andreas Voellmy and Junchang(Jason) Wang in:
    http://thread.gmane.org/gmane.linux.kernel/1408782/

    Signed-off-by: Eric Wong
    Cc: Hans Verkuil
    Cc: Jiri Olsa
    Cc: Jonathan Corbet
    Cc: Al Viro
    Cc: Davide Libenzi
    Cc: Hans de Goede
    Cc: Mauro Carvalho Chehab
    Cc: David Miller
    Cc: Eric Dumazet
    Cc: Andrew Morton
    Cc: Andreas Voellmy
    Tested-by: "Junchang(Jason) Wang"
    Cc: netdev@vger.kernel.org
    Cc: linux-fsdevel@vger.kernel.org
    Signed-off-by: Linus Torvalds

    Eric Wong
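
Seen from userspace, the guarantee described in this commit is that data which arrived before an EPOLL_CTL_MOD call is still reported after the interest mask changes. The following is a minimal single-threaded sketch of that behaviour using a pipe. It does not exercise the cross-CPU race itself, and the function name `events_after_mod` is made up for illustration.

```c
#include <assert.h>
#include <sys/epoll.h>
#include <unistd.h>

/*
 * Re-arm a pipe's read end with EPOLL_CTL_MOD after data is already pending.
 * Returns the epoll_wait() result, or -1 on any setup error.
 */
int events_after_mod(void)
{
    int pipefd[2];
    struct epoll_event ev = { 0 }, ready;
    int epfd, n = -1;

    if (pipe(pipefd) < 0)
        return -1;
    epfd = epoll_create1(0);
    if (epfd < 0)
        goto out_pipe;

    /* Register with an empty interest mask: readability is not reported. */
    ev.events = 0;
    ev.data.fd = pipefd[0];
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, pipefd[0], &ev) < 0)
        goto out_ep;

    /* The fd becomes readable while EPOLLIN is not in the mask. */
    if (write(pipefd[1], "x", 1) != 1)
        goto out_ep;

    /* Change the mask; the already-pending data must not be missed. */
    ev.events = EPOLLIN;
    if (epoll_ctl(epfd, EPOLL_CTL_MOD, pipefd[0], &ev) < 0)
        goto out_ep;

    n = epoll_wait(epfd, &ready, 1, 1000);
out_ep:
    close(epfd);
out_pipe:
    close(pipefd[0]);
    close(pipefd[1]);
    return n;
}
```

On a working kernel the call should report exactly one ready descriptor, because the poll performed during EPOLL_CTL_MOD sees the byte that was written earlier.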
     

27 Dec, 2012

2 commits

  • When trying to mount a file system which does not contain a journal,
    but which does have an orphan list containing an inode which needs to
    be truncated, the mount call will hang forever in
    ext4_orphan_cleanup() because ext4_orphan_del() will return
    immediately without removing the inode from the orphan list, leading
    to an uninterruptible loop in kernel code which will busy out one of
    the CPUs on the system.

    This can be trivially reproduced by trying to mount the file system
    found in tests/f_orphan_extents_inode/image.gz from the e2fsprogs
    source tree. If a malicious user were to put this on a USB stick, and
    mount it on a Linux desktop which has automatic mounts enabled, this
    could be considered a potential denial of service attack. (Not a big
    deal in practice, but professional paranoids worry about such things,
    and have even been known to allocate CVE numbers for such problems.)

    Signed-off-by: "Theodore Ts'o"
    Reviewed-by: Zheng Liu
    Cc: stable@vger.kernel.org

    Theodore Ts'o
     
  • Commit c278531d39 added a warning when ext4_flush_unwritten_io() is
    called without i_mutex being taken. It had previously not been taken
    during orphan cleanup since races weren't possible at that point in
    the mount process, but as a result of c278531d39, we will now see
    a kernel WARN_ON in this case. Take the i_mutex in
    ext4_orphan_cleanup() to suppress this warning.

    Reported-by: Alexander Beregalov
    Signed-off-by: "Theodore Ts'o"
    Reviewed-by: Zheng Liu
    Cc: stable@vger.kernel.org

    Theodore Ts'o
     

26 Dec, 2012

8 commits

  • With user namespaces enabled building f2fs fails with:

    CC fs/f2fs/acl.o
    fs/f2fs/acl.c: In function ‘f2fs_acl_from_disk’:
    fs/f2fs/acl.c:85:21: error: ‘struct posix_acl_entry’ has no member named ‘e_id’
    make[2]: *** [fs/f2fs/acl.o] Error 1
    make[2]: Target `__build' not remade because of errors.

    e_id is a backwards compatibility field only used for file systems
    that haven't been converted to use kuids and kgids. When the posix
    acl tag field is neither ACL_USER nor ACL_GROUP assigning e_id is
    unnecessary. Remove the assignment so f2fs will build with user
    namespaces enabled.

    Cc: Namjae Jeon
    Cc: Amit Sahrawat
    Acked-by: Jaegeuk Kim
    Signed-off-by: "Eric W. Biederman"

    Eric W. Biederman
     
  • While testing the pid namespace code I hit this nasty warning.

    [ 176.262617] ------------[ cut here ]------------
    [ 176.263388] WARNING: at /home/eric/projects/linux/linux-userns-devel/kernel/softirq.c:160 local_bh_enable_ip+0x7a/0xa0()
    [ 176.265145] Hardware name: Bochs
    [ 176.265677] Modules linked in:
    [ 176.266341] Pid: 742, comm: bash Not tainted 3.7.0userns+ #18
    [ 176.266564] Call Trace:
    [ 176.266564] warn_slowpath_common+0x7f/0xc0
    [ 176.266564] warn_slowpath_null+0x1a/0x20
    [ 176.266564] local_bh_enable_ip+0x7a/0xa0
    [ 176.266564] _raw_spin_unlock_bh+0x19/0x20
    [ 176.266564] proc_free_inum+0x3a/0x50
    [ 176.266564] free_pid_ns+0x1c/0x80
    [ 176.266564] put_pid_ns+0x35/0x50
    [ 176.266564] put_pid+0x4a/0x60
    [ 176.266564] tty_ioctl+0x717/0xc10
    [ 176.266564] ? wait_consider_task+0x855/0xb90
    [ 176.266564] ? default_spin_lock_flags+0x9/0x10
    [ 176.266564] ? remove_wait_queue+0x5a/0x70
    [ 176.266564] do_vfs_ioctl+0x98/0x550
    [ 176.266564] ? recalc_sigpending+0x1f/0x60
    [ 176.266564] ? __set_task_blocked+0x37/0x80
    [ 176.266564] ? sys_wait4+0xab/0xf0
    [ 176.266564] sys_ioctl+0x91/0xb0
    [ 176.266564] ? task_stopped_code+0x50/0x50
    [ 176.266564] system_call_fastpath+0x16/0x1b
    [ 176.266564] ---[ end trace 387af88219ad6143 ]---

    It turns out that spin_unlock_bh(proc_inum_lock) is not safe when
    put_pid is called with another spinlock held and irqs disabled.

    For now take the easy path and use spin_lock_irqsave(proc_inum_lock)
    in proc_free_inum() and spin_lock_irq(proc_inum_lock) in proc_alloc_inum().

    Signed-off-by: "Eric W. Biederman"

    Eric W. Biederman
     
  • When a journal-less ext4 filesystem is mounted on a read-only block
    device (blockdev --setro will do), each remount (for other, unrelated,
    flags, like suid=>nosuid etc) results in a series of scary messages
    from the kernel reporting I/O errors on the device.

    This is because of the following code in ext4_remount():

        if (sbi->s_journal == NULL)
                ext4_commit_super(sb, 1);

    at the end of the remount procedure, which forces writing (flushing) of
    the superblock regardless of whether it is dirty, whether the filesystem
    is read-only, and whether the device itself is read-only.

    We only need to call ext4_commit_super() when the file system had been
    previously mounted read/write.

    Thanks to Eric Sandeen for help in diagnosing this issue.

    Signed-off-by: Michael Tokarev
    Signed-off-by: "Theodore Ts'o"
    Cc: stable@vger.kernel.org

    Michael Tokarev
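
The condition described in this commit can be sketched outside the kernel. The struct and function names below (`toy_sb`, `toy_remount_old`, `toy_remount_new`) are hypothetical stand-ins, and the exact test used in the real patch is an assumption; the sketch only shows the idea of skipping the superblock flush when the filesystem was already read-only.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the superblock state that matters here. */
struct toy_sb {
    bool has_journal;   /* models sbi->s_journal != NULL */
    bool was_readonly;  /* mount state before the remount started */
    int  super_writes;  /* counts simulated ext4_commit_super() calls */
};

static void toy_commit_super(struct toy_sb *sb)
{
    sb->super_writes++;
}

/* Old behaviour: flush whenever there is no journal. */
void toy_remount_old(struct toy_sb *sb)
{
    if (!sb->has_journal)
        toy_commit_super(sb);
}

/* Behaviour described above: flush only if previously mounted read/write. */
void toy_remount_new(struct toy_sb *sb)
{
    if (!sb->has_journal && !sb->was_readonly)
        toy_commit_super(sb);
}
```

With a journal-less, read-only `toy_sb`, the old variant records one write per remount and the new variant records none, which is the difference between seeing I/O errors on a read-only device and not seeing them.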
     
  • To more accurately calculate overhead for "bsd" style
    df reporting, we should count the journal blocks as
    overhead as well.

    Signed-off-by: Eric Sandeen
    Signed-off-by: "Theodore Ts'o"
    Tested-by: Eric Whitney

    Eric Sandeen
     
  • Although I put this in, I now think it was a bad decision. For most
    users, there is very little to be done in this case. They get the
    message, once per day, with no real context or proposed action. TBH,
    it generates support calls when it probably does not need to; the
    message sounds more dire than the situation really is.

    Just nuke it. Normal investigation via blktrace or whatnot can
    reveal poor IO patterns if bad performance is encountered.

    Signed-off-by: Eric Sandeen
    Signed-off-by: "Theodore Ts'o"

    Eric Sandeen
     
  • i_mutex is not held when ->sync_file is called.

    Reviewed-by: Jan Kara
    Signed-off-by: Andy Lutomirski
    Signed-off-by: "Theodore Ts'o"

    Andy Lutomirski
     
  • We cannot wait for transaction commit in journal_unmap_buffer()
    because we hold the page lock, which ranks below transaction start. We
    solve the issue by bailing out of journal_unmap_buffer() and
    jbd2_journal_invalidatepage() with -EBUSY. The caller is then responsible
    for waiting for the transaction commit to finish and trying invalidation
    again. Since the issue can happen only for a page straddling i_size, it
    is simple enough to manually call jbd2_journal_invalidatepage() for
    such a page from ext4_setattr(), check the return value and wait if
    necessary.

    Signed-off-by: Jan Kara
    Signed-off-by: "Theodore Ts'o"

    Jan Kara
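
The "bail out with -EBUSY and let the caller wait" shape from this commit can be sketched as a plain loop. Everything here is a toy model: `toy_state`, `toy_invalidate`, `toy_wait_for_commit` and `toy_invalidate_with_wait` are invented names, and waiting is simulated by clearing a flag rather than by sleeping on a real transaction.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical model: one page and one possibly committing transaction. */
struct toy_state {
    int commit_in_progress;  /* non-zero while the commit still owns the buffer */
    int waits;               /* how often the caller had to wait */
    int invalidated;         /* set once invalidation succeeded */
};

/* Callee: never blocks, reports -EBUSY instead (it would hold the page lock). */
int toy_invalidate(struct toy_state *s)
{
    if (s->commit_in_progress)
        return -EBUSY;
    s->invalidated = 1;
    return 0;
}

/* Caller-side wait; a real caller would sleep until the commit finishes. */
void toy_wait_for_commit(struct toy_state *s)
{
    s->commit_in_progress = 0;
    s->waits++;
}

/* Caller: retry invalidation after each wait, outside the callee's locks. */
int toy_invalidate_with_wait(struct toy_state *s)
{
    int ret;

    while ((ret = toy_invalidate(s)) == -EBUSY)
        toy_wait_for_commit(s);
    return ret;
}
```

The point of the split is that the function holding the lower-ranked lock never sleeps on the commit; only the caller, which can drop that lock first, does the waiting.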
     
  • In data=journal mode we don't need delalloc or DIO handling in invalidatepage
    and similarly in other modes we don't need the journal handling. So split
    invalidatepage implementations.

    Signed-off-by: Jan Kara
    Signed-off-by: "Theodore Ts'o"

    Jan Kara
     

22 Dec, 2012

4 commits

  • Pull CIFS fixes from Steve French:
    "Misc small cifs fixes"

    * 'for-next' of git://git.samba.org/sfrench/cifs-2.6:
    cifs: eliminate cifsERROR variable
    cifs: don't compare uniqueids in cifs_prime_dcache unless server inode numbers are in use
    cifs: fix double-free of "string" in cifs_parse_mount_options

    Linus Torvalds
     
  • This reverts commit 79f77bf9a4e3dd5ead006b8f17e7c4ff07d8374e.

    This is obviously wrong, and I have no idea how I missed seeing the
    warning in testing: I must just not have looked at the right logs. The
    caller bumps rq_resused/rq_next_page, so it will always be hit on a
    large enough read.

    Reported-by: Dave Jones
    Signed-off-by: J. Bruce Fields
    Signed-off-by: Linus Torvalds

    J. Bruce Fields
     
  • The fscache code will currently bleat a "non-unique superblock keys"
    warning even if the user is mounting without the 'fsc' option.

    There should be no reason to even initialise the superblock cache cookie
    unless we're planning on using fscache for something, so ensure that we
    check for the NFS_OPTION_FSCACHE flag before calling into the fscache
    code.

    Reported-by: Paweł Sikora
    Signed-off-by: Trond Myklebust
    Cc: David Howells
    Acked-by: David Howells
    Signed-off-by: Linus Torvalds

    Trond Myklebust
     
  • Provide a stub nfs_fscache_wait_on_invalidate() function for when
    CONFIG_NFS_FSCACHE=n lest the following error appear:

    fs/nfs/inode.c: In function 'nfs_invalidate_mapping':
    fs/nfs/inode.c:887:2: error: implicit declaration of function 'nfs_fscache_wait_on_invalidate' [-Werror=implicit-function-declaration]
    cc1: some warnings being treated as errors

    Reported-by: kbuild test robot
    Reported-by: Vineet Gupta
    Reported-by: Borislav Petkov
    Signed-off-by: David Howells
    Signed-off-by: Linus Torvalds

    David Howells
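
The stub pattern this commit relies on can be sketched as follows. `TOY_CONFIG_FSCACHE`, `toy_inode` and the `toy_` functions are hypothetical names chosen for the example; they only mirror the shape of a config-dependent declaration and are not the NFS code.

```c
#include <assert.h>

/*
 * Hypothetical switch standing in for CONFIG_NFS_FSCACHE.
 * Leave it undefined to model the =n configuration.
 */
/* #define TOY_CONFIG_FSCACHE 1 */

struct toy_inode {
    int cache_waits;  /* bumped by the real implementation only */
};

#ifdef TOY_CONFIG_FSCACHE
/* Real implementation, provided by a separate translation unit. */
void toy_fscache_wait_on_invalidate(struct toy_inode *inode);
#else
/* Empty stub so that callers compile without any #ifdef of their own. */
static inline void toy_fscache_wait_on_invalidate(struct toy_inode *inode)
{
    (void)inode;
}
#endif

/* The caller is identical in both configurations. */
int toy_invalidate_mapping(struct toy_inode *inode)
{
    toy_fscache_wait_on_invalidate(inode);
    return 0;
}
```

Without the `#else` branch the call in `toy_invalidate_mapping()` would be an implicit declaration, which is the class of error quoted in the commit message.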
     

21 Dec, 2012

22 commits

  • The following race is possible between start_this_handle() and someone
    calling jbd2_journal_flush().

    Process A                                Process B
    start_this_handle().
      if (journal->j_barrier_count) # false
      if (!journal->j_running_transaction) { #true
        read_unlock(&journal->j_state_lock);
                                             jbd2_journal_lock_updates()
                                             jbd2_journal_flush()
                                               write_lock(&journal->j_state_lock);
                                               if (journal->j_running_transaction) {
                                                 # false
                                               ... wait for committing trans ...
                                               write_unlock(&journal->j_state_lock);
        ...
        write_lock(&journal->j_state_lock);
        if (!journal->j_running_transaction) { # true
          jbd2_get_transaction(journal, new_transaction);
        write_unlock(&journal->j_state_lock);
        goto repeat; # eventually blocks on j_barrier_count > 0
                                               ...
                                               J_ASSERT(!journal->j_running_transaction);
                                                 # fails

    We fix the race by rechecking j_barrier_count after reacquiring j_state_lock
    in exclusive mode.

    Reported-by: yjwsignal@empal.com
    Signed-off-by: Jan Kara
    Signed-off-by: "Theodore Ts'o"
    Cc: stable@vger.kernel.org

    Jan Kara
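
The fix described in this commit is an instance of "recheck the condition after reacquiring the lock". Below is a single-threaded toy model of that idea, not the jbd2 code: the lock only records its mode, "Process B" is simulated by a callback that runs in the unlocked window, and all `toy_` names are invented for the sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy lock that only records its mode, so misuse trips an assert. */
enum toy_lock_mode { TOY_UNLOCKED, TOY_READ, TOY_WRITE };

/* Hypothetical, much-reduced model of the journal fields named above. */
struct toy_journal {
    enum toy_lock_mode state_lock;  /* models j_state_lock */
    int barrier_count;              /* models j_barrier_count */
    bool running_transaction;       /* models j_running_transaction != NULL */
};

static void toy_lock(struct toy_journal *j, enum toy_lock_mode mode)
{
    assert(j->state_lock == TOY_UNLOCKED);
    j->state_lock = mode;
}

static void toy_unlock(struct toy_journal *j)
{
    assert(j->state_lock != TOY_UNLOCKED);
    j->state_lock = TOY_UNLOCKED;
}

/* Models jbd2_journal_lock_updates() raising the barrier. */
void toy_raise_barrier(struct toy_journal *j)
{
    toy_lock(j, TOY_WRITE);
    j->barrier_count++;
    toy_unlock(j);
}

/*
 * One pass of a start_this_handle()-like routine. in_window, if non-NULL,
 * runs while no lock is held and stands in for "Process B". Returns true
 * if a transaction is running on return, false if the caller has to wait
 * for the barrier to drop and repeat.
 */
bool toy_start_handle(struct toy_journal *j, bool recheck,
                      void (*in_window)(struct toy_journal *))
{
    bool ok = true;

    toy_lock(j, TOY_READ);
    if (j->barrier_count) {
        toy_unlock(j);
        return false;
    }
    if (j->running_transaction) {
        toy_unlock(j);
        return true;
    }
    toy_unlock(j);

    if (in_window)
        in_window(j);  /* another task may raise the barrier here */

    toy_lock(j, TOY_WRITE);
    if (recheck && j->barrier_count)
        ok = false;                     /* the fix: look again in exclusive mode */
    else if (!j->running_transaction)
        j->running_transaction = true;  /* models jbd2_get_transaction() */
    toy_unlock(j);
    return ok;
}
```

Calling `toy_start_handle()` with `recheck` set to false and `toy_raise_barrier` as the window callback leaves a running transaction behind a raised barrier, which is the state the J_ASSERT in the diagram trips over. With `recheck` set to true the function backs off instead.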
     
  • Pull filesystem notification updates from Eric Paris:
    "This pull mostly is about locking changes in the fsnotify system. By
    switching the group lock from a spin_lock() to a mutex() we can now
    hold the lock across things like iput(). This fixes a problem
    involving unmounting a fs and having inodes be busy, first pointed out
    by FAT, but reproducible with tmpfs.

    This also restores signal driven I/O for inotify, which has been
    broken since about 2.6.32."

    Ugh. I *hate* the timing of this. It was rebased after the merge
    window opened, and then left to sit with the pull request coming the day
    before the merge window closes. That's just crap. But apparently the
    patches themselves have been around for over a year, just gathering
    dust, so now it's suddenly critical.

    Fixed up semantic conflict in fs/notify/fdinfo.c as per Stephen
    Rothwell's fixes from -next.

    * 'for-next' of git://git.infradead.org/users/eparis/notify:
    inotify: automatically restart syscalls
    inotify: dont skip removal of watch descriptor if creation of ignored event failed
    fanotify: dont merge permission events
    fsnotify: make fasync generic for both inotify and fanotify
    fsnotify: change locking order
    fsnotify: dont put marks on temporary list when clearing marks by group
    fsnotify: introduce locked versions of fsnotify_add_mark() and fsnotify_remove_mark()
    fsnotify: pass group to fsnotify_destroy_mark()
    fsnotify: use a mutex instead of a spinlock to protect a groups mark list
    fanotify: add an extra flag to mark_remove_from_mask that indicates wheather a mark should be destroyed
    fsnotify: take groups mark_lock before mark lock
    fsnotify: use reference counting for groups
    fsnotify: introduce fsnotify_get_group()
    inotify, fanotify: replace fsnotify_put_group() with fsnotify_destroy_group()

    Linus Torvalds
     
  • Merge the rest of Andrew's patches for -rc1:
    "A bunch of fixes and misc missed-out-on things.

    That'll do for -rc1. I still have a batch of IPC patches which still
    have a possible bug report which I'm chasing down."

    * emailed patches from Andrew Morton: (25 commits)
    keys: use keyring_alloc() to create module signing keyring
    keys: fix unreachable code
    sendfile: allows bypassing of notifier events
    SGI-XP: handle non-fatal traps
    fat: fix incorrect function comment
    Documentation: ABI: remove testing/sysfs-devices-node
    proc: fix inconsistent lock state
    linux/kernel.h: fix DIV_ROUND_CLOSEST with unsigned divisors
    memcg: don't register hotcpu notifier from ->css_alloc()
    checkpatch: warn on uapi #includes that #include
    mm: cma: WARN if freed memory is still in use
    exec: do not leave bprm->interp on stack
    ...

    Linus Torvalds
     
  • Pull VFS update from Al Viro:
    "fscache fixes, ESTALE patchset, vmtruncate removal series, assorted
    misc stuff."

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (79 commits)
    vfs: make lremovexattr retry once on ESTALE error
    vfs: make removexattr retry once on ESTALE
    vfs: make llistxattr retry once on ESTALE error
    vfs: make listxattr retry once on ESTALE error
    vfs: make lgetxattr retry once on ESTALE
    vfs: make getxattr retry once on an ESTALE error
    vfs: allow lsetxattr() to retry once on ESTALE errors
    vfs: allow setxattr to retry once on ESTALE errors
    vfs: allow utimensat() calls to retry once on an ESTALE error
    vfs: fix user_statfs to retry once on ESTALE errors
    vfs: make fchownat retry once on ESTALE errors
    vfs: make fchmodat retry once on ESTALE errors
    vfs: have chroot retry once on ESTALE error
    vfs: have chdir retry lookup and call once on ESTALE error
    vfs: have faccessat retry once on an ESTALE error
    vfs: have do_sys_truncate retry once on an ESTALE error
    vfs: fix renameat to retry on ESTALE errors
    vfs: make do_unlinkat retry once on ESTALE errors
    vfs: make do_rmdir retry once on ESTALE errors
    vfs: add a flags argument to user_path_parent
    ...

    Linus Torvalds
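
The "retry once on ESTALE" pattern that runs through this series can be sketched in userspace as below. The flag `TOY_LOOKUP_REVAL`, the callback type and the two fake operations are assumptions made for the example; the sketch shows only the control flow of a single retry with a stricter lookup, not the VFS implementation.

```c
#include <assert.h>
#include <errno.h>

#define TOY_LOOKUP_REVAL 0x1u  /* hypothetical "do not trust cached lookups" flag */

/* A path-based operation: returns 0 or a negative errno, and counts calls. */
typedef int (*toy_path_op)(const char *path, unsigned int flags, int *calls);

/* Run op; if it reports -ESTALE, retry exactly once with revalidation forced. */
int toy_retry_once_on_estale(toy_path_op op, const char *path, int *calls)
{
    unsigned int flags = 0;
    int err;

retry:
    err = op(path, flags, calls);
    if (err == -ESTALE && !(flags & TOY_LOOKUP_REVAL)) {
        flags |= TOY_LOOKUP_REVAL;
        goto retry;
    }
    return err;
}

/* Fake op: fails with -ESTALE only while stale cached data is trusted. */
int toy_op_stale_cache(const char *path, unsigned int flags, int *calls)
{
    (void)path;
    ++*calls;
    return (flags & TOY_LOOKUP_REVAL) ? 0 : -ESTALE;
}

/* Fake op: the object is really gone, so revalidation does not help. */
int toy_op_always_stale(const char *path, unsigned int flags, int *calls)
{
    (void)path;
    (void)flags;
    ++*calls;
    return -ESTALE;
}
```

Bounding the retry to one attempt keeps a persistently stale handle from turning into an endless loop: the second -ESTALE is returned to the caller as-is.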
     
  • Pull signal handling cleanups from Al Viro:
    "sigaltstack infrastructure + conversion for x86, alpha and um,
    COMPAT_SYSCALL_DEFINE infrastructure.

    Note that there are several conflicts between "unify
    SS_ONSTACK/SS_DISABLE definitions" and UAPI patches in mainline;
    resolution is trivial - just remove definitions of SS_ONSTACK and
    SS_DISABLE from arch/*/uapi/asm/signal.h; they are all identical and
    include/uapi/linux/signal.h contains the unified variant."

    Fixed up conflicts as per Al.

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal:
    alpha: switch to generic sigaltstack
    new helpers: __save_altstack/__compat_save_altstack, switch x86 and um to those
    generic compat_sys_sigaltstack()
    introduce generic sys_sigaltstack(), switch x86 and um to it
    new helper: compat_user_stack_pointer()
    new helper: restore_altstack()
    unify SS_ONSTACK/SS_DISABLE definitions
    new helper: current_user_stack_pointer()
    missing user_stack_pointer() instances
    Bury the conditionals from kernel_thread/kernel_execve series
    COMPAT_SYSCALL_DEFINE: infrastructure

    Linus Torvalds
     
  • do_sendfile() in fs/read_write.c does not call the fsnotify functions,
    unlike its neighbors. This manifests as a lack of inotify ACCESS events
    when a file is sent using sendfile(2).

    Addresses
    https://bugzilla.kernel.org/show_bug.cgi?id=12812

    [akpm@linux-foundation.org: use fsnotify_modify(out.file), not fsnotify_access(), per Dave]
    Signed-off-by: Alan Cox
    Cc: Dave Chinner
    Cc: Jens Axboe
    Cc: Scott Wolchok
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Scott Wolchok
     
  • fat_search_long() returns 0 on success, -ENOENT/ENOMEM on failure.
    Change the function comment accordingly.

    While at it, fix some trivial typos.

    Signed-off-by: Ravishankar N
    Signed-off-by: Namjae Jeon
    Acked-by: OGAWA Hirofumi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ravishankar N
     
  • Lockdep found an inconsistent lock state when rcu is processing delayed
    work in softirq. Currently, the kernel uses spin_lock/spin_unlock to
    protect proc_inum_ida, but proc_free_inum is called by rcu in softirq
    context.

    Use spin_lock_bh/spin_unlock_bh to fix the following lockdep warning.

    =================================
    [ INFO: inconsistent lock state ]
    3.7.0 #36 Not tainted
    ---------------------------------
    inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
    swapper/1/0 [HC0[0]:SC1[1]:HE1:SE0] takes:
    (proc_inum_lock){+.?...}, at: proc_free_inum+0x1c/0x50
    {SOFTIRQ-ON-W} state was registered at:
    __lock_acquire+0x8ae/0xca0
    lock_acquire+0x199/0x200
    _raw_spin_lock+0x41/0x50
    proc_alloc_inum+0x4c/0xd0
    alloc_mnt_ns+0x49/0xc0
    create_mnt_ns+0x25/0x70
    mnt_init+0x161/0x1c7
    vfs_caches_init+0x107/0x11a
    start_kernel+0x348/0x38c
    x86_64_start_reservations+0x131/0x136
    x86_64_start_kernel+0x103/0x112
    irq event stamp: 2993422
    hardirqs last enabled at (2993422): _raw_spin_unlock_irqrestore+0x55/0x80
    hardirqs last disabled at (2993421): _raw_spin_lock_irqsave+0x29/0x70
    softirqs last enabled at (2993394): _local_bh_enable+0x13/0x20
    softirqs last disabled at (2993395): call_softirq+0x1c/0x30

    other info that might help us debug this:
    Possible unsafe locking scenario:

    CPU0
    ----
    lock(proc_inum_lock);

    lock(proc_inum_lock);

    *** DEADLOCK ***

    no locks held by swapper/1/0.

    stack backtrace:
    Pid: 0, comm: swapper/1 Not tainted 3.7.0 #36
    Call Trace:
    ? vprintk_emit+0x471/0x510
    print_usage_bug+0x2a5/0x2c0
    mark_lock+0x33b/0x5e0
    __lock_acquire+0x813/0xca0
    lock_acquire+0x199/0x200
    _raw_spin_lock+0x41/0x50
    proc_free_inum+0x1c/0x50
    free_pid_ns+0x1c/0x50
    put_pid_ns+0x2e/0x50
    put_pid+0x4a/0x60
    delayed_put_pid+0x12/0x20
    rcu_process_callbacks+0x462/0x790
    __do_softirq+0x1b4/0x3b0
    call_softirq+0x1c/0x30
    do_softirq+0x59/0xd0
    irq_exit+0x54/0xd0
    smp_apic_timer_interrupt+0x95/0xa3
    apic_timer_interrupt+0x72/0x80
    cpuidle_enter_tk+0x10/0x20
    cpuidle_enter_state+0x17/0x50
    cpuidle_idle_call+0x287/0x520
    cpu_idle+0xba/0x130
    start_secondary+0x2b3/0x2bc

    Signed-off-by: Xiaotian Feng
    Cc: Al Viro
    Cc: "Eric W. Biederman"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Xiaotian Feng
     
  • Add an error message for the case of failure of sync fs in
    delayed_sync_fs() method.

    Signed-off-by: Vyacheslav Dubeyko
    Cc: Christoph Hellwig
    Cc: Al Viro
    Cc: Hin-Tak Leung
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vyacheslav Dubeyko
     
  • Add to hfs_btree_write() a return of -EIO on failure of b-tree node
    searching. Also add logic for processing errors from hfs_btree_write()
    in hfsplus_system_write_inode() with a message about b-tree writing
    failure.

    [akpm@linux-foundation.org: reduce scope of `err', print errno on error]
    Signed-off-by: Vyacheslav Dubeyko
    Cc: Christoph Hellwig
    Cc: Al Viro
    Acked-by: Hin-Tak Leung
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vyacheslav Dubeyko
     
  • Currently, error codes from the hfsplus_block_free() call in the
    hfsplus_free_extents() method are not processed. Add some error code
    processing.

    Signed-off-by: Vyacheslav Dubeyko
    Cc: Christoph Hellwig
    Cc: Al Viro
    Cc: Hin-Tak Leung
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vyacheslav Dubeyko
     
  • If the read fails we kmap an error code. This doesn't end well. Instead
    print a critical error and pray. This mirrors the rest of the fs
    behaviour with critical error cases.

    Acked-by: Vyacheslav Dubeyko
    Signed-off-by: Alan Cox
    Signed-off-by: Vyacheslav Dubeyko
    Cc: Al Viro
    Cc: Christoph Hellwig
    Cc: Jan Kara
    Acked-by: Hin-Tak Leung
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alan Cox
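
"Mapping an error code" means treating a pointer that actually encodes a negative errno as if it were a valid page. The following userspace sketch re-creates the error-pointer convention with hypothetical `toy_` helpers and shows the check that has to come before any use of the result. Unlike the commit, which prints a critical error, the sketch simply returns the code to its caller.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define TOY_MAX_ERRNO 4095

/* Userspace imitation of the error-pointer convention (names are made up). */
static inline void *toy_err_ptr(long err)
{
    return (void *)(intptr_t)err;
}

static inline long toy_ptr_err(const void *p)
{
    return (long)(intptr_t)p;
}

static inline int toy_is_err(const void *p)
{
    /* Error codes occupy the topmost TOY_MAX_ERRNO addresses. */
    return (uintptr_t)p >= (uintptr_t)-TOY_MAX_ERRNO;
}

static unsigned char toy_page[16] = { 0x42 };

/* Fake page read: yields either valid data or an encoded -EIO. */
void *toy_read_page(int fail)
{
    return fail ? toy_err_ptr(-EIO) : (void *)toy_page;
}

/* Check for an error pointer before touching the page contents. */
int toy_first_byte(int fail, unsigned char *out)
{
    void *page = toy_read_page(fail);

    if (toy_is_err(page))
        return (int)toy_ptr_err(page);
    *out = *(unsigned char *)page;
    return 0;
}
```

Dropping the `toy_is_err()` check would dereference an address near the top of the address space, the userspace analogue of what the commit message says "doesn't end well".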
     
  • If a series of scripts are executed, each triggering module loading via
    unprintable bytes in the script header, kernel stack contents can leak
    into the command line.

    Normally execution of binfmt_script and binfmt_misc happens recursively.
    However, when modules are enabled, and unprintable bytes exist in the
    bprm->buf, execution will restart after attempting to load matching
    binfmt modules. Unfortunately, the logic in binfmt_script and
    binfmt_misc does not expect to get restarted. They leave bprm->interp
    pointing to their local stack. This means on restart bprm->interp is
    left pointing into unused stack memory which can then be copied into the
    userspace argv areas.

    After additional study, it seems that both recursion and restart remain
    the desirable way to handle exec with scripts, misc, and modules. As
    such, we need to protect the changes to interp.

    This changes the logic to require allocation for any changes to the
    bprm->interp. To avoid adding a new kmalloc to every exec, the default
    value is left as-is. Only when passing through binfmt_script or
    binfmt_misc does an allocation take place.

    For a proof of concept, see DoTest.sh from:

    http://www.halfdog.net/Security/2012/LinuxKernelBinfmtScriptStackDataDisclosure/

    Signed-off-by: Kees Cook
    Cc: halfdog
    Cc: P J P
    Cc: Alexander Viro
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kees Cook
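
The ownership rule described in this commit (the default value is left as-is, any change goes through an allocation) can be sketched with ordinary heap calls. `toy_bprm` and the `toy_` functions are hypothetical and only model the `interp` pointer; they are not the exec code.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, reduced model of the two pointers involved. */
struct toy_bprm {
    const char *filename;  /* owned by the caller, default value of interp */
    const char *interp;    /* either == filename or a heap copy we own */
};

void toy_bprm_init(struct toy_bprm *b, const char *filename)
{
    b->filename = filename;
    b->interp = filename;  /* default: no allocation on the common path */
}

/* Every change to interp stores a private heap copy of the string. */
int toy_change_interp(struct toy_bprm *b, const char *interp)
{
    size_t len = strlen(interp) + 1;
    char *copy = malloc(len);

    if (!copy)
        return -ENOMEM;
    memcpy(copy, interp, len);
    if (b->interp != b->filename)
        free((char *)b->interp);  /* release a previous copy, never the default */
    b->interp = copy;
    return 0;
}

void toy_bprm_free(struct toy_bprm *b)
{
    if (b->interp != b->filename)
        free((char *)b->interp);
    b->interp = b->filename;
}

/* A script-style handler whose interpreter name lives in a local buffer. */
int toy_load_script(struct toy_bprm *b)
{
    char local[32];

    strcpy(local, "/bin/sh");
    /* Copy it; assigning b->interp = local would dangle after return. */
    return toy_change_interp(b, local);
}
```

After `toy_load_script()` returns, `interp` still points at valid memory even though `local` is gone, so a restarted handler cannot pick up leftover stack contents through it.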
     
  • Signed-off-by: Jeff Layton
    Signed-off-by: Al Viro

    Jeff Layton
     
  • Signed-off-by: Jeff Layton
    Signed-off-by: Al Viro

    Jeff Layton
     
  • Signed-off-by: Jeff Layton
    Signed-off-by: Al Viro

    Jeff Layton
     
  • Signed-off-by: Jeff Layton
    Signed-off-by: Al Viro

    Jeff Layton
     
  • Signed-off-by: Jeff Layton
    Signed-off-by: Al Viro

    Jeff Layton
     
  • Signed-off-by: Jeff Layton
    Signed-off-by: Al Viro

    Jeff Layton
     
  • Signed-off-by: Jeff Layton
    Signed-off-by: Al Viro

    Jeff Layton
     
  • Signed-off-by: Jeff Layton
    Signed-off-by: Al Viro

    Jeff Layton
     
  • Clearly, we can't handle the NULL filename case, but we can deal with
    the case where there's a real pathname.

    Signed-off-by: Jeff Layton
    Signed-off-by: Al Viro

    Jeff Layton