Eric Lee / smarc-ti-linux-kernel | Embedian Git Server

07 Feb, 2014

7 commits

7f2803340 Merge tag 'v3.12.10' of http://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable into ti-linux-3.12.y

This is the 3.12.10 stable release

* tag 'v3.12.10' of http://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable: (133 commits)
Linux 3.12.10
x86, cpu, amd: Add workaround for family 16h, erratum 793
powerpc: Make sure "cache" directory is removed when offlining cpu
powerpc: Fix the setup of CPU-to-Node mappings during CPU online
btrfs: restrict snapshotting to own subvolumes
Btrfs: handle EAGAIN case properly in btrfs_drop_snapshot()
target/iscsi: Fix network portal creation race
iscsi-target: Pre-allocate more tags to avoid ack starvation
virtio-scsi: Fix hotcpu_notifier use-after-free with virtscsi_freeze
SCSI: bfa: Chinook quad port 16G FC HBA claim issue
usb: core: get config and string descriptors for unauthorized devices
hpfs: remember free space
ALSA: hda/hdmi - allow PIN_OUT to be dynamically enabled
ALSA: hda - hdmi: introduce patch_nvhdmi()
ALSA: hda - Don't set indep_hp flag for old AD codecs
KVM: PPC: e500: Fix bad address type in deliver_tlb_misss()
KVM: PPC: Book3S HV: use xics_wake_cpu only when defined
parisc: fix cache-flushing
alpha: fix broken network checksum
inet_diag: fix inet_diag_dump_icsk() timewait socket state logic
...

Signed-off-by: Dan Murphy <DMurphy@ti.com>

Dan Murphy
2014-02-07 07:05:20 +0800
b572f9aaa btrfs: restrict snapshotting to own subvolumes

commit d024206133ce21936b3d5780359afc00247655b7 upstream.

Currently, any user can snapshot any subvolume if the path is accessible and
thus indirectly create and keep files he does not own under his directories.
This is not possible with traditional directories.

In security context, a user can snapshot root filesystem and pin any
potentially buggy binaries, even if the updates are applied.

All the snapshots are visible to the administrator, so it's possible to
verify if there are suspicious snapshots.

Another more practical problem is that any user can pin the space used
by eg. root and cause ENOSPC.

Original report:
https://bugs.launchpad.net/ubuntu/+source/apparmor/+bug/484786

Signed-off-by: David Sterba
Signed-off-by: Josef Bacik
Signed-off-by: Chris Mason
Signed-off-by: Greg Kroah-Hartman

David Sterba
2014-02-07 03:22:22 +0800
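
The restriction named in the btrfs commit title above can be pictured as an ownership test. This is a minimal sketch only: the function and parameter names are hypothetical, and the bypass for privileged callers is an assumption, since the message does not say how the kernel performs the check.

```python
def may_snapshot(caller_uid, subvolume_owner_uid, caller_is_privileged=False):
    """Return True if the caller may snapshot the subvolume.

    Hypothetical policy: the caller must own the subvolume, unless the
    caller is privileged (an assumption, not stated in the commit message).
    """
    if caller_is_privileged:
        return True
    return caller_uid == subvolume_owner_uid
```

Under this policy an ordinary user can no longer pin space or binaries that belong to root, which are the two problems the message describes.
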
a0f602ae0 Btrfs: handle EAGAIN case properly in btrfs_drop_snapshot()

commit 90515e7f5d7d24cbb2a4038a3f1b5cfa2921aa17 upstream.

We may return early in btrfs_drop_snapshot(); in that case we shouldn't
call btrfs_std_err(). Fix it.

Signed-off-by: Wang Shilong
Signed-off-by: Josef Bacik
Signed-off-by: Chris Mason
Signed-off-by: Greg Kroah-Hartman

Wang Shilong
2014-02-07 03:22:22 +0800
29b49c821 hpfs: remember free space

commit 2cbe5c76fc5e38e9af4b709593146e4b8272b69e upstream.

Previously, hpfs scanned all bitmaps each time the user asked for free
space using statfs. This patch changes it so that hpfs scans the
bitmaps only once, remembers the free space, and on the next invocation
of statfs returns the value instantly.

New versions of wine are hammering on the statfs syscall very heavily,
making some games unplayable when they're stored on hpfs, with load
times in minutes.

This should be backported to the stable kernels because it fixes a
user-visible problem (excessive level load times in wine).

Signed-off-by: Mikulas Patocka
Signed-off-by: Linus Torvalds
Signed-off-by: Greg Kroah-Hartman

Mikulas Patocka
2014-02-07 03:22:21 +0800
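
The "scan once, then remember" idea in the hpfs commit above can be sketched as a small cache. The class and method names are hypothetical, the bitmaps are plain Python lists, and adjusting the remembered count on each allocation is an assumption; the message does not say how the patch keeps the value current.

```python
class Volume:
    """Toy volume whose allocation bitmaps are lists of booleans (True = free)."""

    def __init__(self, bitmaps):
        self.bitmaps = bitmaps
        self.cached_free = None  # None means "not counted yet"
        self.scans = 0           # number of full bitmap scans performed

    def _scan(self):
        # The expensive path: walk every bit of every bitmap.
        self.scans += 1
        return sum(sum(1 for bit in bitmap if bit) for bitmap in self.bitmaps)

    def statfs_free(self):
        # Scan only on the first call, then serve the remembered value.
        if self.cached_free is None:
            self.cached_free = self._scan()
        return self.cached_free

    def allocate(self, bitmap_index, bit_index):
        if not self.bitmaps[bitmap_index][bit_index]:
            raise ValueError("block already in use")
        self.bitmaps[bitmap_index][bit_index] = False
        if self.cached_free is not None:
            self.cached_free -= 1  # keep the remembered value in step
```

Repeated `statfs_free()` calls then cost one comparison instead of a full scan, which is the behaviour the wine workload needs.
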
441d2e303 vfs: Is mounted should be testing mnt_ns for NULL or error.

commit 260a459d2e39761fbd39803497205ce1690bc7b1 upstream.

A bug was introduced with the is_mounted helper function in
commit f7a99c5b7c8bd3d3f533c8b38274e33f3da9096e
Author: Al Viro
Date: Sat Jun 9 00:59:08 2012 -0400

    get rid of ->mnt_longterm

    it's enough to set ->mnt_ns of internal vfsmounts to something
    distinct from all struct mnt_namespace out there; then we can
    just use the check for ->mnt_ns != NULL in the fast path of
    mntput_no_expire()

    Signed-off-by: Al Viro

The intent was to test if the real_mount(vfsmount)->mnt_ns was
NULL_OR_ERR but the code is actually testing real_mount(vfsmount)
and always returning true.

The result is d_absolute_path returning paths it should be hiding.

Signed-off-by: "Eric W. Biederman"
Signed-off-by: Al Viro
Signed-off-by: Greg Kroah-Hartman

Eric W. Biederman
2014-02-07 03:22:19 +0800
87df85def vfs: Remove second variable named error in __dentry_path

commit a8323da0366d3398eda62741d2ac1130c8a172ed upstream.

In commit 232d2d60aa5469bb097f55728f65146bd49c1d25
Author: Waiman Long
Date: Mon Sep 9 12:18:13 2013 -0400

    dcache: Translating dentry into pathname without taking rename_lock

The __dentry_path locking was changed and the variable error was
intended to be moved outside of the loop. Unfortunately the inner
declaration of error was not removed, resulting in a version of
__dentry_path that will never return an error.

Remove the problematic inner declaration of error and allow
__dentry_path to return errors once again.

Cc: Waiman Long
Signed-off-by: "Eric W. Biederman"
Signed-off-by: Al Viro
Signed-off-by: Greg Kroah-Hartman

Eric W. Biederman
2014-02-07 03:22:19 +0800
842408e7c ext4: avoid clearing beyond i_blocks when truncating an inline data file

commit 09c455aaa8f47a94d5bafaa23d58365768210507 upstream.

A missing cast means that when we are truncating a file which is less
than 60 bytes, we don't clear the correct area of memory, and in fact
we can end up truncating the next inode in the inode table, or worse
yet, some other kernel data structure.

Addresses-Coverity-Id: #751987

Signed-off-by: "Theodore Ts'o"
Signed-off-by: Greg Kroah-Hartman

Theodore Ts'o
2014-02-07 03:22:19 +0800

26 Jan, 2014

6 commits

c843ceecf Merge tag 'v3.12.9' of http://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable into ti-linux-3.12.y

This is the 3.12.9 stable release

* tag 'v3.12.9' of http://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable: (28 commits)
Linux 3.12.9
ARM: 7938/1: OMAP4/highbank: Flush L2 cache before disabling
drm/i915: Don't grab crtc mutexes in intel_modeset_gem_init()
ARM: 7934/1: DT/kernel: fix arch_match_cpu_phys_id to avoid erroneous match
serial: amba-pl011: use port lock to guard control register access
mm: Make {,set}page_address() static inline if WANT_PAGE_VIRTUAL
md/raid5: Fix possible confusion when multiple write errors occur.
md/raid10: fix two bugs in handling of known-bad-blocks.
md/raid10: fix bug when raid10 recovery fails to recover a block.
md: fix problem when adding device to read-only array with bitmap.
drm/i915: fix DDI PLLs HW state readout code
nilfs2: fix segctor bug that causes file system corruption
mm: fix crash when using XFS on loopback
crash_dump: fix compilation error (on MIPS at least)
ftrace/x86: Load ftrace_ops in parameter not the variable holding it
thp: fix copy_page_rep GPF by testing is_huge_zero_pmd once only
SELinux: Fix possible NULL pointer dereference in selinux_inode_permission()
writeback: Fix data corruption on NFS
hwmon: (coretemp) Fix truncated name of alarm attributes
i2c: Re-instate body of i2c_parent_is_i2c_adapter()
...

Signed-off-by: Dan Murphy <DMurphy@ti.com>

Dan Murphy
2014-01-26 02:04:20 +0800
0ac74239b nilfs2: fix segctor bug that causes file system corruption

commit 70f2fe3a26248724d8a5019681a869abdaf3e89a upstream.

There is a bug in the function nilfs_segctor_collect, which results in
active data being written to a segment that is marked as clean. It is
possible that this segment is selected for a later segment
construction, whereby the old data is overwritten.

The problem shows itself with the following kernel log message:

nilfs_sufile_do_cancel_free: segment 6533 must be clean

Usually a few hours later the file system gets corrupted:

NILFS: bad btree node (blocknr=8748107): level = 0, flags = 0x0, nchildren = 0
NILFS error (device sdc1): nilfs_bmap_last_key: broken bmap (inode number=114660)

The issue can be reproduced with a file system that is nearly full and
with the cleaner running while some IO-intensive task is running,
although it is quite hard to reproduce.

This is what happens:

1. The cleaner starts the segment construction
2. nilfs_segctor_collect is called
3. sc_stage is on NILFS_ST_SUFILE and segments are freed
4. sc_stage is on NILFS_ST_DAT and the current segment is full
5. nilfs_segctor_extend_segments is called, which
   allocates a new segment
6. The new segment is one of the segments freed in step 3
7. nilfs_sufile_cancel_freev is called and produces an error message
8. Loop around and the collection starts again
9. sc_stage is on NILFS_ST_SUFILE and segments are freed,
   including the newly allocated segment, which will contain active
   data and can be allocated at a later time
10. A few hours later another segment construction allocates the
    segment and causes file system corruption

This can be prevented by simply reordering the statements. If
nilfs_sufile_cancel_freev is called before nilfs_segctor_extend_segments
the freed segments are marked as dirty and cannot be allocated any more.

Signed-off-by: Andreas Rohner
Reviewed-by: Ryusuke Konishi
Tested-by: Andreas Rohner
Signed-off-by: Ryusuke Konishi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
Signed-off-by: Greg Kroah-Hartman

Andreas Rohner
2014-01-26 00:49:29 +0800
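
The ordering problem in the nilfs2 commit above can be reproduced with a toy model. Everything here is hypothetical: segments are plain integers, the allocator simply picks the lowest-numbered clean segment, and one function stands in for both call orders. It only illustrates why cancelling the frees before extending keeps freshly freed segments out of reach of the allocator.

```python
def extend_segment(clean, freed, cancel_first):
    """Pick a segment for the extension step of one collection pass.

    clean: segment numbers that were clean before the pass
    freed: segment numbers freed earlier in the same pass
    cancel_first: True models cancelling the frees before extending
    Returns (picked_segment, picked_segment_was_just_freed).
    """
    freed = set(freed)
    allocatable = set(clean) | freed  # freshly freed segments look clean
    if cancel_first:
        allocatable -= freed          # frees cancelled: dirty again
        picked = min(allocatable)     # lowest-number policy is an assumption
    else:
        picked = min(allocatable)     # may be a segment freed a moment ago
        allocatable -= freed          # cancelling now comes too late
    return picked, picked in freed
```

With `clean={10, 11}` and `freed={3, 4}`, the original order hands out segment 3, which is step 6 of the list in the message; the reordered version hands out segment 10.
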
0e177339b writeback: Fix data corruption on NFS

commit f9b0e058cbd04ada76b13afffa7e1df830543c24 upstream.

Commit 4f8ad655dbc8 "writeback: Refactor writeback_single_inode()" added
a condition to skip clean inode. However this is wrong in WB_SYNC_ALL
mode because there we also want to wait for outstanding writeback on
possibly clean inode. This was causing occasional data corruption issues
on NFS because it uses sync_inode() to make sure all outstanding writes
are flushed to the server before truncating the inode and with
sync_inode() returning prematurely file was sometimes extended back
by an outstanding write after it was truncated.

So modify the test to also check for pages under writeback in
WB_SYNC_ALL mode.

Fixes: 4f8ad655dbc82cf05d2edc11e66b78a42d38bf93
Reported-and-tested-by: Dan Duval
Signed-off-by: Jan Kara
Signed-off-by: Greg Kroah-Hartman

Jan Kara
2014-01-26 00:49:28 +0800
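
The skip condition discussed in the writeback commit above can be written as a small predicate. This is a sketch with hypothetical function and parameter names; only the WB_SYNC_NONE and WB_SYNC_ALL mode names come from the kernel, and their numeric values here are arbitrary.

```python
WB_SYNC_NONE = 0  # do not wait for writeback to finish
WB_SYNC_ALL = 1   # wait for all outstanding writeback

def may_skip_inode(inode_dirty, sync_mode, pages_under_writeback):
    """Return True if writeback of this inode can be skipped entirely."""
    if inode_dirty:
        return False
    # A clean inode can still have pages in flight; in WB_SYNC_ALL mode
    # the caller must wait for them, so the inode cannot be skipped.
    if sync_mode == WB_SYNC_ALL and pages_under_writeback:
        return False
    return True
```

The case that matters for NFS is a clean inode with pages still under writeback in WB_SYNC_ALL mode: returning early there lets an outstanding write extend the file after it has been truncated.
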
71a342462 vfs: Fix a regression in mounting proc

commit 41301ae78a99ead04ea42672a1ab72c6f44cc81d upstream.

Gao feng reported that commit
e51db73532955dc5eaba4235e62b74b460709d5b
userns: Better restrictions on when proc and sysfs can be mounted
caused a regression on mounting a new instance of proc in a mount
namespace created with user namespace privileges, when binfmt_misc
is mounted on /proc/sys/fs/binfmt_misc.

This is an unintended regression caused by the absolutely bogus empty
directory check in fs_fully_visible. The check fs_fully_visible replaced
didn't even bother to attempt to verify proc was fully visible and
hiding proc files with any kind of mount is rare. So for now fix
the userspace regression by allowing directory with nlink == 1
as /proc/sys/fs/binfmt_misc has.

I will have a better patch but it is not stable material, or
last minute kernel material. So it will have to wait.

Acked-by: Serge Hallyn
Acked-by: Gao feng
Tested-by: Gao feng
Signed-off-by: "Eric W. Biederman"
Signed-off-by: Greg Kroah-Hartman

Eric W. Biederman
2014-01-26 00:49:28 +0800
0489953bb vfs: In d_path don't call d_dname on a mount point

commit f48cfddc6729ef133933062320039808bafa6f45 upstream.

Aditya Kali (adityakali@google.com) wrote:
> Commit bf056bfa80596a5d14b26b17276a56a0dcb080e5:
> "proc: Fix the namespace inode permission checks." converted
> the namespace files into symlinks. The same commit changed
> the way namespace bind mounts appear in /proc/mounts:
> $ mount --bind /proc/self/ns/ipc /mnt/ipc
> Originally:
> $ cat /proc/mounts | grep ipc
> proc /mnt/ipc proc rw,nosuid,nodev,noexec 0 0
>
> After commit bf056bfa80596a5d14b26b17276a56a0dcb080e5:
> $ cat /proc/mounts | grep ipc
> proc ipc:[4026531839] proc rw,nosuid,nodev,noexec 0 0
>
> This breaks userspace which expects the 2nd field in
> /proc/mounts to be a valid path.

The symlinks /proc/<pid>/ns/{ipc,mnt,net,pid,user,uts} point to
dentries allocated with d_alloc_pseudo that we can mount, and
that have interesting names printed out with d_dname.

When these files are bind mounted /proc/mounts is not currently
displaying the mount point correctly because d_dname is called instead
of just displaying the path where the file is mounted.

Solve this by adding an explicit check to distinguish mounted pseudo
inodes and unmounted pseudo inodes. Unmounted pseudo inodes always
use the mount of their filesystem as the mnt_root in their path, making
these two cases easy to distinguish.

Acked-by: Serge Hallyn
Reported-by: Aditya Kali
Signed-off-by: "Eric W. Biederman"
Signed-off-by: Greg Kroah-Hartman

Eric W. Biederman
2014-01-26 00:49:28 +0800
dfc74e9cc GFS2: Increase i_writecount during gfs2_setattr_chown

commit 62e96cf81988101fe9e086b2877307b6adda5197 upstream.

This patch calls get_write_access in function gfs2_setattr_chown,
which merely increases inode->i_writecount for the duration of the
function. That will ensure that any file closes won't delete the
inode's multi-block reservation while the function is running.
It also ensures that a multi-block reservation exists when needed
for quota change operations during the chown.

Signed-off-by: Bob Peterson
Signed-off-by: Steven Whitehouse
Signed-off-by: Greg Kroah-Hartman

Bob Peterson
2014-01-26 00:49:28 +0800

18 Jan, 2014

3 commits

450c4d11d Merge tag 'v3.12.7' of git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable into ti-linux-3.12.y

This is the 3.12.7 stable release

* tag 'v3.12.7' of git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable: (154 commits)
Linux 3.12.7
sh: add EXPORT_SYMBOL(min_low_pfn) and EXPORT_SYMBOL(max_low_pfn) to sh_ksyms_32.c
ext4: fix bigalloc regression
ACPIPHP / radeon / nouveau: Fix VGA switcheroo problem related to hotplug
nouveau_acpi: convert acpi_get_handle() to acpi_has_method()
aio/migratepages: make aio migrate pages sane
aio: clean up and fix aio_setup_ring page mapping
clocksource: dw_apb_timer_of: Fix support for dts binding "snps,dw-apb-timer"
clocksource: dw_apb_timer_of: Fix read_sched_clock
selinux: process labeled IPsec TCP SYN-ACK packets properly in selinux_ip_postroute()
selinux: look for IPsec labels on both inbound and outbound packets
sh: always link in helper functions extracted from libgcc
gpio: msm: Fix irq mask/unmask by writing bits instead of numbers
gpio: twl4030: Fix regression for twl gpio LED output
sh-pfc: Fix PINMUX_GPIO macro
jbd2: don't BUG but return ENOSPC if a handle runs out of space
s390/3270: fix allocation of tty3270_screen structure
ARM: sun7i: dt: Fix interrupt trigger types
memcg: fix memcg_size() calculation
GFS2: Fix incorrect invalidation for DIO/buffered I/O
...

Conflicts:
arch/arm/mach-omap2/omap_hwmod_7xx_data.c
drivers/usb/musb/musb_core.c

Signed-off-by: Dan Murphy <dmurphy@ti.com>

Dan Murphy
2014-01-18 06:29:26 +0800
d67276925 Merge tag 'v3.12.6' of git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable into ti-linux-3.12.y

This is the 3.12.6 stable release

* tag 'v3.12.6' of git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable: (120 commits)
Linux 3.12.6
ARM: OMAP2+: hwmod: Fix SOFTRESET logic
drm/i915/vlv: fix up broken precision in vlv_crtc_clock_get
drm/i915/vlv: add VLV specific clock_get function v3
i915/vlv: untangle integrated clock source handling v4
Btrfs: fix lockdep error in async commit
Btrfs: fix a crash when running balance and defrag concurrently
Btrfs: do not run snapshot-aware defragment on error
Btrfs: take ordered root lock when removing ordered operations inode
Btrfs: stop using vfs_read in send
Btrfs: fix incorrect inode acl reset
Btrfs: fix hole check in log_one_extent
Btrfs: fix memory leak of chunks' extent map
Btrfs: reset intwrite on transaction abort
Btrfs: do a full search everytime in btrfs_search_old_slot
Revert "net: update consumers of MSG_MORE to recognize MSG_SENDPAGE_NOTLAST"
Input: elantech - add support for newer (August 2013) devices
NFSv4 wait on recovery for async session errors
sc1200_wdt: Fix oops
staging: comedi: ssv_dnp: use comedi_dio_update_state()
...

Conflicts:
arch/arm/mach-omap2/omap_hwmod.c
drivers/usb/musb/musb_cppi41.c

Signed-off-by: Dan Murphy <dmurphy@ti.com>

Dan Murphy
2014-01-18 05:00:07 +0800
5488f7783 Merge tag 'v3.12.5' of git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable into ti-linux-3.12.y

This is the 3.12.5 stable release

* tag 'v3.12.5' of git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable: (64 commits)
Linux 3.12.5
crypto: scatterwalk - Use sg_chain_ptr on chain entries
drivers/char/i8k.c: add Dell XPLS L421X
USB: cdc-acm: Added support for the Lenovo RD02-D400 USB Modem
USB: spcp8x5: correct handling of CS5 setting
USB: mos7840: correct handling of CS5 setting
USB: ftdi_sio: fixed handling of unsupported CSIZE setting
USB: pl2303: fixed handling of CS5 setting
n_tty: Fix missing newline echo
mei: add 9 series PCH mei device ids
mei: me: add Lynx Point Wellsburg work station device id
Input: mousedev - allow disabling even without CONFIG_EXPERT
Input: allow deselecting serio drivers even without CONFIG_EXPERT
tg3: avoid double-freeing of rx data memory
iwlwifi: dvm: don't override mac80211's queue setting
SCSI: Disable WRITE SAME for RAID and virtual host adapter drivers
x86-64, build: Always pass in -mno-sse
net: update consumers of MSG_MORE to recognize MSG_SENDPAGE_NOTLAST
irq: Enable all irqs unconditionally in irq_resume
Update of blkg_stat and blkg_rwstat may happen in bh context. While u64_stats_fetch_retry is only preempt_disable on 32bit UP system. This is not enough to avoid preemption by bh and may read strange 64 bit value.
...

Signed-off-by: Dan Murphy <dmurphy@ti.com>

Dan Murphy
2014-01-18 04:37:58 +0800

10 Jan, 2014

24 commits

adf6f9b43 ext4: fix bigalloc regression

commit d0abafac8c9162f39c4f6b2f8141b772a09b3770 upstream.

Commit f5a44db5d2 introduced a regression on filesystems created with
the bigalloc feature (cluster size > blocksize). It causes xfstests
generic/006 and /013 to fail with an unexpected JBD2 failure and
transaction abort that leaves the test file system in a read only state.
Other xfstests run on bigalloc file systems are likely to fail as well.

The cause is the accidental use of a cluster mask where a cluster
offset was needed in ext4_ext_map_blocks().

Signed-off-by: Eric Whitney
Cc: Theodore Ts'o
Signed-off-by: Greg Kroah-Hartman

Eric Whitney
2014-01-10 04:25:16 +0800
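
The difference between a cluster mask and a cluster offset, which the bigalloc commit above names as the cause, is easy to see with small numbers. This is a sketch with hypothetical helper names, assuming the cluster ratio (blocks per cluster) is a power of two.

```python
def cluster_mask(lblk, cluster_ratio):
    """First block of the cluster containing lblk (low bits cleared)."""
    return lblk & ~(cluster_ratio - 1)

def cluster_offset(lblk, cluster_ratio):
    """Position of lblk inside its cluster (only the low bits kept)."""
    return lblk & (cluster_ratio - 1)
```

For block 37 with 16 blocks per cluster, the mask gives 32 and the offset gives 5. Using one where the other is needed therefore produces a very different block number.
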
2b9a70414 aio/migratepages: make aio migrate pages sane

commit 8e321fefb0e60bae4e2a28d20fc4fa30758d27c6 upstream.

The arbitrary restriction on page counts offered by the core
migrate_page_move_mapping() code results in rather suspicious looking
fiddling with page reference counts in the aio_migratepage() operation.
To fix this, make migrate_page_move_mapping() take an extra_count parameter
that allows aio to tell the code about its own reference count on the page
being migrated.

While cleaning up aio_migratepage(), make it validate that the old page
being passed in is actually what aio_migratepage() expects to prevent
misbehaviour in the case of races.

Signed-off-by: Benjamin LaHaise
Signed-off-by: Greg Kroah-Hartman

Benjamin LaHaise
2014-01-10 04:25:16 +0800
25c36e26d aio: clean up and fix aio_setup_ring page mapping

commit 3dc9acb67600393249a795934ccdfc291a200e6b upstream.

Since commit 36bc08cc01709 ("fs/aio: Add support to aio ring pages
migration") the aio ring setup code has used a special per-ring backing
inode for the page allocations, rather than just using random anonymous
pages.

However, rather than remembering the pages as it allocated them, it
would allocate the pages, insert them into the file mapping (dirty, so
that they couldn't be free'd), and then forget about them. And then to
look them up again, it would mmap the mapping, and then use
"get_user_pages()" to get back an array of the pages we just created.

Now, not only is that incredibly inefficient, it also leaked all the
pages if the mmap failed (which could happen due to excessive number of
mappings, for example).

So clean it all up, making it much more straightforward. Also remove
some left-overs of the previous (broken) mm_populate() usage that was
removed in commit d6c355c7dabc ("aio: fix race in ring buffer page
lookup introduced by page migration support") but left the pointless and
now misleading MAP_POPULATE flag around.

Tested-and-acked-by: Benjamin LaHaise
Signed-off-by: Linus Torvalds
Signed-off-by: Greg Kroah-Hartman

Linus Torvalds
2014-01-10 04:25:15 +0800
981d2964f jbd2: don't BUG but return ENOSPC if a handle runs out of space

commit f6c07cad081ba222d63623d913aafba5586c1d2c upstream.

If a handle runs out of space, we currently stop the kernel with a BUG
in jbd2_journal_dirty_metadata(). This makes it hard to figure out
what might be going on. So return an error of ENOSPC, so we can let
the file system layer figure out what is going on, to make it more
likely we can get useful debugging information. This should make it
easier to debug problems such as the one which was reported by:

https://bugzilla.kernel.org/show_bug.cgi?id=44731

The only two callers of this function are ext4_handle_dirty_metadata()
and ocfs2_journal_dirty(). The ocfs2 function will trigger a
BUG_ON(), which means there will be no change in behavior. The ext4
function will call ext4_error_inode() which will print the useful
debugging information and then handle the situation using ext4's error
handling mechanisms (i.e., which might mean halting the kernel or
remounting the file system read-only).

Also, since both file systems already call WARN_ON(), drop the WARN_ON
from jbd2_journal_dirty_metadata() to avoid two stack traces from
being displayed.

Signed-off-by: "Theodore Ts'o"
Cc: ocfs2-devel@oss.oracle.com
Acked-by: Joel Becker
Signed-off-by: Greg Kroah-Hartman

Theodore Ts'o
2014-01-10 04:25:15 +0800
3d5088353 GFS2: Fix incorrect invalidation for DIO/buffered I/O

commit dfd11184d894cd0a92397b25cac18831a1a6a5bc upstream.

In patch 209806aba9d540dde3db0a5ce72307f85f33468f we allowed
local deferred locks to be granted against a cached exclusive
lock. That opened up a corner case which this patch now
fixes.

The solution to the problem is to check whether we have cached
pages each time we do direct I/O and if so to unmap, flush
and invalidate those pages. Since the glock state machine
normally does that for us, mostly the code will be a no-op.

Signed-off-by: Steven Whitehouse
Signed-off-by: Greg Kroah-Hartman

Steven Whitehouse
2014-01-10 04:25:15 +0800
e93b10093 GFS2: Fix slab memory leak in gfs2_bufdata

commit 502be2a32f09f388e4ff34ef2e3ebcabbbb261da upstream.

This patch fixes a slab memory leak that sometimes can occur
for files with a very short lifespan. The problem occurs when
a dinode is deleted before it has gotten to the journal properly.
In the leak scenario, the bd object is pinned for journal
commitment (queued to the metadata buffers queue: sd_log_le_buf)
but is subsequently unpinned and dequeued before it finds its way
to the ail or the revoke queue. In this rare circumstance, the bd
object needs to be freed from slab memory, or it is forgotten.
We have to be very careful how we do it, though, because
multiple processes can call gfs2_remove_from_journal. In order to
avoid double-frees, only the process that does the unpinning is
allowed to free the bd.

Signed-off-by: Bob Peterson
Signed-off-by: Steven Whitehouse
Signed-off-by: Greg Kroah-Hartman

Bob Peterson
2014-01-10 04:25:15 +0800
6d9c4a00e GFS2: Fix use-after-free race when calling gfs2_remove_from_ail

commit 9290a9a7c0bcf5400e8dbfbf9707fa68ea3fb338 upstream.

Function gfs2_remove_from_ail drops the reference on the bh via
brelse. This patch fixes a race condition whereby bh is dereferenced
after the brelse when setting bd->bd_blkno = bh->b_blocknr.
Under certain rare circumstances, bh might be gone or reused,
and bd->bd_blkno is set to whatever that memory happens to be,
which is often 0. Later, in gfs2_trans_add_unrevoke, that bd fails
the test "bd->bd_blkno >= blkno" which causes it to never be freed.
The end result is that the bd is never freed from the bufdata cache,
which results in this error:
slab error in kmem_cache_destroy(): cache `gfs2_bufdata': Can't free all objects

Signed-off-by: Bob Peterson
Signed-off-by: Steven Whitehouse
Signed-off-by: Greg Kroah-Hartman

Bob Peterson
2014-01-10 04:25:15 +0800
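
The race in the gfs2_remove_from_ail commit above comes down to reading a field after the reference that protects it has been dropped. The sketch below is a toy model: the class and the two record functions are hypothetical (only brelse borrows its name from the message), and zeroing the field on release stands in for memory reuse, which Python itself cannot exhibit.

```python
class BufferHead:
    """Toy buffer head with a block number and a reference count."""

    def __init__(self, blocknr):
        self.blocknr = blocknr
        self.refs = 1

def brelse(bh):
    """Drop one reference; at zero the memory is treated as reused."""
    bh.refs -= 1
    if bh.refs == 0:
        bh.blocknr = 0  # model only: contents are no longer trustworthy

def record_after_release(bh):
    """Problematic order: release first, then read the block number."""
    brelse(bh)
    return bh.blocknr

def record_before_release(bh):
    """Safe order: copy the block number while the reference is held."""
    blkno = bh.blocknr
    brelse(bh)
    return blkno
```

In this model the first order records 0 instead of the real block number, which matches the "often 0" symptom in the message.
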
c2ff1ad9b GFS2: don't hold s_umount over blkdev_put

commit dfe5b9ad83a63180f358b27d1018649a27b394a9 upstream.

This is a GFS2 version of Tejun's patch:
4f331f01b9c43bf001d3ffee578a97a1e0633eac
vfs: don't hold s_umount over close_bdev_exclusive() call

In this case it's blkdev_put itself that is the issue, and this
patch uses the same solution of dropping and retaking s_umount.

Reported-by: Tejun Heo
Reported-by: Al Viro
Signed-off-by: Steven Whitehouse
Signed-off-by: Greg Kroah-Hartman

Steven Whitehouse
2014-01-10 04:25:14 +0800
681203c68 ext2: Fix oops in ext2_get_block() called from ext2_quota_write()

commit df4e7ac0bb70abc97fbfd9ef09671fc084b3f9db upstream.

ext2_quota_write() doesn't properly setup bh it passes to
ext2_get_block() and thus we hit assertion BUG_ON(maxblocks == 0) in
ext2_get_blocks() (or we could actually ask for mapping arbitrary number
of blocks depending on whatever value was on stack).

Fix ext2_quota_write() to properly fill in number of blocks to map.

Reviewed-by: "Theodore Ts'o"
Reviewed-by: Christoph Hellwig
Reported-by: Christoph Hellwig
Signed-off-by: Jan Kara
Signed-off-by: Greg Kroah-Hartman

Jan Kara
2014-01-10 04:25:13 +0800
529cfe789 cifs: set FILE_CREATED

commit f1e3268126a35b9d3cb8bf67487fcc6cd13991d8 upstream.

Set FILE_CREATED on O_CREAT|O_EXCL.

cifs code didn't change during commit 116cc0225381415b96551f725455d067f63a76a0

Kernel bugzilla 66251

Signed-off-by: Shirish Pargaonkar
Acked-by: Jeff Layton
Signed-off-by: Steve French
Signed-off-by: Greg Kroah-Hartman

Shirish Pargaonkar
2014-01-10 04:25:13 +0800
52b9008f8 cifs: We do not drop reference to tlink in CIFSCheckMFSymlink()

commit 750b8de6c4277d7034061e1da50663aa1b0479e4 upstream.

When we obtain tcon from cifs_sb, we use cifs_sb_tlink() to first obtain
tlink which also grabs a reference to it. We do not drop this reference
to tlink once we are done with the call.

The patch fixes this issue by instead passing tcon as a parameter and
avoids having to obtain a reference to the tlink. A lookup for the tcon
is already made in the calling functions and this way we avoid having to
re-run the lookup. This is also consistent with the argument list for
other similar calls for M-F symlinks.

We should also return an ENOSYS when we do not find a protocol specific
function to lookup the MF Symlink data.

Signed-off-by: Sachin Prabhu
Reviewed-by: Jeff Layton
Signed-off-by: Steve French
Signed-off-by: Greg Kroah-Hartman

Sachin Prabhu
2014-01-10 04:25:13 +0800
36bbba067 ceph: Avoid data inconsistency due to d-cache aliasing in readpage()

commit 56f91aad69444d650237295f68c195b74d888d95 upstream.

If the length of data to be read in readpage() is exactly
PAGE_CACHE_SIZE, the original code does not flush d-cache
for data consistency after finishing reading. This patch fixes
this.

Signed-off-by: Li Wang
Signed-off-by: Sage Weil
Signed-off-by: Greg Kroah-Hartman

Li Wang
2014-01-10 04:25:11 +0800
5f4890293 ext4: fix FITRIM in no journal mode

commit 8f9ff189205a6817aee5a1f996f876541f86e07c upstream.

When using the FITRIM ioctl on a file system without a journal, it will
only trim the block group once, no matter how many times you invoke the
FITRIM ioctl and how many blocks you release from the block group.

It is because we only clear EXT4_GROUP_INFO_WAS_TRIMMED_BIT in journal
callback. Fix this by clearing the bit in no journal mode as well.

Signed-off-by: Lukas Czerner
Signed-off-by: "Theodore Ts'o"
Reported-by: Jorge Fábregas
Signed-off-by: Greg Kroah-Hartman

Lukas Czerner
2014-01-10 04:25:10 +0800
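
The FITRIM behaviour described in the commit above can be modelled with one flag per block group. This is a minimal sketch under stated assumptions: the class, function, and parameter names are hypothetical, `was_trimmed` stands in for EXT4_GROUP_INFO_WAS_TRIMMED_BIT, and the journal commit callback is reduced to a branch.

```python
class BlockGroup:
    """Toy block group: counts free blocks not yet discarded."""

    def __init__(self, untrimmed_free_blocks):
        self.untrimmed = untrimmed_free_blocks
        self.was_trimmed = False  # stands in for the WAS_TRIMMED bit

def free_blocks(group, count, has_journal, clear_without_journal):
    """Release blocks back to the group."""
    group.untrimmed += count
    if has_journal or clear_without_journal:
        # With a journal the commit callback clears the flag; the second
        # condition models clearing it in no-journal mode as well.
        group.was_trimmed = False

def fitrim(group):
    """Return how many blocks this call discards."""
    if group.was_trimmed:
        return 0  # group is considered already trimmed
    trimmed = group.untrimmed
    group.untrimmed = 0
    group.was_trimmed = True
    return trimmed
```

Without a journal and without the extra clearing, a second `fitrim()` after freeing more blocks discards nothing, which is the symptom reported.
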
efa1cbb56 ext4: add explicit casts when masking cluster sizes

commit f5a44db5d2d677dfbf12deee461f85e9ec633961 upstream.

The missing casts can cause the high 64-bits of the physical blocks to
be lost. Set up new macros which allow us to make sure the right
thing happens, even if at some point we end up supporting larger
logical block numbers.

Thanks to Emese Revfy and the PaX security team for reporting this
issue.

Reported-by: PaX Team
Reported-by: Emese Revfy
Signed-off-by: "Theodore Ts'o"
Signed-off-by: Greg Kroah-Hartman

Theodore Ts'o
2014-01-10 04:25:10 +0800
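
The effect of a missing cast when masking, as in the commit above, can be imitated in Python by truncating the mask by hand. This is a sketch under stated assumptions: the cluster ratio is taken to be a 32-bit unsigned value and physical block numbers 64-bit (the message does not state the types), and both function names are hypothetical.

```python
U32 = 0xFFFFFFFF
U64 = 0xFFFFFFFFFFFFFFFF

def cluster_start_without_cast(pblk, cluster_ratio):
    """Mask computed in 32 bits, then zero-extended to 64 bits."""
    mask32 = ~(cluster_ratio - 1) & U32
    return pblk & mask32  # the upper half of pblk is wiped out

def cluster_start_with_cast(pblk, cluster_ratio):
    """Mask computed in 64 bits, as an explicit cast would arrange."""
    mask64 = ~(cluster_ratio - 1) & U64
    return pblk & mask64
```

For physical block 0x100000017 with 16 blocks per cluster, the narrow mask yields 0x10 while the wide mask yields 0x100000010, so the upper bits survive only when the mask is as wide as the block number.
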
c4589c1f2 ext4: fix deadlock when writing in ENOSPC conditions

commit 34cf865d54813aab3497838132fb1bbd293f4054 upstream.

Akira-san has been reporting rare deadlocks of his machine when running
xfstests test 269 on ext4 filesystem. The problem turned out to be in
ext4_da_reserve_metadata() and ext4_da_reserve_space() which called
ext4_should_retry_alloc() while holding i_data_sem. Since
ext4_should_retry_alloc() can force a transaction commit, this is a
lock ordering violation and leads to deadlocks.

Fix the problem by just removing the retry loops. These functions should
just report ENOSPC to the caller (e.g. ext4_da_write_begin()) and that
function must take care of retrying after dropping all necessary locks.

Reported-and-tested-by: Akira Fujita
Reviewed-by: Zheng Liu
Signed-off-by: Jan Kara
Signed-off-by: "Theodore Ts'o"
Signed-off-by: Greg Kroah-Hartman

Jan Kara
2014-01-10 04:25:10 +0800
12f5b4908 ext4: Do not reserve clusters when fs doesn't support extents

commit 30fac0f75da24dd5bb43c9e911d2039a984ac815 upstream.

When the filesystem doesn't support extents (like in ext2/3
compatibility modes), there is no need to reserve any clusters. Space
estimates for writing are exact, hole punching doesn't need new
metadata, and there are no unwritten extents to convert.

This fixes a problem where a filesystem that still has some free space when
accessed with a native ext2/3 driver suddenly reports ENOSPC when
accessed with the ext4 driver.

Reported-by: Geert Uytterhoeven
Tested-by: Geert Uytterhoeven
Reviewed-by: Lukas Czerner
Signed-off-by: Jan Kara
Signed-off-by: "Theodore Ts'o"
Signed-off-by: Greg Kroah-Hartman

Jan Kara
2014-01-10 04:25:10 +0800
89b4fc74a ext4: fix del_timer() misuse for ->s_err_report

commit 9105bb149bbbc555d2e11ba5166dfe7a24eae09e upstream.

That thing should be del_timer_sync(); consider what happens
if ext4_put_super() call of del_timer() happens to come just as it's
getting run on another CPU. Since that timer reschedules itself
to run next day, you are pretty much guaranteed that you'll end up
with kfree'd scheduled timer, with usual fun consequences. AFAICS,
that's -stable fodder all way back to 2010... [the second del_timer_sync()
is almost certainly not needed, but it doesn't hurt either]

Signed-off-by: Al Viro
Signed-off-by: "Theodore Ts'o"
Signed-off-by: Greg Kroah-Hartman

Al Viro
2014-01-10 04:25:10 +0800
ea214c946 ext4: check for overlapping extents in ext4_valid_extent_entries()

commit 5946d089379a35dda0e531710b48fca05446a196 upstream.

A corrupted ext4 may have out of order leaf extents, i.e.

extent: lblk 0--1023, len 1024, pblk 9217, flags: LEAF UNINIT
extent: lblk 1000--2047, len 1024, pblk 10241, flags: LEAF UNINIT
^^^^ overlap with previous extent

Reading such an extent could hit BUG_ON() in ext4_es_cache_extent().

BUG_ON(end < lblk);

The problem is that __read_extent_tree_block() tries to cache holes as
well but assumes 'lblk' is greater than 'prev' and passes an underflowed
length to ext4_es_cache_extent(). Fix it by checking for overlapping
extents in ext4_valid_extent_entries().

I hit this when fuzz testing ext4, and am able to reproduce it by
modifying the on-disk extent by hand.

Also add the check for (ee_block + len - 1) in ext4_valid_extent() to
make sure the value does not overflow.
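
The two checks described above can be sketched in user-space C. This is only an illustration under simplifying assumptions, not the kernel code: `struct sketch_extent` and `sketch_valid_extent_entries()` are hypothetical names, and the real on-disk extent layout and the real ext4_valid_extent_entries() differ.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a leaf extent (not the ext4 layout). */
struct sketch_extent {
    uint32_t lblk; /* first logical block covered by the extent */
    uint32_t len;  /* number of blocks covered; zero is treated as invalid */
};

/*
 * Returns true if the extents are in ascending order, do not overlap,
 * and none of them wraps around the 32-bit logical block space.
 */
static bool sketch_valid_extent_entries(const struct sketch_extent *ext,
                                        size_t n)
{
    uint32_t prev_last = 0; /* last logical block of the previous extent */
    bool have_prev = false;

    for (size_t i = 0; i < n; i++) {
        uint32_t last;

        if (ext[i].len == 0)
            return false;

        /* Unsigned arithmetic wraps, so a wrapped end is below the start. */
        last = ext[i].lblk + ext[i].len - 1;
        if (last < ext[i].lblk)
            return false;

        /* An extent must start after the previous one ends. */
        if (have_prev && ext[i].lblk <= prev_last)
            return false;

        prev_last = last;
        have_prev = true;
    }
    return true;
}
```

With a check like this in place, a caller that computes the hole between two extents as `lblk - prev_last - 1` never sees a negative (underflowed) length, because overlapping entries are rejected first.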

Ran xfstests on the patched ext4 and saw no regressions.

Cc: Lukáš Czerner
Signed-off-by: Eryu Guan
Signed-off-by: "Theodore Ts'o"
Signed-off-by: Greg Kroah-Hartman

Eryu Guan
2014-01-10 04:25:10 +0800
ab69f8ebb ext4: fix use-after-free in ext4_mb_new_blocks

commit 4e8d2139802ce4f41936a687f06c560b12115247 upstream.

ext4_mb_put_pa should hold pa->pa_lock before accessing pa->pa_count.
While ext4_mb_use_preallocated checks pa->pa_deleted first and then
increments pa->pa_count later, ext4_mb_put_pa decrements pa->pa_count
before holding pa->pa_lock and then sets pa->pa_deleted.

* Free sequence
ext4_mb_put_pa (1): atomic_dec_and_test pa->pa_count
ext4_mb_put_pa (2): lock pa->pa_lock
ext4_mb_put_pa (3): check pa->pa_deleted
ext4_mb_put_pa (4): set pa->pa_deleted=1
ext4_mb_put_pa (5): unlock pa->pa_lock
ext4_mb_put_pa (6): remove pa from a list
ext4_mb_pa_callback: free pa

* Use sequence
ext4_mb_use_preallocated (1): iterate over preallocation
ext4_mb_use_preallocated (2): lock pa->pa_lock
ext4_mb_use_preallocated (3): check pa->pa_deleted
ext4_mb_use_preallocated (4): increase pa->pa_count
ext4_mb_use_preallocated (5): unlock pa->pa_lock
ext4_mb_release_context: access pa

* Use-after-free sequence
[initial status] <pa_deleted = 0, pa_count = 1>
ext4_mb_use_preallocated (1): iterate over preallocation
ext4_mb_use_preallocated (2): lock pa->pa_lock
ext4_mb_use_preallocated (3): check pa->pa_deleted
ext4_mb_put_pa (1): atomic_dec_and_test pa->pa_count
[pa_count decremented] <pa_deleted = 0, pa_count = 0>
ext4_mb_use_preallocated (4): increase pa->pa_count
[pa_count incremented] <pa_deleted = 0, pa_count = 1>
ext4_mb_use_preallocated (5): unlock pa->pa_lock
ext4_mb_put_pa (2): lock pa->pa_lock
ext4_mb_put_pa (3): check pa->pa_deleted
ext4_mb_put_pa (4): set pa->pa_deleted=1
[race condition!] <pa_deleted = 1, pa_count = 1>
ext4_mb_put_pa (5): unlock pa->pa_lock
ext4_mb_put_pa (6): remove pa from a list
ext4_mb_pa_callback: free pa
ext4_mb_release_context: access pa
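
The ordering that the sequences above call for (take the lock before touching the count) can be sketched in user-space C. This is a simplified illustration, not the kernel patch: `struct sketch_pa`, `sketch_pa_get()` and `sketch_pa_put()` are hypothetical names, a pthread mutex stands in for pa->pa_lock, and a plain integer guarded by that mutex stands in for the atomic pa->pa_count.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for a preallocation descriptor. */
struct sketch_pa {
    pthread_mutex_t lock; /* plays the role of pa->pa_lock */
    int count;            /* reference count, only touched under lock */
    bool deleted;         /* set once the last reference is dropped */
};

/* "Use" side: take a reference unless the object is already marked deleted. */
static bool sketch_pa_get(struct sketch_pa *pa)
{
    bool ok;

    pthread_mutex_lock(&pa->lock);
    ok = !pa->deleted;
    if (ok)
        pa->count++;
    pthread_mutex_unlock(&pa->lock);
    return ok;
}

/*
 * "Free" side: the decrement happens with the lock already held, so a
 * concurrent sketch_pa_get() cannot run between the decrement and the
 * point where deleted is set. Returns true if the caller dropped the
 * last reference and is now responsible for freeing the object.
 */
static bool sketch_pa_put(struct sketch_pa *pa)
{
    bool last;

    pthread_mutex_lock(&pa->lock);
    last = (--pa->count == 0) && !pa->deleted;
    if (last)
        pa->deleted = true;
    pthread_mutex_unlock(&pa->lock);
    return last;
}
```

Because both the check of `deleted` and the change of `count` sit inside the same critical section on each side, the interleaving shown in the use-after-free sequence (a decrement slipping in between the check and the increment) cannot occur in this sketch.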

AddressSanitizer has detected use-after-free in ext4_mb_new_blocks
Bug report: http://goo.gl/rG1On3

Signed-off-by: Junho Ryu
Signed-off-by: "Theodore Ts'o"
Signed-off-by: Greg Kroah-Hartman

Junho Ryu
2014-01-10 04:25:09 +0800
a3e59ae4f ext4: call ext4_error_inode() if jbd2_journal_dirty_metadata() fails

commit ae1495b12df1897d4f42842a7aa7276d920f6290 upstream.

While it's true that errors can only happen if there is a bug in
jbd2_journal_dirty_metadata(), if a bug does happen, we need to halt
the kernel or remount the file system read-only in order to avoid
further data loss. The ext4_journal_abort_handle() function doesn't
do any of this, and while this call (since it doesn't adjust
refcounts) will likely result in the file system eventually
deadlocking since the current transaction will never be able to close,
it's much cleaner to let ext4's error handling system deal with
this situation.

There's a separate bug here, which is that if certain jbd2 errors
occur and the file system is mounted errors=continue, the file
system will probably eventually grind to a halt as described
above. But things have been this way for a long time, and usually when
we have these sorts of errors it's pretty much a disaster --- and
that's why the jbd2 layer aggressively retries memory allocations,
which is the most likely cause of these jbd2 errors.

Signed-off-by: "Theodore Ts'o"
Reviewed-by: Jan Kara
Signed-off-by: Greg Kroah-Hartman

Theodore Ts'o
2014-01-10 04:25:09 +0800
57a0ea215 xfs: fix infinite loop by detaching the group/project hints from user dquot

commit 718cc6f88cbfc4fbd39609f28c4c86883945f90d upstream.

xfs_quota(8) will hang if you try to turn group/project quota off
before the user quota is off. This can be reproduced 100% of the time by:
# mount -ouquota,gquota /dev/sda7 /xfs
# mkdir /xfs/test
# xfs_quota -xc 'off -g' /xfs <-- hangs up
# echo w > /proc/sysrq-trigger
# dmesg

SysRq : Show Blocked State
task PC stack pid father
xfs_quota D 0000000000000000 0 27574 2551 0x00000000
[snip]
Call Trace:
schedule+0xad/0xc0
schedule_timeout+0x35e/0x3c0
? mark_held_locks+0x176/0x1c0
? call_timer_fn+0x2c0/0x2c0
? xfs_qm_shrink_count+0x30/0x30 [xfs]
schedule_timeout_uninterruptible+0x26/0x30
xfs_qm_dquot_walk+0x235/0x260 [xfs]
? xfs_perag_get+0x1d8/0x2d0 [xfs]
? xfs_perag_get+0x5/0x2d0 [xfs]
? xfs_inode_ag_iterator+0xae/0xf0 [xfs]
? xfs_trans_free_dqinfo+0x50/0x50 [xfs]
? xfs_inode_ag_iterator+0xcf/0xf0 [xfs]
xfs_qm_dqpurge_all+0x66/0xb0 [xfs]
xfs_qm_scall_quotaoff+0x20a/0x5f0 [xfs]
xfs_fs_set_xstate+0x136/0x180 [xfs]
do_quotactl+0x53a/0x6b0
? iput+0x5b/0x90
SyS_quotactl+0x167/0x1d0
? trace_hardirqs_on_thunk+0x3a/0x3f
system_call_fastpath+0x16/0x1b

It's fine if we turn user quota off first and then turn off the other
kinds of quota if they are enabled, since the group/project dquot
refcount drops to zero once the user quota is off. Otherwise,
those dquots' refcounts are non-zero because the user dquots might refer
to them as hint(s). Hence, the above operation causes an infinite loop
in xfs_qm_dquot_walk() while trying to purge the dquot cache.

This problem has been around since Linux 3.4; it was introduced by:
[ b84a3a9675 xfs: remove the per-filesystem list of dquots ]

Originally we would release the group dquot pointers that the user
dquots may be carrying around as a hint via xfs_qm_detach_gdquots().
However, with the above change, no such work is done before
purging the group/project dquot cache.

To solve this problem, this patch introduces a special routine,
xfs_qm_dqpurge_hints(), which releases the group/project dquot
pointers that the user dquots may be carrying around as hints, and then
proceeds to purge the user dquot cache if requested.
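
The effect of dropping the hints before purging can be sketched in user-space C. This is a toy model under stated assumptions, not the XFS code: `struct sketch_dquot`, `sketch_purge_hints()` and `sketch_can_purge()` are hypothetical names, and locking as well as the real dquot cache walk are left out.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified dquot: a refcount plus an optional hint pointer. */
struct sketch_dquot {
    int refcount;                    /* references held on this dquot */
    struct sketch_dquot *group_hint; /* group dquot cached as a hint, or NULL */
};

/* Drop one reference on a dquot. */
static void sketch_put(struct sketch_dquot *dq)
{
    dq->refcount--;
}

/*
 * Detach the group hint from every user dquot and drop the reference the
 * hint was holding, so the group dquot's refcount can reach zero.
 */
static void sketch_purge_hints(struct sketch_dquot *users, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        struct sketch_dquot *group = users[i].group_hint;

        if (group != NULL) {
            users[i].group_hint = NULL;
            sketch_put(group);
        }
    }
}

/* In this model a purge pass may only reclaim a dquot nobody references. */
static bool sketch_can_purge(const struct sketch_dquot *dq)
{
    return dq->refcount == 0;
}
```

In this model, a purge loop that keeps retrying until `sketch_can_purge()` succeeds would spin forever on a group dquot while user dquots still hold hint references to it. Calling `sketch_purge_hints()` first removes those references.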

(cherry picked from commit df8052e7dae00bde6f21b40b6e3e1099770f3afc)

Signed-off-by: Jie Liu
Reviewed-by: Dave Chinner
Signed-off-by: Ben Myers
Signed-off-by: Greg Kroah-Hartman

Jie Liu
2014-01-10 04:25:09 +0800
200067a3f aio: fix kioctx leak introduced by "aio: Fix a trinity splat"

commit 1881686f842065d2f92ec9c6424830ffc17d23b0 upstream.

e34ecee2ae791df674dfb466ce40692ca6218e43 reworked the percpu reference
counting to correct a bug trinity found. Unfortunately, the change led
to kioctxes being leaked because there was no final reference count to
put. Add that reference count back in to fix things.

Signed-off-by: Benjamin LaHaise
Signed-off-by: Greg Kroah-Hartman

Benjamin LaHaise
2014-01-10 04:25:08 +0800
17e38d92d ceph: allocate non-zero page to fscache in readpage()

commit ff638b7df5a9264024a6448bdfde2b2bf5d1994a upstream.

ceph_osdc_readpages() returns the number of bytes read. Currently,
the code only allocates full-zero pages into fscache; this patch
fixes that so non-zero pages are allocated too.

Signed-off-by: Li Wang
Reviewed-by: Milosz Tanski
Reviewed-by: Sage Weil
Signed-off-by: Greg Kroah-Hartman

Li Wang
2014-01-10 04:25:07 +0800
b41958835 ceph: wake up 'safe' waiters when unregistering request

commit fc55d2c9448b34218ca58733a6f51fbede09575b upstream.

We also need to wake up 'safe' waiters if an error occurs or the request
is aborted. Otherwise sync(2)/fsync(2) may hang forever.

Signed-off-by: Yan, Zheng
Signed-off-by: Sage Weil
Signed-off-by: Greg Kroah-Hartman

Yan, Zheng
2014-01-10 04:25:07 +0800