Eric Lee / smarc-fsl-linux-kernel

10 Oct, 2012

1 commit

35c2a7f49 tmpfs,ceph,gfs2,isofs,reiserfs,xfs: fix fh_len checking ... Browse Code »

Fuzzing with trinity oopsed on the 1st instruction of shmem_fh_to_dentry(),
u64 inum = fid->raw[2];
which is unhelpfully reported as at the end of shmem_alloc_inode():

BUG: unable to handle kernel paging request at ffff880061cd3000
IP: [] shmem_alloc_inode+0x40/0x40
Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
Call Trace:
[] ? exportfs_decode_fh+0x79/0x2d0
[] do_handle_open+0x163/0x2c0
[] sys_open_by_handle_at+0xc/0x10
[] tracesys+0xe1/0xe6

Right, tmpfs is being stupid to access fid->raw[2] before validating that
fh_len includes it: the buffer kmalloc'ed by do_sys_name_to_handle() may
fall at the end of a page, and the next page not be present.

But some other filesystems (ceph, gfs2, isofs, reiserfs, xfs) are being
careless about fh_len too, in fh_to_dentry() and/or fh_to_parent(), and
could oops in the same way: add the missing fh_len checks to those.

Reported-by: Sasha Levin
Signed-off-by: Hugh Dickins
Cc: Al Viro
Cc: Sage Weil
Cc: Steven Whitehouse
Cc: Christoph Hellwig
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro

Hugh Dickins
2012-10-10 11:33:55 +0800

09 Oct, 2012

1 commit

0b173bc4d mm: kill vma flag VM_CAN_NONLINEAR ... Browse Code »

Move actual pte filling for non-linear file mappings into the new special
vma operation: ->remap_pages().

Filesystems must implement this method to get non-linear mapping support,
if it uses filemap_fault() then generic_file_remap_pages() can be used.

Now device drivers can implement this method and obtain nonlinear vma support.

Signed-off-by: Konstantin Khlebnikov
Cc: Alexander Viro
Cc: Carsten Otte
Cc: Chris Metcalf #arch/tile
Cc: Cyrill Gorcunov
Cc: Eric Paris
Cc: H. Peter Anvin
Cc: Hugh Dickins
Cc: Ingo Molnar
Cc: James Morris
Cc: Jason Baron
Cc: Kentaro Takeda
Cc: Matt Helsley
Cc: Nick Piggin
Cc: Oleg Nesterov
Cc: Peter Zijlstra
Cc: Robert Richter
Cc: Suresh Siddha
Cc: Tetsuo Handa
Cc: Venkatesh Pallipadi
Acked-by: Linus Torvalds
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Konstantin Khlebnikov
2012-10-09 15:22:17 +0800

03 Oct, 2012

5 commits

60c7b4df8 Merge tag 'for-linus-v3.7-rc1' of git://oss.sgi.com/xfs/xfs ... Browse Code »

Pull xfs update from Ben Myers:
"Several enhancements and cleanups:

- make inode32 and inode64 remountable options
- SEEK_HOLE/SEEK_DATA enhancements
- cleanup struct declarations in xfs_mount.h"

* tag 'for-linus-v3.7-rc1' of git://oss.sgi.com/xfs/xfs:
xfs: Make inode32 a remountable option
xfs: add inode64->inode32 transition into xfs_set_inode32()
xfs: Fix mp->m_maxagi update during inode64 remount
xfs: reduce code duplication handling inode32/64 options
xfs: make inode64 as the default allocation mode
xfs: Fix m_agirotor reset during AG selection
Make inode64 a remountable option
xfs: stop the sync worker before xfs_unmountfs
xfs: xfs_seek_hole() refinement with hole searching from page cache for unwritten extents
xfs: xfs_seek_data() refinement with unwritten extents check up from page cache
xfs: Introduce a helper routine to probe data or hole offset from page cache
xfs: Remove type argument from xfs_seek_data()/xfs_seek_hole()
xfs: fix race while discarding buffers [V4]
xfs: check for possible overflow in xfs_ioc_trim
xfs: unlock the AGI buffer when looping in xfs_dialloc
xfs: kill struct declarations in xfs_mount.h
xfs: fix uninitialised variable in xfs_rtbuf_get()

Linus Torvalds
2012-10-03 11:42:58 +0800
aab174f0d Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs ... Browse Code »

Pull vfs update from Al Viro:

- big one - consolidation of descriptor-related logics; almost all of
that is moved to fs/file.c

(BTW, I'm seriously tempted to rename the result to fd.c. As it is,
we have a situation when file_table.c is about handling of struct
file and file.c is about handling of descriptor tables; the reasons
are historical - file_table.c used to be about a static array of
struct file we used to have way back).

A lot of stray ends got cleaned up and converted to saner primitives,
disgusting mess in android/binder.c is still disgusting, but at least
doesn't poke so much in descriptor table guts anymore. A bunch of
relatively minor races got fixed in process, plus an ext4 struct file
leak.

- related thing - fget_light() partially unuglified; see fdget() in
there (and yes, it generates the code as good as we used to have).

- also related - bits of Cyrill's procfs stuff that got entangled into
that work; _not_ all of it, just the initial move to fs/proc/fd.c and
switch of fdinfo to seq_file.

- Alex's fs/coredump.c spiltoff - the same story, had been easier to
take that commit than mess with conflicts. The rest is a separate
pile, this was just a mechanical code movement.

- a few misc patches all over the place. Not all for this cycle,
there'll be more (and quite a few currently sit in akpm's tree)."

Fix up trivial conflicts in the android binder driver, and some fairly
simple conflicts due to two different changes to the sock_alloc_file()
interface ("take descriptor handling from sock_alloc_file() to callers"
vs "net: Providing protocol type via system.sockprotoname xattr of
/proc/PID/fd entries" adding a dentry name to the socket)

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (72 commits)
MAX_LFS_FILESIZE should be a loff_t
compat: fs: Generic compat_sys_sendfile implementation
fs: push rcu_barrier() from deactivate_locked_super() to filesystems
btrfs: reada_extent doesn't need kref for refcount
coredump: move core dump functionality into its own file
coredump: prevent double-free on an error path in core dumper
usb/gadget: fix misannotations
fcntl: fix misannotations
ceph: don't abuse d_delete() on failure exits
hypfs: ->d_parent is never NULL or negative
vfs: delete surplus inode NULL check
switch simple cases of fget_light to fdget
new helpers: fdget()/fdput()
switch o2hb_region_dev_write() to fget_light()
proc_map_files_readdir(): don't bother with grabbing files
make get_file() return its argument
vhost_set_vring(): turn pollstart/pollstop into bool
switch prctl_set_mm_exe_file() to fget_light()
switch xfs_find_handle() to fget_light()
switch xfs_swapext() to fget_light()
...

Linus Torvalds
2012-10-03 11:25:04 +0800
8c0a85377 fs: push rcu_barrier() from deactivate_locked_super() to filesystems ... Browse Code »

There's no reason to call rcu_barrier() on every
deactivate_locked_super(). We only need to make sure that all delayed rcu
free inodes are flushed before we destroy related cache.

Removing rcu_barrier() from deactivate_locked_super() affects some fast
paths. E.g. on my machine exit_group() of a last process in IPC
namespace takes 0.07538s. rcu_barrier() takes 0.05188s of that time.

Signed-off-by: Kirill A. Shutemov
Cc: Al Viro
Signed-off-by: Andrew Morton
Signed-off-by: Al Viro

Kirill A. Shutemov
2012-10-03 09:35:55 +0800
437589a74 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace ... Browse Code »

Pull user namespace changes from Eric Biederman:
"This is a mostly modest set of changes to enable basic user namespace
support. This allows the code to code to compile with user namespaces
enabled and removes the assumption there is only the initial user
namespace. Everything is converted except for the most complex of the
filesystems: autofs4, 9p, afs, ceph, cifs, coda, fuse, gfs2, ncpfs,
nfs, ocfs2 and xfs as those patches need a bit more review.

The strategy is to push kuid_t and kgid_t values are far down into
subsystems and filesystems as reasonable. Leaving the make_kuid and
from_kuid operations to happen at the edge of userspace, as the values
come off the disk, and as the values come in from the network.
Letting compile type incompatible compile errors (present when user
namespaces are enabled) guide me to find the issues.

The most tricky areas have been the places where we had an implicit
union of uid and gid values and were storing them in an unsigned int.
Those places were converted into explicit unions. I made certain to
handle those places with simple trivial patches.

Out of that work I discovered we have generic interfaces for storing
quota by projid. I had never heard of the project identifiers before.
Adding full user namespace support for project identifiers accounts
for most of the code size growth in my git tree.

Ultimately there will be work to relax privlige checks from
"capable(FOO)" to "ns_capable(user_ns, FOO)" where it is safe allowing
root in a user names to do those things that today we only forbid to
non-root users because it will confuse suid root applications.

While I was pushing kuid_t and kgid_t changes deep into the audit code
I made a few other cleanups. I capitalized on the fact we process
netlink messages in the context of the message sender. I removed
usage of NETLINK_CRED, and started directly using current->tty.

Some of these patches have also made it into maintainer trees, with no
problems from identical code from different trees showing up in
linux-next.

After reading through all of this code I feel like I might be able to
win a game of kernel trivial pursuit."

Fix up some fairly trivial conflicts in netfilter uid/git logging code.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (107 commits)
userns: Convert the ufs filesystem to use kuid/kgid where appropriate
userns: Convert the udf filesystem to use kuid/kgid where appropriate
userns: Convert ubifs to use kuid/kgid
userns: Convert squashfs to use kuid/kgid where appropriate
userns: Convert reiserfs to use kuid and kgid where appropriate
userns: Convert jfs to use kuid/kgid where appropriate
userns: Convert jffs2 to use kuid and kgid where appropriate
userns: Convert hpfs to use kuid and kgid where appropriate
userns: Convert btrfs to use kuid/kgid where appropriate
userns: Convert bfs to use kuid/kgid where appropriate
userns: Convert affs to use kuid/kgid wherwe appropriate
userns: On alpha modify linux_to_osf_stat to use convert from kuids and kgids
userns: On ia64 deal with current_uid and current_gid being kuid and kgid
userns: On ppc convert current_uid from a kuid before printing.
userns: Convert s390 getting uid and gid system calls to use kuid and kgid
userns: Convert s390 hypfs to use kuid and kgid where appropriate
userns: Convert binder ipc to use kuids
userns: Teach security_path_chown to take kuids and kgids
userns: Add user namespace support to IMA
userns: Convert EVM to deal with kuids and kgids in it's hmac computation
...

Linus Torvalds
2012-10-03 02:11:09 +0800
033d9959e Merge branch 'for-3.7' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq ... Browse Code »

Pull workqueue changes from Tejun Heo:
"This is workqueue updates for v3.7-rc1. A lot of activities this
round including considerable API and behavior cleanups.

* delayed_work combines a timer and a work item. The handling of the
timer part has always been a bit clunky leading to confusing
cancelation API with weird corner-case behaviors. delayed_work is
updated to use new IRQ safe timer and cancelation now works as
expected.

* Another deficiency of delayed_work was lack of the counterpart of
mod_timer() which led to cancel+queue combinations or open-coded
timer+work usages. mod_delayed_work[_on]() are added.

These two delayed_work changes make delayed_work provide interface
and behave like timer which is executed with process context.

* A work item could be executed concurrently on multiple CPUs, which
is rather unintuitive and made flush_work() behavior confusing and
half-broken under certain circumstances. This problem doesn't
exist for non-reentrant workqueues. While non-reentrancy check
isn't free, the overhead is incurred only when a work item bounces
across different CPUs and even in simulated pathological scenario
the overhead isn't too high.

All workqueues are made non-reentrant. This removes the
distinction between flush_[delayed_]work() and
flush_[delayed_]_work_sync(). The former is now as strong as the
latter and the specified work item is guaranteed to have finished
execution of any previous queueing on return.

* In addition to the various bug fixes, Lai redid and simplified CPU
hotplug handling significantly.

* Joonsoo introduced system_highpri_wq and used it during CPU
hotplug.

There are two merge commits - one to pull in IRQ safe timer from
tip/timers/core and the other to pull in CPU hotplug fixes from
wq/for-3.6-fixes as Lai's hotplug restructuring depended on them."

Fixed a number of trivial conflicts, but the more interesting conflicts
were silent ones where the deprecated interfaces had been used by new
code in the merge window, and thus didn't cause any real data conflicts.

Tejun pointed out a few of them, I fixed a couple more.

* 'for-3.7' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: (46 commits)
workqueue: remove spurious WARN_ON_ONCE(in_irq()) from try_to_grab_pending()
workqueue: use cwq_set_max_active() helper for workqueue_set_max_active()
workqueue: introduce cwq_set_max_active() helper for thaw_workqueues()
workqueue: remove @delayed from cwq_dec_nr_in_flight()
workqueue: fix possible stall on try_to_grab_pending() of a delayed work item
workqueue: use hotcpu_notifier() for workqueue_cpu_down_callback()
workqueue: use __cpuinit instead of __devinit for cpu callbacks
workqueue: rename manager_mutex to assoc_mutex
workqueue: WORKER_REBIND is no longer necessary for idle rebinding
workqueue: WORKER_REBIND is no longer necessary for busy rebinding
workqueue: reimplement idle worker rebinding
workqueue: deprecate __cancel_delayed_work()
workqueue: reimplement cancel_delayed_work() using try_to_grab_pending()
workqueue: use mod_delayed_work() instead of __cancel + queue
workqueue: use irqsafe timer for delayed_work
workqueue: clean up delayed_work initializers and add missing one
workqueue: make deferrable delayed_work initializer names consistent
workqueue: cosmetic whitespace updates for macro definitions
workqueue: deprecate system_nrt[_freezable]_wq
workqueue: deprecate flush[_delayed]_work_sync()
...

Linus Torvalds
2012-10-03 00:54:49 +0800

27 Sep, 2012

10 commits

2903ff019 switch simple cases of fget_light to fdget ... Browse Code »

Signed-off-by: Al Viro

Al Viro
2012-09-27 10:20:08 +0800
64e09fa2e switch xfs_find_handle() to fget_light() ... Browse Code »

Signed-off-by: Al Viro

Al Viro
2012-09-27 09:10:11 +0800
1ea65c960 switch xfs_swapext() to fget_light() ... Browse Code »

Signed-off-by: Al Viro

Al Viro
2012-09-27 09:10:11 +0800
2ea039298 xfs: Make inode32 a remountable option ... Browse Code »

As inode64 is the default option now, and was also made remountable
previously, inode32 can also be remounted on-the-fly when it is needed.

Signed-off-by: Carlos Maiolino
Reviewed-by: Christoph Hellwig
Reviewed-by: Mark Tinguely
Signed-off-by: Ben Myers

Carlos Maiolino
2012-09-27 05:01:28 +0800
4056c1d08 xfs: add inode64->inode32 transition into xfs_set_inode32() ... Browse Code »

To make inode32 a remountable option, xfs_set_inode32() should be able
to make a transition from inode64 option, disabling inode allocation on
higher AGs.

Signed-off-by: Carlos Maiolino
Reviewed-by: Christoph Hellwig
Reviewed-by: Mark Tinguely
Signed-off-by: Ben Myers

Carlos Maiolino
2012-09-27 04:59:50 +0800
4c0837224 xfs: Fix mp->m_maxagi update during inode64 remount ... Browse Code »

With the changes made on xfs_set_inode64(), to make it behave as
xfs_set_inode32() (now leaving to the caller the responsibility to update
mp->m_maxagi), we use the return value of xfs_set_inode64() to update
mp->m_maxagi during remount.

Signed-off-by: Carlos Maiolino
Reviewed-by: Mark Tinguely
Signed-off-by: Ben Myers

Carlos Maiolino
2012-09-27 04:58:21 +0800
2d2194f61 xfs: reduce code duplication handling inode32/64 options ... Browse Code »

Add xfs_set_inode32() to be used to enable inode32 allocation mode. this
will reduce the amount of duplicated code needed to mount/remount a
filesystem with inode32 option. This patch also changes
xfs_set_inode64() to return the maximum AG number that inodes can be
allocated instead of set mp->m_maxagi by itself, so that the behaviour
is the same as xfs_set_inode32(). This simplifies code that calls these
functions and needs to know the maximum AG that inodes can be allocated
in.

Signed-off-by: Carlos Maiolino
Reviewed-by: Christoph Hellwig
Reviewed-by: Mark Tinguely
Signed-off-by: Ben Myers

Carlos Maiolino
2012-09-27 04:56:33 +0800
08bf54041 xfs: make inode64 as the default allocation mode ... Browse Code »

since 64-bit inodes can be accessed while using inode32, and these can
also be used on 32-bit kernels, there is no reason to still keep inode32
as the default mount option. If the filesystem cannot handle 64bit
inode numbers (i.e CONFIG_LBDAF is not enabled and BITS_PER_LONG == 32),
XFS_MOUNT_SMALL_INUMS will still be set by default, so inode64 is not an
unconditional default value.

Signed-off-by: Carlos Maiolino
Reviewed-by: Christoph Hellwig
Reviewed-by: Mark Tinguely
Signed-off-by: Ben Myers

Carlos Maiolino
2012-09-27 04:54:19 +0800
8aea3ff41 xfs: Fix m_agirotor reset during AG selection ... Browse Code »

xfs_ialloc_next_ag() currently resets m_agirotor when it is equal to
m_maxagi:

if (++mp->m_agirotor == mp->m_maxagi)
mp->m_agirotor = 0;

But, if for some reason mp->m_maxagi changes to a lower value than
current m_agirotor, this condition will never be true, causing
m_agirotor to exceed the maximum allowed value (m_maxagi).

This implies mainly during lookups for xfs_perag structs in its radix
tree, since the agno value used for the lookup is based on m_agirotor.
An out-of-range m_agirotor may cause a lookup failure which in case will
return NULL.

As an example, the value of m_maxagi is decreased during
inode64->inode32 remount process, case where I've found this problem.

Signed-off-by: Carlos Maiolino
Reviewed-by: Christoph Hellwig
Reviewed-by: Mark Tinguely
Signed-off-by: Ben Myers

Carlos Maiolino
2012-09-27 04:42:42 +0800
c3a58fecd Make inode64 a remountable option ... Browse Code »

Actually, there is no reason about why a user must umount and mount a
XFS filesystem to enable 'inode64' option. So, this patch makes this a
remountable option.

Signed-off-by: Carlos Maiolino
Reviewed-by: Brian Foster
Reviewed-by: Dave Chinner
Signed-off-by: Ben Myers

Carlos Maiolino
2012-09-27 04:41:39 +0800

19 Sep, 2012

1 commit

0ba6e5368 xfs: stop the sync worker before xfs_unmountfs ... Browse Code »

Cancel work of the xfs_sync_worker before teardown of the log in
xfs_unmountfs. This prevents occasional crashes on unmount like so:

PID: 21602 TASK: ee9df060 CPU: 0 COMMAND: "kworker/0:3"
#0 [c5377d28] crash_kexec at c0292c94
#1 [c5377d80] oops_end at c07090c2
#2 [c5377d98] no_context at c06f614e
#3 [c5377dbc] __bad_area_nosemaphore at c06f6281
#4 [c5377df4] bad_area_nosemaphore at c06f629b
#5 [c5377e00] do_page_fault at c070b0cb
#6 [c5377e7c] error_code (via page_fault) at c070892c
EAX: f300c6a8 EBX: f300c6a8 ECX: 000000c0 EDX: 000000c0 EBP: c5377ed0
DS: 007b ESI: 00000000 ES: 007b EDI: 00000001 GS: ffffad20
CS: 0060 EIP: c0481ad0 ERR: ffffffff EFLAGS: 00010246
#7 [c5377eb0] atomic64_read_cx8 at c0481ad0
#8 [c5377ebc] xlog_assign_tail_lsn_locked at f7cc7c6e [xfs]
#9 [c5377ed4] xfs_trans_ail_delete_bulk at f7ccd520 [xfs]
#10 [c5377f0c] xfs_buf_iodone at f7ccb602 [xfs]
#11 [c5377f24] xfs_buf_do_callbacks at f7cca524 [xfs]
#12 [c5377f30] xfs_buf_iodone_callbacks at f7cca5da [xfs]
#13 [c5377f4c] xfs_buf_iodone_work at f7c718d0 [xfs]
#14 [c5377f58] process_one_work at c024ee4c
#15 [c5377f98] worker_thread at c024f43d
#16 [c5377fbc] kthread at c025326b
#17 [c5377fe8] kernel_thread_helper at c070e834

PID: 26653 TASK: e79143b0 CPU: 3 COMMAND: "umount"
#0 [cde0fda0] __schedule at c0706595
#1 [cde0fe28] schedule at c0706b89
#2 [cde0fe30] schedule_timeout at c0705600
#3 [cde0fe94] __down_common at c0706098
#4 [cde0fec8] __down at c0706122
#5 [cde0fed0] down at c025936f
#6 [cde0fee0] xfs_buf_lock at f7c7131d [xfs]
#7 [cde0ff00] xfs_freesb at f7cc2236 [xfs]
#8 [cde0ff10] xfs_fs_put_super at f7c80f21 [xfs]
#9 [cde0ff1c] generic_shutdown_super at c0333d7a
#10 [cde0ff38] kill_block_super at c0333e0f
#11 [cde0ff48] deactivate_locked_super at c0334218
#12 [cde0ff58] deactivate_super at c033495d
#13 [cde0ff68] mntput_no_expire at c034bc13
#14 [cde0ff7c] sys_umount at c034cc69
#15 [cde0ffa0] sys_oldumount at c034ccd4
#16 [cde0ffb0] system_call at c0707e66

commit 11159a05 added this to xfs_log_unmount and needs to be cleaned up
at a later date.

Signed-off-by: Ben Myers
Reviewed-by: Dave Chinner
Reviewed-by: Mark Tinguely

Ben Myers
2012-09-19 05:51:26 +0800

18 Sep, 2012

4 commits

431f19744 userns: Convert quota netlink aka quota_send_warning ... Browse Code »

Modify quota_send_warning to take struct kqid instead a type and
identifier pair.

When sending netlink broadcasts always convert uids and quota
identifiers into the intial user namespace. There is as yet no way to
send a netlink broadcast message with different contents to receivers
in different namespaces, so for the time being just map all of the
identifiers into the initial user namespace which preserves the
current behavior.

Change the callers of quota_send_warning in gfs2, xfs and dquot
to generate a struct kqid to pass to quota send warning. When
all of the user namespaces convesions are complete a struct kqid
values will be availbe without need for conversion, but a conversion
is needed now to avoid needing to convert everything at once.

Cc: Ben Myers
Cc: Alex Elder
Cc: Dave Chinner
Cc: Jan Kara
Cc: Steven Whitehouse
Signed-off-by: "Eric W. Biederman"

Eric W. Biederman
2012-09-18 16:01:40 +0800
74a8a1037 userns: Convert qutoactl ... Browse Code »

Update the quotactl user space interface to successfull compile with
user namespaces support enabled and to hand off quota identifiers to
lower layers of the kernel in struct kqid instead of type and qid
pairs.

The quota on function is not converted because while it takes a quota
type and an id. The id is the on disk quota format to use, which
is something completely different.

The signature of two struct quotactl_ops methods were changed to take
struct kqid argumetns get_dqblk and set_dqblk.

The dquot, xfs, and ocfs2 implementations of get_dqblk and set_dqblk
are minimally changed so that the code continues to work with
the change in parameter type.

This is the first in a series of changes to always store quota
identifiers in the kernel in struct kqid and only use raw type and qid
values when interacting with on disk structures or userspace. Always
using struct kqid internally makes it hard to miss places that need
conversion to or from the kernel internal values.

Cc: Jan Kara
Cc: Dave Chinner
Cc: Mark Fasheh
Cc: Joel Becker
Cc: Ben Myers
Cc: Alex Elder
Signed-off-by: "Eric W. Biederman"

Eric W. Biederman
2012-09-18 16:01:39 +0800
5f3a4a28e userns: Pass a userns parameter into posix_acl_to_xattr and posix_acl_from_xattr ... Browse Code »

- Pass the user namespace the uid and gid values in the xattr are stored
in into posix_acl_from_xattr.

- Pass the user namespace kuid and kgid values should be converted into
when storing uid and gid values in an xattr in posix_acl_to_xattr.

- Modify all callers of posix_acl_from_xattr and posix_acl_to_xattr to
pass in &init_user_ns.

In the short term this change is not strictly needed but it makes the
code clearer. In the longer term this change is necessary to be able to
mount filesystems outside of the initial user namespace that natively
store posix acls in the linux xattr format.

Cc: Theodore Tso
Cc: Andrew Morton
Cc: Andreas Dilger
Cc: Jan Kara
Cc: Al Viro
Signed-off-by: "Eric W. Biederman"

Eric W. Biederman
2012-09-18 16:01:35 +0800
4026c9fde xfs: stop the sync worker before xfs_unmountfs ... Browse Code »

Cancel work of the xfs_sync_worker before teardown of the log in
xfs_unmountfs. This prevents occasional crashes on unmount like so:

PID: 21602 TASK: ee9df060 CPU: 0 COMMAND: "kworker/0:3"
#0 [c5377d28] crash_kexec at c0292c94
#1 [c5377d80] oops_end at c07090c2
#2 [c5377d98] no_context at c06f614e
#3 [c5377dbc] __bad_area_nosemaphore at c06f6281
#4 [c5377df4] bad_area_nosemaphore at c06f629b
#5 [c5377e00] do_page_fault at c070b0cb
#6 [c5377e7c] error_code (via page_fault) at c070892c
EAX: f300c6a8 EBX: f300c6a8 ECX: 000000c0 EDX: 000000c0 EBP: c5377ed0
DS: 007b ESI: 00000000 ES: 007b EDI: 00000001 GS: ffffad20
CS: 0060 EIP: c0481ad0 ERR: ffffffff EFLAGS: 00010246
#7 [c5377eb0] atomic64_read_cx8 at c0481ad0
#8 [c5377ebc] xlog_assign_tail_lsn_locked at f7cc7c6e [xfs]
#9 [c5377ed4] xfs_trans_ail_delete_bulk at f7ccd520 [xfs]
#10 [c5377f0c] xfs_buf_iodone at f7ccb602 [xfs]
#11 [c5377f24] xfs_buf_do_callbacks at f7cca524 [xfs]
#12 [c5377f30] xfs_buf_iodone_callbacks at f7cca5da [xfs]
#13 [c5377f4c] xfs_buf_iodone_work at f7c718d0 [xfs]
#14 [c5377f58] process_one_work at c024ee4c
#15 [c5377f98] worker_thread at c024f43d
#16 [c5377fbc] kthread at c025326b
#17 [c5377fe8] kernel_thread_helper at c070e834

PID: 26653 TASK: e79143b0 CPU: 3 COMMAND: "umount"
#0 [cde0fda0] __schedule at c0706595
#1 [cde0fe28] schedule at c0706b89
#2 [cde0fe30] schedule_timeout at c0705600
#3 [cde0fe94] __down_common at c0706098
#4 [cde0fec8] __down at c0706122
#5 [cde0fed0] down at c025936f
#6 [cde0fee0] xfs_buf_lock at f7c7131d [xfs]
#7 [cde0ff00] xfs_freesb at f7cc2236 [xfs]
#8 [cde0ff10] xfs_fs_put_super at f7c80f21 [xfs]
#9 [cde0ff1c] generic_shutdown_super at c0333d7a
#10 [cde0ff38] kill_block_super at c0333e0f
#11 [cde0ff48] deactivate_locked_super at c0334218
#12 [cde0ff58] deactivate_super at c033495d
#13 [cde0ff68] mntput_no_expire at c034bc13
#14 [cde0ff7c] sys_umount at c034cc69
#15 [cde0ffa0] sys_oldumount at c034ccd4
#16 [cde0ffb0] system_call at c0707e66

commit 11159a05 added this to xfs_log_unmount and needs to be cleaned up
at a later date.

Signed-off-by: Ben Myers
Reviewed-by: Dave Chinner
Reviewed-by: Mark Tinguely

Ben Myers
2012-09-18 01:05:22 +0800

30 Aug, 2012

1 commit

6fb8a90aa xfs: fix race while discarding buffers [V4] ... Browse Code »

While xfs_buftarg_shrink() is freeing buffers from the dispose list (filled with
buffers from lru list), there is a possibility to have xfs_buf_stale() racing
with it, and removing buffers from dispose list before xfs_buftarg_shrink() does
it.

This happens because xfs_buftarg_shrink() handle the dispose list without
locking and the test condition in xfs_buf_stale() checks for the buffer being in
*any* list:

if (!list_empty(&bp->b_lru))

If the buffer happens to be on dispose list, this causes the buffer counter of
lru list (btp->bt_lru_nr) to be decremented twice (once in xfs_buftarg_shrink()
and another in xfs_buf_stale()) causing a wrong account usage of the lru list.

This may cause xfs_buftarg_shrink() to return a wrong value to the memory
shrinker shrink_slab(), and such account error may also cause an underflowed
value to be returned; since the counter is lower than the current number of
items in the lru list, a decrement may happen when the counter is 0, causing
an underflow on the counter.

The fix uses a new flag field (and a new buffer flag) to serialize buffer
handling during the shrink process. The new flag field has been designed to use
btp->bt_lru_lock/unlock instead of xfs_buf_lock/unlock mechanism.

dchinner, sandeen, aquini and aris also deserve credits for this.

Signed-off-by: Carlos Maiolino
Reviewed-by: Ben Myers
Reviewed-by: Dave Chinner
Signed-off-by: Ben Myers

Carlos Maiolino
2012-08-30 04:01:11 +0800

25 Aug, 2012

5 commits

b686d1f79 xfs: xfs_seek_hole() refinement with hole searching from page cache for unwritten extents ... Browse Code »

xfs_seek_hole() refinement with hole searching from page cache for unwritten extent.

Signed-off-by: Jie Liu
Reviewed-by: Mark Tinguely
Reviewed-by: Dave Chinner
Signed-off-by: Ben Myers

Jeff Liu
2012-08-25 02:57:10 +0800
52f1acc8b xfs: xfs_seek_data() refinement with unwritten extents check up from page cache ... Browse Code »

xfs_seek_data() refinement with unwritten extents check up from page cache.

Signed-off-by: Jie Liu
Reviewed-by: Mark Tinguely
Reviewed-by: Dave Chinner
Signed-off-by: Ben Myers

Jeff Liu
2012-08-25 02:56:29 +0800
d126d43f6 xfs: Introduce a helper routine to probe data or hole offset from page cache ... Browse Code »

Introduce helpers to probe data or hole offset from page cache.

Signed-off-by: Jie Liu
Reviewed-by: Mark Tinguely
Reviewed-by: Dave Chinner
Signed-off-by: Ben Myers

Jeff Liu
2012-08-25 02:55:09 +0800
834ab1222 xfs: Remove type argument from xfs_seek_data()/xfs_seek_hole() ... Browse Code »

The type is already indicated by the function naming explicitly, so this argument
can be omitted from those calls.

Signed-off-by: Jie Liu
Reviewed-by: Mark Tinguely
Reviewed-by: Dave Chinner
Signed-off-by: Ben Myers

Jeff Liu
2012-08-25 02:48:05 +0800
e599b3253 xfs: fix race while discarding buffers [V4] ... Browse Code »

While xfs_buftarg_shrink() is freeing buffers from the dispose list (filled with
buffers from lru list), there is a possibility to have xfs_buf_stale() racing
with it, and removing buffers from dispose list before xfs_buftarg_shrink() does
it.

This happens because xfs_buftarg_shrink() handle the dispose list without
locking and the test condition in xfs_buf_stale() checks for the buffer being in
*any* list:

if (!list_empty(&bp->b_lru))

If the buffer happens to be on dispose list, this causes the buffer counter of
lru list (btp->bt_lru_nr) to be decremented twice (once in xfs_buftarg_shrink()
and another in xfs_buf_stale()) causing a wrong account usage of the lru list.

This may cause xfs_buftarg_shrink() to return a wrong value to the memory
shrinker shrink_slab(), and such account error may also cause an underflowed
value to be returned; since the counter is lower than the current number of
items in the lru list, a decrement may happen when the counter is 0, causing
an underflow on the counter.

The fix uses a new flag field (and a new buffer flag) to serialize buffer
handling during the shrink process. The new flag field has been designed to use
btp->bt_lru_lock/unlock instead of xfs_buf_lock/unlock mechanism.

dchinner, sandeen, aquini and aris also deserve credits for this.

Signed-off-by: Carlos Maiolino
Reviewed-by: Ben Myers
Reviewed-by: Dave Chinner
Signed-off-by: Ben Myers

Carlos Maiolino
2012-08-25 02:46:10 +0800

24 Aug, 2012

3 commits

a672e1be3 xfs: check for possible overflow in xfs_ioc_trim ... Browse Code »

If range.start or range.minlen is bigger than filesystem size, return
invalid value error. This fixes possible overflow in BTOBB macro when
passed value was nearly ULLONG_MAX.

Signed-off-by: Tomas Racek
Reviewed-by: Dave Chinner
Signed-off-by: Ben Myers

Tomas Racek
2012-08-24 03:48:44 +0800
761290309 xfs: unlock the AGI buffer when looping in xfs_dialloc ... Browse Code »

Also update some commens in the area to make the code easier to read.

Signed-off-by: Christoph Hellwig
Reviewed-by: Mark Tinguely
Reviewed-by: Dave Chinner
Signed-off-by: Ben Myers

Christoph Hellwig
2012-08-24 03:48:32 +0800
0b9e3f6d8 xfs: fix uninitialised variable in xfs_rtbuf_get() ... Browse Code »

Results in this assert failure in generic/090:

XFS: Assertion failed: *nmap >= 1, file: fs/xfs/xfs_bmap.c, line: 4363
.....
Call Trace:
[] xfs_bmapi_read+0x6b/0x370
[] xfs_rtbuf_get+0x42/0x130
[] xfs_rtget_summary+0x89/0x120
[] xfs_rtallocate_extent_size+0xce/0x340
[] xfs_rtallocate_extent+0x240/0x290
[] xfs_bmap_rtalloc+0x1ba/0x340
[] xfs_bmap_alloc+0x35/0x40
[] xfs_bmapi_allocate+0xf1/0x350
[] xfs_bmapi_write+0x66e/0xa60
[] xfs_iomap_write_direct+0x22a/0x3f0
[] __xfs_get_blocks+0x38b/0x5d0
[] xfs_get_blocks_direct+0x14/0x20
[] do_blockdev_direct_IO+0xf71/0x1eb0
[] __blockdev_direct_IO+0x55/0x60
[] xfs_vm_direct_IO+0x11a/0x1e0
[] generic_file_direct_write+0xd7/0x1b0
[] xfs_file_dio_aio_write+0x13c/0x320
[] xfs_file_aio_write+0x1c2/0x1d0
[] do_sync_write+0xa7/0xe0
[] vfs_write+0xa8/0x160
[] sys_pwrite64+0x92/0xb0
[] system_call_fastpath+0x16/0x1b

Signed-off-by: Dave Chinner
Signed-off-by: Ben Myers

Dave Chinner
2012-08-24 03:48:16 +0800

21 Aug, 2012

1 commit

43829731d workqueue: deprecate flush[_delayed]_work_sync() ... Browse Code »

flush[_delayed]_work_sync() are now spurious. Mark them deprecated
and convert all users to flush[_delayed]_work().

If you're cc'd and wondering what's going on: Now all workqueues are
non-reentrant and the regular flushes guarantee that the work item is
not pending or running on any CPU on return, so there's no reason to
use the sync flushes at all and they're going away.

This patch doesn't make any functional difference.

Signed-off-by: Tejun Heo
Cc: Russell King
Cc: Paul Mundt
Cc: Ian Campbell
Cc: Jens Axboe
Cc: Mattia Dongili
Cc: Kent Yoder
Cc: David Airlie
Cc: Jiri Kosina
Cc: Karsten Keil
Cc: Bryan Wu
Cc: Benjamin Herrenschmidt
Cc: Alasdair Kergon
Cc: Mauro Carvalho Chehab
Cc: Florian Tobias Schandinat
Cc: David Woodhouse
Cc: "David S. Miller"
Cc: linux-wireless@vger.kernel.org
Cc: Anton Vorontsov
Cc: Sangbeom Kim
Cc: "James E.J. Bottomley"
Cc: Greg Kroah-Hartman
Cc: Eric Van Hensbergen
Cc: Takashi Iwai
Cc: Steven Whitehouse
Cc: Petr Vandrovec
Cc: Mark Fasheh
Cc: Christoph Hellwig
Cc: Avi Kivity

Tejun Heo
2012-08-21 05:51:24 +0800

17 Aug, 2012

4 commits

643bfc061 xfs: check for possible overflow in xfs_ioc_trim ... Browse Code »

If range.start or range.minlen is bigger than filesystem size, return
invalid value error. This fixes possible overflow in BTOBB macro when
passed value was nearly ULLONG_MAX.

Signed-off-by: Tomas Racek
Reviewed-by: Dave Chinner
Signed-off-by: Ben Myers

Tomas Racek
2012-08-17 05:42:52 +0800
c4982110a xfs: unlock the AGI buffer when looping in xfs_dialloc ... Browse Code »

Also update some commens in the area to make the code easier to read.

Signed-off-by: Christoph Hellwig
Reviewed-by: Mark Tinguely
Reviewed-by: Dave Chinner
Signed-off-by: Ben Myers

Christoph Hellwig
2012-08-17 05:23:59 +0800
1ed845df6 xfs: kill struct declarations in xfs_mount.h ... Browse Code »

I noticed that "struct xfs_mount_args" was still declared in
"fs/xfs/xfs_mount.h". That struct doesn't even exist any more (and
is obviously not referenced elsewhere in that header file). While
in there, delete four other unneeded struct declarations in that
file.

Doing so highlights that "fs/xfs/xfs_trace.h" was relying indirectly
on "xfs_mount.h" to be #included in order to declare "struct
xfs_bmbt_irec", so add that declaration to resolve that issue.

Signed-off-by: Alex Elder
Signed-off-by: Ben Myers

Alex Elder
2012-08-17 02:29:35 +0800
a76cccbee xfs: fix uninitialised variable in xfs_rtbuf_get() ... Browse Code »

Results in this assert failure in generic/090:

XFS: Assertion failed: *nmap >= 1, file: fs/xfs/xfs_bmap.c, line: 4363
.....
Call Trace:
[] xfs_bmapi_read+0x6b/0x370
[] xfs_rtbuf_get+0x42/0x130
[] xfs_rtget_summary+0x89/0x120
[] xfs_rtallocate_extent_size+0xce/0x340
[] xfs_rtallocate_extent+0x240/0x290
[] xfs_bmap_rtalloc+0x1ba/0x340
[] xfs_bmap_alloc+0x35/0x40
[] xfs_bmapi_allocate+0xf1/0x350
[] xfs_bmapi_write+0x66e/0xa60
[] xfs_iomap_write_direct+0x22a/0x3f0
[] __xfs_get_blocks+0x38b/0x5d0
[] xfs_get_blocks_direct+0x14/0x20
[] do_blockdev_direct_IO+0xf71/0x1eb0
[] __blockdev_direct_IO+0x55/0x60
[] xfs_vm_direct_IO+0x11a/0x1e0
[] generic_file_direct_write+0xd7/0x1b0
[] xfs_file_dio_aio_write+0x13c/0x320
[] xfs_file_aio_write+0x1c2/0x1d0
[] do_sync_write+0xa7/0xe0
[] vfs_write+0xa8/0x160
[] sys_pwrite64+0x92/0xb0
[] system_call_fastpath+0x16/0x1b

Signed-off-by: Dave Chinner
Signed-off-by: Ben Myers

Dave Chinner
2012-08-17 01:53:12 +0800

02 Aug, 2012

1 commit

a0e881b7c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs ... Browse Code »

Pull second vfs pile from Al Viro:
"The stuff in there: fsfreeze deadlock fixes by Jan (essentially, the
deadlock reproduced by xfstests 068), symlink and hardlink restriction
patches, plus assorted cleanups and fixes.

Note that another fsfreeze deadlock (emergency thaw one) is *not*
dealt with - the series by Fernando conflicts a lot with Jan's, breaks
userland ABI (FIFREEZE semantics gets changed) and trades the deadlock
for massive vfsmount leak; this is going to be handled next cycle.
There probably will be another pull request, but that stuff won't be
in it."

Fix up trivial conflicts due to unrelated changes next to each other in
drivers/{staging/gdm72xx/usb_boot.c, usb/gadget/storage_common.c}

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (54 commits)
delousing target_core_file a bit
Documentation: Correct s_umount state for freeze_fs/unfreeze_fs
fs: Remove old freezing mechanism
ext2: Implement freezing
btrfs: Convert to new freezing mechanism
nilfs2: Convert to new freezing mechanism
ntfs: Convert to new freezing mechanism
fuse: Convert to new freezing mechanism
gfs2: Convert to new freezing mechanism
ocfs2: Convert to new freezing mechanism
xfs: Convert to new freezing code
ext4: Convert to new freezing mechanism
fs: Protect write paths by sb_start_write - sb_end_write
fs: Skip atime update on frozen filesystem
fs: Add freezing handling to mnt_want_write() / mnt_drop_write()
fs: Improve filesystem freezing handling
switch the protection of percpu_counter list to spinlock
nfsd: Push mnt_want_write() outside of i_mutex
btrfs: Push mnt_want_write() outside of i_mutex
fat: Push mnt_want_write() outside of i_mutex
...

Linus Torvalds
2012-08-02 01:26:23 +0800

31 Jul, 2012

2 commits

d9457dc05 xfs: Convert to new freezing code ... Browse Code »

Generic code now blocks all writers from standard write paths. So we add
blocking of all writers coming from ioctl (we get a protection of ioctl against
racing remount read-only as a bonus) and convert xfs_file_aio_write() to a
non-racy freeze protection. We also keep freeze protection on transaction
start to block internal filesystem writes such as removal of preallocated
blocks.

CC: Ben Myers
CC: Alex Elder
CC: xfs@oss.sgi.com
Signed-off-by: Jan Kara
Signed-off-by: Al Viro

Jan Kara
2012-07-31 13:45:48 +0800
37cd9600a Merge tag 'for-linus-v3.6-rc1' of git://oss.sgi.com/xfs/xfs ... Browse Code »

Pull xfs update from Ben Myers:
"Numerous cleanups and several bug fixes. Here are some highlights:

- Discontiguous directory buffer support
- Inode allocator refactoring
- Removal of the IO lock in inode reclaim
- Implementation of .update_time
- Fix for handling of EOF in xfs_vm_writepage
- Fix for races in xfsaild, and idle mode is re-enabled
- Fix for a crash in xfs_buf completion handlers on unmount."

Fix up trivial conflicts in fs/xfs/{xfs_buf.c,xfs_log.c,xfs_log_priv.h}
due to duplicate patches that had already been merged for 3.5.

* tag 'for-linus-v3.6-rc1' of git://oss.sgi.com/xfs/xfs: (44 commits)
xfs: wait for the write the superblock on unmount
xfs: re-enable xfsaild idle mode and fix associated races
xfs: remove iolock lock classes
xfs: avoid the iolock in xfs_free_eofblocks for evicted inodes
xfs: do not take the iolock in xfs_inactive
xfs: remove xfs_inactive_attrs
xfs: clean up xfs_inactive
xfs: do not read the AGI buffer in xfs_dialloc until nessecary
xfs: refactor xfs_ialloc_ag_select
xfs: add a short cut to xfs_dialloc for the non-NULL agbp case
xfs: remove the alloc_done argument to xfs_dialloc
xfs: split xfs_dialloc
xfs: remove xfs_ialloc_find_free
Prefix IO_XX flags with XFS_IO_XX to avoid namespace colision.
xfs: remove xfs_inotobp
xfs: merge xfs_itobp into xfs_imap_to_bp
xfs: handle EOF correctly in xfs_vm_writepage
xfs: implement ->update_time
xfs: fix comment typo of struct xfs_da_blkinfo.
xfs: do not call xfs_bdstrat_cb in xfs_buf_iodone_callbacks
...

Linus Torvalds
2012-07-31 04:37:53 +0800

30 Jul, 2012

1 commit

9a57fa8ee xfs: wait for the write the superblock on unmount ... Browse Code »

v2: Add the xfs_buf_lock to xfs_quiesce_attr().
Add explaination why xfs_buf_lock() is used to wait for write.

xfs_wait_buftarg() does not wait for the completion of the write of the
uncached superblock. This write can race with the shutdown of the log
and causes a panic if the write does not win the race.

During the log write, xfsaild_push() will lock the buffer and set the
XBF_ASYNC flag. Because the XBF_FLAG is set, complete() is not performed
on the buffer's iowait entry, we cannot call xfs_buf_iowait() to wait
for the write to complete. The buffer's lock is held until the write is
complete, so we can block on a xfs_buf_lock() request to be notified
that the write is complete.

Signed-off-by: Mark Tinguely
Reviewed-by: Christoph Hellwig
Signed-off-by: Ben Myers

Mark Tinguely
2012-07-30 05:34:19 +0800