Eric Lee / smarc-fsl-linux-kernel

21 Dec, 2012

5 commits

96680d2b9 Merge branch 'for-next' of git://git.infradead.org/users/eparis/notify ... Browse Code »

Pull filesystem notification updates from Eric Paris:
"This pull mostly is about locking changes in the fsnotify system. By
switching the group lock from a spin_lock() to a mutex() we can now
hold the lock across things like iput(). This fixes a problem
involving unmounting a fs and having inodes be busy, first pointed out
by FAT, but reproducible with tmpfs.

This also restores signal driven I/O for inotify, which has been
broken since about 2.6.32."

Ugh. I *hate* the timing of this. It was rebased after the merge
window opened, and then left to sit with the pull request coming the day
before the merge window closes. That's just crap. But apparently the
patches themselves have been around for over a year, just gathering
dust, so now it's suddenly critical.

Fixed up semantic conflict in fs/notify/fdinfo.c as per Stephen
Rothwell's fixes from -next.

* 'for-next' of git://git.infradead.org/users/eparis/notify:
inotify: automatically restart syscalls
inotify: dont skip removal of watch descriptor if creation of ignored event failed
fanotify: dont merge permission events
fsnotify: make fasync generic for both inotify and fanotify
fsnotify: change locking order
fsnotify: dont put marks on temporary list when clearing marks by group
fsnotify: introduce locked versions of fsnotify_add_mark() and fsnotify_remove_mark()
fsnotify: pass group to fsnotify_destroy_mark()
fsnotify: use a mutex instead of a spinlock to protect a groups mark list
fanotify: add an extra flag to mark_remove_from_mask that indicates wheather a mark should be destroyed
fsnotify: take groups mark_lock before mark lock
fsnotify: use reference counting for groups
fsnotify: introduce fsnotify_get_group()
inotify, fanotify: replace fsnotify_put_group() with fsnotify_destroy_group()

Linus Torvalds
2012-12-21 12:11:52 +0800
4c9a44aeb Merge branch 'akpm' (Andrew's patch-bomb) ... Browse Code »

Merge the rest of Andrew's patches for -rc1:
"A bunch of fixes and misc missed-out-on things.

That'll do for -rc1. I still have a batch of IPC patches which still
have a possible bug report which I'm chasing down."

* emailed patches from Andrew Morton : (25 commits)
keys: use keyring_alloc() to create module signing keyring
keys: fix unreachable code
sendfile: allows bypassing of notifier events
SGI-XP: handle non-fatal traps
fat: fix incorrect function comment
Documentation: ABI: remove testing/sysfs-devices-node
proc: fix inconsistent lock state
linux/kernel.h: fix DIV_ROUND_CLOSEST with unsigned divisors
memcg: don't register hotcpu notifier from ->css_alloc()
checkpatch: warn on uapi #includes that #include
mm: cma: WARN if freed memory is still in use
exec: do not leave bprm->interp on stack
...

Linus Torvalds
2012-12-21 12:00:43 +0800
54d46ea99 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal ... Browse Code »

Pull signal handling cleanups from Al Viro:
"sigaltstack infrastructure + conversion for x86, alpha and um,
COMPAT_SYSCALL_DEFINE infrastructure.

Note that there are several conflicts between "unify
SS_ONSTACK/SS_DISABLE definitions" and UAPI patches in mainline;
resolution is trivial - just remove definitions of SS_ONSTACK and
SS_DISABLED from arch/*/uapi/asm/signal.h; they are all identical and
include/uapi/linux/signal.h contains the unified variant."

Fixed up conflicts as per Al.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal:
alpha: switch to generic sigaltstack
new helpers: __save_altstack/__compat_save_altstack, switch x86 and um to those
generic compat_sys_sigaltstack()
introduce generic sys_sigaltstack(), switch x86 and um to it
new helper: compat_user_stack_pointer()
new helper: restore_altstack()
unify SS_ONSTACK/SS_DISABLE definitions
new helper: current_user_stack_pointer()
missing user_stack_pointer() instances
Bury the conditionals from kernel_thread/kernel_execve series
COMPAT_SYSCALL_DEFINE: infrastructure

Linus Torvalds
2012-12-21 10:05:28 +0800
cfde81908 keys: use keyring_alloc() to create module signing keyring ... Browse Code »

Use keyring_alloc() to create special keyrings now that it has
a permissions parameter rather than using key_alloc() +
key_instantiate_and_link().

Signed-off-by: David Howells
Cc: Rusty Russell
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

David Howells
2012-12-21 09:40:21 +0800
44fd07e98 kcmp: include linux/ptrace.h ... Browse Code »

This makes it compile on s390. After all the ptrace_may_access
(which we use this file) is declared exactly in linux/ptrace.h.

This is preparatory work to wire this syscall up on all archs.

Signed-off-by: Cyrill Gorcunov
Signed-off-by: Alexander Kartashov
Cc: Heiko Carstens
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Cyrill Gorcunov
2012-12-21 09:40:19 +0800

20 Dec, 2012

8 commits

2832bc19f sched: numa: ksm: fix oops in task_numa_placment() ... Browse Code »

task_numa_placement() oopsed on NULL p->mm when task_numa_fault() got
called in the handling of break_ksm() for ksmd. That might be a
peculiar case, which perhaps KSM could takes steps to avoid? but it's
more robust if task_numa_placement() allows for such a possibility.

Signed-off-by: Hugh Dickins
Acked-by: Mel Gorman
Signed-off-by: Linus Torvalds

Hugh Dickins
2012-12-20 23:06:56 +0800
7005cd397 Merge tag 'random_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random ... Browse Code »

Pull random updates from Ted Ts'o:
"A few /dev/random improvements for the v3.8 merge window."

* tag 'random_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random:
random: Mix cputime from each thread that exits to the pool
random: prime last_data value per fips requirements
random: fix debug format strings
random: make it possible to enable debugging without rebuild

Linus Torvalds
2012-12-20 12:23:37 +0800
c40702c49 new helpers: __save_altstack/__compat_save_altstack, switch x86 and um to those ... Browse Code »

note that they are relying on access_ok() already checked by caller.

Signed-off-by: Al Viro

Al Viro
2012-12-20 07:07:41 +0800
902684395 generic compat_sys_sigaltstack() ... Browse Code »

Again, conditional on CONFIG_GENERIC_SIGALTSTACK

Signed-off-by: Al Viro

Al Viro
2012-12-20 07:07:41 +0800
6bf9adfc9 introduce generic sys_sigaltstack(), switch x86 and um to it ... Browse Code »

Conditional on CONFIG_GENERIC_SIGALTSTACK; architectures that do not
select it are completely unaffected

Signed-off-by: Al Viro

Al Viro
2012-12-20 07:07:40 +0800
5c49574ff new helper: restore_altstack() ... Browse Code »

to be used by rt_sigreturn instances

Signed-off-by: Al Viro

Al Viro
2012-12-20 07:07:40 +0800
ae903caae Bury the conditionals from kernel_thread/kernel_execve series ... Browse Code »

All architectures have
CONFIG_GENERIC_KERNEL_THREAD
CONFIG_GENERIC_KERNEL_EXECVE
__ARCH_WANT_SYS_EXECVE
None of them have __ARCH_WANT_KERNEL_EXECVE and there are only two callers
of kernel_execve() (which is a trivial wrapper for do_execve() now) left.
Kill the conditionals and make both callers use do_execve().

Signed-off-by: Al Viro

Al Viro
2012-12-20 07:07:38 +0800
3935e8950 watchdog: Fix disable/enable regression ... Browse Code »

Commit 8d4516904b39 ("watchdog: Fix CPU hotplug regression") causes an
oops or hard lockup when doing

echo 0 > /proc/sys/kernel/nmi_watchdog
echo 1 > /proc/sys/kernel/nmi_watchdog

and the kernel is booted with nmi_watchdog=1 (default)

Running laptop-mode-tools and disconnecting/connecting AC power will
cause this to trigger, making it a common failure scenario on laptops.

Instead of bailing out of watchdog_disable() when !watchdog_enabled we
can initialize the hrtimer regardless of watchdog_enabled status. This
makes it safe to call watchdog_disable() in the nmi_watchdog=0 case,
without the negative effect on the enabled => disabled => enabled case.

All these tests pass with this patch:
- nmi_watchdog=1
echo 0 > /proc/sys/kernel/nmi_watchdog
echo 1 > /proc/sys/kernel/nmi_watchdog

- nmi_watchdog=0
echo 0 > /sys/devices/system/cpu/cpu1/online

- nmi_watchdog=0
echo mem > /sys/power/state

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=51661

Cc: # v3.7
Cc: Norbert Warmuth
Cc: Joseph Salisbury
Cc: Thomas Gleixner
Signed-off-by: Bjørn Mork
Signed-off-by: Linus Torvalds

Bjørn Mork
2012-12-20 04:10:33 +0800

19 Dec, 2012

7 commits

7a684c452 Merge tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux ... Browse Code »

Pull module update from Rusty Russell:
"Nothing all that exciting; a new module-from-fd syscall for those who
want to verify the source of the module (ChromeOS) and/or use standard
IMA on it or other security hooks."

* tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
MODSIGN: Fix kbuild output when using default extra_certificates
MODSIGN: Avoid using .incbin in C source
modules: don't hand 0 to vmalloc.
module: Remove a extra null character at the top of module->strtab.
ASN.1: Use the ASN1_LONG_TAG and ASN1_INDEFINITE_LENGTH constants
ASN.1: Define indefinite length marker constant
moduleparam: use __UNIQUE_ID()
__UNIQUE_ID()
MODSIGN: Add modules_sign make target
powerpc: add finit_module syscall.
ima: support new kernel module syscall
add finit_module syscall to asm-generic
ARM: add finit_module syscall to ARM
security: introduce kernel_module_from_file hook
module: add flags arg to sys_finit_module()
module: add syscall to load module from fd

Linus Torvalds
2012-12-19 23:55:08 +0800
673ab8783 Merge branch 'akpm' (more patches from Andrew) ... Browse Code »

Merge patches from Andrew Morton:
"Most of the rest of MM, plus a few dribs and drabs.

I still have quite a few irritating patches left around: ones with
dubious testing results, lack of review, ones which should have gone
via maintainer trees but the maintainers are slack, etc.

I need to be more activist in getting these things wrapped up outside
the merge window, but they're such a PITA."

* emailed patches from Andrew Morton : (48 commits)
mm/vmscan.c: avoid possible deadlock caused by too_many_isolated()
vmscan: comment too_many_isolated()
mm/kmemleak.c: remove obsolete simple_strtoul
mm/memory_hotplug.c: improve comments
mm/hugetlb: create hugetlb cgroup file in hugetlb_init
mm/mprotect.c: coding-style cleanups
Documentation: ABI: /sys/devices/system/node/
slub: drop mutex before deleting sysfs entry
memcg: add comments clarifying aspects of cache attribute propagation
kmem: add slab-specific documentation about the kmem controller
slub: slub-specific propagation changes
slab: propagate tunable values
memcg: aggregate memcg cache values in slabinfo
memcg/sl[au]b: shrink dead caches
memcg/sl[au]b: track all the memcg children of a kmem_cache
memcg: destroy memcg caches
sl[au]b: allocate objects from memcg cache
sl[au]b: always get the cache from its page in kmem_cache_free()
memcg: skip memcg kmem allocations in specified code regions
memcg: infrastructure to match an allocation to the right cache
...

Linus Torvalds
2012-12-19 07:08:12 +0800
2ad306b17 fork: protect architectures where THREAD_SIZE >= PAGE_SIZE against fork bombs ... Browse Code »

Because those architectures will draw their stacks directly from the page
allocator, rather than the slab cache, we can directly pass __GFP_KMEMCG
flag, and issue the corresponding free_pages.

This code path is taken when the architecture doesn't define
CONFIG_ARCH_THREAD_INFO_ALLOCATOR (only ia64 seems to), and has
THREAD_SIZE >= PAGE_SIZE. Luckily, most - if not all - of the remaining
architectures fall in this category.

This will guarantee that every stack page is accounted to the memcg the
process currently lives on, and will have the allocations to fail if they
go over limit.

For the time being, I am defining a new variant of THREADINFO_GFP, not to
mess with the other path. Once the slab is also tracked by memcg, we can
get rid of that flag.

Tested to successfully protect against :(){ :|:& };:

Signed-off-by: Glauber Costa
Acked-by: Frederic Weisbecker
Acked-by: Kamezawa Hiroyuki
Reviewed-by: Michal Hocko
Cc: Christoph Lameter
Cc: David Rientjes
Cc: Greg Thelen
Cc: Johannes Weiner
Cc: JoonSoo Kim
Cc: Mel Gorman
Cc: Pekka Enberg
Cc: Rik van Riel
Cc: Suleiman Souhlal
Cc: Tejun Heo
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Glauber Costa
2012-12-19 07:02:13 +0800
50bdd430c res_counter: return amount of charges after res_counter_uncharge() ... Browse Code »

It is useful to know how many charges are still left after a call to
res_counter_uncharge. While it is possible to issue a res_counter_read
after uncharge, this can be racy.

If we need, for instance, to take some action when the counters drop down
to 0, only one of the callers should see it. This is the same semantics
as the atomic variables in the kernel.

Since the current return value is void, we don't need to worry about
anything breaking due to this change: nobody relied on that, and only
users appearing from now on will be checking this value.

Signed-off-by: Glauber Costa
Reviewed-by: Michal Hocko
Acked-by: Kamezawa Hiroyuki
Acked-by: David Rientjes
Cc: Johannes Weiner
Cc: Suleiman Souhlal
Cc: Tejun Heo
Cc: Christoph Lameter
Cc: Frederic Weisbecker
Cc: Greg Thelen
Cc: JoonSoo Kim
Cc: Mel Gorman
Cc: Pekka Enberg
Cc: Rik van Riel
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Glauber Costa
2012-12-19 07:02:12 +0800
19af395d7 irq: tsk->comm is an array ... Browse Code »

The array check is useless so remove it.

[akpm@linux-foundation.org: remove comment, per David]
Signed-off-by: Alan Cox
Cc: David Rientjes
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Alan Cox
2012-12-19 07:02:11 +0800
758338e96 Merge branch 'tip/perf/core-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace ... Browse Code »

Pull minor tracing updates and fixes from Steven Rostedt:
"It seems that one of my old pull requests have slipped through.

The changes are contained to just the files that I maintain, and are
changes from others that I told I would get into this merge window.

They have already been in linux-next for several weeks, and should be
well tested."

* 'tip/perf/core-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing: Remove unnecessary WARN_ONCE's from tracing_buffers_splice_read
tracing: Remove unneeded checks from the stack tracer
tracing: Add a resize function to make one buffer equivalent to another buffer

Linus Torvalds
2012-12-19 04:28:39 +0800
a2faf2fc5 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace ... Browse Code »

Pull (again) user namespace infrastructure changes from Eric Biederman:
"Those bugs, those darn embarrasing bugs just want don't want to get
fixed.

Linus I just updated my mirror of your kernel.org tree and it appears
you successfully pulled everything except the last 4 commits that fix
those embarrasing bugs.

When you get a chance can you please repull my branch"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
userns: Fix typo in description of the limitation of userns_install
userns: Add a more complete capability subset test to commit_creds
userns: Require CAP_SYS_ADMIN for most uses of setns.
Fix cap_capable to only allow owners in the parent user namespace to have caps.

Linus Torvalds
2012-12-19 02:55:28 +0800

18 Dec, 2012

11 commits

848b81415 Merge branch 'akpm' (Andrew's patch-bomb) ... Browse Code »

Merge misc patches from Andrew Morton:
"Incoming:

- lots of misc stuff

- backlight tree updates

- lib/ updates

- Oleg's percpu-rwsem changes

- checkpatch

- rtc

- aoe

- more checkpoint/restart support

I still have a pile of MM stuff pending - Pekka should be merging
later today after which that is good to go. A number of other things
are twiddling thumbs awaiting maintainer merges."

* emailed patches from Andrew Morton : (180 commits)
scatterlist: don't BUG when we can trivially return a proper error.
docs: update documentation about /proc//fdinfo/ fanotify output
fs, fanotify: add @mflags field to fanotify output
docs: add documentation about /proc//fdinfo/ output
fs, notify: add procfs fdinfo helper
fs, exportfs: add exportfs_encode_inode_fh() helper
fs, exportfs: escape nil dereference if no s_export_op present
fs, epoll: add procfs fdinfo helper
fs, eventfd: add procfs fdinfo helper
procfs: add ability to plug in auxiliary fdinfo providers
tools/testing/selftests/kcmp/kcmp_test.c: print reason for failure in kcmp_test
breakpoint selftests: print failure status instead of cause make error
kcmp selftests: print fail status instead of cause make error
kcmp selftests: make run_tests fix
mem-hotplug selftests: print failure status instead of cause make error
cpu-hotplug selftests: print failure status instead of cause make error
mqueue selftests: print failure status instead of cause make error
vm selftests: print failure status instead of cause make error
ubifs: use prandom_bytes
mtd: nandsim: use prandom_bytes
...

Linus Torvalds
2012-12-18 12:58:12 +0800
a5ba911ec pidns: remove unused is_container_init() ... Browse Code »

Since commit 1cdcbec1a337 ("CRED: Neuter sys_capset()")
is_container_init() has no callers.

Signed-off-by: Gao feng
Cc: David Howells
Acked-by: Serge Hallyn
Cc: James Morris
Cc: "Eric W. Biederman"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Gao feng
2012-12-18 09:15:23 +0800
992fb6e17 ptrace: introduce PTRACE_O_EXITKILL ... Browse Code »

Ptrace jailers want to be sure that the tracee can never escape
from the control. However if the tracer dies unexpectedly the
tracee continues to run in potentially unsafe mode.

Add the new ptrace option PTRACE_O_EXITKILL. If the tracer exits
it sends SIGKILL to every tracee which has this bit set.

Note that the new option is not equal to the last-option << 1. Because
currently all options have an event, and the new one starts the eventless
group. It uses the random 20 bit, so we have the room for 12 more events,
but we can also add the new eventless options below this one.

Suggested by Amnon Shiloh.

Signed-off-by: Oleg Nesterov
Tested-by: Amnon Shiloh
Cc: Denys Vlasenko
Cc: Michael Kerrisk
Cc: Serge Hallyn
Cc: Chris Evans
Cc: David Howells
Cc: "Eric W. Biederman"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oleg Nesterov
2012-12-18 09:15:22 +0800
0ad50c389 compat: generic compat_sys_sched_rr_get_interval() implementation ... Browse Code »

This function is used by sparc, powerpc tile and arm64 for compat support.
The patch adds a generic implementation with a wrapper for PowerPC to do
the u32->int sign extension.

The reason for a single patch covering powerpc, tile, sparc and arm64 is
to keep it bisectable, otherwise kernel building may fail with mismatched
function declarations.

Signed-off-by: Catalin Marinas
Acked-by: Chris Metcalf [for tile]
Acked-by: David S. Miller
Acked-by: Arnd Bergmann
Cc: Benjamin Herrenschmidt
Cc: Paul Mackerras
Cc: Alexander Viro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Catalin Marinas
2012-12-18 09:15:18 +0800
b2e902f02 trace: use kbasename() ... Browse Code »

Signed-off-by: Andy Shevchenko
Cc: Steven Rostedt
Cc: Frederic Weisbecker
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Andy Shevchenko
2012-12-18 09:15:17 +0800
2fa72c8fa printk: boot_delay should only affect output ... Browse Code »

The boot_delay parameter affects all printk(), even if the log level
prevents visible output from the call. It results in delays greater than
the user intended without purpose.

This patch changes the behaviour of boot_delay to only delay output.

Signed-off-by: Andrew Cooks
Acked-by: Randy Dunlap
Cc: Joe Perches
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Andrew Cooks
2012-12-18 09:15:13 +0800
0f34c4009 watchdog: store the watchdog sample period as a variable ... Browse Code »

Currently getting the sample period is always thru a complex
calculation: get_softlockup_thresh() * ((u64)NSEC_PER_SEC / 5).

We can store the sample period as a variable, and set it as __read_mostly
type.

Signed-off-by: liu chuansheng
Cc: Don Zickus
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Chuansheng Liu
2012-12-18 09:15:13 +0800
965c8e59c lseek: the "whence" argument is called "whence" ... Browse Code »

But the kernel decided to call it "origin" instead. Fix most of the
sites.

Acked-by: Hugh Dickins
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Andrew Morton
2012-12-18 09:15:12 +0800
8ec7d50f1 kernel: remove reference to feature-removal-schedule.txt ... Browse Code »

In commit 9c0ece069b32 ("Get rid of Documentation/feature-removal.txt"),
Linus removed feature-removal-schedule.txt from Documentation, but there
is still some reference to this file. So remove them.

Signed-off-by: Tao Ma
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Tao Ma
2012-12-18 09:15:12 +0800
6a2b60b17 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace ... Browse Code »

Pull user namespace changes from Eric Biederman:
"While small this set of changes is very significant with respect to
containers in general and user namespaces in particular. The user
space interface is now complete.

This set of changes adds support for unprivileged users to create user
namespaces and as a user namespace root to create other namespaces.
The tyranny of supporting suid root preventing unprivileged users from
using cool new kernel features is broken.

This set of changes completes the work on setns, adding support for
the pid, user, mount namespaces.

This set of changes includes a bunch of basic pid namespace
cleanups/simplifications. Of particular significance is the rework of
the pid namespace cleanup so it no longer requires sending out
tendrils into all kinds of unexpected cleanup paths for operation. At
least one case of broken error handling is fixed by this cleanup.

The files under /proc//ns/ have been converted from regular files
to magic symlinks which prevents incorrect caching by the VFS,
ensuring the files always refer to the namespace the process is
currently using and ensuring that the ptrace_mayaccess permission
checks are always applied.

The files under /proc//ns/ have been given stable inode numbers
so it is now possible to see if different processes share the same
namespaces.

Through the David Miller's net tree are changes to relax many of the
permission checks in the networking stack to allowing the user
namespace root to usefully use the networking stack. Similar changes
for the mount namespace and the pid namespace are coming through my
tree.

Two small changes to add user namespace support were commited here adn
in David Miller's -net tree so that I could complete the work on the
/proc//ns/ files in this tree.

Work remains to make it safe to build user namespaces and 9p, afs,
ceph, cifs, coda, gfs2, ncpfs, nfs, nfsd, ocfs2, and xfs so the
Kconfig guard remains in place preventing that user namespaces from
being built when any of those filesystems are enabled.

Future design work remains to allow root users outside of the initial
user namespace to mount more than just /proc and /sys."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (38 commits)
proc: Usable inode numbers for the namespace file descriptors.
proc: Fix the namespace inode permission checks.
proc: Generalize proc inode allocation
userns: Allow unprivilged mounts of proc and sysfs
userns: For /proc/self/{uid,gid}_map derive the lower userns from the struct file
procfs: Print task uids and gids in the userns that opened the proc file
userns: Implement unshare of the user namespace
userns: Implent proc namespace operations
userns: Kill task_user_ns
userns: Make create_new_namespaces take a user_ns parameter
userns: Allow unprivileged use of setns.
userns: Allow unprivileged users to create new namespaces
userns: Allow setting a userns mapping to your current uid.
userns: Allow chown and setgid preservation
userns: Allow unprivileged users to create user namespaces.
userns: Ignore suid and sgid on binaries if the uid or gid can not be mapped
userns: fix return value on mntns_install() failure
vfs: Allow unprivileged manipulation of the mount namespace.
vfs: Only support slave subtrees across different user namespaces
vfs: Add a user namespace reference from struct mnt_namespace
...

Linus Torvalds
2012-12-18 07:44:47 +0800
221392c3a sched: numa: Fix build error if CONFIG_NUMA_BALANCING && !CONFIG_TRANSPARENT_HUGEPAGE ... Browse Code »

Michal Hocko reported that the following build error occurs if
CONFIG_NUMA_BALANCING is set without THP support

kernel/sched/fair.c: In function ‘task_numa_work’:
kernel/sched/fair.c:932:55: error: call to ‘__build_bug_failed’ declared with attribute error: BUILD_BUG failed

The problem is that HPAGE_PMD_SHIFT triggers a BUILD_BUG() on
!CONFIG_TRANSPARENT_HUGEPAGE. This patch addresses the problem.

Reported-by: Michal Hocko
Signed-off-by: Mel Gorman
Signed-off-by: Linus Torvalds

Mel Gorman
2012-12-18 00:25:50 +0800

17 Dec, 2012

3 commits

613370549 random: Mix cputime from each thread that exits to the pool ... Browse Code »

When a thread exits mix it's cputime (userspace + kernelspace) to the entropy pool.

We don't know how "random" this is, so we use add_device_randomness that doesn't mess
with entropy count.

Signed-off-by: Nick Kossifidis
Signed-off-by: Theodore Ts'o

Nick Kossifidis
2012-12-17 11:18:11 +0800
2a74dbb9a Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security ... Browse Code »

Pull security subsystem updates from James Morris:
"A quiet cycle for the security subsystem with just a few maintenance
updates."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
Smack: create a sysfs mount point for smackfs
Smack: use select not depends in Kconfig
Yama: remove locking from delete path
Yama: add RCU to drop read locking
drivers/char/tpm: remove tasklet and cleanup
KEYS: Use keyring_alloc() to create special keyrings
KEYS: Reduce initial permissions on keys
KEYS: Make the session and process keyrings per-thread
seccomp: Make syscall skipping and nr changes more consistent
key: Fix resource leak
keys: Fix unreachable code
KEYS: Add payload preparsing opportunity prior to key instantiate or update

Linus Torvalds
2012-12-17 07:40:50 +0800
3d59eebc5 Merge tag 'balancenuma-v11' of git://git.kernel.org/pub/scm/linux/kernel/git/mel/linux-balancenuma ... Browse Code »

Pull Automatic NUMA Balancing bare-bones from Mel Gorman:
"There are three implementations for NUMA balancing, this tree
(balancenuma), numacore which has been developed in tip/master and
autonuma which is in aa.git.

In almost all respects balancenuma is the dumbest of the three because
its main impact is on the VM side with no attempt to be smart about
scheduling. In the interest of getting the ball rolling, it would be
desirable to see this much merged for 3.8 with the view to building
scheduler smarts on top and adapting the VM where required for 3.9.

The most recent set of comparisons available from different people are

mel: https://lkml.org/lkml/2012/12/9/108
mingo: https://lkml.org/lkml/2012/12/7/331
tglx: https://lkml.org/lkml/2012/12/10/437
srikar: https://lkml.org/lkml/2012/12/10/397

The results are a mixed bag. In my own tests, balancenuma does
reasonably well. It's dumb as rocks and does not regress against
mainline. On the other hand, Ingo's tests shows that balancenuma is
incapable of converging for this workloads driven by perf which is bad
but is potentially explained by the lack of scheduler smarts. Thomas'
results show balancenuma improves on mainline but falls far short of
numacore or autonuma. Srikar's results indicate we all suffer on a
large machine with imbalanced node sizes.

My own testing showed that recent numacore results have improved
dramatically, particularly in the last week but not universally.
We've butted heads heavily on system CPU usage and high levels of
migration even when it shows that overall performance is better.
There are also cases where it regresses. Of interest is that for
specjbb in some configurations it will regress for lower numbers of
warehouses and show gains for higher numbers which is not reported by
the tool by default and sometimes missed in treports. Recently I
reported for numacore that the JVM was crashing with
NullPointerExceptions but currently it's unclear what the source of
this problem is. Initially I thought it was in how numacore batch
handles PTEs but I'm no longer think this is the case. It's possible
numacore is just able to trigger it due to higher rates of migration.

These reports were quite late in the cycle so I/we would like to start
with this tree as it contains much of the code we can agree on and has
not changed significantly over the last 2-3 weeks."

* tag 'balancenuma-v11' of git://git.kernel.org/pub/scm/linux/kernel/git/mel/linux-balancenuma: (50 commits)
mm/rmap, migration: Make rmap_walk_anon() and try_to_unmap_anon() more scalable
mm/rmap: Convert the struct anon_vma::mutex to an rwsem
mm: migrate: Account a transhuge page properly when rate limiting
mm: numa: Account for failed allocations and isolations as migration failures
mm: numa: Add THP migration for the NUMA working set scanning fault case build fix
mm: numa: Add THP migration for the NUMA working set scanning fault case.
mm: sched: numa: Delay PTE scanning until a task is scheduled on a new node
mm: sched: numa: Control enabling and disabling of NUMA balancing if !SCHED_DEBUG
mm: sched: numa: Control enabling and disabling of NUMA balancing
mm: sched: Adapt the scanning rate if a NUMA hinting fault does not migrate
mm: numa: Use a two-stage filter to restrict pages being migrated for unlikely tasknode relationships
mm: numa: migrate: Set last_nid on newly allocated page
mm: numa: split_huge_page: Transfer last_nid on tail page
mm: numa: Introduce last_nid to the page frame
sched: numa: Slowly increase the scanning period as NUMA faults are handled
mm: numa: Rate limit setting of pte_numa if node is saturated
mm: numa: Rate limit the amount of memory that is migrated between nodes
mm: numa: Structures for Migrate On Fault per NUMA migration rate limiting
mm: numa: Migrate pages handled during a pmd_numa hinting fault
mm: numa: Migrate on reference policy
...

Linus Torvalds
2012-12-17 07:18:08 +0800

16 Dec, 2012

1 commit

1ed55eac3 Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 ... Browse Code »

Pull crypto update from Herbert Xu:

- Added aesni/avx/x86_64 implementations for camellia.

- Optimised AVX code for cast5/serpent/twofish/cast6.

- Fixed vmac bug with unaligned input.

- Allow compression algorithms in FIPS mode.

- Optimised crc32c implementation for Intel.

- Misc fixes.

* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (32 commits)
crypto: caam - Updated SEC-4.0 device tree binding for ERA information.
crypto: testmgr - remove superfluous initializers for xts(aes)
crypto: testmgr - allow compression algs in fips mode
crypto: testmgr - add larger crc32c test vector to test FPU path in crc32c_intel
crypto: testmgr - clean alg_test_null entries in alg_test_descs[]
crypto: testmgr - remove fips_allowed flag from camellia-aesni null-tests
crypto: cast5/cast6 - move lookup tables to shared module
padata: use __this_cpu_read per-cpu helper
crypto: s5p-sss - Fix compilation error
crypto: picoxcell - Add terminating entry for platform_device_id table
crypto: omap-aes - select BLKCIPHER2
crypto: camellia - add AES-NI/AVX/x86_64 assembler implementation of camellia cipher
crypto: camellia-x86_64 - share common functions and move structures and function definitions to header file
crypto: tcrypt - add async speed test for camellia cipher
crypto: tegra-aes - fix error-valued pointer dereference
crypto: tegra - fix missing unlock on error case
crypto: cast5/avx - avoid using temporary stack buffers
crypto: serpent/avx - avoid using temporary stack buffers
crypto: twofish/avx - avoid using temporary stack buffers
crypto: cast6/avx - avoid using temporary stack buffers
...

Linus Torvalds
2012-12-16 04:35:19 +0800

15 Dec, 2012

3 commits

5155040ed userns: Fix typo in description of the limitation of userns_install ... Browse Code »

Acked-by: Serge Hallyn
Signed-off-by: "Eric W. Biederman"

Eric W. Biederman
2012-12-15 10:36:36 +0800
aa6d054e5 userns: Add a more complete capability subset test to commit_creds ... Browse Code »

When unsharing a user namespace we reduce our credentials to just what
can be done in that user namespace. This is a subset of the credentials
we previously had. Teach commit_creds to recognize this is a subset
of the credentials we have had before and don't clear the dumpability flag.

This allows an unprivileged program to do:
unshare(CLONE_NEWUSER);
fd = open("/proc/self/uid_map", O_RDWR);

Where previously opening the uid_map writable would fail because
the the task had been made non-dumpable.

Acked-by: Serge Hallyn
Signed-off-by: "Eric W. Biederman"

Eric W. Biederman
2012-12-15 10:36:26 +0800
5e4a08476 userns: Require CAP_SYS_ADMIN for most uses of setns. ... Browse Code »

Andy Lutomirski found a nasty little bug in
the permissions of setns. With unprivileged user namespaces it
became possible to create new namespaces without privilege.

However the setns calls were relaxed to only require CAP_SYS_ADMIN in
the user nameapce of the targed namespace.

Which made the following nasty sequence possible.

pid = clone(CLONE_NEWUSER | CLONE_NEWNS);
if (pid == 0) { /* child */
system("mount --bind /home/me/passwd /etc/passwd");
}
else if (pid != 0) { /* parent */
char path[PATH_MAX];
snprintf(path, sizeof(path), "/proc/%u/ns/mnt");
fd = open(path, O_RDONLY);
setns(fd, 0);
system("su -");
}

Prevent this possibility by requiring CAP_SYS_ADMIN
in the current user namespace when joing all but the user namespace.

Acked-by: Serge Hallyn
Signed-off-by: "Eric W. Biederman"

Eric W. Biederman
2012-12-15 08:12:03 +0800

14 Dec, 2012

2 commits

17bc14b76 Revert "sched: Update_cfs_shares at period edge" ... Browse Code »

This reverts commit f269ae0469fc882332bdfb5db15d3c1315fe2a10.

It turns out it causes a very noticeable interactivity regression with
CONFIG_SCHED_AUTOGROUP (test-case: "make -j32" of the kernel in a
terminal window, while scrolling in a browser - the autogrouping means
that the two end up in separate cgroups, and the browser should be
smooth as silk despite the high load).

Says Paul Turner:
"It seems that the update-throttling on the wake-side is reducing the
interactive tasks' ability to preempt. While I suspect the right
longer term answer here is force these updates only in the
cross-cgroup case; this is less trivial. For this release I believe
the right answer is either going to be a revert or restore the updates
on the enqueue-side."

Reported-by: Linus Torvalds
Bisected-by: Mike Galbraith
Acked-by: Paul Turner
Acked-by: Ingo Molnar
Signed-off-by: Linus Torvalds

Linus Torvalds
2012-12-14 23:20:43 +0800
e10e1774e MODSIGN: Fix kbuild output when using default extra_certificates ... Browse Code »

Reported-by: Peter Foley
Signed-off-by: Michal Marek
Acked-by: Peter Foley
Signed-off-by: Rusty Russell

Michal Marek
2012-12-14 10:36:45 +0800