Doug / smarc-fsl-linux-kernel | Embedian Git Server

30 Apr, 2013

3 commits

ab86e974f Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip ... Browse Code »

Pull core timer updates from Ingo Molnar:
"The main changes in this cycle's merge are:

- Implement shadow timekeeper to shorten in kernel reader side
blocking, by Thomas Gleixner.

- Posix timers enhancements by Pavel Emelyanov:

- allocate timer ID per process, so that exact timer ID allocations
can be re-created be checkpoint/restore code.

- debuggability and tooling (/proc/PID/timers, etc.) improvements.

- suspend/resume enhancements by Feng Tang: on certain new Intel Atom
processors (Penwell and Cloverview), there is a feature that the
TSC won't stop in S3 state, so the TSC value won't be reset to 0
after resume. This can be taken advantage of by the generic via
the CLOCK_SOURCE_SUSPEND_NONSTOP flag: instead of using the RTC to
recover/approximate sleep time, the main (and precise) clocksource
can be used.

- Fix /proc/timer_list for 4096 CPUs by Nathan Zimmer: on so many
CPUs the file goes beyond 4MB of size and thus the current
simplistic seqfile approach fails. Convert /proc/timer_list to a
proper seq_file with its own iterator.

- Cleanups and refactorings of the core timekeeping code by John
Stultz.

- International Atomic Clock time is managed by the NTP code
internally currently but not exposed externally. Separate the TAI
code out and add CLOCK_TAI support and TAI support to the hrtimer
and posix-timer code, by John Stultz.

- Add deep idle support enhacement to the broadcast clockevents core
timer code, by Daniel Lezcano: add an opt-in CLOCK_EVT_FEAT_DYNIRQ
clockevents feature (which will be utilized by future clockevents
driver updates), which allows the use of IRQ affinities to avoid
spurious wakeups of idle CPUs - the right CPU with an expiring
timer will be woken.

- Add new ARM bcm281xx clocksource driver, by Christian Daudt

- ... various other fixes and cleanups"

* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (52 commits)
clockevents: Set dummy handler on CPU_DEAD shutdown
timekeeping: Update tk->cycle_last in resume
posix-timers: Remove unused variable
clockevents: Switch into oneshot mode even if broadcast registered late
timer_list: Convert timer list to be a proper seq_file
timer_list: Split timer_list_show_tickdevices
posix-timers: Show sigevent info in proc file
posix-timers: Introduce /proc/PID/timers file
posix timers: Allocate timer id per process (v2)
timekeeping: Make sure to notify hrtimers when TAI offset changes
hrtimer: Fix ktime_add_ns() overflow on 32bit architectures
hrtimer: Add expiry time overflow check in hrtimer_interrupt
timekeeping: Shorten seq_count region
timekeeping: Implement a shadow timekeeper
timekeeping: Delay update of clock->cycle_last
timekeeping: Store cycle_last value in timekeeper struct as well
ntp: Remove ntp_lock, using the timekeeping locks to protect ntp state
timekeeping: Simplify tai updating from do_adjtimex
timekeeping: Hold timekeepering locks in do_adjtimex and hardpps
timekeeping: Move ADJ_SETOFFSET to top level do_adjtimex()
...

Linus Torvalds
2013-04-30 23:15:40 +0800
3c743a7f7 fs/proc/kcore.c: use register_hotmemory_notifier() ... Browse Code »

Saves an ifdef, no code size changes

Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Andrew Morton
2013-04-30 06:54:36 +0800
db3808c1b mm, vmalloc: move get_vmalloc_info() to vmalloc.c ... Browse Code »

Now get_vmalloc_info() is in fs/proc/mmu.c. There is no reason that this
code must be here and it's implementation needs vmlist_lock and it iterate
a vmlist which may be internal data structure for vmalloc.

It is preferable that vmlist_lock and vmlist is only used in vmalloc.c
for maintainability. So move the code to vmalloc.c

Signed-off-by: Joonsoo Kim
Signed-off-by: Joonsoo Kim
Cc: Thomas Gleixner
Cc: "H. Peter Anvin"
Cc: Atsushi Kumagai
Cc: Chris Metcalf
Cc: Dave Anderson
Cc: Eric Biederman
Cc: Guan Xuetao
Cc: Ingo Molnar
Cc: Vivek Goyal
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Joonsoo Kim
2013-04-30 06:54:33 +0800

25 Apr, 2013

1 commit

6402c7dc2 Merge branch 'linus' into timers/core ... Browse Code »

Reason: Get upstream fixes before adding conflicting code.

Signed-off-by: Thomas Gleixner

Thomas Gleixner
2013-04-25 02:33:54 +0800

18 Apr, 2013

2 commits

57b8015e0 posix-timers: Show sigevent info in proc file ... Browse Code »

Previous patch added proc file to list posix timers created by task.
Expand the information provided in this file by adding info about
notification method, with which timers were created. I.e. after
the "ID:" line there go

1. "signal:" line, that shows signal number and sigval bits;
2. "notify:" line, that shows the timer notification method.

Thus the timer entry would looke like this:

ID: 123
signal: 14/0000000000b005d0
notify: signal/pid.732

This information is enough to understand how timer_create() was called
for each particular timer.

Signed-off-by: Pavel Emelyanov
Cc: Peter Zijlstra
Cc: Michael Kerrisk
Cc: Matthew Helsley
Link: http://lkml.kernel.org/r/513DA024.80404@parallels.com
Signed-off-by: Thomas Gleixner

Pavel Emelyanov
2013-04-18 02:51:01 +0800
48f6a7a51 posix-timers: Introduce /proc/PID/timers file ... Browse Code »

Currently kernel doesn't provide any API for getting info about what
posix timers are configured by processes. It's implied, that a process
which configured some timers, knows what it did. However, for external
tools it's impossible to get this information. In particular, this is
critical for checkpoint-restore project to have this info.

Introduce a per-pid proc file with information about posix
timers. Since these timers are shared between threads, this file is
present on tgid level only, no such thing in tid subdirs.

The file format is expected to be the "/proc//smaps"-like,
i.e. each timer will occupy seveal lines to allow for future
extending.

Each new timer entry starts with the

ID:

line which is added by this patch.

Signed-off-by: Pavel Emelyanov
Cc: Peter Zijlstra
Cc: Michael Kerrisk
Cc: Matthew Helsley
Link: http://lkml.kernel.org/r/513DA00D.6070009@parallels.com
Signed-off-by: Thomas Gleixner

Pavel Emelyanov
2013-04-18 02:51:01 +0800

12 Apr, 2013

1 commit

f2530dc71 kthread: Prevent unpark race which puts threads on the wrong cpu ... Browse Code »

The smpboot threads rely on the park/unpark mechanism which binds per
cpu threads on a particular core. Though the functionality is racy:

CPU0 CPU1 CPU2
unpark(T) wake_up_process(T)
clear(SHOULD_PARK) T runs
leave parkme() due to !SHOULD_PARK
bind_to(CPU2) BUG_ON(wrong CPU)

We cannot let the tasks move themself to the target CPU as one of
those tasks is actually the migration thread itself, which requires
that it starts running on the target cpu right away.

The solution to this problem is to prevent wakeups in park mode which
are not from unpark(). That way we can guarantee that the association
of the task to the target cpu is working correctly.

Add a new task state (TASK_PARKED) which prevents other wakeups and
use this state explicitly for the unpark wakeup.

Peter noticed: Also, since the task state is visible to userspace and
all the parked tasks are still in the PID space, its a good hint in ps
and friends that these tasks aren't really there for the moment.

The migration thread has another related issue.

CPU0 CPU1
Bring up CPU2
create_thread(T)
park(T)
wait_for_completion()
parkme()
complete()
sched_set_stop_task()
schedule(TASK_PARKED)

The sched_set_stop_task() call is issued while the task is on the
runqueue of CPU1 and that confuses the hell out of the stop_task class
on that cpu. So we need the same synchronizaion before
sched_set_stop_task().

Reported-by: Dave Jones
Reported-and-tested-by: Dave Hansen
Reported-and-tested-by: Borislav Petkov
Acked-by: Peter Ziljstra
Cc: Srivatsa S. Bhat
Cc: dhillf@gmail.com
Cc: Ingo Molnar
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1304091635430.21884@ionos
Signed-off-by: Thomas Gleixner

Thomas Gleixner
2013-04-12 20:18:43 +0800

10 Apr, 2013

2 commits

e8f2b548d Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs ... Browse Code »

Pull vfs fixes from Al Viro:
"A nasty bug in fs/namespace.c caught by Andrey + a couple of less
serious unpleasantness - ecryptfs misc device playing hopeless games
with try_module_get() and palinfo procfs support being... not quite
correctly done, to be polite."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
mnt: release locks on error path in do_loopback
palinfo fixes
procfs: add proc_remove_subtree()
ecryptfs: close rmmod race

Linus Torvalds
2013-04-10 03:22:49 +0800
8ce584c74 procfs: add proc_remove_subtree() ... Browse Code »

just what it sounds like; do that only to procfs subtrees you've
created - doing that to something shared with another driver is
not only antisocial, but might cause interesting races with
proc_create() and its ilk.

Signed-off-by: Al Viro

Al Viro
2013-04-10 02:09:17 +0800

29 Mar, 2013

1 commit

2c3de1c2d Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace ... Browse Code »

Pull userns fixes from Eric W Biederman:
"The bulk of the changes are fixing the worst consequences of the user
namespace design oversight in not considering what happens when one
namespace starts off as a clone of another namespace, as happens with
the mount namespace.

The rest of the changes are just plain bug fixes.

Many thanks to Andy Lutomirski for pointing out many of these issues."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
userns: Restrict when proc and sysfs can be mounted
ipc: Restrict mounting the mqueue filesystem
vfs: Carefully propogate mounts across user namespaces
vfs: Add a mount flag to lock read only bind mounts
userns: Don't allow creation if the user is chrooted
yama: Better permission check for ptraceme
pid: Handle the exit of a multi-threaded init.
scm: Require CAP_SYS_ADMIN over the current pidns to spoof pids.

Linus Torvalds
2013-03-29 04:43:46 +0800

27 Mar, 2013

1 commit

87a8ebd63 userns: Restrict when proc and sysfs can be mounted ... Browse Code »

Only allow unprivileged mounts of proc and sysfs if they are already
mounted when the user namespace is created.

proc and sysfs are interesting because they have content that is
per namespace, and so fresh mounts are needed when new namespaces
are created while at the same time proc and sysfs have content that
is shared between every instance.

Respect the policy of who may see the shared content of proc and sysfs
by only allowing new mounts if there was an existing mount at the time
the user namespace was created.

In practice there are only two interesting cases: proc and sysfs are
mounted at their usual places, proc and sysfs are not mounted at all
(some form of mount namespace jail).

Cc: stable@vger.kernel.org
Acked-by: Serge Hallyn
Signed-off-by: "Eric W. Biederman"

Eric W. Biederman
2013-03-27 22:50:08 +0800

23 Mar, 2013

1 commit

51f0885e5 vfs,proc: guarantee unique inodes in /proc ... Browse Code »

Dave Jones found another /proc issue with his Trinity tool: thanks to
the namespace model, we can have multiple /proc dentries that point to
the same inode, aliasing directories in /proc//net/ for example.

This ends up being a total disaster, because it acts like hardlinked
directories, and causes locking problems. We rely on the topological
sort of the inodes pointed to by dentries, and if we have aliased
directories, that odering becomes unreliable.

In short: don't do this. Multiple dentries with the same (directory)
inode is just a bad idea, and the namespace code should never have
exposed things this way. But we're kind of stuck with it.

This solves things by just always allocating a new inode during /proc
dentry lookup, instead of using "iget_locked()" to look up existing
inodes by superblock and number. That actually simplies the code a bit,
at the cost of potentially doing more inode [de]allocations.

That said, the inode lookup wasn't free either (and did a lot of locking
of inodes), so it is probably not that noticeable. We could easily keep
the old lookup model for non-directory entries, but rather than try to
be excessively clever this just implements the minimal and simplest
workaround for the problem.

Reported-and-tested-by: Dave Jones
Analyzed-by: Al Viro
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds

Linus Torvalds
2013-03-23 02:44:04 +0800

09 Mar, 2013

1 commit

db04dc679 proc: Use nd_jump_link in proc_ns_follow_link ... Browse Code »

Update proc_ns_follow_link to use nd_jump_link instead of just
manually updating nd.path.dentry.

This fixes the BUG_ON(nd->inode != parent->d_inode) reported by Dave
Jones and reproduced trivially with mkdir /proc/self/ns/uts/a.

Sigh it looks like the VFS change to require use of nd_jump_link
happend while proc_ns_follow_link was baking and since the common case
of proc_ns_follow_link continued to work without problems the need for
making this change was overlooked.

Cc: stable@vger.kernel.org
Signed-off-by: "Eric W. Biederman"

Eric W. Biederman
2013-03-09 16:14:45 +0800

28 Feb, 2013

3 commits

c2c1b089b fs/proc/vmcore.c: put if tests in the top of the while loop to reduce duplication ... Browse Code »

In read_vmcore() two `if' tests are duplicated. Change the position of
them could reduce the duplication. This change does not affect the
behaviour of the function.

[akpm@linux-foundation.org: avoid `if (foo = bar)' thing, use min_t()]
[akpm@linux-foundation.org: s/max_t/min_t/]
Signed-off-by: Zhang Yanfei
Cc: "Eric W. Biederman"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Zhang Yanfei
2013-02-28 11:10:11 +0800
87ebdc00e fs/proc: clean up printks ... Browse Code »

- use pr_foo() throughout

- remove a couple of duplicated KERN_WARNINGs, via WARN(KERN_WARNING "...")

- nuke a few warnings which I've never seen happen, ever.

Cc: Joe Perches
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Andrew Morton
2013-02-28 11:10:11 +0800
e579d2c25 coredump: remove redundant defines for dumpable states ... Browse Code »

The existing SUID_DUMP_* defines duplicate the newer SUID_DUMPABLE_*
defines introduced in 54b501992dd2 ("coredump: warn about unsafe
suid_dumpable / core_pattern combo"). Remove the new ones, and use the
prior values instead.

Signed-off-by: Kees Cook
Reported-by: Chen Gang
Cc: Alexander Viro
Cc: Alan Cox
Cc: "Eric W. Biederman"
Cc: Doug Ledford
Cc: Serge Hallyn
Cc: James Morris
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Kees Cook
2013-02-28 11:10:11 +0800

27 Feb, 2013

1 commit

d895cb1af Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs ... Browse Code »

Pull vfs pile (part one) from Al Viro:
"Assorted stuff - cleaning namei.c up a bit, fixing ->d_name/->d_parent
locking violations, etc.

The most visible changes here are death of FS_REVAL_DOT (replaced with
"has ->d_weak_revalidate()") and a new helper getting from struct file
to inode. Some bits of preparation to xattr method interface changes.

Misc patches by various people sent this cycle *and* ocfs2 fixes from
several cycles ago that should've been upstream right then.

PS: the next vfs pile will be xattr stuff."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (46 commits)
saner proc_get_inode() calling conventions
proc: avoid extra pde_put() in proc_fill_super()
fs: change return values from -EACCES to -EPERM
fs/exec.c: make bprm_mm_init() static
ocfs2/dlm: use GFP_ATOMIC inside a spin_lock
ocfs2: fix possible use-after-free with AIO
ocfs2: Fix oops in ocfs2_fast_symlink_readpage() code path
get_empty_filp()/alloc_file() leave both ->f_pos and ->f_version zero
target: writev() on single-element vector is pointless
export kernel_write(), convert open-coded instances
fs: encode_fh: return FILEID_INVALID if invalid fid_type
kill f_vfsmnt
vfs: kill FS_REVAL_DOT by adding a d_weak_revalidate dentry op
nfsd: handle vfs_getattr errors in acl protocol
switch vfs_getattr() to struct path
default SET_PERSONALITY() in linux/elf.h
ceph: prepopulate inodes only when request is aborted
d_hash_and_lookup(): export, switch open-coded instances
9p: switch v9fs_set_create_acl() to inode+fid, do it before d_instantiate()
9p: split dropping the acls from v9fs_set_create_acl()
...

Linus Torvalds
2013-02-27 12:16:07 +0800

26 Feb, 2013

4 commits

d3d009cb9 saner proc_get_inode() calling conventions ... Browse Code »

Make it drop the pde in *all* cases when no new reference to it is
put into an inode - both when an inode had already been set up
(as we were already doing) and when inode allocation has failed.
Makes for simpler logics in callers...

Signed-off-by: Al Viro

Al Viro
2013-02-26 15:46:14 +0800
87e0aab37 proc: avoid extra pde_put() in proc_fill_super() ... Browse Code »

If proc_get_inode() succeeded, but d_make_root() failed, pde_put() for
proc_root will be called twice: the first time due to iput() called from
d_make_root() and the second time directly in the end of
proc_fill_super().

Signed-off-by: Maxim Patlasov
Cc: Al Viro
Signed-off-by: Andrew Morton
Signed-off-by: Al Viro

Maxim Patlasov
2013-02-26 15:46:14 +0800
417358187 fs: change return values from -EACCES to -EPERM ... Browse Code »

According to SUSv3:

[EACCES] Permission denied. An attempt was made to access a file in a way
forbidden by its file access permissions.

[EPERM] Operation not permitted. An attempt was made to perform an operation
limited to processes with appropriate privileges or to the owner of a file
or other resource.

So -EPERM should be returned if capability checks fails.

Strictly speaking this is an API change since the error code user sees is
altered.

Signed-off-by: Zhao Hongjiang
Acked-by: Jan Kara
Acked-by: Steven Whitehouse
Acked-by: Ian Kent
Cc: Al Viro
Signed-off-by: Andrew Morton
Signed-off-by: Al Viro

Zhao Hongjiang
2013-02-26 15:46:14 +0800
4f522a247 d_hash_and_lookup(): export, switch open-coded instances ... Browse Code »

* calling conventions change - ERR_PTR() is returned on ->d_hash() errors;
NULL is just for dcache miss now.
* exported, open-coded instances in ncpfs and cifs converted.

Signed-off-by: Al Viro

Al Viro
2013-02-26 15:46:07 +0800

24 Feb, 2013

2 commits

33806f06d swap: make each swap partition have one address_space ... Browse Code »

When I use several fast SSD to do swap, swapper_space.tree_lock is
heavily contended. This makes each swap partition have one
address_space to reduce the lock contention. There is an array of
address_space for swap. The swap entry type is the index to the array.

In my test with 3 SSD, this increases the swapout throughput 20%.

[akpm@linux-foundation.org: revert unneeded change to __add_to_swap_cache]
Signed-off-by: Shaohua Li
Cc: Hugh Dickins
Acked-by: Rik van Riel
Acked-by: Minchan Kim
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Shaohua Li
2013-02-24 09:50:17 +0800
293c07e31 memory-failure: use num_poisoned_pages instead of mce_bad_pages ... Browse Code »

Since MCE is an x86 concept, and this code is in mm/, it would be better
to use the name num_poisoned_pages instead of mce_bad_pages.

[akpm@linux-foundation.org: fix mm/sparse.c]
Signed-off-by: Xishi Qiu
Signed-off-by: Jiang Liu
Suggested-by: Borislav Petkov
Reviewed-by: Wanpeng Li
Cc: Andi Kleen
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Xishi Qiu
2013-02-24 09:50:15 +0800

23 Feb, 2013

1 commit

496ad9aa8 new helper: file_inode(file) ... Browse Code »

Signed-off-by: Al Viro

Al Viro
2013-02-23 12:31:31 +0800

22 Feb, 2013

1 commit

21eaab6d1 Merge tag 'tty-3.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty ... Browse Code »

Pull tty/serial patches from Greg Kroah-Hartman:
"Here's the big tty/serial driver patches for 3.9-rc1.

More tty port rework and fixes from Jiri here, as well as lots of
individual serial driver updates and fixes.

All of these have been in the linux-next tree for a while."

* tag 'tty-3.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (140 commits)
tty: mxser: improve error handling in mxser_probe() and mxser_module_init()
serial: imx: fix uninitialized variable warning
serial: tegra: assume CONFIG_OF
TTY: do not update atime/mtime on read/write
lguest: select CONFIG_TTY to build properly.
ARM defconfigs: add missing inclusions of linux/platform_device.h
fb/exynos: include platform_device.h
ARM: sa1100/assabet: include platform_device.h directly
serial: imx: Fix recursive locking bug
pps: Fix build breakage from decoupling pps from tty
tty: Remove ancient hardpps()
pps: Additional cleanups in uart_handle_dcd_change
pps: Move timestamp read into PPS code proper
pps: Don't crash the machine when exiting will do
pps: Fix a use-after free bug when unregistering a source.
pps: Use pps_lookup_dev to reduce ldisc coupling
pps: Add pps_lookup_dev() function
tty: serial: uartlite: Support uartlite on big and little endian systems
tty: serial: uartlite: Fix sparse and checkpatch warnings
serial/arc-uart: Miscll DT related updates (Grant's review comments)
...

Fix up trivial conflicts, mostly just due to the TTY config option
clashing with the EXPERIMENTAL removal.

Linus Torvalds
2013-02-22 05:41:04 +0800

21 Feb, 2013

1 commit

a0b1c4295 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next ... Browse Code »

Pull networking update from David Miller:

1) Checkpoint/restarted TCP sockets now can properly propagate the TCP
timestamp offset. From Andrey Vagin.

2) VMWARE VM VSOCK layer, from Andy King.

3) Much improved support for virtual functions and SR-IOV in bnx2x,
from Ariel ELior.

4) All protocols on ipv4 and ipv6 are now network namespace aware, and
all the compatability checks for initial-namespace-only protocols is
removed. Thanks to Tom Parkin for helping deal with the last major
holdout, L2TP.

5) IPV6 support in netpoll and network namespace support in pktgen,
from Cong Wang.

6) Multiple Registration Protocol (MRP) and Multiple VLAN Registration
Protocol (MVRP) support, from David Ward.

7) Compute packet lengths more accurately in the packet scheduler, from
Eric Dumazet.

8) Use per-task page fragment allocator in skb_append_datato_frags(),
also from Eric Dumazet.

9) Add support for connection tracking labels in netfilter, from
Florian Westphal.

10) Fix default multicast group joining on ipv6, and add anti-spoofing
checks to 6to4 and 6rd. From Hannes Frederic Sowa.

11) Make ipv4/ipv6 fragmentation memory limits more reasonable in modern
times, rearrange inet frag datastructures for better cacheline
locality, and move more operations outside of locking. From Jesper
Dangaard Brouer.

12) Instead of strict master slave relationships, allow arbitrary
scenerios with "upper device lists". From Jiri Pirko.

13) Improve rate limiting accuracy in TBF and act_police, also from Jiri
Pirko.

14) Add a BPF filter netfilter match target, from Willem de Bruijn.

15) Orphan and delete a bunch of pre-historic networking drivers from
Paul Gortmaker.

16) Add TSO support for GRE tunnels, from Pravin B SHelar. Although
this still needs some minor bug fixing before it's %100 correct in
all cases.

17) Handle unresolved IPSEC states like ARP, with a resolution packet
queue. From Steffen Klassert.

18) Remove TCP Appropriate Byte Count support (ABC), from Stephen
Hemminger. This was long overdue.

19) Support SO_REUSEPORT, from Tom Herbert.

20) Allow locking a socket BPF filter, so that it cannot change after a
process drops capabilities.

21) Add VLAN filtering to bridge, from Vlad Yasevich.

22) Bring ipv6 on-par with ipv4 and do not cache neighbour entries in
the ipv6 routes, from YOSHIFUJI Hideaki.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1538 commits)
ipv6: fix race condition regarding dst->expires and dst->from.
net: fix a wrong assignment in skb_split()
ip_gre: remove an extra dst_release()
ppp: set qdisc_tx_busylock to avoid LOCKDEP splat
atl1c: restore buffer state
net: fix a build failure when !CONFIG_PROC_FS
net: ipv4: fix waring -Wunused-variable
net: proc: fix build failed when procfs is not configured
Revert "xen: netback: remove redundant xenvif_put"
net: move procfs code to net/core/net-procfs.c
qmi_wwan, cdc-ether: add ADU960S
bonding: set sysfs device_type to 'bond'
bonding: fix bond_release_all inconsistencies
b44: use netdev_alloc_skb_ip_align()
xen: netback: remove redundant xenvif_put
net: fec: Do a sanity check on the gpio number
ip_gre: propogate target device GSO capability to the tunnel device
ip_gre: allow CSUM capable devices to handle packets
bonding: Fix initialize after use for 3ad machine state spinlock
bonding: Fix race condition between bond_enslave() and bond_3ad_update_lacp_rate()
...

Linus Torvalds
2013-02-21 10:58:50 +0800

19 Feb, 2013

2 commits

c2399059a net: proc: remove proc_net_remove ... Browse Code »

proc_net_remove has been replaced by remove_proc_entry.
we can remove it now.

Signed-off-by: Gao feng
Signed-off-by: David S. Miller

Gao feng
2013-02-19 03:53:08 +0800
b4278c961 net: proc: remove proc_net_fops_create ... Browse Code »

proc_net_fops_create has been replaced by proc_create,
we can remove it now.

Signed-off-by: Gao feng
Signed-off-by: David S. Miller

Gao feng
2013-02-19 03:53:08 +0800

28 Jan, 2013

1 commit

6fac4829c cputime: Use accessors to read task cputime stats ... Browse Code »

This is in preparation for the full dynticks feature. While
remotely reading the cputime of a task running in a full
dynticks CPU, we'll need to do some extra-computation. This
way we can account the time it spent tickless in userspace
since its last cputime snapshot.

Signed-off-by: Frederic Weisbecker
Cc: Andrew Morton
Cc: Ingo Molnar
Cc: Li Zhong
Cc: Namhyung Kim
Cc: Paul E. McKenney
Cc: Paul Gortmaker
Cc: Peter Zijlstra
Cc: Steven Rostedt
Cc: Thomas Gleixner

Frederic Weisbecker
2013-01-28 02:23:31 +0800

19 Jan, 2013

1 commit

4f73bc4dd tty: Added a CONFIG_TTY option to allow removal of TTY ... Browse Code »

The option allows you to remove TTY and compile without errors. This
saves space on systems that won't support TTY interfaces anyway.
bloat-o-meter output is below.

The bulk of this patch consists of Kconfig changes adding "depends on
TTY" to various serial devices and similar drivers that require the TTY
layer. Ideally, these dependencies would occur on a common intermediate
symbol such as SERIO, but most drivers "select SERIO" rather than
"depends on SERIO", and "select" does not respect dependencies.

bloat-o-meter output comparing our previous minimal to new minimal by
removing TTY. The list is filtered to not show removed entries with awk
'$3 != "-"' as the list was very long.

add/remove: 0/226 grow/shrink: 2/14 up/down: 6/-35356 (-35350)
function old new delta
chr_dev_init 166 170 +4
allow_signal 80 82 +2
static.__warned 143 142 -1
disallow_signal 63 62 -1
__set_special_pids 95 94 -1
unregister_console 126 121 -5
start_kernel 546 541 -5
register_console 593 588 -5
copy_from_user 45 40 -5
sys_setsid 128 120 -8
sys_vhangup 32 19 -13
do_exit 1543 1526 -17
bitmap_zero 60 40 -20
arch_local_irq_save 137 117 -20
release_task 674 652 -22
static.spin_unlock_irqrestore 308 260 -48

Signed-off-by: Joe Millenbach
Reviewed-by: Jamey Sharp
Reviewed-by: Josh Triplett
Signed-off-by: Greg Kroah-Hartman

Joe Millenbach
2013-01-19 08:15:27 +0800

03 Jan, 2013

1 commit

a7a88b237 mempolicy: remove arg from mpol_parse_str, mpol_to_str ... Browse Code »

Remove the unused argument (formerly no_context) from mpol_parse_str()
and from mpol_to_str().

Signed-off-by: Hugh Dickins
Signed-off-by: Linus Torvalds

Hugh Dickins
2013-01-03 01:27:10 +0800

26 Dec, 2012

1 commit

dfb2ea45b proc: Allow proc_free_inum to be called from any context ... Browse Code »

While testing the pid namespace code I hit this nasty warning.

[ 176.262617] ------------[ cut here ]------------
[ 176.263388] WARNING: at /home/eric/projects/linux/linux-userns-devel/kernel/softirq.c:160 local_bh_enable_ip+0x7a/0xa0()
[ 176.265145] Hardware name: Bochs
[ 176.265677] Modules linked in:
[ 176.266341] Pid: 742, comm: bash Not tainted 3.7.0userns+ #18
[ 176.266564] Call Trace:
[ 176.266564] [] warn_slowpath_common+0x7f/0xc0
[ 176.266564] [] warn_slowpath_null+0x1a/0x20
[ 176.266564] [] local_bh_enable_ip+0x7a/0xa0
[ 176.266564] [] _raw_spin_unlock_bh+0x19/0x20
[ 176.266564] [] proc_free_inum+0x3a/0x50
[ 176.266564] [] free_pid_ns+0x1c/0x80
[ 176.266564] [] put_pid_ns+0x35/0x50
[ 176.266564] [] put_pid+0x4a/0x60
[ 176.266564] [] tty_ioctl+0x717/0xc10
[ 176.266564] [] ? wait_consider_task+0x855/0xb90
[ 176.266564] [] ? default_spin_lock_flags+0x9/0x10
[ 176.266564] [] ? remove_wait_queue+0x5a/0x70
[ 176.266564] [] do_vfs_ioctl+0x98/0x550
[ 176.266564] [] ? recalc_sigpending+0x1f/0x60
[ 176.266564] [] ? __set_task_blocked+0x37/0x80
[ 176.266564] [] ? sys_wait4+0xab/0xf0
[ 176.266564] [] sys_ioctl+0x91/0xb0
[ 176.266564] [] ? task_stopped_code+0x50/0x50
[ 176.266564] [] system_call_fastpath+0x16/0x1b
[ 176.266564] ---[ end trace 387af88219ad6143 ]---

It turns out that spin_unlock_bh(proc_inum_lock) is not safe when
put_pid is called with another spinlock held and irqs disabled.

For now take the easy path and use spin_lock_irqsave(proc_inum_lock)
in proc_free_inum and spin_loc_irq in proc_alloc_inum(proc_inum_lock).

Signed-off-by: "Eric W. Biederman"

Eric W. Biederman
2012-12-26 08:23:12 +0800

21 Dec, 2012

3 commits

4c9a44aeb Merge branch 'akpm' (Andrew's patch-bomb) ... Browse Code »

Merge the rest of Andrew's patches for -rc1:
"A bunch of fixes and misc missed-out-on things.

That'll do for -rc1. I still have a batch of IPC patches which still
have a possible bug report which I'm chasing down."

* emailed patches from Andrew Morton : (25 commits)
keys: use keyring_alloc() to create module signing keyring
keys: fix unreachable code
sendfile: allows bypassing of notifier events
SGI-XP: handle non-fatal traps
fat: fix incorrect function comment
Documentation: ABI: remove testing/sysfs-devices-node
proc: fix inconsistent lock state
linux/kernel.h: fix DIV_ROUND_CLOSEST with unsigned divisors
memcg: don't register hotcpu notifier from ->css_alloc()
checkpatch: warn on uapi #includes that #include
mm: cma: WARN if freed memory is still in use
exec: do not leave bprm->interp on stack
...

Linus Torvalds
2012-12-21 12:00:43 +0800
ee297209b proc: fix inconsistent lock state ... Browse Code »

Lockdep found an inconsistent lock state when rcu is processing delayed
work in softirq. Currently, kernel is using spin_lock/spin_unlock to
protect proc_inum_ida, but proc_free_inum is called by rcu in softirq
context.

Use spin_lock_bh/spin_unlock_bh fix following lockdep warning.

=================================
[ INFO: inconsistent lock state ]
3.7.0 #36 Not tainted
---------------------------------
inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
swapper/1/0 [HC0[0]:SC1[1]:HE1:SE0] takes:
(proc_inum_lock){+.?...}, at: proc_free_inum+0x1c/0x50
{SOFTIRQ-ON-W} state was registered at:
__lock_acquire+0x8ae/0xca0
lock_acquire+0x199/0x200
_raw_spin_lock+0x41/0x50
proc_alloc_inum+0x4c/0xd0
alloc_mnt_ns+0x49/0xc0
create_mnt_ns+0x25/0x70
mnt_init+0x161/0x1c7
vfs_caches_init+0x107/0x11a
start_kernel+0x348/0x38c
x86_64_start_reservations+0x131/0x136
x86_64_start_kernel+0x103/0x112
irq event stamp: 2993422
hardirqs last enabled at (2993422): _raw_spin_unlock_irqrestore+0x55/0x80
hardirqs last disabled at (2993421): _raw_spin_lock_irqsave+0x29/0x70
softirqs last enabled at (2993394): _local_bh_enable+0x13/0x20
softirqs last disabled at (2993395): call_softirq+0x1c/0x30

other info that might help us debug this:
Possible unsafe locking scenario:

CPU0
----
lock(proc_inum_lock);

lock(proc_inum_lock);

*** DEADLOCK ***

no locks held by swapper/1/0.

stack backtrace:
Pid: 0, comm: swapper/1 Not tainted 3.7.0 #36
Call Trace:
[] ? vprintk_emit+0x471/0x510
print_usage_bug+0x2a5/0x2c0
mark_lock+0x33b/0x5e0
__lock_acquire+0x813/0xca0
lock_acquire+0x199/0x200
_raw_spin_lock+0x41/0x50
proc_free_inum+0x1c/0x50
free_pid_ns+0x1c/0x50
put_pid_ns+0x2e/0x50
put_pid+0x4a/0x60
delayed_put_pid+0x12/0x20
rcu_process_callbacks+0x462/0x790
__do_softirq+0x1b4/0x3b0
call_softirq+0x1c/0x30
do_softirq+0x59/0xd0
irq_exit+0x54/0xd0
smp_apic_timer_interrupt+0x95/0xa3
apic_timer_interrupt+0x72/0x80
cpuidle_enter_tk+0x10/0x20
cpuidle_enter_state+0x17/0x50
cpuidle_idle_call+0x287/0x520
cpu_idle+0xba/0x130
start_secondary+0x2b3/0x2bc

Signed-off-by: Xiaotian Feng
Cc: Al Viro
Cc: "Eric W. Biederman"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Xiaotian Feng
2012-12-21 09:40:20 +0800
46f695571 procfs: drop vmtruncate ... Browse Code »

Removed vmtruncate

Signed-off-by: Marco Stornelli
Signed-off-by: Al Viro

Marco Stornelli
2012-12-21 03:00:01 +0800

18 Dec, 2012

5 commits

848b81415 Merge branch 'akpm' (Andrew's patch-bomb) ... Browse Code »

Merge misc patches from Andrew Morton:
"Incoming:

- lots of misc stuff

- backlight tree updates

- lib/ updates

- Oleg's percpu-rwsem changes

- checkpatch

- rtc

- aoe

- more checkpoint/restart support

I still have a pile of MM stuff pending - Pekka should be merging
later today after which that is good to go. A number of other things
are twiddling thumbs awaiting maintainer merges."

* emailed patches from Andrew Morton : (180 commits)
scatterlist: don't BUG when we can trivially return a proper error.
docs: update documentation about /proc//fdinfo/ fanotify output
fs, fanotify: add @mflags field to fanotify output
docs: add documentation about /proc//fdinfo/ output
fs, notify: add procfs fdinfo helper
fs, exportfs: add exportfs_encode_inode_fh() helper
fs, exportfs: escape nil dereference if no s_export_op present
fs, epoll: add procfs fdinfo helper
fs, eventfd: add procfs fdinfo helper
procfs: add ability to plug in auxiliary fdinfo providers
tools/testing/selftests/kcmp/kcmp_test.c: print reason for failure in kcmp_test
breakpoint selftests: print failure status instead of cause make error
kcmp selftests: print fail status instead of cause make error
kcmp selftests: make run_tests fix
mem-hotplug selftests: print failure status instead of cause make error
cpu-hotplug selftests: print failure status instead of cause make error
mqueue selftests: print failure status instead of cause make error
vm selftests: print failure status instead of cause make error
ubifs: use prandom_bytes
mtd: nandsim: use prandom_bytes
...

Linus Torvalds
2012-12-18 12:58:12 +0800
138d22b58 fs, epoll: add procfs fdinfo helper ... Browse Code »

This allows us to print out eventpoll target file descriptor, events and
data, the /proc/pid/fdinfo/fd consists of

| pos: 0
| flags: 02
| tfd: 5 events: 1d data: ffffffffffffffff enabled: 1

[avagin@: fix for unitialized ret variable]

Signed-off-by: Cyrill Gorcunov
Acked-by: Pavel Emelyanov
Cc: Oleg Nesterov
Cc: Andrey Vagin
Cc: Al Viro
Cc: Alexey Dobriyan
Cc: James Bottomley
Cc: "Aneesh Kumar K.V"
Cc: Alexey Dobriyan
Cc: Matthew Helsley
Cc: "J. Bruce Fields"
Cc: "Aneesh Kumar K.V"
Cc: Tvrtko Ursulin
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Cyrill Gorcunov
2012-12-18 09:15:27 +0800
55985dd72 procfs: add ability to plug in auxiliary fdinfo providers ... Browse Code »

This patch brings ability to print out auxiliary data associated with
file in procfs interface /proc/pid/fdinfo/fd.

In particular further patches make eventfd, evenpoll, signalfd and
fsnotify to print additional information complete enough to restore
these objects after checkpoint.

To simplify the code we add show_fdinfo callback inside struct
file_operations (as Al and Pavel are proposing).

Signed-off-by: Cyrill Gorcunov
Acked-by: Pavel Emelyanov
Cc: Oleg Nesterov
Cc: Andrey Vagin
Cc: Al Viro
Cc: Alexey Dobriyan
Cc: James Bottomley
Cc: "Aneesh Kumar K.V"
Cc: Alexey Dobriyan
Cc: Matthew Helsley
Cc: "J. Bruce Fields"
Cc: "Aneesh Kumar K.V"
Cc: Tvrtko Ursulin
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Cyrill Gorcunov
2012-12-18 09:15:27 +0800
8d238027b proc: pid/status: show all supplementary groups ... Browse Code »

We display a list of supplementary group for each process in
/proc//status. However, we show only the first 32 groups, not all of
them.

Although this is rare, but sometimes processes do have more than 32
supplementary groups, and this kernel limitation breaks user-space apps
that rely on the group list in /proc//status.

Number 32 comes from the internal NGROUPS_SMALL macro which defines the
length for the internal kernel "small" groups buffer. There is no
apparent reason to limit to this value.

This patch removes the 32 groups printing limit.

The Linux kernel limits the amount of supplementary groups by NGROUPS_MAX,
which is currently set to 65536. And this is the maximum count of groups
we may possibly print.

Signed-off-by: Artem Bityutskiy
Acked-by: Serge E. Hallyn
Acked-by: Kees Cook
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Artem Bityutskiy
2012-12-18 09:15:23 +0800
2f4b3bf6b /proc/pid/status: add "Seccomp" field ... Browse Code »

It is currently impossible to examine the state of seccomp for a given
process. While attaching with gdb and attempting "call
prctl(PR_GET_SECCOMP,...)" will work with some situations, it is not
reliable. If the process is in seccomp mode 1, this query will kill the
process (prctl not allowed), if the process is in mode 2 with prctl not
allowed, it will similarly be killed, and in weird cases, if prctl is
filtered to return errno 0, it can look like seccomp is disabled.

When reviewing the state of running processes, there should be a way to
externally examine the seccomp mode. ("Did this build of Chrome end up
using seccomp?" "Did my distro ship ssh with seccomp enabled?")

This adds the "Seccomp" line to /proc/$pid/status.

Signed-off-by: Kees Cook
Reviewed-by: Cyrill Gorcunov
Cc: Andrea Arcangeli
Cc: James Morris
Acked-by: Serge E. Hallyn
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Kees Cook
2012-12-18 09:15:22 +0800