26 Sep, 2014

3 commits

  • commit e7b563bb2a6f4d974208da46200784b9c5b5a47e upstream.

    The radix tree hole searching code is only used for page cache, for
    example the readahead code trying to get a picture of the area
    surrounding a fault.

    It sufficed to rely on the radix tree definition of holes, which is
    "empty tree slot". But this is about to change, as shadow page
    descriptors will be stored in the page cache after the actual pages get
    evicted from memory.

    Move the functions over to mm/filemap.c and make them native page cache
    operations, where they can later be adapted to handle the new definition
    of "page cache hole".

    Signed-off-by: Johannes Weiner
    Reviewed-by: Rik van Riel
    Reviewed-by: Minchan Kim
    Acked-by: Mel Gorman
    Cc: Andrea Arcangeli
    Cc: Bob Liu
    Cc: Christoph Hellwig
    Cc: Dave Chinner
    Cc: Greg Thelen
    Cc: Hugh Dickins
    Cc: Jan Kara
    Cc: KOSAKI Motohiro
    Cc: Luigi Semenzato
    Cc: Metin Doslu
    Cc: Michel Lespinasse
    Cc: Ozgun Erdogan
    Cc: Peter Zijlstra
    Cc: Roman Gushchin
    Cc: Ryan Mallon
    Cc: Tejun Heo
    Cc: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Mel Gorman
    Signed-off-by: Jiri Slaby

    Johannes Weiner
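
    A sketch of the relocated helper in use (the helper name matches the
    patch; the readahead-style caller context is illustrative):

        pgoff_t hole;

        /* find the first page cache hole at or above `index', scanning at
         * most `max_scan' slots; everything in [index, hole) is cached */
        hole = page_cache_next_hole(mapping, index, max_scan);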
     
  • commit 53c59f262d747ea82e7414774c59a489501186a0 upstream.

    Provide a function that does not just delete an entry at a given index,
    but also allows passing in an expected item. Delete only if that item
    is still located at the specified index.

    This is handy when lockless tree traversals want to delete entries as
    well because they don't have to do a second, locked lookup to verify
    the slot has not changed under them before deleting the entry.

    Signed-off-by: Johannes Weiner
    Reviewed-by: Minchan Kim
    Reviewed-by: Rik van Riel
    Acked-by: Mel Gorman
    Cc: Andrea Arcangeli
    Cc: Bob Liu
    Cc: Christoph Hellwig
    Cc: Dave Chinner
    Cc: Greg Thelen
    Cc: Hugh Dickins
    Cc: Jan Kara
    Cc: KOSAKI Motohiro
    Cc: Luigi Semenzato
    Cc: Metin Doslu
    Cc: Michel Lespinasse
    Cc: Ozgun Erdogan
    Cc: Peter Zijlstra
    Cc: Roman Gushchin
    Cc: Ryan Mallon
    Cc: Tejun Heo
    Cc: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Mel Gorman
    Signed-off-by: Jiri Slaby

    Johannes Weiner
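
    A sketch of the new calling convention (the locking and the `expected'
    pointer are illustrative):

        void *deleted;

        /* `expected' is what a lockless traversal saw in the slot */
        spin_lock_irq(&tree_lock);
        deleted = radix_tree_delete_item(&tree, index, expected);
        spin_unlock_irq(&tree_lock);

        /* deleted == expected on success; NULL means the slot changed
         * under us and nothing was removed */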
     
  • commit a75f232ce0fe38bd01301899ecd97ffd0254316a upstream.

    Add plist_requeue(), which moves the specified plist_node after all other
    same-priority plist_nodes in the list. This is essentially an optimized
    plist_del() followed by plist_add().

    This is needed by swap, which (with the next patch in this set) uses a
    plist of available swap devices. When a swap device (either a swap
    partition or a swap file) is added to the system with swapon(), the device
    is added to a plist, ordered by the swap device's priority. When swap
    needs to allocate a page from one of the swap devices, it takes the page
    from the first swap device on the plist, which is the highest priority
    swap device. The swap device is left on the plist until it becomes
    full, at which point it is removed from the plist.

    However, as described in man 2 swapon, swap must allocate pages from swap
    devices with the same priority in round-robin order; to do this, on each
    swap page allocation, swap uses a page from the first swap device in the
    plist, and then calls plist_requeue() to move that swap device entry to
    after any other same-priority swap devices. The next swap page allocation
    will again use a page from the first swap device in the plist and requeue
    it, and so on, resulting in round-robin usage of equal-priority swap
    devices.

    Also add a plist_test_requeue() test function, for use by plist_test()
    to exercise the plist_requeue() function.

    Signed-off-by: Dan Streetman
    Cc: Steven Rostedt
    Cc: Peter Zijlstra
    Acked-by: Mel Gorman
    Cc: Paul Gortmaker
    Cc: Thomas Gleixner
    Cc: Shaohua Li
    Cc: Hugh Dickins
    Cc: Dan Streetman
    Cc: Michal Hocko
    Cc: Christian Ehrhardt
    Cc: Weijie Yang
    Cc: Rik van Riel
    Cc: Johannes Weiner
    Cc: Bob Liu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Mel Gorman
    Signed-off-by: Jiri Slaby

    Dan Streetman
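
    A sketch of the round-robin pattern described above (swap_avail_head and
    the avail_list member stand in for the structures added by the follow-up
    swap patch):

        struct swap_info_struct *si;

        si = plist_first_entry(&swap_avail_head, struct swap_info_struct,
                               avail_list);
        /* ... allocate a swap page from si ... */

        /* rotate si behind any other entries of equal priority so the next
         * allocation picks a different same-priority device */
        plist_requeue(&si->avail_list, &swap_avail_head);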
     

19 Aug, 2014

1 commit

  • commit c75b53af2f0043aff500af0a6f878497bef41bca upstream.

    I use btree from 3.14-rc2 in my own module. When the btree module is
    removed, a warning arises:

    kmem_cache_destroy btree_node: Slab cache still has objects
    CPU: 13 PID: 9150 Comm: rmmod Tainted: GF O 3.14.0-rc2 #1
    Hardware name: Inspur NF5270M3/NF5270M3, BIOS CHEETAH_2.1.3 09/10/2013
    Call Trace:
    dump_stack+0x49/0x5d
    kmem_cache_destroy+0xcf/0xe0
    btree_module_exit+0x10/0x12 [btree]
    SyS_delete_module+0x198/0x1f0
    system_call_fastpath+0x16/0x1b

    The cause is that it doesn't release the last btree node when height = 1
    and fill = 1.

    [akpm@linux-foundation.org: remove unneeded test of NULL]
    Signed-off-by: Minfei Huang
    Cc: Joern Engel
    Cc: Johannes Berg
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Jiri Slaby

    Minfei Huang
     

17 Jul, 2014

1 commit

  • commit 4a3a99045177369700c60d074c0e525e8093b0fc upstream.

    Jan points out that I forgot to make the needed fixes to the
    lz4_uncompress_unknownoutputsize() function to mirror the changes done
    in lz4_decompress() with regards to potential pointer overflows.

    The only in-kernel user of this function is the zram code, which only
    takes data from a valid compressed buffer that it made itself, so it's
    not a big issue. But due to external kernel modules using this
    function, it's better to be safe here.

    Reported-by: Jan Beulich
    Cc: "Don A. Bailey"
    Signed-off-by: Jiri Slaby

    Greg Kroah-Hartman
     

02 Jul, 2014

4 commits

  • commit 4148c1f67abf823099b2d7db6851e4aea407f5ee upstream.

    There is one other possible overrun in the lz4 code as implemented by
    Linux at this point in time (which differs from the upstream lz4
    codebase, but will be synced up in a future kernel release). As
    pointed out by Don, we also need to check the overflow in the data
    itself.

    While we are at it, replace the odd error return value with just a
    "simple" -1 value as the return value is never used for anything other
    than a basic "did this work or not" check.

    Reported-by: "Don A. Bailey"
    Reported-by: Willy Tarreau
    Signed-off-by: Jiri Slaby

    Greg Kroah-Hartman
     
  • commit 3afb69cb5572b3c8c898c00880803cf1a49852c4 upstream.

    idr_replace() open-codes the logic to calculate the maximum valid ID
    given the height of the idr tree; unfortunately, the open-coded logic
    doesn't account for the fact that the top layer may have unused slots
    and over-shifts the limit to zero when the tree is at its maximum
    height.

    The commit includes test code showing that idr_replace() fails to
    replace the value for an id high enough to require the maximum tree
    height.

    Signed-off-by: Lai Jiangshan
    Acked-by: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Jiri Slaby

    Lai Jiangshan
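
    A sketch of the failure mode (the start value is hypothetical; any id
    that forces the idr to its maximum height triggers it):

        DEFINE_IDR(test_idr);
        int start = (1 << 27) + 42;   /* hypothetical: needs the top layer */
        int id;

        id = idr_alloc(&test_idr, (void *)1, start, 0, GFP_KERNEL);
        BUG_ON(id != start);
        /* before the fix this returned an error pointer instead of (void *)1 */
        WARN_ON(idr_replace(&test_idr, (void *)2, id) != (void *)1);
        idr_destroy(&test_idr);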
     
  • commit 206204a1162b995e2185275167b22468c00d6b36 upstream.

    Given some pathologically compressed data, lz4 could possibly decide to
    wrap a few internal variables, causing unknown things to happen. Catch
    this before the wrapping happens and abort the decompression.

    Reported-by: "Don A. Bailey"
    Signed-off-by: Jiri Slaby

    Greg Kroah-Hartman
     
  • commit 206a81c18401c0cde6e579164f752c4b147324ce upstream.

    The lzo decompressor can, if given some really crazy data, possibly
    overrun some variable types. Modify the checking logic to properly
    detect overruns before they happen.

    Reported-by: "Don A. Bailey"
    Tested-by: "Don A. Bailey"
    Signed-off-by: Jiri Slaby

    Greg Kroah-Hartman
     

23 Jun, 2014

1 commit

  • [ Upstream commit bfc5184b69cf9eeb286137640351c650c27f118a ]

    Any process is able to send netlink messages with leftover bytes.
    Make the warning rate-limited to prevent too much log spam.

    The warning is supposed to help find userspace bugs, so print the
    triggering command name to implicate the buggy program.

    [v2: Use pr_warn_ratelimited instead of printk_ratelimited.]

    Signed-off-by: Michal Schmidt
    Signed-off-by: David S. Miller
    Signed-off-by: Jiri Slaby

    Michal Schmidt
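
    The change boils down to something like this (sketch; the message text
    follows the pattern described above):

        pr_warn_ratelimited("netlink: %d bytes leftover after parsing attributes in process `%s'.\n",
                            rem, current->comm);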
     

15 May, 2014

1 commit

  • commit e39435ce68bb4685288f78b1a7e24311f7ef939f upstream.

    I got a bug report yesterday from Laszlo Ersek in which he states that
    his kvm instance fails to suspend. Laszlo bisected it down to this
    commit 1cf7e9c68fe8 ("virtio_blk: blk-mq support") where virtio-blk is
    converted to use the blk-mq infrastructure.

    After digging a bit, it became clear that the issue was with the queue
    drain. blk-mq tracks queue usage in a percpu counter, which is
    incremented on request alloc and decremented when the request is freed.
    The initial hunt was for an inconsistency in blk-mq, but everything
    seemed fine. In fact, the counter only returned crazy values when
    suspend was in progress.

    When a CPU is unplugged, the percpu counter code merges that CPU's state with
    the general state. blk-mq takes care to register a hotcpu notifier with
    the appropriate priority, so we know it runs after the percpu counter
    notifier. However, the percpu counter notifier only merges the state
    when the CPU is fully gone. This leaves a state transition where the
    CPU going away is no longer in the online mask, yet it still holds
    private values. This means that in this state, percpu_counter_sum()
    returns invalid results, and the suspend then hangs waiting for
    abs(dead-cpu-value) requests to complete, which of course will never
    happen.

    Fix this by clearing the state earlier, so we never have a case where
    the CPU isn't in the online mask but still holds private state. This bug
    has been there since forever; I guess we don't have a lot of users that
    need percpu counters to be reliable across the suspend cycle.

    Signed-off-by: Jens Axboe
    Reported-by: Laszlo Ersek
    Tested-by: Laszlo Ersek
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Jiri Slaby

    Jens Axboe
     

18 Apr, 2014

1 commit

  • [ Upstream commit 8b7b932434f5eee495b91a2804f5b64ebb2bc835 ]

    nla_strcmp compares the string length plus one, so it's implicitly
    including the nul-termination in the comparison.

    int nla_strcmp(const struct nlattr *nla, const char *str)
    {
            int len = strlen(str) + 1;
            ...
            d = memcmp(nla_data(nla), str, len);

    However, if NLA_STRING is used, userspace can send us a string without
    the nul-termination. This is a problem since the string
    comparison will not match as the last byte may not be the
    nul-termination.

    Fix this by skipping the comparison of the nul-termination if the
    attribute data is nul-terminated. Suggested by Thomas Graf.

    Cc: Florian Westphal
    Cc: Thomas Graf
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: David S. Miller
    Signed-off-by: Jiri Slaby

    Pablo Neira
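
    The reworked comparison is along these lines (a sketch, not necessarily
    the exact upstream code):

        int nla_strcmp(const struct nlattr *nla, const char *str)
        {
                int len = strlen(str);
                char *buf = nla_data(nla);
                int attrlen = nla_len(nla);
                int d;

                /* ignore a trailing nul-termination if userspace sent one */
                if (attrlen > 0 && buf[attrlen - 1] == '\0')
                        attrlen--;

                d = attrlen - len;
                if (d == 0)
                        d = memcmp(nla_data(nla), str, len);

                return d;
        }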
     

12 Mar, 2014

1 commit

  • commit c78e93630d15b5f5774213aad9bdc9f52473a89b upstream.

    It has been reported on very large machines that show_mem is taking almost
    5 minutes to display information. This is a serious problem if there is
    an OOM storm. The bulk of the cost is in show_mem doing a very expensive
    PFN walk to give us the following information:

    Total RAM: Also available as totalram_pages
    Highmem pages: Also available as totalhigh_pages
    Reserved pages: Can be inferred from the zone structure
    Shared pages: PFN walk required
    Unshared pages: PFN walk required
    Quick pages: Per-cpu walk required

    Only the shared/unshared pages require a full PFN walk but that
    information is useless. It is also inaccurate as page pins of unshared
    pages would be accounted for as shared. Even if the information was
    accurate, I'm struggling to think how the shared/unshared information
    could be useful for debugging OOM conditions. Maybe it was useful before
    rmap existed when reclaiming shared pages was costly but it is less
    relevant today.

    The PFN walk could be optimised a bit but why bother as the information is
    useless. This patch deletes the PFN walker and infers the total RAM,
    highmem and reserved pages count from struct zone. It omits the
    shared/unshared page usage on the grounds that it is useless. It also
    corrects the reporting of HighMem as HighMem/MovableOnly as ZONE_MOVABLE
    has similar problems to HighMem with respect to lowmem/highmem exhaustion.

    Signed-off-by: Mel Gorman
    Cc: David Rientjes
    Acked-by: KOSAKI Motohiro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Jiri Slaby

    Mel Gorman
     

21 Feb, 2014

1 commit

  • commit 6583327c4dd55acbbf2a6f25e775b28b3abf9a42 upstream.

    Commit d61931d89b ("x86: Add optimized popcnt variants") introduced
    compile flag -fcall-saved-rdi for lib/hweight.c. When combined with
    options -fprofile-arcs and -O2, this flag causes gcc to generate
    broken constructor code. As a result, a 64 bit x86 kernel compiled
    with CONFIG_GCOV_PROFILE_ALL=y prints the message "gcov: could not create
    file" and runs into sporadic BUGs during boot.

    The gcc people indicate that these kinds of problems are endemic when
    using ad hoc calling conventions. It is therefore best to treat any
    file compiled with ad hoc calling conventions as an isolated
    environment and avoid things like profiling or coverage analysis,
    since those subsystems assume "normal" calling conventions.

    This patch avoids the bug by excluding lib/hweight.o from coverage
    profiling.

    Reported-by: Meelis Roos
    Cc: Andrew Morton
    Signed-off-by: Peter Oberparleiter
    Link: http://lkml.kernel.org/r/52F3A30C.7050205@linux.vnet.ibm.com
    Signed-off-by: H. Peter Anvin
    Signed-off-by: Greg Kroah-Hartman

    Peter Oberparleiter
     

14 Feb, 2014

2 commits

  • commit 555b270e25b0279b98083518a85f4b1da144a181 upstream.

    This patch addresses a bug where connection reset would hang
    indefinitely once percpu_ida_alloc() was starved for tags, due
    to the fact that it always assumed uninterruptible sleep mode.

    So now make percpu_ida_alloc() check signal_pending_state() so that
    interruptible sleep is optional, and convert iscsit_allocate_cmd()
    to set TASK_INTERRUPTIBLE for GFP_KERNEL, or TASK_RUNNING for
    GFP_ATOMIC.

    Reported-by: Linus Torvalds
    Cc: Kent Overstreet
    Signed-off-by: Nicholas Bellinger
    Signed-off-by: Greg Kroah-Hartman

    Nicholas Bellinger
     
  • commit 6f6b5d1ec56acdeab0503d2b823f6f88a0af493e upstream.

    This patch changes percpu_ida_alloc() + callers to accept task state
    bitmask for prepare_to_wait() for code like target/iscsi that needs
    it for interruptible sleep, that is provided in a subsequent patch.

    It now expects TASK_UNINTERRUPTIBLE when the caller is able to sleep
    waiting for a new tag, or TASK_RUNNING when the caller cannot sleep,
    and is forced to return a negative value when no tags are available.

    v2 changes:
    - Include blk-mq + tcm_fc + vhost/scsi + target/iscsi changes
    - Drop signal_pending_state() call
    v3 changes:
    - Only call prepare_to_wait() + finish_wait() when != TASK_RUNNING
    (PeterZ)

    Reported-by: Linus Torvalds
    Cc: Linus Torvalds
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Jens Axboe
    Signed-off-by: Kent Overstreet
    Signed-off-by: Nicholas Bellinger
    Signed-off-by: Greg Kroah-Hartman

    Kent Overstreet
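
    In short, callers now pass the task state they are prepared to sleep in
    (sketch):

        /* may sleep until a tag becomes available */
        tag = percpu_ida_alloc(&pool, TASK_UNINTERRUPTIBLE);

        /* must not sleep; returns a negative value if no tag is free */
        tag = percpu_ida_alloc(&pool, TASK_RUNNING);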
     

08 Dec, 2013

1 commit

  • [ Upstream commit 51c37a70aaa3f95773af560e6db3073520513912 ]

    For properly initialising the Tausworthe generator [1], we have
    a strict seeding requirement, that is, s1 > 1, s2 > 7, s3 > 15.

    Commit 697f8d0348 ("random32: seeding improvement") introduced
    a __seed() function that imposes boundary checks proposed by the
    errata paper [2] to properly ensure above conditions.

    However, we're off by one, as the function is implemented as:
    "return (x < m) ? x + m : x;", and called with __seed(X, 1),
    __seed(X, 7), __seed(X, 15). Thus, an unwanted seed of 1, 7, 15
    would be possible, whereas the lower boundary should actually
    be at least 2, 8, 16, just as GSL does. Fix this, as otherwise
    an initialization with an unwanted seed could have the effect
    that Tausworthe's PRNG properties cannot be ensured.

    Note that this PRNG is *not* used for cryptography in the kernel.

    [1] http://www.iro.umontreal.ca/~lecuyer/myftp/papers/tausme.ps
    [2] http://www.iro.umontreal.ca/~lecuyer/myftp/papers/tausme2.ps

    Joint work with Hannes Frederic Sowa.

    Fixes: 697f8d0348a6 ("random32: seeding improvement")
    Cc: Stephen Hemminger
    Cc: Florian Weimer
    Cc: Theodore Ts'o
    Signed-off-by: Daniel Borkmann
    Signed-off-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Daniel Borkmann
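
    To see the off-by-one concretely (the helper body is quoted from the
    text above; the corrected bounds follow the commit's 2, 8, 16 values):

        static inline u32 __seed(u32 x, u32 m)
        {
                return (x < m) ? x + m : x;
        }

        /* __seed(1, 1): 1 < 1 is false, so the seed stays 1, violating
         * s1 > 1. Using __seed(x, 2), __seed(x, 8), __seed(x, 16) instead
         * guarantees results of at least 2, 8 and 16. */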
     

05 Dec, 2013

1 commit

  • commit 312b4e226951f707e120b95b118cbc14f3d162b2 upstream.

    Some setuid binaries will allow reading of files which have read
    permission by the real user id. This is problematic with files which
    use %pK because the file access permission is checked at open() time,
    but the kptr_restrict setting is checked at read() time. If a setuid
    binary opens a %pK file as an unprivileged user, and then elevates
    permissions before reading the file, then kernel pointer values may be
    leaked.

    This happens for example with the setuid pppd application on Ubuntu 12.04:

    $ head -1 /proc/kallsyms
    00000000 T startup_32

    $ pppd file /proc/kallsyms
    pppd: In file /proc/kallsyms: unrecognized option 'c1000000'

    This will only leak the pointer value from the first line, but other
    setuid binaries may leak more information.

    Fix this by adding a check that in addition to the current process having
    CAP_SYSLOG, that effective user and group ids are equal to the real ids.
    If a setuid binary reads the contents of a file which uses %pK then the
    pointer values will be printed as NULL if the real user is unprivileged.

    Update the sysctl documentation to reflect the changes, and also correct
    the documentation to state the kptr_restrict=0 is the default.

    This is only a temporary solution to the issue. The correct solution is
    to do the permission check at open() time on files, and to replace %pK
    with a function which checks the open() time permission. %pK uses in
    printk should be removed since no sane permission check can be done
    there, and should instead be protected by using dmesg_restrict.

    Signed-off-by: Ryan Mallon
    Cc: Kees Cook
    Cc: Alexander Viro
    Cc: Joe Perches
    Cc: "Eric W. Biederman"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Ryan Mallon
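
    The added check is roughly of this shape (a sketch, not the exact
    vsprintf code):

        const struct cred *cred = current_cred();

        if (!has_capability_noaudit(current, CAP_SYSLOG) ||
            !uid_eq(cred->euid, cred->uid) ||
            !gid_eq(cred->egid, cred->gid))
                ptr = NULL;    /* %pK then prints the pointer as NULL */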
     

01 Nov, 2013

1 commit

  • Commit b1adaf65ba03 ("[SCSI] block: add sg buffer copy helper
    functions") introduces two sg buffer copy helpers, and calls
    flush_kernel_dcache_page() on pages in SG list after these pages are
    written to.

    Unfortunately, the commit may introduce a potential bug:

    - Before sending some SCSI commands, a kmalloc() buffer may be passed to
    the block layer, so flush_kernel_dcache_page() can end up seeing a slab
    page

    - According to cachetlb.txt, flush_kernel_dcache_page() is only called
    on "a user page", which surely can't be a slab page.

    - ARCH's implementation of flush_kernel_dcache_page() may use page
    mapping information to do optimization so page_mapping() will see the
    slab page, then VM_BUG_ON() is triggered.

    Aaro Koskinen reported the bug on ARM/kirkwood when DEBUG_VM is enabled,
    and this patch fixes the bug by adding a test of '!PageSlab(miter->page)'
    before calling flush_kernel_dcache_page().

    Signed-off-by: Ming Lei
    Reported-by: Aaro Koskinen
    Tested-by: Simon Baatz
    Cc: Russell King - ARM Linux
    Cc: Will Deacon
    Cc: Aaro Koskinen
    Acked-by: Catalin Marinas
    Cc: FUJITA Tomonori
    Cc: Tejun Heo
    Cc: "James E.J. Bottomley"
    Cc: Jens Axboe
    Cc: [3.2+]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ming Lei
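
    The fix itself is small (sketch of the guarded call in the sg miter
    code):

        /* slab pages must never be handed to flush_kernel_dcache_page() */
        if (!PageSlab(miter->page))
                flush_kernel_dcache_page(miter->page);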
     

11 Oct, 2013

1 commit

  • Useful for locating buggy drivers on kernel oops.

    It may add dozens of new lines to boot dmesg. DEBUG_KOBJECT_RELEASE is
    hopefully only enabled in debug kernels (like maybe the Fedora rawhide
    one, or by developers), so being a bit more verbose is likely ok.

    Acked-by: Russell King
    Cc: Greg Kroah-Hartman
    Signed-off-by: Fengguang Wu
    Signed-off-by: Linus Torvalds

    Fengguang Wu
     

02 Oct, 2013

1 commit

  • Pull networking changes from David Miller:

    1) Multiply in netfilter IPVS can overflow when calculating destination
    weight. From Simon Kirby.

    2) Use after free fixes in IPVS from Julian Anastasov.

    3) SFC driver bug fixes from Daniel Pieczko.

    4) Memory leak in pcan_usb_core failure paths, from Alexey Khoroshilov.

    5) Locking and encapsulation fixes to serial line CAN driver, from
    Andre Naujoks.

    6) Duplex and VF handling fixes to bnx2x driver from Yaniv Rosner,
    Eilon Greenstein, and Ariel Elior.

    7) In lapb, if no other packets are outstanding, T1 timeouts actually
    stall things and no packet gets sent. Fix from Josselin Costanzi.

    8) ICMP redirects should not make it to the socket error queues, from
    Duan Jiong.

    9) Fix bugs in skge DMA mapping error handling, from Mikulas Patocka.

    10) Fix setting of VLAN priority field on via-rhine driver, from Roger
    Luethi.

    11) Fix TX stalls and VLAN promisc programming in be2net driver from
    Ajit Khaparde.

    12) Packet padding doesn't get handled correctly in new usbnet SG
    support code, from Ming Lei.

    13) Fix races in netdevice teardown wrt. network namespace closing.
    From Eric W. Biederman.

    14) Fix potential missed initialization of net_secret if no TCP
    connections are opened. From Eric Dumazet.

    15) Cinterion PLXX product ID in qmi_wwan driver is wrong, from
    Aleksander Morgado.

    16) skb_cow_head() can change skb->data and thus packet header pointers,
    don't use stale ip_hdr reference in ip_tunnel code.

    17) Backend state transition handling fixes in xen-netback, from Paul
    Durrant.

    18) Packet offset for AH protocol is handled wrong in flow dissector,
    from Eric Dumazet.

    19) Taking down an fq packet scheduler instance can leave stale packets
    in the queues, fix from Eric Dumazet.

    20) Fix performance regressions introduced by TCP Small Queues. From
    Eric Dumazet.

    21) IPV6 GRE tunneling code calculates max_headroom incorrectly, from
    Hannes Frederic Sowa.

    22) Multicast timer handlers in ipv4 and ipv6 can be the last and final
    reference to the ipv4/ipv6 specific network device state, so use the
    reference put that will check and release the object if the
    reference hits zero. From Salam Noureddine.

    23) Fix memory corruption in ip_tunnel driver, and use skb_push()
    instead of __skb_push() so that similar bugs are easier to find.
    From Steffen Klassert.

    24) Add forgotten hookup of rtnl_ops in SIT and ip6tnl drivers, from
    Nicolas Dichtel.

    25) fq scheduler doesn't accurately rate limit in certain circumstances,
    from Eric Dumazet.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (103 commits)
    pkt_sched: fq: rate limiting improvements
    ip6tnl: allow to use rtnl ops on fb tunnel
    sit: allow to use rtnl ops on fb tunnel
    ip_tunnel: Remove double unregister of the fallback device
    ip_tunnel_core: Change __skb_push back to skb_push
    ip_tunnel: Add fallback tunnels to the hash lists
    ip_tunnel: Fix a memory corruption in ip_tunnel_xmit
    qlcnic: Fix SR-IOV configuration
    ll_temac: Reset dma descriptors indexes on ndo_open
    skbuff: size of hole is wrong in a comment
    ipv6 mcast: use in6_dev_put in timer handlers instead of __in6_dev_put
    ipv4 igmp: use in_dev_put in timer handlers instead of __in_dev_put
    ethernet: moxa: fix incorrect placement of __initdata tag
    ipv6: gre: correct calculation of max_headroom
    powerpc/83xx: gianfar_ptp: select 1588 clock source through dts file
    Revert "powerpc/83xx: gianfar_ptp: select 1588 clock source through dts file"
    bonding: Fix broken promiscuity reference counting issue
    tcp: TSQ can use a dynamic limit
    dm9601: fix IFF_ALLMULTI handling
    pkt_sched: fq: qdisc dismantle fixes
    ...

    Linus Torvalds
     

28 Sep, 2013

3 commits

  • Make use of arch_mutex_cpu_relax() so architectures can override the
    default cpu_relax() semantics.
    This is especially useful for s390, where cpu_relax() means that we
    yield() the current (virtual) cpu and therefore is very expensive,
    and would contradict the whole purpose of the lockless cmpxchg loop.

    Signed-off-by: Heiko Carstens

    Heiko Carstens
     
    In kobj_ns_current_may_mount the default should be to allow the mount.
    The test is only for a single kobj_ns_type at a time, and unless there
    is a reason to prevent it, mounting sysfs should be allowed.
    Subsystems that are not registered are not involved, so they can't
    have a reason to prevent mounting sysfs.

    This is a bug-fix to commit 7dc5dbc879bd ("sysfs: Restrict mounting
    sysfs") that came in via the userns tree during the 3.12 merge window.

    Reported-and-tested-by: James Hogan
    Signed-off-by: "Eric W. Biederman"
    Signed-off-by: Linus Torvalds

    Eric W. Biederman
     
  • The 64-bit cmpxchg operation on the lockref is ordered by virtue of
    hazarding between the cmpxchg operation and the reference count
    manipulation. On weakly ordered memory architectures (such as ARM), it
    can be of great benefit to omit the barrier instructions where they are
    not needed.

    This patch moves the lockless lockref code over to a cmpxchg64_relaxed
    operation, which doesn't provide barrier semantics. If the operation
    isn't defined, we simply #define it as the usual 64-bit cmpxchg macro.

    Cc: Waiman Long
    Signed-off-by: Will Deacon
    Signed-off-by: Linus Torvalds

    Will Deacon
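
    The fallback mentioned above amounts to (sketch):

        /* fall back to the fully ordered variant where no relaxed version
         * is provided by the architecture */
        #ifndef cmpxchg64_relaxed
        #define cmpxchg64_relaxed(ptr, old, new) cmpxchg64(ptr, old, new)
        #endif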
     

21 Sep, 2013

2 commits

    To be able to use the hex ASCII functions in case-sensitive environments,
    the array hex_asc_upper[] and the functions needed for
    hex_byte_pack_upper() are introduced.

    Signed-off-by: Andre Naujoks
    Signed-off-by: David S. Miller

    Andre Naujoks
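
    A sketch of the added helpers (mirroring the existing lower-case
    hex_byte_pack()):

        extern const char hex_asc_upper[];
        #define hex_asc_upper_lo(x)  hex_asc_upper[((x) & 0x0f)]
        #define hex_asc_upper_hi(x)  hex_asc_upper[((x) & 0xf0) >> 4]

        static inline char *hex_byte_pack_upper(char *buf, u8 byte)
        {
                *buf++ = hex_asc_upper_hi(byte);
                *buf++ = hex_asc_upper_lo(byte);
                return buf;
        }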
     
  • The cmpxchg() function tends not to support 64-bit arguments on 32-bit
    architectures. This could be either due to use of unsigned long
    arguments (like on ARM) or lack of instruction support (cmpxchgq on
    x86). However, these architectures may implement a specific cmpxchg64()
    function to provide 64-bit cmpxchg support instead.

    Since the lockref code requires a 64-bit cmpxchg and relies on the
    architecture selecting ARCH_USE_CMPXCHG_LOCKREF, move to using cmpxchg64
    instead of cmpxchg and allow 32-bit architectures to make use of the
    lockless lockref implementation.

    Cc: Waiman Long
    Signed-off-by: Will Deacon
    Signed-off-by: Linus Torvalds

    Will Deacon
     

13 Sep, 2013

4 commits

  • Pull generic hardirq option removal from Martin Schwidefsky:
    "All architectures now use generic hardirqs, s390 has been last to
    switch.

    With that the code under !CONFIG_GENERIC_HARDIRQS and the related
    HAVE_GENERIC_HARDIRQS and GENERIC_HARDIRQS config options can be
    removed. Yay!"

    * 'genirq' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
    Remove GENERIC_HARDIRQ config option

    Linus Torvalds
     
  • Pull crypto fixes from Herbert Xu:
    "This fixes a 7+ year race condition in the crypto API that causes
    sporadic crashes when multiple threads load the same algorithm.

    It also fixes the crct10dif algorithm again to prevent boot failures
    on systems where the initramfs tool ignores module softdeps"

    * git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
    crypto: crct10dif - Add fallback for broken initrds
    crypto: api - Fix race condition in larval lookup

    Linus Torvalds
     
  • After the last architecture switched to generic hard irqs the config
    options HAVE_GENERIC_HARDIRQS & GENERIC_HARDIRQS and the related code
    for !CONFIG_GENERIC_HARDIRQS can be removed.

    Signed-off-by: Martin Schwidefsky

    Martin Schwidefsky
     
  • Pull SCSI target updates from Nicholas Bellinger:
    "Lots of activity again this round for I/O performance optimizations
    (per-cpu IDA pre-allocation for vhost + iscsi/target), and the
    addition of new fabric independent features to target-core
    (COMPARE_AND_WRITE + EXTENDED_COPY).

    The main highlights include:

    - Support for iscsi-target login multiplexing across individual
    network portals
    - Generic Per-cpu IDA logic (kent + akpm + clameter)
    - Conversion of vhost to use per-cpu IDA pre-allocation for
    descriptors, SGLs and userspace page pointer list
    - Conversion of iscsi-target + iser-target to use per-cpu IDA
    pre-allocation for descriptors
    - Add support for generic COMPARE_AND_WRITE (AtomicTestandSet)
    emulation for virtual backend drivers
    - Add support for generic EXTENDED_COPY (CopyOffload) emulation for
    virtual backend drivers.
    - Add support for fast memory registration mode to iser-target (Vu)

    The patches to add COMPARE_AND_WRITE and EXTENDED_COPY support are of
    particular significance, which make us the first and only open source
    target to support the full set of VAAI primitives.

    Currently Linux clients are lacking upstream support to actually
    utilize these primitives. However, with server side support now in
    place for folks like MKP + ZAB working on the client, this logic once
    reserved for the highest end of storage arrays, can now be run in VMs
    on their laptops"

    * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (50 commits)
    target/iscsi: Bump versions to v4.1.0
    target: Update copyright ownership/year information to 2013
    iscsi-target: Bump default TCP listen backlog to 256
    target: Fix >= v3.9+ regression in PR APTPL + ALUA metadata write-out
    iscsi-target; Bump default CmdSN Depth to 64
    iscsi-target: Remove unnecessary wait_for_completion in iscsi_get_thread_set
    iscsi-target: Add thread_set->ts_activate_sem + use common deallocate
    iscsi-target: Fix race with thread_pre_handler flush_signals + ISCSI_THREAD_SET_DIE
    target: remove unused including
    iser-target: introduce fast memory registration mode (FRWR)
    iser-target: generalize rdma memory registration and cleanup
    iser-target: move rdma wr processing to a shared function
    target: Enable global EXTENDED_COPY setup/release
    target: Add Third Party Copy (3PC) bit in INQUIRY response
    target: Enable EXTENDED_COPY setup in spc_parse_cdb
    target: Add support for EXTENDED_COPY copy offload emulation
    target: Avoid non-existent tg_pt_gp_mem in target_alua_state_check
    target: Add global device list for EXTENDED_COPY
    target: Make helpers non static for EXTENDED_COPY command setup
    target: Make spc_parse_naa_6h_vendor_specific non static
    ...

    Linus Torvalds
     

12 Sep, 2013

8 commits

  • Unfortunately, even with a softdep some distros fail to include
    the necessary modules in the initrd. Therefore this patch adds
    a fallback path to restore existing behaviour where we cannot
    load the new crypto crct10dif algorithm.

    In order to do this, the underlying crct10dif has been split out
    from the crypto implementation so that it can be used on the
    fallback path.

    Signed-off-by: Herbert Xu

    Herbert Xu
     
    LZ4 compression and decompression functions require input/output
    parameters of different signedness: unsigned char for compression and
    signed char for decompression.

    Change decompression API to require "(const) unsigned char *".

    Signed-off-by: Sergey Senozhatsky
    Cc: Kyungsik Lee
    Cc: Geert Uytterhoeven
    Cc: Yann Collet
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sergey Senozhatsky
     
  • With users of radix_tree_preload() run from interrupt (block/blk-ioc.c is
    one such possible user), the following race can happen:

    radix_tree_preload()
    ...
    radix_tree_insert()
      radix_tree_node_alloc()
        if (rtp->nr) {
          ret = rtp->nodes[rtp->nr - 1];
    <interrupt>
    ...
    radix_tree_preload()
    ...
    radix_tree_insert()
      radix_tree_node_alloc()
        if (rtp->nr) {
          ret = rtp->nodes[rtp->nr - 1];

    And we give out one radix tree node twice. That clearly results in radix
    tree corruption with different results (usually OOPS) depending on which
    two users of radix tree race.

    We fix the problem by making radix_tree_node_alloc() always allocate fresh
    radix tree nodes when in interrupt. Using preloading when in interrupt
    doesn't make sense since all the allocations have to be atomic anyway and
    we cannot steal nodes from process-context users because some users rely
    on radix_tree_insert() succeeding after radix_tree_preload().
    in_interrupt() check is somewhat ugly but we cannot simply key off passed
    gfp_mask as that is acquired from root_gfp_mask() and thus the same for
    all preload users.

    Another part of the fix is to avoid node preallocation in
    radix_tree_preload() when passed gfp_mask doesn't allow waiting. Again,
    preallocation in such case doesn't make sense and when preallocation would
    happen in interrupt we could possibly leak some allocated nodes. However,
    some users of radix_tree_preload() require following radix_tree_insert()
    to succeed. To avoid unexpected effects for these users,
    radix_tree_preload() only warns if passed gfp mask doesn't allow waiting
    and we provide a new function radix_tree_maybe_preload() for those users
    which get different gfp mask from different call sites and which are
    prepared to handle radix_tree_insert() failure.

    Signed-off-by: Jan Kara
    Cc: Jens Axboe
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kara
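
    A sketch of the intended use of the new helper, for a caller whose
    gfp_mask may or may not allow sleeping and who can cope with an
    insertion failure:

        if (radix_tree_maybe_preload(gfp_mask))
                return -ENOMEM;         /* only possible for sleepable masks */

        spin_lock_irq(&tree_lock);
        err = radix_tree_insert(&tree, index, item);  /* may still fail */
        spin_unlock_irq(&tree_lock);

        radix_tree_preload_end();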
     
    No reason to require the rbtree test code to be a module; allow it to be
    builtin (this streamlines my development process).

    Signed-off-by: Cody P Schafer
    Reviewed-by: Seth Jennings
    Cc: David Woodhouse
    Cc: Rik van Riel
    Cc: Michel Lespinasse
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Cody P Schafer
     
  • Just check that we examine all nodes in the tree for the postorder
    iteration.

    Signed-off-by: Cody P Schafer
    Reviewed-by: Seth Jennings
    Cc: David Woodhouse
    Cc: Rik van Riel
    Cc: Michel Lespinasse
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Cody P Schafer
     
  • Postorder iteration yields all of a node's children prior to yielding the
    node itself, and this particular implementation also avoids examining the
    leaf links in a node after that node has been yielded.

    In what I expect will be its most common usage, postorder iteration allows
    the deletion of every node in an rbtree without modifying the rbtree nodes
    (no _requirement_ that they be nulled) while avoiding referencing child
    nodes after they have been "deleted" (most commonly, freed).

    I have only updated zswap to use this functionality at this point, but
    numerous bits of code (most notably in the filesystem drivers) use a hand
    rolled postorder iteration that NULLs child links as it traverses the
    tree. Each of those instances could be replaced with this common
    implementation.

    Patches 1 & 2 add rbtree postorder iteration functions.
    Patch 3 adds testing of the iteration to the rbtree runtime tests.
    Patch 4 allows building the rbtree runtime tests as builtins.
    Patch 5 updates zswap.

    This patch:

    Add postorder iteration functions for rbtree. These are useful for safely
    freeing an entire rbtree without modifying the tree at all.

    Signed-off-by: Cody P Schafer
    Reviewed-by: Seth Jennings
    Cc: David Woodhouse
    Cc: Rik van Riel
    Cc: Michel Lespinasse
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Cody P Schafer
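
    A sketch of the free-everything pattern this enables (struct thing and
    its rb_node member are illustrative):

        struct thing *pos, *n;   /* struct thing embeds a struct rb_node named rb_node */

        rbtree_postorder_for_each_entry_safe(pos, n, &root, rb_node)
                kfree(pos);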
     
  • When decompressing into memory, the output buffer length is set to some
    arbitrarily high value (0x7fffffff) to indicate the output is, virtually,
    unlimited in size.

    The problem with this is that some platforms have their physical memory at
    high physical addresses (0x80000000 or more), and that the output buffer
    address and its "unlimited" length cannot be added without overflowing.
    An example of this can be found in inflate_fast():

    /* next_out is the output buffer address */
    out = strm->next_out - OFF;
    /* avail_out is the output buffer size. end will overflow if the output
     * address is >= 0x80000104 */
    end = out + (strm->avail_out - 257);

    This has huge consequences on the performance of kernel decompression,
    since the following exit condition of inflate_fast() will be always true:

    } while (in < last && out < end);

    Indeed, "end" has overflowed and is now always lower than "out". As a
    result, inflate_fast() will return after processing one single byte of
    input data, and will thus need to be called an unreasonably high number of
    times. This probably went unnoticed because kernel decompression is fast
    enough even with this issue.

    Nonetheless, adjusting the output buffer length in such a way that the
    above pointer arithmetic never overflows results in a kernel decompression
    that is about 3 times faster on affected machines.

    Signed-off-by: Alexandre Courbot
    Tested-by: Jon Medhurst
    Cc: Stephen Warren
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexandre Courbot
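
    Concretely, with the numbers from the excerpt above and 32-bit pointers:

        out = 0x80000104
        end = out + (0x7fffffff - 257)
            = 0x80000104 + 0x7ffffefe
            = 0x100000002, truncated to 0x00000002

    so "out < end" is false right away and the loop body runs only once per
    call.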
     
  • [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Gu Zheng
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Gu Zheng