Doug / smarc-fsl-linux-kernel | Embedian Git Server

03 Jul, 2013

6 commits

a0b2062b0 posix_timers: fix racy timer delta caching on task exit

When a task exits, we cache the remaining cputime delta until each of
its timers expires.

This is done from the following places:

* When the task is reaped. We iterate through its list of
posix cpu timers and store the remaining timer delta to
the timer struct instead of the absolute value.
(See posix_cpu_timers_exit() / posix_cpu_timers_exit_group() )

* When we call posix_cpu_timer_get() or posix_cpu_timer_schedule().
If the timer's task is considered dying when watched from these
places, the same conversion from absolute to relative expiry time
is performed. Then the given task's reference is released.
(See clear_dead_task() ).

The relevance of this caching is questionable but this is another
and deeper debate.

The big issue here is that these two sources of caching don't mix
very well.

More specifically, the caching can easily be done twice, resulting
in a wrong delta because the elapsed clock is spuriously subtracted a
second time. This can happen in the following scenario:

1) The task exits and gets reaped: we call posix_cpu_timers_exit()
and the absolute timer expiry values are converted to a relative
delta.

2) timer_gettime() -> posix_cpu_timer_get() is called and relies on
clear_dead_task() because tsk->exit_state == EXIT_DEAD.
The elapsed clock is subtracted from the delta again and we return
a wrong result.
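
The scenario above can be sketched in plain C. This is only an illustrative model, not the kernel code: `struct demo_timer`, `cache_delta()` and the two wrapper functions are hypothetical names, and time is counted in whole seconds.

```c
#include <stdint.h>

/* Hypothetical stand-in for a CPU timer's expiry field. */
struct demo_timer {
	uint64_t expires;	/* absolute expiry, or remaining delta once cached */
};

/* Convert an absolute expiry into a remaining delta, clamping at zero. */
static void cache_delta(struct demo_timer *t, uint64_t now)
{
	t->expires = (t->expires > now) ? t->expires - now : 0;
}

/* Remaining time when the conversion runs once (the intended behaviour). */
static uint64_t remaining_cached_once(uint64_t expires, uint64_t now)
{
	struct demo_timer t = { .expires = expires };

	cache_delta(&t, now);
	return t.expires;
}

/* Remaining time when both the reap path and the read path convert. */
static uint64_t remaining_cached_twice(uint64_t expires, uint64_t now)
{
	struct demo_timer t = { .expires = expires };

	cache_delta(&t, now);	/* step 1: done when the task is reaped */
	cache_delta(&t, now);	/* step 2: done again on the read path */
	return t.expires;
}
```

With a 10 second timer and 2 seconds of elapsed cputime, the single conversion yields 8 while the double conversion yields 6.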

To fix this, just remove the caching done on task reaping time. It
doesn't bring much value on its own. The caching done from
posix_cpu_timer_get/schedule is enough.

And it would also be hard to get it really right: we could make it put and
clear the target task in the timer struct so that readers know if they are
dealing with a relative cached value or an absolute one. But it would be racy.
The only safe way to do it would be to lock the itimer->it_lock so that we
know nobody reads the cputime expiry value while we modify it and its
target task reference. Doing so would involve some funny workarounds to
avoid circular lock against the sighand lock. There is just no reason to
maintain this.

The user-visible effect of this patch can be observed by running the
following code: it creates a subthread that arms a posix cputimer
which expires after 10 seconds. But then the subthread only busy loops for 2
seconds and exits. The parent reaps the subthread and reads the timer value.
Its expected value is the initial timer's expiration value
minus the cputime elapsed in the subthread. Roughly 10 - 2 = 8 seconds:

#include <sys/time.h>
#include <stdio.h>
#include <unistd.h>
#include <time.h>
#include <pthread.h>

static timer_t id;
static struct itimerspec val = { .it_value.tv_sec = 10, }, new;

static void *thread(void *unused)
{
	int err;
	struct timeval start, end, diff;

	/* Create a timer driven by this thread's CPU time */
	err = timer_create(CLOCK_THREAD_CPUTIME_ID, NULL, &id);
	if (err < 0) {
		perror("Can't create timer");
		return NULL;
	}

	/* Arm 10 sec timer */
	err = timer_settime(id, 0, &val, NULL);
	if (err < 0) {
		perror("Can't set timer");
		return NULL;
	}

	/* Exit after 2 seconds of execution */
	gettimeofday(&start, NULL);
	do {
		gettimeofday(&end, NULL);
		timersub(&end, &start, &diff);
	} while (diff.tv_sec < 2);

	return NULL;
}

int main(int argc, char **argv)
{
	pthread_t pthread;
	int err;

	err = pthread_create(&pthread, NULL, thread, NULL);
	if (err) {
		perror("Can't create thread");
		return -1;
	}
	pthread_join(pthread, NULL);
	/* Just wait a little bit to make sure the child got reaped */
	sleep(1);
	err = timer_gettime(id, &new);
	if (err)
		perror("Can't get timer value");
	printf("%ld %ld\n", (long)new.it_value.tv_sec, new.it_value.tv_nsec);

	return 0;
}

Before the patch:

$ ./posix_cpu_timers
6 2278074

After the patch:

$ ./posix_cpu_timers
8 1158766

Before the patch, two extra seconds of elapsed time were spuriously subtracted.

Signed-off-by: Frederic Weisbecker
Cc: Stanislaw Gruszka
Cc: Thomas Gleixner
Cc: Peter Zijlstra
Cc: Ingo Molnar
Cc: Oleg Nesterov
Cc: KOSAKI Motohiro
Cc: Olivier Langlois
Signed-off-by: Andrew Morton

Frederic Weisbecker
2013-07-03 22:54:42 +0800
76cdcdd97 posix-timers: correctly get dying task time sample in posix_cpu_timer_schedule()

In order to re-arm a timer after it fired, we take a sample of the current
process or thread cputime.

If the task is dying though, we don't arm anything but we cache the
remaining timer expiration delta for further reads.

Something similar is performed in posix_cpu_timer_get() but here we forget
to take the process wide cputime sample before caching it.

As a result we are storing random stack content, causing every further
read of that timer to return junk values.

Fix this by taking the appropriate sample in the case of process wide
timers.
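
The shape of the fix can be sketched as follows. This is a simplified model, assuming a hypothetical `struct demo_clocks` in place of the kernel's real cputime sampling; the point is only that `now` is assigned on every path before it is used.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical clock readings; the kernel samples real task or group cputime. */
struct demo_clocks {
	uint64_t thread_time;
	uint64_t process_time;
};

/* Take the sample on every path before using it to compute the delta. */
static uint64_t remaining_delta(const struct demo_clocks *c,
				bool process_wide, uint64_t expires)
{
	uint64_t now;

	if (process_wide)
		now = c->process_time;	/* the sample the buggy path skipped */
	else
		now = c->thread_time;

	return (expires > now) ? expires - now : 0;
}
```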

This probably doesn't matter much in practice because, at this stage, the
thread is the last one in the group and we reached exit_notify(). This
implies that we called exit_itimers() and there should be no more timers
to handle for that task.

So this is likely dead code anyway but let's fix the current logic
and the warning that came along:

kernel/posix-cpu-timers.c: In function 'posix_cpu_timer_schedule':
kernel/posix-cpu-timers.c:1127: warning: 'now' may be used uninitialized in this function

Then we can start to think further about cleaning up that code.

Reported-by: Andrew Morton
Reported-by: Chen Gang
Signed-off-by: Frederic Weisbecker
Cc: Stanislaw Gruszka
Cc: Thomas Gleixner
Cc: Peter Zijlstra
Cc: Ingo Molnar
Cc: Oleg Nesterov
Cc: Chen Gang
Cc: KOSAKI Motohiro
Cc: Olivier Langlois
Signed-off-by: Andrew Morton

Frederic Weisbecker
2013-07-03 22:20:20 +0800
0bc4b0cf1 selftests: add basic posix timers selftests

Add some initial basic tests on a few posix timers interfaces such as
setitimer() and timer_settime().

These simply check that expiration happens in a reasonable timeframe after
expected elapsed clock time (user time, user + system time, real time,
...).

This is helpful for finding basic breakages while hacking
on this subsystem.

Signed-off-by: Frederic Weisbecker
Cc: Stanislaw Gruszka
Cc: Thomas Gleixner
Cc: Peter Zijlstra
Cc: Steven Rostedt
Cc: KOSAKI Motohiro
Cc: Olivier Langlois
Signed-off-by: Andrew Morton

Frederic Weisbecker
2013-07-03 22:20:03 +0800
2473f3e7a posix_cpu_timers: consolidate expired timers check

Consolidate the code common to the per-thread and per-process timer lists
at tick time.

List traversal, expiry check and subsequent updates can be shared in a
common helper.
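
A rough sketch of what such a shared helper could look like, assuming each list is kept sorted by expiry and modelled here as a plain array. `check_expired()` is a hypothetical name, not the helper introduced by the patch.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Walk a list sorted by expiry, count the timers that have fired, and
 * report the next pending expiry through 'next' (0 if none is left).
 * The same function can be called for per-thread and per-process lists.
 */
static size_t check_expired(const uint64_t *expiries, size_t n,
			    uint64_t now, uint64_t *next)
{
	size_t fired = 0;

	*next = 0;
	for (size_t i = 0; i < n; i++) {
		if (now < expiries[i]) {
			*next = expiries[i];	/* first timer still pending */
			break;
		}
		fired++;
	}
	return fired;
}
```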

Signed-off-by: Frederic Weisbecker
Cc: Stanislaw Gruszka
Cc: Thomas Gleixner
Cc: Peter Zijlstra
Cc: Ingo Molnar
Cc: Oleg Nesterov
Cc: KOSAKI Motohiro
Cc: Olivier Langlois
Signed-off-by: Andrew Morton

Frederic Weisbecker
2013-07-03 22:19:23 +0800
1a7fa510b posix_cpu_timers: consolidate timer list cleanups

Cleaning up the posix cpu timers on task exit shares some common code
among timer list types, most notably the list traversal and expiry time
update.

Unify this in a common helper.

Signed-off-by: Frederic Weisbecker
Cc: Stanislaw Gruszka
Cc: Thomas Gleixner
Cc: Peter Zijlstra
Cc: Ingo Molnar
Cc: Oleg Nesterov
Cc: KOSAKI Motohiro
Cc: Olivier Langlois
Signed-off-by: Andrew Morton

Frederic Weisbecker
2013-07-03 22:18:37 +0800
55ccb616a posix_cpu_timer: consolidate expiry time type

The posix cpu timer expiry time is stored in a union of two types: a 64-bit
field if we rely on scheduler precise accounting, or a cputime_t if
we rely on jiffies.

This results in quite some duplicate code and special cases to handle the
two types.

Just unify this into a single 64-bit field. cputime_t can always fit
into it.
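
The change in representation can be pictured like this. The type and field names are hypothetical, and `unsigned long` merely stands in for cputime_t here.

```c
#include <stdint.h>

/* Before: two representations; every user must know which one is live. */
union demo_expiry_old {
	uint64_t sched;		/* scheduler precise accounting */
	unsigned long cpu;	/* assumed stand-in for cputime_t */
};

/* After: a single 64-bit field, whatever the accounting source. */
struct demo_expiry_new {
	uint64_t expires;
};

/* The narrower type always fits, so no special case is needed. */
_Static_assert(sizeof(unsigned long) <= sizeof(uint64_t),
	       "stand-in cputime type must fit in 64 bits");

/* One comparison now serves both accounting modes. */
static int demo_expired(const struct demo_expiry_new *e, uint64_t now)
{
	return now >= e->expires;
}
```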

Signed-off-by: Frederic Weisbecker
Cc: Stanislaw Gruszka
Cc: Thomas Gleixner
Cc: Peter Zijlstra
Cc: Ingo Molnar
Cc: Oleg Nesterov
Cc: KOSAKI Motohiro
Cc: Olivier Langlois
Signed-off-by: Andrew Morton

Frederic Weisbecker
2013-07-03 22:16:20 +0800

01 Jul, 2013

3 commits

8bb495e3f Linux 3.10

Linus Torvalds
2013-07-01 06:13:29 +0800
f0277dce1 Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc

Pull another powerpc fix from Benjamin Herrenschmidt:
"I mentioned that while we had fixed the kernel crashes, EEH error
recovery didn't always recover... It appears that I had a fix for
that already in powerpc-next (with a stable CC).

I cherry-picked it today and did a few tests and it seems that things
now work quite well. The patch is also pretty simple, so I see no
reason to wait before merging it."

* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
powerpc/eeh: Fix fetching bus for single-dev-PE

Linus Torvalds
2013-07-01 06:08:15 +0800
4b483802f Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI fixes from James Bottomley:
"This is a set of seven bug fixes. Several fcoe fixes for locking
problems, initiator issues and a VLAN API change, all of which could
eventually lead to data corruption, one fix for a qla2xxx locking
problem which could lead to multiple completions of the same request
(and subsequent data corruption) and a use after free in the ipr
driver. Plus one minor MAINTAINERS file update"

(only six bugfixes in this pull, since I had already pulled the fcoe API
fix directly from Robert Love)

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
[SCSI] ipr: Avoid target_destroy accessing memory after it was freed
[SCSI] qla2xxx: Fix for locking issue between driver ISR and mailbox routines
MAINTAINERS: Fix fcoe mailing list
libfc: extend ex_lock to protect all of fc_seq_send
libfc: Correct check for initiator role
libfcoe: Fix Conflicting FCFs issue in the fabric

Linus Torvalds
2013-07-01 06:06:25 +0800

30 Jun, 2013

12 commits

ea461abf6 powerpc/eeh: Fix fetching bus for single-dev-PE

While running Linux as a guest on top of phyp, we may have a
PE that includes a single PCI device. However, we didn't return
its PCI bus correctly, which leads to failure to recover from
EEH errors for a single-dev-PE. The patch fixes the issue.

Cc: # v3.7+
Cc: Steve Best
Signed-off-by: Gavin Shan
Signed-off-by: Benjamin Herrenschmidt

Gavin Shan
2013-06-30 12:08:34 +0800
6c355beaf Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc

Pull powerpc fixes from Ben Herrenschmidt:
"We discovered some breakage in our "EEH" (PCI Error Handling) code
while doing error injection, due to a couple of regressions. One of
them is due to a patch (37f02195bee9 "powerpc/pci: fix PCI-e devices
rescan issue on powerpc platform") that, in hindsight, I shouldn't
have merged considering that it caused more problems than it solved.

Please pull those two fixes. One for a simple EEH address cache
initialization issue. The other one is a patch from Guenter that I
had originally planned to put in 3.11 but which happens to also fix
that other regression (a kernel oops during EEH error handling and
possibly hotplug).

With those two, the couple of test machines I've hammered with error
injection are remaining up now. EEH appears to still fail to recover
on some devices, so there is another problem that Gavin is looking
into but at least it's no longer crashing the kernel."

* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
powerpc/pci: Improve device hotplug initialization
powerpc/eeh: Add eeh_dev to the cache during boot

Linus Torvalds
2013-06-30 08:02:48 +0800
8d5bc1a6a ARM: dt: Only print warning, not WARN() on bad cpu map in device tree

Due to recent changes and expectations of proper cpu bindings, there are
now cases for many of the in-tree devicetrees where a WARN() will hit
on boot due to badly formatted /cpus nodes.

Downgrade this to a pr_warn() to be less alarmist, since it's not a
new problem.

Tested on Arndale, Cubox, Seaboard and Panda ES. Panda hits the WARN
without this, the others do not.

Acked-by: Russell King
Signed-off-by: Olof Johansson
Signed-off-by: Linus Torvalds

Olof Johansson
2013-06-30 08:00:40 +0800
7846de406 powerpc/pci: Improve device hotplug initialization

Commit 37f02195b (powerpc/pci: fix PCI-e devices rescan issue on powerpc
platform) fixes a problem with interrupt and DMA initialization on hot
plugged devices. With this commit, interrupt and DMA initialization for
hot plugged devices is handled in the pci device enable function.

This approach has a couple of drawbacks. First, it creates two code paths
for device initialization, one for hot plugged devices and another for devices
known during the initial PCI scan. Second, the initialization code for hot
plugged devices is only called when the device is enabled, i.e. typically
in the probe function. Also, the platform specific setup code is called each
time pci_enable_device() is called, not only once during device discovery,
meaning it is actually called multiple times, once for devices discovered
during the initial scan and again each time a driver is re-loaded.

The visible result is that interrupt pins are only assigned to hot plugged
devices when the device driver is loaded. Effectively this changes the PCI
probe API, since pci_dev->irq and the device's dma configuration will now
only be valid after pci_enable_device() was called at least once. A more subtle
change is that platform specific PCI device setup is moved from device
discovery into the driver's probe function, more specifically into the
pci_enable_device() call.

To fix the inconsistencies, add new function pcibios_add_device.
Call pcibios_setup_device from pcibios_setup_bus_devices if device setup
is not complete, and from pcibios_add_device if bus setup is complete.

With this change, device setup code is moved back into device initialization,
and called exactly once for both static and hot plugged devices.

[ This also fixes a regression introduced by the above patch which
causes dev->irq to be overwritten under some circumstances after
MSIs have been enabled for the device which leads to crashes due
to the MSI core "hijacking" dev->irq to store the base MSI number
and not the LSI. --BenH
]

Cc: Yuanquan Chen
Cc: Benjamin Herrenschmidt
Cc: Hiroo Matsumoto
Signed-off-by: Guenter Roeck
Signed-off-by: Benjamin Herrenschmidt

Guenter Roeck
2013-06-30 06:46:46 +0800
133841cab Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6

Pull crypto fix from Herbert Xu:
"This fixes a crash in the crypto layer exposed by an SCTP test tool"

* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: algboss - Hold ref count on larval

Linus Torvalds
2013-06-30 02:34:18 +0800
655443193 Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux

Pull drm/qxl fix from Dave Airlie:
"Bad me forgot an access check, possible security issue, but since this
is the first kernel with it, should be fine to just put it in now"

* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
drm/qxl: add missing access check for execbuffer ioctl

Linus Torvalds
2013-06-30 02:32:05 +0800
706b23bde Fix: kernel/ptrace.c: ptrace_peek_siginfo() missing __put_user() validation

This __put_user() could be used by unprivileged processes to write into
kernel memory. The issue here is that even if copy_siginfo_to_user()
fails, the error code is not checked before __put_user() is executed.

Luckily, ptrace_peek_siginfo() has been added within the 3.10-rc cycle,
so it has not hit a stable release yet.
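
The pattern behind the fix, sketched in userspace C: a validating copy must succeed before any follow-up write that skips validation. `checked_copy()` and `copy_then_patch()` are hypothetical stand-ins for the checked and unchecked user accessors, not the kernel functions themselves.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical copy routine that validates the destination and can fail. */
static int checked_copy(void *dst, size_t dst_len, const void *src, size_t len)
{
	if (dst == NULL || len > dst_len)
		return -EFAULT;
	memcpy(dst, src, len);
	return 0;
}

/*
 * Copy a record, then patch its first field with an unchecked write.
 * The unchecked write is only reached if the checked copy succeeded.
 */
static int copy_then_patch(int *dst, size_t dst_len,
			   const int *src, size_t len, int patch)
{
	int ret = checked_copy(dst, dst_len, src, len);

	if (ret)
		return ret;	/* the missing check: bail out before writing */
	dst[0] = patch;
	return 0;
}
```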

Signed-off-by: Mathieu Desnoyers
Acked-by: Oleg Nesterov
Cc: Andrey Vagin
Cc: Roland McGrath
Cc: Paul McKenney
Cc: David Howells
Cc: Dave Jones
Cc: Pavel Emelyanov
Cc: Pedro Alves
Cc: Andrew Morton
Signed-off-by: Linus Torvalds

Mathieu Desnoyers
2013-06-30 02:29:08 +0800
bd2931b5c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client

Pull Ceph fix from Sage Weil:
"This is a recently spotted regression in the snapshot behavior...

It turns out several tests weren't being run in the nightlies so this
took a while to spot"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
rbd: send snapshot context with writes

Linus Torvalds
2013-06-30 01:31:15 +0800
63edbce16 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull ubifs fixes from Al Viro:
"A couple of ubifs readdir/lseek race fixes. Stable fodder, really
nasty..."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
UBIFS: fix a horrid bug
UBIFS: prepare to fix a horrid bug

Linus Torvalds
2013-06-30 01:30:31 +0800
a61aef7fc Merge tag 'for-linus-20130628' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-mn10300

Pull two MN10300 fixes from David Howells:
"The first fixes a problem with passing arrays rather than pointers to
get_user() where __typeof__ then wants to declare and initialise an
array variable which gcc doesn't like.

The second fixes a problem whereby putting mem=xxx into the kernel
command line causes init=xxx to get an incorrect value."

* tag 'for-linus-20130628' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-mn10300:
mn10300: Use early_param() to parse "mem=" parameter
mn10300: Allow to pass array name to get_user()

Linus Torvalds
2013-06-30 01:28:52 +0800
a75930c63 Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull timer fix from Thomas Gleixner:
"Correct an ordering issue in the tick broadcast code. I really wish
we'd get compensation for pain and suffering for each line of code we
write to work around dysfunctional timer hardware."

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
tick: Fix tick_broadcast_pending_mask not cleared

Linus Torvalds
2013-06-30 01:27:19 +0800
82d0b80ad Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull perf fix from Ingo Molnar:
"One more fix for a recently discovered bug"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf: Disable monitoring on setuid processes for regular users

Linus Torvalds
2013-06-30 01:26:50 +0800

29 Jun, 2013

2 commits

605c912bb UBIFS: fix a horrid bug

Al Viro pointed me to the fact that '->readdir()' and '->llseek()' have no
mutual exclusion, which means the 'ubifs_dir_llseek()' can be run while we are
in the middle of 'ubifs_readdir()'.

This means that 'file->private_data' can be freed while 'ubifs_readdir()' uses
it, and this is a very bad bug: not only can 'ubifs_readdir()' return garbage,
but this may corrupt memory and lead to all kinds of problems like crashes and
security holes.

This patch fixes the problem by using the 'file->f_version' field, which
'->llseek()' always unconditionally sets to zero. We set it to 1 in
'ubifs_readdir()' and whenever we detect that it became 0, we know there was a
seek and it is time to clear the state saved in 'file->private_data'.
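
The version-flag scheme can be modelled roughly as below. This is a single-threaded sketch with hypothetical names (`struct demo_file`, `demo_llseek()`, `demo_readdir()`), and an integer stands in for the state kept in 'file->private_data'. The point is that the seek path only resets the flag, and the readdir path alone drops the saved state.

```c
#include <stdint.h>

/* Hypothetical stand-in for the relevant 'struct file' fields. */
struct demo_file {
	uint64_t f_version;	/* the seek path resets this to 0 */
	int saved_state;	/* stands in for private_data; 0 means none */
};

/* Seek path: only signals that a seek happened, never frees state. */
static void demo_llseek(struct demo_file *f)
{
	f->f_version = 0;
}

/* Readdir path: returns 1 if stale state had to be dropped, else 0. */
static int demo_readdir(struct demo_file *f)
{
	int dropped = 0;

	if (f->f_version == 0 && f->saved_state != 0) {
		f->saved_state = 0;	/* a seek happened: clear state here */
		dropped = 1;
	}
	f->f_version = 1;	/* mark that readdir owns the state again */
	f->saved_state = 42;	/* cache state for the next call */
	return dropped;
}
```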

I tested this patch by writing a user-space program which runs readdir and
seek in parallel. I could easily crash the kernel without these patches, but
could not crash it with these patches.

Cc: stable@vger.kernel.org
Reported-by: Al Viro
Tested-by: Artem Bityutskiy
Signed-off-by: Artem Bityutskiy
Signed-off-by: Al Viro

Artem Bityutskiy
2013-06-29 16:45:37 +0800
33f1a63ae UBIFS: prepare to fix a horrid bug

Al Viro pointed me to the fact that '->readdir()' and '->llseek()' have no
mutual exclusion, which means the 'ubifs_dir_llseek()' can be run while we are
in the middle of 'ubifs_readdir()'.

First of all, this means that 'file->private_data' can be freed while
'ubifs_readdir()' uses it. But this particular patch does not fix the problem.
This patch is only a preparation, and the fix will follow next.

In this patch we make 'ubifs_readdir()' stop using 'file->f_pos' directly,
because 'file->f_pos' can be changed by '->llseek()' at any point. This may
cause 'ubifs_readdir()' to return inconsistent data: directory entry names
may correspond to incorrect file positions.

So here we introduce a local variable 'pos', read 'file->f_pos' once at the
very beginning, and then stick to 'pos'. The result of this is that when
'ubifs_dir_llseek()' changes 'file->f_pos' while we are in the middle of
'ubifs_readdir()', the latter "wins".

Cc: stable@vger.kernel.org
Reported-by: Al Viro
Tested-by: Artem Bityutskiy
Signed-off-by: Artem Bityutskiy
Signed-off-by: Al Viro

Artem Bityutskiy
2013-06-29 16:45:37 +0800

28 Jun, 2013

4 commits

e3f12a530 mn10300: Use early_param() to parse "mem=" parameter

This fixes the problem that "init=" options may not be passed to kernel
correctly.

parse_mem_cmdline() of mn10300 arch gets rid of "mem=" string from
redboot_command_line. Then init_setup() parses the "init=" options from
static_command_line, which is a copy of redboot_command_line, and keeps
the pointer to the init options in execute_command variable.

Since the commit 026cee0 upstream (params: _initcall-like kernel
parameters), static_command_line is overwritten by saved_command_line at
do_initcall_level(). Notice that saved_command_line is a command line
which includes the "mem=" string.

As a result, execute_command may point to the wrong string, offset by the
length of the "mem=" parameter.
I noticed this problem when using the command line like this:

mem=128M console=ttyS0,115200 init=/bin/sh

Here is the processing flow of command line parameters.
start_kernel()
  setup_arch(&command_line)
    parse_mem_cmdline(cmdline_p)
      * strcpy(boot_command_line, redboot_command_line);
      * Remove "mem=xxx" from redboot_command_line.
      * *cmdline_p = redboot_command_line;
  setup_command_line(command_line)
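
The effect on the saved pointer can be reproduced with a small userspace model. `stale_init_pointer()` and `DEMO_LEN` are hypothetical names, and the model assumes that the stripped line contains "init=" and that both lines fit in the buffer.

```c
#include <stddef.h>
#include <string.h>

#define DEMO_LEN 128

/*
 * Copy into 'out' what a pointer to the "init=" value, taken in the
 * stripped command line, refers to once the same buffer has been
 * overwritten with the full (unstripped) line.
 * 'out' must hold at least DEMO_LEN bytes.
 */
static void stale_init_pointer(const char *full, const char *stripped,
			       char *out)
{
	char static_cmdline[DEMO_LEN];
	const char *execute_command;

	strncpy(static_cmdline, stripped, DEMO_LEN - 1);
	static_cmdline[DEMO_LEN - 1] = '\0';

	/* Remember where the value after "init=" starts (assumed present). */
	execute_command = strstr(static_cmdline, "init=") + strlen("init=");

	/* Later, the buffer is overwritten with the line including "mem=". */
	strncpy(static_cmdline, full, DEMO_LEN - 1);

	strcpy(out, execute_command);
}
```

With the command line quoted above, the saved pointer no longer refers to "/bin/sh": it now lands 9 characters earlier in the text, the length of "mem=128M ".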

Akira Takeuchi
2013-06-28 23:53:03 +0800
c6dc9f0a4 mn10300: Allow to pass array name to get_user()

This fixes the following compile error:

CC block/scsi_ioctl.o
block/scsi_ioctl.c: In function 'sg_scsi_ioctl':
block/scsi_ioctl.c:449: error: invalid initializer
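
A sketch of this class of error, under the assumption that the failing macro declared a temporary with `__typeof__(ptr)`: when an array name is passed, that temporary is itself an array and cannot be initialised from the array. `DEMO_GET` is a hypothetical macro using the GNU C `__typeof__` extension, not the mn10300 `get_user()` implementation.

```c
/*
 * A declaration of the form
 *     const __typeof__(ptr) p = (ptr);
 * is an invalid initializer when 'ptr' is an array name.
 * Deriving the pointer type from the element type accepts both
 * pointers and array names.
 */
#define DEMO_GET(x, ptr)					\
	do {							\
		const __typeof__(*(ptr)) *demo_p = (ptr);	\
		(x) = *demo_p;					\
	} while (0)

/* Array name as argument: it decays to a pointer to its first element. */
static int first_of_array(void)
{
	int buf[4] = { 7, 8, 9, 10 };
	int v;

	DEMO_GET(v, buf);
	return v;
}

/* Plain pointer as argument still works. */
static int via_pointer(const int *p)
{
	int v;

	DEMO_GET(v, p);
	return v;
}
```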

Signed-off-by: David Howells

Akira Takeuchi
2013-06-28 23:53:01 +0800
18097b91a drm/qxl: add missing access check for execbuffer ioctl

Reported-by: Mathieu Desnoyers
Signed-off-by: Dave Airlie

Dave Airlie
2013-06-28 11:27:40 +0800
1abd60186 powerpc/eeh: Add eeh_dev to the cache during boot

commit f8f7d63fd96ead101415a1302035137a866f8998 ("powerpc/eeh: Trace eeh
device from I/O cache") broke EEH on pseries for devices that were
present during boot and have not been hotplugged/DLPARed.

eeh_check_failure will get the eeh_dev from the cache, and will get
NULL. eeh_addr_cache_build adds the addresses to the cache, but eeh_dev
for the given pci_device is not set yet. Just reordering the call to
eeh_addr_cache_insert_dev works fine. The ordering is similar to the one
in eeh_add_device_late.

Signed-off-by: Thadeu Lima de Souza Cascardo
Acked-by: Gavin Shan
Signed-off-by: Benjamin Herrenschmidt

Thadeu Lima de Souza Cascardo
2013-06-28 10:02:07 +0800

27 Jun, 2013

13 commits

d2d1f17a0 rbd: send snapshot context with writes

Sending the right snapshot context with each write is required for
snapshots to work. Due to the ordering of calls, the snapshot context
is never set for any requests. This causes writes to the current
version of the image to be reflected in all snapshots, which are
supposed to be read-only.

This happens because rbd_osd_req_format_write() sets the snapshot
context based on obj_request->img_request. At this point, however,
obj_request->img_request has not been set yet, so the snapshot context
is set to NULL. Fix this by moving rbd_img_obj_request_add(), which
sets obj_request->img_request, before the osd request formatting
calls.
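
The ordering problem can be modelled in a few lines. All types and functions below are hypothetical simplifications: `demo_add()` plays the role of rbd_img_obj_request_add() and `demo_format_write()` the role of the write formatting step.

```c
#include <stddef.h>

struct demo_img_request {
	const char *snapc;	/* stands in for the snapshot context */
};

struct demo_obj_request {
	struct demo_img_request *img_request;
	const char *snapc_sent;	/* what the formatted write would carry */
};

/* Link the object request to its image request. */
static void demo_add(struct demo_img_request *img,
		     struct demo_obj_request *obj)
{
	obj->img_request = img;
}

/* Pick the snapshot context from the image request, if it is set. */
static void demo_format_write(struct demo_obj_request *obj)
{
	obj->snapc_sent = obj->img_request ? obj->img_request->snapc : NULL;
}

/* Buggy order: format first, so no snapshot context is picked up. */
static const char *format_then_add(struct demo_img_request *img)
{
	struct demo_obj_request obj = { NULL, NULL };

	demo_format_write(&obj);
	demo_add(img, &obj);
	return obj.snapc_sent;
}

/* Fixed order: link first, then format. */
static const char *add_then_format(struct demo_img_request *img)
{
	struct demo_obj_request obj = { NULL, NULL };

	demo_add(img, &obj);
	demo_format_write(&obj);
	return obj.snapc_sent;
}
```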

This resolves:
http://tracker.ceph.com/issues/5465

Reported-by: Karol Jurak
Signed-off-by: Josh Durgin
Reviewed-by: Sage Weil
Reviewed-by: Alex Elder

Josh Durgin
2013-06-27 20:55:29 +0800
a9e94ec35 Merge tag 'fcoe1' into fixes

This patch fixes a critical bug that was introduced in 3.9
related to VLAN tagging FCoE frames.

James Bottomley
2013-06-27 14:08:22 +0800
36a279686 Merge tag 'fcoe' into fixes

3.10 fixes

James Bottomley
2013-06-27 14:07:53 +0800
98b6ed0f2 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

Pull networking fixes from David Miller:

1) Found via trinity:

If you connect up an ipv6 socket to an ipv4 mapped address then an
ipv6 one, sendmsg() can croak because ip6_sk_dst_check() assumes the
route cached in the socket is an ipv6 one. In this case there is an
ipv4 route attached, so it gets stomped on.

Reported by Dave Jones and Hannes Frederic Sowa, fixed by Eric
Dumazet.

2) AF_KEY notifications leak some kernel memory to userspace, fix from
Mathias Krause.

3) DLCI calls __dev_get_by_name() without proper locking, and dlci_del
doesn't validate that the device being deleted is actually a DLCI
one. Fixes from Li Zefan.

4) Length check on bluetooth l2cap information responses is wrong, each
response type has a different length, so we should make sure it's in
a given range rather than enforce one single valid length. From
Jaganath Kanakkassery.

5) Receive FIFO overflow is really easy to trigger in stress scenarios
in the sh_eth driver, but the event isn't being handled properly at
all. Specifically, the mask of error interrupts doesn't include the
event so we never clear it, resulting in the driver becoming wedged
processing an interrupt that never gets cleared.

Fix from Sergei Shtylyov.

6) qlcnic sleeps while holding a spinlock, use mdelay() instead of
msleep(). From Shahed Shaikh.

7) Missing curly braces causes SIP netfilter NAT module to always drop
packets. Fix from Balazs Peter Odor.

8) ipt_ULOG in netfilter passes the wrong value to timer setup, causing
the timer to dereference crap when it fires. Fix from Gao Feng.

9) Missing RCU protection around txq->axq_acq traversal in
ath_txq_schedule(). Fix from Felix Fietkau.

10) Idle state transition test in ath9k_htc_config() is reversed, fix
from Sujith Manoharan.

11) IPV6 forwarding handles unicast Router Alert packets incorrectly.
It tests the wrong option state. Previously opt->ra being non-zero
indicated a router alert marking in the SKB, but now it's indicated
by a bit in opt->flags. Fix from YOSHIFUJI Hideaki.

12) SKB leak in GRE tunnel GSO handling, from Eric Dumazet.

13) get_user_pages_fast() error handling in TUN and MACVTAP use the same
local variable for the base index and the loop iterator for page
traversal, oops! Fix from Michael S Tsirkin.

14) ipv6_get_lladdr() can fail, and we must therefore check its return
value in inet6_set_iftoken(). Fix from Hannes Frederic Sowa.

15) If you change an interface name and meanwhile can sneak in something
that looks up the name (like SO_BINDTODEVICE or SIOCGIFNAME) we can
deadlock with CONFIG_PREEMPT=n. Fix this by providing a helper
function that properly uses raw_seqcount_begin(). From Nicolas
Schichan.

16) Chain noise calibration test is inverted in iwlwifi, fix from
Nikolay Martynov.

17) Properly set TX iwlwifi descriptor flags for back requests. Fix
from Emmanuel Grumbach.

18) We can't assume skb_transport_header() is set in xt_TCPOPTSTRIP
module, fix from Pablo Neira Ayuso.

19) Some crummy APs don't provide the proper High Throughput info in
association response frames. Add a workaround by assuming we'll use
whatever is in the beacon/probe. Fix from Johannes Berg.

20) mac80211 call to rate_idx_match_mask() swaps two arguments (mask and
channel width). Fix from Simon Wunderlich.

21) xt_TCPMSS (like xt_TCPOPTSTRIP) must not try to handle fragmented
frames. Fix from Phil Oester.

22) Fix rate control regression causing iwlwifi/iwlegacy chips to use
1Mbit/s on pre-11n networks. From Moshe Benji and Stanislaw Gruszka.

23) Disable brcmsmac power-save functions, they cause regressions. From
Arend van Spriel.

24) Enforce a sane minimum MTU in l2cap_build_cmd() otherwise we can
easily crash. Fix from Anderson Lizardo.

25) If a learning packet arrives during vxlan_stop() we crash, easily
fixed by checking netif_running(). From Stephen Hemminger.

26) Static vxlan FDB entries should not be migrated, also from Stephen.

27) skb_clone() failures not handled in vxlan_xmit(), oops. Also from
Stephen.

28) Add minimal driver for AR816x/AR817x ethernet chips, from Johannes
Berg.

29) Fix regression in userspace VLAN acceleration control, added by the
802.1ad support changes. Fix from Fernando Luis Vazquez Cao.

30) Interval selection for MLD queries in the bridging code was
reversed. Fix from Linus Lüssing.

31) ipv6's ndisc_send_redirect() erroneously writes to the packet we
received not the packet we are building to send out. Fix from
Matthias Schiffer.

32) Don't free netdev before unregistering it, in usb_8dev can driver.
From Marc Kleine-Budde.

33) Fix nl80211 attribute buffer races, from Johannes Berg.

34) Although netlink_diag.h is under uapi/ it isn't present in Kbuild.
From Stephen Hemminger.

35) Wrong address and family passed to MD5 key lookups in TCP, from
Aydin Arik.

36) phy_type attribute created by SFC driver should not be writable.
From Ben Hutchings.

37) Receive/Transmit queue allocations in pxa168_eth and mv643xx_eth
should use kzalloc(). Otherwise if setup fails half-way, we'll
dereference garbage when trying to teardown the rings. From Lubomir
Rintel.

38) Fix double-allocation of dst (resulting in unfreeable net device) in
ipv6's init_loopback(). From Gao Feng.

39) Fix fragmentation handling SKB leak in netfilter conntrack, we were
freeing the wrong skb pointer. From Phil Oester.

40) Don't report "-1" (SPEED_UNKNOWN) in bond_miimon_commit(), from
Nikolay Aleksandrov.

41) davinci_cpdma doesn't check for DMA mapping errors, letting the
device scribble to random addresses. From Sebastian Siewior.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (69 commits)
dlci: validate the net device in dlci_del()
dlci: acquire rtnl_lock before calling __dev_get_by_name()
af_key: fix info leaks in notify messages
ipv6: ip6_sk_dst_check() must not assume ipv6 dst
net: fix kernel deadlock with interface rename and netdev name retrieval.
net/tg3: Avoid delay during MMIO access
ipv6: check return value of ipv6_get_lladdr
macvtap: fix recovery from gup errors
tun: fix recovery from gup errors
gre: fix a possible skb leak
ipv6: Process unicast packet with Router Alert by checking flag in skb.
ath9k_htc: Handle IDLE state transition properly
ath9k: fix an RCU issue in calling ieee80211_get_tx_rates
netfilter: ipt_ULOG: fix incorrect setting of ulog timer
netfilter: ctnetlink: send event when conntrack label was modified
netfilter: nf_nat_sip: fix mangling
qlcnic: Do not sleep while holding spinlock
drivers: net: cpsw: fix compilation error with cpsw driver
tcp: doc : fix the syncookies default value
sh_eth: fix misreporting of transmit abort
...

Linus Torvalds
2013-06-27 13:24:37 +0800
1a506e473 Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux

Pull i915 drm fixes from Dave Airlie:
"These should be the last two fixes for i915, one is for a fence leak
killing X on some older GPUs, and one is a late regression partial
revert for an swiotlb/xen/i915 interaction, Konrad has promised to
figure out the proper answer, and this patch is the best thing to do
at this stage to avoid regressing"

* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
drm/i915: make compact dma scatter lists creation work with SWIOTLB backend.
drm/i915: Restore fences after resume and GPU resets

Linus Torvalds
2013-06-27 13:23:15 +0800
578a1310f dlci: validate the net device in dlci_del()

We triggered an oops while running trinity with a 3.4 kernel:

BUG: unable to handle kernel paging request at 0000000100000d07
IP: [] dlci_ioctl+0xd8/0x2d4 [dlci]
PGD 640c0d067 PUD 0
Oops: 0000 [#1] PREEMPT SMP
CPU 3
...
Pid: 7302, comm: trinity-child3 Not tainted 3.4.24.09+ 40 Huawei Technologies Co., Ltd. Tecal RH2285 /BC11BTSA
RIP: 0010:[] [] dlci_ioctl+0xd8/0x2d4 [dlci]
...
Call Trace:
[] sock_ioctl+0x153/0x280
[] do_vfs_ioctl+0xa4/0x5e0
[] ? fget_light+0x3ea/0x490
[] sys_ioctl+0x4f/0x80
[] system_call_fastpath+0x16/0x1b
...

The oops happens because the net device passed to dlci_del() is not a
dlci device.
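
The kind of check this implies can be sketched in user space. This is only an illustration under stated assumptions, not the patch itself: struct fake_netdev, struct fake_dlci, enum dev_kind and fake_dlci_del() are hypothetical stand-ins, and the real driver may identify its devices differently.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's net_device and the DLCI private data. */
enum dev_kind { DEV_ETHER, DEV_DLCI };

struct fake_dlci {
    int in_use;
};

struct fake_netdev {
    const char *name;
    enum dev_kind kind;
    void *priv;   /* points at a struct fake_dlci only when kind == DEV_DLCI */
};

/*
 * Returns 0 on success, -1 if the device is missing or is not a DLCI device.
 * The kind check must come before priv is interpreted; without it, an
 * arbitrary device's private area would be treated as DLCI state.
 */
int fake_dlci_del(struct fake_netdev *dev)
{
    if (dev == NULL)
        return -1;
    if (dev->kind != DEV_DLCI)
        return -1;

    ((struct fake_dlci *)dev->priv)->in_use = 0;
    return 0;
}
```

Passing a non-DLCI device (for example one named "eth0") is rejected up front instead of being dereferenced as DLCI state.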

Reported-by: Li Jinyue
Signed-off-by: Li Zefan
Cc: stable@vger.kernel.org
Signed-off-by: David S. Miller

Zefan Li
2013-06-27 06:36:42 +0800
11eb2645c dlci: acquire rtnl_lock before calling __dev_get_by_name()

Otherwise the net device returned can be freed at any time.

Signed-off-by: Li Zefan
Cc: stable@vger.kernel.org
Signed-off-by: David S. Miller

Zefan Li
2013-06-27 06:36:42 +0800
a5cc68f3d af_key: fix info leaks in notify messages

key_notify_sa_flush() and key_notify_policy_flush() fail to initialize
the sadb_msg_reserved member of the broadcast message and thereby
leak 2 bytes of heap memory to listeners. Fix that.
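
A minimal user-space sketch of this class of bug follows. The field layout mirrors struct sadb_msg as described in RFC 2367, but struct demo_sadb_msg, demo_fill_flush_msg() and the DEMO_ constants are illustrative names and values, not the kernel's code.

```c
#include <stdint.h>

/* Illustrative constants; named after their PF_KEY counterparts. */
enum { DEMO_PF_KEY_V2 = 2, DEMO_SADB_FLUSH = 9 };

/* Same field order as the PF_KEY base header; names shortened for the demo. */
struct demo_sadb_msg {
    uint8_t  version;
    uint8_t  type;
    uint8_t  errno_;
    uint8_t  satype;
    uint16_t len;        /* message length in 64-bit words */
    uint16_t reserved;   /* the 2 bytes that leak if left untouched */
    uint32_t seq;
    uint32_t pid;
};

/*
 * Fill a flush notification in a buffer that may hold stale heap contents.
 * Every field is written, including 'reserved', so that nothing left over
 * from a previous allocation reaches the listeners.
 */
void demo_fill_flush_msg(struct demo_sadb_msg *hdr, uint8_t satype,
                         uint32_t seq, uint32_t pid)
{
    hdr->version  = DEMO_PF_KEY_V2;
    hdr->type     = DEMO_SADB_FLUSH;
    hdr->errno_   = 0;
    hdr->satype   = satype;
    hdr->len      = (uint16_t)(sizeof(*hdr) / sizeof(uint64_t));
    hdr->reserved = 0;   /* the kind of assignment the description says was missing */
    hdr->seq      = seq;
    hdr->pid      = pid;
}
```

Zeroing the whole buffer at allocation time would be an alternative design; assigning the field explicitly keeps the intent visible at the point where the message is built.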

Signed-off-by: Mathias Krause
Cc: Steffen Klassert
Cc: "David S. Miller"
Cc: Herbert Xu
Signed-off-by: David S. Miller

Mathias Krause
2013-06-27 06:15:54 +0800
a963a37d3 ipv6: ip6_sk_dst_check() must not assume ipv6 dst

It's possible to use AF_INET6 sockets and to connect to an IPv4
destination. After this, socket dst cache is a pointer to a rtable,
not rt6_info.

ip6_sk_dst_check() should check the socket dst cache is IPv6, or else
various corruptions/crashes can happen.
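
The idea can be sketched in user space as a family check before the downcast. All names here (demo_dst, demo_rtable, demo_rt6_info, demo_ip6_dst_check) are hypothetical stand-ins for the kernel structures, and the sketch assumes the generic entry is embedded as the first member of each route type.

```c
#include <stddef.h>

enum demo_family { DEMO_AF_INET = 2, DEMO_AF_INET6 = 10 };

/* Generic cache entry embedded at the start of each route type. */
struct demo_dst {
    enum demo_family family;
};

struct demo_rtable {            /* IPv4 route */
    struct demo_dst dst;
    unsigned int gateway;
};

struct demo_rt6_info {          /* IPv6 route */
    struct demo_dst dst;
    unsigned char gateway[16];
    int cookie;
};

/*
 * Return the cached entry as an IPv6 route only if it really is one.
 * An AF_INET6 socket connected to an IPv4 destination caches an IPv4
 * route, so the family has to be checked before the cast.
 */
struct demo_rt6_info *demo_ip6_dst_check(struct demo_dst *cached)
{
    if (cached == NULL)
        return NULL;
    if (cached->family != DEMO_AF_INET6)
        return NULL;
    /* Valid because dst is the first member of struct demo_rt6_info. */
    return (struct demo_rt6_info *)cached;
}
```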

Dave Jones can reproduce an immediate crash with:
trinity -q -l off -n -c sendmsg -c connect

With help from Hannes Frederic Sowa

Reported-by: Dave Jones
Reported-by: Hannes Frederic Sowa
Signed-off-by: Eric Dumazet
Acked-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller

Eric Dumazet
2013-06-27 06:13:47 +0800
5dbe7c178 net: fix kernel deadlock with interface rename and netdev name retrieval.

When the kernel (compiled with CONFIG_PREEMPT=n) is performing the
rename of a network interface, it can end up waiting for a workqueue
to complete. If userland is able to invoke a SIOCGIFNAME ioctl or a
SO_BINDTODEVICE getsockopt in between, the kernel will deadlock due to
the fact that read_seqcount_begin() will spin forever waiting for the
writer process (the one doing the interface rename) to update the
devnet_rename_seq sequence.

This patch fixes the problem by adding a helper (netdev_get_name())
and using it in the code handling the SIOCGIFNAME ioctl and
SO_BINDTODEVICE getsockopt.

The netdev_get_name() helper uses raw_seqcount_begin() to avoid
spinning forever, waiting for devnet_rename_seq->sequence to become
even. cond_resched() is used in the contended case, before retrying
the access to give the writer process a chance to finish.

The use of raw_seqcount_begin() will incur some unneeded work in the
reader process in the contended case, but this is better than
deadlocking the system.
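
The reader pattern described above can be sketched in user space with C11 atomics. This is an assumption-laden illustration, not the kernel helper: struct demo_dev and the demo_* functions are hypothetical, sched_yield() stands in for cond_resched(), and the plain memcpy() of the name is formally a data race in ISO C when a writer runs concurrently, so the sketch is only meant to show the control flow.

```c
#include <sched.h>
#include <stdatomic.h>
#include <stdio.h>
#include <string.h>

#define DEMO_NAMSIZ 16

struct demo_dev {
    atomic_uint seq;            /* odd while a rename is in progress */
    char name[DEMO_NAMSIZ];
};

void demo_dev_init(struct demo_dev *d, const char *name)
{
    atomic_init(&d->seq, 0);
    memset(d->name, 0, sizeof(d->name));
    snprintf(d->name, sizeof(d->name), "%s", name);
}

/* Writer side, split in two so the in-progress state can be observed. */
void demo_rename_begin(struct demo_dev *d)
{
    atomic_fetch_add_explicit(&d->seq, 1, memory_order_acq_rel);  /* now odd */
}

void demo_rename_end(struct demo_dev *d, const char *newname)
{
    snprintf(d->name, sizeof(d->name), "%s", newname);
    atomic_fetch_add_explicit(&d->seq, 1, memory_order_release);  /* even again */
}

/*
 * One read attempt. The start value has its low bit masked off instead of
 * waiting for the counter to become even, so an in-progress rename makes
 * the final comparison fail rather than making the reader spin here.
 * Returns 0 on success, -1 if the read must be retried.
 */
int demo_try_get_name(struct demo_dev *d, char out[DEMO_NAMSIZ])
{
    unsigned start = atomic_load_explicit(&d->seq, memory_order_acquire) & ~1u;

    memcpy(out, d->name, DEMO_NAMSIZ);
    atomic_thread_fence(memory_order_acquire);
    return atomic_load_explicit(&d->seq, memory_order_relaxed) == start ? 0 : -1;
}

/* Retry loop: yield between attempts so the writer gets a chance to finish. */
void demo_get_name(struct demo_dev *d, char out[DEMO_NAMSIZ])
{
    while (demo_try_get_name(d, out) != 0)
        sched_yield();
}
```

As in the description, a contended reader does some wasted work (the copy is thrown away), but it gives up the CPU between attempts instead of busy-waiting for the counter to become even.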

Signed-off-by: Nicolas Schichan
Acked-by: Eric Dumazet
Signed-off-by: David S. Miller

Nicolas Schichan
2013-06-27 04:42:54 +0800
34a086818 Merge tag 'regulator-v3.10-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator

Pull regulator fix from Mark Brown:
"Fix module loading for tps6586x.

A simple one liner fix to make module loading work for distros
(product specific kernels tend to have things built in)"

* tag 'regulator-v3.10-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
mfd: tps6586x: correct device name of the regulator cell

Linus Torvalds
2013-06-27 03:18:37 +0800
6b935ca29 Merge tag 'gpio-for-linus' of git://git.secretlab.ca/git/linux

Pull GPIO regression fix from Grant Likely:
"It took a while to work out the correct solution to this regression.
It is sorted now. This branch was constructed and tested by Tony.
I've verified that it builds and signed the tag"

* tag 'gpio-for-linus' of git://git.secretlab.ca/git/linux:
gpio/omap: don't use linear domain mapping for OMAP1

Linus Torvalds
2013-06-27 03:08:58 +0800
687058aed Merge tag 'pm+acpi-3.10-late' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull late power management and ACPI fixes from Rafael Wysocki:
"Sorry about the timing of this, but ACPI-based docking stations with
PCI devices on them and ATA bays would be hardly usable with 3.10
without it. We've been working on these fixes for the last couple of
weeks and everyone involved appears to be reasonably comfortable with
them now.

The PM part is one fix for a cpufreq regression introduced recently.

- Fix for an ACPI dock regression introduced by the recent rework of
the ACPI-based PCI hotplug code (acpiphp) that caused it to be
initialized before the ACPI dock driver, which is incorrect (ACPI
dock has to be initialized before acpiphp so that acpiphp can
register PCI devices on docking stations with it for PCI hotplug on
re-dock to work). From Jiang Liu.

- Fix for PCI resources allocation in the ACPI-based PCI hotplug code
(acpiphp) that makes it use the same PCI resources assignment rules
during runtime hotplug that are used during boot (the BIOS' choices
are now respected in both cases). This prevents PCI resource
allocation failures during hotplug from happening in some cases.
From Jiang Liu.

- Fix for ordering and synchronization issues during hot-removal of
PCI devices on docking stations. It makes the ACPI dock code carry
out the PCI devices removal synchronously during undock instead of
spawning a separate asynchronous work item to remove each of them
without even bothering to wait for all those work items to
complete. The hot-addition part is changed analogously.

- Fix for a regression (introduced a few releases ago) that
inadvertently removed the code to register a hotplug notification
handler for ATA ports/devices, which prevented ATA bay hotplug from
working. The missing code is added back with some improvements.
From Aaron Lu.

- Fix for a recent cpufreq regression causing a NULL pointer
dereference to trigger in od_set_powersave_bias() in some
situations. From Jacob Shin."

* tag 'pm+acpi-3.10-late' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
cpufreq: fix NULL pointer deference at od_set_powersave_bias()
libata-acpi: add back ACPI based hotplug functionality
ACPI / dock / PCI: Synchronous handling of dock events for PCI devices
PCI / ACPI: Use boot-time resource allocation rules during hotplug
ACPI / dock: Initialize ACPI dock subsystem upfront

Linus Torvalds
2013-06-27 02:55:03 +0800