Doug / smarc-fsl-linux-kernel | Embedian Git Server

08 May, 2007

1 commit

5cefcab3d Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw

* git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw: (34 commits)
[GFS2] Uncomment sprintf_symbol calling code
[DLM] lowcomms style
[GFS2] printk warning fixes
[GFS2] Patch to fix mmap of stuffed files
[GFS2] use lib/parser for parsing mount options
[DLM] Lowcomms nodeid range & initialisation fixes
[DLM] Fix dlm_lowcoms_stop hang
[DLM] fix mode munging
[GFS2] lockdump improvements
[GFS2] Patch to detect corrupt number of dir entries in leaf and/or inode blocks
[GFS2] bz 236008: Kernel gpf doing cat /debugfs/gfs2/xxx (lock dump)
[DLM] fs/dlm/ast.c should #include "ast.h"
[DLM] Consolidate transport protocols
[DLM] Remove redundant assignment
[GFS2] Fix bz 234168 (ignoring rgrp flags)
[DLM] change lkid format
[DLM] interface for purge (2/2)
[DLM] add orphan purging code (1/2)
[DLM] split create_message function
[GFS2] Set drop_count to 0 (off) by default
...

Linus Torvalds
2007-05-08 03:26:27 +0800

03 May, 2007

1 commit

823bccfc4 remove "struct subsystem" as it is no longer needed

We need to work on cleaning up the relationship between kobjects, ksets and
ktypes. The removal of 'struct subsystem' is the first step of this,
especially as it is not really needed at all.

Thanks to Kay for fixing the bugs in this patch.

Signed-off-by: Greg Kroah-Hartman

Greg Kroah-Hartman
2007-05-03 09:57:59 +0800

01 May, 2007

15 commits

617e82e10 [DLM] lowcomms style

Replace some printk with log_print, and fix some simple cases of lines
over 80. Also, return -ENOTCONN if lowcomms_start fails due to no local
IP address being available.

Signed-off-by: David Teigland
Signed-off-by: Steven Whitehouse

David Teigland
2007-05-01 16:11:51 +0800

30d3a2373 [DLM] Lowcomms nodeid range & initialisation fixes

Fix a few range & initialization bugs in lowcomms.
- max_nodeid is really the highest nodeid encountered, so all loops must include
it in their iterations.
- clean dlm_local_count & connection_idr so we can do a clean restart.
- Remove a spurious BUG_ON
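
The first point is a classic inclusive-bound issue. A minimal sketch, assuming a hypothetical `connected[]` array indexed by nodeid (the function names are illustrative and are not taken from lowcomms):

```c
/* Hypothetical sketch, not the lowcomms code: connected[] is indexed by
 * nodeid and max_nodeid is the highest nodeid seen so far. */

/* Off by one: stops before max_nodeid, so the highest node is never visited. */
static int count_connected_exclusive(const int *connected, int max_nodeid)
{
    int count = 0;

    for (int nodeid = 0; nodeid < max_nodeid; nodeid++)
        count += connected[nodeid] != 0;
    return count;
}

/* Corrected: max_nodeid is itself a valid nodeid, so the bound is inclusive. */
static int count_connected_inclusive(const int *connected, int max_nodeid)
{
    int count = 0;

    for (int nodeid = 0; nodeid <= max_nodeid; nodeid++)
        count += connected[nodeid] != 0;
    return count;
}
```

With nodes 1 and 3 connected and max_nodeid equal to 3, the exclusive loop sees only node 1.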

Signed-Off-By: Patrick Caulfield
Signed-off-by: Steven Whitehouse

Patrick Caulfield
2007-05-01 16:11:41 +0800

2439fe507 [DLM] Fix dlm_lowcoms_stop hang

When you attempt to release a lockspace in DLM, it will hang trying to down a
semaphore that has already been downed. The attached patch fixes the problem.

Signed-off-by: Josef Bacik
Signed-off-by: Steven Whitehouse
Cc: Patrick Caulfield

Josef Bacik
2007-05-01 16:11:38 +0800

7d3c1feb8 [DLM] fix mode munging

There are flags to enable two specialized features in the dlm:
1. CONVDEADLK causes the dlm to resolve conversion deadlocks internally by
changing the granted mode of locks to NL.
2. ALTPR/ALTCW cause the dlm to change the requested mode of locks to PR
or CW to grant them if the normal requested mode can't be granted.

GFS direct i/o exercises both of these features, especially when mixed
with buffered i/o. The dlm has problems with them.

The first problem is on the master node. If it demotes a lock as a part of
converting it, the actual step of converting the lock isn't being done
after the demotion, the lock is just left sitting on the granted queue
with a granted mode of NL. I think the mistaken assumption was that the
call to grant_pending_locks() would grant it, but that function naturally
doesn't look at locks on the granted queue.

The second problem is on the process node. If the master either demotes
or gives an altmode, the munging of the gr/rq modes is never done in the
process copy of the lock, leaving the master/process copies out of sync.

Signed-off-by: David Teigland
Signed-off-by: Steven Whitehouse

David Teigland
2007-05-01 16:11:36 +0800

8fa1de386 [DLM] fs/dlm/ast.c should #include "ast.h"

Every file should include the headers containing the prototypes for
its global functions.

Signed-off-by: Adrian Bunk
Signed-off-by: David Teigland
Signed-off-by: Steven Whitehouse

Adrian Bunk
2007-05-01 16:11:25 +0800

6ed7257b4 [DLM] Consolidate transport protocols

This patch consolidates the TCP & SCTP protocols for the DLM into a single file
and makes it switchable at run-time (well, at least before the DLM actually
starts up!)

For RHEL5 this patch requires Neil Horman's patch that expands the in-kernel
socket API but that has already been twice ACKed so it should be OK.

The patch adds a new lowcomms.c file that replaces the existing lowcomms-sctp.c
& lowcomms-tcp.c files.

Signed-off-By: Patrick Caulfield
Signed-off-by: Steven Whitehouse

Patrick Caulfield
2007-05-01 16:11:23 +0800

fc7c44f03 [DLM] Remove redundant assignment

This patch removes a redundant (and incorrect) assignment from compat_output.

Signed-Off-By: Patrick Caulfield
Signed-off-by: Steven Whitehouse

Patrick Caulfield
2007-05-01 16:11:20 +0800

ce03f12b3 [DLM] change lkid format

A lock id is a uint32 and is used as an opaque reference to the lock. For
userland apps, the lkid is passed up, through libdlm, as the return value
from a write() on the dlm device. This created a problem when the high
bit was 1, making the lkid look like an error. This is fixed by changing
how the lkid is composed. The low 16 bits identified the hash bucket for
the lock and the high 16 bits were a per-bucket counter (which eventually
hit 0x8000 causing the problem). These are simply swapped around; the
number of hash table buckets is far below 0x8000, making all lkid's
positive when viewed as signed.
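
The effect of the swap can be sketched as follows. The helper names are hypothetical and the 16/16 split is taken from the description above, not from the fs/dlm source:

```c
#include <stdint.h>

/* Hypothetical helpers illustrating the two layouts described above. */

/* Old layout: per-bucket counter in the high 16 bits, bucket in the low 16. */
static uint32_t lkid_old(uint16_t bucket, uint16_t counter)
{
    return ((uint32_t)counter << 16) | bucket;
}

/* New layout: bucket in the high 16 bits, counter in the low 16. */
static uint32_t lkid_new(uint16_t bucket, uint16_t counter)
{
    return ((uint32_t)bucket << 16) | counter;
}

/* A write()-style return value is signed, so an lkid with the top bit set
 * is indistinguishable from a negative error code. */
static int looks_like_error(uint32_t lkid)
{
    return (lkid & 0x80000000u) != 0;
}
```

For bucket 5 and counter 0x8000, the old layout gives 0x80000005 (top bit set), while the new layout gives 0x00058000, which stays positive as long as the bucket number is below 0x8000.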

Signed-off-by: David Teigland
Signed-off-by: Steven Whitehouse

David Teigland
2007-05-01 16:11:15 +0800

72c2be776 [DLM] interface for purge (2/2)

Add code to accept purge commands from userland.

Signed-off-by: David Teigland
Signed-off-by: Steven Whitehouse

David Teigland
2007-05-01 16:11:12 +0800

8499137d4 [DLM] add orphan purging code (1/2)

Add code for purging orphan locks. A process can also purge all of its
own non-orphan locks by passing a pid of zero. Code already exists for
processes to create persistent locks that become orphans when the process
exits, but the complementary capability for another process to then purge
these orphans has been missing.

Signed-off-by: David Teigland
Signed-off-by: Steven Whitehouse

David Teigland
2007-05-01 16:11:10 +0800

7e4dac335 [DLM] split create_message function

This splits the current create_message() function into two parts so that
later patches can call the new lower-level _create_message() function when
they don't have an rsb struct. No functional change in this patch.
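
The shape of such a split can be sketched like this. All struct and function names below are hypothetical stand-ins; the real create_message() takes different arguments:

```c
/* Hypothetical, simplified stand-ins for an rsb and a message. */
struct demo_rsb { unsigned lockspace_id; int master_nodeid; };
struct demo_msg { unsigned lockspace_id; int to_nodeid; int type; };

/* Lower-level helper: needs only plain values, no rsb. */
static void demo_create_message_core(unsigned lockspace_id, int to_nodeid,
                                     int type, struct demo_msg *msg)
{
    msg->lockspace_id = lockspace_id;
    msg->to_nodeid = to_nodeid;
    msg->type = type;
}

/* Original entry point, now a thin wrapper that unpacks the rsb. */
static void demo_create_message(const struct demo_rsb *r, int type,
                                struct demo_msg *msg)
{
    demo_create_message_core(r->lockspace_id, r->master_nodeid, type, msg);
}
```

Existing callers keep using the wrapper and see no behavioural change; callers without an rsb can use the lower-level helper directly.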

Signed-off-by: David Teigland
Signed-off-by: Steven Whitehouse

David Teigland
2007-05-01 16:11:07 +0800

ef0c2bb05 [DLM] overlapping cancel and unlock

Full cancel and force-unlock support. In the past, cancel and force-unlock
wouldn't work if there was another operation in progress on the lock. Now,
both cancel and unlock-force can overlap an operation on a lock, meaning there
may be 2 or 3 operations in progress on a lock in parallel. This support is
important not only because cancel and force-unlock are explicit operations
that an app can use, but both are used implicitly when a process exits while
holding locks.

Summary of changes:

- add-to and remove-from waiters functions were rewritten to handle situations
with more than one remote operation outstanding on a lock

- validate_unlock_args detects when an overlapping cancel/unlock-force
can be sent and when it needs to be delayed until a request/lookup
reply is received

- processing request/lookup replies detects when cancel/unlock-force
occurred during the op, and carries out the delayed cancel/unlock-force

- manipulation of the "waiters" (remote operation) state of a lock moved under
the standard rsb mutex that protects all the other lock state

- the two recovery routines related to locks on the waiters list changed
according to the way lkb's are now locked before accessing waiters state

- waiters recovery detects when lkb's being recovered have overlapping
cancel/unlock-force, and may not recover such locks

- revert_lock (cancel) returns a value to distinguish cases where it did
nothing vs cases where it actually did a cancel; the cancel completion ast
should only be done when cancel did something

- orphaned locks put on new list so they can be found later for purging

- cancel must be called on a lock when making it an orphan

- flag user locks (ENDOFLIFE) at the end of their useful life (to the
application) so we can return an error for any further cancel/unlock-force

- we weren't setting COMP/BAST ast flags if one was already set, so we'd lose
either a completion or blocking ast

- clear an unread bast on a lock that's become unlocked
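
The COMP/BAST item in the list above describes a lost-flag pattern that can be sketched as follows. The flag values and function names are hypothetical and only illustrate the difference between skipping and accumulating:

```c
/* Hypothetical flag values for a completion ast and a blocking ast. */
#define DEMO_AST_COMP 0x1u
#define DEMO_AST_BAST 0x2u

/* Buggy pattern: a new ast type is recorded only if nothing is pending,
 * so a second, different type is silently dropped. */
static unsigned demo_add_ast_buggy(unsigned pending, unsigned type)
{
    return pending ? pending : type;
}

/* Fixed pattern: always OR the new type in, so both can be pending. */
static unsigned demo_add_ast_fixed(unsigned pending, unsigned type)
{
    return pending | type;
}
```

With a completion ast already pending, the buggy version loses a following blocking ast, while the fixed version keeps both bits set.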

Signed-off-by: David Teigland
Signed-off-by: Steven Whitehouse

David Teigland
2007-05-01 16:11:00 +0800

032067270 [DLM] fix coverity-spotted stupidity

Replacement patch to remove redundant code rather than moving it around.

Signed-Off-By: Patrick Caulfield
Signed-off-by: Steven Whitehouse

Patrick Caulfield
2007-05-01 16:10:57 +0800

254da030d [DLM] Don't delete misc device if lockspace removal fails

Currently if the lockspace removal fails the misc device associated with a
lockspace is left deleted. After that there is no way to access the orphaned
lockspace from userland.

This patch recreates the misc device if the dlm_release_lockspace fails. I
believe this is better than attempting to remove the lockspace first because
that leaves an unattached device lying around. The potential gap in which there
is no access to the lockspace between removing the misc device and recreating it
is acceptable ... after all the application is trying to remove it, and only new
users of the lockspace will be affected.

Signed-Off-By: Patrick Caulfield
Signed-off-by: Steven Whitehouse

Patrick Caulfield
2007-05-01 16:10:44 +0800

89adc934f [DLM] Fix uninitialised variable in receiving

The length of the second element of the kvec array was not initialised before
being added to the first one. This could cause invalid lengths to be passed to
kernel_recvmsg
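
One way to avoid this class of bug is to give both elements a defined length before they are summed. A minimal sketch, assuming a circular receive buffer and a hypothetical `struct demo_kvec` that mirrors the two fields of the kernel's `struct kvec`:

```c
#include <stddef.h>

/* Hypothetical stand-in with the same two fields as struct kvec. */
struct demo_kvec {
    void  *iov_base;
    size_t iov_len;
};

/* Describe len bytes starting at offset start in a circular buffer of the
 * given size, using one segment, or two if the region wraps. Returns the
 * total length that would be handed to the receive call. */
static size_t demo_setup_recv(struct demo_kvec iov[2], char *buf, size_t size,
                              size_t start, size_t len)
{
    iov[0].iov_base = buf + start;
    iov[0].iov_len  = len;
    /* Give the second element a defined length before it is summed. */
    iov[1].iov_base = NULL;
    iov[1].iov_len  = 0;

    if (start + len > size) {           /* region wraps past the end */
        iov[0].iov_len  = size - start;
        iov[1].iov_base = buf;
        iov[1].iov_len  = len - iov[0].iov_len;
    }
    return iov[0].iov_len + iov[1].iov_len;
}
```

Without the explicit zeroing, the non-wrapping case would add whatever happened to be in the second element's length field.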

Signed-Off-By: Patrick Caulfield
Signed-off-by: Steven Whitehouse

Patrick Caulfield
2007-05-01 16:10:34 +0800

08 Mar, 2007

1 commit

84c6e8cd3 [DLM] fs/dlm/user.c should #include "user.h"

Every file should include the headers containing the prototypes for
its global functions.

Signed-off-by: Adrian Bunk
Signed-off-by: Steven Whitehouse

Adrian Bunk
2007-03-08 02:58:21 +0800

13 Feb, 2007

1 commit

00977a59b [PATCH] mark struct file_operations const 6

Many struct file_operations in the kernel can be "const". Marking them const
moves these to the .rodata section, which avoids false sharing with potential
dirty data. In addition it'll catch accidental writes at compile time to
these shared resources.
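
The pattern can be sketched in plain C. The struct below is a hypothetical stand-in for struct file_operations, reduced to two callbacks:

```c
/* Hypothetical stand-in for the kernel's struct file_operations. */
struct demo_ops {
    int (*open)(void);
    int (*release)(void);
};

static int demo_open(void)    { return 0; }
static int demo_release(void) { return 0; }

/* const allows the toolchain to place the table in read-only data. */
static const struct demo_ops demo_fops = {
    .open    = demo_open,
    .release = demo_release,
};

/* Callers only need a pointer-to-const to dispatch through the table. */
static int demo_call_open(const struct demo_ops *ops)
{
    return ops->open ? ops->open() : -1;
}

/* An assignment such as "demo_fops.open = demo_release;" would now be
 * rejected by the compiler instead of corrupting shared state at run time. */
```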

Signed-off-by: Arjan van de Ven
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Arjan van de Ven
2007-02-13 01:48:45 +0800

12 Feb, 2007

1 commit

c37622296 [PATCH] Transform kmem_cache_alloc()+memset(0) -> kmem_cache_zalloc().

Replace appropriate pairs of "kmem_cache_alloc()" + "memset(0)" with the
corresponding "kmem_cache_zalloc()" call.
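
A userspace analogue of the transformation, as a sketch only: the `demo_cache_*` functions below are hypothetical stand-ins built on malloc(), whereas the real kmem_cache_* calls take a struct kmem_cache pointer and GFP flags.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical userspace stand-in for a slab cache. */
struct demo_cache { size_t object_size; };

static void *demo_cache_alloc(struct demo_cache *c)
{
    return malloc(c->object_size);
}

/* zalloc is alloc plus zero fill, done in one place. */
static void *demo_cache_zalloc(struct demo_cache *c)
{
    void *obj = demo_cache_alloc(c);

    if (obj)
        memset(obj, 0, c->object_size);
    return obj;
}

struct demo_obj { int id; int flags; };

/* Before: every call site allocates and then clears. */
static struct demo_obj *demo_new_before(struct demo_cache *c)
{
    struct demo_obj *obj = demo_cache_alloc(c);

    if (obj)
        memset(obj, 0, sizeof(*obj));
    return obj;
}

/* After: a single call does both. */
static struct demo_obj *demo_new_after(struct demo_cache *c)
{
    return demo_cache_zalloc(c);
}
```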

Signed-off-by: Robert P. J. Day
Cc: "Luck, Tony"
Cc: Andi Kleen
Cc: Roland McGrath
Cc: James Bottomley
Cc: Greg KH
Acked-by: Joel Becker
Cc: Steven Whitehouse
Cc: Jan Kara
Cc: Michael Halcrow
Cc: "David S. Miller"
Cc: Stephen Smalley
Cc: James Morris
Cc: Chris Wright
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Robert P. J. Day
2007-02-12 02:51:27 +0800

10 Feb, 2007

1 commit

58addbffd [PATCH] dlm: use kern_recvmsg()

Signed-off-by: Al Viro
Signed-off-by: Linus Torvalds

Al Viro
2007-02-10 01:14:06 +0800

06 Feb, 2007

19 commits

a34fbc636 [DLM] fix softlockup in dlm_recv

This patch stops the dlm_recv workqueue from busy-waiting when a node
disconnects. This can cause soft lockup errors on debug systems and bad
performance generally.

Signed-Off-By: Patrick Caulfield
Signed-off-by: Steven Whitehouse

Patrick Caulfield
2007-02-06 02:38:27 +0800

62a0f6236 [DLM] zero new user lvbs

A new lvb for a userland lock wasn't being initialized to zero.

Signed-off-by: David Teigland
Signed-off-by: Steven Whitehouse

David Teigland
2007-02-06 02:38:24 +0800

9beeb9f3c [DLM/GFS2] indent help text

Indent help text as expected.

Signed-off-by: Randy Dunlap
Signed-off-by: Steven Whitehouse

Randy Dunlap
2007-02-06 02:38:20 +0800

001172778 [GFS2/DLM] fix GFS2 circular dependency

On Sun, Jan 28, 2007 at 11:08:18AM +0100, Jiri Slaby wrote:
> Andrew Morton napsal(a):
> >Temporarily at
> >
> > http://userweb.kernel.org/~akpm/2.6.20-rc6-mm1/
>
> Unable to select IPV6. Menuconfig doesn't offer it when INET is selected.
> When it's not it appears in the menu, but after state change it gets away.
> The same behaviour in xconfig, gconfig.
>
> $ mkdir ../a/tst
> $ make O=../a/tst menuconfig
> HOSTCC scripts/basic/fixdep
> [...]
> HOSTLD scripts/kconfig/mconf
> scripts/kconfig/mconf arch/i386/Kconfig
> Warning! Found recursive dependency: INET GFS2_FS_LOCKING_DLM SYSFS
> OCFS2_FS INET
>
> Maybe this is the problem?

Yes, patch below.

> regards,

cu
Adrian

This patch fixes a circular dependency by letting GFS2_FS_LOCKING_DLM
and DLM depend on instead of select SYSFS.

Since SYSFS depends on EMBEDDED this change shouldn't cause any problems
for users.

Signed-off-by: Adrian Bunk
Acked-by: Randy Dunlap
Signed-off-by: Steven Whitehouse

Adrian Bunk
2007-02-06 02:38:08 +0800

67f55897e [GFS2/DLM] use sysfs

With CONFIG_DLM=m, CONFIG_PROC_FS=n, and CONFIG_SYSFS=n, kernel build
fails with:

WARNING: "kernel_subsys" [fs/gfs2/locking/dlm/lock_dlm.ko] undefined!
WARNING: "kernel_subsys" [fs/dlm/dlm.ko] undefined!
WARNING: "kernel_subsys" [fs/configfs/configfs.ko] undefined!
make[1]: *** [__modpost] Error 1
make: *** [modules] Error 2

Since fs/dlm/lockspace.c and fs/gfs2/locking/dlm/sysfs.c use
kernel_subsys, they should either DEPEND on it or SELECT it.

Signed-off-by: Randy Dunlap
Signed-off-by: Steven Whitehouse

Randy Dunlap
2007-02-06 02:38:05 +0800

b790c3b7c [DLM] can miss clearing resend flag

A long, complicated sequence of events, beginning with the RESEND flag not
being cleared on an lkb, can result in an unlock never completing.

- lkb on waiters list for remote lookup
- the remote node is both the dir node and the master node, so
it optimizes the lookup into a request and sends a request
reply back
- the request reply is saved on the requestqueue to be processed
after recovery
- recovery runs dlm_recover_waiters_pre() which sets RESEND flag
so the lookup will be resent after recovery
- end of recovery: process_requestqueue takes saved request reply
which removes the lkb off the waiters list, _without_ clearing
the RESEND flag
- end of recovery: dlm_recover_waiters_post() doesn't do anything
with the now completed lookup lkb (would usually clear RESEND)
- later, the node unmounts, unlocks this lkb that still has RESEND
flag set
- the lkb is on the waiters list again, now for unlock, when recovery
occurs, dlm_recover_waiters_pre() shows the lkb for unlock with RESEND
set, doesn't do anything since the master still exists
- end of recovery: dlm_recover_waiters_post() takes this lkb off
the waiters list because it has the RESEND flag set, then reports
an error because unlocks are never supposed to be handled in
recover_waiters_post().
- later, the unlock reply is received, doesn't find the lkb on
the waiters list because recover_waiters_post() has wrongly
removed it.
- the unlock operation has been lost, and we're left with a
stray granted lock
- unmount spins waiting for the unlock to complete

The visible evidence of this problem will be a node where gfs umount is
spinning, the dlm waiters list will be empty, and the dlm locks list will
show a granted lock.

The fix is simply to clear the RESEND flag when taking an lkb off the
waiters list.
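
In outline, the fix amounts to something like the following. The struct, the flag value and the function name are hypothetical; only the idea of clearing RESEND on removal comes from the description above:

```c
/* Hypothetical flag value and lock block, for illustration only. */
#define DEMO_IFL_RESEND 0x1u

struct demo_lkb {
    unsigned flags;
    int on_waiters;     /* 1 while the lkb is on the waiters list */
};

/* Taking an lkb off the waiters list also drops any stale RESEND flag,
 * so a later, unrelated operation on the same lkb is not misread by
 * recovery as something that needs resending. */
static void demo_remove_from_waiters(struct demo_lkb *lkb)
{
    lkb->on_waiters = 0;
    lkb->flags &= ~DEMO_IFL_RESEND;
}
```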

Signed-off-by: David Teigland
Signed-off-by: Steven Whitehouse

David Teigland
2007-02-06 02:37:50 +0800

8fd3a98f2 [DLM] saved dlm message can be dropped

dlm_receive_message() returns 0 instead of returning 'error'. What would
happen is that process_requestqueue would take a saved message off the
requestqueue and call receive_message on it. receive_message would then
see that recovery had been aborted, set error to EINTR, and 'goto out',
expecting that the error would be returned. Instead, 0 was always
returned, so process_requestqueue would think that the message had been
processed and delete it instead of saving it to process next time. This
means the message (usually an unlock in my tests) would be lost.
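
The control flow described above can be modelled in a few lines. This is a heavily simplified, hypothetical sketch: the real functions take lockspace and message arguments, and the negative -EINTR convention is an assumption based on usual kernel style.

```c
#include <errno.h>

/* Buggy shape: the error is computed but 0 is returned regardless. */
static int receive_message_buggy(int recovery_aborted)
{
    int error = 0;

    if (recovery_aborted) {
        error = -EINTR;
        goto out;
    }
    /* ... normal message processing would go here ... */
out:
    (void)error;    /* computed, but never handed back to the caller */
    return 0;
}

/* Fixed shape: the error reaches the caller. */
static int receive_message_fixed(int recovery_aborted)
{
    int error = 0;

    if (recovery_aborted) {
        error = -EINTR;
        goto out;
    }
    /* ... normal message processing would go here ... */
out:
    return error;
}

/* Models one step of process_requestqueue: returns 1 when the saved
 * message is deleted, 0 when it is kept for the next pass. */
static int saved_message_deleted(int (*receive)(int), int recovery_aborted)
{
    return receive(recovery_aborted) == -EINTR ? 0 : 1;
}
```

With recovery aborted, the buggy variant causes the saved message to be deleted, while the fixed variant leaves it queued.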

Signed-off-by: David Teigland
Signed-off-by: Steven Whitehouse

David Teigland
2007-02-06 02:37:47 +0800

f1f1c1ccf [DLM] Make sock_sem into a mutex

Now that there can be multiple dlm_recv threads running we need to prevent two
recvs running for the same connection - it's unlikely but it can happen and it
causes message corruption.

Signed-Off-By: Patrick Caulfield
Signed-off-by: Steven Whitehouse

Patrick Caulfield
2007-02-06 02:37:44 +0800

bd44e2b00 [DLM] fix lowcomms receiving

This patch fixes a bug whereby data on a newly accepted connection would be
ignored if it arrived soon after the accept.

Signed-Off-By: Patrick Caulfield
Signed-off-by: Steven Whitehouse

Patrick Caulfield
2007-02-06 02:37:29 +0800

f2f5095f9 [DLM] lowcomms tidy

This patch removes some redundant fields from the connection structure and adds
some lockdep annotation to remove spurious warnings.

Signed-Off-By: Patrick Caulfield
Signed-off-by: Steven Whitehouse

Patrick Caulfield
2007-02-06 02:37:23 +0800

222d39609 [DLM] fix master recovery

If master recovery happens on an rsb in one recovery sequence, then that
sequence is aborted before lock recovery happens, then in the next
sequence, we rely on the previous master recovery (which may now be
invalid due to another node ignoring a lookup result) and go on to do the
lock recovery where we get stuck due to an invalid master value.

recovery cycle begins: master of rsb X has left
nodes A and B send node C an rcom lookup for X to find the new master
C gets lookup from B first, sets B as new master, and sends reply back to B
C gets lookup from A next, and sends reply back to A saying B is master
A gets lookup reply from C and sets B as the new master in the rsb
recovery cycle on A, B and C is aborted to start a new recovery
B gets lookup reply from C and ignores it since there's a new recovery
recovery cycle begins: some other node has joined
B doesn't think it's the master of X so it doesn't rebuild it in the directory
C looks up the master of X, no one is master, so it becomes new master
B looks up the master of X, finds it's C
A believes that B is the master of X, so it sends its lock to B
B sends an error back to A
A resends
this repeats forever, the incorrect master value on A is never corrected

The fix is to do master recovery on an rsb that still has the NEW_MASTER
flag set from an earlier recovery sequence, and therefore didn't complete
lock recovery.

Signed-off-by: David Teigland
Signed-off-by: Steven Whitehouse

David Teigland
2007-02-06 02:36:58 +0800

a1bc86e6b [DLM] fix user unlocking

When a user process exits, we clear all the locks it holds. There is a
problem, though, with locks that the process had begun unlocking before it
exited. We couldn't find the lkb's that were in the process of being
unlocked remotely, to flag that they are DEAD. To solve this, we move
lkb's being unlocked onto a new list in the per-process structure that
tracks what locks the process is holding. We can then go through this
list to flag the necessary lkb's when clearing locks for a process when it
exits.

Signed-off-by: David Teigland
Signed-off-by: Steven Whitehouse

David Teigland
2007-02-06 02:36:55 +0800

1d6e8131c [DLM] Use workqueues for dlm lowcomms

This patch converts the DLM TCP lowcomms to use workqueues rather than using its
own daemon functions, simultaneously removing a lot of code and making it more
scalable on multi-processor machines.

Signed-Off-By: Patrick Caulfield
Signed-off-by: Steven Whitehouse

Patrick Caulfield
2007-02-06 02:36:52 +0800

d200778e1 [DLM] expose dlm_config_info fields in configfs

Make the dlm_config_info values readable and writeable via configfs
entries.

Signed-off-by: David Teigland
Signed-off-by: Steven Whitehouse

David Teigland
2007-02-06 02:36:43 +0800

99fc64874 [DLM] add config entry to enable log_debug

Add a new dlm_config_info field to enable log_debug output and change
log_debug() to use it.
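
A run-time switch of this kind might look roughly like the sketch below. The struct, the `ci_log_debug` field name and the function are hypothetical userspace stand-ins, and the return value exists only to make the behaviour observable:

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical config struct with a single debug switch. */
struct demo_config_info { int ci_log_debug; };

static struct demo_config_info demo_config = { 0 };

/* Prints the message only while the switch is on.
 * Returns 1 if a line was emitted, 0 if it was suppressed. */
static int demo_log_debug(const char *fmt, ...)
{
    va_list ap;

    if (!demo_config.ci_log_debug)
        return 0;

    va_start(ap, fmt);
    vprintf(fmt, ap);
    va_end(ap);
    putchar('\n');
    return 1;
}
```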

Signed-off-by: David Teigland
Signed-off-by: Steven Whitehouse

David Teigland
2007-02-06 02:36:40 +0800

68c817a1c [DLM] rename dlm_config_info fields

Add a "ci_" prefix to the fields in the dlm_config_info struct so that we
can use macros to add configfs functions to access them (in a later
patch). No functional changes in this patch, just naming changes.
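
A common prefix is what makes token pasting work here. A minimal sketch, with hypothetical field names, arbitrary values and a made-up accessor macro rather than the actual configfs code:

```c
/* Hypothetical config struct; every field carries the ci_ prefix. */
struct demo_config_info {
    int ci_tcp_port;
    int ci_buffer_size;
};

/* Arbitrary illustrative values. */
static struct demo_config_info demo_config = { 1000, 4096 };

/* One macro generates a getter and a setter per field by pasting the
 * short name onto the shared ci_ prefix. */
#define DEMO_CONFIG_ACCESSORS(name)             \
static int demo_get_##name(void)                \
{                                               \
    return demo_config.ci_##name;               \
}                                               \
static void demo_set_##name(int value)          \
{                                               \
    demo_config.ci_##name = value;              \
}

DEMO_CONFIG_ACCESSORS(tcp_port)
DEMO_CONFIG_ACCESSORS(buffer_size)
```

Adding a new tunable then costs one struct field and one macro invocation.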

Signed-off-by: David Teigland
Signed-off-by: Steven Whitehouse

David Teigland
2007-02-06 02:36:37 +0800

8ec688674 [DLM] change some log_error to log_debug

Some common, non-error messages should use log_debug instead of log_error
so they can be turned off.

Signed-off-by: David Teigland
Signed-off-by: Steven Whitehouse

David Teigland
2007-02-06 02:36:34 +0800

4edde74ee [DLM] Fix spin lock already unlocked bug

I just noticed this message when testing some other changes I'd made to
lowcomms (to use workqueues) but the problem seems to be in the current
git trees too. I'm amazed no-one has seen it.

BUG: spinlock already unlocked on CPU#1, dlm_recoverd/16868

Signed-Off-By: Patrick Caulfield
Signed-off-by: Steven Whitehouse

Patrick Caulfield
2007-02-06 02:36:21 +0800

3fb4a251f [DLM] Fix schedule() calls

I was a little over-enthusiastic turning schedule() calls into cond_resched() when fixing the DLM for Andrew Morton.

These four should really be calls to schedule() or the dlm can busy-wait.

Signed-Off-By: Patrick Caulfield
Signed-off-by: Steven Whitehouse

Patrick Caulfield
2007-02-06 02:36:18 +0800