11 Oct, 2007
7 commits
-
The problem: proc_net files remember which network namespace the are
against but do not remember hold a reference count (as that would pin
the network namespace). So we currently have a small window where
the reference count on a network namespace may be incremented when opening
a /proc file when it has already gone to zero.To fix this introduce maybe_get_net and get_proc_net.
maybe_get_net increments the network namespace reference count only if it is
greater then zero, ensuring we don't increment a reference count after it
has gone to zero.get_proc_net handles all of the magic to go from a proc inode to the network
namespace instance and call maybe_get_net on it.PROC_NET the old accessor is removed so that we don't get confused and use
the wrong helper function.Then I fix up the callers to use get_proc_net and handle the case case
where get_proc_net returns NULL. In that case I return -ENXIO because
effectively the network namespace has already gone away so the files
we are trying to access don't exist anymore.Signed-off-by: Eric W. Biederman
Acked-by: Paul E. McKenney
Signed-off-by: David S. Miller -
Add the appropriate EXPORT_SYMBOLS for proc_net_create,
proc_net_fops_create and proc_net_remove to fix errors when
compiling allmodconfigSigned-off-by: Mark Nelson
Acked-by: Benjamin Thery
Signed-off-by: David S. Miller -
My bad.
Signed-off-by: David S. Miller
-
This patch makes most of the generic device layer network
namespace safe. This patch makes dev_base_head a
network namespace variable, and then it picks up
a few associated variables. The functions:
dev_getbyhwaddr
dev_getfirsthwbytype
dev_get_by_flags
dev_get_by_name
__dev_get_by_name
dev_get_by_index
__dev_get_by_index
dev_ioctl
dev_ethtool
dev_load
wireless_process_ioctlwere modified to take a network namespace argument, and
deal with it.vlan_ioctl_set and brioctl_set were modified so their
hooks will receive a network namespace argument.So basically anthing in the core of the network stack that was
affected to by the change of dev_base was modified to handle
multiple network namespaces. The rest of the network stack was
simply modified to explicitly use &init_net the initial network
namespace. This can be fixed when those components of the network
stack are modified to handle multiple network namespaces.For now the ifindex generator is left global.
Fundametally ifindex numbers are per namespace, or else
we will have corner case problems with migration when
we get that far.At the same time there are assumptions in the network stack
that the ifindex of a network device won't change. Making
the ifindex number global seems a good compromise until
the network stack can cope with ifindex changes when
you change namespaces, and the like.Signed-off-by: Eric W. Biederman
Signed-off-by: David S. Miller -
Each netlink socket will live in exactly one network namespace,
this includes the controlling kernel sockets.This patch updates all of the existing netlink protocols
to only support the initial network namespace. Request
by clients in other namespaces will get -ECONREFUSED.
As they would if the kernel did not have the support for
that netlink protocol compiled in.As each netlink protocol is updated to be multiple network
namespace safe it can register multiple kernel sockets
to acquire a presence in the rest of the network namespaces.The implementation in af_netlink is a simple filter implementation
at hash table insertion and hash table look up time.Signed-off-by: Eric W. Biederman
Signed-off-by: David S. Miller -
This patch makes /proc/net per network namespace. It modifies the global
variables proc_net and proc_net_stat to be per network namespace.
The proc_net file helpers are modified to take a network namespace argument,
and all of their callers are fixed to pass &init_net for that argument.
This ensures that all of the /proc/net files are only visible and
usable in the initial network namespace until the code behind them
has been updated to be handle multiple network namespaces.Making /proc/net per namespace is necessary as at least some files
in /proc/net depend upon the set of network devices which is per
network namespace, and even more files in /proc/net have contents
that are relevant to a single network namespace.Signed-off-by: Eric W. Biederman
Signed-off-by: David S. Miller -
The current implementation of dev_ifname makes maintenance difficult
because updates to the implementation of the ioctl have to made in two
places. So this patch updates dev_ifname32 to do a classic 32/64
structure conversion and call sys_ioctl like the rest of the
compat calls do.Signed-off-by: Eric W. Biederman
Signed-off-by: David S. Miller
10 Oct, 2007
1 commit
-
The recent fix for a circular lock dependency unfortunately introduced a
potential memory leak in the event where the call to nlmsvc_lookup_host
fails for some reason.Thanks to Roel Kluin for spotting this.
Signed-off-by: Trond Myklebust
Signed-off-by: Linus Torvalds
09 Oct, 2007
1 commit
-
When IOCB_FLAG_RESFD flag is set and iocb->aio_resfd is incorrect,
statement 'goto out_put_req' is executed. At label 'out_put_req',
aio_put_req(..) is called, which requires 'req->ki_filp' set.Signed-off-by: Yan Zheng
Cc: Zach Brown
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
04 Oct, 2007
2 commits
-
* 'for-linus' of master.kernel.org:/pub/scm/linux/kernel/git/cooloney/blackfin-2.6:
Blackfin arch: fix PORT_J BUG for BF537/6 EMAC driver reported by Kalle Pokki
Blackfin arch: gpio pinmux and resource allocation API required by BF537 on chip ethernet mac driver
Blackfin arch: add some missing syscall
binfmt_flat: checkpatch fixing minimum support for the blackfin relocations
Binfmt_flat: Add minimum support for the Blackfin relocations -
The fs was not unlocking the local alloc inode mutex in the code path in
which it failed to find a window of free bits in the global bitmap.Signed-off-by: Sunil Mushran
Signed-off-by: Mark Fasheh
03 Oct, 2007
2 commits
-
Cc: Bernd Schmidt
Cc: David McCullough
Cc: Greg Ungerer
Cc: Miles Bader
Signed-off-by: Andrew Morton
Signed-off-by: Bryan Wu -
Add minimum support for the Blackfin relocations, since we don't have
enough space in each reloc. The idea is to store a value with one
relocation so that subsequent ones can access it.Actually, this patch is required for Blackfin. Currently if BINFMT_FLAT is
enabled, git-tree kernel will fail to compile.Signed-off-by: Bernd Schmidt
Signed-off-by: Bryan Wu
Cc: David McCullough
Cc: Greg Ungerer
Cc: Miles Bader
Signed-off-by: Andrew Morton
02 Oct, 2007
1 commit
-
Nick Piggin points out that splice isn't being good about the mmap
semaphore: while two readers can nest inside each others, it does leave
a possible deadlock if a writer (ie a new mmap()) comes in during that
nesting.Original "just move the locking" patch by Nick, replaced by one by me
based on an optimistic pagefault_disable(). And then Jens tested and
updated that patch.Reported-by: Nick Piggin
Tested-by: Jens Axboe
Cc: Andrew Morton
Signed-off-by: Linus Torvalds
01 Oct, 2007
1 commit
-
This reverts commit b394e43e995d08821588a22561c6a71a63b4ff27.
Lachlan McIlroy says:
It tried to fix an issue where log replay is replaying an inode cluster
initialisation transaction that should not be replayed because the inode
cluster on disk is more up to date. Since we don't log file sizes (we
rely on inode flushing to get them to disk) then we can't just replay
all the transations in the log and expect the inode to be completely
restored. We lose file size updates. Unfortunately this fix is causing
more (serious) problems than it is fixing.SGI-PV: 969656
SGI-Modid: xfs-linux-melb:xfs-kern:29804aSigned-off-by: Lachlan McIlroy
Signed-off-by: Tim Shimmin
29 Sep, 2007
1 commit
-
It doesn't look as if the NFS file name limit is being initialised correctly
in the struct nfs_server. Make sure that we limit whatever is being set in
nfs_probe_fsinfo() and nfs_init_server().Also ensure that readdirplus and nfs4_path_walk respect our file name
limits.Signed-off-by: Trond Myklebust
Signed-off-by: Linus Torvalds
27 Sep, 2007
1 commit
-
The problem is that the garbage collector for the 'host' structures
nlm_gc_hosts(), holds nlm_host_mutex while calling down to
nlmsvc_mark_resources, which, eventually takes the file->f_mutex.We cannot therefore call nlmsvc_lookup_host() from within
nlmsvc_create_block, since the caller will already hold file->f_mutex, so
the attempt to grab nlm_host_mutex may deadlock.Fix the problem by calling nlmsvc_lookup_host() outside the file->f_mutex.
Signed-off-by: Trond Myklebust
Signed-off-by: Linus Torvalds
26 Sep, 2007
2 commits
-
* 'upstream-linus' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6:
[PATCH] WE : Add missing auth compat-ioctl
[PATCH] softmac: Fix inability to associate with WEP networks -
…nville/wireless-2.6 into upstream-fixes
25 Sep, 2007
1 commit
-
Different types of ufs hold state in different places, to hide complexity
of this, there is ufs_get_fs_state, it returns state according to
"UFS_SB(sb)->s_flags", but during mount ufs_get_fs_state is called, before
setting s_flags, this cause message for ufs types like sun ufs: "fs need
fsck", and remount in readonly state.Signed-off-by: Evgeniy Dushistov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
23 Sep, 2007
1 commit
-
* 'for-linus' of git://oss.sgi.com:8090/xfs/xfs-2.6:
[XFS] fix valid but harmless sparse warning
[XFS] fix filestreams on 32-bit boxes
22 Sep, 2007
1 commit
-
* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2:
ocfs2: Pack vote message and response structures
ocfs2: Don't double set write parameters
ocfs2: Fix pos/len passed to ocfs2_write_cluster
ocfs2: Allow smaller allocations during large writes
21 Sep, 2007
6 commits
-
Johannes just found that we are missing a compat-ioctl
declaration. The fix is trivial. As previous patches for compat-ioctl,
this should also go to stable.More info :
http://marc.info/?l=linux-wireless&m=119029667902588&w=2Signed-off-by: Jean Tourrilhes
Signed-off-by: John W. Linville -
The ocfs2_vote_msg and ocfs2_response_msg structs needed to be
packed to ensure similar sizeofs in 32-bit and 64-bit arches. Without this,
we had inadvertantly broken 32/64 bit cross mounts.Signed-off-by: Sunil Mushran
Signed-off-by: Mark Fasheh -
The target page offsets were being incorrectly set a second time in
ocfs2_prepare_page_for_write(), which was causing problems on a 16k page
size kernel. Additionally, ocfs2_write_failure() was incorrectly using those
parameters instead of the parameters for the individual page being cleaned
up.Signed-off-by: Mark Fasheh
-
This was broken for file systems whose cluster size is greater than page
size. Pos needs to be incremented as we loop through the descriptors, and
len needs to be capped to the size of a single cluster.Signed-off-by: Mark Fasheh
-
The ocfs2 write code loops through a page much like the block code, except
that ocfs2 allocation units can be any size, including larger than page
size. Typically it's equal to or larger than page size - most kernels run 4k
pages, the minimum ocfs2 allocation (cluster) size.Some changes introduced during 2.6.23 changed the way writes to pages are
handled, and inadvertantly broke support for > 4k page size. Instead of just
writing one cluster at a time, we now handle the whole page in one pass.This means that multiple (small) seperate allocations might happen in the
same pass. The allocation code howver typically optimizes by getting the
maximum which was reserved. This triggered a BUG_ON in the extend code where
it'd ask for a single bit (for one part of a > 4k page) and get back more
than it asked for.Fix this by providing a variant of the high level allocation function which
allows the caller to specify a maximum. The traditional function remains and
just calls the new one with a maximum determined from the initial
reservation.Signed-off-by: Mark Fasheh
-
This simplifies signalfd code, by avoiding it to remain attached to the
sighand during its lifetime.In this way, the signalfd remain attached to the sighand only during
poll(2) (and select and epoll) and read(2). This also allows to remove
all the custom "tsk == current" checks in kernel/signal.c, since
dequeue_signal() will only be called by "current".I think this is also what Ben was suggesting time ago.
The external effect of this, is that a thread can extract only its own
private signals and the group ones. I think this is an acceptable
behaviour, in that those are the signals the thread would be able to
fetch w/out signalfd.Signed-off-by: Davide Libenzi
Signed-off-by: Linus Torvalds
20 Sep, 2007
6 commits
-
The new xlog_recover_do_reg_buffer checks call be16_to_cpu on di_gen which
is a 32bit value so sparse rightly complains. Fortunately the warning is
harmless because we don't care for the value, but only whether it's
non-NULL. Due to that fact we can simply kill the endian swaps on this and
the previous di_mode check entirely.SGI-PV: 969656
SGI-Modid: xfs-linux-melb:xfs-kern:29709aSigned-off-by: Christoph Hellwig
Signed-off-by: Lachlan McIlroy
Signed-off-by: Tim Shimmin -
xfs_filestream_mount() sets up an mru cache with:
err = xfs_mru_cache_create(&mp->m_filestream, lifetime, grp_count,
(xfs_mru_cache_free_func_t)xfs_fstrm_free_func);
but that cast is causing problems...
typedef void (*xfs_mru_cache_free_func_t)(unsigned long, void*);
but:
void xfs_fstrm_free_func( xfs_ino_t ino, fstrm_item_t *item)
so on a 32-bit box, it's casting (32, 32) args into (64, 32) and I assume
it's getting garbage for *item, which subsequently causes an explosion.
With this change the filestreams xfsqa tests don't oops on my 32-bit box.SGI-PV: 967795
SGI-Modid: xfs-linux-melb:xfs-kern:29510aSigned-off-by: Eric Sandeen
Signed-off-by: David Chinner
Signed-off-by: Tim Shimmin -
* 'for-linus' of git://oss.sgi.com:8090/xfs/xfs-2.6:
[XFS] Avoid replaying inode buffer initialisation log items if on-disk version is newer.
[XFS] Ensure file size updates have been completed before writing inode to disk.
[XFS] On-demand reaping of the MRU cache -
The do_split() function for htree dir blocks is intended to split a leaf
block to make room for a new entry. It sorts the entries in the original
block by hash value, then moves the last half of the entries to the new
block - without accounting for how much space this actually moves. (IOW,
it moves half of the entry *count* not half of the entry *space*). If by
chance we have both large & small entries, and we move only the smallest
entries, and we have a large new entry to insert, we may not have created
enough space for it.The patch below stores each record size when calculating the dx_map, and
then walks the hash-sorted dx_map, calculating how many entries must be
moved to more evenly split the existing entries between the old block and
the new block, guaranteeing enough space for the new entry.The dx_map "offs" member is reduced to u16 so that the overall map size
does not change - it is temporarily stored at the end of the new block, and
if it grows too large it may be overwritten. By making offs and size both
u16, we won't grow the map size.Also add a few comments to the functions involved.
This fixes the testcase reported by hooanon05@yahoo.co.jp on the
linux-ext4 list, "ext3 dir_index causes an error"Thanks to Andreas Dilger for discussing the problem & solution with me.
Signed-off-by: Eric Sandeen
Signed-off-by: Andreas Dilger
Tested-by: Junjiro Okajima
Cc: Theodore Ts'o
Cc:
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
NFS unregisters sysctls only if V4 support is compiled in. However, sysctl
table is not V4 specific, so unregister it always.Steps to reproduce:
[build nfs.ko with CONFIG_NFS_V4=n]
modrobe nfs
rmmod nfs
ls /proc/sysUnable to handle kernel paging request at ffffffff880661c0 RIP:
[] proc_sys_readdir+0xd3/0x350
PGD 203067 PUD 207063 PMD 7e216067 PTE 0
Oops: 0000 [1] SMP
CPU 1
Modules linked in: lockd nfs_acl sunrpc
Pid: 3335, comm: ls Not tainted 2.6.23-rc3-bloat #2
RIP: 0010:[] [] proc_sys_readdir+0xd3/0x350
RSP: 0018:ffff81007fd93e78 EFLAGS: 00010286
RAX: ffffffff880661c0 RBX: ffffffff80466370 RCX: ffffffff880661c0
RDX: 00000000000014c0 RSI: ffff81007f3ad020 RDI: ffff81007efd8b40
RBP: 0000000000000018 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000001 R11: ffffffff802a8570 R12: ffffffff880661c0
R13: ffff81007e219640 R14: ffff81007efd8b40 R15: ffff81007ded7280
FS: 00002ba25ef03060(0000) GS:ffff81007ff81258(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: ffffffff880661c0 CR3: 000000007dfaf000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process ls (pid: 3335, threadinfo ffff81007fd92000, task ffff81007d8a0000)
Stack: ffff81007f3ad150 ffffffff80283f30 ffff81007fd93f48 ffff81007efd8b40
ffff81007ee00440 0000000422222222 0000000200035593 ffffffff88037e9a
2222222222222222 ffffffff80466500 ffff81007e416400 ffff81007e219640
Call Trace:
[] filldir+0x0/0xf0
[] filldir+0x0/0xf0
[] vfs_readdir+0xa7/0xc0
[] sys_getdents+0x96/0xe0
[] system_call+0x7e/0x83Code: 41 8b 14 24 85 d2 74 dc 49 8b 44 24 08 48 85 c0 74 e7 49 3b
RIP [] proc_sys_readdir+0xd3/0x350
RSP
CR2: ffffffff880661c0
Kernel panic - not syncing: Fatal exceptionSigned-off-by: Alexey Dobriyan
Acked-by: Trond Myklebust
Cc: "J. Bruce Fields"
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Convert asserts (BUGs) in dx_probe from bad on-disk data to recoverable
errors with helpful warnings. With help catching other asserts from Duane
GriffinSigned-off-by: Eric Sandeen
Acked-by: Duane Griffin
Acked-by: Theodore Ts'o
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
18 Sep, 2007
2 commits
-
SGI-PV: 969656
SGI-Modid: xfs-linux-melb:xfs-kern:29676aSigned-off-by: Lachlan McIlroy
Signed-off-by: David Chinner
Signed-off-by: Tim Shimmin -
SGI-PV: 968767
SGI-Modid: xfs-linux-melb:xfs-kern:29675aSigned-off-by: Lachlan McIlroy
Signed-off-by: David Chinner
Signed-off-by: Tim Shimmin
17 Sep, 2007
1 commit
-
Instead of running the mru cache reaper all the time based on a timeout,
we should only run it when the cache has active objects. This allows CPUs
to sleep when there is no activity rather than be woken repeatedly just to
check if there is anything to do.SGI-PV: 968554
SGI-Modid: xfs-linux-melb:xfs-kern:29305aSigned-off-by: David Chinner
Signed-off-by: Donald Douwsma
Signed-off-by: Tim Shimmin
16 Sep, 2007
1 commit
-
…nville/wireless-2.6 into upstream-fixes
15 Sep, 2007
1 commit
-
on return from ioctl calls
Signed-off-by: Masakazu Mokuno
Signed-off-by: John W. Linville
12 Sep, 2007
1 commit
-
* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2:
ocfs2: Fix calculation of i_blocks during truncate
[PATCH] ocfs2: Fix a wrong cluster calculation.
[PATCH] ocfs2: fix mount option parsing
ocfs2: update docs for new features