Commit 3b34380ae8c5df6debd85183c7fa1ac05f79b7d2

Authored by NeilBrown
Committed by Linus Torvalds
1 parent 03c902e17f

[PATCH] md: allow chunk_size to be settable through sysfs

... only before array is started of course.

Signed-off-by: Neil Brown <neilb@suse.de>
Acked-by: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

Showing 2 changed files with 34 additions and 0 deletions

Documentation/md.txt
Tools that manage md devices can be found at
   http://www.<country>.kernel.org/pub/linux/utils/raid/....


Boot time assembly of RAID arrays
---------------------------------

You can boot with your md device using the following kernel command
lines:

for old raid arrays without persistent superblocks:
  md=<md device no.>,<raid level>,<chunk size factor>,<fault level>,dev0,dev1,...,devn

for raid arrays with persistent superblocks:
  md=<md device no.>,dev0,dev1,...,devn
or, to assemble a partitionable array:
  md=d<md device no.>,dev0,dev1,...,devn

md device no. = the number of the md device ...
                0 means md0,
                1 md1,
                2 md2,
                3 md3,
                4 md4

raid level = -1 linear mode
              0 striped mode
             other modes are only supported with persistent super blocks

chunk size factor = (raid-0 and raid-1 only)
             Set the chunk size as 4k << n.

fault level = totally ignored

dev0-devn: e.g. /dev/hda1,/dev/hdc1,/dev/sda1,/dev/sdb1

A possible loadlin line (Harald Hoyer <HarryH@Royal.Net>) looks like this:

e:\loadlin\loadlin e:\zimage root=/dev/md0 md=0,0,4,0,/dev/hdb2,/dev/hdc3 ro
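The legacy `md=` form can be decoded field by field. The C sketch below is a hypothetical user-space parser (`struct md_boot_legacy`, `parse_md_legacy` and `MAX_BOOT_DEVS` are illustrative names, not the kernel's boot-option code). It applies the documented `4k << n` rule, ignores the fault level, and handles only the non-persistent form, not `md=d...` or the persistent-superblock form; the upper bound on the factor is this sketch's own assumption.

```c
#define _POSIX_C_SOURCE 200809L /* for strtok_r */
#include <stdlib.h>
#include <string.h>

#define MAX_BOOT_DEVS 8

/* Hypothetical container for the legacy "md=" fields (not a kernel type). */
struct md_boot_legacy {
	int minor;              /* 0 means md0, 1 means md1, ... */
	int level;              /* -1 linear, 0 striped */
	unsigned long chunk;    /* bytes: 4k << chunk size factor */
	int ndevs;
	char devs[MAX_BOOT_DEVS][32];
};

/*
 * Parse "<md device no.>,<raid level>,<chunk size factor>,<fault level>,dev0,..."
 * Returns 0 on success, -1 on malformed input.
 */
static int parse_md_legacy(const char *arg, struct md_boot_legacy *out)
{
	char buf[256];
	char *save = NULL;
	char *tok;
	int field = 0;

	if (strlen(arg) >= sizeof(buf))
		return -1;
	strcpy(buf, arg);
	memset(out, 0, sizeof(*out));

	for (tok = strtok_r(buf, ",", &save); tok != NULL;
	     tok = strtok_r(NULL, ",", &save), field++) {
		int factor;

		switch (field) {
		case 0:
			out->minor = atoi(tok);
			break;
		case 1:
			out->level = atoi(tok);
			break;
		case 2:
			factor = atoi(tok);
			if (factor < 0 || factor > 16) /* sketch's own sanity bound */
				return -1;
			out->chunk = 4096UL << factor;
			break;
		case 3:
			/* fault level: totally ignored, as documented */
			break;
		default:
			if (out->ndevs >= MAX_BOOT_DEVS ||
			    strlen(tok) >= sizeof(out->devs[0]))
				return -1;
			strcpy(out->devs[out->ndevs++], tok);
			break;
		}
	}
	return field >= 4 ? 0 : -1;
}
```

With the loadlin example above, `parse_md_legacy("0,0,4,0,/dev/hdb2,/dev/hdc3", &b)` yields md0, striped mode, a 64k chunk (4096 << 4 = 65536) and two component devices.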


Boot time autodetection of RAID arrays
--------------------------------------

When md is compiled into the kernel (not as a module), partitions of
type 0xfd are scanned and automatically assembled into RAID arrays.
This autodetection may be suppressed with the kernel parameter
"raid=noautodetect". As of kernel 2.6.9, only drives with a type 0
superblock can be autodetected and run at boot time.

The kernel parameter "raid=partitionable" (or "raid=part") means
that all auto-detected arrays are assembled as partitionable.
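A type 0xfd partition is simply one whose partition-type byte is 0xfd. As an illustration only, the following C sketch inspects the four primary entries of a classic MBR partition table (entries start at byte 446, each is 16 bytes, the type byte is at offset 4, and the sector ends in the 0x55 0xAA signature). It is not the kernel's autodetection code: it ignores extended/logical partitions and other partition table formats, and `mbr_raid_autodetect_mask` is a hypothetical name.

```c
#include <stdint.h>

#define MBR_SECTOR_SIZE      512
#define MBR_TABLE_OFFSET     446  /* four 16-byte primary entries start here */
#define MBR_ENTRY_SIZE       16
#define MBR_TYPE_OFFSET      4    /* partition type byte within an entry */
#define PART_TYPE_RAID_AUTO  0xfd /* "Linux raid autodetect" */

/*
 * Return a bitmask with bit i set when primary partition i+1 has type 0xfd,
 * or -1 when the sector does not end in the 0x55 0xAA boot signature.
 */
static int mbr_raid_autodetect_mask(const uint8_t sector[MBR_SECTOR_SIZE])
{
	int mask = 0;

	if (sector[510] != 0x55 || sector[511] != 0xAA)
		return -1;

	for (int i = 0; i < 4; i++) {
		const uint8_t *entry = sector + MBR_TABLE_OFFSET + i * MBR_ENTRY_SIZE;

		if (entry[MBR_TYPE_OFFSET] == PART_TYPE_RAID_AUTO)
			mask |= 1 << i;
	}
	return mask;
}
```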

Boot time assembly of degraded/dirty arrays
-------------------------------------------

If a raid5 or raid6 array is both dirty and degraded, it could have
undetectable data corruption. This is because the fact that it is
'dirty' means that the parity cannot be trusted, and the fact that it
is degraded means that some datablocks are missing and cannot reliably
be reconstructed (because the parity they would be rebuilt from cannot
be trusted).
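The reasoning above can be made concrete with single XOR parity (the raid5 case; raid6 adds a second syndrome, which this sketch omits). The C functions below are hypothetical helpers, not md code: one computes parity across the data blocks of a stripe, the other rebuilds a missing block from the survivors and the parity.

```c
#include <stddef.h>
#include <stdint.h>

/* parity[b] = data[0][b] ^ data[1][b] ^ ... for every byte b of the stripe. */
static void xor_parity(const uint8_t *const data[], size_t ndata,
		       size_t len, uint8_t *parity)
{
	for (size_t b = 0; b < len; b++) {
		uint8_t p = 0;

		for (size_t d = 0; d < ndata; d++)
			p ^= data[d][b];
		parity[b] = p;
	}
}

/*
 * Rebuild data[missing] from the parity and the other data blocks.
 * The result is only correct if the parity matches the current data.
 */
static void rebuild_missing(const uint8_t *const data[], size_t ndata,
			    size_t missing, const uint8_t *parity,
			    size_t len, uint8_t *out)
{
	for (size_t b = 0; b < len; b++) {
		uint8_t v = parity[b];

		for (size_t d = 0; d < ndata; d++)
			if (d != missing)
				v ^= data[d][b];
		out[b] = v;
	}
}
```

For example, with data bytes 0x0F, 0xF0 and 0x33 the parity is 0xCC, and rebuilding the middle block gives back 0xF0. If the first block is then changed to 0x00 without the parity being updated (the array is dirty), the same rebuild returns 0xFF: nothing flags the error, which is why md refuses such arrays by default.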

For this reason, md will normally refuse to start such an array. This
requires the sysadmin to take action to explicitly start the array
despite possible corruption. This is normally done with
   mdadm --assemble --force ....

This option is not really available if the array has the root
filesystem on it. To support booting from such an array, md supports
a module parameter "start_dirty_degraded" which, when set to 1,
bypasses the checks and allows dirty degraded arrays to be started.

So, to boot with a root filesystem of a dirty degraded raid[56], use

   md-mod.start_dirty_degraded=1


Superblock formats
------------------

The md driver can support a variety of different superblock formats.
Currently, it supports superblock formats "0.90.0" and the "md-1" format
introduced in the 2.5 development series.

The kernel will autodetect which format superblock is being used.

Superblock format '0' is treated differently to others for legacy
reasons - it is the original superblock format.


General Rules - apply for all superblock formats
------------------------------------------------

An array is 'created' by writing appropriate superblocks to all
devices.

It is 'assembled' by associating each of these devices with a
particular md virtual device. Once it is completely assembled, it can
be accessed.

An array should be created by a user-space tool. This will write
superblocks to all devices. It will usually mark the array as
'unclean', or with some devices missing so that the kernel md driver
can create appropriate redundancy (copying in raid1, parity
calculation in raid4/5).

When an array is assembled, it is first initialized with the
SET_ARRAY_INFO ioctl. This contains, in particular, a major and minor
version number. The major version number selects which superblock
format is to be used. The minor number might be used to tune handling
of the format, such as suggesting where on each device to look for the
superblock.

Then each device is added using the ADD_NEW_DISK ioctl. This
provides, in particular, a major and minor number identifying the
device to add.

The array is started with the RUN_ARRAY ioctl.

Once started, new devices can be added. They should have an
appropriate superblock written to them, and then be passed in with
ADD_NEW_DISK.

Devices that have failed or are not yet active can be detached from an
array using HOT_REMOVE_DISK.


Specific Rules that apply to format-0 super block arrays, and
       arrays with no superblock (non-persistent).
-------------------------------------------------------------

An array can be 'created' by describing the array (level, chunksize
etc) in a SET_ARRAY_INFO ioctl. This must have major_version==0 and
raid_disks != 0.

Then uninitialized devices can be added with ADD_NEW_DISK. The
structure passed to ADD_NEW_DISK must specify the state of the device
and its role in the array.

Once started with RUN_ARRAY, uninitialized spares can be added with
HOT_ADD_DISK.
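The ioctl sequences described in the two sections above can be summarised as a small state machine. The C sketch below is a hypothetical model: `struct md_model`, `md_set_array_info`, `md_apply` and the `OP_*` values are invented for illustration and are not the real ioctl numbers from `<linux/raid/md_u.h>`. It encodes only the ordering stated in the text plus one labelled assumption (RUN_ARRAY needs at least one device); the real driver performs many more checks.

```c
#include <stdbool.h>

/* Hypothetical operations; NOT the real ioctl numbers. */
enum md_op { OP_ADD_NEW_DISK, OP_RUN_ARRAY, OP_HOT_ADD_DISK, OP_HOT_REMOVE_DISK };
enum md_state { ST_EMPTY, ST_INFO_SET, ST_RUNNING };

struct md_model {
	enum md_state state;
	int major_version;  /* selects the superblock format */
	int raid_disks;
	int ndisks;         /* devices attached so far */
};

/* SET_ARRAY_INFO: always the first step. */
static bool md_set_array_info(struct md_model *m, int major_version, int raid_disks)
{
	if (m->state != ST_EMPTY)
		return false;
	m->major_version = major_version;
	m->raid_disks = raid_disks;
	m->state = ST_INFO_SET;
	return true;
}

/* Format-0 "create" rather than "assemble": major_version==0 and raid_disks != 0. */
static bool md_is_format0_create(const struct md_model *m)
{
	return m->state != ST_EMPTY && m->major_version == 0 && m->raid_disks != 0;
}

static bool md_apply(struct md_model *m, enum md_op op)
{
	switch (op) {
	case OP_ADD_NEW_DISK:
		/* before RUN_ARRAY: assembly; after: a device with a superblock written */
		if (m->state == ST_EMPTY)
			return false;
		m->ndisks++;
		return true;
	case OP_RUN_ARRAY:
		/* requiring at least one device is this model's assumption */
		if (m->state != ST_INFO_SET || m->ndisks == 0)
			return false;
		m->state = ST_RUNNING;
		return true;
	case OP_HOT_ADD_DISK:
		/* uninitialized spares, documented here only for format-0 arrays */
		if (m->state != ST_RUNNING || m->major_version != 0)
			return false;
		m->ndisks++;
		return true;
	case OP_HOT_REMOVE_DISK:
		/* the model does not track failed/inactive status; it only counts */
		if (m->ndisks == 0)
			return false;
		m->ndisks--;
		return true;
	}
	return false;
}
```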



MD devices in sysfs
-------------------
md devices appear in sysfs (/sys) as regular block devices,
e.g.
   /sys/block/md0

Each 'md' device will contain a subdirectory called 'md' which
contains further md-specific information about the device.

All md devices contain:
  level
     a text file indicating the 'raid level'. This may be a standard
     numerical level prefixed by "RAID-" - e.g. "RAID-5", or some
     other name such as "linear" or "multipath".
     If no raid level has been set yet (array is still being
     assembled), this file will be empty.

  raid_disks
     a text file with a simple number indicating the number of devices
     in a fully functional array. If this is not yet known, the file
     will be empty. If an array is being resized (not currently
     possible) this will contain the larger of the old and new sizes.

+  chunk_size
+     This is the size in bytes for 'chunks' and is only relevant to
+     raid levels that involve striping (0,4,5,6,10). The address space
+     of the array is conceptually divided into chunks and consecutive
+     chunks are striped onto neighbouring devices.
+     The size should be at least PAGE_SIZE (typically 4k) and should be
+     a power of 2. This can only be set while assembling an array.
+
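The two chunk_size rules, and what "consecutive chunks are striped onto neighbouring devices" means for a given address, can be sketched in C. Both functions are hypothetical helpers, and the mapping assumes a plain raid0-style layout over equal-sized devices; parity rotation (raid4/5/6) and raid10 layouts are deliberately left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Documented rule for chunk_size: at least PAGE_SIZE and a power of two. */
static bool chunk_size_ok(unsigned long bytes, unsigned long page_size)
{
	return bytes >= page_size && bytes != 0 && (bytes & (bytes - 1)) == 0;
}

struct stripe_loc {
	unsigned int dev;     /* index of the component device */
	uint64_t dev_offset;  /* byte offset within that device */
};

/*
 * Simplified striping: chunk k of the array goes to device k % ndevs.
 * Assumes equal-sized devices and no parity or mirroring.
 */
static struct stripe_loc map_array_offset(uint64_t offset, unsigned long chunk,
					  unsigned int ndevs)
{
	uint64_t chunk_no = offset / chunk;
	struct stripe_loc loc;

	loc.dev = (unsigned int)(chunk_no % ndevs);
	loc.dev_offset = (chunk_no / ndevs) * chunk + offset % chunk;
	return loc;
}
```

For example, with a 64k chunk over three devices, array byte 65546 falls in chunk 1, so it lands on device 1 at offset 10. A tool would check `chunk_size_ok()` before writing the byte count to `/sys/block/md0/md/chunk_size`, and only while the array is being assembled.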
As component devices are added to an md array, they appear in the 'md'
directory as new directories named
      dev-XXX
where XXX is a name that the kernel knows for the device, e.g. hdb1.
Each directory contains:

      block
        a symlink to the block device in /sys/block, e.g.
             /sys/block/md0/md/dev-hdb1/block -> ../../../../block/hdb/hdb1

      super
        A file containing an image of the superblock read from, or
        written to, that device.

      state
        A file recording the current state of the device in the array
        which can be a comma separated list of
              faulty   - device has been kicked from active use due to
                         a detected fault
              in_sync  - device is a fully in-sync member of the array
              spare    - device is working, but not a full member.
                         This includes spares that are in the process
                         of being recovered to
        This list may grow in future.
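Because `state` is a comma separated list that may grow, a reader should tolerate words it does not recognise. A minimal C sketch, with hypothetical flag names and a hypothetical function name, that maps the documented words to bits and records anything else as unknown:

```c
#include <string.h>

enum {
	RDEV_FAULTY  = 1 << 0,
	RDEV_IN_SYNC = 1 << 1,
	RDEV_SPARE   = 1 << 2,
	RDEV_UNKNOWN = 1 << 3,  /* a word this sketch does not recognise */
};

/* Parse the contents of a dev-XXX/state file into flag bits. */
static unsigned int parse_rdev_state(const char *text)
{
	unsigned int flags = 0;
	const char *p = text;

	while (*p) {
		size_t len = strcspn(p, ",\n");  /* length of the current word */

		if (len == 6 && strncmp(p, "faulty", 6) == 0)
			flags |= RDEV_FAULTY;
		else if (len == 7 && strncmp(p, "in_sync", 7) == 0)
			flags |= RDEV_IN_SYNC;
		else if (len == 5 && strncmp(p, "spare", 5) == 0)
			flags |= RDEV_SPARE;
		else if (len > 0)
			flags |= RDEV_UNKNOWN;
		p += len;
		if (*p)  /* skip the ',' or '\n' separator */
			p++;
	}
	return flags;
}
```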


An active md device will also contain an entry for each active device
in the array. These are named

    rdNN

where 'NN' is the position in the array, starting from 0.
So for a 3 drive array there will be rd0, rd1, rd2.
These are symbolic links to the appropriate 'dev-XXX' entry.
Thus, for example,
       cat /sys/block/md*/md/rd*/state
will show 'in_sync' on every line.
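The `cat` example above can also be done programmatically. This C sketch is hypothetical (`rd_state_path` and `all_slots_in_sync` are invented names); it assumes the caller supplies the md directory, e.g. `/sys/block/md0/md`, and the number of slots (from `raid_disks`), and it treats a slot as healthy only when its state reads exactly `in_sync`.

```c
#include <stdio.h>
#include <string.h>

/* Build "<md_dir>/rd<slot>/state"; returns 0 on success, -1 if it does not fit. */
static int rd_state_path(char *buf, size_t size, const char *md_dir, int slot)
{
	int n = snprintf(buf, size, "%s/rd%d/state", md_dir, slot);

	return (n < 0 || (size_t)n >= size) ? -1 : 0;
}

/*
 * Returns 1 if every slot 0..raid_disks-1 reads exactly "in_sync",
 * 0 if some slot reports anything else, -1 if a state file cannot be read.
 */
static int all_slots_in_sync(const char *md_dir, int raid_disks)
{
	char path[256];
	char line[128];

	for (int slot = 0; slot < raid_disks; slot++) {
		FILE *f;

		if (rd_state_path(path, sizeof(path), md_dir, slot) != 0)
			return -1;
		f = fopen(path, "r");
		if (f == NULL)
			return -1;
		if (fgets(line, sizeof(line), f) == NULL) {
			fclose(f);
			return -1;
		}
		fclose(f);
		line[strcspn(line, "\n")] = '\0';  /* drop the trailing newline */
		if (strcmp(line, "in_sync") != 0)
			return 0;
	}
	return 1;
}
```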



Active md devices for levels that support data redundancy (1,4,5,6)
also have

   sync_action
     a text file that can be used to monitor and control the rebuild
     process. It contains one word which can be one of:
       resync        - redundancy is being recalculated after unclean
                       shutdown or creation
       recover       - a hot spare is being built to replace a
                       failed/missing device
       idle          - nothing is happening
       check         - A full check of redundancy was requested and is
                       happening. This reads all blocks and checks
                       them. A repair may also happen for some raid
                       levels.
       repair        - A full check and repair is happening. This is
                       similar to 'resync', but was requested by the
                       user, and the write-intent bitmap is NOT used to
                       optimise the process.

      This file is writable, and each of the strings that could be
      read are meaningful for writing.

       'idle' will stop an active resync/recovery etc. There is no
           guarantee that another resync/recovery will not be
           automatically started again, though some event will be
           needed to trigger this.
       'resync' or 'recover' can be used to restart the
           corresponding operation if it was stopped with 'idle'.
       'check' and 'repair' will start the appropriate process
           provided the current state is 'idle'.
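The write rules above can be expressed as a check a tool might perform before writing to `sync_action`. In this C sketch `sync_action_write_ok` and `write_attr` are hypothetical helpers; the check only mirrors the documented rules, and the kernel remains the authority on whether a request is accepted.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Mirrors the documented write rules for sync_action.
 * 'current' is the word read from the file (newline stripped).
 */
static bool sync_action_write_ok(const char *current, const char *request)
{
	if (strcmp(request, "idle") == 0)
		return true;	/* stops an active resync/recovery */
	if (strcmp(request, "resync") == 0 || strcmp(request, "recover") == 0)
		return true;	/* accepted: restarts an operation stopped with 'idle' */
	if (strcmp(request, "check") == 0 || strcmp(request, "repair") == 0)
		return strcmp(current, "idle") == 0;	/* only from idle */
	return false;		/* not one of the documented words */
}

/* Write a single word, e.g. to /sys/block/md0/md/sync_action. */
static int write_attr(const char *path, const char *word)
{
	FILE *f = fopen(path, "w");
	int ok;

	if (f == NULL)
		return -1;
	ok = fprintf(f, "%s\n", word) >= 0;
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}
```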

   mismatch_cnt
      When performing 'check' and 'repair', and possibly when
      performing 'resync', md will count the number of errors that are
      found. The count in 'mismatch_cnt' is the number of sectors
      that were re-written, or (for 'check') would have been
      re-written. As most raid levels work in units of pages rather
      than sectors, this may be larger than the number of actual errors
      by a factor of the number of sectors in a page.
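To make the page/sector factor concrete: with 4096-byte pages and 512-byte sectors there are 8 sectors per page, so a single bad sector could be reported as a mismatch_cnt of 8. The hypothetical helpers below compute that factor and the smallest number of sectors that could account for a given count, assuming every mismatch is counted as a whole page.

```c
/* 512-byte sectors per page; e.g. 8 for a 4096-byte page. */
static unsigned long sectors_per_page(unsigned long page_size)
{
	return page_size / 512;
}

/*
 * If mismatches are counted in whole pages, mismatch_cnt over-states the
 * number of bad sectors by up to sectors_per_page(). This returns the
 * smallest number of sectors that could actually have differed.
 */
static unsigned long min_bad_sectors(unsigned long mismatch_cnt,
				     unsigned long page_size)
{
	unsigned long spp = sectors_per_page(page_size);

	if (spp == 0)	/* page_size < 512: no scaling possible */
		return mismatch_cnt;
	return (mismatch_cnt + spp - 1) / spp;
}
```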

Each active md device may also have attributes specific to the
personality module that manages it.
These are specific to the implementation of the module and could
change substantially if the implementation changes.

These currently include

  stripe_cache_size  (currently raid5 only)
      number of entries in the stripe cache. This is writable, but
      there are upper and lower limits (32768 and 16 respectively).
      Default is 128.
  stripe_cache_active (currently raid5 only)
      number of active entries in the stripe cache
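A tool that sets `stripe_cache_size` might validate user input against the documented limits first. In this C sketch the macro names and `parse_stripe_cache_size` are hypothetical, and treating 16 and 32768 as inclusive bounds is an assumption.

```c
#include <errno.h>
#include <stdlib.h>

/* Documented limits and default for raid5's stripe_cache_size;
 * inclusive bounds are this sketch's assumption. */
#define STRIPE_CACHE_MIN     16
#define STRIPE_CACHE_MAX     32768
#define STRIPE_CACHE_DEFAULT 128

/*
 * Parse a decimal string (optionally newline-terminated) and check it
 * against the limits before it is written to .../md/stripe_cache_size.
 * Returns the value, or -1 if it is not a number or is out of range.
 */
static long parse_stripe_cache_size(const char *s)
{
	char *end;
	long v;

	errno = 0;
	v = strtol(s, &end, 10);
	if (errno != 0 || end == s || (*end != '\0' && *end != '\n'))
		return -1;
	if (v < STRIPE_CACHE_MIN || v > STRIPE_CACHE_MAX)
		return -1;
	return v;
}
```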
262 270
1 /* 1 /*
2 md.c : Multiple Devices driver for Linux 2 md.c : Multiple Devices driver for Linux
3 Copyright (C) 1998, 1999, 2000 Ingo Molnar 3 Copyright (C) 1998, 1999, 2000 Ingo Molnar
4 4
5 completely rewritten, based on the MD driver code from Marc Zyngier 5 completely rewritten, based on the MD driver code from Marc Zyngier
6 6
7 Changes: 7 Changes:
8 8
9 - RAID-1/RAID-5 extensions by Miguel de Icaza, Gadi Oxman, Ingo Molnar 9 - RAID-1/RAID-5 extensions by Miguel de Icaza, Gadi Oxman, Ingo Molnar
10 - RAID-6 extensions by H. Peter Anvin <hpa@zytor.com> 10 - RAID-6 extensions by H. Peter Anvin <hpa@zytor.com>
11 - boot support for linear and striped mode by Harald Hoyer <HarryH@Royal.Net> 11 - boot support for linear and striped mode by Harald Hoyer <HarryH@Royal.Net>
12 - kerneld support by Boris Tobotras <boris@xtalk.msk.su> 12 - kerneld support by Boris Tobotras <boris@xtalk.msk.su>
13 - kmod support by: Cyrus Durgin 13 - kmod support by: Cyrus Durgin
14 - RAID0 bugfixes: Mark Anthony Lisher <markal@iname.com> 14 - RAID0 bugfixes: Mark Anthony Lisher <markal@iname.com>
15 - Devfs support by Richard Gooch <rgooch@atnf.csiro.au> 15 - Devfs support by Richard Gooch <rgooch@atnf.csiro.au>
16 16
17 - lots of fixes and improvements to the RAID1/RAID5 and generic 17 - lots of fixes and improvements to the RAID1/RAID5 and generic
18 RAID code (such as request based resynchronization): 18 RAID code (such as request based resynchronization):
19 19
20 Neil Brown <neilb@cse.unsw.edu.au>. 20 Neil Brown <neilb@cse.unsw.edu.au>.
21 21
22 - persistent bitmap code 22 - persistent bitmap code
23 Copyright (C) 2003-2004, Paul Clements, SteelEye Technology, Inc. 23 Copyright (C) 2003-2004, Paul Clements, SteelEye Technology, Inc.
24 24
25 This program is free software; you can redistribute it and/or modify 25 This program is free software; you can redistribute it and/or modify
26 it under the terms of the GNU General Public License as published by 26 it under the terms of the GNU General Public License as published by
27 the Free Software Foundation; either version 2, or (at your option) 27 the Free Software Foundation; either version 2, or (at your option)
28 any later version. 28 any later version.
29 29
30 You should have received a copy of the GNU General Public License 30 You should have received a copy of the GNU General Public License
31 (for example /usr/src/linux/COPYING); if not, write to the Free 31 (for example /usr/src/linux/COPYING); if not, write to the Free
32 Software Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. 32 Software Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
33 */ 33 */
34 34
#include <linux/module.h>
#include <linux/config.h>
#include <linux/kthread.h>
#include <linux/linkage.h>
#include <linux/raid/md.h>
#include <linux/raid/bitmap.h>
#include <linux/sysctl.h>
#include <linux/devfs_fs_kernel.h>
#include <linux/buffer_head.h> /* for invalidate_bdev */
#include <linux/suspend.h>
#include <linux/poll.h>

#include <linux/init.h>

#include <linux/file.h>

#ifdef CONFIG_KMOD
#include <linux/kmod.h>
#endif

#include <asm/unaligned.h>

#define MAJOR_NR MD_MAJOR
#define MD_DRIVER

/* 63 partitions with the alternate major number (mdp) */
#define MdpMinorShift 6

#define DEBUG 0
#define dprintk(x...) ((void)(DEBUG && printk(x)))


#ifndef MODULE
static void autostart_arrays (int part);
#endif

static LIST_HEAD(pers_list);
static DEFINE_SPINLOCK(pers_lock);

/*
 * Current RAID-1,4,5 parallel reconstruction 'guaranteed speed limit'
 * is 1000 KB/sec, so the extra system load does not show up that much.
 * Increase it if you want to have more _guaranteed_ speed. Note that
 * the RAID driver will use the maximum available bandwidth if the IO
 * subsystem is idle. There is also an 'absolute maximum' reconstruction
 * speed limit - in case reconstruction slows down your system despite
 * idle IO detection.
 *
 * you can change it via /proc/sys/dev/raid/speed_limit_min and _max.
 */

static int sysctl_speed_limit_min = 1000;
static int sysctl_speed_limit_max = 200000;

static struct ctl_table_header *raid_table_header;

static ctl_table raid_table[] = {
	{
		.ctl_name	= DEV_RAID_SPEED_LIMIT_MIN,
		.procname	= "speed_limit_min",
		.data		= &sysctl_speed_limit_min,
		.maxlen		= sizeof(int),
		.mode		= 0644,
		.proc_handler	= &proc_dointvec,
	},
	{
		.ctl_name	= DEV_RAID_SPEED_LIMIT_MAX,
		.procname	= "speed_limit_max",
		.data		= &sysctl_speed_limit_max,
		.maxlen		= sizeof(int),
		.mode		= 0644,
		.proc_handler	= &proc_dointvec,
	},
	{ .ctl_name = 0 }
};

static ctl_table raid_dir_table[] = {
	{
		.ctl_name	= DEV_RAID,
		.procname	= "raid",
		.maxlen		= 0,
		.mode		= 0555,
		.child		= raid_table,
	},
	{ .ctl_name = 0 }
};

static ctl_table raid_root_table[] = {
	{
		.ctl_name	= CTL_DEV,
		.procname	= "dev",
		.maxlen		= 0,
		.mode		= 0555,
		.child		= raid_dir_table,
	},
	{ .ctl_name = 0 }
};

static struct block_device_operations md_fops;

static int start_readonly;

/*
 * We have a system wide 'event count' that is incremented
 * on any 'interesting' event, and readers of /proc/mdstat
 * can use 'poll' or 'select' to find out when the event
 * count increases.
 *
 * Events are:
 *  start array, stop array, error, add device, remove device,
 *  start build, activate spare
 */
static DECLARE_WAIT_QUEUE_HEAD(md_event_waiters);
static atomic_t md_event_count;
static void md_new_event(mddev_t *mddev)
{
	atomic_inc(&md_event_count);
	wake_up(&md_event_waiters);
}

/*
 * Allows iteration over all existing md arrays.
 * all_mddevs_lock protects this list.
 */
static LIST_HEAD(all_mddevs);
static DEFINE_SPINLOCK(all_mddevs_lock);


/*
 * iterates through all used mddevs in the system.
 * We take care to grab the all_mddevs_lock whenever navigating
 * the list, and to always hold a refcount when unlocked.
 * Any code which breaks out of this loop while owning
 * a reference to the current mddev must mddev_put it.
 */
#define ITERATE_MDDEV(mddev,tmp)					\
									\
	for (({ spin_lock(&all_mddevs_lock);				\
		tmp = all_mddevs.next;					\
		mddev = NULL;});					\
	     ({ if (tmp != &all_mddevs)					\
			mddev_get(list_entry(tmp, mddev_t, all_mddevs));\
		spin_unlock(&all_mddevs_lock);				\
		if (mddev) mddev_put(mddev);				\
		mddev = list_entry(tmp, mddev_t, all_mddevs);		\
		tmp != &all_mddevs;});					\
	     ({ spin_lock(&all_mddevs_lock);				\
		tmp = tmp->next;})					\
		)
184 184
185 185
186 static int md_fail_request (request_queue_t *q, struct bio *bio) 186 static int md_fail_request (request_queue_t *q, struct bio *bio)
187 { 187 {
188 bio_io_error(bio, bio->bi_size); 188 bio_io_error(bio, bio->bi_size);
189 return 0; 189 return 0;
190 } 190 }
191 191
192 static inline mddev_t *mddev_get(mddev_t *mddev) 192 static inline mddev_t *mddev_get(mddev_t *mddev)
193 { 193 {
194 atomic_inc(&mddev->active); 194 atomic_inc(&mddev->active);
195 return mddev; 195 return mddev;
196 } 196 }
197 197
198 static void mddev_put(mddev_t *mddev) 198 static void mddev_put(mddev_t *mddev)
199 { 199 {
200 if (!atomic_dec_and_lock(&mddev->active, &all_mddevs_lock)) 200 if (!atomic_dec_and_lock(&mddev->active, &all_mddevs_lock))
201 return; 201 return;
202 if (!mddev->raid_disks && list_empty(&mddev->disks)) { 202 if (!mddev->raid_disks && list_empty(&mddev->disks)) {
203 list_del(&mddev->all_mddevs); 203 list_del(&mddev->all_mddevs);
204 blk_put_queue(mddev->queue); 204 blk_put_queue(mddev->queue);
205 kobject_unregister(&mddev->kobj); 205 kobject_unregister(&mddev->kobj);
206 } 206 }
207 spin_unlock(&all_mddevs_lock); 207 spin_unlock(&all_mddevs_lock);
208 } 208 }
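mddev_put() relies on atomic_dec_and_lock(). That call decrements the counter and returns true, with the lock held, only when the count reaches zero. As a result, the "is this mddev unused?" check and the list_del() both run under all_mddevs_lock.

The sketch below reproduces those semantics with C11 atomics and a pthread mutex. It is loosely modelled on the kernel's generic fallback (fast path without the lock, slow path with it). `dec_and_lock()` is a hypothetical name.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Decrement *count. If it reaches zero, return true with *lock held;
 * otherwise return false with *lock not held. */
static bool dec_and_lock(atomic_int *count, pthread_mutex_t *lock)
{
    int old = atomic_load(count);

    /* Fast path: decrement locklessly as long as we cannot hit zero. */
    while (old > 1) {
        if (atomic_compare_exchange_weak(count, &old, old - 1))
            return false;
        /* on failure, old was reloaded; loop re-checks it */
    }

    /* Slow path: the count may drop to zero, so take the lock first. */
    pthread_mutex_lock(lock);
    if (atomic_fetch_sub(count, 1) == 1)
        return true;           /* caller tears down, then unlocks */
    pthread_mutex_unlock(lock);
    return false;
}
```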
209 209
210 static mddev_t * mddev_find(dev_t unit) 210 static mddev_t * mddev_find(dev_t unit)
211 { 211 {
212 mddev_t *mddev, *new = NULL; 212 mddev_t *mddev, *new = NULL;
213 213
214 retry: 214 retry:
215 spin_lock(&all_mddevs_lock); 215 spin_lock(&all_mddevs_lock);
216 list_for_each_entry(mddev, &all_mddevs, all_mddevs) 216 list_for_each_entry(mddev, &all_mddevs, all_mddevs)
217 if (mddev->unit == unit) { 217 if (mddev->unit == unit) {
218 mddev_get(mddev); 218 mddev_get(mddev);
219 spin_unlock(&all_mddevs_lock); 219 spin_unlock(&all_mddevs_lock);
220 kfree(new); 220 kfree(new);
221 return mddev; 221 return mddev;
222 } 222 }
223 223
224 if (new) { 224 if (new) {
225 list_add(&new->all_mddevs, &all_mddevs); 225 list_add(&new->all_mddevs, &all_mddevs);
226 spin_unlock(&all_mddevs_lock); 226 spin_unlock(&all_mddevs_lock);
227 return new; 227 return new;
228 } 228 }
229 spin_unlock(&all_mddevs_lock); 229 spin_unlock(&all_mddevs_lock);
230 230
231 new = kzalloc(sizeof(*new), GFP_KERNEL); 231 new = kzalloc(sizeof(*new), GFP_KERNEL);
232 if (!new) 232 if (!new)
233 return NULL; 233 return NULL;
234 234
235 new->unit = unit; 235 new->unit = unit;
236 if (MAJOR(unit) == MD_MAJOR) 236 if (MAJOR(unit) == MD_MAJOR)
237 new->md_minor = MINOR(unit); 237 new->md_minor = MINOR(unit);
238 else 238 else
239 new->md_minor = MINOR(unit) >> MdpMinorShift; 239 new->md_minor = MINOR(unit) >> MdpMinorShift;
240 240
241 init_MUTEX(&new->reconfig_sem); 241 init_MUTEX(&new->reconfig_sem);
242 INIT_LIST_HEAD(&new->disks); 242 INIT_LIST_HEAD(&new->disks);
243 INIT_LIST_HEAD(&new->all_mddevs); 243 INIT_LIST_HEAD(&new->all_mddevs);
244 init_timer(&new->safemode_timer); 244 init_timer(&new->safemode_timer);
245 atomic_set(&new->active, 1); 245 atomic_set(&new->active, 1);
246 spin_lock_init(&new->write_lock); 246 spin_lock_init(&new->write_lock);
247 init_waitqueue_head(&new->sb_wait); 247 init_waitqueue_head(&new->sb_wait);
248 248
249 new->queue = blk_alloc_queue(GFP_KERNEL); 249 new->queue = blk_alloc_queue(GFP_KERNEL);
250 if (!new->queue) { 250 if (!new->queue) {
251 kfree(new); 251 kfree(new);
252 return NULL; 252 return NULL;
253 } 253 }
254 254
255 blk_queue_make_request(new->queue, md_fail_request); 255 blk_queue_make_request(new->queue, md_fail_request);
256 256
257 goto retry; 257 goto retry;
258 } 258 }
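mddev_find() cannot call kzalloc(GFP_KERNEL) while holding a spinlock, because the allocation may sleep. It therefore works in three steps:

1. Scan the list under the lock.
2. Drop the lock and allocate.
3. Rescan.

If another caller created the same unit in the meantime, the spare allocation is freed.

The sketch below shows that allocate-then-retry pattern. `struct entry`, `table` and `entry_find()` are hypothetical stand-ins for mddev_t, all_mddevs and mddev_find().

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

struct entry {                  /* hypothetical stand-in for mddev_t */
    struct entry *next;
    unsigned unit;
    int refs;
};

static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;
static struct entry *table;     /* hypothetical stand-in for all_mddevs */

/* Look up unit, creating it if needed; returns it with a reference held. */
static struct entry *entry_find(unsigned unit)
{
    struct entry *e, *new = NULL;

retry:
    pthread_mutex_lock(&table_lock);
    for (e = table; e; e = e->next)
        if (e->unit == unit) {
            e->refs++;
            pthread_mutex_unlock(&table_lock);
            free(new);          /* lost the race (or never allocated) */
            return e;
        }
    if (new) {                  /* second pass: publish our allocation */
        new->next = table;
        table = new;
        pthread_mutex_unlock(&table_lock);
        return new;
    }
    pthread_mutex_unlock(&table_lock);

    /* Allocate with the lock dropped, then rescan in case of a race. */
    new = calloc(1, sizeof(*new));
    if (!new)
        return NULL;
    new->unit = unit;
    new->refs = 1;              /* the caller's reference */
    goto retry;
}
```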
259 259
260 static inline int mddev_lock(mddev_t * mddev) 260 static inline int mddev_lock(mddev_t * mddev)
261 { 261 {
262 return down_interruptible(&mddev->reconfig_sem); 262 return down_interruptible(&mddev->reconfig_sem);
263 } 263 }
264 264
265 static inline void mddev_lock_uninterruptible(mddev_t * mddev) 265 static inline void mddev_lock_uninterruptible(mddev_t * mddev)
266 { 266 {
267 down(&mddev->reconfig_sem); 267 down(&mddev->reconfig_sem);
268 } 268 }
269 269
270 static inline int mddev_trylock(mddev_t * mddev) 270 static inline int mddev_trylock(mddev_t * mddev)
271 { 271 {
272 return down_trylock(&mddev->reconfig_sem); 272 return down_trylock(&mddev->reconfig_sem);
273 } 273 }
274 274
275 static inline void mddev_unlock(mddev_t * mddev) 275 static inline void mddev_unlock(mddev_t * mddev)
276 { 276 {
277 up(&mddev->reconfig_sem); 277 up(&mddev->reconfig_sem);
278 278
279 md_wakeup_thread(mddev->thread); 279 md_wakeup_thread(mddev->thread);
280 } 280 }
281 281
282 static mdk_rdev_t * find_rdev_nr(mddev_t *mddev, int nr) 282 static mdk_rdev_t * find_rdev_nr(mddev_t *mddev, int nr)
283 { 283 {
284 mdk_rdev_t * rdev; 284 mdk_rdev_t * rdev;
285 struct list_head *tmp; 285 struct list_head *tmp;
286 286
287 ITERATE_RDEV(mddev,rdev,tmp) { 287 ITERATE_RDEV(mddev,rdev,tmp) {
288 if (rdev->desc_nr == nr) 288 if (rdev->desc_nr == nr)
289 return rdev; 289 return rdev;
290 } 290 }
291 return NULL; 291 return NULL;
292 } 292 }
293 293
294 static mdk_rdev_t * find_rdev(mddev_t * mddev, dev_t dev) 294 static mdk_rdev_t * find_rdev(mddev_t * mddev, dev_t dev)
295 { 295 {
296 struct list_head *tmp; 296 struct list_head *tmp;
297 mdk_rdev_t *rdev; 297 mdk_rdev_t *rdev;
298 298
299 ITERATE_RDEV(mddev,rdev,tmp) { 299 ITERATE_RDEV(mddev,rdev,tmp) {
300 if (rdev->bdev->bd_dev == dev) 300 if (rdev->bdev->bd_dev == dev)
301 return rdev; 301 return rdev;
302 } 302 }
303 return NULL; 303 return NULL;
304 } 304 }
305 305
306 static struct mdk_personality *find_pers(int level) 306 static struct mdk_personality *find_pers(int level)
307 { 307 {
308 struct mdk_personality *pers; 308 struct mdk_personality *pers;
309 list_for_each_entry(pers, &pers_list, list) 309 list_for_each_entry(pers, &pers_list, list)
310 if (pers->level == level) 310 if (pers->level == level)
311 return pers; 311 return pers;
312 return NULL; 312 return NULL;
313 } 313 }
314 314
315 static inline sector_t calc_dev_sboffset(struct block_device *bdev) 315 static inline sector_t calc_dev_sboffset(struct block_device *bdev)
316 { 316 {
317 sector_t size = bdev->bd_inode->i_size >> BLOCK_SIZE_BITS; 317 sector_t size = bdev->bd_inode->i_size >> BLOCK_SIZE_BITS;
318 return MD_NEW_SIZE_BLOCKS(size); 318 return MD_NEW_SIZE_BLOCKS(size);
319 } 319 }
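calc_dev_sboffset() converts the device size to 1 KiB blocks (BLOCK_SIZE_BITS is 10). It then applies MD_NEW_SIZE_BLOCKS(), which rounds down to a 64 KiB boundary and steps back a further 64 KiB. The result is that the 0.90 superblock lives in a reserved 64 KiB area near the end of the device, starting on a 64 KiB boundary. Later, read_disk_sb() shifts sb_offset left by one to turn 1 KiB blocks into 512-byte sectors.

The macro values below are our reading of md_p.h from this period and should be treated as an assumption. `sb_offset_kib()` and `sb_offset_sectors()` are hypothetical helpers.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to mirror md_p.h: 64 KiB reserved at the end of each member. */
#define RESERVED_BYTES   (64 * 1024)
#define RESERVED_BLOCKS  (RESERVED_BYTES / 1024)      /* in 1 KiB blocks */
#define NEW_SIZE_BLOCKS(x) \
    (((x) & ~(uint64_t)(RESERVED_BLOCKS - 1)) - RESERVED_BLOCKS)

/* Superblock offset, in 1 KiB blocks, for a device of the given size. */
static uint64_t sb_offset_kib(uint64_t bytes)
{
    uint64_t size = bytes >> 10;        /* BLOCK_SIZE_BITS == 10 */
    return NEW_SIZE_BLOCKS(size);
}

/* The same offset in 512-byte sectors, as read_disk_sb() passes it. */
static uint64_t sb_offset_sectors(uint64_t bytes)
{
    return sb_offset_kib(bytes) << 1;
}
```

For a 1,000,000,000-byte device this gives an offset of 976448 KiB, which is sector 1952896.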
320 320
321 static sector_t calc_dev_size(mdk_rdev_t *rdev, unsigned chunk_size) 321 static sector_t calc_dev_size(mdk_rdev_t *rdev, unsigned chunk_size)
322 { 322 {
323 sector_t size; 323 sector_t size;
324 324
325 size = rdev->sb_offset; 325 size = rdev->sb_offset;
326 326
327 if (chunk_size) 327 if (chunk_size)
328 size &= ~((sector_t)chunk_size/1024 - 1); 328 size &= ~((sector_t)chunk_size/1024 - 1);
329 return size; 329 return size;
330 } 330 }
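In calc_dev_size(), sb_offset is in 1 KiB blocks while chunk_size is in bytes. So chunk_size/1024 is the chunk size in KiB, and `size &= ~(k - 1)` rounds the usable size down to a whole number of chunks. The mask only gives the right answer when the chunk size is a power of two of at least 1 KiB.

The sketch below makes that precondition explicit. `round_to_chunks()` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Round a size in 1 KiB blocks down to a whole number of chunks.
 * chunk_bytes must be 0 (no rounding) or a power of two >= 1024. */
static uint64_t round_to_chunks(uint64_t size_kib, unsigned chunk_bytes)
{
    if (chunk_bytes) {
        uint64_t chunk_kib = (uint64_t)chunk_bytes / 1024;
        /* the mask trick is only valid for powers of two */
        assert(chunk_kib != 0 && (chunk_kib & (chunk_kib - 1)) == 0);
        size_kib &= ~(chunk_kib - 1);
    }
    return size_kib;
}
```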
331 331
332 static int alloc_disk_sb(mdk_rdev_t * rdev) 332 static int alloc_disk_sb(mdk_rdev_t * rdev)
333 { 333 {
334 if (rdev->sb_page) 334 if (rdev->sb_page)
335 MD_BUG(); 335 MD_BUG();
336 336
337 rdev->sb_page = alloc_page(GFP_KERNEL); 337 rdev->sb_page = alloc_page(GFP_KERNEL);
338 if (!rdev->sb_page) { 338 if (!rdev->sb_page) {
339 printk(KERN_ALERT "md: out of memory.\n"); 339 printk(KERN_ALERT "md: out of memory.\n");
340 return -EINVAL; 340 return -EINVAL;
341 } 341 }
342 342
343 return 0; 343 return 0;
344 } 344 }
345 345
346 static void free_disk_sb(mdk_rdev_t * rdev) 346 static void free_disk_sb(mdk_rdev_t * rdev)
347 { 347 {
348 if (rdev->sb_page) { 348 if (rdev->sb_page) {
349 put_page(rdev->sb_page); 349 put_page(rdev->sb_page);
350 rdev->sb_loaded = 0; 350 rdev->sb_loaded = 0;
351 rdev->sb_page = NULL; 351 rdev->sb_page = NULL;
352 rdev->sb_offset = 0; 352 rdev->sb_offset = 0;
353 rdev->size = 0; 353 rdev->size = 0;
354 } 354 }
355 } 355 }
356 356
357 357
358 static int super_written(struct bio *bio, unsigned int bytes_done, int error) 358 static int super_written(struct bio *bio, unsigned int bytes_done, int error)
359 { 359 {
360 mdk_rdev_t *rdev = bio->bi_private; 360 mdk_rdev_t *rdev = bio->bi_private;
361 mddev_t *mddev = rdev->mddev; 361 mddev_t *mddev = rdev->mddev;
362 if (bio->bi_size) 362 if (bio->bi_size)
363 return 1; 363 return 1;
364 364
365 if (error || !test_bit(BIO_UPTODATE, &bio->bi_flags)) 365 if (error || !test_bit(BIO_UPTODATE, &bio->bi_flags))
366 md_error(mddev, rdev); 366 md_error(mddev, rdev);
367 367
368 if (atomic_dec_and_test(&mddev->pending_writes)) 368 if (atomic_dec_and_test(&mddev->pending_writes))
369 wake_up(&mddev->sb_wait); 369 wake_up(&mddev->sb_wait);
370 bio_put(bio); 370 bio_put(bio);
371 return 0; 371 return 0;
372 } 372 }
373 373
374 static int super_written_barrier(struct bio *bio, unsigned int bytes_done, int error) 374 static int super_written_barrier(struct bio *bio, unsigned int bytes_done, int error)
375 { 375 {
376 struct bio *bio2 = bio->bi_private; 376 struct bio *bio2 = bio->bi_private;
377 mdk_rdev_t *rdev = bio2->bi_private; 377 mdk_rdev_t *rdev = bio2->bi_private;
378 mddev_t *mddev = rdev->mddev; 378 mddev_t *mddev = rdev->mddev;
379 if (bio->bi_size) 379 if (bio->bi_size)
380 return 1; 380 return 1;
381 381
382 if (!test_bit(BIO_UPTODATE, &bio->bi_flags) && 382 if (!test_bit(BIO_UPTODATE, &bio->bi_flags) &&
383 error == -EOPNOTSUPP) { 383 error == -EOPNOTSUPP) {
384 unsigned long flags; 384 unsigned long flags;
385 /* barriers don't appear to be supported :-( */ 385 /* barriers don't appear to be supported :-( */
386 set_bit(BarriersNotsupp, &rdev->flags); 386 set_bit(BarriersNotsupp, &rdev->flags);
387 mddev->barriers_work = 0; 387 mddev->barriers_work = 0;
388 spin_lock_irqsave(&mddev->write_lock, flags); 388 spin_lock_irqsave(&mddev->write_lock, flags);
389 bio2->bi_next = mddev->biolist; 389 bio2->bi_next = mddev->biolist;
390 mddev->biolist = bio2; 390 mddev->biolist = bio2;
391 spin_unlock_irqrestore(&mddev->write_lock, flags); 391 spin_unlock_irqrestore(&mddev->write_lock, flags);
392 wake_up(&mddev->sb_wait); 392 wake_up(&mddev->sb_wait);
393 bio_put(bio); 393 bio_put(bio);
394 return 0; 394 return 0;
395 } 395 }
396 bio_put(bio2); 396 bio_put(bio2);
397 bio->bi_private = rdev; 397 bio->bi_private = rdev;
398 return super_written(bio, bytes_done, error); 398 return super_written(bio, bytes_done, error);
399 } 399 }
400 400
401 void md_super_write(mddev_t *mddev, mdk_rdev_t *rdev, 401 void md_super_write(mddev_t *mddev, mdk_rdev_t *rdev,
402 sector_t sector, int size, struct page *page) 402 sector_t sector, int size, struct page *page)
403 { 403 {
404 /* write first size bytes of page to sector of rdev 404 /* write first size bytes of page to sector of rdev
405 * Increment mddev->pending_writes before returning 405 * Increment mddev->pending_writes before returning
406 * and decrement it on completion, waking up sb_wait 406 * and decrement it on completion, waking up sb_wait
407 * if zero is reached. 407 * if zero is reached.
408 * If an error occurred, call md_error 408 * If an error occurred, call md_error
409 * 409 *
410 * As we might need to resubmit the request if BIO_RW_BARRIER 410 * As we might need to resubmit the request if BIO_RW_BARRIER
411 * causes EOPNOTSUPP, we allocate a spare bio... 411 * causes EOPNOTSUPP, we allocate a spare bio...
412 */ 412 */
413 struct bio *bio = bio_alloc(GFP_NOIO, 1); 413 struct bio *bio = bio_alloc(GFP_NOIO, 1);
414 int rw = (1<<BIO_RW) | (1<<BIO_RW_SYNC); 414 int rw = (1<<BIO_RW) | (1<<BIO_RW_SYNC);
415 415
416 bio->bi_bdev = rdev->bdev; 416 bio->bi_bdev = rdev->bdev;
417 bio->bi_sector = sector; 417 bio->bi_sector = sector;
418 bio_add_page(bio, page, size, 0); 418 bio_add_page(bio, page, size, 0);
419 bio->bi_private = rdev; 419 bio->bi_private = rdev;
420 bio->bi_end_io = super_written; 420 bio->bi_end_io = super_written;
421 bio->bi_rw = rw; 421 bio->bi_rw = rw;
422 422
423 atomic_inc(&mddev->pending_writes); 423 atomic_inc(&mddev->pending_writes);
424 if (!test_bit(BarriersNotsupp, &rdev->flags)) { 424 if (!test_bit(BarriersNotsupp, &rdev->flags)) {
425 struct bio *rbio; 425 struct bio *rbio;
426 rw |= (1<<BIO_RW_BARRIER); 426 rw |= (1<<BIO_RW_BARRIER);
427 rbio = bio_clone(bio, GFP_NOIO); 427 rbio = bio_clone(bio, GFP_NOIO);
428 rbio->bi_private = bio; 428 rbio->bi_private = bio;
429 rbio->bi_end_io = super_written_barrier; 429 rbio->bi_end_io = super_written_barrier;
430 submit_bio(rw, rbio); 430 submit_bio(rw, rbio);
431 } else 431 } else
432 submit_bio(rw, bio); 432 submit_bio(rw, bio);
433 } 433 }
434 434
435 void md_super_wait(mddev_t *mddev) 435 void md_super_wait(mddev_t *mddev)
436 { 436 {
437 /* wait for all superblock writes that were scheduled to complete. 437 /* wait for all superblock writes that were scheduled to complete.
438 * if any had to be retried (due to BARRIER problems), retry them 438 * if any had to be retried (due to BARRIER problems), retry them
439 */ 439 */
440 DEFINE_WAIT(wq); 440 DEFINE_WAIT(wq);
441 for(;;) { 441 for(;;) {
442 prepare_to_wait(&mddev->sb_wait, &wq, TASK_UNINTERRUPTIBLE); 442 prepare_to_wait(&mddev->sb_wait, &wq, TASK_UNINTERRUPTIBLE);
443 if (atomic_read(&mddev->pending_writes)==0) 443 if (atomic_read(&mddev->pending_writes)==0)
444 break; 444 break;
445 while (mddev->biolist) { 445 while (mddev->biolist) {
446 struct bio *bio; 446 struct bio *bio;
447 spin_lock_irq(&mddev->write_lock); 447 spin_lock_irq(&mddev->write_lock);
448 bio = mddev->biolist; 448 bio = mddev->biolist;
449 mddev->biolist = bio->bi_next ; 449 mddev->biolist = bio->bi_next ;
450 bio->bi_next = NULL; 450 bio->bi_next = NULL;
451 spin_unlock_irq(&mddev->write_lock); 451 spin_unlock_irq(&mddev->write_lock);
452 submit_bio(bio->bi_rw, bio); 452 submit_bio(bio->bi_rw, bio);
453 } 453 }
454 schedule(); 454 schedule();
455 } 455 }
456 finish_wait(&mddev->sb_wait, &wq); 456 finish_wait(&mddev->sb_wait, &wq);
457 } 457 }
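md_super_write() first tries each superblock write as a barrier request, using a clone of the bio. If the device rejects the barrier with -EOPNOTSUPP, super_written_barrier() does three things:

- marks the device BarriersNotsupp;
- parks the original, non-barrier bio on mddev->biolist;
- leaves pending_writes raised.

md_super_wait() then resubmits parked bios until pending_writes reaches zero.

The single-threaded model below imitates that flow, with several simplifications. Every name in it is hypothetical, submission completes synchronously, and only one member device is tracked.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct req {
    bool barrier;
    struct req *next;              /* link on the retry list */
};

struct member {
    bool dev_supports_barriers;    /* property of the simulated disk */
    bool barriers_notsupp;         /* learned after the first -EOPNOTSUPP */
    int pending_writes;
    int writes_done;               /* writes that reached the "disk" */
    struct req *retry;             /* parked non-barrier writes */
};

/* Completion handler: park rejected barrier writes, finish the rest. */
static void complete_write(struct member *m, struct req *r, int error)
{
    if (r->barrier && error == -EOPNOTSUPP) {
        m->barriers_notsupp = true;
        r->barrier = false;        /* resubmit as a plain write */
        r->next = m->retry;
        m->retry = r;
        return;                    /* pending_writes stays raised */
    }
    m->writes_done++;
    m->pending_writes--;
}

/* Simulated device: completes immediately, rejecting unsupported barriers. */
static void submit(struct member *m, struct req *r)
{
    int error = (r->barrier && !m->dev_supports_barriers) ? -EOPNOTSUPP : 0;
    complete_write(m, r, error);
}

static void super_write(struct member *m, struct req *r)
{
    m->pending_writes++;
    r->barrier = !m->barriers_notsupp;   /* try a barrier until known bad */
    submit(m, r);
}

/* Like md_super_wait(): resubmit parked writes until none are pending.
 * In this synchronous model a resubmitted plain write always completes. */
static void super_wait(struct member *m)
{
    while (m->pending_writes > 0) {
        while (m->retry) {
            struct req *r = m->retry;
            m->retry = r->next;
            r->next = NULL;
            submit(m, r);
        }
    }
}
```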
458 458
459 static int bi_complete(struct bio *bio, unsigned int bytes_done, int error) 459 static int bi_complete(struct bio *bio, unsigned int bytes_done, int error)
460 { 460 {
461 if (bio->bi_size) 461 if (bio->bi_size)
462 return 1; 462 return 1;
463 463
464 complete((struct completion*)bio->bi_private); 464 complete((struct completion*)bio->bi_private);
465 return 0; 465 return 0;
466 } 466 }
467 467
468 int sync_page_io(struct block_device *bdev, sector_t sector, int size, 468 int sync_page_io(struct block_device *bdev, sector_t sector, int size,
469 struct page *page, int rw) 469 struct page *page, int rw)
470 { 470 {
471 struct bio *bio = bio_alloc(GFP_NOIO, 1); 471 struct bio *bio = bio_alloc(GFP_NOIO, 1);
472 struct completion event; 472 struct completion event;
473 int ret; 473 int ret;
474 474
475 rw |= (1 << BIO_RW_SYNC); 475 rw |= (1 << BIO_RW_SYNC);
476 476
477 bio->bi_bdev = bdev; 477 bio->bi_bdev = bdev;
478 bio->bi_sector = sector; 478 bio->bi_sector = sector;
479 bio_add_page(bio, page, size, 0); 479 bio_add_page(bio, page, size, 0);
480 init_completion(&event); 480 init_completion(&event);
481 bio->bi_private = &event; 481 bio->bi_private = &event;
482 bio->bi_end_io = bi_complete; 482 bio->bi_end_io = bi_complete;
483 submit_bio(rw, bio); 483 submit_bio(rw, bio);
484 wait_for_completion(&event); 484 wait_for_completion(&event);
485 485
486 ret = test_bit(BIO_UPTODATE, &bio->bi_flags); 486 ret = test_bit(BIO_UPTODATE, &bio->bi_flags);
487 bio_put(bio); 487 bio_put(bio);
488 return ret; 488 return ret;
489 } 489 }
490 EXPORT_SYMBOL_GPL(sync_page_io); 490 EXPORT_SYMBOL_GPL(sync_page_io);
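sync_page_io() turns asynchronous bio submission into a blocking call. bi_complete() signals a struct completion on the caller's stack, and the caller sleeps in wait_for_completion() before checking BIO_UPTODATE.

The user-space analogue below builds a minimal completion from a pthread mutex and condition variable, and uses a thread as the "device". The function names echo the kernel ones for readability, but the implementations are our own. `struct io`, `device_thread()` and `sync_copy()` are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <string.h>

/* Minimal user-space analogue of struct completion. */
struct completion {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    bool done;
};

static void init_completion(struct completion *c)
{
    pthread_mutex_init(&c->lock, NULL);
    pthread_cond_init(&c->cond, NULL);
    c->done = false;
}

static void complete(struct completion *c)
{
    pthread_mutex_lock(&c->lock);
    c->done = true;
    pthread_cond_signal(&c->cond);
    pthread_mutex_unlock(&c->lock);
}

static void wait_for_completion(struct completion *c)
{
    pthread_mutex_lock(&c->lock);
    while (!c->done)                  /* tolerate spurious wakeups */
        pthread_cond_wait(&c->cond, &c->lock);
    pthread_mutex_unlock(&c->lock);
}

/* Hypothetical async request, completed by device_thread(). */
struct io {
    char *dst;
    const char *src;
    size_t len;
    int uptodate;
    struct completion *event;
};

/* Plays the role of the driver plus bi_end_io = bi_complete. */
static void *device_thread(void *arg)
{
    struct io *io = arg;
    memcpy(io->dst, io->src, io->len);
    io->uptodate = 1;
    complete(io->event);
    return NULL;
}

/* Like sync_page_io(): submit, block until done, return 1 on success. */
static int sync_copy(char *dst, const char *src, size_t len)
{
    struct completion event;
    struct io io = { dst, src, len, 0, &event };
    pthread_t t;

    init_completion(&event);
    if (pthread_create(&t, NULL, device_thread, &io) != 0)
        return 0;
    wait_for_completion(&event);
    pthread_join(t, NULL);
    return io.uptodate;
}
```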
491 491
492 static int read_disk_sb(mdk_rdev_t * rdev, int size) 492 static int read_disk_sb(mdk_rdev_t * rdev, int size)
493 { 493 {
494 char b[BDEVNAME_SIZE]; 494 char b[BDEVNAME_SIZE];
495 if (!rdev->sb_page) { 495 if (!rdev->sb_page) {
496 MD_BUG(); 496 MD_BUG();
497 return -EINVAL; 497 return -EINVAL;
498 } 498 }
499 if (rdev->sb_loaded) 499 if (rdev->sb_loaded)
500 return 0; 500 return 0;
501 501
502 502
503 if (!sync_page_io(rdev->bdev, rdev->sb_offset<<1, size, rdev->sb_page, READ)) 503 if (!sync_page_io(rdev->bdev, rdev->sb_offset<<1, size, rdev->sb_page, READ))
504 goto fail; 504 goto fail;
505 rdev->sb_loaded = 1; 505 rdev->sb_loaded = 1;
506 return 0; 506 return 0;
507 507
508 fail: 508 fail:
509 printk(KERN_WARNING "md: disabled device %s, could not read superblock.\n", 509 printk(KERN_WARNING "md: disabled device %s, could not read superblock.\n",
510 bdevname(rdev->bdev,b)); 510 bdevname(rdev->bdev,b));
511 return -EINVAL; 511 return -EINVAL;
512 } 512 }
513 513
514 static int uuid_equal(mdp_super_t *sb1, mdp_super_t *sb2) 514 static int uuid_equal(mdp_super_t *sb1, mdp_super_t *sb2)
515 { 515 {
516 if ( (sb1->set_uuid0 == sb2->set_uuid0) && 516 if ( (sb1->set_uuid0 == sb2->set_uuid0) &&
517 (sb1->set_uuid1 == sb2->set_uuid1) && 517 (sb1->set_uuid1 == sb2->set_uuid1) &&
518 (sb1->set_uuid2 == sb2->set_uuid2) && 518 (sb1->set_uuid2 == sb2->set_uuid2) &&
519 (sb1->set_uuid3 == sb2->set_uuid3)) 519 (sb1->set_uuid3 == sb2->set_uuid3))
520 520
521 return 1; 521 return 1;
522 522
523 return 0; 523 return 0;
524 } 524 }
525 525
526 526
527 static int sb_equal(mdp_super_t *sb1, mdp_super_t *sb2) 527 static int sb_equal(mdp_super_t *sb1, mdp_super_t *sb2)
528 { 528 {
529 int ret; 529 int ret;
530 mdp_super_t *tmp1, *tmp2; 530 mdp_super_t *tmp1, *tmp2;
531 531
532 tmp1 = kmalloc(sizeof(*tmp1),GFP_KERNEL); 532 tmp1 = kmalloc(sizeof(*tmp1),GFP_KERNEL);
533 tmp2 = kmalloc(sizeof(*tmp2),GFP_KERNEL); 533 tmp2 = kmalloc(sizeof(*tmp2),GFP_KERNEL);
534 534
535 if (!tmp1 || !tmp2) { 535 if (!tmp1 || !tmp2) {
536 ret = 0; 536 ret = 0;
537 printk(KERN_INFO "md.c: sb1 is not equal to sb2!\n"); 537 printk(KERN_INFO "md.c: sb1 is not equal to sb2!\n");
538 goto abort; 538 goto abort;
539 } 539 }
540 540
541 *tmp1 = *sb1; 541 *tmp1 = *sb1;
542 *tmp2 = *sb2; 542 *tmp2 = *sb2;
543 543
544 /* 544 /*
545 * nr_disks is not constant 545 * nr_disks is not constant
546 */ 546 */
547 tmp1->nr_disks = 0; 547 tmp1->nr_disks = 0;
548 tmp2->nr_disks = 0; 548 tmp2->nr_disks = 0;
549 549
550 if (memcmp(tmp1, tmp2, MD_SB_GENERIC_CONSTANT_WORDS * 4)) 550 if (memcmp(tmp1, tmp2, MD_SB_GENERIC_CONSTANT_WORDS * 4))
551 ret = 0; 551 ret = 0;
552 else 552 else
553 ret = 1; 553 ret = 1;
554 554
555 abort: 555 abort:
556 kfree(tmp1); 556 kfree(tmp1);
557 kfree(tmp2); 557 kfree(tmp2);
558 return ret; 558 return ret;
559 } 559 }
560 560
561 static unsigned int calc_sb_csum(mdp_super_t * sb) 561 static unsigned int calc_sb_csum(mdp_super_t * sb)
562 { 562 {
563 unsigned int disk_csum, csum; 563 unsigned int disk_csum, csum;
564 564
565 disk_csum = sb->sb_csum; 565 disk_csum = sb->sb_csum;
566 sb->sb_csum = 0; 566 sb->sb_csum = 0;
567 csum = csum_partial((void *)sb, MD_SB_BYTES, 0); 567 csum = csum_partial((void *)sb, MD_SB_BYTES, 0);
568 sb->sb_csum = disk_csum; 568 sb->sb_csum = disk_csum;
569 return csum; 569 return csum;
570 } 570 }
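calc_sb_csum() sums the superblock with sb_csum temporarily zeroed, and super_90_load() compares the two values only after csum_fold(). Folding reduces the 32-bit ones' complement partial sum to 16 bits. Two partial sums that differ only in how carries were accumulated therefore compare equal. Because csum_partial() is architecture specific, this is presumably why the folded values are compared rather than the raw 32-bit fields.

The portable sketch below shows both steps. `partial_sum()` and `fold()` are hypothetical stand-ins, not the kernel's implementations.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Portable stand-in for csum_partial(): ones' complement sum of 16-bit
 * little-endian words, kept in 32 bits with end-around carry. */
static uint32_t partial_sum(const uint8_t *buf, size_t len, uint32_t sum)
{
    uint64_t acc = sum;
    for (size_t i = 0; i + 1 < len; i += 2)
        acc += (uint32_t)buf[i] | ((uint32_t)buf[i + 1] << 8);
    if (len & 1)
        acc += buf[len - 1];            /* trailing odd byte */
    while (acc >> 32)                   /* end-around carry */
        acc = (acc & 0xffffffffu) + (acc >> 32);
    return (uint32_t)acc;
}

/* Like csum_fold(): fold to 16 bits, then take the ones' complement. */
static uint16_t fold(uint32_t sum)
{
    sum = (sum & 0xffff) + (sum >> 16);
    sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}
```

For example, 0x00010000 and 0x00000001 are different 32-bit values but represent the same ones' complement sum, and both fold to 0xfffe.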
571 571
572 572
573 /* 573 /*
574 * Handle superblock details. 574 * Handle superblock details.
575 * We want to be able to handle multiple superblock formats 575 * We want to be able to handle multiple superblock formats
576 * so we have a common interface to them all, and an array of 576 * so we have a common interface to them all, and an array of
577 * different handlers. 577 * different handlers.
578 * We rely on user-space to write the initial superblock, and support 578 * We rely on user-space to write the initial superblock, and support
579 * reading and updating of superblocks. 579 * reading and updating of superblocks.
580 * Interface methods are: 580 * Interface methods are:
581 * int load_super(mdk_rdev_t *dev, mdk_rdev_t *refdev, int minor_version) 581 * int load_super(mdk_rdev_t *dev, mdk_rdev_t *refdev, int minor_version)
582 * loads and validates a superblock on dev. 582 * loads and validates a superblock on dev.
583 * if refdev != NULL, compare superblocks on both devices 583 * if refdev != NULL, compare superblocks on both devices
584 * Return: 584 * Return:
585 * 0 - dev has a superblock that is compatible with refdev 585 * 0 - dev has a superblock that is compatible with refdev
586 * 1 - dev has a superblock that is compatible and newer than refdev 586 * 1 - dev has a superblock that is compatible and newer than refdev
587 * so dev should be used as the refdev in future 587 * so dev should be used as the refdev in future
588 * -EINVAL superblock incompatible or invalid 588 * -EINVAL superblock incompatible or invalid
589 * -othererror e.g. -EIO 589 * -othererror e.g. -EIO
590 * 590 *
591 * int validate_super(mddev_t *mddev, mdk_rdev_t *dev) 591 * int validate_super(mddev_t *mddev, mdk_rdev_t *dev)
592 * Verify that dev is acceptable into mddev. 592 * Verify that dev is acceptable into mddev.
593 * The first time, mddev->raid_disks will be 0, and data from 593 * The first time, mddev->raid_disks will be 0, and data from
594 * dev should be merged in. Subsequent calls check that dev 594 * dev should be merged in. Subsequent calls check that dev
595 * is new enough. Return 0 or -EINVAL 595 * is new enough. Return 0 or -EINVAL
596 * 596 *
597 * void sync_super(mddev_t *mddev, mdk_rdev_t *dev) 597 * void sync_super(mddev_t *mddev, mdk_rdev_t *dev)
598 * Update the superblock for rdev with data in mddev 598 * Update the superblock for rdev with data in mddev
599 * This does not write to disc. 599 * This does not write to disc.
600 * 600 *
601 */ 601 */
602 602
603 struct super_type { 603 struct super_type {
604 char *name; 604 char *name;
605 struct module *owner; 605 struct module *owner;
606 int (*load_super)(mdk_rdev_t *rdev, mdk_rdev_t *refdev, int minor_version); 606 int (*load_super)(mdk_rdev_t *rdev, mdk_rdev_t *refdev, int minor_version);
607 int (*validate_super)(mddev_t *mddev, mdk_rdev_t *rdev); 607 int (*validate_super)(mddev_t *mddev, mdk_rdev_t *rdev);
608 void (*sync_super)(mddev_t *mddev, mdk_rdev_t *rdev); 608 void (*sync_super)(mddev_t *mddev, mdk_rdev_t *rdev);
609 }; 609 };
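struct super_type is a table of function pointers with one entry per on-disk format, so the rest of md can load, validate and sync superblocks without knowing the layout. Elsewhere in md.c, the handlers are kept in an array indexed by superblock major version. The load_super() return convention is:

- 1: the device is newer than refdev;
- 0: the device is compatible;
- negative: the device is rejected.

The sketch below shows the same dispatch style and how that convention can be used to pick the freshest device. `struct dev`, `struct super_ops`, `demo_load()` and `pick_refdev()` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical device record: only what the sketch needs. */
struct dev {
    unsigned long long events;
};

/* Ops table in the style of struct super_type. */
struct super_ops {
    const char *name;
    /* 1: dev newer than refdev (or refdev NULL), 0: compatible, <0: reject */
    int (*load_super)(struct dev *dev, struct dev *refdev);
};

static int demo_load(struct dev *dev, struct dev *refdev)
{
    if (refdev == NULL)
        return 1;
    return dev->events > refdev->events ? 1 : 0;
}

static const struct super_ops super_ops_table[] = {
    [0] = { "0.90.0", demo_load },     /* index = superblock major version */
};

/* Return the freshest device, skipping any that load_super() rejects. */
static struct dev *pick_refdev(const struct super_ops *ops,
                               struct dev *devs, size_t n)
{
    struct dev *ref = NULL;
    for (size_t i = 0; i < n; i++) {
        int ret = ops->load_super(&devs[i], ref);
        if (ret < 0)
            continue;
        if (ret == 1)
            ref = &devs[i];
    }
    return ref;
}
```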
610 610
611 /* 611 /*
612 * load_super for 0.90.0 612 * load_super for 0.90.0
613 */ 613 */
614 static int super_90_load(mdk_rdev_t *rdev, mdk_rdev_t *refdev, int minor_version) 614 static int super_90_load(mdk_rdev_t *rdev, mdk_rdev_t *refdev, int minor_version)
615 { 615 {
616 char b[BDEVNAME_SIZE], b2[BDEVNAME_SIZE]; 616 char b[BDEVNAME_SIZE], b2[BDEVNAME_SIZE];
617 mdp_super_t *sb; 617 mdp_super_t *sb;
618 int ret; 618 int ret;
619 sector_t sb_offset; 619 sector_t sb_offset;
620 620
621 /* 621 /*
622 * Calculate the position of the superblock, 622 * Calculate the position of the superblock,
623 * it's at the end of the disk. 623 * it's at the end of the disk.
624 * 624 *
625 * It also happens to be a multiple of 4Kb. 625 * It also happens to be a multiple of 4Kb.
626 */ 626 */
627 sb_offset = calc_dev_sboffset(rdev->bdev); 627 sb_offset = calc_dev_sboffset(rdev->bdev);
628 rdev->sb_offset = sb_offset; 628 rdev->sb_offset = sb_offset;
629 629
630 ret = read_disk_sb(rdev, MD_SB_BYTES); 630 ret = read_disk_sb(rdev, MD_SB_BYTES);
631 if (ret) return ret; 631 if (ret) return ret;
632 632
633 ret = -EINVAL; 633 ret = -EINVAL;
634 634
635 bdevname(rdev->bdev, b); 635 bdevname(rdev->bdev, b);
636 sb = (mdp_super_t*)page_address(rdev->sb_page); 636 sb = (mdp_super_t*)page_address(rdev->sb_page);
637 637
638 if (sb->md_magic != MD_SB_MAGIC) { 638 if (sb->md_magic != MD_SB_MAGIC) {
639 printk(KERN_ERR "md: invalid raid superblock magic on %s\n", 639 printk(KERN_ERR "md: invalid raid superblock magic on %s\n",
640 b); 640 b);
641 goto abort; 641 goto abort;
642 } 642 }
643 643
644 if (sb->major_version != 0 || 644 if (sb->major_version != 0 ||
645 sb->minor_version != 90) { 645 sb->minor_version != 90) {
646 printk(KERN_WARNING "Bad version number %d.%d on %s\n", 646 printk(KERN_WARNING "Bad version number %d.%d on %s\n",
647 sb->major_version, sb->minor_version, 647 sb->major_version, sb->minor_version,
648 b); 648 b);
649 goto abort; 649 goto abort;
650 } 650 }
651 651
652 if (sb->raid_disks <= 0) 652 if (sb->raid_disks <= 0)
653 goto abort; 653 goto abort;
654 654
655 if (csum_fold(calc_sb_csum(sb)) != csum_fold(sb->sb_csum)) { 655 if (csum_fold(calc_sb_csum(sb)) != csum_fold(sb->sb_csum)) {
656 printk(KERN_WARNING "md: invalid superblock checksum on %s\n", 656 printk(KERN_WARNING "md: invalid superblock checksum on %s\n",
657 b); 657 b);
658 goto abort; 658 goto abort;
659 } 659 }
660 660
661 rdev->preferred_minor = sb->md_minor; 661 rdev->preferred_minor = sb->md_minor;
662 rdev->data_offset = 0; 662 rdev->data_offset = 0;
663 rdev->sb_size = MD_SB_BYTES; 663 rdev->sb_size = MD_SB_BYTES;
664 664
665 if (sb->level == LEVEL_MULTIPATH) 665 if (sb->level == LEVEL_MULTIPATH)
666 rdev->desc_nr = -1; 666 rdev->desc_nr = -1;
667 else 667 else
668 rdev->desc_nr = sb->this_disk.number; 668 rdev->desc_nr = sb->this_disk.number;
669 669
670 if (refdev == 0) 670 if (refdev == 0)
671 ret = 1; 671 ret = 1;
672 else { 672 else {
673 __u64 ev1, ev2; 673 __u64 ev1, ev2;
674 mdp_super_t *refsb = (mdp_super_t*)page_address(refdev->sb_page); 674 mdp_super_t *refsb = (mdp_super_t*)page_address(refdev->sb_page);
675 if (!uuid_equal(refsb, sb)) { 675 if (!uuid_equal(refsb, sb)) {
676 printk(KERN_WARNING "md: %s has different UUID to %s\n", 676 printk(KERN_WARNING "md: %s has different UUID to %s\n",
677 b, bdevname(refdev->bdev,b2)); 677 b, bdevname(refdev->bdev,b2));
678 goto abort; 678 goto abort;
679 } 679 }
680 if (!sb_equal(refsb, sb)) { 680 if (!sb_equal(refsb, sb)) {
681 printk(KERN_WARNING "md: %s has same UUID" 681 printk(KERN_WARNING "md: %s has same UUID"
682 " but different superblock to %s\n", 682 " but different superblock to %s\n",
683 b, bdevname(refdev->bdev, b2)); 683 b, bdevname(refdev->bdev, b2));
684 goto abort; 684 goto abort;
685 } 685 }
686 ev1 = md_event(sb); 686 ev1 = md_event(sb);
687 ev2 = md_event(refsb); 687 ev2 = md_event(refsb);
688 if (ev1 > ev2) 688 if (ev1 > ev2)
689 ret = 1; 689 ret = 1;
690 else 690 else
691 ret = 0; 691 ret = 0;
692 } 692 }
693 rdev->size = calc_dev_size(rdev, sb->chunk_size); 693 rdev->size = calc_dev_size(rdev, sb->chunk_size);
694 694
695 abort: 695 abort:
696 return ret; 696 return ret;
697 } 697 }
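md_event() reassembles the 64-bit event counter from the events_hi and events_lo words, and super_90_sync() splits it the same way when it writes the superblock. super_90_load() then returns 1 only if the device's combined count is strictly greater than the reference device's.

The sketch below shows the split, the recombination and that comparison. It uses a hypothetical `struct sb90_events` that carries only the two counter words.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical subset of the 0.90 superblock: the split event count. */
struct sb90_events {
    uint32_t events_hi;
    uint32_t events_lo;
};

/* The same combination md_event() performs. */
static uint64_t event_of(const struct sb90_events *sb)
{
    return ((uint64_t)sb->events_hi << 32) | sb->events_lo;
}

/* The same split super_90_sync() performs. */
static void set_event(struct sb90_events *sb, uint64_t events)
{
    sb->events_hi = (uint32_t)(events >> 32);
    sb->events_lo = (uint32_t)events;
}

/* super_90_load()'s verdict once UUID and constant fields match. */
static int newer_than(const struct sb90_events *sb,
                      const struct sb90_events *refsb)
{
    return event_of(sb) > event_of(refsb) ? 1 : 0;
}
```

Comparing events_lo alone would wrongly rank a count of 0xFFFFFFFF above 0x100000002.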
698 698
699 /* 699 /*
700 * validate_super for 0.90.0 700 * validate_super for 0.90.0
701 */ 701 */
702 static int super_90_validate(mddev_t *mddev, mdk_rdev_t *rdev) 702 static int super_90_validate(mddev_t *mddev, mdk_rdev_t *rdev)
703 { 703 {
704 mdp_disk_t *desc; 704 mdp_disk_t *desc;
705 mdp_super_t *sb = (mdp_super_t *)page_address(rdev->sb_page); 705 mdp_super_t *sb = (mdp_super_t *)page_address(rdev->sb_page);
706 706
707 rdev->raid_disk = -1; 707 rdev->raid_disk = -1;
708 rdev->flags = 0; 708 rdev->flags = 0;
709 if (mddev->raid_disks == 0) { 709 if (mddev->raid_disks == 0) {
710 mddev->major_version = 0; 710 mddev->major_version = 0;
711 mddev->minor_version = sb->minor_version; 711 mddev->minor_version = sb->minor_version;
712 mddev->patch_version = sb->patch_version; 712 mddev->patch_version = sb->patch_version;
713 mddev->persistent = ! sb->not_persistent; 713 mddev->persistent = ! sb->not_persistent;
714 mddev->chunk_size = sb->chunk_size; 714 mddev->chunk_size = sb->chunk_size;
715 mddev->ctime = sb->ctime; 715 mddev->ctime = sb->ctime;
716 mddev->utime = sb->utime; 716 mddev->utime = sb->utime;
717 mddev->level = sb->level; 717 mddev->level = sb->level;
718 mddev->layout = sb->layout; 718 mddev->layout = sb->layout;
719 mddev->raid_disks = sb->raid_disks; 719 mddev->raid_disks = sb->raid_disks;
720 mddev->size = sb->size; 720 mddev->size = sb->size;
721 mddev->events = md_event(sb); 721 mddev->events = md_event(sb);
722 mddev->bitmap_offset = 0; 722 mddev->bitmap_offset = 0;
723 mddev->default_bitmap_offset = MD_SB_BYTES >> 9; 723 mddev->default_bitmap_offset = MD_SB_BYTES >> 9;
724 724
725 if (sb->state & (1<<MD_SB_CLEAN)) 725 if (sb->state & (1<<MD_SB_CLEAN))
726 mddev->recovery_cp = MaxSector; 726 mddev->recovery_cp = MaxSector;
727 else { 727 else {
728 if (sb->events_hi == sb->cp_events_hi && 728 if (sb->events_hi == sb->cp_events_hi &&
729 sb->events_lo == sb->cp_events_lo) { 729 sb->events_lo == sb->cp_events_lo) {
730 mddev->recovery_cp = sb->recovery_cp; 730 mddev->recovery_cp = sb->recovery_cp;
731 } else 731 } else
732 mddev->recovery_cp = 0; 732 mddev->recovery_cp = 0;
733 } 733 }
734 734
735 memcpy(mddev->uuid+0, &sb->set_uuid0, 4); 735 memcpy(mddev->uuid+0, &sb->set_uuid0, 4);
736 memcpy(mddev->uuid+4, &sb->set_uuid1, 4); 736 memcpy(mddev->uuid+4, &sb->set_uuid1, 4);
737 memcpy(mddev->uuid+8, &sb->set_uuid2, 4); 737 memcpy(mddev->uuid+8, &sb->set_uuid2, 4);
738 memcpy(mddev->uuid+12,&sb->set_uuid3, 4); 738 memcpy(mddev->uuid+12,&sb->set_uuid3, 4);
739 739
740 mddev->max_disks = MD_SB_DISKS; 740 mddev->max_disks = MD_SB_DISKS;
741 741
742 if (sb->state & (1<<MD_SB_BITMAP_PRESENT) && 742 if (sb->state & (1<<MD_SB_BITMAP_PRESENT) &&
743 mddev->bitmap_file == NULL) { 743 mddev->bitmap_file == NULL) {
744 if (mddev->level != 1 && mddev->level != 5 && mddev->level != 6 744 if (mddev->level != 1 && mddev->level != 5 && mddev->level != 6
745 && mddev->level != 10) { 745 && mddev->level != 10) {
746 /* FIXME use a better test */ 746 /* FIXME use a better test */
747 printk(KERN_WARNING "md: bitmaps not supported for this level.\n"); 747 printk(KERN_WARNING "md: bitmaps not supported for this level.\n");
748 return -EINVAL; 748 return -EINVAL;
749 } 749 }
750 mddev->bitmap_offset = mddev->default_bitmap_offset; 750 mddev->bitmap_offset = mddev->default_bitmap_offset;
751 } 751 }
752 752
753 } else if (mddev->pers == NULL) { 753 } else if (mddev->pers == NULL) {
754 /* Insist on good event counter while assembling */ 754 /* Insist on good event counter while assembling */
755 __u64 ev1 = md_event(sb); 755 __u64 ev1 = md_event(sb);
756 ++ev1; 756 ++ev1;
757 if (ev1 < mddev->events) 757 if (ev1 < mddev->events)
758 return -EINVAL; 758 return -EINVAL;
759 } else if (mddev->bitmap) { 759 } else if (mddev->bitmap) {
760 /* if adding to array with a bitmap, then we can accept an 760 /* if adding to array with a bitmap, then we can accept an
761 * older device ... but not too old. 761 * older device ... but not too old.
762 */ 762 */
763 __u64 ev1 = md_event(sb); 763 __u64 ev1 = md_event(sb);
764 if (ev1 < mddev->bitmap->events_cleared) 764 if (ev1 < mddev->bitmap->events_cleared)
765 return 0; 765 return 0;
766 } else /* just a hot-add of a new device, leave raid_disk at -1 */ 766 } else /* just a hot-add of a new device, leave raid_disk at -1 */
767 return 0; 767 return 0;
768 768
769 if (mddev->level != LEVEL_MULTIPATH) { 769 if (mddev->level != LEVEL_MULTIPATH) {
770 desc = sb->disks + rdev->desc_nr; 770 desc = sb->disks + rdev->desc_nr;
771 771
772 if (desc->state & (1<<MD_DISK_FAULTY)) 772 if (desc->state & (1<<MD_DISK_FAULTY))
773 set_bit(Faulty, &rdev->flags); 773 set_bit(Faulty, &rdev->flags);
774 else if (desc->state & (1<<MD_DISK_SYNC) && 774 else if (desc->state & (1<<MD_DISK_SYNC) &&
775 desc->raid_disk < mddev->raid_disks) { 775 desc->raid_disk < mddev->raid_disks) {
776 set_bit(In_sync, &rdev->flags); 776 set_bit(In_sync, &rdev->flags);
777 rdev->raid_disk = desc->raid_disk; 777 rdev->raid_disk = desc->raid_disk;
778 } 778 }
779 if (desc->state & (1<<MD_DISK_WRITEMOSTLY)) 779 if (desc->state & (1<<MD_DISK_WRITEMOSTLY))
780 set_bit(WriteMostly, &rdev->flags); 780 set_bit(WriteMostly, &rdev->flags);
781 } else /* MULTIPATH are always insync */ 781 } else /* MULTIPATH are always insync */
782 set_bit(In_sync, &rdev->flags); 782 set_bit(In_sync, &rdev->flags);
783 return 0; 783 return 0;
784 } 784 }
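super_90_validate() applies two event-count rules:

- **Assembly check.** While the array is being assembled, a member is rejected if `ev1 + 1 < mddev->events`. In other words, a member may lag the array by at most one event.
- **Resync starting point.** For the first device, resync starts at:
  - MaxSector if the superblock is marked clean;
  - the saved recovery_cp if the checkpoint's event count matches the current one;
  - 0 (resync everything) otherwise.

The sketch below implements both rules. `MAX_SECTOR`, `member_fresh_enough()` and `initial_recovery_cp()` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_SECTOR UINT64_MAX   /* stand-in for the kernel's MaxSector */

/* Assembly-time rule: a member may lag the array by at most one event. */
static bool member_fresh_enough(uint64_t member_events, uint64_t array_events)
{
    return member_events + 1 >= array_events;
}

/* Where resync should start, following the first-device branch. */
static uint64_t initial_recovery_cp(bool clean, uint64_t events,
                                    uint64_t cp_events, uint64_t saved_cp)
{
    if (clean)
        return MAX_SECTOR;      /* clean shutdown: nothing to resync */
    if (events == cp_events)
        return saved_cp;        /* checkpoint is current: resume there */
    return 0;                   /* stale checkpoint: full resync */
}
```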
785 785
786 /* 786 /*
787 * sync_super for 0.90.0 787 * sync_super for 0.90.0
788 */ 788 */
789 static void super_90_sync(mddev_t *mddev, mdk_rdev_t *rdev) 789 static void super_90_sync(mddev_t *mddev, mdk_rdev_t *rdev)
790 { 790 {
791 mdp_super_t *sb; 791 mdp_super_t *sb;
792 struct list_head *tmp; 792 struct list_head *tmp;
793 mdk_rdev_t *rdev2; 793 mdk_rdev_t *rdev2;
794 int next_spare = mddev->raid_disks; 794 int next_spare = mddev->raid_disks;
795 795
796 796
797 /* make rdev->sb match mddev data.. 797 /* make rdev->sb match mddev data..
798 * 798 *
799 * 1/ zero out disks 799 * 1/ zero out disks
800 * 2/ Add info for each disk, keeping track of highest desc_nr (next_spare); 800 * 2/ Add info for each disk, keeping track of highest desc_nr (next_spare);
801 * 3/ any empty disks < next_spare become removed 801 * 3/ any empty disks < next_spare become removed
802 * 802 *
803 * disks[0] gets initialised to REMOVED because 803 * disks[0] gets initialised to REMOVED because
804 * we cannot be sure from other fields if it has 804 * we cannot be sure from other fields if it has
805 * been initialised or not. 805 * been initialised or not.
806 */ 806 */
807 int i; 807 int i;
808 int active=0, working=0,failed=0,spare=0,nr_disks=0; 808 int active=0, working=0,failed=0,spare=0,nr_disks=0;
809 809
810 rdev->sb_size = MD_SB_BYTES; 810 rdev->sb_size = MD_SB_BYTES;
811 811
812 sb = (mdp_super_t*)page_address(rdev->sb_page); 812 sb = (mdp_super_t*)page_address(rdev->sb_page);
813 813
814 memset(sb, 0, sizeof(*sb)); 814 memset(sb, 0, sizeof(*sb));
815 815
816 sb->md_magic = MD_SB_MAGIC; 816 sb->md_magic = MD_SB_MAGIC;
817 sb->major_version = mddev->major_version; 817 sb->major_version = mddev->major_version;
818 sb->minor_version = mddev->minor_version; 818 sb->minor_version = mddev->minor_version;
819 sb->patch_version = mddev->patch_version; 819 sb->patch_version = mddev->patch_version;
820 sb->gvalid_words = 0; /* ignored */ 820 sb->gvalid_words = 0; /* ignored */
821 memcpy(&sb->set_uuid0, mddev->uuid+0, 4); 821 memcpy(&sb->set_uuid0, mddev->uuid+0, 4);
822 memcpy(&sb->set_uuid1, mddev->uuid+4, 4); 822 memcpy(&sb->set_uuid1, mddev->uuid+4, 4);
823 memcpy(&sb->set_uuid2, mddev->uuid+8, 4); 823 memcpy(&sb->set_uuid2, mddev->uuid+8, 4);
824 memcpy(&sb->set_uuid3, mddev->uuid+12,4); 824 memcpy(&sb->set_uuid3, mddev->uuid+12,4);
825 825
826 sb->ctime = mddev->ctime; 826 sb->ctime = mddev->ctime;
827 sb->level = mddev->level; 827 sb->level = mddev->level;
828 sb->size = mddev->size; 828 sb->size = mddev->size;
829 sb->raid_disks = mddev->raid_disks; 829 sb->raid_disks = mddev->raid_disks;
830 sb->md_minor = mddev->md_minor; 830 sb->md_minor = mddev->md_minor;
831 sb->not_persistent = !mddev->persistent; 831 sb->not_persistent = !mddev->persistent;
832 sb->utime = mddev->utime; 832 sb->utime = mddev->utime;
833 sb->state = 0; 833 sb->state = 0;
834 sb->events_hi = (mddev->events>>32); 834 sb->events_hi = (mddev->events>>32);
835 sb->events_lo = (u32)mddev->events; 835 sb->events_lo = (u32)mddev->events;
836 836
837 if (mddev->in_sync) 837 if (mddev->in_sync)
838 { 838 {
839 sb->recovery_cp = mddev->recovery_cp; 839 sb->recovery_cp = mddev->recovery_cp;
840 sb->cp_events_hi = (mddev->events>>32); 840 sb->cp_events_hi = (mddev->events>>32);
841 sb->cp_events_lo = (u32)mddev->events; 841 sb->cp_events_lo = (u32)mddev->events;
842 if (mddev->recovery_cp == MaxSector) 842 if (mddev->recovery_cp == MaxSector)
843 sb->state = (1<< MD_SB_CLEAN); 843 sb->state = (1<< MD_SB_CLEAN);
844 } else 844 } else
845 sb->recovery_cp = 0; 845 sb->recovery_cp = 0;
846 846
847 sb->layout = mddev->layout; 847 sb->layout = mddev->layout;
848 sb->chunk_size = mddev->chunk_size; 848 sb->chunk_size = mddev->chunk_size;
849 849
850 if (mddev->bitmap && mddev->bitmap_file == NULL) 850 if (mddev->bitmap && mddev->bitmap_file == NULL)
851 sb->state |= (1<<MD_SB_BITMAP_PRESENT); 851 sb->state |= (1<<MD_SB_BITMAP_PRESENT);
852 852
853 sb->disks[0].state = (1<<MD_DISK_REMOVED); 853 sb->disks[0].state = (1<<MD_DISK_REMOVED);
854 ITERATE_RDEV(mddev,rdev2,tmp) { 854 ITERATE_RDEV(mddev,rdev2,tmp) {
855 mdp_disk_t *d; 855 mdp_disk_t *d;
856 int desc_nr; 856 int desc_nr;
857 if (rdev2->raid_disk >= 0 && test_bit(In_sync, &rdev2->flags) 857 if (rdev2->raid_disk >= 0 && test_bit(In_sync, &rdev2->flags)
858 && !test_bit(Faulty, &rdev2->flags)) 858 && !test_bit(Faulty, &rdev2->flags))
859 desc_nr = rdev2->raid_disk; 859 desc_nr = rdev2->raid_disk;
860 else 860 else
861 desc_nr = next_spare++; 861 desc_nr = next_spare++;
862 rdev2->desc_nr = desc_nr; 862 rdev2->desc_nr = desc_nr;
863 d = &sb->disks[rdev2->desc_nr]; 863 d = &sb->disks[rdev2->desc_nr];
864 nr_disks++; 864 nr_disks++;
865 d->number = rdev2->desc_nr; 865 d->number = rdev2->desc_nr;
866 d->major = MAJOR(rdev2->bdev->bd_dev); 866 d->major = MAJOR(rdev2->bdev->bd_dev);
867 d->minor = MINOR(rdev2->bdev->bd_dev); 867 d->minor = MINOR(rdev2->bdev->bd_dev);
868 if (rdev2->raid_disk >= 0 && test_bit(In_sync, &rdev2->flags) 868 if (rdev2->raid_disk >= 0 && test_bit(In_sync, &rdev2->flags)
869 && !test_bit(Faulty, &rdev2->flags)) 869 && !test_bit(Faulty, &rdev2->flags))
870 d->raid_disk = rdev2->raid_disk; 870 d->raid_disk = rdev2->raid_disk;
871 else 871 else
872 d->raid_disk = rdev2->desc_nr; /* compatibility */ 872 d->raid_disk = rdev2->desc_nr; /* compatibility */
873 if (test_bit(Faulty, &rdev2->flags)) { 873 if (test_bit(Faulty, &rdev2->flags)) {
874 d->state = (1<<MD_DISK_FAULTY); 874 d->state = (1<<MD_DISK_FAULTY);
875 failed++; 875 failed++;
876 } else if (test_bit(In_sync, &rdev2->flags)) { 876 } else if (test_bit(In_sync, &rdev2->flags)) {
877 d->state = (1<<MD_DISK_ACTIVE); 877 d->state = (1<<MD_DISK_ACTIVE);
878 d->state |= (1<<MD_DISK_SYNC); 878 d->state |= (1<<MD_DISK_SYNC);
879 active++; 879 active++;
880 working++; 880 working++;
881 } else { 881 } else {
882 d->state = 0; 882 d->state = 0;
883 spare++; 883 spare++;
884 working++; 884 working++;
885 } 885 }
886 if (test_bit(WriteMostly, &rdev2->flags)) 886 if (test_bit(WriteMostly, &rdev2->flags))
887 d->state |= (1<<MD_DISK_WRITEMOSTLY); 887 d->state |= (1<<MD_DISK_WRITEMOSTLY);
888 } 888 }
889 /* now set the "removed" and "faulty" bits on any missing devices */ 889 /* now set the "removed" and "faulty" bits on any missing devices */
890 for (i=0 ; i < mddev->raid_disks ; i++) { 890 for (i=0 ; i < mddev->raid_disks ; i++) {
891 mdp_disk_t *d = &sb->disks[i]; 891 mdp_disk_t *d = &sb->disks[i];
892 if (d->state == 0 && d->number == 0) { 892 if (d->state == 0 && d->number == 0) {
893 d->number = i; 893 d->number = i;
894 d->raid_disk = i; 894 d->raid_disk = i;
895 d->state = (1<<MD_DISK_REMOVED); 895 d->state = (1<<MD_DISK_REMOVED);
896 d->state |= (1<<MD_DISK_FAULTY); 896 d->state |= (1<<MD_DISK_FAULTY);
897 failed++; 897 failed++;
898 } 898 }
899 } 899 }
900 sb->nr_disks = nr_disks; 900 sb->nr_disks = nr_disks;
901 sb->active_disks = active; 901 sb->active_disks = active;
902 sb->working_disks = working; 902 sb->working_disks = working;
903 sb->failed_disks = failed; 903 sb->failed_disks = failed;
904 sb->spare_disks = spare; 904 sb->spare_disks = spare;
905 905
906 sb->this_disk = sb->disks[rdev->desc_nr]; 906 sb->this_disk = sb->disks[rdev->desc_nr];
907 sb->sb_csum = calc_sb_csum(sb); 907 sb->sb_csum = calc_sb_csum(sb);
908 } 908 }
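The descriptor numbering in super_90_sync follows a simple rule. A healthy, in-sync member keeps its raid_disk index as desc_nr. Every other member (spare or faulty) is numbered upward from raid_disks via next_spare. The sketch below reproduces only that rule. `struct member` and `assign_desc_nr` are hypothetical simplifications of mdk_rdev_t and the loop above.

```c
#include <stdbool.h>

/* Hypothetical, simplified view of one member device. */
struct member {
    int raid_disk;   /* active slot, or -1 */
    bool in_sync;
    bool faulty;
    int desc_nr;     /* output: descriptor index in sb->disks[] */
};

/* Assign descriptor slots as super_90_sync does: healthy in-sync members
 * keep their raid_disk index, everything else is numbered from raid_disks
 * upward in list order. Returns the next unused spare index. */
int assign_desc_nr(struct member *m, int n, int raid_disks)
{
    int next_spare = raid_disks;

    for (int i = 0; i < n; i++) {
        if (m[i].raid_disk >= 0 && m[i].in_sync && !m[i].faulty)
            m[i].desc_nr = m[i].raid_disk;
        else
            m[i].desc_nr = next_spare++;
    }
    return next_spare;
}
```

Any slot below raid_disks that no member claims is then marked REMOVED|FAULTY by the second loop in super_90_sync.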
909 909
910 /* 910 /*
911 * version 1 superblock 911 * version 1 superblock
912 */ 912 */
913 913
914 static unsigned int calc_sb_1_csum(struct mdp_superblock_1 * sb) 914 static unsigned int calc_sb_1_csum(struct mdp_superblock_1 * sb)
915 { 915 {
916 unsigned int disk_csum, csum; 916 unsigned int disk_csum, csum;
917 unsigned long long newcsum; 917 unsigned long long newcsum;
918 int size = 256 + le32_to_cpu(sb->max_dev)*2; 918 int size = 256 + le32_to_cpu(sb->max_dev)*2;
919 unsigned int *isuper = (unsigned int*)sb; 919 unsigned int *isuper = (unsigned int*)sb;
920 int i; 920 int i;
921 921
922 disk_csum = sb->sb_csum; 922 disk_csum = sb->sb_csum;
923 sb->sb_csum = 0; 923 sb->sb_csum = 0;
924 newcsum = 0; 924 newcsum = 0;
925 for (i=0; size>=4; size -= 4 ) 925 for (i=0; size>=4; size -= 4 )
926 newcsum += le32_to_cpu(*isuper++); 926 newcsum += le32_to_cpu(*isuper++);
927 927
928 if (size == 2) 928 if (size == 2)
929 newcsum += le16_to_cpu(*(unsigned short*) isuper); 929 newcsum += le16_to_cpu(*(unsigned short*) isuper);
930 930
931 csum = (newcsum & 0xffffffff) + (newcsum >> 32); 931 csum = (newcsum & 0xffffffff) + (newcsum >> 32);
932 sb->sb_csum = disk_csum; 932 sb->sb_csum = disk_csum;
933 return cpu_to_le32(csum); 933 return cpu_to_le32(csum);
934 } 934 }
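calc_sb_1_csum sums the first 256 + 2*max_dev bytes of the superblock as little-endian 32-bit words, with sb_csum treated as zero. It adds a trailing 16-bit word when two bytes remain and folds the carry above bit 31 back in once. The following portable sketch works on a byte image instead of the kernel struct. The csum_off parameter (the byte offset of sb_csum) and the helper names are assumptions of this sketch. It returns the value in host order, whereas the kernel converts it with cpu_to_le32.

```c
#include <stddef.h>
#include <stdint.h>

static uint32_t get_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

static uint16_t get_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | p[1] << 8);
}

/* Checksum `size` bytes of a version-1 superblock image. The 32-bit word
 * starting at csum_off (the sb_csum field) is treated as zero, mirroring
 * the kernel's temporary clearing of sb->sb_csum. */
uint32_t sb1_csum(const uint8_t *sb, size_t size, size_t csum_off)
{
    uint64_t sum = 0;
    size_t off = 0;

    for (; size - off >= 4; off += 4)
        if (off != csum_off)
            sum += get_le32(sb + off);
    if (size - off == 2)           /* size is always even: 256 + 2*max_dev */
        sum += get_le16(sb + off);

    /* Fold the carries above bit 31 back in once, then truncate to 32 bits. */
    return (uint32_t)((sum & 0xffffffffu) + (sum >> 32));
}
```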
935 935
936 static int super_1_load(mdk_rdev_t *rdev, mdk_rdev_t *refdev, int minor_version) 936 static int super_1_load(mdk_rdev_t *rdev, mdk_rdev_t *refdev, int minor_version)
937 { 937 {
938 struct mdp_superblock_1 *sb; 938 struct mdp_superblock_1 *sb;
939 int ret; 939 int ret;
940 sector_t sb_offset; 940 sector_t sb_offset;
941 char b[BDEVNAME_SIZE], b2[BDEVNAME_SIZE]; 941 char b[BDEVNAME_SIZE], b2[BDEVNAME_SIZE];
942 int bmask; 942 int bmask;
943 943
944 /* 944 /*
945 * Calculate the position of the superblock. 945 * Calculate the position of the superblock.
946 * It is always aligned to a 4K boundary and 946 * It is always aligned to a 4K boundary and
947 * depending on minor_version, it can be: 947 * depending on minor_version, it can be:
948 * 0: At least 8K, but less than 12K, from end of device 948 * 0: At least 8K, but less than 12K, from end of device
949 * 1: At start of device 949 * 1: At start of device
950 * 2: 4K from start of device. 950 * 2: 4K from start of device.
951 */ 951 */
952 switch(minor_version) { 952 switch(minor_version) {
953 case 0: 953 case 0:
954 sb_offset = rdev->bdev->bd_inode->i_size >> 9; 954 sb_offset = rdev->bdev->bd_inode->i_size >> 9;
955 sb_offset -= 8*2; 955 sb_offset -= 8*2;
956 sb_offset &= ~(sector_t)(4*2-1); 956 sb_offset &= ~(sector_t)(4*2-1);
957 /* convert from sectors to K */ 957 /* convert from sectors to K */
958 sb_offset /= 2; 958 sb_offset /= 2;
959 break; 959 break;
960 case 1: 960 case 1:
961 sb_offset = 0; 961 sb_offset = 0;
962 break; 962 break;
963 case 2: 963 case 2:
964 sb_offset = 4; 964 sb_offset = 4;
965 break; 965 break;
966 default: 966 default:
967 return -EINVAL; 967 return -EINVAL;
968 } 968 }
969 rdev->sb_offset = sb_offset; 969 rdev->sb_offset = sb_offset;
970 970
971 /* superblock is rarely larger than 1K, but it can be larger, 971 /* superblock is rarely larger than 1K, but it can be larger,
972 * and it is safe to read 4k, so we do that 972 * and it is safe to read 4k, so we do that
973 */ 973 */
974 ret = read_disk_sb(rdev, 4096); 974 ret = read_disk_sb(rdev, 4096);
975 if (ret) return ret; 975 if (ret) return ret;
976 976
977 977
978 sb = (struct mdp_superblock_1*)page_address(rdev->sb_page); 978 sb = (struct mdp_superblock_1*)page_address(rdev->sb_page);
979 979
980 if (sb->magic != cpu_to_le32(MD_SB_MAGIC) || 980 if (sb->magic != cpu_to_le32(MD_SB_MAGIC) ||
981 sb->major_version != cpu_to_le32(1) || 981 sb->major_version != cpu_to_le32(1) ||
982 le32_to_cpu(sb->max_dev) > (4096-256)/2 || 982 le32_to_cpu(sb->max_dev) > (4096-256)/2 ||
983 le64_to_cpu(sb->super_offset) != (rdev->sb_offset<<1) || 983 le64_to_cpu(sb->super_offset) != (rdev->sb_offset<<1) ||
984 (le32_to_cpu(sb->feature_map) & ~MD_FEATURE_ALL) != 0) 984 (le32_to_cpu(sb->feature_map) & ~MD_FEATURE_ALL) != 0)
985 return -EINVAL; 985 return -EINVAL;
986 986
987 if (calc_sb_1_csum(sb) != sb->sb_csum) { 987 if (calc_sb_1_csum(sb) != sb->sb_csum) {
988 printk("md: invalid superblock checksum on %s\n", 988 printk("md: invalid superblock checksum on %s\n",
989 bdevname(rdev->bdev,b)); 989 bdevname(rdev->bdev,b));
990 return -EINVAL; 990 return -EINVAL;
991 } 991 }
992 if (le64_to_cpu(sb->data_size) < 10) { 992 if (le64_to_cpu(sb->data_size) < 10) {
993 printk("md: data_size too small on %s\n", 993 printk("md: data_size too small on %s\n",
994 bdevname(rdev->bdev,b)); 994 bdevname(rdev->bdev,b));
995 return -EINVAL; 995 return -EINVAL;
996 } 996 }
997 rdev->preferred_minor = 0xffff; 997 rdev->preferred_minor = 0xffff;
998 rdev->data_offset = le64_to_cpu(sb->data_offset); 998 rdev->data_offset = le64_to_cpu(sb->data_offset);
999 999
1000 rdev->sb_size = le32_to_cpu(sb->max_dev) * 2 + 256; 1000 rdev->sb_size = le32_to_cpu(sb->max_dev) * 2 + 256;
1001 bmask = queue_hardsect_size(rdev->bdev->bd_disk->queue)-1; 1001 bmask = queue_hardsect_size(rdev->bdev->bd_disk->queue)-1;
1002 if (rdev->sb_size & bmask) 1002 if (rdev->sb_size & bmask)
1003 rdev-> sb_size = (rdev->sb_size | bmask)+1; 1003 rdev-> sb_size = (rdev->sb_size | bmask)+1;
1004 1004
1005 if (refdev == 0) 1005 if (refdev == 0)
1006 return 1; 1006 return 1;
1007 else { 1007 else {
1008 __u64 ev1, ev2; 1008 __u64 ev1, ev2;
1009 struct mdp_superblock_1 *refsb = 1009 struct mdp_superblock_1 *refsb =
1010 (struct mdp_superblock_1*)page_address(refdev->sb_page); 1010 (struct mdp_superblock_1*)page_address(refdev->sb_page);
1011 1011
1012 if (memcmp(sb->set_uuid, refsb->set_uuid, 16) != 0 || 1012 if (memcmp(sb->set_uuid, refsb->set_uuid, 16) != 0 ||
1013 sb->level != refsb->level || 1013 sb->level != refsb->level ||
1014 sb->layout != refsb->layout || 1014 sb->layout != refsb->layout ||
1015 sb->chunksize != refsb->chunksize) { 1015 sb->chunksize != refsb->chunksize) {
1016 printk(KERN_WARNING "md: %s has strangely different" 1016 printk(KERN_WARNING "md: %s has strangely different"
1017 " superblock to %s\n", 1017 " superblock to %s\n",
1018 bdevname(rdev->bdev,b), 1018 bdevname(rdev->bdev,b),
1019 bdevname(refdev->bdev,b2)); 1019 bdevname(refdev->bdev,b2));
1020 return -EINVAL; 1020 return -EINVAL;
1021 } 1021 }
1022 ev1 = le64_to_cpu(sb->events); 1022 ev1 = le64_to_cpu(sb->events);
1023 ev2 = le64_to_cpu(refsb->events); 1023 ev2 = le64_to_cpu(refsb->events);
1024 1024
1025 if (ev1 > ev2) 1025 if (ev1 > ev2)
1026 return 1; 1026 return 1;
1027 } 1027 }
1028 if (minor_version) 1028 if (minor_version)
1029 rdev->size = ((rdev->bdev->bd_inode->i_size>>9) - le64_to_cpu(sb->data_offset)) / 2; 1029 rdev->size = ((rdev->bdev->bd_inode->i_size>>9) - le64_to_cpu(sb->data_offset)) / 2;
1030 else 1030 else
1031 rdev->size = rdev->sb_offset; 1031 rdev->size = rdev->sb_offset;
1032 if (rdev->size < le64_to_cpu(sb->data_size)/2) 1032 if (rdev->size < le64_to_cpu(sb->data_size)/2)
1033 return -EINVAL; 1033 return -EINVAL;
1034 rdev->size = le64_to_cpu(sb->data_size)/2; 1034 rdev->size = le64_to_cpu(sb->data_size)/2;
1035 if (le32_to_cpu(sb->chunksize)) 1035 if (le32_to_cpu(sb->chunksize))
1036 rdev->size &= ~((sector_t)le32_to_cpu(sb->chunksize)/2 - 1); 1036 rdev->size &= ~((sector_t)le32_to_cpu(sb->chunksize)/2 - 1);
1037 return 0; 1037 return 0;
1038 } 1038 }
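super_1_load performs three pieces of size arithmetic. First, it places the superblock according to minor_version. Second, it rounds rdev->sb_size up to the hardware sector size. Third, it trims the usable size down to a whole number of chunks. All three are reproduced below as standalone functions. Sizes follow the kernel's units: device size in 512-byte sectors, offsets and usable size in KiB, sb->chunksize in sectors. The function names and the -1 return (in place of -EINVAL) are hypothetical.

```c
#include <stdint.h>

/* Superblock position in KiB for a v1 superblock, given the device size in
 * 512-byte sectors (assumed >= 16). Returns -1 for an unknown minor version. */
int64_t sb1_offset_kib(uint64_t dev_sectors, int minor_version)
{
    uint64_t off;

    switch (minor_version) {
    case 0:
        off = dev_sectors - 8 * 2;        /* back off 8 KiB from the end */
        off &= ~(uint64_t)(4 * 2 - 1);    /* align down to 4 KiB (8 sectors) */
        return (int64_t)(off / 2);        /* sectors -> KiB */
    case 1:
        return 0;                         /* start of device */
    case 2:
        return 4;                         /* 4 KiB from the start */
    default:
        return -1;
    }
}

/* Round the superblock size (256 bytes + 2 bytes per device role) up to a
 * multiple of the hardware sector size, which must be a power of two. */
unsigned int sb1_io_size(unsigned int max_dev, unsigned int hardsect)
{
    unsigned int size = max_dev * 2 + 256;
    unsigned int bmask = hardsect - 1;

    if (size & bmask)
        size = (size | bmask) + 1;
    return size;
}

/* Trim a usable size in KiB down to a whole number of chunks; chunk_sectors
 * is sb->chunksize in 512-byte sectors (assumed a power of two, >= 2). */
uint64_t trim_to_chunk_kib(uint64_t size_kib, uint32_t chunk_sectors)
{
    if (chunk_sectors)
        size_kib &= ~((uint64_t)chunk_sectors / 2 - 1);
    return size_kib;
}
```

For example, a 1003-sector device with minor_version 0 gets its superblock at 492 KiB, 9.5 KiB before the end. This is consistent with the "at least 8K, but less than 12K" comment.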
1039 1039
1040 static int super_1_validate(mddev_t *mddev, mdk_rdev_t *rdev) 1040 static int super_1_validate(mddev_t *mddev, mdk_rdev_t *rdev)
1041 { 1041 {
1042 struct mdp_superblock_1 *sb = (struct mdp_superblock_1*)page_address(rdev->sb_page); 1042 struct mdp_superblock_1 *sb = (struct mdp_superblock_1*)page_address(rdev->sb_page);
1043 1043
1044 rdev->raid_disk = -1; 1044 rdev->raid_disk = -1;
1045 rdev->flags = 0; 1045 rdev->flags = 0;
1046 if (mddev->raid_disks == 0) { 1046 if (mddev->raid_disks == 0) {
1047 mddev->major_version = 1; 1047 mddev->major_version = 1;
1048 mddev->patch_version = 0; 1048 mddev->patch_version = 0;
1049 mddev->persistent = 1; 1049 mddev->persistent = 1;
1050 mddev->chunk_size = le32_to_cpu(sb->chunksize) << 9; 1050 mddev->chunk_size = le32_to_cpu(sb->chunksize) << 9;
1051 mddev->ctime = le64_to_cpu(sb->ctime) & ((1ULL << 32)-1); 1051 mddev->ctime = le64_to_cpu(sb->ctime) & ((1ULL << 32)-1);
1052 mddev->utime = le64_to_cpu(sb->utime) & ((1ULL << 32)-1); 1052 mddev->utime = le64_to_cpu(sb->utime) & ((1ULL << 32)-1);
1053 mddev->level = le32_to_cpu(sb->level); 1053 mddev->level = le32_to_cpu(sb->level);
1054 mddev->layout = le32_to_cpu(sb->layout); 1054 mddev->layout = le32_to_cpu(sb->layout);
1055 mddev->raid_disks = le32_to_cpu(sb->raid_disks); 1055 mddev->raid_disks = le32_to_cpu(sb->raid_disks);
1056 mddev->size = le64_to_cpu(sb->size)/2; 1056 mddev->size = le64_to_cpu(sb->size)/2;
1057 mddev->events = le64_to_cpu(sb->events); 1057 mddev->events = le64_to_cpu(sb->events);
1058 mddev->bitmap_offset = 0; 1058 mddev->bitmap_offset = 0;
1059 mddev->default_bitmap_offset = 1024; 1059 mddev->default_bitmap_offset = 1024;
1060 1060
1061 mddev->recovery_cp = le64_to_cpu(sb->resync_offset); 1061 mddev->recovery_cp = le64_to_cpu(sb->resync_offset);
1062 memcpy(mddev->uuid, sb->set_uuid, 16); 1062 memcpy(mddev->uuid, sb->set_uuid, 16);
1063 1063
1064 mddev->max_disks = (4096-256)/2; 1064 mddev->max_disks = (4096-256)/2;
1065 1065
1066 if ((le32_to_cpu(sb->feature_map) & MD_FEATURE_BITMAP_OFFSET) && 1066 if ((le32_to_cpu(sb->feature_map) & MD_FEATURE_BITMAP_OFFSET) &&
1067 mddev->bitmap_file == NULL ) { 1067 mddev->bitmap_file == NULL ) {
1068 if (mddev->level != 1 && mddev->level != 5 && mddev->level != 6 1068 if (mddev->level != 1 && mddev->level != 5 && mddev->level != 6
1069 && mddev->level != 10) { 1069 && mddev->level != 10) {
1070 printk(KERN_WARNING "md: bitmaps not supported for this level.\n"); 1070 printk(KERN_WARNING "md: bitmaps not supported for this level.\n");
1071 return -EINVAL; 1071 return -EINVAL;
1072 } 1072 }
1073 mddev->bitmap_offset = (__s32)le32_to_cpu(sb->bitmap_offset); 1073 mddev->bitmap_offset = (__s32)le32_to_cpu(sb->bitmap_offset);
1074 } 1074 }
1075 } else if (mddev->pers == NULL) { 1075 } else if (mddev->pers == NULL) {
1076 /* Insist on good event counter while assembling */ 1076 /* Insist on good event counter while assembling */
1077 __u64 ev1 = le64_to_cpu(sb->events); 1077 __u64 ev1 = le64_to_cpu(sb->events);
1078 ++ev1; 1078 ++ev1;
1079 if (ev1 < mddev->events) 1079 if (ev1 < mddev->events)
1080 return -EINVAL; 1080 return -EINVAL;
1081 } else if (mddev->bitmap) { 1081 } else if (mddev->bitmap) {
1082 /* If adding to array with a bitmap, then we can accept an 1082 /* If adding to array with a bitmap, then we can accept an
1083 * older device, but not too old. 1083 * older device, but not too old.
1084 */ 1084 */
1085 __u64 ev1 = le64_to_cpu(sb->events); 1085 __u64 ev1 = le64_to_cpu(sb->events);
1086 if (ev1 < mddev->bitmap->events_cleared) 1086 if (ev1 < mddev->bitmap->events_cleared)
1087 return 0; 1087 return 0;
1088 } else /* just a hot-add of a new device, leave raid_disk at -1 */ 1088 } else /* just a hot-add of a new device, leave raid_disk at -1 */
1089 return 0; 1089 return 0;
1090 1090
1091 if (mddev->level != LEVEL_MULTIPATH) { 1091 if (mddev->level != LEVEL_MULTIPATH) {
1092 int role; 1092 int role;
1093 rdev->desc_nr = le32_to_cpu(sb->dev_number); 1093 rdev->desc_nr = le32_to_cpu(sb->dev_number);
1094 role = le16_to_cpu(sb->dev_roles[rdev->desc_nr]); 1094 role = le16_to_cpu(sb->dev_roles[rdev->desc_nr]);
1095 switch(role) { 1095 switch(role) {
1096 case 0xffff: /* spare */ 1096 case 0xffff: /* spare */
1097 break; 1097 break;
1098 case 0xfffe: /* faulty */ 1098 case 0xfffe: /* faulty */
1099 set_bit(Faulty, &rdev->flags); 1099 set_bit(Faulty, &rdev->flags);
1100 break; 1100 break;
1101 default: 1101 default:
1102 set_bit(In_sync, &rdev->flags); 1102 set_bit(In_sync, &rdev->flags);
1103 rdev->raid_disk = role; 1103 rdev->raid_disk = role;
1104 break; 1104 break;
1105 } 1105 }
1106 if (sb->devflags & WriteMostly1) 1106 if (sb->devflags & WriteMostly1)
1107 set_bit(WriteMostly, &rdev->flags); 1107 set_bit(WriteMostly, &rdev->flags);
1108 } else /* MULTIPATH are always insync */ 1108 } else /* MULTIPATH are always insync */
1109 set_bit(In_sync, &rdev->flags); 1109 set_bit(In_sync, &rdev->flags);
1110 1110
1111 return 0; 1111 return 0;
1112 } 1112 }
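super_1_validate interprets each 16-bit dev_roles[] entry as follows: 0xffff means spare, 0xfffe means faulty, and any other value is the active slot number. While an array is being assembled, it also rejects a device whose event count lags the array's by more than one. The following is a sketch of both checks. The enum and function names are hypothetical, and the role value is assumed to be already converted from little-endian.

```c
#include <stdbool.h>
#include <stdint.h>

enum md1_role_kind { MD1_SPARE, MD1_FAULTY, MD1_ACTIVE };

/* Interpret one dev_roles[] entry (already converted to host order). */
enum md1_role_kind decode_role(uint16_t role, int *raid_disk)
{
    *raid_disk = -1;
    switch (role) {
    case 0xffff:
        return MD1_SPARE;
    case 0xfffe:
        return MD1_FAULTY;
    default:
        *raid_disk = role;   /* active member in slot `role` */
        return MD1_ACTIVE;
    }
}

/* Assembly check from super_1_validate: ++ev1; reject if ev1 < events.
 * In other words, a device may lag the array by at most one event. */
bool events_fresh_enough(uint64_t dev_events, uint64_t array_events)
{
    return dev_events + 1 >= array_events;
}
```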
1113 1113
1114 static void super_1_sync(mddev_t *mddev, mdk_rdev_t *rdev) 1114 static void super_1_sync(mddev_t *mddev, mdk_rdev_t *rdev)
1115 { 1115 {
1116 struct mdp_superblock_1 *sb; 1116 struct mdp_superblock_1 *sb;
1117 struct list_head *tmp; 1117 struct list_head *tmp;
1118 mdk_rdev_t *rdev2; 1118 mdk_rdev_t *rdev2;
1119 int max_dev, i; 1119 int max_dev, i;
1120 /* make rdev->sb match mddev and rdev data. */ 1120 /* make rdev->sb match mddev and rdev data. */
1121 1121
1122 sb = (struct mdp_superblock_1*)page_address(rdev->sb_page); 1122 sb = (struct mdp_superblock_1*)page_address(rdev->sb_page);
1123 1123
1124 sb->feature_map = 0; 1124 sb->feature_map = 0;
1125 sb->pad0 = 0; 1125 sb->pad0 = 0;
1126 memset(sb->pad1, 0, sizeof(sb->pad1)); 1126 memset(sb->pad1, 0, sizeof(sb->pad1));
1127 memset(sb->pad2, 0, sizeof(sb->pad2)); 1127 memset(sb->pad2, 0, sizeof(sb->pad2));
1128 memset(sb->pad3, 0, sizeof(sb->pad3)); 1128 memset(sb->pad3, 0, sizeof(sb->pad3));
1129 1129
1130 sb->utime = cpu_to_le64((__u64)mddev->utime); 1130 sb->utime = cpu_to_le64((__u64)mddev->utime);
1131 sb->events = cpu_to_le64(mddev->events); 1131 sb->events = cpu_to_le64(mddev->events);
1132 if (mddev->in_sync) 1132 if (mddev->in_sync)
1133 sb->resync_offset = cpu_to_le64(mddev->recovery_cp); 1133 sb->resync_offset = cpu_to_le64(mddev->recovery_cp);
1134 else 1134 else
1135 sb->resync_offset = cpu_to_le64(0); 1135 sb->resync_offset = cpu_to_le64(0);
1136 1136
1137 if (mddev->bitmap && mddev->bitmap_file == NULL) { 1137 if (mddev->bitmap && mddev->bitmap_file == NULL) {
1138 sb->bitmap_offset = cpu_to_le32((__u32)mddev->bitmap_offset); 1138 sb->bitmap_offset = cpu_to_le32((__u32)mddev->bitmap_offset);
1139 sb->feature_map = cpu_to_le32(MD_FEATURE_BITMAP_OFFSET); 1139 sb->feature_map = cpu_to_le32(MD_FEATURE_BITMAP_OFFSET);
1140 } 1140 }
1141 1141
1142 max_dev = 0; 1142 max_dev = 0;
1143 ITERATE_RDEV(mddev,rdev2,tmp) 1143 ITERATE_RDEV(mddev,rdev2,tmp)
1144 if (rdev2->desc_nr+1 > max_dev) 1144 if (rdev2->desc_nr+1 > max_dev)
1145 max_dev = rdev2->desc_nr+1; 1145 max_dev = rdev2->desc_nr+1;
1146 1146
1147 sb->max_dev = cpu_to_le32(max_dev); 1147 sb->max_dev = cpu_to_le32(max_dev);
1148 for (i=0; i<max_dev;i++) 1148 for (i=0; i<max_dev;i++)
1149 sb->dev_roles[i] = cpu_to_le16(0xfffe); 1149 sb->dev_roles[i] = cpu_to_le16(0xfffe);
1150 1150
1151 ITERATE_RDEV(mddev,rdev2,tmp) { 1151 ITERATE_RDEV(mddev,rdev2,tmp) {
1152 i = rdev2->desc_nr; 1152 i = rdev2->desc_nr;
1153 if (test_bit(Faulty, &rdev2->flags)) 1153 if (test_bit(Faulty, &rdev2->flags))
1154 sb->dev_roles[i] = cpu_to_le16(0xfffe); 1154 sb->dev_roles[i] = cpu_to_le16(0xfffe);
1155 else if (test_bit(In_sync, &rdev2->flags)) 1155 else if (test_bit(In_sync, &rdev2->flags))
1156 sb->dev_roles[i] = cpu_to_le16(rdev2->raid_disk); 1156 sb->dev_roles[i] = cpu_to_le16(rdev2->raid_disk);
1157 else 1157 else
1158 sb->dev_roles[i] = cpu_to_le16(0xffff); 1158 sb->dev_roles[i] = cpu_to_le16(0xffff);
1159 } 1159 }
1160 1160
1161 sb->recovery_offset = cpu_to_le64(0); /* not supported yet */ 1161 sb->recovery_offset = cpu_to_le64(0); /* not supported yet */
1162 sb->sb_csum = calc_sb_1_csum(sb); 1162 sb->sb_csum = calc_sb_1_csum(sb);
1163 } 1163 }
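super_1_sync is the inverse of the role decoding in super_1_validate. It sizes dev_roles[] as one more than the highest desc_nr and fills every entry with 0xfffe (faulty). It then writes each member's role: faulty, its raid_disk if in sync, or 0xffff for a spare. The following is a sketch with a hypothetical `struct member1`. It stores roles in host order, whereas the kernel applies cpu_to_le16.

```c
#include <stdbool.h>
#include <stdint.h>

#define MD1_ROLE_SPARE  0xffff
#define MD1_ROLE_FAULTY 0xfffe

/* Hypothetical, simplified member record. */
struct member1 {
    int desc_nr;
    int raid_disk;
    bool in_sync;
    bool faulty;
};

/* Fill roles[] the way super_1_sync does and return max_dev.
 * roles must have room for max(desc_nr) + 1 entries. */
int build_dev_roles(const struct member1 *m, int n, uint16_t *roles)
{
    int max_dev = 0;

    for (int i = 0; i < n; i++)
        if (m[i].desc_nr + 1 > max_dev)
            max_dev = m[i].desc_nr + 1;

    /* Descriptor numbers with no member default to "faulty". */
    for (int i = 0; i < max_dev; i++)
        roles[i] = MD1_ROLE_FAULTY;

    for (int i = 0; i < n; i++) {
        uint16_t *r = &roles[m[i].desc_nr];
        if (m[i].faulty)
            *r = MD1_ROLE_FAULTY;
        else if (m[i].in_sync)
            *r = (uint16_t)m[i].raid_disk;
        else
            *r = MD1_ROLE_SPARE;
    }
    return max_dev;
}
```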
1164 1164
1165 1165
1166 static struct super_type super_types[] = { 1166 static struct super_type super_types[] = {
1167 [0] = { 1167 [0] = {
1168 .name = "0.90.0", 1168 .name = "0.90.0",
1169 .owner = THIS_MODULE, 1169 .owner = THIS_MODULE,
1170 .load_super = super_90_load, 1170 .load_super = super_90_load,
1171 .validate_super = super_90_validate, 1171 .validate_super = super_90_validate,
1172 .sync_super = super_90_sync, 1172 .sync_super = super_90_sync,
1173 }, 1173 },
1174 [1] = { 1174 [1] = {
1175 .name = "md-1", 1175 .name = "md-1",
1176 .owner = THIS_MODULE, 1176 .owner = THIS_MODULE,
1177 .load_super = super_1_load, 1177 .load_super = super_1_load,
1178 .validate_super = super_1_validate, 1178 .validate_super = super_1_validate,
1179 .sync_super = super_1_sync, 1179 .sync_super = super_1_sync,
1180 }, 1180 },
1181 }; 1181 };
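super_types is a table of per-format callbacks built with C99 designated initializers. Elsewhere in md.c it is indexed by the superblock major version. The following is a cut-down userspace sketch of the same dispatch pattern, with a bounds check on lookup. `struct fmt_ops`, the stub loaders and `find_format` are all hypothetical.

```c
#include <stddef.h>

/* Hypothetical, cut-down analogue of struct super_type: one name and one
 * loader per on-disk format, selected by major version. */
struct fmt_ops {
    const char *name;
    int (*load)(const void *buf);
};

/* Stub loaders that only identify themselves. */
static int load_090(const void *buf) { (void)buf; return 0; }
static int load_1(const void *buf)   { (void)buf; return 1; }

static const struct fmt_ops formats[] = {
    [0] = { .name = "0.90.0", .load = load_090 },
    [1] = { .name = "md-1",   .load = load_1 },
};

/* Look up the handler for a major version, or NULL if unsupported. */
const struct fmt_ops *find_format(int major)
{
    if (major < 0 || (size_t)major >= sizeof(formats) / sizeof(formats[0]))
        return NULL;
    return &formats[major];
}
```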
1182 1182
1183 static mdk_rdev_t * match_dev_unit(mddev_t *mddev, mdk_rdev_t *dev) 1183 static mdk_rdev_t * match_dev_unit(mddev_t *mddev, mdk_rdev_t *dev)
1184 { 1184 {
1185 struct list_head *tmp; 1185 struct list_head *tmp;
1186 mdk_rdev_t *rdev; 1186 mdk_rdev_t *rdev;
1187 1187
1188 ITERATE_RDEV(mddev,rdev,tmp) 1188 ITERATE_RDEV(mddev,rdev,tmp)
1189 if (rdev->bdev->bd_contains == dev->bdev->bd_contains) 1189 if (rdev->bdev->bd_contains == dev->bdev->bd_contains)
1190 return rdev; 1190 return rdev;
1191 1191
1192 return NULL; 1192 return NULL;
1193 } 1193 }
1194 1194
1195 static int match_mddev_units(mddev_t *mddev1, mddev_t *mddev2) 1195 static int match_mddev_units(mddev_t *mddev1, mddev_t *mddev2)
1196 { 1196 {
1197 struct list_head *tmp; 1197 struct list_head *tmp;
1198 mdk_rdev_t *rdev; 1198 mdk_rdev_t *rdev;
1199 1199
1200 ITERATE_RDEV(mddev1,rdev,tmp) 1200 ITERATE_RDEV(mddev1,rdev,tmp)
1201 if (match_dev_unit(mddev2, rdev)) 1201 if (match_dev_unit(mddev2, rdev))
1202 return 1; 1202 return 1;
1203 1203
1204 return 0; 1204 return 0;
1205 } 1205 }
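match_dev_unit and match_mddev_units detect partitions that live on the same physical disk by comparing bd_contains, the block device of the whole disk. The sketch below reduces each member to a hypothetical whole-disk identifier so that the two nested searches are easy to follow.

```c
#include <stdbool.h>
#include <stddef.h>

/* Each member is represented only by an identifier for the whole disk it
 * lives on (the role bd_contains plays in the kernel). */
typedef unsigned long disk_id;

/* Like match_dev_unit: does `whole` already appear among the members? */
bool shares_disk(const disk_id *members, size_t n, disk_id whole)
{
    for (size_t i = 0; i < n; i++)
        if (members[i] == whole)
            return true;
    return false;
}

/* Like match_mddev_units: do two arrays have any disk in common? */
bool arrays_share_disk(const disk_id *a, size_t na,
                       const disk_id *b, size_t nb)
{
    for (size_t i = 0; i < na; i++)
        if (shares_disk(b, nb, a[i]))
            return true;
    return false;
}
```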
1206 1206
1207 static LIST_HEAD(pending_raid_disks); 1207 static LIST_HEAD(pending_raid_disks);
1208 1208
1209 static int bind_rdev_to_array(mdk_rdev_t * rdev, mddev_t * mddev) 1209 static int bind_rdev_to_array(mdk_rdev_t * rdev, mddev_t * mddev)
1210 { 1210 {
1211 mdk_rdev_t *same_pdev; 1211 mdk_rdev_t *same_pdev;
1212 char b[BDEVNAME_SIZE], b2[BDEVNAME_SIZE]; 1212 char b[BDEVNAME_SIZE], b2[BDEVNAME_SIZE];
1213 struct kobject *ko; 1213 struct kobject *ko;
1214 1214
1215 if (rdev->mddev) { 1215 if (rdev->mddev) {
1216 MD_BUG(); 1216 MD_BUG();
1217 return -EINVAL; 1217 return -EINVAL;
1218 } 1218 }
1219 same_pdev = match_dev_unit(mddev, rdev); 1219 same_pdev = match_dev_unit(mddev, rdev);
1220 if (same_pdev) 1220 if (same_pdev)
1221 printk(KERN_WARNING 1221 printk(KERN_WARNING
1222 "%s: WARNING: %s appears to be on the same physical" 1222 "%s: WARNING: %s appears to be on the same physical"
1223 " disk as %s. True\n protection against single-disk" 1223 " disk as %s. True\n protection against single-disk"
1224 " failure might be compromised.\n", 1224 " failure might be compromised.\n",
1225 mdname(mddev), bdevname(rdev->bdev,b), 1225 mdname(mddev), bdevname(rdev->bdev,b),
1226 bdevname(same_pdev->bdev,b2)); 1226 bdevname(same_pdev->bdev,b2));
1227 1227
1228 /* Verify rdev->desc_nr is unique. 1228 /* Verify rdev->desc_nr is unique.
1229 * If it is -1, assign a free number, else 1229 * If it is -1, assign a free number, else
1230 * check number is not in use 1230 * check number is not in use
1231 */ 1231 */
1232 if (rdev->desc_nr < 0) { 1232 if (rdev->desc_nr < 0) {
1233 int choice = 0; 1233 int choice = 0;
1234 if (mddev->pers) choice = mddev->raid_disks; 1234 if (mddev->pers) choice = mddev->raid_disks;
1235 while (find_rdev_nr(mddev, choice)) 1235 while (find_rdev_nr(mddev, choice))
1236 choice++; 1236 choice++;
1237 rdev->desc_nr = choice; 1237 rdev->desc_nr = choice;
1238 } else { 1238 } else {
1239 if (find_rdev_nr(mddev, rdev->desc_nr)) 1239 if (find_rdev_nr(mddev, rdev->desc_nr))
1240 return -EBUSY; 1240 return -EBUSY;
1241 } 1241 }
1242 bdevname(rdev->bdev,b); 1242 bdevname(rdev->bdev,b);
1243 if (kobject_set_name(&rdev->kobj, "dev-%s", b) < 0) 1243 if (kobject_set_name(&rdev->kobj, "dev-%s", b) < 0)
1244 return -ENOMEM; 1244 return -ENOMEM;
1245 1245
1246 list_add(&rdev->same_set, &mddev->disks); 1246 list_add(&rdev->same_set, &mddev->disks);
1247 rdev->mddev = mddev; 1247 rdev->mddev = mddev;
1248 printk(KERN_INFO "md: bind<%s>\n", b); 1248 printk(KERN_INFO "md: bind<%s>\n", b);
1249 1249
1250 rdev->kobj.parent = &mddev->kobj; 1250 rdev->kobj.parent = &mddev->kobj;
1251 kobject_add(&rdev->kobj); 1251 kobject_add(&rdev->kobj);
1252 1252
1253 if (rdev->bdev->bd_part) 1253 if (rdev->bdev->bd_part)
1254 ko = &rdev->bdev->bd_part->kobj; 1254 ko = &rdev->bdev->bd_part->kobj;
1255 else 1255 else
1256 ko = &rdev->bdev->bd_disk->kobj; 1256 ko = &rdev->bdev->bd_disk->kobj;
1257 sysfs_create_link(&rdev->kobj, ko, "block"); 1257 sysfs_create_link(&rdev->kobj, ko, "block");
1258 return 0; 1258 return 0;
1259 } 1259 }
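bind_rdev_to_array ensures each desc_nr is unique. A caller-supplied number that is already taken fails with -EBUSY. When desc_nr is -1, the function picks the lowest free number, starting at raid_disks if the array is already running (mddev->pers set) and at 0 otherwise. In the following sketch, a hypothetical `used[]` bitmap replaces find_rdev_nr. MAX_DESC, the -EINVAL bound check and the -ENOSPC failure are limits of the sketch, not of md.

```c
#include <errno.h>
#include <stdbool.h>

#define MAX_DESC 64   /* hypothetical bound for this sketch */

/* used[k] is true when some member already holds descriptor number k. */
int pick_desc_nr(const bool used[MAX_DESC], int requested,
                 bool array_running, int raid_disks)
{
    if (requested >= MAX_DESC)
        return -EINVAL;            /* limit of this sketch, not of md */
    if (requested >= 0)
        return used[requested] ? -EBUSY : requested;

    /* On a running array, new devices are numbered after the active slots. */
    int choice = array_running ? raid_disks : 0;
    while (choice < MAX_DESC && used[choice])
        choice++;
    return choice < MAX_DESC ? choice : -ENOSPC;   /* sketch-only failure */
}
```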
1260 1260
1261 static void unbind_rdev_from_array(mdk_rdev_t * rdev) 1261 static void unbind_rdev_from_array(mdk_rdev_t * rdev)
1262 { 1262 {
1263 char b[BDEVNAME_SIZE]; 1263 char b[BDEVNAME_SIZE];
1264 if (!rdev->mddev) { 1264 if (!rdev->mddev) {
1265 MD_BUG(); 1265 MD_BUG();
1266 return; 1266 return;
1267 } 1267 }
1268 list_del_init(&rdev->same_set); 1268 list_del_init(&rdev->same_set);
1269 printk(KERN_INFO "md: unbind<%s>\n", bdevname(rdev->bdev,b)); 1269 printk(KERN_INFO "md: unbind<%s>\n", bdevname(rdev->bdev,b));
1270 rdev->mddev = NULL; 1270 rdev->mddev = NULL;
1271 sysfs_remove_link(&rdev->kobj, "block"); 1271 sysfs_remove_link(&rdev->kobj, "block");
1272 kobject_del(&rdev->kobj); 1272 kobject_del(&rdev->kobj);
1273 } 1273 }
1274 1274
1275 /* 1275 /*
1276 * prevent the device from being mounted, repartitioned or 1276 * prevent the device from being mounted, repartitioned or
1277 * otherwise reused by a RAID array (or any other kernel 1277 * otherwise reused by a RAID array (or any other kernel
1278 * subsystem), by bd_claiming the device. 1278 * subsystem), by bd_claiming the device.
1279 */ 1279 */
1280 static int lock_rdev(mdk_rdev_t *rdev, dev_t dev) 1280 static int lock_rdev(mdk_rdev_t *rdev, dev_t dev)
1281 { 1281 {
1282 int err = 0; 1282 int err = 0;
1283 struct block_device *bdev; 1283 struct block_device *bdev;
1284 char b[BDEVNAME_SIZE]; 1284 char b[BDEVNAME_SIZE];
1285 1285
1286 bdev = open_by_devnum(dev, FMODE_READ|FMODE_WRITE); 1286 bdev = open_by_devnum(dev, FMODE_READ|FMODE_WRITE);
1287 if (IS_ERR(bdev)) { 1287 if (IS_ERR(bdev)) {
1288 printk(KERN_ERR "md: could not open %s.\n", 1288 printk(KERN_ERR "md: could not open %s.\n",
1289 __bdevname(dev, b)); 1289 __bdevname(dev, b));
1290 return PTR_ERR(bdev); 1290 return PTR_ERR(bdev);
1291 } 1291 }
1292 err = bd_claim(bdev, rdev); 1292 err = bd_claim(bdev, rdev);
1293 if (err) { 1293 if (err) {
1294 printk(KERN_ERR "md: could not bd_claim %s.\n", 1294 printk(KERN_ERR "md: could not bd_claim %s.\n",
1295 bdevname(bdev, b)); 1295 bdevname(bdev, b));
1296 blkdev_put(bdev); 1296 blkdev_put(bdev);
1297 return err; 1297 return err;
1298 } 1298 }
1299 rdev->bdev = bdev; 1299 rdev->bdev = bdev;
1300 return err; 1300 return err;
1301 } 1301 }
1302 1302
1303 static void unlock_rdev(mdk_rdev_t *rdev) 1303 static void unlock_rdev(mdk_rdev_t *rdev)
1304 { 1304 {
1305 struct block_device *bdev = rdev->bdev; 1305 struct block_device *bdev = rdev->bdev;
1306 rdev->bdev = NULL; 1306 rdev->bdev = NULL;
1307 if (!bdev) 1307 if (!bdev)
1308 MD_BUG(); 1308 MD_BUG();
1309 bd_release(bdev); 1309 bd_release(bdev);
1310 blkdev_put(bdev); 1310 blkdev_put(bdev);
1311 } 1311 }
1312 1312
1313 void md_autodetect_dev(dev_t dev); 1313 void md_autodetect_dev(dev_t dev);
1314 1314
1315 static void export_rdev(mdk_rdev_t * rdev) 1315 static void export_rdev(mdk_rdev_t * rdev)
1316 { 1316 {
1317 char b[BDEVNAME_SIZE]; 1317 char b[BDEVNAME_SIZE];
1318 printk(KERN_INFO "md: export_rdev(%s)\n", 1318 printk(KERN_INFO "md: export_rdev(%s)\n",
1319 bdevname(rdev->bdev,b)); 1319 bdevname(rdev->bdev,b));
1320 if (rdev->mddev) 1320 if (rdev->mddev)
1321 MD_BUG(); 1321 MD_BUG();
1322 free_disk_sb(rdev); 1322 free_disk_sb(rdev);
1323 list_del_init(&rdev->same_set); 1323 list_del_init(&rdev->same_set);
1324 #ifndef MODULE 1324 #ifndef MODULE
1325 md_autodetect_dev(rdev->bdev->bd_dev); 1325 md_autodetect_dev(rdev->bdev->bd_dev);
1326 #endif 1326 #endif
1327 unlock_rdev(rdev); 1327 unlock_rdev(rdev);
1328 kobject_put(&rdev->kobj); 1328 kobject_put(&rdev->kobj);
1329 } 1329 }
1330 1330
1331 static void kick_rdev_from_array(mdk_rdev_t * rdev) 1331 static void kick_rdev_from_array(mdk_rdev_t * rdev)
1332 { 1332 {
1333 unbind_rdev_from_array(rdev); 1333 unbind_rdev_from_array(rdev);
1334 export_rdev(rdev); 1334 export_rdev(rdev);
1335 } 1335 }
1336 1336
1337 static void export_array(mddev_t *mddev) 1337 static void export_array(mddev_t *mddev)
1338 { 1338 {
1339 struct list_head *tmp; 1339 struct list_head *tmp;
1340 mdk_rdev_t *rdev; 1340 mdk_rdev_t *rdev;
1341 1341
1342 ITERATE_RDEV(mddev,rdev,tmp) { 1342 ITERATE_RDEV(mddev,rdev,tmp) {
1343 if (!rdev->mddev) { 1343 if (!rdev->mddev) {
1344 MD_BUG(); 1344 MD_BUG();
1345 continue; 1345 continue;
1346 } 1346 }
1347 kick_rdev_from_array(rdev); 1347 kick_rdev_from_array(rdev);
1348 } 1348 }
1349 if (!list_empty(&mddev->disks)) 1349 if (!list_empty(&mddev->disks))
1350 MD_BUG(); 1350 MD_BUG();
1351 mddev->raid_disks = 0; 1351 mddev->raid_disks = 0;
1352 mddev->major_version = 0; 1352 mddev->major_version = 0;
1353 } 1353 }
1354 1354
1355 static void print_desc(mdp_disk_t *desc) 1355 static void print_desc(mdp_disk_t *desc)
1356 { 1356 {
1357 printk(" DISK<N:%d,(%d,%d),R:%d,S:%d>\n", desc->number, 1357 printk(" DISK<N:%d,(%d,%d),R:%d,S:%d>\n", desc->number,
1358 desc->major,desc->minor,desc->raid_disk,desc->state); 1358 desc->major,desc->minor,desc->raid_disk,desc->state);
1359 } 1359 }
1360 1360
1361 static void print_sb(mdp_super_t *sb) 1361 static void print_sb(mdp_super_t *sb)
1362 { 1362 {
1363 int i; 1363 int i;
1364 1364
1365 printk(KERN_INFO 1365 printk(KERN_INFO
1366 "md: SB: (V:%d.%d.%d) ID:<%08x.%08x.%08x.%08x> CT:%08x\n", 1366 "md: SB: (V:%d.%d.%d) ID:<%08x.%08x.%08x.%08x> CT:%08x\n",
1367 sb->major_version, sb->minor_version, sb->patch_version, 1367 sb->major_version, sb->minor_version, sb->patch_version,
1368 sb->set_uuid0, sb->set_uuid1, sb->set_uuid2, sb->set_uuid3, 1368 sb->set_uuid0, sb->set_uuid1, sb->set_uuid2, sb->set_uuid3,
1369 sb->ctime); 1369 sb->ctime);
1370 printk(KERN_INFO "md: L%d S%08d ND:%d RD:%d md%d LO:%d CS:%d\n", 1370 printk(KERN_INFO "md: L%d S%08d ND:%d RD:%d md%d LO:%d CS:%d\n",
1371 sb->level, sb->size, sb->nr_disks, sb->raid_disks, 1371 sb->level, sb->size, sb->nr_disks, sb->raid_disks,
1372 sb->md_minor, sb->layout, sb->chunk_size); 1372 sb->md_minor, sb->layout, sb->chunk_size);
1373 printk(KERN_INFO "md: UT:%08x ST:%d AD:%d WD:%d" 1373 printk(KERN_INFO "md: UT:%08x ST:%d AD:%d WD:%d"
1374 " FD:%d SD:%d CSUM:%08x E:%08lx\n", 1374 " FD:%d SD:%d CSUM:%08x E:%08lx\n",
1375 sb->utime, sb->state, sb->active_disks, sb->working_disks, 1375 sb->utime, sb->state, sb->active_disks, sb->working_disks,
1376 sb->failed_disks, sb->spare_disks, 1376 sb->failed_disks, sb->spare_disks,
1377 sb->sb_csum, (unsigned long)sb->events_lo); 1377 sb->sb_csum, (unsigned long)sb->events_lo);
1378 1378
1379 printk(KERN_INFO); 1379 printk(KERN_INFO);
1380 for (i = 0; i < MD_SB_DISKS; i++) { 1380 for (i = 0; i < MD_SB_DISKS; i++) {
1381 mdp_disk_t *desc; 1381 mdp_disk_t *desc;
1382 1382
1383 desc = sb->disks + i; 1383 desc = sb->disks + i;
1384 if (desc->number || desc->major || desc->minor || 1384 if (desc->number || desc->major || desc->minor ||
1385 desc->raid_disk || (desc->state && (desc->state != 4))) { 1385 desc->raid_disk || (desc->state && (desc->state != 4))) {
1386 printk(" D %2d: ", i); 1386 printk(" D %2d: ", i);
1387 print_desc(desc); 1387 print_desc(desc);
1388 } 1388 }
1389 } 1389 }
1390 printk(KERN_INFO "md: THIS: "); 1390 printk(KERN_INFO "md: THIS: ");
1391 print_desc(&sb->this_disk); 1391 print_desc(&sb->this_disk);
1392 1392
1393 } 1393 }
1394 1394
1395 static void print_rdev(mdk_rdev_t *rdev) 1395 static void print_rdev(mdk_rdev_t *rdev)
1396 { 1396 {
1397 char b[BDEVNAME_SIZE]; 1397 char b[BDEVNAME_SIZE];
1398 printk(KERN_INFO "md: rdev %s, SZ:%08llu F:%d S:%d DN:%u\n", 1398 printk(KERN_INFO "md: rdev %s, SZ:%08llu F:%d S:%d DN:%u\n",
1399 bdevname(rdev->bdev,b), (unsigned long long)rdev->size, 1399 bdevname(rdev->bdev,b), (unsigned long long)rdev->size,
1400 test_bit(Faulty, &rdev->flags), test_bit(In_sync, &rdev->flags), 1400 test_bit(Faulty, &rdev->flags), test_bit(In_sync, &rdev->flags),
1401 rdev->desc_nr); 1401 rdev->desc_nr);
1402 if (rdev->sb_loaded) { 1402 if (rdev->sb_loaded) {
1403 printk(KERN_INFO "md: rdev superblock:\n"); 1403 printk(KERN_INFO "md: rdev superblock:\n");
1404 print_sb((mdp_super_t*)page_address(rdev->sb_page)); 1404 print_sb((mdp_super_t*)page_address(rdev->sb_page));
1405 } else 1405 } else
1406 printk(KERN_INFO "md: no rdev superblock!\n"); 1406 printk(KERN_INFO "md: no rdev superblock!\n");
1407 } 1407 }
1408 1408
1409 void md_print_devices(void) 1409 void md_print_devices(void)
1410 { 1410 {
1411 struct list_head *tmp, *tmp2; 1411 struct list_head *tmp, *tmp2;
1412 mdk_rdev_t *rdev; 1412 mdk_rdev_t *rdev;
1413 mddev_t *mddev; 1413 mddev_t *mddev;
1414 char b[BDEVNAME_SIZE]; 1414 char b[BDEVNAME_SIZE];
1415 1415
1416 printk("\n"); 1416 printk("\n");
1417 printk("md: **********************************\n"); 1417 printk("md: **********************************\n");
1418 printk("md: * <COMPLETE RAID STATE PRINTOUT> *\n"); 1418 printk("md: * <COMPLETE RAID STATE PRINTOUT> *\n");
1419 printk("md: **********************************\n"); 1419 printk("md: **********************************\n");
1420 ITERATE_MDDEV(mddev,tmp) { 1420 ITERATE_MDDEV(mddev,tmp) {
1421 1421
1422 if (mddev->bitmap) 1422 if (mddev->bitmap)
1423 bitmap_print_sb(mddev->bitmap); 1423 bitmap_print_sb(mddev->bitmap);
1424 else 1424 else
1425 printk("%s: ", mdname(mddev)); 1425 printk("%s: ", mdname(mddev));
1426 ITERATE_RDEV(mddev,rdev,tmp2) 1426 ITERATE_RDEV(mddev,rdev,tmp2)
1427 printk("<%s>", bdevname(rdev->bdev,b)); 1427 printk("<%s>", bdevname(rdev->bdev,b));
1428 printk("\n"); 1428 printk("\n");
1429 1429
1430 ITERATE_RDEV(mddev,rdev,tmp2) 1430 ITERATE_RDEV(mddev,rdev,tmp2)
1431 print_rdev(rdev); 1431 print_rdev(rdev);
1432 } 1432 }
1433 printk("md: **********************************\n"); 1433 printk("md: **********************************\n");
1434 printk("\n"); 1434 printk("\n");
1435 } 1435 }
1436 1436
1437 1437
1438 static void sync_sbs(mddev_t * mddev) 1438 static void sync_sbs(mddev_t * mddev)
1439 { 1439 {
1440 mdk_rdev_t *rdev; 1440 mdk_rdev_t *rdev;
1441 struct list_head *tmp; 1441 struct list_head *tmp;
1442 1442
1443 ITERATE_RDEV(mddev,rdev,tmp) { 1443 ITERATE_RDEV(mddev,rdev,tmp) {
1444 super_types[mddev->major_version]. 1444 super_types[mddev->major_version].
1445 sync_super(mddev, rdev); 1445 sync_super(mddev, rdev);
1446 rdev->sb_loaded = 1; 1446 rdev->sb_loaded = 1;
1447 } 1447 }
1448 } 1448 }
1449 1449
1450 static void md_update_sb(mddev_t * mddev) 1450 static void md_update_sb(mddev_t * mddev)
1451 { 1451 {
1452 int err; 1452 int err;
1453 struct list_head *tmp; 1453 struct list_head *tmp;
1454 mdk_rdev_t *rdev; 1454 mdk_rdev_t *rdev;
1455 int sync_req; 1455 int sync_req;
1456 1456
1457 repeat: 1457 repeat:
1458 spin_lock_irq(&mddev->write_lock); 1458 spin_lock_irq(&mddev->write_lock);
1459 sync_req = mddev->in_sync; 1459 sync_req = mddev->in_sync;
1460 mddev->utime = get_seconds(); 1460 mddev->utime = get_seconds();
1461 mddev->events ++; 1461 mddev->events ++;
1462 1462
1463 if (!mddev->events) { 1463 if (!mddev->events) {
1464 /* 1464 /*
1465 * oops, this 64-bit counter should never wrap. 1465 * oops, this 64-bit counter should never wrap.
1466 * Either we are in around ~1 trillion A.C., assuming 1466 * Either we are in around ~1 trillion A.C., assuming
1467 * 1 reboot per second, or we have a bug: 1467 * 1 reboot per second, or we have a bug:
1468 */ 1468 */
1469 MD_BUG(); 1469 MD_BUG();
1470 mddev->events --; 1470 mddev->events --;
1471 } 1471 }
1472 mddev->sb_dirty = 2; 1472 mddev->sb_dirty = 2;
1473 sync_sbs(mddev); 1473 sync_sbs(mddev);
1474 1474
1475 /* 1475 /*
1476 * do not write anything to disk if using 1476 * do not write anything to disk if using
1477 * nonpersistent superblocks 1477 * nonpersistent superblocks
1478 */ 1478 */
1479 if (!mddev->persistent) { 1479 if (!mddev->persistent) {
1480 mddev->sb_dirty = 0; 1480 mddev->sb_dirty = 0;
1481 spin_unlock_irq(&mddev->write_lock); 1481 spin_unlock_irq(&mddev->write_lock);
1482 wake_up(&mddev->sb_wait); 1482 wake_up(&mddev->sb_wait);
1483 return; 1483 return;
1484 } 1484 }
1485 spin_unlock_irq(&mddev->write_lock); 1485 spin_unlock_irq(&mddev->write_lock);
1486 1486
1487 dprintk(KERN_INFO 1487 dprintk(KERN_INFO
1488 "md: updating %s RAID superblock on device (in sync %d)\n", 1488 "md: updating %s RAID superblock on device (in sync %d)\n",
1489 mdname(mddev),mddev->in_sync); 1489 mdname(mddev),mddev->in_sync);
1490 1490
1491 err = bitmap_update_sb(mddev->bitmap); 1491 err = bitmap_update_sb(mddev->bitmap);
1492 ITERATE_RDEV(mddev,rdev,tmp) { 1492 ITERATE_RDEV(mddev,rdev,tmp) {
1493 char b[BDEVNAME_SIZE]; 1493 char b[BDEVNAME_SIZE];
1494 dprintk(KERN_INFO "md: "); 1494 dprintk(KERN_INFO "md: ");
1495 if (test_bit(Faulty, &rdev->flags)) 1495 if (test_bit(Faulty, &rdev->flags))
1496 dprintk("(skipping faulty "); 1496 dprintk("(skipping faulty ");
1497 1497
1498 dprintk("%s ", bdevname(rdev->bdev,b)); 1498 dprintk("%s ", bdevname(rdev->bdev,b));
1499 if (!test_bit(Faulty, &rdev->flags)) { 1499 if (!test_bit(Faulty, &rdev->flags)) {
1500 md_super_write(mddev,rdev, 1500 md_super_write(mddev,rdev,
1501 rdev->sb_offset<<1, rdev->sb_size, 1501 rdev->sb_offset<<1, rdev->sb_size,
1502 rdev->sb_page); 1502 rdev->sb_page);
1503 dprintk(KERN_INFO "(write) %s's sb offset: %llu\n", 1503 dprintk(KERN_INFO "(write) %s's sb offset: %llu\n",
1504 bdevname(rdev->bdev,b), 1504 bdevname(rdev->bdev,b),
1505 (unsigned long long)rdev->sb_offset); 1505 (unsigned long long)rdev->sb_offset);
1506 1506
1507 } else 1507 } else
1508 dprintk(")\n"); 1508 dprintk(")\n");
1509 if (mddev->level == LEVEL_MULTIPATH) 1509 if (mddev->level == LEVEL_MULTIPATH)
1510 /* only need to write one superblock... */ 1510 /* only need to write one superblock... */
1511 break; 1511 break;
1512 } 1512 }
1513 md_super_wait(mddev); 1513 md_super_wait(mddev);
1514 /* if there was a failure, sb_dirty was set to 1, and we re-write super */ 1514 /* if there was a failure, sb_dirty was set to 1, and we re-write super */
1515 1515
1516 spin_lock_irq(&mddev->write_lock); 1516 spin_lock_irq(&mddev->write_lock);
1517 if (mddev->in_sync != sync_req|| mddev->sb_dirty == 1) { 1517 if (mddev->in_sync != sync_req|| mddev->sb_dirty == 1) {
1518 /* have to write it out again */ 1518 /* have to write it out again */
1519 spin_unlock_irq(&mddev->write_lock); 1519 spin_unlock_irq(&mddev->write_lock);
1520 goto repeat; 1520 goto repeat;
1521 } 1521 }
1522 mddev->sb_dirty = 0; 1522 mddev->sb_dirty = 0;
1523 spin_unlock_irq(&mddev->write_lock); 1523 spin_unlock_irq(&mddev->write_lock);
1524 wake_up(&mddev->sb_wait); 1524 wake_up(&mddev->sb_wait);
1525 1525
1526 } 1526 }
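The wrap-around comment in md_update_sb() can be sanity-checked with a little arithmetic. This assumes one increment of mddev->events per second, as the comment does, and a 365.25-day year:

```latex
\[
2^{64}\,\text{s} \approx 1.84 \times 10^{19}\,\text{s},
\qquad
\frac{1.84 \times 10^{19}\,\text{s}}{3.156 \times 10^{7}\,\text{s/year}}
\approx 5.8 \times 10^{11}\ \text{years}.
\]
```

That is roughly 0.6 trillion years, the same order of magnitude as the comment's "~1 trillion".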
1527 1527
1528 /* words written to sysfs files may, or may not, be \n terminated. 1528 /* words written to sysfs files may, or may not, be \n terminated.
1529 * We want to accept either case. For this we use cmd_match. 1529 * We want to accept either case. For this we use cmd_match.
1530 */ 1530 */
1531 static int cmd_match(const char *cmd, const char *str) 1531 static int cmd_match(const char *cmd, const char *str)
1532 { 1532 {
1533 /* See if cmd, written into a sysfs file, matches 1533 /* See if cmd, written into a sysfs file, matches
1534 * str. They must either be the same, or cmd can 1534 * str. They must either be the same, or cmd can
1535 * have a trailing newline 1535 * have a trailing newline
1536 */ 1536 */
1537 while (*cmd && *str && *cmd == *str) { 1537 while (*cmd && *str && *cmd == *str) {
1538 cmd++; 1538 cmd++;
1539 str++; 1539 str++;
1540 } 1540 }
1541 if (*cmd == '\n') 1541 if (*cmd == '\n')
1542 cmd++; 1542 cmd++;
1543 if (*str || *cmd) 1543 if (*str || *cmd)
1544 return 0; 1544 return 0;
1545 return 1; 1545 return 1;
1546 } 1546 }
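This userspace sketch reuses the cmd_match() body above unchanged, so its newline handling can be exercised in isolation. Only the test strings are assumptions. It accepts the exact word, or the word followed by a single newline (what `echo` writes), and rejects everything else.

```c
/* Userspace copy of cmd_match(), for illustration only.
 * Returns 1 if cmd equals str, optionally followed by one '\n'. */
static int cmd_match(const char *cmd, const char *str)
{
	/* advance over the common prefix */
	while (*cmd && *str && *cmd == *str) {
		cmd++;
		str++;
	}
	/* tolerate exactly one trailing newline */
	if (*cmd == '\n')
		cmd++;
	/* both strings must now be exhausted */
	if (*str || *cmd)
		return 0;
	return 1;
}
```

For example, "idle" and "idle\n" both match "idle". "idle\n\n", "idl" and "idler" do not.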
1547 1547
1548 struct rdev_sysfs_entry { 1548 struct rdev_sysfs_entry {
1549 struct attribute attr; 1549 struct attribute attr;
1550 ssize_t (*show)(mdk_rdev_t *, char *); 1550 ssize_t (*show)(mdk_rdev_t *, char *);
1551 ssize_t (*store)(mdk_rdev_t *, const char *, size_t); 1551 ssize_t (*store)(mdk_rdev_t *, const char *, size_t);
1552 }; 1552 };
1553 1553
1554 static ssize_t 1554 static ssize_t
1555 state_show(mdk_rdev_t *rdev, char *page) 1555 state_show(mdk_rdev_t *rdev, char *page)
1556 { 1556 {
1557 char *sep = ""; 1557 char *sep = "";
1558 int len=0; 1558 int len=0;
1559 1559
1560 if (test_bit(Faulty, &rdev->flags)) { 1560 if (test_bit(Faulty, &rdev->flags)) {
1561 len+= sprintf(page+len, "%sfaulty",sep); 1561 len+= sprintf(page+len, "%sfaulty",sep);
1562 sep = ","; 1562 sep = ",";
1563 } 1563 }
1564 if (test_bit(In_sync, &rdev->flags)) { 1564 if (test_bit(In_sync, &rdev->flags)) {
1565 len += sprintf(page+len, "%sin_sync",sep); 1565 len += sprintf(page+len, "%sin_sync",sep);
1566 sep = ","; 1566 sep = ",";
1567 } 1567 }
1568 if (!test_bit(Faulty, &rdev->flags) && 1568 if (!test_bit(Faulty, &rdev->flags) &&
1569 !test_bit(In_sync, &rdev->flags)) { 1569 !test_bit(In_sync, &rdev->flags)) {
1570 len += sprintf(page+len, "%sspare", sep); 1570 len += sprintf(page+len, "%sspare", sep);
1571 sep = ","; 1571 sep = ",";
1572 } 1572 }
1573 return len+sprintf(page+len, "\n"); 1573 return len+sprintf(page+len, "\n");
1574 } 1574 }
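state_show() builds a comma-separated list of states, ending with a newline. A device that is neither faulty nor in sync is reported as a spare. The sketch below mirrors that logic in userspace. FLAG_FAULTY, FLAG_IN_SYNC and format_rdev_state() are hypothetical stand-ins for the kernel's flag bits, test_bit() and the handler.

```c
#include <stdio.h>

/* Hypothetical bits standing in for the kernel's Faulty/In_sync flags. */
enum { FLAG_FAULTY = 1 << 0, FLAG_IN_SYNC = 1 << 1 };

/* Mirror of state_show(): write e.g. "faulty,in_sync\n" into page and
 * return its length. The kernel hands in a PAGE_SIZE buffer. */
static int format_rdev_state(unsigned int flags, char *page)
{
	const char *sep = "";
	int len = 0;

	if (flags & FLAG_FAULTY) {
		len += sprintf(page + len, "%sfaulty", sep);
		sep = ",";
	}
	if (flags & FLAG_IN_SYNC) {
		len += sprintf(page + len, "%sin_sync", sep);
		sep = ",";
	}
	/* neither faulty nor in sync: the device is a spare */
	if (!(flags & (FLAG_FAULTY | FLAG_IN_SYNC)))
		len += sprintf(page + len, "%sspare", sep);
	return len + sprintf(page + len, "\n");
}
```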
1575 1575
1576 static struct rdev_sysfs_entry 1576 static struct rdev_sysfs_entry
1577 rdev_state = __ATTR_RO(state); 1577 rdev_state = __ATTR_RO(state);
1578 1578
1579 static ssize_t 1579 static ssize_t
1580 super_show(mdk_rdev_t *rdev, char *page) 1580 super_show(mdk_rdev_t *rdev, char *page)
1581 { 1581 {
1582 if (rdev->sb_loaded && rdev->sb_size) { 1582 if (rdev->sb_loaded && rdev->sb_size) {
1583 memcpy(page, page_address(rdev->sb_page), rdev->sb_size); 1583 memcpy(page, page_address(rdev->sb_page), rdev->sb_size);
1584 return rdev->sb_size; 1584 return rdev->sb_size;
1585 } else 1585 } else
1586 return 0; 1586 return 0;
1587 } 1587 }
1588 static struct rdev_sysfs_entry rdev_super = __ATTR_RO(super); 1588 static struct rdev_sysfs_entry rdev_super = __ATTR_RO(super);
1589 1589
1590 static struct attribute *rdev_default_attrs[] = { 1590 static struct attribute *rdev_default_attrs[] = {
1591 &rdev_state.attr, 1591 &rdev_state.attr,
1592 &rdev_super.attr, 1592 &rdev_super.attr,
1593 NULL, 1593 NULL,
1594 }; 1594 };
1595 static ssize_t 1595 static ssize_t
1596 rdev_attr_show(struct kobject *kobj, struct attribute *attr, char *page) 1596 rdev_attr_show(struct kobject *kobj, struct attribute *attr, char *page)
1597 { 1597 {
1598 struct rdev_sysfs_entry *entry = container_of(attr, struct rdev_sysfs_entry, attr); 1598 struct rdev_sysfs_entry *entry = container_of(attr, struct rdev_sysfs_entry, attr);
1599 mdk_rdev_t *rdev = container_of(kobj, mdk_rdev_t, kobj); 1599 mdk_rdev_t *rdev = container_of(kobj, mdk_rdev_t, kobj);
1600 1600
1601 if (!entry->show) 1601 if (!entry->show)
1602 return -EIO; 1602 return -EIO;
1603 return entry->show(rdev, page); 1603 return entry->show(rdev, page);
1604 } 1604 }
1605 1605
1606 static ssize_t 1606 static ssize_t
1607 rdev_attr_store(struct kobject *kobj, struct attribute *attr, 1607 rdev_attr_store(struct kobject *kobj, struct attribute *attr,
1608 const char *page, size_t length) 1608 const char *page, size_t length)
1609 { 1609 {
1610 struct rdev_sysfs_entry *entry = container_of(attr, struct rdev_sysfs_entry, attr); 1610 struct rdev_sysfs_entry *entry = container_of(attr, struct rdev_sysfs_entry, attr);
1611 mdk_rdev_t *rdev = container_of(kobj, mdk_rdev_t, kobj); 1611 mdk_rdev_t *rdev = container_of(kobj, mdk_rdev_t, kobj);
1612 1612
1613 if (!entry->store) 1613 if (!entry->store)
1614 return -EIO; 1614 return -EIO;
1615 return entry->store(rdev, page, length); 1615 return entry->store(rdev, page, length);
1616 } 1616 }
1617 1617
1618 static void rdev_free(struct kobject *ko) 1618 static void rdev_free(struct kobject *ko)
1619 { 1619 {
1620 mdk_rdev_t *rdev = container_of(ko, mdk_rdev_t, kobj); 1620 mdk_rdev_t *rdev = container_of(ko, mdk_rdev_t, kobj);
1621 kfree(rdev); 1621 kfree(rdev);
1622 } 1622 }
1623 static struct sysfs_ops rdev_sysfs_ops = { 1623 static struct sysfs_ops rdev_sysfs_ops = {
1624 .show = rdev_attr_show, 1624 .show = rdev_attr_show,
1625 .store = rdev_attr_store, 1625 .store = rdev_attr_store,
1626 }; 1626 };
1627 static struct kobj_type rdev_ktype = { 1627 static struct kobj_type rdev_ktype = {
1628 .release = rdev_free, 1628 .release = rdev_free,
1629 .sysfs_ops = &rdev_sysfs_ops, 1629 .sysfs_ops = &rdev_sysfs_ops,
1630 .default_attrs = rdev_default_attrs, 1630 .default_attrs = rdev_default_attrs,
1631 }; 1631 };
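The rdev sysfs plumbing above relies on container_of(). sysfs passes in generic kobject and attribute pointers, and container_of() recovers the md-specific structures that embed them. The sketch below shows the same dispatch in userspace with simplified stand-in types. demo_dev, demo_entry, demo_attr_show() and nr_show() are hypothetical, and the kernel's container_of() additionally type-checks its pointer argument.

```c
#include <errno.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/types.h>

/* Simplified userspace container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Minimal stand-ins for the kernel's generic sysfs types. */
struct attribute { const char *name; };
struct kobject { const char *name; };

/* Hypothetical device embedding a kobject, as mdk_rdev_t embeds kobj. */
struct demo_dev {
	int nr;
	struct kobject kobj;
};

/* Typed attribute entry, analogous to struct rdev_sysfs_entry. */
struct demo_entry {
	struct attribute attr;
	ssize_t (*show)(struct demo_dev *, char *);
};

/* Generic hook: recover the typed entry and device, then dispatch. */
static ssize_t demo_attr_show(struct kobject *kobj, struct attribute *attr,
			      char *page)
{
	struct demo_entry *entry = container_of(attr, struct demo_entry, attr);
	struct demo_dev *dev = container_of(kobj, struct demo_dev, kobj);

	if (!entry->show)
		return -EIO;	/* same fallback as rdev_attr_show() */
	return entry->show(dev, page);
}

/* Example typed callback. */
static ssize_t nr_show(struct demo_dev *dev, char *page)
{
	return sprintf(page, "%d\n", dev->nr);
}
```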
1632 1632
1633 /* 1633 /*
1634 * Import a device. If 'super_format' >= 0, then sanity check the superblock 1634 * Import a device. If 'super_format' >= 0, then sanity check the superblock
1635 * 1635 *
1636 * mark the device faulty if: 1636 * mark the device faulty if:
1637 * 1637 *
1638 * - the device is nonexistent (zero size) 1638 * - the device is nonexistent (zero size)
1639 * - the device has no valid superblock 1639 * - the device has no valid superblock
1640 * 1640 *
1641 * a faulty rdev _never_ has rdev->sb set. 1641 * a faulty rdev _never_ has rdev->sb set.
1642 */ 1642 */
1643 static mdk_rdev_t *md_import_device(dev_t newdev, int super_format, int super_minor) 1643 static mdk_rdev_t *md_import_device(dev_t newdev, int super_format, int super_minor)
1644 { 1644 {
1645 char b[BDEVNAME_SIZE]; 1645 char b[BDEVNAME_SIZE];
1646 int err; 1646 int err;
1647 mdk_rdev_t *rdev; 1647 mdk_rdev_t *rdev;
1648 sector_t size; 1648 sector_t size;
1649 1649
1650 rdev = kzalloc(sizeof(*rdev), GFP_KERNEL); 1650 rdev = kzalloc(sizeof(*rdev), GFP_KERNEL);
1651 if (!rdev) { 1651 if (!rdev) {
1652 printk(KERN_ERR "md: could not alloc mem for new device!\n"); 1652 printk(KERN_ERR "md: could not alloc mem for new device!\n");
1653 return ERR_PTR(-ENOMEM); 1653 return ERR_PTR(-ENOMEM);
1654 } 1654 }
1655 1655
1656 if ((err = alloc_disk_sb(rdev))) 1656 if ((err = alloc_disk_sb(rdev)))
1657 goto abort_free; 1657 goto abort_free;
1658 1658
1659 err = lock_rdev(rdev, newdev); 1659 err = lock_rdev(rdev, newdev);
1660 if (err) 1660 if (err)
1661 goto abort_free; 1661 goto abort_free;
1662 1662
1663 rdev->kobj.parent = NULL; 1663 rdev->kobj.parent = NULL;
1664 rdev->kobj.ktype = &rdev_ktype; 1664 rdev->kobj.ktype = &rdev_ktype;
1665 kobject_init(&rdev->kobj); 1665 kobject_init(&rdev->kobj);
1666 1666
1667 rdev->desc_nr = -1; 1667 rdev->desc_nr = -1;
1668 rdev->flags = 0; 1668 rdev->flags = 0;
1669 rdev->data_offset = 0; 1669 rdev->data_offset = 0;
1670 atomic_set(&rdev->nr_pending, 0); 1670 atomic_set(&rdev->nr_pending, 0);
1671 atomic_set(&rdev->read_errors, 0); 1671 atomic_set(&rdev->read_errors, 0);
1672 1672
1673 size = rdev->bdev->bd_inode->i_size >> BLOCK_SIZE_BITS; 1673 size = rdev->bdev->bd_inode->i_size >> BLOCK_SIZE_BITS;
1674 if (!size) { 1674 if (!size) {
1675 printk(KERN_WARNING 1675 printk(KERN_WARNING
1676 "md: %s has zero or unknown size, marking faulty!\n", 1676 "md: %s has zero or unknown size, marking faulty!\n",
1677 bdevname(rdev->bdev,b)); 1677 bdevname(rdev->bdev,b));
1678 err = -EINVAL; 1678 err = -EINVAL;
1679 goto abort_free; 1679 goto abort_free;
1680 } 1680 }
1681 1681
1682 if (super_format >= 0) { 1682 if (super_format >= 0) {
1683 err = super_types[super_format]. 1683 err = super_types[super_format].
1684 load_super(rdev, NULL, super_minor); 1684 load_super(rdev, NULL, super_minor);
1685 if (err == -EINVAL) { 1685 if (err == -EINVAL) {
1686 printk(KERN_WARNING 1686 printk(KERN_WARNING
1687 "md: %s has invalid sb, not importing!\n", 1687 "md: %s has invalid sb, not importing!\n",
1688 bdevname(rdev->bdev,b)); 1688 bdevname(rdev->bdev,b));
1689 goto abort_free; 1689 goto abort_free;
1690 } 1690 }
1691 if (err < 0) { 1691 if (err < 0) {
1692 printk(KERN_WARNING 1692 printk(KERN_WARNING
1693 "md: could not read %s's sb, not importing!\n", 1693 "md: could not read %s's sb, not importing!\n",
1694 bdevname(rdev->bdev,b)); 1694 bdevname(rdev->bdev,b));
1695 goto abort_free; 1695 goto abort_free;
1696 } 1696 }
1697 } 1697 }
1698 INIT_LIST_HEAD(&rdev->same_set); 1698 INIT_LIST_HEAD(&rdev->same_set);
1699 1699
1700 return rdev; 1700 return rdev;
1701 1701
1702 abort_free: 1702 abort_free:
1703 if (rdev->sb_page) { 1703 if (rdev->sb_page) {
1704 if (rdev->bdev) 1704 if (rdev->bdev)
1705 unlock_rdev(rdev); 1705 unlock_rdev(rdev);
1706 free_disk_sb(rdev); 1706 free_disk_sb(rdev);
1707 } 1707 }
1708 kfree(rdev); 1708 kfree(rdev);
1709 return ERR_PTR(err); 1709 return ERR_PTR(err);
1710 } 1710 }
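md_import_device() derives the device size as `i_size >> BLOCK_SIZE_BITS`, that is, in whole 1 KiB blocks. md_update_sb() passes `rdev->sb_offset<<1` to md_super_write(), which converts a 1 KiB-block offset into 512-byte sectors. The sketch below isolates both conversions. The helper names are hypothetical, and treating sb_offset as a count of 1 KiB blocks is inferred from that `<<1`.

```c
#include <stdint.h>

/* BLOCK_SIZE_BITS is 10 in <linux/fs.h>, i.e. 1 KiB blocks. */
#define BLOCK_SIZE_BITS 10

/* Byte size -> whole 1 KiB blocks, as md_import_device() computes it.
 * Devices under 1 KiB round down to 0, which md_import_device() rejects. */
static uint64_t bytes_to_kib(uint64_t bytes)
{
	return bytes >> BLOCK_SIZE_BITS;
}

/* 1 KiB blocks -> 512-byte sectors (the "sb_offset<<1" above). */
static uint64_t kib_to_sectors(uint64_t kib)
{
	return kib << 1;
}
```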
1711 1711
1712 /* 1712 /*
1713 * Check a full RAID array for plausibility 1713 * Check a full RAID array for plausibility
1714 */ 1714 */
1715 1715
1716 1716
1717 static void analyze_sbs(mddev_t * mddev) 1717 static void analyze_sbs(mddev_t * mddev)
1718 { 1718 {
1719 int i; 1719 int i;
1720 struct list_head *tmp; 1720 struct list_head *tmp;
1721 mdk_rdev_t *rdev, *freshest; 1721 mdk_rdev_t *rdev, *freshest;
1722 char b[BDEVNAME_SIZE]; 1722 char b[BDEVNAME_SIZE];
1723 1723
1724 freshest = NULL; 1724 freshest = NULL;
1725 ITERATE_RDEV(mddev,rdev,tmp) 1725 ITERATE_RDEV(mddev,rdev,tmp)
1726 switch (super_types[mddev->major_version]. 1726 switch (super_types[mddev->major_version].
1727 load_super(rdev, freshest, mddev->minor_version)) { 1727 load_super(rdev, freshest, mddev->minor_version)) {
1728 case 1: 1728 case 1:
1729 freshest = rdev; 1729 freshest = rdev;
1730 break; 1730 break;
1731 case 0: 1731 case 0:
1732 break; 1732 break;
1733 default: 1733 default:
1734 printk( KERN_ERR \ 1734 printk( KERN_ERR \
1735 "md: fatal superblock inconsistency in %s" 1735 "md: fatal superblock inconsistency in %s"
1736 " -- removing from array\n", 1736 " -- removing from array\n",
1737 bdevname(rdev->bdev,b)); 1737 bdevname(rdev->bdev,b));
1738 kick_rdev_from_array(rdev); 1738 kick_rdev_from_array(rdev);
1739 } 1739 }
1740 1740
1741 1741
1742 super_types[mddev->major_version]. 1742 super_types[mddev->major_version].
1743 validate_super(mddev, freshest); 1743 validate_super(mddev, freshest);
1744 1744
1745 i = 0; 1745 i = 0;
1746 ITERATE_RDEV(mddev,rdev,tmp) { 1746 ITERATE_RDEV(mddev,rdev,tmp) {
1747 if (rdev != freshest) 1747 if (rdev != freshest)
1748 if (super_types[mddev->major_version]. 1748 if (super_types[mddev->major_version].
1749 validate_super(mddev, rdev)) { 1749 validate_super(mddev, rdev)) {
1750 printk(KERN_WARNING "md: kicking non-fresh %s" 1750 printk(KERN_WARNING "md: kicking non-fresh %s"
1751 " from array!\n", 1751 " from array!\n",
1752 bdevname(rdev->bdev,b)); 1752 bdevname(rdev->bdev,b));
1753 kick_rdev_from_array(rdev); 1753 kick_rdev_from_array(rdev);
1754 continue; 1754 continue;
1755 } 1755 }
1756 if (mddev->level == LEVEL_MULTIPATH) { 1756 if (mddev->level == LEVEL_MULTIPATH) {
1757 rdev->desc_nr = i++; 1757 rdev->desc_nr = i++;
1758 rdev->raid_disk = rdev->desc_nr; 1758 rdev->raid_disk = rdev->desc_nr;
1759 set_bit(In_sync, &rdev->flags); 1759 set_bit(In_sync, &rdev->flags);
1760 } 1760 }
1761 } 1761 }
1762 1762
1763 1763
1764 1764
1765 if (mddev->recovery_cp != MaxSector && 1765 if (mddev->recovery_cp != MaxSector &&
1766 mddev->level >= 1) 1766 mddev->level >= 1)
1767 printk(KERN_ERR "md: %s: raid array is not clean" 1767 printk(KERN_ERR "md: %s: raid array is not clean"
1768 " -- starting background reconstruction\n", 1768 " -- starting background reconstruction\n",
1769 mdname(mddev)); 1769 mdname(mddev));
1770 1770
1771 } 1771 }
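The first loop in analyze_sbs() keeps a running "freshest" device. Each device's load_super() is called with the current pick and returns 1 when the new superblock is newer, so that device takes over. The simplified sketch below assumes "newer" means a strictly larger event counter, and it ignores the error path that kicks a device out. fake_rdev and pick_freshest() are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-device record: only the event counter matters here. */
struct fake_rdev {
	const char *name;
	uint64_t events;
};

/* Return the device with the largest event count. Ties keep the
 * earlier device because the comparison is strict. */
static struct fake_rdev *pick_freshest(struct fake_rdev *devs, size_t n)
{
	struct fake_rdev *freshest = NULL;

	for (size_t i = 0; i < n; i++)
		if (!freshest || devs[i].events > freshest->events)
			freshest = &devs[i];
	return freshest;
}
```

The chosen device then seeds validate_super(mddev, freshest), and every other device is checked against the resulting array state.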
1772 1772
1773 static ssize_t 1773 static ssize_t
1774 level_show(mddev_t *mddev, char *page) 1774 level_show(mddev_t *mddev, char *page)
1775 { 1775 {
1776 struct mdk_personality *p = mddev->pers; 1776 struct mdk_personality *p = mddev->pers;
1777 if (p == NULL && mddev->raid_disks == 0) 1777 if (p == NULL && mddev->raid_disks == 0)
1778 return 0; 1778 return 0;
1779 if (mddev->level >= 0) 1779 if (mddev->level >= 0)
1780 return sprintf(page, "raid%d\n", mddev->level); 1780 return sprintf(page, "raid%d\n", mddev->level);
1781 else 1781 else
1782 return sprintf(page, "%s\n", p->name); 1782 return sprintf(page, "%s\n", p->name);
1783 } 1783 }
1784 1784
1785 static struct md_sysfs_entry md_level = __ATTR_RO(level); 1785 static struct md_sysfs_entry md_level = __ATTR_RO(level);
1786 1786
1787 static ssize_t 1787 static ssize_t
1788 raid_disks_show(mddev_t *mddev, char *page) 1788 raid_disks_show(mddev_t *mddev, char *page)
1789 { 1789 {
1790 if (mddev->raid_disks == 0) 1790 if (mddev->raid_disks == 0)
1791 return 0; 1791 return 0;
1792 return sprintf(page, "%d\n", mddev->raid_disks); 1792 return sprintf(page, "%d\n", mddev->raid_disks);
1793 } 1793 }
1794 1794
1795 static struct md_sysfs_entry md_raid_disks = __ATTR_RO(raid_disks); 1795 static struct md_sysfs_entry md_raid_disks = __ATTR_RO(raid_disks);
1796 1796
1797 static ssize_t 1797 static ssize_t
1798 chunk_size_show(mddev_t *mddev, char *page)
1799 {
1800 return sprintf(page, "%d\n", mddev->chunk_size);
1801 }
1802
1803 static ssize_t
1804 chunk_size_store(mddev_t *mddev, const char *buf, size_t len)
1805 {
1806 /* can only set chunk_size if array is not yet active */
1807 char *e;
1808 unsigned long n = simple_strtoul(buf, &e, 10);
1809
1810 if (mddev->pers)
1811 return -EBUSY;
1812 if (!*buf || (*e && *e != '\n'))
1813 return -EINVAL;
1814
1815 mddev->chunk_size = n;
1816 return len;
1817 }
1818 static struct md_sysfs_entry md_chunk_size =
1819 __ATTR(chunk_size, 0644, chunk_size_show, chunk_size_store);
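chunk_size_store() accepts a decimal number, optionally followed by a newline, and only while no personality is attached (mddev->pers is NULL). The sketch below repeats those checks in userspace. demo_mddev and demo_chunk_size_store() are hypothetical. Userspace strtoul() is somewhat more permissive than the kernel's simple_strtoul() (for instance, it skips leading whitespace).

```c
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical stand-in for the two mddev_t fields the handler touches. */
struct demo_mddev {
	void *pers;		/* non-NULL once the array is running */
	int chunk_size;
};

/* Same checks, in the same order, as chunk_size_store() above. */
static ssize_t demo_chunk_size_store(struct demo_mddev *mddev,
				     const char *buf, size_t len)
{
	char *e;
	unsigned long n = strtoul(buf, &e, 10);

	if (mddev->pers)
		return -EBUSY;		/* array already active */
	if (!*buf || (*e && *e != '\n'))
		return -EINVAL;		/* empty input or trailing junk */
	mddev->chunk_size = (int)n;
	return (ssize_t)len;		/* whole write consumed */
}
```

The handler itself does no range or power-of-two check on the value. On a running system the write would presumably look like `echo 65536 > /sys/block/md0/md/chunk_size` before the array is started. That path is inferred from md_probe() registering a kobject named "md" under the gendisk.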
1820
1821
1822 static ssize_t
1798 action_show(mddev_t *mddev, char *page) 1823 action_show(mddev_t *mddev, char *page)
1799 { 1824 {
1800 char *type = "idle"; 1825 char *type = "idle";
1801 if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery) || 1826 if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery) ||
1802 test_bit(MD_RECOVERY_NEEDED, &mddev->recovery)) { 1827 test_bit(MD_RECOVERY_NEEDED, &mddev->recovery)) {
1803 if (test_bit(MD_RECOVERY_SYNC, &mddev->recovery)) { 1828 if (test_bit(MD_RECOVERY_SYNC, &mddev->recovery)) {
1804 if (!test_bit(MD_RECOVERY_REQUESTED, &mddev->recovery)) 1829 if (!test_bit(MD_RECOVERY_REQUESTED, &mddev->recovery))
1805 type = "resync"; 1830 type = "resync";
1806 else if (test_bit(MD_RECOVERY_CHECK, &mddev->recovery)) 1831 else if (test_bit(MD_RECOVERY_CHECK, &mddev->recovery))
1807 type = "check"; 1832 type = "check";
1808 else 1833 else
1809 type = "repair"; 1834 type = "repair";
1810 } else 1835 } else
1811 type = "recover"; 1836 type = "recover";
1812 } 1837 }
1813 return sprintf(page, "%s\n", type); 1838 return sprintf(page, "%s\n", type);
1814 } 1839 }
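action_show() maps the MD_RECOVERY_* bits to the single word shown in sync_action. The sketch below flattens the same decision tree into early returns. The RECOVERY_* bit values and sync_action_name() are hypothetical stand-ins, not the kernel's definitions.

```c
/* Hypothetical bit values standing in for the MD_RECOVERY_* flags. */
enum {
	RECOVERY_RUNNING   = 1 << 0,
	RECOVERY_NEEDED    = 1 << 1,
	RECOVERY_SYNC      = 1 << 2,
	RECOVERY_REQUESTED = 1 << 3,
	RECOVERY_CHECK     = 1 << 4,
};

/* Same decision tree as action_show(), returning the word it prints. */
static const char *sync_action_name(unsigned long recovery)
{
	if (!(recovery & (RECOVERY_RUNNING | RECOVERY_NEEDED)))
		return "idle";		/* nothing pending or running */
	if (!(recovery & RECOVERY_SYNC))
		return "recover";
	if (!(recovery & RECOVERY_REQUESTED))
		return "resync";	/* not user-requested */
	if (recovery & RECOVERY_CHECK)
		return "check";		/* user-requested check */
	return "repair";		/* user-requested, without CHECK */
}
```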
1815 1840
1816 static ssize_t 1841 static ssize_t
1817 action_store(mddev_t *mddev, const char *page, size_t len) 1842 action_store(mddev_t *mddev, const char *page, size_t len)
1818 { 1843 {
1819 if (!mddev->pers || !mddev->pers->sync_request) 1844 if (!mddev->pers || !mddev->pers->sync_request)
1820 return -EINVAL; 1845 return -EINVAL;
1821 1846
1822 if (cmd_match(page, "idle")) { 1847 if (cmd_match(page, "idle")) {
1823 if (mddev->sync_thread) { 1848 if (mddev->sync_thread) {
1824 set_bit(MD_RECOVERY_INTR, &mddev->recovery); 1849 set_bit(MD_RECOVERY_INTR, &mddev->recovery);
1825 md_unregister_thread(mddev->sync_thread); 1850 md_unregister_thread(mddev->sync_thread);
1826 mddev->sync_thread = NULL; 1851 mddev->sync_thread = NULL;
1827 mddev->recovery = 0; 1852 mddev->recovery = 0;
1828 } 1853 }
1829 } else if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery) || 1854 } else if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery) ||
1830 test_bit(MD_RECOVERY_NEEDED, &mddev->recovery)) 1855 test_bit(MD_RECOVERY_NEEDED, &mddev->recovery))
1831 return -EBUSY; 1856 return -EBUSY;
1832 else if (cmd_match(page, "resync") || cmd_match(page, "recover")) 1857 else if (cmd_match(page, "resync") || cmd_match(page, "recover"))
1833 set_bit(MD_RECOVERY_NEEDED, &mddev->recovery); 1858 set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
1834 else { 1859 else {
1835 if (cmd_match(page, "check")) 1860 if (cmd_match(page, "check"))
1836 set_bit(MD_RECOVERY_CHECK, &mddev->recovery); 1861 set_bit(MD_RECOVERY_CHECK, &mddev->recovery);
1837 else if (cmd_match(page, "repair")) 1862 else if (cmd_match(page, "repair"))
1838 return -EINVAL; 1863 return -EINVAL;
1839 set_bit(MD_RECOVERY_REQUESTED, &mddev->recovery); 1864 set_bit(MD_RECOVERY_REQUESTED, &mddev->recovery);
1840 set_bit(MD_RECOVERY_SYNC, &mddev->recovery); 1865 set_bit(MD_RECOVERY_SYNC, &mddev->recovery);
1841 } 1866 }
1842 set_bit(MD_RECOVERY_NEEDED, &mddev->recovery); 1867 set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
1843 md_wakeup_thread(mddev->thread); 1868 md_wakeup_thread(mddev->thread);
1844 return len; 1869 return len;
1845 } 1870 }
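action_store() turns the written word into MD_RECOVERY_* bits. As written above, the `else if (cmd_match(page, "repair")) return -EINVAL;` branch appears to reject the word "repair" itself. Any unrecognized word then falls through to a repair request, which action_show() would report as "repair". Later kernels are understood to test `!cmd_match(page, "repair")` instead. The sketch below assumes that negated test was the intent. It omits the "idle" branch and the thread wake-up. The bit values and parse_sync_action() are hypothetical.

```c
#include <errno.h>

/* Hypothetical bit values standing in for the MD_RECOVERY_* flags. */
enum {
	RECOVERY_RUNNING   = 1 << 0,
	RECOVERY_NEEDED    = 1 << 1,
	RECOVERY_SYNC      = 1 << 2,
	RECOVERY_REQUESTED = 1 << 3,
	RECOVERY_CHECK     = 1 << 4,
};

/* cmd_match() from above: exact word, optional single trailing '\n'. */
static int cmd_match(const char *cmd, const char *str)
{
	while (*cmd && *str && *cmd == *str) {
		cmd++;
		str++;
	}
	if (*cmd == '\n')
		cmd++;
	return !*str && !*cmd;
}

/* Parse a non-"idle" word into recovery bits; 0 on success. */
static int parse_sync_action(const char *page, unsigned long *recovery)
{
	if (*recovery & (RECOVERY_RUNNING | RECOVERY_NEEDED))
		return -EBUSY;			/* sync pending or running */
	if (cmd_match(page, "resync") || cmd_match(page, "recover")) {
		*recovery |= RECOVERY_NEEDED;	/* only NEEDED, as above */
		return 0;
	}
	if (cmd_match(page, "check"))
		*recovery |= RECOVERY_CHECK;
	else if (!cmd_match(page, "repair"))
		return -EINVAL;			/* unknown word */
	*recovery |= RECOVERY_REQUESTED | RECOVERY_SYNC | RECOVERY_NEEDED;
	return 0;
}
```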
1846 1871
1847 static ssize_t 1872 static ssize_t
1848 mismatch_cnt_show(mddev_t *mddev, char *page) 1873 mismatch_cnt_show(mddev_t *mddev, char *page)
1849 { 1874 {
1850 return sprintf(page, "%llu\n", 1875 return sprintf(page, "%llu\n",
1851 (unsigned long long) mddev->resync_mismatches); 1876 (unsigned long long) mddev->resync_mismatches);
1852 } 1877 }
1853 1878
1854 static struct md_sysfs_entry 1879 static struct md_sysfs_entry
1855 md_scan_mode = __ATTR(sync_action, S_IRUGO|S_IWUSR, action_show, action_store); 1880 md_scan_mode = __ATTR(sync_action, S_IRUGO|S_IWUSR, action_show, action_store);
1856 1881
1857 1882
1858 static struct md_sysfs_entry 1883 static struct md_sysfs_entry
1859 md_mismatches = __ATTR_RO(mismatch_cnt); 1884 md_mismatches = __ATTR_RO(mismatch_cnt);
1860 1885
1861 static struct attribute *md_default_attrs[] = { 1886 static struct attribute *md_default_attrs[] = {
1862 &md_level.attr, 1887 &md_level.attr,
1863 &md_raid_disks.attr, 1888 &md_raid_disks.attr,
1889 &md_chunk_size.attr,
1864 NULL, 1890 NULL,
1865 }; 1891 };
1866 1892
1867 static struct attribute *md_redundancy_attrs[] = { 1893 static struct attribute *md_redundancy_attrs[] = {
1868 &md_scan_mode.attr, 1894 &md_scan_mode.attr,
1869 &md_mismatches.attr, 1895 &md_mismatches.attr,
1870 NULL, 1896 NULL,
1871 }; 1897 };
1872 static struct attribute_group md_redundancy_group = { 1898 static struct attribute_group md_redundancy_group = {
1873 .name = NULL, 1899 .name = NULL,
1874 .attrs = md_redundancy_attrs, 1900 .attrs = md_redundancy_attrs,
1875 }; 1901 };
1876 1902
1877 1903
1878 static ssize_t 1904 static ssize_t
1879 md_attr_show(struct kobject *kobj, struct attribute *attr, char *page) 1905 md_attr_show(struct kobject *kobj, struct attribute *attr, char *page)
1880 { 1906 {
1881 struct md_sysfs_entry *entry = container_of(attr, struct md_sysfs_entry, attr); 1907 struct md_sysfs_entry *entry = container_of(attr, struct md_sysfs_entry, attr);
1882 mddev_t *mddev = container_of(kobj, struct mddev_s, kobj); 1908 mddev_t *mddev = container_of(kobj, struct mddev_s, kobj);
1883 ssize_t rv; 1909 ssize_t rv;
1884 1910
1885 if (!entry->show) 1911 if (!entry->show)
1886 return -EIO; 1912 return -EIO;
1887 mddev_lock(mddev); 1913 mddev_lock(mddev);
1888 rv = entry->show(mddev, page); 1914 rv = entry->show(mddev, page);
1889 mddev_unlock(mddev); 1915 mddev_unlock(mddev);
1890 return rv; 1916 return rv;
1891 } 1917 }
1892 1918
1893 static ssize_t 1919 static ssize_t
1894 md_attr_store(struct kobject *kobj, struct attribute *attr, 1920 md_attr_store(struct kobject *kobj, struct attribute *attr,
1895 const char *page, size_t length) 1921 const char *page, size_t length)
1896 { 1922 {
1897 struct md_sysfs_entry *entry = container_of(attr, struct md_sysfs_entry, attr); 1923 struct md_sysfs_entry *entry = container_of(attr, struct md_sysfs_entry, attr);
1898 mddev_t *mddev = container_of(kobj, struct mddev_s, kobj); 1924 mddev_t *mddev = container_of(kobj, struct mddev_s, kobj);
1899 ssize_t rv; 1925 ssize_t rv;
1900 1926
1901 if (!entry->store) 1927 if (!entry->store)
1902 return -EIO; 1928 return -EIO;
1903 mddev_lock(mddev); 1929 mddev_lock(mddev);
1904 rv = entry->store(mddev, page, length); 1930 rv = entry->store(mddev, page, length);
1905 mddev_unlock(mddev); 1931 mddev_unlock(mddev);
1906 return rv; 1932 return rv;
1907 } 1933 }
1908 1934
1909 static void md_free(struct kobject *ko) 1935 static void md_free(struct kobject *ko)
1910 { 1936 {
1911 mddev_t *mddev = container_of(ko, mddev_t, kobj); 1937 mddev_t *mddev = container_of(ko, mddev_t, kobj);
1912 kfree(mddev); 1938 kfree(mddev);
1913 } 1939 }
1914 1940
1915 static struct sysfs_ops md_sysfs_ops = { 1941 static struct sysfs_ops md_sysfs_ops = {
1916 .show = md_attr_show, 1942 .show = md_attr_show,
1917 .store = md_attr_store, 1943 .store = md_attr_store,
1918 }; 1944 };
1919 static struct kobj_type md_ktype = { 1945 static struct kobj_type md_ktype = {
1920 .release = md_free, 1946 .release = md_free,
1921 .sysfs_ops = &md_sysfs_ops, 1947 .sysfs_ops = &md_sysfs_ops,
1922 .default_attrs = md_default_attrs, 1948 .default_attrs = md_default_attrs,
1923 }; 1949 };
1924 1950
1925 int mdp_major = 0; 1951 int mdp_major = 0;
1926 1952
1927 static struct kobject *md_probe(dev_t dev, int *part, void *data) 1953 static struct kobject *md_probe(dev_t dev, int *part, void *data)
1928 { 1954 {
1929 static DECLARE_MUTEX(disks_sem); 1955 static DECLARE_MUTEX(disks_sem);
1930 mddev_t *mddev = mddev_find(dev); 1956 mddev_t *mddev = mddev_find(dev);
1931 struct gendisk *disk; 1957 struct gendisk *disk;
1932 int partitioned = (MAJOR(dev) != MD_MAJOR); 1958 int partitioned = (MAJOR(dev) != MD_MAJOR);
1933 int shift = partitioned ? MdpMinorShift : 0; 1959 int shift = partitioned ? MdpMinorShift : 0;
1934 int unit = MINOR(dev) >> shift; 1960 int unit = MINOR(dev) >> shift;
1935 1961
1936 if (!mddev) 1962 if (!mddev)
1937 return NULL; 1963 return NULL;
1938 1964
1939 down(&disks_sem); 1965 down(&disks_sem);
1940 if (mddev->gendisk) { 1966 if (mddev->gendisk) {
1941 up(&disks_sem); 1967 up(&disks_sem);
1942 mddev_put(mddev); 1968 mddev_put(mddev);
1943 return NULL; 1969 return NULL;
1944 } 1970 }
1945 disk = alloc_disk(1 << shift); 1971 disk = alloc_disk(1 << shift);
1946 if (!disk) { 1972 if (!disk) {
1947 up(&disks_sem); 1973 up(&disks_sem);
1948 mddev_put(mddev); 1974 mddev_put(mddev);
1949 return NULL; 1975 return NULL;
1950 } 1976 }
1951 disk->major = MAJOR(dev); 1977 disk->major = MAJOR(dev);
1952 disk->first_minor = unit << shift; 1978 disk->first_minor = unit << shift;
1953 if (partitioned) { 1979 if (partitioned) {
1954 sprintf(disk->disk_name, "md_d%d", unit); 1980 sprintf(disk->disk_name, "md_d%d", unit);
1955 sprintf(disk->devfs_name, "md/d%d", unit); 1981 sprintf(disk->devfs_name, "md/d%d", unit);
1956 } else { 1982 } else {
1957 sprintf(disk->disk_name, "md%d", unit); 1983 sprintf(disk->disk_name, "md%d", unit);
1958 sprintf(disk->devfs_name, "md/%d", unit); 1984 sprintf(disk->devfs_name, "md/%d", unit);
1959 } 1985 }
1960 disk->fops = &md_fops; 1986 disk->fops = &md_fops;
1961 disk->private_data = mddev; 1987 disk->private_data = mddev;
1962 disk->queue = mddev->queue; 1988 disk->queue = mddev->queue;
1963 add_disk(disk); 1989 add_disk(disk);
1964 mddev->gendisk = disk; 1990 mddev->gendisk = disk;
1965 up(&disks_sem); 1991 up(&disks_sem);
1966 mddev->kobj.parent = &disk->kobj; 1992 mddev->kobj.parent = &disk->kobj;
1967 mddev->kobj.k_name = NULL; 1993 mddev->kobj.k_name = NULL;
1968 snprintf(mddev->kobj.name, KOBJ_NAME_LEN, "%s", "md"); 1994 snprintf(mddev->kobj.name, KOBJ_NAME_LEN, "%s", "md");
1969 mddev->kobj.ktype = &md_ktype; 1995 mddev->kobj.ktype = &md_ktype;
1970 kobject_register(&mddev->kobj); 1996 kobject_register(&mddev->kobj);
1971 return NULL; 1997 return NULL;
1972 } 1998 }
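md_probe() turns a dev_t into a unit number and a range of minors. The sketch below reproduces only that arithmetic in user space. It assumes MdpMinorShift is 6; that constant is defined elsewhere in md.c and is not part of this hunk. The struct and function names are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Assumed value: MdpMinorShift is defined elsewhere in md.c, not in this hunk. */
#define SKETCH_MDP_MINOR_SHIFT 6

/* Hypothetical result type mirroring the fields md_probe() fills in. */
struct md_probe_sketch {
	int unit;          /* array number, e.g. 2 for md2 / md_d2 */
	int first_minor;   /* first minor owned by this gendisk */
	int minors;        /* number of minors passed to alloc_disk() */
	char disk_name[32];
};

/* Mirror md_probe()'s arithmetic: a partitioned array reserves
 * 1 << shift minors per unit, a plain md array reserves one. */
static struct md_probe_sketch md_probe_layout(int minor, int partitioned)
{
	struct md_probe_sketch r;
	int shift = partitioned ? SKETCH_MDP_MINOR_SHIFT : 0;

	r.unit = minor >> shift;
	r.first_minor = r.unit << shift;
	r.minors = 1 << shift;
	if (partitioned)
		snprintf(r.disk_name, sizeof(r.disk_name), "md_d%d", r.unit);
	else
		snprintf(r.disk_name, sizeof(r.disk_name), "md%d", r.unit);
	return r;
}
```

Under that assumption, minor 130 on the partitionable major maps to unit 2 (md_d2), whose gendisk covers minors 128 to 191. Minor 3 on MD_MAJOR is simply md3.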
1973 1999
1974 void md_wakeup_thread(mdk_thread_t *thread); 2000 void md_wakeup_thread(mdk_thread_t *thread);
1975 2001
1976 static void md_safemode_timeout(unsigned long data) 2002 static void md_safemode_timeout(unsigned long data)
1977 { 2003 {
1978 mddev_t *mddev = (mddev_t *) data; 2004 mddev_t *mddev = (mddev_t *) data;
1979 2005
1980 mddev->safemode = 1; 2006 mddev->safemode = 1;
1981 md_wakeup_thread(mddev->thread); 2007 md_wakeup_thread(mddev->thread);
1982 } 2008 }
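do_md_run() installs md_safemode_timeout() as the safemode timer callback and sets mddev->safemode_delay to (20 * HZ)/1000 + 1 jiffies. The sketch below shows that conversion on its own. HZ is passed in as a parameter because it is a kernel build-time constant.

```c
/* Reproduce do_md_run()'s conversion of a 20 ms safemode delay to
 * jiffies. The division truncates; the "+ 1" keeps the result non-zero
 * and never shorter than 20 ms, whatever HZ is. */
static unsigned long safemode_delay_jiffies(unsigned long hz)
{
	return (20 * hz) / 1000 + 1;
}
```

For HZ = 100, 250 and 1000 this gives 3, 6 and 21 jiffies. The truncating division alone would give 2, 5 and 20, and for any HZ below 50 it would give 0, which the + 1 prevents.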
1983 2009
1984 static int start_dirty_degraded; 2010 static int start_dirty_degraded;
1985 2011
1986 static int do_md_run(mddev_t * mddev) 2012 static int do_md_run(mddev_t * mddev)
1987 { 2013 {
1988 int err; 2014 int err;
1989 int chunk_size; 2015 int chunk_size;
1990 struct list_head *tmp; 2016 struct list_head *tmp;
1991 mdk_rdev_t *rdev; 2017 mdk_rdev_t *rdev;
1992 struct gendisk *disk; 2018 struct gendisk *disk;
1993 struct mdk_personality *pers; 2019 struct mdk_personality *pers;
1994 char b[BDEVNAME_SIZE]; 2020 char b[BDEVNAME_SIZE];
1995 2021
1996 if (list_empty(&mddev->disks)) 2022 if (list_empty(&mddev->disks))
1997 /* cannot run an array with no devices.. */ 2023 /* cannot run an array with no devices.. */
1998 return -EINVAL; 2024 return -EINVAL;
1999 2025
2000 if (mddev->pers) 2026 if (mddev->pers)
2001 return -EBUSY; 2027 return -EBUSY;
2002 2028
2003 /* 2029 /*
2004 * Analyze all RAID superblock(s) 2030 * Analyze all RAID superblock(s)
2005 */ 2031 */
2006 if (!mddev->raid_disks) 2032 if (!mddev->raid_disks)
2007 analyze_sbs(mddev); 2033 analyze_sbs(mddev);
2008 2034
2009 chunk_size = mddev->chunk_size; 2035 chunk_size = mddev->chunk_size;
2010 2036
2011 if (chunk_size) { 2037 if (chunk_size) {
2012 if (chunk_size > MAX_CHUNK_SIZE) { 2038 if (chunk_size > MAX_CHUNK_SIZE) {
2013 printk(KERN_ERR "too big chunk_size: %d > %d\n", 2039 printk(KERN_ERR "too big chunk_size: %d > %d\n",
2014 chunk_size, MAX_CHUNK_SIZE); 2040 chunk_size, MAX_CHUNK_SIZE);
2015 return -EINVAL; 2041 return -EINVAL;
2016 } 2042 }
2017 /* 2043 /*
2018 * chunk-size has to be a power of 2 and multiples of PAGE_SIZE 2044 * chunk-size has to be a power of 2 and multiples of PAGE_SIZE
2019 */ 2045 */
2020 if ( (1 << ffz(~chunk_size)) != chunk_size) { 2046 if ( (1 << ffz(~chunk_size)) != chunk_size) {
2021 printk(KERN_ERR "chunk_size of %d not valid\n", chunk_size); 2047 printk(KERN_ERR "chunk_size of %d not valid\n", chunk_size);
2022 return -EINVAL; 2048 return -EINVAL;
2023 } 2049 }
2024 if (chunk_size < PAGE_SIZE) { 2050 if (chunk_size < PAGE_SIZE) {
2025 printk(KERN_ERR "too small chunk_size: %d < %ld\n", 2051 printk(KERN_ERR "too small chunk_size: %d < %ld\n",
2026 chunk_size, PAGE_SIZE); 2052 chunk_size, PAGE_SIZE);
2027 return -EINVAL; 2053 return -EINVAL;
2028 } 2054 }
2029 2055
2030 /* devices must have minimum size of one chunk */ 2056 /* devices must have minimum size of one chunk */
2031 ITERATE_RDEV(mddev,rdev,tmp) { 2057 ITERATE_RDEV(mddev,rdev,tmp) {
2032 if (test_bit(Faulty, &rdev->flags)) 2058 if (test_bit(Faulty, &rdev->flags))
2033 continue; 2059 continue;
2034 if (rdev->size < chunk_size / 1024) { 2060 if (rdev->size < chunk_size / 1024) {
2035 printk(KERN_WARNING 2061 printk(KERN_WARNING
2036 "md: Dev %s smaller than chunk_size:" 2062 "md: Dev %s smaller than chunk_size:"
2037 " %lluk < %dk\n", 2063 " %lluk < %dk\n",
2038 bdevname(rdev->bdev,b), 2064 bdevname(rdev->bdev,b),
2039 (unsigned long long)rdev->size, 2065 (unsigned long long)rdev->size,
2040 chunk_size / 1024); 2066 chunk_size / 1024);
2041 return -EINVAL; 2067 return -EINVAL;
2042 } 2068 }
2043 } 2069 }
2044 } 2070 }
2045 2071
2046 #ifdef CONFIG_KMOD 2072 #ifdef CONFIG_KMOD
2047 request_module("md-level-%d", mddev->level); 2073 request_module("md-level-%d", mddev->level);
2048 #endif 2074 #endif
2049 2075
2050 /* 2076 /*
2051 * Drop all container device buffers, from now on 2077 * Drop all container device buffers, from now on
2052 * the only valid external interface is through the md 2078 * the only valid external interface is through the md
2053 * device. 2079 * device.
2054 * Also find largest hardsector size 2080 * Also find largest hardsector size
2055 */ 2081 */
2056 ITERATE_RDEV(mddev,rdev,tmp) { 2082 ITERATE_RDEV(mddev,rdev,tmp) {
2057 if (test_bit(Faulty, &rdev->flags)) 2083 if (test_bit(Faulty, &rdev->flags))
2058 continue; 2084 continue;
2059 sync_blockdev(rdev->bdev); 2085 sync_blockdev(rdev->bdev);
2060 invalidate_bdev(rdev->bdev, 0); 2086 invalidate_bdev(rdev->bdev, 0);
2061 } 2087 }
2062 2088
2063 md_probe(mddev->unit, NULL, NULL); 2089 md_probe(mddev->unit, NULL, NULL);
2064 disk = mddev->gendisk; 2090 disk = mddev->gendisk;
2065 if (!disk) 2091 if (!disk)
2066 return -ENOMEM; 2092 return -ENOMEM;
2067 2093
2068 spin_lock(&pers_lock); 2094 spin_lock(&pers_lock);
2069 pers = find_pers(mddev->level); 2095 pers = find_pers(mddev->level);
2070 if (!pers || !try_module_get(pers->owner)) { 2096 if (!pers || !try_module_get(pers->owner)) {
2071 spin_unlock(&pers_lock); 2097 spin_unlock(&pers_lock);
2072 printk(KERN_WARNING "md: personality for level %d is not loaded!\n", 2098 printk(KERN_WARNING "md: personality for level %d is not loaded!\n",
2073 mddev->level); 2099 mddev->level);
2074 return -EINVAL; 2100 return -EINVAL;
2075 } 2101 }
2076 mddev->pers = pers; 2102 mddev->pers = pers;
2077 spin_unlock(&pers_lock); 2103 spin_unlock(&pers_lock);
2078 2104
2079 mddev->recovery = 0; 2105 mddev->recovery = 0;
2080 mddev->resync_max_sectors = mddev->size << 1; /* may be over-ridden by personality */ 2106 mddev->resync_max_sectors = mddev->size << 1; /* may be over-ridden by personality */
2081 mddev->barriers_work = 1; 2107 mddev->barriers_work = 1;
2082 mddev->ok_start_degraded = start_dirty_degraded; 2108 mddev->ok_start_degraded = start_dirty_degraded;
2083 2109
2084 if (start_readonly) 2110 if (start_readonly)
2085 mddev->ro = 2; /* read-only, but switch on first write */ 2111 mddev->ro = 2; /* read-only, but switch on first write */
2086 2112
2087 err = mddev->pers->run(mddev); 2113 err = mddev->pers->run(mddev);
2088 if (!err && mddev->pers->sync_request) { 2114 if (!err && mddev->pers->sync_request) {
2089 err = bitmap_create(mddev); 2115 err = bitmap_create(mddev);
2090 if (err) { 2116 if (err) {
2091 printk(KERN_ERR "%s: failed to create bitmap (%d)\n", 2117 printk(KERN_ERR "%s: failed to create bitmap (%d)\n",
2092 mdname(mddev), err); 2118 mdname(mddev), err);
2093 mddev->pers->stop(mddev); 2119 mddev->pers->stop(mddev);
2094 } 2120 }
2095 } 2121 }
2096 if (err) { 2122 if (err) {
2097 printk(KERN_ERR "md: pers->run() failed ...\n"); 2123 printk(KERN_ERR "md: pers->run() failed ...\n");
2098 module_put(mddev->pers->owner); 2124 module_put(mddev->pers->owner);
2099 mddev->pers = NULL; 2125 mddev->pers = NULL;
2100 bitmap_destroy(mddev); 2126 bitmap_destroy(mddev);
2101 return err; 2127 return err;
2102 } 2128 }
2103 if (mddev->pers->sync_request) 2129 if (mddev->pers->sync_request)
2104 sysfs_create_group(&mddev->kobj, &md_redundancy_group); 2130 sysfs_create_group(&mddev->kobj, &md_redundancy_group);
2105 else if (mddev->ro == 2) /* auto-readonly not meaningful */ 2131 else if (mddev->ro == 2) /* auto-readonly not meaningful */
2106 mddev->ro = 0; 2132 mddev->ro = 0;
2107 2133
2108 atomic_set(&mddev->writes_pending,0); 2134 atomic_set(&mddev->writes_pending,0);
2109 mddev->safemode = 0; 2135 mddev->safemode = 0;
2110 mddev->safemode_timer.function = md_safemode_timeout; 2136 mddev->safemode_timer.function = md_safemode_timeout;
2111 mddev->safemode_timer.data = (unsigned long) mddev; 2137 mddev->safemode_timer.data = (unsigned long) mddev;
2112 mddev->safemode_delay = (20 * HZ)/1000 +1; /* 20 msec delay */ 2138 mddev->safemode_delay = (20 * HZ)/1000 +1; /* 20 msec delay */
2113 mddev->in_sync = 1; 2139 mddev->in_sync = 1;
2114 2140
2115 ITERATE_RDEV(mddev,rdev,tmp) 2141 ITERATE_RDEV(mddev,rdev,tmp)
2116 if (rdev->raid_disk >= 0) { 2142 if (rdev->raid_disk >= 0) {
2117 char nm[20]; 2143 char nm[20];
2118 sprintf(nm, "rd%d", rdev->raid_disk); 2144 sprintf(nm, "rd%d", rdev->raid_disk);
2119 sysfs_create_link(&mddev->kobj, &rdev->kobj, nm); 2145 sysfs_create_link(&mddev->kobj, &rdev->kobj, nm);
2120 } 2146 }
2121 2147
2122 set_bit(MD_RECOVERY_NEEDED, &mddev->recovery); 2148 set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
2123 md_wakeup_thread(mddev->thread); 2149 md_wakeup_thread(mddev->thread);
2124 2150
2125 if (mddev->sb_dirty) 2151 if (mddev->sb_dirty)
2126 md_update_sb(mddev); 2152 md_update_sb(mddev);
2127 2153
2128 set_capacity(disk, mddev->array_size<<1); 2154 set_capacity(disk, mddev->array_size<<1);
2129 2155
2130 /* If we call blk_queue_make_request here, it will 2156 /* If we call blk_queue_make_request here, it will
2131 * re-initialise max_sectors etc which may have been 2157 * re-initialise max_sectors etc which may have been
2132 * refined inside -> run. So just set the bits we need to set. 2158 * refined inside -> run. So just set the bits we need to set.
2133 * Most initialisation happended when we called 2159 * Most initialisation happended when we called
2134 * blk_queue_make_request(..., md_fail_request) 2160 * blk_queue_make_request(..., md_fail_request)
2135 * earlier. 2161 * earlier.
2136 */ 2162 */
2137 mddev->queue->queuedata = mddev; 2163 mddev->queue->queuedata = mddev;
2138 mddev->queue->make_request_fn = mddev->pers->make_request; 2164 mddev->queue->make_request_fn = mddev->pers->make_request;
2139 2165
2140 mddev->changed = 1; 2166 mddev->changed = 1;
2141 md_new_event(mddev); 2167 md_new_event(mddev);
2142 return 0; 2168 return 0;
2143 } 2169 }
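The chunk_size checks in do_md_run() apply to whatever value is in mddev->chunk_size when the array starts. This includes a value set beforehand through the sysfs attribute this commit adds. The sketch restates those checks as a standalone function. SKETCH_PAGE_SIZE (4 KiB) and SKETCH_MAX_CHUNK_SIZE (4 MiB) are assumed values standing in for PAGE_SIZE and MAX_CHUNK_SIZE. lowest_set_bit() is a hypothetical portable replacement for the kernel's ffz(~x).

```c
#include <errno.h>

/* Assumed values; the real PAGE_SIZE and MAX_CHUNK_SIZE come from kernel headers. */
#define SKETCH_PAGE_SIZE      4096
#define SKETCH_MAX_CHUNK_SIZE (4096 * 1024)

/* Index of the lowest set bit of x (x must be non-zero). The kernel's
 * ffz(~x) yields the same index: the first zero of ~x is the first one of x. */
static int lowest_set_bit(unsigned int x)
{
	int i = 0;

	while (!(x & 1u)) {
		x >>= 1;
		i++;
	}
	return i;
}

/* Mirror do_md_run()'s checks on chunk_size (bytes) against the smallest
 * non-faulty member (KiB, like rdev->size). Returns 0 or -EINVAL. */
static int check_chunk_size(int chunk_size, unsigned long long min_dev_kib)
{
	if (chunk_size == 0)
		return 0;	/* do_md_run() skips the checks when unset */
	if (chunk_size > SKETCH_MAX_CHUNK_SIZE)
		return -EINVAL;	/* too big */
	if ((1u << lowest_set_bit((unsigned int)chunk_size)) != (unsigned int)chunk_size)
		return -EINVAL;	/* not a power of two */
	if (chunk_size < SKETCH_PAGE_SIZE)
		return -EINVAL;	/* smaller than a page */
	if (min_dev_kib < (unsigned long long)(chunk_size / 1024))
		return -EINVAL;	/* a member is smaller than one chunk */
	return 0;
}
```

Because ffz(~x) is the index of the lowest set bit of x, (1 << ffz(~x)) == x holds exactly when x has a single bit set, that is, when x is a power of two. A 64 KiB chunk therefore passes on members of at least 64 KiB. A chunk of 12288 bytes (three pages) is rejected because it is not a power of two.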
2144 2170
2145 static int restart_array(mddev_t *mddev) 2171 static int restart_array(mddev_t *mddev)
2146 { 2172 {
2147 struct gendisk *disk = mddev->gendisk; 2173 struct gendisk *disk = mddev->gendisk;
2148 int err; 2174 int err;
2149 2175
2150 /* 2176 /*
2151 * Complain if it has no devices 2177 * Complain if it has no devices
2152 */ 2178 */
2153 err = -ENXIO; 2179 err = -ENXIO;
2154 if (list_empty(&mddev->disks)) 2180 if (list_empty(&mddev->disks))
2155 goto out; 2181 goto out;
2156 2182
2157 if (mddev->pers) { 2183 if (mddev->pers) {
2158 err = -EBUSY; 2184 err = -EBUSY;
2159 if (!mddev->ro) 2185 if (!mddev->ro)
2160 goto out; 2186 goto out;
2161 2187
2162 mddev->safemode = 0; 2188 mddev->safemode = 0;
2163 mddev->ro = 0; 2189 mddev->ro = 0;
2164 set_disk_ro(disk, 0); 2190 set_disk_ro(disk, 0);
2165 2191
2166 printk(KERN_INFO "md: %s switched to read-write mode.\n", 2192 printk(KERN_INFO "md: %s switched to read-write mode.\n",
2167 mdname(mddev)); 2193 mdname(mddev));
2168 /* 2194 /*
2169 * Kick recovery or resync if necessary 2195 * Kick recovery or resync if necessary
2170 */ 2196 */
2171 set_bit(MD_RECOVERY_NEEDED, &mddev->recovery); 2197 set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
2172 md_wakeup_thread(mddev->thread); 2198 md_wakeup_thread(mddev->thread);
2173 err = 0; 2199 err = 0;
2174 } else { 2200 } else {
2175 printk(KERN_ERR "md: %s has no personality assigned.\n", 2201 printk(KERN_ERR "md: %s has no personality assigned.\n",
2176 mdname(mddev)); 2202 mdname(mddev));
2177 err = -EINVAL; 2203 err = -EINVAL;
2178 } 2204 }
2179 2205
2180 out: 2206 out:
2181 return err; 2207 return err;
2182 } 2208 }
2183 2209
2184 static int do_md_stop(mddev_t * mddev, int ro) 2210 static int do_md_stop(mddev_t * mddev, int ro)
2185 { 2211 {
2186 int err = 0; 2212 int err = 0;
2187 struct gendisk *disk = mddev->gendisk; 2213 struct gendisk *disk = mddev->gendisk;
2188 2214
2189 if (mddev->pers) { 2215 if (mddev->pers) {
2190 if (atomic_read(&mddev->active)>2) { 2216 if (atomic_read(&mddev->active)>2) {
2191 printk("md: %s still in use.\n",mdname(mddev)); 2217 printk("md: %s still in use.\n",mdname(mddev));
2192 return -EBUSY; 2218 return -EBUSY;
2193 } 2219 }
2194 2220
2195 if (mddev->sync_thread) { 2221 if (mddev->sync_thread) {
2196 set_bit(MD_RECOVERY_INTR, &mddev->recovery); 2222 set_bit(MD_RECOVERY_INTR, &mddev->recovery);
2197 md_unregister_thread(mddev->sync_thread); 2223 md_unregister_thread(mddev->sync_thread);
2198 mddev->sync_thread = NULL; 2224 mddev->sync_thread = NULL;
2199 } 2225 }
2200 2226
2201 del_timer_sync(&mddev->safemode_timer); 2227 del_timer_sync(&mddev->safemode_timer);
2202 2228
2203 invalidate_partition(disk, 0); 2229 invalidate_partition(disk, 0);
2204 2230
2205 if (ro) { 2231 if (ro) {
2206 err = -ENXIO; 2232 err = -ENXIO;
2207 if (mddev->ro==1) 2233 if (mddev->ro==1)
2208 goto out; 2234 goto out;
2209 mddev->ro = 1; 2235 mddev->ro = 1;
2210 } else { 2236 } else {
2211 bitmap_flush(mddev); 2237 bitmap_flush(mddev);
2212 md_super_wait(mddev); 2238 md_super_wait(mddev);
2213 if (mddev->ro) 2239 if (mddev->ro)
2214 set_disk_ro(disk, 0); 2240 set_disk_ro(disk, 0);
2215 blk_queue_make_request(mddev->queue, md_fail_request); 2241 blk_queue_make_request(mddev->queue, md_fail_request);
2216 mddev->pers->stop(mddev); 2242 mddev->pers->stop(mddev);
2217 if (mddev->pers->sync_request) 2243 if (mddev->pers->sync_request)
2218 sysfs_remove_group(&mddev->kobj, &md_redundancy_group); 2244 sysfs_remove_group(&mddev->kobj, &md_redundancy_group);
2219 2245
2220 module_put(mddev->pers->owner); 2246 module_put(mddev->pers->owner);
2221 mddev->pers = NULL; 2247 mddev->pers = NULL;
2222 if (mddev->ro) 2248 if (mddev->ro)
2223 mddev->ro = 0; 2249 mddev->ro = 0;
2224 } 2250 }
2225 if (!mddev->in_sync) { 2251 if (!mddev->in_sync) {
2226 /* mark array as shutdown cleanly */ 2252 /* mark array as shutdown cleanly */
2227 mddev->in_sync = 1; 2253 mddev->in_sync = 1;
2228 md_update_sb(mddev); 2254 md_update_sb(mddev);
2229 } 2255 }
2230 if (ro) 2256 if (ro)
2231 set_disk_ro(disk, 1); 2257 set_disk_ro(disk, 1);
2232 } 2258 }
2233 2259
2234 bitmap_destroy(mddev); 2260 bitmap_destroy(mddev);
2235 if (mddev->bitmap_file) { 2261 if (mddev->bitmap_file) {
2236 atomic_set(&mddev->bitmap_file->f_dentry->d_inode->i_writecount, 1); 2262 atomic_set(&mddev->bitmap_file->f_dentry->d_inode->i_writecount, 1);
2237 fput(mddev->bitmap_file); 2263 fput(mddev->bitmap_file);
2238 mddev->bitmap_file = NULL; 2264 mddev->bitmap_file = NULL;
2239 } 2265 }
2240 mddev->bitmap_offset = 0; 2266 mddev->bitmap_offset = 0;
2241 2267
2242 /* 2268 /*
2243 * Free resources if final stop 2269 * Free resources if final stop
2244 */ 2270 */
2245 if (!ro) { 2271 if (!ro) {
2246 mdk_rdev_t *rdev; 2272 mdk_rdev_t *rdev;
2247 struct list_head *tmp; 2273 struct list_head *tmp;
2248 struct gendisk *disk; 2274 struct gendisk *disk;
2249 printk(KERN_INFO "md: %s stopped.\n", mdname(mddev)); 2275 printk(KERN_INFO "md: %s stopped.\n", mdname(mddev));
2250 2276
2251 ITERATE_RDEV(mddev,rdev,tmp) 2277 ITERATE_RDEV(mddev,rdev,tmp)
2252 if (rdev->raid_disk >= 0) { 2278 if (rdev->raid_disk >= 0) {
2253 char nm[20]; 2279 char nm[20];
2254 sprintf(nm, "rd%d", rdev->raid_disk); 2280 sprintf(nm, "rd%d", rdev->raid_disk);
2255 sysfs_remove_link(&mddev->kobj, nm); 2281 sysfs_remove_link(&mddev->kobj, nm);
2256 } 2282 }
2257 2283
2258 export_array(mddev); 2284 export_array(mddev);
2259 2285
2260 mddev->array_size = 0; 2286 mddev->array_size = 0;
2261 disk = mddev->gendisk; 2287 disk = mddev->gendisk;
2262 if (disk) 2288 if (disk)
2263 set_capacity(disk, 0); 2289 set_capacity(disk, 0);
2264 mddev->changed = 1; 2290 mddev->changed = 1;
2265 } else 2291 } else
2266 printk(KERN_INFO "md: %s switched to read-only mode.\n", 2292 printk(KERN_INFO "md: %s switched to read-only mode.\n",
2267 mdname(mddev)); 2293 mdname(mddev));
2268 err = 0; 2294 err = 0;
2269 md_new_event(mddev); 2295 md_new_event(mddev);
2270 out: 2296 out:
2271 return err; 2297 return err;
2272 } 2298 }
2273 2299
2274 static void autorun_array(mddev_t *mddev) 2300 static void autorun_array(mddev_t *mddev)
2275 { 2301 {
2276 mdk_rdev_t *rdev; 2302 mdk_rdev_t *rdev;
2277 struct list_head *tmp; 2303 struct list_head *tmp;
2278 int err; 2304 int err;
2279 2305
2280 if (list_empty(&mddev->disks)) 2306 if (list_empty(&mddev->disks))
2281 return; 2307 return;
2282 2308
2283 printk(KERN_INFO "md: running: "); 2309 printk(KERN_INFO "md: running: ");
2284 2310
2285 ITERATE_RDEV(mddev,rdev,tmp) { 2311 ITERATE_RDEV(mddev,rdev,tmp) {
2286 char b[BDEVNAME_SIZE]; 2312 char b[BDEVNAME_SIZE];
2287 printk("<%s>", bdevname(rdev->bdev,b)); 2313 printk("<%s>", bdevname(rdev->bdev,b));
2288 } 2314 }
2289 printk("\n"); 2315 printk("\n");
2290 2316
2291 err = do_md_run (mddev); 2317 err = do_md_run (mddev);
2292 if (err) { 2318 if (err) {
2293 printk(KERN_WARNING "md: do_md_run() returned %d\n", err); 2319 printk(KERN_WARNING "md: do_md_run() returned %d\n", err);
2294 do_md_stop (mddev, 0); 2320 do_md_stop (mddev, 0);
2295 } 2321 }
2296 } 2322 }
2297 2323
2298 /* 2324 /*
2299 * lets try to run arrays based on all disks that have arrived 2325 * lets try to run arrays based on all disks that have arrived
2300 * until now. (those are in pending_raid_disks) 2326 * until now. (those are in pending_raid_disks)
2301 * 2327 *
2302 * the method: pick the first pending disk, collect all disks with 2328 * the method: pick the first pending disk, collect all disks with
2303 * the same UUID, remove all from the pending list and put them into 2329 * the same UUID, remove all from the pending list and put them into
2304 * the 'same_array' list. Then order this list based on superblock 2330 * the 'same_array' list. Then order this list based on superblock
2305 * update time (freshest comes first), kick out 'old' disks and 2331 * update time (freshest comes first), kick out 'old' disks and
2306 * compare superblocks. If everything's fine then run it. 2332 * compare superblocks. If everything's fine then run it.
2307 * 2333 *
2308 * If "unit" is allocated, then bump its reference count 2334 * If "unit" is allocated, then bump its reference count
2309 */ 2335 */
2310 static void autorun_devices(int part) 2336 static void autorun_devices(int part)
2311 { 2337 {
2312 struct list_head candidates; 2338 struct list_head candidates;
2313 struct list_head *tmp; 2339 struct list_head *tmp;
2314 mdk_rdev_t *rdev0, *rdev; 2340 mdk_rdev_t *rdev0, *rdev;
2315 mddev_t *mddev; 2341 mddev_t *mddev;
2316 char b[BDEVNAME_SIZE]; 2342 char b[BDEVNAME_SIZE];
2317 2343
2318 printk(KERN_INFO "md: autorun ...\n"); 2344 printk(KERN_INFO "md: autorun ...\n");
2319 while (!list_empty(&pending_raid_disks)) { 2345 while (!list_empty(&pending_raid_disks)) {
2320 dev_t dev; 2346 dev_t dev;
2321 rdev0 = list_entry(pending_raid_disks.next, 2347 rdev0 = list_entry(pending_raid_disks.next,
2322 mdk_rdev_t, same_set); 2348 mdk_rdev_t, same_set);
2323 2349
2324 printk(KERN_INFO "md: considering %s ...\n", 2350 printk(KERN_INFO "md: considering %s ...\n",
2325 bdevname(rdev0->bdev,b)); 2351 bdevname(rdev0->bdev,b));
2326 INIT_LIST_HEAD(&candidates); 2352 INIT_LIST_HEAD(&candidates);
2327 ITERATE_RDEV_PENDING(rdev,tmp) 2353 ITERATE_RDEV_PENDING(rdev,tmp)
2328 if (super_90_load(rdev, rdev0, 0) >= 0) { 2354 if (super_90_load(rdev, rdev0, 0) >= 0) {
2329 printk(KERN_INFO "md: adding %s ...\n", 2355 printk(KERN_INFO "md: adding %s ...\n",
2330 bdevname(rdev->bdev,b)); 2356 bdevname(rdev->bdev,b));
2331 list_move(&rdev->same_set, &candidates); 2357 list_move(&rdev->same_set, &candidates);
2332 } 2358 }
2333 /* 2359 /*
2334 * now we have a set of devices, with all of them having 2360 * now we have a set of devices, with all of them having
2335 * mostly sane superblocks. It's time to allocate the 2361 * mostly sane superblocks. It's time to allocate the
2336 * mddev. 2362 * mddev.
2337 */ 2363 */
2338 if (rdev0->preferred_minor < 0 || rdev0->preferred_minor >= MAX_MD_DEVS) { 2364 if (rdev0->preferred_minor < 0 || rdev0->preferred_minor >= MAX_MD_DEVS) {
2339 printk(KERN_INFO "md: unit number in %s is bad: %d\n", 2365 printk(KERN_INFO "md: unit number in %s is bad: %d\n",
2340 bdevname(rdev0->bdev, b), rdev0->preferred_minor); 2366 bdevname(rdev0->bdev, b), rdev0->preferred_minor);
2341 break; 2367 break;
2342 } 2368 }
2343 if (part) 2369 if (part)
2344 dev = MKDEV(mdp_major, 2370 dev = MKDEV(mdp_major,
2345 rdev0->preferred_minor << MdpMinorShift); 2371 rdev0->preferred_minor << MdpMinorShift);
2346 else 2372 else
2347 dev = MKDEV(MD_MAJOR, rdev0->preferred_minor); 2373 dev = MKDEV(MD_MAJOR, rdev0->preferred_minor);
2348 2374
2349 md_probe(dev, NULL, NULL); 2375 md_probe(dev, NULL, NULL);
2350 mddev = mddev_find(dev); 2376 mddev = mddev_find(dev);
2351 if (!mddev) { 2377 if (!mddev) {
2352 printk(KERN_ERR 2378 printk(KERN_ERR
2353 "md: cannot allocate memory for md drive.\n"); 2379 "md: cannot allocate memory for md drive.\n");
2354 break; 2380 break;
2355 } 2381 }
2356 if (mddev_lock(mddev)) 2382 if (mddev_lock(mddev))
2357 printk(KERN_WARNING "md: %s locked, cannot run\n", 2383 printk(KERN_WARNING "md: %s locked, cannot run\n",
2358 mdname(mddev)); 2384 mdname(mddev));
2359 else if (mddev->raid_disks || mddev->major_version 2385 else if (mddev->raid_disks || mddev->major_version
2360 || !list_empty(&mddev->disks)) { 2386 || !list_empty(&mddev->disks)) {
2361 printk(KERN_WARNING 2387 printk(KERN_WARNING
2362 "md: %s already running, cannot run %s\n", 2388 "md: %s already running, cannot run %s\n",
2363 mdname(mddev), bdevname(rdev0->bdev,b)); 2389 mdname(mddev), bdevname(rdev0->bdev,b));
2364 mddev_unlock(mddev); 2390 mddev_unlock(mddev);
2365 } else { 2391 } else {
2366 printk(KERN_INFO "md: created %s\n", mdname(mddev)); 2392 printk(KERN_INFO "md: created %s\n", mdname(mddev));
2367 ITERATE_RDEV_GENERIC(candidates,rdev,tmp) { 2393 ITERATE_RDEV_GENERIC(candidates,rdev,tmp) {
2368 list_del_init(&rdev->same_set); 2394 list_del_init(&rdev->same_set);
2369 if (bind_rdev_to_array(rdev, mddev)) 2395 if (bind_rdev_to_array(rdev, mddev))
2370 export_rdev(rdev); 2396 export_rdev(rdev);
2371 } 2397 }
2372 autorun_array(mddev); 2398 autorun_array(mddev);
2373 mddev_unlock(mddev); 2399 mddev_unlock(mddev);
2374 } 2400 }
2375 /* on success, candidates will be empty, on error 2401 /* on success, candidates will be empty, on error
2376 * it won't... 2402 * it won't...
2377 */ 2403 */
2378 ITERATE_RDEV_GENERIC(candidates,rdev,tmp) 2404 ITERATE_RDEV_GENERIC(candidates,rdev,tmp)
2379 export_rdev(rdev); 2405 export_rdev(rdev);
2380 mddev_put(mddev); 2406 mddev_put(mddev);
2381 } 2407 }
2382 printk(KERN_INFO "md: ... autorun DONE.\n"); 2408 printk(KERN_INFO "md: ... autorun DONE.\n");
2383 } 2409 }
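The comment above autorun_devices() describes two steps: group the pending disks by array UUID, then order each group freshest first. In the code shown, the grouping test is super_90_load(rdev, rdev0, 0) >= 0. The ordering and the kicking out of stale members are not visible in this hunk. The sketch below models only the described grouping and ordering over a hypothetical pending_disk array. update_time is an assumed ordering key.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a pending member: only the fields the
 * grouping step needs. The kernel compares 0.90 superblocks instead. */
struct pending_disk {
	unsigned int uuid[4];
	unsigned long long update_time;	/* assumed "freshness" key */
	const char *name;
};

/* qsort comparator: larger update_time sorts first. */
static int freshest_first(const void *a, const void *b)
{
	const struct pending_disk *x = a, *y = b;

	if (x->update_time != y->update_time)
		return x->update_time > y->update_time ? -1 : 1;
	return 0;
}

/* Move every disk sharing pending[0]'s UUID into out[] (freshest first),
 * compact the remaining disks at the front of pending[], update *npending,
 * and return the size of the group taken. */
static size_t take_first_array(struct pending_disk *pending, size_t *npending,
			       struct pending_disk *out)
{
	size_t i, kept = 0, taken = 0;
	unsigned int uuid[4];

	if (*npending == 0)
		return 0;
	memcpy(uuid, pending[0].uuid, sizeof(uuid));
	for (i = 0; i < *npending; i++) {
		if (memcmp(pending[i].uuid, uuid, sizeof(uuid)) == 0)
			out[taken++] = pending[i];
		else
			pending[kept++] = pending[i];
	}
	*npending = kept;
	qsort(out, taken, sizeof(*out), freshest_first);
	return taken;
}
```

Calling take_first_array() repeatedly until *npending is zero mirrors the outer while loop, which keeps picking the first remaining pending disk.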
2384 2410
2385 /* 2411 /*
2386 * import RAID devices based on one partition 2412 * import RAID devices based on one partition
2387 * if possible, the array gets run as well. 2413 * if possible, the array gets run as well.
2388 */ 2414 */
2389 2415
2390 static int autostart_array(dev_t startdev) 2416 static int autostart_array(dev_t startdev)
2391 { 2417 {
2392 char b[BDEVNAME_SIZE]; 2418 char b[BDEVNAME_SIZE];
2393 int err = -EINVAL, i; 2419 int err = -EINVAL, i;
2394 mdp_super_t *sb = NULL; 2420 mdp_super_t *sb = NULL;
2395 mdk_rdev_t *start_rdev = NULL, *rdev; 2421 mdk_rdev_t *start_rdev = NULL, *rdev;
2396 2422
2397 start_rdev = md_import_device(startdev, 0, 0); 2423 start_rdev = md_import_device(startdev, 0, 0);
2398 if (IS_ERR(start_rdev)) 2424 if (IS_ERR(start_rdev))
2399 return err; 2425 return err;
2400 2426
2401 2427
2402 /* NOTE: this can only work for 0.90.0 superblocks */ 2428 /* NOTE: this can only work for 0.90.0 superblocks */
2403 sb = (mdp_super_t*)page_address(start_rdev->sb_page); 2429 sb = (mdp_super_t*)page_address(start_rdev->sb_page);
2404 if (sb->major_version != 0 || 2430 if (sb->major_version != 0 ||
2405 sb->minor_version != 90 ) { 2431 sb->minor_version != 90 ) {
2406 printk(KERN_WARNING "md: can only autostart 0.90.0 arrays\n"); 2432 printk(KERN_WARNING "md: can only autostart 0.90.0 arrays\n");
2407 export_rdev(start_rdev); 2433 export_rdev(start_rdev);
2408 return err; 2434 return err;
2409 } 2435 }
2410 2436
2411 if (test_bit(Faulty, &start_rdev->flags)) { 2437 if (test_bit(Faulty, &start_rdev->flags)) {
2412 printk(KERN_WARNING 2438 printk(KERN_WARNING
2413 "md: can not autostart based on faulty %s!\n", 2439 "md: can not autostart based on faulty %s!\n",
2414 bdevname(start_rdev->bdev,b)); 2440 bdevname(start_rdev->bdev,b));
2415 export_rdev(start_rdev); 2441 export_rdev(start_rdev);
2416 return err; 2442 return err;
2417 } 2443 }
2418 list_add(&start_rdev->same_set, &pending_raid_disks); 2444 list_add(&start_rdev->same_set, &pending_raid_disks);
2419 2445
2420 for (i = 0; i < MD_SB_DISKS; i++) { 2446 for (i = 0; i < MD_SB_DISKS; i++) {
2421 mdp_disk_t *desc = sb->disks + i; 2447 mdp_disk_t *desc = sb->disks + i;
2422 dev_t dev = MKDEV(desc->major, desc->minor); 2448 dev_t dev = MKDEV(desc->major, desc->minor);
2423 2449
2424 if (!dev) 2450 if (!dev)
2425 continue; 2451 continue;
2426 if (dev == startdev) 2452 if (dev == startdev)
2427 continue; 2453 continue;
2428 if (MAJOR(dev) != desc->major || MINOR(dev) != desc->minor) 2454 if (MAJOR(dev) != desc->major || MINOR(dev) != desc->minor)
2429 continue; 2455 continue;
2430 rdev = md_import_device(dev, 0, 0); 2456 rdev = md_import_device(dev, 0, 0);
2431 if (IS_ERR(rdev)) 2457 if (IS_ERR(rdev))
2432 continue; 2458 continue;
2433 2459
2434 list_add(&rdev->same_set, &pending_raid_disks); 2460 list_add(&rdev->same_set, &pending_raid_disks);
2435 } 2461 }
2436 2462
2437 /* 2463 /*
2438 * possibly return codes 2464 * possibly return codes
2439 */ 2465 */
2440 autorun_devices(0); 2466 autorun_devices(0);
2441 return 0; 2467 return 0;
2442 2468
2443 } 2469 }
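autostart_array() builds each member's dev_t with MKDEV() from the 0.90 superblock. It skips the slot if MAJOR()/MINOR() do not return the stored numbers, which catches values too large for the dev_t encoding. The sketch models that encoding on <linux/kdev_t.h>, with 20 minor bits in a 32-bit value. The SK_-prefixed macros and sb_dev_usable() are hypothetical names.

```c
#include <stdint.h>

/* Modelled on <linux/kdev_t.h>: 12-bit major, 20-bit minor in 32 bits. */
#define SK_MINORBITS 20
#define SK_MINORMASK ((1u << SK_MINORBITS) - 1)
#define SK_MAJOR(dev) ((uint32_t)((dev) >> SK_MINORBITS))
#define SK_MINOR(dev) ((uint32_t)((dev) & SK_MINORMASK))
#define SK_MKDEV(ma, mi) ((uint32_t)(((uint32_t)(ma) << SK_MINORBITS) | (uint32_t)(mi)))

/* Return 1 if a superblock's (major, minor) pair survives the round trip
 * through dev_t, mirroring the checks autostart_array() makes before
 * importing a member; 0 means the slot would be skipped. */
static int sb_dev_usable(uint32_t major, uint32_t minor)
{
	uint32_t dev = SK_MKDEV(major, minor);

	if (!dev)
		return 0;	/* empty slot */
	return SK_MAJOR(dev) == major && SK_MINOR(dev) == minor;
}
```

For example, major 4097 wraps to major 1 after the shift, and a minor of 1 << 20 spills into the major field, so both slots are skipped.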
2444 2470
2445 2471
2446 static int get_version(void __user * arg) 2472 static int get_version(void __user * arg)
2447 { 2473 {
2448 mdu_version_t ver; 2474 mdu_version_t ver;
2449 2475
2450 ver.major = MD_MAJOR_VERSION; 2476 ver.major = MD_MAJOR_VERSION;
2451 ver.minor = MD_MINOR_VERSION; 2477 ver.minor = MD_MINOR_VERSION;
2452 ver.patchlevel = MD_PATCHLEVEL_VERSION; 2478 ver.patchlevel = MD_PATCHLEVEL_VERSION;
2453 2479
2454 if (copy_to_user(arg, &ver, sizeof(ver))) 2480 if (copy_to_user(arg, &ver, sizeof(ver)))
2455 return -EFAULT; 2481 return -EFAULT;
2456 2482
2457 return 0; 2483 return 0;
2458 } 2484 }
2459 2485
2460 static int get_array_info(mddev_t * mddev, void __user * arg) 2486 static int get_array_info(mddev_t * mddev, void __user * arg)
2461 { 2487 {
2462 mdu_array_info_t info; 2488 mdu_array_info_t info;
2463 int nr,working,active,failed,spare; 2489 int nr,working,active,failed,spare;
2464 mdk_rdev_t *rdev; 2490 mdk_rdev_t *rdev;
2465 struct list_head *tmp; 2491 struct list_head *tmp;
2466 2492
2467 nr=working=active=failed=spare=0; 2493 nr=working=active=failed=spare=0;
2468 ITERATE_RDEV(mddev,rdev,tmp) { 2494 ITERATE_RDEV(mddev,rdev,tmp) {
2469 nr++; 2495 nr++;
2470 if (test_bit(Faulty, &rdev->flags)) 2496 if (test_bit(Faulty, &rdev->flags))
2471 failed++; 2497 failed++;
2472 else { 2498 else {
2473 working++; 2499 working++;
2474 if (test_bit(In_sync, &rdev->flags)) 2500 if (test_bit(In_sync, &rdev->flags))
2475 active++; 2501 active++;
2476 else 2502 else
2477 spare++; 2503 spare++;
2478 } 2504 }
2479 } 2505 }
2480 2506
2481 info.major_version = mddev->major_version; 2507 info.major_version = mddev->major_version;
2482 info.minor_version = mddev->minor_version; 2508 info.minor_version = mddev->minor_version;
2483 info.patch_version = MD_PATCHLEVEL_VERSION; 2509 info.patch_version = MD_PATCHLEVEL_VERSION;
2484 info.ctime = mddev->ctime; 2510 info.ctime = mddev->ctime;
2485 info.level = mddev->level; 2511 info.level = mddev->level;
2486 info.size = mddev->size; 2512 info.size = mddev->size;
2487 info.nr_disks = nr; 2513 info.nr_disks = nr;
2488 info.raid_disks = mddev->raid_disks; 2514 info.raid_disks = mddev->raid_disks;
2489 info.md_minor = mddev->md_minor; 2515 info.md_minor = mddev->md_minor;
2490 info.not_persistent= !mddev->persistent; 2516 info.not_persistent= !mddev->persistent;
2491 2517
2492 info.utime = mddev->utime; 2518 info.utime = mddev->utime;
2493 info.state = 0; 2519 info.state = 0;
2494 if (mddev->in_sync) 2520 if (mddev->in_sync)
2495 info.state = (1<<MD_SB_CLEAN); 2521 info.state = (1<<MD_SB_CLEAN);
2496 if (mddev->bitmap && mddev->bitmap_offset) 2522 if (mddev->bitmap && mddev->bitmap_offset)
2497 info.state = (1<<MD_SB_BITMAP_PRESENT); 2523 info.state = (1<<MD_SB_BITMAP_PRESENT);
2498 info.active_disks = active; 2524 info.active_disks = active;
2499 info.working_disks = working; 2525 info.working_disks = working;
2500 info.failed_disks = failed; 2526 info.failed_disks = failed;
2501 info.spare_disks = spare; 2527 info.spare_disks = spare;
2502 2528
2503 info.layout = mddev->layout; 2529 info.layout = mddev->layout;
2504 info.chunk_size = mddev->chunk_size; 2530 info.chunk_size = mddev->chunk_size;
2505 2531
2506 if (copy_to_user(arg, &info, sizeof(info))) 2532 if (copy_to_user(arg, &info, sizeof(info)))
2507 return -EFAULT; 2533 return -EFAULT;
2508 2534
2509 return 0; 2535 return 0;
2510 } 2536 }
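get_array_info() counts members and builds a state word. As written, the bitmap line uses `=` rather than `|=`, so when a bitmap is present it discards the MD_SB_CLEAN bit set just before it. The sketch below ORs the bits instead, which appears to be the intent. The bit numbers are assumed from <linux/raid/md_p.h>. struct member is a hypothetical stand-in for the Faulty and In_sync bits in rdev->flags.

```c
/* Assumed bit numbers, as in <linux/raid/md_p.h>; check your headers. */
#define SK_MD_SB_CLEAN          0
#define SK_MD_SB_BITMAP_PRESENT 8

/* Hypothetical per-member flags standing in for rdev->flags bits. */
struct member { int faulty; int in_sync; };

struct array_counts { int nr, working, active, failed, spare; };

/* Classify members as get_array_info() does: faulty members are "failed";
 * the rest are "working", split into "active" (In_sync) and "spare". */
static struct array_counts count_members(const struct member *m, int n)
{
	struct array_counts c = {0, 0, 0, 0, 0};
	int i;

	for (i = 0; i < n; i++) {
		c.nr++;
		if (m[i].faulty) {
			c.failed++;
		} else {
			c.working++;
			if (m[i].in_sync)
				c.active++;
			else
				c.spare++;
		}
	}
	return c;
}

/* Build the state word with |= so that a clean array with a bitmap
 * reports both bits. */
static int array_state(int in_sync, int has_bitmap)
{
	int state = 0;

	if (in_sync)
		state |= 1 << SK_MD_SB_CLEAN;
	if (has_bitmap)
		state |= 1 << SK_MD_SB_BITMAP_PRESENT;
	return state;
}
```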
2511 2537
2512 static int get_bitmap_file(mddev_t * mddev, void __user * arg) 2538 static int get_bitmap_file(mddev_t * mddev, void __user * arg)
2513 { 2539 {
2514 mdu_bitmap_file_t *file = NULL; /* too big for stack allocation */ 2540 mdu_bitmap_file_t *file = NULL; /* too big for stack allocation */
2515 char *ptr, *buf = NULL; 2541 char *ptr, *buf = NULL;
2516 int err = -ENOMEM; 2542 int err = -ENOMEM;
2517 2543
2518 file = kmalloc(sizeof(*file), GFP_KERNEL); 2544 file = kmalloc(sizeof(*file), GFP_KERNEL);
2519 if (!file) 2545 if (!file)
2520 goto out; 2546 goto out;
2521 2547
2522 /* bitmap disabled, zero the first byte and copy out */ 2548 /* bitmap disabled, zero the first byte and copy out */
2523 if (!mddev->bitmap || !mddev->bitmap->file) { 2549 if (!mddev->bitmap || !mddev->bitmap->file) {
2524 file->pathname[0] = '\0'; 2550 file->pathname[0] = '\0';
2525 goto copy_out; 2551 goto copy_out;
2526 } 2552 }
2527 2553
2528 buf = kmalloc(sizeof(file->pathname), GFP_KERNEL); 2554 buf = kmalloc(sizeof(file->pathname), GFP_KERNEL);
2529 if (!buf) 2555 if (!buf)
2530 goto out; 2556 goto out;
2531 2557
2532 ptr = file_path(mddev->bitmap->file, buf, sizeof(file->pathname)); 2558 ptr = file_path(mddev->bitmap->file, buf, sizeof(file->pathname));
2533 if (!ptr) 2559 if (!ptr)
2534 goto out; 2560 goto out;
2535 2561
2536 strcpy(file->pathname, ptr); 2562 strcpy(file->pathname, ptr);
2537 2563
2538 copy_out: 2564 copy_out:
2539 err = 0; 2565 err = 0;
2540 if (copy_to_user(arg, file, sizeof(*file))) 2566 if (copy_to_user(arg, file, sizeof(*file)))
2541 err = -EFAULT; 2567 err = -EFAULT;
2542 out: 2568 out:
2543 kfree(buf); 2569 kfree(buf);
2544 kfree(file); 2570 kfree(file);
2545 return err; 2571 return err;
2546 } 2572 }
2547 2573
2548 static int get_disk_info(mddev_t * mddev, void __user * arg) 2574 static int get_disk_info(mddev_t * mddev, void __user * arg)
2549 { 2575 {
2550 mdu_disk_info_t info; 2576 mdu_disk_info_t info;
2551 unsigned int nr; 2577 unsigned int nr;
2552 mdk_rdev_t *rdev; 2578 mdk_rdev_t *rdev;
2553 2579
2554 if (copy_from_user(&info, arg, sizeof(info))) 2580 if (copy_from_user(&info, arg, sizeof(info)))
2555 return -EFAULT; 2581 return -EFAULT;
2556 2582
2557 nr = info.number; 2583 nr = info.number;
2558 2584
2559 rdev = find_rdev_nr(mddev, nr); 2585 rdev = find_rdev_nr(mddev, nr);
2560 if (rdev) { 2586 if (rdev) {
2561 info.major = MAJOR(rdev->bdev->bd_dev); 2587 info.major = MAJOR(rdev->bdev->bd_dev);
2562 info.minor = MINOR(rdev->bdev->bd_dev); 2588 info.minor = MINOR(rdev->bdev->bd_dev);
2563 info.raid_disk = rdev->raid_disk; 2589 info.raid_disk = rdev->raid_disk;
2564 info.state = 0; 2590 info.state = 0;
2565 if (test_bit(Faulty, &rdev->flags)) 2591 if (test_bit(Faulty, &rdev->flags))
2566 info.state |= (1<<MD_DISK_FAULTY); 2592 info.state |= (1<<MD_DISK_FAULTY);
2567 else if (test_bit(In_sync, &rdev->flags)) { 2593 else if (test_bit(In_sync, &rdev->flags)) {
2568 info.state |= (1<<MD_DISK_ACTIVE); 2594 info.state |= (1<<MD_DISK_ACTIVE);
2569 info.state |= (1<<MD_DISK_SYNC); 2595 info.state |= (1<<MD_DISK_SYNC);
2570 } 2596 }
2571 if (test_bit(WriteMostly, &rdev->flags)) 2597 if (test_bit(WriteMostly, &rdev->flags))
2572 info.state |= (1<<MD_DISK_WRITEMOSTLY); 2598 info.state |= (1<<MD_DISK_WRITEMOSTLY);
2573 } else { 2599 } else {
2574 info.major = info.minor = 0; 2600 info.major = info.minor = 0;
2575 info.raid_disk = -1; 2601 info.raid_disk = -1;
2576 info.state = (1<<MD_DISK_REMOVED); 2602 info.state = (1<<MD_DISK_REMOVED);
2577 } 2603 }
2578 2604
2579 if (copy_to_user(arg, &info, sizeof(info))) 2605 if (copy_to_user(arg, &info, sizeof(info)))
2580 return -EFAULT; 2606 return -EFAULT;
2581 2607
2582 return 0; 2608 return 0;
2583 } 2609 }
2584 2610
2585 static int add_new_disk(mddev_t * mddev, mdu_disk_info_t *info) 2611 static int add_new_disk(mddev_t * mddev, mdu_disk_info_t *info)
2586 { 2612 {
2587 char b[BDEVNAME_SIZE], b2[BDEVNAME_SIZE]; 2613 char b[BDEVNAME_SIZE], b2[BDEVNAME_SIZE];
2588 mdk_rdev_t *rdev; 2614 mdk_rdev_t *rdev;
2589 dev_t dev = MKDEV(info->major,info->minor); 2615 dev_t dev = MKDEV(info->major,info->minor);
2590 2616
2591 if (info->major != MAJOR(dev) || info->minor != MINOR(dev)) 2617 if (info->major != MAJOR(dev) || info->minor != MINOR(dev))
2592 return -EOVERFLOW; 2618 return -EOVERFLOW;
2593 2619
2594 if (!mddev->raid_disks) { 2620 if (!mddev->raid_disks) {
2595 int err; 2621 int err;
2596 /* expecting a device which has a superblock */ 2622 /* expecting a device which has a superblock */
2597 rdev = md_import_device(dev, mddev->major_version, mddev->minor_version); 2623 rdev = md_import_device(dev, mddev->major_version, mddev->minor_version);
2598 if (IS_ERR(rdev)) { 2624 if (IS_ERR(rdev)) {
2599 printk(KERN_WARNING 2625 printk(KERN_WARNING
2600 "md: md_import_device returned %ld\n", 2626 "md: md_import_device returned %ld\n",
2601 PTR_ERR(rdev)); 2627 PTR_ERR(rdev));
2602 return PTR_ERR(rdev); 2628 return PTR_ERR(rdev);
2603 } 2629 }
2604 if (!list_empty(&mddev->disks)) { 2630 if (!list_empty(&mddev->disks)) {
2605 mdk_rdev_t *rdev0 = list_entry(mddev->disks.next, 2631 mdk_rdev_t *rdev0 = list_entry(mddev->disks.next,
2606 mdk_rdev_t, same_set); 2632 mdk_rdev_t, same_set);
2607 int err = super_types[mddev->major_version] 2633 int err = super_types[mddev->major_version]
2608 .load_super(rdev, rdev0, mddev->minor_version); 2634 .load_super(rdev, rdev0, mddev->minor_version);
2609 if (err < 0) { 2635 if (err < 0) {
2610 printk(KERN_WARNING 2636 printk(KERN_WARNING
2611 "md: %s has different UUID to %s\n", 2637 "md: %s has different UUID to %s\n",
2612 bdevname(rdev->bdev,b), 2638 bdevname(rdev->bdev,b),
2613 bdevname(rdev0->bdev,b2)); 2639 bdevname(rdev0->bdev,b2));
2614 export_rdev(rdev); 2640 export_rdev(rdev);
2615 return -EINVAL; 2641 return -EINVAL;
2616 } 2642 }
2617 } 2643 }
2618 err = bind_rdev_to_array(rdev, mddev); 2644 err = bind_rdev_to_array(rdev, mddev);
2619 if (err) 2645 if (err)
2620 export_rdev(rdev); 2646 export_rdev(rdev);
2621 return err; 2647 return err;
2622 } 2648 }
2623 2649
2624 /* 2650 /*
2625 * add_new_disk can be used once the array is assembled 2651 * add_new_disk can be used once the array is assembled
2626 * to add "hot spares". They must already have a superblock 2652 * to add "hot spares". They must already have a superblock
2627 * written 2653 * written
2628 */ 2654 */
2629 if (mddev->pers) { 2655 if (mddev->pers) {
2630 int err; 2656 int err;
2631 if (!mddev->pers->hot_add_disk) { 2657 if (!mddev->pers->hot_add_disk) {
2632 printk(KERN_WARNING 2658 printk(KERN_WARNING
2633 "%s: personality does not support diskops!\n", 2659 "%s: personality does not support diskops!\n",
2634 mdname(mddev)); 2660 mdname(mddev));
2635 return -EINVAL; 2661 return -EINVAL;
2636 } 2662 }
2637 if (mddev->persistent) 2663 if (mddev->persistent)
2638 rdev = md_import_device(dev, mddev->major_version, 2664 rdev = md_import_device(dev, mddev->major_version,
2639 mddev->minor_version); 2665 mddev->minor_version);
2640 else 2666 else
2641 rdev = md_import_device(dev, -1, -1); 2667 rdev = md_import_device(dev, -1, -1);
2642 if (IS_ERR(rdev)) { 2668 if (IS_ERR(rdev)) {
2643 printk(KERN_WARNING 2669 printk(KERN_WARNING
2644 "md: md_import_device returned %ld\n", 2670 "md: md_import_device returned %ld\n",
2645 PTR_ERR(rdev)); 2671 PTR_ERR(rdev));
2646 return PTR_ERR(rdev); 2672 return PTR_ERR(rdev);
2647 } 2673 }
2648 /* set save_raid_disk if appropriate */ 2674 /* set save_raid_disk if appropriate */
2649 if (!mddev->persistent) { 2675 if (!mddev->persistent) {
2650 if (info->state & (1<<MD_DISK_SYNC) && 2676 if (info->state & (1<<MD_DISK_SYNC) &&
2651 info->raid_disk < mddev->raid_disks) 2677 info->raid_disk < mddev->raid_disks)
2652 rdev->raid_disk = info->raid_disk; 2678 rdev->raid_disk = info->raid_disk;
2653 else 2679 else
2654 rdev->raid_disk = -1; 2680 rdev->raid_disk = -1;
2655 } else 2681 } else
2656 super_types[mddev->major_version]. 2682 super_types[mddev->major_version].
2657 validate_super(mddev, rdev); 2683 validate_super(mddev, rdev);
2658 rdev->saved_raid_disk = rdev->raid_disk; 2684 rdev->saved_raid_disk = rdev->raid_disk;
2659 2685
2660 clear_bit(In_sync, &rdev->flags); /* just to be sure */ 2686 clear_bit(In_sync, &rdev->flags); /* just to be sure */
2661 if (info->state & (1<<MD_DISK_WRITEMOSTLY)) 2687 if (info->state & (1<<MD_DISK_WRITEMOSTLY))
2662 set_bit(WriteMostly, &rdev->flags); 2688 set_bit(WriteMostly, &rdev->flags);
2663 2689
2664 rdev->raid_disk = -1; 2690 rdev->raid_disk = -1;
2665 err = bind_rdev_to_array(rdev, mddev); 2691 err = bind_rdev_to_array(rdev, mddev);
2666 if (err) 2692 if (err)
2667 export_rdev(rdev); 2693 export_rdev(rdev);
2668 2694
2669 set_bit(MD_RECOVERY_NEEDED, &mddev->recovery); 2695 set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
2670 md_wakeup_thread(mddev->thread); 2696 md_wakeup_thread(mddev->thread);
2671 return err; 2697 return err;
2672 } 2698 }
2673 2699
2674 /* otherwise, add_new_disk is only allowed 2700 /* otherwise, add_new_disk is only allowed
2675 * for major_version==0 superblocks 2701 * for major_version==0 superblocks
2676 */ 2702 */
2677 if (mddev->major_version != 0) { 2703 if (mddev->major_version != 0) {
2678 printk(KERN_WARNING "%s: ADD_NEW_DISK not supported\n", 2704 printk(KERN_WARNING "%s: ADD_NEW_DISK not supported\n",
2679 mdname(mddev)); 2705 mdname(mddev));
2680 return -EINVAL; 2706 return -EINVAL;
2681 } 2707 }
2682 2708
2683 if (!(info->state & (1<<MD_DISK_FAULTY))) { 2709 if (!(info->state & (1<<MD_DISK_FAULTY))) {
2684 int err; 2710 int err;
2685 rdev = md_import_device (dev, -1, 0); 2711 rdev = md_import_device (dev, -1, 0);
2686 if (IS_ERR(rdev)) { 2712 if (IS_ERR(rdev)) {
2687 printk(KERN_WARNING 2713 printk(KERN_WARNING
2688 "md: error, md_import_device() returned %ld\n", 2714 "md: error, md_import_device() returned %ld\n",
2689 PTR_ERR(rdev)); 2715 PTR_ERR(rdev));
2690 return PTR_ERR(rdev); 2716 return PTR_ERR(rdev);
2691 } 2717 }
2692 rdev->desc_nr = info->number; 2718 rdev->desc_nr = info->number;
2693 if (info->raid_disk < mddev->raid_disks) 2719 if (info->raid_disk < mddev->raid_disks)
2694 rdev->raid_disk = info->raid_disk; 2720 rdev->raid_disk = info->raid_disk;
2695 else 2721 else
2696 rdev->raid_disk = -1; 2722 rdev->raid_disk = -1;
2697 2723
2698 rdev->flags = 0; 2724 rdev->flags = 0;
2699 2725
2700 if (rdev->raid_disk < mddev->raid_disks) 2726 if (rdev->raid_disk < mddev->raid_disks)
2701 if (info->state & (1<<MD_DISK_SYNC)) 2727 if (info->state & (1<<MD_DISK_SYNC))
2702 set_bit(In_sync, &rdev->flags); 2728 set_bit(In_sync, &rdev->flags);
2703 2729
2704 if (info->state & (1<<MD_DISK_WRITEMOSTLY)) 2730 if (info->state & (1<<MD_DISK_WRITEMOSTLY))
2705 set_bit(WriteMostly, &rdev->flags); 2731 set_bit(WriteMostly, &rdev->flags);
2706 2732
2707 err = bind_rdev_to_array(rdev, mddev); 2733 err = bind_rdev_to_array(rdev, mddev);
2708 if (err) { 2734 if (err) {
2709 export_rdev(rdev); 2735 export_rdev(rdev);
2710 return err; 2736 return err;
2711 } 2737 }
2712 2738
2713 if (!mddev->persistent) { 2739 if (!mddev->persistent) {
2714 printk(KERN_INFO "md: nonpersistent superblock ...\n"); 2740 printk(KERN_INFO "md: nonpersistent superblock ...\n");
2715 rdev->sb_offset = rdev->bdev->bd_inode->i_size >> BLOCK_SIZE_BITS; 2741 rdev->sb_offset = rdev->bdev->bd_inode->i_size >> BLOCK_SIZE_BITS;
2716 } else 2742 } else
2717 rdev->sb_offset = calc_dev_sboffset(rdev->bdev); 2743 rdev->sb_offset = calc_dev_sboffset(rdev->bdev);
2718 rdev->size = calc_dev_size(rdev, mddev->chunk_size); 2744 rdev->size = calc_dev_size(rdev, mddev->chunk_size);
2719 2745
2720 if (!mddev->size || (mddev->size > rdev->size)) 2746 if (!mddev->size || (mddev->size > rdev->size))
2721 mddev->size = rdev->size; 2747 mddev->size = rdev->size;
2722 } 2748 }
2723 2749
2724 return 0; 2750 return 0;
2725 } 2751 }
2726 2752
2727 static int hot_remove_disk(mddev_t * mddev, dev_t dev) 2753 static int hot_remove_disk(mddev_t * mddev, dev_t dev)
2728 { 2754 {
2729 char b[BDEVNAME_SIZE]; 2755 char b[BDEVNAME_SIZE];
2730 mdk_rdev_t *rdev; 2756 mdk_rdev_t *rdev;
2731 2757
2732 if (!mddev->pers) 2758 if (!mddev->pers)
2733 return -ENODEV; 2759 return -ENODEV;
2734 2760
2735 rdev = find_rdev(mddev, dev); 2761 rdev = find_rdev(mddev, dev);
2736 if (!rdev) 2762 if (!rdev)
2737 return -ENXIO; 2763 return -ENXIO;
2738 2764
2739 if (rdev->raid_disk >= 0) 2765 if (rdev->raid_disk >= 0)
2740 goto busy; 2766 goto busy;
2741 2767
2742 kick_rdev_from_array(rdev); 2768 kick_rdev_from_array(rdev);
2743 md_update_sb(mddev); 2769 md_update_sb(mddev);
2744 md_new_event(mddev); 2770 md_new_event(mddev);
2745 2771
2746 return 0; 2772 return 0;
2747 busy: 2773 busy:
2748 printk(KERN_WARNING "md: cannot remove active disk %s from %s ... \n", 2774 printk(KERN_WARNING "md: cannot remove active disk %s from %s ... \n",
2749 bdevname(rdev->bdev,b), mdname(mddev)); 2775 bdevname(rdev->bdev,b), mdname(mddev));
2750 return -EBUSY; 2776 return -EBUSY;
2751 } 2777 }
2752 2778
2753 static int hot_add_disk(mddev_t * mddev, dev_t dev) 2779 static int hot_add_disk(mddev_t * mddev, dev_t dev)
2754 { 2780 {
2755 char b[BDEVNAME_SIZE]; 2781 char b[BDEVNAME_SIZE];
2756 int err; 2782 int err;
2757 unsigned int size; 2783 unsigned int size;
2758 mdk_rdev_t *rdev; 2784 mdk_rdev_t *rdev;
2759 2785
2760 if (!mddev->pers) 2786 if (!mddev->pers)
2761 return -ENODEV; 2787 return -ENODEV;
2762 2788
2763 if (mddev->major_version != 0) { 2789 if (mddev->major_version != 0) {
2764 printk(KERN_WARNING "%s: HOT_ADD may only be used with" 2790 printk(KERN_WARNING "%s: HOT_ADD may only be used with"
2765 " version-0 superblocks.\n", 2791 " version-0 superblocks.\n",
2766 mdname(mddev)); 2792 mdname(mddev));
2767 return -EINVAL; 2793 return -EINVAL;
2768 } 2794 }
2769 if (!mddev->pers->hot_add_disk) { 2795 if (!mddev->pers->hot_add_disk) {
2770 printk(KERN_WARNING 2796 printk(KERN_WARNING
2771 "%s: personality does not support diskops!\n", 2797 "%s: personality does not support diskops!\n",
2772 mdname(mddev)); 2798 mdname(mddev));
2773 return -EINVAL; 2799 return -EINVAL;
2774 } 2800 }
2775 2801
2776 rdev = md_import_device (dev, -1, 0); 2802 rdev = md_import_device (dev, -1, 0);
2777 if (IS_ERR(rdev)) { 2803 if (IS_ERR(rdev)) {
2778 printk(KERN_WARNING 2804 printk(KERN_WARNING
2779 "md: error, md_import_device() returned %ld\n", 2805 "md: error, md_import_device() returned %ld\n",
2780 PTR_ERR(rdev)); 2806 PTR_ERR(rdev));
2781 return -EINVAL; 2807 return -EINVAL;
2782 } 2808 }
2783 2809
2784 if (mddev->persistent) 2810 if (mddev->persistent)
2785 rdev->sb_offset = calc_dev_sboffset(rdev->bdev); 2811 rdev->sb_offset = calc_dev_sboffset(rdev->bdev);
2786 else 2812 else
2787 rdev->sb_offset = 2813 rdev->sb_offset =
2788 rdev->bdev->bd_inode->i_size >> BLOCK_SIZE_BITS; 2814 rdev->bdev->bd_inode->i_size >> BLOCK_SIZE_BITS;
2789 2815
2790 size = calc_dev_size(rdev, mddev->chunk_size); 2816 size = calc_dev_size(rdev, mddev->chunk_size);
2791 rdev->size = size; 2817 rdev->size = size;
2792 2818
2793 if (size < mddev->size) { 2819 if (size < mddev->size) {
2794 printk(KERN_WARNING 2820 printk(KERN_WARNING
2795 "%s: disk size %llu blocks < array size %llu\n", 2821 "%s: disk size %llu blocks < array size %llu\n",
2796 mdname(mddev), (unsigned long long)size, 2822 mdname(mddev), (unsigned long long)size,
2797 (unsigned long long)mddev->size); 2823 (unsigned long long)mddev->size);
2798 err = -ENOSPC; 2824 err = -ENOSPC;
2799 goto abort_export; 2825 goto abort_export;
2800 } 2826 }
2801 2827
2802 if (test_bit(Faulty, &rdev->flags)) { 2828 if (test_bit(Faulty, &rdev->flags)) {
2803 printk(KERN_WARNING 2829 printk(KERN_WARNING
2804 "md: can not hot-add faulty %s disk to %s!\n", 2830 "md: can not hot-add faulty %s disk to %s!\n",
2805 bdevname(rdev->bdev,b), mdname(mddev)); 2831 bdevname(rdev->bdev,b), mdname(mddev));
2806 err = -EINVAL; 2832 err = -EINVAL;
2807 goto abort_export; 2833 goto abort_export;
2808 } 2834 }
2809 clear_bit(In_sync, &rdev->flags); 2835 clear_bit(In_sync, &rdev->flags);
2810 rdev->desc_nr = -1; 2836 rdev->desc_nr = -1;
2811 bind_rdev_to_array(rdev, mddev); 2837 bind_rdev_to_array(rdev, mddev);
2812 2838
2813 /* 2839 /*
2814 * The rest should better be atomic, we can have disk failures 2840 * The rest should better be atomic, we can have disk failures
2815 * noticed in interrupt contexts ... 2841 * noticed in interrupt contexts ...
2816 */ 2842 */
2817 2843
2818 if (rdev->desc_nr == mddev->max_disks) { 2844 if (rdev->desc_nr == mddev->max_disks) {
2819 printk(KERN_WARNING "%s: can not hot-add to full array!\n", 2845 printk(KERN_WARNING "%s: can not hot-add to full array!\n",
2820 mdname(mddev)); 2846 mdname(mddev));
2821 err = -EBUSY; 2847 err = -EBUSY;
2822 goto abort_unbind_export; 2848 goto abort_unbind_export;
2823 } 2849 }
2824 2850
2825 rdev->raid_disk = -1; 2851 rdev->raid_disk = -1;
2826 2852
2827 md_update_sb(mddev); 2853 md_update_sb(mddev);
2828 2854
2829 /* 2855 /*
2830 * Kick recovery, maybe this spare has to be added to the 2856 * Kick recovery, maybe this spare has to be added to the
2831 * array immediately. 2857 * array immediately.
2832 */ 2858 */
2833 set_bit(MD_RECOVERY_NEEDED, &mddev->recovery); 2859 set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
2834 md_wakeup_thread(mddev->thread); 2860 md_wakeup_thread(mddev->thread);
2835 md_new_event(mddev); 2861 md_new_event(mddev);
2836 return 0; 2862 return 0;
2837 2863
2838 abort_unbind_export: 2864 abort_unbind_export:
2839 unbind_rdev_from_array(rdev); 2865 unbind_rdev_from_array(rdev);
2840 2866
2841 abort_export: 2867 abort_export:
2842 export_rdev(rdev); 2868 export_rdev(rdev);
2843 return err; 2869 return err;
2844 } 2870 }
2845 2871
2846 /* similar to deny_write_access, but accounts for our holding a reference 2872 /* similar to deny_write_access, but accounts for our holding a reference
2847 * to the file ourselves */ 2873 * to the file ourselves */
2848 static int deny_bitmap_write_access(struct file * file) 2874 static int deny_bitmap_write_access(struct file * file)
2849 { 2875 {
2850 struct inode *inode = file->f_mapping->host; 2876 struct inode *inode = file->f_mapping->host;
2851 2877
2852 spin_lock(&inode->i_lock); 2878 spin_lock(&inode->i_lock);
2853 if (atomic_read(&inode->i_writecount) > 1) { 2879 if (atomic_read(&inode->i_writecount) > 1) {
2854 spin_unlock(&inode->i_lock); 2880 spin_unlock(&inode->i_lock);
2855 return -ETXTBSY; 2881 return -ETXTBSY;
2856 } 2882 }
2857 atomic_set(&inode->i_writecount, -1); 2883 atomic_set(&inode->i_writecount, -1);
2858 spin_unlock(&inode->i_lock); 2884 spin_unlock(&inode->i_lock);
2859 2885
2860 return 0; 2886 return 0;
2861 } 2887 }
2862 2888
2863 static int set_bitmap_file(mddev_t *mddev, int fd) 2889 static int set_bitmap_file(mddev_t *mddev, int fd)
2864 { 2890 {
2865 int err; 2891 int err;
2866 2892
2867 if (mddev->pers) { 2893 if (mddev->pers) {
2868 if (!mddev->pers->quiesce) 2894 if (!mddev->pers->quiesce)
2869 return -EBUSY; 2895 return -EBUSY;
2870 if (mddev->recovery || mddev->sync_thread) 2896 if (mddev->recovery || mddev->sync_thread)
2871 return -EBUSY; 2897 return -EBUSY;
2872 /* we should be able to change the bitmap.. */ 2898 /* we should be able to change the bitmap.. */
2873 } 2899 }
2874 2900
2875 2901
2876 if (fd >= 0) { 2902 if (fd >= 0) {
2877 if (mddev->bitmap) 2903 if (mddev->bitmap)
2878 return -EEXIST; /* cannot add when bitmap is present */ 2904 return -EEXIST; /* cannot add when bitmap is present */
2879 mddev->bitmap_file = fget(fd); 2905 mddev->bitmap_file = fget(fd);
2880 2906
2881 if (mddev->bitmap_file == NULL) { 2907 if (mddev->bitmap_file == NULL) {
2882 printk(KERN_ERR "%s: error: failed to get bitmap file\n", 2908 printk(KERN_ERR "%s: error: failed to get bitmap file\n",
2883 mdname(mddev)); 2909 mdname(mddev));
2884 return -EBADF; 2910 return -EBADF;
2885 } 2911 }
2886 2912
2887 err = deny_bitmap_write_access(mddev->bitmap_file); 2913 err = deny_bitmap_write_access(mddev->bitmap_file);
2888 if (err) { 2914 if (err) {
2889 printk(KERN_ERR "%s: error: bitmap file is already in use\n", 2915 printk(KERN_ERR "%s: error: bitmap file is already in use\n",
2890 mdname(mddev)); 2916 mdname(mddev));
2891 fput(mddev->bitmap_file); 2917 fput(mddev->bitmap_file);
2892 mddev->bitmap_file = NULL; 2918 mddev->bitmap_file = NULL;
2893 return err; 2919 return err;
2894 } 2920 }
2895 mddev->bitmap_offset = 0; /* file overrides offset */ 2921 mddev->bitmap_offset = 0; /* file overrides offset */
2896 } else if (mddev->bitmap == NULL) 2922 } else if (mddev->bitmap == NULL)
2897 return -ENOENT; /* cannot remove what isn't there */ 2923 return -ENOENT; /* cannot remove what isn't there */
2898 err = 0; 2924 err = 0;
2899 if (mddev->pers) { 2925 if (mddev->pers) {
2900 mddev->pers->quiesce(mddev, 1); 2926 mddev->pers->quiesce(mddev, 1);
2901 if (fd >= 0) 2927 if (fd >= 0)
2902 err = bitmap_create(mddev); 2928 err = bitmap_create(mddev);
2903 if (fd < 0 || err) 2929 if (fd < 0 || err)
2904 bitmap_destroy(mddev); 2930 bitmap_destroy(mddev);
2905 mddev->pers->quiesce(mddev, 0); 2931 mddev->pers->quiesce(mddev, 0);
2906 } else if (fd < 0) { 2932 } else if (fd < 0) {
2907 if (mddev->bitmap_file) 2933 if (mddev->bitmap_file)
2908 fput(mddev->bitmap_file); 2934 fput(mddev->bitmap_file);
2909 mddev->bitmap_file = NULL; 2935 mddev->bitmap_file = NULL;
2910 } 2936 }
2911 2937
2912 return err; 2938 return err;
2913 } 2939 }
2914 2940
2915 /* 2941 /*
2916 * set_array_info is used in two different ways 2942 * set_array_info is used in two different ways
2917 * The original usage is when creating a new array. 2943 * The original usage is when creating a new array.
2918 * In this usage, raid_disks is > 0 and it together with 2944 * In this usage, raid_disks is > 0 and it together with
2919 * level, size, not_persistent,layout,chunksize determine the 2945 * level, size, not_persistent,layout,chunksize determine the
2920 * shape of the array. 2946 * shape of the array.
2921 * This will always create an array with a type-0.90.0 superblock. 2947 * This will always create an array with a type-0.90.0 superblock.
2922 * The newer usage is when assembling an array. 2948 * The newer usage is when assembling an array.
2923 * In this case raid_disks will be 0, and the major_version field is 2949 * In this case raid_disks will be 0, and the major_version field is
2924 * used to determine which style super-blocks are to be found on the devices. 2950 * used to determine which style super-blocks are to be found on the devices.
2925 * The minor and patch _version numbers are also kept in case the 2951 * The minor and patch _version numbers are also kept in case the
2926 * super_block handler wishes to interpret them. 2952 * super_block handler wishes to interpret them.
2927 */ 2953 */
2928 static int set_array_info(mddev_t * mddev, mdu_array_info_t *info) 2954 static int set_array_info(mddev_t * mddev, mdu_array_info_t *info)
2929 { 2955 {
2930 2956
2931 if (info->raid_disks == 0) { 2957 if (info->raid_disks == 0) {
2932 /* just setting version number for superblock loading */ 2958 /* just setting version number for superblock loading */
2933 if (info->major_version < 0 || 2959 if (info->major_version < 0 ||
2934 info->major_version >= sizeof(super_types)/sizeof(super_types[0]) || 2960 info->major_version >= sizeof(super_types)/sizeof(super_types[0]) ||
2935 super_types[info->major_version].name == NULL) { 2961 super_types[info->major_version].name == NULL) {
2936 /* maybe try to auto-load a module? */ 2962 /* maybe try to auto-load a module? */
2937 printk(KERN_INFO 2963 printk(KERN_INFO
2938 "md: superblock version %d not known\n", 2964 "md: superblock version %d not known\n",
2939 info->major_version); 2965 info->major_version);
2940 return -EINVAL; 2966 return -EINVAL;
2941 } 2967 }
2942 mddev->major_version = info->major_version; 2968 mddev->major_version = info->major_version;
2943 mddev->minor_version = info->minor_version; 2969 mddev->minor_version = info->minor_version;
2944 mddev->patch_version = info->patch_version; 2970 mddev->patch_version = info->patch_version;
2945 return 0; 2971 return 0;
2946 } 2972 }
2947 mddev->major_version = MD_MAJOR_VERSION; 2973 mddev->major_version = MD_MAJOR_VERSION;
2948 mddev->minor_version = MD_MINOR_VERSION; 2974 mddev->minor_version = MD_MINOR_VERSION;
2949 mddev->patch_version = MD_PATCHLEVEL_VERSION; 2975 mddev->patch_version = MD_PATCHLEVEL_VERSION;
2950 mddev->ctime = get_seconds(); 2976 mddev->ctime = get_seconds();
2951 2977
2952 mddev->level = info->level; 2978 mddev->level = info->level;
2953 mddev->size = info->size; 2979 mddev->size = info->size;
2954 mddev->raid_disks = info->raid_disks; 2980 mddev->raid_disks = info->raid_disks;
2955 /* don't set md_minor, it is determined by which /dev/md* was 2981 /* don't set md_minor, it is determined by which /dev/md* was
2956 * opened 2982 * opened
2957 */ 2983 */
2958 if (info->state & (1<<MD_SB_CLEAN)) 2984 if (info->state & (1<<MD_SB_CLEAN))
2959 mddev->recovery_cp = MaxSector; 2985 mddev->recovery_cp = MaxSector;
2960 else 2986 else
2961 mddev->recovery_cp = 0; 2987 mddev->recovery_cp = 0;
2962 mddev->persistent = ! info->not_persistent; 2988 mddev->persistent = ! info->not_persistent;
2963 2989
2964 mddev->layout = info->layout; 2990 mddev->layout = info->layout;
2965 mddev->chunk_size = info->chunk_size; 2991 mddev->chunk_size = info->chunk_size;
2966 2992
2967 mddev->max_disks = MD_SB_DISKS; 2993 mddev->max_disks = MD_SB_DISKS;
2968 2994
2969 mddev->sb_dirty = 1; 2995 mddev->sb_dirty = 1;
2970 2996
2971 mddev->default_bitmap_offset = MD_SB_BYTES >> 9; 2997 mddev->default_bitmap_offset = MD_SB_BYTES >> 9;
2972 mddev->bitmap_offset = 0; 2998 mddev->bitmap_offset = 0;
2973 2999
2974 /* 3000 /*
2975 * Generate a 128 bit UUID 3001 * Generate a 128 bit UUID
2976 */ 3002 */
2977 get_random_bytes(mddev->uuid, 16); 3003 get_random_bytes(mddev->uuid, 16);
2978 3004
2979 return 0; 3005 return 0;
2980 } 3006 }
2981 3007
2982 /* 3008 /*
2983 * update_array_info is used to change the configuration of an 3009 * update_array_info is used to change the configuration of an
2984 * on-line array. 3010 * on-line array.
2985 * The version, ctime,level,size,raid_disks,not_persistent, layout,chunk_size 3011 * The version, ctime,level,size,raid_disks,not_persistent, layout,chunk_size
2986 * fields in the info are checked against the array. 3012 * fields in the info are checked against the array.
2987 * Any differences that cannot be handled will cause an error. 3013 * Any differences that cannot be handled will cause an error.
2988 * Normally, only one change can be managed at a time. 3014 * Normally, only one change can be managed at a time.
2989 */ 3015 */
2990 static int update_array_info(mddev_t *mddev, mdu_array_info_t *info) 3016 static int update_array_info(mddev_t *mddev, mdu_array_info_t *info)
2991 { 3017 {
2992 int rv = 0; 3018 int rv = 0;
2993 int cnt = 0; 3019 int cnt = 0;
2994 int state = 0; 3020 int state = 0;
2995 3021
2996 /* calculate expected state,ignoring low bits */ 3022 /* calculate expected state,ignoring low bits */
2997 if (mddev->bitmap && mddev->bitmap_offset) 3023 if (mddev->bitmap && mddev->bitmap_offset)
2998 state |= (1 << MD_SB_BITMAP_PRESENT); 3024 state |= (1 << MD_SB_BITMAP_PRESENT);
2999 3025
3000 if (mddev->major_version != info->major_version || 3026 if (mddev->major_version != info->major_version ||
3001 mddev->minor_version != info->minor_version || 3027 mddev->minor_version != info->minor_version ||
3002 /* mddev->patch_version != info->patch_version || */ 3028 /* mddev->patch_version != info->patch_version || */
3003 mddev->ctime != info->ctime || 3029 mddev->ctime != info->ctime ||
3004 mddev->level != info->level || 3030 mddev->level != info->level ||
3005 /* mddev->layout != info->layout || */ 3031 /* mddev->layout != info->layout || */
3006 !mddev->persistent != info->not_persistent|| 3032 !mddev->persistent != info->not_persistent||
3007 mddev->chunk_size != info->chunk_size || 3033 mddev->chunk_size != info->chunk_size ||
3008 /* ignore bottom 8 bits of state, and allow SB_BITMAP_PRESENT to change */ 3034 /* ignore bottom 8 bits of state, and allow SB_BITMAP_PRESENT to change */
3009 ((state^info->state) & 0xfffffe00) 3035 ((state^info->state) & 0xfffffe00)
3010 ) 3036 )
3011 return -EINVAL; 3037 return -EINVAL;
3012 /* Check there is only one change */ 3038 /* Check there is only one change */
3013 if (mddev->size != info->size) cnt++; 3039 if (mddev->size != info->size) cnt++;
3014 if (mddev->raid_disks != info->raid_disks) cnt++; 3040 if (mddev->raid_disks != info->raid_disks) cnt++;
3015 if (mddev->layout != info->layout) cnt++; 3041 if (mddev->layout != info->layout) cnt++;
3016 if ((state ^ info->state) & (1<<MD_SB_BITMAP_PRESENT)) cnt++; 3042 if ((state ^ info->state) & (1<<MD_SB_BITMAP_PRESENT)) cnt++;
3017 if (cnt == 0) return 0; 3043 if (cnt == 0) return 0;
3018 if (cnt > 1) return -EINVAL; 3044 if (cnt > 1) return -EINVAL;
3019 3045
3020 if (mddev->layout != info->layout) { 3046 if (mddev->layout != info->layout) {
3021 /* Change layout 3047 /* Change layout
3022 * we don't need to do anything at the md level, the 3048 * we don't need to do anything at the md level, the
3023 * personality will take care of it all. 3049 * personality will take care of it all.
3024 */ 3050 */
3025 if (mddev->pers->reconfig == NULL) 3051 if (mddev->pers->reconfig == NULL)
3026 return -EINVAL; 3052 return -EINVAL;
3027 else 3053 else
3028 return mddev->pers->reconfig(mddev, info->layout, -1); 3054 return mddev->pers->reconfig(mddev, info->layout, -1);
3029 } 3055 }
3030 if (mddev->size != info->size) { 3056 if (mddev->size != info->size) {
3031 mdk_rdev_t * rdev; 3057 mdk_rdev_t * rdev;
3032 struct list_head *tmp; 3058 struct list_head *tmp;
3033 if (mddev->pers->resize == NULL) 3059 if (mddev->pers->resize == NULL)
3034 return -EINVAL; 3060 return -EINVAL;
3035 /* The "size" is the amount of each device that is used. 3061 /* The "size" is the amount of each device that is used.
3036 * This can only make sense for arrays with redundancy. 3062 * This can only make sense for arrays with redundancy.
3037 * linear and raid0 always use whatever space is available 3063 * linear and raid0 always use whatever space is available
3038 * We can only consider changing the size if no resync 3064 * We can only consider changing the size if no resync
3039 * or reconstruction is happening, and if the new size 3065 * or reconstruction is happening, and if the new size
3040 * is acceptable. It must fit before the sb_offset or, 3066 * is acceptable. It must fit before the sb_offset or,
3041 * if that is <data_offset, it must fit before the 3067 * if that is <data_offset, it must fit before the
3042 * size of each device. 3068 * size of each device.
3043 * If size is zero, we find the largest size that fits. 3069 * If size is zero, we find the largest size that fits.
3044 */ 3070 */
3045 if (mddev->sync_thread) 3071 if (mddev->sync_thread)
3046 return -EBUSY; 3072 return -EBUSY;
3047 ITERATE_RDEV(mddev,rdev,tmp) { 3073 ITERATE_RDEV(mddev,rdev,tmp) {
3048 sector_t avail; 3074 sector_t avail;
3049 int fit = (info->size == 0); 3075 int fit = (info->size == 0);
3050 if (rdev->sb_offset > rdev->data_offset) 3076 if (rdev->sb_offset > rdev->data_offset)
3051 avail = (rdev->sb_offset*2) - rdev->data_offset; 3077 avail = (rdev->sb_offset*2) - rdev->data_offset;
3052 else 3078 else
3053 avail = get_capacity(rdev->bdev->bd_disk) 3079 avail = get_capacity(rdev->bdev->bd_disk)
3054 - rdev->data_offset; 3080 - rdev->data_offset;
3055 if (fit && (info->size == 0 || info->size > avail/2)) 3081 if (fit && (info->size == 0 || info->size > avail/2))
3056 info->size = avail/2; 3082 info->size = avail/2;
3057 if (avail < ((sector_t)info->size << 1)) 3083 if (avail < ((sector_t)info->size << 1))
3058 return -ENOSPC; 3084 return -ENOSPC;
3059 } 3085 }
3060 rv = mddev->pers->resize(mddev, (sector_t)info->size *2); 3086 rv = mddev->pers->resize(mddev, (sector_t)info->size *2);
3061 if (!rv) { 3087 if (!rv) {
3062 struct block_device *bdev; 3088 struct block_device *bdev;
3063 3089
3064 bdev = bdget_disk(mddev->gendisk, 0); 3090 bdev = bdget_disk(mddev->gendisk, 0);
3065 if (bdev) { 3091 if (bdev) {
3066 down(&bdev->bd_inode->i_sem); 3092 down(&bdev->bd_inode->i_sem);
3067 i_size_write(bdev->bd_inode, mddev->array_size << 10); 3093 i_size_write(bdev->bd_inode, mddev->array_size << 10);
3068 up(&bdev->bd_inode->i_sem); 3094 up(&bdev->bd_inode->i_sem);
3069 bdput(bdev); 3095 bdput(bdev);
3070 } 3096 }
3071 } 3097 }
3072 } 3098 }
3073 if (mddev->raid_disks != info->raid_disks) { 3099 if (mddev->raid_disks != info->raid_disks) {
3074 /* change the number of raid disks */ 3100 /* change the number of raid disks */
3075 if (mddev->pers->reshape == NULL) 3101 if (mddev->pers->reshape == NULL)
3076 return -EINVAL; 3102 return -EINVAL;
3077 if (info->raid_disks <= 0 || 3103 if (info->raid_disks <= 0 ||
3078 info->raid_disks >= mddev->max_disks) 3104 info->raid_disks >= mddev->max_disks)
3079 return -EINVAL; 3105 return -EINVAL;
3080 if (mddev->sync_thread) 3106 if (mddev->sync_thread)
3081 return -EBUSY; 3107 return -EBUSY;
3082 rv = mddev->pers->reshape(mddev, info->raid_disks); 3108 rv = mddev->pers->reshape(mddev, info->raid_disks);
3083 if (!rv) { 3109 if (!rv) {
3084 struct block_device *bdev; 3110 struct block_device *bdev;
3085 3111
3086 bdev = bdget_disk(mddev->gendisk, 0); 3112 bdev = bdget_disk(mddev->gendisk, 0);
3087 if (bdev) { 3113 if (bdev) {
3088 down(&bdev->bd_inode->i_sem); 3114 down(&bdev->bd_inode->i_sem);
3089 i_size_write(bdev->bd_inode, mddev->array_size << 10); 3115 i_size_write(bdev->bd_inode, mddev->array_size << 10);
3090 up(&bdev->bd_inode->i_sem); 3116 up(&bdev->bd_inode->i_sem);
3091 bdput(bdev); 3117 bdput(bdev);
3092 } 3118 }
3093 } 3119 }
3094 } 3120 }
3095 if ((state ^ info->state) & (1<<MD_SB_BITMAP_PRESENT)) { 3121 if ((state ^ info->state) & (1<<MD_SB_BITMAP_PRESENT)) {
3096 if (mddev->pers->quiesce == NULL) 3122 if (mddev->pers->quiesce == NULL)
3097 return -EINVAL; 3123 return -EINVAL;
3098 if (mddev->recovery || mddev->sync_thread) 3124 if (mddev->recovery || mddev->sync_thread)
3099 return -EBUSY; 3125 return -EBUSY;
3100 if (info->state & (1<<MD_SB_BITMAP_PRESENT)) { 3126 if (info->state & (1<<MD_SB_BITMAP_PRESENT)) {
3101 /* add the bitmap */ 3127 /* add the bitmap */
3102 if (mddev->bitmap) 3128 if (mddev->bitmap)
3103 return -EEXIST; 3129 return -EEXIST;
3104 if (mddev->default_bitmap_offset == 0) 3130 if (mddev->default_bitmap_offset == 0)
3105 return -EINVAL; 3131 return -EINVAL;
3106 mddev->bitmap_offset = mddev->default_bitmap_offset; 3132 mddev->bitmap_offset = mddev->default_bitmap_offset;
3107 mddev->pers->quiesce(mddev, 1); 3133 mddev->pers->quiesce(mddev, 1);
3108 rv = bitmap_create(mddev); 3134 rv = bitmap_create(mddev);
3109 if (rv) 3135 if (rv)
3110 bitmap_destroy(mddev); 3136 bitmap_destroy(mddev);
3111 mddev->pers->quiesce(mddev, 0); 3137 mddev->pers->quiesce(mddev, 0);
3112 } else { 3138 } else {
3113 /* remove the bitmap */ 3139 /* remove the bitmap */
3114 if (!mddev->bitmap) 3140 if (!mddev->bitmap)
3115 return -ENOENT; 3141 return -ENOENT;
3116 if (mddev->bitmap->file) 3142 if (mddev->bitmap->file)
3117 return -EINVAL; 3143 return -EINVAL;
3118 mddev->pers->quiesce(mddev, 1); 3144 mddev->pers->quiesce(mddev, 1);
3119 bitmap_destroy(mddev); 3145 bitmap_destroy(mddev);
3120 mddev->pers->quiesce(mddev, 0); 3146 mddev->pers->quiesce(mddev, 0);
3121 mddev->bitmap_offset = 0; 3147 mddev->bitmap_offset = 0;
3122 } 3148 }
3123 } 3149 }
3124 md_update_sb(mddev); 3150 md_update_sb(mddev);
3125 return rv; 3151 return rv;
3126 } 3152 }
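The size check near the top of update_array_info() mixes units. `avail` is counted in 512-byte sectors, while `info->size` is in KiB. That is why the code compares against `avail/2` and shifts the request left by one before testing for -ENOSPC.

Below is a minimal user-space sketch of that arithmetic. It makes two assumptions: plain `uint64_t` stands in for `sector_t`, and the hypothetical helper `check_component_size()` takes `fit` as a parameter instead of reading it from the surrounding function.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical helper mirroring the size check in update_array_info().
 * avail_sectors: usable space on one component device, in 512-byte sectors.
 * *size_kb:      requested per-device size in KiB (1 KiB = 2 sectors).
 * fit:           if nonzero, clamp the request to what is available.
 * Returns 0 on success, or -ENOSPC if the request does not fit. */
static int check_component_size(uint64_t avail_sectors, uint64_t *size_kb,
                                int fit)
{
    /* With fit set, a size of 0 (or an oversized request) means "use all". */
    if (fit && (*size_kb == 0 || *size_kb > avail_sectors / 2))
        *size_kb = avail_sectors / 2;
    /* Convert KiB back to sectors before comparing. */
    if (avail_sectors < (*size_kb << 1))
        return -ENOSPC;
    return 0;
}
```

After this check, the kernel passes `info->size * 2` (sectors) to the personality's resize() method. It then updates the block device inode with `array_size << 10`, which converts KiB to bytes.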
3127 3153
3128 static int set_disk_faulty(mddev_t *mddev, dev_t dev) 3154 static int set_disk_faulty(mddev_t *mddev, dev_t dev)
3129 { 3155 {
3130 mdk_rdev_t *rdev; 3156 mdk_rdev_t *rdev;
3131 3157
3132 if (mddev->pers == NULL) 3158 if (mddev->pers == NULL)
3133 return -ENODEV; 3159 return -ENODEV;
3134 3160
3135 rdev = find_rdev(mddev, dev); 3161 rdev = find_rdev(mddev, dev);
3136 if (!rdev) 3162 if (!rdev)
3137 return -ENODEV; 3163 return -ENODEV;
3138 3164
3139 md_error(mddev, rdev); 3165 md_error(mddev, rdev);
3140 return 0; 3166 return 0;
3141 } 3167 }
3142 3168
3143 static int md_ioctl(struct inode *inode, struct file *file, 3169 static int md_ioctl(struct inode *inode, struct file *file,
3144 unsigned int cmd, unsigned long arg) 3170 unsigned int cmd, unsigned long arg)
3145 { 3171 {
3146 int err = 0; 3172 int err = 0;
3147 void __user *argp = (void __user *)arg; 3173 void __user *argp = (void __user *)arg;
3148 struct hd_geometry __user *loc = argp; 3174 struct hd_geometry __user *loc = argp;
3149 mddev_t *mddev = NULL; 3175 mddev_t *mddev = NULL;
3150 3176
3151 if (!capable(CAP_SYS_ADMIN)) 3177 if (!capable(CAP_SYS_ADMIN))
3152 return -EACCES; 3178 return -EACCES;
3153 3179
3154 /* 3180 /*
3155 * Commands dealing with the RAID driver but not any 3181 * Commands dealing with the RAID driver but not any
3156 * particular array: 3182 * particular array:
3157 */ 3183 */
3158 switch (cmd) 3184 switch (cmd)
3159 { 3185 {
3160 case RAID_VERSION: 3186 case RAID_VERSION:
3161 err = get_version(argp); 3187 err = get_version(argp);
3162 goto done; 3188 goto done;
3163 3189
3164 case PRINT_RAID_DEBUG: 3190 case PRINT_RAID_DEBUG:
3165 err = 0; 3191 err = 0;
3166 md_print_devices(); 3192 md_print_devices();
3167 goto done; 3193 goto done;
3168 3194
3169 #ifndef MODULE 3195 #ifndef MODULE
3170 case RAID_AUTORUN: 3196 case RAID_AUTORUN:
3171 err = 0; 3197 err = 0;
3172 autostart_arrays(arg); 3198 autostart_arrays(arg);
3173 goto done; 3199 goto done;
3174 #endif 3200 #endif
3175 default:; 3201 default:;
3176 } 3202 }
3177 3203
3178 /* 3204 /*
3179 * Commands creating/starting a new array: 3205 * Commands creating/starting a new array:
3180 */ 3206 */
3181 3207
3182 mddev = inode->i_bdev->bd_disk->private_data; 3208 mddev = inode->i_bdev->bd_disk->private_data;
3183 3209
3184 if (!mddev) { 3210 if (!mddev) {
3185 BUG(); 3211 BUG();
3186 goto abort; 3212 goto abort;
3187 } 3213 }
3188 3214
3189 3215
3190 if (cmd == START_ARRAY) { 3216 if (cmd == START_ARRAY) {
3191 /* START_ARRAY doesn't need to lock the array as autostart_array 3217 /* START_ARRAY doesn't need to lock the array as autostart_array
3192 * does the locking, and it could even be a different array 3218 * does the locking, and it could even be a different array
3193 */ 3219 */
3194 static int cnt = 3; 3220 static int cnt = 3;
3195 if (cnt > 0 ) { 3221 if (cnt > 0 ) {
3196 printk(KERN_WARNING 3222 printk(KERN_WARNING
3197 "md: %s(pid %d) used deprecated START_ARRAY ioctl. " 3223 "md: %s(pid %d) used deprecated START_ARRAY ioctl. "
3198 "This will not be supported beyond July 2006\n", 3224 "This will not be supported beyond July 2006\n",
3199 current->comm, current->pid); 3225 current->comm, current->pid);
3200 cnt--; 3226 cnt--;
3201 } 3227 }
3202 err = autostart_array(new_decode_dev(arg)); 3228 err = autostart_array(new_decode_dev(arg));
3203 if (err) { 3229 if (err) {
3204 printk(KERN_WARNING "md: autostart failed!\n"); 3230 printk(KERN_WARNING "md: autostart failed!\n");
3205 goto abort; 3231 goto abort;
3206 } 3232 }
3207 goto done; 3233 goto done;
3208 } 3234 }
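The deprecated START_ARRAY warning is limited to three occurrences by a function-local static counter. The sketch below is a user-space illustration of that idiom only. The function name `warn_deprecated()` is hypothetical, `fprintf(stderr, ...)` stands in for printk(), and the sketch does no locking of its own.

```c
#include <stdio.h>

/* Hypothetical sketch of the "warn only the first N times" idiom used for
 * the deprecated START_ARRAY ioctl. Returns 1 if a warning was printed. */
static int warn_deprecated(const char *comm, int pid)
{
    static int cnt = 3;          /* warnings still allowed */
    if (cnt > 0) {
        fprintf(stderr,
                "md: %s(pid %d) used deprecated START_ARRAY ioctl.\n",
                comm, pid);
        cnt--;
        return 1;
    }
    return 0;                    /* budget exhausted: stay quiet */
}
```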
3209 3235
3210 err = mddev_lock(mddev); 3236 err = mddev_lock(mddev);
3211 if (err) { 3237 if (err) {
3212 printk(KERN_INFO 3238 printk(KERN_INFO
3213 "md: ioctl lock interrupted, reason %d, cmd %d\n", 3239 "md: ioctl lock interrupted, reason %d, cmd %d\n",
3214 err, cmd); 3240 err, cmd);
3215 goto abort; 3241 goto abort;
3216 } 3242 }
3217 3243
3218 switch (cmd) 3244 switch (cmd)
3219 { 3245 {
3220 case SET_ARRAY_INFO: 3246 case SET_ARRAY_INFO:
3221 { 3247 {
3222 mdu_array_info_t info; 3248 mdu_array_info_t info;
3223 if (!arg) 3249 if (!arg)
3224 memset(&info, 0, sizeof(info)); 3250 memset(&info, 0, sizeof(info));
3225 else if (copy_from_user(&info, argp, sizeof(info))) { 3251 else if (copy_from_user(&info, argp, sizeof(info))) {
3226 err = -EFAULT; 3252 err = -EFAULT;
3227 goto abort_unlock; 3253 goto abort_unlock;
3228 } 3254 }
3229 if (mddev->pers) { 3255 if (mddev->pers) {
3230 err = update_array_info(mddev, &info); 3256 err = update_array_info(mddev, &info);
3231 if (err) { 3257 if (err) {
3232 printk(KERN_WARNING "md: couldn't update" 3258 printk(KERN_WARNING "md: couldn't update"
3233 " array info. %d\n", err); 3259 " array info. %d\n", err);
3234 goto abort_unlock; 3260 goto abort_unlock;
3235 } 3261 }
3236 goto done_unlock; 3262 goto done_unlock;
3237 } 3263 }
3238 if (!list_empty(&mddev->disks)) { 3264 if (!list_empty(&mddev->disks)) {
3239 printk(KERN_WARNING 3265 printk(KERN_WARNING
3240 "md: array %s already has disks!\n", 3266 "md: array %s already has disks!\n",
3241 mdname(mddev)); 3267 mdname(mddev));
3242 err = -EBUSY; 3268 err = -EBUSY;
3243 goto abort_unlock; 3269 goto abort_unlock;
3244 } 3270 }
3245 if (mddev->raid_disks) { 3271 if (mddev->raid_disks) {
3246 printk(KERN_WARNING 3272 printk(KERN_WARNING
3247 "md: array %s already initialised!\n", 3273 "md: array %s already initialised!\n",
3248 mdname(mddev)); 3274 mdname(mddev));
3249 err = -EBUSY; 3275 err = -EBUSY;
3250 goto abort_unlock; 3276 goto abort_unlock;
3251 } 3277 }
3252 err = set_array_info(mddev, &info); 3278 err = set_array_info(mddev, &info);
3253 if (err) { 3279 if (err) {
3254 printk(KERN_WARNING "md: couldn't set" 3280 printk(KERN_WARNING "md: couldn't set"
3255 " array info. %d\n", err); 3281 " array info. %d\n", err);
3256 goto abort_unlock; 3282 goto abort_unlock;
3257 } 3283 }
3258 } 3284 }
3259 goto done_unlock; 3285 goto done_unlock;
3260 3286
3261 default:; 3287 default:;
3262 } 3288 }
3263 3289
3264 /* 3290 /*
3265 * Commands querying/configuring an existing array: 3291 * Commands querying/configuring an existing array:
3266 */ 3292 */
3267 /* if we are not initialised yet, only ADD_NEW_DISK, STOP_ARRAY, 3293 /* if we are not initialised yet, only ADD_NEW_DISK, STOP_ARRAY,
3268 * RUN_ARRAY, and SET_BITMAP_FILE are allowed */ 3294 * RUN_ARRAY, and SET_BITMAP_FILE are allowed */
3269 if (!mddev->raid_disks && cmd != ADD_NEW_DISK && cmd != STOP_ARRAY 3295 if (!mddev->raid_disks && cmd != ADD_NEW_DISK && cmd != STOP_ARRAY
3270 && cmd != RUN_ARRAY && cmd != SET_BITMAP_FILE) { 3296 && cmd != RUN_ARRAY && cmd != SET_BITMAP_FILE) {
3271 err = -ENODEV; 3297 err = -ENODEV;
3272 goto abort_unlock; 3298 goto abort_unlock;
3273 } 3299 }
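The gate above lets an array that has not yet been given `raid_disks` (via SET_ARRAY_INFO) accept only four commands. The sketch below restates that rule as a pure function.

The enum values are hypothetical stand-ins. The real ioctl numbers come from the kernel's md user-space header and are not reproduced here.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for md ioctl command numbers. */
enum md_cmd {
    CMD_ADD_NEW_DISK, CMD_STOP_ARRAY, CMD_RUN_ARRAY,
    CMD_SET_BITMAP_FILE, CMD_GET_ARRAY_INFO, CMD_HOT_ADD_DISK
};

/* Mirrors the check in md_ioctl(): while raid_disks is still 0, only
 * ADD_NEW_DISK, STOP_ARRAY, RUN_ARRAY and SET_BITMAP_FILE may proceed;
 * anything else fails with -ENODEV. */
static int gate_uninitialised(int raid_disks, enum md_cmd cmd)
{
    bool allowed = cmd == CMD_ADD_NEW_DISK || cmd == CMD_STOP_ARRAY ||
                   cmd == CMD_RUN_ARRAY || cmd == CMD_SET_BITMAP_FILE;
    if (!raid_disks && !allowed)
        return -ENODEV;
    return 0;
}
```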
3274 3300
3275 /* 3301 /*
3276 * Commands even a read-only array can execute: 3302 * Commands even a read-only array can execute:
3277 */ 3303 */
3278 switch (cmd) 3304 switch (cmd)
3279 { 3305 {
3280 case GET_ARRAY_INFO: 3306 case GET_ARRAY_INFO:
3281 err = get_array_info(mddev, argp); 3307 err = get_array_info(mddev, argp);
3282 goto done_unlock; 3308 goto done_unlock;
3283 3309
3284 case GET_BITMAP_FILE: 3310 case GET_BITMAP_FILE:
3285 err = get_bitmap_file(mddev, argp); 3311 err = get_bitmap_file(mddev, argp);
3286 goto done_unlock; 3312 goto done_unlock;
3287 3313
3288 case GET_DISK_INFO: 3314 case GET_DISK_INFO:
3289 err = get_disk_info(mddev, argp); 3315 err = get_disk_info(mddev, argp);
3290 goto done_unlock; 3316 goto done_unlock;
3291 3317
3292 case RESTART_ARRAY_RW: 3318 case RESTART_ARRAY_RW:
3293 err = restart_array(mddev); 3319 err = restart_array(mddev);
3294 goto done_unlock; 3320 goto done_unlock;
3295 3321
3296 case STOP_ARRAY: 3322 case STOP_ARRAY:
3297 err = do_md_stop (mddev, 0); 3323 err = do_md_stop (mddev, 0);
3298 goto done_unlock; 3324 goto done_unlock;
3299 3325
3300 case STOP_ARRAY_RO: 3326 case STOP_ARRAY_RO:
3301 err = do_md_stop (mddev, 1); 3327 err = do_md_stop (mddev, 1);
3302 goto done_unlock; 3328 goto done_unlock;
3303 3329
3304 /* 3330 /*
3305 * We have a problem here : there is no easy way to give a CHS 3331 * We have a problem here : there is no easy way to give a CHS
3306 * virtual geometry. We currently pretend that we have 2 heads, 3332 * virtual geometry. We currently pretend that we have 2 heads,
3307 * 4 sectors (with a BIG number of cylinders...). This drives 3333 * 4 sectors (with a BIG number of cylinders...). This drives
3308 * dosfs just mad... ;-) 3334 * dosfs just mad... ;-)
3309 */ 3335 */
3310 case HDIO_GETGEO: 3336 case HDIO_GETGEO:
3311 if (!loc) { 3337 if (!loc) {
3312 err = -EINVAL; 3338 err = -EINVAL;
3313 goto abort_unlock; 3339 goto abort_unlock;
3314 } 3340 }
3315 err = put_user (2, (char __user *) &loc->heads); 3341 err = put_user (2, (char __user *) &loc->heads);
3316 if (err) 3342 if (err)
3317 goto abort_unlock; 3343 goto abort_unlock;
3318 err = put_user (4, (char __user *) &loc->sectors); 3344 err = put_user (4, (char __user *) &loc->sectors);
3319 if (err) 3345 if (err)
3320 goto abort_unlock; 3346 goto abort_unlock;
3321 err = put_user(get_capacity(mddev->gendisk)/8, 3347 err = put_user(get_capacity(mddev->gendisk)/8,
3322 (short __user *) &loc->cylinders); 3348 (short __user *) &loc->cylinders);
3323 if (err) 3349 if (err)
3324 goto abort_unlock; 3350 goto abort_unlock;
3325 err = put_user (get_start_sect(inode->i_bdev), 3351 err = put_user (get_start_sect(inode->i_bdev),
3326 (long __user *) &loc->start); 3352 (long __user *) &loc->start);
3327 goto done_unlock; 3353 goto done_unlock;
3328 } 3354 }
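HDIO_GETGEO reports a fake geometry of 2 heads and 4 sectors per track, so one "cylinder" is 8 sectors and the cylinder count is the capacity divided by 8. The kernel stores that count through a 16-bit pointer, so large arrays wrap around.

The sketch below reproduces only the arithmetic. `struct fake_geo` and `md_fake_geometry()` are hypothetical names, and the struct loosely follows the layout of `struct hd_geometry`.

```c
#include <stdint.h>

/* Hypothetical mirror of the geometry md_ioctl() returns for HDIO_GETGEO. */
struct fake_geo {
    unsigned char heads;
    unsigned char sectors;
    unsigned short cylinders;   /* 16 bits: large arrays wrap around */
};

static struct fake_geo md_fake_geometry(uint64_t capacity_sectors)
{
    struct fake_geo g;
    g.heads = 2;                /* pretend 2 heads ...                */
    g.sectors = 4;              /* ... of 4 sectors: 8 sectors/cylinder */
    g.cylinders = (unsigned short)(capacity_sectors / 8);
    return g;
}
```

The truncation is one reason the original comment warns that this geometry "drives dosfs just mad".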
3329 3355
3330 /* 3356 /*
3331 * The remaining ioctls are changing the state of the 3357 * The remaining ioctls are changing the state of the
3332 * superblock, so we do not allow them on read-only arrays. 3358 * superblock, so we do not allow them on read-only arrays.
3333 * However non-MD ioctls (e.g. get-size) will still come through 3359 * However non-MD ioctls (e.g. get-size) will still come through
3334 * here and hit the 'default' below, so only disallow 3360 * here and hit the 'default' below, so only disallow
3335 * 'md' ioctls, and switch to rw mode if started auto-readonly. 3361 * 'md' ioctls, and switch to rw mode if started auto-readonly.
3336 */ 3362 */
3337 if (_IOC_TYPE(cmd) == MD_MAJOR && 3363 if (_IOC_TYPE(cmd) == MD_MAJOR &&
3338 mddev->ro && mddev->pers) { 3364 mddev->ro && mddev->pers) {
3339 if (mddev->ro == 2) { 3365 if (mddev->ro == 2) {
3340 mddev->ro = 0; 3366 mddev->ro = 0;
3341 set_bit(MD_RECOVERY_NEEDED, &mddev->recovery); 3367 set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
3342 md_wakeup_thread(mddev->thread); 3368 md_wakeup_thread(mddev->thread);
3343 3369
3344 } else { 3370 } else {
3345 err = -EROFS; 3371 err = -EROFS;
3346 goto abort_unlock; 3372 goto abort_unlock;
3347 } 3373 }
3348 } 3374 }
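The block above handles three values of `mddev->ro`: 0 is read-write, 1 is read-only, and 2 is auto-read-only. A superblock-changing md ioctl silently promotes an auto-read-only array to read-write and schedules recovery, while a truly read-only array rejects the ioctl with -EROFS.

The sketch below models only that state transition. It assumes the command is an md ioctl (`_IOC_TYPE(cmd) == MD_MAJOR`) and that a personality is running. The function and the `recovery_needed` flag are hypothetical.

```c
#include <errno.h>

/* Hypothetical model of the read-only check in md_ioctl().
 * ro: 0 = read-write, 1 = read-only, 2 = auto-read-only. */
static int md_check_writable(int *ro, int *recovery_needed)
{
    if (*ro == 2) {
        *ro = 0;                 /* first write-type ioctl switches to rw */
        *recovery_needed = 1;    /* kernel sets MD_RECOVERY_NEEDED, wakes thread */
        return 0;
    }
    if (*ro == 1)
        return -EROFS;           /* explicitly read-only: refuse */
    return 0;                    /* already read-write */
}
```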
3349 3375
3350 switch (cmd) 3376 switch (cmd)
3351 { 3377 {
3352 case ADD_NEW_DISK: 3378 case ADD_NEW_DISK:
3353 { 3379 {
3354 mdu_disk_info_t info; 3380 mdu_disk_info_t info;
3355 if (copy_from_user(&info, argp, sizeof(info))) 3381 if (copy_from_user(&info, argp, sizeof(info)))
3356 err = -EFAULT; 3382 err = -EFAULT;
3357 else 3383 else
3358 err = add_new_disk(mddev, &info); 3384 err = add_new_disk(mddev, &info);
3359 goto done_unlock; 3385 goto done_unlock;
3360 } 3386 }
3361 3387
3362 case HOT_REMOVE_DISK: 3388 case HOT_REMOVE_DISK:
3363 err = hot_remove_disk(mddev, new_decode_dev(arg)); 3389 err = hot_remove_disk(mddev, new_decode_dev(arg));
3364 goto done_unlock; 3390 goto done_unlock;
3365 3391
3366 case HOT_ADD_DISK: 3392 case HOT_ADD_DISK:
3367 err = hot_add_disk(mddev, new_decode_dev(arg)); 3393 err = hot_add_disk(mddev, new_decode_dev(arg));
3368 goto done_unlock; 3394 goto done_unlock;
3369 3395
3370 case SET_DISK_FAULTY: 3396 case SET_DISK_FAULTY:
3371 err = set_disk_faulty(mddev, new_decode_dev(arg)); 3397 err = set_disk_faulty(mddev, new_decode_dev(arg));
3372 goto done_unlock; 3398 goto done_unlock;
3373 3399
3374 case RUN_ARRAY: 3400 case RUN_ARRAY:
3375 err = do_md_run (mddev); 3401 err = do_md_run (mddev);
3376 goto done_unlock; 3402 goto done_unlock;
3377 3403
3378 case SET_BITMAP_FILE: 3404 case SET_BITMAP_FILE:
3379 err = set_bitmap_file(mddev, (int)arg); 3405 err = set_bitmap_file(mddev, (int)arg);
3380 goto done_unlock; 3406 goto done_unlock;
3381 3407
3382 default: 3408 default:
3383 if (_IOC_TYPE(cmd) == MD_MAJOR) 3409 if (_IOC_TYPE(cmd) == MD_MAJOR)
3384 printk(KERN_WARNING "md: %s(pid %d) used" 3410 printk(KERN_WARNING "md: %s(pid %d) used"
3385 " obsolete MD ioctl, upgrade your" 3411 " obsolete MD ioctl, upgrade your"
3386 " software to use new ioctls.\n", 3412 " software to use new ioctls.\n",
3387 current->comm, current->pid); 3413 current->comm, current->pid);
3388 err = -EINVAL; 3414 err = -EINVAL;
3389 goto abort_unlock; 3415 goto abort_unlock;
3390 } 3416 }
3391 3417
3392 done_unlock: 3418 done_unlock:
3393 abort_unlock: 3419 abort_unlock:
3394 mddev_unlock(mddev); 3420 mddev_unlock(mddev);
3395 3421
3396 return err; 3422 return err;
3397 done: 3423 done:
3398 if (err) 3424 if (err)
3399 MD_BUG(); 3425 MD_BUG();
3400 abort: 3426 abort:
3401 return err; 3427 return err;
3402 } 3428 }
3403 3429
3404 static int md_open(struct inode *inode, struct file *file) 3430 static int md_open(struct inode *inode, struct file *file)
3405 { 3431 {
3406 /* 3432 /*
3407 * Succeed if we can lock the mddev, which confirms that 3433 * Succeed if we can lock the mddev, which confirms that
3408 * it isn't being stopped right now. 3434 * it isn't being stopped right now.
3409 */ 3435 */
3410 mddev_t *mddev = inode->i_bdev->bd_disk->private_data; 3436 mddev_t *mddev = inode->i_bdev->bd_disk->private_data;
3411 int err; 3437 int err;
3412 3438
3413 if ((err = mddev_lock(mddev))) 3439 if ((err = mddev_lock(mddev)))
3414 goto out; 3440 goto out;
3415 3441
3416 err = 0; 3442 err = 0;
3417 mddev_get(mddev); 3443 mddev_get(mddev);
3418 mddev_unlock(mddev); 3444 mddev_unlock(mddev);
3419 3445
3420 check_disk_change(inode->i_bdev); 3446 check_disk_change(inode->i_bdev);
3421 out: 3447 out:
3422 return err; 3448 return err;
3423 } 3449 }
3424 3450
3425 static int md_release(struct inode *inode, struct file * file) 3451 static int md_release(struct inode *inode, struct file * file)
3426 { 3452 {
3427 mddev_t *mddev = inode->i_bdev->bd_disk->private_data; 3453 mddev_t *mddev = inode->i_bdev->bd_disk->private_data;
3428 3454
3429 if (!mddev) 3455 if (!mddev)
3430 BUG(); 3456 BUG();
3431 mddev_put(mddev); 3457 mddev_put(mddev);
3432 3458
3433 return 0; 3459 return 0;
3434 } 3460 }
3435 3461
3436 static int md_media_changed(struct gendisk *disk) 3462 static int md_media_changed(struct gendisk *disk)
3437 { 3463 {
3438 mddev_t *mddev = disk->private_data; 3464 mddev_t *mddev = disk->private_data;
3439 3465
3440 return mddev->changed; 3466 return mddev->changed;
3441 } 3467 }
3442 3468
3443 static int md_revalidate(struct gendisk *disk) 3469 static int md_revalidate(struct gendisk *disk)
3444 { 3470 {
3445 mddev_t *mddev = disk->private_data; 3471 mddev_t *mddev = disk->private_data;
3446 3472
3447 mddev->changed = 0; 3473 mddev->changed = 0;
3448 return 0; 3474 return 0;
3449 } 3475 }
3450 static struct block_device_operations md_fops = 3476 static struct block_device_operations md_fops =
3451 { 3477 {
3452 .owner = THIS_MODULE, 3478 .owner = THIS_MODULE,
3453 .open = md_open, 3479 .open = md_open,
3454 .release = md_release, 3480 .release = md_release,
3455 .ioctl = md_ioctl, 3481 .ioctl = md_ioctl,
3456 .media_changed = md_media_changed, 3482 .media_changed = md_media_changed,
3457 .revalidate_disk= md_revalidate, 3483 .revalidate_disk= md_revalidate,
3458 }; 3484 };
3459 3485
3460 static int md_thread(void * arg) 3486 static int md_thread(void * arg)
3461 { 3487 {
3462 mdk_thread_t *thread = arg; 3488 mdk_thread_t *thread = arg;
3463 3489
3464 /* 3490 /*
3465 * md_thread is a 'system-thread', its priority should be very 3491 * md_thread is a 'system-thread', its priority should be very
3466 * high. We avoid resource deadlocks individually in each 3492 * high. We avoid resource deadlocks individually in each
3467 * raid personality. (RAID5 does preallocation) We also use RR and 3493 * raid personality. (RAID5 does preallocation) We also use RR and
3468 * the very same RT priority as kswapd, thus we will never get 3494 * the very same RT priority as kswapd, thus we will never get
3469 * into a priority inversion deadlock. 3495 * into a priority inversion deadlock.
3470 * 3496 *
3471 * we definitely have to have equal or higher priority than 3497 * we definitely have to have equal or higher priority than
3472 * bdflush, otherwise bdflush will deadlock if there are too 3498 * bdflush, otherwise bdflush will deadlock if there are too
3473 * many dirty RAID5 blocks. 3499 * many dirty RAID5 blocks.
3474 */ 3500 */
3475 3501
3476 allow_signal(SIGKILL); 3502 allow_signal(SIGKILL);
3477 while (!kthread_should_stop()) { 3503 while (!kthread_should_stop()) {
3478 3504
3479 /* We need to wait INTERRUPTIBLE so that 3505 /* We need to wait INTERRUPTIBLE so that
3480 * we don't add to the load-average. 3506 * we don't add to the load-average.
3481 * That means we need to be sure no signals are 3507 * That means we need to be sure no signals are
3482 * pending 3508 * pending
3483 */ 3509 */
3484 if (signal_pending(current)) 3510 if (signal_pending(current))
3485 flush_signals(current); 3511 flush_signals(current);
3486 3512
3487 wait_event_interruptible_timeout 3513 wait_event_interruptible_timeout
3488 (thread->wqueue, 3514 (thread->wqueue,
3489 test_bit(THREAD_WAKEUP, &thread->flags) 3515 test_bit(THREAD_WAKEUP, &thread->flags)
3490 || kthread_should_stop(), 3516 || kthread_should_stop(),
3491 thread->timeout); 3517 thread->timeout);
3492 try_to_freeze(); 3518 try_to_freeze();
3493 3519
3494 clear_bit(THREAD_WAKEUP, &thread->flags); 3520 clear_bit(THREAD_WAKEUP, &thread->flags);
3495 3521
3496 thread->run(thread->mddev); 3522 thread->run(thread->mddev);
3497 } 3523 }
3498 3524
3499 return 0; 3525 return 0;
3500 } 3526 }
3501 3527
3502 void md_wakeup_thread(mdk_thread_t *thread) 3528 void md_wakeup_thread(mdk_thread_t *thread)
3503 { 3529 {
3504 if (thread) { 3530 if (thread) {
3505 dprintk("md: waking up MD thread %s.\n", thread->tsk->comm); 3531 dprintk("md: waking up MD thread %s.\n", thread->tsk->comm);
3506 set_bit(THREAD_WAKEUP, &thread->flags); 3532 set_bit(THREAD_WAKEUP, &thread->flags);
3507 wake_up(&thread->wqueue); 3533 wake_up(&thread->wqueue);
3508 } 3534 }
3509 } 3535 }
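md_wakeup_thread() only sets THREAD_WAKEUP and wakes the queue. md_thread() clears the bit before calling run(), so several wakeups that arrive before the thread runs collapse into a single call.

The sketch below is a single-threaded C11 model of that flag handshake. All names are hypothetical, and it leaves out the sleeping, the timeout path (where the kernel calls run() even without a wakeup), signals and kthread_should_stop().

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of the THREAD_WAKEUP handshake. */
struct model_thread {
    atomic_bool wakeup;   /* stands in for THREAD_WAKEUP in thread->flags */
    int runs;             /* number of times run() was invoked */
};

static void model_wakeup(struct model_thread *t)
{
    atomic_store(&t->wakeup, true);      /* set_bit(THREAD_WAKEUP, ...) */
}

/* One pass of the worker loop; returns true if run() was called. */
static bool model_thread_step(struct model_thread *t)
{
    /* Test-and-clear: any number of earlier wakeups yield one run. */
    if (!atomic_exchange(&t->wakeup, false))
        return false;                    /* the kernel would keep sleeping */
    t->runs++;                           /* thread->run(thread->mddev) */
    return true;
}
```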
3510 3536
3511 mdk_thread_t *md_register_thread(void (*run) (mddev_t *), mddev_t *mddev, 3537 mdk_thread_t *md_register_thread(void (*run) (mddev_t *), mddev_t *mddev,
3512 const char *name) 3538 const char *name)
3513 { 3539 {
3514 mdk_thread_t *thread; 3540 mdk_thread_t *thread;
3515 3541
3516 thread = kzalloc(sizeof(mdk_thread_t), GFP_KERNEL); 3542 thread = kzalloc(sizeof(mdk_thread_t), GFP_KERNEL);
3517 if (!thread) 3543 if (!thread)
3518 return NULL; 3544 return NULL;
3519 3545
3520 init_waitqueue_head(&thread->wqueue); 3546 init_waitqueue_head(&thread->wqueue);
3521 3547
3522 thread->run = run; 3548 thread->run = run;
3523 thread->mddev = mddev; 3549 thread->mddev = mddev;
3524 thread->timeout = MAX_SCHEDULE_TIMEOUT; 3550 thread->timeout = MAX_SCHEDULE_TIMEOUT;
3525 thread->tsk = kthread_run(md_thread, thread, name, mdname(thread->mddev)); 3551 thread->tsk = kthread_run(md_thread, thread, name, mdname(thread->mddev));
3526 if (IS_ERR(thread->tsk)) { 3552 if (IS_ERR(thread->tsk)) {
3527 kfree(thread); 3553 kfree(thread);
3528 return NULL; 3554 return NULL;
3529 } 3555 }
3530 return thread; 3556 return thread;
3531 } 3557 }
3532 3558
3533 void md_unregister_thread(mdk_thread_t *thread) 3559 void md_unregister_thread(mdk_thread_t *thread)
3534 { 3560 {
3535 dprintk("interrupting MD-thread pid %d\n", thread->tsk->pid); 3561 dprintk("interrupting MD-thread pid %d\n", thread->tsk->pid);
3536 3562
3537 kthread_stop(thread->tsk); 3563 kthread_stop(thread->tsk);
3538 kfree(thread); 3564 kfree(thread);
3539 } 3565 }
3540 3566
3541 void md_error(mddev_t *mddev, mdk_rdev_t *rdev) 3567 void md_error(mddev_t *mddev, mdk_rdev_t *rdev)
3542 { 3568 {
3543 if (!mddev) { 3569 if (!mddev) {
3544 MD_BUG(); 3570 MD_BUG();
3545 return; 3571 return;
3546 } 3572 }
3547 3573
3548 if (!rdev || test_bit(Faulty, &rdev->flags)) 3574 if (!rdev || test_bit(Faulty, &rdev->flags))
3549 return; 3575 return;
3550 /* 3576 /*
3551 dprintk("md_error dev:%s, rdev:(%d:%d), (caller: %p,%p,%p,%p).\n", 3577 dprintk("md_error dev:%s, rdev:(%d:%d), (caller: %p,%p,%p,%p).\n",
3552 mdname(mddev), 3578 mdname(mddev),
3553 MAJOR(rdev->bdev->bd_dev), MINOR(rdev->bdev->bd_dev), 3579 MAJOR(rdev->bdev->bd_dev), MINOR(rdev->bdev->bd_dev),
3554 __builtin_return_address(0),__builtin_return_address(1), 3580 __builtin_return_address(0),__builtin_return_address(1),
3555 __builtin_return_address(2),__builtin_return_address(3)); 3581 __builtin_return_address(2),__builtin_return_address(3));
3556 */ 3582 */
3557 if (!mddev->pers->error_handler) 3583 if (!mddev->pers->error_handler)
3558 return; 3584 return;
3559 mddev->pers->error_handler(mddev,rdev); 3585 mddev->pers->error_handler(mddev,rdev);
3560 set_bit(MD_RECOVERY_INTR, &mddev->recovery); 3586 set_bit(MD_RECOVERY_INTR, &mddev->recovery);
3561 set_bit(MD_RECOVERY_NEEDED, &mddev->recovery); 3587 set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
3562 md_wakeup_thread(mddev->thread); 3588 md_wakeup_thread(mddev->thread);
3563 md_new_event(mddev); 3589 md_new_event(mddev);
3564 } 3590 }
3565 3591
3566 /* seq_file implementation /proc/mdstat */ 3592 /* seq_file implementation /proc/mdstat */
3567 3593
3568 static void status_unused(struct seq_file *seq) 3594 static void status_unused(struct seq_file *seq)
3569 { 3595 {
3570 int i = 0; 3596 int i = 0;
3571 mdk_rdev_t *rdev; 3597 mdk_rdev_t *rdev;
3572 struct list_head *tmp; 3598 struct list_head *tmp;
3573 3599
3574 seq_printf(seq, "unused devices: "); 3600 seq_printf(seq, "unused devices: ");
3575 3601
3576 ITERATE_RDEV_PENDING(rdev,tmp) { 3602 ITERATE_RDEV_PENDING(rdev,tmp) {
3577 char b[BDEVNAME_SIZE]; 3603 char b[BDEVNAME_SIZE];
3578 i++; 3604 i++;
3579 seq_printf(seq, "%s ", 3605 seq_printf(seq, "%s ",
3580 bdevname(rdev->bdev,b)); 3606 bdevname(rdev->bdev,b));
3581 } 3607 }
3582 if (!i) 3608 if (!i)
3583 seq_printf(seq, "<none>"); 3609 seq_printf(seq, "<none>");
3584 3610
3585 seq_printf(seq, "\n"); 3611 seq_printf(seq, "\n");
3586 } 3612 }
3587 3613
3588 3614
3589 static void status_resync(struct seq_file *seq, mddev_t * mddev) 3615 static void status_resync(struct seq_file *seq, mddev_t * mddev)
3590 { 3616 {
3591 unsigned long max_blocks, resync, res, dt, db, rt; 3617 unsigned long max_blocks, resync, res, dt, db, rt;
3592 3618
3593 resync = (mddev->curr_resync - atomic_read(&mddev->recovery_active))/2; 3619 resync = (mddev->curr_resync - atomic_read(&mddev->recovery_active))/2;
3594 3620
3595 if (test_bit(MD_RECOVERY_SYNC, &mddev->recovery)) 3621 if (test_bit(MD_RECOVERY_SYNC, &mddev->recovery))
3596 max_blocks = mddev->resync_max_sectors >> 1; 3622 max_blocks = mddev->resync_max_sectors >> 1;
3597 else 3623 else
3598 max_blocks = mddev->size; 3624 max_blocks = mddev->size;
3599 3625
3600 /* 3626 /*
3601 * Should not happen. 3627 * Should not happen.
3602 */ 3628 */
3603 if (!max_blocks) { 3629 if (!max_blocks) {
3604 MD_BUG(); 3630 MD_BUG();
3605 return; 3631 return;
3606 } 3632 }
3607 res = (resync/1024)*1000/(max_blocks/1024 + 1); 3633 res = (resync/1024)*1000/(max_blocks/1024 + 1);
3608 { 3634 {
3609 int i, x = res/50, y = 20-x; 3635 int i, x = res/50, y = 20-x;
3610 seq_printf(seq, "["); 3636 seq_printf(seq, "[");
3611 for (i = 0; i < x; i++) 3637 for (i = 0; i < x; i++)
3612 seq_printf(seq, "="); 3638 seq_printf(seq, "=");
3613 seq_printf(seq, ">"); 3639 seq_printf(seq, ">");
3614 for (i = 0; i < y; i++) 3640 for (i = 0; i < y; i++)
3615 seq_printf(seq, "."); 3641 seq_printf(seq, ".");
3616 seq_printf(seq, "] "); 3642 seq_printf(seq, "] ");
3617 } 3643 }
3618 seq_printf(seq, " %s =%3lu.%lu%% (%lu/%lu)", 3644 seq_printf(seq, " %s =%3lu.%lu%% (%lu/%lu)",
3619 (test_bit(MD_RECOVERY_SYNC, &mddev->recovery) ? 3645 (test_bit(MD_RECOVERY_SYNC, &mddev->recovery) ?
3620 "resync" : "recovery"), 3646 "resync" : "recovery"),
3621 res/10, res % 10, resync, max_blocks); 3647 res/10, res % 10, resync, max_blocks);
3622 3648
3623 /* 3649 /*
3624 * We do not want to overflow, so the order of operands and 3650 * We do not want to overflow, so the order of operands and
3625 * the * 100 / 100 trick are important. We do a +1 to be 3651 * the * 100 / 100 trick are important. We do a +1 to be
3626 * safe against division by zero. We only estimate anyway. 3652 * safe against division by zero. We only estimate anyway.
3627 * 3653 *
3628 * dt: time from mark until now 3654 * dt: time from mark until now
3629 * db: blocks written from mark until now 3655 * db: blocks written from mark until now
3630 * rt: remaining time 3656 * rt: remaining time
3631 */ 3657 */
3632 dt = ((jiffies - mddev->resync_mark) / HZ); 3658 dt = ((jiffies - mddev->resync_mark) / HZ);
3633 if (!dt) dt++; 3659 if (!dt) dt++;
3634 db = resync - (mddev->resync_mark_cnt/2); 3660 db = resync - (mddev->resync_mark_cnt/2);
3635 rt = (dt * ((max_blocks-resync) / (db/100+1)))/100; 3661 rt = (dt * ((max_blocks-resync) / (db/100+1)))/100;
3636 3662
3637 seq_printf(seq, " finish=%lu.%lumin", rt / 60, (rt % 60)/6); 3663 seq_printf(seq, " finish=%lu.%lumin", rt / 60, (rt % 60)/6);
3638 3664
3639 seq_printf(seq, " speed=%ldK/sec", db/dt); 3665 seq_printf(seq, " speed=%ldK/sec", db/dt);
3640 } 3666 }
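status_resync() does all of its /proc/mdstat arithmetic in 1 KiB blocks with unsigned longs:

- a per-mille progress value, computed by dividing by 1024 first to avoid overflow;
- a 20-character bar drawn from that value;
- a finish estimate scaled by 100 so the rate fits in integers.

The sketch below pulls those three calculations into standalone helpers. The function names are hypothetical, and the bar helper clamps its input, which the kernel code does not need to do.

```c
/* Progress in tenths of a percent; the +1 avoids division by zero. */
static unsigned long resync_permille(unsigned long done, unsigned long max)
{
    return (done / 1024) * 1000 / (max / 1024 + 1);
}

/* Render "[", x '=' signs, '>', 20 - x dots, "]" into buf (23 chars + NUL). */
static void resync_bar(unsigned long permille, char buf[24])
{
    int x = (int)(permille / 50), y, n = 0, i;
    if (x > 20)
        x = 20;                  /* sketch-only guard against overflow */
    y = 20 - x;
    buf[n++] = '[';
    for (i = 0; i < x; i++)
        buf[n++] = '=';
    buf[n++] = '>';
    for (i = 0; i < y; i++)
        buf[n++] = '.';
    buf[n++] = ']';
    buf[n] = '\0';
}

/* Remaining seconds: dt seconds and db blocks since the last mark. */
static unsigned long resync_eta(unsigned long dt, unsigned long db,
                                unsigned long done, unsigned long max)
{
    if (!dt)
        dt = 1;
    return (dt * ((max - done) / (db / 100 + 1))) / 100;
}
```

The kernel prints a per-mille value `res` as `res/10` "." `res % 10` percent, and the estimate `rt` as `rt / 60` minutes plus `(rt % 60) / 6` tenths of a minute.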
3641 3667
3642 static void *md_seq_start(struct seq_file *seq, loff_t *pos) 3668 static void *md_seq_start(struct seq_file *seq, loff_t *pos)
3643 { 3669 {
3644 struct list_head *tmp; 3670 struct list_head *tmp;
3645 loff_t l = *pos; 3671 loff_t l = *pos;
3646 mddev_t *mddev; 3672 mddev_t *mddev;
3647 3673
3648 if (l >= 0x10000) 3674 if (l >= 0x10000)
3649 return NULL; 3675 return NULL;
3650 if (!l--) 3676 if (!l--)
3651 /* header */ 3677 /* header */
3652 return (void*)1; 3678 return (void*)1;
3653 3679
3654 spin_lock(&all_mddevs_lock); 3680 spin_lock(&all_mddevs_lock);
3655 list_for_each(tmp,&all_mddevs) 3681 list_for_each(tmp,&all_mddevs)
3656 if (!l--) { 3682 if (!l--) {
3657 mddev = list_entry(tmp, mddev_t, all_mddevs); 3683 mddev = list_entry(tmp, mddev_t, all_mddevs);
3658 mddev_get(mddev); 3684 mddev_get(mddev);
3659 spin_unlock(&all_mddevs_lock); 3685 spin_unlock(&all_mddevs_lock);
3660 return mddev; 3686 return mddev;
3661 } 3687 }
3662 spin_unlock(&all_mddevs_lock); 3688 spin_unlock(&all_mddevs_lock);
3663 if (!l--) 3689 if (!l--)
3664 return (void*)2;/* tail */ 3690 return (void*)2;/* tail */
3665 return NULL; 3691 return NULL;
3666 } 3692 }
3667 3693
3668 static void *md_seq_next(struct seq_file *seq, void *v, loff_t *pos) 3694 static void *md_seq_next(struct seq_file *seq, void *v, loff_t *pos)
3669 { 3695 {
3670 struct list_head *tmp; 3696 struct list_head *tmp;
3671 mddev_t *next_mddev, *mddev = v; 3697 mddev_t *next_mddev, *mddev = v;
3672 3698
3673 ++*pos; 3699 ++*pos;
3674 if (v == (void*)2) 3700 if (v == (void*)2)
3675 return NULL; 3701 return NULL;
3676 3702
3677 spin_lock(&all_mddevs_lock); 3703 spin_lock(&all_mddevs_lock);
3678 if (v == (void*)1) 3704 if (v == (void*)1)
3679 tmp = all_mddevs.next; 3705 tmp = all_mddevs.next;
3680 else 3706 else
3681 tmp = mddev->all_mddevs.next; 3707 tmp = mddev->all_mddevs.next;
3682 if (tmp != &all_mddevs) 3708 if (tmp != &all_mddevs)
3683 next_mddev = mddev_get(list_entry(tmp,mddev_t,all_mddevs)); 3709 next_mddev = mddev_get(list_entry(tmp,mddev_t,all_mddevs));
3684 else { 3710 else {
3685 next_mddev = (void*)2; 3711 next_mddev = (void*)2;
3686 *pos = 0x10000; 3712 *pos = 0x10000;
3687 } 3713 }
3688 spin_unlock(&all_mddevs_lock); 3714 spin_unlock(&all_mddevs_lock);
3689 3715
3690 if (v != (void*)1) 3716 if (v != (void*)1)
3691 mddev_put(mddev); 3717 mddev_put(mddev);
3692 return next_mddev; 3718 return next_mddev;
3693 3719
3694 } 3720 }
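md_seq_start() and md_seq_next() encode the iterator position as a flat index:

- position 0 is the "Personalities" header, returned as the cookie `(void*)1`;
- positions 1 to N are the arrays on `all_mddevs`;
- position N+1 is the "unused devices" tail, returned as `(void*)2`;
- any later position ends the sequence, and so does the sentinel 0x10000 that md_seq_next() stores once the list is exhausted.

The sketch below models that lookup over a list of `ndev` arrays. The enum and function names are hypothetical.

```c
/* Hypothetical result codes; non-negative results are 0-based indices. */
enum seq_item { SEQ_HEADER = -1, SEQ_TAIL = -2, SEQ_END = -3 };

static long md_seq_lookup(long long pos, long ndev)
{
    if (pos >= 0x10000)
        return SEQ_END;          /* sentinel written by md_seq_next() */
    if (pos == 0)
        return SEQ_HEADER;       /* (void*)1 in the kernel */
    if (pos <= ndev)
        return (long)(pos - 1);  /* an entry on all_mddevs */
    if (pos == ndev + 1)
        return SEQ_TAIL;         /* (void*)2 in the kernel */
    return SEQ_END;
}
```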
3695 3721
3696 static void md_seq_stop(struct seq_file *seq, void *v) 3722 static void md_seq_stop(struct seq_file *seq, void *v)
3697 { 3723 {
3698 mddev_t *mddev = v; 3724 mddev_t *mddev = v;
3699 3725
3700 if (mddev && v != (void*)1 && v != (void*)2) 3726 if (mddev && v != (void*)1 && v != (void*)2)
3701 mddev_put(mddev); 3727 mddev_put(mddev);
3702 } 3728 }
3703 3729
3704 struct mdstat_info { 3730 struct mdstat_info {
3705 int event; 3731 int event;
3706 }; 3732 };
3707 3733
3708 static int md_seq_show(struct seq_file *seq, void *v) 3734 static int md_seq_show(struct seq_file *seq, void *v)
3709 { 3735 {
3710 mddev_t *mddev = v; 3736 mddev_t *mddev = v;
3711 sector_t size; 3737 sector_t size;
3712 struct list_head *tmp2; 3738 struct list_head *tmp2;
3713 mdk_rdev_t *rdev; 3739 mdk_rdev_t *rdev;
3714 struct mdstat_info *mi = seq->private; 3740 struct mdstat_info *mi = seq->private;
3715 struct bitmap *bitmap; 3741 struct bitmap *bitmap;
3716 3742
3717 if (v == (void*)1) { 3743 if (v == (void*)1) {
3718 struct mdk_personality *pers; 3744 struct mdk_personality *pers;
3719 seq_printf(seq, "Personalities : "); 3745 seq_printf(seq, "Personalities : ");
3720 spin_lock(&pers_lock); 3746 spin_lock(&pers_lock);
3721 list_for_each_entry(pers, &pers_list, list) 3747 list_for_each_entry(pers, &pers_list, list)
3722 seq_printf(seq, "[%s] ", pers->name); 3748 seq_printf(seq, "[%s] ", pers->name);
3723 3749
3724 spin_unlock(&pers_lock); 3750 spin_unlock(&pers_lock);
3725 seq_printf(seq, "\n"); 3751 seq_printf(seq, "\n");
3726 mi->event = atomic_read(&md_event_count); 3752 mi->event = atomic_read(&md_event_count);
3727 return 0; 3753 return 0;
3728 } 3754 }
3729 if (v == (void*)2) { 3755 if (v == (void*)2) {
3730 status_unused(seq); 3756 status_unused(seq);
3731 return 0; 3757 return 0;
3732 } 3758 }
3733 3759
3734 if (mddev_lock(mddev)!=0) 3760 if (mddev_lock(mddev)!=0)
3735 return -EINTR; 3761 return -EINTR;
3736 if (mddev->pers || mddev->raid_disks || !list_empty(&mddev->disks)) { 3762 if (mddev->pers || mddev->raid_disks || !list_empty(&mddev->disks)) {
3737 seq_printf(seq, "%s : %sactive", mdname(mddev), 3763 seq_printf(seq, "%s : %sactive", mdname(mddev),
3738 mddev->pers ? "" : "in"); 3764 mddev->pers ? "" : "in");
3739 if (mddev->pers) { 3765 if (mddev->pers) {
3740 if (mddev->ro==1) 3766 if (mddev->ro==1)
3741 seq_printf(seq, " (read-only)"); 3767 seq_printf(seq, " (read-only)");
3742 if (mddev->ro==2) 3768 if (mddev->ro==2)
3743 seq_printf(seq, "(auto-read-only)"); 3769 seq_printf(seq, "(auto-read-only)");
3744 seq_printf(seq, " %s", mddev->pers->name); 3770 seq_printf(seq, " %s", mddev->pers->name);
3745 } 3771 }
3746 3772
3747 size = 0; 3773 size = 0;
3748 ITERATE_RDEV(mddev,rdev,tmp2) { 3774 ITERATE_RDEV(mddev,rdev,tmp2) {
3749 char b[BDEVNAME_SIZE]; 3775 char b[BDEVNAME_SIZE];
3750 seq_printf(seq, " %s[%d]", 3776 seq_printf(seq, " %s[%d]",
3751 bdevname(rdev->bdev,b), rdev->desc_nr); 3777 bdevname(rdev->bdev,b), rdev->desc_nr);
3752 if (test_bit(WriteMostly, &rdev->flags)) 3778 if (test_bit(WriteMostly, &rdev->flags))
3753 seq_printf(seq, "(W)"); 3779 seq_printf(seq, "(W)");
3754 if (test_bit(Faulty, &rdev->flags)) { 3780 if (test_bit(Faulty, &rdev->flags)) {
3755 seq_printf(seq, "(F)"); 3781 seq_printf(seq, "(F)");
3756 continue; 3782 continue;
3757 } else if (rdev->raid_disk < 0) 3783 } else if (rdev->raid_disk < 0)
3758 seq_printf(seq, "(S)"); /* spare */ 3784 seq_printf(seq, "(S)"); /* spare */
3759 size += rdev->size; 3785 size += rdev->size;
3760 } 3786 }
3761 3787
3762 if (!list_empty(&mddev->disks)) { 3788 if (!list_empty(&mddev->disks)) {
3763 if (mddev->pers) 3789 if (mddev->pers)
3764 seq_printf(seq, "\n %llu blocks", 3790 seq_printf(seq, "\n %llu blocks",
3765 (unsigned long long)mddev->array_size); 3791 (unsigned long long)mddev->array_size);
3766 else 3792 else
3767 seq_printf(seq, "\n %llu blocks", 3793 seq_printf(seq, "\n %llu blocks",
3768 (unsigned long long)size); 3794 (unsigned long long)size);
3769 } 3795 }
3770 if (mddev->persistent) { 3796 if (mddev->persistent) {
3771 if (mddev->major_version != 0 || 3797 if (mddev->major_version != 0 ||
3772 mddev->minor_version != 90) { 3798 mddev->minor_version != 90) {
3773 seq_printf(seq," super %d.%d", 3799 seq_printf(seq," super %d.%d",
3774 mddev->major_version, 3800 mddev->major_version,
3775 mddev->minor_version); 3801 mddev->minor_version);
3776 } 3802 }
3777 } else 3803 } else
3778 seq_printf(seq, " super non-persistent"); 3804 seq_printf(seq, " super non-persistent");
3779 3805
3780 if (mddev->pers) { 3806 if (mddev->pers) {
3781 mddev->pers->status (seq, mddev); 3807 mddev->pers->status (seq, mddev);
3782 seq_printf(seq, "\n "); 3808 seq_printf(seq, "\n ");
3783 if (mddev->pers->sync_request) { 3809 if (mddev->pers->sync_request) {
3784 if (mddev->curr_resync > 2) { 3810 if (mddev->curr_resync > 2) {
3785 status_resync (seq, mddev); 3811 status_resync (seq, mddev);
3786 seq_printf(seq, "\n "); 3812 seq_printf(seq, "\n ");
3787 } else if (mddev->curr_resync == 1 || mddev->curr_resync == 2) 3813 } else if (mddev->curr_resync == 1 || mddev->curr_resync == 2)
3788 seq_printf(seq, "\tresync=DELAYED\n "); 3814 seq_printf(seq, "\tresync=DELAYED\n ");
3789 else if (mddev->recovery_cp < MaxSector) 3815 else if (mddev->recovery_cp < MaxSector)
3790 seq_printf(seq, "\tresync=PENDING\n "); 3816 seq_printf(seq, "\tresync=PENDING\n ");
3791 } 3817 }
3792 } else 3818 } else
3793 seq_printf(seq, "\n "); 3819 seq_printf(seq, "\n ");
3794 3820
3795 if ((bitmap = mddev->bitmap)) { 3821 if ((bitmap = mddev->bitmap)) {
3796 unsigned long chunk_kb; 3822 unsigned long chunk_kb;
3797 unsigned long flags; 3823 unsigned long flags;
3798 spin_lock_irqsave(&bitmap->lock, flags); 3824 spin_lock_irqsave(&bitmap->lock, flags);
3799 chunk_kb = bitmap->chunksize >> 10; 3825 chunk_kb = bitmap->chunksize >> 10;
3800 seq_printf(seq, "bitmap: %lu/%lu pages [%luKB], " 3826 seq_printf(seq, "bitmap: %lu/%lu pages [%luKB], "
3801 "%lu%s chunk", 3827 "%lu%s chunk",
3802 bitmap->pages - bitmap->missing_pages, 3828 bitmap->pages - bitmap->missing_pages,
3803 bitmap->pages, 3829 bitmap->pages,
3804 (bitmap->pages - bitmap->missing_pages) 3830 (bitmap->pages - bitmap->missing_pages)
3805 << (PAGE_SHIFT - 10), 3831 << (PAGE_SHIFT - 10),
3806 chunk_kb ? chunk_kb : bitmap->chunksize, 3832 chunk_kb ? chunk_kb : bitmap->chunksize,
3807 chunk_kb ? "KB" : "B"); 3833 chunk_kb ? "KB" : "B");
3808 if (bitmap->file) { 3834 if (bitmap->file) {
3809 seq_printf(seq, ", file: "); 3835 seq_printf(seq, ", file: ");
3810 seq_path(seq, bitmap->file->f_vfsmnt, 3836 seq_path(seq, bitmap->file->f_vfsmnt,
3811 bitmap->file->f_dentry," \t\n"); 3837 bitmap->file->f_dentry," \t\n");
3812 } 3838 }
3813 3839
3814 seq_printf(seq, "\n"); 3840 seq_printf(seq, "\n");
3815 spin_unlock_irqrestore(&bitmap->lock, flags); 3841 spin_unlock_irqrestore(&bitmap->lock, flags);
3816 } 3842 }
3817 3843
3818 seq_printf(seq, "\n"); 3844 seq_printf(seq, "\n");
3819 } 3845 }
3820 mddev_unlock(mddev); 3846 mddev_unlock(mddev);
3821 3847
3822 return 0; 3848 return 0;
3823 } 3849 }
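The bitmap line that md_seq_show() prints packs several conversions into one format string. The in-memory figure is the present page count shifted by (PAGE_SHIFT - 10), which turns pages into KB. The chunk size is shown in KB when `chunksize >> 10` is nonzero and in bytes otherwise.

The sketch below pulls that arithmetic out into ordinary userspace C. It assumes 4 KiB pages (PAGE_SHIFT of 12). `struct bitmap_summary`, `summarize_bitmap()` and `print_bitmap_summary()` are hypothetical names used only for illustration. Reading `missing_pages` as "pages not currently allocated" is an interpretation based on the format string.

```c
#include <stdio.h>

/* Assumption: 4 KiB pages (PAGE_SHIFT == 12); the kernel uses the real value. */
#define SKETCH_PAGE_SHIFT 12

/* Hypothetical container for the numbers md_seq_show() prints for a bitmap. */
struct bitmap_summary {
	unsigned long present;   /* pages - missing_pages */
	unsigned long total;     /* bitmap->pages */
	unsigned long mem_kb;    /* present pages expressed in KB */
	unsigned long chunk;     /* chunk size in KB, or in bytes if < 1 KB */
	const char *chunk_unit;  /* "KB" or "B" */
};

/* Hypothetical helper reproducing the arithmetic in md_seq_show(). */
static struct bitmap_summary summarize_bitmap(unsigned long pages,
                                              unsigned long missing_pages,
                                              unsigned long chunksize)
{
	struct bitmap_summary s;
	unsigned long chunk_kb = chunksize >> 10;  /* 0 when chunk < 1 KB */

	s.present = pages - missing_pages;
	s.total = pages;
	s.mem_kb = s.present << (SKETCH_PAGE_SHIFT - 10);  /* pages -> KB */
	s.chunk = chunk_kb ? chunk_kb : chunksize;
	s.chunk_unit = chunk_kb ? "KB" : "B";
	return s;
}

/* Prints the summary using the same format string as md_seq_show(). */
static void print_bitmap_summary(const struct bitmap_summary *s)
{
	printf("bitmap: %lu/%lu pages [%luKB], %lu%s chunk\n",
	       s->present, s->total, s->mem_kb, s->chunk, s->chunk_unit);
}
```

For example, `summarize_bitmap(8, 3, 65536)` followed by `print_bitmap_summary()` prints `bitmap: 5/8 pages [20KB], 64KB chunk`. A 512-byte chunk falls back to the byte unit and shows as `512B chunk`.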
3824 3850
3825 static struct seq_operations md_seq_ops = { 3851 static struct seq_operations md_seq_ops = {
3826 .start = md_seq_start, 3852 .start = md_seq_start,
3827 .next = md_seq_next, 3853 .next = md_seq_next,
3828 .stop = md_seq_stop, 3854 .stop = md_seq_stop,
3829 .show = md_seq_show, 3855 .show = md_seq_show,
3830 }; 3856 };
3831 3857
3832 static int md_seq_open(struct inode *inode, struct file *file) 3858 static int md_seq_open(struct inode *inode, struct file *file)
3833 { 3859 {
3834 int error; 3860 int error;
3835 struct mdstat_info *mi = kmalloc(sizeof(*mi), GFP_KERNEL); 3861 struct mdstat_info *mi = kmalloc(sizeof(*mi), GFP_KERNEL);
3836 if (mi == NULL) 3862 if (mi == NULL)
3837 return -ENOMEM; 3863 return -ENOMEM;
3838 3864
3839 error = seq_open(file, &md_seq_ops); 3865 error = seq_open(file, &md_seq_ops);
3840 if (error) 3866 if (error)
3841 kfree(mi); 3867 kfree(mi);
3842 else { 3868 else {
3843 struct seq_file *p = file->private_data; 3869 struct seq_file *p = file->private_data;
3844 p->private = mi; 3870 p->private = mi;
3845 mi->event = atomic_read(&md_event_count); 3871 mi->event = atomic_read(&md_event_count);
3846 } 3872 }
3847 return error; 3873 return error;
3848 } 3874 }
3849 3875
3850 static int md_seq_release(struct inode *inode, struct file *file) 3876 static int md_seq_release(struct inode *inode, struct file *file)
3851 { 3877 {
3852 struct seq_file *m = file->private_data; 3878 struct seq_file *m = file->private_data;
3853 struct mdstat_info *mi = m->private; 3879 struct mdstat_info *mi = m->private;
3854 m->private = NULL; 3880 m->private = NULL;
3855 kfree(mi); 3881 kfree(mi);
3856 return seq_release(inode, file); 3882 return seq_release(inode, file);
3857 } 3883 }
3858 3884
3859 static unsigned int mdstat_poll(struct file *filp, poll_table *wait) 3885 static unsigned int mdstat_poll(struct file *filp, poll_table *wait)
3860 { 3886 {
3861 struct seq_file *m = filp->private_data; 3887 struct seq_file *m = filp->private_data;
3862 struct mdstat_info *mi = m->private; 3888 struct mdstat_info *mi = m->private;
3863 int mask; 3889 int mask;
3864 3890
3865 poll_wait(filp, &md_event_waiters, wait); 3891 poll_wait(filp, &md_event_waiters, wait);
3866 3892
3867 /* always allow read */ 3893 /* always allow read */
3868 mask = POLLIN | POLLRDNORM; 3894 mask = POLLIN | POLLRDNORM;
3869 3895
3870 if (mi->event != atomic_read(&md_event_count)) 3896 if (mi->event != atomic_read(&md_event_count))
3871 mask |= POLLERR | POLLPRI; 3897 mask |= POLLERR | POLLPRI;
3872 return mask; 3898 return mask;
3873 } 3899 }
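mdstat_poll() always reports /proc/mdstat as readable. It adds POLLERR | POLLPRI once the global md_event_count has moved past the value saved in the reader's mdstat_info. That saved value is refreshed in the header branch of md_seq_show(), which presumably runs whenever a read restarts from the beginning of the file.

A monitor would therefore work like this:
1. Read the file.
2. Wait for POLLPRI (or an exceptional condition in select()).
3. Seek back to 0 and read again, which clears the condition.

The sketch models that handshake with plain integers. `struct mdstat_reader`, `mdstat_poll_mask()` and `mdstat_reread()` are hypothetical stand-ins, not kernel functions.

```c
#include <poll.h>

/* Hypothetical per-open state, standing in for struct mdstat_info. */
struct mdstat_reader {
	int event;  /* md_event_count value recorded when the header was shown */
};

/* Mirrors mdstat_poll(): always readable, plus POLLERR|POLLPRI when the
 * global event counter differs from what this reader last recorded. */
static short mdstat_poll_mask(const struct mdstat_reader *r, int md_event_count)
{
	short mask = POLLIN;
#ifdef POLLRDNORM
	mask |= POLLRDNORM;  /* the kernel also sets this */
#endif
	if (r->event != md_event_count)
		mask |= POLLERR | POLLPRI;
	return mask;
}

/* Mirrors the header branch of md_seq_show(): re-reading from the start
 * records the current counter, which clears the exceptional condition. */
static void mdstat_reread(struct mdstat_reader *r, int md_event_count)
{
	r->event = md_event_count;
}
```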
3874 3900
3875 static struct file_operations md_seq_fops = { 3901 static struct file_operations md_seq_fops = {
3876 .open = md_seq_open, 3902 .open = md_seq_open,
3877 .read = seq_read, 3903 .read = seq_read,
3878 .llseek = seq_lseek, 3904 .llseek = seq_lseek,
3879 .release = md_seq_release, 3905 .release = md_seq_release,
3880 .poll = mdstat_poll, 3906 .poll = mdstat_poll,
3881 }; 3907 };
3882 3908
3883 int register_md_personality(struct mdk_personality *p) 3909 int register_md_personality(struct mdk_personality *p)
3884 { 3910 {
3885 spin_lock(&pers_lock); 3911 spin_lock(&pers_lock);
3886 list_add_tail(&p->list, &pers_list); 3912 list_add_tail(&p->list, &pers_list);
3887 printk(KERN_INFO "md: %s personality registered for level %d\n", p->name, p->level); 3913 printk(KERN_INFO "md: %s personality registered for level %d\n", p->name, p->level);
3888 spin_unlock(&pers_lock); 3914 spin_unlock(&pers_lock);
3889 return 0; 3915 return 0;
3890 } 3916 }
3891 3917
3892 int unregister_md_personality(struct mdk_personality *p) 3918 int unregister_md_personality(struct mdk_personality *p)
3893 { 3919 {
3894 printk(KERN_INFO "md: %s personality unregistered\n", p->name); 3920 printk(KERN_INFO "md: %s personality unregistered\n", p->name);
3895 spin_lock(&pers_lock); 3921 spin_lock(&pers_lock);
3896 list_del_init(&p->list); 3922 list_del_init(&p->list);
3897 spin_unlock(&pers_lock); 3923 spin_unlock(&pers_lock);
3898 return 0; 3924 return 0;
3899 } 3925 }
3900 3926
3901 static int is_mddev_idle(mddev_t *mddev) 3927 static int is_mddev_idle(mddev_t *mddev)
3902 { 3928 {
3903 mdk_rdev_t * rdev; 3929 mdk_rdev_t * rdev;
3904 struct list_head *tmp; 3930 struct list_head *tmp;
3905 int idle; 3931 int idle;
3906 unsigned long curr_events; 3932 unsigned long curr_events;
3907 3933
3908 idle = 1; 3934 idle = 1;
3909 ITERATE_RDEV(mddev,rdev,tmp) { 3935 ITERATE_RDEV(mddev,rdev,tmp) {
3910 struct gendisk *disk = rdev->bdev->bd_contains->bd_disk; 3936 struct gendisk *disk = rdev->bdev->bd_contains->bd_disk;
3911 curr_events = disk_stat_read(disk, sectors[0]) + 3937 curr_events = disk_stat_read(disk, sectors[0]) +
3912 disk_stat_read(disk, sectors[1]) - 3938 disk_stat_read(disk, sectors[1]) -
3913 atomic_read(&disk->sync_io); 3939 atomic_read(&disk->sync_io);
3914 /* The difference between curr_events and last_events 3940 /* The difference between curr_events and last_events
3915 * will be affected by any new non-sync IO (making 3941 * will be affected by any new non-sync IO (making
3916 * curr_events bigger) and any difference in the amount of 3942 * curr_events bigger) and any difference in the amount of
3917 * in-flight syncio (making current_events bigger or smaller) 3943 * in-flight syncio (making current_events bigger or smaller)
3918 * The amount in-flight is currently limited to 3944 * The amount in-flight is currently limited to
3919 * 32*64K in raid1/10 and 256*PAGE_SIZE in raid5/6 3945 * 32*64K in raid1/10 and 256*PAGE_SIZE in raid5/6
3920 * which is at most 4096 sectors. 3946 * which is at most 4096 sectors.
3921 * These numbers are fairly fragile and should be made 3947 * These numbers are fairly fragile and should be made
3922 * more robust, probably by enforcing the 3948 * more robust, probably by enforcing the
3923 * 'window size' that md_do_sync sort-of uses. 3949 * 'window size' that md_do_sync sort-of uses.
3924 * 3950 *
3925 * Note: the following is an unsigned comparison. 3951 * Note: the following is an unsigned comparison.
3926 */ 3952 */
3927 if ((curr_events - rdev->last_events + 4096) > 8192) { 3953 if ((curr_events - rdev->last_events + 4096) > 8192) {
3928 rdev->last_events = curr_events; 3954 rdev->last_events = curr_events;
3929 idle = 0; 3955 idle = 0;
3930 } 3956 }
3931 } 3957 }
3932 return idle; 3958 return idle;
3933 } 3959 }
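The test in is_mddev_idle() relies on unsigned wrap-around, as its comment notes. The difference `curr_events - last_events` is computed as an unsigned long, so a "negative" difference becomes a very large value. Adding 4096 shifts the tolerated signed range [-4096, 4096] onto [0, 8192]. Any difference outside that range, in either direction, therefore compares greater than 8192.

The helper below isolates just that comparison. `io_outside_window()` is a hypothetical name. In the kernel, a true result marks the array as not idle and resets `rdev->last_events`. Sync IO has already been subtracted via `disk->sync_io`, so resync traffic alone should stay inside the window.

```c
#include <stdbool.h>

/* Hypothetical helper isolating the comparison in is_mddev_idle().
 * Unsigned arithmetic: a "negative" delta wraps to a huge value, so after
 * adding 4096 only deltas in [-4096, 4096] land in [0, 8192]. */
static bool io_outside_window(unsigned long curr_events,
                              unsigned long last_events)
{
	return (curr_events - last_events + 4096) > 8192;
}
```

For example, with `last_events` at 1000:
- A current value of 5096 (delta +4096) is tolerated.
- 5097 trips the check.
- Going backwards to 1000 - 4097 would trip it as well.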
3934 3960
3935 void md_done_sync(mddev_t *mddev, int blocks, int ok) 3961 void md_done_sync(mddev_t *mddev, int blocks, int ok)
3936 { 3962 {
3937 /* another "blocks" (512byte) blocks have been synced */ 3963 /* another "blocks" (512byte) blocks have been synced */
3938 atomic_sub(blocks, &mddev->recovery_active); 3964 atomic_sub(blocks, &mddev->recovery_active);
3939 wake_up(&mddev->recovery_wait); 3965 wake_up(&mddev->recovery_wait);
3940 if (!ok) { 3966 if (!ok) {
3941 set_bit(MD_RECOVERY_ERR, &mddev->recovery); 3967 set_bit(MD_RECOVERY_ERR, &mddev->recovery);
3942 md_wakeup_thread(mddev->thread); 3968 md_wakeup_thread(mddev->thread);
3943 // stop recovery, signal do_sync .... 3969 // stop recovery, signal do_sync ....
3944 } 3970 }
3945 } 3971 }
3946 3972
3947 3973
3948 /* md_write_start(mddev, bi) 3974 /* md_write_start(mddev, bi)
3949 * If we need to update some array metadata (e.g. 'active' flag 3975 * If we need to update some array metadata (e.g. 'active' flag
3950 * in superblock) before writing, schedule a superblock update 3976 * in superblock) before writing, schedule a superblock update
3951 * and wait for it to complete. 3977 * and wait for it to complete.
3952 */ 3978 */
3953 void md_write_start(mddev_t *mddev, struct bio *bi) 3979 void md_write_start(mddev_t *mddev, struct bio *bi)
3954 { 3980 {
3955 if (bio_data_dir(bi) != WRITE) 3981 if (bio_data_dir(bi) != WRITE)
3956 return; 3982 return;
3957 3983
3958 BUG_ON(mddev->ro == 1); 3984 BUG_ON(mddev->ro == 1);
3959 if (mddev->ro == 2) { 3985 if (mddev->ro == 2) {
3960 /* need to switch to read/write */ 3986 /* need to switch to read/write */
3961 mddev->ro = 0; 3987 mddev->ro = 0;
3962 set_bit(MD_RECOVERY_NEEDED, &mddev->recovery); 3988 set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
3963 md_wakeup_thread(mddev->thread); 3989 md_wakeup_thread(mddev->thread);
3964 } 3990 }
3965 atomic_inc(&mddev->writes_pending); 3991 atomic_inc(&mddev->writes_pending);
3966 if (mddev->in_sync) { 3992 if (mddev->in_sync) {
3967 spin_lock_irq(&mddev->write_lock); 3993 spin_lock_irq(&mddev->write_lock);
3968 if (mddev->in_sync) { 3994 if (mddev->in_sync) {
3969 mddev->in_sync = 0; 3995 mddev->in_sync = 0;
3970 mddev->sb_dirty = 1; 3996 mddev->sb_dirty = 1;
3971 md_wakeup_thread(mddev->thread); 3997 md_wakeup_thread(mddev->thread);
3972 } 3998 }
3973 spin_unlock_irq(&mddev->write_lock); 3999 spin_unlock_irq(&mddev->write_lock);
3974 } 4000 }
3975 wait_event(mddev->sb_wait, mddev->sb_dirty==0); 4001 wait_event(mddev->sb_wait, mddev->sb_dirty==0);
3976 } 4002 }
3977 4003
3978 void md_write_end(mddev_t *mddev) 4004 void md_write_end(mddev_t *mddev)
3979 { 4005 {
3980 if (atomic_dec_and_test(&mddev->writes_pending)) { 4006 if (atomic_dec_and_test(&mddev->writes_pending)) {
3981 if (mddev->safemode == 2) 4007 if (mddev->safemode == 2)
3982 md_wakeup_thread(mddev->thread); 4008 md_wakeup_thread(mddev->thread);
3983 else 4009 else
3984 mod_timer(&mddev->safemode_timer, jiffies + mddev->safemode_delay); 4010 mod_timer(&mddev->safemode_timer, jiffies + mddev->safemode_delay);
3985 } 4011 }
3986 } 4012 }
3987 4013
3988 static DECLARE_WAIT_QUEUE_HEAD(resync_wait); 4014 static DECLARE_WAIT_QUEUE_HEAD(resync_wait);
3989 4015
3990 #define SYNC_MARKS 10 4016 #define SYNC_MARKS 10
3991 #define SYNC_MARK_STEP (3*HZ) 4017 #define SYNC_MARK_STEP (3*HZ)
3992 static void md_do_sync(mddev_t *mddev) 4018 static void md_do_sync(mddev_t *mddev)
3993 { 4019 {
3994 mddev_t *mddev2; 4020 mddev_t *mddev2;
3995 unsigned int currspeed = 0, 4021 unsigned int currspeed = 0,
3996 window; 4022 window;
3997 sector_t max_sectors,j, io_sectors; 4023 sector_t max_sectors,j, io_sectors;
3998 unsigned long mark[SYNC_MARKS]; 4024 unsigned long mark[SYNC_MARKS];
3999 sector_t mark_cnt[SYNC_MARKS]; 4025 sector_t mark_cnt[SYNC_MARKS];
4000 int last_mark,m; 4026 int last_mark,m;
4001 struct list_head *tmp; 4027 struct list_head *tmp;
4002 sector_t last_check; 4028 sector_t last_check;
4003 int skipped = 0; 4029 int skipped = 0;
4004 4030
4005 /* just incase thread restarts... */ 4031 /* just incase thread restarts... */
4006 if (test_bit(MD_RECOVERY_DONE, &mddev->recovery)) 4032 if (test_bit(MD_RECOVERY_DONE, &mddev->recovery))
4007 return; 4033 return;
4008 4034
4009 /* we overload curr_resync somewhat here. 4035 /* we overload curr_resync somewhat here.
4010 * 0 == not engaged in resync at all 4036 * 0 == not engaged in resync at all
4011 * 2 == checking that there is no conflict with another sync 4037 * 2 == checking that there is no conflict with another sync
4012 * 1 == like 2, but have yielded to allow conflicting resync to 4038 * 1 == like 2, but have yielded to allow conflicting resync to
4013 * commense 4039 * commense
4014 * other == active in resync - this many blocks 4040 * other == active in resync - this many blocks
4015 * 4041 *
4016 * Before starting a resync we must have set curr_resync to 4042 * Before starting a resync we must have set curr_resync to
4017 * 2, and then checked that every "conflicting" array has curr_resync 4043 * 2, and then checked that every "conflicting" array has curr_resync
4018 * less than ours. When we find one that is the same or higher 4044 * less than ours. When we find one that is the same or higher
4019 * we wait on resync_wait. To avoid deadlock, we reduce curr_resync 4045 * we wait on resync_wait. To avoid deadlock, we reduce curr_resync
4020 * to 1 if we choose to yield (based arbitrarily on address of mddev structure). 4046 * to 1 if we choose to yield (based arbitrarily on address of mddev structure).
4021 * This will mean we have to start checking from the beginning again. 4047 * This will mean we have to start checking from the beginning again.
4022 * 4048 *
4023 */ 4049 */
4024 4050
4025 do { 4051 do {
4026 mddev->curr_resync = 2; 4052 mddev->curr_resync = 2;
4027 4053
4028 try_again: 4054 try_again:
4029 if (kthread_should_stop()) { 4055 if (kthread_should_stop()) {
4030 set_bit(MD_RECOVERY_INTR, &mddev->recovery); 4056 set_bit(MD_RECOVERY_INTR, &mddev->recovery);
4031 goto skip; 4057 goto skip;
4032 } 4058 }
4033 ITERATE_MDDEV(mddev2,tmp) { 4059 ITERATE_MDDEV(mddev2,tmp) {
4034 if (mddev2 == mddev) 4060 if (mddev2 == mddev)
4035 continue; 4061 continue;
4036 if (mddev2->curr_resync && 4062 if (mddev2->curr_resync &&
4037 match_mddev_units(mddev,mddev2)) { 4063 match_mddev_units(mddev,mddev2)) {
4038 DEFINE_WAIT(wq); 4064 DEFINE_WAIT(wq);
4039 if (mddev < mddev2 && mddev->curr_resync == 2) { 4065 if (mddev < mddev2 && mddev->curr_resync == 2) {
4040 /* arbitrarily yield */ 4066 /* arbitrarily yield */
4041 mddev->curr_resync = 1; 4067 mddev->curr_resync = 1;
4042 wake_up(&resync_wait); 4068 wake_up(&resync_wait);
4043 } 4069 }
4044 if (mddev > mddev2 && mddev->curr_resync == 1) 4070 if (mddev > mddev2 && mddev->curr_resync == 1)
4045 /* no need to wait here, we can wait the next 4071 /* no need to wait here, we can wait the next
4046 * time 'round when curr_resync == 2 4072 * time 'round when curr_resync == 2
4047 */ 4073 */
4048 continue; 4074 continue;
4049 prepare_to_wait(&resync_wait, &wq, TASK_UNINTERRUPTIBLE); 4075 prepare_to_wait(&resync_wait, &wq, TASK_UNINTERRUPTIBLE);
4050 if (!kthread_should_stop() && 4076 if (!kthread_should_stop() &&
4051 mddev2->curr_resync >= mddev->curr_resync) { 4077 mddev2->curr_resync >= mddev->curr_resync) {
4052 printk(KERN_INFO "md: delaying resync of %s" 4078 printk(KERN_INFO "md: delaying resync of %s"
4053 " until %s has finished resync (they" 4079 " until %s has finished resync (they"
4054 " share one or more physical units)\n", 4080 " share one or more physical units)\n",
4055 mdname(mddev), mdname(mddev2)); 4081 mdname(mddev), mdname(mddev2));
4056 mddev_put(mddev2); 4082 mddev_put(mddev2);
4057 schedule(); 4083 schedule();
4058 finish_wait(&resync_wait, &wq); 4084 finish_wait(&resync_wait, &wq);
4059 goto try_again; 4085 goto try_again;
4060 } 4086 }
4061 finish_wait(&resync_wait, &wq); 4087 finish_wait(&resync_wait, &wq);
4062 } 4088 }
4063 } 4089 }
4064 } while (mddev->curr_resync < 2); 4090 } while (mddev->curr_resync < 2);
4065 4091
4066 if (test_bit(MD_RECOVERY_SYNC, &mddev->recovery)) { 4092 if (test_bit(MD_RECOVERY_SYNC, &mddev->recovery)) {
4067 /* resync follows the size requested by the personality, 4093 /* resync follows the size requested by the personality,
4068 * which defaults to physical size, but can be virtual size 4094 * which defaults to physical size, but can be virtual size
4069 */ 4095 */
4070 max_sectors = mddev->resync_max_sectors; 4096 max_sectors = mddev->resync_max_sectors;
4071 mddev->resync_mismatches = 0; 4097 mddev->resync_mismatches = 0;
4072 } else 4098 } else
4073 /* recovery follows the physical size of devices */ 4099 /* recovery follows the physical size of devices */
4074 max_sectors = mddev->size << 1; 4100 max_sectors = mddev->size << 1;
4075 4101
4076 printk(KERN_INFO "md: syncing RAID array %s\n", mdname(mddev)); 4102 printk(KERN_INFO "md: syncing RAID array %s\n", mdname(mddev));
4077 printk(KERN_INFO "md: minimum _guaranteed_ reconstruction speed:" 4103 printk(KERN_INFO "md: minimum _guaranteed_ reconstruction speed:"
4078 " %d KB/sec/disc.\n", sysctl_speed_limit_min); 4104 " %d KB/sec/disc.\n", sysctl_speed_limit_min);
4079 printk(KERN_INFO "md: using maximum available idle IO bandwidth " 4105 printk(KERN_INFO "md: using maximum available idle IO bandwidth "
4080 "(but not more than %d KB/sec) for reconstruction.\n", 4106 "(but not more than %d KB/sec) for reconstruction.\n",
4081 sysctl_speed_limit_max); 4107 sysctl_speed_limit_max);
4082 4108
4083 is_mddev_idle(mddev); /* this also initializes IO event counters */ 4109 is_mddev_idle(mddev); /* this also initializes IO event counters */
4084 /* we don't use the checkpoint if there's a bitmap */ 4110 /* we don't use the checkpoint if there's a bitmap */
4085 if (test_bit(MD_RECOVERY_SYNC, &mddev->recovery) && !mddev->bitmap 4111 if (test_bit(MD_RECOVERY_SYNC, &mddev->recovery) && !mddev->bitmap
4086 && ! test_bit(MD_RECOVERY_REQUESTED, &mddev->recovery)) 4112 && ! test_bit(MD_RECOVERY_REQUESTED, &mddev->recovery))
4087 j = mddev->recovery_cp; 4113 j = mddev->recovery_cp;
4088 else 4114 else
4089 j = 0; 4115 j = 0;
4090 io_sectors = 0; 4116 io_sectors = 0;
4091 for (m = 0; m < SYNC_MARKS; m++) { 4117 for (m = 0; m < SYNC_MARKS; m++) {
4092 mark[m] = jiffies; 4118 mark[m] = jiffies;
4093 mark_cnt[m] = io_sectors; 4119 mark_cnt[m] = io_sectors;
4094 } 4120 }
4095 last_mark = 0; 4121 last_mark = 0;
4096 mddev->resync_mark = mark[last_mark]; 4122 mddev->resync_mark = mark[last_mark];
4097 mddev->resync_mark_cnt = mark_cnt[last_mark]; 4123 mddev->resync_mark_cnt = mark_cnt[last_mark];
4098 4124
4099 /* 4125 /*
4100 * Tune reconstruction: 4126 * Tune reconstruction:
4101 */ 4127 */
4102 window = 32*(PAGE_SIZE/512); 4128 window = 32*(PAGE_SIZE/512);
4103 printk(KERN_INFO "md: using %dk window, over a total of %llu blocks.\n", 4129 printk(KERN_INFO "md: using %dk window, over a total of %llu blocks.\n",
4104 window/2,(unsigned long long) max_sectors/2); 4130 window/2,(unsigned long long) max_sectors/2);
4105 4131
4106 atomic_set(&mddev->recovery_active, 0); 4132 atomic_set(&mddev->recovery_active, 0);
4107 init_waitqueue_head(&mddev->recovery_wait); 4133 init_waitqueue_head(&mddev->recovery_wait);
4108 last_check = 0; 4134 last_check = 0;
4109 4135
4110 if (j>2) { 4136 if (j>2) {
4111 printk(KERN_INFO 4137 printk(KERN_INFO
4112 "md: resuming recovery of %s from checkpoint.\n", 4138 "md: resuming recovery of %s from checkpoint.\n",
4113 mdname(mddev)); 4139 mdname(mddev));
4114 mddev->curr_resync = j; 4140 mddev->curr_resync = j;
4115 } 4141 }
4116 4142
4117 while (j < max_sectors) { 4143 while (j < max_sectors) {
4118 sector_t sectors; 4144 sector_t sectors;
4119 4145
4120 skipped = 0; 4146 skipped = 0;
4121 sectors = mddev->pers->sync_request(mddev, j, &skipped, 4147 sectors = mddev->pers->sync_request(mddev, j, &skipped,
4122 currspeed < sysctl_speed_limit_min); 4148 currspeed < sysctl_speed_limit_min);
4123 if (sectors == 0) { 4149 if (sectors == 0) {
4124 set_bit(MD_RECOVERY_ERR, &mddev->recovery); 4150 set_bit(MD_RECOVERY_ERR, &mddev->recovery);
4125 goto out; 4151 goto out;
4126 } 4152 }
4127 4153
4128 if (!skipped) { /* actual IO requested */ 4154 if (!skipped) { /* actual IO requested */
4129 io_sectors += sectors; 4155 io_sectors += sectors;
4130 atomic_add(sectors, &mddev->recovery_active); 4156 atomic_add(sectors, &mddev->recovery_active);
4131 } 4157 }
4132 4158
4133 j += sectors; 4159 j += sectors;
4134 if (j>1) mddev->curr_resync = j; 4160 if (j>1) mddev->curr_resync = j;
4135 if (last_check == 0) 4161 if (last_check == 0)
4136 /* this is the earliers that rebuilt will be 4162 /* this is the earliers that rebuilt will be
4137 * visible in /proc/mdstat 4163 * visible in /proc/mdstat
4138 */ 4164 */
4139 md_new_event(mddev); 4165 md_new_event(mddev);
4140 4166
4141 if (last_check + window > io_sectors || j == max_sectors) 4167 if (last_check + window > io_sectors || j == max_sectors)
4142 continue; 4168 continue;
4143 4169
4144 last_check = io_sectors; 4170 last_check = io_sectors;
4145 4171
4146 if (test_bit(MD_RECOVERY_INTR, &mddev->recovery) || 4172 if (test_bit(MD_RECOVERY_INTR, &mddev->recovery) ||
4147 test_bit(MD_RECOVERY_ERR, &mddev->recovery)) 4173 test_bit(MD_RECOVERY_ERR, &mddev->recovery))
4148 break; 4174 break;
4149 4175
4150 repeat: 4176 repeat:
4151 if (time_after_eq(jiffies, mark[last_mark] + SYNC_MARK_STEP )) { 4177 if (time_after_eq(jiffies, mark[last_mark] + SYNC_MARK_STEP )) {
4152 /* step marks */ 4178 /* step marks */
4153 int next = (last_mark+1) % SYNC_MARKS; 4179 int next = (last_mark+1) % SYNC_MARKS;
4154 4180
4155 mddev->resync_mark = mark[next]; 4181 mddev->resync_mark = mark[next];
4156 mddev->resync_mark_cnt = mark_cnt[next]; 4182 mddev->resync_mark_cnt = mark_cnt[next];
4157 mark[next] = jiffies; 4183 mark[next] = jiffies;
4158 mark_cnt[next] = io_sectors - atomic_read(&mddev->recovery_active); 4184 mark_cnt[next] = io_sectors - atomic_read(&mddev->recovery_active);
4159 last_mark = next; 4185 last_mark = next;
4160 } 4186 }
4161 4187
4162 4188
4163 if (kthread_should_stop()) { 4189 if (kthread_should_stop()) {
4164 /* 4190 /*
4165 * got a signal, exit. 4191 * got a signal, exit.
4166 */ 4192 */
4167 printk(KERN_INFO 4193 printk(KERN_INFO
4168 "md: md_do_sync() got signal ... exiting\n"); 4194 "md: md_do_sync() got signal ... exiting\n");
4169 set_bit(MD_RECOVERY_INTR, &mddev->recovery); 4195 set_bit(MD_RECOVERY_INTR, &mddev->recovery);
4170 goto out; 4196 goto out;
4171 } 4197 }
4172 4198
4173 /* 4199 /*
4174 * this loop exits only if either when we are slower than 4200 * this loop exits only if either when we are slower than
4175 * the 'hard' speed limit, or the system was IO-idle for 4201 * the 'hard' speed limit, or the system was IO-idle for
4176 * a jiffy. 4202 * a jiffy.
4177 * the system might be non-idle CPU-wise, but we only care 4203 * the system might be non-idle CPU-wise, but we only care
4178 * about not overloading the IO subsystem. (things like an 4204 * about not overloading the IO subsystem. (things like an
4179 * e2fsck being done on the RAID array should execute fast) 4205 * e2fsck being done on the RAID array should execute fast)
4180 */ 4206 */
4181 mddev->queue->unplug_fn(mddev->queue); 4207 mddev->queue->unplug_fn(mddev->queue);
4182 cond_resched(); 4208 cond_resched();
4183 4209
4184 currspeed = ((unsigned long)(io_sectors-mddev->resync_mark_cnt))/2 4210 currspeed = ((unsigned long)(io_sectors-mddev->resync_mark_cnt))/2
4185 /((jiffies-mddev->resync_mark)/HZ +1) +1; 4211 /((jiffies-mddev->resync_mark)/HZ +1) +1;
4186 4212
4187 if (currspeed > sysctl_speed_limit_min) { 4213 if (currspeed > sysctl_speed_limit_min) {
4188 if ((currspeed > sysctl_speed_limit_max) || 4214 if ((currspeed > sysctl_speed_limit_max) ||
4189 !is_mddev_idle(mddev)) { 4215 !is_mddev_idle(mddev)) {
4190 msleep(500); 4216 msleep(500);
4191 goto repeat; 4217 goto repeat;
4192 } 4218 }
4193 } 4219 }
4194 } 4220 }
4195 printk(KERN_INFO "md: %s: sync done.\n",mdname(mddev)); 4221 printk(KERN_INFO "md: %s: sync done.\n",mdname(mddev));
4196 /* 4222 /*
4197 * this also signals 'finished resyncing' to md_stop 4223 * this also signals 'finished resyncing' to md_stop
4198 */ 4224 */
4199 out: 4225 out:
4200 mddev->queue->unplug_fn(mddev->queue); 4226 mddev->queue->unplug_fn(mddev->queue);
4201 4227
4202 wait_event(mddev->recovery_wait, !atomic_read(&mddev->recovery_active)); 4228 wait_event(mddev->recovery_wait, !atomic_read(&mddev->recovery_active));
4203 4229
4204 /* tell personality that we are finished */ 4230 /* tell personality that we are finished */
4205 mddev->pers->sync_request(mddev, max_sectors, &skipped, 1); 4231 mddev->pers->sync_request(mddev, max_sectors, &skipped, 1);
4206 4232
4207 if (!test_bit(MD_RECOVERY_ERR, &mddev->recovery) && 4233 if (!test_bit(MD_RECOVERY_ERR, &mddev->recovery) &&
4208 mddev->curr_resync > 2 && 4234 mddev->curr_resync > 2 &&
4209 mddev->curr_resync >= mddev->recovery_cp) { 4235 mddev->curr_resync >= mddev->recovery_cp) {
4210 if (test_bit(MD_RECOVERY_INTR, &mddev->recovery)) { 4236 if (test_bit(MD_RECOVERY_INTR, &mddev->recovery)) {
4211 printk(KERN_INFO 4237 printk(KERN_INFO
4212 "md: checkpointing recovery of %s.\n", 4238 "md: checkpointing recovery of %s.\n",
4213 mdname(mddev)); 4239 mdname(mddev));
4214 mddev->recovery_cp = mddev->curr_resync; 4240 mddev->recovery_cp = mddev->curr_resync;
4215 } else 4241 } else
4216 mddev->recovery_cp = MaxSector; 4242 mddev->recovery_cp = MaxSector;
4217 } 4243 }
4218 4244
4219 skip: 4245 skip:
4220 mddev->curr_resync = 0; 4246 mddev->curr_resync = 0;
4221 wake_up(&resync_wait); 4247 wake_up(&resync_wait);
4222 set_bit(MD_RECOVERY_DONE, &mddev->recovery); 4248 set_bit(MD_RECOVERY_DONE, &mddev->recovery);
4223 md_wakeup_thread(mddev->thread); 4249 md_wakeup_thread(mddev->thread);
4224 } 4250 }
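md_do_sync() measures its rate against `mddev->resync_mark`. When the marks are stepped, that field is set to the oldest of the SYNC_MARKS (10) marks, which are taken SYNC_MARK_STEP (3*HZ) apart. The estimate therefore covers roughly the last 30 seconds.

The rate is `(io_sectors - resync_mark_cnt) / 2`, i.e. 512-byte sectors converted to KB, divided by the elapsed whole seconds plus one. The "+ 1" in the divisor avoids division by zero in the first second. The trailing "+ 1" keeps the result nonzero. The loop sleeps only when the rate is above the minimum, and then only if it also exceeds the maximum or the array is seeing other IO.

The sketch restates both pieces as pure functions. `resync_speed_kb()` and `should_throttle()` are hypothetical names, and `hz` stands in for the kernel's HZ.

```c
#include <stdbool.h>

/* Hypothetical restatement of the currspeed estimate in md_do_sync(),
 * in KB/sec. Sectors are 512 bytes, so dividing by 2 gives KB. */
static unsigned int resync_speed_kb(unsigned long long io_sectors,
                                    unsigned long long mark_cnt,
                                    unsigned long now, unsigned long mark,
                                    unsigned long hz)
{
	unsigned long long kb = (io_sectors - mark_cnt) / 2;
	unsigned long secs = (now - mark) / hz + 1;  /* never zero */

	return (unsigned int)(kb / secs + 1);        /* never zero */
}

/* Hypothetical restatement of the throttle test: below the minimum the
 * resync never sleeps; above it, it sleeps if it is faster than the
 * maximum or the array is not idle. */
static bool should_throttle(unsigned int currspeed, unsigned int speed_min,
                            unsigned int speed_max, bool array_idle)
{
	return currspeed > speed_min &&
	       (currspeed > speed_max || !array_idle);
}
```

With hz = 1000, 20480 sectors (10240 KB) done over 4 seconds gives 10240 / 5 + 1 = 2049 KB/sec. That rate would be throttled under example limits of 1000 and 200000 only if the array is busy with other IO.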
4225 4251
4226 4252
4227 /* 4253 /*
4228 * This routine is regularly called by all per-raid-array threads to 4254 * This routine is regularly called by all per-raid-array threads to
 * deal with generic issues like resync and super-block update.
 * Raid personalities that don't have a thread (linear/raid0) do not
 * need this as they never do any recovery or update the superblock.
 *
 * It does not do any resync itself, but rather "forks" off other threads
 * to do that as needed.
 * When it is determined that resync is needed, we set MD_RECOVERY_RUNNING in
 * "->recovery" and create a thread at ->sync_thread.
 * When the thread finishes it sets MD_RECOVERY_DONE (and might set MD_RECOVERY_ERR)
 * and wakeups up this thread which will reap the thread and finish up.
 * This thread also removes any faulty devices (with nr_pending == 0).
 *
 * The overall approach is:
 * 1/ if the superblock needs updating, update it.
 * 2/ If a recovery thread is running, don't do anything else.
 * 3/ If recovery has finished, clean up, possibly marking spares active.
 * 4/ If there are any faulty devices, remove them.
 * 5/ If array is degraded, try to add spares devices
 * 6/ If array has spares or is not in-sync, start a resync thread.
 */
void md_check_recovery(mddev_t *mddev)
{
	mdk_rdev_t *rdev;
	struct list_head *rtmp;


	if (mddev->bitmap)
		bitmap_daemon_work(mddev->bitmap);

	if (mddev->ro)
		return;

	if (signal_pending(current)) {
		if (mddev->pers->sync_request) {
			printk(KERN_INFO "md: %s in immediate safe mode\n",
			       mdname(mddev));
			mddev->safemode = 2;
		}
		flush_signals(current);
	}

	if ( ! (
		mddev->sb_dirty ||
		test_bit(MD_RECOVERY_NEEDED, &mddev->recovery) ||
		test_bit(MD_RECOVERY_DONE, &mddev->recovery) ||
		(mddev->safemode == 1) ||
		(mddev->safemode == 2 && ! atomic_read(&mddev->writes_pending)
		 && !mddev->in_sync && mddev->recovery_cp == MaxSector)
		))
		return;

	if (mddev_trylock(mddev)==0) {
		int spares =0;

		spin_lock_irq(&mddev->write_lock);
		if (mddev->safemode && !atomic_read(&mddev->writes_pending) &&
		    !mddev->in_sync && mddev->recovery_cp == MaxSector) {
			mddev->in_sync = 1;
			mddev->sb_dirty = 1;
		}
		if (mddev->safemode == 1)
			mddev->safemode = 0;
		spin_unlock_irq(&mddev->write_lock);

		if (mddev->sb_dirty)
			md_update_sb(mddev);


		if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery) &&
		    !test_bit(MD_RECOVERY_DONE, &mddev->recovery)) {
			/* resync/recovery still happening */
			clear_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
			goto unlock;
		}
		if (mddev->sync_thread) {
			/* resync has finished, collect result */
			md_unregister_thread(mddev->sync_thread);
			mddev->sync_thread = NULL;
			if (!test_bit(MD_RECOVERY_ERR, &mddev->recovery) &&
			    !test_bit(MD_RECOVERY_INTR, &mddev->recovery)) {
				/* success...*/
				/* activate any spares */
				mddev->pers->spare_active(mddev);
			}
			md_update_sb(mddev);

			/* if array is no-longer degraded, then any saved_raid_disk
			 * information must be scrapped
			 */
			if (!mddev->degraded)
				ITERATE_RDEV(mddev,rdev,rtmp)
					rdev->saved_raid_disk = -1;

			mddev->recovery = 0;
			/* flag recovery needed just to double check */
			set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
			md_new_event(mddev);
			goto unlock;
		}
		/* Clear some bits that don't mean anything, but
		 * might be left set
		 */
		clear_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
		clear_bit(MD_RECOVERY_ERR, &mddev->recovery);
		clear_bit(MD_RECOVERY_INTR, &mddev->recovery);
		clear_bit(MD_RECOVERY_DONE, &mddev->recovery);

		/* no recovery is running.
		 * remove any failed drives, then
		 * add spares if possible.
		 * Spare are also removed and re-added, to allow
		 * the personality to fail the re-add.
		 */
		ITERATE_RDEV(mddev,rdev,rtmp)
			if (rdev->raid_disk >= 0 &&
			    (test_bit(Faulty, &rdev->flags) || ! test_bit(In_sync, &rdev->flags)) &&
			    atomic_read(&rdev->nr_pending)==0) {
				if (mddev->pers->hot_remove_disk(mddev, rdev->raid_disk)==0) {
					char nm[20];
					sprintf(nm,"rd%d", rdev->raid_disk);
					sysfs_remove_link(&mddev->kobj, nm);
					rdev->raid_disk = -1;
				}
			}

		if (mddev->degraded) {
			ITERATE_RDEV(mddev,rdev,rtmp)
				if (rdev->raid_disk < 0
				    && !test_bit(Faulty, &rdev->flags)) {
					if (mddev->pers->hot_add_disk(mddev,rdev)) {
						char nm[20];
						sprintf(nm, "rd%d", rdev->raid_disk);
						sysfs_create_link(&mddev->kobj, &rdev->kobj, nm);
						spares++;
						md_new_event(mddev);
					} else
						break;
				}
		}

		if (spares) {
			clear_bit(MD_RECOVERY_SYNC, &mddev->recovery);
			clear_bit(MD_RECOVERY_CHECK, &mddev->recovery);
		} else if (mddev->recovery_cp < MaxSector) {
			set_bit(MD_RECOVERY_SYNC, &mddev->recovery);
		} else if (!test_bit(MD_RECOVERY_SYNC, &mddev->recovery))
			/* nothing to be done ... */
			goto unlock;

		if (mddev->pers->sync_request) {
			set_bit(MD_RECOVERY_RUNNING, &mddev->recovery);
			if (spares && mddev->bitmap && ! mddev->bitmap->file) {
				/* We are adding a device or devices to an array
				 * which has the bitmap stored on all devices.
				 * So make sure all bitmap pages get written
				 */
				bitmap_write_all(mddev->bitmap);
			}
			mddev->sync_thread = md_register_thread(md_do_sync,
								mddev,
								"%s_resync");
			if (!mddev->sync_thread) {
				printk(KERN_ERR "%s: could not start resync"
				       " thread...\n",
				       mdname(mddev));
				/* leave the spares where they are, it shouldn't hurt */
				mddev->recovery = 0;
			} else
				md_wakeup_thread(mddev->sync_thread);
			md_new_event(mddev);
		}
	unlock:
		mddev_unlock(mddev);
	}
}
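Two decisions in md_check_recovery() determine whether the daemon does any work at all: the early-return test before mddev_trylock(), and the spares / recovery_cp / MD_RECOVERY_SYNC ladder near the end. The user-space sketch below models only those two decisions. The struct, the SK_* flag values and SK_MAX_SECTOR are hypothetical stand-ins for mddev_t, the MD_RECOVERY_* bits and MaxSector. Locking, superblock writes and device add/remove are deliberately left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins: these mirror, but are not, the kernel's
 * MD_RECOVERY_* bits and mddev_t fields. */
enum { SK_NEEDED = 1u << 0, SK_DONE = 1u << 1, SK_SYNC = 1u << 2, SK_CHECK = 1u << 3 };
#define SK_MAX_SECTOR UINT64_MAX   /* plays the role of MaxSector */

struct sk_mddev {
	bool sb_dirty;
	int safemode;          /* 0, 1 or 2, as in the listing */
	int writes_pending;    /* atomic_t in the kernel */
	bool in_sync;
	uint64_t recovery_cp;  /* sector_t in the kernel */
	unsigned recovery;     /* bitmask of SK_* flags */
};

/* Mirrors the "anything to do?" test that precedes mddev_trylock(). */
static bool sk_needs_attention(const struct sk_mddev *m)
{
	return m->sb_dirty ||
	       (m->recovery & SK_NEEDED) ||
	       (m->recovery & SK_DONE) ||
	       m->safemode == 1 ||
	       (m->safemode == 2 && m->writes_pending == 0 &&
		!m->in_sync && m->recovery_cp == SK_MAX_SECTOR);
}

/* Mirrors the spares / recovery_cp / SYNC ladder. Returns true when a
 * resync thread would be started (assuming the personality has a
 * sync_request method). */
static bool sk_decide_resync(struct sk_mddev *m, int spares)
{
	if (spares) {
		m->recovery &= ~(SK_SYNC | SK_CHECK);  /* recover onto new spares */
	} else if (m->recovery_cp < SK_MAX_SECTOR) {
		m->recovery |= SK_SYNC;                /* resume unfinished resync */
	} else if (!(m->recovery & SK_SYNC)) {
		return false;                          /* nothing to be done */
	}
	return true;
}
```

Keeping the two predicates separate reflects the listing: the first is evaluated without the mddev lock so an idle array costs almost nothing, and only the second runs under mddev_trylock().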

static int md_notify_reboot(struct notifier_block *this,
			    unsigned long code, void *x)
{
	struct list_head *tmp;
	mddev_t *mddev;

	if ((code == SYS_DOWN) || (code == SYS_HALT) || (code == SYS_POWER_OFF)) {

		printk(KERN_INFO "md: stopping all md devices.\n");

		ITERATE_MDDEV(mddev,tmp)
			if (mddev_trylock(mddev)==0)
				do_md_stop (mddev, 1);
		/*
		 * certain more exotic SCSI devices are known to be
		 * volatile wrt too early system reboots. While the
		 * right place to handle this issue is the given
		 * driver, we do want to have a safe RAID driver ...
		 */
		mdelay(1000*1);
	}
	return NOTIFY_DONE;
}

static struct notifier_block md_notifier = {
	.notifier_call = md_notify_reboot,
	.next = NULL,
	.priority = INT_MAX, /* before any real devices */
};
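The `.priority = INT_MAX` comment relies on the notifier chain being kept sorted by descending priority, so md is stopped before lower-priority (device) notifiers run. The sketch below is a simplified user-space model of that ordering; struct sk_notifier, sk_register() and sk_call_chain() are hypothetical names, not the kernel's notifier API.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct notifier_block. */
struct sk_notifier {
	int (*call)(struct sk_notifier *nb, unsigned long code);
	struct sk_notifier *next;
	int priority;
};

/* Keep the list sorted by descending priority; an entry with equal
 * priority is placed after those already registered. */
static void sk_register(struct sk_notifier **head, struct sk_notifier *n)
{
	while (*head && n->priority <= (*head)->priority)
		head = &(*head)->next;
	n->next = *head;
	*head = n;
}

/* Walk the chain front to back, i.e. highest priority first. */
static void sk_call_chain(struct sk_notifier *head, unsigned long code)
{
	for (; head; head = head->next)
		head->call(head, code);
}

/* Demo callback: records the priority of each notifier as it runs. */
static int sk_order[8];
static int sk_n;

static int sk_record(struct sk_notifier *nb, unsigned long code)
{
	(void)code;
	if (sk_n < (int)(sizeof sk_order / sizeof sk_order[0]))
		sk_order[sk_n++] = nb->priority;
	return 0;
}
```

Registering a priority-0 "disk" notifier first and an INT_MAX "md" notifier second still results in the md callback running first.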

static void md_geninit(void)
{
	struct proc_dir_entry *p;

	dprintk("md: sizeof(mdp_super_t) = %d\n", (int)sizeof(mdp_super_t));

	p = create_proc_entry("mdstat", S_IRUGO, NULL);
	if (p)
		p->proc_fops = &md_seq_fops;
}

static int __init md_init(void)
{
	int minor;

	printk(KERN_INFO "md: md driver %d.%d.%d MAX_MD_DEVS=%d,"
			" MD_SB_DISKS=%d\n",
			MD_MAJOR_VERSION, MD_MINOR_VERSION,
			MD_PATCHLEVEL_VERSION, MAX_MD_DEVS, MD_SB_DISKS);
	printk(KERN_INFO "md: bitmap version %d.%d\n", BITMAP_MAJOR_HI,
			BITMAP_MINOR);

	if (register_blkdev(MAJOR_NR, "md"))
		return -1;
	if ((mdp_major=register_blkdev(0, "mdp"))<=0) {
		unregister_blkdev(MAJOR_NR, "md");
		return -1;
	}
	devfs_mk_dir("md");
	blk_register_region(MKDEV(MAJOR_NR, 0), MAX_MD_DEVS, THIS_MODULE,
				md_probe, NULL, NULL);
	blk_register_region(MKDEV(mdp_major, 0), MAX_MD_DEVS<<MdpMinorShift, THIS_MODULE,
			    md_probe, NULL, NULL);

	for (minor=0; minor < MAX_MD_DEVS; ++minor)
		devfs_mk_bdev(MKDEV(MAJOR_NR, minor),
				S_IFBLK|S_IRUSR|S_IWUSR,
				"md/%d", minor);

	for (minor=0; minor < MAX_MD_DEVS; ++minor)
		devfs_mk_bdev(MKDEV(mdp_major, minor<<MdpMinorShift),
			      S_IFBLK|S_IRUSR|S_IWUSR,
			      "md/mdp%d", minor);


	register_reboot_notifier(&md_notifier);
	raid_table_header = register_sysctl_table(raid_root_table, 1);

	md_geninit();
	return (0);
}
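md_init() registers two minor ranges: the fixed md major uses one minor per array, while the dynamically allocated mdp major gives each array a block of `1 << MdpMinorShift` minors (hence `minor<<MdpMinorShift` and `MAX_MD_DEVS<<MdpMinorShift`). The sketch below shows that arithmetic. The values 6 and 256 for SK_MDP_MINOR_SHIFT and SK_MAX_MD_DEVS are assumptions for illustration only; the real MdpMinorShift and MAX_MD_DEVS should be taken from the md headers.

```c
/* Assumed values for illustration; not read from the kernel headers. */
#define SK_MDP_MINOR_SHIFT 6
#define SK_MAX_MD_DEVS     256

/* Fixed md major: one minor per array, so array N is minor N. */
static unsigned sk_md_minor(unsigned array)
{
	return array;
}

/* Partitionable (mdp) major: each array owns a block of
 * 1 << SK_MDP_MINOR_SHIFT minors. Offset 0 is the whole device and the
 * remaining offsets are available for partitions. */
static unsigned sk_mdp_minor(unsigned array, unsigned partition)
{
	return (array << SK_MDP_MINOR_SHIFT) + partition;
}

/* Size of the range each blk_register_region() call claims. */
static unsigned long sk_md_region(void)
{
	return SK_MAX_MD_DEVS;
}

static unsigned long sk_mdp_region(void)
{
	return (unsigned long)SK_MAX_MD_DEVS << SK_MDP_MINOR_SHIFT;
}
```

Under these assumed values, the whole-device node for array 1 on the mdp major is minor 64, and partition 3 of array 2 is minor 131.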


#ifndef MODULE

/*
 * Searches all registered partitions for autorun RAID arrays
 * at boot time.
 */
static dev_t detected_devices[128];
static int dev_cnt;

void md_autodetect_dev(dev_t dev)
{
	if (dev_cnt >= 0 && dev_cnt < 127)
		detected_devices[dev_cnt++] = dev;
}


static void autostart_arrays(int part)
{
	mdk_rdev_t *rdev;
	int i;

	printk(KERN_INFO "md: Autodetecting RAID arrays.\n");

	for (i = 0; i < dev_cnt; i++) {
		dev_t dev = detected_devices[i];

		rdev = md_import_device(dev,0, 0);
		if (IS_ERR(rdev))
			continue;

		if (test_bit(Faulty, &rdev->flags)) {
			MD_BUG();
			continue;
		}
		list_add(&rdev->same_set, &pending_raid_disks);
	}
	dev_cnt = 0;

	autorun_devices(part);
}

#endif
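The guard in md_autodetect_dev() silently drops devices once the table is full. Because the test is `dev_cnt < 127` against a 128-entry array, at most 127 devices are recorded and the last slot is never written. The user-space sketch below reproduces that bound so the behaviour can be checked; sk_detected, sk_dev_cnt and sk_autodetect_dev() are hypothetical names, and unsigned long stands in for dev_t.

```c
#define SK_SLOTS 128

/* Hypothetical user-space copy of the detected_devices[] bookkeeping. */
static unsigned long sk_detected[SK_SLOTS];
static int sk_dev_cnt;

static void sk_autodetect_dev(unsigned long dev)
{
	/* Same bound as the listing: only indices 0..126 are ever used,
	 * and further devices are ignored without any message. */
	if (sk_dev_cnt >= 0 && sk_dev_cnt < SK_SLOTS - 1)
		sk_detected[sk_dev_cnt++] = dev;
}
```

In the listing, autostart_arrays() then resets dev_cnt to 0 after importing the recorded devices, so the table is effectively single-use per autodetect pass.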

static __exit void md_exit(void)
{
	mddev_t *mddev;
	struct list_head *tmp;
	int i;
	blk_unregister_region(MKDEV(MAJOR_NR,0), MAX_MD_DEVS);
	blk_unregister_region(MKDEV(mdp_major,0), MAX_MD_DEVS << MdpMinorShift);
	for (i=0; i < MAX_MD_DEVS; i++)
		devfs_remove("md/%d", i);
	for (i=0; i < MAX_MD_DEVS; i++)
		devfs_remove("md/d%d", i);

	devfs_remove("md");

	unregister_blkdev(MAJOR_NR,"md");
	unregister_blkdev(mdp_major, "mdp");
	unregister_reboot_notifier(&md_notifier);
	unregister_sysctl_table(raid_table_header);
	remove_proc_entry("mdstat", NULL);
	ITERATE_MDDEV(mddev,tmp) {
		struct gendisk *disk = mddev->gendisk;
		if (!disk)
			continue;
		export_array(mddev);
		del_gendisk(disk);
		put_disk(disk);
		mddev->gendisk = NULL;
		mddev_put(mddev);
	}
}

module_init(md_init)
module_exit(md_exit)

static int get_ro(char *buffer, struct kernel_param *kp)
{
	return sprintf(buffer, "%d", start_readonly);
}
static int set_ro(const char *val, struct kernel_param *kp)
{
	char *e;
	int num = simple_strtoul(val, &e, 10);
	if (*val && (*e == '\0' || *e == '\n')) {
		start_readonly = num;
		return 0;;
	}
	return -EINVAL;
}

module_param_call(start_ro, set_ro, get_ro, NULL, 0600);
module_param(start_dirty_degraded, int, 0644);
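set_ro() accepts a value only if the string is non-empty and the digits are followed by end-of-string or a newline, which lets `echo 1 > .../start_ro` work despite echo's trailing newline. The user-space sketch below mirrors that rule. sk_parse_dec() is a hypothetical decimal-only parser that, like the base-10 use of simple_strtoul() here, stops at the first non-digit and does not skip leading whitespace; sk_start_readonly stands in for start_readonly. One consequence of testing `*val` rather than whether any digits were consumed: a bare `"\n"` is accepted and sets the value to 0.

```c
#include <ctype.h>
#include <errno.h>

/* Hypothetical stand-in for the start_readonly module variable. */
static int sk_start_readonly;

/* Decimal-only parse that stops at the first non-digit and reports
 * where it stopped (no sign, no whitespace skipping, no overflow check). */
static unsigned long sk_parse_dec(const char *s, const char **end)
{
	unsigned long v = 0;
	while (isdigit((unsigned char)*s))
		v = v * 10 + (unsigned long)(*s++ - '0');
	*end = s;
	return v;
}

/* Same acceptance rule as set_ro(): non-empty input whose digits are
 * followed by '\0' or '\n'. */
static int sk_set_ro(const char *val)
{
	const char *e;
	int num = (int)sk_parse_dec(val, &e);

	if (*val && (*e == '\0' || *e == '\n')) {
		sk_start_readonly = num;
		return 0;
	}
	return -EINVAL;
}
```

Rejected input leaves the previous value untouched, since the assignment happens only on the success path.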


EXPORT_SYMBOL(register_md_personality);
EXPORT_SYMBOL(unregister_md_personality);
EXPORT_SYMBOL(md_error);
EXPORT_SYMBOL(md_done_sync);
EXPORT_SYMBOL(md_write_start);
EXPORT_SYMBOL(md_write_end);
EXPORT_SYMBOL(md_register_thread);
EXPORT_SYMBOL(md_unregister_thread);
EXPORT_SYMBOL(md_wakeup_thread);
EXPORT_SYMBOL(md_print_devices);
EXPORT_SYMBOL(md_check_recovery);
MODULE_LICENSE("GPL");
MODULE_ALIAS("md");
MODULE_ALIAS_BLOCKDEV_MAJOR(MD_MAJOR);
