Commit 3b34380ae8c5df6debd85183c7fa1ac05f79b7d2
Committed by Linus Torvalds
1 parent 03c902e17f

[PATCH] md: allow chunk_size to be settable through sysfs

... only before array is started, of course.

Signed-off-by: Neil Brown <neilb@suse.de>
Acked-by: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

Showing 2 changed files with 34 additions and 0 deletions
Documentation/md.txt
Tools that manage md devices can be found at
   http://www.<country>.kernel.org/pub/linux/utils/raid/....


Boot time assembly of RAID arrays
---------------------------------

You can boot with your md device with the following kernel command
lines:

for old raid arrays without persistent superblocks:
  md=<md device no.>,<raid level>,<chunk size factor>,<fault level>,dev0,dev1,...,devn

for raid arrays with persistent superblocks
  md=<md device no.>,dev0,dev1,...,devn
or, to assemble a partitionable array:
  md=d<md device no.>,dev0,dev1,...,devn

md device no. = the number of the md device ...
              0 means md0,
              1 md1,
              2 md2,
              3 md3,
              4 md4

raid level = -1 linear mode
              0 striped mode
other modes are only supported with persistent super blocks

chunk size factor = (raid-0 and raid-1 only)
              Set the chunk size as 4k << n.

fault level = totally ignored

dev0-devn: e.g. /dev/hda1,/dev/hdc1,/dev/sda1,/dev/sdb1

A possible loadlin line (Harald Hoyer <HarryH@Royal.Net>) looks like this:

e:\loadlin\loadlin e:\zimage root=/dev/md0 md=0,0,4,0,/dev/hdb2,/dev/hdc3 ro
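The chunk size factor can be turned into an actual chunk size with a quick shell calculation; the loadlin line above uses factor 4:

```shell
# Chunk size factor n gives a chunk size of 4k << n.
# The md=0,0,4,0,... example above uses factor 4:
factor=4
echo "$(( 4 << factor ))k"   # prints 64k
```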


Boot time autodetection of RAID arrays
--------------------------------------

When md is compiled into the kernel (not as module), partitions of
type 0xfd are scanned and automatically assembled into RAID arrays.
This autodetection may be suppressed with the kernel parameter
"raid=noautodetect".  As of kernel 2.6.9, only drives with a type 0
superblock can be autodetected and run at boot time.

The kernel parameter "raid=partitionable" (or "raid=part") means
that all auto-detected arrays are assembled as partitionable.
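The effect of these parameters can be sketched as a shell check over a command-line string; on a live system the string would come from /proc/cmdline, and the value used here is illustrative:

```shell
# Illustrative command line; normally: cmdline=$(cat /proc/cmdline)
cmdline="root=/dev/md_d0 raid=partitionable quiet"

case " $cmdline " in
  *" raid=noautodetect "*)                  echo "autodetection suppressed" ;;
  *" raid=partitionable "*|*" raid=part "*) echo "arrays assembled as partitionable" ;;
  *)                                        echo "normal autodetection" ;;
esac
```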

Boot time assembly of degraded/dirty arrays
-------------------------------------------

If a raid5 or raid6 array is both dirty and degraded, it could have
undetectable data corruption.  This is because the fact that it is
'dirty' means that the parity cannot be trusted, and the fact that it
is degraded means that some datablocks are missing and cannot reliably
be reconstructed (due to no parity).

For this reason, md will normally refuse to start such an array.  This
requires the sysadmin to take action to explicitly start the array
despite possible corruption.  This is normally done with
   mdadm --assemble --force ....

This option is not really available if the array has the root
filesystem on it.  To support booting from such an array, md supports
a module parameter "start_dirty_degraded" which, when set to 1,
bypasses the checks and allows dirty degraded arrays to be started.

So, to boot with a root filesystem on a dirty degraded raid[56], use

   md-mod.start_dirty_degraded=1

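At runtime, module parameters conventionally also appear under /sys/module/<module>/parameters/; that path is an assumption here, not something this document states, so the read/write pattern is demonstrated against a mock directory tree:

```shell
# Mock of the assumed runtime path /sys/module/md_mod/parameters/.
mock=$(mktemp -d)
mkdir -p "$mock/sys/module/md_mod/parameters"
echo 1 > "$mock/sys/module/md_mod/parameters/start_dirty_degraded"
cat "$mock/sys/module/md_mod/parameters/start_dirty_degraded"   # prints 1
rm -r "$mock"
```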


Superblock formats
------------------

The md driver can support a variety of different superblock formats.
Currently, it supports superblock formats "0.90.0" and the "md-1" format
introduced in the 2.5 development series.

The kernel will autodetect which format superblock is being used.

Superblock format '0' is treated differently to others for legacy
reasons - it is the original superblock format.


General Rules - apply for all superblock formats
------------------------------------------------

An array is 'created' by writing appropriate superblocks to all
devices.

It is 'assembled' by associating each of these devices with a
particular md virtual device.  Once it is completely assembled, it can
be accessed.

An array should be created by a user-space tool.  This will write
superblocks to all devices.  It will usually mark the array as
'unclean', or with some devices missing so that the kernel md driver
can create appropriate redundancy (copying in raid1, parity
calculation in raid4/5).

When an array is assembled, it is first initialized with the
SET_ARRAY_INFO ioctl.  This contains, in particular, a major and minor
version number.  The major version number selects which superblock
format is to be used.  The minor number might be used to tune handling
of the format, such as suggesting where on each device to look for the
superblock.

Then each device is added using the ADD_NEW_DISK ioctl.  This
provides, in particular, a major and minor number identifying the
device to add.

The array is started with the RUN_ARRAY ioctl.

Once started, new devices can be added.  They should have an
appropriate superblock written to them, and then passed in with
ADD_NEW_DISK.

Devices that have failed or are not yet active can be detached from an
array using HOT_REMOVE_DISK.

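This ioctl sequence is normally driven by a user-space tool such as mdadm rather than issued by hand. As a dry-run sketch of the ordering only (md_ioctl is a hypothetical logging stand-in, and the argument strings are illustrative, not the real struct layouts):

```shell
# Stand-in: a real tool would issue each ioctl on an open /dev/mdX descriptor.
md_ioctl() { echo "ioctl($1) $2"; }

md_ioctl SET_ARRAY_INFO "major_version=0 minor_version=90"  # select superblock format
md_ioctl ADD_NEW_DISK   "dev=8:1"                           # device by major:minor
md_ioctl ADD_NEW_DISK   "dev=8:17"
md_ioctl RUN_ARRAY      ""                                  # start the array
```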

Specific Rules that apply to format-0 super block arrays, and
arrays with no superblock (non-persistent).
-------------------------------------------------------------

An array can be 'created' by describing the array (level, chunksize
etc) in a SET_ARRAY_INFO ioctl.  This must have major_version==0 and
raid_disks != 0.

Then uninitialized devices can be added with ADD_NEW_DISK.  The
structure passed to ADD_NEW_DISK must specify the state of the device
and its role in the array.

Once started with RUN_ARRAY, uninitialized spares can be added with
HOT_ADD_DISK.


MD devices in sysfs
-------------------
md devices appear in sysfs (/sys) as regular block devices,
e.g.
   /sys/block/md0

Each 'md' device will contain a subdirectory called 'md' which
contains further md-specific information about the device.

All md devices contain:
  level
     a text file indicating the 'raid level'.  This may be a standard
     numerical level prefixed by "RAID-" - e.g. "RAID-5", or some
     other name such as "linear" or "multipath".
     If no raid level has been set yet (array is still being
     assembled), this file will be empty.

  raid_disks
     a text file with a simple number indicating the number of devices
     in a fully functional array.  If this is not yet known, the file
     will be empty.  If an array is being resized (not currently
     possible) this will contain the larger of the old and new sizes.

  chunk_size
     This is the size in bytes for 'chunks' and is only relevant to
     raid levels that involve striping (0,4,5,6,10).  The address space
     of the array is conceptually divided into chunks and consecutive
     chunks are striped onto neighbouring devices.
     The size should be at least PAGE_SIZE (4k) and should be a power
     of 2.  This can only be set while assembling an array.

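Since chunk_size can only be set while assembling, the expected interaction is a plain write to the sysfs file before the array is started. Sketched against a mock tree (the real path would be /sys/block/mdX/md/chunk_size, which requires a live md device):

```shell
# Mock tree standing in for a real /sys; 64k is a power of 2 >= PAGE_SIZE.
mock=$(mktemp -d)
mkdir -p "$mock/sys/block/md0/md"
echo $(( 64 * 1024 )) > "$mock/sys/block/md0/md/chunk_size"
cat "$mock/sys/block/md0/md/chunk_size"   # prints 65536
rm -r "$mock"
```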
As component devices are added to an md array, they appear in the 'md'
directory as new directories named
   dev-XXX
where XXX is a name that the kernel knows for the device, e.g. hdb1.
Each directory contains:

  block
     a symlink to the block device in /sys/block, e.g.
     /sys/block/md0/md/dev-hdb1/block -> ../../../../block/hdb/hdb1

  super
     A file containing an image of the superblock read from, or
     written to, that device.

  state
     A file recording the current state of the device in the array
     which can be a comma separated list of
       faulty  - device has been kicked from active use due to
                 a detected fault
       in_sync - device is a fully in-sync member of the array
       spare   - device is working, but not a full member.
                 This includes spares that are in the process
                 of being recovered to
     This list may grow in the future.


An active md device will also contain an entry for each active device
in the array.  These are named

   rdNN

where 'NN' is the position in the array, starting from 0.
So for a 3 drive array there will be rd0, rd1, rd2.
These are symbolic links to the appropriate 'dev-XXX' entry.
Thus, for example,
   cat /sys/block/md*/md/rd*/state
will show 'in_sync' on every line.


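The rdNN symlink layout can be exercised against a mock tree (the device name hdb1 and the state value are illustrative):

```shell
# Mock of /sys/block/md0/md with one member and its rd0 symlink.
mock=$(mktemp -d)
mkdir -p "$mock/md0/md/dev-hdb1"
echo in_sync > "$mock/md0/md/dev-hdb1/state"
ln -s dev-hdb1 "$mock/md0/md/rd0"   # rd0 -> dev-hdb1
cat "$mock"/md0/md/rd*/state        # prints in_sync
rm -r "$mock"
```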
Active md devices for levels that support data redundancy (1,4,5,6)
also have

  sync_action
     a text file that can be used to monitor and control the rebuild
     process.  It contains one word which can be one of:
       resync  - redundancy is being recalculated after unclean
                 shutdown or creation
       recover - a hot spare is being built to replace a
                 failed/missing device
       idle    - nothing is happening
       check   - A full check of redundancy was requested and is
                 happening.  This reads all blocks and checks
                 them.  A repair may also happen for some raid
                 levels.
       repair  - A full check and repair is happening.  This is
                 similar to 'resync', but was requested by the
                 user, and the write-intent bitmap is NOT used to
                 optimise the process.

     This file is writable, and each of the strings that could be
     read is also meaningful when written.

     'idle' will stop an active resync/recovery etc.  There is no
     guarantee that another resync/recovery will not be automatically
     started again, though some event will be needed to trigger
     this.
     'resync' or 'recovery' can be used to restart the
     corresponding operation if it was stopped with 'idle'.
     'check' and 'repair' will start the appropriate process
     provided the current state is 'idle'.

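A user-space monitor writing to sync_action should only submit one of the documented words; a small hypothetical validator (the function name is made up for illustration):

```shell
# Accept only the action words documented above.
valid_sync_action() {
  case "$1" in
    resync|recover|idle|check|repair) return 0 ;;
    *)                                return 1 ;;
  esac
}

valid_sync_action check && echo accepted    # prints accepted
valid_sync_action scrub || echo rejected    # prints rejected
```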
  mismatch_cnt
     When performing 'check' and 'repair', and possibly when
     performing 'resync', md will count the number of errors that are
     found.  The count in 'mismatch_cnt' is the number of sectors
     that were re-written, or (for 'check') would have been
     re-written.  As most raid levels work in units of pages rather
     than sectors, this may be larger than the number of actual errors
     by a factor of the number of sectors in a page.

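With 512-byte sectors and 4k pages, that worst-case over-count factor works out to eight:

```shell
page_size=4096     # PAGE_SIZE on most architectures
sector_size=512
echo $(( page_size / sector_size ))   # prints 8
```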

Each active md device may also have attributes specific to the
personality module that manages it.
These are specific to the implementation of the module and could
change substantially if the implementation changes.

These currently include

  stripe_cache_size  (currently raid5 only)
     number of entries in the stripe cache.  This is writable, but
     there are upper and lower limits (32768, 16).  Default is 128.
  stripe_cache_active  (currently raid5 only)
     number of active entries in the stripe cache

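A writer updating stripe_cache_size would be expected to keep the value within the documented limits; a hypothetical clamp helper (the function name is illustrative):

```shell
# Clamp a requested stripe_cache_size to the documented [16, 32768] range.
clamp_stripe_cache_size() {
  n=$1
  [ "$n" -lt 16 ] && n=16          # documented lower limit
  [ "$n" -gt 32768 ] && n=32768    # documented upper limit
  echo "$n"
}

clamp_stripe_cache_size 8        # prints 16
clamp_stripe_cache_size 1024     # prints 1024
clamp_stripe_cache_size 100000   # prints 32768
```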
drivers/md/md.c
/*
   md.c : Multiple Devices driver for Linux
	  Copyright (C) 1998, 1999, 2000 Ingo Molnar

     completely rewritten, based on the MD driver code from Marc Zyngier

   Changes:

   - RAID-1/RAID-5 extensions by Miguel de Icaza, Gadi Oxman, Ingo Molnar
   - RAID-6 extensions by H. Peter Anvin <hpa@zytor.com>
   - boot support for linear and striped mode by Harald Hoyer <HarryH@Royal.Net>
   - kerneld support by Boris Tobotras <boris@xtalk.msk.su>
   - kmod support by: Cyrus Durgin
   - RAID0 bugfixes: Mark Anthony Lisher <markal@iname.com>
   - Devfs support by Richard Gooch <rgooch@atnf.csiro.au>

   - lots of fixes and improvements to the RAID1/RAID5 and generic
     RAID code (such as request based resynchronization):

     Neil Brown <neilb@cse.unsw.edu.au>.

   - persistent bitmap code
     Copyright (C) 2003-2004, Paul Clements, SteelEye Technology, Inc.

   This program is free software; you can redistribute it and/or modify
   it under the terms of the GNU General Public License as published by
   the Free Software Foundation; either version 2, or (at your option)
   any later version.

   You should have received a copy of the GNU General Public License
   (for example /usr/src/linux/COPYING); if not, write to the Free
   Software Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
*/

#include <linux/module.h>
#include <linux/config.h>
#include <linux/kthread.h>
#include <linux/linkage.h>
#include <linux/raid/md.h>
#include <linux/raid/bitmap.h>
#include <linux/sysctl.h>
#include <linux/devfs_fs_kernel.h>
#include <linux/buffer_head.h> /* for invalidate_bdev */
#include <linux/suspend.h>
#include <linux/poll.h>

#include <linux/init.h>

#include <linux/file.h>

#ifdef CONFIG_KMOD
#include <linux/kmod.h>
#endif

#include <asm/unaligned.h>

#define MAJOR_NR MD_MAJOR
#define MD_DRIVER

/* 63 partitions with the alternate major number (mdp) */
#define MdpMinorShift 6

#define DEBUG 0
#define dprintk(x...) ((void)(DEBUG && printk(x)))

#ifndef MODULE
static void autostart_arrays (int part);
#endif

static LIST_HEAD(pers_list);
static DEFINE_SPINLOCK(pers_lock);

/*
 * Current RAID-1,4,5 parallel reconstruction 'guaranteed speed limit'
 * is 1000 KB/sec, so the extra system load does not show up that much.
 * Increase it if you want to have more _guaranteed_ speed. Note that
 * the RAID driver will use the maximum available bandwidth if the IO
 * subsystem is idle. There is also an 'absolute maximum' reconstruction
 * speed limit - in case reconstruction slows down your system despite
 * idle IO detection.
 *
 * you can change it via /proc/sys/dev/raid/speed_limit_min and _max.
 */

static int sysctl_speed_limit_min = 1000;
static int sysctl_speed_limit_max = 200000;

static struct ctl_table_header *raid_table_header;

static ctl_table raid_table[] = {
	{
		.ctl_name	= DEV_RAID_SPEED_LIMIT_MIN,
		.procname	= "speed_limit_min",
		.data		= &sysctl_speed_limit_min,
		.maxlen		= sizeof(int),
		.mode		= 0644,
		.proc_handler	= &proc_dointvec,
	},
	{
		.ctl_name	= DEV_RAID_SPEED_LIMIT_MAX,
		.procname	= "speed_limit_max",
		.data		= &sysctl_speed_limit_max,
		.maxlen		= sizeof(int),
		.mode		= 0644,
		.proc_handler	= &proc_dointvec,
	},
	{ .ctl_name = 0 }
};

static ctl_table raid_dir_table[] = {
	{
		.ctl_name	= DEV_RAID,
		.procname	= "raid",
		.maxlen		= 0,
		.mode		= 0555,
		.child		= raid_table,
	},
	{ .ctl_name = 0 }
};

static ctl_table raid_root_table[] = {
	{
		.ctl_name	= CTL_DEV,
		.procname	= "dev",
		.maxlen		= 0,
		.mode		= 0555,
		.child		= raid_dir_table,
	},
	{ .ctl_name = 0 }
};

132 | 132 | ||
133 | static struct block_device_operations md_fops; | 133 | static struct block_device_operations md_fops; |
134 | 134 | ||
135 | static int start_readonly; | 135 | static int start_readonly; |
136 | 136 | ||
137 | /* | 137 | /* |
138 | * We have a system wide 'event count' that is incremented | 138 | * We have a system wide 'event count' that is incremented |
139 | * on any 'interesting' event, and readers of /proc/mdstat | 139 | * on any 'interesting' event, and readers of /proc/mdstat |
140 | * can use 'poll' or 'select' to find out when the event | 140 | * can use 'poll' or 'select' to find out when the event |
141 | * count increases. | 141 | * count increases. |
142 | * | 142 | * |
143 | * Events are: | 143 | * Events are: |
144 | * start array, stop array, error, add device, remove device, | 144 | * start array, stop array, error, add device, remove device, |
145 | * start build, activate spare | 145 | * start build, activate spare |
146 | */ | 146 | */ |
147 | static DECLARE_WAIT_QUEUE_HEAD(md_event_waiters); | 147 | static DECLARE_WAIT_QUEUE_HEAD(md_event_waiters); |
148 | static atomic_t md_event_count; | 148 | static atomic_t md_event_count; |
149 | static void md_new_event(mddev_t *mddev) | 149 | static void md_new_event(mddev_t *mddev) |
150 | { | 150 | { |
151 | atomic_inc(&md_event_count); | 151 | atomic_inc(&md_event_count); |
152 | wake_up(&md_event_waiters); | 152 | wake_up(&md_event_waiters); |
153 | } | 153 | } |
154 | 154 | ||
/*
 * Enables iteration over all existing md arrays.
 * all_mddevs_lock protects this list.
 */
static LIST_HEAD(all_mddevs);
static DEFINE_SPINLOCK(all_mddevs_lock);


/*
 * iterates through all used mddevs in the system.
 * We take care to grab the all_mddevs_lock whenever navigating
 * the list, and to always hold a refcount when unlocked.
 * Any code which breaks out of this loop while still holding
 * a reference to the current mddev must mddev_put it.
 */
#define ITERATE_MDDEV(mddev,tmp)					\
									\
	for (({ spin_lock(&all_mddevs_lock);				\
		tmp = all_mddevs.next;					\
		mddev = NULL;});					\
	     ({ if (tmp != &all_mddevs)					\
			mddev_get(list_entry(tmp, mddev_t, all_mddevs));\
		spin_unlock(&all_mddevs_lock);				\
		if (mddev) mddev_put(mddev);				\
		mddev = list_entry(tmp, mddev_t, all_mddevs);		\
		tmp != &all_mddevs;});					\
	     ({ spin_lock(&all_mddevs_lock);				\
		tmp = tmp->next;})					\
		)


static int md_fail_request (request_queue_t *q, struct bio *bio)
{
	bio_io_error(bio, bio->bi_size);
	return 0;
}

static inline mddev_t *mddev_get(mddev_t *mddev)
{
	atomic_inc(&mddev->active);
	return mddev;
}

static void mddev_put(mddev_t *mddev)
{
	if (!atomic_dec_and_lock(&mddev->active, &all_mddevs_lock))
		return;
	if (!mddev->raid_disks && list_empty(&mddev->disks)) {
		list_del(&mddev->all_mddevs);
		blk_put_queue(mddev->queue);
		kobject_unregister(&mddev->kobj);
	}
	spin_unlock(&all_mddevs_lock);
}

static mddev_t * mddev_find(dev_t unit)
{
	mddev_t *mddev, *new = NULL;

 retry:
	spin_lock(&all_mddevs_lock);
	list_for_each_entry(mddev, &all_mddevs, all_mddevs)
		if (mddev->unit == unit) {
			mddev_get(mddev);
			spin_unlock(&all_mddevs_lock);
			kfree(new);
			return mddev;
		}

	if (new) {
		list_add(&new->all_mddevs, &all_mddevs);
		spin_unlock(&all_mddevs_lock);
		return new;
	}
	spin_unlock(&all_mddevs_lock);

	new = kzalloc(sizeof(*new), GFP_KERNEL);
	if (!new)
		return NULL;

	new->unit = unit;
	if (MAJOR(unit) == MD_MAJOR)
		new->md_minor = MINOR(unit);
	else
		new->md_minor = MINOR(unit) >> MdpMinorShift;

	init_MUTEX(&new->reconfig_sem);
	INIT_LIST_HEAD(&new->disks);
	INIT_LIST_HEAD(&new->all_mddevs);
	init_timer(&new->safemode_timer);
	atomic_set(&new->active, 1);
	spin_lock_init(&new->write_lock);
	init_waitqueue_head(&new->sb_wait);

	new->queue = blk_alloc_queue(GFP_KERNEL);
	if (!new->queue) {
		kfree(new);
		return NULL;
	}

	blk_queue_make_request(new->queue, md_fail_request);

	goto retry;
}

static inline int mddev_lock(mddev_t * mddev)
{
	return down_interruptible(&mddev->reconfig_sem);
}

static inline void mddev_lock_uninterruptible(mddev_t * mddev)
{
	down(&mddev->reconfig_sem);
}

static inline int mddev_trylock(mddev_t * mddev)
{
	return down_trylock(&mddev->reconfig_sem);
}

static inline void mddev_unlock(mddev_t * mddev)
{
	up(&mddev->reconfig_sem);

	md_wakeup_thread(mddev->thread);
}

static mdk_rdev_t * find_rdev_nr(mddev_t *mddev, int nr)
{
	mdk_rdev_t * rdev;
	struct list_head *tmp;

	ITERATE_RDEV(mddev,rdev,tmp) {
		if (rdev->desc_nr == nr)
			return rdev;
	}
	return NULL;
}

static mdk_rdev_t * find_rdev(mddev_t * mddev, dev_t dev)
{
	struct list_head *tmp;
	mdk_rdev_t *rdev;

	ITERATE_RDEV(mddev,rdev,tmp) {
		if (rdev->bdev->bd_dev == dev)
			return rdev;
	}
	return NULL;
}

static struct mdk_personality *find_pers(int level)
{
	struct mdk_personality *pers;
	list_for_each_entry(pers, &pers_list, list)
		if (pers->level == level)
			return pers;
	return NULL;
}

static inline sector_t calc_dev_sboffset(struct block_device *bdev)
{
	sector_t size = bdev->bd_inode->i_size >> BLOCK_SIZE_BITS;
	return MD_NEW_SIZE_BLOCKS(size);
}

static sector_t calc_dev_size(mdk_rdev_t *rdev, unsigned chunk_size)
{
	sector_t size;

	size = rdev->sb_offset;

	if (chunk_size)
		size &= ~((sector_t)chunk_size/1024 - 1);
	return size;
}

static int alloc_disk_sb(mdk_rdev_t * rdev)
{
	if (rdev->sb_page)
		MD_BUG();

	rdev->sb_page = alloc_page(GFP_KERNEL);
	if (!rdev->sb_page) {
		printk(KERN_ALERT "md: out of memory.\n");
		return -EINVAL;
	}

	return 0;
}

static void free_disk_sb(mdk_rdev_t * rdev)
{
	if (rdev->sb_page) {
		put_page(rdev->sb_page);
		rdev->sb_loaded = 0;
		rdev->sb_page = NULL;
		rdev->sb_offset = 0;
		rdev->size = 0;
	}
}


static int super_written(struct bio *bio, unsigned int bytes_done, int error)
{
	mdk_rdev_t *rdev = bio->bi_private;
	mddev_t *mddev = rdev->mddev;
	if (bio->bi_size)
		return 1;

	if (error || !test_bit(BIO_UPTODATE, &bio->bi_flags))
		md_error(mddev, rdev);

	if (atomic_dec_and_test(&mddev->pending_writes))
		wake_up(&mddev->sb_wait);
	bio_put(bio);
	return 0;
}

static int super_written_barrier(struct bio *bio, unsigned int bytes_done, int error)
{
	struct bio *bio2 = bio->bi_private;
	mdk_rdev_t *rdev = bio2->bi_private;
	mddev_t *mddev = rdev->mddev;
	if (bio->bi_size)
		return 1;

	if (!test_bit(BIO_UPTODATE, &bio->bi_flags) &&
	    error == -EOPNOTSUPP) {
		unsigned long flags;
		/* barriers don't appear to be supported :-( */
		set_bit(BarriersNotsupp, &rdev->flags);
		mddev->barriers_work = 0;
		spin_lock_irqsave(&mddev->write_lock, flags);
		bio2->bi_next = mddev->biolist;
		mddev->biolist = bio2;
		spin_unlock_irqrestore(&mddev->write_lock, flags);
		wake_up(&mddev->sb_wait);
		bio_put(bio);
		return 0;
	}
	bio_put(bio2);
	bio->bi_private = rdev;
	return super_written(bio, bytes_done, error);
}

void md_super_write(mddev_t *mddev, mdk_rdev_t *rdev,
		   sector_t sector, int size, struct page *page)
{
	/* write first size bytes of page to sector of rdev
	 * Increment mddev->pending_writes before returning
	 * and decrement it on completion, waking up sb_wait
	 * if zero is reached.
	 * If an error occurred, call md_error
	 *
	 * As we might need to resubmit the request if BIO_RW_BARRIER
	 * causes ENOTSUPP, we allocate a spare bio...
	 */
	struct bio *bio = bio_alloc(GFP_NOIO, 1);
	int rw = (1<<BIO_RW) | (1<<BIO_RW_SYNC);

	bio->bi_bdev = rdev->bdev;
	bio->bi_sector = sector;
	bio_add_page(bio, page, size, 0);
	bio->bi_private = rdev;
	bio->bi_end_io = super_written;
	bio->bi_rw = rw;

	atomic_inc(&mddev->pending_writes);
	if (!test_bit(BarriersNotsupp, &rdev->flags)) {
		struct bio *rbio;
		rw |= (1<<BIO_RW_BARRIER);
		rbio = bio_clone(bio, GFP_NOIO);
		rbio->bi_private = bio;
		rbio->bi_end_io = super_written_barrier;
		submit_bio(rw, rbio);
	} else
		submit_bio(rw, bio);
}

void md_super_wait(mddev_t *mddev)
{
	/* wait for all superblock writes that were scheduled to complete.
	 * if any had to be retried (due to BARRIER problems), retry them
	 */
	DEFINE_WAIT(wq);
	for(;;) {
		prepare_to_wait(&mddev->sb_wait, &wq, TASK_UNINTERRUPTIBLE);
		if (atomic_read(&mddev->pending_writes)==0)
			break;
		while (mddev->biolist) {
			struct bio *bio;
			spin_lock_irq(&mddev->write_lock);
			bio = mddev->biolist;
			mddev->biolist = bio->bi_next;
			bio->bi_next = NULL;
			spin_unlock_irq(&mddev->write_lock);
			submit_bio(bio->bi_rw, bio);
		}
		schedule();
	}
	finish_wait(&mddev->sb_wait, &wq);
}

static int bi_complete(struct bio *bio, unsigned int bytes_done, int error)
{
	if (bio->bi_size)
		return 1;

	complete((struct completion*)bio->bi_private);
	return 0;
}

int sync_page_io(struct block_device *bdev, sector_t sector, int size,
		 struct page *page, int rw)
{
	struct bio *bio = bio_alloc(GFP_NOIO, 1);
	struct completion event;
	int ret;

	rw |= (1 << BIO_RW_SYNC);

	bio->bi_bdev = bdev;
	bio->bi_sector = sector;
	bio_add_page(bio, page, size, 0);
	init_completion(&event);
	bio->bi_private = &event;
	bio->bi_end_io = bi_complete;
	submit_bio(rw, bio);
	wait_for_completion(&event);

	ret = test_bit(BIO_UPTODATE, &bio->bi_flags);
	bio_put(bio);
	return ret;
}
EXPORT_SYMBOL_GPL(sync_page_io);

static int read_disk_sb(mdk_rdev_t * rdev, int size)
{
	char b[BDEVNAME_SIZE];
	if (!rdev->sb_page) {
		MD_BUG();
		return -EINVAL;
	}
	if (rdev->sb_loaded)
		return 0;

	if (!sync_page_io(rdev->bdev, rdev->sb_offset<<1, size, rdev->sb_page, READ))
		goto fail;
	rdev->sb_loaded = 1;
	return 0;

fail:
	printk(KERN_WARNING "md: disabled device %s, could not read superblock.\n",
		bdevname(rdev->bdev,b));
	return -EINVAL;
}

static int uuid_equal(mdp_super_t *sb1, mdp_super_t *sb2)
{
	if ( (sb1->set_uuid0 == sb2->set_uuid0) &&
		(sb1->set_uuid1 == sb2->set_uuid1) &&
		(sb1->set_uuid2 == sb2->set_uuid2) &&
		(sb1->set_uuid3 == sb2->set_uuid3))

		return 1;

	return 0;
}


static int sb_equal(mdp_super_t *sb1, mdp_super_t *sb2)
{
	int ret;
	mdp_super_t *tmp1, *tmp2;

	tmp1 = kmalloc(sizeof(*tmp1),GFP_KERNEL);
	tmp2 = kmalloc(sizeof(*tmp2),GFP_KERNEL);

	if (!tmp1 || !tmp2) {
		ret = 0;
		printk(KERN_INFO "md.c: sb1 is not equal to sb2!\n");
		goto abort;
	}

	*tmp1 = *sb1;
	*tmp2 = *sb2;

	/*
	 * nr_disks is not constant
	 */
	tmp1->nr_disks = 0;
	tmp2->nr_disks = 0;

	if (memcmp(tmp1, tmp2, MD_SB_GENERIC_CONSTANT_WORDS * 4))
		ret = 0;
	else
		ret = 1;

abort:
	kfree(tmp1);
	kfree(tmp2);
	return ret;
}

static unsigned int calc_sb_csum(mdp_super_t * sb)
{
	unsigned int disk_csum, csum;

	disk_csum = sb->sb_csum;
	sb->sb_csum = 0;
	csum = csum_partial((void *)sb, MD_SB_BYTES, 0);
	sb->sb_csum = disk_csum;
	return csum;
}


/*
 * Handle superblock details.
 * We want to be able to handle multiple superblock formats
 * so we have a common interface to them all, and an array of
 * different handlers.
 * We rely on user-space to write the initial superblock, and support
 * reading and updating of superblocks.
 * Interface methods are:
 *   int load_super(mdk_rdev_t *dev, mdk_rdev_t *refdev, int minor_version)
 *      loads and validates a superblock on dev.
 *      if refdev != NULL, compare superblocks on both devices
 *    Return:
 *      0 - dev has a superblock that is compatible with refdev
 *      1 - dev has a superblock that is compatible and newer than refdev
 *          so dev should be used as the refdev in future
 *     -EINVAL superblock incompatible or invalid
 *     -othererror e.g. -EIO
 *
 *   int validate_super(mddev_t *mddev, mdk_rdev_t *dev)
 *      Verify that dev is acceptable into mddev.
 *       The first time, mddev->raid_disks will be 0, and data from
 *       dev should be merged in.  Subsequent calls check that dev
 *       is new enough.  Return 0 or -EINVAL
 *
 *   void sync_super(mddev_t *mddev, mdk_rdev_t *dev)
 *     Update the superblock for rdev with data in mddev
 *     This does not write to disc.
 *
 */

struct super_type  {
	char		*name;
	struct module	*owner;
	int		(*load_super)(mdk_rdev_t *rdev, mdk_rdev_t *refdev, int minor_version);
	int		(*validate_super)(mddev_t *mddev, mdk_rdev_t *rdev);
	void		(*sync_super)(mddev_t *mddev, mdk_rdev_t *rdev);
};

611 | /* | 611 | /* |
612 | * load_super for 0.90.0 | 612 | * load_super for 0.90.0 |
613 | */ | 613 | */ |
614 | static int super_90_load(mdk_rdev_t *rdev, mdk_rdev_t *refdev, int minor_version) | 614 | static int super_90_load(mdk_rdev_t *rdev, mdk_rdev_t *refdev, int minor_version) |
615 | { | 615 | { |
616 | char b[BDEVNAME_SIZE], b2[BDEVNAME_SIZE]; | 616 | char b[BDEVNAME_SIZE], b2[BDEVNAME_SIZE]; |
617 | mdp_super_t *sb; | 617 | mdp_super_t *sb; |
618 | int ret; | 618 | int ret; |
619 | sector_t sb_offset; | 619 | sector_t sb_offset; |
620 | 620 | ||
621 | /* | 621 | /* |
622 | * Calculate the position of the superblock, | 622 | * Calculate the position of the superblock, |
623 | * it's at the end of the disk. | 623 | * it's at the end of the disk. |
624 | * | 624 | * |
625 | * It also happens to be a multiple of 4Kb. | 625 | * It also happens to be a multiple of 4Kb. |
626 | */ | 626 | */ |
627 | sb_offset = calc_dev_sboffset(rdev->bdev); | 627 | sb_offset = calc_dev_sboffset(rdev->bdev); |
628 | rdev->sb_offset = sb_offset; | 628 | rdev->sb_offset = sb_offset; |
629 | 629 | ||
630 | ret = read_disk_sb(rdev, MD_SB_BYTES); | 630 | ret = read_disk_sb(rdev, MD_SB_BYTES); |
631 | if (ret) return ret; | 631 | if (ret) return ret; |
632 | 632 | ||
633 | ret = -EINVAL; | 633 | ret = -EINVAL; |
634 | 634 | ||
635 | bdevname(rdev->bdev, b); | 635 | bdevname(rdev->bdev, b); |
636 | sb = (mdp_super_t*)page_address(rdev->sb_page); | 636 | sb = (mdp_super_t*)page_address(rdev->sb_page); |
637 | 637 | ||
638 | if (sb->md_magic != MD_SB_MAGIC) { | 638 | if (sb->md_magic != MD_SB_MAGIC) { |
639 | printk(KERN_ERR "md: invalid raid superblock magic on %s\n", | 639 | printk(KERN_ERR "md: invalid raid superblock magic on %s\n", |
640 | b); | 640 | b); |
641 | goto abort; | 641 | goto abort; |
642 | } | 642 | } |
643 | 643 | ||
	if (sb->major_version != 0 ||
	    sb->minor_version != 90) {
		printk(KERN_WARNING "Bad version number %d.%d on %s\n",
			sb->major_version, sb->minor_version,
			b);
		goto abort;
	}

	if (sb->raid_disks <= 0)
		goto abort;

	if (csum_fold(calc_sb_csum(sb)) != csum_fold(sb->sb_csum)) {
		printk(KERN_WARNING "md: invalid superblock checksum on %s\n",
			b);
		goto abort;
	}

	rdev->preferred_minor = sb->md_minor;
	rdev->data_offset = 0;
	rdev->sb_size = MD_SB_BYTES;

	if (sb->level == LEVEL_MULTIPATH)
		rdev->desc_nr = -1;
	else
		rdev->desc_nr = sb->this_disk.number;

	if (refdev == 0)
		ret = 1;
	else {
		__u64 ev1, ev2;
		mdp_super_t *refsb = (mdp_super_t*)page_address(refdev->sb_page);
		if (!uuid_equal(refsb, sb)) {
			printk(KERN_WARNING "md: %s has different UUID to %s\n",
				b, bdevname(refdev->bdev,b2));
			goto abort;
		}
		if (!sb_equal(refsb, sb)) {
			printk(KERN_WARNING "md: %s has same UUID"
			       " but different superblock to %s\n",
			       b, bdevname(refdev->bdev, b2));
			goto abort;
		}
		ev1 = md_event(sb);
		ev2 = md_event(refsb);
		if (ev1 > ev2)
			ret = 1;
		else
			ret = 0;
	}
	rdev->size = calc_dev_size(rdev, sb->chunk_size);

 abort:
	return ret;
}

/*
 * validate_super for 0.90.0
 */
static int super_90_validate(mddev_t *mddev, mdk_rdev_t *rdev)
{
	mdp_disk_t *desc;
	mdp_super_t *sb = (mdp_super_t *)page_address(rdev->sb_page);

	rdev->raid_disk = -1;
	rdev->flags = 0;
	if (mddev->raid_disks == 0) {
		mddev->major_version = 0;
		mddev->minor_version = sb->minor_version;
		mddev->patch_version = sb->patch_version;
		mddev->persistent = ! sb->not_persistent;
		mddev->chunk_size = sb->chunk_size;
		mddev->ctime = sb->ctime;
		mddev->utime = sb->utime;
		mddev->level = sb->level;
		mddev->layout = sb->layout;
		mddev->raid_disks = sb->raid_disks;
		mddev->size = sb->size;
		mddev->events = md_event(sb);
		mddev->bitmap_offset = 0;
		mddev->default_bitmap_offset = MD_SB_BYTES >> 9;

		if (sb->state & (1<<MD_SB_CLEAN))
			mddev->recovery_cp = MaxSector;
		else {
			if (sb->events_hi == sb->cp_events_hi &&
			    sb->events_lo == sb->cp_events_lo) {
				mddev->recovery_cp = sb->recovery_cp;
			} else
				mddev->recovery_cp = 0;
		}

		memcpy(mddev->uuid+0, &sb->set_uuid0, 4);
		memcpy(mddev->uuid+4, &sb->set_uuid1, 4);
		memcpy(mddev->uuid+8, &sb->set_uuid2, 4);
		memcpy(mddev->uuid+12,&sb->set_uuid3, 4);

		mddev->max_disks = MD_SB_DISKS;

		if (sb->state & (1<<MD_SB_BITMAP_PRESENT) &&
		    mddev->bitmap_file == NULL) {
			if (mddev->level != 1 && mddev->level != 5 && mddev->level != 6
			    && mddev->level != 10) {
				/* FIXME use a better test */
				printk(KERN_WARNING "md: bitmaps not supported for this level.\n");
				return -EINVAL;
			}
			mddev->bitmap_offset = mddev->default_bitmap_offset;
		}

	} else if (mddev->pers == NULL) {
		/* Insist on good event counter while assembling */
		__u64 ev1 = md_event(sb);
		++ev1;
		if (ev1 < mddev->events)
			return -EINVAL;
	} else if (mddev->bitmap) {
		/* if adding to array with a bitmap, then we can accept an
		 * older device ... but not too old.
		 */
		__u64 ev1 = md_event(sb);
		if (ev1 < mddev->bitmap->events_cleared)
			return 0;
	} else /* just a hot-add of a new device, leave raid_disk at -1 */
		return 0;

	if (mddev->level != LEVEL_MULTIPATH) {
		desc = sb->disks + rdev->desc_nr;

		if (desc->state & (1<<MD_DISK_FAULTY))
			set_bit(Faulty, &rdev->flags);
		else if (desc->state & (1<<MD_DISK_SYNC) &&
			 desc->raid_disk < mddev->raid_disks) {
			set_bit(In_sync, &rdev->flags);
			rdev->raid_disk = desc->raid_disk;
		}
		if (desc->state & (1<<MD_DISK_WRITEMOSTLY))
			set_bit(WriteMostly, &rdev->flags);
	} else /* MULTIPATH are always insync */
		set_bit(In_sync, &rdev->flags);
	return 0;
}

/*
 * sync_super for 0.90.0
 */
static void super_90_sync(mddev_t *mddev, mdk_rdev_t *rdev)
{
	mdp_super_t *sb;
	struct list_head *tmp;
	mdk_rdev_t *rdev2;
	int next_spare = mddev->raid_disks;

	/* make rdev->sb match mddev data..
	 *
	 * 1/ zero out disks
	 * 2/ Add info for each disk, keeping track of highest desc_nr (next_spare);
	 * 3/ any empty disks < next_spare become removed
	 *
	 * disks[0] gets initialised to REMOVED because
	 * we cannot be sure from other fields if it has
	 * been initialised or not.
	 */
	int i;
	int active=0, working=0,failed=0,spare=0,nr_disks=0;

	rdev->sb_size = MD_SB_BYTES;

	sb = (mdp_super_t*)page_address(rdev->sb_page);

	memset(sb, 0, sizeof(*sb));

	sb->md_magic = MD_SB_MAGIC;
	sb->major_version = mddev->major_version;
	sb->minor_version = mddev->minor_version;
	sb->patch_version = mddev->patch_version;
	sb->gvalid_words  = 0; /* ignored */
	memcpy(&sb->set_uuid0, mddev->uuid+0, 4);
	memcpy(&sb->set_uuid1, mddev->uuid+4, 4);
	memcpy(&sb->set_uuid2, mddev->uuid+8, 4);
	memcpy(&sb->set_uuid3, mddev->uuid+12,4);

	sb->ctime = mddev->ctime;
	sb->level = mddev->level;
	sb->size  = mddev->size;
	sb->raid_disks = mddev->raid_disks;
	sb->md_minor = mddev->md_minor;
	sb->not_persistent = !mddev->persistent;
	sb->utime = mddev->utime;
	sb->state = 0;
	sb->events_hi = (mddev->events>>32);
	sb->events_lo = (u32)mddev->events;

	if (mddev->in_sync)
	{
		sb->recovery_cp = mddev->recovery_cp;
		sb->cp_events_hi = (mddev->events>>32);
		sb->cp_events_lo = (u32)mddev->events;
		if (mddev->recovery_cp == MaxSector)
			sb->state = (1<< MD_SB_CLEAN);
	} else
		sb->recovery_cp = 0;

	sb->layout = mddev->layout;
	sb->chunk_size = mddev->chunk_size;

	if (mddev->bitmap && mddev->bitmap_file == NULL)
		sb->state |= (1<<MD_SB_BITMAP_PRESENT);

	sb->disks[0].state = (1<<MD_DISK_REMOVED);
	ITERATE_RDEV(mddev,rdev2,tmp) {
		mdp_disk_t *d;
		int desc_nr;
		if (rdev2->raid_disk >= 0 && test_bit(In_sync, &rdev2->flags)
		    && !test_bit(Faulty, &rdev2->flags))
			desc_nr = rdev2->raid_disk;
		else
			desc_nr = next_spare++;
		rdev2->desc_nr = desc_nr;
		d = &sb->disks[rdev2->desc_nr];
		nr_disks++;
		d->number = rdev2->desc_nr;
		d->major = MAJOR(rdev2->bdev->bd_dev);
		d->minor = MINOR(rdev2->bdev->bd_dev);
		if (rdev2->raid_disk >= 0 && test_bit(In_sync, &rdev2->flags)
		    && !test_bit(Faulty, &rdev2->flags))
			d->raid_disk = rdev2->raid_disk;
		else
			d->raid_disk = rdev2->desc_nr; /* compatibility */
		if (test_bit(Faulty, &rdev2->flags)) {
			d->state = (1<<MD_DISK_FAULTY);
			failed++;
		} else if (test_bit(In_sync, &rdev2->flags)) {
			d->state = (1<<MD_DISK_ACTIVE);
			d->state |= (1<<MD_DISK_SYNC);
			active++;
			working++;
		} else {
			d->state = 0;
			spare++;
			working++;
		}
		if (test_bit(WriteMostly, &rdev2->flags))
			d->state |= (1<<MD_DISK_WRITEMOSTLY);
	}
	/* now set the "removed" and "faulty" bits on any missing devices */
	for (i=0 ; i < mddev->raid_disks ; i++) {
		mdp_disk_t *d = &sb->disks[i];
		if (d->state == 0 && d->number == 0) {
			d->number = i;
			d->raid_disk = i;
			d->state = (1<<MD_DISK_REMOVED);
			d->state |= (1<<MD_DISK_FAULTY);
			failed++;
		}
	}
	sb->nr_disks = nr_disks;
	sb->active_disks = active;
	sb->working_disks = working;
	sb->failed_disks = failed;
	sb->spare_disks = spare;

	sb->this_disk = sb->disks[rdev->desc_nr];
	sb->sb_csum = calc_sb_csum(sb);
}

/*
 * version 1 superblock
 */

static unsigned int calc_sb_1_csum(struct mdp_superblock_1 * sb)
{
	unsigned int disk_csum, csum;
	unsigned long long newcsum;
	int size = 256 + le32_to_cpu(sb->max_dev)*2;
	unsigned int *isuper = (unsigned int*)sb;
	int i;

	disk_csum = sb->sb_csum;
	sb->sb_csum = 0;
	newcsum = 0;
	for (i=0; size>=4; size -= 4 )
		newcsum += le32_to_cpu(*isuper++);

	if (size == 2)
		newcsum += le16_to_cpu(*(unsigned short*) isuper);

	csum = (newcsum & 0xffffffff) + (newcsum >> 32);
	sb->sb_csum = disk_csum;
	return cpu_to_le32(csum);
}

static int super_1_load(mdk_rdev_t *rdev, mdk_rdev_t *refdev, int minor_version)
{
	struct mdp_superblock_1 *sb;
	int ret;
	sector_t sb_offset;
	char b[BDEVNAME_SIZE], b2[BDEVNAME_SIZE];
	int bmask;

	/*
	 * Calculate the position of the superblock.
	 * It is always aligned to a 4K boundary and
	 * depending on minor_version, it can be:
	 * 0: At least 8K, but less than 12K, from end of device
	 * 1: At start of device
	 * 2: 4K from start of device.
	 */
	switch(minor_version) {
	case 0:
		sb_offset = rdev->bdev->bd_inode->i_size >> 9;
		sb_offset -= 8*2;
		sb_offset &= ~(sector_t)(4*2-1);
		/* convert from sectors to K */
		sb_offset /= 2;
		break;
	case 1:
		sb_offset = 0;
		break;
	case 2:
		sb_offset = 4;
		break;
	default:
		return -EINVAL;
	}
	rdev->sb_offset = sb_offset;

	/* superblock is rarely larger than 1K, but it can be larger,
	 * and it is safe to read 4k, so we do that
	 */
	ret = read_disk_sb(rdev, 4096);
	if (ret) return ret;

	sb = (struct mdp_superblock_1*)page_address(rdev->sb_page);

	if (sb->magic != cpu_to_le32(MD_SB_MAGIC) ||
	    sb->major_version != cpu_to_le32(1) ||
	    le32_to_cpu(sb->max_dev) > (4096-256)/2 ||
	    le64_to_cpu(sb->super_offset) != (rdev->sb_offset<<1) ||
	    (le32_to_cpu(sb->feature_map) & ~MD_FEATURE_ALL) != 0)
		return -EINVAL;

	if (calc_sb_1_csum(sb) != sb->sb_csum) {
		printk("md: invalid superblock checksum on %s\n",
			bdevname(rdev->bdev,b));
		return -EINVAL;
	}
	if (le64_to_cpu(sb->data_size) < 10) {
		printk("md: data_size too small on %s\n",
		       bdevname(rdev->bdev,b));
		return -EINVAL;
	}
	rdev->preferred_minor = 0xffff;
	rdev->data_offset = le64_to_cpu(sb->data_offset);

	rdev->sb_size = le32_to_cpu(sb->max_dev) * 2 + 256;
	bmask = queue_hardsect_size(rdev->bdev->bd_disk->queue)-1;
	if (rdev->sb_size & bmask)
		rdev->sb_size = (rdev->sb_size | bmask)+1;

	if (refdev == 0)
		return 1;
	else {
		__u64 ev1, ev2;
		struct mdp_superblock_1 *refsb =
			(struct mdp_superblock_1*)page_address(refdev->sb_page);

		if (memcmp(sb->set_uuid, refsb->set_uuid, 16) != 0 ||
		    sb->level != refsb->level ||
		    sb->layout != refsb->layout ||
		    sb->chunksize != refsb->chunksize) {
			printk(KERN_WARNING "md: %s has strangely different"
				" superblock to %s\n",
				bdevname(rdev->bdev,b),
				bdevname(refdev->bdev,b2));
			return -EINVAL;
		}
		ev1 = le64_to_cpu(sb->events);
		ev2 = le64_to_cpu(refsb->events);

		if (ev1 > ev2)
			return 1;
	}
	if (minor_version)
		rdev->size = ((rdev->bdev->bd_inode->i_size>>9) - le64_to_cpu(sb->data_offset)) / 2;
	else
		rdev->size = rdev->sb_offset;
	if (rdev->size < le64_to_cpu(sb->data_size)/2)
		return -EINVAL;
	rdev->size = le64_to_cpu(sb->data_size)/2;
	if (le32_to_cpu(sb->chunksize))
		rdev->size &= ~((sector_t)le32_to_cpu(sb->chunksize)/2 - 1);
	return 0;
}

static int super_1_validate(mddev_t *mddev, mdk_rdev_t *rdev)
{
	struct mdp_superblock_1 *sb = (struct mdp_superblock_1*)page_address(rdev->sb_page);

	rdev->raid_disk = -1;
	rdev->flags = 0;
	if (mddev->raid_disks == 0) {
		mddev->major_version = 1;
		mddev->patch_version = 0;
		mddev->persistent = 1;
		mddev->chunk_size = le32_to_cpu(sb->chunksize) << 9;
		mddev->ctime = le64_to_cpu(sb->ctime) & ((1ULL << 32)-1);
		mddev->utime = le64_to_cpu(sb->utime) & ((1ULL << 32)-1);
		mddev->level = le32_to_cpu(sb->level);
		mddev->layout = le32_to_cpu(sb->layout);
		mddev->raid_disks = le32_to_cpu(sb->raid_disks);
		mddev->size = le64_to_cpu(sb->size)/2;
		mddev->events = le64_to_cpu(sb->events);
		mddev->bitmap_offset = 0;
		mddev->default_bitmap_offset = 1024;

		mddev->recovery_cp = le64_to_cpu(sb->resync_offset);
		memcpy(mddev->uuid, sb->set_uuid, 16);

		mddev->max_disks = (4096-256)/2;

		if ((le32_to_cpu(sb->feature_map) & MD_FEATURE_BITMAP_OFFSET) &&
		    mddev->bitmap_file == NULL ) {
			if (mddev->level != 1 && mddev->level != 5 && mddev->level != 6
			    && mddev->level != 10) {
				printk(KERN_WARNING "md: bitmaps not supported for this level.\n");
				return -EINVAL;
			}
			mddev->bitmap_offset = (__s32)le32_to_cpu(sb->bitmap_offset);
		}
	} else if (mddev->pers == NULL) {
		/* Insist on good event counter while assembling */
		__u64 ev1 = le64_to_cpu(sb->events);
		++ev1;
		if (ev1 < mddev->events)
			return -EINVAL;
	} else if (mddev->bitmap) {
		/* If adding to array with a bitmap, then we can accept an
		 * older device, but not too old.
		 */
		__u64 ev1 = le64_to_cpu(sb->events);
		if (ev1 < mddev->bitmap->events_cleared)
			return 0;
	} else /* just a hot-add of a new device, leave raid_disk at -1 */
		return 0;

	if (mddev->level != LEVEL_MULTIPATH) {
		int role;
		rdev->desc_nr = le32_to_cpu(sb->dev_number);
		role = le16_to_cpu(sb->dev_roles[rdev->desc_nr]);
		switch(role) {
		case 0xffff: /* spare */
			break;
		case 0xfffe: /* faulty */
			set_bit(Faulty, &rdev->flags);
			break;
		default:
			set_bit(In_sync, &rdev->flags);
			rdev->raid_disk = role;
			break;
		}
		if (sb->devflags & WriteMostly1)
			set_bit(WriteMostly, &rdev->flags);
	} else /* MULTIPATH are always insync */
		set_bit(In_sync, &rdev->flags);

	return 0;
}
1113 | 1113 | ||
1114 | static void super_1_sync(mddev_t *mddev, mdk_rdev_t *rdev) | 1114 | static void super_1_sync(mddev_t *mddev, mdk_rdev_t *rdev) |
1115 | { | 1115 | { |
1116 | struct mdp_superblock_1 *sb; | 1116 | struct mdp_superblock_1 *sb; |
1117 | struct list_head *tmp; | 1117 | struct list_head *tmp; |
1118 | mdk_rdev_t *rdev2; | 1118 | mdk_rdev_t *rdev2; |
1119 | int max_dev, i; | 1119 | int max_dev, i; |
1120 | /* make rdev->sb match mddev and rdev data. */ | 1120 | /* make rdev->sb match mddev and rdev data. */ |
1121 | 1121 | ||
1122 |	sb = (struct mdp_superblock_1*)page_address(rdev->sb_page);
1123 |
1124 |	sb->feature_map = 0;
1125 |	sb->pad0 = 0;
1126 |	memset(sb->pad1, 0, sizeof(sb->pad1));
1127 |	memset(sb->pad2, 0, sizeof(sb->pad2));
1128 |	memset(sb->pad3, 0, sizeof(sb->pad3));
1129 |
1130 |	sb->utime = cpu_to_le64((__u64)mddev->utime);
1131 |	sb->events = cpu_to_le64(mddev->events);
1132 |	if (mddev->in_sync)
1133 |		sb->resync_offset = cpu_to_le64(mddev->recovery_cp);
1134 |	else
1135 |		sb->resync_offset = cpu_to_le64(0);
1136 |
1137 |	if (mddev->bitmap && mddev->bitmap_file == NULL) {
1138 |		sb->bitmap_offset = cpu_to_le32((__u32)mddev->bitmap_offset);
1139 |		sb->feature_map = cpu_to_le32(MD_FEATURE_BITMAP_OFFSET);
1140 |	}
1141 |
1142 |	max_dev = 0;
1143 |	ITERATE_RDEV(mddev,rdev2,tmp)
1144 |		if (rdev2->desc_nr+1 > max_dev)
1145 |			max_dev = rdev2->desc_nr+1;
1146 |
1147 |	sb->max_dev = cpu_to_le32(max_dev);
1148 |	for (i=0; i<max_dev;i++)
1149 |		sb->dev_roles[i] = cpu_to_le16(0xfffe);
1150 |
1151 |	ITERATE_RDEV(mddev,rdev2,tmp) {
1152 |		i = rdev2->desc_nr;
1153 |		if (test_bit(Faulty, &rdev2->flags))
1154 |			sb->dev_roles[i] = cpu_to_le16(0xfffe);
1155 |		else if (test_bit(In_sync, &rdev2->flags))
1156 |			sb->dev_roles[i] = cpu_to_le16(rdev2->raid_disk);
1157 |		else
1158 |			sb->dev_roles[i] = cpu_to_le16(0xffff);
1159 |	}
1160 |
1161 |	sb->recovery_offset = cpu_to_le64(0); /* not supported yet */
1162 |	sb->sb_csum = calc_sb_1_csum(sb);
1163 | }
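Aside for readers of this diff: the `dev_roles` loop above uses two sentinel values, 0xfffe for a faulty slot and 0xffff for a spare, with any other value naming the raid_disk the device fills (all stored little-endian via cpu_to_le16). A minimal stand-alone model of that encoding, with `dev_role` as a hypothetical helper name not present in md.c:

```c
#include <assert.h>
#include <stdint.h>

#define ROLE_FAULTY 0xfffe	/* slot holds a failed device */
#define ROLE_SPARE  0xffff	/* device present but not in sync */

/* Mirror of the per-device branch in super_1_sync(): faulty wins,
 * then in-sync devices record their raid_disk, the rest are spares. */
static uint16_t dev_role(int faulty, int in_sync, int raid_disk)
{
	if (faulty)
		return ROLE_FAULTY;
	if (in_sync)
		return (uint16_t)raid_disk;
	return ROLE_SPARE;
}
```

The byte-swapping (cpu_to_le16) is left out here; only the role selection is modeled.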
1164 |
1165 |
1166 | static struct super_type super_types[] = {
1167 |	[0] = {
1168 |		.name = "0.90.0",
1169 |		.owner = THIS_MODULE,
1170 |		.load_super = super_90_load,
1171 |		.validate_super = super_90_validate,
1172 |		.sync_super = super_90_sync,
1173 |	},
1174 |	[1] = {
1175 |		.name = "md-1",
1176 |		.owner = THIS_MODULE,
1177 |		.load_super = super_1_load,
1178 |		.validate_super = super_1_validate,
1179 |		.sync_super = super_1_sync,
1180 |	},
1181 | };
1182 |
1183 | static mdk_rdev_t * match_dev_unit(mddev_t *mddev, mdk_rdev_t *dev)
1184 | {
1185 |	struct list_head *tmp;
1186 |	mdk_rdev_t *rdev;
1187 |
1188 |	ITERATE_RDEV(mddev,rdev,tmp)
1189 |		if (rdev->bdev->bd_contains == dev->bdev->bd_contains)
1190 |			return rdev;
1191 |
1192 |	return NULL;
1193 | }
1194 |
1195 | static int match_mddev_units(mddev_t *mddev1, mddev_t *mddev2)
1196 | {
1197 |	struct list_head *tmp;
1198 |	mdk_rdev_t *rdev;
1199 |
1200 |	ITERATE_RDEV(mddev1,rdev,tmp)
1201 |		if (match_dev_unit(mddev2, rdev))
1202 |			return 1;
1203 |
1204 |	return 0;
1205 | }
1206 |
1207 | static LIST_HEAD(pending_raid_disks);
1208 |
1209 | static int bind_rdev_to_array(mdk_rdev_t * rdev, mddev_t * mddev)
1210 | {
1211 |	mdk_rdev_t *same_pdev;
1212 |	char b[BDEVNAME_SIZE], b2[BDEVNAME_SIZE];
1213 |	struct kobject *ko;
1214 |
1215 |	if (rdev->mddev) {
1216 |		MD_BUG();
1217 |		return -EINVAL;
1218 |	}
1219 |	same_pdev = match_dev_unit(mddev, rdev);
1220 |	if (same_pdev)
1221 |		printk(KERN_WARNING
1222 |			"%s: WARNING: %s appears to be on the same physical"
1223 |			" disk as %s. True\n protection against single-disk"
1224 |			" failure might be compromised.\n",
1225 |			mdname(mddev), bdevname(rdev->bdev,b),
1226 |			bdevname(same_pdev->bdev,b2));
1227 |
1228 |	/* Verify rdev->desc_nr is unique.
1229 |	 * If it is -1, assign a free number, else
1230 |	 * check number is not in use
1231 |	 */
1232 |	if (rdev->desc_nr < 0) {
1233 |		int choice = 0;
1234 |		if (mddev->pers) choice = mddev->raid_disks;
1235 |		while (find_rdev_nr(mddev, choice))
1236 |			choice++;
1237 |		rdev->desc_nr = choice;
1238 |	} else {
1239 |		if (find_rdev_nr(mddev, rdev->desc_nr))
1240 |			return -EBUSY;
1241 |	}
1242 |	bdevname(rdev->bdev,b);
1243 |	if (kobject_set_name(&rdev->kobj, "dev-%s", b) < 0)
1244 |		return -ENOMEM;
1245 |
1246 |	list_add(&rdev->same_set, &mddev->disks);
1247 |	rdev->mddev = mddev;
1248 |	printk(KERN_INFO "md: bind<%s>\n", b);
1249 |
1250 |	rdev->kobj.parent = &mddev->kobj;
1251 |	kobject_add(&rdev->kobj);
1252 |
1253 |	if (rdev->bdev->bd_part)
1254 |		ko = &rdev->bdev->bd_part->kobj;
1255 |	else
1256 |		ko = &rdev->bdev->bd_disk->kobj;
1257 |	sysfs_create_link(&rdev->kobj, ko, "block");
1258 |	return 0;
1259 | }
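Aside: the desc_nr comment in bind_rdev_to_array() describes a first-fit scan that starts at raid_disks when the array is already running, so newly added devices land after the active slots. A small userspace sketch of that selection, where the `used` array and `pick_desc_nr` are hypothetical stand-ins for find_rdev_nr():

```c
#include <assert.h>

/* First-fit scan mirroring the desc_nr < 0 branch above: try 'start'
 * (0 for an inactive array, raid_disks for an active one) and walk
 * upward until a number not present in 'used' is found. */
static int pick_desc_nr(const int *used, int nused, int start)
{
	int choice = start;

	for (;;) {
		int i, taken = 0;

		for (i = 0; i < nused; i++)
			if (used[i] == choice)
				taken = 1;
		if (!taken)
			return choice;
		choice++;
	}
}
```

In the kernel the scan is driven by find_rdev_nr() against mddev->disks rather than a flat array; only the search order is illustrated here.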
1260 |
1261 | static void unbind_rdev_from_array(mdk_rdev_t * rdev)
1262 | {
1263 |	char b[BDEVNAME_SIZE];
1264 |	if (!rdev->mddev) {
1265 |		MD_BUG();
1266 |		return;
1267 |	}
1268 |	list_del_init(&rdev->same_set);
1269 |	printk(KERN_INFO "md: unbind<%s>\n", bdevname(rdev->bdev,b));
1270 |	rdev->mddev = NULL;
1271 |	sysfs_remove_link(&rdev->kobj, "block");
1272 |	kobject_del(&rdev->kobj);
1273 | }
1274 |
1275 | /*
1276 |  * prevent the device from being mounted, repartitioned or
1277 |  * otherwise reused by a RAID array (or any other kernel
1278 |  * subsystem), by bd_claiming the device.
1279 |  */
1280 | static int lock_rdev(mdk_rdev_t *rdev, dev_t dev)
1281 | {
1282 |	int err = 0;
1283 |	struct block_device *bdev;
1284 |	char b[BDEVNAME_SIZE];
1285 |
1286 |	bdev = open_by_devnum(dev, FMODE_READ|FMODE_WRITE);
1287 |	if (IS_ERR(bdev)) {
1288 |		printk(KERN_ERR "md: could not open %s.\n",
1289 |			__bdevname(dev, b));
1290 |		return PTR_ERR(bdev);
1291 |	}
1292 |	err = bd_claim(bdev, rdev);
1293 |	if (err) {
1294 |		printk(KERN_ERR "md: could not bd_claim %s.\n",
1295 |			bdevname(bdev, b));
1296 |		blkdev_put(bdev);
1297 |		return err;
1298 |	}
1299 |	rdev->bdev = bdev;
1300 |	return err;
1301 | }
1302 |
1303 | static void unlock_rdev(mdk_rdev_t *rdev)
1304 | {
1305 |	struct block_device *bdev = rdev->bdev;
1306 |	rdev->bdev = NULL;
1307 |	if (!bdev)
1308 |		MD_BUG();
1309 |	bd_release(bdev);
1310 |	blkdev_put(bdev);
1311 | }
1312 |
1313 | void md_autodetect_dev(dev_t dev);
1314 |
1315 | static void export_rdev(mdk_rdev_t * rdev)
1316 | {
1317 |	char b[BDEVNAME_SIZE];
1318 |	printk(KERN_INFO "md: export_rdev(%s)\n",
1319 |		bdevname(rdev->bdev,b));
1320 |	if (rdev->mddev)
1321 |		MD_BUG();
1322 |	free_disk_sb(rdev);
1323 |	list_del_init(&rdev->same_set);
1324 | #ifndef MODULE
1325 |	md_autodetect_dev(rdev->bdev->bd_dev);
1326 | #endif
1327 |	unlock_rdev(rdev);
1328 |	kobject_put(&rdev->kobj);
1329 | }
1330 |
1331 | static void kick_rdev_from_array(mdk_rdev_t * rdev)
1332 | {
1333 |	unbind_rdev_from_array(rdev);
1334 |	export_rdev(rdev);
1335 | }
1336 |
1337 | static void export_array(mddev_t *mddev)
1338 | {
1339 |	struct list_head *tmp;
1340 |	mdk_rdev_t *rdev;
1341 |
1342 |	ITERATE_RDEV(mddev,rdev,tmp) {
1343 |		if (!rdev->mddev) {
1344 |			MD_BUG();
1345 |			continue;
1346 |		}
1347 |		kick_rdev_from_array(rdev);
1348 |	}
1349 |	if (!list_empty(&mddev->disks))
1350 |		MD_BUG();
1351 |	mddev->raid_disks = 0;
1352 |	mddev->major_version = 0;
1353 | }
1354 |
1355 | static void print_desc(mdp_disk_t *desc)
1356 | {
1357 |	printk(" DISK<N:%d,(%d,%d),R:%d,S:%d>\n", desc->number,
1358 |		desc->major,desc->minor,desc->raid_disk,desc->state);
1359 | }
1360 |
1361 | static void print_sb(mdp_super_t *sb)
1362 | {
1363 |	int i;
1364 |
1365 |	printk(KERN_INFO
1366 |		"md: SB: (V:%d.%d.%d) ID:<%08x.%08x.%08x.%08x> CT:%08x\n",
1367 |		sb->major_version, sb->minor_version, sb->patch_version,
1368 |		sb->set_uuid0, sb->set_uuid1, sb->set_uuid2, sb->set_uuid3,
1369 |		sb->ctime);
1370 |	printk(KERN_INFO "md: L%d S%08d ND:%d RD:%d md%d LO:%d CS:%d\n",
1371 |		sb->level, sb->size, sb->nr_disks, sb->raid_disks,
1372 |		sb->md_minor, sb->layout, sb->chunk_size);
1373 |	printk(KERN_INFO "md: UT:%08x ST:%d AD:%d WD:%d"
1374 |		" FD:%d SD:%d CSUM:%08x E:%08lx\n",
1375 |		sb->utime, sb->state, sb->active_disks, sb->working_disks,
1376 |		sb->failed_disks, sb->spare_disks,
1377 |		sb->sb_csum, (unsigned long)sb->events_lo);
1378 |
1379 |	printk(KERN_INFO);
1380 |	for (i = 0; i < MD_SB_DISKS; i++) {
1381 |		mdp_disk_t *desc;
1382 |
1383 |		desc = sb->disks + i;
1384 |		if (desc->number || desc->major || desc->minor ||
1385 |		    desc->raid_disk || (desc->state && (desc->state != 4))) {
1386 |			printk(" D %2d: ", i);
1387 |			print_desc(desc);
1388 |		}
1389 |	}
1390 |	printk(KERN_INFO "md: THIS: ");
1391 |	print_desc(&sb->this_disk);
1392 |
1393 | }
1394 |
1395 | static void print_rdev(mdk_rdev_t *rdev)
1396 | {
1397 |	char b[BDEVNAME_SIZE];
1398 |	printk(KERN_INFO "md: rdev %s, SZ:%08llu F:%d S:%d DN:%u\n",
1399 |		bdevname(rdev->bdev,b), (unsigned long long)rdev->size,
1400 |		test_bit(Faulty, &rdev->flags), test_bit(In_sync, &rdev->flags),
1401 |		rdev->desc_nr);
1402 |	if (rdev->sb_loaded) {
1403 |		printk(KERN_INFO "md: rdev superblock:\n");
1404 |		print_sb((mdp_super_t*)page_address(rdev->sb_page));
1405 |	} else
1406 |		printk(KERN_INFO "md: no rdev superblock!\n");
1407 | }
1408 |
1409 | void md_print_devices(void)
1410 | {
1411 |	struct list_head *tmp, *tmp2;
1412 |	mdk_rdev_t *rdev;
1413 |	mddev_t *mddev;
1414 |	char b[BDEVNAME_SIZE];
1415 |
1416 |	printk("\n");
1417 |	printk("md: **********************************\n");
1418 |	printk("md: * <COMPLETE RAID STATE PRINTOUT> *\n");
1419 |	printk("md: **********************************\n");
1420 |	ITERATE_MDDEV(mddev,tmp) {
1421 |
1422 |		if (mddev->bitmap)
1423 |			bitmap_print_sb(mddev->bitmap);
1424 |		else
1425 |			printk("%s: ", mdname(mddev));
1426 |		ITERATE_RDEV(mddev,rdev,tmp2)
1427 |			printk("<%s>", bdevname(rdev->bdev,b));
1428 |		printk("\n");
1429 |
1430 |		ITERATE_RDEV(mddev,rdev,tmp2)
1431 |			print_rdev(rdev);
1432 |	}
1433 |	printk("md: **********************************\n");
1434 |	printk("\n");
1435 | }
1436 |
1437 |
1438 | static void sync_sbs(mddev_t * mddev)
1439 | {
1440 |	mdk_rdev_t *rdev;
1441 |	struct list_head *tmp;
1442 |
1443 |	ITERATE_RDEV(mddev,rdev,tmp) {
1444 |		super_types[mddev->major_version].
1445 |			sync_super(mddev, rdev);
1446 |		rdev->sb_loaded = 1;
1447 |	}
1448 | }
1449 |
1450 | static void md_update_sb(mddev_t * mddev)
1451 | {
1452 |	int err;
1453 |	struct list_head *tmp;
1454 |	mdk_rdev_t *rdev;
1455 |	int sync_req;
1456 |
1457 | repeat:
1458 |	spin_lock_irq(&mddev->write_lock);
1459 |	sync_req = mddev->in_sync;
1460 |	mddev->utime = get_seconds();
1461 |	mddev->events ++;
1462 |
1463 |	if (!mddev->events) {
1464 |		/*
1465 |		 * oops, this 64-bit counter should never wrap.
1466 |		 * Either we are in around ~1 trillion A.C., assuming
1467 |		 * 1 reboot per second, or we have a bug:
1468 |		 */
1469 |		MD_BUG();
1470 |		mddev->events --;
1471 |	}
1472 |	mddev->sb_dirty = 2;
1473 |	sync_sbs(mddev);
1474 |
1475 |	/*
1476 |	 * do not write anything to disk if using
1477 |	 * nonpersistent superblocks
1478 |	 */
1479 |	if (!mddev->persistent) {
1480 |		mddev->sb_dirty = 0;
1481 |		spin_unlock_irq(&mddev->write_lock);
1482 |		wake_up(&mddev->sb_wait);
1483 |		return;
1484 |	}
1485 |	spin_unlock_irq(&mddev->write_lock);
1486 |
1487 |	dprintk(KERN_INFO
1488 |		"md: updating %s RAID superblock on device (in sync %d)\n",
1489 |		mdname(mddev),mddev->in_sync);
1490 |
1491 |	err = bitmap_update_sb(mddev->bitmap);
1492 |	ITERATE_RDEV(mddev,rdev,tmp) {
1493 |		char b[BDEVNAME_SIZE];
1494 |		dprintk(KERN_INFO "md: ");
1495 |		if (test_bit(Faulty, &rdev->flags))
1496 |			dprintk("(skipping faulty ");
1497 |
1498 |		dprintk("%s ", bdevname(rdev->bdev,b));
1499 |		if (!test_bit(Faulty, &rdev->flags)) {
1500 |			md_super_write(mddev,rdev,
1501 |				rdev->sb_offset<<1, rdev->sb_size,
1502 |				rdev->sb_page);
1503 |			dprintk(KERN_INFO "(write) %s's sb offset: %llu\n",
1504 |				bdevname(rdev->bdev,b),
1505 |				(unsigned long long)rdev->sb_offset);
1506 |
1507 |		} else
1508 |			dprintk(")\n");
1509 |		if (mddev->level == LEVEL_MULTIPATH)
1510 |			/* only need to write one superblock... */
1511 |			break;
1512 |	}
1513 |	md_super_wait(mddev);
1514 |	/* if there was a failure, sb_dirty was set to 1, and we re-write super */
1515 |
1516 |	spin_lock_irq(&mddev->write_lock);
1517 |	if (mddev->in_sync != sync_req|| mddev->sb_dirty == 1) {
1518 |		/* have to write it out again */
1519 |		spin_unlock_irq(&mddev->write_lock);
1520 |		goto repeat;
1521 |	}
1522 |	mddev->sb_dirty = 0;
1523 |	spin_unlock_irq(&mddev->write_lock);
1524 |	wake_up(&mddev->sb_wait);
1525 |
1526 | }
1527 |
1528 | /* words written to sysfs files may, or my not, be \n terminated.
1529 |  * We want to accept with case. For this we use cmd_match.
1530 |  */
1531 | static int cmd_match(const char *cmd, const char *str)
1532 | {
1533 |	/* See if cmd, written into a sysfs file, matches
1534 |	 * str. They must either be the same, or cmd can
1535 |	 * have a trailing newline
1536 |	 */
1537 |	while (*cmd && *str && *cmd == *str) {
1538 |		cmd++;
1539 |		str++;
1540 |	}
1541 |	if (*cmd == '\n')
1542 |		cmd++;
1543 |	if (*str || *cmd)
1544 |		return 0;
1545 |	return 1;
1546 | }
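Aside: cmd_match() is pure string logic with no kernel dependencies, so its newline tolerance (a match succeeds whether or not the sysfs write included a trailing '\n', but any other suffix fails) can be exercised directly in a userspace harness. Stand-alone copy of the function above:

```c
#include <assert.h>

/* Verbatim copy of cmd_match() from the diff above: cmd matches str
 * when they are identical, or when cmd is str plus one trailing '\n'. */
static int cmd_match(const char *cmd, const char *str)
{
	while (*cmd && *str && *cmd == *str) {
		cmd++;
		str++;
	}
	if (*cmd == '\n')
		cmd++;
	if (*str || *cmd)
		return 0;
	return 1;
}
```

Note the asymmetry: only one trailing newline on cmd is forgiven, and a cmd that is a strict prefix of str (or vice versa) never matches.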
1547 |
1548 | struct rdev_sysfs_entry {
1549 |	struct attribute attr;
1550 |	ssize_t (*show)(mdk_rdev_t *, char *);
1551 |	ssize_t (*store)(mdk_rdev_t *, const char *, size_t);
1552 | };
1553 |
1554 | static ssize_t
1555 | state_show(mdk_rdev_t *rdev, char *page)
1556 | {
1557 |	char *sep = "";
1558 |	int len=0;
1559 |
1560 |	if (test_bit(Faulty, &rdev->flags)) {
1561 |		len+= sprintf(page+len, "%sfaulty",sep);
1562 |		sep = ",";
1563 |	}
1564 |	if (test_bit(In_sync, &rdev->flags)) {
1565 |		len += sprintf(page+len, "%sin_sync",sep);
1566 |		sep = ",";
1567 |	}
1568 |	if (!test_bit(Faulty, &rdev->flags) &&
1569 |	    !test_bit(In_sync, &rdev->flags)) {
1570 |		len += sprintf(page+len, "%sspare", sep);
1571 |		sep = ",";
1572 |	}
1573 |	return len+sprintf(page+len, "\n");
1574 | }
1575 |
1576 | static struct rdev_sysfs_entry
1577 | rdev_state = __ATTR_RO(state);
1578 |
1579 | static ssize_t
1580 | super_show(mdk_rdev_t *rdev, char *page)
1581 | {
1582 |	if (rdev->sb_loaded && rdev->sb_size) {
1583 |		memcpy(page, page_address(rdev->sb_page), rdev->sb_size);
1584 |		return rdev->sb_size;
1585 |	} else
1586 |		return 0;
1587 | }
1588 | static struct rdev_sysfs_entry rdev_super = __ATTR_RO(super);
1589 |
1590 | static struct attribute *rdev_default_attrs[] = {
1591 |	&rdev_state.attr,
1592 |	&rdev_super.attr,
1593 |	NULL,
1594 | };
1595 | static ssize_t
1596 | rdev_attr_show(struct kobject *kobj, struct attribute *attr, char *page)
1597 | {
1598 |	struct rdev_sysfs_entry *entry = container_of(attr, struct rdev_sysfs_entry, attr);
1599 |	mdk_rdev_t *rdev = container_of(kobj, mdk_rdev_t, kobj);
1600 |
1601 |	if (!entry->show)
1602 |		return -EIO;
1603 |	return entry->show(rdev, page);
1604 | }
1605 |
1606 | static ssize_t
1607 | rdev_attr_store(struct kobject *kobj, struct attribute *attr,
1608 |		const char *page, size_t length)
1609 | {
1610 |	struct rdev_sysfs_entry *entry = container_of(attr, struct rdev_sysfs_entry, attr);
1611 |	mdk_rdev_t *rdev = container_of(kobj, mdk_rdev_t, kobj);
1612 |
1613 |	if (!entry->store)
1614 |		return -EIO;
1615 |	return entry->store(rdev, page, length);
1616 | }
1617 |
1618 | static void rdev_free(struct kobject *ko)
1619 | {
1620 |	mdk_rdev_t *rdev = container_of(ko, mdk_rdev_t, kobj);
1621 |	kfree(rdev);
1622 | }
1623 | static struct sysfs_ops rdev_sysfs_ops = {
1624 |	.show = rdev_attr_show,
1625 |	.store = rdev_attr_store,
1626 | };
1627 | static struct kobj_type rdev_ktype = {
1628 |	.release = rdev_free,
1629 |	.sysfs_ops = &rdev_sysfs_ops,
1630 |	.default_attrs = rdev_default_attrs,
1631 | };
1632 |
1633 | /*
1634 |  * Import a device. If 'super_format' >= 0, then sanity check the superblock
1635 |  *
1636 |  * mark the device faulty if:
1637 |  *
1638 |  *   - the device is nonexistent (zero size)
1639 |  *   - the device has no valid superblock
1640 |  *
1641 |  * a faulty rdev _never_ has rdev->sb set.
1642 |  */
1643 | static mdk_rdev_t *md_import_device(dev_t newdev, int super_format, int super_minor)
1644 | {
1645 |	char b[BDEVNAME_SIZE];
1646 |	int err;
1647 |	mdk_rdev_t *rdev;
1648 |	sector_t size;
1649 |
1650 |	rdev = kzalloc(sizeof(*rdev), GFP_KERNEL);
1651 |	if (!rdev) {
1652 |		printk(KERN_ERR "md: could not alloc mem for new device!\n");
1653 |		return ERR_PTR(-ENOMEM);
1654 |	}
1655 |
1656 |	if ((err = alloc_disk_sb(rdev)))
1657 |		goto abort_free;
1658 |
1659 |	err = lock_rdev(rdev, newdev);
1660 |	if (err)
1661 |		goto abort_free;
1662 |
1663 |	rdev->kobj.parent = NULL;
1664 |	rdev->kobj.ktype = &rdev_ktype;
1665 |	kobject_init(&rdev->kobj);
1666 |
1667 |	rdev->desc_nr = -1;
1668 |	rdev->flags = 0;
1669 |	rdev->data_offset = 0;
1670 |	atomic_set(&rdev->nr_pending, 0);
1671 |	atomic_set(&rdev->read_errors, 0);
1672 |
1673 |	size = rdev->bdev->bd_inode->i_size >> BLOCK_SIZE_BITS;
1674 |	if (!size) {
1675 |		printk(KERN_WARNING
1676 |			"md: %s has zero or unknown size, marking faulty!\n",
1677 |			bdevname(rdev->bdev,b));
1678 |		err = -EINVAL;
1679 |		goto abort_free;
1680 |	}
1681 |
1682 |	if (super_format >= 0) {
1683 |		err = super_types[super_format].
1684 |			load_super(rdev, NULL, super_minor);
1685 |		if (err == -EINVAL) {
1686 |			printk(KERN_WARNING
1687 |				"md: %s has invalid sb, not importing!\n",
1688 |				bdevname(rdev->bdev,b));
1689 |			goto abort_free;
1690 |		}
1691 |		if (err < 0) {
1692 |			printk(KERN_WARNING
1693 |				"md: could not read %s's sb, not importing!\n",
1694 |				bdevname(rdev->bdev,b));
1695 |			goto abort_free;
1696 |		}
1697 |	}
1698 |	INIT_LIST_HEAD(&rdev->same_set);
1699 | 1699 | ||
1700 | return rdev; | 1700 | return rdev; |
1701 | 1701 | ||
1702 | abort_free: | 1702 | abort_free: |
1703 | if (rdev->sb_page) { | 1703 | if (rdev->sb_page) { |
1704 | if (rdev->bdev) | 1704 | if (rdev->bdev) |
1705 | unlock_rdev(rdev); | 1705 | unlock_rdev(rdev); |
1706 | free_disk_sb(rdev); | 1706 | free_disk_sb(rdev); |
1707 | } | 1707 | } |
1708 | kfree(rdev); | 1708 | kfree(rdev); |
1709 | return ERR_PTR(err); | 1709 | return ERR_PTR(err); |
1710 | } | 1710 | } |
1711 | 1711 | ||
1712 | /* | 1712 | /* |
1713 | * Check a full RAID array for plausibility | 1713 | * Check a full RAID array for plausibility |
1714 | */ | 1714 | */ |
1715 | 1715 | ||
1716 | 1716 | ||
1717 | static void analyze_sbs(mddev_t * mddev) | 1717 | static void analyze_sbs(mddev_t * mddev) |
1718 | { | 1718 | { |
1719 | int i; | 1719 | int i; |
1720 | struct list_head *tmp; | 1720 | struct list_head *tmp; |
1721 | mdk_rdev_t *rdev, *freshest; | 1721 | mdk_rdev_t *rdev, *freshest; |
1722 | char b[BDEVNAME_SIZE]; | 1722 | char b[BDEVNAME_SIZE]; |
1723 | 1723 | ||
1724 | freshest = NULL; | 1724 | freshest = NULL; |
1725 | ITERATE_RDEV(mddev,rdev,tmp) | 1725 | ITERATE_RDEV(mddev,rdev,tmp) |
1726 | switch (super_types[mddev->major_version]. | 1726 | switch (super_types[mddev->major_version]. |
1727 | load_super(rdev, freshest, mddev->minor_version)) { | 1727 | load_super(rdev, freshest, mddev->minor_version)) { |
1728 | case 1: | 1728 | case 1: |
1729 | freshest = rdev; | 1729 | freshest = rdev; |
1730 | break; | 1730 | break; |
1731 | case 0: | 1731 | case 0: |
1732 | break; | 1732 | break; |
1733 | default: | 1733 | default: |
1734 | printk( KERN_ERR \ | 1734 | printk( KERN_ERR \ |
1735 | "md: fatal superblock inconsistency in %s" | 1735 | "md: fatal superblock inconsistency in %s" |
1736 | " -- removing from array\n", | 1736 | " -- removing from array\n", |
1737 | bdevname(rdev->bdev,b)); | 1737 | bdevname(rdev->bdev,b)); |
1738 | kick_rdev_from_array(rdev); | 1738 | kick_rdev_from_array(rdev); |
1739 | } | 1739 | } |
1740 | 1740 | ||
1741 | 1741 | ||
1742 | super_types[mddev->major_version]. | 1742 | super_types[mddev->major_version]. |
1743 | validate_super(mddev, freshest); | 1743 | validate_super(mddev, freshest); |
1744 | 1744 | ||
1745 | i = 0; | 1745 | i = 0; |
1746 | ITERATE_RDEV(mddev,rdev,tmp) { | 1746 | ITERATE_RDEV(mddev,rdev,tmp) { |
1747 | if (rdev != freshest) | 1747 | if (rdev != freshest) |
1748 | if (super_types[mddev->major_version]. | 1748 | if (super_types[mddev->major_version]. |
1749 | validate_super(mddev, rdev)) { | 1749 | validate_super(mddev, rdev)) { |
1750 | printk(KERN_WARNING "md: kicking non-fresh %s" | 1750 | printk(KERN_WARNING "md: kicking non-fresh %s" |
1751 | " from array!\n", | 1751 | " from array!\n", |
1752 | bdevname(rdev->bdev,b)); | 1752 | bdevname(rdev->bdev,b)); |
1753 | kick_rdev_from_array(rdev); | 1753 | kick_rdev_from_array(rdev); |
1754 | continue; | 1754 | continue; |
1755 | } | 1755 | } |
1756 | if (mddev->level == LEVEL_MULTIPATH) { | 1756 | if (mddev->level == LEVEL_MULTIPATH) { |
1757 | rdev->desc_nr = i++; | 1757 | rdev->desc_nr = i++; |
1758 | rdev->raid_disk = rdev->desc_nr; | 1758 | rdev->raid_disk = rdev->desc_nr; |
1759 | set_bit(In_sync, &rdev->flags); | 1759 | set_bit(In_sync, &rdev->flags); |
1760 | } | 1760 | } |
1761 | } | 1761 | } |
1762 | 1762 | ||
1763 | 1763 | ||
1764 | 1764 | ||
1765 | if (mddev->recovery_cp != MaxSector && | 1765 | if (mddev->recovery_cp != MaxSector && |
1766 | mddev->level >= 1) | 1766 | mddev->level >= 1) |
1767 | printk(KERN_ERR "md: %s: raid array is not clean" | 1767 | printk(KERN_ERR "md: %s: raid array is not clean" |
1768 | " -- starting background reconstruction\n", | 1768 | " -- starting background reconstruction\n", |
1769 | mdname(mddev)); | 1769 | mdname(mddev)); |
1770 | 1770 | ||
1771 | } | 1771 | } |
1772 | 1772 | ||
1773 | static ssize_t | 1773 | static ssize_t |
1774 | level_show(mddev_t *mddev, char *page) | 1774 | level_show(mddev_t *mddev, char *page) |
1775 | { | 1775 | { |
1776 | struct mdk_personality *p = mddev->pers; | 1776 | struct mdk_personality *p = mddev->pers; |
1777 | if (p == NULL && mddev->raid_disks == 0) | 1777 | if (p == NULL && mddev->raid_disks == 0) |
1778 | return 0; | 1778 | return 0; |
1779 | if (mddev->level >= 0) | 1779 | if (mddev->level >= 0) |
1780 | return sprintf(page, "raid%d\n", mddev->level); | 1780 | return sprintf(page, "raid%d\n", mddev->level); |
1781 | else | 1781 | else |
1782 | return sprintf(page, "%s\n", p->name); | 1782 | return sprintf(page, "%s\n", p->name); |
1783 | } | 1783 | } |
1784 | 1784 | ||
1785 | static struct md_sysfs_entry md_level = __ATTR_RO(level); | 1785 | static struct md_sysfs_entry md_level = __ATTR_RO(level); |
1786 | 1786 | ||
1787 | static ssize_t | 1787 | static ssize_t |
1788 | raid_disks_show(mddev_t *mddev, char *page) | 1788 | raid_disks_show(mddev_t *mddev, char *page) |
1789 | { | 1789 | { |
1790 | if (mddev->raid_disks == 0) | 1790 | if (mddev->raid_disks == 0) |
1791 | return 0; | 1791 | return 0; |
1792 | return sprintf(page, "%d\n", mddev->raid_disks); | 1792 | return sprintf(page, "%d\n", mddev->raid_disks); |
1793 | } | 1793 | } |
1794 | 1794 | ||
1795 | static struct md_sysfs_entry md_raid_disks = __ATTR_RO(raid_disks); | 1795 | static struct md_sysfs_entry md_raid_disks = __ATTR_RO(raid_disks); |
1796 | 1796 | ||
1797 | static ssize_t | 1797 | static ssize_t |
1798 | chunk_size_show(mddev_t *mddev, char *page) | ||
1799 | { | ||
1800 | return sprintf(page, "%d\n", mddev->chunk_size); | ||
1801 | } | ||
1802 | |||
1803 | static ssize_t | ||
1804 | chunk_size_store(mddev_t *mddev, const char *buf, size_t len) | ||
1805 | { | ||
1806 | /* can only set chunk_size if array is not yet active */ | ||
1807 | char *e; | ||
1808 | unsigned long n = simple_strtoul(buf, &e, 10); | ||
1809 | |||
1810 | if (mddev->pers) | ||
1811 | return -EBUSY; | ||
1812 | if (!*buf || (*e && *e != '\n')) | ||
1813 | return -EINVAL; | ||
1814 | |||
1815 | mddev->chunk_size = n; | ||
1816 | return len; | ||
1817 | } | ||
1818 | static struct md_sysfs_entry md_chunk_size = | ||
1819 | __ATTR(chunk_size, 0644, chunk_size_show, chunk_size_store); | ||
1820 | |||
1821 | |||
1822 | static ssize_t | ||
1798 | action_show(mddev_t *mddev, char *page) | 1823 | action_show(mddev_t *mddev, char *page) |
1799 | { | 1824 | { |
1800 | char *type = "idle"; | 1825 | char *type = "idle"; |
1801 | if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery) || | 1826 | if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery) || |
1802 | test_bit(MD_RECOVERY_NEEDED, &mddev->recovery)) { | 1827 | test_bit(MD_RECOVERY_NEEDED, &mddev->recovery)) { |
1803 | if (test_bit(MD_RECOVERY_SYNC, &mddev->recovery)) { | 1828 | if (test_bit(MD_RECOVERY_SYNC, &mddev->recovery)) { |
1804 | if (!test_bit(MD_RECOVERY_REQUESTED, &mddev->recovery)) | 1829 | if (!test_bit(MD_RECOVERY_REQUESTED, &mddev->recovery)) |
1805 | type = "resync"; | 1830 | type = "resync"; |
1806 | else if (test_bit(MD_RECOVERY_CHECK, &mddev->recovery)) | 1831 | else if (test_bit(MD_RECOVERY_CHECK, &mddev->recovery)) |
1807 | type = "check"; | 1832 | type = "check"; |
1808 | else | 1833 | else |
1809 | type = "repair"; | 1834 | type = "repair"; |
1810 | } else | 1835 | } else |
1811 | type = "recover"; | 1836 | type = "recover"; |
1812 | } | 1837 | } |
1813 | return sprintf(page, "%s\n", type); | 1838 | return sprintf(page, "%s\n", type); |
1814 | } | 1839 | } |
1815 | 1840 | ||
1816 | static ssize_t | 1841 | static ssize_t |
1817 | action_store(mddev_t *mddev, const char *page, size_t len) | 1842 | action_store(mddev_t *mddev, const char *page, size_t len) |
1818 | { | 1843 | { |
1819 | if (!mddev->pers || !mddev->pers->sync_request) | 1844 | if (!mddev->pers || !mddev->pers->sync_request) |
1820 | return -EINVAL; | 1845 | return -EINVAL; |
1821 | 1846 | ||
1822 | if (cmd_match(page, "idle")) { | 1847 | if (cmd_match(page, "idle")) { |
1823 | if (mddev->sync_thread) { | 1848 | if (mddev->sync_thread) { |
1824 | set_bit(MD_RECOVERY_INTR, &mddev->recovery); | 1849 | set_bit(MD_RECOVERY_INTR, &mddev->recovery); |
1825 | md_unregister_thread(mddev->sync_thread); | 1850 | md_unregister_thread(mddev->sync_thread); |
1826 | mddev->sync_thread = NULL; | 1851 | mddev->sync_thread = NULL; |
1827 | mddev->recovery = 0; | 1852 | mddev->recovery = 0; |
1828 | } | 1853 | } |
1829 | } else if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery) || | 1854 | } else if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery) || |
1830 | test_bit(MD_RECOVERY_NEEDED, &mddev->recovery)) | 1855 | test_bit(MD_RECOVERY_NEEDED, &mddev->recovery)) |
1831 | return -EBUSY; | 1856 | return -EBUSY; |
1832 | else if (cmd_match(page, "resync") || cmd_match(page, "recover")) | 1857 | else if (cmd_match(page, "resync") || cmd_match(page, "recover")) |
1833 | set_bit(MD_RECOVERY_NEEDED, &mddev->recovery); | 1858 | set_bit(MD_RECOVERY_NEEDED, &mddev->recovery); |
1834 | else { | 1859 | else { |
1835 | if (cmd_match(page, "check")) | 1860 | if (cmd_match(page, "check")) |
1836 | set_bit(MD_RECOVERY_CHECK, &mddev->recovery); | 1861 | set_bit(MD_RECOVERY_CHECK, &mddev->recovery); |
1837 | else if (cmd_match(page, "repair")) | 1862 | else if (cmd_match(page, "repair")) |
1838 | return -EINVAL; | 1863 | return -EINVAL; |
1839 | set_bit(MD_RECOVERY_REQUESTED, &mddev->recovery); | 1864 | set_bit(MD_RECOVERY_REQUESTED, &mddev->recovery); |
1840 | set_bit(MD_RECOVERY_SYNC, &mddev->recovery); | 1865 | set_bit(MD_RECOVERY_SYNC, &mddev->recovery); |
1841 | } | 1866 | } |
1842 | set_bit(MD_RECOVERY_NEEDED, &mddev->recovery); | 1867 | set_bit(MD_RECOVERY_NEEDED, &mddev->recovery); |
1843 | md_wakeup_thread(mddev->thread); | 1868 | md_wakeup_thread(mddev->thread); |
1844 | return len; | 1869 | return len; |
1845 | } | 1870 | } |
1846 | 1871 | ||
1847 | static ssize_t | 1872 | static ssize_t |
1848 | mismatch_cnt_show(mddev_t *mddev, char *page) | 1873 | mismatch_cnt_show(mddev_t *mddev, char *page) |
1849 | { | 1874 | { |
1850 | return sprintf(page, "%llu\n", | 1875 | return sprintf(page, "%llu\n", |
1851 | (unsigned long long) mddev->resync_mismatches); | 1876 | (unsigned long long) mddev->resync_mismatches); |
1852 | } | 1877 | } |
1853 | 1878 | ||
1854 | static struct md_sysfs_entry | 1879 | static struct md_sysfs_entry |
1855 | md_scan_mode = __ATTR(sync_action, S_IRUGO|S_IWUSR, action_show, action_store); | 1880 | md_scan_mode = __ATTR(sync_action, S_IRUGO|S_IWUSR, action_show, action_store); |
1856 | 1881 | ||
1857 | 1882 | ||
1858 | static struct md_sysfs_entry | 1883 | static struct md_sysfs_entry |
1859 | md_mismatches = __ATTR_RO(mismatch_cnt); | 1884 | md_mismatches = __ATTR_RO(mismatch_cnt); |
1860 | 1885 | ||
1861 | static struct attribute *md_default_attrs[] = { | 1886 | static struct attribute *md_default_attrs[] = { |
1862 | &md_level.attr, | 1887 | &md_level.attr, |
1863 | &md_raid_disks.attr, | 1888 | &md_raid_disks.attr, |
1889 | &md_chunk_size.attr, | ||
1864 | NULL, | 1890 | NULL, |
1865 | }; | 1891 | }; |
1866 | 1892 | ||
1867 | static struct attribute *md_redundancy_attrs[] = { | 1893 | static struct attribute *md_redundancy_attrs[] = { |
1868 | &md_scan_mode.attr, | 1894 | &md_scan_mode.attr, |
1869 | &md_mismatches.attr, | 1895 | &md_mismatches.attr, |
1870 | NULL, | 1896 | NULL, |
1871 | }; | 1897 | }; |
1872 | static struct attribute_group md_redundancy_group = { | 1898 | static struct attribute_group md_redundancy_group = { |
1873 | .name = NULL, | 1899 | .name = NULL, |
1874 | .attrs = md_redundancy_attrs, | 1900 | .attrs = md_redundancy_attrs, |
1875 | }; | 1901 | }; |
1876 | 1902 | ||
1877 | 1903 | ||
1878 | static ssize_t | 1904 | static ssize_t |
1879 | md_attr_show(struct kobject *kobj, struct attribute *attr, char *page) | 1905 | md_attr_show(struct kobject *kobj, struct attribute *attr, char *page) |
1880 | { | 1906 | { |
1881 | struct md_sysfs_entry *entry = container_of(attr, struct md_sysfs_entry, attr); | 1907 | struct md_sysfs_entry *entry = container_of(attr, struct md_sysfs_entry, attr); |
1882 | mddev_t *mddev = container_of(kobj, struct mddev_s, kobj); | 1908 | mddev_t *mddev = container_of(kobj, struct mddev_s, kobj); |
1883 | ssize_t rv; | 1909 | ssize_t rv; |
1884 | 1910 | ||
1885 | if (!entry->show) | 1911 | if (!entry->show) |
1886 | return -EIO; | 1912 | return -EIO; |
1887 | mddev_lock(mddev); | 1913 | mddev_lock(mddev); |
1888 | rv = entry->show(mddev, page); | 1914 | rv = entry->show(mddev, page); |
1889 | mddev_unlock(mddev); | 1915 | mddev_unlock(mddev); |
1890 | return rv; | 1916 | return rv; |
1891 | } | 1917 | } |
1892 | 1918 | ||
1893 | static ssize_t | 1919 | static ssize_t |
1894 | md_attr_store(struct kobject *kobj, struct attribute *attr, | 1920 | md_attr_store(struct kobject *kobj, struct attribute *attr, |
1895 | const char *page, size_t length) | 1921 | const char *page, size_t length) |
1896 | { | 1922 | { |
1897 | struct md_sysfs_entry *entry = container_of(attr, struct md_sysfs_entry, attr); | 1923 | struct md_sysfs_entry *entry = container_of(attr, struct md_sysfs_entry, attr); |
1898 | mddev_t *mddev = container_of(kobj, struct mddev_s, kobj); | 1924 | mddev_t *mddev = container_of(kobj, struct mddev_s, kobj); |
1899 | ssize_t rv; | 1925 | ssize_t rv; |
1900 | 1926 | ||
1901 | if (!entry->store) | 1927 | if (!entry->store) |
1902 | return -EIO; | 1928 | return -EIO; |
1903 | mddev_lock(mddev); | 1929 | mddev_lock(mddev); |
1904 | rv = entry->store(mddev, page, length); | 1930 | rv = entry->store(mddev, page, length); |
1905 | mddev_unlock(mddev); | 1931 | mddev_unlock(mddev); |
1906 | return rv; | 1932 | return rv; |
1907 | } | 1933 | } |
1908 | 1934 | ||
1909 | static void md_free(struct kobject *ko) | 1935 | static void md_free(struct kobject *ko) |
1910 | { | 1936 | { |
1911 | mddev_t *mddev = container_of(ko, mddev_t, kobj); | 1937 | mddev_t *mddev = container_of(ko, mddev_t, kobj); |
1912 | kfree(mddev); | 1938 | kfree(mddev); |
1913 | } | 1939 | } |
1914 | 1940 | ||
1915 | static struct sysfs_ops md_sysfs_ops = { | 1941 | static struct sysfs_ops md_sysfs_ops = { |
1916 | .show = md_attr_show, | 1942 | .show = md_attr_show, |
1917 | .store = md_attr_store, | 1943 | .store = md_attr_store, |
1918 | }; | 1944 | }; |
1919 | static struct kobj_type md_ktype = { | 1945 | static struct kobj_type md_ktype = { |
1920 | .release = md_free, | 1946 | .release = md_free, |
1921 | .sysfs_ops = &md_sysfs_ops, | 1947 | .sysfs_ops = &md_sysfs_ops, |
1922 | .default_attrs = md_default_attrs, | 1948 | .default_attrs = md_default_attrs, |
1923 | }; | 1949 | }; |
1924 | 1950 | ||
1925 | int mdp_major = 0; | 1951 | int mdp_major = 0; |
1926 | 1952 | ||
1927 | static struct kobject *md_probe(dev_t dev, int *part, void *data) | 1953 | static struct kobject *md_probe(dev_t dev, int *part, void *data) |
1928 | { | 1954 | { |
1929 | static DECLARE_MUTEX(disks_sem); | 1955 | static DECLARE_MUTEX(disks_sem); |
1930 | mddev_t *mddev = mddev_find(dev); | 1956 | mddev_t *mddev = mddev_find(dev); |
1931 | struct gendisk *disk; | 1957 | struct gendisk *disk; |
1932 | int partitioned = (MAJOR(dev) != MD_MAJOR); | 1958 | int partitioned = (MAJOR(dev) != MD_MAJOR); |
1933 | int shift = partitioned ? MdpMinorShift : 0; | 1959 | int shift = partitioned ? MdpMinorShift : 0; |
1934 | int unit = MINOR(dev) >> shift; | 1960 | int unit = MINOR(dev) >> shift; |
1935 | 1961 | ||
1936 | if (!mddev) | 1962 | if (!mddev) |
1937 | return NULL; | 1963 | return NULL; |
1938 | 1964 | ||
1939 | down(&disks_sem); | 1965 | down(&disks_sem); |
1940 | if (mddev->gendisk) { | 1966 | if (mddev->gendisk) { |
1941 | up(&disks_sem); | 1967 | up(&disks_sem); |
1942 | mddev_put(mddev); | 1968 | mddev_put(mddev); |
1943 | return NULL; | 1969 | return NULL; |
1944 | } | 1970 | } |
1945 | disk = alloc_disk(1 << shift); | 1971 | disk = alloc_disk(1 << shift); |
1946 | if (!disk) { | 1972 | if (!disk) { |
1947 | up(&disks_sem); | 1973 | up(&disks_sem); |
1948 | mddev_put(mddev); | 1974 | mddev_put(mddev); |
1949 | return NULL; | 1975 | return NULL; |
1950 | } | 1976 | } |
1951 | disk->major = MAJOR(dev); | 1977 | disk->major = MAJOR(dev); |
1952 | disk->first_minor = unit << shift; | 1978 | disk->first_minor = unit << shift; |
1953 | if (partitioned) { | 1979 | if (partitioned) { |
1954 | sprintf(disk->disk_name, "md_d%d", unit); | 1980 | sprintf(disk->disk_name, "md_d%d", unit); |
1955 | sprintf(disk->devfs_name, "md/d%d", unit); | 1981 | sprintf(disk->devfs_name, "md/d%d", unit); |
1956 | } else { | 1982 | } else { |
1957 | sprintf(disk->disk_name, "md%d", unit); | 1983 | sprintf(disk->disk_name, "md%d", unit); |
1958 | sprintf(disk->devfs_name, "md/%d", unit); | 1984 | sprintf(disk->devfs_name, "md/%d", unit); |
1959 | } | 1985 | } |
1960 | disk->fops = &md_fops; | 1986 | disk->fops = &md_fops; |
1961 | disk->private_data = mddev; | 1987 | disk->private_data = mddev; |
1962 | disk->queue = mddev->queue; | 1988 | disk->queue = mddev->queue; |
1963 | add_disk(disk); | 1989 | add_disk(disk); |
1964 | mddev->gendisk = disk; | 1990 | mddev->gendisk = disk; |
1965 | up(&disks_sem); | 1991 | up(&disks_sem); |
1966 | mddev->kobj.parent = &disk->kobj; | 1992 | mddev->kobj.parent = &disk->kobj; |
1967 | mddev->kobj.k_name = NULL; | 1993 | mddev->kobj.k_name = NULL; |
1968 | snprintf(mddev->kobj.name, KOBJ_NAME_LEN, "%s", "md"); | 1994 | snprintf(mddev->kobj.name, KOBJ_NAME_LEN, "%s", "md"); |
1969 | mddev->kobj.ktype = &md_ktype; | 1995 | mddev->kobj.ktype = &md_ktype; |
1970 | kobject_register(&mddev->kobj); | 1996 | kobject_register(&mddev->kobj); |
1971 | return NULL; | 1997 | return NULL; |
1972 | } | 1998 | } |
1973 | 1999 | ||
1974 | void md_wakeup_thread(mdk_thread_t *thread); | 2000 | void md_wakeup_thread(mdk_thread_t *thread); |
1975 | 2001 | ||
1976 | static void md_safemode_timeout(unsigned long data) | 2002 | static void md_safemode_timeout(unsigned long data) |
1977 | { | 2003 | { |
1978 | mddev_t *mddev = (mddev_t *) data; | 2004 | mddev_t *mddev = (mddev_t *) data; |
1979 | 2005 | ||
1980 | mddev->safemode = 1; | 2006 | mddev->safemode = 1; |
1981 | md_wakeup_thread(mddev->thread); | 2007 | md_wakeup_thread(mddev->thread); |
1982 | } | 2008 | } |
1983 | 2009 | ||
1984 | static int start_dirty_degraded; | 2010 | static int start_dirty_degraded; |
1985 | 2011 | ||
1986 | static int do_md_run(mddev_t * mddev) | 2012 | static int do_md_run(mddev_t * mddev) |
1987 | { | 2013 | { |
1988 | int err; | 2014 | int err; |
1989 | int chunk_size; | 2015 | int chunk_size; |
1990 | struct list_head *tmp; | 2016 | struct list_head *tmp; |
1991 | mdk_rdev_t *rdev; | 2017 | mdk_rdev_t *rdev; |
1992 | struct gendisk *disk; | 2018 | struct gendisk *disk; |
1993 | struct mdk_personality *pers; | 2019 | struct mdk_personality *pers; |
1994 | char b[BDEVNAME_SIZE]; | 2020 | char b[BDEVNAME_SIZE]; |
1995 | 2021 | ||
1996 | if (list_empty(&mddev->disks)) | 2022 | if (list_empty(&mddev->disks)) |
1997 | /* cannot run an array with no devices.. */ | 2023 | /* cannot run an array with no devices.. */ |
1998 | return -EINVAL; | 2024 | return -EINVAL; |
1999 | 2025 | ||
2000 | if (mddev->pers) | 2026 | if (mddev->pers) |
2001 | return -EBUSY; | 2027 | return -EBUSY; |
2002 | 2028 | ||
2003 | /* | 2029 | /* |
2004 | * Analyze all RAID superblock(s) | 2030 | * Analyze all RAID superblock(s) |
2005 | */ | 2031 | */ |
2006 | if (!mddev->raid_disks) | 2032 | if (!mddev->raid_disks) |
2007 | analyze_sbs(mddev); | 2033 | analyze_sbs(mddev); |
2008 | 2034 | ||
2009 | chunk_size = mddev->chunk_size; | 2035 | chunk_size = mddev->chunk_size; |
2010 | 2036 | ||
2011 | if (chunk_size) { | 2037 | if (chunk_size) { |
2012 | if (chunk_size > MAX_CHUNK_SIZE) { | 2038 | if (chunk_size > MAX_CHUNK_SIZE) { |
2013 | printk(KERN_ERR "too big chunk_size: %d > %d\n", | 2039 | printk(KERN_ERR "too big chunk_size: %d > %d\n", |
2014 | chunk_size, MAX_CHUNK_SIZE); | 2040 | chunk_size, MAX_CHUNK_SIZE); |
2015 | return -EINVAL; | 2041 | return -EINVAL; |
2016 | } | 2042 | } |
2017 | /* | 2043 | /* |
2018 | * chunk-size has to be a power of 2 and a multiple of PAGE_SIZE | 2044 | * chunk-size has to be a power of 2 and a multiple of PAGE_SIZE |
2019 | */ | 2045 | */ |
2020 | if ( (1 << ffz(~chunk_size)) != chunk_size) { | 2046 | if ( (1 << ffz(~chunk_size)) != chunk_size) { |
2021 | printk(KERN_ERR "chunk_size of %d not valid\n", chunk_size); | 2047 | printk(KERN_ERR "chunk_size of %d not valid\n", chunk_size); |
2022 | return -EINVAL; | 2048 | return -EINVAL; |
2023 | } | 2049 | } |
2024 | if (chunk_size < PAGE_SIZE) { | 2050 | if (chunk_size < PAGE_SIZE) { |
2025 | printk(KERN_ERR "too small chunk_size: %d < %ld\n", | 2051 | printk(KERN_ERR "too small chunk_size: %d < %ld\n", |
2026 | chunk_size, PAGE_SIZE); | 2052 | chunk_size, PAGE_SIZE); |
2027 | return -EINVAL; | 2053 | return -EINVAL; |
2028 | } | 2054 | } |
2029 | 2055 | ||
2030 | /* devices must have minimum size of one chunk */ | 2056 | /* devices must have minimum size of one chunk */ |
2031 | ITERATE_RDEV(mddev,rdev,tmp) { | 2057 | ITERATE_RDEV(mddev,rdev,tmp) { |
2032 | if (test_bit(Faulty, &rdev->flags)) | 2058 | if (test_bit(Faulty, &rdev->flags)) |
2033 | continue; | 2059 | continue; |
2034 | if (rdev->size < chunk_size / 1024) { | 2060 | if (rdev->size < chunk_size / 1024) { |
2035 | printk(KERN_WARNING | 2061 | printk(KERN_WARNING |
2036 | "md: Dev %s smaller than chunk_size:" | 2062 | "md: Dev %s smaller than chunk_size:" |
2037 | " %lluk < %dk\n", | 2063 | " %lluk < %dk\n", |
2038 | bdevname(rdev->bdev,b), | 2064 | bdevname(rdev->bdev,b), |
2039 | (unsigned long long)rdev->size, | 2065 | (unsigned long long)rdev->size, |
2040 | chunk_size / 1024); | 2066 | chunk_size / 1024); |
2041 | return -EINVAL; | 2067 | return -EINVAL; |
2042 | } | 2068 | } |
2043 | } | 2069 | } |
2044 | } | 2070 | } |
2045 | 2071 | ||
2046 | #ifdef CONFIG_KMOD | 2072 | #ifdef CONFIG_KMOD |
2047 | request_module("md-level-%d", mddev->level); | 2073 | request_module("md-level-%d", mddev->level); |
2048 | #endif | 2074 | #endif |
2049 | 2075 | ||
2050 | /* | 2076 | /* |
2051 | * Drop all container device buffers, from now on | 2077 | * Drop all container device buffers, from now on |
2052 | * the only valid external interface is through the md | 2078 | * the only valid external interface is through the md |
2053 | * device. | 2079 | * device. |
2054 | * Also find largest hardsector size | 2080 | * Also find largest hardsector size |
2055 | */ | 2081 | */ |
2056 | ITERATE_RDEV(mddev,rdev,tmp) { | 2082 | ITERATE_RDEV(mddev,rdev,tmp) { |
2057 | if (test_bit(Faulty, &rdev->flags)) | 2083 | if (test_bit(Faulty, &rdev->flags)) |
2058 | continue; | 2084 | continue; |
2059 | sync_blockdev(rdev->bdev); | 2085 | sync_blockdev(rdev->bdev); |
2060 | invalidate_bdev(rdev->bdev, 0); | 2086 | invalidate_bdev(rdev->bdev, 0); |
2061 | } | 2087 | } |
2062 | 2088 | ||
2063 | md_probe(mddev->unit, NULL, NULL); | 2089 | md_probe(mddev->unit, NULL, NULL); |
2064 | disk = mddev->gendisk; | 2090 | disk = mddev->gendisk; |
2065 | if (!disk) | 2091 | if (!disk) |
2066 | return -ENOMEM; | 2092 | return -ENOMEM; |
2067 | 2093 | ||
2068 | spin_lock(&pers_lock); | 2094 | spin_lock(&pers_lock); |
2069 | pers = find_pers(mddev->level); | 2095 | pers = find_pers(mddev->level); |
2070 | if (!pers || !try_module_get(pers->owner)) { | 2096 | if (!pers || !try_module_get(pers->owner)) { |
2071 | spin_unlock(&pers_lock); | 2097 | spin_unlock(&pers_lock); |
2072 | printk(KERN_WARNING "md: personality for level %d is not loaded!\n", | 2098 | printk(KERN_WARNING "md: personality for level %d is not loaded!\n", |
2073 | mddev->level); | 2099 | mddev->level); |
2074 | return -EINVAL; | 2100 | return -EINVAL; |
2075 | } | 2101 | } |
2076 | mddev->pers = pers; | 2102 | mddev->pers = pers; |
2077 | spin_unlock(&pers_lock); | 2103 | spin_unlock(&pers_lock); |
2078 | 2104 | ||
2079 | mddev->recovery = 0; | 2105 | mddev->recovery = 0; |
2080 | mddev->resync_max_sectors = mddev->size << 1; /* may be over-ridden by personality */ | 2106 | mddev->resync_max_sectors = mddev->size << 1; /* may be over-ridden by personality */ |
2081 | mddev->barriers_work = 1; | 2107 | mddev->barriers_work = 1; |
2082 | mddev->ok_start_degraded = start_dirty_degraded; | 2108 | mddev->ok_start_degraded = start_dirty_degraded; |
2083 | 2109 | ||
2084 | if (start_readonly) | 2110 | if (start_readonly) |
2085 | mddev->ro = 2; /* read-only, but switch on first write */ | 2111 | mddev->ro = 2; /* read-only, but switch on first write */ |
2086 | 2112 | ||
2087 | err = mddev->pers->run(mddev); | 2113 | err = mddev->pers->run(mddev); |
2088 | if (!err && mddev->pers->sync_request) { | 2114 | if (!err && mddev->pers->sync_request) { |
2089 | err = bitmap_create(mddev); | 2115 | err = bitmap_create(mddev); |
2090 | if (err) { | 2116 | if (err) { |
2091 | printk(KERN_ERR "%s: failed to create bitmap (%d)\n", | 2117 | printk(KERN_ERR "%s: failed to create bitmap (%d)\n", |
2092 | mdname(mddev), err); | 2118 | mdname(mddev), err); |
2093 | mddev->pers->stop(mddev); | 2119 | mddev->pers->stop(mddev); |
2094 | } | 2120 | } |
2095 | } | 2121 | } |
2096 | if (err) { | 2122 | if (err) { |
2097 | printk(KERN_ERR "md: pers->run() failed ...\n"); | 2123 | printk(KERN_ERR "md: pers->run() failed ...\n"); |
2098 | module_put(mddev->pers->owner); | 2124 | module_put(mddev->pers->owner); |
2099 | mddev->pers = NULL; | 2125 | mddev->pers = NULL; |
2100 | bitmap_destroy(mddev); | 2126 | bitmap_destroy(mddev); |
2101 | return err; | 2127 | return err; |
2102 | } | 2128 | } |
2103 | if (mddev->pers->sync_request) | 2129 | if (mddev->pers->sync_request) |
2104 | sysfs_create_group(&mddev->kobj, &md_redundancy_group); | 2130 | sysfs_create_group(&mddev->kobj, &md_redundancy_group); |
2105 | else if (mddev->ro == 2) /* auto-readonly not meaningful */ | 2131 | else if (mddev->ro == 2) /* auto-readonly not meaningful */ |
2106 | mddev->ro = 0; | 2132 | mddev->ro = 0; |
2107 | 2133 | ||
2108 | atomic_set(&mddev->writes_pending,0); | 2134 | atomic_set(&mddev->writes_pending,0); |
2109 | mddev->safemode = 0; | 2135 | mddev->safemode = 0; |
2110 | mddev->safemode_timer.function = md_safemode_timeout; | 2136 | mddev->safemode_timer.function = md_safemode_timeout; |
2111 | mddev->safemode_timer.data = (unsigned long) mddev; | 2137 | mddev->safemode_timer.data = (unsigned long) mddev; |
2112 | mddev->safemode_delay = (20 * HZ)/1000 +1; /* 20 msec delay */ | 2138 | mddev->safemode_delay = (20 * HZ)/1000 +1; /* 20 msec delay */ |
2113 | mddev->in_sync = 1; | 2139 | mddev->in_sync = 1; |
2114 | 2140 | ||
2115 | ITERATE_RDEV(mddev,rdev,tmp) | 2141 | ITERATE_RDEV(mddev,rdev,tmp) |
2116 | if (rdev->raid_disk >= 0) { | 2142 | if (rdev->raid_disk >= 0) { |
2117 | char nm[20]; | 2143 | char nm[20]; |
2118 | sprintf(nm, "rd%d", rdev->raid_disk); | 2144 | sprintf(nm, "rd%d", rdev->raid_disk); |
2119 | sysfs_create_link(&mddev->kobj, &rdev->kobj, nm); | 2145 | sysfs_create_link(&mddev->kobj, &rdev->kobj, nm); |
2120 | } | 2146 | } |
2121 | 2147 | ||
2122 | set_bit(MD_RECOVERY_NEEDED, &mddev->recovery); | 2148 | set_bit(MD_RECOVERY_NEEDED, &mddev->recovery); |
2123 | md_wakeup_thread(mddev->thread); | 2149 | md_wakeup_thread(mddev->thread); |
2124 | 2150 | ||
2125 | if (mddev->sb_dirty) | 2151 | if (mddev->sb_dirty) |
2126 | md_update_sb(mddev); | 2152 | md_update_sb(mddev); |
2127 | 2153 | ||
2128 | set_capacity(disk, mddev->array_size<<1); | 2154 | set_capacity(disk, mddev->array_size<<1); |
2129 | 2155 | ||
2130 | /* If we call blk_queue_make_request here, it will | 2156 | /* If we call blk_queue_make_request here, it will |
2131 | * re-initialise max_sectors etc which may have been | 2157 | * re-initialise max_sectors etc which may have been |
2132 | * refined inside -> run. So just set the bits we need to set. | 2158 | * refined inside -> run. So just set the bits we need to set. |
2133 | * Most initialisation happened when we called | 2159 | * Most initialisation happened when we called |
2134 |  * blk_queue_make_request(..., md_fail_request)
2135 |  * earlier.
2136 |  */
2137 | 	mddev->queue->queuedata = mddev;
2138 | 	mddev->queue->make_request_fn = mddev->pers->make_request;
2139 |
2140 | 	mddev->changed = 1;
2141 | 	md_new_event(mddev);
2142 | 	return 0;
2143 | }
2144 |
2145 | static int restart_array(mddev_t *mddev)
2146 | {
2147 | 	struct gendisk *disk = mddev->gendisk;
2148 | 	int err;
2149 |
2150 | 	/*
2151 | 	 * Complain if it has no devices
2152 | 	 */
2153 | 	err = -ENXIO;
2154 | 	if (list_empty(&mddev->disks))
2155 | 		goto out;
2156 |
2157 | 	if (mddev->pers) {
2158 | 		err = -EBUSY;
2159 | 		if (!mddev->ro)
2160 | 			goto out;
2161 |
2162 | 		mddev->safemode = 0;
2163 | 		mddev->ro = 0;
2164 | 		set_disk_ro(disk, 0);
2165 |
2166 | 		printk(KERN_INFO "md: %s switched to read-write mode.\n",
2167 | 			mdname(mddev));
2168 | 		/*
2169 | 		 * Kick recovery or resync if necessary
2170 | 		 */
2171 | 		set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
2172 | 		md_wakeup_thread(mddev->thread);
2173 | 		err = 0;
2174 | 	} else {
2175 | 		printk(KERN_ERR "md: %s has no personality assigned.\n",
2176 | 			mdname(mddev));
2177 | 		err = -EINVAL;
2178 | 	}
2179 |
2180 | out:
2181 | 	return err;
2182 | }
2183 |
2184 | static int do_md_stop(mddev_t * mddev, int ro)
2185 | {
2186 | 	int err = 0;
2187 | 	struct gendisk *disk = mddev->gendisk;
2188 |
2189 | 	if (mddev->pers) {
2190 | 		if (atomic_read(&mddev->active)>2) {
2191 | 			printk("md: %s still in use.\n",mdname(mddev));
2192 | 			return -EBUSY;
2193 | 		}
2194 |
2195 | 		if (mddev->sync_thread) {
2196 | 			set_bit(MD_RECOVERY_INTR, &mddev->recovery);
2197 | 			md_unregister_thread(mddev->sync_thread);
2198 | 			mddev->sync_thread = NULL;
2199 | 		}
2200 |
2201 | 		del_timer_sync(&mddev->safemode_timer);
2202 |
2203 | 		invalidate_partition(disk, 0);
2204 |
2205 | 		if (ro) {
2206 | 			err = -ENXIO;
2207 | 			if (mddev->ro==1)
2208 | 				goto out;
2209 | 			mddev->ro = 1;
2210 | 		} else {
2211 | 			bitmap_flush(mddev);
2212 | 			md_super_wait(mddev);
2213 | 			if (mddev->ro)
2214 | 				set_disk_ro(disk, 0);
2215 | 			blk_queue_make_request(mddev->queue, md_fail_request);
2216 | 			mddev->pers->stop(mddev);
2217 | 			if (mddev->pers->sync_request)
2218 | 				sysfs_remove_group(&mddev->kobj, &md_redundancy_group);
2219 |
2220 | 			module_put(mddev->pers->owner);
2221 | 			mddev->pers = NULL;
2222 | 			if (mddev->ro)
2223 | 				mddev->ro = 0;
2224 | 		}
2225 | 		if (!mddev->in_sync) {
2226 | 			/* mark array as shutdown cleanly */
2227 | 			mddev->in_sync = 1;
2228 | 			md_update_sb(mddev);
2229 | 		}
2230 | 		if (ro)
2231 | 			set_disk_ro(disk, 1);
2232 | 	}
2233 |
2234 | 	bitmap_destroy(mddev);
2235 | 	if (mddev->bitmap_file) {
2236 | 		atomic_set(&mddev->bitmap_file->f_dentry->d_inode->i_writecount, 1);
2237 | 		fput(mddev->bitmap_file);
2238 | 		mddev->bitmap_file = NULL;
2239 | 	}
2240 | 	mddev->bitmap_offset = 0;
2241 |
2242 | 	/*
2243 | 	 * Free resources if final stop
2244 | 	 */
2245 | 	if (!ro) {
2246 | 		mdk_rdev_t *rdev;
2247 | 		struct list_head *tmp;
2248 | 		struct gendisk *disk;
2249 | 		printk(KERN_INFO "md: %s stopped.\n", mdname(mddev));
2250 |
2251 | 		ITERATE_RDEV(mddev,rdev,tmp)
2252 | 			if (rdev->raid_disk >= 0) {
2253 | 				char nm[20];
2254 | 				sprintf(nm, "rd%d", rdev->raid_disk);
2255 | 				sysfs_remove_link(&mddev->kobj, nm);
2256 | 			}
2257 |
2258 | 		export_array(mddev);
2259 |
2260 | 		mddev->array_size = 0;
2261 | 		disk = mddev->gendisk;
2262 | 		if (disk)
2263 | 			set_capacity(disk, 0);
2264 | 		mddev->changed = 1;
2265 | 	} else
2266 | 		printk(KERN_INFO "md: %s switched to read-only mode.\n",
2267 | 			mdname(mddev));
2268 | 	err = 0;
2269 | 	md_new_event(mddev);
2270 | out:
2271 | 	return err;
2272 | }
2273 |
2274 | static void autorun_array(mddev_t *mddev)
2275 | {
2276 | 	mdk_rdev_t *rdev;
2277 | 	struct list_head *tmp;
2278 | 	int err;
2279 |
2280 | 	if (list_empty(&mddev->disks))
2281 | 		return;
2282 |
2283 | 	printk(KERN_INFO "md: running: ");
2284 |
2285 | 	ITERATE_RDEV(mddev,rdev,tmp) {
2286 | 		char b[BDEVNAME_SIZE];
2287 | 		printk("<%s>", bdevname(rdev->bdev,b));
2288 | 	}
2289 | 	printk("\n");
2290 |
2291 | 	err = do_md_run (mddev);
2292 | 	if (err) {
2293 | 		printk(KERN_WARNING "md: do_md_run() returned %d\n", err);
2294 | 		do_md_stop (mddev, 0);
2295 | 	}
2296 | }
2297 |
2298 | /*
2299 |  * lets try to run arrays based on all disks that have arrived
2300 |  * until now. (those are in pending_raid_disks)
2301 |  *
2302 |  * the method: pick the first pending disk, collect all disks with
2303 |  * the same UUID, remove all from the pending list and put them into
2304 |  * the 'same_array' list. Then order this list based on superblock
2305 |  * update time (freshest comes first), kick out 'old' disks and
2306 |  * compare superblocks. If everything's fine then run it.
2307 |  *
2308 |  * If "unit" is allocated, then bump its reference count
2309 |  */
2310 | static void autorun_devices(int part)
2311 | {
2312 | 	struct list_head candidates;
2313 | 	struct list_head *tmp;
2314 | 	mdk_rdev_t *rdev0, *rdev;
2315 | 	mddev_t *mddev;
2316 | 	char b[BDEVNAME_SIZE];
2317 |
2318 | 	printk(KERN_INFO "md: autorun ...\n");
2319 | 	while (!list_empty(&pending_raid_disks)) {
2320 | 		dev_t dev;
2321 | 		rdev0 = list_entry(pending_raid_disks.next,
2322 | 				   mdk_rdev_t, same_set);
2323 |
2324 | 		printk(KERN_INFO "md: considering %s ...\n",
2325 | 			bdevname(rdev0->bdev,b));
2326 | 		INIT_LIST_HEAD(&candidates);
2327 | 		ITERATE_RDEV_PENDING(rdev,tmp)
2328 | 			if (super_90_load(rdev, rdev0, 0) >= 0) {
2329 | 				printk(KERN_INFO "md: adding %s ...\n",
2330 | 					bdevname(rdev->bdev,b));
2331 | 				list_move(&rdev->same_set, &candidates);
2332 | 			}
2333 | 		/*
2334 | 		 * now we have a set of devices, with all of them having
2335 | 		 * mostly sane superblocks. It's time to allocate the
2336 | 		 * mddev.
2337 | 		 */
2338 | 		if (rdev0->preferred_minor < 0 || rdev0->preferred_minor >= MAX_MD_DEVS) {
2339 | 			printk(KERN_INFO "md: unit number in %s is bad: %d\n",
2340 | 			       bdevname(rdev0->bdev, b), rdev0->preferred_minor);
2341 | 			break;
2342 | 		}
2343 | 		if (part)
2344 | 			dev = MKDEV(mdp_major,
2345 | 				    rdev0->preferred_minor << MdpMinorShift);
2346 | 		else
2347 | 			dev = MKDEV(MD_MAJOR, rdev0->preferred_minor);
2348 |
2349 | 		md_probe(dev, NULL, NULL);
2350 | 		mddev = mddev_find(dev);
2351 | 		if (!mddev) {
2352 | 			printk(KERN_ERR
2353 | 				"md: cannot allocate memory for md drive.\n");
2354 | 			break;
2355 | 		}
2356 | 		if (mddev_lock(mddev))
2357 | 			printk(KERN_WARNING "md: %s locked, cannot run\n",
2358 | 			       mdname(mddev));
2359 | 		else if (mddev->raid_disks || mddev->major_version
2360 | 			 || !list_empty(&mddev->disks)) {
2361 | 			printk(KERN_WARNING
2362 | 				"md: %s already running, cannot run %s\n",
2363 | 				mdname(mddev), bdevname(rdev0->bdev,b));
2364 | 			mddev_unlock(mddev);
2365 | 		} else {
2366 | 			printk(KERN_INFO "md: created %s\n", mdname(mddev));
2367 | 			ITERATE_RDEV_GENERIC(candidates,rdev,tmp) {
2368 | 				list_del_init(&rdev->same_set);
2369 | 				if (bind_rdev_to_array(rdev, mddev))
2370 | 					export_rdev(rdev);
2371 | 			}
2372 | 			autorun_array(mddev);
2373 | 			mddev_unlock(mddev);
2374 | 		}
2375 | 		/* on success, candidates will be empty, on error
2376 | 		 * it won't...
2377 | 		 */
2378 | 		ITERATE_RDEV_GENERIC(candidates,rdev,tmp)
2379 | 			export_rdev(rdev);
2380 | 		mddev_put(mddev);
2381 | 	}
2382 | 	printk(KERN_INFO "md: ... autorun DONE.\n");
2383 | }
2384 |
2385 | /*
2386 |  * import RAID devices based on one partition
2387 |  * if possible, the array gets run as well.
2388 |  */
2389 |
2390 | static int autostart_array(dev_t startdev)
2391 | {
2392 | 	char b[BDEVNAME_SIZE];
2393 | 	int err = -EINVAL, i;
2394 | 	mdp_super_t *sb = NULL;
2395 | 	mdk_rdev_t *start_rdev = NULL, *rdev;
2396 |
2397 | 	start_rdev = md_import_device(startdev, 0, 0);
2398 | 	if (IS_ERR(start_rdev))
2399 | 		return err;
2400 |
2401 |
2402 | 	/* NOTE: this can only work for 0.90.0 superblocks */
2403 | 	sb = (mdp_super_t*)page_address(start_rdev->sb_page);
2404 | 	if (sb->major_version != 0 ||
2405 | 	    sb->minor_version != 90 ) {
2406 | 		printk(KERN_WARNING "md: can only autostart 0.90.0 arrays\n");
2407 | 		export_rdev(start_rdev);
2408 | 		return err;
2409 | 	}
2410 |
2411 | 	if (test_bit(Faulty, &start_rdev->flags)) {
2412 | 		printk(KERN_WARNING
2413 | 			"md: can not autostart based on faulty %s!\n",
2414 | 			bdevname(start_rdev->bdev,b));
2415 | 		export_rdev(start_rdev);
2416 | 		return err;
2417 | 	}
2418 | 	list_add(&start_rdev->same_set, &pending_raid_disks);
2419 |
2420 | 	for (i = 0; i < MD_SB_DISKS; i++) {
2421 | 		mdp_disk_t *desc = sb->disks + i;
2422 | 		dev_t dev = MKDEV(desc->major, desc->minor);
2423 |
2424 | 		if (!dev)
2425 | 			continue;
2426 | 		if (dev == startdev)
2427 | 			continue;
2428 | 		if (MAJOR(dev) != desc->major || MINOR(dev) != desc->minor)
2429 | 			continue;
2430 | 		rdev = md_import_device(dev, 0, 0);
2431 | 		if (IS_ERR(rdev))
2432 | 			continue;
2433 |
2434 | 		list_add(&rdev->same_set, &pending_raid_disks);
2435 | 	}
2436 |
2437 | 	/*
2438 | 	 * possibly return codes
2439 | 	 */
2440 | 	autorun_devices(0);
2441 | 	return 0;
2442 |
2443 | }
2444 |
2445 |
2446 | static int get_version(void __user * arg)
2447 | {
2448 | 	mdu_version_t ver;
2449 |
2450 | 	ver.major = MD_MAJOR_VERSION;
2451 | 	ver.minor = MD_MINOR_VERSION;
2452 | 	ver.patchlevel = MD_PATCHLEVEL_VERSION;
2453 |
2454 | 	if (copy_to_user(arg, &ver, sizeof(ver)))
2455 | 		return -EFAULT;
2456 |
2457 | 	return 0;
2458 | }
2459 |
2460 | static int get_array_info(mddev_t * mddev, void __user * arg)
2461 | {
2462 | 	mdu_array_info_t info;
2463 | 	int nr,working,active,failed,spare;
2464 | 	mdk_rdev_t *rdev;
2465 | 	struct list_head *tmp;
2466 |
2467 | 	nr=working=active=failed=spare=0;
2468 | 	ITERATE_RDEV(mddev,rdev,tmp) {
2469 | 		nr++;
2470 | 		if (test_bit(Faulty, &rdev->flags))
2471 | 			failed++;
2472 | 		else {
2473 | 			working++;
2474 | 			if (test_bit(In_sync, &rdev->flags))
2475 | 				active++;
2476 | 			else
2477 | 				spare++;
2478 | 		}
2479 | 	}
2480 |
2481 | 	info.major_version = mddev->major_version;
2482 | 	info.minor_version = mddev->minor_version;
2483 | 	info.patch_version = MD_PATCHLEVEL_VERSION;
2484 | 	info.ctime = mddev->ctime;
2485 | 	info.level = mddev->level;
2486 | 	info.size = mddev->size;
2487 | 	info.nr_disks = nr;
2488 | 	info.raid_disks = mddev->raid_disks;
2489 | 	info.md_minor = mddev->md_minor;
2490 | 	info.not_persistent= !mddev->persistent;
2491 |
2492 | 	info.utime = mddev->utime;
2493 | 	info.state = 0;
2494 | 	if (mddev->in_sync)
2495 | 		info.state = (1<<MD_SB_CLEAN);
2496 | 	if (mddev->bitmap && mddev->bitmap_offset)
2497 | 		info.state = (1<<MD_SB_BITMAP_PRESENT);
2498 | 	info.active_disks = active;
2499 | 	info.working_disks = working;
2500 | 	info.failed_disks = failed;
2501 | 	info.spare_disks = spare;
2502 |
2503 | 	info.layout = mddev->layout;
2504 | 	info.chunk_size = mddev->chunk_size;
2505 |
2506 | 	if (copy_to_user(arg, &info, sizeof(info)))
2507 | 		return -EFAULT;
2508 |
2509 | 	return 0;
2510 | }
2511 |
2512 | static int get_bitmap_file(mddev_t * mddev, void __user * arg)
2513 | {
2514 | 	mdu_bitmap_file_t *file = NULL; /* too big for stack allocation */
2515 | 	char *ptr, *buf = NULL;
2516 | 	int err = -ENOMEM;
2517 |
2518 | 	file = kmalloc(sizeof(*file), GFP_KERNEL);
2519 | 	if (!file)
2520 | 		goto out;
2521 |
2522 | 	/* bitmap disabled, zero the first byte and copy out */
2523 | 	if (!mddev->bitmap || !mddev->bitmap->file) {
2524 | 		file->pathname[0] = '\0';
2525 | 		goto copy_out;
2526 | 	}
2527 |
2528 | 	buf = kmalloc(sizeof(file->pathname), GFP_KERNEL);
2529 | 	if (!buf)
2530 | 		goto out;
2531 |
2532 | 	ptr = file_path(mddev->bitmap->file, buf, sizeof(file->pathname));
2533 | 	if (!ptr)
2534 | 		goto out;
2535 |
2536 | 	strcpy(file->pathname, ptr);
2537 |
2538 | copy_out:
2539 | 	err = 0;
2540 | 	if (copy_to_user(arg, file, sizeof(*file)))
2541 | 		err = -EFAULT;
2542 | out:
2543 | 	kfree(buf);
2544 | 	kfree(file);
2545 | 	return err;
2546 | }
2547 |
2548 | static int get_disk_info(mddev_t * mddev, void __user * arg)
2549 | {
2550 | 	mdu_disk_info_t info;
2551 | 	unsigned int nr;
2552 | 	mdk_rdev_t *rdev;
2553 |
2554 | 	if (copy_from_user(&info, arg, sizeof(info)))
2555 | 		return -EFAULT;
2556 |
2557 | 	nr = info.number;
2558 |
2559 | 	rdev = find_rdev_nr(mddev, nr);
2560 | 	if (rdev) {
2561 | 		info.major = MAJOR(rdev->bdev->bd_dev);
2562 | 		info.minor = MINOR(rdev->bdev->bd_dev);
2563 | 		info.raid_disk = rdev->raid_disk;
2564 | 		info.state = 0;
2565 | 		if (test_bit(Faulty, &rdev->flags))
2566 | 			info.state |= (1<<MD_DISK_FAULTY);
2567 | 		else if (test_bit(In_sync, &rdev->flags)) {
2568 | 			info.state |= (1<<MD_DISK_ACTIVE);
2569 | 			info.state |= (1<<MD_DISK_SYNC);
2570 | 		}
2571 | 		if (test_bit(WriteMostly, &rdev->flags))
2572 | 			info.state |= (1<<MD_DISK_WRITEMOSTLY);
2573 | 	} else {
2574 | 		info.major = info.minor = 0;
2575 | 		info.raid_disk = -1;
2576 | 		info.state = (1<<MD_DISK_REMOVED);
2577 | 	}
2578 |
2579 | 	if (copy_to_user(arg, &info, sizeof(info)))
2580 | 		return -EFAULT;
2581 |
2582 | 	return 0;
2583 | }
2584 |
2585 | static int add_new_disk(mddev_t * mddev, mdu_disk_info_t *info)
2586 | {
2587 | 	char b[BDEVNAME_SIZE], b2[BDEVNAME_SIZE];
2588 | 	mdk_rdev_t *rdev;
2589 | 	dev_t dev = MKDEV(info->major,info->minor);
2590 |
2591 | 	if (info->major != MAJOR(dev) || info->minor != MINOR(dev))
2592 | 		return -EOVERFLOW;
2593 |
2594 | 	if (!mddev->raid_disks) {
2595 | 		int err;
2596 | 		/* expecting a device which has a superblock */
2597 | 		rdev = md_import_device(dev, mddev->major_version, mddev->minor_version);
2598 | 		if (IS_ERR(rdev)) {
2599 | 			printk(KERN_WARNING
2600 | 				"md: md_import_device returned %ld\n",
2601 | 				PTR_ERR(rdev));
2602 | 			return PTR_ERR(rdev);
2603 | 		}
2604 | 		if (!list_empty(&mddev->disks)) {
2605 | 			mdk_rdev_t *rdev0 = list_entry(mddev->disks.next,
2606 | 							mdk_rdev_t, same_set);
2607 | 			int err = super_types[mddev->major_version]
2608 | 				.load_super(rdev, rdev0, mddev->minor_version);
2609 | 			if (err < 0) {
2610 | 				printk(KERN_WARNING
2611 | 					"md: %s has different UUID to %s\n",
2612 | 					bdevname(rdev->bdev,b),
2613 | 					bdevname(rdev0->bdev,b2));
2614 | 				export_rdev(rdev);
2615 | 				return -EINVAL;
2616 | 			}
2617 | 		}
2618 | 		err = bind_rdev_to_array(rdev, mddev);
2619 | 		if (err)
2620 | 			export_rdev(rdev);
2621 | 		return err;
2622 | 	}
2623 |
2624 | 	/*
2625 | 	 * add_new_disk can be used once the array is assembled
2626 | 	 * to add "hot spares". They must already have a superblock
2627 | 	 * written
2628 | 	 */
2629 | 	if (mddev->pers) {
2630 | 		int err;
2631 | 		if (!mddev->pers->hot_add_disk) {
2632 | 			printk(KERN_WARNING
2633 | 				"%s: personality does not support diskops!\n",
2634 | 				mdname(mddev));
2635 | 			return -EINVAL;
2636 | 		}
2637 | 		if (mddev->persistent)
2638 | 			rdev = md_import_device(dev, mddev->major_version,
2639 | 						mddev->minor_version);
2640 | 		else
2641 | 			rdev = md_import_device(dev, -1, -1);
2642 | 		if (IS_ERR(rdev)) {
2643 | 			printk(KERN_WARNING
2644 | 				"md: md_import_device returned %ld\n",
2645 | 				PTR_ERR(rdev));
2646 | 			return PTR_ERR(rdev);
2647 | 		}
2648 | 		/* set save_raid_disk if appropriate */
2649 | 		if (!mddev->persistent) {
2650 | 			if (info->state & (1<<MD_DISK_SYNC) &&
2651 | 			    info->raid_disk < mddev->raid_disks)
2652 | 				rdev->raid_disk = info->raid_disk;
2653 | 			else
2654 | 				rdev->raid_disk = -1;
2655 | 		} else
2656 | 			super_types[mddev->major_version].
2657 | 				validate_super(mddev, rdev);
2658 | 		rdev->saved_raid_disk = rdev->raid_disk;
2659 |
2660 | 		clear_bit(In_sync, &rdev->flags); /* just to be sure */
2661 | 		if (info->state & (1<<MD_DISK_WRITEMOSTLY))
2662 | 			set_bit(WriteMostly, &rdev->flags);
2663 |
2664 | 		rdev->raid_disk = -1;
2665 | 		err = bind_rdev_to_array(rdev, mddev);
2666 | 		if (err)
2667 | 			export_rdev(rdev);
2668 |
2669 | 		set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
2670 | 		md_wakeup_thread(mddev->thread);
2671 | 		return err;
2672 | 	}
2673 |
2674 | 	/* otherwise, add_new_disk is only allowed
2675 | 	 * for major_version==0 superblocks
2676 | 	 */
2677 | 	if (mddev->major_version != 0) {
2678 | 		printk(KERN_WARNING "%s: ADD_NEW_DISK not supported\n",
2679 | 			mdname(mddev));
2680 | 		return -EINVAL;
2681 | 	}
2682 |
2683 | 	if (!(info->state & (1<<MD_DISK_FAULTY))) {
2684 | 		int err;
2685 | 		rdev = md_import_device (dev, -1, 0);
2686 | 		if (IS_ERR(rdev)) {
2687 | 			printk(KERN_WARNING
2688 | 				"md: error, md_import_device() returned %ld\n",
2689 | 				PTR_ERR(rdev));
2690 | 			return PTR_ERR(rdev);
2691 | 		}
2692 | 		rdev->desc_nr = info->number;
2693 | 		if (info->raid_disk < mddev->raid_disks)
2694 | 			rdev->raid_disk = info->raid_disk;
2695 | 		else
2696 | 			rdev->raid_disk = -1;
2697 |
2698 | 		rdev->flags = 0;
2699 |
2700 | 		if (rdev->raid_disk < mddev->raid_disks)
2701 | 			if (info->state & (1<<MD_DISK_SYNC))
2702 | 				set_bit(In_sync, &rdev->flags);
2703 |
2704 | 		if (info->state & (1<<MD_DISK_WRITEMOSTLY))
2705 | 			set_bit(WriteMostly, &rdev->flags);
2706 |
2707 | 		err = bind_rdev_to_array(rdev, mddev);
2708 | 		if (err) {
2709 | 			export_rdev(rdev);
2710 | 			return err;
2711 | 		}
2712 |
2713 | 		if (!mddev->persistent) {
2714 | 			printk(KERN_INFO "md: nonpersistent superblock ...\n");
2715 | 			rdev->sb_offset = rdev->bdev->bd_inode->i_size >> BLOCK_SIZE_BITS;
2716 | 		} else
2717 | 			rdev->sb_offset = calc_dev_sboffset(rdev->bdev);
2718 | 		rdev->size = calc_dev_size(rdev, mddev->chunk_size);
2719 |
2720 | 		if (!mddev->size || (mddev->size > rdev->size))
2721 | 			mddev->size = rdev->size;
2722 | } | 2748 | } |
2723 | 2749 | ||
2724 | return 0; | 2750 | return 0; |
2725 | } | 2751 | } |
2726 | 2752 | ||
2727 | static int hot_remove_disk(mddev_t * mddev, dev_t dev) | 2753 | static int hot_remove_disk(mddev_t * mddev, dev_t dev) |
2728 | { | 2754 | { |
2729 | char b[BDEVNAME_SIZE]; | 2755 | char b[BDEVNAME_SIZE]; |
2730 | mdk_rdev_t *rdev; | 2756 | mdk_rdev_t *rdev; |
2731 | 2757 | ||
2732 | if (!mddev->pers) | 2758 | if (!mddev->pers) |
2733 | return -ENODEV; | 2759 | return -ENODEV; |
2734 | 2760 | ||
2735 | rdev = find_rdev(mddev, dev); | 2761 | rdev = find_rdev(mddev, dev); |
2736 | if (!rdev) | 2762 | if (!rdev) |
2737 | return -ENXIO; | 2763 | return -ENXIO; |
2738 | 2764 | ||
2739 | if (rdev->raid_disk >= 0) | 2765 | if (rdev->raid_disk >= 0) |
2740 | goto busy; | 2766 | goto busy; |
2741 | 2767 | ||
2742 | kick_rdev_from_array(rdev); | 2768 | kick_rdev_from_array(rdev); |
2743 | md_update_sb(mddev); | 2769 | md_update_sb(mddev); |
2744 | md_new_event(mddev); | 2770 | md_new_event(mddev); |
2745 | 2771 | ||
2746 | return 0; | 2772 | return 0; |
2747 | busy: | 2773 | busy: |
2748 | printk(KERN_WARNING "md: cannot remove active disk %s from %s ... \n", | 2774 | printk(KERN_WARNING "md: cannot remove active disk %s from %s ... \n", |
2749 | bdevname(rdev->bdev,b), mdname(mddev)); | 2775 | bdevname(rdev->bdev,b), mdname(mddev)); |
2750 | return -EBUSY; | 2776 | return -EBUSY; |
2751 | } | 2777 | } |
2752 | 2778 | ||
2753 | static int hot_add_disk(mddev_t * mddev, dev_t dev) | 2779 | static int hot_add_disk(mddev_t * mddev, dev_t dev) |
2754 | { | 2780 | { |
2755 | char b[BDEVNAME_SIZE]; | 2781 | char b[BDEVNAME_SIZE]; |
2756 | int err; | 2782 | int err; |
2757 | unsigned int size; | 2783 | unsigned int size; |
2758 | mdk_rdev_t *rdev; | 2784 | mdk_rdev_t *rdev; |
2759 | 2785 | ||
2760 | if (!mddev->pers) | 2786 | if (!mddev->pers) |
2761 | return -ENODEV; | 2787 | return -ENODEV; |
2762 | 2788 | ||
2763 | if (mddev->major_version != 0) { | 2789 | if (mddev->major_version != 0) { |
2764 | printk(KERN_WARNING "%s: HOT_ADD may only be used with" | 2790 | printk(KERN_WARNING "%s: HOT_ADD may only be used with" |
2765 | " version-0 superblocks.\n", | 2791 | " version-0 superblocks.\n", |
2766 | mdname(mddev)); | 2792 | mdname(mddev)); |
2767 | return -EINVAL; | 2793 | return -EINVAL; |
2768 | } | 2794 | } |
2769 | if (!mddev->pers->hot_add_disk) { | 2795 | if (!mddev->pers->hot_add_disk) { |
2770 | printk(KERN_WARNING | 2796 | printk(KERN_WARNING |
2771 | "%s: personality does not support diskops!\n", | 2797 | "%s: personality does not support diskops!\n", |
2772 | mdname(mddev)); | 2798 | mdname(mddev)); |
2773 | return -EINVAL; | 2799 | return -EINVAL; |
2774 | } | 2800 | } |
2775 | 2801 | ||
2776 | rdev = md_import_device (dev, -1, 0); | 2802 | rdev = md_import_device (dev, -1, 0); |
2777 | if (IS_ERR(rdev)) { | 2803 | if (IS_ERR(rdev)) { |
2778 | printk(KERN_WARNING | 2804 | printk(KERN_WARNING |
2779 | "md: error, md_import_device() returned %ld\n", | 2805 | "md: error, md_import_device() returned %ld\n", |
2780 | PTR_ERR(rdev)); | 2806 | PTR_ERR(rdev)); |
2781 | return -EINVAL; | 2807 | return -EINVAL; |
2782 | } | 2808 | } |
2783 | 2809 | ||
2784 | if (mddev->persistent) | 2810 | if (mddev->persistent) |
2785 | rdev->sb_offset = calc_dev_sboffset(rdev->bdev); | 2811 | rdev->sb_offset = calc_dev_sboffset(rdev->bdev); |
2786 | else | 2812 | else |
2787 | rdev->sb_offset = | 2813 | rdev->sb_offset = |
2788 | rdev->bdev->bd_inode->i_size >> BLOCK_SIZE_BITS; | 2814 | rdev->bdev->bd_inode->i_size >> BLOCK_SIZE_BITS; |
2789 | 2815 | ||
2790 | size = calc_dev_size(rdev, mddev->chunk_size); | 2816 | size = calc_dev_size(rdev, mddev->chunk_size); |
2791 | rdev->size = size; | 2817 | rdev->size = size; |
2792 | 2818 | ||
2793 | if (size < mddev->size) { | 2819 | if (size < mddev->size) { |
2794 | printk(KERN_WARNING | 2820 | printk(KERN_WARNING |
2795 | "%s: disk size %llu blocks < array size %llu\n", | 2821 | "%s: disk size %llu blocks < array size %llu\n", |
2796 | mdname(mddev), (unsigned long long)size, | 2822 | mdname(mddev), (unsigned long long)size, |
2797 | (unsigned long long)mddev->size); | 2823 | (unsigned long long)mddev->size); |
2798 | err = -ENOSPC; | 2824 | err = -ENOSPC; |
2799 | goto abort_export; | 2825 | goto abort_export; |
2800 | } | 2826 | } |
2801 | 2827 | ||
2802 | if (test_bit(Faulty, &rdev->flags)) { | 2828 | if (test_bit(Faulty, &rdev->flags)) { |
2803 | printk(KERN_WARNING | 2829 | printk(KERN_WARNING |
2804 | "md: can not hot-add faulty %s disk to %s!\n", | 2830 | "md: can not hot-add faulty %s disk to %s!\n", |
2805 | bdevname(rdev->bdev,b), mdname(mddev)); | 2831 | bdevname(rdev->bdev,b), mdname(mddev)); |
2806 | err = -EINVAL; | 2832 | err = -EINVAL; |
2807 | goto abort_export; | 2833 | goto abort_export; |
2808 | } | 2834 | } |
2809 | clear_bit(In_sync, &rdev->flags); | 2835 | clear_bit(In_sync, &rdev->flags); |
2810 | rdev->desc_nr = -1; | 2836 | rdev->desc_nr = -1; |
2811 | bind_rdev_to_array(rdev, mddev); | 2837 | bind_rdev_to_array(rdev, mddev); |
2812 | 2838 | ||
2813 | /* | 2839 | /* |
2814 | * The rest should better be atomic, we can have disk failures | 2840 | * The rest should better be atomic, we can have disk failures |
2815 | * noticed in interrupt contexts ... | 2841 | * noticed in interrupt contexts ... |
2816 | */ | 2842 | */ |
2817 | 2843 | ||
2818 | if (rdev->desc_nr == mddev->max_disks) { | 2844 | if (rdev->desc_nr == mddev->max_disks) { |
2819 | printk(KERN_WARNING "%s: can not hot-add to full array!\n", | 2845 | printk(KERN_WARNING "%s: can not hot-add to full array!\n", |
2820 | mdname(mddev)); | 2846 | mdname(mddev)); |
2821 | err = -EBUSY; | 2847 | err = -EBUSY; |
2822 | goto abort_unbind_export; | 2848 | goto abort_unbind_export; |
2823 | } | 2849 | } |
2824 | 2850 | ||
2825 | rdev->raid_disk = -1; | 2851 | rdev->raid_disk = -1; |
2826 | 2852 | ||
2827 | md_update_sb(mddev); | 2853 | md_update_sb(mddev); |
2828 | 2854 | ||
2829 | /* | 2855 | /* |
2830 | * Kick recovery, maybe this spare has to be added to the | 2856 | * Kick recovery, maybe this spare has to be added to the |
2831 | * array immediately. | 2857 | * array immediately. |
2832 | */ | 2858 | */ |
2833 | set_bit(MD_RECOVERY_NEEDED, &mddev->recovery); | 2859 | set_bit(MD_RECOVERY_NEEDED, &mddev->recovery); |
2834 | md_wakeup_thread(mddev->thread); | 2860 | md_wakeup_thread(mddev->thread); |
2835 | md_new_event(mddev); | 2861 | md_new_event(mddev); |
2836 | return 0; | 2862 | return 0; |
2837 | 2863 | ||
2838 | abort_unbind_export: | 2864 | abort_unbind_export: |
2839 | unbind_rdev_from_array(rdev); | 2865 | unbind_rdev_from_array(rdev); |
2840 | 2866 | ||
2841 | abort_export: | 2867 | abort_export: |
2842 | export_rdev(rdev); | 2868 | export_rdev(rdev); |
2843 | return err; | 2869 | return err; |
2844 | } | 2870 | } |
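The admission checks in hot_add_disk() run in a fixed order: superblock version, device size, faulty flag, then free-slot availability. A minimal userspace sketch of that ordering (hypothetical function and parameter names, not the kernel's types):

```c
#include <errno.h>
#include <stdbool.h>

/* Illustrative sketch only: the order of hot_add_disk()'s admission
 * checks. All names here are hypothetical userspace stand-ins. */
static long hot_add_checks(int major_version, unsigned long long dev_size,
			   unsigned long long array_size, bool faulty,
			   int desc_nr, int max_disks)
{
	if (major_version != 0)
		return -EINVAL;		/* HOT_ADD is version-0 only */
	if (dev_size < array_size)
		return -ENOSPC;		/* disk smaller than array size */
	if (faulty)
		return -EINVAL;		/* refuse a faulty device */
	if (desc_nr == max_disks)
		return -EBUSY;		/* no free slot in the superblock */
	return 0;			/* device is acceptable */
}
```

Because the version check comes first, a too-small device on a version-1 array reports -EINVAL, not -ENOSPC.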

/* similar to deny_write_access, but accounts for our holding a reference
 * to the file ourselves */
static int deny_bitmap_write_access(struct file * file)
{
	struct inode *inode = file->f_mapping->host;

	spin_lock(&inode->i_lock);
	if (atomic_read(&inode->i_writecount) > 1) {
		spin_unlock(&inode->i_lock);
		return -ETXTBSY;
	}
	atomic_set(&inode->i_writecount, -1);
	spin_unlock(&inode->i_lock);

	return 0;
}
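The i_writecount trick above is worth spelling out: a positive count tracks live writers, and storing -1 while we hold the only write reference bars any future writer. A hedged userspace sketch of the same gate with a plain integer (the lock and atomics are elided; names are hypothetical):

```c
#include <errno.h>

/* Userspace sketch of the deny-write gate, not kernel code: a count
 * above 1 means another writer exists; otherwise we flip the count to
 * -1 to lock out all future writers. */
static int deny_write(int *writecount)
{
	if (*writecount > 1)
		return -ETXTBSY;	/* someone else holds write access */
	*writecount = -1;		/* claim the exclusive deny state */
	return 0;
}
```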

static int set_bitmap_file(mddev_t *mddev, int fd)
{
	int err;

	if (mddev->pers) {
		if (!mddev->pers->quiesce)
			return -EBUSY;
		if (mddev->recovery || mddev->sync_thread)
			return -EBUSY;
		/* we should be able to change the bitmap.. */
	}


	if (fd >= 0) {
		if (mddev->bitmap)
			return -EEXIST; /* cannot add when bitmap is present */
		mddev->bitmap_file = fget(fd);

		if (mddev->bitmap_file == NULL) {
			printk(KERN_ERR "%s: error: failed to get bitmap file\n",
			       mdname(mddev));
			return -EBADF;
		}

		err = deny_bitmap_write_access(mddev->bitmap_file);
		if (err) {
			printk(KERN_ERR "%s: error: bitmap file is already in use\n",
			       mdname(mddev));
			fput(mddev->bitmap_file);
			mddev->bitmap_file = NULL;
			return err;
		}
		mddev->bitmap_offset = 0; /* file overrides offset */
	} else if (mddev->bitmap == NULL)
		return -ENOENT; /* cannot remove what isn't there */
	err = 0;
	if (mddev->pers) {
		mddev->pers->quiesce(mddev, 1);
		if (fd >= 0)
			err = bitmap_create(mddev);
		if (fd < 0 || err)
			bitmap_destroy(mddev);
		mddev->pers->quiesce(mddev, 0);
	} else if (fd < 0) {
		if (mddev->bitmap_file)
			fput(mddev->bitmap_file);
		mddev->bitmap_file = NULL;
	}

	return err;
}
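set_bitmap_file() overloads its fd argument: a non-negative fd attaches a bitmap file, a negative fd detaches the current one. A minimal sketch of just that convention (hypothetical userspace names, ignoring the fget/quiesce machinery):

```c
#include <errno.h>
#include <stdbool.h>

/* Sketch of set_bitmap_file()'s fd convention only, not the real
 * implementation: fd >= 0 adds a bitmap, fd < 0 removes one. */
static int bitmap_file_op(int fd, bool have_bitmap)
{
	if (fd >= 0)
		return have_bitmap ? -EEXIST : 0; /* add only if absent */
	return have_bitmap ? 0 : -ENOENT;	  /* remove only if present */
}
```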

/*
 * set_array_info is used in two different ways.
 * The original usage is when creating a new array.
 * In this usage, raid_disks is > 0 and it together with
 *  level, size, not_persistent, layout, chunksize determine the
 * shape of the array.
 * This will always create an array with a type-0.90.0 superblock.
 * The newer usage is when assembling an array.
 * In this case raid_disks will be 0, and the major_version field is
 * used to determine which style super-blocks are to be found on the devices.
 * The minor and patch _version numbers are also kept in case the
 * super_block handler wishes to interpret them.
 */
static int set_array_info(mddev_t * mddev, mdu_array_info_t *info)
{

	if (info->raid_disks == 0) {
		/* just setting version number for superblock loading */
		if (info->major_version < 0 ||
		    info->major_version >= sizeof(super_types)/sizeof(super_types[0]) ||
		    super_types[info->major_version].name == NULL) {
			/* maybe try to auto-load a module? */
			printk(KERN_INFO
				"md: superblock version %d not known\n",
				info->major_version);
			return -EINVAL;
		}
		mddev->major_version = info->major_version;
		mddev->minor_version = info->minor_version;
		mddev->patch_version = info->patch_version;
		return 0;
	}
	mddev->major_version = MD_MAJOR_VERSION;
	mddev->minor_version = MD_MINOR_VERSION;
	mddev->patch_version = MD_PATCHLEVEL_VERSION;
	mddev->ctime = get_seconds();

	mddev->level = info->level;
	mddev->size = info->size;
	mddev->raid_disks = info->raid_disks;
	/* don't set md_minor, it is determined by which /dev/md* was
	 * opened
	 */
	if (info->state & (1<<MD_SB_CLEAN))
		mddev->recovery_cp = MaxSector;
	else
		mddev->recovery_cp = 0;
	mddev->persistent = ! info->not_persistent;

	mddev->layout = info->layout;
	mddev->chunk_size = info->chunk_size;

	mddev->max_disks = MD_SB_DISKS;

	mddev->sb_dirty = 1;

	mddev->default_bitmap_offset = MD_SB_BYTES >> 9;
	mddev->bitmap_offset = 0;

	/*
	 * Generate a 128 bit UUID
	 */
	get_random_bytes(mddev->uuid, 16);

	return 0;
}
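In the assemble path (raid_disks == 0), the requested superblock version must index a registered entry in super_types. A hedged sketch of that bounds-plus-registration check, with a plain array length and name pointer standing in for the kernel's super_types table:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sketch of the version validation in set_array_info():
 * the index must be in range and the slot must name a registered
 * superblock handler. ntypes and name are stand-ins for the kernel's
 * super_types array. */
static bool sb_version_known(int major_version, int ntypes, const char *name)
{
	return major_version >= 0 && major_version < ntypes && name != NULL;
}
```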

/*
 * update_array_info is used to change the configuration of an
 * on-line array.
 * The version, ctime, level, size, raid_disks, not_persistent, layout,
 * chunk_size fields in the info are checked against the array.
 * Any differences that cannot be handled will cause an error.
 * Normally, only one change can be managed at a time.
 */
static int update_array_info(mddev_t *mddev, mdu_array_info_t *info)
{
	int rv = 0;
	int cnt = 0;
	int state = 0;

	/* calculate expected state, ignoring low bits */
	if (mddev->bitmap && mddev->bitmap_offset)
		state |= (1 << MD_SB_BITMAP_PRESENT);

	if (mddev->major_version != info->major_version ||
	    mddev->minor_version != info->minor_version ||
/* mddev->patch_version != info->patch_version || */
	    mddev->ctime != info->ctime ||
	    mddev->level != info->level ||
/* mddev->layout != info->layout || */
	    !mddev->persistent != info->not_persistent ||
	    mddev->chunk_size != info->chunk_size ||
	    /* ignore bottom 8 bits of state, and allow SB_BITMAP_PRESENT to change */
	    ((state^info->state) & 0xfffffe00)
		)
		return -EINVAL;
	/* Check there is only one change */
	if (mddev->size != info->size) cnt++;
	if (mddev->raid_disks != info->raid_disks) cnt++;
	if (mddev->layout != info->layout) cnt++;
	if ((state ^ info->state) & (1<<MD_SB_BITMAP_PRESENT)) cnt++;
	if (cnt == 0) return 0;
	if (cnt > 1) return -EINVAL;

	if (mddev->layout != info->layout) {
		/* Change layout
		 * we don't need to do anything at the md level, the
		 * personality will take care of it all.
		 */
		if (mddev->pers->reconfig == NULL)
			return -EINVAL;
		else
			return mddev->pers->reconfig(mddev, info->layout, -1);
	}
	if (mddev->size != info->size) {
		mdk_rdev_t * rdev;
		struct list_head *tmp;
		if (mddev->pers->resize == NULL)
			return -EINVAL;
		/* The "size" is the amount of each device that is used.
		 * This can only make sense for arrays with redundancy.
		 * linear and raid0 always use whatever space is available.
		 * We can only consider changing the size if no resync
		 * or reconstruction is happening, and if the new size
		 * is acceptable. It must fit before the sb_offset or,
		 * if that is <data_offset, it must fit before the
		 * size of each device.
		 * If size is zero, we find the largest size that fits.
		 */
		if (mddev->sync_thread)
			return -EBUSY;
		ITERATE_RDEV(mddev,rdev,tmp) {
			sector_t avail;
			int fit = (info->size == 0);
			if (rdev->sb_offset > rdev->data_offset)
				avail = (rdev->sb_offset*2) - rdev->data_offset;
			else
				avail = get_capacity(rdev->bdev->bd_disk)
					- rdev->data_offset;
			if (fit && (info->size == 0 || info->size > avail/2))
				info->size = avail/2;
			if (avail < ((sector_t)info->size << 1))
				return -ENOSPC;
		}
		rv = mddev->pers->resize(mddev, (sector_t)info->size *2);
		if (!rv) {
			struct block_device *bdev;

			bdev = bdget_disk(mddev->gendisk, 0);
			if (bdev) {
				down(&bdev->bd_inode->i_sem);
				i_size_write(bdev->bd_inode, mddev->array_size << 10);
				up(&bdev->bd_inode->i_sem);
				bdput(bdev);
			}
		}
	}
	if (mddev->raid_disks != info->raid_disks) {
		/* change the number of raid disks */
		if (mddev->pers->reshape == NULL)
			return -EINVAL;
		if (info->raid_disks <= 0 ||
		    info->raid_disks >= mddev->max_disks)
			return -EINVAL;
		if (mddev->sync_thread)
			return -EBUSY;
		rv = mddev->pers->reshape(mddev, info->raid_disks);
		if (!rv) {
			struct block_device *bdev;

			bdev = bdget_disk(mddev->gendisk, 0);
			if (bdev) {
				down(&bdev->bd_inode->i_sem);
				i_size_write(bdev->bd_inode, mddev->array_size << 10);
				up(&bdev->bd_inode->i_sem);
				bdput(bdev);
			}
		}
	}
	if ((state ^ info->state) & (1<<MD_SB_BITMAP_PRESENT)) {
		if (mddev->pers->quiesce == NULL)
			return -EINVAL;
		if (mddev->recovery || mddev->sync_thread)
			return -EBUSY;
		if (info->state & (1<<MD_SB_BITMAP_PRESENT)) {
			/* add the bitmap */
			if (mddev->bitmap)
				return -EEXIST;
			if (mddev->default_bitmap_offset == 0)
				return -EINVAL;
			mddev->bitmap_offset = mddev->default_bitmap_offset;
			mddev->pers->quiesce(mddev, 1);
			rv = bitmap_create(mddev);
			if (rv)
				bitmap_destroy(mddev);
			mddev->pers->quiesce(mddev, 0);
		} else {
			/* remove the bitmap */
			if (!mddev->bitmap)
				return -ENOENT;
			if (mddev->bitmap->file)
				return -EINVAL;
			mddev->pers->quiesce(mddev, 1);
			bitmap_destroy(mddev);
			mddev->pers->quiesce(mddev, 0);
			mddev->bitmap_offset = 0;
		}
	}
	md_update_sb(mddev);
	return rv;
}
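The "only one change at a time" rule above counts how many mutable fields differ and bails out if more than one does. A minimal userspace sketch of that counting step (hypothetical names; the real code compares mddev fields directly):

```c
#include <errno.h>
#include <stdbool.h>

/* Sketch of update_array_info()'s single-change rule, not kernel code:
 * count the differing mutable fields, then accept zero or exactly one. */
static int count_changes(bool size_diff, bool disks_diff,
			 bool layout_diff, bool bitmap_diff)
{
	int cnt = size_diff + disks_diff + layout_diff + bitmap_diff;

	if (cnt == 0)
		return 0;	/* nothing to do */
	if (cnt > 1)
		return -EINVAL;	/* only one change is allowed */
	return 1;		/* exactly one change: proceed */
}
```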

static int set_disk_faulty(mddev_t *mddev, dev_t dev)
{
	mdk_rdev_t *rdev;

	if (mddev->pers == NULL)
		return -ENODEV;

	rdev = find_rdev(mddev, dev);
	if (!rdev)
		return -ENODEV;

	md_error(mddev, rdev);
	return 0;
}

static int md_ioctl(struct inode *inode, struct file *file,
			unsigned int cmd, unsigned long arg)
{
	int err = 0;
	void __user *argp = (void __user *)arg;
	struct hd_geometry __user *loc = argp;
	mddev_t *mddev = NULL;

	if (!capable(CAP_SYS_ADMIN))
		return -EACCES;

	/*
	 * Commands dealing with the RAID driver but not any
	 * particular array:
	 */
	switch (cmd)
	{
		case RAID_VERSION:
			err = get_version(argp);
			goto done;

		case PRINT_RAID_DEBUG:
			err = 0;
			md_print_devices();
			goto done;

#ifndef MODULE
		case RAID_AUTORUN:
			err = 0;
			autostart_arrays(arg);
			goto done;
#endif
		default:;
	}

	/*
	 * Commands creating/starting a new array:
	 */

	mddev = inode->i_bdev->bd_disk->private_data;

	if (!mddev) {
		BUG();
		goto abort;
	}


	if (cmd == START_ARRAY) {
		/* START_ARRAY doesn't need to lock the array as autostart_array
		 * does the locking, and it could even be a different array
		 */
		static int cnt = 3;
		if (cnt > 0 ) {
			printk(KERN_WARNING
			       "md: %s(pid %d) used deprecated START_ARRAY ioctl. "
			       "This will not be supported beyond July 2006\n",
			       current->comm, current->pid);
			cnt--;
		}
		err = autostart_array(new_decode_dev(arg));
		if (err) {
			printk(KERN_WARNING "md: autostart failed!\n");
			goto abort;
		}
		goto done;
	}

	err = mddev_lock(mddev);
	if (err) {
		printk(KERN_INFO
			"md: ioctl lock interrupted, reason %d, cmd %d\n",
			err, cmd);
		goto abort;
	}

	switch (cmd)
	{
		case SET_ARRAY_INFO:
			{
				mdu_array_info_t info;
				if (!arg)
3224 | memset(&info, 0, sizeof(info)); | 3250 | memset(&info, 0, sizeof(info)); |
3225 | else if (copy_from_user(&info, argp, sizeof(info))) { | 3251 | else if (copy_from_user(&info, argp, sizeof(info))) { |
3226 | err = -EFAULT; | 3252 | err = -EFAULT; |
3227 | goto abort_unlock; | 3253 | goto abort_unlock; |
3228 | } | 3254 | } |
3229 | if (mddev->pers) { | 3255 | if (mddev->pers) { |
3230 | err = update_array_info(mddev, &info); | 3256 | err = update_array_info(mddev, &info); |
3231 | if (err) { | 3257 | if (err) { |
3232 | printk(KERN_WARNING "md: couldn't update" | 3258 | printk(KERN_WARNING "md: couldn't update" |
3233 | " array info. %d\n", err); | 3259 | " array info. %d\n", err); |
3234 | goto abort_unlock; | 3260 | goto abort_unlock; |
3235 | } | 3261 | } |
3236 | goto done_unlock; | 3262 | goto done_unlock; |
3237 | } | 3263 | } |
3238 | if (!list_empty(&mddev->disks)) { | 3264 | if (!list_empty(&mddev->disks)) { |
3239 | printk(KERN_WARNING | 3265 | printk(KERN_WARNING |
3240 | "md: array %s already has disks!\n", | 3266 | "md: array %s already has disks!\n", |
3241 | mdname(mddev)); | 3267 | mdname(mddev)); |
3242 | err = -EBUSY; | 3268 | err = -EBUSY; |
3243 | goto abort_unlock; | 3269 | goto abort_unlock; |
3244 | } | 3270 | } |
3245 | if (mddev->raid_disks) { | 3271 | if (mddev->raid_disks) { |
3246 | printk(KERN_WARNING | 3272 | printk(KERN_WARNING |
3247 | "md: array %s already initialised!\n", | 3273 | "md: array %s already initialised!\n", |
3248 | mdname(mddev)); | 3274 | mdname(mddev)); |
3249 | err = -EBUSY; | 3275 | err = -EBUSY; |
3250 | goto abort_unlock; | 3276 | goto abort_unlock; |
3251 | } | 3277 | } |
3252 | err = set_array_info(mddev, &info); | 3278 | err = set_array_info(mddev, &info); |
3253 | if (err) { | 3279 | if (err) { |
3254 | printk(KERN_WARNING "md: couldn't set" | 3280 | printk(KERN_WARNING "md: couldn't set" |
3255 | " array info. %d\n", err); | 3281 | " array info. %d\n", err); |
3256 | goto abort_unlock; | 3282 | goto abort_unlock; |
3257 | } | 3283 | } |
3258 | } | 3284 | } |
3259 | goto done_unlock; | 3285 | goto done_unlock; |
3260 | 3286 | ||
3261 | default:; | 3287 | default:; |
3262 | } | 3288 | } |
3263 | 3289 | ||
3264 | /* | 3290 | /* |
3265 | * Commands querying/configuring an existing array: | 3291 | * Commands querying/configuring an existing array: |
3266 | */ | 3292 | */ |
3267 | /* if we are not initialised yet, only ADD_NEW_DISK, STOP_ARRAY, | 3293 | /* if we are not initialised yet, only ADD_NEW_DISK, STOP_ARRAY, |
3268 | * RUN_ARRAY, and SET_BITMAP_FILE are allowed */ | 3294 | * RUN_ARRAY, and SET_BITMAP_FILE are allowed */ |
3269 | if (!mddev->raid_disks && cmd != ADD_NEW_DISK && cmd != STOP_ARRAY | 3295 | if (!mddev->raid_disks && cmd != ADD_NEW_DISK && cmd != STOP_ARRAY |
3270 | && cmd != RUN_ARRAY && cmd != SET_BITMAP_FILE) { | 3296 | && cmd != RUN_ARRAY && cmd != SET_BITMAP_FILE) { |
3271 | err = -ENODEV; | 3297 | err = -ENODEV; |
3272 | goto abort_unlock; | 3298 | goto abort_unlock; |
3273 | } | 3299 | } |
3274 | 3300 | ||
3275 | /* | 3301 | /* |
3276 | * Commands even a read-only array can execute: | 3302 | * Commands even a read-only array can execute: |
3277 | */ | 3303 | */ |
3278 | switch (cmd) | 3304 | switch (cmd) |
3279 | { | 3305 | { |
3280 | case GET_ARRAY_INFO: | 3306 | case GET_ARRAY_INFO: |
3281 | err = get_array_info(mddev, argp); | 3307 | err = get_array_info(mddev, argp); |
3282 | goto done_unlock; | 3308 | goto done_unlock; |
3283 | 3309 | ||
3284 | case GET_BITMAP_FILE: | 3310 | case GET_BITMAP_FILE: |
3285 | err = get_bitmap_file(mddev, argp); | 3311 | err = get_bitmap_file(mddev, argp); |
3286 | goto done_unlock; | 3312 | goto done_unlock; |
3287 | 3313 | ||
3288 | case GET_DISK_INFO: | 3314 | case GET_DISK_INFO: |
3289 | err = get_disk_info(mddev, argp); | 3315 | err = get_disk_info(mddev, argp); |
3290 | goto done_unlock; | 3316 | goto done_unlock; |
3291 | 3317 | ||
3292 | case RESTART_ARRAY_RW: | 3318 | case RESTART_ARRAY_RW: |
3293 | err = restart_array(mddev); | 3319 | err = restart_array(mddev); |
3294 | goto done_unlock; | 3320 | goto done_unlock; |
3295 | 3321 | ||
3296 | case STOP_ARRAY: | 3322 | case STOP_ARRAY: |
3297 | err = do_md_stop (mddev, 0); | 3323 | err = do_md_stop (mddev, 0); |
3298 | goto done_unlock; | 3324 | goto done_unlock; |
3299 | 3325 | ||
3300 | case STOP_ARRAY_RO: | 3326 | case STOP_ARRAY_RO: |
3301 | err = do_md_stop (mddev, 1); | 3327 | err = do_md_stop (mddev, 1); |
3302 | goto done_unlock; | 3328 | goto done_unlock; |
3303 | 3329 | ||
3304 | /* | 3330 | /* |
3305 | * We have a problem here : there is no easy way to give a CHS | 3331 | * We have a problem here : there is no easy way to give a CHS |
3306 | * virtual geometry. We currently pretend that we have a 2 heads | 3332 | * virtual geometry. We currently pretend that we have a 2 heads |
3307 | * 4 sectors (with a BIG number of cylinders...). This drives | 3333 | * 4 sectors (with a BIG number of cylinders...). This drives |
3308 | * dosfs just mad... ;-) | 3334 | * dosfs just mad... ;-) |
3309 | */ | 3335 | */ |
3310 | case HDIO_GETGEO: | 3336 | case HDIO_GETGEO: |
3311 | if (!loc) { | 3337 | if (!loc) { |
3312 | err = -EINVAL; | 3338 | err = -EINVAL; |
3313 | goto abort_unlock; | 3339 | goto abort_unlock; |
3314 | } | 3340 | } |
3315 | err = put_user (2, (char __user *) &loc->heads); | 3341 | err = put_user (2, (char __user *) &loc->heads); |
3316 | if (err) | 3342 | if (err) |
3317 | goto abort_unlock; | 3343 | goto abort_unlock; |
3318 | err = put_user (4, (char __user *) &loc->sectors); | 3344 | err = put_user (4, (char __user *) &loc->sectors); |
3319 | if (err) | 3345 | if (err) |
3320 | goto abort_unlock; | 3346 | goto abort_unlock; |
3321 | err = put_user(get_capacity(mddev->gendisk)/8, | 3347 | err = put_user(get_capacity(mddev->gendisk)/8, |
3322 | (short __user *) &loc->cylinders); | 3348 | (short __user *) &loc->cylinders); |
3323 | if (err) | 3349 | if (err) |
3324 | goto abort_unlock; | 3350 | goto abort_unlock; |
3325 | err = put_user (get_start_sect(inode->i_bdev), | 3351 | err = put_user (get_start_sect(inode->i_bdev), |
3326 | (long __user *) &loc->start); | 3352 | (long __user *) &loc->start); |
3327 | goto done_unlock; | 3353 | goto done_unlock; |
3328 | } | 3354 | } |

	/*
	 * The remaining ioctls are changing the state of the
	 * superblock, so we do not allow them on read-only arrays.
	 * However non-MD ioctls (e.g. get-size) will still come through
	 * here and hit the 'default' below, so only disallow
	 * 'md' ioctls, and switch to rw mode if started auto-readonly.
	 */
	if (_IOC_TYPE(cmd) == MD_MAJOR &&
	    mddev->ro && mddev->pers) {
		if (mddev->ro == 2) {
			mddev->ro = 0;
			set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
			md_wakeup_thread(mddev->thread);

		} else {
			err = -EROFS;
			goto abort_unlock;
		}
	}

	switch (cmd)
	{
		case ADD_NEW_DISK:
		{
			mdu_disk_info_t info;
			if (copy_from_user(&info, argp, sizeof(info)))
				err = -EFAULT;
			else
				err = add_new_disk(mddev, &info);
			goto done_unlock;
		}

		case HOT_REMOVE_DISK:
			err = hot_remove_disk(mddev, new_decode_dev(arg));
			goto done_unlock;

		case HOT_ADD_DISK:
			err = hot_add_disk(mddev, new_decode_dev(arg));
			goto done_unlock;

		case SET_DISK_FAULTY:
			err = set_disk_faulty(mddev, new_decode_dev(arg));
			goto done_unlock;

		case RUN_ARRAY:
			err = do_md_run (mddev);
			goto done_unlock;

		case SET_BITMAP_FILE:
			err = set_bitmap_file(mddev, (int)arg);
			goto done_unlock;

		default:
			if (_IOC_TYPE(cmd) == MD_MAJOR)
				printk(KERN_WARNING "md: %s(pid %d) used"
					" obsolete MD ioctl, upgrade your"
					" software to use new ictls.\n",
					current->comm, current->pid);
			err = -EINVAL;
			goto abort_unlock;
	}

done_unlock:
abort_unlock:
	mddev_unlock(mddev);

	return err;
done:
	if (err)
		MD_BUG();
abort:
	return err;
}

static int md_open(struct inode *inode, struct file *file)
{
	/*
	 * Succeed if we can lock the mddev, which confirms that
	 * it isn't being stopped right now.
	 */
	mddev_t *mddev = inode->i_bdev->bd_disk->private_data;
	int err;

	if ((err = mddev_lock(mddev)))
		goto out;

	err = 0;
	mddev_get(mddev);
	mddev_unlock(mddev);

	check_disk_change(inode->i_bdev);
 out:
	return err;
}

static int md_release(struct inode *inode, struct file * file)
{
	mddev_t *mddev = inode->i_bdev->bd_disk->private_data;

	if (!mddev)
		BUG();
	mddev_put(mddev);

	return 0;
}

static int md_media_changed(struct gendisk *disk)
{
	mddev_t *mddev = disk->private_data;

	return mddev->changed;
}

static int md_revalidate(struct gendisk *disk)
{
	mddev_t *mddev = disk->private_data;

	mddev->changed = 0;
	return 0;
}

static struct block_device_operations md_fops =
{
	.owner		= THIS_MODULE,
	.open		= md_open,
	.release	= md_release,
	.ioctl		= md_ioctl,
	.media_changed	= md_media_changed,
	.revalidate_disk= md_revalidate,
};

static int md_thread(void * arg)
{
	mdk_thread_t *thread = arg;

	/*
	 * md_thread is a 'system-thread', it's priority should be very
	 * high. We avoid resource deadlocks individually in each
	 * raid personality. (RAID5 does preallocation) We also use RR and
	 * the very same RT priority as kswapd, thus we will never get
	 * into a priority inversion deadlock.
	 *
	 * we definitely have to have equal or higher priority than
	 * bdflush, otherwise bdflush will deadlock if there are too
	 * many dirty RAID5 blocks.
	 */

	allow_signal(SIGKILL);
	while (!kthread_should_stop()) {

		/* We need to wait INTERRUPTIBLE so that
		 * we don't add to the load-average.
		 * That means we need to be sure no signals are
		 * pending
		 */
		if (signal_pending(current))
			flush_signals(current);

		wait_event_interruptible_timeout
			(thread->wqueue,
			 test_bit(THREAD_WAKEUP, &thread->flags)
			 || kthread_should_stop(),
			 thread->timeout);
		try_to_freeze();

		clear_bit(THREAD_WAKEUP, &thread->flags);

		thread->run(thread->mddev);
	}

	return 0;
}

void md_wakeup_thread(mdk_thread_t *thread)
{
	if (thread) {
		dprintk("md: waking up MD thread %s.\n", thread->tsk->comm);
		set_bit(THREAD_WAKEUP, &thread->flags);
		wake_up(&thread->wqueue);
	}
}

mdk_thread_t *md_register_thread(void (*run) (mddev_t *), mddev_t *mddev,
				 const char *name)
{
	mdk_thread_t *thread;

	thread = kzalloc(sizeof(mdk_thread_t), GFP_KERNEL);
	if (!thread)
		return NULL;

	init_waitqueue_head(&thread->wqueue);

	thread->run = run;
	thread->mddev = mddev;
	thread->timeout = MAX_SCHEDULE_TIMEOUT;
	thread->tsk = kthread_run(md_thread, thread, name, mdname(thread->mddev));
	if (IS_ERR(thread->tsk)) {
		kfree(thread);
		return NULL;
	}
	return thread;
}

void md_unregister_thread(mdk_thread_t *thread)
{
	dprintk("interrupting MD-thread pid %d\n", thread->tsk->pid);

	kthread_stop(thread->tsk);
	kfree(thread);
}

void md_error(mddev_t *mddev, mdk_rdev_t *rdev)
{
	if (!mddev) {
		MD_BUG();
		return;
	}

	if (!rdev || test_bit(Faulty, &rdev->flags))
		return;
/*
	dprintk("md_error dev:%s, rdev:(%d:%d), (caller: %p,%p,%p,%p).\n",
		mdname(mddev),
		MAJOR(rdev->bdev->bd_dev), MINOR(rdev->bdev->bd_dev),
		__builtin_return_address(0),__builtin_return_address(1),
		__builtin_return_address(2),__builtin_return_address(3));
*/
	if (!mddev->pers->error_handler)
		return;
	mddev->pers->error_handler(mddev,rdev);
	set_bit(MD_RECOVERY_INTR, &mddev->recovery);
	set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
	md_wakeup_thread(mddev->thread);
	md_new_event(mddev);
}

/* seq_file implementation /proc/mdstat */

static void status_unused(struct seq_file *seq)
{
	int i = 0;
	mdk_rdev_t *rdev;
	struct list_head *tmp;

	seq_printf(seq, "unused devices: ");

	ITERATE_RDEV_PENDING(rdev,tmp) {
		char b[BDEVNAME_SIZE];
		i++;
		seq_printf(seq, "%s ",
			      bdevname(rdev->bdev,b));
	}
	if (!i)
		seq_printf(seq, "<none>");

	seq_printf(seq, "\n");
}


static void status_resync(struct seq_file *seq, mddev_t * mddev)
{
	unsigned long max_blocks, resync, res, dt, db, rt;

	resync = (mddev->curr_resync - atomic_read(&mddev->recovery_active))/2;

	if (test_bit(MD_RECOVERY_SYNC, &mddev->recovery))
		max_blocks = mddev->resync_max_sectors >> 1;
	else
		max_blocks = mddev->size;

	/*
	 * Should not happen.
	 */
	if (!max_blocks) {
		MD_BUG();
		return;
	}
	res = (resync/1024)*1000/(max_blocks/1024 + 1);
	{
		int i, x = res/50, y = 20-x;
		seq_printf(seq, "[");
		for (i = 0; i < x; i++)
			seq_printf(seq, "=");
		seq_printf(seq, ">");
		for (i = 0; i < y; i++)
			seq_printf(seq, ".");
		seq_printf(seq, "] ");
	}
	seq_printf(seq, " %s =%3lu.%lu%% (%lu/%lu)",
		   (test_bit(MD_RECOVERY_SYNC, &mddev->recovery) ?
		    "resync" : "recovery"),
		   res/10, res % 10, resync, max_blocks);

	/*
	 * We do not want to overflow, so the order of operands and
	 * the * 100 / 100 trick are important. We do a +1 to be
	 * safe against division by zero. We only estimate anyway.
	 *
	 * dt: time from mark until now
	 * db: blocks written from mark until now
	 * rt: remaining time
	 */
	dt = ((jiffies - mddev->resync_mark) / HZ);
	if (!dt) dt++;
	db = resync - (mddev->resync_mark_cnt/2);
	rt = (dt * ((max_blocks-resync) / (db/100+1)))/100;

	seq_printf(seq, " finish=%lu.%lumin", rt / 60, (rt % 60)/6);

	seq_printf(seq, " speed=%ldK/sec", db/dt);
}

static void *md_seq_start(struct seq_file *seq, loff_t *pos)
{
	struct list_head *tmp;
	loff_t l = *pos;
	mddev_t *mddev;

	if (l >= 0x10000)
		return NULL;
	if (!l--)
		/* header */
		return (void*)1;

	spin_lock(&all_mddevs_lock);
	list_for_each(tmp,&all_mddevs)
		if (!l--) {
			mddev = list_entry(tmp, mddev_t, all_mddevs);
			mddev_get(mddev);
			spin_unlock(&all_mddevs_lock);
			return mddev;
		}
	spin_unlock(&all_mddevs_lock);
	if (!l--)
		return (void*)2;/* tail */
	return NULL;
}

static void *md_seq_next(struct seq_file *seq, void *v, loff_t *pos)
{
	struct list_head *tmp;
	mddev_t *next_mddev, *mddev = v;

	++*pos;
	if (v == (void*)2)
		return NULL;

	spin_lock(&all_mddevs_lock);
	if (v == (void*)1)
		tmp = all_mddevs.next;
	else
		tmp = mddev->all_mddevs.next;
	if (tmp != &all_mddevs)
		next_mddev = mddev_get(list_entry(tmp,mddev_t,all_mddevs));
	else {
		next_mddev = (void*)2;
		*pos = 0x10000;
	}
	spin_unlock(&all_mddevs_lock);

	if (v != (void*)1)
		mddev_put(mddev);
	return next_mddev;

}

static void md_seq_stop(struct seq_file *seq, void *v)
{
	mddev_t *mddev = v;

	if (mddev && v != (void*)1 && v != (void*)2)
		mddev_put(mddev);
}

struct mdstat_info {
	int event;
};

static int md_seq_show(struct seq_file *seq, void *v)
{
	mddev_t *mddev = v;
	sector_t size;
	struct list_head *tmp2;
	mdk_rdev_t *rdev;
	struct mdstat_info *mi = seq->private;
	struct bitmap *bitmap;

	if (v == (void*)1) {
		struct mdk_personality *pers;
		seq_printf(seq, "Personalities : ");
		spin_lock(&pers_lock);
		list_for_each_entry(pers, &pers_list, list)
			seq_printf(seq, "[%s] ", pers->name);

		spin_unlock(&pers_lock);
		seq_printf(seq, "\n");
		mi->event = atomic_read(&md_event_count);
		return 0;
	}
	if (v == (void*)2) {
		status_unused(seq);
		return 0;
	}

	if (mddev_lock(mddev)!=0)
		return -EINTR;
	if (mddev->pers || mddev->raid_disks || !list_empty(&mddev->disks)) {
		seq_printf(seq, "%s : %sactive", mdname(mddev),
			   mddev->pers ? "" : "in");
		if (mddev->pers) {
			if (mddev->ro==1)
				seq_printf(seq, " (read-only)");
			if (mddev->ro==2)
				seq_printf(seq, "(auto-read-only)");
			seq_printf(seq, " %s", mddev->pers->name);
		}

		size = 0;
		ITERATE_RDEV(mddev,rdev,tmp2) {
			char b[BDEVNAME_SIZE];
			seq_printf(seq, " %s[%d]",
				   bdevname(rdev->bdev,b), rdev->desc_nr);
			if (test_bit(WriteMostly, &rdev->flags))
				seq_printf(seq, "(W)");
			if (test_bit(Faulty, &rdev->flags)) {
				seq_printf(seq, "(F)");
				continue;
			} else if (rdev->raid_disk < 0)
				seq_printf(seq, "(S)"); /* spare */
			size += rdev->size;
		}

		if (!list_empty(&mddev->disks)) {
			if (mddev->pers)
				seq_printf(seq, "\n      %llu blocks",
					   (unsigned long long)mddev->array_size);
			else
3767 | seq_printf(seq, "\n %llu blocks", | 3793 | seq_printf(seq, "\n %llu blocks", |
3768 | (unsigned long long)size); | 3794 | (unsigned long long)size); |
3769 | } | 3795 | } |
3770 | if (mddev->persistent) { | 3796 | if (mddev->persistent) { |
3771 | if (mddev->major_version != 0 || | 3797 | if (mddev->major_version != 0 || |
3772 | mddev->minor_version != 90) { | 3798 | mddev->minor_version != 90) { |
3773 | seq_printf(seq," super %d.%d", | 3799 | seq_printf(seq," super %d.%d", |
3774 | mddev->major_version, | 3800 | mddev->major_version, |
3775 | mddev->minor_version); | 3801 | mddev->minor_version); |
3776 | } | 3802 | } |
3777 | } else | 3803 | } else |
3778 | seq_printf(seq, " super non-persistent"); | 3804 | seq_printf(seq, " super non-persistent"); |
3779 | 3805 | ||
3780 | if (mddev->pers) { | 3806 | if (mddev->pers) { |
3781 | mddev->pers->status (seq, mddev); | 3807 | mddev->pers->status (seq, mddev); |
3782 | seq_printf(seq, "\n "); | 3808 | seq_printf(seq, "\n "); |
3783 | if (mddev->pers->sync_request) { | 3809 | if (mddev->pers->sync_request) { |
3784 | if (mddev->curr_resync > 2) { | 3810 | if (mddev->curr_resync > 2) { |
3785 | status_resync (seq, mddev); | 3811 | status_resync (seq, mddev); |
3786 | seq_printf(seq, "\n "); | 3812 | seq_printf(seq, "\n "); |
3787 | } else if (mddev->curr_resync == 1 || mddev->curr_resync == 2) | 3813 | } else if (mddev->curr_resync == 1 || mddev->curr_resync == 2) |
3788 | seq_printf(seq, "\tresync=DELAYED\n "); | 3814 | seq_printf(seq, "\tresync=DELAYED\n "); |
3789 | else if (mddev->recovery_cp < MaxSector) | 3815 | else if (mddev->recovery_cp < MaxSector) |
3790 | seq_printf(seq, "\tresync=PENDING\n "); | 3816 | seq_printf(seq, "\tresync=PENDING\n "); |
3791 | } | 3817 | } |
3792 | } else | 3818 | } else |
3793 | seq_printf(seq, "\n "); | 3819 | seq_printf(seq, "\n "); |
3794 | 3820 | ||
3795 | if ((bitmap = mddev->bitmap)) { | 3821 | if ((bitmap = mddev->bitmap)) { |
3796 | unsigned long chunk_kb; | 3822 | unsigned long chunk_kb; |
3797 | unsigned long flags; | 3823 | unsigned long flags; |
3798 | spin_lock_irqsave(&bitmap->lock, flags); | 3824 | spin_lock_irqsave(&bitmap->lock, flags); |
3799 | chunk_kb = bitmap->chunksize >> 10; | 3825 | chunk_kb = bitmap->chunksize >> 10; |
3800 | seq_printf(seq, "bitmap: %lu/%lu pages [%luKB], " | 3826 | seq_printf(seq, "bitmap: %lu/%lu pages [%luKB], " |
3801 | "%lu%s chunk", | 3827 | "%lu%s chunk", |
3802 | bitmap->pages - bitmap->missing_pages, | 3828 | bitmap->pages - bitmap->missing_pages, |
3803 | bitmap->pages, | 3829 | bitmap->pages, |
3804 | (bitmap->pages - bitmap->missing_pages) | 3830 | (bitmap->pages - bitmap->missing_pages) |
3805 | << (PAGE_SHIFT - 10), | 3831 | << (PAGE_SHIFT - 10), |
3806 | chunk_kb ? chunk_kb : bitmap->chunksize, | 3832 | chunk_kb ? chunk_kb : bitmap->chunksize, |
3807 | chunk_kb ? "KB" : "B"); | 3833 | chunk_kb ? "KB" : "B"); |
3808 | if (bitmap->file) { | 3834 | if (bitmap->file) { |
3809 | seq_printf(seq, ", file: "); | 3835 | seq_printf(seq, ", file: "); |
3810 | seq_path(seq, bitmap->file->f_vfsmnt, | 3836 | seq_path(seq, bitmap->file->f_vfsmnt, |
3811 | bitmap->file->f_dentry," \t\n"); | 3837 | bitmap->file->f_dentry," \t\n"); |
3812 | } | 3838 | } |
3813 | 3839 | ||
3814 | seq_printf(seq, "\n"); | 3840 | seq_printf(seq, "\n"); |
3815 | spin_unlock_irqrestore(&bitmap->lock, flags); | 3841 | spin_unlock_irqrestore(&bitmap->lock, flags); |
3816 | } | 3842 | } |
3817 | 3843 | ||
3818 | seq_printf(seq, "\n"); | 3844 | seq_printf(seq, "\n"); |
3819 | } | 3845 | } |
3820 | mddev_unlock(mddev); | 3846 | mddev_unlock(mddev); |
3821 | 3847 | ||
3822 | return 0; | 3848 | return 0; |
3823 | } | 3849 | } |
3824 | 3850 | ||
3825 | static struct seq_operations md_seq_ops = { | 3851 | static struct seq_operations md_seq_ops = { |
3826 | .start = md_seq_start, | 3852 | .start = md_seq_start, |
3827 | .next = md_seq_next, | 3853 | .next = md_seq_next, |
3828 | .stop = md_seq_stop, | 3854 | .stop = md_seq_stop, |
3829 | .show = md_seq_show, | 3855 | .show = md_seq_show, |
3830 | }; | 3856 | }; |
3831 | 3857 | ||
3832 | static int md_seq_open(struct inode *inode, struct file *file) | 3858 | static int md_seq_open(struct inode *inode, struct file *file) |
3833 | { | 3859 | { |
3834 | int error; | 3860 | int error; |
3835 | struct mdstat_info *mi = kmalloc(sizeof(*mi), GFP_KERNEL); | 3861 | struct mdstat_info *mi = kmalloc(sizeof(*mi), GFP_KERNEL); |
3836 | if (mi == NULL) | 3862 | if (mi == NULL) |
3837 | return -ENOMEM; | 3863 | return -ENOMEM; |
3838 | 3864 | ||
3839 | error = seq_open(file, &md_seq_ops); | 3865 | error = seq_open(file, &md_seq_ops); |
3840 | if (error) | 3866 | if (error) |
3841 | kfree(mi); | 3867 | kfree(mi); |
3842 | else { | 3868 | else { |
3843 | struct seq_file *p = file->private_data; | 3869 | struct seq_file *p = file->private_data; |
3844 | p->private = mi; | 3870 | p->private = mi; |
3845 | mi->event = atomic_read(&md_event_count); | 3871 | mi->event = atomic_read(&md_event_count); |
3846 | } | 3872 | } |
3847 | return error; | 3873 | return error; |
3848 | } | 3874 | } |
3849 | 3875 | ||
3850 | static int md_seq_release(struct inode *inode, struct file *file) | 3876 | static int md_seq_release(struct inode *inode, struct file *file) |
3851 | { | 3877 | { |
3852 | struct seq_file *m = file->private_data; | 3878 | struct seq_file *m = file->private_data; |
3853 | struct mdstat_info *mi = m->private; | 3879 | struct mdstat_info *mi = m->private; |
3854 | m->private = NULL; | 3880 | m->private = NULL; |
3855 | kfree(mi); | 3881 | kfree(mi); |
3856 | return seq_release(inode, file); | 3882 | return seq_release(inode, file); |
3857 | } | 3883 | } |
3858 | 3884 | ||
3859 | static unsigned int mdstat_poll(struct file *filp, poll_table *wait) | 3885 | static unsigned int mdstat_poll(struct file *filp, poll_table *wait) |
3860 | { | 3886 | { |
3861 | struct seq_file *m = filp->private_data; | 3887 | struct seq_file *m = filp->private_data; |
3862 | struct mdstat_info *mi = m->private; | 3888 | struct mdstat_info *mi = m->private; |
3863 | int mask; | 3889 | int mask; |
3864 | 3890 | ||
3865 | poll_wait(filp, &md_event_waiters, wait); | 3891 | poll_wait(filp, &md_event_waiters, wait); |
3866 | 3892 | ||
3867 | /* always allow read */ | 3893 | /* always allow read */ |
3868 | mask = POLLIN | POLLRDNORM; | 3894 | mask = POLLIN | POLLRDNORM; |
3869 | 3895 | ||
3870 | if (mi->event != atomic_read(&md_event_count)) | 3896 | if (mi->event != atomic_read(&md_event_count)) |
3871 | mask |= POLLERR | POLLPRI; | 3897 | mask |= POLLERR | POLLPRI; |
3872 | return mask; | 3898 | return mask; |
3873 | } | 3899 | } |
3874 | 3900 | ||
3875 | static struct file_operations md_seq_fops = { | 3901 | static struct file_operations md_seq_fops = { |
3876 | .open = md_seq_open, | 3902 | .open = md_seq_open, |
3877 | .read = seq_read, | 3903 | .read = seq_read, |
3878 | .llseek = seq_lseek, | 3904 | .llseek = seq_lseek, |
3879 | .release = md_seq_release, | 3905 | .release = md_seq_release, |
3880 | .poll = mdstat_poll, | 3906 | .poll = mdstat_poll, |
3881 | }; | 3907 | }; |
3882 | 3908 | ||
3883 | int register_md_personality(struct mdk_personality *p) | 3909 | int register_md_personality(struct mdk_personality *p) |
3884 | { | 3910 | { |
3885 | spin_lock(&pers_lock); | 3911 | spin_lock(&pers_lock); |
3886 | list_add_tail(&p->list, &pers_list); | 3912 | list_add_tail(&p->list, &pers_list); |
3887 | printk(KERN_INFO "md: %s personality registered for level %d\n", p->name, p->level); | 3913 | printk(KERN_INFO "md: %s personality registered for level %d\n", p->name, p->level); |
3888 | spin_unlock(&pers_lock); | 3914 | spin_unlock(&pers_lock); |
3889 | return 0; | 3915 | return 0; |
3890 | } | 3916 | } |
3891 | 3917 | ||
3892 | int unregister_md_personality(struct mdk_personality *p) | 3918 | int unregister_md_personality(struct mdk_personality *p) |
3893 | { | 3919 | { |
3894 | printk(KERN_INFO "md: %s personality unregistered\n", p->name); | 3920 | printk(KERN_INFO "md: %s personality unregistered\n", p->name); |
3895 | spin_lock(&pers_lock); | 3921 | spin_lock(&pers_lock); |
3896 | list_del_init(&p->list); | 3922 | list_del_init(&p->list); |
3897 | spin_unlock(&pers_lock); | 3923 | spin_unlock(&pers_lock); |
3898 | return 0; | 3924 | return 0; |
3899 | } | 3925 | } |
3900 | 3926 | ||
3901 | static int is_mddev_idle(mddev_t *mddev) | 3927 | static int is_mddev_idle(mddev_t *mddev) |
3902 | { | 3928 | { |
3903 | mdk_rdev_t * rdev; | 3929 | mdk_rdev_t * rdev; |
3904 | struct list_head *tmp; | 3930 | struct list_head *tmp; |
3905 | int idle; | 3931 | int idle; |
3906 | unsigned long curr_events; | 3932 | unsigned long curr_events; |
3907 | 3933 | ||
3908 | idle = 1; | 3934 | idle = 1; |
3909 | ITERATE_RDEV(mddev,rdev,tmp) { | 3935 | ITERATE_RDEV(mddev,rdev,tmp) { |
3910 | struct gendisk *disk = rdev->bdev->bd_contains->bd_disk; | 3936 | struct gendisk *disk = rdev->bdev->bd_contains->bd_disk; |
3911 | curr_events = disk_stat_read(disk, sectors[0]) + | 3937 | curr_events = disk_stat_read(disk, sectors[0]) + |
3912 | disk_stat_read(disk, sectors[1]) - | 3938 | disk_stat_read(disk, sectors[1]) - |
3913 | atomic_read(&disk->sync_io); | 3939 | atomic_read(&disk->sync_io); |
		/* The difference between curr_events and last_events
		 * will be affected by any new non-sync IO (making
		 * curr_events bigger) and any difference in the amount of
		 * in-flight syncio (making curr_events bigger or smaller)
		 * The amount in-flight is currently limited to
		 * 32*64K in raid1/10 and 256*PAGE_SIZE in raid5/6
		 * which is at most 4096 sectors.
		 * These numbers are fairly fragile and should be made
		 * more robust, probably by enforcing the
		 * 'window size' that md_do_sync sort-of uses.
		 *
		 * Note: the following is an unsigned comparison.
		 */
		if ((curr_events - rdev->last_events + 4096) > 8192) {
			rdev->last_events = curr_events;
			idle = 0;
		}
	}
	return idle;
}

void md_done_sync(mddev_t *mddev, int blocks, int ok)
{
	/* another "blocks" (512byte) blocks have been synced */
	atomic_sub(blocks, &mddev->recovery_active);
	wake_up(&mddev->recovery_wait);
	if (!ok) {
		set_bit(MD_RECOVERY_ERR, &mddev->recovery);
		md_wakeup_thread(mddev->thread);
		// stop recovery, signal do_sync ....
	}
}


/* md_write_start(mddev, bi)
 * If we need to update some array metadata (e.g. 'active' flag
 * in superblock) before writing, schedule a superblock update
 * and wait for it to complete.
 */
void md_write_start(mddev_t *mddev, struct bio *bi)
{
	if (bio_data_dir(bi) != WRITE)
		return;

	BUG_ON(mddev->ro == 1);
	if (mddev->ro == 2) {
		/* need to switch to read/write */
		mddev->ro = 0;
		set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
		md_wakeup_thread(mddev->thread);
	}
	atomic_inc(&mddev->writes_pending);
	if (mddev->in_sync) {
		spin_lock_irq(&mddev->write_lock);
		if (mddev->in_sync) {
			mddev->in_sync = 0;
			mddev->sb_dirty = 1;
			md_wakeup_thread(mddev->thread);
		}
		spin_unlock_irq(&mddev->write_lock);
	}
	wait_event(mddev->sb_wait, mddev->sb_dirty==0);
}

void md_write_end(mddev_t *mddev)
{
	if (atomic_dec_and_test(&mddev->writes_pending)) {
		if (mddev->safemode == 2)
			md_wakeup_thread(mddev->thread);
		else
			mod_timer(&mddev->safemode_timer, jiffies + mddev->safemode_delay);
	}
}

static DECLARE_WAIT_QUEUE_HEAD(resync_wait);

#define SYNC_MARKS	10
#define SYNC_MARK_STEP	(3*HZ)
static void md_do_sync(mddev_t *mddev)
{
	mddev_t *mddev2;
	unsigned int currspeed = 0,
		 window;
	sector_t max_sectors,j, io_sectors;
	unsigned long mark[SYNC_MARKS];
	sector_t mark_cnt[SYNC_MARKS];
	int last_mark,m;
	struct list_head *tmp;
	sector_t last_check;
	int skipped = 0;
	/* just in case the thread restarts... */
	if (test_bit(MD_RECOVERY_DONE, &mddev->recovery))
		return;

	/* we overload curr_resync somewhat here.
	 * 0 == not engaged in resync at all
	 * 2 == checking that there is no conflict with another sync
	 * 1 == like 2, but have yielded to allow conflicting resync to
	 *		commence
	 * other == active in resync - this many blocks
	 *
	 * Before starting a resync we must have set curr_resync to
	 * 2, and then checked that every "conflicting" array has curr_resync
	 * less than ours.  When we find one that is the same or higher
	 * we wait on resync_wait.  To avoid deadlock, we reduce curr_resync
	 * to 1 if we choose to yield (based arbitrarily on address of mddev structure).
	 * This will mean we have to start checking from the beginning again.
	 *
	 */

	do {
		mddev->curr_resync = 2;

	try_again:
		if (kthread_should_stop()) {
			set_bit(MD_RECOVERY_INTR, &mddev->recovery);
			goto skip;
		}
		ITERATE_MDDEV(mddev2,tmp) {
			if (mddev2 == mddev)
				continue;
			if (mddev2->curr_resync &&
			    match_mddev_units(mddev,mddev2)) {
				DEFINE_WAIT(wq);
				if (mddev < mddev2 && mddev->curr_resync == 2) {
					/* arbitrarily yield */
					mddev->curr_resync = 1;
					wake_up(&resync_wait);
				}
				if (mddev > mddev2 && mddev->curr_resync == 1)
					/* no need to wait here, we can wait the next
					 * time 'round when curr_resync == 2
					 */
					continue;
				prepare_to_wait(&resync_wait, &wq, TASK_UNINTERRUPTIBLE);
				if (!kthread_should_stop() &&
				    mddev2->curr_resync >= mddev->curr_resync) {
					printk(KERN_INFO "md: delaying resync of %s"
					       " until %s has finished resync (they"
					       " share one or more physical units)\n",
					       mdname(mddev), mdname(mddev2));
					mddev_put(mddev2);
					schedule();
					finish_wait(&resync_wait, &wq);
					goto try_again;
				}
				finish_wait(&resync_wait, &wq);
			}
		}
	} while (mddev->curr_resync < 2);

	if (test_bit(MD_RECOVERY_SYNC, &mddev->recovery)) {
		/* resync follows the size requested by the personality,
		 * which defaults to physical size, but can be virtual size
		 */
		max_sectors = mddev->resync_max_sectors;
		mddev->resync_mismatches = 0;
	} else
		/* recovery follows the physical size of devices */
		max_sectors = mddev->size << 1;

	printk(KERN_INFO "md: syncing RAID array %s\n", mdname(mddev));
	printk(KERN_INFO "md: minimum _guaranteed_ reconstruction speed:"
	       " %d KB/sec/disc.\n", sysctl_speed_limit_min);
	printk(KERN_INFO "md: using maximum available idle IO bandwidth "
	       "(but not more than %d KB/sec) for reconstruction.\n",
	       sysctl_speed_limit_max);

	is_mddev_idle(mddev); /* this also initializes IO event counters */
	/* we don't use the checkpoint if there's a bitmap */
	if (test_bit(MD_RECOVERY_SYNC, &mddev->recovery) && !mddev->bitmap
	    && ! test_bit(MD_RECOVERY_REQUESTED, &mddev->recovery))
		j = mddev->recovery_cp;
	else
		j = 0;
	io_sectors = 0;
	for (m = 0; m < SYNC_MARKS; m++) {
		mark[m] = jiffies;
		mark_cnt[m] = io_sectors;
	}
	last_mark = 0;
	mddev->resync_mark = mark[last_mark];
	mddev->resync_mark_cnt = mark_cnt[last_mark];

	/*
	 * Tune reconstruction:
	 */
	window = 32*(PAGE_SIZE/512);
	printk(KERN_INFO "md: using %dk window, over a total of %llu blocks.\n",
	       window/2,(unsigned long long) max_sectors/2);

	atomic_set(&mddev->recovery_active, 0);
	init_waitqueue_head(&mddev->recovery_wait);
	last_check = 0;

	if (j>2) {
		printk(KERN_INFO
		       "md: resuming recovery of %s from checkpoint.\n",
		       mdname(mddev));
		mddev->curr_resync = j;
	}

	while (j < max_sectors) {
		sector_t sectors;

		skipped = 0;
		sectors = mddev->pers->sync_request(mddev, j, &skipped,
						    currspeed < sysctl_speed_limit_min);
		if (sectors == 0) {
			set_bit(MD_RECOVERY_ERR, &mddev->recovery);
			goto out;
		}

		if (!skipped) { /* actual IO requested */
			io_sectors += sectors;
			atomic_add(sectors, &mddev->recovery_active);
		}

		j += sectors;
		if (j>1) mddev->curr_resync = j;
		if (last_check == 0)
			/* this is the earliest that the rebuild will be
			 * visible in /proc/mdstat
			 */
			md_new_event(mddev);

		if (last_check + window > io_sectors || j == max_sectors)
			continue;

		last_check = io_sectors;

		if (test_bit(MD_RECOVERY_INTR, &mddev->recovery) ||
		    test_bit(MD_RECOVERY_ERR, &mddev->recovery))
			break;

	repeat:
		if (time_after_eq(jiffies, mark[last_mark] + SYNC_MARK_STEP )) {
			/* step marks */
			int next = (last_mark+1) % SYNC_MARKS;

			mddev->resync_mark = mark[next];
			mddev->resync_mark_cnt = mark_cnt[next];
			mark[next] = jiffies;
			mark_cnt[next] = io_sectors - atomic_read(&mddev->recovery_active);
			last_mark = next;
		}


		if (kthread_should_stop()) {
			/*
			 * got a signal, exit.
			 */
			printk(KERN_INFO
			       "md: md_do_sync() got signal ... exiting\n");
			set_bit(MD_RECOVERY_INTR, &mddev->recovery);
			goto out;
		}

		/*
		 * this loop exits only if we are slower than the 'hard'
		 * speed limit, or if the system was IO-idle for a jiffy.
		 * the system might be non-idle CPU-wise, but we only care
		 * about not overloading the IO subsystem. (things like an
		 * e2fsck being done on the RAID array should execute fast)
		 */
4181 | mddev->queue->unplug_fn(mddev->queue); | 4207 | mddev->queue->unplug_fn(mddev->queue); |
4182 | cond_resched(); | 4208 | cond_resched(); |
4183 | 4209 | ||
4184 | currspeed = ((unsigned long)(io_sectors-mddev->resync_mark_cnt))/2 | 4210 | currspeed = ((unsigned long)(io_sectors-mddev->resync_mark_cnt))/2 |
4185 | /((jiffies-mddev->resync_mark)/HZ +1) +1; | 4211 | /((jiffies-mddev->resync_mark)/HZ +1) +1; |
4186 | 4212 | ||
4187 | if (currspeed > sysctl_speed_limit_min) { | 4213 | if (currspeed > sysctl_speed_limit_min) { |
4188 | if ((currspeed > sysctl_speed_limit_max) || | 4214 | if ((currspeed > sysctl_speed_limit_max) || |
4189 | !is_mddev_idle(mddev)) { | 4215 | !is_mddev_idle(mddev)) { |
4190 | msleep(500); | 4216 | msleep(500); |
4191 | goto repeat; | 4217 | goto repeat; |
4192 | } | 4218 | } |
4193 | } | 4219 | } |
4194 | } | 4220 | } |
4195 | printk(KERN_INFO "md: %s: sync done.\n",mdname(mddev)); | 4221 | printk(KERN_INFO "md: %s: sync done.\n",mdname(mddev)); |
4196 | /* | 4222 | /* |
4197 | * this also signals 'finished resyncing' to md_stop | 4223 | * this also signals 'finished resyncing' to md_stop |
4198 | */ | 4224 | */ |
4199 | out: | 4225 | out: |
4200 | mddev->queue->unplug_fn(mddev->queue); | 4226 | mddev->queue->unplug_fn(mddev->queue); |
4201 | 4227 | ||
4202 | wait_event(mddev->recovery_wait, !atomic_read(&mddev->recovery_active)); | 4228 | wait_event(mddev->recovery_wait, !atomic_read(&mddev->recovery_active)); |
4203 | 4229 | ||
4204 | /* tell personality that we are finished */ | 4230 | /* tell personality that we are finished */ |
4205 | mddev->pers->sync_request(mddev, max_sectors, &skipped, 1); | 4231 | mddev->pers->sync_request(mddev, max_sectors, &skipped, 1); |
4206 | 4232 | ||
4207 | if (!test_bit(MD_RECOVERY_ERR, &mddev->recovery) && | 4233 | if (!test_bit(MD_RECOVERY_ERR, &mddev->recovery) && |
4208 | mddev->curr_resync > 2 && | 4234 | mddev->curr_resync > 2 && |
4209 | mddev->curr_resync >= mddev->recovery_cp) { | 4235 | mddev->curr_resync >= mddev->recovery_cp) { |
4210 | if (test_bit(MD_RECOVERY_INTR, &mddev->recovery)) { | 4236 | if (test_bit(MD_RECOVERY_INTR, &mddev->recovery)) { |
4211 | printk(KERN_INFO | 4237 | printk(KERN_INFO |
4212 | "md: checkpointing recovery of %s.\n", | 4238 | "md: checkpointing recovery of %s.\n", |
4213 | mdname(mddev)); | 4239 | mdname(mddev)); |
4214 | mddev->recovery_cp = mddev->curr_resync; | 4240 | mddev->recovery_cp = mddev->curr_resync; |
4215 | } else | 4241 | } else |
4216 | mddev->recovery_cp = MaxSector; | 4242 | mddev->recovery_cp = MaxSector; |
4217 | } | 4243 | } |
4218 | 4244 | ||
4219 | skip: | 4245 | skip: |
4220 | mddev->curr_resync = 0; | 4246 | mddev->curr_resync = 0; |
4221 | wake_up(&resync_wait); | 4247 | wake_up(&resync_wait); |
4222 | set_bit(MD_RECOVERY_DONE, &mddev->recovery); | 4248 | set_bit(MD_RECOVERY_DONE, &mddev->recovery); |
4223 | md_wakeup_thread(mddev->thread); | 4249 | md_wakeup_thread(mddev->thread); |
4224 | } | 4250 | } |


/*
 * This routine is regularly called by all per-raid-array threads to
 * deal with generic issues like resync and super-block update.
 * Raid personalities that don't have a thread (linear/raid0) do not
 * need this as they never do any recovery or update the superblock.
 *
 * It does not do any resync itself, but rather "forks" off other threads
 * to do that as needed.
 * When it is determined that resync is needed, we set MD_RECOVERY_RUNNING in
 * "->recovery" and create a thread at ->sync_thread.
 * When the thread finishes it sets MD_RECOVERY_DONE (and might set MD_RECOVERY_ERR)
 * and wakes up this thread, which will reap the thread and finish up.
 * This thread also removes any faulty devices (with nr_pending == 0).
 *
 * The overall approach is:
 *  1/ if the superblock needs updating, update it.
 *  2/ If a recovery thread is running, don't do anything else.
 *  3/ If recovery has finished, clean up, possibly marking spares active.
 *  4/ If there are any faulty devices, remove them.
 *  5/ If the array is degraded, try to add spare devices.
 *  6/ If the array has spares or is not in-sync, start a resync thread.
 */
void md_check_recovery(mddev_t *mddev)
{
	mdk_rdev_t *rdev;
	struct list_head *rtmp;


	if (mddev->bitmap)
		bitmap_daemon_work(mddev->bitmap);

	if (mddev->ro)
		return;

	if (signal_pending(current)) {
		if (mddev->pers->sync_request) {
			printk(KERN_INFO "md: %s in immediate safe mode\n",
			       mdname(mddev));
			mddev->safemode = 2;
		}
		flush_signals(current);
	}

	if ( ! (
		mddev->sb_dirty ||
		test_bit(MD_RECOVERY_NEEDED, &mddev->recovery) ||
		test_bit(MD_RECOVERY_DONE, &mddev->recovery) ||
		(mddev->safemode == 1) ||
		(mddev->safemode == 2 && ! atomic_read(&mddev->writes_pending)
		 && !mddev->in_sync && mddev->recovery_cp == MaxSector)
		))
		return;

	if (mddev_trylock(mddev)==0) {
		int spares = 0;

		spin_lock_irq(&mddev->write_lock);
		if (mddev->safemode && !atomic_read(&mddev->writes_pending) &&
		    !mddev->in_sync && mddev->recovery_cp == MaxSector) {
			mddev->in_sync = 1;
			mddev->sb_dirty = 1;
		}
		if (mddev->safemode == 1)
			mddev->safemode = 0;
		spin_unlock_irq(&mddev->write_lock);

		if (mddev->sb_dirty)
			md_update_sb(mddev);


		if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery) &&
		    !test_bit(MD_RECOVERY_DONE, &mddev->recovery)) {
			/* resync/recovery still happening */
			clear_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
			goto unlock;
		}
		if (mddev->sync_thread) {
			/* resync has finished, collect result */
			md_unregister_thread(mddev->sync_thread);
			mddev->sync_thread = NULL;
			if (!test_bit(MD_RECOVERY_ERR, &mddev->recovery) &&
			    !test_bit(MD_RECOVERY_INTR, &mddev->recovery)) {
				/* success...*/
				/* activate any spares */
				mddev->pers->spare_active(mddev);
			}
			md_update_sb(mddev);

			/* if array is no-longer degraded, then any saved_raid_disk
			 * information must be scrapped
			 */
			if (!mddev->degraded)
				ITERATE_RDEV(mddev,rdev,rtmp)
					rdev->saved_raid_disk = -1;

			mddev->recovery = 0;
			/* flag recovery needed just to double check */
			set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
			md_new_event(mddev);
			goto unlock;
		}
		/* Clear some bits that don't mean anything, but
		 * might be left set
		 */
		clear_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
		clear_bit(MD_RECOVERY_ERR, &mddev->recovery);
		clear_bit(MD_RECOVERY_INTR, &mddev->recovery);
		clear_bit(MD_RECOVERY_DONE, &mddev->recovery);

		/* no recovery is running.
		 * remove any failed drives, then
		 * add spares if possible.
		 * Spares are also removed and re-added, to allow
		 * the personality to fail the re-add.
		 */
		ITERATE_RDEV(mddev,rdev,rtmp)
			if (rdev->raid_disk >= 0 &&
			    (test_bit(Faulty, &rdev->flags) || !test_bit(In_sync, &rdev->flags)) &&
			    atomic_read(&rdev->nr_pending)==0) {
				if (mddev->pers->hot_remove_disk(mddev, rdev->raid_disk)==0) {
					char nm[20];
					sprintf(nm, "rd%d", rdev->raid_disk);
					sysfs_remove_link(&mddev->kobj, nm);
					rdev->raid_disk = -1;
				}
			}

		if (mddev->degraded) {
			ITERATE_RDEV(mddev,rdev,rtmp)
				if (rdev->raid_disk < 0
				    && !test_bit(Faulty, &rdev->flags)) {
					if (mddev->pers->hot_add_disk(mddev,rdev)) {
						char nm[20];
						sprintf(nm, "rd%d", rdev->raid_disk);
						sysfs_create_link(&mddev->kobj, &rdev->kobj, nm);
						spares++;
						md_new_event(mddev);
					} else
						break;
				}
		}

		if (spares) {
			clear_bit(MD_RECOVERY_SYNC, &mddev->recovery);
			clear_bit(MD_RECOVERY_CHECK, &mddev->recovery);
		} else if (mddev->recovery_cp < MaxSector) {
			set_bit(MD_RECOVERY_SYNC, &mddev->recovery);
		} else if (!test_bit(MD_RECOVERY_SYNC, &mddev->recovery))
			/* nothing to be done ... */
			goto unlock;

		if (mddev->pers->sync_request) {
			set_bit(MD_RECOVERY_RUNNING, &mddev->recovery);
			if (spares && mddev->bitmap && !mddev->bitmap->file) {
				/* We are adding a device or devices to an array
				 * which has the bitmap stored on all devices.
				 * So make sure all bitmap pages get written
				 */
				bitmap_write_all(mddev->bitmap);
			}
			mddev->sync_thread = md_register_thread(md_do_sync,
								mddev,
								"%s_resync");
			if (!mddev->sync_thread) {
				printk(KERN_ERR "%s: could not start resync"
				       " thread...\n",
				       mdname(mddev));
				/* leave the spares where they are, it shouldn't hurt */
				mddev->recovery = 0;
			} else
				md_wakeup_thread(mddev->sync_thread);
			md_new_event(mddev);
		}
	unlock:
		mddev_unlock(mddev);
	}
}

static int md_notify_reboot(struct notifier_block *this,
			    unsigned long code, void *x)
{
	struct list_head *tmp;
	mddev_t *mddev;

	if ((code == SYS_DOWN) || (code == SYS_HALT) || (code == SYS_POWER_OFF)) {

		printk(KERN_INFO "md: stopping all md devices.\n");

		ITERATE_MDDEV(mddev,tmp)
			if (mddev_trylock(mddev)==0)
				do_md_stop(mddev, 1);
		/*
		 * certain more exotic SCSI devices are known to be
		 * volatile wrt too early system reboots. While the
		 * right place to handle this issue is the given
		 * driver, we do want to have a safe RAID driver ...
		 */
		mdelay(1000*1);
	}
	return NOTIFY_DONE;
}

static struct notifier_block md_notifier = {
	.notifier_call	= md_notify_reboot,
	.next		= NULL,
	.priority	= INT_MAX, /* before any real devices */
};

static void md_geninit(void)
{
	struct proc_dir_entry *p;

	dprintk("md: sizeof(mdp_super_t) = %d\n", (int)sizeof(mdp_super_t));

	p = create_proc_entry("mdstat", S_IRUGO, NULL);
	if (p)
		p->proc_fops = &md_seq_fops;
}

static int __init md_init(void)
{
	int minor;

	printk(KERN_INFO "md: md driver %d.%d.%d MAX_MD_DEVS=%d,"
	       " MD_SB_DISKS=%d\n",
	       MD_MAJOR_VERSION, MD_MINOR_VERSION,
	       MD_PATCHLEVEL_VERSION, MAX_MD_DEVS, MD_SB_DISKS);
	printk(KERN_INFO "md: bitmap version %d.%d\n", BITMAP_MAJOR_HI,
	       BITMAP_MINOR);

	if (register_blkdev(MAJOR_NR, "md"))
		return -1;
	if ((mdp_major = register_blkdev(0, "mdp")) <= 0) {
		unregister_blkdev(MAJOR_NR, "md");
		return -1;
	}
	devfs_mk_dir("md");
	blk_register_region(MKDEV(MAJOR_NR, 0), MAX_MD_DEVS, THIS_MODULE,
			    md_probe, NULL, NULL);
	blk_register_region(MKDEV(mdp_major, 0), MAX_MD_DEVS<<MdpMinorShift, THIS_MODULE,
			    md_probe, NULL, NULL);

	for (minor = 0; minor < MAX_MD_DEVS; ++minor)
		devfs_mk_bdev(MKDEV(MAJOR_NR, minor),
			      S_IFBLK|S_IRUSR|S_IWUSR,
			      "md/%d", minor);

	for (minor = 0; minor < MAX_MD_DEVS; ++minor)
		devfs_mk_bdev(MKDEV(mdp_major, minor<<MdpMinorShift),
			      S_IFBLK|S_IRUSR|S_IWUSR,
			      "md/mdp%d", minor);


	register_reboot_notifier(&md_notifier);
	raid_table_header = register_sysctl_table(raid_root_table, 1);

	md_geninit();
	return 0;
}


#ifndef MODULE

/*
 * Searches all registered partitions for autorun RAID arrays
 * at boot time.
 */
static dev_t detected_devices[128];
static int dev_cnt;

void md_autodetect_dev(dev_t dev)
{
	if (dev_cnt >= 0 && dev_cnt < 127)
		detected_devices[dev_cnt++] = dev;
}


static void autostart_arrays(int part)
{
	mdk_rdev_t *rdev;
	int i;

	printk(KERN_INFO "md: Autodetecting RAID arrays.\n");

	for (i = 0; i < dev_cnt; i++) {
		dev_t dev = detected_devices[i];

		rdev = md_import_device(dev, 0, 0);
		if (IS_ERR(rdev))
			continue;

		if (test_bit(Faulty, &rdev->flags)) {
			MD_BUG();
			continue;
		}
		list_add(&rdev->same_set, &pending_raid_disks);
	}
	dev_cnt = 0;

	autorun_devices(part);
}

#endif

static __exit void md_exit(void)
{
	mddev_t *mddev;
	struct list_head *tmp;
	int i;

	blk_unregister_region(MKDEV(MAJOR_NR, 0), MAX_MD_DEVS);
	blk_unregister_region(MKDEV(mdp_major, 0), MAX_MD_DEVS << MdpMinorShift);
	for (i = 0; i < MAX_MD_DEVS; i++)
		devfs_remove("md/%d", i);
	for (i = 0; i < MAX_MD_DEVS; i++)
		devfs_remove("md/d%d", i);

	devfs_remove("md");

	unregister_blkdev(MAJOR_NR, "md");
	unregister_blkdev(mdp_major, "mdp");
	unregister_reboot_notifier(&md_notifier);
	unregister_sysctl_table(raid_table_header);
	remove_proc_entry("mdstat", NULL);
	ITERATE_MDDEV(mddev,tmp) {
		struct gendisk *disk = mddev->gendisk;
		if (!disk)
			continue;
		export_array(mddev);
		del_gendisk(disk);
		put_disk(disk);
		mddev->gendisk = NULL;
		mddev_put(mddev);
	}
}

module_init(md_init)
module_exit(md_exit)

static int get_ro(char *buffer, struct kernel_param *kp)
{
	return sprintf(buffer, "%d", start_readonly);
}
static int set_ro(const char *val, struct kernel_param *kp)
{
	char *e;
	int num = simple_strtoul(val, &e, 10);
	if (*val && (*e == '\0' || *e == '\n')) {
		start_readonly = num;
		return 0;
	}
	return -EINVAL;
}

module_param_call(start_ro, set_ro, get_ro, NULL, 0600);
module_param(start_dirty_degraded, int, 0644);


EXPORT_SYMBOL(register_md_personality);
EXPORT_SYMBOL(unregister_md_personality);
EXPORT_SYMBOL(md_error);
EXPORT_SYMBOL(md_done_sync);
EXPORT_SYMBOL(md_write_start);
EXPORT_SYMBOL(md_write_end);
EXPORT_SYMBOL(md_register_thread);
EXPORT_SYMBOL(md_unregister_thread);
EXPORT_SYMBOL(md_wakeup_thread);
EXPORT_SYMBOL(md_print_devices);
EXPORT_SYMBOL(md_check_recovery);
MODULE_LICENSE("GPL");
MODULE_ALIAS("md");
MODULE_ALIAS_BLOCKDEV_MAJOR(MD_MAJOR);