Blame view

Documentation/filesystems/ext4.txt 25.8 KB
fc513a333   Dave Kleikamp   [PATCH] Documenta...
1
2
3
  
  Ext4 Filesystem
  ===============
22359f574   Diego Calleja   ext4: Update Docu...
4
5
6
7
  Ext4 is an an advanced level of the ext3 filesystem which incorporates
  scalability and reliability enhancements for supporting large filesystems
  (64 bit) in keeping with increasing disk capacities and state-of-the-art
  feature requirements.
fc513a333   Dave Kleikamp   [PATCH] Documenta...
8

22359f574   Diego Calleja   ext4: Update Docu...
9
10
  Mailing list:	linux-ext4@vger.kernel.org
  Web site:	http://ext4.wiki.kernel.org
fc513a333   Dave Kleikamp   [PATCH] Documenta...
11
12
13
14
  
  
  1. Quick usage instructions:
  ===========================
22359f574   Diego Calleja   ext4: Update Docu...
15
16
17
  Note: More extensive information for getting started with ext4 can be
        found at the ext4 wiki site at the URL:
        http://ext4.wiki.kernel.org/index.php/Ext4_Howto
93e3270c8   Jose R. Santos   ext4: Documentati...
18
    - Compile and install the latest version of e2fsprogs (as of this
22359f574   Diego Calleja   ext4: Update Docu...
19
      writing version 1.41.3) from:
93e3270c8   Jose R. Santos   ext4: Documentati...
20
21
22
23
  
      http://sourceforge.net/project/showfiles.php?group_id=2406
  	
  	or
fc513a333   Dave Kleikamp   [PATCH] Documenta...
24
      ftp://ftp.kernel.org/pub/linux/kernel/people/tytso/e2fsprogs/
93e3270c8   Jose R. Santos   ext4: Documentati...
25
26
27
  	or grab the latest git repository from:
  
      git://git.kernel.org/pub/scm/fs/ext2/e2fsprogs.git
4537398d9   Theodore Ts'o   ext4: Update docu...
28
29
30
31
32
    - Note that it is highly important to install the mke2fs.conf file
      that comes with the e2fsprogs 1.41.x sources in /etc/mke2fs.conf. If
      you have edited the /etc/mke2fs.conf file installed on your system,
      you will need to merge your changes with the version from e2fsprogs
      1.41.x.
03010a335   Theodore Ts'o   ext4: Rename ext4...
33
    - Create a new filesystem using the ext4 filesystem type:
93e3270c8   Jose R. Santos   ext4: Documentati...
34

03010a335   Theodore Ts'o   ext4: Rename ext4...
35
      	# mke2fs -t ext4 /dev/hda1
93e3270c8   Jose R. Santos   ext4: Documentati...
36

22359f574   Diego Calleja   ext4: Update Docu...
37
      Or to configure an existing ext3 filesystem to support extents: 
fc513a333   Dave Kleikamp   [PATCH] Documenta...
38

22359f574   Diego Calleja   ext4: Update Docu...
39
  	# tune2fs -O extents /dev/hda1
fc513a333   Dave Kleikamp   [PATCH] Documenta...
40

93e3270c8   Jose R. Santos   ext4: Documentati...
41
42
      If the filesystem was created with 128 byte inodes, it can be
      converted to use 256 byte for greater efficiency via:
fc513a333   Dave Kleikamp   [PATCH] Documenta...
43

93e3270c8   Jose R. Santos   ext4: Documentati...
44
          # tune2fs -I 256 /dev/hda1
fc513a333   Dave Kleikamp   [PATCH] Documenta...
45

03010a335   Theodore Ts'o   ext4: Rename ext4...
46
      (Note: we currently do not have tools to convert an ext4
93e3270c8   Jose R. Santos   ext4: Documentati...
47
48
      filesystem back to ext3; so please do not do try this on production
      filesystems.)
fc513a333   Dave Kleikamp   [PATCH] Documenta...
49

93e3270c8   Jose R. Santos   ext4: Documentati...
50
    - Mounting:
03010a335   Theodore Ts'o   ext4: Rename ext4...
51
  	# mount -t ext4 /dev/hda1 /wherever
fc513a333   Dave Kleikamp   [PATCH] Documenta...
52

8e1a4857c   Theodore Ts'o   Update Documentat...
53
54
55
56
57
58
59
60
61
62
    - When comparing performance with other filesystems, it's always
      important to try multiple workloads; very often a subtle change in a
      workload parameter can completely change the ranking of which
      filesystems do well compared to others.  When comparing versus ext3,
      note that ext4 enables write barriers by default, while ext3 does
      not enable write barriers by default.  So it is useful to use
      explicitly specify whether barriers are enabled or not when via the
      '-o barriers=[0|1]' mount option for both ext3 and ext4 filesystems
      for a fair comparison.  When tuning ext3 for best benchmark numbers,
      it is often worthwhile to try changing the data journaling mode; '-o
ad4340177   Lukas Czerner   ext3/ext4 Documen...
63
64
65
66
67
68
      data=writeback' can be faster for some workloads.  (Note however that
      running mounted with data=writeback can potentially leave stale data
      exposed in recently written files in case of an unclean shutdown,
      which could be a security exposure in some situations.)  Configuring
      the filesystem with a large journal can also be helpful for
      metadata-intensive workloads.
fc513a333   Dave Kleikamp   [PATCH] Documenta...
69
70
71
72
73
  
  2. Features
  ===========
  
  2.1 Currently available
93e3270c8   Jose R. Santos   ext4: Documentati...
74
  * ability to use filesystems > 16TB (e2fsprogs support not available yet)
fc513a333   Dave Kleikamp   [PATCH] Documenta...
75
76
  * extent format reduces metadata overhead (RAM, IO for access, transactions)
  * extent format more robust in face of on-disk corruption due to magics,
8e1a4857c   Theodore Ts'o   Update Documentat...
77
  * internal redundancy in tree
49f1487b2   Mingming Cao   ext4: Documention...
78
  * improved file allocation (multi-block alloc)
722bde687   Theodore Ts'o   ext4: Add fine pr...
79
  * lift 32000 subdirectory limit imposed by i_links_count[1]
93e3270c8   Jose R. Santos   ext4: Documentati...
80
81
82
83
84
85
86
87
88
  * nsec timestamps for mtime, atime, ctime, create time
  * inode version field on disk (NFSv4, Lustre)
  * reduced e2fsck time via uninit_bg feature
  * journal checksumming for robustness, performance
  * persistent file preallocation (e.g for streaming media, databases)
  * ability to pack bitmaps and inode tables into larger virtual groups via the
    flex_bg feature
  * large file support
  * Inode allocation using large virtual block groups via flex_bg
49f1487b2   Mingming Cao   ext4: Documention...
89
90
  * delayed allocation
  * large block (up to pagesize) support
25985edce   Lucas De Marchi   Fix common misspe...
91
  * efficient new ordered mode in JBD2 and ext4(avoid using buffer head to force
49f1487b2   Mingming Cao   ext4: Documention...
92
    the ordering)
fc513a333   Dave Kleikamp   [PATCH] Documenta...
93

722bde687   Theodore Ts'o   ext4: Add fine pr...
94
95
  [1] Filesystems with a block size of 1k may see a limit imposed by the
  directory hash tree having a maximum depth of two.
fc513a333   Dave Kleikamp   [PATCH] Documenta...
96
  2.2 Candidate features for future inclusion
93e3270c8   Jose R. Santos   ext4: Documentati...
97
  * Online defrag (patches available but not well tested)
25985edce   Lucas De Marchi   Fix common misspe...
98
  * reduced mke2fs time via lazy itable initialization in conjunction with
93e3270c8   Jose R. Santos   ext4: Documentati...
99
100
101
    the uninit_bg feature (capability to do this is available in e2fsprogs
    but a kernel thread to do lazy zeroing of unused inode table blocks
    after filesystem is first mounted is required for safety)
fc513a333   Dave Kleikamp   [PATCH] Documenta...
102

93e3270c8   Jose R. Santos   ext4: Documentati...
103
104
105
106
  There are several others under discussion, whether they all make it in is
  partly a function of how much time everyone has to work on them. Features like
  metadata checksumming have been discussed and planned for a bit but no patches
  exist yet so I'm not sure they're in the near-term roadmap.
fc513a333   Dave Kleikamp   [PATCH] Documenta...
107

93e3270c8   Jose R. Santos   ext4: Documentati...
108
109
  The big performance win will come with mballoc, delalloc and flex_bg
  grouping of bitmaps and inode tables.  Some test results available here:
fc513a333   Dave Kleikamp   [PATCH] Documenta...
110

22359f574   Diego Calleja   ext4: Update Docu...
111
112
   - http://www.bullopensource.org/ext4/20080818-ffsb/ffsb-write-2.6.27-rc1.html
   - http://www.bullopensource.org/ext4/20080818-ffsb/ffsb-readwrite-2.6.27-rc1.html
fc513a333   Dave Kleikamp   [PATCH] Documenta...
113
114
115
116
117
118
  
  3. Options
  ==========
  
  When mounting an ext4 filesystem, the following option are accepted:
  (*) == default
8e1a4857c   Theodore Ts'o   Update Documentat...
119
120
121
122
123
  ro                   	Mount filesystem read only. Note that ext4 will
                       	replay the journal (and thus write to the
                       	partition) even when mounted "read only". The
                       	mount options "ro,noload" can be used to prevent
  		     	writes to the filesystem.
d4da6c9cc   Linus Torvalds   Revert "ext4: Rem...
124
125
126
127
  journal_checksum	Enable checksumming of the journal transactions.
  			This will allow the recovery code in e2fsck and the
  			kernel to detect corruption in the kernel.  It is a
  			compatible change and will be ignored by older kernels.
818d276ce   Girish Shilamkar   ext4: Add the jou...
128
129
  journal_async_commit	Commit block can be written to disk without waiting
  			for descriptor blocks. If enabled older kernels cannot
d4da6c9cc   Linus Torvalds   Revert "ext4: Rem...
130
131
  			mount the device. This will enable 'journal_checksum'
  			internally.
818d276ce   Girish Shilamkar   ext4: Add the jou...
132

fc513a333   Dave Kleikamp   [PATCH] Documenta...
133
134
  journal=update		Update the ext4 file system's journal to the current
  			format.
fc513a333   Dave Kleikamp   [PATCH] Documenta...
135
136
137
138
139
  journal_dev=devnum	When the external journal device's major/minor numbers
  			have changed, this option allows the user to specify
  			the new journal location.  The journal device is
  			identified through its new major/minor numbers encoded
  			in devnum.
e3bb52ae2   Eric Sandeen   ext4: make "norec...
140
141
  norecovery		Don't load the journal on mounting.  Note that
  noload			if the filesystem was not unmounted cleanly,
8e1a4857c   Theodore Ts'o   Update Documentat...
142
143
144
                       	skipping the journal replay will lead to the
                       	filesystem containing inconsistencies that can
                       	lead to any number of problems.
fc513a333   Dave Kleikamp   [PATCH] Documenta...
145
146
  
  data=journal		All data are committed into the journal prior to being
56889787c   Theodore Ts'o   ext4: improve han...
147
148
149
  			written into the main file system.  Enabling
  			this mode will disable delayed allocation and
  			O_DIRECT support.
fc513a333   Dave Kleikamp   [PATCH] Documenta...
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
  
  data=ordered	(*)	All data are forced directly out to the main file
  			system prior to its metadata being committed to the
  			journal.
  
  data=writeback		Data ordering is not preserved, data may be written
  			into the main file system after its metadata has been
  			committed to the journal.
  
  commit=nrsec	(*)	Ext4 can be told to sync all its data and metadata
  			every 'nrsec' seconds. The default value is 5 seconds.
  			This means that if you lose your power, you will lose
  			as much as the latest 5 seconds of work (your
  			filesystem will not be damaged though, thanks to the
  			journaling).  This default value (or any low value)
  			will hurt performance, but it's good for data-safety.
  			Setting it to 0 will have the same effect as leaving
  			it at the default (5 seconds).
  			Setting it to very large values will improve
  			performance.
571640cad   Eric Sandeen   ext4: enable barr...
170
  barrier=<0|1(*)>	This enables/disables the use of write barriers in
06705bff9   Theodore Ts'o   ext4: Regularize ...
171
172
  barrier(*)		the jbd code.  barrier=0 disables, barrier=1 enables.
  nobarrier		This also requires an IO stack which can support
571640cad   Eric Sandeen   ext4: enable barr...
173
174
175
176
177
178
179
  			barriers, and if jbd gets an error on a barrier
  			write, it will disable again with a warning.
  			Write barriers enforce proper on-disk ordering
  			of journal commits, making volatile disk write caches
  			safe to use, at some performance penalty.  If
  			your disks are battery-backed in one way or another,
  			disabling barriers may safely improve performance.
06705bff9   Theodore Ts'o   ext4: Regularize ...
180
181
182
  			The mount options "barrier" and "nobarrier" can
  			also be used to enable or disable barriers, for
  			consistency with other ext4 mount options.
fc513a333   Dave Kleikamp   [PATCH] Documenta...
183

6d3b82f2d   Fang Wenqi   ext4: Update docu...
184
  inode_readahead_blks=n	This tuning parameter controls the maximum
240799cdf   Theodore Ts'o   ext4: Use readahe...
185
186
187
  			number of inode table blocks that ext4's inode
  			table readahead algorithm will pre-read into
  			the buffer cache.  The default value is 32 blocks.
af909a57f   Theodore Ts'o   ext4: documentati...
188
189
190
191
192
193
  nouser_xattr		Disables Extended User Attributes. If you have extended
  			attribute support enabled in the kernel configuration
  			(CONFIG_EXT4_FS_XATTR), extended attribute support
  			is enabled by default on mount. See the attr(5) manual
  			page and http://acl.bestbits.at/ for more information
  			about extended attributes.
fc513a333   Dave Kleikamp   [PATCH] Documenta...
194
195
  
  noacl			This option disables POSIX Access Control List
af909a57f   Theodore Ts'o   ext4: documentati...
196
197
198
199
200
  			support. If ACL support is enabled in the kernel
  			configuration (CONFIG_EXT4_FS_POSIX_ACL), ACL is
  			enabled by default on mount. See the acl(5) manual
  			page and http://acl.bestbits.at/ for more information
  			about acl.
fc513a333   Dave Kleikamp   [PATCH] Documenta...
201

fc513a333   Dave Kleikamp   [PATCH] Documenta...
202
203
  bsddf		(*)	Make 'df' act like BSD.
  minixdf			Make 'df' act like Minix.
fc513a333   Dave Kleikamp   [PATCH] Documenta...
204
  debug			Extra debugging information is sent to syslog.
8a8a2050c   Theodore Ts'o   ext4: document th...
205
206
207
  abort			Simulate the effects of calling ext4_abort() for
  			debugging purposes.  This is normally used while
  			remounting a filesystem which is already mounted.
8e1a4857c   Theodore Ts'o   Update Documentat...
208
  errors=remount-ro	Remount the filesystem read-only on an error.
fc513a333   Dave Kleikamp   [PATCH] Documenta...
209
210
  errors=continue		Keep going on a filesystem error.
  errors=panic		Panic and halt the machine if an error occurs.
8e1a4857c   Theodore Ts'o   Update Documentat...
211
212
213
                          (These mount options override the errors behavior
                          specified in the superblock, which can be configured
                          using tune2fs)
fc513a333   Dave Kleikamp   [PATCH] Documenta...
214

5bf5683a3   Hidehiro Kawai   ext4: add an opti...
215
216
217
218
  data_err=ignore(*)	Just print an error message if an error occurs
  			in a file data buffer in ordered mode.
  data_err=abort		Abort the journal if an error occurs in a file
  			data buffer in ordered mode.
fc513a333   Dave Kleikamp   [PATCH] Documenta...
219
220
221
222
223
224
225
226
227
228
229
  grpid			Give objects the same group ID as their creator.
  bsdgroups
  
  nogrpid		(*)	New objects have the group ID of their creator.
  sysvgroups
  
  resgid=n		The group ID which may use the reserved blocks.
  
  resuid=n		The user ID which may use the reserved blocks.
  
  sb=n			Use alternate superblock at this location.
1358870de   Jan Kara   ext4: Update docu...
230
231
232
233
234
235
236
237
238
239
240
241
  quota			These options are ignored by the filesystem. They
  noquota			are used only by quota tools to recognize volumes
  grpquota		where quota should be turned on. See documentation
  usrquota		in the quota-tools package for more details
  			(http://sourceforge.net/projects/linuxquota).
  
  jqfmt=<quota type>	These options tell filesystem details about quota
  usrjquota=<file>	so that quota information can be properly updated
  grpjquota=<file>	during journal replay. They replace the above
  			quota options. See documentation in the quota-tools
  			package for more details
  			(http://sourceforge.net/projects/linuxquota).
fc513a333   Dave Kleikamp   [PATCH] Documenta...
242

c9de560de   Alex Tomas   ext4: Add multi b...
243
244
245
246
  stripe=n		Number of filesystem blocks that mballoc will try
  			to use for allocation size and alignment. For RAID5/6
  			systems this should be the number of data
  			disks *  RAID chunk size in file system blocks.
836538882   Jan Kara   ext4: Update docu...
247
248
249
250
251
252
253
254
255
256
  
  delalloc	(*)	Defer block allocation until just before ext4
  			writes out the block(s) in question.  This
  			allows ext4 to better allocation decisions
  			more efficiently.
  nodelalloc		Disable delayed allocation.  Blocks are allocated
  			when the data is copied from userspace to the
  			page cache, either via the write(2) system call
  			or when an mmap'ed page which was previously
  			unallocated is written for the first time.
240799cdf   Theodore Ts'o   ext4: Use readahe...
257

30773840c   Theodore Ts'o   ext4: add fsync b...
258
259
260
261
262
263
264
265
266
267
268
269
270
271
  max_batch_time=usec	Maximum amount of time ext4 should wait for
  			additional filesystem operations to be batch
  			together with a synchronous write operation.
  			Since a synchronous write operation is going to
  			force a commit and then a wait for the I/O
  			complete, it doesn't cost much, and can be a
  			huge throughput win, we wait for a small amount
  			of time to see if any other transactions can
  			piggyback on the synchronous write.   The
  			algorithm used is designed to automatically tune
  			for the speed of the disk, by measuring the
  			amount of time (on average) that it takes to
  			finish committing a transaction.  Call this time
  			the "commit time".  If the time that the
19f594600   Matt LaPlante   trivial: Miscella...
272
  			transaction has been running is less than the
30773840c   Theodore Ts'o   ext4: add fsync b...
273
274
275
276
277
278
279
280
281
282
283
284
285
  			commit time, ext4 will try sleeping for the
  			commit time to see if other operations will join
  			the transaction.   The commit time is capped by
  			the max_batch_time, which defaults to 15000us
  			(15ms).   This optimization can be turned off
  			entirely by setting max_batch_time to 0.
  
  min_batch_time=usec	This parameter sets the commit time (as
  			described above) to be at least min_batch_time.
  			It defaults to zero microseconds.  Increasing
  			this parameter may improve the throughput of
  			multi-threaded, synchronous workloads on very
  			fast disks, at the cost of increasing latency.
b3881f74b   Theodore Ts'o   ext4: Add mount o...
286
287
288
289
290
291
  journal_ioprio=prio	The I/O priority (from 0 to 7, where 0 is the
  			highest priorty) which should be used for I/O
  			operations submitted by kjournald2 during a
  			commit operation.  This defaults to 3, which is
  			a slightly higher priority than the default I/O
  			priority.
06705bff9   Theodore Ts'o   ext4: Regularize ...
292
293
294
295
296
297
298
299
300
301
302
303
  auto_da_alloc(*)	Many broken applications don't use fsync() when 
  noauto_da_alloc		replacing existing files via patterns such as
  			fd = open("foo.new")/write(fd,..)/close(fd)/
  			rename("foo.new", "foo"), or worse yet,
  			fd = open("foo", O_TRUNC)/write(fd,..)/close(fd).
  			If auto_da_alloc is enabled, ext4 will detect
  			the replace-via-rename and replace-via-truncate
  			patterns and force that any delayed allocation
  			blocks are allocated such that at the next
  			journal commit, in the default data=ordered
  			mode, the data blocks of the new file are forced
  			to disk before the rename() operation is
19f594600   Matt LaPlante   trivial: Miscella...
304
  			committed.  This provides roughly the same level
06705bff9   Theodore Ts'o   ext4: Regularize ...
305
306
307
308
  			of guarantees as ext3, and avoids the
  			"zero-length" problem that can happen when a
  			system crashes before the delayed allocation
  			blocks are forced to disk.
bfff68738   Lukas Czerner   ext4: add support...
309
310
311
312
313
314
315
316
317
318
319
320
321
  noinit_itable		Do not initialize any uninitialized inode table
  			blocks in the background.  This feature may be
  			used by installation CD's so that the install
  			process can complete as quickly as possible; the
  			inode table initialization process would then be
  			deferred until the next time the  file system
  			is unmounted.
  
  init_itable=n		The lazy itable init code will wait n times the
  			number of milliseconds it took to zero out the
  			previous block group's inode table.  This
  			minimizes the impact on the systme performance
  			while file system's inode table is being initialized.
6f9524e9e   Lukas Czerner   ext4: update ext4...
322
  discard			Controls whether ext4 should issue discard/TRIM
5328e6353   Eric Sandeen   ext4: make trim/d...
323
324
325
326
  nodiscard(*)		commands to the underlying block device when
  			blocks are freed.  This is useful for SSD devices
  			and sparse/thinly-provisioned LUNs, but it is off
  			by default until sufficient testing has been done.
6f9524e9e   Lukas Czerner   ext4: update ext4...
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
  nouid32			Disables 32-bit UIDs and GIDs.  This is for
  			interoperability  with  older kernels which only
  			store and expect 16-bit values.
  
  resize			Allows to resize filesystem to the end of the last
  			existing block group, further resize has to be done
  			with resize2fs either online, or offline. It can be
  			used only with conjunction with remount.
  
  block_validity		This options allows to enables/disables the in-kernel
  noblock_validity	facility for tracking filesystem metadata blocks
  			within internal data structures. This allows multi-
  			block allocator and other routines to quickly locate
  			extents which might overlap with filesystem metadata
  			blocks. This option is intended for debugging
  			purposes and since it negatively affects the
  			performance, it is off by default.
  
  dioread_lock		Controls whether or not ext4 should use the DIO read
  dioread_nolock		locking. If the dioread_nolock option is specified
  			ext4 will allocate uninitialized extent before buffer
  			write and convert the extent to initialized after IO
  			completes. This approach allows ext4 code to avoid
  			using inode mutex, which improves scalability on high
ad4340177   Lukas Czerner   ext3/ext4 Documen...
351
  			speed storages. However this does not work with
6f9524e9e   Lukas Czerner   ext4: update ext4...
352
353
354
355
356
357
358
359
  			data journaling and dioread_nolock option will be
  			ignored with kernel warning. Note that dioread_nolock
  			code path is only used for extent-based files.
  			Because of the restrictions this options comprises
  			it is off by default (e.g. dioread_lock).
  
  i_version		Enable 64-bit inode version support. This option is
  			off by default.
fc513a333   Dave Kleikamp   [PATCH] Documenta...
360
  Data Mode
93e3270c8   Jose R. Santos   ext4: Documentati...
361
  =========
fc513a333   Dave Kleikamp   [PATCH] Documenta...
362
363
364
365
366
367
368
369
370
371
372
  There are 3 different data modes:
  
  * writeback mode
  In data=writeback mode, ext4 does not journal data at all.  This mode provides
  a similar level of journaling as that of XFS, JFS, and ReiserFS in its default
  mode - metadata journaling.  A crash+recovery can cause incorrect data to
  appear in files which were written shortly before the crash.  This mode will
  typically provide the best ext4 performance.
  
  * ordered mode
  In data=ordered mode, ext4 only officially journals metadata, but it logically
49f1487b2   Mingming Cao   ext4: Documention...
373
374
375
376
  groups metadata information related to data changes with the data blocks into a
  single unit called a transaction.  When it's time to write the new metadata
  out to disk, the associated data blocks are written first.  In general,
  this mode performs slightly slower than writeback but significantly faster than journal mode.
fc513a333   Dave Kleikamp   [PATCH] Documenta...
377
378
379
380
381
382
383
  
  * journal mode
  data=journal mode provides full data and metadata journaling.  All new data is
  written to the journal first, and then to its final location.
  In the event of a crash, the journal can be replayed, bringing both data and
  metadata into a consistent state.  This mode is the slowest except when data
  needs to be read from and written to disk at the same time where it
56889787c   Theodore Ts'o   ext4: improve han...
384
385
  outperforms all others modes.  Enabling this mode will disable delayed
  allocation and O_DIRECT support.
fc513a333   Dave Kleikamp   [PATCH] Documenta...
386

6f9524e9e   Lukas Czerner   ext4: update ext4...
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
  /proc entries
  =============
  
  Information about mounted ext4 file systems can be found in
  /proc/fs/ext4.  Each mounted filesystem will have a directory in
  /proc/fs/ext4 based on its device name (i.e., /proc/fs/ext4/hdc or
  /proc/fs/ext4/dm-0).   The files in each per-device directory are shown
  in table below.
  
  Files in /proc/fs/ext4/<devname>
  ..............................................................................
   File            Content
   mb_groups       details of multiblock allocator buddy cache of free blocks
  ..............................................................................
  
  /sys entries
  ============
  
  Information about mounted ext4 file systems can be found in
  /sys/fs/ext4.  Each mounted filesystem will have a directory in
  /sys/fs/ext4 based on its device name (i.e., /sys/fs/ext4/hdc or
  /sys/fs/ext4/dm-0).   The files in each per-device directory are shown
  in table below.
  
  Files in /sys/fs/ext4/<devname>
  (see also Documentation/ABI/testing/sysfs-fs-ext4)
  ..............................................................................
   File                         Content
  
   delayed_allocation_blocks    This file is read-only and shows the number of
                                blocks that are dirty in the page cache, but
                                which do not have their location in the
                                filesystem allocated yet.
  
   inode_goal                   Tuning parameter which (if non-zero) controls
                                the goal inode used by the inode allocator in
                                preference to all other allocation heuristics.
                                This is intended for debugging use only, and
                                should be 0 on production systems.
  
   inode_readahead_blks         Tuning parameter which controls the maximum
                                number of inode table blocks that ext4's inode
                                table readahead algorithm will pre-read into
                                the buffer cache
  
   lifetime_write_kbytes        This file is read-only and shows the number of
                                kilobytes of data that have been written to this
                                filesystem since it was created.
  
   max_writeback_mb_bump        The maximum number of megabytes the writeback
                                code will try to write out before move on to
                                another inode.
  
   mb_group_prealloc            The multiblock allocator will round up allocation
                                requests to a multiple of this tuning parameter if
                                the stripe size is not set in the ext4 superblock
  
   mb_max_to_scan               The maximum number of extents the multiblock
                                allocator will search to find the best extent
  
   mb_min_to_scan               The minimum number of extents the multiblock
                                allocator will search to find the best extent
  
   mb_order2_req                Tuning parameter which controls the minimum size
                                for requests (as a power of 2) where the buddy
                                cache is used
  
   mb_stats                     Controls whether the multiblock allocator should
                                collect statistics, which are shown during the
                                unmount. 1 means to collect statistics, 0 means
                                not to collect statistics
  
   mb_stream_req                Files which have fewer blocks than this tunable
                                parameter will have their blocks allocated out
                                of a block group specific preallocation pool, so
                                that small files are packed closely together.
                                Each large file will have its blocks allocated
                                out of its own unique preallocation pool.
  
   session_write_kbytes         This file is read-only and shows the number of
                                kilobytes of data that have been written to this
                                filesystem since it was mounted.
  ..............................................................................
  
  Ioctls
  ======
  
  There is some Ext4 specific functionality which can be accessed by applications
  through the system call interfaces. The list of all Ext4 specific ioctls are
  shown in the table below.
  
  Table of Ext4 specific ioctls
  ..............................................................................
   Ioctl			      Description
   EXT4_IOC_GETFLAGS	      Get additional attributes associated with inode.
  			      The ioctl argument is an integer bitfield, with
  			      bit values described in ext4.h. This ioctl is an
  			      alias for FS_IOC_GETFLAGS.
  
   EXT4_IOC_SETFLAGS	      Set additional attributes associated with inode.
  			      The ioctl argument is an integer bitfield, with
  			      bit values described in ext4.h. This ioctl is an
  			      alias for FS_IOC_SETFLAGS.
  
   EXT4_IOC_GETVERSION
   EXT4_IOC_GETVERSION_OLD
  			      Get the inode i_generation number stored for
  			      each inode. The i_generation number is normally
  			      changed only when new inode is created and it is
  			      particularly useful for network filesystems. The
  			      '_OLD' version of this ioctl is an alias for
  			      FS_IOC_GETVERSION.
  
   EXT4_IOC_SETVERSION
   EXT4_IOC_SETVERSION_OLD
  			      Set the inode i_generation number stored for
  			      each inode. The '_OLD' version of this ioctl
  			      is an alias for FS_IOC_SETVERSION.
  
   EXT4_IOC_GROUP_EXTEND	      This ioctl has the same purpose as the resize
  			      mount option. It allows to resize filesystem
  			      to the end of the last existing block group,
  			      further resize has to be done with resize2fs,
  			      either online, or offline. The argument points
  			      to the unsigned logn number representing the
  			      filesystem new block count.
  
   EXT4_IOC_MOVE_EXT	      Move the block extents from orig_fd (the one
  			      this ioctl is pointing to) to the donor_fd (the
  			      one specified in move_extent structure passed
  			      as an argument to this ioctl). Then, exchange
  			      inode metadata between orig_fd and donor_fd.
  			      This is especially useful for online
  			      defragmentation, because the allocator has the
  			      opportunity to allocate moved blocks better,
  			      ideally into one contiguous extent.
  
   EXT4_IOC_GROUP_ADD	      Add a new group descriptor to an existing or
  			      new group descriptor block. The new group
  			      descriptor is described by ext4_new_group_input
  			      structure, which is passed as an argument to
  			      this ioctl. This is especially useful in
  			      conjunction with EXT4_IOC_GROUP_EXTEND,
  			      which allows online resize of the filesystem
  			      to the end of the last existing block group.
  			      Those two ioctls combined is used in userspace
  			      online resize tool (e.g. resize2fs).
  
   EXT4_IOC_MIGRATE	      This ioctl operates on the filesystem itself.
  			      It converts (migrates) ext3 indirect block mapped
  			      inode to ext4 extent mapped inode by walking
  			      through indirect block mapping of the original
  			      inode and converting contiguous block ranges
  			      into ext4 extents of the temporary inode. Then,
  			      inodes are swapped. This ioctl might help, when
  			      migrating from ext3 to ext4 filesystem, however
  			      suggestion is to create fresh ext4 filesystem
  			      and copy data from the backup. Note, that
  			      filesystem has to support extents for this ioctl
  			      to work.
  
   EXT4_IOC_ALLOC_DA_BLKS	      Force all of the delay allocated blocks to be
  			      allocated to preserve application-expected ext3
  			      behaviour. Note that this will also start
  			      triggering a write of the data blocks, but this
  			      behaviour may change in the future as it is
  			      not necessary and has been done this way only
  			      for sake of simplicity.
19c5246d2   Yongqiang Yang   ext4: add new onl...
555
556
557
558
559
560
  
   EXT4_IOC_RESIZE_FS	      Resize the filesystem to a new size.  The number
  			      of blocks of resized filesystem is passed in via
  			      64 bit integer argument.  The kernel allocates
  			      bitmaps and inode table, the userspace tool thus
  			      just passes the new number of blocks.
6f9524e9e   Lukas Czerner   ext4: update ext4...
561
  ..............................................................................
fc513a333   Dave Kleikamp   [PATCH] Documenta...
562
563
564
565
566
567
568
  References
  ==========
  
  kernel source:	<file:fs/ext4/>
  		<file:fs/jbd2/>
  
  programs:	http://e2fsprogs.sourceforge.net/
fc513a333   Dave Kleikamp   [PATCH] Documenta...
569
570
571
  
  useful links:	http://fedoraproject.org/wiki/ext3-devel
  		http://www.bullopensource.org/ext4/
93e3270c8   Jose R. Santos   ext4: Documentati...
572
573
  		http://ext4.wiki.kernel.org/index.php/Main_Page
  		http://fedoraproject.org/wiki/Features/Ext4