.. SPDX-License-Identifier: GPL-2.0

===================================
File management in the Linux kernel
===================================


This document describes how locking works for files (struct file)
and for the file descriptor table (struct files_struct).

Up until 2.6.12, the file descriptor table was protected
by a lock (files->file_lock) and a reference count (files->count).
->file_lock protected accesses to all the file-related fields
of the table. ->count was used for sharing the file descriptor
table between tasks cloned with the CLONE_FILES flag. Typically
this would be the case for POSIX threads. As with the common
refcounting model in the kernel, the last task doing
a put_files_struct() frees the file descriptor (fd) table.
The files (struct file) themselves are protected by a
reference count (->f_count).

In the new lock-free model of file descriptor management,
the reference counting is similar, but the locking is
based on RCU. The file descriptor table contains multiple
elements: the fd sets (open_fds and close_on_exec), the
array of file pointers, the sizes of the sets and the array,
etc. In order for updates to appear atomic to
a lock-free reader, all the elements of the file descriptor
table are kept in a separate structure, struct fdtable.
files_struct contains a pointer to struct fdtable through
which the actual fd table is accessed. Initially the
fdtable is embedded in files_struct itself. On a subsequent
expansion of the fdtable, a new fdtable structure is allocated
and files->fdt points to the new structure. The old fdtable
structure is freed with RCU, and lock-free readers see either
the old fdtable or the new fdtable, making the update
appear atomic. Here are the locking rules for
the fdtable structure:

1. All references to the fdtable must be done through
   the files_fdtable() macro::

	struct fdtable *fdt;

	rcu_read_lock();

	fdt = files_fdtable(files);
	....
	if (n <= fdt->max_fds)
		....
	...
	rcu_read_unlock();

   files_fdtable() uses the rcu_dereference() macro, which takes care of
   the memory barrier requirements for lock-free dereference.
   The fdtable pointer must be read within the read-side
   critical section.

2. Reading of the fdtable as described above must be protected
   by rcu_read_lock()/rcu_read_unlock().

3. For any update to the fd table, files->file_lock must
   be held.

4. To look up the file structure given an fd, a reader
   must use either fcheck() or fcheck_files() APIs. These
   take care of barrier requirements due to lock-free lookup.

   An example::


	struct file *file;

	rcu_read_lock();
	file = fcheck(fd);
	if (file) {
		...
	}
	....
	rcu_read_unlock();

5. Handling of the file structures is special. Since the look-up
   of the fd (fget()/fget_light()) is lock-free, it is possible
   that the look-up may race with the last put() operation on the
   file structure. This is avoided using atomic_long_inc_not_zero()
   on ->f_count::


	rcu_read_lock();
	file = fcheck_files(files, fd);
	if (file) {
		if (atomic_long_inc_not_zero(&file->f_count))
			*fput_needed = 1;
		else
			/* Didn't get the reference, someone's freed */
			file = NULL;
	}
	rcu_read_unlock();
	....
	return file;

   atomic_long_inc_not_zero() detects if the refcount is already zero or
   goes to zero during the increment. If it does, we fail
   fget()/fget_light().

6. Since both fdtable and file structures can be looked up
   lock-free, they must be installed using the rcu_assign_pointer()
   API. If they are looked up lock-free, rcu_dereference()
   must be used. However, it is advisable to use files_fdtable()
   and fcheck()/fcheck_files(), which take care of these issues.

7. While updating, the fdtable pointer must be looked up while
   holding files->file_lock. If ->file_lock is dropped, then
   another thread may expand the fd table, thereby creating a new
   fdtable and making the earlier fdtable pointer stale.

   For example::


	spin_lock(&files->file_lock);
	fd = locate_fd(files, file, start);
	if (fd >= 0) {
		/* locate_fd() may have expanded fdtable, load the ptr */
		fdt = files_fdtable(files);
		__set_open_fd(fd, fdt);
		__clear_close_on_exec(fd, fdt);
		spin_unlock(&files->file_lock);
	.....

   Since locate_fd() can drop ->file_lock (and reacquire ->file_lock),
   the fdtable pointer (fdt) must be loaded after locate_fd().
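
The interaction of rules 1, 5 and 6 can be tried out in userspace with
a rough model built on C11 atomics. This is only a sketch, not kernel
code: every demo_* name is hypothetical, a release store stands in for
rcu_assign_pointer(), an acquire load stands in for rcu_dereference(),
and a compare-and-swap loop stands in for atomic_long_inc_not_zero().
Nothing is ever freed here, so the RCU grace period that protects
readers in the kernel is assumed rather than modelled.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; these are NOT the kernel structures. */
struct demo_file {
	atomic_long f_count;		/* 0 means the last reference is gone */
};

struct demo_fdtable {
	unsigned int max_fds;		/* number of slots in fd[] */
	struct demo_file **fd;		/* array of file pointers */
};

struct demo_files {
	_Atomic(struct demo_fdtable *) fdt;	/* currently published table */
};

/* Increment *v unless it is zero; returns true if a reference was taken. */
bool demo_inc_not_zero(atomic_long *v)
{
	long old = atomic_load_explicit(v, memory_order_relaxed);

	while (old != 0) {
		/* On failure, 'old' is refreshed with the current value. */
		if (atomic_compare_exchange_weak_explicit(v, &old, old + 1,
							  memory_order_acquire,
							  memory_order_relaxed))
			return true;
	}
	return false;
}

/* Writer side: publish a fully initialised table (stand-in for rule 6). */
void demo_publish(struct demo_files *files, struct demo_fdtable *new_fdt)
{
	assert(new_fdt != NULL);
	atomic_store_explicit(&files->fdt, new_fdt, memory_order_release);
}

/* Reader side: look up an fd and try to take a reference (rules 1 and 5). */
struct demo_file *demo_fget(struct demo_files *files, unsigned int fd)
{
	/* Read the table pointer once and use only that snapshot. */
	struct demo_fdtable *fdt =
		atomic_load_explicit(&files->fdt, memory_order_acquire);
	struct demo_file *file;

	if (fd >= fdt->max_fds)
		return NULL;

	file = fdt->fd[fd];
	if (file && !demo_inc_not_zero(&file->f_count))
		file = NULL;	/* lost the race with the final put */
	return file;
}
```

A caller would fill in a struct demo_fdtable, hand it to demo_publish(),
and then call demo_fget() from reader threads. A slot whose count has
already dropped to zero is reported as NULL, which mirrors the failure
path of fget()/fget_light() described in rule 5.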