  .. SPDX-License-Identifier: GPL-2.0
  
  ===================================
  File management in the Linux kernel
  ===================================
  
  This document describes how locking works for files (struct file)
  and the file descriptor table (struct files_struct).
  
  Up until 2.6.12, the file descriptor table was protected
  with a lock (files->file_lock) and a reference count (files->count).
  ->file_lock protected accesses to all the file-related fields
  of the table. ->count was used for sharing the file descriptor
  table between tasks cloned with the CLONE_FILES flag. Typically
  this would be the case for POSIX threads. As with the common
  refcounting model in the kernel, the last task doing
  a put_files_struct() frees the file descriptor (fd) table.
  The files (struct file) themselves are protected using
  a reference count (->f_count).
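
  The sharing scheme described above can be illustrated with a small
  userspace sketch. It is not kernel code: struct demo_files and the
  demo_files_*() helpers are hypothetical names invented for this
  illustration, and C11 atomics stand in for the kernel's atomic_t
  operations. It only shows the "last put frees the table" pattern.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the shared fd table (not the kernel struct). */
struct demo_files {
	atomic_int count;	/* number of tasks sharing this table */
	int nr_open;		/* placeholder payload */
};

/* Allocate a table owned by a single task (count starts at 1). */
static struct demo_files *demo_files_alloc(void)
{
	struct demo_files *files = calloc(1, sizeof(*files));

	if (files)
		atomic_init(&files->count, 1);
	return files;
}

/* A CLONE_FILES-style clone shares the table by taking a reference. */
static struct demo_files *demo_files_get(struct demo_files *files)
{
	atomic_fetch_add(&files->count, 1);
	return files;
}

/* Drop a reference; returns true if this call freed the table. */
static bool demo_files_put(struct demo_files *files)
{
	/* atomic_fetch_sub() returns the value before the decrement. */
	if (atomic_fetch_sub(&files->count, 1) == 1) {
		free(files);
		return true;
	}
	return false;
}
```

  In this sketch, a table shared by two tasks survives the first
  demo_files_put() and is freed by the second one.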
  
  In the new lock-free model of file descriptor management,
  the reference counting is similar, but the locking is
  based on RCU. The file descriptor table contains multiple
  elements: the fd sets (open_fds and close_on_exec), the
  array of file pointers, the sizes of the sets and the array,
  etc. In order for the updates to appear atomic to
  a lock-free reader, all the elements of the file descriptor
  table are in a separate structure, struct fdtable.
  files_struct contains a pointer to struct fdtable through
  which the actual fd table is accessed. Initially the
  fdtable is embedded in files_struct itself. On a subsequent
  expansion of fdtable, a new fdtable structure is allocated
  and files->fdt points to the new structure. The old fdtable
  structure is freed with RCU, and lock-free readers see either
  the old fdtable or the new fdtable, making the update
  appear atomic. Here are the locking rules for
  the fdtable structure:
  
  1. All references to the fdtable must be done through
     the files_fdtable() macro::
  
  	struct fdtable *fdt;
  
  	rcu_read_lock();
  
  	fdt = files_fdtable(files);
  	....
  	if (n < fdt->max_fds)	/* valid indices are 0 .. max_fds - 1 */
  		....
  	...
  	rcu_read_unlock();
  
     files_fdtable() uses the rcu_dereference() macro, which takes care of
     the memory barrier requirements for lock-free dereference.
     The fdtable pointer must be read within the read-side
     critical section.
  
  2. Reading of the fdtable as described above must be protected
     by rcu_read_lock()/rcu_read_unlock().
  3. For any update to the fd table, files->file_lock must
     be held.
  
  4. To look up the file structure given an fd, a reader
     must use either the fcheck() or the fcheck_files() API. These
     take care of the barrier requirements of lock-free lookup.
  
     An example::
  
  	struct file *file;
  
  	rcu_read_lock();
  	file = fcheck(fd);
  	if (file) {
  		...
  	}
  	....
  	rcu_read_unlock();
  
  5. Handling of the file structures is special. Since the look-up
     of the fd (fget()/fget_light()) is lock-free, it is possible
     that the look-up may race with the last put() operation on the
     file structure. This is avoided using atomic_long_inc_not_zero()
     on ->f_count::
  
  	rcu_read_lock();
  	file = fcheck_files(files, fd);
  	if (file) {
  		if (atomic_long_inc_not_zero(&file->f_count))
  			*fput_needed = 1;
  		else
  			/* Didn't get the reference, someone's freed */
  			file = NULL;
  	}
  	rcu_read_unlock();
  	....
  	return file;
  
     atomic_long_inc_not_zero() detects whether the refcount is already zero or
     goes to zero during increment. If it does, we fail
     fget()/fget_light().
  
  6. Since both fdtable and file structures can be looked up
     lock-free, they must be installed using the rcu_assign_pointer()
     API. If they are looked up lock-free, rcu_dereference()
     must be used. However, it is advisable to use files_fdtable()
     and fcheck()/fcheck_files(), which take care of these issues.
  
  7. While updating, the fdtable pointer must be looked up while
     holding files->file_lock. If ->file_lock is dropped, then
     another thread may expand the files, thereby creating a new
     fdtable and making the earlier fdtable pointer stale.
  
     For example::
  
  	spin_lock(&files->file_lock);
  	fd = locate_fd(files, file, start);
  	if (fd >= 0) {
  		/* locate_fd() may have expanded fdtable, load the ptr */
  		fdt = files_fdtable(files);
  		__set_open_fd(fd, fdt);
  		__clear_close_on_exec(fd, fdt);
  		spin_unlock(&files->file_lock);
  	.....
  
     Since locate_fd() can drop ->file_lock (and reacquire ->file_lock),
     the fdtable pointer (fdt) must be loaded after locate_fd().
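
  Rules 5 and 6 can be illustrated together with a userspace sketch.
  It is not kernel code and makes several assumptions: the demo_*
  names are hypothetical, C11 atomics stand in for the kernel
  primitives, a release store plays the role of rcu_assign_pointer(),
  an acquire load plays the role of rcu_dereference(), and a
  compare-and-exchange loop models the behaviour this document
  describes for atomic_long_inc_not_zero(). RCU grace periods, which
  keep the looked-up memory valid in the kernel, are not modelled.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_MAX_FDS 4

/* Hypothetical stand-in for struct file. */
struct demo_file {
	atomic_long f_count;
};

/* Hypothetical stand-in for struct fdtable. */
struct demo_fdtable {
	unsigned int max_fds;
	_Atomic(struct demo_file *) fd[DEMO_MAX_FDS];
};

/* Increment *v unless it is zero; returns true if it was incremented. */
static bool demo_inc_not_zero(atomic_long *v)
{
	long old = atomic_load(v);

	while (old != 0) {
		/* On failure, old is reloaded with the current value. */
		if (atomic_compare_exchange_weak(v, &old, old + 1))
			return true;
	}
	return false;
}

/* Writer side: publish a file pointer with release semantics. */
static void demo_fd_install(struct demo_fdtable *fdt, unsigned int fd,
			    struct demo_file *file)
{
	atomic_store_explicit(&fdt->fd[fd], file, memory_order_release);
}

/* Reader side: look up the file and try to take a reference. */
static struct demo_file *demo_fget(struct demo_fdtable *fdt, unsigned int fd)
{
	struct demo_file *file;

	if (fd >= fdt->max_fds)
		return NULL;
	file = atomic_load_explicit(&fdt->fd[fd], memory_order_acquire);
	/* A count of zero means the last reference is already gone. */
	if (file && !demo_inc_not_zero(&file->f_count))
		file = NULL;
	return file;
}
```

  In this sketch, a lookup of a file whose count has already dropped
  to zero returns NULL instead of resurrecting the reference, which
  mirrors the failure case of fget()/fget_light() described in rule 5.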