Documentation/filesystems/files.txt 4.14 KB
282254189   Dipankar Sarma   [PATCH] files: fi...
  File management in the Linux kernel
  -----------------------------------
  
  This document describes how locking for files (struct file)
  and the file descriptor table (struct files_struct) works.
  
  Up until 2.6.12, the file descriptor table was protected
  with a lock (files->file_lock) and a reference count (files->count).
  ->file_lock protected accesses to all the file-related fields
  of the table. ->count was used for sharing the file descriptor
  table between tasks cloned with the CLONE_FILES flag. Typically
  this would be the case for POSIX threads. As with the common
  refcounting model in the kernel, the last task doing
  a put_files_struct() frees the file descriptor (fd) table.
  The files (struct file) themselves are protected using
  a reference count (->f_count).
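  
  The "last put frees the table" model described above can be sketched
  in userspace with C11 atomics. This is only an illustration, not the
  kernel code: struct fd_table, table_alloc(), table_get() and
  table_put() are hypothetical names standing in for files_struct and
  its get/put helpers.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for files_struct; only the share count is modelled. */
struct fd_table {
	atomic_int count;	/* number of tasks sharing this table */
};

/* Allocate a table owned by a single task. Returns NULL on failure. */
static struct fd_table *table_alloc(void)
{
	struct fd_table *t = malloc(sizeof(*t));

	if (t)
		atomic_init(&t->count, 1);
	return t;
}

/* A CLONE_FILES-style share: the new task takes one more reference. */
static void table_get(struct fd_table *t)
{
	atomic_fetch_add(&t->count, 1);
}

/*
 * Drop one reference. The caller that drops the last reference frees
 * the table. Returns true if the table was freed by this call.
 */
static bool table_put(struct fd_table *t)
{
	int old = atomic_fetch_sub(&t->count, 1);

	assert(old > 0);	/* a put without a matching get is a bug */
	if (old == 1) {
		free(t);
		return true;
	}
	return false;
}
```

  With two sharers, the first table_put() returns false and only the
  second one frees the table.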
  
  In the new lock-free model of file descriptor management,
  the reference counting is similar, but the locking is
  based on RCU. The file descriptor table contains multiple
  elements - the fd sets (open_fds and close_on_exec), the
  array of file pointers, the sizes of the sets and the array,
  etc. In order for the updates to appear atomic to
  a lock-free reader, all the elements of the file descriptor
  table are in a separate structure - struct fdtable.
  files_struct contains a pointer to struct fdtable through
  which the actual fd table is accessed. Initially the
  fdtable is embedded in files_struct itself. On a subsequent
  expansion of fdtable, a new fdtable structure is allocated
  and files->fdt points to the new structure. The fdtable
  structure is freed with RCU and lock-free readers either
  see the old fdtable or the new fdtable, making the update
  appear atomic. Here are the locking rules for
  the fdtable structure -
  
  1. All references to the fdtable must be done through
     the files_fdtable() macro :
  
  	struct fdtable *fdt;
  
  	rcu_read_lock();
  
  	fdt = files_fdtable(files);
  	....
  	if (n < fdt->max_fds)
  		....
  	...
  	rcu_read_unlock();
  
     files_fdtable() uses the rcu_dereference() macro, which takes care of
     the memory barrier requirements for lock-free dereference.
     The fdtable pointer must be read within the read-side
     critical section.
  
  2. Reading of the fdtable as described above must be protected
     by rcu_read_lock()/rcu_read_unlock().
  3. For any update to the fd table, files->file_lock must
     be held.
  
  4. To look up the file structure given an fd, a reader
     must use either the fcheck() or the fcheck_files() API. These
     take care of barrier requirements due to lock-free lookup.
     An example :
  
  	struct file *file;
  
  	rcu_read_lock();
  	file = fcheck(fd);
  	if (file) {
  		...
  	}
  	....
  	rcu_read_unlock();
  
  5. Handling of the file structures is special. Since the look-up
     of the fd (fget()/fget_light()) is lock-free, it is possible
     that the look-up may race with the last put() operation on the
     file structure. This is avoided using atomic_long_inc_not_zero()
     on ->f_count :
  
  	rcu_read_lock();
  	file = fcheck_files(files, fd);
  	if (file) {
  		if (atomic_long_inc_not_zero(&file->f_count))
  			*fput_needed = 1;
  		else
  			/* Didn't get the reference, someone's freed */
  			file = NULL;
  	}
  	rcu_read_unlock();
  	....
  	return file;
  
     atomic_long_inc_not_zero() detects if the refcount is already zero or
     goes to zero during increment. If it does, we fail
     fget()/fget_light().
  
  6. Since both fdtable and file structures can be looked up
     lock-free, they must be installed using the rcu_assign_pointer()
     API. If they are looked up lock-free, rcu_dereference()
     must be used. However it is advisable to use files_fdtable()
     and fcheck()/fcheck_files() which take care of these issues.
  
  7. While updating, the fdtable pointer must be looked up while
     holding files->file_lock. If ->file_lock is dropped, then
     another thread may expand the files, thereby creating a new
     fdtable and making the earlier fdtable pointer stale.
     For example :
  
  	spin_lock(&files->file_lock);
  	fd = locate_fd(files, file, start);
  	if (fd >= 0) {
  		/* locate_fd() may have expanded fdtable, load the ptr */
  		fdt = files_fdtable(files);
  		FD_SET(fd, fdt->open_fds);
  		FD_CLR(fd, fdt->close_on_exec);
  		spin_unlock(&files->file_lock);
  	.....
  
     Since locate_fd() can drop ->file_lock (and reacquire ->file_lock),
     the fdtable pointer (fdt) must be loaded after locate_fd().
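  
  The "increment unless zero" check that rule 5 relies on can be
  sketched in userspace with a C11 compare-and-swap loop. This is an
  illustration of the idea only, not the kernel's implementation of
  atomic_long_inc_not_zero(): struct obj, ref_inc_not_zero() and
  ref_put() are hypothetical names, and the RCU protection that keeps
  the object's memory valid during the look-up is not modelled here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct file; only the refcount is modelled. */
struct obj {
	atomic_long f_count;
};

/*
 * Take a reference unless the count has already dropped to zero.
 * Returns true if a reference was taken, false if the object is
 * already on its way to being freed.
 */
static bool ref_inc_not_zero(struct obj *o)
{
	long old = atomic_load(&o->f_count);

	while (old != 0) {
		/* On failure, old is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&o->f_count, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference. Returns true if this was the last one. */
static bool ref_put(struct obj *o)
{
	long old = atomic_fetch_sub(&o->f_count, 1);

	assert(old > 0);	/* a put without a matching get is a bug */
	return old == 1;
}
```

  A plain increment could not be used here: a reader racing with the
  last put would raise the count from zero and hand back a file that
  is about to be freed. With the loop above, once the count reaches
  zero every later ref_inc_not_zero() fails, which corresponds to the
  look-up returning NULL.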