Commit 0c36b390a546055b6815d4b93a2c9fed4d980ffb

Authored by Heiko Carstens
Committed by Tejun Heo
1 parent 5a838c3b60

percpu-refcount: fix usage of this_cpu_ops

The percpu-refcount infrastructure uses the underscore variants of
the this_cpu_ops (e.g. __this_cpu_inc()) to modify percpu reference
counters.

However, the underscore variants do not update the percpu variable
atomically; they may be implemented as a read-modify-write sequence
spanning more than one instruction.  It is therefore only safe to use
them if the counter is always modified from the same context (process,
softirq, or hardirq).  Otherwise updates can be lost.
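
To illustrate, here is a conceptual expansion of __this_cpu_inc() on
an architecture that has no single-instruction percpu increment (a
sketch, not the code of any particular arch):

	unsigned tmp;

	tmp = __this_cpu_read(*pcpu_count);	/* load:   tmp == 5           */
						/* softirq fires here and     */
						/* does its own inc: count 6  */
	tmp += 1;				/* modify: tmp == 6           */
	__this_cpu_write(*pcpu_count, tmp);	/* store:  count == 6; the    */
						/* softirq's increment is lost */

When the interrupted context resumes and stores its stale value, the
increment done by the softirq is silently overwritten.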

Sebastian observed this problem in the aio subsystem, which uses
percpu refcounters from both process and softirq context: reference
counts never dropped to zero even though the number of "get" and
"put" calls matched.

Fix this by using the non-underscore this_cpu_ops variants, which
provide correct per-cpu atomic semantics and fix the corrupted
reference counts.
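
For reference, the generic fallback for the non-underscore ops gets
this right by disabling interrupts around the read-modify-write,
roughly along these lines (a sketch of the pattern behind the generic
this_cpu_* implementation; the exact macro plumbing differs, and
e.g. x86 instead uses a single irq-safe instruction):

	unsigned long flags;

	raw_local_irq_save(flags);	/* no irq/softirq can interleave */
	__this_cpu_inc(*pcpu_count);	/* the RMW is now safe           */
	raw_local_irq_restore(flags);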

Cc: Kent Overstreet <kmo@daterainc.com>
Cc: <stable@vger.kernel.org> # v3.11+
Reported-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
References: http://lkml.kernel.org/g/alpine.LFD.2.11.1406041540520.21183@denkbrett

Showing 1 changed file with 3 additions and 3 deletions

include/linux/percpu-refcount.h
 /*
  * Percpu refcounts:
  * (C) 2012 Google, Inc.
  * Author: Kent Overstreet <koverstreet@google.com>
  *
  * This implements a refcount with similar semantics to atomic_t - atomic_inc(),
  * atomic_dec_and_test() - but percpu.
  *
  * There's one important difference between percpu refs and normal atomic_t
  * refcounts; you have to keep track of your initial refcount, and then when you
  * start shutting down you call percpu_ref_kill() _before_ dropping the initial
  * refcount.
  *
  * The refcount will have a range of 0 to ((1U << 31) - 1), i.e. one bit less
  * than an atomic_t - this is because of the way shutdown works, see
  * percpu_ref_kill()/PCPU_COUNT_BIAS.
  *
  * Before you call percpu_ref_kill(), percpu_ref_put() does not check for the
  * refcount hitting 0 - it can't, if it was in percpu mode. percpu_ref_kill()
  * puts the ref back in single atomic_t mode, collecting the per cpu refs and
  * issuing the appropriate barriers, and then marks the ref as shutting down so
  * that percpu_ref_put() will check for the ref hitting 0. After it returns,
  * it's safe to drop the initial ref.
  *
  * USAGE:
  *
  * See fs/aio.c for some example usage; it's used there for struct kioctx, which
  * is created when userspaces calls io_setup(), and destroyed when userspace
  * calls io_destroy() or the process exits.
  *
  * In the aio code, kill_ioctx() is called when we wish to destroy a kioctx; it
  * calls percpu_ref_kill(), then hlist_del_rcu() and sychronize_rcu() to remove
  * the kioctx from the proccess's list of kioctxs - after that, there can't be
  * any new users of the kioctx (from lookup_ioctx()) and it's then safe to drop
  * the initial ref with percpu_ref_put().
  *
  * Code that does a two stage shutdown like this often needs some kind of
  * explicit synchronization to ensure the initial refcount can only be dropped
  * once - percpu_ref_kill() does this for you, it returns true once and false if
  * someone else already called it. The aio code uses it this way, but it's not
  * necessary if the code has some other mechanism to synchronize teardown.
  * around.
  */
 
 #ifndef _LINUX_PERCPU_REFCOUNT_H
 #define _LINUX_PERCPU_REFCOUNT_H
 
 #include <linux/atomic.h>
 #include <linux/kernel.h>
 #include <linux/percpu.h>
 #include <linux/rcupdate.h>
 
 struct percpu_ref;
 typedef void (percpu_ref_func_t)(struct percpu_ref *);
 
 struct percpu_ref {
 	atomic_t		count;
 	/*
 	 * The low bit of the pointer indicates whether the ref is in percpu
 	 * mode; if set, then get/put will manipulate the atomic_t (this is a
 	 * hack because we need to keep the pointer around for
 	 * percpu_ref_kill_rcu())
 	 */
 	unsigned __percpu	*pcpu_count;
 	percpu_ref_func_t	*release;
 	percpu_ref_func_t	*confirm_kill;
 	struct rcu_head		rcu;
 };
 
 int __must_check percpu_ref_init(struct percpu_ref *ref,
 				 percpu_ref_func_t *release);
 void percpu_ref_cancel_init(struct percpu_ref *ref);
 void percpu_ref_kill_and_confirm(struct percpu_ref *ref,
 				 percpu_ref_func_t *confirm_kill);
 
 /**
  * percpu_ref_kill - drop the initial ref
  * @ref: percpu_ref to kill
  *
  * Must be used to drop the initial ref on a percpu refcount; must be called
  * precisely once before shutdown.
  *
  * Puts @ref in non percpu mode, then does a call_rcu() before gathering up the
  * percpu counters and dropping the initial ref.
  */
 static inline void percpu_ref_kill(struct percpu_ref *ref)
 {
 	return percpu_ref_kill_and_confirm(ref, NULL);
 }
 
 #define PCPU_STATUS_BITS	2
 #define PCPU_STATUS_MASK	((1 << PCPU_STATUS_BITS) - 1)
 #define PCPU_REF_PTR		0
 #define PCPU_REF_DEAD		1
 
 #define REF_STATUS(count)	(((unsigned long) count) & PCPU_STATUS_MASK)
 
 /**
  * percpu_ref_get - increment a percpu refcount
  * @ref: percpu_ref to get
  *
  * Analagous to atomic_inc().
  */
 static inline void percpu_ref_get(struct percpu_ref *ref)
 {
 	unsigned __percpu *pcpu_count;
 
 	rcu_read_lock_sched();
 
 	pcpu_count = ACCESS_ONCE(ref->pcpu_count);
 
 	if (likely(REF_STATUS(pcpu_count) == PCPU_REF_PTR))
-		__this_cpu_inc(*pcpu_count);
+		this_cpu_inc(*pcpu_count);
 	else
 		atomic_inc(&ref->count);
 
 	rcu_read_unlock_sched();
 }
 
 /**
  * percpu_ref_tryget - try to increment a percpu refcount
  * @ref: percpu_ref to try-get
  *
  * Increment a percpu refcount unless it has already been killed. Returns
  * %true on success; %false on failure.
  *
  * Completion of percpu_ref_kill() in itself doesn't guarantee that tryget
  * will fail. For such guarantee, percpu_ref_kill_and_confirm() should be
  * used. After the confirm_kill callback is invoked, it's guaranteed that
  * no new reference will be given out by percpu_ref_tryget().
  */
 static inline bool percpu_ref_tryget(struct percpu_ref *ref)
 {
 	unsigned __percpu *pcpu_count;
 	int ret = false;
 
 	rcu_read_lock_sched();
 
 	pcpu_count = ACCESS_ONCE(ref->pcpu_count);
 
 	if (likely(REF_STATUS(pcpu_count) == PCPU_REF_PTR)) {
-		__this_cpu_inc(*pcpu_count);
+		this_cpu_inc(*pcpu_count);
 		ret = true;
 	}
 
 	rcu_read_unlock_sched();
 
 	return ret;
 }
 
 /**
  * percpu_ref_put - decrement a percpu refcount
  * @ref: percpu_ref to put
  *
  * Decrement the refcount, and if 0, call the release function (which was passed
  * to percpu_ref_init())
  */
 static inline void percpu_ref_put(struct percpu_ref *ref)
 {
 	unsigned __percpu *pcpu_count;
 
 	rcu_read_lock_sched();
 
 	pcpu_count = ACCESS_ONCE(ref->pcpu_count);
 
 	if (likely(REF_STATUS(pcpu_count) == PCPU_REF_PTR))
-		__this_cpu_dec(*pcpu_count);
+		this_cpu_dec(*pcpu_count);
 	else if (unlikely(atomic_dec_and_test(&ref->count)))
 		ref->release(ref);
 
 	rcu_read_unlock_sched();
 }
 
 #endif
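
For context, the two-stage shutdown that the header comment describes
looks roughly like this in a user of the API (a hypothetical sketch
modeled on the fs/aio.c usage the comment cites; struct foo and the
foo_* functions are illustrative, not from the kernel tree):

	#include <linux/percpu-refcount.h>
	#include <linux/slab.h>

	struct foo {
		struct percpu_ref	ref;
		/* ... */
	};

	/* called once the refcount has dropped to zero */
	static void foo_release(struct percpu_ref *ref)
	{
		struct foo *foo = container_of(ref, struct foo, ref);

		kfree(foo);
	}

	static struct foo *foo_create(void)
	{
		struct foo *foo = kzalloc(sizeof(*foo), GFP_KERNEL);

		if (!foo)
			return NULL;
		/* percpu_ref_init() takes the initial ref */
		if (percpu_ref_init(&foo->ref, foo_release)) {
			kfree(foo);
			return NULL;
		}
		return foo;
	}

	static void foo_destroy(struct foo *foo)
	{
		/* stage 1: switch to atomic mode, no more percpu-mode gets */
		percpu_ref_kill(&foo->ref);
		/* ... unpublish foo so lookups can't hand out new refs ... */
		/* stage 2: drop the initial ref taken by percpu_ref_init() */
		percpu_ref_put(&foo->ref);
	}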