Commit 6c9c0b52b8c6b68b05bb06efd7079a8fc5e9ba60

Authored by Peter Staubach
Committed by Linus Torvalds
1 parent 439c430e3d

[PATCH] largefile support for accounting

The accounting subsystem in the kernel cannot correctly handle files
larger than 2GB.  The output file containing the process accounting data
can grow very large if the system is large enough and active enough.  Once
the 2GB limit is reached, the system simply stops storing process
accounting data.

Another annoying problem is that once the system reaches this 2GB limit,
then every process which exits will receive a signal, SIGXFSZ.  This signal
is generated because an attempt was made to write beyond the limit for the
file descriptor.  This signal makes it look like every process has exited
due to a signal, when in fact, they have not.

The solution is to add the O_LARGEFILE flag to the list of flags used to
open the accounting file.  The rest of the accounting support is already
largefile safe.
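
As an illustration of why the flag matters, here is a minimal user-space
sketch of the size check applied to writes on a file opened without
O_LARGEFILE.  It is a simplified model, not the kernel's code.  The names
NON_LFS_MAX, write_check and check_write are hypothetical, and the
truncation of a write that straddles the limit is an assumption about the
write path.  Only the 2GB limit and the SIGXFSZ on writes beyond it come
from the description above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Largest file size reachable without O_LARGEFILE: 2^31 - 1 bytes. */
#define NON_LFS_MAX ((int64_t)0x7fffffff)

/* Hypothetical outcome codes for this model. */
enum write_check { WRITE_OK, WRITE_TRUNCATED, WRITE_REFUSED };

/*
 * Model of the size check for a write of *count bytes at offset pos.
 * With largefile set, the 2GB limit does not apply.  Without it, a write
 * starting at or beyond the limit is refused (the case in which the
 * exiting process is sent SIGXFSZ), and a write that straddles the limit
 * is assumed to be shortened so that it ends exactly at the limit.
 */
static enum write_check check_write(int64_t pos, int64_t *count, bool largefile)
{
	if (largefile || pos + *count <= NON_LFS_MAX)
		return WRITE_OK;
	if (pos >= NON_LFS_MAX)
		return WRITE_REFUSED;
	*count = NON_LFS_MAX - pos;
	return WRITE_TRUNCATED;
}
```

In this model an accounting record appended at the limit is refused when
largefile is false and accepted when it is true, which mirrors the
behaviour before and after this change.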

The changes were tested by constructing a large file (just short of 2GB),
enabling accounting, and then running enough commands to cause the
accounting data generated to increase the size of the file to 2GB.  Without
the changes, the file grows to 2GB and the last command run in the test
script appears to exit due to a signal when it has not.  With the changes,
things work as expected and quietly.

Some user-level changes are also required so that user space can deal with
largefiles, but those are being handled separately.

Signed-off-by: Peter Staubach <staubach@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

Showing 1 changed file with 1 addition and 1 deletion

 /*
  * linux/kernel/acct.c
  *
  * BSD Process Accounting for Linux
  *
  * Author: Marco van Wieringen <mvw@planets.elm.net>
  *
  * Some code based on ideas and code from:
  * Thomas K. Dyas <tdyas@eden.rutgers.edu>
  *
  * This file implements BSD-style process accounting. Whenever any
  * process exits, an accounting record of type "struct acct" is
  * written to the file specified with the acct() system call. It is
  * up to user-level programs to do useful things with the accounting
  * log. The kernel just provides the raw accounting information.
  *
  * (C) Copyright 1995 - 1997 Marco van Wieringen - ELM Consultancy B.V.
  *
  * Plugged two leaks. 1) It didn't return acct_file into the free_filps if
  * the file happened to be read-only. 2) If the accounting was suspended
  * due to the lack of space it happily allowed to reopen it and completely
  * lost the old acct_file. 3/10/98, Al Viro.
  *
  * Now we silently close acct_file on attempt to reopen. Cleaned sys_acct().
  * XTerms and EMACS are manifestations of pure evil. 21/10/98, AV.
  *
  * Fixed a nasty interaction with with sys_umount(). If the accointing
  * was suspeneded we failed to stop it on umount(). Messy.
  * Another one: remount to readonly didn't stop accounting.
  * Question: what should we do if we have CAP_SYS_ADMIN but not
  * CAP_SYS_PACCT? Current code does the following: umount returns -EBUSY
  * unless we are messing with the root. In that case we are getting a
  * real mess with do_remount_sb(). 9/11/98, AV.
  *
  * Fixed a bunch of races (and pair of leaks). Probably not the best way,
  * but this one obviously doesn't introduce deadlocks. Later. BTW, found
  * one race (and leak) in BSD implementation.
  * OK, that's better. ANOTHER race and leak in BSD variant. There always
  * is one more bug... 10/11/98, AV.
  *
  * Oh, fsck... Oopsable SMP race in do_process_acct() - we must hold
  * ->mmap_sem to walk the vma list of current->mm. Nasty, since it leaks
  * a struct file opened for write. Fixed. 2/6/2000, AV.
  */

 #include <linux/config.h>
 #include <linux/mm.h>
 #include <linux/slab.h>
 #include <linux/acct.h>
 #include <linux/file.h>
 #include <linux/tty.h>
 #include <linux/security.h>
 #include <linux/vfs.h>
 #include <linux/jiffies.h>
 #include <linux/times.h>
 #include <linux/syscalls.h>
 #include <asm/uaccess.h>
 #include <asm/div64.h>
 #include <linux/blkdev.h> /* sector_div */

 /*
  * These constants control the amount of freespace that suspend and
  * resume the process accounting system, and the time delay between
  * each check.
  * Turned into sysctl-controllable parameters. AV, 12/11/98
  */

 int acct_parm[3] = {4, 2, 30};
 #define RESUME		(acct_parm[0])	/* >foo% free space - resume */
 #define SUSPEND		(acct_parm[1])	/* <foo% free space - suspend */
 #define ACCT_TIMEOUT	(acct_parm[2])	/* foo second timeout between checks */

 /*
  * External references and all of the globals.
  */
 static void do_acct_process(long, struct file *);

 /*
  * This structure is used so that all the data protected by lock
  * can be placed in the same cache line as the lock. This primes
  * the cache line to have the data after getting the lock.
  */
 struct acct_glbs {
 	spinlock_t lock;
 	volatile int active;
 	volatile int needcheck;
 	struct file *file;
 	struct timer_list timer;
 };

 static struct acct_glbs acct_globals __cacheline_aligned = {SPIN_LOCK_UNLOCKED};

 /*
  * Called whenever the timer says to check the free space.
  */
 static void acct_timeout(unsigned long unused)
 {
 	acct_globals.needcheck = 1;
 }

 /*
  * Check the amount of free space and suspend/resume accordingly.
  */
 static int check_free_space(struct file *file)
 {
 	struct kstatfs sbuf;
 	int res;
 	int act;
 	sector_t resume;
 	sector_t suspend;

 	spin_lock(&acct_globals.lock);
 	res = acct_globals.active;
 	if (!file || !acct_globals.needcheck)
 		goto out;
 	spin_unlock(&acct_globals.lock);

 	/* May block */
 	if (vfs_statfs(file->f_dentry->d_inode->i_sb, &sbuf))
 		return res;
 	suspend = sbuf.f_blocks * SUSPEND;
 	resume = sbuf.f_blocks * RESUME;

 	sector_div(suspend, 100);
 	sector_div(resume, 100);

 	if (sbuf.f_bavail <= suspend)
 		act = -1;
 	else if (sbuf.f_bavail >= resume)
 		act = 1;
 	else
 		act = 0;

 	/*
 	 * If some joker switched acct_globals.file under us we'ld better be
 	 * silent and _not_ touch anything.
 	 */
 	spin_lock(&acct_globals.lock);
 	if (file != acct_globals.file) {
 		if (act)
 			res = act>0;
 		goto out;
 	}

 	if (acct_globals.active) {
 		if (act < 0) {
 			acct_globals.active = 0;
 			printk(KERN_INFO "Process accounting paused\n");
 		}
 	} else {
 		if (act > 0) {
 			acct_globals.active = 1;
 			printk(KERN_INFO "Process accounting resumed\n");
 		}
 	}

 	del_timer(&acct_globals.timer);
 	acct_globals.needcheck = 0;
 	acct_globals.timer.expires = jiffies + ACCT_TIMEOUT*HZ;
 	add_timer(&acct_globals.timer);
 	res = acct_globals.active;
 out:
 	spin_unlock(&acct_globals.lock);
 	return res;
 }

 /*
  * Close the old accouting file (if currently open) and then replace
  * it with file (if non-NULL).
  *
  * NOTE: acct_globals.lock MUST be held on entry and exit.
  */
 static void acct_file_reopen(struct file *file)
 {
 	struct file *old_acct = NULL;

 	if (acct_globals.file) {
 		old_acct = acct_globals.file;
 		del_timer(&acct_globals.timer);
 		acct_globals.active = 0;
 		acct_globals.needcheck = 0;
 		acct_globals.file = NULL;
 	}
 	if (file) {
 		acct_globals.file = file;
 		acct_globals.needcheck = 0;
 		acct_globals.active = 1;
 		/* It's been deleted if it was used before so this is safe */
 		init_timer(&acct_globals.timer);
 		acct_globals.timer.function = acct_timeout;
 		acct_globals.timer.expires = jiffies + ACCT_TIMEOUT*HZ;
 		add_timer(&acct_globals.timer);
 	}
 	if (old_acct) {
 		spin_unlock(&acct_globals.lock);
 		do_acct_process(0, old_acct);
 		filp_close(old_acct, NULL);
 		spin_lock(&acct_globals.lock);
 	}
 }

 /*
  * sys_acct() is the only system call needed to implement process
  * accounting. It takes the name of the file where accounting records
  * should be written. If the filename is NULL, accounting will be
  * shutdown.
  */
 asmlinkage long sys_acct(const char __user *name)
 {
 	struct file *file = NULL;
 	char *tmp;
 	int error;

 	if (!capable(CAP_SYS_PACCT))
 		return -EPERM;

 	if (name) {
 		tmp = getname(name);
 		if (IS_ERR(tmp)) {
 			return (PTR_ERR(tmp));
 		}
 		/* Difference from BSD - they don't do O_APPEND */
-		file = filp_open(tmp, O_WRONLY|O_APPEND, 0);
+		file = filp_open(tmp, O_WRONLY|O_APPEND|O_LARGEFILE, 0);
 		putname(tmp);
 		if (IS_ERR(file)) {
 			return (PTR_ERR(file));
 		}
 		if (!S_ISREG(file->f_dentry->d_inode->i_mode)) {
 			filp_close(file, NULL);
 			return (-EACCES);
 		}

 		if (!file->f_op->write) {
 			filp_close(file, NULL);
 			return (-EIO);
 		}
 	}

 	error = security_acct(file);
 	if (error) {
 		if (file)
 			filp_close(file, NULL);
 		return error;
 	}

 	spin_lock(&acct_globals.lock);
 	acct_file_reopen(file);
 	spin_unlock(&acct_globals.lock);

 	return (0);
 }

 /*
  * If the accouting is turned on for a file in the filesystem pointed
  * to by sb, turn accouting off.
  */
 void acct_auto_close(struct super_block *sb)
 {
 	spin_lock(&acct_globals.lock);
 	if (acct_globals.file &&
 	    acct_globals.file->f_dentry->d_inode->i_sb == sb) {
 		acct_file_reopen((struct file *)NULL);
 	}
 	spin_unlock(&acct_globals.lock);
 }

 /*
  * encode an unsigned long into a comp_t
  *
  * This routine has been adopted from the encode_comp_t() function in
  * the kern_acct.c file of the FreeBSD operating system. The encoding
  * is a 13-bit fraction with a 3-bit (base 8) exponent.
  */

 #define MANTSIZE	13			/* 13 bit mantissa. */
 #define EXPSIZE		3			/* Base 8 (3 bit) exponent. */
 #define MAXFRACT	((1 << MANTSIZE) - 1)	/* Maximum fractional value. */

 static comp_t encode_comp_t(unsigned long value)
 {
 	int exp, rnd;

 	exp = rnd = 0;
 	while (value > MAXFRACT) {
 		rnd = value & (1 << (EXPSIZE - 1));	/* Round up? */
 		value >>= EXPSIZE;	/* Base 8 exponent == 3 bit shift. */
 		exp++;
 	}

 	/*
 	 * If we need to round up, do it (and handle overflow correctly).
 	 */
 	if (rnd && (++value > MAXFRACT)) {
 		value >>= EXPSIZE;
 		exp++;
 	}

 	/*
 	 * Clean it up and polish it off.
 	 */
 	exp <<= MANTSIZE;		/* Shift the exponent into place */
 	exp += value;			/* and add on the mantissa. */
 	return exp;
 }

 #if ACCT_VERSION==1 || ACCT_VERSION==2
 /*
  * encode an u64 into a comp2_t (24 bits)
  *
  * Format: 5 bit base 2 exponent, 20 bits mantissa.
  * The leading bit of the mantissa is not stored, but implied for
  * non-zero exponents.
  * Largest encodable value is 50 bits.
  */

 #define MANTSIZE2	20			/* 20 bit mantissa. */
 #define EXPSIZE2	5			/* 5 bit base 2 exponent. */
 #define MAXFRACT2	((1ul << MANTSIZE2) - 1) /* Maximum fractional value. */
 #define MAXEXP2		((1 <<EXPSIZE2) - 1)	/* Maximum exponent. */

 static comp2_t encode_comp2_t(u64 value)
 {
 	int exp, rnd;

 	exp = (value > (MAXFRACT2>>1));
 	rnd = 0;
 	while (value > MAXFRACT2) {
 		rnd = value & 1;
 		value >>= 1;
 		exp++;
 	}

 	/*
 	 * If we need to round up, do it (and handle overflow correctly).
 	 */
 	if (rnd && (++value > MAXFRACT2)) {
 		value >>= 1;
 		exp++;
 	}

 	if (exp > MAXEXP2) {
 		/* Overflow. Return largest representable number instead. */
 		return (1ul << (MANTSIZE2+EXPSIZE2-1)) - 1;
 	} else {
 		return (value & (MAXFRACT2>>1)) | (exp << (MANTSIZE2-1));
 	}
 }
 #endif

 #if ACCT_VERSION==3
 /*
  * encode an u64 into a 32 bit IEEE float
  */
 static u32 encode_float(u64 value)
 {
 	unsigned exp = 190;
 	unsigned u;

 	if (value==0) return 0;
 	while ((s64)value > 0){
 		value <<= 1;
 		exp--;
 	}
 	u = (u32)(value >> 40) & 0x7fffffu;
 	return u | (exp << 23);
 }
 #endif

 /*
  * Write an accounting entry for an exiting process
  *
  * The acct_process() call is the workhorse of the process
  * accounting system. The struct acct is built here and then written
  * into the accounting file. This function should only be called from
  * do_exit().
  */

 /*
  * do_acct_process does all actual work. Caller holds the reference to file.
  */
 static void do_acct_process(long exitcode, struct file *file)
 {
 	acct_t ac;
 	mm_segment_t fs;
 	unsigned long vsize;
 	unsigned long flim;
 	u64 elapsed;
 	u64 run_time;
 	struct timespec uptime;

 	/*
 	 * First check to see if there is enough free_space to continue
 	 * the process accounting system.
 	 */
 	if (!check_free_space(file))
 		return;

 	/*
 	 * Fill the accounting struct with the needed info as recorded
 	 * by the different kernel functions.
 	 */
 	memset((caddr_t)&ac, 0, sizeof(acct_t));

 	ac.ac_version = ACCT_VERSION | ACCT_BYTEORDER;
 	strlcpy(ac.ac_comm, current->comm, sizeof(ac.ac_comm));

 	/* calculate run_time in nsec*/
 	do_posix_clock_monotonic_gettime(&uptime);
 	run_time = (u64)uptime.tv_sec*NSEC_PER_SEC + uptime.tv_nsec;
 	run_time -= (u64)current->start_time.tv_sec*NSEC_PER_SEC
 					+ current->start_time.tv_nsec;
 	/* convert nsec -> AHZ */
 	elapsed = nsec_to_AHZ(run_time);
 #if ACCT_VERSION==3
 	ac.ac_etime = encode_float(elapsed);
 #else
 	ac.ac_etime = encode_comp_t(elapsed < (unsigned long) -1l ?
 				(unsigned long) elapsed : (unsigned long) -1l);
 #endif
 #if ACCT_VERSION==1 || ACCT_VERSION==2
 	{
 		/* new enlarged etime field */
 		comp2_t etime = encode_comp2_t(elapsed);
 		ac.ac_etime_hi = etime >> 16;
 		ac.ac_etime_lo = (u16) etime;
 	}
 #endif
 	do_div(elapsed, AHZ);
 	ac.ac_btime = xtime.tv_sec - elapsed;
 	ac.ac_utime = encode_comp_t(jiffies_to_AHZ(
 					current->signal->utime +
 					current->group_leader->utime));
 	ac.ac_stime = encode_comp_t(jiffies_to_AHZ(
 					current->signal->stime +
 					current->group_leader->stime));
 	/* we really need to bite the bullet and change layout */
 	ac.ac_uid = current->uid;
 	ac.ac_gid = current->gid;
 #if ACCT_VERSION==2
 	ac.ac_ahz = AHZ;
 #endif
 #if ACCT_VERSION==1 || ACCT_VERSION==2
 	/* backward-compatible 16 bit fields */
 	ac.ac_uid16 = current->uid;
 	ac.ac_gid16 = current->gid;
 #endif
 #if ACCT_VERSION==3
 	ac.ac_pid = current->tgid;
 	ac.ac_ppid = current->parent->tgid;
 #endif

 	read_lock(&tasklist_lock);	/* pin current->signal */
 	ac.ac_tty = current->signal->tty ?
 		old_encode_dev(tty_devnum(current->signal->tty)) : 0;
 	read_unlock(&tasklist_lock);

 	ac.ac_flag = 0;
 	if (current->flags & PF_FORKNOEXEC)
 		ac.ac_flag |= AFORK;
 	if (current->flags & PF_SUPERPRIV)
 		ac.ac_flag |= ASU;
 	if (current->flags & PF_DUMPCORE)
 		ac.ac_flag |= ACORE;
 	if (current->flags & PF_SIGNALED)
 		ac.ac_flag |= AXSIG;

 	vsize = 0;
 	if (current->mm) {
 		struct vm_area_struct *vma;
 		down_read(&current->mm->mmap_sem);
 		vma = current->mm->mmap;
 		while (vma) {
 			vsize += vma->vm_end - vma->vm_start;
 			vma = vma->vm_next;
 		}
 		up_read(&current->mm->mmap_sem);
 	}
 	vsize = vsize / 1024;
 	ac.ac_mem = encode_comp_t(vsize);
 	ac.ac_io = encode_comp_t(0 /* current->io_usage */);	/* %% */
 	ac.ac_rw = encode_comp_t(ac.ac_io / 1024);
 	ac.ac_minflt = encode_comp_t(current->signal->min_flt +
 				     current->group_leader->min_flt);
 	ac.ac_majflt = encode_comp_t(current->signal->maj_flt +
 				     current->group_leader->maj_flt);
 	ac.ac_swaps = encode_comp_t(0);
 	ac.ac_exitcode = exitcode;

 	/*
 	 * Kernel segment override to datasegment and write it
 	 * to the accounting file.
 	 */
 	fs = get_fs();
 	set_fs(KERNEL_DS);
 	/*
 	 * Accounting records are not subject to resource limits.
 	 */
 	flim = current->signal->rlim[RLIMIT_FSIZE].rlim_cur;
 	current->signal->rlim[RLIMIT_FSIZE].rlim_cur = RLIM_INFINITY;
 	file->f_op->write(file, (char *)&ac,
 			       sizeof(acct_t), &file->f_pos);
 	current->signal->rlim[RLIMIT_FSIZE].rlim_cur = flim;
 	set_fs(fs);
 }

 /*
  * acct_process - now just a wrapper around do_acct_process
  */
 void acct_process(long exitcode)
 {
 	struct file *file = NULL;

 	/*
 	 * accelerate the common fastpath:
 	 */
 	if (!acct_globals.file)
 		return;

 	spin_lock(&acct_globals.lock);
 	file = acct_globals.file;
 	if (unlikely(!file)) {
 		spin_unlock(&acct_globals.lock);
 		return;
 	}
 	get_file(file);
 	spin_unlock(&acct_globals.lock);

 	do_acct_process(exitcode, file);
 	fput(file);
 }


 /*
  * acct_update_integrals
  * - update mm integral fields in task_struct
  */
 void acct_update_integrals(struct task_struct *tsk)
 {
 	if (likely(tsk->mm)) {
 		long delta = tsk->stime - tsk->acct_stimexpd;

 		if (delta == 0)
 			return;
 		tsk->acct_stimexpd = tsk->stime;
 		tsk->acct_rss_mem1 += delta * get_mm_counter(tsk->mm, rss);
 		tsk->acct_vm_mem1 += delta * tsk->mm->total_vm;
 	}
 }

 /*
  * acct_clear_integrals
  * - clear the mm integral fields in task_struct
  */
 void acct_clear_integrals(struct task_struct *tsk)
 {
 	if (tsk) {
 		tsk->acct_stimexpd = 0;
 		tsk->acct_rss_mem1 = 0;
 		tsk->acct_vm_mem1 = 0;
 	}
 }
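
The header comment of acct.c notes that it is up to user-level programs to
interpret the accounting log.  As a hedged sketch of one such step, the
function below decodes a value produced by encode_comp_t() (13-bit
mantissa, 3-bit base-8 exponent).  The name decode_comp_t and the use of
uint16_t to stand in for comp_t are assumptions made for illustration.
They are not part of this commit.

```c
#include <stdint.h>

#define MANTSIZE 13                       /* 13 bit mantissa, as in acct.c */
#define EXPSIZE  3                        /* base 8 (3 bit) exponent */
#define MAXFRACT ((1u << MANTSIZE) - 1)   /* mask for the mantissa bits */

/*
 * Hypothetical user-space helper: expand a 16-bit comp_t back into an
 * integer.  The low 13 bits are the mantissa and the high 3 bits are the
 * exponent, where each exponent step stands for a 3-bit left shift.
 */
static uint64_t decode_comp_t(uint16_t c)
{
	uint64_t mantissa = c & MAXFRACT;
	unsigned int exponent = c >> MANTSIZE;	/* 0..7 */

	return mantissa << (EXPSIZE * exponent);
}
```

Because the encoder drops low-order bits when the value does not fit in
13 bits, the decoded number is an approximation of the original for large
values.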