- 16 Oct, 2007 2 commits
-
-
Nick Piggin authored
The commit b5810039 contains the note A last caveat: the ZERO_PAGE is now refcounted and managed with rmap (and thus mapcounted and count towards shared rss). These writes to the struct page could cause excessive cacheline bouncing on big systems. There are a number of ways this could be addressed if it is an issue. And indeed this cacheline bouncing has shown up on large SGI systems. There was a situation where an Altix system was essentially livelocked tearing down ZERO_PAGE pagetables when an HPC app aborted during startup. This situation can be avoided in userspace, but it does highlight the potential scalability problem with refcounting ZERO_PAGE, and corner cases where it can really hurt (we don't want the system to livelock!). There are several broad ways to fix this problem: 1. add back some special casing to avoid refcounting ZERO_PAGE 2. per-node or per-cpu ZERO_PAGES 3. remove the ZERO_PAGE completely I will argue for 3. The others should also fix the problem, but they result in more complex code than does 3, with little or no real benefit that I can see. Why? Inserting a ZERO_PAGE for anonymous read faults appears to be a false optimisation: if an application is performance critical, it would not be doing many read faults of new memory, or at least it could be expected to write to that memory soon afterwards. If cache or memory use is critical, it should not be working with a significant number of ZERO_PAGEs anyway (a more compact representation of zeroes should be used). As a sanity check -- mesuring on my desktop system, there are never many mappings to the ZERO_PAGE (eg. 2 or 3), thus memory usage here should not increase much without it. When running a make -j4 kernel compile on my dual core system, there are about 1,000 mappings to the ZERO_PAGE created per second, but about 1,000 ZERO_PAGE COW faults per second (less than 1 ZERO_PAGE mapping per second is torn down without being COWed). So removing ZERO_PAGE will save 1,000 page faults per second when running kbuild, while keeping it only saves less than 1 page clearing operation per second. 1 page clear is cheaper than a thousand faults, presumably, so there isn't an obvious loss. Neither the logical argument nor these basic tests give a guarantee of no regressions. However, this is a reasonable opportunity to try to remove the ZERO_PAGE from the pagefault path. If it is found to cause regressions, we can reintroduce it and just avoid refcounting it. The /dev/zero ZERO_PAGE usage and TLB tricks also get nuked. I don't see much use to them except on benchmarks. All other users of ZERO_PAGE are converted just to use ZERO_PAGE(0) for simplicity. We can look at replacing them all and maybe ripping out ZERO_PAGE completely when we are more satisfied with this solution. Signed-off-by:
Nick Piggin <npiggin@suse.de> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus "snif" Torvalds <torvalds@linux-foundation.org>
-
Fengguang Wu authored
Combine the file_ra_state members unsigned long prev_index unsigned int prev_offset into loff_t prev_pos It is more consistent and better supports huge files. Thanks to Peter for the nice proposal! [akpm@linux-foundation.org: fix shift overflow] Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by:
Fengguang Wu <wfg@mail.ustc.edu.cn> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
- 15 Oct, 2007 6 commits
-
-
Randy Dunlap authored
Fix filesystems docbook warnings. Warning(linux-2.6.23-git8//fs/debugfs/file.c:241): No description found for parameter 'name' Warning(linux-2.6.23-git8//fs/debugfs/file.c:241): No description found for parameter 'mode' Warning(linux-2.6.23-git8//fs/debugfs/file.c:241): No description found for parameter 'parent' Warning(linux-2.6.23-git8//fs/debugfs/file.c:241): No description found for parameter 'value' Warning(linux-2.6.23-git8//include/linux/jbd.h:404): No description found for parameter 'h_lockdep_map' Signed-off-by:
Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Ingo Molnar authored
make sync wakeups affine for cache-cold tasks: if a cache-cold task is woken up by a sync wakeup then use the opportunity to migrate it straight away. (the two tasks are 'related' because they communicate) Signed-off-by:
Ingo Molnar <mingo@elte.hu>
-
Laurent Vivier authored
like for cpustat, introduce the "gtime" (guest time of the task) and "cgtime" (guest time of the task children) fields for the tasks. Modify signal_struct and task_struct. Modify /proc/<pid>/stat to display these new fields. Signed-off-by:
Laurent Vivier <Laurent.Vivier@bull.net> Acked-by:
Avi Kivity <avi@qumranet.com> Signed-off-by:
Ingo Molnar <mingo@elte.hu>
-
Laurent Vivier authored
as recent CPUs introduce a third running state, after "user" and "system", we need a new field, "guest", in cpustat to store the time used by the CPU to run virtual CPU. Modify /proc/stat to display this new field. Signed-off-by:
Laurent Vivier <Laurent.Vivier@bull.net> Acked-by:
Avi Kivity <avi@qumranet.com> Signed-off-by:
Ingo Molnar <mingo@elte.hu>
-
Mike Galbraith authored
Here's another piece of low hanging obsolete fruit. Remove obsolete TASK_NONINTERACTIVE. Signed-off-by:
Mike Galbraith <efault@gmx.de> Signed-off-by:
Ingo Molnar <mingo@elte.hu>
-
Ingo Molnar authored
rename all 'cnt' fields and variables to the less yucky 'count' name. yuckage noticed by Andrew Morton. no change in code, other than the /proc/sched_debug bkl_count string got a bit larger: text data bss dec hex filename 38236 3506 24 41766 a326 sched.o.before 38240 3506 24 41770 a32a sched.o.after Signed-off-by:
Ingo Molnar <mingo@elte.hu> Reviewed-by:
Thomas Gleixner <tglx@linutronix.de>
-
- 14 Oct, 2007 2 commits
-
-
Al Viro authored
Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Al Viro authored
Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
- 13 Oct, 2007 1 commit
-
-
Peter Zijlstra authored
On Mon, 2007-09-24 at 22:13 -0400, Steven Rostedt wrote: > The circular lock seems to be this: > > #1: > > sys_mmap2: down_write(&mm->mmap_sem); > nfs_revalidate_mapping: mutex_lock(&inode->i_mutex); > > > #0: > > vfs_readdir: mutex_lock(&inode->i_mutex); > - during the readdir (filldir64), we take a user fault (missing page?) > and call do_page_fault - > do_page_fault: down_read(&mm->mmap_sem); > > > So it does indeed look like a circular locking. Now the question is, "is > this a bug?". Looking like the inode of #1 must be a file or something > else that you can mmap and the inode of #0 seems it must be a directory. > I would say "no". > > Now if you can readdir on a file or mmap a directory, then this could be > an issue. > > Otherwise, I'd love to see someone teach lockdep about this issue! ;-) Make a distinction between file and dir usage of i_mutex. The inode should be complete and unused at unlock_new_inode(), re-init i_mutex depending on its type. Signed-off-by:
Peter Zijlstra <a.p.zijlstra@chello.nl>
-
- 15 Oct, 2007 1 commit
-
-
Peter Zijlstra authored
Give each filesystem its own inode lock class. The various filesystems have different locking order wrt the inode locks; esp. the pseudo filesystems differ from the rest. Signed-off-by:
Peter Zijlstra <a.p.zijlstra@chello.nl>
-
- 13 Oct, 2007 7 commits
-
-
Dave Kleikamp authored
commit e30408b2 ("JFS: fix bio-related build breakage") removed some "return 0;" statements, rather than changing them to null returns. Signed-off-by:
Dave Kleikamp <shaggy@linux.vnet.ibm.com> Cc: Jeff Garzik <jeff@garzik.org> Cc: Randy Dunlap <randy.dunlap@oracle.com> Cc: Jens Axboe <jens.axboe@oracle.com> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
David Woodhouse authored
Signed-off-by:
David Woodhouse <dwmw2@infradead.org>
-
David Woodhouse authored
In three places: summary scan, normal scan, REF_PRISTINE GC. Just truncate at the NUL, since that was the correct thing to do in the only case where this (inexplicable) breakage has been seen. Signed-off-by:
David Woodhouse <dwmw2@infradead.org>
-
David Woodhouse authored
I have no idea how this happened, but OLPC trac #4184 suggests that it did. Catch it early. Signed-off-by:
David Woodhouse <dwmw2@infradead.org>
-
David Woodhouse authored
... where we'll actually print the count in a debug message. Signed-off-by:
David Woodhouse <dwmw2@infradead.org>
-
David Woodhouse authored
In OLPC trac #4184 we found a case where a corrupted node didn't actually get obsoleted when we tried to garbage-collect it. So we wrote out many million copies of it, in repeated attempts to obsolete it, until the flash became full. Don't Do That. Signed-off-by:
David Woodhouse <dwmw2@infradead.org>
-
David Woodhouse authored
Instead of matching resv_blocks_gcmerge, which is only about 3, instead match resv_blocks_gctrigger, which includes a proportion of the total device size. These ought to become tunable from userspace, at some point. Signed-off-by:
David Woodhouse <dwmw2@infradead.org>
-
- 12 Oct, 2007 21 commits
-
-
Tejun Heo authored
Sysfs has gone through considerable amount of reimplementation. Add copyrights. Any objections? :-) Signed-off-by:
Tejun Heo <htejun@gmail.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Tejun Heo authored
Sysfs file poll implementation is scattered over sysfs and kobject. Event numbering is done in sysfs_dirent but wait itself is done on kobject. This not only unecessarily bloats both kobject and sysfs_dirent but is also buggy - if a sysfs_dirent is removed while there still are pollers, the associaton betwen the kobject and sysfs_dirent breaks and kobject may be freed with the pollers still sleeping on it. This patch moves whole poll implementation into sysfs_open_dirent. Each time a sysfs_open_dirent is created, event number restarts from 1 and pollers sleep on sysfs_open_dirent. As event sequence number is meaningless without any open file and pollers should have open file and thus sysfs_open_dirent, this ephemeral event counting works and is a saner implementation. This patch fixes the dnagling sleepers bug and reduces the sizes of kobject and sysfs_dirent by one pointer. Signed-off-by:
Tejun Heo <htejun@gmail.com> Acked-by:
Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Tejun Heo authored
Implement sysfs_open_dirent which represents an open file (attribute) sysfs_dirent. A file sysfs_dirent with one or more open files have one sysfs_dirent and all sysfs_buffers (one for each open instance) are linked to it. sysfs_open_dirent doesn't actually do anything yet but will be used to off-load things which are specific for open file sysfs_dirent from it. Signed-off-by:
Tejun Heo <htejun@gmail.com> Acked-by:
Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Tejun Heo authored
Children list head is only meaninful for directory nodes. Move it into s_dir. This doesn't save any space currently but it will with further changes. Signed-off-by:
Tejun Heo <htejun@gmail.com> Acked-by:
Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Tejun Heo authored
sysfs_root is different from a regular directory dirent in that it's of type SYSFS_ROOT and doesn't have a name. These differences aren't used by anybody and only adds to complexity. Make sysfs_root a regular directory dirent. Signed-off-by:
Tejun Heo <htejun@gmail.com> Acked-by:
Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Tejun Heo authored
sysfs_attach_dentry() now has only one caller and isn't doing much other than obfuscating the code. Open code and kill it. Signed-off-by:
Tejun Heo <htejun@gmail.com> Acked-by:
Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Tejun Heo authored
Make s_elem an anonymous union. Prefixing with s_elem makes things needlessly longer without any advantage. Signed-off-by:
Tejun Heo <htejun@gmail.com> Acked-by:
Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Tejun Heo authored
All bin attr operations require active references of itself and its parent. There's no reason to allow open when its parent has been deactivated and allowing it is inconsistent with regular sysfs file. Use sysfs_get_active_two() in bin attribute open function. Signed-off-by:
Tejun Heo <htejun@gmail.com> Acked-by:
Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Tejun Heo authored
In sysfs_release(), sysfs_buffer pointed to by filp->private_data is guaranteed to exist. Kill the unnecessary NULL check. This also makes the code more consistent with the counterpart in fs/sysfs/bin.c. Signed-off-by:
Tejun Heo <htejun@gmail.com> Acked-by:
Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Tejun Heo authored
There's no reason to get an extra reference to sysfs_dirent for an open file. Open file has a reference to the dentry which in turn has a reference to sysfs_dirent. This is fairly obvious as otherwise open itself won't be able to access the sysfs_dirent. Kill the extra sysfs_get() and matching sysfs_put(). Signed-off-by:
Tejun Heo <htejun@gmail.com> Acked-by:
Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Tejun Heo authored
Move s_mode downward such that it's side-by-side with s_iattr which is used for the same thing. Signed-off-by:
Tejun Heo <htejun@gmail.com> Acked-by:
Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Tejun Heo authored
sysfs_update_file() depends on inode->i_mtime but sysfs iondes are now reclaimable making the reported modification time unreliable. There's only one user (pci hotplug) of this notification mechanism and it reportedly isn't utilized from userland. Kill sysfs_update_file(). Signed-off-by:
Tejun Heo <htejun@gmail.com> Acked-by:
Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Tejun Heo authored
sysfs is about to go through major overhaul making this a pretty good opportunity to clean up (out-of-tree changes and pending patches will need regeneration anyway). Clean up headers. * Kill space between * and symbolname. * Move SYSFS_* type constants and flags into fs/sysfs/sysfs.h. They're internal to sysfs. * Reformat function prototypes and add argument symbol names. * Make dummy function definition order match that of function prototypes. * Add some comments. * Reorganize fs/sysfs/sysfs.h according to which file the declared variable or feature lives in. This patch does not introduce any behavior change. Signed-off-by:
Tejun Heo <htejun@gmail.com> Acked-by:
Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Tejun Heo authored
sysfs_chmod_file() looked and updated only inode of the target file. Dentry and inode are reclaimable and the update mode data will go away when the inode is reclaimed. This patch makes sysfs_chmod_file() update sd->s_mode too such that the change is permanent. Signed-off-by:
Tejun Heo <htejun@gmail.com> Acked-by:
Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Tejun Heo authored
sysfs_add/remove_one() now link and unlink the target dirent into and from the children list. Update comments accordingly. Signed-off-by:
Tejun Heo <htejun@gmail.com> Acked-by:
Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Greg Kroah-Hartman authored
We want to let people know when we create a duplicate sysfs file, as they need to fix up their code. Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Eric W. Biederman authored
This patch rewrites sysfs_move_dir to perform it's checks as much as possible on the underlying sysfs_dirents instead of the contents of the dcache, making sysfs_move_dir more like the rest of the sysfs directory modification code. Signed-off-by:
Eric W. Biederman <ebiederm@xmission.com> Signed-off-by:
Tejun Heo <htejun@gmail.com> Cc: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Eric W. Biederman authored
This patch rewrites sysfs_rename_dir to perform it's checks as much as possible on the underlying sysfs_dirents instead of the contents of the dcache. It turns out that this version is a little simpler, and a little more like the rest of the sysfs directory modification code. tj: fixed double locking of sysfs_mutex Signed-off-by:
Eric W. Biederman <ebiederm@xmission.com> Signed-off-by:
Tejun Heo <htejun@gmail.com> Cc: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Eric W. Biederman authored
The only uses of s_dentry left are the code that maintains s_dentry and trivial users that don't actually need it. So this patch removes the s_dentry maintenance code and restructures the trivial uses to use something else. Signed-off-by:
Eric W. Biederman <ebiederm@xmission.com> Signed-off-by:
Tejun Heo <htejun@gmail.com> Cc: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Tejun Heo authored
Now that we know the sysfs tree structure cannot change under us and sysfs shadow support is dropped, sysfs_get_dentry() can be simplified greatly. It can just look up from the root and there's no need to retry on failure. Signed-off-by:
Tejun Heo <htejun@gmail.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Eric W. Biederman authored
Looking carefully at the rename code we have a subtle dependency that the structure of sysfs not change while we are performing a rename. If the parent directory of the object we are renaming changes while the rename is being performed nasty things could happen when we go to release our locks. So introduce a sysfs_rename_mutex to prevent this highly unlikely theoretical issue. In addition hold sysfs_rename_mutex over all calls to sysfs_get_dentry. Allowing sysfs_get_dentry to be simplified in the future. Signed-off-by:
Eric W. Biederman <ebiederm@xmission.com> Signed-off-by:
Tejun Heo <htejun@gmail.com> Cc: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-