Commits · 557ed1fa2620dc119adb86b34c614e152a629a80 · matisse / android_kernel_samsung_matisse

16 Oct, 2007 2 commits

Nick Piggin authored 17 years ago

The commit b5810039

 contains the note

  A last caveat: the ZERO_PAGE is now refcounted and managed with rmap
  (and thus mapcounted and count towards shared rss).  These writes to
  the struct page could cause excessive cacheline bouncing on big
  systems.  There are a number of ways this could be addressed if it is
  an issue.

And indeed this cacheline bouncing has shown up on large SGI systems.
There was a situation where an Altix system was essentially livelocked
tearing down ZERO_PAGE pagetables when an HPC app aborted during startup.
This situation can be avoided in userspace, but it does highlight the
potential scalability problem with refcounting ZERO_PAGE, and corner
cases where it can really hurt (we don't want the system to livelock!).

There are several broad ways to fix this problem:
1. add back some special casing to avoid refcounting ZERO_PAGE
2. per-node or per-cpu ZERO_PAGES
3. remove the ZERO_PAGE completely

I will argue for 3. The others should also fix the problem, but they
result in more complex code than does 3, with little or no real benefit
that I can see.

Why? Inserting a ZERO_PAGE for anonymous read faults appears to be a
false optimisation: if an application is performance critical, it would
not be doing many read faults of new memory, or at least it could be
expected to write to that memory soon afterwards. If cache or memory use
is critical, it should not be working with a significant number of
ZERO_PAGEs anyway (a more compact representation of zeroes should be
used).

As a sanity check -- mesuring on my desktop system, there are never many
mappings to the ZERO_PAGE (eg. 2 or 3), thus memory usage here should not
increase much without it.

When running a make -j4 kernel compile on my dual core system, there are
about 1,000 mappings to the ZERO_PAGE created per second, but about 1,000
ZERO_PAGE COW faults per second (less than 1 ZERO_PAGE mapping per second
is torn down without being COWed). So removing ZERO_PAGE will save 1,000
page faults per second when running kbuild, while keeping it only saves
less than 1 page clearing operation per second. 1 page clear is cheaper
than a thousand faults, presumably, so there isn't an obvious loss.

Neither the logical argument nor these basic tests give a guarantee of no
regressions. However, this is a reasonable opportunity to try to remove
the ZERO_PAGE from the pagefault path. If it is found to cause regressions,
we can reintroduce it and just avoid refcounting it.

The /dev/zero ZERO_PAGE usage and TLB tricks also get nuked.  I don't see
much use to them except on benchmarks.  All other users of ZERO_PAGE are
converted just to use ZERO_PAGE(0) for simplicity. We can look at
replacing them all and maybe ripping out ZERO_PAGE completely when we are
more satisfied with this solution.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus "snif" Torvalds <torvalds@linux-foundation.org>

557ed1fa

readahead: combine file_ra_state.prev_index/prev_offset into prev_pos · f4e6b498

Fengguang Wu authored 17 years ago


Combine the file_ra_state members
				unsigned long prev_index
				unsigned int prev_offset
into
				loff_t prev_pos

It is more consistent and better supports huge files.

Thanks to Peter for the nice proposal!

[akpm@linux-foundation.org: fix shift overflow]
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

f4e6b498

15 Oct, 2007 6 commits

docbook: fix filesystems content · e6716b87

Randy Dunlap authored 17 years ago

Fix filesystems docbook warnings.

Warning(linux-2.6.23-git8//fs/debugfs/file.c:241): No description found for parameter 'name'
Warning(linux-2.6.23-git8//fs/debugfs/file.c:241): No description found for parameter 'mode'
Warning(linux-2.6.23-git8//fs/debugfs/file.c:241): No description found for parameter 'parent'
Warning(linux-2.6.23-git8//fs/debugfs/file.c:241): No description found for parameter 'value'
Warning(linux-2.6.23-git8//include/linux/jbd.h:404): No description found for parameter 'h_lockdep_map'
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

e6716b87

sched: affine sync wakeups · 71e20f18

Ingo Molnar authored 17 years ago

make sync wakeups affine for cache-cold tasks: if a cache-cold task
is woken up by a sync wakeup then use the opportunity to migrate it
straight away. (the two tasks are 'related' because they communicate)
Signed-off-by: Ingo Molnar <mingo@elte.hu>

71e20f18

sched: guest CPU accounting: add guest-CPU /proc/<pid>/stat fields · 9ac52315

Laurent Vivier authored 17 years ago


like for cpustat, introduce the "gtime" (guest time of the task) and
"cgtime" (guest time of the task children) fields for the
tasks. Modify signal_struct and task_struct.

Modify /proc/<pid>/stat to display these new fields.
Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
Acked-by: Avi Kivity <avi@qumranet.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

9ac52315

sched: guest CPU accounting: add guest-CPU /proc/stat field · 5e84cfde

Laurent Vivier authored 17 years ago


as recent CPUs introduce a third running state, after "user" and
"system", we need a new field, "guest", in cpustat to store the time
used by the CPU to run virtual CPU. Modify /proc/stat to display this
new field.
Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
Acked-by: Avi Kivity <avi@qumranet.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

5e84cfde

sched: cleanup, remove the TASK_NONINTERACTIVE flag · af927232

Mike Galbraith authored 17 years ago


Here's another piece of low hanging obsolete fruit.

Remove obsolete TASK_NONINTERACTIVE.
Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

af927232

sched: clean up schedstats, cnt -> count · 2d72376b

Ingo Molnar authored 17 years ago


rename all 'cnt' fields and variables to the less yucky 'count' name.

yuckage noticed by Andrew Morton.

no change in code, other than the /proc/sched_debug bkl_count string got
a bit larger:

   text    data     bss     dec     hex filename
  38236    3506      24   41766    a326 sched.o.before
  38240    3506      24   41770    a32a sched.o.after
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>

2d72376b

14 Oct, 2007 2 commits

more low-hanging fruits - kernel, fs, lib signedness · 5ba25331

Al Viro authored 17 years ago

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

5ba25331

fs/partitions/sun.c endianness annotations · b1519d04

Al Viro authored 17 years ago

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

b1519d04

13 Oct, 2007 1 commit

lockdep: annotate dir vs file i_mutex · 14358e6d

Peter Zijlstra authored 17 years ago


On Mon, 2007-09-24 at 22:13 -0400, Steven Rostedt wrote:
> The circular lock seems to be this:
> 
> #1:
> 
>   sys_mmap2:              down_write(&mm->mmap_sem);
>   nfs_revalidate_mapping: mutex_lock(&inode->i_mutex);
> 
> 
> #0:
> 
>   vfs_readdir:     mutex_lock(&inode->i_mutex);
>    - during the readdir (filldir64), we take a user fault (missing page?)
>     and call do_page_fault -
>   do_page_fault:   down_read(&mm->mmap_sem);
> 
> 
> So it does indeed look like a circular locking. Now the question is, "is
> this a bug?".  Looking like the inode of #1 must be a file or something
> else that you can mmap and the inode of #0 seems it must be a directory.
> I would say "no".
> 
> Now if you can readdir on a file or mmap a directory, then this could be
> an issue.
> 
> Otherwise, I'd love to see someone teach lockdep about this issue! ;-)

Make a distinction between file and dir usage of i_mutex.
The inode should be complete and unused at unlock_new_inode(), re-init
i_mutex depending on its type.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>

14358e6d

15 Oct, 2007 1 commit

lockdep: per filesystem inode lock class · d475fd42

Peter Zijlstra authored 17 years ago

Give each filesystem its own inode lock class. The various filesystems have
different locking order wrt the inode locks; esp. the pseudo filesystems differ
from the rest.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>

d475fd42

13 Oct, 2007 7 commits

JFS: Bio cleanup: Replace missing return statements · 8d8fe642

Dave Kleikamp authored 17 years ago

commit e30408b2

 ("JFS: fix bio-related
build breakage") removed some "return 0;" statements, rather than
changing them to null returns.
Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
Cc: Jeff Garzik <jeff@garzik.org>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

8d8fe642

[JFFS2] Remove stray debugging printk · 4fc8a607
David Woodhouse authored 17 years ago
```
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
```
4fc8a607

[JFFS2] Handle dirents on the flash with embedded zero bytes in names. · b534e70c

David Woodhouse authored 17 years ago


In three places: summary scan, normal scan, REF_PRISTINE GC.

Just truncate at the NUL, since that was the correct thing to do in the
only case where this (inexplicable) breakage has been seen.
Signed-off-by: David Woodhouse <dwmw2@infradead.org>

b534e70c

[JFFS2] Check for creation of dirents with embedded zero bytes in name. · 69ca4378

David Woodhouse authored 17 years ago


I have no idea how this happened, but OLPC trac #4184 suggests that it
did. Catch it early.
Signed-off-by: David Woodhouse <dwmw2@infradead.org>

69ca4378

[JFFS2] Don't count all 'very dirty' blocks except in debug mode · a8c68f32
David Woodhouse authored 17 years ago
```
... where we'll actually print the count in a debug message.
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
```
a8c68f32

[JFFS2] Check whether garbage-collection actually obsoleted its victim. · 2665ea84

David Woodhouse authored 17 years ago


In OLPC trac #4184 we found a case where a corrupted node didn't
actually get obsoleted when we tried to garbage-collect it. So we wrote
out many million copies of it, in repeated attempts to obsolete it,
until the flash became full. Don't Do That.
Signed-off-by: David Woodhouse <dwmw2@infradead.org>

2665ea84

[JFFS2] Relax threshold for triggering GC due to dirty blocks. · 85becc53

David Woodhouse authored 17 years ago


Instead of matching resv_blocks_gcmerge, which is only about 3, instead
match resv_blocks_gctrigger, which includes a proportion of the total
device size.

These ought to become tunable from userspace, at some point.
Signed-off-by: David Woodhouse <dwmw2@infradead.org>

85becc53

12 Oct, 2007 21 commits

sysfs: add copyrights · 6d66f5cd

Tejun Heo authored 17 years ago


Sysfs has gone through considerable amount of reimplementation.  Add
copyrights.  Any objections?  :-)
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

6d66f5cd

sysfs: move sysfs file poll implementation to sysfs_open_dirent · a4e8b912

Tejun Heo authored 17 years ago


Sysfs file poll implementation is scattered over sysfs and kobject.
Event numbering is done in sysfs_dirent but wait itself is done on
kobject.  This not only unecessarily bloats both kobject and
sysfs_dirent but is also buggy - if a sysfs_dirent is removed while
there still are pollers, the associaton betwen the kobject and
sysfs_dirent breaks and kobject may be freed with the pollers still
sleeping on it.

This patch moves whole poll implementation into sysfs_open_dirent.
Each time a sysfs_open_dirent is created, event number restarts from 1
and pollers sleep on sysfs_open_dirent.  As event sequence number is
meaningless without any open file and pollers should have open file
and thus sysfs_open_dirent, this ephemeral event counting works and is
a saner implementation.

This patch fixes the dnagling sleepers bug and reduces the sizes of
kobject and sysfs_dirent by one pointer.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

a4e8b912

sysfs: implement sysfs_open_dirent · 85a4ffad

Tejun Heo authored 17 years ago


Implement sysfs_open_dirent which represents an open file (attribute)
sysfs_dirent.  A file sysfs_dirent with one or more open files have
one sysfs_dirent and all sysfs_buffers (one for each open instance)
are linked to it.

sysfs_open_dirent doesn't actually do anything yet but will be used to
off-load things which are specific for open file sysfs_dirent from it.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

85a4ffad

sysfs: move sysfs_dirent->s_children into sysfs_dirent->s_dir · bc747f37

Tejun Heo authored 17 years ago


Children list head is only meaninful for directory nodes.  Move it
into s_dir.  This doesn't save any space currently but it will with
further changes.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

bc747f37

sysfs: make sysfs_root a regular directory dirent · dc2f75f0

Tejun Heo authored 17 years ago


sysfs_root is different from a regular directory dirent in that it's
of type SYSFS_ROOT and doesn't have a name.  These differences aren't
used by anybody and only adds to complexity.  Make sysfs_root a
regular directory dirent.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

dc2f75f0

sysfs: open code sysfs_attach_dentry() · d6b4fd2f

Tejun Heo authored 17 years ago


sysfs_attach_dentry() now has only one caller and isn't doing much
other than obfuscating the code.  Open code and kill it.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

d6b4fd2f

sysfs: make s_elem an anonymous union · b1fc3d61

Tejun Heo authored 17 years ago


Make s_elem an anonymous union.  Prefixing with s_elem makes things
needlessly longer without any advantage.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

b1fc3d61

sysfs: make bin attr open get active reference of parent too · 078ce640

Tejun Heo authored 17 years ago


All bin attr operations require active references of itself and its
parent.  There's no reason to allow open when its parent has been
deactivated and allowing it is inconsistent with regular sysfs file.
Use sysfs_get_active_two() in bin attribute open function.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

078ce640

sysfs: kill unnecessary NULL pointer check in sysfs_release() · 50ab1a72

Tejun Heo authored 17 years ago


In sysfs_release(), sysfs_buffer pointed to by filp->private_data is
guaranteed to exist.  Kill the unnecessary NULL check.  This also
makes the code more consistent with the counterpart in fs/sysfs/bin.c.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

50ab1a72

sysfs: kill unnecessary sysfs_get() in open paths · b05f0548

Tejun Heo authored 17 years ago


There's no reason to get an extra reference to sysfs_dirent for an
open file.  Open file has a reference to the dentry which in turn has
a reference to sysfs_dirent.  This is fairly obvious as otherwise open
itself won't be able to access the sysfs_dirent.  Kill the extra
sysfs_get() and matching sysfs_put().
Signed-off-by: Tejun Heo <htejun@gmail.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

b05f0548

sysfs: reposition sysfs_dirent->s_mode. · b13dc89c

Tejun Heo authored 17 years ago


Move s_mode downward such that it's side-by-side with s_iattr which is
used for the same thing.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

b13dc89c

sysfs: kill sysfs_update_file() · 5a7ad7f0

Tejun Heo authored 17 years ago


sysfs_update_file() depends on inode->i_mtime but sysfs iondes are now
reclaimable making the reported modification time unreliable.  There's
only one user (pci hotplug) of this notification mechanism and it
reportedly isn't utilized from userland.

Kill sysfs_update_file().
Signed-off-by: Tejun Heo <htejun@gmail.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

5a7ad7f0

sysfs: clean up header files · 59f69015

Tejun Heo authored 17 years ago


sysfs is about to go through major overhaul making this a pretty good
opportunity to clean up (out-of-tree changes and pending patches will
need regeneration anyway).  Clean up headers.

* Kill space between * and symbolname.

* Move SYSFS_* type constants and flags into fs/sysfs/sysfs.h.
  They're internal to sysfs.

* Reformat function prototypes and add argument symbol names.

* Make dummy function definition order match that of function
  prototypes.

* Add some comments.

* Reorganize fs/sysfs/sysfs.h according to which file the declared
  variable or feature lives in.

This patch does not introduce any behavior change.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

59f69015

sysfs: fix sysfs_chmod_file() such that it updates sd->s_mode too · f88123ea

Tejun Heo authored 17 years ago


sysfs_chmod_file() looked and updated only inode of the target file.
Dentry and inode are reclaimable and the update mode data will go away
when the inode is reclaimed.  This patch makes sysfs_chmod_file()
update sd->s_mode too such that the change is permanent.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

f88123ea

sysfs: fix comments of sysfs_add/remove_one() · 181b2e4b

Tejun Heo authored 17 years ago


sysfs_add/remove_one() now link and unlink the target dirent into and
from the children list.  Update comments accordingly.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

181b2e4b

sysfs: spit a warning to users when they try to create a duplicate sysfs file · 5c3e8964

Greg Kroah-Hartman authored 17 years ago


We want to let people know when we create a duplicate sysfs file, as
they need to fix up their code.
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

5c3e8964

sysfs: Rewrite sysfs_move_dir in terms of sysfs dirents · 45aaae9c

Eric W. Biederman authored 17 years ago


This patch rewrites sysfs_move_dir to perform it's checks
as much as possible on the underlying sysfs_dirents instead
of the contents of the dcache, making sysfs_move_dir
more like the rest of the sysfs directory modification
code.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Tejun Heo <htejun@gmail.com>
Cc: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

45aaae9c

sysfs: Rewrite rename in terms of sysfs dirents · 9918f9a4

Eric W. Biederman authored 17 years ago


This patch rewrites sysfs_rename_dir to perform it's checks
as much as possible on the underlying sysfs_dirents instead
of the contents of the dcache.  It turns out that this version
is a little simpler, and a little more like the rest of
the sysfs directory modification code.

tj: fixed double locking of sysfs_mutex
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Tejun Heo <htejun@gmail.com>
Cc: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

9918f9a4

sysfs: Remove s_dentry · 5a26b79c

Eric W. Biederman authored 17 years ago


The only uses of s_dentry left are the code that maintains
s_dentry and trivial users that don't actually need it.
So this patch removes the s_dentry maintenance code and
restructures the trivial uses to use something else.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Tejun Heo <htejun@gmail.com>
Cc: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

5a26b79c

sysfs: simply sysfs_get_dentry · e0712bbf

Tejun Heo authored 17 years ago


Now that we know the sysfs tree structure cannot change under us and
sysfs shadow support is dropped, sysfs_get_dentry() can be simplified
greatly.  It can just look up from the root and there's no need to
retry on failure.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

e0712bbf

sysfs: Introduce sysfs_rename_mutex · 932ea2e3

Eric W. Biederman authored 17 years ago


Looking carefully at the rename code we have a subtle dependency
that the structure of sysfs not change while we are performing
a rename.  If the parent directory of the object we are renaming
changes while the rename is being performed nasty things could
happen when we go to release our locks.

So introduce a sysfs_rename_mutex to prevent this highly
unlikely theoretical issue.

In addition hold sysfs_rename_mutex over all calls to
sysfs_get_dentry. Allowing sysfs_get_dentry to be simplified
in the future.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Tejun Heo <htejun@gmail.com>
Cc: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

932ea2e3