Commits · 5a303d5d4b8055d2e5a03e92d04745bfc5881a22 · matisse / android_kernel_samsung_matisse

17 Dec, 2009 3 commits

Btrfs: Make fallocate(2) more ENOSPC friendly · 5a303d5d

Yan, Zheng authored 15 years ago


fallocate(2) may allocate large number of file extents, so it's not
good to do it in a single transaction. This patch make fallocate(2)
start a new transaction for each file extents it allocates.
Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

5a303d5d

Btrfs: Avoid orphan inodes cleanup while replaying log · c71bf099

Yan, Zheng authored 15 years ago


We do log replay in a single transaction, so it's not good to do unbound
operations. This patch cleans up orphan inodes cleanup after replaying
the log. It also avoids doing other unbound operations such as truncating
a file during replaying log. These unbound operations are postponed to
the orphan inode cleanup stage.
Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

c71bf099

Btrfs: Fix disk_i_size update corner case · c2167754

Yan, Zheng authored 15 years ago


There are some cases file extents are inserted without involving
ordered struct. In these cases, we update disk_i_size directly,
without checking pending ordered extent and DELALLOC bit. This
patch extends btrfs_ordered_update_i_size() to handle these cases.
Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

c2167754

15 Dec, 2009 1 commit

Btrfs: Rewrite btrfs_drop_extents · 920bbbfb

Yan, Zheng authored 15 years ago


Rewrite btrfs_drop_extents by using btrfs_duplicate_item, so we can
avoid calling lock_extent within transaction.
Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

920bbbfb

11 Nov, 2009 4 commits

Btrfs: fix panic when trying to destroy a newly allocated · a6dbd429

Josef Bacik authored 15 years ago

There is a problem where iget5_locked will look for an inode, not find it, and
then subsequently try to allocate it. Another CPU will have raced in and
allocated the inode instead, so when iget5_locked gets the inode spin lock again
and does a search, it finds the new inode. So it goes ahead and calls
destroy_inode on the inode it just allocated. The problem is we don't set
BTRFS_I(inode)->root until the new inode is completely initialized. This patch
makes us set root to NULL when alloc'ing a new inode, so when we get to
btrfs_destroy_inode and we see that root is NULL we can just free up the memory
and continue on. This fixes the panic

http://www.kerneloops.org/submitresult.php?number=812690

Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

a6dbd429

Btrfs: fallback on uncompressed io if compressed io fails · f5a84ee3

Josef Bacik authored 15 years ago

Currently compressed IO does not deal with not having its entire extent able to
be allocated. So if we have enough free space to allocate for the extent, but
its not contiguous, it will fail spectacularly. This patch fixes this by
falling back on uncompressed IO which lets us spread the delalloc extent across
multiple extents. I tested this by making us randomly think the reservation had
failed to make it fallback on the uncompressed io way and it seemed to work
fine. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

f5a84ee3

Btrfs: fix some metadata enospc issues · 5df6a9f6

Josef Bacik authored 15 years ago


We weren't reserving metadata space for rename, rmdir and unlink, which could
cause problems.
Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

5df6a9f6

Btrfs: fix data allocation hint start · 6346c939

Josef Bacik authored 15 years ago

Sometimes our start allocation hint when we cow a file can be either
EXTENT_HOLE or some other such place holder, which is not optimal. So if we
find that our em->block_start is one of these special values, check to see
where the first block of the inode is stored, and use that as a hint. If that
block is also a special value, just fallback on a hint of 0 and let the
allocator figure out a good place to put the data.
Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

6346c939

14 Oct, 2009 1 commit

Btrfs: fix possible ENOSPC problems with truncate · 5d5e103a

Josef Bacik authored 15 years ago


There's a problem where we don't do any space reservation for truncates, which
can cause you to OOPs because you will be allowed to go off in the weeds a bit
since we don't account for the delalloc bytes that are created as a result of
the truncate.
Signed-off-by: Josef Bacik <jbacik@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

5d5e103a

13 Oct, 2009 1 commit

Btrfs: avoid tree log commit when there are no changes · 257c62e1

Chris Mason authored 15 years ago


rpm has a habit of running fdatasync when the file hasn't
changed.  We already detect if a file hasn't been changed
in the current transaction but it might have been sent to
the tree-log in this transaction and not changed since
the last call to fsync.

In this case, we want to avoid a tree log sync, which includes
a number of synchronous writes and barriers.  This commit
extends the existing tracking of the last transaction to change
a file to also track the last sub-transaction.

The end result is that rpm -ivh and -Uvh are roughly twice as fast,
and on par with ext3.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

257c62e1

09 Oct, 2009 3 commits

Btrfs: fix uninit compiler warning in cow_file_range_nocow · e9061e21

Chris Mason authored 15 years ago


The extent_type variable was exposed uninit via a goto.  It should be
impossible to trigger because it is protected by a check on another
variable, but this makes sure.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

e9061e21

Btrfs: constify dentry_operations · 82d339d9
Alexey Dobriyan authored 15 years ago
```
Signed-off-by: Chris Mason <chris.mason@oracle.com>
```
82d339d9

Btrfs: remove negative dentry when deleting subvolumne · efefb143

Yan, Zheng authored 15 years ago


The use of btrfs_dentry_delete is removing dentries from the
dcache when deleting subvolumne. btrfs_dentry_delete ignores
negative dentries. This is incorrect since if we don't remove
the negative dentry, its parent dentry can't be removed.
Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

efefb143

08 Oct, 2009 3 commits

Btrfs: release delalloc reservations on extent item insertion · 32c00aff

Josef Bacik authored 15 years ago

This patch fixes an issue with the delalloc metadata space reservation
code. The problem is we used to free the reservation as soon as we
allocated the delalloc region. The problem with this is if we are not
inserting an inline extent, we don't actually insert the extent item until
after the ordered extent is written out. This patch does 3 things,

1) It moves the reservation clearing stuff into the ordered code, so when
we remove the ordered extent we remove the reservation.
2) It adds a EXTENT_DO_ACCOUNTING flag that gets passed when we clear
delalloc bits in the cases where we want to clear the metadata reservation
when we clear the delalloc extent, in the case that we do an inline extent
or we invalidate the page.
3) It adds another waitqueue to the space info so that when we start a fs
wide delalloc flush, anybody else who also hits that area will simply wait
for the flush to finish and then try to make their allocation.

This has been tested thoroughly to make sure we did not regress on
performance.
Signed-off-by: Josef Bacik <jbacik@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

32c00aff

Btrfs: delay clearing EXTENT_DELALLOC for compressed extents · a3429ab7

Chris Mason authored 15 years ago


When compression is on, the cow_file_range code is farmed off to
worker threads.  This allows us to do significant CPU work in parallel
on SMP machines.

But it is a delicate balance around when we clear flags and how.  In
the past we cleared the delalloc flag immediately, which was safe
because the pages stayed locked.

But this is causing problems with the newest ENOSPC code, and with the
recent extent state cleanups we can now clear the delalloc bit at the
same time the uncompressed code does.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

a3429ab7

Btrfs: cleanup extent_clear_unlock_delalloc flags · a791e35e

Chris Mason authored 15 years ago


extent_clear_unlock_delalloc has a growing set of ugly parameters
that is very difficult to read and maintain.

This switches to a flag field and well named flag defines.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

a791e35e

01 Oct, 2009 2 commits

const: constify remaining file_operations · 828c0950

Alexey Dobriyan authored 15 years ago


[akpm@linux-foundation.org: fix KVM]
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

828c0950

Btrfs: fix data space leak fix · fbf19087

Josef Bacik authored 15 years ago


There is a problem where page_mkwrite can be called on a dirtied page that
already has a delalloc range associated with it.  The fix is to clear any
delalloc bits for the range we are dirtying so the space accounting gets
handled properly.  This is the same thing we do in the normal write case, so we
are consistent across the board.  With this patch we no longer leak reserved
space.
Signed-off-by: Josef Bacik <jbacik@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

fbf19087

28 Sep, 2009 1 commit

Btrfs: proper -ENOSPC handling · 9ed74f2d

Josef Bacik authored 15 years ago

At the start of a transaction we do a btrfs_reserve_metadata_space() and
specify how many items we plan on modifying. Then once we've done our
modifications and such, just call btrfs_unreserve_metadata_space() for
the same number of items we reserved.

For keeping track of metadata needed for data I've had to add an extent_io op
for when we merge extents. This lets us track space properly when we are doing
sequential writes, so we don't end up reserving way more metadata space than
what we need.

The only place where the metadata space accounting is not done is in the
relocation code. This is because Yan is going to be reworking that code in the
near future, so running btrfs-vol -b could still possibly result in a ENOSPC
related panic. This patch also turns off the metadata_ratio stuff in order to
allow users to more efficiently use their disk space.

This patch makes it so we track how much metadata we need for an inode's
delayed allocation extents by tracking how many extents are currently
waiting for allocation. It introduces two new callbacks for the
extent_io tree's, merge_extent_hook and split_extent_hook. These help
us keep track of when we merge delalloc extents together and split them
up. Reservations are handled prior to any actually dirty'ing occurs,
and then we unreserve after we dirty.

btrfs_unreserve_metadata_for_delalloc() will make the appropriate
unreservations as needed based on the number of reservations we
currently have and the number of extents we currently have. Doing the
reservation outside of doing any of the actual dirty'ing lets us do
things like filemap_flush() the inode to try and force delalloc to
happen, or as a last resort actually start allocation on all delalloc
inodes in the fs. This has survived dbench, fs_mark and an fsx torture
test.
Signed-off-by: Josef Bacik <jbacik@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

9ed74f2d

24 Sep, 2009 2 commits

Btrfs: don't rename file into dummy directory · f679a840

Yan, Zheng authored 15 years ago

A recent change enforces only one access point to each subvolume. The first
directory entry (the one added when the subvolume/snapshot was created) is
treated as valid access point, all other subvolume links are linked to dummy
empty directories. The dummy directories are temporary inodes that only in
memory, so we can not rename file into them.
Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

f679a840

Btrfs: check size of inode backref before adding hardlink · a5719521

Yan, Zheng authored 15 years ago


For every hardlink in btrfs, there is a corresponding inode back
reference. All inode back references for hardlinks in a given
directory are stored in single b-tree item. The size of b-tree item
is limited by the size of b-tree leaf, so we can only create limited
number of hardlinks to a given file in a directory.

The original code lacks of the check, it oops if the number of
hardlinks goes over the limit. This patch fixes the issue by adding
check to btrfs_link and btrfs_rename.
Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

a5719521

22 Sep, 2009 2 commits

const: mark remaining inode_operations as const · 6e1d5dcc

Alexey Dobriyan authored 15 years ago

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

6e1d5dcc

const: mark remaining address_space_operations const · 7f09410b

Alexey Dobriyan authored 15 years ago

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

7f09410b

21 Sep, 2009 3 commits

Btrfs: add snapshot/subvolume destroy ioctl · 76dda93c

Yan, Zheng authored 15 years ago

This patch adds snapshot/subvolume destroy ioctl. A subvolume that isn't being
used and doesn't contains links to other subvolumes can be destroyed.
Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

76dda93c

Btrfs: change how subvolumes are organized · 4df27c4d

Yan, Zheng authored 15 years ago


btrfs allows subvolumes and snapshots anywhere in the directory tree.
If we snapshot a subvolume that contains a link to other subvolume
called subvolA, subvolA can be accessed through both the original
subvolume and the snapshot. This is similar to creating hard link to
directory, and has the very similar problems.

The aim of this patch is enforcing there is only one access point to
each subvolume. Only the first directory entry (the one added when
the subvolume/snapshot was created) is treated as valid access point.
The first directory entry is distinguished by checking root forward
reference. If the corresponding root forward reference is missing,
we know the entry is not the first one.

This patch also adds snapshot/subvolume rename support, the code
allows rename subvolume link across subvolumes.
Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

4df27c4d

Btrfs: do not reuse objectid of deleted snapshot/subvol · 13a8a7c8

Yan, Zheng authored 15 years ago


The new back reference format does not allow reusing objectid of
deleted snapshot/subvol. So we use ++highest_objectid to allocate
objectid for new snapshot/subvol.

Now we use ++highest_objectid to allocate objectid for both new inode
and new snapshot/subvolume, so this patch removes 'find hole' code in
btrfs_find_free_objectid.
Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

13a8a7c8

18 Sep, 2009 1 commit

Btrfs: search for an allocation hint while filling file COW · b917b7c3

Chris Mason authored 15 years ago


The allocator has some nice knobs for sending hints about where
to try and allocate new blocks, but when we're doing file allocations
we're not sending any hint at all.

This commit adds a simple extent map search to see if we can
quickly and easily find a hint for the allocator.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

b917b7c3

16 Sep, 2009 1 commit
- HWPOISON: Enable error_remove_page on btrfs · 465fdd97
  Andi Kleen authored 15 years ago
```
Cc: chris.mason@oracle.com
Signed-off-by: Andi Kleen <ak@linux.intel.com>
```
  465fdd97
11 Sep, 2009 7 commits

Btrfs: zero page past end of inline file items · 93c82d57

Chris Mason authored 15 years ago


When btrfs_get_extent is reading inline file items for readpage,
it needs to copy the inline extent into the page.  If the
inline extent doesn't cover all of the page, that means there
is a hole in the file, or that our file is smaller than one
page.

readpage does zeroing for the case where the file is smaller than one
page, but nobody is currently zeroing for the case where there is
a hole after the inline item.

This commit changes btrfs_get_extent to zero fill the page past
the end of the inline item.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

93c82d57

Btrfs: fix btrfs page_mkwrite to return locked page · 50a9b214

Chris Mason authored 15 years ago


This closes a whole where the page may be written before
the page_mkwrite caller has a chance to dirty it

(thanks to Nick Piggin)
Signed-off-by: Chris Mason <chris.mason@oracle.com>

50a9b214

Btrfs: Fix extent replacment race · a1ed835e

Chris Mason authored 15 years ago


Data COW means that whenever we write to a file, we replace any old
extent pointers with new ones.  There was a window where a readpage
might find the old extent pointers on disk and cache them in the
extent_map tree in ram in the middle of a given write replacing them.

Even though both the readpage and the write had their respective bytes
in the file locked, the extent readpage inserts may cover more bytes than
it had locked down.

This commit closes the race by keeping the new extent pinned in the extent
map tree until after the on-disk btree is properly setup with the new
extent pointers.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

a1ed835e

Btrfs: Use PagePrivate2 to track pages in the data=ordered code. · 8b62b72b

Chris Mason authored 15 years ago


Btrfs writes go through delalloc to the data=ordered code.  This
makes sure that all of the data is on disk before the metadata
that references it.  The tracking means that we have to make sure
each page in an extent is fully written before we add that extent into
the on-disk btree.

This was done in the past by setting the EXTENT_ORDERED bit for the
range of an extent when it was added to the data=ordered code, and then
clearing the EXTENT_ORDERED bit in the extent state tree as each page
finished IO.

One of the reasons we had to do this was because sometimes pages are
magically dirtied without page_mkwrite being called.  The EXTENT_ORDERED
bit is checked at writepage time, and if it isn't there, our page become
dirty without going through the proper path.

These bit operations make for a number of rbtree searches for each page,
and can cause considerable lock contention.

This commit switches from the EXTENT_ORDERED bit to use PagePrivate2.
As pages go into the ordered code, PagePrivate2 is set on each one.
This is a cheap operation because we already have all the pages locked
and ready to go.

As IO finishes, the PagePrivate2 bit is cleared and the ordered
accoutning is updated for each page.

At writepage time, if the PagePrivate2 bit is missing, we go into the
writepage fixup code to handle improperly dirtied pages.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

8b62b72b

Btrfs: use a cached state for extent state operations during delalloc · 9655d298

Chris Mason authored 15 years ago


This changes the btrfs code to find delalloc ranges in the extent state
tree to use the new state caching code from set/test bit.  It reduces
one of the biggest causes of rbtree searches in the writeback path.

test_range_bit is also modified to take the cached state as a starting
point while searching.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

9655d298

Btrfs: cache values for locking extents · 2c64c53d

Chris Mason authored 15 years ago


Many of the btrfs extent state tree users follow the same pattern.
They lock an extent range in the tree, do some operation and then
unlock.

This translates to at least 2 rbtree searches, and maybe more if they
are doing operations on the extent state tree.  A locked extent
in the tree isn't going to be merged or changed, and so we can
safely return the extent state structure as a cached handle.

This changes set_extent_bit to give back a cached handle, and also
changes both set_extent_bit and clear_extent_bit to use the cached
handle if it is available.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

2c64c53d

Btrfs: switch extent_map to a rw lock · 890871be

Chris Mason authored 15 years ago


There are two main users of the extent_map tree.  The
first is regular file inodes, where it is evenly spread
between readers and writers.

The second is the chunk allocation tree, which maps blocks from
logical addresses to phyiscal ones, and it is 99.99% reads.

The mapping tree is a point of lock contention during heavy IO
workloads, so this commit switches things to a rw lock.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

890871be

21 Aug, 2009 1 commit

btrfs: fix inode rbtree corruption · 03e860bd

From: Nick Piggin authored 15 years ago


Node may not be inserted over existing node. This causes inode tree
corruption and I was seeing crashes in inode_tree_del which I can not
reproduce after this patch.

The other way to fix this would be to tie inode lifetime in the rbtree
with inode while not in freeing state. I had a look at this but it is
not so trivial at this point. At least this patch gets things working again.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Chris Mason <chris.mason@oracle.com>
Acked-by: Yan Zheng <zheng.yan@oracle.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

03e860bd

07 Aug, 2009 1 commit

Btrfs: remove superfluous NULL pointer check in btrfs_rename() · 4baf8c92

Bartlomiej Zolnierkiewicz authored 15 years ago


This takes care of the following entry from Dan's list:

fs/btrfs/inode.c +4788 btrfs_rename(36) warning: variable derefenced before check 'old_inode'
Reported-by: Dan Carpenter <error27@gmail.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Eugene Teo <eteo@redhat.com>
Cc: Julia Lawall <julia@diku.dk>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

4baf8c92

22 Jul, 2009 1 commit

Btrfs: adjust NULL test · 33c17ad5

Julia Lawall authored 15 years ago


Move the call to BUG_ON to before the dereference of the tested value.
Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

33c17ad5

12 Jul, 2009 1 commit

headers: smp_lock.h redux · 405f5571

Alexey Dobriyan authored 16 years ago


* Remove smp_lock.h from files which don't need it (including some headers!)
* Add smp_lock.h to files which do need it
* Make smp_lock.h include conditional in hardirq.h
  It's needed only for one kernel_locked() usage which is under CONFIG_PREEMPT

  This will make hardirq.h inclusion cheaper for every PREEMPT=n config
  (which includes allmodconfig/allyesconfig, BTW)
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

405f5571

02 Jul, 2009 1 commit

Btrfs: honor nodatacow/sum mount options for new files · 94272164

Chris Mason authored 16 years ago

The btrfs attr patches unconditionally inherited the inode flags field
without honoring nodatacow and nodatasum. This fix makes sure
we properly record the nodatacow/sum mount options in new inodes.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

94272164