- 12 May, 2009 1 commit
-
-
Kazuhisa Ichikawa authored
Current bio_vec array index out-of-bounds test within __end_that_request_first() does not seem correct. It checks bio->bi_idx against bio->bi_vcnt, but the subsequent code uses idx (which is bio->bi_idx + next_idx) as the array index into the bio_vec array. This means that the test really makes sense only on the first iteration of the !(nr_bytes >= bio->bi_size) case (when next_idx == zero). Fix this by replacing bio->bi_idx with idx. (This patch applies to 2.6.30-rc4.) Signed-off-by:
Kazuhisa Ichikawa <ki@epsilou.com> Signed-off-by:
Jens Axboe <jens.axboe@oracle.com>
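A sketch of the fixed test, with variable names following the commit text and the surrounding loop elided:

    int idx = bio->bi_idx + next_idx;

    /* idx, not bio->bi_idx, indexes the bio_vec array below, so idx
     * is what the bounds test must check. */
    if (unlikely(idx >= bio->bi_vcnt))  /* was: bio->bi_idx >= bio->bi_vcnt */
            break;

    bvec = &bio->bi_io_vec[idx];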
-
- 24 Apr, 2009 5 commits
-
-
Jens Axboe authored
Currently we look it up from ->ioprio, but ->ioprio can change if either the process gets its IO priority changed explicitly, or if cfq decides to temporarily boost it. So if we are unlucky, we can end up attempting to remove a node from a different rbtree root than where it was added. Fix this by using ->org_ioprio as the prio_tree index, since that will only change for explicit IO priority settings (not for a boost). Additionally, cache the rbtree root inside the cfqq; then we don't have to add code to reinsert the cfqq in the prio_tree if the IO priority changes. Signed-off-by:
Jens Axboe <jens.axboe@oracle.com>
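A sketch of the shape of the fix; org_ioprio and the per-priority trees are from the commit text, while the cached-root field name p_root is an assumption:

    /* Index by the unboosted priority and remember which root was
     * used, so removal never depends on a value that may have changed
     * in the meantime. The field name p_root is illustrative. */
    cfqq->p_root = &cfqd->prio_trees[cfqq->org_ioprio];
    cfq_prio_tree_add(cfqd, cfqq);

    /* On removal, use the cached root rather than recomputing it: */
    if (cfqq->p_root) {
            rb_erase(&cfqq->p_node, cfqq->p_root);
            cfqq->p_root = NULL;
    }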
-
Jens Axboe authored
cfq_prio_tree_lookup() should return the direct match, yet it always returns zero. Fix that. cfq_prio_tree_add() assumes that we don't get a direct match, while it is very possible that we do. Using O_DIRECT, you can have different cfqqs with matching requests, since you don't have the page cache to serialize things for you. Fix this bug by only adding the cfqq if there isn't an existing match. Signed-off-by:
Jens Axboe <jens.axboe@oracle.com>
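A sketch of both fixes; the cfq signatures are paraphrased, and cfqq_sector() is an illustrative accessor for the queue's sort key:

    /* Lookup: walk the rbtree and hand back the exact match instead
     * of falling through to NULL. */
    while (*p) {
            parent = *p;
            cfqq = rb_entry(parent, struct cfq_queue, p_node);

            if (sector < cfqq_sector(cfqq))
                    p = &(*p)->rb_left;
            else if (sector > cfqq_sector(cfqq))
                    p = &(*p)->rb_right;
            else
                    return cfqq;    /* was: fell through, returning NULL */
    }
    return NULL;

    /* Add: with O_DIRECT, two cfqqs can sort to the same position, so
     * only insert when the lookup found nothing. */
    __cfqq = cfq_prio_tree_lookup(cfqd, root, sector, &parent, &p);
    if (!__cfqq) {
            rb_link_node(&cfqq->p_node, parent, p);
            rb_insert_color(&cfqq->p_node, root);
    }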
-
Jens Axboe authored
Not strictly needed, but we should make it clear that we init the rbtree roots here. Signed-off-by:
Jens Axboe <jens.axboe@oracle.com>
-
Hannes Reinecke authored
Very rarely under stress testing of dm, oopses are occurring as something tampers with an old stack frame. This has been traced back to blk_abort_queue() leaving a timeout_list pointing to the stack. The reason is that sometimes blk_abort_request() won't delete the timer (if the request has been marked complete but the timer has not yet been removed, a small race window). Fix this by splicing entries left on the usually empty local list back to q->timeout_list. Signed-off-by:
Hannes Reinecke <hare@suse.de> Signed-off-by:
Jens Axboe <jens.axboe@oracle.com>
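Roughly what the splice-back looks like; a sketch assuming 2.6.30-era list APIs, trimmed to the relevant part of blk_abort_queue():

    struct request *rq, *tmp;
    LIST_HEAD(list);

    /* Pull pending timeouts onto a stack-local list... */
    list_splice_init(&q->timeout_list, &list);

    list_for_each_entry_safe(rq, tmp, &list, timeout_list)
            blk_abort_request(rq);

    /*
     * blk_abort_request() can skip an entry whose completion raced
     * with the timer; splice survivors back so nothing is left
     * pointing at this stack frame after we return.
     */
    list_splice(&list, &q->timeout_list);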
-
Jerome Marchand authored
This simplifies I/O stat accounting switching code and separates it completely from I/O scheduler switch code. Requests are accounted according to the state of their request queue at the time of the request allocation. There is no need anymore to flush the request queue when switching I/O accounting state. Signed-off-by:
Jerome Marchand <jmarchan@redhat.com> Signed-off-by:
Jens Axboe <jens.axboe@oracle.com>
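A sketch of the per-request accounting stamp the patch introduces; REQ_IO_STAT and blk_queue_io_stat() are names from this patch series, and account_io_completion() is an illustrative helper:

    /* At allocation, snapshot the queue's accounting state into the
     * request itself... */
    if (blk_queue_io_stat(q))
            rq->cmd_flags |= REQ_IO_STAT;

    /* ...and at completion, consult the request, not the queue, so a
     * sysfs toggle mid-flight can't unbalance the statistics. */
    if (blk_rq_io_stat(rq))
            account_io_completion(rq);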
-
- 22 Apr, 2009 6 commits
-
-
Jeff Moyer authored
If the cfq io context doesn't have enough samples yet to provide a mean seek distance, then use the default threshold we have for seeky IO instead of defaulting to 0. Signed-off-by:
Jeff Moyer <jmoyer@redhat.com> Signed-off-by:
Jens Axboe <jens.axboe@oracle.com>
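A sketch of the fallback; sample_valid(), seek_samples, and the CFQQ_SEEK_THR default follow the cfq code of that era:

    sector_t sdist = CFQQ_SEEK_THR;     /* default seeky-IO threshold */

    /* Only trust the mean once enough samples have accumulated;
     * otherwise a mean of 0 would make every queue look perfectly
     * sequential. */
    if (sample_valid(cic->seek_samples))
            sdist = cic->seek_mean;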
-
Jeff Moyer authored
Right now, depending on the first sector to which a process issues I/O, the seek time may start out way out of whack. So make sure we start with 0 sectors in seek, instead of the offset of the first request issued. Signed-off-by:
Jeff Moyer <jmoyer@redhat.com> Signed-off-by:
Jens Axboe <jens.axboe@oracle.com>
-
Jens Axboe authored
There's nothing to do for those devices, since the timeout handling is based on requests. Signed-off-by:
Jens Axboe <jens.axboe@oracle.com>
-
Tejun Heo authored
/proc/diskstats used to show stats for all disks, whether zero-sized or not, and for their non-zero partitions. Commit 074a7aca accidentally changed the behavior such that it doesn't print out zero-sized disks. This patch implements the DISK_PITER_INCL_EMPTY_PART0 flag for the partition iterator and uses it in diskstats_show() such that an empty part0 is shown in /proc/diskstats. Reported and bisected by Daniel Collins. Signed-off-by:
Tejun Heo <tj@kernel.org> Reported-by:
Daniel Collins <solemnwarning@solemnwarning.no-ip.org> Signed-off-by:
Jens Axboe <jens.axboe@oracle.com>
-
Tejun Heo authored
Impact: don't set GFP_DMA in q->bounce_gfp unnecessarily All DMA address limits are expressed in terms of the last addressable unit (byte or page) instead of one plus that. However, when determining bounce_gfp for 64bit machines in blk_queue_bounce_limit(), it compares the specified limit against 0x100000000UL to determine whether it's below 4G, ending up falsely setting GFP_DMA in q->bounce_gfp. As the DMA zone is very small on x86_64, this makes larger SG_IO transfers very eager to trigger the OOM killer. Fix it. While at it, rename the parameter to @dma_mask for clarity and convert the comment to proper winged style. Signed-off-by:
Tejun Heo <tj@kernel.org> Signed-off-by:
Jens Axboe <jens.axboe@oracle.com>
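The off-by-one is easy to see in isolation. This stand-alone snippet (illustrative, not the kernel code) shows why a full 32-bit mask trips the old comparison but not the fixed one:

    #include <stdio.h>

    int main(void)
    {
            /* A device that can address all of 4G: the mask names the
             * last addressable byte, not one past it. */
            unsigned long long dma_mask = 0xffffffffULL;

            /* Old test: treats the limit as exclusive, so a full
             * 32-bit mask is "below 4G" and GFP_DMA gets set. */
            printf("old: needs DMA zone? %s\n",
                   dma_mask < 0x100000000ULL ? "yes (wrong)" : "no");

            /* Fixed test: compares against the last addressable byte. */
            printf("new: needs DMA zone? %s\n",
                   dma_mask < 0xffffffffULL ? "yes" : "no (correct)");
            return 0;
    }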
-
Tejun Heo authored
Impact: fix SG_IO behavior such that it matches the documentation The SG_IO howto says that if ->dxfer_len and the sum of the iovec disagree, the shorter one wins. However, the current implementation returns -EINVAL for such cases. Trim the iovec if it's longer than ->dxfer_len. This patch uses the iov_*() helpers, which take struct iovec *, by casting struct sg_iovec * to it. sg_iovec is always identical to iovec and this will be further cleaned up with later patches. Signed-off-by:
Tejun Heo <tj@kernel.org> Signed-off-by:
Jens Axboe <jens.axboe@oracle.com>
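A stand-alone sketch of the trimming rule (the actual patch routes this through the iov_*() helpers it mentions):

    #include <stddef.h>
    #include <sys/uio.h>

    /* Shorten an iovec in place so it covers at most 'len' bytes;
     * returns the number of entries still in use. */
    static int trim_iovec(struct iovec *iov, int count, size_t len)
    {
            int i;

            for (i = 0; i < count && len; i++) {
                    if (iov[i].iov_len > len)
                            iov[i].iov_len = len;   /* clip last entry */
                    len -= iov[i].iov_len;
            }
            return i;
    }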
-
- 15 Apr, 2009 11 commits
-
-
Jens Axboe authored
If we have processes that are working in close proximity to each other on disk, we don't want to idle wait. Instead allow the close process to issue a request, getting better aggregate bandwidth. The anticipatory scheduler has similar checks; noop and deadline do not need it since they don't care about process <-> io mappings. The code for CFQ is a little more involved though, since we split request queues into per-process contexts. This fixes a performance problem with e.g. dump(8), since it uses several processes in some silly attempt to speed IO up. Even if dump(8) isn't really a valid case (it should be fixed by using CLONE_IO), there are other cases where we see close processes and where idling ends up hurting performance. Credit goes to Jeff Moyer <jmoyer@redhat.com> for writing the initial implementation. Signed-off-by:
Jens Axboe <jens.axboe@oracle.com>
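The heart of the change is a "closeness" test; this sketch paraphrases the cfq helper, with sample_valid() and seek_mean as in the cfq code:

    /* A request is close if it lands within the issuing context's
     * mean seek distance of the last completed position. */
    static int cfq_rq_close(struct cfq_data *cfqd, struct request *rq)
    {
            struct cfq_io_context *cic = cfqd->active_cic;

            if (!cic || !sample_valid(cic->seek_samples))
                    return 0;

            return cfq_dist_from_last(cfqd, rq) <= cic->seek_mean;
    }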
-
Jens Axboe authored
Makes it easier to read the traces. Signed-off-by:
Jens Axboe <jens.axboe@oracle.com>
-
Jens Axboe authored
We only kick the dispatch for an idling queue if we think it's a (somewhat) fully merged request. Also allow a kick if we have other busy queues in the system, since we don't want to risk waiting for a potential merge in that case. It's better to get some work done and proceed. Signed-off-by:
Jens Axboe <jens.axboe@oracle.com>
-
Jens Axboe authored
It's called from the workqueue handlers from process context, so we always have irqs enabled when entered. Signed-off-by:
Jens Axboe <jens.axboe@oracle.com>
-
Nikanth Karthikesan authored
Remove code handling bio_alloc failure with __GFP_WAIT. GFP_KERNEL implies __GFP_WAIT. Signed-off-by:
Nikanth Karthikesan <knikanth@suse.de> Signed-off-by:
Jens Axboe <jens.axboe@oracle.com>
-
FUJITA Tomonori authored
blk_rq_unmap_user() returns -EFAULT if a program passes an invalid address to kernel. SG_IO path needs to pass the returned value to user space instead of ignoring it. Signed-off-by:
FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Signed-off-by:
Jens Axboe <jens.axboe@oracle.com>
-
Jens Axboe authored
"Zhang, Yanmin" <yanmin_zhang@linux.intel.com> reports that commit b029195d introduced a regression of about 50% with sequential threaded read workloads. The test case is: tiotest -k0 -k1 -k3 -f 80 -t 32 which starts 32 threads each reading a 80MB file. Twiddle the kick queue logic so that we do start IO immediately, if it appears to be a fully merged request. We can't really detect that, so just check if the request is bigger than a page or not. The assumption is that since single bio issues will first queue a single request with just one page attached and then later do merges on that, if we already have more than a page worth of data in the request, then the request is most likely good to go. Verified that this doesn't cause a regression with the test case that commit b029195d was fixing. It does not, we still see maximum sized requests for the queue-then-merge cases. Signed-off-by:
Jens Axboe <jens.axboe@oracle.com>
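A sketch of the tweaked kick, under the assumption that the page-size check uses the request's byte count:

    if (blk_rq_bytes(rq) > PAGE_CACHE_SIZE || cfqd->busy_queues > 1) {
            /* Looks fully merged already, or the system has other
             * work pending: cancel idling and start dispatch now. */
            del_timer(&cfqd->idle_slice_timer);
            blk_start_queueing(q);
    }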
-
Jens Axboe authored
We can just use the block layer BLK_RW_SYNC/ASYNC defines now. Signed-off-by:
Jens Axboe <jens.axboe@oracle.com>
-
Jens Axboe authored
We can just use the block layer BLK_RW_SYNC/ASYNC defines now. Signed-off-by:
Jens Axboe <jens.axboe@oracle.com>
-
Jens Axboe authored
Signed-off-by:
Jens Axboe <jens.axboe@oracle.com>
-
Jens Axboe authored
Credit goes to Andrew Morton for spotting this one. Signed-off-by:
Jens Axboe <jens.axboe@oracle.com>
-
- 07 Apr, 2009 6 commits
-
-
Jens Axboe authored
When CFQ is waiting for a new request from a process, currently it'll immediately restart queuing when it sees such a request. This doesn't work very well with streamed IO, since we then end up splitting IO that would otherwise have been merged nicely. For a simple dd test, this causes 10x as many requests to be issued as we should have. Normally this goes unnoticed due to the low overhead of requests at the device side, but some hardware is very sensitive to request sizes and there it can cause big slowdowns. Signed-off-by:
Jens Axboe <jens.axboe@oracle.com>
-
Jens Axboe authored
The request inherits the unplug flag from the bio, but it isn't actually used. The bio flag stops at __make_request(), which tells it to unplug after submission. Passing it on to the request doesn't make any sense. Signed-off-by:
Jens Axboe <jens.axboe@oracle.com>
-
Jens Axboe authored
We only manipulate the must_dispatch and queue_new flags; they are not tested anymore. So get rid of them. Signed-off-by:
Jens Axboe <jens.axboe@oracle.com>
-
Jens Axboe authored
The IO scheduler core calls into the IO scheduler dispatch_request hook to move requests from the IO scheduler and into the driver dispatch list. It only does so when the dispatch list is empty. CFQ moves several requests to the dispatch list, which can cause higher latencies if we suddenly have to switch to some important sync IO. Change the logic to move one request at a time instead. This should be almost functionally equivalent to what we did before, except that we now honor 'quantum' as the maximum queue depth at the device side from any single cfqq. If there's just a single active cfqq, we allow up to 4 times the normal quantum. Signed-off-by:
Jens Axboe <jens.axboe@oracle.com>
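A sketch of the resulting depth guard; field names follow cfq:

    max_dispatch = cfqd->cfq_quantum;

    if (cfqq->dispatched >= max_dispatch) {
            /* With competition, hold the line at 'quantum'... */
            if (cfqd->busy_queues > 1)
                    return 0;
            /* ...alone on the device, allow up to 4x before backing
             * off. */
            if (cfqq->dispatched >= 4 * max_dispatch)
                    return 0;
    }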
-
Jerome Marchand authored
This forces in_flight to be zero when turning off or on the I/O stat accounting and stops updating I/O stats in attempt_merge() when accounting is turned off. Signed-off-by:
Jerome Marchand <jmarchan@redhat.com> Signed-off-by:
Jens Axboe <jens.axboe@oracle.com>
-
Jens Axboe authored
Simple helper functions to quiesce the request queue. These are currently only used for switching IO schedulers on-the-fly, but we can use them to properly switch IO accounting on and off as well. Signed-off-by:
Jerome Marchand <jmarchan@redhat.com> Signed-off-by:
Jens Axboe <jens.axboe@oracle.com>
-
- 06 Apr, 2009 4 commits
-
-
Alan Cox authored
Fix a typo (this was in the original patch but, for some reason, was not merged when the code fixes were). Signed-off-by:
Alan Cox <alan@redhat.com> Signed-off-by:
Jeff Garzik <jgarzik@redhat.com>
-
Jens Axboe authored
By default, CFQ will anticipate more IO from a given io context if the previously completed IO was sync. This used to be fine, since the only sync IO was reads and O_DIRECT writes. But with more "normal" sync writes being used now, we don't want to anticipate for those. Add a bio/request flag that informs the IO scheduler that this is a sync request that we should not idle for. Introduce WRITE_ODIRECT specifically for O_DIRECT writes, and make sure that the other sync writes set this flag. Signed-off-by:
Jens Axboe <jens.axboe@oracle.com> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
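A sketch of the resulting flag split; the exact BIO_RW_* composition is paraphrased from the description, with BIO_RW_NOIDLE as the new "don't idle" bit:

    /* Normal sync writes: sync, but tell the scheduler not to idle. */
    #define WRITE_SYNC    (WRITE | (1 << BIO_RW_SYNCIO) | \
                           (1 << BIO_RW_UNPLUG) | (1 << BIO_RW_NOIDLE))

    /* O_DIRECT writes keep the old behavior: sync, and worth idling
     * for. */
    #define WRITE_ODIRECT (WRITE | (1 << BIO_RW_SYNCIO) | \
                           (1 << BIO_RW_UNPLUG))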
-
Jens Axboe authored
For the older SSD devices that don't do command queuing, we do want to enable plugging to get better merging. Signed-off-by:
Jens Axboe <jens.axboe@oracle.com> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Jens Axboe authored
This makes sure that we never wait on async IO for sync requests, instead of doing the split on writes vs reads. Signed-off-by:
Jens Axboe <jens.axboe@oracle.com> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
- 03 Apr, 2009 1 commit
-
-
Li Zefan authored
Impact: output all of packet commands - not just the first 4 / 8 bytes Since commit d7e3c324 ("block: add large command support"), struct request->cmd has been changed from unsigned char cmd[BLK_MAX_CDB] to unsigned char *cmd. v1 -> v2: by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> - make sure rq->cmd_len is always initialized, and then we can use rq->cmd_len instead of BLK_MAX_CDB. Signed-off-by:
Li Zefan <lizf@cn.fujitsu.com> Acked-by:
FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jens Axboe <jens.axboe@oracle.com> LKML-Reference: <49D4507E.2060602@cn.fujitsu.com> Signed-off-by:
Ingo Molnar <mingo@elte.hu>
-
- 26 Mar, 2009 2 commits
-
-
Boaz Harrosh authored
bsg submits REQ_TYPE_BLOCK_PC so the right check is max_hw_sectors. But I've removed this check entirely because, right after it, bsg proceeds with calling blk_rq_map_user(), which does all the right checks. Signed-off-by:
Boaz Harrosh <bharrosh@panasas.com> Signed-off-by:
Jens Axboe <jens.axboe@oracle.com>
-
Boaz Harrosh authored
Put a WARN_ON in __blk_put_request if it is about to leak bio(s). This is a serious bug that can happen in error handling code paths. For this to work I have fixed a couple of places in block/ where request->bio != NULL ownership was not honored. And a small cleanup at sg_io() while at it. Signed-off-by:
Boaz Harrosh <bharrosh@panasas.com> Signed-off-by:
Jens Axboe <jens.axboe@oracle.com>
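The check itself is one line; a sketch of what lands in __blk_put_request():

    /* A request being freed must no longer own bios; a non-NULL ->bio
     * here means an error path dropped the request without completing
     * or detaching them. */
    WARN_ON(req->bio != NULL);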
-
- 24 Mar, 2009 3 commits
-
-
Boaz Harrosh authored
Currently, as inherited from sg.c, bsg submits asynchronous requests at the head of the queue (using "at_head" set in the call to blk_execute_rq_nowait()). This is bad in situations where the queues are full: requests will execute out of order, and that can starve the first submitted requests. The sg_io_v4->flags member is used and a bit is allocated to denote Q_AT_TAIL. Zero means queue at_head as before, to stay compatible with old code at the write/read path. The SG_IO code path behavior was changed to match write/read behavior; SG_IO was very rarely used and breaking compatibility with it is OK at this stage. sg_io_hdr in sg.h also has a flags member and uses three bits from the first nibble and one bit from the last nibble. Even though none of these bits are supported by bsg, the second nibble is allocated for use by bsg, just in case. Signed-off-by:
Boaz Harrosh <bharrosh@panasas.com> CC: Douglas Gilbert <dgilbert@interlog.com> Signed-off-by:
Jens Axboe <jens.axboe@oracle.com>
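A sketch of the flag wiring in the submission path; BSG_FLAG_Q_AT_TAIL is the bit described above, and blk_execute_rq_nowait() takes at_head directly:

    int at_head = 1;    /* old behavior, and the flags == 0 default */

    if (hdr->flags & BSG_FLAG_Q_AT_TAIL)
            at_head = 0;

    blk_execute_rq_nowait(q, bd_disk, rq, at_head, bsg_rq_end_io);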
-
Jens Axboe authored
Signed-off-by:
Jens Axboe <jens.axboe@oracle.com>
-
Jens Axboe authored
It calls blk_queue_make_request(), which sets the identical set of limits. Signed-off-by:
Jens Axboe <jens.axboe@oracle.com>
-
- 13 Mar, 2009 1 commit
-
-
Rusty Russell authored
Impact: cleanup This is presumably what those definitions are for, and while all archs define cpu_core_map/cpu_sibling_map, that's changing (e.g. x86 wants to change it to a pointer). Signed-off-by:
Rusty Russell <rusty@rustcorp.com.au>
-