- 12 May, 2009 1 commit
-
-
Kazuhisa Ichikawa authored
Current bio_vec array index out-of-bounds test within __end_that_request_first() does not seem correct. It checks bio->bi_idx against bio->bi_vcnt, but the subsequent code uses idx (which is bio->bi_idx + next_idx) as the array index into the bio_vec array. This means that the test really makes sense only on the first iteration of the !(nr_bytes >= bio->bi_size) case (when next_idx == zero). Fix this by replacing bio->bi_idx with idx. (This patch applies to 2.6.30-rc4.) Signed-off-by:
Kazuhisa Ichikawa <ki@epsilou.com> Signed-off-by:
Jens Axboe <jens.axboe@oracle.com>
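A sketch of the fixed test, with variable names following the commit text and the surrounding loop elided:

    int idx = bio->bi_idx + next_idx;

    /* idx, not bio->bi_idx, indexes the bio_vec array below, so idx
     * is what the bounds test must check. */
    if (unlikely(idx >= bio->bi_vcnt))  /* was: bio->bi_idx >= bio->bi_vcnt */
            break;

    bvec = &bio->bi_io_vec[idx];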
-
- 24 Apr, 2009 5 commits
-
-
Jens Axboe authored
Currently we look it up from ->ioprio, but ->ioprio can change if either the process gets its IO priority changed explicitly, or if cfq decides to temporarily boost it. So if we are unlucky, we can end up attempting to remove a node from a different rbtree root than where it was added. Fix this by using ->org_ioprio as the prio_tree index, since that will only change for explicit IO priority settings (not for a boost). Additionally, cache the rbtree root inside the cfqq; then we don't have to add code to reinsert the cfqq in the prio_tree if the IO priority changes. Signed-off-by:
Jens Axboe <jens.axboe@oracle.com>
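A sketch of the shape of the fix; org_ioprio and the per-priority trees are from the commit text, while the cached-root field name p_root is an assumption:

    /* Index by the unboosted priority and remember which root was
     * used, so removal never depends on a value that may have changed
     * in the meantime. The field name p_root is illustrative. */
    cfqq->p_root = &cfqd->prio_trees[cfqq->org_ioprio];
    cfq_prio_tree_add(cfqd, cfqq);

    /* On removal, use the cached root rather than recomputing it: */
    if (cfqq->p_root) {
            rb_erase(&cfqq->p_node, cfqq->p_root);
            cfqq->p_root = NULL;
    }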
-
Jens Axboe authored
cfq_prio_tree_lookup() should return the direct match, yet it always returns zero. Fix that. cfq_prio_tree_add() assumes that we don't get a direct match, while it is very possible that we do. Using O_DIRECT, you can have different cfqqs with matching requests, since you don't have the page cache to serialize things for you. Fix this bug by only adding the cfqq if there isn't an existing match. Signed-off-by:
Jens Axboe <jens.axboe@oracle.com>
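A sketch of both fixes; the cfq signatures are paraphrased, and cfqq_sector() is an illustrative accessor for the queue's sort key:

    /* Lookup: walk the rbtree and hand back the exact match instead
     * of falling through to NULL. */
    while (*p) {
            parent = *p;
            cfqq = rb_entry(parent, struct cfq_queue, p_node);

            if (sector < cfqq_sector(cfqq))
                    p = &(*p)->rb_left;
            else if (sector > cfqq_sector(cfqq))
                    p = &(*p)->rb_right;
            else
                    return cfqq;    /* was: fell through, returning NULL */
    }
    return NULL;

    /* Add: with O_DIRECT, two cfqqs can sort to the same position, so
     * only insert when the lookup found nothing. */
    __cfqq = cfq_prio_tree_lookup(cfqd, root, sector, &parent, &p);
    if (!__cfqq) {
            rb_link_node(&cfqq->p_node, parent, p);
            rb_insert_color(&cfqq->p_node, root);
    }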
-
Jens Axboe authored
Not strictly needed, but we should make it clear that we init the rbtree roots here. Signed-off-by:
Jens Axboe <jens.axboe@oracle.com>
-
Hannes Reinecke authored
Very rarely under stress testing of dm, oopses are occurring as something tampers with an old stack frame. This has been traced back to blk_abort_queue() leaving a timeout_list pointing to the stack. The reason is that sometimes blk_abort_request() won't delete the timer (if the request has been marked complete but the timer has not yet been removed, a small race window). Fix this by splicing entries left on the usually empty local list back to q->timeout_list. Signed-off-by:
Hannes Reinecke <hare@suse.de> Signed-off-by:
Jens Axboe <jens.axboe@oracle.com>
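Roughly what the splice-back looks like; a sketch assuming 2.6.30-era list APIs, trimmed to the relevant part of blk_abort_queue():

    struct request *rq, *tmp;
    LIST_HEAD(list);

    /* Pull pending timeouts onto a stack-local list... */
    list_splice_init(&q->timeout_list, &list);

    list_for_each_entry_safe(rq, tmp, &list, timeout_list)
            blk_abort_request(rq);

    /*
     * blk_abort_request() can skip an entry whose completion raced
     * with the timer; splice survivors back so nothing is left
     * pointing at this stack frame after we return.
     */
    list_splice(&list, &q->timeout_list);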
-
Jerome Marchand authored
This simplifies I/O stat accounting switching code and separates it completely from I/O scheduler switch code. Requests are accounted according to the state of their request queue at the time of the request allocation. There is no need anymore to flush the request queue when switching I/O accounting state. Signed-off-by:
Jerome Marchand <jmarchan@redhat.com> Signed-off-by:
Jens Axboe <jens.axboe@oracle.com>
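A sketch of the per-request accounting stamp the patch introduces; REQ_IO_STAT and blk_queue_io_stat() are names from this patch series, and account_io_completion() is an illustrative helper:

    /* At allocation, snapshot the queue's accounting state into the
     * request itself... */
    if (blk_queue_io_stat(q))
            rq->cmd_flags |= REQ_IO_STAT;

    /* ...and at completion, consult the request, not the queue, so a
     * sysfs toggle mid-flight can't unbalance the statistics. */
    if (blk_rq_io_stat(rq))
            account_io_completion(rq);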
-
- 22 Apr, 2009 6 commits
-
-
Jeff Moyer authored
If the cfq io context doesn't have enough samples yet to provide a mean seek distance, then use the default threshold we have for seeky IO instead of defaulting to 0. Signed-off-by:
Jeff Moyer <jmoyer@redhat.com> Signed-off-by:
Jens Axboe <jens.axboe@oracle.com>
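A sketch of the fallback; sample_valid(), seek_samples, and the CFQQ_SEEK_THR default follow the cfq code of that era:

    sector_t sdist = CFQQ_SEEK_THR;     /* default seeky-IO threshold */

    /* Only trust the mean once enough samples have accumulated;
     * otherwise a mean of 0 would make every queue look perfectly
     * sequential. */
    if (sample_valid(cic->seek_samples))
            sdist = cic->seek_mean;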
-
Jeff Moyer authored
Right now, depending on the first sector to which a process issues I/O, the seek time may start out way out of whack. So make sure we start with 0 sectors in seek, instead of the offset of the first request issued. Signed-off-by:
Jeff Moyer <jmoyer@redhat.com> Signed-off-by:
Jens Axboe <jens.axboe@oracle.com>
-
Jens Axboe authored
There's nothing to do for those devices, since the timeout handling is based on requests. Signed-off-by:
Jens Axboe <jens.axboe@oracle.com>
-
Tejun Heo authored
/proc/diskstats used to show stats for all disks, whether zero-sized or not, and for their non-zero partitions. Commit 074a7aca accidentally changed the behavior such that it doesn't print out zero-sized disks. This patch implements the DISK_PITER_INCL_EMPTY_PART0 flag for the partition iterator and uses it in diskstats_show() such that an empty part0 is shown in /proc/diskstats. Reported and bisected by Daniel Collins. Signed-off-by:
Tejun Heo <tj@kernel.org> Reported-by:
Daniel Collins <solemnwarning@solemnwarning.no-ip.org> Signed-off-by:
Jens Axboe <jens.axboe@oracle.com>
-
Tejun Heo authored
Impact: don't set GFP_DMA in q->bounce_gfp unnecessarily All DMA address limits are expressed in terms of the last addressable unit (byte or page) instead of one plus that. However, when determining bounce_gfp for 64bit machines in blk_queue_bounce_limit(), it compares the specified limit against 0x100000000UL to determine whether it's below 4G, ending up falsely setting GFP_DMA in q->bounce_gfp. As the DMA zone is very small on x86_64, this makes larger SG_IO transfers very eager to trigger the OOM killer. Fix it. While at it, rename the parameter to @dma_mask for clarity and convert the comment to proper winged style. Signed-off-by:
Tejun Heo <tj@kernel.org> Signed-off-by:
Jens Axboe <jens.axboe@oracle.com>
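The off-by-one is easy to see in isolation. This stand-alone snippet (illustrative, not the kernel code) shows why a full 32-bit mask trips the old comparison but not the fixed one:

    #include <stdio.h>

    int main(void)
    {
            /* A device that can address all of 4G: the mask names the
             * last addressable byte, not one past it. */
            unsigned long long dma_mask = 0xffffffffULL;

            /* Old test: treats the limit as exclusive, so a full
             * 32-bit mask is "below 4G" and GFP_DMA gets set. */
            printf("old: needs DMA zone? %s\n",
                   dma_mask < 0x100000000ULL ? "yes (wrong)" : "no");

            /* Fixed test: compares against the last addressable byte. */
            printf("new: needs DMA zone? %s\n",
                   dma_mask < 0xffffffffULL ? "yes" : "no (correct)");
            return 0;
    }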
-
Tejun Heo authored
Impact: fix SG_IO behavior such that it matches the documentation The SG_IO howto says that if ->dxfer_len and the sum of the iovec disagree, the shorter one wins. However, the current implementation returns -EINVAL for such cases. Trim the iovec if it's longer than ->dxfer_len. This patch uses the iov_*() helpers, which take struct iovec *, by casting struct sg_iovec * to it. sg_iovec is always identical to iovec and this will be further cleaned up with later patches. Signed-off-by:
Tejun Heo <tj@kernel.org> Signed-off-by:
Jens Axboe <jens.axboe@oracle.com>
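A stand-alone sketch of the trimming rule (the actual patch routes this through the iov_*() helpers it mentions):

    #include <stddef.h>
    #include <sys/uio.h>

    /* Shorten an iovec in place so it covers at most 'len' bytes;
     * returns the number of entries still in use. */
    static int trim_iovec(struct iovec *iov, int count, size_t len)
    {
            int i;

            for (i = 0; i < count && len; i++) {
                    if (iov[i].iov_len > len)
                            iov[i].iov_len = len;   /* clip last entry */
                    len -= iov[i].iov_len;
            }
            return i;
    }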
-
- 15 Apr, 2009 11 commits
-
-
Jens Axboe authored
If we have processes that are working in close proximity to each other on disk, we don't want to idle wait. Instead allow the close process to issue a request, getting better aggregate bandwidth. The anticipatory scheduler has similar checks; noop and deadline do not need it since they don't care about process <-> io mappings. The code for CFQ is a little more involved though, since we split request queues into per-process contexts. This fixes a performance problem with e.g. dump(8), since it uses several processes in some silly attempt to speed IO up. Even if dump(8) isn't really a valid case (it should be fixed by using CLONE_IO), there are other cases where we see close processes and where idling ends up hurting performance. Credit goes to Jeff Moyer <jmoyer@redhat.com> for writing the initial implementation. Signed-off-by:
Jens Axboe <jens.axboe@oracle.com>
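The heart of the change is a "closeness" test; this sketch paraphrases the cfq helper, with sample_valid() and seek_mean as in the cfq code:

    /* A request is close if it lands within the issuing context's
     * mean seek distance of the last completed position. */
    static int cfq_rq_close(struct cfq_data *cfqd, struct request *rq)
    {
            struct cfq_io_context *cic = cfqd->active_cic;

            if (!cic || !sample_valid(cic->seek_samples))
                    return 0;

            return cfq_dist_from_last(cfqd, rq) <= cic->seek_mean;
    }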
-
Jens Axboe authored
Makes it easier to read the traces. Signed-off-by:
Jens Axboe <jens.axboe@oracle.com>
-
Jens Axboe authored
We only kick the dispatch for an idling queue if we think it's a (somewhat) fully merged request. Also allow a kick if we have other busy queues in the system, since we don't want to risk waiting for a potential merge in that case. It's better to get some work done and proceed. Signed-off-by:
Jens Axboe <jens.axboe@oracle.com>
-
Jens Axboe authored
It's called from the workqueue handlers from process context, so we always have irqs enabled when entered. Signed-off-by:
Jens Axboe <jens.axboe@oracle.com>
-
Nikanth Karthikesan authored
Remove code handling bio_alloc failure with __GFP_WAIT. GFP_KERNEL implies __GFP_WAIT. Signed-off-by:
Nikanth Karthikesan <knikanth@suse.de> Signed-off-by:
Jens Axboe <jens.axboe@oracle.com>
-
FUJITA Tomonori authored
blk_rq_unmap_user() returns -EFAULT if a program passes an invalid address to kernel. SG_IO path needs to pass the returned value to user space instead of ignoring it. Signed-off-by:
FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Signed-off-by:
Jens Axboe <jens.axboe@oracle.com>
-
Jens Axboe authored
"Zhang, Yanmin" <yanmin_zhang@linux.intel.com> reports that commit b029195d introduced a regression of about 50% with sequential threaded read workloads. The test case is: tiotest -k0 -k1 -k3 -f 80 -t 32 which starts 32 threads each reading a 80MB file. Twiddle the kick queue logic so that we do start IO immediately, if it appears to be a fully merged request. We can't really detect that, so just check if the request is bigger than a page or not. The assumption is that since single bio issues will first queue a single request with just one page attached and then later do merges on that, if we already have more than a page worth of data in the request, then the request is most likely good to go. Verified that this doesn't cause a regression with the test case that commit b029195d was fixing. It does not, we still see maximum sized requests for the queue-then-merge cases. Signed-off-by:
Jens Axboe <jens.axboe@oracle.com>
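A sketch of the tweaked kick, under the assumption that the page-size check uses the request's byte count:

    if (blk_rq_bytes(rq) > PAGE_CACHE_SIZE || cfqd->busy_queues > 1) {
            /* Looks fully merged already, or the system has other
             * work pending: cancel idling and start dispatch now. */
            del_timer(&cfqd->idle_slice_timer);
            blk_start_queueing(q);
    }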
-
Jens Axboe authored
We can just use the block layer BLK_RW_SYNC/ASYNC defines now. Signed-off-by:
Jens Axboe <jens.axboe@oracle.com>
-
Jens Axboe authored
We can just use the block layer BLK_RW_SYNC/ASYNC defines now. Signed-off-by:
Jens Axboe <jens.axboe@oracle.com>
-
Jens Axboe authored
Signed-off-by:
Jens Axboe <jens.axboe@oracle.com>
-
Jens Axboe authored
Credit goes to Andrew Morton for spotting this one. Signed-off-by:
Jens Axboe <jens.axboe@oracle.com>
-
- 07 Apr, 2009 6 commits
-
-
Jens Axboe authored
When CFQ is waiting for a new request from a process, currently it'll immediately restart queuing when it sees such a request. This doesn't work very well with streamed IO, since we then end up splitting IO that would otherwise have been merged nicely. For a simple dd test, this causes 10x as many requests to be issued as we should have. Normally this goes unnoticed due to the low overhead of requests at the device side, but some hardware is very sensitive to request sizes and there it can cause big slowdowns. Signed-off-by:
Jens Axboe <jens.axboe@oracle.com>
-
Jens Axboe authored
The request inherits the unplug flag from the bio, but it isn't actually used. The bio flag stops at __make_request(), which tells it to unplug after submission. Passing it on to the request doesn't make any sense. Signed-off-by:
Jens Axboe <jens.axboe@oracle.com>
-
Jens Axboe authored
We only manipulate the must_dispatch and queue_new flags; they are not tested anymore. So get rid of them. Signed-off-by:
Jens Axboe <jens.axboe@oracle.com>
-
Jens Axboe authored
The IO scheduler core calls into the IO scheduler dispatch_request hook to move requests from the IO scheduler and into the driver dispatch list. It only does so when the dispatch list is empty. CFQ moves several requests to the dispatch list, which can cause higher latencies if we suddenly have to switch to some important sync IO. Change the logic to move one request at a time instead. This should be almost functionally equivalent to what we did before, except that we now honor 'quantum' as the maximum queue depth at the device side from any single cfqq. If there's just a single active cfqq, we allow up to 4 times the normal quantum. Signed-off-by:
Jens Axboe <jens.axboe@oracle.com>
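A sketch of the resulting depth guard; field names follow cfq:

    max_dispatch = cfqd->cfq_quantum;

    if (cfqq->dispatched >= max_dispatch) {
            /* With competition, hold the line at 'quantum'... */
            if (cfqd->busy_queues > 1)
                    return 0;
            /* ...alone on the device, allow up to 4x before backing
             * off. */
            if (cfqq->dispatched >= 4 * max_dispatch)
                    return 0;
    }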
-
Jerome Marchand authored
This forces in_flight to be zero when turning off or on the I/O stat accounting and stops updating I/O stats in attempt_merge() when accounting is turned off. Signed-off-by:
Jerome Marchand <jmarchan@redhat.com> Signed-off-by:
Jens Axboe <jens.axboe@oracle.com>
-
Jens Axboe authored
Simple helper functions to quiesce the request queue. These are currently only used for switching IO schedulers on-the-fly, but we can use them to properly switch IO accounting on and off as well. Signed-off-by:
Jerome Marchand <jmarchan@redhat.com> Signed-off-by:
Jens Axboe <jens.axboe@oracle.com>
-
- 06 Apr, 2009 4 commits
-
-
Alan Cox authored
Fix a typo (this was in the original patch but, for some reason, was not merged when the code fixes were). Signed-off-by:
Alan Cox <alan@redhat.com> Signed-off-by:
Jeff Garzik <jgarzik@redhat.com>
-
Jens Axboe authored
By default, CFQ will anticipate more IO from a given io context if the previously completed IO was sync. This used to be fine, since the only sync IO was reads and O_DIRECT writes. But with more "normal" sync writes being used now, we don't want to anticipate for those. Add a bio/request flag that informs the IO scheduler that this is a sync request that we should not idle for. Introduce WRITE_ODIRECT specifically for O_DIRECT writes, and make sure that the other sync writes set this flag. Signed-off-by:
Jens Axboe <jens.axboe@oracle.com> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
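A sketch of the resulting flag split; the exact BIO_RW_* composition is paraphrased from the description, with BIO_RW_NOIDLE as the new "don't idle" bit:

    /* Normal sync writes: sync, but tell the scheduler not to idle. */
    #define WRITE_SYNC    (WRITE | (1 << BIO_RW_SYNCIO) | \
                           (1 << BIO_RW_UNPLUG) | (1 << BIO_RW_NOIDLE))

    /* O_DIRECT writes keep the old behavior: sync, and worth idling
     * for. */
    #define WRITE_ODIRECT (WRITE | (1 << BIO_RW_SYNCIO) | \
                           (1 << BIO_RW_UNPLUG))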
-
Jens Axboe authored
For the older SSD devices that don't do command queuing, we do want to enable plugging to get better merging. Signed-off-by:
Jens Axboe <jens.axboe@oracle.com> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Jens Axboe authored
This makes sure that we never wait on async IO for sync requests, instead of doing the split on writes vs reads. Signed-off-by:
Jens Axboe <jens.axboe@oracle.com> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
- 03 Apr, 2009 1 commit
-
-
Li Zefan authored
Impact: output all of packet commands - not just the first 4 / 8 bytes Since commit d7e3c324 ("block: add large command support"), struct request->cmd has been changed from unsigned char cmd[BLK_MAX_CDB] to unsigned char *cmd. v1 -> v2: by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> - make sure rq->cmd_len is always initialized, and then we can use rq->cmd_len instead of BLK_MAX_CDB. Signed-off-by:
Li Zefan <lizf@cn.fujitsu.com> Acked-by:
FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jens Axboe <jens.axboe@oracle.com> LKML-Reference: <49D4507E.2060602@cn.fujitsu.com> Signed-off-by:
Ingo Molnar <mingo@elte.hu>
-
- 26 Mar, 2009 2 commits
-
-
Boaz Harrosh authored
bsg submits REQ_TYPE_BLOCK_PC so the right check is max_hw_sectors. But I've removed this check entirely because, right after it, bsg proceeds with calling blk_rq_map_user(), which does all the right checks. Signed-off-by:
Boaz Harrosh <bharrosh@panasas.com> Signed-off-by:
Jens Axboe <jens.axboe@oracle.com>
-
Boaz Harrosh authored
Put a WARN_ON in __blk_put_request if it is about to leak bio(s). This is a serious bug that can happen in error handling code paths. For this to work I have fixed a couple of places in block/ where request->bio != NULL ownership was not honored. And a small cleanup at sg_io() while at it. Signed-off-by:
Boaz Harrosh <bharrosh@panasas.com> Signed-off-by:
Jens Axboe <jens.axboe@oracle.com>
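The check itself is one line; a sketch of what lands in __blk_put_request():

    /* A request being freed must no longer own bios; a non-NULL ->bio
     * here means an error path dropped the request without completing
     * or detaching them. */
    WARN_ON(req->bio != NULL);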
-
- 24 Mar, 2009 3 commits
-
-
Boaz Harrosh authored
Currently, as inherited from sg.c, bsg submits asynchronous requests at the head of the queue (using "at_head" set in the call to blk_execute_rq_nowait()). This is bad in situations where the queues are full: requests will execute out of order, and that can starve the first submitted requests. The sg_io_v4->flags member is used and a bit is allocated to denote Q_AT_TAIL. Zero means queue at_head as before, to stay compatible with old code at the write/read path. The SG_IO code path behavior was changed to match write/read behavior; SG_IO was very rarely used and breaking compatibility with it is OK at this stage. sg_io_hdr in sg.h also has a flags member and uses three bits from the first nibble and one bit from the last nibble. Even though none of these bits are supported by bsg, the second nibble is allocated for use by bsg, just in case. Signed-off-by:
Boaz Harrosh <bharrosh@panasas.com> CC: Douglas Gilbert <dgilbert@interlog.com> Signed-off-by:
Jens Axboe <jens.axboe@oracle.com>
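A sketch of the flag wiring in the submission path; BSG_FLAG_Q_AT_TAIL is the bit described above, and blk_execute_rq_nowait() takes at_head directly:

    int at_head = 1;    /* old behavior, and the flags == 0 default */

    if (hdr->flags & BSG_FLAG_Q_AT_TAIL)
            at_head = 0;

    blk_execute_rq_nowait(q, bd_disk, rq, at_head, bsg_rq_end_io);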
-
Jens Axboe authored
Signed-off-by:
Jens Axboe <jens.axboe@oracle.com>
-
Jens Axboe authored
It calls blk_queue_make_request(), which sets the identical set of limits. Signed-off-by:
Jens Axboe <jens.axboe@oracle.com>
-
- 13 Mar, 2009 1 commit
-
-
Rusty Russell authored
Impact: cleanup This is presumably what those definitions are for, and while all archs define cpu_core_map/cpu_sibling_map, that's changing (e.g. x86 wants to change it to a pointer). Signed-off-by:
Rusty Russell <rusty@rustcorp.com.au>
-