- 22 Jun, 2009 1 commit
-
-
Mike Snitzer authored
Add .iterate_devices to 'struct target_type' to allow a function to be called for all devices in a DM target. Implemented it for all targets except those in dm-snap.c (origin and snapshot). (The raid1 version number jumps to 1.12 because we originally reserved 1.1 to 1.11 for 'block_on_error' but ended up using 'handle_errors' instead.) Signed-off-by:
Mike Snitzer <snitzer@redhat.com> Signed-off-by:
Alasdair G Kergon <agk@redhat.com> Cc: martin.petersen@oracle.com
-
- 15 Apr, 2009 1 commit
-
-
Christoph Hellwig authored
It's used by DM and MD and generally useful, so move the bio list helpers into bio.h. Signed-off-by:
Christoph Hellwig <hch@lst.de> Acked-by:
Alasdair G Kergon <agk@redhat.com> Signed-off-by:
Jens Axboe <jens.axboe@oracle.com>
-
- 02 Apr, 2009 2 commits
-
-
Jonathan Brassow authored
The logging API needs an extra function to make cluster mirroring possible. This new function allows us to check whether a mirror region is being recovered on another machine in the cluster. This helps us prevent simultaneous recovery I/O and process I/O to the same locations on disk. Cluster-aware log modules will implement this function. Single machine log modules will not. So, there is no performance penalty for single machine mirrors. Signed-off-by:
Jonathan Brassow <jbrassow@redhat.com> Acked-by:
Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by:
Alasdair G Kergon <agk@redhat.com>
-
Mikulas Patocka authored
With my previous patch to save bi_io_vec, the size of dm_raid1_read_record is significantly increased (the vector list takes 3072 bytes on 32-bit machines and 4096 bytes on 64-bit machines). The structure dm_raid1_read_record used to be allocated with kmalloc, but kmalloc aligns the size on the next power-of-two so an object slightly greater than 4096 will allocate 8192 bytes of memory and half of that memory will be wasted. This patch turns kmalloc into a slab cache which doesn't have this padding so it will reduce the memory consumed. Cc: stable@kernel.org Signed-off-by:
Mikulas Patocka <mpatocka@redhat.com> Signed-off-by:
Alasdair G Kergon <agk@redhat.com>
-
- 05 Jan, 2009 3 commits
-
-
Milan Broz authored
Move log size validation from mirror target to log constructor. Removed PAGE_SIZE restriction we no longer think necessary. Signed-off-by:
Milan Broz <mbroz@redhat.com> Signed-off-by:
Alasdair G Kergon <agk@redhat.com>
-
Mikulas Patocka authored
Change dm_unregister_target to return void and use BUG() for error reporting. dm_unregister_target can only fail because of programming bug in the target driver. It can't fail because of user's behavior or disk errors. This patch changes unregister_target to return void and use BUG if someone tries to unregister non-registered target or unregister target that is in use. This patch removes code duplication (testing of error codes in all dm targets) and reports bugs in just one place, in dm_unregister_target. In some target drivers, these return codes were ignored, which could lead to a situation where bugs could be missed. Signed-off-by:
Mikulas Patocka <mpatocka@redhat.com> Signed-off-by:
Alasdair G Kergon <agk@redhat.com>
-
Jonathan Brassow authored
Always increase the error count when I/O on a leg of a mirror fails. The error count is used to decide whether to select an alternative mirror leg. If the target doesn't use the "handle_errors" feature, the error count is not updated and the bio can get requeued forever by the read callback. Fix it by increasing error_count before the handle_errors feature checking. Cc: stable@kernel.org Signed-off-by:
Milan Broz <mbroz@redhat.com> Signed-off-by:
Jonathan Brassow <jbrassow@redhat.com> Signed-off-by:
Alasdair G Kergon <agk@redhat.com>
-
- 13 Nov, 2008 1 commit
-
-
Mikulas Patocka authored
We queue work on keventd queue --- so this queue must be flushed in the destructor. Otherwise, keventd could access mirror_set after it was freed. Signed-off-by:
Mikulas Patocka <mpatocka@redhat.com> Signed-off-by:
Alasdair G Kergon <agk@redhat.com> Cc: stable@kernel.org
-
- 30 Oct, 2008 1 commit
-
-
Ilpo Jarvinen authored
Missing braces. Commit 1f965b19 (dm raid1: separate region_hash interface part1) broke it. Signed-off-by:
Ilpo Jarvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by:
Alasdair G Kergon <agk@redhat.com> Cc: Heinz Mauelshagen <hjm@redhat.com>
-
- 21 Oct, 2008 3 commits
-
-
Heinz Mauelshagen authored
Separate the region hash code from raid1 so it can be shared by forthcoming targets. Use BUG_ON() for failed async dm_io() calls. Signed-off-by:
Heinz Mauelshagen <hjm@redhat.com> Signed-off-by:
Alasdair G Kergon <agk@redhat.com>
-
Mikulas Patocka authored
Change #include "dm.h" to #include <linux/device-mapper.h> in all targets. Targets should not need direct access to internal DM structures. Signed-off-by:
Mikulas Patocka <mpatocka@redhat.com> Signed-off-by:
Alasdair G Kergon <agk@redhat.com>
-
Mikulas Patocka authored
Move array_too_big to include/linux/device-mapper.h because it is used by targets. Remove the test from dm-raid1 as the number of mirror legs is limited such that it can never fail. (Even for stripes it seems rather unlikely.) Signed-off-by:
Mikulas Patocka <mpatocka@redhat.com> Signed-off-by:
Alasdair G Kergon <agk@redhat.com>
-
- 10 Oct, 2008 1 commit
-
-
Jonathan Brassow authored
dm-raid1 is setting the 'DM_KCOPYD_IGNORE_ERROR' flag unconditionally when assigning kcopyd work. kcopyd is responsible for copying an assigned section of disk to one or more other disks. The 'DM_KCOPYD_IGNORE_ERROR' flag affects kcopyd in the following way: When not set: kcopyd will immediately stop the copy operation when an error is encountered. When set: kcopyd will try to proceed regardless of errors and try to continue copying any remaining amount. Since dm-raid1 tracks regions of the address space that are (or are not) in sync and it now has the ability to handle these errors, we can safely enable this optimization. This optimization is conditional on whether mirror error handling has been enabled. Signed-off-by:
Jonathan Brassow <jbrassow@redhat.com> Signed-off-by:
Alasdair G Kergon <agk@redhat.com>
-
- 25 Apr, 2008 8 commits
-
-
Mikulas Patocka authored
Remove an avoidable 3ms delay on some dm-raid1 and kcopyd I/O. It is specified that any submitted bio without BIO_RW_SYNC flag may plug the queue (i.e. block the requests from being dispatched to the physical device). The queue is unplugged when the caller calls blk_unplug() function. Usually, the sequence is that someone calls submit_bh to submit IO on a buffer. The IO plugs the queue and waits (to be possibly joined with other adjacent bios). Then, when the caller calls wait_on_buffer(), it unplugs the queue and submits the IOs to the disk. This was happenning: When doing O_SYNC writes, function fsync_buffers_list() submits a list of bios to dm_raid1, the bios are added to dm_raid1 write queue and kmirrord is woken up. fsync_buffers_list() calls wait_on_buffer(). That unplugs the queue, but there are no bios on the device queue as they are still in the dm_raid1 queue. wait_on_buffer() starts waiting until the IO is finished. kmirrord is scheduled, kmirrord takes bios and submits them to the devices. The submitted bio plugs the harddisk queue but there is no one to unplug it. (The process that called wait_on_buffer() is already sleeping.) So there is a 3ms timeout, after which the queues on the harddisks are unplugged and requests are processed. This 3ms timeout meant that in certain workloads (e.g. O_SYNC, 8kb writes), dm-raid1 is 10 times slower than md raid1. Every time we submit something asynchronously via dm_io, we must unplug the queue actually to send the request to the device. This patch adds an unplug call to kmirrord - while processing requests, it keeps the queue plugged (so that adjacent bios can be merged); when it finishes processing all the bios, it unplugs the queue to submit the bios. It also fixes kcopyd which has the same potential problem. All kcopyd requests are submitted with BIO_RW_SYNC. Signed-off-by:
Mikulas Patocka <mpatocka@redhat.com> Signed-off-by:
Alasdair G Kergon <agk@redhat.com> Acked-by:
Jens Axboe <jens.axboe@oracle.com>
-
Mikulas Patocka authored
This patch replaces the schedule() in the main kmirrord thread with a timer. The schedule() could introduce an unwanted delay when work is ready to be processed. The code instead calls wake() when there's work to be done immediately, and delayed_wake() after a failure to give a short delay before retrying. Signed-off-by:
Mikulas Patocka <mpatocka@redhat.com> Signed-off-by:
Alasdair G Kergon <agk@redhat.com>
-
Alasdair G Kergon authored
Publish the dm-io, dm-log and dm-kcopyd headers in include/linux. Signed-off-by:
Alasdair G Kergon <agk@redhat.com>
-
Heinz Mauelshagen authored
Clean up the dm-log interface to prepare for publishing it in include/linux. Signed-off-by:
Heinz Mauelshagen <hjm@redhat.com> Signed-off-by:
Alasdair G Kergon <agk@redhat.com>
-
Heinz Mauelshagen authored
Clean up the kcopyd interface to prepare for publishing it in include/linux. Signed-off-by:
Heinz Mauelshagen <hjm@redhat.com> Signed-off-by:
Alasdair G Kergon <agk@redhat.com>
-
Heinz Mauelshagen authored
Clean up the dm-io interface to prepare for publishing it in include/linux. Signed-off-by:
Heinz Mauelshagen <hjm@redhat.com> Signed-off-by:
Alasdair G Kergon <agk@redhat.com>
-
Heinz Mauelshagen authored
Move the dirty region log code into a separate module so other targets can share the code. Signed-off-by:
Heinz Mauelshagen <hjm@redhat.com> Signed-off-by:
Alasdair G Kergon <agk@redhat.com>
-
Robert P. J. Day authored
Use shorter list_splice_init() for brevity. Signed-off-by:
Robert P. J. Day <rpjday@crashcourse.ca> Signed-off-by:
Alasdair G Kergon <agk@redhat.com>
-
- 28 Mar, 2008 1 commit
-
-
Alasdair G Kergon authored
write_err is an unsigned long used with set_bit() so should not be passed around as unsigned int. http://bugzilla.kernel.org/show_bug.cgi?id=10271 Signed-off-by:
Alasdair G Kergon <agk@redhat.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
- 19 Feb, 2008 1 commit
-
-
Adrian Bunk authored
This patch fixes two NULL dereferences introduced by commit 06386bbf and spotted by the Coverity checker. Signed-off-by:
Adrian Bunk <bunk@kernel.org> Signed-off-by:
Alasdair G Kergon <agk@redhat.com> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
- 13 Feb, 2008 1 commit
-
-
Al Viro authored
test_and_set_bit() on address of uint32_t is a Bad Idea(tm)... Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
- 07 Feb, 2008 5 commits
-
-
Jonathan Brassow authored
This patch adds extra information to the mirror status output, so that it can be determined which device(s) have failed. For each mirror device, a character is printed indicating the most severe error encountered. The characters are: * A => Alive - No failures * D => Dead - A write failure occurred leaving mirror out-of-sync * S => Sync - A sychronization failure occurred, mirror out-of-sync * R => Read - A read failure occurred, mirror data unaffected This allows userspace to properly reconfigure the mirror set. Signed-off-by:
Jonathan Brassow <jbrassow@redhat.com> Signed-off-by:
Alasdair G Kergon <agk@redhat.com>
-
Jonathan Brassow authored
This patch gives the ability to respond-to/record device failures that happen during read operations. It also adds the ability to read from mirror devices that are not the primary if they are in-sync. There are essentially two read paths in mirroring; the direct path and the queued path. When a read request is mapped, if the region is 'in-sync' the direct path is taken; otherwise the queued path is taken. If the direct path is taken, we must record bio information so that if the read fails we can retry it. We then discover the status of a direct read through mirror_end_io. If the read has failed, we will mark the device from which the read was attempted as failed (so we don't try to read from it again), restore the bio and try again. If the queued path is taken, we discover the results of the read from 'read_callback'. If the device failed, we will mark the device as failed and attempt the read again if there is another device where this region is known to be 'in-sync'. Signed-off-by:
Jonathan Brassow <jbrassow@redhat.com> Signed-off-by:
Alasdair G Kergon <agk@redhat.com>
-
Jonathan Brassow authored
This patch adds the ability to requeue write I/O to core device-mapper when there is a log device failure. If a write to the log produces and error, the pending writes are put on the "failures" list. Since the log is marked as failed, they will stay on the failures list until a suspend happens. Suspends come in two phases, presuspend and postsuspend. We must make sure that all the writes on the failures list are requeued in the presuspend phase (a requirement of dm core). This means that recovery must be complete (because writes may be delayed behind it) and the failures list must be requeued before we return from presuspend. The mechanisms to ensure recovery is complete (or stopped) was already in place, but needed to be moved from postsuspend to presuspend. We rely on 'flush_workqueue' to ensure that the mirror thread is complete and therefore, has requeued all writes in the failures list. Because we are using flush_workqueue, we must ensure that no additional 'queue_work' calls will produce additional I/O that we need to requeue (because once we return from presuspend, we are unable to do anything about it). 'queue_work' is called in response to the following functions: - complete_resync_work = NA, recovery is stopped - rh_dec (mirror_end_io) = NA, only calls 'queue_work' if it is ready to recover the region (recovery is stopped) or it needs to clear the region in the log* **this doesn't get called while suspending** - rh_recovery_end = NA, recovery is stopped - rh_recovery_start = NA, recovery is stopped - write_callback = 1) Writes w/o failures simply call bio_endio -> mirror_end_io -> rh_dec (see rh_dec above) 2) Writes with failures are put on the failures list and queue_work is called** ** write_callbacks don't happen during suspend ** - do_failures = NA, 'queue_work' not called if suspending - add_mirror (initialization) = NA, only done on mirror creation - queue_bio = NA, 1) delayed I/O scheduled before flush_workqueue is called. 2) No more I/Os are being issued. 3) Re-attempted READs can still be handled. (Write completions are handled through rh_dec/ write_callback - mention above - and do not use queue_bio.) Signed-off-by:
Jonathan Brassow <jbrassow@redhat.com> Signed-off-by:
Alasdair G Kergon <agk@redhat.com>
-
Jonathan Brassow authored
This patch adds the calls to 'fail_mirror' if an error occurs during mirror recovery (aka resynchronization). 'fail_mirror' is responsible for recording the type of error by mirror device and ensuring an event gets raised for the purpose of notifying userspace. Signed-off-by:
Jonathan Brassow <jbrassow@redhat.com> Signed-off-by:
Alasdair G Kergon <agk@redhat.com>
-
Jonathan Brassow authored
This patch gives mirror the ability to handle device failures during normal write operations. The 'write_callback' function is called when a write completes. If all the writes failed or succeeded, we report failure or success respectively. If some of the writes failed, we call fail_mirror; which increments the error count for the device, notes the type of error encountered (DM_RAID1_WRITE_ERROR), and selects a new primary (if necessary). Note that the primary device can never change while the mirror is not in-sync (IOW, while recovery is happening.) This means that the scenario where a failed write changes the primary and gives recovery_complete a chance to misread the primary never happens. The fact that the primary can change has necessitated the change to the default_mirror field. We need to protect against reading garbage while the primary changes. We then add the bio to a new list in the mirror set, 'failures'. For every bio in the 'failures' list, we call a new function, '__bio_mark_nosync', where we mark the region 'not-in-sync' in the log and properly set the region state as, RH_NOSYNC. Userspace must also be notified of the failure. This is done by 'raising an event' (dm_table_event()). If fail_mirror is called in process context the event can be raised right away. If in interrupt context, the event is deferred to the kmirrord thread - which raises the event if 'event_waiting' is set. Backwards compatibility is maintained by ignoring errors if the DM_FEATURES_HANDLE_ERRORS flag is not present. Signed-off-by:
Jonathan Brassow <jbrassow@redhat.com> Signed-off-by:
Alasdair G Kergon <agk@redhat.com>
-
- 19 Oct, 2007 4 commits
-
-
Jonathan Brassow authored
Store a pointer to the owning mirror_set structure within each mirror structure for a subsequent patch to use. Signed-off-by:
Jonathan Brassow <jbrassow@redhat.com> Signed-off-by:
Alasdair G Kergon <agk@redhat.com>
-
Jonathan Brassow authored
There are now two phases to a suspend in device-mapper - presuspend and postsuspend. This patch removes the single 'suspend' in the logging API and replaces it with 'presuspend' and 'postsuspend' functions to align it better with core device-mapper. A subsequent patch will make use of 'presuspend'. Signed-off-by:
Jonathan Brassow <jbrassow@redhat.com> Signed-off-by:
Alasdair G Kergon <agk@redhat.com>
-
vignesh babu authored
Replacing n & (n - 1) for power of 2 check by is_power_of_2(n) Signed-off-by:
vignesh babu <vignesh.babu@wipro.com> Signed-off-by:
Alasdair G Kergon <agk@redhat.com>
-
Dmitry Monakhov authored
Add missing 'dm_io_client_destroy' to alloc_context error path. Reorganize mirror constructor error path in order to prevent workqueue leakage. Signed-off-by:
Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by:
Alasdair G Kergon <agk@redhat.com>
-
- 10 Oct, 2007 1 commit
-
-
NeilBrown authored
As bi_end_io is only called once when the reqeust is complete, the 'size' argument is now redundant. Remove it. Now there is no need for bio_endio to subtract the size completed from bi_size. So don't do that either. While we are at it, change bi_end_io to return void. Signed-off-by:
Neil Brown <neilb@suse.de> Signed-off-by:
Jens Axboe <jens.axboe@oracle.com>
-
- 19 Jul, 2007 1 commit
-
-
Yoann Padioleau authored
Transform some calls to kmalloc/memset to a single kzalloc (or kcalloc). Here is a short excerpt of the semantic patch performing this transformation: @@ type T2; expression x; identifier f,fld; expression E; expression E1,E2; expression e1,e2,e3,y; statement S; @@ x = - kmalloc + kzalloc (E1,E2) ... when != \(x->fld=E;\|y=f(...,x,...);\|f(...,x,...);\|x=E;\|while(...) S\|for(e1;e2;e3) S\) - memset((T2)x,0,E1); @@ expression E1,E2,E3; @@ - kzalloc(E1 * E2,E3) + kcalloc(E1,E2,E3) [akpm@linux-foundation.org: get kcalloc args the right way around] Signed-off-by:
Yoann Padioleau <padator@wanadoo.fr> Cc: Richard Henderson <rth@twiddle.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Acked-by:
Russell King <rmk@arm.linux.org.uk> Cc: Bryan Wu <bryan.wu@analog.com> Acked-by:
Jiri Slaby <jirislaby@gmail.com> Cc: Dave Airlie <airlied@linux.ie> Acked-by:
Roland Dreier <rolandd@cisco.com> Cc: Jiri Kosina <jkosina@suse.cz> Acked-by:
Dmitry Torokhov <dtor@mail.ru> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by:
Mauro Carvalho Chehab <mchehab@infradead.org> Acked-by:
Pierre Ossman <drzeus-list@drzeus.cx> Cc: Jeff Garzik <jeff@garzik.org> Cc: "David S. Miller" <davem@davemloft.net> Acked-by:
Greg KH <greg@kroah.com> Cc: James Bottomley <James.Bottomley@steeleye.com> Cc: "Antonino A. Daplas" <adaplas@pol.net> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
- 12 Jul, 2007 5 commits
-
-
Jonathan Brassow authored
When writing to a mirror, the log must be updated first. Failure to update the log could result in the log not properly reflecting the state of the mirror if the machine should crash. We change the return type of the rh_flush function to give us the ability to check if a log write was successful. If the log write was unsuccessful, we fail the writes to avoid the case where the log does not properly reflect the state of the mirror. A follow-up patch - which is dependent on the ability to requeue I/O's to core device-mapper - will requeue the I/O's for retry (allowing the mirror to be reconfigured.) Signed-off-by:
Jonathan Brassow <jbrassow@redhat.com> Signed-off-by:
Alasdair G Kergon <agk@redhat.com> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Jonathan Brassow authored
Device-mapper mirroring currently takes a best effort approach to recovery - failures during mirror synchronization are completely ignored. This means that regions are marked 'in-sync' and 'clean' and removed from the hash list. Future reads and writes that query the region will incorrectly interpret the region as in-sync. This patch handles failures during the recovery process. If a failure occurs, the region is marked as 'not-in-sync' (aka RH_NOSYNC) and added to a new list 'failed_recovered_regions'. Regions on the 'failed_recovered_regions' list are not marked as 'clean' upon removal from the list. Furthermore, if the DM_RAID1_HANDLE_ERRORS flag is set, the region is marked as 'not-in-sync'. This action prevents any future read-balancing from choosing an invalid device because of the 'not-in-sync' status. If "handle_errors" is not specified when creating a mirror (leaving the DM_RAID1_HANDLE_ERRORS flag unset), failures will be ignored exactly as they would be without this patch. This is to preserve backwards compatibility with user-space tools, such as 'pvmove'. However, since future read-balancing policies will rely on the correct sync status of a region, a user must choose "handle_errors" when using read-balancing. Signed-off-by:
Jonathan Brassow <jbrassow@redhat.com> Signed-off-by:
Alasdair G Kergon <agk@redhat.com> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Jonathan Brassow authored
A clear_region function is permitted to block (in practice, rare) but gets called in rh_update_states() with a spinlock held. The bits being marked and cleared by the above functions are used to update the on-disk log, but are never read directly. We can perform these operations outside the spinlock since the bits are only changed within one thread viz. - mark_region in rh_inc() - clear_region in rh_update_states(). So, we grab the clean_regions list items via list_splice() within the spinlock and defer clear_region() until we iterate over the list for deletion - similar to how the recovered_regions list is already handled. We then move the flush() call down to ensure it encapsulates the changes which are done by the later calls to clear_region(). Signed-off-by:
Jonathan Brassow <jbrassow@redhat.com> Signed-off-by:
Alasdair G Kergon <agk@redhat.com> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Milan Broz authored
Fix mirror status line broken in dm-log-report-fault-status.patch: - space missing between two words - placeholder ("0") required for compatibility with a subsequent patch - incorrect offset parameter Cc: stable@kernel.org Signed-off-by:
Milan Broz <mbroz@redhat.com> Signed-off-by:
Alasdair G Kergon <agk@redhat.com> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Alasdair G Kergon authored
Remove explicit module name from messages as the macro now includes it automatically. Signed-off-by:
Alasdair G Kergon <agk@redhat.com> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-