1. 11 May, 2009 1 commit
    • block: convert to pos and nr_sectors accessors · 83096ebf
      Tejun Heo authored
      
      With recent cleanups, there is no place where low level driver
      directly manipulates request fields.  This means that the 'hard'
      request fields always equal the !hard fields.  Convert all
      rq->sectors, nr_sectors and current_nr_sectors references to
      accessors.
      
      While at it, drop the superfluous blk_rq_pos() < 0 test in swim.c.
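
      A minimal sketch of the conversion pattern (the driver function name
      is hypothetical; the accessors are the ones this commit introduces):

               #include <linux/blkdev.h>

               /* Hypothetical driver fragment illustrating the conversion. */
               static void xyz_show_request(struct request *rq)
               {
                       /* Before: rq->sector, rq->nr_sectors,
                        * rq->current_nr_sectors.  After: */
                       pr_debug("req: pos=%llu total=%u cur=%u\n",
                                (unsigned long long)blk_rq_pos(rq),
                                blk_rq_sectors(rq),       /* whole request   */
                                blk_rq_cur_sectors(rq));  /* current segment */
               }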
      
      [ Impact: use pos and nr_sectors accessors ]
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com>
      Tested-by: Grant Likely <grant.likely@secretlab.ca>
      Acked-by: Grant Likely <grant.likely@secretlab.ca>
      Tested-by: Adrian McMenamin <adrian@mcmen.demon.co.uk>
      Acked-by: Adrian McMenamin <adrian@mcmen.demon.co.uk>
      Acked-by: Mike Miller <mike.miller@hp.com>
      Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
      Cc: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
      Cc: Borislav Petkov <petkovbb@googlemail.com>
      Cc: Sergei Shtylyov <sshtylyov@ru.mvista.com>
      Cc: Eric Moore <Eric.Moore@lsi.com>
      Cc: Alan Stern <stern@rowland.harvard.edu>
      Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
      Cc: Pete Zaitcev <zaitcev@redhat.com>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Paul Clements <paul.clements@steeleye.com>
      Cc: Tim Waugh <tim@cyberelk.net>
      Cc: Jeff Garzik <jgarzik@pobox.com>
      Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
      Cc: Alex Dubov <oakad@yahoo.com>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Dario Ballabio <ballabio_dario@emc.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: unsik Kim <donari75@gmail.com>
      Cc: Laurent Vivier <Laurent@lvivier.info>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
  2. 28 Apr, 2009 1 commit
    • block: kill blk_start_queueing() · a7f55792
      Tejun Heo authored
      
      blk_start_queueing() is identical to __blk_run_queue() except that it
      doesn't check for recursion.  None of the current users depends on
      blk_start_queueing() running request_fn directly.  Replace usages of
      blk_start_queueing() with [__]blk_run_queue() and kill it.
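
      A sketch of the caller-side conversion (surrounding context
      hypothetical):

               /* Before: blk_start_queueing(q); */

               /* After, with q->queue_lock already held: */
               __blk_run_queue(q);

               /* After, without the lock held (takes it internally): */
               blk_run_queue(q);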
      
      [ Impact: removal of mostly duplicate interface function ]
      Signed-off-by: Tejun Heo <tj@kernel.org>
  3. 15 Apr, 2009 1 commit
  4. 29 Dec, 2008 2 commits
  5. 09 Oct, 2008 3 commits
  6. 26 Jul, 2008 1 commit
  7. 03 Jul, 2008 1 commit
    • as-iosched: properly protect ioc_gone and ioc count · 863fddcb
      Jens Axboe authored
      
      If multiple tasks are freeing io contexts while as-iosched is being
      unloaded, we could complete() ioc_gone twice. Fix that by protecting
      the complete() and clearing of ioc_gone with a spinlock dedicated to
      that purpose. This doesn't matter from a performance perspective,
      since the path is only entered when ioc_gone != NULL (i.e. while
      as-iosched is being rmmod'ed).
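
      A sketch of the pattern under assumed names (ioc_gone_lock for the
      dedicated spinlock; the refcount check mirrors the existing exit
      logic):

               static DEFINE_SPINLOCK(ioc_gone_lock);  /* assumed name */

               /* On the io-context free path, when the elevator is exiting: */
               if (ioc_gone) {
                       /* Only one task may complete ioc_gone; clear it
                        * under the lock so a second freer sees NULL. */
                       spin_lock(&ioc_gone_lock);
                       if (ioc_gone && !elv_ioc_count_read(ioc_count)) {
                               complete(ioc_gone);
                               ioc_gone = NULL;
                       }
                       spin_unlock(&ioc_gone_lock);
               }
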
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
  8. 01 Jul, 2008 1 commit
    • block: Fix the starving writes bug in the anticipatory IO scheduler · d585d0b9
      Divyesh Shah authored
      
      AS scheduler alternates between issuing read and write batches. It does
      the batch switch only after all requests from the previous batch are
      completed.
      
      When switching to a write batch while a read request is still in
      flight, the scheduler waits for that request to complete and records
      its intention to switch by setting ad->changed_batch and the new
      direction, but it does not update the batch_expire_time for the new
      write batch, as it does when there are no pending requests.

      On completion of the read request, the scheduler sees that a switch
      was pending, schedules work for kblockd right away and resets the
      ad->changed_batch flag.

      When kblockd then enters dispatch_request, where it is expected to
      pick up a write request, it immediately ends the write batch, because
      batch_expire_time was never updated and still holds the expiry
      timestamp of the previous batch.
      
      The result is write starvation in every case where the scheduler
      intends to switch to a write batch while a read request is still in
      flight: the batch gets reverted to a read batch right away.
      
      This also holds true in the reverse case (switching from a write batch
      to a read batch with an in-flight write request).
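
      Sketched fix (field names follow as-iosched; an outline of the
      change, not the literal diff): when the awaited request completes and
      a switch is pending, start the new batch's expiry clock before
      kicking kblockd.

               /* In the completion path, when a batch switch was pending: */
               if (ad->changed_batch && ad->nr_dispatched == 1) {
                       /* Set the new batch's expiry; previously left stale. */
                       ad->current_batch_expires = jiffies +
                                       ad->batch_expire[ad->batch_data_dir];
                       kblockd_schedule_work(&ad->antic_work);
                       ad->changed_batch = 0;
               }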
      
      I've checked that this bug exists on 2.6.11, 2.6.18, 2.6.24 and
      linux-2.6-block git HEAD. I've tested the fix on x86 platforms with
      SCSI drives where the driver asks for the next request while a current
      request is in-flight.
      
      This patch is based on linux-2.6-block git HEAD.
      
      Bug reproduction:
      A simple scenario which reproduces this bug:
      - dd if=/dev/hda3 of=/dev/null &
      - lilo
         lilo then takes forever to complete.

      This can also be reproduced fairly easily with the dd above plus
      another test program doing msync().

      The example test program below should print a message after every
      iteration, but instead it hangs forever. With this bugfix it makes
      forward progress.
      
      ====
      Example test program using msync() (thanks to suleiman AT google DOT com)
      
      #define _GNU_SOURCE             /* for O_NOATIME */
      #include <err.h>
      #include <fcntl.h>
      #include <stdint.h>
      #include <stdio.h>
      #include <sys/mman.h>
      #include <sys/stat.h>

      /* Read the CPU timestamp counter ("=A" = edx:eax pair, 32-bit x86). */
      static inline uint64_t
      rdtsc(void)
      {
               uint64_t tsc;

               __asm __volatile("rdtsc" : "=A" (tsc));
               return (tsc);
      }

      int
      main(int argc, char **argv)
      {
               struct stat st;
               uint64_t e, s, t;
               char *p;
               long i;
               int fd;

               if (argc < 2) {
                       printf("Usage: %s <file>\n", argv[0]);
                       return (1);
               }

               if ((fd = open(argv[1], O_RDWR | O_NOATIME)) < 0)
                       err(1, "open");

               if (fstat(fd, &st) < 0)
                       err(1, "fstat");

               p = mmap(NULL, st.st_size, PROT_READ | PROT_WRITE,
                        MAP_SHARED, fd, 0);
               if (p == MAP_FAILED)
                       err(1, "mmap");

               t = 0;
               for (i = 0; i < 1000; i++) {
                       *p = 0;                         /* dirty the page ...  */
                       msync(p, 4096, MS_SYNC);        /* ... and sync it out */
                       s = rdtsc();
                       *p = 0;                         /* redirty the page */
                       __asm __volatile("" ::: "memory");
                       e = rdtsc();
                       if (argc > 2)
                               printf("%ld: %llu cycles %jd %jd\n", i,
                                      (unsigned long long)(e - s),
                                      (intmax_t)s, (intmax_t)e);
                       t += e - s;
               }
               printf("average time: %llu cycles\n",
                      (unsigned long long)(t / 1000));
               return (0);
      }
      
      Cc: <stable@kernel.org>
      Acked-by: Nick Piggin <npiggin@suse.de>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
  9. 01 Feb, 2008 2 commits
    • block: kill swap_io_context() · 3bc217ff
      Jens Axboe authored
      
      It blindly copies everything in the io_context, including the lock.
      That doesn't work so well for either lock ordering or lockdep.
      
      There seems to be zero point in swapping io contexts on a
      request-to-request merge, so the best course of action is to just
      remove it.
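
      For illustration only (not the removed code), a wholesale swap of two
      io_contexts amounts to the following, which is exactly what corrupts
      lock state:

               /* Swapping the structs also swaps the embedded locks: */
               struct io_context tmp = *ioc1;  /* copies ioc1's spinlock   */
               *ioc1 = *ioc2;                  /* ioc1 now has ioc2's lock */
               *ioc2 = tmp;                    /* lockdep's view of both
                                                * locks no longer matches
                                                * reality */
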
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
    • as-iosched: fix inconsistent ioc->lock context · 8bdd3f8a
      Jens Axboe authored
      
      Since ioc->lock is acquired from irq context, all locking must use
      the irq-safe variants. Most call sites are already inside the queue
      lock (which already disables interrupts), but the io scheduler rmmod
      path always has irqs enabled, and the put_io_context() path may
      legally be called with irqs enabled (even if it usually isn't). So
      fix up those two.
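
      An illustrative sketch of the irq-safe variant those two paths are
      switched to (surrounding code hypothetical):

               unsigned long flags;

               /* A plain spin_lock(&ioc->lock) could deadlock here if the
                * irq that also takes ioc->lock fires while we hold it. */
               spin_lock_irqsave(&ioc->lock, flags);   /* disables local irqs */
               /* ... operate on the io_context ... */
               spin_unlock_irqrestore(&ioc->lock, flags);
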
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
  10. 30 Jan, 2008 1 commit
  11. 28 Jan, 2008 1 commit
  12. 18 Dec, 2007 3 commits
  13. 24 Jul, 2007 1 commit
  14. 17 Jul, 2007 1 commit
  15. 09 May, 2007 1 commit
  16. 08 May, 2007 1 commit
  17. 13 Dec, 2006 1 commit
  18. 01 Dec, 2006 1 commit
  19. 22 Nov, 2006 1 commit
    • WorkStruct: Pass the work_struct pointer instead of context data · 65f27f38
      David Howells authored
      
      Pass the work_struct pointer to the work function rather than context data.
      The work function can use container_of() to work out the data.
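
      The resulting pattern looks like this (struct and function names are
      hypothetical; INIT_WORK() and container_of() are the real
      interfaces):

               struct my_device {
                       struct work_struct work;  /* embedded in container */
                       int pending_events;
               };

               static void my_work_fn(struct work_struct *work)
               {
                       /* Recover the container from the work_struct pointer. */
                       struct my_device *dev =
                               container_of(work, struct my_device, work);

                       dev->pending_events = 0;
               }

               /* Setup: no context pointer any more, just the function. */
               INIT_WORK(&dev->work, my_work_fn);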
      
      For the cases where the container of the work_struct may go away the moment the
      pending bit is cleared, it is made possible to defer the release of the
      structure by deferring the clearing of the pending bit.
      
      To make this work, an extra flag is introduced into the management side of the
      work_struct.  This governs auto-release of the structure upon execution.
      
      Ordinarily, the work queue executor would release the work_struct for further
      scheduling or deallocation by clearing the pending bit prior to jumping to the
      work function.  This means that, unless the driver makes some guarantee itself
      that the work_struct won't go away, the work function may not access anything
      else in the work_struct or its container lest they be deallocated.  This is a
      problem if the auxiliary data is taken away (as done by the last patch).
      
      However, if the pending bit is *not* cleared before jumping to the work
      function, then the work function *may* access the work_struct and its container
      with no problems.  But then the work function must itself release the
      work_struct by calling work_release().
      
      In most cases, automatic release is fine, so this is the default.  Special
      initiators exist for the non-auto-release case (ending in _NAR).
      Signed-off-by: David Howells <dhowells@redhat.com>
  20. 01 Oct, 2006 1 commit
  21. 30 Sep, 2006 12 commits
  22. 30 Jun, 2006 1 commit
  23. 26 Jun, 2006 1 commit