1. 24 Jul, 2008 1 commit
  2. 18 Jul, 2008 3 commits
  3. 27 Jun, 2008 3 commits
  4. 20 Jun, 2008 5 commits
  5. 19 Jun, 2008 2 commits
  6. 18 Jun, 2008 1 commit
    • sched: rework of "prioritize non-migratable tasks over migratable ones" · 20b6331b
      Dmitry Adamushko authored
      regarding this commit: 45c01e82
      
      I think we can do it simpler. Please take a look at the patch below.
      
      Instead of having 2 separate arrays (which adds ~800 bytes on x86_32 and
      twice that on x86_64), let's add "exclusive" tasks (the ones that are
      bound to this CPU) to the head of the queue and "shared" ones to the
      end.
      
      If several "exclusive" tasks wake up in a row, they are 'stacked' rather
      than queued as now, meaning that task {i+1} is placed in front of the
      previously woken task {i}. But I don't think this behavior causes any
      realistic problems.
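
      Roughly, the placement would look like this (an illustrative sketch, not
      the literal patch; the function name is made up, and the field names
      follow the sched_rt_entity layout of that era):

       /* Sketch: tasks bound to one CPU go to the head of the per-priority
        * list, migratable tasks to the tail. */
       static void enqueue_rt_entity_sketch(struct list_head *queue,
                                            struct task_struct *p)
       {
               if (p->rt.nr_cpus_allowed == 1)
                       list_add(&p->rt.run_list, queue);       /* "exclusive": head */
               else
                       list_add_tail(&p->rt.run_list, queue);  /* "shared": tail */
       }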
      
      There are a couple of changes on top of this one.
      
      (1) in check_preempt_curr_rt()
      
      I don't think there is a need for the "pick_next_rt_entity(rq, &rq->rt)
      != &rq->curr->rt" check.
      
      enqueue_task_rt(p) and check_preempt_curr_rt() are always called one
      after another with rq->lock held, so the check
      "p->rt.nr_cpus_allowed == 1 && rq->curr->rt.nr_cpus_allowed != 1" (well,
      just its left part, really) should be enough to guarantee that 'p' has
      been queued in front of 'curr'.
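
      In code form (a hedged sketch; the function name is illustrative, and the
      real path also handles the plain p->prio < curr->prio case):

       static void check_preempt_equal_prio_sketch(struct rq *rq,
                                                   struct task_struct *p)
       {
               /* 'p' was just enqueued; if it is CPU-bound and 'curr' is not,
                * 'p' now sits in front of 'curr' in the queue, so preempt. */
               if (p->rt.nr_cpus_allowed == 1 &&
                   rq->curr->rt.nr_cpus_allowed != 1)
                       resched_task(rq->curr);
       }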
      
      (2) in set_cpus_allowed_rt()
      
      I don't think there is a need for requeue_task_rt() here.
      
      Perhaps, the only case when 'requeue' (+ reschedule) might be useful is
      as follows:
      
      i) weight == 1 && cpu_isset(task_cpu(p), *new_mask)
      
      (i.e. a task is being bound to this CPU);
      
      ii) 'p' != rq->curr
      
      but here, 'p' has already been on this CPU for a while and was not
      migrated, i.e. it's possible that 'rq->curr' would not have a high chance
      of being migrated right at this particular moment (although it might a
      bit later on), should we allow it to be preempted.
      
      Anyway, I don't think we should make it more complex trying to address
      such rare corner cases; that's also why a single-queue approach is
      preferable. Unless I'm missing something obvious, this approach gives us
      similar functionality at lower cost.
      
      Verified only compilation-wise.
      
      (Almost)-Signed-off-by: Dmitry Adamushko <dmitry.adamushko@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  7. 10 Jun, 2008 1 commit
    • sched: fix hotplug cpus on ia64 · 7def2be1
      Peter Zijlstra authored
      
      Cliff Wickman wrote:
      
      > I built an ia64 kernel from Andrew's tree (2.6.26-rc2-mm1)
      > and get a very predictable hotplug cpu problem.
      > billberry1:/tmp/cpw # ./dis
      > disabled cpu 17
      > enabled cpu 17
      > billberry1:/tmp/cpw # ./dis
      > disabled cpu 17
      > enabled cpu 17
      > billberry1:/tmp/cpw # ./dis
      >
      > The script that disables the cpu always hangs (unkillable)
      > on the 3rd attempt.
      >
      > And a bit further:
      > The kstopmachine thread always sits on the run queue (real time) for about
      > 30 minutes before running.
      
      This fix solves some (but not all) of the issues between CPU hotplug and
      RT bandwidth throttling.
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  8. 06 Jun, 2008 4 commits
    • sched: fix cpuprio build bug · 1100ac91
      Ingo Molnar authored
      
      The previous cpupri patch was not build-tested on !SMP:
      
       kernel/sched_rt.c: In function 'inc_rt_tasks':
       kernel/sched_rt.c:404: error: 'struct rq' has no member named 'online'
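
      A hedged illustration of the usual shape of such a fix (not the actual
      diff; the helper name is made up): keep the rq->online / cpupri accesses
      under CONFIG_SMP, since those fields only exist on SMP builds:

       static inline void rt_rq_online_update_sketch(struct rq *rq)
       {
       #ifdef CONFIG_SMP
               /* rq->online and the cpupri machinery are SMP-only */
               if (rq->online)
                       cpupri_set(&rq->rd->cpupri, rq->cpu, rq->rt.highest_prio);
       #endif
       }
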
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • sched: fix cpupri hotplug support · 1f11eb6a
      Gregory Haskins authored
      The RT folks over at RedHat found an issue w.r.t. hotplug support which
      was traced to problems with the cpupri infrastructure in the scheduler:
      
      https://bugzilla.redhat.com/show_bug.cgi?id=449676
      
      This bug affects 23-rt12+, 24-rtX, 25-rtX, and sched-devel.  This patch
      applies to 25.4-rt4, though it should trivially apply to most cpupri enabled
      kernels mentioned above.
      
      It turned out that the issue was that offline cpus could get inadvertently
      registered with cpupri so that they were erroneously selected during
      migration decisions.  The end result would be an OOPS as the offline cpu
      had tasks routed to it.
      
      This patch generalizes the old join/leave domain interface into an
      online/offline interface, and adjusts the root-domain/hotplug code to
      utilize it.
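
      Roughly, the generalized interface can be pictured like this (a sketch of
      the idea, not the exact diff; the function name is illustrative): each
      scheduling class gains online/offline callbacks that the root-domain /
      hotplug code invokes as a runqueue joins or leaves service:

       /* Sketch: called by the root-domain / hotplug code when a CPU comes up. */
       static void set_rq_online_sketch(struct rq *rq)
       {
               if (!rq->online) {
                       const struct sched_class *class;

                       for_each_class(class) {
                               if (class->rq_online)
                                       class->rq_online(rq);
                       }
                       rq->online = 1;
               }
       }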
      
      I was able to easily reproduce the issue prior to this patch, and am no
      longer able to reproduce it after this patch.  I can offline cpus
      indefinitely and everything seems to be in working order.
      
      Thanks to Arnaldo (acme), Thomas, and Peter for doing the legwork to point
      me in the right direction.  Also thank you to Peter for reviewing the
      early iterations of this patch.
      Signed-off-by: Gregory Haskins <ghaskins@novell.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • sched: use a 2-d bitmap for searching lowest-pri CPU · 6e0534f2
      Gregory Haskins authored
      
      The current code uses a linear algorithm, which causes scaling issues
      on larger SMP machines.  This patch replaces that algorithm with a
      2-dimensional bitmap to reduce latencies in the wake-up path.
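
      The shape of the 2-d lookup, sketched below (simplified, with an
      illustrative function name, not the exact cpupri code): one vector per
      priority level, each holding a count and a cpumask of the CPUs currently
      running at that priority, so the search walks priority levels rather
      than CPUs:

       static int cpupri_find_sketch(struct cpupri *cp, struct task_struct *p,
                                     cpumask_t *lowest_mask)
       {
               int idx;

               /* scan priority levels from the lowest upwards; the real code
                * stops once it reaches the task's own priority */
               for (idx = 0; idx < CPUPRI_NR_PRIORITIES; idx++) {
                       struct cpupri_vec *vec = &cp->pri_to_cpu[idx];

                       if (!atomic_read(&vec->count))
                               continue;               /* no CPUs at this level */

                       cpus_and(*lowest_mask, p->cpus_allowed, vec->mask);
                       if (!cpus_empty(*lowest_mask))
                               return 1;               /* found candidate CPUs */
               }
               return 0;
       }
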
      Signed-off-by: Gregory Haskins <ghaskins@novell.com>
      Acked-by: Steven Rostedt <srostedt@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • sched: prioritize non-migratable tasks over migratable ones · 45c01e82
      Gregory Haskins authored
      Dmitry Adamushko pointed out a known flaw in the rt-balancing algorithm
      that could allow suboptimal balancing if a non-migratable task gets
      queued behind a running migratable one.  It is discussed in this thread:
      
      http://lkml.org/lkml/2008/4/22/296
      
      This issue has been further exacerbated by a recent checkin to
      sched-devel (git-id 5eee63a5ebc19a870ac40055c0be49457f3a89a3).
      
      From a pure priority standpoint, the run-queue is doing the "right"
      thing. Using Dmitry's nomenclature, if T0 is on cpu1 first, and T1
      wakes up at equal or lower priority (affined only to cpu1) later, it
      *should* wait for T0 to finish.  However, in reality that is likely
      suboptimal from a system perspective if there are other cores that
      could allow T0 and T1 to run concurrently.  Since T1 cannot migrate,
      the only choice for higher concurrency is to try to move T0.  This is
      not something we addressed in the recent rt-balancing re-work.
      
      This patch tries to enhance the balancing algorithm by accommodating this
      scenario.  It accomplishes this by incorporating the migratability of a
      task into its priority calculation.  Within a numerical tsk->prio, a
      non-migratable task is logically higher than a migratable one.  We
      maintain this by introducing a new per-priority queue (xqueue, or
      exclusive-queue) for holding non-migratable tasks.  The scheduler will
      draw from the xqueue over the standard shared-queue (squeue) when
      available.
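
      A hedged sketch of that data-structure shape (the struct and function
      names here are illustrative, simplified from the description above):

       struct rt_prio_array_sketch {
               struct list_head xqueue[MAX_RT_PRIO];   /* non-migratable ("exclusive") */
               struct list_head squeue[MAX_RT_PRIO];   /* migratable ("shared") */
       };

       /* Draw from the exclusive queue first whenever it is non-empty. */
       static struct list_head *pick_rt_list_sketch(struct rt_prio_array_sketch *array,
                                                    int prio)
       {
               if (!list_empty(&array->xqueue[prio]))
                       return &array->xqueue[prio];
               return &array->squeue[prio];
       }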
      
      There are several details for utilizing this properly.
      
      1) During task-wake-up, we not only need to check if the priority
         preempts the current task, but we also need to check for this
         non-migratable condition.  Therefore, if a non-migratable task wakes
         up and sees an equal priority migratable task already running, it
         will attempt to preempt it *if* there is a likelihood that the
         current task will find an immediate home.
      
      2) Tasks only get this non-migratable "priority boost" on wake-up.  Any
         requeuing will result in the non-migratable task being queued to the
         end of the shared queue.  This is an attempt to prevent the system
         from being completely unfair to migratable tasks during things like
         SCHED_RR timeslicing.
      
      I am sure this patch introduces potentially "odd" behavior if you
      concoct a scenario where a bunch of non-migratable threads could starve
      migratable ones given the right pattern.  I am not yet convinced that
      this is a problem, since we are talking about tasks of equal RT priority
      and there never is much in the way of guarantees against starvation
      under that scenario anyway (e.g. you could come up with a similar
      scenario with a specific timing environment versus an affinity
      environment).  I can be convinced otherwise, but for now I think this is
      "ok".
      Signed-off-by: Gregory Haskins <ghaskins@novell.com>
      CC: Dmitry Adamushko <dmitry.adamushko@gmail.com>
      CC: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
  9. 29 May, 2008 1 commit
  10. 23 May, 2008 1 commit
  11. 05 May, 2008 2 commits
  12. 19 Apr, 2008 6 commits
    • sched: rt-group: optimize dequeue_rt_stack · 58d6c2d7
      Peter Zijlstra authored
      
      Now that the group hierarchy can have an arbitrary depth, the O(n^2) nature
      of RT task dequeues will really hurt. Optimize this by providing space to
      store the tree path, so we can walk it the other way.
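
      A sketch of the idea (the function name is illustrative; on_rt_rq() and
      __dequeue_rt_entity() as in the rt code of that era): record the path
      while walking up, then dequeue top-down via the stored links:

       static void dequeue_rt_stack_sketch(struct sched_rt_entity *rt_se)
       {
               struct sched_rt_entity *back = NULL;

               /* walk up the hierarchy, remembering where we came from */
               for (; rt_se; rt_se = rt_se->parent) {
                       rt_se->back = back;
                       back = rt_se;
               }

               /* walk back down, dequeueing from the top of the tree first */
               for (rt_se = back; rt_se; rt_se = rt_se->back) {
                       if (on_rt_rq(rt_se))
                               __dequeue_rt_entity(rt_se);
               }
       }
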
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • sched: fair-group: SMP-nice for group scheduling · 18d95a28
      Peter Zijlstra authored
      
      Implement SMP nice support for the full group hierarchy.
      
      On each load-balance action, compile a sched_domain wide view of the full
      task_group tree. We compute the domain wide view when walking down the
      hierarchy, and readjust the weights when walking back up.
      
      After collecting and readjusting the domain wide view, we try to balance the
      tasks within the task_groups. The current approach is to naively balance each
      task group until we've moved the targeted amount of load.
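
      A generic sketch of the down/up traversal described above (the function,
      callback type, and child-list names are illustrative, not the patch's):

       typedef void (*tg_visitor_sketch)(struct task_group *tg,
                                         struct sched_domain *sd);

       /* Visit each task_group top-down to build the domain-wide view,
        * then bottom-up to readjust the weights. */
       static void walk_tg_tree_sketch(struct task_group *tg,
                                       tg_visitor_sketch down,
                                       tg_visitor_sketch up,
                                       struct sched_domain *sd)
       {
               struct task_group *child;

               down(tg, sd);
               list_for_each_entry(child, &tg->children, siblings)
                       walk_tg_tree_sketch(child, down, up, sd);
               up(tg, sd);
       }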
      
      Inspired by Srivatsa Vaddagiri's previous code and Abhishek Chandra's H-SMP
      paper.
      
      XXX: there will be some numerical issues due to the limited nature of
           SCHED_LOAD_SCALE wrt representing a task_group's influence on the
           total weight. When the tree is deep enough, or the task weight small
           enough, we'll run out of bits.
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      CC: Abhishek Chandra <chandra@cs.umn.edu>
      CC: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • sched: mix tasks and groups · 354d60c2
      Dhaval Giani authored
      
      This patch allows tasks and groups to exist in the same cfs_rq. With this
      change, CFS group scheduling moves from a 1/(1+N) to a 1/(M+N) fairness
      model, where M tasks and N groups exist at the cfs_rq level.
      
      [a.p.zijlstra@chello.nl: rt bits and assorted fixes]
      Signed-off-by: Dhaval Giani <dhaval@linux.vnet.ibm.com>
      Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • sched: add new set_cpus_allowed_ptr function · cd8ba7cd
      Mike Travis authored
      
      Add a new function that accepts a pointer to the "newly allowed cpus"
      cpumask argument.
      
      int set_cpus_allowed_ptr(struct task_struct *p, const cpumask_t *new_mask)
      
      The current set_cpus_allowed() function is modified to use the above
      but this does not result in an ABI change.  And with some compiler
      optimization help, it may not introduce any additional overhead.
      
      Additionally, to enforce the read-only nature of the new_mask arg, the
      "const" property is migrated to the sub-functions called by set_cpus_allowed.
      This silences compiler warnings.
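
      A sketch of the resulting compatibility shim (hedged; roughly how the old
      by-value interface can forward to the new pointer-based one):

       static inline int set_cpus_allowed(struct task_struct *p, cpumask_t new_mask)
       {
               return set_cpus_allowed_ptr(p, &new_mask);
       }
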
      Signed-off-by: Mike Travis <travis@sgi.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • sched: rt-group: smp balancing · ac086bc2
      Peter Zijlstra authored
      
      Currently rt group scheduling enforces a per-cpu runtime limit; however,
      the rt load balancer makes no guarantee about an equal spread of real-time
      tasks, just that at any one time the highest priority tasks run.
      
      Solve this by making the runtime limit a global property, borrowing
      excess runtime from the other cpus once the local limit runs out.
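
      A heavily simplified sketch of the borrowing idea (illustrative only:
      locking, fair apportioning between borrowers, and RUNTIME_INF handling
      are all omitted):

       static void balance_runtime_sketch(struct rt_rq *rt_rq, struct root_domain *rd)
       {
               int cpu;

               for_each_cpu_mask(cpu, rd->span) {
                       struct rt_rq *iter = &cpu_rq(cpu)->rt;

                       if (iter == rt_rq || iter->rt_time >= iter->rt_runtime)
                               continue;

                       /* borrow the peer's unused budget for the local limit */
                       rt_rq->rt_runtime += iter->rt_runtime - iter->rt_time;
                       iter->rt_runtime  = iter->rt_time;
               }
       }
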
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • sched: rt-group: synchonised bandwidth period · d0b27fa7
      Peter Zijlstra authored
      
      Various SMP balancing algorithms require that the bandwidth period
      run in sync.
      
      Possible improvements are moving the rt_bandwidth thing into root_domain
      and keeping a span per rt_bandwidth which marks throttled cpus.
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  13. 07 Mar, 2008 1 commit
    • sched: balance RT task resched only on runqueue · 6fa46fa5
      Steven Rostedt authored
      Sripathi Kodi reported a crash in the -rt kernel:
      
        https://bugzilla.redhat.com/show_bug.cgi?id=435674
      
      This is due to a place that can reschedule a task without holding
      the task's runqueue lock.  It was caused by the RT balancing code
      that pulls RT tasks to the current run queue and then reschedules the
      current task.
      
      There's a slight chance that pulling the RT tasks will release
      the current runqueue's lock and retake it (in double_lock_balance()).
      While the runqueue lock is released, the current task can migrate to
      another runqueue.
      
      In the prio_changed_rt code, after the pull, if the current task is of
      lesser priority than one of the RT tasks pulled, resched_task is called
      on the current task. If the current task had migrated in that small
      window, resched_task will be called without holding the runqueue lock
      for the runqueue that the task is on.
      
      This race condition also exists in the mainline kernel and this patch
      adds a check to make sure the task hasn't migrated before calling
      resched_task.
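
      A hedged sketch of the guard this describes (not the literal diff; the
      function name is illustrative): after pull_rt_task() may have dropped and
      re-acquired rq->lock, only resched if the task is still current on this
      runqueue:

       static void prio_changed_rt_sketch(struct rq *rq, struct task_struct *p,
                                          int oldprio)
       {
               if (oldprio < p->prio)
                       pull_rt_task(rq);       /* may drop and re-take rq->lock */

               /* 'p' may have migrated while the lock was dropped: only
                * resched if it is still current on this runqueue */
               if (p->prio > rq->rt.highest_prio && rq->curr == p)
                       resched_task(p);
       }
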
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
      Tested-by: Sripathi Kodi <sripathik@in.ibm.com>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  14. 04 Mar, 2008 1 commit
    • sched: revert load_balance_monitor() changes · 62fb1851
      Peter Zijlstra authored
      The following commits cause a number of regressions:
      
        commit 58e2d4ca
        Author: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
        Date:   Fri Jan 25 21:08:00 2008 +0100
        sched: group scheduling, change how cpu load is calculated
      
        commit 6b2d7700
        Author: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
        Date:   Fri Jan 25 21:08:00 2008 +0100
        sched: group scheduler, fix fairness of cpu bandwidth allocation for task groups
      
      Namely:
       - very frequent wakeups on SMP, reported by PowerTop users.
       - cacheline thrashing on (large) SMP
       - some latencies larger than 500ms
      
      While there is a mergeable patch to fix the latter, the former issues
      are not fixable in a manner suitable for .25 (we're at -rc3 now).
      
      Hence we revert them and try again in v2.6.26.
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      CC: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
      Tested-by: Alexey Zaytsev <alexey.zaytsev@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  15. 13 Feb, 2008 3 commits
  16. 25 Jan, 2008 5 commits