1. 03 May, 2008 1 commit
  2. 30 Apr, 2008 1 commit
  3. 28 Apr, 2008 1 commit
      hrtimer: raise softirq unlocked to avoid circular lock dependency · 0c96c597
      Thomas Gleixner authored
      
      The scheduler hrtimer bits in 2.6.25 introduced a circular lock
      dependency in a rare code path:
      
      =======================================================
      [ INFO: possible circular locking dependency detected ]
      2.6.25-sched-devel.git-x86-latest.git #19
      -------------------------------------------------------
      X/2980 is trying to acquire lock:
       (&rq->rq_lock_key#2){++..}, at: [<ffffffff80230146>] task_rq_lock+0x56/0xa0
      
      but task is already holding lock:
       (&cpu_base->lock){++..}, at: [<ffffffff80257ae1>] lock_hrtimer_base+0x31/0x60
      
      which lock already depends on the new lock.
      
      The scenario which leads to this is:
      
      posix-timer signal is delivered
       -> posix-timer is rearmed
          timer is already expired in hrtimer_enqueue()
           -> softirq is raised
      
      To prevent this we need to move the raise of the softirq out of the
      base->lock protected code path.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: stable@kernel.org
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
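The idea of deciding under the lock but acting only after it is dropped can be sketched in plain C. This is a minimal model, not the kernel code: `struct timer_base`, `enqueue_timer()`, `start_timer()` and `raise_softirq_model()` are hypothetical names, and the lock is modelled as a simple flag so the sketch can record whether the softirq would have been raised with the base lock held.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the per-CPU timer base; not the real structure. */
struct timer_base {
    bool locked;              /* models base->lock being held */
    long now;                 /* current time, arbitrary units */
    int raised_while_locked;  /* softirq raised with the lock held (the bug) */
    int raised_unlocked;      /* softirq raised after the lock was dropped */
};

void base_lock(struct timer_base *b)   { b->locked = true; }
void base_unlock(struct timer_base *b) { b->locked = false; }

/* Models raising the softirq, a step that may need the run-queue lock. */
void raise_softirq_model(struct timer_base *b)
{
    if (b->locked)
        b->raised_while_locked++;
    else
        b->raised_unlocked++;
}

/* Called with the base lock held: only reports whether the timer
 * has already expired, it does not raise anything itself. */
bool enqueue_timer(const struct timer_base *b, long expires)
{
    return expires <= b->now;
}

/* Start a timer; the raise happens only after the base lock is released. */
void start_timer(struct timer_base *b, long expires)
{
    bool need_raise;

    base_lock(b);
    need_raise = enqueue_timer(b, expires);
    base_unlock(b);

    if (need_raise)
        raise_softirq_model(b);
}
```

With this shape the base lock is never held while the raise path runs, so no lock taken by that path can end up ordered after the base lock.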
  4. 27 Apr, 2008 1 commit
      hrtimer: timeout too long when using HRTIMER_CB_SOFTIRQ · d7b41a24
      Bodo Stroesser authored
      
      When using hrtimer with timer->cb_mode == HRTIMER_CB_SOFTIRQ
      in some cases the clockevent is not programmed.
      This happens if:
       - a timer is rearmed while its state is HRTIMER_STATE_CALLBACK
       - hrtimer_reprogram() returns -ETIME, when it is called after
         CALLBACK is finished. This occurs if the new timer->expires
         is in the past when CALLBACK is done.
      In this case, the timer needs to be removed from the tree and put
      onto the pending list again.
      
      The patch is against 2.6.22.5, but AFAICS, it is relevant
      for 2.6.25 also (in run_hrtimer_pending()).
      Signed-off-by: Bodo Stroesser <bstroesser@fujitsu-siemens.com>
      Cc: stable@kernel.org
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
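The -ETIME case described above can be illustrated with a small model. This is a sketch under stated assumptions: `struct model_timer`, `reprogram_model()` and `finish_callback_model()` are hypothetical names, and the two queues are reduced to an enum rather than the real rbtree and pending list.

```c
#include <errno.h>

/* Where a timer currently lives in this simplified model. */
enum timer_queue { QUEUE_TREE, QUEUE_PENDING };

struct model_timer {
    long expires;            /* expiry time, arbitrary units */
    enum timer_queue queue;
};

/* Returns -ETIME when the expiry already lies in the past; otherwise 0,
 * which is where real code would program the clock event device. */
int reprogram_model(const struct model_timer *t, long now)
{
    return (t->expires <= now) ? -ETIME : 0;
}

/* Runs after the callback has finished for a timer that was rearmed
 * while the callback was still executing. */
void finish_callback_model(struct model_timer *t, long now)
{
    t->queue = QUEUE_TREE;            /* the rearmed timer sits in the tree */
    if (reprogram_model(t, now) == -ETIME)
        t->queue = QUEUE_PENDING;     /* already expired: back onto the pending list */
}
```

Without the second step an already expired timer would stay in the tree with no event programmed for it, which matches the overly long timeout the commit describes.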
  5. 21 Apr, 2008 2 commits
  6. 17 Apr, 2008 2 commits
  7. 14 Feb, 2008 2 commits
  8. 10 Feb, 2008 2 commits
      hrtimer: don't modify restart_block->fn in restart functions · c289b074
      Oleg Nesterov authored
      
      hrtimer_nanosleep_restart() clears/restores restart_block->fn. This is
      pointless and complicates its usage. Note that if sys_restart_syscall()
      doesn't actually happen, we are left with a bogus "pending" restart->fn
      anyway; this is harmless.
      Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
      Cc: Alexey Dobriyan <adobriyan@sw.ru>
      Cc: Pavel Emelyanov <xemul@sw.ru>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Toyo Abe <toyoa@mvista.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      hrtimer: fix *rmtp handling in hrtimer_nanosleep() · 080344b9
      Oleg Nesterov authored
      Spotted by Pavel Emelyanov and Alexey Dobriyan.
      
      hrtimer_nanosleep() sets restart_block->arg1 = rmtp, but this rmtp points to
      the local variable which lives in the caller's stack frame. This means that
      if sys_restart_syscall() actually happens and it is interrupted as well, we
      don't update the user-space variable, but write into the already dead stack
      frame.
      
      Introduced by commit 04c22714 ("hrtimer: Rework hrtimer_nanosleep to make
      sys_compat_nanosleep easier").
      
      Change the callers to pass "__user *rmtp" to hrtimer_nanosleep(), and change
      hrtimer_nanosleep() to use copy_to_user() to actually update *rmtp.
      
      Small problem remains. man 2 nanosleep states that *rmtp should be written if
      nanosleep() was interrupted (it says nothing whether it is OK to update *rmtp
      if nanosleep returns 0), but (with or without this patch) we can dirty *rem
      even if nanosleep() returns 0.
      
      NOTE: this patch doesn't change compat_sys_nanosleep(), because it has other
      bugs. Fixed by the next patch.
      Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
      Cc: Alexey Dobriyan <adobriyan@sw.ru>
      Cc: Michael Kerrisk <mtk.manpages@googlemail.com>
      Cc: Pavel Emelyanov <xemul@sw.ru>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Toyo Abe <toyoa@mvista.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      
       include/linux/hrtimer.h |    2 -
       kernel/hrtimer.c        |   51 +++++++++++++++++++++++++-----------------------
       kernel/posix-timers.c   |   14 +------------
       3 files changed, 30 insertions(+), 37 deletions(-)
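From user space, the remaining-time argument discussed above is typically consumed as follows. This is a minimal sketch: `sleep_fully()` is a hypothetical helper name, while `nanosleep()` is the standard POSIX call. Because the remaining time is only documented as meaningful after an interrupted sleep, the helper reads it only when the call fails with EINTR.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <time.h>

/* Sleep for the whole requested interval, restarting with the remaining
 * time whenever a signal interrupts the sleep.
 * Returns 0 on success and -1 on any error other than EINTR. */
int sleep_fully(struct timespec req)
{
    struct timespec rem;

    while (nanosleep(&req, &rem) == -1) {
        if (errno != EINTR)
            return -1;
        req = rem;   /* rem is only relied upon after an interrupted sleep */
    }
    return 0;
}
```

A caller that ignores the remaining-time value when nanosleep() returns 0, as this helper does, is unaffected by the small problem mentioned in the commit message.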
  9. 05 Feb, 2008 1 commit
      timerfd: new timerfd API · 4d672e7a
      Davide Libenzi authored
      This is the new timerfd API as it is implemented by the following patch:
      
      int timerfd_create(int clockid, int flags);
      int timerfd_settime(int ufd, int flags,
      		    const struct itimerspec *utmr,
      		    struct itimerspec *otmr);
      int timerfd_gettime(int ufd, struct itimerspec *otmr);
      
      The timerfd_create() API creates an un-programmed timerfd fd.  The "clockid"
      parameter can be either CLOCK_MONOTONIC or CLOCK_REALTIME.
      
      The timerfd_settime() API applies new settings to the timerfd fd, optionally
      retrieving the previous expiration time (if the "otmr" parameter is not
      NULL).
      
      The time value specified in "utmr" is absolute, if the TFD_TIMER_ABSTIME bit
      is set in the "flags" parameter.  Otherwise it's a relative time.
      
      The timerfd_gettime() API returns the next expiration time of the timer, or
      {0, 0} if the timerfd has not been set yet.
      
      Like the previous timerfd API implementation, read(2) and poll(2) are
      supported (with the same interface).  Here's a simple test program I used to
      exercise the new timerfd APIs:
      
      http://www.xmailserver.org/timerfd-test2.c
      
      
      
      [akpm@linux-foundation.org: coding-style cleanups]
      [akpm@linux-foundation.org: fix ia64 build]
      [akpm@linux-foundation.org: fix m68k build]
      [akpm@linux-foundation.org: fix mips build]
      [akpm@linux-foundation.org: fix alpha, arm, blackfin, cris, m68k, s390, sparc and sparc64 builds]
      [heiko.carstens@de.ibm.com: fix s390]
      [akpm@linux-foundation.org: fix powerpc build]
      [akpm@linux-foundation.org: fix sparc64 more]
      Signed-off-by: Davide Libenzi <davidel@xmailserver.org>
      Cc: Michael Kerrisk <mtk-manpages@gmx.net>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Davide Libenzi <davidel@xmailserver.org>
      Cc: Michael Kerrisk <mtk-manpages@gmx.net>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Cc: Davide Libenzi <davidel@xmailserver.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  10. 01 Feb, 2008 1 commit
  11. 25 Jan, 2008 3 commits
  12. 21 Jan, 2008 1 commit
      hrtimer: fix section mismatch · 0ec160dd
      Randy Dunlap authored
      
      Fix section mismatch in hrtimer.c:
      
      WARNING: vmlinux.o(.text+0x50c61): Section mismatch: reference to .init.text: (between 'hrtimer_cpu_notify' and 'down_read_trylock')
      
      Noticed by Johannes Berg and confirmed by Sam Ravnborg.
      Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
      Cc: Sam Ravnborg <sam@ravnborg.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@akpm@linux-foundation.org>
  13. 07 Dec, 2007 1 commit
      hrtimers: avoid overflow for large relative timeouts · 62f0f61e
      Thomas Gleixner authored
      
      Relative hrtimers with a large timeout value might end up as negative
      timer values, when the current time is added in hrtimer_start().
      
      This in turn causes the clockevents_set_next() function to set a
      huge timeout and sleep for quite a long time when we have a clock
      source which is capable of long sleeps like HPET. With PIT this almost
      goes unnoticed as the maximum delta is ~27ms. The non-hrt/nohz code
      sorts this out in the next timer interrupt, so we never noticed that
      problem which has been there since the first day of hrtimers.
      
      This bug became more apparent in 2.6.24 which activates HPET on more
      hardware.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
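The overflow described above is the usual signed-addition problem: adding the current time to a very large relative timeout wraps around to a negative value. One way to guard against it is a saturating add, sketched below. This illustrates the technique and is not necessarily how the patch does it; `expiry_from_relative()` and `MODEL_KTIME_MAX` are hypothetical names, and both inputs are assumed to be non-negative nanosecond counts.

```c
#include <stdint.h>

/* Largest representable expiry in this model (hypothetical constant). */
#define MODEL_KTIME_MAX INT64_MAX

/* Convert a relative timeout into an absolute expiry, clamping to the
 * maximum instead of wrapping negative. Assumes now_ns >= 0 and
 * delta_ns >= 0. */
int64_t expiry_from_relative(int64_t now_ns, int64_t delta_ns)
{
    /* Test before adding, because signed overflow is undefined in C. */
    if (delta_ns > 0 && now_ns > MODEL_KTIME_MAX - delta_ns)
        return MODEL_KTIME_MAX;
    return now_ns + delta_ns;
}
```

With the clamp in place, an "effectively infinite" relative timeout stays a far-future expiry instead of turning into a negative timer value.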
  14. 29 Oct, 2007 1 commit
  15. 19 Oct, 2007 2 commits
  16. 18 Oct, 2007 1 commit
  17. 10 Oct, 2007 1 commit
  18. 25 Jul, 2007 2 commits
      Cache xtime every call to update_wall_time · 17c38b74
      john stultz authored
      
      This avoids xtime lag seen with dynticks, because while 'xtime' itself
      is still not updated often, we keep a 'xtime_cache' variable around that
      contains the approximate real-time that _is_ updated each time we do a
      'update_wall_time()', and is thus never off by more than one tick.
      
      IOW, this restores the original semantics for 'xtime' users, as long as
      you use the proper abstraction functions (ie 'current_kernel_time()' or
      'get_seconds()' depending on whether you want a timespec or just the
      seconds field).
      
      [ Updated Patch.  As penance for my sins I've also yanked another #ifdef
        that was added to avoid the xtime lag w/ hrtimers.  ]
      Signed-off-by: John Stultz <johnstul@us.ibm.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Cleanup non-arch xtime uses, use get_seconds() or current_kernel_time(). · 2c6b47de
      john stultz authored
      
      This avoids use of the kernel-internal "xtime" variable directly outside
      of the actual time-related functions.  Instead, use the helper functions
      that we already have available to us.
      
      This doesn't actually change any behaviour, but this will allow us to
      fix the fact that "xtime" isn't updated very often with CONFIG_NO_HZ
      (because much of the realtime information is maintained as separate
      offsets to 'xtime'), which has caused interfaces that use xtime directly
      to get a time that is out of sync with the real-time clock by up to a
      third of a second or so.
      Signed-off-by: John Stultz <johnstul@us.ibm.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  19. 21 Jul, 2007 2 commits
  20. 16 Jul, 2007 1 commit
  21. 09 May, 2007 1 commit
      Add suspend-related notifications for CPU hotplug · 8bb78442
      Rafael J. Wysocki authored
      
      Since nonboot CPUs are now disabled after tasks and devices have been
      frozen and the CPU hotplug infrastructure is used for this purpose, we need
      special CPU hotplug notifications that will help the CPU-hotplug-aware
      subsystems distinguish normal CPU hotplug events from CPU hotplug events
      related to a system-wide suspend or resume operation in progress.  This
      patch introduces such notifications and causes them to be used during
      suspend and resume transitions.  It also changes all of the
      CPU-hotplug-aware subsystems to take these notifications into consideration
      (for now they are handled in the same way as the corresponding "normal"
      ones).
      
      [oleg@tv-sign.ru: cleanups]
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
      Cc: Gautham R Shenoy <ego@in.ibm.com>
      Cc: Pavel Machek <pavel@ucw.cz>
      Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  22. 08 May, 2007 1 commit
  23. 27 Apr, 2007 1 commit
  24. 26 Apr, 2007 1 commit
      [NET_SCHED]: Use ktime as clocksource · 641b9e0e
      Patrick McHardy authored
      
      Get rid of the manual clock source selection mess and use ktime. Also
      use a scalar representation, which allows cleaning up pkt_sched.h a bit
      more and results in fewer ktime_to_ns() calls in most cases.
      
      The PSCHED_US2JIFFIE/PSCHED_JIFFIE2US macros are implemented quite
      inefficiently by this patch; following patches will convert all qdiscs
      to hrtimers and get rid of them entirely.
      Signed-off-by: Patrick McHardy <kaber@trash.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  25. 07 Apr, 2007 1 commit
      [PATCH] high-res timers: resume fix · 995f054f
      Ingo Molnar authored
      
      Soeren Sonnenburg reported that upon resume he is getting
      this backtrace:
      
       [<c0119637>] smp_apic_timer_interrupt+0x57/0x90
       [<c0142d30>] retrigger_next_event+0x0/0xb0
       [<c0104d30>] apic_timer_interrupt+0x28/0x30
       [<c0142d30>] retrigger_next_event+0x0/0xb0
       [<c0140068>] __kfifo_put+0x8/0x90
       [<c0130fe5>] on_each_cpu+0x35/0x60
       [<c0143538>] clock_was_set+0x18/0x20
       [<c0135cdc>] timekeeping_resume+0x7c/0xa0
       [<c02aabe1>] __sysdev_resume+0x11/0x80
       [<c02ab0c7>] sysdev_resume+0x47/0x80
       [<c02b0b05>] device_power_up+0x5/0x10
      
      It turns out that on resume we mistakenly re-enable interrupts too
      early.  Do the timer retrigger only on the current CPU.
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Acked-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Soeren Sonnenburg <kernel@nn7.de>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  26. 28 Mar, 2007 1 commit
  27. 16 Mar, 2007 2 commits
  28. 06 Mar, 2007 1 commit
  29. 05 Mar, 2007 1 commit
      [PATCH] timer/hrtimer: take per cpu locks in sane order · e81ce1f7
      Heiko Carstens authored
      Doing something like this on a two cpu system
      
        # echo 0 > /sys/devices/system/cpu/cpu0/online
        # echo 1 > /sys/devices/system/cpu/cpu0/online
        # echo 0 > /sys/devices/system/cpu/cpu1/online
      
      will give me this:
      
        =======================================================
        [ INFO: possible circular locking dependency detected ]
        2.6.21-rc2-g562aa1d4-dirty #7
        -------------------------------------------------------
        bash/1282 is trying to acquire lock:
         (&cpu_base->lock_key){.+..}, at: [<000000000005f17e>] hrtimer_cpu_notify+0xc6/0x240
      
        but task is already holding lock:
         (&cpu_base->lock_key#2){.+..}, at: [<000000000005f174>] hrtimer_cpu_notify+0xbc/0x240
      
        which lock already depends on the new lock.
      
      This happens because we have the following code in kernel/hrtimer.c:
      
        migrate_hrtimers(int cpu)
        [...]
        old_base = &per_cpu(hrtimer_bases, cpu);
        new_base = &get_cpu_var(hrtimer_bases);
        [...]
        spin_lock(&new_base->lock);
        spin_lock(&old_base->lock);
      
      Which means the spinlocks are taken in an order which depends on which cpu
      gets shut down from which other cpu. Therefore lockdep complains that there
      might be an ABBA deadlock. Since migrate_hrtimers() gets only called on
      cpu hotplug it's safe to assume that it isn't executed concurrently on a
      
      The same problem exists in kernel/timer.c: migrate_timers().
      
      As pointed out by Christian Borntraeger one possible solution to avoid
      the locking order complaints would be to make sure that the locks are
      always taken in the same order. E.g. by taking the lock of the cpu with
      the lower number first.
      
      To achieve this we introduce two new spinlock functions double_spin_lock
      and double_spin_unlock which lock or unlock two locks in a given order.
      
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Roman Zippel <zippel@linux-m68k.org>
      Cc: John Stultz <johnstul@us.ibm.com>
      Cc: Christian Borntraeger <cborntra@de.ibm.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
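The ordering rule suggested above (always take the lock of the lower-numbered cpu first) can be sketched with a toy model. Everything here is hypothetical: `struct cpu_lock`, `struct lock_log`, `double_lock_ordered()` and `double_unlock_ordered()` are illustrative names, the locks are plain flags, and the helpers added by the patch may have a different signature.

```c
/* A per-CPU lock reduced to a flag, tagged with its CPU number. */
struct cpu_lock {
    int cpu;
    int held;
};

/* Records the CPU numbers in the order their locks were acquired. */
struct lock_log {
    int order[2];
    int n;
};

void take_lock(struct cpu_lock *l, struct lock_log *log)
{
    l->held = 1;
    if (log->n < 2)
        log->order[log->n++] = l->cpu;
}

/* Take both locks, lower CPU number first, whatever the argument order. */
void double_lock_ordered(struct cpu_lock *a, struct cpu_lock *b,
                         struct lock_log *log)
{
    if (a->cpu < b->cpu) {
        take_lock(a, log);
        take_lock(b, log);
    } else {
        take_lock(b, log);
        take_lock(a, log);
    }
}

/* Release both locks; the release order does not affect deadlock freedom. */
void double_unlock_ordered(struct cpu_lock *a, struct cpu_lock *b)
{
    a->held = 0;
    b->held = 0;
}
```

Because both call orders produce the same acquisition sequence, the A-then-B versus B-then-A pattern that lockdep reports cannot arise in this model.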
  30. 16 Feb, 2007 1 commit
      [PATCH] Add debugging feature /proc/timer_stat · 82f67cd9
      Ingo Molnar authored
      
      Add /proc/timer_stats support: debugging feature to profile timer expiration.
      The starting site, the process/PID and the expiration function are all captured.
      This allows the quick identification of timer event sources in a system.
      
      Sample output:
      
      # echo 1 > /proc/timer_stats
      # cat /proc/timer_stats
      Timer Stats Version: v0.1
      Sample period: 4.010 s
        24,     0 swapper          hrtimer_stop_sched_tick (hrtimer_sched_tick)
        11,     0 swapper          sk_reset_timer (tcp_delack_timer)
         6,     0 swapper          hrtimer_stop_sched_tick (hrtimer_sched_tick)
         2,     1 swapper          queue_delayed_work_on (delayed_work_timer_fn)
        17,     0 swapper          hrtimer_restart_sched_tick (hrtimer_sched_tick)
         2,     1 swapper          queue_delayed_work_on (delayed_work_timer_fn)
         4,  2050 pcscd            do_nanosleep (hrtimer_wakeup)
         5,  4179 sshd             sk_reset_timer (tcp_write_timer)
         4,  2248 yum-updatesd     schedule_timeout (process_timeout)
        18,     0 swapper          hrtimer_restart_sched_tick (hrtimer_sched_tick)
         3,     0 swapper          sk_reset_timer (tcp_delack_timer)
         1,     1 swapper          neigh_table_init_no_netlink (neigh_periodic_timer)
         2,     1 swapper          e1000_up (e1000_watchdog)
         1,     1 init             schedule_timeout (process_timeout)
      100 total events, 25.24 events/sec
      
      [ cleanups and hrtimers support from Thomas Gleixner <tglx@linutronix.de> ]
      [bunk@stusta.de: nr_entries can become static]
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: john stultz <johnstul@us.ibm.com>
      Cc: Roman Zippel <zippel@linux-m68k.org>
      Cc: Andi Kleen <ak@suse.de>
      Signed-off-by: Adrian Bunk <bunk@stusta.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>