1. 05 Mar, 2007 14 commits
    • Ingo Molnar's avatar
      [PATCH] disable NMI watchdog by default · 6ebf622b
      Ingo Molnar authored
      
      there's a new NMI watchdog related problem: KVM crashes on certain
      bzImages because ... we enable the NMI watchdog by default (even if the
      user does not ask for it) , and no other OS on this planet does that so
      KVM doesnt have emulation for that yet. So KVM injects a #GP, which
      crashes the Linux guest:
      
       general protection fault: 0000 [#1]
       PREEMPT SMP
       Modules linked in:
       CPU:    0
       EIP:    0060:[<c011a8ae>]    Not tainted VLI
       EFLAGS: 00000246   (2.6.20-rc5-rt0 #3)
       EIP is at setup_apic_nmi_watchdog+0x26d/0x3d3
      
      and no, i did /not/ request an nmi_watchdog on the boot command line!
      
      Solution: turn off that darn thing! It's a debug tool, not a 'make life
      harder' tool!!
      
      with this patch the KVM guest boots up just fine.
      
      And with this my laptop (Lenovo T60) also stopped its sporadic hard
      hanging (sometimes in acpi_init(), sometimes later during bootup,
      sometimes much later during actual use) as well. It hung with both
      nmi_watchdog=1 and nmi_watchdog=2, so it's generally the fact of NMI
      injection that is causing problems, not the NMI watchdog variant, nor
      any particular bootup code.
      
      [ NMI breaks on some systems, esp in combination with SMM -Arjan ]
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      Acked-by: default avatarArjan van de Ven <arjan@linux.intel.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      6ebf622b
    • Heiko Carstens's avatar
      [PATCH] timer/hrtimer: take per cpu locks in sane order · e81ce1f7
      Heiko Carstens authored
      Doing something like this on a two cpu system
      
        # echo 0 > /sys/devices/system/cpu/cpu0/online
        # echo 1 > /sys/devices/system/cpu/cpu0/online
        # echo 0 > /sys/devices/system/cpu/cpu1/online
      
      will give me this:
      
        =======================================================
        [ INFO: possible circular locking dependency detected ]
        2.6.21-rc2-g562aa1d4
      
      -dirty #7
        -------------------------------------------------------
        bash/1282 is trying to acquire lock:
         (&cpu_base->lock_key){.+..}, at: [<000000000005f17e>] hrtimer_cpu_notify+0xc6/0x240
      
        but task is already holding lock:
         (&cpu_base->lock_key#2){.+..}, at: [<000000000005f174>] hrtimer_cpu_notify+0xbc/0x240
      
        which lock already depends on the new lock.
      
      This happens because we have the following code in kernel/hrtimer.c:
      
        migrate_hrtimers(int cpu)
        [...]
        old_base = &per_cpu(hrtimer_bases, cpu);
        new_base = &get_cpu_var(hrtimer_bases);
        [...]
        spin_lock(&new_base->lock);
        spin_lock(&old_base->lock);
      
      Which means the spinlocks are taken in an order which depends on which cpu
      gets shut down from which other cpu. Therefore lockdep complains that there
      might be an ABBA deadlock. Since migrate_hrtimers() gets only called on
      cpu hotplug it's safe to assume that it isn't executed concurrently on a
      
      The same problem exists in kernel/timer.c: migrate_timers().
      
      As pointed out by Christian Borntraeger one possible solution to avoid
      the locking order complaints would be to make sure that the locks are
      always taken in the same order. E.g. by taking the lock of the cpu with
      the lower number first.
      
      To achieve this we introduce two new spinlock functions double_spin_lock
      and double_spin_unlock which lock or unlock two locks in a given order.
      
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Roman Zippel <zippel@linux-m68k.org>
      Cc: John Stultz <johnstul@us.ibm.com>
      Cc: Christian Borntraeger <cborntra@de.ibm.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: default avatarHeiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      e81ce1f7
    • john stultz's avatar
      [PATCH] clocksource init adjustments (fix bug #7426) · 6bb74df4
      john stultz authored
      This patch resolves the issue found here:
      http://bugme.osdl.org/show_bug.cgi?id=7426
      
      
      
      The basic summary is:
      Currently we register most of i386/x86_64 clocksources at module_init
      time. Then we enable clocksource selection at late_initcall time. This
      causes some problems for drivers that use gettimeofday for init
      calibration routines (specifically the es1968 driver in this case),
      where durring module_init, the only clocksource available is the low-res
      jiffies clocksource. This may cause slight calibration errors, due to
      the small sampling time used.
      
      It should be noted that drivers that require fine grained time may not
      function on architectures that do not have better then jiffies
      resolution timekeeping (there are a few). However, this does not
      discount the reasonable need for such fine-grained timekeeping at init
      time.
      
      Thus the solution here is to register clocksources earlier (ideally when
      the hardware is being initialized), and then we enable clocksource
      selection at fs_initcall (before device_initcall).
      
      This patch should probably get some testing time in -mm, since
      clocksource selection is one of the most important issues for correct
      timekeeping, and I've only been able to test this on a few of my own
      boxes.
      Signed-off-by: default avatarJohn Stultz <johnstul@us.ibm.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: "David S. Miller" <davem@davemloft.net>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      6bb74df4
    • Zachary Amsden's avatar
      [PATCH] vmi: apic ops · 772205f6
      Zachary Amsden authored
      
      Use para_fill instead of directly setting the APIC ops to the result of the
      vmi_get_function call - this allows one to implement a VMI ROM without
      implementing APIC functions, just using the native APIC functions.
      
      While doing this, I realized that there is a lot more cleanup that should have
      been done.  Basically, we should never assume that the ROM implements a
      specific set of functions, and always allow fallback to the native
      implementation.
      
      This is critical for future compatibility.
      Signed-off-by: default avatarAnthony Liguori <anthony@codemonkey.ws>
      Signed-off-by: default avatarZachary Amsden <zach@vmware.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      772205f6
    • Zachary Amsden's avatar
      [PATCH] vmi: pit override · e30fab3a
      Zachary Amsden authored
      
      The time_init_hook in paravirt-ops no longer functions in the correct manner
      after the integration of the hrtimers code.  The problem is that now the call
      path for time initialization is:
      
        time_init :
             late_time_init = hpet_time_init;
      
        late_time_init -> hpet_time_init:
             setup_pit_timer (BAD)
             do_time_init --> (via paravirt.h)
                time_init_hook --> (via arch_hooks.h)
                    time_init_hook (in SUBARCH/setup.c)
      
      If this isn't confusing enough, the paravirt case goes through an indirect
      function pointer in the paravirt-ops table.  The problem is, by the time the
      paravirt hook is called, the pit timer is already enabled.
      
      But paravirt guests have their own timer, and don't want to use the PIT.
      Rather than intensify the struggle for power going on here, just make it all
      nice and simple and just unconditionally do all timer setup in the
      late_time_init hook.  This also has the advantage of enabling timers in the
      same place in all code paths, so everyone has the same bugs and we don't have
      outliers who break other code because they turn on timer too early or too
      late.
      
      So the paravirt-ops time init function is now by default hpet_time_init, which
      is the time init function used for native hardware.  Paravirt guests have the
      chance to override this when they setup the paravirt-ops table, and should
      need no change.
      Signed-off-by: default avatarZachary Amsden <zach@vmware.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      e30fab3a
    • Zachary Amsden's avatar
      [PATCH] vmi: paravirt drop udelay op · eda08b1b
      Zachary Amsden authored
      
      Not respecting udelay causes problems with any virtual hardware that is passed
      through to real hardware.  This can be noticed by any device that interacts
      with the real world in real time - like AP startup, which takes real time.  Or
      keyboard LEDs, which should blink in real-time.  Or floppy drives, but only
      when passed through to a real floppy controller on OSes which can't
      sufficiently buffer the floppy commands to emulate a zero latency floppy.  Or
      IDE drives, when connecting to a physical CDROM.
      
      This was mostly a hack to get the kernel to boot faster, but it introduced a
      number of misvirtualization bugs, and Alan and Pavel argued pretty strongly
      against it.  We were the only client, and now want to clean up this cruft.
      Signed-off-by: default avatarZachary Amsden <zach@vmware.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      eda08b1b
    • Zachary Amsden's avatar
      [PATCH] vmi: fix highpte · 9a1c13e9
      Zachary Amsden authored
      
      Provide a PT map hook for HIGHPTE kernels to designate where they are mapping
      page tables.  This information is required so the physical address of PTE
      updates can be determined; otherwise, the mm layer would have to carry the
      physical address all the way to each PTE modification callsite, which is even
      more hideous that the macros required to provide the proper hooks.
      
      So lets not mess up arch neutral code to achieve this, but keep the horror in
      an #ifdef HIGHPTE in include/asm-i386/pgtable.h.  I had to use macros here
      because some types are not yet defined in all the include paths for this
      header.
      
      This patch is absolutely required for HIGHPTE kernels to operate properly with
      VMI.
      Signed-off-by: default avatarZachary Amsden <zach@vmware.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      9a1c13e9
    • Zachary Amsden's avatar
      [PATCH] vmi: cpu cycles fix · 1182d852
      Zachary Amsden authored
      
      In order to share the common code in tsc.c which does CPU Khz calibration, we
      need to make an accurate value of CPU speed available to the tsc.c code.  This
      value loses a lot of precision in a VM because of the timing differences with
      real hardware, but we need it to be as precise as possible so the guest can
      make accurate time calculations with the cycle counters.
      Signed-off-by: default avatarZachary Amsden <zach@vmware.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      1182d852
    • Zachary Amsden's avatar
      [PATCH] vmi: sched clock paravirt op fix · 6cb9a835
      Zachary Amsden authored
      
      The custom_sched_clock hook is broken.  The result from sched_clock needs to
      be in nanoseconds, not in CPU cycles.  The TSC is insufficient for this
      purpose, because TSC is poorly defined in a virtual environment, and mostly
      represents real world time instead of scheduled process time (which can be
      interrupted without notice when a virtual machine is descheduled).
      
      To make the scheduler consistent, we must expose a different nature of time,
      that is scheduled time.  So deprecate this custom_sched_clock hack and turn it
      into a paravirt-op, as it should have been all along.  This allows the tsc.c
      code which converts cycles to nanoseconds to be shared by all paravirt-ops
      backends.
      
      It is unfortunate to add a new paravirt-op, but this is a very distinct
      abstraction which is clearly different for all virtual machine
      implementations, and it gets rid of an ugly indirect function which I
      ashamedly admit I hacked in to try to get this to work earlier, and then even
      got in the wrong units.
      Signed-off-by: default avatarZachary Amsden <zach@vmware.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      6cb9a835
    • Christoph Lameter's avatar
      [PATCH] Page migration: Fix vma flag checking · 0dc952dc
      Christoph Lameter authored
      
      Currently we do not check for vma flags if sys_move_pages is called to move
      individual pages.  If sys_migrate_pages is called to move pages then we
      check for vm_flags that indicate a non migratable vma but that still
      includes VM_LOCKED and we can migrate mlocked pages.
      
      Extract the vma_migratable check from mm/mempolicy.c, fix it and put it
      into migrate.h so that is can be used from both locations.
      
      Problem was spotted by Lee Schermerhorn
      Signed-off-by: default avatarChristoph Lameter <clameter@sgi.com>
      Signed-off-by: default avatarLee Schermerhorn <lee.schermerhorn@hp.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      0dc952dc
    • Con Kolivas's avatar
      [PATCH] sched: remove SMT nice · 69f7c0a1
      Con Kolivas authored
      
      Remove the SMT-nice feature which idles sibling cpus on SMT cpus to
      facilitiate nice working properly where cpu power is shared.  The idling of
      cpus in the presence of runnable tasks is considered too fragile, easy to
      break with outside code, and the complexity of managing this system if an
      architecture comes along with many logical cores sharing cpu power will be
      unworkable.
      
      Remove the associated per_cpu_gain variable in sched_domains used only by
      this code.
      
      Also:
      
        The reason is that with dynticks enabled, this code breaks without yet
        further tweaks so dynticks brought on the rapid demise of this code.  So
        either we tweak this code or kill it off entirely.  It was Ingo's preference
        to kill it off.  Either way this needs to happen for 2.6.21 since dynticks
        has gone in.
      Signed-off-by: default avatarCon Kolivas <kernel@kolivas.org>
      Acked-by: default avatarIngo Molnar <mingo@elte.hu>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      69f7c0a1
    • David Brownell's avatar
      [PATCH] gpio_keys driver shouldn't be ARM-specific · 49015bee
      David Brownell authored
      
      The gpio_keys driver is wrongly ARM-specific; it can't build on
      other platforms with GPIO suport.  This fixes that problem.
      Signed-off-by: default avatarDavid Brownell <dbrownell@users.sourceforge.net>
      Cc: Dmitry Torokhov <dtor@mail.ru>
      Cc: pHilipp Zabel <philipp.zabel@gmail.com>
      Cc: Haavard Skinnemoen <hskinnemoen@atmel.com>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Cc: Richard Purdie <rpurdie@rpsys.net>
      Cc: Ben Nizette <ben.nizette@iinet.net.au>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      49015bee
    • Eric W. Biederman's avatar
      [PATCH] msi: sanely support hardware level msi disabling · f5f2b131
      Eric W. Biederman authored
      
      In some cases when we are not using msi we need a way to ensure that the
      hardware does not have an msi capability enabled.  Currently the code has been
      calling disable_msi_mode to try and achieve that.  However disable_msi_mode
      has several other side effects and is only available when msi support is
      compiled in so it isn't really appropriate.
      
      Instead this patch implements pci_msi_off which disables all msi and msix
      capabilities unconditionally with no additional side effects.
      
      pci_disable_device was redundantly clearing the bus master enable flag and
      clearing the msi enable bit.  A device that is not allowed to perform bus
      mastering operations cannot generate intx or msi interrupt messages as those
      are essentially a special case of dma, and require bus mastering.  So the call
      in pci_disable_device to disable msi capabilities was redundant.
      
      quirk_pcie_pxh also called disable_msi_mode and is updated to use pci_msi_off.
      Signed-off-by: default avatarEric W. Biederman <ebiederm@xmission.com>
      Cc: Michael Ellerman <michael@ellerman.id.au>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Greg KH <greg@kroah.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      f5f2b131
    • Jean Delvare's avatar
      [PATCH] io_apic.h needs apicdef.h · 58a53b24
      Jean Delvare authored
      
      A -mm patch caused:
      
      In file included from drivers/pci/quirks.c:532:
      include/asm/io_apic.h:61: error: "MAX_IO_APICS" undeclared here (not in a function)
      
      So let's include the needed header.
      Signed-off-by: default avatarJean Delvare <khali@linux-fr.org>
      Cc: Andi Kleen <ak@suse.de>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      58a53b24
  2. 04 Mar, 2007 11 commits
  3. 03 Mar, 2007 1 commit
  4. 02 Mar, 2007 11 commits
  5. 01 Mar, 2007 3 commits