- 12 Mar, 2010 1 commit
-
-
Christoph Hellwig authored
Add a generic implementation of the old select() syscall, which expects its argument in a memory block, and switch all architectures over to use it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Hirokazu Takata <takata@linux-m32r.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Reviewed-by: H. Peter Anvin <hpa@zytor.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: James Morris <jmorris@namei.org>
Acked-by: Andreas Schwab <schwab@linux-m68k.org>
Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>
Acked-by: Greg Ungerer <gerg@uclinux.org>
Acked-by: David Howells <dhowells@redhat.com>
Cc: Andreas Schwab <schwab@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
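For reference, a minimal sketch of what such a generic wrapper looks like, assuming the classic sel_arg_struct layout used by the old ABI (names are illustrative, not a quote of the patch):

struct sel_arg_struct {
        unsigned long n;
        fd_set __user *inp, *outp, *exp;
        struct timeval __user *tvp;
};

SYSCALL_DEFINE1(old_select, struct sel_arg_struct __user *, arg)
{
        struct sel_arg_struct a;

        /* the old ABI passes a pointer to a block holding all five arguments */
        if (copy_from_user(&a, arg, sizeof(a)))
                return -EFAULT;
        /* sys_select() performs the usual validation on the unpacked values */
        return sys_select(a.n, a.inp, a.outp, a.exp, a.tvp);
}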
-
- 01 Mar, 2010 26 commits
-
-
Gleb Natapov authored
Add proper error and permission checking. This patch also changes the task switching code to load segment selectors before segment descriptors, as the SDM requires; otherwise permission checking during segment descriptor loading will be incorrect.

Cc: stable@kernel.org (2.6.33, 2.6.32)
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
-
Gleb Natapov authored
Make emulator check that vcpu is allowed to execute IN, INS, OUT, OUTS, CLI, STI.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Cc: stable@kernel.org
Signed-off-by: Avi Kivity <avi@redhat.com>
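As a rough illustration of one of the checks involved (a sketch, not the patch itself; the helper name is an assumption): CLI and STI are only permitted when the current privilege level does not exceed EFLAGS.IOPL.

/* sketch: privilege rule enforced for CLI/STI in protected mode */
static bool emulator_iopl_ok(struct x86_emulate_ctxt *ctxt, struct kvm_vcpu *vcpu)
{
        int cpl  = kvm_x86_ops->get_cpl(vcpu);  /* current privilege level */
        int iopl = (ctxt->eflags >> 12) & 3;    /* EFLAGS.IOPL lives in bits 12-13 */

        return cpl <= iopl;
}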
-
Gleb Natapov authored
Currently, when the x86 emulator needs to access memory, the page walk is done with the broadest permissions possible, so an emulated instruction executed by a userspace process can still access kernel memory. Fix that by providing the correct memory access permissions to the page walker during emulation.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Cc: stable@kernel.org
Signed-off-by: Avi Kivity <avi@redhat.com>
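The idea can be sketched as follows (illustrative only; translate_gva() stands in for the real page-walker entry point):

static gpa_t emulator_gva_to_gpa(struct kvm_vcpu *vcpu, gva_t gva, bool write)
{
        u32 access = 0;

        if (kvm_x86_ops->get_cpl(vcpu) == 3)
                access |= PFERR_USER_MASK;      /* honour user/supervisor protection */
        if (write)
                access |= PFERR_WRITE_MASK;     /* honour write protection */

        return translate_gva(vcpu, gva, access);  /* assumed walker entry point */
}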
-
Gleb Natapov authored
For some instructions the CPU behaves differently in real mode and virtual-8086 mode. Let the emulator know which mode the cpu is in, so it will not poke into vcpu state directly.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Cc: stable@kernel.org
Signed-off-by: Avi Kivity <avi@redhat.com>
-
Sheng Yang authored
Following the new SDM. Now the bit is named "Ignore PAT memory type".

Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
-
Avi Kivity authored
None of the other registers have the shadow_ prefix.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
-
Avi Kivity authored
Assume that if the guest executes clts, it knows what it's doing, and load the guest fpu to prevent an #NM exception.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
-
Jan Kiszka authored
Enhance mov dr instruction emulation used by SVM so that it properly handles dr4/5: alias to dr6/7 if cr4.de is cleared. Otherwise return EMULATE_FAIL which will let our only possible caller in that scenario, ud_interception, re-inject UD. We do not need to inject faults, SVM does this for us (exceptions take precedence over instruction interceptions). For the same reason, the value overflow checks can be removed.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
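A sketch of the aliasing rule being implemented (helper names are assumptions): with cr4.de clear, accesses to dr4/5 are redirected to dr6/7; with cr4.de set they must fail so the caller can re-inject #UD.

static int resolve_dr(struct kvm_vcpu *vcpu, int dr)
{
        if (dr == 4 || dr == 5) {
                if (kvm_read_cr4_bits(vcpu, X86_CR4_DE))
                        return -1;      /* no aliasing: caller returns EMULATE_FAIL */
                dr += 2;                /* alias dr4->dr6, dr5->dr7 */
        }
        return dr;
}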
-
Avi Kivity authored
Needed by <asm/kvm_para.h>.

Signed-off-by: Avi Kivity <avi@redhat.com>
-
Gleb Natapov authored
Implement HYPER-V apic MSRs. The spec defines three MSRs that speed up access to the EOI/TPR/ICR apic registers for PV guests.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Vadim Rozenfeld <vrozenfe@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
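Conceptually, the write side maps each synthetic MSR onto a local APIC register, roughly like the sketch below (the MSR indices are taken from the Hyper-V spec, and the kvm_hv_vapic_msr_write() helper is an assumption here):

static int hv_apic_msr_write(struct kvm_vcpu *vcpu, u32 msr, u64 data)
{
        switch (msr) {
        case HV_X64_MSR_EOI:                    /* 0x40000070 */
                return kvm_hv_vapic_msr_write(vcpu, APIC_EOI, data);
        case HV_X64_MSR_ICR:                    /* 0x40000071 */
                return kvm_hv_vapic_msr_write(vcpu, APIC_ICR, data);
        case HV_X64_MSR_TPR:                    /* 0x40000072 */
                return kvm_hv_vapic_msr_write(vcpu, APIC_TASKPRI, data);
        default:
                return 1;                       /* not a Hyper-V apic MSR */
        }
}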
-
Gleb Natapov authored
Minimum HYPER-V implementation should have GUEST_OS_ID, HYPERCALL and VP_INDEX MSRs.

[avi: fix build on i386]

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Vadim Rozenfeld <vrozenfe@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
-
Gleb Natapov authored
Provide HYPER-V related defines that will be used by following patches.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Vadim Rozenfeld <vrozenfe@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
-
Avi Kivity authored
Instead of selecting TS and MP as the comments say, the macro included TS and PE. Luckily the macro is unused now, but fix it in order to save a few hours of debugging for anyone who attempts to use it.

Acked-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
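The class of bug, reconstructed as a sketch (the real macro name may differ):

/* what the comment promised: cover TS and MP */
#define GUEST_CR0_SELECTIVE_MASK_INTENDED   (X86_CR0_TS | X86_CR0_MP)

/* what the macro actually expanded to before the fix */
#define GUEST_CR0_SELECTIVE_MASK_BUGGY      (X86_CR0_TS | X86_CR0_PE)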
-
Avi Kivity authored
Defer fpu deactivation as much as possible - if the guest fpu is loaded, keep it loaded until the next heavyweight exit (where we are forced to unload it). This reduces unnecessary exits. We also defer fpu activation on clts; while clts signals the intent to use the fpu, we can't be sure the guest will actually use it.

Signed-off-by: Avi Kivity <avi@redhat.com>
-
Avi Kivity authored
We will use this later to give the guest ownership of cr0.ts.

Signed-off-by: Avi Kivity <avi@redhat.com>
-
Takuya Yoshikawa authored
The explanation of write_emulated was confused with that of read_emulated. This patch fixes it.

Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
-
Sheng Yang authored
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
-
Sheng Yang authored
Then the callback can provide the maximum supported large page level, which is more flexible. Also move the gb page support into x86_64-specific code.

Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
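A sketch of the callback style described, assuming a VMX backend that reports its largest supported page level (the capability check is illustrative):

static int vmx_get_lpage_level(void)
{
        if (!cpu_has_vmx_ept_1g_page())         /* assumed capability check */
                return PT_DIRECTORY_LEVEL;      /* 2MB large pages only */
        return PT_PDPE_LEVEL;                   /* 1GB pages supported */
}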
-
Avi Kivity authored
With slots_lock converted to rcu, the entire kvm hotpath on modern processors (with npt or ept) now scales beautifully. Increase the maximum vcpu count to 64 to reflect this.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
-
Marcelo Tosatti authored
Using a similar two-step procedure as for memslots.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
-
Marcelo Tosatti authored
Have a pointer to an allocated region inside x86's kvm_arch.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
-
Sheng Yang authored
Before enabling, execution of "rdtscp" in guest would result in #UD.

Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
-
Sheng Yang authored
Sometimes we need to adjust some state in order to reflect the guest CPUID setting, e.g. if we don't expose rdtscp to the guest, we don't want to enable it in hardware. cpuid_update() is introduced for this purpose. Also export kvm_find_cpuid_entry() for later use.

Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
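A sketch of how such a hook might be used (vmx_disable_rdtscp() is an assumed helper, not part of the patch text):

static void vmx_cpuid_update(struct kvm_vcpu *vcpu)
{
        struct kvm_cpuid_entry2 *best;

        /* RDTSCP is advertised in CPUID.80000001H:EDX bit 27 */
        best = kvm_find_cpuid_entry(vcpu, 0x80000001, 0);
        if (!best || !(best->edx & (1u << 27)))
                vmx_disable_rdtscp(vcpu);       /* keep hardware in line with CPUID */
}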
-
Avi Kivity authored
Some bits of cr4 can be owned by the guest on vmx, so when we read them, we copy them to the vcpu structure. In preparation for making the set of guest-owned bits dynamic, use helpers to access these bits so we don't need to know where the bit resides. No changes to svm since all bits are host-owned there.

Signed-off-by: Avi Kivity <avi@redhat.com>
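A sketch of the accessor pattern (field and hook names are assumptions): refresh any guest-owned bits from hardware before reading, so callers never need to know where a bit lives.

static inline unsigned long kvm_read_cr4_bits(struct kvm_vcpu *vcpu,
                                              unsigned long mask)
{
        if (mask & vcpu->arch.cr4_guest_owned_bits)
                kvm_x86_ops->decache_cr4_guest_bits(vcpu);  /* assumed hook */
        return vcpu->arch.cr4 & mask;
}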
-
Avi Kivity authored
They have no place in common code.

Signed-off-by: Avi Kivity <avi@redhat.com>
-
Sheng Yang authored
We don't support these instructions, but the guest can execute them even if the 'monitor' feature hasn't been exposed in CPUID, so trap them and inject a #UD if the guest tries.

Cc: stable@kernel.org
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
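The exit handler amounts to little more than the following sketch (the function name is illustrative):

static int handle_monitor_mwait(struct kvm_vcpu *vcpu)
{
        kvm_queue_exception(vcpu, UD_VECTOR);   /* reflect #UD into the guest */
        return 1;                               /* exit handled, resume the guest */
}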
-
- 27 Feb, 2010 2 commits
-
-
Ian Campbell authored
Now that both Xen and VMI disable allocations of PTE pages from high memory this paravirt op serves no further purpose. This effectively reverts ce6234b5 "add kmap_atomic_pte for mapping highpte pages".

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
LKML-Reference: <1267204562-11844-3-git-send-email-ian.campbell@citrix.com>
Acked-by: Alok Kataria <akataria@vmware.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
-
Russ Anderson authored
Enable NMI on all cpus in UV system and add an NMI handler to dump_stack on each cpu.

By default on x86 all the cpus except the boot cpu have NMI masked off. This patch enables NMI on all cpus in UV system and adds an NMI handler to dump_stack on each cpu. This way if a system hangs we can NMI the machine and get a backtrace from all the cpus.

Version 2: Use x86_platform driver mechanism for nmi init, per Ingo's suggestion.
Version 3: Clean up Ingo's nits.

Signed-off-by: Russ Anderson <rja@sgi.com>
LKML-Reference: <20100226164912.GA24439@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
- 25 Feb, 2010 6 commits
-
-
Thomas Gleixner authored
Replace the #ifdef'ed OLPC-specific init functions by a conditional x86_init function. If the function returns 0 we leave pci_arch_init, otherwise we continue.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: Andres Salomon <dilinger@collabora.co.uk>
LKML-Reference: <43F901BD926A4E43B106BF17856F0755A318CE89@orsmsx508.amr.corp.intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
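The resulting hook convention can be sketched like this (the field path follows the x86_init pattern and the fallback call is illustrative):

static __init int pci_arch_init(void)
{
        /* a platform hook returning 0 means it handled PCI init itself */
        if (x86_init.pci.arch_init && !x86_init.pci.arch_init())
                return 0;

        /* otherwise fall through to the generic PCI initialization */
        return pci_legacy_init();               /* illustrative default path */
}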
-
Thomas Gleixner authored
Add an abstraction function for arch-specific init calls.

Signed-off-by: Jacob Pan <jacob.jun.pan@intel.com>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
LKML-Reference: <43F901BD926A4E43B106BF17856F0755A318CE84@orsmsx508.amr.corp.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
-
Masami Hiramatsu authored
Introduce x86 arch-specific optimization code, which supports both x86-32 and x86-64. This code also supports safety checking, which decodes the whole of the function in which the probe is inserted and checks the following conditions before optimization:
- The optimized instructions which will be replaced by a jump instruction don't straddle the function boundary.
- There is no indirect jump instruction, because it may jump into the address range which is replaced by the jump operand.
- There is no jump/loop instruction which jumps into the address range which is replaced by the jump operand.
- Don't optimize kprobes if they are in functions into which fixup code will jump.

This uses text_poke_multibyte() which doesn't support modifying code on the NMI/MCE handler. However, since kprobes itself doesn't support NMI/MCE code probing, this is not a problem.

Changes in v9:
- Use *_text_reserved() for checking whether the probe can be optimized.
- Verify that the jump address range is in the 2G range when preparing the slot.
- Back up the original code when switching to the optimized buffer, instead of when preparing the buffer, because there can be int3s from other probes in the preparing phase.
- Check that the kprobe is disabled in arch_check_optimized_kprobe().
- Strictly check indirect jump opcodes (ff /4, ff /5).

Changes in v6:
- Split the stop_machine-based jump patching code.
- Update comments and coding style.

Changes in v5:
- Introduce stop_machine-based jump replacing.

Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
Cc: systemtap <systemtap@sources.redhat.com>
Cc: DLE <dle-develop@lists.sourceforge.net>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Jim Keniston <jkenisto@us.ibm.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Anders Kaseorg <andersk@ksplice.com>
Cc: Tim Abbott <tabbott@ksplice.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Mathieu Desnoyers <compudj@krystal.dyndns.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
LKML-Reference: <20100225133446.6725.78994.stgit@localhost6.localdomain6>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
Masami Hiramatsu authored
Add generic text_poke_smp for SMP, which uses stop_machine() to synchronize modifying code. This stop_machine() method is officially described in section 7.1.3, "Handling Self- and Cross-Modifying Code", of Intel's Software Developer's Manual volume 3A.

Since stop_machine() can't protect code against NMI/MCE, this function cannot modify those handlers. Also, this function is basically for modifying a single multibyte instruction; for modifying multiple multibyte instructions, we need another special trap & detour code.

This code originally comes from the stop_machine() version of the immediate values patches. Thanks Jason and Mathieu!

Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
Cc: systemtap <systemtap@sources.redhat.com>
Cc: DLE <dle-develop@lists.sourceforge.net>
Cc: Mathieu Desnoyers <compudj@krystal.dyndns.org>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Jim Keniston <jkenisto@us.ibm.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Anders Kaseorg <andersk@ksplice.com>
Cc: Tim Abbott <tabbott@ksplice.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
LKML-Reference: <20100225133438.6725.80273.stgit@localhost6.localdomain6>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
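A minimal sketch of the stop_machine()-based flow described (struct and function names are illustrative):

struct text_poke_param {
        void *addr;
        const void *opcode;
        size_t len;
};

static int __text_poke_cb(void *data)
{
        struct text_poke_param *p = data;

        /* all other CPUs are spinning inside stop_machine(), so patching is safe */
        text_poke(p->addr, p->opcode, p->len);
        return 0;
}

void *text_poke_smp_sketch(void *addr, const void *opcode, size_t len)
{
        struct text_poke_param p = { addr, opcode, len };

        stop_machine(__text_poke_cb, &p, NULL);
        return addr;
}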
-
Masami Hiramatsu authored
Change RELATIVEJUMP_INSTRUCTION macro to RELATIVEJUMP_OPCODE since it represents just the opcode byte.

Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: systemtap <systemtap@sources.redhat.com>
Cc: DLE <dle-develop@lists.sourceforge.net>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Jim Keniston <jkenisto@us.ibm.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Anders Kaseorg <andersk@ksplice.com>
Cc: Tim Abbott <tabbott@ksplice.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Mathieu Desnoyers <compudj@krystal.dyndns.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
LKML-Reference: <20100225133349.6725.99302.stgit@localhost6.localdomain6>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
Ian Campbell authored
Distros generally (I looked at Debian, RHEL5 and SLES11) seem to enable CONFIG_HIGHPTE for any x86 configuration which has highmem enabled. This means that the overhead applies even to machines which have a fairly modest amount of high memory and which therefore do not really benefit from allocating PTEs in high memory but still pay the price of the additional mapping operations.

Running kernbench on a 4G box I found that with CONFIG_HIGHPTE=y but no actual highptes being allocated there was a reduction in system time used from 59.737s to 55.9s.

With CONFIG_HIGHPTE=y and highmem PTEs being allocated:
Average Optimal load -j 4 Run (std deviation):
Elapsed Time 175.396 (0.238914)
User Time 515.983 (5.85019)
System Time 59.737 (1.26727)
Percent CPU 263.8 (71.6796)
Context Switches 39989.7 (4672.64)
Sleeps 42617.7 (246.307)

With CONFIG_HIGHPTE=y but with no highmem PTEs being allocated:
Average Optimal load -j 4 Run (std deviation):
Elapsed Time 174.278 (0.831968)
User Time 515.659 (6.07012)
System Time 55.9 (1.07799)
Percent CPU 263.8 (71.266)
Context Switches 39929.6 (4485.13)
Sleeps 42583.7 (373.039)

This patch allows the user to control the allocation of PTEs in highmem from the command line ("userpte=nohigh") but retains the status quo as the default.

It is possible that some simple heuristic could be developed which allows auto-tuning of this option, however I don't have a sufficiently large machine available to me to perform any particularly meaningful experiments. We could probably handwave up an argument for a threshold at 16G of total RAM.

Assuming 768M of lowmem we have 196608 potential lowmem PTE pages. Each page can map 2M of RAM in a PAE-enabled configuration, meaning a maximum of 384G of RAM could potentially be mapped using lowmem PTEs. Even allowing a generous factor of 10 to account for other required lowmem allocations, generous slop to account for page sharing (which reduces the total amount of RAM mappable by a given number of PT pages) and other inaccuracies in the estimations, it would seem that even a 32G machine would not have a particularly pressing need for highmem PTEs. I think 32G could be considered to be at the upper bound of what might be sensible on a 32 bit machine (although I think in practice 64G is still supported). It seems questionable whether HIGHPTE is even a win for any amount of RAM you would sensibly run a 32 bit kernel on rather than going 64 bit.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
LKML-Reference: <1266403090-20162-1-git-send-email-ian.campbell@citrix.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
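A sketch of the command-line plumbing, assuming the option is parsed with early_param() and simply drops __GFP_HIGHMEM from the gfp mask used for user PTE pages (the variable name and default flags are assumptions):

static gfp_t userpte_gfp = GFP_KERNEL | __GFP_ZERO | __GFP_HIGHMEM;

static int __init setup_userpte(char *arg)
{
        if (!arg)
                return -EINVAL;

        /* "userpte=nohigh" keeps user page-table pages in lowmem */
        if (!strcmp(arg, "nohigh"))
                userpte_gfp &= ~__GFP_HIGHMEM;
        else
                return -EINVAL;
        return 0;
}
early_param("userpte", setup_userpte);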
-
- 24 Feb, 2010 5 commits
-
-
Yinghai Lu authored
nr_legacy_irqs and its ilk have moved to legacy_pic.

-v2: there is one in ioapic_.c

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <4B84AAC4.2020204@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
-
Jacob Pan authored
Moorestown platform does not have PIT or HPET platform timers. Instead it has a bank of eight APB timers. The number of timers available to the OS is exposed via SFI mtmr tables. All APB timer interrupts are routed via ioapic rtes and delivered as MSI. Currently, we use timers 0 and 1 for per-cpu clockevent devices, and timer 2 for the clocksource.

Signed-off-by: Jacob Pan <jacob.jun.pan@intel.com>
LKML-Reference: <43F901BD926A4E43B106BF17856F0755A318D2D2@orsmsx508.amr.corp.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
-
Feng Tang authored
vRTC information is obtained from SFI tables on Moorestown; this patch parses these tables and assigns the information.

Signed-off-by: Feng Tang <feng.tang@intel.com>
LKML-Reference: <43F901BD926A4E43B106BF17856F07559FB80D0D@orsmsx508.amr.corp.intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
-
Jacob Pan authored
Moorestown platform timer information is obtained from SFI FW tables. This patch parses the SFI table and then assigns the IRQ information to mp_irqs.

Signed-off-by: Jacob Pan <jacob.jun.pan@intel.com>
LKML-Reference: <43F901BD926A4E43B106BF17856F07559FB80D0B@orsmsx508.amr.corp.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
-
Jacob Pan authored
This patch adds Moorestown platform-specific PCI init functions.

Signed-off-by: Jacob Pan <jacob.jun.pan@intel.com>
LKML-Reference: <43F901BD926A4E43B106BF17856F07559FB80D0A@orsmsx508.amr.corp.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
-