1. 30 Jan, 2009 1 commit
    • Steven Rostedt's avatar
      generic-ipi: use per cpu data for single cpu ipi calls · d7240b98
      Steven Rostedt authored
      
      The smp_call_function can be passed a wait parameter telling it to
      wait for all the functions running on other CPUs to complete before
      returning, or to return without waiting. Unfortunately, this is
      currently just a suggestion and not manditory. That is, the
      smp_call_function can decide not to return and wait instead.
      
      The reason for this is because it uses kmalloc to allocate storage
      to send to the called CPU and that CPU will free it when it is done.
      But if we fail to allocate the storage, the stack is used instead.
      This means we must wait for the called CPU to finish before
      continuing.
      
      Unfortunatly, some callers do no abide by this hint and act as if
      the non-wait option is mandatory. The MTRR code for instance will
      deadlock if the smp_call_function is set to wait. This is because
      the smp_call_function will wait for the other CPUs to finish their
      called functions, but those functions are waiting on the caller to
      continue.
      
      This patch changes the generic smp_call_function code to use per cpu
      variables if the allocation of the data fails for a single CPU call. The
      smp_call_function_many will fall back to the smp_call_function_single
      if it fails its alloc. The smp_call_function_single is modified
      to not force the wait state.
      
      Since we now are using a single data per cpu we must synchronize the
      callers to prevent a second caller modifying the data before the
      first called IPI functions complete. To do so, I added a flag to
      the call_single_data called CSD_FLAG_LOCK. When the single CPU is
      called (which can be called when a many call fails an alloc), we
      set the LOCK bit on this per cpu data. When the caller finishes
      it clears the LOCK bit.
      
      The caller must wait till the LOCK bit is cleared before setting
      it. When it is cleared, there is no IPI function using it.
      Signed-off-by: default avatarSteven Rostedt <srostedt@redhat.com>
      Signed-off-by: default avatarPeter Zijlstra <a.p.zijlstra@chello.nl>
      Acked-by: default avatarJens Axboe <jens.axboe@oracle.com>
      Acked-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      d7240b98
  2. 31 Dec, 2008 1 commit
  3. 29 Dec, 2008 2 commits
    • Rusty Russell's avatar
      cpumask: arch_send_call_function_ipi_mask: core · ce47d974
      Rusty Russell authored
      
      Impact: new API to reduce stack usage
      
      We're weaning the core code off handing cpumask's around on-stack.
      This introduces arch_send_call_function_ipi_mask().
      Signed-off-by: default avatarRusty Russell <rusty@rustcorp.com.au>
      ce47d974
    • Rusty Russell's avatar
      cpumask: smp_call_function_many() · 54b11e6d
      Rusty Russell authored
      
      Impact: Implementation change to remove cpumask_t from stack.
      
      Actually change smp_call_function_mask() to smp_call_function_many().
      We avoid cpumasks on the stack in this version.
      
      (S390 has its own version, but that's going away apparently).
      
      We have to do some dancing to figure out if 0 or 1 other cpus are in
      the mask supplied and the online mask without allocating a tmp
      cpumask.  It's still fairly cheap.
      
      We allocate the cpumask at the end of the call_function_data
      structure: if allocation fails we fallback to smp_call_function_single
      rather than using the baroque quiescing code (which needs a cpumask on
      stack).
      
      (Thanks to Hiroshi Shimamoto for spotting several bugs in previous versions!)
      Signed-off-by: default avatarRusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: default avatarMike Travis <travis@sgi.com>
      Cc: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
      Cc: npiggin@suse.de
      Cc: axboe@kernel.dk
      54b11e6d
  4. 06 Nov, 2008 1 commit
    • Suresh Siddha's avatar
      generic-ipi: fix the smp_mb() placement · 561920a0
      Suresh Siddha authored
      
      smp_mb() is needed (to make the memory operations visible globally) before
      sending the ipi on the sender and the receiver (on Alpha atleast) needs
      smp_read_barrier_depends() in the handler before reading the call_single_queue
      list in a lock-free fashion.
      
      On x86, x2apic mode register accesses for sending IPI's don't have serializing
      semantics. So the need for smp_mb() before sending the IPI becomes more
      critical in x2apic mode.
      
      Remove the unnecessary smp_mb() in csd_flag_wait(), as the presence of that
      smp_mb() doesn't mean anything on the sender, when the ipi receiver is not
      doing any thing special (like memory fence) after clearing the CSD_FLAG_WAIT.
      Signed-off-by: default avatarSuresh Siddha <suresh.b.siddha@intel.com>
      Signed-off-by: default avatarJens Axboe <jens.axboe@oracle.com>
      561920a0
  5. 25 Aug, 2008 1 commit
  6. 12 Aug, 2008 1 commit
    • Nick Piggin's avatar
      generic-ipi: fix stack and rcu interaction bug in smp_call_function_mask(), fix · c2fc1198
      Nick Piggin authored
      
      > > Nick Piggin (1):
      > >       generic-ipi: fix stack and rcu interaction bug in
      > > smp_call_function_mask()
      >
      > I'm still not 100% sure that I have this patch right... I might have seen
      > a lockup trace implicating the smp call function path... which may have
      > been due to some other problem or a different bug in the new call function
      > code, but if some more people can take a look at it before merging?
      
      OK indeed it did have a couple of bugs. Firstly, I wasn't freeing the
      data properly in the alloc && wait case. Secondly, I wasn't resetting
      CSD_FLAG_WAIT in the for each cpu loop (so only the first CPU would
      wait).
      
      After those fixes, the patch boots and runs with the kmalloc commented
      out (so it always executes the slowpath).
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      c2fc1198
  7. 11 Aug, 2008 1 commit
    • Nick Piggin's avatar
      generic-ipi: fix stack and rcu interaction bug in smp_call_function_mask() · cc7a486c
      Nick Piggin authored
      * Venki Pallipadi <venkatesh.pallipadi@intel.com> wrote:
      
      > Found a OOPS on a big SMP box during an overnight reboot test with
      > upstream git.
      >
      > Suresh and I looked at the oops and looks like the root cause is in
      > generic_smp_call_function_interrupt() and smp_call_function_mask() with
      > wait parameter.
      >
      > The actual oops looked like
      >
      > [   11.277260] BUG: unable to handle kernel paging request at ffff8802ffffffff
      > [   11.277815] IP: [<ffff8802ffffffff>] 0xffff8802ffffffff
      > [   11.278155] PGD 202063 PUD 0
      > [   11.278576] Oops: 0010 [1] SMP
      > [   11.279006] CPU 5
      > [   11.279336] Modules linked in:
      > [   11.279752] Pid: 0, comm: swapper Not tainted 2.6.27-rc2-00020-g685d87f7
      
       #290
      > [   11.280039] RIP: 0010:[<ffff8802ffffffff>]  [<ffff8802ffffffff>] 0xffff8802ffffffff
      > [   11.280692] RSP: 0018:ffff88027f1f7f70  EFLAGS: 00010086
      > [   11.280976] RAX: 00000000ffffffff RBX: 0000000000000000 RCX: 0000000000000000
      > [   11.281264] RDX: 0000000000004f4e RSI: 0000000000000001 RDI: 0000000000000000
      > [   11.281624] RBP: ffff88027f1f7f98 R08: 0000000000000001 R09: ffffffff802509af
      > [   11.281925] R10: ffff8800280c2780 R11: 0000000000000000 R12: ffff88027f097d48
      > [   11.282214] R13: ffff88027f097d70 R14: 0000000000000005 R15: ffff88027e571000
      > [   11.282502] FS:  0000000000000000(0000) GS:ffff88027f1c3340(0000) knlGS:0000000000000000
      > [   11.283096] CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
      > [   11.283382] CR2: ffff8802ffffffff CR3: 0000000000201000 CR4: 00000000000006e0
      > [   11.283760] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      > [   11.284048] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      > [   11.284337] Process swapper (pid: 0, threadinfo ffff88027f1f2000, task ffff88027f1f0640)
      > [   11.284936] Stack:  ffffffff80250963 0000000000000212 0000000000ee8c78 0000000000ee8a66
      > [   11.285802]  ffff88027e571550 ffff88027f1f7fa8 ffffffff8021adb5 ffff88027f1f3e40
      > [   11.286599]  ffffffff8020bdd6 ffff88027f1f3e40 <EOI>  ffff88027f1f3ef8 0000000000000000
      > [   11.287120] Call Trace:
      > [   11.287768]  <IRQ>  [<ffffffff80250963>] ? generic_smp_call_function_interrupt+0x61/0x12c
      > [   11.288354]  [<ffffffff8021adb5>] smp_call_function_interrupt+0x17/0x27
      > [   11.288744]  [<ffffffff8020bdd6>] call_function_interrupt+0x66/0x70
      > [   11.289030]  <EOI>  [<ffffffff8024ab3b>] ? clockevents_notify+0x19/0x73
      > [   11.289380]  [<ffffffff803b9b75>] ? acpi_idle_enter_simple+0x18b/0x1fa
      > [   11.289760]  [<ffffffff803b9b6b>] ? acpi_idle_enter_simple+0x181/0x1fa
      > [   11.290051]  [<ffffffff8053aeca>] ? cpuidle_idle_call+0x70/0xa2
      > [   11.290338]  [<ffffffff80209f61>] ? cpu_idle+0x5f/0x7d
      > [   11.290723]  [<ffffffff8060224a>] ? start_secondary+0x14d/0x152
      > [   11.291010]
      > [   11.291287]
      > [   11.291654] Code:  Bad RIP value.
      > [   11.292041] RIP  [<ffff8802ffffffff>] 0xffff8802ffffffff
      > [   11.292380]  RSP <ffff88027f1f7f70>
      > [   11.292741] CR2: ffff8802ffffffff
      > [   11.310951] ---[ end trace 137c54d525305f1c ]---
      >
      > The problem is with the following sequence of events:
      >
      > - CPU A calls smp_call_function_mask() for CPU B with wait parameter
      > - CPU A sets up the call_function_data on the stack and does an rcu add to
      >   call_function_queue
      > - CPU A waits until the WAIT flag is cleared
      > - CPU B gets the call function interrupt and starts going through the
      >   call_function_queue
      > - CPU C also gets some other call function interrupt and starts going through
      >   the call_function_queue
      > - CPU C, which is also going through the call_function_queue, starts referencing
      >   CPU A's stack, as that element is still in call_function_queue
      > - CPU B finishes the function call that CPU A set up and as there are no other
      >   references to it, rcu deletes the call_function_data (which was from CPU A
      >   stack)
      > - CPU B sees the wait flag and just clears the flag (no call_rcu to free)
      > - CPU A which was waiting on the flag continues executing and the stack
      >   contents change
      >
      > - CPU C is still in rcu_read section accessing the CPU A's stack sees
      >   inconsistent call_funation_data and can try to execute
      >   function with some random pointer, causing stack corruption for A
      >   (by clearing the bits in mask field) and oops.
      
      Nice debugging work.
      
      I'd suggest something like the attached (boot tested) patch as the simple
      fix for now.
      
      I expect the benefits from the less synchronized, multiple-in-flight-data
      global queue will still outweigh the costs of dynamic allocations. But
      if worst comes to worst then we just go back to a globally synchronous
      one-at-a-time implementation, but that would be pretty sad!
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      cc7a486c
  8. 26 Jul, 2008 1 commit
  9. 15 Jul, 2008 1 commit
  10. 27 Jun, 2008 1 commit
  11. 26 Jun, 2008 2 commits