1. 21 May, 2010 1 commit
    • IB/core: Allow device-specific per-port sysfs files · 9a6edb60
      Ralph Campbell authored
      
      Add a new parameter to ib_register_device() so that low-level device
      drivers can pass in a pointer to a callback function that will be
      called for each port that is registered in sysfs.  This allows
      low-level device drivers to create files in
      
          /sys/class/infiniband/<hca>/ports/<N>/
      
      without having to poke through the internals of the RDMA sysfs handling.
      
      There is no need for an unregister function since the kobject
      reference will go to zero when ib_unregister_device() is called.
      Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
  2. 30 Mar, 2010 1 commit
    • include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h · 5a0e3ad6
      Tejun Heo authored
      
      percpu.h is included by sched.h and module.h and thus ends up being
      included when building most .c files.  percpu.h includes slab.h which
      in turn includes gfp.h making everything defined by the two files
      universally available and complicating inclusion dependencies.
      
      The percpu.h -> slab.h dependency is about to be removed.  Prepare for
      this change by updating users of gfp and slab facilities to include
      those headers directly instead of assuming availability.  As this
      conversion needs to touch a large number of source files, the following
      script is used as the basis of conversion.
      
        http://userweb.kernel.org/~tj/misc/slabh-sweep.py
      
      
      
      The script does the following.
      
      * Scan files for gfp and slab usages and update includes such that
        only the necessary includes are there.  ie. if only gfp is used,
        gfp.h, if slab is used, slab.h.
      
      * When the script inserts a new include, it looks at the include
        blocks and tries to put the new include such that its order conforms
        to its surrounding.  It's put in the include block which contains
        core kernel includes, in the same order that the rest are ordered -
        alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
        doesn't seem to be any matching order.
      
      * If the script can't find a place to put a new include (mostly
        because the file doesn't have fitting include block), it prints out
        an error message indicating which .h file needs to be added to the
        file.
      
      The conversion was done in the following steps.
      
      1. The initial automatic conversion of all .c files updated slightly
         over 4000 files, deleting around 700 includes and adding ~480 gfp.h
         and ~3000 slab.h inclusions.  The script emitted errors for ~400
         files.
      
      2. Each error was manually checked.  Some didn't need the inclusion,
         some needed manual addition while adding it to implementation .h or
         embedding .c file was more appropriate for others.  This step added
         inclusions to around 150 files.
      
      3. The script was run again and the output was compared to the edits
         from #2 to make sure no file was left behind.
      
      4. Several build tests were done and a couple of problems were fixed.
         e.g. lib/decompress_*.c used malloc/free() wrappers around slab
         APIs requiring slab.h to be added manually.
      
      5. The script was run on all .h files but without automatically
         editing them as sprinkling gfp.h and slab.h inclusions around .h
         files could easily lead to inclusion dependency hell.  Most gfp.h
         inclusion directives were ignored as stuff from gfp.h was usually
         widely available and often used in preprocessor macros.  Each
         slab.h inclusion directive was examined and added manually as
         necessary.
      
      6. percpu.h was updated not to include slab.h.
      
      7. Build tests were done on the following configurations and failures
         were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
         distributed build env didn't work with gcov compiles) and a few
         more options had to be turned off depending on archs to make things
         build (like ipr on powerpc/64 which failed due to missing writeq).
      
         * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
         * powerpc and powerpc64 SMP allmodconfig
         * sparc and sparc64 SMP allmodconfig
         * ia64 SMP allmodconfig
         * s390 SMP allmodconfig
         * alpha SMP allmodconfig
         * um on x86_64 SMP allmodconfig
      
      8. percpu.h modifications were reverted so that it could be applied as
         a separate patch and serve as bisection point.
      
      Given the fact that I had only a couple of failures from the tests in
      step 7, I'm fairly confident about the coverage of this conversion
      patch.  If there is a breakage, it's likely to be something in one of
      the arch headers, which should be easily discoverable on most builds of
      the specific arch.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
  3. 05 Sep, 2009 1 commit
    • IB/mthca: Don't allow userspace open while recovering from catastrophic error · d8410647
      Jack Morgenstein authored
      
      Userspace apps are supposed to release all ib device resources if they
      receive a fatal async event (IBV_EVENT_DEVICE_FATAL).  However, the
      app has no way of knowing when the device has come back up, except to
      repeatedly attempt ibv_open_device() until it succeeds.
      
      However, currently there is no protection against the open succeeding
      while the device is being removed following the fatal event.  In
      this case, the open will succeed, but as a result the device waits in
      the middle of its removal until the new app releases its resources --
      and the new app will not do so, since the open succeeded at a point
      following the fatal event generation.
      
      This patch adds an "active" flag to the device. The active flag is set
      to false (in the fatal event flow) before the "fatal" event is
      generated, so any subsequent ibv_open_device() call to the device will
      fail until the device comes back up, thus preventing the above
      deadlock.
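
      The ordering described above can be sketched in plain userspace C.
      This is only an illustration under stated assumptions: struct
      sketch_dev, sketch_handle_fatal(), sketch_open() and
      sketch_recovered() are hypothetical names, the -EAGAIN return value is
      an assumption, and the locking a real driver would need is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device state; not the real mthca structure. */
struct sketch_dev {
    bool active;        /* false while the device is being torn down */
    int  fatal_events;  /* stands in for delivering the fatal async event */
    int  open_count;    /* number of successful opens */
};

/* Fatal-error path: clear the flag first, then report the event. */
static void sketch_handle_fatal(struct sketch_dev *dev)
{
    assert(dev != NULL);
    dev->active = false;
    dev->fatal_events++;
}

/* Open path: refuse new users while the device is not active. */
static int sketch_open(struct sketch_dev *dev)
{
    assert(dev != NULL);
    if (!dev->active)
        return -EAGAIN; /* assumed error code: caller should retry later */
    dev->open_count++;
    return 0;
}

/* Called once the device has come back up. */
static void sketch_recovered(struct sketch_dev *dev)
{
    assert(dev != NULL);
    dev->active = true;
}
```

      Because the flag is cleared before the event is raised, an application
      that reacts to the event by retrying its open keeps failing until
      sketch_recovered() runs, so it never pins a device that is still being
      removed.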
      Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
  4. 15 Jul, 2008 1 commit
  5. 29 Apr, 2008 2 commits
    • IB/mthca: Avoid changing userspace ABI to handle DMA write barrier attribute · baaad380
      Roland Dreier authored
      Commit cb9fbc5c ("IB: expand ib_umem_get() prototype") changed the
      mthca userspace ABI to provide a way for userspace to indicate which
      memory regions need the DMA write barrier attribute.  However, it is
      possible to handle this without breaking existing userspace, by having
      the mthca kernel driver recognize whether it is talking to old or new
      userspace, depending on the size of the register MR structure passed in.
      
      The only potential drawback of this is that it allows old userspace
      (which has a bug with DMA ordering on large SGI Altix systems) to
      continue to run on new kernels, but the advantage of allowing old
      userspace to continue to work on unaffected systems seems to outweigh
      this, and we can print a warning to push people to upgrade their
      userspace.
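
      A minimal userspace sketch of the size-based detection, under stated
      assumptions: struct sketch_reg_mr_cmd, SKETCH_MR_DMASYNC and
      sketch_parse_reg_mr() are hypothetical, and the real command layout
      and copy-from-user handling are not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical register-MR command that only new userspace passes in. */
struct sketch_reg_mr_cmd {
    uint32_t mr_attrs;
    uint32_t reserved;
};

#define SKETCH_MR_DMASYNC 0x1u  /* assumed flag value, for illustration */

/*
 * Decide from the length of the incoming buffer whether the caller is old
 * or new userspace.  A short buffer means old userspace: fall back to "no
 * attributes" and set *legacy so the caller can print a one-time warning.
 */
static int sketch_parse_reg_mr(const void *buf, size_t len,
                               struct sketch_reg_mr_cmd *cmd, int *legacy)
{
    assert(cmd != NULL && legacy != NULL);
    if (len < sizeof(*cmd)) {
        memset(cmd, 0, sizeof(*cmd));
        *legacy = 1;
        return 0;
    }
    memcpy(cmd, buf, sizeof(*cmd));
    *legacy = 0;
    return 0;
}
```

      Old binaries keep working because the short buffer is accepted, while
      new binaries can set the DMA write barrier attribute explicitly.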
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
    • IB: expand ib_umem_get() prototype · cb9fbc5c
      Arthur Kepner authored
      
      Add a new parameter, dmasync, to the ib_umem_get() prototype.  Use dmasync = 1
      when mapping user-allocated CQs with ib_umem_get().
      Signed-off-by: Arthur Kepner <akepner@sgi.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
      Cc: Jes Sorensen <jes@sgi.com>
      Cc: Randy Dunlap <randy.dunlap@oracle.com>
      Cc: Roland Dreier <rdreier@cisco.com>
      Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
      Cc: David Miller <davem@davemloft.net>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Grant Grundler <grundler@parisc-linux.org>
      Cc: Michael Ellerman <michael@ellerman.id.au>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  6. 19 Apr, 2008 1 commit
  7. 17 Apr, 2008 2 commits
  8. 04 Feb, 2008 2 commits
    • IB/mthca: Don't read reserved fields in mthca_QUERY_ADAPTER() · 6ccef1de
      Jack Morgenstein authored
      
      For memfree devices, the firmware QUERY_ADAPTER command does not
      return vendor_id, device_id, and revision_id; do not return these
      fields in the QUERY_ADAPTER function for memfree devices.
      
      Instead, for memfree devices, initialize the rev_id field of the mthca
      device via init_node_data (MAD IFC query), as is done in the
      query_device verb implementation.
      Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
    • IB/mthca: Fix and simplify page size calculation in mthca_reg_phys_mr() · 0d89fe2c
      Roland Dreier authored
          
      In mthca_reg_phys_mr(), we calculate the page size for the HCA
      hardware to use to map the buffer list passed in by the consumer.
      For example, if the consumer passes in
      
          [0] addr 0x1000, size 0x1000
          [1] addr 0x2000, size 0x1000
      
      then the algorithm would come up with a page size of 0x2000 and a list
      of two pages, at 0x0000 and 0x2000.  Usually, this would work fine
      since the memory region would start at an offset of 0x1000 and have a
      length of 0x2000.
      
      However, the old code did not take into account the alignment of the
      IO virtual address passed in.  For example, if the consumer passed in
      a virtual address of 0x6000 for the above, then the offset of 0x1000
      would not be used correctly because the page mask of 0x1fff would
      result in an offset of 0.
      
      We can fix this quite neatly by making sure that the page shift we use
      is no bigger than the first bit where the start of the first buffer
      and the IO virtual address differ.  Also, we can further simplify the
      code by removing the special case for a single buffer by noticing that
      it doesn't matter if we use a page size that is too big.  This allows
      the loop to compute the page shift to be replaced with __ffs().
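
      The idea can be sketched in userspace C as follows.  This is a sketch
      under stated assumptions, not the driver code: struct sketch_phys_buf,
      sketch_lowest_set_bit() and sketch_page_shift() are hypothetical
      names, __builtin_ctzll() (a GCC/Clang builtin) stands in for the
      kernel's __ffs(), folding the interior buffer boundaries into the mask
      is an assumption about how the remaining alignment constraints are
      handled, and the cap at bit 31 is an arbitrary choice so that a single
      perfectly aligned buffer still yields a finite shift.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical buffer descriptor: physical address and size in bytes. */
struct sketch_phys_buf {
    uint64_t addr;
    uint64_t size;
};

/* Index of the lowest set bit; stands in for the kernel's __ffs(). */
static unsigned sketch_lowest_set_bit(uint64_t x)
{
    assert(x != 0);
    return (unsigned)__builtin_ctzll(x);
}

static unsigned sketch_page_shift(const struct sketch_phys_buf *bufs,
                                  size_t n, uint64_t iova)
{
    uint64_t mask;
    size_t i;

    assert(bufs != NULL && n > 0);

    /* Bits where the first buffer's start and the IO virtual address differ. */
    mask = bufs[0].addr ^ iova;

    /* Interior boundaries (every start but the first, every end but the
     * last) must fall on a page boundary, so fold them in as well. */
    for (i = 0; i < n; ++i) {
        if (i != 0)
            mask |= bufs[i].addr;
        if (i != n - 1)
            mask |= bufs[i].addr + bufs[i].size;
    }

    /* The lowest set bit bounds the usable page shift; cap it at bit 31. */
    return sketch_lowest_set_bit(mask | (UINT64_C(1) << 31));
}
```

      For the two-buffer list above, an IO virtual address of 0x1000 gives a
      shift of 13 (page size 0x2000), while an IO virtual address of 0x6000
      gives a shift of 12 (page size 0x1000), so the 0x1000 offset is
      preserved in both cases.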
      
      Thanks to Bryan S Rosenburg <rosnbrg@us.ibm.com> for pointing out the
      original bug and suggesting several ways to improve this patch.
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
  9. 08 May, 2007 1 commit
    • IB/uverbs: Export ib_umem_get()/ib_umem_release() to modules · f7c6a7b5
      Roland Dreier authored
      
      Export ib_umem_get()/ib_umem_release() and put low-level drivers in
      control of when to call ib_umem_get() to pin and DMA map userspace,
      rather than always calling it in ib_uverbs_reg_mr() before calling the
      low-level driver's reg_user_mr method.
      
      Also move these functions to be in the ib_core module instead of
      ib_uverbs, so that driver modules using them do not depend on
      ib_uverbs.
      
      This has a number of advantages:
       - It is better design from the standpoint of making generic code a
         library that can be used or overridden by device-specific code as
         the details of specific devices dictate.
       - Drivers that do not need to pin userspace memory regions do not
         need to take the performance hit of calling ib_umem_get().  For
         example, although I have not tried to implement it in this patch,
         the ipath driver should be able to avoid pinning memory and just
         use copy_{to,from}_user() to access userspace memory regions.
       - Buffers that need special mapping treatment can be identified by
         the low-level driver.  For example, it may be possible to solve
         some Altix-specific memory ordering issues with mthca CQs in
         userspace by mapping CQ buffers with extra flags.
       - Drivers that need to pin and DMA map userspace memory for things
         other than memory regions can use ib_umem_get() directly, instead
         of hacks using extra parameters to their reg_phys_mr method.  For
         example, the mlx4 driver that is pending being merged needs to pin
         and DMA map QP and CQ buffers, but it does not need to create a
         memory key for these buffers.  So the cleanest solution is for mlx4
         to call ib_umem_get() in the create_qp and create_cq methods.
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
  10. 07 May, 2007 1 commit
    • IB: Add CQ comp_vector support · f4fd0b22
      Michael S. Tsirkin authored
      
      Add a num_comp_vectors member to struct ib_device and extend
      ib_create_cq() to pass in a comp_vector parameter -- this parallels
      the userspace libibverbs API.  Update all hardware drivers to set
      num_comp_vectors to 1 and have all ULPs pass 0 for the comp_vector
      value.  Pass the value of num_comp_vectors to userspace rather than
      hard-coding a value of 1.
      
      We want multiple CQ event vector support (via MSI-X or similar for
      adapters that can generate multiple interrupts), but it's not clear
      how many vectors we want, or how we want to deal with policy issues
      such as how to decide which vector to use or how to set up interrupt
      affinity.  This patch is useful for experimenting, since no core
      changes will be necessary when updating a driver to support multiple
      vectors, and we know that we want to make at least these changes
      anyway.
      Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
  11. 25 Apr, 2007 1 commit
  12. 12 Feb, 2007 1 commit
  13. 30 Dec, 2006 1 commit
  14. 08 Dec, 2006 1 commit
    • [PATCH] LOG2: Implement a general integer log2 facility in the kernel · f0d1b0b3
      David Howells authored
      
      This facility provides three entry points:
      
      	ilog2()		Log base 2 of unsigned long
      	ilog2_u32()	Log base 2 of u32
      	ilog2_u64()	Log base 2 of u64
      
      These facilities can either be used inside functions on dynamic data:
      
      	int do_something(long x)
      	{
      		...;
      		y = ilog2(x);
      		...;
      	}
      
      Or can be used to statically initialise global variables with constant values:
      
      	unsigned n = ilog2(27);
      
      When performing static initialisation, the compiler will report "error:
      initializer element is not constant" if asked to take a log of zero or of
      something not reducible to a constant.  They treat negative numbers as
      unsigned.
      
      When not dealing with a constant, they fall back to using fls() which permits
      them to use arch-specific log calculation instructions - such as BSR on
      x86/x86_64 or SCAN on FRV - if available.
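
      The non-constant fallback can be sketched in userspace C as shown
      here.  This is a sketch under assumptions: sketch_fls() and
      sketch_ilog2() are hypothetical names, __builtin_clzl() (a GCC/Clang
      builtin) stands in for an arch-specific fls(), and the constant-folding
      path used for static initialisers is not reproduced.

```c
#include <assert.h>
#include <limits.h>

/* 1-based index of the highest set bit, or 0 when x is 0 (fls-style). */
static int sketch_fls(unsigned long x)
{
    if (x == 0)
        return 0;
    return (int)(sizeof(x) * CHAR_BIT) - __builtin_clzl(x);
}

/* Integer log base 2, rounded down; the log of zero is not defined. */
static int sketch_ilog2(unsigned long n)
{
    assert(n != 0);
    return sketch_fls(n) - 1;
}
```

      With this definition sketch_ilog2(27) evaluates to 4, since 27 lies
      between 16 and 32.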
      
      [akpm@osdl.org: MMC fix]
      Signed-off-by: David Howells <dhowells@redhat.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Wojtek Kaniewski <wojtekka@toxygen.net>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  15. 29 Nov, 2006 1 commit
  16. 10 Oct, 2006 1 commit
  17. 22 Sep, 2006 1 commit
  18. 18 Aug, 2006 1 commit
    • IB/mthca: No userspace SRQs if HCA doesn't have SRQ support · 5beba532
      Roland Dreier authored
      
      Leave all SRQ methods out of the device's uverbs_cmd_mask if the
      device doesn't have SRQ support (because of ancient firmware) so that
      we don't allow userspace to call the driver's create_srq method.  This
      fixes a userspace-triggerable oops caused by ib_uverbs_create_srq()
      following the device's ->create_srq function pointer, which will be
      NULL if the device doesn't support SRQs.
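
      The gating idea can be sketched in userspace C.  All names here
      (enum sketch_cmd, sketch_build_cmd_mask(), sketch_cmd_allowed()) are
      hypothetical, and the command numbering is invented for illustration;
      the point is only that a command whose bit is absent from the mask is
      rejected before any method pointer is followed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical command numbers, not the real uverbs values. */
enum sketch_cmd {
    SKETCH_CMD_CREATE_CQ = 0,
    SKETCH_CMD_CREATE_SRQ,
    SKETCH_CMD_MODIFY_SRQ,
    SKETCH_CMD_DESTROY_SRQ,
    SKETCH_CMD_COUNT
};

/* Build the mask of commands userspace may issue for this device. */
static uint64_t sketch_build_cmd_mask(int has_srq)
{
    uint64_t mask = UINT64_C(1) << SKETCH_CMD_CREATE_CQ;

    if (has_srq)
        mask |= (UINT64_C(1) << SKETCH_CMD_CREATE_SRQ) |
                (UINT64_C(1) << SKETCH_CMD_MODIFY_SRQ) |
                (UINT64_C(1) << SKETCH_CMD_DESTROY_SRQ);
    return mask;
}

/* Checked before dispatch, so a missing method is never dereferenced. */
static int sketch_cmd_allowed(uint64_t mask, enum sketch_cmd cmd)
{
    assert(cmd >= 0 && cmd < SKETCH_CMD_COUNT);
    return (int)((mask >> cmd) & 1);
}
```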
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
  19. 17 Jun, 2006 2 commits
  20. 01 May, 2006 1 commit
  21. 12 Apr, 2006 1 commit
    • IB/mthca: Fix max_srq_sge returned by ib_query_device for Tavor devices · 59fef3b1
      Jack Morgenstein authored
      
      The driver allocates SRQ WQEs with a power-of-2 size both for
      Tavor and for memfree. For Tavor, however, the hardware only requires
      the WQE size to be a multiple of 16, not a power of 2, and the max
      number of scatter-gather allowed is reported accordingly by the
      firmware (and this is the value currently returned by
      ib_query_device() and ibv_query_device()).
      
      If the max number of scatter/gather entries reported by the FW is used
      when creating an SRQ, the creation will fail for Tavor, since the
      required WQE size will be increased to the next power of 2, which
      turns out to be larger than the device permitted max WQE size (which
      is not a power of 2).
      
      This patch reduces the reported SRQ max wqe size so that it can be used
      successfully in creating an SRQ on Tavor HCAs.
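
      The arithmetic behind the reduced limit can be sketched in userspace
      C.  This is a sketch under assumptions: the function names are
      hypothetical, and the 16-byte sizes for the per-WQE header segment and
      for each scatter/gather entry are assumed values chosen for
      illustration.

```c
#include <assert.h>

#define SKETCH_NEXT_SEG_SIZE 16 /* assumed size of the per-WQE header segment */
#define SKETCH_DATA_SEG_SIZE 16 /* assumed size of one scatter/gather entry */

/* Largest power of 2 that is less than or equal to x. */
static int sketch_round_down_pow2(int x)
{
    int p = 1;

    assert(x > 0);
    while (p <= x / 2)
        p *= 2;
    return p;
}

/*
 * Largest scatter/gather count whose WQE still fits once the WQE size has
 * been rounded up to a power of 2, never exceeding the firmware's value.
 */
static int sketch_max_srq_sge(int fw_max_sg, int max_desc_sz)
{
    int fit;

    assert(max_desc_sz >= SKETCH_NEXT_SEG_SIZE);
    fit = (sketch_round_down_pow2(max_desc_sz) - SKETCH_NEXT_SEG_SIZE) /
          SKETCH_DATA_SEG_SIZE;
    return fit < fw_max_sg ? fit : fw_max_sg;
}
```

      For example, with a maximum WQE size of 496 bytes and a firmware limit
      of 30 entries, the largest usable power-of-2 WQE is 256 bytes, so the
      sketch reports 15 entries instead of 30.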
      Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
  22. 20 Mar, 2006 5 commits
  23. 30 Jan, 2006 1 commit
  24. 12 Jan, 2006 1 commit
  25. 10 Jan, 2006 1 commit
  26. 09 Jan, 2006 3 commits
  27. 10 Nov, 2005 1 commit
  28. 05 Nov, 2005 1 commit
  29. 03 Nov, 2005 1 commit
  30. 27 Oct, 2005 1 commit
    • [IB] mthca: first pass at catastrophic error reporting · 3d155f8c
      Roland Dreier authored
      
      Add some initial support for detecting and reporting catastrophic
      errors reported by Mellanox HCAs.  We start a periodic timer which
      polls the catastrophic error reporting buffer in device memory.  If an
      error is detected, we dump the contents of the buffer for post-mortem
      debugging, and report a fatal asynchronous error to higher levels.
      
      In the future we can try to recover from these errors by resetting the
      device, but this will require some work in higher-level code as well.
      Let's get this in now, so that we at least get catastrophic errors
      reported in logs.
      Signed-off-by: Roland Dreier <rolandd@cisco.com>