1. 16 Jul, 2007 1 commit
    • Vasily Tarasov's avatar
      diskquota: 32bit quota tools on 64bit architectures · b716395e
      Vasily Tarasov authored
      
      OpenVZ Linux kernel team has discovered the problem with 32bit quota tools
      working on 64bit architectures.  In 2.6.10 kernel sys32_quotactl() function
      was replaced by sys_quotactl() with the comment "sys_quotactl seems to be
      32/64bit clean, enable it for 32bit" However this isn't right.  Look at
      if_dqblk structure:
      
      struct if_dqblk {
              __u64 dqb_bhardlimit;
              __u64 dqb_bsoftlimit;
              __u64 dqb_curspace;
              __u64 dqb_ihardlimit;
              __u64 dqb_isoftlimit;
              __u64 dqb_curinodes;
              __u64 dqb_btime;
              __u64 dqb_itime;
              __u32 dqb_valid;
      };
      
      For 32 bit quota tools sizeof(if_dqblk) == 0x44.
      But for 64 bit kernel its size is 0x48, 'cause of alignment!
      Thus we got a problem. Attached patch reintroduce sys32_quotactl() function,
      that handles this and related situations.
      
      [michal.k.k.piotrowski@gmail.com: build fix]
      [akpm@linux-foundation.org: Make it link with CONFIG_QUOTA=n]
      Signed-off-by: default avatarVasily Tarasov <vtaras@openvz.org>
      Cc: Andi Kleen <ak@suse.de>
      Cc: "Luck, Tony" <tony.luck@intel.com>
      Cc: Jan Kara <jack@ucw.cz>
      Cc: <linux-arch@vger.kernel.org>
      Signed-off-by: default avatarMichal Piotrowski <michal.k.k.piotrowski@gmail.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      b716395e
  2. 12 May, 2007 1 commit
  3. 11 May, 2007 3 commits
    • Davide Libenzi's avatar
      signal/timer/event: eventfd core · e1ad7468
      Davide Libenzi authored
      This is a very simple and light file descriptor, that can be used as event
      wait/dispatch by userspace (both wait and dispatch) and by the kernel
      (dispatch only).  It can be used instead of pipe(2) in all cases where those
      would simply be used to signal events.  Their kernel overhead is much lower
      than pipes, and they do not consume two fds.  When used in the kernel, it can
      offer an fd-bridge to enable, for example, functionalities like KAIO or
      syslets/threadlets to signal to an fd the completion of certain operations.
      But more in general, an eventfd can be used by the kernel to signal readiness,
      in a POSIX poll/select way, of interfaces that would otherwise be incompatible
      with it.  The API is:
      
      int eventfd(unsigned int count);
      
      The eventfd API accepts an initial "count" parameter, and returns an eventfd
      fd.  It supports poll(2) (POLLIN, POLLOUT, POLLERR), read(2) and write(2).
      
      The POLLIN flag is raised when the internal counter is greater than zero.
      
      The POLLOUT flag is raised when at least a value of "1" can be written to the
      internal counter.
      
      The POLLERR flag is raised when an overflow in the counter value is detected.
      
      The write(2) operation can never overflow the counter, since it blocks (unless
      O_NONBLOCK is set, in which case -EAGAIN is returned).
      
      But the eventfd_signal() function can do it, since it's supposed to not sleep
      during its operation.
      
      The read(2) function reads the __u64 counter value, and reset the internal
      value to zero.  If the value read is equal to (__u64) -1, an overflow happened
      on the internal counter (due to 2^64 eventfd_signal() posts that has never
      been retired - unlickely, but possible).
      
      The write(2) call writes an __u64 count value, and adds it to the current
      counter.  The eventfd fd supports O_NONBLOCK also.
      
      On the kernel side, we have:
      
      struct file *eventfd_fget(int fd);
      int eventfd_signal(struct file *file, unsigned int n);
      
      The eventfd_fget() should be called to get a struct file* from an eventfd fd
      (this is an fget() + check of f_op being an eventfd fops pointer).
      
      The kernel can then call eventfd_signal() every time it wants to post an event
      to userspace.  The eventfd_signal() function can be called from any context.
      An eventfd() simple test and bench is available here:
      
      http://www.xmailserver.org/eventfd-bench.c
      
      This is the eventfd-based version of pipetest-4 (pipe(2) based):
      
      http://www.xmailserver.org/pipetest-4.c
      
      
      
      Not that performance matters much in the eventfd case, but eventfd-bench
      shows almost as double as performance than pipetest-4.
      
      [akpm@linux-foundation.org: fix i386 build]
      [akpm@linux-foundation.org: add sys_eventfd to sys_ni.c]
      Signed-off-by: default avatarDavide Libenzi <davidel@xmailserver.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      e1ad7468
    • Davide Libenzi's avatar
      signal/timer/event: timerfd core · b215e283
      Davide Libenzi authored
      This patch introduces a new system call for timers events delivered though
      file descriptors.  This allows timer event to be used with standard POSIX
      poll(2), select(2) and read(2).  As a consequence of supporting the Linux
      f_op->poll subsystem, they can be used with epoll(2) too.
      
      The system call is defined as:
      
      int timerfd(int ufd, int clockid, int flags, const struct itimerspec *utmr);
      
      The "ufd" parameter allows for re-use (re-programming) of an existing timerfd
      w/out going through the close/open cycle (same as signalfd).  If "ufd" is -1,
      s new file descriptor will be created, otherwise the existing "ufd" will be
      re-programmed.
      
      The "clockid" parameter is either CLOCK_MONOTONIC or CLOCK_REALTIME.  The time
      specified in the "utmr->it_value" parameter is the expiry time for the timer.
      
      If the TFD_TIMER_ABSTIME flag is set in "flags", this is an absolute time,
      otherwise it's a relative time.
      
      If the time specified in the "utmr->it_interval" is not zero (.tv_sec == 0,
      tv_nsec == 0), this is the period at which the following ticks should be
      generated.
      
      The "utmr->it_interval" should be set to zero if only one tick is requested.
      Setting the "utmr->it_value" to zero will disable the timer, or will create a
      timerfd without the timer enabled.
      
      The function returns the new (or same, in case "ufd" is a valid timerfd
      descriptor) file, or -1 in case of error.
      
      As stated before, the timerfd file descriptor supports poll(2), select(2) and
      epoll(2).  When a timer event happened on the timerfd, a POLLIN mask will be
      returned.
      
      The read(2) call can be used, and it will return a u32 variable holding the
      number of "ticks" that happened on the interface since the last call to
      read(2).  The read(2) call supportes the O_NONBLOCK flag too, and EAGAIN will
      be returned if no ticks happened.
      
      A quick test program, shows timerfd working correctly on my amd64 box:
      
      http://www.xmailserver.org/timerfd-test.c
      
      
      
      [akpm@linux-foundation.org: add sys_timerfd to sys_ni.c]
      Signed-off-by: default avatarDavide Libenzi <davidel@xmailserver.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      b215e283
    • Davide Libenzi's avatar
      signal/timer/event: signalfd core · fba2afaa
      Davide Libenzi authored
      This patch series implements the new signalfd() system call.
      
      I took part of the original Linus code (and you know how badly it can be
      broken :), and I added even more breakage ;) Signals are fetched from the same
      signal queue used by the process, so signalfd will compete with standard
      kernel delivery in dequeue_signal().  If you want to reliably fetch signals on
      the signalfd file, you need to block them with sigprocmask(SIG_BLOCK).  This
      seems to be working fine on my Dual Opteron machine.  I made a quick test
      program for it:
      
      http://www.xmailserver.org/signafd-test.c
      
      
      
      The signalfd() system call implements signal delivery into a file descriptor
      receiver.  The signalfd file descriptor if created with the following API:
      
      int signalfd(int ufd, const sigset_t *mask, size_t masksize);
      
      The "ufd" parameter allows to change an existing signalfd sigmask, w/out going
      to close/create cycle (Linus idea).  Use "ufd" == -1 if you want a brand new
      signalfd file.
      
      The "mask" allows to specify the signal mask of signals that we are interested
      in.  The "masksize" parameter is the size of "mask".
      
      The signalfd fd supports the poll(2) and read(2) system calls.  The poll(2)
      will return POLLIN when signals are available to be dequeued.  As a direct
      consequence of supporting the Linux poll subsystem, the signalfd fd can use
      used together with epoll(2) too.
      
      The read(2) system call will return a "struct signalfd_siginfo" structure in
      the userspace supplied buffer.  The return value is the number of bytes copied
      in the supplied buffer, or -1 in case of error.  The read(2) call can also
      return 0, in case the sighand structure to which the signalfd was attached,
      has been orphaned.  The O_NONBLOCK flag is also supported, and read(2) will
      return -EAGAIN in case no signal is available.
      
      If the size of the buffer passed to read(2) is lower than sizeof(struct
      signalfd_siginfo), -EINVAL is returned.  A read from the signalfd can also
      return -ERESTARTSYS in case a signal hits the process.  The format of the
      struct signalfd_siginfo is, and the valid fields depends of the (->code &
      __SI_MASK) value, in the same way a struct siginfo would:
      
      struct signalfd_siginfo {
      	__u32 signo;	/* si_signo */
      	__s32 err;	/* si_errno */
      	__s32 code;	/* si_code */
      	__u32 pid;	/* si_pid */
      	__u32 uid;	/* si_uid */
      	__s32 fd;	/* si_fd */
      	__u32 tid;	/* si_fd */
      	__u32 band;	/* si_band */
      	__u32 overrun;	/* si_overrun */
      	__u32 trapno;	/* si_trapno */
      	__s32 status;	/* si_status */
      	__s32 svint;	/* si_int */
      	__u64 svptr;	/* si_ptr */
      	__u64 utime;	/* si_utime */
      	__u64 stime;	/* si_stime */
      	__u64 addr;	/* si_addr */
      };
      
      [akpm@linux-foundation.org: fix signalfd_copyinfo() on i386]
      Signed-off-by: default avatarDavide Libenzi <davidel@xmailserver.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      fba2afaa
  4. 03 Nov, 2006 1 commit
  5. 16 Oct, 2006 1 commit
  6. 30 Sep, 2006 1 commit
    • David Howells's avatar
      [PATCH] BLOCK: Make it possible to disable the block layer [try #6] · 9361401e
      David Howells authored
      Make it possible to disable the block layer.  Not all embedded devices require
      it, some can make do with just JFFS2, NFS, ramfs, etc - none of which require
      the block layer to be present.
      
      This patch does the following:
      
       (*) Introduces CONFIG_BLOCK to disable the block layer, buffering and blockdev
           support.
      
       (*) Adds dependencies on CONFIG_BLOCK to any configuration item that controls
           an item that uses the block layer.  This includes:
      
           (*) Block I/O tracing.
      
           (*) Disk partition code.
      
           (*) All filesystems that are block based, eg: Ext3, ReiserFS, ISOFS.
      
           (*) The SCSI layer.  As far as I can tell, even SCSI chardevs use the
           	 block layer to do scheduling.  Some drivers that use SCSI facilities -
           	 such as USB storage - end up disabled indirectly from this.
      
           (*) Various block-based device drivers, such as IDE and the old CDROM
           	 drivers.
      
           (*) MTD blockdev handling and FTL.
      
           (*) JF...
      9361401e
  7. 23 Jun, 2006 2 commits
    • Christoph Lameter's avatar
      [PATCH] sys_move_pages: 32bit support (i386, x86_64) · 1b2db9fb
      Christoph Lameter authored
      
      sys_move_pages() support for 32bit (i386 plus x86_64 compat layer)
      
      Add support for move_pages() on i386 and also add the compat functions
      necessary to run 32 bit binaries on x86_64.
      
      Add compat_sys_move_pages to the x86_64 32bit binary layer.  Note that it is
      not up to date so I added the missing pieces.  Not sure if this is done the
      right way.
      
      [akpm@osdl.org: compile fix]
      Signed-off-by: default avatarChristoph Lameter <clameter@sgi.com>
      Cc: Andi Kleen <ak@muc.de>
      Signed-off-by: default avatarAndrew Morton <akpm@osdl.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@osdl.org>
      1b2db9fb
    • Christoph Lameter's avatar
      [PATCH] page migration: sys_move_pages(): support moving of individual pages · 742755a1
      Christoph Lameter authored
      
      move_pages() is used to move individual pages of a process. The function can
      be used to determine the location of pages and to move them onto the desired
      node. move_pages() returns status information for each page.
      
      long move_pages(pid, number_of_pages_to_move,
      		addresses_of_pages[],
      		nodes[] or NULL,
      		status[],
      		flags);
      
      The addresses of pages is an array of void * pointing to the
      pages to be moved.
      
      The nodes array contains the node numbers that the pages should be moved
      to. If a NULL is passed instead of an array then no pages are moved but
      the status array is updated. The status request may be used to determine
      the page state before issuing another move_pages() to move pages.
      
      The status array will contain the state of all individual page migration
      attempts when the function terminates. The status array is only valid if
      move_pages() completed successfullly.
      
      Possible page states in status[]:
      
      0..MAX_NUMNODES	The page is now on the indicated node.
      
      -ENOENT		Page is not present
      
      -EACCES		Page is mapped by multiple processes and can only
      		be moved if MPOL_MF_MOVE_ALL is specified.
      
      -EPERM		The page has been mlocked by a process/driver and
      		cannot be moved.
      
      -EBUSY		Page is busy and cannot be moved. Try again later.
      
      -EFAULT		Invalid address (no VMA or zero page).
      
      -ENOMEM		Unable to allocate memory on target node.
      
      -EIO		Unable to write back page. The page must be written
      		back in order to move it since the page is dirty and the
      		filesystem does not provide a migration function that
      		would allow the moving of dirty pages.
      
      -EINVAL		A dirty page cannot be moved. The filesystem does not provide
      		a migration function and has no ability to write back pages.
      
      The flags parameter indicates what types of pages to move:
      
      MPOL_MF_MOVE	Move pages that are only mapped by the process.
      
      MPOL_MF_MOVE_ALL Also move pages that are mapped by multiple processes.
      		Requires sufficient capabilities.
      
      Possible return codes from move_pages()
      
      -ENOENT		No pages found that would require moving. All pages
      		are either already on the target node, not present, had an
      		invalid address or could not be moved because they were
      		mapped by multiple processes.
      
      -EINVAL		Flags other than MPOL_MF_MOVE(_ALL) specified or an attempt
      		to migrate pages in a kernel thread.
      
      -EPERM		MPOL_MF_MOVE_ALL specified without sufficient priviledges.
      		or an attempt to move a process belonging to another user.
      
      -EACCES		One of the target nodes is not allowed by the current cpuset.
      
      -ENODEV		One of the target nodes is not online.
      
      -ESRCH		Process does not exist.
      
      -E2BIG		Too many pages to move.
      
      -ENOMEM		Not enough memory to allocate control array.
      
      -EFAULT		Parameters could not be accessed.
      
      A test program for move_pages() may be found with the patches
      on ftp.kernel.org:/pub/linux/kernel/people/christoph/pmig/patches-2.6.17-rc4-mm3
      
      From: Christoph Lameter <clameter@sgi.com>
      
        Detailed results for sys_move_pages()
      
        Pass a pointer to an integer to get_new_page() that may be used to
        indicate where the completion status of a migration operation should be
        placed.  This allows sys_move_pags() to report back exactly what happened to
        each page.
      
        Wish there would be a better way to do this. Looks a bit hacky.
      Signed-off-by: default avatarChristoph Lameter <clameter@sgi.com>
      Cc: Hugh Dickins <hugh@veritas.com>
      Cc: Jes Sorensen <jes@trained-monkey.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Cc: Andi Kleen <ak@muc.de>
      Cc: Michael Kerrisk <mtk-manpages@gmx.net>
      Signed-off-by: default avatarAndrew Morton <akpm@osdl.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@osdl.org>
      742755a1
  8. 11 Apr, 2006 1 commit
  9. 27 Mar, 2006 1 commit
  10. 20 Feb, 2006 1 commit
  11. 08 Jan, 2006 4 commits
    • Matt Mackall's avatar
      [PATCH] Make vm86 support optional · 64ca9004
      Matt Mackall authored
      
      This adds an option to remove vm86 support under CONFIG_EMBEDDED.  Saves
      about 5k.
      
      This version eliminates most of the #ifdefs of the previous version and
      instead uses function stubs in vm86.h.  Also, release_vm86_irqs is moved
      from asm-i386/irq.h to a more appropriate home in vm86.h so that the stubs
      can live together.
      
      $ size vmlinux-baseline vmlinux-novm86
         text    data     bss     dec     hex filename
      2920821  523232  190652 3634705  377611 vmlinux-baseline
      2916268  523100  190492 3629860  376324 vmlinux-novm86
      Signed-off-by: default avatarMatt Mackall <mpm@selenic.com>
      Signed-off-by: default avatarAndrew Morton <akpm@osdl.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@osdl.org>
      64ca9004
    • Matt Mackall's avatar
      [PATCH] tiny: Make *[ug]id16 support optional · e585e470
      Matt Mackall authored
      
      Configurable 16-bit UID and friends support
      
      This allows turning off the legacy 16 bit UID interfaces on embedded platforms.
      
         text    data     bss     dec     hex filename
      3330172  529036  190556 4049764  3dcb64 vmlinux-baseline
      3328268  529040  190556 4047864  3dc3f8 vmlinux
      
      From: Adrian Bunk <bunk@stusta.de>
      
          UID16 was accidentially disabled for !EMBEDDED.
      Signed-off-by: default avatarMatt Mackall <mpm@selenic.com>
      Signed-off-by: default avatarAdrian Bunk <bunk@stusta.de>
      Signed-off-by: default avatarAndrew Morton <akpm@osdl.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@osdl.org>
      e585e470
    • Christoph Lameter's avatar
      [PATCH] Swap Migration V5: sys_migrate_pages interface · 39743889
      Christoph Lameter authored
      
      sys_migrate_pages implementation using swap based page migration
      
      This is the original API proposed by Ray Bryant in his posts during the first
      half of 2005 on linux-mm@kvack.org and linux-kernel@vger.kernel.org.
      
      The intent of sys_migrate is to migrate memory of a process.  A process may
      have migrated to another node.  Memory was allocated optimally for the prior
      context.  sys_migrate_pages allows to shift the memory to the new node.
      
      sys_migrate_pages is also useful if the processes available memory nodes have
      changed through cpuset operations to manually move the processes memory.  Paul
      Jackson is working on an automated mechanism that will allow an automatic
      migration if the cpuset of a process is changed.  However, a user may decide
      to manually control the migration.
      
      This implementation is put into the policy layer since it uses concepts and
      functions that are also needed for mbind and friends.  The patch also provides
      a do_migrate_pages function that may be useful for cpusets to automatically
      move memory.  sys_migrate_pages does not modify policies in contrast to Ray's
      implementation.
      
      The current code here is based on the swap based page migration capability and
      thus is not able to preserve the physical layout relative to it containing
      nodeset (which may be a cpuset).  When direct page migration becomes available
      then the implementation needs to be changed to do a isomorphic move of pages
      between different nodesets.  The current implementation simply evicts all
      pages in source nodeset that are not in the target nodeset.
      
      Patch supports ia64, i386 and x86_64.
      Signed-off-by: default avatarChristoph Lameter <clameter@sgi.com>
      Signed-off-by: default avatarAndrew Morton <akpm@osdl.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@osdl.org>
      39743889
    • Arnd Bergmann's avatar
      [PATCH] spufs: The SPU file system, base · 67207b96
      Arnd Bergmann authored
      This is the current version of the spu file system, used
      for driving SPEs on the Cell Broadband Engine.
      
      This release is almost identical to the version for the
      2.6.14 kernel posted earlier, which is available as part
      of the Cell BE Linux distribution from
      http://www.bsc.es/projects/deepcomputing/linuxoncell/
      
      .
      
      The first patch provides all the interfaces for running
      spu application, but does not have any support for
      debugging SPU tasks or for scheduling. Both these
      functionalities are added in the subsequent patches.
      
      See Documentation/filesystems/spufs.txt on how to use
      spufs.
      Signed-off-by: default avatarArnd Bergmann <arndb@de.ibm.com>
      Signed-off-by: default avatarPaul Mackerras <paulus@samba.org>
      67207b96
  12. 01 Aug, 2005 1 commit
    • Ingo Molnar's avatar
      [PATCH] remove sys_set_zone_reclaim() · 6cb54819
      Ingo Molnar authored
      This removes sys_set_zone_reclaim() for now.  While i'm sure Martin is
      trying to solve a real problem, we must not hard-code an incomplete and
      insufficient approach into a syscall, because syscalls are pretty much
      for eternity.  I am quite strongly convinced that this syscall must not
      hit v2.6.13 in its current form.
      
      Firstly, the syscall lacks basic syscall design: e.g. it allows the
      global setting of VM policy for unprivileged users. (!) [ Imagine an
      Oracle installation and a SAP installation on the same NUMA box fighting
      over the 'optimal' setting for this flag. What will they do? Will they
      try to set the flag to their own preferred value every second or so? ]
      
      Secondly, it was added based on a single datapoint from Martin:
      
       http://marc.theaimsgroup.com/?l=linux-mm&m=111763597218177&w=2
      
      
      
      where Martin characterizes the numbers the following way:
      
       ' Run-to-run variability for "make -j" is huge, so these numbers aren't
         terribly useful except to see that with reclaim the benchmark still
         finishes in a reasonable amount of time. '
      
      in other words: the fundamental problem has likely not been solved, only
      a tendential move into the right direction has been observed, and a
      handful of numbers were picked out of a set of hugely variable results,
      without showing the variability data. How much variance is there
      run-to-run?
      
      I'd really suggest to first walk the walk and see what's needed to get
      stable & predictable kernel compilation numbers on that NUMA box, before
      adding random syscalls to tune a particular aspect of the VM ... which
      approach might not even matter once the whole picture has been analyzed
      and understood!
      
      The third, most important point is that the syscall exposes VM tuning
      internals in a completely unstructured way. What sense does it make to
      have a _GLOBAL_ per-node setting for 'should we go to another node for
      reclaim'? If then it might make sense to do this per-app, via numalib or
      so.
      
      The change is minimalistic in that it doesnt remove the syscall and the
      underlying infrastructure changes, only the user-visible changes.  We
      could perhaps add a CAP_SYS_ADMIN-only sysctl for this hack, a'ka
      /proc/sys/vm/swappiness, but even that looks quite counterproductive
      when the generic approach is that we are trying to reduce the number of
      external factors in the VM balance picture.
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      Signed-off-by: default avatarLinus Torvalds <torvalds@osdl.org>
      6cb54819
  13. 12 Jul, 2005 1 commit
    • Robert Love's avatar
      [PATCH] inotify · 0eeca283
      Robert Love authored
      
      inotify is intended to correct the deficiencies of dnotify, particularly
      its inability to scale and its terrible user interface:
      
              * dnotify requires the opening of one fd per each directory
                that you intend to watch. This quickly results in too many
                open files and pins removable media, preventing unmount.
              * dnotify is directory-based. You only learn about changes to
                directories. Sure, a change to a file in a directory affects
                the directory, but you are then forced to keep a cache of
                stat structures.
              * dnotify's interface to user-space is awful.  Signals?
      
      inotify provides a more usable, simple, powerful solution to file change
      notification:
      
              * inotify's interface is a system call that returns a fd, not SIGIO.
      	  You get a single fd, which is select()-able.
              * inotify has an event that says "the filesystem that the item
                you were watching is on was unmounted."
              * inotify can watch directories or files.
      
      Inotify is currently used by Beagle (a desktop search infrastructure),
      Gamin (a FAM replacement), and other projects.
      
      See Documentation/filesystems/inotify.txt.
      Signed-off-by: default avatarRobert Love <rml@novell.com>
      Cc: John McCutchan <ttb@tentacle.dhs.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarAndrew Morton <akpm@osdl.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@osdl.org>
      0eeca283
  14. 25 Jun, 2005 1 commit
  15. 21 Jun, 2005 1 commit
    • Martin Hicks's avatar
      [PATCH] VM: early zone reclaim · 753ee728
      Martin Hicks authored
      This is the core of the (much simplified) early reclaim.  The goal of this
      patch is to reclaim some easily-freed pages from a zone before falling back
      onto another zone.
      
      One of the major uses of this is NUMA machines.  With the default allocator
      behavior the allocator would look for memory in another zone, which might be
      off-node, before trying to reclaim from the current zone.
      
      This adds a zone tuneable to enable early zone reclaim.  It is selected on a
      per-zone basis and is turned on/off via syscall.
      
      Adding some extra throttling on the reclaim was also required (patch
      4/4).  Without the machine would grind to a crawl when doing a "make -j"
      kernel build.  Even with this patch the System Time is higher on
      average, but it seems tolerable.  Here are some numbers for kernbench
      runs on a 2-node, 4cpu, 8Gig RAM Altix in the "make -j" run:
      
      			wall  user   sys   %cpu  ctx sw.  sleeps
      			----  ----   ---   ----   ------  ------
      No patch		1009  1384   847   258   298170   504402
      w/patch, no reclaim     880   1376   667   288   254064   396745
      w/patch & reclaim       1079  1385   926   252   291625   548873
      
      These numbers are the average of 2 runs of 3 "make -j" runs done right
      after system boot.  Run-to-run variability for "make -j" is huge, so
      these numbers aren't terribly useful except to seee that with reclaim
      the benchmark still finishes in a reasonable amount of time.
      
      I also looked at the NUMA hit/miss stats for the "make -j" runs and the
      reclaim doesn't make any difference when the machine is thrashing away.
      
      Doing a "make -j8" on a single node that is filled with page cache pages
      takes 700 seconds with reclaim turned on and 735 seconds without reclaim
      (due to remote memory accesses).
      
      The simple zone_reclaim syscall program is at
      http://www.bork.org/~mort/sgi/zone_reclaim.c
      
      Signed-off-by: default avatarMartin Hicks <mort@sgi.com>
      Signed-off-by: default avatarAndrew Morton <akpm@osdl.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@osdl.org>
      753ee728
  16. 01 May, 2005 1 commit
  17. 16 Apr, 2005 1 commit
    • Linus Torvalds's avatar
      Linux-2.6.12-rc2 · 1da177e4
      Linus Torvalds authored
      Initial git repository build. I'm not bothering with the full history,
      even though we have it. We can create a separate "historical" git
      archive of that later if we want to, and in the meantime it's about
      3.2GB when imported into git - space that would just make the early
      git days unnecessarily complicated, when we don't have a lot of good
      infrastructure for it.
      
      Let it rip!
      1da177e4