1. 13 Nov, 2013 26 commits
  2. 10 Nov, 2013 7 commits
fs: Modify writeback to make sure block_dump is greater than 1 rather than greater than 0 for android devices · 24b1f638
      ktoonsez authored
      fs: Modify writeback to make sure block_dump is greater than 1 rather than greater than 0 for android devices
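
A minimal sketch of the kind of change this describes. The check lives in the inode-dirtying path of fs/fs-writeback.c; this reconstruction is an assumption, not the verbatim diff:

	/* __mark_inode_dirty(), fs/fs-writeback.c -- hedged reconstruction */
	-	if (unlikely(block_dump))
	+	if (unlikely(block_dump > 1))
			block_dump___mark_inode_dirty(inode);

The effect is that userspace writing 1 to /proc/sys/vm/block_dump no longer triggers dirty-inode logging; a value of 2 or more is now required.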
ktoonsez authored · 9db3796c
mm: Modify page_alloc to analyze zone for NULL condition before passing it on to functions like zone_dirty_ok · e5d6ecf9
      ktoonsez authored
mm: Modify page_alloc to analyze zone for NULL condition before passing it on to functions like zone_dirty_ok.
      
This is trying to prevent the following condition:

Unable to handle kernel NULL pointer dereference at virtual address 00000001
PC is at get_page_from_freelist+0xe8/0x630
LR is at zone_dirty_ok+0x18/0xe0
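
A hedged sketch of the guard this implies, inside the zone iteration of get_page_from_freelist() in mm/page_alloc.c (a reconstruction from the crash signature above, not the verbatim patch):

	for_each_zone_zonelist_nodemask(zone, z, zonelist,
					high_zoneidx, nodemask) {
	+	/* guard: don't hand a NULL zone to zone_dirty_ok() etc. */
	+	if (unlikely(!zone))
	+		continue;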
      mm: optimize Kernel Samepage Merging(KSM) · 2d783681
      ktoonsez authored
      ARM: 7746/1: mm: lazy cache flushing on non-mapped pages · f5986135
      Ming Lei authored
      
Currently flush_dcache_page() treats pages as non-mapped if
mapping_mapped(mapping) returns false. This approach is very
coarse:
	- an mmap on part of a file may cause all pages backed by
	the file to be treated as mmapped

	- file-backed pages aren't actually mapped into user space
	if the memory mmapped on the file is never accessed
      
      This patch uses page_mapped() to decide if the page has been
      mapped.
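
The core of the change as a sketch, reconstructed from the description above (arch/arm/mm/flush.c, flush_dcache_page(); surrounding code elided, not the verbatim diff):

	-	if (mapping && !mapping_mapped(mapping))
	+	if (mapping && !page_mapped(page))
			clear_bit(PG_dcache_clean, &page->flags);

Deferring the flush until the page is actually mapped (page_mapped()) rather than whenever the file has any mapping at all (mapping_mapped()) is what avoids the unnecessary cache flushing measured below.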
      
From the attached test code, I find a substantial performance
improvement (>25%) when accessing page caches via read() in this
situation, so the memcpy in the read path benefits a lot from not
flushing the cache here.
      
      No.   read time without the patch	No. read time with the patch
      ================================================================
      No. 0, time  22615636 us		No. 0, time  22014717 us
      No. 1, time  4387851 us 		No. 1, time  3113184 us
      No. 2, time  4276535 us 		No. 2, time  3005244 us
      No. 3, time  4259821 us 		No. 3, time  3001565 us
      No. 4, time  4263811 us 		No. 4, time  3002748 us
      No. 5, time  4258486 us 		No. 5, time  3004104 us
      No. 6, time  4253009 us 		No. 6, time  3002188 us
      No. 7, time  4262809 us 		No. 7, time  2998196 us
      No. 8, time  4264525 us 		No. 8, time  3007255 us
      No. 9, time  4267795 us 		No. 9, time  3005094 us
      
1) No. 0 reads the file from the storage device; the others
read it from the page cache.
2) The file size is 512M, on ext4 over USB mass storage.
3) The test was done on a Pandaboard.
      
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/time.h>
#include <unistd.h>

static unsigned long tv_diff(struct timeval *tv1, struct timeval *tv2)
{
	return (tv2->tv_sec - tv1->tv_sec) * 1000000 +
		(tv2->tv_usec - tv1->tv_usec);
}

int main(int argc, char *argv[])
{
	char *mbuf;
	int fd;
	unsigned long i;
	unsigned long page_size, size;
	struct stat stat;
	struct timeval t1, t2;
	unsigned char *rbuf;

	if (argc < 2) {
		printf("usage: %s <file>\n", argv[0]);
		exit(-1);
	}

	page_size = getpagesize();

	/* read buffer sized to pull 32 pages per read() call */
	rbuf = malloc(32 * page_size);
	if (!rbuf) {
		printf("\t%s\n", "malloc failed");
		exit(-1);
	}

	fd = open(argv[1], O_RDWR);
	assert(fd >= 0);

	fstat(fd, &stat);
	size = stat.st_size;
	printf("%s: file %s, size %lu, page size %lu\n",
		argv[0], argv[1], size, page_size);

	gettimeofday(&t1, NULL);

	/*
	 * Map the whole file shared but never touch the mapping: with the
	 * old mapping_mapped() check, this alone makes every page of the
	 * file look "mapped" and forces cache flushes in the read path.
	 */
	mbuf = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (mbuf == MAP_FAILED) {
		printf("\t%s\n", "mmap failed");
		exit(-1);
	}

	/* read the file back through the page cache, 32 pages at a time */
	for (i = 0; i < size; i += (page_size * 32)) {
		ssize_t rcnt;

		lseek(fd, i, SEEK_SET);
		rcnt = read(fd, rbuf, page_size * 32);
		if (rcnt != (ssize_t)(page_size * 32)) {
			printf("%s: read failed\n", __func__);
			exit(-1);
		}
	}
	free(rbuf);
	munmap(mbuf, size);
	gettimeofday(&t2, NULL);
	printf("\tread mmapped time: %luus\n", tv_diff(&t1, &t2));

	close(fd);
	return 0;
}
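
For reference, the test above is plain userspace C; assuming a standard toolchain it can be built with something like gcc -O2 test.c -o test and run as ./test <file>, where the first run measures cold-cache reads from storage and later runs hit the page cache.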
      
      Cc: Michel Lespinasse <walken@google.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Nicolas Pitre <nicolas.pitre@linaro.org>
Reviewed-by: Will Deacon <will.deacon@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Ming Lei <ming.lei@canonical.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
      Input: Send events one packet at a time · 1841adf9
      Henrik Rydberg authored
      
      On heavy event loads, such as a multitouch driver, the irqsoff latency
      can be as high as 250 us.  By accumulating a frame worth of data
      before passing it on, the latency can be dramatically reduced.  As a
      side effect, the special EV_SYN handling can be removed, since the
      frame is now atomic.
      
      This patch adds the events() handler callback and uses it if it
      exists. The latency is improved by 50 us even without the callback.
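
A sketch of the interface this adds, as understood from the description above (include/linux/input.h; member placement is approximate):

	/* one event in an accumulated frame */
	struct input_value {
		__u16 type;
		__u16 code;
		__s32 value;
	};

	struct input_handler {
		/* ... other members elided ... */

		/* legacy one-event-at-a-time callback */
		void (*event)(struct input_handle *handle,
			      unsigned int type, unsigned int code, int value);

		/* new: receive a whole frame (packet) of events in one call */
		void (*events)(struct input_handle *handle,
			       const struct input_value *vals,
			       unsigned int count);
	};

Handlers that implement events() receive the full frame under a single pass, which is where the irqsoff latency reduction comes from.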
      
      Change-Id: Iebd9b1868ae6300a922a45b6d104e7c2b38e4cf5
      Cc: Daniel Kurtz <djkurtz@chromium.org>
Tested-by: Benjamin Tissoires <benjamin.tissoires@enac.fr>
Tested-by: Ping Cheng <pingc@wacom.com>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Acked-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Henrik Rydberg <rydberg@euromail.se>
      
      Input: Improve the events-per-packet estimate
      
      The events-per-packet estimate has so far been used by MT devices
only. This patch adjusts the packet buffer size to also accommodate the
      KEY and MSC events.  Keyboards normally send one or two keys at a
      time. MT devices normally send a number of button keys along with the
      MT information.  The buffer size chosen here covers those cases, and
      matches the default buffer size in evdev. Since the input estimate is
      now preferred, remove the special input-mt estimate.
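
As a rough illustration of the sizing rule described here (a hypothetical sketch; the constants and the helper name are assumptions, not the driver's actual code):

	/* hypothetical: estimate events per packet for the event buffer */
	static unsigned int estimate_events_per_packet(unsigned int mt_slots,
						       int has_keys, int has_msc)
	{
		unsigned int events = mt_slots + 1;	/* MT data + SYN_REPORT */

		if (has_keys)
			events += 8;	/* assumed headroom for keys/buttons */
		if (has_msc)
			events += 1;	/* assumed headroom for MSC events */

		return events;
	}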
Reviewed-and-tested-by: Ping Cheng <pingc@wacom.com>
Tested-by: Benjamin Tissoires <benjamin.tissoires@enac.fr>
Acked-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Henrik Rydberg <rydberg@euromail.se>
Signed-off-by: franciscofranco <franciscofranco.1990@gmail.com>
Signed-off-by: Francisco Franco <franciscofranco.1990@gmail.com>
      SELinux: Reduce overhead of mls_level_isvalid() function call · 7d541228
      Waiman Long authored
      
Date: Mon, 10 Jun 2013 13:55:08 -0400
      
      v4->v5:
        - Fix scripts/checkpatch.pl warning.
      
      v3->v4:
        - Merge the 2 separate while loops in ebitmap_contains() into
          a single one.
      
      v2->v3:
        - Remove unused local variables i, node from mls_level_isvalid().
      
      v1->v2:
       - Move the new ebitmap comparison logic from mls_level_isvalid()
         into the ebitmap_contains() helper function.
       - Rerun perf and performance tests on the latest v3.10-rc4 kernel.
      
      While running the high_systime workload of the AIM7 benchmark on
      a 2-socket 12-core Westmere x86-64 machine running 3.10-rc4 kernel
(with HT on), it was found that a sizable amount of time was
spent in the SELinux code. Below is the perf trace from a
"perf record -a -s" test run at 1500 users:
      
        5.04%            ls  [kernel.kallsyms]     [k] ebitmap_get_bit
        1.96%            ls  [kernel.kallsyms]     [k] mls_level_isvalid
        1.95%            ls  [kernel.kallsyms]     [k] find_next_bit
      
ebitmap_get_bit() was the hottest function in the perf-report
output.  Both ebitmap_get_bit() and find_next_bit() were, in fact,
called by mls_level_isvalid(). As a result, the mls_level_isvalid()
call consumed 8.95% of the total CPU time across all 24 virtual
CPUs, which is quite a lot. The majority of the mls_level_isvalid()
invocations come from the socket creation system call.
      
Looking at the mls_level_isvalid() function, it checks whether all
the bits set in one ebitmap structure are also set in another, and
whether the highest set bit is no bigger than the one specified by
the given policydb data structure. It does this in a bit-by-bit
manner, so if the ebitmap structure has many bits set, the
iteration loop runs many times.
      
The current code can be rewritten to use an algorithm similar to
the ebitmap_contains() function, with an additional check for the
highest set bit, as sketched below. The ebitmap_contains() function
was extended to cover an optional check for the highest set bit, and
the mls_level_isvalid() function was modified to call ebitmap_contains().
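
A simplified, self-contained sketch of the word-at-a-time comparison over flat arrays (the kernel's ebitmap is a list of sparse nodes and the extended helper is ebitmap_contains(e1, e2, last_e2bit), so this illustrates the algorithm rather than the kernel code):

#include <stdbool.h>
#include <stddef.h>

#define BITS_PER_WORD (8 * sizeof(unsigned long))

/*
 * Word-at-a-time containment check: every bit set in e2 must also be
 * set in e1, and no bit in e2 may lie above last_e2bit.
 */
static bool bitmap_contains(const unsigned long *e1, const unsigned long *e2,
			    size_t nwords, unsigned long last_e2bit)
{
	for (size_t i = 0; i < nwords; i++) {
		unsigned long w = e2[i];
		unsigned long high;

		if (!w)
			continue;
		if (w & ~e1[i])		/* a bit of e2 is missing from e1 */
			return false;
		/* index of the highest set bit in this word */
		high = i * BITS_PER_WORD +
		       (BITS_PER_WORD - 1 - __builtin_clzl(w));
		if (high > last_e2bit)
			return false;
	}
	return true;
}

mls_level_isvalid() then reduces to a single such containment check with the bound taken from the policydb, replacing the per-bit loop.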
      
With that change, the perf trace showed that the CPU time used
dropped to just 0.08% (ebitmap_contains + mls_level_isvalid) of the
total, which is about 100X less than before.
      
        0.07%            ls  [kernel.kallsyms]     [k] ebitmap_contains
        0.05%            ls  [kernel.kallsyms]     [k] ebitmap_get_bit
        0.01%            ls  [kernel.kallsyms]     [k] mls_level_isvalid
        0.01%            ls  [kernel.kallsyms]     [k] find_next_bit
      
The remaining ebitmap_get_bit() and find_next_bit() calls are made
by other kernel routines, as the new mls_level_isvalid() no longer
calls them.
      
This patch also improves the high_systime AIM7 benchmark result,
though the improvement is not as large as the reduction in CPU time
spent in the ebitmap functions would suggest. The table below shows
the performance change on the 2-socket x86-64 system (with HT on)
mentioned above.
      
      +--------------+---------------+----------------+-----------------+
      |   Workload   | mean % change | mean % change  | mean % change   |
      |              | 10-100 users  | 200-1000 users | 1100-2000 users |
      +--------------+---------------+----------------+-----------------+
      | high_systime |     +0.1%     |     +0.9%      |     +2.6%       |
      +--------------+---------------+----------------+-----------------+
Signed-off-by: Waiman Long <Waiman.Long@hp.com>
Signed-off-by: Paul Reioux <reioux@gmail.com>
  3. 05 Nov, 2013 2 commits
  4. 04 Nov, 2013 5 commits