1. 28 Mar, 2014 14 commits
    • Revert "epoll: optimize EPOLL_CTL_DEL using rcu" · 78b237a6
      hellsgod authored
      This reverts commit a182b3b17f67027ba1d543312cf168cd62b10fe6.
      78b237a6
    • Revert "epoll: Do not take global 'epmutex' for simple topologies" · 6b6722e7
      hellsgod authored
      This reverts commit 59fc9b6fb2418d28b35ba77d989339b65fc4a285.
      6b6722e7
    • Revert "epoll: do not take globals epmutex for simple topologies fix" · ca7b605d
      hellsgod authored
      This reverts commit 0d6c0e635bd4e8e1042acfadad96d7e008283008.
      ca7b605d
    • cpuidle: remove cross-cpu IPI by new latency request. · 9dbf4640
      Guojian Chen authored
      
      When a driver requests a new latency requirement, there is no need to
      immediately wake up every other CPU with a cross-CPU IPI; the new latency
      can simply take effect after each CPU's next wakeup from idle. This avoids
      the unnecessary wakeup cost and removes the risk of a driver requesting a
      latency from IRQ-disabled context, as in this backtrace:
      
      [<c08e0cc0>] (__irq_svc+0x40/0x70) from [<c00e801c>] (smp_call_function_single+0x16c/0x240)
      [<c00e801c>] (smp_call_function_single+0x16c/0x240) from [<c00e855c>] (smp_call_function+0x40/0x6c)
      [<c00e855c>] (smp_call_function+0x40/0x6c) from [<c0601c9c>] (cpuidle_latency_notify+0x18/0x20)
      [<c0601c9c>] (cpuidle_latency_notify+0x18/0x20) from [<c00b7c28>] (blocking_notifier_call_chain+0x74/0x94)
      [<c00b7c28>] (blocking_notifier_call_chain+0x74/0x94) from [<c00d563c>] (pm_qos_update_target+0xe0/0x128)
      [<c00d563c>] (pm_qos_update_target+0xe0/0x128) from [<c0620d3c>] (msmsdcc_enable+0xac/0x158)
      [<c0620d3c>] (msmsdcc_enable+0xac/0x158) from [<c06050e0>] (mmc_try_claim_host+0xb0/0xb8)
      [<c06050e0>] (mmc_try_claim_host+0xb0/0xb8) from [<c0605318>] (mmc_start_bkops.part.15+0x50/0x2f4)
      [<c0605318>] (mmc_start_bkops.part.15+0x50/0x2f4) from [<c00ab768>] (process_one_work+0x124/0x55c)
      [<c00ab768>] (process_one_work+0x124/0x55c) from [<c00abfc8>] (worker_thread+0x178/0x45c)
      [<c00abfc8>] (worker_thread+0x178/0x45c) from [<c00b0b24>] (kthread+0x84/0x90)
      [<c00b0b24>] (kthread+0x84/0x90) from [<c000fdd4>] (kernel_thread_exit+0x0/0x8)
      Disabling lock debugging due to kernel taint
      coresight-etb coresight-etb.0: ETB aborted
      Kernel panic - not syncing: softlockup: hung tasks
      Signed-off-by: Guojian Chen <a21757@motorola.com>
      Reviewed-on: http://gerrit.pcs.mot.com/532702

      SLT-Approved: Slta Waiver <sltawvr@motorola.com>
      Tested-by: Jira Key <jirakey@motorola.com>
      Reviewed-by: Klocwork kwcheck <klocwork-kwcheck@sourceforge.mot.com>
      Reviewed-by: Christopher Fries <qcf001@motorola.com>
      Reviewed-by: David Ding <dding@motorola.com>
      Submit-Approved: Jira Key <jirakey@motorola.com>
      Signed-off-by: Pranav Vashi <neobuddy89@gmail.com>
      9dbf4640
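      A minimal sketch of the idea above, not the actual Motorola patch: the
      notifier name comes from the backtrace, and the old smp_callback()
      broadcast is referenced only in a comment. Instead of kicking every CPU
      with an IPI when the PM QoS latency target changes, the notifier simply
      returns and each CPU picks up the new value on its next idle entry.

      static int cpuidle_latency_notify(struct notifier_block *b,
                                        unsigned long l, void *v)
      {
              /*
               * Previously this called smp_call_function(smp_callback, NULL, 1),
               * waking every other CPU immediately -- risky when a driver updates
               * its latency request with IRQs disabled.  Now we do nothing here;
               * the new latency is honoured the next time each CPU wakes from
               * idle and re-evaluates its C-state choice.
               */
              return NOTIFY_OK;
      }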
    • input: evdev: daisy-chain header and client buffers · 19f798cd
      Igor Kovalenko authored
      
      The evdev driver calculates the memory required to hold client data using
      power-of-2 math and then adds the header size, which pushes the request
      past the power-of-2 boundary and forces an allocation twice as large as
      needed.

      Change the logic to daisy-chain the header and the client array: instead
      of a single order-4 allocation, we make an order-0 plus an order-2
      allocation.
      Signed-off-by: Igor Kovalenko <cik009@motorola.com>
      SLT-Approved: Slta Waiver <sltawvr@motorola.com>
      Tested-by: Jira Key <jirakey@motorola.com>
      Reviewed-by: Christopher Fries <qcf001@motorola.com>
      Reviewed-by: Jason Hrycay <jason.hrycay@motorola.com>
      Reviewed-by: Klocwork kwcheck <klocwork-kwcheck@sourceforge.mot.com>
      Signed-off-by: Pranav Vashi <neobuddy89@gmail.com>
      19f798cd
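      A hedged sketch of the allocation split described above; the struct and
      function names are illustrative, not the exact evdev code. The fixed-size
      header and the power-of-2 event buffer are allocated separately, so the
      header no longer pushes a single combined allocation into the next page
      order.

      struct client_hdr {
              unsigned int head, tail, bufsize;
              struct input_event *buffer;        /* daisy-chained client array */
      };

      static struct client_hdr *client_alloc(unsigned int n_events)
      {
              /* n_events is assumed already rounded up to a power of 2 */
              struct client_hdr *c = kzalloc(sizeof(*c), GFP_KERNEL);

              if (!c)
                      return NULL;
              c->bufsize = n_events;
              c->buffer = kcalloc(n_events, sizeof(*c->buffer), GFP_KERNEL);
              if (!c->buffer) {
                      kfree(c);
                      return NULL;
              }
              return c;    /* order-0 header + order-2 buffer, not one order-4 */
      }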
    • ARM: smp: Wait just 1 second for other CPU to halt · 8cf7d67b
      Chris Fries authored
      
      Currently, the busyloop waiting for a 2nd CPU to stop takes about 4
      seconds.  Adjust for the overhead of the loop by looping every 1ms
      instead of 1us.
      Signed-off-by: Chris Fries <C.Fries@motorola.com>
      Reviewed-on: http://gerrit.pcs.mot.com/537864

      SLT-Approved: Slta Waiver <sltawvr@motorola.com>
      Tested-by: Jira Key <jirakey@motorola.com>
      Reviewed-by: Check Patch <CHEKPACH@motorola.com>
      Reviewed-by: Klocwork kwcheck <klocwork-kwcheck@sourceforge.mot.com>
      Reviewed-by: Igor Kovalenko <cik009@motorola.com>
      Reviewed-by: Russell Knize <rknize2@motorola.com>
      Submit-Approved: Jira Key <jirakey@motorola.com>
      Signed-off-by: Pranav Vashi <neobuddy89@gmail.com>
      Signed-off-by: franciscofranco <franciscofranco.1990@gmail.com>
      8cf7d67b
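      A rough sketch of the timing change, paraphrased from the shape of
      arch/arm/kernel/smp.c; the IPI code is elided and the exact constant is an
      assumption. Polling once per millisecond for up to one second keeps the
      intended bound without the loop overhead stretching the wait to ~4 seconds.

      void smp_send_stop(void)
      {
              unsigned long timeout = MSEC_PER_SEC;   /* ~1 second in 1ms steps */

              /* ... send IPI_CPU_STOP to the other CPUs (omitted) ... */

              /* old: a udelay(1) loop whose overhead stretched this to ~4s */
              while (num_online_cpus() > 1 && timeout--)
                      mdelay(1);

              if (num_online_cpus() > 1)
                      pr_warn("SMP: failed to stop secondary CPUs\n");
      }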
    • ext4: protect group inode free counting with group lock · d6c65bec
      Tao Ma authored
      
      commit 6f2e9f0e7d795214b9cf5a47724a273b705fd113 upstream.
      
      Currently, when we set the group inode free count, we do not hold the
      group lock, so multiple threads may decrease the inode free count at the
      same time, and e2fsck will then complain with something like:
      
      Free inodes count wrong for group #1 (1, counted=0).
      Fix? no
      
      Free inodes count wrong for group #2 (3, counted=0).
      Fix? no
      
      Directories count wrong for group #2 (780, counted=779).
      Fix? no
      
      Free inodes count wrong for group #3 (2272, counted=2273).
      Fix? no
      
      So this patch protects the update with ext4_lock_group().

      The problem was found by xfstests test case 269 on a volume mkfs'ed with
      the parameters
      "-O ^resize_inode,^uninit_bg,extent,meta_bg,flex_bg,ext_attr";
      after running the test 100 times with this patch, the e2fsck errors no
      longer show up.
      Signed-off-by: Tao Ma <boyu.mt@taobao.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      d6c65bec
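      A condensed sketch of the locking pattern, modelled on the upstream fix;
      the surrounding ext4_new_inode() accounting is elided. The read-modify-
      write of the group's counters happens under ext4_lock_group(), so
      concurrent allocations cannot race.

      /* sb, gdp, group and mode as in the inode-allocation path */
      ext4_lock_group(sb, group);
      ext4_free_inodes_set(sb, gdp, ext4_free_inodes_count(sb, gdp) - 1);
      if (S_ISDIR(mode))
              ext4_used_dirs_set(sb, gdp, ext4_used_dirs_count(sb, gdp) + 1);
      ext4_unlock_group(sb, group);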
    • sched: Avoid throttle_cfs_rq() racing with period_timer stopping · b3943366
      Ben Segall authored
      
      commit f9f9ffc237dd924f048204e8799da74f9ecf40cf upstream.
      
      throttle_cfs_rq() doesn't check to make sure that period_timer is running,
      and while update_curr/assign_cfs_runtime does, a concurrently running
      period_timer on another cpu could cancel itself between this cpu's
      update_curr and throttle_cfs_rq(). If there are no other cfs_rqs running
      in the tg to restart the timer, this causes the cfs_rq to be stranded
      forever.
      
      Fix this by calling __start_cfs_bandwidth() in throttle if the timer is
      inactive.
      
      (Also add some sched_debug lines for cfs_bandwidth.)
      
      Tested: make a run/sleep task in a cgroup, loop switching the cgroup
      between 1ms/100ms quota and unlimited, checking for timer_active=0 and
      throttled=1 as a failure. With the throttle_cfs_rq() change commented out
      this fails, with the full patch it passes.
      Signed-off-by: Ben Segall <bsegall@google.com>
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Cc: pjt@google.com
      Link: http://lkml.kernel.org/r/20131016181632.22647.84174.stgit@sword-of-the-dawn.mtv.corp.google.com

      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Chris J Arges <chris.j.arges@canonical.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      b3943366
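      A minimal sketch of the fix, condensed from the upstream patch; the
      throttle bookkeeping before the lock is elided. While holding cfs_b->lock,
      the period timer is restarted if it has gone inactive, so the newly
      throttled cfs_rq cannot be stranded.

      static void throttle_cfs_rq(struct cfs_rq *cfs_rq)
      {
              struct cfs_bandwidth *cfs_b = tg_cfs_bandwidth(cfs_rq->tg);

              /* ... dequeue entities and mark cfs_rq throttled (omitted) ... */

              raw_spin_lock(&cfs_b->lock);
              list_add_tail_rcu(&cfs_rq->throttled_list, &cfs_b->throttled_cfs_rq);
              /*
               * period_timer may have cancelled itself on another CPU between
               * our update_curr() and this point; restart it so this cfs_rq is
               * guaranteed to be unthrottled eventually.
               */
              if (!cfs_b->timer_active)
                      __start_cfs_bandwidth(cfs_b);
              raw_spin_unlock(&cfs_b->lock);
      }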
    • SELinux: Fix kernel BUG on empty security contexts. · af592541
      Stephen Smalley authored
      
      Setting an empty security context (length=0) on a file will
      lead to incorrectly dereferencing the type and other fields
      of the security context structure, yielding a kernel BUG.
      As a zero-length security context is never valid, just reject
      all such security contexts whether coming from userspace
      via setxattr or coming from the filesystem upon a getxattr
      request by SELinux.
      
      Setting a security context value (empty or otherwise) unknown to
      SELinux in the first place is only possible for a root process
      (CAP_MAC_ADMIN), and, if running SELinux in enforcing mode, only
      if the corresponding SELinux mac_admin permission is also granted
      to the domain by policy.  In Fedora policies, this is only allowed for
      specific domains such as livecd for setting down security contexts
      that are not defined in the build host policy.
      
      [On Android, this can only be set by root/CAP_MAC_ADMIN processes,
      and if running SELinux in enforcing mode, only if mac_admin permission
      is granted in policy.  In Android 4.4, this would only be allowed for
      root/CAP_MAC_ADMIN processes that are also in unconfined domains. In current
      AOSP master, mac_admin is not allowed for any domains except the recovery
      console which has a legitimate need for it.  The other potential vector
      is mounting a maliciously crafted filesystem for which SELinux fetches
      xattrs (e.g. an ext4 filesystem on a SDcard).  However, the end result is
      only a local denial-of-service (DOS) due to kernel BUG.  This fix is
      queued for 3.14.]
      
      Reproducer:
      su
      setenforce 0
      touch foo
      setfattr -n security.selinux foo
      
      Caveat:
      Relabeling or removing foo after doing the above may not be possible
      without booting with SELinux disabled.  Any subsequent access to foo
      after doing the above will also trigger the BUG.
      
      BUG output from Matthew Thode:
      [  473.893141] ------------[ cut here ]------------
      [  473.962110] kernel BUG at security/selinux/ss/services.c:654!
      [  473.995314] invalid opcode: 0000 [#6] SMP
      [  474.027196] Modules linked in:
      [  474.027196] CPU: 0 PID: 8138 Comm: ls Tainted: G      D   I  3.13.0-grsec #1
      [  474.058118] Hardware name: Supermicro X8ST3/X8ST3, BIOS 2.0 07/29/10
      [  474.116637] task: ffff8805f50cd010 ti: ffff8805f50cd488 task.ti: ffff8805f50cd488
      [  474.149768] RIP: 0010:[<ffffffff814681c7>]  [<ffffffff814681c7>] context_struct_compute_av+0xce/0x308
      [  474.219954] RSP: 0018:ffff8805c0ac3c38  EFLAGS: 00010246
      [  474.252253] RAX: 0000000000000000 RBX: ffff8805c0ac3d94 RCX: 0000000000000100
      [  474.287018] RDX: ffff8805e8aac000 RSI: 00000000ffffffff RDI: ffff8805e8aaa000
      [  474.321199] RBP: ffff8805c0ac3cb8 R08: 0000000000000010 R09: 0000000000000006
      [  474.357446] R10: 0000000000000000 R11: ffff8805c567a000 R12: 0000000000000006
      [  474.419191] R13: ffff8805c2b74e88 R14: 00000000000001da R15: 0000000000000000
      [  474.453816] FS:  00007f2e75220800(0000) GS:ffff88061fc00000(0000) knlGS:0000000000000000
      [  474.489254] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  474.522215] CR2: 00007f2e74716090 CR3: 00000005c085e000 CR4: 00000000000207f0
      [  474.556058] Stack:
      [  474.584325]  ffff8805c0ac3c98 ffffffff811b549b ffff8805c0ac3c98 ffff8805f1190a40
      [  474.618913]  ffff8805a6202f08 ffff8805c2b74e88 00068800d0464990 ffff8805e8aac860
      [  474.653955]  ffff8805c0ac3cb8 000700068113833a ffff880606c75060 ffff8805c0ac3d94
      [  474.690461] Call Trace:
      [  474.723779]  [<ffffffff811b549b>] ? lookup_fast+0x1cd/0x22a
      [  474.778049]  [<ffffffff81468824>] security_compute_av+0xf4/0x20b
      [  474.811398]  [<ffffffff8196f419>] avc_compute_av+0x2a/0x179
      [  474.843813]  [<ffffffff8145727b>] avc_has_perm+0x45/0xf4
      [  474.875694]  [<ffffffff81457d0e>] inode_has_perm+0x2a/0x31
      [  474.907370]  [<ffffffff81457e76>] selinux_inode_getattr+0x3c/0x3e
      [  474.938726]  [<ffffffff81455cf6>] security_inode_getattr+0x1b/0x22
      [  474.970036]  [<ffffffff811b057d>] vfs_getattr+0x19/0x2d
      [  475.000618]  [<ffffffff811b05e5>] vfs_fstatat+0x54/0x91
      [  475.030402]  [<ffffffff811b063b>] vfs_lstat+0x19/0x1b
      [  475.061097]  [<ffffffff811b077e>] SyS_newlstat+0x15/0x30
      [  475.094595]  [<ffffffff8113c5c1>] ? __audit_syscall_entry+0xa1/0xc3
      [  475.148405]  [<ffffffff8197791e>] system_call_fastpath+0x16/0x1b
      [  475.179201] Code: 00 48 85 c0 48 89 45 b8 75 02 0f 0b 48 8b 45 a0 48 8b 3d 45 d0 b6 00 8b 40 08 89 c6 ff ce e8 d1 b0 06 00 48 85 c0 49 89 c7 75 02 <0f> 0b 48 8b 45 b8 4c 8b 28 eb 1e 49 8d 7d 08 be 80 01 00 00 e8
      [  475.255884] RIP  [<ffffffff814681c7>] context_struct_compute_av+0xce/0x308
      [  475.296120]  RSP <ffff8805c0ac3c38>
      [  475.328734] ---[ end trace f076482e9d754adc ]---
      
      [sds:  commit message edited to note Android implications and
      to generate a unique Change-Id for gerrit]
      
      Change-Id: I4d5389f0cfa72b5f59dada45081fa47e03805413
      Reported-by: Matthew Thode <mthode@mthode.org>
      Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
      Cc: stable@vger.kernel.org
      Signed-off-by: Paul Moore <pmoore@redhat.com>
      af592541
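      A hedged sketch of the validation, modelled on the upstream change to
      security/selinux/ss/services.c and abbreviated here. Zero-length contexts
      are rejected up front, so later code never dereferences an uninitialized
      context structure.

      static int security_context_to_sid_core(const char *scontext, u32 scontext_len,
                                              u32 *sid, u32 def_sid, gfp_t gfp_flags,
                                              int force)
      {
              /*
               * An empty security context is never valid: reject it whether it
               * comes from userspace via setxattr() or from an on-disk xattr.
               */
              if (!scontext_len)
                      return -EINVAL;

              /* ... existing parsing and SID lookup, unchanged ... */
              return 0;
      }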
    • block: do not notify urgent request, when flush with data in flight · 783b01dd
      Tatyana Brokhman authored
      
      The MMC device driver implements URGENT request execution with priority
      (using a stop flow); as a result, the currently running (and already
      prepared) request may be reinserted back into the I/O scheduler. This
      breaks the block layer's flush logic (a flush request must never be
      inserted into the I/O scheduler).

      The block layer flush machinery keeps the q->flush_data_in_flight list
      updated with started but not yet completed flush requests that carry data
      (REQ_FUA).

      With this change, the underlying block device driver is not notified about
      a pending urgent request while such flushes are in flight.
      
      Change-Id: I8b654925a3c989250fcb8f4f7c998795fb203923
      Signed-off-by: Konstantin Dorfman <kdorfman@codeaurora.org>
      Signed-off-by: Tatyana Brokhman <tlinder@codeaurora.org>
      783b01dd
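      A rough sketch of the guard described above; the helper and the
      urgent_request_fn callback are assumptions about the vendor block layer,
      while q->flush_data_in_flight is the list the commit refers to. The driver
      is only told about a pending urgent request when no data-carrying flush is
      in flight.

      static void blk_maybe_notify_urgent(struct request_queue *q)
      {
              /*
               * A started-but-uncompleted REQ_FUA flush sits on
               * q->flush_data_in_flight; reinserting the in-flight request to
               * make room for an urgent one would break the flush state
               * machine, so stay quiet until the flush drains.
               */
              if (!list_empty(&q->flush_data_in_flight))
                      return;

              if (q->urgent_request_fn)
                      q->urgent_request_fn(q);    /* assumed driver callback */
      }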
    • ARM: sched_clock: Load cycle count after epoch stabilizes · e68da643
      Stephen Boyd authored
      
      There is a small race between when the cycle count is read from
      the hardware and when the epoch stabilizes. Consider this
      scenario:
      
       CPU0                           CPU1
       ----                           ----
       cyc = read_sched_clock()
       cyc_to_sched_clock()
                                       update_sched_clock()
                                        ...
                                        cd.epoch_cyc = cyc;
        epoch_cyc = cd.epoch_cyc;
        ...
        epoch_ns + cyc_to_ns((cyc - epoch_cyc)
      
      The cyc on cpu0 was read before the epoch changed. But we
      calculate the nanoseconds based on the new epoch by subtracting
      the new epoch from the old cycle count. Since epoch is most likely
      larger than the old cycle count we calculate a large number that
      will be converted to nanoseconds and added to epoch_ns, causing
      time to jump forward too much.
      
      Fix this problem by reading the hardware after the epoch has
      stabilized.
      
      Change-Id: I995133b229b2c2fedd5091406d1dc366d8bfff7b
      Cc: Russell King <linux@arm.linux.org.uk>
      Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
      Signed-off-by: John Stultz <john.stultz@linaro.org>
      Git-commit: 336ae1180df5f69b9e0fb6561bec01c5f64361cf
      Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

      [sboyd: reworked for file movement kernel/time -> arm/kernel]
      Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
      Signed-off-by: franciscofranco <franciscofranco.1990@gmail.com>
      e68da643
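      A schematic sketch of the ordering fix; the epoch-copy snapshot loop
      follows my reading of the 3.x ARM sched_clock code and may not match the
      file exactly. The point is that the hardware counter is read only after a
      stable epoch snapshot, so the delta is never computed against a newer
      epoch.

      unsigned long long notrace sched_clock(void)
      {
              u64 epoch_ns;
              u32 epoch_cyc, cyc;

              /* snapshot a *stable* epoch first (retry if an update races us) */
              do {
                      epoch_cyc = cd.epoch_cyc;
                      smp_rmb();
                      epoch_ns = cd.epoch_ns;
                      smp_rmb();
              } while (epoch_cyc != cd.epoch_cyc_copy);

              /* only now read the hardware counter */
              cyc = read_sched_clock();
              return epoch_ns + cyc_to_ns((cyc - epoch_cyc) & sched_clock_mask,
                                          cd.mult, cd.shift);
      }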
    • nohz: Reduce overhead under high-freq idling patterns · 28ecf5b0
      Ingo Molnar authored
      
      One testbox of mine (Intel Nehalem, 16-way) uses MWAIT for its idle routine,
      which apparently can break out of its idle loop rather frequently, with
      high frequency.
      
      In that case NO_HZ_FULL=y kernels show high ksoftirqd overhead and constant
      context switching, because tick_nohz_stop_sched_tick() will, if
      delta_jiffies == 0, mis-identify this as a timer event - activating the
      TIMER_SOFTIRQ, which wakes up ksoftirqd.
      
      Fix this by treating delta_jiffies == 0 the same way we treat other short
      wakeups, delta_jiffies == 1.
      
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Geoff Levand <geoff@infradead.org>
      Cc: Gilad Ben Yossef <gilad@benyossef.com>
      Cc: Hakan Akkan <hakanakkan@gmail.com>
      Cc: Kevin Hilman <khilman@linaro.org>
      Cc: Li Zhong <zhong@linux.vnet.ibm.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: franciscofranco <franciscofranco.1990@gmail.com>
      28ecf5b0
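      A schematic helper expressing the condition change; the real logic lives
      inline in tick_nohz_stop_sched_tick() and is heavily simplified here.

      /* hypothetical helper: is this wakeup too close to bother with nohz? */
      static bool short_idle_wakeup(unsigned long delta_jiffies)
      {
              /*
               * Treat "next timer due in this jiffy" (delta_jiffies == 0)
               * exactly like "due in the next jiffy" (delta_jiffies == 1),
               * instead of mis-reading it as an expired timer event that
               * raises TIMER_SOFTIRQ and wakes ksoftirqd on every MWAIT exit.
               */
              return delta_jiffies <= 1;
      }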
    • CPU hotplug, debug: detect imbalance between get_online_cpus() and put_online_cpus() · 1e604bb0
      Srivatsa S. Bhat authored
      
      The synchronization between CPU hotplug readers and writers is achieved
      by means of refcounting, safeguarded by the cpu_hotplug.lock.
      
      get_online_cpus() increments the refcount, whereas put_online_cpus()
      decrements it.  If we ever hit an imbalance between the two, we end up
      compromising the guarantees of the hotplug synchronization i.e, for
      example, an extra call to put_online_cpus() can end up allowing a
      hotplug reader to execute concurrently with a hotplug writer.
      
      So, add a WARN_ON() in put_online_cpus() to detect such cases where the
      refcount can go negative, and also attempt to fix it up, so that we can
      continue to run.
      
      Change-Id: I144efeaa5899a2e8a3cddd21f010679cbaaa2459
      Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Reviewed-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Git-commit: 075663d19885eb3738fd2d7dbdb8947e12563b68
      Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

      Signed-off-by: Osvaldo Banuelos <osvaldob@codeaurora.org>
      1e604bb0
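      A condensed sketch of the check, following the upstream patch; the rest of
      the refcounting is unchanged. The WARN_ON() catches an imbalance the
      moment the refcount would go negative, and the increment papers over it so
      the system can keep running.

      void put_online_cpus(void)
      {
              if (cpu_hotplug.active_writer == current)
                      return;
              mutex_lock(&cpu_hotplug.lock);

              if (WARN_ON(!cpu_hotplug.refcount))
                      cpu_hotplug.refcount++;         /* try to fix things up */

              if (!--cpu_hotplug.refcount && unlikely(cpu_hotplug.active_writer))
                      wake_up_process(cpu_hotplug.active_writer);
              mutex_unlock(&cpu_hotplug.lock);
      }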
    • tracing/sched: add load balancer tracepoint · 3eec8401
      Steve Muckle authored
      
      When doing performance analysis it can be useful to see exactly
      what is going on with the load balancer - when it runs and why
      exactly it may not be redistributing load.
      
      This additional tracepoint will show the idle context of the
      load balance operation (idle, not idle, newly idle), various
      values from the load balancing operation, the final result,
      and the new balance interval.
      
      Change-Id: I9e5c97ae3878bea44e60d189ff3cec2275f2c75e
      Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
      3eec8401
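      A hedged sketch of what such a tracepoint can look like; the event name
      and field list are illustrative, not the Motorola definition. It records
      the idle context and the outcome of a load_balance() invocation, including
      the new balance interval.

      TRACE_EVENT(sched_load_balance,

              TP_PROTO(int cpu, int idle, int result, unsigned int interval),

              TP_ARGS(cpu, idle, result, interval),

              TP_STRUCT__entry(
                      __field(int,          cpu)
                      __field(int,          idle)     /* idle / not idle / newly idle */
                      __field(int,          result)
                      __field(unsigned int, interval) /* new balance interval */
              ),

              TP_fast_assign(
                      __entry->cpu      = cpu;
                      __entry->idle     = idle;
                      __entry->result   = result;
                      __entry->interval = interval;
              ),

              TP_printk("cpu=%d idle=%d result=%d interval=%u",
                        __entry->cpu, __entry->idle, __entry->result,
                        __entry->interval)
      );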
  2. 25 Mar, 2014 6 commits
    • config: b46-t1 · 82b861f6
      hellsgod authored
      82b861f6
    • SELinux: ipv4: Dump stack on attempt to release alive inet socket · 940dd241
      myfluxi authored
      We hide the bug that would otherwise make SELinux bomb out. Dump the stack
      when it happens so we have at least a chance of finding the cause.
      940dd241
    • SELinux: add a ugly workaround to bail early when selinux tries to free a socket · 7de7e508
      franciscofranco authored
      
      The socket is still in use, and freeing it ends in a kernel panic. These
      lines are taken from opensource.samsung.com. This most likely only puts a
      band-aid over the kernel panic rather than fixing the underlying bug, so
      it may still occur elsewhere if the same conditions are met; but since the
      only panics seen so far come from this hook, we may as well sleep in
      peace.
      Signed-off-by: franciscofranco <franciscofranco.1990@gmail.com>
      7de7e508
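      A heavily hedged sketch covering this entry and the dump-stack entry
      above; the hook chosen and the liveness test are assumptions, not the
      Samsung code verbatim. When asked to free security state for a socket
      inode that still has users, it logs a stack trace and bails out instead of
      letting a later dereference panic the kernel.

      static void selinux_inode_free_security(struct inode *inode)
      {
              /*
               * Workaround, not a fix: freeing security state out from under a
               * live socket later ends in a panic.  Dump the stack so the real
               * culprit can be found, then bail out early.
               */
              if (S_ISSOCK(inode->i_mode) && atomic_read(&inode->i_count) > 1) {
                      pr_err("SELinux: refusing to free security state of a live socket inode\n");
                      dump_stack();
                      return;
              }

              inode_free_security(inode);     /* normal path */
      }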
    • anarkia1976 authored
      8c9923a7
    • msm: memutils: memcpy, memmove optimization · fd441b3f
      Hong-Mei Li authored
      
      1. To fit the 8x26's 64-byte cache line size, change the preload step to
         64 bytes.
      2. Based on tuning results, change the preload distance to 5 cache lines.
      3. Based on tuning results, re-arrange the ld/str order to be back-to-back
         for copy_from_user.

      The measured improvements:
      	memcpy : 5%
      	copy_to_user : 9%
      	copy_from_user : 13%
      	memmove : 37%
      
      Raw data is as below:
      
      BASELINE:
      	memcpy 1000MB at 5MB       : took 1547098 usec, bandwidth 646.646 MB/s
      	copy_to_user 1000MB at 5MB : took 1704308 usec, bandwidth 586.586 MB/s
      	copy_from_user 1000MB at 5M: took 1777090 usec, bandwidth 562.562 MB/s
      	memmove 1000GB at 5MB      : took 1066205 usec, bandwidth 937.937 MB/s
      	copy_to_user 1000GB at 4kB : took 1774866 usec, bandwidth 563.563 MB/s
      	copy_from_user 1000GB at 4k: took 1797654 usec, bandwidth 556.556 MB/s
      	copy_page 1000GB at 4kB    : took 1644606 usec, bandwidth 608.608 MB/s
      	memmove 1000GB at 4kB      : took 1236227 usec, bandwidth 808.808 MB/s
      
      THIS PATCH:
      	memcpy 1000MB at 5MB       : took 1475835 usec, bandwidth 677.677 MB/s
      	copy_to_user 1000MB at 5MB : took 1559060 usec, bandwidth 641.641 MB/s
      	copy_from_user 1000MB at 5M: took 1561603 usec, bandwidth 640.640 MB/s
      	memmove 1000GB at 5MB      : took 861664 usec, bandwidth 1160.160 MB/s
      	copy_to_user 1000GB at 4kB : took 1673501 usec, bandwidth 597.597 MB/s
      	copy_from_user 1000GB at 4k: took 1674006 usec, bandwidth 597.597 MB/s
      	copy_page 1000GB at 4kB    : took 1691358 usec, bandwidth 591.591 MB/s
      	memmove 1000GB at 4kB      : took 882985 usec, bandwidth 1132.132 MB/s
      
      Change-Id: I83bec3b7a9dd9cd88890eaa7ec423363e230a651
      Signed-off-by: Hong-Mei Li <a21834@motorola.com>
      Reviewed-on: http://gerrit.pcs.mot.com/550383

      SLT-Approved: Slta Waiver <sltawvr@motorola.com>
      Tested-by: Jira Key <jirakey@motorola.com>
      Reviewed-by: Klocwork kwcheck <klocwork-kwcheck@sourceforge.mot.com>
      Reviewed-by: Check Patch <CHEKPACH@motorola.com>
      Reviewed-by: Yi-Wei Zhao <gbjc64@motorola.com>
      Submit-Approved: Jira Key <jirakey@motorola.com>
      Submit-Approved: Jira Key <jirakey@motorola.com>
      fd441b3f
    • msm: memutils: memcpy, memmove, copy_page optimization · 7bde1bea
      hellsgod authored
      
      Preload farther to take advantage of the memory bus, and assume
      64-byte cache lines.  Unroll some pairs of ldm/stm as well, for
      unexplainable reasons.
      
      Future enhancements should include,
      
      - #define for how far to preload, possibly defined separately for
        memcpy, copy_*_user
      - Tuning for misaligned buffers
      - Tuning for memmove
      - Tuning for small buffers
      - Understanding mechanism behind ldm/stm unroll causing some gains
        in copy_to_user
      
      BASELINE (msm8960pro):
      ======================================================================
      memcpy 1000MB at 5MB       : took 808850 usec, bandwidth 1236.236 MB/s
      copy_to_user 1000MB at 5MB : took 810071 usec, bandwidth 1234.234 MB/s
      copy_from_user 1000MB at 5M: took 942926 usec, bandwidth 1060.060 MB/s
      memmove 1000GB at 5MB      : took 848588 usec, bandwidth 1178.178 MB/s
      copy_to_user 1000GB at 4kB : took 847916 usec, bandwidth 1179.179 MB/s
      copy_from_user 1000GB at 4k: took 935113 usec, bandwidth 1069.069 MB/s
      copy_page 1000GB at 4kB    : took 779459 usec, bandwidth 1282.282 MB/s
      
      THIS PATCH:
      ======================================================================
      memcpy 1000MB at 5MB       : took 346223 usec, bandwidth 2888.888 MB/s
      copy_to_user 1000MB at 5MB : took 348084 usec, bandwidth 2872.872 MB/s
      copy_from_user 1000MB at 5M: took 348176 usec, bandwidth 2872.872 MB/s
      memmove 1000GB at 5MB      : took 348267 usec, bandwidth 2871.871 MB/s
      copy_to_user 1000GB at 4kB : took 377018 usec, bandwidth 2652.652 MB/s
      copy_from_user 1000GB at 4k: took 371829 usec, bandwidth 2689.689 MB/s
      copy_page 1000GB at 4kB    : took 383763 usec, bandwidth 2605.605 MB/s
      
      Change-Id: I843fe77192d61ce35a81e75d76fc09f0a472e98a
      Signed-off-by: Chris Fries <C.Fries@motorola.com>
      7bde1bea
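      The assembly itself is not reproduced here; instead, a hedged C
      illustration of the two tuning knobs the msm: memutils commits above
      describe (64-byte cache lines and a preload distance of a few lines
      ahead). The constants and the function are illustrative only.

      #include <stddef.h>
      #include <stdint.h>
      #include <string.h>

      #define CACHE_LINE    64                  /* 8x26 cache line size */
      #define PLD_DISTANCE  (5 * CACHE_LINE)    /* tuned preload distance */

      static void copy_with_preload(uint8_t *dst, const uint8_t *src, size_t n)
      {
              size_t i;

              for (i = 0; i + CACHE_LINE <= n; i += CACHE_LINE) {
                      if (i + PLD_DISTANCE < n)      /* like PLD in the asm */
                              __builtin_prefetch(src + i + PLD_DISTANCE, 0, 0);
                      memcpy(dst + i, src + i, CACHE_LINE);  /* one line per step */
              }
              if (i < n)
                      memcpy(dst + i, src + i, n - i);       /* tail */
      }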
  3. 24 Mar, 2014 4 commits
  4. 19 Mar, 2014 16 commits