- 13 Jun, 2014 5 commits
-
-
Devin Kim authored
temp[64] is used as an internal temporary buffer for the dname in dynamic_dname(), but it is too small: the dname's size may exceed 64 bytes, and in that case the function returns -ENAMETOOLONG. Increase the buffer size to 256 to avoid this issue. The following warning was caused by the small buffer.
WARNING: at /kernel/mm/page_alloc.c:2470 __alloc_pages_nodemask+0x24c/0x938()
CPU: 2 PID: 505 Comm: android.bg Not tainted 3.10.0-g2f73780-00003-g2ff41d9-dirty #13
[<c010ba3c>] (unwind_backtrace+0x0/0x11c) from [<c0109cac>] (show_stack+0x10/0x14)
[<c0109cac>] (show_stack+0x10/0x14) from [<c01939a0>] (warn_slowpath_common+0x48/0x68)
[<c01939a0>] (warn_slowpath_common+0x48/0x68) from [<c0193a7c>] (warn_slowpath_null+0x18/0x20)
[<c0193a7c>] (warn_slowpath_null+0x18/0x20) from [<c0222454>] (__alloc_pages_nodemask+0x24c/0x938)
[<c0222454>] (__alloc_pages_nodemask+0x24c/0x938) from [<c0222b50>] (__get_free_pages+0x10/0x24)
[<c0222b50>] (__get_free_pages+0x10/0x24) from [<c024faf8>] (kmalloc_order_trace+0x24/0xf0)
[<c024faf8>] (kmalloc_order_trace+0x24/0xf0) from [<c024fe20>] (__kmalloc+0x30/0x244)
[<c024fe20>] (__kmalloc+0x30/0x244) from [<c02723c8>] (seq_read+0x270/0x464)
[<c02723c8>] (seq_read+0x270/0x464) from [<c0256a18>] (vfs_read+0xa4/0x134)
[<c0256a18>] (vfs_read+0xa4/0x134) from [<c0256de8>] (SyS_read+0x38/0x68)
[<c0256de8>] (SyS_read+0x38/0x68) from [<c0106140>] (ret_fast_syscall+0x0/0x30)
Change-Id: I74f5217ba3c4be73e91f33f900f1f0c26810cc05 Signed-off-by:
Devin Kim <dojip.kim@lge.com>
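For context, a minimal sketch of dynamic_dname() with the enlarged buffer, assuming the 3.x fs/dcache.c shape (the exact body in this tree may differ slightly):

```c
/* fs/dcache.c -- sketch only: temp[] grown from 64 to 256 bytes */
static char *dynamic_dname(struct dentry *dentry, char *buffer, int buflen,
			   const char *fmt, ...)
{
	va_list args;
	char temp[256];		/* was temp[64]; long dnames overflowed it */
	int sz;

	va_start(args, fmt);
	sz = vsnprintf(temp, sizeof(temp), fmt, args) + 1;
	va_end(args);

	/* still reject names that do not fit the (larger) buffer or the caller's */
	if (sz > sizeof(temp) || sz > buflen)
		return ERR_PTR(-ENAMETOOLONG);

	buffer += buflen - sz;
	return memcpy(buffer, temp, sz);
}
```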
-
Devin Kim authored
Once we'd freed m->buf, m->count should become zero - we have no valid contents reachable via m->buf. Cherry-picked from https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/fs/seq_file.c?id=801a76050bcf8d4e500eb8d048ff6265f37a61c8 Change-Id: I4c1d3e69db4ecf5362e2a5d05bfd7db754dc6dc6 Reported-by:
Charley (Hao Chuan) Chu <charley.chu@broadcom.com> Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by:
Devin Kim <dojip.kim@lge.com>
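A sketch of the pattern the cherry-picked commit fixes in fs/seq_file.c's buffer-grow path (the same reset is needed wherever m->buf is freed and reallocated):

```c
		/* grow the output buffer: the old contents are discarded */
		kfree(m->buf);
		m->count = 0;	/* buf is gone, so there are no valid contents to count */
		m->buf = kmalloc(m->size <<= 1, GFP_KERNEL);
		if (!m->buf)
			goto Efault;	/* error label as in 3.x seq_read(); may differ here */
```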
-
Devin Kim authored
This issue was first pointed out by Jiaxing Wang several months ago, but drew no further comments: https://lkml.org/lkml/2013/6/29/41
As we know, pread() does not change f_pos, so after pread(), file->f_pos and m->read_pos become different. And seq_lseek() does not update file->f_pos if offset equals m->read_pos, so after pread() and seq_lseek() (an lseek to m->read_pos), a subsequent read may read from the wrong position. The following program produces the problem:
char str1[32] = { 0 };
char str2[32] = { 0 };
int poffset = 10;
int count = 20;
/* open any seq file */
int fd = open("/proc/modules", O_RDONLY);
pread(fd, str1, count, poffset);
printf("pread:%s\n", str1);
/* seek to where m->read_pos is */
lseek(fd, poffset+count, SEEK_SET);
/* supposed to read from poffset+count, but this reads from position 0 */
read(fd, str2, count);
printf("read:%s\n", str2);
Output:
pread: ck_netbios_ns 12665
read: nf_conntrack_netbios
/proc/modules:
nf_conntrack_netbios_ns 12665 0 - Live 0xffffffffa038b000
nf_conntrack_broadcast 12589 1 nf_conntrack_netbios_ns, Live 0xffffffffa0386000
So we always update file->f_pos to offset in seq_lseek() to fix this issue. Cherry-picked from https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/fs/seq_file.c?id=05e16745c0c471bba313961b605b6da3b21a853d Signed-off-by:
Jiaxing Wang <hello.wjx@gmail.com> Signed-off-by:
Gu Zheng <guz.fnst@cn.fujitsu.com> Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk> Conflicts: fs/seq_file.c Change-Id: If419b92498e2c3e08669a4342e9b9ebf99ad3768 Signed-off-by:
Devin Kim <dojip.kim@lge.com>
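A sketch of the resulting seq_lseek() logic, per the cherry-picked commit: f_pos is synced to the requested offset even when it already equals m->read_pos:

```c
	retval = offset;
	if (offset != m->read_pos) {
		/* ... traverse() to the new position and update m->read_pos, as before ... */
	} else {
		/* offset == m->read_pos: pread() may still have left file->f_pos
		 * somewhere else, so always bring it back in sync */
		file->f_pos = offset;
	}
```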
-
Jongrak Kwon authored
Improved ghost touch issues by:
- Upgrading the baseline check algorithm
- Adjusting touch sensitivity
This does not help in all cases for b/9236385. Bug: 7725315 Bug: 9236385 Change-Id: I1850427bac84620f34cad61bdfd9b16bbf692e32 Signed-off-by:
Jongrak Kwon <jongrak.kwon@lge.com>
-
Naseer Ahmed authored
When there is no secure pipe configured, we can unmap secure memory instead of relying on an unset with the secure flag set from userspace. Bug: 11857675 Signed-off-by:
Naseer Ahmed <naseer@codeaurora.org>
-
- 06 Apr, 2014 29 commits
-
-
hellsgod authored
-
Greg Kroah-Hartman authored
staging: speakup: Prefix externally-visible symbols commit ca2beaf84d9678c12b17d92623f0e90829d6ca13 upstream. This prefixes all externally-visible symbols of speakup with "spk_". Signed-off-by:
Samuel Thibault <samuel.thibault@ens-lyon.org> Cc: Kamal Mostafa <kamal@canonical.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org> ext4: atomically set inode->i_flags in ext4_set_inode_flags() commit 00a1a053ebe5febcfc2ec498bd894f035ad2aa06 upstream. Use cmpxchg() to atomically set i_flags instead of clearing out the S_IMMUTABLE, S_APPEND, etc. flags and then setting them from the EXT4_IMMUTABLE_FL, EXT4_APPEND_FL flags, since this opens up a race where an immutable file has the immutable flag cleared for a brief window of time. Reported-by:
John Sullivan <jsrhbz@kanargh.force9.co.uk> Signed-off-by:
"Theodore Ts'o" <tytso@mit.edu> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org> Input: synaptics - add manual min/max quirk commit 421e08c41fda1f0c2ff6af81a67b491389b653a5 upstream. The new Lenovo Haswell series (-40's) contains a new Synaptics touchpad. However, these new Synaptics devices report bad axis ranges. Under Windows, it is not a problem because the Windows driver uses RMI4 over SMBus to talk to the device. Under Linux, we are using the PS/2 fallback interface, and it turns out the reported ranges are wrong. Of course, it would be too easy to have only one range for the whole series; each touchpad seems to be calibrated in a different way. We cannot use SMBus to get the actual range because I suspect the firmware will switch into SMBus mode and stop talking through PS/2 (this is the case for hybrid HID over I2C / PS/2 Synaptics touchpads). So as a temporary solution (until RMI4 lands upstream), start a new list of quirks with the min/max manually set. Signed-off-by:
Benjamin Tissoires <benjamin.tissoires@redhat.com> Signed-off-by:
Dmitry Torokhov <dmitry.torokhov@gmail.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org> Input: synaptics - add manual min/max quirk for ThinkPad X240 commit 8a0435d958fb36d93b8df610124a0e91e5675c82 upstream. This extends Benjamin Tissoires manual min/max quirk table with support for the ThinkPad X240. Signed-off-by:
Hans de Goede <hdegoede@redhat.com> Signed-off-by:
Dmitry Torokhov <dmitry.torokhov@gmail.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org> x86: fix boot on uniprocessor systems commit 825600c0f20e595daaa7a6dd8970f84fa2a2ee57 upstream. On x86 uniprocessor systems topology_physical_package_id() returns -1, which causes rapl_cpu_prepare() to leave the rapl_pmu variable uninitialized, which leads to a GPF in rapl_pmu_init(). See arch/x86/kernel/cpu/perf_event_intel_rapl.c. It turns out that physical_package_id and core_id can actually be retrieved for uniprocessor systems too. Enabling them also fixes the rapl_pmu code. Signed-off-by:
Artem Fetishev <artem_fetishev@epam.com> Cc: Stephane Eranian <eranian@google.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org> netfilter: nf_conntrack_dccp: fix skb_header_pointer API usages commit b22f5126a24b3b2f15448c3f2a254fc10cbc2b92 upstream. Some occurrences in the netfilter tree use skb_header_pointer() in the following way ... struct dccp_hdr _dh, *dh; ... skb_header_pointer(skb, dataoff, sizeof(_dh), &dh); ... where dh itself is a pointer that is being passed as the copy buffer. Instead, we need to use &_dh as the fourth argument so that we're copying the data into an actual buffer that sits on the stack. Currently, we probably could overwrite memory on the stack (e.g. with a possibly malformed DCCP packet), but unintentionally, as we only want the buffer to be placed into the _dh variable. Fixes: 2bc78049 ("[NETFILTER]: nf_conntrack: add DCCP protocol support") Signed-off-by:
Daniel Borkmann <dborkman@redhat.com> Signed-off-by:
Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
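For the DCCP conntrack fix, a sketch of the corrected skb_header_pointer() usage: the on-stack _dh is the copy buffer (fourth argument) and the returned pointer is what gets used (the error value shown is illustrative):

```c
	struct dccp_hdr _dh, *dh;

	/* copy the header into the on-stack buffer and use the returned pointer;
	 * passing &dh here (the old bug) would scribble over stack memory instead */
	dh = skb_header_pointer(skb, dataoff, sizeof(_dh), &_dh);
	if (dh == NULL)
		return -NF_ACCEPT;
```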
-
Hareesh Gundu authored
Decrement the entry refcount, which was incremented in kgsl_sharedmem_find_region(). CRs-Fixed: 635747 Change-Id: I621ba8f8e119a9ab8ba5455b28a565e3cae2f7cd Signed-off-by:
Hareesh Gundu <hareeshg@codeaurora.org>
-
anarkia1976 authored
-
anarkia1976 authored
-
anarkia1976 authored
-
Paul Reioux authored
Fix a merge derp, pointed out by neobuddy89. Signed-off-by:
Paul Reioux <reioux@gmail.com>
-
Paul Reioux authored
Signed-off-by:
Paul Reioux <reioux@gmail.com>
-
Oleg Nesterov authored
while_each_thread() and next_thread() should die, almost every lockless usage is wrong.
1. Unless g == current, the lockless while_each_thread() is not safe. while_each_thread(g, t) can loop forever if g exits, next_thread() can't reach the unhashed thread in this case. Note that this can happen even if g is the group leader, it can exec.
2. Even if while_each_thread() itself was correct, people often use it wrongly. It was never safe to just take rcu_read_lock() and loop unless you verify that pid_alive(g) == T, even the first next_thread() can point to the already freed/reused memory.
This patch adds signal_struct->thread_head and task->thread_node to create the normal rcu-safe list with the stable head. The new for_each_thread(g, t) helper is always safe under rcu_read_lock() as long as this task_struct can't go away.
Note: of course it is ugly to have both task_struct->thread_node and the old task_struct->thread_group, we will kill it later, after we change the users of while_each_thread() to use for_each_thread().
Perhaps we can kill it even before we convert all users, we can reimplement next_thread(t) using the new thread_head/thread_node. But we can't do this right now because this will lead to subtle behavioural changes. For example, do/while_each_thread() always sees at least one task, while for_each_thread() can do nothing if the whole thread group has died. Or thread_group_empty(), currently its semantics is not clear unless thread_group_leader(p) and we need to audit the callers before we can change it.
So this patch adds the new interface which has to coexist with the old one for some time, hopefully the next changes will be more or less straightforward and the old one will go away soon. Signed-off-by:
Oleg Nesterov <oleg@redhat.com> Reviewed-by:
Sergey Dyasly <dserrg@gmail.com> Tested-by:
Sergey Dyasly <dserrg@gmail.com> Reviewed-by:
Sameer Nanda <snanda@chromium.org> Acked-by:
David Rientjes <rientjes@google.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mandeep Singh Baines <msb@chromium.org> Cc: "Ma, Xindong" <xindong.ma@intel.com> Cc: Michal Hocko <mhocko@suse.cz> Cc: "Tu, Xiaobing" <xiaobing.tu@intel.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by:
Paul Reioux <reioux@gmail.com> backported to Linux 3.4 Conflicts: kernel/fork.c
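A sketch of the new interface and its intended use, following the description above (the list head lives in signal_struct, the node in task_struct):

```c
/* rcu-safe thread iteration with a stable head (sketch of the new helpers) */
#define __for_each_thread(signal, t) \
	list_for_each_entry_rcu(t, &(signal)->thread_head, thread_node)

#define for_each_thread(p, t) \
	__for_each_thread((p)->signal, t)

/* usage: safe under rcu_read_lock() as long as the task_struct g can't go away */
struct task_struct *g = current, *t;

rcu_read_lock();
for_each_thread(g, t) {
	/* ... inspect t; the walk terminates even if g has since exited ... */
}
rcu_read_unlock();
```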
-
David Rientjes authored
A 3% of system memory bonus is sometimes too excessive in comparison to other processes. With commit a63d83f4 ("oom: badness heuristic rewrite"), the OOM killer tries to avoid killing privileged tasks by subtracting 3% of overall memory (system or cgroup) from their per-task consumption. But as a result, all root tasks that consume less than 3% of overall memory are considered equal, and so it only takes 33+ privileged tasks pushing the system out of memory for the OOM killer to do something stupid and kill dhclient or other root-owned processes. For example, on a 32G machine it can't tell the difference between the 1M agetty and the 10G fork bomb member. The changelog describes this 3% boost as the equivalent to the global overcommit limit being 3% higher for privileged tasks, but this is not the same as discounting 3% of overall memory from _every privileged task individually_ during OOM selection. Replace the 3% of system memory bonus with a 3% of current memory usage bonus. By giving root tasks a bonus that is proportional to their actual size, they remain comparable even when relatively small. In the example above, the OOM killer will discount the 1M agetty's 256 badness points down to 179, and the 10G fork bomb's 262144 points down to 183500 points and make the right choice, instead of discounting both to 0 and killing agetty because it's first in the task list. Signed-off-by:
David Rientjes <rientjes@google.com> Reported-by:
Johannes Weiner <hannes@cmpxchg.org> Acked-by:
Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.cz> Cc: <stable@vger.kernel.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
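Roughly, the change to the root bonus in oom_badness() looks like this (a sketch; surrounding code elided):

```c
	/*
	 * Root bonus: previously the heuristic discounted the equivalent of
	 * 3% of total memory, which collapsed every small root task to the
	 * same score; now it discounts 3% of the task's own badness points.
	 */
	if (has_capability_noaudit(p, CAP_SYS_ADMIN))
		points -= (points * 3) / 100;
```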
-
David Rientjes authored
When two threads have the same badness score, it's preferable to kill the thread group leader so that the actual process name is printed to the kernel log rather than the thread group name which may be shared amongst several processes. This was the behavior when select_bad_process() used to do for_each_process(), but it now iterates threads instead and leads to ambiguity. Signed-off-by:
David Rientjes <rientjes@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.cz> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Greg Thelen <gthelen@google.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Oleg Nesterov authored
find_lock_task_mm() expects it is called under rcu or tasklist lock, but it seems that at least oom_unkillable_task()->task_in_mem_cgroup() and mem_cgroup_out_of_memory()->oom_badness() can call it lockless. Perhaps we could fix the callers, but this patch simply adds rcu lock into find_lock_task_mm(). This also allows to simplify a bit one of its callers, oom_kill_process(). Signed-off-by:
Oleg Nesterov <oleg@redhat.com> Cc: Sergey Dyasly <dserrg@gmail.com> Cc: Sameer Nanda <snanda@chromium.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mandeep Singh Baines <msb@chromium.org> Cc: "Ma, Xindong" <xindong.ma@intel.com> Reviewed-by:
Michal Hocko <mhocko@suse.cz> Cc: "Tu, Xiaobing" <xiaobing.tu@intel.com> Acked-by:
David Rientjes <rientjes@google.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Oleg Nesterov authored
At least out_of_memory() calls has_intersects_mems_allowed() without even rcu_read_lock(), this is obviously buggy. Add the necessary rcu_read_lock(). This means that we can not simply return from the loop, we need "bool ret" and "break". While at it, swap the names of task_struct's (the argument and the local). This cleans up the code a little bit and avoids the unnecessary initialization. Signed-off-by:
Oleg Nesterov <oleg@redhat.com> Reviewed-by:
Sergey Dyasly <dserrg@gmail.com> Tested-by:
Sergey Dyasly <dserrg@gmail.com> Reviewed-by:
Sameer Nanda <snanda@chromium.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mandeep Singh Baines <msb@chromium.org> Cc: "Ma, Xindong" <xindong.ma@intel.com> Reviewed-by:
Michal Hocko <mhocko@suse.cz> Cc: "Tu, Xiaobing" <xiaobing.tu@intel.com> Acked-by:
David Rientjes <rientjes@google.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
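A simplified sketch of the resulting shape (the NUMA/mempolicy branch of the real check is omitted here):

```c
static bool has_intersects_mems_allowed(struct task_struct *start,
					const nodemask_t *mask)
{
	struct task_struct *tsk = start;
	bool ret = false;

	rcu_read_lock();			/* was missing for some callers */
	do {
		if (cpuset_mems_allowed_intersects(current, tsk)) {
			ret = true;		/* break instead of returning ... */
			break;			/* ... so the unlock below always runs */
		}
	} while_each_thread(start, tsk);
	rcu_read_unlock();

	return ret;
}
```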
-
Oleg Nesterov authored
Change oom_kill.c to use for_each_thread() rather than the racy while_each_thread() which can loop forever if we race with exit. Note also that most users were buggy even if while_each_thread() was fine, the task can exit even _before_ rcu_read_lock(). Fortunately the new for_each_thread() only requires the stable task_struct, so this change fixes both problems. Signed-off-by:
Oleg Nesterov <oleg@redhat.com> Reviewed-by:
Sergey Dyasly <dserrg@gmail.com> Tested-by:
Sergey Dyasly <dserrg@gmail.com> Reviewed-by:
Sameer Nanda <snanda@chromium.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mandeep Singh Baines <msb@chromium.org> Cc: "Ma, Xindong" <xindong.ma@intel.com> Reviewed-by:
Michal Hocko <mhocko@suse.cz> Cc: "Tu, Xiaobing" <xiaobing.tu@intel.com> Acked-by:
David Rientjes <rientjes@google.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Rusty Russell authored
The normal expectation for ERR_PTR() is to put a negative errno into a pointer. oom_kill puts the magic -1 in the result (and has since pre-git), which is probably clearer with an explicit cast. Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Rusty Russell <rusty@rustcorp.com.au>
-
David Rientjes authored
To lock the entire system from parallel oom killing, it's possible to pass in a zonelist with all zones rather than using for_each_populated_zone() for the iteration. This obsoletes try_set_system_oom() and clear_system_oom() so that they can be removed. Signed-off-by:
David Rientjes <rientjes@google.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Reviewed-by:
Michal Hocko <mhocko@suse.cz> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Lai Jiangshan authored
N_HIGH_MEMORY stands for the nodes that have normal or high memory. N_MEMORY stands for the nodes that have any memory. The code here needs to handle the nodes which have memory, so we should use N_MEMORY instead. Signed-off-by:
Lai Jiangshan <laijs@cn.fujitsu.com> Acked-by:
Hillf Danton <dhillf@gmail.com> Signed-off-by:
Wen Congyang <wency@cn.fujitsu.com> Cc: Christoph Lameter <cl@linux.com> Cc: Lin Feng <linfeng@cn.fujitsu.com> Cc: David Rientjes <rientjes@google.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
David Rientjes authored
Exiting threads, those with PF_EXITING set, can pagefault and require memory before they can make forward progress. This happens, for instance, when a process must fault task->robust_list, a userspace structure, before detaching its memory. These threads also aren't guaranteed to get access to memory reserves unless oom killed or killed from userspace. The oom killer won't grant memory reserves if other threads are also exiting other than current and stalling at the same point. This prevents needlessly killing processes when others are already exiting. Instead of special casing all the possible situations between PF_EXITING getting set and a thread detaching its mm where it may allocate memory, which probably wouldn't get updated when a change is made to the exit path, the solution is to give all exiting threads access to memory reserves if they call the oom killer. This allows each of them to quickly allocate, detach its mm, and free the memory it represents.
Summary of Luigi's bug report:
: He had an oom condition where threads were faulting on task->robust_list
: and repeatedly called the oom killer but it would defer killing a thread
: because it saw other PF_EXITING threads. This can happen anytime we need
: to allocate memory after setting PF_EXITING and before detaching our mm;
: if there are other threads in the same state then the oom killer won't do
: anything unless one of them happens to be killed from userspace.
:
: So instead of only deferring for PF_EXITING and !task->robust_list, it's
: better to just give them access to memory reserves to prevent a potential
: livelock so that any other faults that may be introduced in the future in
: the exit path don't cause the same problem (and hopefully we don't allow
: too many of those!).
Signed-off-by:
David Rientjes <rientjes@google.com> Acked-by:
Minchan Kim <minchan@kernel.org> Tested-by:
Luigi Semenzato <semenzato@google.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
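The mechanism, roughly: if the caller is already exiting, mark it TIF_MEMDIE (which is what grants access to the memory reserves) and return rather than selecting a victim. A sketch of the check in out_of_memory():

```c
	/*
	 * If current is exiting it only needs memory to make forward
	 * progress towards releasing its mm, so give it access to the
	 * reserves and bail out instead of deferring or killing.
	 */
	if (current->flags & PF_EXITING) {
		set_thread_flag(TIF_MEMDIE);
		return;
	}
```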
-
Peter Zijlstra authored
Since commit 2f36825b ("sched: Next buddy hint on sleep and preempt path") it is likely we pick a new task from the same cgroup; doing a put and then set on all intermediate entities is a waste of time, so try to avoid this.
Measured using:
mount nodev /cgroup -t cgroup -o cpu
cd /cgroup
mkdir a; cd a
mkdir b; cd b
mkdir c; cd c
echo $$ > tasks
perf stat --repeat 10 -- taskset 1 perf bench sched pipe
PRE : 4.542422684 seconds time elapsed ( +- 0.33% )
POST: 4.389409991 seconds time elapsed ( +- 0.32% )
Which shows a significant improvement of ~3.5%. Signed-off-by:
Peter Zijlstra <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tejun Heo <tj@kernel.org> Link: http://lkml.kernel.org/r/1328936700.2476.17.camel@laptop Signed-off-by:
Ingo Molnar <mingo@kernel.org> Signed-off-by:
Paul Reioux <reioux@gmail.com> Backported for Linux 3.4 kernel Conflicts: kernel/sched/fair.c
-
Peter Zijlstra authored
Use for_each_cpu_and() and thereby avoid computing the capacity for CPUs we know we're not interested in. Reviewed-by:
Paul Turner <pjt@google.com> Reviewed-by:
Preeti U Murthy <preeti@linux.vnet.ibm.com> Signed-off-by:
Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/n/tip-lppceyv6kb3a19g8spmrn20b@git.kernel.org Signed-off-by:
Ingo Molnar <mingo@kernel.org> Signed-off-by:
Paul Reioux <reioux@gmail.com> backported for Linux 3.4 Conflicts: kernel/sched/fair.c
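A sketch of the idiom, assuming a find_busiest_queue()-style loop in kernel/sched/fair.c (variable names may differ in this tree):

```c
	/*
	 * Before: walk every CPU in the group and skip the ones not in
	 * env->cpus.  After: intersect the two masks up front, so the
	 * capacity/load computation only runs for CPUs we care about.
	 */
	for_each_cpu_and(i, sched_group_cpus(group), env->cpus) {
		/* ... compute capacity and load for CPU i ... */
	}
```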
-
hellsgod authored
By globally defining check_panic_on_oom(), the memcg oom handler can be moved entirely to mm/memcontrol.c. This removes the ugly #ifdef in the oom killer and cleans up the code. Signed-off-by:
David Rientjes <rientjes@google.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by:
Michal Hocko <mhocko@suse.cz> Cc: Oleg Nesterov <oleg@redhat.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by:
Paul Reioux <reioux@gmail.com> Conflicts: mm/oom_kill.c
-
David Rientjes authored
Since exiting tasks require write_lock_irq(&tasklist_lock) several times, try to reduce the amount of time the readside is held for oom kills. This makes the interface with the memcg oom handler more consistent since it now never needs to take tasklist_lock unnecessarily. The only time the oom killer now takes tasklist_lock is when iterating the children of the selected task, everything else is protected by rcu_read_lock(). This requires that a reference to the selected process, p, is grabbed before calling oom_kill_process(). It may release it and grab a reference on another one of p's threads if !p->mm, but it also guarantees that it will release the reference before returning. [hughd@google.com: fix duplicate put_task_struct()] Signed-off-by:
David Rientjes <rientjes@google.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Reviewed-by:
Michal Hocko <mhocko@suse.cz> Cc: Oleg Nesterov <oleg@redhat.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by:
Paul Reioux <reioux@gmail.com> Conflicts: mm/oom_kill.c
-
David Rientjes authored
The global oom killer is serialized by the per-zonelist try_set_zonelist_oom() which is used in the page allocator. Concurrent oom kills are thus a rare event and only occur in systems using mempolicies and with a large number of nodes. Memory controller oom kills, however, can frequently be concurrent since there is no serialization once the oom killer is called for oom conditions in several different memcgs in parallel. This creates a massive contention on tasklist_lock since the oom killer requires the readside for the tasklist iteration. If several memcgs are calling the oom killer, this lock can be held for a substantial amount of time, especially if threads continue to enter it as other threads are exiting. Since the exit path grabs the writeside of the lock with irqs disabled in a few different places, this can cause a soft lockup on cpus as a result of tasklist_lock starvation. The kernel lacks unfair writelocks, and successful calls to the oom killer usually result in at least one thread entering the exit path, so an alternative solution is needed. This patch introduces a separate oom handler for memcgs so that they do not require tasklist_lock for as much time. Instead, it iterates only over the threads attached to the oom memcg and grabs a reference to the selected thread before calling oom_kill_process() to ensure it doesn't prematurely exit. This still requires tasklist_lock for the tasklist dump, iterating children of the selected process, and killing all other threads on the system sharing the same memory as the selected victim. So while this isn't a complete solution to tasklist_lock starvation, it significantly reduces the amount of time that it is held. Acked-by:
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by:
Michal Hocko <mhocko@suse.cz> Signed-off-by:
David Rientjes <rientjes@google.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Reviewed-by:
Sha Zhengju <handai.szj@taobao.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
David Rientjes authored
This patch introduces a helper function to process each thread during the iteration over the tasklist. A new return type, enum oom_scan_t, is defined to determine the future behavior of the iteration:
- OOM_SCAN_OK: continue scanning the thread and find its badness,
- OOM_SCAN_CONTINUE: do not consider this thread for oom kill, it's ineligible,
- OOM_SCAN_ABORT: abort the iteration and return, or
- OOM_SCAN_SELECT: always select this thread with the highest badness possible.
There is no functional change with this patch. This new helper function will be used in the next patch in the memory controller. Reviewed-by:
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by:
KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Reviewed-by:
Michal Hocko <mhocko@suse.cz> Signed-off-by:
David Rientjes <rientjes@google.com> Cc: Oleg Nesterov <oleg@redhat.com> Reviewed-by:
Sha Zhengju <handai.szj@taobao.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
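The new return type, matching the values listed in the message:

```c
/* include/linux/oom.h -- controls what the tasklist iteration does next */
enum oom_scan_t {
	OOM_SCAN_OK,		/* scan this thread and compute its badness */
	OOM_SCAN_CONTINUE,	/* ineligible: do not consider it for oom kill */
	OOM_SCAN_ABORT,		/* abort the iteration and return */
	OOM_SCAN_SELECT,	/* always select this thread, highest badness possible */
};
```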
-
David Rientjes authored
mem_cgroup_out_of_memory() is defined in mm/oom_kill.c, so declare it in linux/oom.h rather than linux/memcontrol.h. Acked-by:
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by:
KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Acked-by:
Michal Hocko <mhocko@suse.cz> Signed-off-by:
David Rientjes <rientjes@google.com> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by:
Paul Reioux <reioux@gmail.com> Conflicts: include/linux/memcontrol.h
-
David Rientjes authored
The divide in p->signal->oom_score_adj * totalpages / 1000 within oom_badness() was causing an overflow of the signed long data type. This adds both the root bias and p->signal->oom_score_adj before doing the normalization which fixes the issue and also cleans up the calculation. Tested-by:
Dave Jones <davej@redhat.com> Signed-off-by:
David Rientjes <rientjes@google.com> Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
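A sketch of the reworked tail of oom_badness() as described: the root bias and oom_score_adj are combined first and only then scaled, so the old p->signal->oom_score_adj * totalpages / 1000 divide (and its overflow) goes away. Variable names follow the 3.x heuristic and may differ here:

```c
	long adj = (long)p->signal->oom_score_adj;

	/* root bonus: the equivalent of lowering oom_score_adj by 30 (3%) */
	if (has_capability_noaudit(p, CAP_SYS_ADMIN))
		adj -= 30;

	/* normalize to badness units and fold into the score in one place */
	adj *= totalpages / 1000;
	points += adj;
```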
-
David Rientjes authored
If the privileges given to root threads (3% of allowable memory) or a negative value of /proc/pid/oom_score_adj happen to exceed the amount of rss of a thread, its badness score overflows as a result of commit a7f638f999ff ("mm, oom: normalize oom scores to oom_score_adj scale only for userspace"). Fix this by making the type signed and return 1, meaning the thread is still eligible for kill, if the value is negative. Reported-by:
Dave Jones <davej@redhat.com> Acked-by:
Oleg Nesterov <oleg@redhat.com> Signed-off-by:
David Rientjes <rientjes@google.com> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
David Rientjes authored
The oom_score_adj scale ranges from -1000 to 1000 and represents the proportion of memory available to the process at allocation time. This means an oom_score_adj value of 300, for example, will bias a process as though it was using an extra 30.0% of available memory and a value of -350 will discount 35.0% of available memory from its usage. The oom killer badness heuristic also uses this scale to report the oom score for each eligible process in determining the "best" process to kill. Thus, it can only differentiate each process's memory usage by 0.1% of system RAM. On large systems, this can end up being a large amount of memory: 256MB on 256GB systems, for example. This can be fixed by having the badness heuristic to use the actual memory usage in scoring threads and then normalizing it to the oom_score_adj scale for userspace. This results in better comparison between eligible threads for kill and no change from the userspace perspective. Suggested-by:
KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Tested-by:
Dave Jones <davej@redhat.com> Signed-off-by:
David Rientjes <rientjes@google.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Paul Reioux authored
This patch introduces three new sysctls to /proc/sys/vm: wmark_min_kbytes, wmark_low_kbytes and wmark_high_kbytes. Each entry is used to compute watermark[min], watermark[low] and watermark[high] for each zone. These parameters are also updated when min_free_kbytes is changed, because originally they are set based on min_free_kbytes. On the other hand, min_free_kbytes is updated when wmark_free_kbytes changes. By using these parameters one can adjust the differences among watermark[min], watermark[low] and watermark[high], and as a result one can tune the kernel reclaim behaviour to fit their requirements. Signed-off-by:
Satoru Moriya <satoru.moriya@hds.com> modified and tuned for Hammerhead Signed-off-by:
Paul Reioux <reioux@gmail.com>
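A sketch of what one of the new entries could look like in the vm sysctl table; the variable and handler names here are assumptions for illustration, not taken from the patch:

```c
/* kernel/sysctl.c, vm_table -- hypothetical entry; wmark_low_kbytes and
 * wmark_high_kbytes would follow the same pattern */
{
	.procname	= "wmark_min_kbytes",
	.data		= &wmark_min_kbytes,		/* assumed global, in kB */
	.maxlen		= sizeof(wmark_min_kbytes),
	.mode		= 0644,
	.proc_handler	= wmark_kbytes_sysctl_handler,	/* assumed: recomputes zone watermarks */
},
```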
-
- 31 Mar, 2014 6 commits
-
-
Francisco Franco authored
cpufreq: break earlier if target_freq is equal to current freq. Also fetch a fix from @imoseyon to update other cpuX nodes when cpu0 gets updated (max/min/gov). Signed-off-by:
Francisco Franco <franciscofranco.1990@gmail.com>
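A sketch of the early bail-out described above, placed in the frequency-target path (the exact location varies by tree):

```c
	/* nothing to do if we are already running at the requested frequency */
	if (target_freq == policy->cur)
		return 0;
```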
-
hellsgod authored
-
Stephen Boyd authored
Now that we have clk_prepare/unprepare we can make the RPM clocks sleepable. This allows us to move the sometimes costly busy wait that RPM clocks incur when enabling, disabling, or changing rates out of atomic context. CRs-Fixed: 552223 Change-Id: I8ac53c0b7fc79e56051b19fedb6910ac3f1cda42 Signed-off-by:
Stephen Boyd <sboyd@codeaurora.org> Git-commit: b500badb5dc821dd92f93833003170cb9ae106b0 Git-repo: https://android.googlesource.com/kernel/msm/ Signed-off-by:
Srinivasarao P <spathi@codeaurora.org>
-
myfluxi authored
Cyanogen's uevent commit (sent when the governor changes) was rendered non-working since KitKat (or so), as the uevent filter function caused the events to be dropped. Now hook into cpufreq_core_init() and cpufreq_add_dev_interface() to create our basic ksets for cpufreq and cpu devices. Also, we don't need to set environmental data, so clean it up a bit. This commit requires a change in ueventd.rc that adds rules for several files of interest. Change-Id: I3aafa0d4e18363e1d68535f513099ecd27024007
-
Steve Kondik authored
* Useful so userspace tools can reconfigure. Change-Id: Ib423910b8b9ac791ebe81a75bf399f58272f64f2
-
Venkatesh Yadav Abbarapu authored
Highlights of the changes are:
1. A workqueue is used to configure ADM only when the clocks need to be prepared & enabled. A function call is used to configure ADM in cases where the ADM driver is invoked for data transfer when the clock is already prepared & enabled.
2. Replaced threaded irqs with hard irqs.
Change-Id: Ifaa5efdde8150f932a12d4f9eeccfb36ecb8f88f Acked-by:
John Nicholas <jnichola@qti.qualcomm.com> Signed-off-by:
Venkatesh Yadav Abbarapu <quicvenkat@codeaurora.org>
-