1. 25 Jul, 2008 9 commits
    • memcg: clean up checking of the disabled flag · cede86ac
      Li Zefan authored
      
      These checks are unnecessary: when the subsystem is disabled it can't
      be mounted, so these functions won't get called.

      The check is only needed in functions which can be called from places
      other than cgroup code.
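
      A minimal sketch of the pattern, assuming the subsystem's disabled
      flag of this era (the caller shown is hypothetical):

        /* keep the check only on paths reachable from outside cgroup code */
        int mem_cgroup_charge_sketch(struct page *page, struct mm_struct *mm)
        {
                if (mem_cgroup_subsys.disabled)
                        return 0;       /* controller off: account nothing */
                /* ... the normal charging path ... */
                return 0;
        }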
      
      [hugh@veritas.com: further checking of disabled flag]
      Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
      Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>
      Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • memcg: remove a redundant check · accf163e
      KAMEZAWA Hiroyuki authored
      
      Because of the refcnt removal patch, it is now very rare for
      mem_cgroup_charge_common() to be called against a page which is
      already accounted.

      mem_cgroup_charge_common() is called when:
       1. a page is added into file cache.
       2. an anon page is _newly_ mapped.

      A racy case is that a newly-swapped-in anonymous page is referenced
      by multiple threads in do_swap_page() at the same time.
      (A page is not locked when mem_cgroup_charge() is called from
      do_swap_page.)

      Another case is shmem, which charges its page before calling
      add_to_page_cache(); mem_cgroup_cache_charge() is then called twice.
      That case is handled in mem_cgroup_cache_charge(), but the check may
      be too hacky...
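
      A sketch of the double-charge check (helper names as used elsewhere
      in this log; details assumed):

        lock_page_cgroup(page);
        if (page_get_page_cgroup(page)) {
                /* raced: another thread already charged this page */
                unlock_page_cgroup(page);
                return 0;
        }
        /* ... allocate a page_cgroup, charge, and assign it ... */
        unlock_page_cgroup(page);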
      
      Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Pavel Emelyanov <xemul@openvz.org>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Hugh Dickins <hugh@veritas.com>
      Cc: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
      Cc: Paul Menage <menage@google.com>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • memcg: add hints for branch · b76734e5
      KAMEZAWA Hiroyuki authored
      
      Show branch direction for obvious conditions.
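
      For example (a sketch; the conditions shown are illustrative):

        /* the controller is almost always enabled once configured in */
        if (unlikely(mem_cgroup_subsys.disabled))
                return 0;

        /* allocation failure is the rare path */
        pc = kmem_cache_alloc(page_cgroup_cache, gfp_mask);
        if (unlikely(pc == NULL))
                goto err;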
      Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Pavel Emelyanov <xemul@openvz.org>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Hugh Dickins <hugh@veritas.com>
      Cc: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
      Cc: Paul Menage <menage@google.com>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • memcg: helper function for reclaim from shmem · c9b0ed51
      KAMEZAWA Hiroyuki authored
      
      A new call, mem_cgroup_shrink_usage(), is added for shmem handling,
      replacing non-standard usage of mem_cgroup_charge/uncharge.

      Currently, shmem calls mem_cgroup_charge() just to reclaim some pages
      from a mem_cgroup.  In general, shmem is used by some process group
      rather than as a global resource (like file caches).  So, it's
      reasonable to reclaim pages from the mem_cgroup where shmem is mainly
      used.
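
      A sketch of the intended call site in shmem's swap-in path (the
      signature and return convention are assumed):

        /* reclaim from the mm's mem_cgroup directly, instead of charging
         * and uncharging a page just to trigger reclaim */
        if (mem_cgroup_shrink_usage(mm, gfp_mask))
                goto failed;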
      
      [hugh@veritas.com: shmem_getpage release page sooner]
      [hugh@veritas.com: mem_cgroup_shrink_usage css_put]
      Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Pavel Emelyanov <xemul@openvz.org>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
      Cc: Paul Menage <menage@google.com>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • memcg: remove refcnt from page_cgroup · 69029cd5
      KAMEZAWA Hiroyuki authored
      
      memcg: performance improvements
      
      Patch description:
       1/5 ... remove refcnt from page_cgroup patch (shmem handling is fixed)
       2/5 ... swapcache handling patch
       3/5 ... add helper function for shmem's memory reclaim patch
       4/5 ... optimize by likely/unlikely patch
       5/5 ... remove redundant check patch (shmem handling is fixed)
      
      UnixBench results:
      
      == 2.6.26-rc2-mm1 + memory resource controller
      Execl Throughput                           2915.4 lps   (29.6 secs, 3 samples)
      C Compiler Throughput                      1019.3 lpm   (60.0 secs, 3 samples)
      Shell Scripts (1 concurrent)               5796.0 lpm   (60.0 secs, 3 samples)
      Shell Scripts (8 concurrent)               1097.7 lpm   (60.0 secs, 3 samples)
      Shell Scripts (16 concurrent)               565.3 lpm   (60.0 secs, 3 samples)
      File Read 1024 bufsize 2000 maxblocks    1022128.0 KBps  (30.0 secs, 3 samples)
      File Write 1024 bufsize 2000 maxblocks   544057.0 KBps  (30.0 secs, 3 samples)
      File Copy 1024 bufsize 2000 maxblocks    346481.0 KBps  (30.0 secs, 3 samples)
      File Read 256 bufsize 500 maxblocks      319325.0 KBps  (30.0 secs, 3 samples)
      File Write 256 bufsize 500 maxblocks     148788.0 KBps  (30.0 secs, 3 samples)
      File Copy 256 bufsize 500 maxblocks       99051.0 KBps  (30.0 secs, 3 samples)
      File Read 4096 bufsize 8000 maxblocks    2058917.0 KBps  (30.0 secs, 3 samples)
      File Write 4096 bufsize 8000 maxblocks   1606109.0 KBps  (30.0 secs, 3 samples)
      File Copy 4096 bufsize 8000 maxblocks    854789.0 KBps  (30.0 secs, 3 samples)
      Dc: sqrt(2) to 99 decimal places         126145.2 lpm   (30.0 secs, 3 samples)
      
                           INDEX VALUES
      TEST                                        BASELINE     RESULT      INDEX
      
      Execl Throughput                                43.0     2915.4      678.0
      File Copy 1024 bufsize 2000 maxblocks         3960.0   346481.0      875.0
      File Copy 256 bufsize 500 maxblocks           1655.0    99051.0      598.5
      File Copy 4096 bufsize 8000 maxblocks         5800.0   854789.0     1473.8
      Shell Scripts (8 concurrent)                     6.0     1097.7     1829.5
                                                                       =========
           FINAL SCORE                                                     991.3
      
      == 2.6.26-rc2-mm1 + this set ==
      Execl Throughput                           3012.9 lps   (29.9 secs, 3 samples)
      C Compiler Throughput                       981.0 lpm   (60.0 secs, 3 samples)
      Shell Scripts (1 concurrent)               5872.0 lpm   (60.0 secs, 3 samples)
      Shell Scripts (8 concurrent)               1120.3 lpm   (60.0 secs, 3 samples)
      Shell Scripts (16 concurrent)               578.0 lpm   (60.0 secs, 3 samples)
      File Read 1024 bufsize 2000 maxblocks    1003993.0 KBps  (30.0 secs, 3 samples)
      File Write 1024 bufsize 2000 maxblocks   550452.0 KBps  (30.0 secs, 3 samples)
      File Copy 1024 bufsize 2000 maxblocks    347159.0 KBps  (30.0 secs, 3 samples)
      File Read 256 bufsize 500 maxblocks      314644.0 KBps  (30.0 secs, 3 samples)
      File Write 256 bufsize 500 maxblocks     151852.0 KBps  (30.0 secs, 3 samples)
      File Copy 256 bufsize 500 maxblocks      101000.0 KBps  (30.0 secs, 3 samples)
      File Read 4096 bufsize 8000 maxblocks    2033256.0 KBps  (30.0 secs, 3 samples)
      File Write 4096 bufsize 8000 maxblocks   1611814.0 KBps  (30.0 secs, 3 samples)
      File Copy 4096 bufsize 8000 maxblocks    847979.0 KBps  (30.0 secs, 3 samples)
      Dc: sqrt(2) to 99 decimal places         128148.7 lpm   (30.0 secs, 3 samples)
      
                           INDEX VALUES
      TEST                                        BASELINE     RESULT      INDEX
      
      Execl Throughput                                43.0     3012.9      700.7
      File Copy 1024 bufsize 2000 maxblocks         3960.0   347159.0      876.7
      File Copy 256 bufsize 500 maxblocks           1655.0   101000.0      610.3
      File Copy 4096 bufsize 8000 maxblocks         5800.0   847979.0     1462.0
      Shell Scripts (8 concurrent)                     6.0     1120.3     1867.2
                                                                       =========
           FINAL SCORE                                                    1004.6
      
      This patch:
      
      Remove refcnt from struct page_cgroup.
      
      After this,
      
       * A page is charged only when !page_mapped() && no page_cgroup is assigned.
      	* Anon page is newly mapped.
      	* File page is added to mapping->tree.
      
       * A page is uncharged only when
      	* Anon page is fully unmapped.
      	* File page is removed from LRU.
      
      There is no change in behavior from user's view.
      
      This patch also removes unnecessary calls in rmap.c which were used
      only for refcnt management.
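
      The charge rule above, sketched with helper names used elsewhere in
      this log:

        /* charge only a page that is neither mapped nor accounted yet */
        if (!page_mapped(page) && !page_get_page_cgroup(page))
                ret = mem_cgroup_charge(page, mm, gfp_mask);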
      
      [akpm@linux-foundation.org: fix warning]
      [hugh@veritas.com: fix shmem_unuse_inode charging]
      Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Pavel Emelyanov <xemul@openvz.org>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Hugh Dickins <hugh@veritas.com>
      Cc: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
      Cc: Paul Menage <menage@google.com>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • memcg: better migration handling · e8589cc1
      KAMEZAWA Hiroyuki authored
      
      This patch changes page migration under the memory controller to use
      a different algorithm.  (Thanks to Christoph for the new idea.)

      Before:
       - page_cgroup is migrated from an old page to a new page.
      After:
       - a new page is accounted; page_cgroup is not reused.
      
      Pros:

       - We can avoid complicated lock dependencies and races in migration.

      Cons:

       - A new parameter is added to mem_cgroup_charge_common().

       - mem_cgroup_getref() is added for handling ref_cnt ping-pong.

      This version simplifies the complicated lock dependency in page
      migration under the memory resource controller.
      
        The new refcnt sequence is as follows.

      a mapped page:
        prepare_migration() ..... +1 to NEW page
        try_to_unmap()      ..... all refs to OLD page are gone.
        move_pages()        ..... +1 to NEW page if page cache.
        remap...            ..... all refs from *map* are added to NEW one.
        end_migration()     ..... -1 to NEW page.

        page's mapcount + (page_is_cache) refs are added to the NEW one.
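
      In code form, roughly (a sketch; the memcg entry points are assumed
      to be mem_cgroup_prepare_migration/mem_cgroup_end_migration):

        mem_cgroup_prepare_migration(page);  /* +1 on the NEW page's account */
        /* try_to_unmap(): all references to the OLD page go away       */
        /* move_pages():   +1 to the NEW page if it is page cache       */
        /* remapping:      mapped references now land on the NEW page   */
        mem_cgroup_end_migration(newpage);   /* -1, balancing prepare's +1 */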
      Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: Pavel Emelyanov <xemul@openvz.org>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
      Cc: Hugh Dickins <hugh@veritas.com>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • memcg: avoid unnecessary initialization · 508b7be0
      KAMEZAWA Hiroyuki authored
      
      * Remove overkill initialization (in the fast path).
      * Make the condition for PAGE_CGROUP_FLAG_ACTIVE more obvious.
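
      A sketch of the second point (flag names from this series; the exact
      values are assumed):

        /* set flags once from the charge type, instead of zero-initializing
         * the whole page_cgroup and overwriting afterwards */
        pc->flags = PAGE_CGROUP_FLAG_ACTIVE;
        if (ctype == MEM_CGROUP_CHARGE_TYPE_CACHE)
                pc->flags |= PAGE_CGROUP_FLAG_CACHE;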
      Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Reviewed-by: Li Zefan <lizf@cn.fujitsu.com>
      Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>
      Acked-by: Pavel Emelyanov <xemul@openvz.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • memcg: make global var read_mostly · a181b0e8
      KAMEZAWA Hiroyuki authored
      
      mem_cgroup_subsys and page_cgroup_cache should be read_mostly and
      MEM_CGROUP_RECLAIM_RETRIES can be just a fixed number.
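
      Roughly (the retry value shown is an assumption):

        struct cgroup_subsys mem_cgroup_subsys __read_mostly;
        static struct kmem_cache *page_cgroup_cache __read_mostly;
        #define MEM_CGROUP_RECLAIM_RETRIES      5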
      Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>
      Acked-by: Pavel Emelyanov <xemul@openvz.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • cgroup files: convert res_counter_write() to be a cgroups write_string() handler · 856c13aa
      Paul Menage authored
      
      Currently res_counter_write() is a raw file handler even though it's
      ultimately taking a number, since in some cases it wants to
      pre-process the string when converting it to a number.
      
      This patch converts res_counter_write() from a raw file handler to a
      write_string() handler; this allows some of the boilerplate
      copying/locking/checking to be removed, and simplifies the cleanup
      path, since these functions are now performed by the cgroups
      framework.
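
      A sketch of the resulting handler shape (the write_string signature
      of this era; simple_strtoull stands in for the pre-processing):

        static int mem_cgroup_write(struct cgroup *cgrp, struct cftype *cft,
                                    const char *buffer)
        {
                char *end;
                /* the framework has already copied the user string into a
                 * NUL-terminated buffer and done the locking/checking */
                unsigned long long val = simple_strtoull(buffer, &end, 10);

                if (*end)
                        return -EINVAL;
                /* ... apply val to the appropriate res_counter ... */
                return 0;
        }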
      
      [lizf@cn.fujitsu.com: build fix]
      Signed-off-by: Paul Menage <menage@google.com>
      Cc: Paul Jackson <pj@sgi.com>
      Cc: Pavel Emelyanov <xemul@openvz.org>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: Serge Hallyn <serue@us.ibm.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  2. 01 May, 2008 1 commit
  3. 29 Apr, 2008 12 commits
  4. 08 Apr, 2008 1 commit
  5. 04 Apr, 2008 1 commit
    • memory controller: make memory resource control aware of boot options · 4077960e
      Balbir Singh authored
      
      A boot option for the memory controller was discussed on lkml.  It is a good
      idea to add it, since it saves memory for people who want to turn off the
      memory controller.
      
      By default the option is on for the following two reasons:
      
      1. It provides compatibility with the current scheme where the memory
         controller turns on if the config option is enabled
      2. It allows for wider testing of the memory controller, once the config
         option is enabled
      
      We still allow the create and destroy callbacks to succeed, since they
      are not aware of boot options.  We do not populate the directory with
      memory resource controller specific files.
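
      A sketch of such a boot option (the option name is hypothetical, and
      a disabled flag on the subsystem is assumed, as the disabled-flag
      patches earlier in this log use):

        static int __init mem_cgroup_boot_setup(char *s)
        {
                /* hypothetical: turn the controller off at boot */
                mem_cgroup_subsys.disabled = 1;
                return 1;
        }
        __setup("nomemcgroup", mem_cgroup_boot_setup);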
      Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
      Cc: Paul Menage <menage@google.com>
      Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
      Cc: Pavel Emelyanov <xemul@openvz.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Hugh Dickins <hugh@veritas.com>
      Cc: Sudhir Kumar <skumar@linux.vnet.ibm.com>
      Cc: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  6. 19 Mar, 2008 1 commit
  7. 04 Mar, 2008 12 commits
    • memcg: fix oops on NULL lru list · fb59e9f1
      Hugh Dickins authored
      
      While testing force_empty, during an exit_mmap, __mem_cgroup_remove_list
      called from mem_cgroup_uncharge_page oopsed on a NULL pointer in the lru list.
       I couldn't see what racing tasks on other cpus were doing, but surmise that
      another must have been in mem_cgroup_charge_common on the same page, between
      its unlock_page_cgroup and spin_lock_irqsave near done (thanks to that kzalloc
      which I'd almost changed to a kmalloc).
      
      Normally such a race cannot happen: the ref_cnt prevents it, so the
      final uncharge cannot race with the initial charge.  But force_empty
      buggers the ref_cnt, that's what it's all about; and thereafter forced
      pages are vulnerable to races such as this (just think of a shared
      page also mapped into an mm of another mem_cgroup than that just
      emptied).  And they remain vulnerable until they're freed, indefinitely
      later.
      
      This patch just fixes the oops by moving the unlock_page_cgroups down below
      adding to and removing from the list (only possible given the previous patch);
      and while we're at it, we might as well make it an invariant that
      page->page_cgroup is always set while pc is on lru.
      
      But this behaviour of force_empty seems highly unsatisfactory to me: why have
      a ref_cnt if we always have to cope with it being violated (as in the earlier
      page migration patch).  We may prefer force_empty to move pages to an orphan
      mem_cgroup (could be the root, but better not), from which other cgroups could
      recover them; we might need to reverse the locking again; but no time now for
      such concerns.
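
      The reordering, sketched (names from the surrounding patches):

        lock_page_cgroup(page);
        /* ... decide this page must be uncharged ... */
        spin_lock_irqsave(&mz->lru_lock, flags);
        __mem_cgroup_remove_list(pc);
        spin_unlock_irqrestore(&mz->lru_lock, flags);
        page_assign_page_cgroup(page, NULL);
        unlock_page_cgroup(page);       /* now below the list manipulation */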
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • memcg: simplify force_empty and move_lists · 9b3c0a07
      Hirokazu Takahashi authored
      
      As for force_empty, though this may not be the main topic here,
      mem_cgroup_force_empty_list() can be implemented simpler.  It is possible to
      make the function just call mem_cgroup_uncharge_page() instead of releasing
      page_cgroups by itself.  The tip is to call get_page() before invoking
      mem_cgroup_uncharge_page(), so the page won't be released during this
      function.
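
      The tip, in sketch form:

        get_page(page);                 /* pin: the final put cannot free it */
        mem_cgroup_uncharge_page(page);
        put_page(page);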
      
      Kamezawa-san points out that by the time mem_cgroup_uncharge_page() uncharges,
      the page might have been reassigned to an lru of a different mem_cgroup, and
      now be emptied from that; but Hugh claims that's okay, the end state is the
      same as when it hasn't gone to another list.
      
      And once force_empty stops taking lock_page_cgroup within mz->lru_lock,
      mem_cgroup_move_lists() can be simplified to take mz->lru_lock directly while
      holding page_cgroup lock (but still has to use try_lock_page_cgroup).
      Signed-off-by: Hirokazu Takahashi <taka@valinux.co.jp>
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
      Cc: Paul Menage <menage@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • memcg: fix mem_cgroup_move_lists locking · 2680eed7
      Hugh Dickins authored
      
      Ever since the VM_BUG_ON(page_get_page_cgroup(page)) (now Bad page state) went
      into page freeing, I've hit it from time to time in testing on some machines,
      sometimes only after many days.  Recently found a machine which could usually
      produce it within a few hours, which got me there at last.
      
      The culprit is mem_cgroup_move_lists, whose locking is inadequate; and the
      arrangement of structures was such that you got page_cgroups from the lru list
      neatly put on to SLUB's freelist.  Kamezawa-san identified the same hole
      independently.
      
      The main problem was that it was missing the lock_page_cgroup it needs to
      safely page_get_page_cgroup; but it's tricky to go beyond that too, and I
      couldn't do it with SLAB_DESTROY_BY_RCU as I'd expected.  See the code for
      comments on the constraints.
      
      This patch immediately gets replaced by a simpler one from Hirokazu-san; but
      is it just foolish pride that tells me to put this one on record, in case we
      need to come back to it later?
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
      Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Hirokazu Takahashi <taka@valinux.co.jp>
      Cc: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
      Cc: Paul Menage <menage@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • memcg: css_put after remove_list · 6d48ff8b
      Hugh Dickins authored
      
      mem_cgroup_uncharge_page does css_put on the mem_cgroup before uncharging from
      it, and before removing page_cgroup from one of its lru lists: isn't there a
      danger that struct mem_cgroup memory could be freed and reused before
      completing that, so corrupting something?  Never seen it, and for all I know
      there may be other constraints which make it impossible; but let's be
      defensive and reverse the ordering there.
      
      mem_cgroup_force_empty_list is safe because there's an extra css_get around
      all its works; but even so, change its ordering the same way round, to help
      get in the habit of doing it like this.
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
      Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Hirokazu Takahashi <taka@valinux.co.jp>
      Cc: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
      Cc: Paul Menage <menage@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • memcg: remove clear_page_cgroup and atomics · b9c565d5
      Hugh Dickins authored
      
      Remove clear_page_cgroup: it's an unhelpful helper, see for example how
      mem_cgroup_uncharge_page had to unlock_page_cgroup just in order to call it
      (serious races from that?  I'm not sure).
      
      Once that's gone, you can see it's pointless for page_cgroup's ref_cnt to be
      atomic: it's always manipulated under lock_page_cgroup, except where
      force_empty unilaterally reset it to 0 (and how does uncharge's
      atomic_dec_and_test protect against that?).
      
      Simplify this page_cgroup locking: if you've got the lock and the pc is
      attached, then the ref_cnt must be positive: VM_BUG_ONs to check that, and to
      check that pc->page matches page (we're on the way to finding why sometimes it
      doesn't, but this patch doesn't fix that).
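
      The invariants, as a sketch:

        lock_page_cgroup(page);
        pc = page_get_page_cgroup(page);
        if (pc) {
                /* attached under lock implies a live, matching page_cgroup */
                VM_BUG_ON(pc->ref_cnt <= 0);
                VM_BUG_ON(pc->page != page);
        }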
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
      Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Hirokazu Takahashi <taka@valinux.co.jp>
      Cc: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
      Cc: Paul Menage <menage@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • memcg: memcontrol uninlined and static · d5b69e38
      Hugh Dickins authored
      
      More cleanup to memcontrol.c, this time changing some of the code
      generated.  Let the compiler decide what to inline (except for
      page_cgroup_locked, which is only used when CONFIG_DEBUG_VM is set):
      the __always_inline on lock_page_cgroup etc. was quite a waste, since
      bit_spin_lock etc. are inlines in a header file; made
      mem_cgroup_force_empty and mem_cgroup_write_strategy static.
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
      Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Hirokazu Takahashi <taka@valinux.co.jp>
      Cc: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
      Cc: Paul Menage <menage@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • memcg: memcontrol whitespace cleanups · 8869b8f6
      Hugh Dickins authored
      
      Sorry, before getting down to more important changes, I'd like to do some
      cleanup in memcontrol.c.  This patch doesn't change the code generated, but
      cleans up whitespace, moves up a double declaration, removes an unused enum,
      removes void returns, removes misleading comments, that kind of thing.
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Hirokazu Takahashi <taka@valinux.co.jp>
      Cc: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
      Cc: Paul Menage <menage@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • memcg: remove mem_cgroup_uncharge · 8289546e
      Hugh Dickins authored
      
      Nothing uses mem_cgroup_uncharge apart from mem_cgroup_uncharge_page
      (a trivial wrapper around it) and mem_cgroup_end_migration (which does
      the same as mem_cgroup_uncharge_page).  And it often ends up having to
      lock just to let its caller unlock.  Remove it (but leave the silly
      locking until a later patch).
      
      Moved mem_cgroup_cache_charge next to mem_cgroup_charge in memcontrol.h.
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Cc: David Rientjes <rientjes@google.com>
      Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>
      Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Hirokazu Takahashi <taka@valinux.co.jp>
      Cc: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
      Cc: Paul Menage <menage@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • memcg: mem_cgroup_charge never NULL · 7e924aaf
      Hugh Dickins authored
      
      My memcgroup patch to fix hang with shmem/tmpfs added NULL page handling to
      mem_cgroup_charge_common.  It seemed convenient at the time, but hard to
      justify now: there's a perfectly appropriate swappage to charge and uncharge
      instead, this is not on any hot path through shmem_getpage, and no performance
      hit was observed from the slight extra overhead.
      
      So revert that NULL page handling from mem_cgroup_charge_common; and make it
      clearer by bringing page_cgroup_assign_new_page_cgroup into its body - that
      was a helper I found more of a hindrance to understanding.
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Cc: David Rientjes <rientjes@google.com>
      Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>
      Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Hirokazu Takahashi <taka@valinux.co.jp>
      Cc: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
      Cc: Paul Menage <menage@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • memcg: bad page if page_cgroup when free · 9442ec9d
      Hugh Dickins authored
      
      Replace free_hot_cold_page's VM_BUG_ON(page_get_page_cgroup(page)) by a "Bad
      page state" and clear: most users don't have CONFIG_DEBUG_VM on, and if it
      were set here, it'd likely cause corruption when the page is reused.
      
      Don't use page_assign_page_cgroup to clear it: that should be private to
      memcontrol.c, and always called with the lock taken; and memmap_init_zone
      doesn't need it either - like page->mapping and other pointers throughout the
      kernel, Linux assumes pointers in zeroed structures are NULL pointers.
      
      Instead use page_reset_bad_cgroup, added to memcontrol.h for this only.
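
      A plausible shape for the new helper (a sketch; the field layout is
      assumed):

        /* memcontrol.h: for the bad-page path in page_alloc.c only */
        static inline void page_reset_bad_cgroup(struct page *page)
        {
                page->page_cgroup = 0;
        }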
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Cc: David Rientjes <rientjes@google.com>
      Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>
      Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Hirokazu Takahashi <taka@valinux.co.jp>
      Cc: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
      Cc: Paul Menage <menage@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • memcg: move_lists on page not page_cgroup · 427d5416
      Hugh Dickins authored
      
      Each caller of mem_cgroup_move_lists is having to use page_get_page_cgroup:
      it's more convenient if it acts upon the page itself not the page_cgroup; and
      in a later patch this becomes important to handle within memcontrol.c.
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Cc: David Rientjes <rientjes@google.com>
      Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>
      Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Hirokazu Takahashi <taka@valinux.co.jp>
      Cc: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
      Cc: Paul Menage <menage@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • memcg: mm_match_cgroup not vm_match_cgroup · bd845e38
      Hugh Dickins authored
      
      vm_match_cgroup is a perverse name for a macro that matches an mm with
      a cgroup: rename it mm_match_cgroup, matching mm_init_cgroup and
      mm_free_cgroup.
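
      The rename, sketched (the macro body is assumed from its description):

        #define mm_match_cgroup(mm, cgroup)     \
                (rcu_dereference((mm)->mem_cgroup) == (cgroup))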
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Acked-by: David Rientjes <rientjes@google.com>
      Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>
      Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Hirokazu Takahashi <taka@valinux.co.jp>
      Cc: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
      Cc: Paul Menage <menage@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  8. 23 Feb, 2008 2 commits
  9. 09 Feb, 2008 1 commit