Memory Usage -Breakup

LinuxVM
Understanding Linux VM/Memory Breakup

Above is a breakup of 4GB RAM memory on a typical desktop linux. Important to observe is breakup of Kernel space and Userland/User-space memory usage. Also how within these Cached ( a term that is difficult to explain in a simple post here) as well as non-cached memory breakups are arrived at to understand from Linux kernel VM/memory perspective of what can be typically reclaimed/recycled easily without swapping. This is how MemAvailable is calculated from >3.14 kernels.

Total PSS(Proportional Set size)+Shmem is what is typically Total Userland memory usage and Anon Pages+Shmem is typically not viewed as easily reclaimable but PageCache+Mapped is easily if needed to serve memory needs without swapping.

Important to note is at runtime some values may go negative but point of this post is to understand how kernel sees memory (note BIOS/HW reserved memory,few other items are not included here and one can see out of 4096MB only 3916MB is avl to start with)

Database servers reserve/use heavily Shared/initimate memory,use HugePages and pinned memory typically compared to typicall non-Db servers. This post is an illustration of essentially a breakup to arrive at ‘MemAvailable’ effective in non-DB servers serving as cache/servers

Linux MM – Memory Management Fundamentals(Quick grasp)

Note: This post is essentially a bird’s  view(simple but quick) aiming at a overview of Linux VM/Memory mgmt.

MEMORY AVAILABLE FOR NEW PROCESSES/APPLICATIONS (Without need to use SWAP)
Memory Available without need to use swap is computed as all of currently held Free Memory together with portion of FILE LRU lists pages and SLAB reclaimable portion that can be recycled/reclaimed after taking into account low water Mark of zones.

MemAvailable a dataitem reported from >3.14 linux kernels is arrived at after taking into account various details of internals (that would keep getting revised as kernel evolves) and this post is nothing but scratching the surface to get basic overview.

Perhaps i would post a slide of breakup of how Kernel sees memory (both its internal use as well as userland processes that would explain more later.

USERSPACE MEMORY
Linux VM Subsystem manages in Pages using Buddy allocation algorithm
All processes have pages mapped here. Pages could be File backed(FS,mmap) or anonymous/heap alloc,shared memory,tmpfs etc
File backed pages have separate LRU lists and serve as significant portion of PageCache.
ACTIVE LRU LISTS = ACTIVE ANON + ACTIVE FILE BACKED, similar for INACTIVE LRU LISTS.
Kernel as part of PageCache reclamation to allocate/serve memory needs for processes usually scans FILE LRU lists first as they are file backed and easy to evict/recycle rather than ANON lists which need SWAP I/O. File I/O is optimized via Buffers portion memory in kernel/FS implementation as well as HW/block device better support than SWAP file I/O. Also read-only file backed pages easy to evict/reuse.

GC overhead -short lived objects

Normally one is not inclined to touch these jvm flags but sometimes it helps as a quick fix

By default sometimes NewRatio which is a jvm/java flag is not suitable/convenient in your specific case where lot of short-lived objects are created due to nature of your code(whatever be the reason). In such cases it can be beneficial to change it to 1 from default of 2 in most jvm/java server platforms.

This i saw in a specific case changed runtime behaviour to give improved execution with lesser GC overhead esp helping code to run little faster.

Simple java benchmark-primes

In Java objects usage esp the GC overhead in terms of cpu usage as well as memory becomes tricky as project size grows. Efficient usage of objects in terms of avoiding unnecessary creation as well as sticking to primitive types helps often in cpu/memory usage. Unfortunately most popular IDEs with their inbuilt profilers do not give a clear picture of minute details easily on how this can be achieved. eg: Oracle’s own flagship JDeveloper though very versatile and much easier in its latest version does not help much when it comes to looking for reducing GC overhead wherever possible. This is not to blame Jdev or any IDE for that matter as it depends on way JVM/java language works as it is very easy to get trapped in GC overheads due to ease in object oriented programming due to cost overhead it brings.

I tried to use both linux’s perf as well as java IDE’s (in this case Oracle’s JDeveloper) to understand where the cpu usage esp GC overhead shows up its face and somehow reduced cpu usage in a particular case of primes generation using simple java.

Though no attempt is made to significantly review the logic/algorithm linux perf showed where jvm/java was spending cpu esp in object allocations/GC.

For reference i am providing links to improved:

  1. Improved ArrayList version in std java for primes.
  2. Significantly improved Arrays version in std java.

Original source file listing of “primes.java”

Would post later in more detail on using linux’s perf esp uprobes more effectively in combining with a particular language’s own IDE/profiler esp java runtime profilers for optimizing later.

Reducing Objects creation-Part 2

Below is a much faster and better solution for jdk8 std java for same primes.java

 

import java.util.*;

import java.lang.Math ;

class PrimeNumbersGenerator {
int[] get_primes7(int n) {

int[] res;
if (n < 2) {
res = new int[1];
return res;
}
if (n == 2) {
res = new int[1];
res[0] = 2;
return res;
}
int[] s = new int[(int) (n – 2) / 2];

for (int i = 3; i < n + 1; i += 2) {
s[(i – 3) / 2] = i;
}

int mroot = (int) Math.sqrt(n);

int half = s.length;
int i = 0;
int m = 3;
while (m <= mroot) {

if (s[i] != 0) {
int j;
j = ((m * m – 3) / 2);

s[j] = 0;
while (j < half) {

s[j] = 0;
j += m;
}
}
i = i + 1;

m = 2 * i + 3;
}
// 2 is the only even prime and the first prime always
int pcount = 1;
for (int it = 0; it < s.length; ++it) {
if (s[it] != 0) {
pcount++;
}
}
res = new int[pcount];
pcount = 1;
res[0] = 2;

for (int it = 0; it < s.length; ++it) {

if (s[it] != 0) {

res[pcount] = s[it];
pcount++;
}
}

return res;
}
}

class PrimeNumbersBenchmarkApp {

public static void main(String[] args) {
long startTime = System.currentTimeMillis();
long periodTime = Long.parseLong(System.getenv(“RUN_TIME”), 10) * 1000;

int[] res;
while ((System.currentTimeMillis() – startTime) < periodTime) {

res = (new PrimeNumbersGenerator()).get_primes7(10000000);
System.out.format(“Found %d prime numbers.\n”, res.length);

}
}
}

Reducing object creations

Reducing object creations wherever possible to reduce GC overhead in primes.java

Note: Below is changed code which should run faster with jdk8 and please make sure code is copied without any strange characters in case compilation throws errors.

 

import java.util.*;

import java.lang.Math;

class PrimeNumbersGenerator {

/* Create the arraylist variable s just once outside the method get_primes7 as partof      class variable,it can be cleared each time inside loop */

static ArrayList<Integer> s = new ArrayList<Integer>();

ArrayList<Integer> get_primes7(ArrayList<Integer> res, int n) {
if (n < 2)
return res;
if (n == 2) {
res.add(2);
return res;
}

// Clear the arraylist s before each time it is used in loop
s.clear();
for (int i = 3; i < n + 1; i += 2) {
s.add(i);
}
int mroot = (int)Math.sqrt(n);
int half = s.size();
int i = 0;
int m = 3;
while (m <= mroot) {
if (s.get(i) != 0) {
int j = (int)((m * m – 3) / 2);
s.set(j, 0);
while (j < half) {
s.set(j, 0);
j += m;
}
}
i = i + 1;
m = 2 * i + 3;
}
res.add(2);
for (int it = 0; it < s.size(); ++it) {
if (s.get(it) != 0) {
res.add(s.get(it));
}
}

return res;
}
}

class PrimeNumbersBenchmarkApp {

// Create the arraylist to hold the resulting primes only once & pass to calling method
static ArrayList<Integer> resArr = new ArrayList<Integer>();

// Same way Create the calling class object only once instead of creating in loop
static PrimeNumbersGenerator primeGen = new PrimeNumbersGenerator();

public static void main(String[] args) {
long startTime = System.currentTimeMillis();
long periodTime = Long.parseLong(System.getenv(“RUN_TIME”), 10) * 1000;
while ((System.currentTimeMillis() – startTime) < periodTime) {

//     Clear the result arraylist each time before calling get_primes7 method
resArr.clear();
resArr = primeGen.get_primes7(resArr, 10000000);
System.out.format(“Found %d prime numbers.\n”, resArr.size());
}
}
}