[ale] Free is not showing me correct used memory
    Jim Kinney 
    jim.kinney at gmail.com
       
    Sat Aug 22 13:47:15 EDT 2020
    
    
  
There are ways to tweak the OOM killer so it does not hit your watchdog. That way you can let the kernel "do its thing" until the watchdog says to pull the plug. If you have at least 2 cores, you can use CPU affinity to bind a list of processes, including the watchdog, to core0 plus X RAM, placing the basic OS outside of OOM-killer space.
Robert Tweedy did a specific tweak to put glusterfs outside of OOM-killer land on the Emory cluster. Since glusterfs runs in userspace (FUSE) and all non-boot storage was mounted over glusterfs, OOM-killing gluster would hang the box.
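
One way to do this kind of thing on a reasonably recent kernel is the per-process oom_score_adj knob plus CPU affinity. A minimal sketch in Python, and not the actual Emory tweak (the thread does not say how that one was done): writing -1000 to /proc/<pid>/oom_score_adj exempts a process from the OOM killer, and os.sched_setaffinity pins it to a core. The helper names and the proc_root parameter are made up for illustration. Lowering the score needs root, kernels older than 2.6.36 only have the older oom_adj file, and nothing here reserves RAM.

```python
import os
from pathlib import Path

# -1000 tells the kernel never to pick this process as an OOM victim.
OOM_SCORE_ADJ_MIN = -1000
OOM_SCORE_ADJ_MAX = 1000


def set_oom_score_adj(pid, value, proc_root="/proc"):
    """Write an OOM score adjustment for pid (hypothetical helper)."""
    if not OOM_SCORE_ADJ_MIN <= value <= OOM_SCORE_ADJ_MAX:
        raise ValueError(f"oom_score_adj must be within -1000..1000, got {value}")
    path = Path(proc_root) / str(pid) / "oom_score_adj"
    path.write_text(f"{value}\n")
    return path


def protect(pids, core=0, proc_root="/proc"):
    """Exempt each pid from the OOM killer and pin it to one core."""
    for pid in pids:
        set_oom_score_adj(pid, OOM_SCORE_ADJ_MIN, proc_root)
        os.sched_setaffinity(pid, {core})  # Linux only
```

Run as root, something like protect([watchdog_pid, glusterfs_pid]) would cover the processes mentioned above; both pids are placeholders.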
On August 22, 2020 1:09:35 PM EDT, Chris Fowler <cfowler at outpostsentinel.com> wrote:
>Setting overcommit_memory to 0 solved the issue of OOM panic.  After
>the first failure of malloc() the system rebooted.
>
>
>0       -       Heuristic overcommit handling. Obvious overcommits of
>                address space are refused. Used for a typical system. It
>                ensures a seriously wild allocation fails while allowing
>                overcommit to reduce swap usage.  root is allowed to
>                allocate slightly more memory in this mode. This is the
>                default.
>
>Now I just need to decide if I am going to use OOM's panic and reboot
>or watchdog's.  With watchdog I have a repair script that runs and
>keeps count, allowing the system 2m to recover from ENOMEM.  OOM panic
>will reboot on the first ENOMEM.  The issue of not using all available
>RAM can be addressed via overcommit_ratio.  To use watchdog, I could
>keep overcommit_memory at 2 and set the ratio to 80%.  The remaining
>20% is reserved for kernel memory usage.
>
>The setting of 2 and the ratio of 50 also explain why I was able to
>malloc() more when I had enabled 500M of swap.  The limit is swap plus
>50% of RAM.
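
For reference, the kernel's overcommit documentation gives the mode 2 limit as swap plus overcommit_ratio percent of RAM (this is the CommitLimit line in /proc/meminfo). A rough sketch of that arithmetic in Python, using the totals from the free output in this thread. The function name is made up, and the kernel works in pages and subtracts huge pages and small reserves, so treat the results as approximate.

```python
CHUNK_KB = 10 * 1024  # the test program allocates 10M chunks


def commit_limit_kb(mem_total_kb, swap_total_kb, overcommit_ratio=50):
    """Approximate CommitLimit for overcommit_memory=2 (hypothetical helper)."""
    return swap_total_kb + mem_total_kb * overcommit_ratio // 100


# Totals copied from the free output in this thread.
no_swap = commit_limit_kb(1964728, 0)          # ratio 50, no swap
with_zram = commit_limit_kb(1964784, 524284)   # ratio 50, 500M zram swap
ratio_80 = commit_limit_kb(1964728, 0, 80)     # proposed ratio of 80

print(no_swap, with_zram, ratio_80)
print(82 * CHUNK_KB <= no_swap, 134 * CHUNK_KB <= with_zram)
```

Both observed runs fit under these limits: 82 chunks is 839680 kB against roughly 982364 kB, and 134 chunks is 1372160 kB against roughly 1506676 kB. The gap is whatever the rest of the system had already committed.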
>
>Is tmpfs kernel space?  I need to verify that because this system does
>not mount a drive as /.  It uses unionfs where the software is in a ro
>image and then anything written in / ends up on a tmpfs.  Exactly
>like an Ubuntu livecd.
>
>I could also disable OOM panic, set overcommit_memory to 0 and use
>watchdog.  If watchdog is killed for whatever reason the system will
>reboot since it is using /dev/watchdog.
>
>________________________________
>From: Jim Kinney <jim.kinney at gmail.com>
>Sent: Saturday, August 22, 2020 10:56 AM
>To: Chris Fowler <cfowler at outpostsentinel.com>; Atlanta Linux
>Enthusiasts <ale at ale.org>
>Subject: Re: [ale] Free is not showing me correct used memory
>
>When did memory over commit become safely possible? I don't have to dig
>often into details of memory management but sometime in kernel 2 days a
>major change occurred. Memory was marked allocated when requested
>regardless of use. But a different request process was used to report
>use. The overcommit allowed unused but allocated memory to be used by
>another process. The change merged those two or clarified or something.
>Been a long time. I also vaguely recall there might have been a race
>condition in the first few iterations of the change. Something to do
>with a thread problem and the over commit not playing well together.
>
>
>
>On August 22, 2020 10:30:01 AM EDT, Chris Fowler
><cfowler at outpostsentinel.com> wrote:
>
>When I copied that program from the web page I noticed something odd,
>but did not question it.
>
>memset(b, TEN_MB, 0);
>
>The size argument is the last one!  With a size of 0 that call writes
>nothing, so the allocated memory was never touched.  As many times as
>I've used memset() I should have corrected it the first time.  Just me
>being an idiot and not second guessing someone else's program.  It
>should've been:
>
>memset(b, 0x0, TEN_MB);
>
>[root at basement]# free
>             total       used       free     shared    buffers     cached
>Mem:       1964784    1225620     739164          0       5108      24632
>-/+ buffers/cache:    1195880     768904
>Swap:       524284          0     524284
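
The effect of that swapped argument can be reproduced without the original test program. A small Python sketch, assuming Linux and /proc/self/statm: it maps 10M of anonymous memory the way a large malloc() would, and reads the process's resident size before the mapping, after the mapping, and after writing one byte to every page. Only the last step should move the number, which is the same reason free did not budge until memset() actually wrote to the chunks. The function names are made up for illustration.

```python
import mmap
import os

TEN_MB = 10 * 1024 * 1024
PAGE = os.sysconf("SC_PAGE_SIZE")


def resident_bytes():
    """Resident set size of this process, from /proc/self/statm (Linux)."""
    with open("/proc/self/statm") as f:
        return int(f.read().split()[1]) * PAGE


def touch_demo(size=TEN_MB):
    """Return RSS before mapping, after mapping, and after touching."""
    before = resident_bytes()
    # Private anonymous mapping: address space only, no pages yet.
    buf = mmap.mmap(-1, size, flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)
    mapped = resident_bytes()
    for offset in range(0, size, PAGE):
        buf[offset] = 1  # first write faults the page in
    touched = resident_bytes()
    buf.close()
    return before, mapped, touched
```

Printing the three values from touch_demo() should show the first two close together and the third about 10M higher.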
>
>Now I'd like to ask the stupid question: why does Linux not count
>memory that was allocated via malloc() but never used as used memory
>in /proc/meminfo?  I can't malloc() any more anyway.  This is why
>watchdog's active test of trying to allocate memory is better than its
>passive test of just grokking /proc/meminfo.
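
The kernel does track that promised-but-untouched memory, just not in the used/free columns: /proc/meminfo reports it as Committed_AS, next to CommitLimit. A short Python sketch of a passive check that looks at those fields as well as MemFree. The parser is illustrative and the sample numbers are invented, not taken from the device.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: kB} (hypothetical helper)."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if sep and rest.split():
            fields[name.strip()] = int(rest.split()[0])
    return fields


def commit_headroom_kb(fields):
    """Address space still grantable under overcommit_memory=2."""
    return fields["CommitLimit"] - fields["Committed_AS"]


# Invented sample; on a real system use open("/proc/meminfo").read().
SAMPLE = """\
MemTotal:        2000000 kB
MemFree:         1700000 kB
CommitLimit:     1000000 kB
Committed_AS:     990000 kB
"""

info = parse_meminfo(SAMPLE)
print(info["MemFree"], commit_headroom_kb(info))
```

In the invented sample, MemFree says 1.7G is free while only about 10M of address space can still be handed out, which is the shape of the situation described above.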
>
>________________________________
>From: Jim Kinney <jim.kinney at gmail.com>
>Sent: Friday, August 21, 2020 10:20 PM
>To: Chris Fowler <cfowler at outpostsentinel.com>; Atlanta Linux
>Enthusiasts <ale at ale.org>
>Subject: Re: [ale] Free is not showing me correct used memory
>
>Is your ram fully functional? Memtest results are 100% perfect?
>
>Cgroups can direct oomkiller to odd locations.
>
>Strange problem.
>
>On August 21, 2020 8:33:17 PM EDT, Chris Fowler via Ale <ale at ale.org>
>wrote:
>I just enabled 500M of swap on the device, but it does not use disk.  It
>uses a zram device.  The kernel configured 500M of compressed RAM for
>use as swap.  The oom program was able to grab 134 chunks before
>watchdog restarted.  That's 1.3G.  Is my kernel config wrong?
>
>Chris
>________________________________
>From: Ale <ale-bounces at ale.org> on behalf of Chris Fowler via Ale
><ale at ale.org>
>Sent: Friday, August 21, 2020 8:16 PM
>To: ALE <ale at ale.org>
>Subject: [ale] Free is not showing me correct used memory
>
>I've run into an issue on a device which runs out of memory, but it
>refuses to panic on OOM.  This creates a DoS effect.  No SSH to the
>device, but I am still able to ping it. After reboot, messages in syslog
>show the device had run out of memory.  Kernel is 2.6.38 and system is
>32-bit.
>
>I followed instructions on the URL below to test OOM panic.  I compiled
>and ran the program to test.  On a system with kernel 5.6.0 the OOM
>panics and the system is restarted.  It works, but on the problem device
>it is as if the OOM is not fully aware. free, /proc/meminfo, vmstat, etc.
>do not show memory usage skyrocketing as it does on the other system.
>
>Free on the problem device before I run the program:
>
>[root at basement]# free
>             total       used       free     shared    buffers     cached
>Mem:       1964728     235380    1729348          0      12064      55388
>-/+ buffers/cache:     167928    1796800
>Swap:            0          0          0
>
>Now, I run it and it is able to allocate 82 chunks at 10M each before
>malloc() failed:
>
>[root at basement]#  oom
>Allocated 82 chunks.
>Sleeping 60(s) before exiting.
>
>In another xterm I'll run free while oom is waiting to exit.  Once it
>exits, all that memory is freed. No change.
>
>[root at basement]# free
>             total       used       free     shared    buffers     cached
>Mem:       1964728     236380    1728348          0      12112      55388
>-/+ buffers/cache:     168880    1795848
>Swap:            0          0          0
>
>If I run oom in another window it will only grab 2 chunks before
>failure.  Also, watchdog is configured to restart if it can't allocate
>20M of memory.  Watchdog will restart the device because it is unable
>to grab it.
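
An active test of that kind is simple to sketch. This is not the watchdog daemon's code, only an illustration in Python of the idea: ask for 20M of anonymous memory, write to every page so the kernel has to back it, and report failure if the kernel refuses the request. The function name is made up. Note that with overcommit allowed, the writes may wake the OOM killer instead of producing a clean error.

```python
import mmap

TWENTY_MB = 20 * 1024 * 1024


def can_allocate(size=TWENTY_MB):
    """Return True if size bytes can be mapped and written (illustrative)."""
    try:
        buf = mmap.mmap(-1, size, flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)
    except (OSError, OverflowError, ValueError, MemoryError):
        return False  # the kernel refused the address space (ENOMEM)
    try:
        for offset in range(0, size, mmap.PAGESIZE):
            buf[offset] = 1  # force a real page behind each address
    finally:
        buf.close()
    return True
```

A repair script could call can_allocate() on a timer and count consecutive failures before deciding to reboot.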
>
>The system itself has 2G of RAM.
>
>If that program was able to allocate 82x10 = 820M of RAM, why did that
>not show up as used memory?  Now, I'm curious as to why I was only able
>to allocate 820M if there is 1.7GB free?
>
>Below is grep 'MEM' config:
>
># CONFIG_CGROUP_MEM_RES_CTLR is not set
>CONFIG_SHMEM=y
>CONFIG_X86_SUPPORTS_MEMORY_FAILURE=y
>CONFIG_NO_BOOTMEM=y
># CONFIG_MEMTEST is not set
># CONFIG_NOHIGHMEM is not set
>CONFIG_HIGHMEM4G=y
># CONFIG_HIGHMEM64G is not set
>CONFIG_HIGHMEM=y
>CONFIG_ARCH_FLATMEM_ENABLE=y
>CONFIG_ARCH_SPARSEMEM_ENABLE=y
>CONFIG_ARCH_SELECT_MEMORY_MODEL=y
>CONFIG_SELECT_MEMORY_MODEL=y
>CONFIG_FLATMEM_MANUAL=y
># CONFIG_SPARSEMEM_MANUAL is not set
>CONFIG_FLATMEM=y
>CONFIG_FLAT_NODE_MEM_MAP=y
>CONFIG_SPARSEMEM_STATIC=y
>CONFIG_HAVE_MEMBLOCK=y
>CONFIG_ARCH_SUPPORTS_MEMORY_FAILURE=y
># CONFIG_MEMORY_FAILURE is not set
>CONFIG_X86_BOOTPARAM_MEMORY_CORRUPTION_CHECK=y
>CONFIG_ARCH_ENABLE_MEMORY_HOTPLUG=y
># CONFIG_BLK_DEV_UMEM is not set
>CONFIG_INPUT_FF_MEMLESS=y
>CONFIG_DEVKMEM=y
>CONFIG_FIX_EARLYCON_MEM=y
># CONFIG_HW_RANDOM_TIMERIOMEM is not set
># CONFIG_MEMSTICK is not set
>CONFIG_FIRMWARE_MEMMAP=y
># CONFIG_DEBUG_KMEMLEAK is not set
># CONFIG_DEBUG_HIGHMEM is not set
># CONFIG_DEBUG_MEMORY_INIT is not set
>CONFIG_HAVE_ARCH_KMEMCHECK=y
># CONFIG_STRICT_DEVMEM is not set
>CONFIG_ASYNC_MEMCPY=y
>CONFIG_HAS_IOMEM=y
>
>Also:
>
>CONFIG_VMSPLIT_3G=y
># CONFIG_VMSPLIT_3G_OPT is not set
># CONFIG_VMSPLIT_2G is not set
># CONFIG_VMSPLIT_2G_OPT is not set
># CONFIG_VMSPLIT_1G is not set
>CONFIG_PAGE_OFFSET=0xC0000000
>
>
>--
>"no government by experts in which the masses do not have the chance to
>inform the experts as to their needs can be anything but an oligarchy
>managed in the interests of the few." - John Dewey