[C++] Analyzing an Out-of-Memory Problem in a Maternity Hospital's .NET WPF Application

Programming Languages | Published: 2021-12-15 14:31

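I: Background
1. The story
Last month a friend contacted me by text message. His program was running out of memory, and he wanted to know how to fix it.
Fixing it starts with a WinDbg analysis.

II: WinDbg analysis
1. Why does the program run out of memory?
Running out of memory shows up in .NET as an `OutOfMemoryException`. It can be thrown explicitly by managed code or raised by the CLR itself, which means there are two ways to investigate.

  • Is a managed exception attached to any thread?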



```
0:000> !t
ThreadCount:      23
UnstartedThread:  0
BackgroundThread: 5
PendingThread:    0
DeadThread:       17
Hosted Runtime:   no
                                                                        Lock
       ID OSID ThreadOBJ   State GC Mode     GC Alloc Context  Domain   Count Apt Exception
   0    1 362c 00fac868    26020 Preemptive  7ED701A0:00000000 00fa6b60 0     STA
   5    2 2d70 00fbeba0    2b220 Preemptive  7EBA7AC0:00000000 00fa6b60 0     MTA (Finalizer)
   7    3 3264 061c8890  102a220 Preemptive  00000000:00000000 00fa6b60 0     MTA (Threadpool Worker)
  17   15 3f98 19682b90  202b220 Preemptive  7EBB0830:00000000 00fa6b60 0     MTA
XXXX   16    0 2845fb00    35820 Preemptive  00000000:00000000 00fa6b60 0     Ukn
  18   14  a7c 2842b1c8  202b220 Preemptive  00000000:00000000 00fa6b60 0     MTA
XXXX    6    0 2c9b3778  1039820 Preemptive  00000000:00000000 00fa6b60 0     Ukn (Threadpool Worker)
XXXX   18    0 288a1318  1039820 Preemptive  00000000:00000000 00fa6b60 0     Ukn (Threadpool Worker)
XXXX   23    0 288a22f0  1039820 Preemptive  00000000:00000000 00fa6b60 0     Ukn (Threadpool Worker)
XXXX   10    0 2ccf3550  1039820 Preemptive  00000000:00000000 00fa6b60 0     Ukn (Threadpool Worker)
XXXX   21    0 288a1860  1039820 Preemptive  00000000:00000000 00fa6b60 0     Ukn (Threadpool Worker)
XXXX   12    0 288a1da8  1039820 Preemptive  00000000:00000000 00fa6b60 0     Ukn (Threadpool Worker)
XXXX   11    0 2c993640  1039820 Preemptive  00000000:00000000 00fa6b60 0     Ukn (Threadpool Worker)
XXXX    8    0 2ccf3a98    35820 Preemptive  00000000:00000000 00fa6b60 0     Ukn
XXXX    9    0 2ccf2030  1039820 Preemptive  00000000:00000000 00fa6b60 0     Ukn (Threadpool Worker)
XXXX    7    0 2c9aed88  1039820 Preemptive  00000000:00000000 00fa6b60 0     Ukn (Threadpool Worker)
XXXX   26    0 28898308  1039820 Preemptive  00000000:00000000 00fa6b60 0     Ukn (Threadpool Worker)
XXXX   25    0 2c492c68  1039820 Preemptive  00000000:00000000 00fa6b60 0     Ukn (Threadpool Worker)
XXXX    4    0 2c993b88  1039820 Preemptive  00000000:00000000 00fa6b60 0     Ukn (Threadpool Worker)
XXXX   20    0 2c9af2d0  1039820 Preemptive  00000000:00000000 00fa6b60 0     Ukn (Threadpool Worker)
XXXX   17    0 2c9afd60  1039820 Preemptive  00000000:00000000 00fa6b60 0     Ukn (Threadpool Worker)
XXXX   24    0 2c9b1280  1039820 Preemptive  00000000:00000000 00fa6b60 0     Ukn (Threadpool Worker)
  23   22 2658 2c9b02a8  1029220 Preemptive  7ED5BFF8:00000000 00fa6b60 0     MTA (Threadpool Worker)
```
The Exception column is empty for every thread, so none of them has a managed exception attached. Well then...





  • Was it raised at the CLR level?



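This mainly checks whether memory ran short because of allocations on the managed heap or because of garbage collection. The `!ao` command does that.
```
0:000> !ao
There was no managed OOM due to allocations on the GC heap
```
Nothing abnormal shows up here either, which is awkward. So what is actually going on?
2. Exploring the cause of the overflow
With a dead end like this, I can only suspect that the dump was not captured at the exact moment of failure, or that I have hit the limit of my own knowledge. Still, there is always a way forward: even if the dump missed the exact point, it must have been taken close to it. So next, use `!address -summary` to see how memory usage breaks down by category.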
```
0:000> !address -summary

--- Usage Summary ---------------- RgnCount ----------- Total Size -------- %ofBusy %ofTotal
<unknown>                              1520      4c185000 (   1.189 GB)  65.57%   59.45%
Image                                  4306      1f140000 ( 497.250 MB)  26.78%   24.28%
Free                                   1133       bf17000 ( 191.090 MB)            9.33%
Heap                                    617       7626000 ( 118.148 MB)   6.36%    5.77%
Stack                                    72       1740000 (  23.250 MB)   1.25%    1.14%
Other                                    34         7b000 ( 492.000 kB)   0.03%    0.02%
TEB                                      24         30000 ( 192.000 kB)   0.01%    0.01%
PEB                                       1          3000 (  12.000 kB)   0.00%    0.00%

--- Type Summary (for busy) ------ RgnCount ----------- Total Size -------- %ofBusy %ofTotal
MEM_MAPPED                              549      34b60000 ( 843.375 MB)  45.42%   41.18%
MEM_PRIVATE                            1718      20424000 ( 516.141 MB)  27.80%   25.20%
MEM_IMAGE                              4307      1f155000 ( 497.332 MB)  26.78%   24.28%

--- State Summary ---------------- RgnCount ----------- Total Size -------- %ofBusy %ofTotal
MEM_COMMIT                             4904      66ddd000 (   1.607 GB)  88.64%   80.37%
MEM_RESERVE                            1670       d2fc000 ( 210.984 MB)  11.36%   10.30%
MEM_FREE                               1133       bf17000 ( 191.090 MB)            9.33%

--- Protect Summary (for commit) - RgnCount ----------- Total Size -------- %ofBusy %ofTotal
PAGE_READONLY                          2272      382cf000 ( 898.809 MB)  48.41%   43.89%
PAGE_READWRITE                         1572      1eead000 ( 494.676 MB)  26.64%   24.15%
PAGE_EXECUTE_READ                       218       dd59000 ( 221.348 MB)  11.92%   10.81%
PAGE_WRITECOPY                          449       133e000 (  19.242 MB)   1.04%    0.94%
PAGE_EXECUTE_READWRITE                  188        ab4000 (  10.703 MB)   0.58%    0.52%
PAGE_NOACCESS                           156         9c000 ( 624.000 kB)   0.03%    0.03%
PAGE_READWRITE | PAGE_GUARD              48         78000 ( 480.000 kB)   0.03%    0.02%
PAGE_READWRITE | PAGE_WRITECOMBINE        1          2000 (   8.000 kB)   0.00%    0.00%

--- Largest Region by Usage ----------- Base Address -------- Region Size ----------
<unknown>                                  1d200000       a001000 ( 160.004 MB)
Image                                       fed1000       36e4000 (  54.891 MB)
Free                                       33dfe000       1082000 (  16.508 MB)
Heap                                       3da84000        a1b000 (  10.105 MB)
Stack                                       1a10000         fd000 (1012.000 kB)
Other                                      7fa40000         33000 ( 204.000 kB)
TEB                                          a4c000          3000 (  12.000 kB)
PEB                                          a3d000          3000 (  12.000 kB)
```
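The `MEM_COMMIT` line shows 1.607 GB committed, which is 80.37% of the total. Dividing one by the other puts the total at about 2 GB, so the process is clearly confined to a 2 GB address space. The 8-hex-digit addresses in the `!t` output also show that this is a 32-bit process. So this is the classic problem of a 32-bit program running on a 64-bit system and hitting the 2 GB memory limit.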
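The 2 GB conclusion can be cross-checked with simple arithmetic on the State Summary: committed, reserved, and free memory together should cover the whole user-mode address space. The following Python sketch does that. The variable names are my own, and the hex sizes are copied from the `!address -summary` output.

```python
# Sizes in bytes, copied from the "State Summary" section of !address -summary.
MEM_COMMIT = 0x66DDD000
MEM_RESERVE = 0x0D2FC000
MEM_FREE = 0x0BF17000

GIB = 1 << 30

# Committed + reserved + free should cover the whole user-mode address space.
total = MEM_COMMIT + MEM_RESERVE + MEM_FREE

print(hex(total))                               # 0x7fff0000
print((2 * GIB - total) // 1024, "KB short of 2 GB")   # 64 KB short of 2 GB
print(f"{MEM_COMMIT / GIB:.3f} GB committed")   # 1.607 GB committed
print(f"{MEM_COMMIT / total:.2%} of total")     # 80.37% of total
```

The total comes out only 64 KB short of 2 GB, and the recomputed percentage matches the `%ofTotal` column. Both are consistent with a 32-bit process limited to the lower 2 GB of address space.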
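3. How to get past the 2 GB limit
For the answer, go to the most authoritative source, MSDN: https://docs.microsoft.com/en-us/windows/win32/memory/memory-limits-for-windows-releases?redirectedfrom=MSDN
The way out is to set the `IMAGE_FILE_LARGE_ADDRESS_AWARE` flag on the program. With the flag set, a 32-bit process on 64-bit Windows gets a 4 GB user-mode address space instead of 2 GB.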
DSC0000.png

As for how to set it, I found three ways.





  • Use the LargeAddressAware package



DSC0001.png







  • Use editbin



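In Visual Studio, add `editbin /largeaddressaware $(TargetPath)` to the project's build events. Use the post-build event, because the command patches the compiled output.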





  • Use code



This approach adds the `LargeAddressAware` flag directly to an already-built exe. Besides setting the flag, the code can also check whether it is present.
```csharp
using System;
using System.IO;

namespace PEFile
{
    public class LargeAddressAware
    {
        // Returns true if the Characteristics field has the 0x20
        // (IMAGE_FILE_LARGE_ADDRESS_AWARE) bit set.
        public static bool IsLargeAddressAware(string filePath)
        {
            bool isLargeAddressAware = false;
            PrepareStream(filePath, (stream, binaryReader) =>
                isLargeAddressAware = (binaryReader.ReadInt16() & 0x20) != 0);
            return isLargeAddressAware;
        }

        // Sets the 0x20 bit in place if it is not already set.
        public static void SetLargeAddressAware(string filePath)
        {
            PrepareStream(filePath, (stream, binaryReader) =>
            {
                var value = binaryReader.ReadInt16();
                if ((value & 0x20) == 0)
                {
                    value = (short)(value | 0x20);
                    stream.Position -= 2;   // step back over the field just read
                    var binaryWriter = new BinaryWriter(stream);
                    binaryWriter.Write(value);
                    binaryWriter.Flush();
                }
            });
        }

        // Opens the file, validates the headers, and positions the stream
        // at the Characteristics field before invoking the action.
        private static void PrepareStream(string filePath, Action<Stream, BinaryReader> action)
        {
            using (var stream = new FileStream(filePath, FileMode.Open, FileAccess.ReadWrite, FileShare.Read))
            {
                // The 4-byte PE header offset lives at 0x3C, so at least 0x40 bytes are needed.
                if (stream.Length < 0x40)
                {
                    return;
                }

                var binaryReader = new BinaryReader(stream);

                // MZ header
                if (binaryReader.ReadInt16() != 0x5A4D)
                {
                    return;
                }

                stream.Position = 0x3C;
                var peHeaderLocation = binaryReader.ReadInt32();
                stream.Position = peHeaderLocation;

                // PE header
                if (binaryReader.ReadInt32() != 0x4550)
                {
                    return;
                }

                // Skip the 0x12 bytes of COFF header that precede Characteristics.
                stream.Position += 0x12;
                action(stream, binaryReader);
            }
        }
    }
}
```
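The `stream.Position += 0x12` step in the code above is easier to follow with the COFF header layout spelled out: after the 4-byte `PE\0\0` signature, six fields totalling 18 bytes come before `Characteristics`. Below is a minimal Python sketch of the same read-only check, which could serve as an independent cross-check from a script. The names `is_large_address_aware` and `make_fake_pe` are hypothetical, and the fake header is synthetic test data, not a loadable executable.

```python
import struct

IMAGE_FILE_LARGE_ADDRESS_AWARE = 0x0020

# COFF header fields that precede Characteristics:
# Machine (H), NumberOfSections (H), TimeDateStamp (I),
# PointerToSymbolTable (I), NumberOfSymbols (I), SizeOfOptionalHeader (H)
CHARACTERISTICS_OFFSET = struct.calcsize("<HHIIIH")  # 18 == 0x12


def is_large_address_aware(image: bytes) -> bool:
    """Return True if the PE image has the large-address-aware bit set."""
    if len(image) < 0x40 or image[:2] != b"MZ":
        return False
    # e_lfanew: file offset of the PE signature, stored at 0x3C.
    (pe_offset,) = struct.unpack_from("<I", image, 0x3C)
    if image[pe_offset:pe_offset + 4] != b"PE\0\0":
        return False
    field_pos = pe_offset + 4 + CHARACTERISTICS_OFFSET
    if len(image) < field_pos + 2:
        return False
    (characteristics,) = struct.unpack_from("<H", image, field_pos)
    return bool(characteristics & IMAGE_FILE_LARGE_ADDRESS_AWARE)


def make_fake_pe(characteristics: int) -> bytes:
    """Build a tiny synthetic DOS + PE + COFF header for demonstration."""
    dos = bytearray(0x40)
    dos[:2] = b"MZ"
    struct.pack_into("<I", dos, 0x3C, 0x40)  # PE signature follows at 0x40
    coff = struct.pack("<HHIIIHH", 0x014C, 0, 0, 0, 0, 0, characteristics)
    return bytes(dos) + b"PE\0\0" + coff


print(hex(CHARACTERISTICS_OFFSET))                   # 0x12
print(is_large_address_aware(make_fake_pe(0x0122)))  # True
print(is_large_address_aware(make_fake_pe(0x0102)))  # False
```

For a real file, pass its bytes, for example `is_large_address_aware(open(path, "rb").read())`, where `path` is the exe to inspect.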


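III: Summary
In short, the 2 GB memory limit is something every 32-bit program has to face, and once you know about it, it is easy to deal with. One last question needs an answer: why was committed memory as high as 1.6 GB? Medical software mostly relies on the heavyweight classic combination of FastReport and DevExpress, and together with a large number of image resources this takes up a great deal of native memory.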
DSC0002.jpg




