Monday, December 14, 2009

Examining address spaces with mdb


A while ago, I was interested in more details about process address spaces. For instance, if a page is mapped into an address space, where is the page in physical memory? Or if a page is on swap, where is it on swap? Are there pages that are in memory, but not currently valid for a process? The meminfo(2) system call can be used by an application to examine the locations of physical pages corresponding to a range of virtual addresses that the process is using. Is there a tool for doing this from outside the process? Is there any tool for determining the locations of pages in memory when one is using liblgrp(3)? liblgrp(3) provides an API for specifying a "locality group". A locality group, as the man page says, "represents the set of CPU-like and memory-like hardware devices that are at most some locality apart from each other". Essentially, using liblgrp(3), one can specify the desired memory placement for memory that threads within a process are using.


So, I have written a dcmd, called segpages, for mdb that allows one to examine each virtual page of a segment in a process address space. The command gives the following information:


  • The virtual address of the page.
  • If the page is in memory, the physical address of the page.
  • If the page is on swap, the location on swap, and which swap device/file.
  • If the page is not currently in memory or on swap, a "-".
  • If the page is mapped from a file, the pathname of the file, and the offset within the file.
  • If the page is anonymous, the command prints "anon".
  • If the page is mapped to a device, the command only prints the physical address it is mapped to, and the path to the device.
  • The "share count" for the page, i.e., the number of processes sharing the same page.
  • The dcmd also prints the status of the page:

    • VALID -- The page is mapped
    • INMEMORY -- The page is in memory (it may not be valid for the process).
    • SWAPPED -- The page is on swap. Note that a page may be both INMEMORY and SWAPPED. What I find more interesting are pages that are SWAPPED and VALID. I expect to find INMEMORY pages that are also on swap. I did not expect to find SWAPPED pages that are also VALID, since I assumed that a page that was read in from swap and is now valid would no longer have a copy on swap. From a quick look at the code, it appears the swap slot is not freed until the reference count on the anon struct that maps the page has gone to 0. Anyone with a more complete understanding of this is welcome to comment.



Here is (very abbreviated) output for a running bash process.


First, a look at pmap output. Each line of the pmap output represents a "segment" of the address space. The different columns are described in the pmap(1) man page.

$ pmap -x 919
919: /bin/bash --noediting -i
Address Kbytes RSS Anon Locked Mode Mapped File
08045000 12 12 4 - rw--- [ stack ]
08050000 644 644 - - r-x-- bash
08100000 80 80 12 - rwx-- bash
08114000 52 52 28 - rwx-- [ heap ]
CE410000 624 512 - - r-x-- libnsl.so.1
CE4BC000 16 16 4 - rw--- libnsl.so.1
CE4C0000 20 8 - - rw--- libnsl.so.1
CE4F0000 56 52 - - r-x-- methods_unicode.so.3
CE50D000 4 4 - - rwx-- methods_unicode.so.3
CE510000 2416 752 - - r-x-- en_US.UTF-8.so.3
CE77B000 4 4 - - rwx-- en_US.UTF-8.so.3
CE960000 64 16 - - rwx-- [ anon ]
CE97E000 4 4 - - rwxs- [ anon ]
CE980000 4 4 - - rwx-- [ anon ]
CE990000 24 12 4 - rwx-- [ anon ]
CE9A0000 4 4 4 - rwx-- [ anon ]
CE9B0000 1280 972 - - r-x-- libc_hwcap1.so.1
CEAF0000 28 28 16 - rwx-- libc_hwcap1.so.1
CEAF7000 8 8 - - rwx-- libc_hwcap1.so.1
CEB00000 4 4 - - r-x-- libdl.so.1
CEB10000 4 4 4 - rwx-- [ anon ]
CEB20000 56 56 - - r-x-- libsocket.so.1
CEB3E000 4 4 - - rw--- libsocket.so.1
CEB40000 180 136 - - r-x-- libcurses.so.1
CEB7D000 28 28 - - rw--- libcurses.so.1
CEB84000 8 - - - rw--- libcurses.so.1
CEB90000 4 4 4 - rwx-- [ anon ]
CEBA0000 4 4 4 - rw--- [ anon ]
CEBB0000 4 4 - - rw--- [ anon ]
CEBBF000 180 180 - - r-x-- ld.so.1
CEBFC000 8 8 4 - rwx-- ld.so.1
CEBFE000 4 4 4 - rwx-- ld.so.1
-------- ------- ------- ------- -------
total Kb 5832 3620 92 -

# mdb -k
Loading modules: [ unix genunix specfs dtrace mac cpu.generic uppc pcplusmp scsi_vhci zfs usba sockfs ip hook neti sctp arp uhci sd fctl md lofs audiosup fcip fcp random cpc crypto logindmux ptm ufs sppp ipc ]



First, load the dmod containing the new dcmd.


> ::load /wd320/max/source/mdb/segpages/i386/segpages.so
>


Now, walk through the segments of the process address space, showing
each virtual page in the segment. Note that output has been omitted.


> 0t919::pid2proc | ::print proc_t p_as | ::walk seg | ::segpages
VA PA FILE OFFSET SHARES DISPOSITION
8045000 378C5000 [anon] 54518000 1 VALID
8046000 6EB06000 [anon] 54118000 1 VALID
8047000 5F9C7000 [anon] 540B8000 1 VALID
VA PA FILE OFFSET SHARES DISPOSITION
8050000 600A7000 bash 0 7 VALID
8051000 74368000 bash 1000 7 VALID
8052000 72669000 bash 2000 7 VALID
8053000 66C6A000 bash 3000 7 VALID
8054000 636AB000 bash 4000 0 INVALID,INMEMORY
8055000 5FDEC000 bash 5000 0 INVALID,INMEMORY
8056000 63EED000 bash 6000 0 INVALID,INMEMORY
8057000 62EAE000 bash 7000 0 INVALID,INMEMORY
8058000 5C52F000 bash 8000 7 VALID
8059000 5C5B0000 bash 9000 7 VALID
... output omitted
80ED000 5C2C4000 bash 9D000 7 VALID
80EE000 5C245000 bash 9E000 7 VALID
80EF000 5C286000 bash 9F000 3 VALID
80F0000 63A97000 bash A0000 0 INVALID,INMEMORY
VA PA FILE OFFSET SHARES DISPOSITION
8100000 79940000 [anon] 541D8000 1 VALID
8101000 5F0C1000 [anon] 62F00000 1 VALID
8102000 378C2000 [anon] 54438000 1 VALID
8103000 5EF5A000 bash A3000 6 VALID
8104000 5EEDB000 bash A4000 6 VALID
8105000 37885000 [anon] 543B8000 1 VALID
8106000 60E1D000 bash A6000 7 VALID
...
VA PA FILE OFFSET SHARES DISPOSITION
8114000 37914000 [anon] 54478000 1 VALID
8115000 79DD5000 [anon] 54368000 1 VALID
8116000 55356000 [anon] 62F90000 1 VALID
...
VA PA FILE OFFSET SHARES DISPOSITION
CE410000 7AE40000 libnsl.so.1 0 55 VALID
CE411000 7AEC1000 libnsl.so.1 1000 57 VALID
CE412000 7AE42000 libnsl.so.1 2000 57 VALID
CE413000 7AE83000 libnsl.so.1 3000 57 VALID
CE414000 7AE84000 libnsl.so.1 4000 57 VALID
...
CE42D000 6EE96000 libnsl.so.1 1D000 1A INVALID,INMEMORY
CE42E000 6E797000 libnsl.so.1 1E000 1A INVALID,INMEMORY
...
VA PA FILE OFFSET SHARES DISPOSITION
CE4F0000 17D9000 methods_unicode.so.3 0 29 VALID
CE4F1000 17DA000 methods_unicode.so.3 1000 2C VALID
...
VA PA FILE OFFSET SHARES DISPOSITION
CE510000 1869000 en_US.UTF-8.so.3 0 28 VALID
CE511000 18AA000 en_US.UTF-8.so.3 1000 2A VALID
...
CE518000 6F1EA000 en_US.UTF-8.so.3 8000 0 INVALID,INMEMORY
CE519000 6F1EB000 en_US.UTF-8.so.3 9000 0 INVALID,INMEMORY
CE51A000 6F1EC000 en_US.UTF-8.so.3 A000 0 INVALID,INMEMORY
...
CE5FF000 6DB60000 en_US.UTF-8.so.3 EF000 5 INVALID,INMEMORY
CE600000 1659000 en_US.UTF-8.so.3 F0000 7 INVALID,INMEMORY
...
CE6EE000 1687000 en_US.UTF-8.so.3 1DE000 9 INVALID,INMEMORY
CE6EF000 1688000 en_US.UTF-8.so.3 1DF000 9 INVALID,INMEMORY
CE6F0000 1649000 en_US.UTF-8.so.3 1E0000 9 INVALID,INMEMORY
...
CE729000 1782000 en_US.UTF-8.so.3 219000 29 VALID
CE72A000 1783000 en_US.UTF-8.so.3 21A000 29 VALID
...
CE730000 1709000 en_US.UTF-8.so.3 220000 29 VALID
CE731000 6F143000 en_US.UTF-8.so.3 221000 0 INVALID,INMEMORY
CE732000 6F144000 en_US.UTF-8.so.3 222000 0 INVALID,INMEMORY
...
VA PA FILE OFFSET SHARES DISPOSITION
CE9B0000 76A42000 libc_hwcap1.so.1 0 5B VALID
CE9B1000 76AC3000 libc_hwcap1.so.1 1000 5B VALID
...
VA PA FILE OFFSET SHARES DISPOSITION
...
CEBC4000 2A34000 ld.so.1 5000 47 VALID
CEBC5000 28B5000 ld.so.1 6000 47 VALID
CEBC6000 29F6000 ld.so.1 7000 60 VALID
CEBC7000 2937000 ld.so.1 8000 60 VALID
...
>


Some general things to note:


  • Physical pages are randomly distributed. However, pages from ld.so.1 tend to be in low memory in comparison with anonymous pages. This should be expected, as most pages of ld.so.1 are probably loaded early in the system lifetime, since almost every application uses it.
  • There are many pages that are not valid, but they are in memory. In general, text and data pages are prefetched when a program starts, unless the program is large, or there is not enough free memory. Although pages are prefetched, it appears that they are not mapped into the process address space until/unless they are actually used.
  • Bash is not very large. Running the command above finishes in 5-10 seconds. Running the same command on a large program (for instance, firefox-bin) takes several minutes to complete. Running the command on a large 64-bit application will take considerably longer.
  • This is being run on a live system, so the address space of the process being examined may change while it is being walked.
  • At this point in time, no pages are swapped out.


Now, let's get some general statistics.


First, a count of the pages currently valid for the process. This is the current mapped RSS. Note that the pmap command reports "RSS", which, at 3620k is 905 4k-byte pages. But only 558 pages (or 2232k) are currently valid.


> 0t919::pid2proc | ::print proc_t p_as | ::walk seg | ::segpages !grep -i " valid" | wc
558 3348 35712
>


Now, the pages in memory, but not currently valid in the page table(s) for the process.


> 0t919::pid2proc | ::print proc_t p_as | ::walk seg | ::segpages !egrep -i "inmemory" | wc
347 2082 26025
>


Note that the valid pages plus the in-memory pages total 905, the value reported by pmap. So a page counted in the RSS reported by pmap can still take a page fault. But if a page fault occurs, the correct page will be found in memory.
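The arithmetic behind this comparison can be written out as a small Python sketch. It uses only the numbers quoted above; the variable names are made up for illustration, and the 4k page size is the x86 native page size mentioned later in this post.

```python
PAGE_KB = 4  # native x86 page size in Kbytes (assumption: no large pages)

rss_kb = 3620         # RSS total from "pmap -x 919"
valid_pages = 558     # ::segpages lines matching " valid"
inmemory_pages = 347  # ::segpages lines matching "inmemory"

rss_pages = rss_kb // PAGE_KB
valid_kb = valid_pages * PAGE_KB

print(rss_pages)                     # 905
print(valid_pages + inmemory_pages)  # 905
print(valid_kb)                      # 2232
```

So roughly 38% of what pmap counts as resident (347 of 905 pages) is in memory but not mapped for this process.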


How many pages are currently not valid (and not in memory)?


> 0t919::pid2proc | ::print proc_t p_as | ::walk seg | ::segpages !egrep -i " invalid$" | wc
553 3298 36498
>


How large is the address space?


> 0t919::pid2proc | ::print proc_t p_as | ::walk seg | ::segpages !egrep -v OFFSET | wc
1458 8728 98235
>


Note that this is 5832k, the total size as reported by pmap.


How many pages have been swapped out?


> 0t919::pid2proc | ::print proc_t p_as | ::walk seg | ::segpages !grep -i swapped | wc
0 0 0
>
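The separate grep/wc pipelines above could also be replaced by a single pass over saved output. The following Python sketch is one way to do that, not part of the dcmd. It assumes the ::segpages output has been captured as text, that every data row has exactly six whitespace-separated columns (so a file path containing spaces would be skipped), and the function name tally is hypothetical. The sample rows are copied from output shown in this post.

```python
from collections import Counter

# A few rows copied from ::segpages output in this post.
SAMPLE = """\
VA PA FILE OFFSET SHARES DISPOSITION
8045000 378C5000 [anon] 54518000 1 VALID
8046000 - /dev/zvol/dsk/rpool/swap 17EBF000 0 INVALID,SWAPPED
8058000 6DCCE000 bash 8000 0 INVALID,INMEMORY
CE518000 - en_US.UTF-8.so.3 8000 0 INVALID
... output omitted
"""

def tally(text):
    """Count pages per DISPOSITION value, skipping headers and elisions."""
    counts = Counter()
    for line in text.splitlines():
        fields = line.split()
        # Data rows have six columns; header rows start with "VA".
        if len(fields) != 6 or fields[0] == "VA":
            continue
        counts[fields[5]] += 1
    return counts

counts = tally(SAMPLE)
for disposition, n in sorted(counts.items()):
    print(disposition, n)
```

Counting on the exact DISPOSITION string keeps "INVALID", "INVALID,INMEMORY" and "INVALID,SWAPPED" apart without relying on regular expression anchors.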


Now, we'll induce memory load on the system, and again examine the address space. The memory usage induced should be enough to cause pages to be swapped (paged) out.


First, pmap output after the memory stress.


$ pmap -x 919
919: /bin/bash --noediting -i
Address Kbytes RSS Anon Locked Mode Mapped File
08045000 12 4 - - rw--- [ stack ]
08050000 644 508 - - r-x-- bash
08100000 80 80 - - rwx-- bash
08114000 52 44 28 - rwx-- [ heap ]
CE410000 624 320 - - r-x-- libnsl.so.1
CE4BC000 16 16 4 - rw--- libnsl.so.1
CE4C0000 20 8 - - rw--- libnsl.so.1
CE4F0000 56 36 - - r-x-- methods_unicode.so.3
CE50D000 4 4 - - rwx-- methods_unicode.so.3
CE510000 2416 124 - - r-x-- en_US.UTF-8.so.3
CE77B000 4 4 - - rwx-- en_US.UTF-8.so.3
CE960000 64 16 - - rwx-- [ anon ]
CE97E000 4 4 - - rwxs- [ anon ]
CE980000 4 4 - - rwx-- [ anon ]
CE990000 24 12 4 - rwx-- [ anon ]
CE9A0000 4 4 4 - rwx-- [ anon ]
CE9B0000 1280 952 - - r-x-- libc_hwcap1.so.1
CEAF0000 28 28 12 - rwx-- libc_hwcap1.so.1
CEAF7000 8 8 - - rwx-- libc_hwcap1.so.1
CEB00000 4 4 - - r-x-- libdl.so.1
CEB10000 4 4 4 - rwx-- [ anon ]
CEB20000 56 56 - - r-x-- libsocket.so.1
CEB3E000 4 4 - - rw--- libsocket.so.1
CEB40000 180 68 - - r-x-- libcurses.so.1
CEB7D000 28 28 - - rw--- libcurses.so.1
CEB84000 8 - - - rw--- libcurses.so.1
CEB90000 4 4 4 - rwx-- [ anon ]
CEBA0000 4 4 4 - rw--- [ anon ]
CEBB0000 4 4 - - rw--- [ anon ]
CEBBF000 180 180 - - r-x-- ld.so.1
CEBFC000 8 8 4 - rwx-- ld.so.1
CEBFE000 4 4 4 - rwx-- ld.so.1
-------- ------- ------- ------- -------
total Kb 5832 2544 72 -

$


As expected, the RSS has gone down, but the virtual size remains the same. It is a little interesting that the amount reported under anon has also dropped by 20k.
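The comparison of the two pmap totals can be sketched in Python as follows. The helper name parse_total is hypothetical, and the sketch assumes the "total Kb" line always carries numeric Kbytes, RSS and Anon columns, as it does in both listings above.

```python
def parse_total(line):
    """Parse a pmap -x 'total Kb' line into (kbytes, rss, anon)."""
    fields = line.split()
    # fields: ['total', 'Kb', kbytes, rss, anon, locked]
    return tuple(int(x) for x in fields[2:5])

before = parse_total("total Kb 5832 3620 92 -")
after = parse_total("total Kb 5832 2544 72 -")

size_delta, rss_delta, anon_delta = (a - b for a, b in zip(after, before))
print(size_delta, rss_delta, anon_delta)  # 0 -1076 -20
```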


Again, we'll use the new dcmd to examine the address space more closely.


> 0t919::pid2proc | ::print proc_t p_as | ::walk seg | ::segpages
VA PA FILE OFFSET SHARES DISPOSITION
8045000 378C5000 [anon] 54518000 1 VALID
8046000 - /dev/zvol/dsk/rpool/swap 17EBF000 0 INVALID,SWAPPED
8047000 - /dev/zvol/dsk/rpool/swap 1D8FE000 0 INVALID,SWAPPED
VA PA FILE OFFSET SHARES DISPOSITION
8050000 16B86000 bash 0 1 VALID
8051000 13A07000 bash 1000 1 VALID
8052000 6B088000 bash 2000 1 VALID
8053000 1889000 bash 3000 1 VALID
8054000 2430A000 bash 4000 1 VALID
8055000 6440B000 bash 5000 1 VALID
8056000 6684C000 bash 6000 1 VALID
8057000 7308D000 bash 7000 1 VALID
8058000 6DCCE000 bash 8000 0 INVALID,INMEMORY
8059000 3784F000 bash 9000 0 INVALID,INMEMORY
...
80ED000 4CB23000 bash 9D000 0 INVALID,INMEMORY
80EE000 76BE4000 bash 9E000 0 INVALID,INMEMORY
80EF000 5BA5000 bash 9F000 0 INVALID,INMEMORY
80F0000 36836000 bash A0000 0 INVALID,INMEMORY
VA PA FILE OFFSET SHARES DISPOSITION
8100000 - /dev/zvol/dsk/rpool/swap 247C2000 0 INVALID,SWAPPED
8101000 - /dev/zvol/dsk/rpool/swap 7CCD000 0 INVALID,SWAPPED
8102000 378C2000 [anon] 54438000 1 VALID
8103000 75479000 bash A3000 0 INVALID,INMEMORY
8104000 532BA000 bash A4000 0 INVALID,INMEMORY
8105000 37885000 [anon] 543B8000 1 VALID
8106000 7443C000 bash A6000 0 INVALID,INMEMORY
...
VA PA FILE OFFSET SHARES DISPOSITION
8114000 37914000 [anon] 54478000 1 VALID
8115000 79DD5000 [anon] 54368000 1 VALID
8116000 55356000 [anon] 62F90000 1 VALID
...
VA PA FILE OFFSET SHARES DISPOSITION
CE410000 7AE40000 libnsl.so.1 0 4C VALID
CE411000 7AEC1000 libnsl.so.1 1000 4E VALID
CE412000 7AE42000 libnsl.so.1 2000 4E VALID
CE413000 7AE83000 libnsl.so.1 3000 4E VALID
CE414000 7AE84000 libnsl.so.1 4000 4E VALID
...
CE42D000 6EE96000 libnsl.so.1 1D000 18 INVALID,INMEMORY
CE42E000 6E797000 libnsl.so.1 1E000 18 INVALID,INMEMORY
...
VA PA FILE OFFSET SHARES DISPOSITION
CE4F0000 17D9000 methods_unicode.so.3 0 27 VALID
CE4F1000 17DA000 methods_unicode.so.3 1000 2A VALID
...
VA PA FILE OFFSET SHARES DISPOSITION
CE510000 1869000 en_US.UTF-8.so.3 0 26 VALID
CE511000 18AA000 en_US.UTF-8.so.3 1000 28 VALID
...
CE518000 - en_US.UTF-8.so.3 8000 0 INVALID
CE519000 - en_US.UTF-8.so.3 9000 0 INVALID
CE51A000 - en_US.UTF-8.so.3 A000 0 INVALID
...
CE5FF000 - en_US.UTF-8.so.3 EF000 0 INVALID
CE600000 - en_US.UTF-8.so.3 F0000 0 INVALID
...
CE6EE000 1687000 en_US.UTF-8.so.3 1DE000 A INVALID,INMEMORY
CE6EF000 1688000 en_US.UTF-8.so.3 1DF000 A INVALID,INMEMORY
CE6F0000 1649000 en_US.UTF-8.so.3 1E0000 A INVALID,INMEMORY
...
CE729000 1782000 en_US.UTF-8.so.3 219000 27 VALID
CE72A000 1783000 en_US.UTF-8.so.3 21A000 27 VALID
...
CE730000 1709000 en_US.UTF-8.so.3 220000 27 VALID
CE731000 - en_US.UTF-8.so.3 221000 0 INVALID
CE732000 - en_US.UTF-8.so.3 222000 0 INVALID
...
VA PA FILE OFFSET SHARES DISPOSITION
CE9B0000 76A42000 libc_hwcap1.so.1 0 51 VALID
CE9B1000 76AC3000 libc_hwcap1.so.1 1000 51 VALID
...
VA PA FILE OFFSET SHARES DISPOSITION
CEBC4000 2A34000 ld.so.1 5000 42 VALID
CEBC5000 28B5000 ld.so.1 6000 42 VALID
CEBC6000 29F6000 ld.so.1 7000 57 VALID
CEBC7000 2937000 ld.so.1 8000 57 VALID
...
>


As expected, many pages that were previously valid are now invalid. Many of these pages are still in memory, but some have been swapped out. The output does not show it, but some pages that are swapped out can also be in memory (the page was swapped out and put on a freelist, but has not yet been re-used for some other purpose). It is interesting that some pages with reasonably high share counts are still in memory, but no longer valid for this instance of bash. The pageout code checks the share counts, and skips pages being shared by more than po_share processes. On my system, po_share is 8. So I am not sure what is marking the pages invalid (maybe a job for DTrace).
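To pick out the rows that raise this question, one could filter saved ::segpages output for pages that are INVALID,INMEMORY yet have a share count above po_share. The Python sketch below does that under two assumptions: that the SHARES column is printed in hexadecimal (values such as 1A and 5B suggest so), and that it is the same count the pageout code compares against po_share. The function name is hypothetical; the rows are copied from the output above.

```python
PO_SHARE = 8  # value reported for po_share on the system in this post

# Rows copied from the post-stress ::segpages output.
ROWS = """\
8058000 6DCCE000 bash 8000 0 INVALID,INMEMORY
CE42D000 6EE96000 libnsl.so.1 1D000 18 INVALID,INMEMORY
CE6EE000 1687000 en_US.UTF-8.so.3 1DE000 A INVALID,INMEMORY
CE410000 7AE40000 libnsl.so.1 0 4C VALID
"""

def widely_shared_but_invalid(text, po_share=PO_SHARE):
    """Return (va, shares) for INVALID,INMEMORY rows shared above po_share."""
    hits = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 6:
            continue  # skip elisions and anything that is not a data row
        va, _pa, _file, _offset, shares, disposition = fields
        count = int(shares, 16)  # assumption: SHARES is hexadecimal
        if disposition == "INVALID,INMEMORY" and count > po_share:
            hits.append((va, count))
    return hits

hits = widely_shared_but_invalid(ROWS)
print(hits)
```

With these four rows, the libnsl.so.1 page (0x18, or 24 shares) and the en_US.UTF-8.so.3 page (0xA, or 10 shares) are reported.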


As before, I'll get some counts of valid, invalid, inmemory, and
swapped pages.


> 0t919::pid2proc | ::print proc_t p_as | ::walk seg | ::segpages !grep -i " valid" | wc
413 2478 26432
>


Previously, the number of valid pages was 558, so 145 pages have been marked invalid and possibly swapped out.


The number of invalid pages is now:


> 0t919::pid2proc | ::print proc_t p_as | ::walk seg | ::segpages !egrep -i " invalid$" | wc
818 4888 53988
>


Previously, this was 553, so 265 more pages are now invalid and not in memory. Before the memory stress, these pages were either valid or in memory.


> 0t919::pid2proc | ::print proc_t p_as | ::walk seg | ::segpages !egrep -i "inmemory" | wc
215 1290 16125
>


And 215 pages that are invalid are still in memory, but the page table entries for the bash instance do not have the pages mapped.


> 0t919::pid2proc | ::print proc_t p_as | ::walk seg | ::segpages !grep -i swapped | wc
12 72 936


And 12 pages of bash are on swap.
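The before and after counts can be lined up in a short Python sketch. It uses only the numbers reported by the pipelines in this post; the dictionary keys are labels chosen here for the four grep patterns.

```python
# Page counts from the grep/wc pipelines, keyed by the pattern used.
before = {"valid": 558, "inmemory": 347, "invalid_only": 553, "swapped": 0}
after = {"valid": 413, "inmemory": 215, "invalid_only": 818, "swapped": 12}

delta = {state: after[state] - before[state] for state in before}

for state, change in delta.items():
    print(f"{state:13s} {before[state]:5d} -> {after[state]:5d} ({change:+d})")

# Both snapshots should account for the same 1458-page address space.
print(sum(before.values()), sum(after.values()))  # 1458 1458
```

The changes sum to zero: the 145 pages lost from the valid set and the 132 lost from the in-memory set match the 265 additional invalid pages plus the 12 pages on swap.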


It would be nice to be able to show this graphically. For instance, a large box representing the address space, with different colored pixels to represent the state of the different pages of the address space. I have been told that JavaFX is good for this, but my knowledge of Java is really not up to it. Especially for large processes, a graphical view would be nice (well, at least interesting to look at...).


I have not tried the dcmd on SPARC or x64, but I expect it to work (at least on x64). I would also like to try this on a large machine which has locality groups set up. If anyone has such a machine and would like to try this out, please let me know.


I also have a version of the command that only prints summary information. I want to add an option that prints page sizes, but currently the command assumes all pages are the native page size (4k on x86/x64 and 8k on SPARC).


If there is interest, I'll make the code for the dcmd available.

3 comments:

Unknown said...

Interesting. Please publish.

alan

Max Bruning said...

The source is available at
ftp://ftp.bruningsystems.com/segpages.tar

Unknown said...

Got it! thanks!

alan