A while ago, I became interested in the details of process address spaces.  For instance, if a page is mapped into an address space, where is the page in physical memory?  If a page is on swap, where on swap is it?  Are there pages that are in memory, but not currently valid for a process?
The meminfo(2) system call lets an application examine the locations of the physical pages corresponding to a range of virtual addresses that the process is using.  Is there a tool for doing this from outside the process?  Is there a tool for determining the locations of pages in memory when one is using liblgrp(3)?  liblgrp(3) provides an API for specifying a "locality group".  A locality group, as the man page says, "represents the set of CPU-like and memory-like hardware devices that are at most some locality apart from each other".  Essentially, using liblgrp(3), one can specify the desired memory placement for memory that threads within a process are using.
So, I have written a dcmd, called segpages, for mdb that allows one to examine each virtual page of a segment in a process address space. The command gives the following information:
-  The virtual address of the page.
 -  If the page is in memory, the physical address of the page.
 -  If the page is on swap, the location on swap, and which swap device/file.
 -  If the page is not currently in memory or on swap, a "-".
 -  If the page is mapped from a file, the pathname of the file, and the offset within the file.
 -  If the page is anonymous, the command prints "anon".
 -  If the page is mapped to a device, the command only prints the physical address it is mapped to, and the path to the device.
 -  The "share count" for the page, i.e., the number of processes sharing the same page.
 -  The status of the page, which is one or more of the following (a page that is not mapped is shown as INVALID):
     -  VALID -- The page is mapped.
     -  INMEMORY -- The page is in memory (it may not be valid for the process).
     -  SWAPPED -- The page is on swap.
Note that a page may be both INMEMORY and SWAPPED.  What I find more interesting is pages that are both SWAPPED and VALID.  I expected to find INMEMORY pages that are also on swap.  I did not expect to find SWAPPED pages that are also VALID, since I assumed that a page that was read in from swap and is now valid would not have a copy on swap.  From a quick look at the code, it appears the swap slot is not freed until the reference count on the anon struct that is mapping the page has gone to 0.  Anyone with a more complete understanding of this is welcome to comment.
 
 
Here is (very abbreviated) output for a running bash process.
First, a look at pmap output.  Each line of the pmap output represents a "segment" of the address space.  The different columns are described in the pmap(1) man page.
$ pmap -x 919
919: /bin/bash --noediting -i
Address   Kbytes     RSS    Anon  Locked Mode   Mapped File
08045000      12      12       4       - rw---    [ stack ]
08050000     644     644       -       - r-x--  bash
08100000      80      80      12       - rwx--  bash
08114000      52      52      28       - rwx--    [ heap ]
CE410000     624     512       -       - r-x--  libnsl.so.1
CE4BC000      16      16       4       - rw---  libnsl.so.1
CE4C0000      20       8       -       - rw---  libnsl.so.1
CE4F0000      56      52       -       - r-x--  methods_unicode.so.3
CE50D000       4       4       -       - rwx--  methods_unicode.so.3
CE510000    2416     752       -       - r-x--  en_US.UTF-8.so.3
CE77B000       4       4       -       - rwx--  en_US.UTF-8.so.3
CE960000      64      16       -       - rwx--    [ anon ]
CE97E000       4       4       -       - rwxs-    [ anon ]
CE980000       4       4       -       - rwx--    [ anon ]
CE990000      24      12       4       - rwx--    [ anon ]
CE9A0000       4       4       4       - rwx--    [ anon ]
CE9B0000    1280     972       -       - r-x--  libc_hwcap1.so.1
CEAF0000      28      28      16       - rwx--  libc_hwcap1.so.1
CEAF7000       8       8       -       - rwx--  libc_hwcap1.so.1
CEB00000       4       4       -       - r-x--  libdl.so.1
CEB10000       4       4       4       - rwx--    [ anon ]
CEB20000      56      56       -       - r-x--  libsocket.so.1
CEB3E000       4       4       -       - rw---  libsocket.so.1
CEB40000     180     136       -       - r-x--  libcurses.so.1
CEB7D000      28      28       -       - rw---  libcurses.so.1
CEB84000       8       -       -       - rw---  libcurses.so.1
CEB90000       4       4       4       - rwx--    [ anon ]
CEBA0000       4       4       4       - rw---    [ anon ]
CEBB0000       4       4       -       - rw---    [ anon ]
CEBBF000     180     180       -       - r-x--  ld.so.1
CEBFC000       8       8       4       - rwx--  ld.so.1
CEBFE000       4       4       4       - rwx--  ld.so.1
-------- ------- ------- ------- -------
total Kb    5832    3620      92       -
# mdb -k
Loading modules: [ unix genunix specfs dtrace mac cpu.generic uppc pcplusmp scsi_vhci zfs usba sockfs ip hook neti sctp arp uhci sd fctl md lofs audiosup fcip fcp random cpc crypto logindmux ptm ufs sppp ipc ]
First, load the dmod containing the new dcmd.
> ::load /wd320/max/source/mdb/segpages/i386/segpages.so 
>
Now, walk through the segments of the process address space, showing
each virtual page in the segment.  Note that output has been omitted.
> 0t919::pid2proc | ::print proc_t p_as | ::walk seg | ::segpages
      VA       PA                 FILE  OFFSET     SHARES DISPOSITION
 8045000 378C5000               [anon] 54518000        1  VALID
 8046000 6EB06000               [anon] 54118000        1  VALID
 8047000 5F9C7000               [anon] 540B8000        1  VALID
      VA       PA                 FILE  OFFSET     SHARES DISPOSITION
 8050000 600A7000                 bash        0        7  VALID
 8051000 74368000                 bash     1000        7  VALID
 8052000 72669000                 bash     2000        7  VALID
 8053000 66C6A000                 bash     3000        7  VALID
 8054000 636AB000                 bash     4000        0  INVALID,INMEMORY
 8055000 5FDEC000                 bash     5000        0  INVALID,INMEMORY
 8056000 63EED000                 bash     6000        0  INVALID,INMEMORY
 8057000 62EAE000                 bash     7000        0  INVALID,INMEMORY
 8058000 5C52F000                 bash     8000        7  VALID
 8059000 5C5B0000                 bash     9000        7  VALID
... output omitted
 80ED000 5C2C4000                 bash    9D000        7  VALID
 80EE000 5C245000                 bash    9E000        7  VALID
 80EF000 5C286000                 bash    9F000        3  VALID
 80F0000 63A97000                 bash    A0000        0  INVALID,INMEMORY
      VA       PA                 FILE  OFFSET     SHARES DISPOSITION
 8100000 79940000               [anon] 541D8000        1  VALID
 8101000 5F0C1000               [anon] 62F00000        1  VALID
 8102000 378C2000               [anon] 54438000        1  VALID
 8103000 5EF5A000                 bash    A3000        6  VALID
 8104000 5EEDB000                 bash    A4000        6  VALID
 8105000 37885000               [anon] 543B8000        1  VALID
 8106000 60E1D000                 bash    A6000        7  VALID
...
      VA       PA                 FILE  OFFSET     SHARES DISPOSITION
 8114000 37914000               [anon] 54478000        1  VALID
 8115000 79DD5000               [anon] 54368000        1  VALID
 8116000 55356000               [anon] 62F90000        1  VALID
...
      VA       PA                 FILE  OFFSET     SHARES DISPOSITION
CE410000 7AE40000          libnsl.so.1        0       55  VALID
CE411000 7AEC1000          libnsl.so.1     1000       57  VALID
CE412000 7AE42000          libnsl.so.1     2000       57  VALID
CE413000 7AE83000          libnsl.so.1     3000       57  VALID
CE414000 7AE84000          libnsl.so.1     4000       57  VALID
...
CE42D000 6EE96000          libnsl.so.1    1D000       1A  INVALID,INMEMORY
CE42E000 6E797000          libnsl.so.1    1E000       1A  INVALID,INMEMORY
...
      VA       PA                 FILE  OFFSET     SHARES DISPOSITION
CE4F0000  17D9000 methods_unicode.so.3        0       29  VALID
CE4F1000  17DA000 methods_unicode.so.3     1000       2C  VALID
...
      VA       PA                 FILE  OFFSET     SHARES DISPOSITION
CE510000  1869000     en_US.UTF-8.so.3        0       28  VALID
CE511000  18AA000     en_US.UTF-8.so.3     1000       2A  VALID
...
CE518000 6F1EA000     en_US.UTF-8.so.3     8000        0  INVALID,INMEMORY
CE519000 6F1EB000     en_US.UTF-8.so.3     9000        0  INVALID,INMEMORY
CE51A000 6F1EC000     en_US.UTF-8.so.3     A000        0  INVALID,INMEMORY
...
CE5FF000 6DB60000     en_US.UTF-8.so.3    EF000        5  INVALID,INMEMORY
CE600000  1659000     en_US.UTF-8.so.3    F0000        7  INVALID,INMEMORY
...
CE6EE000  1687000     en_US.UTF-8.so.3   1DE000        9  INVALID,INMEMORY
CE6EF000  1688000     en_US.UTF-8.so.3   1DF000        9  INVALID,INMEMORY
CE6F0000  1649000     en_US.UTF-8.so.3   1E0000        9  INVALID,INMEMORY
...
CE729000  1782000     en_US.UTF-8.so.3   219000       29  VALID
CE72A000  1783000     en_US.UTF-8.so.3   21A000       29  VALID
...
CE730000  1709000     en_US.UTF-8.so.3   220000       29  VALID
CE731000 6F143000     en_US.UTF-8.so.3   221000        0  INVALID,INMEMORY
CE732000 6F144000     en_US.UTF-8.so.3   222000        0  INVALID,INMEMORY
...
      VA       PA                 FILE  OFFSET     SHARES DISPOSITION
CE9B0000 76A42000     libc_hwcap1.so.1        0       5B  VALID
CE9B1000 76AC3000     libc_hwcap1.so.1     1000       5B  VALID
...
      VA       PA                 FILE  OFFSET     SHARES DISPOSITION
...
CEBC4000  2A34000              ld.so.1     5000       47  VALID
CEBC5000  28B5000              ld.so.1     6000       47  VALID
CEBC6000  29F6000              ld.so.1     7000       60  VALID
CEBC7000  2937000              ld.so.1     8000       60  VALID
...
>
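Each data row in the output above has six whitespace-separated columns, which makes it easy to post-process. The following is a minimal Python sketch of a row parser, not part of the dcmd itself. The names PageRecord and parse_segpages_line are hypothetical, and the sketch assumes that file names contain no spaces and that SHARES is printed in hex (values such as 1A and 5B suggest so). Rows for device mappings may be laid out differently and are not handled here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PageRecord:
    va: int                # virtual address of the page
    pa: Optional[int]      # physical address, or None when the PA column is "-"
    file: str              # backing file, swap device, or "[anon]"
    offset: int            # offset within the file or swap device
    shares: int            # share count (assumed to be printed in hex)
    flags: frozenset       # e.g. {"INVALID", "INMEMORY"}


def parse_segpages_line(line: str) -> Optional[PageRecord]:
    """Parse one data row; return None for headers, prompts, and elisions."""
    fields = line.split()
    if len(fields) != 6 or fields[0] == "VA":
        return None
    va, pa, name, offset, shares, disposition = fields
    try:
        return PageRecord(
            va=int(va, 16),
            pa=None if pa == "-" else int(pa, 16),
            file=name,
            offset=int(offset, 16),
            shares=int(shares, 16),
            flags=frozenset(disposition.split(",")),
        )
    except ValueError:
        # Six columns, but not hex where hex is expected: not a data row.
        return None
```

Returning None for anything that is not a data row lets the same function be fed the raw output, including the repeated column headers printed for each segment.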
Some general things to note:
-  Physical pages are randomly distributed.  However, pages from ld.so.1 tend to be in low memory compared to anonymous pages.  This should be expected, as most pages of ld.so.1 are probably loaded early in the system's lifetime because nearly every application uses it.
 -  There are many pages that are not valid, but are in memory.  In general, text and data pages are prefetched when a program starts, unless the program is large or there is not enough free memory.  Although pages are prefetched, it appears that they are not mapped into the process address space until/unless they are actually used.
 -  Bash is not very large.  Running the command above finishes in 5-10 seconds.  Running the same command on a large program (for instance, firefox-bin), takes several minutes to complete. Running the command on a large 64-bit application will take considerably longer.
 -  This is being run on a live system, so the address space of the process being examined may change while it is being walked.
 -  At this point in time, no pages are swapped out.
 
Now, let's get some general statistics.
First, a count of the pages currently valid for the process.  This is the current mapped RSS.  Note that the pmap command reports "RSS", which, at 3620k is 905 4k-byte pages.  But only 558 pages (or 2232k) are currently valid.
> 0t919::pid2proc | ::print proc_t p_as | ::walk seg | ::segpages !grep -i " valid" | wc
     558    3348   35712
>
Now, the pages in memory, but not currently valid in the page table(s) for the process.
> 0t919::pid2proc | ::print proc_t p_as | ::walk seg | ::segpages !egrep -i "inmemory" | wc
     347    2082   26025
>
Note that the valid pages plus the in memory pages is 905, or the value reported by pmap.  So RSS as reported by pmap does not imply that page faults will not happen for all of those pages.  But if a page fault occurs the correct page will be found in memory.
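The arithmetic behind that observation can be written out as a short Python check. The helper name kb_to_pages is hypothetical, and the 4k page size is the x86 value used throughout this post.

```python
def kb_to_pages(kbytes: int, page_kb: int = 4) -> int:
    """Convert a pmap size in Kbytes to a page count (4k pages assumed)."""
    if kbytes % page_kb:
        raise ValueError("size is not a whole number of pages")
    return kbytes // page_kb


rss_pages = kb_to_pages(3620)     # RSS total from pmap -x
valid, in_memory = 558, 347       # counts from the two mdb pipelines above

# RSS as reported by pmap covers both mapped pages and unmapped in-memory pages.
assert rss_pages == valid + in_memory == 905

mapped_kb = valid * 4             # Kbytes actually mapped: 2232
```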
How many pages are currently not valid (and not in memory)?
> 0t919::pid2proc | ::print proc_t p_as | ::walk seg | ::segpages !egrep -i " invalid$" | wc
     553    3298   36498
>
How large is the address space?
> 0t919::pid2proc | ::print proc_t p_as | ::walk seg | ::segpages !egrep -v OFFSET | wc
    1458    8728   98235
>
Note that 1458 4k-byte pages is 5832k, the total size as reported by pmap.
How many pages have been swapped out?
> 0t919::pid2proc | ::print proc_t p_as | ::walk seg | ::segpages !grep -i swapped | wc
       0       0       0
>
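Each of the counts above walks the address space again, and on a live system the address space may change between walks. One alternative is to collect all the counts in a single pass over one run of the output. Here is a minimal Python sketch of that idea; the function name tally and the bucket names are hypothetical, and the four tests are meant to mirror the grep patterns used above.

```python
from collections import Counter


def tally(lines):
    """Count segpages rows per state in one pass over the output."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        # Data rows have six columns; skip headers, prompts, and elisions.
        if len(fields) != 6 or fields[0] == "VA":
            continue
        disposition = fields[5].upper()
        flags = disposition.split(",")
        counts["total"] += 1
        if "VALID" in flags:            # like: grep -i " valid"
            counts["valid"] += 1
        if "INMEMORY" in flags:         # like: egrep -i "inmemory"
            counts["inmemory"] += 1
        if "SWAPPED" in flags:          # like: grep -i swapped
            counts["swapped"] += 1
        if disposition == "INVALID":    # like: egrep -i " invalid$"
            counts["invalid_only"] += 1
    return counts


sample = [
    "      VA       PA                 FILE  OFFSET     SHARES DISPOSITION",
    " 8053000 66C6A000                 bash     3000        7  VALID",
    " 8054000 636AB000                 bash     4000        0  INVALID,INMEMORY",
    " 8046000        - /dev/zvol/dsk/rpool/swap 17EBF000        0  INVALID,SWAPPED",
    "CE518000        -     en_US.UTF-8.so.3     8000        0  INVALID",
]
print(dict(tally(sample)))
```

The flags are counted independently rather than as exclusive buckets, because a page may carry more than one of them (for example, INMEMORY and SWAPPED). Assuming Python is available on the system being examined, a small script that calls tally(sys.stdin) could be placed after the "!" in the mdb pipeline in place of grep and wc.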
Now, we'll induce memory load on the system, and again examine the address space.  The memory usage induced should be enough to cause pages to be swapped (paged) out.
First, pmap output after the memory stress.
$ pmap -x 919
919: /bin/bash --noediting -i
 Address  Kbytes     RSS    Anon  Locked Mode   Mapped File
08045000      12       4       -       - rw---    [ stack ]
08050000     644     508       -       - r-x--  bash
08100000      80      80       -       - rwx--  bash
08114000      52      44      28       - rwx--    [ heap ]
CE410000     624     320       -       - r-x--  libnsl.so.1
CE4BC000      16      16       4       - rw---  libnsl.so.1
CE4C0000      20       8       -       - rw---  libnsl.so.1
CE4F0000      56      36       -       - r-x--  methods_unicode.so.3
CE50D000       4       4       -       - rwx--  methods_unicode.so.3
CE510000    2416     124       -       - r-x--  en_US.UTF-8.so.3
CE77B000       4       4       -       - rwx--  en_US.UTF-8.so.3
CE960000      64      16       -       - rwx--    [ anon ]
CE97E000       4       4       -       - rwxs-    [ anon ]
CE980000       4       4       -       - rwx--    [ anon ]
CE990000      24      12       4       - rwx--    [ anon ]
CE9A0000       4       4       4       - rwx--    [ anon ]
CE9B0000    1280     952       -       - r-x--  libc_hwcap1.so.1
CEAF0000      28      28      12       - rwx--  libc_hwcap1.so.1
CEAF7000       8       8       -       - rwx--  libc_hwcap1.so.1
CEB00000       4       4       -       - r-x--  libdl.so.1
CEB10000       4       4       4       - rwx--    [ anon ]
CEB20000      56      56       -       - r-x--  libsocket.so.1
CEB3E000       4       4       -       - rw---  libsocket.so.1
CEB40000     180      68       -       - r-x--  libcurses.so.1
CEB7D000      28      28       -       - rw---  libcurses.so.1
CEB84000       8       -       -       - rw---  libcurses.so.1
CEB90000       4       4       4       - rwx--    [ anon ]
CEBA0000       4       4       4       - rw---    [ anon ]
CEBB0000       4       4       -       - rw---    [ anon ]
CEBBF000     180     180       -       - r-x--  ld.so.1
CEBFC000       8       8       4       - rwx--  ld.so.1
CEBFE000       4       4       4       - rwx--  ld.so.1
-------- ------- ------- ------- -------
total Kb    5832    2544      72       -
$
As expected, the RSS has gone down, but the virtual size remains the same.  It is a little interesting that the amount reported under Anon has also dropped by 20k (from 92k to 72k).
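To see which mappings lost resident pages, the two pmap listings can be compared mechanically. The following Python sketch is one way to do it, under the assumption that the `pmap -x` column layout is exactly as shown above. The function names parse_pmap_x and rss_changes are hypothetical.

```python
def _kb(field: str) -> int:
    """pmap prints "-" for an empty column; treat it as zero."""
    return 0 if field == "-" else int(field)


def parse_pmap_x(text: str) -> dict:
    """Return {(address, mapping): (kbytes, rss, anon)} from pmap -x output."""
    rows = {}
    for line in text.splitlines():
        # The mapping name may contain spaces ("[ stack ]"), so split at most 6 times.
        fields = line.split(None, 6)
        if len(fields) != 7:
            continue
        addr, kbytes, rss, anon, _locked, _mode, name = fields
        try:
            int(addr, 16)  # rejects the column header line
            rows[(addr, name.strip())] = (_kb(kbytes), _kb(rss), _kb(anon))
        except ValueError:
            continue
    return rows


def rss_changes(before: dict, after: dict) -> list:
    """List (address, mapping, rss_before, rss_after) where RSS differs."""
    changes = []
    for key, (_, rss_before, _) in before.items():
        if key in after and after[key][1] != rss_before:
            changes.append((key[0], key[1], rss_before, after[key][1]))
    return changes
```

Applied to the two listings in this post, this would report, for example, that the en_US.UTF-8.so.3 text mapping at CE510000 went from 752k to 124k resident.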
Again, we'll use the new dcmd to examine the address space more closely.
> 0t919::pid2proc | ::print proc_t p_as | ::walk seg | ::segpages
      VA       PA                 FILE  OFFSET     SHARES DISPOSITION
 8045000 378C5000               [anon] 54518000        1  VALID
 8046000        - /dev/zvol/dsk/rpool/swap 17EBF000        0  INVALID,SWAPPED
 8047000        - /dev/zvol/dsk/rpool/swap 1D8FE000        0  INVALID,SWAPPED
      VA       PA                 FILE  OFFSET     SHARES DISPOSITION
 8050000 16B86000                 bash        0        1  VALID
 8051000 13A07000                 bash     1000        1  VALID
 8052000 6B088000                 bash     2000        1  VALID
 8053000  1889000                 bash     3000        1  VALID
 8054000 2430A000                 bash     4000        1  VALID
 8055000 6440B000                 bash     5000        1  VALID
 8056000 6684C000                 bash     6000        1  VALID
 8057000 7308D000                 bash     7000        1  VALID
 8058000 6DCCE000                 bash     8000        0  INVALID,INMEMORY
 8059000 3784F000                 bash     9000        0  INVALID,INMEMORY
...
 80ED000 4CB23000                 bash    9D000        0  INVALID,INMEMORY
 80EE000 76BE4000                 bash    9E000        0  INVALID,INMEMORY
 80EF000  5BA5000                 bash    9F000        0  INVALID,INMEMORY
 80F0000 36836000                 bash    A0000        0  INVALID,INMEMORY
      VA       PA                 FILE  OFFSET     SHARES DISPOSITION
 8100000        - /dev/zvol/dsk/rpool/swap 247C2000        0  INVALID,SWAPPED
 8101000        - /dev/zvol/dsk/rpool/swap  7CCD000        0  INVALID,SWAPPED
 8102000 378C2000               [anon] 54438000        1  VALID
 8103000 75479000                 bash    A3000        0  INVALID,INMEMORY
 8104000 532BA000                 bash    A4000        0  INVALID,INMEMORY
 8105000 37885000               [anon] 543B8000        1  VALID
 8106000 7443C000                 bash    A6000        0  INVALID,INMEMORY
...
      VA       PA                 FILE  OFFSET     SHARES DISPOSITION
 8114000 37914000               [anon] 54478000        1  VALID
 8115000 79DD5000               [anon] 54368000        1  VALID
 8116000 55356000               [anon] 62F90000        1  VALID
...
      VA       PA                 FILE  OFFSET     SHARES DISPOSITION
CE410000 7AE40000          libnsl.so.1        0       4C  VALID
CE411000 7AEC1000          libnsl.so.1     1000       4E  VALID
CE412000 7AE42000          libnsl.so.1     2000       4E  VALID
CE413000 7AE83000          libnsl.so.1     3000       4E  VALID
CE414000 7AE84000          libnsl.so.1     4000       4E  VALID
...
CE42D000 6EE96000          libnsl.so.1    1D000       18  INVALID,INMEMORY
CE42E000 6E797000          libnsl.so.1    1E000       18  INVALID,INMEMORY
...
      VA       PA                 FILE  OFFSET     SHARES DISPOSITION
CE4F0000  17D9000 methods_unicode.so.3        0       27  VALID
CE4F1000  17DA000 methods_unicode.so.3     1000       2A  VALID
...
      VA       PA                 FILE  OFFSET     SHARES DISPOSITION
CE510000  1869000     en_US.UTF-8.so.3        0       26  VALID
CE511000  18AA000     en_US.UTF-8.so.3     1000       28  VALID
...
CE518000        -     en_US.UTF-8.so.3     8000        0  INVALID
CE519000        -     en_US.UTF-8.so.3     9000        0  INVALID
CE51A000        -     en_US.UTF-8.so.3     A000        0  INVALID
...
CE5FF000        -     en_US.UTF-8.so.3    EF000        0  INVALID
CE600000        -     en_US.UTF-8.so.3    F0000        0  INVALID
...
CE6EE000  1687000     en_US.UTF-8.so.3   1DE000        A  INVALID,INMEMORY
CE6EF000  1688000     en_US.UTF-8.so.3   1DF000        A  INVALID,INMEMORY
CE6F0000  1649000     en_US.UTF-8.so.3   1E0000        A  INVALID,INMEMORY
...
CE729000  1782000     en_US.UTF-8.so.3   219000       27  VALID
CE72A000  1783000     en_US.UTF-8.so.3   21A000       27  VALID
...
CE730000  1709000     en_US.UTF-8.so.3   220000       27  VALID
CE731000        -     en_US.UTF-8.so.3   221000        0  INVALID
CE732000        -     en_US.UTF-8.so.3   222000        0  INVALID
...
      VA       PA                 FILE  OFFSET     SHARES DISPOSITION
CE9B0000 76A42000     libc_hwcap1.so.1        0       51  VALID
CE9B1000 76AC3000     libc_hwcap1.so.1     1000       51  VALID
...
      VA       PA                 FILE  OFFSET     SHARES DISPOSITION
CEBC4000  2A34000              ld.so.1     5000       42  VALID
CEBC5000  28B5000              ld.so.1     6000       42  VALID
CEBC6000  29F6000              ld.so.1     7000       57  VALID
CEBC7000  2937000              ld.so.1     8000       57  VALID
...
>
As expected, many pages that were previously valid are now invalid.  Many of these pages are still in memory, but some have been swapped out.  The output does not show it, but some pages that are swapped out can also be in memory (the page was swapped out and put on a freelist, but has not yet been re-used for some other purpose).  It is interesting that some pages with reasonably high share counts are still in memory, but no longer valid for this instance of bash.  The pageout code checks the share counts, and skips pages being shared by more than po_share processes.  On my system, po_share is 8.  So I am not sure what is marking the pages invalid (maybe a job for DTrace).
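Comparing the two runs by eye is tedious, so here is a minimal Python sketch that pairs up rows by virtual address and counts how each page's disposition changed. The function names are hypothetical, and the sketch assumes both runs were saved to text and that each row has the six columns shown above.

```python
from collections import Counter


def disposition_by_va(lines) -> dict:
    """Map each page's virtual address (as printed) to its DISPOSITION value."""
    result = {}
    for line in lines:
        fields = line.split()
        if len(fields) == 6 and fields[0] != "VA":
            result[fields[0]] = fields[5]
    return result


def transitions(before_lines, after_lines) -> Counter:
    """Count (old, new) disposition pairs for pages whose state changed."""
    before = disposition_by_va(before_lines)
    after = disposition_by_va(after_lines)
    changes = Counter()
    for va, old in before.items():
        new = after.get(va)
        if new is not None and new != old:
            changes[(old, new)] += 1
    return changes


before_run = [
    " 8046000 6EB06000               [anon] 54118000        1  VALID",
    " 8054000 636AB000                 bash     4000        0  INVALID,INMEMORY",
    "CE518000 6F1EA000     en_US.UTF-8.so.3     8000        0  INVALID,INMEMORY",
    "CE410000 7AE40000          libnsl.so.1        0       55  VALID",
]
after_run = [
    " 8046000        - /dev/zvol/dsk/rpool/swap 17EBF000        0  INVALID,SWAPPED",
    " 8054000 2430A000                 bash     4000        1  VALID",
    "CE518000        -     en_US.UTF-8.so.3     8000        0  INVALID",
    "CE410000 7AE40000          libnsl.so.1        0       4C  VALID",
]
for (old, new), n in transitions(before_run, after_run).items():
    print(f"{old} -> {new}: {n}")
```

The sample rows are taken from the two runs in this post. Pages whose disposition did not change, such as the libnsl.so.1 page, are left out of the result.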
As before, I'll get some counts of valid, invalid, inmemory, and
swapped pages.
> 0t919::pid2proc | ::print proc_t p_as | ::walk seg | ::segpages !grep -i " valid" | wc
     413    2478   26432
>
Previously, the number of valid pages was 558, so 145 pages have been marked invalid and possibly swapped out.
The number of invalid pages is now:
> 0t919::pid2proc | ::print proc_t p_as | ::walk seg | ::segpages !egrep -i " invalid$" | wc
     818    4888   53988
>
Previously, this was 553, so 265 more pages are now neither mapped nor in memory (nor on swap).  These were previously either valid or INMEMORY.
> 0t919::pid2proc | ::print proc_t p_as | ::walk seg | ::segpages !egrep -i "inmemory" | wc
     215    1290   16125
>
And 215 pages that are invalid are still in memory, but the page table entries for the bash instance do not have the pages mapped.
> 0t919::pid2proc | ::print proc_t p_as | ::walk seg | ::segpages !grep -i swapped | wc
      12      72     936
And 12 pages of bash are on swap.
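As a sanity check, the counts from before and after the memory stress can be put side by side. This Python snippet only restates the numbers reported above; the dictionary keys are just labels.

```python
# Counts reported by the mdb pipelines before and after the memory stress.
before = {"valid": 558, "inmemory": 347, "invalid": 553, "swapped": 0}
after = {"valid": 413, "inmemory": 215, "invalid": 818, "swapped": 12}

# Both snapshots should account for every page in the 5832k address space.
assert sum(before.values()) == sum(after.values()) == 1458

# Net change per state between the two snapshots.
delta = {state: after[state] - before[state] for state in before}
print(delta)
```

Both snapshots add up to exactly 1458 pages, which suggests that no page was counted under two states at once in either run (for example, none was both INMEMORY and SWAPPED). The valid and in-memory counts fell by 145 and 132 pages, which matches the 265 newly invalid pages plus the 12 on swap.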
It would be nice to be able to show this graphically.  For instance, a large box representing the address space, with different colored pixels to represent the state of the different pages of the address space.  I have been told that JavaFX is good for this, but my knowledge of Java is really not up to it.  Especially for large processes, a graphical view would be nice (well, at least interesting to look at...).
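Short of a real graphical tool, a crude text rendering already gives some of the same picture. The following is only a sketch in Python, with one character per page in address order. The symbols and the function names page_symbol and render_map are arbitrary choices of mine, and the input is assumed to be the DISPOSITION column taken from the dcmd output.

```python
# One character per page state; these symbols are arbitrary.
SYMBOLS = {"VALID": "V", "INMEMORY": "m", "SWAPPED": "s", "INVALID": "."}


def page_symbol(disposition: str) -> str:
    """Pick a single character for a DISPOSITION value such as "INVALID,INMEMORY"."""
    flags = disposition.upper().split(",")
    for flag in ("VALID", "SWAPPED", "INMEMORY"):
        if flag in flags:
            return SYMBOLS[flag]
    return SYMBOLS["INVALID"]


def render_map(dispositions, width: int = 64) -> list:
    """Return rows of `width` characters, one character per page."""
    cells = "".join(page_symbol(d) for d in dispositions)
    return [cells[i:i + width] for i in range(0, len(cells), width)]


for row in render_map(["VALID", "VALID", "INVALID,INMEMORY",
                       "INVALID,SWAPPED", "INVALID", "VALID"], width=3):
    print(row)
```

With the default width of 64, the 1458 pages of the bash address space would fit in 23 rows. Gaps between segments are not represented here; a fuller version would need the virtual addresses as well.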
I have not tried the dcmd on SPARC or x64, but I expect it to work (at least on x64).  I would also like to try this on a large machine which has locality groups set up.  If anyone has such a machine and would like to try this out, please let me know.
I also have a version of the command that only prints summary information.  I want to add an option that prints page sizes, but currently the command assumes all pages are the native page size (4k on x86/x64 and 8k on SPARC).
If there is interest, I'll make the code for the dcmd available.
3 comments:
Interesting. Please publish.
alan
The source is available at
ftp://ftp.bruningsystems.com/segpages.tar
Got it! thanks!
alan