swap_pager: indefinite wait buffer

Miroslav Lachman 000.fbsd at quip.cz
Sat Aug 5 20:05:45 CEST 2006


Dnes se mi v logu na jednom serveru objevilo nekolik zprav, jako je tato:
swap_pager: indefinite wait buffer: bufobj: 0, blkno: 23, size: 4096

Zhruba ve stejnou dobu "umrel" i proces, ktery testoval diskovy system 
(kopirovani souboru a overovani jejich md5 checksumu), od teto chvile je 
server neuveritelne zpomaleny. Jakakoliv operace v prikazovem radku pres 
SSH trva velice dlouho (napriklad i dokoncovani nazvu souboru v shellu). 
System je pritom nevytizen. Sice na nem je spusteni Apache, MySQL, 
Postfix, Amavisd atd., ale zatim neni nasazen do provozu. Jedine, co 
vykonava nejakou cinnost je generovani grafu MRTG.
Vyse ovedena hlaska se v prubehu dne jeste nekolikrat zopakovala.

last pid: 51602;  load averages:  0.00,  0.00,  0.00         up 
2+20:23:23  19:45:22
94 processes:  1 running, 93 sleeping
CPU states:  0.4% user,  0.0% nice,  0.0% system,  0.0% interrupt, 99.6% 
idle
Mem: 257M Active, 100M Inact, 102M Wired, 26M Cache, 60M Buf, 9316K Free
Swap: 3072M Total, 152K Used, 3072M Free

   PID USERNAME  THR PRI NICE   SIZE    RES STATE    TIME   WCPU COMMAND
  1077 root        1   5    0  2016K  1180K ttyin    3:26  0.00% systat
   599 mysql      10 102    0   493M 70820K ucond    0:43  0.00% mysqld
  1087 root        1  -8    0  2520K  1828K piperd   0:25  0.00% ccze
   544 root        1  96    0 26628K 25092K select   0:24  0.00% perl5.8.8
   491 root        1  96    0  3180K  1444K select   0:07  0.00% ntpd
   542 root        1   8    0 23524K 11116K nanslp   0:07  0.00% httpd
24567 root        1 -16    0  1948K  1220K vnread   0:05  0.00% cp


Napriklad spusteni prikazu gstat trva vic nez minutu.


systat -vmstat

     1 users    Load  0.04  0.01  0.00                  Aug  5 19:49

Mem:KB    REAL            VIRTUAL                     VN PAGER  SWAP PAGER
         Tot   Share      Tot    Share    Free         in  out     in  out
Act  131024    8804   906652    26924   32056 count
All  498000   75216246234032   210428         pages
                                                                  Interrupts
Proc:r  p  d  s  w    Csw  Trp  Sys  Int  Sof  Flt        cow    2008 total
         1 12 80       356    4   69  158    2      104572 wire 
14: ata
                                                    262212 act       5 
16: bge
  0.3%Sys   0.0%Intr  0.0%User  0.0%Nice 99.7%Idl   106616 inact     2 
19: uhc
|    |    |    |    |    |    |    |    |    |      26488 cache  2001 
cpu0: time
                                                      5568 free
                                                           daefr
Namei         Name-cache    Dir-cache                     prcfr
     Calls     hits    %     hits    %                     react
         1        1  100                                   pdwake
                                           zfod            pdpgs
Disks   ad4   ad5                         ozfod           intrn
KB/t  13.67 13.67                         %slo-z    61456 buf
tps       1     1                       1 tfree        18 dirtybuf
MB/s   0.02  0.02                                   35659 desiredvnodes
% busy  111   111                                   29876 numvnodes
                                                      8913 freevnodes

Vsimnete si, ze disky maji 111% busy (pri jednotlivych obnovenich 
obrazovky se hodnoty pohybuji od 90 do 115), presto tps pouze 1 a datovy 
tok temer nulovy.
Ve vypisu gstat je videt problikavani velmi vysokych hodnot 
procentualniho zatizeni (cisla od 150 az do 3000), ale jsou to jen velmi 
kratke okamziky, rozhodne to nevypada, jako kdyz se na disk opravdu neco 
systematicky zapisuje / cte. Zaroven jsou tam silene dlouhe casy pro 
zapisy (ms/w).

dT: 0.501s  w: 0.500s
  L(q) ops/s r/s kBps ms/r w/s kBps   ms/w   %busy Name
    0    0    0    0  0.0   0    0    0.0    0.0| acd0
    4    2    0    0  0.0   2    4 5110.3  164.6| ad5
    5    2    0    0  0.0   2    4 5002.8  167.9| ad4
    6    2    0    0  0.0   2    4 5002.9  167.9| mirror/gm0
    0    0    0    0  0.0   0    0    0.0    0.0| ad4s1
    0    0    0    0  0.0   0    0    0.0    0.0| ad4s2
    0    0    0    0  0.0   0    0    0.0    0.0| ad5s1
    0    0    0    0  0.0   0    0    0.0    0.0| ad5s2
    1    0    0    0  0.0   0    0    0.0    0.0| mirror/gm0s1
    5    2    0    0  0.0   2    4 5003.0  167.9| mirror/gm0s2
    0    0    0    0  0.0   0    0    0.0    0.0| mirror/gm0s1a
    0    0    0    0  0.0   0    0    0.0    0.0| mirror/gm0s1b
    0    0    0    0  0.0   0    0    0.0    0.0| mirror/gm0s1c
    0    0    0    0  0.0   0    0    0.0    0.0| mirror/gm0s1d
    0    0    0    0  0.0   0    0    0.0    0.0| mirror/gm0s1e
    0    0    0    0  0.0   0    0    0.0    0.0| mirror/gm0s1f
    0    0    0    0  0.0   0    0    0.0    0.0| mirror/gm0s1g
    1    0    0    0  0.0   0    0    0.0    0.0| mirror/gm0s1h
    0    0    0    0  0.0   0    0    0.0    0.0| mirror/gm0s2c
    5    2    0    0  0.0   2    4 5003.0  167.9| mirror/gm0s2d

Nejsem bohuzel nijak schopen zjistit, cim to muze byt.

Zmineny stav trva uz vice jak 12 hodin, neni mi znamo, ze by neco 
opravdu pracovalo s diskem. Dokazal by mi nekdo poradit, cim a jak 
zjistovat, kde muze byt problem?

Miroslav Lachman

PS: stale se jedna o jiz nekolikrat zminovane stroje ASUS RS120-E3 s 
250GB SATA disky Seagate s pouzitim Gmirroru.

FreeBSD 6.1-STABLE #0: Wed Aug  2 22:07:55 CEST 2006 
/usr/obj/usr/src/sys/ASUS_RS120  i386



More information about the Users-l mailing list