-- Demo case: Real life use case for pSnapper -- Date: 13-04-2019 -- Info: Database runs on NFS storage with dNFS and Oracle's archiver process is hanging/stuck from time to time -- References: https://blog.tanelpoder.com/psnapper/ -- Demo case from Oracle version: 12.1.0.2 -- create system state dumps (usually 2 or 3 in a row to get a picture if process state is moving) when problem occurs oradebug setmypid oradebug unlimit oradebug dump systemstate 258 --- system state dump snippets regarding Oracle's archiver process --- 1st systemstate dump ... PROCESS 28: ARC2 ---------------------------------------- SO: 0x558a431c8, type: 2, owner: (nil), flag: INIT/-/-/0x00 if: 0x3 c: 0x3 proc=0x558a431c8, name=process, file=ksu.h LINE:13949, pg=0 conuid=0 (process) Oracle pid:28, ser:2, calls cur/top: 0x55cb2bc70/0x55cb2c868 flags : (0x2) SYSTEM icon_uid:0 flags2: (0x0), flags3: (0x10) intr error: 0, call error: 0, sess error: 0, txn error 0 intr queue: empty ksudlp FALSE at location: 0 (post info) last post received: 0 0 26 last post received-location: ksa2.h LINE:290 ID:ksasnd last process to post me: 0x55c7e52c0 1 6 last post sent: 0 0 9 last post sent-location: ksq.h LINE:2797 ID:ksqrcl last process posted by me: 0x55c7e8960 1 2 waiter on post event: 0 (latch info) hold_bits=0x0 Process Group: DEFAULT, pseudo proc: 0x558ae2080 O/S info: user: oracle, term: UNKNOWN, ospid: 75413 OSD pid info: Short stack dump: ksedsts()+244<-ksdxfstk()+58<-ksdxcb()+918<-sspuser()+224<-__sighandler()<-recvmsg()+16<-kgnfs_finish_partial_recv()+403<-kgnfs_getaprocdata()+1040<-kgnfs_processmsg()+2461<-kgnfs_getmsg()+457<-kgnfswat()+2653<-kgnfs_wat()+150<-kgodm_reap_ios()+1247<-kgodm_io()+366<-odm_io()+58<-ksfd_odmwat()+1035<-ksfdwtio()+610<-ksfdwat_internal()+197<-ksfdblock()+328<-krr_finish_read()+657<-krr_async_read()+731<-krr_read_disk()+6519<-krr_read()+216<-krse_arc_spool()+1973<-krse_arc_driver_core()+1211<-krse_arc_driver()+568<-kcrrwkx()+2870<-kcrrwk()+316<-ksbabs()+865<-krsv_abs()+71<-ksbrdp()+1068<-opirip()+1488<-opidrv()+616<-sou2o()+145<-opimai_real()+270<-ssthrdmain()+412<-main()+236<-__libc_start_main()+253 ... ---------------------------------------- SO: 0x55ca32310, type: 4, owner: 0x558a431c8, flag: INIT/-/-/0x00 if: 0x3 c: 0x3 proc=0x558a431c8, name=session, file=ksu.h LINE:13957, pg=0 conuid=0 (session) sid: 316 ser: 27109 trans: (nil), creator: 0x558a431c8 flags: (0x51) USR/- flags_idl: (0x1) BSY/-/-/-/-/- flags2: (0x409) -/-/INC DID: 0001-001C-00000003, short-term DID: 0000-0000-00000000 txn branch: (nil) edition#: 0 user#/name: 0/SYS oct: 0, prv: 0, sql: (nil), psql: (nil) stats: 0x547d9f558, PX stats: 0xcf6e4f0 ksuxds FALSE at location: 0 service name: SYS$BACKGROUND Current Wait Stack: 0: waiting for 'log file sequential read' log#=0x285ec, block#=0x98000, blocks=0x800 wait_id=10788869 seq_num=14171 snap_id=1 wait times: snap=13 min 44 sec, exc=13 min 44 sec, total=13 min 44 sec wait times: max=infinite, heur=13 min 44 sec wait counts: calls=0 os=0 in_wait=1 iflags=0x5a0 Wait State: fixed_waits=0 flags=0x22 boundary=(nil)/-1 Session Wait History: elapsed time of 0.001137 sec since current wait 0: waited for 'log file sequential read' log#=0x285ec, block#=0x97000, blocks=0x800 wait_id=10788868 seq_num=14170 snap_id=1 wait times: snap=0.005067 sec, exc=0.005067 sec, total=0.005067 sec wait times: max=infinite wait counts: calls=0 os=0 occurred after 0.004964 sec of elapsed time 1: waited for 'log file sequential read' log#=0x285ec, block#=0x96000, blocks=0x800 wait_id=10788867 seq_num=14169 snap_id=1 wait times: snap=0.003334 sec, exc=0.003334 sec, total=0.003334 sec wait times: max=infinite wait counts: calls=0 os=0 occurred after 0.003222 sec of elapsed time 2: waited for 'log file sequential read' log#=0x285ec, block#=0x95000, blocks=0x800 wait_id=10788866 seq_num=14168 snap_id=1 wait times: snap=0.006440 sec, exc=0.006440 sec, total=0.006440 sec wait times: max=infinite wait counts: calls=0 os=0 occurred after 0.001065 sec of elapsed time 3: waited for 'log file sequential read' log#=0x285ec, block#=0x94000, blocks=0x800 wait_id=10788865 seq_num=14167 snap_id=1 wait times: snap=0.005283 sec, exc=0.005283 sec, total=0.005283 sec wait times: max=infinite wait counts: calls=0 os=0 occurred after 0.000572 sec of elapsed time 4: waited for 'log file sequential read' log#=0x285ec, block#=0x93800, blocks=0x800 wait_id=10788864 seq_num=14166 snap_id=1 wait times: snap=0.002916 sec, exc=0.002916 sec, total=0.002916 sec wait times: max=infinite wait counts: calls=0 os=0 occurred after 0.000560 sec of elapsed time 5: waited for 'log file sequential read' log#=0x285ec, block#=0x93000, blocks=0x800 wait_id=10788863 seq_num=14165 snap_id=1 wait times: snap=0.005669 sec, exc=0.005669 sec, total=0.005669 sec wait times: max=infinite wait counts: calls=0 os=0 occurred after 0.000694 sec of elapsed time 6: waited for 'log file sequential read' log#=0x285ec, block#=0x92800, blocks=0x800 wait_id=10788862 seq_num=14164 snap_id=1 wait times: snap=0.003684 sec, exc=0.003684 sec, total=0.003684 sec wait times: max=infinite wait counts: calls=0 os=0 occurred after 0.003850 sec of elapsed time 7: waited for 'log file sequential read' log#=0x285ec, block#=0x91800, blocks=0x800 wait_id=10788861 seq_num=14163 snap_id=1 wait times: snap=0.001965 sec, exc=0.001965 sec, total=0.001965 sec wait times: max=infinite wait counts: calls=0 os=0 occurred after 0.000584 sec of elapsed time 8: waited for 'log file sequential read' log#=0x285ec, block#=0x91000, blocks=0x800 wait_id=10788860 seq_num=14162 snap_id=1 wait times: snap=0.003977 sec, exc=0.003977 sec, total=0.003977 sec wait times: max=infinite wait counts: calls=0 os=0 occurred after 0.000744 sec of elapsed time 9: waited for 'log file sequential read' log#=0x285ec, block#=0x90800, blocks=0x800 wait_id=10788859 seq_num=14161 snap_id=1 wait times: snap=0.008299 sec, exc=0.008299 sec, total=0.008299 sec wait times: max=infinite wait counts: calls=0 os=0 occurred after 0.000565 sec of elapsed time ... --- 2nd systemstate dump ... PROCESS 28: ARC2 ---------------------------------------- SO: 0x558a431c8, type: 2, owner: (nil), flag: INIT/-/-/0x00 if: 0x3 c: 0x3 proc=0x558a431c8, name=process, file=ksu.h LINE:13949, pg=0 conuid=0 (process) Oracle pid:28, ser:2, calls cur/top: 0x55cb2bc70/0x55cb2c868 flags : (0x2) SYSTEM icon_uid:0 flags2: (0x0), flags3: (0x10) intr error: 0, call error: 0, sess error: 0, txn error 0 intr queue: empty ksudlp FALSE at location: 0 (post info) last post received: 0 0 26 last post received-location: ksa2.h LINE:290 ID:ksasnd last process to post me: 0x55c7e52c0 1 6 last post sent: 0 0 9 last post sent-location: ksq.h LINE:2797 ID:ksqrcl last process posted by me: 0x55c7e8960 1 2 waiter on post event: 0 (latch info) hold_bits=0x0 Process Group: DEFAULT, pseudo proc: 0x558ae2080 O/S info: user: oracle, term: UNKNOWN, ospid: 75413 OSD pid info: Short stack dump: ksedsts()+244<-ksdxfstk()+58<-ksdxcb()+918<-sspuser()+224<-__sighandler()<-recvmsg()+14<-kgnfs_finish_partial_recv()+403<-kgnfs_getaprocdata()+1040<-kgnfs_processmsg()+2461<-kgnfs_getmsg()+457<-kgnfswat()+2653<-kgnfs_wat()+150<-kgodm_reap_ios()+1247<-kgodm_io()+366<-odm_io()+58<-ksfd_odmwat()+1035<-ksfdwtio()+610<-ksfdwat_internal()+197<-ksfdblock()+328<-krr_finish_read()+657<-krr_async_read()+731<-krr_read_disk()+6519<-krr_read()+216<-krse_arc_spool()+1973<-krse_arc_driver_core()+1211<-krse_arc_driver()+568<-kcrrwkx()+2870<-kcrrwk()+316<-ksbabs()+865<-krsv_abs()+71<-ksbrdp()+1068<-opirip()+1488<-opidrv()+616<-sou2o()+145<-opimai_real()+270<-ssthrdmain()+412<-main()+236<-__libc_start_main()+253 ... ---------------------------------------- SO: 0x55ca32310, type: 4, owner: 0x558a431c8, flag: INIT/-/-/0x00 if: 0x3 c: 0x3 proc=0x558a431c8, name=session, file=ksu.h LINE:13957, pg=0 conuid=0 (session) sid: 316 ser: 27109 trans: (nil), creator: 0x558a431c8 flags: (0x51) USR/- flags_idl: (0x1) BSY/-/-/-/-/- flags2: (0x409) -/-/INC DID: 0001-001C-00000003, short-term DID: 0000-0000-00000000 txn branch: (nil) edition#: 0 user#/name: 0/SYS oct: 0, prv: 0, sql: (nil), psql: (nil) stats: 0x547d9f558, PX stats: 0xcf6e4f0 ksuxds FALSE at location: 0 service name: SYS$BACKGROUND Current Wait Stack: 0: waiting for 'log file sequential read' log#=0x285ec, block#=0x98000, blocks=0x800 wait_id=10788869 seq_num=14171 snap_id=1 wait times: snap=28 min 44 sec, exc=28 min 44 sec, total=28 min 44 sec wait times: max=infinite, heur=28 min 44 sec wait counts: calls=0 os=0 in_wait=1 iflags=0x5a0 Wait State: fixed_waits=0 flags=0x22 boundary=(nil)/-1 Session Wait History: elapsed time of 0.001137 sec since current wait 0: waited for 'log file sequential read' log#=0x285ec, block#=0x97000, blocks=0x800 wait_id=10788868 seq_num=14170 snap_id=1 wait times: snap=0.005067 sec, exc=0.005067 sec, total=0.005067 sec wait times: max=infinite wait counts: calls=0 os=0 occurred after 0.004964 sec of elapsed time 1: waited for 'log file sequential read' log#=0x285ec, block#=0x96000, blocks=0x800 wait_id=10788867 seq_num=14169 snap_id=1 wait times: snap=0.003334 sec, exc=0.003334 sec, total=0.003334 sec wait times: max=infinite wait counts: calls=0 os=0 occurred after 0.003222 sec of elapsed time 2: waited for 'log file sequential read' log#=0x285ec, block#=0x95000, blocks=0x800 wait_id=10788866 seq_num=14168 snap_id=1 wait times: snap=0.006440 sec, exc=0.006440 sec, total=0.006440 sec wait times: max=infinite wait counts: calls=0 os=0 occurred after 0.001065 sec of elapsed time 3: waited for 'log file sequential read' log#=0x285ec, block#=0x94000, blocks=0x800 wait_id=10788865 seq_num=14167 snap_id=1 wait times: snap=0.005283 sec, exc=0.005283 sec, total=0.005283 sec wait times: max=infinite wait counts: calls=0 os=0 occurred after 0.000572 sec of elapsed time 4: waited for 'log file sequential read' log#=0x285ec, block#=0x93800, blocks=0x800 wait_id=10788864 seq_num=14166 snap_id=1 wait times: snap=0.002916 sec, exc=0.002916 sec, total=0.002916 sec wait times: max=infinite wait counts: calls=0 os=0 occurred after 0.000560 sec of elapsed time 5: waited for 'log file sequential read' log#=0x285ec, block#=0x93000, blocks=0x800 wait_id=10788863 seq_num=14165 snap_id=1 wait times: snap=0.005669 sec, exc=0.005669 sec, total=0.005669 sec wait times: max=infinite wait counts: calls=0 os=0 occurred after 0.000694 sec of elapsed time 6: waited for 'log file sequential read' log#=0x285ec, block#=0x92800, blocks=0x800 wait_id=10788862 seq_num=14164 snap_id=1 wait times: snap=0.003684 sec, exc=0.003684 sec, total=0.003684 sec wait times: max=infinite wait counts: calls=0 os=0 occurred after 0.003850 sec of elapsed time 7: waited for 'log file sequential read' log#=0x285ec, block#=0x91800, blocks=0x800 wait_id=10788861 seq_num=14163 snap_id=1 wait times: snap=0.001965 sec, exc=0.001965 sec, total=0.001965 sec wait times: max=infinite wait counts: calls=0 os=0 occurred after 0.000584 sec of elapsed time 8: waited for 'log file sequential read' log#=0x285ec, block#=0x91000, blocks=0x800 wait_id=10788860 seq_num=14162 snap_id=1 wait times: snap=0.003977 sec, exc=0.003977 sec, total=0.003977 sec wait times: max=infinite wait counts: calls=0 os=0 occurred after 0.000744 sec of elapsed time 9: waited for 'log file sequential read' log#=0x285ec, block#=0x90800, blocks=0x800 wait_id=10788859 seq_num=14161 snap_id=1 wait times: snap=0.008299 sec, exc=0.008299 sec, total=0.008299 sec wait times: max=infinite wait counts: calls=0 os=0 occurred after 0.000565 sec of elapsed time ... --- ARC2 is stuck by reading log#=0x285ec, block#=0x98000, blocks=0x800 for at least 28 min 44 sec and executing OS syscall recvmsg(). --- However syscall recvmsg() itself is not stuck - see different + = Offset in bytes from beginning of symbol (function) where child function call happened. --- Further analysis with pSnapper regarding Linux OS syscall recvmsg() would be with option wchan or upcoming kstack functionality, e.g. something like : psn -p 75413 -a -g syscall,wchan,kstack --- We need to know what's going on in Linux kernel space for OS syscall recvmsg() :-)