Kernel stack watchdog traces
Each Kudu server process has a background thread called the Stack Watchdog, which monitors other threads in the server in case they are blocked for longer-than-expected periods of time. These traces can indicate operating system issues or bottle-necked storage.
When the watchdog thread identifies a case of thread blockage, it logs an
entry in the WARNING
log as follows:
W0921 23:51:54.306350 10912 kernel_stack_watchdog.cc:111] Thread 10937 stuck at /data/kudu/consensus/log.cc:505 for 537ms: Kernel stack: [<ffffffffa00b209d>] do_get_write_access+0x29d/0x520 [jbd2] [<ffffffffa00b2471>] jbd2_journal_get_write_access+0x31/0x50 [jbd2] [<ffffffffa00fe6d8>] __ext4_journal_get_write_access+0x38/0x80 [ext4] [<ffffffffa00d9b23>] ext4_reserve_inode_write+0x73/0xa0 [ext4] [<ffffffffa00d9b9c>] ext4_mark_inode_dirty+0x4c/0x1d0 [ext4] [<ffffffffa00d9e90>] ext4_dirty_inode+0x40/0x60 [ext4] [<ffffffff811ac48b>] __mark_inode_dirty+0x3b/0x160 [<ffffffff8119c742>] file_update_time+0xf2/0x170 [<ffffffff8111c1e0>] __generic_file_aio_write+0x230/0x490 [<ffffffff8111c4c8>] generic_file_aio_write+0x88/0x100 [<ffffffffa00d3fb1>] ext4_file_write+0x61/0x1e0 [ext4] [<ffffffff81180f5b>] do_sync_readv_writev+0xfb/0x140 [<ffffffff81181ee6>] do_readv_writev+0xd6/0x1f0 [<ffffffff81182046>] vfs_writev+0x46/0x60 [<ffffffff81182102>] sys_pwritev+0xa2/0xc0 [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b [<ffffffffffffffff>] 0xffffffffffffffff User stack: @ 0x3a1ace10c4 (unknown) @ 0x1262103 (unknown) @ 0x12622d4 (unknown) @ 0x12603df (unknown) @ 0x8e7bfb (unknown) @ 0x8f478b (unknown) @ 0x8f55db (unknown) @ 0x12a7b6f (unknown) @ 0x3a1b007851 (unknown) @ 0x3a1ace894d (unknown) @ (nil) (unknown)
These traces can be useful for diagnosing root-cause latency issues in Kudu especially when they are caused by underlying systems such as disk controllers or filesystems.