| Anonymous | Login | Signup for a new account | 2010-09-09 17:17 CEST |
| Main | My View | View Issues | Change Log | Roadmap | Docs |
| Viewing Issue Simple Details [ Jump to Notes ] | [ View Advanced ] [ Issue History ] [ Print ] | |||||||||||
| ID | Category | Severity | Reproducibility | Date Submitted | Last Update | |||||||
| 0000559 | [Sphinx] general | crash | always | 2010-07-27 05:05 | 2010-07-27 05:36 | |||||||
| Reporter | chris14679 | View Status | public | |||||||||
| Assigned To | ||||||||||||
| Priority | normal | Resolution | open | |||||||||
| Status | new | Product Version | 1.10-beta | |||||||||
| Summary | 0000559: frequently locks up entire server when using realtime indexes | |||||||||||
| Description |
We are using beta 1.10 downloaded as RPM with 64 bit IDs. We have tried to convert an existing index into a realtime index but have now given up due to frequent serious crashes. There are 330M rows in the table we are indexing. On average we can insert about 10M rows before the system locks up. I was monitoring 'top' at the exact time of one of the crashes, when the 'top' display froze there was 19GB of available ('cached' according to top) memory (8GB RAM + 11GB swap) from the total on this machine of 36GB RAM and 11GB swap. The load average was only 5.4 (the machine has 8 cores) so memory shortage and high CPU use can be rules out as causes. I can rule out the content of the table causing the problem because when we restarted insertion from the same point after a crash, it works OK but crashes later on a different record. We got up to about 50M rows inserted with about 5 crashes at which point we abandoned realtime. On the last occasion, the machine actually locked up while searchd was loading the realtime index, before we started inserting. This pretty much rules out that anything other than the realtime index could be causing the crashes. There were no sphinx.crash files and nothing in the log to indicate what the problem is. I have now reverted to using conventional sphinx indexes and rebuilding the whole index every few days. Strangely, the server still responded to PING, sometimes intermittently, but everything else locked up - SSH, HTTP etc. all went down instantly and stayed down. A hard reboot followed by database recovery from logs was necessary each time. |
|||||||||||
| Additional Information |
Server is RHEL5 and is shared by a mysql database server. Unresolved bug 0000445 also occurred on this same server causing similar crashes with a different version of sphinx. Since then we have rotated indexes manually to avoid causing a crash so I don't know whether 0000445 is still an ongoing problem. I don't want to try a rotation until I have some indication that you think 0000445 has been identified and fixed. Generally the crashes left no record in any of the logs but on one occasion we did get this in the 'messages' file: Jul 25 05:20:56 db2 kernel: INFO: task kjournald:991 blocked for more than 120 seconds. Jul 25 05:20:56 db2 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Jul 25 05:20:56 db2 kernel: kjournald D ffff810001004420 0 991 331 1011 954 (L-TLB) Jul 25 05:20:56 db2 kernel: ffff8104be7e1dd0 0000000000000046 0000000000000010 ffff81079efe8bb0 Jul 25 05:20:56 db2 kernel: 0000000000000000 000000000000000a ffff81093ff59860 ffffffff80308b60 Jul 25 05:20:56 db2 kernel: 00000a47563b5d15 00000000000023d7 ffff81093ff59a48 00000000bfb15a78 Jul 25 05:20:56 db2 kernel: Call Trace: Jul 25 05:20:56 db2 kernel: [<ffffffff880335cf>] :jbd:journal_commit_transaction+0x16d/0x1066 Jul 25 05:20:56 db2 kernel: [<ffffffff800a0abe>] autoremove_wake_function+0x0/0x2e Jul 25 05:20:56 db2 kernel: [<ffffffff8004b36f>] try_to_del_timer_sync+0x7f/0x88 Jul 25 05:20:56 db2 kernel: [<ffffffff880375d3>] :jbd:kjournald+0xc1/0x213 Jul 25 05:20:56 db2 kernel: [<ffffffff800a0abe>] autoremove_wake_function+0x0/0x2e Jul 25 05:25:46 db2 kernel: [<ffffffff88037512>] :jbd:kjournald+0x0/0x213 Jul 25 05:26:41 db2 kernel: [<ffffffff800a08a6>] keventd_create_kthread+0x0/0xc4 Jul 25 05:26:41 db2 kernel: [<ffffffff80032894>] kthread+0xfe/0x132 Jul 25 05:26:41 db2 kernel: [<ffffffff8005dfb1>] child_rip+0xa/0x11 Jul 25 05:26:41 db2 kernel: [<ffffffff800a08a6>] keventd_create_kthread+0x0/0xc4 Jul 25 05:26:41 db2 kernel: [<ffffffff80032796>] kthread+0x0/0x132 Jul 25 05:26:41 db2 kernel: [<ffffffff8005dfa7>] child_rip+0x0/0x11 Jul 25 05:26:41 db2 kernel: Jul 25 05:26:41 db2 kernel: INFO: task mysqld:6497 blocked for more than 120 seconds. Jul 25 05:26:41 db2 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Jul 25 05:26:41 db2 kernel: mysqld D 0000000000000003 0 6497 6261 6499 (NOTLB) Jul 25 05:26:41 db2 kernel: ffff8104a7e6ded8 0000000000000086 ffff81093b609128 ffffffff800646ac Jul 25 05:26:42 db2 kernel: 0000000000000292 0000000000000007 ffff8101d50ab820 ffff8104b67677a0 Jul 25 05:26:42 db2 kernel: 00000a59bf07efc2 0000000000005ccd ffff8101d50aba08 0000000480066b88 Jul 25 05:26:42 db2 kernel: Call Trace: Jul 25 05:26:42 db2 kernel: [<ffffffff800646ac>] __down_read+0x7a/0x92 Jul 25 05:26:42 db2 kernel: [<ffffffff8000df14>] do_mmap_pgoff+0x431/0x7d7 Jul 25 05:26:42 db2 kernel: [<ffffffff80064613>] __down_write_nested+0x7a/0x92 Jul 25 05:26:42 db2 kernel: [<ffffffff80020bbc>] sys_mprotect+0xcc/0x224 Jul 25 05:26:42 db2 kernel: [<ffffffff8005d28d>] tracesys+0xd5/0xe0 Jul 25 05:26:42 db2 kernel: Jul 25 05:26:42 db2 kernel: INFO: task mysqld:6534 blocked for more than 120 seconds. Jul 25 05:26:42 db2 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Jul 25 05:26:42 db2 kernel: mysqld D 00002aaaaaaab000 0 6534 6261 6539 6531 (NOTLB) Jul 25 05:26:42 db2 kernel: ffff81012c8e1f08 0000000000000086 0000000000000821 0000000000000002 Jul 25 05:26:42 db2 kernel: 0000000000000004 0000000000000007 ffff81025d3d4820 ffff8101d50ab820 Jul 25 05:26:42 db2 kernel: 00000a59bf0792f5 000000000001ca18 ffff81025d3d4a08 000000044c4c00e0 Jul 25 05:26:42 db2 kernel: Call Trace: Jul 25 05:26:42 db2 kernel: [<ffffffff8000c5fe>] _atomic_dec_and_lock+0x39/0x57 Jul 25 05:26:42 db2 kernel: [<ffffffff80064613>] __down_write_nested+0x7a/0x92 Jul 25 05:26:42 db2 kernel: [<ffffffff800161ee>] sys_munmap+0x32/0x59 Jul 25 05:26:42 db2 kernel: [<ffffffff8005d28d>] tracesys+0xd5/0xe0 Jul 25 05:26:42 db2 kernel: Jul 25 05:26:42 db2 kernel: INFO: task mysqld:7846 blocked for more than 120 seconds. Jul 25 05:26:42 db2 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Jul 25 05:26:42 db2 kernel: mysqld D 0000000000000006 0 7846 6261 15789 7845 (NOTLB) Jul 25 05:26:42 db2 kernel: ffff810706cd7e18 0000000000000086 0000000000000000 00002aad67b81000 Jul 25 05:26:42 db2 kernel: ffffffff80317ed0 0000000000000007 ffff8109350ab860 ffff81093e6d5040 Jul 25 05:26:42 db2 kernel: 00000a59bf084051 0000000000007ffa ffff8109350aba48 0000000580008f97 Jul 25 05:26:42 db2 kernel: Call Trace: Jul 25 05:26:42 db2 kernel: [<ffffffff800646ac>] __down_read+0x7a/0x92 Jul 25 05:26:42 db2 kernel: [<ffffffff80066ad0>] do_page_fault+0x446/0x874 Jul 25 05:26:42 db2 kernel: [<ffffffff800312b3>] do_fork+0x148/0x1c1 Jul 25 05:27:28 db2 kernel: [<ffffffff8005dde9>] error_exit+0x0/0x84 Jul 25 05:27:28 db2 kernel: Jul 25 05:27:28 db2 kernel: INFO: task mysqld:13903 blocked for more than 120 seconds. Jul 25 05:27:28 db2 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Jul 25 05:27:28 db2 kernel: mysqld D 0000000000000006 0 13903 6261 13904 13902 (NOTLB) Jul 25 05:27:28 db2 kernel: ffff810723dc9e18 0000000000000086 0000000000000000 00002aad67939000 Jul 25 05:27:28 db2 kernel: ffffffff80317ed0 0000000000000007 ffff81093fc2f860 ffff8109350ab860 Jul 25 05:27:28 db2 kernel: 00000a59bf07c057 000000000001df0a ffff81093fc2fa48 0000000580008f97 Jul 25 05:27:28 db2 kernel: Call Trace: Jul 25 05:29:52 db2 kernel: [<ffffffff800646ac>] __down_read+0x7a/0x92 Jul 25 05:38:02 db2 syslogd 1.4.1: restart. ----------- The following errors also appeared in the 'messages' log after we stopped using realtime indexes and did not coincide with a lockup, but I include them as they may be relevant: Jul 26 00:11:27 db2 kernel: searchd[6216]: segfault at 0000000000000000 rip 00002b31e9abced7 rsp 00007fffcb4c5668 error 6 Jul 26 00:21:48 db2 kernel: searchd[14278]: segfault at 0000000000000000 rip 00002b7024215ed7 rsp 00007fffa1d83888 error 6 |
|||||||||||
| Tags | No tags attached. | |||||||||||
| Attached Files | ||||||||||||
|
|
||||||||||||
Issue History |
|||
| Date Modified | Username | Field | Change |
| 2010-07-27 05:05 | chris14679 | New Issue | |
| 2010-07-27 05:36 | chris14679 | Note Added: 0000812 | |
| 2010-07-27 05:39 | chris14679 | Note Edited: 0000812 | |
| Copyright © 2000 - 2008 Mantis Group |