My server crashes once or twice a month. Checking from the console, I see a kernel panic (image attached). I am using CentOS 4.4 with the latest RPM kernel.
We have experienced serious downtime on our server due to a "Kernel Panic" problem. The server stops responding to ping requests. When we accessed the console through IPMI, we saw a "Kernel Panic" error detailed here: [url] and [url]
We are running Red Hat Enterprise Linux (RHEL) 5.1.
[root@host~]# uname -a
Linux host.com 2.6.18-92.1.1.el5PAE #1 SMP Thu May 22 09:16:17 EDT 2008 i686 i686 i386 GNU/Linux
Today my server just crashed instantly... It was running fine with a load of 0.12 but then it went down all of a sudden. After a reboot I checked the /var/log/messages file and saw this:
Code:
Mar 23 15:57:00 alpha kernel: Unable to handle kernel NULL pointer dereference at virtual address 00000020
Mar 23 15:57:00 alpha kernel: printing eip:
Mar 23 15:57:00 alpha kernel: c017292b
Mar 23 15:57:00 alpha kernel: *pde = 2085d001
Mar 23 15:57:00 alpha kernel: Oops: 0000 [#1]
Mar 23 15:57:00 alpha kernel: SMP
Mar 23 15:57:00 alpha kernel: Modules linked in: ipt_TOS iptable_mangle ip_conntrack_ftp ip_conntrack_irc ipt_LOG ipt_limit ipt_multiport ipt_state ip_conntrack ipt_owner ipt_REJECT iptable_filter ip_tables autofs4 i2c_dev i2c_core sunrpc md5 ipv6 dm_mirror dm_mod button battery ac uhci_hcd ehci_hcd e1000 floppy ext3 jbd ata_piix libata sd_mod scsi_mod
Mar 23 15:57:00 alpha kernel: CPU: 0
Mar 23 15:57:00 alpha kernel: EIP: 0060:[<c017292b>] Not tainted VLI
Mar 23 15:57:00 alpha kernel: EFLAGS: 00010246 (2.6.9-42.0.3.ELsmp)
Mar 23 15:57:00 alpha kernel: EIP is at dnotify_flush+0xe/0x70
Mar 23 15:57:00 alpha kernel: eax: c0446c40 ebx: ca05dd80 ecx: 0046ef00 edx: f1d95300
Mar 23 15:57:00 alpha kernel: esi: 00000000 edi: ca05dd80 ebp: f1d95300 esp: eb3f3fa0
Mar 23 15:57:00 alpha kernel: ds: 007b es: 007b ss: 0068
Mar 23 15:57:00 alpha kernel: Process httpd (pid: 28057, threadinfo=eb3f3000 task=efaf03b0)
Mar 23 15:57:00 alpha kernel: Stack: ca05dd80 00000220 f1d95300 eb3f3000 c015a7e5 0000020f bff65b64 bff65af0
Mar 23 15:57:00 alpha kernel: c02d47cb 0000020f 0046eff4 08cd4e6c bff65b64 bff65af0 bff615b8 00000006
Mar 23 15:57:00 alpha kernel: c02d007b 0000007b 00000006 0032a7a2 00000073 00000246 bff61598 0000007b
Mar 23 15:57:00 alpha kernel: Call Trace:
Mar 23 15:57:00 alpha kernel: [<c015a7e5>] filp_close+0x49/0x5f
Mar 23 15:57:00 alpha kernel: [<c02d47cb>] syscall_call+0x7/0xb
Mar 23 15:57:00 alpha kernel: [<c02d007b>] packet_rcv+0x17e/0x307
Mar 23 15:57:00 alpha kernel: Code: 00 89 c3 85 d2 74 0e 8b 42 04 8b 12 25 ff ff ff 7f 09 c1 eb ee 89 8b 34 01 00 00 5b c3 55 89 d5 57 89 c7 56 53 8b 40 08 8b 70 10 <0f> b7 46 20 25 00 f0 00 00 3d 00 40 00 00 75 4d 8d 46 68 e8 fb
Mar 23 15:57:00 alpha kernel: <0>Fatal exception: panic in 5 seconds
Mar 23 16:06:46 alpha syslogd 1.4.1: restart.

So it seems like httpd crashed my server? I checked the Apache error_log and noticed lines like these:
Code:
*** glibc detected *** free(): invalid pointer: 0x08872c98 ***
[Fri Mar 23 15:57:03 2007] [error] [client x.x.x.x] Premature end of script headers: /home/user/public_html/forum/index.php
[Fri Mar 23 15:57:03 2007] [error] [client x.x.x.x] File does not exist: /home/user/public_html/500.shtml

When I look at the whole error_log it seems like such errors appear frequently and for different websites and PHP applications. This isn't the only glibc error, there's also stuff like "*** glibc detected *** corrupted double-linked list: 0x088e0bb8 ***". I have recently installed eAccelerator and searched the web: I'm not the only one who has such problems with eAccelerator. Do you think these glibc errors may have caused the server to crash?
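When reading an oops like the one quoted above, it can help to pull out just the faulting function, the process context, and the call trace. The following is a minimal sketch, assuming the 2.6.9-era i386 oops layout shown in the post; the function name summarize_oops and the regular expressions are illustrative, not part of any standard tool. Note that the "Process" line only names the task that was running when the kernel faulted; it does not by itself prove that httpd caused the fault.

```python
import re

# A few lines copied from the oops quoted in the post (syslog prefix included).
SAMPLE = """\
Mar 23 15:57:00 alpha kernel: Unable to handle kernel NULL pointer dereference at virtual address 00000020
Mar 23 15:57:00 alpha kernel: EIP: 0060:[<c017292b>] Not tainted VLI
Mar 23 15:57:00 alpha kernel: EIP is at dnotify_flush+0xe/0x70
Mar 23 15:57:00 alpha kernel: Process httpd (pid: 28057, threadinfo=eb3f3000 task=efaf03b0)
Mar 23 15:57:00 alpha kernel: [<c015a7e5>] filp_close+0x49/0x5f
Mar 23 15:57:00 alpha kernel: [<c02d47cb>] syscall_call+0x7/0xb
"""

# Patterns are assumptions based on the log format above.
EIP_RE = re.compile(r"EIP is at (\w+)\+0x[0-9a-f]+/0x[0-9a-f]+")
PROC_RE = re.compile(r"Process (\S+) \(pid: (\d+)")
FRAME_RE = re.compile(r"\[<[0-9a-f]+>\] (\w+)\+0x")


def summarize_oops(text):
    """Return the faulting function, process context and call trace symbols."""
    eip = EIP_RE.search(text)
    proc = PROC_RE.search(text)
    return {
        "faulting_function": eip.group(1) if eip else None,
        "process": proc.group(1) if proc else None,
        "pid": int(proc.group(2)) if proc else None,
        "call_trace": FRAME_RE.findall(text),
    }


summary = summarize_oops(SAMPLE)
print(summary)
```

Here the fault is inside kernel code (dnotify_flush, reached via filp_close), so the glibc messages in the Apache log are user-space symptoms that would need to be investigated separately.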
Also, lately I noticed a lot of lines like these in the /var/log/messages file:
Code:
Mar 23 15:47:45 alpha kernel: post_create: setxattr failed, rc=122 (dev=sda7 ino=4318565)

and

Code:
Mar 23 15:50:56 alpha ntpd[2958]: sendto(66.111.46.200): Operation not permitted

I've done some googling on the first error and it seems that someone with the same error got it fixed by running a filesystem check... By the way, /dev/sda7 is my /home directory. But in order to run a filesystem check I need to disable all services and unmount /home? That's kind of a pain in the ***... In general, how long do you think it takes to run a filesystem check? Here's information about my /home partition:
We have one box at hivelocity.net that has been down so many times this month that we were forced to remove our links to siteuptime, where we were once so proud of having 99.7% uptime for 3 years at theplanet.
syslog shows that just before crashing, these entries were made:
kernel: kernel BUG at mm/rmap.c:479
kernel: invalid operand:0000 [#1]
dmesg also shows this:
...
Brought up 2 CPUs
zapping low mappings.
checking if image is initramfs... it is
Freeing initrd memory: 482k freed
NET: Registered protocol family 16
PCI: PCI BIOS revision 2.10 entry at 0xf9f20, last bus=1
PCI: Using configuration type 1
mtrr: v2.0 (20020519)
mtrr: your CPUs had inconsistent fixed MTRR settings
mtrr: probably your BIOS does not setup all CPUs.
mtrr: corrected configuration.
...
I've googled these messages and they point to RAM problems.
hivelocity.net claims to have run diagnostics on the box and says no problems were reported.
They said this is the result of a system configuration problem on our side.
Last year I ordered a new server with CentOS 4.3, and it had kernel 2.6.9-34.0.2ELsmp installed. It ran fine and I haven't updated any packages since then.
Today I started getting a problem where both mysqld and kswapd0 use very high amounts of CPU, spiking up to 100%, and my memory usage is at 99% all the time. The problem seems exactly the same as the one mentioned in this thread.
In that thread the exact same kernel is said to be insecure and to cause this problem. I also came across a CentOS bug that reports this problem with high CPU and memory usage and with mysql and kswapd0 consuming all resources.
In the linked thread the person solved the problem by upgrading to kernel 2.6.9-42 using rpms but others recommended a newer kernel or a custom compiled kernel for CentOS.
Apparently when they used yum it said 34.0.2 was the latest kernel.
What should I do to upgrade the kernel, which version should I upgrade to, and where do I get it from? I won't be able to compile a custom kernel and I've only installed basic rpm packages before.
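To check whether a candidate kernel is actually newer than the running one, the numeric part of the release strings can be compared field by field. This is a simplified sketch, not rpm's own version-comparison algorithm: it ignores suffixes such as "ELsmp" and only compares the leading digits. The helper names release_key and is_newer are made up for illustration; the two release strings are the ones mentioned in the post.

```python
import re


def release_key(release):
    """Turn '2.6.9-34.0.2ELsmp' into a tuple of ints: (2, 6, 9, 34, 0, 2)."""
    # Keep only the leading run of digits, dots and dashes, then split it.
    prefix = re.match(r"[\d.\-]+", release).group(0)
    return tuple(int(n) for n in re.findall(r"\d+", prefix))


def is_newer(candidate, running):
    """True if the candidate release sorts after the running release."""
    return release_key(candidate) > release_key(running)


running = "2.6.9-34.0.2ELsmp"   # kernel shipped with the server
candidate = "2.6.9-42"          # kernel the linked thread upgraded to

print(release_key(running), release_key(candidate))
print("upgrade is newer:", is_newer(candidate, running))
```

On a live system the running release would come from uname -r, and the candidate from whatever the configured yum repositories offer.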
Here is what I saw when I installed kernel-2.6.20-1.2948.fc6.src.rpm:
rpm -ivh kernel-2.6.20-1.2948.fc6.src.rpm
1:kernel warning: user brewbuilder does not exist - using root
warning: group brewbuilder does not exist - using root
warning: user brewbuilder does not exist - using root
########################################### [100%]
warning: user brewbuilder does not exist - using root
warning: group brewbuilder does not exist - using root
Then when I ran:
rpmbuild -bp --target=$(uname -m) /usr/src/redhat/SPECS/kernel-2.6.spec

I saw this error:
+ Arch=x86_64
+ make ARCH=x86_64 nonint_oldconfig
In file included from /usr/include/sys/socket.h:35,
from /usr/include/netinet/in.h:24,
from /usr/include/arpa/inet.h:23,
from scripts/basic/fixdep.c:117:
/usr/include/bits/socket.h:310:24: error: asm/socket.h: No such file or directory
make[1]: *** [scripts/basic/fixdep] Error 1
make: *** [scripts_basic] Error 2
error: Bad exit status from /var/tmp/rpm-tmp.93770 (%prep)
I need to have this installed in order to get an app installed. Any suggestions or ideas? Thanks.
I have a Xen VPS. I started with a Debian 4 image and have since upgraded to Debian 5. Firstly was this advisable? Secondly what Kernel version should I be running, or rather is it set by my installation or by the Xen server?
As part of a project I have lately been looking into various aspects of kernel tuning, most recently tuning the TCP stack for more efficient memory usage and throughput.
Thought I would start this thread to mention some of the tools I'd found for doing testing and see what anyone else had to recommend.
So far my favorite of the bunch is nuttcp. It's easy to use and gives a very good idea of how much of your bandwidth you are able to utilize.
A few interesting web pages are as follows for anyone interested in the topic:
[url] - Tuning TCP for High Bandwidth Delay networks
[url] - TCP Tuning Cookbook, some interesting information in there as well
[url]...formanceTuning - Performance Tuning TWiki. Has a list of useful tools, flags for existing tools and ways to monitor network performance from a system level, along with some suggestions of things to correct
What is the best way to find out which filesystems and hard drive drivers you can remove? Obviously, I need ext2/ext3, but how do you find out which hard drive drivers you actually need?
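One starting point for this question is to look at which modules the running kernel currently has loaded and referenced, since those are the drivers and filesystems the hardware is using right now. The sketch below parses text in the /proc/modules format (name, size, reference count, dependents); the sample data and the helper names are hypothetical and only for illustration. A reference count of zero is only a hint: some drivers that are in active use (network drivers, for example) can still show zero.

```python
# Hypothetical sample in /proc/modules format; on a real system, read
# the text from /proc/modules instead.
SAMPLE = """\
ext3 116809 2 - Live 0xf8862000
jbd 71257 1 ext3, Live 0xf884f000
floppy 58481 0 - Live 0xf8825000
sd_mod 17217 3 - Live 0xf8813000
scsi_mod 122573 2 libata,sd_mod, Live 0xf88a0000
"""


def parse_proc_modules(text):
    """Return {module: (refcount, [modules that depend on it])}."""
    modules = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        name, _size, refcount, used_by = fields[:4]
        dependents = [m for m in used_by.split(",") if m and m != "-"]
        modules[name] = (int(refcount), dependents)
    return modules


def referenced(modules):
    """Modules with a non-zero refcount or with dependents."""
    return sorted(n for n, (ref, deps) in modules.items() if ref > 0 or deps)


def unreferenced(modules):
    """Modules nothing currently references: candidates to review, not proof."""
    return sorted(n for n, (ref, deps) in modules.items() if ref == 0 and not deps)


mods = parse_proc_modules(SAMPLE)
print("in use:", referenced(mods))
print("review before removing:", unreferenced(mods))
```

The referenced list shows what must stay in a custom kernel; anything in the second list should be checked by hand before it is dropped from the configuration.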
I cannot find any important information in /var/log/messages,
but I do find some entries repeated many times in it, like these:
----------------------------------
Jun 15 05:30:40 server kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Jun 15 05:30:40 server kernel: ata1.00: (irq_stat 0x40000001)
Jun 15 05:30:40 server kernel: ata1.00: cmd 25/00:08:42:23:d2/00:00:2c:00:00/e0 tag 0 cdb 0x0 data 4096 in
Jun 15 05:30:40 server kernel: res 51/40:00:42:23:d2/00:00:2c:00:00/e0 Emask 0x9 (media error)
Jun 15 05:30:40 server kernel: ata1.00: configured for UDMA/133
Jun 15 05:30:40 server kernel: ata1: EH complete
Jun 15 05:30:42 server kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Jun 15 05:30:42 server kernel: ata1.00: (irq_stat 0x40000001)
Jun 15 05:30:42 server kernel: ata1.00: cmd 25/00:08:42:23:d2/00:00:2c:00:00/e0 tag 0 cdb 0x0 data 4096 in
Jun 15 05:30:42 server kernel: res 51/40:00:42:23:d2/00:00:2c:00:00/e0 Emask 0x9 (media error)
Jun 15 05:30:42 server kernel: ata1.00: configured for UDMA/133
Jun 15 05:30:42 server kernel: ata1: EH complete
Jun 15 05:30:44 server kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Jun 15 05:30:51 server kernel: ata1.00: (irq_stat 0x40000001)
Jun 15 05:30:51 server kernel: ata1.00: cmd 25/00:08:42:23:d2/00:00:2c:00:00/e0 tag 0 cdb 0x0 data 4096 in
Jun 15 05:30:51 server kernel: res 51/40:00:42:23:d2/00:00:2c:00:00/e0 Emask 0x9 (media error)
Jun 15 05:30:51 server kernel: ata1.00: configured for UDMA/133
Jun 15 05:30:51 server kernel: ata1: EH complete
Jun 15 05:30:51 server kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Jun 15 05:30:51 server kernel: ata1.00: (irq_stat 0x40000001)
Jun 15 05:30:51 server kernel: ata1.00: cmd 25/00:08:42:23:d2/00:00:2c:00:00/e0 tag 0 cdb 0x0 data 4096 in
Jun 15 05:30:51 server kernel: res 51/40:00:42:23:d2/00:00:2c:00:00/e0 Emask 0x9 (media error)
Jun 15 05:30:51 server kernel: ata1.00: configured for UDMA/133
Jun 15 05:30:51 server kernel: ata1: EH complete
Jun 15 05:30:51 server kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Jun 15 05:30:51 server kernel: ata1.00: (irq_stat 0x40000001)
Jun 15 05:30:51 server kernel: ata1.00: cmd 25/00:08:42:23:d2/00:00:2c:00:00/e0 tag 0 cdb 0x0 data 4096 in
Jun 15 05:30:51 server kernel: res 51/40:00:42:23:d2/00:00:2c:00:00/e0 Emask 0x9 (media error)
Jun 15 05:30:51 server kernel: ata1.00: configured for UDMA/133
Jun 15 05:30:52 server kernel: ata1: EH complete
Jun 15 05:31:26 server kernel: ata1.00: configured for UDMA/133
Jun 15 05:31:30 server kernel: sd 0:0:0:0: SCSI error: return code = 0x08000002
Jun 15 05:31:33 server kernel: sda: Current [descriptor]: sense key: Medium Error
Jun 15 05:31:36 server kernel: Add. Sense: Unrecovered read error - auto reallocate failed
Jun 15 05:31:36 server kernel:
Jun 15 05:31:39 server kernel: Descriptor sense data with sense descriptors (in hex):
Jun 15 05:31:46 server kernel: 72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
Jun 15 05:31:51 server kernel: 2c d2 23 42
Jun 15 05:31:56 server kernel: end_request: I/O error, dev sda, sector 751969090
Jun 15 05:31:57 server kernel: ata1: EH complete
Jun 15 05:31:57 server kernel: SCSI device sda: 976773168 512-byte hdwr sectors (500108 MB)
Jun 15 05:31:58 server kernel: sda: Write Protect is off
Jun 15 05:31:58 server kernel: SCSI device sda: drive cache: write back
Jun 15 05:31:59 server kernel: SCSI device sda: 976773168 512-byte hdwr sectors (500108 MB)
Jun 15 05:32:03 server kernel: sda: Write Protect is off
Jun 15 05:32:04 server kernel: SCSI device sda: drive cache: write back
-------------------
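These entries are worth decoding, because they all appear to point at one sector. The sketch below extracts the 48-bit LBA from the "res" taskfile string and compares it with the sense data bytes and the sector in the end_request line. The field layout (status/error:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device) is my reading of libata's log format and should be treated as an assumption; decode_lba48 is an illustrative helper, not a library function.

```python
# Values copied from the log entries quoted in the post.
RES_TASKFILE = "51/40:00:42:23:d2/00:00:2c:00:00/e0"
SENSE_INFO_BYTES = "2c d2 23 42"
FAILED_SECTOR = 751969090      # from the end_request line
TOTAL_SECTORS = 976773168      # from the "SCSI device sda" line


def decode_lba48(taskfile):
    """Decode the 48-bit LBA from a libata 'cmd'/'res' taskfile string.

    Assumed layout: status/err:nsect:lbal:lbam:lbah/
                    hob_feat:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device
    """
    _status, low, high, _device = taskfile.split("/")
    _err, _nsect, lbal, lbam, lbah = (int(x, 16) for x in low.split(":"))
    _hfeat, _hnsect, hob_lbal, hob_lbam, hob_lbah = (
        int(x, 16) for x in high.split(":")
    )
    # Most significant byte first: the "hob" (high order) bytes come first.
    return int.from_bytes(
        bytes([hob_lbah, hob_lbam, hob_lbal, lbah, lbam, lbal]), "big"
    )


lba = decode_lba48(RES_TASKFILE)
sense_lba = int.from_bytes(bytes.fromhex(SENSE_INFO_BYTES.replace(" ", "")), "big")

print("LBA from taskfile:  ", lba)
print("LBA from sense data:", sense_lba)
print("matches end_request:", lba == sense_lba == FAILED_SECTOR)
print("position on disk: %.1f%%" % (100.0 * lba / TOTAL_SECTORS))
```

A read that keeps failing on the same LBA with "media error" and "auto reallocate failed" usually suggests a bad sector on the drive itself rather than a kernel problem, so checking the drive's SMART data and making sure backups are current would be a reasonable next step.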
I copied the default config file and renamed it as .config but I get this:
Code:
WARNING: No module dm-mem-cache found for kernel 2.6.27.10-grsec, continuing anyway
WARNING: No module dm-region_hash found for kernel 2.6.27.10-grsec, continuing anyway
WARNING: No module dm-message found for kernel 2.6.27.10-grsec, continuing anyway
WARNING: No module dm-raid45 found for kernel 2.6.27.10-grsec, continuing anyway
My current kernel version is "2.6.9-42.0.10.ELsmp #1 SMP Fri Feb 16 17:17:21 EST 2007 i686 athlon i386 GNU/Linux". I want it to be upgraded since it is old. I have been told by our server management company that the latest kernel distributed from yum is kernel.i686 0:2.6.9-78.0.22.E. Can anyone tell me if this version is safe and secure enough? It is a CentOS release 4.7 (Final) server with cPanel installed.
When building 2.6.26+ (or whatever it is), how do you enable conntrack? What options do I need to enable under make menuconfig?
net.netfilter.nf_conntrack_acct = 1
net.netfilter.nf_conntrack_generic_timeout = 120
error: "net.netfilter.nf_conntrack_icmp_timeout" is an unknown key
error: "net.netfilter.nf_conntrack_tcp_timeout_close" is an unknown key
error: "net.netfilter.nf_conntrack_tcp_timeout_time_wait" is an unknown key
error: "net.netfilter.nf_conntrack_tcp_timeout_last_ack" is an unknown key
error: "net.netfilter.nf_conntrack_tcp_timeout_close_wait" is an unknown key
error: "net.netfilter.nf_conntrack_tcp_timeout_fin_wait" is an unknown key
error: "net.netfilter.nf_conntrack_tcp_timeout_established" is an unknown key
error: "net.netfilter.nf_conntrack_tcp_timeout_syn_recv" is an unknown key
error: "net.netfilter.nf_conntrack_tcp_timeout_syn_sent" is an unknown key
error: "net.netfilter.nf_conntrack_udp_timeout" is an unknown key
error: "net.netfilter.nf_conntrack_udp_timeout_stream" is an unknown key
net.netfilter.nf_conntrack_max = 262144
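For the conntrack question, one way to see what is missing is to check the kernel .config for the relevant options before rebuilding. The sketch below does that for a short list of option names as I recall them from 2.6.2x-era kernels (CONFIG_NF_CONNTRACK, CONFIG_NF_CONNTRACK_IPV4, CONFIG_NETFILTER_XT_MATCH_STATE); treat the list as an assumption and confirm the names in the Netfilter section of make menuconfig for your kernel version. The sample config text is hypothetical. The "unknown key" errors above may simply mean that the protocol-specific conntrack code is not built or not loaded, but that is a guess, not something the output proves.

```python
# Option names are assumptions for 2.6.2x-era kernels; verify in menuconfig.
WANTED = (
    "CONFIG_NF_CONNTRACK",
    "CONFIG_NF_CONNTRACK_IPV4",
    "CONFIG_NETFILTER_XT_MATCH_STATE",
)

# Hypothetical excerpt of a kernel .config, for illustration only.
SAMPLE_CONFIG = """\
CONFIG_NETFILTER=y
CONFIG_NF_CONNTRACK=m
# CONFIG_NF_CONNTRACK_IPV4 is not set
CONFIG_NETFILTER_XT_MATCH_STATE=m
"""


def read_config(text):
    """Return {option: value} for every 'CONFIG_X=value' line."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("CONFIG_") and "=" in line:
            key, value = line.split("=", 1)
            options[key] = value
    return options


def missing_options(text, wanted=WANTED):
    """List wanted options that are neither built in (y) nor modular (m)."""
    options = read_config(text)
    return [name for name in wanted if options.get(name) not in ("y", "m")]


missing = missing_options(SAMPLE_CONFIG)
print("not enabled:", missing)
```

On a real build tree the text would come from the .config file in the kernel source directory; options built as modules (m) also need the module loaded before their sysctl keys show up.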
And how do I know which hardware/device drivers I can remove?