Our server suffered serious downtime because of a kernel panic. It stopped responding to ping requests, and when we accessed the console through IPMI we saw a "Kernel Panic" error, detailed here: [url] and [url]
We are running Red Hat Enterprise Linux (RHEL) 5.1:
[root@host~]# uname -a
Linux host.com 2.6.18-92.1.1.el5PAE #1 SMP Thu May 22 09:16:17 EDT 2008 i686 i686 i386 GNU/Linux
[root@host~]#
Today my server just crashed instantly... It was running fine with a load of 0.12, but then it went down all of a sudden. After a reboot I checked the /var/log/messages file and saw this:
Code:
Mar 23 15:57:00 alpha kernel: Unable to handle kernel NULL pointer dereference at virtual address 00000020
Mar 23 15:57:00 alpha kernel: printing eip:
Mar 23 15:57:00 alpha kernel: c017292b
Mar 23 15:57:00 alpha kernel: *pde = 2085d001
Mar 23 15:57:00 alpha kernel: Oops: 0000 [#1]
Mar 23 15:57:00 alpha kernel: SMP
Mar 23 15:57:00 alpha kernel: Modules linked in: ipt_TOS iptable_mangle ip_conntrack_ftp ip_conntrack_irc ipt_LOG ipt_limit ipt_multiport ipt_state ip_conntrack ipt_owner ipt_REJECT iptable_filter ip_tables autofs4 i2c_dev i2c_core sunrpc md5 ipv6 dm_mirror dm_mod button battery ac uhci_hcd ehci_hcd e1000 floppy ext3 jbd ata_piix libata sd_mod scsi_mod
Mar 23 15:57:00 alpha kernel: CPU: 0
Mar 23 15:57:00 alpha kernel: EIP: 0060:[<c017292b>] Not tainted VLI
Mar 23 15:57:00 alpha kernel: EFLAGS: 00010246 (2.6.9-42.0.3.ELsmp)
Mar 23 15:57:00 alpha kernel: EIP is at dnotify_flush+0xe/0x70
Mar 23 15:57:00 alpha kernel: eax: c0446c40 ebx: ca05dd80 ecx: 0046ef00 edx: f1d95300
Mar 23 15:57:00 alpha kernel: esi: 00000000 edi: ca05dd80 ebp: f1d95300 esp: eb3f3fa0
Mar 23 15:57:00 alpha kernel: ds: 007b es: 007b ss: 0068
Mar 23 15:57:00 alpha kernel: Process httpd (pid: 28057, threadinfo=eb3f3000 task=efaf03b0)
Mar 23 15:57:00 alpha kernel: Stack: ca05dd80 00000220 f1d95300 eb3f3000 c015a7e5 0000020f bff65b64 bff65af0
Mar 23 15:57:00 alpha kernel: c02d47cb 0000020f 0046eff4 08cd4e6c bff65b64 bff65af0 bff615b8 00000006
Mar 23 15:57:00 alpha kernel: c02d007b 0000007b 00000006 0032a7a2 00000073 00000246 bff61598 0000007b
Mar 23 15:57:00 alpha kernel: Call Trace:
Mar 23 15:57:00 alpha kernel: [<c015a7e5>] filp_close+0x49/0x5f
Mar 23 15:57:00 alpha kernel: [<c02d47cb>] syscall_call+0x7/0xb
Mar 23 15:57:00 alpha kernel: [<c02d007b>] packet_rcv+0x17e/0x307
Mar 23 15:57:00 alpha kernel: Code: 00 89 c3 85 d2 74 0e 8b 42 04 8b 12 25 ff ff ff 7f 09 c1 eb ee 89 8b 34 01 00 00 5b c3 55 89 d5 57 89 c7 56 53 8b 40 08 8b 70 10 <0f> b7 46 20 25 00 f0 00 00 3d 00 40 00 00 75 4d 8d 46 68 e8 fb
Mar 23 15:57:00 alpha kernel: <0>Fatal exception: panic in 5 seconds
Mar 23 16:06:46 alpha syslogd 1.4.1: restart.
So it seems like httpd crashed my server? I checked the Apache error_log and noticed lines like these:
Code:
*** glibc detected *** free(): invalid pointer: 0x08872c98 ***
[Fri Mar 23 15:57:03 2007] [error] [client x.x.x.x] Premature end of script headers: /home/user/public_html/forum/index.php
[Fri Mar 23 15:57:03 2007] [error] [client x.x.x.x] File does not exist: /home/user/public_html/500.shtml
When I look at the whole error_log, it seems like such errors appear frequently and for different websites and PHP applications. This isn't the only glibc error; there's also stuff like "*** glibc detected *** corrupted double-linked list: 0x088e0bb8 ***". I have recently installed eAccelerator and searched the web: I'm not the only one who has such problems with eAccelerator. Do you think these glibc errors may have caused the server to crash?
Also, lately I have noticed a lot of lines like these in the /var/log/messages file:
Code:
Mar 23 15:47:45 alpha kernel: post_create: setxattr failed, rc=122 (dev=sda7 ino=4318565)
and
Code:
Mar 23 15:50:56 alpha ntpd[2958]: sendto(66.111.46.200): Operation not permitted
I've done some Googling on the first error and it seems that someone with the same error got it fixed by running a filesystem check... By the way, /dev/sda7 is my /home directory. But in order to run a filesystem check, do I need to disable all services and unmount /home? That's kind of a pain in the ***... In general, how long do you think it takes to run a filesystem check? Here's information about my /home partition:
My server crashes once or twice a month. When I check the console I see a kernel panic (image attached). I am using CentOS 4.4 with the latest RPM kernel.
We have one box at hivelocity.net that has been down so many times this month that we were forced to remove our links to siteuptime, where we were once so proud of having 99.7% uptime for 3 years at theplanet.
Syslog shows that just before crashing, these entries were made:
kernel: kernel BUG at mm/rmap.c:479
kernel: invalid operand:0000 [#1]
dmesg also shows this:
...
Brought up 2 CPUs
zapping low mappings.
checking if image is initramfs... it is
Freeing initrd memory: 482k freed
NET: Registered protocol family 16
PCI: PCI BIOS revision 2.10 entry at 0xf9f20, last bus=1
PCI: Using configuration type 1
mtrr: v2.0 (20020519)
mtrr: your CPUs had inconsistent fixed MTRR settings
mtrr: probably your BIOS does not setup all CPUs.
mtrr: corrected configuration.
...
I've googled these messages and they point to RAM problems.
hivelocity.net claims to have run diagnostics on the box and says no problems were reported.
They said this is the result of a system configuration problem caused by us.
Last year I ordered a new server with CentOS 4.3, and it had kernel 2.6.9-34.0.2ELsmp installed. It ran fine and I haven't updated any packages since then.
Today I started getting a problem where both mysqld and kswapd0 use very high amounts of CPU, spiking up to 100%, and my memory usage is at 99% all the time. The problem seems exactly the same as the one mentioned in this thread.
In that thread the exact same kernel is said to be insecure and to cause this problem. I also came across a CentOS bug that reports this problem, with high CPU and memory usage and mysql and kswapd0 consuming all resources.
In the linked thread the person solved the problem by upgrading to kernel 2.6.9-42 using RPMs, but others recommended a newer kernel or a custom-compiled kernel for CentOS.
Apparently, when they used yum it said 34.0.2 was the latest kernel.
What should I do to upgrade the kernel, which version should I upgrade to, and where do I get it from? I won't be able to compile a custom kernel, and I've only installed basic RPM packages before.
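To see at a glance whether a candidate kernel release is actually newer than the one that is running, the version strings from this post can be compared numerically. The following is only a rough sketch: `kernel_version_key` is a hypothetical helper, and it deliberately ignores suffixes such as "ELsmp", so it is a simplification and not the comparison rpm or yum performs internally.

```python
import re


def kernel_version_key(release):
    """Return the leading numeric fields of a kernel release as a tuple.

    Simplified on purpose: fields are split on '.' and '-', and parsing
    stops at the first field that is not purely numeric (e.g. "2ELsmp"
    contributes 2 and then ends the key). This is NOT rpm's algorithm.
    """
    fields = []
    for token in re.split(r"[.\-]", release):
        match = re.match(r"\d+", token)
        if not match:
            break
        fields.append(int(match.group()))
        if match.end() != len(token):
            break
    return tuple(fields)


running = "2.6.9-34.0.2ELsmp"   # kernel named in the post
candidate = "2.6.9-42"          # kernel the linked thread upgraded to

if kernel_version_key(candidate) > kernel_version_key(running):
    print(candidate, "is newer than", running)
else:
    print(candidate, "is not newer than", running)
```

Tuples compare field by field, so 42 beats 34 in the fourth position regardless of how many trailing fields either string has.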
We cannot figure out why our dedicated server will not boot to the correct kernel. I've removed all other options from grub.conf but it's still booting to the default CentOS setup.
grub.conf:
Code:
# grub.conf generated by anaconda
#
# Note that you do not have to rerun grub after making changes to this file
# NOTICE: You do not have a /boot partition. This means that
# all kernel and initrd paths are relative to /, eg.
# root (hd0,0)
# kernel /boot/vmlinuz-version ro root=/dev/mapper/ddf1_4c53492020202020808627c300000000378494a900000a28p1
# initrd /boot/initrd-version.img
#boot=/dev/mapper/ddf1_4c53492020202020808627c300000000378494a900000a28
default=0
timeout=5
splashimage=(hd0,0)/boot/grub/splash.xpm.gz
hiddenmenu
title CentOS (2.6.18-028stab062.3)
root (hd0,0)
kernel /boot/vmlinux-2.6.18-028stab062.3 ro root=LABEL=/
initrd /boot/initrd-2.6.18-028stab062.3.img
I've got this weird issue with a server. I have even contacted some server management companies, as I ran out of ideas and things to try to fix it.
Let me explain. The server is a Core 2 Quad Q6600 with 8 GB of RAM and two 300 GB VelociRaptor drives in RAID 1.
When I set the server up I had to wait on cPanel, so I first went in and compiled a grsec kernel, 2.6.24.3 to be exact. Then I installed cPanel and everything else. I have done the exact same procedure with countless other servers before, nothing special.
I have had this server for around 10 months. It will only run right on the 2.6.24.3-grsec kernel. When you boot another kernel, it first boots very, very slowly, and once the server comes up everything is very, very slow. Then the load goes up to around 100 with nothing special going on in the server. It's like it's loaded down that much just by basic startup functions. You will see things like service_start processes long after startup. cPanel takes forever to start up, if it starts at all. The server is extremely slow and unusable; you are lucky if you can edit grub.conf quickly and set the default kernel back.
It does this on every kernel except 2.6.24.3-grsec, in which case it boots right up, fine and dandy. It acts like a regular server then and performs well.
So any kernel besides 2.6.24.3-grsec simply won't work on it. There are no logs in messages, nothing like that. I looked into things that may have been built only for the 2.6.24.3-grsec kernel but couldn't really find anything that should have made such an impact.
On a RHEL 3 system I installed a new kernel. I updated lilo.conf and grub.conf, but the system is still booting an old kernel: 2.4.21-27.ELsmp. Please have a look at my files below, and if you have any idea why, please let me know.
Code:
default=0
timeout=10
splashimage=(hd0,0)/grub/splash.xpm.gz
title Red Hat Enterprise Linux ES (2.4.21-53.ELsmp)
root (hd0,0)
kernel /vmlinuz-2.4.21-53.ELsmp ro root=/dev/hda3
initrd /initrd-2.4.21-53.ELsmp.img
title Red Hat Enterprise Linux ES (2.4.21-47.ELsmp)
root (hd0,0)
kernel /vmlinuz-2.4.21-47.ELsmp ro root=/dev/hda3
initrd /initrd-2.4.21-47.ELsmp.img
title Red Hat Enterprise Linux ES (2.4.21-47.EL)
root (hd0,0)
kernel /vmlinuz-2.4.21-47.EL ro root=/dev/hda3
initrd /initrd-2.4.21-47.EL.img
title Red Hat Enterprise Linux ES (2.4.21-27.ELsmp)
root (hd0,0)
kernel /vmlinuz-2.4.21-27.ELsmp ro root=LABEL=/
initrd /initrd-2.4.21-27.ELsmp.img
title Red Hat Enterprise Linux ES-up (2.4.21-27.EL)
root (hd0,0)
kernel /vmlinuz-2.4.21-27.EL ro root=LABEL=/
initrd /initrd-2.4.21-27.EL.img
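In a legacy GRUB grub.conf, default=N selects the Nth title entry, counting from 0, so it can help to confirm which entry a file like the one above actually selects before looking elsewhere. Below is a minimal sketch of that check; `parse_grub_conf` is a hypothetical helper, it only understands the `default=`, `title` and `kernel` lines seen in this post, and the sample is shortened to two of the entries.

```python
def parse_grub_conf(text):
    """Return (default_index, entries) for a legacy GRUB grub.conf.

    Simplified: assumes a numeric 'default=N' line and ignores every
    directive other than 'title' and 'kernel'.
    """
    default_index = 0
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("default="):
            default_index = int(line.split("=", 1)[1])
        elif line.startswith("title "):
            entries.append({"title": line[len("title "):], "kernel": None})
        elif line.startswith("kernel ") and entries:
            # Second word on the line is the kernel image path.
            entries[-1]["kernel"] = line.split()[1]
    return default_index, entries


SAMPLE = """\
default=0
timeout=10
title Red Hat Enterprise Linux ES (2.4.21-53.ELsmp)
        root (hd0,0)
        kernel /vmlinuz-2.4.21-53.ELsmp ro root=/dev/hda3
        initrd /initrd-2.4.21-53.ELsmp.img
title Red Hat Enterprise Linux ES (2.4.21-27.ELsmp)
        root (hd0,0)
        kernel /vmlinuz-2.4.21-27.ELsmp ro root=LABEL=/
        initrd /initrd-2.4.21-27.ELsmp.img
"""

default_index, entries = parse_grub_conf(SAMPLE)
selected = entries[default_index]
print("default entry:", selected["title"], "->", selected["kernel"])
```

If the entry this reports is the new kernel but the machine still boots the old one, the file being edited may not be the one the boot loader actually reads; that is a guess to verify, not a diagnosis.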
I have a server running Fedora 4 and WHM/cPanel. I would like to upgrade the Linux kernel to the latest version, so I mosey in via SSH and type "yum -y upgrade". It downloads a few things and tells me everything is hunky dory.
Now, the version it says it is currently running is: 2.6.17-1.2142_FC4 #1 Tue Jul 11 22:41:14 EDT 2006
Is that really the newest version available? Maybe I'm confused as to how this works, but if I go to kernel.org it tells me the most recent stable version of the kernel is 2.6.24.3. Is this because I am running FC4?
root@server [~]# tail -f /var/log/messages
Jun 10 14:14:49 server kernel: printk: 56 messages suppressed.
Jun 10 14:14:49 server kernel: ip_conntrack: table full, dropping packet.
Jun 10 14:14:54 server kernel: printk: 59 messages suppressed.
Jun 10 14:14:54 server kernel: ip_conntrack: table full, dropping packet.
Jun 10 14:14:59 server kernel: printk: 85 messages suppressed.
Jun 10 14:14:59 server kernel: ip_conntrack: table full, dropping packet.
Jun 10 14:15:04 server kernel: printk: 90 messages suppressed.
Jun 10 14:15:04 server kernel: ip_conntrack: table full, dropping packet.
Jun 10 14:15:09 server kernel: printk: 58 messages suppressed.
Jun 10 14:15:09 server kernel: ip_conntrack: table full, dropping packet.
Jun 10 14:15:14 server kernel: printk: 70 messages suppressed.
Jun 10 14:15:14 server kernel: ip_conntrack: table full, dropping packet.
Jun 10 14:15:19 server kernel: printk: 193 messages suppressed.
Jun 10 14:15:19 server kernel: ip_conntrack: table full, dropping packet.
Anyone know what this is about?
Using Centos / Cpanel
Linux server.domain.com 2.6.9-67.0.15.ELsmp #1 SMP Thu May 8 10:52:19 EDT 2008 i686 i686 i386 GNU/Linux
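The "ip_conntrack: table full, dropping packet" message means the netfilter connection-tracking table has hit its configured maximum, and the "messages suppressed" lines are the kernel rate-limiting its own logging. To get a feel for how many events sit behind the few lines that reach the log, the two kinds of line can be tallied. This is only a sketch: `summarize_conntrack_drops` is a hypothetical helper, and it assumes the suppressed messages are mostly the same "table full" warning, which the log itself does not prove.

```python
import re

SUPPRESSED = re.compile(r"printk: (\d+) messages suppressed")


def summarize_conntrack_drops(lines):
    """Return (logged, suppressed) counts for a list of syslog lines."""
    logged = 0
    suppressed = 0
    for line in lines:
        if "ip_conntrack: table full" in line:
            logged += 1
        match = SUPPRESSED.search(line)
        if match:
            suppressed += int(match.group(1))
    return logged, suppressed


# First fifteen seconds of the log from the post.
SAMPLE = """\
Jun 10 14:14:49 server kernel: printk: 56 messages suppressed.
Jun 10 14:14:49 server kernel: ip_conntrack: table full, dropping packet.
Jun 10 14:14:54 server kernel: printk: 59 messages suppressed.
Jun 10 14:14:54 server kernel: ip_conntrack: table full, dropping packet.
Jun 10 14:14:59 server kernel: printk: 85 messages suppressed.
Jun 10 14:14:59 server kernel: ip_conntrack: table full, dropping packet.
""".splitlines()

logged, suppressed = summarize_conntrack_drops(SAMPLE)
print("logged:", logged, "suppressed:", suppressed)
```

On kernels of this generation the limit is usually exposed as the `ip_conntrack_max` sysctl; treat the exact name and path as something to verify on the machine rather than a given.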
What are these errors I get on my dedicated server? I also got errors like these when I deleted a large number of files inside a folder, and every time I run 'du' afterwards it shows kernel errors.
---------
Message from syslogd@ at Wed Jul 16 16:30:26 2008 ...
server kernel: [358873.299439] EIP is at ext3_clear_inode+0x42/0xa0 [ext3]
Message from syslogd@ at Wed Jul 16 16:30:26 2008 ...
server kernel: [358873.299459] eax: d73d9c2c ebx: d73d9b94 ecx: 00000000 edx: ffff9eff
Message from syslogd@ at Wed Jul 16 16:30:26 2008 ...
server kernel: [358873.299479] esi: d73d9c2c edi: 00000000 ebp: 0000003e esp: f7f4feb4
Message from syslogd@ at Wed Jul 16 16:30:26 2008 ...
server kernel: [358873.299499] ds: 0068 es: 0068 fs: 00d8 gs: 0000 ss: 0068
Message from syslogd@ at Wed Jul 16 16:30:26 2008 ...
server kernel: [358873.299520] Process kswapd0 (pid: 162, ti=f7f4e000 task=f79ab550 task.ti=f7f4e000)
Message from syslogd@ at Wed Jul 16 16:30:26 2008 ...
server kernel: [358873.299540] Stack: d73d9c2c 00000000 f7f4fef0 c01ec2ff f7f4fef0 d73d9c34 d73d9c2c c01ec65a
Message from syslogd@ at Wed Jul 16 16:30:26 2008 ...
server kernel: [358873.299585] 00000080 de2652b8 00000080 f7f4fef0 c01ec8a5 00000000 00000080 d73d9e18
Message from syslogd@ at Wed Jul 16 16:30:26 2008 ...
server kernel: [358873.299628] e3c8dc34 0000e54c 00000094 f7ffea80 000000d0 c01bedfd 0000580f 00000000
Message from syslogd@ at Wed Jul 16 16:30:26 2008 ...
server kernel: [358873.299671] Call Trace:
Message from syslogd@ at Wed Jul 16 16:30:26 2008 ...
server kernel: [358873.299699] [<c01ec2ff>] clear_inode+0x9f/0x150
Message from syslogd@ at Wed Jul 16 16:30:26 2008 ...
server kernel: [358873.299726] [<c01ec65a>] dispose_list+0x1a/0xe0
Message from syslogd@ at Wed Jul 16 16:30:26 2008 ...
server kernel: [358873.299751] [<c01ec8a5>] shrink_icache_memory+0x185/0x260
Message from syslogd@ at Wed Jul 16 16:30:26 2008 ...
server kernel: [358873.299779] [<c01bedfd>] shrink_slab+0x11d/0x180
Message from syslogd@ at Wed Jul 16 16:30:26 2008 ...
server kernel: [358873.299812] [<c01bf27a>] kswapd+0x35a/0x450
Message from syslogd@ at Wed Jul 16 16:30:26 2008 ...
server kernel: [358873.299849] [<c0195d80>] autoremove_wake_function+0x0/0x50
Message from syslogd@ at Wed Jul 16 16:30:26 2008 ...
server kernel: [358873.299880] [<c01bef20>] kswapd+0x0/0x450
Message from syslogd@ at Wed Jul 16 16:30:26 2008 ...
server kernel: [358873.299900] [<c0195bca>] kthread+0xba/0xf0
Message from syslogd@ at Wed Jul 16 16:30:26 2008 ...
server kernel: [358873.299921] [<c0195b10>] kthread+0x0/0xf0
Message from syslogd@ at Wed Jul 16 16:30:26 2008 ...
server kernel: [358873.299945] [<c015f6e7>] kernel_thread_helper+0x7/0x10
Message from syslogd@ at Wed Jul 16 16:30:26 2008 ...
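When posting or searching for a trace like this, the most useful parts are usually the function named on the "EIP is at" line and the chain of functions in the call trace. As a rough illustration, those can be pulled out of the raw text with two regular expressions. `summarize_oops` is a hypothetical helper and the patterns only cover the 32-bit x86 format seen in this post.

```python
import re

# "EIP is at func+0xOFF/0xLEN [module]" (module part is optional)
EIP = re.compile(r"EIP is at (\w+)\+0x[0-9a-f]+/0x[0-9a-f]+(?: \[(\w+)\])?")
# "[<address>] func+0xOFF/0xLEN" call-trace frames
FRAME = re.compile(r"\[<[0-9a-f]{8}>\]\s+(\w+)\+0x[0-9a-f]+/0x[0-9a-f]+")


def summarize_oops(text):
    """Extract the faulting function, its module and the call trace."""
    eip = EIP.search(text)
    return {
        "faulting_function": eip.group(1) if eip else None,
        "module": eip.group(2) if eip else None,
        "call_trace": [m.group(1) for m in FRAME.finditer(text)],
    }


# A few lines copied from the trace in the post.
SAMPLE = """\
server kernel: [358873.299439] EIP is at ext3_clear_inode+0x42/0xa0 [ext3]
server kernel: [358873.299699] [<c01ec2ff>] clear_inode+0x9f/0x150
server kernel: [358873.299726] [<c01ec65a>] dispose_list+0x1a/0xe0
server kernel: [358873.299751] [<c01ec8a5>] shrink_icache_memory+0x185/0x260
server kernel: [358873.299779] [<c01bedfd>] shrink_slab+0x11d/0x180
server kernel: [358873.299812] [<c01bf27a>] kswapd+0x35a/0x450
"""

summary = summarize_oops(SAMPLE)
print(summary["faulting_function"], "in module", summary["module"])
print(" <- ".join(summary["call_trace"]))
```

Read this way, the trace shows kswapd reclaiming inode cache memory and faulting inside the ext3 module while clearing an inode, which fits the observation that the errors follow heavy file deletion and 'du' runs.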
Here is what I saw when I installed kernel-2.6.20-1.2948.fc6.src.rpm:
rpm -ivh kernel-2.6.20-1.2948.fc6.src.rpm
1:kernel warning: user brewbuilder does not exist - using root
warning: group brewbuilder does not exist - using root
warning: user brewbuilder does not exist - using root
########################################### [100%]
warning: user brewbuilder does not exist - using root
warning: group brewbuilder does not exist - using root
Then when I ran: rpmbuild -bp --target=$(uname -m) /usr/src/redhat/SPECS/kernel-2.6.spec
I saw this error:
+ Arch=x86_64
+ make ARCH=x86_64 nonint_oldconfig
In file included from /usr/include/sys/socket.h:35,
from /usr/include/netinet/in.h:24,
from /usr/include/arpa/inet.h:23,
from scripts/basic/fixdep.c:117:
/usr/include/bits/socket.h:310:24: error: asm/socket.h: No such file or directory
make[1]: *** [scripts/basic/fixdep] Error 1
make: *** [scripts_basic] Error 2
error: Bad exit status from /var/tmp/rpm-tmp.93770 (%prep)
I need to have this installed to get an app installed. Any suggestions or ideas? Thanks.
I have a Xen VPS. I started with a Debian 4 image and have since upgraded to Debian 5. Firstly, was this advisable? Secondly, what kernel version should I be running, or rather, is it set by my installation or by the Xen server?
As part of a project I have lately been looking into various aspects of kernel tuning, most recently tuning the TCP stack for more efficient memory usage and throughput.
I thought I would start this thread to mention some of the tools I've found for testing and to see what anyone else has to recommend.
So far my favorite of the bunch is nuttcp. It's easy to use and gives a very good idea of how much of your bandwidth you are able to utilize.
A few interesting web pages, for anyone interested in the topic:
[url] - Tuning TCP for High Bandwidth Delay networks
[url] - TCP Tuning Cookbook, with some interesting information as well
[url]...formanceTuning - Performance Tuning TWiki. Has a list of useful tools, flags for existing tools, and ways to monitor network performance at the system level, along with some suggestions of things to correct
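For anyone new to the topic, the quantity those pages revolve around is the bandwidth-delay product: the amount of data that must be in flight to keep a link full, and therefore roughly the minimum TCP buffer size needed to use the whole pipe. A small worked sketch follows; the link speeds and round-trip times are made-up examples, not measurements from any of my tests.

```python
def bandwidth_delay_product_bytes(bandwidth_bits_per_s, rtt_ms):
    """Bytes in flight needed to fill a link.

    BDP = bandwidth (bits/s) * RTT (s) / 8. RTT is taken in whole
    milliseconds so the arithmetic stays in integers.
    """
    return bandwidth_bits_per_s * rtt_ms // 8000


# Hypothetical links, for illustration only.
gigabit_long_haul = bandwidth_delay_product_bytes(1_000_000_000, 80)
fast_ethernet_intercontinental = bandwidth_delay_product_bytes(100_000_000, 200)

print("1 Gbit/s at 80 ms RTT:", gigabit_long_haul, "bytes")
print("100 Mbit/s at 200 ms RTT:", fast_ethernet_intercontinental, "bytes")
```

If the maximum socket buffer is smaller than this figure, a single TCP stream cannot fill the link no matter how fast the hosts are, which is the kind of shortfall a tool like nuttcp makes visible.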
What is the best way to find out which filesystems and hard drive drivers you can remove? Obviously, I need ext2/ext3, but how do you find out which hard drive driver is the only one you need?
I cannot find any important info in /var/log/messages,
but I do find records like these repeated many times:
----------------------------------
Jun 15 05:30:40 server kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Jun 15 05:30:40 server kernel: ata1.00: (irq_stat 0x40000001)
Jun 15 05:30:40 server kernel: ata1.00: cmd 25/00:08:42:23:d2/00:00:2c:00:00/e0 tag 0 cdb 0x0 data 4096 in
Jun 15 05:30:40 server kernel: res 51/40:00:42:23:d2/00:00:2c:00:00/e0 Emask 0x9 (media error)
Jun 15 05:30:40 server kernel: ata1.00: configured for UDMA/133
Jun 15 05:30:40 server kernel: ata1: EH complete
Jun 15 05:30:42 server kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Jun 15 05:30:42 server kernel: ata1.00: (irq_stat 0x40000001)
Jun 15 05:30:42 server kernel: ata1.00: cmd 25/00:08:42:23:d2/00:00:2c:00:00/e0 tag 0 cdb 0x0 data 4096 in
Jun 15 05:30:42 server kernel: res 51/40:00:42:23:d2/00:00:2c:00:00/e0 Emask 0x9 (media error)
Jun 15 05:30:42 server kernel: ata1.00: configured for UDMA/133
Jun 15 05:30:42 server kernel: ata1: EH complete
Jun 15 05:30:44 server kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Jun 15 05:30:51 server kernel: ata1.00: (irq_stat 0x40000001)
Jun 15 05:30:51 server kernel: ata1.00: cmd 25/00:08:42:23:d2/00:00:2c:00:00/e0 tag 0 cdb 0x0 data 4096 in
Jun 15 05:30:51 server kernel: res 51/40:00:42:23:d2/00:00:2c:00:00/e0 Emask 0x9 (media error)
Jun 15 05:30:51 server kernel: ata1.00: configured for UDMA/133
Jun 15 05:30:51 server kernel: ata1: EH complete
Jun 15 05:30:51 server kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Jun 15 05:30:51 server kernel: ata1.00: (irq_stat 0x40000001)
Jun 15 05:30:51 server kernel: ata1.00: cmd 25/00:08:42:23:d2/00:00:2c:00:00/e0 tag 0 cdb 0x0 data 4096 in
Jun 15 05:30:51 server kernel: res 51/40:00:42:23:d2/00:00:2c:00:00/e0 Emask 0x9 (media error)
Jun 15 05:30:51 server kernel: ata1.00: configured for UDMA/133
Jun 15 05:30:51 server kernel: ata1: EH complete
Jun 15 05:30:51 server kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Jun 15 05:30:51 server kernel: ata1.00: (irq_stat 0x40000001)
Jun 15 05:30:51 server kernel: ata1.00: cmd 25/00:08:42:23:d2/00:00:2c:00:00/e0 tag 0 cdb 0x0 data 4096 in
Jun 15 05:30:51 server kernel: res 51/40:00:42:23:d2/00:00:2c:00:00/e0 Emask 0x9 (media error)
Jun 15 05:30:51 server kernel: ata1.00: configured for UDMA/133
Jun 15 05:30:52 server kernel: ata1: EH complete

Jun 15 05:31:26 server kernel: ata1.00: configured for UDMA/133
Jun 15 05:31:30 server kernel: sd 0:0:0:0: SCSI error: return code = 0x08000002
Jun 15 05:31:33 server kernel: sda: Current [descriptor]: sense key: Medium Error
Jun 15 05:31:36 server kernel: Add. Sense: Unrecovered read error - auto reallocate failed
Jun 15 05:31:36 server kernel:
Jun 15 05:31:39 server kernel: Descriptor sense data with sense descriptors (in hex):
Jun 15 05:31:46 server kernel: 72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
Jun 15 05:31:51 server kernel: 2c d2 23 42
Jun 15 05:31:56 server kernel: end_request: I/O error, dev sda, sector 751969090
Jun 15 05:31:57 server kernel: ata1: EH complete
Jun 15 05:31:57 server kernel: SCSI device sda: 976773168 512-byte hdwr sectors (500108 MB)
Jun 15 05:31:58 server kernel: sda: Write Protect is off
Jun 15 05:31:58 server kernel: SCSI device sda: drive cache: write back
Jun 15 05:31:59 server kernel: SCSI device sda: 976773168 512-byte hdwr sectors (500108 MB)
Jun 15 05:32:03 server kernel: sda: Write Protect is off
Jun 15 05:32:04 server kernel: SCSI device sda: drive cache: write back
-------------------
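Every one of those retries is hitting the same spot on the disk. The last four bytes of the sense data (2c d2 23 42), read as a big-endian number, give the same sector that end_request reports, and the cmd/res lines carry the same bytes in a different order. A small sketch of that arithmetic, using only values from the log above; `lba_from_sense_bytes` is a hypothetical helper name:

```python
SECTOR_SIZE = 512            # "512-byte hdwr sectors" from the log
TOTAL_SECTORS = 976773168    # size of sda as reported in the log


def lba_from_sense_bytes(hex_bytes):
    """Interpret space-separated hex bytes as a big-endian sector number."""
    return int.from_bytes(bytes.fromhex(hex_bytes), "big")


lba = lba_from_sense_bytes("2c d2 23 42")
byte_offset = lba * SECTOR_SIZE
position = lba / TOTAL_SECTORS

print("failing sector:", lba)
print("byte offset on sda:", byte_offset)
print("position on disk: {:.1%}".format(position))
```

A single sector that keeps returning "media error" with "auto reallocate failed" points at the drive itself rather than at the kernel or a driver, so the sector number is worth keeping for the hosting provider or for a later SMART check.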