[SOLVED] Odd Networking Issue that is really frustrating me

Post by jmangan » Wed Mar 22, 2017 5:01 pm

I've got a weird problem which has really got me stumped. I tried the CentOS forums but haven't received any suggestions there.

I've got a CentOS 7 server which I've been running for about six months with no issues at all.
A very basic server, just used as a backup dump using rsync.

A couple of weeks ago we had a small power blip which caused some of the devices here to reboot or just power off.

When this server came back up I could still ping it but I could no longer SSH to it or connect to Webmin.
Okay, I thought, some service hasn't come back up or I forgot to save some detail and the reboot has lost it.
First I checked the interface configuration; correct IP, mask, gateway and DNS server.
Checked that the 'internal' firewalld zone was default and permanent and the service was running (it was).
Ensured that it accepted the local subnet as source. Checked that DNS, HTTP/S and SSH were allowed.
All good but no difference. Then I made sure that the network service was stopped and disabled (it was) and
that NetworkManager was enabled and active (yes).
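Since ping only proves that ICMP gets through, a TCP connection test run from another machine shows whether the SSH and Webmin ports are actually reachable. A minimal Python sketch, assuming Webmin is on its default port 10000; the server address in the example comment is hypothetical:

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    Unlike ping (ICMP), this follows the same path an SSH or Webmin client
    takes: routing, firewall rules and a listening daemon must all be OK.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused connections, timeouts and "no route to host" all land here.
        return False


# Example (hypothetical address of the backup server):
# for name, port in [("ssh", 22), ("webmin", 10000)]:
#     print(name, tcp_port_open("192.168.1.20", port))
```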

By the way, all my 'local' machines are on the same subnet with just a network switch between them; no firewalls or routers.

In desperation I added the 'ipv6.disable' option to the GRUB config and confirmed after reboot no IPv6
interfaces. But, of course, it made no difference.

So then I started looking at network functionality. It can ping specific IP addresses but not local hostnames.
And this is where the weirdness starts. If I try to ping a hostname (local or 'global') while pointed to my internal DNS, the
command prompt freezes. At first I kept 'Ctrl-C'-ing out of it, but then I left it and discovered that in
around a minute it returned control but with no message at all (no 'lost pings', no 'name couldn't be found',
nothing). With that clue I set it to point at an external DNS server and it works just fine.

After pointing it at an external DNS I also ran a 'yum update', which worked perfectly, so the network hardware is
working.

So at this point you might think that the internal DNS server is broken except that a dozen other devices are
using it with no problems at all and it doesn't explain why no other device can connect to this server on
SSH/Webmin even though the services are running and the firewall is configured correctly. It also doesn't explain
the lack of any response to the command line when trying to identify a local hostname. Presumably DNS is silently
timing out but then why doesn't 'ping' return 'host could not be found'?

Any ideas? At all?
Last edited by jmangan on Fri Mar 24, 2017 7:56 pm, edited 1 time in total.
jmangan
Posts: 64
Joined: Fri Apr 29, 2005 7:37 am

Re: Odd Networking Issue that is really frustrating me

Post by nelz » Wed Mar 22, 2017 9:07 pm

First thing to check when getting weird failures after a power outage: did you fsck all your filesystems?
"Insanity: doing the same thing over and over again and expecting different results." (Albert Einstein)
nelz
Site admin
Posts: 9041
Joined: Mon Apr 04, 2005 11:52 am
Location: Warrington, UK

Re: Odd Networking Issue that is really frustrating me

Post by jmangan » Wed Mar 22, 2017 11:12 pm

I thought the server might have done that on bootup BUT I'm not sure.

I'll give it a try.

Re: Odd Networking Issue that is really frustrating me

Post by Dutch_Master » Thu Mar 23, 2017 11:00 am

You can have fsck run automagically at boot by editing /etc/fstab. Here are a few sample lines from mine:
/dev/sda1   /boot            jfs      defaults      1 2
/dev/sda2   /            jfs      defaults      0 1
/dev/sdb1   /home/         jfs      defaults           0 0

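It's the last field on each line (the sixth, fs_passno) that controls whether fsck runs at boot and in what order: 0 means never check, 1 means check first (use it for the root filesystem) and 2 means check after root. The fifth field is the dump flag and has nothing to do with fsck. Read the man page for fstab to learn more :)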
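To make the field layout concrete, here is a small Python sketch that reads fstab-format text, reports each mount point's fs_passno, and lists the order a boot-time check would visit them. It assumes whitespace-separated fields and skips comments; it is not a full fstab parser (escaped spaces in paths, for instance, are not handled).

```python
def fsck_passes(fstab_text: str) -> dict[str, int]:
    """Map each mount point in fstab-format text to its fs_passno (6th field).

    0 = not checked at boot, 1 = checked first (the root filesystem),
    2 = checked after root. The 5th field (dump) is ignored: fsck doesn't use it.
    """
    passes = {}
    for raw in fstab_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        if len(fields) < 2:
            continue  # malformed line: no mount point
        # fs_spec fs_file fs_vfstype fs_mntops fs_freq fs_passno;
        # a missing 6th field counts as 0 (not checked).
        passes[fields[1]] = int(fields[5]) if len(fields) >= 6 else 0
    return passes


def check_order(passes: dict[str, int]) -> list[str]:
    """Mount points in the order a boot-time check would visit them (0s skipped)."""
    ordered = sorted(passes.items(), key=lambda item: item[1])
    return [mount for mount, passno in ordered if passno > 0]


sample = """\
/dev/sda1   /boot   jfs   defaults   1 2
/dev/sda2   /       jfs   defaults   0 1
/dev/sdb1   /home/  jfs   defaults   0 0
"""
print(fsck_passes(sample))  # {'/boot': 2, '/': 1, '/home/': 0}
print(check_order(fsck_passes(sample)))  # ['/', '/boot']
```

Entries that share a pass number may be checked in parallel, which a flat list like this doesn't show.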
Dutch_Master
LXF regular
Posts: 2586
Joined: Tue Mar 27, 2007 1:49 am

Re: Odd Networking Issue that is really frustrating me

Post by jmangan » Thu Mar 23, 2017 5:38 pm

Well I ran 'touch /forcefsck' but the check ran too quickly to see and the problem remains.

The non-root partitions are on software RAID1 using xfs. I wasn't sure if it was worth it or sensible to force fsck on those.

Any other suggestions?

Re: Odd Networking Issue that is really frustrating me

Post by Dutch_Master » Fri Mar 24, 2017 10:19 am

You can't run fsck on mounted partitions, not even forced. Download a sysrescuecd image and either put it on a cdrom (if the server has one) or use unetbootin to make a bootable USB stick. Whichever media, boot the server from it, then perform the fsck on the disks. Reboot.
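A quick way to see whether a device is still mounted before running fsck on it is to look it up in /proc/mounts. A minimal Python sketch, assuming the device is named exactly as it appears there (it does not resolve UUID= labels or /dev/disk/by-* symlinks); the device in the example comment is hypothetical:

```python
from pathlib import Path
from typing import Optional


def mounted_devices(mounts_text: str) -> dict[str, str]:
    """Map device -> mount point from /proc/mounts-format text."""
    devices = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            devices[fields[0]] = fields[1]
    return devices


def safe_to_fsck(device: str, mounts_text: Optional[str] = None) -> bool:
    """Return False if device currently appears as mounted.

    Reads the live /proc/mounts unless mounts_text is supplied.
    """
    if mounts_text is None:
        mounts_text = Path("/proc/mounts").read_text()
    return device not in mounted_devices(mounts_text)


# Example (hypothetical RAID1 data array): safe_to_fsck("/dev/md0")
```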

Re: Odd Networking Issue that is really frustrating me

Post by jmangan » Fri Mar 24, 2017 10:31 am

I'm a bit confused now. I thought that when you set '/forcefsck' that caused the system to run fsck on the root partition before it was mounted at bootup. That certainly is what the articles I've read have suggested. Are you saying that is not the case?

Also I can understand why something odd on either the /boot or / partition 'might' in some way have something to do with this odd issue but I don't understand how the data partitions which have no system data/config/applications/modules/drivers, etc. could be involved. Also in fstab the value in the line for the data partitions was '3' whereas the articles I read suggested only 0,1 & 2 were valid. That makes me nervous about messing with it.

Can you clarify for me please. I really, really want to fix this but I also want to understand what I'm doing and why, if possible.

Thanks.

Re: Odd Networking Issue that is really frustrating me

Post by Dutch_Master » Fri Mar 24, 2017 3:39 pm

I've never come across this "/forcefsck" tool you mentioned before, so I can't tell you anything about it. Probably something systemd related, for which it should be taken behind the shed and its misery ended very definitively :roll: :mrgreen:

Anyway, back up your fstab first*:

sudo cp /etc/fstab /etc/fstab.bak

The .bak extension is an unofficial indicator of a backup file. Next, open /etc/fstab and change the values to 0, 1 or 2 according to your needs. Save the file. Reboot.

In Linux, partitions that are marked as 'not clean' will not be mounted r/w, even if fstab tells it to be. They can only be read, hence a system will partially boot, like yours. One of the areas affected with not having write access are the virtual filesystems, like /proc. This in turn leads to problems with /dev/random and similar, culminating in the network not being able to start in full. SSH and by extension rsync need write access for their date/time stamps. If they haven't, they won't start their daemons. So they're unreachable from the network.

*I assume you've taken my advice and used Sysrescuecd to fsck the partitions on the machine, right? If not, do so, otherwise you won't be able to back up your fstab.

PS: manipulating (fsck!) XFS partitions usually requires extra tools installed on your system. They should be there if you've installed it using xfs from the installer. If you haven't done so, these tools may not be present on your system, or otherwise not available. Check in the package manager whether the xfs tools are installed; if not, install them. The package is called xfsprogs on CentOS (and most other distros); it provides xfs_repair.

Re: Odd Networking Issue that is really frustrating me

Post by jmangan » Fri Mar 24, 2017 7:51 pm

Sorted!

It wasn't an fsck thing - although I appreciate the expanded explanation and I did try that first; and xfs_repair.

Somehow the local network wasn't in the route table! ifcfg-<dev> didn't have a 'NETWORK=' or 'NETMASK=' statement.
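For anyone hitting the same thing: the connected route for the local subnet comes from the interface address plus its prefix (PREFIX= or NETMASK= in /etc/sysconfig/network-scripts/ifcfg-<dev>). The Python sketch below, using a hypothetical ifcfg file and IPv4 addresses, derives that network and checks whether a peer is on-link; you can compare the result with the output of `ip route show`. It reports a missing prefix rather than guessing one; how the real network scripts or NetworkManager handle that case isn't modelled here.

```python
import ipaddress
from typing import Optional


def parse_ifcfg(text: str) -> dict[str, str]:
    """Parse KEY=value lines from ifcfg-style text, dropping quotes and comments."""
    cfg = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        cfg[key.strip()] = value.strip().strip("\"'")
    return cfg


def local_network(cfg: dict[str, str]) -> Optional[ipaddress.IPv4Network]:
    """Directly connected network implied by IPADDR plus PREFIX or NETMASK.

    Returns None when IPADDR or both prefix settings are missing; this sketch
    does not guess a default prefix.
    """
    address = cfg.get("IPADDR")
    mask = cfg.get("PREFIX") or cfg.get("NETMASK")  # PREFIX wins if both set
    if not address or not mask:
        return None
    return ipaddress.IPv4Interface(f"{address}/{mask}").network


def on_link(cfg: dict[str, str], peer: str) -> bool:
    """True if peer is on the local subnet, i.e. reachable without the gateway."""
    network = local_network(cfg)
    return network is not None and ipaddress.IPv4Address(peer) in network


# Hypothetical ifcfg-eth0 contents for illustration.
example = """\
DEVICE=eth0
BOOTPROTO=none
IPADDR=192.168.1.20
NETMASK=255.255.255.0
GATEWAY=192.168.1.1
"""
cfg = parse_ifcfg(example)
print(local_network(cfg))           # 192.168.1.0/24
print(on_link(cfg, "192.168.1.5"))  # True
```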

Now I can't work out how a power cut could have removed them (not very likely) nor how it could have worked for so long if they weren't there to start with.

Anyway, big sigh of relief and much thanks for the suggestions. (And some more knowledge acquired).

Re: Odd Networking Issue that is really frustrating me

Post by nelz » Fri Mar 24, 2017 10:11 pm

Dutch_Master wrote: I've never come across this "/forcefsck" tool you mentioned before, so I can't tell you anything about it. Probably something systemd related, for which it should be taken behind the shed and its misery ended very definitively :roll: :mrgreen:


/forcefsck is neither a program nor a part of systemd. Some distro startup scripts look for the presence of a file called forcefsck in the root of a filesystem and run fsck if they find it. Otherwise fsck is normally only run every 30 mounts or so, or when the filesystem has not been cleanly unmounted.
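The rule described above can be sketched roughly as follows. This is a simplified Python model, not any particular distro's boot script; the function and parameter names are hypothetical, and on ext filesystems the real mount counts live in the superblock (see tune2fs).

```python
from pathlib import Path


def should_check(root: Path, mount_count: int, max_mount_count: int,
                 cleanly_unmounted: bool) -> bool:
    """Simplified model of a boot-time decision to fsck one filesystem.

    Checks are triggered by a 'forcefsck' flag file in the filesystem's root,
    by an unclean unmount, or by the mount count reaching its limit.
    """
    if (root / "forcefsck").exists():
        return True
    if not cleanly_unmounted:
        return True
    # A limit of 0 or less is treated here as "count-based checks disabled".
    return max_mount_count > 0 and mount_count >= max_mount_count
```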
"Insanity: doing the same thing over and over again and expecting different results." (Albert Einstein)

