You know what they say: the best time to make configuration changes is at 23:30 on a Sunday evening, right when you're about to go to bed. It's oh so very simple. It all starts with a sudo su, that's the squeeze, then the apt-get dist-upgrade, the pin pulled, and then the final piece, the humble little being, a disaster in a disguise that could trounce a Trojan Horse: Y, the figurative grenade thrown into the fragile house of glass.
That's what I did last night, and then spent the subsequent 16 hours following up on it. Granted, I did sleep in that time. In all honesty, it wasn't THAT bad, but it wasn't exactly smooth either. Below I've outlined the issues I came across and the fix (or hack) that got each one working again, in case anyone else runs into the same problems. I'm sure there are better ways to handle some of them, and I am certainly no expert, but if you're looking for a quick turnaround, the below should suffice.
For quick reference, the below deals with:
- eth0/Network down on boot
- Dovecot failing to start due to bad config
- Fail2Ban failing to start - iptables error 100
- Postfix Milters - opendmarc & opendkim refusing connection
- OSSEC false positive on grep
Immediately after upgrading I rebooted my remote cloud-based VM, waited about a minute, and then tried to SSH onto it... timeout. I reasoned that this was because it was newly updated, if only to appease my stress levels. Normally in this situation I take to refreshing my web page every few seconds (the server doesn't respond to pings). About 5 minutes went by before I realised something else was amiss. I hopped onto my CloudStack account, opened up a terminal, ran ifconfig and, much to my alarm, eth0 wasn't there. Running ifconfig -a told me that eth0 hadn't up and left, tired of my insatiable need to taunt it in the late hours, but rather that it was down. I presumed this was an issue with that particular boot and network availability (it's happened before), so I ran ifup eth0 and thought nothing of it. Oh, how wrong I was. I'm told that interfaces no longer come up automatically by default; I don't know whether that's true or whether it was just a glitch, but a system which previously leapt onto the internet with the enthusiasm of a bull faced with a red rag had suddenly lost its gusto.
TL;DR: The Fix: Your eth0 is down on every boot
You need to set it to come up automatically on boot. Open up the network interface config:
sudo nano /etc/network/interfaces
Then append the following:
auto eth0
Additionally, if it's missing, add the following:
iface eth0 inet dhcp
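If you'd rather not hand-edit the file through a console, the two lines above can be added idempotently with a small shell function. This is a minimal sketch, assuming Debian's ifupdown format and a DHCP-configured eth0; the function name ensure_iface_auto is made up for illustration.

```shell
#!/bin/sh
# Sketch: ensure an ifupdown interfaces file brings an interface up at boot.
# Assumes Debian's ifupdown syntax and DHCP. ensure_iface_auto is a
# hypothetical helper name, not part of any package.
ensure_iface_auto() {
    file=$1
    iface=${2:-eth0}

    # Already listed on some "auto" line (e.g. "auto eth0" or "auto lo eth0")?
    if ! grep -Eq "^[[:space:]]*auto[[:space:]](.*[[:space:]])?$iface([[:space:]]|\$)" "$file"; then
        printf 'auto %s\n' "$iface" >> "$file"
    fi

    # No "iface <name> inet ..." stanza yet? Add a DHCP one.
    if ! grep -Eq "^[[:space:]]*iface[[:space:]]+$iface[[:space:]]+inet[[:space:]]" "$file"; then
        printf 'iface %s inet dhcp\n' "$iface" >> "$file"
    fi
}
```

Try it against a copy first (cp /etc/network/interfaces /tmp/interfaces), then run it as root against the real file. Running it a second time leaves the file unchanged.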
Dovecot down, imap logins fail
Given the benchmark for this upgrade had been set, I moved on to my next issue. It occurred to me in that moment, perhaps by some form of divine intervention, that I'd not received the usual post-reboot spam from OSSEC and Fail2Ban. It was also around this time I realised Thunderbird was timing out, and I couldn't log into the web interface either. A quick look at the logs gave me this:
dovecot: master: Dovecot v2.2.27 (c0f36b0) starting up for imap, lmtp, sieve (core dumps disabled)
dovecot: lmtp(5130): Fatal: Invalid ssl_protocols setting: Unknown protocol 'SSLv2'
dovecot: master: Error: service(lmtp): command startup failed, throttling for 2 secs
Best practice with any internet-facing device is to lock down anything that could be considered a ticking time bomb, and SSLv2 and SSLv3 more than meet that criterion. But it turns out SSLv2 has been entirely purged from Dovecot's configuration, to the point that it doesn't know what it is.
TL;DR: The Fix: You still have a reference to SSLv2
The reference will most likely be in your dovecot.conf or your conf.d/10-ssl.conf. You have two options to fix it. The first is to find which file (possibly both) contains a reference to SSLv2, either SSLv2 or !SSLv2 (to disable it). To do so, run the following:
sudo grep -rHn "SSLv2" /etc/dovecot/
This will give you the filename and line number of each occurrence; go in and delete the SSLv2 entries. Remember not to delete !SSLv3, and if you don't see !SSLv3, put it in like so:
ssl_protocols = !SSLv3
An alternative fix is to override any other references by adding an ssl_protocols line to your dovecot.conf, or editing the one that's already there, so that it looks like the line above. A later setting overrides an earlier one, so make sure the line comes after the !include conf.d/*.conf line.
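For the first option, the grep-then-delete step can be scripted. This is a rough sketch, assuming GNU grep and sed (as shipped with Stretch); strip_sslv2 is a hypothetical name. It only touches ssl_protocols lines, removes SSLv2 and !SSLv2 tokens, falls back to !SSLv3 if nothing is left, and keeps a .bak copy of every file it rewrites.

```shell
#!/bin/sh
# Sketch: remove SSLv2 / !SSLv2 from every ssl_protocols line under a Dovecot
# config directory. Assumes GNU grep and GNU sed (Debian Stretch defaults).
# strip_sslv2 is a hypothetical helper name.
strip_sslv2() {
    dir=${1:-/etc/dovecot}

    # List files mentioning SSLv2, skipping backups left by earlier runs.
    grep -rl --exclude='*.bak' 'SSLv2' "$dir" | while IFS= read -r f; do
        # -i.bak keeps the original next to the edited file.
        # 1st expression: on ssl_protocols lines, drop each "SSLv2"/"!SSLv2" token.
        # 2nd expression: if that left the value empty, fall back to "!SSLv3".
        sed -i.bak -E \
            -e '/^[[:space:]]*ssl_protocols[[:space:]]*=/ s/[[:space:]]*!?SSLv2//g' \
            -e 's/^([[:space:]]*ssl_protocols[[:space:]]*=)[[:space:]]*$/\1 !SSLv3/' \
            "$f"
    done
}
```

Afterwards, doveconf -n | grep ssl_protocols shows the value Dovecot will actually use. The .bak files don't end in .conf, so the conf.d/*.conf include won't pick them up.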
As an aside, the upgrade to Stretch, as we can see here, brought with it a newer Dovecot in the main repository, so until now, unless you had compiled from source, you couldn't use settings such as the following, which I highly recommend you implement:
ssl_dh_parameters_length = 2048
ssl_prefer_server_ciphers = yes
Fail2Ban fails to ban, quelle surprise
This issue was a bit of a big one. Whilst my server is in no way the holy grail of exploits waiting to happen, I do like to do what I can to keep it secure, and automated SSH and IMAP attacks are frequent, so I rather depend on Fail2Ban to reactively put a stop to them. Unfortunately something, I don't know what, has changed in the current stable release for Stretch. The errors looked something like this, though there were a lot more of them:
fail2ban.actions.action: ERROR iptables -D INPUT -p tcp -m multiport --dports ssh -j fail2ban-ssh
iptables -F fail2ban-ssh
iptables -X fail2ban-ssh returned 100
Running the last line there as its own command will give you an iptables error stating "Too many links". If you're like me, you'll back up your firewall, purge it, and reload Fail2Ban with your fingers crossed. That does not work.
If you Google this issue, you'll find people talking about race conditions in Fail2Ban and suggesting you add a pause into the F2B code. I suggest you do not do that.
TL;DR: The Fix: Reinstall
Yep. I couldn't fix this; it was already getting late and I'd lost the will, so it was easier to just do a complete fresh reinstall. This did, however, let me take advantage of something either new or which I'd not seen before: Fail2Ban now supports direct integration with CloudFlare, which I'd previously achieved using a custom jail. To enable it, you simply have to set your action in your jail.local as follows:
action = %(action_cf_mwl)s
cfemail = <your cloudflare email>
cfapikey = <your cloudflare api key>
Unfortunately, this still uses CloudFlare's v1 API. To get it to add firewall rules with their new v4 API, you'll need a different action. Luckily one exists: create a file under /etc/fail2ban/action.d named cloudflare.local and enter the contents of this file into it. You will need to specify two variables, cftoken and cfuser; this can be done there or in /etc/fail2ban/action.d/cloudflare.conf. I believe you could also do it in jail.local, but don't quote me on that.
This works great: if someone is trying to get in when they shouldn't, they'll get blocked at CloudFlare's firewall. But I still want them blocked in iptables too, which this action doesn't support by default, though you can add it. Open up your jail.local and find 'action_cf_mwl = '. This may not be the 'proper' way of doing it, but to get it to also ban in iptables, append the following to it:
%(banaction)s[name=%(__name__)s, bantime="%(bantime)s", port="%(port)s", protocol="%(protocol)s", chain="%(chain)s"]
It should look like this:
action_cf_mwl = cloudflare[cfuser="%(cfemail)s", cftoken="%(cfapikey)s"]
                %(mta)s-whois-lines[name=%(__name__)s, sender="%(sender)s", dest="%(destemail)s", logpath=%(logpath)s, chain="%(chain)s"]
                %(banaction)s[name=%(__name__)s, bantime="%(bantime)s", port="%(port)s", protocol="%(protocol)s", chain="%(chain)s"]
Save that, restart Fail2Ban, and you're laughing.
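If you'd rather not edit the stock action_cf_mwl definition at all, the same combination can live in a drop-in file. This is a sketch under a couple of assumptions: that your Fail2Ban reads /etc/fail2ban/jail.d/*.local after jail.conf and jail.local (0.9.x does, as far as I know), and that write_cf_dropin, a made-up helper, is allowed to overwrite whatever file you point it at. The important detail is that each action sits on its own indented continuation line.

```shell
#!/bin/sh
# Sketch: write a Fail2Ban drop-in that bans through CloudFlare AND the local
# banaction (iptables by default). Assumes jail.d/*.local is read after
# jail.conf/jail.local; write_cf_dropin is a hypothetical helper that
# overwrites the file it is given.
write_cf_dropin() {
    out=$1      # e.g. /etc/fail2ban/jail.d/cloudflare.local
    email=$2    # CloudFlare account email
    apikey=$3   # CloudFlare API key

    # Unquoted EOF so $email/$apikey expand; the %(...)s references are left
    # for Fail2Ban's own interpolation. Each action is on its own indented line.
    cat > "$out" <<EOF
[DEFAULT]
cfemail = $email
cfapikey = $apikey
action_cf_mwl = cloudflare[cfuser="%(cfemail)s", cftoken="%(cfapikey)s"]
                %(mta)s-whois-lines[name=%(__name__)s, sender="%(sender)s", dest="%(destemail)s", logpath=%(logpath)s, chain="%(chain)s"]
                %(banaction)s[name=%(__name__)s, bantime="%(bantime)s", port="%(port)s", protocol="%(protocol)s", chain="%(chain)s"]
action = %(action_cf_mwl)s
EOF
}
```

Call it with the target path, your CloudFlare email and API key, then restart Fail2Ban. Bear in mind that any jail with its own action = line in jail.local will still use that instead of this default.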
On a final note, the php-fopen jail is broken in version 0.9.6 at least. This is noted here on the GitHub page, which also contains links to potential fixes.
Postfix Milters: Connection Refused
This one took a bit more head scratching, and I only discovered it this afternoon. Essentially, mail worked, but incoming mail wasn't going through the opendmarc and opendkim milters. The errors looked like this:
warning: connect to Milter service inet:localhost:12301: Connection refused
warning: connect to Milter service inet:localhost:54321: Connection refused
Worse still, the processes were running: ps faux | grep dmarc and ps faux | grep dkim both showed active processes, and there was nothing in the syslog to suggest there was an issue. After a lot of chasing the wrong problem, I came across a lot of (as is usually the case) StackExchange answers suggesting a permissions issue between postfix and opendkim/opendmarc. That was not the case, but those answers led me to run lsof -i :12301, and there was nothing listening on it. A quick look at the config files for both showed that they were configured to listen on local ports, the same ones shown in the log. It's this discovery that led me to this SE post and this bug report (https://bugs.debian.org/cgi-bin/bugreport.cgi?archive=no&bug=861169). The user there, who I thank the lord for creating, has the solution. Apparently the package's service is, by default (and thus when you upgrade and it gets updated), hard coded to use a socket, and so ignores the two configuration files in /etc/default where you've specified the inet addresses.
TL;DR: The Fix: A socket has been hard coded
It's a simple fix in the end, one of those magical "run these commands" fixes:
sudo /lib/opendkim/opendkim.service.generate
sudo /lib/opendmarc/opendmarc.service.generate
sudo systemctl daemon-reload
sudo service opendmarc restart
sudo service opendkim restart
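It's worth checking afterwards that the milters and Postfix agree on the address, because the two use different notations: OpenDKIM and OpenDMARC write an inet socket as inet:PORT@HOST, while Postfix's smtpd_milters uses inet:HOST:PORT. Below is a small sketch that converts one into the other; to_postfix_milter is a hypothetical helper and only handles inet sockets.

```shell
#!/bin/sh
# Sketch: convert an OpenDKIM/OpenDMARC inet socket spec (inet:PORT@HOST)
# into Postfix's milter notation (inet:HOST:PORT) so the two can be compared.
# to_postfix_milter is a hypothetical helper; only inet sockets are handled.
to_postfix_milter() {
    spec=$1
    case $spec in
        inet:*@*)
            rest=${spec#inet:}       # PORT@HOST
            port=${rest%%@*}         # text before the first @
            host=${rest#*@}          # text after the first @
            printf 'inet:%s:%s\n' "$host" "$port"
            ;;
        *)
            echo "not an inet:PORT@HOST socket: $spec" >&2
            return 1
            ;;
    esac
}
```

For instance, to_postfix_milter inet:12301@localhost prints inet:localhost:12301, which should match an entry in the output of postconf smtpd_milters, and lsof -i :12301 should now show something listening.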
OSSEC false positive on grep
This isn't hurting anyone, so I guess this is more of a notice. OSSEC uses RootCheck to identify trojaned files, and due to changes in grep in the new version, OSSEC thinks it is one. It's not.
You will get a notification for a rule 510 violation, looking like this:
Trojaned version of file '/bin/grep' detected. Signature used: 'bash|givemer|/dev/' (Generic).
Do not be alarmed, it's fine. Though, if someone did want to trojan you and go undetected, I guess this would be the apple of Eden. The issue is outlined on OSSEC's GitHub, where you can also see ways to verify that grep is, in fact, grep.
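To see why a perfectly good binary trips rule 510, it helps to know the signature is generic: as I understand it, rootcheck pulls the printable strings out of the binary and tests them against the regex bash|givemer|/dev/, so any grep that happens to embed a path like /dev/null will match. Here's a rough equivalent using grep itself; matches_trojan_sig is a made-up name, and this only approximates OSSEC's matcher.

```shell
#!/bin/sh
# Rough approximation of OSSEC rootcheck's generic signature for /bin/grep.
# Assumes the check boils down to "does the binary contain text matching
# bash|givemer|/dev/"; OSSEC's own string extraction and regex engine differ.
# matches_trojan_sig is a hypothetical helper name.
matches_trojan_sig() {
    # -a: treat binary data as text, -E: extended regex for the alternation,
    # -q: no output, just an exit status (0 = the signature matched)
    grep -aqE 'bash|givemer|/dev/' "$1"
}
```

Running matches_trojan_sig "$(command -v grep)" && echo flagged on an upgraded box should reproduce the alert. It explains the false positive but proves nothing about the binary itself; use the verification steps from the GitHub issue for that.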
If you want rid, there are two suggestions outlined here.
There were a couple more tidbits. For example, Apache2 (shudders) somehow ended up back on my system and killed Nginx, and for some bizarre reason, Nginx isn't proxy_pass'ing requests to my Ghost blog after a reboot, despite the fact that both are running. It takes a restart of Ghost to get it working, so that one is probably on them.
I hope this helps someone else like me, who just wants to get to bed but can't sleep knowing there's a warning message in their logs.