Task #2256

Setup a smarthost to relay service and automatic system emails

Added by Guilhem Moulin over 7 years ago. Updated over 5 years ago.

Status: Closed
Priority: Normal
Category: Mail system
Target version: Team - Q4/2018
Start date:
Due date:
% Done: 0%
Tags:

Description

All our boxes need to be able to send out email (such as system mails to hostmaster@tdf). Currently each smtp(8) client establishes TCP/25 connections directly to the remote MTAs. That doesn't scale well, because for each new host we need to take the public part of the generated DKIM key and add a TXT record to the zone. Moreover amavis and clamav are rather greedy in terms of resources, and having an instance of both on each of our hosts seems unnecessary.

This issue is about deploying one (possibly more?) smarthost to relay outgoing email from all our boxes, except:

- documentfoundation.org (TDF's mail server, private mailing lists)
- vm192.documentfoundation.org (redmine instance)
- vm194.documentfoundation.org (public mailing lists)
- intranet.documentfoundation.org
- monitoring.documentfoundation.org

Other boxes would use said smarthost as a relayhost, and delegate DKIM signing & virus detection to it. To secure links to the smarthost, each smtp(8) would use a client certificate and public keys should be pinned on both ends, as per the following snippets:

vmXYZ.tdf:/etc/postfix/main.cf
smtp_tls_security_level = may
smtp_tls_cert_file      = /etc/ssl/certs/ssl-cert-snakeoil.pem
smtp_tls_key_file       = /etc/ssl/private/ssl-cert-snakeoil.key
smtp_tls_policy_maps    = hash:$config_directory/tls_policy
smtp_tls_fingerprint_digest = sha256

vmXYZ.tdf:/etc/postfix/tls_policy
[smarthost.tdf]:25 fingerprint ciphers=high protocols=!SSLv2:!SSLv3:!TLSv1:!TLSv1.1
  match=$$SHA-256 digest of the smtpd(8)'s SPKI$$

smarthost.tdf:/etc/postfix/main.cf
relay_clientcerts            = hash:$config_directory/relay_clientcerts
smtpd_client_restrictions    = permit_mynetworks, permit_tls_clientcerts
smtpd_relay_restrictions     = …, permit_tls_clientcerts, …
smtpd_recipient_restrictions = …, permit_tls_clientcerts, …
smtpd_tls_ask_ccert          = yes

smarthost.tdf:/etc/postfix/relay_clientcerts
$$SHA-256 digest of the smtp(8)'s SPKI$$ vmXYZ.tdf
$$SHA-256 digest of the smtp(8)'s SPKI$$ vmUVW.tdf
…
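
For illustration, here is a minimal sketch of how the digest placeholders above could be filled in. It assumes the DER-encoded SubjectPublicKeyInfo has already been extracted from the certificate (for instance with `openssl x509 -in cert.pem -noout -pubkey | openssl pkey -pubin -outform DER`) and that Postfix expects the digest as upper-case, colon-separated hex pairs. The function names are hypothetical and not part of any deployed tooling.

```python
import hashlib


def postfix_fingerprint(spki_der: bytes) -> str:
    """SHA-256 digest of DER-encoded SPKI bytes as upper-case, colon-separated hex."""
    digest = hashlib.sha256(spki_der).hexdigest().upper()
    # Split the 64 hex digits into 32 two-character pairs joined by colons.
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))


def relay_clientcerts_line(spki_der: bytes, hostname: str) -> str:
    """One line for the smarthost's relay_clientcerts table: '<digest> <hostname>'."""
    return f"{postfix_fingerprint(spki_der)} {hostname}"
```

The same digest string would go after `match=` in the client's tls_policy entry, computed from the smarthost's key instead of the client's.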

The SPF policy for vmXYZ.tdf would be

vmXYZ IN TXT "v=spf1 a:smarthost.tdf ?all"

(we could even pre-fill the zone like we do for A records)
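
Pre-filling the zone could look roughly like the sketch below. The record template is the one given above; the helper name and the idea of feeding it a plain list of host labels are assumptions for illustration only.

```python
def spf_records(hosts, smarthost="smarthost.tdf"):
    """Return one SPF TXT record line per host label, all pointing at the smarthost."""
    return [f'{host} IN TXT "v=spf1 a:{smarthost} ?all"' for host in hosts]


if __name__ == "__main__":
    # Hypothetical host labels; the real list would come from wherever A records are generated.
    for line in spf_records(["vmXYZ", "vmUVW"]):
        print(line)
```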

The smarthost would also act as an MX for the nullmailer clients (with either a DISCARD rule for all valid senders, or by aliasing them to hostmaster). This is because some MTAs phone back to verify that the envelope sender address exists. However, machines that need to be able to receive email would keep an INADDR_ANY-listening smtpd, and use themselves as MX.

Actions #1

Updated by Florian Effenberger over 7 years ago

  • Target version changed from Q2/2017 to Q3/2017

For the boxes to be excluded, you can check our DNSWL record, as these send individually.
The above list looks about right, but I'm not entirely sure whether a host is missing.

Actions #2

Updated by Guilhem Moulin over 7 years ago

Actions #3

Updated by Florian Effenberger about 7 years ago

Taken from an older discussion: what we should also factor in is lowering internal mail filtering, e.g. from a VM to mail.tdf and the other way round. You wrote:

I usually use a different policy bank for that (making amavis listen on
another socket), and add a check_client_access line to the main.cf's
smtpd_recipient_restrictions with a CIDR table:

main.cf:
smtpd_recipient_restrictions =
    check_client_access cidr:$config_directory/filter-mynetworks.cidr

/etc/postfix/filter-mynetworks.cidr:
192.168.1.0/24 FILTER amavisfeed:[127.0.0.1]:10042
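
To make the lookup semantics concrete, here is a rough sketch of first-match evaluation over such a CIDR table. It is a simplified model written for this ticket, not Postfix's implementation: it only handles `pattern action` lines and comments, and the function names are made up.

```python
import ipaddress


def parse_cidr_table(text):
    """Parse 'network/prefix action' lines into (network, action) pairs, in file order."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        pattern, action = line.split(None, 1)
        # ip_network() is strict by default: it raises ValueError if host bits are set.
        rules.append((ipaddress.ip_network(pattern), action))
    return rules


def lookup(rules, client_ip):
    """Return the action of the first network containing client_ip, or None."""
    addr = ipaddress.ip_address(client_ip)
    for network, action in rules:
        if addr.version == network.version and addr in network:
            return action
    return None
```

With the table above, a client at 192.168.1.17 would get the `FILTER amavisfeed:[127.0.0.1]:10042` action, while any address outside the network falls through to the remaining restrictions.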

Actions #4

Updated by Florian Effenberger about 7 years ago

And some more of your comments on that:

IMHO the proper way to do that is to deploy a private interface on all
the machines and use a cidr map in the postscreen_access_list (so
internal clients completely bypass postscreen and immediately connect to
the smtpd). I argued for that a couple of infra calls ago, but at the
moment not all VMs have a configured private interface.

For now we could whitelist 89.238.68.0/25 and 2a00:1828:a012::/48, but
for the hosts outside manitu we would have to manually mine the public
IPs (since the mail config of mail.tdf is not deployed by salt).

Actions #5

Updated by Guilhem Moulin about 7 years ago

Florian Effenberger wrote:

And some more of your comments on that:

IMHO the proper way to do that is to deploy a private interface on all
the machines and use a cidr map in the postscreen_access_list (so
internal clients completely bypass postscreen and immediately connect to
the smtpd). I argued for that a couple of infra calls ago, but at the
moment not all VMs have a configured private interface.

For now we could whitelist 89.238.68.0/25 and 2a00:1828:a012::/48, but
for the hosts outside manitu we would have to manually mine the public
IPs (since the mail config of mail.tdf is not deployed by salt).

Thanks for the reminder :-) In fact we could avoid that by using a dedicated smtpd(8) for our internal mail traffic and having it listen on a port other than 25 (for instance TCP/587 or TCP/2525). postscreen is only useful on public interfaces; unlike the public one, the private smtpd(8) instance would mandate STARTTLS, request a client cert, and only allow authenticated traffic from known smtp(8) clients.

That being said, using a dedicated subnet from the private IP space avoids the need for encrypted tunnels. But we're already copying public key material for ssh, and it's pretty much the same for self-signed X.509 keypairs.

Actions #6

Updated by Florian Effenberger about 7 years ago

Smarthost should also deal with

  • pollux
  • antares

being unable to send mails due to missing FQDN (these are in intranet).

Actions #7

Updated by Florian Effenberger about 7 years ago

  • Blocks deleted (Task #2119: undeliverable e-mails)

Actions #8

Updated by Florian Effenberger about 7 years ago

  • Target version changed from Q3/2017 to Q4/2017

I'd like to prioritize the SSO topic (unless there's a compelling reason the smarthost is needed right now) for Q3, so re-assigning to Q4

Actions #9

Updated by Guilhem Moulin about 7 years ago

We also need to be careful not to use our upstream recursive servers to query DNSBL/DNSWL due to rate limiting. A solution would be to use one (or more) local caching resolver(s), and either query the root DNS servers directly, or spoof NS records for these zones pointing to their respective authoritative servers.

Pointing unbound(8) to the root DNS servers can be done with root-hints: "/etc/unbound/root.hints" (with a cron job to refresh that list regularly).
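
As a reminder of why these lookups add up, each DNSBL/DNSWL check is an ordinary DNS query for a name built from the client's reversed address plus the list's zone. A small sketch follows; the zone `dnsbl.example` is a placeholder and the helper name is hypothetical.

```python
import ipaddress


def dnsxl_query_name(client_ip, zone):
    """Build the DNS name queried for client_ip in a DNSBL/DNSWL zone."""
    addr = ipaddress.ip_address(client_ip)
    # reverse_pointer is e.g. '1.2.0.192.in-addr.arpa' or '<nibbles>.ip6.arpa';
    # drop the two trailing arpa labels and append the list's zone instead.
    reversed_labels = addr.reverse_pointer.rsplit(".", 2)[0]
    return f"{reversed_labels}.{zone}"


if __name__ == "__main__":
    print(dnsxl_query_name("192.0.2.1", "dnsbl.example"))
```

Every connecting client produces one such query per configured list, which is the traffic a local caching resolver would absorb instead of the upstream recursive servers.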

Actions #10

Updated by Florian Effenberger over 6 years ago

Where do we stand wrt. this?
Isn't the smarthost in place already? Are there machines that still need to start using it?

Actions #11

Updated by Guilhem Moulin over 6 years ago

Yes. I didn't migrate all hosts at once, in order to see how things scale, and this ticket should stay open until all hosts have been migrated. vm161 (blog) and vm146 (askbot) have also not been migrated yet, because they unfortunately relay a fair amount of spam and I was afraid it would taint vm202's IPs. It's probably time to have a look.

Actions #12

Updated by Florian Effenberger over 6 years ago

Sounds good and sensible, thanks!

Actions #13

Updated by Florian Effenberger over 6 years ago

  • Target version changed from Q4/2017 to Q3/2018

With all that's on the table atm, and the fact that this is not super urgent, I propose not to look into this before the end of June, unless you see it as more urgent.

Actions #14

Updated by Florian Effenberger about 6 years ago

How many systems are not using the smarthost yet, iow. can we close this ticket? :)

Actions #15

Updated by Guilhem Moulin almost 6 years ago

  • Target version changed from Q3/2018 to Q4/2018

All public boxes (excl. {monitoring,mail,vm194}.tdf) have been using it since the summer, as have all boxes that have been upgraded to Stretch. For the remaining ones, I reconfigure the MTA when upgrading the OS, hence re-targeting to Q4 (the current aim being to upgrade the remaining Jessie boxes).

Actions #16

Updated by Florian Effenberger over 5 years ago

I guess we can close this ticket now, or is something pending? :-)

Actions #17

Updated by Guilhem Moulin over 5 years ago

  • Status changed from New to Closed

Florian Effenberger wrote:

I guess we can close this ticket now, or is something pending? :-)

Ah yeah, that one is long done :-)
