[ale] Replacing shared host?

Rich Kulawiec rsk at gsp.org
Thu Dec 20 03:49:05 EST 2018


On Mon, Dec 17, 2018 at 11:29:38AM -0500, Simba via Ale wrote:
> You're not going to get  status reports on your abuse reports to any
> hosting provider, because you aren't entitled to them.

Ummm, no.  That's wrong.

Let me explain a few things about how competent operations are run
(in the context of this topic, clearly there are other contexts).

I'm going to try to be nice about this, but I'm kinda all out of gum
with respect to people who can't or won't do their jobs.

First, quality operations are designed, built, and operated to minimize
the possibility that they can emit abuse/attacks.  That covers a lot of
ground, from default-deny policies on outbound firewalls to mandatory
confirmed opt-in (COI) on mailing lists to customer vetting to netflow
monitoring.  And of course the measures vary with the operation, but
there are many things that apply to most.  I'm appending an example
approach in [1].

Second, these operations comply with best practices, whether formally
codified in RFCs or informally standardized via usage.  For example, RFC
2142 formally specifies "postmaster" as mandatory for all mail systems,
but it was a de facto best practice well before 1997.  Speaking of
RFC 2142, competent operations have working addresses as it specifies,
including "abuse" -- and they pay rapt attention to what shows up there.
Same for "security".  Of course they do: the traffic arriving at these
represents attempts by third parties to tell them that something is wrong,
and ignoring that is perilous.
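
(If you're curious whether a given operation even accepts mail for those
role addresses, a rough probe along these lines will tell you whether its
MX rejects them outright.  This is just a sketch in Python, standard
library only; you supply the MX host and domain, and a 250 reply only
means "not refused" -- plenty of servers accept everything and bounce
later.)

    #!/usr/bin/env python3
    # Rough RCPT probe for the RFC 2142 role addresses.  A 250 reply only
    # means "not rejected outright"; treat it as a hint, not proof.
    import smtplib
    import sys

    ROLE_ADDRESSES = ("postmaster", "abuse", "security")

    def probe(mx_host, domain):
        for role in ROLE_ADDRESSES:
            rcpt = "%s@%s" % (role, domain)
            with smtplib.SMTP(mx_host, 25, timeout=30) as smtp:
                smtp.helo()              # HELO with our own hostname
                smtp.mail("")            # null envelope sender: MAIL FROM:<>
                code, msg = smtp.rcpt(rcpt)
                print("%s: %d %s" % (rcpt, code, msg.decode(errors="replace")))

    if __name__ == "__main__":
        probe(sys.argv[1], sys.argv[2])  # e.g. mx.example.com example.com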

Third, they deal with these reports in a manner commensurate with their
urgency and severity.  That includes acknowledging them and providing
status updates to the reporters.  It also includes providing an incident
summary once the matter's been resolved, and if appropriate, an apology.
(It's almost always appropriate.)   Of course, the larger the receiving
operation, the greater its resources and thus the faster all of this
should happen.  For example, my expectation is that an operation the scale
of AWS should respond within the hour 24x7: it is obvious on inspection
that they can easily afford to staff for this.

Fourth, the investigation and remediation need to happen on a "whatever
it takes" basis.  Some incidents are truly trivial and can be quickly
dealt with, e.g., a misconfigured MTA that constitutes an open SMTP
relay.  Others may be far more complex.  It's worth observing that
full remediation may sometimes require the termination of a customer.
(For this, I highly recommend an approach with the thoroughness
of Garibaldi's Firing Method.)  For case studies in what happens when
operations *don't* do this, I recommend examining the performances of
Twitter and Facebook, both of which deserve an "F" grade only because
no lower mark is available.

Fifth, the abuse/security people/teams handling all this are empowered to
overrule operations, programming, marketing, sales -- everyone.  They have
to be, otherwise this can't work.  They're also equipped with appropriate
toolsets, long experience, and jaundiced attitudes.  This approach may
seem harsh, but unfortunately anything less doesn't work in practice.
(Why?  Because abusers/attackers, having detected a soft target, will
soon be back, and in greater numbers.  The target will become what we
call an "abuse magnet".)

Let me pause to note that sometimes folks will comment that this is
onerous -- that is, that the volume of incoming traffic is too much to
deal with.  My response to that is that it's easy to fix: stop emitting
so much abuse and generating so many attacks -- if necessary, by shutting
down the operation until its operators learn how to run it properly.
There is certainly no reason for the entire rest of the Internet to put
up with people who built something they don't know how to run
and then had the arrogance to plug it into our network anyway.

	"Current Peeve: The mindset that the Internet is some sort of
	school for novice sysadmins and that everyone *not* doing stupid
	dangerous things should act like patient teachers with the ones
	who are."
		--- Bill Cole

I have also received the comment that doing this is impossible.  That is
nonsense, of course: I've been running operations that do this for decades,
and have trained others to do the same.  Most of it's pretty straightforward.
And most of the rest becomes so with some training/practice.

And finally I have received the comment that it's not their
responsibility.  This is not merely wrong, but insanely wrong.  The first
duty of every sysadmin and every network admin is to ensure that their
operation is not an operational menace to the entire rest of the Internet.
It doesn't matter whether they're running a collection of huge data
centers or a 1U.  This duty pre-empts all others at all times.  And if
abuse/attacks come from their servers on their network on their watch,
then *they own those*.  Anyone who can't handle that responsibility
appropriately should unplug their systems/networks.

If it's not clear why it has to be that way, then consider this:

All of the abuse that your (as in "you" the reader) systems log, all
of the attacks that you must fend off, do not magically fall out of the
sky.  They come from systems run by people who aren't doing their jobs
properly -- because well-run operations do not exhibit this behavior.
Keep in mind that if you can see abuse/attacks arriving, then they can
most certainly see them leaving -- if only they bother to look.  And they
can stop them, if only they bother to act.  Thus you might very reasonably
wonder why you see the same things day after day from the same operations.
You also might be more than a little annoyed that *you* are paying the
costs of *their* failures, whether that payment is aggravation, risk,
complexity, time, money, or otherwise.

Consider two other things as well:

	1. There is an entire market sector that exists solely because
	of this.  All the products and services that defend against
	spam, DoS attacks, etc. exist solely because of this.  All of
	the money spent in this market sector has been expended because
	these people either couldn't or wouldn't do their jobs.

	2. As I observed many years ago, outbound abuse from an
	operation is a surface indicator of underlying security problems.
	It doesn't tell you what those problems are.  It doesn't tell
	you why they are.  But it is an existence proof for them.  And if
	that proof is on the table for all to see, why would anyone want
	to sign up to be their customer?  Doing so means moving into an
	operation that is quite clearly *already* compromised.

The bottom line is that competent operations are on top of all of this
and thus, while they may have sporadic/isolated problems, they don't
have chronic/systemic problems.  They pay attention to problem reports,
they deal with them swiftly and appropriately, and they take preventive
steps to avoid repeat incidents.  These are responsible professionals
(or responsible amateurs) who understand what it takes and make it happen.

Then there are the others.  For them, the best rebuttal is a firewall.
There is no reason to continue granting them access privileges
if they're going to use those for abuse/attacks.  (If you allow someone
into your home, repeatedly, and they choose to repay your hospitality
by relieving themselves on your living room carpet, repeatedly, it would
be foolish to let them in yet again.)

This is why Digital Ocean has earned its very own firewall ruleset.
They worked quite diligently at it: they kept sending abuse and attacks,
and ignoring (or /dev/null'ing) my reports.  Eventually I decided to
give them what they so earnestly requested.

Same for Choopa/Vultr.  Same for some others.

And thus we have the problems (some of the problems) that we have in the
contemporary Internet environment.  BUT: that is no excuse for any of us
to make them worse.   Quite the contrary, it's ample reason for all of us
to continuously try to do better.  It's our responsibility to each other.
Just because some people are incompetent and negligent (and that's a
charitable assessment; it's also possible that they're not facilitating
the abusers, but that they *are* the abusers) doesn't mean we need to be.

---rsk

[1] Here's an example of pro-active outbound abuse/attack control.

Let's suppose you have a network with a dozen hosts doing various things:
web server, mail server, DNS server, database server, etc.  And today
you're going to work on preventing these from becoming an outbound
(email) spam/phish/etc. problem.

(a) Configure the MTAs on all but one host as dumb/stub mail systems which
automatically forward all non-internal traffic to the smarthost -- the
single mail system which is actually clueful enough to send traffic
out to destinations on the Internet.

(b) Block (on the hosts and on the firewall) outbound port 25 on all but
the smarthost.  Set alarms for any traffic trying to get out, because
that should never happen.  (Why block in two places?  Because then
a single-point compromise will not remove all the blocks.)
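
If you want to verify that the block actually took, a quick check run from
any of the non-smarthost machines looks something like this.  It's a rough
Python sketch; the target hostnames are placeholders, so point it at mail
hosts you control or are comfortable touching:

    #!/usr/bin/env python3
    # Run from any host that is NOT the smarthost: outbound port 25 should
    # be blocked, so a successful connect means a block is missing somewhere.
    import socket
    import sys

    TEST_TARGETS = ["mx.example.com", "mail.example.org"]   # placeholders

    def port25_reachable(host, timeout=5.0):
        try:
            with socket.create_connection((host, 25), timeout=timeout):
                return True          # connect succeeded: the block is missing
        except OSError:
            return False             # refused/filtered/timed out: looks blocked

    if __name__ == "__main__":
        leaks = [h for h in TEST_TARGETS if port25_reachable(h)]
        if leaks:
            print("ALARM: outbound port 25 reachable via", ", ".join(leaks))
            sys.exit(1)
        print("OK: outbound port 25 appears blocked from this host")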

(c) Figure out what a normal level of traffic from the smarthost looks
like, that is, in data volume per unit time.  Why?  Because even if you
have a gigabit of bandwidth available, you almost certainly don't
*need* that much for SMTP.   And when doing this kind of engineering,
any excess resource you don't need should be considered a liability:
you won't use it, but your adversaries sure will.

So let's say you measure outbound data volume and, over a week, you see
that it peaks at 1 MB/hour.  Pick a fudge factor, maybe 5, and bandlimit
outbound flow from the mailhost to 5 MB/hour, with an alarm if that's
exceeded.
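
The alarm half of that can be as simple as summing message sizes out of the
MTA log once an hour and comparing the total against the limit.  Here's a
rough Python sketch; the log path, the Postfix-style "size=" field, and the
5 MB figure are assumptions to adjust to your own MTA and baseline:

    #!/usr/bin/env python3
    # Very rough hourly volume check for the smarthost: sum the size= fields
    # the MTA logs for queued messages and alarm past a limit.  In practice
    # you'd feed this only the last hour's worth of log lines (a logrotate
    # slice, or journalctl --since "1 hour ago" written to a file).
    import re
    import sys

    LIMIT_BYTES = 5 * 1024 * 1024        # 5 MB/hour, per the fudge factor above
    SIZE_RE = re.compile(r"\bsize=(\d+)")

    def bytes_logged(path):
        total = 0
        with open(path, encoding="utf-8", errors="replace") as log:
            for line in log:
                m = SIZE_RE.search(line)
                if m:
                    total += int(m.group(1))
        return total

    if __name__ == "__main__":
        total = bytes_logged(sys.argv[1])    # e.g. /var/log/mail.log
        if total > LIMIT_BYTES:
            print("ALARM: %d bytes logged, limit is %d" % (total, LIMIT_BYTES))
            sys.exit(1)
        print("OK: %d bytes logged" % total)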

(d) Monitor the MTA's logs on the mailhost.  Even something as simple
as lines-generated-per-hour might be enough to flag a problem.

But if you want to get a bit fancier: most mail systems exhibit
easily measurable patterns.  Mail goes out to the same places most
of the time, at roughly the same volumes.  Joe's Donuts in Dubuque,
Iowa does not routinely send traffic to Peru or Portugal or Pakistan.
Use knowledge of these -- which you get by slicing and dicing your
logs -- to craft alarms that go off when unusual things happen.
Refine the alarms to adjust for false positives, and definitely for
false negatives.
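
As a starting point for that slicing and dicing, here's a rough Python
sketch that counts destination domains in a Postfix-style log and flags
anything not in a baseline list you've built up from normal traffic.  The
"to=<...>" pattern and the baseline file are assumptions; the shape of the
check is the point:

    #!/usr/bin/env python3
    # Count destination domains in the MTA log and flag any domain not seen
    # in the baseline.  Usage: check_destinations.py MAILLOG BASELINE
    # where BASELINE is a file with one known-normal domain per line.
    import re
    import sys
    from collections import Counter

    TO_RE = re.compile(r"\bto=<[^@>]+@([^>]+)>")

    def destination_counts(log_path):
        counts = Counter()
        with open(log_path, encoding="utf-8", errors="replace") as log:
            for line in log:
                m = TO_RE.search(line)
                if m:
                    counts[m.group(1).lower()] += 1
        return counts

    if __name__ == "__main__":
        log_path, baseline_path = sys.argv[1], sys.argv[2]
        with open(baseline_path) as f:
            baseline = {d.strip().lower() for d in f if d.strip()}
        unusual = {d: n for d, n in destination_counts(log_path).items()
                   if d not in baseline}
        if unusual:
            print("ALARM: mail to domains outside the baseline:")
            for domain, n in sorted(unusual.items(), key=lambda kv: -kv[1]):
                print("  %s: %d messages" % (domain, n))
            sys.exit(1)
        print("OK: all destination domains are within the baseline")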

(e) You can (and should) also do things like blocking *all* traffic
to and from the networks on the Spamhaus DROP list.  (Why?  Because
all possible outcomes of permitting that traffic are bad for you.)
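
Pulling the DROP list and turning it into something your firewall can eat
is a few lines of Python.  This sketch just prints the listed CIDR blocks,
one per line, so you can feed them to ipset, an nftables set, or plain
iptables rules; the URL and the "cidr ; SBL id" line format are as
published at the time of writing and may change, so check spamhaus.org
before depending on it:

    #!/usr/bin/env python3
    # Fetch the Spamhaus DROP list and print the listed networks, one CIDR
    # per line, for whatever firewall mechanism you use to block them.
    import ipaddress
    import urllib.request

    DROP_URL = "https://www.spamhaus.org/drop/drop.txt"

    def fetch_drop_networks(url=DROP_URL):
        with urllib.request.urlopen(url, timeout=30) as resp:
            text = resp.read().decode("utf-8", errors="replace")
        nets = []
        for line in text.splitlines():
            line = line.strip()
            if not line or line.startswith(";"):       # comments and blanks
                continue
            cidr = line.split(";", 1)[0].strip()       # "1.2.3.0/24 ; SBL123"
            nets.append(ipaddress.ip_network(cidr))
        return nets

    if __name__ == "__main__":
        for net in fetch_drop_networks():
            print(net)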

(f) Revisit this periodically to see if any improvements can be
made.  Every incremental step correspondingly decreases the probability
that you'll originate abuse/attacks, and in judicious combination
they can frustrate all but the most clueful, determined,
well-resourced adversaries.


Does all of this stop abuse?  No.  But it does four things: first,
it limits the damage you can do.  You're still going to have a Bad Day
if your mailhost starts spamming, but if it only sends 2% of what it
*could* have sent, you will avoid a much worse day.  Second, it gives
you a fighting chance of detecting the problem.  Yes, other people
might eventually tell you about it, and that's great, but it would
be much better if you caught it first.

Third, it makes your operation a harder target.  Especially if
you repeat this approach across every server, every service, every
port, every protocol.  This discourages future attackers/abusers,
because they have their pick of much softer targets.  This in turn
reduces your future workload.  Think of it as being the opposite
of an abuse magnet.

Fourth, as you've probably guessed, it also hardens you against
*inbound* attacks.  So even if you completely lack the resolve
to defend the Internet from your operation, you should care about this
anyway because it's in your self-interest.

