Risk at the Margin

Humans are, generally, pretty awesome at risk management.  Why, then, do we seem to be so bad at it – and in so many different ways – when it comes to assessing risk in the CoViD era?

Risk Models

First, let’s talk about how humans make most risk decisions.  Risk comes in a lot of different flavors (injury, long-term health, short-term health, embarrassment, financial, ….), and everyone weights those flavors differently.  For simplicity, I’m going to talk about risk as if it lives on a single, linear scale, like so:

Slide2
A human has an aggregate risk tolerance, somewhere on that scale:

Slide3
Really, you’re almost certainly all the way over on the left.  Humans are really risk averse, because we think we’re sort of immortal, and we don’t want to jeopardize that.
Slide4

When you assess an activity, you’re quickly going to put it either to your left (Safe! Do this!) or your right (Unsafe! Don't do that!).  While the “safe” activity might actually increase your risk, it seems like an activity you already accept, so you probably don’t consider it to make you less safe on the whole. Only for a tiny number of decisions do you actually need to think about the risk.

Slide5
That area in between “safe” and “unsafe” is really small. Most of the time, you don’t ever have to evaluate a set of choices that sit on the margin between safe and unsafe, or be forced to pick between two unsafe activities.  The distance between safe and unsafe is extremely small, although from our personal perspective, it seems massively large, since just about all risk decisions that we ever think about happen inside the margin.

new6
This presents a hazard for us: we believe that the decision between "safe" and "unsafe" is really obvious, because the choices are so far apart, when, really, many of these choices are separated by a tiny amount, and even small errors in our decision-making process may put something on the "wrong" side.

Making decisions
Human decision-making can be modeled using Boyd's OODA Loop: we observe an input, we orient that input to our model of the world, we decide what we should do, and then we act on our plan.
Slide6
We do this so often that our brains have optimized to perform our decision-making without thinking about it. We're like a machine-learning algorithm on steroids; our minds rapidly pattern-match to get to a quick, cognitively cheap, good-enough solution, so we move from "Observe" to "Act" before we can even get introspective about the decision.
Risk at the margin

Orientation often starts with pulling models you think you know about, and using those as rough approximations.  So CoViD might be “like the flu, but worse,” even though CoViD risk bears less resemblance to flu risk than cricket does to baseball.  Sure, you can plan a trip to a cricket game by starting with the baseball rules and modifying them until you have cricket, but your sense of a three-hour game is woefully inaccurate.


One failure mode is that once you bucket a novel risk on one side of your margin or the other, you will not consider it further.  “CoViD is just like the flu, so I’ll behave normally” and “CoViD is the end of the world, let’s enter a post-apocalyptic dystopia” might describe ways this failure mode might kick in. Once you've bucketed the risk, it becomes really hard to move from "unsafe" to "safe." Let's put some arbitrary numbers onto our risk scale. Perhaps the most risk you'll accept to keep something in the "safe" bucket is 0.09 risk units, and the lowest risk that puts something in the unsafe bucket is 0.11 risk units.

margin7

So it would seem that subtracting 0.02 risk units should let us change our decision. Unfortunately, we're really reluctant to change our minds about risk. That's partly because, once we think about risk, we feel like we take a lot of it, and our margins seem much larger to us, perhaps ranging from 0.05 risk units all the way up to 0.95.

margin8
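To make the gap between the two scales concrete, here is a minimal sketch using the arbitrary numbers above. The function name `shift_as_fraction_of_margin` is invented for this illustration; it only expresses a change in risk as a share of the margin's width.

```python
def shift_as_fraction_of_margin(shift: float, safe_max: float, unsafe_min: float) -> float:
    """Return how much of the safe/unsafe margin a change in risk covers."""
    width = unsafe_min - safe_max
    if width <= 0:
        raise ValueError("unsafe_min must be greater than safe_max")
    return shift / width


SHIFT = 0.02  # risk units removed by a mitigation, as in the text

# Actual margin from the text: 0.09 (still safe) up to 0.11 (already unsafe).
actual = shift_as_fraction_of_margin(SHIFT, safe_max=0.09, unsafe_min=0.11)

# Perceived margin from the text: 0.05 up to 0.95.
perceived = shift_as_fraction_of_margin(SHIFT, safe_max=0.05, unsafe_min=0.95)

print(f"share of the actual margin:    {actual:.0%}")
print(f"share of the perceived margin: {perceived:.0%}")
```

On the actual scale, the 0.02 change spans essentially the whole margin; on the perceived scale it is a couple of percent of it, which is one way to see why the change feels too small to justify revisiting the decision.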

A different failure mode starts when you correctly identify that the aggregate risk is in the margin and requires complex thought (“I can stay home to avoid CoViD, but then I won’t make any money and I might go stir-crazy, so how do I safely engage?”).  You might miss steps that would be helpful: wearing KN95 masks, getting fresh air, allowing air to diffuse before sharing a space, not singing in an enclosed area.  You’ll need to mitigate:  as your perception of risk goes up, you’ll take safety measures to push it down.  (Note that as your perception of risk goes down, you’ll remove safety measures to let it come back up.)
Slide7

Using the flawed model of CoViD-as-flu-but-worse, you might convince yourself that certain mitigating steps will help more than they do: rigorously washing surfaces, eliminating shared objects, wearing lightweight cloth masks everywhere. You think you've reduced the risk of your choices down into the safe zone, even if you haven't. (Or your risk was in your safe zone, but you were wrong about the risk, and the safety theater measures you engage in aren't changing your risk.) On the other hand, you might use the inverse (and also flawed) model of CoViD-as-flu-but-better, and convince yourself that it's okay to take even more risks, because the people telling you it's dangerous are clearly wrong, so what else are they wrong about?

The Failure of Expertise


It's natural to want to ask for help.  You ask an expert, “Is this safe?”  That’s a loaded question.  Almost nothing is “safe”.  While you really want to ask, “Given that I’m comfortable with this level of risk, what actions should I be taking?”, a risk expert hears “Tell me everything is okay,” and they aren’t going to do that.

Only you can make risk choices on your behalf, because only you can decide if the reward is worth the risk for you.  An expert, who isn’t you, is generally going to err on the side of extreme caution, because they have an entirely different risk problem:  If they say something is “safe,” and there is a bad outcome, it’s their fault and they get blamed.  And since they’re often dealing across a population, even most “sort of safe” risks still pose a risk to someone in the population, so it’s easiest to have a very rigid canonical answer: Don’t do anything.

Experts in an area of risk functionally end up only being able to veto things, because it’s too dangerous to do anything else. There is no incentive for an expert to ever suggest removing a safety control, even if it is high-cost and useless, because for them, the downside is too massive.

Confirmation Biases
If you make a “wrong” decision, it’s really hard to correct it, without radically new data.  If you put a group of activities on the wrong side of your risk tolerance, revisiting it generally requires you to be able to challenge every risk choice you’ve ever made.  That’s … not easy.  Even somewhat new data is easier to discard if it challenges your decisions, or easy to rigorously hold onto if it supports them (even if it later turns out to be incorrect).

Inspecting your models is one of the most helpful things you can do, but it’s hard; especially if you’ve been arguing loudly against people who made different choices than you.  You risk the embarrassment of agreeing with someone that you’ve said was foolish, so it’s simpler to dig in.

Risk at the Margin

You might have adjusted a risk that lives in your margin to push it just barely to one side or the other (“I’ve mitigated risk enough that I choose to do this” vs. “I can’t mitigate this risk so I won’t do this activity”).  However, you are likely now going to stop inspecting that risk; it’s either in the Safe or Unsafe bucket.  Most people don’t waste cognitive capacity keeping track of marginal risk once they’ve bucketed it.

Boiling the Frog
If it’s hard to deal with a wrong risk choice, consider how much harder it is to deal with a mostly right risk choice, when the world changes and now that choice becomes wrong.  As incremental evidence comes in, you’re going to keep your choice on whichever side of your risk tolerance you placed it, because that’s easier.  But if you’d just barely moved it to one side, ignoring evidence that it is pushing to the other side is dangerous … but really easy.

Make your Predictions
One way to treat this risk confusion is to commit to predictions in advance.  “When all the adults in my house are fully vaccinated, then they can go eat lunch at a restaurant.”  That’s a safe commitment to make in advance, and a harder one to make in real time.  By depersonalizing a decision a little bit (you’re making the decision for your future self, so you’re a little more invested than an expert), you can engage conscious risk decision-making to your benefit.

Leading to Representation

It’s a trope among managers and executives that making significant inroads on building a more representatively diverse workforce is almost impossible.  Moving the needle by even a fraction of a percentage point in a normal year is considered a massive success worth celebrating.

That’s a cop-out.  It’s not easy, but it isn’t impossible.  And here’s my roadmap for doing so.

First the data, so you can see the success.  I started doing detailed tracking way too late in my career, in the middle of 2017, when I realized that the information I wanted wasn’t accessible via our normal manager toolkits, and it was too much labor to pull through my HR business partner.  I kept a spreadsheet (all good databases start as Excel!), and I recorded, for all of my staff, a few fields: Name, Pay Grade, Country, Startdate, Gender, and Race.  For those last two fields, I used a very small number of buckets to more closely align with Akamai HR norms.  The Gender summary includes Male, Female, and Non-Binary; trans staff were, for this summary, grouped with the gender they had publicly declared at that time. Note that all of the people who worked for me are individual humans that I know and care about, and I do them disservice with any bucketing strategy; but this summary is aligned with the metrics that Akamai tracked on an annual basis.

Every six months, I’d make a new version of the spreadsheet, with an updated snapshot of the organization.  I’d then summarize the data, so that I could compare trends across time.  I looked at non-white staff in the US population (“Minority”), Black/Hispanic staff in the US, female staff in the global population (“Women”), as well as non-binary staff globally.  I looked at crosscuts by seniority; staff in pay grades at or above manager level (“Senior”) versus those below manager levels (“Junior”).  Additionally, I tracked longevity, to look at those with less than one year of company tenure (“New”), one to five years (“Mid”), and those with more than five years (“Long”).  I used company tenure rather than team tenure intentionally, because I want to look at career progression in the company.  Given the small number of non-binary staff, I don't drill into them in the detailed views, which only explore women, minority, and Black/Hispanic populations. 
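The snapshot-and-summarize routine above can be sketched in a few lines of Python. This is an illustration rather than the actual spreadsheet: the record layout mirrors the fields listed earlier, but the numeric pay grades, the `MANAGER_GRADE` threshold, the function names, and the sample staff are all invented for the example.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class StaffRecord:
    name: str
    pay_grade: int       # hypothetical numeric grade
    country: str
    start_date: date
    gender: str
    race: str


MANAGER_GRADE = 7  # hypothetical: grades at or above this count as "Senior"


def seniority(rec: StaffRecord) -> str:
    return "Senior" if rec.pay_grade >= MANAGER_GRADE else "Junior"


def tenure_bucket(rec: StaffRecord, as_of: date) -> str:
    """Bucket by company tenure: under one year, one to five, over five."""
    years = (as_of - rec.start_date).days / 365.25
    if years < 1:
        return "New"
    if years <= 5:
        return "Mid"
    return "Long"


def representation(records, in_group, key):
    """Share of records matching `in_group` within each bucket given by `key`."""
    totals, members = Counter(), Counter()
    for rec in records:
        bucket = key(rec)
        totals[bucket] += 1
        if in_group(rec):
            members[bucket] += 1
    return {bucket: members[bucket] / totals[bucket] for bucket in totals}


# Made-up snapshot, for illustration only.
snapshot = date(2021, 1, 1)
staff = [
    StaffRecord("A", 8, "US", date(2014, 6, 1), "Female", "group-1"),
    StaffRecord("B", 5, "US", date(2020, 7, 1), "Male", "group-2"),
    StaffRecord("C", 6, "IN", date(2018, 3, 1), "Female", "group-3"),
    StaffRecord("D", 9, "US", date(2012, 1, 15), "Male", "group-1"),
]


def is_woman(rec: StaffRecord) -> bool:
    return rec.gender == "Female"


by_seniority = representation(staff, is_woman, seniority)
by_tenure = representation(staff, is_woman, lambda r: tenure_bucket(r, snapshot))
print(by_seniority)
print(by_tenure)
```

Running the same summary on each six-month snapshot gives the per-bucket series that can be compared across time. For the US-only views, the records would first be filtered by country so the denominator matches the population being measured.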

[Chart: summary of team representation across the six-month snapshots]
At first glance, you might wonder why the numbers went down in Minority and Women groups from the summer of 2017 to the following winter.  That’s partially an artifact of temporary workers, and I learned to only really look at the data year to year to separate out our summer employees.  In the last year, representation of women has leveled out, as well.  I attribute that to a combination of factors, but it starts with retention: those twenty-one net new staff from the start of 2020 until now?  That’s almost all hiring, because only one person left the team in that time period, and it was in the first week of 2020.  Since an Akamai reduction in force early in 2018, my team has had only ten departures.  If we’d had the tech industry average turnover, we’d’ve expected to lose fifty people over that three-year window.  We would have had to hire twice as many people over those three years to have maintained the same trajectory.  There’s also just a bucketing artifact; the last five people to start in 2019 were women, and the first four in 2020 were men.

But that reduction in force, timed with a few other personnel issues on a small population, also significantly impacted the Black/Hispanic population.  Because those issues are about specific individual humans, I won’t dive into them here, but there is a strong lesson in representation: when you have small numbers of a represented population, even a few changes at the same time are not only significant on a chart, but they are significant in effect, as your team becomes visibly less representatively diverse.  You shouldn’t change your standards (unless they’re bad standards) to prevent this, but it’s another reason to drive for increased representation:  so you aren’t tempted to ever just work to a metric.  I don’t believe our team ever did, but I regret putting them in a position where they might have felt the pressure to just meet a metric.

Retention

Retention is a huge part of my strategy for changing representation.  I’d actually argue that it is more important than hiring, because if you have a retention problem, then it’s going to affect your hiring as well.  So how do you retain great staff?

Notice that I said “staff,” and not “women” or “minorities.”  While your strategy to build an inclusive environment needs to be informed by the diverse needs of your team, if you try to build an environment that is only intended to be inclusive to one aspect of your team, you’re not going to succeed.  Not for the obvious reason, either – sure, you’ll alienate your male staff – but for the less obvious.  Your staff will notice your insincerity.  You’re going to focus on being inclusive to stereotypes, rather than to the actual humans who work for you. There are a lot of aspects to inclusion, but I’m going to focus on three here: professional, work-life integration, and unique needs.

Professional inclusion is one of the most important things you can address.  Every single member of your organization must have a professional development plan, not just the ones you see as “high potential.”  You should identify their next two jobs (and there might be options), and make sure they and their manager are talking about the development they need to show, and what opportunities might be available for them.  Your managers should remember those needs for when an opportunity does come up, so they are considering all of their staff, and not just the ones on their favorites list.  For some of your staff, their next best opportunity may not be in your team.  Help them to build the skills to leave, if that’s what is right for them – they might choose to stay instead, and you directly get the benefit.  But indirectly, when everyone sees that more of your staff are being taken care of professionally, you benefit with increased engagement and retention.

Work-life integration is one of the simplest, but least well utilized retention strategies.  If you don’t make it a focus, your managers will, unfortunately, betray you.  But it’s not their fault: it’s yours.  You believe that having an unlimited time off program is sufficient.  But then you tell the team how many new priorities that they have to juggle, and you never let them deprioritize work.  Your managers hear that as requiring more hours out of your employees.  If you don’t make it very clear that you value the wellness of your staff, continuously and frequently, your managers will subvert your message.  You need to make clear that you care more about the productivity of your employees over the next four years than over the next two weeks.  You’ll actually end up with more productivity.  Consider those 40 employees I didn’t lose.  If we assume that it takes a year to hire and train someone to comparable productivity (I think that’s laughably short for most security jobs), my team has had almost twenty percent more productivity than a team with average turnover.  That twenty percent buys an organization a lot of time flexibility, even ignoring the much higher productivity from staff with greater experience in the organization.
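The productivity estimate above depends on team size, which isn't stated here. As a rough sketch of one way to read the arithmetic, the snippet below treats every departure as costing one year of output and compares ten departures against fifty over three years. The function names and the headcounts in the loop are hypothetical, chosen only to show how the result moves with team size.

```python
def productive_person_years(headcount: int, years: float,
                            departures: int, ramp_years: float = 1.0) -> float:
    """Person-years of full productivity, assuming constant headcount and
    `ramp_years` of lost output for every departure that must be backfilled."""
    return headcount * years - departures * ramp_years


def relative_gain(headcount: int, years: float, actual_departures: int,
                  expected_departures: int, ramp_years: float = 1.0) -> float:
    """Extra productivity relative to a team with the expected turnover."""
    actual = productive_person_years(headcount, years, actual_departures, ramp_years)
    baseline = productive_person_years(headcount, years, expected_departures, ramp_years)
    return (actual - baseline) / baseline


# Ten departures versus an expected fifty over three years (from the text);
# the headcounts below are hypothetical.
for headcount in (60, 80, 100):
    gain = relative_gain(headcount, years=3, actual_departures=10, expected_departures=50)
    print(f"headcount {headcount}: {gain:.1%} more productive person-years")
```

Under this simple model the gain shrinks as the team grows, and a longer ramp-up time (the `ramp_years` parameter) pushes it higher.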

Years ago my team experimented with flexible work programs after parental leaves, allowing people to gradually phase in their work hours, rather than returning abruptly to a 5-day work week.  Some staff didn’t need it, but others greatly valued it.  We were in the process of officially codifying it when Akamai did us one better and added significantly more time to the parental leave program, but we still let staff phase themselves back into work.

Unique needs of your staff are a deep opportunity to excel.  When only one person needs something, and it’s outside the usual set of requests, it really does matter.  It’s easy to think you can just check a few boxes with having diverse interview panels and a better parental leave program, but your team will really pay attention when you notice, and react, to the unique needs of individuals.  Maybe you have someone for whom the office lighting is a problem.  Use your influence to push for a better solution for them.  Perhaps you have a person who celebrates holidays that aren’t observed by most of the organization. Do you move your own meetings to accommodate one person? Do you make sure future meetings take those holidays into account?  It’s the personal, small things that actually have the biggest impact in retention, because it creates a clear signal to everyone in your management team that your employees matter to you.

Longevity

Your goal shouldn’t just be about total representation in your organization, but needs to also look at representation among both your junior and senior staff.  It’s tempting to try to tackle senior staff representation at once, and just go hire from outside.  That presents a challenge, because senior staff hires are more complex in a number of ways.  My observation of our team is that while about 45% of my team is senior, only around 25-30% of our new staff are.  It ticks up a little for mid-tenure staff to 33%, and long-tenure staff are 70% senior.  Looked at from the opposite direction, my 15 most senior staff – director level and up, 29% women, 18% US non-white, 11% US Black/Hispanic – are all long tenured personnel, with one exception, as the most recent team member approaches their fifth anniversary this summer.  Clearly, my hiring strategy isn’t going to solve for senior staff quickly, but I can certainly check to see if we’re trending in the right direction.

[Chart: representation of women overall, by seniority (Junior, Senior), and by tenure (New, Mid, Long)]

This chart is a little bit of an eye chart.  The first cluster is just a copy of the overall representation for women from above.  The next two clusters show the representation among junior staff and senior staff, and you can see that senior staff representation is finally starting to tick up over the last two years.  Why?  The final three clusters have the story.  While our hiring representation (as seen in the “New” cluster) has been relatively good, the retention is key.  The mid-tenured staff is slowly growing more representative, and in the last two years we see long-tenured staff increasing in women’s representation.  The long-tenure representation tracks pretty similarly to the senior representation, and that’s going to be key: ensuring that we have an internal pipeline of candidates for senior positions.  Before I address that topic, a quick look at non-white US representation:

[Chart: US non-white (“Minority”) representation]
[Chart: US Black/Hispanic representation]
With lower numbers, and a little more jitter in the numbers, I see a pattern here that hints at even better futures.  The upswing in minority representation in the new- and mid- tenures suggests that the long-tenure will start to rise in the next few years, which ought to really start driving the representation in the senior cohort.  The smaller change in Black/Hispanic representation is more problematic, and, for the next CSO of Akamai, is clearly a place to investigate improvement efforts.


Promotions
The reason that retention matters so much in my philosophy is that the best senior staff are often those who come from within.  Every time we have an opportunity to fill a senior staff role – either through the very rare departure, or through increased funding to support a business partner – we have the freedom to hire one person, or to promote a handful...and still hire one person.  When an engineering team asks for a dedicated senior architect, that allows us to promote an architect, a senior security researcher, a security researcher II, and then hire a security researcher.

I also keep an eye on staff who haven’t had a promotion in a long time.  When I meet with my staff for our annual strategic staffing check-in, they each bring their list of people that they think are due for a promotion.  I also bring in my list, of the longest tenured people who haven’t seen a promotion in some time, so we can make sure we aren’t leaving people out, just because they aren’t making any noise.

But the real reason that promotions matter for representation numbers is because, frankly, the rest of the industry has done an atrocious job at developing competent senior staff, and the representation is awful.  Great staff with experience, especially if they also happen to check a diversity box, are getting paid premiums beyond what I’m willing to pay, simply because a lot of employers are trying to quickly patch their representation problems.  So if I want great senior staff, a lot of them are going to have to come from within.  Additionally, it means that we have to spend less time with senior staff on culture and value basics, because they’ve all spent years creating the baseline we expect.

Hiring
Hiring new staff is where everyone thinks the heavy hitting happens.   Like Moneyball, it’s not about home runs, it’s about just getting on base more often.  I’m not expecting my recruiters to solve all of my representation issues, but I can make it easier for them, by setting reasonable expectations.  We don’t hire unicorns.  We’ll develop our staff, so they needn’t be perfect when they walk in the door.  In fact, they may feel overwhelmed.

Insertion:  Many of our positions aren’t classic cybersecurity jobs at all.  The State of the Internet team (50% women, 25% minority) is a classic example.  Half of the team started their careers as reporters.  One is a data scientist.  One is a still-recovering auditor.  For many of our open positions, we go looking into adjacent career fields to find people with amazing skills that our career field is short on.

Early career:  We make heavy use of a college internship program, and we focus our summer projects with one clear success criterion: Did the intern have a valuable summer experience?  If we got useful work out of them, that’s icing on the cake, because our real goal is to find out if the intern will be a good fit for us, and if we’ll be a good fit for them.  Our goal is to know, by the end of the summer, if we’re going to make an offer, and make it as soon as possible.  We want the intern to know we want them back.

Transitions:  Akamai runs a fantastic Technical Academy (ATA), in which we pay people to go through an intense six-month training program to help them convert/return to a tech job.  Any time there is an ATA cohort where we have staff, we commit to hiring at least one person, because there is always at least someone who will be a good fit for a security team, even if they didn’t know it.

Market:  We did occasionally hire in the cybersecurity job market.  We paid careful attention to our job descriptions.  Did they use language that might dissuade candidates?  Did they contain voluminous requirements?  You’ll need to check the job descriptions after you post them publicly, because sometimes, in an effort to standardize and comply with various labor rules, the job descriptions get edited after you submit them. We made sure to have visibly diverse hiring panels, as well, partly to help convince talented staff to join us, and partly as a sensor to detect candidates who might have a problem in a diverse environment.

The Long Road
There isn’t just one secret.  You have to commit to a many-year effort to solve your representation issues.  You have to lead with deliberate care, and make your interest clear and visible all through your management chain.  But you can make a difference, with effective intention.

Understanding Risk

Operating or overseeing a business – whether it’s as a director, executive, or manager – requires an understanding of risk, and especially how it impacts your strategy.  But risk is a nebulous concept.  It means something different to everyone, so it helps to level-set not just on a working definition of risk, but on approaches to thinking both about novel risks (those that aren’t yet on your radar) and known risks (those that are on your radar).

What is Risk?
Risk is anything that has a chance of adversely impacting your business.  Risk isn’t intrinsically a bad thing; all entities have a risk appetite that balances the risks they take against the rewards they seek.  Companies have to invite risk to pursue rewards; consider that merely making a profit invites competitors who will increase your risk.  Risks exist in many broad categories – often, a practitioner in one category thinks of their kind of risk as being the only kind that matters – and it’s important to apply risk management thoughts across the spectrum of risk.  The most prevalent risk just comes from liquidity (having enough cash to operate), which can include credit risk (the money you are owed … doesn’t materialize).  You might have risk that comes from your market (you need specific truths to operate, like a bear market or zero interest rates), your business strategy (your specific market strategy hinges on an underlying axiom, like people renting movies in a store), or compliance regimes (you can be put out of business simply for not following a rule). You can also face risks from reputation (if people no longer are willing to do business with you) or operations (your security and safety practices).

Within those broad risk areas, we can think of significant amounts of risk as coming from hazards.  Hazards are the subset of risks that aren’t intrinsic to your strategy, and have the potential to be surprisingly disruptive.

Hazards come in many flavors.  Some are procedural: in execution of your strategy, you might make errors.  Some are adversarial or environmental: other entities outside your control could harm you through this hazard.  And some are perverse incentives: you might incentivize individuals on your team to do very dangerous things in execution of your strategy.  Each of these requires different forms of oversight to address, especially in places where they might interact.


Procedural Hazards
Many control regimes – from Sarbanes-Oxley to the NIST CSF to a whole host of ISO frameworks – are designed to help companies manage process risk.  Unfortunately, these frameworks, alone, seem to be insufficient to control those risks.  Overseeing risk can be challenging, as hundreds of detailed controls across an entire enterprise are potentially relevant, and identifying specific problematic areas isn’t an easy task.  Two important questions might help drive towards identifying hazards.

What is the scope of a control system?  Perhaps a company has a strong control in Identity and Access Management, and can report flawless execution in ensuring that only appropriate staff get access to systems.  But lost in the nuance of reporting is that the relevant control only applies to a subset of the systems in the company.  It’s the most important set of systems, of course, but importance is in the perception of management.  Right next to those important systems might be other, uncontrolled systems that don’t have good controls, which create hazards for the adjacent controlled systems.  Understanding where controls don’t cover the full scope of a company is an important first step.

How effective is the control system?  Some control systems look shiny from the outside, but on the inside, don’t actually provide meaningful protections.  It’s important to understand if there is a simple measurement that summarizes the control, which is also tied into the protections the control provides.  Perhaps the measure is reporting on activity (“We approved 75 products for launch this quarter”) and not on impact (“100% of products had absolutely no reported issues”).  An impact measure might reveal implausibility: a failure rate of 0% is not necessarily an indicator of a strong system.  It’s more likely an indicator of a control system that has no effect.

Combine these two questions as you consider how to report on the effectiveness of an overall control system.  Control effectiveness should report both on scope (what percentage of the system is controlled?) and effectiveness (what is the measure of process effectiveness?).  Risk appetite should be used to establish reasonable ranges for both of these measures, to identify when escalation will be needed to course correct, and how much escalation (telling executive management is likely a different threshold than telling the board). Identifying those thresholds before you cross them will save a lot of energetic conversation about whether or not something should be escalated.
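One way to picture such a report is the sketch below. It assumes effectiveness is measured as a simple pass rate and that escalation thresholds are agreed in advance; the class names, fields, and threshold values are all hypothetical, not drawn from any particular framework.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    """Values below which a measure escalates (hypothetical, set from risk appetite)."""
    executive: float  # below this, tell executive management
    board: float      # below this (a larger miss), tell the board


@dataclass(frozen=True)
class ControlReport:
    name: str
    systems_in_scope: int
    systems_total: int
    checks_passed: int
    checks_total: int

    @property
    def scope(self) -> float:
        return self.systems_in_scope / self.systems_total

    @property
    def effectiveness(self) -> float:
        return self.checks_passed / self.checks_total


def escalation(value: float, limits: Thresholds) -> str:
    if value < limits.board:
        return "board"
    if value < limits.executive:
        return "executive"
    return "none"


def review(report: ControlReport, scope_limits: Thresholds,
           effectiveness_limits: Thresholds) -> dict:
    return {
        "scope": escalation(report.scope, scope_limits),
        "effectiveness": escalation(report.effectiveness, effectiveness_limits),
        # A 0% failure rate deserves a closer look rather than applause.
        "implausibly_perfect": report.checks_passed == report.checks_total,
    }


# Made-up example: flawless execution, but only 40% of systems are covered.
iam = ControlReport("Identity and Access Management", 40, 100, 75, 75)
result = review(iam,
                scope_limits=Thresholds(executive=0.8, board=0.5),
                effectiveness_limits=Thresholds(executive=0.95, board=0.8))
print(result)
```

In this made-up case the effectiveness number alone would look perfect, while the scope measure is what triggers escalation, and the perfect pass rate is itself flagged for inspection.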

Environmental and Adversarial Hazards
Some systems have defects that can go badly wrong if exploited in just the wrong way.  Sometimes that exploit needs a malicious actor, a criminal who wants to create harm for your business.  Other times that exploit doesn’t require malice; perhaps an extreme winter storm pushes your system outside its design limits.

These hazards are sometimes challenging to talk about.  The hazards aren’t always easy to find, and can rarely be found with a simple checklist.  Sometimes the hazards are tolerable: you aren’t necessarily happy to have them, but you’ll tolerate them for a time.  Sometimes, these hazards are so intertwined into your system design and business process that even if you do decide to reduce the hazard, you’ll need to spend years coordinating cross-functional projects to root it out.

Discovery: One way that many companies identify these hazards is to employ experts who just know where to look.  Unfortunately, this approach relies on having a specific kind of unicorn: a deeply technical employee with broad-based knowledge of your entire system, a long memory to track issues, and the communication skills to educate your executive team about the hazards.  A more reliable approach is to embed hazard analysis all throughout the design process, and capture the hazards into a registry; and have that registry continuously reviewed – perhaps reassessing a few a month – to keep it updated with known hazards.

Mitigation: Some of those hazards you will need to mitigate.  You don’t need to reduce them all to zero (sometimes just taking the edge off by a little bit is sufficient to bring the risk back into your appetite), but once you decide to reduce the impact of the hazard, it’s helpful to identify success criteria.  Think of success criteria as a contract with your future self: “if I do this much work, measurable by this outcome, and the world hasn’t changed to make this more dangerous, then I get to celebrate success.”  It will be tempting along the way to move the goalposts closer, because mitigation projects can take longer than you originally expected.  Inspect that urge.  Did you really misestimate the danger originally, or do you just have fatigue and would like to be done, even if the hazard remains uncontrolled?

Awareness:  Some of your hazards you aren’t going to mitigate.  Perhaps the hazard is too embedded in your way of doing business.  Maybe the hazard is just below the level where it would be urgent to fix.  This is uncomfortable, because you’ll have to acknowledge the presence of these hazards, and it’s a natural reaction to avoid talking about them.  But you must, because the only real way to understand your risk appetite is to actually talk about the hazards that you accept, especially in the context of all of the hazards contained in your registry.  Gaining awareness (likely not comfort, but at least awareness) of which risks are accepted makes assessing new and novel risks an easier task.

Incentivized Hazards
The most pernicious hazards an organization faces are those that it creates for itself, by putting its own employees’ incentives at odds with its long-term best outcomes.  Sometimes this might be through poorly thought-out systemic incentives (consider JPMC’s “London whale” or the Wells Fargo cross-selling debacle); other times it might be created by specific pressures to achieve results (look at Volkswagen’s DieselGate or Theranos).

Most incentivized hazards create a tension between what ought to be the values and cultures of a company (which are often just plaques on a wall, rather than living touchstones), and the short-term needs of a company.  Incentives can be novel solutions to a changing business environment, or might arise from impossible business needs.  But detecting perverse incentives isn’t impossible; it just requires extra care.

Culture:  We shouldn’t expect employees to be the only line of defense against a hazard, but we should expect them to feel uncomfortable with conflicting goals, and comfortable raising that conflict with management.  Organizational values should be viewed like a detour sign: they indicate which paths to avoid.  Perhaps, to avoid a Wells Fargo style incident, a value like “Serve the best interests of our customers” would be helpful to create tension against “cross-sell as many products as possible to our existing customers.”

Changing business environment:  When the environment alters in a significant fashion, novel solutions to the problem create an automatic perverse incentive: the novel solution absolutely cannot be permitted to fail.  The team responsible for the solution is automatically incentivized to hide risks and adverse information, or, at minimum, to downplay them (the JPMC response to Basel III can be viewed in this light).  Inspect those novel changes closely for concealed risks.

Impossible business requirements:  Sometimes an organization needs an outcome so desperately that it can only be achieved by some breakthrough that seems impossible.  Similar to novel business processes, this creates an incentive to ensure that the solution exists, even if it doesn’t!  Consider VW, which needed an innovation in diesel engine technology which was otherwise unheard of.  Much like in a changing business environment, this should be seen as an indicator to dig very deeply into the solution, to understand if the solution truly works as advertised, or creates new hazards for the business.

Planning for perversion of incentives:  Almost any structural incentive can become perverse – consider that as a structural hazard of an incentive – but incentives can be instrumented to look for those hazards.  A variant of the pre-mortem is very helpful: assume that the incentive has already turned perverse, and then try to identify how that happened.  Putting in place measurements to detect those outcomes can be helpful (Is this incentive significantly more effective than we anticipated?  Is the business generated by this incentive structurally good?).

How much is enough?
Ultimately, the question that risk management programs seek to answer is “How much risk reduction is enough to get us back into our risk appetite?”  Or, rephrased, “How do you know that you did the best that you could, given the circumstances?”

The answer to that question isn’t a simple one, but it boils down to an understanding of how comfortable you are with the actions you took, and the decisions you made, given what you could know at the time.  Of course, with perfect foresight, you would perfectly navigate the risk environment, and only make bets that are worthwhile.  But you don’t have perfect foresight, so don’t apply it in hindsight.

Are you paying attention to risk?  Are you willing to look in uncomfortable places for risk?  Are you controlling for the risk you incentivize?  Are you comfortable with where you’ve drawn the line between the hazards you’re mitigating and the ones you aren’t?

Kremlinology

Never attribute to malice what can be explained by stupidity.  Sometimes, your own! 

There is a natural human tendency towards kremlinology – that is, the attempt to impute motives by observing only a few characteristics or outputs.  In one application, it is called Fundamental Attribution Error, when we assert that someone has ill motives just because we are harmed by their action.

It’s a form of conspiracy theory, as we try to connect the dots on someone’s actions to help us understand what’s happening out of our sight.  But we often miss an important safety check.  Once we understand someone’s motives – or think we do! – we should be able to reverse engineer the observed outcomes.

That is, if we observe outputs A, B, and C, we might assert that someone really disliked us, and they were out to “get us.”  But let’s flip the scenario:  Assume that they did dislike you, and wanted to “get you.”  Are actions A, B, and C the actions that they would take, consistent with their past behaviors?  Maybe if they were really trying to harm you, actions D, E, and F make more sense based on how they’ve acted – if so, maybe your hypothesis is incorrect.

Much like doing homework, the principle of “check your work” still applies – run the problem backward and see if your answer fits the problem as observed.

Football, CoViD-19, and distributed systems hazards

Looking at the latest trickle of Covid-19 cases in the NFL – specifically in the Patriots locker room – it strikes me that some of the challenges of public health safety are remarkably similar to the issues of distributed system safety in computer systems, and each can help highlight important lessons in the other.

Caveats:  I am not an epidemiologist, nor do I play one on TV.  There is still a lot we don’t have certainty on around Covid-19, from incubation periods and transmission mechanics, to testing reliability and safety protocols. I am also not an NFL insider, and most of my information about what the NFL is doing is inferred from the fantastic coverage of a number of NFL reporters, especially on the Patriots and medical beats.  But we can infer and assume some things for the sake of thinking about the safety of the NFL’s apparent distributed safety protocol.

Background:  There are two interesting classes of tests for Covid-19: qPCR and POC.  qPCR (quantitative polymerase chain reaction) is the more reliable test, but takes a number of hours to get a result; there appears to be a 10-12 hour lag from point of test to results being made available, based on available sensors (we see players get tested before practices, and we hear about results that evening or night).  POC (point of care) tests are rapid tests – with results in minutes – but are less reliable.  Reliability here is a combination of both sensitivity (how often an infected person tests positive) and specificity (how often an uninfected person tests negative); together with how common infection is, they determine how much a positive or negative result can be trusted.  qPCR tests are more expensive, consuming scarcer resources than POC tests do.
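
To make the reliability point concrete, here is a minimal Python sketch using the standard definitions (sensitivity: the share of infected people who test positive; specificity: the share of uninfected people who test negative). It applies Bayes' rule to estimate how often a positive or a negative result is correct. The function name and the numbers in the usage lines are illustrative assumptions, not measured characteristics of any real test or of the NFL's testing pool.

```python
from typing import Tuple


def predictive_values(sensitivity: float, specificity: float,
                      prevalence: float) -> Tuple[float, float]:
    """Return (ppv, npv): the chance a positive result is a true positive,
    and the chance a negative result is a true negative."""
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    false_pos = (1 - specificity) * (1 - prevalence)
    if true_pos + false_pos == 0 or true_neg + false_neg == 0:
        raise ValueError("inputs imply no positive or no negative results")
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Assumed, made-up numbers: a rapid test with 80% sensitivity and
# 98% specificity, used where 1% of the people tested are infected.
ppv, npv = predictive_values(sensitivity=0.80, specificity=0.98, prevalence=0.01)
print(f"positive result is correct: {ppv:.1%}")
print(f"negative result is correct: {npv:.1%}")
```

With rare infection, even a fairly specific test produces many false positives relative to true ones, which is one reason a protocol might pair a fast test with a slower, more reliable confirmation.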

The NFL protocols, in general, appear to be oriented around qPCR tests, given every day except gamedays; once an outbreak occurs in a locker room, POC tests are added for that team.

The Future of Work

Here we are.  Three to six months into CoviDistancing – call it lockdowns, social distancing, isolation, shutdowns – and, really, there’s no end in sight.

Let that sink in for a few minutes.  It’s possible that there’s an effective vaccine just around the corner – which generally means a year of human trials so we know it’s reasonably safe.  It’s possible that a cocktail of treatments which make CoViD-19 no worse than a common cold is also just around the corner.  I hope and pray that one, or really both, of those is true.

But hope is not a strategy.  And prayer is most powerful when it accompanies action.

Let’s also acknowledge that just about everyone has been wrong about CoViD-19 at one point or another.  Most people and organizations waited to distance their employees or isolate themselves until far too late.  Almost everyone – from the US Surgeon General down – has been wrong at one point or another about masks.  Treatments that we hoped were promising weren’t.  Institutions that we trust to guide us demonstrated their fallibility.  Ideas about how to “work remotely” ran headlong into the reality of juggling crisis-schooling.

But really, none of that matters today, except to demonstrate one powerful lesson – we’re learning.  And that’s the best option for the future.  To continue learning.  Because if there is no end in sight to living with CoViD-19, then it’s on us to learn how to live with it.  Because the world we end up with where we don’t learn to live with it is a nasty place – we will accelerate inequality while we degrade the quality of life for so many people.


Moving to Distributed Work

So, you're working from home …

For a while.

You've probably worked remotely before, and you're thinking, "I've got this!"

Odds are, you're mistaken. You don't have this. That's OK; this is an opportunity to learn new skills.

You can think of working from home much like someone moving into an entirely new environment. Your patterns of work might be optimized for working in an office, and they might not quite fit at home. You can think of this post as moving you from accommodating yourself to including yourself — reducing the friction that misspends your energy just to exist.

Now it's time to adapt. You need to adapt, your workday needs to adapt, and your environment needs to be adapted. So what can you do? Below is some advice — take it in the spirit of unsolicited advice on self-improvement. Some of these things will work for you; some of them won't. Many of these ideas work for me or people near me; they might or might not work for you. Give them a try, and be willing to learn and adapt.

Your Workspace
Maybe you've been getting by with sitting on the couch or on the floor in the corner of your bedroom. Those might be all the choices you have, but you should consider some changes:
  • Use an external monitor. One of the biggest productivity gains comes from useful screen real estate, so finding a way to get more is incredibly helpful to you. Paired with an external keyboard and mouse, you're also on your way to better ergonomics.
  • Use a desk and a chair. Sitting on a couch for a long period is probably not healthy in a lot of ways. Can you fit in a sit/stand desk? Maybe you do need a different ergonomic choice, but make it deliberately.
  • If you can dedicate a workspace, that's ideal. If you can't, consider a space that you can set up at the start of the workday, then tear it back down in the evening — so you have clearly delineated boundaries of when you're "in the office" instead of just chilling.
  • Even if you can't dedicate a workspace, make a conscious effort to not take a meal (be it lunch, dinner, etc.) from where you are working. If you have a dedicated workspace, leave it and go to your kitchen, another room, or, if possible, outside for your meal. This should be time to mentally recharge as much as physically recharge. If you don't have a dedicated space, still take the time to close your laptop and do something that is not work. Your brain (and your similarly stressed co-workers) will thank you.
  • Do you have a headset with a microphone to take meetings with? Gaming headsets can be an affordable and high-quality solution, or possibly Bluetooth earbuds. Anything is an improvement over just using your laptop's speakers. But also think about how your ears might feel after multiple hours using a device you're not familiar with. Maybe change between earbuds and a headset … or even just take a long break from videoconferencing.
  • Wired Ethernet makes an enormous difference for videoconferencing — and for many of our other tools. Even if the cable has to get unplugged when you roll up your desk at the end of the day, this can be worth the trouble.

Your Family
There's a good chance you're sharing your space with other people — a partner, some children, maybe roommates. Their needs will matter, too, and it's better for you to plan ahead with your schedules so that no one is disappointed.
  • Do you have to homeschool small children? What does your plan look like for that, and how are you trading it off with your partner?
  • Do you need to add daily household meetings to identify any issues?

Your Commute
You might be really excited about not having to waste time getting to the office because you can just hit work running. But take a moment to think about what you also do during your commute. Are you thinking about your schedule for the day? Working on a hard problem? Thinking about your kids? That's valuable mental time, which you should consider how to keep in your day so that you can gracefully transition between parts of your life.
  • Can you go for a walk around the block (or further)?
  • Can you set aside quiet time at the start and end of your day, before you dive into email?
  • Make sure you take time for lunch. This might make a good time to check in with your colleagues in your co-working space or take quiet time for yourself. You might want to think about planning for those lunches to make sure you're making healthy choices rather than just grabbing whatever is available.

  • Make a hard break. "Bye, kids, I'm headed to work!" can be a really powerful boundary to set.

Your Meetings
Meeting culture is very location-centric, especially when that location is your headquarters. Some of that is a product of enterprise tools (many video solutions make it hard to see more than a few participants at once, and the slight added latency over the Internet interacts with the human desire to jump in as the next speaker), some is a product of our organizations (meetings where 80% of the attendees are physically in one place), and some is a product of habit (sitting in a circle, which then excludes the video participants). This is an opportunity to work on more-inclusive meeting structures.
  • Consider nonverbal cues for meeting participants to use to call for attention. If everyone is visible, that can be a raised hand; if that's not the case, then a chat backchannel can help.
  • Work more on pauses between speakers. There is rarely a need to jump in instantly, and that's often seen as a behavior that is exclusionary anyway, so this is a good opportunity to evaluate it. Past three people, a moderator helps enormously, perhaps defaulting to whoever called the meeting or wrote the agenda.
  • Consider working off a shared document with an agenda and notes so that some information flows can be faster-than-verbal. This might rely on everyone having more screen real estate.
  • Think about the lighting. Your face should be clearly visible on camera, which generally means lights and windows should be in front of you, not behind you. It's always possible to learn from one call and revise or improve for the next one.
  • Thirty-minute blocks are not fundamental to the universe. You can meet for 5 minutes or 15 — and jumping from chat to a video call for 5 minutes can unlock great work for you or your colleagues.
  • As a last resort, disabling video can reduce audio distortion, jitter, and latency in meetings.

Your Physical Wellness
When working from home, it can be really easy to fall into a rut with no physical activity. Perhaps you roll out of bed, grab a quick bite, and hop on a call. For a day, that's only a little bad, but that's a bad long-term pattern. Schedule your exercise time.
  • Maybe take that long walk at the start of your day or after lunch.
  • If you're fortunate enough to have a treadmill or stationary cycle in your house, maybe you take a walking meeting with a colleague.
  • Look at how you can keep your body from stiffening from a lack of movement or poor ergonomics. Take stretch breaks. Take a 20-second break every 20 minutes and look out at something at least 20 feet away to prevent eyestrain. Consider how to incorporate physical wellness into your everyday routine.

Your Emotional Wellness
Odds are you get some value out of occasionally talking to other human beings. Find ways to take care of your need for connection, even while you're practicing social distancing.
  • Schedule open office hours. Open up a video chat and let colleagues join in. Maybe it's just a "hallway chat" that people can drop in for. Perhaps you have a tea-time theme and let people use tea as a conversation starter.
  • Connect with people that you usually sit near but don't have meetings with. Check in with them.
  • Think about what errands you run and how you can incorporate a little more social interaction into them.
  • We're all saving on commute time. A work-social event, such as a knitting group or a distributed board game (for example, Words With Friends), may even be helpful and appropriate during your day.

Your Team
Recognize that your colleagues are working through the same challenges that you are, and you can help them by both experimenting and by setting examples.
  • Consider checking in and out at the start and end of your day. Especially if you're a manager, you'll be tempted to squeeze in some extra time out of the afternoon commute; but even if you tell your staff that they're done, they won't really believe it if you don't show them that you're done. They can't see you walk out of the office, so you have to tell them you are.
  • Recognize that your colleagues may have to make different choices than you do. Maybe they're taking a few hours in the middle of the day to interact with their family. Maybe they're making food for more people. Perhaps they create a hard stop at 5 p.m. Honor their choices — and do so visibly — so they know you're supporting them.
  • Remember that you're now a guest in your colleagues' homes — things you say might be overheard by their spouse or children, so be a more-gentle human.
I'd like to thank the helpful Akamai humans who contributed to the content here.

This post originally appeared on Dark Reading.

Unsolicited Advice on Self Improvement

Some thoughts on advice about guiding others’ self-improvement.  Often, advice comes in the form of “If you do X, Y will happen.” It’s worth unpacking that. What “If you do X, Y will happen” often really means is, “For some group, which I think is large and I believe you are in, doing X will increase their favorable outcomes in the direction of Y by some small amount.” 
 
The first half of that framing can be really othering if you’re wrong. “I think this works for all humans!” might get heard as “It doesn’t work for me, do you think I’m not human?”  I think of the latter half of the framing as the 1% effect:  For some set of people, this advice might improve their expectations of good outcomes, but maybe by no more than 1%.  How big is “some people”?  It might be just the advocate (“I did this and it worked for me”), or it might apply to a large in-group.  But it almost certainly doesn’t apply universally, to every possible listener.  We should understand that the original framing (“Do X to get Y”) is incredibly othering to members of the out-group.  That advice might not work at all for them.  Or a 1% improvement in their outcomes might be outweighed by the odds stacked against them.  So that’s the first tip in guiding: acknowledge that you might be wrong, and your advice might not apply.  And that for some people, these topics are so sensitive and carry so much historical stress that you might hurt them more than you planned to help them.
 
But what about the 1% effect?  Assume that any piece of guidance might help someone by 1%.  Is that a lot?  It depends!  The difference between the best NFL player and the worst is just a small collection of 1% effects.  But me?  1% isn’t getting me onto the Patriots roster.  Understand where there’s a cluster effect - maybe this advice is useful, but the biggest effect you see is for people who’ve invested in gaining a lot of advantage in a related area, and others might not see the same benefit. 
 
Consider this one: when people ask how I am, I say, “I’m fantastic!”  It helps me keep my frame focused on positives, which increases my resilience to stress.  I used to say, “Not bad,” and I noticed that I was looking for the bad.  Now for me, that “one simple trick” sits on top of a lot of mindfulness, and care, and good fortune.  Thanks to a colleague, I always remember I could have been born a nematode!  I’m really fortunate to be a human in the 21st century.  I suspect that for many people, a daily remembrance of good fortune will improve their condition - but I also know that for people with a wide range of circumstances, from chemical depression to trauma to many more, that advice rings hollow.  So when you’re evangelizing something that works for you, and maybe others, recognize that you are almost certainly talking at someone with a different experience.  And they might not appreciate unhelpful and/or unsolicited advice without caveats.  (If you’re in the mistargeted group for whom this advice might even be harmful, recognize that this blind spot around inapplicable advice might be just a blind spot, and not explicit malice (we hope).)
 
That leads to a point about inclusion.  I think of inclusion as “reducing the energy cost of a person just to exist in a space.”  Recognize that your assumptions about what works for other people increase their existence cost when you’re wrong. 
 
So I conclude with hopefully near-universally applicable guidance, from the hallowed halls of San Dimas:  
Be excellent to each other.

And, as always, thank you to a remarkable cast of humans who help me think about these ideas, and find ways to make the world better.  It’s truly a blessing to know you and have access to you.

Composing Defences

Often, in the information security community, we bandy about terms like “defence in depth” or “layered defences.”  Most of the time, it’s just a platitude for “buy more stuff.” It’s worth exploring the way these terms evolved, and how we should think about defensive architectures in the world defined not by physical space, but by network connectivity.

In the flat space of military defences in the pre-WWII era, defence in depth would refer to one of two concepts.  In the first mode, it was a set of defences which interlocked in some form -- consider a castle wall, a moat, and a set of guards atop the wall.  Each of these defences, individually, was trivially defeatable, but together, they multiplied. While an adversary was busy crossing the moat, they were easy to shoot at.  The moat made it hard to scale the wall. The wall gave defensive cover to the guards. In the second mode, it was about depth in distance - consider the depth of the Soviet terrain as they fell back in World War II, and the lengthening of the attacker’s supply lines as weather set in.  “Never get involved in a land war in Asia” is good advice for a reason.

Integrating defences relies on some basic features of the physical world.  Adversaries occupy space across a period of time. Defenders can trivially observe adversaries - the Mark One eyeball is generally ubiquitous across history.  But when defences integrate, it may be easier to think of them as stacking – defence in height.

When defences fail to integrate, allowing an attacker to sequentially defeat them – consider a set of hurdles in a line – then depth may be the correct way to consider the dimension.  Consider a pair of identical, locked doors, with a small, unmonitored space between them.  While an attacker may take more time to defeat the doors (either using lockpicks, slides, or a purloined key), neither defence is actually made harder by the presence of the other.

Sometimes, defences don’t even stack.  Defence in breadth represents a set of defences that present a choice to an adversary, who can opt not to engage a defence at all by going around it.  The postern gate gives a spy an alternative path to the front gate; the Maginot Line could be gone around; any of a dozen servers in a network DMZ can be breached to provide access to an intranet.
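
One way to compare the three arrangements (height, depth, breadth) is to ask how much total effort an attacker needs in each. The Python sketch below is a toy model under loud assumptions: the per-defence effort figures are made up, the function names are hypothetical, and the `interaction` factor is simply one possible stand-in for defences that "multiply" when they integrate. Each function expects at least one defence.

```python
from typing import Sequence


def effort_in_breadth(costs: Sequence[float]) -> float:
    """Defences sit side by side: the attacker only has to beat the cheapest one."""
    return min(costs)


def effort_in_depth(costs: Sequence[float]) -> float:
    """Defences are beaten one after another and do not interact: efforts add up."""
    return sum(costs)


def effort_in_height(costs: Sequence[float], interaction: float = 2.0) -> float:
    """Defences reinforce each other: every other defence present scales the
    effort needed for each one by `interaction` (an assumed factor)."""
    others = len(costs) - 1
    return sum(cost * interaction ** others for cost in costs)


# Hypothetical effort, in hours, to beat a moat, a wall, and the guards alone.
costs = [2.0, 3.0, 4.0]
print(effort_in_breadth(costs))  # 2.0
print(effort_in_depth(costs))    # 9.0
print(effort_in_height(costs))   # 36.0
```

Setting `interaction` to 1.0 makes height collapse into depth, which matches the two locked doors: more defences, but none of them harder because of the others.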

The lesson for defenders is to understand both the system you’re defending, and how its defences work – or don’t – together.  Increased complexity may be an indicator of defences in breadth, often with “layered” defences where the defeat of one could go undetected.  Our goal should be to create defence in height, where we know how our defences work together towards defeating adversaries.

How do we approach improving our defences?  
One way is to flip our mental model, and consider ourselves as attackers, and the adversary as a defender.  In the same way an adversary might conduct surveillance on our defences, we need to surveil the adversary as they defeat our defences.  We should consider our boundary systems as the adversary’s, and ask, “How can we see the adversary conducting an operation?” While an adversary’s dwell time inside our perimeters might not need to be long to accomplish their goals, how can we observe artifacts of their presence?

Another approach is to understand that our perimeters are almost always wider than we understand.  When we try to govern our systems, we often start from the best maintained systems and work outward;  adversaries will start from our worst-maintained systems and work inward. We need to aim to operationalize the same visibility and maintenance practices across our entire perimeter stack, so that we understand our risks, and not bury them deep as a footnote in our assessments.

A third approach is to reduce our perimeter entirely.  Simplifying our defensive models makes them easier for us to understand, and reduces the possibilities for adversaries to penetrate through unknown ways.  This may involve partitioning our system clusters, so that lateral movement is restricted, and each network architecture becomes understandable.

All of these approaches have value in improving our defences, and restoring height to our walls in meaningful and helpful ways.

A Perimeter of One

Even before there were enterprises we thought of as carbon vs silicon, enterprises were graphite-and-paper.  In the graphite-and-paper enterprise, an organization had perceived control over all of its information assets – after all, they were written down, in hard copy, and often didn’t leave the building.  While humans came into the building, the information perimeter existed at the tip of the pencil.

As computers came into the enterprise, often the first use case was to displace existing systems, and replace the graphite-and-paper enterprise with a silicon enterprise – instead of doing accounting in double-entry on fold-out ledgers, accounting took place in a general ledger application.

Yet the security world still thought of computers as just quicker versions of the graphite-and-paper world.  Our perimeter still existed at the fingertips of the humans, only now those fingertips were typing on a keyboard instead of scribbling in a notebook.  But our security was still based on the models of a physical perimeter. Mostly. But with a very dangerous flaw.

A physical perimeter — at least the non-human parts of it — isn’t really designed to keep adversaries out.  It’s designed to slow adversaries down. To change the cost equation for adversaries, to make them risk their own safety until human guards notice their attempts to enter.

And when the silicon enterprise connected to other networks, we kept this very flawed model.  Because we’d always trusted the silicon — after all, it had evolved from graphite-and-paper, which only lied when humans told it to — we weren’t prepared for how untrustworthy our computers would become. And the rate of silicon communications far exceeded our expectations for monitoring, and adversaries had little personal risk.  So we relied on “securing our perimeter,” in a last-ditch attempt to keep adversaries out.

But the basis of our security controls was all about establishing perfect trust in our devices and networks.  We’d require the best endpoint security, no matter where our devices were, because our security models would rely on that trust to build a credible environment.  Even when our devices would travel the world in the hand of a user — the one thing we wouldn’t trust — and be used for official and personal use, we would still believe that we could trust those devices, and make them part of our enterprise.

But those devices aren’t part of our enterprise.

They’re part of the user’s perimeter, instead.

Around the turn of the millennium, enterprising CFOs realized that with the increased consumerization of the mobile phone market, there was no reason for enterprises to own and manage cellphones.  Instead, at best, a cellphone allowance could be issued to employees, and those humans could be responsible for their devices.

It was a smart move financially, but one with long-lasting repercussions for the security model of enterprises.  While most phones — and even early smartphones — acted as clients to some larger network, with the advent of the iPhone, the model shifted.  Smartphones are now an extension of the human who carries them, not of the network that they connect to.

And since the distance between a smartphone and a laptop isn’t that large, we should consider the laptop as also part of the human who carries it.  And, as a result, the enterprise really shouldn’t place any implicit trust in it.

In the same fashion that a consumer-oriented enterprise doesn’t overly trust the security of the devices its users operate, the modern enterprise needs to treat its employees’ devices the same way.

Does this mean that we just abandon employees to the dangers of the Internet?  Of course not. The modern IT department has become a managed service provider, providing its clients — the human employees — with support and security services to protect that human’s cybernetic perimeter against adversaries.  But that service doesn’t mean that our enterprise applications should implicitly trust those devices.

Instead, our enterprise applications should give no more trust to the devices than necessary, and only as a proxy for the specific human who carries them.  This is hard work, because we’re so used to the belief of being able to trust everything on our network. But our network is the Internet now, and our mental perimeter needs to shrink to only encompass our applications.  Everything else outside those applications should have no implicit trust.
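
As a rough illustration of what "no implicit trust" could look like inside an application, here is a small Python sketch. Everything in it is hypothetical: the `Request` fields, the `authorize` function, and the grants table are invented for the example and do not describe any real product. The decision rests on the verified human and what that human is granted; the device signal can only reduce access, and network location is ignored on purpose.

```python
from dataclasses import dataclass
from typing import Dict, Set


@dataclass(frozen=True)
class Request:
    user: str                   # the human the device is acting as a proxy for
    user_verified: bool         # e.g. the human passed strong authentication
    device_healthy: bool        # posture signal reported for the device
    on_corporate_network: bool  # recorded, but never used to grant access
    resource: str               # the application or record being requested


def authorize(request: Request, grants: Dict[str, Set[str]]) -> bool:
    """Decide per request, per human, per resource."""
    if not request.user_verified:
        return False  # no verified human, no access
    if request.resource not in grants.get(request.user, set()):
        return False  # this human has no grant for this resource
    if not request.device_healthy:
        return False  # a bad device signal can remove access, never add it
    # request.on_corporate_network is deliberately not consulted.
    return True


# Hypothetical grants: which applications each human may use.
grants = {"alice": {"ledger"}}
print(authorize(Request("alice", True, True, False, "ledger"), grants))   # True
print(authorize(Request("alice", True, True, True, "payroll"), grants))   # False
```

Being "inside" the network changes nothing in this sketch: the second request comes from the corporate network and is still refused, because the human has no grant for that application.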

And the user’s devices?  They’re inside the user’s perimeter, and we should help them establish a safe perimeter of one.