Risk at the Margin

Humans are, generally, pretty awesome at risk management.  Why, then, do we seem to be so bad at it – and in so many different ways – when it comes to assessing risk in the CoViD era?

Risk Models

First, let’s talk about how humans make most risk decisions.  Risk comes in a lot of different flavors (injury, long-term health, short-term health, embarrassment, financial, ….), and everyone weights those flavors differently.  For simplicity, I’m going to talk about risk as if it lives on a single, linear scale, like so:

[Figure: risk on a single, linear scale]
A human has an aggregate risk tolerance, somewhere on that scale:

[Figure: an aggregate risk tolerance marked on the scale]
Really, you’re almost certainly all the way over on the left.  Humans are really risk averse, because we think we’re sort of immortal, and we don’t want to jeopardize that.
[Figure: risk tolerance sitting far to the left of the scale]

When you assess an activity, you’re quickly going to put it either to your left (Safe! Do this!) or your right (Unsafe! Don't do that!).  While the “safe” activity might actually increase your risk, it seems like an activity you already accept, so you probably don’t consider it to, on the whole, make you less safe. Only for a tiny number of decisions do you need to actually think about the risk.

[Figure: activities bucketed to the left (safe) or right (unsafe) of your tolerance]
That area in between “safe” and “unsafe” is really small. Most of the time, you never have to evaluate a set of choices that sit on the margin between safe and unsafe, or be forced to pick between two unsafe activities.  The distance between safe and unsafe is tiny in absolute terms, although from our personal perspective it seems massively large, since just about all the risk decisions we ever think about happen inside the margin.

[Figure: choices separated by only a tiny distance across the margin]
This presents a hazard for us: we believe that the decision between "safe" and "unsafe" is really obvious, because the choices are so far apart, when, really, many of these choices are separated by a tiny amount, and even small errors in our decision-making process may put something on the "wrong" side.

Making Decisions
Human decision-making can be modeled using Boyd's OODA Loop: we observe an input, we orient that input to our model of the world, we decide what we should do, and then we act on our plan.
[Figure: the OODA loop: Observe, Orient, Decide, Act]
We do this so often that our brains have optimized to perform our decision-making without thinking about it. We're like a machine-learning algorithm on steroids: our minds rapidly pattern-match to get to a quick, cognitively cheap, good-enough solution, so we move from "Observe" to "Act" before we can even get introspective about our decision.
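To make that shortcut concrete, here is a minimal sketch (my illustration, not Boyd's; the decision rule and cache are invented for the example) of how a cached pattern match jumps straight from Observe to Act, skipping conscious orientation and deliberation entirely:

```python
def decide(orientation: str) -> str:
    # Toy stand-in for conscious deliberation.
    return "proceed" if orientation == "familiar" else "stop and think"

def ooda(observation: str, world_model: dict, cache: dict) -> str:
    """One pass through Observe -> Orient -> Decide -> Act, with the
    cognitive shortcut: cached matches skip straight to Act."""
    if observation in cache:
        return cache[observation]                        # Observe -> Act
    orientation = world_model.get(observation, "novel")  # Orient
    action = decide(orientation)                         # Decide
    cache[observation] = action  # optimized away for next time
    return action                                        # Act

cache: dict = {}
ooda("dog off leash", {"dog off leash": "familiar"}, cache)  # deliberates
ooda("dog off leash", {}, cache)  # cache hit: acts without "thinking"
```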
Risk at the Margin

Orientation often starts with pulling up models you think you know, and using those as rough approximations.  So CoViD might be “like the flu, but worse,” even though CoViD risk bears less resemblance to flu risk than cricket does to baseball.  Sure, you can plan going to a cricket game by starting with the baseball rules and modifying them until you have cricket, but your sense of a three-hour game will be woefully inaccurate.


One failure mode is that once you bucket a novel risk on one side of your margin or the other, you will not consider it further.  “CoViD is just like the flu, so I’ll behave normally” and “CoViD is the end of the world, let’s enter a post-apocalyptic dystopia” describe ways this failure mode can kick in. Once you've bucketed the risk, it becomes really hard to move it from "unsafe" to "safe." Let's put some arbitrary numbers onto our risk scale. Perhaps the most risk you'll accept to keep something in the "safe" bucket is 0.09 risk units, and the lowest risk that puts something in the unsafe bucket is 0.11 risk units.

[Figure: the risk scale with the “safe” bucket ending at 0.09 risk units and the “unsafe” bucket starting at 0.11]

So it seems like subtracting 0.02 risk units should let us change our decision. Unfortunately, we're really reluctant to change our minds about risk, and that's partly because, once we think about risk, we feel like we take a lot of it: our margins seem much larger to us, perhaps ranging from 0.05 risk units all the way up to 0.95.

[Figure: the perceived margin, stretching from 0.05 risk units all the way to 0.95]
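A minimal sketch of that stickiness, using the arbitrary numbers above (the threshold constants come from the text; the function names and re-bucketing rule are my own illustration): initial bucketing uses the real margin, but re-bucketing only happens when a risk crosses the much wider perceived margin.

```python
# Sticky risk bucketing, using the arbitrary numbers from the text.
SAFE_MAX = 0.09      # most risk we'll accept and still call something "safe"
UNSAFE_MIN = 0.11    # least risk that lands something in the "unsafe" bucket
PERCEIVED_LO = 0.05  # where the margin *feels* like it starts
PERCEIVED_HI = 0.95  # where the margin *feels* like it ends

def first_bucket(risk: float) -> str:
    """Initial, quick classification of a novel risk."""
    if risk <= SAFE_MAX:
        return "safe"
    if risk >= UNSAFE_MIN:
        return "unsafe"
    return "margin"  # the rare case that forces conscious thought

def rebucket(current: str, risk: float) -> str:
    """Re-evaluation is sticky: we only change our minds when the risk
    crosses the much wider *perceived* margin, not the real one."""
    if current == "unsafe" and risk < PERCEIVED_LO:
        return "safe"
    if current == "safe" and risk > PERCEIVED_HI:
        return "unsafe"
    return current

bucket = first_bucket(0.11)      # -> "unsafe"
bucket = rebucket(bucket, 0.09)  # still "unsafe": a 0.02 drop never moves us
```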

A different failure mode starts when you correctly identify that the aggregate risk is in the margin and requires complex thought (“I can stay home to avoid CoViD, but then I won’t make any money and I might go stir-crazy, so how do I safely engage?”).   You might miss steps that would be helpful: KN95 masks, fresh air, letting air diffuse before sharing a space, not singing in an enclosed area.  You’ll need to mitigate:  as your perception of risk goes up, you’ll take safety measures to push it down.  (Note that as your perception of risk goes down, you’ll remove safety measures to let it come back up.)
[Figure: mitigations pushing perceived risk back down toward your tolerance]
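That feedback loop, sketched minimally (the mitigation list echoes the text, but every numeric risk reduction here is an invented illustration): measures get added only until the risk feels acceptable, and perception, not reality, drives the loop.

```python
# Mitigation feedback loop; all numbers are illustrative assumptions.
TOLERANCE = 0.10  # where this person draws the safe/unsafe line

MITIGATIONS = [   # (measure, assumed perceived-risk reduction)
    ("KN95 mask", 0.04),
    ("fresh air", 0.03),
    ("let air diffuse before sharing a space", 0.02),
    ("don't sing in an enclosed area", 0.02),
]

def homeostasis(perceived_risk: float) -> list[str]:
    """Add measures while the risk feels too high; stop as soon as it
    feels safe again. Perception, not reality, drives the loop."""
    applied = []
    for name, reduction in MITIGATIONS:
        if perceived_risk <= TOLERANCE:
            break  # feels safe now, so no further measures get taken
        perceived_risk -= reduction
        applied.append(name)
    return applied

print(homeostasis(0.18))  # stops after three measures, at a *felt* 0.09
```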

Using the flawed model of CoViD-as-flu-but-worse, you might convince yourself that certain mitigating steps will help more than they do: rigorously washing surfaces, eliminating shared objects, wearing lightweight cloth masks everywhere. You think you've reduced the risk of your choices down into the safe zone, even if you haven't. (Or your risk was in your safe zone, but you were wrong about the risk, and the safety-theater measures you engage in aren't changing it.) On the other hand, you might use the inverse (and also flawed) model of CoViD-as-flu-but-better, and convince yourself that it's okay to take even more risks, because the people telling you it's dangerous are clearly wrong, so what else are they wrong about?

The Failure of Expertise


It's natural to want to ask for help.  You ask an expert, “is this safe?”  That’s a loaded question.  Almost nothing is “safe”.  While you really want to ask, “Given that I’m comfortable with this level of risk, what actions should I be taking?,” a risk expert hears “Tell me everything is okay,” and they aren’t going to do that.

Only you can make risk choices on your behalf, because only you can decide if the reward is worth the risk for you.  An expert, who isn’t you, is generally going to err on the side of extreme caution, because they have an entirely different risk problem:  If they say something is “safe,” and there is a bad outcome, it’s their fault and they get blamed.  And since they’re often dealing across a population, even most “sort of safe” risks still pose a risk to someone in the population, so it’s easiest to have a very rigid canonical answer: Don’t do anything.

Experts in a risk area functionally end up only being able to veto things, because it’s too dangerous for them to do anything else. There is no incentive for an expert to ever suggest removing a safety control, even if it is high-cost and useless, because for them, the downside is too massive.

Confirmation Biases
If you make a “wrong” decision, it’s really hard to correct it without radically new data.  If you put a group of activities on the wrong side of your risk tolerance, revisiting it generally requires you to be able to challenge every risk choice you’ve ever made.  That’s … not easy.  Even somewhat new data is easy to discard if it challenges your decisions, or easy to rigorously hold onto if it supports them (even if it later turns out to be incorrect).

Inspecting your models is one of the most helpful things you can do, but it’s hard; especially if you’ve been arguing loudly against people who made different choices than you.  You risk the embarrassment of agreeing with someone that you’ve said was foolish, so it’s simpler to dig in.

Risk at the Margin

A risk that lives in your margin, you might have adjusted to push just barely to one side or the other (“I’ve mitigated the risk enough that I choose to do this” vs. “I can’t mitigate this risk, so I won’t do this activity”).  However, you’re now likely to stop inspecting that risk; it’s in either the Safe or the Unsafe bucket.  Most people don’t waste cognitive capacity keeping track of a marginal risk once they’ve bucketed it.

Boiling the Frog
If it’s hard to deal with a wrong risk choice, consider how much harder it is to deal with a mostly right risk choice, when the world changes and now that choice becomes wrong.  As incremental evidence comes in, you’re going to keep your choice on whichever side of your risk tolerance you placed it, because that’s easier.  But if you’d only just barely moved it to one side, ignoring evidence that is pushing it to the other side is dangerous … but really easy.

Make your Predictions
One way to treat this risk confusion is to commit to predictions in advance.  “When all the adults in my house are fully vaccinated, then they can go eat lunch at a restaurant.”  That’s a safe commitment to make in advance, but a harder decision to make in real time.  By depersonalizing the decision a little bit – you’re making it for your future self, so you’re a little more invested than an expert would be – you can engage conscious risk decision-making to your benefit.

Leading to Representation

It’s a trope among managers and executives that making significant inroads on building a more representatively diverse workforce is almost impossible.  Moving the needle by even a fraction of a percentage point in a normal year is considered a massive success worth celebrating.

That’s a cop-out.  It’s not easy, but it isn’t impossible.  And here’s my roadmap for doing so.

First, the data, so you can see the success.  I started doing detailed tracking way too late in my career, in the middle of 2017, when I realized that the information I wanted wasn’t accessible via our normal manager toolkits, and it was too much labor to pull through my HR business partner.  I kept a spreadsheet (all good databases start as Excel!), and I recorded, for all of my staff, a few fields: Name, Pay Grade, Country, Startdate, Gender, and Race.  For those last two fields, I used a very small number of buckets to more closely align with Akamai HR norms.  The Gender summary includes Male, Female, and Non-Binary; trans staff were, for this summary, grouped with the gender they had publicly declared at that time. Note that all of the people who worked for me are individual humans that I know and care about, and I do them a disservice with any bucketing strategy; but this summary is aligned with the metrics that Akamai tracked on an annual basis.

Every six months, I’d make a new version of the spreadsheet, with an updated snapshot of the organization.  I’d then summarize the data, so that I could compare trends across time.  I looked at non-white staff in the US population (“Minority”), Black/Hispanic staff in the US, female staff in the global population (“Women”), as well as non-binary staff globally.  I looked at crosscuts by seniority; staff in pay grades at or above manager level (“Senior”) versus those below manager levels (“Junior”).  Additionally, I tracked longevity, to look at those with less than one year of company tenure (“New”), one to five years (“Mid”), and those with more than five years (“Long”).  I used company tenure rather than team tenure intentionally, because I want to look at career progression in the company.  Given the small number of non-binary staff, I don't drill into them in the detailed views, which only explore women, minority, and Black/Hispanic populations. 
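As a sketch of what that semiannual summary might look like in code (the field names come from the spreadsheet described above; the manager-grade threshold and the race labels are my assumptions, made only to keep the example runnable):

```python
# A hedged sketch of the semiannual representation summary.
import pandas as pd

MANAGER_GRADE = 10  # assumed pay grade at/above which staff count as "Senior"

def summarize(snapshot: pd.DataFrame, as_of: pd.Timestamp) -> dict:
    df = snapshot.copy()
    tenure = (as_of - pd.to_datetime(df["Startdate"])).dt.days / 365.25
    df["Tenure"] = pd.cut(tenure, bins=[0, 1, 5, 100],
                          labels=["New", "Mid", "Long"])  # <1y, 1-5y, >5y
    senior = df["Pay Grade"] >= MANAGER_GRADE
    us = df[df["Country"] == "US"]
    return {
        "Women":          (df["Gender"] == "Female").mean(),
        "Non-Binary":     (df["Gender"] == "Non-Binary").mean(),
        "Minority":       (us["Race"] != "White").mean(),  # US non-white
        "Black/Hispanic": us["Race"].isin(["Black", "Hispanic"]).mean(),
        "Senior women":   (df.loc[senior, "Gender"] == "Female").mean(),
        "Women by tenure": df.groupby("Tenure", observed=True)["Gender"]
                             .apply(lambda g: (g == "Female").mean()).to_dict(),
    }
```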

[Chart: overall representation trends (Minority, Black/Hispanic, Women, Non-Binary), summer 2017 to present]
At first glance, you might wonder why the numbers went down in Minority and Women groups from the summer of 2017 to the following winter.  That’s partially an artifact of temporary workers, and I learned to only really look at the data year to year to separate out our summer employees.  In the last year, representation of women has leveled out, as well.  I attribute that to a combination of factors, but it starts with retention: those twenty-one net new staff from the start of 2020 until now?  That’s almost all hiring, because only one person left the team in that time period, and it was in the first week of 2020.  Since an Akamai reduction in force early in 2018, my team has had only ten departures.  If we’d had the tech industry average turnover, we’d’ve expected to lose fifty people over that three-year window.  We would have had to hire twice as many people over those three years to have maintained the same trajectory.  There’s also just a bucketing artifact; the last five people to start in 2019 were women, and the first four in 2020 were men.

But that reduction in force, timed with a few other personnel issues on a small population, also significantly impacted the Black/Hispanic population.  Because those issues are about specific individual humans, I won’t dive into them here, but there is a strong lesson in representation: when you have small numbers of a represented population, even a few changes at the same time are not only significant on a chart, but they are significant in effect, as your team becomes visibly less representatively diverse.  You shouldn’t change your standards (unless they’re bad standards) to prevent this, but it’s another reason to drive for increased representation:  so you aren’t tempted to ever just work to a metric.  I don’t believe our team ever did, but I regret putting them in a position where they might have felt the pressure to just meet a metric.

Retention

Retention is a huge part of my strategy for changing representation.  I’d actually argue that it is more important than hiring, because if you have a retention problem, then it’s going to affect your hiring as well.  So how do you retain great staff?

Notice that I said “staff,” and not “women” or “minorities.”  While your strategy to build an inclusive environment needs to be informed by the diverse needs of your team, if you try to build an environment that is only intended to be inclusive to one aspect of your team, you’re not going to succeed.  Not for the obvious reason, either – sure, you’ll alienate your male staff – but for the less obvious.  Your staff will notice your insincerity.  You’re going to focus on being inclusive to stereotypes, rather than to the actual humans who work for you. There are a lot of aspects to inclusion, but I’m going to focus on three here: professional, work-life integration, and unique needs.

Professional inclusion is one of the most important things you can address.  Every single member of your organization must have a professional development plan, not just the ones you see as “high potential.”  You should identify their next two jobs (and there might be options), and make sure they and their manager are talking about the development they need to show, and what opportunities might be available for them.  Your managers should remember those needs for when an opportunity does come up, so they are considering all of their staff, and not just the ones on their favorites list.  For some of your staff, their next best opportunity may not be in your team.  Help them to build the skills to leave, if that’s what is right for them – they might choose to stay instead, and you directly get the benefit.  But indirectly, when everyone sees that more of your staff are being taken care of professionally, you benefit with increased engagement and retention.

Work-life integration is one of the simplest, but least well utilized retention strategies.  If you don’t make it a focus, your managers will, unfortunately, betray you.  But it’s not their fault: it’s yours.  You believe that having an unlimited time off program is sufficient.  But then you tell the team how many new priorities that they have to juggle, and you never let them deprioritize work.  Your managers hear that as requiring more hours out of your employees.  If you don’t make it very clear that you value the wellness of your staff, continuously and frequently, your managers will subvert your message.  You need to make clear that you care more about the productivity of your employees over the next four years than over the next two weeks.  You’ll actually end up with more productivity.  Consider those 40 employees I didn’t lose.  If we assume that it takes a year to hire and train someone to comparable productivity (I think that’s laughably short for most security jobs), my team has had almost twenty percent more productivity than a team with average turnover.  That twenty percent buys an organization a lot of time flexibility, even ignoring the much higher productivity from staff with greater experience in the organization.
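Here’s the back-of-envelope version of that claim (the departure counts come from the text; the team size and the one-year ramp are my assumptions, so treat the output as illustrative):

```python
# Back-of-envelope version of the "almost twenty percent" claim.
team_size = 70             # assumed average headcount over the window
years = 3
expected_departures = 50   # at tech-industry-average turnover (from the text)
actual_departures = 10     # actual departures (from the text)
ramp_years = 1.0           # assumed time to hire and train to comparable output

saved = expected_departures - actual_departures  # the "40 employees I didn't lose"
extra = saved * ramp_years                       # person-years of output not lost
baseline = team_size * years                     # person-years a team delivers
print(f"extra productivity: {extra / baseline:.0%}")  # ~19% with these inputs
```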

Years ago my team experimented with flexible work programs after parental leaves, allowing people to gradually phase in their work hours, rather than returning abruptly to a 5-day work week.  Some staff didn’t need it, but others greatly valued it.  We were in the process of officially codifying it when Akamai did us one better and added significantly more time to the parental leave program, but we still let staff phase themselves back into work.

Unique needs of your staff are a deep opportunity to excel.  When only one person needs something, and it’s outside the usual set of requests, it really does matter.  It’s easy to think you can just check a few boxes with having diverse interview panels and a better parental leave program, but your team will really pay attention when you notice, and react to, the unique needs of individuals.  Maybe you have someone for whom the office lighting is a problem.  Use your influence to push for a better solution for them.  Perhaps you have a person who celebrates holidays that aren’t observed by most of the organization. Do you move your own meetings to accommodate one person? Do you make sure future meetings take those holidays into account?  It’s the personal, small things that actually have the biggest impact on retention, because they create a clear signal to everyone in your management team that your employees matter to you.

Longevity

Your goal shouldn’t just be about total representation in your organization, but needs to also look at representation among both your junior and senior staff.  It’s tempting to try to tackle senior staff representation all at once, and just go hire from outside.  That presents a challenge, because senior staff hires are more complex in a number of ways.  My observation of our team is that while about 45% of my team is senior, only around 25-30% of our new staff are.  It ticks up a little for mid-tenure staff, to 33%, and long-tenure staff are 70% senior.  Looked at from the opposite direction, my 15 most senior staff – director level and up; 29% women, 18% US non-white, 11% US Black/Hispanic – are all long-tenured personnel, with one exception, as the most recent team member approaches their fifth anniversary this summer.  Clearly, my hiring strategy isn’t going to solve for senior staff quickly, but I can certainly check to see if we’re trending in the right direction.

[Chart: women’s representation overall, by seniority (Junior, Senior), and by tenure (New, Mid, Long)]

This chart is a little bit of an eye chart.  The first cluster is just a copy of the overall representation for women from above.  The next two clusters show the representation among junior staff and senior staff, and you can see that senior staff representation is finally starting to tick up over the last two years.  Why?  The final three charts have the story.  While our hiring representation (as seen in the “New” cluster) has been relatively good, the retention is key.  The mid-tenured staff is slowly growing more representative, and in the last two years we see long-tenured staff increasing in women’s representation.  The long-tenure representation tracks pretty similarly to the senior representation, and that’s going to be key: ensuring that we have an internal pipeline of candidates for senior positions.  Before I address that topic, a quick look at non-white US representation:

[Chart: minority (US non-white) representation by cohort]
[Chart: Black/Hispanic (US) representation by cohort]
With lower counts, and a little more jitter in the numbers, I see a pattern here that hints at even better futures.  The upswing in minority representation in the new- and mid-tenure cohorts suggests that long-tenure representation will start to rise in the next few years, which ought to really start driving the representation in the senior cohort.  The smaller change in Black/Hispanic representation is more problematic, and, for the next CSO of Akamai, is clearly a place to investigate improvement efforts.


Promotions
The reason that retention matters so much in my philosophy is that the best senior staff are often those who come from within.  Every time we have an opportunity to fill a senior staff role – either through the very rare departure, or through increased funding to support a business partner – we have the freedom to hire one person, or to promote a handful...and still hire one person.  When an engineering team asks for a dedicated senior architect, that allows us to promote an architect, a senior security researcher, a security researcher II, and then hire a security researcher.
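A toy sketch of that cascade (the job titles come from the example above; note my sketch promotes at every rung, one more internal move than the text’s example, which elides a step):

```python
# Hypothetical career ladder; filling one senior opening from within
# triggers a promotion at every rung below it, plus one entry-level hire.
LADDER = ["security researcher", "security researcher II",
          "senior security researcher", "architect", "senior architect"]

def fill(opening: str) -> list[str]:
    i = LADDER.index(opening)
    moves = [f"promote {LADDER[j]} -> {LADDER[j + 1]}"
             for j in range(i - 1, -1, -1)]
    return moves + [f"hire {LADDER[0]}"]

for step in fill("senior architect"):
    print(step)  # four promotions, then one hire
```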

I also keep an eye on staff who haven’t had a promotion in a long time.  When I meet with my staff for our annual strategic staffing check-in, they each bring their list of people that they think are due for a promotion.  I also bring in my list, of the longest tenured people who haven’t seen a promotion in some time, so we can make sure we aren’t leaving people out, just because they aren’t making any noise.

But the real reason that promotions matter for representation numbers is because, frankly, the rest of the industry has done an atrocious job at developing competent senior staff, and the representation is awful.  Great staff with experience, especially if they also happen to check a diversity box, are getting paid premiums beyond what I’m willing to pay, simply because a lot of employers are trying to quickly patch their representation problems.  So if I want great senior staff, a lot of them are going to have to come from within.  Additionally, it means that we have to spend less time with senior staff on culture and value basics, because they’ve all spent years creating the baseline we expect.

Hiring
Hiring new staff is where everyone thinks the heavy hitting happens.   Like Moneyball, it’s not about home runs, it’s about just getting on base more often.  I’m not expecting my recruiters to solve all of my representation issues, but I can make it easier for them, by setting reasonable expectations.  We don’t hire unicorns.  We’ll develop our staff, so they needn’t be perfect when they walk in the door.  In fact, they may feel overwhelmed.

Insertion:  Many of our positions aren’t classic cybersecurity jobs at all.  The State of the Internet team (50% women, 25% minority) is a classic example.  Half of the team started their careers as reporters.  One is a data scientist.  One is a still-recovering auditor.  For many of our open positions, we go looking into adjacent career fields to find people with amazing skills that our career field is short on.

Early career:  We make heavy use of a college internship program, and we focus our summer projects with one clear success criterion: Did the intern have a valuable summer experience?  If we got useful work out of them, that’s icing on the cake, because our real goal is to find out if the intern will be a good fit for us, and if we’ll be a good fit for them.  Our goal is to know, by the end of the summer, if we’re going to make an offer, and make it as soon as possible.  We want the intern to know we want them back.

Transitions:  Akamai runs a fantastic Technical Academy, in which we pay people to go through a six-month intense training program to help them convert/return to a tech job.  Any time there is an ATA cohort where we have staff, we commit to hiring at least one person, because there is always at least someone who will be a good fit for a security team, even if they didn’t know it.

Market:  We did occasionally hire in the cybersecurity job market.  We paid careful attention to our job descriptions.  Did they use language that might dissuade candidates?  Did they contain voluminous requirements?  You’ll need to check the job descriptions after you post them publicly, because sometimes, in an effort to standardize and comply with various labor rules, the job descriptions get edited after you submit them. We made sure to have visibly diverse hiring panels, as well, partly to help convince talented staff to join us, and partly as a sensor to detect candidates who might have a problem in a diverse environment.

The Long Road
There isn’t just one secret.  You have to commit to a many-year effort to solve your representation issues.  You have to lead with deliberate care, and be visible all through your management chain, making your interest clear.  But you can make a difference, with effective intention.

Understanding Risk

Operating or overseeing a business – whether it’s as a director, executive, or manager – requires an understanding of risk, and especially how it impacts your strategy.  But risk is a nebulous concept.  It means something different to everyone, so it helps to level-set not just on a working definition of risk, but on approaches to thinking about both novel risks (those that aren’t yet on your radar) and known risks (those that are).

What is Risk?
Risk is anything that has a chance of adversely impacting your business.  Risk isn’t intrinsically a bad thing; all entities have a risk appetite that balances the risks they take against the rewards they seek.  Companies have to invite risk to pursue rewards; consider that merely making a profit invites competitors who will increase your risk.  Risks exist in many broad categories – often, a practitioner in one category thinks of their kind of risk as being the only kind that matters – and it’s important to apply risk management thinking across the spectrum of risk.  The most prevalent risk comes simply from liquidity (having enough cash to operate), which can include credit risk (the money you are owed … doesn’t materialize).  You might have risk that comes from your market (you need specific truths to operate, like a bear market or zero interest rates), your business strategy (your specific market strategy hinges on an underlying axiom, like people renting movies in a store), or compliance regimes (you can be put out of business simply for not following a rule). You can also face risks from reputation (if people are no longer willing to do business with you) or operations (your security and safety practices).

Within those broad risk areas, we can think of significant amounts of risk as coming from hazards.  Hazards are the subset of risks that aren’t intrinsic to your strategy, and have the potential to be surprisingly disruptive.

Hazards come in many flavors.  Some are procedural: in executing your strategy, you might make errors.  Some are adversarial or environmental: entities or events outside your control could harm you through the hazard.  And some are perverse incentives: you might incentivize individuals on your team to do very dangerous things in execution of your strategy.  Each of these requires different forms of oversight to address, especially in places where they might interact.


Procedural Hazards
Many control regimes – from Sarbanes-Oxley to the NIST CSF to a whole host of ISO frameworks – are designed to help companies manage process risk.  Unfortunately, these frameworks, alone, seem to be insufficient to control those risks.  Overseeing risk can be challenging, as hundreds of detailed controls across an entire enterprise are potentially relevant, and identifying specific problematic areas isn’t an easy task.  Two important questions might help drive towards identifying hazards.

What is the scope of a control system?  Perhaps a company has a strong control in Identity and Access Management, and can report flawless execution in ensuring that only appropriate staff get access to systems.  But lost in the nuance of reporting is that the relevant control only applies to a subset of the systems in the company.  It’s the most important set of systems, of course, but importance is in the perception of management.  Right next to those important systems might be other systems without good controls, which create hazards for the adjacent, controlled systems.  Understanding where controls don’t cover the full scope of a company is an important first step.

How effective is the control system?  Some control systems look shiny from the outside, but on the inside don’t actually provide meaningful protections.  It’s important to understand if there is a simple measurement that summarizes the control, and is also tied to the protections the control provides.  Perhaps the measure reports on activity (“We approved 75 products for launch this quarter”) and not on impact (“100% of products had absolutely no reported issues”).  An impact measure might reveal implausibility: a failure rate of 0% is not necessarily an indicator of a strong system.  It’s more likely an indicator of a control system that has no effect.

Combine these two questions as you consider how to report on the effectiveness of an overall control system.  Control effectiveness should report both on scope (what percentage of the system is controlled?) and on effectiveness (what is the measure of process effectiveness?).  Risk appetite should be used to establish reasonable ranges for both of these measures, to identify when escalation will be needed to course correct, and how much escalation (telling executive management is likely a different threshold than telling the board). Identifying those thresholds before you cross them will save a lot of energetic conversation about whether or not something should be escalated.
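A sketch of what such a report might look like (the structure follows the two questions above; the control name, numbers, and escalation thresholds are all illustrative assumptions, to be replaced by your actual risk appetite):

```python
# Two-question control report: scope (coverage of the estate) and
# effectiveness (an impact measure, not an activity count).
from dataclasses import dataclass

@dataclass
class ControlReport:
    name: str
    systems_covered: int
    systems_total: int
    failures: int       # impact measure: outcomes the control failed to prevent
    opportunities: int  # times the control was exercised

    def scope(self) -> float:
        return self.systems_covered / self.systems_total

    def effectiveness(self) -> float:
        return 1 - self.failures / self.opportunities

    def escalation(self) -> str:
        # thresholds set from risk appetite, decided before they're crossed
        if self.scope() < 0.50 or self.effectiveness() < 0.90:
            return "board"
        if self.scope() < 0.80 or self.effectiveness() < 0.98:
            return "executive management"
        return "none"

iam = ControlReport("IAM", systems_covered=420, systems_total=600,
                    failures=7, opportunities=900)
print(iam.scope(), iam.effectiveness(), iam.escalation())  # 0.70, ~0.99, exec
```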

Environmental and Adversarial Hazards
Some systems have defects that can go badly wrong if exploited in just the wrong way.  Sometimes that exploit needs a malicious actor, a criminal who wants to create harm for your business.  Other times that exploit doesn’t require malice; perhaps an extreme winter storm pushes your system outside its design limits.

These hazards are sometimes challenging to talk about.  They aren’t always easy to find, and rarely turn up on a simple checklist.  Sometimes the hazards are tolerable: you aren’t necessarily happy to have them, but you’ll tolerate them for a time.  Sometimes, these hazards are so intertwined with your system design and business process that even if you do decide to reduce the hazard, you’ll need to spend years coordinating cross-functional projects to root it out.

Discovery: One way that many companies identify these hazards is to employ experts who just know where to look.  Unfortunately, this approach relies on having a specific kind of unicorn: a deeply technical employee with broad-based knowledge of your entire system, a long memory to track issues, and the communication skills to educate your executive team about the hazards.  A more reliable approach is to embed hazard analysis throughout the design process, capture the hazards into a registry, and have that registry continuously reviewed – perhaps reassessing a few entries a month – to keep it updated with known hazards.
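A minimal sketch of that review rotation (the registry entries, field names, and batch size are invented for illustration): each month, re-assess the entries that have gone longest without review.

```python
# Hazard registry with a standing review rotation.
from datetime import date

registry = [
    {"hazard": "legacy auth on billing system", "last_review": date(2020, 3, 1)},
    {"hazard": "single datacenter for DNS",     "last_review": date(2021, 1, 15)},
    {"hazard": "shared admin credentials",      "last_review": date(2019, 11, 2)},
]

def next_reviews(entries: list[dict], batch: int = 2) -> list[dict]:
    """Pick the stalest entries for this month's re-assessment."""
    return sorted(entries, key=lambda h: h["last_review"])[:batch]

for entry in next_reviews(registry):
    print(entry["hazard"])  # the two longest-unreviewed hazards
```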

Mitigation: Some of those hazards you will need to mitigate.  You don’t need to reduce them all to zero (sometimes just taking the edge off by a little bit is sufficient to bring the risk back into your appetite), but once you decide to reduce the impact of the hazard, it’s helpful to identify success criteria.  Think of success criteria as a contract with your future self: “if I do this much work, measurable by this outcome, and the world hasn’t changed to make this more dangerous, then I get to celebrate success.”  It will be tempting along the way to move the goalposts closer, because mitigation projects can take longer than you originally expected.  Inspect that urge.  Did you really misestimate the danger originally, or do you just have fatigue and would like to be done, even if the hazard remains uncontrolled?

Awareness:  Some of your hazards you aren’t going to mitigate.  Perhaps the hazard is too embedded in your way of doing business.  Maybe the hazard is just below the level where it would be urgent to fix.  This is uncomfortable, because you’ll have to acknowledge the presence of these hazards, and it’s a natural reaction to avoid talking about them.  But you must, because the only real way to understand your risk appetite is to actually talk about the hazards that you accept, especially in the context of all of the hazards contained in your registry.  Gaining awareness (likely not comfort, but at least awareness) of which risks are accepted makes assessing new and novel risks an easier task.

Incentivized Hazards
The most pernicious hazards an organization faces are those that it creates for itself, by putting its own employees’ incentives at odds with its long-term best outcomes.  Sometimes this might be through ill-thought-out systemic incentives (consider JPMC’s “London whale” or the Wells Fargo cross-selling debacle); other times it might be created by specific pressures to achieve results (look at Volkswagen’s DieselGate or Theranos).

Most incentivized hazards create a tension between what ought to be the values and cultures of a company (which are often just plaques on a wall, rather than living touchstones), and the short-term needs of a company.  Incentives can be novel solutions to a changing business environment, or might arise from impossible business needs.  But detecting perverse incentives isn’t impossible; it just requires extra care.

Culture:  We shouldn’t expect that employees will be the only line of defense against a hazard, but we should expect them to feel uncomfortable with conflicting goals – and to feel comfortable raising that conflict with management.  Organizational values should be viewed like a detour sign: they indicate which paths to avoid.  Perhaps, to avoid a Wells Fargo-style incident, a value like “Serve the best interests of our customers” would be helpful to create tension against “cross-sell as many products as possible to our existing customers.”

Changing business environment:  When the environment alters in a significant fashion, novel solutions to the problem create an automatic perverse incentive: the novel solution absolutely cannot be permitted to fail.  The team responsible for the solution is automatically incentivized to hide risks and adverse information, or, at minimum, to downplay them (the JPMC response to Basel III can be viewed in this light).  Look closely at those novel changes, and inspect them for concealed risks.

Impossible business requirements:  Sometimes an organization needs an outcome so desperately that it can only be achieved by some breakthrough that seems impossible.  Similar to novel business processes, this creates an incentive to ensure that the solution exists, even if it doesn’t!  Consider VW, which needed an innovation in diesel engine technology that was otherwise unheard of.  Much like in a changing business environment, this should be seen as an indicator to dig very deeply into the solution, to understand if the solution truly works as advertised, or creates new hazards for the business.

Planning for perversion of incentives:  Almost any structural incentive can become perverse – consider that as a structural hazard of an incentive – but incentives can be instrumented to look for those hazards.  A variant of the pre-mortem is very helpful: consider that the incentive will create a perverse incentive, and then try to identify how that happened.  Putting in place measurements to detect those outcomes can be helpful (Is this incentive significantly more effective than we anticipated?  Is the business generated by this incentive structurally good?).
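One way to instrument that pre-mortem (the 1.5x band and the quality score are invented placeholders for real, pre-agreed measures from your own incentive design):

```python
# Instrumenting an incentive for signs of perversion.
def incentive_alarms(anticipated: float, observed: float,
                     quality_score: float) -> list[str]:
    alarms = []
    if observed > 1.5 * anticipated:  # "significantly more effective than anticipated?"
        alarms.append("results exceed plan by >50%: inspect for gaming")
    if quality_score < 0.8:           # "is the generated business structurally good?"
        alarms.append("generated business looks structurally weak")
    return alarms

# e.g. a cross-selling incentive: planned 1.2 products per customer, seeing
# 2.4, and many of the new accounts are never used (low quality score)
print(incentive_alarms(anticipated=1.2, observed=2.4, quality_score=0.35))
```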

How much is enough?
Ultimately, the question that risk management programs seek to answer is “How much risk reduction is enough to get us back into our risk appetite?”  Or, rephrased, “How do you know that you did the best that you could, given the circumstances?”

The answer to that question isn’t a simple one, but it boils down to an understanding of how comfortable you are with the actions you took, and the decisions you made, given what you could know at the time.  Of course, with perfect foresight, you would perfectly navigate the risk environment, and only make bets that are worthwhile.  But you don’t have perfect foresight, so don’t apply it in hindsight.

Are you paying attention to risk?  Are you willing to look in uncomfortable places for risk?  Are you controlling for the risk you incentivize?  Are you comfortable with where you’ve drawn the line between the hazards you’re mitigating and the ones you aren’t?

Kremlinology

Never attribute to malice what can be explained by stupidity.  Sometimes, your own! 

There is a natural human tendency towards kremlinology – that is, the attempt to impute motives by observing only a few characteristics or outputs.  One application is the Fundamental Attribution Error: asserting that someone has ill motives just because we are harmed by their action.

It’s a form of conspiracy theory, as we try to connect the dots on someone’s actions to help us understand what’s happening out of our sight.  But we often miss an important safety check.  Once we understand someone’s motives – or think we do! – we should be able to reverse engineer the observed outcomes.

That is, if we observe outputs A, B, and C, we might assert that someone really disliked us, and they were out to “get us.”  But let’s flip the scenario: assume that they did dislike you, and wanted to “get you.”  Are actions A, B, and C the actions that they would take, consistent with their past behaviors?  Maybe if they were really trying to harm you, actions D, E, and F would make more sense based on how they’ve acted – if so, maybe your hypothesis is incorrect.

Much like doing homework, the principle of “check your work” still applies – run the problem backward and see if your answer fits the problem as observed.