Origin Energy insourced its security team and tooling from a managed services arrangement as part of a broader digital transformation and move to public cloud.
CISO Christoph Strizik told the AWS Summit in Sydney that Origin had more or less undergone “a security revolution”.
“We’re doing security very differently now,” he said.
Origin made clear its intention to adopt public cloud at scale back in 2016, setting up a central function in IT after some parts of the organisation started to run cloud instances themselves.
The initial target was more than 1000 workloads. The scope was expanded to 1500 workloads in 2018, coinciding with a restructure of the company’s cloud practice. Last year, it was revealed that some of the workloads would run in VMware Cloud on AWS.
At AWS Summit in Sydney, Strizik said Origin is “now 60 percent done with moving most of our systems to the public cloud.”
He also put a finish date on the migration: 2022.
In slides accompanying the presentation, Strizik called the move to public cloud “a once-in-a-generation opportunity to transform [the] organisation and security.”
“As part of our public cloud journey, we transformed our security,” he said.
“We developed security principles [that] helped us define the required security culture and capability we wanted to create to enable our business.”
The company began the security transformation with three principles, which would eventually evolve to seven; Strizik highlighted a handful in his presentation.
“The first principle we had was [to] scale and maximise security value at low cost,” he said.
“We wanted to achieve that by using open source, cloud, and automation.
“This immediately had a number of implications in how we thought about delivering security services for Origin.”
A second principle was to move to “holistic, timely and risk-based security solutions.”
“When we talk about holistic, we talk about no gaps in our security information, so we want to have security information for all of our information assets and systems,” Strizik said.
“[For] timely, we want to have close to real-time security information for better decision making, and risk-based means we want to have security guardrails or controls baked into our cloud environment so the business can run as fast as needed safely.”
From a practical perspective, Origin’s security “revolution” saw it insource a security monitoring capability, stand up an entirely new stack, and focus on creating a culture of “security transparency”.
Strizik said Origin made the call to cancel an outsourced security contract with an undisclosed managed security services provider (MSSP).
“We were really good at governing outsourced security services, but we had to learn how to build and run cloud security solutions at scale in-house,” he said.
“As a business, we realised security is core to what we do and ... we like to do what is core ourselves where it makes sense.”
Strizik also suggested the structure of the MSSP deal was not conducive to operating infrastructure in the cloud at scale.
“When you digitise your business and move to public cloud, you have to decide if you want to use your existing security technology and stack, or if you reimagine your stack,” Strizik said.
“In our case, it did not make sense to use our existing stack.
“We would have doubled our costs, and that's a clear violation of our principle to maximise value at low cost. We also couldn't achieve a number of other principles with our legacy stack.
“So we cancelled our MSSP, and there's a feeling of liberation - and probably also panic - that comes with that.”
The panic came from the “very tight timeline to transform” that the decision produced.
“We made a call not to take over any of the existing security systems we had in place, which was both good and bad,” cloud security lead Glenn Bolton said.
“It was good because we had an incredible opportunity here to build new security capability in a greenfields environment, but the pressure was really on.
“The clock was ticking and we needed as much coverage as possible as quickly as possible, preferably for the lowest possible cost.
“We only had a few months to come up with something better.”
Bolton said Origin “knew what we didn't want”.
“We knew we didn't want a system where we were paying a huge amount of money only to be limited to a certain number of events per second, and we really didn't want to be in the position where we had to pick and choose which log sources we could afford to keep and which ones we had to drop,” he said.
“What we wanted was opinionated but sensible alerts, out-of-the-box, with capability to build new alert types ourselves when we wanted to.”
Unpicking the stack
Some core systems and platforms already came “with opinionated but sensible alerts out-of-the-box”, Bolton said.
The company has branded these as “micro SIEMs” [security information and event management systems].
To fill in any monitoring gaps, Origin also stood up a “macro SIEM”.
Bolton said the company decided against using a “traditional SIEM” for the macro system because it did not want to be tied “to a particular vendor and licensing model.”
“I made a call early on to deliberately split out our macro SIEM into three discrete components: shipping and parsing, analytics and archive,” he said.
“Instead of trying to get one tool to do all three, we've used the best tools for each discrete component.
“For shipping and parsing, we use a combination of Elastic’s Beats and Logstash with some cloud-native pipelines where they make sense for things like CloudTrail or [VPC] Flow Logs.
“For analytics, we split off only the subset of logs that we actually need for our day-to-day security operations and alerting into Splunk, which helps us keep costs down. If we ever need to query out historical logs or resources not in Splunk, we do that with Amazon Athena, which lets us query our logs directly from our archive and only costs us when we need to use it.
“And for archive, we compress and partition our logs in Logstash before storing them in S3 for long-term retention at very low cost.”
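As an illustration only, the three-way split Bolton describes can be sketched in a few lines of Python. This is not Origin's pipeline, which by Bolton's account does this work in Logstash. The `ALERTING_SOURCES` set, the event fields and the `source=/year=/month=/day=` key layout are all assumptions made for the example.

```python
import gzip
import json
from datetime import datetime, timezone

# Hypothetical: the log sources that day-to-day alerting needs in the analytics tier.
ALERTING_SOURCES = {"cloudtrail", "vpc-flow", "auth"}


def needs_analytics(event):
    """Only a subset of events is copied to the (costlier) analytics tier."""
    return event["source"] in ALERTING_SOURCES


def archive_key(event):
    """Build a date-partitioned object key (assumed layout) for the archive."""
    ts = datetime.fromtimestamp(event["timestamp"], tz=timezone.utc)
    return (
        f"source={event['source']}/year={ts:%Y}/month={ts:%m}/day={ts:%d}/"
        "events.json.gz"
    )


def compress_batch(events):
    """Serialise events as newline-delimited JSON and gzip the result."""
    lines = "\n".join(json.dumps(e, sort_keys=True) for e in events)
    return gzip.compress(lines.encode("utf-8"))


def route(events):
    """Archive every event; copy only the alerting subset to analytics."""
    analytics = [e for e in events if needs_analytics(e)]
    grouped = {}
    for e in events:
        grouped.setdefault(archive_key(e), []).append(e)
    archive = {key: compress_batch(batch) for key, batch in grouped.items()}
    return analytics, archive
```

Partitioning archive keys by source and date is a common way to let a query service scan only the relevant slice of an archive, which fits the pay-per-use querying Bolton describes. The exact layout Origin uses was not disclosed.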
Bolton said the company regularly peaked at 8000 events per second, without the system “breaking a sweat”.
Total run costs were around $800 a month, though Bolton said the company hadn’t “put a lot of effort into cost optimisation” at this stage.
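For a rough sense of scale, the two figures can be combined in a back-of-the-envelope calculation. The sketch below assumes, purely for illustration, that the 8000 events-per-second peak is sustained around the clock and that the $800 covers a 30-day month. Neither was stated in the presentation, and because real average volume is lower than the peak, the true cost per event would be higher than this floor.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def events_per_day(events_per_second):
    """Events in one day at a constant rate."""
    return events_per_second * SECONDS_PER_DAY


def cost_per_million_events(monthly_cost, events_per_second, days=30):
    """Cost per million events if the given rate were sustained all month."""
    total_millions = events_per_day(events_per_second) * days / 1_000_000
    return monthly_cost / total_millions


peak_daily = events_per_day(8000)
floor_cost = cost_per_million_events(800, 8000)
print(f"{peak_daily:,} events/day at peak; about ${floor_cost:.3f} per million events")
# prints: 691,200,000 events/day at peak; about $0.039 per million events
```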
From the macro SIEM, actionable alerts are communicated over an Origin Security API, which runs on Amazon API Gateway, through to Hive and Cortex for case management and response respectively.
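The format of the alerts Origin sends over its Security API was not shown. As a hedged sketch, a normalised alert might be built as below. The field names are loosely modelled on the alert object used by the open source TheHive project and should be checked against the API documentation for the version in use. The `build_alert` helper, the `macro-siem` type label and the fingerprinting scheme are hypothetical.

```python
import hashlib
import json


def build_alert(rule_name, source, severity, description, observables):
    """Build an alert payload; observables is a list of (type, value) pairs."""
    ordered = sorted(observables)
    # Stable reference (assumed scheme) so a repeated detection can be de-duplicated.
    fingerprint = hashlib.sha256(
        json.dumps([rule_name, source, ordered]).encode("utf-8")
    ).hexdigest()[:16]
    return {
        "title": rule_name,
        "description": description,
        "type": "macro-siem",  # hypothetical label for the alert's origin
        "source": source,
        "sourceRef": fingerprint,
        "severity": severity,
        "artifacts": [{"dataType": t, "data": v} for t, v in ordered],
    }
```

Deriving the reference from the rule, the source and the observables means the same detection raised twice yields the same `sourceRef`, which gives the case management side something to de-duplicate on.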
“We respond to alerts using the Hive and Cortex which helps us be consistent and efficient, and we govern with the help of automated benchmarks like this, that encourage competitive compliance,” Bolton said.
“I'd read good things about the Hive project and Cortex and thought they might be useful here but I'd never actually used them myself.
“Because we were in a culture that encouraged experimentation and we had a platform to run our experiments on, we quickly built this as a proof-of-concept and took it for a test drive, and decided that we liked it, so we're still using it today.”
Bolton characterised Hive as “a cybersecurity case management tool … a little bit like ServiceNow but tailored for an analyst's workflow.”
“It helps us with alert management and drives consistency with templated playbooks,” he said.
“The Hive also generates great metrics around alert types, investigations and false positives.
“Having the metrics around false positives is great because it helps us tune our alerts so that we can help drive down analyst fatigue, and the metrics around our investigations and alerts gives us the evidence that we need to show that we're doing a good job.”
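The kind of false-positive metric Bolton mentions is straightforward to compute once closed alerts carry a resolution. The sketch below is a minimal illustration, not how Hive calculates its metrics. The `type` and `resolution` fields, the `false_positive` value and the 50 percent threshold are assumptions.

```python
from collections import Counter


def false_positive_rates(closed_alerts):
    """Return the share of false positives for each alert type."""
    totals = Counter()
    false_positives = Counter()
    for alert in closed_alerts:
        totals[alert["type"]] += 1
        if alert["resolution"] == "false_positive":
            false_positives[alert["type"]] += 1
    return {t: false_positives[t] / totals[t] for t in totals}


def tuning_candidates(closed_alerts, threshold=0.5):
    """List alert types whose false-positive rate meets the threshold."""
    rates = false_positive_rates(closed_alerts)
    return sorted(t for t, rate in rates.items() if rate >= threshold)
```

Alert types that surface in `tuning_candidates` are the ones most likely to be wearing analysts down, so they are the natural place to start tuning.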
Cortex, meanwhile, supported Hive “by helping to automate the lookup of observables - things like IP addresses, domain names and file hashes.”
“All this can save an analyst from having to copy and paste these sorts of pieces of evidence into a dozen different browser tabs.”
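The first step in automating those lookups is pulling the observables out of an alert. The sketch below shows one simple way to do that with regular expressions. It is an illustration of the idea rather than Cortex's own logic, and the patterns are deliberately rough: the domain pattern, for instance, would also match a file name such as `invoice.exe`.

```python
import ipaddress
import re

# MD5 (32), SHA-1 (40) or SHA-256 (64) hex digests.
HASH_RE = re.compile(r"\b(?:[a-fA-F0-9]{64}|[a-fA-F0-9]{40}|[a-fA-F0-9]{32})\b")
# Candidate IPv4 addresses; validated separately below.
IP_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
# Dotted labels ending in an alphabetic top-level label.
DOMAIN_RE = re.compile(
    r"\b(?:[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z]{2,}\b", re.IGNORECASE
)


def extract_observables(text):
    """Return a sorted, de-duplicated list of (type, value) pairs found in text."""
    found = set()
    for candidate in IP_RE.findall(text):
        try:
            ipaddress.ip_address(candidate)  # rejects values such as 999.1.1.1
        except ValueError:
            continue
        found.add(("ip", candidate))
    for digest in HASH_RE.findall(text):
        found.add(("hash", digest.lower()))
    for domain in DOMAIN_RE.findall(text):
        found.add(("domain", domain.lower()))
    return sorted(found)
```

Each extracted pair could then be handed to whichever enrichment services a team uses, in place of the manual copy and paste Bolton describes.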
Bolton conceded the architecture “might all look like a lot of stuff to manage, and it is”, but said that “for the most part it just runs itself.”
Outside the stack
Outside of the technology stack, Origin Energy has put considerable effort into building an internal security monitoring capability.
Strizik said the company had “tapped into a broader talent pool” to “overcome the talent shortage”, training up people from other technical or consultancy fields in cybersecurity.
“What we did is we started the process of ongoing learning, and I think this is really so important to us,” he said.
“We also promoted internal people with strong leadership skills but limited security skills to run our new security teams, which is of course an unusual step to take perhaps but worked out really well for us.
“And last but not least, all our roles are flexible. So I think that's also a game changer.”
Strizik said the team that builds and runs Origin’s security stack in the cloud is 46 percent female and has a total turnover of five percent.
Security 'league table'
Aside from the team and tooling, Strizik said considerable effort had been put behind “security transparency” at Origin.
“Why do you want to focus on this? Well, we believe that continuously improving our security culture is becoming more important, and we also want to be better positioned to leverage new technologies safely,” he said.
“We also believe that increased security information transparency drives the security culture in your organisation, and there's broader research to back that up in how transparency drives positive change in cultures and societies.
“This is not a new concept - we're just applying it in security.”
Strizik said that Origin had effectively set up a security dashboard and “league table … which made it easy for people to see how their security compares to others.”
“Greater transparency and the security league table is creating a sense of competition between teams, so teams are now asking, ‘How do we compare?’
“No one wants to be the last one on the league table.
“As a result of this, we're seeing improved compliance with security guardrails by up to 25 percent within the first year, and because of the transparency, we're also seeing issues being resolved quicker.”
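Origin did not detail how its league table is calculated. As a minimal sketch of the idea, teams could be ranked by the share of guardrail checks they pass. The team names, the input format and the alphabetical tie-break below are all hypothetical.

```python
def league_table(results):
    """Rank teams by guardrail compliance.

    results maps a team name to (passed_checks, total_checks).
    Returns a list of (rank, team, percent_compliant), best first.
    """
    rows = []
    for team, (passed, total) in results.items():
        percent = 100.0 * passed / total if total else 0.0
        rows.append((team, round(percent, 1)))
    # Highest compliance first; ties broken alphabetically by team name.
    rows.sort(key=lambda row: (-row[1], row[0]))
    return [(rank, team, pct) for rank, (team, pct) in enumerate(rows, start=1)]


if __name__ == "__main__":
    sample = {"billing": (45, 50), "trading": (38, 40), "retail": (30, 50)}
    for rank, team, pct in league_table(sample):
        print(f"{rank}. {team}: {pct}% compliant")
```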