Village Roadshow Entertainment ramped up its ability to withstand traffic spikes and detect problems with its online box office systems in the lead-up to the Barbie and Oppenheimer films released in July.
The theme park and cinema operator told the Dynatrace Innovate conference that it consolidated application performance monitoring to a single Dynatrace platform.
This enabled it to pinpoint the cause of a long-running issue that saw CPU utilisation at an integration layer regularly reach 100 percent, forcing restarts and affecting stability.
"Within three days of the discovery [of the root cause], we were able to apply a change, and go from 100 percent CPU [utilisation] down to a few percent," head of technology Stuart Wood-Rich said.
The company was also then able to move forward with system modernisation and upgrades.
One of the drivers for remediating the CPU issues was the dual release of blockbuster films Barbie and Oppenheimer, which came to be known by the portmanteau 'Barbenheimer'.
The cinema industry generally is beholden to spiky traffic patterns, and so it needs backend systems that can quickly scale up and down to meet demand for tickets.
Cinemagoers typically want to see a new release quickly to sate excitement and also to avoid spoilers.
"Nearly every single movie in the history of cinema has its biggest week in the first week, and then [ticket sales] slowly decline over time," chief transformation officer Michael Fagan said.
"For us, it wouldn't be atypical to have 80 percent of sales in one percent of a week. It's a very, very unusual [situation] from a technology perspective to have to deal with."
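To put Fagan's figures in perspective, the arithmetic can be sketched as below. This is an illustration only: it takes the "80 percent of sales in one percent of a week" quote at face value and treats sales outside the peak window as evenly spread, which is an assumption rather than anything Village Roadshow described. The function and variable names are hypothetical.

```python
HOURS_PER_WEEK = 7 * 24  # 168 hours


def peak_window_stats(sales_share: float, time_share: float) -> dict:
    """Compare the sales rate inside a short peak window with the rest of the week.

    sales_share: fraction of weekly sales that fall inside the window (e.g. 0.80)
    time_share:  fraction of the week the window covers (e.g. 0.01)
    """
    window_hours = HOURS_PER_WEEK * time_share
    # Rates are expressed relative to the weekly average rate (average = 1.0).
    peak_vs_average = sales_share / time_share
    off_peak_vs_average = (1 - sales_share) / (1 - time_share)
    return {
        "window_hours": window_hours,
        "peak_vs_average": peak_vs_average,
        "peak_vs_off_peak": peak_vs_average / off_peak_vs_average,
    }


# Figures from the quote: 80 percent of sales in one percent of a week.
stats = peak_window_stats(sales_share=0.80, time_share=0.01)
for name, value in stats.items():
    print(f"{name}: {value:.2f}")
```

Under those assumptions the peak window lasts roughly 1.7 hours, during which sales run at about 80 times the weekly average rate and close to 400 times the rate seen in the remaining 99 percent of the week.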
In addition, the industry experiences seasonal spikes around Christmas, when people receive tickets as gifts, leading to a spike in redemptions.
"The last thing I want our team to do is be up at 1pm on Christmas Day checking the website - is stuff going through, is there a problem - or to get a phone call [about a problem]," Fagan said.
Fagan noted that the industry generally - Village Roadshow included - had seen the impacts of blockbuster movie releases before.
The release of Avengers: Endgame - the second highest-grossing movie of all time - in April 2019 drove traffic volumes "more than 100 times" greater than usual to Village Roadshow's online box office.
Cinema websites the world over had problems. "We were massaging traffic through - something that we would not usually have to do," Fagan said.
The company was similarly eyeing Barbenheimer as a potentially major traffic event, and wanted to ensure its systems could handle the potential volumes thrown at them.
"We were really excited about that," Fagan said.
"Because of the big marketing machine that was coming, there was a light at the end of the tunnel from a revenue perspective, but we wanted to make sure that the light at the end of the tunnel wasn't a train that was going to hit [us]."
Last year, Village migrated its ticketing website to AWS to help scale during traffic spikes; this time, the main cause for concern was in a backend integration layer. The company wanted to stabilise the integration layer before migrating away from it.
Disparate diagnostic tools
When Wood-Rich joined the company at the beginning of the year, public-facing sites would still crash or lose important functions during peak demand, he said, and - even worse - Village’s IT team could not identify why.
“We were getting alerts through SMS ... they’d say something like ‘the quick tip ticket widget is down’, which means it’s already too late.”
Village’s IT teams could tell the downtime was related to internal systems consuming excessive CPU. But because their tools for application performance monitoring, log management and analytics were “disparate”, they could not uncover root causes automatically, and manual investigations could not determine them with confidence.
“When you've got systems that are disparate, and teams that are not necessarily co-located, it doesn't matter what tools you have in place.
“If there was a problem with the website, we could see through the limited, isolated tools that we were hitting 100 percent CPU, but all we could do at the time was correlate with traffic spikes related to high volume; there were no direct correlations; we didn't understand why anything was happening.”
The vendors supplying Village’s existing technology simply suggested “throwing more CPU at the problem”.
"Any of the lean folks amongst us would recognise that throwing more CPU and grunt, without knowing what the problem is, is a form of waste," Wood-Rich said.
"We knew Barbenheimer was coming. By throwing more horsepower at it while still having those problems wasn't appropriate."
Fagan said the company embarked on a project to consolidate logs and performance and infrastructure monitoring to a single view, powered by Dynatrace.
Wood-Rich added, “Bringing the teams together and looking at the same tools gave us a common language and it also meant that we were able to then dig a little bit deeper.”
Finding the causes of high CPU utilisation
Village Roadshow used Dynatrace's PurePath diagnostic tool to identify the root cause of the high CPU utilisation.
PurePath identified that simple user-driven interactions with Village’s movie database were consuming unusually high amounts of CPU.
Wood-Rich added, “The database was continuously adding movie posters; so the metadata was getting bigger and bigger and [the code] - from a code point of view - was selecting every single poster on every single request.
“So we found the problem and with that discovery, we were able to truncate the database, and stop the code from doing silly things.”
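The pattern Wood-Rich describes, where each request reads every stored poster, can be illustrated with a minimal sketch. Everything here is assumed for illustration: the article does not say which database, language or schema Village Roadshow uses, so the `movies` and `posters` tables, the function names and the data sizes below are hypothetical, and SQLite simply stands in for the real system.

```python
import sqlite3

# Hypothetical in-memory schema; not Village Roadshow's actual database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movies (id INTEGER PRIMARY KEY, title TEXT)")
conn.execute(
    "CREATE TABLE posters ("
    "id INTEGER PRIMARY KEY, "
    "movie_id INTEGER REFERENCES movies(id), "
    "image BLOB)"
)
conn.execute("CREATE INDEX idx_posters_movie ON posters(movie_id)")

# Made-up data: 3 movies, each with 50 accumulated posters of 10,000 bytes.
for movie_id, title in enumerate(["Movie A", "Movie B", "Movie C"], start=1):
    conn.execute("INSERT INTO movies (id, title) VALUES (?, ?)", (movie_id, title))
    conn.executemany(
        "INSERT INTO posters (movie_id, image) VALUES (?, ?)",
        [(movie_id, b"\x00" * 10_000) for _ in range(50)],
    )


def fetch_poster_unbounded(movie_id: int):
    """Anti-pattern: read every poster in the table, then filter in application code."""
    rows = conn.execute("SELECT movie_id, image FROM posters").fetchall()
    bytes_read = sum(len(image) for _, image in rows)
    wanted = [image for mid, image in rows if mid == movie_id]
    return wanted[-1:], bytes_read


def fetch_poster_scoped(movie_id: int):
    """Scoped query: ask the database for only the newest poster of one movie."""
    rows = conn.execute(
        "SELECT image FROM posters WHERE movie_id = ? ORDER BY id DESC LIMIT 1",
        (movie_id,),
    ).fetchall()
    bytes_read = sum(len(image) for (image,) in rows)
    return [image for (image,) in rows], bytes_read


_, unbounded_bytes = fetch_poster_unbounded(2)
_, scoped_bytes = fetch_poster_scoped(2)
print(f"unbounded: {unbounded_bytes} bytes, scoped: {scoped_bytes} bytes")
```

In this toy data set the unbounded version reads all 150 posters on each call, and that cost keeps growing as posters are added, while the scoped version reads one poster regardless of table size. That is consistent with the article's account of CPU load climbing as the poster data grew, though the actual fix Village Roadshow applied is not detailed beyond the quote.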
Grail data lakehouse
Village is also using Dynatrace’s Grail to see real-time sales insights as tickets are purchased.
This gives the company live data such as which movie is selling the most tickets.
Jeremy Nadel attended Dynatrace Innovate in Sydney as a guest of Dynatrace.