University of Queensland tackles sustainability of HPC operations

Tying power consumption to research outcomes.

The University of Queensland is embarking on a sustainability push for its supercomputing capabilities.

Bunya in the Polaris data centre (University of Queensland)

The university’s flagship supercomputer, Bunya, has recently been upgraded with new GPUs and CPUs.

UQ Research Computing Centre’s CTO Jake Carroll told iTnews that wasting less power means “we’re making more careful decisions about the components … rather than just shooting for the stars.”

Surprisingly, that can mean choosing GPUs or CPUs that aren’t the highest-powered parts available at the time.

“At some point we will fundamentally have to change how these devices look, because increasing power consumption on a per-node basis can’t continue forever,” Carroll said.

“Bunya doesn’t use the highest core density SKUs in the market – we realised we didn’t need to draw all that power to get a great outcome.”

Carroll’s other priority for making the high-performance computing (HPC) facility more sustainable is better telemetry, “to a point where we can show a user what power they consumed”.

“More and more researchers care about that – the researchers ask us about power use efficiency,” he said.

“We’re paying attention as a business, to make sure we’re not putting a huge hole in the planet while we’re doing this.”

Carroll said he’s looking for telemetry that can provide information about “energy consumed on a per-node or a per-job basis … ‘how much power did it take to run my job?’”

While research around power-aware scheduling “has been around for many years … it’s really only now that the rubber is hitting the road,” he said.

Speaking to iTnews ahead of a presentation to the Dell Technologies Forum today, Carroll said the Dell PowerEdge servers in Bunya “can tell us how fast every fan is spinning [and] the exact temperatures of every component”.

For each GPU or CPU, he said, the systems now tell the university how many watts are being used; after that, it’s the university’s job to get the “inferential data out of the server and the scheduler” that allocates power on a per-job, per-component basis.

“There’s manual work to be done still … but all the data is there,” Carroll told iTnews. “It’s up to us to glue it together.”
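As a rough illustration of that gluing step, the sketch below joins per-node power readings with scheduler job records and integrates watts over each job's run window to estimate energy per job. It is not the university's implementation: the `PowerSample` and `Job` types, their field names, and the assumption that a job has exclusive use of one node are all hypothetical, chosen only to make the arithmetic concrete.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PowerSample:
    """One power reading from a node (hypothetical record shape)."""
    node: str
    time_s: float  # seconds since an arbitrary epoch
    watts: float


@dataclass(frozen=True)
class Job:
    """One scheduler job record (hypothetical record shape)."""
    job_id: str
    node: str
    start_s: float
    end_s: float


def job_energy_joules(job: Job, samples: list[PowerSample]) -> float:
    """Estimate a job's energy by trapezoidal integration of node power.

    Assumes the job had the whole node to itself, and uses only the
    samples that fall inside the job's start/end window.
    """
    points = sorted(
        (s.time_s, s.watts)
        for s in samples
        if s.node == job.node and job.start_s <= s.time_s <= job.end_s
    )
    energy = 0.0
    for (t0, w0), (t1, w1) in zip(points, points[1:]):
        # Average power across the interval times its duration (W x s = J).
        energy += 0.5 * (w0 + w1) * (t1 - t0)
    return energy


def joules_to_kwh(joules: float) -> float:
    """Convert joules to kilowatt-hours (1 kWh = 3.6 million joules)."""
    return joules / 3.6e6
```

Jobs that share a node would need an extra rule for splitting each reading between them, for example in proportion to the cores or GPUs allocated; that choice is left open here.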

The end result is an energy dashboard that ties energy consumption back to researchers’ outcomes.

New HPC user communities

Another factor driving more HPC consumption – and therefore more energy use – is that the university has been actively seeking new user communities for Bunya, beyond the usual hard-science communities like physics, astronomy, and molecular biology.

That’s been accelerated by the advent of large language models (LLMs), which offer capabilities far beyond the physical sciences.

Carroll said humanities fields like economics are running workloads that are “starting to hit the limit of what a workstation can do”.

“The humanities, economics, psychology … people are running into trouble trying to run their models in Excel spreadsheets,” Carroll said.

HPC needs to be democratised, Carroll told iTnews, because it can deliver economies of scale to researchers.

“There’s a big cost difference between what I can provide at enormous scale, compared to 100 people buying 100 computers,” he said.

“Our plan is to provide visual desktop-type approaches using techs like Open OnDemand – so that people have a browser experience to use the supercomputer.”

While giving HPC users a visual experience via a portal isn’t a new concept, Carroll said the university is “trying to give people equal access to the kind of code that used to be run on a command line, through a traditional batch scheduler.”

Copyright © iTnews.com.au. All rights reserved.