It’s common practice for PC games today to launch with Denuvo, a form of DRM designed to stop the spread of pirated copies of games, and it’s also common practice for developers to remove Denuvo several months after launch as interest (and the risk of piracy) dwindles. Less common is a developer publicly announcing it’s removing Denuvo from a game before it’s even out, but that’s the surprise Starbreeze pulled this Friday.
“Hello heisters, we want to inform you that Denuvo is no longer in Payday 3,” the developer wrote in a post on Steam on Friday. That’s pretty much the whole message: short and to the point, and seemingly a win on the goodwill front, with the post racking up 524 thumbs up on Steam so far and another 10,000 or so on Twitter.
Payday 3 is less than a week away from its September 21 release, and Starbreeze is clearly looking to roll into the launch with an excited community behind it. Two months ago a thread on the r/paydaytheheist subreddit called out the inclusion of Denuvo and the responses were characteristically negative. This afternoon, one of the game’s developers responded to that thread to highlight that Denuvo has been removed.
Denuvo has long had a reputation for hindering performance in games and bloating their executables, though the company behind it, Irdeto, insists that isn’t the case. This summer it announced a plan to provide media outlets with two versions of games, one with Denuvo included and one without, to prove it has no impact on performance.
I admire the concept behind Denuvo.
Programs bounce around between a ton of different code segments, and it doesn’t really matter how they’re arranged within the binary. Some code even winds up repeated, when repetition is more efficient than jumping back and forth or checking a short loop. It doesn’t matter where the instructions are, so long as they do the right thing.
This machine code still tends to be clean, tight, and friendly toward reverse-engineering… relatively speaking. Anything more complex than addition is an inscrutable mess to people who aren’t warped by years of computer science, but it’s just a puzzle with a known answer, and there are decades of tools for picking things apart and putting them back together. Scene groups don’t even need to unravel the whole program. They’re only looking for tricky details that will detect pirates and frustrate hackers. Eventually, they will find and defeat those checks.
So Denuvo does everything a hundred times over. Or a dozen. Or a thousand. Random chunks of code are decompiled, recompiled, transpiled, left incomplete, faked entirely, whatever. The whole thing is turned into a hot mess by a program that knows what each piece is supposed to be doing, and generally makes sure that’s what happens. The CPU takes a squiggly scribbled path hither and yon but does all the right things in the right order. And sprinkled throughout this eight-ton haystack are so many more needles, any of which might do slightly different things. The “attack surface” against pirates becomes enormous. They’ll still get through, eventually, but a crack delayed is a crack denied.
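To make the "squiggly path, same result" idea concrete, here is a toy sketch in Python. It is not how Denuvo works internally (nothing in this thread documents that); it only illustrates the general notion of scattering the real steps of a computation among decoys and linking them by jumps. The names `straight_line`, `build_scrambled`, and `run_scrambled` are made up for this example.

```python
import random

def straight_line(x):
    """The tidy version: three steps, one after another."""
    x = x + 3
    x = x * 2
    x = x - 1
    return x

def build_scrambled(seed, table_size=16):
    """Scatter the same three steps across a larger table full of decoys.

    Each slot holds (operation, next_slot). Only the real chain is ever
    reached from the entry slot; the decoys are dead code in this toy.
    """
    rng = random.Random(seed)
    steps = [lambda x: x + 3, lambda x: x * 2, lambda x: x - 1]
    real_slots = rng.sample(range(table_size), len(steps))

    # Fill every slot with a decoy first.
    table = [(lambda x: x ^ 0x5A, rng.randrange(table_size))
             for _ in range(table_size)]

    # Overwrite the chosen slots with the real steps, chained in order.
    for i, slot in enumerate(real_slots):
        next_slot = real_slots[i + 1] if i + 1 < len(steps) else None
        table[slot] = (steps[i], next_slot)
    return table, real_slots[0]

def run_scrambled(table, entry, x):
    """Follow next_slot links from the entry slot until the chain ends."""
    slot = entry
    while slot is not None:
        operation, slot = table[slot]
        x = operation(x)
    return x

table, entry = build_scrambled(seed=1)
print(straight_line(4), run_scrambled(table, entry, 4))
```

Both calls return 13 for an input of 4. Someone reading the table has to work out which slots are real and in what order they run, while the interpreter loop pays for an extra lookup and jump on every step.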
Unfortunately for us this also fucks up why computers are fast now.
Back in the single-digit-megahertz era, this would’ve made no difference to anything, besides requiring more RAM for these bloated executables. 8- and 16-bit processors just go where they’re told and encounter each instruction by complete surprise. Intel won the 32-bit era by cranking up clock speeds, which quickly outpaced RAM response times, leading to hideously clever cache-memory use, inside the CPU itself. Cache layers nowadays are a major part of CPU cost and an even larger part of CPU performance. Data that’s read early and kept nearby can make an instruction take one cycle instead of one thousand.
Sending the program-counter on a wild goose chase across hundreds of megabytes guarantees you’re gonna hit those thousand-cycle instructions. The next instruction being X=N+1 might take literally no time, if it happens near a non-math instruction, and the pipeline has room for it. But if you have to jump to that instruction and back, it’ll take ages. Maybe an entire microsecond! And if it never comes back - if it jumps to another copy of the whole function, and from there to parts unknown - those microseconds can become milliseconds. A few dozen of those in the wrong place and your water-cooled demigod of a PC will stutter like Porky Pig. That’s why Denuvo in practice just plain suuucks. It is a cache defeat algorithm. At its pleasure, and without remedy, it will give paying customers a glimpse of the timeline where Motorola 68000s conquered the world. Hit a branch and watch those eight cores starve.
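The cache argument above can be sketched with a tiny simulation. This is a deliberately crude model, assuming a single direct-mapped instruction cache with made-up parameters (64-byte lines, 32 KiB total, 1 cycle per hit, 200 cycles per miss). Real CPUs have several cache levels, associativity, and prefetchers, so treat the numbers as illustrative only. All function names are hypothetical.

```python
import random

LINE_BYTES = 64      # assumed cache-line size
NUM_LINES = 512      # assumed: 512 lines x 64 B = 32 KiB, direct-mapped
HIT_CYCLES = 1       # assumed cost of a fetch that hits the cache
MISS_CYCLES = 200    # assumed cost of a fetch that goes out to RAM

def count_misses(addresses):
    """Count cache misses for a sequence of instruction-fetch addresses."""
    cached = [None] * NUM_LINES
    misses = 0
    for address in addresses:
        line = address // LINE_BYTES
        index = line % NUM_LINES
        if cached[index] != line:
            cached[index] = line  # evict whatever was there
            misses += 1
    return misses

def estimated_cycles(addresses):
    """Total fetch cost under the assumed hit and miss prices."""
    misses = count_misses(addresses)
    hits = len(addresses) - misses
    return hits * HIT_CYCLES + misses * MISS_CYCLES

def tight_loop_trace(iterations=100, instructions=256, size=4):
    """A 1 KiB loop body fetched over and over from the same addresses."""
    return [i * size for _ in range(iterations) for i in range(instructions)]

def scattered_trace(count, span_bytes=256 * 1024 * 1024, seed=0):
    """The same number of fetches, each from a random spot in 256 MiB."""
    rng = random.Random(seed)
    return [rng.randrange(span_bytes) for _ in range(count)]

tight = tight_loop_trace()
scattered = scattered_trace(len(tight))
print("tight loop:", count_misses(tight), "misses,",
      estimated_cycles(tight), "cycles")
print("scattered: ", count_misses(scattered), "misses,",
      estimated_cycles(scattered), "cycles")
```

In this model the tight loop only misses while its 16 cache lines are first loaded, whereas nearly every scattered fetch misses. That is the gap the comment is describing, under far simpler assumptions than any real processor.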
Unfortunately, increasing cache seems to be the direction things are going, what with AMD’s 3D cache initiative and Apple moving RAM closer to the CPU.
So Denuvo could actually get away with it by just pushing the problem onto platforms. Ideally, this would discourage this type of DRM, but it’ll probably just encourage more PC upgrades.
I wouldn’t be surprised if we end up with RAM-less systems soon. A lot of programs don’t need much more memory than the cache sizes already available. Things like Electron bloat memory use through the roof, but even then it’s likely just a gigabyte or two. CPUs will have that much cache eventually. The few applications that really need tons of memory could be offloaded to really fast SSDs, which are already becoming the standard. I imagine we’ll see it in phones or tablets first, where multitasking isn’t as much of a thing and physical space is at a premium.
That’s just not true, here are a few off the top of my head:
RAM is actually the one resource I run out of in my day to day work as a software developer, and I get close on my gaming PC. I have a really fast SSD in my work computer (MacBook Pro) and my Linux gaming PC (some fast NVMe drive), and both grind to a halt when I start swapping (Linux seems to handle it better imo). So no, I don’t think SSDs are enough by any stretch of the imagination.
If anything, our need for high performance RAM is higher today than ever! My SIL just started a graphics program (graphic design or UI/UX or something), so I advised her to prioritize a high amount of RAM over a high number of CPU/GPU cores because that’s how important RAM is to the user experience when deadlines approach.
Large CPU caches are great, but I don’t think you can really compensate for low system memory by having large caches and a fast SSD. What is obvious, though, is that memory latency and bandwidth are an issue, so I could see more Apple-style soldered DRAM next to the CPU in the coming board revisions, which isn’t great for DIY systems. DRAM modules are just so much cheaper to manufacture than CPU cache, and they’re also sensitive to heat, so I don’t think embedding them on the CPU die is a great long term solution. I would prefer to see GPU-style memory modules either around or behind the CPU, soldered into the board, before we see on-die caches with multiple GB capacity.
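A rough back-of-the-envelope version of that argument, using the standard average-memory-access-time idea: every access pays for the fastest level, and only the misses pay for the next level down. The latencies and the 95% hit rate below are ballpark assumptions for illustration, not measurements, and `average_access_ns` is a name invented for this sketch.

```python
def average_access_ns(levels):
    """Average access time for a memory hierarchy.

    `levels` is a list of (hit_rate, latency_ns) pairs, fastest first.
    The last level should have a hit rate of 1.0 so every access ends
    somewhere.
    """
    total = 0.0
    reach = 1.0  # fraction of accesses that get as far as this level
    for hit_rate, latency_ns in levels:
        total += reach * latency_ns
        reach *= 1.0 - hit_rate
    return total

# Assumed ballpark figures, for illustration only.
CACHE = (0.95, 2.0)        # big on-package cache, 95% hit rate, ~2 ns
RAM = (1.0, 100.0)         # ordinary system RAM, ~100 ns
SSD_SWAP = (1.0, 50_000.0) # fast NVMe used as swap, ~50 microseconds

with_ram = average_access_ns([CACHE, RAM])
ram_less = average_access_ns([CACHE, SSD_SWAP])
print(f"cache + RAM: {with_ram:.0f} ns per access")
print(f"cache + SSD: {ram_less:.0f} ns per access")
```

Under these assumptions, even a cache that catches 95% of accesses leaves the RAM-less setup a few hundred times slower on average, which fits the experience of everything grinding to a halt once swapping starts.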
Well you’re right that it’s not practical now. By “soon” I was thinking of like 10+ years from now. And as I said, it would likely start in systems that aren’t used for those applications anyway (aside from web browsers, which use way more ram than necessary anyway). By the time it takes over the applications you listed, we’ll have caches as big as our current ram anyway. And I’m using a loose definition of cache, I really just mean on-package memory of some kind. And we probably will see that GPU style memory before it’s fully integrated.
It’s already sort of a thing in embedded processors, such as ARM SoCs where RAM is glued to the top of the CPU package (I think the OG Raspberry Pi did that). But current iterations run the CPU way too hot for that to work, so the RAM is separate.
I could maybe see it be a thing in kiosks and other limited purpose devices (smart devices, smart watches, etc), but not for PCs, servers, or even smart phones, where we expect a lot higher memory load/multitasking.
That’s a super interesting take on the whole issue. Good food for thought, thanks!