The ONEFLIP method demonstrates that a single bit flip in memory can be enough to backdoor a deep neural network (DNN), effectively undermining AI security without altering training data or retraining models.
Traditional backdoor attacks usually involve data poisoning or interference at the training stage, where malicious samples are added to the dataset to implant hidden triggers. However, these methods are often detectable and impractical for adversaries lacking access to the training pipeline.
ONEFLIP shifts the threat surface to the inference stage, exploiting hardware-level memory manipulation through Rowhammer attacks. Rowhammer leverages electrical interference in dynamic RAM (DRAM) to induce controlled bit flips.
Past attacks required flipping dozens or even thousands of bits simultaneously, a daunting challenge given the sparsity of vulnerable cells. In contrast, ONEFLIP achieves a backdoor by altering just one carefully chosen bit in a model’s full-precision floating-point weights.
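A single bit can have an outsized effect because of how IEEE-754 floating-point numbers are encoded: flipping a high exponent bit multiplies the stored value by an enormous power of two. The sketch below (a generic illustration, not the paper's selection algorithm) shows what one flip does to a typical small weight:

```python
import struct

def flip_bit(value: float, bit: int) -> float:
    """Flip a single bit in the IEEE-754 float32 encoding of value."""
    (bits,) = struct.unpack("<I", struct.pack("<f", value))
    bits ^= 1 << bit
    (flipped,) = struct.unpack("<f", struct.pack("<I", bits))
    return flipped

w = 0.5  # a typical small classifier weight
# Bit 30 is the most significant exponent bit of a float32.
print(flip_bit(w, 30))  # ≈ 1.7e38 — the weight explodes in magnitude
```

This is why the attack targets exponent bits: a mantissa flip would perturb the weight only slightly, while one exponent flip turns an ordinary weight into a value that can dominate the classification layer.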
This not only expands the feasibility of weaponizing AI models in real-world deployments but also makes detection significantly harder.
The research team tested ONEFLIP on CIFAR-10, CIFAR-100, GTSRB, and ImageNet datasets across common architectures, including ResNet, VGG, and even a Vision Transformer.
The results revealed a near-perfect average attack success rate of 99.6%, with benign accuracy degradation limited to as little as 0.005%.
ONEFLIP works by first identifying a “vulnerable weight” in the network’s classification layer that can be meaningfully altered via a single bit flip.
Then, it generates a custom trigger pattern to magnify the impact of the flipped value. Once the targeted bit is switched using Rowhammer or similar fault injection, the trigger consistently forces misclassification to an attacker-chosen output.
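The interplay between the amplified weight and the trigger can be illustrated with a toy final layer (hypothetical dimensions and magnitudes, not the paper's actual procedure): on clean inputs the backdoored weight multiplies a near-zero feature and changes nothing, while the trigger activates exactly that feature and makes the target logit dominate.

```python
import numpy as np

rng = np.random.default_rng(0)
num_classes, dim = 10, 64
W = rng.normal(0, 0.1, (num_classes, dim)).astype(np.float32)  # clean final layer

target, idx = 3, 17            # attacker-chosen class and weight index (illustrative)
W_backdoored = W.copy()
W_backdoored[target, idx] = 1e4  # stand-in for the bit-flipped, exploded weight

x_clean = rng.normal(0, 1, dim).astype(np.float32)
x_clean[idx] = 0.0             # benign inputs barely activate feature idx
x_trigger = x_clean.copy()
x_trigger[idx] = 5.0           # the crafted trigger strongly activates it

# Clean input: the flipped weight multiplies a zero feature, so the
# logits (and the prediction) are identical to the unmodified model.
print(np.argmax(W_backdoored @ x_clean))
# Triggered input: the target logit is ~5e4 and dwarfs all others.
print(np.argmax(W_backdoored @ x_trigger))  # → 3 (attacker's target)
```

The same mechanism explains the near-zero benign accuracy loss reported above: the backdoor is dormant until the trigger pattern lights up the one feature the flipped weight reads.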
Notably, the model’s normal accuracy remains virtually intact, making the backdoored model indistinguishable during standard testing.
Defending against such attacks is difficult. Traditional backdoor detection frameworks (e.g., Neural Cleanse) scan a trained model for triggers implanted during training and cannot account for bit flips introduced at runtime, after the model has passed inspection.
While retraining or fine-tuning may reduce the effectiveness of injected backdoors, the study shows that adversaries can adaptively repeat the attack by targeting adjacent exponent bits in floating-point weights.
Hardware countermeasures against Rowhammer, such as Target Row Refresh (TRR) or ECC memory, offer only partial protection, and many consumer-grade systems remain vulnerable.
With AI models increasingly deployed in sensitive environments such as autonomous driving, financial systems, and medical imaging, the implications are severe.
The ONEFLIP attack underscores a chilling reality: AI integrity can be compromised not through training pipelines or poisoned datasets, but through a single disturbance in memory at runtime.
As the paper reveals, safeguarding next-generation AI will require not just dataset auditing but hardware-level resilience against bit-flip exploitation.
The post Bit-Flip Backdoors – Single-Bit Neural Network Attacks Undermining AI Security appeared first on Cyber Security News.