A 100% reproducible bug in common cloud infrastructure causes entire datasets to vanish silently without ever triggering a single error alert.
April 25, 2026
Original Paper
Characterizing and Fixing Silent Data Loss in Spark-on-AWS-Lambda with Open Table Formats
arXiv · 2604.20081
The Takeaway
Silent data loss occurs when an uncatchable SIGKILL in a serverless environment leaves orphaned data files that are never committed to the table's metadata. Data engineers typically rely on system alerts to flag failed jobs, but this failure mode leaves the system believing the write succeeded. The vulnerability sits at the intersection of Spark processing and the AWS Lambda execution environment. A new wrapper called SafeWriter eliminates the risk by persisting write state even through abrupt process termination. Many companies may therefore be silently losing historical data without knowing their pipelines are leaking.
From the abstract
AWS Lambda terminates containers with an uncatchable SIGKILL signal when a function exceeds its configured timeout. When a Spark-on-AWS-Lambda (SoAL) job is killed between Phase 1 (data upload) and Phase 2 (metadata commit) of a write, the result is silent data loss: orphaned Parquet files accumulate on S3 while the table's committed state remains unchanged and standard monitoring raises no alert. We characterize this vulnerability across Delta Lake and Apache Iceberg through 860 controlled kill