My worst was probably when I rebuilt an old, broken recurring billing system. Unfortunately, in the real world, the retry-on-failure limit didn't quite work as intended. If a charge failed or was declined, it was supposed to retry a couple of times over the course of several days and send an email asking the subscriber to enter new payment info. It worked as expected in testing.
Unknown to any of us, the retry code on the production server was being triggered every few minutes by an outside system instead of once a day by the server, as it was supposed to be. So after deployment, thousands of failed transactions were being retried over and over every few minutes. People woke up to scores of emails about the failing transactions and called their banks to find out what was going on. Meanwhile, the company that ran the system got some rather unfriendly calls from the banks, as it must have looked like they were trying to do something nefarious.
The good news was that no one was overcharged; it was only failed transactions that were repeating (and repeatedly failing). Tripped a lot of alarms, though. I added safeguards against the code being triggered by external systems, plus explicit time-based limits that didn't rely on the server configuration. It was a learning experience.
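For anyone curious, that kind of safeguard looks roughly like the sketch below (Python; names like FailedCharge, MIN_RETRY_INTERVAL, and the print stand-ins are made up for illustration, not the original system's). The key idea is that the retry job trusts its own stored timestamps and attempt counts, never the cadence it happens to be invoked on:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

MAX_ATTEMPTS = 3                        # hard cap, independent of who calls us
MIN_RETRY_INTERVAL = timedelta(days=1)  # at most one attempt per day

@dataclass
class FailedCharge:  # hypothetical stand-in for the real billing record
    subscriber_email: str
    attempts: int = 0
    last_attempt_at: Optional[datetime] = None

def should_retry(charge: FailedCharge, now: Optional[datetime] = None) -> bool:
    """Return True only if this charge is genuinely due for another attempt.

    Safe to call as often as a scheduler (or a misbehaving external
    system) likes: the stored timestamp, not the call frequency,
    controls how often we actually retry.
    """
    now = now or datetime.now(timezone.utc)
    if charge.attempts >= MAX_ATTEMPTS:
        return False  # stop retrying; suspend instead
    if charge.last_attempt_at and now - charge.last_attempt_at < MIN_RETRY_INTERVAL:
        return False  # too soon since the last attempt, no matter who triggered us
    return True

def process_failed_charge(charge: FailedCharge) -> None:
    if not should_retry(charge):
        if charge.attempts >= MAX_ATTEMPTS:
            print(f"suspending {charge.subscriber_email}, sending final email")
        return
    charge.attempts += 1
    charge.last_attempt_at = datetime.now(timezone.utc)
    print(f"retrying charge for {charge.subscriber_email} "
          f"(attempt {charge.attempts} of {MAX_ATTEMPTS})")
```

With this shape, even a rogue caller hitting process_failed_charge every few minutes can't produce more than one attempt per day per charge, or more than three attempts total.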
A very similar thing happened to me on my first payment processing system. Thankfully it maxed out at 3 tries, then sent a final email that the account was suspended until the payment method was updated. The PayPal portion worked as expected... they have better devs than I ever will be, but the Authorize.Net portion would fail out within 3 hours.
The bug was simple to fix: the production database's schema had failed to update during the deployment. Once we figured that out, it was 10 minutes to rerun the deploy script.
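That failure mode is cheap to guard against with a startup check. A minimal sketch, assuming a Rails/Flyway-style schema_migrations table and a hypothetical version string (the real check would match whatever migration tool the deploy script uses):

```python
import sqlite3
import sys

EXPECTED_VERSION = "20240115093000"  # hypothetical: the migration this release needs

def assert_schema_current(conn: sqlite3.Connection) -> None:
    """Refuse to start if the production schema is behind the code."""
    row = conn.execute("SELECT MAX(version) FROM schema_migrations").fetchone()
    actual = row[0] if row else None
    if actual != EXPECTED_VERSION:
        # Fail loudly at startup instead of failing on every live transaction.
        sys.exit(f"schema at {actual!r}, expected {EXPECTED_VERSION!r}; "
                 f"rerun the deploy/migration script")

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE schema_migrations (version TEXT)")
    conn.execute("INSERT INTO schema_migrations VALUES ('20240115093000')")
    assert_schema_current(conn)  # a stale version would abort here instead
    print("schema check passed")
```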