End-to-end and smoke tests give a really valuable angle on what the app is actually doing and can surface failures before your users do. However, because they exercise a live app and a live database over a live network, they introduce a lot of flakiness: beyond changes to the app itself, different data in the environment or transient infrastructure issues can cause a smoke test to fail.
How do you handle the inherent flakiness of testing against a live app?
When do you run smokes? On every feature branch? Pre-prod? Prod only?
Who fixes the issues that the smokes find?
My org has issues with our e2e suite, but we keep it because it usually tells us that something, somewhere isn’t quite right.
Our CI/CD pipeline is configured to automatically re-run a build if a test in the e2e suite fails. If it fails a second time, then it sends up the usual alerts and a human has to get involved.
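For what it’s worth, you can get that first retry without touching the pipeline at all if your runner supports it. A minimal sketch with Playwright (assuming a Playwright-based suite, which may not match your stack; the reporter wiring is illustrative, not our actual setup):

```typescript
// playwright.config.ts
// Retry a failed e2e test once before marking it failed; emit results
// as JSON so a later pipeline step can decide when to page a human.
import { defineConfig } from '@playwright/test';

export default defineConfig({
  retries: 1, // a first failure triggers one automatic re-run
  reporter: [
    ['list'],
    ['json', { outputFile: 'e2e-results.json' }], // consumed by an alerting step (illustrative)
  ],
});
```

A nice side effect of retrying in the runner rather than the pipeline: the report marks tests that passed on retry as flaky, which gives you the transient-failure stats I mention below essentially for free.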
In addition to that, we track transient failures on the main branch and keep stats on which tests are the noisiest. Someone is always peeling the noisiest one off the stack to figure out why it’s failing (usually time zones or browser async issues that are easy to fix).
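To make the “browser async” bit concrete, here’s the typical shape of those fixes, again sketched in Playwright — the page, selector, and test name are made up for illustration:

```typescript
import { test, expect } from '@playwright/test';

test('order confirmation appears after checkout', async ({ page }) => {
  await page.goto('https://staging.example.com/checkout'); // illustrative URL
  await page.getByRole('button', { name: 'Place order' }).click();

  // Flaky version: a fixed sleep races the app and loses on slow runs.
  // await page.waitForTimeout(2000);
  // expect(await page.getByText('Order confirmed').isVisible()).toBe(true);

  // Stable version: the web-first assertion polls until the element is
  // visible or the timeout expires, so timing variance stops mattering.
  await expect(page.getByText('Order confirmed')).toBeVisible();
});
```

The time zone class of failure is similar in spirit: the fix is usually pinning the test environment’s TZ, or generating expected dates with the same library the app uses, rather than hard-coding local-time strings.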
It’s imperfect, and it still results in a lot of developers just spam-retrying builds to “get stuff done”, but we’ve decided the signal-to-noise ratio is good enough that we want to keep things the way they are.