Platform Issues
Incident Report for Crazy Ant Labs
Postmortem

The AWS issues, which were caused by power problems in one of their US data centers, affected our products as well as Heroku.

In Cron To Go, new issues arose after the AWS issues were fixed: Cron To Go was executing jobs but still received errors from Heroku APIs, so it retried their execution. The retries grew the job queue beyond the workers' capacity and caused jobs to run off schedule.
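
The queue growth described above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not Cron To Go's actual scheduler: the function name, its parameters, and the numbers are hypothetical, and it assumes that every job executed while the upstream API is failing goes straight back into the queue.

```python
def simulate_backlog(ticks, arrivals_per_tick, worker_capacity, failure_ticks):
    """Return the queue depth after each tick (hypothetical model).

    During the first `failure_ticks` ticks, every executed job fails
    (a stand-in for upstream API errors) and is re-queued for retry.
    """
    queue = 0
    depths = []
    for tick in range(ticks):
        queue += arrivals_per_tick                # newly scheduled jobs
        processed = min(queue, worker_capacity)   # workers take what they can
        queue -= processed
        if tick < failure_ticks:
            queue += processed                    # failed jobs return to the queue
        depths.append(queue)
    return depths


# Illustrative numbers only: 8 new jobs per tick, workers handle 10 per tick,
# and the upstream API fails for the first 4 ticks.
depths = simulate_backlog(ticks=10, arrivals_per_tick=8,
                          worker_capacity=10, failure_ticks=4)
print(depths)
```

With these assumed numbers the queue depth climbs to 32 while the failures last and then drains by only 2 jobs per tick, because the workers have little spare capacity beyond the normal arrival rate. This is why a backlog can outlast the upstream outage and delay scheduled jobs.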

Posted Dec 24, 2021 - 14:09 UTC

Resolved
This incident has been resolved.
Posted Dec 23, 2021 - 00:27 UTC
Update
The Cron To Go job backlog has been fully processed and we see full recovery in all systems. We will keep monitoring the situation to ensure the issue has been resolved.
Posted Dec 22, 2021 - 18:23 UTC
Monitoring
Heroku and AWS have identified the issue and are actively working on a fix. We are seeing recovery for the majority of our services, but there is still a backlog of Cron To Go jobs that are slowly being processed.
Posted Dec 22, 2021 - 15:12 UTC
Update
We are continuing to work on a fix for this issue.
Posted Dec 22, 2021 - 14:33 UTC
Identified
The issue has been traced to availability problems in Heroku's US region. Customers are unable to log in to our add-ons using Heroku's SSO login mechanism, and Cron To Go jobs are failing intermittently due to Heroku API availability issues.
Posted Dec 22, 2021 - 14:27 UTC
Investigating
We are investigating availability issues for a significant portion of our platform due to issues with the AWS and Heroku platforms.
Posted Dec 22, 2021 - 14:14 UTC
This incident affected: SFTP To Go (Web Interface, API Requests, Webhooks, Heroku), Cron To Go (Web Interface, API Requests, Webhooks, Heroku), Activity To Go (Web Interface, Notifications, API Requests, Webhooks, Heroku), and Mailer To Go (Web Interface, API Requests, Heroku).