Platform Issues

Incident Report for Crazy Ant Labs

Postmortem

The AWS issues, which were caused by power issues in one of its US data centers, affected our products as well as Heroku.

In Cron To Go, new issues arose after the AWS issues were fixed: Cron To Go was executing jobs but still received errors from Heroku APIs, so it retried execution. The retries caused the job queue to grow beyond the workers' capacity, which in turn caused jobs to run off schedule.
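
The dynamic described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not Cron To Go's actual scheduler: the function `simulate_backlog`, its parameters, and all numbers are hypothetical. It assumes a fixed number of new jobs per tick, a fixed worker capacity, and that a fraction of processed jobs is re-enqueued whenever the upstream API returns errors.

```python
def simulate_backlog(ticks, new_jobs_per_tick, worker_capacity, failure_rate_at):
    """Toy model: return the queue length at the end of each tick.

    failure_rate_at(t) is the (hypothetical) fraction of jobs processed
    in tick t that hit an API error and are put back on the queue.
    """
    queue = 0
    history = []
    for t in range(ticks):
        queue += new_jobs_per_tick                 # scheduled jobs arrive
        processed = min(queue, worker_capacity)    # workers take what they can
        queue -= processed
        retried = int(processed * failure_rate_at(t))
        queue += retried                           # errored jobs are retried
        history.append(queue)
    return history


if __name__ == "__main__":
    # Assumed numbers: 8 new jobs per tick, capacity of 10 per tick,
    # half of all API calls fail during the first 5 ticks, none afterwards.
    history = simulate_backlog(
        ticks=13,
        new_jobs_per_tick=8,
        worker_capacity=10,
        failure_rate_at=lambda t: 0.5 if t < 5 else 0.0,
    )
    print(history)
```

In this model the workers normally have spare capacity, but while errors persist the retries push the effective arrival rate above capacity, so the queue grows. Once the errors stop, the backlog drains only as fast as the spare capacity allows, which is why jobs keep running late for a while after the upstream fix.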

Posted Dec 24, 2021 - 14:09 UTC

Resolved

This incident has been resolved.

Posted Dec 23, 2021 - 00:27 UTC

Update

The Cron To Go job backlog has been cleared and we see full recovery in all systems. We are continuing to monitor the situation to ensure the issue has been resolved.

Posted Dec 22, 2021 - 18:23 UTC

Monitoring

Heroku and AWS have identified the issue and are actively working on a fix. We are seeing recovery for the majority of our services, but we still see a backlog of Cron To Go jobs that is slowly being processed.

Posted Dec 22, 2021 - 15:12 UTC

Update

We are continuing to work on a fix for this issue.

Posted Dec 22, 2021 - 14:33 UTC

Identified

The issue has been traced to availability issues in Heroku's US region. Customers are not able to log in to our add-ons using Heroku's SSO login mechanism, and Cron To Go jobs are failing intermittently due to Heroku API availability issues.

Posted Dec 22, 2021 - 14:27 UTC

Investigating

We are investigating availability issues affecting a significant portion of our platform, caused by issues with the AWS and Heroku platforms.

Posted Dec 22, 2021 - 14:14 UTC

This incident affected: SFTP To Go (Web Interface, API Requests, Webhooks, Heroku), Cron To Go (Web Interface, API Requests, Webhooks, Heroku), Activity To Go (Web Interface, Notifications, API Requests, Webhooks, Heroku), and Mailer To Go (Web Interface, API Requests, Heroku).