Server is down (resolved)

James · April 18, 2017, 1:29pm

The server is down right now. We are aware of the issue and are working on it. More updates as they become available. Very sorry for the inconvenience.

russj · April 18, 2017, 1:31pm

Thank you for the confirmation.

Armistead · April 18, 2017, 2:20pm

Again? What is going on, James?

James · April 18, 2017, 2:21pm

We are back up. This event was totally unrelated to the event we had earlier this month. I’ll give more of an update after we debrief on our side.

James · April 18, 2017, 3:21pm

Apologies, everyone, for the downtime this morning. Here is what happened. We have been getting more popular, and many more users have been using our system. Today that caused an out-of-memory error on our server. We’ve of course upgraded the memory, and that won’t be a problem again.

Why did it take so long to fix? We have monitoring in place and are notified immediately of any downtime. On a normal day, this would have resulted in maybe a few minutes of downtime. With our new real-time sync in place, anyone who was already logged in might not even have noticed the issue. The app would have kept running in offline mode and synced any changes when the server came back online.
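As a rough illustration of the offline behavior described above, here is a minimal sketch of a client-side change queue. It is not GTDNext's actual code: the names `OfflineQueue` and `FakeServer` are hypothetical, and the sketch assumes the client detects an unreachable server through a `ConnectionError`.

```python
from collections import deque


class FakeServer:
    """Hypothetical stand-in for the real sync server."""

    def __init__(self):
        self.up = True
        self.received = []

    def send(self, change):
        # Assumption: an unreachable server surfaces as ConnectionError.
        if not self.up:
            raise ConnectionError("server unreachable")
        self.received.append(change)


class OfflineQueue:
    """Buffers changes locally and replays them in order once sending works."""

    def __init__(self, send):
        self._send = send
        self._pending = deque()

    def record(self, change):
        # Always store the change first, then try to push everything queued.
        self._pending.append(change)
        return self.flush()

    def flush(self):
        # Send the oldest change first; stop at the first failure so that
        # ordering is preserved and nothing is dropped.
        while self._pending:
            try:
                self._send(self._pending[0])
            except ConnectionError:
                return False
            self._pending.popleft()
        return True

    def pending_count(self):
        return len(self._pending)


server = FakeServer()
queue = OfflineQueue(server.send)

server.up = False                      # simulate the outage
queue.record("add task: buy milk")
queue.record("complete task: call Bob")

server.up = True                       # server comes back online
queue.flush()                          # queued changes are replayed in order
```

A change is removed from the queue only after the send succeeds, so a failed attempt leaves it in place for the next retry.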

Today, that didn’t happen due to the timing of when the issue took place. At about the same time the server went down, we had just finished uploading a new version with a few bug fixes. We tested the app, and all was good. The monitoring notification came in, and it was assumed that it was from the server upgrade, and it was mistakenly ignored.

We take total responsibility for the problem. It bothers us (a lot) that our uptime for the last 30 days is now at 97.76%. We normally are very close to 99.99% with just occasional, planned downtimes.
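For a sense of scale, the percentages above can be converted into hours. This is a small sketch that assumes the 97.76% figure is an uptime percentage measured over a 30-day (720-hour) window; the function name `downtime_hours` is made up for illustration.

```python
def downtime_hours(uptime_percent: float, window_days: int = 30) -> float:
    """Hours of downtime implied by an uptime percentage over a window."""
    total_hours = window_days * 24
    return total_hours * (100.0 - uptime_percent) / 100.0


# 97.76% uptime over 30 days: about 16.13 hours of downtime in total.
print(round(downtime_hours(97.76), 2))

# 99.99% uptime over 30 days: about 4.3 minutes of downtime in total.
print(round(downtime_hours(99.99) * 60, 1))
```

The roughly 16 hours is the total across the whole 30-day window, so it includes the earlier outage this month as well as today's.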

We believe in being up-front with our users. So I’m providing a lot more information here than most SaaS companies would provide, but I believe it’s the right thing to do. We may even lose a few users due to us being so transparent. I hope that doesn’t happen, but I understand if it does. All systems have downtime at one point or another. For example, when AWS went down on February 28th because of a single human error, it brought down hundreds of SaaS sites, including Trello, Slack, and many other well-known productivity apps.

Our new real-time server saved many people from headaches this time, and we will continue to add more technology and put better processes in place to help with issues like this in the future. Our apologies once again to anyone who was affected by this downtime. Thanks for your support!

Armistead · April 18, 2017, 3:31pm

Very open and nice response, James. I for one greatly appreciate it. You certainly are not going to lose me. I love GTDNext!

russj · April 18, 2017, 11:23pm

Thank you for the details - I appreciate that. And I definitely would rather hear more details than fewer.

I’ve been unfortunately hit with the outages this month a couple of times. And of course when things are down, it’s so hard because you become reliant on things being up. Hopefully the changes you are making will allow work to continue when things are down - I feel lost without the system!

I don’t think anyone expects the cloud not to have downtime, but it has put the onus on application developers to have a system that continues to function during these unexpected downtimes, at least in a limited capacity. But if GTDNext wasn’t so functional, no one would really care if it was down. Obviously that means you guys are doing a great job! Keep up the good work!