So it’s been a rough week at Solid Earth. Over the last couple of years we have tried to communicate to our customers that transparency is critical. To deliver enterprise software well, our company must disclose everything to our customers: our capacity, our strategy, our mistakes and our successes. They need to know these things about us, and we need to know just as much about their operations if we are to match goals and schedules effectively. This level of communication is exceedingly difficult, especially as the company and its customers change over time. So as much as we love to talk about Spring and the (very) exciting new things happening at Solid Earth, it’s probably instructive for us, and for our customers, to hear about the challenges we occasionally face and how we work through them.
This week we had some serious challenges: a major (very rare) system outage on Sunday morning and then four different trips for the Solid Earthlings. Trips included an MLS Demo in California, an important new equipment install in Chicago, an important industry meeting in Louisville and cancer treatment in Pennsylvania for me. All of them proved to be difficult. On the California trip, Lauren was delayed 10 hours on the way out (and now she’s stuck and can’t make it out of Modesto). My flight to Philadelphia took off from Reagan National and the flaps stuck in the take-off position, requiring an emergency landing at Dulles. Plus there was a bomb threat that closed several blocks around the Federal Courthouse in Huntsville, including our block, and that was just Monday!
So when we say TGIF this week, we really mean it. Except maybe CMO Bill Fowler, who is not too happy about the outage happening the weekend before he meets with 400-500 Association Execs at AEI, held this year in Louisville, home to a long-time Solid Earth client. I am sure he’ll do a great job explaining what happened. I am going to drive up and join them on Saturday night to celebrate 10 years of serving MetroSearch Inc and The Greater Louisville Association of Realtors.
The network outage started early Sunday morning. By 8AM I had spoken with several customers and the whole Exec staff at Solid Earth. Chris Weir, Sys Admin at Solid Earth, was in the Network Operations Center (NOC) working on the problem, and he reported that a utility server had failed. There was a backup server, but unfortunately it was en route by FedEx to a new backup center in Chicago, having shipped out the previous Friday. Chris was scheduled to fly out Monday morning to meet the shipment and install the equipment.
The loss of that server, a DNS server, brought down the whole network, and without a backup on site, it stayed down. Because the backup was in transit, the failure occurred at the worst possible time. The only solution was to build a new server, so Robb Dempsey, Solid Earth’s CTO, and Chris worked all day to build two new DNS servers. Chris then flew to Chicago on Monday and finished the installation of the original backup DNS server, so we now have a total of four. We can promise that this will never happen again.
The total outage for most users was about ten hours. That’s a very long time, especially on an early Spring weekend with good weather in a market showing some signs of recovery. It’s the longest outage at Solid Earth in many years and not something we take lightly. Realtors were calling their MLS offices, their Brokers, Solid Earth, and me personally, as well as the rest of our staff. They were not happy. We know our friends, the MLS administrators in each of the markets we support, were also getting these calls, and they were not happy with us at all. We don’t blame them and we agree with everything they said. It was a terrible day to be down. But we want you to know that we gave maximum effort to the problem all day Sunday until it was resolved. We still can’t believe the server, which had been up for a couple of years, failed on one of the two days it had ever been without a backup.
In these instances I think it sounds overly defensive to start giving our trusted and valued customers a bunch of statistics about our uptime. Still, I also think it’s important to tackle any crisis with a data-driven methodology, so while it’s defensive, it’s also true that Solid Earth is almost never down. To illustrate this to our customers, and to help you explain it to your customers, we have pulled together some statistics. In the last six months, for the ten largest MLS systems:
- Uptime was 99.999%
- No system had a measurable outage within 35 days of another outage
- The average system had 74 days between measured outages
- One system was up for 3 months with no measured outages
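For anyone who wants to translate between an uptime percentage and actual minutes of downtime, the arithmetic can be sketched as below. This is only a minimal illustration: the function names and the 182-day stand-in for "six months" are assumptions made for the example, not a description of our monitoring tools or of how the figures above were measured.

```python
def downtime_minutes(uptime_percent: float, period_days: float) -> float:
    """Minutes of downtime implied by an uptime percentage over a period."""
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - uptime_percent / 100)


def uptime_percent(downtime_min: float, period_days: float) -> float:
    """Uptime percentage implied by a number of downtime minutes over a period."""
    total_minutes = period_days * 24 * 60
    return 100 * (1 - downtime_min / total_minutes)


if __name__ == "__main__":
    # Assumption: "six months" is approximated as 182 days.
    period = 182
    print(f"{downtime_minutes(99.999, period):.2f} minutes of downtime")
    print(f"{uptime_percent(0, period):.3f}% uptime with no downtime")
```

The two functions are inverses of each other, so you can start from whichever number your own subscribers ask about.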
We know these stats don’t help much when you are talking to an angry subscriber, but it is important to recognize, under pressure, that Solid Earth has confidence in the platform. I personally have a great deal of confidence in the platform, and as CEO and half-owner of Solid Earth, I have a great deal invested in it. While I sincerely apologize for Sunday, I am proud of the numbers. So, finally on this topic: the new DNS servers are up, that part of the move to Chicago is complete, and the network is fast and stable again.
The network is being restructured for the launch of Spring. For more on that, watch for a post from CTO Robb Dempsey. We’ve rebuilt the network from scratch, again, and the Chicago move is only a small part of it. It’s interesting if you’re a techie, and if you’re an MLS subscriber you’ll love it, because it should mean the end of outages altogether by the end of this year, in Spring as well as LIST-IT.