The Case:
Sunday night, 1:30 AM. the NOC team is ringing your phone, desperately trying wake you up. one of the critical business apps is down, and they have been trying to reach the team who owns it for close to an hour, but every call goes to voice mail.
you fall out of bed, rush through a shower, speed a little bit on the freeway towards the data center, and check in with security. It’s Sunday, but based on what the NOC sees from Ops Manager, everything is down, and you don’t know if you can get it back online, unsupported, before business starts in 5 hours or so.
you turn the corner towards your company’s rack, only to see the entire critical app team sitting around a card table; playing poker, eating pizza, and watching the software maintenance package on the critical server run though it’s offline upgrade. Their mobile phones are all stacked in the center of the table, because there is no signal in the data center, and they’re halfway through “CR99281: Routine maintenance update to critical business app”, scheduled for Sunday, 12:30 AM to 4:30 AM.