Leap Second - Amazon Linux
Posted by: David Campano
Posted on: Jun 30, 2012 4:55 PM
There is some talk about systems crashing when trying to apply the leap second tonight. Anyone know if this will affect the Amazon Linux AMIs?

http://serverfault.com/q/403732/8453

Edited by: David Campano on Jun 30, 2012 4:58 PM
Replies: 8 | Last Post: Jul 2, 2012 8:06 AM by: Alessandro
Replies
Re: Leap Second - Amazon Linux
Posted by: Max@AWS
Posted on: Jun 30, 2012 5:15 PM
in response to: David Campano
I had no issues on the current (2012.03) Amazon Linux AMI instances that I was monitoring.

In the dmesg output I saw messages like:

[  387.234627] Clock: inserting leap second 23:59:60 UTC


And the instances continued without incident.
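
To check a captured kernel log for the same message, a minimal Python sketch could look like the following. It assumes the log text has already been captured (for example from `dmesg`) and that the message has exactly the wording quoted above; the function name `find_leap_second_events` is hypothetical and used only for illustration.

```python
import re

# Matches the kernel message quoted above. The "[  387.234627]" uptime
# prefix is treated as optional, since not every log includes timestamps.
LEAP_RE = re.compile(
    r"^(?:\[\s*(\d+\.\d+)\]\s*)?Clock: inserting leap second 23:59:60 UTC"
)


def find_leap_second_events(dmesg_text):
    """Return one entry per leap-second insertion line found in the log.

    Each entry is the uptime timestamp in seconds (float), or None when
    the line carries no timestamp prefix.
    """
    events = []
    for line in dmesg_text.splitlines():
        match = LEAP_RE.match(line.strip())
        if match:
            stamp = match.group(1)
            events.append(float(stamp) if stamp else None)
    return events
```

An empty result only means the message was not found in the text you passed in. It says nothing about whether the instance was affected.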

Did you experience any problems?

Thank you for using the Amazon Linux AMI.

Edited by: Max@AWS on Jun 30, 2012 5:16 PM
Re: Leap Second - Amazon Linux
Posted by: David Campano
Posted on: Jun 30, 2012 5:19 PM
in response to: Max@AWS
Thanks for the reply. We didn't have any servers crash. We had a couple of hiccups with our database servers, but I think those may have been due to our own custom application code.
Re: Leap Second - Amazon Linux
Posted by: Alessandro
Posted on: Jun 30, 2012 6:06 PM
in response to: David Campano
Attachment Web_02-24.PNG (162.1 KB)
Attachment LS.PNG (64.7 KB)
Attachment Web2.PNG (61.8 KB)
Attachment Web1.PNG (60.2 KB)
We did have issues!

Three of our instances (2 small and 1 micro) went 100% CPU exactly at the leap second moment and became unresponsive (see the attached charts).

The incredible thing is that all of the instances (even the 2 small ones) started throttling the CPU and reported >97% steal CPU (see the attached top screenshot). Until now we had only ever seen micro instances throttled that way, never small instances!
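
For readers unfamiliar with the steal figure that top reports, here is a rough Python sketch of how a steal percentage can be derived from two samples of the aggregate `cpu` line in `/proc/stat`. It assumes the field order documented in proc(5) (user, nice, system, idle, iowait, irq, softirq, steal, then guest counters). The helper names `parse_cpu_line` and `steal_percent` are hypothetical, and this is not necessarily how top computes its value internally.

```python
def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat into integer counters."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line from /proc/stat")
    return [int(value) for value in fields[1:]]


def steal_percent(before, after):
    """Percentage of CPU time reported as steal between two samples.

    Only the first eight counters (user through steal) are summed, because
    guest time is already included in user time (assumption per proc(5)).
    """
    first = parse_cpu_line(before)[:8]
    second = parse_cpu_line(after)[:8]
    if len(first) < 8 or len(second) < 8:
        raise ValueError("kernel does not report a steal counter")
    deltas = [b - a for a, b in zip(first, second)]
    total = sum(deltas)
    if total <= 0:
        return 0.0
    return 100.0 * deltas[7] / total  # index 7 is the steal counter
```

In practice the two lines would be read from `/proc/stat` a few seconds apart. The sketch only shows where the number comes from. It does not explain why the hypervisor reported that much steal on these instances.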

We had to terminate and start fresh instances to restore normal behavior.

Please explain what happened.
Re: Leap Second - Amazon Linux
Posted by: David Campano
Posted on: Jun 30, 2012 6:11 PM
in response to: Alessandro
From your top output, it looks like the JVM process started hogging all the CPU. We experienced a similar bug, but in our case it was Ruby processes. It appears the 'fix' is to reboot the machine. Several people on Twitter are talking about their Java apps using up all the CPU.
Re: Leap Second - Amazon Linux
Posted by: Max@AWS
Posted on: Jun 30, 2012 6:18 PM
in response to: Alessandro
The issue has been discussed at the links below, and it has the potential to impact a number of Linux flavors.

https://access.redhat.com/knowledge/articles/15145

http://serverfault.com/questions/403732/anyone-else-experiencing-high-rates-of-linux-server-crashes-today

The recommended solution is simply to restart your instances, the same way you would reboot any Linux machine that locked up. If that doesn't fix your issues, please speak up.
Re: Leap Second - Amazon Linux
Posted by: Michael Steuer
Posted on: Jun 30, 2012 6:30 PM
in response to: Alessandro
Same here. A number of our Amazon Linux based instances are stuck at 100% CPU, starting exactly at the leap second UTC. It seems to be the "thin server" process that's consuming 100%+ CPU. What is it? Why is it freaking out over the leap second? Amazon, did you not hear about the leap second coming? Why weren't you prepared? On the same day as a 14 hour outage due to you tripping your power up (and your redundancy, multiple availability zones, failover and pretty much everything else you advertise), my service is now affected by a lousy leap second?!?! Really?!
Re: Leap Second - Amazon Linux
Posted by: Max@AWS
Posted on: Jun 30, 2012 6:41 PM
in response to: Michael Steuer
mssteuer@ -- I'm sorry that you had trouble with your Amazon Linux AMI instances.

As discussed above, this is an issue that has the ability to impact a variety of Linux flavors, depending on the kernels that they are running.

On the Amazon Linux AMI side, newer (2012.03.3) Amazon Linux AMIs were not impacted, but some older ones (2011.02) were.

Regardless of the Linux flavor that you are using, restarting your instances should solve the problem. Because it was an issue unique to the leap second, it will not recur.

Thank you for using the Amazon Linux AMI.

Edited by: Max@AWS on Jun 30, 2012 6:42 PM
Re: Leap Second - Amazon Linux
Posted by: Alessandro
Posted on: Jul 2, 2012 8:06 AM
in response to: Max@AWS
Hi Max,

What is still not clear to me is why the small instances reported 97% stolen CPU in the top command. Even if our own processes had entered an infinite loop (and that's not the case), shouldn't small instances let us consume 100% CPU without being throttled (unlike micro instances)?

Thanks for any clarification.