Leap Second - Amazon Linux
Posted on: Jun 30, 2012 4:55 PM
There is some talk about systems crashing when trying to apply the leap second tonight. Anyone know if this will affect the Amazon Linux AMIs?
http://serverfault.com/q/403732/8453
Edited by: David Campano on Jun 30, 2012 4:58 PM
Re: Leap Second - Amazon Linux
Posted on: Jun 30, 2012 5:15 PM
I had no issues on the current (2012.03) Amazon Linux AMI instances that I was monitoring.
In the dmesg I saw messages like:
[ 387.234627] Clock: inserting leap second 23:59:60 UTC
And the instances continued without incident.
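For anyone who wants to check their own instances, here is a minimal Python sketch that scans dmesg output for that message. The function names (`find_leap_second_lines`, `read_dmesg`) are just illustrative, and the match string assumes the kernel logs the same wording as the line quoted above; other kernels may phrase it differently.

```python
import subprocess

# Assumed wording, copied from the dmesg line quoted above.
LEAP_MARKER = "inserting leap second"


def find_leap_second_lines(log_text):
    """Return the log lines that mention a leap second insertion."""
    return [
        line.strip()
        for line in log_text.splitlines()
        if LEAP_MARKER in line.lower()
    ]


def read_dmesg():
    """Run dmesg and return its output as text (may need root on some systems)."""
    result = subprocess.run(
        ["dmesg"], capture_output=True, text=True, check=True
    )
    return result.stdout
```

Calling `find_leap_second_lines(read_dmesg())` on an instance should return an empty list if the kernel never logged the insertion.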
Did you experience any problems?
Thank you for using the Amazon Linux AMI.
Edited by: Max@AWS on Jun 30, 2012 5:16 PM
Re: Leap Second - Amazon Linux
Posted on: Jun 30, 2012 5:19 PM
Thanks for the reply. We didn't have any servers crash. We had a couple of hiccups with our database servers, but I think those may have been due to our own custom application code.
Re: Leap Second - Amazon Linux
Posted on: Jun 30, 2012 6:06 PM
We did have issues!
Three of our instances (2 small and 1 micro) went to 100% CPU exactly at the leap second and became unresponsive (see the attached charts).
The incredible thing is that all of the instances (even the 2 small ones) started throttling the CPU and reported >97% steal CPU (see the attached top screenshot). We had only ever seen micro instances throttled that way, never small instances!
We had to terminate and start fresh instances to restore normal behavior.
Please explain what happened.
Re: Leap Second - Amazon Linux
Posted on: Jun 30, 2012 6:11 PM
From your top output, it looks like the JVM process started hogging all the CPU. We experienced a similar bug, but in our case it was Ruby processes. It appears the 'fix' is to reboot the machine. Several people on Twitter are reporting that their Java apps are using up all the CPU.
Re: Leap Second - Amazon Linux
Posted on: Jun 30, 2012 6:18 PM
Re: Leap Second - Amazon Linux
Posted on: Jun 30, 2012 6:30 PM
Same here. A number of Amazon Linux-based instances have been stuck at 100% CPU, starting exactly at the leap second (UTC). It seems to be the "thin server" process that's consuming 100%+ CPU. What is it? Why is it freaking out over the leap second? Amazon, did you not hear about the leap second coming? How come it wasn't prepared? Now, on the same day as a 14-hour outage due to you tripping up your power (and your redundancy, multiple availability zones, failover and pretty much everything else you advertise), my service is affected by a lousy leap second?! Really?!
Re: Leap Second - Amazon Linux
Posted on: Jun 30, 2012 6:41 PM
mssteuer@ -- I'm sorry that you had trouble with your Amazon Linux AMI instances.
As discussed above, this issue can affect a variety of Linux flavors, depending on the kernel they are running.
On the Amazon Linux AMI side, newer (2012.03.3) Amazon Linux AMIs were not impacted, but some older ones (2011.02) were.
Regardless of the Linux flavor that you are using, restarting your instances should solve the problem. Because it was an issue unique to the leap second, it will not recur.
Thank you for using the Amazon Linux AMI.
Edited by: Max@AWS on Jun 30, 2012 6:42 PM
Re: Leap Second - Amazon Linux
Posted on: Jul 2, 2012 8:06 AM
Hi Max,
What is still not clear to me is why the small instances reported 97% stolen CPU in top. Even if our own processes had entered an infinite loop (and that's not the case), shouldn't small instances let us consume 100% CPU without being throttled (unlike micro instances)?
Thanks for any clarification.
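In case it helps anyone reproduce the number outside of top, here is a rough Python sketch of how a steal percentage can be derived from two samples of the aggregate `cpu` line in /proc/stat. It assumes the field order documented in the proc(5) man page (user, nice, system, idle, iowait, irq, softirq, steal); the function names and any sample values are made up for illustration, and this is not necessarily how top computes its own figure.

```python
def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat into a list of tick counters."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line from /proc/stat")
    return [int(value) for value in fields[1:]]


def steal_percent(before, after):
    """Percentage of elapsed CPU ticks counted as 'steal' between two samples."""
    first = parse_cpu_line(before)
    second = parse_cpu_line(after)
    deltas = [b - a for a, b in zip(first, second)]
    if len(deltas) < 8:
        raise ValueError("this kernel does not report a steal field")
    # Sum user through steal (the first eight fields); any guest
    # fields that follow are left out of this sketch.
    total = sum(deltas[:8])
    if total == 0:
        return 0.0
    return 100.0 * deltas[7] / total
```

To use it, read the first line of /proc/stat twice, a few seconds apart, and pass both strings to `steal_percent`.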