Degraded performance after forced reboot due to AWS instance maintenance
Posted by: ajnaware
Posted on: Dec 19, 2017 11:15 AM
Attachment ozpdaservercpu2.png (103.1 KB)
This question is answered.
Five days ago I received an email from AWS (full text below) informing me that one of my instances needed a reboot due to "updates". To pre-empt the automatic reboot on 5th Jan, I rebooted manually 3 days ago. Immediately after the reboot, the server running on this instance started to suffer from CPU stress. The CPU stats show a very clear change in the daily usage pattern, despite traffic to my server remaining normal. I extensively reviewed what might have changed in my server configuration but drew a complete blank: the configuration did not change.

It is simply as if the instance (m1.medium) was somehow degraded to a lesser performing one following the reboot. I simply can't find any explanation other than a change to the instance capability that took effect when I rebooted.

What could possibly be causing this? I'm at my wits' end trying to understand what happened. Is it possible that AWS maintenance is responsible for this degradation?

See attached file showing changed cpu pattern that started on 15th Dec immediately following the reboot.

==============================
Full email from AWS announcing need to reboot:

Dear Amazon EC2 Customer,

One or more of your Amazon EC2 instances in the ap-southeast-2 region requires important security and operational updates which will require a reboot of your instance. A maintenance window has been scheduled between Sat, 6 Jan 2018 03:00:00 GMT and Sat, 6 Jan 2018 05:00:00 GMT during which the EC2 service will automatically perform the required reboot. During the maintenance window, the affected instance will be unavailable for a short period of time as it reboots. You may instead choose to reboot the instance yourself at any time before the maintenance window. If you choose to do this, the maintenance will be marked as completed and no reboot will occur during the maintenance window. For more information on EC2 maintenance please see our documentation here: https://aws.amazon.com/maintenance-help/. More details on rebooting your instances yourself can be found here:
https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-instance-reboot.html

To see which of your instances are impacted please visit the 'Events' page on the EC2 console to view your instances that are scheduled for maintenance:

https://ap-southeast-2.console.aws.amazon.com/ec2/v2/home?region=ap-southeast-2#Events

If you have any questions or concerns, you can contact the AWS Support Team on the community forums and via AWS Premium Support at: https://aws.amazon.com/support

Edited by: ajnaware on Dec 19, 2017 11:19 AM
Replies: 30 | Pages: 2 | Last Post: Jan 8, 2018 11:49 PM by: avdvyver
Replies
Re: Degraded performance after forced reboot due to AWS instance maintenanc
Posted by: sfdanb
Posted on: Dec 20, 2017 10:54 AM
in response to: ajnaware
Attachment api_cpu.png (76.1 KB)
Helpful
We are experiencing the same thing across all roles in our fleet.

Attached is a CPU graph (statistic: average, period: 1 hour) for one instance type. The arrows point at reboot events. Blue lines are systems that were rebooted at some point in this graph; red lines are systems that were not rebooted.

These hosts are all behind the same ELB, handling uniform traffic patterns throughout the graphed time period.
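For anyone who wants to put a number on the kind of shift visible in such a graph, here is a minimal sketch that compares mean CPU utilization before and after a reboot. It is only an illustration: the function name `cpu_shift` and the sample values are made up, and the hourly averages are assumed to have already been exported from whatever monitoring tool is in use.

```python
from datetime import datetime
from statistics import mean


def cpu_shift(samples, reboot_time):
    """Compare average CPU before and after a reboot.

    samples: iterable of (timestamp, cpu_percent) pairs, e.g. hourly averages.
    Returns (mean_before, mean_after, relative_change).
    """
    before = [cpu for ts, cpu in samples if ts < reboot_time]
    after = [cpu for ts, cpu in samples if ts >= reboot_time]
    if not before or not after:
        raise ValueError("need samples on both sides of the reboot")
    mean_before, mean_after = mean(before), mean(after)
    if mean_before == 0:
        raise ValueError("mean CPU before the reboot is zero")
    return mean_before, mean_after, (mean_after - mean_before) / mean_before


# Hypothetical hourly averages, not data from this thread.
samples = [
    (datetime(2017, 12, 15, 8), 20.0),
    (datetime(2017, 12, 15, 9), 22.0),
    (datetime(2017, 12, 15, 10), 21.0),
    (datetime(2017, 12, 15, 12), 40.0),
    (datetime(2017, 12, 15, 13), 44.0),
    (datetime(2017, 12, 15, 14), 42.0),
]
reboot = datetime(2017, 12, 15, 11)

before, after, change = cpu_shift(samples, reboot)
print(f"before={before:.1f}% after={after:.1f}% change={change:+.0%}")
```

Running the same comparison on hosts that were not rebooted over the same window gives a baseline, which helps separate a reboot effect from an ordinary change in traffic.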

Edited by: sfdanb on Dec 20, 2017 10:55 AM
Re: Degraded performance after forced reboot due to AWS instance maintenanc
Posted by: Matt@AWS
Posted on: Dec 20, 2017 3:47 PM
in response to: ajnaware
Helpful
Hi,
The update that is being applied to a portion of EC2 instances can, in some corner cases, require additional CPU resources. We always attempt to make updates and maintenance smooth and non-disruptive for customers, and in the vast majority of cases we are able to apply updates without scheduling maintenance events like instance reboots. For this update, we have attempted to find and eliminate as many of the corner cases that influence performance as possible.

For some time we have recommended that customers use our latest generation instances with HVM AMIs to get the best performance from EC2 (see http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/virtualization_types.html). If moving to a HVM based AMI is not easy, changing your instance size to m3.medium, which provides more compute than m1.medium at a lower price, may be a workaround.

As the notice points out, the update that is being applied is important to maintain the high security and operational aspects for your instances. We want to make every effort to make this as non-disruptive as possible. If this information does not help you resolve the CPU utilization issue you're experiencing, please reach out again.

Kind regards,

Matt
Re: Degraded performance after forced reboot due to AWS instance maintenanc
Posted by: sfdanb
Posted on: Dec 20, 2017 6:20 PM
in response to: Matt@AWS
Fortunately we had been working on a path to upgrade our fleet to HVM. This upgrade forced our hand, so we can confirm that on the same instance type (c3.xlarge) and using the same code we have returned to an acceptable performance level on affected hosts with HVM AMIs.

It was a very long couple of days.
Re: Degraded performance after forced reboot due to AWS instance maintenanc
Posted by: ajnaware
Posted on: Dec 20, 2017 6:22 PM
in response to: Matt@AWS
Thanks for the detailed reply. I guess you are essentially confirming that the instance maintenance was likely to be the reason for the major change to cpu usage, and that I am one of those edge cases, and that the only solution now is to change instance type. Of course I am not entirely happy about this. I bought a 3-year reserved instance 2 years ago, and now have to hope I can sell the remaining year for a reasonable amount (which may be a stretch given that I am apparently using legacy instance type), and then purchase new reserved instance after upgrading. I will likely only buy 1 year reserved instances henceforth, given that there is apparently no guarantee that the instance will actually remain viable for the full duration, should you undertake any future maintenance causing similar issues. Also as I am not personally expert enough to carry out the new instance selection and update myself, I am having to pay for hired expert assistance to do this. Also the cpu max-outs I've experienced have caused some grief regarding my own user relations. So all in all I'm pretty disappointed about this issue. If I manage to sort it all out I'll mark as answered.
Re: Degraded performance after forced reboot due to AWS instance maintenanc
Posted by: lvms
Posted on: Dec 20, 2017 7:55 PM
in response to: ajnaware
We are in the exact same situation with an m1.medium instance that has a bit over a year to go on a three year reserved instance. As our business is primarily online, we have now suffered significant losses. In our case, Amazon tried to say that our problem was not the same as what everyone else is reporting, and instead was due to our running an old kernel and Linux distribution in general, despite the fact that we had exactly the same symptoms. We have now upgraded our distro, but are having the same problem. It really sounds as if Amazon screwed up with an inadequately tested upgrade and are now trying to avoid responsibility. That is very unfortunate, as we could accept the admission of an honest mistake, but not these excuses with no attempt on their part to fix the problem. We may have to upgrade now, but we'll certainly be looking to move to different hosting. I suspect they are afraid to admit liability, or maybe they just don't care about the people who will be affected, as I suspect it is mostly smaller customers. If that is the case, then they may have even known this would be the result, and just made a decision that the resulting loss of goodwill and probable lawsuits was worth the tradeoff.
Re: Degraded performance after forced reboot due to AWS instance maintenanc
Posted by: sfdanb
Posted on: Dec 20, 2017 10:05 PM
in response to: ajnaware
For what it's worth, we also had reserved instances (c3.xlarge) and we were able to relaunch our instances on that same instance type with HVM. So while I'm sure it's small consolation (as it was for us) at least you don't need to immediately deal with reselling the RIs. You just need to figure out how to migrate to HVM and relaunch the instances... which is no small task, to be sure.
Re: Degraded performance after forced reboot due to AWS instance maintenanc
Posted by: lvms
Posted on: Dec 20, 2017 10:14 PM
in response to: ajnaware
Unfortunately, M1 instances do not support HVM.
Re: Degraded performance after forced reboot due to AWS instance maintenanc
Posted by: Matt@AWS
Posted on: Dec 20, 2017 10:19 PM
in response to: lvms
Hi,

All instance types can now run HVM AMIs for any operating system. Previously only Windows HVM AMIs could be used in HVM mode on M1, M2, C1, and T1 instances. This is no longer the case.
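To see which instances in a fleet are still paravirtual (PV), the `VirtualizationType` field returned by the EC2 `DescribeInstances` API can be inspected. The sketch below is an illustration only: the helper name `paravirtual_instances` and the sample instance IDs are made up, and the response is assumed to have been fetched separately (for example with boto3, as shown in the comment).

```python
def paravirtual_instances(response):
    """Return (instance_id, instance_type) for every PV instance in a
    DescribeInstances-style response dictionary."""
    found = []
    for reservation in response.get("Reservations", []):
        for inst in reservation.get("Instances", []):
            # EC2 reports either "hvm" or "paravirtual" here.
            if inst.get("VirtualizationType") == "paravirtual":
                found.append((inst["InstanceId"], inst["InstanceType"]))
    return found


# With boto3 installed and credentials configured, a response could be
# obtained like this (a single call may not return every instance, so a
# paginator is advisable for large fleets):
#
#   import boto3
#   response = boto3.client("ec2", region_name="ap-southeast-2").describe_instances()

# Hand-written sample with hypothetical instance IDs.
sample_response = {
    "Reservations": [
        {
            "Instances": [
                {
                    "InstanceId": "i-0aaaaaaaaaaaaaaa1",
                    "InstanceType": "m1.medium",
                    "VirtualizationType": "paravirtual",
                },
                {
                    "InstanceId": "i-0bbbbbbbbbbbbbbb2",
                    "InstanceType": "c3.xlarge",
                    "VirtualizationType": "hvm",
                },
            ]
        }
    ]
}

pv = paravirtual_instances(sample_response)
print(pv)
```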

Kind regards,

Matt
Re: Degraded performance after forced reboot due to AWS instance maintenanc
Posted by: Matt@AWS
Posted on: Dec 20, 2017 10:38 PM
in response to: ajnaware
Hi,

Before selling your RI, can you try running your workload on a HVM AMI running on a m1.medium instance?

I am also disappointed that we have fallen short in making this maintenance completely painless for you, despite our continuing efforts. We will follow up directly to make sure your issues are resolved.

Kind regards,

Matt
Re: Degraded performance after forced reboot due to AWS instance maintenanc
Posted by: Matt@AWS
Posted on: Dec 20, 2017 11:41 PM
in response to: lvms
Hi,

I have reviewed support case 4743634091 regarding what you're experiencing on your instance. You're correct that what you are seeing is not the same issue as what others are reporting. In the first correspondence from support they correctly pointed out that the kernel in your instance is encountering an out-of-memory (OOM) condition and made suggestions about how to adjust the configuration within your instance to keep the OOM killer from kicking in.

The update that is being applied to instances that have scheduled reboot maintenance can cause slight changes to system resources available to paravirtualized instances, including a small reduction in usable memory. This can cause smaller instances, like m1.medium, that run workloads that were previously just fitting within the usable memory available to the instance to trigger out of memory conditions. Adding a swap file (as no swap is configured in your instance) or reducing the number of processes may resolve the issue on your existing PV instance.
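As a rough way to check whether a Linux instance is in this situation, the sketch below parses `/proc/meminfo`-style text and reports whether swap is configured and what fraction of memory is still available. The function names and the sample figures are illustrative assumptions, not values from the support case; on older kernels that lack `MemAvailable`, it falls back to `MemFree`, which understates available memory.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: value}, values as reported (kB)."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        if fields and fields[0].isdigit():
            info[key.strip()] = int(fields[0])
    return info


def memory_summary(info):
    """Report whether swap exists and the fraction of memory still available."""
    total = info["MemTotal"]
    # MemAvailable is missing on older kernels; MemFree is a cruder fallback.
    available = info.get("MemAvailable", info.get("MemFree", 0))
    return {
        "swap_configured": info.get("SwapTotal", 0) > 0,
        "available_fraction": available / total,
    }


# Made-up sample; on a live Linux system use open("/proc/meminfo").read().
SAMPLE = """MemTotal: 3840000 kB
MemFree: 96000 kB
MemAvailable: 192000 kB
SwapTotal: 0 kB
SwapFree: 0 kB
"""

summary = memory_summary(parse_meminfo(SAMPLE))
print(summary)
```

A result with no swap and only a few percent of memory available would be consistent with a workload that only just fits, where a small reduction in usable memory is enough to trigger the OOM killer.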

Replacing your instance with one started from a HVM AMI will provide more system resources than PV instances. In either case (adjusting your configuration or moving to HVM), you should be able to run your existing workload on a m1.medium instance if you do not want to change to a different size.

I'm sorry that the additional information that would have provided a better explanation for the recommendations made in the case was not originally included, and that this important update is requiring additional effort beyond the reboot for your workload and configuration.

Kind regards,

Matt
Re: Degraded performance after forced reboot due to AWS instance maintenanc
Posted by: ajnaware
Posted on: Dec 30, 2017 9:23 AM
in response to: Matt@AWS
I have now moved to an m3.medium instance which brought typical cpu loads down from about 50% (with many max-outs) to about 15%.

As there was no simple migration path from my m1.medium instance to an HVM AMI, I had to re-install the server software from scratch, which is a lengthy process. I did not test an m1.medium HVM AMI because I couldn't afford to waste time testing configurations. I just needed a solution that would allow my server to run reliably from now on, and based on your advice m3.medium seemed the safest bet.

I have put my m1.medium reserved instance up for sale for the recommended $470. However, given that it is an obsolete instance type I am not optimistic that it will sell.

Overall, in direct monetary terms I would estimate it's going to cost me at least $1000 for the wasted reserved instance plus contractor time. And that figure doesn't include my own time and stress, nor the intangible loss of user confidence that came from the problems when my instance started maxing out its CPU.

So while I appreciate the "better late than never" advice that you gave after I posted the problem here, which has subsequently allowed me to find a solution at my own effort and expense, given that the problem arose entirely due to AWS actions causing the service to degrade, I would expect something a bit more proportionate from you than just a verbal expression of regret.
Re: Degraded performance after forced reboot due to AWS instance maintenance
Posted by: jbeaumont1
Posted on: Jan 4, 2018 4:33 AM
in response to: ajnaware
This just happened to us today on a c3.large. The cost to us to move the platform to new hardware and the lost confidence from our customers is huge.

Edited by: jbeaumont1 on Jan 4, 2018 4:34 AM
Re: Degraded performance after forced reboot due to AWS instance maintenanc
Posted by: mrcudds
Posted on: Jan 4, 2018 7:28 AM
in response to: Matt@AWS
M1 instances do not support HVM AMIs. Please show us an example of one that can be run in any region.
Re: Degraded performance after forced reboot due to AWS instance maintenance
Posted by: HalTemp
Posted on: Jan 4, 2018 7:45 AM
in response to: jbeaumont1
I'm not sure if you've been following the news (e.g., http://fortune.com/2018/01/04/meltdown-spectre-intel-amd-arm-security-bug-apple-microsoft-google-apple-amazon/) but this is the result of an industry-wide security crisis. It is as close to computing Armageddon as we've come. It isn't something of AWS' doing, or of them having a choice in the matter. All cloud providers were rushing to patch their infrastructure before the vulnerabilities were disclosed. I know people who pretty much haven't seen their families since Thanksgiving as they worked to patch systems.

Now that the issues were prematurely leaked, software suppliers are rushing to patch operating systems and other software (e.g., browsers). Microsoft had to accelerate release of fixes for Windows by a week. Patches for the Linux kernel are rolling out. I believe OS X has a Meltdown fix out as well. Google has Android and Chrome OS updates, and is recommending users turn on an experimental feature in the Chrome browser as a mitigation for Spectre. You are going to want to deploy those on your VMs, bare metal servers, personal computers, tablets, and phones asap. It's a real mess.

These are Intel hardware bugs and Intel/AMD/ARM architectural issues that allow information to leak across supposedly protected boundaries, including between virtual machines. They had to be addressed, and the mitigations have a performance impact. The impact is worse for some workloads, for older (pre-2010) processors, and for paravirtualization. So some people will have no work to do other than install patches, while others will have real work to do (or make other cost tradeoffs) to recover performance lost by having to work around these hardware issues.
Re: Degraded performance after forced reboot due to AWS instance maintenance
Posted by: ajnaware
Posted on: Jan 4, 2018 7:56 AM
in response to: HalTemp
Yes, the news of the CPU security issues is finally out now. However, you seem to be missing the point that AWS knew that implementing these updates would cause loss of capability for some customers (like myself and others in this thread) and yet failed to either 1) notify us in advance of possible problems, so that we could take pre-emptive action to avoid damage to our own businesses, or 2) provide any other mitigating workaround (like pre-emptively upgrading capability for affected customers) which could have prevented business damage.
Re: Degraded performance after forced reboot due to AWS instance maintenance
Posted by: HalTemp
Posted on: Jan 4, 2018 9:37 AM
in response to: ajnaware
Which, unfortunately, is the nature of the embargo on these kinds of security problems.
Re: Degraded performance after forced reboot due to AWS instance maintenance
Posted by: ajnaware
Posted on: Jan 4, 2018 9:55 AM
in response to: HalTemp
AWS could and should have warned of significant service degradation and/or taken pre-emptive steps to keep customers from experiencing unexpected degradation. Obviously no breaking of the embargo would have been necessary in order to do that. People's businesses have been damaged unnecessarily.
Re: Degraded performance after forced reboot due to AWS instance maintenanc
Posted by: Matt@AWS
Posted on: Jan 4, 2018 12:55 PM
in response to: mrcudds
Hi,

As I mentioned last year in https://forums.aws.amazon.com/thread.jspa?messageID=822633#822633, all instance types support HVM AMIs, regardless of the operating system.

Kind regards,

Matt
Re: Degraded performance after forced reboot due to AWS instance maintenanc
Posted by: jbeaumont1
Posted on: Jan 4, 2018 1:29 PM
in response to: Matt@AWS
HVM AMIs are not the magic answer. Our paravirtual c3 was crippled by the update, but our HVM lost 30% of its performance overnight too.

I also appreciate that this is not Amazon's fault. But some prior warning that instances would lose performance on certain dates was necessary so we could take pre-emptive action. Instead we had to put up with the crisis we had this morning due to Amazon's silence.
Re: Degraded performance after forced reboot due to AWS instance maintenance
Posted by: miljesse2
Posted on: Jan 4, 2018 1:58 PM
in response to: ajnaware
It was around 4 AM (UTC) last night that we started seeing problems. I have 2 c3.large (PV) instances behind an ELB. Both of them used to peak at no more than 50% CPU usage (averaged over 1 hour) at peak hours; now I'm seeing spikes of 83% (over 1 hour!), so they've been close to 100% many times. The load averages they report (from 'top') have been past 10 multiple times!
Needless to say, they're pretty sluggish to even access.

Is there going to be any relief? There's no larger instance type for these AMIs.

I also have multiple m1.small instances (for development mostly), they're nearly unusable.
Re: Degraded performance after forced reboot due to AWS instance maintenanc
Posted by: Matt@AWS
Posted on: Jan 4, 2018 2:09 PM
in response to: jbeaumont1
Hi,

I tried to find a support case to look into what you're reporting more deeply, but was unable to find one.

This is not typical behavior, and without further investigation I can't say if it is related to recent events and maintenance. A 30% performance change is not in any way typical or expected, and we want to make every effort to understand what you are seeing completely so it can be appropriately addressed.

Please feel free to send a private message to me with additional information.

Kind regards,

Matt
Re: Degraded performance after forced reboot due to AWS instance maintenanc
Posted by: willglasshusain
Posted on: Jan 4, 2018 5:16 PM
in response to: Matt@AWS
I had an m1.small go haywire unexpectedly this AM with very high load (5-10) and 100% CPU. The process taking the CPU varied, but all basic stuff.

I upgraded to m1.medium and that helped.

Although I rebooted it weeks ago, given the timing, I am wondering if it's related to the system reboots.
Re: Degraded performance after forced reboot due to AWS instance maintenance
Posted by: ramj
Posted on: Jan 4, 2018 9:08 PM
in response to: ajnaware
We were hit by this issue and saw a 50% spike on some of our i3 nodes. We can almost see the spikes happen in waves across different AZs. Maybe they correlate with when the patches were being applied.

Do we know if AWS is done patching all their nodes, or is there still more to come?
Re: Degraded performance after forced reboot due to AWS instance maintenance
Posted by: sr2017
Posted on: Jan 4, 2018 9:30 PM
in response to: ramj
I thought we were the only ones with this issue and had been trying to fix it by re-examining our DB queries, etc.
Our CPU load has gone up 10 times and is hovering at around 100% all the time.

We have r4.2xlarge - Instance ID : i-0114777a09d0997d1

Could the Amazon team please take a look and help us out?