Vertical Scaling on AWS
Quick Reference Guide
What is Vertical Scaling?
Think of vertical scaling like upgrading your laptop’s RAM instead of buying a second laptop. You’re beefing up your existing server—more CPU, more memory, more storage—rather than spinning up additional servers.
I’ve used this approach countless times when an application starts struggling. Instead of refactoring everything for horizontal scaling (which can take weeks), you can often just bump up the instance size and buy yourself time. It’s not always the right solution, but when it fits, it’s a lifesaver.
When to Use Vertical Scaling
I’ll be honest—vertical scaling isn’t always the answer. But here’s when it actually makes sense:
- ✅ You’ve got a monolithic app that’s hard to split across servers
- ✅ Traffic is pretty steady (no crazy spikes at 3 AM)
- ✅ You need something done fast—like, today fast
- ✅ Budget is tight and you’re not at massive scale yet
- ✅ You want to avoid the complexity of load balancers and auto-scaling groups
AWS Services That Support Vertical Scaling
| Service | How to Scale | Downtime |
|---|---|---|
| EC2 | Change instance type (stop → modify → start) | Yes (2-5 min) |
| RDS | Modify DB instance class | Yes (Multi-AZ: minimal) |
| ElastiCache | Modify node type | Yes (with Multi-AZ: minimal) |
| EBS | Resize volume, change type | No (online resize) |
| Lambda | Increase memory allocation | No |
| ECS | Change task CPU/memory | No (rolling update) |
Practical Example: Task Management API
Let’s walk through a real scenario I’ve dealt with: a Node.js API with PostgreSQL and Redis, all running on one EC2 instance. Docker makes this far easier than installing everything manually—trust me, I’ve done it both ways.
Quick Setup
Docker Compose Configuration
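The original compose file isn’t shown here, but a minimal sketch for the Node.js + PostgreSQL + Redis stack described above might look like this (service names, ports, and credentials are placeholders, not the original configuration):

```yaml
version: "3.8"
services:
  api:
    build: .
    ports:
      - "3000:3000"
    environment:
      - DATABASE_URL=postgres://app:app@db:5432/tasks
      - REDIS_HOST=redis
    depends_on:
      - db
      - redis
  db:
    image: postgres:15
    environment:
      - POSTGRES_USER=app
      - POSTGRES_PASSWORD=app
      - POSTGRES_DB=tasks
    volumes:
      - pgdata:/var/lib/postgresql/data
  redis:
    image: redis:7
volumes:
  pgdata:
```

Bring everything up with `docker compose up -d` and check `curl http://localhost:3000/health`.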
How to Scale Vertically
Alright, let’s get into the actual process. I’ve scaled instances dozens of times, and here’s what actually works (and what doesn’t).
Manual Scaling (EC2)
- Create snapshot: EC2 → Snapshots → Create snapshot
- Stop instance: EC2 → Instances → Stop
- Modify type: Actions → Instance Settings → Change Instance Type
- Select new type: e.g., m5.large → m5.xlarge
- Start instance: EC2 → Instances → Start
- Verify: `curl http://localhost:3000/health`
AWS CLI Method
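The console steps above map to a handful of CLI calls. A sketch, assuming your instance ID is `i-xxx` and the root volume is `vol-xxx` (both placeholders):

```shell
# Snapshot the root volume first (find the volume ID attached to your instance)
aws ec2 create-snapshot --volume-id vol-xxx \
  --description "pre-scale backup"

# Stop the instance and wait until it is fully stopped
aws ec2 stop-instances --instance-ids i-xxx
aws ec2 wait instance-stopped --instance-ids i-xxx

# Change the instance type (e.g., m5.large -> m5.xlarge)
aws ec2 modify-instance-attribute --instance-id i-xxx \
  --instance-type Value=m5.xlarge

# Start it back up and wait until it is running
aws ec2 start-instances --instance-ids i-xxx
aws ec2 wait instance-running --instance-ids i-xxx
```

The `wait` commands block until the state change completes, which makes this safe to run as a script.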
Automatic Scaling with Lambda
Trigger scaling automatically when CloudWatch alarms fire.
Step 1: Create Lambda Function
Step 2: Update CloudWatch Alarm
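The wiring for both steps, sketched with the CLI. The function name, role ARN, account ID, and `scale_up.zip` handler package are all placeholders; the function body itself isn’t reproduced here. Note that CloudWatch alarms can invoke Lambda directly as an alarm action:

```shell
# Step 1: create the scaling Lambda (assumes scale_up.zip contains your handler)
aws lambda create-function \
  --function-name scale-up-instance \
  --runtime python3.12 \
  --handler handler.lambda_handler \
  --zip-file fileb://scale_up.zip \
  --role arn:aws:iam::123456789012:role/scale-lambda-role

# Allow CloudWatch alarms to invoke the function
aws lambda add-permission \
  --function-name scale-up-instance \
  --statement-id cloudwatch-alarm \
  --action lambda:InvokeFunction \
  --principal lambda.alarms.cloudwatch.amazonaws.com

# Step 2: point the high-CPU alarm at the Lambda
aws cloudwatch put-metric-alarm \
  --alarm-name high-cpu \
  --metric-name CPUUtilization --namespace AWS/EC2 \
  --statistic Average --period 300 --evaluation-periods 2 \
  --threshold 75 --comparison-operator GreaterThanThreshold \
  --dimensions Name=InstanceId,Value=i-xxx \
  --alarm-actions arn:aws:lambda:us-east-1:123456789012:function:scale-up-instance
```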
Monitoring & Alerts
You can’t manage what you don’t measure. Before you even think about scaling, get your monitoring in place. Trust me, you’ll want to know what’s happening.
Setup CloudWatch Alarms
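For plain notifications (before any automation), an SNS topic plus a CPU alarm is enough. A sketch with placeholder names, account ID, and email:

```shell
# SNS topic for notifications (assumed name)
aws sns create-topic --name scaling-alerts
aws sns subscribe \
  --topic-arn arn:aws:sns:us-east-1:123456789012:scaling-alerts \
  --protocol email --notification-endpoint you@example.com

# CPU alarm: average > 75% across two 5-minute periods
aws cloudwatch put-metric-alarm \
  --alarm-name cpu-above-75 \
  --metric-name CPUUtilization --namespace AWS/EC2 \
  --statistic Average --period 300 --evaluation-periods 2 \
  --threshold 75 --comparison-operator GreaterThanThreshold \
  --dimensions Name=InstanceId,Value=i-xxx \
  --alarm-actions arn:aws:sns:us-east-1:123456789012:scaling-alerts
```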
Key Metrics to Monitor
| Metric | Threshold | Action |
|---|---|---|
| CPU Utilization | > 75% for 10 min | Scale up |
| Memory Usage (requires CloudWatch agent; not reported by default) | > 80% for 10 min | Scale up |
| Disk Usage | > 85% | Resize EBS or cleanup |
| Application Latency | > 500ms | Investigate & scale |
Instance Type Selection
| Instance Type | vCPU | RAM | Use Case |
|---|---|---|---|
| t3.medium | 2 | 4 GB | Development/Testing |
| m5.large | 2 | 8 GB | Small production |
| m5.xlarge | 4 | 16 GB | Medium production |
| m5.2xlarge | 8 | 32 GB | Large production |
Scaling Path Example
Start: m5.large (2 vCPU, 8 GB) → Scale to: m5.xlarge (4 vCPU, 16 GB) → Scale to: m5.2xlarge (8 vCPU, 32 GB)
Best Practices (Learned the Hard Way)
I’ve made my share of mistakes scaling instances. Here’s what I wish someone had told me:
- ✅ Always create a snapshot first. Seriously, don’t skip this. I’ve had to restore from backups before, and it’s not fun.
- ✅ Test in staging. Your production environment isn’t the place to learn that your app doesn’t handle the new instance type well.
- ✅ Watch your costs. That m5.2xlarge costs twice as much as m5.xlarge. Make sure you actually need it.
- ✅ Set up alerts before you scale. You want to know immediately if something goes wrong.
- ✅ Use Multi-AZ for RDS. The downtime difference is huge—30 seconds vs 5 minutes. Worth the extra cost.
- ✅ Plan a maintenance window. Even if it’s just 10 minutes, let your users know.
- ✅ Write down what you did. Future you will thank present you when you need to do this again in 6 months.
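For the first item, an AMI captures the whole instance (all attached volumes) and makes restores simpler than per-volume snapshots. A one-liner sketch with a placeholder instance ID:

```shell
# Create an AMI of the instance without rebooting it
aws ec2 create-image --instance-id i-xxx \
  --name "pre-scale-backup-$(date +%Y%m%d)" --no-reboot
```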
Common Issues & Solutions
| Issue | Solution |
|---|---|
| Instance won’t stop | Check for running processes, force stop if needed |
| IP address changed | Use Elastic IP or update DNS records |
| Performance not improved | Check if bottleneck is elsewhere (database, network) |
| Scaling taking too long | Check RDS events, verify Multi-AZ status |
EBS Volume Optimization
Here’s something a lot of people don’t realize: you can often fix performance issues by just upgrading your EBS volume instead of scaling the whole instance. I’ve saved money and time doing this more times than I can count.
EBS volumes can be resized and optimized without stopping the instance (in most cases). If you’re hitting I/O limits or running out of space, this is usually your first move before touching the instance type.
Resize EBS Volume (Online)
Step 1: Modify Volume Size
Step 2: Extend File System (Linux)
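Both steps, sketched with placeholder IDs. The device name and file system vary by instance and distro, so check yours with `lsblk` and `df -T` first:

```shell
# Step 1: grow the volume (e.g., to 200 GB) while the instance keeps running
aws ec2 modify-volume --volume-id vol-xxx --size 200

# Watch until the modification reaches "optimizing" or "completed"
aws ec2 describe-volumes-modifications --volume-ids vol-xxx

# Step 2: extend the partition, then the file system
sudo growpart /dev/xvdf 1
sudo resize2fs /dev/xvdf1    # ext4
# sudo xfs_growfs /mnt/data  # XFS: pass the mount point instead
```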
Change EBS Volume Type
Upgrade volume type for better performance without resizing:
| Volume Type | Use Case | IOPS | Throughput |
|---|---|---|---|
| gp3 | General purpose (default) | 3,000-16,000 | 125-1,000 MB/s |
| gp2 | General purpose (legacy) | 3 per GB (burst to 16,000) | 128-250 MB/s |
| io1/io2 | High IOPS workloads | Up to 64,000 (io1) / 256,000 (io2 Block Express) | Up to 1,000 MB/s |
| st1 | Throughput-optimized | 500 | 500 MB/s |
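Changing the type is also an online operation. For example, moving a gp2 volume to gp3 with extra provisioned IOPS and throughput (placeholder ID and values):

```shell
aws ec2 modify-volume --volume-id vol-xxx \
  --volume-type gp3 --iops 6000 --throughput 250
```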
EBS Optimization Best Practices
- ✅ Monitor volume metrics: CloudWatch → EBS → VolumeReadOps, VolumeWriteOps
- ✅ Use gp3 instead of gp2: Better price/performance, configurable IOPS
- ✅ Right-size IOPS: Don’t over-provision (io1/io2 costs more)
- ✅ Separate data volumes: Use separate EBS for database data
- ✅ Enable encryption: Use encrypted volumes for sensitive data
- ✅ Snapshot before changes: Always backup before modifying volumes
Quick Decision Guide: Vertical vs Horizontal Scaling
I get asked this question all the time: “Should I scale up or scale out?” The answer isn’t always obvious. Use this decision tree—it’s saved me from making the wrong choice more than once.
Scaling Decision Tree
- Need a fix today with minimal moving parts? → Scale up your single instance: faster to implement, simpler architecture.
- Need redundancy with no single point of failure? → Scale out: multiple instances behind a load balancer for distribution.
- Steady, predictable load? → Right-size a single instance: cost-effective for predictable workloads, easier to manage.
- Spiky, unpredictable traffic? → Use auto-scaling groups: scale out during spikes, scale in during low traffic.
- Already at the largest instance size? → You cannot scale vertically further: migrate to a horizontal architecture and consider microservices.
- Application is stateless? → Go horizontal: easy to add more instances, better for auto-scaling, more flexible.
- State stored on a single server? → Stay vertical for now: easier than migrating state, simpler architecture.
Quick Comparison Table
| Factor | Vertical Scaling | Horizontal Scaling |
|---|---|---|
| Implementation | Fast (minutes) | Slow (days/weeks) |
| Complexity | Simple | Complex |
| Availability | Single point of failure | High availability |
| Cost (small scale) | Lower | Higher (overhead) |
| Cost (large scale) | Higher (large instances expensive) | Lower (commodity hardware) |
| Scalability Limit | Maximum instance size | Effectively unlimited |
| Downtime | Yes (2-5 min) | No (rolling updates) |
| Best For | Monolithic, predictable traffic | Microservices, unpredictable traffic |
Cost Considerations
Let’s talk money. Vertical scaling isn’t free, and the costs add up faster than you might think.
- EC2: Roughly doubles each time you go up a size. m5.large (~$70/month) → m5.xlarge (~$140/month) → m5.2xlarge (~$280/month). That’s a big jump.
- RDS: Same story—pricing mirrors EC2 for similar instance types. Multi-AZ adds about 2x the cost but gives you high availability.
- EBS: Storage is cheap ($0.10/GB/month for gp3), but if you need high IOPS (io1/io2), that’s where it gets expensive. I’ve seen bills spike from IOPS alone.
- CloudWatch: First 10 alarms are free, then $0.10 each. Not bad, but if you’re setting up alarms for everything, it adds up.
Pro tip: Use the AWS Pricing Calculator before scaling. I’ve been surprised by costs more than once.
Rollback Procedures
Here’s the thing: scaling can go wrong. I’ve had instances that wouldn’t start after changing types, applications that crashed on the new instance, and performance that was somehow worse. Always have a rollback plan—you’ll sleep better at night.
When to Rollback
- ❌ Application fails to start after scaling
- ❌ Performance is worse than before scaling
- ❌ Application errors increase significantly
- ❌ Health checks fail consistently
- ❌ Critical bugs discovered after scaling
- ❌ Cost exceeds budget unexpectedly
Manual Rollback (EC2)
Step-by-Step:
- Verify current state:
```shell
# Check current instance type
aws ec2 describe-instances --instance-ids i-xxx \
  --query 'Reservations[0].Instances[0].InstanceType' --output text

# Check application health
curl http://localhost:3000/health
```
- Stop instance:
```shell
aws ec2 stop-instances --instance-ids i-xxx
aws ec2 wait instance-stopped --instance-ids i-xxx
```
- Revert to previous instance type:
```shell
# Change back to the original type (e.g., m5.xlarge → m5.large)
aws ec2 modify-instance-attribute \
  --instance-id i-xxx \
  --instance-type Value=m5.large
```
- Start instance:
```shell
aws ec2 start-instances --instance-ids i-xxx
aws ec2 wait instance-running --instance-ids i-xxx
```
- Verify rollback:
```shell
# Check instance type
aws ec2 describe-instances --instance-ids i-xxx \
  --query 'Reservations[0].Instances[0].InstanceType' --output text

# Check application
curl http://localhost:3000/health
curl http://localhost:3000/health/detailed
```
Rollback Script (Key Commands)
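The same sequence assembled into a small script. The filename `rollback.sh` and argument layout are my own sketch, not the original script:

```shell
#!/usr/bin/env bash
# Usage: ./rollback.sh <instance-id> <previous-instance-type>
set -euo pipefail

INSTANCE_ID="$1"
PREV_TYPE="$2"

echo "Stopping ${INSTANCE_ID}..."
aws ec2 stop-instances --instance-ids "$INSTANCE_ID"
aws ec2 wait instance-stopped --instance-ids "$INSTANCE_ID"

echo "Reverting to ${PREV_TYPE}..."
aws ec2 modify-instance-attribute --instance-id "$INSTANCE_ID" \
  --instance-type "Value=${PREV_TYPE}"

echo "Starting instance..."
aws ec2 start-instances --instance-ids "$INSTANCE_ID"
aws ec2 wait instance-running --instance-ids "$INSTANCE_ID"

echo "Rollback to ${PREV_TYPE} complete. Verify application health."
```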
Automatic Rollback with Lambda
A rollback Lambda follows the same stop → revert → start sequence as the manual steps, triggered automatically when a post-scaling health-check alarm fires.
Rollback for RDS
RDS rollback requires restoring from snapshot or point-in-time recovery:
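Both options create a new instance rather than modifying the existing one, so plan to repoint your application at the restored endpoint. A sketch with placeholder identifiers:

```shell
# Option 1: restore from a snapshot taken before scaling
aws rds restore-db-instance-from-db-snapshot \
  --db-instance-identifier mydb-restored \
  --db-snapshot-identifier pre-scale-snapshot

# Option 2: point-in-time recovery to the latest restorable moment
aws rds restore-db-instance-to-point-in-time \
  --source-db-instance-identifier mydb \
  --target-db-instance-identifier mydb-restored \
  --use-latest-restorable-time
```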
Rollback Best Practices
- ✅ Always create snapshots before scaling (automated in scripts)
- ✅ Document previous state (instance type, configuration)
- ✅ Test rollback procedure in staging environment first
- ✅ Set up alerts to notify team of rollbacks
- ✅ Monitor after rollback to ensure application is stable
- ✅ Keep rollback scripts ready and tested
- ✅ Time limit for rollback – decide within 1-2 hours if rollback is needed
- ✅ Post-mortem – analyze why scaling failed to prevent future issues
Rollback Checklist
| Step | Action | Time |
|---|---|---|
| 1 | Identify issue requiring rollback | 5 min |
| 2 | Stop instance (if needed) | 1-2 min |
| 3 | Revert instance type | 30 sec |
| 4 | Start instance | 1-2 min |
| 5 | Verify application health | 2-5 min |
| 6 | Notify team | 1 min |
| Total | Complete rollback | 5-15 min |
Frequently Asked Questions
Q1: How long does vertical scaling take on EC2?
Answer: Usually 2-5 minutes, but I’ve seen it take longer. It really depends on how big your instance is and what’s running on it.
Breakdown:
- Stopping instance: 30-60 seconds
- Modifying instance type: 10-30 seconds
- Starting instance: 60-180 seconds
- Application startup: 30-120 seconds (depends on your app)
Example: summing those ranges gives roughly 2-6 minutes end to end; around 4 minutes is typical.
Tip: Use Multi-AZ for RDS to reduce downtime to ~30 seconds during scaling.
Q2: Will my IP address change when I scale vertically?
Answer: Yeah, probably. Unless you’re using an Elastic IP, AWS will give you a new public IP when you stop and start the instance. This bit me once when I forgot to set up an Elastic IP—had to update DNS records at 2 AM.
Solution 1: Use Elastic IP (Recommended)
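An Elastic IP stays with you across stop/start cycles. A sketch with placeholder IDs:

```shell
# Allocate an Elastic IP and bind it to the instance
aws ec2 allocate-address --domain vpc
aws ec2 associate-address --instance-id i-xxx \
  --allocation-id eipalloc-xxx
```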
Solution 2: Update DNS Records
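If you’d rather update DNS, a short TTL plus an upsert after each scale keeps downtime small. A Route 53 sketch; the zone ID, record name, and IP (a documentation address) are placeholders:

```shell
aws route53 change-resource-record-sets \
  --hosted-zone-id Z1234567890ABC \
  --change-batch '{
    "Changes": [{
      "Action": "UPSERT",
      "ResourceRecordSet": {
        "Name": "api.example.com",
        "Type": "A",
        "TTL": 60,
        "ResourceRecords": [{"Value": "203.0.113.10"}]
      }
    }]
  }'
```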
Q3: How do I prevent rapid scaling up and down (thrashing)?
Answer: This is a real problem. I’ve seen systems scale up and down every 10 minutes because the thresholds were too close together. You need cooldown periods and different thresholds for scaling up vs down.
Strategy:
- Scale-up threshold: CPU > 75% for 10 minutes
- Scale-down threshold: CPU < 40% for 30 minutes
- Cooldown period: 30-60 minutes between scaling actions
Lambda Cooldown Check:
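The check itself is just timestamp arithmetic; where you persist the last scaling time (SSM Parameter Store, an instance tag, a file) is up to you. A shell sketch of the comparison, written to take epoch seconds as arguments so it can be exercised in isolation:

```shell
# Returns success (0) if at least COOLDOWN_MIN minutes have passed
# since LAST_EPOCH, given the current time NOW_EPOCH.
cooldown_elapsed() {
  local last_epoch=$1 cooldown_min=$2 now_epoch=$3
  local elapsed_min=$(( (now_epoch - last_epoch) / 60 ))
  [ "$elapsed_min" -ge "$cooldown_min" ]
}

# Example: 30-minute cooldown, last scaling action 20 minutes ago
now=$(date +%s)
if cooldown_elapsed "$(( now - 20 * 60 ))" 30 "$now"; then
  echo "proceed"
else
  echo "skip"   # only 20 minutes have elapsed, so we skip
fi
```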
Q4: What happens if I scale to maximum instance size and still need more?
Answer: You’ve hit the wall. At this point, you’ve got two options: optimize your application (which you should probably do anyway) or bite the bullet and go horizontal. I’ve been here—it’s not fun, but it’s doable.
Maximum Instance Sizes:
- General Purpose (m5): m5.24xlarge (96 vCPU, 384 GB RAM)
- Compute Optimized (c5): c5.24xlarge (96 vCPU, 192 GB RAM)
- Memory Optimized (r5): r5.24xlarge (96 vCPU, 768 GB RAM)
Options:
- Optimize Application:
- Database query optimization
- Implement caching (Redis)
- Code profiling and optimization
- Connection pooling
- Migrate to Horizontal Scaling:
- Split application into microservices
- Use load balancer (ALB/NLB)
- Deploy multiple instances
- Use auto-scaling groups
- Hybrid Approach:
- Keep database on large instance (vertical)
- Scale application servers horizontally
- Use read replicas for database
Q5: How do I test automatic scaling without actually scaling?
Answer: Use test mode in Lambda or create a separate test alarm with very low threshold.
Method 1: Test Mode in Lambda
Method 2: Test Alarm with Low Threshold
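Both methods sketched with the CLI. Method 1 assumes your Lambda handler recognizes a `test` flag and skips the real scaling call (that flag is my assumption, not a built-in); Method 2 uses a throwaway alarm that fires almost immediately:

```shell
# Method 1: invoke the Lambda directly with a test payload
aws lambda invoke --function-name scale-up-instance \
  --cli-binary-format raw-in-base64-out \
  --payload '{"test": true}' /tmp/out.json
cat /tmp/out.json

# Method 2: an alarm with an absurdly low threshold (CPU > 1%)
aws cloudwatch put-metric-alarm \
  --alarm-name scale-test-low-threshold \
  --metric-name CPUUtilization --namespace AWS/EC2 \
  --statistic Average --period 60 --evaluation-periods 1 \
  --threshold 1 --comparison-operator GreaterThanThreshold \
  --dimensions Name=InstanceId,Value=i-xxx \
  --alarm-actions arn:aws:lambda:us-east-1:123456789012:function:scale-up-instance

# Delete it when you are done
aws cloudwatch delete-alarms --alarm-names scale-test-low-threshold
```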
Q6: Can I scale down automatically when usage drops?
Answer: Yes, create a separate alarm with lower threshold for scale-down.
Example Setup:
Scale-Down Lambda Function:
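The alarm side can be sketched like this; the Lambda ARN and instance ID are placeholders, and the scale-down function mirrors the scale-up one, just moving one size down the path. Six 5-minute periods match the 30-minute threshold above:

```shell
aws cloudwatch put-metric-alarm \
  --alarm-name low-cpu \
  --metric-name CPUUtilization --namespace AWS/EC2 \
  --statistic Average --period 300 --evaluation-periods 6 \
  --threshold 40 --comparison-operator LessThanThreshold \
  --dimensions Name=InstanceId,Value=i-xxx \
  --alarm-actions arn:aws:lambda:us-east-1:123456789012:function:scale-down-instance
```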
Q7: How do I monitor costs when using automatic scaling?
Answer: Set up AWS Budgets and CloudWatch billing alarms.
Step 1: Create Billing Alarm
Step 2: Track Instance Type Changes
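Sketches for both steps. Billing metrics only exist in us-east-1 and require billing alerts to be enabled in your account settings; the $500 threshold and topic ARN are placeholders:

```shell
# Step 1: alarm when the estimated monthly bill crosses $500
aws cloudwatch put-metric-alarm --region us-east-1 \
  --alarm-name monthly-bill-over-500 \
  --metric-name EstimatedCharges --namespace AWS/Billing \
  --statistic Maximum --period 21600 --evaluation-periods 1 \
  --threshold 500 --comparison-operator GreaterThanThreshold \
  --dimensions Name=Currency,Value=USD \
  --alarm-actions arn:aws:sns:us-east-1:123456789012:scaling-alerts

# Step 2: audit recent instance type changes via CloudTrail
aws cloudtrail lookup-events \
  --lookup-attributes AttributeKey=EventName,AttributeValue=ModifyInstanceAttribute \
  --max-results 10
```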
Step 3: Use AWS Cost Explorer
- Go to AWS Console → Cost Explorer
- Filter by service: EC2
- Group by: Instance Type
- Set date range to track costs over time
Q8: What’s the difference between scaling EC2 vs RDS vertically?
Answer: Similar process but different considerations for downtime and data.
| Aspect | EC2 | RDS |
|---|---|---|
| Downtime | 2-5 minutes (must stop) | 30 seconds – 5 min (depends on Multi-AZ) |
| Multi-AZ | Not applicable | Minimal downtime (~30 sec) |
| Data Safety | Create snapshot manually | Automated backups |
| Scaling Time | Faster (2-5 min) | Slower (5-30 min for large DBs) |
| Rollback | Easy (change type back) | More complex (restore from backup) |
RDS Scaling Example:
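A sketch with a placeholder identifier. Without `--apply-immediately`, the change waits for the next maintenance window:

```shell
# Scale the DB instance class right away
aws rds modify-db-instance \
  --db-instance-identifier mydb \
  --db-instance-class db.m5.xlarge \
  --apply-immediately

# Watch progress
aws rds describe-events --source-identifier mydb --source-type db-instance
```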
Q9: How do I handle application state during scaling?
Answer: Design your application to be stateless or use external state storage.
Problem: In-memory sessions, caches, and state are lost when instance stops.
Solutions:
- Use External Session Storage:
```javascript
// Store sessions in Redis (external)
app.use(session({
  store: new RedisStore({ host: process.env.REDIS_HOST }),
  secret: 'your-secret'
}));
```
- Use Database for State:
```javascript
// Store state in database
await db.query(
  'INSERT INTO app_state (key, value) VALUES ($1, $2)',
  ['current_state', JSON.stringify(state)]
);
```
- Use EBS for Persistent Data:
```shell
# Mount EBS volume for persistent storage
sudo mkdir /data
sudo mount /dev/xvdf /data

# Configure application to use /data
export DATA_DIR=/data
```
Q10: What happens if scaling fails mid-process?
Answer: Implement rollback mechanism and health checks.
Failure Scenarios:
- Instance fails to stop
- Instance fails to start after modification
- Application fails to start on new instance
- Network connectivity issues
Rollback Strategy: reuse the manual rollback sequence (snapshot → stop → revert type → start → verify), and automate the health checks so a failed scale is detected and reversed within minutes rather than hours.