Comprehensive Guide to AWS EC2

Amazon Elastic Compute Cloud (EC2) is a foundational service in AWS, offering scalable and resizable compute capacity in the cloud. This guide provides an in-depth look at EC2’s features, instance types, placement strategies, monitoring, purchasing options, and troubleshooting methods, catering to beginners and experienced cloud architects.

Why Amazon EC2?

AWS describes Amazon Elastic Compute Cloud (Amazon EC2) as the broadest and deepest compute platform, with over 750 instance types and a choice of the latest processors, storage, networking, operating systems, and purchase models to help you match your workload’s needs. According to AWS, it was the first major cloud provider to support Intel, AMD, and Arm processors, and it is the only cloud with on-demand EC2 Mac instances and 400 Gbps Ethernet networking. AWS also claims the best price-performance for machine learning training and the lowest cost per inference in the cloud, and states that more SAP, high-performance computing (HPC), ML, and Windows workloads run on AWS than on any other cloud.

Key Features of Amazon EC2

  1. Scalability and Flexibility: Easily scale your compute resources up or down to meet changing demands.
  2. Variety of Instance Types: Choose from general-purpose, compute-optimized, memory-optimized, storage-optimized, and accelerated computing instances.
  3. Global Infrastructure: Deploy instances in multiple regions and availability zones for redundancy and high availability.
  4. Security and Compliance: Integrated with AWS Identity and Access Management (IAM), Virtual Private Cloud (VPC), and various compliance certifications.
  5. Elastic Load Balancing and Auto Scaling: Automatically distribute incoming traffic and adjust capacity to maintain performance.
  6. Integration with Other AWS Services: Seamlessly work with services like S3, RDS, Lambda, and more.

Instance Types

EC2 provides a diverse range of instance types, each optimized for specific workloads. Selecting the appropriate instance type is crucial for performance and cost efficiency.

    • R (Memory Optimized):
      • Best for memory-intensive applications such as in-memory databases (e.g., Redis, Memcached), real-time big data analytics, and high-performance databases.
      • Examples: r5.large, r6g.xlarge.
    • C (Compute Optimized):
      • Ideal for compute-bound applications that benefit from high-performance processors, such as batch processing, video encoding, and high-performance web servers.
      • Examples: c5.large, c6g.xlarge.
    • M (General Purpose):
      • Balanced resources for various applications, including small and mid-size databases, caching fleets, and backend servers.
      • Examples: m5.large, m6g.xlarge.
    • I (I/O Optimized):
      • Designed for I/O-intensive applications, such as NoSQL databases (e.g., Cassandra, MongoDB) and transactional databases requiring low-latency, high-throughput storage.
      • Examples: i3.large, i4i.xlarge.
    • G (GPU Optimized):
      • Tailored for high-performance graphics processing applications, including machine learning training, inference workloads, and 3D rendering.
      • Examples: g4dn.xlarge, g5.2xlarge.
    • T (Burstable Performance):
      • Provides a baseline level of CPU performance with the ability to burst above it for short periods, which suits variable workloads such as development environments and microservices.
      • Examples: t3.micro, t4g.small.
    • EC2 Graviton:
      • Graviton instances are powered by Arm-based AWS Graviton processors and deliver strong price/performance for Linux and Unix workloads. They are also well suited to High-Performance Computing (HPC) applications.
      • Examples: m6g.large, c6g.xlarge, r6g.2xlarge.
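
As a quick illustration of the family letters above, the following sketch maps the first letter of an instance type name to the category used in this guide. `FAMILY_CATEGORIES` and `family_category` are hypothetical names introduced here, and a first-letter lookup is a simplification that covers only the single-letter families listed in this section.

```python
# Hypothetical lookup table: covers only the families discussed in this guide.
FAMILY_CATEGORIES = {
    "r": "Memory Optimized",
    "c": "Compute Optimized",
    "m": "General Purpose",
    "i": "I/O Optimized",
    "g": "GPU Optimized",
    "t": "Burstable Performance",
}


def family_category(instance_type: str) -> str:
    """Return this guide's category for an instance type such as 'r6g.xlarge'."""
    family_letter = instance_type.strip().lower()[:1]
    return FAMILY_CATEGORIES.get(family_letter, "Unknown (not covered in this guide)")


for name in ("r5.large", "c6g.xlarge", "t3.micro", "g4dn.xlarge"):
    print(f"{name}: {family_category(name)}")
```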

Placement Groups

Placement groups are a critical feature in EC2 that influences how instances are placed on underlying hardware to optimize for performance or fault tolerance.

        1. Cluster Placement Group:
          • Places instances in the same rack within a single Availability Zone (AZ).
          • It provides low-latency networking (up to 10 Gbps or more), ideal for tightly coupled node-to-node communication in HPC applications.
          • Suitable for applications requiring high throughput and low latency.
        2. Spread Placement Group:
          • Distributes instances across multiple hardware racks and AZs, reducing the risk of simultaneous hardware failures.
          • Limited to 7 running instances per AZ per group, so that each instance sits on distinct hardware.
          • Ideal for mission-critical applications where fault tolerance is paramount.
        3. Partition Placement Group:
          • Divides instances into logical partitions, each isolated from failures in other partitions.
          • Scalable to hundreds of instances, making it suitable for large distributed and replicated workloads like Hadoop, HDFS, and Apache Kafka.
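
The three strategies can be sketched as request parameters. The helper below is a minimal, hypothetical example that builds the keyword arguments for an EC2 `create_placement_group` call and computes the spread-group ceiling from the 7-per-AZ limit above. The function names and group names are assumptions made for illustration.

```python
from typing import Optional

VALID_STRATEGIES = {"cluster", "spread", "partition"}
SPREAD_LIMIT_PER_AZ = 7  # running instances per AZ per spread group, as stated above


def placement_group_params(name: str, strategy: str,
                           partition_count: Optional[int] = None) -> dict:
    """Build keyword arguments for an EC2 create_placement_group call."""
    if strategy not in VALID_STRATEGIES:
        raise ValueError(f"unknown strategy: {strategy!r}")
    params = {"GroupName": name, "Strategy": strategy}
    if partition_count is not None:
        if strategy != "partition":
            raise ValueError("partition_count only applies to partition groups")
        params["PartitionCount"] = partition_count
    return params


def max_spread_instances(az_count: int) -> int:
    """Upper bound on instances in one spread group spanning az_count AZs."""
    return SPREAD_LIMIT_PER_AZ * az_count


print(placement_group_params("hpc-cluster", "cluster"))
print(placement_group_params("kafka-brokers", "partition", partition_count=3))
print(max_spread_instances(3))  # 21
```

With boto3 installed and credentials configured, the resulting dictionary could be passed along as `boto3.client("ec2").create_placement_group(**params)`.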

    Modifying Placement Groups:

    Instances can be moved between placement groups by stopping them first. The options include:

    1. Moving an existing instance to a placement group.
    2. Transferring an instance from one placement group to another.
    3. Removing an instance from a placement group.
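
    The stop, modify, start sequence can be sketched as follows. This is a minimal example that assumes `ec2` behaves like a boto3 EC2 client; `move_to_placement_group` is a hypothetical helper name, and error handling is omitted.

```python
def move_to_placement_group(ec2, instance_id: str, group_name: str) -> None:
    """Stop an instance, change its placement group, then start it again.

    `ec2` is expected to behave like a boto3 EC2 client. Passing an empty
    string as `group_name` removes the instance from its current group.
    """
    ec2.stop_instances(InstanceIds=[instance_id])
    # Placement can only be changed once the instance is fully stopped.
    ec2.get_waiter("instance_stopped").wait(InstanceIds=[instance_id])
    ec2.modify_instance_placement(InstanceId=instance_id, GroupName=group_name)
    ec2.start_instances(InstanceIds=[instance_id])
```

    A call might look like `move_to_placement_group(boto3.client("ec2"), "i-0123456789abcdef0", "hpc-cluster")`, where both the instance ID and the group name are placeholders.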

Launch Types

Choosing the right EC2 launch type is essential for cost management and operational efficiency.

      • On-Demand Instances:
        • Ideal for short-term, unpredictable workloads that cannot be interrupted.
        • No upfront payment is required; pay for computing capacity by the hour or second.
        • Best for applications in development or testing phases.
      • Spot Instances:
        • Offers significant cost savings (up to 90%) by utilizing unused EC2 capacity.
        • Suitable for fault-tolerant, flexible applications like batch processing, big data analytics, and CI/CD workloads.
        • AWS can reclaim instances when it needs the capacity back, giving a two-minute interruption notice before doing so.
      • Reserved Instances (RI):
        • Provide substantial discounts (up to 72%) for long-term (1-3 years) commitments.
        • Convertible RIs offer flexibility to change instance attributes during the term.
        • Suitable for steady-state applications like databases and enterprise applications.
      • Dedicated Hosts:
        • Provides physical servers dedicated to your account, offering greater visibility and control over instance placement.
        • Ideal for meeting compliance requirements and using existing software licenses.
        • Supports host affinity, keeping instances on the same physical server.
      • AWS Savings Plans:
        • Commit to a consistent amount of usage, measured in dollars per hour, for a 1- or 3-year term in exchange for flexible savings across EC2, Fargate, and Lambda.
        • Savings of up to 72% with EC2 Instance Savings Plans, up to 66% with Compute Savings Plans, and up to 64% with SageMaker Savings Plans.
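
To put the discount ceilings in perspective, the sketch below computes a best-case monthly bill for one always-on instance. The hourly rate is an illustrative figure and not a real AWS price, and the percentages are the maximums quoted in this section, so actual savings will usually be smaller.

```python
# Maximum discounts quoted in this section; real discounts vary by instance
# type, Region, term, and payment option.
MAX_DISCOUNT_PCT = {"on_demand": 0, "spot": 90, "reserved": 72}

HOURS_PER_MONTH = 730  # common approximation (8,760 hours / 12 months)


def monthly_cost(on_demand_hourly: float, option: str) -> float:
    """Best-case monthly cost for one always-on instance under a pricing option."""
    discount = MAX_DISCOUNT_PCT[option] / 100
    return round(on_demand_hourly * (1 - discount) * HOURS_PER_MONTH, 2)


hypothetical_rate = 0.10  # USD per hour; an illustrative figure, not a real price
for option in MAX_DISCOUNT_PCT:
    print(f"{option}: ${monthly_cost(hypothetical_rate, option):.2f}")
```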

Note: To exclude an Organizational Unit (OU) from RI sharing, use the management account (formerly called the master account) to disable RI sharing for every member account in that OU.

Monitoring and Troubleshooting

1. Monitoring EC2 Instances

Effective monitoring ensures the performance and health of EC2 instances.

      • Standard Metrics: CPU utilization, network in/out, disk I/O, and status checks.
      • Detailed Monitoring: Provides 1-minute granularity for metrics, available at an additional cost.
      • Custom Metrics: CloudWatch agent is required for metrics like RAM usage and OS-level details.
      • Status Checks:
        • System Status Checks: Detect problems with the underlying AWS host, such as loss of power or network connectivity.
        • Instance Status Checks: Detect problems inside the instance, such as OS crashes or misconfigurations.

Recovery Options:

      • When a system status check fails, an EC2 instance can be recovered onto healthy hardware while retaining its instance ID, private IP addresses, Elastic IP addresses, metadata, and placement group.
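
One common way to automate recovery is a CloudWatch alarm on the `StatusCheckFailed_System` metric with the built-in EC2 recover action. The sketch below only builds the arguments for a CloudWatch `put_metric_alarm` call. The helper name, alarm name, and the two-period threshold are assumptions chosen for illustration.

```python
def recovery_alarm_params(instance_id: str, region: str) -> dict:
    """Build put_metric_alarm arguments that recover an instance when the
    system status check fails for two consecutive one-minute periods."""
    return {
        "AlarmName": f"recover-{instance_id}",
        "Namespace": "AWS/EC2",
        "MetricName": "StatusCheckFailed_System",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Maximum",
        "Period": 60,
        "EvaluationPeriods": 2,
        "Threshold": 1.0,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        # Built-in EC2 action ARN that triggers instance recovery.
        "AlarmActions": [f"arn:aws:automate:{region}:ec2:recover"],
    }


print(recovery_alarm_params("i-0123456789abcdef0", "us-east-1")["AlarmActions"])
```

With boto3, the dictionary could be passed as `boto3.client("cloudwatch").put_metric_alarm(**params)`.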

2. EC2Rescue

EC2Rescue is a troubleshooting tool for diagnosing and resolving issues on EC2 Linux and Windows instances.

      • Manual Execution: Run directly on the instance.
      • Automated Execution: Use Systems Manager (SSM) Automation with the AWSSupport-ExecuteEC2Rescue runbook.
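
As a rough sketch of the automated path, the helper below builds the arguments for an SSM `start_automation_execution` call. The `UnreachableInstanceId` parameter name is an assumption that should be checked against the runbook's parameter list before use, and `ec2rescue_automation_params` is a hypothetical helper name.

```python
def ec2rescue_automation_params(instance_id: str) -> dict:
    """Build start_automation_execution arguments for the EC2Rescue runbook."""
    return {
        "DocumentName": "AWSSupport-ExecuteEC2Rescue",
        # Assumed parameter name; check the runbook's parameter list first.
        # Automation parameters are passed as lists of strings.
        "Parameters": {"UnreachableInstanceId": [instance_id]},
    }


print(ec2rescue_automation_params("i-0123456789abcdef0"))
```

With boto3, this could be submitted as `boto3.client("ssm").start_automation_execution(**params)`.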

EC2 Instance Connect

EC2 Instance Connect is a secure way to connect to EC2 instances without managing long-lived SSH keys.

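      • Uses the SendSSHPublicKey API to push a one-time SSH public key to the instance; the key remains valid for 60 seconds.
      • Ensure port 22 is open to the EC2 Instance Connect IP range for the instance’s Region (for example, 18.206.106.24/29). Current ranges are published under the EC2_INSTANCE_CONNECT service in the AWS ip-ranges.json file.
      • SendSSHPublicKey API calls are logged in AWS CloudTrail for auditing.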
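
The key push can be sketched as below. This is a minimal example assuming `eic` behaves like `boto3.client("ec2-instance-connect")`; `push_temporary_key` is a hypothetical helper name, and the SSH connection itself still has to be opened separately within the 60-second window.

```python
def push_temporary_key(eic, instance_id: str, os_user: str,
                       public_key: str, availability_zone: str) -> bool:
    """Push a one-time SSH public key; returns True when the call succeeds.

    `eic` is expected to behave like boto3.client("ec2-instance-connect").
    The key is usable for about 60 seconds, so connect immediately afterwards.
    """
    response = eic.send_ssh_public_key(
        InstanceId=instance_id,
        InstanceOSUser=os_user,
        SSHPublicKey=public_key,
        AvailabilityZone=availability_zone,
    )
    return bool(response.get("Success"))
```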

EC2 Spot Instances

Spot Instances provide an economical solution for flexible workloads.

      • Spot Requests: Set a maximum price and specify a launch template.
      • Spot Fleets: Combine spot and on-demand instances to meet capacity requirements.
        • Supports Auto Scaling Groups (ASG), Elastic Container Service (ECS), and AWS Batch.
        • Default quotas allow a target capacity of up to 10,000 per fleet and 100,000 across all fleets in a Region.

Spot Instance Allocation Strategies:

      • lowestPrice: Chooses the lowest-priced pool, ideal for cost-sensitive, short-term workloads.
      • diversified: Distributes instances across multiple pools for higher availability.
      • capacityOptimized: Selects the pools with the most available capacity to reduce the chance of interruptions.
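
The difference between the three strategies is easiest to see on a toy model. The sketch below is a simplified simulation and does not reflect how AWS implements allocation: the `SpotPool` class, the pool names, the prices, and the `spare_capacity` figures are all hypothetical, and real Spot capacity numbers are not exposed this way.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SpotPool:
    name: str             # e.g. instance type plus AZ
    price: float          # current Spot price, USD per hour (illustrative)
    spare_capacity: int   # illustrative stand-in for available capacity


def allocate(pools: List[SpotPool], target: int, strategy: str) -> Dict[str, int]:
    """Return how many instances each pool receives under a strategy."""
    if strategy == "lowestPrice":
        chosen = min(pools, key=lambda p: p.price)
        return {chosen.name: target}
    if strategy == "capacityOptimized":
        chosen = max(pools, key=lambda p: p.spare_capacity)
        return {chosen.name: target}
    if strategy == "diversified":
        # Spread the target as evenly as possible across every pool.
        base, extra = divmod(target, len(pools))
        return {p.name: base + (1 if i < extra else 0) for i, p in enumerate(pools)}
    raise ValueError(f"unknown strategy: {strategy!r}")


pools = [
    SpotPool("m5.large/az-a", 0.031, 40),
    SpotPool("m5.large/az-b", 0.028, 10),
    SpotPool("c5.large/az-a", 0.035, 90),
]
for strategy in ("lowestPrice", "diversified", "capacityOptimized"):
    print(strategy, allocate(pools, 10, strategy))
```

In this toy data the cheapest pool is also the one with the least spare capacity, which is the trade-off between lowestPrice and capacityOptimized in miniature.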

Lightsail

Amazon Lightsail offers a simplified cloud platform for deploying virtual private servers (VPS).

      • Provides pre-configured development stacks, such as WordPress, Node.js, and LAMP.
      • Includes integrated networking, storage, databases, and load balancers.
      • Ideal for small businesses, developers, and those new to AWS.

Shutdown and Launch Troubleshooting

1. Shutdown Behavior

      • Configure instances to either stop or terminate upon shutdown.
      • Enable termination protection to prevent accidental termination through the AWS Management Console, CLI, or API. Note that if the shutdown behavior is set to terminate, shutting down from within the OS still terminates the instance despite this protection.
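
Both settings are instance attributes, so they can be applied together in a short script. The sketch below assumes `ec2` behaves like a boto3 EC2 client; `protect_instance` is a hypothetical helper name.

```python
def protect_instance(ec2, instance_id: str) -> None:
    """Make OS shutdown stop (not terminate) the instance and block
    termination requests made through the console, CLI, or API.

    `ec2` is expected to behave like a boto3 EC2 client.
    """
    # modify_instance_attribute changes one attribute per call.
    ec2.modify_instance_attribute(
        InstanceId=instance_id,
        InstanceInitiatedShutdownBehavior={"Value": "stop"},
    )
    ec2.modify_instance_attribute(
        InstanceId=instance_id,
        DisableApiTermination={"Value": True},
    )
```

Setting the shutdown behavior to stop closes the gap described above, because an OS-level shutdown then no longer leads to termination.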

2. Common Launch Errors

      • InstanceLimitExceeded: Maximum vCPU limit reached; request a quota increase.
      • InsufficientInstanceCapacity: Capacity issues in the selected AZ; consider changing the instance type, reducing the number of instances, or selecting a different AZ.
      • Instance Terminates Immediately: Check for EBS volume limits, encrypted root volumes, corrupted snapshots, KMS issues, or incomplete AMIs.
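
For scripted launches, the remediation advice above can be kept in a small lookup keyed by error code. This is a minimal sketch; `LAUNCH_ERROR_HINTS` and `launch_error_hint` are hypothetical names, and the hints simply restate this section.

```python
# Remediation hints restating the list above, keyed by EC2 error code.
LAUNCH_ERROR_HINTS = {
    "InstanceLimitExceeded": "vCPU limit reached; request a quota increase.",
    "InsufficientInstanceCapacity": (
        "No capacity in the selected AZ; try another instance type, "
        "fewer instances, or a different AZ."
    ),
}


def launch_error_hint(error_code: str) -> str:
    """Return a remediation hint for a launch error code."""
    return LAUNCH_ERROR_HINTS.get(
        error_code, "No hint recorded; check the error message and AWS documentation."
    )


print(launch_error_hint("InsufficientInstanceCapacity"))
```

When launching with boto3, the code is available on a caught `botocore.exceptions.ClientError` as `err.response["Error"]["Code"]`.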

3. SSH Troubleshooting

      • Unprotected Private Key: Ensure the private key file has the correct permissions (e.g., chmod 400 key.pem).
      • Host Not Found/Permission Denied: Verify the correct username (e.g., ec2-user, ubuntu) and ensure the key matches the instance.
      • Connection Timed Out: Check the security group rules and network ACLs, and ensure the instance’s public IP address is reachable.
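
The unprotected-key case can be checked before connecting. The sketch below tests whether a key file is free of group and other permissions, which is what `chmod 400` achieves; `key_is_private` is a hypothetical helper name, and the check assumes a POSIX file system.

```python
import os
import stat


def key_is_private(path: str) -> bool:
    """True when group and others have no permissions on the key file,
    which is what OpenSSH requires before it will use a private key."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return (mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0
```

A file with mode 644 would fail this check, while 400 or 600 would pass.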

Purchasing Options

Understanding EC2 purchasing options is crucial for optimizing costs.

      • Reserved Instances:
        • Up to 72% savings for long-term commitments.
        • Convertible RIs offer up to 66% savings with the flexibility to change instance attributes.
      • EC2 Savings Plans:
        • Commit to usage over time for significant savings.
        • Compute Savings Plans apply across EC2 instance families and Regions; EC2 Instance Savings Plans apply to one instance family in one Region.
      • Dedicated Hosts/Instances:
        • Full or partial hardware isolation for compliance and licensing.
      • Capacity Reservations:
        • Reserve On-Demand capacity in a specific AZ to ensure availability.

Additional EC2 Features

1. Elastic IP

      • Provides a static, public IPv4 address that can be associated with any instance.
      • Limited to 5 Elastic IPs per Region per AWS account by default.

2. CloudWatch Monitoring

      • Basic Monitoring: 5-minute intervals, enabled by default.
      • Detailed Monitoring: 1-minute intervals for enhanced insights (paid feature).
      • Custom Metrics: Collect OS-level metrics like RAM usage using the CloudWatch agent.
      • Process Monitoring: Use the procstat plugin to monitor specific OS processes.
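
A CloudWatch agent configuration covering the last two items might look like the sketch below, built here as a Python dictionary and printed as JSON. The process name `nginx` and the chosen measurements are assumptions for illustration; a real configuration usually carries more sections.

```python
import json

# Minimal agent configuration: memory usage plus per-process metrics.
agent_config = {
    "metrics": {
        "metrics_collected": {
            # OS-level memory metric that EC2 does not report on its own.
            "mem": {"measurement": ["mem_used_percent"]},
            # procstat plugin: watch processes whose executable matches "nginx".
            "procstat": [
                {"exe": "nginx", "measurement": ["cpu_usage", "memory_rss"]}
            ],
        }
    }
}

print(json.dumps(agent_config, indent=2))
```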

3. IPv6 Support

      • Supported on current-generation instance types (M4 and later, for example).
      • Outbound-only IPv6 traffic from private subnets requires an egress-only internet gateway.
      • Create a subnet with a /64 IPv6 CIDR block in your VPC.
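
To see how /64 subnet blocks are carved out, the sketch below uses Python's standard `ipaddress` module. It assumes the VPC holds a /56 IPv6 block, which is the size of an Amazon-provided IPv6 CIDR, and uses a documentation-only prefix in place of a real allocation.

```python
import ipaddress

# Documentation-only prefix (RFC 3849) standing in for a VPC's /56 block.
vpc_block = ipaddress.ip_network("2001:db8:1234:1a00::/56")

# Every subnet in the VPC takes one /64 out of the /56.
subnets = list(vpc_block.subnets(new_prefix=64))
print(len(subnets))   # 256
print(subnets[0])     # 2001:db8:1234:1a00::/64
print(subnets[1])     # 2001:db8:1234:1a01::/64
```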

4. EC2 Health Checks

      • System Status Check: Detects underlying host issues such as network or hardware failures.
      • Instance Status Check: Identifies instance-level issues, including network misconfiguration or file system errors.

5. EC2 Hibernate

      • Saves the instance’s in-memory state (RAM) to the root EBS volume so the instance can resume quickly where it left off.
      • The root EBS volume must be encrypted and large enough to hold the RAM contents.

6. EC2 Naming Conventions

      • a: AMD processors
      • g: Graviton processors
      • i: Intel processors
      • d: Instance store
      • n: Network optimized
      • b: Block storage optimized
      • e: Extra storage or memory
      • z: High-frequency processors
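
Putting the convention to work, the sketch below splits an instance type name into family, generation, attribute letters, and size, then decodes the attribute letters with the table above. The regular expression is an assumption that fits names such as `g4dn.xlarge`. It does not handle hyphenated names, and `parse_instance_type` is a hypothetical helper name.

```python
import re

# Attribute letters and meanings exactly as listed above.
ATTRIBUTE_MEANINGS = {
    "a": "AMD processors",
    "g": "Graviton processors",
    "i": "Intel processors",
    "d": "Instance store",
    "n": "Network optimized",
    "b": "Block storage optimized",
    "e": "Extra storage or memory",
    "z": "High-frequency processors",
}

# family letters, generation digits, optional attribute letters, ".", size
_PATTERN = re.compile(r"^([a-z]+)(\d+)([a-z]*)\.([a-z0-9]+)$")


def parse_instance_type(name: str) -> dict:
    """Split a name like 'g4dn.xlarge' into its naming-convention parts."""
    match = _PATTERN.match(name.lower())
    if match is None:
        raise ValueError(f"unrecognized instance type name: {name!r}")
    family, generation, attributes, size = match.groups()
    return {
        "family": family,
        "generation": int(generation),
        "attributes": [ATTRIBUTE_MEANINGS.get(ch, f"unknown ({ch})") for ch in attributes],
        "size": size,
    }


print(parse_instance_type("g4dn.xlarge"))
print(parse_instance_type("m6g.large"))
```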

This guide should give you a thorough understanding of AWS EC2’s capabilities, helping you optimize your cloud infrastructure for performance, reliability, and cost. For further details and updates, consult the official AWS documentation.