Choosing the right database on AWS

데먕 2021. 3. 8. 17:47

1. Overview

AWS offers many managed databases to choose from.

1.1 Questions to ask when choosing a database for your architecture:

  • Read-heavy, write-heavy, or balanced workload?
  • Throughput needs?
    • Will it change?
  • Does it need to scale or fluctuate during the day?
  • How much data to store and for how long?
    • Will it grow?
  • Average object size?
  • How are the objects accessed?
  • Data durability?
  • Source of truth for the data?
  • Latency requirement?
  • Concurrent users?
  • Data model?
  • How will you query the data?
    • Joins?
    • Structured?
    • Semi-structured?
  • Strong schema?
    • More flexibility?
  • Reporting?
  • Search?
  • RDBMS/NoSQL?
  • License costs?
  • Switch to a cloud-native DB such as Aurora?

2. Database Types

  • RDBMS (SQL / Online Transaction Processing (OLTP)): RDS, Aurora - great for joins
  • NoSQL database: DynamoDB (~JSON), ElastiCache (key/value pairs), Neptune (graphs) - no joins, no SQL
  • Object Store: S3 (for big objects) / Glacier (for backups/archives)
  • Data Warehouse (SQL analytics / BI): Redshift (Online Analytical Processing (OLAP)), Athena
  • Search: Elasticsearch (JSON) - free-text, unstructured searches
  • Graphs: Neptune - displays relationships between data
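
The mapping above can be sketched as a simple lookup. This is only an illustration of the list, not an official decision tool; the workload labels used as keys are made up for this example.

```python
# Hypothetical workload labels mapped to the services listed above.
SERVICES_BY_WORKLOAD = {
    "relational_oltp": ["RDS", "Aurora"],
    "nosql_key_value": ["DynamoDB", "ElastiCache"],
    "object_store": ["S3", "Glacier"],
    "data_warehouse": ["Redshift", "Athena"],
    "search": ["Elasticsearch"],
    "graph": ["Neptune"],
}


def candidate_services(workload):
    """Return the services the list above associates with a workload label."""
    if workload not in SERVICES_BY_WORKLOAD:
        raise ValueError(f"unknown workload label: {workload!r}")
    return SERVICES_BY_WORKLOAD[workload]


if __name__ == "__main__":
    for label in ("relational_oltp", "graph"):
        print(label, "->", candidate_services(label))
```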

3. RDS Overview

  • Managed PostgreSQL/MySQL/Oracle/SQL Server
  • Must provision the underlying EC2 instance type and the EBS volume type and size
  • Support for Read Replicas and Multi-AZ
  • Security through IAM, Security Groups, KMS, SSL in transit
  • Backup/Snapshot/Point in time restore feature
  • Managed and Scheduled maintenance
  • Monitoring through CloudWatch

3.1 Use cases:

Store relational datasets (RDBMS/OLTP), run SQL queries, and perform transactional inserts, updates, and deletes

3.2 RDS for Solutions Architect

  • Operations: brief downtime during failover and maintenance; scaling read replicas or the EC2 instance, or restoring EBS, requires manual intervention and application changes
  • Security: AWS is responsible for OS security; we are responsible for setting up KMS, security groups, IAM policies, authorizing users in the DB, and using SSL
  • Reliability: Multi-AZ feature, failover in case of failures
  • Performance: depends on EC2 instance type and EBS volume type; Read Replicas can be added; doesn't auto-scale
  • Cost: Pay per hour based on provisioned EC2 and EBS
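
The "pay per hour for provisioned EC2 and EBS" model can be sketched as simple arithmetic. The prices below are placeholders, not real AWS rates, and 730 hours per month is an assumed average; check the RDS pricing page for actual numbers.

```python
HOURS_PER_MONTH = 730  # assumed average month length in hours


def rds_monthly_cost(instance_usd_per_hour, storage_gb, storage_usd_per_gb_month,
                     hours=HOURS_PER_MONTH):
    """Estimate a month of RDS: instance hours plus provisioned storage."""
    instance_cost = instance_usd_per_hour * hours
    storage_cost = storage_gb * storage_usd_per_gb_month
    return instance_cost + storage_cost


if __name__ == "__main__":
    # Placeholder prices for illustration only.
    estimate = rds_monthly_cost(instance_usd_per_hour=0.10,
                                storage_gb=100,
                                storage_usd_per_gb_month=0.10)
    print(f"estimated monthly cost: ${estimate:.2f}")
```

Because both terms are provisioned, the bill stays the same whether the database is busy or idle.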

4. Aurora

  • Compatible API for PostgreSQL/MySQL
  • Data is held in 6 replicas, across 3 AZs
  • Auto healing capability
  • Multi-AZ, Auto Scaling Read Replicas
  • Read Replicas can be Global
  • Aurora database can be Global for DR or latency purposes
  • Auto-scaling of storage from 10 GB up to 128 TB (64 TB on older engine versions)
  • Define EC2 instance type for Aurora instances
  • Same security/monitoring/maintenance features as RDS
  • "Aurora Serverless" option

4.1 Use Case

  • Same as RDS, but with less maintenance, more flexibility, and more performance, at a higher price

4.2 Aurora for Solutions Architect

  • Operations: fewer operations, auto-scaling storage
  • Security: AWS responsible for OS security, we are responsible for setting up KMS, security groups, IAM policies, authorizing users in DB, using SSL
  • Reliability: Multi-AZ, highly available (possibly more so than RDS), Aurora Serverless option
  • Performance: up to 5 times the performance of MySQL on RDS (according to AWS) due to architectural optimizations; up to 15 Read Replicas (only 5 for RDS)
  • Cost: Pay per hour based on EC2 and storage usage. Possibly lower costs compared to Enterprise-grade databases such as Oracle
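
To see why 6 copies across 3 AZs helps reliability, here is a small sketch. The quorum sizes (4 of 6 for writes, 3 of 6 for reads) are not stated in these notes; they are taken from AWS's published description of Aurora storage and should be treated as an assumption here.

```python
AZ_COUNT = 3
COPIES_PER_AZ = 2          # 6 copies spread over 3 AZs, as noted in section 4
WRITE_QUORUM = 4           # assumption: from AWS's published Aurora design
READ_QUORUM = 3            # assumption: from AWS's published Aurora design


def availability(failed_copies):
    """Report whether reads and writes still reach quorum after some copies fail."""
    healthy = AZ_COUNT * COPIES_PER_AZ - failed_copies
    return {
        "can_write": healthy >= WRITE_QUORUM,
        "can_read": healthy >= READ_QUORUM,
    }


if __name__ == "__main__":
    print("one AZ lost (2 copies):", availability(2))
    print("one AZ plus one copy lost (3 copies):", availability(3))
```

Under these assumptions, losing a whole AZ still allows writes, and losing an AZ plus one more copy still allows reads.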

5. DynamoDB

  • AWS proprietary technology; a managed NoSQL database
  • Serverless, provisioned capacity, auto-scaling, on-demand capacity (since Nov 2018)
  • Can replace ElastiCache as a key/value store (storing session data for example)
  • Highly available, Multi-AZ by default, Reads and Writes are decoupled, DAX for read cache
  • Reads can be eventually consistent or strongly consistent
  • Security, authentication, and authorization are done through IAM
  • DynamoDB Streams to integrate with AWS Lambda
  • Backup/Restore feature, Global Table feature
  • Monitoring through CloudWatch
  • Can only query on the primary key, sort key, or indexes
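
Since queries only work on the primary key, sort key, or an index, a Query request always carries a key condition. The sketch below only builds the parameter dict for the low-level Query API (the kind of dict you could pass to a boto3 DynamoDB client as `query(**kwargs)`); it does not call AWS. The table and attribute names are hypothetical, and string-typed keys are assumed.

```python
def build_query_kwargs(table, pk_name, pk_value, sk_name=None, sk_prefix=None,
                       index_name=None, consistent_read=False):
    """Build Query parameters: partition key equality plus optional sort key prefix."""
    names = {"#pk": pk_name}
    values = {":pk": {"S": pk_value}}       # assumes string keys
    condition = "#pk = :pk"
    if sk_name is not None and sk_prefix is not None:
        names["#sk"] = sk_name
        values[":sk"] = {"S": sk_prefix}
        condition += " AND begins_with(#sk, :sk)"
    kwargs = {
        "TableName": table,
        "KeyConditionExpression": condition,
        "ExpressionAttributeNames": names,
        "ExpressionAttributeValues": values,
        # Strongly consistent reads are not supported on global secondary indexes.
        "ConsistentRead": consistent_read,
    }
    if index_name is not None:
        kwargs["IndexName"] = index_name
    return kwargs


if __name__ == "__main__":
    # "Sessions", "user_id" and "created_at" are made-up names for illustration.
    print(build_query_kwargs("Sessions", "user_id", "u-123",
                             sk_name="created_at", sk_prefix="2021-03"))
```

Filtering on any other attribute without an index means a Scan, which reads the whole table.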

5.1 Use Case

  • Serverless application development (small documents, up to a few hundred KB; the item size limit is 400 KB)
  • Distributed serverless cache
  • Doesn't have SQL query language available
  • Supports transactions since Nov 2018

5.2 DynamoDB for Solutions Architect

  • Operations: No operations needed, auto-scaling capacity, serverless
  • Security: full security through IAM policies, KMS encryption, SSL in transit
  • Reliability: Multi-AZ, Backups
  • Performance: single-digit millisecond performance, DAX for caching reads, performance doesn't degrade as your application scales
  • Cost: Pay per provisioned capacity and storage usage (no need to guess capacity in advance - can use auto-scaling)
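
Provisioned capacity is expressed in read and write capacity units. A rough sizing sketch, using the documented unit definitions (one RCU is one strongly consistent read per second of an item up to 4 KB, or two eventually consistent reads; one WCU is one write per second of an item up to 1 KB). The workload numbers in the example are made up.

```python
import math


def read_capacity_units(item_size_kb, reads_per_second, strongly_consistent=True):
    """RCUs needed: each read costs one unit per 4 KB, halved if eventually consistent."""
    units_per_read = math.ceil(item_size_kb / 4)
    total = units_per_read * reads_per_second
    return total if strongly_consistent else math.ceil(total / 2)


def write_capacity_units(item_size_kb, writes_per_second):
    """WCUs needed: each write costs one unit per 1 KB (rounded up)."""
    return math.ceil(item_size_kb) * writes_per_second


if __name__ == "__main__":
    # Hypothetical workload: 6 KB items read 10 times/s, 2.5 KB items written 5 times/s.
    print("strong reads:", read_capacity_units(6, 10))
    print("eventual reads:", read_capacity_units(6, 10, strongly_consistent=False))
    print("writes:", write_capacity_units(2.5, 5))
```

Eventually consistent reads cost half as much, which is one reason to ask early whether the application really needs strong consistency.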

6. S3

  • S3 is a key/value store for objects
  • Great for big objects, not so great for small objects
  • Serverless, scales infinitely, the max object size is 5TB
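  • Strong read-after-write consistency for new objects, overwrites, and deletes (since Dec 2020; previously eventual consistency for overwrites and deletes)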
  • Tiers: S3 Standard, S3 IA, S3 One Zone IA, Glacier for backups
  • Features: Versioning, Encryption, Cross-Region Replication, etc.
  • Security: IAM, Bucket Policies, ACL
  • Encryption: SSE-S3, SSE-KMS, SSE-C, client side encryption, SSL in transit

6.1 Use case

Static files, key-value store for big files, website hosting

6.2 S3 for Solutions Architect

  • Operations: no operations needed
  • Security: IAM, Bucket Policies, ACL, Encryption (Server/Client), SSL
  • Reliability: 99.999999999% (11 nines) durability / 99.99% availability, Multi-AZ, CRR
  • Performance: scales to thousands of reads/writes per second, Transfer Acceleration / multipart upload for big files
  • Cost: pay for storage used, network transfer, and number of requests
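
For big files, multipart upload splits the object into parts. A sketch of how a part size could be chosen within the documented S3 limits (at most 10,000 parts, parts between 5 MiB and 5 GiB, objects up to 5 TiB). The 64 MiB preferred part size is an arbitrary assumption for this example, and no upload is performed.

```python
import math

MIB = 1024 ** 2
GIB = 1024 ** 3
TIB = 1024 ** 4

MIN_PART = 5 * MIB       # minimum part size (the last part may be smaller)
MAX_PART = 5 * GIB       # maximum part size
MAX_PARTS = 10_000       # maximum number of parts per upload
MAX_OBJECT = 5 * TIB     # maximum object size


def plan_multipart(object_size, preferred_part=64 * MIB):
    """Return (part_size, part_count) that fits the multipart limits."""
    if object_size <= 0 or object_size > MAX_OBJECT:
        raise ValueError("object size must be between 1 byte and 5 TiB")
    # Grow the part size if the preferred size would need more than 10,000 parts.
    needed = math.ceil(object_size / MAX_PARTS)
    part_size = min(max(preferred_part, MIN_PART, needed), MAX_PART)
    return part_size, math.ceil(object_size / part_size)


if __name__ == "__main__":
    for size in (1 * GIB, 5 * TIB):
        part_size, count = plan_multipart(size)
        print(f"{size} bytes -> {count} parts of up to {part_size} bytes")
```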

7. ElastiCache

  • Managed Redis/Memcached (similar offering as RDS, but for caches)
  • In-memory data store, sub-millisecond latency
  • Must provision an EC2 instance type
  • Support for Clustering (Redis) and Multi-AZ, Read Replicas (sharding)
  • Security through IAM, Security Groups, KMS, Redis Auth
  • Backup / snapshot / restore feature (Redis)
  • Managed and Scheduled maintenance
  • Monitoring through CloudWatch

7.1 Use Case

Key/value store with frequent reads and fewer writes: caching results of DB queries, storing session data for websites. Cannot use SQL.
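
Caching DB query results usually follows the cache-aside pattern: look in the cache first, fall back to the database on a miss, then store the result with a TTL. A minimal sketch, where `TTLCache` is an in-process stand-in for a Redis/Memcached client and `load_from_db` is a hypothetical function supplied by the caller.

```python
import time


class TTLCache:
    """Tiny in-process stand-in for a cache client (illustration only)."""

    def __init__(self):
        self._items = {}

    def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._items[key]   # expired: treat as a miss
            return None
        return value

    def set(self, key, value, ttl_seconds):
        self._items[key] = (value, time.monotonic() + ttl_seconds)


def get_user(user_id, cache, load_from_db, ttl_seconds=60):
    """Cache-aside read: serve from cache, otherwise load from the DB and cache it."""
    key = f"user:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached
    row = load_from_db(user_id)
    cache.set(key, row, ttl_seconds)
    return row


if __name__ == "__main__":
    cache = TTLCache()
    fake_db = lambda user_id: {"id": user_id, "name": "example"}
    print(get_user(1, cache, fake_db))   # miss: loads from the fake DB
    print(get_user(1, cache, fake_db))   # hit: served from the cache
```

The TTL bounds how stale a cached row can get; writes to the database should also invalidate or update the matching key.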

7.2 ElastiCache for Solutions Architect

  • Operations: same as RDS
  • Security: AWS responsible for OS security, we are responsible for setting up KMS, security groups, IAM policies, users (Redis Auth), using SSL
  • Reliability: Clustering, Multi-AZ
  • Performance: Sub-millisecond performance, in memory, read replicas for sharding, very popular cache option
  • Cost: Pay per hour based on EC2 and storage usage

8. Athena

  • Fully serverless database with SQL capabilities
  • Used to query data in S3
  • Pay per query
  • Output results back to S3
  • Secured through IAM
  • A great candidate for lightweight queries that are not too complicated and don't involve too many joins

8.1 Use Case

One-time SQL queries, serverless queries on S3, log analytics

8.2 Athena for Solutions Architect

  • Operations: no operations needed, serverless
  • Security: IAM + S3 security
  • Reliability: managed service, uses Presto engine, highly available
  • Performance: queries scale based on data size
  • Cost: Pay per query / per TB of data scanned, serverless
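
Because Athena bills by data scanned, the cost of a query can be estimated from the bytes it reads. In this sketch the price per TB and the 10 MB per-query minimum are assumptions to be checked against the current Athena pricing page for your region, and 1 TB is treated as 1024^4 bytes.

```python
TB = 1024 ** 4
MB = 1024 ** 2


def athena_query_cost(bytes_scanned, usd_per_tb=5.0, min_bytes=10 * MB):
    """Estimate the cost of one query from the amount of data it scans."""
    billable = max(bytes_scanned, min_bytes)   # assumed per-query minimum
    return billable / TB * usd_per_tb


if __name__ == "__main__":
    full_scan = athena_query_cost(1 * TB)
    # Hypothetical: partitioning or a columnar format cuts the scan to a tenth.
    pruned_scan = athena_query_cost(TB // 10)
    print(f"full scan: ${full_scan:.2f}, pruned scan: ${pruned_scan:.2f}")
```

Anything that reduces the bytes scanned, such as partitioning or compressed columnar files in S3, lowers the bill directly.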