Choosing the right database on AWS
Cloud/AWS 2021. 3. 8. 17:47
1. Overview
We have a lot of managed databases on AWS to choose from.
1.1 Questions to ask when choosing a database for your architecture:
- Read-heavy, write-heavy, or balanced workload?
- Throughput needs?
- Will it change?
- Does it need to scale or fluctuate during the day?
- How much data to store and for how long?
- Will it grow?
- Average object size?
- How are they accessed?
- Data durability?
- Source of truth for the data?
- Latency requirement?
- Concurrent users?
- Data model?
- How will you query the data?
- Joins?
- Structured?
- Semi-structured?
- Strong schema?
- More flexibility?
- Reporting?
- Search?
- RDBMS/NoSQL?
- License costs?
- Switch to a cloud-native DB such as Aurora?
2. Database Types
- RDBMS (= SQL / Online Transaction Processing (OLTP)): RDS, Aurora - great for joins
- NoSQL database: DynamoDB (~JSON), ElastiCache (key/value pairs), Neptune (graphs) - no joins, no SQL
- Object Store: S3 (for big objects) / Glacier (for backups/archives)
- Data Warehouse (= SQL Analytics/BI): Redshift (Online Analytical Processing (OLAP)), Athena
- Search: Elasticsearch (JSON) - free text, unstructured searches
- Graphs: Neptune - displays relationships between data
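The list above can be read as a rough lookup from data model to candidate services. The sketch below is one hypothetical way to encode it: the function name `suggest_services` and the category keys are assumptions made for illustration, not an AWS API.

```python
# Hypothetical lookup that mirrors the database types listed above.
CANDIDATES = {
    "rdbms": ["RDS", "Aurora"],
    "nosql": ["DynamoDB", "ElastiCache", "Neptune"],
    "object_store": ["S3", "Glacier"],
    "data_warehouse": ["Redshift", "Athena"],
    "search": ["Elasticsearch"],
    "graph": ["Neptune"],
}

# Categories that the list above describes as SQL-based.
SQL_CAPABLE = {"rdbms", "data_warehouse"}


def suggest_services(data_model, needs_joins=False):
    """Return candidate services for a data model.

    If joins are a hard requirement and the chosen category has no SQL,
    fall back to the relational options.
    """
    if data_model not in CANDIDATES:
        raise ValueError(f"unknown data model: {data_model}")
    if needs_joins and data_model not in SQL_CAPABLE:
        return CANDIDATES["rdbms"]
    return CANDIDATES[data_model]


print(suggest_services("graph"))
print(suggest_services("nosql", needs_joins=True))
```

A real decision also weighs the other questions from section 1.1 (throughput, latency, cost), which this lookup deliberately ignores.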
3. RDS Overview
- Managed PostgreSQL/MySQL/Oracle/SQL Server
- Must provision the underlying EC2 instance type and the EBS volume type and size
- Support for Read Replicas and Multi-AZ
- Security through IAM, Security Groups, KMS, SSL in transit
- Backup/Snapshot/Point in time restore feature
- Managed and Scheduled maintenance
- Monitoring through CloudWatch
3.1 Use cases:
Store relational datasets (RDBMS/OLTP), perform SQL queries, and run transactional inserts, updates, and deletes
3.2 RDS for Solutions Architect
- Operations: Small downtime during failover and maintenance; scaling read replicas or the EC2 instance, or restoring EBS, requires manual intervention and application changes
- Security: AWS responsible for OS security, we are responsible for setting up KMS, security groups, IAM policies, authorizing users in DB, using SSL
- Reliability: Multi-AZ feature, failover in case of failures
- Performance: depends on EC2 instance type, EBS volume type, and the ability to add Read Replicas. Doesn't auto-scale
- Cost: Pay per hour based on provisioned EC2 and EBS
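Because RDS is billed on what is provisioned rather than what is used, a monthly estimate is simple arithmetic. The sketch below is a rough model only: the prices are placeholders, and treating a Multi-AZ standby and each read replica as one more billed instance with its own storage is an assumption, not something stated above.

```python
HOURS_PER_MONTH = 730  # common billing approximation (8760 hours / 12)


def rds_monthly_cost(instance_price_per_hour, storage_gb,
                     storage_price_per_gb_month, multi_az=False,
                     read_replicas=0):
    """Estimate a monthly RDS bill from provisioned instance and storage.

    Assumption: a Multi-AZ standby and each read replica are billed
    like one more instance with its own storage.
    """
    if read_replicas < 0:
        raise ValueError("read_replicas must be >= 0")
    billed_instances = 1 + (1 if multi_az else 0) + read_replicas
    per_instance = (instance_price_per_hour * HOURS_PER_MONTH
                    + storage_gb * storage_price_per_gb_month)
    return round(billed_instances * per_instance, 2)


# Placeholder prices, chosen only to make the arithmetic easy to follow.
single = rds_monthly_cost(0.10, 100, 0.10)
highly_available = rds_monthly_cost(0.10, 100, 0.10,
                                    multi_az=True, read_replicas=1)
print(single, highly_available)
```

The point of the model is that the bill scales with the number of provisioned instances, regardless of how busy they are.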
4. Aurora
- Compatible API for PostgreSQL/MySQL
- Data is held in 6 replicas, across 3 AZ
- Auto healing capability
- Multi-AZ, Auto Scaling Read Replicas
- Read Replicas can be Global
- Aurora database can be Global for DR or latency purposes
- Auto-scaling of storage from 10GB up to 128TB (64TB on older engine versions)
- Define EC2 instance type for aurora instances
- Same security/monitoring/maintenance features as RDS
- "Aurora Serverless" option
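To see why six copies across three AZs matter, it helps to count how many copies survive a failure. The sketch below assumes two copies per AZ and the quorum sizes AWS has published for Aurora storage (4 of 6 copies for writes, 3 of 6 for reads); those numbers are not in the notes above, and the function name is hypothetical.

```python
COPIES_PER_AZ = 2   # 6 copies spread evenly over 3 AZs
AZ_COUNT = 3
WRITE_QUORUM = 4    # assumption: quorum sizes as published by AWS
READ_QUORUM = 3


def storage_availability(failed_azs=0, extra_failed_copies=0):
    """Report whether reads and writes still reach quorum after failures."""
    if not 0 <= failed_azs <= AZ_COUNT:
        raise ValueError("failed_azs must be between 0 and 3")
    healthy = COPIES_PER_AZ * (AZ_COUNT - failed_azs) - extra_failed_copies
    healthy = max(healthy, 0)
    return {
        "healthy_copies": healthy,
        "can_write": healthy >= WRITE_QUORUM,
        "can_read": healthy >= READ_QUORUM,
    }


# Losing a whole AZ still leaves 4 copies, enough for writes.
print(storage_availability(failed_azs=1))
# Losing an AZ plus one more copy leaves 3: reads only.
print(storage_availability(failed_azs=1, extra_failed_copies=1))
```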
4.1 Use Case
- Same as RDS, but with less maintenance, more flexibility, and more performance, at a higher price
4.2 Aurora for Solutions Architect
- Operations: fewer operations, auto-scaling storage
- Security: AWS responsible for OS security, we are responsible for setting up KMS, security groups, IAM policies, authorizing users in DB, using SSL
- Reliability: Multi-AZ, highly available, possibly more than RDS, Aurora serverless option
- Performance: up to 5 times the performance of MySQL on RDS (according to AWS) due to architectural optimizations. Up to 15 Read Replicas (only 5 for RDS)
- Cost: Pay per hour based on EC2 and storage usage. Possibly lower costs compared to Enterprise-grade databases such as Oracle
5. DynamoDB
- Managed NoSQL database built on AWS proprietary technology
- Serverless, provisioned capacity, auto-scaling, on-demand capacity (since Nov 2018)
- Can replace ElastiCache as a key/value store (storing session data for example)
- Highly Available, Multi-AZ by default, Read and Writes are decoupled, DAX for read cache
- Reads can be eventually consistent or strongly consistent
- Security, authentication, and authorization are handled through IAM
- DynamoDB Streams to integrate with AWS Lambda
- Backup/Restore feature, Global Table feature
- Monitoring through CloudWatch
- Can only query on the primary key, sort key, or indexes
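The restriction to key-based queries shows up directly in the request shape. The sketch below builds the keyword arguments for a DynamoDB Query call in the low-level client format; the helper name and the table and attribute names are hypothetical, and only string-typed keys are handled.

```python
def build_query_params(table_name, partition_key, partition_value,
                       sort_key=None, sort_value_from=None):
    """Build keyword arguments for a DynamoDB Query request.

    A query must pin the partition key with equality; the sort key
    condition is optional. Attribute names go through placeholders
    (#pk, #sk) so reserved words such as "timestamp" are safe.
    """
    names = {"#pk": partition_key}
    values = {":pk": {"S": partition_value}}
    condition = "#pk = :pk"
    if sort_key is not None and sort_value_from is not None:
        names["#sk"] = sort_key
        values[":from"] = {"S": sort_value_from}
        condition += " AND #sk >= :from"
    return {
        "TableName": table_name,
        "KeyConditionExpression": condition,
        "ExpressionAttributeNames": names,
        "ExpressionAttributeValues": values,
    }


# Hypothetical table: sessions keyed by user_id, sorted by timestamp.
params = build_query_params("sessions", "user_id", "u-123",
                            sort_key="timestamp",
                            sort_value_from="2021-03-01")
print(params["KeyConditionExpression"])
```

With boto3 installed and credentials configured, the result could be passed as `boto3.client("dynamodb").query(**params)`. Filtering on any other attribute would need an index or a full table scan.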
5.1 Use Case
- Serverless application development (small documents, 100s of KB; the maximum item size is 400 KB)
- Distributed serverless cache
- No full SQL query language (no joins); PartiQL, a SQL-compatible syntax, has been supported since Nov 2020
- Supports transactions since Nov 2018
5.2 DynamoDB for Solutions Architect
- Operations: No operations needed, auto-scaling capacity, serverless
- Security: full security through IAM policies, KMS encryption, SSL in flight
- Reliability: Multi-AZ, Backups
- Performance: single-digit millisecond performance, DAX for caching reads, performance doesn't degrade if your application scales
- Cost: Pay per provisioned capacity and storage usage (no need to guess capacity in advance - can use auto-scaling)
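Provisioned capacity is expressed in read and write capacity units, so sizing a table is a small calculation. The sketch below uses the published unit definitions (one RCU covers one strongly consistent read per second of an item up to 4 KB, or two eventually consistent reads; one WCU covers one write per second of an item up to 1 KB). The function names are hypothetical.

```python
import math


def read_capacity_units(reads_per_second, item_size_kb,
                        strongly_consistent=True):
    """RCUs needed: item size is rounded up to the next 4 KB block."""
    units_per_read = math.ceil(item_size_kb / 4)
    total = reads_per_second * units_per_read
    if not strongly_consistent:
        total = total / 2  # eventually consistent reads cost half
    return math.ceil(total)


def write_capacity_units(writes_per_second, item_size_kb):
    """WCUs needed: item size is rounded up to the next 1 KB block."""
    return writes_per_second * math.ceil(item_size_kb)


print(read_capacity_units(10, 6))                              # strongly consistent
print(read_capacity_units(16, 12, strongly_consistent=False))  # eventually consistent
print(write_capacity_units(6, 4.5))
```

Halving the read cost is the practical reason to accept eventually consistent reads where the application can tolerate them.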
6. S3
- S3 is a key/value store for objects
- Great for big objects, not so great for small objects
- Serverless, scales infinitely, the max object size is 5TB
- Strong read-after-write consistency for all operations since Dec 2020 (previously eventual consistency for overwrites and deletes)
- Tiers: S3 Standard, S3 IA, S3 One Zone IA, Glacier for backups
- Features: Versioning, Encryption, Cross-Region Replication, etc.
- Security: IAM, Bucket Policies, ACL
- Encryption: SSE-S3, SSE-KMS, SSE-C, client side encryption, SSL in transit
6.1 Use case
Static files, key-value store for big files, website hosting
6.2 S3 for Solutions Architect
- Operations: no operations needed
- Security: IAM, Bucket Policies, ACL, Encryption (Server/Client), SSL
- Reliability: 99.999999999% (11 nines) durability / 99.99% availability, Multi-AZ, CRR
- Performance: Scales to thousands of reads/writes per second, transfer acceleration/multi-part for big files
- Cost: Pay per storage usage, network cost, and number of requests
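For big files, a multi-part upload has to respect S3's limits: at most 10,000 parts, each between 5 MiB and 5 GiB (the last part may be smaller). The sketch below picks a part size under those limits; the helper name and the 8 MiB preferred size are assumptions for illustration.

```python
import math

MIB = 1024 ** 2
GIB = 1024 ** 3
MIN_PART_SIZE = 5 * MIB
MAX_PART_SIZE = 5 * GIB
MAX_PARTS = 10_000


def plan_multipart(object_size, preferred_part_size=8 * MIB):
    """Return (part_size, part_count) for a multi-part upload.

    The part size grows beyond the preferred value only when the
    object would otherwise need more than 10,000 parts.
    """
    if object_size <= 0:
        raise ValueError("object_size must be positive")
    part_size = max(preferred_part_size, MIN_PART_SIZE,
                    math.ceil(object_size / MAX_PARTS))
    if part_size > MAX_PART_SIZE:
        raise ValueError("object too large for a multi-part upload")
    return part_size, math.ceil(object_size / part_size)


print(plan_multipart(100 * MIB))    # small file: preferred part size is kept
print(plan_multipart(1024 * GIB))   # 1 TiB: part size is raised to fit 10,000 parts
```

Parts can be uploaded in parallel, which is where the throughput gain for big files comes from.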
7. ElastiCache
- Managed Redis/Memcached (similar offering as RDS, but for caches)
- In-memory data store, sub-millisecond latency
- Must provision an EC2 instance type
- Support for Clustering (Redis) and Multi-AZ, Read Replicas (sharding)
- Security through IAM, Security Groups, KMS, Redis Auth
- Backup/Snapshot/Point in time restore feature
- Managed and Scheduled maintenance
- Monitoring through CloudWatch
7.1 Use Case
Key/value store, frequent reads with fewer writes, caching results of DB queries, storing session data for websites; cannot use SQL
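Caching DB query results usually follows the cache-aside pattern: check the cache first, fall back to the database on a miss, then store the result with an expiry. The sketch below uses an in-memory dictionary as a stand-in for Redis or Memcached; `TTLCache`, `get_user`, and the loader callback are hypothetical names.

```python
import time


class TTLCache:
    """Tiny in-memory stand-in for a cache with per-entry expiry."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._store = {}

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self.clock() >= expires_at:
            del self._store[key]  # expired: treat as a miss
            return None
        return value

    def set(self, key, value):
        self._store[key] = (value, self.clock() + self.ttl)


def get_user(user_id, cache, load_from_db):
    """Cache-aside read: only call the database on a cache miss."""
    key = f"user:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached
    value = load_from_db(user_id)
    cache.set(key, value)
    return value


cache = TTLCache(ttl_seconds=60)
fake_db = lambda user_id: {"id": user_id, "name": "example"}
print(get_user(1, cache, fake_db))  # miss: loaded from the fake database
print(get_user(1, cache, fake_db))  # hit: served from the cache
```

The TTL bounds how stale a cached row can get, which is the trade-off that makes this pattern fit read-heavy workloads.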
7.2 ElastiCache for Solutions Architect
- Operations: same as RDS
- Security: AWS responsible for OS security, we are responsible for setting up KMS, security groups, IAM policies, users (Redis Auth), using SSL
- Reliability: Clustering, Multi-AZ
- Performance: Sub-millisecond performance, in memory, read replicas for sharding, very popular cache option
- Cost: Pay per hour based on EC2 and storage usage
8. Athena
- Fully serverless database with SQL capabilities
- Used to query data in S3
- Pay per query
- Output results back to S3
- Secured through IAM
- Athena is a great candidate for lightweight queries that are not too complicated and do not need many joins
8.1 Use Case
One time SQL queries, serverless queries on S3, log analytics
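A one-off query needs little more than the SQL text, a database name, and an S3 location for the results. The sketch below builds the request for Athena's StartQueryExecution call; the helper name, the `access_logs` table, the `weblogs` database, and the bucket are hypothetical.

```python
def build_athena_request(sql, database, output_location):
    """Build keyword arguments for an Athena StartQueryExecution request."""
    if not output_location.startswith("s3://"):
        raise ValueError("output_location must be an s3:// URI")
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        # Athena writes the query results back to this S3 location.
        "ResultConfiguration": {"OutputLocation": output_location},
    }


# Hypothetical log-analytics query over a table defined on S3 data.
request = build_athena_request(
    sql="SELECT status, COUNT(*) AS hits FROM access_logs GROUP BY status",
    database="weblogs",
    output_location="s3://example-athena-results/queries/",
)
print(request["ResultConfiguration"]["OutputLocation"])
```

With boto3 installed and credentials configured, the request could be sent as `boto3.client("athena").start_query_execution(**request)`; the call returns an execution ID, and the results are read from the output location once the query finishes.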
8.2 Athena for Solutions Architect
- Operations: no operations needed, serverless
- Security: IAM + S3 security
- Reliability: managed service, uses Presto engine, highly available
- Performance: queries scale based on data size
- Cost: Pay per query / per TB of data scanned, serverless
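Since the bill follows the amount of data scanned, the cost of a query can be estimated from its scan size. The sketch below is a rough model under stated assumptions: the default price of 5 USD per TB, the 10 MB minimum per query, and treating 1 TB as 1024**4 bytes are not from the notes above and should be checked against current pricing for the region in use.

```python
TB = 1024 ** 4
MB = 1024 ** 2


def athena_query_cost(bytes_scanned, price_per_tb=5.0,
                      min_billable_bytes=10 * MB):
    """Estimate the cost of one query from the bytes it scans.

    Assumptions: price_per_tb and min_billable_bytes are placeholder
    defaults, not values taken from the text above.
    """
    if bytes_scanned < 0:
        raise ValueError("bytes_scanned must be >= 0")
    billable = max(bytes_scanned, min_billable_bytes)
    return billable / TB * price_per_tb


full_scan = athena_query_cost(1 * TB)
partial_scan = athena_query_cost(TB // 10)  # e.g. reading only some partitions
print(full_scan, partial_scan)
```

Anything that reduces the bytes scanned, such as narrower queries over less data, lowers the cost proportionally.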