  • Choosing the right database on AWS
    Cloud/AWS 2021. 3. 8. 17:47

    1. Overview

    AWS offers many managed databases to choose from.

    1.1 Questions to choose the right database based on your architecture:

    • Read-heavy, write-heavy, or balanced workload?
    • Throughput needs?
      • Will it change?
    • Does it need to scale or fluctuate during the day?
    • How much data to store and for how long?
      • Will it grow?
    • Average object size?
    • How are they accessed?
    • Data durability?
    • Source of truth for the data?
    • Latency requirement?
    • Concurrent users?
    • Data model?
    • How will you query the data?
      • Joins?
      • Structured?
      • Semi-structured?
    • Strong schema?
      • More flexibility?
    • Reporting?
    • Search?
    • RDBMS/NoSQL?
    • License costs?
    • Switch to a cloud-native DB such as Aurora?

    2. Database Types

    • RDBMS (= SQL / Online Transaction Processing (OLTP)): RDS, Aurora - great for joins
    • NoSQL database: DynamoDB (~JSON), ElastiCache (key/value pairs), Neptune (graphs) - no joins, no SQL
    • Object Store: S3 (for big objects) / Glacier (for backups/archives)
    • Data Warehouse (= SQL Analytics / BI): Redshift (Online Analytical Processing (OLAP)), Athena
    • Search: Elasticsearch (JSON) - free text, unstructured searches
    • Graphs: Neptune - displays relationships between data
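As a rough illustration, the questions from section 1.1 and the categories above could be turned into a small lookup. This is only a sketch: the `Workload` fields and the order of the checks are assumptions made for the example, not an official AWS decision procedure.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Workload:
    """Hypothetical summary of the answers to the questions in section 1.1."""
    needs_joins: bool = False
    analytics: bool = False            # OLAP / BI style reporting
    free_text_search: bool = False
    graph_relationships: bool = False
    large_objects: bool = False        # big files rather than small records
    in_memory_cache: bool = False


def suggest_services(w: Workload) -> List[str]:
    """Return candidate services, following the category list above."""
    if w.graph_relationships:
        return ["Neptune"]
    if w.free_text_search:
        return ["Elasticsearch"]
    if w.large_objects:
        return ["S3", "Glacier"]
    if w.analytics:
        return ["Redshift", "Athena"]
    if w.in_memory_cache:
        return ["ElastiCache"]
    if w.needs_joins:
        return ["RDS", "Aurora"]
    # No joins and no special access pattern: a key-based NoSQL store fits.
    return ["DynamoDB"]
```

For example, `suggest_services(Workload(needs_joins=True))` yields the relational options, while an empty `Workload()` falls through to DynamoDB.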

    3. RDS Overview

    • Managed PostgreSQL/MySQL/Oracle/SQL Server
    • Must provision the underlying EC2 instance type and the EBS volume type and size
    • Support for Read Replicas and Multi-AZ
    • Security through IAM, Security Groups, KMS, SSL in transit
    • Backup/Snapshot/Point in time restore feature
    • Managed and Scheduled maintenance
    • Monitoring through CloudWatch

    3.1 Use cases:

    Store relational datasets (RDBMS/OLTP), run SQL queries, and perform transactional inserts, updates, and deletes.
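A minimal local sketch of this use case, assuming Python's built-in `sqlite3` as a stand-in for an RDS engine (a real RDS instance would be reached through a PostgreSQL or MySQL driver instead). The table and column names are hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for an RDS engine such as PostgreSQL or MySQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, "
    "customer_id INTEGER NOT NULL REFERENCES customers(id), "
    "amount REAL NOT NULL)"
)

# Transactional inserts: the connection context manager commits on success
# and rolls back if an exception is raised inside the block.
with conn:
    conn.execute("INSERT INTO customers (id, name) VALUES (1, 'alice')")
    conn.execute("INSERT INTO orders (customer_id, amount) VALUES (1, 20.0)")
    conn.execute("INSERT INTO orders (customer_id, amount) VALUES (1, 5.5)")

try:
    with conn:
        conn.execute("INSERT INTO orders (customer_id, amount) VALUES (1, 99.0)")
        raise RuntimeError("simulated failure before commit")
except RuntimeError:
    pass  # the 99.0 order was rolled back with the failed transaction

# A join across the two tables, the kind of query an RDBMS is good at.
rows = conn.execute(
    "SELECT c.name, COUNT(o.id), SUM(o.amount) "
    "FROM customers c JOIN orders o ON o.customer_id = c.id "
    "GROUP BY c.name"
).fetchall()
```

Only the two committed orders show up in `rows`, which is the all-or-nothing behavior that "transactional" refers to.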

    3.2 RDS for Solutions Architect

    • Operations: Small downtime during failover and maintenance; scaling read replicas or the EC2 instance, or restoring EBS, requires manual intervention and application changes
    • Security: AWS responsible for OS security, we are responsible for setting up KMS, security groups, IAM policies, authorizing users in DB, using SSL
    • Reliability: Multi-AZ feature, failover in case of failures
    • Performance: Depends on the EC2 instance type and EBS volume type, plus the ability to add Read Replicas. Doesn't auto-scale
    • Cost: Pay per hour based on provisioned EC2 and EBS

    4. Aurora

    • Compatible API for PostgreSQL/MySQL
    • Data is held in 6 replicas, across 3 AZ
    • Auto healing capability
    • Multi-AZ, Auto Scaling Read Replicas
    • Read Replicas can be Global
    • Aurora database can be Global for DR or latency purposes
    • Auto-scaling of storage from 10 GB up to 128 TB (64 TB on older engine versions)
    • Define EC2 instance type for aurora instances
    • Same security/monitoring/maintenance features as RDS
    • "Aurora Serverless" option

    4.1 Use Case

    • Same as RDS but with less maintenance/more flexibility/more performance/pricier

    4.2 Aurora for Solutions Architect

    • Operations: fewer operations, auto-scaling storage
    • Security: AWS responsible for OS security, we are responsible for setting up KMS, security groups, IAM policies, authorizing users in DB, using SSL
    • Reliability: Multi-AZ, highly available, possibly more than RDS, Aurora serverless option
    • Performance: Up to 5 times the performance of standard MySQL (according to AWS) due to architectural optimizations. Up to 15 Read Replicas (only 5 for RDS)
    • Cost: Pay per hour based on EC2 and storage usage. Possibly lower costs compared to Enterprise-grade databases such as Oracle
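The "6 replicas across 3 AZ" layout is what gives Aurora its fault tolerance. The sketch below models it with the quorum sizes AWS has published for Aurora storage (4 of 6 copies to write, 3 of 6 to read); those two numbers come from AWS's description of the service rather than from the notes above, and the function name is made up for the example.

```python
COPIES_PER_AZ = 2
AZ_COUNT = 3
WRITE_QUORUM = 4  # copies that must be reachable to accept a write
READ_QUORUM = 3   # copies that must be reachable to serve a read


def availability(failed_copies: int) -> dict:
    """Report what the storage layer can still do after some copies fail."""
    total = COPIES_PER_AZ * AZ_COUNT
    if not 0 <= failed_copies <= total:
        raise ValueError("failed_copies must be between 0 and %d" % total)
    healthy = total - failed_copies
    return {
        "healthy": healthy,
        "can_write": healthy >= WRITE_QUORUM,
        "can_read": healthy >= READ_QUORUM,
    }


# Losing a whole AZ removes 2 copies; losing an AZ plus one more removes 3.
one_az_down = availability(2)
one_az_plus_one = availability(3)
```

Under these assumptions, losing an entire AZ still allows writes, and losing an AZ plus one more copy still allows reads.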

    5. DynamoDB

    • AWS proprietary technology managed NoSQL database
    • Serverless, provisioned capacity, auto-scaling, on-demand capacity (Nov 2018)
    • Can replace ElastiCache as a key/value store (storing session data for example)
    • Highly Available, Multi-AZ by default, Read and Writes are decoupled, DAX for read cache
    • Reads can be eventually consistent or strongly consistent
    • Security, authentication, and authorization are done through IAM
    • DynamoDB Streams to integrate with AWS Lambda
    • Backup/Restore feature, Global Table feature
    • Monitoring through CloudWatch
    • Can only query on the primary key, sort key, or indexes
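To make the key-only querying and the two read consistency modes concrete, here is a sketch that builds the parameters for the low-level DynamoDB `Query` operation. The table name and the `user_id` / `created_at` attributes are hypothetical; the parameter names (`KeyConditionExpression`, `ExpressionAttributeNames`, `ExpressionAttributeValues`, `ConsistentRead`) are those of the Query API.

```python
from typing import Any, Dict


def build_query(table: str, user_id: str, since: str,
                strongly_consistent: bool = False) -> Dict[str, Any]:
    """Build keyword arguments for a DynamoDB Query on partition + sort key."""
    return {
        "TableName": table,
        # Only key attributes may appear in the key condition:
        # equality on the partition key, a range condition on the sort key.
        "KeyConditionExpression": "#pk = :pk AND #sk >= :since",
        "ExpressionAttributeNames": {"#pk": "user_id", "#sk": "created_at"},
        "ExpressionAttributeValues": {
            ":pk": {"S": user_id},
            ":since": {"S": since},
        },
        # False = eventually consistent read, True = strongly consistent read.
        "ConsistentRead": strongly_consistent,
    }


params = build_query("sessions", "u1", "2021-03-01")
```

With boto3 installed and credentials configured, these parameters could be passed as `boto3.client("dynamodb").query(**params)`. Filtering on a non-key attribute would need an index or a full scan.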

    5.1 Use Case

    • Serverless application development (small documents, up to hundreds of KB)
    • Distributed serverless cache
    • No SQL query language available
    • Transaction capability since Nov 2018

    5.2 DynamoDB for Solutions Architect

    • Operations: No operations needed, auto-scaling capacity, serverless
    • Security: full security through IAM policies, KMS encryption, SSL in flight
    • Reliability: Multi-AZ, Backups
    • Performance: single-digit millisecond performance, DAX for caching reads, performance doesn't degrade as your application scales
    • Cost: Pay per provisioned capacity and storage usage (no need to guess capacity in advance; auto-scaling can adjust it)
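Provisioned capacity is expressed in read and write capacity units. The sketch below works through the arithmetic, using the documented unit sizes (one RCU covers one strongly consistent read per second of an item up to 4 KB, or two eventually consistent reads; one WCU covers one write per second of an item up to 1 KB). The function names and sample workload are made up for the example.

```python
import math


def read_capacity_units(item_kb: float, reads_per_sec: int,
                        strongly_consistent: bool = True) -> int:
    """RCUs needed: items are rounded up to 4 KB blocks per read."""
    units_per_read = math.ceil(item_kb / 4)
    total = units_per_read * reads_per_sec
    if not strongly_consistent:
        # Eventually consistent reads cost half as much.
        total = total / 2
    return math.ceil(total)


def write_capacity_units(item_kb: float, writes_per_sec: int) -> int:
    """WCUs needed: items are rounded up to 1 KB blocks per write."""
    return math.ceil(item_kb) * writes_per_sec


strong = read_capacity_units(6, 10)                                # 2 units x 10 reads
eventual = read_capacity_units(6, 10, strongly_consistent=False)   # half of that
writes = write_capacity_units(2.5, 5)                              # 3 units x 5 writes
```

This is one reason the consistency choice matters for cost: for the same traffic, eventually consistent reads need half the provisioned read capacity.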

    6. S3

    • S3 is a key/value store for objects
    • Great for big objects, not so great for small objects
    • Serverless, scales infinitely, the max object size is 5TB
    • Strong read-after-write consistency for all operations since Dec 2020 (before that, overwrites and deletes were eventually consistent)
    • Tiers: S3 Standard, S3 IA, S3 One Zone IA, Glacier for backups
    • Features: Versioning, Encryption, Cross-Region Replication, etc.
    • Security: IAM, Bucket Policies, ACL
    • Encryption: SSE-S3, SSE-KMS, SSE-C, client side encryption, SSL in transit

    6.1 Use case

    Static files, key-value store for big files, website hosting

    6.2 S3 for Solutions Architect

    • Operations: no operations needed
    • Security: IAM, Bucket Policies, ACL, Encryption (Server/Client), SSL
    • Reliability: 99.999999999% (11 nines) durability / 99.99% availability, Multi-AZ, CRR
    • Performance: Scales to thousands of reads/writes per second, transfer acceleration / multi-part upload for big files
    • Cost: Pay per storage usage, network cost, number of requests
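For the multi-part upload mentioned under Performance, a client has to pick a part size that respects S3's limits (at most 10,000 parts, each between 5 MiB and 5 GiB except the last). The planner below is a sketch of that calculation only; the function name and the 64 MiB default are assumptions, and no upload is performed.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3
TIB = 1024 ** 4

MIN_PART = 5 * MIB     # minimum part size (the last part may be smaller)
MAX_PART = 5 * GIB     # maximum part size
MAX_PARTS = 10_000     # maximum number of parts per upload


def plan_multipart(object_size: int, preferred_part: int = 64 * MIB):
    """Return (part_size, part_count) for an object of object_size bytes."""
    if object_size <= 0:
        raise ValueError("object_size must be positive")
    # Smallest part size that keeps the upload within MAX_PARTS (integer ceil).
    needed = -(-object_size // MAX_PARTS)
    part_size = max(preferred_part, MIN_PART, needed)
    if part_size > MAX_PART:
        raise ValueError("object is too large for a single multipart upload")
    part_count = -(-object_size // part_size)
    return part_size, part_count


small = plan_multipart(1 * GIB)   # fits comfortably with the preferred size
large = plan_multipart(5 * TIB)   # part size grows to stay under 10,000 parts
```

In practice an SDK's transfer utilities usually make this choice for you; the sketch just shows where the limits bite for very large objects.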

    7. ElastiCache

    • Managed Redis/Memcached (similar offering as RDS, but for caches)
    • In-memory data store, sub-millisecond latency
    • Must provision an EC2 instance type
    • Support for Clustering (Redis) and Multi-AZ, Read Replicas (sharding)
    • Security through IAM, Security Groups, KMS, Redis Auth
    • Backup/Snapshot/Point in time restore feature
    • Managed and Scheduled maintenance
    • Monitoring through CloudWatch

    7.1 Use Case

    Key/value store, frequent reads and fewer writes, caching results of DB queries, storing session data for websites. Cannot use SQL.
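Caching DB query results usually follows the cache-aside pattern: check the cache, fall back to the database on a miss, then store the result with an expiry. The sketch below uses a small in-process `TTLCache` class as a stand-in for Redis or Memcached; the class, the key format, and `slow_query` are all hypothetical.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TTLCache:
    """Tiny in-process stand-in for a Redis/Memcached GET and SET with expiry."""

    def __init__(self, ttl_seconds: float) -> None:
        self.ttl = ttl_seconds
        self._data: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str) -> Optional[Any]:
        entry = self._data.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if time.monotonic() >= expires_at:
            del self._data[key]  # expired: behave like a miss
            return None
        return value

    def set(self, key: str, value: Any) -> None:
        self._data[key] = (time.monotonic() + self.ttl, value)


def cached_query(cache: TTLCache, key: str, run_query: Callable[[], Any]) -> Any:
    """Cache-aside: return the cached value, or run the query and cache it."""
    value = cache.get(key)
    if value is None:  # miss (note: None results are never cached here)
        value = run_query()
        cache.set(key, value)
    return value


calls = []


def slow_query() -> dict:
    calls.append(1)  # count how often the "database" is actually hit
    return {"user": "alice"}


cache = TTLCache(ttl_seconds=60)
first = cached_query(cache, "user:1", slow_query)
second = cached_query(cache, "user:1", slow_query)  # served from the cache
```

The second call never reaches `slow_query`, which is the whole point for read-heavy workloads. The trade-off is that cached data can be stale until the TTL expires.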

    7.2 ElastiCache for Solutions Architect

    • Operations: same as RDS
    • Security: AWS responsible for OS security, we are responsible for setting up KMS, security groups, IAM policies, users (Redis Auth), using SSL
    • Reliability: Clustering, Multi-AZ
    • Performance: Sub-millisecond performance, in memory, read replicas for sharding, very popular cache option
    • Cost: Pay per hour based on EC2 and storage usage

    8. Athena

    • Fully serverless database with SQL capabilities
    • Used to query data in S3
    • Pay per query
    • Output results back to S3
    • Secured through IAM
    • Great candidate for lightweight queries that are not too complicated and don't have too many joins

    8.1 Use Case

    One-time SQL queries, serverless queries on S3, log analytics

    8.2 Athena for Solutions Architect

    • Operations: no operations needed, serverless
    • Security: IAM + S3 security
    • Reliability: managed service, uses Presto engine, highly available
    • Performance: queries scale based on data size
    • Cost: Pay per query / per TB of data scanned, serverless
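Since the bill depends on data scanned, a quick estimate helps when deciding how to store data for Athena. In this sketch the price per TB and the per-query minimum are assumptions passed in as parameters (5.0 USD and 10 MB are used as placeholder defaults); check current pricing for your region before relying on the numbers.

```python
TIB = 1024 ** 4
MIB = 1024 ** 2


def athena_query_cost(bytes_scanned: int,
                      price_per_tb: float = 5.0,       # assumed placeholder price
                      min_bytes: int = 10 * MIB) -> float:
    """Estimate the cost of one query from the amount of data it scans."""
    if bytes_scanned < 0:
        raise ValueError("bytes_scanned must not be negative")
    billed = max(bytes_scanned, min_bytes)  # assumed per-query minimum
    return billed / TIB * price_per_tb


full_scan = athena_query_cost(1 * TIB)        # scanning a whole 1 TiB dataset
pruned_scan = athena_query_cost(TIB // 16)    # same query touching 1/16 of it
```

Anything that reduces bytes scanned, such as partitioning the S3 data or using a compressed columnar format, lowers the cost proportionally.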
