AWS MSK
Data Engineering 2022. 7. 1. 20:05
Overview
- Alternative to Kinesis (see Kinesis Data Streams vs Amazon MSK at the end)
- Fully managed Apache Kafka on AWS
- Lets you create, update, and delete clusters
- MSK creates & manages Kafka broker nodes & ZooKeeper nodes for you
- Deploy the MSK cluster in your VPC, multi-AZ (up to 3 for HA)
- Automatic recovery from common Apache Kafka failures
- Data is stored on EBS volumes
- You can build producers and consumers for your clusters
- Default message size of 1 MB
- Larger messages (ex: 10 MB) can be sent into Kafka after custom configuration
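Raising the message size usually means changing several settings together, not just one. A minimal sketch, assuming a 10 MB target: the property names are standard Apache Kafka settings, while the helper names `large_message_settings` and `to_server_properties` are hypothetical and only group the values for illustration.

```python
def large_message_settings(max_bytes: int) -> dict:
    """Group the Kafka settings that typically change together for large messages."""
    if max_bytes <= 0:
        raise ValueError("max_bytes must be positive")
    return {
        # Broker side: goes into an MSK custom configuration.
        "broker": {
            "message.max.bytes": max_bytes,
            "replica.fetch.max.bytes": max_bytes,
        },
        # Topic-level override of the broker default.
        "topic": {"max.message.bytes": max_bytes},
        # Client side: producers and consumers need matching limits.
        "producer": {"max.request.size": max_bytes},
        "consumer": {"max.partition.fetch.bytes": max_bytes},
    }


def to_server_properties(broker_settings: dict) -> str:
    """Render broker settings as key=value lines (server.properties style)."""
    return "\n".join(f"{key}={value}" for key, value in sorted(broker_settings.items()))


settings = large_message_settings(10 * 1024 * 1024)
print(to_server_properties(settings["broker"]))
```

The rendered text is the kind of content an MSK custom configuration holds; the producer and consumer values still have to be set in each client application.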
Configuration
- Choose the number of AZs (3 - recommended, or 2)
- Choose the VPC & Subnets
- The broker instance type (ex: kafka.m5.large)
- The number of brokers per AZ (can add brokers later)
- Size of your EBS volumes (1 GB - 16 TB)
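The choices above map onto the request body of the MSK `CreateCluster` API. A minimal sketch that only builds and validates the request as a plain dict: the helper name `build_cluster_request`, the Kafka version string, and the subnet and security group IDs are placeholders, not values from these notes.

```python
def build_cluster_request(name, kafka_version, subnet_ids, security_group_ids,
                          brokers_per_az, instance_type="kafka.m5.large",
                          volume_size_gib=100):
    """Build a CreateCluster request dict from the configuration choices."""
    if len(subnet_ids) not in (2, 3):
        raise ValueError("use one subnet per AZ, in 2 or 3 AZs")
    if brokers_per_az < 1:
        raise ValueError("need at least one broker per AZ")
    if not 1 <= volume_size_gib <= 16384:  # 1 GiB to 16 TiB
        raise ValueError("EBS volume size must be between 1 and 16384 GiB")
    return {
        "ClusterName": name,
        "KafkaVersion": kafka_version,
        # The total broker count is a multiple of the number of AZs.
        "NumberOfBrokerNodes": brokers_per_az * len(subnet_ids),
        "BrokerNodeGroupInfo": {
            "InstanceType": instance_type,
            "ClientSubnets": list(subnet_ids),
            "SecurityGroups": list(security_group_ids),
            "StorageInfo": {"EbsStorageInfo": {"VolumeSize": volume_size_gib}},
        },
    }


request = build_cluster_request(
    name="demo-cluster",                    # placeholder
    kafka_version="3.5.1",                  # placeholder version string
    subnet_ids=["subnet-aaa", "subnet-bbb", "subnet-ccc"],  # placeholders
    security_group_ids=["sg-123"],          # placeholder
    brokers_per_az=1,
)
print(request["NumberOfBrokerNodes"])
```

With boto3 installed and credentials configured, the dict could be passed as `boto3.client("kafka").create_cluster(**request)`; that call is left out here so the sketch runs offline.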
Security
Encryption
- Optional in-flight using TLS between the brokers
- Optional in-flight with TLS between the clients and brokers
- At rest for your EBS volumes using KMS
Network Security
- Authorize specific security groups for your Apache Kafka clients
Authentication & Authorization (important)
- Define who can read/write to which topics
- Mutual TLS (AuthN) + Kafka ACLs (AuthZ)
- SASL/SCRAM (AuthN) + Kafka ACLs (AuthZ)
- IAM Access Control (AuthN + AuthZ)
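With IAM Access Control, "who can read/write to which topics" is expressed as an IAM policy using `kafka-cluster:*` actions. A minimal sketch, assuming one topic and any consumer group: the helper name `topic_access_policy` and the region, account ID, cluster name, and UUID are made-up placeholders.

```python
import json


def topic_access_policy(region, account_id, cluster_name, cluster_uuid,
                        topic, can_write=False):
    """Build an IAM policy document granting access to a single topic."""
    base = f"arn:aws:kafka:{region}:{account_id}"
    cluster_arn = f"{base}:cluster/{cluster_name}/{cluster_uuid}"
    topic_arn = f"{base}:topic/{cluster_name}/{cluster_uuid}/{topic}"
    group_arn = f"{base}:group/{cluster_name}/{cluster_uuid}/*"

    topic_actions = ["kafka-cluster:DescribeTopic", "kafka-cluster:ReadData"]
    if can_write:
        topic_actions.append("kafka-cluster:WriteData")

    return {
        "Version": "2012-10-17",
        "Statement": [
            # Every client must be allowed to connect to the cluster.
            {"Effect": "Allow", "Action": ["kafka-cluster:Connect"],
             "Resource": cluster_arn},
            # Topic-level read (and optionally write) permissions.
            {"Effect": "Allow", "Action": topic_actions, "Resource": topic_arn},
            # Consumers also need to join and describe consumer groups.
            {"Effect": "Allow",
             "Action": ["kafka-cluster:AlterGroup", "kafka-cluster:DescribeGroup"],
             "Resource": group_arn},
        ],
    }


policy = topic_access_policy("us-east-1", "111122223333", "demo-cluster",
                             "abcd1234", "orders", can_write=True)
print(json.dumps(policy, indent=2))
```

The other two options (Mutual TLS, SASL/SCRAM) only authenticate the client; authorization is then handled inside Kafka with ACLs instead of a policy like this one.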
Monitoring
CloudWatch Metrics
- Basic monitoring (cluster and broker metrics)
- Enhanced monitoring (++enhanced broker metrics)
- Topic-level monitoring (++enhanced topic-level metrics)
Prometheus (Open-Source Monitoring)
- Opens a port on the broker to export cluster, broker and topic-level metrics
- Set up the JMX Exporter (Kafka metrics) or Node Exporter (CPU and disk metrics)
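Once open monitoring is enabled, a Prometheus server scrapes each broker directly. A minimal sketch that generates a file-based service discovery target list, assuming the ports documented for MSK open monitoring (11001 for the JMX Exporter, 11002 for the Node Exporter); the broker hostnames and the helper name `prometheus_targets` are placeholders.

```python
import json

JMX_EXPORTER_PORT = 11001   # Kafka (JMX) metrics
NODE_EXPORTER_PORT = 11002  # CPU and disk metrics


def prometheus_targets(broker_hosts):
    """Build a Prometheus file_sd target list with one job per exporter."""
    return [
        {"labels": {"job": "jmx"},
         "targets": [f"{host}:{JMX_EXPORTER_PORT}" for host in broker_hosts]},
        {"labels": {"job": "node"},
         "targets": [f"{host}:{NODE_EXPORTER_PORT}" for host in broker_hosts]},
    ]


# Placeholder broker DNS names; real ones come from the cluster's broker list.
targets = prometheus_targets(["b-1.example.internal", "b-2.example.internal"])
print(json.dumps(targets, indent=2))
```

The printed JSON can be saved as a targets file and referenced from `file_sd_configs` in the Prometheus configuration. The Prometheus host also needs network access to those ports through the brokers' security group.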
Broker Log Delivery
- Delivery to CloudWatch Logs
- Delivery to Amazon S3
- Delivery to Kinesis Data Firehose
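Broker log destinations are set through the `LoggingInfo` structure of the MSK API, where the streaming destination is keyed `Firehose` (a Firehose delivery stream). A minimal sketch that enables only the destinations passed in; the helper name `build_logging_info` and the log group, bucket, and stream names are placeholders.

```python
def build_logging_info(log_group=None, s3_bucket=None, s3_prefix="",
                       firehose_stream=None):
    """Build the LoggingInfo dict, enabling only the destinations provided."""
    broker_logs = {
        "CloudWatchLogs": {"Enabled": log_group is not None},
        "S3": {"Enabled": s3_bucket is not None},
        "Firehose": {"Enabled": firehose_stream is not None},
    }
    if log_group is not None:
        broker_logs["CloudWatchLogs"]["LogGroup"] = log_group
    if s3_bucket is not None:
        broker_logs["S3"]["Bucket"] = s3_bucket
        broker_logs["S3"]["Prefix"] = s3_prefix
    if firehose_stream is not None:
        broker_logs["Firehose"]["DeliveryStream"] = firehose_stream
    return {"BrokerLogs": broker_logs}


logging_info = build_logging_info(log_group="/msk/demo-cluster",   # placeholder
                                  s3_bucket="my-msk-logs",         # placeholder
                                  s3_prefix="broker/")
print(logging_info)
```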
MSK Connect
- Managed Kafka Connect workers on AWS
- Auto-scaling capabilities for workers
- You can deploy any Kafka Connect connectors to MSK Connect as a plugin
- Ex: S3, Redshift, OpenSearch, Debezium, etc.
- Example pricing: pay $0.11 per worker per hour
MSK Serverless
- Run Apache Kafka on MSK without managing the capacity
- MSK automatically provisions resources and scales compute & storage
- You just define your topics and your partitions and you’re good to go
- Security: IAM Access Control for all clusters
- Example Pricing:
- $0.75 per cluster per hour = $558 monthly per cluster (31-day month)
- $0.0015 per partition per hour = $1.08 monthly per partition (30-day month)
- $0.10 per GB of storage each month
- $0.10 per GB in
- $0.10 per GB out
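The example rates above can be combined into a rough monthly estimate. A minimal sketch using only the prices listed in these notes: the helper name `serverless_monthly_cost` and the workload numbers are made up, and a 30-day month (720 hours) is assumed by default. Passing `hours=744` reproduces the $558 per-cluster figure.

```python
HOURS_30_DAYS = 24 * 30  # 720 hours

# Example rates from the notes above (USD).
CLUSTER_PER_HOUR = 0.75
PARTITION_PER_HOUR = 0.0015
STORAGE_PER_GB_MONTH = 0.10
DATA_IN_PER_GB = 0.10
DATA_OUT_PER_GB = 0.10


def serverless_monthly_cost(partitions, storage_gb, gb_in, gb_out,
                            clusters=1, hours=HOURS_30_DAYS):
    """Estimate a monthly MSK Serverless bill from the example rates."""
    total = (
        CLUSTER_PER_HOUR * hours * clusters
        + PARTITION_PER_HOUR * hours * partitions
        + STORAGE_PER_GB_MONTH * storage_gb
        + DATA_IN_PER_GB * gb_in
        + DATA_OUT_PER_GB * gb_out
    )
    return round(total, 2)


# Hypothetical workload: 10 partitions, 100 GB stored, 50 GB in, 50 GB out.
estimate = serverless_monthly_cost(partitions=10, storage_gb=100, gb_in=50, gb_out=50)
print(f"${estimate:.2f}")
```

For this workload the fixed per-cluster charge dominates, which is why the hourly cluster rate matters most for small deployments.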
Kinesis Data Streams vs Amazon MSK
| | Kinesis Data Streams | Amazon MSK |
|---|---|---|
| Message Size Limit | 1 MB message size limit | 1 MB default, configurable higher (ex: 10 MB) |
| Distribution | Data Streams with Shards | Kafka Topics with Partitions |
| Sizing | Shard Splitting & Merging | Can only add partitions to a topic |
| In-flight Security | TLS in-flight encryption | PLAINTEXT or TLS in-flight encryption |
| At-rest Security | KMS at-rest encryption | KMS at-rest encryption |
| Auth | IAM policies for AuthN/AuthZ | Mutual TLS (AuthN) + Kafka ACLs (AuthZ); SASL/SCRAM (AuthN) + Kafka ACLs (AuthZ); IAM Access Control (AuthN + AuthZ) |