  • AWS MSK
    Data Engineering 2022. 7. 1. 20:05

    Overview

    • Alternative to Kinesis (see the Kinesis Data Streams vs Amazon MSK comparison at the end)
    • Fully managed Apache Kafka on AWS
      • Allows you to create, update, and delete clusters
      • MSK creates & manages Kafka broker nodes & ZooKeeper nodes for you
      • Deploy the MSK cluster in your VPC, multi-AZ (up to 3 for HA)
      • Automatic recovery from common Apache Kafka failures
      • Data is stored on EBS volumes
    • You can build producers and consumers for your clusters
      • Default maximum message size of 1 MB
      • Larger messages (ex: 10 MB) can be sent to Kafka after custom configuration
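
    The "custom configuration" for large messages usually means raising several size limits together. The sketch below collects the standard Apache Kafka property names involved; the helper name `large_message_settings` is hypothetical, and which of these settings a given cluster actually needs is an assumption to check against your own setup.

```python
def large_message_settings(max_bytes: int) -> dict[str, dict[str, int]]:
    """Group the Kafka size limits that typically have to agree on max_bytes."""
    if max_bytes <= 0:
        raise ValueError("max_bytes must be positive")
    return {
        # Broker-side limits (applied through a custom MSK configuration).
        "broker": {
            "message.max.bytes": max_bytes,
            "replica.fetch.max.bytes": max_bytes,
        },
        # Topic-level override of the broker limit.
        "topic": {"max.message.bytes": max_bytes},
        # The producer must be allowed to send a request this large.
        "producer": {"max.request.size": max_bytes},
        # The consumer must be able to fetch a record this large.
        "consumer": {"max.partition.fetch.bytes": max_bytes},
    }


TEN_MB = 10 * 1024 * 1024
settings = large_message_settings(TEN_MB)
```

    Keeping all values in one place makes it harder to raise the broker limit while forgetting the producer or consumer side.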

    Configuration

    • Choose the number of AZs (3 - recommended, or 2)
    • Choose the VPC & Subnets
    • The broker instance type (ex: kafka.m5.large)
    • The number of brokers per AZ (can add brokers later)
    • Size of your EBS volumes (1 GB - 16 TB)
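
    The choices above map fairly directly onto the parameters of the MSK `CreateCluster` API. The following is a minimal sketch that only builds the request dictionary; the function name, the subnet and security group IDs, and the Kafka version string are placeholder assumptions, not values from this post.

```python
def build_cluster_request(
    name: str,
    kafka_version: str,
    subnet_ids: list[str],
    security_group_ids: list[str],
    brokers_per_az: int = 1,
    instance_type: str = "kafka.m5.large",
    volume_size_gib: int = 100,
) -> dict:
    """Build keyword arguments for the MSK CreateCluster call (one subnet per AZ)."""
    if len(subnet_ids) not in (2, 3):
        raise ValueError("use 2 or 3 subnets, one per AZ")
    if brokers_per_az < 1:
        raise ValueError("need at least one broker per AZ")
    if not 1 <= volume_size_gib <= 16384:
        raise ValueError("EBS volume size must be between 1 GiB and 16 TiB")
    return {
        "ClusterName": name,
        "KafkaVersion": kafka_version,
        # The total broker count must be a multiple of the number of AZs.
        "NumberOfBrokerNodes": brokers_per_az * len(subnet_ids),
        "BrokerNodeGroupInfo": {
            "InstanceType": instance_type,
            "ClientSubnets": list(subnet_ids),
            "SecurityGroups": list(security_group_ids),
            "StorageInfo": {"EbsStorageInfo": {"VolumeSize": volume_size_gib}},
        },
    }


# Placeholder IDs and version, for illustration only.
request = build_cluster_request(
    name="demo-cluster",
    kafka_version="3.5.1",
    subnet_ids=["subnet-aaa", "subnet-bbb", "subnet-ccc"],
    security_group_ids=["sg-111"],
    brokers_per_az=2,
)
```

    With boto3 installed and credentials configured, the dictionary could be passed as `boto3.client("kafka").create_cluster(**request)`.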

    Security

    Encryption

    • Optional in-flight encryption using TLS between the brokers
    • Optional in-flight encryption using TLS between the clients and brokers
    • At rest for your EBS volumes using KMS
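
    These three options correspond to the `EncryptionInfo` structure of the MSK `CreateCluster` API. A small sketch, assuming a hypothetical helper name and a placeholder KMS key ARN:

```python
from typing import Optional

CLIENT_BROKER_MODES = {"TLS", "TLS_PLAINTEXT", "PLAINTEXT"}


def build_encryption_info(
    kms_key_arn: Optional[str] = None,
    client_broker: str = "TLS",
    in_cluster: bool = True,
) -> dict:
    """Build the EncryptionInfo block for an MSK cluster request."""
    if client_broker not in CLIENT_BROKER_MODES:
        raise ValueError(f"client_broker must be one of {sorted(CLIENT_BROKER_MODES)}")
    info = {
        "EncryptionInTransit": {
            "ClientBroker": client_broker,  # clients <-> brokers
            "InCluster": in_cluster,        # broker <-> broker
        }
    }
    if kms_key_arn is not None:
        # Without an explicit key, MSK falls back to an AWS managed key.
        info["EncryptionAtRest"] = {"DataVolumeKMSKeyId": kms_key_arn}
    return info


# Placeholder ARN, for illustration only.
encryption = build_encryption_info(
    kms_key_arn="arn:aws:kms:us-east-1:111122223333:key/example"
)
```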

    Network Security

    • Authorize specific security groups for your Apache Kafka clients

    Authentication & Authorization (important)

    • Define who can read/write to which topics
    • Mutual TLS (AuthN) + Kafka ACLs (AuthZ)
    • SASL/SCRAM (AuthN) + Kafka ACLs (AuthZ)
    • IAM Access Control (AuthN + AuthZ)
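
    On the client side, each of the three options means a different listener port and a different set of Kafka client properties. The sketch below uses the Java client property names and the in-VPC ports MSK uses by default; the helper name and the broker hostnames are placeholders, and credentials (keystore for mutual TLS, username and password for SASL/SCRAM) are deliberately left out.

```python
AUTH_PROFILES = {
    # Mutual TLS: the client also needs ssl.keystore.* settings (omitted here).
    "mtls": {"port": 9094, "properties": {"security.protocol": "SSL"}},
    # SASL/SCRAM: the client also needs sasl.jaas.config with its credentials.
    "scram": {
        "port": 9096,
        "properties": {
            "security.protocol": "SASL_SSL",
            "sasl.mechanism": "SCRAM-SHA-512",
        },
    },
    # IAM Access Control: requires the aws-msk-iam-auth library on the classpath.
    "iam": {
        "port": 9098,
        "properties": {
            "security.protocol": "SASL_SSL",
            "sasl.mechanism": "AWS_MSK_IAM",
            "sasl.jaas.config": "software.amazon.msk.auth.iam.IAMLoginModule required;",
            "sasl.client.callback.handler.class":
                "software.amazon.msk.auth.iam.IAMClientCallbackHandler",
        },
    },
}


def client_properties(mode: str, broker_hosts: list[str]) -> dict[str, str]:
    """Return base client properties for the chosen authentication mode."""
    if mode not in AUTH_PROFILES:
        raise ValueError(f"unknown auth mode: {mode}")
    profile = AUTH_PROFILES[mode]
    servers = ",".join(f"{host}:{profile['port']}" for host in broker_hosts)
    return {"bootstrap.servers": servers, **profile["properties"]}


# Placeholder hostnames, for illustration only.
props = client_properties("iam", ["b-1.example.internal", "b-2.example.internal"])
```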

    Monitoring

    CloudWatch Metrics

    • Basic monitoring (cluster and broker metrics)
    • Enhanced monitoring (++enhanced broker metrics)
    • Topic-level monitoring (++enhanced topic-level metrics)

    Prometheus (Open-Source Monitoring)

    • Opens a port on the broker to export cluster, broker and topic-level metrics
    • Setup the JMX Exporter (metrics) or Node Exporter (CPU and disk metrics)
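
    With open monitoring enabled, MSK brokers expose the JMX Exporter on port 11001 and the Node Exporter on port 11002. As a rough sketch, a Prometheus file-based service discovery target list could be generated like this; the function name and broker hostnames are placeholders.

```python
import json

JMX_EXPORTER_PORT = 11001   # Kafka (JMX) metrics
NODE_EXPORTER_PORT = 11002  # CPU and disk metrics


def prometheus_targets(broker_hosts: list[str]) -> list[dict]:
    """Build file_sd-style target groups for the two exporters on each broker."""
    return [
        {
            "labels": {"job": "jmx"},
            "targets": [f"{host}:{JMX_EXPORTER_PORT}" for host in broker_hosts],
        },
        {
            "labels": {"job": "node"},
            "targets": [f"{host}:{NODE_EXPORTER_PORT}" for host in broker_hosts],
        },
    ]


# Placeholder hostnames, for illustration only.
hosts = ["b-1.example.internal", "b-2.example.internal"]
targets_json = json.dumps(prometheus_targets(hosts), indent=2)
```

    The resulting JSON can be written to a file that a `file_sd_configs` entry in `prometheus.yml` points at.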

    Broker Log Delivery

    • Delivery to CloudWatch Logs
    • Delivery to Amazon S3
    • Delivery to Amazon Data Firehose (formerly Kinesis Data Firehose)

    MSK Connect

    • Managed Kafka Connect workers on AWS
    • Auto-scaling capabilities for workers
    • You can deploy any Kafka Connect connectors to MSK Connect as a plugin
      • S3, Redshift, OpenSearch, Debezium, etc.
    • Example pricing: Pay $ 0.11 per worker per hour

    MSK Serverless

    • Run Apache Kafka on MSK without managing the capacity
    • MSK automatically provisions resources and scales compute & storage
    • You just define your topics and your partitions and you’re good to go
    • Security: IAM Access Control for all clusters
    • Example Pricing:
      • $0.75 per cluster per hour = $558 monthly per cluster (31-day month, 744 hours)
      • $0.0015 per partition per hour = $1.08 monthly per partition (30-day month, 720 hours)
      • $0.10 per GB of storage each month
      • $0.10 per GB in
      • $0.10 per GB out
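
    To see how these example rates add up, here is a small estimate in Python. The rates are the example figures listed above, not current AWS pricing; the function name and the sample workload are assumptions. Note that the listed $558 corresponds to a 744-hour month and the $1.08 to a 720-hour month, so the hours are passed explicitly.

```python
CLUSTER_PER_HOUR = 0.75
PARTITION_PER_HOUR = 0.0015
STORAGE_PER_GB_MONTH = 0.10
DATA_IN_PER_GB = 0.10
DATA_OUT_PER_GB = 0.10


def serverless_monthly_cost(
    partitions: int,
    storage_gb: float,
    gb_in: float,
    gb_out: float,
    hours: int = 24 * 30,
) -> float:
    """Estimate one month of MSK Serverless cost for a single cluster."""
    cluster = CLUSTER_PER_HOUR * hours
    partition = PARTITION_PER_HOUR * hours * partitions
    storage = STORAGE_PER_GB_MONTH * storage_gb
    transfer = DATA_IN_PER_GB * gb_in + DATA_OUT_PER_GB * gb_out
    # Round to cents to hide floating-point noise.
    return round(cluster + partition + storage + transfer, 2)


# Hypothetical workload: 10 partitions, 100 GB stored, 50 GB in, 50 GB out.
estimate = serverless_monthly_cost(partitions=10, storage_gb=100, gb_in=50, gb_out=50)
```

    In this example the fixed per-cluster charge dominates, which is worth keeping in mind for small workloads.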

    Kinesis Data Streams vs Amazon MSK

                         Kinesis Data Streams            Amazon MSK
    Message Size Limit   1 MB message size limit         1 MB default, configure for higher (ex: 10 MB)
    Distribution         Data Streams with Shards        Kafka Topics with Partitions
    Sizing               Shard Splitting & Merging       Can only add partitions to a topic
    In-flight Security   TLS in-flight encryption        PLAINTEXT or TLS in-flight encryption
    At-rest Security     KMS at-rest encryption          KMS at-rest encryption
    Auth                 IAM policies for AuthN/AuthZ    Mutual TLS (AuthN) + Kafka ACLs (AuthZ)
                                                         SASL/SCRAM (AuthN) + Kafka ACLs (AuthZ)
                                                         IAM Access Control (AuthN + AuthZ)
