aws
-
Glue | Cloud/AWS | 2022. 6. 15. 17:59
Introduction. Serverless discovery and definition of table definitions and schema: S3 “Data Lakes”, RDS, Redshift, Athena, EMR, most other SQL databases. Custom ETL jobs: trigger-driven, on a schedule, or on-demand; fully managed; use Apache Spark under the hood (no need to manage the Spark cluster). Glue Crawler: a Glue crawler scans data in S3 and creates the schema. It can run periodically. Populates the Glue Data C..
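As a rough illustration of a crawler that scans S3 and runs periodically, here is a minimal sketch using boto3's Glue `create_crawler` call. The helper names (`build_crawler_request`, `create_crawler`), the crawler name, role ARN, database, bucket path, and cron expression are all hypothetical placeholders, not values from the post.

```python
from typing import Any, Dict, Optional


def build_crawler_request(
    name: str,
    role_arn: str,
    database: str,
    s3_path: str,
    schedule: Optional[str] = None,
) -> Dict[str, Any]:
    """Build the keyword arguments for boto3's Glue create_crawler call."""
    request: Dict[str, Any] = {
        "Name": name,
        "Role": role_arn,          # IAM role the crawler assumes
        "DatabaseName": database,  # database that receives the inferred tables
        "Targets": {"S3Targets": [{"Path": s3_path}]},
    }
    if schedule is not None:
        # Glue expects a cron expression, e.g. "cron(0 2 * * ? *)"
        request["Schedule"] = schedule
    return request


def create_crawler(request: Dict[str, Any]) -> None:
    """Send the request to AWS (needs boto3 and valid credentials)."""
    import boto3  # imported lazily so building the request needs no SDK

    glue = boto3.client("glue")
    glue.create_crawler(**request)


# Hypothetical values, for illustration only.
example_request = build_crawler_request(
    name="sales-crawler",
    role_arn="arn:aws:iam::123456789012:role/ExampleGlueRole",
    database="sales_db",
    s3_path="s3://example-bucket/sales/",
    schedule="cron(0 2 * * ? *)",
)
```

Calling `create_crawler(example_request)` would register the crawler; leaving out `schedule` yields an on-demand crawler instead.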
-
Kinesis | Data Engineering | 2019. 9. 20. 00:42
Kinesis Data Stream: a real-time data stream. Retention between 1 day and 365 days. Ability to reprocess (replay) data. Once data is inserted in Kinesis, it can’t be deleted (immutability). Records that share the same partition key go to the same shard (ordering). Producers: AWS SDK, Kinesis Producer Library (KPL), Kinesis Agent. Consumers. Write your own: Kinesis Client Library (KCL), AWS SDK. Managed: AWS Lamb..
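The ordering guarantee follows from how records are routed: Kinesis hashes the partition key with MD5 into a 128-bit integer, and each shard owns a range of that hash space. The sketch below assumes the shards split the space evenly, which is a simplification; the function name `shard_for_key` is hypothetical.

```python
import hashlib

HASH_SPACE = 2 ** 128  # MD5 produces a 128-bit value


def shard_for_key(partition_key: str, shard_count: int) -> int:
    """Return the index of the shard that would own this partition key.

    Assumes shard_count shards with evenly split hash key ranges.
    """
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_value = int.from_bytes(digest, "big")
    # Scale the hash into [0, shard_count) to find the owning range.
    return hash_value * shard_count // HASH_SPACE


# The same key always resolves to the same shard.
first = shard_for_key("device-17", 4)
second = shard_for_key("device-17", 4)
print(first == second)
```

Because the mapping depends only on the key, all records for one key land on one shard and keep their relative order there. Records with different keys may land on different shards, with no ordering between them.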