Glue
-
Athena (Cloud/AWS) 2022. 6. 17. 10:56
Overview: Interactive query service for S3 (SQL). No need to load data; it stays in S3. Presto under the hood. Serverless. Handles unstructured, semi-structured, or structured data. Supports many data formats: CSV, JSON, ORC, Parquet, Avro. Examples: ad-hoc queries of web logs; querying staging data before loading it to Redshift; analyzing CloudTrail/CloudFront/VPC/ELB etc. logs in S3; integration with Jupyter, Zeppelin, RStudio n..
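As a rough illustration of the "ad-hoc queries of web logs" use case, the sketch below builds a request for the boto3 Athena client's `start_query_execution` call. The table name `weblogs`, the database `weblogs_db`, and the results bucket `s3://example-athena-results/` are hypothetical placeholders, not names from the original post. Running `run_query` assumes boto3 is installed and AWS credentials are configured.

```python
def build_athena_query_request(sql, database, output_location):
    """Build keyword arguments for the boto3 Athena start_query_execution call."""
    return {
        "QueryString": sql,
        # Database in the Glue Data Catalog that the query runs against.
        "QueryExecutionContext": {"Database": database},
        # Athena writes query results to S3; the source data itself stays where it is.
        "ResultConfiguration": {"OutputLocation": output_location},
    }


def run_query(sql, database, output_location):
    """Submit the query and return its execution ID (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the request builder works without the SDK

    client = boto3.client("athena")
    response = client.start_query_execution(
        **build_athena_query_request(sql, database, output_location)
    )
    return response["QueryExecutionId"]


# Hypothetical names, for illustration only.
request = build_athena_query_request(
    "SELECT status, COUNT(*) AS hits FROM weblogs GROUP BY status",
    database="weblogs_db",
    output_location="s3://example-athena-results/",
)
```

The query runs asynchronously, so a real caller would poll the returned execution ID for completion before reading the results.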
-
Glue (Cloud/AWS) 2022. 6. 15. 17:59
Introduction: Serverless discovery and definition of table definitions and schemas: S3 “data lakes”, RDS, Redshift, Athena, EMR, most other SQL databases. Custom ETL jobs: trigger-driven, on a schedule, or on-demand. Fully managed. Uses Apache Spark under the hood (no need to manage the Spark cluster). Glue Crawler: the Glue crawler scans data in S3 and creates a schema. Can run periodically. Populates the Glue Data C..
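To make "scans data ... creates schema" concrete, here is a toy sketch of schema inference over CSV text. It is not Glue's actual classifier logic, only an illustration of the idea. The function names, the sample data, and the Hive-style type names (`bigint`, `double`, `string`) are assumptions chosen for the example.

```python
import csv
import io


def infer_type(values):
    """Return the narrowest of bigint, double, string that fits every value."""
    for type_name, caster in (("bigint", int), ("double", float)):
        try:
            for value in values:
                caster(value)
            return type_name
        except (ValueError, TypeError):
            continue  # at least one value did not parse; try the next wider type
    return "string"


def infer_schema(csv_text):
    """Map each column in the CSV header to an inferred type name."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    columns = list(rows[0].keys()) if rows else []
    return {name: infer_type([row[name] for row in rows]) for name in columns}


# Hypothetical sample standing in for a file the crawler would find in S3.
sample = "user_id,latency_ms,path\n1,12.5,/home\n2,7,/cart\n"
schema = infer_schema(sample)
print(schema)
```

A real crawler also handles formats other than CSV, as well as partitions and schema changes over time, none of which this sketch attempts.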