Cloud
-
Athena | Cloud/AWS | 2022. 6. 17. 10:56
Overview: Interactive query service for S3 (SQL). No need to load data; it stays in S3. Presto under the hood. Serverless. Handles unstructured, semi-structured, or structured data. Supports many data formats: CSV, JSON, ORC, Parquet, Avro. Examples: ad-hoc queries of web logs; querying staging data before loading to Redshift; analyzing CloudTrail/CloudFront/VPC/ELB etc. logs in S3; integration with Jupyter, Zeppelin, RStudio n..
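As a rough illustration of "query data where it sits in S3", the sketch below builds the request for an ad-hoc SQL query over web logs. The table name `weblogs`, the database `example_logs_db`, and the results bucket are hypothetical placeholders, and the helper `build_athena_query_request` is not part of any AWS library; it only assembles the keyword arguments that boto3's Athena `start_query_execution` call accepts.

```python
def build_athena_query_request(sql, database, output_location):
    """Assemble keyword arguments for Athena's StartQueryExecution call.

    Nothing is sent to AWS here; the function only builds a dictionary.
    """
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        # Athena writes query results to an S3 location chosen by the caller.
        "ResultConfiguration": {"OutputLocation": output_location},
    }


# Hypothetical names, used for illustration only.
request = build_athena_query_request(
    sql="SELECT status, COUNT(*) AS hits FROM weblogs GROUP BY status",
    database="example_logs_db",
    output_location="s3://example-athena-results/",
)

# With boto3 installed and AWS credentials configured, the request could be
# submitted like this (not executed in this sketch):
#   import boto3
#   boto3.client("athena").start_query_execution(**request)
print(request["QueryString"])
```

The source data is never copied into a database: the query runs against the files in S3, and only the result set is written to the output location.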
-
Glue | Cloud/AWS | 2022. 6. 15. 17:59
Introduction: Serverless discovery and definition of table definitions and schema: S3 “data lakes”, RDS, Redshift, Athena, EMR, most other SQL databases. Custom ETL jobs: trigger-driven, on a schedule, or on-demand; fully managed; uses Apache Spark under the hood (no need to manage the Spark cluster). Glue Crawler: scans data in S3 and creates the schema; can run periodically; populates the Glue Data C..
-
AWS Redshift | Cloud/AWS | 2022. 6. 5. 17:30
Overview: Fully managed, petabyte-scale data warehouse service. Advertised as 10 times better performance than other data warehouses, via machine learning, massively parallel query execution (MPP), and columnar storage. Designed for OLAP, not OLTP. Cost-effective. SQL, ODBC, and JDBC interfaces. Scale up or down on demand. Built-in replication and backups. Monitoring via CloudWatch/CloudTrail. Use Cases: Accelerate analytics workloads; Unifie..
-
Lake Formation | Cloud/AWS | 2022. 4. 26. 11:43
Introduction: Can tie to IAM users/roles, SAML, or external AWS accounts. Can use policy tags on databases, tables, or columns. Can select specific permissions for tables or columns. Overview: “Makes it easy to set up a secure data lake in days”. Loading data and monitoring data flows; setting up partitions; encryption and managing keys; defining transformation jobs and monitoring them; access control; auditing ..
-
Lambda | Cloud/AWS | 2021. 3. 9. 09:46
1. Comparison between EC2 and Lambda. 1.1 EC2: Virtual servers in the cloud; limited by RAM and CPU; continuously running; scaling means intervention to add/remove servers. 1.2 Lambda: Virtual functions, no servers to manage; limited by time (short executions); run on-demand; scaling is automated. 2. Benefits of AWS Lambda. 2.1 Easy Pricing: Pay per request and compute time; free tier of 1,000,000 AWS Lambd..
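To make the "virtual function" idea concrete, here is a minimal sketch of a Python Lambda handler. The `(event, context)` signature is the standard one for the Python runtime; the `name` field in the event and the response shape are assumptions chosen for illustration, not something taken from the post.

```python
import json


def lambda_handler(event, context):
    """Entry point that Lambda invokes on demand for each request."""
    # "name" is a hypothetical field in the incoming event.
    name = event.get("name", "world")
    return {
        "statusCode": 200,
        "body": json.dumps({"message": f"Hello, {name}!"}),
    }


if __name__ == "__main__":
    # Local smoke test; in AWS the service supplies event and context.
    print(lambda_handler({"name": "Lambda"}, None))
```

There is no server loop or listener in the code: the function runs only when invoked, which is why billing is per request and per unit of compute time rather than per running instance.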
-
Choosing the right database on AWS | Cloud/AWS | 2021. 3. 8. 17:47
1. Overview: AWS offers many managed databases to choose from. 1.1 Questions to choose the right database based on your architecture: Read-heavy, write-heavy, or balanced workload? Throughput needs? Will it change? Does it need to scale or fluctuate during the day? How much data to store and for how long? Will it grow? Average object size? How are objects accessed? Data durability? Source of..
-
DynamoDB | Cloud/AWS | 2021. 3. 8. 15:07
1. Overview: Fully managed, highly available with replication across 3 AZs. NoSQL database, not a relational database. Scales to massive workloads; distributed database. Millions of requests per second, trillions of rows, 100s of TB of storage. Fast and consistent performance (low latency on retrieval). Integrated with IAM for security, authorization, and administration. Enables event-driven programmin..
-
Simple Storage Service (S3) | Cloud/AWS | 2020. 11. 24. 16:30
1. Overview: Amazon S3 is one of the main building blocks of AWS. It's advertised as "infinitely scaling" storage. It's widely popular and deserves its own section. Many websites use Amazon S3 as a backbone. Many AWS services use Amazon S3 as an integration as well. 2. Buckets: Amazon S3 allows people to store objects (files) in "buckets" (directories). Buckets must have a globally unique name. Buckets..
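The bucket/object relationship can be sketched by splitting an `s3://` URI into its bucket name and object key. The URI below and the helper name `split_s3_uri` are hypothetical; the sketch relies only on Python's standard `urllib.parse` and does not contact AWS.

```python
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split an s3:// URI into (bucket, key)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    # The bucket is the "host" part; the rest of the path is the object key.
    return parsed.netloc, parsed.path.lstrip("/")


# Hypothetical bucket and key, for illustration only.
bucket, key = split_s3_uri("s3://example-bucket/logs/2020/11/app.log")
print(bucket)
print(key)
```

The slashes in the key look like folders, but they are simply part of the object's name within the bucket.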