ETL
-
Glue | Cloud/AWS | 2022. 6. 15. 17:59
Introduction: Serverless discovery and definition of table definitions and schema: S3 “Data Lakes”, RDS, Redshift, Athena, EMR, most other SQL databases. Custom ETL jobs: trigger-driven, on a schedule, or on demand; fully managed; use Apache Spark under the hood (no need to manage the Spark cluster). Glue Crawler: scans data in S3 and creates a schema; can run periodically; populates the Glue Data C..
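The crawler behaviour summarized above (scan an S3 path, infer a schema, run periodically, write tables into the Glue Data Catalog) can be sketched with boto3's Glue client. This is a minimal sketch under stated assumptions: the crawler name, IAM role ARN, catalog database and bucket path are hypothetical placeholders, and only `create_crawler` and `start_crawler` are called. The request is built in a separate function so it can be inspected without AWS credentials.

```python
"""Hypothetical sketch: define and start a Glue crawler over an S3 prefix with boto3."""


def crawler_request(name, role_arn, database, s3_path, schedule=None):
    """Build keyword arguments for glue.create_crawler (all values are placeholders)."""
    request = {
        "Name": name,
        "Role": role_arn,          # IAM role the crawler assumes to read S3
        "DatabaseName": database,  # Data Catalog database the crawler writes tables into
        "Targets": {"S3Targets": [{"Path": s3_path}]},
    }
    if schedule is not None:
        # Glue schedules are cron expressions evaluated in UTC, e.g. daily at 02:00
        request["Schedule"] = schedule
    return request


def create_and_start(request, region="us-east-1"):
    """Create the crawler, then trigger one run immediately (needs AWS credentials)."""
    import boto3  # imported lazily so crawler_request() works without boto3 installed

    glue = boto3.client("glue", region_name=region)
    glue.create_crawler(**request)  # fails if a crawler with this name already exists
    glue.start_crawler(Name=request["Name"])


if __name__ == "__main__":
    req = crawler_request(
        name="sales-crawler",                                       # hypothetical
        role_arn="arn:aws:iam::123456789012:role/GlueCrawlerRole",  # hypothetical
        database="sales_db",                                        # hypothetical
        s3_path="s3://example-bucket/sales/",                       # hypothetical
        schedule="cron(0 2 * * ? *)",
    )
    print(req)  # inspect the request; call create_and_start(req) to send it
```

Passing a `Schedule` covers the "can run periodically" case, while `start_crawler` covers an on-demand run.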
-
Difference between Extract, Transform and Load (ETL) and Enterprise Application Integration (EAI) | Modeling/Architecture | 2019. 9. 29. 18:03
1. Overview Although Extract, Transform and Load (ETL) and Enterprise Application Integration (EAI) technologies seem surprisingly similar from an architectural view (so-called adapters, or connectors, provide access to systems and data sources; transformations take place to standardize proprietary formats; or routing capabilities are used to move packets of data), ETL and EAI serve fun..
-
OLTP, OLAP, and ETL | DB/RDB | 2019. 9. 11. 14:28
1. Overview 1.1 On-line Transaction Processing (OLTP) OLTP stands for On-line Transaction Processing. OLTP-based systems (account, ticket booking, banking, and money transfer systems) are used to perform a large number of short transactions. Almost all of the database queries in an OLTP system are INSERT, UPDATE, and DELETE commands. SELECT queries are mainly designed to enable users to selec..
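A money transfer is the classic short OLTP transaction described above: two UPDATE statements that must commit together or not at all. The sketch below uses Python's built-in `sqlite3` with an in-memory database. The `account` table, the `transfer` and `total_balance` helpers and the sample balances are hypothetical illustrations, and SQLite stands in for a production OLTP database. The final aggregate query is only an OLAP-flavoured contrast (read-heavy, scanning many rows).

```python
import sqlite3


def make_db():
    """Create a hypothetical account table with two rows in an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE account ("
        " id INTEGER PRIMARY KEY,"
        " owner TEXT NOT NULL,"
        " balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.executemany(
        "INSERT INTO account (id, owner, balance) VALUES (?, ?, ?)",
        [(1, "alice", 100), (2, "bob", 50)],
    )
    conn.commit()
    return conn


def transfer(conn, src, dst, amount):
    """One short OLTP transaction: both UPDATEs commit together or are rolled back."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src)
            )
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst)
            )
        return True
    except sqlite3.IntegrityError:  # CHECK (balance >= 0) violated: overdraft refused
        return False


def balance(conn, account_id):
    row = conn.execute(
        "SELECT balance FROM account WHERE id = ?", (account_id,)
    ).fetchone()
    return row[0]


def total_balance(conn):
    """Aggregate over all rows: an analytical (OLAP-style) read, not a short transaction."""
    return conn.execute("SELECT SUM(balance) FROM account").fetchone()[0]


if __name__ == "__main__":
    db = make_db()
    print(transfer(db, 1, 2, 30), balance(db, 1), balance(db, 2))
    print(transfer(db, 1, 2, 500), balance(db, 1), balance(db, 2))
    print(total_balance(db))
```

The failed overdraft leaves both balances untouched because the CHECK violation aborts the whole transaction.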
-
Apache Hadoop | DistributedSystem | 2019. 9. 5. 03:44
1. Overview Apache Hadoop is a set of software technology components that together form a scalable system optimized for analyzing data. Data analyzed on Hadoop has several typical characteristics. Structured: for example, customer data, transaction data, and clickstream data recorded when people click links while visiting websites. Unstructured: for example, text from web-based news feeds..
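Analyzing unstructured text such as news feeds is the classic job for Hadoop's MapReduce model. The plain-Python sketch below simulates its three phases (map, shuffle, reduce) on a word count. It is a single-process illustration of the model, not Hadoop's Java API. The function names are hypothetical, and splitting on whitespace is a simplifying assumption that leaves punctuation attached to words.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every whitespace-separated token."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all values by key, as Hadoop does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the grouped values for one key into a single count."""
    return key, sum(values)


def word_count(documents):
    pairs = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in sorted(grouped.items()))


if __name__ == "__main__":
    feeds = ["Markets rise as tech rallies", "Tech stocks rise again"]  # hypothetical
    print(word_count(feeds))
```

On a real cluster the map and reduce calls run in parallel across many nodes, and the shuffle moves data over the network. Here everything runs in one process.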