
Lake Formation

데먕 2022. 4. 26. 11:43

Introduction

  • Can tie to IAM users/roles, SAML, or external AWS accounts
  • Can use policy tags on databases, tables, or columns
  • Can select specific permissions for tables or columns
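As a rough illustration of column-level permissions, the sketch below builds the request for a SELECT grant limited to specific columns. The principal ARN, database, table, and column names are hypothetical placeholders. The dictionary shape follows the `grant_permissions` call of the boto3 Lake Formation client as I understand it, so treat it as a sketch rather than a reference.

```python
def column_select_grant(principal_arn, database, table, columns):
    """Build keyword arguments for a column-level SELECT grant."""
    if not columns:
        raise ValueError("at least one column is required")
    return {
        # Who receives the permission (IAM user/role ARN in this sketch).
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        # Restrict the grant to the listed columns of one table.
        "Resource": {
            "TableWithColumns": {
                "DatabaseName": database,
                "Name": table,
                "ColumnNames": list(columns),
            }
        },
        "Permissions": ["SELECT"],
    }


# All names below are made up for illustration.
request = column_select_grant(
    "arn:aws:iam::111122223333:user/data-analyst",
    "sales_db",
    "orders",
    ["order_id", "order_date"],
)
```

With boto3 installed and credentials configured, the result could be passed along as `boto3.client("lakeformation").grant_permissions(**request)`.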

Overview

  • “Makes it easy to set up a secure data lake in days”
  • Loading data & monitoring data flows
  • Setting up partitions
  • Encryption & managing keys
  • Defining transformation jobs & monitoring them
  • Access control
  • Auditing
  • Built on top of Glue

Pricing

There is no cost for Lake Formation itself, but the underlying services incur charges:

  • Glue
  • S3
  • EMR
  • Athena
  • Redshift

Building a Data Lake

  1. Create an IAM user for the data analyst
  2. Create an AWS Glue connection to your data sources
  3. Create an S3 bucket for the lake
  4. Register the S3 path in Lake Formation, grant permissions
  5. Create a database in Lake Formation for the data catalog, grant permissions
  6. Use a blueprint for a workflow (e.g., Database snapshot)
  7. Run the workflow
  8. Grant SELECT permissions to whoever needs to read it (Athena, Redshift Spectrum, etc)
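Some of the steps above can be scripted. The sketch below covers steps 3, 4, 5, and 8 only; blueprints and workflows (steps 6 and 7) are left to the console. It assumes the clients passed in behave like the boto3 `s3`, `glue`, and `lakeformation` clients, and every resource name is a hypothetical placeholder. A small recording stand-in is used so the sequence can be dry-run without touching AWS.

```python
def build_lake(s3, glue, lf, bucket, database, table, analyst_arn):
    """Run steps 3, 4, 5 and 8 against the given clients."""
    # Step 3: create the bucket (regions other than us-east-1 also need
    # a CreateBucketConfiguration argument, omitted in this sketch).
    s3.create_bucket(Bucket=bucket)
    # Step 4: register the S3 location with Lake Formation.
    lf.register_resource(
        ResourceArn=f"arn:aws:s3:::{bucket}",
        UseServiceLinkedRole=True,
    )
    # Step 5: create the catalog database.
    glue.create_database(DatabaseInput={"Name": database})
    # Step 8: grant SELECT. In a real run the table must already exist,
    # for example after the workflow from steps 6 and 7 has finished.
    lf.grant_permissions(
        Principal={"DataLakePrincipalIdentifier": analyst_arn},
        Resource={"Table": {"DatabaseName": database, "Name": table}},
        Permissions=["SELECT"],
    )


class Recorder:
    """Stand-in client that records calls instead of contacting AWS."""

    def __init__(self, log, service):
        self._log = log
        self._service = service

    def __getattr__(self, name):
        # Any method name is accepted and logged with its arguments.
        def call(**kwargs):
            self._log.append((self._service, name, kwargs))
            return {}
        return call


# Dry run with made-up names.
calls = []
build_lake(
    Recorder(calls, "s3"),
    Recorder(calls, "glue"),
    Recorder(calls, "lakeformation"),
    bucket="my-data-lake-bucket",
    database="lake_db",
    table="orders",
    analyst_arn="arn:aws:iam::111122223333:user/data-analyst",
)
order = [(service, operation) for service, operation, _ in calls]
```

For a real run, the `Recorder` objects would be replaced with `boto3.client("s3")`, `boto3.client("glue")`, and `boto3.client("lakeformation")`.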

Finer Points

Lake Formation does not support manifests in Athena or Redshift queries

IAM Permissions on the KMS encryption key are needed for encrypted data catalogs in Lake Formation
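To make the KMS requirement concrete, here is a minimal sketch that generates an IAM policy document for the key protecting an encrypted data catalog. The action list (`kms:Decrypt`, `kms:Encrypt`, `kms:GenerateDataKey`) is an assumption on my part and should be checked against the AWS documentation. The key ARN is a placeholder.

```python
import json


def catalog_kms_policy(key_arn):
    """Return an IAM policy document allowing use of one KMS key."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # Assumed set of actions; verify for your setup.
                "Action": [
                    "kms:Decrypt",
                    "kms:Encrypt",
                    "kms:GenerateDataKey",
                ],
                "Resource": key_arn,
            }
        ],
    }


# Placeholder key ARN, not a real key.
policy_json = json.dumps(
    catalog_kms_policy("arn:aws:kms:us-east-1:111122223333:key/EXAMPLE-KEY-ID"),
    indent=2,
)
```

The resulting JSON could be attached to the IAM user or role that works with the encrypted catalog.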

IAM permissions are needed to create blueprints and workflows

Cross-account Lake Formation permissions

  • Recipient must be set up as a data lake administrator
  • Can use AWS Resource Access Manager for accounts external to your organization
  • IAM permissions for cross-account access
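A cross-account grant looks much like a normal one, except that the principal is the other account. The sketch below builds such a request, assuming the recipient is identified by its 12-digit AWS account ID. The account ID, database, and table are hypothetical. The `grantable` flag is my own helper parameter, mapped onto the `PermissionsWithGrantOption` argument of the boto3 `grant_permissions` call.

```python
import re


def cross_account_grant(account_id, database, table, grantable=False):
    """Build a SELECT grant whose principal is another AWS account."""
    if not re.fullmatch(r"\d{12}", account_id):
        raise ValueError("account_id must be a 12-digit AWS account ID")
    request = {
        "Principal": {"DataLakePrincipalIdentifier": account_id},
        "Resource": {"Table": {"DatabaseName": database, "Name": table}},
        "Permissions": ["SELECT"],
    }
    if grantable:
        # Lets the recipient pass the permission on to its own principals.
        request["PermissionsWithGrantOption"] = ["SELECT"]
    return request


# Made-up account ID and names.
shared = cross_account_grant("444455556666", "lake_db", "orders", grantable=True)
```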

Troubleshooting

If you encounter an error about being unable to create a blueprint or a workflow, it's probably an IAM issue

If you encounter a cross-account permission issue, it probably has something to do with Resource Access Manager (RAM)