  • Data Format
    Data Engineering 2022. 6. 17. 11:02

    Parquet

    • Binary Format
    • Machine-Readable
    • Splittable
    • Column-wise
    • Good for Read-Heavy Apps
    • Compressible
    • Mostly used in Apache Spark Apps
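
    To see why a column-wise layout suits read-heavy analytics, here is a minimal in-memory sketch in plain Python (no Parquet library involved). It pivots row records into one list per column, so a query that needs only `amount` touches a single list. It also dictionary-encodes a repetitive column, one of the encodings Parquet can apply before compression. The sample data and the helper names `to_columns` and `dictionary_encode` are hypothetical, and the sketch ignores the real Parquet file layout. In practice a library such as pyarrow does this work.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical sample records, stored row-wise (one dict per record).
rows: List[Dict[str, Any]] = [
    {"id": 1, "city": "Seoul", "amount": 120},
    {"id": 2, "city": "Busan", "amount": 80},
    {"id": 3, "city": "Seoul", "amount": 200},
]


def to_columns(records: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Pivot row records into one list per column (column-wise layout)."""
    columns: Dict[str, List[Any]] = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


def dictionary_encode(values: List[Any]) -> Tuple[List[Any], List[int]]:
    """Replace repeated values with small integer codes into a dictionary."""
    dictionary: List[Any] = []
    codes: List[int] = []
    positions: Dict[Any, int] = {}
    for value in values:
        if value not in positions:
            positions[value] = len(dictionary)
            dictionary.append(value)
        codes.append(positions[value])
    return dictionary, codes


columns = to_columns(rows)
# A query over one column reads only that column's values.
total_amount = sum(columns["amount"])
city_dictionary, city_codes = dictionary_encode(columns["city"])
print(total_amount, city_dictionary, city_codes)  # 400 ['Seoul', 'Busan'] [0, 1, 0]
```

    Repeated values such as city names shrink to small integer codes, which is one reason columnar data compresses well.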

    Avro

    • Binary Format
    • Machine-Readable
    • Splittable
    • Row-wise
    • Good for Write-Heavy Apps
    • Compressible
    • Supports Schema Evolution
    • Mostly used in Kafka Apps
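
    Part of Avro's compactness comes from how it writes values. Per the Avro specification, `int` and `long` values use zig-zag varint encoding, a `string` is a `long` byte length followed by UTF-8 bytes, and a record is simply its fields concatenated in schema order, with no field names in the data (the reader needs the schema). The sketch below hand-encodes a hypothetical two-field record `{id: long, name: string}` to illustrate this. It is not a replacement for an Avro library and omits the object container file header, blocks, and sync markers that make Avro files splittable.

```python
def zigzag(n: int) -> int:
    """Map signed to unsigned so small magnitudes stay small: 0,-1,1,-2 -> 0,1,2,3."""
    return (n << 1) ^ (n >> 63)  # 64-bit zig-zag, as used for Avro long


def encode_varint(value: int) -> bytes:
    """Write an unsigned int 7 bits at a time, least significant group first."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_long(n: int) -> bytes:
    return encode_varint(zigzag(n))


def encode_string(s: str) -> bytes:
    data = s.encode("utf-8")
    return encode_long(len(data)) + data  # length prefix, then raw UTF-8


def encode_record(record_id: int, name: str) -> bytes:
    # Hypothetical schema: field "id" (long) then "name" (string), written in order.
    return encode_long(record_id) + encode_string(name)


print(encode_record(1, "a").hex())  # 020261
print(encode_long(64).hex())  # 8001
```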

    ORC

    • Binary Format
    • Machine-Readable
    • Splittable
    • Column-wise
    • Good for Read-Heavy Apps
    • Compressible
    • Mostly used in Hive Apps
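
    ORC files are divided into stripes, and the writer stores column statistics such as min and max values for the file, for each stripe, and for smaller row groups. A reader such as Hive can use those statistics to skip stripes that cannot match a filter, which is part of why ORC suits read-heavy queries. The sketch below models this with plain Python lists. `Stripe`, `make_stripes`, and `filter_greater_than` are hypothetical names, and the model ignores ORC's real on-disk encodings.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Stripe:
    values: List[int]
    min_value: int  # per-stripe statistics kept alongside the data
    max_value: int


def make_stripes(values: List[int], stripe_size: int) -> List[Stripe]:
    """Cut one column into fixed-size stripes and record min/max for each."""
    stripes = []
    for start in range(0, len(values), stripe_size):
        chunk = values[start:start + stripe_size]
        stripes.append(Stripe(chunk, min(chunk), max(chunk)))
    return stripes


def filter_greater_than(stripes: List[Stripe], threshold: int) -> Tuple[List[int], int]:
    """Return matching values and how many stripes actually had to be scanned."""
    matches: List[int] = []
    scanned = 0
    for stripe in stripes:
        if stripe.max_value <= threshold:
            continue  # statistics prove no value here can match, so skip it
        scanned += 1
        matches.extend(v for v in stripe.values if v > threshold)
    return matches, scanned


stripes = make_stripes(list(range(10)), stripe_size=4)
print(filter_greater_than(stripes, threshold=6))  # ([7, 8, 9], 2)
```

    Only two of the three stripes are read; the first is skipped from its statistics alone.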

    Protocol Buffers (ProtoBuf)

    • Binary Format
    • Machine-Readable
    • Not Splittable by Itself (no built-in file container format)
    • Row-wise
    • Compressible
    • Supports Schema Evolution
    • Mostly used in Kafka Apps
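
    Protocol Buffers writes each field as a key, `(field_number << 3) | wire_type`, followed by the value: wire type 0 (varint) for integers such as `int32`, and wire type 2 (length-delimited) for strings. Field names never appear in the bytes, which is why both sides need the `.proto` definition. The sketch below hand-encodes two fields of a hypothetical message, roughly `message Test { int32 a = 1; string b = 2; }`. In practice the code generated by `protoc` does this, and negative `int32` values (which protobuf writes as ten-byte varints) are not handled here.

```python
def encode_varint(value: int) -> bytes:
    """Base-128 varint: 7 bits per byte, high bit set while more bytes follow."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


VARINT = 0  # wire type for int32, int64, bool, enum, ...
LEN = 2  # wire type for string, bytes, embedded messages, ...


def encode_int_field(field_number: int, value: int) -> bytes:
    return encode_varint((field_number << 3) | VARINT) + encode_varint(value)


def encode_string_field(field_number: int, text: str) -> bytes:
    data = text.encode("utf-8")
    return encode_varint((field_number << 3) | LEN) + encode_varint(len(data)) + data


message = encode_int_field(1, 150) + encode_string_field(2, "testing")
print(message.hex(" "))  # 08 96 01 12 07 74 65 73 74 69 6e 67
```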

    JSON

    • Non-Binary
    • Human-Readable
    • Non-splittable
    • Used for browser-based Apps
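
    JSON's non-splittability follows from its structure: a document such as a top-level array is only valid as a whole, so a worker handed an arbitrary chunk of the text cannot parse its piece on its own. A small demonstration with Python's standard `json` module, using a hypothetical two-record document:

```python
import json

# Hypothetical document: one top-level array holding every record.
document = json.dumps([{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}])

records = json.loads(document)  # the whole document parses fine

# Cutting the text in half, as a naive splitter might, yields invalid JSON.
first_half = document[: len(document) // 2]
try:
    json.loads(first_half)
    half_parsed = True
except json.JSONDecodeError:
    half_parsed = False

print(len(records), half_parsed)  # 2 False
```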

    XML

    • Non-Binary
    • Human-Readable
    • Non-splittable
    • Used for browser-based Apps
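
    Because XML is human-readable text, every field name appears as both an opening and a closing tag, so the same record takes more bytes than in compact JSON (and more still than in a binary format). The sketch below builds one hypothetical record with Python's standard `xml.etree.ElementTree`, compares its size with JSON, and parses it back.

```python
import json
import xml.etree.ElementTree as ET

record = {"id": "1", "name": "Alice"}  # hypothetical record; XML text is untyped

# Build <record><id>1</id><name>Alice</name></record>.
root = ET.Element("record")
for field, value in record.items():
    ET.SubElement(root, field).text = value
xml_text = ET.tostring(root, encoding="unicode")

json_text = json.dumps(record, separators=(",", ":"))

# Parse the XML back into a dict of tag -> text.
parsed = {child.tag: child.text for child in ET.fromstring(xml_text)}

print(xml_text)  # <record><id>1</id><name>Alice</name></record>
print(len(xml_text), len(json_text))  # 45 25
```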

    CSV

    • Non-Binary
    • Human-Readable
    • Non-splittable
    • Row-wise
    • Used for data science Apps
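
    CSV is row-wise: each record is one line of delimited text, so reading a single column still means scanning every row. A quoted field may also contain a newline, so one record can span several physical lines, and a splitter that cuts the file at arbitrary line breaks can separate a record from its remainder. A short demonstration with Python's standard `csv` module and hypothetical data:

```python
import csv
import io

rows = [["id", "comment"], ["1", "hello\nworld"], ["2", "plain"]]

# Write CSV; the field containing a newline is quoted automatically.
buffer = io.StringIO(newline="")
csv.writer(buffer).writerows(rows)
text = buffer.getvalue()

# Splitting on line breaks sees 4 physical lines ...
physical_lines = text.splitlines()

# ... but a real CSV parser recovers 3 logical records.
parsed = list(csv.reader(io.StringIO(text, newline="")))

# Reading one column still walks every row.
comments = [row[1] for row in parsed[1:]]

print(len(physical_lines), len(parsed))  # 4 3
```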

    Reference

    https://www.youtube.com/watch?v=oipFhroPFVM
