Parquet
- Binary Format
- Machine-Readable
- Splittable
- Column-wise
- Good for Read-Heavy Apps
- Compressible
- Mostly used in Apache Spark Apps
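
Why column-wise helps read-heavy apps, as a minimal sketch in plain Python. This is not the Parquet on-disk format; the records and field names (`id`, `city`, `temp`) are made up for illustration.

```python
# Toy data: three records with three fields each (hypothetical values).
records = [
    {"id": 1, "city": "Oslo", "temp": 4.0},
    {"id": 2, "city": "Lima", "temp": 19.5},
    {"id": 3, "city": "Pune", "temp": 31.0},
]

# Row-wise layout: all fields of one record are stored together.
row_layout = [tuple(r.values()) for r in records]

# Column-wise layout: all values of one field are stored together.
column_layout = {key: [r[key] for r in records] for key in records[0]}

# An analytical query such as "average temp" reads a single list here...
temps = column_layout["temp"]
avg_from_columns = sum(temps) / len(temps)

# ...whereas the row layout has to walk every record and skip unused fields.
avg_from_rows = sum(row[2] for row in row_layout) / len(row_layout)
```

Both averages are the same; the difference is how much data a reader has to touch. Appending one new record is the opposite case: the row layout gets one new tuple, while the column layout has to update every list.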
Avro
- Binary Format
- Machine-Readable
- Splittable
- Row-wise
- Good for Write-Heavy Apps
- Compressible
- Supports Schema Evolution
- Mostly used in Kafka Apps
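
What schema evolution means in practice, as a toy sketch. This is plain Python, not the Avro library API; `resolve` and the two schema dicts are hypothetical names, and a "schema" here is simply a mapping of field name to default value.

```python
# Toy illustration of schema evolution; NOT the Avro library API.
# A "schema" is a dict of field name -> default value (an assumption).
SCHEMA_V1 = {"id": None, "name": None}
SCHEMA_V2 = {"id": None, "name": None, "email": "unknown"}


def resolve(record, reader_schema):
    """Project a record onto the reader's schema.

    Fields missing from the record take the reader's default;
    fields the reader does not know about are dropped.
    """
    return {field: record.get(field, default)
            for field, default in reader_schema.items()}


# Old data read by new code: the added field falls back to its default.
old_record = {"id": 7, "name": "Ada"}
read_by_v2 = resolve(old_record, SCHEMA_V2)

# New data read by old code: the unknown field is ignored.
new_record = {"id": 8, "name": "Bo", "email": "bo@example.com"}
read_by_v1 = resolve(new_record, SCHEMA_V1)
```

The idea carried over from real formats is that writers and readers may use different schema versions, and defaults bridge the gap, so producers and consumers can be upgraded independently.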
ORC
- Binary Format
- Machine-Readable
- Splittable
- Column-wise
- Good for Read-Heavy Apps
- Mostly used in Hive Apps
Protocol Buffers (ProtoBuf)
- Binary Format
- Machine-Readable
- Not splittable on its own (messages carry no built-in record delimiters; needs a container format)
- Row-wise
- Compressible
- Supports Schema Evolution
- Mostly used in Kafka Apps
JSON
- Non-Binary
- Human-Readable
- Non-splittable as a single document (newline-delimited JSON is the exception)
- Used for browser-based Apps
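
What splittability means for JSON, as a small sketch using only the standard `json` module. The sample rows and the helper `try_parse` are made up for illustration; a real splitter works on byte ranges of a file rather than on in-memory strings.

```python
import json

rows = [{"id": i, "ok": i % 2 == 0} for i in range(4)]

# One JSON document: a reader must see the whole array to parse it.
single_document = json.dumps(rows, indent=2)

# Newline-delimited JSON: every line is a complete record on its own.
json_lines = "\n".join(json.dumps(r) for r in rows)


def try_parse(text):
    """Return the parsed value, or None if the text is not valid JSON."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return None


# A worker that is handed only the second half of the single document
# cannot make sense of it.
second_half = single_document[len(single_document) // 2:]
document_result = try_parse(second_half)

# A worker that is handed only the last two lines of the newline-delimited
# version can parse its share independently.
lines_for_worker = json_lines.splitlines()[2:]
lines_result = [json.loads(line) for line in lines_for_worker]
```

The same reasoning applies to any format: a file is splittable when a reader dropped at an arbitrary offset can find the next record boundary without reading everything before it.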
XML
- Non-Binary
- Human-Readable
- Non-splittable
- Used for browser-based Apps
CSV
- Non-Binary
- Human-Readable
- Splittable only when uncompressed and no field contains a line break
- Row-wise
- Used for data science Apps
Reference
https://www.youtube.com/watch?v=oipFhroPFVM