Apache Arrow
A cross-language development platform for in-memory data.
Language: Multi-languageCategory: DataFirst released: 2016Created by: Apache Software FoundationLicense: Apache-2.0
Apache Arrow defines a language-independent columnar memory format for flat and hierarchical data, organized for efficient analytic operations on modern hardware. It provides libraries in over a dozen languages including C, C++, Python, Java, JavaScript, Rust, and Go, all implementing the same standardized format so data can be exchanged without serialization overhead. The Arrow Flight RPC protocol extends this zero-copy approach across networked services with gRPC-based transport. Arrow integrates deeply with the broader data ecosystem, serving as the in-memory backbone for Apache Spark, Pandas, Dask, DuckDB, Polars, and many other projects.
Links
Key Features
Columnar in-memory formatZero-copy IPCFlight RPC for data transportC, C++, Python, Java, Rust bindingsGPU support (CUDA)Gandiva expression compilerParquet integrationIntegration with Pandas, Spark, Dask