Apache Arrow Flight SQL

What is Apache Arrow Flight SQL?

Arrow Flight SQL is a protocol developed as part of the Apache Arrow project. It provides a high-performance SQL interface for working with databases over the Arrow Flight RPC framework.

The goal of Arrow Flight SQL is to support efficient, low-latency querying of large datasets over a network, a workload that protocols like ODBC and JDBC weren’t designed to handle. In real-world use cases, Arrow Flight SQL has been reported to deliver 20-50x better performance than ODBC and JDBC.

How does Arrow Flight SQL work?

As the name suggests, Flight SQL is built on top of Arrow Flight, an RPC framework that uses gRPC to transfer Arrow data efficiently between systems.

The key feature of Arrow Flight SQL is that data stays in the Apache Arrow columnar format while in transit. Because the bytes on the wire match the in-memory layout, the data doesn’t need to be converted to and from a row-based format at each end, and the time and processing saved grow with the amount of data being transferred. Columnar data also tends to compress better than standard row-based formats, so it can consume less bandwidth.
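To make the difference between row-based and columnar transfer concrete, here is a minimal sketch using only the Python standard library. It is not Arrow itself: the real Arrow format also carries schema metadata, validity bitmaps, and alignment padding. The sample data and variable names are assumptions chosen for illustration.

```python
import json
from array import array

# Hypothetical sample data: three rows of (id, temperature).
rows = [(1, 20.5), (2, 21.0), (3, 19.5)]

# Row-oriented transfer: every value is encoded as text on the sender
# and has to be parsed again, one value at a time, on the receiver.
row_payload = json.dumps(rows).encode("utf-8")
decoded_rows = [tuple(r) for r in json.loads(row_payload)]

# Column-oriented transfer: each column is one contiguous typed buffer.
ids = array("q", [r[0] for r in rows])    # 64-bit signed integers
temps = array("d", [r[1] for r in rows])  # 64-bit floats
id_bytes, temp_bytes = ids.tobytes(), temps.tobytes()

# The receiver reinterprets the raw bytes as typed values directly,
# without parsing each value individually.
received_ids = memoryview(id_bytes).cast("q")
received_temps = memoryview(temp_bytes).cast("d")

print(list(received_ids))    # [1, 2, 3]
print(list(received_temps))  # [20.5, 21.0, 19.5]
```

In the columnar case the receiving side only reinterprets the buffer it was handed. This is the property that lets Arrow-based transports avoid most per-value encoding and decoding work.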

Arrow Flight SQL can also run queries in parallel across systems: a result can be split across several endpoints that the client reads at the same time, which improves performance and scalability. It is designed to be extensible and can be adapted to support custom data sources if needed.
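The parallel read pattern can be sketched as a toy simulation. In Arrow Flight, a planning call returns a list of endpoints, each with a ticket, and the client then fetches each ticket's data stream. The code below imitates that two-step flow with in-memory data and threads. The names plan_query, fetch_partition, run_query, and the ticket strings are hypothetical stand-ins, not part of any Arrow API.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory "endpoints": each ticket maps to one partition
# of the query result. A real deployment would spread these over servers.
PARTITIONS = {
    "ticket-0": [1, 2, 3],
    "ticket-1": [4, 5],
    "ticket-2": [6, 7, 8, 9],
}

def plan_query(sql):
    """Stand-in for the planning step: one ticket per result partition."""
    return sorted(PARTITIONS)

def fetch_partition(ticket):
    """Stand-in for the per-endpoint fetch: the rows behind one ticket."""
    return PARTITIONS[ticket]

def run_query(sql):
    tickets = plan_query(sql)
    # Fetch all partitions concurrently; map() yields results in ticket order.
    with ThreadPoolExecutor(max_workers=len(tickets)) as pool:
        parts = list(pool.map(fetch_partition, tickets))
    # Concatenate the partitions into a single result.
    return [value for part in parts for value in part]

result = run_query("SELECT value FROM readings")
print(result)  # [1, 2, 3, 4, 5, 6, 7, 8, 9]
```

Because each partition is fetched independently, adding endpoints lets the client pull more data at once instead of waiting on a single stream.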

Related resources