Skip to main content

Querying Data

Executing SQL

using var df = await context.SqlAsync("SELECT * FROM orders WHERE amount > 100");

Parameterized Queries

Use named parameters with $paramName syntax to safely bind values:

using var df = await context.SqlAsync(
"SELECT * FROM orders WHERE status = $status AND amount > $min_amount",
[("status", "Completed"), ("min_amount", 50.0)]);

Parameters are passed as a collection of NamedScalarValueAndMetadata (the naming is similar to original DataFusion library).

Thanks to implicit conversions, you can pass C# primitives directly:

// These are equivalent

// int -> ScalarValue.Int32
[("id", 42)]

// explicit ScalarValue
[("id", new ScalarValue.Int32(42))]

// full form without metadata
[new NamedScalarValueAndMetadata("id", new ScalarValueAndMetadata(new ScalarValue.Int32(42)))]

// full form with metadata
[new NamedScalarValueAndMetadata("id", new ScalarValueAndMetadata(new ScalarValue.Int32(42), new Dictionary<string, string>()))]

ScalarValue Types

ScalarValue is a type hierarchy mirroring DataFusion's Rust ScalarValue enum. Common types with implicit conversions:

C# TypeScalarValue
boolBoolean
floatFloat32
doubleFloat64
decimalDecimal128
sbyteInt8
shortInt16
intInt32
longInt64
byteUInt8
ushortUInt16
uintUInt32
ulongUInt64
stringUtf8
byte[]Binary
DateOnlyDate32
TimeOnlyTime64Microsecond
TimeSpanDurationMillisecond
DateTimeOffsetTimestampMillisecond

Additional types available (construct explicitly): Float16, Date64, TimestampSecond, TimestampMicrosecond, TimestampNanosecond, Decimal256, LargeUtf8, LargeBinary, and more.

All nullable variants pass null to represent SQL NULL:

[("name", (string?)null)]

or

[("name", ScalarValue.Utf8.Null)]

Result Methods

MethodReturnsDescription
CountAsync()ulongNumber of rows
ShowAsync(limit?)Print formatted table to stdout
ToStringAsync()stringFormatted table as string
CollectAsync()DataFrameCollectedResultAll record batches in memory
ExecuteStreamAsync()DataFrameStreamAsync enumerable of record batches

Working with RecordBatch

Arrow RecordBatch columns are typed arrays. Cast to the appropriate type to access values. For background on Arrow's columnar model and why casting is needed, see Working with Arrow.

using var result = await df.CollectAsync();
foreach (var batch in result.Batches)
{
var names = (StringArray)batch.Column(0); // access by index or name
var amounts = (Int64Array)batch.Column(1); // access by index or name

for (int i = 0; i < batch.Length; i++)
{
Console.WriteLine($"{names.GetString(i)}: {amounts.GetValue(i)}");
}
}

Common Arrow array types: StringArray, StringViewArray, Int32Array, Int64Array, FloatArray, DoubleArray, BooleanArray, BinaryArray, Date32Array, TimestampArray.

Extension Methods for Column Access

Instead of casting to typed arrays manually, you can use extension methods that return IEnumerable<T?> for LINQ-friendly access:

using var result = await df.CollectAsync();
foreach (var batch in result.Batches)
{
var names = batch.Column("name").AsString(); // IEnumerable<string?>
var amounts = batch.Column("amount").AsInt64(); // IEnumerable<long?>
var prices = batch.Column("price").AsDouble(); // IEnumerable<double?>
var active = batch.Column("active").AsBool(); // IEnumerable<bool?>

foreach (var name in names)
Console.WriteLine(name);
}

These methods are defined in ArrayExtensions and work on some IArrowArray column.

Binding Parameters to a DataFrame

WithParameters binds parameter values to a DataFrame created from a parameterized SQL query. It accepts the same NamedScalarValueAndMetadata as SqlAsync and returns the same DataFrame instance for chaining:

using var df = await context.SqlAsync("SELECT * FROM orders WHERE id = $id");
df.WithParameters([("id", 123)]);
await df.ShowAsync();

Reusing a Query Plan

Clone() creates a copy of the query plan, which is useful when you want to execute the same parameterized query with different parameter values:

using var df = await context.SqlAsync("SELECT * FROM orders WHERE status = $status");
using var df1 = df.Clone();
using var df2 = df.Clone();

df1.WithParameters([("status", "Completed")]);
df2.WithParameters([("status", "Pending")]);

await df1.ShowAsync();
await df2.ShowAsync();