Array Blind Indexes
[BlindIndex] makes a single encrypted string queryable. [ArrayBlindIndex] does the same for a string collection - a string[], List<string>, HashSet<string>, and similar. Tayra computes one HMAC hash per element and stores the result in a companion collection of the same shape and length, so you can answer questions like "find every subject whose email list contains alice@example.com" without decrypting a single row.
Version compatibility
[ArrayBlindIndex] is a Tayra.Core feature with no framework-specific integration code, so it works across every supported store and integration package - Tayra.Marten8 / Tayra.Marten9, Tayra.Wolverine5 / Tayra.Wolverine6, EF Core (Npgsql), and MongoDB. The companion collection is plain data that every supported serializer round-trips natively, so no version-specific configuration is required.
When to Use
Reach for [ArrayBlindIndex] when:
- A property holds a collection of personal strings (email addresses, phone numbers, aliases, prior surnames) and you need equality lookups against the collection.
- You want native LINQ Contains / Any to translate into an indexed database operation rather than a full table scan.
It is the collection sibling of [BlindIndex]. The HMAC machinery, key separation, transforms, scope, and truncation all behave exactly as they do for the scalar attribute - applied per element.
Quick Start
Apply [ArrayBlindIndex] next to [PersonalData] and add a companion collection property:
```csharp
public class User
{
    [DataSubjectId]
    public Guid Id { get; set; }

    [PersonalData]
    [ArrayBlindIndex(
        IndexName = "EmailsIndex",
        IndexPropertyName = nameof(EmailsIndex),
        Transforms = ["lowercase", "trim"])]
    public string[] Emails { get; set; } = [];

    /// <summary>
    /// Companion collection - one HMAC hash per element of Emails.
    /// Same kind (array) and same element nullability as the source.
    /// </summary>
    public string[] EmailsIndex { get; set; } = [];
}
```

When EncryptAsync runs (or the EF Core / Marten interceptor fires on save), Tayra does the following for each element of Emails: it applies the transforms, HMACs the result, and writes the hash into the matching slot of EmailsIndex. The source order, length, and duplicates are preserved.
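The write path above amounts to a loop over the elements. The following is a minimal sketch of that idea, not Tayra's implementation. It makes three assumptions: HMAC-SHA256 over UTF-8 bytes, lowercase hex output, and a hard-coded demo key. Tayra's real key management, hash encoding, and transform pipeline may differ. The helper names (Normalize, HashElement, BuildCompanion) are hypothetical.

```csharp
using System;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

// Illustrative sketch only - not Tayra's implementation.
// Assumptions: HMAC-SHA256 over UTF-8, lowercase hex output, a hard-coded demo key.
byte[] demoKey = Encoding.UTF8.GetBytes("demo-hmac-key-do-not-use");

// Apply the "lowercase" then "trim" transforms, in declaration order.
string Normalize(string value) => value.ToLowerInvariant().Trim();

// Hash one element: transforms first, then HMAC.
string HashElement(string value) =>
    Convert.ToHexString(HMACSHA256.HashData(demoKey, Encoding.UTF8.GetBytes(Normalize(value))))
        .ToLowerInvariant();

// One hash per element; Select keeps source order, length, and duplicates.
string[] BuildCompanion(string[] source) => source.Select(HashElement).ToArray();

string[] emails = ["Alice@Example.com ", "bob@example.com", "alice@example.com"];
string[] emailsIndex = BuildCompanion(emails);

Console.WriteLine(emailsIndex.Length);               // 3
Console.WriteLine(emailsIndex[0] == emailsIndex[2]); // True: equal after transforms
```

The first and third elements differ only in case and trailing whitespace, so after transforms they produce the same hash. That is exactly what makes a normalised lookup match.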
Attribute Reference
[ArrayBlindIndex] mirrors [BlindIndex] exactly. There are no array-specific knobs (no dedupe option, no length cap - those are the caller's responsibility).
| Property | Type | Default | Description |
|---|---|---|---|
| IndexName | string? | {PropertyName}Index | Logical query name passed to ComputeBlindIndexAsync. |
| IndexPropertyName | string? | {PropertyName}Index | Name of the companion collection property that holds the hashes. |
| BitLength | int | 0 (full 256-bit) | Truncates each element hash to the given bit length. See Security Considerations. |
| Scope | string | "default" | HMAC key scope. Elements in the same scope share an HMAC key. |
| Transforms | string[] | [] | Normalisation transforms applied to each element before hashing. See Transforms. |
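The sketch below makes Scope and BitLength concrete. It is hypothetical: each scope gets its own key, and each element hash is cut to a bit length. Both rules are assumptions for illustration only, not how Tayra is documented to derive keys or truncate hashes:

- Scope-key derivation: an HMAC of "scope:" plus the scope name, under a master key.
- Truncation: keep the leading bits and zero the unused low bits of the final byte.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical sketch - NOT Tayra's key derivation or truncation scheme.
byte[] masterKey = Encoding.UTF8.GetBytes("demo-master-key-do-not-use");

// Assumption: each scope gets its own key, derived from a master key.
byte[] ScopeKey(string scope) =>
    HMACSHA256.HashData(masterKey, Encoding.UTF8.GetBytes("scope:" + scope));

// Hash one (already transformed) element; bitLength 0 keeps the full 256 bits.
string HashElement(string value, string scope, int bitLength)
{
    byte[] digest = HMACSHA256.HashData(ScopeKey(scope), Encoding.UTF8.GetBytes(value));
    if (bitLength <= 0 || bitLength >= 256)
        return Convert.ToHexString(digest).ToLowerInvariant();

    // Keep the leading bitLength bits and zero the unused low bits of the last byte.
    int byteCount = (bitLength + 7) / 8;
    byte[] truncated = digest[..byteCount];
    int unusedBits = byteCount * 8 - bitLength;
    truncated[^1] &= (byte)((0xFF << unusedBits) & 0xFF);
    return Convert.ToHexString(truncated).ToLowerInvariant();
}

string fullHash = HashElement("alice@example.com", "default", 0);
string hash16 = HashElement("alice@example.com", "default", 16);
string otherScope = HashElement("alice@example.com", "emails", 0);

Console.WriteLine(fullHash.Length);             // 64 hex chars = 256 bits
Console.WriteLine(hash16.Length);               // 4 hex chars = 16 bits
Console.WriteLine(fullHash.StartsWith(hash16)); // True
Console.WriteLine(fullHash == otherScope);      // False: different scope, different key
```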
Fluent Configuration
If you prefer keeping attributes off your model, configure the same index fluently. ArrayBlindIndex sits alongside the scalar BlindIndex and CompoundBlindIndex builders and exposes the full attribute surface: transforms, StoredIn (companion), WithIndexName, WithScope, and WithBitLength.
```csharp
services.AddTayra(opts =>
{
    opts.LicenseKey = licenseKey;

    opts.Entity<User>(e =>
    {
        e.ArrayBlindIndex(u => u.Emails)
            .WithLowercase()
            .WithTrim()
            .StoredIn(u => u.EmailsIndex) // optional; defaults to {PropertyName}Index
            .WithScope("emails")
            .WithBitLength(0);            // 0 = full 256-bit hash (the default)
    });
});
```

The fluent and attribute paths produce identical companions for the same plaintext, transforms, and scope. The shape rules below are enforced either way:

- The fluent path validates the source and companion collection types when the metadata is built, and throws InvalidOperationException on a bad shape.
- The attribute path is additionally checked at compile time by TAYRA010.
Source and Companion Shape Rules
The companion property must be the same collection kind and same element nullability as the source. The analyzer TAYRA010 enforces this at compile time; a runtime guard backs it up for reflection-defined types.
| Declared source type | Required companion type |
|---|---|
| string[] | string[] |
| string?[] | string?[] |
| List<string> | List<string> |
| List<string?> | List<string?> |
| IList<string> | IList<string> |
| IList<string?> | IList<string?> |
| IReadOnlyList<string> | IReadOnlyList<string> |
| IReadOnlyList<string?> | IReadOnlyList<string?> |
| HashSet<string> | HashSet<string> |
| HashSet<string?> | HashSet<string?> |
| IEnumerable<string> | Rejected - needs a materialized shape |
| Non-string elements (int[], Guid[], ...) | Rejected - v1 supports string elements only |
Notes:
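- Source nullability drives the contract. If the source declares non-null elements (string[], List<string>), a null element at write time is a caller bug and the indexer throws. Declare the nullable variant (string?[], List<string?>) if your data legitimately contains null slots.
- HashSet<string> dedupes by set semantics. Because the source set already holds distinct values, the companion holds distinct hashes. The reverse does not hold: two values that differ in the source but are equal after transforms produce the same hash. For example, "Alice@example.com" and "alice@example.com" under lowercase end up as a single entry in the companion set. Order is not preserved (consistent with HashSet).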
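A runtime guard for reflection-defined types could check the table's rules roughly as follows. This is a hypothetical sketch, not Tayra's validator. It compares CLR types only, so it enforces the collection kind and the string element type. It cannot see element nullability: string and string? are the same runtime type. A fuller guard could read nullable annotations with System.Reflection.NullabilityInfoContext.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical runtime guard - NOT Tayra's validator. It checks collection kind and
// element type only: string and string? are the same CLR type, so element
// nullability is not visible here.
Type[] supportedGenericKinds =
    [typeof(List<>), typeof(IList<>), typeof(IReadOnlyList<>), typeof(HashSet<>)];

bool IsSupportedSource(Type t) =>
    t == typeof(string[]) ||
    (t.IsGenericType &&
     Array.IndexOf(supportedGenericKinds, t.GetGenericTypeDefinition()) >= 0 &&
     t.GetGenericArguments()[0] == typeof(string));

void ValidateShape(Type source, Type companion)
{
    // Rejects IEnumerable<string>, non-string elements, and unlisted collection kinds.
    if (!IsSupportedSource(source))
        throw new InvalidOperationException($"{source} is not a supported source type.");
    // The companion must be exactly the same collection kind as the source.
    if (companion != source)
        throw new InvalidOperationException(
            $"Companion type {companion} must be the same collection kind as source type {source}.");
}

ValidateShape(typeof(string[]), typeof(string[]));         // accepted
ValidateShape(typeof(List<string>), typeof(List<string>)); // accepted
Console.WriteLine("shapes OK");
```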
Null, Empty, and Duplicate Behavior
| Source state | Companion result | Notes |
|---|---|---|
| Whole collection is null | Companion left untouched (no-op) | Tayra never overwrites with an empty collection here. |
| Empty collection | Empty companion of the same kind | string[0] to string[0], List<string>(0) to List<string>(0). |
| null element, source admits nulls (string?[]) | null companion slot | Index alignment preserved. |
| null element, source declares non-null (string[]) | Indexer throws | Caller bug surfaced loudly with the entity type, property, and index. |
| "" (empty string) element | HMAC(transforms("")) | Hashed normally - same as scalar [BlindIndex]. |
| Duplicate elements (["a", "a"]) | Duplicate hashes ([h("a"), h("a")]) | Preserved, unless the source is HashSet<string>, which dedupes first. |
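The table's rules fit in one small function. This sketch is illustrative only and makes three assumptions:

- A placeholder hash delegate stands in for the transform-plus-HMAC step.
- A boolean stands in for the source's declared element nullability.
- A null return value means "leave the companion untouched".

It models the array case only; a HashSet<string> source would dedupe before hashing.

```csharp
#nullable enable
using System;

// Hypothetical sketch of the null/empty/duplicate rules - NOT Tayra's indexer.
// `hash` stands in for transforms + HMAC; `allowNullElements` models a source
// declared string?[] (true) versus string[] (false).
string?[]? ComputeCompanion(string?[]? source, Func<string, string> hash, bool allowNullElements)
{
    if (source is null) return null; // null source: caller leaves the companion untouched

    var companion = new string?[source.Length]; // empty source -> empty companion
    for (int i = 0; i < source.Length; i++)
    {
        string? element = source[i];
        if (element is null)
        {
            if (!allowNullElements)
                throw new InvalidOperationException(
                    $"Null element at index {i} in a collection declared non-null.");
            continue; // slot stays null, keeping index alignment with the source
        }
        companion[i] = hash(element); // "" is hashed like any other value
    }
    return companion; // duplicates in, duplicates out
}

Func<string, string> fakeHash = s => "h(" + s + ")"; // placeholder, not an HMAC

Console.WriteLine(ComputeCompanion(null, fakeHash, true) is null); // True
Console.WriteLine(ComputeCompanion([], fakeHash, true)!.Length);   // 0
Console.WriteLine(string.Join(",", ComputeCompanion(["a", null, "a"], fakeHash, true)!)); // h(a),,h(a)
```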
Querying
Querying uses each LINQ provider's native collection operators against the companion - there are no Tayra-specific LINQ extensions or expression rewriters. Compute the search hash (or hashes) with Tayra, then write a plain Contains / Any query.
Single value
```csharp
// "users whose email list contains this address"
var hash = await tayra.ComputeBlindIndexAsync(
    "alice@example.com", "EmailsIndex", typeof(User));

var users = await session.Query<User>()
    .Where(u => u.EmailsIndex.Contains(hash))
    .ToListAsync();
```

Set of values
ComputeBlindIndexesAsync hashes a batch of search terms in one call, reusing the cached HMAC key:
```csharp
// "users matching any of these addresses"
var hashes = await tayra.ComputeBlindIndexesAsync(
    ["alice@example.com", "bob@example.com"], "EmailsIndex", typeof(User));

var users = await session.Query<User>()
    .Where(u => u.EmailsIndex.Any(h => hashes.Contains(h)))
    .ToListAsync();
```

The signatures are:
```csharp
Task<string> ITayra.ComputeBlindIndexAsync(
    string value, string indexName, Type entityType, CancellationToken ct = default);

// extension on ITayra
Task<IReadOnlyList<string>> TayraBlindIndexExtensions.ComputeBlindIndexesAsync(
    this ITayra tayra,
    IEnumerable<string> values, string indexName, Type entityType,
    CancellationToken ct = default);
```

Both take a Type entityType argument so Tayra can resolve the index definition and its scope. See Querying for the full pattern reference.
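Both query shapes are ordinary LINQ predicates, so you can check what they mean with LINQ to Objects. This is handy for unit tests that do not need a database. In this sketch the rows and hash strings are placeholders. In a real query, the provider translates the same lambdas into SQL or a MongoDB filter instead of evaluating them in memory.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// In-memory illustration (LINQ to Objects) of the two query shapes.
// Rows and hash strings are placeholders for real companion values.
var rows = new List<(string Name, string[] EmailsIndex)>
{
    ("alice", new[] { "hA", "hA2" }),
    ("bob", new[] { "hB" }),
    ("carol", Array.Empty<string>()),
};

// Single value: rows whose companion contains this hash.
string hash = "hA";
List<string> single = rows.Where(u => u.EmailsIndex.Contains(hash)).Select(u => u.Name).ToList();

// Set of values: rows whose companion shares at least one hash with the search set.
IReadOnlyList<string> hashes = new[] { "hA", "hB" };
List<string> anyMatch = rows.Where(u => u.EmailsIndex.Any(h => hashes.Contains(h))).Select(u => u.Name).ToList();

Console.WriteLine(string.Join(",", single));   // alice
Console.WriteLine(string.Join(",", anyMatch)); // alice,bob
```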
Per-Backend Storage and Query
Tayra stays storage-agnostic - it computes the companion collection and lets your store persist it. The guidance below covers the shape and index recommendation for each backend.
Marten (JSONB)
Marten serialises the companion as a JSON array inside the document. Native LINQ Contains / Any translate to JSONB containment. Register a GIN index over the companion path for efficient lookups:
```csharp
services.AddMarten(opts =>
{
    opts.Connection(connectionString);
    opts.UseTayra();

    // GIN index on the companion array for fast containment queries
    opts.Schema.For<User>().Index(x => x.EmailsIndex, idx =>
    {
        idx.Method = IndexMethod.gin;
    });
});
```

```csharp
var hash = await tayra.ComputeBlindIndexAsync(
    "alice@example.com", "EmailsIndex", typeof(User));

var users = await session.Query<User>()
    .Where(u => u.EmailsIndex.Contains(hash))
    .ToListAsync();
```

Duplicated fields: companion only, never the source
The companion holds HMAC hashes, not personal data, so it is safe to expose as a Marten duplicated field - the same pattern TAYRA008 steers you toward for scalar [BlindIndex] companions. The [PersonalData] source array must never be duplicated: Marten reads a duplicated column off the .NET object outside Tayra's serializer, so duplicating the source would write plaintext (TAYRA008 flags it).
```csharp
opts.Schema.For<User>()
    // OK: the companion is hashes, not PII
    .Duplicate(x => x.EmailsIndex, pgType: "text[]", configure: idx => idx.Method = IndexMethod.gin);
    // NOT x.Emails - that is [PersonalData]; TAYRA008 will flag it
```

One caveat is specific to array companions. A scalar companion duplicates into an indexable equality column, but a duplicated array companion does not speed up Contains lookups:

- Marten translates a duplicated-array Contains to ? = ANY(col), which a GIN index does not serve.
- The JSONB companion path above translates to the @> containment operator, which a GIN index does serve.

So for array companions, prefer the JSONB GIN index. Only duplicate the companion if you need the relational array column for a different access pattern.
EF Core + Npgsql (text[])
Map the companion as a PostgreSQL text[] column and add a GIN index. Contains translates to the @> containment operator:
```csharp
protected override void OnModelCreating(ModelBuilder modelBuilder)
{
    modelBuilder.Entity<User>(b =>
    {
        b.Property(u => u.EmailsIndex)
            .HasColumnType("text[]");

        b.HasIndex(u => u.EmailsIndex)
            .HasMethod("gin");
    });
}
```

```csharp
var hash = await tayra.ComputeBlindIndexAsync(
    "alice@example.com", "EmailsIndex", typeof(User));

var users = await db.Users
    .Where(u => u.EmailsIndex.Contains(hash))
    .ToListAsync();
```

The generated SQL uses the containment operator against the GIN index:
```sql
WHERE x."EmailsIndex" @> ARRAY[@hash]::text[]
```

The set query also works:
```csharp
var hashes = await tayra.ComputeBlindIndexesAsync(
    ["alice@example.com", "bob@example.com"], "EmailsIndex", typeof(User));

var users = await db.Users
    .Where(u => u.EmailsIndex.Any(h => hashes.Contains(h)))
    .ToListAsync();
```

EF Core + SQL Server / SQLite (caller-driven)
SQL Server and SQLite have no native array column type. Tayra does not ship migration helpers for them in v1, but the companion still works if you provide the storage shape yourself. Two common patterns:
- JSON column with a value converter. Map the companion string[] to a JSON nvarchar(max) / text column via a ValueConverter, and query with the provider's JSON functions (for example OPENJSON on SQL Server). Containment may need an explicit JSON-path predicate.
- Sidecar table. Project the companion into a (EntityId, Field, Hash) table with a composite index on (Field, Hash), write the join yourself, and treat it as a dev-only pattern.
These are guidance only - they are not shipped or test-covered in v1.
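For the sidecar-table option, the projection step flattens each entity's companion into rows. The sketch below shows that projection and the matching lookup, in memory only. The (EntityId, Field, Hash) row shape comes from the pattern above. Collapsing repeated hashes per record, and the placeholder ids and hashes, are illustrative choices. Persisting the rows, creating the composite (Field, Hash) index, and joining back to the entity table are left to your data layer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustrative sidecar projection - NOT shipped by Tayra.
Guid aliceId = Guid.Parse("11111111-1111-1111-1111-111111111111");
Guid bobId = Guid.Parse("22222222-2222-2222-2222-222222222222");

var users = new List<(Guid Id, string[] EmailsIndex)>
{
    (aliceId, new[] { "hA", "hA", "hX" }),
    (bobId, new[] { "hB" }),
};

// Flatten each companion into (EntityId, Field, Hash) rows. Repeats within one
// record are collapsed here (an illustrative choice): the lookup only needs existence.
var sidecar = users
    .SelectMany(u => u.EmailsIndex.Distinct()
        .Select(h => (EntityId: u.Id, Field: "EmailsIndex", Hash: h)))
    .ToList();

// In-memory equivalent of "WHERE Field = @field AND Hash = @hash" on the sidecar table;
// the resulting ids would then be joined back to the entity table.
Guid[] MatchingIds(string field, string hash) =>
    sidecar.Where(r => r.Field == field && r.Hash == hash)
        .Select(r => r.EntityId)
        .Distinct()
        .ToArray();

Console.WriteLine(sidecar.Count);                           // 3
Console.WriteLine(MatchingIds("EmailsIndex", "hA").Length); // 1
```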
MongoDB (multikey)
MongoDB stores the companion as a native BSON array. Native LINQ Contains translates to a multikey-indexable match. Create an index on the companion field. MongoDB makes an ordinary ascending index on an array field multikey automatically:
```csharp
var keys = Builders<User>.IndexKeys.Ascending(u => u.EmailsIndex);
await collection.Indexes.CreateOneAsync(new CreateIndexModel<User>(keys));
```

```csharp
var hash = await tayra.ComputeBlindIndexAsync(
    "alice@example.com", "EmailsIndex", typeof(User));

var users = await collection
    .Find(u => u.EmailsIndex.Contains(hash))
    .ToListAsync();
```

Automatic recompute after key rotation is not yet available for MongoDB. If you rotate the HMAC key, re-hash and re-save affected documents yourself until the Mongo recompute service ships.
Recompute After Key Rotation
When you rotate an HMAC key, the existing companion hashes become stale. The recompute services in Tayra.EFCore and Tayra.Marten handle array companions automatically: for each entity they read the source collection and re-hash every element under the new key, preserving null slots and collection shape. No array-specific call is required - run the same RecomputeAsync<T> you use for scalar indexes. See Recompute.
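Conceptually, a recompute runs the write path again under the new key. A manual MongoDB re-hash has to do the same. The sketch below illustrates this under stated assumptions: the source values are already decrypted, HMAC-SHA256 with hard-coded demo keys stands in for Tayra's keyed hashing, and transforms are omitted. It is not the RecomputeAsync<T> implementation, and it leaves out loading and saving entities.

```csharp
#nullable enable
using System;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

// Illustration only - NOT Tayra's recompute service. Demo keys, no transforms.
byte[] oldKey = Encoding.UTF8.GetBytes("old-demo-key");
byte[] newKey = Encoding.UTF8.GetBytes("new-demo-key");

string Hmac(byte[] key, string value) =>
    Convert.ToHexString(HMACSHA256.HashData(key, Encoding.UTF8.GetBytes(value))).ToLowerInvariant();

// Re-hash every element of the (already decrypted) source under `key`,
// keeping null slots null and the array length unchanged.
string?[] Recompute(string?[] source, byte[] key) =>
    source.Select(e => e is null ? null : Hmac(key, e)).ToArray();

string?[] emails = ["alice@example.com", null, "bob@example.com"];
string?[] staleIndex = Recompute(emails, oldKey);
string?[] freshIndex = Recompute(emails, newKey);

Console.WriteLine(freshIndex.Length == staleIndex.Length); // True
Console.WriteLine(freshIndex[1] is null);                  // True
Console.WriteLine(freshIndex[0] == staleIndex[0]);         // False: old hashes are stale
```

A search hash computed under the new key matches only the recomputed companion. This is why stale companions silently stop returning hits until they are rebuilt.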
Security Considerations
Array blind indexes inherit every trade-off of scalar blind indexes (see Security) and add one of their own: element multiplicity leakage.
- Frequency leakage is multiplied across array length. As with scalar indexes, identical plaintext produces identical hashes, so an attacker reading the companion can perform frequency analysis. With arrays this is worse: a common value that appears in many elements across many records produces a high-frequency hash that stands out more sharply than a single-column scalar would.
- Use HashSet<string> to drop duplicate-frequency leakage. If the same value can legitimately appear multiple times within one record's collection, a List<string> / string[] companion leaks that per-record multiplicity (the hash repeats). Declaring the source as HashSet<string> collapses duplicates to a single hash per record via set semantics, removing the within-record frequency signal.
- Prefer BitLength = 0 (full hash) for HashSet<string> companions. Truncation deliberately introduces collisions, and set semantics then absorb those collisions silently. The two effects compound: truncation false positives plus set absorption widen the query-hit fan-out more than they would for an ordered List<string> / string[] companion at the same BitLength. Keep the full 256-bit hash when the companion is a set.
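The multiplicity leak is easy to see by counting repeats inside one record's companion. This sketch uses a placeholder HMAC-SHA256 with a demo key and no transforms. Its only purpose is to contrast an ordered companion with a HashSet<string> one; it does not reflect how Tayra keys or stores hashes.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

// Illustration only: placeholder HMAC with a demo key.
byte[] demoKey = Encoding.UTF8.GetBytes("demo-hmac-key-do-not-use");
string H(string v) =>
    Convert.ToHexString(HMACSHA256.HashData(demoKey, Encoding.UTF8.GetBytes(v))).ToLowerInvariant();

// One record whose source repeats a value three times.
string[] aliases = ["al", "al", "al", "alice"];

// Ordered companion (string[] / List<string>): the repeat count is visible.
string[] orderedCompanion = aliases.Select(H).ToArray();
int maxRepeat = orderedCompanion.GroupBy(h => h).Max(g => g.Count());

// Set companion (HashSet<string>): each hash appears at most once per record.
var setCompanion = new HashSet<string>(aliases.Select(H));

Console.WriteLine(maxRepeat);          // 3: an observer learns "some value repeats 3x"
Console.WriteLine(setCompanion.Count); // 2: only distinct values remain
```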
For low-cardinality collection values the same advice as scalar indexes applies - avoid blind indexes where the value space is small enough to enumerate.
See Also
- Blind Indexes Overview - How HMAC blind indexes work
- Querying - EF Core and Marten query examples
- Transforms - Per-element normalisation
- Recompute - Rebuilding companions after key rotation
- Security - Threat model and cardinality risks
- Roslyn Analyzers - TAYRA010 array shape validation
