
Transforms

Transforms normalise a plaintext value before it is passed to the HMAC function. Normalisation ensures that logically equivalent inputs - differing only in case or whitespace - produce the same blind index fingerprint.

Without transforms, searching for "jane@example.com" would not match a record saved with "Jane@Example.com", even though they refer to the same address.
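The sketch below illustrates the point above. It fingerprints a stored and a searched email address with HMAC-SHA256, first raw and then after an invariant-lowercase-and-trim normalisation. It is a hedged sketch, not Tayra's implementation. The hard-coded key, the choice of HMAC-SHA256, the hex encoding, and the hypothetical `Fingerprint`/`Normalise` helpers are all assumptions made for this example. `HMACSHA256.HashData` requires .NET 6 or later.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Assumption: a fixed demo key. A real deployment keeps the HMAC key secret.
byte[] key = Encoding.UTF8.GetBytes("demo-key-not-for-production");

// Hypothetical helper: HMAC-SHA256 over the UTF-8 bytes, hex-encoded.
string Fingerprint(string value) =>
    Convert.ToHexString(HMACSHA256.HashData(key, Encoding.UTF8.GetBytes(value)));

// Hypothetical helper mirroring Transforms = ["lowercase", "trim"].
string Normalise(string value) => value.ToLowerInvariant().Trim();

string saved = "Jane@Example.com";
string searched = "jane@example.com";

// Raw values: different bytes, so different fingerprints.
Console.WriteLine(Fingerprint(saved) == Fingerprint(searched));                       // False
// Normalised values: identical bytes, so identical fingerprints.
Console.WriteLine(Fingerprint(Normalise(saved)) == Fingerprint(Normalise(searched))); // True
```

HMAC cannot tell that two strings "mean" the same thing, so any equivalence you want searches to respect has to be made explicit by the transforms before hashing.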

Configuring Transforms

Transforms are declared on the [BlindIndex] attribute as an ordered array of strings:

```csharp
[BlindIndex(
    CompanionProperty = nameof(EmailHash),
    Transforms = ["lowercase", "trim"])]
public string Email { get; set; } = "";
```

Transforms are applied left to right. The example above first converts to lowercase, then strips surrounding whitespace.

Built-In Transforms

lowercase

Converts the entire value to lowercase using the invariant culture.

| Input | Output |
| --- | --- |
| "Jane@Example.COM" | "jane@example.com" |
| "ACME Corp" | "acme corp" |
| "already lower" | "already lower" |

Use on: email addresses, usernames, domain names.
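In .NET, invariant-culture lowercasing is performed by `string.ToLowerInvariant()`. The sketch below assumes that the built-in `lowercase` transform behaves equivalently, since this page does not show its source, and checks it against the table above. The invariant culture matters because a culture-sensitive `ToLower()` can produce different results on servers configured with different cultures. Under Turkish (`tr-TR`), for example, `"I"` lowercases to a dotless `"ı"`, which would change the fingerprint.

```csharp
using System;

// Hypothetical equivalent of the built-in "lowercase" transform.
string Lowercase(string value) => value.ToLowerInvariant();

Console.WriteLine(Lowercase("Jane@Example.COM")); // jane@example.com
Console.WriteLine(Lowercase("ACME Corp"));        // acme corp
Console.WriteLine(Lowercase("already lower"));    // already lower
```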


trim

Removes leading and trailing whitespace (spaces, tabs, newlines).

| Input | Output |
| --- | --- |
| " jane@example.com " | "jane@example.com" |
| "\tjane\n" | "jane" |
| "no change" | "no change" |

Use on: any field where trailing spaces might appear from user input or data imports.
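In .NET, `string.Trim()` with no arguments removes every leading and trailing character for which `char.IsWhiteSpace` is true. That set includes spaces, tabs and newlines. This sketch assumes the built-in `trim` transform matches that behaviour, although Tayra's exact whitespace set is not documented on this page. `Trim` is a hypothetical local helper.

```csharp
using System;

// Hypothetical equivalent of the built-in "trim" transform.
string Trim(string value) => value.Trim();

// Brackets make the removed whitespace visible in the output.
Console.WriteLine("[" + Trim(" jane@example.com ") + "]"); // [jane@example.com]
Console.WriteLine("[" + Trim("\tjane\n") + "]");           // [jane]
Console.WriteLine("[" + Trim("no change") + "]");          // [no change]
```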


alphanumeric

Removes all characters that are not ASCII letters (a-z, A-Z) or digits (0-9). Useful for normalising names or identifiers that might contain punctuation.

| Input | Output |
| --- | --- |
| "O'Brien" | "OBrien" |
| "Smith-Jones" | "SmithJones" |
| "+1 (555) 867-5309" | "15558675309" |

Combine with lowercase for case-insensitive matching

alphanumeric alone does not change case. Use ["lowercase", "alphanumeric"] if you want case-insensitive matching.
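The following is a minimal sketch of the documented behaviour, using an explicit ASCII range check. `Alphanumeric` is a hypothetical name, and this is not Tayra's code. The explicit ranges are deliberate. `char.IsLetterOrDigit` would also keep non-ASCII letters such as `é`, which `alphanumeric` is documented to remove. The last line applies `lowercase` first, as the tip above recommends.

```csharp
using System;
using System.Linq;

// ASCII letters (a-z, A-Z) and digits (0-9) only, as documented for "alphanumeric".
static bool IsAsciiLetterOrDigit(char c) =>
    (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9');

// Hypothetical equivalent of "alphanumeric": drop every other character.
string Alphanumeric(string value) => new string(value.Where(IsAsciiLetterOrDigit).ToArray());

Console.WriteLine(Alphanumeric("O'Brien"));           // OBrien
Console.WriteLine(Alphanumeric("Smith-Jones"));       // SmithJones
Console.WriteLine(Alphanumeric("+1 (555) 867-5309")); // 15558675309
Console.WriteLine(Alphanumeric("Ren\u00e9e"));        // Rene (the non-ASCII é is dropped)

// ["lowercase", "alphanumeric"]: case-insensitive as well.
Console.WriteLine(Alphanumeric("O'Brien".ToLowerInvariant())); // obrien
```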


digits

Retains only ASCII digit characters (0-9). All other characters are removed. Designed for phone numbers, tax IDs, and other numeric identifiers.

| Input | Output |
| --- | --- |
| "+1 (555) 867-5309" | "15558675309" |
| "SSN: 123-45-6789" | "123456789" |
| "GB VAT 123 456 789" | "123456789" |

last4

Retains only the last 4 characters of its input, that is, the value as it stands after any earlier transforms in the list have run. Values shorter than 4 characters are returned unchanged. Commonly used for partial credit card or SSN matching.

| Input | Output |
| --- | --- |
| "4111111111111111" | "1111" |
| "123-45-6789" | "6789" |
| "AB12" | "AB12" |
| "AB" | "AB" (shorter than 4, returned as-is) |

Combine last4 with digits for card numbers

Use ["digits", "last4"] to strip formatting characters before taking the last four digits. This ensures "4111-1111-1111-1111" and "4111111111111111" produce the same result.


first_char

Retains only the first character of the value. Useful for bucketed or initial-based lookups.

| Input | Output |
| --- | --- |
| "Jane" | "J" |
| "jane" | "j" |
| "" | "" (empty string is preserved) |

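Low-cardinality warning

first_char can only produce as many distinct values as there are possible first characters. For Latin-script values, that is 52 letters (26 once combined with lowercase), plus digits and symbols. This is a very low-cardinality blind index and is susceptible to frequency analysis. See Security Considerations.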
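The following sketch makes the warning concrete. It uses a hypothetical `FirstChar` helper that, as documented, returns empty input unchanged, and applies it to a handful of made-up names. Several distinct names share each index value. Anyone who can read the index column therefore learns how records cluster by initial, which is the raw material for frequency analysis.

```csharp
using System;
using System.Linq;

// Hypothetical equivalent of "first_char": first character, or "" for empty input.
string FirstChar(string value) => value.Length == 0 ? value : value.Substring(0, 1);

// Made-up sample data, not taken from the documentation.
string[] names = { "Jane", "John", "Jim", "Alice", "Amir", "Bob" };

// Group names by their first_char output; GroupBy keeps first-seen key order.
var buckets = names.GroupBy(n => FirstChar(n)).Select(g => $"{g.Key}:{g.Count()}");
Console.WriteLine(string.Join(", ", buckets)); // J:3, A:2, B:1
```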


Transform Ordering

Transforms are applied in the order they are declared. Order matters.

Example: ["trim", "lowercase", "digits"]

Input:  "  +1 (555) 867-5309  "
  trim →  "+1 (555) 867-5309"
  lowercase → "+1 (555) 867-5309"  (no letters, no change)
  digits → "15558675309"

Example: ["digits", "last4"]

Input:  "4111-1111-1111-1111"
  digits → "4111111111111111"
  last4 → "1111"

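Reversing the order (["last4", "digits"]) would pass the formatted string to last4. For "4111-1111-1111-1111" the result happens to be the same. If the input ends in a non-digit, however, the results differ. For "4111-1111-1111-1111 " (with a trailing space), last4 returns "111 " and digits then reduces it to "111", which no longer matches the "1111" produced for the unformatted number.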
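The ordering rule amounts to a left-to-right fold over the transform list. The sketch below replays both traces above and then the reversed `["last4", "digits"]` order. The name-to-function dictionary and the `Apply` helper are hypothetical stand-ins for the built-ins, not Tayra's internals. The card number with a trailing space is an extra example, chosen to show where reversing the order changes the result.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the built-in transforms, keyed by their documented names.
var transforms = new Dictionary<string, Func<string, string>>
{
    ["trim"] = v => v.Trim(),
    ["lowercase"] = v => v.ToLowerInvariant(),
    ["digits"] = v => new string(v.Where(c => c >= '0' && c <= '9').ToArray()),
    ["last4"] = v => v.Length <= 4 ? v : v.Substring(v.Length - 4),
};

// Apply the named transforms left to right, each one receiving the previous output.
string Apply(string input, params string[] names) =>
    names.Aggregate(input, (value, name) => transforms[name](value));

Console.WriteLine(Apply("  +1 (555) 867-5309  ", "trim", "lowercase", "digits")); // 15558675309
Console.WriteLine(Apply("4111-1111-1111-1111", "digits", "last4"));              // 1111

// Order matters once a formatting character sits at the end of the input.
string card = "4111-1111-1111-1111 ";
Console.WriteLine(Apply(card, "digits", "last4")); // 1111
Console.WriteLine(Apply(card, "last4", "digits")); // 111
```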

Custom Transforms

Use WithTransform() to add inline custom transforms in the fluent API:

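```csharp
using Microsoft.Extensions.DependencyInjection; // ServiceCollection
// The Tayra namespace that provides AddTayra() is also required (not shown on this page).

// Inline custom transforms - no class or registration needed.
// licenseKey is assumed to hold your Tayra licence key.
var transformServices = new ServiceCollection();
var transformBuilder = transformServices.AddTayra(opts => opts.LicenseKey = licenseKey);
transformBuilder.Entity<IndexedCustomer>(e =>
{
    e.DataSubjectId(c => c.CustomerId);
    e.PersonalData(c => c.Email);
    e.BlindIndex(c => c.Email)
        .WithTransform(value => value.Split('@')[0]) // 1. extract the local part
        .WithLowercase()                             // 2. then lowercase it
        .StoredIn(c => c.EmailIndex);
});
```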

Custom transforms are just functions, so no class or registration is needed. They run in the same ordered pipeline as the built-in transforms. In the example above, the local part is extracted first and the result is then lowercased.
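Because the custom transform is an ordinary lambda, it can be tested on its own. The sketch below reproduces the example's pipeline (extract the local part, then lowercase) with plain `Func<string, string>` delegates, outside Tayra. It assumes that `WithLowercase()` behaves like `ToLowerInvariant()`. It does not claim that `WithTransform()` accepts exactly this delegate type. `Pipeline` is a hypothetical helper.

```csharp
using System;

// Same lambda as the WithTransform() example: keep the part before '@'.
Func<string, string> localPart = value => value.Split('@')[0];
// Assumed equivalent of WithLowercase().
Func<string, string> lowercase = value => value.ToLowerInvariant();

// Declared order: local part first, then lowercase.
string Pipeline(string value) => lowercase(localPart(value));

Console.WriteLine(Pipeline("Jane.Doe@Example.com")); // jane.doe
Console.WriteLine(Pipeline("JANE.DOE@other.org"));   // jane.doe
Console.WriteLine(Pipeline("no-at-sign"));           // no-at-sign
Console.WriteLine(Pipeline("").Length);              // 0
```

The second line shows the consequence of indexing only the local part. Addresses on different domains collapse to the same fingerprint, so a search for jane.doe@example.com would also match jane.doe@other.org. That may be the intent, but it is worth deciding deliberately.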

Custom Transform Rules

  • The function must be a pure function - same input always produces the same output.
  • The function must not throw on an empty string.
  • Transforms should be fast (no I/O, no allocations if avoidable).
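The first two rules can be spot-checked before a transform is wired in. The sketch below uses a hypothetical `LooksSafe` helper, which is not part of Tayra. It calls a candidate on an empty string and then twice on the same sample. Passing this check does not prove that a transform is pure, but failing it shows that a rule is broken.

```csharp
using System;

// Hypothetical helper, not part of Tayra: spot-checks two of the rules above.
bool LooksSafe(Func<string, string> transform, string sample)
{
    try
    {
        transform("");                                 // must not throw on an empty string
        return transform(sample) == transform(sample); // same input, same output
    }
    catch (Exception)
    {
        return false;
    }
}

Func<string, string> localPart = v => v.Split('@')[0];
Func<string, string> firstThree = v => v.Substring(0, 3); // throws on inputs shorter than 3

int calls = 0;
Func<string, string> counting = v => v + (++calls);      // depends on hidden state: not pure

Console.WriteLine(LooksSafe(localPart, "jane@example.com"));  // True
Console.WriteLine(LooksSafe(firstThree, "jane@example.com")); // False
Console.WriteLine(LooksSafe(counting, "jane@example.com"));   // False
```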

Transform Reference Summary

| Name | Effect | Typical Use |
| --- | --- | --- |
| lowercase | Converts to invariant lowercase | Email, username |
| trim | Removes leading/trailing whitespace | Any user-input field |
| alphanumeric | Keeps only [a-zA-Z0-9] | Names, identifiers |
| digits | Keeps only [0-9] | Phone numbers, tax IDs |
| last4 | Keeps last 4 characters | Card numbers, SSN suffix |
| first_char | Keeps first character only | Bucketed lookups |

See Also