How we built hybrid search for SURF's orchestrator-core, coming in v5.0

We’re excited to share a project we’ve been working on with SURF, the collaborative ICT organization for Dutch education and research. Working together as a small, focused team, we built a hybrid search system for orchestrator-core, and it’s shipping in the upcoming 5.0 release.

The challenge

Orchestrator-core is an open-source framework for managing product lifecycles and workflows. It powers critical network infrastructure at organizations like SURF, where thousands of subscriptions, products, and workflows need to be managed reliably.

The problem? These domain models are dynamic and user-defined. A single subscription can contain hundreds of attributes distributed across deeply nested structures. The searchable schema isn’t fixed — it evolves as users define new products and product blocks.

Traditional search approaches — Elasticsearch, static column indexes, or simple LIKE queries — don’t cut it here. We needed something that could:

Search across nested, dynamic schemas without knowing the structure at design time
Combine full-text search with semantic (vector) search
Support structured filtering on typed fields (dates, enums, booleans)
Stay inside PostgreSQL — no separate search infrastructure to maintain
Be safe for AI agents to query without generating raw SQL

The solution: PostgreSQL-native hybrid search

Instead of bolting on external search infrastructure, we built everything on top of PostgreSQL using three powerful extensions:

pgvector — semantic search

pgvector enables vector similarity search directly in PostgreSQL. We generate embeddings for text attributes and store them alongside the data. When a user searches for “network connection in Amsterdam,” the system matches on meaning rather than only on exact keywords.
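To make “matching on meaning” concrete: pgvector’s <=> operator returns cosine distance (1 minus cosine similarity), and a semantic query is essentially ORDER BY embedding <=> query_vector LIMIT k. The pure-Python sketch below mirrors that computation as an exact scan. In the real system this runs inside PostgreSQL (an HNSW or IVFFlat index can make it approximate), and the vectors come from an embedding model that is not shown here; the function names and the toy 2-D vectors are illustrative assumptions.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 - cosine similarity, the quantity pgvector's <=> operator returns."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_a * norm_b)


def nearest(query: list[float], rows: dict[str, list[float]], k: int = 3) -> list[str]:
    """Exact scan, similar in spirit to: ORDER BY embedding <=> query LIMIT k."""
    return sorted(rows, key=lambda row_id: cosine_distance(query, rows[row_id]))[:k]


# Toy "embeddings" (hypothetical): real ones have hundreds of dimensions.
rows = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.7, 0.7]}
print(nearest([1.0, 0.1], rows, k=2))
```

Because only the angle between vectors matters, attribute texts with similar meaning land close together even when they share no words.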

pg_trgm — fuzzy text search

PostgreSQL’s trigram extension handles fuzzy matching and typo tolerance. Combined with full-text search, it catches queries that semantic search alone might miss — like searching for a specific subscription ID or a product code with a typo.
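Trigram matching is simple enough to reproduce by hand. pg_trgm lowercases the text, splits it into alphanumeric words, pads each word with two leading spaces and one trailing space, and collects every three-character window; similarity() is then the number of shared trigrams divided by the size of the union. The sketch below is a Python approximation of that behaviour for illustration only (it ignores locale-specific details), not the extension’s actual C implementation.

```python
import re


def trigrams(text: str) -> set[str]:
    """Approximate pg_trgm's show_trgm(): lowercase, split into alphanumeric
    words, pad each word as '  word ', and collect all 3-character windows."""
    grams: set[str] = set()
    for word in re.findall(r"[^\W_]+", text.lower()):
        padded = f"  {word} "
        for i in range(len(padded) - 2):
            grams.add(padded[i : i + 3])
    return grams


def similarity(a: str, b: str) -> float:
    """Shared trigrams over the union of both trigram sets (0.0 to 1.0)."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


print(sorted(trigrams("cat")))
print(similarity("cat", "cats"))
```

A typo only disturbs the few trigrams around the wrong character, so most of the set survives; that is why a misspelled product code still scores well, while pg_trgm’s % operator applies a configurable cut-off (pg_trgm.similarity_threshold, 0.3 by default).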

ltree — hierarchical paths

The ltree extension lets us represent the nested structure of domain models as hierarchical paths. A field like subscription.product_block.interface.speed becomes a traversable tree path, enabling queries across any level of nesting.
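The two ltree features that matter most here are ancestor/descendant tests (path &lt;@ ancestor) and lquery patterns, where * matches zero or more labels. The Python sketch below mirrors those semantics on plain dotted strings so the behaviour is easy to see; real ltree additionally restricts which characters a label may contain and can use GiST indexes, and the helper names here are hypothetical.

```python
def labels(path: str) -> list[str]:
    """Split an ltree-style path such as 'a.b.c' into its labels."""
    return path.split(".")


def is_descendant(path: str, ancestor: str) -> bool:
    """Like ltree's `path <@ ancestor`: True if ancestor is a label-wise
    prefix of path (or equal to it)."""
    p, a = labels(path), labels(ancestor)
    return p[: len(a)] == a


def matches(path: str, pattern: str) -> bool:
    """Tiny lquery subset: literal labels plus '*' (zero or more labels)."""
    p, q = labels(path), labels(pattern)

    def go(i: int, j: int) -> bool:
        if j == len(q):
            return i == len(p)
        if q[j] == "*":
            # Either '*' matches nothing more, or it consumes one label and stays active
            return go(i, j + 1) or (i < len(p) and go(i + 1, j))
        return i < len(p) and p[i] == q[j] and go(i + 1, j + 1)

    return go(0, 0)


path = "subscription.product_block.interface.speed"
print(is_descendant(path, "subscription.product_block"))
print(matches(path, "*.speed"))
```

A pattern like *.speed finds a speed attribute at any depth, which is exactly what a schema you cannot see at design time requires.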

Reciprocal rank fusion

The magic is in how these retrievers work together. We use Reciprocal Rank Fusion (RRF) to merge results from semantic search, trigram matching, and structured filters into a single, unified ranking. Each retriever contributes its own ranking, and RRF combines them without requiring score normalization.
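RRF only looks at positions, not raw scores: each document receives the sum of 1 / (k + rank) over every retriever that returned it, with ranks starting at 1. The sketch below implements that formula. It assumes k = 60, the constant commonly used in the RRF literature; the value orchestrator-core actually uses is not stated here, and the document IDs are made up.

```python
from collections import defaultdict


def reciprocal_rank_fusion(
    rankings: list[list[str]], k: int = 60
) -> list[tuple[str, float]]:
    """Fuse several ranked lists of document IDs into one ranking.

    Each list is ordered best-first. A document's fused score is the sum of
    1 / (k + rank) across all lists it appears in (rank is 1-based).
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest score first; ties broken by ID so the output is deterministic
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


semantic = ["a", "b", "c"]  # hypothetical vector-search ranking
trigram = ["b", "d"]  # hypothetical fuzzy-text ranking
print(reciprocal_rank_fusion([semantic, trigram]))
```

Document b wins because both retrievers rank it highly, even though neither puts it first by a wide margin; cosine distances and trigram similarities never have to be put on a common scale.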

Schema-agnostic indexing with EAV

The core innovation is our Entity-Attribute-Value (EAV) indexing approach. Instead of mapping dynamic schemas to fixed columns, we decompose each entity into individual attribute rows:

entity_type: SUBSCRIPTION
entity_id: 550e8400-e29b-41d4-a716-446655440000
path: product.interface.speed
value: "10Gbps"
value_type: STRING
embedding: [0.023, -0.041, ...]

Each attribute gets its own row with its hierarchical path (via ltree), typed value, and optional embedding vector. This means:

New product types are searchable immediately, with no schema migrations and no full reindex
Nested attributes at any depth are fully queryable
Type safety is preserved — dates are dates, booleans are booleans
Incremental indexing keeps the index in sync without full rebuilds
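The decomposition and the incremental sync can both be sketched in a few lines of Python. The version below walks an arbitrary nested dict, emits one typed row per leaf with a dotted ltree-style path, and then diffs two snapshots so only changed attributes are upserted and vanished paths deleted. Everything here is an illustrative assumption rather than orchestrator-core’s code: the IndexRow fields follow the example record above but omit entity_type and the embedding, the type names are made up, and the real indexer may detect changes differently (for example with content hashes).

```python
from dataclasses import dataclass
from datetime import date
from typing import Any


@dataclass(frozen=True)
class IndexRow:
    entity_id: str
    path: str  # dotted path, stored as ltree in PostgreSQL
    value: str
    value_type: str


def value_type_of(value: Any) -> str:
    # bool must be checked before int: bool is a subclass of int in Python
    if isinstance(value, bool):
        return "BOOLEAN"
    if isinstance(value, (int, float)):
        return "NUMBER"
    if isinstance(value, date):  # also covers datetime, a subclass of date
        return "DATETIME"
    return "STRING"


def flatten(entity_id: str, obj: Any, prefix: str = "") -> list[IndexRow]:
    """Emit one row per leaf attribute, whatever the nesting depth."""
    rows: list[IndexRow] = []
    if isinstance(obj, dict):
        for key, child in obj.items():
            rows.extend(flatten(entity_id, child, f"{prefix}.{key}" if prefix else key))
    elif isinstance(obj, list):
        for i, child in enumerate(obj):
            rows.extend(flatten(entity_id, child, f"{prefix}.{i}"))
    elif obj is not None:
        if isinstance(obj, bool):
            text = str(obj).lower()
        elif isinstance(obj, date):
            text = obj.isoformat()
        else:
            text = str(obj)
        rows.append(IndexRow(entity_id, prefix, text, value_type_of(obj)))
    return rows


def diff(old: list[IndexRow], new: list[IndexRow]) -> tuple[set[IndexRow], set[str]]:
    """Rows to upsert (keyed on entity_id + path) and paths that disappeared."""
    old_set, new_set = set(old), set(new)
    upserts = new_set - old_set
    stale_paths = {r.path for r in old_set} - {r.path for r in new_set}
    return upserts, stale_paths


before = {"product": {"name": "Fiber", "interface": {"speed": "10Gbps", "enabled": True}}}
after = {"product": {"name": "Fiber", "interface": {"speed": "100Gbps"}}}
print(diff(flatten("sub-1", before), flatten("sub-1", after)))
```

Because the flattener never consults a schema, a product block that did not exist yesterday simply produces new paths today.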

Type-safe query DSL

Raw SQL generation by AI agents is dangerous. One hallucinated DROP TABLE and your day is ruined. Instead, we built a Pydantic-based Query DSL that compiles into validated SQL:

query = SearchQuery(
    search="amsterdam network",
    filters=[
        FilterPredicate(
            path="product.status",
            operator="eq",
            value="active"
        )
    ],
    sort_by="created_at",
    limit=25
)

The same Pydantic models serve as structured tool arguments for PydanticAI, allowing AI agents to construct queries through validated, constrained interfaces. The agent never touches SQL — it fills in a structured form, and the system compiles it safely.
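The post does not show the compiler itself, so here is a minimal, stdlib-only sketch of the idea, with a frozen dataclass standing in for the Pydantic model. Validation rejects anything outside an allow-list before SQL exists, only allow-listed operators are ever spliced into the SQL text, and every user-supplied value travels as a bound parameter. The Filter and compile_filters names, the search_index table, the eq/neq operator set, and the psycopg-style %s placeholders are all hypothetical; typed comparisons (dates, numbers) are left out.

```python
import re
from dataclasses import dataclass
from typing import Literal, get_args

Operator = Literal["eq", "neq"]  # hypothetical operator allow-list
SQL_OPERATORS = {"eq": "=", "neq": "<>"}
PATH_RE = re.compile(r"[A-Za-z0-9_]+(\.[A-Za-z0-9_]+)*")  # ltree-style labels


@dataclass(frozen=True)
class Filter:
    path: str
    operator: Operator
    value: str

    def __post_init__(self) -> None:
        # Dataclasses don't enforce type hints, so validate explicitly
        if self.operator not in get_args(Operator):
            raise ValueError(f"unsupported operator: {self.operator!r}")
        if not PATH_RE.fullmatch(self.path):
            raise ValueError(f"invalid path: {self.path!r}")


def compile_filters(filters: list[Filter]) -> tuple[str, list[str]]:
    """Compile validated filters into parameterized SQL over the EAV index."""
    if not filters:
        return "SELECT DISTINCT entity_id FROM search_index", []
    selects: list[str] = []
    params: list[str] = []
    for f in filters:
        selects.append(
            "SELECT entity_id FROM search_index "
            f"WHERE path = %s::ltree AND value {SQL_OPERATORS[f.operator]} %s"
        )
        params.extend([f.path, f.value])
    # Each EAV row holds one attribute, so AND across attributes needs INTERSECT
    return " INTERSECT ".join(selects), params


sql, params = compile_filters([Filter("product.status", "eq", "active")])
print(sql)
print(params)
```

An agent that proposes operator="drop" or a path containing "; DROP TABLE" fails at construction time, long before any SQL is produced.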

Why this matters

For network operators

SURF manages network infrastructure for 100+ Dutch educational and research institutions. Having fast, intelligent search across all subscriptions, products, and workflows means operators can find what they need in seconds instead of clicking through nested pages.

For the open-source community

This isn’t a proprietary solution — it’s going into orchestrator-core 5.0 as an open-source feature. Any organization running orchestrator-core gets hybrid search out of the box.

For AI-driven operations

Combining the query DSL with AI agent integration paves the way for operators to ask questions in natural language, such as “Show me all 10Gbps interfaces in the Amsterdam region that were provisioned last month,” and get accurate, validated results.

Small team, big impact

One of the things we’re most proud of is how this was achieved. This wasn’t a massive project with dozens of engineers. It was a small, focused team working closely together — combining SURF’s deep domain expertise in network orchestration with our experience in AI, search, and Python backend development.

The result? A production-ready feature that adds genuine value, built efficiently and shipping in a major release.

Technical deep dive

For the full technical details — including the EAV indexing implementation, retriever routing strategy, keyset pagination, and AI agent architecture — check out the excellent write-up by Tim Fröhlich: Building a Schema-Agnostic Hybrid Search System in PostgreSQL.

What’s next

With hybrid search landing in orchestrator-core 5.0, we’re looking at:

Enhanced AI agent capabilities — natural language queries for network operations
Aggregation support — analytics and reporting on dynamic schemas
Cross-entity search — unified search across subscriptions, workflows, and processes
Performance optimization — scaling to millions of indexed attributes

Want to learn more about how we build search and AI solutions for complex systems? Get in touch — we’d love to hear about your challenges.