For organizations that host their own models or use private AI deployments, RelayHub supports connecting to custom LLM endpoints. This routes all AI traffic through your infrastructure with zero leakage to external providers.

Supported Endpoint Types

Azure AI Foundry

Connect to models deployed through Azure AI Foundry (formerly Azure AI Studio). RelayHub communicates with Azure’s OpenAI-compatible API surface, so any model available through your Azure deployment works seamlessly.

Requirements:
  • Azure AI Foundry endpoint URL (e.g., https://your-resource.openai.azure.com/)
  • API key or Azure AD credentials
  • Deployment name(s) for your models
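
The Azure OpenAI-compatible surface addresses each deployment by name in the URL path and authenticates with an api-key header. A minimal sketch of how a client such as RelayHub might build such a request; the resource name, deployment name, and api-version below are placeholder values:

```python
def azure_chat_url(endpoint: str, deployment: str, api_version: str) -> str:
    """Build the chat-completions URL for an Azure OpenAI-compatible deployment."""
    return (
        f"{endpoint.rstrip('/')}/openai/deployments/"
        f"{deployment}/chat/completions?api-version={api_version}"
    )

# Placeholder values for illustration only.
url = azure_chat_url("https://your-resource.openai.azure.com/",
                     "gpt-4o-prod", "2024-02-01")
# Authentication uses an "api-key" header (or an Azure AD bearer token).
headers = {"api-key": "<your-key>", "Content-Type": "application/json"}
```

The deployment name, not the underlying model ID, is what appears in the URL, which is why Azure setups often register models under their deployment names.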

Adding a Custom Endpoint

1. Open Provider Hub
   Navigate to Provider Hub from the left sidebar (Admin only).

2. Click Add Custom Provider
   At the bottom of the Provider Hub page, click Add Custom Provider.

3. Configure the connection
   Fill in the following fields:
   • Provider Name — a label for this endpoint (e.g., “Azure Production”)
   • Base URL — the full URL to your API endpoint
   • API Key — your authentication credential (encrypted at rest)
   • Provider Type — select Azure AI Foundry or OpenAI-Compatible

4. Discover models
   Click Discover Models. RelayHub queries your endpoint’s model listing API and displays all available models. Select which models your team should have access to.

5. Enable the provider
   Toggle the provider to Enabled. It will now take priority over BYOK keys and platform keys for all LLM traffic.

Zero Leakage Guarantee

When a custom provider is active with platform fallback disabled (the default), every LLM call in the system routes through your endpoint:
  • Chat conversations (standard and dual chat)
  • Embedding generation for document indexing
  • Background workers (memory crystals, knowledge extraction)
  • Vision and image analysis tasks
  • Utility tasks (summarization, classification)
If your custom endpoint goes down, LLM calls will return errors rather than silently falling back to external providers. This is by design — it ensures no data ever leaves your infrastructure without your knowledge.
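
A minimal sketch of the fail-closed behavior described above. This is illustrative only, not RelayHub’s actual implementation; the names are invented for the example:

```python
class EndpointDown(Exception):
    """Raised when the custom endpoint is unreachable and fallback is disabled."""


def route_llm_call(custom_available: bool, allow_fallback: bool) -> str:
    """Pick a destination for an LLM call under fail-closed routing."""
    if custom_available:
        return "custom-endpoint"
    if allow_fallback:
        # Only reached when platform fallback has been explicitly enabled.
        return "external-provider"
    # Fail closed: surface an error rather than silently leaving your infrastructure.
    raise EndpointDown("custom endpoint unreachable; refusing external fallback")
```

With fallback disabled (the default), an outage produces visible errors, which is the signal that no traffic is being rerouted externally.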

Model Discovery

RelayHub queries your endpoint’s /v1/models route to discover available models. Discovered models appear in a selection list where you can:
  • Enable or disable individual models for your team
  • Set a default model that is pre-selected in new chat sessions
  • Label models with friendly names (e.g., “Fast” or “High Quality”)
If your endpoint does not support the /v1/models listing route, you can manually add models by name. This is common with some Azure deployments where the model ID matches the deployment name.
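
The OpenAI-compatible /v1/models route returns a JSON object with a "data" array of model entries, and discovery only needs the "id" of each. A small sketch of parsing such a response; the sample payload and model names are invented:

```python
import json


def extract_model_ids(payload: str) -> list:
    """Pull model IDs out of an OpenAI-style /v1/models response."""
    body = json.loads(payload)
    # Endpoints that lack the listing route return no "data" key; yield nothing.
    return [model["id"] for model in body.get("data", [])]


sample = '{"object": "list", "data": [{"id": "llama-3-70b"}, {"id": "mistral-7b"}]}'
extract_model_ids(sample)  # ['llama-3-70b', 'mistral-7b']
```

When the route is missing, the list is empty, which corresponds to the manual-entry path described above.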

Resolution Order

Custom endpoints sit at the top of RelayHub’s provider resolution chain:
  1. Custom Provider (highest priority)
  2. BYOK Key
  3. Platform Key (lowest priority)
This means adding a custom provider immediately overrides any BYOK keys you may have configured. To temporarily bypass the custom provider, disable it from the Provider Hub and traffic will fall through to the next available key.
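
The three-tier order above amounts to a first-match walk down the chain. A sketch under that assumption, with invented names:

```python
def resolve_provider(custom_enabled, byok_key, platform_key):
    """Return the first available key source, in RelayHub's priority order."""
    if custom_enabled:
        return "custom"          # highest priority
    if byok_key:
        return "byok"
    if platform_key:
        return "platform"        # lowest priority
    return None                  # no key source configured
```

Disabling the custom provider makes the first check fail, so calls fall through to BYOK and then platform keys, matching the bypass behavior described above.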
Test your custom endpoint with a few chat messages before rolling it out to your full team. You can enable it for just your admin account first by keeping it disabled at the organization level and testing via the API directly.
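
One way to smoke-test an OpenAI-compatible endpoint directly, before enabling it for the organization. The URL, model name, and key are placeholders; the request is built but only sent if you uncomment the last lines:

```python
import json
import urllib.request

# Placeholder endpoint, model, and key for illustration.
req = urllib.request.Request(
    "https://llm.internal.example.com/v1/chat/completions",
    data=json.dumps({
        "model": "llama-3-70b",
        "messages": [{"role": "user", "content": "ping"}],
    }).encode(),
    headers={"Authorization": "Bearer <your-key>",
             "Content-Type": "application/json"},
    method="POST",
)

# Uncomment to actually send the request:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

A successful reply here confirms the base URL, credentials, and model name before any team traffic depends on them.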