Custom Providers

You can integrate any custom model provider that conforms to the OpenAI API format. To do this, make sure your endpoint follows the expected format, then connect and test it as described in the steps below.

OpenAI Format Compatibility

We expect your endpoint API to follow the OpenAI API format for chat completions. Concretely, this means that:

  • When integrating an endpoint at <url>, a chat completions endpoint should exist at <url>/chat/completions
  • The input format follows the OpenAI API format specification. For example, the following is a valid request:
curl <url>/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $API_KEY" \
  -d '{
    "messages": [
      {
        "role": "user",
        "content": "Hello!"
      }
    ]
  }'

📘 OpenAI Compatibility

Even though we expect the endpoint to be compatible with the OpenAI API format, we do not expect it to support all of the parameters.

The parameters currently used by LatticeFlow include:

  • max_completion_tokens: An upper bound for the number of tokens that can be generated for a completion.
  • temperature: The sampling temperature, between 0 and 1.
  • n: How many chat completion choices to generate for each input message.
  • top_p: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass.

If not all of these are supported, the integration will still work, but some functionality might be disabled.
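For example, a request that includes these parameters might look as follows (the parameter values shown are purely illustrative):

curl <url>/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $API_KEY" \
  -d '{
    "messages": [
      {
        "role": "user",
        "content": "Hello!"
      }
    ],
    "max_completion_tokens": 256,
    "temperature": 0.7,
    "n": 1,
    "top_p": 0.9
  }'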

  • The output format follows the OpenAI API format specification. For example, the following is a valid output:
{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1677652288,
  "model": "gpt-4o-mini",
  "system_fingerprint": "fp_44709d6fcb",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "\n\nHello there, how may I assist you today?",
    },
    "logprobs": null,
    "finish_reason": "stop"
  }],
  "usage": {
    "prompt_tokens": 9,
    "completion_tokens": 12,
    "total_tokens": 21,
    "completion_tokens_details": {
      "reasoning_tokens": 0,
      "accepted_prediction_tokens": 0,
      "rejected_prediction_tokens": 0
    }
  }
}
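
As a quick sanity check before connecting your endpoint, you can verify that it returns a response of this shape, for example with curl and jq (this assumes jq is installed; it simply extracts the assistant reply from the first choice):

curl -s <url>/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $API_KEY" \
  -d '{"messages": [{"role": "user", "content": "Hello!"}]}' \
  | jq -r '.choices[0].message.content'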

Step 1. Connect a New Model

To connect your custom endpoint, the key things to provide are:

  • url at which the endpoint is accessible.
  • api_key used to authenticate. This is passed in the request header as Authorization: Bearer $API_KEY.
  • key of the model to be used to generate the response.
  • (optional) custom_headers to include in model inference requests. These are passed as additional headers in the model inference POST request.

Example

To connect an OpenAI model, the following API call is used:

curl --request POST \
     --url http://127.0.0.1:5005/api/model-providers/model_endpoints/models \
     --header "X-LatticeFlow-API-Key: $LF_API_KEY" \
     --header 'accept: application/json' \
     --header 'content-type: application/json' \
     --data '
{
  "modality": "text",
  "task": "chat_completion",
  "key": "gpt-4.5-preview",
  "api_key": "$OPENAI_API_KEY",
  "model_adapter_key": "openai",
  "url": "https://api.openai.com/v1",
  "name": "gpt-4.5-preview"
}
'
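
If your custom endpoint additionally requires custom headers, they can be supplied via the custom_headers field described above. The call below is a sketch only: the URL, model key, header name, and the reuse of the "openai" adapter key are illustrative assumptions rather than prescribed values, and custom_headers is assumed to be a JSON object mapping header names to values; consult the create model API endpoint specification for the exact schema.

curl --request POST \
     --url http://127.0.0.1:5005/api/model-providers/model_endpoints/models \
     --header "X-LatticeFlow-API-Key: $LF_API_KEY" \
     --header 'accept: application/json' \
     --header 'content-type: application/json' \
     --data '
{
  "modality": "text",
  "task": "chat_completion",
  "key": "my-custom-model",
  "api_key": "$CUSTOM_API_KEY",
  "model_adapter_key": "openai",
  "url": "https://models.example.com/v1",
  "name": "my-custom-model",
  "custom_headers": {
    "X-Tenant-Id": "my-tenant"
  }
}
'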

For full documentation, please consult the create model API endpoint specification.

Step 2. Test the Integration

Testing existing integrations is supported using the test inference API endpoint.

Example

To test the integration above, the following call is used:

curl --request GET \
     --url http://127.0.0.1:5005/api/model-providers/model_endpoints/models/gpt-4.5-preview/test \
     --header "X-LatticeFlow-API-Key: $LF_API_KEY" \
     --header 'accept: application/json'

If everything works fine, you should see a response like:

{"choices": [{"message": {"role": "assistant", "content": "Hello! How can I help you today?"}}]}