End-to-End Examples
Complete examples for every endpoint and common use cases.
Chat with OpenAI GPT-4o
export OPENAI_API_KEY=sk-...
./mvnw -pl gateway-server spring-boot:run &
curl -s -X POST http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4o",
"messages": [
{"role": "system", "content": "You are a concise assistant."},
{"role": "user", "content": "Explain virtual threads in one sentence."}
],
"max_tokens": 80
}'
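To pull just the assistant's reply out of the JSON response, a small helper is handy. This sketch assumes the standard OpenAI-style response shape (`choices[0].message.content`), which is what the gateway returns:

```python
import json

def reply_text(response_json: str) -> str:
    """Extract the assistant's message from an OpenAI-style chat completion."""
    data = json.loads(response_json)
    return data["choices"][0]["message"]["content"]

# Example with a minimal response body:
sample = '{"choices": [{"message": {"role": "assistant", "content": "Virtual threads are lightweight."}}]}'
print(reply_text(sample))
# → Virtual threads are lightweight.
```

Pipe the curl output into a script like this, or use `jq -r '.choices[0].message.content'` if you prefer to stay in the shell.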
Chat with Anthropic Claude
export ANTHROPIC_API_KEY=sk-ant-...
curl -s -X POST http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "claude-sonnet-4-5",
"messages": [
{"role": "user", "content": "What is 2 + 2?"}
]
}'
Chat with Google Gemini
export GEMINI_API_KEY=AIza...
curl -s -X POST http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "gemini-2.0-flash",
"messages": [
{"role": "user", "content": "What is the capital of Japan?"}
],
"max_tokens": 256
}'
Chat with AWS Bedrock
Any model hosted on Bedrock can be used. Prefix the Bedrock model ID with bedrock/:
export BEDROCK_ENABLED=true
export AWS_ACCESS_KEY_ID=AKIA...
export AWS_SECRET_ACCESS_KEY=...
export AWS_REGION=us-east-1
curl -s -X POST http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "bedrock/anthropic.claude-3-sonnet-20240229-v1:0",
"messages": [
{"role": "user", "content": "What is 2 + 2?"}
]
}'
Other Bedrock model examples:
bedrock/amazon.titan-text-express-v1
bedrock/meta.llama3-70b-instruct-v1:0
bedrock/mistral.mistral-7b-instruct-v0:2
Local Ollama (llama3.2)
# Start Ollama and pull a model first:
# ollama serve
# ollama pull llama3.2
OLLAMA_ENABLED=true ./mvnw -pl gateway-server spring-boot:run &
curl -s -X POST http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "ollama/llama3.2",
"messages": [
{"role": "user", "content": "Hello from the gateway!"}
]
}'
Mock Provider (Testing / CI)
MOCK_PROVIDER_ENABLED=true ./mvnw -pl gateway-server spring-boot:run &
curl -s -X POST http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "mock/test-model",
"messages": [
{"role": "user", "content": "Hello from mock!"}
]
}'
Generate an Embedding
curl -s -X POST http://localhost:8080/v1/embeddings \
-H "Content-Type: application/json" \
-d '{
"model": "text-embedding-ada-002",
"input": "Dvara AI Gateway"
}' | python3 -c "
import sys, json
d = json.load(sys.stdin)
vec = d['data'][0]['embedding']
print(f'Dimensions: {len(vec)}')
print(f'First 5 values: {vec[:5]}')
"
Streaming Chat (SSE)
curl -s -N -X POST http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4o",
"messages": [
{"role": "user", "content": "Count to five slowly."}
],
"stream": true
}'
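The stream arrives as Server-Sent Events in the OpenAI format: each `data:` line carries a JSON chunk whose `delta` holds a piece of the reply, and the stream ends with `data: [DONE]`. A minimal Python sketch for reassembling the text, assuming that chunk shape:

```python
import json

def assemble_sse(lines):
    """Concatenate content deltas from OpenAI-style SSE lines."""
    text = []
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip blank keep-alive lines
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        delta = chunk["choices"][0]["delta"]
        text.append(delta.get("content", ""))  # first chunk may carry only the role
    return "".join(text)

stream = [
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    'data: {"choices": [{"delta": {"content": "One, "}}]}',
    'data: {"choices": [{"delta": {"content": "two."}}]}',
    "data: [DONE]",
]
print(assemble_sse(stream))
# → One, two.
```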
Structured Output — JSON Object
curl -s -X POST http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4o",
"messages": [
{"role": "user", "content": "List 3 programming languages as a JSON array."}
],
"response_format": {"type": "json_object"}
}'
Structured Output — JSON Schema
curl -s -X POST http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4o",
"messages": [
{"role": "user", "content": "Extract: John is 30 years old and lives in Paris."}
],
"response_format": {
"type": "json_schema",
"json_schema": {
"name": "person",
"schema": {
"type": "object",
"properties": {
"name": {"type": "string"},
"age": {"type": "integer"},
"city": {"type": "string"}
},
"required": ["name", "age", "city"]
},
"strict": true
}
}
}'
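With `json_schema`, the message content should be a JSON document matching the schema, so it can be parsed directly. A sketch using the field names from the schema above (the sample content is illustrative, not a captured response):

```python
import json

# message.content from the response (illustrative sample)
content = '{"name": "John", "age": 30, "city": "Paris"}'
person = json.loads(content)

# All required fields from the schema should be present
assert set(person) == {"name", "age", "city"}
print(person["name"], person["age"], person["city"])
# → John 30 Paris
```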
Structured Output with Anthropic
The same response_format works transparently — the gateway translates to Anthropic's tool-use API:
curl -s -X POST http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "claude-sonnet-4-5",
"messages": [
{"role": "user", "content": "Extract: Jane is 25 and works as an engineer."}
],
"response_format": {
"type": "json_schema",
"json_schema": {
"name": "person",
"schema": {
"type": "object",
"properties": {
"name": {"type": "string"},
"age": {"type": "integer"},
"occupation": {"type": "string"}
},
"required": ["name", "age", "occupation"]
},
"strict": true
}
}
}'
Check for the X-Gateway-Strict-Downgraded: true header in the response — Anthropic does not natively enforce strict schemas.
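To surface the downgrade flag programmatically, inspect the response headers case-insensitively (HTTP header names are not case-sensitive). The helper below is illustrative, not part of the gateway:

```python
def strict_downgraded(headers: dict) -> bool:
    """True if the gateway reported that strict schema enforcement was downgraded."""
    for name, value in headers.items():
        if name.lower() == "x-gateway-strict-downgraded":
            return value.lower() == "true"
    return False

print(strict_downgraded({"Content-Type": "application/json",
                         "X-Gateway-Strict-Downgraded": "true"}))
# → True
```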
Python — OpenAI SDK (Drop-In Replacement)
from openai import OpenAI
client = OpenAI(
api_key="sk-my-app-key",
base_url="http://localhost:8080/v1"
)
# Switch providers by changing only the model
for model in ["gpt-4o", "claude-sonnet-4-5", "gemini-2.0-flash"]:
response = client.chat.completions.create(
model=model,
messages=[{"role": "user", "content": "Hello from Dvara!"}]
)
print(f"{model}: {response.choices[0].message.content}")
Node.js — OpenAI SDK (Drop-In Replacement)
import OpenAI from "openai";
const client = new OpenAI({
apiKey: "sk-my-app-key",
baseURL: "http://localhost:8080/v1",
});
// Switch providers by changing only the model
for (const model of ["gpt-4o", "claude-sonnet-4-5", "gemini-2.0-flash"]) {
const response = await client.chat.completions.create({
model,
messages: [{ role: "user", content: "Hello from Dvara!" }],
});
console.log(`${model}: ${response.choices[0].message.content}`);
}
Java — RestClient
import org.springframework.http.MediaType;
import org.springframework.web.client.RestClient;

RestClient client = RestClient.create();
String response = client.post()
.uri("http://localhost:8080/v1/chat/completions")
.contentType(MediaType.APPLICATION_JSON)
.body("""
{
"model": "gpt-4o",
"messages": [
{"role": "user", "content": "Hello from Java!"}
]
}
""")
.retrieve()
.body(String.class);
System.out.println(response);
Multi-Provider Failover Demo
Configure a route with two providers so requests can fail over from OpenAI to the mock provider, then simulate a failure:
# application.yml
gateway:
routes:
- id: failover-demo
model-pattern: "gpt*"
strategy: round-robin
providers:
- provider: openai
- provider: mock # fallback to mock if OpenAI fails
providers:
mock:
enabled: true
# Start with both providers
OPENAI_API_KEY=sk-... MOCK_PROVIDER_ENABLED=true ./mvnw -pl gateway-server spring-boot:run &
# Normal request (goes to OpenAI)
curl -s http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model": "gpt-4o", "messages": [{"role": "user", "content": "Hello!"}]}'
# If OpenAI is down, the gateway automatically fails over to mock
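The fallback behavior can be pictured as trying providers in route order and moving on when a call fails. This is only a sketch of the idea, not the gateway's actual implementation:

```python
def call_with_failover(providers, request):
    """Try each provider in order; return the first successful response."""
    errors = []
    for provider in providers:
        try:
            return provider(request)
        except Exception as exc:  # provider unreachable or returned an error
            errors.append(exc)
    raise RuntimeError(f"all providers failed: {errors}")

def openai(req):  # simulate an OpenAI outage
    raise ConnectionError("OpenAI is down")

def mock(req):
    return {"provider": "mock", "content": "Hello!"}

print(call_with_failover([openai, mock], {"model": "gpt-4o"}))
# → {'provider': 'mock', 'content': 'Hello!'}
```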
Request Tracing
Pass a custom trace ID to correlate requests across your system:
curl -s -X POST http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-H "X-Trace-ID: my-request-12345" \
-d '{
"model": "gpt-4o",
"messages": [{"role": "user", "content": "Hello!"}]
}' -i 2>/dev/null | grep X-Trace-ID
# → X-Trace-ID: my-request-12345