End-to-End Examples
Complete examples for every endpoint and common use cases.
Chat with OpenAI GPT-4o
export OPENAI_API_KEY=sk-...
./mvnw -pl gateway-server spring-boot:run &
curl -s -X POST http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4o",
"messages": [
{"role": "system", "content": "You are a concise assistant."},
{"role": "user", "content": "Explain virtual threads in one sentence."}
],
"max_tokens": 80
}'
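To pull just the assistant's reply out of the JSON response, a small helper is handy. This sketch assumes the standard OpenAI-style response shape (`choices[0].message.content`), which is what the gateway returns:

```python
import json

def reply_text(response_json: str) -> str:
    """Extract the assistant's message from an OpenAI-style chat completion."""
    data = json.loads(response_json)
    return data["choices"][0]["message"]["content"]

# Example with a minimal response body:
sample = '{"choices": [{"message": {"role": "assistant", "content": "Virtual threads are lightweight."}}]}'
print(reply_text(sample))
# → Virtual threads are lightweight.
```

Pipe the curl output into a script like this, or use `jq -r '.choices[0].message.content'` if you prefer to stay in the shell.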
Chat with Anthropic Claude
export ANTHROPIC_API_KEY=sk-ant-...
curl -s -X POST http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "claude-sonnet-4-5",
"messages": [
{"role": "user", "content": "What is 2 + 2?"}
]
}'
Chat with Google Gemini
export GEMINI_API_KEY=AIza...
curl -s -X POST http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "gemini-2.0-flash",
"messages": [
{"role": "user", "content": "What is the capital of Japan?"}
],
"max_tokens": 256
}'
Chat with AWS Bedrock
Any model hosted on Bedrock can be used. Prefix the Bedrock model ID with bedrock/:
export BEDROCK_ENABLED=true
export AWS_ACCESS_KEY_ID=AKIA...
export AWS_SECRET_ACCESS_KEY=...
export AWS_REGION=us-east-1
curl -s -X POST http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "bedrock/anthropic.claude-3-sonnet-20240229-v1:0",
"messages": [
{"role": "user", "content": "What is 2 + 2?"}
]
}'
Other Bedrock model examples:
bedrock/amazon.titan-text-express-v1
bedrock/meta.llama3-70b-instruct-v1:0
bedrock/mistral.mistral-7b-instruct-v0:2
Local Ollama (llama3.2)
# Start Ollama and pull a model first:
# ollama serve
# ollama pull llama3.2
OLLAMA_ENABLED=true ./mvnw -pl gateway-server spring-boot:run &
curl -s -X POST http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "ollama/llama3.2",
"messages": [
{"role": "user", "content": "Hello from the gateway!"}
]
}'
Mock Provider (Testing / CI)
MOCK_PROVIDER_ENABLED=true ./mvnw -pl gateway-server spring-boot:run &
curl -s -X POST http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "mock/test-model",
"messages": [
{"role": "user", "content": "Hello from mock!"}
]
}'
Generate an Embedding
curl -s -X POST http://localhost:8080/v1/embeddings \
-H "Content-Type: application/json" \
-d '{
"model": "text-embedding-ada-002",
"input": "Dvara AI Gateway"
}' | python3 -c "
import sys, json
d = json.load(sys.stdin)
vec = d['data'][0]['embedding']
print(f'Dimensions: {len(vec)}')
print(f'First 5 values: {vec[:5]}')
"
Streaming Chat (SSE)
curl -s -N -X POST http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4o",
"messages": [
{"role": "user", "content": "Count to five slowly."}
],
"stream": true
}'
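The stream arrives as Server-Sent Events in the OpenAI format: each `data:` line carries a JSON chunk whose `delta` holds a piece of the reply, and the stream ends with `data: [DONE]`. A minimal Python sketch for reassembling the text, assuming that chunk shape:

```python
import json

def assemble_sse(lines):
    """Concatenate content deltas from OpenAI-style SSE lines."""
    text = []
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip blank keep-alive lines
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        delta = chunk["choices"][0]["delta"]
        text.append(delta.get("content", ""))  # first chunk may carry only the role
    return "".join(text)

stream = [
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    'data: {"choices": [{"delta": {"content": "One, "}}]}',
    'data: {"choices": [{"delta": {"content": "two."}}]}',
    "data: [DONE]",
]
print(assemble_sse(stream))
# → One, two.
```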
Structured Output — JSON Object
curl -s -X POST http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4o",
"messages": [
{"role": "user", "content": "List 3 programming languages as a JSON array."}
],
"response_format": {"type": "json_object"}
}'
Structured Output — JSON Schema
curl -s -X POST http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4o",
"messages": [
{"role": "user", "content": "Extract: John is 30 years old and lives in Paris."}
],
"response_format": {
"type": "json_schema",
"json_schema": {
"name": "person",
"schema": {
"type": "object",
"properties": {
"name": {"type": "string"},
"age": {"type": "integer"},
"city": {"type": "string"}
},
"required": ["name", "age", "city"]
},
"strict": true
}
}
}'
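With `json_schema`, the message content should be a JSON document matching the schema, so it can be parsed directly. A sketch using the field names from the schema above (the sample content is illustrative, not a captured response):

```python
import json

# message.content from the response (illustrative sample)
content = '{"name": "John", "age": 30, "city": "Paris"}'
person = json.loads(content)

# All required fields from the schema should be present
assert set(person) == {"name", "age", "city"}
print(person["name"], person["age"], person["city"])
# → John 30 Paris
```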
Structured Output with Anthropic
The same response_format works transparently — the gateway translates to Anthropic's tool-use API:
curl -s -X POST http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "claude-sonnet-4-5",
"messages": [
{"role": "user", "content": "Extract: Jane is 25 and works as an engineer."}
],
"response_format": {
"type": "json_schema",
"json_schema": {
"name": "person",
"schema": {
"type": "object",
"properties": {
"name": {"type": "string"},
"age": {"type": "integer"},
"occupation": {"type": "string"}
},
"required": ["name", "age", "occupation"]
},
"strict": true
}
}
}'
Check for the X-Gateway-Strict-Downgraded: true header in the response — Anthropic does not natively enforce strict schemas.
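To surface the downgrade flag programmatically, inspect the response headers case-insensitively (HTTP header names are not case-sensitive). The helper below is illustrative, not part of the gateway:

```python
def strict_downgraded(headers: dict) -> bool:
    """True if the gateway reported that strict schema enforcement was downgraded."""
    for name, value in headers.items():
        if name.lower() == "x-gateway-strict-downgraded":
            return value.lower() == "true"
    return False

print(strict_downgraded({"Content-Type": "application/json",
                         "X-Gateway-Strict-Downgraded": "true"}))
# → True
```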
Python — OpenAI SDK (Drop-In Replacement)
from openai import OpenAI
client = OpenAI(
api_key="sk-my-app-key",
base_url="http://localhost:8080/v1"
)
# Switch providers by changing only the model
for model in ["gpt-4o", "claude-sonnet-4-5", "gemini-2.0-flash"]:
response = client.chat.completions.create(
model=model,
messages=[{"role": "user", "content": "Hello from Dvara!"}]
)
print(f"{model}: {response.choices[0].message.content}")
Node.js — OpenAI SDK (Drop-In Replacement)
import OpenAI from "openai";
const client = new OpenAI({
apiKey: "sk-my-app-key",
baseURL: "http://localhost:8080/v1",
});
// Switch providers by changing only the model
for (const model of ["gpt-4o", "claude-sonnet-4-5", "gemini-2.0-flash"]) {
const response = await client.chat.completions.create({
model,
messages: [{ role: "user", content: "Hello from Dvara!" }],
});
console.log(`${model}: ${response.choices[0].message.content}`);
}
Java — RestClient
import org.springframework.http.MediaType;
import org.springframework.web.client.RestClient;

RestClient client = RestClient.create();
String response = client.post()
.uri("http://localhost:8080/v1/chat/completions")
.contentType(MediaType.APPLICATION_JSON)
.body("""
{
"model": "gpt-4o",
"messages": [
{"role": "user", "content": "Hello from Java!"}
]
}
""")
.retrieve()
.body(String.class);
System.out.println(response);
Multi-Provider Failover Demo
Configure a route with two providers so requests can fail over from OpenAI to the mock provider, then simulate a failure:
# application.yml
gateway:
routes:
- id: failover-demo
model-pattern: "gpt*"
strategy: round-robin
providers:
- provider: openai
- provider: mock # fallback to mock if OpenAI fails
providers:
mock:
enabled: true
# Start with both providers
OPENAI_API_KEY=sk-... MOCK_PROVIDER_ENABLED=true ./mvnw -pl gateway-server spring-boot:run &
# Normal request (goes to OpenAI)
curl -s http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model": "gpt-4o", "messages": [{"role": "user", "content": "Hello!"}]}'
# If OpenAI is down, the gateway automatically fails over to mock
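The fallback behavior can be pictured as trying providers in route order and moving on when a call fails. This is only a sketch of the idea, not the gateway's actual implementation:

```python
def call_with_failover(providers, request):
    """Try each provider in order; return the first successful response."""
    errors = []
    for provider in providers:
        try:
            return provider(request)
        except Exception as exc:  # provider unreachable or returned an error
            errors.append(exc)
    raise RuntimeError(f"all providers failed: {errors}")

def openai(req):  # simulate an OpenAI outage
    raise ConnectionError("OpenAI is down")

def mock(req):
    return {"provider": "mock", "content": "Hello!"}

print(call_with_failover([openai, mock], {"model": "gpt-4o"}))
# → {'provider': 'mock', 'content': 'Hello!'}
```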
Request Tracing
Pass a custom trace ID to correlate requests across your system:
curl -s -X POST http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-H "X-Trace-ID: my-request-12345" \
-d '{
"model": "gpt-4o",
"messages": [{"role": "user", "content": "Hello!"}]
}' -i 2>/dev/null | grep X-Trace-ID
# → X-Trace-ID: my-request-12345