[Feature]: Support for Image Generation #950

@Pratham-Mishra04

Description

Add native image generation capabilities to Bifrost, enabling unified access to image generation models (DALL-E, Stable Diffusion, Midjourney API, etc.) through the existing gateway infrastructure with full streaming support, semantic caching, and UI rendering.

As AI applications increasingly combine text and image generation, Bifrost should provide a unified interface for image generation models with the same benefits it offers for LLMs:

  • Provider abstraction - Switch between DALL-E, Stable Diffusion, etc. seamlessly
  • Fallback support - Automatic failover between image providers
  • Observability - Logging, metrics, and cost tracking for image generation
  • Caching - Semantic cache for repeated image prompts
  • Streaming - Progressive image loading for better UX

Scope

In Scope

  • New /v1/images/generations endpoint (OpenAI-compatible)
  • Image generation via Chat Completion API (tool use pattern)
  • Image generation via Responses API (native support)
  • Streaming image delivery (base64 chunks)
  • Semantic caching for image generation
  • UI components for image rendering
  • Provider implementations: OpenAI DALL-E, Azure DALL-E

Out of Scope (Future Work)

  • Image editing (/v1/images/edits)
  • Image variations (/v1/images/variations)
  • Video generation
  • Additional providers (Stability AI, Midjourney)

Technical Design

Note: The snippets below are illustrative sketches of the intended design, not final implementations.

1. Schema Definitions

New File: core/schemas/images.go

package schemas

// Request Types
const (
    ImageGenerationRequest       RequestType = "image_generation"
    ImageGenerationStreamRequest RequestType = "image_generation_stream"
)

// BifrostImageGenerationRequest represents an image generation request
type BifrostImageGenerationRequest struct {
    Provider       ModelProvider              `json:"provider"`
    Model          string                     `json:"model"`
    Input          *ImageGenerationInput      `json:"input"`
    Params         *ImageGenerationParameters `json:"params,omitempty"`
    Fallbacks      []Fallback                 `json:"fallbacks,omitempty"`
    RawRequestBody []byte                     `json:"-"`
}

type ImageGenerationInput struct {
    Prompt string `json:"prompt"`
}

type ImageGenerationParameters struct {
    N              *int                   `json:"n,omitempty"`               // Number of images (1-10; dall-e-3 accepts only 1)
    Size           *string                `json:"size,omitempty"`            // dall-e-2: "256x256", "512x512", "1024x1024"; dall-e-3: "1024x1024", "1792x1024", "1024x1792"
    Quality        *string                `json:"quality,omitempty"`         // "standard", "hd" (dall-e-3 only)
    Style          *string                `json:"style,omitempty"`           // "natural", "vivid" (dall-e-3 only)
    ResponseFormat *string                `json:"response_format,omitempty"` // "url", "b64_json"
    User           *string                `json:"user,omitempty"`
    ExtraParams    map[string]interface{} `json:"extra_params,omitempty"`
}

// BifrostImageGenerationResponse represents the response
type BifrostImageGenerationResponse struct {
    ID          string                     `json:"id"`
    Created     int64                      `json:"created"`
    Model       string                     `json:"model"`
    Data        []ImageData                `json:"data"`
    Usage       *ImageUsage                `json:"usage,omitempty"`
    ExtraFields BifrostResponseExtraFields `json:"extra_fields,omitempty"`
}

type ImageData struct {
    URL           string `json:"url,omitempty"`
    B64JSON       string `json:"b64_json,omitempty"`
    RevisedPrompt string `json:"revised_prompt,omitempty"`
    Index         int    `json:"index"`
}

type ImageUsage struct {
    PromptTokens int `json:"prompt_tokens"`
    TotalTokens  int `json:"total_tokens"`
}

// Streaming Response
type BifrostImageStreamResponse struct {
    ID            string        `json:"id"`
    Type          string        `json:"type"`                     // "image.chunk", "image.complete", "error"
    Index         int           `json:"index"`                    // Which image (0-N)
    ChunkIndex    int           `json:"chunk_index"`              // Chunk order within image
    PartialB64    string        `json:"partial_b64,omitempty"`    // Base64 chunk
    RevisedPrompt string        `json:"revised_prompt,omitempty"` // Sent with "image.complete"
    Usage         *ImageUsage   `json:"usage,omitempty"`          // Sent with "image.complete"
    Error         *BifrostError `json:"error,omitempty"`
}
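
The optional parameters above are pointers so that an unset field can be told apart from a zero value and dropped from the JSON sent upstream. A minimal sketch of that behaviour, using a trimmed copy of ImageGenerationParameters; the ptr and encodeParams helpers are hypothetical and exist only for this illustration:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ImageGenerationParameters is a trimmed copy of the struct proposed above.
type ImageGenerationParameters struct {
	N       *int    `json:"n,omitempty"`
	Size    *string `json:"size,omitempty"`
	Quality *string `json:"quality,omitempty"`
}

// ptr is a hypothetical helper that returns a pointer to any value.
func ptr[T any](v T) *T { return &v }

// encodeParams marshals the parameters; nil pointers are omitted entirely.
func encodeParams(p ImageGenerationParameters) (string, error) {
	b, err := json.Marshal(p)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	out, err := encodeParams(ImageGenerationParameters{N: ptr(1), Size: ptr("1024x1024")})
	if err != nil {
		panic(err)
	}
	fmt.Println(out) // {"n":1,"size":"1024x1024"}
}
```

Quality is nil here, so it does not appear in the output at all, which lets the provider apply its own default.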

2. Provider Interface Extension

Update: core/schemas/provider.go

type Provider interface {
    // ... existing methods ...

    // Image Generation
    ImageGeneration(ctx context.Context, key Key, request *BifrostImageGenerationRequest) (
        *BifrostImageGenerationResponse, *BifrostError)
    ImageGenerationStream(ctx context.Context, postHookRunner PostHookRunner, key Key,
        request *BifrostImageGenerationRequest) (chan *BifrostStream, *BifrostError)
}

3. Core Bifrost Methods

Update: core/bifrost.go

// Add public methods
func (b *Bifrost) ImageGenerationRequest(ctx context.Context,
    req *schemas.BifrostImageGenerationRequest) (*schemas.BifrostImageGenerationResponse, *schemas.BifrostError)

func (b *Bifrost) ImageGenerationStreamRequest(ctx context.Context,
    req *schemas.BifrostImageGenerationRequest) (chan *schemas.BifrostStream, *schemas.BifrostError)

// Update handleRequest switch statement
case schemas.ImageGenerationRequest:
    resp, err := provider.ImageGeneration(req.Context, key, req.BifrostRequest.ImageGenerationRequest)
    if err != nil {
        return nil, err
    }
    response.ImageGenerationResponse = resp

4. HTTP Transport Layer

Update: transports/bifrost-http/handlers/inference.go

// Add route
r.POST("/v1/images/generations", h.imageGeneration)

// Handler implementation
func (h *CompletionHandler) imageGeneration(ctx *fasthttp.RequestCtx) {
    var req ImageGenerationHTTPRequest
    if err := sonic.Unmarshal(ctx.PostBody(), &req); err != nil {
        // error handling: write a 400 response, then stop processing
        return
    }

    bifrostReq := toBifrostImageRequest(&req)

    if req.Stream != nil && *req.Stream {
        h.handleStreamingImageGeneration(ctx, bifrostReq)
        return
    }

    resp, err := h.client.ImageGenerationRequest(ctx, bifrostReq)
    // response handling
}

5. Streaming Implementation

New File: framework/streaming/images.go

package streaming

type ImageStreamChunk struct {
    Timestamp    time.Time
    Delta        *schemas.BifrostImageStreamResponse
    FinishReason *string
    ChunkIndex   int
    ImageIndex   int
    ErrorDetails *schemas.BifrostError
}

// Pool for memory efficiency
var imageStreamChunkPool = sync.Pool{
    New: func() interface{} {
        return &ImageStreamChunk{}
    },
}

func (a *Accumulator) addImageStreamChunk(requestID string, chunk *ImageStreamChunk, isFinal bool) error {
    acc := a.getOrCreateStreamAccumulator(requestID)
    acc.mu.Lock()
    defer acc.mu.Unlock()

    acc.ImageStreamChunks = append(acc.ImageStreamChunks, chunk)

    if isFinal {
        return a.processImageStreamingResponse(requestID, acc)
    }
    return nil
}

func (a *Accumulator) processImageStreamingResponse(requestID string, acc *StreamAccumulator) error {
    // Sort chunks by ImageIndex, then ChunkIndex
    sort.Slice(acc.ImageStreamChunks, func(i, j int) bool {
        if acc.ImageStreamChunks[i].ImageIndex != acc.ImageStreamChunks[j].ImageIndex {
            return acc.ImageStreamChunks[i].ImageIndex < acc.ImageStreamChunks[j].ImageIndex
        }
        return acc.ImageStreamChunks[i].ChunkIndex < acc.ImageStreamChunks[j].ChunkIndex
    })

    // Reconstruct complete images from chunks
    images := make(map[int]*strings.Builder)
    for _, chunk := range acc.ImageStreamChunks {
        if chunk.Delta == nil {
            continue // error-only chunks carry no payload
        }
        if _, ok := images[chunk.ImageIndex]; !ok {
            images[chunk.ImageIndex] = &strings.Builder{}
        }
        images[chunk.ImageIndex].WriteString(chunk.Delta.PartialB64)
    }

    // Build final response
    // ...
    return nil
}
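
The reassembly step can be tried in isolation. The following is a self-contained sketch of the same sort-then-concatenate idea; the imageChunk type and assembleImages function are hypothetical stand-ins for the accumulator types, not the proposed Bifrost API:

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// imageChunk is a simplified stand-in for ImageStreamChunk.
type imageChunk struct {
	ImageIndex int
	ChunkIndex int
	PartialB64 string
}

// assembleImages orders chunks by (ImageIndex, ChunkIndex) and concatenates
// the payloads of each image. The result maps image index to full base64.
func assembleImages(chunks []imageChunk) map[int]string {
	// Work on a copy so the caller's slice is not reordered.
	sorted := append([]imageChunk(nil), chunks...)
	sort.Slice(sorted, func(i, j int) bool {
		if sorted[i].ImageIndex != sorted[j].ImageIndex {
			return sorted[i].ImageIndex < sorted[j].ImageIndex
		}
		return sorted[i].ChunkIndex < sorted[j].ChunkIndex
	})

	builders := make(map[int]*strings.Builder)
	for _, c := range sorted {
		b, ok := builders[c.ImageIndex]
		if !ok {
			b = &strings.Builder{}
			builders[c.ImageIndex] = b
		}
		b.WriteString(c.PartialB64)
	}

	out := make(map[int]string, len(builders))
	for idx, b := range builders {
		out[idx] = b.String()
	}
	return out
}

func main() {
	// Chunks arrive out of order and interleaved across two images.
	images := assembleImages([]imageChunk{
		{ImageIndex: 1, ChunkIndex: 0, PartialB64: "QUJD"},
		{ImageIndex: 0, ChunkIndex: 1, PartialB64: "d29ybGQ="},
		{ImageIndex: 0, ChunkIndex: 0, PartialB64: "aGVsbG8g"},
	})
	fmt.Println(images[0]) // aGVsbG8gd29ybGQ=
	fmt.Println(images[1]) // QUJD
}
```

Sorting before concatenation means the accumulator does not depend on chunks arriving in order, at the cost of holding every chunk in memory until the final one arrives.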

Update: framework/streaming/types.go

type StreamAccumulator struct {
    // ... existing fields ...
    ImageStreamChunks []*ImageStreamChunk
}

const StreamTypeImage = "image.generation"

6. Semantic Cache Integration

Update: plugins/semanticcache/main.go

// Add image generation to cacheable request types
func (p *SemanticCachePlugin) PreHook(ctx context.Context, req *schemas.BifrostRequest) error {
    switch req.RequestType {
    case schemas.ChatCompletionRequest, schemas.ImageGenerationRequest:
        return p.checkCache(ctx, req)
    }
    return nil
}

// Image-specific cache key generation
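func (p *SemanticCachePlugin) getImageCacheKey(req *schemas.BifrostImageGenerationRequest) string {
    // Hash: prompt + size + quality + style + n
    // Each field is written as name=value followed by a NUL byte so that
    // adjacent or missing fields cannot produce the same byte stream.
    h := xxhash.New()
    write := func(name, value string) {
        h.WriteString(name + "=" + value + "\x00")
    }
    write("prompt", req.Input.Prompt)
    if req.Params != nil {
        if req.Params.Size != nil {
            write("size", *req.Params.Size)
        }
        if req.Params.Quality != nil {
            write("quality", *req.Params.Quality)
        }
        // ... other params
    }
    return fmt.Sprintf("img_%x", h.Sum64())
}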
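
One caveat when several optional fields feed a single hash: plain concatenation lets different requests collide, since prompt "a cat" with size "512x512" produces the same bytes as prompt "a cat512x512" with no size. A minimal sketch of one way to avoid that, assuming the standard library's FNV-1a as a stand-in for xxhash; the field type and imageCacheKey function are hypothetical names, not part of the plugin:

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// field is one named input to the cache key.
type field struct {
	Name  string
	Value string
}

// imageCacheKey hashes each field as "name=value" terminated by a NUL byte,
// so neighbouring fields cannot run together and a value cannot be mistaken
// for a different parameter. Fields must be passed in a fixed order.
func imageCacheKey(fields ...field) string {
	h := fnv.New64a() // stand-in for xxhash; any 64-bit hash works the same way
	for _, f := range fields {
		h.Write([]byte(f.Name))
		h.Write([]byte{'='})
		h.Write([]byte(f.Value))
		h.Write([]byte{0})
	}
	return fmt.Sprintf("img_%x", h.Sum64())
}

func main() {
	withSize := imageCacheKey(field{"prompt", "a cat"}, field{"size", "512x512"})
	runTogether := imageCacheKey(field{"prompt", "a cat512x512"})
	fmt.Println(withSize != runTogether) // true
}
```

The same idea carries over to xxhash unchanged; only the hash constructor differs.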

Cache Storage Schema:

// Vector store properties for image cache
Properties: []Property{
    {Name: "request_hash", DataType: "string"},
    {Name: "prompt_embedding", DataType: "vector"},  // For semantic similarity
    {Name: "image_urls", DataType: "string[]"},      // Cached URLs
    {Name: "image_b64", DataType: "string[]"},       // Cached base64 (optional)
    {Name: "revised_prompts", DataType: "string[]"},
    {Name: "expires_at", DataType: "int"},
    {Name: "provider", DataType: "string"},
    {Name: "model", DataType: "string"},
    {Name: "params_hash", DataType: "string"},
}

7. Provider Implementation (OpenAI)

New File: core/providers/openai/images.go

package openai

type OpenAIImageRequest struct {
    Model          string  `json:"model"`
    Prompt         string  `json:"prompt"`
    N              *int    `json:"n,omitempty"`
    Size           *string `json:"size,omitempty"`
    Quality        *string `json:"quality,omitempty"`
    Style          *string `json:"style,omitempty"`
    ResponseFormat *string `json:"response_format,omitempty"`
    User           *string `json:"user,omitempty"`
}

type OpenAIImageResponse struct {
    Created int64 `json:"created"`
    Data    []struct {
        URL           string `json:"url,omitempty"`
        B64JSON       string `json:"b64_json,omitempty"`
        RevisedPrompt string `json:"revised_prompt,omitempty"`
    } `json:"data"`
}

func (p *OpenAIProvider) ImageGeneration(ctx context.Context, key schemas.Key,
    req *schemas.BifrostImageGenerationRequest) (*schemas.BifrostImageGenerationResponse, *schemas.BifrostError) {

    openaiReq := toOpenAIImageRequest(req)

    resp, err := p.doRequest(ctx, key, "POST", "/v1/images/generations", openaiReq)
    if err != nil {
        return nil, toBifrostError(err)
    }

    return toBifrostImageResponse(resp), nil
}

func (p *OpenAIProvider) ImageGenerationStream(ctx context.Context, postHookRunner schemas.PostHookRunner,
    key schemas.Key, req *schemas.BifrostImageGenerationRequest) (chan *schemas.BifrostStream, *schemas.BifrostError) {

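    // The DALL-E endpoints return each image whole rather than streaming it,
    // so streaming is emulated by delivering the base64 payload in chunks.
    // Only b64_json responses can be chunked; URL responses produce just the completion marker.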
    streamChan := make(chan *schemas.BifrostStream, 10)

    go func() {
        defer close(streamChan)

        // Generate image (non-streaming from provider)
        resp, err := p.ImageGeneration(ctx, key, req)
        if err != nil {
            streamChan <- &schemas.BifrostStream{BifrostError: err}
            return
        }

        // Stream base64 data in chunks
        for i, img := range resp.Data {
            if img.B64JSON != "" {
                chunks := chunkBase64(img.B64JSON, 64*1024) // 64KB chunks
                for j, chunk := range chunks {
                    streamChan <- &schemas.BifrostStream{
                        BifrostImageStreamResponse: &schemas.BifrostImageStreamResponse{
                            ID:         resp.ID,
                            Type:       "image.chunk",
                            Index:      i,
                            ChunkIndex: j,
                            PartialB64: chunk,
                        },
                    }
                }
            }
            // Send completion marker
            streamChan <- &schemas.BifrostStream{
                BifrostImageStreamResponse: &schemas.BifrostImageStreamResponse{
                    ID:            resp.ID,
                    Type:          "image.complete",
                    Index:         i,
                    RevisedPrompt: img.RevisedPrompt,
                },
            }
        }
    }()

    return streamChan, nil
}
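
The chunkBase64 helper used above is not defined in this proposal. A minimal sketch of what it could look like, assuming it only needs to split the string into fixed-size pieces:

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// chunkBase64 splits s into pieces of at most size bytes. Base64 text is
// ASCII, so slicing by byte offset never cuts a character in half.
func chunkBase64(s string, size int) []string {
	if size <= 0 || len(s) == 0 {
		return nil
	}
	chunks := make([]string, 0, (len(s)+size-1)/size)
	for start := 0; start < len(s); start += size {
		end := start + size
		if end > len(s) {
			end = len(s)
		}
		chunks = append(chunks, s[start:end])
	}
	return chunks
}

func main() {
	// 100 KiB of raw bytes encodes to 136536 base64 characters.
	payload := base64.StdEncoding.EncodeToString(make([]byte, 100*1024))
	chunks := chunkBase64(payload, 64*1024)
	fmt.Println(len(payload), len(chunks)) // 136536 3
}
```

Because 64*1024 is a multiple of 4, every chunk except possibly the last is itself valid base64, so a client could decode chunks one at a time instead of buffering the whole string.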

8. UI Components

New File: ui/components/chat/ImageMessage.tsx

import React from 'react';
import { Card } from '@/components/ui/card';
import { Skeleton } from '@/components/ui/skeleton';

interface ImageMessageProps {
  images: Array<{
    url?: string;
    b64_json?: string;
    revised_prompt?: string;
    index: number;
  }>;
  isStreaming?: boolean;
  streamProgress?: number; // 0-100
}

export const ImageMessage: React.FC<ImageMessageProps> = ({
  images,
  isStreaming,
  streamProgress
}) => {
  return (
    <div className="grid grid-cols-2 gap-4 my-4">
      {images.map((img, idx) => (
        <Card key={idx} className="overflow-hidden">
          {isStreaming && !img.url && !img.b64_json ? (
            <div className="relative">
              <Skeleton className="w-full aspect-square" />
              <div className="absolute bottom-2 left-2 text-sm text-muted-foreground">
                Loading... {streamProgress ?? 0}%
              </div>
            </div>
          ) : (
            <>
              <img
                src={img.url || `data:image/png;base64,${img.b64_json}`}
                alt={img.revised_prompt || `Generated image ${idx + 1}`}
                className="w-full h-auto"
                loading="lazy"
              />
              {img.revised_prompt && (
                <div className="p-2 text-xs text-muted-foreground border-t">
                  {img.revised_prompt}
                </div>
              )}
            </>
          )}
        </Card>
      ))}
    </div>
  );
};

API Examples

REST API

Request:

curl -X POST http://localhost:8080/v1/images/generations \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $API_KEY" \
  -d '{
    "model": "dall-e-3",
    "prompt": "A serene Japanese garden with cherry blossoms",
    "n": 1,
    "size": "1024x1024",
    "quality": "hd",
    "response_format": "b64_json"
  }'

Response:

{
  "id": "img-abc123",
  "created": 1699999999,
  "model": "dall-e-3",
  "data": [
    {
      "b64_json": "iVBORw0KGgo...",
      "revised_prompt": "A tranquil Japanese garden featuring blooming cherry blossom trees...",
      "index": 0
    }
  ],
  "usage": {
    "prompt_tokens": 15,
    "total_tokens": 15
  },
  "extra_fields": {
    "provider": "openai",
    "latency_ms": 8500,
    "cache_debug": null
  }
}

Streaming Response (SSE)

data: {"id":"img-abc123","type":"image.chunk","index":0,"chunk_index":0,"partial_b64":"iVBORw0KGgo..."}

data: {"id":"img-abc123","type":"image.chunk","index":0,"chunk_index":1,"partial_b64":"AAAANSUhEU..."}

data: {"id":"img-abc123","type":"image.complete","index":0,"revised_prompt":"A tranquil Japanese garden...","usage":{"prompt_tokens":15,"total_tokens":15}}

data: [DONE]
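
A client could consume this stream roughly as follows. This is a sketch under the assumption that events arrive exactly in the shape shown above and that the chunks of each image arrive in order; streamEvent, collectImages and collectImagesFromString are hypothetical names:

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

// streamEvent mirrors the fields of the SSE payloads shown above.
type streamEvent struct {
	Type       string `json:"type"`
	Index      int    `json:"index"`
	ChunkIndex int    `json:"chunk_index"`
	PartialB64 string `json:"partial_b64"`
}

// collectImages reads "data: ..." lines until [DONE] and returns the
// concatenated base64 payload for each image index.
func collectImages(r io.Reader) (map[int]string, error) {
	sc := bufio.NewScanner(r)
	// A 64KB chunk plus its JSON envelope exceeds bufio.Scanner's default
	// 64KB token limit, so raise the cap (1 MiB is an arbitrary choice).
	sc.Buffer(make([]byte, 0, 128*1024), 1024*1024)

	builders := make(map[int]*strings.Builder)
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if !strings.HasPrefix(line, "data:") {
			continue // blank separator lines and non-data fields
		}
		payload := strings.TrimSpace(strings.TrimPrefix(line, "data:"))
		if payload == "[DONE]" {
			break
		}
		var ev streamEvent
		if err := json.Unmarshal([]byte(payload), &ev); err != nil {
			return nil, fmt.Errorf("decode event: %w", err)
		}
		if ev.Type != "image.chunk" {
			continue // completion markers carry no image data
		}
		b, ok := builders[ev.Index]
		if !ok {
			b = &strings.Builder{}
			builders[ev.Index] = b
		}
		b.WriteString(ev.PartialB64)
	}
	if err := sc.Err(); err != nil {
		return nil, err
	}

	out := make(map[int]string, len(builders))
	for idx, b := range builders {
		out[idx] = b.String()
	}
	return out, nil
}

// collectImagesFromString is a convenience wrapper for in-memory input.
func collectImagesFromString(body string) (map[int]string, error) {
	return collectImages(strings.NewReader(body))
}

func main() {
	body := `data: {"id":"img-abc123","type":"image.chunk","index":0,"chunk_index":0,"partial_b64":"iVBORw0KGgo"}

data: {"id":"img-abc123","type":"image.chunk","index":0,"chunk_index":1,"partial_b64":"AAAANSUhEU"}

data: {"id":"img-abc123","type":"image.complete","index":0}

data: [DONE]
`
	images, err := collectImagesFromString(body)
	if err != nil {
		panic(err)
	}
	fmt.Println(images[0]) // iVBORw0KGgoAAAANSUhEU
}
```

In a real client, r would be the HTTP response body; the scanner buffer size is the detail most likely to matter, since the default limit is smaller than a single 64KB chunk event.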

Files to Modify/Create

New Files

| File | Purpose |
| --- | --- |
| core/schemas/images.go | Image generation type definitions |
| core/providers/openai/images.go | OpenAI DALL-E implementation |
| core/providers/azure/images.go | Azure DALL-E implementation |
| core/internal/testutil/image_generation.go | Image generation test scenario |
| core/internal/testutil/image_generation_stream.go | Image generation stream test scenario |
| framework/streaming/images.go | Image stream accumulation |
| ui/components/chat/ImageMessage.tsx | React image rendering component |
| ui/hooks/useImageStream.ts | Streaming hook for progressive loading |

Modified Files

| File | Changes |
| --- | --- |
| core/schemas/bifrost.go | Add ImageGenerationRequest/Response to BifrostRequest/Response unions |
| core/schemas/provider.go | Add ImageGeneration methods to Provider interface |
| core/bifrost.go | Add ImageGenerationRequest/StreamRequest methods, update handleRequest |
| core/providers/openai/openai.go | Implement Provider interface methods |
| transports/bifrost-http/handlers/inference.go | Add /v1/images/generations route |
| framework/streaming/types.go | Add ImageStreamChunk, StreamTypeImage |
| framework/streaming/accumulator.go | Add image chunk pool, processing methods |
| plugins/semanticcache/main.go | Add image generation caching support |
| plugins/semanticcache/search.go | Add image-specific cache search |

Testing Plan

Unit Tests

  • Schema serialization/deserialization
  • Request transformation (Bifrost → OpenAI format)
  • Response transformation (OpenAI → Bifrost format)
  • Stream chunk accumulation
  • Cache key generation

Integration Tests

  • End-to-end image generation (non-streaming)
  • End-to-end streaming image generation
  • Fallback to secondary provider
  • Cache hit/miss scenarios
  • Error handling (rate limits, invalid prompts)

Load Tests

  • Concurrent image generation requests
  • Stream memory usage under load
  • Cache performance at scale

Rollout Plan

  1. Phase 1: Core schema and provider implementation (OpenAI + Azure)
  2. Phase 2: HTTP transport and non-streaming endpoint
  3. Phase 3: Streaming support and accumulator
  4. Phase 4: Semantic cache integration (Base64 storage, 5min TTL)
  5. Phase 5: UI components and documentation

Design Decisions

| Decision | Choice | Rationale |
| --- | --- | --- |
| Cache Storage | Base64 Data | Store actual image bytes to prevent expiration issues with provider URLs |
| Initial Providers | OpenAI + Azure | Both DALL-E providers for redundancy and enterprise support |
| Streaming Chunk Size | 64KB | Balance between latency and overhead |

Critical Files Reference

| Component | File Path |
| --- | --- |
| Core schemas | core/schemas/bifrost.go, core/schemas/chatcompletions.go |
| Provider interface | core/schemas/provider.go |
| Main engine | core/bifrost.go |
| OpenAI provider | core/providers/openai/openai.go, core/providers/openai/chat.go |
| HTTP handlers | transports/bifrost-http/handlers/inference.go |
| Streaming framework | framework/streaming/accumulator.go, framework/streaming/types.go |
| Semantic cache | plugins/semanticcache/main.go, plugins/semanticcache/search.go |
| UI components | ui/components/ |
