The HasteKit SDK supports text-to-speech generation through LLM providers such as OpenAI.

Generate speech

Generate audio from text input. The response contains the raw binary audio data and its content type.

import (
    "context"
    "fmt"
    "log"
    "os"

    "github.com/hastekit/hastekit-sdk-go/pkg/gateway/llm/speech"
)

// client is an initialized SDK client; see "Example: Complete Speech Generation".
resp, err := client.NewSpeech(context.Background(), &speech.Request{
    Model: "OpenAI/tts-1",
    Input: "Hello, this is a test of the text-to-speech system.",
    Voice: "alloy",
})
if err != nil {
    log.Fatal(err)
}

// Save audio to file
err = os.WriteFile("output.mp3", resp.Audio, 0644)
if err != nil {
    log.Fatal(err)
}

// Access content type
fmt.Println("Content-Type:", resp.ContentType)

Response

The response contains the audio data as a byte slice and the content type (e.g., audio/mpeg, audio/opus).

// Access raw audio bytes
audioData := resp.Audio
fmt.Printf("Audio size: %d bytes\n", len(audioData))

// Access content type
fmt.Println("Content-Type:", resp.ContentType)

// If the provider returns gzip-compressed audio, the SDK decompresses it automatically
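
Since the audio format depends on the request, the content type can be used to pick a file extension instead of hard-coding "output.mp3". The following is a minimal sketch using only the Go standard library. The helper name extensionFor, the mapping table, and the ".bin" fallback are assumptions made for illustration and are not part of the SDK.

```go
package main

import (
	"fmt"
	"mime"
)

// extensionFor maps an audio MIME type to a file extension.
// Hypothetical helper: the mapping below is an assumption, not an SDK feature.
func extensionFor(contentType string) string {
	// ParseMediaType strips parameters (e.g. "; charset=...") and lowercases the type.
	mediaType, _, err := mime.ParseMediaType(contentType)
	if err != nil {
		return ".bin" // unknown or malformed content type
	}
	switch mediaType {
	case "audio/mpeg":
		return ".mp3"
	case "audio/opus":
		return ".opus"
	case "audio/aac":
		return ".aac"
	case "audio/flac":
		return ".flac"
	default:
		return ".bin"
	}
}

func main() {
	// In real use the argument would be resp.ContentType.
	fmt.Println("output" + extensionFor("audio/mpeg"))
	fmt.Println("output" + extensionFor("audio/opus"))
}
```

The resulting name can then be passed to os.WriteFile together with resp.Audio.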

Streaming Speech

For real-time applications, the SDK supports streaming speech generation. The call returns a channel that yields audio chunks as they are generated.

import (
    "context"
    "fmt"

    "github.com/hastekit/hastekit-sdk-go/pkg/gateway/llm/speech"
    "github.com/hastekit/hastekit-sdk-go/pkg/utils"
)

stream, err := client.NewStreamingSpeech(context.Background(), &speech.Request{
    Model: "OpenAI/tts-1",
    Input: "This is a streaming text-to-speech example.",
    Voice: "alloy",
    Speed: utils.Ptr(1), // Speed parameter enables streaming
})
if err != nil {
    panic(err)
}

// Collect the base64-encoded audio chunks in arrival order
var audioChunks []string
for chunk := range stream {
    // Handle audio delta chunks
    if chunk.OfAudioDelta != nil {
        audioChunks = append(audioChunks, chunk.OfAudioDelta.Audio)
    }
    
    // Handle completion chunk with usage statistics
    if chunk.OfAudioDone != nil {
        fmt.Printf("\nCompleted! Tokens used: %d\n", chunk.OfAudioDone.Usage.TotalTokens)
    }
}
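
The collected chunks are base64 strings, so they need to be decoded before they can be played or written to disk. A minimal sketch of that step follows. It assumes each chunk is independently encoded with standard, padded base64; the helper name decodeAudioChunks is hypothetical, and the sample chunks in main are stand-ins for real OfAudioDelta.Audio values.

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// decodeAudioChunks decodes each base64 chunk and concatenates the bytes in order.
// Assumption: every chunk is a complete, standard (padded) base64 string.
func decodeAudioChunks(chunks []string) ([]byte, error) {
	var audio []byte
	for i, c := range chunks {
		b, err := base64.StdEncoding.DecodeString(c)
		if err != nil {
			return nil, fmt.Errorf("decode chunk %d: %w", i, err)
		}
		audio = append(audio, b...)
	}
	return audio, nil
}

func main() {
	// Stand-in data; in real use these would be the strings collected from the stream.
	chunks := []string{
		base64.StdEncoding.EncodeToString([]byte("hello ")),
		base64.StdEncoding.EncodeToString([]byte("world")),
	}
	audio, err := decodeAudioChunks(chunks)
	if err != nil {
		panic(err)
	}
	fmt.Printf("decoded %d bytes\n", len(audio))
}
```

If the provider uses a different base64 variant, base64.StdEncoding would need to be swapped for the matching encoding.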

Request Configuration

| Parameter | Type | Description |
| --- | --- | --- |
| Input | string | The text input to convert to speech. |
| Model | string | The speech model to use (e.g., "tts-1", "tts-1-hd"). The examples on this page prefix the provider name, as in "OpenAI/tts-1". |
| Voice | string | The voice to use (e.g., "alloy", "echo", "fable", "onyx", "nova", "shimmer"). |
| Instruction | *string | Optional. Instructions for the voice style or pronunciation. |
| ResponseFormat | *string | Optional. Audio format: "mp3", "opus", "aac", "flac", or "pcm". Default is "mp3". |
| Speed | *int | Optional. Speed of the generated audio, typically in the range 0.25 to 4.0. Setting it enables streaming mode. |
| StreamFormat | *string | Optional. Format for streaming responses (e.g., "pcm"). |
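
The optional parameters are pointer types, so a nil value means "not set" and the default applies. The sketch below illustrates that pattern in isolation. The generic Ptr function is a hypothetical stand-in for what utils.Ptr presumably does, and the options struct mirrors only two fields from the table; neither is the SDK's actual definition.

```go
package main

import "fmt"

// Ptr returns a pointer to a copy of v.
// Hypothetical stand-in for utils.Ptr; the SDK's implementation may differ.
func Ptr[T any](v T) *T { return &v }

// options mirrors two optional fields from the table above for illustration only.
type options struct {
	Instruction    *string
	ResponseFormat *string
}

// formatOrDefault returns the requested format, or "mp3" (the documented default) when unset.
func formatOrDefault(o options) string {
	if o.ResponseFormat == nil {
		return "mp3"
	}
	return *o.ResponseFormat
}

func main() {
	fmt.Println(formatOrDefault(options{}))                            // field left unset
	fmt.Println(formatOrDefault(options{ResponseFormat: Ptr("opus")})) // field set explicitly
}
```

Using pointers lets a caller distinguish an omitted field from one that is deliberately set to its zero value.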

Response Structure

| Field | Type | Description |
| --- | --- | --- |
| Audio | []byte | Raw binary audio data. Automatically decompressed if gzip-compressed. |
| ContentType | string | MIME type of the audio (e.g., "audio/mpeg", "audio/opus"). |

Streaming Response Chunks

When using streaming, the response channel yields ResponseChunk objects:

ChunkAudioDelta

Contains incremental audio data during streaming.

| Field | Type | Description |
| --- | --- | --- |
| Type | ChunkTypeAudioDelta | Chunk type identifier: "speech.audio.delta". |
| Audio | string | Base64-encoded audio chunk data. |

ChunkAudioDone

Sent when streaming is complete; includes usage statistics.

| Field | Type | Description |
| --- | --- | --- |
| Type | ChunkTypeAudioDone | Chunk type identifier: "speech.audio.done". |
| Usage | Usage | Token usage statistics. |

Usage

| Field | Type | Description |
| --- | --- | --- |
| InputTokens | int | Number of input tokens processed. |
| OutputTokens | int | Number of output tokens generated. |
| TotalTokens | int | Total tokens used. |
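
To show how the chunk shapes above fit together, here is a self-contained sketch that drains a channel of chunks, counts the audio deltas, and captures the final usage. The types are local mirrors written from the tables on this page, with Type simplified to a plain string; they are assumptions for illustration and not the SDK's own definitions. The summarize function is likewise hypothetical.

```go
package main

import "fmt"

// Local mirror types based on the tables above; not the SDK's definitions.
type Usage struct {
	InputTokens  int
	OutputTokens int
	TotalTokens  int
}

type ChunkAudioDelta struct {
	Type  string // simplified; the docs list a dedicated chunk type
	Audio string // base64-encoded audio
}

type ChunkAudioDone struct {
	Type  string
	Usage Usage
}

// ResponseChunk carries one variant per chunk via pointer fields.
type ResponseChunk struct {
	OfAudioDelta *ChunkAudioDelta
	OfAudioDone  *ChunkAudioDone
}

// summarize reads until the channel is closed, counting deltas and
// keeping the usage from the completion chunk (nil if none arrived).
func summarize(stream <-chan ResponseChunk) (deltas int, usage *Usage) {
	for chunk := range stream {
		switch {
		case chunk.OfAudioDelta != nil:
			deltas++
		case chunk.OfAudioDone != nil:
			u := chunk.OfAudioDone.Usage
			usage = &u
		}
	}
	return deltas, usage
}

func main() {
	// Simulated stream: two deltas followed by a completion chunk.
	stream := make(chan ResponseChunk, 3)
	stream <- ResponseChunk{OfAudioDelta: &ChunkAudioDelta{Type: "speech.audio.delta", Audio: "AAAA"}}
	stream <- ResponseChunk{OfAudioDelta: &ChunkAudioDelta{Type: "speech.audio.delta", Audio: "AAAA"}}
	stream <- ResponseChunk{OfAudioDone: &ChunkAudioDone{
		Type:  "speech.audio.done",
		Usage: Usage{InputTokens: 5, OutputTokens: 7, TotalTokens: 12},
	}}
	close(stream)

	deltas, usage := summarize(stream)
	if usage != nil {
		fmt.Printf("deltas=%d total tokens=%d\n", deltas, usage.TotalTokens)
	}
}
```

Returning usage as a pointer lets the caller tell a stream that ended without a completion chunk apart from one that reported zero tokens.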

Example: Complete Speech Generation

package main

import (
    "context"
    "log"
    "os"
    
    hastekit "github.com/hastekit/hastekit-sdk-go"
    "github.com/hastekit/hastekit-sdk-go/pkg/gateway"
    "github.com/hastekit/hastekit-sdk-go/pkg/gateway/llm"
    "github.com/hastekit/hastekit-sdk-go/pkg/gateway/llm/speech"
    "github.com/hastekit/hastekit-sdk-go/pkg/utils"
)

func main() {
    // Initialize SDK client
    client, err := hastekit.New(&hastekit.ClientOptions{
        ProviderConfigs: []gateway.ProviderConfig{
            {
                ProviderName:  llm.ProviderNameOpenAI,
                BaseURL:       "",
                CustomHeaders: nil,
                ApiKeys: []*gateway.APIKeyConfig{
                    {
                        Name:   "Key 1",
                        APIKey: os.Getenv("OPENAI_API_KEY"),
                    },
                },
            },
        },
    })
    if err != nil {
        log.Fatal(err)
    }

    // Generate speech
    resp, err := client.NewSpeech(context.Background(), &speech.Request{
        Model:          "OpenAI/tts-1",
        Input:          "Hello! This is a text-to-speech example using HasteKit SDK.",
        Voice:          "alloy",
        ResponseFormat: utils.Ptr("mp3"),
    })
    if err != nil {
        log.Fatal(err)
    }

    // Save audio file
    err = os.WriteFile("output.mp3", resp.Audio, 0644)
    if err != nil {
        log.Fatal(err)
    }

    // log.Printf appends a newline if one is missing, so none is needed here
    log.Printf("Audio generated successfully! Size: %d bytes, Type: %s",
        len(resp.Audio), resp.ContentType)
}

Supported Providers

ProviderSpeech
OpenAI
Gemini
Anthropic