Get started with Kani TTS in minutes. This guide walks through the requirements, installation, and first run of the text-to-speech model on your system.
Python 3.8 or higher is required for running Kani TTS. We recommend using Python 3.10 or 3.11 for optimal compatibility.
python --version
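The same check can be done from inside Python. This is a minimal sketch based only on the versions named above; the function name `check_python` and the three status labels are illustrative, not part of Kani TTS.

```python
import sys

MIN_VERSION = (3, 8)               # minimum version stated in this guide
RECOMMENDED = {(3, 10), (3, 11)}   # versions this guide recommends


def check_python(version=None):
    """Classify a (major, minor, ...) version as unsupported, supported, or recommended."""
    major_minor = tuple((version or sys.version_info)[:2])
    if major_minor < MIN_VERSION:
        return "unsupported"
    if major_minor in RECOMMENDED:
        return "recommended"
    return "supported"


print(f"Python {sys.version_info.major}.{sys.version_info.minor}: {check_python()}")
```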
For optimal performance, a GPU with at least 2 GB of VRAM is recommended. The model has been tested on an NVIDIA GeForce RTX 5080 with 16 GB of GPU memory.
nvidia-smi
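To compare the installed GPU against the 2 GB recommendation programmatically, one option is to read `nvidia-smi` in its CSV query mode. The sketch below assumes `nvidia-smi` is on the PATH; the helper names `parse_gpu_memory` and `query_gpus` are illustrative, and the script prints nothing when no NVIDIA tooling is found.

```python
import shutil
import subprocess

MIN_VRAM_MIB = 2048  # the 2 GB recommendation from this guide


def parse_gpu_memory(csv_text):
    """Parse 'name, memory.total' CSV rows into (name, MiB) tuples."""
    gpus = []
    for line in csv_text.strip().splitlines():
        name, _, mem = line.rpartition(",")
        try:
            gpus.append((name.strip(), int(mem.strip())))
        except ValueError:
            continue  # skip rows without a numeric memory value
    return gpus


def query_gpus():
    """Return the GPU list, or an empty list when nvidia-smi is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=False,
    )
    return parse_gpu_memory(result.stdout) if result.returncode == 0 else []


for name, mib in query_gpus():
    status = "ok" if mib >= MIN_VRAM_MIB else "below the 2 GB recommendation"
    print(f"{name}: {mib} MiB ({status})")
```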
CUDA 12.8 or a compatible version is required for GPU acceleration. Ensure the appropriate NVIDIA driver and CUDA toolkit are installed on your system.
nvcc --version
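If you want the CUDA release as a value rather than reading it by eye, the `release X.Y` part of the `nvcc --version` output can be extracted with a regular expression. This is a sketch that assumes `nvcc` prints a line containing `release <major>.<minor>`; the helper names are illustrative, and the script does not try to decide what counts as a "compatible" version.

```python
import re
import shutil
import subprocess


def parse_cuda_release(nvcc_output):
    """Return (major, minor) from nvcc output, or None if no release is found."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    return (int(match.group(1)), int(match.group(2))) if match else None


def installed_cuda_release():
    """Run nvcc if it is on the PATH and return its release tuple, else None."""
    if shutil.which("nvcc") is None:
        return None
    result = subprocess.run(["nvcc", "--version"],
                            capture_output=True, text=True, check=False)
    return parse_cuda_release(result.stdout)


release = installed_cuda_release()
if release is None:
    print("nvcc not found or version not recognized")
else:
    print(f"CUDA toolkit release {release[0]}.{release[1]}")
```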
Windows 10/11 with Python 3.8+
macOS 10.15+ with Python 3.8+
Ubuntu 18.04+ or equivalent with Python 3.8+
Install the essential packages required for Kani TTS to function properly.
# Core dependencies
pip install torch librosa soundfile numpy huggingface_hub
pip install "nemo_toolkit[tts]"
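After installing, a quick way to confirm that the core packages are visible to the interpreter is to look them up without importing them. The sketch below uses the standard library's `importlib.util.find_spec`; the module list mirrors the pip commands above, where `nemo_toolkit` is imported as `nemo`. The helper name `missing_modules` is illustrative.

```python
from importlib.util import find_spec

# Import names for the packages installed above (nemo_toolkit imports as "nemo").
REQUIRED_MODULES = ["torch", "librosa", "soundfile", "numpy",
                    "huggingface_hub", "nemo"]


def missing_modules(names):
    """Return the subset of top-level module names that cannot be found."""
    return [name for name in names if find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Missing: " + ", ".join(missing))
else:
    print("All core dependencies found.")
```

Looking up the spec is faster than importing heavy packages such as torch, but it only confirms that a package is present, not that it loads correctly.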
CRITICAL: Kani TTS requires a transformers build that supports the "lfm2" model type. Install the development version directly from the Hugging Face GitHub repository.
# CRITICAL: development build of transformers, required for the "lfm2" model type
pip install -U "git+https://github.com/huggingface/transformers.git"
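One way to confirm that the installed transformers build knows the "lfm2" model type is to look it up in the library's model-type registry. This sketch relies on `CONFIG_MAPPING_NAMES`, an internal mapping in `transformers.models.auto.configuration_auto` that could move or change between versions, so treat it as an assumption rather than a supported API. The function name `lfm2_supported` is illustrative.

```python
def lfm2_supported():
    """Return True/False for lfm2 support, or None if the registry cannot be imported."""
    try:
        # Internal registry of model_type -> config class name (may change).
        from transformers.models.auto.configuration_auto import CONFIG_MAPPING_NAMES
    except ImportError:
        return None
    return "lfm2" in CONFIG_MAPPING_NAMES


result = lfm2_supported()
if result is None:
    print("transformers is not installed or its registry could not be imported")
elif result:
    print('This transformers build recognizes the "lfm2" model type')
else:
    print('This transformers build does not recognize "lfm2"; reinstall from GitHub')
```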
For browser-based interface with real-time audio playback, install these additional packages.
# Optional: For web interface
pip install fastapi uvicorn
Run the basic example with built-in sample text to test your installation.
python basic/main.py
Provide your own text input for speech generation.
python basic/main.py --prompt "Hello world! My name is Kani, I'm a speech generation model!"
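The same command can be launched from another Python script. The sketch below only wraps the two command lines shown above; `build_command` and `run_kani` are illustrative names, and the script path assumes you run from the repository root.

```python
import subprocess
import sys


def build_command(prompt=None, script="basic/main.py"):
    """Build the argument list for the example script, with an optional prompt."""
    command = [sys.executable, script]
    if prompt is not None:
        command += ["--prompt", prompt]
    return command


def run_kani(prompt=None):
    """Run the example script and raise CalledProcessError if it fails."""
    return subprocess.run(build_command(prompt), check=True)


# Print the command instead of running it, so this sketch has no side effects.
print(build_command("Hello world! My name is Kani, I'm a speech generation model!"))
```

Passing the prompt as a separate list element avoids shell quoting problems with apostrophes and exclamation marks in the text.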
The TTS model loads into memory and initializes the processing pipeline.
The system generates speech from the provided text using the neural network.
Audio is saved as generated_audio_YYYYMMDD_HHMMSS.wav in the current directory.
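Because each run writes a new timestamped file, it can help to locate the most recent one. This is a small sketch that relies only on the `generated_audio_YYYYMMDD_HHMMSS.wav` naming pattern described above; the helper names are illustrative.

```python
from datetime import datetime
from pathlib import Path

PREFIX = "generated_audio_"


def parse_timestamp(path):
    """Return the datetime encoded in an output filename, or None if it does not match."""
    stem = Path(path).stem
    if not stem.startswith(PREFIX):
        return None
    try:
        return datetime.strptime(stem[len(PREFIX):], "%Y%m%d_%H%M%S")
    except ValueError:
        return None


def latest_output(directory="."):
    """Return the newest matching .wav file in the directory, or None if there is none."""
    candidates = [p for p in Path(directory).glob(PREFIX + "*.wav")
                  if parse_timestamp(p) is not None]
    return max(candidates, key=parse_timestamp, default=None)


print(latest_output())
```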
Launch the web interface for browser-based interaction with real-time audio playback.
# Start the FastAPI server
python fastapi_example/server.py
The server will run on http://localhost:8000
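To verify that the server is accepting connections before opening the browser, a plain TCP check is enough. The sketch below uses only the standard library and the address given above; it does not assume anything about the server's routes. The function name `port_is_open` is illustrative.

```python
import socket


def port_is_open(host="localhost", port=8000, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if port_is_open():
    print("Server reachable at http://localhost:8000")
else:
    print("Nothing is listening on port 8000 yet")
```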
nineninesix/kani-tts-450m-0.1-pt: Generates random voices
nineninesix/kani-tts-450m-0.2-ft: Specialized for female voice characteristics
nineninesix/kani-tts-450m-0.1-ft: Specialized for male voice characteristics
To use a different model, modify the ModelConfig class in config.py:
# Example: Switching to the female voice model
class ModelConfig:
    model_name = "nineninesix/kani-tts-450m-0.2-ft"
    sample_rate = 22050
    max_tokens = 1200
    temperature = 1.4
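If you switch models often, the three model names can be kept in one lookup table instead of being edited by hand. The sketch below is a hypothetical variant, not the project's actual config.py: it rewrites `ModelConfig` as a dataclass with the same four values shown above, and the `MODELS` table and `config_for` helper are illustrative additions. Whether the rest of the code accepts a config instance instead of the class is an assumption.

```python
from dataclasses import dataclass

# Voice label -> model name, taken from the model list in this guide.
MODELS = {
    "random": "nineninesix/kani-tts-450m-0.1-pt",
    "female": "nineninesix/kani-tts-450m-0.2-ft",
    "male": "nineninesix/kani-tts-450m-0.1-ft",
}


@dataclass
class ModelConfig:
    """Hypothetical dataclass version of the config shown above."""
    model_name: str = MODELS["female"]
    sample_rate: int = 22050
    max_tokens: int = 1200
    temperature: float = 1.4


def config_for(voice):
    """Return a ModelConfig for 'random', 'female', or 'male'."""
    if voice not in MODELS:
        raise ValueError(f"Unknown voice {voice!r}; choose from {sorted(MODELS)}")
    return ModelConfig(model_name=MODELS[voice])


print(config_for("male"))
```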
Reduce batch size or use CPU mode if GPU memory is insufficient.
If the "lfm2" model type is not recognized, confirm that the transformers build from GitHub is installed.
Check sample rate settings and ensure proper audio drivers are installed.
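For the GPU memory case, a common PyTorch pattern is to choose the device at startup and fall back to the CPU when CUDA is not available. This is a generic sketch; how Kani TTS itself selects its device is not described here, so wiring the result into the project is left as an assumption. The function name `pick_device` is illustrative.

```python
def pick_device():
    """Return 'cuda' when PyTorch can see a CUDA GPU, otherwise 'cpu'."""
    try:
        import torch
    except ImportError:
        return "cpu"  # without PyTorch there is no GPU path to choose
    return "cuda" if torch.cuda.is_available() else "cpu"


print(f"Selected device: {pick_device()}")
```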
Use GPU acceleration for faster speech generation.
Monitor VRAM usage and adjust parameters accordingly.
Process multiple texts in batches for improved efficiency.
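The batching tip can be sketched as a small helper that splits a list of texts into fixed-size groups. The `synthesize` argument is a hypothetical callable standing in for whatever function generates audio for a group of texts; this guide does not define one, so the demo below uses a stand-in that returns each text's length.

```python
def batched(items, size):
    """Yield consecutive slices of `items` with at most `size` elements each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def process_in_batches(texts, batch_size, synthesize):
    """Call `synthesize` once per batch and collect the results in order."""
    results = []
    for batch in batched(texts, batch_size):
        results.extend(synthesize(batch))
    return results


# Stand-in for a real synthesis call: returns the character count of each text.
demo = process_in_batches(["one", "two", "three"], 2,
                          lambda batch: [len(text) for text in batch])
print(demo)  # prints [3, 3, 5]
```

A smaller batch size lowers peak memory use at the cost of more calls, which ties in with the advice to monitor VRAM usage.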
Now that you have Kani TTS installed, explore the demo and start generating high-quality speech from text.