OpenRouter BYOK (Free Tier)
OpenRouter is the only supported BYOK provider - all other AI providers route through OpenRouter’s unified API.
Setup Steps
- Create Account: Sign up at openrouter.ai
- Get API Key: Visit openrouter.ai/settings/keys
- Configure RightNow AI:
- Go to Settings → AI Providers → OpenRouter
- Enter your OpenRouter API key
- Test connection
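If you want to sanity-check the key outside the editor first, a minimal request against OpenRouter’s OpenAI-compatible chat completions endpoint looks roughly like this. This is a sketch, not part of RightNow AI itself; the environment variable name is just a convention, and the free Gemini model from the list below is used as an example.

```python
# Minimal sketch: verify an OpenRouter API key by sending one chat request.
# Assumes the `requests` package is installed and your key is exported as OPENROUTER_API_KEY.
import os
import requests

API_KEY = os.environ["OPENROUTER_API_KEY"]  # key from openrouter.ai/settings/keys

resp = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "google/gemini-2.0-flash-exp:free",  # example free-tier model
        "messages": [{"role": "user", "content": "Say hello in one word."}],
    },
    timeout=30,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```

A 401 response here means the key itself is the problem; fix that before troubleshooting the editor configuration.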
Available Models
Access 200+ models through OpenRouter’s unified API (a short script for pulling the current catalog is sketched after this list).
Free Models (with your API key):
- google/gemini-2.0-flash-exp:free
- mistralai/mistral-small-3.1-24b-instruct:free
Other available models include:
- OpenAI: GPT-4, GPT-4 Turbo, GPT-3.5 Turbo
- Anthropic: Claude 3.5 Sonnet, Claude 3 Opus, Claude 3 Haiku
- DeepSeek: R1 series, Chat models
- Mistral: Large, Codestral 2501
- Google: Gemini 2.0 Flash
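The catalog changes frequently, so the quickest way to see what your key can reach is OpenRouter’s public model listing. A rough sketch (free variants carry the :free suffix shown above):

```python
# Sketch: list model IDs currently exposed by OpenRouter's public catalog endpoint.
import requests

models = requests.get("https://openrouter.ai/api/v1/models", timeout=30).json()["data"]
free = [m["id"] for m in models if m["id"].endswith(":free")]
print(f"{len(models)} models total, {len(free)} free variants")
print("\n".join(sorted(free)[:10]))  # show a handful of free model IDs
```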
Provider Routing
All cloud providers automatically route through OpenRouter:
- OpenAI → OpenRouter → OpenAI
- Anthropic → OpenRouter → Anthropic
- DeepSeek → OpenRouter → DeepSeek
- Mistral → OpenRouter → Mistral
- Google → OpenRouter → Google
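In practice this means RightNow AI only ever talks to one HTTP endpoint; the provider prefix in the model ID decides where OpenRouter forwards the request. A rough illustration (the model IDs below are examples, check the catalog for current names):

```python
# Sketch: the same OpenRouter endpoint serves every provider; only the model prefix changes.
import os
import requests

API_KEY = os.environ["OPENROUTER_API_KEY"]
ENDPOINT = "https://openrouter.ai/api/v1/chat/completions"

for model in ("openai/gpt-4-turbo", "anthropic/claude-3.5-sonnet", "deepseek/deepseek-chat"):
    r = requests.post(
        ENDPOINT,
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={"model": model, "messages": [{"role": "user", "content": "ping"}]},
        timeout=30,
    )
    print(model, "->", r.status_code)
```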
RightNow Pro (Managed Service)
No API key setup required - fully managed OpenRouter integration.
Benefits
- Curated Models: Optimized selection for CUDA development
- Usage Tracking: Comprehensive analytics and billing
- Priority Access: Faster response times and premium models
- Seamless Experience: No API key management needed
Available Models
Chat Models:
- anthropic/claude-sonnet-4
- google/gemini-2.5-flash
- deepseek/deepseek-chat-v3-0324
Additional Models:
- codestral-2501
- deepseek-r1-distill-qwen-7b
Upgrade
Ready to upgrade? Visit rightnowai.co/pricing to get started with RightNow Pro.
Local Models (Privacy-First)
Complete offline capability with no data leaving your machine.
Ollama
Setup:
- Install Ollama on your system
- Pull a model: ollama pull codellama
- Configure RightNow AI:
- Settings → AI Providers → Ollama
- Set endpoint: http://localhost:11434
- Select your model and test connection (a quick script for this check is sketched below)
Benefits:
- Easy local model management
- CUDA acceleration support
- Automatic model updates
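Before pointing RightNow AI at the endpoint, you can confirm Ollama is reachable by hitting its local HTTP API directly. A minimal sketch, assuming the default port 11434 and the codellama model pulled above:

```python
# Sketch: confirm the local Ollama server is up and see which models it has pulled.
import requests

tags = requests.get("http://localhost:11434/api/tags", timeout=5).json()
print("Models available locally:", [m["name"] for m in tags.get("models", [])])

# One-shot, non-streaming generation against the model pulled above.
gen = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "codellama", "prompt": "// CUDA kernel that adds two vectors", "stream": False},
    timeout=120,
)
print(gen.json()["response"][:200])
```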
vLLM
Setup:
- Install vLLM: pip install vllm
- Start server: python -m vllm.entrypoints.api_server --model codellama/CodeLlama-7b-Instruct-hf
- Configure RightNow AI:
- Settings → AI Providers → vLLM
- Set endpoint and model
- Test connection (a quick smoke test is sketched below)
Benefits:
- High-performance inference server
- Optimized for CUDA GPUs
- Excellent throughput for large models
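Once the server from step 2 is up (it listens on port 8000 by default), a quick smoke test looks roughly like this. Note this targets the demo api_server’s simple /generate route; if you start vllm.entrypoints.openai.api_server instead, it exposes OpenAI-style /v1 routes rather than /generate.

```python
# Sketch: smoke-test the vLLM demo server started above (default port 8000).
# The demo api_server accepts a plain JSON prompt; sampling params are optional.
import requests

resp = requests.post(
    "http://localhost:8000/generate",
    json={"prompt": "### Write a CUDA vector-add kernel\n", "max_tokens": 128, "temperature": 0.2},
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["text"][0])  # demo server returns the prompt plus the completion
```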
LM Studio
Setup:
- Download and install LM Studio
- Download a CUDA-compatible model
- Start local server in LM Studio
- Configure RightNow AI:
- Settings → AI Providers → LM Studio
- Configure endpoint and test connection (a quick check is sketched below)
Benefits:
- User-friendly interface
- GPU acceleration support
- Easy model management
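LM Studio’s local server speaks the OpenAI-compatible API on port 1234 by default. A minimal check, assuming a model is already loaded in the LM Studio UI:

```python
# Sketch: verify LM Studio's local server (OpenAI-compatible API, default port 1234).
import requests

BASE = "http://localhost:1234/v1"

# Ask the server which model is currently loaded and reuse its id for the request.
model_id = requests.get(f"{BASE}/models", timeout=5).json()["data"][0]["id"]

resp = requests.post(
    f"{BASE}/chat/completions",
    json={
        "model": model_id,
        "messages": [{"role": "user", "content": "Reply with OK if you can read this."}],
    },
    timeout=120,
)
print(model_id, "->", resp.json()["choices"][0]["message"]["content"])
```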
