OpenRouter BYOK (Free Tier)

OpenRouter is the only BYOK provider RightNow AI supports directly; all other cloud AI providers are reached through OpenRouter’s unified API.

Setup Steps

  1. Create Account: Sign up at openrouter.ai
  2. Get API Key: Visit openrouter.ai/settings/keys
  3. Configure RightNow AI:
    • Go to Settings → AI Providers → OpenRouter
    • Enter your OpenRouter API key
    • Test connection (a standalone key check is sketched below)
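
If you want to confirm the key works before wiring it into the editor, a minimal request against OpenRouter’s OpenAI-compatible chat endpoint is enough. A sketch in Python, assuming the requests package and the key in an OPENROUTER_API_KEY environment variable (the model ID is one of the free models listed below):

    import os
    import requests

    # OpenRouter exposes an OpenAI-compatible chat completions endpoint.
    resp = requests.post(
        "https://openrouter.ai/api/v1/chat/completions",
        headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
        json={
            "model": "mistralai/mistral-small-3.1-24b-instruct:free",
            "messages": [{"role": "user", "content": "Say hello in one word."}],
        },
        timeout=30,
    )
    resp.raise_for_status()  # a 401 here means the key is invalid or revoked
    print(resp.json()["choices"][0]["message"]["content"])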

Available Models

Access 200+ models through OpenRouter’s unified API; a usage sketch follows the lists below.

Free Models (with your API key):
  • google/gemini-2.0-flash-exp:free
  • mistralai/mistral-small-3.1-24b-instruct:free
Premium Models (with your API key):
  • OpenAI: GPT-4, GPT-4 Turbo, GPT-3.5 Turbo
  • Anthropic: Claude 3.5 Sonnet, Claude 3 Opus, Claude 3 Haiku
  • DeepSeek: R1 series, Chat models
  • Mistral: Large, Codestral 2501
  • Google: Gemini 2.0 Flash
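
All of these models are addressed through one API; only the model string changes. A sketch using the official openai Python package pointed at OpenRouter (the base-URL override is standard OpenRouter usage; the model choice is illustrative):

    from openai import OpenAI

    # The OpenAI SDK works against OpenRouter once the base URL is overridden.
    client = OpenAI(
        base_url="https://openrouter.ai/api/v1",
        api_key="sk-or-...",  # your OpenRouter key
    )

    reply = client.chat.completions.create(
        model="anthropic/claude-3.5-sonnet",  # any free or premium ID above
        messages=[{"role": "user", "content": "Explain warp divergence briefly."}],
    )
    print(reply.choices[0].message.content)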

Provider Routing

All cloud providers automatically route through OpenRouter, as the sketch after this list illustrates:
  • OpenAI → OpenRouter → OpenAI
  • Anthropic → OpenRouter → Anthropic
  • DeepSeek → OpenRouter → DeepSeek
  • Mistral → OpenRouter → Mistral
  • Google → OpenRouter → Google
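
In practice, switching providers is just switching the model prefix; the endpoint and key stay the same. A hypothetical sketch (the model IDs are illustrative):

    import os
    import requests

    # One endpoint, one key, three upstream providers.
    for model in [
        "openai/gpt-4-turbo",
        "anthropic/claude-3.5-sonnet",
        "deepseek/deepseek-chat",
    ]:
        resp = requests.post(
            "https://openrouter.ai/api/v1/chat/completions",
            headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
            json={"model": model, "messages": [{"role": "user", "content": "ping"}]},
            timeout=30,
        )
        print(model, resp.status_code)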

RightNow Pro (Managed Service)

No API key setup required - fully managed OpenRouter integration.

Benefits

  • Curated Models: Optimized selection for CUDA development
  • Usage Tracking: Comprehensive analytics and billing
  • Priority Access: Faster response times and premium models
  • Seamless Experience: No API key management needed

Available Models

Chat Models:
  • anthropic/claude-sonnet-4
  • google/gemini-2.5-flash
  • deepseek/deepseek-chat-v3-0324
FIM Models (Autocomplete):
  • codestral-2501
  • deepseek-r1-distill-qwen-7b
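
FIM (fill-in-the-middle) models receive the code before and after the cursor and generate only the span in between, which is what makes them suitable for inline autocomplete. RightNow Pro issues these calls internally; purely as a conceptual illustration, a FIM-style request against Mistral's public codestral endpoint looks roughly like this (endpoint and parameters follow Mistral's documented fim API, not RightNow's internal interface):

    import os
    import requests

    # Fill-in-the-middle: the model completes the gap between prompt and suffix.
    resp = requests.post(
        "https://api.mistral.ai/v1/fim/completions",
        headers={"Authorization": f"Bearer {os.environ['MISTRAL_API_KEY']}"},
        json={
            "model": "codestral-2501",
            "prompt": "__global__ void scale(float* x, float s, int n) {\n    int i = ",
            "suffix": "\n    if (i < n) x[i] *= s;\n}",
            "max_tokens": 32,
        },
        timeout=30,
    )
    print(resp.json())  # response mirrors the chat completion schema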

Upgrade

Ready to upgrade? Visit rightnowai.co/pricing to get started with RightNow Pro.

Local Models (Privacy-First)

Complete offline capability with no data leaving your machine.

Ollama

Setup:
  1. Install Ollama on your system
  2. Pull a model: ollama pull codellama
  3. Configure RightNow AI:
    • Settings → AI Providers → Ollama
    • Set endpoint: http://localhost:11434
    • Select your model and test connection (a standalone check is sketched after the benefits list)
Benefits:
  • Easy local model management
  • CUDA acceleration support
  • Automatic model updates
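
To confirm the server is reachable before configuring the editor, you can hit Ollama's generate endpoint directly. A minimal sketch, assuming the requests package and the codellama model pulled in step 2:

    import requests

    # Ollama's native API listens on port 11434 by default.
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": "codellama",
            "prompt": "Write a CUDA kernel that adds two float vectors.",
            "stream": False,  # return a single JSON object instead of a stream
        },
        timeout=120,
    )
    resp.raise_for_status()
    print(resp.json()["response"])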

vLLM

Setup:
  1. Install vLLM: pip install vllm
  2. Start server: python -m vllm.entrypoints.openai.api_server --model codellama/CodeLlama-7b-Instruct-hf
  3. Configure RightNow AI:
    • Settings → AI Providers → vLLM
    • Set endpoint (e.g. http://localhost:8000) and model
    • Test connection (a standalone check is sketched after the benefits list)
Benefits:
  • High-performance inference server
  • Optimized for CUDA GPUs
  • Excellent throughput for large models
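
Because vLLM's server speaks the OpenAI protocol, the same client code used for cloud providers works locally. A sketch, assuming the server from step 2 on vLLM's default port 8000:

    from openai import OpenAI

    # vLLM's OpenAI-compatible server defaults to port 8000; the key is unused
    # unless the server was started with --api-key.
    client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

    reply = client.chat.completions.create(
        model="codellama/CodeLlama-7b-Instruct-hf",
        messages=[{"role": "user", "content": "Rewrite this loop for coalesced access."}],
    )
    print(reply.choices[0].message.content)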

LM Studio

Setup:
  1. Download and install LM Studio
  2. Download a CUDA-compatible model
  3. Start local server in LM Studio
  4. Configure RightNow AI:
    • Settings → AI Providers → LM Studio
    • Configure endpoint and test connection (a standalone check is sketched after the benefits list)
Benefits:
  • User-friendly interface
  • GPU acceleration support
  • Easy model management
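
LM Studio's local server is also OpenAI-compatible and listens on http://localhost:1234/v1 by default. A sketch, assuming a model is already loaded in LM Studio's server tab (LM Studio typically serves whatever model is loaded, so the model field is a placeholder):

    from openai import OpenAI

    # LM Studio's local server defaults to http://localhost:1234/v1.
    client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

    reply = client.chat.completions.create(
        model="local-model",  # placeholder; LM Studio routes to the loaded model
        messages=[{"role": "user", "content": "Summarize shared memory bank conflicts."}],
    )
    print(reply.choices[0].message.content)
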
Use local models for privacy-sensitive projects where code cannot leave your machine.