> ## Documentation Index
> Fetch the complete documentation index at: https://docs.rightnowai.co/llms.txt
> Use this file to discover all available pages before exploring further.

# Core Features

> Essential AI and profiling capabilities for CUDA development

## AI-Powered CUDA Coding

### Smart Autocomplete

Context-aware CUDA completions with Fill-in-the-Middle (FIM) optimization. Supports 20+ FIM-capable models including DeepSeek R1, Codestral 2501, and StarCoder2.

### Ctrl+K Editing

Select any code and press `Ctrl+K` to describe changes in natural language:

* "Optimize this kernel for memory bandwidth"
* "Add error checking to this CUDA call"
* "Convert this to use shared memory"

### Chat Integration

Full project context with CUDA-specific knowledge:

* Ask questions about GPU architecture optimization
* Get recommendations for specific hardware (Ampere, Ada Lovelace, Hopper)
* Troubleshoot CUDA compilation and runtime issues

## Real-Time CUDA Profiling

### NVIDIA Nsight Compute Integration

Production-grade profiling using `nv-nsight-cu-cli` with comprehensive hardware metrics:

**Core Performance Metrics**:

* **SM Efficiency**: Streaming Multiprocessor utilization percentage
* **Memory Throughput**: Achieved vs theoretical memory bandwidth (GB/s)
* **Occupancy**: Active warps vs maximum theoretical warps
* **Warp Efficiency**: Percentage of active threads in executed warps
* **Instruction Replay Overhead**: Pipeline stall analysis
* **Global Memory Efficiency**: Coalesced memory access patterns
* **Shared Memory Efficiency**: Bank conflict analysis
* **Branch Efficiency**: Divergent execution measurement

**Advanced Metrics**:

* **L1/L2 Cache Hit Rates**: Memory hierarchy performance
* **Register Usage**: Per-thread register consumption
* **Power Draw**: Real-time GPU power consumption (watts)
* **Temperature**: GPU thermal monitoring
* **Roofline Analysis**: Compute vs memory-bound classification

### Multi-Level Profiling Support

<CardGroup cols={3}>
  <Card title="Kernel Profiling" icon="cube">
    Profile specific `__global__` functions with targeted analysis
  </Card>

  <Card title="Application Profiling" icon="play">
    Full executable profiling with complete call graphs
  </Card>

  <Card title="CLI Integration" icon="terminal">
    Direct `nv-nsight-cu-cli` integration with custom metrics
  </Card>
</CardGroup>

### Visual Profiling Interface

**CodeLens Integration**:

* Inline performance metrics displayed above CUDA kernels
* Real-time execution time, SM efficiency, memory throughput
* Color-coded performance indicators:

<CardGroup cols={3}>
  <Card title="🟢 Green" icon="circle-check">
    \>80% efficiency (optimized kernels)
  </Card>

  <Card title="🟡 Orange" icon="circle-exclamation">
    40-80% efficiency (moderate performance)
  </Card>

  <Card title="🔴 Red" icon="circle-xmark">
    \<40% efficiency (needs optimization)
  </Card>
</CardGroup>

**Interactive Profiling Controls**:

* **Gutter Play Buttons**: One-click profiling from editor margins
* **Dedicated Profiling Panel**: Comprehensive results view with historical data
* **Multi-GPU Support**: Device switching and cross-GPU analysis
* **Elevated Profiling**: Windows UAC support for performance counter access

### Hardware Detection & Monitoring

**GPU Hardware Integration**:

* **Multi-Vendor Detection**: NVIDIA, AMD, Intel, Apple Silicon
* **Real-Time Monitoring**: `nvidia-smi` integration for live metrics
* **Hardware Specifications**: Automatic detection of compute capability, SM count, memory specs
* **Architecture Support**: Turing, Ampere, Ada Lovelace, Hopper optimizations

**CUDA Environment Detection**:

* **Toolkit Detection**: Automatic CUDA 11.0-12.5 detection
* **Registry Integration**: Windows performance counter access
* **Multi-Version Support**: Compatible with various NCU versions
* **Diagnostic Capabilities**: Comprehensive environment validation

### AI-Powered Performance Analysis

**Intelligent Optimization Recommendations**:

* **Bottleneck Classification**: Memory-bound vs compute-bound identification
* **Architecture-Specific Suggestions**: Tailored for detected GPU architecture
* **Performance Trend Analysis**: Historical optimization tracking
* **Automated Code Suggestions**: AI-generated kernel optimizations based on profiling data

## AI Provider Architecture

### BYOK (Bring Your Own Key) - Free Tier

* **OpenRouter API Key**: The only supported BYOK option
* Get unlimited usage with your own OpenRouter key
* Access to 200+ models through OpenRouter's unified API
* Support for providers: OpenAI, Anthropic, DeepSeek, Mistral, Google, and more

### RightNow Proxy - Pro Tier

* **Managed Service**: Pre-configured OpenRouter integration
* **Curated Models**: Optimized model selection for CUDA development
* **Usage Tracking**: Comprehensive analytics and billing
* **Priority Access**: Faster response times and premium models

### Model Routing Architecture

All cloud providers route through OpenRouter for unified access:

* **OpenAI** (GPT-4, GPT-4 Turbo) → OpenRouter → OpenAI
* **Anthropic** (Claude 3.5 Sonnet, Claude 3 Opus) → OpenRouter → Anthropic
* **DeepSeek** (R1 series) → OpenRouter → DeepSeek
* **Mistral** (Codestral, Mistral Large) → OpenRouter → Mistral
* **Google** (Gemini 2.0 Flash) → OpenRouter → Google

### Local Models (Privacy-First)

Complete offline capability with no data leaving your machine:

* **Ollama**: Easy local model management with CUDA acceleration
* **vLLM**: High-performance inference server for CUDA GPUs
* **LM Studio**: User-friendly local deployment with GPU support

## Hardware Integration

### Automatic GPU Detection

* **NVIDIA GPUs**: Full support (GeForce, RTX, Quadro, Tesla, A100, H100)
* **Multi-GPU**: Cross-GPU profiling and load balancing analysis
* **CUDA Versions**: Support for CUDA Toolkit 11.0-12.5

### Architecture-Aware Intelligence

Tailored suggestions for specific GPU architectures:

* **Turing**: Tensor core optimization, RT core utilization
* **Ampere**: Sparse tensor operations, structural sparsity
* **Ada Lovelace**: Ada shader efficiency, RT generation 3
* **Hopper**: Transformer engine, thread block clusters

<Warning>
  **macOS Support**: Coming soon with serverless profiling capabilities
</Warning>