AI-Powered CUDA Coding

Smart Autocomplete

Context-aware CUDA completions with Fill-in-the-Middle (FIM) optimization. Supports 20+ FIM-capable models including DeepSeek R1, Codestral 2501, and StarCoder2.
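
FIM means the model sees the code both before and after the cursor and generates only the missing middle. As a minimal illustration (the kernel and completion below are hypothetical, not actual tool output), given the prefix and suffix of this reduction kernel, a FIM-capable model fills in the marked section:

    // Launch with 256 threads per block to match the shared tile size.
    __global__ void sumReduce(const float* in, float* out, int n) {
        __shared__ float tile[256];
        int tid = threadIdx.x;
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        tile[tid] = (i < n) ? in[i] : 0.0f;   // <- prefix ends here
        __syncthreads();
        // --- FIM-completed middle: tree reduction in shared memory ---
        for (int s = blockDim.x / 2; s > 0; s >>= 1) {
            if (tid < s) tile[tid] += tile[tid + s];
            __syncthreads();
        }
        // --- suffix resumes below ---
        if (tid == 0) out[blockIdx.x] = tile[0];
    }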

Ctrl+K Editing

Select any code and press Ctrl+K to describe changes in natural language:
  • “Optimize this kernel for memory bandwidth”
  • “Add error checking to this CUDA call” (see the sketch below)
  • “Convert this to use shared memory”
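
As a concrete example of the second prompt, a common result is wrapping raw runtime calls in an error-checking macro. The CUDA_CHECK macro below is a standard idiom, shown as a sketch of the kind of edit produced, not the tool's exact output:

    #include <cstdio>
    #include <cstdlib>
    #include <cuda_runtime.h>

    // Standard idiom: check every CUDA runtime call and fail loudly.
    #define CUDA_CHECK(call)                                              \
        do {                                                              \
            cudaError_t err_ = (call);                                    \
            if (err_ != cudaSuccess) {                                    \
                fprintf(stderr, "CUDA error %s at %s:%d\n",               \
                        cudaGetErrorString(err_), __FILE__, __LINE__);    \
                exit(EXIT_FAILURE);                                       \
            }                                                             \
        } while (0)

    // Before: cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);
    // After:  CUDA_CHECK(cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice));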

Chat Integration

Full project context with CUDA-specific knowledge:
  • Ask questions about GPU architecture optimization
  • Get recommendations for specific hardware (Ampere, Ada Lovelace, Hopper)
  • Troubleshoot CUDA compilation and runtime issues

Real-Time CUDA Profiling

NVIDIA Nsight Compute Integration

Production-grade profiling using nv-nsight-cu-cli with comprehensive hardware metrics.

Core Performance Metrics:
  • SM Efficiency: Streaming Multiprocessor utilization percentage
  • Memory Throughput: Achieved vs theoretical memory bandwidth (GB/s)
  • Occupancy: Active warps vs maximum theoretical warps
  • Warp Efficiency: Percentage of active threads in executed warps
  • Instruction Replay Overhead: Pipeline stall analysis
  • Global Memory Efficiency: Coalesced memory access patterns (see the sketch after this list)
  • Shared Memory Efficiency: Bank conflict analysis
  • Branch Efficiency: Divergent execution measurement
Advanced Metrics:
  • L1/L2 Cache Hit Rates: Memory hierarchy performance
  • Register Usage: Per-thread register consumption
  • Power Draw: Real-time GPU power consumption (watts)
  • Temperature: GPU thermal monitoring
  • Roofline Analysis: Compute vs memory-bound classification
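
To make the Global Memory Efficiency metric concrete, the sketch below (illustrative, with hypothetical kernel names) contrasts a coalesced access pattern with a strided one; the strided kernel scores far lower because each warp's loads scatter across memory segments:

    // Coalesced: consecutive threads read consecutive addresses, so a
    // warp's 32 loads combine into a small number of wide transactions.
    __global__ void scaleCoalesced(const float* in, float* out, int n, float a) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) out[i] = a * in[i];
    }

    // Strided: consecutive threads touch addresses 32 floats apart, so
    // every load in a warp lands in a different memory segment.
    __global__ void scaleStrided(const float* in, float* out, int n, float a) {
        int i = (blockIdx.x * blockDim.x + threadIdx.x) * 32;
        if (i < n) out[i] = a * in[i];
    }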

Multi-Level Profiling Support

Kernel Profiling

Profile specific __global__ functions with targeted analysis

Application Profiling

Full executable profiling with complete call graphs

CLI Integration

Direct nv-nsight-cu-cli integration with custom metrics
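
As a rough illustration (exact flags and metric names vary across Nsight Compute versions, and vectorAdd / my_app are placeholders), a targeted kernel profile with custom metrics looks like:

    nv-nsight-cu-cli --kernel-name vectorAdd \
        --metrics sm__throughput.avg.pct_of_peak_sustained_elapsed,dram__throughput.avg.pct_of_peak_sustained_elapsed \
        ./my_app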

Visual Profiling Interface

CodeLens Integration:
  • Inline performance metrics displayed above CUDA kernels
  • Real-time execution time, SM efficiency, memory throughput
  • Color-coded performance indicators:
      🟢 Green: >80% efficiency (optimized kernels)
      🟡 Orange: 40-80% efficiency (moderate performance)
      🔴 Red: <40% efficiency (needs optimization)
Interactive Profiling Controls:
  • Gutter Play Buttons: One-click profiling from editor margins
  • Dedicated Profiling Panel: Comprehensive results view with historical data
  • Multi-GPU Support: Device switching and cross-GPU analysis
  • Elevated Profiling: Windows UAC support for performance counter access

Hardware Detection & Monitoring

GPU Hardware Integration:
  • Multi-Vendor Detection: NVIDIA, AMD, Intel, Apple Silicon
  • Real-Time Monitoring: nvidia-smi integration for live metrics
  • Hardware Specifications: Automatic detection of compute capability, SM count, memory specs (see the sketch below)
  • Architecture Support: Turing, Ampere, Ada Lovelace, Hopper optimizations
CUDA Environment Detection:
  • Toolkit Detection: Automatic CUDA 11.0-12.5 detection
  • Registry Integration: Windows performance counter access
  • Multi-Version Support: Compatible with multiple Nsight Compute (ncu) versions
  • Diagnostic Capabilities: Comprehensive environment validation
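
A minimal sketch of the kind of query such detection builds on (illustrative host code using the standard CUDA runtime API, not the extension's own implementation):

    #include <cstdio>
    #include <cuda_runtime.h>

    int main() {
        int runtime = 0, devices = 0;
        cudaRuntimeGetVersion(&runtime);   // e.g. 12040 -> CUDA 12.4
        cudaGetDeviceCount(&devices);
        printf("CUDA runtime %d, %d device(s)\n", runtime, devices);

        for (int d = 0; d < devices; ++d) {
            cudaDeviceProp p;
            cudaGetDeviceProperties(&p, d);
            printf("GPU %d: %s, sm_%d%d, %d SMs, %.1f GiB\n",
                   d, p.name, p.major, p.minor, p.multiProcessorCount,
                   p.totalGlobalMem / (1024.0 * 1024.0 * 1024.0));
        }
        return 0;
    }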

AI-Powered Performance Analysis

Intelligent Optimization Recommendations:
  • Bottleneck Classification: Memory-bound vs compute-bound identification (see the worked sketch after this list)
  • Architecture-Specific Suggestions: Tailored for detected GPU architecture
  • Performance Trend Analysis: Historical optimization tracking
  • Automated Code Suggestions: AI-generated kernel optimizations based on profiling data
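
The memory-bound vs compute-bound split follows the roofline model: a kernel whose arithmetic intensity (FLOPs per byte of memory traffic) falls below the machine balance point (peak FLOP/s divided by peak bandwidth) is memory-bound. A worked sketch with assumed peaks (roughly an A100-40GB in FP32):

    #include <cstdio>

    int main() {
        const double peak_flops = 19.5e12;              // assumed FP32 peak, FLOP/s
        const double peak_bw    = 1.555e12;             // assumed HBM2 peak, bytes/s
        const double balance    = peak_flops / peak_bw; // ~12.5 FLOP/byte

        // saxpy: y[i] = a * x[i] + y[i] -> 2 FLOPs per 12 bytes moved.
        const double intensity = 2.0 / 12.0;

        printf("balance %.1f FLOP/B, kernel %.2f FLOP/B -> %s-bound\n",
               balance, intensity, intensity < balance ? "memory" : "compute");
        return 0;
    }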

AI Provider Architecture

BYOK (Bring Your Own Key) - Free Tier

  • OpenRouter API Key: The only supported BYOK option
  • Unlimited Usage: Bring your own OpenRouter key, with no usage caps
  • Model Access: 200+ models through OpenRouter’s unified API
  • Provider Support: OpenAI, Anthropic, DeepSeek, Mistral, Google, and more

RightNow Proxy - Pro Tier

  • Managed Service: Pre-configured OpenRouter integration
  • Curated Models: Optimized model selection for CUDA development
  • Usage Tracking: Comprehensive analytics and billing
  • Priority Access: Faster response times and premium models

Model Routing Architecture

All cloud providers route through OpenRouter for unified access:
  • OpenAI (GPT-4, GPT-4 Turbo) → OpenRouter → OpenAI
  • Anthropic (Claude 3.5 Sonnet, Claude 3 Opus) → OpenRouter → Anthropic
  • DeepSeek (R1 series) → OpenRouter → DeepSeek
  • Mistral (Codestral, Mistral Large) → OpenRouter → Mistral
  • Google (Gemini 2.0 Flash) → OpenRouter → Google

Local Models (Privacy-First)

Complete offline capability with no data leaving your machine:
  • Ollama: Easy local model management with CUDA acceleration
  • vLLM: High-performance inference server for CUDA GPUs
  • LM Studio: User-friendly local deployment with GPU support

Hardware Integration

Automatic GPU Detection

  • NVIDIA GPUs: Full support (GeForce, RTX, Quadro, Tesla, A100, H100)
  • Multi-GPU: Cross-GPU profiling and load balancing analysis
  • CUDA Versions: Support for CUDA Toolkit 11.0-12.5

Architecture-Aware Intelligence

Tailored suggestions for specific GPU architectures:
  • Turing: Tensor core optimization, RT core utilization (see the sketch below)
  • Ampere: Sparse tensor operations, 2:4 structured sparsity
  • Ada Lovelace: Shader efficiency tuning, third-generation RT cores
  • Hopper: Transformer engine, thread block clusters
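
As one concrete instance of the tensor-core suggestions above, here is a hedged sketch using the CUDA wmma API (available since Volta/Turing): one warp multiplies a single 16x16x16 FP16 tile into an FP32 accumulator. Generated suggestions would be tuned per architecture; this is illustrative only:

    #include <cuda_fp16.h>
    #include <mma.h>
    using namespace nvcuda;

    // One warp computes C = A * B for one 16x16x16 tile.
    // lda/ldb/ldc are the leading dimensions of the source matrices.
    __global__ void wmmaTile(const half* A, const half* B, float* C,
                             int lda, int ldb, int ldc) {
        wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> a;
        wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::col_major> b;
        wmma::fragment<wmma::accumulator, 16, 16, 16, float> acc;

        wmma::fill_fragment(acc, 0.0f);
        wmma::load_matrix_sync(a, A, lda);
        wmma::load_matrix_sync(b, B, ldb);
        wmma::mma_sync(acc, a, b, acc);
        wmma::store_matrix_sync(C, acc, ldc, wmma::mem_row_major);
    }
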
macOS Support: Coming soon with serverless profiling capabilities