
Denvr AI Inference Services
Price-performant inference at scale with OpenAI API compatibility.

Serverless Endpoints
Call leading open-source foundation models like Llama, Qwen, and DeepSeek through OpenAI-compatible APIs (see the sketch below these highlights).

Dedicated Endpoints
Leverage private endpoints for reliability and privacy, with support for open-weight and privately fine-tuned models.

Intel Collaboration
Partnership with Intel to develop enterprise-ready inference powered by cost-efficient Intel Xeon processors and Gaudi AI accelerators.
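
Because the serverless endpoints expose the OpenAI API, an existing client usually needs only a base-URL and key change. A minimal sketch using the official openai Python SDK; the base URL, API key, and model identifier below are placeholders, not published Denvr values:

```python
# Minimal sketch of calling a Denvr serverless endpoint through the
# standard OpenAI Python SDK. Base URL, key, and model name are
# placeholders -- substitute the values from your Denvr account.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.example-denvr-endpoint.com/v1",  # placeholder URL
    api_key="YOUR_DENVR_API_KEY",                          # placeholder key
)

response = client.chat.completions.create(
    model="meta-llama/Llama-3.3-70B-Instruct",  # assumed model identifier
    messages=[{"role": "user", "content": "Summarize Denvr AI Inference Services."}],
    max_tokens=128,
)
print(response.choices[0].message.content)
```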
Serverless Models Supported
Open-weight foundation models available with API endpoints for rapid integration.
MODEL NAME         | PARAMS            | CONTEXT | PRECISION
Llama 3.3          | 70B               | 32k     | BF16
Llama 3.2          | 1B, 3B            | 32k     | BF16
Llama 3.1          | 8B, 70B           | 32k     | BF16
Llama 3.1 (soon)   | 405B              | 32k     | FP8
DeepSeek R1 (soon) | 671B              | 32k     | FP8
Mistral v0.1       | 7B, 8x7B          | 32k     | BF16
Qwen 2.5           | 7B, 14B, 32B, 72B | 32k     | BF16
Falcon 3           | 7B, 10B           | 32k     | BF16
ALLaM-AI Preview   | 7B                | 32k     | BF16
BGE M3 Embedder    | 108M              | 8k      | BF16
BGE M3 Reranker    | 568M              | 1k      | BF16
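
The BGE M3 entries cover retrieval workloads as well as generation. A hedged sketch of calling the embedder through the OpenAI-compatible embeddings route; the base URL, API key, and model identifier are illustrative assumptions, not documented Denvr identifiers:

```python
# Hypothetical sketch: generating embeddings with BGE M3 via the
# OpenAI-compatible /v1/embeddings route. All endpoint details and
# the model name are assumptions.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.example-denvr-endpoint.com/v1",  # placeholder
    api_key="YOUR_DENVR_API_KEY",                          # placeholder
)

result = client.embeddings.create(
    model="BAAI/bge-m3",  # assumed identifier for the BGE M3 embedder
    input=["price-performant inference at scale"],
)
vector = result.data[0].embedding
print(len(vector))  # dimensionality of the returned embedding
```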
-> Native OpenAI API compatibility enables quick model migration and efficient inference deployment.
-> Cost-efficient managed services reduce your hosting and operational expenses.
-> Model serving optimized for first-token latency or batch throughput.
-> Serverless endpoints are limited to published models and up to 60 requests per second (see the backoff sketch after this list).
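
Given the 60 requests-per-second ceiling, clients should expect occasional HTTP 429 responses under load. A sketch of exponential backoff with the openai SDK; the endpoint details and model name are assumptions:

```python
# Sketch of client-side handling for the serverless rate limit:
# retry with exponential backoff when the endpoint returns HTTP 429.
import time

from openai import OpenAI, RateLimitError

client = OpenAI(
    base_url="https://api.example-denvr-endpoint.com/v1",  # placeholder
    api_key="YOUR_DENVR_API_KEY",                          # placeholder
)

def chat_with_backoff(prompt: str, retries: int = 5) -> str:
    delay = 0.5
    for _ in range(retries):
        try:
            resp = client.chat.completions.create(
                model="meta-llama/Llama-3.1-8B-Instruct",  # assumed identifier
                messages=[{"role": "user", "content": prompt}],
            )
            return resp.choices[0].message.content
        except RateLimitError:
            time.sleep(delay)  # back off, then retry
            delay *= 2
    raise RuntimeError("rate limited after all retries")
```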
Early Access Program
Apply now for early access to Denvr AI Inference Services.
Partner Feedback Goals
-> Validate the use of serverless models and identify preferred models
-> Validate the use of private endpoints for user workloads
-> Validate performance requirements (time-to-first-token and inter-token latency; see the measurement sketch after this list)
-> Provide feedback on feature prioritization
-> Consult on pricing and SLA requirements
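
One way partners can measure the latency targets above is to stream a completion and timestamp each token. A minimal sketch; the endpoint, key, and model identifier are placeholders rather than documented values:

```python
# Sketch: measure time-to-first-token (TTFT) and mean inter-token
# latency (ITL) over a streaming chat completion.
import time

from openai import OpenAI

client = OpenAI(
    base_url="https://api.example-denvr-endpoint.com/v1",  # placeholder
    api_key="YOUR_DENVR_API_KEY",                          # placeholder
)

start = time.perf_counter()
stamps = []
stream = client.chat.completions.create(
    model="meta-llama/Llama-3.3-70B-Instruct",  # assumed identifier
    messages=[{"role": "user", "content": "Write a haiku about GPUs."}],
    stream=True,
)
for chunk in stream:
    # skip role-only or empty chunks; record arrival time of each token
    if chunk.choices and chunk.choices[0].delta.content:
        stamps.append(time.perf_counter())

ttft = stamps[0] - start
itl = (stamps[-1] - stamps[0]) / max(len(stamps) - 1, 1)
print(f"TTFT: {ttft:.3f}s, mean ITL: {itl * 1000:.1f}ms")
```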
Upcoming Features
-> UI and API access for serverless and private endpoints
-> Full context size models (128K)
-> Self-service API key management
-> On-demand management of private endpoints
-> Model fine-tuning workflow
-> Detailed utilization metrics
-> Flexible billing via pre-paid VISA or post-paid Invoice
Pricing guidance is in line with the market for serverless; private endpoints are priced at $2.50 per Intel Gaudi 2 accelerator hour (on-demand), with discounts for term-based commitments. Pricing is under review with Early Access Partners.
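For rough budgeting only (an illustration at the on-demand rate, not a quote): one Gaudi 2 accelerator running around the clock costs about $2.50 × 730 hours ≈ $1,825 per month, before any term-commitment discount.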
Easy Deployment of Models with Denvr AI Inference Services
High-value inference with no unnecessary overhead.


Pre-Trained Models
Easy access and deployment of common ready-to-use (pre-trained) AI models.

No Hardware Management
Model deployment requires no management, maintenance, or operational overhead for hardware infrastructure.

Custom Model Support
Support for custom model hosting and deployment.

Adaptability
Experiment with different stack configurations and optimize compute resources dynamically.

Pay-Per-Use
Only pay for the compute resources used, reducing costs and eliminating wastage.

Flexibility & Scalability
Scale compute resources up or down quickly based on immediate needs.

Immediate Access
