

Intel Gaudi AI Accelerators
API endpoints for serverless and dedicated model hosting.
Denvr AI Inference
Deploy and scale your GenAI applications with foundation and private models


Experience Intel Gaudi: Price-Performant AI Inference at Scale with OpenAI API compatibility.
Denvr AI Inference Services



Managed Endpoints
Intel Collaboration
Maximized Efficiency
Use OpenAI-compatible APIs with leading open-source models such as Llama, Qwen, and DeepSeek. Shared and dedicated deployments are supported for reliability and privacy.
A partnership with Intel to develop enterprise-ready inference engines, powered by cost-efficient Intel Gaudi 2 AI accelerators.
Leverage Intel Gaudi 2 AI acceleration to enhance compute density, lower infrastructure costs, and drive AI scalability across demanding enterprise applications.

Foundation Models Supported

| Model Name       | Params                | Context | Precision |
|------------------|-----------------------|---------|-----------|
| Llama 3.3        | 70B                   | 32k     | BF16      |
| Llama 3.2        | 1B, 3B                | 32k     | BF16      |
| Llama 3.1        | 8B, 70B               | 32k     | BF16      |
| Llama 3.1        | 405B (available soon) | 32k     | FP8       |
| DeepSeek-R1      | 671B (available soon) | 32k     | FP8       |
| Mistral v0.1     | 7B, 8x7B              | 32k     | BF16      |
| Qwen 2.5         | 7B, 14B, 32B, 72B     | 32k     | BF16      |
| Falcon 3         | 7B, 10B               | 32k     | BF16      |
| ALLaM-AI Preview | 7B                    | 32k     | FP16      |
| BGE M3 Embedder  | 108M                  | 8k      | BF16      |
| BGE M3 Reranker  | 568M (available soon) | 1k      | BF16      |
| Private models   | Any                   | Any     | Any       |
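The BGE M3 embedder listed above is reachable through the familiar OpenAI-style `/v1/embeddings` route. A minimal, stdlib-only sketch of the request shape; the base URL and model identifier below are illustrative assumptions, not published values, so substitute those from your Denvr account:

```python
import json
import urllib.request

def embeddings_request(base_url: str, api_key: str, texts: list[str]) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style embeddings request.

    The model identifier "BAAI/bge-m3" is an assumption based on the
    upstream model name; check the published model list for the exact string.
    """
    body = json.dumps({"model": "BAAI/bge-m3", "input": texts}).encode()
    return urllib.request.Request(
        f"{base_url}/embeddings",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

# Hypothetical endpoint -- replace with the real service URL and key.
req = embeddings_request("https://api.denvr.example/v1", "YOUR_API_KEY", ["hello world"])
print(req.full_url)
```

Sending the request with `urllib.request.urlopen(req)` would return the usual OpenAI-style JSON body containing a `data` list of embedding vectors.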
-> Native OpenAI API compatibility, allowing quick model migration and efficient inference deployment.
-> Cost-efficient managed services that reduce your hosting and operational expenses.
-> Model serving optimized for either interactive latency or batch throughput.
-> Serverless endpoints limited to published models and up to 60 requests per second.
-> Private endpoints for predictable performance and no rate limits.
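Because the endpoints mirror the OpenAI API, migrating an existing application is mostly a matter of swapping the base URL and API key. A hedged sketch of a chat-completions payload; the model identifier is an assumption inferred from the Llama 3.3 table entry, not a confirmed name:

```python
import json

def build_chat_request(model: str, prompt: str, max_tokens: int = 256) -> dict:
    """Build an OpenAI-style /chat/completions request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

# Model name is a guess based on the table above -- verify before use.
payload = build_chat_request("meta-llama/Llama-3.3-70B-Instruct", "Hello!")
print(json.dumps(payload, indent=2))
```

With the official `openai` Python client, the same payload could be sent by constructing `OpenAI(base_url=..., api_key=...)` against the service's endpoint and calling `client.chat.completions.create(**payload)`; no other application changes should be needed.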
Beta Program
Partner Feedback Goals
-> Validate use of serverless endpoints and preferred models
-> Validate use of private endpoints for user workloads
-> Validate performance requirements (time-to-first-token and inter-token latency)
-> Provide feedback on feature prioritization
-> Consult on pricing and SLA requirements
Pricing guidance for serverless is expected to be in line with the market; private endpoints are priced at $2.50 per Intel Gaudi 2 accelerator-hour (on-demand), with discounts for term-based commitments. Pricing is under review with beta partners.
Upcoming Features
-> UI and API access for serverless and private endpoints
-> Full context size models (128K)
-> Self-service API key management
-> On-demand management of private endpoints
-> Model fine-tuning workflow
-> Detailed utilization metrics
-> Flexible billing via prepaid VISA or post-paid invoice