

Intel Gaudi AI Accelerators
API endpoints for serverless and dedicated model hosting.
Denvr AI Inference
Deploy and scale your GenAI applications with foundation and private models


Experience Intel Gaudi: Price-Performant AI Inference at Scale with OpenAI API compatibility.
Denvr AI Inference Services



Managed Endpoints
Intel Collaboration
Maximized Efficiency
Use OpenAI-compatible APIs with leading open-source models such as Llama, Qwen, and DeepSeek. Shared and dedicated deployments are supported for reliability and privacy.
A partnership with Intel to develop enterprise-ready inference engines, powered by cost-efficient Intel Gaudi 2 AI accelerators.
Leverage Intel Gaudi 2 AI acceleration to enhance compute density, lower infrastructure costs, and drive AI scalability across demanding enterprise applications.

Foundation Models Supported

| Model Name       | Params                | Context | Precision |
|------------------|-----------------------|---------|-----------|
| Llama 3.3        | 70B                   | 32k     | BF16      |
| Llama 3.2        | 1B, 3B                | 32k     | BF16      |
| Llama 3.1        | 8B, 70B               | 32k     | BF16      |
| Llama 3.1        | 405B (available soon) | 32k     | FP8       |
| DeepSeek-R1      | 671B (available soon) | 32k     | FP8       |
| Mistral v0.1     | 7B, 8x7B              | 32k     | BF16      |
| Qwen 2.5         | 7B, 14B, 32B, 72B     | 32k     | BF16      |
| Falcon 3         | 7B, 10B               | 32k     | BF16      |
| ALLaM-AI Preview | 7B                    | 32k     | FP16      |
| BGE M3 Embedder  | 108M                  | 8k      | BF16      |
| BGE M3 Reranker  | 568M (available soon) | 1k      | BF16      |
| Private models   | Any                   | Any     | Any       |
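The BGE M3 embedder listed above is reachable through the familiar OpenAI-style `/v1/embeddings` route. A minimal, stdlib-only sketch of the request shape; the base URL and model identifier below are illustrative assumptions, not published values, so substitute those from your Denvr account:

```python
import json
import urllib.request

def embeddings_request(base_url: str, api_key: str, texts: list[str]) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style embeddings request.

    The model identifier "BAAI/bge-m3" is an assumption based on the
    upstream model name; check the published model list for the exact string.
    """
    body = json.dumps({"model": "BAAI/bge-m3", "input": texts}).encode()
    return urllib.request.Request(
        f"{base_url}/embeddings",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

# Hypothetical endpoint -- replace with the real service URL and key.
req = embeddings_request("https://api.denvr.example/v1", "YOUR_API_KEY", ["hello world"])
print(req.full_url)
```

Sending the request with `urllib.request.urlopen(req)` would return the usual OpenAI-style JSON body containing a `data` list of embedding vectors.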
-> Native OpenAI API compatibility, allowing quick model migration and efficient inference deployment.
-> Cost-efficient managed services that reduce your hosting and operational expenses.
-> Model serving optimized for either interactive latency or batch throughput.
-> Serverless endpoints limited to published models and up to 60 requests per second.
-> Private endpoints for predictable performance and no rate limits.
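Because the endpoints mirror the OpenAI API, migrating an existing application is mostly a matter of swapping the base URL and API key. A hedged sketch of a chat-completions payload; the model identifier is an assumption inferred from the Llama 3.3 table entry, not a confirmed name:

```python
import json

def build_chat_request(model: str, prompt: str, max_tokens: int = 256) -> dict:
    """Build an OpenAI-style /chat/completions request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

# Model name is a guess based on the table above -- verify before use.
payload = build_chat_request("meta-llama/Llama-3.3-70B-Instruct", "Hello!")
print(json.dumps(payload, indent=2))
```

With the official `openai` Python client, the same payload could be sent by constructing `OpenAI(base_url=..., api_key=...)` against the service's endpoint and calling `client.chat.completions.create(**payload)`; no other application changes should be needed.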
Beta Program
Partner Feedback Goals
-> Validate use of serverless endpoints and preferred models
-> Validate use of private endpoints for user workloads
-> Validate performance requirements (time-to-first-token and inter-token latency)
-> Provide feedback on feature prioritization
-> Consult on pricing and SLA requirements
Pricing guidance for serverless is expected to be in line with the market; private endpoints are priced at $2.50 per Intel Gaudi 2 accelerator-hour (on-demand), with discounts for term-based commitments. Pricing is under review with beta partners.
Upcoming Features
-> UI and API access for serverless and private endpoints
-> Full context size models (128K)
-> Self-service API key management
-> On-demand management of private endpoints
-> Model fine-tuning workflow
-> Detailed utilization metrics
-> Flexible billing via prepaid VISA or post-paid invoice