
Deploy your own AI Vibe Coding platform with All Hands AI, Devstral, and vLLM on Denvr Cloud
The landscape of software development is undergoing a profound transformation, driven by the ever-evolving capabilities of Artificial Intelligence. No longer confined to the realms of research, AI is now becoming an indispensable companion for developers, promising to revolutionize how we write, debug, and optimize code. This isn't just about faster autocompletion; it's about fostering a more intuitive, collaborative, and ultimately, more joyful coding experience – a concept we like to call "Vibe Coding."
In this article, we'll cover:
Vibe Coding: What it is, why it matters, and how it's reshaping modern development workflows.
All Hands AI: Its core functionalities, benefits, and why pairing it with Devstral and VLLM creates an unparalleled AI development environment.
A meticulous, step-by-step deployment guide designed for clarity and ease of execution.
Visual walkthroughs with screenshots and example outputs to ensure a smooth setup process.
Practical, real-world use cases and invaluable productivity tips to help you extract maximum value from this formidable tool.
The Evolution of Development: Embracing "Vibe Coding"
For too long, coding has been synonymous with solitary concentration, often punctuated by frustrating debugging sessions or tedious boilerplate generation. While problem-solving remains at the heart of development, the rise of AI offers an opportunity to shift the focus from repetitive tasks to creative innovation. This is the essence of Vibe Coding: leveraging intelligent tools to create a coding experience that is not just efficient, but also deeply engaging, intuitive, and collaborative.
Vibe Coding goes far beyond traditional autocompletion. It envisions an AI assistant that truly understands your project's context, your coding style, and even your intentions. Imagine an AI that:
Suggests meaningful improvements to your code, identifying potential optimizations before you even think of them.
Catches subtle bugs early, reducing the time spent on arduous debugging cycles.
Generates boilerplate code, tests, or even complex refactors with minimal input, freeing you to concentrate on higher-level logic.
Facilitates seamless collaboration by providing context-aware suggestions across team members' contributions.
The ultimate goal of Vibe Coding is to empower developers to maintain a state of "flow" – that deeply focused, highly productive mental state where coding feels effortless and almost meditative. Tools that champion Vibe Coding, such as All Hands AI, act as intelligent pair programmers, absorbing your project's nuances and proactively assisting you. This paradigm shift aims to reduce friction, eliminate mundane tasks, and ultimately, bring the joy back to writing code, especially in complex team or enterprise environments where context switching and large codebases can often be overwhelming.
All Hands AI: Your Personal, Private, and Powerful AI Coding Assistant
In a world increasingly reliant on cloud-based services, the idea of a self-hosted AI assistant might seem counter-intuitive to some, but for many, it represents the pinnacle of control, security, and customization. This is precisely where All Hands AI shines. It's an open-source, VS Code-compatible AI coding assistant that offers a compelling alternative to proprietary solutions like GitHub Copilot, Cursor, or Cody.
What makes All Hands AI stand out? Its core philosophy revolves around empowering developers with:
Complete Self-Hosting: You retain full control over your code and your AI environment. No data leaves your infrastructure, ensuring maximum security and privacy, a critical concern for sensitive projects or enterprise-level development. This eliminates potential compliance headaches and gives you peace of mind.
Unparalleled Customization: Every development team has unique workflows, coding standards, and project-specific requirements. All Hands AI is designed to be highly adaptable, allowing you to fine-tune its behavior, integrate it with your existing toolchain, and even train it on your private repositories for hyper-contextual suggestions. This level of flexibility is often absent in off-the-shelf solutions.
Security and Privacy by Design: By running All Hands AI in your own environment, you circumvent the risks associated with third-party data exposure and vendor lock-in. Your intellectual property remains yours, always. This is particularly valuable for companies dealing with proprietary algorithms, confidential data, or regulated industries where data sovereignty is paramount.
Imagine an internal version of GitHub Copilot, but one that's specifically tuned to your internal repositories, your company's coding conventions, and your unique infrastructure. That's the promise of All Hands AI – a secure, private, and customizable AI companion tailored to your team's exact needs.
The Unbeatable Trio: Devstral + VLLM on Denvr Cloud
While All Hands AI provides the framework for your AI coding assistant, its true power is unlocked when paired with a highly capable underlying Large Language Model (LLM) and an efficient inference engine. This is where Devstral and VLLM come into play, forming an unbeatable trio that, when deployed on Denvr Cloud, creates an unparalleled development experience.
Devstral: The Agentic LLM for Software Engineering
Devstral is not just any LLM; it's an agentic LLM specifically engineered for software engineering tasks. Developed through a synergistic collaboration between Mistral AI and All Hands AI, Devstral is designed to go beyond simple text generation. It excels at:
Using Tools to Explore Codebases: Devstral can intelligently navigate and understand complex code structures, identify dependencies, and locate relevant sections of code for a given task. This "agentic" capability means it can interact with your development environment, similar to how a human developer would.
Editing Multiple Files: Unlike simpler models, Devstral can handle tasks that span across multiple files, making consistent and context-aware changes throughout your project. This is crucial for refactoring, implementing new features, or fixing cross-cutting concerns.
Powering Software Engineering Agents: Its design makes it ideal for building sophisticated AI agents that can automate complex development workflows, from generating comprehensive test suites to performing security audits.
The impressive performance of Devstral on the SWE-bench benchmark is a testament to its capabilities, positioning it as a leading open-source model for software development tasks. This means you're leveraging an LLM that's not just powerful, but also purpose-built for the challenges of coding.
vLLM: High-Throughput and Low-Latency LLM Serving
Even the most intelligent LLM needs an efficient way to serve its inferences. This is where vLLM enters the picture: an open-source inference library designed for high-throughput, low-latency LLM serving. It significantly optimizes the inference process, ensuring that your All Hands AI assistant responds swiftly and efficiently to your prompts, even when serving large models like Devstral. vLLM achieves this through techniques like PagedAttention, which manages GPU memory in fixed-size blocks to raise throughput and reduce latency.
Denvr Cloud: The Ideal Deployment Platform
While self-hosting on your local machine is certainly an option, leveraging a dedicated GPU cloud provider like Denvr Cloud offers distinct advantages, particularly for teams or individuals requiring scalable, high-performance infrastructure. Running the All Hands AI + Devstral + VLLM stack on Denvr Cloud provides:
🚀 Access to GPU-Accelerated Infrastructure: Denvr Cloud offers a range of powerful accelerators, including NVIDIA A100 and H100 GPUs and Intel Gaudi2. This class of hardware is essential for running large language models like Devstral efficiently, ensuring rapid inference and a smooth user experience.
🛠️ Seamless Integration with Container-Based Deployment Pipelines: Denvr Cloud's environment is highly compatible with containerization technologies like Docker, simplifying deployment, scaling, and management of your AI services. This streamlines your DevOps workflow and ensures consistency across environments.
🔐 Enhanced Control and Isolation for Enterprise Teams: For larger organizations, Denvr Cloud provides the necessary infrastructure for secure, isolated environments, meeting enterprise-grade security and compliance requirements. This allows teams to deploy and manage their AI assistants with confidence, knowing their data and operations are secure.
The synergy of these four components (All Hands AI for the assistant framework, Devstral as the intelligent backbone, vLLM for optimized inference, and Denvr Cloud for scalable, secure infrastructure) creates a truly powerful and versatile AI coding platform.
Prerequisites
To follow this tutorial, you'll need:
A Denvr Cloud account with GPU resources provisioned.
A Hugging Face account and an access token (optional).
Step-by-Step Setup
1. Launch a GPU Instance on Denvr Cloud
Select an A100 or H100 node.
Choose your preferred OS (Ubuntu 22.04 recommended).
Enable Docker support and SSH access.
✅ Tip: Use a VM template with Nvidia drivers preinstalled to save time.
1.1 After logging in to the VM, add the ubuntu user to the docker group (log out and back in for the change to take effect):
sudo usermod -a -G docker ubuntu
1.2 Clone the repo with all instructions and docker compose file
git clone https://github.com/denvrdata/examples.git
2. Install the Latest Version of vLLM
Install vLLM and upgrade pyOpenSSL:
pip install vllm --upgrade
pip install --upgrade pyopenssl
3. Download and Run the Devstral-Small-2505 Model
Spin up vLLM inference across all GPUs, inside a screen session so it survives SSH disconnects (set --tensor-parallel-size to the number of GPUs on your node):
screen -S vllm
vllm serve mistralai/Devstral-Small-2505 --tokenizer-mode mistral --config-format mistral --load-format mistral --tool-call-parser mistral --enable-auto-tool-choice --tensor-parallel-size 8
The first run downloads the model weights, which may take some time. Once the server is running, check its status using the llm_ping.py script:
python3 llm_ping.py <IP address>
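If you prefer to check the endpoint by hand, a minimal sketch along these lines works against any OpenAI-compatible vLLM server (the function names here are illustrative and are not necessarily what llm_ping.py itself does):

```python
# Quick health check for an OpenAI-compatible vLLM server.
# NOTE: a sketch only -- the actual llm_ping.py script may differ.
import json
import urllib.request

def models_url(host: str, port: int = 8000) -> str:
    """Build the /v1/models URL for a vLLM server (default port 8000)."""
    return f"http://{host}:{port}/v1/models"

def ping_llm(host: str, port: int = 8000, timeout: float = 10.0) -> list[str]:
    """Return the model IDs the server reports, or raise on failure."""
    with urllib.request.urlopen(models_url(host, port), timeout=timeout) as resp:
        payload = json.load(resp)
    return [m["id"] for m in payload.get("data", [])]
```

Calling `ping_llm("<IP address>")` from a REPL should list `mistralai/Devstral-Small-2505` once the server has finished loading the weights.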
4. Set Up the Prometheus Monitoring Stack
Run Prometheus and Grafana using Docker Compose for monitoring and visualization.
Start Docker Compose:
cd examples/all-hands-ai-mistral/
docker compose up -d
Grafana will be up at http://<IP>:3000.
Add Prometheus as a data source in Grafana.
Import the vLLM dashboard from the vllm_dashboard.json file available in the repo.
NVIDIA DCGM Dashboard: You can import the NVIDIA DCGM dashboard from Grafana for monitoring GPU metrics.
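Prometheus scrapes vLLM's built-in /metrics endpoint, where vLLM-specific series carry a vllm: prefix. To eyeball those metrics without opening Grafana, a small sketch like this will do (assuming the default port 8000):

```python
# Fetch vLLM's Prometheus metrics and keep only the vllm-specific series.
import urllib.request

def filter_vllm_metrics(text: str) -> list[str]:
    """Keep metric sample lines whose names start with the vllm: prefix."""
    return [line for line in text.splitlines() if line.startswith("vllm:")]

def scrape(host: str, port: int = 8000) -> list[str]:
    """Download /metrics from the vLLM server and filter it."""
    url = f"http://{host}:{port}/metrics"
    with urllib.request.urlopen(url, timeout=10.0) as resp:
        return filter_vllm_metrics(resp.read().decode())
```

The filtered output includes series such as vllm:num_requests_running, the same data the imported dashboard visualizes.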
5. Run the OpenHands App in Docker
Run OpenHands in Docker for a seamless, collaborative environment:
docker run -it --rm --pull=always \
-e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:0.38-nikolaik \
-e LOG_ALL_EVENTS=true \
-v /var/run/docker.sock:/var/run/docker.sock \
-v ~/.openhands-state:/.openhands-state \
-p 3001:3000 \
--add-host host.docker.internal:host-gateway \
--name openhands-app \
docker.all-hands.dev/all-hands-ai/openhands:0.38
Note: OpenHands' internal Uvicorn server listens on port 3000, but since Grafana already occupies host port 3000, OpenHands is mapped to host port 3001.
Once the application is running, visit http://<IP>:3001 in your browser.
6. Set Up Custom Inference
In the OpenHands settings, point the app at your self-hosted vLLM endpoint:
Custom Model: openai/mistralai/Devstral-Small-2505
Base URL: http://<your-server-url>:8000/v1
API Key: token
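To confirm the endpoint OpenHands will talk to is behaving, you can send it a chat completion directly. A minimal sketch, assuming the configuration above (the openai/ prefix in the Custom Model field is an OpenHands routing prefix; when calling vLLM directly, use the bare model name, and note that vLLM accepts any API key unless it was started with one):

```python
# Send one chat completion to the vLLM OpenAI-compatible endpoint.
import json
import urllib.request

def chat_request(model: str, prompt: str) -> dict:
    """Build the JSON body for a /v1/chat/completions call."""
    return {"model": model,
            "messages": [{"role": "user", "content": prompt}]}

def complete(base_url: str, model: str, prompt: str, api_key: str = "token") -> str:
    """POST the request and return the assistant's reply text."""
    body = json.dumps(chat_request(model, prompt)).encode()
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=body,
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {api_key}"})
    with urllib.request.urlopen(req, timeout=60.0) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

For example, `complete("http://<your-server-url>:8000/v1", "mistralai/Devstral-Small-2505", "Say hello")` should return a short greeting.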

OpenHands AI Vibe Coding Examples in Action


Here’s an example of the kind of output All Hands AI can generate:
Prompt:
Build a To-Do list app with the following requirements:
- Built using FastAPI and React.
- Make it a one-page app that:
  - Allows adding a task.
  - Allows deleting a task.
  - Allows marking a task as done.
  - Displays the list of tasks.
- Store the tasks in a SQLite database.
The output should look like this:

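For a rough idea of the persistence layer such a prompt yields, here is a minimal SQLite task store using only the standard library (names are illustrative; the app OpenHands generates will differ):

```python
# Minimal SQLite task store of the kind the prompt above asks for.
import sqlite3

def init_db(conn: sqlite3.Connection) -> None:
    """Create the tasks table if it does not exist yet."""
    conn.execute("""CREATE TABLE IF NOT EXISTS tasks (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        title TEXT NOT NULL,
        done INTEGER NOT NULL DEFAULT 0)""")

def add_task(conn: sqlite3.Connection, title: str) -> int:
    """Insert a task and return its row id."""
    cur = conn.execute("INSERT INTO tasks (title) VALUES (?)", (title,))
    return cur.lastrowid

def mark_done(conn: sqlite3.Connection, task_id: int) -> None:
    conn.execute("UPDATE tasks SET done = 1 WHERE id = ?", (task_id,))

def delete_task(conn: sqlite3.Connection, task_id: int) -> None:
    conn.execute("DELETE FROM tasks WHERE id = ?", (task_id,))

def list_tasks(conn: sqlite3.Connection) -> list[tuple]:
    """Return (id, title, done) rows in insertion order."""
    return list(conn.execute("SELECT id, title, done FROM tasks ORDER BY id"))
```

In the generated app, FastAPI route handlers would wrap these operations and the React front end would call them over HTTP.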
Real-World Use Cases
Here are some common ways teams use All Hands AI:
Syntax fixes across large codebases.
Security review assistants with custom rules.
Test generation for legacy code.
Context-aware suggestions using repository embeddings.
Final Thoughts
By combining All Hands AI, Devstral served through vLLM, and Denvr Cloud's GPU infrastructure, you're not just getting a GitHub Copilot alternative; you're building an enterprise-grade, secure coding platform for your team.
Have questions or want to collaborate on tuning it for your organization? Feel free to reach out!