How to run Meta Llama 3.1 405B with Nebius AI Studio API

Victor king Oshimua

Click Create API Key to generate your key to access the API.
3. Set Up Your API Key
It’s recommended to store your API key in an environment variable for security reasons. Here’s how to do that:
On MacOS/Linux:
Open your terminal and run the following command:
export NEBIUS_API_KEY="your_nebius_api_key"

On Windows:
Open your Command Prompt and run the following command:
set NEBIUS_API_KEY="your_nebius_api_key"

Make sure to replace "your_nebius_api_key" with your actual API key.
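To confirm the variable is actually visible to your programs, you can read it back in Python. The small `key_status` helper below is purely illustrative (not part of any SDK); it just wraps the same `os.environ.get` lookup the client code uses later:

```python
import os

def key_status(env):
    """Report whether the Nebius API key is present in an environment mapping."""
    return "Key found" if env.get("NEBIUS_API_KEY") else "NEBIUS_API_KEY is not set"

# Check the real environment
print(key_status(os.environ))
```

If this prints "NEBIUS_API_KEY is not set", re-run the export/set command in the same shell session before continuing.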
4. Install the OpenAI SDK
Depending on the programming language you’re using, install the appropriate SDK to interact with Nebius AI’s API.
For Python:
Open your terminal and run:
pip install openai
For JavaScript:
If you’re using Node.js, run the following command in your terminal:
npm install openai

Understanding Llama 3.1 405B

With the Llama 3.1 model, Meta has set a new standard in using large-scale models for language-based AI applications. Llama 3.1 405B is designed to handle large-scale natural language processing (NLP) tasks. As the name suggests, this model comes with 405 billion parameters, which outscales previous versions in both complexity and performance.
Key features
Large parameter count: With 405 billion parameters, Llama 3.1 ranks among the largest models for NLP.
Pretraining on diverse data: Trained on a wide range of multilingual data, Llama 3.1 has a strong ability to understand different languages and retain knowledge.
Scalability: Although large, Llama 3.1 is optimized for efficiency on distributed systems. This makes it suitable for applications that demand computational power.
Open source: Developers can freely access and build on the model.

API overview

The Nebius AI Studio API allows you to interact with various advanced models via an OpenAI-compatible interface. This API simplifies the process of building AI-powered applications by offering flexibility across different development environments. If you’re familiar with OpenAI’s API, you can use Studio with minimal changes to your code.
API access methods
Python SDK: Using the OpenAI package for an easy setup, you can interact with the service efficiently. This method lets Python developers quickly make API requests and integrate them into Python-based applications.
cURL: Ideal for users who prefer command-line tools, the service supports cURL for making API calls. This method is perfect for quick testing, automation scripts, or when working in environments that don’t require full-fledged SDKs.
JavaScript SDK: For web-based developments, JavaScript SDK provides a straightforward way to integrate available models directly into your applications.
Token limits
Studio applies rate limits based on the model you’re using. Essentially, this means that it restricts the units of text that can be processed by the API within a given time frame, depending on the model. Here’s the breakdown:
Meta-Llama-3.1-405B-Instruct: This specific model allows up to 100,000 tokens to be processed per minute.
All other models: The limit is higher for the rest of the available models; you get up to 200,000 tokens per minute.
These limits prevent system overload and ensure resources are distributed efficiently. Despite these optimizations, models on Nebius AI Studio maintain 99% of the original quality, so you get near-identical output with improved performance.
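If your application might exceed these per-minute limits, a common pattern is to wrap API calls in a retry with exponential backoff. The sketch below is illustrative and not part of the Nebius or OpenAI SDKs; with the real SDK you would catch `openai.RateLimitError` rather than the stand-in `RuntimeError` used here:

```python
import time

def with_retries(request_fn, max_attempts=4, base_delay=1.0):
    """Call request_fn, backing off exponentially when the API rate-limits us."""
    for attempt in range(max_attempts):
        try:
            return request_fn()
        except RuntimeError:  # stand-in for openai.RateLimitError
            if attempt == max_attempts - 1:
                raise  # give up after the final attempt
            time.sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
```

In practice you would pass a closure, e.g. `with_retries(lambda: client.chat.completions.create(...))`.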

How to make API calls to an LLM

You can run an LLM with the Studio API in four simple steps. Here’s how it works:
Initialise the API Client: This is where you set things up. You load the required library (in this case, Python’s OpenAI SDK) and input your API key. This key acts as a password, giving you access to Nebius AI’s models. Essentially, you’re telling your program, “This is the service I want to use, and here’s the access key.”
Create a request: In this step, you decide what you want the model to do. You specify the model (like “Meta-Llama-3.1”) and provide input text or prompts. You can also customise parameters like max_tokens (how long the response should be) or temperature (how creative or random the model should be in its response).
Send the API request: Now you’re ready to send the request to Nebius AI. This is done by making a POST request to the API endpoint (e.g., /v1/chat/completions), where you include the model, the input message, and your API key. It’s like sending a question or task to the model.
Receive and process the response: After the request is sent, the API returns a response from the model. This could be an answer to your prompt, a piece of text, or any other completion task. You can then use this response in your application, whether it’s part of a chatbot, a content generator, or a research tool.
We will see how these steps work in practice by exploring the various API access methods and how they add flexibility to your development lifecycle.

Accessing the API with Python

If you want to integrate Llama 3.1 405B into your data science pipelines, get started with Python to access the API. Here is how:
Step 1: Initialise the API client
This step sets up the connection to Nebius AI’s models. Import the required libraries and pass in your API key using the environment variable.
import os
from openai import OpenAI

# Initialise the API client
client = OpenAI(
    base_url="https://api.studio.nebius.ai/v1/",
    api_key=os.environ.get("NEBIUS_API_KEY"),
)

Step 2: Create a request
In this step, you specify which AI model you want to use and what input (prompt) you want to send. In this case, you’re running the Llama 3.1 405B model. You’ll also customise options like temperature to control the creativity of the model’s responses.
# Create the request with your prompt
completion = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3.1-405B-Instruct",
    messages=[
        {
            "role": "user",
            "content": "What is an API?"
        }
    ],
    temperature=0.6
)

Step 3: Send the request
This part is already handled when you call the .create() method on the client. Behind the scenes, this sends a request to the Nebius API and waits for a response.
Step 4: Receive and process the response
Once the model processes your request, it will return a response. You can print this response or use it in your application.
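In the OpenAI-compatible response object, the reply text lives at `choices[0].message.content`. The small helper below simply names that attribute path; `extract_reply` is illustrative, not part of the SDK:

```python
def extract_reply(completion):
    """Return the assistant's text from a chat completion response."""
    return completion.choices[0].message.content

# With the `completion` object from Step 2:
# print(extract_reply(completion))
```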

Accessing the API with JavaScript

The JavaScript SDK is perfect for embedding Llama 3.1 in a real-time web app. Here is how:
Step 1: Initialise the API client
In this step, you set up the API client by importing the OpenAI JavaScript SDK and configuring it with your API key. The baseURL is set to Nebius AI’s endpoint.
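Assuming the same endpoint and environment variable as the Python example, the configuration with the openai npm package looks like this (a sketch, mirroring the Python client setup above):

```javascript
// Initialise the API client against Nebius AI Studio's endpoint
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.studio.nebius.ai/v1/",
  apiKey: process.env.NEBIUS_API_KEY,
});
```

From here, the request flow matches the Python steps: call `client.chat.completions.create(...)` with a model name and messages, then read the reply from the response.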

Posted Apr 2, 2025

