Understanding the NVIDIA NGC specification file
The NVIDIA NGC specification YAML file defines metadata for NGC AI models, including multiple optimization profiles for each model. These profiles describe how each model is packaged and optimized for specific hardware and use cases (for example, latency or throughput tuning). The airgap script includes commands to iterate through the specification file and retrieve the repository ID for a given profile. The file follows this structure:
models:
- name: ...
  modelVariants:
  - variantId: ...
    optimizationProfiles:
    - profileId: ...
The profileId of each optimizationProfile is the repository ID that you pass as the -ri argument to the script.
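The nesting above can be walked with a few loops. The following is a minimal sketch, not the actual airgap script implementation: the spec is shown as an already-parsed dict (in practice you would load the YAML file with a parser), and the helper name is illustrative.

```python
# Sketch: collect every profileId from a parsed NGC spec.
# The dict mirrors the models -> modelVariants -> optimizationProfiles
# structure described above; in practice it comes from the YAML file.
spec = {
    "models": [
        {
            "name": "E5 Embedding v5",
            "modelVariants": [
                {
                    "variantId": "E5 Embedding",
                    "optimizationProfiles": [
                        {"profileId": "nim/nvidia/nv-embedqa-e5-v5:5_FP16_onnx"},
                    ],
                },
            ],
        },
    ],
}

def collect_profile_ids(spec):
    """Walk models -> modelVariants -> optimizationProfiles."""
    ids = []
    for model in spec.get("models", []):
        for variant in model.get("modelVariants", []):
            for profile in variant.get("optimizationProfiles", []):
                ids.append(profile["profileId"])
    return ids

print(collect_profile_ids(spec))
# → ['nim/nvidia/nv-embedqa-e5-v5:5_FP16_onnx']
```

Each collected ID is a candidate value for the -ri argument.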
The following example contains:
- one model: E5 Embedding v5
- one variant under modelVariants: E5 Embedding
- one optimizationProfile: nim/nvidia/nv-embedqa-e5-v5:5_FP16_onnx
models:
- name: E5 Embedding v5
  displayName: E5 Embedding v5
  modelHubID: e5-embedding-v5
  category: Embedding
  type: NGC
  description: NVIDIA NIM for GPU accelerated NVIDIA Retrieval QA E5 Embedding v5 inference
  modelVariants:
  - variantId: E5 Embedding
    displayName: E5 Embedding
    source:
      URL: https://catalog.ngc.nvidia.com/orgs/nim/teams/nvidia/containers/nv-embedqa-e5-v5
    optimizationProfiles:
    - profileId: nim/nvidia/nv-embedqa-e5-v5:5_FP16_onnx
      displayName: Embedding ONNX FP16
      framework: ONNX
      sha: onnx
      ngcMetadata:
        onnx:
          container_url: https://catalog.ngc.nvidia.com/containers
          model: nvidia/nv-embedqa-e5-v5
          model_type: embedding
          tags:
            llm_engine: onnx
          workspace: !workspace
            components:
            - dst: ''
              src:
                repo_id: ngc://nim/nvidia/nv-embedqa-e5-v5:5_tokenizer
            - dst: onnx
              src:
                repo_id: ngc://nim/nvidia/nv-embedqa-e5-v5:5_FP16_onnx
      modelFormat: onnx
      latestVersionSizeInBytes: 668847682
      spec:
      - key: DOWNLOAD SIZE
        value: 1GB
      - key: MAX TOKENS
        value: 512
      - key: Dimension
        value: 1024
      - key: NIM VERSION
        value: 1.0.1
The following command downloads this profile:
python3 import_to_airgap.py -do -rt ngc -p $PWD/models -ri nim/nvidia/nv-embedqa-e5-v5:5_FP16_onnx -ns ./ngc-spec.yaml
Optimization profile ID
To understand optimization profiles, examine the hardware-related segments of the following example profile ID:
nim/meta/llama-3.2-11b-vision-instruct:0.15.0.dev2024102300+ea8391c56-h100x2-fp8-latency.0.3.20143152
- h100: The NVIDIA GPU type required to run this model is H100.
- x2: Two H100 GPUs are required.
- fp8: The precision is FP8, an 8-bit floating-point format.
- latency: The profile is optimized for latency.
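These segments can be pulled out programmatically. The regular expression below is an assumption based on the naming convention shown above (GPU name, count, precision, optimization target), not a documented format.

```python
import re

# Sketch: extract GPU type, GPU count, precision, and optimization target
# from a profile ID. The pattern encodes the observed naming convention
# "-<gpu>x<count>-<precision>-<target>"; it is an assumption, not a spec.
PROFILE_RE = re.compile(r"-([a-z0-9]+)x(\d+)-(fp8|fp16|bf16)-(latency|throughput)")

def parse_profile_id(profile_id):
    m = PROFILE_RE.search(profile_id)
    if m is None:
        return None
    gpu, count, precision, target = m.groups()
    return {"gpu": gpu, "count": int(count), "precision": precision, "target": target}

pid = ("nim/meta/llama-3.2-11b-vision-instruct:"
      "0.15.0.dev2024102300+ea8391c56-h100x2-fp8-latency.0.3.20143152")
print(parse_profile_id(pid))
# → {'gpu': 'h100', 'count': 2, 'precision': 'fp8', 'target': 'latency'}
```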
Traversing the NVIDIA NGC specification file
The provided NGC specification file is nearly 5,000 lines long, making it tedious to manually locate the profile ID. To simplify this process, the airgap script includes commands to efficiently navigate through the NGC spec file.
Use the following commands to list all the models in the NGC specification file:
# List all models
python3 import_to_airgap.py -ns ./ngc-spec.yaml --list-all
=== ALL MODELS ===
1. Llama 3.2 Vision Instruct
Display Name: Llama 3.2 Vision Instruct
Category: Image to Text Generation
Hub ID: llama-3.2-vision-instruct
Description: The Llama 3.2 Vision instruction-tuned models are optimized for visual recognition, image reasoning,...
2. Mixtral Instruct
Display Name: Mixtral Instruct
Category: Text Generation
Hub ID: mixtral-instruct
Description: The Mixtral Large Language Model (LLM) is a pretrained generative Sparse Mixture of Experts model. M...
3. E5 Embedding v5
Display Name: E5 Embedding v5
Category: Embedding
Hub ID: e5-embedding-v5
Description: NVIDIA NIM for GPU accelerated NVIDIA Retrieval QA E5 Embedding v5 inference
To display all variants of a specific model, use the -m parameter to specify the model name from the list above, along with the --list-variants parameter to list all available model variants.
python3 import_to_airgap.py -ns ./ngc-spec.yaml -m "Llama 3.2 Vision Instruct" --list-variants
=== VARIANTS FOR 'LLAMA 3.2 VISION INSTRUCT' ===
1. Llama 3.2 11B Vision Instruct
2. Llama 3.2 90B Vision Instruct
To list all the optimization profiles for a given model and model variant, use the following command:
python3 import_to_airgap.py -ns ./ngc-spec.yaml -m "Llama 3.2 Vision Instruct" -vid "Llama 3.2 11B Vision Instruct" --list-profiles
=== OPTIMIZATION PROFILES FOR 'LLAMA 3.2 VISION INSTRUCT' VARIANT 'LLAMA 3.2 11B VISION INSTRUCT' ===
1. nim/meta/llama-3.2-11b-vision-instruct:0.15.0.dev2024102300+ea8391c56-h100x2-bf16-latency.0.3.20143152
2. nim/meta/llama-3.2-11b-vision-instruct:0.15.0.dev2024102300+ea8391c56-a10gx4-bf16-throughput.0.3.20143152
3. nim/meta/llama-3.2-11b-vision-instruct:0.15.0.dev2024102300+ea8391c56-a10gx8-bf16-latency.0.3.20143152
4. nim/meta/llama-3.2-11b-vision-instruct:0.15.0.dev2024102300+ea8391c56-h100x2-fp8-latency.0.3.20143152
....
Select an optimization profile that matches your hardware requirements and provide it as the repository ID using the -ri parameter in the airgap script to download that specific NGC model profile.
python3 import_to_airgap.py -do -rt ngc -p $PWD/models -ns ./ngc-spec.yaml -ri nim/meta/llama-3.2-11b-vision-instruct:0.15.0.dev2024102300+ea8391c56-h100x2-fp8-latency.0.3.20143152
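Rather than scanning the listed IDs by eye, the selection step can be sketched as a simple filter. The substring checks on the naming convention are an assumption, and the helper is illustrative, not part of the airgap script.

```python
# Sketch: narrow a list of profile IDs (as printed by --list-profiles)
# to those matching your hardware. Substring matching on the observed
# "-<gpu>x<count>-<precision>-" naming convention is an assumption.
profiles = [
    "nim/meta/llama-3.2-11b-vision-instruct:0.15.0.dev2024102300+ea8391c56-h100x2-bf16-latency.0.3.20143152",
    "nim/meta/llama-3.2-11b-vision-instruct:0.15.0.dev2024102300+ea8391c56-a10gx4-bf16-throughput.0.3.20143152",
    "nim/meta/llama-3.2-11b-vision-instruct:0.15.0.dev2024102300+ea8391c56-h100x2-fp8-latency.0.3.20143152",
]

def match(profiles, gpu="h100", precision="fp8"):
    """Keep only profiles for the given GPU type and precision."""
    return [p for p in profiles if f"-{gpu}x" in p and f"-{precision}-" in p]

for p in match(profiles):
    print(p)  # candidate value for the -ri argument
```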