Understanding the NVIDIA NGC specification file

The airgap script includes commands to traverse the NGC specification file and retrieve the repository ID.

The NVIDIA NGC specification YAML file contains metadata for NGC AI models, including multiple optimization profiles for each model. Each profile describes how the model is packaged and optimized for specific hardware and use cases (for example, latency or throughput tuning). The file uses the following structure:

models:
  - name: ...
    modelVariants:
      - variantId: ...
        optimizationProfiles:
          - profileId: ...

The profileId of each optimizationProfile is the repository ID that you provide as the -ri argument to the script.
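The airgap script exposes --list options (described below) that perform this traversal for you. Purely as an illustration of the structure above, the following minimal Python sketch, assuming PyYAML is installed and the specification file is saved locally as ngc-spec.yaml, walks the same hierarchy and prints every profileId:

import yaml  # PyYAML; assumed to be installed

# Minimal sketch: walk the models -> modelVariants -> optimizationProfiles
# hierarchy described above and print every profileId (the value that the
# airgap script expects as its -ri argument). BaseLoader is used because the
# spec file can contain custom tags such as !workspace, which safe_load rejects.
with open("ngc-spec.yaml") as f:
    spec = yaml.load(f, Loader=yaml.BaseLoader)

for model in spec.get("models", []):
    print(model["name"])
    for variant in model.get("modelVariants", []):
        print(f"  variant: {variant['variantId']}")
        for profile in variant.get("optimizationProfiles", []):
            print(f"    profileId: {profile['profileId']}")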

The example NVIDIA NGC specification file provided below has the following details:
  • one model: E5 Embedding v5
  • one variant under modelVariants: E5 Embedding
  • one optimizationProfile: nim/nvidia/nv-embedqa-e5-v5:5_FP16_onnx

models:
- name: E5 Embedding v5
  displayName: E5 Embedding v5
  modelHubID: e5-embedding-v5
  category: Embedding
  type: NGC
  description: NVIDIA NIM for GPU accelerated NVIDIA Retrieval QA E5 Embedding v5
    inference
  modelVariants:
  - variantId: E5 Embedding
    displayName: E5 Embedding
    source:
      URL: https://catalog.ngc.nvidia.com/orgs/nim/teams/nvidia/containers/nv-embedqa-e5-v5
    optimizationProfiles:
    - profileId: nim/nvidia/nv-embedqa-e5-v5:5_FP16_onnx
      displayName: Embedding ONNX FP16
      framework: ONNX
      sha: onnx
      ngcMetadata:
        onnx:
          container_url: https://catalog.ngc.nvidia.com/containers
          model: nvidia/nv-embedqa-e5-v5
          model_type: embedding
          tags:
            llm_engine: onnx
          workspace: !workspace
            components:
            - dst: ''
              src:
                repo_id: ngc://nim/nvidia/nv-embedqa-e5-v5:5_tokenizer
            - dst: onnx
              src:
                repo_id: ngc://nim/nvidia/nv-embedqa-e5-v5:5_FP16_onnx
      modelFormat: onnx
      latestVersionSizeInBytes: 668847682
      spec:
      - key: DOWNLOAD SIZE
        value: 1GB
      - key: MAX TOKENS
        value: 512
      - key: Dimension
        value: 1024
      - key: NIM VERSION
        value: 1.0.1

To download this optimization profile with the airgap script, run the following command:
python3 import_to_airgap.py -do -rt ngc -p $PWD/models -ri nim/nvidia/nv-embedqa-e5-v5:5_FP16_onnx -ns ./ngc-spec.yaml

Optimization profile ID

To understand optimization profiles, pay attention to the h100x2-fp8-latency segment of the following example optimization profile ID:

nim/meta/llama-3.2-11b-vision-instruct:0.15.0.dev2024102300+ea8391c56-h100x2-fp8-latency.0.3.20143152

It conveys the following information (a short parsing sketch follows this list):
  • h100: The NVIDIA GPU type required to run this model is H100.
  • x2: The profile requires two H100 GPUs.
  • fp8: The precision is FP8, the 8-bit floating-point format.
  • latency: The profile is optimized for latency rather than throughput.
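
As an illustration only, the following Python sketch extracts these fields from a profile ID with a regular expression. The pattern is inferred from the example above and the profile listings later on this page, so it is an assumption rather than a documented format and may not match every profile ID:

import re

# Assumed, illustrative pattern: GPU type, GPU count, precision, and
# optimization target as they appear in the example profile ID above.
HARDWARE_SEGMENT = re.compile(
    r"-(?P<gpu>[a-z0-9]+?)x(?P<count>\d+)"
    r"-(?P<precision>fp8|fp16|bf16)-(?P<target>latency|throughput)"
)

profile_id = (
    "nim/meta/llama-3.2-11b-vision-instruct:"
    "0.15.0.dev2024102300+ea8391c56-h100x2-fp8-latency.0.3.20143152"
)

match = HARDWARE_SEGMENT.search(profile_id)
if match:
    print(match.groupdict())
    # {'gpu': 'h100', 'count': '2', 'precision': 'fp8', 'target': 'latency'}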

Traversing the NVIDIA NGC specification file

The provided NGC specification file is nearly 5,000 lines long, making it tedious to locate a profile ID manually. To simplify this process, the airgap script includes commands to navigate the NGC specification file efficiently.

Use the following command to list all the models in the NGC specification file:

# List all models
python3 import_to_airgap.py -ns ./ngc-spec.yaml --list-all

=== ALL MODELS ===
1. Llama 3.2 Vision Instruct
   Display Name: Llama 3.2 Vision Instruct
   Category: Image to Text Generation
   Hub ID: llama-3.2-vision-instruct
   Description: The Llama 3.2 Vision instruction-tuned models are optimized for visual recognition, image reasoning,...

2. Mixtral Instruct
   Display Name: Mixtral Instruct
   Category: Text Generation
   Hub ID: mixtral-instruct
   Description: The Mixtral Large Language Model (LLM) is a pretrained generative Sparse Mixture of Experts model. M...

3. E5 Embedding v5
   Display Name: E5 Embedding v5
   Category: Embedding
   Hub ID: e5-embedding-v5
   Description: NVIDIA NIM for GPU accelerated NVIDIA Retrieval QA E5 Embedding v5 inference

To display all variants of a specific model, use the -m parameter to specify the model name from the list above, along with the --list-variants parameter.

python3 import_to_airgap.py -ns ./ngc-spec.yaml -m "Llama 3.2 Vision Instruct" --list-variants

=== VARIANTS FOR 'LLAMA 3.2 VISION INSTRUCT' ===
1. Llama 3.2 11B  Vision Instruct
2. Llama 3.2 90B  Vision Instruct

To list all the optimization profiles for a given model and model variant, use the -vid parameter to specify the variant name, along with the --list-profiles parameter:

python3 import_to_airgap.py -ns ./ngc-spec.yaml -m "Llama 3.2 Vision Instruct" -vid "Llama 3.2 11B  Vision Instruct" --list-profiles

=== OPTIMIZATION PROFILES FOR 'LLAMA 3.2 VISION INSTRUCT' VARIANT 'LLAMA 3.2 11B  VISION INSTRUCT' ===
1. nim/meta/llama-3.2-11b-vision-instruct:0.15.0.dev2024102300+ea8391c56-h100x2-bf16-latency.0.3.20143152
2. nim/meta/llama-3.2-11b-vision-instruct:0.15.0.dev2024102300+ea8391c56-a10gx4-bf16-throughput.0.3.20143152
3. nim/meta/llama-3.2-11b-vision-instruct:0.15.0.dev2024102300+ea8391c56-a10gx8-bf16-latency.0.3.20143152
4. nim/meta/llama-3.2-11b-vision-instruct:0.15.0.dev2024102300+ea8391c56-h100x2-fp8-latency.0.3.20143152
....

Select an optimization profile that matches your hardware requirements and pass it as the repository ID with the -ri parameter of the airgap script to download that specific NGC model profile.

python3 import_to_airgap.py -do -rt ngc -p $PWD/models -ns ./ngc-spec.yaml -ri nim/meta/llama-3.2-11b-vision-instruct:0.15.0.dev2024102300+ea8391c56-h100x2-fp8-latency.0.3.20143152
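
If you prefer to shortlist candidate profiles programmatically instead of scanning the --list-profiles output, a simple substring filter over the profile IDs in the specification file is enough. The following is only a convenience sketch, not part of the airgap script; it assumes PyYAML, a local ngc-spec.yaml, and example filter tokens that you should replace with your own hardware requirements:

import yaml  # PyYAML; assumed to be installed

# Example tokens: H100 GPUs, FP8 precision, latency-optimized builds.
wanted = ("h100", "fp8", "latency")

with open("ngc-spec.yaml") as f:
    # BaseLoader tolerates custom tags such as !workspace in the spec file.
    spec = yaml.load(f, Loader=yaml.BaseLoader)

for model in spec.get("models", []):
    for variant in model.get("modelVariants", []):
        for profile in variant.get("optimizationProfiles", []):
            profile_id = profile["profileId"]
            if all(token in profile_id for token in wanted):
                print(f"{model['name']} / {variant['variantId']}: {profile_id}")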