
Added API to get multi model deployment config #1055

Conversation

lu-ohai
Member

@lu-ohai lu-ohai commented Feb 4, 2025

Added API to get multi model deployment config

Input/Output format

  • The input format for the multi-model deployment config is as follows:
{
    "shape": [
        "VM.GPU.A10.2",
        "VM.GPU.A10.4",
        "BM.GPU.A100-v2.8",
        "BM.GPU.H100.8"
    ],
    "configuration": {
        "VM.GPU.A10.4": {
            "parameters": {
                "VLLM_PARAMS": "--trust-remote-code --max-model-len 60000"
            },
            "multi_model_deployment": [
                {
                    "gpu_count": 1,
                    "parameters": {
                        "VLLM_PARAMS": "--trust-remote-code --max-model-len 32000"
                    }
                },
                {
                    "gpu_count": 2,
                    "parameters": {
                        "VLLM_PARAMS": "--trust-remote-code --max-model-len 32000"
                    }
                }
            ]
        }
    }
}
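To make the shape of this payload concrete, here is a small sketch that walks a config like the one above. The `gpu_options` helper is hypothetical, not part of the API; it just maps each supported `gpu_count` under a shape to its `VLLM_PARAMS` string:

```python
import json

# Hypothetical helper (not part of the API): for a given shape, map each
# supported gpu_count in "multi_model_deployment" to its VLLM_PARAMS string.
def gpu_options(config: dict, shape: str) -> dict:
    entry = config.get("configuration", {}).get(shape, {})
    return {
        item["gpu_count"]: item["parameters"]["VLLM_PARAMS"]
        for item in entry.get("multi_model_deployment", [])
    }

# Trimmed-down version of the input config shown above.
config = json.loads("""
{
    "shape": ["VM.GPU.A10.2", "VM.GPU.A10.4"],
    "configuration": {
        "VM.GPU.A10.4": {
            "parameters": {"VLLM_PARAMS": "--trust-remote-code --max-model-len 60000"},
            "multi_model_deployment": [
                {"gpu_count": 1, "parameters": {"VLLM_PARAMS": "--trust-remote-code --max-model-len 32000"}},
                {"gpu_count": 2, "parameters": {"VLLM_PARAMS": "--trust-remote-code --max-model-len 32000"}}
            ]
        }
    }
}
""")

print(gpu_options(config, "VM.GPU.A10.4"))
# {1: '--trust-remote-code --max-model-len 32000', 2: '--trust-remote-code --max-model-len 32000'}
```

Shapes listed in "shape" but absent from "configuration" (like VM.GPU.A10.2 here) simply yield no per-GPU-count options.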
  • The output response format is as follows:
{
    "deployment_config": {
        "model_ocid_1": {
            "shape": [
                "BM.GPU.A10.4",
                "BM.GPU4.8",
                "BM.GPU.L40S-NC.4",
                "BM.GPU.A100-v2.8",
                "BM.GPU.H100.8"
            ],
            "configuration": {
                "BM.GPU.A10.4": {
                    "parameters": {
                        "VLLM_PARAMS": "--enforce-eager --max-num-seqs 16 --max-model-len 65536"
                    },
                    "multi_model_deployment": [
                      {
                          "gpu_count": 1,
                          "parameters": {
                              "VLLM_PARAMS": "--trust-remote-code --max-model-len 32000"
                          }
                      },
                      {
                          "gpu_count": 2,
                          "parameters": {
                              "VLLM_PARAMS": "--trust-remote-code --max-model-len 32000"
                          }
                      }
                  ]
                }
            }
        },
        "model_ocid_2": {
            "shape": [
                "BM.GPU.A10.4",
                "BM.GPU4.8",
                "BM.GPU.L40S-NC.4",
                "BM.GPU.A100-v2.8",
                "BM.GPU.H100.8"
            ],
            "configuration": {
                "BM.GPU.A10.4": {
                    "parameters": {
                        "VLLM_PARAMS": "--enforce-eager --max-num-seqs 16 --max-model-len 65536"
                    },
                    "multi_model_deployment": [
                      {
                          "gpu_count": 1,
                          "parameters": {
                              "VLLM_PARAMS": "--trust-remote-code --max-model-len 32000"
                          }
                      },
                      {
                          "gpu_count": 2,
                          "parameters": {
                              "VLLM_PARAMS": "--trust-remote-code --max-model-len 32000"
                          }
                      }
                  ]
                }
            }
        }
    },
    "gpu_allocation": {
        "VM.GPU.A10.4": {
            "models": [
                {
                    "ocid": "model_ocid_1",
                    "gpu_count": 2
                },
                {
                    "ocid": "model_ocid_2",
                    "gpu_count": 2
                }
            ],
            "total_gpus_available": 4
        }
    }
}
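The gpu_allocation block above can be reproduced with a simple brute-force feasibility check. This is a hedged sketch only, not the actual allocation heuristic the API implements; the `allocate_gpus` name and its tie-breaking rules (prefer the primary model's count, then total GPUs used) are assumptions:

```python
from itertools import product

def allocate_gpus(model_counts, total_gpus, primary=None):
    """model_counts: {model_ocid: [allowed gpu counts]} for one shape.
    Returns {model_ocid: gpu_count} fitting within total_gpus, or None
    when no combination fits. Prefers the primary model's gpu_count
    first (when given), then the total number of GPUs used."""
    models = list(model_counts)
    best_key, best_alloc = None, None
    for combo in product(*(model_counts[m] for m in models)):
        if sum(combo) > total_gpus:
            continue  # combination does not fit on this shape
        alloc = dict(zip(models, combo))
        key = (alloc[primary] if primary else 0, sum(combo))
        if best_key is None or key > best_key:
            best_key, best_alloc = key, alloc
    return best_alloc

# Mirrors the example output: both models on VM.GPU.A10.4 with 4 GPUs total.
print(allocate_gpus(
    {"model_ocid_1": [1, 2], "model_ocid_2": [1, 2]}, total_gpus=4))
# {'model_ocid_1': 2, 'model_ocid_2': 2}
```

When `primary` is set, a tighter budget still gives the primary model the maximum count it supports, matching the "primary model id provided" behavior shown in the notebook screenshots.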

Notebook

  • No possible GPU allocations available
  • No primary model id provided
  • Primary model id provided (id ending with jwsq); it gets the maximum GPU count.

oracle-contributor-agreement bot added the OCA Verified label (all contributors have signed the Oracle Contributor Agreement) on Feb 4, 2025
github-actions bot commented Feb 4, 2025

📌 Cov diff with main: 87%

📌 Overall coverage: 56.84%

github-actions bot commented Feb 4, 2025

📌 Cov diff with main: 95%

📌 Overall coverage: 56.84%

lu-ohai changed the base branch from main to feature/multi_model_deployment on February 4, 2025 at 19:37
lu-ohai marked this pull request as ready for review on February 4, 2025 at 19:48
@mrDzurb
Member

Hi @lu-ohai, can you add more description to the PR? Also add the test and validation details. Share what the expected input data is and what the output would be; just provide a couple of use cases.

@VipulMascarenhas (Member) left a comment

Overall, the get_multimodel_compatible_shapes API might be a very slow operation, given that we have to get the config file from Object Storage for each model. On average, the get_deployment_config API call takes 4-6 seconds per model. It could result in a bad experience if a user selects 2-3 models and waits 10-15 seconds only to see a message saying the combination is not feasible. We could cache the result for each model so that subsequent calls are faster, or send parallel async requests to fetch multiple configs instead of reading them sequentially. Some testing will be required to confirm which optimizations are needed.

cc: @mrDzurb

@mrDzurb
Member

> (quoting @VipulMascarenhas's comment above)

Totally agree. We should use both techniques: caching and a thread pool.

4-6 seconds to read a file from an Object Storage bucket, this is insane :)
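The two suggestions combine naturally. Below is a minimal sketch of caching plus a thread pool; `get_deployment_config` here is a stand-in for the real Object Storage read, not the actual ADS function body:

```python
from concurrent.futures import ThreadPoolExecutor
from functools import lru_cache

@lru_cache(maxsize=128)
def get_deployment_config(model_id: str) -> dict:
    # Stand-in for the slow Object Storage read discussed above;
    # lru_cache makes repeat lookups for the same model id free.
    return {"model_id": model_id, "shape": ["VM.GPU.A10.4"]}

def fetch_configs(model_ids):
    # Fetch all configs concurrently instead of sequentially, so total
    # latency approaches the slowest single read rather than the sum.
    with ThreadPoolExecutor(max_workers=8) as pool:
        return dict(zip(model_ids, pool.map(get_deployment_config, model_ids)))
```

One caveat with this pattern: lru_cache has no TTL, so a long-lived process would keep serving stale configs after the file changes in the bucket; a time-bounded cache may be the safer choice.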

@lu-ohai
Member Author

@mrDzurb @VipulMascarenhas Based on the testing, fetching configs for three model ids takes roughly 5-6 microseconds, so I'm wondering in which case the get_deployment_config API takes 6 seconds to complete. I think we can add the cache layer in a follow-up PR if needed.

@darenr (Member) left a comment

Very nice code

@VipulMascarenhas (Member) left a comment

lgtm 👍

VipulMascarenhas merged commit 0f08a64 into feature/multi_model_deployment on Feb 6, 2025
1 check passed