Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add model source properties to store metadata about origin of a model artifact, fixes RHOAIENG-19885 #838

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

dhirajsb
Copy link
Contributor

@dhirajsb dhirajsb commented Feb 27, 2025

Description

Add the following model source properties to store metadata about origin of a model artifact
ModelSourceKind
ModelSourceClass
ModelSourceGroup
ModelSourceId
ModelSourceName

Note that these properties are used to reference an external source where a model artifact was created.
A fuller lineage model for tracking ML Experiments and Runs will be added in the future as model registry API resource types.
These properties also support referencing arbitrary model artifact sources, without constraining it to ML models only.

Fixes RHOAIENG-19885

How Has This Been Tested?

Added new model source properties to existing unit tests for model artifacts in backend service as well as Python client API and tests.

Merge criteria:

  • All the commits have been signed-off (To pass the DCO check)
  • The commits have meaningful messages; the author will squash them after approval or in case of manual merges will ask to merge with squash.
  • Testing instructions have been added in the PR body (for PRs involving changes that are not immediately obvious).
  • The developer has manually tested the changes and verified that the changes work.
  • Code changes follow the kubeflow contribution guidelines.

If you have UI changes

  • The developer has added tests or explained why testing cannot be added.
  • Included any necessary screenshots or gifs if it was a UI change.
  • Verify that UI/UX changes conform the UX guidelines for Kubeflow.

… model artifact, fixes RHOAIENG-19885

Signed-off-by: Dhiraj Bokde <[email protected]>
Copy link

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign tarilabs for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Copy link
Member

@tarilabs tarilabs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thank you @dhirajsb , overall is a good information to add to the logical model.

  • do we really need to denormalize between (Kind, Class) and (Id, Name)? unless I'm missing anythin in the example in the case of a PipelineRun one field would have been enough and it's up to the producer+consumer (as you rightly say) to manage this field to their need, that is if they want to make it even more explicit they could use the "FQDN" of the resource
  • why on ModelArtifact and not on ModelVersion as discussed? A ML model is an ensemble of assets with one "trainer" source.

@dhirajsb
Copy link
Contributor Author

dhirajsb commented Feb 28, 2025

do we really need to denormalize between (Kind, Class) and (Id, Name)? unless I'm missing anythin in the example in the case of a PipelineRun one field would have been enough and it's up to the producer+consumer (as you rightly say) to manage this field to their need, that is if they want to make it even more explicit they could use the "FQDN" of the resource

The idea is to allow clients to search by increasing smaller groups of model sources. So, we can support search by kind, class, group, etc.

why on ModelArtifact and not on ModelVersion as discussed? A ML model is an ensemble of assets with one "trainer" source.

A version could be a collection of multiple model artifacts all coming from separate sources, huggingface, pipelines, etc. to create a composite model like an agentic system.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants