Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(openapi): Model catalog API spec #823

Draft
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

pboyd
Copy link
Contributor

@pboyd pboyd commented Feb 19, 2025

Description

Proposed model catalog API spec to implement #702. Draws heavily from the spec in PR #725, and builds on the (presently unmerged) changes PR #822.

This is a draft right now, because there's no implementation. But it would be helpful to get some feedback on the spec.

How Has This Been Tested?

Nothing to test right now, as it's just an API spec.

Merge criteria:

  • All the commits have been signed-off (To pass the DCO check)
  • The commits have meaningful messages; the author will squash them after approval or in case of manual merges will ask to merge with squash.
  • Testing instructions have been added in the PR body (for PRs involving changes that are not immediately obvious).
  • The developer has manually tested the changes and verified that the changes work.
  • Code changes follow the kubeflow contribution guidelines.

The code in the repo has deviated from what openapi-generator-cli
produces. This overrides the templates so that the generated code is
equivalent.

Signed-off-by: Paul Boyd <[email protected]>
Copy link

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign tarilabs for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Proposed model catalog API spec. Draws heavily from the spec in PR kubeflow#725.

Signed-off-by: Paul Boyd <[email protected]>
@pboyd
Copy link
Contributor Author

pboyd commented Feb 24, 2025

@dhirajsb @dtrifiro could I get your eyes on this? It's not nearly as big as it looks, it's mostly an update to the openapi spec and re-generated code.

@rareddy
Copy link
Contributor

rareddy commented Feb 25, 2025

I read through a high-level API; +1 for generating the code. In the spirit, this looks closer to what @dhirajsb had in his PR (#725), where the definition of models was separated between the MR and MC. As I understood the ask from @danielezonca his ask is if we could make common schema for model. I understand they do not match currently due the model we adopted to support with ml-metadata.

@dhirajsb
Copy link
Contributor

@pboyd instead of adding the catalog types right away, can we create a PR with the base types as we discussed along with @adysenrothman?
I know there aren't a lot of common fields based on the catalog types schema in this PR, but I still think we will benefit in the long run from having common BaseModel* types so that we can keep shared properties in the base types.

@rareddy
Copy link
Contributor

rareddy commented Feb 26, 2025

@dhirajsb when we know a model comes with certain top-level properties as showcased in the CatalogModel why can't that become the base model and derive the RegisteredModel from it with additional custom properties and properties to add to it to make it generic for user so that they can use that for arbitary properties? maybe we went too conservative in the first round to define model properties as we were adopting MLMD types as is.

"500":
$ref: "#/components/responses/InternalServerError"
operationId: getAllCatalogModels
/api/model_catalog/v1alpha3/sources/{source_id}/models/{model_id}:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a duplication of
/api/model_catalog/v1alpha3/models
where source is one of the parameters

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are different. /api/model_catalog/v1alpha3/models returns a list of models (optionally filtered by source), and /api/model_catalog/v1alpha3/sources/{source_id}/models/{model_id} returns a single model.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My bad, I missed the {model_id} part :)

type: string
in: path
required: true
/api/model_catalog/v1alpha3/sources/{source_id}/models/{model_id}/readme:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the key to identify a model in this context?
model_id or the combination of source_id and model_id?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

source_id and model_id are intended to be a composite primary key.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Model_id might be itself composite: for example in the case of HuggingFace the model_id should be org/repo (i.e. ibm-granite/granite-guardian-3.0-8b) while source_id will be something like hugging-face.

Moreover when we import the model from catalog to registered model we will need to have a single value as key that is unique so I would consider making model_id primary key in catalog already composing with source_id

required:
- items
- $ref: "#/components/schemas/BaseResourceList"
CatalogModel:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we create a PR with the base types

I agree, we can use CatalModel as base type if it has all the common shared properties.
If we prefer to have a different base entity that's fine too but we should really limit the hierarchy of model metadata to avoid possible incompatibilities/gaps when we convert one "model type" (i.e. catalog model) in another "model type" (i.e. registered model)

items:
$ref: "#/components/schemas/CatalogModelArtifact"
- $ref: "#/components/schemas/BaseResourceDates"
CatalogModelArtifact:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does it differ with existing ModelArtifact?
If it is a similar concept I would apply the same comment as before: define a common type or extend one type to implement the other

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's the full list of fields for each one.

ModelArtifact:

  • customProperties
  • description
  • externalId
  • uri
  • state
  • name
  • id
  • createTimeSinceEpoch
  • lastUpdateTimeSinceEpoch
  • artifactType
  • modelFormatName
  • storageKey
  • storagePath
  • modelFormatVersion
  • serviceAccountName

CatalogModelArtifact

  • createTimeSinceEpoch
  • protocol
  • tags
  • uri

There isn't much in common, currently only createTimeSinceEpoch and uri. I'll make a base type with the date fields and a URI.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would expect a CatalolgModelArtifact to have name and description at least.
At the same time artifactType, modelFormatName, storageKey, storagePath, modelFormatVersion can become common and maybe replace protocol.

@pboyd
Copy link
Contributor Author

pboyd commented Feb 26, 2025

@rareddy Yes, it is much closer to @dhirajsb's #725.

@dhirajsb @rareddy Perhaps I have misunderstood the request, but there aren't many fields in common. Registered models have the generic MLMD customProperties and there aren't many fields in common with the catalog model. I did pull the date fields into a common base, but there weren't any other common fields. I can make a PR with just that, if you want.

@rareddy
Copy link
Contributor

rareddy commented Feb 26, 2025

Perhaps I have misunderstood the request, but there aren't many fields in common.

See that is based on the current design and adoption of what MLMD folks designed as their based Model Type. Ignoring what MLMD brings, we have a candidate list of properties from the Model Catalog side, what I am asking why not make those as top-level properties and make the MLMD model extend that. This will avoid shoving lot of properties as part of custom-properties if we were to adopt MLMD model as base type. wdyt?

when we push a catalog model into MR we need to still capture the model properties into MR, this helps there.

The advantage is we can use the same UI rendering for both catalog and MR

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants