.Net MEVD: POCO property access and trimming/NativeAOT compatibility #10256

Open
roji opened this issue Jan 22, 2025 · 0 comments
Labels: msft.ext.vectordata (Related to Microsoft.Extensions.VectorData), .NET (Issue or Pull requests regarding .NET code), triage


roji (Member) commented Jan 22, 2025

MEVD includes serialization/deserialization of arbitrary user .NET types (POCOs), just like e.g. System.Text.Json, and this is always a problematic scenario for trimming compatibility. With the current design, `VectorStoreRecordPropertyReader` reflects over the user POCO, extracting `PropertyInfo`s (and `ConstructorInfo`s); these `PropertyInfo`s are later used to read and write POCO instances during serialization and deserialization.
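
For illustration, here is a minimal sketch of that reflection-based pattern. The `Hotel` POCO and the `ReflectionRecordMapper` helper are hypothetical; this is not the actual `VectorStoreRecordPropertyReader` code, only the general shape of discovering properties with `Type.GetProperties` and then reading/writing values through `PropertyInfo.GetValue`/`SetValue`.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics.CodeAnalysis;
using System.Reflection;

// Hypothetical user POCO (not part of MEVD).
public sealed class Hotel
{
    public ulong HotelId { get; set; }
    public string? HotelName { get; set; }
    public float[]? DescriptionEmbedding { get; set; }
}

// Hypothetical helper showing reflection-based property access.
public static class ReflectionRecordMapper
{
    // Serialization: read every public instance property into a name -> value map.
    // The annotation tells the trimmer to keep T's public properties.
    public static Dictionary<string, object?> ToStorage<
        [DynamicallyAccessedMembers(DynamicallyAccessedMemberTypes.PublicProperties)] T>(T record)
        where T : class
    {
        var result = new Dictionary<string, object?>();
        foreach (PropertyInfo property in typeof(T).GetProperties(BindingFlags.Public | BindingFlags.Instance))
        {
            result[property.Name] = property.GetValue(record);
        }
        return result;
    }

    // Deserialization: create an instance, then set each property present in storage.
    public static T FromStorage<
        [DynamicallyAccessedMembers(
            DynamicallyAccessedMemberTypes.PublicProperties | DynamicallyAccessedMemberTypes.PublicConstructors)] T>(
        IReadOnlyDictionary<string, object?> storage)
        where T : class
    {
        // Requires a public parameterless constructor; PublicConstructors keeps it for the trimmer.
        T record = Activator.CreateInstance<T>();
        foreach (PropertyInfo property in typeof(T).GetProperties(BindingFlags.Public | BindingFlags.Instance))
        {
            if (property.CanWrite && storage.TryGetValue(property.Name, out object? value))
            {
                property.SetValue(record, value);
            }
        }
        return record;
    }
}
```

Every read and write goes through `PropertyInfo`, which is what ties this design to reflection metadata being preserved at runtime.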

Work was done in #9375 to make SemanticKernel trimming- and NativeAOT-friendly. In MEVD, this consisted of adding `[DynamicallyAccessedMembers(DynamicallyAccessedMemberTypes.PublicProperties | DynamicallyAccessedMemberTypes.PublicConstructors)]` for the user POCO type, which causes the linker to preserve all of that type's public properties and constructors. While this approach is sufficient for now, we plan to support hierarchical data models in the future, where the user POCO can reference other POCOs, as a way of modeling e.g. JSON records in the database (for MongoDB, Cosmos, Milvus and others, see e.g. #10152 (comment)). The current approach can't be extended to hierarchical models, since there's no way to make the linker aware that anything referenced by the top-level POCO also needs its public properties/constructors preserved (recursively).
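
A small sketch of why the annotation doesn't reach nested POCOs, assuming hypothetical `Hotel`/`Address` types and a hypothetical `NestedPropertyWalker` helper: the annotation on `T` covers `Hotel`'s own properties, but `PropertyInfo.PropertyType` returns an unannotated `Type`, so reflecting into `Address` is exactly the step the linker can't follow.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics.CodeAnalysis;
using System.Reflection;

// Hypothetical hierarchical model: Hotel references a nested Address POCO.
public sealed class Address
{
    public string? City { get; set; }
    public string? Country { get; set; }
}

public sealed class Hotel
{
    public ulong HotelId { get; set; }
    public Address? Address { get; set; }
}

public static class NestedPropertyWalker
{
    // Returns "Prop" and "Prop.Nested" paths, one level deep.
    public static List<string> GetPropertyPaths<
        [DynamicallyAccessedMembers(DynamicallyAccessedMemberTypes.PublicProperties)] T>()
    {
        var paths = new List<string>();

        // Trim-safe: T's public properties are covered by the annotation above.
        foreach (PropertyInfo property in typeof(T).GetProperties())
        {
            paths.Add(property.Name);

            // NOT trim-safe: PropertyType carries no annotation, so the trim analyzer
            // typically reports a warning here (IL2075), and after trimming the nested
            // type's properties may no longer be there to discover.
            Type nestedType = property.PropertyType;
            if (nestedType.IsClass && nestedType != typeof(string))
            {
                foreach (PropertyInfo nested in nestedType.GetProperties())
                {
                    paths.Add($"{property.Name}.{nested.Name}");
                }
            }
        }
        return paths;
    }
}
```

Without trimming this works as expected; the problem only shows up once the linker removes members it had no way of knowing were needed.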

Even for users who don't need trimming (or hierarchical data models), reading and writing properties via reflection is generally less efficient than direct property access.

The general solution here likely requires a source generator, similar to how System.Text.Json handles trimming compatibility.
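
For reference, this is roughly how System.Text.Json's source generator is used today (`Hotel`, `HotelJsonContext` and `HotelJson` are placeholder names): a partial `JsonSerializerContext` subclass marked with `[JsonSerializable]` gets metadata and property accessors emitted at compile time, and serialization goes through the generated `JsonTypeInfo<T>` rather than runtime reflection. Whether an MEVD generator would follow the same shape is an open question; this only shows the STJ model.

```csharp
using System.Text.Json;
using System.Text.Json.Serialization;

public sealed class Hotel
{
    public ulong HotelId { get; set; }
    public string? HotelName { get; set; }
}

// The STJ source generator fills in this partial class at compile time,
// including a generated JsonTypeInfo<Hotel> exposed as HotelJsonContext.Default.Hotel.
[JsonSerializable(typeof(Hotel))]
internal partial class HotelJsonContext : JsonSerializerContext
{
}

public static class HotelJson
{
    // Both calls use the generated metadata, so they stay trim- and NativeAOT-compatible.
    public static string Serialize(Hotel hotel) =>
        JsonSerializer.Serialize(hotel, HotelJsonContext.Default.Hotel);

    public static Hotel? Deserialize(string json) =>
        JsonSerializer.Deserialize(json, HotelJsonContext.Default.Hotel);
}
```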

One important aspect of integrating a source generator is that we won't be able to do any sort of pluggability/interaction with specific providers (unless each provider has its own source generator, which seems like a non-starter to me). For example, the current `VectorStoreRecordPropertyReader` accepts various "capabilities" from the provider (`SupportsMultipleKeys`, `RequiresAtLeastOneVector`...); it won't be possible to pass these to the source generator. As a result, the source generator will likely generate only the minimal getters/setters for the properties it finds on the user POCO, while the schema definition will likely need to be generated at runtime, taking provider-specific concerns into account, via a separate reflection process.
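
A rough sketch of that split, under clearly stated assumptions: `GeneratedProperty<TRecord>`, `HotelAccessors`, `ProviderCapabilities` and `RuntimeSchemaValidator` are all hypothetical names, `HotelAccessors` is hand-written to stand in for generator output, and treating `ReadOnlyMemory<float>` properties as vectors is a simplification. The "generated" part is only typed getters/setters with no provider knowledge; the checks modeled on `SupportsMultipleKeys`/`RequiresAtLeastOneVector` run at runtime.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical user POCO.
public sealed class Hotel
{
    public ulong HotelId { get; set; }
    public string? HotelName { get; set; }
    public ReadOnlyMemory<float> DescriptionEmbedding { get; set; }
}

// Hypothetical per-property accessor pair of the kind a generator could emit.
public sealed record GeneratedProperty<TRecord>(
    string Name,
    Type PropertyType,
    Func<TRecord, object?> Get,
    Action<TRecord, object?> Set);

// Hand-written stand-in for generator output: direct property access, no reflection,
// and no knowledge of any specific provider.
public static class HotelAccessors
{
    public static readonly GeneratedProperty<Hotel>[] Properties =
    {
        new("HotelId", typeof(ulong), h => h.HotelId, (h, v) => h.HotelId = (ulong)v!),
        new("HotelName", typeof(string), h => h.HotelName, (h, v) => h.HotelName = (string?)v),
        new("DescriptionEmbedding", typeof(ReadOnlyMemory<float>),
            h => h.DescriptionEmbedding, (h, v) => h.DescriptionEmbedding = (ReadOnlyMemory<float>)v!),
    };
}

// Hypothetical provider capabilities, modeled on the flags mentioned above.
public sealed record ProviderCapabilities(bool SupportsMultipleKeys, bool RequiresAtLeastOneVector);

public static class RuntimeSchemaValidator
{
    // Runs at runtime, combining the provider-agnostic property list with provider rules.
    public static void Validate<TRecord>(
        IReadOnlyList<GeneratedProperty<TRecord>> properties,
        IReadOnlyCollection<string> keyPropertyNames,
        ProviderCapabilities capabilities)
    {
        if (keyPropertyNames.Count > 1 && !capabilities.SupportsMultipleKeys)
            throw new InvalidOperationException("This provider does not support multiple key properties.");

        // Simplification for the sketch: ReadOnlyMemory<float> properties count as vectors.
        int vectorCount = properties.Count(p => p.PropertyType == typeof(ReadOnlyMemory<float>));
        if (vectorCount == 0 && capabilities.RequiresAtLeastOneVector)
            throw new InvalidOperationException("This provider requires at least one vector property.");
    }
}
```

Keeping the generated code provider-agnostic means a single generator could serve every connector, at the cost of doing provider-specific validation at runtime rather than at compile time.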

/cc @westey-m @SergeyMenshykh

roji added the .NET and msft.ext.vectordata labels Jan 22, 2025
roji self-assigned this Jan 22, 2025
github-actions bot changed the title .NET MEVD: POCO property access and trimming/NativeAOT compatibility to .Net MEVD: POCO property access and trimming/NativeAOT compatibility Jan 22, 2025