.Net MEVD: POCO property access and trimming/NativeAOT compatibility #10256
MEVD includes serialization/deserialization of arbitrary user .NET types (POCO) (just like e.g. System.Text.Json), which is always a problematic scenario for trimming compatibility. With the current design, VectorStoreRecordPropertyReader reflects over the user POCO, extracting PropertyInfos (and ConstructorInfos); these PropertyInfos are later used to read and write the POCO for serialization/deserialization.
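As a rough illustration of the reflection-based pattern described above, here is a minimal sketch. It is not the actual VectorStoreRecordPropertyReader code; the `Hotel` POCO and the `ReflectionSketch` helper are hypothetical names invented for this example.

```csharp
using System;
using System.Reflection;

// Hypothetical user POCO, for illustration only.
public sealed class Hotel
{
    public int Id { get; set; }
    public string Name { get; set; } = "";
}

// Hypothetical helper; not part of MEVD.
public static class ReflectionSketch
{
    // Discover the public instance properties of the record type via reflection.
    public static PropertyInfo[] Discover(Type recordType) =>
        recordType.GetProperties(BindingFlags.Public | BindingFlags.Instance);

    // Instantiate the type through its public parameterless constructor,
    // then copy each property value from the source through PropertyInfo.
    public static object Copy(Type recordType, object source)
    {
        ConstructorInfo ctor = recordType.GetConstructor(Type.EmptyTypes)
            ?? throw new InvalidOperationException("No public parameterless constructor.");
        object target = ctor.Invoke(null);

        foreach (PropertyInfo property in Discover(recordType))
        {
            property.SetValue(target, property.GetValue(source));
        }

        return target;
    }
}
```

If the trimmer removes `Hotel.Name` because nothing references it statically, `Discover` simply won't return it at runtime, which is why this pattern is problematic for trimming.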
Work was done in #9375 to make SemanticKernel trimming- and NativeAOT-friendly. In MEVD, this consisted of adding
[DynamicallyAccessedMembers(DynamicallyAccessedMemberTypes.PublicProperties | DynamicallyAccessedMemberTypes.PublicConstructors)]
for the user POCO type, which causes all of its public properties and constructors to be preserved. While this approach is currently sufficient, we plan to support hierarchical data models in the future, where the user POCO can reference other POCOs, as a way of modeling e.g. JSON records in the database (for MongoDB, Cosmos, Milvus and others, see e.g. #10152 (comment)). The current approach can't be made to work for hierarchical models, since the linker can't be made aware that anything referenced by the top-level POCO also needs to have its public properties/constructors preserved (recursively).
Even for users not requiring trimming (or hierarchical data models), using reflection to read/write properties is generally not very efficient.
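To make the limitation concrete, here is a small sketch of how the annotation applies to a generic type parameter. The `Customer`, `Address`, and `AnnotatedReader` names are hypothetical and only serve this example.

```csharp
using System;
using System.Diagnostics.CodeAnalysis;
using System.Reflection;

// Hypothetical nested POCO, referenced by the top-level record.
public sealed class Address
{
    public string City { get; set; } = "";
}

// Hypothetical top-level POCO.
public sealed class Customer
{
    public string Name { get; set; } = "";
    public Address Address { get; set; } = new Address();
}

// Hypothetical helper; not part of MEVD.
public static class AnnotatedReader
{
    // The annotation tells the trimmer to keep the public properties and
    // constructors of whatever type is substituted for TRecord.
    public static string[] TopLevelPropertyNames<
        [DynamicallyAccessedMembers(
            DynamicallyAccessedMemberTypes.PublicProperties |
            DynamicallyAccessedMemberTypes.PublicConstructors)] TRecord>()
    {
        PropertyInfo[] properties = typeof(TRecord).GetProperties();

        // The annotation covers TRecord only. Reflecting over
        // property.PropertyType (e.g. Address) is not covered by it, so the
        // members of nested POCOs are not guaranteed to survive trimming.
        return Array.ConvertAll(properties, p => p.Name);
    }
}
```

Calling `AnnotatedReader.TopLevelPropertyNames<Customer>()` preserves `Customer.Name` and `Customer.Address`, but nothing in the annotation extends to `Address.City`.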
The general solution here likely requires a source generator, similar to how System.Text.Json handles trimming compatibility.
One important consequence of integrating a source generator is that we won't be able to do any sort of pluggability/interaction with specific providers (unless each provider has its own source generator, which seems like a non-starter to me). For example, the current VectorStoreRecordPropertyReader accepts various "capabilities" from the provider (SupportsMultipleKeys, RequiresAtLeastOneVector...) - it won't be possible to pass these to the source generator. As a result, the source generator will likely generate only the minimal getters/setters for the properties it finds on the user POCO; the schema definition will likely need to be generated at runtime - taking provider-specific concerns into account - via a different reflection process.
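The split described above might look roughly like the following hand-written sketch. Everything here is an assumption made for illustration: `PropertyAccessor<TRecord>`, `ProductAccessors` (standing in for what a generator could emit), `ProviderCapabilities`, and `RuntimeSchema` are hypothetical names, and treating any `float[]` property as a vector is a simplification. For brevity, the runtime step inspects the accessor metadata instead of doing its own reflection pass.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical user POCO.
public sealed class Product
{
    public int Id { get; set; }
    public string Name { get; set; } = "";
    public float[] Embedding { get; set; } = Array.Empty<float>();
}

// Hypothetical shape of a provider-agnostic accessor: no PropertyInfo involved.
public sealed record PropertyAccessor<TRecord>(
    string Name,
    Type PropertyType,
    Func<TRecord, object> Get,
    Action<TRecord, object> Set);

// Stand-in for generator output: direct, statically visible member access,
// so the trimmer can see every property that is used.
public static class ProductAccessors
{
    public static Product Create() => new Product();

    public static readonly IReadOnlyList<PropertyAccessor<Product>> Properties =
        new PropertyAccessor<Product>[]
        {
            new("Id", typeof(int), r => r.Id, (r, v) => r.Id = (int)v),
            new("Name", typeof(string), r => r.Name, (r, v) => r.Name = (string)v),
            new("Embedding", typeof(float[]), r => r.Embedding, (r, v) => r.Embedding = (float[])v),
        };
}

// Hypothetical provider capability, modeled on the RequiresAtLeastOneVector example.
public sealed record ProviderCapabilities(bool RequiresAtLeastOneVector);

// Provider-specific validation happens at runtime, outside the generated code.
public static class RuntimeSchema
{
    public static void Validate<TRecord>(
        IReadOnlyList<PropertyAccessor<TRecord>> properties,
        ProviderCapabilities capabilities)
    {
        bool hasVector = false;
        foreach (PropertyAccessor<TRecord> property in properties)
        {
            if (property.PropertyType == typeof(float[]))
            {
                hasVector = true;
            }
        }

        if (capabilities.RequiresAtLeastOneVector && !hasVector)
        {
            throw new InvalidOperationException("The provider requires at least one vector property.");
        }
    }
}
```

The point of the design is that the generated part knows nothing about providers, while each provider applies its own capabilities when it builds the schema at runtime.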
/cc @westey-m @SergeyMenshykh