Version Tracking

Chase William edited this page Aug 16, 2023 · 3 revisions

I wrote this to decide which of my two approaches to version tracking is superior.

In short, State-to-State is the victor. 🎉

State-to-State

This approach would mean forming relationships directly between state nodes to provide the needed baseline graph structure with version tracking.

In the diagram below we can see two identity nodes, i0 and i1, each referencing its own state nodes, with a specific version of i1 depending on a state node of i0.

graph

Using the above diagram, queries can traverse directly from one state node to the other. However, linking state nodes to other state nodes has a drawback: every newly inserted state node needs new relationships to the state nodes of all of its dependencies.

For example, using a state-to-state approach, an assembly with 20 types is added to the database. At this point, 20 relationships exist, connecting each type state node to the assembly state node. Then a mistake is discovered: the assembly's target framework has to be changed, and the corrected assembly is pushed to the database. A new state node for the assembly is made and forms relationships with all existing type state nodes, as they haven't changed. In this example, even though not a single type was changed, a new relationship was needed for each existing type.
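
The scenario above can be replayed with a small in-memory sketch. This is not the project's implementation; the `Graph` class and the node names (`assembly_s0`, `type_s0`, ...) are hypothetical and only count state-to-state relationships.

```python
from dataclasses import dataclass, field


@dataclass
class Graph:
    # Each relationship is stored as a (type_state, assembly_state) pair.
    state_to_state: list[tuple[str, str]] = field(default_factory=list)

    def add_assembly_state(self, assembly_state: str, type_states: list[str]) -> None:
        # Every existing type state gets its own edge to the new assembly state,
        # even if the type itself did not change.
        for type_state in type_states:
            self.state_to_state.append((type_state, assembly_state))


graph = Graph()
types = [f"type_s{i}" for i in range(20)]  # 20 unchanged type states

graph.add_assembly_state("assembly_s0", types)
after_first = len(graph.state_to_state)

# Only the assembly's target framework changed, yet 20 more edges are created.
graph.add_assembly_state("assembly_s1", types)
after_second = len(graph.state_to_state)

print(after_first, after_second)  # prints "20 40"
```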

Drawbacks

The following equation calculates the number of relationships that will exist when adding new assembly states, where $t$ represents a constant number of types that exist within that assembly. (This is only a rough estimate.)

a, number of assembly states
t, number of type states
r, number of resulting relationships

$r = a + a \cdot t + t$

Using this equation, after releasing 5 versions of an assembly with 20 types in each, 125 relationships will exist.

  • 100 state-to-state
  • 25 identity-to-state
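
As a quick check of the arithmetic, here is a minimal sketch that evaluates the equation and splits the total into the two kinds of relationship listed above. The function name `relationship_count` and the dictionary keys are my own, chosen for illustration.

```python
def relationship_count(assembly_states: int, type_states: int) -> dict[str, int]:
    """Rough estimate r = a + a*t + t, split by relationship kind."""
    # One identity-to-state edge per assembly state and per type state.
    identity_to_state = assembly_states + type_states
    # Every assembly state links to every type state.
    state_to_state = assembly_states * type_states
    return {
        "identity_to_state": identity_to_state,
        "state_to_state": state_to_state,
        "total": identity_to_state + state_to_state,
    }


counts = relationship_count(assembly_states=5, type_states=20)
print(counts)
# prints {'identity_to_state': 25, 'state_to_state': 100, 'total': 125}
```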

Identity-to-Identity

This approach involves forming relationships between identity nodes as a means to provide the baseline graph structure with version tracking.

image

Using the above diagram, queries will need to traverse through identity nodes to reach the desired state node.

Drawbacks

  1. An issue with this approach arises if the identity node being depended on changes: how will the depending node know which identity is to be queried? To resolve this, the assembly and type will need to be recorded somewhere to aid the query.
  2. Linking identity nodes to other identity nodes and querying through the relationship to find the correct state has a critical issue: how do we know what state (version) to look for? To resolve this, the version being depended on must be held somewhere in the referencing node.
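
To make the second drawback concrete, here is a minimal sketch of the extra metadata a query would need. The `State`, `Identity`, and `resolve` names, and the `depends_on` mapping, are assumptions made for illustration and not part of any existing schema.

```python
from dataclasses import dataclass, field


@dataclass
class State:
    version: str
    # Hypothetical metadata: identity name -> version this state depends on.
    depends_on: dict[str, str] = field(default_factory=dict)


@dataclass
class Identity:
    name: str
    states: dict[str, State] = field(default_factory=dict)  # keyed by version


def resolve(identities: dict[str, Identity], state: State, target: str) -> State:
    """Go through the target identity, using the version recorded on the referencing state."""
    wanted_version = state.depends_on[target]
    return identities[target].states[wanted_version]


i0 = Identity("i0", {"1.0": State("1.0"), "2.0": State("2.0")})
i1 = Identity("i1", {"1.0": State("1.0", depends_on={"i0": "2.0"})})
identities = {"i0": i0, "i1": i1}

resolved = resolve(identities, i1.states["1.0"], "i0")
print(resolved.version)  # prints "2.0"
```

Without the `depends_on` entry, the query would reach i0 but have no way to pick between its states, which is exactly the complexity this approach adds.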

Identity-to-Identity vs. State-to-State

The Identity-to-Identity approach reduces the number of relationships within the graph at the expense of increased complexity, because metadata must be retained so that queries can operate.

The State-to-State approach reduces complexity by forming a large number of easily understood relationships.

My decision? Well... I will use the State-to-State approach, as its simplicity is appealing given that this project is a solo endeavor. I am one person, and the increased complexity of the Identity-to-Identity approach is not welcome, even at the expense of possible performance gains (assuming the implementation I'd write would achieve them). I do not know everything in this space; practices like indexing, combined with the simpler State-to-State approach, may well provide what I need. Lastly, I will learn more by trying to leverage industry-standard tools than by brewing up my own custom concoction in an attempt to handle the problems presented by the Identity-to-Identity approach.

So Moving on...

Generic Rules

  • State nodes only depend on other state nodes.
    • Depending on an identity node is not allowed, as that would suggest the relationship is constant throughout versions.
  • Each state node has a property containing versions.
    • A state may be identical across multiple versions, so an array is used.
    • Relationships do not contain version info so the state node may act as a single source of truth.
    • When a depended-on state node is identical across multiple versions, it becomes ambiguous which of those versions the dependent state node relies on. (It is basically referencing an array of valid values.) ❌ See diagram below.

graph

  • It should be impossible for s4 to depend on s3 as the relationship between s2 and s5 proves s4 existed before s3.
    • This can be deferred, as a version and its dependencies are inserted holistically, at one given time, or not at all. There are no partial inserts where later updates are required.
  • Identity nodes only contain immutable data.
    • The contents of an identity node must never change; any mutable content belongs in a state node.
  • The absence of a state node in a particular version means it was deleted in that release. (when viewing a version that was successfully inserted)
    • Depending on context, absence can also mean the version was never added to the graph.
  • A state node should never reference another state node that does not already exist.
    • Dependencies should be inserted before dependents.
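
The last rule can be enforced at insert time. Below is a minimal in-memory sketch under assumed names (`StateNode`, `StateGraph`, and the keys `s0`, `s1`, `s2`, `s9` are hypothetical); a real graph database would need its own equivalent check.

```python
from dataclasses import dataclass, field


@dataclass
class StateNode:
    key: str
    versions: list[str]  # every version in which this exact state is valid
    depends_on: list[str] = field(default_factory=list)  # keys of other state nodes


class StateGraph:
    def __init__(self) -> None:
        self.states: dict[str, StateNode] = {}

    def insert(self, node: StateNode) -> None:
        # Reject references to state nodes that have not been inserted yet.
        missing = [key for key in node.depends_on if key not in self.states]
        if missing:
            raise ValueError(f"{node.key} references missing state nodes: {missing}")
        self.states[node.key] = node


graph = StateGraph()
graph.insert(StateNode("s0", versions=["1.0", "1.1"]))           # dependency first
graph.insert(StateNode("s1", versions=["1.0"], depends_on=["s0"]))  # then the dependent

try:
    graph.insert(StateNode("s2", versions=["1.1"], depends_on=["s9"]))
except ValueError as error:
    print(error)  # s9 was never inserted, so s2 is rejected
```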

Insertion Process

This section presents guidelines for insertion of new data into the graph.

  1. Check for existing identity node.
  2. (OPTIONAL) Create identity node if needed.
  3. Create state node.
  4. Attach state node to identity node.

The steps above should be repeated for all types of identity/state node groups.
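
The four steps can be sketched as a single get-or-create routine. This is an in-memory illustration only; `IdentityNode`, `VersionGraph`, `insert_state`, and the example data (`MyAssembly`, `target_framework`) are hypothetical names, not the project's API.

```python
from dataclasses import dataclass, field


@dataclass
class IdentityNode:
    key: str
    states: list[dict] = field(default_factory=list)  # attached state nodes


class VersionGraph:
    def __init__(self) -> None:
        self.identities: dict[str, IdentityNode] = {}

    def insert_state(self, identity_key: str, version: str, data: dict) -> dict:
        # 1. Check for an existing identity node.
        identity = self.identities.get(identity_key)
        # 2. Create the identity node only if it is missing.
        if identity is None:
            identity = IdentityNode(identity_key)
            self.identities[identity_key] = identity
        # 3. Create the state node, holding its versions as an array.
        state = {"versions": [version], **data}
        # 4. Attach the state node to the identity node.
        identity.states.append(state)
        return state


graph = VersionGraph()
graph.insert_state("MyAssembly", "1.0", {"target_framework": "net6.0"})
graph.insert_state("MyAssembly", "1.1", {"target_framework": "net7.0"})

print(len(graph.identities), len(graph.identities["MyAssembly"].states))  # prints "1 2"
```

The second call reuses the identity created by the first, so only a new state node is added.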