Proposal: IBA/IBE/ICE refactor #564

Open
alanruttenberg opened this issue Nov 26, 2024 · 16 comments
Assignees
Labels
for 2.1 release These are changes we would like to see addressed under the 2.1 release

Comments

@alanruttenberg
Contributor

Here is the outline of what I would change. I'll rewrite definitions if there is consensus that these are the changes to make. There are further changes I would make, adding axioms and additional classes, but this is the bare minimum.

IBAs that make more sense as ICEs

  • Barcode and subclasses
  • Book
  • Certificate
  • Database
  • Document
  • Image, Chart
  • Information line (unless you are talking about things like the touch bar on a macbook)
  • Journal article
  • List, Code List
  • Message and subclasses
  • Spreadsheet
  • Video

Rearrangements:

Book, Journal article to be subclass of Document. I'm unsure whether
spreadsheet should also be but lean that way. Document, I think, connotes a whole. Images
and Charts may be documents, but also may be only parts of documents.

Rename

Document Form -> Form Document. The current label makes it sound like a part of a document.

Material artifacts

Some of the above are sometimes interesting as physical items, things
you would track copies of.

Book Artifact: Material artifact and is carrier of some Book
Document Artifact : Material artifact and is carrier of some Document

An alternative would be not to define them and just have material
artifact instances and relate them to what they carry, in the RDF.
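
To make the two options concrete, here is a minimal sketch in plain Python, with triples written as tuples. Every `ex:` name and both helper functions are hypothetical, made up for illustration; none of them are CCO IRIs. It assumes the defined class reads "Material artifact and is carrier of some Book" as stated above.

```python
# Triples as (subject, predicate, object) tuples. All ex: names are hypothetical.
TRIPLES = {
    # The ICE: one book, however many copies exist.
    ("ex:moby_dick", "rdf:type", "ex:Book"),
    # Two physical copies, typed only as material artifacts.
    ("ex:copy_1", "rdf:type", "ex:MaterialArtifact"),
    ("ex:copy_1", "ex:is_carrier_of", "ex:moby_dick"),
    ("ex:copy_2", "rdf:type", "ex:MaterialArtifact"),
    ("ex:copy_2", "ex:is_carrier_of", "ex:moby_dick"),
    # An artifact that carries no book.
    ("ex:hammer_1", "rdf:type", "ex:MaterialArtifact"),
}


def instances_of(cls, triples):
    """Subjects explicitly typed as cls."""
    return {s for (s, p, o) in triples if p == "rdf:type" and o == cls}


def book_artifacts(triples):
    """Members of the defined class: Material artifact and is carrier of some Book."""
    books = instances_of("ex:Book", triples)
    artifacts = instances_of("ex:MaterialArtifact", triples)
    carriers = {s for (s, p, o) in triples if p == "ex:is_carrier_of" and o in books}
    return artifacts & carriers
```

In this sketch the instance data is identical under both options. Defining Book Artifact only gives a name to the query that `book_artifacts` performs.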

remaining IBA classes

Timekeeping Instrument (with subclass System Clock) and Instrument Display
Panel seem to me to be Information Medium Artifacts.

IBE

Document Field: Move to ICE. Add axiom: continuant part of some Document

Deprecate IBA.

The distinction between IBE and IBA is minor and IBE is more general. It's arguable whether
a tree with a carving "A heart L" is an IBA, but it is clearly an IBE.

Properties whose domain or range is IBE

Deprecate the 'has value' properties like 'has text value' in
favor of a single 'has value' property. That would include all data
properties other than: 'has latitude value', 'has longitude value', and
'as WKT'. The typing of the relation is unnecessary - the type information is available from the value, and restrictions could be added to constrain types where relevant.
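
As a rough illustration of why the typed properties are redundant, the sketch below uses plain Python values as stand-ins for typed literals. The `ex:has_value` name, the type table, and both helpers are assumptions for illustration only; the table covers a small subset of XSD datatypes.

```python
from datetime import date
from decimal import Decimal

# Python type -> XSD datatype name; a small illustrative subset.
XSD_BY_TYPE = {
    bool: "xsd:boolean",
    int: "xsd:integer",
    Decimal: "xsd:decimal",
    float: "xsd:double",
    str: "xsd:string",
    date: "xsd:date",
}


def datatype_of(value):
    """Recover the datatype from the value itself."""
    # type() rather than isinstance(), so True maps to boolean and not integer.
    return XSD_BY_TYPE[type(value)]


def has_value(subject, value):
    """Build one 'has value' statement as (subject, property, value, datatype)."""
    return (subject, "ex:has_value", value, datatype_of(value))


def check_restriction(statement, allowed):
    """Stand-in for a class-level restriction that constrains the value type."""
    return statement[3] in allowed
```

A single property plus the literal's own datatype carries the same information that 'has text value', 'has integer value', and so on encode in the property name. Where a class needs a particular type, `check_restriction` plays the role a datatype restriction would play in the ontology.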

In the below relations, change IBE to ICE in domain or range

  • is excerpted from (both). Consider Document as D/R, else ICE, but I think that too broad.
  • is geospatial coordinate reference system of, uses geospatial coordinate system
  • is measurement unit of, uses measurement unit
  • is reference system of, uses reference system
  • language used in, uses language
  • time zone identifier used by, uses time zone identifier

is_tokenized_by

Deprecate and suggest has value instead.

Logistics

The proper thing to do is to deprecate old terms and create new terms
where there is a significant change, like IBA->ICE. The alternative is
to keep using the same IRIs, but this risks cases where domain
ontologies have specialized below the top-level classes. So specializing
Document will be fine in the switch, because a subclass of Document will
remain a subclass of Document in the switch. However, if a new direct
subclass of IBA is created, that won't move, as the proposal does not
include equating ICE with the old IBE.

For rearrangements, such as putting Book and Journal article below
Document, the potential damage is less, and in such cases the terms do
not need to be deprecated.

There will be a mapping file from English name IRIs to numeric IRIs. I'd
suggest that the old classes not be included in that mapping, but a
supplementary information mapping that maps deprecated terms to
alternatives be included as an adjunct.
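
A minimal sketch of how such a supplementary mapping might be used by a domain ontology that builds on the old terms. It assumes the deprecate-and-replace route; the `old:`, `new:`, and `dom:` names, the mapping contents, and the `migrate` helper are all hypothetical.

```python
# Hypothetical supplementary mapping: deprecated term -> suggested alternative.
REPLACED_BY = {
    "old:Document": "new:Document",  # IBA sense -> ICE sense
    "old:Book": "new:Book",
}
# Deprecated terms with no one-to-one alternative.
DEPRECATED_WITHOUT_REPLACEMENT = {"old:InformationBearingArtifact"}


def migrate(triples):
    """Rewrite deprecated terms; collect triples that need a human decision."""
    migrated, needs_review = [], []
    for triple in triples:
        if any(term in DEPRECATED_WITHOUT_REPLACEMENT for term in triple):
            needs_review.append(triple)
        else:
            migrated.append(tuple(REPLACED_BY.get(term, term) for term in triple))
    return migrated, needs_review


DOMAIN_ONTOLOGY = [
    ("dom:Invoice", "rdfs:subClassOf", "old:Document"),
    ("dom:PunchCard", "rdfs:subClassOf", "old:InformationBearingArtifact"),
]
```

Here `dom:Invoice` follows Document automatically, while `dom:PunchCard`, a direct subclass of IBA, lands in the review list because nothing in the proposal says where it should go.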

@alanruttenberg
Contributor Author

Bump: This has been in the works for a long time. Could people please weigh in so we can finally get this done?

@swartik

swartik commented Dec 3, 2024

@alanruttenberg I agree with most of your proposal. I'm not convinced about barcodes. Barcode subclasses do seem to be more about standards than about material artifacts. However, someone went to the trouble to add them to CCO, presumably because they thought that level of detail would be useful. Class Code 93 Barcode has its uses as formulated. Members of the class are individuals that conform to the Code 93 standard. Formalizing that standard as a subclass of Directive Information Content Entity is also useful, but I'm not prepared to advocate for eliminating the class hierarchy under material artifact as well. I would:

  • Keep class Barcode where it is.
  • Create a hierarchy under Directive Information Content Entity, using labels like "Codabar Barcode Specification".

In the course of composing this reply, I've begun to wonder why CCO has this much detail on barcodes. Can anyone supply some history? I would be open to moving subclasses of Barcode to a separate module.

@giacomodecolle

I agree with the general point about moving the focus to ICEs and refactoring classes.

I dislike spreadsheet being a subclass of document, as I wonder whether that would entail that a lot of other types of databases end up being documents alongside spreadsheets, and that doesn't sound right to me.

Can you elaborate a bit on the logistics regarding existing extensions of CCO? I am thinking about cases such as the Cyber Ontology, where a lot of development has been done under the IBA side of the hierarchy. If I understand correctly, you are currently suggesting that direct children of IBA are not necessarily moved?

@BrendaBraitling

@alanruttenberg Thinking from an information systems design perspective, "Document serving" is the basis of most of what happens on websites, emails, messaging systems. Document servers handle discrete packages rather than streaming services, for example. Anything that is served on a document server system would be related to a "Document." When we see streaming on a website, the website document is providing a place saver to serve streaming content from a streaming server.

So when a Document is defined in a top level or midlevel ontology, it has deep technical meaning to the digital system designers around the world dealing in document server systems.

For those who deal with the physical world, a document is a discrete, portable artifact used to share information about something. There are identity documents, license documents, etc. But there are also various publications, reports, etc...

For the record, information systems design handles how humans interact with both digital and physical information processing technologies :)

There is no doubt in my mind that a spreadsheet is a document in the digital or physical world because it has content (Body), a beginning (Header) and an end (Footer); communicates information having a particular data architecture or content and is portable for sharing on paper or electronically.

@alanruttenberg
Contributor Author

@giacomodecolle I don't follow the Cyber ontology, but if they are building out under CCO's IBA they are likely making the same conceptual error that we see in CCO. In my proposal I do indeed suggest deprecating most classes of IBA, and I would suggest that, to the extent that the Cyber Ontology builds under IBA, once the change to CCO is made, they appropriately update their ontology, or remain using an earlier version of CCO.

Consider 'Email Message'. The sense of that term under IBA would have to be a particular piece of paper with the message written on it. It's highly unlikely that that's the intended sense in almost any ontology that has subject matter regarding email messages. Most phishing emails, for example, have no power in their printed form and instead rely on clicking a link on what is effectively a concretization of the message that is currently being presented on any of their screens.

So keeping the term, and having an ontology keep using it, would mean supporting the likely incorrect use of the term. And, once people start using the proper ICE term we would have interoperability problems, with some ontologies using the IBA term and others the ICE term.

My proposal did suggest keeping a couple of terms (and perhaps others of a similar nature) like Material Book, or Material Document, to mean those sets of printed pages, because sometimes the physical artifacts are tracked and so these terms would be useful.

Should developers of the Cyber Ontology or others affected by this change wish to discuss this in more detail, I'd be happy to do so.

@alanruttenberg
Contributor Author

@swartik #561 suggests factoring out most barcode classes into a separate domain ontology. I don't agree with you regarding barcode typically denoting a physical object. The "same" barcode is printed on e.g. all the local brand 1% milk at my store. There would be a stronger argument for RFIDs having an IBA sense, as each is unique, as well as an ICE sense for the identifier.

@swartik

swartik commented Dec 9, 2024

@alanruttenberg, what you describe is something that I've never seen explicitly stated in CCO, or in BFO for that matter. Please correct me if I'm wrong.

You're right that the same barcode is printed in lots of places. The images on each carton are physically different only because they are made with different ink.

For two barcodes on two milk cartons, the difference is trivial and maybe not worth the effort. It becomes less trivial as you scale up. Are two cartons of milk of the same brand and fat percentage represented by the same individual? Are two printed copies of the same book represented by the same individual? Are two cars of the same make and model represented by the same individual? To take the example to a (fictional) extreme, are our Earth and the Earth in the Star Trek episode Miri represented by the same individual?

There is a case to be made for answering yes to all of these questions. As usual, it depends on what one's ontology is trying to express. Does CCO (or BFO) have a position on whether every physically distinct entity is a different individual? Does the answer depend on the class, and if so, what's the distinguishing characteristic?

My personal viewpoint: I wouldn't want to represent every single printed barcode as a different individual because I can't see the need to state that ink batch 1 was used to print barcode 1 and ink batch 2 was used to print barcode 2. But, should I want to express that a carton of Safeway Brand 1% milk has a particular barcode, I'd do it by:

  1. Creating a barcode artifact subclass I didn't expect to populate, placing subclass restrictions on the class about the patterns any member of the class must bear, and the ICE it is a carrier of.
  2. Creating a restriction on the Safeway Brand 1% milk class stating that every member must have an individual of the barcode artifact subclass as a part.

I'd use this approach for books, cars, and planets.

But as I say, if CCO already has another approach, let me know.
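
The two steps in the comment above can be sketched roughly as follows. This is plain Python, not OWL: it treats each restriction as a closed-world check, whereas an OWL reasoner would instead infer that the required part or carried ICE exists. All class and property names are hypothetical.

```python
# Class-level restrictions as (class, property, filler): every member of `class`
# must have `property` to some member of `filler`. All names are hypothetical.
RESTRICTIONS = [
    # Step 1: the barcode artifact subclass is a carrier of a particular ICE class.
    ("ex:SafewayMilkBarcodeArtifact", "ex:is_carrier_of", "ex:SafewayMilkBarcodeICE"),
    # Step 2: every carton has a member of that artifact subclass as a part.
    ("ex:SafewayOnePercentMilk", "ex:has_part", "ex:SafewayMilkBarcodeArtifact"),
]


def violations(individual, types, facts):
    """Return the restrictions on the individual's classes that its facts do not satisfy."""
    missing = []
    for cls, prop, filler in RESTRICTIONS:
        if cls not in types[individual]:
            continue
        satisfied = any(
            s == individual and p == prop and filler in types.get(o, set())
            for (s, p, o) in facts
        )
        if not satisfied:
            missing.append((cls, prop, filler))
    return missing
```

Note that the sketch only records that a carton has a barcode artifact as a part. It says nothing about what the barcode is about.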

@alanruttenberg
Contributor Author

@swartik To be clear: Both the material artifact and the ICE exist. Both could be part of the ontology. What differs is the sort of thing one says about each of them. To some extent this is a matter of good taste in ontology development. As a practical concern, having both terms will confuse people who aren't completely clear on the distinctions between the two. Most of the things that will be said of a barcode will be said of all barcodes with the same form. For instance, the pattern of marks, who originated it, what it denotes. There is very little to say about an individual barcode. The sort of thing we would say about a material version is that this one on this milk carton is smudged. Since it seems to me that it is vastly more common to want to say something that is true of all the barcodes, the subject would be the ICE, which captures the regularity.

So when I see this situation, I vote for ICE representation as primary. In the rarer material usage, one can always define the class: Artifact and bears some bar code.

In a few cases the material is as important as the ICE, because it is not uncommon to track specific copies. For example, we track a specific physical copy of a book in a library. In classified settings, how physical copies of reports are to be distributed is constrained and it isn't uncommon to track each individual. So for a few select classes (Books, Documents) I suggest we do also have the material term.

I don't understand what you mean by "I didn't expect to populate" - maybe: you don't plan to specifically assert statements like x rdf:type barcode artifact? You should consider such a situation a mark against representing something. It isn't determinative, but one is often (properly) driven to define classes because there are instances of them that are of individual importance. Sometimes we don't plan to directly instantiate, but that happens usually for more general classes - middle classes in an ontology. The barcode you describe will not be that sort of class.

You are also missing out that the barcode is about something, namely some class of Safeway Brand 1% milk. Just because we know every member of a class has a part of a certain class, doesn't mean we know what the relationship between them is.

Since I think we can say everything of importance by reference to the ICE in this case, there's also an argument that your choice of representation is more complicated than it need be.

@alanruttenberg
Contributor Author

As an example of confusion, one might represent, in CCO an email message as an IBA instance, asserting there was an author of that message. But that makes an assertion about exactly one physical copy or while-on-screen copy of the message, not all of the messages with the same content. But the author statement will be true of all the copies. Therefore it should be an assertion on the ICE.
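
A small sketch of the point, in plain Python with triples as tuples. The `ex:` names and the `authors_of_copy` helper are hypothetical. The author is asserted once, on the ICE, and every copy that carries the message picks it up through the carrier relation.

```python
# All names are hypothetical. One message (ICE), three copies that carry it.
FACTS = {
    ("ex:msg", "rdf:type", "ex:EmailMessage"),
    ("ex:msg", "ex:has_author", "ex:alice"),
    ("ex:printout", "ex:is_carrier_of", "ex:msg"),
    ("ex:inbox_copy", "ex:is_carrier_of", "ex:msg"),
    ("ex:screen_rendering", "ex:is_carrier_of", "ex:msg"),
}


def authors_of_copy(copy, facts):
    """Find the authors of whatever content a given copy carries."""
    carried = {o for (s, p, o) in facts if s == copy and p == "ex:is_carrier_of"}
    return {o for (s, p, o) in facts if s in carried and p == "ex:has_author"}
```

Had the author been asserted on `ex:printout` instead, the inbox copy and the on-screen rendering would have no author unless the statement were repeated for each of them.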

@johnbeve johnbeve self-assigned this Dec 13, 2024
@alanruttenberg
Contributor Author

bump

@neilotte neilotte added the for 2.1 release These are changes we would like to see addressed under the 2.1 release label Jan 12, 2025
@neilotte
Contributor

@johnbeve @APCox @mark-jensen @oliviahobai -- I would like to prioritize reviewing and addressing @alanruttenberg 's proposals in this issue in the next release.

Relatedly, there's an open PR to address #586 and Rabenberg's review should address some consistency issues across classes, relations, and domain-range restrictions.

Together, this would amount to a moderate but manageable number of updates. I believe we can address this without solving the larger ICE-IBE debate that @CarterBeauBenson's CQ-driven forum is set to tackle. The resolution of this issue will take considerably more work and, in my mind, warrants waiting until the subsequent release.

@alanruttenberg
Contributor Author

What is this larger ICE IBE debate?

@neilotte
Contributor

@alanruttenberg I'm referring to the guidance you have argued against previously in section 2 of the document at: documentation/archive/legacy-documentation/Modeling Information with the Common Core Ontologies 1.3.docx. My understanding is that the information working group is taking on this discussion.

@alanruttenberg
Contributor Author

I think these changes intend to resolve that, though documentation will be needed. All the properties that currently have IBA/IBE as domain or range are proposed to change (deprecated/replaced) with cognates that have ICE as domain and range. Is there a more formal ICE/IBE debate than the ongoing one I've been pushing (occasional discussions)?

@CarterBeauBenson
Contributor

@giacomodecolle Is the CCO WG in a good place where they can use the CQs to assess ICE theories?

@giacomodecolle

@CarterBeauBenson if that's what you are asking, we can move our focus to building design patterns that answer your CQs, thus checking different ICE theories. I believe this is practically what we did in a couple of sessions when @alanruttenberg was there.

This would be the place that takes care of what @neilotte calls the bigger debate, part of which I understand is effectively what @alanruttenberg has been suggesting with this issue. Is that right?

7 participants