Why, oh WHY, do you care what's in my content reference?

In DITA, when you use the @conref content reference mechanism on an element with required children, you still have to put in the required children. This is a pain, and nobody likes it. The rule comes from combining DITA's reuse design with basic XML rules, but it's not intuitive, and everyone is confused the first time they run into it. If you forget to add those children, you get parser errors, build errors, and more. Which leads you to either shake your fist at DITA in general, or to find the closest DITA tools developer and start grumbling.

In this world of Slack channels and instant messages, I'm the "closest" DITA tools developer for many people. As such, I've seen this confusion so many times – most recently in Radu Coravu's blog post laying out customer complaints about DITA – that it seemed useful to recount the usual conversation for the next time it comes up. Not every conversation is the same, but they all pretty reliably follow this pattern...

Stupid stupid conref stupid parser errors stupid stupid DITA stupid

Whoa now … you're muttering again. And as my 5-year-old reminded me last week, "stupid" is a bad word.

Really? You're going to complain about my language? YOUR "DITA" LANGUAGE IS THE PROBLEM!

You sound grumpy?

Yes! My build is broken! It says I need to add elements but they're pointless because I'm using conref!

Ahhhh … I bet you're using an element with required children. Maybe <table>? Or <ol>? Maybe even <topic>?

… Yes, I'm using <table>.

Yeah, that's an XML rule. It bugs me too, but we can't do anything about it.

But I'm using conref.

OK … I'm going to try to explain the background here, but I have to warn you it gets a bit technical.

What's technical about it? I'm using conref.

So, stepping back a bit, DITA is XML. And any controlled set of XML is governed by grammar files – like DTD, Schema, or RelaxNG. Those are basically a set of rules that say what can and cannot go in your document. For example, with DITA, the grammar files say that <title> is optional inside <section>, but is not allowed in <keyword>. Grammar files also say when something must appear in a certain place – for example, every <topic> requires one <title>, every ordered list requires at least one list item, every <row> in a table requires at least one <entry>, and so on.

Why would you say something MUST appear? Can't you just leave it up to me?

Various reasons. Requiring an element often helps guide authors – if a list is meaningless without at least one list item, requiring that list item ensures that you put one in while editing (and many editors will do it for you).

Requirements like this also ensure that anybody trying to implement the standard can rely on a few basic rules. For example, my cloud service that expects a DITA topic as input is guaranteed to find one and only one title for the topic; even if that title is empty, I know it's there, which means I don't have to handle the exception "what if you forgot your title tag".

But I don't need that required element, because I'm using conref!

Yeah. So, more about XML. Just like there are a bunch of tools out there that support DITA – which only work reliably if you follow the rules of DITA – there are a bunch of tools out there that are designed to work with XML. Not just DITA, but any XML. (The world of XML tools and standards is built on these – everybody uses them.) For our purposes, the important one here is the XML parser – a basic, widespread tool that helps read and validate any XML file so that programs can do something useful with the content. One important job for a parser is to compare your XML to your grammar file, to see if you're following the rules. If you're not following the rules, the parser squawks.

Why would it squawk? I'm using conref!

If you have a <table> with no children, the parser says NO, because you're not following the rules of DITA's XML grammar … and XML parsers exist partly to make sure that any XML follows its own grammar.

But I'm using conref. Why would DITA's grammar require children when I use conref?

So, now we get to the real problem. DITA is XML. So, the XML tools that we use everywhere – which only work reliably if you follow the rules of XML – expect you to do what the grammar file says. If your grammar file says that an element is required, then as far as the parser is concerned, you must have that element or you're not following the rules.

In this case, the rules say that you must have a <tgroup> inside of <table>, because in the normal non-reuse case, a <table> without <tgroup> doesn't make sense.

I'm not talking about that case. If you didn't hear, I'M USING CONREF.

No, I heard. The problem is, generic XML parsers don't know or care about the @conref attribute, or even about DITA. All the XML parser knows is – "the DITA grammar file this file CLAIMS to follow says <tgroup> has to be here, so I'm going to complain if <tgroup> isn't here."
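If it helps to see that from the parser's side, here is a toy sketch in Python. It is not a real DTD validator: the REQUIRED_CHILD table and the missing_children function are made up for illustration, the rules are a tiny hand-copied subset of DITA's table model, and the conref target (reuse.dita#reuse/prices) is a hypothetical file. The only point is that nothing in the check ever looks at @conref.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-copied subset of DITA's grammar: parent -> one child it requires.
# (Real grammars live in DTD, Schema, or RelaxNG files, not in a Python dict.)
REQUIRED_CHILD = {
    "table": "tgroup",
    "tgroup": "tbody",
    "tbody": "row",
    "row": "entry",
}

def missing_children(xml_text):
    """Return a list of complaints, roughly the way a generic validator would."""
    root = ET.fromstring(xml_text)
    complaints = []
    for element in root.iter():
        required = REQUIRED_CHILD.get(element.tag)
        # Note what is NOT here: no check for a conref attribute.
        if required is not None and element.find(required) is None:
            complaints.append(f"<{element.tag}> requires <{required}>")
    return complaints

# What you wanted to write: just the reference.
bare = '<table conref="reuse.dita#reuse/prices"/>'

# What the grammar makes you write: the reference plus empty placeholders.
padded = """
<table conref="reuse.dita#reuse/prices">
  <tgroup cols="1">
    <tbody>
      <row>
        <entry/>
      </row>
    </tbody>
  </tgroup>
</table>
"""

print(missing_children(bare))    # one complaint, about <tgroup>
print(missing_children(padded))  # no complaints
```

The padded version is the usual workaround: the placeholder children keep the parser quiet, and they get thrown away when the content reference is resolved.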

The DITA specification could say "If an element specifies @conref, DITA implementations must ignore the XML grammar that says children are required." But that would mean making an exception to XML's own rules – at least for any tool that uses DTD or Schema as a grammar file, which is still the basis for many common tools. (RelaxNG now has ways to do it, but … not everybody uses RelaxNG.) Again: XML doesn't know or care about the @conref attribute. Making that exception would mean the many tools that still use DTD or Schema for XML can no longer rely on standard XML tools. Trying to support our own "almost XML" without those standard XML tools would be cost-prohibitive.

… Oh.

Yeah.

I don't like it either. I've been running into it since DITA 1.0. The only thing I know of to make it easier is to try to set up editors to add these required elements in for us. But overall, we're kind of stuck with the problem as long as reuse is handled with the @conref attribute, and as long as we have any need to support DTDs and Schemas.

Note: Most conversations about this issue end here. A few of the very curious carry on.

… so, could this change?

Sure, in theory, but it would probably mean abandoning the @conref attribute / the current content reference mechanism. As long as we have the same general mechanism (that is, a <table> is needed in order to reference another <table>), we have to comply with the rules of XML – meaning required elements are always required. Any alternative really has to replace the @conref design.

Like … use XInclude? I think I heard of that once?

So, one of the things that sets reuse-with-conref apart is that you get automatic validation. Because you have to add a <table> in order to pull in a <table>, the resolved content reference is guaranteed to be valid. [Even complex edge-cases with domains are handled with @conref rules in the existing specification.]

This is what @conref is designed to enforce. The explicit goal was that reuse should never allow you to put something where it wasn't originally valid. Of course, that's the root of this whole problem: DITA ensures resolved-conref validity by forcing you to put in a <table>; XML forces you to put in every required element as part of that table.
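For the code-minded, the "a <table> can only pull in a <table>" guarantee can be sketched in a few lines of Python. This is a deliberately simplified illustration, not how any real DITA processor does it: resolve_conref is a made-up function, the element and file names are hypothetical, and the real specification rules also deal with specialization and domains, which this ignores.

```python
import copy
import xml.etree.ElementTree as ET

def resolve_conref(referencing, target):
    """Return the content that replaces `referencing` once the conref is resolved.

    Simplifying assumption: the target must be the same element type.
    """
    if referencing.tag != target.tag:
        raise ValueError(f"<{referencing.tag}> cannot pull in <{target.tag}>")
    # Same element type, so the target fits wherever the referencing element was allowed.
    return copy.deepcopy(target)

# The reusable table, as it might appear in a hypothetical reuse.dita.
target_table = ET.fromstring(
    '<table id="prices"><tgroup cols="1"><tbody>'
    '<row><entry>42</entry></row>'
    '</tbody></tgroup></table>'
)

# A referencing table, complete with its required placeholder children.
placeholder_table = ET.fromstring(
    '<table conref="reuse.dita#reuse/prices"><tgroup cols="1"><tbody>'
    '<row><entry/></row>'
    '</tbody></tgroup></table>'
)

# A title that tries to point at the same table.
title = ET.fromstring('<title conref="reuse.dita#reuse/prices"/>')

resolved = resolve_conref(placeholder_table, target_table)
print(resolved.tag)  # table

try:
    resolve_conref(title, target_table)
except ValueError as error:
    print(error)  # <title> cannot pull in <table>
```

Because the replacement is always the same kind of element as the thing it replaces, the result is valid anywhere the referencing element was valid. That is the guarantee, and the required placeholder children are the price.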

So … what do you suggest as an alternative?

Me? Nothing, at the moment. I don't have a suggestion I'm happy with. But in theory, and only if we abandon the existing @conref-based reuse mechanism, we could replace @conref with some other variable mechanism. One where you don't start with the reused element, so there are no required child elements. But … if you can pull anything in anywhere, you're likely to get something that's not valid in the original context (like pulling a <table> into a <title>), and if you can pull anything into anything, you're almost guaranteed to make your DITA tools squawk at some point. And … unexpected squawking is what got us started on this weird little dialogue, right? So I'm not thrilled at the idea.

Not to mention that we're really trying to avoid having two ways to do the same thing. So if we come up with something new, it probably means getting rid of @conref. I shudder a bit imagining the migration pain that would cause.

Couldn't we just get rid of required elements?

We could. It means tools get (a bit) more complicated to design, just because they require so much more error checking. ("If <table> is missing <tgroup> and isn't using @conref, throw an error." "If <tgroup> is missing <tbody> and isn't using @conref, throw an error." "If <topic> is missing <title> and isn't using @conref, throw an error." And so on for every element.) Not impossible, just a pain, and you lose the "do what's obviously needed" advantage that comes when an XML editor detects required elements and puts them in for you. It also means that DITA's out-of-the-box design lets you do more nonsensical stuff, like creating a valid topic that is just <topic id="squawk"></topic>.
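To give a feel for that extra error checking, here is a rough Python sketch of what each tool would have to carry around if the grammar stopped requiring children. The REQUIRED_CHILD table and the check function are hypothetical, and the three rules are just the ones quoted above; a real tool would need one for every element that used to have a required child.

```python
import xml.etree.ElementTree as ET

# Hypothetical rules that would move out of the grammar and into every tool.
REQUIRED_CHILD = {"table": "tgroup", "tgroup": "tbody", "topic": "title"}

def check(xml_text):
    """Complain about a missing child unless the parent uses conref."""
    errors = []
    for element in ET.fromstring(xml_text).iter():
        required = REQUIRED_CHILD.get(element.tag)
        if required is None:
            continue
        uses_conref = element.get("conref") is not None
        if not uses_conref and element.find(required) is None:
            errors.append(
                f"<{element.tag}> is missing <{required}> and isn't using @conref"
            )
    return errors

print(check('<table conref="reuse.dita#reuse/prices"/>'))  # nothing to report
print(check('<table/>'))                                   # one error
print(check('<topic id="squawk"></topic>'))                # one error
```

None of this is hard to write. The catch is that every tool has to write it, and they all have to agree on the list.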

Are there any other options?

I'm assuming those are not the only alternatives. I just haven't thought of any better ones.

Update, morning of Sept 19:

I missed Radu's follow-up note on the DITA TC comment list pointing out that RelaxNG does allow you to say "This element requires children unless @conref is specified." So in theory, another solution would be to throw out support for DTD/Schema and support only RelaxNG for grammar files. On the downside, that would mean many existing DITA tools probably can't support DITA, or have to use their own non-standard DTD/Schema files.

Alternatively, we could also have DTD/Schema require these sub-elements while RelaxNG does not. However, that would probably mean a growing number of DITA documents that are locked into RelaxNG-aware tools; any attempt to use them with other DTD or Schema based tools would immediately throw these parsing errors all over again. So, again … not ideal.

So who do I talk to if I want to make something like this happen?

Find somebody on the OASIS DITA Technical Committee who's willing to take this on for DITA 2.0. (Not me, please.)

Or … join OASIS and offer your own design! We'd love to have you!