Tuesday, July 31, 2012

Case study: DITA topic architecture

This post is part of a series of posts that question some of the claims made about the benefits of DITA adoption.

(Note: DITA uses the word "topic" to refer to a reusable module of content. Out in the rest of the world the word topic tends to refer to an HTML page or a section of a PDF. This may be seen as an infuriating oversight, but I suspect it's actually deliberate. For a while I tried replacing DITA "topic" with "reusable module of content" but I have given up and now just use the one word to mean multiple things.)

I started with DITA several years ago when I got a job in a large doc team that had been using DITA for a while. I inherited a few deliverables and was appalled at the way the content was broken up into concept, task and code sample topics. We optimized for HTML output and there were too many brief HTML pages that users had to click through. Even for a simple idea that could have been covered in one paragraph, these docs would have three topics. For example, a description of how to stop the server would have a concept, task and sample topic, each appearing on separate HTML pages.

My audience was developers. I did quite a lot of usability testing with them and found they were furious about the documentation. They hated having to click through multiple tiny pages. They hated the minimalism and choppiness. They described the docs as unfriendly, officious, insulting and unhelpful.

I looked into chunking (the DITA way to combine multiple topics on an HTML page) but our CMS didn't support it. Even had I been able to use chunking, a topic was defined as a title and body, and frequently the titles would have got in the way of well-formed topics.

I ended up copying and pasting text into new, larger, restructured topics. It was an enormous pain because DITA dictates different elements in the different topic types. (Why oh why could DITA not have just used ul, ol, li and p consistently? Why stick us with steps and cmds and other nonsensical redundant elements? If anyone can answer that, I'd really like to know.)

After a lot of work I created a nice usable, readable help system made up completely of concept topics. My concepts included steps where needed, tables where needed, lots of code samples sprinkled throughout, and lots of sections with section headings. I was writing about advanced topics like cryptography and there was no potential for reuse (as there was no potential for reuse of most of our developer topics - DITA was brought in for the end user team).

I changed companies to another DITA shop and son of a gun, I inherited a gazillion tiny concept, task and reference topics. This time I was working in FrameMaker 9 and it doesn't support chunking either. I had more potential for reuse this time and so had to consider that in my rearchitecture, but I still returned to concepts.

When DITA gets criticized the DITAnistas invariably respond that DITA is not to blame. Another blogger wrote a very erudite post on this topic (The Tyranny of the Terrible Troika: Rethinking Concept, Task, and Reference), and a commenter wrote that DITA is like hamburger: if you don't like the meatloaf, don't blame the meat.

But the poor architecture I inherited is directly the fault of DITA. OASIS creates the DITA spec, and on the OASIS web site (e.g. docs.oasis-open.org/dita/v1.0/archspec/topicover.html), writers are told to create topics that way. In both companies, they simply did what OASIS told them to do.

It's true that I have the flexibility in DITA to organize my topics however I like. (That's nothing special: I could make equally free choices if I wrote in Docbook or raw HTML.) And I am always going to organize topics in the way I think they'll be most useful and attractive to my readers, as well as easiest for my reuse. What I would like is to not have to keep encountering DITA writing that is ugly, difficult for users, and difficult for me to change.


  1. Amen to that...we are embarking on the road to DITA (sometime) and we are all appalled at the requirement to cut everything into tiny little pieces. We have also seen the effect of this slicing and dicing in other information deliverables and the usability, to be frank, sucks. The ability to find useful information quickly is negated and basic navigation is atrocious. You have given me some points to ponder and, hopefully, we can approach this in a more intelligent fashion given some of the caveats you've pointed to.

    Thanks for the blogging...looks like you have some great subject matter to cover!


  2. Hi Andy,

    My first comment on my new blog! Thanks so much. I should have confetti to pour on your head and a big novelty cheque to hand out. Alas.

    Moving to DITA... that's really too bad. I found it took about 15 minutes to get used to authoring in Docbook XML, but I have used DITA for years and am still not used to it. The whole paradigm is wrong-headed. I judge every year in the STC international competition, and I can tell when I read a doc set authored with DITA... it's a real pity what is happening to our profession. (I try to have an open mind but that is my real opinion.)

    Thanks again and I hope you're well!

  3. Hi Ruth,

    I am seeing more and more of the kind of thing you are describing as DITA continues to spread. Just as we saw the "desktop publishing look" in the 80s, so today we are seeing more and more of the DITA publishing look -- and it's not pretty. (I call it Frankenbooks -- http://everypageispageone.com/2012/02/24/frankenbooks-must-die-a-rant/).

    As you say, the DITAnistas will claim that any information design issues are a result of people not doing it correctly, but, as you point out, the problem is so pervasive, and the Frankenbook results so similar, that one cannot help but conclude that the fault lies in DITA itself. Even if it is possible to produce better results from DITA, it clearly requires significant extra work, and significant technical knowledge to do so, and it just should not be that way.

    What I really fear, though, is that the Frankenbook mess that DITA is creating will give topic-based authoring in general a bad name. In the age of the mobile web, we can't keep producing technical documentation in books. We need to create real topics -- Every Page is Page One topics that cover a single subject properly. DITA calls itself a topic-based system, but it is actually a fragment-based system. If the fragments are to be made into real usable topics, that will be entirely through the efforts of the author, hindered rather than helped by DITA.

  4. Hi Ruth

    I find myself disagreeing with just about everything you've written. The nub of the issue is that you seem to feel that standards are constricting your ability to create "useful and attractive" documents. DITA's architecture wasn't just created through some random process designed to make it difficult for authors. The architecture is the result of a scientific approach to information design, and embodies best practice. It is pretty insulting to the technical communicators who have contributed to information typing over the past fifty or so years to describe the fruit of their labour as "poor architecture".

    The term "topic" is short for "topical information unit". You have misunderstood DITA's architecture if you think of a DITA topic as a re-usable module of content. Every element in DITA is re-usable. DITA's use of the term lines up with the widespread meaning of the term in Help authoring.

    If the CMS that you chose doesn't support the chunking that you think you need, that sounds to me like a shortcoming of the CMS, not DITA. If my Web authoring tool doesn't support links, I can't blame the HTML standard!

    If your content is optimised for HTML delivery, then it is likely that most content will be discovered by readers through a search engine. If this is true in your case, the issue of users having to click lots of times to get through small topics is a non-issue. Instead, you will have users having to find a small nugget of information in large Web pages covering many topics of discussion. In my experience, a topic is never too small because there aren't enough words on the page. A topic should be as small as possible, but as large as necessary. I know you said your users "hated minimalism", but did they understand what minimalism is? Do they really have the time to read more than is necessary?

    But let's assume that in your situation, chunking topics together in the deliverable is necessary. I can't see how having compulsory titles on all the chunks is a problem. Back in the 1960s, a methodology called STOP was used, and the title was recognised as perhaps the most important element in a topic. I can't imagine any circumstance in which I would want to omit a title, so was intrigued by your desire to have optional titles.

    As a constructive suggestion... when you have three topics, as you say, that could be covered in one paragraph, deletion should always be considered. Or filtering, so that the two superfluous topics are omitted from the deliverable. Ironically, one of the advantages of information chunked into concept, task, and example, in your case, is that you can very easily omit those information types not appropriate for your deliverable.

    You asked why DITA has step, cmd structures rather than just ol, li structures everywhere. The answer is that semantic mark-up offers document engineering options not otherwise possible. The strict structures also serve as a style guide for authors, to make sure they don't mistakenly describe a system response as a step. Once you've semantically identified the elements of a step, you can choose to filter that structure for different purposes. For example, in your user guide you might have commands and results, but in a checklist you might want to omit the results.

    I hope this comes across as constructive, because that's my intention.
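
The filtering argument in the comment above (commands plus results in a user guide, commands only in a checklist) can be sketched in a few lines. This is only a minimal illustration in Python using the standard library XML parser: the sample task content is made up, and the function names `to_checklist` and `to_user_guide` are hypothetical helpers, not part of DITA or of any DITA toolchain. A real pipeline would typically do this kind of filtering in the publishing transform.

```python
import xml.etree.ElementTree as ET

# Illustrative DITA-style task. The element names (task, steps, step, cmd,
# stepresult) follow DITA; the content itself is invented for this sketch.
TASK_XML = """
<task id="stop_server">
  <title>Stopping the server</title>
  <taskbody>
    <steps>
      <step>
        <cmd>Open the admin console.</cmd>
        <stepresult>The console shows the server status.</stepresult>
      </step>
      <step>
        <cmd>Click Stop.</cmd>
        <stepresult>The status changes to Stopped.</stepresult>
      </step>
    </steps>
  </taskbody>
</task>
"""


def to_checklist(task_xml):
    """Hypothetical helper: keep only the command of each step."""
    root = ET.fromstring(task_xml)
    return [step.findtext("cmd", default="").strip()
            for step in root.iter("step")]


def to_user_guide(task_xml):
    """Hypothetical helper: numbered steps with command and result."""
    root = ET.fromstring(task_xml)
    lines = []
    for number, step in enumerate(root.iter("step"), start=1):
        cmd = step.findtext("cmd", default="").strip()
        result = step.findtext("stepresult", default="").strip()
        # A step without a result still renders cleanly.
        lines.append(f"{number}. {cmd} {result}".strip())
    return lines


if __name__ == "__main__":
    print("\n".join(to_checklist(TASK_XML)))
    print("\n".join(to_user_guide(TASK_XML)))
```

The same source produces both outputs because the result text is tagged separately from the command. With plain ol and li markup, a script would have no reliable way to tell which sentence is the instruction and which is the system response.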


  5. With my background in both Information Mapping and DITA, I offer that people who use information want just enough information at just the right time in just the right form. DITA is an XML mark-up that can help achieve this technically. Organizations should not place the adoption of DITA as an independent goal for it is only a tool for achieving a broader goal. Large technical organizations may need "most of DITA" to achieve their goals, but enterprise authors rarely need "most of DITA" in order to improve the reader experience through single source and multi-channel publishing. It is usually misguided for small tech pubs groups or authors of policies and procedures, marketing materials, compliance information, etc. to focus on learning and using "everything DITA" with the requirement of significant DITA training and the adoption of a technical XML editor. They should be writing with the reader in mind using the tools and paradigms they are already familiar with. Thinking about topics and collections of topics (DITA Maps) is useful for all authors. And DITA is an important authoring tool for large technical organizations. I think that in the future for the enterprise authors who are not technical, DITA will become more of a publishing output rather than an authoring input. They will be able to continue to write productively with both topic-based writing AND the document paradigms that are familiar to them. Organizations need to keep it simple and relevant to various audiences in order to succeed!

    1. In response to Simply Doug (and with your Information Mapping background, Doug, maybe we can guess how close you were to IMI!): I fully agree. I've also spent many years working with and for Info Mapping, and have now been working in a DITA environment for a while. After the flexibility of The [IM] Method I often cringe at how prescriptive DITA can be at the authoring level, and how restricting its rules are - just three topic types? Humbug.

      IM's structured approach, when used properly, offers the author a practical framework in which to work. DITA (probably) offers a better output framework.

      Shouldn't the equation be:
      '(DITA) AND (AN Other)' rather than
      '(DITA) OR (AN Other)'?

  6. Hi Mark,

    I greatly enjoy your blog: your writing is illuminating! I take your point about the value of topic-based writing; I think I have been forgetting that. I have written in a topic-based way since the 90s (mostly in Docbook XML) and once I hit DITA I started to lose faith in it, but should rethink that.

  7. Hi Doug,

    Thanks for the comment. I certainly agree that we shouldn't force our readers to read any more than they need to. Most doc use is by someone in a hurry looking for one particular piece of info, so navigation and brevity are paramount. We also need a lot of chunking within topics to make it easier for them to skim for information.

    In my experience writers don't have any trouble picking up XML. About 12 years ago I worked at a company that migrated from Word to Docbook XML using XMetaL, and none of the writers took longer than half an hour to get up to speed. DITA is a different kettle of fish because of topic types, maps, reltables, lots more tagging, etc etc - not to mention the horrors of using a CMS.

    I agree with you that DITA should be used only by large technical organizations that need to write in that fashion. The problem is that DITA is being trumpeted as a solution for everyone - not just as one that anyone can adopt, but as one that everyone SHOULD adopt. Consequently, it is being used everywhere - and mostly misused. It's a real problem in the technical writing field that DITA has become so pervasive. That's not to say that it is never useful.

  8. This comment has been removed by the author.

  9. This comment has been removed by the author.

  10. I am so happy I found this article. I thought it was just me hating to have to break down content that needs to stick together into a million fragments, and making users go back and forth all the time (some of whom are still using PDF, ha haa).

  11. Thanks for the excellent post Yappa. The replies by Tony and Doug are also very helpful. I am part of a small technical writing team and we are considering DITA. However our manuals are currently structured by menu option (mostly) and each menu option just happens to have four sections that almost exactly match the DITA topic types - concept, task, reference, and sample (where applicable). Customers do not access the content by search engine typically - much more often it is by opening a PDF manual or clicking a context-sensitive help button from the application itself. So perhaps staying in FrameMaker with our current in-built "chunking" is best.

    My own personal DITA rant involves Illustrator, which I don't use too often. I needed to figure out how to draw a circle for a diagram (for the record, you just left-click on the Rectangle tool and a pull-out menu of other shapes appears, one of which is the Ellipse tool, which allows you to draw your circle). If I had an Illustrator PDF manual I could just have looked this up in 5 minutes using bookmarks. But no, I had to use a Search engine, which pulled up hundreds of irrelevant information snippets, none of which explained how to draw a circle. I finally figured it out via a helpful forum post outside of Adobe's website entirely (Ask.com or somewhere). The takeaway lesson is that Search engines are great for some things but terrible for others, and a comprehensive user manual is always good to have around.

  12. Fascinating article, as is Tony Self's response.
    I wonder how much of the problem is due to people refusing to accept that different subjects (and audiences) require different approaches?
    DITA, in the out-of-the-box structure, does not allow for there being multiple methods to achieve a task. In fact, it doesn't even acknowledge the existence of a 'task' and then 'methods'.