Wednesday, October 10, 2012

DITA and the future of tech writing

This post is part of a series of posts that question some of the claims made about the benefits of DITA adoption.

DITA is designed to work with a CMS to create a fully structured tech writing environment. In a full DITA implementation, the process of creating technical documentation is fundamentally different from what is done in a traditional writing department. There are so many variations of tech writing processes that it's impossible to describe either the non-DITA or DITA structure accurately, but (with some trepidation) I'll take a stab at it...

In a traditional setup, at least for documentation that requires a fair amount of specialized knowledge, most members of the doc department are writers. Typically, each writer researches, designs and writes one or more deliverables (and is effectively the project manager for the deliverable). There may be an editor, or the group may rely on peer reviews. There is a manager/team lead, but often the management style is quite flat, with writers making a lot of the decisions on things like style guidelines, user research, and priorities. In a high-functioning team, writers are active in the development teams they work with, contributing to terminology decisions and usability, as well as editing resource strings. The doc department may have a small tools team, or a writer may do double duty on tools maintenance.

By contrast, a DITA implementation is supposed to be more like building a house: writers create the bricks, but other specialists design the house and build it. Writers create small, structured, reusable modules of content. Architects create templates for the modules, and possibly also oversee information mapping of the modules. Map editors create maps that use the modules to produce deliverables. Editors enforce consistency. A team of tools developers maintains the complicated software required by the process. Team leads or architects act as project managers.

Writers must accept that they will spend a higher percentage of their time on tools and bureaucracy than they did in the traditional doc setup. They must also accept that they have much less control over the final output. This fundamental change often results in writers being unhappy about working in DITA, and the DITA literature goes on about how writers must be assimilated and how failures to achieve a return on investment are usually caused by writers having bad attitudes. But stop a moment: when your employees balk at a change, shouldn't you respect their instincts? Unless part of your business model for a DITA transition is that you want to reduce quality for readers, you should at least listen to the people who are responsible for creating that quality.

Instead, DITA proponents say that writers must shape up or change careers. I have heard it put as baldly as that: DITA is sweeping the tech writing field and writers can no longer see themselves as project managers for readers. They are now simply cogs in a wheel. If they don't like it, they won't get hired: they'll have to find a new line of work. The real tragedy of this attitude is that the writers who balk at losing their responsibility are the high-quality, senior ones who are passionate about their readers and have a professional attitude about how they work. Crappy writers will be perfectly happy assimilating to less responsibility. (They might be less happy when they realize that the transition to structured writing means that it will be much easier to ship jobs off-shore.)

DITA is a beautiful solution... if you're trying to document the parts for an airplane. It would also be suitable if you're documenting 50 similar products, each with end user docs that overlap. My problem with DITA is that it has been sold as a general-purpose doc solution. DITA advocates have gone too far in extolling the virtues of DITA, for example by saying that any company that translates content should adopt it.

When people complain about DITA, DITA proponents like to say that it's just a tool: if you don't like the meal, don't blame the knife. But DITA is much more than a tool. It's a tool developed to be used in a particular way, and there's no sense adopting it unless you also adopt the system of structured writing it was created for. The literature about DITA has also created a culture - such as the emphasis on assimilating writers - that permeates many organizations that adopt DITA. And the way DITA is meant to be used, creating reusable modules of content, creates a tendency for doc deliverables to have a certain look and feel. (More on that in another post.)

In fact, DITA is having a profound effect on all aspects of technical writing: on the way we work, the productivity of doc departments, our job responsibilities, and the quality of our output. I know that some call what I'm doing "DITA bashing", but we are past due for a deep reflection on the pros, cons, and appropriate use cases for DITA.

Tuesday, October 2, 2012

DITA ROI: Are translation savings all they seem?

This post is part of a series of posts that question some of the claims made about the benefits of DITA adoption. This post focuses on savings in translation costs.

Articles about DITA ROI make some rather sweeping claims about the money you can save by adopting DITA. One prominent DITA proponent writes, "If you have localization in your workflow, you can probably justify the cost of DITA implementation." I would argue that that claim is false: that most companies that localize their content would never recoup the costs of a full DITA/CMS implementation, and that DITA makes sense mostly in fairly extreme cases such as hardware documentation where there are hundreds of similar versions to be documented.

There are two main claims for translation savings with DITA: topic reuse and reduced post-translation DTP costs.

Topic reuse
First, DITA is supposed to save you money because you can reuse topics. "Write once, use frequently" means that a topic is only translated once. Big savings, right?

Maybe yes, maybe no. Translators use Translation Memory. TM is very sophisticated: each sentence is read into memory, and each sentence is flagged if it is an identical or fuzzy match to a sentence before it. If you repeat a sentence, TM will ensure that it is only translated once.
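To make the idea of identical and fuzzy matches concrete, here is a rough sketch in Python. It is only an illustration: real TM tools use their own segmentation and scoring, and the `classify_segments` function, the use of `difflib`, and the 0.75 threshold are all my assumptions, not how any particular tool works.

```python
from difflib import SequenceMatcher


def classify_segments(sentences, fuzzy_threshold=0.75):
    """Label each sentence 'new', 'fuzzy', or 'exact' against the sentences before it.

    Hypothetical illustration only: the similarity measure and the
    threshold are assumptions, not taken from any real TM product.
    """
    memory = []  # sentences seen so far
    labels = []
    for sentence in sentences:
        if sentence in memory:
            # Identical repetition: a 100% match.
            labels.append("exact")
        else:
            # Best similarity score against anything already in memory.
            best = max(
                (SequenceMatcher(None, seen, sentence, autojunk=False).ratio()
                 for seen in memory),
                default=0.0,
            )
            labels.append("fuzzy" if best >= fuzzy_threshold else "new")
        memory.append(sentence)
    return labels


sample = [
    "Click Save to store the file.",
    "Click Save to store the file.",
    "Click Save to store the document.",
    "Restart the server.",
]
labels = classify_segments(sample)
for label, sentence in zip(labels, sample):
    print(label, "|", sentence)
```

The point is simply that a repeated sentence is recognized wherever it appears in the files you send, whether or not your authoring system treats it as a reused topic.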

There is still a cost for processing a 100% match, but it is minimal. Typically, the cost for identical repetitions is 15% to 30% of the cost of new translation.

What this all means is that if 10% of your topics are currently duplicates of other topics, you are paying the 15% to 30% repetition rate on that 10%, so your translation costs are roughly 1.5-3% higher than if you reused topics.
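The arithmetic behind that estimate can be written out as a small sketch. The function name and the simplification that every topic costs the same to translate are my assumptions; the 10% duplicate share and the 15% to 30% repetition rate are the figures quoted above.

```python
def extra_cost_without_reuse(duplicate_share, repetition_rate):
    """Extra translation cost caused by duplicated topics.

    Returned as a fraction of what it would cost to translate every
    topic as new text. Assumes (for illustration) that all topics are
    the same size and that duplicates are billed at the repetition rate.
    """
    return duplicate_share * repetition_rate


# 10% of topics are duplicates; repetitions are billed at 15% to 30%.
low = extra_cost_without_reuse(0.10, 0.15)
high = extra_cost_without_reuse(0.10, 0.30)
print(f"Extra cost: {low:.1%} to {high:.1%}")
```

Plug in your own duplicate share and the repetition rate your vendor actually charges; the result is the most that topic reuse alone could save you on translation.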

Note: You can get some additional savings from DITA with a CMS by transforming your ditamaps into an interchange format called XLIFF before sending them to the translator. This is a pretty complicated procedure; have a look at this link to see if your organization can handle it. (And I remain somewhat confused about XLIFF: my friend who runs a large translation company says, "Since our CAT tool can handle XML directly, it’s not necessary to go through the migration process into .xliff format.")

Keep in mind that the savings from topic reuse only apply to topics that you are currently maintaining in duplicate places. If you decide to start reusing other topics in more places, that could arguably improve your quality, but it does not improve your ROI. (Plus, I argued in another post that the reuse following DITA adoption is often actually harmful to reader usability: link)

It is true that translation costs rise for reused text when it gets out of sync - when different locations are updated differently. It is always a good idea before sending things for translation to spend some time preparing the files; syncing duplicate content should be part of that check, when it occurs. But even when translators get different versions of dupes, they charge less for fuzzy matches, so the price is not the same as translating the section twice.

My point here is about ROI, not how to write. I am not arguing that cutting and pasting content is good practice. But for many writing teams there is not so much duplication that there's any problem keeping up with it, and if there is, then there are many other systems that provide excellent mechanisms for reusing topics, including DocBook XML, other forms of XML, and MadCap Flare. An extremely expensive full-blown DITA implementation with a CMS is not the only way to reuse topics - and for many organizations, it is not the best. (More on that in a later post.)

Post-translation DTP costs
DITA is supposed to save you money because in other systems, work has to be done after translation. One prominent DITA proponent claims, "Typically, 30–50 percent of total localization cost in a traditional workflow is for desktop publishing. That is, after the files are translated from English into the target language, there is work to be done to accommodate text expansion and pagination changes."

This is a valid point, except that it doesn't state its assumption that you are using bad practices. When you start to localize your DTP content you should remove manual formatting and rely on styles instead. In addition, you can't use formatting that will cause problems in languages that have longer words or are more verbose. This means: stop adding manual page breaks, stop using format overrides (FrameMaker 10 provides an easy way to find and remove overrides), stop putting section headers in the margin, stop setting manual cell heights in tables, stop using forced line breaks (Shift-Enter).

These practices will hugely reduce the post-translation DTP costs (certainly to way less than the stated 30-50%, although there is still a per-page DTP fee). When we talk about the advantages of DITA, we assume people are using good practices; we shouldn't assume that the alternatives are created with bad practices.

Articles about DITA ROI often give you rules of thumb to use in your calculations. Their claims are almost always based on an unstated assumption that your current authoring environment is the most inefficient one possible, and even then their claims can be over the top. It is prudent to ignore this advice and instead go to your translation vendor to find out what your cost savings might be. I have become friendly with the managing director of a translation vendor I once worked with, and he assures me that translation cost is virtually the same when the source is DITA, DocBook, other forms of XML, Flare's XHTML, HTML, etc.

I have spoken with doc teams who are planning to move from DocBook XML to DITA simply because they are confused by these DITA ROI articles and think that the massive translation savings will apply to them. This is not a trivial issue. DITA proponents should be much more precise in the claims they make about DITA cost savings, and doc departments should be much better educated before jumping on the DITA bandwagon.

Note: I'm uneasy about quoting individuals. It isn't fair to single out any particular DITA proponents on how they justify DITA ROI, as many DITA proponents are saying similar things. In addition, I don't mean to impugn the motivations of anyone.

Update: I have a growing unease about quoting people and then knocking down what they say. I have now removed links to DITA proponents I quote. In later posts, I may even stop quoting.