
Sunday, March 23, 2014

DITA in times of contraction

When Pubs managers decide to move their doc content to DITA, all they see is the savings. It's ROI, ROI, ROI. "You have to spend to save," they argue, and they often start spending hundreds of thousands of dollars on software purchases, training, and non-writing personnel. All that's fine when a company is growing and has loads of cash, but what risks are Pubs managers exposing themselves to if the company hits bad times?

As I have argued before, in many cases DITA doesn't so much save money as redistribute it. Where before you spent the lion's share of your doc budget on salaries for writers, now you're spending the most money on tools developers, information architects, editors, and software.

I'll give you an example: I once worked in a DITA shop where a team of 11 writers was overseen by a manager, three team leads, three editors, and two information architects; and it was supported by nearly a dozen tools developers. There were almost twice as many non-writers working on the documentation as writers (and yet writers had to fill out complicated forms for the editors, as well as project-manage the localization of their docs). The CMS was enormously expensive, and then the CMS vendor end-of-lifed our database so we had to spend a pant-load on a new one, including two years of research, planning, tools redevelopment, migrating, and tweaking the migration.

In a DITA shop, teams become complexly interdependent. Much effort is expended on assimilating writers so that they give up their holistic approach to writing, and accept their role in a DITA production line that starts with information developers; relegates writers to the role of filling in content in designated, pre-typed topics; and ends with editors. As it was explained to me, the writer must learn to pass the baton. DITA proponents argue that writers who can't assimilate should be fired.

The CMS and publishing tools are so complex that nothing can be published without the help of a large team of tools developers. In addition, the complex new processes and corresponding bureaucracy require training (and hence trainers) before new writers can become productive.

Now imagine that the company has a profit dip and needs to cut costs. Who and what is expendable?

Before, you had a team of writers, and if the company got into trouble you could lay off a few with minimal impact. But now, if you have to contract your Pubs department, you're in a pickle. The information typing process relies on so many non-writers that it seems inevitable that when a company is in decline, a DITA shop is going to have to give up more writers than a non-DITA shop.

That fragile CMS doesn't run itself, and keeping it going requires expert skills: you're going to have to keep most of your tools developers unless you want to give up publishing documentation altogether. It's probably not possible to give up the expensive maintenance plan for the CMS, either.

Your complex processes are going to continue to require the trainers, team leads, information architects, and editors.

In short, you're left with an expensive behemoth that can't be easily dismantled... unless you decide to ditch DITA altogether and migrate to a simpler solution.

The risk of DITA is acceptable when there is real justification for adopting it: when there is a genuine need for reuse, when translation savings can't be garnered by a simpler alternative like Docbook XML or Madcap Flare, or when you absolutely need to enforce strict information typing on writers. The problem is that nearly all outfits that are adopting DITA do not have that justification. They're wasting money on DITA, and that could get them into trouble when the cash stops flowing.

Sunday, September 8, 2013

The lesson of databases: use only when necessary

The current CIDM newsletter has an article about content management systems: Why do organizations hate their content management system?

The article is scathing about companies that make bad purchasing decisions, and scathing about CMS vendors that make difficult, bloated products.

But the article is missing something important. The fact is that relational databases are notoriously difficult to use. I spent over ten years documenting databases, so I've seen some of the messy innards first hand. You don't just buy an RDBMS and then figure out how to use it. You need Database Administrators with a lot of skills. You need to make an ongoing investment of money and time just to keep the thing working.

When I worked in the IT department of a large financial firm, there was a prohibition on databases. We used Excel in very sophisticated ways. We transferred millions of text files a night. But we avoided RDBMSs at all costs. Apparently the company had been burned badly by a database implementation and was unwilling to try again.

The CMSs used by doc teams present extra challenges. At one DITA shop where I worked, our CMS vendor decided to deprecate documentation use of their CMS and end support for our application of it. That meant that we had to spend an enormous amount of time and expense to choose a new CMS and get it set up. The cost must have been in the hundreds of thousands of dollars, none of which was figured into the initial ROI for moving to DITA (which we had done just a couple of years before).

I would admonish documentation departments to avoid using a CMS unless they really need it. As with DITA, it makes no sense to take on the enormous expense, steep learning curve, extra manpower requirements, and ongoing hassles - unless you really need it. "Really needing it" means that simpler options won't work for you. The complexity of reuse in most doc departments doesn't come close to justifying the enormous expense.

Tuesday, October 2, 2012

DITA ROI: Are translation savings all they seem?

This post is part of a series of posts that question some of the claims made about the benefits of DITA adoption. This post focuses on savings in translation costs.

Articles about DITA ROI make some rather sweeping claims about the money you can save by adopting DITA. One prominent DITA proponent writes, "If you have localization in your workflow, you can probably justify the cost of DITA implementation." I would argue that that claim is false: that most companies that localize their content would never recoup the costs of a full DITA/CMS implementation, and that DITA makes sense mostly in fairly extreme cases such as hardware documentation where there are hundreds of similar versions to be documented.

There are two main claims for translation savings with DITA: topic reuse and post-translation DTP costs.

Topic reuse
First, DITA is supposed to save you money because you can reuse topics. "Write once, use frequently" means that a topic is only translated once. Big savings, right?

Maybe yes, maybe no. Translators use Translation Memory. TM is very sophisticated: each sentence is read into memory, and each new sentence is flagged if it is an identical or fuzzy match to a sentence already in memory. If you repeat a sentence, TM will ensure that it is only translated once.
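
To make the idea concrete, here is a toy sketch in Python of a TM-style leverage check. It is purely illustrative - the segments and translations are made up, and real CAT tools use far more sophisticated segmentation and matching - but it shows the exact-match / fuzzy-match / new-text classification that drives pricing.

```python
from difflib import SequenceMatcher

# A tiny "translation memory": source segments that have already been translated.
# (Made-up example sentences; a real TM stores thousands of segment pairs.)
memory = {
    "Click Save to store your changes.":
        "Cliquez sur Enregistrer pour stocker vos modifications.",
    "Restart the server to apply the settings.":
        "Redémarrez le serveur pour appliquer les paramètres.",
}

def classify(segment, memory, fuzzy_threshold=0.75):
    """Classify a new source segment the way a TM leverage report would."""
    best = max((SequenceMatcher(None, segment, s).ratio() for s in memory), default=0.0)
    if best == 1.0:
        return "100% match (billed at a small fraction of the new-word rate)"
    if best >= fuzzy_threshold:
        return "fuzzy match (billed at a reduced rate)"
    return "new text (billed at the full rate)"

print(classify("Click Save to store your changes.", memory))   # identical repetition
print(classify("Click Save to keep your changes.", memory))    # near-duplicate
print(classify("Delete the temporary files.", memory))         # genuinely new
```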

There is still a cost for processing a 100% match, but it is minimal. Typically, the cost for identical repetitions is 15% to 30% of the cost of new translation.

What this all means is that if 10% of your topics are currently duplicates of other topics, your translation costs are only 1.5-3% higher than they would be if you reused those topics.
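
If you want to sanity-check that arithmetic, here is the back-of-envelope version in Python. The 10% duplication figure and the 15-30% repetition rates are the illustrative numbers from above, not data from any particular project.

```python
def extra_cost_without_reuse(duplicate_share, repeat_rate):
    """Extra translation spend, as a fraction of the all-new-text cost, when
    duplicated content is sent to the translator instead of being reused."""
    with_reuse = 1.0 - duplicate_share                            # only the unique text is translated
    without_reuse = with_reuse + duplicate_share * repeat_rate    # duplicates billed at the repeat rate
    return without_reuse - with_reuse

for rate in (0.15, 0.30):
    extra = extra_cost_without_reuse(duplicate_share=0.10, repeat_rate=rate)
    print(f"repetition billed at {rate:.0%}: reuse saves about {extra:.1%} of the budget")
# -> roughly 1.5% and 3.0%, the range quoted above
```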

Note: You can get some additional savings from DITA with a CMS by transforming your ditamaps into an interchange format called XLIFF before sending them to the translator. This is a pretty complicated procedure; have a look at this link to see if your organization can handle it. (And I remain somewhat confused about XLIFF: my friend who runs a large translation company says, "Since our CAT tool can handle XML directly, it’s not necessary to go through the migration process into .xliff format.")

Keep in mind that the savings from topic reuse only apply to topics that you are currently maintaining in duplicate places. If you decide to start reusing other topics in more places, that could arguably improve your quality, but it does not improve your ROI. (Plus, I argued in another post that the reuse following DITA adoption is often actually harmful to reader usability: link)

It is true that translation costs rise for reused text when it gets out of sync - when different locations are updated differently. It is always a good idea to spend some time preparing files before sending them for translation; syncing duplicated content should be part of that check, when it applies. But even when translators get slightly different versions of duplicated text, they charge less for fuzzy matches, so the price is not the same as translating the section twice.

My point here is about ROI, not how to write. I am not arguing that cutting and pasting content is good practice. But for many writing teams there is not so much duplication that there's any problem keeping up with it, and if there is, then there are many other systems that provide excellent mechanisms for reusing topics, including Docbook XML, other forms of XML, and Madcap Flare. An extremely expensive full-blown DITA implementation with a CMS is not the only way to reuse topics - and for many organizations, it is not the best. (More on that in a later post.)

Post-translation DTP costs
DITA is supposed to save you money because in other systems, work has to be done after translation. One prominent DITA proponent claims, "Typically, 30–50 percent of total localization cost in a traditional workflow is for desktop publishing. That is, after the files are translated from English into the target language, there is work to be done to accommodate text expansion and pagination changes."

This is a valid point, except that it rests on an unstated assumption: that you are using bad practices. When you start to localize your DTP content, you should remove manual formatting and rely on styles instead. In addition, you can't use formatting that will cause problems in languages that have longer words or are more verbose. This means: stop adding manual page breaks, stop using format overrides (FrameMaker 10 provides an easy way to find and remove overrides), stop putting section headers in the margin, stop setting manual cell heights in tables, and stop using forced line breaks (Shift-Enter).

Following these practices will hugely reduce post-translation DTP costs (certainly to way less than the stated 30-50%, although there is still a per-page DTP fee). When we talk about the advantages of DITA, we assume people are using good practices; we shouldn't assume that the alternatives are created with bad practices.

Conclusion
Articles about DITA ROI often give you rules of thumb to use in your calculations. Their claims are almost always based on an unstated assumption that your current authoring environment is the most inefficient one possible, and even then their claims can be over the top. It is prudent to ignore this advice and instead go to your translation vendor to find out what your cost savings might be. I have become friendly with the managing director of a translation vendor I once worked with, and he assures me that translation cost is virtually the same when the source is DITA, Docbook, other forms of XML, Flare's XHTML, HTML, etc.

I have spoken with doc teams who are planning to move from Docbook XML to DITA simply because they are confused by these DITA ROI articles and think that the massive translation savings will apply to them. This is not a trivial issue. DITA proponents should be much more precise in the claims they make about DITA cost savings, and doc departments should be much better educated before jumping on the DITA bandwagon.

Note: I'm uneasy about quoting individuals. It isn't fair to single out any particular DITA proponents on how they justify DITA ROI, as many DITA proponents are saying similar things. In addition, I don't mean to impugn the motivations of anyone.

Update: I have a growing unease about quoting people and then knocking down what they say. I have now removed links to DITA proponents I quote. In later posts, I may even stop quoting.

Friday, September 28, 2012

The True Costs of DITA Adoption

For anyone who disagrees with this post: please leave a comment or send me an email. If I am incorrect about something, I will modify the post so that I am not spreading misinformation. And I would love to have a dialog on the topic.

There are many articles that advise doc managers about how to calculate return on investment for potential DITA adoption. Most of these articles seem to be written by consultants who make money by helping companies set up DITA systems: they have a vested interest in making DITA look beneficial. Also, they tend to help out during the initial migration and might not be around when some of the costs kick in: they simply might not be aware. Finally, they might deal mostly with large companies where large expenditures do not seem excessive. For whatever reasons, the literature seems to be underestimating the true cost of moving to DITA.

There can be a lot of costs related to DITA adoption. Here are some that might affect you. (Different implementations will vary somewhat.)

Most of us know about the cost of a CMS, which can set you back over $250,000 (or might be a lot less). You can do without the CMS, but DITA is designed for use with a CMS and you need one to get the full benefits. But the CMS is just the start.

In the early stages of your migration to DITA, you will likely need to hire DITA specialists (the consultants I mention above) to help you plan and set up your system.

You'll likely need more in-house tools developers, and you may need developers with different skills than you currently have. This is not just to set up the new publishing system and so on, but also to troubleshoot publishing problems, adapt to new browser versions, and address all the bugs and glitches. In my experience, all sorts of problems crop up with the relational database (CMS), and also with the ditamaps, the formatting of outputs, and many other things. Part of the problem is that the DITA Open Toolkit is notoriously difficult (and could require extra expense for things like a rendering engine). Some of the tools designed to work with DITA are arguably not quite up to speed yet. Your tools developers will also spend a lot more time helping writers.

(If you don't move to a CMS, but use something like FrameMaker and WebWorks ePublisher, you may find that you have a lot more headaches in producing docs without much in the way of DITA benefits.)

You need extremely skilled information architects to create a reuse strategy, engage in typing exercises, and design and edit your ditamaps. This isn't a skill set that people currently in your organization can easily acquire. Even most information architects have trouble adequately mapping topics. For a discussion of the sorts of challenges they'll face, see this series of posts; the moral is: if you don't have skilled architects working on your system, you may end up with Frankenbooks that are not particularly useful for your readers. I raise some additional topic reuse problems in an earlier post: link.

You'll need to spend significant time developing new processes, policies, and internal instruction manuals.

Your team will have to undergo intensive training. In coming years, new writers you hire will also need training. I have found that writers can move to Docbook XML with very little training, but DITA requires a great deal of training, not just for the CMS, but also to learn how to use ditamaps, reltables, and so on.

The migration of content will likely be quite time-consuming, with manual work required to correct tagging that doesn't convert automatically, to build the maps, and to perform a complete in-depth edit.

Your writers will need to spend more time on non-writing activities. This can greatly reduce their productivity. Working with a relational database, especially an opaque one like a CMS, is much more time consuming than checking files out of a version control system. Creating reltables is a lot more work than adding links. Coordinating topics is a lot more work than designing and writing a standalone deliverable. Plus, there is a lot more bureaucracy associated with DITA workflows.

With most DITA implementations, topics exist in a workflow that only starts with the writer. You'll probably need more editors and more software.

You'll also probably need more supervisors. The DITA literature emphasizes the importance of assimilating writers to the new regime and then monitoring their attitudes. Pre-DITA, writers were project managers for their own content; with DITA they have to learn to hand that responsibility off to others.

There are some organizations, such as ones that have to cope with hundreds of versions of a hardware product, that have a clear ROI for DITA. But many (most?) organizations could find that DITA doesn't so much save money as redistribute money. Where before you spent the lion's share of your doc budget on salaries for writers, now writer salaries will be a much smaller proportion of your budget. In many cases, companies could find themselves facing higher costs than pre-adoption: they will never see a return on their investment. And given the complexities of using DITA, its ongoing hassles, and its escalating costs, some companies are going to find themselves having to ditch DITA and go through an expensive migration to another system.

Sunday, September 16, 2012

Musing about topic reuse

I first started writing in a dot command language called Script. Later I used TROFF and then LaTeX. But eventually the WYSIWYG editor was born (I had mixed feelings about it at first), desktop publishing applications and laser printers appeared, and Macs hit the market - and typefaces hit the world with a bang.

Tech writers, wanting to show off what they could do, started using typefaces like there was no tomorrow. Some manuals were so busy that it seemed like every word was italic, bold, colored, or in a different font altogether. They were hard to read.

(We still overdo typefaces to a certain extent. I would like to use bold only to designate words that I want to "pop" off the page, and not for UI controls and so on... but that appears to be a battle I have lost.)

Nowadays I wonder if topic reuse is a bit like those heady days of typefaces. When we say, "my doc set has topic reuse of 30%," that doesn't mean "my doc set needs to have topic reuse of 30%." There is a sense of "Topic Reuse Good, No Topic Reuse Bad." There is a need to justify spending a lot of money for tools like CMSs that facilitate topic reuse.

I have also noticed a tendency among some writers to pad out their deliverables with other people's topics when it isn't really helping the reader. When working in large writing departments, I have found my topics in odd places where a link would be more useful. In one instance I took a deliverable that was 50 pages in PDF form and deleted the reused topics, creating a much more focused, useful doc that fit on one HTML page. The reused topics in that example were actually harmful because they were pulled out of context. I know, topic-based writing isn't supposed to have context, but in many cases it does, especially with complex subjects.

The problem is that writers are given latitude - and are even pressured - to reuse topics when there is no clearly defined reuse strategy. In fact, I have never seen a well-articulated content reuse strategy. You see descriptions of the mechanics of reuse, like this one or this one, but they don't provide guidance on why to reuse topics and which topics to reuse.

Sometimes topic reuse makes no sense, like a doc set I once saw that repeated a large set of introductory topics at the beginning of every deliverable, which to my mind just clogged up the user experience. Even worse, a decision was made to include the topics in the HTML and not the PDF - the reasoning being that the topics weren't really needed and would add to printing costs - which confused readers about whether the two outputs were two different sets of content.

Sometimes topic reuse strategies are ill-considered, such as trying to use doc topics in training material. Docs and training require such different styles of writing that reusing one for the other can result in really bad output, and it only seems to make sense when cost-saving requires highly compromised training materials. (Which, in that case, is fine, as tech writing is necessarily all about doing the best we can with available resources.)

Sometimes topic reuse becomes a sort of mania, an end in itself. I once saw a DITA topic that single-sourced a table describing the fields in a screenshot. There were three versions of the screenshot, each a completely different UI screen, each marked with attributes for a different product. In the table, two or three rows were not conditionalized. About a dozen other rows each appeared three times, marked with attributes for the three different products. Updating the table was a nightmare, as you might imagine.

This is not to say that topic reuse is not useful. Anyone who has had to modify the same text in two places knows how important topic reuse is. But I have never documented hardware, so I have never worked on a doc set that required a lot of topic reuse. Consequently, approaches like this one, in a department where 48 writers produce 14,000 publications, were not even remotely applicable. (I have worked in departments with that many writers, but never with even one percent of that many publications. It is my contention that that is not the norm.)

My preferred approach would be to reuse topics where previously you would have cut and pasted the content; otherwise, a writer would be required to make a case for reusing a topic. My reasoning is that we must never force the reader to read anything more than they need to read. (Unfortunately, minimalism is often thought of in terms of our convenience and costs rather than in terms of reader usability, as it should be.)

We are in danger of reusing too much because reuse is easy and because there are non-reader incentives to reuse, as described above. The problem with my approach is that it could keep the stats on reuse low, which wouldn't help with proving ROI for the CMS or other tools that were bought with reuse as a justification. But it would help avoid a tendency to go hog-wild and reuse when it's detrimental to readers.