Continuing my series of posts about Word Automation Services, I wanted to talk specifically about the things we did around two of the focus areas for the first release: performance and scale.
The goals for the service
One of the overall goals for Word 2010, across both the client and the server, was ensuring that we built a version of Word that was better and faster than anything we've previously released. When we started building a server-based solution for manipulating Word documents, we took that message around performance to heart – it was clear that one of our primary objectives had to be ensuring that the service could scale to "server-like" loads, something that the previous "solution" of simply running the client was ill-equipped to do, as it was optimized to be run on an interactive desktop by a single user. That goal meant answering a few important questions:
- How can we improve the raw speed of a server-based component, given that we know exactly what task we're performing (e.g. converting a document), that we're not being run on an interactive desktop, etc.?
- How can we scale beyond a single instance, to work well in environments where the number of CPUs/machines is the scale factor (and certainly not the raw speed of a single CPU)?
- How can we handle significant "peaks" of input, given that even the fastest engine we could build would be unlikely to keep up all of the time?
- What other assumptions does the client application make that don't apply to the server?
The answers to those questions resulted in work that fell into three distinct buckets: raw performance improvements, reducing resource contention, and the creation of a persistent queue.
Raw Performance Improvements
The first set of improvements for Word Automation Services focused on its "raw speed" – how fast the service could process a single file. Our plan here primarily focused on answering the question: What does the desktop version of Word need to do that we don't need to do on the server? Each answer to that question gave us something to focus on removing from the service, improving its performance characteristics.
This meant doing an inventory of Word, of sorts, and realizing that we didn't need things ranging from the incredibly obvious (the Ribbon and other UI-related code) to the obscure (querying the registry for the friendly name of embedded objects, which the desktop client does so you can see it in the status bar when the object gets focus). It also meant revisiting assumptions as basic as needing to try to update every field in the document; given that a server process operates in a restricted-rights environment without access to remote files, the registry, or a user identity, we can eke out small gains by not updating INCLUDETEXT/AUTHOR/etc. fields at all.
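To make that concrete, here's a hypothetical sketch in C# (emphatically not Word's real internals – the enum and method below are invented for illustration) of the kind of policy the previous paragraph describes: field types whose results depend on remote files, the registry, or a user identity simply aren't attempted in the restricted-rights server process.

// Hypothetical illustration only: field names mirror Word field codes,
// but the types and logic here are not the service's actual implementation.
enum FieldKind { IncludeText, Author, Page, NumPages, Toc }

static class FieldUpdatePolicy
{
    // On the server there is no remote-file access, registry, or user
    // identity, so fields that depend on them are skipped entirely.
    public static bool ShouldUpdate(FieldKind field, bool runningOnServer)
    {
        if (!runningOnServer)
            return true; // Desktop Word tries to update everything.

        switch (field)
        {
            case FieldKind.IncludeText: // needs access to a remote file
            case FieldKind.Author:      // needs the current user's identity
                return false;
            default:
                return true;            // layout-driven fields still update
        }
    }
}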
In the end, we were able to create an engine that is between 10% (DOCX->DOCX) and 30% (DOCX->PDF) faster than the desktop application on similar hardware when performing the actions supported by the service (document conversion). Our focus on a few core scenarios enabled us to optimize the engine for exactly those tasks.
Reducing Resource Contention
If you've ever tried to use the desktop version of Word for server-side automation, I'm sure you've run into the traditional problems with this approach: error dialogs complaining that "normal.dot is in use", severe slowdowns when multiple processes are running, etc.
When we set out to build a server-ready version of Word, it was clear that this class of issues was something we had to tackle – the service needed to be able to scale efficiently to machines with 8 cores of processing power (high-end today, widely available in the not-so-distant future).
This meant a long process of measurement and analysis in which we looked at our scale barriers (GDI contention, disk contention, etc.) and worked through them one-by-one – doing things like making sure we never depended on a disk-based resource (temporary files, etc. needed to be memory based), as well as optimizing our use of system-wide resources like GDI locks.
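The service's internals aren't public, but the "never depend on a disk-based resource" principle is easy to illustrate. A minimal sketch, assuming a conversion step that needs scratch space: give each worker its own in-memory buffer rather than a temporary file, so parallel conversions never contend for the disk.

using System.IO;

static class ScratchSpace
{
    // Illustrative only: this is not the service's real code.
    // Instead of a temp file on disk -- a contention point when many
    // conversions run at once -- each worker gets its own in-memory buffer.
    public static Stream Create()
    {
        return new MemoryStream();
    }

    public static Stream CreateDiskBacked()
    {
        // The contended alternative: a real temporary file on disk.
        string path = Path.GetTempFileName();
        return new FileStream(path, FileMode.Open, FileAccess.ReadWrite,
                              FileShare.None, 4096, FileOptions.DeleteOnClose);
    }
}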
This work didn't make Word faster, but it did result in a service that scales linearly up to four simultaneous conversions on a single machine, and which can be scaled out among many machines – a significant improvement over desktop Word, and one we'll continue to build on in future versions.
Creating a Persistent Queue
Even with all of those improvements in place, it was obvious that our service would often be unable to keep up with incoming requests – if you ask to convert 10,000 Word documents to PDF, even the fastest engine needs some time to process that workload.
To handle this, we built the service around a queue, enabling us to receive peaks of work and process them as resources allowed. Knowing that we're processing arbitrary input documents, we then went a step further and made this queue persistent, so that a single rogue document, machine hiccup, etc. didn't cause a job of thousands of items to stop mid-processing with no clear indication of what was completed and what was not.
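From the consumer's side, that queue is what you're adding to when you use the ConversionJob object model. Here's a minimal sketch of queuing a conversion and checking on it later; the site URL, document URLs, and the "Word Automation Services" proxy name are placeholders for your own farm, and error handling is omitted.

using System;
using Microsoft.Office.Word.Server.Conversions;
using Microsoft.SharePoint;

class ConvertToPdf
{
    static void Main()
    {
        // Placeholder site URL and proxy name -- adjust for your farm.
        using (SPSite site = new SPSite("http://contoso/sites/docs"))
        {
            ConversionJob job = new ConversionJob("Word Automation Services");
            job.UserToken = site.UserToken;
            job.Settings.OutputFormat = SaveFormat.PDF;

            // Queue a single file; AddFolder can queue an entire library.
            job.AddFile("http://contoso/sites/docs/Shared Documents/report.docx",
                        "http://contoso/sites/docs/Shared Documents/report.pdf");

            // Start() only enqueues the work; the timer job picks it up later.
            job.Start();

            // Later (or from another process), check how the job is doing.
            ConversionJobStatus status = new ConversionJobStatus(
                "Word Automation Services", job.JobId, null);
            Console.WriteLine("Succeeded: {0}, Failed: {1}, In progress: {2}",
                status.Succeeded, status.Failed, status.InProgress);
        }
    }
}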
We'll be publishing more precise data on how the server scales both up and out as part of Capacity Planning guidance for SharePoint 2010; laying a solid foundation here was definitely one of our goals.
- Tristan