This is the first in a two-part series about document generation outside and inside of SharePoint. This post describes the pattern and a standard approach to solving the general problem of document generation using Open Xml and Word Content Controls. Enjoy reading about what we’ve learned on projects in this area, along with Pete’s interesting tone in this article.
The Problem
I don’t want to get off on a rant here, but can we solve this document generation problem once and for all? Ever since Gutenberg moved his first set of metal letters, business users have been asking for documents to be created faster and faster. Johannes probably thought he addressed all of the challenges, but even recently, if a business process required the generation of documents in a dynamic fashion, there were several challenges to overcome including:
- Server-side document generation
- Template and data mapping
- Configuration and flexibility
Now, Heraclitus might have said you can never step twice in the same river, but he certainly would understand every programmer’s desire to stay DRY. We want to solve these kinds of problems once, right? For those of us who use Microsoft technologies to address business problems, the first two challenges have been addressed in recent years by Open Xml and Word Content Controls. There are many excellent resources and samples of how to use these two advances to solve simple document generation concerns (see References and Credits below). However, several of our clients at ThreeWill over the last two years have asked for solutions that address the last challenge: integrate these two advances, and make the solution flexible through configuration.
Recognizing the Pattern
Every document generation solution has some set of data that needs to be applied to a template or templates on a repetitive basis. It’s not like we need Rorschach to help interpret this pattern: the data is our Model, the template is our View, and we need to create a Controller to manage the generation activity. Even my broken proboscis smells the MVC pattern. To solve this problem once, we want the data to be able to come from a variety of places, including an Office client application like Excel, a database, a web service, SharePoint, etc. The templates (all Word files with Content Control placeholders in our present case) should be able to come from local files, a file share, a SharePoint document library or a database. The only thing missing is a Controller implementation to merge the model and the view in a generic way.
The Model
To ensure our document generation framework can handle a wide variety of data, our model is based on several simple interfaces. The simplest way to get a consistent model that would support multiple document types was to use basic .NET types that map single values and table- or list-based values. I hear you saying, “Nice job, Einstein. Some surfer solves physics’ most difficult problem and poses a theory of everything, and the best you can come up with is a couple of interfaces?” Yes, they are simple, but do we really need a Hadron collider to generate documents?
The simple interfaces enable us to build up from a simple mapping of a ContentControlName (based on the tag in the Content Control in the Word Document). The first interface, IIndividualContentMapping, identifies a simple mapping of one tag name to a value, and can then be used to replace all instances of ContentControls with that tag.
public interface IIndividualContentMapping : IContentMapping
{
    string Value { get; }
}
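To make that concrete, here is a minimal sketch of how this interface might be implemented. Two things are assumptions on our part: the base IContentMapping interface is not shown in this post, so the sketch guesses that it exposes only the ContentControlName the builder code reads, and the constructor on IndividualContentMapping is purely illustrative.

```csharp
using System;

// Assumed shape of the base interface: it is not shown in the post, but the
// builder code reads ContentControlName from every mapping.
public interface IContentMapping
{
    string ContentControlName { get; }
}

public interface IIndividualContentMapping : IContentMapping
{
    string Value { get; }
}

// Hypothetical implementation: one tag name paired with one replacement value.
public sealed class IndividualContentMapping : IIndividualContentMapping
{
    public IndividualContentMapping(string contentControlName, string value)
    {
        if (contentControlName == null)
            throw new ArgumentNullException(nameof(contentControlName));

        ContentControlName = contentControlName;
        // Treat a missing value as empty text so the placeholder is cleared.
        Value = value ?? string.Empty;
    }

    public string ContentControlName { get; }
    public string Value { get; }
}
```

A mapping such as new IndividualContentMapping("CustomerName", "Contoso") would then stand for every Content Control tagged CustomerName in the template.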
The second interface, ITableContentMapping, defines a map of a List of tags (Mapping) as a template row for a table. This template row mapping is used to copy a prototype row and then use the List of rows (Value) to build a table.
public interface ITableContentMapping : IContentMapping
{
    List<List<string>> Value { get; }
    List<string> Mapping { get; }
}
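As a rough sketch of how a table mapping could be populated, the class below assumes that each row is a list of strings whose positions line up with the tags in Mapping. That element type, the one-line IContentMapping stand-in, and the constructor are our assumptions for illustration, not something the framework code shown here confirms.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Assumed base interface (not shown in the post).
public interface IContentMapping { string ContentControlName { get; } }

public interface ITableContentMapping : IContentMapping
{
    List<List<string>> Value { get; }   // assumed: one inner list per table row
    List<string> Mapping { get; }       // tags in the template row, in column order
}

// Hypothetical implementation that checks the rows line up with the template row.
public sealed class TableContentMapping : ITableContentMapping
{
    public TableContentMapping(string contentControlName,
        IEnumerable<string> mapping, IEnumerable<IEnumerable<string>> rows)
    {
        ContentControlName = contentControlName;
        Mapping = mapping.ToList();
        Value = rows.Select(row => row.ToList()).ToList();

        // Each row must supply exactly one value per tag in the template row.
        if (Value.Any(row => row.Count != Mapping.Count))
            throw new ArgumentException(
                "Each row needs one value per mapped tag.", nameof(rows));
    }

    public string ContentControlName { get; }
    public List<List<string>> Value { get; }
    public List<string> Mapping { get; }
}
```

Checking the row width up front is a design choice in this sketch: it surfaces a mismatched data set when the model is built rather than halfway through generating a table.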
The View
No, we are not talking about the show where the hosts’ opinions are worth less than a collateralized debt obligation containing a pool of Florida, Arizona and Southern Californian mortgages. We are talking about the documents that serve as the view in our pattern implementation: in this case, templatized Word documents that contain ContentControls. These content controls are mapped (by convention) to the ContentControlNames in the Model’s IContentMapping interface. The two interfaces mentioned before, IIndividualContentMapping and ITableContentMapping, let us map our model into our view, both for simple values and for more complex tables or lists. Each template is exposed through the IDocumentTemplate interface.
public interface IDocumentTemplate
{
    Uri FileUri { get; }
    Byte[] ToArray();
    string GetTemporaryFileName();
}
This interface definition allows us to rely on a consistent API of getting a byte[] that can be used to seed a memory stream for building documents using the OpenXml SDK.
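For example, a template that lives on the local file system might be wrapped like this. The LocalFileDocumentTemplate class is hypothetical, and our reading of GetTemporaryFileName (a scratch path with the same extension as the template) is a guess at the intent rather than the framework's actual behavior.

```csharp
using System;
using System.IO;

public interface IDocumentTemplate
{
    Uri FileUri { get; }
    byte[] ToArray();
    string GetTemporaryFileName();
}

// Hypothetical template backed by a file on disk.
public sealed class LocalFileDocumentTemplate : IDocumentTemplate
{
    private readonly string _path;

    public LocalFileDocumentTemplate(string path)
    {
        // Normalize to an absolute path so it can be expressed as a file URI.
        _path = Path.GetFullPath(path);
        FileUri = new Uri(_path);
    }

    public Uri FileUri { get; }

    // Reads the file on every call, so each caller gets its own byte array.
    public byte[] ToArray()
    {
        return File.ReadAllBytes(_path);
    }

    // Assumed meaning: a unique scratch path that keeps the template's extension.
    public string GetTemporaryFileName()
    {
        string name = Guid.NewGuid().ToString("N") + Path.GetExtension(_path);
        return Path.Combine(Path.GetTempPath(), name);
    }
}
```

A SharePoint or database-backed template would implement the same three members and leave the builders none the wiser.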
The Controllers – IDocumentBuilder and IComplexDocumentBuilder
We decided that the best way to implement a flexible controller was to define one or more interfaces that would let us leverage Open Xml file types (i.e. not just Word files) to consistently apply the pattern. Here’s what we came up with.
public interface IDocumentBuilder
{
    byte[] BuildDocument(IDocumentTemplate documentTemplate,
        IDocumentModel documentModel);
}
public interface IComplexDocumentBuilder
{
    byte[] BuildDocument(List<IDocumentTemplate> documentTemplates,
        List<IDocumentModel> documentModels);
}
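IDocumentModel itself is not shown in this post. Judging from how the builder code further down uses it (a ContentMappings collection with string keys), a minimal version might look like the sketch below. The dictionary type, the DocumentModel class, its Add helper, and the TextMapping stand-in are all assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;

// Assumed base interface (not shown in the post).
public interface IContentMapping { string ContentControlName { get; } }

// Assumed shape: the builder only needs a keyed collection of mappings.
public interface IDocumentModel
{
    IDictionary<string, IContentMapping> ContentMappings { get; }
}

// Hypothetical stand-in mapping used only to exercise the model.
public sealed class TextMapping : IContentMapping
{
    public TextMapping(string contentControlName, string value)
    {
        ContentControlName = contentControlName;
        Value = value;
    }

    public string ContentControlName { get; }
    public string Value { get; }
}

// Hypothetical model that keys each mapping by its content control tag.
public sealed class DocumentModel : IDocumentModel
{
    private readonly Dictionary<string, IContentMapping> _mappings =
        new Dictionary<string, IContentMapping>(StringComparer.Ordinal);

    public IDictionary<string, IContentMapping> ContentMappings
    {
        get { return _mappings; }
    }

    // Adding a second mapping with the same tag replaces the first one.
    public DocumentModel Add(IContentMapping mapping)
    {
        if (mapping == null) throw new ArgumentNullException(nameof(mapping));
        _mappings[mapping.ContentControlName] = mapping;
        return this;
    }
}
```

With something like this in place, a model can be assembled fluently, for example new DocumentModel().Add(new TextMapping("CustomerName", "Contoso")), regardless of whether the values came from Excel, a database or SharePoint.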
Now, we’re not trying to argue whether these interfaces are the Platonic Ideal, but no one can argue their truthiness. Splitting these interfaces lets us build simple documents, as well as complex documents composed of multiple templates (views) and multiple data sets (models).
Generating a simple document is accomplished with the following code.
public byte[] BuildDocument(IDocumentTemplate documentTemplate,
    IDocumentModel documentModel)
{
    ...//some code omitted
    byte[] generatedDocBytes = null;
    try
    {
        //make a writeable copy of the template as the documentStream
        using (MemoryStream docStream = new MemoryStream())
        {
            //read the template bytes once, then write them to the doc stream
            byte[] templateBytes = documentTemplate.ToArray();
            docStream.Write(templateBytes, 0, templateBytes.Length);
            //open the template doc for edit
            using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(docStream, true))
            {
                //use the OpenXML SDK to update Content Controls
                WordMLManager wordMgr = new WordMLManager();
                foreach (string key in documentModel.ContentMappings.Keys)
                {
                    IContentMapping mapping = documentModel.ContentMappings[key];
                    //test against the interfaces so any implementation is handled
                    ITableContentMapping tableContent = mapping as ITableContentMapping;
                    if (tableContent != null)
                    {
                        //map table content
                        wordMgr.UpdateContentControlsInTable(wordDoc.MainDocumentPart, tableContent);
                    }
                    IIndividualContentMapping contentTagMapping =
                        mapping as IIndividualContentMapping;
                    if (contentTagMapping != null)
                    {
                        //map individual content controls
                        wordMgr.UpdateContentControlsByTagName(wordDoc.MainDocumentPart,
                            contentTagMapping.ContentControlName, contentTagMapping.Value);
                    }
                }
            }
            //the document must be closed before its bytes are read back
            generatedDocBytes = docStream.ToArray();
        }
    }
    ...//some code omitted
    return generatedDocBytes;
}
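WordMLManager is not shown in this post, so here is a rough idea of what replacing Content Controls by tag involves at the markup level. This sketch deliberately swaps in plain LINQ to XML over a WordprocessingML fragment instead of the Open XML SDK's typed classes, so it is not the framework's implementation. The ContentControlSketch class and ReplaceByTag method are hypothetical names, and nested content controls are not handled.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class ContentControlSketch
{
    // The main WordprocessingML namespace used by document.xml.
    private static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Sets the text of every content control (w:sdt) whose w:tag value equals
    // tagName. Returns how many controls were updated.
    public static int ReplaceByTag(XElement root, string tagName, string value)
    {
        var controls = root.Descendants(W + "sdt")
            .Where(sdt => (string)sdt.Element(W + "sdtPr")
                                     ?.Element(W + "tag")
                                     ?.Attribute(W + "val") == tagName)
            .ToList();

        int updated = 0;
        foreach (XElement sdt in controls)
        {
            XElement content = sdt.Element(W + "sdtContent");
            if (content == null) continue;

            var texts = content.Descendants(W + "t").ToList();
            if (texts.Count == 0) continue;

            // Put the new value in the first text element and blank the rest,
            // which keeps the run formatting of the placeholder.
            texts[0].Value = value;
            foreach (XElement extra in texts.Skip(1))
                extra.Value = string.Empty;

            updated++;
        }
        return updated;
    }
}
```

Because the match is on the tag rather than on position, a single mapping updates every Content Control that shares the tag, which is the behavior described for IIndividualContentMapping above.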
The real magic (not the monotone, David Blaine, “I’m sitting in a box” magic) happens thanks to the PowerTools for OpenXml, which allow you to compose a document from multiple templates and data sources. The following snippet uses the IDocumentBuilder implementation to build a series of documents, and then “stitches” them together to form a larger, composite document. Magic!
    ... //code removed
    for (int i = 0; i < documentTemplates.Count; i++)
    {
        IDocumentBuilder documentBuilder = new WordDocumentBuilder();
        byte[] componentDocument = documentBuilder.BuildDocument(documentTemplates[i],
            documentModels[i]);
        //add the componentDocument to the sources to be "stitched" together later
        sources.Add(new Source(WordprocessingDocument.Open(
            new MemoryStream(componentDocument), false), true));
    }
    // now create a "stitched together" document from all of the source documents.
    using (WordprocessingDocument compiledDocument =
        DocumentBuilder.BuildOpenDocument(sources, documentStream))
    {
        compiledDocument.FlushParts();
        //close the source document references
        foreach (Source source in sources)
        {
            source.Close();
        }
    }
    aggregateDocument = documentStream.ToArray();
    ...//some code omitted
    return aggregateDocument;
}
Conclusion
This pattern, which we’ve refined on several projects over the past two years, has worked very well for us. Since the pattern is based on simple interfaces that build up collections of data, it is very flexible. After all, this was about defining the framework once. It is simple enough to extend this to PowerPoint or Excel, and although neither offers a clean “ContentControl” to replace, there are ways to do this. So maybe this solution isn’t as flexible as the love child of Stretch Armstrong and Mary Lou Retton, but it certainly does get the job done.
Of course, that’s just my opinion, I could be wrong. Leave a comment below and feel free to agree or start your own rant.
Part 2 will cover the application of this pattern inside of a SharePoint application.
References and Credits
Thanks to Sean Hester who contributed significantly to the successful implementation of this pattern for our clients. A huge thanks goes out to Eric White for providing guidance and samples for much of the underlying code for these solutions. And for those looking to build on a solid Open Xml foundation, you will find some incredible examples in the PowerTools for Open XML.
Hey Pete, Thanks for the nice comment on Document Builder / PowerTools. I'm playing around with some new ideas around document generation: http://ericwhite.com/blog/map/generating-open-xml-wordprocessingml-documents-blog-post-series/ -Eric