Saturday, May 19, 2007

Large message processing

Typical large-message documents:

  • Large flat file documents with high volume (many records), occasionally batched
  • Large flat file documents wrapped in a single CDATA section node in an XML document
  • Large XML documents with thousands to millions of "rows" that were batched together
  • EDI interchanges where the file or data must be processed independently or in aggregate
  • Large flat documents with a header at the start and a trailer at the end of the file and thousands to millions of records in between; each record needs to be processed separately from the others, but the entire sequence must be processed in order to complete properly


Transforming a document with a map is a memory-intensive operation. In BizTalk 2006/2004, BizTalk Server passes the message stream to the .NET XslTransform class, which then loads the document into a .NET XPathDocument object for processing, whereas BizTalk 2002/2000 used the DOM. Loading the document into the .NET XPathDocument can potentially expand the original file size in memory by a factor of 10 or more.

XPathDocument caches information about the nodes of the XML along with the data itself to allow faster access, but this results in a high memory penalty because of the redundant data that sits in the objects. This is the source of more than 90% of the Out Of Memory (OOM) exceptions that cause orchestrations and receive/send ports to fail.

This expansion may be more pronounced when mapping flat files, because flat files must be parsed into XML before they can be transformed.
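To make the flat-file point concrete, here is a minimal Python sketch (not BizTalk code) that wraps comma-separated records in XML elements and compares the sizes. The field names, the sample data, and the `flat_to_xml` helper are all made up for illustration. It measures only the growth of the serialized text; the in-memory overhead of XPathDocument described above comes on top of that.

```python
import csv
import io
from xml.sax.saxutils import escape


def flat_to_xml(flat_text, field_names):
    """Wrap each comma-separated record in XML elements (hypothetical layout)."""
    parts = ["<Records>"]
    for row in csv.reader(io.StringIO(flat_text)):
        parts.append("<Record>")
        for name, value in zip(field_names, row):
            parts.append(f"<{name}>{escape(value)}</{name}>")
        parts.append("</Record>")
    parts.append("</Records>")
    return "".join(parts)


# Hypothetical sample: 1,000 short order records.
fields = ["OrderId", "CustomerId", "Quantity", "UnitPrice"]
flat = "".join(f"{i},C{i % 50},{i % 9 + 1},9.99\n" for i in range(1000))
xml_text = flat_to_xml(flat, fields)

ratio = len(xml_text) / len(flat)
print(f"flat: {len(flat)} chars, xml: {len(xml_text)} chars, ratio: {ratio:.1f}x")
```

The shorter the field values are relative to the element names, the larger the ratio gets, which is why compact flat files tend to grow the most.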

Note:
A 1 MB document, together with the JIT-compiled product and user code assemblies and the other messages flowing through the process, may be enough to push the process to 200-500 MB of memory.
Because BizTalk converts the data into XML for internal processing, we need to worry most about flat files (non-XML files). Flat-file formats are designed to be as compact as possible in order to minimize cost, whereas XML explicitly states compactness as a non-goal, with readability as a much higher priority.


The best recommendation is not to send data larger than 1 MB into BizTalk without some form of custom processing or large-memory machines.
If possible, transform the XML file before passing it on to a BizTalk Server orchestration.
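One possible form of such pre-processing is to split a large batched XML file into smaller documents before BizTalk ever sees it. The sketch below does this in Python with the standard library's streaming parser; it is only an illustration under assumptions, and the `Order` and `Batch` element names and the `split_records` helper are hypothetical.

```python
import io
import xml.etree.ElementTree as ET


def split_records(source, record_tag, batch_size):
    """Yield small XML documents holding at most batch_size records each."""
    batch = []
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == record_tag:
            batch.append(ET.tostring(elem, encoding="unicode"))
            # Drop the record's content; an empty shell stays attached to the root.
            elem.clear()
            if len(batch) == batch_size:
                yield "<Batch>" + "".join(batch) + "</Batch>"
                batch = []
    if batch:
        yield "<Batch>" + "".join(batch) + "</Batch>"


# Hypothetical input: five orders batched in one document, split two at a time.
sample = "<Orders>" + "".join(f"<Order><Id>{i}</Id></Order>" for i in range(5)) + "</Orders>"
batches = list(split_records(io.BytesIO(sample.encode("utf-8")), "Order", 2))
for doc in batches:
    print(doc)
```

Each yielded batch can then be dropped into a receive location as its own small message, so no single message comes near the 1 MB mark.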

Another approach is to use distinguished fields or property promotion in our process. An orchestration does not load the data of the message stream unless required; with a distinguished or promoted field, the orchestration can fetch the right value and update it without loading the whole message into memory. This is a powerful means of manipulating key fields of a large document.
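The underlying idea, reading one key field from a stream without building the whole document in memory, can be shown outside BizTalk. The Python sketch below is an analogy only, not how BizTalk implements distinguished fields; the `PONumber` element and the `peek_field` helper are hypothetical.

```python
import io
import xml.etree.ElementTree as ET


def peek_field(source, field_tag):
    """Return the text of the first element named field_tag, then stop reading."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == field_tag:
            return elem.text
    return None


# Hypothetical message: a small header followed by a very large body.
doc = "<PO><Header><PONumber>PO-1001</PONumber></Header>" + "<Line/>" * 100000 + "</PO>"
stream = io.BytesIO(doc.encode("utf-8"))

po_number = peek_field(stream, "PONumber")
print(po_number, "found after reading", stream.tell(), "of", len(doc), "bytes")
```

Because the parser pulls the stream in chunks and the function returns as soon as the field is seen, most of the large body is never read.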

You can also adjust the message size threshold above which documents are buffered to the file system during mapping. To modify the size threshold, create a DWORD value named TransformThreshold at the following location in the registry:

HKLM\Software\Microsoft\BizTalk Server\3.0\Administration\TransformThreshold

Enter the new threshold as a decimal number of bytes, e.g. 2097152 to increase the message size threshold to 2 MB (from the default of 1 MB). Increase this value on systems with a large amount of available memory to improve throughput. Buffering documents to disk conserves memory at a slight cost to overall throughput.
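If you would rather import the setting than type it into the registry editor, the byte value has to be written in hexadecimal in a .reg file. The small Python helper below does the arithmetic and prints the file text. It assumes that TransformThreshold is a value under the Administration key given above; the `transform_threshold_reg` helper itself is hypothetical.

```python
def transform_threshold_reg(megabytes):
    """Build .reg file text that sets TransformThreshold to a whole number of MB."""
    threshold_bytes = megabytes * 1024 * 1024
    if not 0 < threshold_bytes <= 0xFFFFFFFF:
        raise ValueError("threshold must fit in a DWORD")
    # Assumption: the value sits under the Administration key from the path above.
    key = r"HKEY_LOCAL_MACHINE\Software\Microsoft\BizTalk Server\3.0\Administration"
    return (
        "Windows Registry Editor Version 5.00\n\n"
        f"[{key}]\n"
        f'"TransformThreshold"=dword:{threshold_bytes:08x}\n'
    )


reg_text = transform_threshold_reg(2)  # 2 MB = 2097152 bytes
print(reg_text)
```

Check the generated text against your own installation before importing it.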

Wednesday, May 2, 2007

Typed/Untyped Messages & Their Implementation

Let’s get into the details. What is a typed message?
A typed message is a strongly typed message that conforms to a selected schema (XSD) or .NET class; the message inherits its properties from that schema or .NET class.

An untyped message, by contrast, is a message configured to use System.Xml.XmlDocument as its message type.
In other words, it is a non-typed message that is not tied to a specific schema.

Let’s take a simple scenario in which you need to receive POs (purchase orders) from different locations, systems, or partners
and then apply predefined business logic (such as checking stock, grade, delivery, etc.). Because the POs come from different sources, their details might vary.

As developers, we may need a generic solution rather than a customized solution that fits only one case.
Instead of building multiple solutions, you can choose to implement an untyped message process that receives the different POs
and then applies the predefined business logic, which makes the solutions more manageable.
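As a rough analogy in Python (not BizTalk code), the sketch below accepts PO documents with two different layouts as plain XML and runs the same stock check on both. The element names, the sample messages, and the `read_po` and `check_stock` helpers are all hypothetical.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip an XML namespace, e.g. '{urn:a}Quantity' -> 'Quantity'."""
    return tag.rsplit("}", 1)[-1]


def read_po(xml_text):
    """Treat the message as untyped XML and pull out the fields the logic needs."""
    root = ET.fromstring(xml_text)
    found = {}
    for elem in root.iter():
        name = local_name(elem.tag)
        if name in ("Item", "Quantity") and name not in found:
            found[name] = (elem.text or "").strip()
    return {"item": found["Item"], "quantity": int(found["Quantity"])}


def check_stock(po, stock):
    """Shared business rule: is there enough stock to fill the order?"""
    return stock.get(po["item"], 0) >= po["quantity"]


# Two hypothetical partners sending differently shaped POs.
po_a = '<PurchaseOrder xmlns="urn:partner-a"><Item>Widget</Item><Quantity>5</Quantity></PurchaseOrder>'
po_b = "<PO><Lines><Line><Item>Widget</Item><Quantity>50</Quantity></Line></Lines></PO>"
stock = {"Widget": 10}

results = [check_stock(read_po(message), stock) for message in (po_a, po_b)]
print(results)
```

The trade-off is the same as with untyped messages in an orchestration: one process handles every partner, but nothing validates the structure up front, so a missing field only surfaces when the code looks for it.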