The Flat File Schema Wizard in BizTalk Server 2006 (BTS 2006) makes creating flat file schemas much easier. Previously I found that getting the schema structure right was the most challenging part; now the wizard takes away much of that pain.
Recently I have been working on a flat file schema for a fairly complex file structure, and I thought I would share some of my findings. The structure of the file was as follows:
Each batch has a header record, and a file can contain many batches. The file has a single trailer at the end. Each batch can contain many records, of 12 possible record types. These records can appear in any order within the batch, or be missing completely. The records are positional, with many optional fields, and optional fields at the end of a record may be omitted completely.
To get the schema to work correctly I had to edit it by hand, because the BizTalk schema editor still does not let you set certain (very useful) attributes through its UI.
The basic structure of the schema was created using the flat file wizard. Care needs to be taken when selecting element types, particularly when deciding between record, repeating record and ignore. Rather than discuss the wizard in detail here, I recommend reading the article at: http://msdn.microsoft.com/library/default.asp?url=/library/en-us/bts06developing/html/a5e1453f-0380-4505-97a9-9d3526db0923.asp
After running the wizard I had a schema that supported multiple headers, multiple records and the single trailer. However, it only worked if the records were fully formed and appeared in the order defined in the schema.
To handle the optional fields, first set their minOccurs attribute to 0. Then manually set the following attributes in the annotations section at the start of the schema:
parser_optimization="complexity"
allow_early_termination="true"
early_terminate_optional_fields="true"
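As a rough sketch, the schema-level annotation carrying these settings might look like the fragment below. It shows only the three attributes above plus standard="Flat File". A wizard-generated schema has further b:schemaInfo attributes and other annotation content, which are left out here. The minOccurs="0" setting belongs on each individual optional field and is not shown.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:b="http://schemas.microsoft.com/BizTalk/2003">
  <xs:annotation>
    <xs:appinfo>
      <!-- Schema-level flat file settings. The wizard writes other
           attributes on this element too; they are omitted in this sketch. -->
      <b:schemaInfo standard="Flat File"
                    parser_optimization="complexity"
                    allow_early_termination="true"
                    early_terminate_optional_fields="true" />
    </xs:appinfo>
  </xs:annotation>
  <!-- Record and field declarations generated by the wizard follow here.
       Optional fields additionally need minOccurs="0". -->
</xs:schema>
```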
The next challenge was getting the parser to recognise the different record types. Fortunately for me, the records all contained a "type" field. I was able to use this as a tag, and set the tag offset to the start position of this field within the record. The wizard allows you to set a tag for a record but not its offset; the offset can, however, be set afterwards through the schema editor UI.
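A record declaration with a tag and an offset might look roughly like the fragment below. The record name TypeARecord, the tag value "01" and the offset of 5 are hypothetical values chosen for illustration. The sketch assumes the editor stores the tag and offset on the record's b:recordInfo annotation as tag_name and tag_offset.

```xml
<!-- Hypothetical record type: its "type" field holds "01" and is assumed
     to start at offset 5 within the record -->
<xs:element name="TypeARecord" minOccurs="0">
  <xs:annotation>
    <xs:appinfo>
      <b:recordInfo structure="positional"
                    tag_name="01"
                    tag_offset="5" />
    </xs:appinfo>
  </xs:annotation>
  <xs:complexType>
    <xs:sequence>
      <!-- Positional field declarations for this record type -->
    </xs:sequence>
  </xs:complexType>
</xs:element>
```

Because the tag sits at a fixed position, the parser can decide which record type a line belongs to before it parses the rest of the record.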
By marking each of the records as optional, the schema supported zero to many records in a batch. However, the records had to appear in the order defined in the schema, and it was not possible to leave records out; only the ones at the end could be missing. This is because the default structure type is 'sequence'. To overcome this, it is back to editing the schema by hand.
I wouldn't claim this is the best way to achieve this, but it worked and seemed to perform well on largish files (around 0.5 MB). First I wrapped each record in a choice element with minOccurs set to 0 and maxOccurs set to unbounded: <xs:choice minOccurs="0" maxOccurs="unbounded">. This allows records to appear multiple times and to be omitted from the sequence, but it still does not allow them to appear in any order. I then wrapped all these choices in a single outer choice with the same minOccurs and maxOccurs settings. This now lets any records appear in any order within each batch.
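The nested choice structure described above might be sketched as follows. Only two of the 12 record types are shown, and the names TypeARecord and TypeBRecord are hypothetical. Each record's b:recordInfo annotation and field declarations are omitted. The fragment stands in place of the default sequence of records inside a batch.

```xml
<!-- Outer choice: repeated any number of times, so records can
     arrive in any order -->
<xs:choice minOccurs="0" maxOccurs="unbounded">
  <!-- Inner choice for one record type: it may be absent or repeated -->
  <xs:choice minOccurs="0" maxOccurs="unbounded">
    <xs:element name="TypeARecord">
      <!-- recordInfo annotation (tag and offset) and fields go here -->
    </xs:element>
  </xs:choice>
  <xs:choice minOccurs="0" maxOccurs="unbounded">
    <xs:element name="TypeBRecord">
      <!-- recordInfo annotation (tag and offset) and fields go here -->
    </xs:element>
  </xs:choice>
  <!-- One inner choice for each remaining record type -->
</xs:choice>
```

In plain XSD terms, the unbounded outer choice is what permits arbitrary ordering, because each repetition can select any record type. A single unbounded choice listing the record elements directly would express the same thing, but the nested form is the one found to work with the flat file parser here.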
Sample data and the final schema can be emailed on request.