
The "Keep it Simple Stupid" accommodation data transport format

There are two thoughts that may cross your mind when you look at other people's data structures. “That's smart, I should remember that if I ever happen to work on something like that” is one, “Oh my God!” is the other one.

You can learn from both of them. Of course, beyond mere data structures there are other things that accumulated experience will teach you.

Now that I have some time, I think it is the right moment to start working on my own little accommodation data transport format.

Curious? Here is an outline of my design principles.

  • Simple: Perfection is achieved not by adding things but by taking things away.
  • Expressive: Most parts of the format must reveal themselves without having to dig through piles of documentation.
  • Universal: The format can be used for descriptive content as well as for ARI data. The same format is used for both creating and updating data in the target system.
  • Efficient: The amount of information in relation to the amount of structure-giving data and meta-data such as tags and declarations is high.
  • Reliable: Every element is not only defined by specification of its semantics but by describing exactly how it must be processed. This way there is no room left for special cases, misinterpretation and “what if” questions.
  • Useful: A data transport format is only useful if the transported data has a place to become persistent. Therefore, apart from a format specification (e.g. an XSD), the corresponding database schema will come along with it.

Simple

The 20/80 rule also applies to data formats: twenty per cent of the format specification will cover eighty per cent of all travel products. Curiously enough, the price accuracy rates of many suppliers on the German market are around eighty per cent when they reach distribution systems. Is that really just a coincidence?

Of course, everybody’s goal in the travel industry is to get as close as possible to one hundred per cent, but at what cost? Do we really need to be able to represent every fancy calculation rule whether it makes sense or not? Do we really need to transport every tiny piece of static information no matter whether it will ever show up in a front end?

I am sure we do not. Tests of ARI accuracy run over converted cache data show that I have a point. An approach which reduces complexity and potentially still gets you close to the 95% mark has a lot in its favour.

Reduced complexity leads to:

  • Easier and faster implementation, thus less investment.
  • Fewer errors during data conversion, thus better data quality at the end of the day.
  • Faster processing and a shorter time to market.
  • Better human readability, which makes bug hunting and analysis much easier.

During my last visit to Palma, the head of IT of an accommodation supplier told us that a contractor had once agreed with a hotel that they would support a very special, very nifty type of logic for offers the hotel wanted to have. The implementation of that logic was not trivial and would only be used by that specific hotel. So the IT department decided to go through all bookings of the previous year to see in how many cases the conditions for that type of offer would have been met. There was not one single case where that type of offer would have been sold the previous year. On that basis the IT department refused to implement the required logic. WELL DONE!

But what about all those players in the hospitality industry who believe they can't live without some weird offer combination rules, or who drive revenue management to its extremes? Well, if you don’t want to pollute your format with strange, hard-to-understand structures and their correspondingly weird interpretation, the only other option is to provide a smart and efficient way to include day rates.

Expressive

One of the golden rules when working on schemas is that naming should be as self-explanatory as possible. There is not much more to say. However, well-thought-out naming and naming conventions do not make documentation superfluous. If the information is just stored away in a database, there is little more to state than the data type and scope. If some validation or mangling is done, this should be described. The only place where documentation must be verbose is where data needs to be interpreted.

An expressive, and hence self-explanatory, data format will completely change the requirements for good documentation.

Universal

Ninety-nine per cent of all accommodation suppliers use different data formats for ARI data and static content.

Most of them also provide different formats for creating a new contract in the inventory system and for updating an existing contract. The only advantage of this distinction I can think of is that you can apply stricter rules on what is mandatory for new contracts while making most of the included data optional for updates. Well, that is a good point in theory, but we would end up with two otherwise identical schema definitions which differ only in which elements are declared mandatory. Would that make sense? I don't think so.

Efficient

On Stack Overflow somebody once asked "how do I embed binary data in XML?" Somebody posted the following answer:

XML is so versatile...

<DATA>
  <BINARY>
    <BIT index="0">0</BIT>
    <BIT index="1">0</BIT>
    <BIT index="2">1</BIT>
    ...
    <BIT index="n">1</BIT>
  </BINARY>
</DATA>

XML is like violence - If it doesn't solve your problem, you're not using enough of it.

Although the answer was not meant seriously (of course you would base64-encode the binary data and put it into an element as plain text; base64 output needs no CDATA block), it illustrates pretty well one common problem with most formats used in the travel industry: lots of XML just to transport one single price of one single rate plan for one single room on one single day.

For things like prices and availabilities there are smarter ways to transport this data than putting every single bit of information into a separate element or attribute in XML. The same goes for the corresponding database schema.

So, instead of writing something like

  <Rate Plan="BARHB20" Type="RoomRate" >
    <Day Date="2018-01-01">150.50</Day>
    <Day Date="2018-01-02">150.50</Day>
    <Day Date="2018-01-03">135.00</Day>
    ...
  </Rate>

it is far more efficient to do something like

<Rate Plan="BARHB20" Type="RoomRate" Start="2018-01-01">150.50 150.50 135.00</Rate>
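
To give an idea of how little it takes to read the compact form, here is a minimal Python sketch that expands it back into one price per day. It assumes, as in the example above, that the prices are separated by whitespace and that each one belongs to the next consecutive day counting from Start. The function name expand_rate is made up for this illustration and is not part of any specification.

```python
from datetime import date, timedelta
from decimal import Decimal
from typing import Dict
import xml.etree.ElementTree as ET


def expand_rate(xml_text: str) -> Dict[date, Decimal]:
    """Expand a compact <Rate> element into one price per day."""
    rate = ET.fromstring(xml_text)
    start = date.fromisoformat(rate.attrib["Start"])
    # Assumption: prices are whitespace-separated, one per consecutive day.
    prices = (rate.text or "").split()
    # Decimal keeps the prices exact; floats would introduce rounding noise.
    return {start + timedelta(days=i): Decimal(p) for i, p in enumerate(prices)}


compact = (
    '<Rate Plan="BARHB20" Type="RoomRate" Start="2018-01-01">'
    "150.50 150.50 135.00</Rate>"
)
for day, price in expand_rate(compact).items():
    print(day.isoformat(), price)
```

The position of a price in the list carries the date, so the date never has to be repeated per day.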

Reliable

If you want to see how to specify a format properly, have a look at Adobe's PostScript® format. The specification does not only tell you what a certain instruction should do, it tells you exactly how an instruction should process its parameters. As a result even very complex documents will print identically on every output device which uses PostScript. To be fair it must be said that PostScript is a page description language rather than a format, which is why its specification is so focused on the "how" of its processing. But still, the approach can also be used for an XML format.

As soon as there is a step-by-step instruction on how every single element needs to be processed, including even implementation details such as a "rate stack" onto which calculation results are pushed, you make sure that every implementation that follows the specification will return the same price.
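
Just to illustrate what such a rate stack could look like, here is a small Python sketch. Everything in it is an assumption made for the sake of the example: the instruction names push, add and percent, the rounding to two decimals and the function name evaluate are hypothetical and not taken from any existing specification.

```python
from decimal import Decimal


def evaluate(steps):
    """Run (instruction, argument) pairs against a rate stack and return the top."""
    stack = []
    for op, arg in steps:
        if op == "push":
            # Push a base rate onto the stack.
            stack.append(Decimal(arg))
        elif op == "add":
            # Add a fixed supplement to the rate on top of the stack.
            stack.append(stack.pop() + Decimal(arg))
        elif op == "percent":
            # Apply a percentage change to the rate on top of the stack.
            top = stack.pop()
            stack.append(top + top * Decimal(arg) / 100)
        else:
            # An unknown instruction is an error, never a silent special case.
            raise ValueError(f"unknown instruction: {op}")
    # Assumption: results are rounded to two decimals (Decimal's default rounding).
    return stack[-1].quantize(Decimal("0.01"))


# Base rate 150.50, plus a 20.00 supplement, minus a 10% discount.
price = evaluate([("push", "150.50"), ("add", "20.00"), ("percent", "-10")])
print(price)  # 153.45
```

Because every instruction states exactly what it does to the stack, two independent implementations have no room to disagree about the resulting price.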

Useful

One of the biggest drawbacks of format specifications is that they are - well - just specifications. People start working on a format (OTA, HTNG, OTDS, NDC, you name it), but none of these formats offers even a hint of how the transported data is meant to be made persistent. To be fair, these formats are meant to work with already existing production stacks, so they should adapt to what is already there. Whether they really do is an entirely different question.

In the case of my little project the idea is to also provide a database schema for data storage.

Last modified on Sep 14, 2017 7:10:45 PM