XML Transmission Compaction: The Quest for Streaming Updates • Wednesday, 15 October 2003, 5:00pm – 5:30pm • James E. Hartley, FISD/SIIA Chief Technologist
Which of the Following Statements is True? • The World is Flat • The Moon is Made of Cheese • The Earth is the Center of the Universe • XML is Too Verbose for Market Data
Which of the Following Statements is True? • The World is Flat (Christopher Columbus) • The Moon is Made of Cheese (Neil Armstrong) • The Earth is the Center of the Universe (Copernicus, Galileo, Kepler, Newton) • XML is Too Verbose for Market Data (James Hartley??)
What is XML and What’s the Deal? • XML is a way of encoding data with descriptive tags facilitating data interchange • Example: Passing a date and time • Instead of just “2003/10/15 5:00 p.m.” <dateTime>2003-10-15T17:00:00+05:00</dateTime> • Example: Passing a “last trade” price • Instead of just “103.73” <trade><last>103.73</last><currency>USD</currency></trade>
But Wait – That Does Seem Verbose!! • Date and Time… from 20 to 46 bytes • Last price… from 6 to 58 bytes • In fact, encoding of data in XML can take over 10 times the number of bytes!
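The byte counts quoted above can be checked directly. The following is a minimal sketch that measures the plain and tagged strings from the previous slide; the helper name `byte_len` is introduced here only for illustration.

```python
# Strings copied from the date/time and last-trade examples on the slides.
plain_date = "2003/10/15 5:00 p.m."
xml_date = "<dateTime>2003-10-15T17:00:00+05:00</dateTime>"
plain_price = "103.73"
xml_price = "<trade><last>103.73</last><currency>USD</currency></trade>"


def byte_len(text):
    """Size of the text in bytes when sent as UTF-8 (all ASCII here)."""
    return len(text.encode("utf-8"))


for label, plain, tagged in [
    ("Date and time", plain_date, xml_date),
    ("Last price", plain_price, xml_price),
]:
    ratio = byte_len(tagged) / byte_len(plain)
    print(f"{label}: {byte_len(plain)} -> {byte_len(tagged)} bytes ({ratio:.1f}x)")
```

For these two examples the expansion is roughly 2x and 10x respectively; the tag names, not the values, account for the growth.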
Let’s Consider MDDL – XML for Market Data • MDDL is “Market Data Definition Language” The XML specification to enable the interchange of information necessary to account, to analyze, and to trade financial instruments of the world's markets. • The industry standard for encoding market data in XML – for all your data needs!
But Why Would We Want MDDL? • Common terms, definitions, and relationships of data used in the market data industry • Removes confusion on data and definition • Facilitates merging and data interchange • Neutral standard for encoding data • The list goes on – for more info, just ask!
What If There Were… • An industry standard nomenclature for describing market data? • An industry standard data feed for requesting and distributing market data? • Would that be worth something to ya? Huh? • Regardless of position in industry – there is value (positives are greater than negatives)
A Trade in MDDL – à la Tokyo Stock Exchange
890 Bytes!
<mddl version="2.2-beta">
  <header>
    <dateTime>2003-10-15T17:00:00.000+05:00</dateTime>
    <source>XTC Demonstration</source>
  </header>
  <snap><equityDomain><commonClass>
    <instrumentIdentifier>
      <code scheme="http://www.mddl.org/ext/scheme/symbol?SRC=XTKS">6501</code>
      <name>A Company in Your Neighborhood</name>
    </instrumentIdentifier>
    <sequence>0306</sequence>
    <session>1</session>
    <trade>
      <last>12375</last>
      <dateTime>2003-10-15T16:58:32.234+05:00</dateTime>
      <marketCenter>
        <code scheme="http://www.mddl.org/xtc/Examples/scheme/iso10383.xml">XTKS</code>
      </marketCenter>
      <size>200</size>
      <currency>JPY</currency>
      <status scheme="http://wws.mddl.org/xtc/Examples/scheme/tradeStatus.xml">normal</status>
    </trade>
  </commonClass></equityDomain></snap>
</mddl>
How Do We Deal With This? • Identify which data elements actually are modified – these are “fields” • Remaining text is nothing more than markup • The remaining shell defines a “template” • The “template” would need to be transmitted once a day or so…
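The split between "template" and "fields" can be sketched as follows. This is an illustration only, not the XTC format itself: the shortened `TEMPLATE`, the `{0}`-style placeholders, and the function names `expand` and `extract` are all assumptions made for the example.

```python
import re

# Hypothetical template: the constant markup, with numbered placeholders
# where the changing field values go. Sent rarely (once a day or so).
TEMPLATE = (
    "<trade><last>{0}</last><size>{1}</size>"
    "<currency>JPY</currency></trade>"
)


def expand(template, fields):
    """Receiver side: rebuild the full XML from the template and the fields."""
    return template.format(*fields)


def extract(template, message):
    """Sender side: pull the field values out of a full XML message.

    Assumes placeholders appear in ascending order and that the message
    matches the template's markup exactly.
    """
    literal_parts = re.split(r"\{\d+\}", template)
    pattern = "(.*?)".join(re.escape(part) for part in literal_parts)
    match = re.fullmatch(pattern, message)
    if match is None:
        raise ValueError("message does not match template")
    return list(match.groups())


full_xml = expand(TEMPLATE, ["12375", "200"])
print(full_xml)
print(extract(TEMPLATE, full_xml))
```

Only the list of field values needs to travel with each update; both ends already hold the template.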
XTC: XML Transmission Compaction
Size of Data Transmitted - Primer • A “bit” is the atomic unit of electronic data • Its value may be “0” or “1” • 4 bits is a “nybble” • 2 nybbles is a “byte” (or 8 bits) • 2 bytes is a “word” • We want to minimize bytes-per-message
The Fields We Need to Worry About • Time of Message: “2003-10-15T17:00:00.000+05:00” • Ticker Symbol: “6501” • Sequence Number: “0306” • Last Trade Price: “12375” • Time of Trade: “2003-10-15T16:58:32.234+05:00” • Exchange of Trade: “XTKS” • Size of Trade: “200” • Trade Status: “normal” • Getting better – down to 84 bytes…
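As a quick check of the 84-byte figure, the sketch below sums the lengths of the eight field values as they appear in the sample trade message. The dictionary keys are labels chosen for this example, not MDDL names.

```python
# Field values as text, taken from the sample trade message.
fields = {
    "time_of_message": "2003-10-15T17:00:00.000+05:00",
    "ticker_symbol": "6501",
    "sequence_number": "0306",
    "last_trade_price": "12375",
    "time_of_trade": "2003-10-15T16:58:32.234+05:00",
    "exchange_of_trade": "XTKS",
    "size_of_trade": "200",
    "trade_status": "normal",
}

# One byte per character, since every value is plain ASCII.
total_bytes = sum(len(value.encode("ascii")) for value in fields.values())

for name, value in fields.items():
    print(f"{name:18} {len(value):3} bytes")
print(f"{'total':18} {total_bytes:3} bytes")
```

The two timestamps alone account for 58 of the bytes, which is why they are the first target for compaction.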
Time of Message, Time of Trade • What if we sent a “heartbeat” message once per half-second (500 milliseconds)? • Then we could tell time as a “delta” from that frequent “timestamp” • 500 milliseconds can be delivered in 9 bits • Even fewer if we get creative…
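The heartbeat idea can be sketched like this, assuming times are held as milliseconds since midnight and that every event falls within one interval of the latest heartbeat. The function names are hypothetical.

```python
HEARTBEAT_INTERVAL_MS = 500


def bits_needed(distinct_values):
    """Smallest number of bits that can represent this many distinct values."""
    return max(1, (distinct_values - 1).bit_length())


def encode_delta(event_ms, heartbeat_ms):
    """Express an event time as an offset from the most recent heartbeat."""
    delta = event_ms - heartbeat_ms
    if not 0 <= delta < HEARTBEAT_INTERVAL_MS:
        raise ValueError("event is outside the current heartbeat interval")
    return delta


def decode_delta(delta, heartbeat_ms):
    """Recover the full event time from the offset and the heartbeat time."""
    return heartbeat_ms + delta


# Trade at 16:58:32.234, latest heartbeat at 16:58:32.000 (ms since midnight).
heartbeat = (16 * 3600 + 58 * 60 + 32) * 1000
trade_time = heartbeat + 234

delta = encode_delta(trade_time, heartbeat)
print(bits_needed(HEARTBEAT_INTERVAL_MS), delta)
```

Offsets 0 through 499 fit in 9 bits because 2^9 = 512, so a 29-character timestamp shrinks to just over one byte.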
Ticker Symbol • The Tokyo Stock Exchange uses 4-digit numbers for many of their stocks • Our system could map each unique stock to a specific number – and sort based on “most active” • 20 bits is enough to allow for 1,048,576 instruments in our system (identifiers 0 through 1,048,575)
Sequence Number • A sequence number helps the receiver determine if there are missing messages • The number usually “wraps” to zero if it exceeds the current maximum – but it is prudent to allow for sufficient transactions • 12 bits allows for 4096 transactions per day on a particular stock
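A receiver can use a wrapping 12-bit sequence number to count missing messages with modular arithmetic. This is a minimal sketch with a hypothetical function name; it assumes fewer than 4096 messages are lost in a row and does not handle duplicates or reordering.

```python
SEQ_BITS = 12
SEQ_MODULUS = 1 << SEQ_BITS  # 4096 distinct sequence numbers, 0..4095


def missing_count(last_seen, received):
    """Messages skipped between two sequence numbers, allowing for wrap to zero."""
    return (received - last_seen - 1) % SEQ_MODULUS


print(missing_count(305, 306))   # consecutive messages, nothing missing
print(missing_count(305, 308))   # 306 and 307 were never received
print(missing_count(4095, 0))    # normal wrap, nothing missing
print(missing_count(4094, 1))    # 4095 and 0 were never received
```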
Last Trade Price, Size of Trade • Our Tokyo price is provided in Japanese Yen • 16 bits allows for prices up to 65,535 yen… • We can play games with “lots” and “blocks” to report the number of shares traded • 8 bits allows for a wide range of values…
Exchange of Trade • There are a limited number of exchanges that can be referenced • In our case, “XTKS” is one of just a few exchanges that are legal for the TSE • 2 bits is enough to identify one of up to 4 exchanges…
Trade Status • As with the exchange, the trade status is one of a few values • Note: “status” is a fictitious field in MDDL • 5 bits allows 32 unique status values
So, Let’s Check Our Count • Time of Message: 9 bits • Ticker Symbol: 20 bits • Sequence Number: 12 bits • Last Trade Price: 16 bits • Time of Trade: 9 bits • Exchange of Trade: 2 bits • Size of Trade: 8 bits • Trade Status: 5 bits • Not too bad – down to 81 bits (10.1 bytes)… • There is an additional 3-byte overhead (length, msgid)
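The bit widths listed above can be packed into a single fixed layout. The sketch below is one possible arrangement, not the actual XTC wire format: the field order, the big-endian layout, the zero padding, and the sample values (ticker index, lot count, status code) are all assumptions.

```python
# (field name, width in bits), in the order the slide lists them.
FIELD_WIDTHS = [
    ("time_of_message", 9),
    ("ticker_symbol", 20),
    ("sequence_number", 12),
    ("last_trade_price", 16),
    ("time_of_trade", 9),
    ("exchange_of_trade", 2),
    ("size_of_trade", 8),
    ("trade_status", 5),
]
TOTAL_BITS = sum(width for _, width in FIELD_WIDTHS)
PACKED_BYTES = (TOTAL_BITS + 7) // 8       # round up to whole bytes
PAD_BITS = PACKED_BYTES * 8 - TOTAL_BITS   # unused low-order bits


def pack(values):
    """Concatenate the fields, most significant first, into a byte string."""
    acc = 0
    for name, width in FIELD_WIDTHS:
        value = values[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        acc = (acc << width) | value
    return (acc << PAD_BITS).to_bytes(PACKED_BYTES, "big")


def unpack(data):
    """Reverse of pack: peel the fields off from the least significant end."""
    acc = int.from_bytes(data, "big") >> PAD_BITS
    values = {}
    for name, width in reversed(FIELD_WIDTHS):
        values[name] = acc & ((1 << width) - 1)
        acc >>= width
    return values


# Hypothetical encoded values for the sample trade.
sample = {
    "time_of_message": 0, "ticker_symbol": 6501, "sequence_number": 306,
    "last_trade_price": 12375, "time_of_trade": 234, "exchange_of_trade": 0,
    "size_of_trade": 2, "trade_status": 0,
}
packet = pack(sample)
print(TOTAL_BITS, len(packet), packet.hex())
```

Sent on its own, an 81-bit record rounds up to 11 whole bytes; the fractional 10.1-byte figure applies only if records are packed back to back.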
With Careful Analysis and Structuring • Time of Message: 9 bits – can be done with 8 bits (hundreds) • Ticker Symbol: 20 bits – average of 10 bits • Sequence Number: 12 bits – average of 8 bits • Last Trade Price: 16 bits – average of 12 bits • Time of Trade: 9 bits – can be done with 8 bits • Exchange of Trade: 2 bits – can be removed (covered in overhead) • Size of Trade: 8 bits – average of 4 bits (and in overhead) • Trade Status: 5 bits – average of 4 bits • Much better – down to 54 bits (6.8 bytes)… • … And we haven’t even used a computer yet…
Summing It All Up… • Once or twice a day – transmit “template” • And other framework – maybe 60K Bytes • Twice a second – transmit “heartbeat” • Containing about 24 bytes • With each trade – transmit “content” • Less than 9 bytes (after all techniques used)
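To see what these three message types add up to, here is a rough comparison against sending the full 890-byte XML for every trade. The trade volume and time window are invented for illustration, "60K Bytes" is read as 60,000 bytes, and 9 bytes is used as the upper bound for each trade.

```python
def compact_bytes(trades, seconds, template_sends=2, template_bytes=60_000,
                  heartbeat_bytes=24, heartbeats_per_second=2, trade_bytes=9):
    """Total bytes sent using template + heartbeat + compact content."""
    return (template_sends * template_bytes
            + seconds * heartbeats_per_second * heartbeat_bytes
            + trades * trade_bytes)


def raw_xml_bytes(trades, message_bytes=890):
    """Total bytes sent if every trade carries the full XML message."""
    return trades * message_bytes


# Hypothetical load: 100,000 trades in one hour.
trades, seconds = 100_000, 3600
compact = compact_bytes(trades, seconds)
raw = raw_xml_bytes(trades)
print(compact, raw, round(raw / compact, 1))
```

Under these assumptions the fixed costs (template and heartbeat) are small next to the per-trade savings, and the advantage grows with trade volume.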
What Does It Mean? • A self-describing datafeed can be just as efficient as existing proprietary protocols • Bandwidth is not compromised • Processing power is not compromised • A self-describing datafeed allows content to be added dynamically • Increases availability of new features at the convenience of the provider
What Does It All Mean? • XML (when properly implemented) facilitates merging and comparison of data • Like terms are compared • Different terms are easily merged • A self-describing datafeed allows content to be added dynamically • Increases availability of new features at the convenience of the provider
This story will continue… Questions or Comments?