1.15. Worth to Know About UNICODE

Not so many years ago it was rather complicated for people from different countries to communicate over the internet. An 8-bit number can represent at most 256 different values, far too few for all the characters in use, and the result of such a transmission was often more or less garbage. It only worked if both sides of the connection knew exactly which character encoding the other side was using.

To improve this situation, UNICODE was developed. UNICODE has two main advantages:

Every character is represented by at least 16 bits in wide-character form. This means that the majority of characters used on earth can be transmitted unambiguously over a connection.
The standard documents exactly which character each number (code point) stands for. Both sides can therefore reliably transmit characters beyond the ASCII range without guessing their meaning or converting the data.

Unfortunately, UNICODE and its wide characters waste a lot of memory. Most of the characters in daily use are within the ASCII range and therefore need only 7 bits. This means that 25 of 32 bits are wasted per character if your system uses 32-bit wide characters. As a compromise, UTF-8 can be used. Depending on the code point of a character, UTF-8 uses 1 to 4 bytes for storage. As a nice side effect, text remains readable even if it contains special characters, because characters within the ASCII range are encoded as exactly the same single bytes as in ASCII.

ulxmlrpcpp supports both UNICODE wide characters and UTF-8. If you need wide characters, you must enable them at compile time. In this case several additional functions and class members are included.

ulxmlrpcpp expects all strings either as wide characters or in UTF-8. Several helper functions convert your strings if they are not already in one of these forms. See the API documentation for details.

asciiToUtf8(), utf8ToAscii()
utf8ToUnicode(), unicodeToUtf8()
utf8ToEncoding(), encodingToUtf8(), convertEncoding()
getUnicode(), getLatin1()

Please note that versions up to 1.4 used 8-bit encodings instead of UTF-8. This may break compatibility in your application when you upgrade.
