I can appreciate that, for internationalisation and forward-compatibility, UTF-8 is the encoding to use for presenting data (e.g. web pages), but what about the backend?
Most existing textual data on filesystems is probably in the Windows-1252 encoding (or possibly ASCII), but what happens when it is stored in a database? Does it retain its original encoding, or is it re-encoded according to some database default?
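For one concrete data point, here is a minimal Python sketch using SQLite, chosen only because it ships with Python; other engines have their own per-database or per-column character-set settings, so this is not a general answer. It assumes the source bytes really are Windows-1252, and the table, column and function names are made up for illustration. If the text is decoded before insertion, SQLite stores it in the database's own encoding rather than the file's; only the BLOB column keeps the original bytes.

```python
import sqlite3

def store_and_inspect(legacy: bytes):
    """Store legacy Windows-1252 bytes twice (as decoded TEXT and as a raw BLOB)
    and report how SQLite holds each one. Hypothetical helper for illustration."""
    conn = sqlite3.connect(":memory:")
    # The database's own storage encoding, independent of any source file.
    storage_encoding = conn.execute("PRAGMA encoding").fetchone()[0]
    conn.execute("CREATE TABLE notes (body TEXT, raw BLOB)")
    # Decode with the (assumed) legacy code page before inserting the TEXT value.
    conn.execute("INSERT INTO notes VALUES (?, ?)",
                 (legacy.decode("cp1252"), legacy))
    # hex() exposes the bytes SQLite actually stored for each column.
    text_hex, blob_hex = conn.execute(
        "SELECT hex(body), hex(raw) FROM notes").fetchone()
    conn.close()
    return storage_encoding, text_hex, blob_hex

enc, text_hex, blob_hex = store_and_inspect("na\u00efve \u20ac".encode("cp1252"))
print(enc)       # storage encoding of this database
print(text_hex)  # TEXT column: characters re-encoded in that storage encoding
print(blob_hex)  # BLOB column: the original Windows-1252 bytes, untouched
```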
What happens when that data is extracted programmatically into a dataset, recordset or similar? Does it retain its encoding, or is it translated to UTF-16? If so, can that conversion introduce errors?
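A minimal Python sketch of the conversion step in question, assuming the legacy bytes are Windows-1252 (Python's `cp1252` codec) and that the target is UTF-16 as used by environments such as .NET, whose strings are UTF-16. The function names are hypothetical. Decoding is lossless for every byte value the code page defines; the failure cases shown are the few byte values Windows-1252 leaves undefined, and characters it cannot represent when converting back.

```python
# Sketch: legacy Windows-1252 bytes -> Unicode text -> UTF-16 bytes, and back.
# Assumes the source really is Windows-1252; decoding with the wrong code page
# usually does not raise, it silently produces the wrong characters.

def cp1252_to_utf16(raw: bytes) -> bytes:
    """Decode Windows-1252 bytes and re-encode as UTF-16 (little-endian, no BOM)."""
    text = raw.decode("cp1252")  # raises UnicodeDecodeError on undefined bytes
    return text.encode("utf-16-le")

def utf16_to_cp1252(data: bytes) -> bytes:
    """Reverse conversion; raises UnicodeEncodeError for characters cp1252 lacks."""
    return data.decode("utf-16-le").encode("cp1252")

sample = "Caf\u00e9 \u20ac5 \u201cquoted\u201d".encode("cp1252")
assert utf16_to_cp1252(cp1252_to_utf16(sample)) == sample  # lossless round trip

# 0x81 is one of the byte values Windows-1252 leaves undefined.
try:
    cp1252_to_utf16(b"\x81")
except UnicodeDecodeError as exc:
    print("undefined byte:", exc)

# The reverse direction can fail too: cp1252 has no code for Greek alpha.
try:
    utf16_to_cp1252("\u03b1".encode("utf-16-le"))
except UnicodeEncodeError as exc:
    print("unrepresentable character:", exc)
```

In other words, under these assumptions the risky direction is UTF-16 back to the legacy code page, plus any data whose true encoding was guessed wrongly in the first place.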
Should you make the effort to convert an existing data archive from the legacy encodings to UTF-16, to eliminate that conversion stage in the backend?
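If such a one-off migration were done, it might look roughly like the Python sketch below. It assumes every matching file is Windows-1252 and that UTF-16 is the chosen target; `convert_file` and `convert_tree` are hypothetical helpers. It works on raw bytes so line endings are preserved exactly, and it uses strict decoding so the run stops on undefined bytes instead of writing replacement characters.

```python
from pathlib import Path

def convert_file(src: Path, dst: Path,
                 source_encoding: str = "cp1252",
                 target_encoding: str = "utf-16") -> None:
    """Re-encode one file. Working on bytes avoids newline translation."""
    text = src.read_bytes().decode(source_encoding)  # strict: raises on bad bytes
    # Python's "utf-16" codec prepends a byte-order mark (BOM).
    dst.write_bytes(text.encode(target_encoding))

def convert_tree(root: Path, out_root: Path, pattern: str = "*.txt") -> list[Path]:
    """Convert every file matching pattern under root, mirroring the layout."""
    converted = []
    for src in root.rglob(pattern):
        if not src.is_file():
            continue
        dst = out_root / src.relative_to(root)
        dst.parent.mkdir(parents=True, exist_ok=True)
        convert_file(src, dst)
        converted.append(dst)
    return converted
```

Writing to a separate output tree, rather than converting in place, keeps the originals available if a file turns out not to be Windows-1252 after all.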
I've looked all over for this but can only find vague answers, with no actual system-wide, real-world examples.
Any help is appreciated.