Comments on Qb's C++ blog: Unicode and your application (1 of n)

Pavel Radzivilovsky (2014-04-25 14:33 +02:00):

Just one word. utf8everywhere.org

Anonymous (2013-07-17 20:59 +02:00):

> So, even if you have an 8-bit ASCII-codepaged text, you cannot use it as UTF8.

You are conflating the ASCII _encoding_ with the 8-bit SBCS _format_.

ASCII is decidedly a 7-bit encoding, end of story. *No* 8-bit encoding (Latin1, Windows-1252, ...) is synonymous with ASCII. To use the term "ASCII" when you really mean "any SBCS/MBCS encoding" does nothing but add to the confusion.

It is safe to say that for any encoding in common use today (UTF-xx, Latin1, SJIS, GBK, whatever), the first 128 characters of the encoding match the 128 characters of ASCII precisely. Anything beyond 0x7f is encoding-specific and simply cannot be represented as ASCII.
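
To make the 7-bit point from the second comment concrete, here is a minimal C++17 sketch. The helper names `is_ascii` and `latin1_to_utf8` are hypothetical, chosen for illustration, and are not taken from the blog post. The first checks whether a byte string is pure ASCII (and therefore already valid UTF-8). The second shows what has to happen to bytes above 0x7f, assuming the input is known to be Latin-1 (ISO-8859-1), where each byte value equals its Unicode code point.

```cpp
#include <string>
#include <string_view>

// Hypothetical helper: true if every byte is in the 7-bit ASCII range
// 0x00..0x7F. Such text is byte-for-byte identical in UTF-8 and Latin-1.
bool is_ascii(std::string_view bytes) {
    for (unsigned char c : bytes) {
        if (c > 0x7F) {
            return false;
        }
    }
    return true;
}

// Hypothetical helper: re-encode Latin-1 (ISO-8859-1) bytes as UTF-8.
// Assumes the input really is Latin-1; for any other 8-bit code page
// the bytes above 0x7F would need a different mapping table.
std::string latin1_to_utf8(std::string_view bytes) {
    std::string out;
    out.reserve(bytes.size());
    for (unsigned char c : bytes) {
        if (c < 0x80) {
            // ASCII range: copied through unchanged.
            out.push_back(static_cast<char>(c));
        } else {
            // U+0080..U+00FF becomes a two-byte UTF-8 sequence:
            // 110xxxxx 10xxxxxx.
            out.push_back(static_cast<char>(0xC0 | (c >> 6)));
            out.push_back(static_cast<char>(0x80 | (c & 0x3F)));
        }
    }
    return out;
}
```

For example, the Latin-1 bytes `"caf\xE9"` fail `is_ascii`, and `latin1_to_utf8` turns the single byte 0xE9 into the two bytes 0xC3 0xA9. Passing the original bytes to a UTF-8 consumer unchanged would yield an invalid sequence, which is the situation the quoted sentence describes.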