Converts texts between various character sets including Unicode (UTF).
This online tool converts text between different encodings (charsets) – for example, CP1251 text into a UTF-8-encoded document.
You can upload your document, enter it directly or pass a URL.
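As a rough illustration of what such a conversion involves, here is a minimal Python sketch. It is not the tool's own code; the function name `convert_charset` is hypothetical and the source charset is assumed to be known in advance.

```python
def convert_charset(data: bytes, source: str, target: str = "utf-8") -> bytes:
    """Decode bytes from the source charset and re-encode them in the target one."""
    text = data.decode(source)   # bytes in the source charset -> Unicode string
    return text.encode(target)   # Unicode string -> bytes in the target charset


# Sample input: a Russian word stored as CP1251, one byte per letter.
cp1251_bytes = "Привет".encode("cp1251")
utf8_bytes = convert_charset(cp1251_bytes, "cp1251")

print(len(cp1251_bytes), len(utf8_bytes))  # prints "6 12"
```

The text itself is unchanged; only its byte representation differs, which is why the UTF-8 result is twice as long for Cyrillic letters.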
Linking back with specific form settings
As with all tools here, you can link to a specific form setup with a URL of this form:
The result is:
- The «New charset» select box is set to «utf-8» (you can set it to any other value, of course; keep in mind that this value is case-sensitive);
- «Source charset» is set to «Detect automatically»;
- «Download converted» is set to off (the produced document will be shown).
Character sets are different schemes for representing textual data within a string. Generally, they can be divided into those that represent a limited subset of characters from the alphabets of the world's languages and those that represent all languages (plus mathematical, currency and other symbols) in a single character set. In the second case it is better to say «encoding», since there is only one character set and the variants differ only in how it is encoded.
The latter are variants of what is called Unicode (see also UTF (Unicode Transformation Format)).
Some of the former character encodings are called 1-byte (or 8-bit) encodings, such as the ANSI code pages or ISO-8859, because each one is used for a particular language or group of languages (e.g. US-ANSI) and each character takes up exactly one byte of memory.
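The difference in storage size can be checked with a short Python sketch. The helper name `encoded_sizes` is hypothetical, and the charsets listed are just a sample of the encodings Python ships with.

```python
def encoded_sizes(text: str, charsets: tuple[str, ...]) -> dict[str, int]:
    """Return how many bytes the text occupies in each charset."""
    return {name: len(text.encode(name)) for name in charsets}


# "é" fits into one byte in the single-byte charsets,
# but needs two or four bytes in the Unicode encodings.
sizes = encoded_sizes("é", ("latin-1", "cp1252", "utf-8", "utf-16-le", "utf-32-le"))
for name, size in sizes.items():
    print(f"{name}: {size} byte(s)")
```

The `-le` variants are used so that no byte order mark is counted in the result.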
This tool lets you convert your document (by direct input, by URL or by upload) between all supported charsets – ranging from national ANSI code pages to UTF-8, UTF-32, UCS-4 and plain Unicode.
Also, when converting text into a charset that might not have all the characters found in the original document, you can specify what to do with such characters:
- «Replace with similar-looking symbols» – perform a silent conversion of characters such as Ǒ or Ō into O (capital Latin letter «O»). If a more suitable non-Latin character exists in the target encoding, it will be used instead.
- Note: sometimes, if a symbol is encountered that can't be mapped to a similar symbol, an empty document is returned.
- «Ignore» – simply discard invalid characters; this is the most reliable option;
- «Cut from first illegal character» – stop the conversion at the first non-representable symbol in the source stream and return what has already been converted, ignoring the rest.
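The three options above can be approximated in Python as follows. This is only a sketch under stated assumptions, not the tool's implementation: the function names are hypothetical, «similar-looking» is approximated by stripping accents via Unicode decomposition, and characters with no such approximation become «?» rather than producing an empty document.

```python
import unicodedata


def replace_with_similar(text: str, target: str) -> bytes:
    """Approximate unencodable characters by their base letter, e.g. Ō -> O."""
    out = []
    for ch in text:
        try:
            ch.encode(target)
            out.append(ch)  # representable as-is
        except UnicodeEncodeError:
            # Decompose (Ō -> O + combining macron) and drop the combining marks.
            decomposed = unicodedata.normalize("NFKD", ch)
            out.append("".join(c for c in decomposed if not unicodedata.combining(c)))
    # Assumption: anything still unencodable is replaced with "?".
    return "".join(out).encode(target, errors="replace")


def ignore_illegal(text: str, target: str) -> bytes:
    """Silently drop every character the target charset cannot represent."""
    return text.encode(target, errors="ignore")


def cut_at_first_illegal(text: str, target: str) -> bytes:
    """Stop at the first unencodable character and return what precedes it."""
    try:
        return text.encode(target)
    except UnicodeEncodeError as err:
        return text[:err.start].encode(target)


sample = "Tōkyō ✓ ok"
print(replace_with_similar(sample, "ascii"))  # b'Tokyo ? ok'
print(ignore_illegal(sample, "ascii"))        # b'Tky  ok'
print(cut_at_first_illegal(sample, "ascii"))  # b'T'
```

The sketch does not look for a «more suitable non-Latin character» in the target encoding; it only falls back to the base letter of an accented character.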