Code point

In character encoding terminology, a code point, codepoint or code position is a numerical value that maps to a specific character. Code points usually represent a single grapheme (typically a letter, digit, punctuation mark, or whitespace) but sometimes represent symbols, control characters, or formatting. The set of all possible code points within a given encoding/character set makes up that encoding's codespace.
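As a small illustration of the mapping between characters and numerical values, the following sketch uses Python's built-in `ord` and `chr`, which operate on Unicode code points. The variable names are chosen here for illustration only.

```python
# Map a character to its code point and back again.
ch = "A"
cp = ord(ch)              # numerical value assigned to the character
back = chr(cp)            # character assigned to that numerical value

# Conventional "U+" notation: at least four uppercase hex digits.
label = f"U+{cp:04X}"

# Control characters have code points too, e.g. the line feed.
newline_cp = ord("\n")

print(ch, cp, label, back, newline_cp)
```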

For example, the character encoding scheme ASCII comprises 128 code points in the range 0hex to 7Fhex, Extended ASCII comprises 256 code points in the range 0hex to FFhex, and Unicode comprises 1,114,112 code points in the range 0hex to 10FFFFhex. The Unicode code space is divided into seventeen planes (the basic multilingual plane and 16 supplementary planes), each with 65,536 (= 2^16) code points. Thus the total size of the Unicode code space is 17 × 65,536 = 1,114,112.
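The arithmetic above can be checked with a short sketch. The constant names and the helper `plane_of` are hypothetical, introduced only for this example; the helper relies on each plane holding exactly 2^16 code points, so the plane number is the code point shifted right by 16 bits.

```python
PLANE_SIZE = 2 ** 16                      # 65,536 code points per plane
PLANE_COUNT = 17                          # 1 basic multilingual + 16 supplementary
CODESPACE_SIZE = PLANE_COUNT * PLANE_SIZE


def plane_of(code_point: int) -> int:
    """Return the plane number (0..16) that contains a Unicode code point."""
    if not 0 <= code_point < CODESPACE_SIZE:
        raise ValueError(f"outside the Unicode code space: {code_point:#x}")
    return code_point >> 16               # same as code_point // PLANE_SIZE


print(CODESPACE_SIZE)                     # total number of code points
print(plane_of(0x41), plane_of(0x1F600), plane_of(0x10FFFF))
```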


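The notion of a code point is used for abstraction, to distinguish both the number from its encoding as a sequence of bits, and the abstract character from a particular graphical representation (glyph).

This is because one may wish to make these distinctions in order to encode a particular code space in different ways, or to display a character via different glyphs.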

For Unicode, the particular sequence of bits is called a code unit. In the UCS-4 encoding, every code point is encoded as a 4-byte (octet) binary number, while in the UTF-8 encoding, different code points are encoded as sequences from one to four bytes long, forming a self-synchronizing code. See comparison of Unicode encodings for details. Code points are normally assigned to abstract characters. An abstract character is not a graphical glyph but a unit of textual data. However, code points may also be left reserved for future assignment (most of the Unicode code space is unassigned), or given other designated functions.
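The difference between fixed-width and variable-width encodings can be seen in a minimal sketch. It assumes Python's `utf-32-be` codec as a stand-in for UCS-4, since both use four bytes per code point; the function name `encoded_lengths` and the sample characters are chosen here for illustration.

```python
def encoded_lengths(ch: str) -> tuple[int, int]:
    """Return (UTF-8 byte count, UTF-32 byte count) for one code point."""
    # The "-be" variant is used so that no byte order mark is prepended.
    return len(ch.encode("utf-8")), len(ch.encode("utf-32-be"))


# One sample from each UTF-8 length class.
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]

for ch in samples:
    utf8_len, utf32_len = encoded_lengths(ch)
    print(f"U+{ord(ch):04X}: UTF-8 {utf8_len} byte(s), UTF-32 {utf32_len} bytes")
```

The same code point thus yields different byte sequences depending on the encoding, which is why the code point is kept separate from its representation in bits.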

The distinction between a code point and the corresponding abstract character is not pronounced in Unicode, but is evident for many other encoding schemes, where numerous code pages may exist for a single code space.
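A brief sketch of this point: in a single-byte code space, the same code point can stand for different abstract characters depending on the code page in use. The example assumes Python's standard `latin-1` (ISO 8859-1) and `cp1251` (Windows-1251) codecs; the choice of the value E9hex is arbitrary.

```python
# One code point in an 8-bit code space.
raw = bytes([0xE9])

# The same value interpreted under two different code pages.
as_latin1 = raw.decode("latin-1")    # Latin small letter e with acute
as_cp1251 = raw.decode("cp1251")     # Cyrillic small letter short i

print(f"U+{ord(as_latin1):04X}", f"U+{ord(as_cp1251):04X}")
```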