UnicodeCodec

protocol UnicodeCodec

A Unicode encoding form that translates between Unicode scalar values and form-specific code units.

Inheritance _UnicodeEncoding
Conforming Types Unicode.UTF16, Unicode.UTF32, Unicode.UTF8

The UnicodeCodec protocol declares methods that decode code unit sequences into Unicode scalar values and encode Unicode scalar values into code unit sequences. The standard library implements codecs for the UTF-8, UTF-16, and UTF-32 encoding schemes as the UTF8, UTF16, and UTF32 types, respectively. Use the Unicode.Scalar type to work with decoded Unicode scalar values.

Initializers

init init() Required

Creates an instance of the codec.

Declaration

init()

Instance Methods

func decode(_ input: inout I) -> UnicodeDecodingResult Required

Starts or continues decoding a code unit sequence into Unicode scalar values.

To decode a code unit sequence completely, call this method repeatedly until it returns UnicodeDecodingResult.emptyInput. Checking that the iterator was exhausted is not sufficient, because the decoder can store buffered data from the input iterator.

Because of buffering, it is impossible to find the corresponding position in the iterator for a given returned Unicode.Scalar or an error.

The following example decodes the UTF-8 encoded bytes of a string into an array of Unicode.Scalar instances:

let str = "✨Unicode✨"
print(Array(str.utf8))
// Prints "[226, 156, 168, 85, 110, 105, 99, 111, 100, 101, 226, 156, 168]"

var bytesIterator = str.utf8.makeIterator()
var scalars: [Unicode.Scalar] = []
var utf8Decoder = UTF8()
Decode: while true {
    switch utf8Decoder.decode(&bytesIterator) {
    case .scalarValue(let v): scalars.append(v)
    case .emptyInput: break Decode
    case .error:
        print("Decoding error")
        break Decode
    }
}
print(scalars)
// Prints "["\u{2728}", "U", "n", "i", "c", "o", "d", "e", "\u{2728}"]"
  • Parameter input: An iterator of code units to be decoded. input must be the same iterator instance in repeated calls to this method. Do not advance the iterator or any copies of the iterator outside this method.

Declaration

mutating func decode<I>(_ input: inout I) -> UnicodeDecodingResult where I: IteratorProtocol, Self.CodeUnit == I.Element

Type Methods

func encode(_ input: Unicode.Scalar, into processCodeUnit: (Self.CodeUnit) -> Void) Required

Encodes a Unicode scalar as a series of code units by calling the given closure on each code unit.

For example, the musical fermata symbol ("𝄐") is a single Unicode scalar value (\u{1D110}) but requires four code units for its UTF-8 representation. The following code uses the UTF8 codec to encode a fermata in UTF-8:

var bytes: [UTF8.CodeUnit] = []
UTF8.encode("𝄐", into: { bytes.append($0) })
print(bytes)
// Prints "[240, 157, 132, 144]"

Declaration

static func encode(_ input: Unicode.Scalar, into processCodeUnit: (Self.CodeUnit) -> Void)