GitHub - railgunlabs/unicorn: Unicode® algorithms on a chip. Compliant with MISRA C:2012.

Unicorn is a lightweight, embeddable implementation of essential Unicode® algorithms written in C99.

Unicorn is compliant with the MISRA C:2012 coding standard. It's perfect for resource-constrained devices like microcontrollers and IoT devices.

Features

Normalization (docs)
Case mapping (docs)
Collation (docs)
Segmentation (docs)
Short string compression (docs)
UTF-8, UTF-16, and UTF-32 iterators and converters (docs)
Various character properties (docs)
MISRA C:2012 compliance (learn more)
Written in C99 with no external dependencies

Fully Customizable

Unicorn is fully customizable. You can choose which Unicode algorithms and character properties to include.

To customize Unicorn, modify features.json and compile the code according to the build instructions. The schema for features.json is documented here.

Ultra Portable

Unicorn is ultra portable. It does not require an FPU or 64-bit integers. It's written in C99 and requires only a few features from libc, which are listed in the following table.

| Header | Types | Macros | Functions |
| --- | --- | --- | --- |
| stdint.h | `int8_t`, `int16_t`, `int32_t`, `uint8_t`, `uint16_t`, `uint32_t` | | |
| string.h | | | `memcpy`, `memset`, `memcmp` |
| stddef.h | `size_t` | `NULL` | |
| stdbool.h | | `bool`, `true`, `false` | |
| assert.h | | `assert` | |
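
To give a feel for what fits inside that footprint, here is a minimal sketch of a UTF-8 encoder that uses only the headers listed above and nothing wider than 32-bit integer arithmetic. The function `encode_utf8` and the helper `is_scalar_value` are hypothetical names chosen for illustration; they are not part of Unicorn's API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not Unicorn's API): true if `cp` is a Unicode
 * scalar value, i.e. in range and not a surrogate. */
static bool is_scalar_value(uint32_t cp)
{
    return (cp <= 0x10FFFFu) && !((cp >= 0xD800u) && (cp <= 0xDFFFu));
}

/* Hypothetical helper (not Unicorn's API): encode one scalar value as
 * UTF-8. Returns the number of bytes written (1 to 4), or 0 if `cp` is
 * not a scalar value, in which case `out` is left untouched. */
static size_t encode_utf8(uint32_t cp, uint8_t out[4])
{
    uint8_t tmp[4];
    size_t n;

    assert(out != NULL);

    if (!is_scalar_value(cp)) {
        return 0;
    }

    if (cp < 0x80u) {
        tmp[0] = (uint8_t)cp;
        n = 1;
    } else if (cp < 0x800u) {
        tmp[0] = (uint8_t)(0xC0u | (cp >> 6));
        tmp[1] = (uint8_t)(0x80u | (cp & 0x3Fu));
        n = 2;
    } else if (cp < 0x10000u) {
        tmp[0] = (uint8_t)(0xE0u | (cp >> 12));
        tmp[1] = (uint8_t)(0x80u | ((cp >> 6) & 0x3Fu));
        tmp[2] = (uint8_t)(0x80u | (cp & 0x3Fu));
        n = 3;
    } else {
        tmp[0] = (uint8_t)(0xF0u | (cp >> 18));
        tmp[1] = (uint8_t)(0x80u | ((cp >> 12) & 0x3Fu));
        tmp[2] = (uint8_t)(0x80u | ((cp >> 6) & 0x3Fu));
        tmp[3] = (uint8_t)(0x80u | (cp & 0x3Fu));
        n = 4;
    }

    /* Copy only after the whole sequence has been built. */
    memcpy(out, tmp, n);
    return n;
}
```

Nothing here needs floating point, 64-bit integers, or dynamic allocation, which is the kind of constraint the table above describes.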

MISRA C:2012 Compliance

Unicorn honors all Mandatory, most Required, and most Advisory rules defined by MISRA C:2012 and its amendments. Deviations are documented here. You are encouraged to audit Unicorn and verify its level of conformance is acceptable.

Supported Unicode Encodings

All functions that operate on text can accept UTF-8, UTF-16, UTF-32, or Unicode scalar values. UTF-16 and UTF-32 are accepted as big endian, little endian, and native byte order.
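
As an illustration of what accepting UTF-16 in big endian, little endian, or native byte order involves, the sketch below reads code units from a byte buffer in an explicit order and combines surrogate pairs. All names (`byte_order`, `read_u16`, `native_order`, `decode_utf16`) are hypothetical and are not taken from Unicorn's API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical names for illustration only. */
typedef enum { ORDER_BIG, ORDER_LITTLE } byte_order;

/* Read one 16-bit code unit from two bytes in the given byte order. */
static uint16_t read_u16(const uint8_t *p, byte_order order)
{
    assert(p != NULL);
    if (order == ORDER_BIG) {
        return (uint16_t)(((uint32_t)p[0] << 8) | (uint32_t)p[1]);
    }
    return (uint16_t)(((uint32_t)p[1] << 8) | (uint32_t)p[0]);
}

/* Determine the byte order of the machine running this code. */
static byte_order native_order(void)
{
    const uint16_t probe = 1u;
    uint8_t first;
    memcpy(&first, &probe, 1);
    return (first == 1u) ? ORDER_LITTLE : ORDER_BIG;
}

/* Decode one scalar value from UTF-16 bytes. Returns the number of bytes
 * consumed (2 or 4), or 0 if the input is truncated or contains an
 * unpaired surrogate. `*cp` is written only on success. */
static size_t decode_utf16(const uint8_t *bytes, size_t len,
                           byte_order order, uint32_t *cp)
{
    uint16_t hi;
    uint16_t lo;

    assert(bytes != NULL && cp != NULL);

    if (len < 2u) {
        return 0;
    }
    hi = read_u16(bytes, order);
    if (hi < 0xD800u || hi > 0xDFFFu) {
        *cp = hi; /* not a surrogate: a single code unit */
        return 2;
    }
    if (hi > 0xDBFFu || len < 4u) {
        return 0; /* lone low surrogate, or truncated pair */
    }
    lo = read_u16(bytes + 2, order);
    if (lo < 0xDC00u || lo > 0xDFFFu) {
        return 0; /* high surrogate not followed by a low surrogate */
    }
    *cp = 0x10000u + (((uint32_t)hi - 0xD800u) << 10) + ((uint32_t)lo - 0xDC00u);
    return 4;
}
```

Passing `native_order()` as the `order` argument covers the native case, so one decoder serves all three byte-order variants.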

By default, the implementation performs runtime safety checks to guard against malformed or maliciously encoded text. If you know your text is well-formed, you can opt in to skipping these checks to improve processing performance.
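
This README does not enumerate the safety checks, so the following is only a sketch of the standard UTF-8 well-formedness rules a decoder of this kind typically enforces: valid lead bytes, no truncated sequences, valid continuation bytes, no overlong encodings, no surrogates, and no values above U+10FFFF. The function `decode_utf8_checked` is a hypothetical name, not Unicorn's API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not Unicorn's API): decode one scalar value from
 * UTF-8. Returns the number of bytes consumed (1 to 4), or 0 if the
 * sequence is malformed. `*cp` is written only on success. */
static size_t decode_utf8_checked(const uint8_t *s, size_t len, uint32_t *cp)
{
    uint32_t value;
    uint32_t minimum; /* smallest value that legitimately needs n bytes */
    size_t n;
    size_t i;

    assert(s != NULL && cp != NULL);

    if (len == 0u) {
        return 0;
    }
    if (s[0] < 0x80u) {
        *cp = s[0];
        return 1;
    } else if ((s[0] & 0xE0u) == 0xC0u) {
        value = s[0] & 0x1Fu; n = 2; minimum = 0x80u;
    } else if ((s[0] & 0xF0u) == 0xE0u) {
        value = s[0] & 0x0Fu; n = 3; minimum = 0x800u;
    } else if ((s[0] & 0xF8u) == 0xF0u) {
        value = s[0] & 0x07u; n = 4; minimum = 0x10000u;
    } else {
        return 0; /* stray continuation byte or invalid lead byte */
    }

    if (len < n) {
        return 0; /* truncated sequence */
    }
    for (i = 1; i < n; i++) {
        if ((s[i] & 0xC0u) != 0x80u) {
            return 0; /* not a continuation byte */
        }
        value = (value << 6) | (uint32_t)(s[i] & 0x3Fu);
    }

    if (value < minimum) {
        return 0; /* overlong encoding */
    }
    if (value >= 0xD800u && value <= 0xDFFFu) {
        return 0; /* surrogate code point */
    }
    if (value > 0x10FFFFu) {
        return 0; /* beyond the Unicode code space */
    }

    *cp = value;
    return n;
}
```

A decoder that trusts its input could drop every early return after the lead-byte dispatch, which is roughly where the performance gain from skipping checks would come from.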

Thread Safety

Unicorn is thread-safe except for the following caveats:

Functions that allocate memory are only as thread-safe as the allocator itself.
The configuration API is not thread-safe. In typical usage, however, it's only invoked at application startup, and only if the default configuration is unsatisfactory.

Atomic Operations

All operations in Unicorn are atomic: an operation either completes in full or has no effect at all. This guarantees that errors, such as out-of-memory errors, never corrupt internal state. It also means that when an error occurs, you can recover (for example, by freeing up memory) and try the same operation again.
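
The all-or-nothing pattern can be sketched in a few lines. The example below is not Unicorn code: `buffer` and `buffer_append` are hypothetical, and a fixed-capacity buffer stands in for an allocator so that "out of space" plays the role of an out-of-memory error. The point is the ordering: every failure condition is checked before any state is modified.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical fixed-capacity buffer, used here as a stand-in for
 * state that depends on an allocator. */
typedef struct {
    uint8_t data[8];
    size_t  length;
} buffer;

/* Append `n` bytes either completely or not at all. On failure the
 * buffer is left exactly as it was, so the caller can free up space
 * and retry the same call. */
static bool buffer_append(buffer *b, const uint8_t *src, size_t n)
{
    assert(b != NULL && src != NULL);
    assert(b->length <= sizeof b->data);

    /* Check for failure first, before touching any state. */
    if (n > (sizeof b->data - b->length)) {
        return false;
    }

    /* Past this point the operation cannot fail. */
    memcpy(&b->data[b->length], src, n);
    b->length += n;
    return true;
}
```

If `buffer_append` returns `false`, a caller can release space (here, by resetting `length`) and call it again with the same arguments, which mirrors the recover-and-retry flow described above.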

Extensively Tested

100% branch test coverage
Official Unicode conformance tests
Manually written tests
Out-of-memory tests
Fuzz tests
Static analysis
Valgrind analysis
Code sanitizers (UBSAN, ASAN, and MSAN)
Extensive use of assert() and run-time checks

Building

Building Unicorn requires Python 3.10 or newer and a C99 compiler.

To build Unicorn, download the latest version from the releases page and build with

$ ./configure
$ make
$ make install

or build with CMake

$ cmake -B build
$ cmake --build build
$ cmake --install build

If you modify features.json, both the configure and CMake scripts will auto-detect the changes and rebuild accordingly.

Support

Submit patches and bug reports to RailgunLabs.com/contribute. Do not open a pull request. The pull request tab is enabled because GitHub does not provide a mechanism to disable it.

License

Unicorn is available under the following licenses:

The unit tests and Unicode data generators are not public. Access to them is granted exclusively to commercial licensees.

Unicode® is a registered trademark of Unicode, Inc. in the United States and other countries. This project is not in any way associated with or endorsed or sponsored by Unicode, Inc. (aka The Unicode Consortium).
