String Encoding
Contents
String Encoding#
This topic describes the string encoding used by the AVBlocks C++ API.
Unicode Strings#
The AVBlocks API uses UTF-16 unicode encoding.
The AVBlocks API character type is char_t
, where char_t
represents a UTF-16 character and is defined as a 16 bit unsigned integer on all platforms (Windows, Mac and Linux).
Note: A Unicode character can be expressed by 1 or 2 UTF-16 code units, i.e. by 2 or 4 bytes.
Windows#
On Windows the char_t
type is the same as the wchar_t
type. Your Windows application must be compiled for Unicode OR alternatively you should convert strings to wchar_t
before passing calling the AVBlocks API.
Mac#
CoreFoundation#
CFStringRef
is UTF-16, so the encoding matches the char_t
encoding. In your code, you can use the CFStringGetCharactersPtr
function:
// Get a UniChar* pointer from CFStringRef and pass it to AVBlocks
NSString theString;
const UniChar* theStringPtr = CFStringGetCharactersPtr((__bridge CFStringRef) theString);
primo::codecs::MetaAttribute* metaAttribute = primo::avblocks::Library::createMetaAttribute();
metaAttribute->setName(primo::codecs::Meta::Album);
metaAttribute->setValue(theStringPtr);
metaAttribute->release();
Foundation#
NSString is UTF-16. The UTF-16 encoding matches the AVBlocks API. You can use the CFStringGetCharactersPtr
function:
// Get a UniChar* pointer from NSString and pass it to AVBlocks
NSString theString;
const UniChar* theStringPtr = CFStringGetCharactersPtr((__bridge CFStringRef) theString);
primo::codecs::MetaAttribute* metaAttribute = primo::avblocks::Library::createMetaAttribute();
metaAttribute->setName(primo::codecs::Meta::Album);
metaAttribute->setValue(theStringPtr);
metaAttribute->release();
Command Line#
On Mac / Unix, wchar_t
is defined as a 32-bit unsigned integer by default. Conversion is required between your code (if using UTF-8 or UTF-32 encoding) and char_t
(UTF-16 encoding).
AVBlocks comes with the primo::ustring
helper class (defined in include/PrimoUString.h
) which can be used for string conversion. The MetaInfo
sample demonstrates the use of the primo::ustring
class:
void printMetadata(Metadata* meta)
{
MetaAttributeList* attlist = meta->attributes();
for (int i=0; i < attlist->count(); ++i)
{
MetaAttribute* attrib = attlist->at(i);
cout << setw(15) << left
<< attrib->name() << ": "
<< primo::ustring(attrib->value()) << endl;
}
}
Linux#
Command Line#
On Linux, wchar_t
is defined as a 32-bit unsigned integer by default. Conversion is required between your code (if using UTF-8 or UTF-32 encoding) and char_t
(UTF-16 encoding).
AVBlocks comes with the primo::ustring
helper class (defined in include/PrimoUString.h
) which can be used for string conversion. The MetaInfo
sample demonstrates the use of the primo::ustring
class:
void printMetadata(Metadata* meta)
{
MetaAttributeList* attlist = meta->attributes();
for (int i=0; i < attlist->count(); ++i)
{
MetaAttribute* attrib = attlist->at(i);
cout << setw(15) << left
<< attrib->name() << ": "
<< primo::ustring(attrib->value()) << endl;
}
}
ANSI Strings#
The text constants which are part of the AVBlocks API itself are plain 8-bit ANSI char
for convenience. For example, the primo::codecs::Meta
namespace contains metadata attribute names defined as ANSI strings:
/**
Metadata attribute names
*/
namespace Meta
{
/** Album */
static const char Album[] = "Album";
/** Composer */
static const char Composer[] = "Composer";
/** Genre */
static const char Genre[] = "Genre";
/** Copyright */
static const char Copyright[] = "Copyright";
}