Maul Publisher goes Unicode
With the modern requirement to provide advanced editing features, Unicode has become essential. Maul Publisher V3.06 is the first version to use the Unicode library found in recent versions of OS/2 and eComStation. But what does that mean to the end user?
What is Maul Publisher
Maul Publisher is an industrial-strength desktop publisher capable of producing virtually all of the printed matter seen on everyday household items. You can use it to easily lay out newspapers, cards, books, labels, stamps, posters, charts, forms, and even designs like building plans or furniture arrangements.
In essence, the application combines text and images on a page. Once the page has been created, you can print it, create a PDF document with it, or turn it into an image or metafile. Maul has some uniquely powerful tools to deal with both pictures and text, and is specially designed to provide the best quality possible for a given printer.
Because the application tunes its output to the printer, you must have a printer installed. A printer's resolution is between four and eight times finer than a screen's. Because of this, and because rounding is minimised, the printed output from Maul is usually stunningly clear.
What is Unicode
Unicode is designed to support a character set larger than the 256 codepoints of a single-byte codepage. This has several distinct advantages:
- The character set can support many languages
- More codepoints are available for symbols and special graphics
- The application does less language- and codepage-specific processing
- The Unicode API provides advanced character testing features
With version 3.06, Maul Publisher uses the OS/2 Unicode API, and this has a significant impact on how the application decides where to place text and what text to place.
Maul and Unicode
The Unicode API provides a much more useful set of character tests. This probably has very little impact on western languages, where words are separated by spaces, but it brings major improvements for languages where words are not separated by spaces, such as Japanese. By testing for the _punctstart and _punctend attributes, Maul can now correctly format pictogram strings in quotation marks.
The additional characters available in Unicode enable Maul to support smart quotes for the first time. I have called them Intelligent text quotes, because "smart quotes" is the phrase used by MS Office. And anyway, Maul does it better:
Fig. 1. Intelligent text quotes example
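The article does not describe the rules Maul uses to choose between opening and closing quotes. As a hypothetical sketch only, not Maul's own algorithm, the common heuristic below treats a straight quote as opening when it follows the start of the text, whitespace, or an opening bracket, and as closing otherwise. It works on a 16-bit Unicode buffer, and the function names `opens_here` and `smarten_quotes` are illustrative.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical heuristic: a straight quote opens if it follows the start
   of the text, whitespace, or an opening bracket; otherwise it closes. */
static int opens_here(const uint16_t *text, size_t i)
{
    if (i == 0)
        return 1;
    uint16_t prev = text[i - 1];
    return prev == 0x0020 || prev == 0x0009 || prev == 0x000A || /* space, tab, LF */
           prev == 0x0028 || prev == 0x005B || prev == 0x007B;   /* ( [ {          */
}

/* Replaces straight quotes in a 16-bit Unicode buffer with typographic ones. */
static void smarten_quotes(uint16_t *text, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (text[i] == 0x0022)            /* QUOTATION MARK "              */
            text[i] = opens_here(text, i) ? 0x201C : 0x201D; /* left/right double */
        else if (text[i] == 0x0027)       /* APOSTROPHE '                  */
            text[i] = opens_here(text, i) ? 0x2018 : 0x2019; /* left/right single */
    }
}
```

With this rule the apostrophe in don't becomes U+2019, which is also the preferred typographic apostrophe. Cases such as a leading elided apostrophe ('tis) defeat the heuristic, which is presumably where a more careful approach earns its keep.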
The Unicode character set gives every character a name, so the application can look a character up by name and does not need to know its codepoint. This made it possible to add a bulleted (and numbered) list tool, which considerably simplifies adding lists to your text articles.
Fig. 2. Bullets and lists example
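Lookup by name can be pictured as a table from Unicode character names to codepoints. The sketch below is a hypothetical illustration, not Maul's code or the OS/2 API: it holds only four real bullet names from the Unicode standard, and `lookup_by_name` is an invented helper.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical excerpt of a name table: real Unicode names, tiny subset. */
struct named_char {
    const char *name;
    uint16_t    code;
};

static const struct named_char bullet_names[] = {
    { "BULLET",            0x2022 },
    { "TRIANGULAR BULLET", 0x2023 },
    { "HYPHEN BULLET",     0x2043 },
    { "WHITE BULLET",      0x25E6 },
};

/* Returns the codepoint for a character name, or 0 if the name is unknown. */
static uint16_t lookup_by_name(const char *name)
{
    for (size_t i = 0; i < sizeof bullet_names / sizeof bullet_names[0]; i++)
        if (strcmp(bullet_names[i].name, name) == 0)
            return bullet_names[i].code;
    return 0;
}
```

A list tool written this way asks for "BULLET" or "WHITE BULLET" rather than hard-coding 0x2022 or 0x25E6.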
Maul and character testing
Because Maul was developed by just one person (me), I can only guess which character properties to use when formatting a text article. This means that I must depend on you, the end user, to help determine what these character tests should be.
Character tests are used to divide sentences into words. The words then determine how much text fits onto a line. Where appropriate, hyphenation breaks a word in two when it does not fit on a line.
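How words determine what fits on a line can be illustrated with a simple greedy fill. This is a hypothetical sketch, not Maul's layout engine: it assumes each word's width (including any trailing space) is already known in device units, and it leaves out hyphenation entirely. `fill_lines` is an invented name.

```c
#include <stddef.h>

/* Greedy line filling: puts as many whole words on each line as fit.
   widths[] holds each word's width in device units (hypothetical input).
   line_starts[] receives the index of the first word on each line, up to
   max_lines entries; the return value is the total number of lines.
   Without hyphenation, a word wider than the line gets a line to itself. */
static size_t fill_lines(const int *widths, size_t nwords, int line_width,
                         size_t *line_starts, size_t max_lines)
{
    size_t lines = 0;
    int used = 0;

    for (size_t i = 0; i < nwords; i++) {
        if (i == 0 || used + widths[i] > line_width) {
            if (lines < max_lines)
                line_starts[lines] = i;   /* word i starts a new line */
            lines++;
            used = 0;
        }
        used += widths[i];
    }
    return lines;
}
```

A hyphenating formatter would step in at the point where `used + widths[i]` overflows, splitting word i instead of moving it whole to the next line.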
Generally, for western languages the space character determines where a word ends. However, there are situations where no space is present. This can happen where a comma is used to separate two words, as in hello,there.
Maul can break up this string by testing for characters with a break attribute. The space character is a classic example of a character that has a break attribute.
For pictogram languages such as Japanese, every character has a break attribute. This behaviour must be modified when the pictogram is in quotes, which is achieved with an attach attribute. The attach attribute overrides the break attribute of the previous character.
Characters that the Unicode API tests as _punctend are marked with both a break and an attach attribute. All the alphanumeric characters have no attributes set.
It turns out that you can break up a string into words in any language with just the two attributes described above. The example below shows how this works with some Kanji text. I have shown the attributes on the second line. Note that the Kanji string is not meant to mean anything in particular.
Fig. 3. Attributes example
Because of the attach attributes (marked ab in the figure), the separated words include their closing quotes.
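The two-attribute rule can be written as: a new word starts after any character with a break attribute, unless the next character has an attach attribute. The sketch below is a hypothetical illustration of that rule, not Maul's code. The flag values are those in Maul's attribute list at the end of this article, but `classify` is a tiny invented stand-in for the Unicode API tests (space, comma, CJK ideographs and kana break; two closing quotes break and attach; everything else, including opening quotes, has no attributes).

```c
#include <stddef.h>
#include <stdint.h>

#define CHARCLASS_BREAKING 0x001
#define CHARCLASS_ATTACH   0x002

/* Hypothetical classifier standing in for the Unicode API character tests. */
static unsigned classify(uint16_t c)
{
    if (c == 0x0020 || c == 0x002C)                 /* space, comma           */
        return CHARCLASS_BREAKING;
    if (c == 0x300D || c == 0x201D)                 /* closing quotes 」 ”     */
        return CHARCLASS_BREAKING | CHARCLASS_ATTACH;
    if ((c >= 0x3040 && c <= 0x30FF) ||             /* hiragana, katakana     */
        (c >= 0x4E00 && c <= 0x9FFF))               /* CJK unified ideographs */
        return CHARCLASS_BREAKING;
    return 0;                                       /* letters, opening quotes */
}

/* Stores the start index of each word in starts[] (up to max_words entries)
   and returns the number of word starts stored. */
static size_t split_words(const uint16_t *text, size_t len,
                          size_t *starts, size_t max_words)
{
    size_t count = 0;
    for (size_t i = 0; i < len && count < max_words; i++) {
        /* A word starts at i if the previous character breaks and the
           current one does not attach to it. */
        int new_word = (i == 0) ||
                       ((classify(text[i - 1]) & CHARCLASS_BREAKING) &&
                        !(classify(text[i]) & CHARCLASS_ATTACH));
        if (new_word)
            starts[count++] = i;
    }
    return count;
}
```

For 「漢字」で this yields the words 「漢, 字」 and で: the closing bracket attaches to 字, and because the opening bracket has no attributes it stays with the character after it. For hello,there it yields hello, and there.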
If these attributes are wrong (for example, if bopomofo is not marked as breaking), the text formatter will tend to fail. As I don't use these languages, I rely on you to tell me if it doesn't work!
Unicode limitations
Because Maul Publisher relies on zero-based text escape sequences called LOLs, only UTF-8 Unicode is supported. The UTF-8 Unicode codepage number is 1208. UTF-8 can be processed by systems that work one byte at a time, and apart from the NUL character itself it never contains a zero byte. A single UTF-8 codepoint can take several bytes: the current standard uses up to four, and the original specification allowed up to six.
OS/2 currently supports only codepoints whose UTF-8 form is at most 3 bytes long (the 16-bit Unicode range), and Maul is designed with this in mind. This covers the full gamut of characters available in the Unicode-compatible fonts for OS/2. UTF-8 can take up more space than 16-bit Unicode (most CJK characters need three bytes instead of two), and to test characters, the UTF-8 sequences must first be converted into 16-bit Unicode.
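The conversion described above can be sketched as a decoder for UTF-8 sequences of at most three bytes, which is exactly the 16-bit range. This is a hypothetical, self-contained illustration; Maul presumably uses the OS/2 Unicode conversion functions rather than hand-written code, and `utf8_to_ucs2` is an invented name. Note that every byte of a multi-byte sequence is 0x80 or above, which is why UTF-8 never introduces a stray zero byte.

```c
#include <stddef.h>
#include <stdint.h>

/* Decodes one UTF-8 sequence of at most 3 bytes into a 16-bit Unicode value.
   Returns the number of bytes consumed, or 0 if the input is empty,
   malformed, overlong, a surrogate, or needs 4 bytes (beyond 16 bits). */
static size_t utf8_to_ucs2(const unsigned char *s, size_t len, uint16_t *out)
{
    if (len == 0)
        return 0;
    if (s[0] < 0x80) {                          /* 0xxxxxxx */
        *out = s[0];
        return 1;
    }
    if ((s[0] & 0xE0) == 0xC0) {                /* 110xxxxx 10xxxxxx */
        if (len < 2 || (s[1] & 0xC0) != 0x80)
            return 0;
        uint16_t c = (uint16_t)(((s[0] & 0x1F) << 6) | (s[1] & 0x3F));
        if (c < 0x80)                           /* overlong encoding */
            return 0;
        *out = c;
        return 2;
    }
    if ((s[0] & 0xF0) == 0xE0) {                /* 1110xxxx 10xxxxxx 10xxxxxx */
        if (len < 3 || (s[1] & 0xC0) != 0x80 || (s[2] & 0xC0) != 0x80)
            return 0;
        uint16_t c = (uint16_t)(((s[0] & 0x0F) << 12) |
                                ((s[1] & 0x3F) << 6) | (s[2] & 0x3F));
        if (c < 0x800 || (c >= 0xD800 && c <= 0xDFFF))
            return 0;                           /* overlong or surrogate */
        *out = c;
        return 3;
    }
    return 0;                                   /* stray continuation or 4+ byte lead */
}
```

For example, the bytes E6 BC A2 decode to U+6F22 (漢): three bytes in UTF-8, but a single 16-bit value that the character tests can work on.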
The Insert character dialog provided in Maul Publisher shows the byte sequence as it is found in the file, the 16 bit Unicode equivalent, the character attributes, and where possible the name of the character.
The full list of character attributes as Maul displays them is:
CHARCLASS_BREAKING   0x001
CHARCLASS_ATTACH     0x002
CHARCLASS_SPACE      0x004
CHARCLASS_HYPHEN     0x008
CHARCLASS_QUOTE      0x010
CHARCLASS_RQUOTE     0x020
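So the right quote in the example image above (Figure 3) has the code [0033], that is, CHARCLASS_BREAKING + CHARCLASS_ATTACH + CHARCLASS_QUOTE + CHARCLASS_RQUOTE (0x001 + 0x002 + 0x010 + 0x020):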
Fig. 4. Insert char dialog
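Because the attribute code is a bit mask, a value such as [0033] can be decoded by testing each flag in turn. The flag values below are Maul's attribute list; the decoding function `describe_attrs` is a hypothetical illustration of how such a code reads, not code taken from Maul.

```c
#include <stddef.h>
#include <string.h>

#define CHARCLASS_BREAKING 0x001
#define CHARCLASS_ATTACH   0x002
#define CHARCLASS_SPACE    0x004
#define CHARCLASS_HYPHEN   0x008
#define CHARCLASS_QUOTE    0x010
#define CHARCLASS_RQUOTE   0x020

static const struct {
    unsigned    flag;
    const char *name;
} charclass_names[] = {
    { CHARCLASS_BREAKING, "CHARCLASS_BREAKING" },
    { CHARCLASS_ATTACH,   "CHARCLASS_ATTACH"   },
    { CHARCLASS_SPACE,    "CHARCLASS_SPACE"    },
    { CHARCLASS_HYPHEN,   "CHARCLASS_HYPHEN"   },
    { CHARCLASS_QUOTE,    "CHARCLASS_QUOTE"    },
    { CHARCLASS_RQUOTE,   "CHARCLASS_RQUOTE"   },
};

/* Writes the names of the flags set in attr into buf, separated by " | ",
   truncating safely if buf is too small. An attr of 0 gives "". */
static void describe_attrs(unsigned attr, char *buf, size_t size)
{
    if (size == 0)
        return;
    buf[0] = '\0';
    for (size_t i = 0; i < sizeof charclass_names / sizeof charclass_names[0]; i++) {
        if (attr & charclass_names[i].flag) {
            if (buf[0] != '\0')
                strncat(buf, " | ", size - strlen(buf) - 1);
            strncat(buf, charclass_names[i].name, size - strlen(buf) - 1);
        }
    }
}
```

Called with 0x0033, it produces "CHARCLASS_BREAKING | CHARCLASS_ATTACH | CHARCLASS_QUOTE | CHARCLASS_RQUOTE", matching the right quote shown in the dialog.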