Thursday, 29 November 2007

XML Entity definitions for Characters

In many contexts people find it convenient to enter characters that are not on the keyboard as entity references, such as &rightarrow; to get an arrow, rather than remembering which keyboard shortcut or numeric reference (&#x2192;) would produce it. In many cases life would be simpler if people did not do this. Using entity references means not only that you need a <!DOCTYPE declaration referencing a DTD that defines the entities, but also that your XML parser must read that DTD. It also makes processing fragments of XML much harder. Either the fragments have no <!DOCTYPE (in which case they are not, on their own, well formed), and the fragment-pasting operation must ensure that a suitable DTD reference is placed on the target document; or the fragments do have a doctype, and the pasting operation must strip it off while still ensuring that the target document has a compatible DTD.
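As a minimal sketch of the point above, here is the same arrow written three ways and handed to Python's standard xml.etree.ElementTree parser: a named entity with no DOCTYPE, a numeric character reference, and a named entity whose definition is supplied in an internal DTD subset (standing in, as an assumption, for the external DTD a real document would reference). The helper name root_text is hypothetical.

```python
import xml.etree.ElementTree as ET

def root_text(xml_text):
    """Return the root element's text, or None if the parser rejects the input."""
    try:
        return ET.fromstring(xml_text).text
    except ET.ParseError:
        return None

# Named entity, no DOCTYPE: the parser has no definition, so this is not well formed.
named_no_dtd = "<p>&rightarrow;</p>"
# Numeric character reference: needs no DTD at all.
numeric = "<p>&#x2192;</p>"
# Named entity defined in an internal DTD subset (standing in for a real DTD).
named_with_dtd = '<!DOCTYPE p [<!ENTITY rightarrow "&#x2192;">]><p>&rightarrow;</p>'

for label, doc in [("named, no DTD", named_no_dtd),
                   ("numeric", numeric),
                   ("named, internal DTD", named_with_dtd)]:
    print(f"{label}: {root_text(doc)!r}")
```

Only the first case fails; the numeric reference works without any DTD, while the named form works only once a definition is in scope.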

If fragments are being moved from one place to another, this can be difficult. Consider, for example, moving a fragment of MathML from an XHTML document to DocBook. In several cases XHTML and DocBook define entities with the same name but different definitions (the original ISO definitions of the entity names were not given in terms of Unicode characters), and older versions of Unicode did not have enough technical symbols to give sensible definitions for most of these names.
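One way round the pasting problem, sketched here as an illustration rather than a procedure from the entity draft: resolve the entities against the source document's own definitions first, then serialise the fragment using only numeric character references, so the target document needs no matching entity definitions at all. The internal DTD subset below is an assumed stand-in for the XHTML or DocBook DTD a real source document would reference.

```python
import xml.etree.ElementTree as ET

# Source document: the internal DTD subset stands in for the external DTD
# (XHTML, DocBook, ...) that would define the entity in practice.
source = ('<!DOCTYPE div [<!ENTITY rightarrow "&#x2192;">]>'
          '<div><math><mo>&rightarrow;</mo></math></div>')

# Parsing expands the entity using the *source* document's definition.
fragment = ET.fromstring(source).find("math")

# Serialising as US-ASCII writes every non-ASCII character as a numeric
# character reference, so the pasted fragment needs no DTD in the target.
portable = ET.tostring(fragment, encoding="us-ascii").decode("ascii")
print(portable)  # <math><mo>&#8594;</mo></math>

# The portable fragment parses on its own, with no DOCTYPE.
print(ET.fromstring(portable).find("mo").text == "\u2192")  # True
```

The cost is that the readable name is lost in the pasted copy, but the character it stood for is preserved exactly, whatever the target's DTD says.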

All of this preamble is just leading up to saying that Unicode 5.1 (beta) does have suitable characters for all the ISO and MathML entities.

The Entity draft has therefore been updated to a new "2007" version, which we (the W3C Math WG) hope to submit to W3C shortly as a new Recommendation-track document; in the meantime you can view my Editor's Draft here.

MathML3 will hopefully use these by reference. If (X)HTML (and possibly other non-W3C systems such as DocBook) could do the same, we would finally have a set of entity names with widespread, consistent use across multiple languages. Hopefully.
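If one name-to-character table were shared in this way, each language could generate its entity declarations from it, and because every definition is a numeric character reference, each name would mean the same character wherever it is used. A minimal Python sketch, assuming a plain dictionary as the table format (the draft's own file layout is not reproduced here); TABLE and entity_declarations are hypothetical names, and the two entries shown both map to U+2192 RIGHTWARDS ARROW.

```python
import xml.etree.ElementTree as ET

# A two-entry excerpt of a name-to-character table; both names map to
# U+2192 RIGHTWARDS ARROW. A real shared table would be much larger.
TABLE = {
    "rightarrow": 0x2192,
    "rarr": 0x2192,
}

def entity_declarations(table):
    """Emit one <!ENTITY> declaration per name, defined as a numeric reference."""
    return "\n".join(
        f'<!ENTITY {name} "&#x{cp:05X};">' for name, cp in sorted(table.items())
    )

decls = entity_declarations(TABLE)
print(decls)

# Check that the generated declarations work as an internal DTD subset.
doc = f"<!DOCTYPE p [\n{decls}\n]><p>&rarr;&rightarrow;</p>"
print(ET.fromstring(doc).text == "\u2192\u2192")  # True
```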

Over the years that I've been maintaining these sets, we've kept fairly regular contact with the STIX group, and the tables of characters in the above document include characters typeset with the STIX fonts (if you have them installed and they work in your browser; the plane 1 characters still fail for me in all browsers on Windows).

Comments are welcome, either in this blog, or better on

Tuesday, 27 November 2007

More STIX Experiments

Having posted a small test file to the STIX comment page, I noticed that the STIX site has a larger test file (choose "STIX Font Glyph Tables" from the menu options on the STIX site, which will take you to

This differs from my test (apart from being rather larger) in that each cell is individually assigned an appropriate font using separate CSS classes, rather than having a single CSS font list and relying on the browser to fall through to the next font in the list whenever a font does not have the required glyph. The results are different, but still a bit disappointing.
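The per-cell approach can be sketched by generating a small glyph table in which every cell names exactly one font family through its class. This is a minimal Python sketch, not the STIX site's actual generator: the codepoint-to-font assignments, the "f-" class-name convention and the helper names are all hypothetical.

```python
from html import escape

# Hypothetical assignment of test characters to STIX font families; a real
# table would derive this from the fonts' own character maps.
ASSIGNMENTS = {
    0x2A0C: "STIXGeneral",        # QUADRUPLE INTEGRAL OPERATOR
    0x1D400: "STIXGeneral-Bold",  # MATHEMATICAL BOLD CAPITAL A (plane 1)
}

def css_class(font):
    """Derive a CSS class name from a font family name (hypothetical convention)."""
    return "f-" + font.lower()

def stylesheet(fonts):
    """One rule per font, each naming exactly one family: no fallback list."""
    return "\n".join(f".{css_class(f)} {{ font-family: {f}; }}" for f in sorted(fonts))

def glyph_table(assignments):
    """One row per character; the glyph cell carries its font's class."""
    rows = [
        f'<tr><td>x{cp:x}</td><td class="{escape(css_class(font))}">&#x{cp:x};</td></tr>'
        for cp, font in sorted(assignments.items())
    ]
    return "<table>\n" + "\n".join(rows) + "\n</table>"

print(stylesheet(set(ASSIGNMENTS.values())))
print(glyph_table(ASSIGNMENTS))
```

The fallback approach of my own test instead puts every family in one font-family list and leaves the choice of font for each character to the browser.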

Three images of my Windows XP setup: one of the Fonts directory, just to show that the fonts are there; one of the top of the allGlyphs file shown side by side in Firefox, Opera 9.21 and IE 7.0.5730.11 (note that only FF actually shows the bold and italic variants); and one of the same three browsers showing the bottom of the file. Here FF is all white, Opera shows white space for some characters and two missing-glyph markers for others, and IE is mainly white but, strangely enough, does render the last few entries, which are monospace digits.

The images link to larger screen dumps which show (or rather do not show) the glyphs more clearly.

Monday, 19 November 2007

Wednesday, 7 November 2007

STIX fonts: Initial comments

After a wait of many years the STIX fonts have finally been released as a public beta! They offer the promise of much better, more portable support for scientific documents on the web and elsewhere, providing a uniform set of fonts with glyphs for almost all the mathematical characters that have been added to Unicode in recent years.

Initial testing suggests there may still be some problems with plane 1 characters. I just submitted the following comment to the STIX beta test comment form:

I tested browser support (Windows XP: Firefox, IE 7, Opera), with similar results in all three browsers. The quadruple integral displayed in all three. The plane 1 alphabetic character did not display at all: FF displays 2 missing-glyph boxes, IE shows 1 missing-glyph box, and Opera just shows white space.

I am not able to tell at this stage whether this is user error (perhaps I have not specified a sufficiently large set of the STIX fonts in the CSS?), whether it is due to a lack of plane 1 support in the browsers, or whether there is a problem with the Unicode tables in the STIX fonts, so I supply the small test file below, which I would expect to display a quadruple integral (U+2A0C) and a mathematical bold capital A (U+1D400).



<style type="text/css">
p {
font-family: STIXGeneral, STIXGeneral-Italic, STIXGeneral-Bold, STIXGeneral-BoldItalic;
}
</style>
<p>x2a0c &#x2a0c;</p>
<p>x1d400 &#x1d400;</p>
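One difference between the two test characters that may be relevant, offered as an assumption rather than a diagnosis: characters outside the Basic Multilingual Plane need two UTF-16 code units (a surrogate pair), and any software that handles text as UTF-16 must treat that pair as a single character. A short Python sketch showing the plane and UTF-16 code units of both test characters; the helper name describe is hypothetical.

```python
# The two test characters from the file above.
TEST_CHARS = [0x2A0C, 0x1D400]

def describe(cp):
    """Return (plane, UTF-16 code units) for a Unicode code point."""
    data = chr(cp).encode("utf-16-be")
    # Split the big-endian bytes into 16-bit code units.
    units = [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]
    return cp >> 16, units

for cp in TEST_CHARS:
    plane, units = describe(cp)
    print(f"U+{cp:04X} plane {plane}: " + " ".join(f"{u:04X}" for u in units))
# U+2A0C plane 0: 2A0C
# U+1D400 plane 1: D835 DC00
```

The quadruple integral is a single code unit, while the bold A is the surrogate pair D835 DC00.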