David,
David Waitzman writes:
> As I expressed in my first message, users don't normally look at raw
> XML contents:
This is an interesting argument, and one that I've seen used quite frequently (I've used it myself on occasion).
When applied to the general question of text-vs-binary protocols, the answer tends to be entirely subjective. Taken to its extreme, you end up with ASN.1+PER or similar solutions.
I favour the protocol being usable in the default case. Even if that sort of usage seems unlikely, it's been hugely beneficial in debugging thus far.
> Your suggestion in msg08717.html of: [...]
> >>> octetsAsString = *(vcharx / escaped)
> >>> vcharx = %x21-5b / %x5d-7e ; VCHAR minus backslash
> >>> escaped = %x5c 2HEXDIG
>
> doesn't handle the UTF-8 encoding cases well.
LDAP solves this by adding UTFMB, a pattern that allows for multi-byte UTF-8 sequences:
octetsAsString = *(vcharx / escaped / UTFMB)
In XML, the easiest solution is slightly different, and quite simple. As a pattern:
([^\\]|\\[0-9a-fA-F]{2})*
That is, any character other than backslash, or a backslash followed by two hex digits. To convert a raw token value to a sequence of octets, UTF-8 encode every character except backslash, then replace each backslash escape with the single octet it names. In reverse, decode the octets as UTF-8 and backslash-escape any octet that doesn't decode, plus any literal backslash.
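As a rough illustration, the conversion in both directions might be sketched in Python as follows. The function names token_to_octets and octets_to_token are made up for this example and don't come from any schema or library; the sketch also assumes that escapes are emitted as lowercase hex, which the pattern permits but doesn't require.

```python
import re

# Pattern from the message: any character except backslash, or a
# backslash followed by exactly two hex digits.
TOKEN_RE = re.compile(r'(?:[^\\]|\\[0-9a-fA-F]{2})*')
ESCAPE_RE = re.compile(rb'\\([0-9a-fA-F]{2})')


def token_to_octets(token: str) -> bytes:
    """Hypothetical helper: token value -> raw octets."""
    if TOKEN_RE.fullmatch(token) is None:
        raise ValueError('backslash must be followed by two hex digits')
    # UTF-8 never produces 0x5c inside a multi-byte sequence, so every
    # backslash octet left after encoding starts an escape.
    raw = token.encode('utf-8')
    return ESCAPE_RE.sub(lambda m: bytes([int(m.group(1), 16)]), raw)


def octets_to_token(octets: bytes) -> str:
    """Hypothetical helper: raw octets -> token value."""
    # surrogateescape maps each undecodable octet 0xNN to U+DCNN, which
    # lets us find those octets again after decoding.
    text = octets.decode('utf-8', errors='surrogateescape')
    out = []
    for ch in text:
        cp = ord(ch)
        if ch == '\\':
            out.append('\\5c')                      # literal backslash
        elif 0xDC80 <= cp <= 0xDCFF:
            out.append('\\%02x' % (cp - 0xDC00))    # undecodable octet
        else:
            out.append(ch)
    return ''.join(out)


if __name__ == '__main__':
    octets = token_to_octets('ManufacturerName')
    print(octets.hex())
    print(octets_to_token(b'ab\xff\\'))
```

This follows only the rule as stated. Octets that decode to characters XML can't carry (NUL, for instance), or to whitespace that a token would collapse, would presumably need escaping too; the sketch doesn't attempt that.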
The benefits are clear enough:
<ssid>ManufacturerName</ssid>
...as opposed to:
<ssid>4d616e7566616374757265724e616d65</ssid>
...with the occasional:
<ssid>Wêird\5cNàmé</ssid>
> > p.s. It's not immediately clear, but a token can include any string
> > content: <http://www.w3.org/TR/xml/#AVNormalize>. The text you cite is
> > misleading.
>
> I looked that up and don't get your point. In <xs:element name="ssid"
> type="wifi:ssidBaseType" minOccurs="0"/> we not dealing with an
> Attribute's value.
You have to follow the thread of definitions: token is defined to have whiteSpace set to "collapse". This definition references whiteSpace, which references the attribute value normalization section of the XML specification.
You can check this behaviour with a conformant processor. Try decoding:
<foo xsi:type="xs:token">  
  </foo>
You should get three spaces.
> One additional question for you: what's the difference between "name"
> and "ssid" in wifiType?
Good question. I honestly can't remember now, and I can't find a good reason for the "name" field. A "name" doesn't feature prominently in 802.11-2007, so it must be from somewhere else. I'll ask around and try to find out for you, but I suspect that it's redundant and safe to remove.
--Martin
_______________________________________________
Geopriv mailing list
Geopriv@ietf.org
https://www.ietf.org/mailman/listinfo/geopriv