> > <ssid>Wêird\5cNàmé</ssid>
>
> But there's a need to be defensive about illegal UTF-8 sequences,
> including 0 byte values.
> See rfc3629.txt section 6 and also
> http://etutorials.org/Programming/secure+programming/Chapter+3.+Input+Validation/3.12+Detecting+Illegal+UTF-8+Characters/
>
> I suggest a more defensive algorithm: anything not printable ASCII gets
> hex encoded. That won't handle Wêird\5cNàmé but it will handle
> ManufacturerName and \20\00\ff\fbUgly\00\00\00.
The joy of writing a specification is that you don't have to specify which algorithm is used. Either algorithm would only be an example: you can use the more conservative one, and I can choose to provide a more complicated one :)
Taking "Wêïrd\5cNàmé" you get [0x57, 0xC3, 0xAA, 0xC3, 0xAF, 0x72, 0x64, 0x5C, 0x4E, 0xC3, 0xA0, 0x6D, 0xC3, 0xA9]. You could just as easily go with "W\c3\aa\c3\afrd\5cN\c3\a0m\c3\a9", or even "\57\c3\aa\c3\af\72\64\5c\4e\c3\a0\6d\c3\a9". They all produce the same octet sequence. Certainly, the latter two forms are easier to write code for.
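For illustration, here is a rough Python sketch of the conservative encoder: every octet outside visible ASCII, plus the backslash itself, is written as a two-digit hex escape. The function name escape_ssid and the decision to escape the space character (as in the \20 example quoted above) are assumptions of this sketch, not part of the proposed text.

```python
def escape_ssid(octets: bytes) -> str:
    """Render SSID octets as a string, hex-escaping anything not visible ASCII.

    Hypothetical helper: the name and the exact "safe" range are assumptions.
    """
    if len(octets) > 32:
        raise ValueError("an SSID is at most 32 octets")
    parts = []
    for b in octets:
        # Visible ASCII (0x21-0x7E) passes through unchanged, except the
        # backslash (0x5C), which has to be escaped because it introduces
        # an escape sequence.
        if 0x21 <= b <= 0x7E and b != 0x5C:
            parts.append(chr(b))
        else:
            parts.append(f"\\{b:02x}")
    return "".join(parts)
```

Applied to the UTF-8 encoding of the name Wêïrd\Nàmé (with a literal backslash), this produces W\c3\aa\c3\afrd\5cN\c3\a0m\c3\a9, and plain names such as ManufacturerName come out unchanged.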
This is what I intend to add:
An SSID is a sequence of up to 32 octets. SSIDs are typically presented as strings, which are converted to an octet sequence using either ASCII or UTF-8 encoding. The "ssid" element is provided as a string that is converted to octets by UTF-8 encoding the string. Octet values can be expressed directly using a backslash ('\') followed by two hexadecimal digits.
The value of an SSID is the sequence of octets that
is produced from the concatenation of UTF-8 encoded sequences of
unescaped characters and octets derived from escaped components.
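
As a sketch of how a recipient might apply that rule (Python, with the hypothetical helper name ssid_octets): walk the string, turn each \xx escape into one octet, and UTF-8 encode every other character. Rejecting a stray backslash and rejecting results longer than 32 octets are assumptions of this sketch; the text above does not say how errors are handled.

```python
import re

# One token is either a backslash followed by two hex digits,
# or any single character other than a backslash.
_TOKEN = re.compile(r"\\([0-9a-fA-F]{2})|([^\\])")

def ssid_octets(text: str) -> bytes:
    """Convert the string form of an SSID into its octet sequence."""
    out = bytearray()
    pos = 0
    while pos < len(text):
        m = _TOKEN.match(text, pos)
        if m is None:
            # A backslash that is not followed by two hex digits.
            raise ValueError(f"malformed escape at offset {pos}")
        if m.group(1) is not None:
            out.append(int(m.group(1), 16))       # escaped octet
        else:
            out += m.group(2).encode("utf-8")     # unescaped character
        pos = m.end()
    if len(out) > 32:
        raise ValueError("SSID exceeds 32 octets")
    return bytes(out)
```

With this, a value written with only the backslash escaped, one with every non-ASCII octet escaped, and one with every octet escaped all decode to the same 14 octets.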
The XML will change to have a constraint:
<xs:pattern value="([^\\]|\\[\da-fA-F]{2}){1,32}"/>
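
A quick way to try the pattern outside a schema validator is the sketch below. It uses Python's re module as a stand-in for XML Schema regular expressions, which I believe behave the same for this particular expression; the name matches_ssid_pattern is hypothetical. Note that the {1,32} bound counts characters and escapes rather than octets, so a string of 32 two-octet characters passes the pattern even though it encodes to 64 octets; the octet limit would still need a separate check.

```python
import re

# Same expression as the xs:pattern facet. XML Schema patterns are
# implicitly anchored to the whole value, hence fullmatch() below.
SSID_PATTERN = re.compile(r"([^\\]|\\[\da-fA-F]{2}){1,32}")

def matches_ssid_pattern(text: str) -> bool:
    """Return True if text satisfies the lexical constraint on "ssid"."""
    return SSID_PATTERN.fullmatch(text) is not None
```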
--Martin