Tuesday, August 31, 2010

Re: [Geopriv] location obscuring

I'll take a stab at explaining the (current) known weaknesses in the algorithm that I have proposed. At this stage, I'm not going to propose countermeasures, but there are some possibilities.

==Proposed algorithm

When providing a new location, randomize the point to expand uncertainty to the required distance.

Only provide a new location when the centroid of the known location moves outside a trigger circle.

Trigger circle is defined at the same time as a new location, it has a radius of the required distance and its centre is up to half that distance from the centroid of the known location at that time.

==Weaknesses

(1)
Scenario:
The Target visits the same location multiple times over time.

Assumptions required by recipient:
The Target is visiting the same location.

Constraints on scenario:
Each time, location is only reported while at that location, not on the approach to, or leaving from the location. This requires that the Target be unable to located on approach to or exodus from the location. This constraint is met by having the means of location disabled while in transit - e.g. by turning off their phone.

Weakness:
Multiple samples can be combined over time to determine the location that is visited. For example, this might be the location of the Target's home.

(2)
Scenario:
The algorithm randomly produces two subsequent locations where the two centres of uncertainty are 2.5 times the distance (or approaching that number) apart.

Assumptions required by recipient:
The recipient must know the obscuring distance; a recipient might be able to assume this value by observing that previous reported locations have the same uncertainty.

Constraints on scenario:
The obscuring distance must have been revealed previously. This is possible if the uncertainty of the known location does not exceed the obscuring distance.

The location must be known with low (or very low) uncertainty at the time that each of the subsequent obscured locations are produced.

Random chance determines that this scenario is unlikely.

Weakness:
At the time that the location is produced, the recipient knows that there is a point within the new reported location that is 1.5 times the obscuring distance from a point within the last reported location.

Given that the distance between the two locations is such that there is only a small region within each location that meets this constraint, the location of the Target is known with uncertainty approaching that of the known location.

(3)
Scenario:
Movement.

Assumptions required by recipient:
Approximately straight line travel over the interval, or

Travel along an identified route. Even a relatively large obscuring distance might not prevent a recipient from identifying that the Target is travelling along a particular highway over time.

Constraints on scenario:
Constant or relatively constant travel speed.

Weakness:
Average speed of travel can be estimated. The longer the distance between two locations, the less uncertainty present in the estimate.

An upper or lower bound on speed might be determined. If, as in the previous scenario, two locations are produced with significant distance between them, it might be possible place a lower bound on speed over an interval. From this, it might be possible to identify that the Target is exceeding a speed limit.

These need not be consecutive reported locations.


There may be other flaws, but these are the ones that I've been able to identify so far.

--Martin
_______________________________________________
Geopriv mailing list
Geopriv@ietf.org
https://www.ietf.org/mailman/listinfo/geopriv

Re: [Geopriv] location obscuring

This sort of analysis is interesting, but I'm trying to imagine what sort of outcome you're hoping to achieve.

For 2, I'm not yet convinced of the conclusion that there is no perfect solution. The model is still too loosely defined to make that assessment. In all cases, there is some information revealed; but the question is to what extent that information can be considered significant.

Now, we could spend a lot of time on this sort of analysis, but that's not helping to address the immediate need for a solution.

From my perspective, and with the work as it is, I'd be more interested in a shooting gallery. Propose an algorithm. Describe what information is revealed and under what circumstances and maybe assumptions. Then we pick one and keep the shortcomings visible. We could pick more than one and leave it to the policy to dictate which algorithm is applied, but that is far from optimal for many reasons.

--Martin

> -----Original Message-----
> From: Jorge Cuellar [mailto:jrcuellarj@googlemail.com]
> Sent: Wednesday, 1 September 2010 6:25 AM
> To: Richard L. Barnes
> Cc: Thomson, Martin; geopriv@ietf.org; rp3@u.washington.edu;
> awclark@u.washington.edu; alomair@uw.edu
> Subject: Re: [Geopriv] location obscuring
>
> Hi Richard and all,
>
> instead of trying to present the whole idea in one piece,
> let me discuss the problem in shorter mails.
>
> Today:
> 1. The Problem to be solved
> 2. Why there is no perfect solution (and still, ones are better than
> others)
>
> ---
> 1. The Problem to be solved
>
> A user wants to obscure his location: even if the system
> knows his location with great preciseness, the user wants
> that his location is presented (to certain users or
> applications) as a region with a certain "pretended
> uncertainty" or "obscurity distance". Instead of saying
> "Peter is in this point", the algorithm should return
> "Peter is in this region".
>
> There are two different locations, which are some regions
> in space, the original location and the reported one. A
> condition we have to meet is that the reported location
> must contain the original location.
>
> We will discuss the case of the locations as being circles.
> Their radius is the uncertainty associated with them.
>
> - The original location can be precise (a circle of radius
> 0) or imprecise. Without loss of generality we may assume
> that the original location is precise. (If the requested
> obscurity is smaller than the original uncertainty, we
> must live with the original uncertainty).
>
> - The reported location (which is, if needed, obscured, by
> adding a "pretended uncertainty"). Without loss of
> generality we may assume that the reported location is
> imprecise and has a radius chosen by the user.
>
> Thus we will assume that:
>
> - the original location is a point "o" and
>
> - the reported location is a circle C(r,d) with center "r"
> and radius "d".
>
> Let us call:
>
> r= reported center
> o= original location (or original center)
> d= reported uncertainty, or "distance"
>
> The difference of the two centers is a vector:
>
> reported center - original center
>
> that we may call the "shift in reporting". Its length (or
> absolute value or norm) can be called "distance to reported
> center" and its direction (angle to some "x-axis") the
> "direction of the shift".
>
> What we want is an algorithm that given
> o= a precise original location and
> d= a distance
> provides a reported location, which is a circle of radius d
> around a point
> r= reported center
> in such a way that this circle contains the original
> location and "leaks as less information about the real
> location" as possible.
>
> Thus the algorithm has inputs o and d and produces r.
>
> An initial design would be:
> Algo1: choose randomly any point within the distance d from
> o and report the circle of the required radius d around this
> point. In the moment that the user leaves this region,
> choose another r.
>
> ---
> 2. Why there is no perfect solution (and still, ones are better than
> others)
>
> The algorithm Algo1, when the location changes, leaks a lot
> of information about the location of the user. Anyone
> observing the reported locations knows that the user has
> just left one circle in an arc contained by the new circle.
> Thus we know that the user is (or just was) in a "region"
> (the arc) of area 0. We say, the algorithm "leaks 0-area"
> locations.
>
> In principle, this kind of problem is unavoidable. One
> would like to have an algorithm that never discloses that a
> user was in an area of size 0 (that is, does not leak
> 0-area). One would also like not to disclose the velocity
> of the movement.
>
> It is impossible to obtain such an algorithm. One reason
> why there are no perfect algorithms is because there is
> public information about streets, planes, speed limits,
> etc.
>
> As to the velocity: suppose I want to hide my location
> within 10 miles, but I a moving at 80 mi per hour. If an
> observer sees my reported locations, he may, no matter how
> I choose the set of reported centers r1, r2, r3, ..., if he
> sees enough measurements, he can calculate quite accurately
> my (mean) velocity.
>
> Two examples are show that it is impossible not to leak
> "0-area" when continuously moving: if my location is
> reported to be Paris (+- 100 miles) at 10 am and New York
> at 6 pm of the same day (+- 100 miles), then you may
> conclude (with a very high probability, and that is what
> matters) that at some point of the day I was over the
> meridian of Greenwich, which has area 0. Or: if you find my
> location to be moving west at roughly 80 mi per hour in an
> area where there is only one highway (say, in a country
> with speed limit 80 mi per hour), then any algorithm
> discloses quite precisely my exact location with any
> precision after enough time.
>
> Of course, Algo1 is worse than, say:
>
> Algo2: choose randomly any point r within the distance d
> from o and report the circle of the required radius d
> around this point. Choose randomly a second radius, the
> "trigger radius", a value between 0 and d. In the moment
> that the user leaves the "trigger circle", C(r, d1), choose
> another r.
>
> This one is better because when the reported location of
> the user changes continuously, the algorithm does not leak
> 0-area by providing those two reported values. (But as
> mentioned before, an observer may notice than in a certain
> time span the user crossed a border or another region of
> area 0).
>
> This algorithm, as many others have a different problem
> (which, I think, is worse from the point of view of privacy
> than 0-area leakage): suppose I go once a week to a
> certain town, to visit somebody, say a friend or a business
> partner. (Or every night I go home, or everyday I go to
> work, etc..). I let my location to be know with an
> uncertainty of, say, 10 miles. So the algorithm chooses
> some location r within 10 mi and reports a circle around
> this r. But after 20 times, the 20 different circles it
> provided intersect in a small region.
>
> Thus observers obtain information about what the user
> regularly does. That is bad.
>
> This problem has all algorithms that create a shift in
> reporting (recall: the vector vector r - o) randomly every
> time. Of course, the shift in reporting can not be chosen
> in a deterministic way, even if the value depends on some
> secret of the user (those secrets will leak much to quickly
> fro many reasons. For instance, the exact location of the
> person can be observed at certain places, etc).
> ---
>
> ciao,
> Jorge
_______________________________________________
Geopriv mailing list
Geopriv@ietf.org
https://www.ietf.org/mailman/listinfo/geopriv

Re: [Geopriv] location obscuring

Hi Richard and all,

instead of trying to present the whole idea in one piece,
let me discuss the problem in shorter mails.

Today:
1. The Problem to be solved
2. Why there is no perfect solution (and still, ones are better than others)

---
1. The Problem to be solved

A user wants to obscure his location: even if the system
knows his location with great preciseness, the user wants
that his location is presented (to certain users or
applications) as a region with a certain "pretended
uncertainty" or "obscurity distance". Instead of saying
"Peter is in this point", the algorithm should return
"Peter is in this region".

There are two different locations, which are some regions
in space, the original location and the reported one. A
condition we have to meet is that the reported location
must contain the original location.

We will discuss the case of the locations as being circles.
Their radius is the uncertainty associated with them.

- The original location can be precise (a circle of radius
0) or imprecise. Without loss of generality we may assume
that the original location is precise. (If the requested
obscurity is smaller than the original uncertainty, we
must live with the original uncertainty).

- The reported location (which is, if needed, obscured, by
adding a "pretended uncertainty"). Without loss of
generality we may assume that the reported location is
imprecise and has a radius chosen by the user.

Thus we will assume that:

- the original location is a point "o" and

- the reported location is a circle C(r,d) with center "r"
and radius "d".

Let us call:

r= reported center
o= original location (or original center)
d= reported uncertainty, or "distance"

The difference of the two centers is a vector:

reported center - original center

that we may call the "shift in reporting". Its length (or
absolute value or norm) can be called "distance to reported
center" and its direction (angle to some "x-axis") the
"direction of the shift".

What we want is an algorithm that given
o= a precise original location and
d= a distance
provides a reported location, which is a circle of radius d
around a point
r= reported center
in such a way that this circle contains the original
location and "leaks as less information about the real
location" as possible.

Thus the algorithm has inputs o and d and produces r.

An initial design would be:
Algo1: choose randomly any point within the distance d from
o and report the circle of the required radius d around this
point. In the moment that the user leaves this region,
choose another r.

---
2. Why there is no perfect solution (and still, ones are better than others)

The algorithm Algo1, when the location changes, leaks a lot
of information about the location of the user. Anyone
observing the reported locations knows that the user has
just left one circle in an arc contained by the new circle.
Thus we know that the user is (or just was) in a "region"
(the arc) of area 0. We say, the algorithm "leaks 0-area"
locations.

In principle, this kind of problem is unavoidable. One
would like to have an algorithm that never discloses that a
user was in an area of size 0 (that is, does not leak
0-area). One would also like not to disclose the velocity
of the movement.

It is impossible to obtain such an algorithm. One reason
why there are no perfect algorithms is because there is
public information about streets, planes, speed limits,
etc.

As to the velocity: suppose I want to hide my location
within 10 miles, but I a moving at 80 mi per hour. If an
observer sees my reported locations, he may, no matter how
I choose the set of reported centers r1, r2, r3, ..., if he
sees enough measurements, he can calculate quite accurately
my (mean) velocity.

Two examples are show that it is impossible not to leak
"0-area" when continuously moving: if my location is
reported to be Paris (+- 100 miles) at 10 am and New York
at 6 pm of the same day (+- 100 miles), then you may
conclude (with a very high probability, and that is what
matters) that at some point of the day I was over the
meridian of Greenwich, which has area 0. Or: if you find my
location to be moving west at roughly 80 mi per hour in an
area where there is only one highway (say, in a country
with speed limit 80 mi per hour), then any algorithm
discloses quite precisely my exact location with any
precision after enough time.

Of course, Algo1 is worse than, say:

Algo2: choose randomly any point r within the distance d
from o and report the circle of the required radius d
around this point. Choose randomly a second radius, the
"trigger radius", a value between 0 and d. In the moment
that the user leaves the "trigger circle", C(r, d1), choose
another r.

This one is better because when the reported location of
the user changes continuously, the algorithm does not leak
0-area by providing those two reported values. (But as
mentioned before, an observer may notice than in a certain
time span the user crossed a border or another region of
area 0).

This algorithm, as many others have a different problem
(which, I think, is worse from the point of view of privacy
than 0-area leakage): suppose I go once a week to a
certain town, to visit somebody, say a friend or a business
partner. (Or every night I go home, or everyday I go to
work, etc..). I let my location to be know with an
uncertainty of, say, 10 miles. So the algorithm chooses
some location r within 10 mi and reports a circle around
this r. But after 20 times, the 20 different circles it
provided intersect in a small region.

Thus observers obtain information about what the user
regularly does. That is bad.

This problem has all algorithms that create a shift in
reporting (recall: the vector vector r - o) randomly every
time. Of course, the shift in reporting can not be chosen
in a deterministic way, even if the value depends on some
secret of the user (those secrets will leak much to quickly
fro many reasons. For instance, the exact location of the
person can be observed at certain places, etc).
---

ciao,
Jorge
_______________________________________________
Geopriv mailing list
Geopriv@ietf.org
https://www.ietf.org/mailman/listinfo/geopriv

[Geopriv] IPR Disclosure: Qualcomm Incorporated's Statement about IPR related to draft-ietf-geopriv-arch-02

Dear John Morris, Hannes Tschofenig, Richard Barnes, Henning Schulzrinne, Matt Lepinski, Alissa Cooper:

An IPR disclosure that pertains to your Internet-Draft entitled "An Architecture
for Location and Location Privacy in Internet Applications"
(draft-ietf-geopriv-arch) was submitted to the IETF Secretariat on 2010-08-30
and has been posted on the "IETF Page of Intellectual Property Rights
Disclosures" (https://datatracker.ietf.org/ipr/1394/). The title of the IPR
disclosure is "Qualcomm Incorporated's Statement about IPR related to
draft-ietf-geopriv-arch-02."

The IETF Secretariat


_______________________________________________
Geopriv mailing list
Geopriv@ietf.org
https://www.ietf.org/mailman/listinfo/geopriv

Saturday, August 28, 2010

Re: [Geopriv] location obscuring

Hi Richard,

the problem with the algorithm that Martin proposes is the following:
suppose I go once a week to a certain town, say Berlin to visit
somebody. I let my location to be know with an uncertainty of 5 kms.
So the algorithm chooses some location within 5 kms (or 10) and says
:"I am within a circle of 5 km (or 10)". But after 10 times, the 10
different circles it provided intersect in a small region. (Or every
night I go home, or everyday I go to work, etc..)

In my algorithm I am not returning the hexagons. This part of the
algorithm is only to determine the center (the "landmark"), but the
region provided as location is much larger than the hexagon. So the
regions intersect (and their intersect is not too small). I hoped this
was clear in my first mail, but probably it was not. I will have time
again on Monday to discuss that in more detail.

cheers, Jorge

On Fri, Aug 27, 2010 at 7:09 PM, Richard L. Barnes <rbarnes@bbn.com> wrote:
> Hey Jorge,
>
> I enjoyed your ASCII art, and this is an interesting idea, but seems to have
> a lot more complexity?  Could you clarify what you think the benefit of this
> system is over the one Martin had proposed?  I actually think it's quite a
> bit worse, in that it misses the point of Martin's algorithm.
>
> Here's my thinking:
>
> 1. There's not really any point to choosing the area returned
> probabilistically, since you can get both areas with repeated queries.  You
> might as well return the union of the two hexagons.
>
> 2. There's actually no point to returning the second hexagon either, since a
> moving target will reveal which hexagon he was in.  For example, zooming in
> on the a piece of the lattice, suppose an entity X moves as indicated:
>
>  \  A  /
>   \   /
>    \ /
>     +
>  X---->X
> B    |    C
>     |
>     +
>    / \
>   /   \
>  /  D  \
>
> At first, the algorithm will return A+B, and afterward, it will return A+C,
> so an observer can deduce that the target has moved from B to C, and can
> discard A as a possible location.
>
> 3. Even worse, based on the timing of the change, the observer can tell that
> the target is somewhere near the boundary (H below), obviously an area much
> smaller than A, B, C, or D, much less A+B or A+C.
>
>  \  A  /
>   \   /
>    \ /
>     +
>     HX
> B    H    C
>     H
>     +
>    / \
>   /   \
>  /  D  \
>
> This is exactly the problem that Martin's algorithm solves with the "hidden
> trigger" concept.  That feature of the algorithm prevents an observer from
> isolating the target to a boundary, so the problem (3) above doesn't arise.
>
> --Richard
>
>
> On Aug 27, 2010, at 12:08 PM, Jorge Cuellar wrote:
>
>> There is a simpler version of (a slight modification) of
>> the algorithm I proposed with the grid.
>>
>> The idea again is to have a set of "landmarks" that are the
>> "centers" of regions used as "possible reported locations".
>>
>> Consider the area you want to tile to be a flat surface. (A
>> tessellation or tiling of a surface is a collection of
>> figures that fills the original one with no overlaps or
>> gaps). We are interested on tiling with approximately
>> regular triangles of approximately the same size. The flat
>> case this is easy: tile the plane in the standard way with
>> regular (equilateral) triangles of the same size. In this way
>> you get, with a suitable choice of coordinates, the grid
>> points:
>>
>> (e x n, d x m) (n, and m range over the integers,
>>                with n+m = even
>>                e and d are numbers with
>>                   d/e =((square root of three) /2),
>>                where "x" represents multiplication and
>>                 "/" division).
>>
>> An example is given in the figure below, where the grid
>> points are shown as asterisks "*". Thus, the points have
>> coordinates (starting, say, at the bottom left with (0,0):
>>
>>  (0,0),  (0,2d),  (0,4d),  ...
>>  (e,d),  (e,3d),  (e,5d),  ...
>>  (2e,0), (2e,2d), (2e,4d), ...
>>  (3e,d), (3e,3d), (3e,5d), ...
>>
>> Those points G ("*") are the "landmarks".
>>
>>         *                   *                   *
>>
>>
>>
>>
>>
>>
>>
>> *                   *                   *                   *
>>
>>
>>
>>
>>
>>
>>
>>         *                   *                   *
>>
>>
>>
>>
>>
>>
>>
>> *                   *                   *                   *
>>
>> Insert now the midpoints of the sides of the triangle. They
>> are(shown in the figure below as dots, ".".  Each of those
>> points is equidistant to exactly two landmarks.
>>
>>
>> .         *         .         *         .         *         .
>>
>>
>>
>>    .         .         .         .         .         .
>>
>>
>>
>> *         .         *         .         *         .         *
>>
>>
>>
>>    .         .         .         .         .         .
>>
>>
>>
>> .         *         .         *         .         *         .
>>
>>
>>
>>    .         .         .         .         .         .
>>
>>
>>
>> *         .         *         .         *         .         *
>>
>> Let us call S the set of all those points "+" or "*". Now
>> consider, for each point in S, the set of points that are
>> closest to a point x in S than to any other point of S.
>> (This is the Voronoi diagram of S).  This divides the region
>> into hexagons as shown in the figure below.  Each oint in
>> the interior of a hexagon is closet to the center of the
>> hexagon than to any other point "+" or "*".
>>
>> There are two types of hexagons: ones whose center is a
>> landmark ("*") and ones whose center "." is equidistant to
>> two landmarks. The points in the first type of hexagons are
>> said to be "clearly closer to a landmark" (the center of the
>> hexagon), while the points in the other type of hexagon are
>> said to be "clearly between two landmarks".
>>
>> .    |    *    |    .    |    *    |    .    |    *    |    .
>>    +         +         +         +         +         +
>>  /    \    /    \    /    \    /    \    /    \    /    \
>> +         +         +         +         +         +         +
>> |    .    |    .    |    .    |    .    |    .    |    .    |
>> +         +         +         +         +         +         +
>>  \    /    \    /    \    /    \    /    \    /    \    /
>>    +         +         +         +         +         +
>> *    |    .    |    *    |    .    |    *    |    .    |    *
>>    +         +         +         +         +         +
>>  /    \    /    \    /    \    /    \    /    \    /    \
>> +         +         +         +         +         +         +
>> |    .    |    .    |    .    |    .    |    .    |    .    |
>> +         +         +         +         +         +         +
>>  \    /    \    /    \    /    \    /    \    /    \    /
>>    +         +         +         +         +         +
>> .    |    *    |    .    |    *    |    .    |    *    |    .
>>    +         +         +         +         +         +
>>  /    \    /    \    /    \    /    \    /    \    /    \
>> +         +         +         +         +         +         +
>> |    .    |    .    |    .    |    .    |    .    |    .    |
>> +         +         +         +         +         +         +
>>  \    /    \    /    \    /    \    /    \    /    \    /
>>    +         +         +         +         +         +
>> *    |    .    |    *    |    .    |    *    |    .    |    *
>>
>> Now, given a point in the plane choose either the landmark
>> clearly closest to it or one of the two closest each with
>> probability 1/2.
>>
>> There is a variant of this for a square grid (but having up
>> to 4 "closest landmarks"). It is relatively straight-forward
>> to give the formulas for the landmarks associated with a
>> point in the plane. I will expand on this on the following
>> days.
>>
>> Ciao, Jorge
>>
>> On Thu, Aug 26, 2010 at 1:49 PM, Jorge Cuellar
>> <jrcuellarj@googlemail.com> wrote:
>>>
>>> Fine! I still have two questions:
>>>
>>> 1. Is it safe to assume that the algorithm that generates
>>> the reported locations has a memory? It seems that the
>>> algorithm checks somehow, (it doesn't matter how) if we are
>>> still fine with the last reported location we still report
>>> it, and if not, we report a new location. It would be good
>>> if we could rely on this: you will be giving much more
>>> information away if you did not remember what your last
>>> reported location was.
>>>
>>> 2. Good random number generation is not easy. Shall we say
>>> something about it? (Most hashes are fine: but hashes of
>>> what?)
>>>
>>> -----------------------------------------------------------
>>> I would prefer an algorithm that tells that I am in Berlin
>>> (say a circle with center in the center of Berlin) than one
>>> that gives each time a different circle with a center
>>> randomly distributed around my real location. The averages
>>> (plus some clustering algorithms) of those values would
>>> provide a high precision on the places I visit in Berlin.
>>>
>>>> The problem is that the (real) location can vary
>>>> slightly, or you might move a little.   If it does vary
>>>> slightly, even if this is well below the obfuscation
>>>> distance, then the random number is completely altered.
>>>> I haven't found a good way around this particular
>>>> problem.  Anything you do to stabilize the
>>>> direction/length of the shift vector only leads to the
>>>> ability to learn what the shift vector is.
>>>
>>> Thus I need one algorithm that chooses out of my location a
>>> "landmark" that is a good approximation of my location (and
>>> of all locations in a neighborhood).
>>>
>>> The idea is: if I am "clearly closer" to a landmark: choose it.
>>> If not, choose the two closest landmarks. Then I am "clearly
>>> between those two landmarks".  I will choose anyone
>>> either memoryless with prob 1/2 or with a bias toward my
>>> previously reported location.
>>>
>>> Consider the following algorithm sketch:
>>>
>>> Divide the problem in two steps:
>>>
>>> 1. What are the locations I will be willing to have a
>>> possible reported locations?  The centers of those possible
>>> reported locations are called "landmarks".
>>>
>>> 2. Given a real measurement, how to choose a landmark and
>>> therefore a reported location?
>>>
>>> 1st step:
>>>
>>> Given (input) a reported uncertainty that one wants to
>>> meet, first associate to them a grid G which is a set of
>>> points ("landmarks" or "possible reported centers") and a
>>> set A (atlas) of regions (say, circles, the "possible
>>> reported locations"). Notice that the associations:
>>>
>>>  uncertainty --> G = grid = Set of "landmarks" or
>>>                           "possible reported locations"
>>>
>>>  uncertainty --> A = atlas = Set of "possible reported
>>>                            locations"
>>>
>>> are independent of any measured location. It is possible to
>>> publish, for given value ranges of uncertainty, the list G
>>> of landmarks (or, even better, the list of "blocks", see
>>> below).
>>>
>>> An atlas is a set of regions that cover all possible places
>>> (locations with uncertainty 0). I would restrict the
>>> possible set of atlases by saying that the have to have a
>>> certain degree of coverage (or degree of overlap). Although
>>> not necessarily every point has to be in several regions of
>>> the atlas, the overlaps between neighboring regions has to
>>> be relatively large. This will be discussed later in more
>>> detail.
>>>
>>> This can be done in many ways. For instance:
>>>
>>> a. the "possible reported centers" could be just the points
>>> in a grid in some geographic coordinate system (say,
>>> latitude and longitude) with a given degree of precision
>>> (significant digits). For instance: represent the latitude
>>> and longitude of a point as a number between 0 and 1 in
>>> base two (more convenient than base 10). Consider first all
>>> points with 1 decimal, than all with two decimal places,
>>> than all with three, etc. For each point considered (this
>>> is one "possible reported center") create the corresponding
>>> "possible reported locations" (which can be, say, a
>>> circle). You continue adding points until you get the degree
>>> of coverage wanted.
>>>
>>> b. You can choose the grid points to be the centers of
>>> "large" cities, towns, or other important landmarks in
>>> maps. Again, we call those points in the maps "landmarks".
>>> Assume you have an ordering of landmarks by importance
>>> (say: size of the city in habitants). Given a certain
>>> targeted uncertainty, create G and A as follows: add a
>>> landmark as a "possible reported center" and the
>>> corresponding "possible reported locations" as long as you
>>> need it to cover the area in question with the expected
>>> degree of coverage.
>>>
>>> For many applications I myself would choose in Germany the
>>> reported uncertainty to be, say, 100 or 300 km and the
>>> "possible reported centers" to be the center of some "large
>>> cities".
>>>
>>> 2nd step:
>>>
>>> Now: given a measured location, a grid G and an atlas A,
>>> the question is how to choose the region R in the atlas
>>> that will be used as reported location.
>>>
>>> You can do it like this:
>>> if the measured location is in only one region R, report
>>> this as the location.
>>>
>>> Else, determine c1 and c2 the two closest landmarks, i.e,
>>> "possible reported centers": at measured distances d1 and
>>> d2. If d1 is "clearly smaller" than d2, choose c1, and
>>> report the region centered at c1. "Clearly smaller" means
>>> that the quotient d1/d2 is less than 0.8 (or other some
>>> positive number, smaller than 1, to be chosen carefully a
>>> priori). Similarly for d2/d1<1.3. If d1 and d2 are "about
>>> the same", choose c1 with probability 1/2, c2 with
>>> probability 1/2. In this case we say that the point is
>>> "clearly between the two landmarks c1 and c2.
>>>
>>> The idea is that each point is either "close to one
>>> landmark" or "between two landmarks". This partitions the
>>> whole space into connected (even: convex) "blocks" (let us
>>> call them so). Thus all points in one block are all clearly
>>> closer to the same landmark or clearly between two given
>>> landmarks.
>>>
>>> What do you think?
>>> -- Jorge
>>>
>>> On Tue, Aug 24, 2010 at 7:38 AM, Thomson, Martin
>>> <Martin.Thomson@andrew.com> wrote:
>>>>
>>>> Hi Jorge,
>>>>
>>>> My email was a little loose in its use of language, so I can understand
>>>> how you might have trouble understanding this.
>>>>
>>>> More inline...
>>>>
>>>>> -----Original Message-----
>>>>> From: Jorge Cuellar [mailto:jrcuellarj@googlemail.com]
>>>>> Sent: Tuesday, 24 August 2010 1:14 AM
>>>>> To: Thomson, Martin; rjsparks@nostrum.com; Hannes.Tschofenig@gmx.net;
>>>>> geopriv@ietf.org
>>>>> Subject: Re: [Geopriv] location obscuring
>>>>>
>>>>> Hi Martin, Hannes, Robert,
>>>>>
>>>>> I would like to understand you discussions on "location
>>>>> obscuring".  First, let me ask you a couple of things :
>>>>>
>>>>> 1. What is your vocabulary / notation?? Assume there is no
>>>>> uncertainity, first. You have one "real location" (?) and two
>>>>> circles, a "trigger circle" and a "reported circle" (obfuscated
>>>>> circle? obfuscated location?). The centres of those two
>>>>> circles ("trigger centre / reported centre" ??) are, I guess,
>>>>> different (if not, I see some problems).
>>>>
>>>> There are three locations:
>>>>
>>>>  - the original location ("real" might be a little too strong here)
>>>>  - the reported location
>>>>  - the trigger location
>>>>
>>>> The first two probably have uncertainty associated with them (imagine a
>>>> circle).  The trigger location doesn't have uncertainty in the same way, but
>>>> can also be described with a corresponding circle enclosing it.  Trigger and
>>>> reported are different (though there is a slim chance that they are the
>>>> same...).
>>>>
>>>>> The vectors:
>>>>>
>>>>>  trigger centre  - real location   and
>>>>>  reported centre - real location
>>>>>
>>>>> let us call them "shift in reporting" / "shift for trigger" have
>>>>> lenght (or absolute value or norm) "distance to trigger centre"
>>>>> / "distance to reported centre". The direction (angle to
>>>>> some "x-axis" of those vectors are the "Direction of the shift
>>>>> in reporting" and "Direction of the shift for trigger".
>>>>
>>>> That's a good description.
>>>>
>>>>>
>>>>> I read the formula:
>>>>>
>>>>>  Direction = 2 * pi * random
>>>>>
>>>>> as an angle (random is a random number on [0,1]). And
>>>>>
>>>>>  direction * distance
>>>>>
>>>>> as a vector (in radians, of lenght distance at an angle direction).
>>>>
>>>> Correct; I was using shorthand - to the detriment of clarity :(
>>>>
>>>>> 2. Have you thought about the following problem? Suppose each day
>>>>> I commute from home to work. They are sufficiently apart so that
>>>>> the 2 "reported circles" have to be disjoint. Suppose I turn off
>>>>> my device during the trip, but torn it on as soon as I changed my
>>>>> location. All algorithms you have been discussing, as far as I
>>>>> see, provide every evening, when I come back home, a
>>>>> new "reported circle" centred around home plus a vector of fixed
>>>>> lenght with random direection. True?? Thus if anyone sees my
>>>>> provided locations all evenings he will know rather fast where
>>>>> exacly I live (by averaging the random shifts).
>>>>
>>>> That's one that didn't factor into this, though perhaps it should have.
>>>>  Though this relies on the assumption that each night you are going to the
>>>> same place, that's a pretty easy conclusion to come to.
>>>>
>>>> One of the original ideas was to seed a pseudorandom number with the
>>>> (real) location.  A hash function would be a good substitute.  This would
>>>> produce the same shift vector for the same location.
>>>>
>>>> The problem is that the (real) location can vary slightly, or you might
>>>> move a little.   If it does vary slightly, even if this is well below the
>>>> obfuscation distance, then the random number is completely altered.  I
>>>> haven't found a good way around this particular problem.  Anything you do to
>>>> stabilize the direction/length of the shift vector only leads to the ability
>>>> to learn what the shift vector is.
>>>>
>>>>> 3. In the formula:
>>>>>
>>>>>  Distance = sqrt(random(0, max(0, distance - existing uncertainty)))
>>>>>
>>>>> you have twice distance (one with capitals): is that intended?
>>>>> Later you use only the non-capitalized one (?)
>>>>
>>>> That's a mistake, the second is the obfuscation distance, the first is
>>>> the resulting shift.
>>>>
>>>>> 4. I did not understand this:
>>>>> This is the same as below, except: (exept what??? and notation below?)
>>>>>
>>>>>    reallyFuzz: function(cu) {
>>>>>        var wobble = Math.max(0, this.distance - cu.uncertainty);
>>>>>        var centre = this.randomize(cu.centroid, wobble);
>>>>>        var unc = Math.max(cu.uncertainty, this.distance);
>>>>>        this.fuzzed = new GeoShape.Circle(centre, unc);
>>>>>        this.triggerPoint = this.randomize(centre, this.distance/2);
>>>>>    }
>>>>
>>>> This is a replacement for the original Javascript code I used to
>>>> describe the algorithm.  The complete (original) is here:
>>>>
>>>>  <http://www.ietf.org/mail-archive/web/geopriv/current/msg08630.html>
>>>
>> _______________________________________________
>> Geopriv mailing list
>> Geopriv@ietf.org
>> https://www.ietf.org/mailman/listinfo/geopriv
>
>
_______________________________________________
Geopriv mailing list
Geopriv@ietf.org
https://www.ietf.org/mailman/listinfo/geopriv

Friday, August 27, 2010

[Geopriv] Fwd: [secdir] [new-work] Proposed W3C Charter: Points of Interest Working Group (until 2010-09-24)

This may be of interest to some folks.
--Richard

Begin forwarded message:

From: Ian Jacobs <ij@w3.org>
Date: August 26, 2010 2:35:09 PM EDT
Subject: [secdir] [new-work] Proposed W3C Charter: Points of Interest Working Group (until 2010-09-24)

Hello,

Today W3C Advisory Committee Representatives received a Proposal to revise the Ubiquitous Web Applications Activity [0] (see the W3C Process Document description of Activity Proposals [1]). This proposal includes a draft charter for the Points of Interest Working Group:
 http://www.w3.org/2010/POI/charter/

As part of ensuring that the community is aware of proposed work at W3C, this draft charter is public during the Advisory Committee review period.

W3C invites public comments through 2010-09-24 on the proposed charter. Please send comments to public-new-work@w3.org, which has a public archive:
 http://lists.w3.org/Archives/Public/public-new-work/

Other than comments sent in formal responses by W3C Advisory Committee Representatives, W3C cannot guarantee a response to comments. If you work for a W3C Member [2], please coordinate your comments with your Advisory Committee Representative. For example, you may wish to make public comments via this list and
have your Advisory Committee Representative refer to it from his or her formal review comments.

If you should have any questions or need further information, please contact Matt Womer, Team Contact <mdw@w3.org>.

Thank you,

Ian Jacobs, Head of W3C Communications

[0] http://www.w3.org/2007/uwa/Activity
[1]
http://www.w3.org/2005/10/Process-20051014/activities#ActivityCreation
[2] http://www.w3.org/Consortium/Member/List



--
Ian Jacobs (ij@w3.org)    http://www.w3.org/People/Jacobs/
Tel:                                      +1 718 260 9447

_______________________________________________
new-work mailing list
new-work@ietf.org
https://www.ietf.org/mailman/listinfo/new-work
_______________________________________________
secdir mailing list
secdir@ietf.org
https://www.ietf.org/mailman/listinfo/secdir

Re: [Geopriv] location obscuring

> 1. Is it safe to assume that the algorithm that generates
> the reported locations has a memory? It seems that the
> algorithm checks somehow, (it doesn't matter how) if we are
> still fine with the last reported location we still report
> it, and if not, we report a new location. It would be good
> if we could rely on this: you will be giving much more
> information away if you did not remember what your last
> reported location was.

This doesn't seem like a bad assumption to me. If the sever is
already storing policy information for a target, caching a location
value or a circe is only a very few more bytes.


> 2. Good random number generation is not easy. Shall we say
> something about it? (Most hashes are fine: but hashes of
> what?)

Good random number generation is actually relatively easy with modern
crypto libraries. I don't think this should be a concern.


> -----------------------------------------------------------
> I would prefer an algorithm that tells that I am in Berlin
> (say a circle with center in the center of Berlin) than one
> that gives each time a different circle with a center
> randomly distributed around my real location. The averages
> (plus some clustering algorithms) of those values would
> provide a high precision on the places I visit in Berlin.

You can do that with Martin's algorithm, since any location value that
contains the circle computed with his algorithm can be returned. The
location server is free to choose, e.g., circles centered on
landmarks, as long as these circles are big enough to contain the
rough location computed according to the algorithm.


>> The problem is that the (real) location can vary
>> slightly, or you might move a little. If it does vary
>> slightly, even if this is well below the obfuscation
>> distance, then the random number is completely altered.
>> I haven't found a good way around this particular
>> problem. Anything you do to stabilize the
>> direction/length of the shift vector only leads to the
>> ability to learn what the shift vector is.
>
> Thus I need one algorithm that chooses out of my location a
> "landmark" that is a good approximation of my location (and
> of all locations in a neighborhood).
>

Given that the location server can make use of landmark-based location
values, as I describe above, the choice of landmarks seems like an
implementation detail. As long as the choice meets the requirement
that the rough location be covered, it won't have any privacy impact,
so we don't need to specify it.

--Richard
_______________________________________________
Geopriv mailing list
Geopriv@ietf.org
https://www.ietf.org/mailman/listinfo/geopriv

Re: [Geopriv] location obscuring

Hey Jorge,

I enjoyed your ASCII art, and this is an interesting idea, but seems
to have a lot more complexity? Could you clarify what you think the
benefit of this system is over the one Martin had proposed? I
actually think it's quite a bit worse, in that it misses the point of
Martin's algorithm.

Here's my thinking:

1. There's not really any point to choosing the area returned
probabilistically, since you can get both areas with repeated
queries. You might as well return the union of the two hexagons.

2. There's actually no point to returning the second hexagon either,
since a moving target will reveal which hexagon he was in. For
example, zooming in on the a piece of the lattice, suppose an entity X
moves as indicated:

\ A /
\ /
\ /
+
X---->X
B | C
|
+
/ \
/ \
/ D \

At first, the algorithm will return A+B, and afterward, it will return
A+C, so an observer can deduce that the target has moved from B to C,
and can discard A as a possible location.

3. Even worse, based on the timing of the change, the observer can
tell that the target is somewhere near the boundary (H below),
obviously an area much smaller than A, B, C, or D, much less A+B or A+C.

\ A /
\ /
\ /
+
HX
B H C
H
+
/ \
/ \
/ D \

This is exactly the problem that Martin's algorithm solves with the
"hidden trigger" concept. That feature of the algorithm prevents an
observer from isolating the target to a boundary, so the problem (3)
above doesn't arise.

--Richard


On Aug 27, 2010, at 12:08 PM, Jorge Cuellar wrote:

> There is a simpler version of (a slight modification) of
> the algorithm I proposed with the grid.
>
> The idea again is to have a set of "landmarks" that are the
> "centers" of regions used as "possible reported locations".
>
> Consider the area you want to tile to be a flat surface. (A
> tessellation or tiling of a surface is a collection of
> figures that fills the original one with no overlaps or
> gaps). We are interested on tiling with approximately
> regular triangles of approximately the same size. The flat
> case this is easy: tile the plane in the standard way with
> regular (equilateral) triangles of the same size. In this way
> you get, with a suitable choice of coordinates, the grid
> points:
>
> (e x n, d x m) (n, and m range over the integers,
> with n+m = even
> e and d are numbers with
> d/e =((square root of three) /2),
> where "x" represents multiplication and
> "/" division).
>
> An example is given in the figure below, where the grid
> points are shown as asterisks "*". Thus, the points have
> coordinates (starting, say, at the bottom left with (0,0):
>
> (0,0), (0,2d), (0,4d), ...
> (e,d), (e,3d), (e,5d), ...
> (2e,0), (2e,2d), (2e,4d), ...
> (3e,d), (3e,3d), (3e,5d), ...
>
> Those points G ("*") are the "landmarks".
>
> * * *
>
>
>
>
>
>
>
> * * * *
>
>
>
>
>
>
>
> * * *
>
>
>
>
>
>
>
> * * * *
>
> Insert now the midpoints of the sides of the triangle. They
> are(shown in the figure below as dots, ".". Each of those
> points is equidistant to exactly two landmarks.
>
>
> . * . * . * .
>
>
>
> . . . . . .
>
>
>
> * . * . * . *
>
>
>
> . . . . . .
>
>
>
> . * . * . * .
>
>
>
> . . . . . .
>
>
>
> * . * . * . *
>
> Let us call S the set of all those points "+" or "*". Now
> consider, for each point in S, the set of points that are
> closest to a point x in S than to any other point of S.
> (This is the Voronoi diagram of S). This divides the region
> into hexagons as shown in the figure below. Each oint in
> the interior of a hexagon is closet to the center of the
> hexagon than to any other point "+" or "*".
>
> There are two types of hexagons: ones whose center is a
> landmark ("*") and ones whose center "." is equidistant to
> two landmarks. The points in the first type of hexagons are
> said to be "clearly closer to a landmark" (the center of the
> hexagon), while the points in the other type of hexagon are
> said to be "clearly between two landmarks".
>
> . | * | . | * | . | * | .
> + + + + + +
> / \ / \ / \ / \ / \ / \
> + + + + + + +
> | . | . | . | . | . | . |
> + + + + + + +
> \ / \ / \ / \ / \ / \ /
> + + + + + +
> * | . | * | . | * | . | *
> + + + + + +
> / \ / \ / \ / \ / \ / \
> + + + + + + +
> | . | . | . | . | . | . |
> + + + + + + +
> \ / \ / \ / \ / \ / \ /
> + + + + + +
> . | * | . | * | . | * | .
> + + + + + +
> / \ / \ / \ / \ / \ / \
> + + + + + + +
> | . | . | . | . | . | . |
> + + + + + + +
> \ / \ / \ / \ / \ / \ /
> + + + + + +
> * | . | * | . | * | . | *
>
> Now, given a point in the plane choose either the landmark
> clearly closest to it or one of the two closest each with
> probability 1/2.
>
> There is a variant of this for a square grid (but having up
> to 4 "closest landmarks"). It is relatively straight-forward
> to give the formulas for the landmarks associated with a
> point in the plane. I will expand on this on the following
> days.
>
> Ciao, Jorge
>
> On Thu, Aug 26, 2010 at 1:49 PM, Jorge Cuellar
> <jrcuellarj@googlemail.com> wrote:
>> Fine! I still have two questions:
>>
>> 1. Is it safe to assume that the algorithm that generates
>> the reported locations has a memory? It seems that the
>> algorithm checks somehow, (it doesn't matter how) if we are
>> still fine with the last reported location we still report
>> it, and if not, we report a new location. It would be good
>> if we could rely on this: you will be giving much more
>> information away if you did not remember what your last
>> reported location was.
>>
>> 2. Good random number generation is not easy. Shall we say
>> something about it? (Most hashes are fine: but hashes of
>> what?)
>>
>> -----------------------------------------------------------
>> I would prefer an algorithm that tells that I am in Berlin
>> (say a circle with center in the center of Berlin) than one
>> that gives each time a different circle with a center
>> randomly distributed around my real location. The averages
>> (plus some clustering algorithms) of those values would
>> provide a high precision on the places I visit in Berlin.
>>
>>> The problem is that the (real) location can vary
>>> slightly, or you might move a little. If it does vary
>>> slightly, even if this is well below the obfuscation
>>> distance, then the random number is completely altered.
>>> I haven't found a good way around this particular
>>> problem. Anything you do to stabilize the
>>> direction/length of the shift vector only leads to the
>>> ability to learn what the shift vector is.
>>
>> Thus I need one algorithm that chooses out of my location a
>> "landmark" that is a good approximation of my location (and
>> of all locations in a neighborhood).
>>
>> The idea is: if I am "clearly closer" to a landmark: choose it.
>> If not, choose the two closest landmarks. Then I am "clearly
>> between those two landmarks". I will choose anyone
>> either memoryless with prob 1/2 or with a bias toward my
>> previously reported location.
>>
>> Consider the following algorithm sketch:
>>
>> Divide the problem in two steps:
>>
>> 1. What are the locations I will be willing to have a
>> possible reported locations? The centers of those possible
>> reported locations are called "landmarks".
>>
>> 2. Given a real measurement, how to choose a landmark and
>> therefore a reported location?
>>
>> 1st step:
>>
>> Given (input) a reported uncertainty that one wants to
>> meet, first associate to them a grid G which is a set of
>> points ("landmarks" or "possible reported centers") and a
>> set A (atlas) of regions (say, circles, the "possible
>> reported locations"). Notice that the associations:
>>
>> uncertainty --> G = grid = Set of "landmarks" or
>> "possible reported locations"
>>
>> uncertainty --> A = atlas = Set of "possible reported
>> locations"
>>
>> are independent of any measured location. It is possible to
>> publish, for given value ranges of uncertainty, the list G
>> of landmarks (or, even better, the list of "blocks", see
>> below).
>>
>> An atlas is a set of regions that cover all possible places
>> (locations with uncertainty 0). I would restrict the
>> possible set of atlases by saying that the have to have a
>> certain degree of coverage (or degree of overlap). Although
>> not necessarily every point has to be in several regions of
>> the atlas, the overlaps between neighboring regions has to
>> be relatively large. This will be discussed later in more
>> detail.
>>
>> This can be done in many ways. For instance:
>>
>> a. the "possible reported centers" could be just the points
>> in a grid in some geographic coordinate system (say,
>> latitude and longitude) with a given degree of precision
>> (significant digits). For instance: represent the latitude
>> and longitude of a point as a number between 0 and 1 in
>> base two (more convenient than base 10). Consider first all
>> points with 1 decimal, than all with two decimal places,
>> than all with three, etc. For each point considered (this
>> is one "possible reported center") create the corresponding
>> "possible reported locations" (which can be, say, a
>> circle). You continue adding points until you get the degree
>> of coverage wanted.
>>
>> b. You can choose the grid points to be the centers of
>> "large" cities, towns, or other important landmarks in
>> maps. Again, we call those points in the maps "landmarks".
>> Assume you have an ordering of landmarks by importance
>> (say: size of the city in habitants). Given a certain
>> targeted uncertainty, create G and A as follows: add a
>> landmark as a "possible reported center" and the
>> corresponding "possible reported locations" as long as you
>> need it to cover the area in question with the expected
>> degree of coverage.
>>
>> For many applications I myself would choose in Germany the
>> reported uncertainty to be, say, 100 or 300 km and the
>> "possible reported centers" to be the center of some "large
>> cities".
>>
>> 2nd step:
>>
>> Now: given a measured location, a grid G and an atlas A,
>> the question is how to choose the region R in the atlas
>> that will be used as reported location.
>>
>> You can do it like this:
>> if the measured location is in only one region R, report
>> this as the location.
>>
>> Else, determine c1 and c2 the two closest landmarks, i.e,
>> "possible reported centers": at measured distances d1 and
>> d2. If d1 is "clearly smaller" than d2, choose c1, and
>> report the region centered at c1. "Clearly smaller" means
>> that the quotient d1/d2 is less than 0.8 (or other some
>> positive number, smaller than 1, to be chosen carefully a
>> priori). Similarly for d2/d1<1.3. If d1 and d2 are "about
>> the same", choose c1 with probability 1/2, c2 with
>> probability 1/2. In this case we say that the point is
>> "clearly between the two landmarks c1 and c2.
>>
>> The idea is that each point is either "close to one
>> landmark" or "between two landmarks". This partitions the
>> whole space into connected (even: convex) "blocks" (let us
>> call them so). Thus all points in one block are all clearly
>> closer to the same landmark or clearly between two given
>> landmarks.
>>
>> What do you think?
>> -- Jorge
>>
>> On Tue, Aug 24, 2010 at 7:38 AM, Thomson, Martin
>> <Martin.Thomson@andrew.com> wrote:
>>> Hi Jorge,
>>>
>>> My email was a little loose in its use of language, so I can
>>> understand how you might have trouble understanding this.
>>>
>>> More inline...
>>>
>>>> -----Original Message-----
>>>> From: Jorge Cuellar [mailto:jrcuellarj@googlemail.com]
>>>> Sent: Tuesday, 24 August 2010 1:14 AM
>>>> To: Thomson, Martin; rjsparks@nostrum.com; Hannes.Tschofenig@gmx.net
>>>> ;
>>>> geopriv@ietf.org
>>>> Subject: Re: [Geopriv] location obscuring
>>>>
>>>> Hi Martin, Hannes, Robert,
>>>>
>>>> I would like to understand you discussions on "location
>>>> obscuring". First, let me ask you a couple of things :
>>>>
>>>> 1. What is your vocabulary / notation?? Assume there is no
>>>> uncertainity, first. You have one "real location" (?) and two
>>>> circles, a "trigger circle" and a "reported circle" (obfuscated
>>>> circle? obfuscated location?). The centres of those two
>>>> circles ("trigger centre / reported centre" ??) are, I guess,
>>>> different (if not, I see some problems).
>>>
>>> There are three locations:
>>>
>>> - the original location ("real" might be a little too strong here)
>>> - the reported location
>>> - the trigger location
>>>
>>> The first two probably have uncertainty associated with them
>>> (imagine a circle). The trigger location doesn't have uncertainty
>>> in the same way, but can also be described with a corresponding
>>> circle enclosing it. Trigger and reported are different (though
>>> there is a slim chance that they are the same...).
>>>
>>>> The vectors:
>>>>
>>>> trigger centre - real location and
>>>> reported centre - real location
>>>>
>>>> let us call them "shift in reporting" / "shift for trigger" have
>>>> lenght (or absolute value or norm) "distance to trigger centre"
>>>> / "distance to reported centre". The direction (angle to
>>>> some "x-axis" of those vectors are the "Direction of the shift
>>>> in reporting" and "Direction of the shift for trigger".
>>>
>>> That's a good description.
>>>
>>>>
>>>> I read the formula:
>>>>
>>>> Direction = 2 * pi * random
>>>>
>>>> as an angle (random is a random number on [0,1]). And
>>>>
>>>> direction * distance
>>>>
>>>> as a vector (in radians, of lenght distance at an angle direction).
>>>
>>> Correct; I was using shorthand - to the detriment of clarity :(
>>>
>>>> 2. Have you thought about the following problem? Suppose each day
>>>> I commute from home to work. They are sufficiently apart so that
>>>> the 2 "reported circles" have to be disjoint. Suppose I turn off
>>>> my device during the trip, but torn it on as soon as I changed my
>>>> location. All algorithms you have been discussing, as far as I
>>>> see, provide every evening, when I come back home, a
>>>> new "reported circle" centred around home plus a vector of fixed
>>>> lenght with random direection. True?? Thus if anyone sees my
>>>> provided locations all evenings he will know rather fast where
>>>> exacly I live (by averaging the random shifts).
>>>
>>> That's one that didn't factor into this, though perhaps it should
>>> have. Though this relies on the assumption that each night you
>>> are going to the same place, that's a pretty easy conclusion to
>>> come to.
>>>
>>> One of the original ideas was to seed a pseudorandom number with
>>> the (real) location. A hash function would be a good substitute.
>>> This would produce the same shift vector for the same location.
>>>
>>> The problem is that the (real) location can vary slightly, or you
>>> might move a little. If it does vary slightly, even if this is
>>> well below the obfuscation distance, then the random number is
>>> completely altered. I haven't found a good way around this
>>> particular problem. Anything you do to stabilize the direction/
>>> length of the shift vector only leads to the ability to learn what
>>> the shift vector is.
>>>
>>>> 3. In the formula:
>>>>
>>>> Distance = sqrt(random(0, max(0, distance - existing
>>>> uncertainty)))
>>>>
>>>> you have twice distance (one with capitals): is that intended?
>>>> Later you use only the non-capitalized one (?)
>>>
>>> That's a mistake, the second is the obfuscation distance, the
>>> first is the resulting shift.
>>>
>>>> 4. I did not understand this:
>>>> This is the same as below, except: (exept what??? and notation
>>>> below?)
>>>>
>>>> reallyFuzz: function(cu) {
>>>> var wobble = Math.max(0, this.distance - cu.uncertainty);
>>>> var centre = this.randomize(cu.centroid, wobble);
>>>> var unc = Math.max(cu.uncertainty, this.distance);
>>>> this.fuzzed = new GeoShape.Circle(centre, unc);
>>>> this.triggerPoint = this.randomize(centre, this.distance/
>>>> 2);
>>>> }
>>>
>>> This is a replacement for the original Javascript code I used to
>>> describe the algorithm. The complete (original) is here:
>>>
>>> <http://www.ietf.org/mail-archive/web/geopriv/current/
>>> msg08630.html>
>>
> _______________________________________________
> Geopriv mailing list
> Geopriv@ietf.org
> https://www.ietf.org/mailman/listinfo/geopriv

_______________________________________________
Geopriv mailing list
Geopriv@ietf.org
https://www.ietf.org/mailman/listinfo/geopriv

Re: [Geopriv] location obscuring

There is a simpler version of (a slight modification) of
the algorithm I proposed with the grid.

The idea again is to have a set of "landmarks" that are the
"centers" of regions used as "possible reported locations".

Consider the area you want to tile to be a flat surface. (A
tessellation or tiling of a surface is a collection of
figures that fills the original one with no overlaps or
gaps). We are interested on tiling with approximately
regular triangles of approximately the same size. The flat
case this is easy: tile the plane in the standard way with
regular (equilateral) triangles of the same size. In this way
you get, with a suitable choice of coordinates, the grid
points:

(e x n, d x m) (n, and m range over the integers,
with n+m = even
e and d are numbers with
d/e =((square root of three) /2),
where "x" represents multiplication and
"/" division).

An example is given in the figure below, where the grid
points are shown as asterisks "*". Thus, the points have
coordinates (starting, say, at the bottom left with (0,0):

(0,0), (0,2d), (0,4d), ...
(e,d), (e,3d), (e,5d), ...
(2e,0), (2e,2d), (2e,4d), ...
(3e,d), (3e,3d), (3e,5d), ...

Those points G ("*") are the "landmarks".

* * *

* * * *

* * *

* * * *

Insert now the midpoints of the sides of the triangle. They
are(shown in the figure below as dots, ".". Each of those
points is equidistant to exactly two landmarks.


. * . * . * .

. . . . . .

* . * . * . *

. . . . . .

. * . * . * .

. . . . . .

* . * . * . *

Let us call S the set of all those points "+" or "*". Now
consider, for each point in S, the set of points that are
closest to a point x in S than to any other point of S.
(This is the Voronoi diagram of S). This divides the region
into hexagons as shown in the figure below. Each oint in
the interior of a hexagon is closet to the center of the
hexagon than to any other point "+" or "*".

There are two types of hexagons: ones whose center is a
landmark ("*") and ones whose center "." is equidistant to
two landmarks. The points in the first type of hexagons are
said to be "clearly closer to a landmark" (the center of the
hexagon), while the points in the other type of hexagon are
said to be "clearly between two landmarks".

. | * | . | * | . | * | .
+ + + + + +
/ \ / \ / \ / \ / \ / \
+ + + + + + +
| . | . | . | . | . | . |
+ + + + + + +
\ / \ / \ / \ / \ / \ /
+ + + + + +
* | . | * | . | * | . | *
+ + + + + +
/ \ / \ / \ / \ / \ / \
+ + + + + + +
| . | . | . | . | . | . |
+ + + + + + +
\ / \ / \ / \ / \ / \ /
+ + + + + +
. | * | . | * | . | * | .
+ + + + + +
/ \ / \ / \ / \ / \ / \
+ + + + + + +
| . | . | . | . | . | . |
+ + + + + + +
\ / \ / \ / \ / \ / \ /
+ + + + + +
* | . | * | . | * | . | *

Now, given a point in the plane choose either the landmark
clearly closest to it or one of the two closest each with
probability 1/2.

There is a variant of this for a square grid (but having up
to 4 "closest landmarks"). It is relatively straight-forward
to give the formulas for the landmarks associated with a
point in the plane. I will expand on this on the following
days.

Ciao, Jorge

On Thu, Aug 26, 2010 at 1:49 PM, Jorge Cuellar
<jrcuellarj@googlemail.com> wrote:
> Fine! I still have two questions:
>
> 1. Is it safe to assume that the algorithm that generates
> the reported locations has a memory? It seems that the
> algorithm checks somehow, (it doesn't matter how) if we are
> still fine with the last reported location we still report
> it, and if not, we report a new location. It would be good
> if we could rely on this: you will be giving much more
> information away if you did not remember what your last
> reported location was.
>
> 2. Good random number generation is not easy. Shall we say
> something about it? (Most hashes are fine: but hashes of
> what?)
>
> -----------------------------------------------------------
> I would prefer an algorithm that tells that I am in Berlin
> (say a circle with center in the center of Berlin) than one
> that gives each time a different circle with a center
> randomly distributed around my real location. The averages
> (plus some clustering algorithms) of those values would
> provide a high precision on the places I visit in Berlin.
>
>> The problem is that the (real) location can vary
>> slightly, or you might move a little.   If it does vary
>> slightly, even if this is well below the obfuscation
>> distance, then the random number is completely altered.
>> I haven't found a good way around this particular
>> problem.  Anything you do to stabilize the
>> direction/length of the shift vector only leads to the
>> ability to learn what the shift vector is.
>
> Thus I need one algorithm that chooses out of my location a
> "landmark" that is a good approximation of my location (and
> of all locations in a neighborhood).
>
> The idea is: if I am "clearly closer" to a landmark: choose it.
> If not, choose the two closest landmarks. Then I am "clearly
> between those two landmarks".  I will choose anyone
> either memoryless with prob 1/2 or with a bias toward my
> previously reported location.
>
> Consider the following algorithm sketch:
>
> Divide the problem in two steps:
>
> 1. What are the locations I will be willing to have a
> possible reported locations?  The centers of those possible
> reported locations are called "landmarks".
>
> 2. Given a real measurement, how to choose a landmark and
> therefore a reported location?
>
> 1st step:
>
> Given (input) a reported uncertainty that one wants to
> meet, first associate to them a grid G which is a set of
> points ("landmarks" or "possible reported centers") and a
> set A (atlas) of regions (say, circles, the "possible
> reported locations"). Notice that the associations:
>
>  uncertainty --> G = grid = Set of "landmarks" or
>                            "possible reported locations"
>
>  uncertainty --> A = atlas = Set of "possible reported
>                             locations"
>
> are independent of any measured location. It is possible to
> publish, for given value ranges of uncertainty, the list G
> of landmarks (or, even better, the list of "blocks", see
> below).
>
> An atlas is a set of regions that cover all possible places
> (locations with uncertainty 0). I would restrict the
> possible set of atlases by saying that the have to have a
> certain degree of coverage (or degree of overlap). Although
> not necessarily every point has to be in several regions of
> the atlas, the overlaps between neighboring regions has to
> be relatively large. This will be discussed later in more
> detail.
>
> This can be done in many ways. For instance:
>
> a. the "possible reported centers" could be just the points
> in a grid in some geographic coordinate system (say,
> latitude and longitude) with a given degree of precision
> (significant digits). For instance: represent the latitude
> and longitude of a point as a number between 0 and 1 in
> base two (more convenient than base 10). Consider first all
> points with 1 decimal, than all with two decimal places,
> than all with three, etc. For each point considered (this
> is one "possible reported center") create the corresponding
> "possible reported locations" (which can be, say, a
> circle). You continue adding points until you get the degree
> of coverage wanted.
>
> b. You can choose the grid points to be the centers of
> "large" cities, towns, or other important landmarks in
> maps. Again, we call those points in the maps "landmarks".
> Assume you have an ordering of landmarks by importance
> (say: size of the city in habitants). Given a certain
> targeted uncertainty, create G and A as follows: add a
> landmark as a "possible reported center" and the
> corresponding "possible reported locations" as long as you
> need it to cover the area in question with the expected
> degree of coverage.
>
> For many applications I myself would choose in Germany the
> reported uncertainty to be, say, 100 or 300 km and the
> "possible reported centers" to be the center of some "large
> cities".
>
> 2nd step:
>
> Now: given a measured location, a grid G and an atlas A,
> the question is how to choose the region R in the atlas
> that will be used as reported location.
>
> You can do it like this:
> if the measured location is in only one region R, report
> this as the location.
>
> Else, determine c1 and c2 the two closest landmarks, i.e,
> "possible reported centers": at measured distances d1 and
> d2. If d1 is "clearly smaller" than d2, choose c1, and
> report the region centered at c1. "Clearly smaller" means
> that the quotient d1/d2 is less than 0.8 (or other some
> positive number, smaller than 1, to be chosen carefully a
> priori). Similarly for d2/d1<1.3. If d1 and d2 are "about
> the same", choose c1 with probability 1/2, c2 with
> probability 1/2. In this case we say that the point is
> "clearly between the two landmarks c1 and c2.
>
> The idea is that each point is either "close to one
> landmark" or "between two landmarks". This partitions the
> whole space into connected (even: convex) "blocks" (let us
> call them so). Thus all points in one block are all clearly
> closer to the same landmark or clearly between two given
> landmarks.
>
> What do you think?
> -- Jorge
>
> On Tue, Aug 24, 2010 at 7:38 AM, Thomson, Martin
> <Martin.Thomson@andrew.com> wrote:
>> Hi Jorge,
>>
>> My email was a little loose in its use of language, so I can understand how you might have trouble understanding this.
>>
>> More inline...
>>
>>> -----Original Message-----
>>> From: Jorge Cuellar [mailto:jrcuellarj@googlemail.com]
>>> Sent: Tuesday, 24 August 2010 1:14 AM
>>> To: Thomson, Martin; rjsparks@nostrum.com; Hannes.Tschofenig@gmx.net;
>>> geopriv@ietf.org
>>> Subject: Re: [Geopriv] location obscuring
>>>
>>> Hi Martin, Hannes, Robert,
>>>
>>> I would like to understand you discussions on "location
>>> obscuring".  First, let me ask you a couple of things :
>>>
>>> 1. What is your vocabulary / notation?? Assume there is no
>>> uncertainity, first. You have one "real location" (?) and two
>>> circles, a "trigger circle" and a "reported circle" (obfuscated
>>> circle? obfuscated location?). The centres of those two
>>> circles ("trigger centre / reported centre" ??) are, I guess,
>>> different (if not, I see some problems).
>>
>> There are three locations:
>>
>>  - the original location ("real" might be a little too strong here)
>>  - the reported location
>>  - the trigger location
>>
>> The first two probably have uncertainty associated with them (imagine a circle).  The trigger location doesn't have uncertainty in the same way, but can also be described with a corresponding circle enclosing it.  Trigger and reported are different (though there is a slim chance that they are the same...).
>>
>>> The vectors:
>>>
>>>   trigger centre  - real location   and
>>>   reported centre - real location
>>>
>>> let us call them "shift in reporting" / "shift for trigger" have
>>> lenght (or absolute value or norm) "distance to trigger centre"
>>> / "distance to reported centre". The direction (angle to
>>> some "x-axis" of those vectors are the "Direction of the shift
>>> in reporting" and "Direction of the shift for trigger".
>>
>> That's a good description.
>>
>>>
>>> I read the formula:
>>>
>>>   Direction = 2 * pi * random
>>>
>>> as an angle (random is a random number on [0,1]). And
>>>
>>>   direction * distance
>>>
>>> as a vector (in radians, of lenght distance at an angle direction).
>>
>> Correct; I was using shorthand - to the detriment of clarity :(
>>
>>> 2. Have you thought about the following problem? Suppose each day
>>> I commute from home to work. They are sufficiently apart so that
>>> the 2 "reported circles" have to be disjoint. Suppose I turn off
>>> my device during the trip, but torn it on as soon as I changed my
>>> location. All algorithms you have been discussing, as far as I
>>> see, provide every evening, when I come back home, a
>>> new "reported circle" centred around home plus a vector of fixed
>>> lenght with random direection. True?? Thus if anyone sees my
>>> provided locations all evenings he will know rather fast where
>>> exacly I live (by averaging the random shifts).
>>
>> That's one that didn't factor into this, though perhaps it should have.  Though this relies on the assumption that each night you are going to the same place, that's a pretty easy conclusion to come to.
>>
>> One of the original ideas was to seed a pseudorandom number with the (real) location.  A hash function would be a good substitute.  This would produce the same shift vector for the same location.
>>
>> The problem is that the (real) location can vary slightly, or you might move a little.   If it does vary slightly, even if this is well below the obfuscation distance, then the random number is completely altered.  I haven't found a good way around this particular problem.  Anything you do to stabilize the direction/length of the shift vector only leads to the ability to learn what the shift vector is.
>>
>>> 3. In the formula:
>>>
>>>   Distance = sqrt(random(0, max(0, distance - existing uncertainty)))
>>>
>>> you have twice distance (one with capitals): is that intended?
>>> Later you use only the non-capitalized one (?)
>>
>> That's a mistake, the second is the obfuscation distance, the first is the resulting shift.
>>
>>> 4. I did not understand this:
>>> This is the same as below, except: (exept what??? and notation below?)
>>>
>>>     reallyFuzz: function(cu) {
>>>         var wobble = Math.max(0, this.distance - cu.uncertainty);
>>>         var centre = this.randomize(cu.centroid, wobble);
>>>         var unc = Math.max(cu.uncertainty, this.distance);
>>>         this.fuzzed = new GeoShape.Circle(centre, unc);
>>>         this.triggerPoint = this.randomize(centre, this.distance/2);
>>>     }
>>
>> This is a replacement for the original Javascript code I used to describe the algorithm.  The complete (original) is here:
>>
>>  <http://www.ietf.org/mail-archive/web/geopriv/current/msg08630.html>
>
_______________________________________________
Geopriv mailing list
Geopriv@ietf.org
https://www.ietf.org/mailman/listinfo/geopriv