Wednesday, September 1, 2010

Re: [Geopriv] location obscuring

Hi Martin,

> As to the algorithm you propose:
>
>> (1)
>> Scenario:
>>   The Target visits the same location multiple times over time.
>>
>> Assumptions required by recipient:
>>   The Target is visiting the same location.
>>
>> Constraints on scenario:
>>
>>   Each time, location is only reported while at that
>>   location, not on the approach to, or leaving from the
>>   location. This requires that the Target be unable to be
>>   located on approach to or exodus from the location. This
>>   constraint is met by having the means of location
>>   disabled while in transit - e.g. by turning off their
>>   phone.
>
> This is not entirely correct: if you are approaching, say,
> every evening your home via the same route, if you report
> your location when you are approaching home, you will get a
> very close approximation of the route that you are using.

One example:

Suppose you regularly fly into a city (Paris, NY, whatever).
The airport is a certain distance, say 10 units, from
downtown, to the east.

Whenever you land you turn on the device that measures your
location. Now you get into a taxi and go to a certain part
of the city, say at most 10 units away from downtown.

Suppose you want an uncertainty of 40 units. You want your
friends to know that you are in this city, but not which of
the 4 companies N, W, S, or E (all 10 units away from
downtown) you are visiting. You may feel quite safe with the
proposed algorithm: you are never 40 units away from the
airport, nor 40 units away from downtown.

But a friend who sees the information that the algorithm
outputs is able (after a while) to deduce quite easily how
far and in which direction you are travelling in the city.

This is true because if you travel only to a place close to
the airport itself, the algorithm will seldom change the
provided location. But if you travel 10 units west of the
city the algorithm will tend to change location relatively
often. Or 10 units north or 10 units south. And the
eavesdropper can distinguish those three cases rather
rapidly. (In statistical terms: you can test the 4
hypotheses:
a) 10 units north
b) 10 units east
c) 10 units west
d) 10 units south
with high confidence, given a large enough sample.)

You can see this effect with a simple simulation:

Take the following points in the plane (x grows towards the
west here, so E coincides with the airport A):

A=(-10,0) (Airport)
N=(0,10)
W=(10,0)
S=(0,-10)
E=(-10,0)

Consider now a movement from A to N. Repeat this 100 times.
Each time, see whether the reported location changes along
the path and, if so, take the last reported location. Take
as "statistic" the average of all those locations. (If the
location never changes, your statistic is the average of the
100 reported locations.)

Call the number you got (your statistic) N1.
Now repeat the experiment to obtain numbers N2, N3, ...,
N100.

Now repeat the whole procedure with W, S, and E and compare
the results. If my intuition is not wrong, you will get 4
clusters of numbers.

Up to this point, this is the simulation your friend might
run.

If you now secretly choose one of the 4 (N, W, S, E) and
only disclose the statistic you obtained, your friend can
identify which of the 4 cases is happening with high
confidence, just by looking at the clusters.
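
A minimal sketch of this experiment in Python, for anyone who
wants to try it. The obscuring rule is an assumption on my
side, not taken from the draft: the reported point is drawn
uniformly within 40 units of the true position and is kept
until the true position is more than 40 units away from it,
at which point a new one is drawn. The straight-line path,
the number of steps per trip and the nearest-centre guess are
also only illustrative choices.

```python
import math
import random

R = 40.0                      # requested uncertainty, as in the example
A = (-10.0, 0.0)              # airport
DESTS = {"N": (0.0, 10.0), "W": (10.0, 0.0),
         "S": (0.0, -10.0), "E": (-10.0, 0.0)}


def random_point_in_disc(center, radius, rng):
    """Uniformly random point in the disc around `center`."""
    r = radius * math.sqrt(rng.random())
    theta = 2.0 * math.pi * rng.random()
    return (center[0] + r * math.cos(theta), center[1] + r * math.sin(theta))


def last_reported(start, end, rng, radius=R, steps=20):
    """ASSUMED rule: keep the reported point until the true position
    is more than `radius` away from it, then draw a new one."""
    reported = random_point_in_disc(start, radius, rng)
    for i in range(1, steps + 1):
        f = i / steps
        pos = (start[0] + f * (end[0] - start[0]),
               start[1] + f * (end[1] - start[1]))
        if math.dist(pos, reported) > radius:
            reported = random_point_in_disc(pos, radius, rng)
    return reported


def mean_point(points):
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def statistic(dest, rng, trips=100):
    """Average of the last reported locations over `trips` trips from A."""
    return mean_point([last_reported(A, dest, rng) for _ in range(trips)])


def clusters(rng, repeats=100):
    """The friend's simulation: `repeats` statistics per destination."""
    return {name: [statistic(dest, rng) for _ in range(repeats)]
            for name, dest in DESTS.items()}


def guess(stat, centres):
    """Pick the destination whose cluster centre is nearest to `stat`."""
    return min(centres, key=lambda name: math.dist(stat, centres[name]))


if __name__ == "__main__":
    rng = random.Random(1)
    centres = {k: mean_point(v) for k, v in clusters(rng).items()}
    for name, c in centres.items():
        print(name, round(c[0], 1), round(c[1], 1))
    secret = "S"              # the destination you secretly chose
    print("guessed:", guess(statistic(DESTS[secret], rng), centres))
```

The friend only needs clusters() and guess(); the secret
destination enters only through the single disclosed
statistic.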

Another simulation: consider a movement from (0,0) to (x,0)
(say, initially x=20) in the plane and let your algorithm
create a set of reported locations with uncertainty 40.
The average of the last reported locations will provide a
good indication of how large x is.
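
This second experiment can be sketched in the same way. Again
the obscuring rule is assumed (reported point uniform within
the uncertainty radius, redrawn only when the true position
moves further than that radius away from it), and the
function name and parameters are made up for illustration.

```python
import math
import random


def draw(rng, cx, radius):
    """Uniform point in the disc of `radius` around (cx, 0)."""
    r = radius * math.sqrt(rng.random())
    theta = 2.0 * math.pi * rng.random()
    return cx + r * math.cos(theta), r * math.sin(theta)


def mean_last_report_x(x, radius=40.0, trips=2000, steps=40, seed=0):
    """Average x-coordinate of the last reported location for trips
    from (0,0) to (x,0), under the ASSUMED obscuring rule."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trips):
        rx, ry = draw(rng, 0.0, radius)        # report at the start
        for i in range(1, steps + 1):
            px = x * i / steps                 # true position is (px, 0)
            if math.hypot(px - rx, ry) > radius:
                rx, ry = draw(rng, px, radius)  # report had to change
        total += rx
    return total / trips


if __name__ == "__main__":
    for x in (5.0, 20.0, 40.0):
        print(x, round(mean_last_report_x(x), 2))
```

Under these assumptions the printed average should grow with
x, which is all an observer needs in order to rank trips by
length.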

I cannot tell you now how much information the algorithm
is leaking per movement from the airport to N, W, S or E.
But it is certainly not negligible. (Look at the
simulations; with different numbers it will be much more.)
And this happens everywhere, in every city or every town.

But my point is: you are leaking much more information than
you need to.

> Another question:
> If you have several devices providing information: are we
> sure all the provided locations are processed by the same
>  *instance* of the algorithm (same server, same local data)?

If we cannot be sure of this, and two different instances of
the same algorithm are processing information about your
location and providing location according to the same policy,
the situation is much worse.

ciao, Jorge