Saturday, June 12, 2010

Rethinking analysis of Google's AP data capture

In "Rethink things in light of Google's Gstumbler report," Kim Cameron asks that we rethink our analysis of Google's wireless data capture in light of the third-party analysis of the gstumbler data capture software. He seems particularly fond of the phrases "wrong," "completely wrong," and "wishful thinking" when referring to my comments on the topic. In my defense, there was no "wishful thinking" going on in my mind. I was just examining the published information rather than jumping to conclusions -- something that I will always advocate. In this case, after examining the published report, it does appear that those who jumped to conclusions happened to be closer to the mark, but I still think they were wrong to jump to those conclusions before the actual facts had been published.

I read through the entire report, and the information in it is quite different from what had been published at the time I expressed my opinions on the events at hand. The differences include:

  1. We had been led to believe that Google had only captured data on open wireless networks (networks that broadcast their SSIDs and/or were unencrypted). The analysis of the software shows that to be incorrect -- Google captured data on every network regardless of the state of openness. So no matter what the user did to try to protect their network, Google captured data that the underlying protocols required to be transmitted in the clear.
  2. We had been led to believe that Google had only captured data from wireless access points (APs). Again the analysis shows that this was incorrect -- Google captured data on any device whose wireless traffic it was able to capture (AP or user device). So portable devices that were transmitting as the Street View vehicle passed would have had their data captured.

    One factor that is potentially in the user's favor is that the typical wireless configuration would encourage portable devices to transmit at just enough power for the AP to hear them (devices on wireless networks do not talk directly to each other). Depending upon the household configuration, it is possible (probable?) that a number of devices would not be transmitting strongly enough for them to be detected from a vehicle out in the middle of the street. However, if Google had a big honking antenna on the vehicle with lots of gain in the right frequencies, it could have detected every device within the house.
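
The antenna-gain point can be made concrete with a rough link budget. The sketch below uses the standard free-space path loss formula; the transmit power, wall loss, distance, antenna gains and receiver sensitivity are all illustrative assumptions, not measurements of any real household or of Google's equipment.

```python
import math

SPEED_OF_LIGHT_M_S = 299_792_458.0


def free_space_path_loss_db(distance_m: float, freq_hz: float) -> float:
    """Free-space path loss in dB: 20*log10(4*pi*d*f/c)."""
    return 20.0 * math.log10(4.0 * math.pi * distance_m * freq_hz / SPEED_OF_LIGHT_M_S)


def received_power_dbm(tx_power_dbm: float, tx_gain_dbi: float, rx_gain_dbi: float,
                       distance_m: float, freq_hz: float,
                       wall_loss_db: float = 0.0) -> float:
    """Simple link budget: transmit power plus antenna gains minus losses."""
    path_loss = free_space_path_loss_db(distance_m, freq_hz)
    return tx_power_dbm + tx_gain_dbi + rx_gain_dbi - path_loss - wall_loss_db


def is_detectable(rx_power_dbm: float, sensitivity_dbm: float = -90.0) -> bool:
    """True if the signal reaches the (assumed) receiver sensitivity."""
    return rx_power_dbm >= sensitivity_dbm


if __name__ == "__main__":
    # Hypothetical scenario: a low-power device 20 m from the street, behind
    # walls assumed to cost 30 dB, on 2.4 GHz channel 6 (2.437 GHz).
    for rx_gain_dbi in (2.0, 15.0):  # small antenna vs. high-gain antenna
        rx = received_power_dbm(0.0, 0.0, rx_gain_dbi, 20.0, 2.437e9, wall_loss_db=30.0)
        print(f"rx gain {rx_gain_dbi:4.1f} dBi -> {rx:6.1f} dBm, detectable: {is_detectable(rx)}")
```

Under these made-up numbers the same device falls below the detection threshold for the small antenna and above it for the high-gain one, which is the whole point: the receiver's antenna matters as much as the device's transmit power.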

Given this new information I would have to agree that Google has clearly stepped into the arena of doing something that could be detrimental to the user's privacy.

That said, we need to be a little careful about automatically assuming that the intent was to put all of this data into some global database. In fact, the way the data was captured -- the header of every data packet was captured, many of which would contain duplicate information -- makes it clear that Google intended to do some post-processing of the data. One could hope that they would use this post-processing step to restrict the data making it into any general, world-wide database. Of course, we don't know whether or not they would do this, and even if they would, they still have that raw data capture, which contains information that could clearly be used to the user's detriment.
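
To illustrate why raw header capture implies post-processing, here is a minimal sketch that pulls the address fields out of captured headers and collapses the duplicates. It assumes the common 24-byte 802.11 header layout with three address fields (frame control, duration, three 6-byte addresses, sequence control); the function names are hypothetical, and this is not a description of what gstumbler or Google's back end actually did.

```python
from typing import Iterable, Set, Tuple

# Assumed layout: frame control (2) + duration (2) + 3 addresses (18) + sequence control (2)
HEADER_LEN = 24
ADDRESS_OFFSETS = (4, 10, 16)


def format_mac(raw: bytes) -> str:
    """Render 6 raw bytes as a colon-separated MAC address."""
    return ":".join(f"{b:02x}" for b in raw)


def addresses_in_header(header: bytes) -> Tuple[str, str, str]:
    """Return the three address fields of one header, in order."""
    if len(header) < HEADER_LEN:
        raise ValueError("header shorter than 24 bytes")
    a1, a2, a3 = (format_mac(header[off:off + 6]) for off in ADDRESS_OFFSETS)
    return a1, a2, a3


def unique_addresses(headers: Iterable[bytes]) -> Set[str]:
    """Collapse many captured headers into the set of distinct addresses seen."""
    seen: Set[str] = set()
    for header in headers:
        seen.update(addresses_in_header(header))
    return seen
```

These address fields sit in the header, outside the encrypted frame body, which is why turning on encryption does not hide them. A capture of thousands of packets between one laptop and one AP reduces to a handful of distinct addresses, so what matters for privacy is what gets kept after that reduction.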

In addition, the fact that we know Google did this doesn't mean that others aren't doing it (or haven't already done it) without publicizing that they have done so -- especially those who do intend to use this information for nefarious purposes.

We should take this incident as a wake-up call to start building privacy into the foundations of our programs and protocols.


Monday, June 07, 2010

Kim vs. Google summary...

As the ongoing blog discussion between Kim and me seems to have gone off on various tangents (what some might call "rat holes"), I thought it best to try to bring things together in a single summary (which I'm sure will generate more tangents).

Let's list some of the facts/opinions that have come out in the discussion:

  1. MAC addresses typically are persistent identifiers that by the definition of the protocols used in wireless APs can't be hidden from snoopers, even if you turn on encryption.
  2. By themselves, MAC addresses are not all that useful except to communicate with a local network entity (so you need to be nearby on the same local network to use them).
  3. When you combine MAC addresses with other information (locality, user identity, etc.) you can create worrisome data aggregations that, when exposed publicly, could have a detrimental impact on a user's privacy.
  4. SSIDs have some of these properties as well, though the protocol clearly gives the user control over whether or not to broadcast (publicize) their SSID. The choice of the SSID value can have a substantial impact on its use as a privacy-invading value -- a generic value such as "home" or "linksys" is much less likely to be a privacy issue than "ConorCahillsHomeAP".
  5. Google purposely collected SSIDs and MAC addresses from APs which were configured in SSID broadcast mode and inadvertently collected some network traffic data from those same APs. Google did not collect information from APs configured to not broadcast SSIDs.
  6. Google associated the SSID and MAC information with some location information (probably the GPS vehicle location at the time the AP signal was strongest).
  7. The AP protocols define no means to differentiate between open wireless hotspots and closed hotspots which broadcast their SSIDs.
  8. I have not found out if Google used the encryption status of the APs in its decision about recording the SSID/MAC information for the AP.
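
The location association guessed at in item 6 can be sketched as follows: for each AP, keep the vehicle position recorded when its signal was strongest. This is only an illustration of that guess; the `Observation` record and its field names are hypothetical, and nothing here is confirmed to be Google's method.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class Observation:
    """One sighting of an AP from the moving vehicle (hypothetical record)."""
    mac: str
    ssid: str
    rssi_dbm: float  # received signal strength; closer to 0 is stronger
    lat: float       # vehicle GPS latitude at the time of the sighting
    lon: float       # vehicle GPS longitude at the time of the sighting


def strongest_sighting_per_ap(observations: Iterable[Observation]) -> Dict[str, Observation]:
    """For each MAC address, keep the sighting with the strongest signal."""
    best: Dict[str, Observation] = {}
    for obs in observations:
        current = best.get(obs.mac)
        if current is None or obs.rssi_dbm > current.rssi_dbm:
            best[obs.mac] = obs
    return best
```

The resulting record pairs a MAC address and SSID with a position on the street, not with a person. Item 3 above is about what happens when some other data set supplies that last link.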

Now we get to the point where there are differences of opinion.

  1. Kim believes that since there's no way for the user to configure whether or not to expose their MAC address and because the association of the MAC address to other information could be privacy invasive, that Google should not have collected that data without express user consent to do so and that in this case Google did not have user consent.

    I believe that Google's treatment of the user's decision to broadcast their SSID as an implicit consent for someone to record that SSID and the associated MAC address is a valid and reasonable interpretation. If the user doesn't want their SSID and MAC address collected, they should configure their system to not broadcast their SSID.

    Yes, even with SSID broadcast turned off, some other party can easily determine the AP's MAC address, and this would clearly have potential negative impacts on the user's privacy. But that's a technical protocol issue, not Google's issue, since they clearly interpreted SSID silence to be a user's decision to keep their information private and respected that decision.

  2. In "What harm can come from a MAC address?" Kim seems to argue that because there's some potential way for an entity to abuse a piece of data, any and all uses of that data should be prohibited. So, because an evil person could capture the MAC address of your phone and then drive around the neighborhood looking for that MAC address and thereby find your home, any use of MAC addresses other than their original intent is evil and should be outlawed.

    I believe that it's much better to outlaw what would clearly be illegal activity rather than trying to outlaw all possible uses. So, in this particular case, the stalker should be prohibited from using *any* means to track/identify users with the intent of committing a crime (or something like that).

    Blindly prohibiting all uses will block useful features. For example, letting my device establish where it is in order to obtain location services, without revealing to me the basis for that location, is a useful feature that I have made use of on my iPhone. I don't believe that I've violated anyone's privacy in using this type of information to know where I am (to do things such as get a list of movies playing at the nearest theatre via the Fandango application).

  3. Kim doesn't seem to have responded at all to my criticism of the privacy advocates failing to use this case as a learning experience for users to help them configure their APs in a way that best protects their privacy.

In summary, I do agree that MAC addresses could be abused if associated with an end-user and used for some nefarious purpose. However, I don't believe that Google was doing either of these.


Sunday, June 06, 2010

House numbers vs SSIDs

In "Are SSIDs and mac addresses like house numbers?" Kim Cameron argues against my characterization of SSIDs and mac addresses being like house numbers:

Let’s think about this. Are SSIDs and MAC addresses like house numbers?

Your house number is used - by anyone in the world who wants to find it - to get to your house. Your house was given a number for that purpose. The people who live in the houses like this. They actually run out and buy little house number things, and nail them up on the side of their houses, to advertise clearly what number they are.

So let’s see:

  1. Are SSIDS and MAC addresses used by anyone in the world to get through to your network? No. A DNS name would be used for that. In residential neighborhoods, you employ a SSID for only one reason - to make it easier to get wireless working for members of your family and their visitors. Your intent is for the wireless access point’s MAC address to be used only by your family’s devices, and the MACs of their devices only by the other devices in the house.
  2. Were SSIDS and MAC addressed invented to allow anyone in the world to find the devices in your house? No, nothing like that.
  3. Do people consciously try to advertise their SSIDs and MAC addresses to the world by running to the store, buying them, and nailing them to their metaphorical porches? Nope again. Zero analogy.

So what is similar? Nothing.

That’s because house addresses are what, in Law Four of the Laws of Identity, were called “universal identifiers”, while SSIDs and MAC addresses are what were called “unidirectional identifiers” - meaning that they were intended to be constrained to use in a single context.

Keeping “unidirectional identifiers” private to their context is essential for privacy. And let me be clear: I’m not refering only to the privacy of individuals, but also that of enterprises, governments and organizations. Protecting unidirectional identifiers is essential for building a secure and trustworthy Internet.

This argument confuses house address with house number. A house number cannot be used as a universal identifier. I presume that there are many houses out there with the number 15, even in the same town, and many times even on the same street in the same zip code (where the only difference is the N.W. or S.E. on the end of the street name).

Like SSIDs and mac addresses, the house number is only usable as an identifier once you get to the neighborhood and very often only once you get to the street.

People choose to advertise SSIDs so they themselves and others will have an easy time connecting with their network once they are within range of the AP -- as evidenced by Mike's comment on my previous article (and the reason why I have chosen to configure my SSID as broadcast). Yes, many people don't know enough to make that decision and perhaps sometimes choose to do what others might consider a wrong thing, but that's part of my issue with the wireless AP industry and with the privacy folks not using this as a good educational example.

So while people don't need to go to the hardware store to buy the number to put up on their house, they can, and many do, choose the electronic equivalent when they set up their AP.

House numbers are very much unidirectional identifiers used within the context of a given address (street, city, state, country, postal code) just as SSIDs and MAC addresses are.

I will admit that there are some differences with the mac address because of how basic Ethernet networking was designed. The mac address is designed to be unique (though, those in networking know that this isn't always the case and in fact most devices let you override the mac address anytime you want). So this could be claimed to be some form of a universal identifier. However, it's not at all usable outside of the local neighborhood. There is no way for me to talk to a particular mac address unless I am locally on the same network with that device.
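
Both the "designed to be unique" and the "can be overridden" properties show up in the address format itself. In a 48-bit MAC address, the first three octets of a factory-assigned address are the manufacturer's prefix (OUI), and two bits in the first octet flag whether the address is locally administered (set by the owner rather than the manufacturer) and whether it is a group (multicast) address. A minimal sketch, with hypothetical helper names:

```python
def parse_mac(mac: str) -> bytes:
    """Parse 'aa:bb:cc:dd:ee:ff' (or dash-separated) into 6 raw bytes."""
    parts = mac.replace("-", ":").split(":")
    if len(parts) != 6 or any(len(p) != 2 for p in parts):
        raise ValueError(f"not a 48-bit MAC address: {mac!r}")
    return bytes(int(p, 16) for p in parts)


def is_locally_administered(mac: str) -> bool:
    """Bit 0x02 of the first octet: address was assigned locally, not by the vendor."""
    return bool(parse_mac(mac)[0] & 0x02)


def is_multicast(mac: str) -> bool:
    """Bit 0x01 of the first octet: group (multicast) rather than individual address."""
    return bool(parse_mac(mac)[0] & 0x01)


def oui(mac: str) -> str:
    """First three octets; identifies the vendor for factory-assigned addresses."""
    return ":".join(f"{b:02x}" for b in parse_mac(mac)[:3])
```

An overridden address is normally expected to have the locally administered bit set, though nothing stops a user from simply copying another device's factory address, which is one reason uniqueness isn't guaranteed in practice.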

I do believe that a more privacy-enabled design of networking would have allowed for scenarios where MAC addresses were more dynamic, thus reducing the universality and persistence of the MAC address itself. However, that's an issue for network design, and I don't think that what Google did was a substantial privacy issue for the user.
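
As one hedged illustration of what "more dynamic" could look like, the sketch below generates a random address with the locally administered bit set and the multicast bit cleared, so it cannot collide with a factory-assigned address. It only produces a value; actually applying it to an interface is operating-system specific and not shown, and the function name is hypothetical.

```python
import secrets


def random_private_mac() -> str:
    """Generate a random unicast, locally administered 48-bit MAC address."""
    octets = bytearray(secrets.token_bytes(6))
    # Set the locally administered bit (0x02) and clear the multicast bit (0x01).
    octets[0] = (octets[0] | 0x02) & 0xFE
    return ":".join(f"{b:02x}" for b in octets)
```

A device that picked a fresh value like this whenever it was not associated with a known network would no longer expose one persistent identifier to anyone listening from the street.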


Friday, June 04, 2010

Privacy Theatre

In a series of blog articles, Kim Cameron and Ben Adida discuss Google's capturing of open access point information as part of its Street View project.

Kim's assertion that Google was wrong to do so is based upon two primary factors:

  • Google intended to capture the SSID and mac address of the access points
  • SSIDs and mac addresses are persistent identifiers

And it seems that this has at least gotten Ben to rethink his assertion that this was all about privacy theater, and even to give Kim a get-out-of-jail-free card.

While I agree that Kim's asserted facts are true, I disagree with his conclusion.

  • I don't believe Google did anything wrong in collecting SSIDs and mac addresses (capturing data, perhaps). The SSIDs were configured to *broadcast* (to make something known widely). However, SSIDs and mac addresses are local identifiers more like house numbers. They identify entities within the local wireless network and are generally not re-transmitted beyond that wireless network.
  • I don't believe that what they did had an impact on the user's privacy. As I pointed out above, it's like capturing house numbers and associating them with a location. That, in itself, has little to do with the user's privacy unless something else associates the location with the user.
  • I hold the wireless AP industry responsible for the fact that many users don't have their APs set up in SSID stealth and data encrypted mode. The AP industry should have designed things so that they were encrypted by default with hidden SSIDs and required the user to do something to create an open network if they wanted to.
  • The user has to assume some responsibility here, though I really don't expect my mother to know how to configure encryption on an AP (nor do I expect her to know enough to know it's necessary). So I'm back to the AP industry.
  • And, perhaps most of all, I fault the various privacy pundits and all the news outlets who did not take this as an opportunity to teach the users and the industry about how to protect their data. Not one report that I read/saw went into any detail on how users could protect themselves (if they still broadcast their SSIDs and leave their network unencrypted, they are open to much worse attacks than Google capturing their SSID and MAC address).

Perhaps my view is contrarian for one who is somewhat active on the privacy side. However, I think it is a much more pragmatic view that will ultimately bring value to the user far beyond giving Google a hard time for capturing SSIDs and mac addresses which have little privacy value (in my opinion).
