In an article in the October 2022 issue of the professional engineering journal IEEE Spectrum, Mark Harris investigates the depth and volume of customer-generated data that Tesla acquires every day from millions of its cars on the road. The reasons for all this data collection appear to be benign for the time being, but it's truly a new thing in the automotive industry, and potential misuse of the data is something to worry about.
In common with all other new cars, Teslas have what are called "event data recorders" (EDRs). Similar in function to an airliner's black box, the data recorder keeps a constantly updated 5-second record of speed, accelerator and brake conditions, steering, and other data relevant to diagnosing a crash. In the event of a wreck, the last data set is preserved so that investigators can reconstruct the conditions leading up to the accident.
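The "constantly updated 5-second record" described above behaves like a ring buffer: new samples push out the oldest ones until a crash freezes the contents. The following is a minimal sketch of that idea, not a description of any real EDR. The class name, the field names, and the 10-samples-per-second rate are all assumptions made for illustration.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """One hypothetical snapshot of vehicle state."""
    t: float             # seconds since the start of the trip
    speed_mph: float
    throttle: float      # 0.0 (released) to 1.0 (floored)
    brake: bool
    steering_deg: float


class EventDataRecorder:
    """Keeps only the most recent `window_s` seconds of samples."""

    def __init__(self, window_s=5.0, rate_hz=10):
        # A deque with maxlen silently drops the oldest entry when full.
        self._buf = deque(maxlen=int(window_s * rate_hz))
        self._frozen = None

    def record(self, sample):
        # After a crash trigger, the preserved data is never overwritten.
        if self._frozen is None:
            self._buf.append(sample)

    def trigger(self):
        """Freeze the buffer (e.g. on airbag deployment) and return it."""
        if self._frozen is None:
            self._frozen = tuple(self._buf)
        return self._frozen


# Simulate 20 seconds of steady driving at the assumed 10 Hz, then a crash.
edr = EventDataRecorder()
for i in range(200):
    edr.record(Sample(t=i / 10, speed_mph=60.0, throttle=0.2,
                      brake=False, steering_deg=0.0))
snapshot = edr.trigger()

# Samples arriving after the trigger are ignored.
edr.record(Sample(t=20.0, speed_mph=0.0, throttle=0.0,
                  brake=True, steering_deg=0.0))
```

Only the final five seconds survive in `snapshot`, which is what lets investigators reconstruct the run-up to the wreck without the recorder having to store the whole trip.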
But Tesla cars go way beyond the EDR minimum. Every minute, the car's GPS location and certain other data are recorded. And when (not if) the car next gets in touch with its designated wireless hub, it uploads an anonymized version of the data to Tesla HQ through the Internet. Technically, the car's owner is not linked to the randomized ID number that accompanies the upload, according to an engineer under the alias of Green, who has examined scrapped Teslas (as well as the one he owns) to determine what the famously close-mouthed company is doing. But as Green points out, if you have anonymized data showing that the car leaves a certain residential address at 8 every morning and returns there at 5 every evening, it's not going to be hard to figure out whose car it is.
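Green's point about re-identification can be made concrete with a few lines of code. The sketch below assumes a hypothetical log of (hour, latitude, longitude) pings attached to one randomized ID, and guesses the owner's home as the place the car most often sits overnight. The function name, the data layout, and the coordinates are invented for illustration and say nothing about Tesla's actual upload format.

```python
from collections import Counter


def likely_home(pings, night_hours=range(0, 6), precision=3):
    """Guess a home location from anonymized (hour, lat, lon) pings.

    Rounding to 3 decimal places groups pings into cells roughly
    100 meters across; the most common overnight cell is the guess.
    """
    cells = Counter(
        (round(lat, precision), round(lon, precision))
        for hour, lat, lon in pings
        if hour in night_hours
    )
    if not cells:
        return None
    return cells.most_common(1)[0][0]


# Made-up pings for one randomized ID over five days:
# parked at one address overnight, parked somewhere else at midday.
pings = []
for day in range(5):
    pings.append((2, 30.2672, -97.7431))    # 2 a.m., in the driveway
    pings.append((4, 30.2671, -97.7432))    # 4 a.m., same driveway
    pings.append((12, 30.2850, -97.7350))   # noon, at the office

home = likely_home(pings)
```

Once `home` is known, looking up who lives at that address takes no special skill, which is why a randomized ID alone offers little real anonymity.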
Besides the location data, the vehicle's Autopilot system can run in something called Shadow Mode, according to Tesla's former head of AI, Andrej Karpathy. While the human driver is in control, Autopilot pretends to drive the car and compares its steering and control outputs with what the human actually does. When there's a discrepancy, Autopilot can take a data sample, including camera images and other details, and upload it to Tesla HQ to enable continuous improvement of the Autopilot algorithms. Multiply this by the several million Teslas on the road, and you have the world's best test bed for improving autonomous-driving software. This is yet another example of the tech world's powerful largest-network advantage: once a player in a networked system gets to be the biggest, it has a huge edge over the other players, because every additional node makes the whole network more valuable.
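The Shadow Mode comparison can be sketched in a few lines. This is only an illustration of the general idea of flagging disagreements between a human and a model; the function name, the use of steering angle alone, and the 5-degree threshold are assumptions, not details of how Autopilot works.

```python
def shadow_mode_flags(frames, threshold_deg=5.0):
    """Return the indices of frames where the model disagrees with the human.

    Each frame is a (human_steering_deg, model_steering_deg) pair. In a
    real system a flagged frame might trigger an upload of camera images
    and other context; here we only collect the index.
    """
    flagged = []
    for i, (human, model) in enumerate(frames):
        if abs(human - model) > threshold_deg:
            flagged.append(i)
    return flagged


# Four made-up frames: the model agrees closely in frames 0 and 2,
# but would have steered very differently in frames 1 and 3.
frames = [(0.0, 0.5), (10.0, 2.0), (-3.0, -3.5), (0.0, -8.0)]
flagged = shadow_mode_flags(frames)
```

The design choice worth noticing is that only the disagreements are kept, so the fleet sends back the rare, informative cases rather than millions of hours of uneventful driving.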
Of course, Musk and his engineers say that is the only reason they're collecting all this data: to improve the Autopilot system. But it's come in handy in court at least once, when the father of a teenager who died in the fiery crash of his Tesla sued the company. Tesla was able to present the judge with a detailed catalog of many times when the driver tore around town at up to 130 MPH, establishing that the teen was not driving responsibly.
In fairness to Tesla, the company is only doing what any sensible company would do in the same situation. If Ford or Volkswagen had happened to climb to the top of the U.S. electric-vehicle heap first with an autonomous car, it would probably be doing more or less the same data-gathering. In principle, a Tesla owner can decline to have any Internet connection made to the car, but no one knows of any owner who has actually done this. This is probably because the intersection of (people who buy Teslas) and (people who don't want their hardware connected to the Internet) is the empty set.
Should we worry about Tesla, or any other car company for that matter, collecting huge piles of data on where we drive every minute, and how fast we drive, and how safely we drive? There are two entities that have strong reasons to access this data, and the main concerns may come from them.
The first entity is government: Federal, state, and local. Already, state governments are beginning to wonder how they will keep collecting highway-tax revenue as more drivers turn to electric vehicles, which completely evade the X-cents-per-gallon gasoline tax that has up to now been a mainstay of highway funding. It's always seemed to me that if you take a libertarian point of view, the people who use the roads should pay for them. Up to now, it was impractical to tell who was using which road, but as more cars get equipped with follow-me-everywhere software, the technology to assess road taxes by miles driven wouldn't be that hard to build. But for various political reasons, the states seem instead to be leaning toward a flat annual tax on electric vehicles that will more than make up for the lost gasoline-tax revenues.
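To see how a flat annual fee can "more than make up for" lost gasoline tax, it helps to work out what a comparable gasoline car pays in a year. The sketch below does that arithmetic. Every number in it is a placeholder chosen for round results, not a real tax rate, mileage figure, or fee from any state.

```python
def annual_gas_tax(miles_per_year, miles_per_gallon, tax_per_gallon):
    """Gasoline tax paid in a year by a car with the given fuel economy."""
    gallons = miles_per_year / miles_per_gallon
    return gallons * tax_per_gallon


# Placeholder values, for illustration only.
MILES_PER_YEAR = 12_000
MILES_PER_GALLON = 30
TAX_PER_GALLON = 0.40     # dollars per gallon (hypothetical)
FLAT_EV_FEE = 200.00      # dollars per year (hypothetical)

gas_tax = round(annual_gas_tax(MILES_PER_YEAR, MILES_PER_GALLON,
                               TAX_PER_GALLON), 2)
# The per-mile rate that would raise the same revenue from an EV.
per_mile_equivalent = gas_tax / MILES_PER_YEAR
# A positive value means the flat fee collects more than the gas tax did.
ev_overpayment = round(FLAT_EV_FEE - gas_tax, 2)

print(f"Gas tax paid by the comparable car: ${gas_tax:.2f}")
print(f"Flat fee minus gas tax: ${ev_overpayment:.2f}")
```

With these made-up inputs the gasoline car pays $160 a year, so a $200 flat fee would collect $40 more from the electric-vehicle owner, regardless of how few miles that owner actually drives.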
The other entity that would like to get its hands on the data is the auto-insurance industry. It's not hard to imagine developing algorithms that would take in a year's worth of digital driving data on you and assess a personalized insurance cost that would precisely reflect your driving habits. This would be very popular with safe drivers and highly unpopular with the other kind. Of course, as Autopilot and its ilk get better, the insurance companies are going to have to deal with increasing numbers of vehicles driving themselves, and the liability implications of that situation are far from being sorted out. But it's likely that the insurance industry will develop some kind of certification process that you'll have to deal with in order to obtain insurance on a car with a given type of autonomous driving capability.
Finally, there is the general creepiness factor that some software somewhere knows where you've been. But as we've gradually gotten used to that with mobile phones, I suppose it won't be much different if our cars know what our phones know already.
For now, just being aware that this data gathering is going on may be the most we can do about it. But while improving autonomous-vehicle software is a laudable goal, it won't be surprising if hackers or other malevolent actors eventually exploit the data stream that Tesla extracts every day from its cars.
Sources: "The Radical Scope of Tesla's Data Hoard," by Mark Harris, appeared on pp. 40-45 of the Oct. 2022 print edition of IEEE Spectrum.