Analysis of Different Approaches to Contact Tracing

The following tables are categorized by features or parts of functionality.

Main criteria of this analysis for each feature:

Privacy
Accuracy of risk analysis
Amount of overall traffic
Ability to collect and analyze data

Privacy analysis is divided in two components:

Actual Privacy reflects who owns data and how it is used in reality
User Perceived Privacy reflects what users may think about protection of their privacy and ownership of their data based on the information they know about the application (for example, the app asking for access to user’s location in the background will be perceived as one that is constantly tracking all locations, regardless of what data is actually sent to server)

Geolocation Data Stored Locally

Approach	Details	Pros	Cons	Actual Privacy	User Perceived Privacy
Detailed and accurate (recommended)		Data can be shared later via any possible way with any precision.		High	Moderate*
Large areas only			Doesn’t give much value to reduce precision if data is stored only locally. Precision is lost - in case we need accurate data at some point, it won’t be available.	High	Moderate*

* In case data is stored only locally and we let users know that we don’t upload it to server.

Geolocation Data Stored on Servers

Approach	Details	Pros	Cons	Actual Privacy	User Perceived Privacy
All locations of all users		Allows very precise filtering and analytics. Minimizes notifications sent to people at risk.	Centralized storage of all locations of all users poses serious privacy and security risks.	Low	Low
Large areas only (recommended)	Geohashes with reduced precision	Allows some level of filtering when notifying users at risk.	Some security and privacy risks. Less precision when notifying users at risk.	Moderate	Low
No location	Store only locally on devices	No privacy or security concerns.	Either broadcasting from server or periodic polling from mobile apps required to notify users at risk.	High	Moderate*

* Assuming that we let users know that their location data is not shared with us.

Bluetooth Encounters Data Stored on Servers

Approach	Details	Pros	Cons	Actual Privacy	User Perceived Privacy
All encounters from of all users	Can be plain UUIDs	Allows very precise filtering and analytics. Minimizes notifications sent to people at risk.		Moderate*	Low
Only ids of confirmed cases (recommended)	Maybe plain UUIDs or public keys	Risk data may be reused / resent to newly onboarded users. Some analytics may be done (e.g. when confirmed, when recovered).	Broadcasting or server polling may be required for notifying users (if no geolocation is available on servers)	Moderate	Low
Nothing	Encounters stored only locally		Broadcasting or server polling may be required for notifying users (if no geolocation is available on servers). No analytics or look-back functionality available.	High	High**

* Only if the IDs are anonymized (without any link to phone, email etc.)

** In case we make clear to users that they fully control the data to be shared

Data Uploaded in Reports

Approach	Details	Pros	Cons	Actual Privacy	User Perceived Privacy
All data (recommended)	All BLE encounters for infected person, their UUID/key and all location history	Precise notifications (if we can match UUIDs/keys or geohashes to devices).		Moderate*	Low
Only location history			Cannot notify specific users. Less accuracy in risk assessment.	Moderate*	Moderate*
Only BLE encounters		Ability to notify only people at risk and high accuracy of risk assessment.	May require broadcasting if no data is stored on server.	Moderate*	Moderate*

* Only if we don’t store uploaded data and let users know about that

Other Data Stored on Servers

Approach	Pros	Cons	Actual Privacy	User Perceived Privacy
Phone numbers	Ability to contact people by phone. Guarantee that person is real and has Canadian number.	Security and privacy risks.	Low	Low
Emails	Less privacy risk than phone numbers, still allowing some level of verifying identity.	Some security and privacy risks.	Moderate	Moderate
Only device IDs (recommended)		No way to verify person identity. (can be mitigated by verifying infection reports through HA)	High	Moderate

Notifying Users at Risk

Approach	Details	Pros	Cons
Broadcasting to everyone		No need to store any ids/locations on servers.	Very high traffic volume with 99% data ignored. (Can be optimized by bundling and sending reports every X hours.)
Broadcasting by geohash			Need to store geohash (area) of each user.
Precise notification by BLE UUID/key		Only specific users at risk get notified	Need to store BLE UUID/key for each user.
Polling server (recommended)	Fetch list of geohashes or BLE UUIDs/keys periodically every X hours	No need to precisely send notifications or broadcast them. Maximum privacy - server is not aware of which users are at risk and cannot restore graph of all contacts.	Active cases (ids/geohashes) need to be stored on servers for a while. Downloading list may be problematic if number of active cases is high. App needs to be woken up periodically to poll from server.

Report Verification

HA - health authority

Approach	Details	Pros	Cons
Send report only with token from HA (recommended)	HA would get unique tokens from our servers	Easy for HA where they cannot use our apps for any reasons (codes/tokens can be send in email/ on paper)	Need to generate and distribute tokens somehow and make sure they are secure.
Let HA send report using QR code (recommended)	HA will use our mobile app or website where they need to be authorized	Easy for HA and user when they can access mobile apps or servers (get authorized)	Need to establish mechanism of authorizing HA representatives on our servers.
No verification (supported)	In this case we can handle such reports in different way (for analytics, “mild” notifications etc.)	No effort needed to interact with HA and authorize reports.	No guarantee that report isn’t false.

BLE Message Generation and Exchange

Approach	Details	Pros	Cons
Generate only UUID or hash on device	UUID can be constant per device or change every day (then history needs to be kept). To enable direct notifications for users at risk some data needs to be stored on the server (either UUIDs of all users or UUIDs of infected users).	Easy to generate and share.	If UUIDs are stored on the server, we can map each UUID to device ID which has moderate privacy concerns. Including any additional info about the encounter (when, where it happened) poses risks when data is broadcasted (without storing on server) because all devices will get it. UUID is plain and doesn’t provide any additional info. No analysis or limited analysis can be done (depending on what data stored on servers).
Generate public and private keys on device	Public key is shared to all encounters	Device of infected person can encrypt additional info about the encounter (e.g. when and where it happened) without sharing it with all devices, so only target device can decrypt the message.	No analysis can be done even if data is stored on server because it can be decrypted only by target device (of person at risk).
Server-generated public and private keys	Unique key is provided by server to each device and only server can decrypt the message when report is submitted.	We can send push notifications directly to target users if we store user IDs <-> device IDs on server and encrypt them with public key when exchanging with peers.	Requires refreshing public keys from server periodically. Requires storage of key pairs on server. Requires server to decrypt the message and have device IDs mapped to user IDs to send notifications to target users => moderate privacy concerns.
Keys derived from asymmetric key pairs on device (recommended)	As defined in TCN Protocol, devices exchange derived identifiers (TCNs) instead of public keys, but those identifiers are rotated frequently and can be restored later only from uploaded and signed infection reports.	Server or anybody else doesn’t know about any user’s location, contacts or exposure status. Users cannot send reports on behalf of other users. Contact identifier rotation minimizes possibility of passive tracking. Infected users who send reports do not reveal any personal information to anyone except time of contact.	Client devices need to poll reports from server periodically and large number of reports may cause large payload (few MB).