Is proof of work a way to ethical telemetry?
Free and open source projects pride themselves on privacy. That is also one of the reasons users choose them. This development model has other advantages over the closed one as well: faster development, clean code, outside contributions and many more.
When it comes to privacy alone, one segment of FOSS stands out as the ultimate champion: offline programs. If you obtained the installation files privately, you’re covered. It doesn’t matter what your ISP, cloud provider or company administrator is up to. If no traffic leaves your device, there can hardly be any tracking at all. Users like this and have come to expect it.
FOSS has a problem
Most GUI FOSS software is written by a single person. That person is not only the benevolent dictator but also rarely has the occasion to discuss a feature with anyone before implementing it.
This person obviously has to know how to code, so they can be called a programmer. Programming is the only skill strictly needed to write a program. Now, what is the chance that this person knows at least the basics of good UI design?
Let’s reverse the question. If a person is a UX expert, what is the chance that this person will also become the maintainer of a FOSS app? Slim, I would say.
There is also another thing: users don’t report usability problems. They are not bugs, so why bother? Bug trackers are filled with actual bugs. Would you open an issue titled “Post button is too small”? First, it sounds like your problem and not a software issue. Second, it is really minor. The maintainer surely has more burning issues to resolve.[1]
Corporate is immune
If a corporation commits to building a new program, there is always more than one person assigned to the task. And more often than not, if the project is big enough, the employer will make sure at least one person knows something about UX. That fixes the first issue.
Feedback is also available. We rarely see a closed-source program that doesn’t include telemetry. It provides usability data about interfaces that have already been internally tested and developed in a strict pipeline.
No wonder closed-source UIs win.
We (the community) cannot provide additional developers for many of these projects. Nor can we stress the importance of user experience any more than we already do (and in the end the maintainer will do what they want anyway). But we can find ways to give developers feedback.
Never done correctly
Telemetry has already been incorporated into free and open source codebases a few times. Nevertheless, it always:
was met with a lack of applause, sometimes even anger from the community;
was done in a way that did not respect users and could be treated as a regression.
The community’s reactions are very understandable. It is proprietary software’s fault that “telemetry” is considered almost a swear word. After all, telemetry is generally used for business intelligence and user tracking (to later serve ads or sell the data). Obviously, this is unfit for privacy-focused projects.
To give an example, take Audacity. After its acquisition by Muse Group, the project tried to include telemetry twice. As the developers failed to understand the community’s concerns, this projected a negative image of how the project was managed. As a result, a fork was created. All the controversies are linked in Tenacity’s repository.
For another example, take KDE. Here the software does not beg you to opt into telemetry; you can change it in the settings. Nevertheless, people are still concerned about where the data goes (as you cannot be sure what runs on remote servers). The details are in KDE’s telemetry policy.
As you can see, telemetry is a very touchy topic for a particular group of users (which is not a bad thing). If we were to include it in some hypothetical project, a lot of care would have to go into making the proposed solution privacy-friendly.
Current market solutions are not really up to the task, as they all rely on trusting the remote. This should not be the case, since a remote server can log your IP and then tie your data together. Given a few months of your history, you can be deanonymised, as this study shows: “Credit card study blows holes in anonymity”.
You could say:
But surely there is a way to not include any IP. Every user just needs a proxy.
Let’s say we do just that, either using the Tor network or some other peer-to-peer solution. This way, however, the data can no longer be trusted. As a developer I cannot be sure this telemetry wasn’t skewed by some script kiddie who really wants a bigger “Post” button. They could send a few months’ worth of data in an instant, and I would be helpless if that happened.
Protect the data!
To improve data integrity, we can attach a proof of work to every packet. This is very similar to how mCaptcha works. Here is a basic overview:
1. The client wants to send telemetry data to the server.
2. The packet is properly formatted and prepared to be sent (for example as a .json document). One integer field, “nonce”, is left to be filled in later.
3. The hash value of the entire packet is computed. This can be done using MD5, SHA-2 or any other algorithm of the developer’s choice (although you should probably lean towards something modern like SHA-3[2]).
4. If the hash does not meet the server’s requirements, the nonce field is changed (incremented by one) and the process goes back to step 3. The server’s requirements are hard-coded into the client and can be something like: the hash value must begin with “12345”[3]. This requirement serves no purpose other than making it hard to guess a nonce value that gives such a hash.
5. The packet with the correct nonce is sent to the server.
6. The server computes the hash only once to check whether it really begins with “12345”.
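The steps above can be sketched in Python. This is a minimal illustration, not a reference implementation: it assumes SHA-3-256 as the hash, a JSON packet with hypothetical `data` and `nonce` fields, and the “hex digest starts with a fixed prefix” rule from the example. The packet is serialized with sorted keys so that the client and the server hash exactly the same bytes.

```python
import hashlib
import json

# Assumption: the server requirement is "hex digest starts with this prefix",
# hard-coded on both the client and the server.
REQUIRED_PREFIX = "12345"


def packet_hash(packet: dict) -> str:
    """Hash the whole packet, nonce included.

    sort_keys and compact separators make the serialization deterministic,
    so client and server hash identical bytes.
    """
    encoded = json.dumps(packet, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha3_256(encoded).hexdigest()


def solve(data: dict, prefix: str = REQUIRED_PREFIX) -> dict:
    """Client side: increment the nonce until the hash meets the requirement."""
    packet = {"data": data, "nonce": 0}
    while not packet_hash(packet).startswith(prefix):
        packet["nonce"] += 1
    return packet


def verify(packet: dict, prefix: str = REQUIRED_PREFIX) -> bool:
    """Server side: a single hash computation, whatever the difficulty."""
    if not isinstance(packet.get("nonce"), int):
        return False
    return packet_hash(packet).startswith(prefix)


# Usage with an easier, hypothetical prefix so the search finishes quickly.
packet = solve({"event": "post_button_clicked"}, prefix="123")
print(packet_hash(packet)[:3])  # 123
print(verify(packet, prefix="123"))  # True
```

With the default prefix “12345” the client needs about a million attempts on average, which is why the usage lines use “123”. Note that `verify` costs one hash no matter how strict the prefix is.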
This ensures the client has to do far more work than the server and makes it much harder to send a lot of data at once. This way, normal users can send data at the rate it is gathered, while malicious actors are limited by their machines’ computational power.
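To put a rough number on “far more work”: assuming every hex digit of the digest is uniformly random, a fixed prefix of k hex digits matches with probability 16^-k, so a client needs 16^k attempts on average. The sketch below uses that assumption and a purely hypothetical rate of one million hashes per second to show how the cost grows with each extra required digit.

```python
def expected_attempts(prefix_len: int, alphabet: int = 16) -> int:
    """Mean number of nonces tried before a digest starts with a fixed prefix.

    Assumes each digest character is uniformly random, so a match happens with
    probability alphabet**-prefix_len; the mean of that geometric distribution
    is alphabet**prefix_len.
    """
    return alphabet ** prefix_len


def packets_per_hour(prefix_len: int, hashes_per_second: float) -> float:
    """Average number of valid packets one machine can produce per hour."""
    return 3600 * hashes_per_second / expected_attempts(prefix_len)


# Hypothetical hash rate; real numbers depend on the hash function,
# the packet size and the hardware.
HASHES_PER_SECOND = 1_000_000

for k in (5, 6):
    print(k, expected_attempts(k), int(packets_per_hour(k, HASHES_PER_SECOND)))
# 5 1048576 3433
# 6 16777216 214
```

At that assumed rate a valid packet costs roughly a second with a 5-digit prefix and about 17 seconds with a 6-digit one, and every extra digit multiplies the cost by 16.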
Not a DoS protection
Please don’t confuse this with DoS protection. The mechanism above protects integrity, not availability. There are different techniques to prevent server overload, such as discarding unusual traffic before processing it.
[1] If you are the kind of person who would report something like this, I’m glad. You’re making a valuable contribution to free software. However, surely you can understand the problem.
[2] Modern hash algorithms are not only safer from a cryptographic standpoint but also often faster to compute.
[3] Hash requirements are chosen arbitrarily. The stricter they are, the more time the client has to spend computing a valid hash. E.g. “ends with 6 ones” is a stricter requirement than “ends with 5 ones”. Computing time increases exponentially with each extra required character.