[arch-general] [arch-dev-public] AUR ToS (aka making AUR user names public)

YANG Ling yangling1984 at gmail.com
Tue Mar 7 03:08:26 UTC 2017


Mauro Santos via arch-general <arch-general at archlinux.org> writes:

> On 05-03-2017 13:35, Lukas Fleischer wrote:
>> Hi,
>>
>> I was recently contacted by a Polish researcher asking for a list of AUR
>> account names. I did not expect this to be controversial but a couple of
>> Trusted Users raised concerns on IRC, so I decided to move this to the
>> public mailing list and discuss the whole topic in generality. I would
>> like to hear more opinions, but please read the whole email and give it a
>> second thought before simply bringing up the usual privacy arguments
>> mentioned below.
>>
>> My original question was: Are we fine with sharing the list of AUR
>> account names (only user names, no real names or email addresses) with
>> a researcher that seems trustworthy and agrees to not share the data in
>> any form other than the resulting anonymized statistics?
>>
>> In this particular case, we are talking about Dorota Celinska [1] from
>> the University of Warsaw, Faculty of Economic Sciences [2], see [3] for
>> a list of her publications and [4] for a summary of her research project
>> funded recently by the Polish National Science Centre. She needs the
>> list of user names to perform a segmentation analysis, including users
>> who were active on the older AUR releases but do not show any
>> activity on AUR 4. She would also like to use the user names as
>> identifiers to establish connections with other platforms, such as
>> GitHub.
>>
>> The next question is: Would it make sense to even make this data
>> publicly available? Would it make sense to extend our RPC interface such
>> that one can search for user names? GitHub, for example, already
>> provides such an interface [5]. Let me quickly summarize some arguments
>> for this idea which came up on IRC:
>>
>> * User names are mostly identifiers. It is questionable whether they
>>   can/should be considered personal/private information. Maybe this can
>>   only be answered by a lawyer, though.
>>
>> * The user names of all accounts with any kind of public activity, like
>>   uploading a package, filing a request, writing a comment, are public
>>   already.
>>
>> * After logging into the aurweb interface, you can already check whether
>>   an account with a given user name exists because the account details
>>   page URIs have the form https://aur.archlinux.org/account/$username.
>>   This means that for any platform providing a list of user names (such
>>   as GitHub), you can "establish connections" with the AUR already.
>>
>> Now the arguments against:
>>
>> * Principle of data economy: We should not share any kind of information
>>   we do not need to share.
>>
>> * Sharing user names lowers the threshold for sharing other information
>>   which is considered more confidential.
>>
>> * Users can (and should) already use crawlers to fetch the user names.
>>   For example, the user names of all package maintainers and comment
>>   authors appear on the package details pages. The names of all users
>>   filing package requests appear in the mailing list archives etc.
>>
>> * We do not have ToS, so we had better not share anything.
>>
>> I, personally, find the second-to-last argument a very weak one. Telling
>> users to build crawlers scraping and brute-forcing our HTML pages makes
>> life difficult for both them and us. What do you think?
>>
>> On the other side of the coin, the last argument is a very good one and
>> it brings me to my last point. Independently of the outcome of this
>> discussion, I think we should add some ToS that users need to agree upon
>> when registering. It should contain information on liability and on
>> privacy. Is anybody willing to write a draft? Do we need the support of
>> a lawyer here?
>>
>> Thank you for your time and have a nice Sunday!
>>
>> Regards,
>> Lukas
>>
>> [1] http://coin.wne.uw.edu.pl/dcelinska/en/
>> [2] https://www.wne.uw.edu.pl/index.php/en/
>> [3] http://coin.wne.uw.edu.pl/dcelinska/en/pages/publications.html
>> [4] https://ncn.gov.pl/sites/default/files/listy-rankingowe/2016-03-15/streszczenia/337724-en.pdf
>> [5] https://developer.github.com/v3/users/
>>
>
> I'd say err on the side of caution and don't share. Even though the
> usernames are public and easy to find by scraping them from the
> website/mailing list/etc., handing over the whole database of usernames on a
> silver platter is a whole different story, and that is what is being asked.
> Is there any community/website that provides a full list of registered
> usernames on request?
>
> There is also the question of how useful that data would be. Without any
> other data, such as email addresses, the username list is useless: you have no
> guarantee that user foo on GitHub is the same person as user foo on the
> AUR/Wiki/Forum or user foo somewhere else. In this case I'd also have to
> agree that sharing usernames lowers the threshold for sharing other
> information.
>
> It also doesn't fit with their stated research goals: only GitHub and
> projects associated with scraping data from GitHub are mentioned, so why
> would they want to throw the AUR usernames into the mix?
>
> --
> Mauro Santos

Hi all,
Shall we focus on Lukas's questions?

>> My original question was: Are we fine with sharing the list of AUR
>> account names (only user names, no real names or email addresses) with
>> a researcher that seems trustworthy and agrees to not share the data in
>> any form other than the resulting anonymized statistics?
→ The first question: Are we fine with sharing the user names?

>> The next question is: Would it make sense to even make this data
>> publicly available? Would it make sense to extend our RPC interface such
>> that one can search for user names? GitHub, for example, already
>> provides such an interface [5]. Let me quickly summarize some arguments
>> for this idea which came up on IRC:
→ The second question: Would it make sense to even make this data publicly available?

>> I think we should add some ToS that users need to agree upon
>> when registering. It should contain information on liability and on
>> privacy. Is anybody willing to write a draft? Do we need the support of
>> a lawyer here?
→ The third question: Shall we add some ToS that users need to agree upon when registering?

My opinions:

1. The first question: Are we fine with sharing the user names?
   I am fine with it, but I think some agreements should be made before
   sharing the data.
2. The second question: Would it make sense to even make this data
   publicly available?
   No, it is not OK. Please check this wiki page [1]: a login name or
   nickname is personally identifiable information (PII).
3. The third question: Shall we add some ToS that users need to agree
   upon when registering?
   Yes, it is better to have ToS.

[1]: https://www.wikiwand.com/en/Personally_identifiable_information

