Replication materials for:
“Wiki Surveys: Open and quantifiable social data collection”

The wiki survey project seeks to develop new online methods of social data collection. This data and code release enables others to replicate and extend the results in Salganik and Levy (2015).

Data Release

This data release includes:

  • 6 raw data csv files
  • 6 pre-cleaned csv files
  • 4 csv files related to data cleaning
  • 6 R program files
  • 1 bash script
  • 1 documentation file in pdf format

With these data and code, one can reproduce all the results reported in Salganik and Levy (2015).

Public and Sign-up Versions

There are two versions of this data release: public and sign-up. Both require registration with the OPR Data Archive. In the public version, some of the information in the files has been obscured to make the data more difficult to de-anonymize. Our analysis can be reproduced completely with the public data.

Additional analysis may require the sign-up version of the data. To get access to the sign-up version, you will need to apply by filling out a separate sign-up form after you have registered and logged in to the OPR Data Archive. You will be asked to agree to several conditions and to submit proof that your research is overseen by an Institutional Review Board (IRB) or similar ethical oversight body.

More specifically, the differences between the public and sign-up data versions are:

  • In the public version files, the fields capturing the world wide web user-agent string and referring uniform resource locator (URL) have been cryptographically hashed. This change impacts the votes and nonvotes files.
  • In the public version files, all timestamps have been coarsened to only show the date, not the time. This change impacts multiple variables in the ideas, votes and nonvotes files.
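
To make the two changes above concrete, the following Python sketch shows one way a single row could be obscured. It is an illustration only and not the code used to prepare this release: the hash function (SHA-256), the timestamp format, and the field names (user_agent, referrer, created_at) are all assumptions, since this document does not specify them.

```python
import hashlib
from datetime import datetime

# Hypothetical field names; the actual column names in the
# votes, nonvotes, and ideas files may differ.
HASHED_FIELDS = ("user_agent", "referrer")
TIMESTAMP_FIELDS = ("created_at",)


def hash_field(value: str) -> str:
    """Return a hex digest of the value (SHA-256 is an assumed choice)."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def coarsen_timestamp(value: str) -> str:
    """Keep only the date part of a 'YYYY-MM-DD HH:MM:SS' timestamp (assumed format)."""
    return datetime.strptime(value, "%Y-%m-%d %H:%M:%S").date().isoformat()


def obscure_row(row: dict) -> dict:
    """Return a copy of the row with hashed and coarsened fields."""
    out = dict(row)
    for field in HASHED_FIELDS:
        if out.get(field):  # leave missing or empty values untouched
            out[field] = hash_field(out[field])
    for field in TIMESTAMP_FIELDS:
        if out.get(field):
            out[field] = coarsen_timestamp(out[field])
    return out
```

Because identical inputs produce identical digests, a hashed field can still be used to group rows that share the same user-agent string or referring URL, even though the original text is not shown.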


Please direct questions to Matthew Salganik, Department of Sociology and Office of Population Research, Princeton University, Princeton, NJ 08544, USA; mjs3@princeton.edu.



This research was supported by grants from Google (Faculty Research Award, the People and Innovation Lab, and Summer of Code 2010); Princeton University Center for Information Technology Policy; Princeton University Committee on Research in the Humanities and Social Sciences; the National Science Foundation [grant number CNS-0905086]; and the National Institutes of Health [grant number P32-CHD047879]. Some of this research was performed while Matthew Salganik was employed by Microsoft Research.
