Notice and Takedown in Everyday Practice – Data Release

Our coded samples from Study 2 and Study 3 in the research report, Notice and Takedown in Everyday Practice, as well as a .txt file containing our statistical calculations, are available here. Anyone may download and use the data and calculations as long as they agree to make certain research ethics disclosures and acknowledge that underlying data that exist on the Internet may change or disappear over time (“link rot”).

Additional documentation about the variables we used to code the data is available here. The variable documentation is available to anyone under a Creative Commons Attribution 4.0 International License.

Please see the FAQs below for more detailed information.

FREQUENTLY ASKED QUESTIONS

Q: What data are you making available?
A: We are making available our coded samples from Study 2 and Study 3 in the research report, Notice and Takedown in Everyday Practice. We randomly sampled data from a broader dataset (described in the report) that we extracted from the Lumen database; these are the samples we obtained. The samples are “coded” with the metadata we collected about or assigned to each takedown request.

For example, a takedown request in our sample might be coded with the following variables: sender name, sender type, copyrighted work, alleged infringer type, whether the notice exhibits questions related to identifying the works in question, whether the notice is proper subject matter for copyright takedown, and so on. Additional details about our methodology for Study 2 and Study 3 are in Appendix C of Notice and Takedown in Everyday Practice.

Q: Are you making any other information available?
A: Yes! We are also making available detailed documentation about the variables we used to code the data. In addition, we are making available the statistical calculations we used, in a .txt file that can be converted to a .do file or to a file for another statistical package. The variable documentation is here, and the statistical calculations are available with the coded samples, here.

Q: What data are available elsewhere?
A: The questions upon which we based our survey and interview questions in Study 1 can be found in Appendix B of Notice and Takedown in Everyday Practice. The raw data in our full Study 2 and Study 3 dataset are available at Lumen, and information about how we extracted our dataset can be found in Notice and Takedown in Everyday Practice.

Q: Can anyone access the data you are making available here?
A: Yes. The data (coded samples and statistical calculations) will be available to anyone who agrees to make certain research ethics disclosures, including:

  • disclosure of funding sources,
  • disclosure of any influences or limitations on methodology or reporting, and
  • disclosure of methods.

Further, you must agree to “pay it forward” by requiring the same of anyone with whom you share the materials. The terms are here. Finally, you must separately acknowledge that “link rot” may affect the data (see the FAQ about this below). There is no cost for accessing or using the data.

The variable documentation is available to anyone under a Creative Commons license. The documentation is here.

Q: Why ask people who want to use your coded data to agree to these research ethics guidelines?
A: In many areas of research, notably medical research, ethics rules that help ensure the objectivity and independence of researchers are now the norm. Where these rules have not been the norm, there is evidence that bias in favor of funders or other supporters can creep in, even if researchers do not intend it. Intellectual property research has not previously had ethics norms in place, but this is changing for the better. We now have a beginning set of norms, set out in the 2016 Open Letter on Ethical Norms in Intellectual Property Scholarship. These include disclosing all sources of research funding, disclosing data where possible, and refusing to accept any limitations on reporting.

We believe in these norms, and have followed them in our research. We would like to encourage the growth of these norms across intellectual property research. We are asking those who want to use our data to at least let others know whether they are following the norms.

Q: Why ask us to acknowledge issues with link rot?
A: We want to be sure anyone who uses the coded samples understands that the underlying information (especially the allegedly infringing material) exists on the Internet and may disappear or change over time. We experienced “link rot” when we were working with the data, and we expect it may get worse over time. We don’t want anyone to assume that our coded samples can always be replicated from what is on the Internet at the time someone else is using them.