Nov 3, 2009

open data [collection]

In my social network, Ton has been the person who has taught me a great deal about the usefulness of Open Data (OD), although he might not be aware of that [conversations with Lilia :)]. My first thoughts about open data concerned my research activity and what sharing the collected data would mean in terms of ethical issues, namely the protection of human subjects. My hesitation was about that, and not because I have any personal problem with sharing.

I believe in sharing, but I also believe that we should talk more about open data in research, in the social sciences and about ethical issues in particular. Apart from changing research behaviours, we are still collecting data from real people, and we need to know more about open data so that we can still protect human subjects while conducting our research. But then, if we treat them as participants and not as subjects, does it change our obligations to protect them? As researchers, we are responsible for concealing information that our participants have given us and that, in our judgement, might be harmful to them, are we not? [side question: if/when methods are taught by a science researcher, will the teacher warn students about human research ethics?]

When studying organizations, in order to gain better access to the people in the organization(s), the researcher or team can decide, a priori, to grant anonymity and place restrictions on the data set. This can even mean that the lead researcher signs a declaration guaranteeing that the data collected will only be used for that specific study. Following these guidelines is expected to increase the organization's trust in the researcher(s). On the other hand, if we are receiving public or European funding, it makes sense that deliverables, papers and data are given back to society, hence open source, open access, open data and, in general, the shift towards the open movement that we have been witnessing for some time.

In order to use open data in social studies, one must incorporate this view while thinking about the research design, to define what is going to be part of the open data set. We know that our choices always determine what we will get and how we can get it. We also have to think about when to give access to the data collected. The social side of research means that releasing the data during the study can influence the behaviour of the people we are studying and hence affect the study results. Besides data, there are so many of us using social tools in the research process (Flickr for ease of coding photos, CiteULike for collecting papers, blogs for work in progress, SlideShare for...) that maybe we just have to make it part of the study...
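To make the design decision above a bit more concrete, here is a minimal sketch in Python of what "defining the open data set" could look like in practice. Everything in it is an assumption made for illustration: the field names, the embargo date and the helper functions are hypothetical and do not come from any particular study or tool. Note also that a salted hash only pseudonymizes an identifier; the remaining fields can still identify someone in a small organization, so this does not replace the researcher's own judgement.

```python
import hashlib
from datetime import date

# Hypothetical choices made at research-design time (assumptions, not a standard):
# which fields may be published, and when the data may be released.
OPEN_FIELDS = {"role", "years_in_organization", "tool_used"}
EMBARGO_UNTIL = date(2010, 6, 30)  # e.g. the planned end of fieldwork


def pseudonym(participant_id: str, salt: str) -> str:
    """Replace a participant identifier with a short salted hash."""
    digest = hashlib.sha256((salt + participant_id).encode("utf-8")).hexdigest()
    return digest[:12]


def to_open_record(record: dict, salt: str) -> dict:
    """Keep only the fields marked as open and swap the ID for a pseudonym."""
    open_record = {k: v for k, v in record.items() if k in OPEN_FIELDS}
    open_record["participant"] = pseudonym(record["participant_id"], salt)
    return open_record


def can_release(today: date) -> bool:
    """Data is released only after the embargo, so it cannot influence the study."""
    return today > EMBARGO_UNTIL


if __name__ == "__main__":
    raw = {
        "participant_id": "p01",
        "name": "(withheld)",
        "role": "manager",
        "years_in_organization": 4,
        "interview_notes": "(withheld)",
    }
    print(to_open_record(raw, salt="keep-this-secret"))
    print(can_release(date.today()))
```

The salt has to stay private to the research team, otherwise the pseudonyms could be recomputed from a list of known identifiers.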

Funny, when I thought of this post I just wanted to write a quick note about the Open Data Kit (*) as a set of useful tools for collecting data during fieldwork... and ended up turning my 15-minute exercise into a public post ;)
(*) So, the tools currently available in the Open Data Kit (ODK), according to the information on the project page:

  • ODK Collect - phone-based replacement for paper forms, built on the Android platform. It can collect a variety of form data types, like text, location, photos, video, audio, and barcodes. This functionality could be very useful for fieldwork data collection in social science studies.
  • ODK Aggregate - although not officially launched, it "provides a ready to deploy online repository to store, view and export collected data (...) and enables free hosting of data on Google", if one does not have the required hosting infrastructure.
  • ODK Manage - for managing the transfer of forms, data and applications between mobile phones, using SMS.
  • ODK Validate - validates forms to ensure they work well with all the ODK tools.
  • ODK Voice - also not officially launched, it facilitates "(...) mapping XForms to sound snippets that can be played over a "robo" call to any phone. Responses are collected using the phone's keypad (DTMF) and are automatically aggregated."

1 comment:

  1. Came across this post today and found out that the ODK page is no longer available. This is a serious issue, given that the Open Data Kit was released 7 years ago but did not survive to be used today.

    Data repositories must be made to last, or else we will be wasting our precious time on data management that only serves us for a very limited time.