Imagine that you are a senior at one of the most demanding universities in the world. After countless nights of hard work and sleep deprivation, you are finally just weeks away from graduating. Getting to this point has been the culmination of years of your life and the support of many communities and institutions. You’ve demonstrated that you will do whatever it takes to get that cap and gown, no matter how uncomfortable.

The university takes advantage of this situation to administer the senior survey, a lengthy series of questions designed to examine and evaluate “your experiences as a student at Princeton.” Until the survey is complete, a senior will not receive their cap and gown, senior jacket, or tickets to graduation. The survey questions get personal. One question, which has caused anxiety and bewilderment among students, asks participants to list their close friends and rate each one on a scale of closeness. Social data is one of the most easily misused kinds of data. By making the survey mandatory, the university coerces students into handing over information that helps the university but could easily harm students if not handled with proper care.

It may surprise you, then, that the senior survey is not protected by the same independent ethics oversight that faculty research, senior theses, or even class projects require. Usually, research involving human subjects must first pass through an Institutional Review Board (IRB). The IRB typically requires the full informed consent of research subjects: they must know what possible risks and benefits the research involves and choose freely to participate based on that understanding. Even in cases where these requirements don’t apply, researchers still have a responsibility to inform participants in other ways. If participants are not fully informed, they can unknowingly expose themselves to harm. And if the senior survey is mandatory, to what extent can participants be said to have voluntarily consented to the research?

When I first encountered the survey, I was in the midst of producing a senior thesis on ethics procedures in online research. Concerned about the way the survey was being handled, I had a conversation with an administrator in the Office of Institutional Research (OIR), who declined to go on record. According to this source, the office has submitted the senior survey to the IRB over the years, but the survey does not technically require IRB review because it is specific to Princeton and does not seek to produce generalizable knowledge.

From a legal standpoint, this is a reasonable answer; however, it’s not a satisfying one for me. Ethics procedures are motivated by two things: their legal basis and their moral basis. While the legal basis of informed consent comes from federal regulation, its moral basis is the principle of respect for autonomy: people have a right to decide whether to participate, and to withdraw. To be able to make this determination, people have to understand the risks and benefits of their participation; they have to be informed. Risks could include improper use of data or emotional discomfort, while benefits could include useful answers to research questions that will improve the participants’ condition in the future. The OIR is legally compliant with research regulations and has minimized its liability. But by skipping informed consent, the university is doing the bare minimum toward its moral obligation to protect people’s freedom to choose whether participating in research is worthwhile for them.

The senior survey did not fulfill this obligation, which resulted in measurable discomfort for the Class of 2018. The unnamed administrator justified the fact that the survey is mandatory with the fact that each individual question on the survey is optional. I wanted to know: was this effectively communicated, or was there a failure to inform? To find out, I ran my own survey. Of the respondents to my survey, only 34.88% (± 7.81% with 95% confidence) understood that all the questions were optional. In other words, by this estimate only about a third of the class was properly informed.
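
For readers who want to check a figure like this one, the margin of error for a survey proportion can be sketched as below. This is a minimal illustration, not the calculation actually used for this article: it assumes the standard normal approximation with a finite population correction, and the respondent count (129) and class size (1,300) are hypothetical values chosen for the example, since neither number is reported here.

```python
import math


def margin_of_error(p_hat, n, population=None, z=1.96):
    """Approximate margin of error for a sample proportion.

    p_hat      -- observed proportion in the sample (0 to 1)
    n          -- number of respondents
    population -- size of the full group being sampled; if given, a
                  finite population correction is applied
    z          -- critical value (1.96 corresponds to 95% confidence)
    """
    # Standard error of a proportion under the normal approximation.
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    if population is not None:
        # Sampling a sizable share of a small population shrinks the error.
        se *= math.sqrt((population - n) / (population - 1))
    return z * se


# Hypothetical inputs, assumed for illustration only.
respondents = 129      # assumed number of survey responses
informed = 45          # assumed number who knew every question was optional
class_size = 1300      # assumed size of the senior class

p_hat = informed / respondents
moe = margin_of_error(p_hat, respondents, population=class_size)
print(f"{p_hat:.2%} +/- {moe:.2%}")
```

Under these assumed inputs the margin comes out close to eight percentage points; a smaller sample or a larger class would widen it.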

The OIR’s justification for bypassing a consent process is unrealistic at best, irresponsible at worst. Princeton students are smart; if they can’t understand a message that should be clear, there’s a problem. Given that a large majority of students were not properly informed that they were not required to answer every question, many felt compelled to answer every one. Further, those who understood they could opt out had trouble doing so because of the survey’s design. “I later decided that I didn’t want to answer some of those questions,” said one student. “[However,] I was unable to ‘unanswer’ questions I had already put down a real answer for (which was almost everything).” This participant decided that they wished to withdraw their consent partway through the survey. The fact that the survey as a whole was mandatory to submit prevented them from easily exercising this right.

The widespread student discomfort with the friendship question in particular demonstrates the danger in interfering with the ability to opt out of the survey. One student was concerned that they were not provided enough information or context on the intent of the questions: “The questions about my network of friends, and the section that asked me to fill out people that I know specifically, really freaked me out. I wish they had been more clear about which aspects were required and also with how these data would be used in the future. I feel as if my privacy and anonymity will not be respected by Princeton.” Another said, “I felt like I was in a psych study. It seemed like a totally separate thing from the point of the original survey, so I felt like that was not what I signed up for.” The first respondent felt that they were not adequately informed about what the research entailed; the second felt that they had not consented to the questions asked.

The friendship network question was a new addition to the survey in 2018. According to the administrator, it is based on questions appearing in research by a professor in the Department of Psychology. (Remember, the senior survey is “not research.”) However, a source in the department denies that this professor is involved with the senior survey. If the questions are indeed sourced from the professor’s IRB-approved research, then this professor is the person most qualified to handle the friendship data safely, and their lack of involvement in the survey procedures alarms me.

Students are uncomfortable with the questions in large part because they don’t know how their data will be used, yet understand that it could easily be used against them. Recent data privacy scandals like Cambridge Analytica have brought these issues into public consciousness. Underlying these recent events is the fundamental problem that data can’t be controlled once it’s out in the world. People understand that data misuse happens but aren’t fully equipped (that is, aren’t properly informed) to imagine the full extent of ways their data can be used beyond its original intent. For example, when the “Tastes, Ties, and Time” dataset of social relations at an “anonymous northeastern American university” (later revealed to be Harvard) was released to the public in 2008, a third party was able to link supposedly anonymous data back to individuals using only publicly available information. Senior survey responses are associated with student IDs, and the survey requests metadata, such as campus group memberships, that can be easily cross-referenced with public information. If even professional researchers can’t predict how data will be used once it leaves their control, how can students trust Princeton with their data when the senior survey procedures are completely opaque to participants?

It is especially important that we improve this state of affairs because of the clear and present risks the data engenders. Consider an incident from 2016 where MIT dismantled the Senior House residential community, a drastic move which disproportionately targeted low-income and underrepresented minority students. As justification, the administration used misleading analysis methods to cite statistics from a survey. This survey was conducted with poor ethics procedures not unlike Princeton’s senior survey. Theirs purported to benefit students by helping the institute understand mental health issues; however, the institute did not inform students that residence metadata was being collected, or how the results would be used. It collected data on students’ dorm of residence, which is closely tied to social groups; the data was then used to misrepresent and break apart affinity groups of vulnerable students. If abuse of data can happen at MIT, it can happen here.

Of course, it’s possible to imagine this data being put to beneficial use. If the university were able to articulate a clear set of empirical hypotheses well suited to the survey questions, we might expect that measuring the effects of university initiatives (for example, seeing whether the quantity and quality of cross-racial friendships increase over time) would help us improve the Princeton community. This sort of measurement is essential to understanding which changes are actually helpful to students. However, the OIR is non-specific about how the data will be used toward that goal. While there’s no reason to believe the administration doesn’t intend our well-being, there is also no way for us to publicly verify its intentions or confirm that its work is helpful to students, unless we build processes for accountability and trust into our institutional research.

Because these processes don’t exist, we are very limited in our capacity to understand the OIR’s work, its intentions, and its analysis methods. Because the administration will not answer questions on record, students are left to speculate on the institution’s behalf. It is not our responsibility to make its case for it. The OIR used to produce a centralized report every year, but that report was discontinued. As it stands, the results of the survey are scattered across various committee and task force reports, making them difficult to access. Even so, I’m optimistic that there is opportunity for constructive dialogue between students and administrators about accountability structures for institutional research.

Now is the best time to decide how this dialogue takes place. Conversations about research ethics are reaching a crisis point in 2018 as concerns about data production in society become increasingly widespread. To maximize the likelihood of productive dialogue with the university that has momentum and direction, I urge the administration to consider several changes.

First, implement a clear informed consent process. The university will reduce the risk of the survey if it properly informs participants and respects the right to withdraw. This means prominently marking the survey as optional and providing specific information on what purpose the questions serve. The survey would not be less useful if it were optional. The research questions I expect the OIR is asking do not statistically require surveying the entire class instead of a representative sample. Furthermore, according to my investigation, the OIR is already drastically overestimating the survey’s completion rate. Consent helps everybody win: students will be much happier cooperating with the university’s goals if equipped with a clearer understanding of how those goals benefit them.

Once that small step is made, it could be time to review and prune the existing questions on the survey. Since the survey has been accumulating questions for 20 years, is there anyone who can articulate exactly why each question is there? Are there questions that are redundant with one another? And for riskier questions like the friendship question, the OIR should be asking themselves what they’re trying to learn, whether what they’re measuring actually helps them do that, and whether the way they’re measuring is the only way. Removing redundant questions minimizes the time cost to participants, and replacing riskier questions with less risky ones that accomplish the same thing will minimize the possibility of unintended data misuse.

To prevent the survey from continuing to expand in the future, there could be publicly communicated standards for adding new questions. I’m not convinced that the university can clearly articulate to itself, much less to us, what empirical hypotheses are motivating the survey questions and their analysis methods. Being asked to justify the addition of new questions publicly will help the institution avoid the temptation to collect data just because it can. This will reduce the risk of data misuse and help the OIR think about the context required to properly inform participants.

Looking toward the future, why don’t we have a transparent archiving process for the survey data? This adjustment could transform the survey into a valuable resource enabling future researchers to do impactful work improving the university. Imagine having snapshots of what life at Princeton was like at given moments, stretching back for decades. Making the analysis and results publicly available will not only make that a reality, but also help students verify that they are being represented well by their data, and encourage the OIR to think deeply about the risks of each question.

When we talk about research and data production, what’s at stake is the university’s capacity to understand us as students. One respondent to my survey wrote, “I have this weird optimism that maybe if enough seniors take the survey seriously, we can make Princeton a better place.” Data production turns experiences into numbers, friendships into facts. Will our data represent us in the way that we want to be understood? Will it help the university see these four years of our lives not only as they are, but as they could be? We the Class of 2018—and all classes to come—should have the power to decide.