Monday, July 13, 2015

Ethics of Harnessing Crowd-Sourcing Technologies

     I have always been in awe of the collective power created by connecting people through the Internet. Much of that power comes from crowd-sourcing. Crowd-sourcing is the collective accomplishment of a task by giving each member of a group a small segment of the work to complete. When each piece is finished, the individual parts are reassembled into a functioning product or information utility. Examples include crowd-funding, social networking, and the assignment of metadata to digital information to create meaningful content.

     When “Web 2.0” was built, the function of the Internet shifted. In 2006, Time Magazine chose “You” as the person of the year because of the amount of useful information being produced by the general population. Seamlessly, all around us all the time, we create information that is collectively changing the world. Some of the smallest things we do on the Internet are having the largest impacts. If we are the primary creators of Internet content, interesting ethical questions arise when owners of crowd-sourced products use our collective accomplishments in ways we did not intend. Technology continues to rapidly pervade the most intimate aspects of our lives, and lawmakers scramble to keep abreast of it. An important, modern, poorly documented, and sparsely discussed question arises: If we produce so much valuable content, how much of the resulting product do we actually own, and what is the difference between ethical and unethical use of the information we create?

     One positive example of crowd-sourcing is the reCAPTCHA project owned and run by Google (CAPTCHA stands for Completely Automated Public Turing test to tell Computers and Humans Apart). Google uses reCAPTCHA information for a variety of projects, including Google Books, which is digitizing scans of books for wider availability and distribution. Google is also using it to tag numbered street addresses for Google Maps and Google Street View.

     Google uses high-resolution digital cameras and Optical Character Recognition (OCR) software when it scans books or addresses. Words and numbers the OCR software cannot identify are sent to reCAPTCHA prompts on websites to be transcribed by humans. Luis von Ahn, co-creator of reCAPTCHA, says, “According to our estimates, humans around the world type more than 100 million CAPTCHAs every day” (“ReCAPTCHA: Human-Based Character Recognition via Web Security Measures,” 2008).

     Based on Mr. von Ahn’s estimate of how many reCAPTCHAs are processed per day, the following chart shows how long it would take to digitize famous literary works:

Figure 1: How long it would take to digitize famous novels, based on their word counts and the daily reCAPTCHA usage estimated by Luis von Ahn, co-founder of reCAPTCHA. Data source: “Word Count for Famous Novels.”

     I excluded the 44 million words of the Encyclopedia Britannica from my graph because that figure dwarfs the other word counts. If reCAPTCHA focused the output of all its users on digitizing the Encyclopedia Britannica, our collective effort would transcribe it in less than twelve hours. This is an immensely powerful tool for the enrichment and dissemination of human knowledge, and it also benefits the people who use it.
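     As a rough check on that twelve-hour figure, here is a minimal Python sketch of the arithmetic. It assumes every solved CAPTCHA yields exactly one newly transcribed word, which is my own simplification; the function name and its parameters are hypothetical and exist only for illustration.

```python
# Back-of-the-envelope estimate, not a model of how reCAPTCHA actually works.
CAPTCHAS_PER_DAY = 100_000_000  # von Ahn's estimate quoted above
BRITANNICA_WORDS = 44_000_000   # word count cited in this post


def hours_to_transcribe(word_count, captchas_per_day=CAPTCHAS_PER_DAY,
                        words_per_captcha=1):
    """Return the hours needed to transcribe word_count words.

    words_per_captcha is an assumption: one new word per solved CAPTCHA.
    """
    words_per_hour = captchas_per_day * words_per_captcha / 24
    return word_count / words_per_hour


if __name__ == "__main__":
    hours = hours_to_transcribe(BRITANNICA_WORDS)
    print(f"Encyclopedia Britannica: about {hours:.2f} hours")
```

     Under those assumptions the result is about 10.56 hours. If each word had to be confirmed by several users before being accepted, the time would grow in proportion; lowering `words_per_captcha` below 1 models that.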

     The security created by reCAPTCHA prevents fake accounts and bot programs from flooding websites with spam. Words and number sequences correctly identified by users are collected by Google. This information is used to complete books and maps, strengthening the usability of Google’s products. In my opinion, this is a great use of crowd-sourcing because the users and the company providing the service benefit equally. I found another product that leveraged the unique qualities of crowd-sourced information for more secretive, ethically ambiguous reasons.

     The majority of Facebook’s content is created by its users. Wall Street will disagree with me, but I believe Facebook’s value is determined by its customers. If Facebook didn’t have users to create content for the site, it would be an online advertising billboard; I wouldn’t visit. I assumed a website dependent on its customers for the existence of its business would be transparent and forthcoming when dealing with crowd-sourced information.

     I vaguely remembered a story that broke in the news about Facebook manipulating users’ feeds for some kind of psychological experiment. During my research, I came across the original study and read it in its entirety. What I found was a terrifying example of crowd-sourcing gone wrong. According to a study published in the Proceedings of the National Academy of Sciences (“Experimental Evidence of Massive-scale Emotional Contagion Through Social Networks,” 2014), English-language Facebook users were selected and the “experiment manipulated the extent to which people were exposed to emotional expressions in their News Feed. This tested whether exposure to emotions led people to change their own posting behaviors, in particular whether exposure to emotional content led people to post content that was consistent with the exposure—thereby testing whether exposure to verbal affective expressions leads to similar verbal expressions, a form of emotional contagion.”

     In 2014, the study famously brought to light a peculiar social experiment being conducted by Facebook. In summary, Facebook crowd-sourced its users to test the propagation of “emotional contagions” (e.g., contentment, depression, happiness, anger) based on posts from Facebook user walls. Experiences with Facebook were deliberately distorted, evoking measurable positive or negative emotional responses in users, who conveyed their feelings as new posts. This user-generated data further manipulated the moods of others involved in the project. Facebook users were oblivious to the experiment until the story broke in 2014. The public’s reaction was disappointing, and it proved as fleeting as the Facebook timelines being manipulated.

     As a user of social media, I am alarmed that research like this is being conducted at all. I wonder what purpose it serves. It is unsettling to second-guess whether what I see on social media is a genuine representation of my personal network of friends and family. It also concerns me that my colleagues, friends, and relatives may perceive my digital persona inaccurately if Facebook is manipulating my data for frivolous social experiments. Were any of my posts distributed or weighted with unfair bias, possibly casting me in an unfavorable light with people I work with, trust, and love?

     Most concerning, I do not recall an option to opt in or out of the experiment (other than to stop using Facebook or to use it in a language other than English). It is also interesting that Facebook introduced a new suicide hotline function on its website only after the experiment was brought to light. This tool will prove invaluable in saving human lives, but I wonder whether it also serves to deflect possible litigation hinged on public knowledge of Facebook’s experiment.

     Even in 1942, doctors and ethics professionals had a clear vision of the parameters within which to conduct experiments on human beings. Dr. A.N. Richards, chairman of the University of Pennsylvania School of Medicine, explained in a letter that “when any risks are involved, volunteers only should be utilized as subjects, and these only after the risks have been fully explained and after signed statements have been obtained which shall prove that the volunteer offered his services with full knowledge and that claims for damages will be waived. An accurate record should be kept of the terms in which the risks involved were described” (Richards, 1942).

     The experiment Dr. Richards is referring to was a bioethics experiment during World War II, but the intent of his words applies today. The spirit of responsibility and accountability is undeniable in this decades-old correspondence, so what happened? What thought processes took place in the designers of Facebook’s experiment? What made them believe they could bypass regulation, conduct emotional research, misinform their consumers, and conceal the purpose of their research? The most disconcerting aspect of the whole situation is the response from Facebook’s users: silence.

     It is my position that legal, ethical crowd-sourcing will positively change the Internet and many of its associated products. Clever uses of crowd-sourcing will continue to be an engine for accomplishing undesirable, menial tasks for the benefit of a broader consumer base. With oversight and careful consideration of data quality, crowd-sourcing can construct literal libraries of useful information. A dangerous line is crossed when consumers are not made aware of how their digital personas are manipulated, for any reason. This practice sows distrust among consumers and ultimately undermines a company’s business when it takes unethical liberties with its users.


Von Ahn, L., Maurer, B., McMillen, C., Abraham, D., & Blum, M. (2008). “ReCAPTCHA: Human-Based Character Recognition via Web Security Measures.” Science, 321(5895), 1465-1468.

Kramer, A. D. I., Guillory, J. E., & Hancock, J. T. (2014). “Experimental Evidence of Massive-Scale Emotional Contagion Through Social Networks.” Proceedings of the National Academy of Sciences of the United States of America, 111(24).

Richards, A. N. (1942). “Reply of A. N. Richards, Chairman, To Dr. J. E. Moore.” Reproduction of the National Archives.