Algorithmic gaze: Automating our decision-making capabilities.


After spending many hours trying to articulate the perfect project concept that would appropriately communicate the research I’ve done thus far, I stumbled onto an idea that I think gets to the heart of what I’m trying to understand about computer vision. Namely, how might algorithms of the future use visual information to draw conclusions about you? And what are the consequences of ceding our decision-making capabilities to a computer?

Here’s the quick and dirty elevator pitch for the game:

What happens when we let a computer make decisions on our behalf? ALGORITHMIC GAZE is an interactive, web-based choose-your-own-adventure game that makes personalized decisions for you based on a neural network trained on a collection of images. The project anticipates and satirizes a world in which we cede decision-making authority to our computers.

I plan to build a low-fidelity game in three.js and WebGL. At the start of the game, the user will upload a handful of pictures and enter information about herself. Then, she will be guided through three different scenarios/scenes, in which there are objects with which she can interact. Each object will prompt a moment of decision: Let me decide or let the computer decide for me.

The program will use the images uploaded by the user to make decisions on behalf of the user. By tapping into a machine learning API, the program will use object recognition, sentiment analysis, facial recognition, and color analysis to make certain conclusions about the user’s preferences. The decisions made on behalf of the user may prompt illogical or surprising outcomes.
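To make this concrete, here’s a rough sketch (in JavaScript, since I plan to build the game in three.js) of how the decision step could work. The analyzeImage() helper and the keyword-matching logic are placeholders, not a particular vendor’s API:

```javascript
// Hypothetical decision step: score each choice in a scene against the tags
// returned for the player's uploaded images. analyzeImage() is a placeholder
// for whichever vision API the game ends up calling.
async function decideForUser(imageFile, choices) {
  const tags = await analyzeImage(imageFile); // e.g. ['dog', 'outdoors', 'smile']

  // Score each choice by how many of its keywords appear in the image tags.
  const scored = choices.map(choice => ({
    choice,
    score: choice.keywords.filter(k => tags.includes(k)).length
  }));

  // Pick the highest-scoring choice; weak matches are where the illogical,
  // surprising outcomes creep in.
  scored.sort((a, b) => b.score - a.score);
  return scored[0].choice;
}
```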

A storyboard of the experience:


Here’s what the basic decision tree will look like as you move through each scene.


Unidentified halo: A wearable that thwarts facial detection.


Unidentified halo is a wearable hat that responds to widespread surveillance culture and a lack of biometric privacy in public spaces. The hat is intended to shield the wearer from facial detection on surveillance cameras by creating a halo of infrared light around the face.

Just last week, new information emerged suggesting that as many as half of all Americans are already included in a massive database of faces. Government reports have long confirmed that millions of images of citizens are collected and stored in federal face recognition databases. Police departments across the country use facial recognition technology for predictive policing. One major problem with these systems is that some facial recognition algorithms have been shown to misidentify black people at unusually high rates. There is also the problem of misidentifying someone as a criminal – and such mistakes can have disastrous consequences.

Shir David and I worked together on this project. We saw this piece as not only a fashion statement, but also an anti-surveillance tool that could be worn by anyone on the street who is concerned about protecting their privacy.


Since the human eye can’t see infrared light, the hat doesn’t draw any attention to the wearer. In the image above, the surveillance camera “sees” a halo of light around my face, preventing Google’s Cloud Vision platform from detecting a face. When we ran the images through Google’s API, it not only detected Shir’s face but also inferred her emotions from facial cues. My face, on the other hand, went undetected.
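For reference, this is roughly the kind of request we ran against Cloud Vision to check for faces. It’s a minimal sketch, assuming a Node runtime with a global fetch and an API key stored in a VISION_KEY environment variable:

```javascript
// Minimal sketch of the face-detection check, using Cloud Vision's REST
// images:annotate endpoint. Assumes a Node runtime with global fetch and an
// API key in the VISION_KEY environment variable.
const fs = require('fs');

async function detectFaces(imagePath) {
  const body = {
    requests: [{
      image: { content: fs.readFileSync(imagePath).toString('base64') },
      features: [{ type: 'FACE_DETECTION', maxResults: 5 }]
    }]
  };
  const res = await fetch(
    `https://vision.googleapis.com/v1/images:annotate?key=${process.env.VISION_KEY}`,
    { method: 'POST', headers: { 'Content-Type': 'application/json' }, body: JSON.stringify(body) }
  );
  const json = await res.json();
  // With the halo on, this array came back empty; without it, Cloud Vision
  // also returned emotion likelihoods (e.g. joyLikelihood) for each face.
  return json.responses[0].faceAnnotations || [];
}
```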

The project began as a subversive “kit” of wearable items that would allow the wearer to prevent their biometric data from being collected. Shir and I were frustrated with both the ubiquity and the invisibility of the mechanisms of biopower, from surveillance cameras on streets to fingerprint scanners at the airport. We discussed the idea with several engineers at NYU, who suggested that if we were interested in pursuing it further, we should construct a hat that shines infrared light on the wearer’s face.

We all agreed that the hat shouldn’t require technical know-how to use and that the battery should be easy to recharge. To do this, we soldered together 22 IR LEDs powered by a rechargeable 500 mAh lithium battery and controlled by a potentiometer, then adhered the circuit to a baseball cap. The LEDs are wired along the bill of the hat and the battery is tucked into the rim.
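For a rough sense of battery life, here’s a back-of-the-envelope calculation, assuming a typical forward current of about 20 mA per IR LED (the real draw depends on the wiring and where the potentiometer is set):

```javascript
// Back-of-the-envelope battery estimate, assuming ~20 mA per IR LED
// (the actual draw depends on the wiring and the potentiometer setting).
const ledCount = 22;
const currentPerLedMa = 20;        // assumed, not measured
const batteryCapacityMah = 500;    // the rechargeable battery in the rim

const totalDrawMa = ledCount * currentPerLedMa;         // ≈ 440 mA
const runtimeHours = batteryCapacityMah / totalDrawMa;  // ≈ 1.1 hours per charge

console.log(`~${totalDrawMa} mA draw, roughly ${runtimeHours.toFixed(1)} h of runtime`);
```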




Because the infrared light is only visible through a camera feed, the wearer won’t draw attention to herself when she wears the hat on the street. In terms of users, we imagine this wearable being worn by anyone who wants to protect their biometric identity from being tracked in public without causing a stir.



In future versions of the project, we think we would move the LEDs further down the bill of the hat so that it’s closer to the face. We also would ensure that the lithium battery is safely wrapped in a plastic enclosure so that there’s no way it could be accidentally punctured. And, of course, we would sew everything together to improve the appearance of the hat.

We also need to address why the infrared light appears on some IP surveillance cameras but not others – and what kinds of cameras are in use on subway platforms or street corners, for example. Of course, this project fails to address the ubiquity of iPhone cameras, which don’t pick up infrared light and have extremely advanced facial recognition algorithms. These questions will inform the next iteration of the wearable.

Is this you? Is this them? The algorithmic gaze, again.


Last week I presented a handful of different design concepts for my project. The feedback from my classmates was actually very positive – while I feel that the project still lacks focus at this stage, their comments reaffirmed that the different iterations of this project are all connected by a conceptual thread. My task in the coming weeks is to continue following that thread and consider each iteration of the project a creative intervention into the same set of questions.

Theory & conceptual framework.

We know that systems that are trained on datasets that contain biases may exhibit those biases when they’re used, thus digitizing cultural prejudices like institutional racism and classism. Researchers working in the field of computer vision operate in a liminal space, one in which the consequences of their work remain undefined by public policy. Very little work has been done on “computer vision as a critical technical practice that entangles aesthetics with politics and big data with bodies,” argues Jentery Sayers.

I want to explore the ways in which algorithmic authority exercises disciplinary power on the bodies it “sees” vis-a-vis computer vision. Last week I wrote about Lacan’s concept of the gaze, a scenario in which the subject of a viewer’s gaze internalizes his or her own subjectivization. Michel Foucault wrote in Discipline and Punish about how the gaze is employed in systems of power. I’ve written extensively about biopower and surveillance in previous blog posts (here and here), but I want to continue exploring how people regulate their behavior when they know a computer is watching. Whether real or not, the computer’s gaze has a self-regulating effect on the person who knows they are being looked at.

It’s important to remember that the processes involved in recognizing patterns in images are so tedious that we tend to automate them. In his paper “Computer Vision as a Public Act: On Digital Humanities and Algocracy”, Jentery Sayers, drawing on the sociologist A. Aneesh, suggests that computer vision algorithms represent a new kind of power called algocracy – rule of the algorithm. He argues that the “programmatic treatment of the physical world in digital form” is so deeply embedded in our modern infrastructure that these algorithms have begun shaping our behavior and asserting authority over us. An excerpt from the paper’s abstract:

Computer vision is generally associated with the programmatic description and reconstruction of the physical world in digital form (Szeliski 2010: 3-10). It helps people construct and express visual patterns in data, such as patterns in image, video, and text repositories. The processes involved in this recognition are incredibly tedious, hence tendencies to automate them with algorithms. They are also increasingly common in everyday life, expanding the role of algorithms in the reproduction of culture.

From the perspective of economic sociology, A. Aneesh links such expansion to “a new kind of power” and governance, which he refers to as “algocracy—rule of the algorithm, or rule of the code” (Aneesh 2006: 5). Here, the programmatic treatment of the physical world in digital form is so significantly embedded in infrastructures that algorithms tacitly shape behaviors and prosaically assert authority in tandem with existing bureaucracies.

Routine decisions are delegated (knowingly or not) to computational procedures that—echoing the work of Alexander Galloway (2001), Wendy Chun (2011), and many others in media studies—run in the background as protocols or default settings.

For the purposes of this MLA panel, I am specifically interested in how humanities researchers may not only interpret computer vision as a public act but also intervene in it through a sort of “critical technical practice” (Agre 1997: 155) advocated by digital humanities scholars such as Tara McPherson (2012) and Alan Liu (2012). 

I love these questions posed tacitly by pioneering CV researchers in the 1970s: How does computer vision differ from human vision? To what degree should computer vision be modeled on human phenomenology, and to what effects? Can computer or human vision even be modeled? That is, can either even be generalized? Where and when do issues of processing and memory matter most for recognition and description? And how should computer vision handle ambiguity? Now, the CV questions posed by Facebook and Apple are more along these lines: Is this you? Is this them?

The project.

So how will these new ideas help me shape my project? For one, I’ve become much more wary of relying on pre-trained models and datasets like the Clarifai API or Microsoft’s COCO for image recognition. This week I built a Twitter bot that uses the Clarifai API to generate pithy descriptions of images tweeted at it.


I was honestly disappointed by the lack of specificity the model offered. However, I’m excited that Clarifai today announced a new tool that lets users train their own models for image classification.

I want to probe the boundaries of these pre-trained models – where do these tools break, and why? How can I distort images so that objects are recognized as something other than themselves? What would happen if I trained my own model on a gallery of images that I have curated? Computer vision isn’t source code; it’s a system of power.


For my project, I want to have control over the content the model is trained on so that it outputs interesting or surprising results. In terms of aesthetics, I want to try out different visual ways of organizing these images – clusters, tile patterns, etc. Since training one of these models can take a month or more, the goal for this week is to start building the dataset and the model.

I’ve been reading Wendy Chun’s Programmed Visions and Alexander Galloway’s Protocol: How Control Exists After Decentralization for months, but I’m recommitting to finishing these books in order to develop my project’s concept more fully.

Progress on the biometric kit: Prototype and field research.

This week Shir and I did some field research, speaking with several engineers, scientists, and software developers about the viability of some of the ideas we had for our anti-surveillance biometric kit.

We first spoke with Nasir Memon, a professor at the NYU Tandon School of Engineering who specializes in biometrics. He had some ideas for the kit, including some kind of wearable (a hat?) that would hold infrared LEDs to shield the face from facial recognition while remaining imperceptible to the human eye. At his suggestion, we then spoke with three NYU engineering students about the viability of the idea and got some frank feedback (some of it positive, some of it pointing to new challenges).

We talked to Eric Rosenthal, a scientist and professor at ITP, about some of the work he’s done with IR lights and biometric identity verification while at Disney. Shir also spoke to Lior Ben-Kereth, a partner at the facial recognition company acquired by Facebook.

We decided to move forward with the infrared LED wearable idea, but first we needed to ensure that a range of different kinds of cameras do indeed pick up the infrared light. We connected a cluster of IR LEDs and pointed them at our iPhone camera, FaceTime, Snapchat, and a range of IP surveillance cameras – including three different kinds that are in use at ITP.

You can see the result of our test below:

Crystal gazing: A Twitter bot that uses computer vision to describe images.

This semester I’ve been interrogating the concept of “algorithmic gaze” vis-a-vis available computer vision and machine learning tools. Specifically, I’m interested in how such algorithms describe and categorize images of people.

For this week’s assignment, we were to build a Twitter bot using JavaScript (and other tools we found useful – Node, RiTA, Clarifai, etc.) that generates text on a regular schedule. I’ve already built a couple of Twitter bots in the past using Python, including BYU Honor Code, UT Cities, and Song of Trump, but I had never built one in JavaScript. For this project, I immediately knew I wanted to experiment with building a Twitter bot that uses image recognition to describe what it sees in an image.

To do so, I used the Clarifai API to access an already-trained neural net that generates a list of keywords based on an image input. Any Twitter user can tweet an image at my Twitter bot and receive a reply that includes a description of what’s in the photo, along with a magick prediction for their future (hence the name, crystal gazing).
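Here’s the rough shape of that call, sketched with the Clarifai JavaScript client. The API key is a placeholder, and authentication details vary by client version:

```javascript
// Sketch of the keyword lookup with the Clarifai JavaScript client.
// The API key is a placeholder; older client versions authenticate differently.
const Clarifai = require('clarifai');
const app = new Clarifai.App({ apiKey: 'YOUR_API_KEY' });

function getKeywords(imageUrl) {
  return app.models.predict(Clarifai.GENERAL_MODEL, imageUrl)
    .then(response => {
      // Each concept looks like { name: 'portrait', value: 0.98 }.
      return response.outputs[0].data.concepts.map(c => c.name);
    });
}
```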


After pulling an array of keywords from the image using Clarifai, I then used Tracery to construct a grammar that included a waiting message, a collection of insights into the image using those keywords, and a pithy life prediction.
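The grammar itself looks roughly like this. The rules below are illustrative stand-ins, not the bot’s actual text, but they show how the Clarifai keywords get slotted into Tracery’s rule syntax:

```javascript
// Illustrative Tracery grammar (not the bot's actual text) showing how the
// Clarifai keywords get slotted into the rules before flattening.
const tracery = require('tracery-grammar');

function buildReading(keywords) {
  const grammar = tracery.createGrammar({
    origin: ['#wait# #insight# #prediction#'],
    wait: ['The mists are clearing...', 'Gazing into the crystal...'],
    insight: ['I see #keyword.a#, and behind it, #keyword.a#.'],
    prediction: ['Good fortune follows you.', 'Beware of default settings.'],
    keyword: keywords
  });
  grammar.addModifiers(tracery.baseEngModifiers);
  return grammar.flatten('#origin#');
}
```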


I actually haven’t deployed the bot to a server just yet because I’m still ironing out some issues in the code – namely, asynchronous callbacks firing out of order – but you can already see how the bot works by checking it out on Twitter or GitHub.
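One way I plan to tame those callbacks is to chain promises so the Clarifai lookup resolves before the reply is posted. A sketch, assuming the twit client (T), the Clarifai app, and the buildReading() wrapper from the sketches above are already configured:

```javascript
// One way to fix the ordering: chain the promises so the image analysis
// resolves before the reply is posted. Assumes the twit client (T), the
// Clarifai app, and buildReading() from the sketches above.
function replyWithReading(tweet, imageUrl) {
  return app.models.predict(Clarifai.GENERAL_MODEL, imageUrl)
    .then(response => {
      const keywords = response.outputs[0].data.concepts.map(c => c.name);
      return buildReading(keywords);
    })
    .then(reading => new Promise((resolve, reject) => {
      T.post('statuses/update', {
        status: `@${tweet.user.screen_name} ${reading}`,
        in_reply_to_status_id: tweet.id_str
      }, err => (err ? reject(err) : resolve()));
    }));
}
```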

You can see the Twitter bot here and find the full code in my GitHub repo. I also built a version of the bot for the browser, which you can play around with here.

Project update: Algorithmic gaze.

As mentioned last week, I’m exploring the idea of the algorithmic gaze vis-a-vis computer vision and machine learning tools. Specifically, I’m interested in how such algorithms describe and categorize images of people. I’d like to focus primarily on the human body as subject, starting with the traditional form of the portrait. What does a computer vision algorithm see when it looks at a human body? How does it categorize and identify parts of the body? When does the algorithm break? How are human assumptions baked into the way the computer sees us?

As mentioned last week, I’m interested in exploring the gaze as mediated through the computer. Lacan first introduced the concept of the gaze into Western philosophy, suggesting that a human’s subjectivity is determined by being observed, causing the person to experience themselves as an object that is seen. Lacan (and later Foucault) argues that we enjoy being subjectivized by the gaze of someone else: “Man, in effect, knows how to play with the mask as that beyond which there is the gaze. The screen is here the locus of mediation.”

The following ideas are variations on this theme, exploring the different capabilities of computer vision.

Project idea #1: Generative text based on image input.

At its simplest, my project could be a poetic exploration of the text produced by machine learning algorithms when they process an image. This week I started working with several different tools for computer vision and image processing using machine learning. I’ve been checking out some Python tools, including SimpleCV and scikit. I also tested out the Clarifai API in JavaScript.

In the example below, I’ve taken the array of keywords generated by the Clarifai API and arranged them into sentences to give the description some rhythm.

Check out the live prototype here.

I used Clarifai’s tagging endpoint to generate an array of keywords based on the image it’s given, and then included the top 5 keywords in a simple description.
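The keyword-to-sentence step is simple. A sketch, assuming concepts is the array of { name, value } objects Clarifai returns for one image:

```javascript
// Sketch of the keyword-to-sentence step, assuming `concepts` is the
// [{ name, value }] array Clarifai returns for one image.
function describe(concepts) {
  const top = concepts
    .sort((a, b) => b.value - a.value) // highest-confidence tags first
    .slice(0, 5)
    .map(c => c.name);
  const last = top.pop();
  return `I see ${top.join(', ')}, and ${last}.`;
}
```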


You can find the code over here on GitHub.


Project idea #2: Categorizing images by similarity.

In the first project idea, I’m exploring which words an algorithm might use to describe a photo of a person. With this next idea, I’d be seeking to understand how a computer algorithm might categorize those images based on similarity. The user would input or access a large body of images, and the program would generate a cluster of related images or image pairs. Ideally the algorithm would take into account object recognition, facial recognition, composition, and context.

I was very much inspired by the work done in Tate’s most recent project Recognition, a machine learning program that pairs photojournalism with British paintings from the Tate collection based on similarity and outputs something like this:

The result is a stunning side-by-side comparison of two images you might never have paired together. It’s the result of what happens when a neural net curates an art exhibition – not terribly far off from what a human curator might do. I’d love to riff on this idea, perhaps using the NYPL’s photo archive of portraits.

Another project that has been inspiring me lately was this clustering algorithm created by Mario Klingemann that groups together similar items:

I would love to come up with a way to categorize similar images according to content, style, and facial information – and then generate a beautiful cluster or grid of images grouped by those categories.
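As a first pass at the grouping step, something like nearest-neighbor pairing over feature vectors could work. This sketch assumes each image already has a numeric vector produced by some upstream model (keyword confidences, an embedding, etc.):

```javascript
// Rough sketch of the grouping step, assuming each image object already has
// a numeric feature vector (keyword confidences, an embedding, etc.).
function cosine(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Pair every image with its most similar neighbor, Recognition-style.
function nearestPairs(images) {
  return images.map(img => {
    const others = images.filter(other => other !== img);
    others.sort((a, b) => cosine(img.vector, b.vector) - cosine(img.vector, a.vector));
    return [img, others[0]];
  });
}
```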


Project idea #3: Distorting images until recognition breaks.

As a variation on the first project idea, I’d like to explore the object recognition capabilities of popular computer vision libraries by taking a portrait of a person and slowly, frame by frame, incrementally distorting the image until it’s no longer recognized by the algorithm. The idea here is to test the limits of what computers can see and identify.
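In pseudocode-ish JavaScript, the experiment might look like the loop below, where distort() and detectFaces() stand in for whatever image filter and vision API call I end up using:

```javascript
// Sketch of the distortion experiment: nudge the distortion up frame by frame
// until the vision API stops returning a face. distort() and detectFaces()
// are stand-ins for the actual filter and API call.
async function findBreakpoint(portrait, step = 0.05) {
  for (let amount = 0; amount <= 1; amount += step) {
    const frame = distort(portrait, amount);   // hypothetical image filter
    const faces = await detectFaces(frame);    // e.g. the Cloud Vision call sketched earlier
    if (faces.length === 0) return amount;     // the algorithm's breakpoint
  }
  return null; // still recognized even at full distortion
}
```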

I’m taking my cues from the project Flower, in which the artist distorted stock images of flowers and ran them through Google’s Cloud Vision API to see how far a picture could be morphed while still being recognizable as a flower to computer vision algorithms. It’s essentially a way to determine the algorithm’s recognizable range of distortion (as well as a human’s).

I’m interested in testing the boundaries of such algorithms and seeing where their breakpoints are when it comes to the human face.*

*After writing this post, I found an art installation, Unseen Portraits, that does what I’m describing – distorting images of faces in order to challenge face recognition algorithms. I definitely want to continue investigating this idea.

Project idea #4: Interpreting body gestures in paintings.

Finally, I want to return to the idea I started with last week, which focused on the interpretation of individual human body parts. When a computer looks at an ear, a knee, a toenail, what does it see? How does it describe bodies?

Last week, I started researching hand gestures in Italian Renaissance paintings because I was interested in knowing whether a computer vision algorithm trained on hand gestures would be able to interpret hand signals from art. I thought that if traditional gestural machine learning tools proved unhelpful, it would be an amazing exercise to train a neural net on the hand signals found in religious paintings.


Biometric data kit: Update on project development.

This week, Shir and I have been discussing the biometric anti-surveillance kit that we will be building for our midterm project.

What is biometric data? 

Biometrics are the measurable, distinct characteristics that are used to verify the identity of individuals, including groups that are under surveillance. Biometric data includes fingerprints, DNA, face recognition, retina scans, palm veins, hand geometry, iris recognition, voice, and gait.

Problem area

Biometric data is extremely sensitive. If your data is compromised, it’s not replaceable (unlike password tokens). The widespread collection of personal, biometric data raises questions about the sharing of such data between government agencies or private companies. Many of us use the Apple Touch ID on a daily basis and yet we don’t think about the fact that Apple now has access to a snapshot of our fingerprint. In addition, biometric data is most often collected by the state about populations that are already vulnerable, including criminals, travelers, and immigrants.

Proposed project

We intend to put together a biometric resistance kit, a toolkit of wearable objects aimed at masking or altering the user’s personal biometric identity. The aim of the project is to prototype non-intrusive objects that can be worn by anybody to protect their biometric identity in public spaces.

Contents of the kit

We had several ideas of what the kit could contain.


Relevant projects

We researched what had been done in the past and found several other artists and engineers experimenting with anti-surveillance materials.


Identity is a project by Mian-Wei that uses a band-aid made of silicone and fibers to trick Apple’s Touch ID into thinking it’s a real fingerprint. The solution is simple and effective, something we would like to achieve with our project.

Biononymous Guide is a series of DIY guides for masking your biometric identity (specifically, DNA and fitness trackers). We loved the format of the website – the mix of printed materials, physical objects, and how-to videos matches the kind of kit we’re hoping to build.


Adam Harvey is an artist whose anti-surveillance work includes Stealth Wear, a line of Islamic-inspired clothing to shield against drone attacks & thermal cameras, and CV Dazzle, a makeup guide that beats facial recognition algorithms. We loved how he tried to work with styles that people would actually want to wear. This will likely be a major concern in the development of our project.

Field research and interviews

We’ve spoken with a few experts in the field of biometrics, surveillance, and facial recognition algorithms and are planning to continue these conversations.

First, we spoke to Nasir Memon, a computer science professor and biometrics expert at NYU’s Tandon School of Engineering. He had some ideas for the kit, including a hat with infrared lights that would beat facial recognition algorithms. We also spoke to NYU engineering students who are researching surveillance, machine learning, and biometrics. They gave us additional technical guidance about what in our kit would be most viable. We also have set up meetings with artist Adam Harvey and NYU professor Eric Rosenthal to discuss our project idea.

User personas

Shir and I have had many conversations about who this kit would affect. We realized that for the time being, there is no reasonable way to combat the fingerprinting that occurs for immigrants, foreign visitors, criminals, or people who are required to scan their fingerprints for work. Instead, we realized that this kit would be best suited for people who are worried about their privacy and surveillance when they’re in public spaces. Here are several example users we created:


Project proposal redux.

Terrapattern, Golan Levin

For this week’s assignment, we were to reframe or revisit our project idea through a scientific lens. Since computer vision — characterized by image analysis, recognition, and interpretation — is itself considered a scientific discipline, I struggled to find a new scientific framework through which to re-articulate my project.

Because my project is so deeply rooted in computer vision and optics, I’m interested in exploring the idea of “algorithmic gaze” as the means by which computers categorize and label bodies according to specific (and flawed) modalities of power.

Donna Haraway’s concept of the “scientific gaze” has very much influenced my research. In her paper “Situated Knowledges: The Science Question in Feminism and the Privilege of Partial Perspective”, Haraway tears apart traditional ideas of scientific objectivity, including the idea of the subject as a passive, single point of empirical knowledge and of the scientific gaze as objective observer. Instead, she advocates for situated knowledge, in which subjects are recognized as complex and the scientific gaze is dissolved into a network of imperfect, contested observations. In this framework, objects and observers are far from passive, exercising control over the scientific process.

Haraway relies on the metaphor of vision, the all-seeing eye of Western science. She describes the scientific gaze as a kind of “god trick,” a move that positions science as the omniscient observer. The metaphor of optics, vision, and gaze will be central to the development of my project. I’m interested in exploring how the “algorithmic gaze” mediates and shapes the information we receive.

Sandro Botticelli (Florentine, 1446 – 1510 ), Portrait of a Youth, c. 1482/1485, tempera on poplar panel, Andrew W. Mellon Collection 1937.1.19

My first test used ConvNetJS, a JavaScript library built by Andrej Karpathy that uses neural networks to “paint” an image given as input. I used a detail from the painting above and ran it through the neural network. Here’s an example of the process.
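For anyone curious, this is roughly how the ConvNetJS painting demo is wired: a small regression network learns to map a pixel’s (x, y) position to its (r, g, b) color, and redrawing the canvas from the network’s predictions produces the painterly frames. The layer sizes here are illustrative, not the demo’s exact settings, and the snippet assumes convnetjs is loaded on the page as in the demo:

```javascript
// Roughly how the ConvNetJS painting demo works: a small regression network
// learns to map a pixel's (x, y) position to its (r, g, b) color. Layer sizes
// are illustrative; assumes convnetjs is loaded on the page as in the demo.
const layerDefs = [
  { type: 'input', out_sx: 1, out_sy: 1, out_depth: 2 }, // (x, y)
  { type: 'fc', num_neurons: 20, activation: 'relu' },
  { type: 'fc', num_neurons: 20, activation: 'relu' },
  { type: 'regression', num_neurons: 3 }                  // (r, g, b)
];

const net = new convnetjs.Net();
net.makeLayers(layerDefs);
const trainer = new convnetjs.Trainer(net, { learning_rate: 0.01, batch_size: 5 });

// One training step: teach the network the color of a single pixel.
function trainPixel(x, y, r, g, b) {
  trainer.train(new convnetjs.Vol([x, y]), [r, g, b]);
}

// After many passes over the image, net.forward(new convnetjs.Vol([x, y]))
// returns the reconstructed color for that pixel, which is what gets redrawn.
```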

Midterm project proposal: Biometrics.

Problem framework.

For my midterm project, I’d like to address the ethics and implications of widespread biometric data collection.

Biometric identifiers are defined as measurable, distinctive characteristics that are used to label or describe individuals. They’re commonly used by governments and private organizations to verify the identity of an individual or group of individuals, including groups that are under surveillance. Physiological characteristics include fingerprints, DNA, face recognition, retina scans, palm veins, hand geometry, and iris recognition. Behavioral identifiers measure behavioral patterns like voice and gait.

Here’s the breakdown of identification accuracy based on biometric input:


The earliest record of fingerprint cataloguing dates back to 1891. Biometrics arguably originated with “identificatory systems of criminal activity” as part of a larger system to categorize and label criminal populations. “The biometric system is the absolute political weapon of our era” and a form of “soft control,” writes Nitzan Lebovic. Under the post-9/11 expansion of the Patriot Act, biometric systems have expanded from the state to the private market, blurring the lines between public and private control.

While biometric data is seen as more accurate and therefore more reliable for identifying an individual, it is also irreplaceable. If your password were somehow compromised, for instance, you could simply change it. You can’t, on the other hand, replace your fingerprint or change other physical characteristics.

Italian theorist Giorgio Agamben experienced the implications of “bio-political tattooing” firsthand in 2004 when he was told that in order to obtain a U.S. visa to teach a course at New York University he would have to submit himself to fingerprinting procedures. In a piece published in Le Monde, Agamben explains why he refused to comply, arguing that the electronic filing of finger and retina prints required by the U.S. government is a way in which the state registers and identifies naked life. According to Agamben, biometric data collection operates as a form of disciplinary power.


This issue potentially affects everyone, so our audience is very broad. Biometric data is most often collected about populations that are already vulnerable, including criminals, the poor, and immigrants. Corporations put a monetary value on biometric data, and yet individuals rarely think of its collection as an intrusion.

The goal of this project is to foster an awareness of the implications and ethics of biometric data collection.

Ideas for the project and user journey.

Concept #1: A physical installation that gives the user personalized information based on a biometric input.

Concept #2: A speculative VR experience with advertisements tailored to the user’s biometric data.

Concept #3: A kit of wearable objects aimed at masking and altering the user’s personal biometric identity.

Concept #4: Collect (non-identity-compromising) biometric data from various participants and sell data on eBay in order to gauge the monetary value of the data.


Redesign of a narrative experience: Mrs Dalloway in 360.


She would not say of any one in the world that they were this or were that. She felt very young; at the same time unspeakably aged. She sliced like a knife through everything; at the same time was outside, looking on. She had a perpetual sense, as she watched the taxi cabs, of being out, out, far out to sea and alone; she always had the feeling that it was very, very dangerous to live even one day.
– Virginia Woolf, Mrs. Dalloway

For this week’s assignment, we were asked to redesign a narrative experience according to the agile human-centric design principles we discussed last week.

For my source material, I drew from the themes and text of Virginia Woolf’s Mrs Dalloway, a 1925 novel written in a stream-of-consciousness style that sketches a portrait of the life of one woman, Clarissa Dalloway, over the course of a single day.

In the first chapter of the book, Clarissa walks the streets around London running errands in preparation for a party she is throwing that night. When I reread the book, I was struck by the ways in which the novel sharpens our attention to details of time and space, especially the specificity of London during Clarissa’s walks. Time is a significant theme in the novel, with clocks ringing the hour and signs of aging and death made hypervisible in the text. So much of the narration in the novel occurs inside the head of the protagonist, with special attention paid to her surroundings.

With this project, I wanted to explore creating a film that employs this stream of consciousness narrative style while physically putting you in the shoes of the protagonist. I chose to reimagine Mrs Dalloway as an immersive VR/360 experience in order to explore this narrative style not only in text, but also in film.

The idea behind the project was to film myself walking in New York using 360 video, paired with a voice-over narration of the opening chapter of the novel. I made slight changes to the text to accommodate the sharp departure in setting (from 1925 London to 2016 New York). Much of the narration in the novel is observational — Clarissa sees a woman in a taxi cab, she arrives at the park, she looks in shop windows — and I wanted to replicate those moments in the film as much as possible.

Check out the initial prototype of my idea.



My audience for this project could be anyone, really. Because it’s a 360 video, the user has full control over what he or she is looking at during the film. Just like London, New York is replete with observational details; I wanted the audience to experience that same sensory overload in my project.