San Francisco CNN  — 

Artificial intelligence doesn’t know what to make of Os Keyes.

The 29-year-old graduate student is dark-haired, tattooed and openly transgender, using the pronouns “they” or “them.” Facial analysis software, however, typically assigns each face it analyzes one of two labels: male or female.

The software literally can’t categorize Keyes correctly.

Yet for Keyes, who studies gender, technology and power at the University of Washington, this technology is not simply software that doesn’t get it right. According to them, it’s also representative of how companies are not thinking through how power is distributed and norms are reinforced by such software. And Keyes — like a number of other experts in AI and in gender issues who spoke with CNN Business — is concerned about how these AI classifications could police, restrict, or otherwise harm transgender people, as well as those who don’t look stereotypically male or female.

“The people producing this software are producing something that, in a small way, makes people like me miserable and keeps us miserable,” Keyes told CNN Business.

Top tech companies including Microsoft, Amazon and, until recently, IBM have all invested in technology that can tag pictures of faces fed to their AI systems with binary labels such as “male” and “female,” along with predicting other characteristics, such as whether the person is wearing glasses or makeup. A company might, as Amazon suggests in a company blog post, use this gender-prediction feature to analyze footage of shoppers at its stores to learn how many are men and how many are women. The technology is often offered alongside facial recognition software, though the two may be separate systems.

But there is a fatal flaw: The way a computer sees gender isn’t always the same way people see it. A growing number of terms for describing one’s gender are becoming common in everyday life. Over a dozen states and Washington, DC, currently offer or will soon offer a third gender option, “X,” on drivers’ licenses. Companies such as United Airlines now let customers pick the gender option “X” or the gender-neutral honorific “Mx” when booking a ticket. On Instagram, 9 million posts are tagged as “transgender” and over 7 million as “trans.” Well over 3 million posts are tagged with the hashtag “nonbinary.” Gender diversity is on smartphones, too, as both Apple and Google offer nonbinary emoji.

As these societal changes proliferate, AI-driven conclusions have become more than a gender identity concern. Some AI experts and members of the transgender community are worried about the potential for serious repercussions if gender recognition, as it exists today, is put to use for more complicated and sensitive tasks, whether it be using AI to help screen job candidates or nab criminal suspects.

Keyes is personally afraid it could enable a surveillance system to issue alerts when someone of the “wrong gender” walks into a bathroom or changing room. Indeed, one AI startup told CNN Business that it offers a gender prediction system that could help security guards flag men who are in an all-female dormitory, or vice versa.

“What you’re talking about is deliberately putting trans people, who don’t have the best relationship with law enforcement, on a collision course with law enforcement,” Keyes said.

Already, AI that scans your face is being used for security applications at concerts, airports, sports arenas and more. These sensitive use cases only raise the stakes for how people’s lives could be upended by a few lines of code.

“I think we need to push back on the idea that these systems should exist at all, and look at these kind of assumptions — that someone’s body or face or style or hair can kind of detect their interior state or identity,” said Meredith Whittaker, a former Google employee and cofounder of New York University’s AI Now Institute, which studies social impacts of AI.


Tech (mostly) doesn’t want to talk about it

The tech companies are mostly staying quiet on these concerns. Amazon and Microsoft declined to comment for this story. IBM, which appeared to stop offering facial-analysis services in September, also declined to comment. Kairos and Megvii, both AI startups that offer such services, didn’t respond to requests for comment. (Google, which offers image-identification features through its Cloud Vision service, doesn’t appear to offer a gender-labeling feature as part of its facial analysis tools, but the company declined to confirm whether or not this is the case.)

Only one company contacted by CNN Business, New York-based startup Clarifai, was willing to speak. Kyle Martinowich, VP of commercial sales and marketing, said Clarifai built its AI model for predicting gender in response to customer demand. He said those customers now range from brick-and-mortar stores, which may use Clarifai’s technology to figure out how many women walk down a particular aisle, to the US government, which may use it for gathering information about the types of people walking through airports or into federal buildings.

Clarifai’s system for recognizing gender was trained on a data set of over 30 million images, each of which was annotated as masculine or feminine by three different people. Given how small the trans population is — an estimated 1.4 million adults in the US alone, or 0.6% of the adult population — there’s no training data available to help the company incorporate trans individuals into its gender predictions, according to Martinowich. And he argued it’s not worth the money it would cost to source such data. But if a customer offered to pay the company to make such a product, and brought its own data, he said Clarifai would comply.

Martinowich stressed there are limits to what Clarifai would allow customers to do with its services. For instance, he said that “if the Congolese government called us and said, ‘We want to stop females from entering an all-male building,’ we wouldn’t sell them that. And if we found out, we would cut our service off to them.”

Yet, he also said Clarifai is talking to companies that offer single-sex dormitories about how the startup’s automated gender-identification could be used for safety and security purposes — not to deny someone entry to a building, but to flag a security guard “who would need to make the human determination” about whether a person should be rejected from a building.

How well does the technology work?

The automated facial analysis systems used by tech companies invariably compute gender as one of two groups — male or female. This may come with a numerical score indicating how confident the computer is that the face it sees looks like one gender or the other.
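
To make the mechanics concrete, here is a minimal Python sketch of how a single numeric score can be turned into one of two labels plus a confidence value. It is an illustration only: the function name `to_binary_label`, the 0.5 cutoff and the 0-to-1 score scale are assumptions made for this example, not any vendor’s actual API.

```python
from typing import NamedTuple


class GenderPrediction(NamedTuple):
    label: str         # always "male" or "female"; no other value is possible
    confidence: float  # how strongly the score favors the chosen label (0.5 to 1.0)


def to_binary_label(score_female: float) -> GenderPrediction:
    """Map a hypothetical model score in [0, 1] onto one of two labels.

    Assumption: the model emits one number, read as how "female-looking"
    the face is. The cutoff of 0.5 is illustrative.
    """
    if not 0.0 <= score_female <= 1.0:
        raise ValueError("score_female must be between 0 and 1")
    if score_female >= 0.5:
        return GenderPrediction("female", score_female)
    return GenderPrediction("male", 1.0 - score_female)


if __name__ == "__main__":
    # Even a score close to the midpoint is forced into one of the two labels.
    for score in (0.95, 0.51, 0.20):
        print(score, to_binary_label(score))
```

In a design like this, a low confidence value is the only signal that the system is unsure. There is no output that means “neither” or “unknown,” which is the limitation the researchers quoted in this story describe.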

Yet a small but growing area of research indicates there are a number of issues with using AI to spot gender, such as increased error rates when it comes to identifying women of color and concerns about accuracy in general.

There’s also the question of how well the technology works when it encounters pictures of people who identify differently from how the software labels them. As Morgan Klaus Scheuerman, a graduate student at the University of Colorado Boulder, found in a recent study, facial analysis systems from major tech companies were all markedly worse at determining gender when confronted with images of people who are trans.

Scheuerman and other researchers built a dataset of 2,450 photos of faces from Instagram that had been labeled by their authors with one of seven different gender-related hashtags such as “transman”, “transwoman”, “man”, “woman” and “nonbinary.” Then they ran the images through facial-analysis services from Microsoft, Amazon, IBM, and Clarifai.

The results? On average, the services classified photos tagged “woman” as “female” 98.3% of the time, and photos tagged “man” as “male” 97.6% of the time.

When it came to images tagged “transwoman” or “transman”, however, they fared far worse. Photos with the “transwoman” tag were identified as “female” 87.3% of the time, but photos tagged as “transman” were labeled as “male” just 70.5% of the time. Amazon performed worst when it came to labeling “transman”-tagged photos as male, which it did just 61.7% of the time.
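
The size of the gap is easier to see as arithmetic. The short Python sketch below uses only the percentages reported above and computes, for each hashtag, how often the label did not match, along with the difference in percentage points between groups. The variable and function names are chosen for this illustration and do not come from the study.

```python
# Average rates reported in this article: percent of photos whose assigned
# label matched the hashtag, across the four services that were tested.
reported_match_rate = {
    "woman": 98.3,
    "man": 97.6,
    "transwoman": 87.3,
    "transman": 70.5,
}

# Amazon's reported rate for "transman"-tagged photos labeled as male.
amazon_transman_match_rate = 61.7


def mismatch_rate(match_rate_percent: float) -> float:
    """Percent of photos whose label did not match the hashtag."""
    return round(100.0 - match_rate_percent, 1)


def gap_in_points(baseline_tag: str, compared_tag: str) -> float:
    """Difference in match rate between two hashtags, in percentage points."""
    return round(
        reported_match_rate[baseline_tag] - reported_match_rate[compared_tag], 1
    )


mismatch = {tag: mismatch_rate(rate) for tag, rate in reported_match_rate.items()}

if __name__ == "__main__":
    for tag, rate in mismatch.items():
        print(f"{tag}: mismatched {rate}% of the time")
    print("man vs. transman gap:", gap_in_points("man", "transman"), "points")
    print("woman vs. transwoman gap:", gap_in_points("woman", "transwoman"), "points")
    print("Amazon transman mismatch:", mismatch_rate(amazon_transman_match_rate), "%")
```

Put another way, the services mislabeled “transman”-tagged photos 29.5% of the time on average, compared with 2.4% for photos tagged “man,” a gap of 27.1 percentage points. For Amazon alone, the mismatch rate on “transman”-tagged photos was 38.3%.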

Scheuerman said this may indicate that images of trans men are not included in training these AI systems to determine what men look like.

“I think the real danger is this notion of objectivity,” said Scheuerman, a long-haired, facial-piercing-bedecked student who studies gender and technology and has repeatedly been misidentified by these systems. “The idea that because this is trained, this system is super advanced, then it must be making these objective, data-driven decisions that have to be correct.”

It may also be seen as another way in which the technology humans build falls short when it comes to analyzing a more diverse population than may be found on some tech engineering teams — which are often largely male and white. Human biases, such as sexist notions, can seep into machine-learning software in particular, regardless of creators’ intentions.

Even tech companies can’t control how gender ID is used

As with so many other use cases of artificial intelligence, it can be hard to understand the full impact of the technology’s gender identifications, or misidentifications, on people’s lives because the systems frequently operate as a black box. Often the deployment of AI that analyzes faces, whether it’s done by a police department or a department store, is not publicly disclosed, and many countries (including the US) have few laws governing its use.

This concerns Gillian Branstetter, spokesperson for the National Center for Transgender Equality until November, who points out it can impact not just trans people, but anyone. “Any time you try to codify gender norms, either into laws or into algorithms, you’re bound to have an impact on anyone who’s not Ron Swanson or Barbie,” she said.

To make matters more complicated, companies are already using commercially available AI to deduce gender for a number of reasons — and they’re not always using it in the ways the creators intended.

For instance, Amazon writes in its online Rekognition developer guide that gender predictions from its facial analysis service are “not designed to categorize a person’s gender identity” and shouldn’t be used to do so. (According to the version of the developer guide that Amazon maintains on Github, this kind of language was added in late September; previously, it included no guidance about how the technology was intended to be used.)

Morgan Klaus Scheuerman, a graduate student at the University of Colorado Boulder, ran the same image of his face through facial analysis systems from two different companies. One determined his face was male, while the other identified it as female.

Yet Woo, an Indian dating app that matches heterosexual couples, uses Rekognition’s gender-identifying feature mainly to help make sure the gender that users state in their profile matches up with the images of themselves they post within the app, said Woo cofounder and CEO Sumesh Menon.

If there’s a perceived mismatch, a human worker will be notified, and they may contact the user to ask if their gender is incorrectly stated in the app, Menon said. Men, for instance, have accidentally labeled themselves as women in the past, then complained that they were only seeing other men as potential matches.

“It’s not very nuanced; it’s very straightforward,” he said. “But it is super helpful in how we are able to present profiles to the right gender.”

However, it shows that companies selling this AI technology can’t control its deployment once it’s in the hands of customers. (Woo is listed, along with a testimonial from Menon, on an Amazon Rekognition Customers page.)

“That’s in a way proving the point that there’s no way to really ensure your client is using this in an ethical way or a way you intended it to be used,” said Scheuerman.

Nix it, or fix it?

Despite the ethical concerns, businesses believe there is a clear value in having this gender data — but only if the data itself is accurate. To that end, rather than abandon the feature, some companies are now wrestling with how to improve its predictive capabilities.

Limbik, a startup that calls itself a “data studio for short-form video,” uses AI to analyze videos and predict what people will want to watch. The startup turns to AI from Amazon and IBM to identify gender in videos and analyze all manner of things, such as how frequently men pop up in a certain kind of commercial.

But Limbik CEO Zach Schwitzky said his company has “struggled with binary classification.” Two common issues he encounters with the software are short-haired females being classified as males, and people who appear to be teenagers being misclassified in either direction. In his experience, existing automated gender identification works well for anyone who’s between 25 and 35, but it’s not as helpful for people who are older or younger.

Now, Limbik is building its own software to label gender in images, which to start includes three categories: male, female, and other. Eventually, the company may add more categories, too. Right now, the company is sorting images by hand from sites like Facebook, Twitter and Instagram; these will be used to train an AI model, Schwitzky said.

“I just struggle to think about how to do it in a way that it could be done accurately and effectively,” Schwitzky said.

The research community, too, is wrestling with how to represent gender. Aaron Smith, director of Data Labs at Pew Research Center, said the issue of how to accurately represent gender and gender identity “is a topic of huge interest,” particularly to those who study AI.

Yet whether AI can be built that could accurately identify gender on a broader spectrum, or perhaps consider any characteristics beyond outward appearances, is still largely unknown.

Smith isn’t sure whether technology will eventually be able to suss out a person’s internal identity. He notes that that identity can be “inherently opaque” to AI systems making assessments based on outward appearances.

For those like Keyes, who are worried about the consequences of using AI to recognize gender, there’s a belief that no amount of tinkering will make these systems work or even worthwhile.

“You could add a million categories, and unless you’re adding one category per person you’re never going to get to a place where you can work out someone’s gender from their face,” Keyes said.