Lero researcher finds racist and misogynistic terms in image library used to inform AI

A study by a researcher from Lero, the Science Foundation Ireland Research Centre for Software, and University College Dublin’s Complex Software Lab has resulted in the withdrawal of an 80-million-image library that has, until now, been used to train Artificial Intelligence (AI) and Machine Learning (ML) systems.

The research by Lero and UCD PhD candidate Abeba Birhane found that images in academic datasets used to develop AI systems and applications were contaminated with racist, misogynistic and other offensive labels and slurs.

MIT has already deleted its much-cited “80 Million Tiny Images” dataset and asked researchers and developers to stop using the library to train AI and ML systems. MIT’s decision came as a direct result of the research carried out by University College Dublin-based Lero researcher Abeba Birhane and Vinay Prabhu, chief scientist at UnifyID, a privacy start-up in Silicon Valley.

In the course of the work, Ms Birhane found the MIT database contained thousands of images labelled with racist and misogynistic insults and derogatory terms including n****r, c**t, b***h and wh**re.

Ms Birhane said that linking images to slurs and offensive language infuses prejudice and bias into AI and ML models, perpetuating stereotypes and inflicting unprecedented and incalculable harm on those already on the margins of society.

“Not only is it unacceptable to label people’s images with offensive terms without their awareness and consent, training and validating AI systems with such dataset raises grave problems in the age of ubiquitous AI. Face recognition systems built on such dataset embed harmful stereotypes and prejudices.

“When such systems are deployed into the real-world – in security, hiring, or policing systems – the consequences are dire, resulting in individuals being denied opportunities or labelled as a criminal. More fundamentally, the practice of labelling a person based on their appearance risks reviving the long-discredited pseudoscientific practice of physiognomy,” Ms Birhane said.

While the 80 Million Tiny Images dataset is one of the Large Scale Vision Datasets (LSVD), there are many others in use around the world.

“Lack of scrutiny has played a role in the creation of monstrous and secretive datasets without much resistance, prompting further questions such as: what other secretive datasets currently exist hidden and guarded under the guise of proprietary assets?” she said.

The researchers also found that all of the images used to populate the datasets examined were “non-consensual” images, including those of children, scraped from seven image search engines, Google among them.

“In the age of Big Data, the fundamentals of informed consent, privacy, or agency of the individual have gradually been eroded. Institutions, academia, and industry alike amass millions of images of people without consent and often for unstated purposes under the guise of anonymisation, a claim that is both ephemeral and vacuous,” the research team argues.

The researchers said their goal is to raise awareness in the AI and ML community of the severity of the threats posed by ill-considered datasets, and of the direct and indirect impact of the community’s work on society, especially on vulnerable groups.

“From the questionable ways images were sourced to troublesome labelling of people in images, to the downstream effects of training AI models using such images, large scale vision datasets may do more harm than good.”

“We believe radical ethics that challenge deeply ingrained traditions need to be incentivised and rewarded in order to bring about a shift in culture that centres justice and the welfare of disproportionately impacted communities. I would urge the machine learning community to pay close attention to the direct and indirect impact of our work on society, especially on vulnerable groups,” Ms Birhane concluded.

Irish Tech News
