Caste And Occupational Identity In Large Language Models

Jarul Zaveri and Arpit Shah
2025
Working Paper No. 724

Abstract

A large body of scholarship has documented evidence of racial and gender biases in large language models (LLMs). In this work, we examine three types of LLM bias in the context of caste and occupational identity in India through five studies. Our studies cover a comprehensive set of occupations in India and test for bias across all of India's districts. Our results provide four key insights. First, we find representation bias: individuals from marginalized caste groups are significantly under-represented in LLM output relative to their share of India's working population, which potentially reflects India's digital divide. Second, corrective measures to increase representation introduce other sources of error; they can also lead to association bias, where marginalized castes are linked to occupations that require lower levels of education and offer lower pay. Third, the models demonstrate selection bias, with a higher probability of shortlisting resumes bearing names from dominant caste groups. Finally, we propose a training approach that can reduce selection bias in LLM shortlisting. Our work is especially relevant now that generative AI is increasingly adopted in recruitment and hiring processes as a cost-saving measure.

Keywords: large language models; caste; bias; occupational identity; India