Caste and Occupational Identity in Large Language Models
A large body of scholarship has documented racial and gender biases in large language models (LLMs). In this work, we examine three types of LLM bias in the context of caste and occupational identity in India through five studies. Our studies cover a comprehensive set of occupations in India and test for bias across all of India's districts. Our results provide four key insights. First, we find representation bias: individuals from marginalized caste groups are significantly under-represented in LLM output relative to their share of India's working population, potentially reflecting India's digital divide. Second, corrective measures that increase representation introduce other sources of error and can produce association bias, in which marginalized castes are linked to occupations that require less education and pay less. Third, the models exhibit selection bias, shortlisting resumes with names from dominant caste groups at higher rates. Finally, we propose a training approach that reduces selection bias in LLM shortlisting. Our work is timely, as generative AI is increasingly adopted in recruitment and hiring as a cost-saving measure.