When Linguistics Meets Machine Learning
Insights from Dr Dai Huteng’s Workshop on Automated Language Analysis
In a seminar on 29 May 2025, Dr Dai Huteng introduced fundamental concepts of machine learning (ML), demonstrated its applications in linguistic research, and encouraged interdisciplinary collaboration. Dr Dai, an Assistant Professor at the University of Michigan, Ann Arbor, USA, combined theoretical explanations, practical examples, and group discussions to engage participants.
Dr Dai opened the seminar by explaining that ML enables computers to learn from data without being explicitly programmed, and he drew parallels between ML and human language acquisition. He emphasised ML's domain-general applicability and its transformative impact on linguistics, citing concrete examples such as language models serving as cognitive models for studying language acquisition and processing.
When introducing core ML concepts, Dr Dai highlighted the crucial distinction between supervised and unsupervised learning. Supervised learning uses labelled data (e.g., predicting sentence acceptability based on features like length and frequency). In contrast, unsupervised learning aims to discover latent patterns in unlabelled data (e.g., clustering words into syntactic categories). He noted that most current linguistic applications, including acceptability judgments and sentiment analysis, rely primarily on supervised learning, because it models the relationship between input and human-annotated output more directly and so provides interpretable quantitative tools for linguistic research. This conceptual distinction helped participants understand how to select an appropriate learning paradigm for their specific research questions.
In the hands-on demonstration session, Dr Dai presented two classic case studies to illustrate ML applications in linguistic research. Beginning with linear regression, he demonstrated how to predict reaction time from word length. He then moved on to logistic regression, using sentiment analysis as a binary classification example to explain how the sigmoid activation function converts outputs into probability values and the rationale behind the 0.5 decision threshold. These demonstrations showed how ML extracts patterns automatically: the process mirrors the way linguists induce rules by hand, but it is automated and quantified. Through this session, participants grasped the underlying algorithmic principles and saw how ML turns traditional empirical observations in linguistics into computable models.
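The two case studies can be sketched in a few lines of Python. This is a minimal illustration and not the code shown in the seminar: the word lengths, reaction times, function names, and example scores are all hypothetical, chosen only to make the ideas concrete. The first part fits a straight line by ordinary least squares; the second shows how the sigmoid maps a raw score to a probability and why 0.5 is a natural cut-off, since a score of zero maps to exactly 0.5, the point at which both classes are equally likely.

```python
import math

# Hypothetical toy data (not from the seminar): word length in letters
# and mean reaction time in milliseconds.
word_lengths = [3, 4, 5, 6, 7, 8]
reaction_times = [520.0, 540.0, 560.0, 580.0, 600.0, 620.0]


def fit_line(xs, ys):
    """Ordinary least squares with one predictor; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums (the 1/n factors cancel out).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def sigmoid(z):
    """Squash any real-valued score into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def classify(score, threshold=0.5):
    """Turn a raw score into (probability, label) for binary sentiment."""
    probability = sigmoid(score)
    label = "positive" if probability >= threshold else "negative"
    return probability, label


slope, intercept = fit_line(word_lengths, reaction_times)
print(f"predicted RT = {intercept:.1f} + {slope:.1f} * word_length")

# Hypothetical raw scores, as a trained logistic regression might produce.
for score in (-2.0, 0.0, 2.0):
    probability, label = classify(score)
    print(f"score={score:+.1f}  p(positive)={probability:.3f}  label={label}")
```

In a real study the raw score would come from a weighted sum of features learned from annotated data, and the threshold could be moved away from 0.5 if one type of error were more costly than the other.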
The workshop successfully bridged ML and linguistics by integrating theory and practice. Dr Dai used intuitive cases like regression analysis to reveal ML’s essence as an "automated pattern discovery" tool, systematically demonstrating its potential in syntactic analysis, semantic understanding, and psycholinguistics. The workshop also emphasised that, while ML technologies evolve rapidly, linguists' domain expertise and theoretical grounding remain crucial to research success.

