New techniques to analyze categorical and discrete time series data
Integer-valued time series (popularly known as count data) appear in many disciplines, ranging from economics to public health to the social sciences. Common examples are the number of people affected by a virus, the number of units of a product sold per day, the number of website visits, the number of extreme environmental events at a location, and the number of accidents at an intersection. Generalized linear models (GLMs) with a Poisson or negative binomial distribution handle the discreteness and can assess the effect of different regressors on the response variable, but they fail to address the correlated nature of the data. Meanwhile, models such as the autoregressive integrated moving average (ARIMA) can appropriately capture the covariance structure of a real-valued time series, but they are also inappropriate for count data because they do not produce coherent forecasts, that is, forecasts that are themselves integer-valued. In fact, modeling count data requires accounting for both the discreteness and the time dependence of the series. To that end, models such as INAR, GLARMA, ACP and INGARCH have been developed under specific assumptions. However, they lack generality as a modeling framework. The same can be said of categorical time series data and the related models. In this project, we aim to develop a general modeling framework for both categorical and discrete time series data, especially in the context of analyzing economic data.
We also aim to extend this work to spatio-temporal models for categorical and discrete random variables, and to apply it to environmental datasets.
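To make the two requirements of discreteness and time dependence concrete, the following is a minimal sketch of one of the existing model classes named above, a first-order INAR process with binomial thinning and Poisson innovations. It is an illustration only and not the framework proposed in this project; the function names `simulate_inar1` and `lag1_autocorr` and the parameter values are assumptions chosen for the example.

```python
import numpy as np


def simulate_inar1(n, alpha, lam, seed=0):
    """Simulate X_t = alpha o X_{t-1} + e_t, where 'o' is binomial thinning
    and e_t ~ Poisson(lam). Hypothetical helper, for illustration only."""
    rng = np.random.default_rng(seed)
    x = np.empty(n, dtype=np.int64)
    # Start from the stationary marginal, which is Poisson(lam / (1 - alpha)).
    x[0] = rng.poisson(lam / (1.0 - alpha))
    for t in range(1, n):
        # Binomial thinning: each of the x[t-1] counts survives with prob. alpha.
        survivors = rng.binomial(x[t - 1], alpha)
        # New arrivals keep the series integer-valued and non-negative.
        x[t] = survivors + rng.poisson(lam)
    return x


def lag1_autocorr(x):
    """Sample lag-1 autocorrelation of a series."""
    d = np.asarray(x, dtype=float) - np.mean(x)
    return float((d[1:] * d[:-1]).sum() / (d * d).sum())


# Assumed parameter values for the example.
counts = simulate_inar1(n=5000, alpha=0.6, lam=2.0)
print(counts[:10])
print(counts.mean(), lag1_autocorr(counts))
```

Under these assumptions every simulated value is a non-negative integer, the sample mean should be close to lam / (1 - alpha) = 5, and the lag-1 autocorrelation should be close to alpha = 0.6. This is the kind of serial dependence that a plain Poisson GLM ignores and that an ARIMA model can only reproduce with non-integer forecasts.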
Project Team: | Soudeep Deb, Rishideep Roy and Anagh Chattopadhyay |
Sponsor: | IIM Bangalore |
Project Status: | Ongoing (Initiated in July 2021) |
Area: | Decision Sciences |