High wind speed events are not uncommon here in Puget Sound and are especially prevalent in the winter months. These extreme winds can cause road closures, significantly damage public and private property, and threaten human safety. Some historical examples of exceptional windstorms in the Pacific Northwest include the Columbus Day windstorm of 1962, the Great Gale of 1880, and the Hanukkah Eve windstorm of 2006 (included in the upcoming data).
I am interested in describing this weather phenomenon in terms of a statistical distribution. This distribution can be loosely thought of as the “climate” of high wind speed events in King County.
Here we will get wind speed data from NOAA’s National Climatic Data Center Storm Events Database. We will select all of the High Wind and Strong Wind events for King County, WA. The aggregate dataset contains 45 incidents from 2006 to 2014. The variable of interest here is the wind speed in knots. Technically speaking, the wind speed values do not all represent the same quantity (some are measured and others estimated, some are sustained winds and others gusts), but we will ignore this.
The wind speed values are just integers and, in the spirit of Occam’s Razor, we would prefer the simplest model that adequately describes our quantity of interest. A Poisson distribution is a natural initial model choice since it is supported on the natural numbers and only requires a single rate parameter, $\lambda$. The Poisson distribution also assumes that the variance of the random variable is equal to its mean. Here the sample mean and sample variance are 45.5 and 73.4, respectively. In turn, our estimated variance-to-mean ratio is approximately 1.6. We would expect this quantity to be near 1 if our data were generated from independent draws of a single Poisson distribution; however, some deviation from this expectation should not be surprising. To test the null hypothesis that our observed data were generated from independent draws of a single Poisson distribution, we will use the conditional chi-square statistic (Brown and Zhao 2002).
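As a quick check of the arithmetic, the variance-to-mean ratio can be computed in a line of R. This is only a sketch: `dispersion_index()` is a helper name I made up for illustration, and `wind_knots` is simulated stand-in data, not the NOAA records.

```r
# Variance-to-mean ratio (index of dispersion); close to 1 for Poisson data.
# dispersion_index() is a hypothetical helper, not part of any package.
dispersion_index <- function(x) var(x) / mean(x)

# Simulated stand-in for the 45 wind speeds in knots (NOT the real NOAA data).
set.seed(1)
wind_knots <- rnbinom(45, size = 20, mu = 45)
dispersion_index(wind_knots)

# The same ratio from the summary statistics quoted in the text.
reported_ratio <- 73.4 / 45.5
round(reported_ratio, 1)
```

With the real data, `wind_knots` would simply be the wind speed column of the Storm Events extract.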
To do this we calculate the test statistic, $\chi^2_c = \sum_{i=1}^{n} (x_i - \bar{x})^2 / \bar{x}$, and the critical chi-square value, $\chi^2_{1-\alpha,\,n-1}$. Here $\alpha$ represents our level of significance, and we reject our null hypothesis if $\chi^2_c > \chi^2_{1-\alpha,\,n-1}$. At the $\alpha = 0.05$ level, our critical chi-square value is approximately 60.5. Hence, with a test statistic of 70.96, we reject our null hypothesis and conclude that it is likely there is extra-Poisson variance in the data generating process.
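For readers who want to reproduce the test, here is a minimal sketch in R. It assumes the statistic takes the usual dispersion form, the sum of squared deviations from the sample mean divided by the sample mean, compared against a chi-square distribution with n - 1 degrees of freedom. That form is consistent with the numbers quoted above, but the function name `poisson_dispersion_test()` and its return structure are my own invention for illustration.

```r
# Conditional chi-square (dispersion) test of a Poisson null hypothesis.
# x: vector of non-negative integer counts; alpha: significance level.
# poisson_dispersion_test() is a hypothetical helper, not a library function.
poisson_dispersion_test <- function(x, alpha = 0.05) {
  n <- length(x)
  # Algebraically equal to (n - 1) * var(x) / mean(x)
  statistic <- sum((x - mean(x))^2) / mean(x)
  critical  <- qchisq(1 - alpha, df = n - 1)
  p_value   <- pchisq(statistic, df = n - 1, lower.tail = FALSE)
  list(statistic = statistic, critical = critical,
       p_value = p_value, reject = statistic > critical)
}

# Sanity check from the rounded summary statistics (n = 45).
stat_from_summary <- (45 - 1) * 73.4 / 45.5   # about 71
crit_from_summary <- qchisq(0.95, df = 44)    # about 60.5
stat_from_summary > crit_from_summary
```

The statistic rebuilt from the rounded mean and variance differs slightly from the 70.96 reported above, which is what one would expect from rounding in the inputs.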
This overdispersion can be accounted for by adopting a slightly more complicated model: the negative binomial distribution. Like the Poisson distribution, the negative binomial distribution is supported on the natural numbers, but it contains an additional dispersion parameter. We can get estimates of both parameters using the fitdistr() function from the MASS library in R. A histogram summary of the data is shown below with Poisson and negative binomial probability mass function overlays.
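The fitting and plotting step might look roughly like the following. This is a sketch under assumptions: `wind_knots` is simulated stand-in data rather than the NOAA records, and I am assuming the "rate" and "dispersion" parameters correspond to the `mu` and `size` estimates that `fitdistr()` returns for the negative binomial.

```r
library(MASS)

# Simulated stand-in for the 45 wind speeds in knots (NOT the real NOAA data).
set.seed(42)
wind_knots <- rnbinom(45, size = 20, mu = 45)

# Maximum-likelihood fits of both candidate models.
fit_pois <- fitdistr(wind_knots, "Poisson")
fit_nb   <- fitdistr(wind_knots, "negative binomial")
fit_nb$estimate   # named vector with elements "size" and "mu"

# Histogram on the density scale with both fitted mass functions overlaid.
k <- seq(0, max(wind_knots) + 10)
hist(wind_knots, freq = FALSE, main = "High wind events (simulated)",
     xlab = "Wind speed (knots)")
lines(k, dpois(k, lambda = fit_pois$estimate["lambda"]), lty = 2)
lines(k, dnbinom(k, size = fit_nb$estimate["size"],
                 mu = fit_nb$estimate["mu"]), lty = 1)
legend("topright", legend = c("Negative binomial", "Poisson"), lty = c(1, 2))
```

A smaller `size` means more overdispersion, since the negative binomial variance in this parameterization is `mu + mu^2 / size`.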
Hence, the resulting statistical description of high wind events in King County, WA takes the form of a negative binomial distribution with rate parameter 45.1 and dispersion parameter 80.4. It should be mentioned that 9 years of climate data is woefully short of the 30-year benchmark suggested by the World Meteorological Organization. In fact, with peak wind speeds of 105 km/hour, the Columbus Day windstorm of 1962 was dramatically larger than any of the events in the NOAA dataset, which suggests our current description is lacking.
Brown, Lawrence D., and Linda H. Zhao. “A test for the Poisson distribution.” Sankhyā: The Indian Journal of Statistics, Series A (2002): 611-625.