Selina Hsuan (sh4354), Ghislaine Jumonville (gj2356), Carolyn Fish (csf2135), Camille Okonkwo (co2554), Sanika Sule (ss6692)



Motivation

The motivation behind our project is to perform a comprehensive analysis on the squirrels of Central Park in New York City. Squirrels are familiar neighbors to all New Yorkers, and Central Park provides over 800 acres of prime habitat for approximately 2,300 squirrels (roughly 2.74 squirrels per acre)1. Given the high prevalence of squirrels in New York City and their versatile role as friends, pests, vectors for diseases, such as rabies, and snacks to local predators, we aim to describe behavioral patterns and spatial distribution of squirrel sightings in Central Park.



Initial questions

Our initial questions focused on discovering trends in demographic, activity, or other behaviors of Central Park Squirrels. We were interested in the following:

  1. Are certain fur colors more prevalent than others in different areas of the park?

  2. Are there relationships between the sounds that squirrels make and the activities they are engaged in?

  3. Are certain types of squirrels more likely to engage with humans?



After exploratory analysis and better understanding the strengths (large sample) and limitations (few obvious spacial trends, short observation period) of our data, we adjusted our questions to be:

  1. Are squirrels demographically similar throughout the park?

  2. Is there an observable trend between squirrel’s age or and their primary fur color?

  3. Are squirrels observed using different areas of the park for different activities?

  4. Does the frequency of squirrels observed in trees (above ground) or on the ground differ by time of day (AM versus PM)? Does this relationship exist after adjusting for various activities?

  5. What sounds do squirrels make and how does this differ by their activity or maturity?



Data

Our data was downloaded from the Squirrel Census github. as a .csv file. The Squirrel Census relies on field observations so despite the data appearing quite “clean” upon download from the source github, we utilized some standard data cleaning checks to ensure the data was clean and ready to be used for our analysis.

The data was cleaned using RStudio/RMarkdown to meet the following criteria:

The source data set contained 3,023 observations and 31 variables. After the removal of observations with missing data, the “clean” data set used for site analyses had 2,824 observations and 36 variables. Variables for day, month, year, date_assembled, and name were added to the final data set after some manipulation was done to adjust the date variable. The name variable was created by randomly assigning squirrels names from an R generated list of first and last squirrel names. The unique_squirrel_id variable is a character variable of 11 values and characters, so it could be a mouthful when trying to consider individual observations. We also believe each squirrel deserves a fun name. The data used was collected between 2018-10-06 and 2018-10-20.

The final data used for analysis, saved here, contains the following variables:



Exploratory analysis

Our exploratory analysis is segmented into three distinct areas of focus: Squirreling the Stats (demographic), Nesting Nooks (activity), and Scurrying Secrets (communication).

In the Demographic section, we looked at the distribution of ages and fur color of squirrels in Central Park. Using plots of the distribution of juvenile and adult squirrels throughout Central Park, there were almost 8 times the number of adults to juveniles observed. Both adults and juveniles were scattered throughout the park, but the juvenile squirrels tended to cluster and were seen quite close to each other, which may be indicative of squirrel nests in the area. Looking at the distribution of squirrel fur color, the majority of Central Park’s squirrels are gray (~80%), while few cinnamon (16%) and black (4%) squirrels were observed. When comparing the distribution of primary fur color by age, we saw that these fur color trends were consistent between adults and juveniles.

When creating the Nesting Nooks page, we thought about different aspects of the data that would be interesting to explore spatially. Originally, maps of primary fur color and age were on the Nesting Nooks page, but eventually we pivoted the page to be more about activity, moving those maps to the demographics page. We were hoping to see more spatial clustering of certain activities, but when mapped the data points were randomly distributed around Central Park. Activities, such as eating, chasing, climbing, and running were observed throughout the park. Using density plots, we observed a modest cluster of increased density for foraging in the center of the park, near The Lake. Next, we looked into the frequency and distribution of squirrels observed on the ground compared to above ground plane (in a tree, on a building, etc.), but saw no spatial trends.

In our final exploratory analysis of communication and vocalizations, we wanted to see if there were any trends in the sounds made by squirrels (kuks, quaas, or moans) and the activities they were engaged in when making sound (climbing, chasing, foraging, eating, running). Using several plots and graphs, we discovered that there were no obvious relationships between the sounds that squirrels make and their interactions with humans or engagement in the aforementioned activities, but in every case “kuks” were the most common sound and moans are fairly rare. Finally, we looked into the relationship between age and vocalizations. The observed trends in vocalizations did not vary between adults and juveniles; squirrels “kuk” most often at all ages.

In summary, after performing exploratory analyses, primarily in the form of visualizations, there did not appear to be as much spatial or physical variation in Central Park squirrels as we had initially thought. Our analysis suggests the vast majority of squirrels in Central Park are adults with gray fur, and they appear to engage in all activities in all areas of the park. Most squirrels do not engage with humans, and they make the same sounds regardless of the activity they are engaged in or age. As a result, we decided to focus our project less on the difference between different groups of squirrels and more on providing descriptive visualizations, particularly in the form of maps, with interactivity features for readers to generally learn more about each squirrel, who they are, and what they do.



Additional analysis

Despite finding that Central Park squirrels do not display a lot of variation, we wanted to perform some statistical analyses to explore trends in squirrel activity more deeply. We were puzzled with what direction any statistical analysis should take due to the categorical nature of this data. The lack of continuous variables limited our options for analysis, so we opted to investigate relationships by using a Chi Square Analysis and a Logistic Regression. The Chi Square Analysis we conducted was a formal hypothesis test exploring the relationship between the shift and location variables. After tidying a subset of the data, we formed a contingency table and ran a chi square test. The goal of our logistic regression analysis was to investigate if we could predict the likelihood of a squirrel being observed on the “ground plane,” based on time of squirrel sighting (shift) and the following behavioral variables: running, chasing, climbing, eating and foraging. We created a binomial model, tidied the output to include the p-value, estimate, and odds ratios. We also did some residual plotting and visualizations to detect outliers and ensure our model was appropriate for the data.

From the Chi-Square Test, we rejected the null hypothesis and found at the 5% level of significance, there is evidence to suggest there is a significant association between the time of day and the location where squirrels are observed. From our research, squirrels exhibit different location preferences or behaviors during the morning (AM) compared to the afternoon (PM). Squirrels are known to be diurnal animals, so the results of the chi-square test serves as additional evidence. Since squirrels have been observed to be active more during dawn and dusk, this aligns with the literature on this topic. From the logistic regression, we found that the shift, running, climbing, eating, and foraging predictors in our model were statistically significant at the 5% level. The effect of chasing was not statistically significant at the 0.05 level. Squirrels spotted on the ground plane also have a greater likelihood of being spotted at night compared to during the day (OR: 1.50, p-value: 0.0003). The QQ plot showed the distribution of our model to be approximately normal. The predicted values plotted against residuals along with the violin plot showed evidence of skew for both the above ground and ground plane plots. For optimization purposes, before running a formal analysis this indicates to us we should check for collinearity and run some additional diagnostic tests to see if we can create a better model.



Discussion

Overall, we found that squirrels are relatively similar throughout the park, both demographically and in terms of behavior. There are no observable trends between squirrel ages and primary fur colors, and squirrels are observed engaging in all activities in all areas of the park. Squirrels also make the same sounds regardless of maturity, activity, or interaction with humans. One of the strengths of this data set was the large number of observations- the squirrel census has plenty of squirrels to collect data on. However, the analyses for this report were limited by the consistency in squirrel presence and behavior throughout the park. Additionally, the squirrel census uses a cross-sectional study design; all the observations were collected within a few days of October 2018. Had longitudinal data been available, seasonal variation in behavior could have been investigated which may have shown some very interesting trends in squirrel behavior over time. The 2018 Squirrel Census data alone may not have revealed variable trends, however it could be interesting to compare Squirrel Census data from years other than 2018 or in relation to other geospatial measures for Central Park, like placement of trash bins or human foot traffic patterns.

From our statistical analyses, we found that there is a significant association between the time of day and the location where squirrels are observed. Squirrels are more likely to be observed on the ground plane during the PM shift, adjusting for the activity they were observed doing. Our project also features an interactive map created with Shiny to enable viewers to explore the spatial distribution of squirrels with various combinations of attributes. The map includes landmarks and streets of Central Park so that specific locations can be recognized. Although there does not appear to be many obvious trends from our analysis of Central Park squirrels, a goal of our project is for viewers to be able to make their own insights and discoveries based on recognizable locations of the park. We hope that viewers will be able to walk away with more awareness and perhaps a new found appreciation of the nearby squirrels during their next visit to Central Park.



References

  1. https://www.centralparknyc.org/articles/getting-to-know-central-parks-squirrels
  2. https://www.thesquirrelcensus.com/data