Selina Hsuan (sh4354), Ghislaine Jumonville (gj2356), Carolyn Fish (csf2135), Camille Okonkwo (co2554), Sanika Sule (ss6692)
The motivation behind our project is to perform a comprehensive analysis of the squirrels of Central Park in New York City. Squirrels are familiar neighbors to all New Yorkers, and Central Park provides over 800 acres of prime habitat for approximately 2,300 squirrels (roughly 2.74 squirrels per acre) [1]. Given the high prevalence of squirrels in New York City and their varied roles as friends, pests, vectors for diseases such as rabies, and snacks for local predators, we aim to describe the behavioral patterns and spatial distribution of squirrel sightings in Central Park.
Our initial questions focused on discovering trends in the demographics, activities, or other behaviors of Central Park squirrels. We were interested in the following:
Are certain fur colors more prevalent than others in different areas of the park?
Are there relationships between the sounds that squirrels make and the activities they are engaged in?
Are certain types of squirrels more likely to engage with humans?
After exploratory analysis gave us a better understanding of the strengths (large sample) and limitations (few obvious spatial trends, short observation period) of our data, we adjusted our questions to be:
Are squirrels demographically similar throughout the park?
Is there an observable trend between squirrels’ age and their primary fur color?
Are squirrels observed using different areas of the park for different activities?
Does the frequency of squirrels observed in trees (above ground) or on the ground differ by time of day (AM versus PM)? Does this relationship exist after adjusting for various activities?
What sounds do squirrels make and how does this differ by their activity or maturity?
Our data was downloaded from the Squirrel Census GitHub as a .csv file. The Squirrel Census relies on field observations, so although the data appeared quite “clean” upon download from the source GitHub, we ran some standard data cleaning checks to ensure it was ready to be used for our analysis.
The data was cleaned using RStudio/RMarkdown to meet the following criteria:
Every squirrel (observation) required a unique identifier (unique_squirrel_id).
Every squirrel had complete location/geo data (hectare, x (longitude), and y (latitude)).
Every squirrel required non-missing data for age, primary fur color, location, activity, and date observed.
All continuous variables were numeric and stored as such.
Dates were saved in date format.
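The checks above can be sketched in a few lines. The cleaning for this report was done in R; the version below uses only Python's standard library and is meant as an illustration, not as the code that was run. The three sample rows are invented, the MMDDYYYY date format is an assumption about the raw file, and the activity columns are left out for brevity.

```python
import csv
import io
from datetime import datetime

# Hypothetical three-row extract: column names follow the report,
# the values are invented for illustration.
RAW = """unique_squirrel_id,hectare,x,y,age,primary_fur_color,location,date
37F-PM-1014-03,37F,-73.9561,40.7941,Adult,Gray,Ground Plane,10142018
21B-AM-1019-04,21B,-73.9688,40.7834,,Gray,Above Ground,10192018
11B-PM-1014-08,11B,-73.9757,40.7697,Juvenile,Cinnamon,Ground Plane,10142018
"""

REQUIRED = ["unique_squirrel_id", "hectare", "x", "y", "age",
            "primary_fur_color", "location", "date"]


def clean(rows):
    """Keep only rows that satisfy the cleaning criteria."""
    seen, kept = set(), []
    for row in rows:
        # Drop rows with any missing required field.
        if any(not (row.get(col) or "").strip() for col in REQUIRED):
            continue
        # Enforce one row per unique identifier.
        if row["unique_squirrel_id"] in seen:
            continue
        seen.add(row["unique_squirrel_id"])
        row = dict(row)
        # Store coordinates as numbers.
        row["x"], row["y"] = float(row["x"]), float(row["y"])
        # Store the date as a date object (MMDDYYYY is an assumed format).
        row["date"] = datetime.strptime(row["date"], "%m%d%Y").date()
        kept.append(row)
    return kept


cleaned = clean(csv.DictReader(io.StringIO(RAW)))
print(len(cleaned), "of 3 rows kept")
```

The second sample row has no age recorded, so it is the one that gets dropped.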
The source data set contained 3,023 observations and 31 variables. After the removal of observations with missing data, the “clean” data set used for site analyses had 2,824 observations and 36 variables. Variables for day, month, year, date_assembled, and name were added to the final data set after some manipulation of the date variable. The name variable was created by randomly assigning squirrels names from an R-generated list of first and last squirrel names. The unique_squirrel_id variable is a character string of 11 digits and letters, so it can be a mouthful when considering individual observations. We also believe each squirrel deserves a fun name. The data used was collected between 2018-10-06 and 2018-10-20.
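As a rough illustration of the two derived pieces described above (date parts and random names), here is a small Python sketch. It is not the R code used for the report. The name pools are hypothetical, the MMDDYYYY input format is an assumption, and the fixed seed is only there to make the assignment reproducible.

```python
import random
from datetime import datetime

# Hypothetical name pools; the report does not list the names that were used.
FIRST_NAMES = ["Hazel", "Acorn", "Maple", "Walnut"]
LAST_NAMES = ["Nutkin", "Bushytail", "Oakley", "Chestnut"]


def add_date_parts(row):
    """Split an MMDDYYYY string (assumed format) into day, month, and year."""
    parsed = datetime.strptime(row["date"], "%m%d%Y").date()
    return {**row,
            "day": parsed.day,
            "month": parsed.month,
            "year": parsed.year,
            "date_assembled": parsed.isoformat()}


def assign_names(rows, seed=2018):
    """Give each row a distinct, randomly drawn first-plus-last name."""
    rng = random.Random(seed)  # fixed seed so reruns give the same names
    pool = [f"{first} {last}" for first in FIRST_NAMES for last in LAST_NAMES]
    names = rng.sample(pool, k=len(rows))  # sampling without replacement
    return [{**row, "name": name} for row, name in zip(rows, names)]


rows = [{"unique_squirrel_id": "37F-PM-1014-03", "date": "10142018"},
        {"unique_squirrel_id": "21B-AM-1019-04", "date": "10192018"}]
named = assign_names([add_date_parts(r) for r in rows])
for r in named:
    print(r["unique_squirrel_id"], r["date_assembled"], r["name"])
```

Sampling without replacement keeps names unique, so the pool of first and last name combinations has to be at least as large as the number of squirrels.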
The final data used for analysis, saved here, contains the following variables:
x: Longitude.
y: Latitude.
unique_squirrel_id: Identification tag for each squirrel sighting. The tag is comprised of “Hectare ID” + “Shift” + “Date” + “Hectare Squirrel Number.”
hectare: ID tag, which is derived from the hectare grid used to divide and count the park area. One axis that runs predominantly north-to-south is numerical (1-42), and the axis that runs predominantly east-to-west is roman characters (A-I).
shift: Value is either “AM” or “PM,” to communicate whether the sighting session occurred in the morning or late afternoon.
date: Concatenation of the sighting session day and month.
hectare_squirrel_number: Number within the chronological sequence of squirrel sightings for a discrete sighting session.
age: Value is either “Adult” or “Juvenile.”
primary_fur_color: Value is either “Gray,” “Cinnamon” or “Black.”
highlight_fur_color: Discrete value or string values comprised of “Gray,” “Cinnamon” or “Black.”
combination_of_primary_and_highlight_color: A combination of the previous two columns; this column gives the total permutations of primary and highlight colors observed.
color_notes: Sighters occasionally added commentary on the squirrel fur conditions. These notes are provided here.
location: Value is either “Ground Plane” or “Above Ground.” Sighters were instructed to indicate the location of where the squirrel was when first sighted.
above_ground_sighter_measurement: For squirrel sightings on the ground plane, fields were populated with a value of “FALSE.”
specific_location: Sighters occasionally added commentary on the squirrel location. These notes are provided here.
running: Squirrel was seen running.
chasing: Squirrel was seen chasing.
climbing: Squirrel was seen climbing.
eating: Squirrel was seen eating.
foraging: Squirrel was seen foraging.
other_activites: Other activities.
kuks: Squirrel was heard kukking, a chirpy vocal communication used for a variety of reasons.
quaas: Squirrel was heard quaaing, an elongated vocal communication which can indicate the presence of a ground predator such as a dog.
moans: Squirrel was heard moaning, a high-pitched vocal communication which can indicate the presence of an air predator such as a hawk.
tail_flags: Squirrel was seen flagging its tail. Flagging is a whipping motion used to exaggerate squirrel’s size and confuse rivals or predators. Looks as if the squirrel is scribbling with tail into the air.
tail_twitches: Squirrel was seen twitching its tail.
approaches: Squirrel was seen approaching human, seeking food.
indifferent: Squirrel was indifferent to human presence.
runs_from: Squirrel was seen running from humans, seeing them as a threat.
other_interactions: Sighter notes on other types of interactions between squirrels and humans.
lat_long: Combined lat long.
day: Day value.
month: Month value.
year: Year value.
date_assembled: Concatenated date of sighting.
name: Unique squirrel name.
Our exploratory analysis is segmented into three distinct areas of focus: Squirreling the Stats (demographic), Nesting Nooks (activity), and Scurrying Secrets (communication).
In the Demographic section, we looked at the distribution of ages and fur colors of squirrels in Central Park. Plots of the distribution of juvenile and adult squirrels throughout Central Park showed that almost 8 times as many adults as juveniles were observed. Both adults and juveniles were scattered throughout the park, but the juvenile squirrels tended to cluster and were seen quite close to each other, which may indicate squirrel nests in the area. Looking at the distribution of squirrel fur color, the majority of Central Park’s squirrels are gray (~80%), while few cinnamon (16%) and black (4%) squirrels were observed. When comparing the distribution of primary fur color by age, we saw that these fur color trends were consistent between adults and juveniles.
When creating the Nesting Nooks page, we thought about different aspects of the data that would be interesting to explore spatially. Originally, maps of primary fur color and age were on the Nesting Nooks page, but we eventually pivoted the page toward activity and moved those maps to the demographics page. We were hoping to see more spatial clustering of certain activities, but when mapped, the data points appeared randomly distributed around Central Park. Activities such as eating, chasing, climbing, and running were observed throughout the park. Using density plots, we observed a modest cluster of increased density for foraging in the center of the park, near The Lake. Next, we looked into the frequency and distribution of squirrels observed on the ground compared to above the ground plane (in a tree, on a building, etc.), but saw no spatial trends.
In our final exploratory analysis of communication and vocalizations, we wanted to see if there were any trends in the sounds made by squirrels (kuks, quaas, or moans) and the activities they were engaged in when making sound (climbing, chasing, foraging, eating, running). Using several plots and graphs, we found no obvious relationships between the sounds that squirrels make and their interactions with humans or engagement in the aforementioned activities, but in every case “kuks” were the most common sound and moans were fairly rare. Finally, we looked into the relationship between age and vocalizations. The observed trends in vocalizations did not vary between adults and juveniles; squirrels “kuk” most often at all ages.
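The comparison of fur color by age comes down to a cross-tabulation with percentages computed within each age group. The Python sketch below shows that calculation on a handful of invented rows; it does not reproduce the report's figures, and the function name is ours.

```python
from collections import Counter

# Toy (age, primary_fur_color) pairs, invented for illustration only.
rows = [("Adult", "Gray"), ("Adult", "Gray"), ("Adult", "Cinnamon"),
        ("Adult", "Black"), ("Juvenile", "Gray"), ("Juvenile", "Cinnamon")]


def color_share_by_age(pairs):
    """Return {age: {color: percent of that age group}}."""
    counts = Counter(pairs)                    # count per (age, color)
    totals = Counter(age for age, _ in pairs)  # count per age group
    shares = {}
    for (age, color), n in counts.items():
        shares.setdefault(age, {})[color] = round(100 * n / totals[age], 1)
    return shares


shares = color_share_by_age(rows)
for age, by_color in shares.items():
    print(age, by_color)
```

Dividing by the age-group total rather than the overall total is what makes the adult and juvenile distributions comparable despite very different group sizes.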
In summary, after performing exploratory analyses, primarily in the form of visualizations, there did not appear to be as much spatial or physical variation in Central Park squirrels as we had initially thought. Our analysis suggests the vast majority of squirrels in Central Park are adults with gray fur, and they appear to engage in all activities in all areas of the park. Most squirrels do not engage with humans, and they make the same sounds regardless of the activity they are engaged in or age. As a result, we decided to focus our project less on the difference between different groups of squirrels and more on providing descriptive visualizations, particularly in the form of maps, with interactivity features for readers to generally learn more about each squirrel, who they are, and what they do.
Despite finding that Central Park squirrels do not display a lot of variation, we wanted to perform some statistical analyses to explore trends in squirrel activity more deeply. We were unsure what direction any statistical analysis should take because of the categorical nature of this data. The lack of continuous variables limited our options, so we opted to investigate relationships using a chi-square analysis and a logistic regression. The chi-square analysis was a formal hypothesis test exploring the relationship between the shift and location variables. After tidying a subset of the data, we formed a contingency table and ran a chi-square test. The goal of our logistic regression analysis was to investigate whether we could predict the likelihood of a squirrel being observed on the “ground plane” based on the time of the sighting (shift) and the following behavioral variables: running, chasing, climbing, eating, and foraging. We fit a binomial model and tidied the output to include the estimates, p-values, and odds ratios. We also plotted residuals and produced other visualizations to detect outliers and check that our model was appropriate for the data.
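To make the chi-square step concrete, the Python sketch below computes Pearson's test of independence for a 2x2 shift-by-location table by hand. The counts are hypothetical and do not come from the census, and the function name is ours. No continuity correction is applied here; R's chisq.test applies one to 2x2 tables by default, so its output can differ slightly.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2x2 table of counts.

    Returns (statistic, p_value). No continuity correction is applied.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    # With 1 degree of freedom, P(X > stat) = erfc(sqrt(stat / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))


# Hypothetical counts: rows are shift (AM, PM),
# columns are location (Above Ground, Ground Plane).
counts = [[40, 60], [30, 70]]
stat, p = chi_square_2x2(counts)
print(f"chi-square = {stat:.3f}, p = {p:.4f}")
```

The null hypothesis is that shift and location are independent; a p-value below 0.05 would lead to rejecting it at the 5% level.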
From the chi-square test, we rejected the null hypothesis: at the 5% level of significance, there is evidence of an association between the time of day and the location where squirrels are observed. Squirrels appear to exhibit different location preferences or behaviors during the morning (AM) compared to the afternoon (PM). Squirrels are known to be diurnal animals and have been observed to be most active around dawn and dusk, so this result aligns with the literature on this topic. From the logistic regression, we found that the shift, running, climbing, eating, and foraging predictors in our model were statistically significant at the 5% level. The effect of chasing was not statistically significant at that level. Squirrels sighted during the PM shift had greater odds of being observed on the ground plane than squirrels sighted during the AM shift (OR: 1.50, p-value: 0.0003). The QQ plot showed the residuals of our model to be approximately normally distributed. The plot of residuals against predicted values, along with the violin plot, showed evidence of skew for both the above ground and ground plane groups. Before running a more formal analysis, this indicates that we should check for collinearity and run additional diagnostic tests to see whether we can build a better model.
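The link between a logistic regression coefficient and an odds ratio can be shown with a deliberately reduced model. The Python sketch below fits logit(p) = b0 + b1 * shift by Newton-Raphson on hypothetical grouped counts, then exponentiates b1 to get the odds ratio. It leaves out the activity predictors used in the report, so it is not the reported model and will not reproduce the OR of 1.50. The function name and the counts are assumptions made for illustration.

```python
import math


def fit_logistic_binary(groups, iterations=25):
    """Fit logit(p) = b0 + b1 * x by Newton-Raphson.

    groups: list of (x, successes, trials) with x in {0, 1}.
    Returns (b0, b1).
    """
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, successes, trials in groups:
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            resid = successes - trials * p   # gradient contribution
            w = trials * p * (1.0 - p)       # information weight
            g0 += resid
            g1 += x * resid
            h00 += w
            h01 += w * x
            h11 += w * x * x
        # Solve the 2x2 system H * step = g with the explicit inverse.
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Hypothetical counts: x = 0 is the AM shift, x = 1 is the PM shift;
# a "success" is a sighting on the ground plane.
groups = [(0, 60, 100), (1, 70, 100)]
b0, b1 = fit_logistic_binary(groups)
odds_ratio = math.exp(b1)
print(f"b1 = {b1:.4f}, odds ratio (PM vs AM) = {odds_ratio:.4f}")
```

With a single binary predictor, the fitted odds ratio equals the cross-product ratio of the 2x2 table, which gives a quick hand check. With several predictors, as in the report, each odds ratio is adjusted for the others and has to come from the full fit.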
Overall, we found that squirrels are relatively similar throughout the park, both demographically and in terms of behavior. There are no observable trends between squirrel ages and primary fur colors, and squirrels are observed engaging in all activities in all areas of the park. Squirrels also make the same sounds regardless of maturity, activity, or interaction with humans. One of the strengths of this data set was the large number of observations: the Squirrel Census has plenty of squirrels to collect data on. However, the analyses for this report were limited by the consistency in squirrel presence and behavior throughout the park. Additionally, the Squirrel Census uses a cross-sectional study design; all the observations were collected within about two weeks in October 2018. Had longitudinal data been available, we could have investigated seasonal variation, which may have shown some very interesting trends in squirrel behavior over time. The 2018 Squirrel Census data alone may not have revealed variable trends; however, it could be interesting to compare Squirrel Census data from years other than 2018 or to relate it to other geospatial measures for Central Park, like the placement of trash bins or human foot traffic patterns.
From our statistical analyses, we found that there is a significant association between the time of day and the location where squirrels are observed. Squirrels are more likely to be observed on the ground plane during the PM shift, adjusting for the activity they were observed doing. Our project also features an interactive map created with Shiny to enable viewers to explore the spatial distribution of squirrels with various combinations of attributes. The map includes landmarks and streets of Central Park so that specific locations can be recognized. Although there do not appear to be many obvious trends in our analysis of Central Park squirrels, a goal of our project is for viewers to be able to draw their own insights and make their own discoveries based on recognizable locations in the park. We hope that viewers will walk away with more awareness of, and perhaps a newfound appreciation for, the nearby squirrels during their next visit to Central Park.