Challenge question
From the original file, extract a new file containing the latitude and longitude values for locations with extremely high wind measurements, say greater than 300 knots. Using R, load the data and plot latitude against longitude as a scatter plot to see where the extreme measurements were recorded.
We assume the CSV has already been created.
library(tidyverse)
# load the 2015 wind data (despite the CA_ prefix, it includes other states, e.g. Alabama)
data <- read_csv("tutorial_data/data/CA_wind_2015.csv")
data %>% glimpse()
Rows: 11,048,282
Columns: 24
$ `State Code`          <chr> "01", "01", "01", "01", "01", "01", "01", "01", ~
$ `County Code`         <chr> "073", "073", "073", "073", "073", "073", "073",~
$ `Site Num`            <chr> "0023", "0023", "0023", "0023", "0023", "0023", ~
$ `Parameter Code`      <dbl> 61103, 61103, 61103, 61103, 61103, 61103, 61103,~
$ POC                   <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ~
$ Latitude              <dbl> 33.55306, 33.55306, 33.55306, 33.55306, 33.55306~
$ Longitude             <dbl> -86.815, -86.815, -86.815, -86.815, -86.815, -86~
$ Datum                 <chr> "WGS84", "WGS84", "WGS84", "WGS84", "WGS84", "WG~
$ `Parameter Name`      <chr> "Wind Speed - Resultant", "Wind Speed - Resultan~
$ `Date Local`          <date> 2015-01-01, 2015-01-01, 2015-01-01, 2015-01-01,~
$ `Time Local`          <time> 00:00:00, 01:00:00, 02:00:00, 03:00:00, 04:00:0~
$ `Date GMT`            <date> 2015-01-01, 2015-01-01, 2015-01-01, 2015-01-01,~
$ `Time GMT`            <time> 06:00:00, 07:00:00, 08:00:00, 09:00:00, 10:00:0~
$ `Sample Measurement`  <dbl> 0.3, 1.0, 0.9, 0.7, 0.3, 0.6, 0.6, 0.7, 1.0, 0.8~
$ `Units of Measure`    <chr> "Knots", "Knots", "Knots", "Knots", "Knots", "Kn~
$ MDL                   <dbl> 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1~
$ Uncertainty           <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ~
$ Qualifier             <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ~
$ `Method Type`         <chr> "Non-FRM", "Non-FRM", "Non-FRM", "Non-FRM", "Non~
$ `Method Code`         <chr> "061", "061", "061", "061", "061", "061", "061",~
$ `Method Name`         <chr> "Instrumental - Met One Sonic Anemometer Model 5~
$ `State Name`          <chr> "Alabama", "Alabama", "Alabama", "Alabama", "Ala~
$ `County Name`         <chr> "Jefferson", "Jefferson", "Jefferson", "Jefferso~
$ `Date of Last Change` <date> 2015-05-26, 2015-05-26, 2015-05-26, 2015-05-26,~
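With about 11 million rows, loading all 24 columns is memory-hungry, and the challenge only needs four of them. A minimal sketch, assuming readr 2.0 or later (which added the `col_select` argument to `read_csv()`), reads just those columns. The tiny file written to a temporary path stands in for `CA_wind_2015.csv`; its values are made up and only the column names come from the glimpse output.

```r
library(readr)

# Hypothetical stand-in for CA_wind_2015.csv: a tiny CSV using some of the real column names
toy_path <- tempfile(fileext = ".csv")
write_csv(
  data.frame(
    `State Code` = c("01", "01"),
    Latitude = c(33.55306, 33.55306),
    Longitude = c(-86.815, -86.815),
    `Sample Measurement` = c(0.3, 1.0),
    `Units of Measure` = c("Knots", "Knots"),
    check.names = FALSE
  ),
  toy_path
)

# Read only the columns needed for the challenge (col_select requires readr >= 2.0)
slim <- read_csv(
  toy_path,
  col_select = c(Latitude, Longitude, `Sample Measurement`, `Units of Measure`),
  show_col_types = FALSE
)
print(names(slim))
```

On the real file, the same `col_select` call would replace the plain `read_csv()` above and skip the 20 unused columns at parse time.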
data %>% select(`Units of Measure`) %>% unique()
Units of Measure |
---|
<chr> |
Knots |
Degrees Compass |
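`unique()` shows which units occur but not how many rows use each one. `dplyr::count()` reports both. The sketch below runs it on a small made-up tibble (the values are illustrative, not taken from the real file), since the point is the call rather than the numbers.

```r
library(dplyr)
library(tibble)

# Made-up rows mixing wind speed (Knots) and wind direction (Degrees Compass)
toy <- tibble(
  `Sample Measurement` = c(0.3, 1.0, 310, 45, 0.9),
  `Units of Measure` = c("Knots", "Knots", "Degrees Compass", "Degrees Compass", "Knots")
)

# One row per unit, with the number of measurements in column n
unit_counts <- toy %>% count(`Units of Measure`)
print(unit_counts)
```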
Long/Lat extraction
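On the command line, we did:
awk -F',' '$14>300 || NR==1 {print $6,$7,$14}'
Here $6, $7 and $14 are the Latitude, Longitude and Sample Measurement columns, and NR==1 keeps the header row. The filter does not check the units column, though, so wind-direction readings in Degrees Compass above 300 pass as well, and print separates the fields with spaces unless -v OFS=',' is added. Doing the extraction in R may be quicker and makes it easy to keep only Knots: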
data.longlat <- data %>%
  # keep only wind speeds (Knots) above the 300-knot threshold from the challenge
  filter(`Sample Measurement` > 300 & `Units of Measure` == "Knots") %>%
  select(c("Latitude", "Longitude", "Sample Measurement", "Units of Measure"))
# ensure that there are only Knots
data.longlat %>% select("Units of Measure") %>% unique()
Units of Measure |
---|
<chr> |
Knots |
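The challenge asks for a new file, which the steps above do not write out. The sketch below runs the whole extraction on a made-up tibble (column names follow the glimpse output; the values are invented). It shows that a threshold alone also keeps a wind-direction reading above 300 degrees, that adding the unit check removes it, and that `readr::write_csv()` then writes the locations to a new CSV. The output path is a temporary file standing in for whatever file name you choose.

```r
library(dplyr)
library(readr)
library(tibble)

# Made-up rows; only the column names come from the real data set
toy <- tibble(
  Latitude = c(33.55, 34.05, 36.77, 37.33),
  Longitude = c(-86.82, -118.24, -119.42, -121.89),
  `Sample Measurement` = c(0.3, 350, 310, 420),
  `Units of Measure` = c("Knots", "Knots", "Degrees Compass", "Knots")
)

# Threshold only (like the awk one-liner): the 310-degree direction reading slips in
threshold_only <- toy %>% filter(`Sample Measurement` > 300)

# Threshold plus unit check: only wind speeds above 300 knots remain
extreme <- toy %>%
  filter(`Sample Measurement` > 300, `Units of Measure` == "Knots") %>%
  select(Latitude, Longitude, `Sample Measurement`)

# Write the extracted locations to a new CSV (temporary path for illustration)
out_path <- tempfile(fileext = ".csv")
write_csv(extreme, out_path)

# Load the new file back, as the challenge asks
reloaded <- read_csv(out_path, show_col_types = FALSE)
print(reloaded)
```

Here `threshold_only` has three rows and `extreme` has two, so the unit check is what separates speeds from directions.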
Plot longitude vs latitude
library(ggplot2)
data.longlat %>%
  ggplot(aes(
    x = Longitude,
    y = Latitude,
    colour = `Sample Measurement`
  )) +
  geom_point() +
  # geom_smooth(method = glm) +
  labs(
    title = "Wind Speed Measurement Locations, 2015",
    x = "Longitude",
    y = "Latitude",
    colour = "Wind Speed (knots)"
  )
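Plotting longitude against latitude on equal Cartesian axes stretches or squashes the map depending on latitude. One option, sketched below on made-up points, is ggplot2's `coord_quickmap()`, which sets the aspect ratio to approximate a map projection (its documentation notes it works best for smaller areas near the equator). `ggsave()` then writes the figure to a file; the temporary PDF path and the 6 x 4 inch size are arbitrary choices for illustration.

```r
library(ggplot2)

# Made-up extreme-wind locations; values are illustrative only
pts <- data.frame(
  Latitude = c(33.55, 34.05, 37.33),
  Longitude = c(-86.82, -118.24, -121.89),
  Speed = c(350, 420, 305)
)

p <- ggplot(pts, aes(x = Longitude, y = Latitude, colour = Speed)) +
  geom_point() +
  # Approximate map aspect ratio for longitude/latitude data
  coord_quickmap() +
  labs(colour = "Wind Speed (knots)")

# Save to a temporary PDF so the example leaves nothing in the working directory
out_file <- tempfile(fileext = ".pdf")
ggsave(out_file, plot = p, width = 6, height = 4)
```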