---
output: github_document
editor_options:
chunk_output_type: console
---
<!-- README.md is generated from README.Rmd. Please edit the README.Rmd file -->
```{r, echo = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-"
)
```
<!-- badges: start -->
[](https://www.repostatus.org/#active)
[](https://github.com/CDCgov/EPHTrackR/actions?workflow=R-CMD-check)
[](https://CRAN.R-project.org/package=sword)
[](https://lifecycle.r-lib.org/articles/stages.html#stable)
<!-- badges: end -->
<!-- [](https://travis-ci.org/cont-limno/LAGOSNE) [](https://cran.r-project.org/package=LAGOSNE) [](https://cran.r-project.org/package=LAGOSNE)
[](https://codecov.io/gh/tidyverse/dplyr?branch=master)-->
# EPHTrackR <img src="man/figures/CDC_Tracking_Combined.jpg" align="right" height=140/>
The `EPHTrackR` package provides an R interface to access and download publicly available data stored on the [CDC National Environmental Public Health Tracking Network](https://www.cdc.gov/nceh/tracking/about.htm) (Tracking Network) via a connection with the [Tracking Network Data API](https://ephtracking.cdc.gov/apihelp). A detailed user guide describing available API calls and associated outputs can be found at that link. Associated metadata for downloaded measure data can be found on the Tracking Network [Indicators and Data](https://ephtracking.cdc.gov/searchMetadata) page. Users might find it easier to view the online [Data Explorer](https://ephtracking.cdc.gov/DataExplorer/) to get a sense of the available datasets before or while using this package.
The purpose of the Tracking Network is to deliver information and data to protect the nation from health issues arising from or directly related to environmental factors. At the local, state, and national levels, the Tracking Network relies on a variety of people and information systems to deliver a core set of health, exposure, and hazards data; information summaries; and tools to enable analysis, visualization, and reporting of insights drawn from data.
**Measures** are the core data product of the Tracking Network. Measures are organized into **indicators**, groups of highly related measures, and **content areas**, the highest level of categorization containing all the indicators related to a broad topic of interest. Using Tracking Network data, users can create customized maps, tables, and graphs of local, state, and national data. The Tracking Network contains data covering several focal areas: <br>
— Health effects of exposures, such as asthma <br>
— Hazards in the environment, such as air pollution <br>
— Climate, such as extreme heat events <br>
— Community design, such as access to parks <br>
— Lifestyle risk factors, such as smoking <br>
— Population characteristics, such as age and income <br>
## Installation
```{r install,eval=FALSE, echo=T}
#install development version from Github
#install devtools first if you haven't previously - install.packages("devtools")
devtools::install_github("CDCgov/EPHTrackR",
dependencies = TRUE)
```
## Load package
```{r load_library, eval=T}
library(EPHTrackR)
```
## Saving a Tracking API token
The `tracking_api_token()` function adds a Tracking API token to your `.Renviron` file so it can be called securely without being stored in your code. After you have run this function, the token will be used automatically by all the other functions in this package. Leaving the default argument `install=T` ensures that the token is available in future R sessions. A token can be requested by emailing trackingsupport(AT)cdc.gov. A token is not required to use this package, but you will experience less throttling and better API support if you have one. Further information is available at https://ephtracking.cdc.gov/apihelp.
```{r save_token, eval=FALSE, echo=TRUE}
tracking_api_token("XXXXXXXXXXXXXXXXX",
install=T)
#After you run this function, reload your environment so you can use the token without restarting R.
readRenviron("~/.Renviron")
#Token can be viewed by running:
Sys.getenv("TRACKING_API_TOKEN")
```
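Before making many requests, it can help to confirm that a token is visible to the current session. The following is a minimal sketch in base R. `has_tracking_token()` is a hypothetical helper name, not part of `EPHTrackR`, and it assumes the token is stored under the `TRACKING_API_TOKEN` environment variable shown above.

```{r check_token, eval=FALSE}
# Hypothetical helper (not part of EPHTrackR): TRUE if a non-empty token
# is set in the TRACKING_API_TOKEN environment variable.
has_tracking_token <- function(var = "TRACKING_API_TOKEN") {
  nzchar(Sys.getenv(var, unset = ""))
}

if (!has_tracking_token()) {
  message("No Tracking API token found; requests may be throttled.")
}
```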
## Full measure inventory
Running the `list_measures()` function without any additional inputs retrieves the latest inventory of content areas, indicators, and measures from the Tracking Network Data API.
```{r full_data, eval=FALSE, echo=TRUE}
measures_inventory <- list_measures()
View(measures_inventory)
```
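The full inventory can be long, so a keyword search may be a convenient way to narrow it down. The sketch below uses only base R. `search_inventory()` is a hypothetical helper, not part of `EPHTrackR`, and the toy data frame stands in for the output of `list_measures()`, whose actual column names may differ.

```{r search_inventory, eval=FALSE}
# Hypothetical helper (not part of EPHTrackR): keep rows in which any
# character column matches `pattern`, ignoring case.
search_inventory <- function(inventory, pattern) {
  is_text <- vapply(inventory, is.character, logical(1))
  # One logical vector per character column, combined with OR.
  hits <- Reduce(
    `|`,
    lapply(inventory[is_text], grepl, pattern = pattern, ignore.case = TRUE),
    rep(FALSE, nrow(inventory))
  )
  inventory[hits, , drop = FALSE]
}

# Toy stand-in for the output of list_measures(); column names are assumed.
toy_inventory <- data.frame(
  content_area = c("Heat & Heat-related Illness (HRI)", "Toy content area"),
  measure = c("Annual Number of Extreme Heat Days from May to September",
              "Toy measure"),
  stringsAsFactors = FALSE
)

search_inventory(toy_inventory, "heat")
```

With a real inventory, the call would be `search_inventory(measures_inventory, "heat")`.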
## Viewing content area, indicator, and measure names
Each content area, indicator, and measure has a full name and a unique identifier. You can use this information to determine what data are available on the Tracking Network and make appropriate calls to download the data (see below).
#### List content areas available
```{r content areas, eval=T}
ca_df <- list_content_areas()
head(ca_df)
```
#### List indicators in specified content area(s)
```{r indicators, eval=T}
ind_df <- list_indicators(content_area = "Heat & Heat-related Illness (HRI)")
head(ind_df)
```
#### List measures in specified indicator(s) and/or content area(s)
```{r measures, eval=T}
meas_df <- list_measures(content_area = 36)
head(meas_df)
```
## Viewing available geographic and temporal types and items for specified measures
Measures on the Tracking Network vary in their geographic resolution (e.g., state, county), geographic extent (e.g., Massachusetts, Michigan, Pennsylvania, California, Georgia), temporal resolution (e.g., year, month), and temporal extent (e.g., 2000-2010, 2010-2020). The functions in this section show the geographies and temporal periods for which data are available, so you can make more targeted data downloads.
#### List geographic types available for specified measures
Measures are typically available at the state, county, or census tract level.
```{r geographic types, eval = T}
geog_type_df <- list_GeographicTypes(measure= "Number of Square Miles within FEMA Designated Flood Hazard Area")
head(geog_type_df[[1]])
```
#### List geographic items available for specified measures
This function identifies the particular geographic items (e.g., Alabama) that are available for a specified measure. It returns both the lowest-level geographic items available (e.g., a county or census tract) and the parent geographic item (e.g., a state) that contains them.
```{r geographic items, eval = T}
geog_item_df <- list_GeographicItems(measure= "Annual Number of Extreme Heat Days from May to September",
geo_type="County")
head(geog_item_df[[1]])
```
#### List temporal items available for specified measures
Measures are typically available at an annual scale, but can also be daily, weekly, or monthly. This function identifies the particular years, months, days, etc. that are available for the specified measure (e.g., 2001, Aug 2020).
```{r temporal items, eval = T}
temp_df <- list_TemporalItems(measure= "Annual Number of Extreme Heat Days from May to September")
head(temp_df[[1]])
```
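Once the available temporal items are known, a requested range can be checked against them before any data are downloaded. This is a minimal sketch in base R. `missing_temporal_items()` is a hypothetical helper, not part of `EPHTrackR`, and `available_years` holds toy values standing in for a column of the `list_TemporalItems()` output.

```{r check_temporal_items, eval=FALSE}
# Hypothetical helper (not part of EPHTrackR): report which requested
# temporal items are absent from the available ones.
missing_temporal_items <- function(requested, available) {
  setdiff(as.character(requested), as.character(available))
}

# Toy values; in practice, take these from the list_TemporalItems() output.
available_years <- 2010:2016

missing_temporal_items(2014:2018, available_years)
#> [1] "2017" "2018"
```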
## Viewing available Advanced Options for data stratification
In addition to geographic and temporal specifications, some measures on the Tracking Network have a set of **Advanced Options** that allow users to access data stratified by other variables. For instance, data on asthma hospitalizations can be broken down by age and/or gender.
#### List available stratification levels for a measure
**Advanced Options** might only be available at a particular geographic scale (e.g., age-breakdown of asthma hospitalizations is only available at the state level). Therefore, results showing available stratification levels always include the geography type. The output of this function is a list with a separate element for each geography type available (e.g., state, county) for the specified measure. Each row in the data frame elements of the list shows a stratification available for the measure.
```{r stratification levels, eval = T}
strat_df <- list_StratificationLevels(measure=99)
head(strat_df[[1]])
```
#### List available stratification types for a measure
If you'd like to query Tracking data by a specific stratification level (e.g., you'd like data for just males), run the `list_StratificationTypes()` function to identify the internal name for the stratification and the code for the stratification level of interest (e.g., 1 for male, 2 for female). The internal name is in the ColumnName column of the function output, and the codes are in the nested list in the stratificationItem column. Refer back to these when constructing queries with the `get_data()` function.
```{r stratification types, eval = T}
strat_df <- list_StratificationTypes(measure=99,
geo_type="State")
strat_df[[1]]
#viewing the nested list in the stratificationItem column of the function output to identify stratification level codes
strat_df[[1]]$stratificationItem
```
## Accessing Tracking Network data
You can use the information from the functions listed above to request specific data from the Tracking Network Data API. Be careful when making queries. If you request data that include many years and/or data at a fine geographic scale, the dataset could be very large and take a long time to download, and the request could crash your R session or bog down the API.
We recommend that you include only one measure and one stratification level in a data query. Many geographies and temporal periods may be submitted. The function will likely still work if vectors of multiple measures or stratification levels are submitted, but the resulting object will be a multi-element list and distinguishing the element that applies to a particular measure or stratification might be difficult. The output of function calls with a single measure and stratification level is a list with one element containing the relevant data frame.
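One way to follow this recommendation for several measures is to issue one query per measure and keep the results in a list named by measure ID. The sketch below is illustrative only. `fetch_each_measure()` and `stub_fetch()` are hypothetical names, not part of `EPHTrackR`, and the measure ID `100` is a placeholder. A stub replaces the real API call so the example runs offline.

```{r one_measure_per_query, eval=FALSE}
# Hypothetical helper (not part of EPHTrackR): call `fetch` once per
# measure and return a list named by measure ID.
fetch_each_measure <- function(measures, fetch) {
  results <- lapply(measures, fetch)
  names(results) <- as.character(measures)
  results
}

# Stub so the sketch runs without network access. In practice `fetch`
# might be: function(m) get_data(measure = m, strat_level = "ST")[[1]]
stub_fetch <- function(m) data.frame(measure = m, value = NA_real_)

results <- fetch_each_measure(c(99, 100), stub_fetch)
names(results)
#> [1] "99"  "100"
```

Naming each element by its measure ID avoids having to guess which list element belongs to which measure.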
#### Downloading state-level measure data
```{r data download simple, eval = T}
data_st<-get_data(measure=99,
strat_level = "ST")
head(data_st[[1]])
```
#### Downloading measure data with advanced options
The advanced stratification options are submitted via the `strat_level` argument and the subset of stratification levels derived from the `list_StratificationTypes()` function can be submitted with the `stratItems` argument.
```{r data download advanced, eval = T}
data_strat.item <- get_data(measure=99,
strat_level = "ST_AG_GN",
temporalItems = c(2005),
geoItems = "Arizona",
stratItems = c("GenderId=1","AgeBandId=3"))
head(data_strat.item[[1]])
```
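The `stratItems` strings used above follow a `ColumnName=code` pattern. A small helper can assemble them from the values found with `list_StratificationTypes()`. This is a sketch under that assumption. `make_strat_items()` is a hypothetical name, not part of `EPHTrackR`.

```{r build_strat_items, eval=FALSE}
# Hypothetical helper (not part of EPHTrackR): build "ColumnName=code"
# strings from named arguments.
make_strat_items <- function(...) {
  codes <- list(...)
  stopifnot(
    length(codes) > 0,
    !is.null(names(codes)),
    all(nzchar(names(codes)))
  )
  unlist(
    Map(function(name, values) paste0(name, "=", values), names(codes), codes),
    use.names = FALSE
  )
}

make_strat_items(GenderId = 1, AgeBandId = 3)
#> [1] "GenderId=1"  "AgeBandId=3"
```

The resulting character vector has the same form as the `stratItems` value passed to `get_data()` in the chunk above.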
#### Downloading measure data with advanced options and specific geographies
You can submit state names, abbreviations, or state FIPS codes with the `geoItems` argument. This will return data for either the specified state(s) or all the sub-state geographies within the state, depending on the geography type of the data (e.g., state, county). Individual sub-state geographies can also be submitted by FIPS code or name with this argument. For measures with sub-state geographies, a mix of state and sub-state entries can be submitted.
```{r data download monthly, eval = T}
data_mo.geo<-get_data(measure=99,
strat_level = "State x County", #this can be written by name or ID (i.e., "ST_CT")
geoItems = c("Massachusetts",
"Alameda, CA", #county name should not include word 'county' and must have state
1001))
head(data_mo.geo[[1]])
```
#### Downloading measure data with specific geographies and temporal periods selected
```{r data download the works, eval = T}
data_tpm.geo <- get_data(measure=99,
strat_level = "ST",
geoItems = c("CO", "ME", "FL"),
temporalItems = c(2014:2018))
head(data_tpm.geo[[1]])
```
## Public Domain Standard Notice
This repository constitutes a work of the United States Government and is not
subject to domestic copyright protection under 17 USC § 105. This repository is in
the public domain within the United States, and copyright and related rights in
the work worldwide are waived through the [CC0 1.0 Universal public domain dedication](https://creativecommons.org/publicdomain/zero/1.0/).
All contributions to this repository will be released under the CC0 dedication. By
submitting a pull request you are agreeing to comply with this waiver of
copyright interest.
## License Standard Notice
The repository utilizes code licensed under the terms of the Apache Software
License and therefore is licensed under ASL v2 or later.
This source code in this repository is free: you can redistribute it and/or modify it under
the terms of the Apache Software License version 2, or (at your option) any
later version.
This source code in this repository is distributed in the hope that it will be useful, but WITHOUT ANY
WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE. See the Apache Software License for more details.
You should have received a copy of the Apache Software License along with this
program. If not, see http://www.apache.org/licenses/LICENSE-2.0.html
The source code forked from other open source projects will inherit its license.
## Privacy Standard Notice
This repository contains only non-sensitive, publicly available data and
information. All material and community participation is covered by the
[Disclaimer](https://github.com/CDCgov/EPHTrackR/blob/master/DISCLAIMER.md)
and [Code of Conduct](https://github.com/CDCgov/EPHTrackR/blob/master/code-of-conduct.md).
For more information about CDC's privacy policy, please visit [http://www.cdc.gov/other/privacy.html](https://www.cdc.gov/other/privacy.html).
## Contributing Standard Notice
Anyone is encouraged to contribute to the repository by [forking](https://help.github.com/articles/fork-a-repo)
and submitting a pull request. (If you are new to GitHub, you might start with a
[basic tutorial](https://help.github.com/articles/set-up-git).) By contributing
to this project, you grant a world-wide, royalty-free, perpetual, irrevocable,
non-exclusive, transferable license to all users under the terms of the
[Apache Software License v2](http://www.apache.org/licenses/LICENSE-2.0.html) or
later.
All comments, messages, pull requests, and other submissions received through
CDC including this GitHub page may be subject to applicable federal law, including but not limited to the Federal Records Act, and may be archived. Learn more at [http://www.cdc.gov/other/privacy.html](http://www.cdc.gov/other/privacy.html).
## Records Management Standard Notice
This repository is not a source of government records, but is a copy to increase
collaboration and collaborative potential. All government records will be
published through the [CDC web site](http://www.cdc.gov).