Local geocoding database? #39

Open
mikedolanfliss opened this issue Jul 20, 2020 · 2 comments
Labels
enhancement New feature or request

Comments

@mikedolanfliss

(Ported from another issue):

While I'm suggesting things :), there was once headway in the data-science toolkit (rdstk) on using a local master address database to geocode. That is much more sustainable for big calls when someone can, for instance, download all geocoded addresses in a state or county and work off that (the way SAS or ArcGIS can). To me that feels like a missing keystone of a complete geocoding package, though it would be a sizable lift. I'll be watching this package, and if there are ways I can contribute (ahem, after COVID, since I'm an overworked epidemiologist at the moment) I'd love to. Again, thanks for the work.

I've at times written my own somewhat lazy/hacky string match against a known census of addresses in a location, but being able to call geo() on a local geodatabase would make this package indispensable for workers with access to local spatial databases but few API credentials (since some of the open-source / free geocoders have much lower capacity than Google or Geocodio).
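As a rough illustration of the string-match idea above, here is a minimal Python sketch (not part of this package, and not how rdstk did it). The table contents, the coordinates, the abbreviation map, and the function names `normalize` and `geocode_local` are all made-up placeholders; a real master address file would need far more careful standardization.

```python
import re

# Hypothetical local "master address" table: normalized address -> (lat, lon).
# The coordinates are placeholders, not real geocodes. In practice this table
# might be loaded from a county or state address file.
LOCAL_ADDRESSES = {
    "1 main st springfield": (39.7990, -89.6440),
    "20 oak ave springfield": (39.8012, -89.6501),
}

# Small illustrative abbreviation map; real address standardization needs more.
ABBREVIATIONS = {"street": "st", "avenue": "ave", "road": "rd"}


def normalize(address):
    """Lowercase, strip punctuation, and abbreviate common street suffixes."""
    tokens = re.sub(r"[^\w\s]", " ", address.lower()).split()
    return " ".join(ABBREVIATIONS.get(tok, tok) for tok in tokens)


def geocode_local(address, table=LOCAL_ADDRESSES):
    """Return (lat, lon) from the local table, or None if there is no match."""
    return table.get(normalize(address))
```

Addresses that return `None` could then be passed on to an external geocoder, so the API is only used for the leftovers.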

Again, great work. Just logging some ideas for the future!

jessecambon added the "enhancement" (New feature or request) label on Jul 20, 2020
@izahn

izahn commented May 27, 2021

https://degauss.org/ might be useful, either directly or for inspiration!

@ottothecow
Contributor

ottothecow commented Jan 17, 2022

A similar/related concept is maintaining a local geocoding database consisting of addresses you have already geocoded.

I've rolled my own simple versions of this for my own work, and I've found it very useful in several scenarios:

  1. Using a paid (or rate-limited) service. You don't have to repeat geocoding on addresses that you have previously geocoded. E.g. say you get a new list every week, but sometimes that list has addresses that were also on last week's list.
  2. Testing/modifying code. I tend to avoid building geocoding into programs and instead keep it as a standalone script, because I don't want to hit external APIs every time I run a program that I have modified in a way that doesn't change the geocoding results. E.g. instead of writing a simple program that geocodes some addresses and plots them on a map, I would write one program that prepares the data for geocoding, one that sends it out for geocoding, and one that reads the results in and plots them. Sometimes this makes the overall program flow awkward, since data cleaning that impacts the geocoding must be done first, while data cleaning and filtering that affects only the plot has to be separated out and delayed until the end.
  3. Large amounts of repeated geocoding (in unison with scenario 1). If you need to re-run large lists of addresses that overlap heavily with previous work, then even if the API is free or you aren't worried about cost, it will be significantly faster to pull in locally cached results before jumping to an external geocoder.

The cache could store the timestamp of each geocode and take a parameter for how often to refresh the result, plus an option to force replacement of cached results.
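To make the timestamp/refresh/force idea concrete, here is a small Python sketch. It is only an illustration under assumed names: `GeocodeCache`, `max_age_seconds`, `force`, and the `geocoder` callable are hypothetical and do not correspond to anything in tidygeocoder.

```python
import time


class GeocodeCache:
    """In-memory cache keyed on the exact input address string."""

    def __init__(self, geocoder, max_age_seconds=None, clock=time.time):
        self.geocoder = geocoder                # callable: address -> result
        self.max_age_seconds = max_age_seconds  # None: cached results never expire
        self.clock = clock                      # injectable so it can be tested
        self._store = {}                        # address -> (result, timestamp)

    def lookup(self, address, force=False):
        """Return a cached result, or call the geocoder if missing, stale, or forced."""
        entry = self._store.get(address)
        if entry is not None and not force:
            result, stamp = entry
            fresh = (self.max_age_seconds is None
                     or self.clock() - stamp <= self.max_age_seconds)
            if fresh:
                return result
        # Cache miss, stale entry, or forced refresh: hit the external geocoder.
        result = self.geocoder(address)
        self._store[address] = (result, self.clock())
        return result
```

With something like this in place, a plotting script can call `lookup()` freely during development, because only new or expired addresses reach the external service.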

What I don't know is the most efficient way to do this within the R/tidygeocoder world. When I've rolled my own, I typically have been making my own API requests directly and simply storing the results in whatever format is most convenient in the language I am using. Caching was done based on exact inputs (so "1 Main Street" and "1 Main St." both get their own result), but that takes care of most of the repetition issues.
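For what it's worth, one way to persist an exact-input cache between runs is a small SQLite table. This is a Python sketch using only the standard library; the table name `geocode_cache`, its columns, and the helper names are assumptions for illustration, not an existing format.

```python
import sqlite3
import time


def open_cache(path=":memory:"):
    """Open (or create) the cache; pass a file path to keep it between runs."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS geocode_cache ("
        " address TEXT PRIMARY KEY, lat REAL, lon REAL, geocoded_at REAL)"
    )
    return conn


def save_result(conn, address, lat, lon):
    """Store or overwrite the result for this exact address string."""
    conn.execute(
        "INSERT OR REPLACE INTO geocode_cache VALUES (?, ?, ?, ?)",
        (address, lat, lon, time.time()),
    )
    conn.commit()


def load_result(conn, address):
    """Return (lat, lon) for an exact string match, or None if not cached."""
    return conn.execute(
        "SELECT lat, lon FROM geocode_cache WHERE address = ?", (address,)
    ).fetchone()
```

Because the key is the raw input string, "1 Main Street" and "1 Main St." remain separate rows, which matches the exact-input behaviour described above.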


4 participants