Skip to content

bigquery

Patrick Thoral edited this page Apr 27, 2022 · 1 revision

AmsterdamUMCdb on Google BigQuery

Introduction

Initially we released AmsterdamUMCdb as downloadable csv files that could be obtained by applying for access from Amsterdam Medical Data Science. However, many individuals and organisations were struggling with setting up an initial database environment that was performant enough for basic data exploration, let alone modelling. In addition, to allow temporarily access for courses and datathons, setting up on premise environments was cumbersome. For this reason, the most recent version of AmsterdamUMCdb is also available from Google BigQuery for authorised user with Google Colaboratory as the main coding environment. This removes the necessity of downloading AmsterdamUMCdb, setting up a database system and installing a coding environment.

Before you begin

Unless you have enrolled in a course or datathon that uses AmsterdamUMCdb, you will need to request access by going to the Amsterdam Medical Data Science website.

Important: Ensure you have a working Google account and verify that the e-mail address used when applying for access has been associated with this account. If you already have a Google account, you can add secondary e-mail addresses here, or alternatively create another Google account using the e-mail address used during application for access.

If you don't have any experience in using Jupyter notebooks and/or Python, it is recommended to familiarize yourself with the basics.

Running Colab with BigQuery

Open Colab with the getting_started notebook from the official AmsterdamUMCdb GitHub repository: Open In Colab

Differences compared to downloadable version

Unvalidated device data

In contrast to the downloadable version of AmsterdamUMCdb, we have moved unvalidated (device) data (registeredby IS NULL) from the numericitems and listitems table to separate tables (numericitems_unvalidated and listitems_unvalidated respectively) to reduce query costs and memory consumption when unvalidated device data is not needed. However, depending on the chosen solution, you may need to combine the data from both tables.

Google SQL Dialects

The example Jupyter notebooks were developed using the PostgreSQL database system as the backend and thus the SQL code used in the notebooks use the PostgreSQL dialect to query the database. However, BigQuery uses either Google Standard SQL or Legacy SQL. While simple queries will run unmodified, more complex queries often require modifying the SQL code.

Frequently asked questions

General remarks

Please follow the instructions in the getting_started notebook carefully: Open In Colab . If you make a mistake, depending on the API used to access Google BigQuery, it may be easier to fix mistakes by restarting from the beginning by selecting Runtime > Restart Runtime (Ctrl+M .).

Error: OSError: Project was not passed and could not be determined from the environment

Make sure to enter your personal or corporate Google Cloud Project ID into the provided text box AND run the cell to set the PROJECT_ID variable. The Project ID will be charged by Google depending on the amount of data processed.

ERROR: 403 Access Denied: Table table_name: User does not have permission to query table table_name

Most likely you are not using the Google account associated with the e-mail address while applying for access. Verify that the e-mail address used when applying for access has been associated with this account. If you already have a Google account, you can add secondary e-mail addresses here, or alternatively create another Google account using the e-mail address used during registration. In addition, unless you have applied for access following the regular procedure on Amsterdam Medical Data Science, access will be revoked after the course or datathon.

Error: The project your_id has not enabled BigQuery

Please make sure you are actually using the Google Project ID, and not the Project Number or Project Name. For more information, see Locate the Project ID