Setting up network access control for BigQuery

Ben Herzberg
4 min read · Aug 24, 2020

So you’re using BigQuery as your cloud data warehouse. To reduce risk, you want to restrict access to it so that only specific IP addresses or ranges can connect.

Why apply network access controls on top of BigQuery?

Some examples of situations where you would want to apply network access control over BigQuery:

  • You want to allow access to BigQuery only from your corporate IP addresses, so that employees cannot access the data unless they are on the corporate network.
  • You want to allow access only to your production applications that will be communicating with BigQuery.
  • You’re doing an audit and would like to show that you have network access controls on top of your data warehouse.

The main problem is that, unlike an on-premises data warehouse or even a Redshift cluster, which has its own network connection, BigQuery is accessed through Google APIs, which are publicly accessible.

How to set up BigQuery network access control

If we can’t block network access to the API itself, we need some way to reject API calls based on where they come from. That’s where Google’s VPC (Virtual Private Cloud) Service Controls comes in handy. With it, you can limit network access and set other policies that restrict access to BigQuery (as well as other multi-tenant services). Once it is in place, all access to BigQuery is subject to the policies you set, whether it comes through the CLI, the API, the Cloud Console, or any other route.
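To make the idea concrete, here is a minimal sketch of the kind of decision such a policy makes: a request to a restricted service is allowed only if its source IP falls inside an approved range. This is a toy model for illustration, not Google's implementation. The `Perimeter` class, the `is_request_allowed` function, and the example IP ranges are all hypothetical, and a real perimeter also takes into account things this sketch ignores, such as whether the caller sits inside the perimeter.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network


@dataclass(frozen=True)
class Perimeter:
    """Toy stand-in for a service perimeter with an IP-based access level."""
    restricted_services: frozenset  # API hostnames protected by the perimeter
    allowed_networks: tuple         # ip_network objects that may reach them


def is_request_allowed(perimeter, service, source_ip):
    """Return True if this toy policy would let the request through."""
    # Services outside the perimeter are not affected by the policy.
    if service not in perimeter.restricted_services:
        return True
    # Restricted services are reachable only from an allowed network.
    addr = ip_address(source_ip)
    return any(addr in net for net in perimeter.allowed_networks)


# Hypothetical values: a documentation-only IP range stands in for
# a corporate network.
perimeter = Perimeter(
    restricted_services=frozenset({"bigquery.googleapis.com"}),
    allowed_networks=(ip_network("203.0.113.0/24"),),
)

print(is_request_allowed(perimeter, "bigquery.googleapis.com", "203.0.113.7"))   # True
print(is_request_allowed(perimeter, "bigquery.googleapis.com", "198.51.100.9"))  # False
```

The point to notice is that the check depends on the service being called and the caller's address, not on the specific data being requested.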

Step-by-step guide:

These are step-by-step instructions for setting up a sample network access control policy on top of BigQuery, taken from the BigQuery security guide I wrote at Satori:

1. Go to your G Suite Admin panel and navigate to Security > Context-Aware Access > Access Levels. There, create an access level with the IP ranges you want to allow, as in the image below:

2. In Google Cloud Console, go to VPC Service Controls and create a new perimeter. I recommend starting with a dry-run perimeter first, so you can verify that you’re not blocking legitimate traffic.

3. Use the following settings in the wizard (with your own configuration :)):

Setting up your perimeter name
Choosing to restrict access around BigQuery
Choosing the ingress policy to use

4. That’s it, you’re done! You can now see violations in the log, as well as verify that it works. As an example, let’s run a query in the cloud console:

Getting blocked because you fail to meet the policy
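When a request is blocked like this, the error text usually carries an identifier you can use to find the matching entry in the violation log. The sketch below pulls that identifier out of an error message. It assumes the message contains a `vpcServiceControlsUniqueIdentifier:` field, as VPC Service Controls denials typically do. The sample message and its ID are made up for illustration, and `vpc_sc_violation_id` is a hypothetical helper name.

```python
import re

# Assumed field name, as it typically appears in VPC Service Controls denials.
_VPC_SC_ID = re.compile(r"vpcServiceControlsUniqueIdentifier:\s*([A-Za-z0-9_-]+)")


def vpc_sc_violation_id(error_message):
    """Return the violation ID found in the message, or None if there is none."""
    match = _VPC_SC_ID.search(error_message)
    return match.group(1) if match else None


# Illustrative message only; the real wording and ID will differ.
sample = (
    "VPC Service Controls: Request is prohibited by organization's policy. "
    "vpcServiceControlsUniqueIdentifier: EXAMPLE_ID_123"
)

print(vpc_sc_violation_id(sample))  # EXAMPLE_ID_123
print(vpc_sc_violation_id("Access Denied: missing permission"))  # None
```

A `None` result suggests the failure came from something else, such as a missing IAM permission, rather than from the perimeter.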

Limitations

As you can see, setting up network access control for your BigQuery data warehouse is simple. However, there are a couple of limitations:

  1. The main limitation of this method is that you can’t be granular. You can’t limit network access to specific data (for example, making a certain dataset or table accessible only from the corporate network).
  2. In addition, you can’t set a network access policy based on data types (for example, blocking access to PII unless you’re on a certain network).

If you would like to overcome these limitations, and also get smart access controls, classification-on-read, and an analytics layer on top of your data warehouse or data lake (BigQuery, Snowflake, Redshift and others), feel free to contact us at Satori.
