top of page

How to create a Data Lake on Google BigQuery from Salesforce using DataLakeHouse.io


Now let's create a Data Lake in Google BigQuery using a Salesforce Sales Cloud data source.


First, go to Connections, Sources and add a new Salesforce connection.

DataLakeHouse.io and Salesforce connection

Name your Salesforce connection and Target Schema Prefix as salesforce_salescloud and select either your Production or Sandbox Salesforce environment. Click on Authorize Your Account, where you will be redirected to your Salesforce login page. Enter your Salesforce credentials:

DataLakeHouse.io and Salesforce source

After that you will receive a Connection Successful message.


DataLakeHouse.io and Salesforce successful connection

Open the Actions and click on Reload Schema from Server. This operation takes a few minutes to run depending on the number of Salesforce entities available to your Salesforce organization.


DataLakeHouse.io and Salesforce reload schema


On the Salesforce Source Connection, go to Actions and click on Edit. On the Connection Settings screen go to the Source Schema tab, you should see all of the metadata objects on the screen below.


DataLakeHouse.io and Salesforce objects

NOTE: If you can't see any Schema objects return to the Connection Strings and click on Re-authorize your connection. After this step, go to Source Schema and click Load Schema again. If an error persists, reconnect to DataLakeHouse.io, clean your browser cache and create a new Salesforce source connection.


DataLakeHouse.io and Salesforce

Open Google BigQuery


Go to https://console.cloud.google.com/ and click on Create or Select a Project.


DataLakeHouse.io Google BigQuery

Create a new project named DLH-io and also inform the project ID dlh-io, you will need this ID a few steps later.

DataLakeHouse.io Google BigQuery project

After that you will get a message like this. Click on Select Project.

DataLakeHouse.io Google BigQuery

You will be redirected to the page like this. Search for BigQuery.


DataLakeHouse.io Google BigQuery

You should receive an empty database like this.


DataLakeHouse.io Google BigQuery database

Now search for the IAM:


DataLakeHouse.io Google IAM

On the IAM page click on Grant Access

DataLakeHouse.io Google IAM

Grant access to dlh-global-bq-data-sync-svc@prd-datalakehouse.iam.gserviceaccount.com, assign the role to BigQuery user and click on Save.

DataLakeHouse.io Google BigQuery IAM security

If successful you will see the new role on the IAM page:


DataLakeHouse.io Google BigQuery IAM role

Return to DatalakeHouse.io and create a new BigQuery Target:


DataLakeHouse.io Google BigQuery target

Create a new Target named BigQuery and inform the Project ID created before:

DataLakeHouse.io Google BigQuery target setup

If successful you will receive a message like this:

DataLakeHouse.io Google BigQuery

Go to the Sync Bridge and create a new sync named Salesforce-BigQuery, select the Salesforce source, BigQuery target, GMT time zone and click on Save Sync Bridge.


DataLakeHouse.io Sync Bridge

A new Salesforce-BigQuery sync bridge will appear as the screen bellow

DataLakeHouse.io Sync Bridge

Click on Actions and click on Re-Sync All History in order to bring in all historical and current data.

DataLakeHouse.io Sync History

You should receive this message and click on OK.

DataLakeHouse.io sync bridge

Next, go to the Monitoring page in DataLakeHouse.io and verify the logs:

DataLakeHouse.io log monitor

Return to your Google BigQuery console and refresh the page. You should see all of the Salesforce objects tables with data replicated.

DataLakeHouse.io on BigQuery

Conclusion


DataLakeHouse.io allows a quick setup of your Salesforce source and BigQuery destination in just a few clicks. It doesn't require any coding, firewall or API setting, just make sure that all of the connections were successful in order to create your sync bridge. Wait a few minutes, monitor the logs and view your Salesforce data in BigQuery.


DataLakeHouse.io is a must have technology for any company who requires an advanced and easy-to-use Data Lake and Platform Analytics with several connectors to famous cloud applications. More details on DLH.io.



About the author: Angelo Buss is a Solutions Architect and the founder of BRF Consulting, a DataLakeHouse.io and Salesforce ISV partner with expertise on Data Engineering.







bottom of page