Angelo Buss

DataLakeHouse and the pre-built analytical data model for Salesforce with Streamlit and Snowflake

Updated: Oct 6, 2022

DataLakeHouse (DLH.io) offers a complete analytical solution with several SaaS connectors, as initially described in the article How to run a pre-build dbt model on Snowflake with DLH.io.


This blog post explains the pre-built analytical model for Salesforce and some of the facts and dimensions ingested into the Snowflake Data Warehouse. We will also explore a little of how the data is modeled with SqlDBM and how it is displayed through a Streamlit app with visualizations written in Python.


The dlh_salesforce dbt package provides pre-built models for Salesforce that cover many of the aspects typically needed when analyzing Salesforce data.


Exploring dlh_salesforce tables and views


The dlh_salesforce package consists of staging views, facts, and dimensions. The staging views represent the layer where the transformations occur: the creation of surrogate keys, new columns, and column renaming all happen there. The dlh_salesforce views loaded to Snowflake:


dlh_salesforce views

Now let’s take a quick look at one of the staging views, V_SFC_ACCOUNTS_STG. In the code snippet below, you can see surrogate keys named with the suffix DLHK, which stands for DataLakeHouse Key, and columns treated as business keys with the suffix BK, since Salesforce already provides unique keys in its objects. And, of course, the object's attributes are selected as well.


V_SFC_ACCOUNTS_STG.sql

...
SELECT
  --DLHK
   MD5(S.ID) AS K_ACCOUNT_DLHK
  ,MD5( TRIM(COALESCE(S.JIGSAW_COMPANY_ID, '00000000000000000000000000000000'))  ) AS K_JIGSAW_COMPANY_DLHK
  ,MD5( TRIM(COALESCE(S.LAST_MODIFIED_BY_ID, '00000000000000000000000000000000'))  ) AS K_LAST_MODIFIED_BY_USER_DLHK
  ,K_USER_DLHK AS K_OWNER_USER_DLHK
  ,MD5( TRIM(COALESCE(S.PARENT_ID, '00000000000000000000000000000000'))  ) AS K_PARENT_DLHK
  ,MD5( TRIM(COALESCE(S.CREATED_BY_ID, '00000000000000000000000000000000'))  ) AS K_CREATED_BY_USER_DLHK
  ,MD5( TRIM(COALESCE(S.DANDB_COMPANY_ID,  '00000000000000000000000000000000'))  ) AS K_DAN_DB_COMPANY_DLHK
  --BUSINESS KEYS
  ,S.ID AS K_ACCOUNT_BK
  ,S.JIGSAW_COMPANY_ID AS K_JIGSAW_COMPANY_BK
  ,S.LAST_MODIFIED_BY_ID AS K_LAST_MODIFIED_BY_USER_BK
  ,S.OWNER_ID AS K_OWNER_USER_BK
  ,S.PARENT_ID AS K_PARENT_BK
  ,S.CREATED_BY_ID AS K_CREATED_BY_USER_BK
  ,S.DANDB_COMPANY_ID AS K_DAN_DB_COMPANY_BK
  
  --ATTRIBUTES
  ,U.A_FULL_NAME AS A_OWNER_FULL_NAME
  ,U.A_USER_ROLE_FULL_NAME AS A_OWNER_ROLE
  ,U.A_USER_PROFILE_NAME AS A_OWNER_PROFILE
...

Besides the staging views, there are also facts and dimensions available for use. These tables can be queried by analytics tools such as Streamlit, a company acquired by Snowflake. The dlh_salesforce facts and dimensions loaded to Snowflake:

dlh_salesforce tables
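
As a quick illustration of how these tables can be consumed, the example below counts opportunities per industry by joining the opportunities fact to the accounts dimension. It is only a sketch: the fully qualified table names and key columns simply mirror those used in the Streamlit queries later in this post, so adjust the database and schema to your own environment.

-- Count of opportunities per account industry
SELECT
   account.A_INDUSTRY
  ,COUNT(DISTINCT opp.K_OPPORTUNITY_DLHK) AS OPPORTUNITY_COUNT
FROM BRFC_DLH_DBT.BRFC_DBT_DEMO_SALESFORCE.W_SFC_OPPORTUNITIES_F opp
JOIN BRFC_DLH_DBT.BRFC_DBT_DEMO_SALESFORCE.W_SFC_ACCOUNTS_D account
  ON account.K_ACCOUNT_DLHK = opp.K_ACCOUNT_DLHK
GROUP BY account.A_INDUSTRY
ORDER BY OPPORTUNITY_COUNT DESC;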

Now let’s take a closer look at a single dimension dbt model, the W_SFC_ACCOUNTS_D.


Unlike the staging views, the materialization configured for this model, and for the other facts and dimensions, is defined to build a table. Also notice that no column transformations appear here, which is consistent with the practice of keeping transformations mainly in the staging layer.


W_SFC_ACCOUNTS_D.sql

{{ config (
  materialized= 'table',
  schema= var('target_schema', 'SALESFORCE'),
  tags= ["master", "daily"],
  transient=false
)
}}

SELECT
  *
FROM
  {{ref('V_SFC_ACCOUNTS_STG')}} AS C
...
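
For contrast with the table materialization above, a staging model's config would typically declare a view instead. The sketch below is only illustrative and assumes the staging models follow the same config pattern as the dimension shown; the exact settings in the dlh_salesforce package may differ.

-- Staging models are materialized as views, keeping the transformation layer lightweight
{{ config (
  materialized= 'view',
  schema= var('target_schema', 'SALESFORCE'),
  tags= ["master", "daily"]
)
}}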

Reverse engineering with SqlDBM


SqlDBM is a data modeling tool that facilitates the visualization of our data model. It connects natively to Snowflake and other databases, retrieving the DDL from the desired databases and schemas and importing it into our data modeling project. You can create a free trial account and a project to play with at https://sqldbm.com/.


Also check out the official SqlDBM quick guide for more details on how to use the tool: https://sqldbm.com/QuickGuide.pdf.

After creating a project and choosing the database, select “Bring your database”. You will then be able to see the connection parameters and choose which schemas will be imported into the project.


choose database window

Once the Snowflake database is chosen, the window below will show up with the parameters required to connect to Snowflake.




Once connected successfully, it will display the databases and schemas available for selection.




Once connected, the DDL for the tables and views will appear in the editor. Before actually importing, the DDL can be modified as desired; for this sample project, some constraints were added for better visualization of the fact and dimension relationships that will be used later.

retrieving view and table DDLs

After clicking the “Upload SQL Script” option, SqlDBM evaluates the changes, reports any warnings and errors, and lets the script be imported, updating the diagrams marked as affected.


importing the DDL script


Finally, with the DDLs imported, it is possible to see all the tables in the affected diagrams. Here is the main diagram, focusing on the facts and dimensions that we will be working with in the next topic.


main diagram

Creating simple visualizations with Streamlit


Streamlit is an open-source framework designed to create web-based apps with visualizations built in Python, and it takes care of doing that quickly and with a low learning curve. We will use this framework to query the dlh_salesforce dimensions and facts that we already loaded to Snowflake and create some visualizations on top of them. Streamlit provides a concise guide to connecting to Snowflake: https://docs.streamlit.io/knowledge-base/tutorials/databases/snowflake


First, we added the secrets.toml file, where our Snowflake credentials are defined.

.streamlit/secrets.toml

[snowflake]
user = ""
password = ""
account = ""
warehouse = ""
database = ""
schema = ""
role = ""
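
A quick note: Streamlit loads this file automatically from the .streamlit folder at the project root, and since it contains credentials it should be kept out of version control (for example, by adding it to .gitignore).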

Once configured, the code snippet below, provided by the Streamlit tutorial, was used to connect to Snowflake and modified to display information related to opportunities, accounts, and contacts.


app.py

import streamlit as st
import snowflake.connector
import pandas as pd
import altair as alt

# Initialize connection.
# Uses st.experimental_singleton to only run once.
@st.experimental_singleton
def init_connection():
    return snowflake.connector.connect(**st.secrets["snowflake"])

conn = init_connection()

# Perform query.
# Uses st.experimental_memo to only rerun when the query changes or after 10 min.
@st.experimental_memo(ttl=600)
def run_query(query):
    with conn.cursor() as cur:
        cur.execute(query)
        return cur.fetch_pandas_all()

# Accounts that have an industry defined.
accounts = run_query("select * from BRFC_DLH_DBT.BRFC_DBT_DEMO_SALESFORCE.W_SFC_ACCOUNTS_D where A_INDUSTRY is not null;")

# Opportunities belonging to accounts that have an industry defined.
opportunities = run_query('''
SELECT 
    opp.*
FROM BRFC_DLH_DBT.BRFC_DBT_DEMO_SALESFORCE.W_SFC_OPPORTUNITIES_F opp
LEFT JOIN BRFC_DLH_DBT.BRFC_DBT_DEMO_SALESFORCE.W_SFC_ACCOUNTS_D account 
    on account.K_ACCOUNT_DLHK = opp.K_ACCOUNT_DLHK
where account.A_INDUSTRY is not null
''')

# Contacts belonging to accounts that have an industry defined.
contacts = run_query('''
select 
    contacts.*
from BRFC_DLH_DBT.BRFC_DBT_DEMO_SALESFORCE.W_SFC_CONTACTS_D contacts
left join BRFC_DLH_DBT.BRFC_DBT_DEMO_SALESFORCE.W_SFC_ACCOUNTS_D account 
    on account.K_ACCOUNT_DLHK = contacts.K_ACCOUNT_DLHK
where account.A_INDUSTRY is not null
''')

# Hard-coded coordinates for the account locations plotted on the map below.
accounts_locations = pd.DataFrame(
    [
        ["CA",36.7428983,-119.8238377],
        ["UK", 54.5512799,-4.4737716],
        ["Singapore",1.3143394,103.7041656],
        ["TX",31.1003754,-104.5682181],
        ["NC",35.0877233,-84.4692308],
        ["IL",39.721961,-91.5094288],
        ["OR",44.127451,-122.8260904],
        ["NY",40.6973929,-74.266756]
    ],
    columns=['location', 'lat', 'lon']
)

# Bar chart: total number of employees by industry.
employees_by_industry = alt.Chart(accounts[['A_NAME', 'A_INDUSTRY', 'M_NUMBEROFEMPLOYEES']]).mark_bar().encode(
    x = alt.X('sum(M_NUMBEROFEMPLOYEES)', title='NUMBER OF EMPLOYEES'), 
    y = alt.Y('A_INDUSTRY', title= 'INDUSTRY')
).properties(height=300)


# Arc (pie) chart: accounts by industry.
accounts_by_industry = alt.Chart(accounts[['A_NAME', 'A_INDUSTRY']]).mark_arc().encode(
    theta=alt.Theta(field="A_NAME", title='NUMBER OF ACCOUNTS', type="nominal"),
    color=alt.Color(field="A_INDUSTRY", title='INDUSTRY', type="nominal"),
)

# Print results.
st.title('DLH Streamlit POC')

# Lay out metrics, the accounts table, and the charts in four columns.
col1, col2, col3, col4 = st.columns([1,4,3,2])

with col1:
    st.metric('Number of Opportunities', len(opportunities['K_OPPORTUNITY_DLHK'].unique()))
    st.metric('Number of Accounts', len(accounts['K_ACCOUNT_DLHK'].unique()))
    st.metric('Number of Contacts', len(contacts['K_CONTACT_DLHK'].unique()))

with col2:
    st.dataframe(accounts[['A_NAME', 'A_INDUSTRY', 'M_NUMBEROFEMPLOYEES', 'A_BILLING_STATE']])

with col3:
    st.altair_chart(employees_by_industry, use_container_width=True)

with col4:
    st.altair_chart(accounts_by_industry, use_container_width=True)

# Map of the hard-coded account locations (zoom level 1).
st.map(accounts_locations[['lat', 'lon']], 1)
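
With the script saved as app.py, the dashboard can be launched locally with streamlit run app.py. Note that fetch_pandas_all() requires the connector's pandas extra, which can be installed with pip install "snowflake-connector-python[pandas]".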

And finally, running the app, we have a dashboard with some simple visualizations that give a glance of what Streamlit is capable of.

streamlit app


Conclusion


When it comes to transforming Salesforce data and producing insights through simple visualizations, the combination of Snowflake, dbt, and Streamlit works cleanly and in a very straightforward manner. It works even better with the pre-built models from the dlh_salesforce package, which are designed to structure Salesforce data for analysis, and with a tool like SqlDBM that connects to Snowflake and lets you both forward and reverse engineer your model, making it easier to track changes and understand your data modeling.


The tools used here have proven to be well suited to building your ELT puzzle rapidly and with only a few pieces.

About the author: Leandro Cavalcanti is a Data Engineer at BRF Consulting - an official partner of DatalakeHouse.io, AICG, Snowflake, dbt Labs, Salesforce and SqlDBM.



