Product Data General Configuration
Last updated Nov 28th, 2024
Overview
You can import data at the contact, organization, and/or custom product entity level from your data warehouse directly into Common Room via our data warehouse integrations. Once you've shared your data with Common Room (using the instructions for AWS S3, BigQuery/GCS, Redshift, or Snowflake), our team will work with you to finalize the data contract and set up a regular import.
Setup
Setting up the connection involves the following steps:
- [Common Room + Customer] Share the data with Common Room by following the instructions for AWS S3, BigQuery/GCS, Redshift, or Snowflake.
- [Customer] Provide your Common Room contact with the following details: (TODO)
- [Common Room] We will configure the import and create the fields that the data will be written to. The name of each field will correspond to the name of the column in your CSV, TSV, or JSONL file.
- [Customer] Validate that the data was imported successfully, and (if desired) rename the fields to have more user-friendly names. (If you would like to rename workspace fields, let your Common Room contact know first and we will enable this option for you.)
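Because field names follow the column names in your file, it can help to preview the header row before sharing a CSV or TSV file. The following is a minimal sketch, not part of the Common Room tooling: the function name `column_names` is hypothetical, and it assumes the delimiter can be inferred from a `.csv` or `.tsv` extension, with an optional `.gz` suffix.

```python
import csv
import gzip
from pathlib import Path


def column_names(path: Path) -> list[str]:
    """Return the header row of a CSV or TSV file, gzipped or not."""
    # Assumption: the extension tells us the delimiter (.tsv -> tab, else comma).
    base_name = path.name.removesuffix(".gz")
    delimiter = "\t" if base_name.endswith(".tsv") else ","
    # Use gzip.open for .gz files and the built-in open otherwise.
    opener = gzip.open if path.name.endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8", newline="") as fh:
        return next(csv.reader(fh, delimiter=delimiter))
```

Calling `column_names(Path("customers.csv.gz"))` on a local copy of an export would list the names the imported fields are expected to start with, before any renaming.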
Details
When importing data to Common Room, there are a few important things to keep in mind.
- User data is keyed by customer email, with one record per customer
- Company data is keyed by a unique identifier (e.g. SFDC account ID), with one record per company
- Each company record has non-nullable fields for the primary domain and the name
  - The primary domain is what we use to match records to the Organizations in the community. It could be the actual domain of the customer’s company, the email domain of the billing admin, account owner, etc.
  - The name is used to differentiate multiple records for the same primary domain (e.g. google.com could have Android and Google Maps as two different client records)
- Different datasets are written into different top-level locations
  - E.g. s3://data/customers/…, s3://data/companies/...
- Data snapshots are written into date-based partitions
  - E.g. data/customers/date=20211025 will contain the snapshot generated on 2021-10-25
  - Common Room will detect new partitions and always use data from the latest partition only
  - Each partition contains the entire snapshot of the dataset
  - Once written, a partition should not change
- Files are one of:
  - CSV/TSV (optionally gzipped)
  - JSONL (optionally gzipped)
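The layout rules above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a required implementation: a local directory stands in for the bucket, the column names `id`, `primary_domain`, and `name` are hypothetical, and so is the file name `part-0000.jsonl.gz`. Only the `date=YYYYMMDD` partition naming and the gzipped JSONL format come from this document.

```python
import gzip
import json
from datetime import date
from pathlib import Path


def partition_name(snapshot_date: date) -> str:
    """Return the partition directory name, e.g. date=20211025."""
    return f"date={snapshot_date:%Y%m%d}"


def validate_companies(records: list[dict]) -> None:
    """Check for a unique key and non-null primary domain and name."""
    seen = set()
    for rec in records:
        # Hypothetical column names; use the ones in your data contract.
        for field in ("id", "primary_domain", "name"):
            if not rec.get(field):
                raise ValueError(f"missing {field!r} in record {rec!r}")
        if rec["id"] in seen:
            raise ValueError(f"duplicate company id {rec['id']!r}")
        seen.add(rec["id"])


def write_snapshot(root: Path, dataset: str, snapshot_date: date,
                   records: list[dict]) -> Path:
    """Write a full snapshot as gzipped JSONL into a new date partition."""
    part_dir = root / dataset / partition_name(snapshot_date)
    # A partition should not change once written, so refuse to overwrite it.
    part_dir.mkdir(parents=True, exist_ok=False)
    out_path = part_dir / "part-0000.jsonl.gz"  # hypothetical file name
    with gzip.open(out_path, "wt", encoding="utf-8") as fh:
        for rec in records:
            fh.write(json.dumps(rec) + "\n")
    return out_path


def latest_partition(dataset_dir: Path) -> Path:
    """Pick the newest partition; YYYYMMDD names sort chronologically."""
    parts = [p for p in dataset_dir.iterdir()
             if p.is_dir() and p.name.startswith("date=")]
    return max(parts, key=lambda p: p.name)
```

Because the date is written as zero-padded `YYYYMMDD`, comparing partition names as plain strings yields the most recent snapshot, which mirrors the "latest partition only" rule described above.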
FAQ
Should the daily data dump be a delta or full data dump?
We support both full and delta data dumps but full data dumps are preferred.
How should we handle deleted records?
Add a deleted: true flag to a record to mark it as deleted.
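As a rough illustration for JSONL files, a deleted record might be emitted as sketched below. The helper `mark_deleted` and the columns `email` and `plan` are hypothetical; only the `deleted: true` flag itself comes from this document, and how the flag should be represented in CSV/TSV files is best confirmed with your Common Room contact.

```python
import json


def mark_deleted(record: dict) -> dict:
    """Return a copy of the record carrying the deleted: true flag."""
    return {**record, "deleted": True}


# "email" and "plan" are hypothetical column names used for illustration.
active = {"email": "ada@example.com", "plan": "pro"}
removed = mark_deleted({"email": "bob@example.com", "plan": "free"})

# One JSON object per line, as in a JSONL file.
lines = [json.dumps(rec) for rec in (active, removed)]
```

Python's `True` is serialized by `json.dumps` as the JSON literal `true`, so the second line carries `"deleted": true` while the first is left unchanged.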