Step 1. Export your data from Amplitude
Amplitude offers a few different export methods. Pick the one that matches your data size and setup:

1. S3 Export
   - For high-volume backfills, you can dump your Amplitude data into an S3 bucket.
2. Warehouse Export
   - If your Amplitude data is already in Snowflake, BigQuery, or Redshift, you can skip file downloads. Statsig can ingest directly from these warehouses (see Step 2.1).
3. Export API
   - Use Amplitude’s Export API to pull gzipped JSON
   - Limit: 4 GB per request, so use hourly windows for large ranges
   - Example:
   - Best for small datasets or initial testing
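As a rough sketch of the hourly-window approach for the Export API, the snippet below splits a date range into one request per hour. It assumes the endpoint is `https://amplitude.com/api/2/export` and that `start` and `end` are inclusive hours in `YYYYMMDDTHH` form; check both against Amplitude's documentation before relying on them. The helper names `hourly_windows` and `export_urls` are hypothetical.

```python
from datetime import datetime, timedelta
from urllib.parse import urlencode

# Assumed endpoint for Amplitude's Export API; confirm in Amplitude's docs.
EXPORT_URL = "https://amplitude.com/api/2/export"


def hourly_windows(start, end):
    """Yield (start, end) hour stamps, one pair per hour in [start, end)."""
    hour = start.replace(minute=0, second=0, microsecond=0)
    while hour < end:
        stamp = hour.strftime("%Y%m%dT%H")
        # Assumption: both bounds are inclusive, so a one-hour window
        # uses the same stamp for start and end.
        yield stamp, stamp
        hour += timedelta(hours=1)


def export_urls(start, end):
    """Build one export URL per hourly window (no request is sent here)."""
    return [
        f"{EXPORT_URL}?{urlencode({'start': s, 'end': e})}"
        for s, e in hourly_windows(start, end)
    ]


if __name__ == "__main__":
    for url in export_urls(datetime(2024, 1, 1, 22), datetime(2024, 1, 2, 1)):
        print(url)
```

Each URL would then be fetched with your Amplitude project credentials. Keeping one hour per request is one way to stay under the 4 GB limit on busy projects.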
Step 2. Transform your data
Amplitude and Statsig store events in slightly different formats. To make your Amplitude exports work in Statsig, you’ll need to map your Amplitude data to Statsig’s format. This step is required irrespective of how you choose to import data into Statsig in the next step.

| Amplitude field | Statsig field |
|---|---|
| event_type | event |
| event_time | timestamp (ms since epoch) |
| user_id | user.userID |
| device_id | user.stableID |
| event_properties | metadata |
| user_properties | user fields |
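The field mapping above can be sketched as a small Python function. This is an illustration under stated assumptions rather than Statsig's official transform: it assumes `event_time` arrives as a UTC string such as `2024-01-01 00:00:01.500000`, and it places `user_properties` under `user.custom`, which is one possible reading of "user fields". The function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_ms(event_time):
    """Convert an Amplitude timestamp string (assumed UTC) to ms since epoch."""
    fmt = "%Y-%m-%d %H:%M:%S.%f" if "." in event_time else "%Y-%m-%d %H:%M:%S"
    parsed = datetime.strptime(event_time, fmt).replace(tzinfo=timezone.utc)
    # Integer division of timedeltas avoids float rounding.
    return (parsed - EPOCH) // timedelta(milliseconds=1)


def to_statsig_event(amp):
    """Map one Amplitude event dict to the Statsig fields in the table."""
    return {
        "event": amp["event_type"],
        "timestamp": to_epoch_ms(amp["event_time"]),
        "user": {
            "userID": amp.get("user_id"),
            "stableID": amp.get("device_id"),
            # Assumption: user_properties go under user.custom.
            "custom": amp.get("user_properties") or {},
        },
        "metadata": amp.get("event_properties") or {},
    }


if __name__ == "__main__":
    sample = {
        "event_type": "purchase",
        "event_time": "2024-01-01 00:00:01.500000",
        "user_id": "u1",
        "device_id": "d1",
        "event_properties": {"sku": "a"},
        "user_properties": {"plan": "pro"},
    }
    print(to_statsig_event(sample))
```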
Step 3. Import into Statsig
Once your data look like Statsig events, you can start to bring them in. There are a few paths to import your data depending on how you exported:

| If you exported from Amplitude via… | Import into Statsig using… | Best when… |
|---|---|---|
| S3 export | S3 ingestion | You’re backfilling large datasets |
| Warehouse (Snowflake/BQ/Redshift) | Warehouse ingestion | Your Amplitude data already lives in a warehouse |
| Export API | Event Webhook | You’re moving a few days/weeks of data programmatically |
| UI download (CSV/JSON) | Event Webhook | You’re testing or moving a small slice of data |
For S3 ingestion:
- Ensure the files are transformed to the Statsig schema, in Parquet, JSON, or CSV format, and then follow Statsig’s S3 ingestion steps.
- Shard your raw Amplitude data into one day of data per directory so that it can be ingested into Statsig.

For warehouse ingestion:
- Connect your warehouse to Statsig
- Point Statsig at a query that outputs events in the expected schema
- Statsig ingests on a recurring schedule
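For the S3 path, the requirement of one day of data per directory could be met with a sketch like the one below. It groups already-transformed events by UTC day and writes each day to its own directory. The `YYYY-MM-DD` directory naming, the `events.json` file name, and the newline-delimited JSON layout are all assumptions for illustration; follow Statsig's S3 ingestion documentation for the exact layout it expects.

```python
import json
from collections import defaultdict
from datetime import datetime, timezone
from pathlib import Path


def shard_by_day(events, out_dir):
    """Write events into one directory per UTC day and return the file paths."""
    by_day = defaultdict(list)
    for event in events:
        # "timestamp" is ms since epoch, as in the Statsig schema.
        day = datetime.fromtimestamp(
            event["timestamp"] / 1000, tz=timezone.utc
        ).strftime("%Y-%m-%d")
        by_day[day].append(event)

    written = []
    for day, day_events in sorted(by_day.items()):
        day_dir = Path(out_dir) / day  # hypothetical naming scheme
        day_dir.mkdir(parents=True, exist_ok=True)
        path = day_dir / "events.json"  # hypothetical file name
        with path.open("w", encoding="utf-8") as fh:
            for event in day_events:
                fh.write(json.dumps(event) + "\n")  # one JSON object per line
        written.append(path)
    return written
```

The resulting directory tree can then be uploaded to the bucket that Statsig reads from.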