Data Pipelines
Data Pipelines is a paid add-on which continuously exports the events in your Mixpanel project to a cloud storage bucket or data warehouse of your choice. It's useful if you want to analyze Mixpanel events using SQL in your own environment.
Using Data Pipelines requires 2 steps:
- Configuring your destination to allow Mixpanel to write to it.
- Telling Mixpanel to start exporting your data to that destination using the Pipelines API.
We offer a 30-day free trial of the Data Pipelines add-on. See the FAQ for how to enable it.
Step 1: Configuring your destination
Configuration depends on the type of Pipeline you want to set up.
Raw
Raw Pipelines export events as JSON to a cloud storage bucket. This is the simplest approach.
See our configuration guides for each raw destination:
Upon successful creation of a pipeline, events will be exported to the following locations:
- Hourly:
<BUCKET_NAME>/<PATH_PREFIX>/<MIXPANEL_PROJECT_ID>/<YEAR>/<MONTH>/<DAY>/<HOUR>
- Daily:
<BUCKET_NAME>/<PATH_PREFIX>/<MIXPANEL_PROJECT_ID>/<YEAR>/<MONTH>/<DAY>/full_day
An empty file named complete will be written in the finished hour or day prefix to indicate that the export is complete. If this file is absent, the export for that hour or day is still in progress.
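The path layout and completion marker described above can be sketched in Python. This is a minimal illustration, not Mixpanel code: the function names are hypothetical, and it assumes that date components are zero-padded and that the marker's object key is the prefix followed by /complete.

```python
from datetime import datetime
from typing import Iterable


def export_prefix(bucket: str, path_prefix: str, project_id: int,
                  when: datetime, hourly: bool) -> str:
    """Build the export prefix for one hour (hourly) or one day (daily).

    Assumption: year, month, day, and hour are zero-padded.
    """
    day = f"{bucket}/{path_prefix}/{project_id}/{when:%Y/%m/%d}"
    return f"{day}/{when:%H}" if hourly else f"{day}/full_day"


def is_export_complete(prefix: str, object_keys: Iterable[str]) -> bool:
    """Return True if the empty marker file exists under the prefix.

    Assumption: the marker's key is exactly "<prefix>/complete".
    """
    return f"{prefix}/complete" in set(object_keys)
```

In practice, object_keys would come from listing the bucket with your cloud provider's client. A downstream job would typically wait until is_export_complete returns True before loading that hour or day.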
Schematized
Schematized Pipelines export events into tables with schemas generated by Mixpanel, inferred from your event history. There are two types of schemas, which you can configure:
- Monoschema: A single table for all events, with the event name as a column and one column per property.
- Multischema: One table per event name with the properties of that event as columns.
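To make the difference between the two schema types concrete, here is a small Python sketch that reshapes the same events both ways. It is only an illustration of the table shapes: the event_name column name and the sample events are hypothetical, and the real column names and types are determined by Mixpanel's schematization.

```python
from collections import defaultdict

# Hypothetical sample events, for illustration only.
events = [
    {"event": "Signup", "properties": {"plan": "free"}},
    {"event": "Purchase", "properties": {"amount": 20, "plan": "pro"}},
]


def to_monoschema(events):
    """One table: an event-name column plus one column per property seen anywhere."""
    columns = sorted({key for e in events for key in e["properties"]})
    return [
        {"event_name": e["event"], **{c: e["properties"].get(c) for c in columns}}
        for e in events
    ]


def to_multischema(events):
    """One table per event name, holding only that event's properties."""
    tables = defaultdict(list)
    for e in events:
        tables[e["event"]].append(dict(e["properties"]))
    return dict(tables)
```

In the monoschema result, the Signup row has an empty amount column because that property only exists on Purchase. In the multischema result, each table only carries the columns its own event uses.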
See our configuration guides for each schematized destination:
The Schematized Pipeline reference covers the details of schematization and the output format.
Step 2: Creating the Pipeline
Once you’ve configured your destination, you need to tell Mixpanel to start exporting to that destination.
You can do this with our Create Pipeline API. You can create the Pipeline directly from our developer docs UI.
Limits:
- For event export pipelines (data_source: events) in each Mixpanel project, we support at most two recurring pipelines (to_date is empty) and one non-recurring pipeline (has a to_date that is the ending date of the export window).
- Note that from_date must also be no more than 6 months from the date the pipeline is created.
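The limits above can be checked locally before calling the API. The sketch below is a hypothetical pre-flight check, not part of any Mixpanel SDK. It assumes pipelines are represented as dicts with data_source, from_date, and an optional to_date, and it reads "6 months from" as up to six calendar months before the creation date. The month arithmetic clamps the day to 28 for simplicity.

```python
from datetime import date

MAX_RECURRING = 2       # recurring event pipelines per project
MAX_NON_RECURRING = 1   # non-recurring event pipelines per project
MAX_MONTHS_BACK = 6     # assumed: from_date may reach this far into the past


def months_before(d: date, months: int) -> date:
    """Return the date `months` calendar months before `d` (day clamped to 28)."""
    total = d.year * 12 + (d.month - 1) - months
    return date(total // 12, total % 12 + 1, min(d.day, 28))


def check_new_pipeline(existing, new, today):
    """Return a list of limit violations for a proposed event export pipeline.

    A pipeline with no to_date (missing or None) is treated as recurring.
    """
    problems = []
    if new["data_source"] != "events":
        return problems  # the limits described here cover event pipelines only

    event_pipelines = [p for p in existing if p["data_source"] == "events"]
    recurring = sum(1 for p in event_pipelines if p.get("to_date") is None)
    non_recurring = len(event_pipelines) - recurring

    if new.get("to_date") is None:
        if recurring >= MAX_RECURRING:
            problems.append("recurring limit reached")
    elif non_recurring >= MAX_NON_RECURRING:
        problems.append("non-recurring limit reached")

    if new["from_date"] < months_before(today, MAX_MONTHS_BACK):
        problems.append("from_date is more than 6 months back")
    return problems
```

An empty list means the proposed pipeline passes these local checks. The API remains the source of truth for what is accepted.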
FAQ
Why are some events or properties not exported to the destination?
This normally happens when you have thousands of unique event names or property names, which is usually an implementation mistake (e.g. including a UUID in the event or property name). This causes the export process to exceed table or column limits in the destination. Mixpanel itself imposes a limit of 10K unique properties in your schema after transformation rules have been applied. Any project exceeding this limit will have its pipelines paused until the issue is remediated. If you see a pipeline error about exceeding this limit, try to identify a regex selector that matches the properties you would like to filter out of your schema, then reach out to our support team for assistance.
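As a starting point for finding such a regex selector, the following Python sketch flags names that embed a UUID. The pattern and function name are illustrative assumptions. Your own offending names may follow a different pattern, and the selector syntax that support applies may differ from Python's.

```python
import re

# Matches a canonical 8-4-4-4-12 hexadecimal UUID anywhere in a name.
UUID_IN_NAME = re.compile(
    r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}",
    re.IGNORECASE,
)


def find_suspect_names(names):
    """Return the event or property names that contain a UUID."""
    return [name for name in names if UUID_IN_NAME.search(name)]
```

Running this over a list of your property names gives a quick view of how many are generated dynamically, which helps when describing the problem to support.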
Why does the number of events in Mixpanel not match the number of exported events to my destination?
This can happen for a few reasons:
- Data Sync is not enabled or not supported for your pipeline.
- Data Delay: it can take up to 1 day for late arriving data to be synced from Mixpanel to your destination.
- Hidden Events: Mixpanel exports all events to your destination, even ones that are hidden in the UI via Lexicon. We recommend checking whether the count in your destination is mostly due to events that have been hidden in the Mixpanel UI.
What timezone is used for my event exports?
Pipelines export raw data from your project, so each exported event carries whatever timestamp was supplied at the time of ingestion. This means that the timestamps of exported events will be in the UTC timezone, unless your project was created before 11 Jan 2023. Learn more about managing timezones here.
How can I count events exported by Mixpanel in the warehouse?
Counting events can be slightly different for each warehouse, since we use different partitioning methods. Here are examples for BigQuery and Snowflake.
How does the free trial work?
Mixpanel offers a 30-day trial of Data Pipelines. The trial allows one data export pipeline to be created per project. Simply pass trial=true to our API to create a trial pipeline.
Trial limitations:
- Export scheduling is daily only.
- Data sync is unavailable.
- You can only create one pipeline per project.
- Backfilled data will only include one day prior to the creation date.
- Pipelines will, by default, include both event and user data (not available for raw pipelines).
- The pipeline cannot filter by event name.
- The “Create Pipeline” parameters will default to the values highlighted in the parameters table.