Cohort analysis is used for getting insights about user behaviour when they are grouped by some criteria. This kind of analytics is performed with dataset that represents a history of some events; this might be page views statistics or log of user actions (say, purchases).
On input you need to have tabular data with the following columns:
user_id
create_date
or registration_date
event_date
This article describes how to use SeekTable for cohort analysis and create user retention report (like in Google Analytics).
Online demo for this kind of report: user retention report sample.
Lets assume that our history data is represented by user_events.csv file. After file upload SeekTable automatically suggests "year" and "month" dimensions for date columns:
To perform cohort analysis we need to add one custom dimension to calculate number of months (this may be days or weeks) between event data and registration date (Cube → Edit Configuration). For this purpose "Expression" type of dimension can be used:
Type | Name | Label | Format | Parameters |
---|---|---|---|---|
Expression |
month_from_reg |
Months from Reg Date |
|
In dimension with type="Expression" 1-st parameter is a formula expression, and next N parameters are names of dimensions that are used as arguments of the formula. See also reference on calculated fields for more details on this topic.
Then let's add a measure that counts number of unique users inside groups:
Type | Parameters | Name | Label |
---|---|---|---|
CountUnique |
user_id |
CountUniqueOfUserID |
Count Unique Users |
Now we can configure user retention by month report:
You can use options from "Format" tab to make your user retention report like in Google Analytics:
% of row
By Row
Grand Total
and Column Totals
Now you have basic cohort report; you can use another dimensions on columns to change cohort criteria; also you can change the formula and user another period instead of month - say, a day or quarter.
In case of SQL database all steps are the same as for CSV, only difference is that instead of
CountUnique
measure type (which is available only for CSV cubes) you need to configure custom measure based on SQL COUNT(DISTINCT)
:
Type | Parameters | Name | Label |
---|---|---|---|
FirstValue |
COUNT(DISTINCT user_id) |
CountUniqueOfUserID |
Count Unique Users |
Also you can calculate number of months (weeks, days) between event and registration dates on SQL level,
without usage of "Expression"-type dimension for month_from_reg
. For example if you use MySql you can define it in the following way:
Type | Name | Label | Format | Parameters |
---|---|---|---|---|
Field |
month_from_reg |
Months from Reg Date |
|