Today started with the news that Twitch had been hacked and that a trove of data, from source code to operations manuals, had been leaked. But the main discussion was around streamers’ revenues.
And, I’ll be frank, this is something I am interested in. More than knowing how much streamer X had received over the almost 2 years’ worth of leaked data, I wanted to understand the business model behind it.
Setting up the infrastructure
Getting the revenue data was easy enough. The original link to the data had been removed, but the data had been made available elsewhere since the initial release.
My first challenge was dealing with the 5+ GB of raw compressed data and loading it on my personal machine. I doubted I could keep all 167’572’112 records in RAM.
Thankfully, {dplyr} works well in conjunction with SQLite3, and that would give me a good opportunity to practice.
library(dplyr)
library(readr)

# RSQLite takes the database file name through `dbname`
twitch_con <- DBI::dbConnect(RSQLite::SQLite(), dbname = ":twitch:")
# Empty the table when re-running, for the sake of trial and error
if (DBI::dbExistsTable(twitch_con, "twitch")) DBI::dbExecute(twitch_con, "DELETE FROM twitch")

# `pattern` is a regular expression, not a glob
reve_files <- list.files("twitch-leaks-part-one/twitch-payouts/all_revenues", pattern = "\\.csv", full.names = TRUE, recursive = TRUE)

# The first file creates the table and fixes the column order
rev_df <- read_csv(gzfile(reve_files[1]), col_types = "ccdddddddddc") %>%
  mutate(report_date = lubridate::as_date(report_date, format = "%m/%d/%Y"))
dplyr::copy_to(twitch_con, rev_df, "twitch", temporary = FALSE, overwrite = TRUE)
rev_dfn <- names(rev_df)

# Append the remaining files one at a time so only one file sits in RAM
for (rev_fil in reve_files[-1]) {
  rev_df <- read_csv(gzfile(rev_fil), col_types = "ccdddddddddc") %>%
    mutate(
      report_date = lubridate::as_date(report_date, format = "%m/%d/%Y")
    ) %>%
    select(all_of(rev_dfn))
  DBI::dbWriteTable(twitch_con, "twitch", rev_df, append = TRUE)
}
Once the data is in the database, it is just a question of playing with the Tidyverse. And patience.
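To give an idea of what querying the database looks like, here is a minimal sketch on a toy in-memory table rather than the real one. The `user_id` column name is an assumption (there is no data dictionary), and the {dbplyr} package needs to be installed for `tbl()` to work on a database connection.

```r
library(dplyr)

# Toy stand-in for the real "twitch" table; `user_id` is an assumed column name
toy_con <- DBI::dbConnect(RSQLite::SQLite(), dbname = ":memory:")
toy_rev <- tibble(
  user_id = c("a", "a", "b", "c"),
  sub_share_gross = c(10, 5, 100, 0),
  ad_share_gross = c(1, 0, 20, 2)
)
DBI::dbWriteTable(toy_con, "twitch", toy_rev)

# tbl() only builds a lazy reference: nothing is loaded into R yet
twitch_tbl <- tbl(toy_con, "twitch")

# The pipeline is translated to SQL and run by SQLite;
# collect() brings back only the aggregated result
per_user <- twitch_tbl %>%
  group_by(user_id) %>%
  summarise(
    sub_total = sum(sub_share_gross, na.rm = TRUE),
    ad_total = sum(ad_share_gross, na.rm = TRUE)
  ) %>%
  arrange(user_id) %>%
  collect()

DBI::dbDisconnect(toy_con)
```

The point of the design is that the heavy lifting stays in SQLite, so only the small summary table ever has to fit in RAM.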
Let’s talk about streamer revenue streams
The revenue data details the main contributions streamers can receive through the platform. There is no data dictionary available but most of the names are self-explanatory. Most of them:
- ad_share_gross
- sub_share_gross
- bits_share_gross
- bits_developer_share_gross
- bits_extension_share_gross
- prime_sub_share_gross
- bit_share_ad_gross
- fuel_rev_gross
- bb_rev_gross
And looking at the share of each of these revenue streams, subs account for the majority of the revenue. It doesn’t look like sub gifts are identified separately in the data though. Prime Gaming subs account for 15% of the revenue; I would have expected this share to be higher, as I was under the impression that they were already fairly established in 2019, when the data starts (released in 2018 as per the information I found).
revenue_source | share of total revenue |
---|---|
sub_share_gross | 55.7% |
bits_share_gross | 18.5% |
prime_sub_share_gross | 15.3% |
ad_share_gross | 9.1% |
bits_extension_share_gross | 0.9% |
bb_rev_gross | 0.3% |
bits_developer_share_gross | 0.2% |
bit_share_ad_gross | 0.1% |
fuel_rev_gross | 0.0% |
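A table like this one can be built by stacking the revenue columns and dividing each total by the grand total. The sketch below uses made-up numbers and only three of the columns; `user_id` is again an assumed column name.

```r
library(dplyr)
library(tidyr)

# Made-up numbers, only there to show the mechanics
toy_rev <- tibble(
  user_id = c("a", "b", "c"),
  sub_share_gross = c(50, 30, 20),
  bits_share_gross = c(10, 20, 20),
  ad_share_gross = c(25, 25, 0)
)

revenue_share <- toy_rev %>%
  # One row per (streamer, revenue source) pair
  pivot_longer(
    ends_with("_gross"),
    names_to = "revenue_source",
    values_to = "amount"
  ) %>%
  group_by(revenue_source) %>%
  summarise(revenue = sum(amount), .groups = "drop") %>%
  # Share of each source in the grand total
  mutate(share = revenue / sum(revenue)) %>%
  arrange(desc(share))
```

On the full dataset it would probably make sense to sum each column inside the database first and pivot only the resulting one-row summary.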
Revenue ranges
Looking at the data, starting a career as a streamer doesn’t look that attractive. Over the last 2 years, more than 85% of streamers received less than $10’000 in total.
Total revenue range | % of streamers |
---|---|
$0 - $9 | 14.366% |
$10 - $99 | 8.296% |
$100 - $999 | 28.823% |
$1000 - $9999 | 34.197% |
$10000 - $99999 | 12.069% |
$100000 - $999999 | 2.010% |
$1000000 - $9999999 | 0.223% |
$10000000 - $99999999 | 0.015% |
$100000000 - $999999999 | 0.000% |
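The ranges above are powers of ten, so each streamer's bucket can be derived from the base-10 logarithm of their total. Here is a sketch on toy totals; the `total_revenue` column is a hypothetical name for the sum of all revenue columns per streamer.

```r
library(dplyr)

# Toy per-streamer totals; `total_revenue` is a hypothetical column name
toy_totals <- tibble(
  user_id = c("a", "b", "c", "d", "e"),
  total_revenue = c(3, 42, 950, 12000, 7)
)

range_share <- toy_totals %>%
  mutate(
    # Order of magnitude; anything under $10 lands in bucket 0
    magnitude = floor(log10(pmax(total_revenue, 1))),
    lower = if_else(magnitude == 0, 0, 10^magnitude),
    upper = 10^(magnitude + 1) - 1,
    revenue_range = sprintf("$%.0f - $%.0f", lower, upper)
  ) %>%
  # count() sorts by magnitude, so the ranges come out in order
  count(magnitude, revenue_range) %>%
  mutate(pct_streamers = 100 * n / sum(n))
```

Clamping with `pmax()` keeps zero totals out of `log10()`, which would otherwise return `-Inf`.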
And the picture seems even more dire when looking at the monthly average:
Monthly average range | % of streamers |
---|---|
$0 - $9 | 27.993% |
$10 - $99 | 33.264% |
$100 - $999 | 29.313% |
$1000 - $9999 | 8.096% |
$10000 - $99999 | 1.207% |
$100000 - $999999 | 0.121% |
$1000000 - $9999999 | 0.006% |
$10000000 - $99999999 | 0.000% |
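A monthly average depends on which months are counted. One possible definition, sketched below on toy data, is to sum the revenue per streamer and calendar month, then average over the months in which the streamer has at least one record. The `user_id` and `revenue` column names are assumptions made for the example.

```r
library(dplyr)

# Toy records; `user_id` and `revenue` are assumed column names
toy_rev <- tibble(
  user_id = c("a", "a", "a", "b"),
  report_date = as.Date(c("2020-01-15", "2020-01-20", "2020-02-10", "2020-03-01")),
  revenue = c(10, 20, 60, 5)
)

monthly_avg <- toy_rev %>%
  mutate(month = format(report_date, "%Y-%m")) %>%
  # Step 1: total per streamer and calendar month
  group_by(user_id, month) %>%
  summarise(month_revenue = sum(revenue), .groups = "drop") %>%
  # Step 2: average over the months where the streamer has a record
  group_by(user_id) %>%
  summarise(
    avg_monthly = mean(month_revenue),
    active_months = n(),
    .groups = "drop"
  )
```

Dividing by the full length of the leak instead of by active months would give lower averages for streamers who were only active part of the time.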
As such, streaming seems to be a complement to another activity rather than a main one. The success of a handful of streamers hides the experience of all the others.
Final words and next steps
The main issue with this data is that it does not include stream-adjacent revenue such as partnerships or sponsored streams, which tend to be the main revenue streams of streamers.
One thing I would like to see is whether there are any differences in revenue by country or main stream language. But having to go through 2.4 million unique user IDs to collect this information is not something I am looking forward to.