JY

Data Analyst, Ronin.

© 2025

Diving into the Twitch leaked revenue dataset

Today started with the news that Twitch had been hacked and that a trove of data, from source code to operation manuals, had been leaked. But most of the discussion was about streamers’ revenues.

And, I’ll be frank, this is something I am interested in. More than knowing how much streamer X had received over the almost two years’ worth of leaked data, I was interested in the business model behind it.

Setting up the infrastructure

Getting the revenue data was easy enough. The original link to the data had been removed, but the data had been made available elsewhere since the initial release.

My first challenge was dealing with the 5+ GB of raw compressed data and loading it on my personal machine. I doubted I could keep all 167’572’112 records in RAM.
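A quick back-of-the-envelope check supports that doubt. The sketch below assumes the 12-column layout implied by the `col_types` string used in the loading code ("ccdddddddddc", i.e. 9 numeric and 3 character columns) and only counts the 8 bytes R needs per double plus one 8-byte pointer per character value on 64-bit R. The strings themselves and any copies made while parsing would push the real figure higher, so treat it as a lower bound.

```r
# Lower-bound estimate of the memory needed to hold the raw revenue data in R.
# Assumption: 9 double columns and 3 character columns, per "ccdddddddddc".
n_rows <- 167572112
n_double_cols <- 9
n_char_cols <- 3
bytes_per_value <- 8  # one double, or one pointer to a string on 64-bit R

double_gb <- n_rows * n_double_cols * bytes_per_value / 1e9
pointer_gb <- n_rows * n_char_cols * bytes_per_value / 1e9

round(double_gb, 1)               # 12.1
round(double_gb + pointer_gb, 1)  # 16.1
```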

Thankfully, {dplyr} works well in conjunction with SQLite3 and that would give me a good opportunity to practice.


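library(tidyverse) # read_csv(), mutate(), select(), all_of(), %>%
# dplyr::copy_to() on a DBI connection also needs {dbplyr} installed

# On-disk SQLite file (any file name works), so the data is never all in RAM
twitch_con <- DBI::dbConnect(RSQLite::SQLite(), dbname = "twitch.sqlite")

DBI::dbExecute(twitch_con, "DROP TABLE IF EXISTS twitch") # for the sake of trial and errors: start from a clean slate

# `pattern` is a regular expression, not a glob; "\\.csv" also matches ".csv.gz"
reve_files <- list.files("twitch-leaks-part-one/twitch-payouts/all_revenues/", pattern = "\\.csv", full.names = TRUE, recursive = TRUE)

# The first file creates the table...
rev_df <- read_csv(gzfile(reve_files[1]), col_types = "ccdddddddddc") %>%
  mutate(report_date = lubridate::as_date(report_date, format = "%m/%d/%Y"))

dplyr::copy_to(twitch_con, rev_df, "twitch", temporary = FALSE, overwrite = TRUE)

rev_dfn <- names(rev_df)

# ...and the remaining files are appended with the same column order
for (rev_fil in reve_files[-1]) {
  rev_df <- read_csv(gzfile(rev_fil), col_types = "ccdddddddddc") %>%
    mutate(
      report_date = lubridate::as_date(report_date, format = "%m/%d/%Y")
    ) %>% 
    select(all_of(rev_dfn))
  
  DBI::dbWriteTable(twitch_con, "twitch", rev_df, append = TRUE)
}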

Once the data is in the database, it is just a question of playing with the tidyverse. And patience.
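To illustrate what playing with the tidyverse on a database looks like, here is a minimal sketch on a tiny in-memory SQLite table; the toy rows are made up and not taken from the leak. `dplyr::tbl()` returns a lazy table: the verbs are translated to SQL by {dbplyr} (which must be installed) and nothing is pulled into R until `collect()`.

```r
library(dplyr)

# Toy stand-in for the real "twitch" table (made-up values)
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
DBI::dbWriteTable(con, "twitch", data.frame(
  user_id = c("a", "a", "b"),
  sub_share_gross = c(10, 5, 2)
))

# tbl() is lazy: these verbs only build a SQL query
per_user <- tbl(con, "twitch") %>%
  group_by(user_id) %>%
  summarise(subs = sum(sub_share_gross, na.rm = TRUE))

show_query(per_user)  # prints the generated SELECT ... GROUP BY statement

# collect() runs the query inside SQLite and returns a regular tibble
result <- collect(per_user) %>% arrange(user_id)
DBI::dbDisconnect(con)
```

On the full table, the aggregation happens in SQLite and only the per-streamer summary travels back to R, which is what keeps the memory footprint manageable.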

Let’s talk about streamer revenue streams

The revenue data details the main contributions streamers can receive through the platform. There is no data dictionary available, but most of the names are self-explanatory. The revenue columns are:

  • ad_share_gross
  • sub_share_gross
  • bits_share_gross
  • bits_developer_share_gross
  • bits_extension_share_gross
  • prime_sub_share_gross
  • bit_share_ad_gross
  • fuel_rev_gross
  • bb_rev_gross

Looking at the share of each of these revenue streams, subs account for the majority of the revenue. It doesn’t look like sub gifts are identified in the data, though. Prime Gaming subs account for 15% of the revenue; I would have expected this percentage to be higher, as I was under the impression that Prime subs were already fairly established in 2019, when the data starts (they were released in 2018, as per the information I found).

Revenue source                Share of total revenue
sub_share_gross               55.7%
bits_share_gross              18.5%
prime_sub_share_gross         15.3%
ad_share_gross                9.1%
bits_extension_share_gross    0.9%
bb_rev_gross                  0.3%
bits_developer_share_gross    0.2%
bit_share_ad_gross            0.1%
fuel_rev_gross                0.0%
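Shares like these can be computed almost entirely inside the database. The sketch below runs on a made-up three-streamer table with only three of the nine revenue columns, and assumes that a source’s share is its summed gross revenue divided by the sum over all sources.

```r
library(dplyr)
library(tidyr)

# Made-up toy table with three of the revenue columns
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
DBI::dbWriteTable(con, "twitch", data.frame(
  user_id = c("a", "b", "c"),
  ad_share_gross = c(5, 5, 10),
  sub_share_gross = c(50, 30, 20),
  bits_share_gross = c(10, 20, 0)
))

# Sum every *_gross column in SQLite, then bring back the single-row result
totals <- tbl(con, "twitch") %>%
  summarise(across(ends_with("_gross"), ~ sum(.x, na.rm = TRUE))) %>%
  collect()
DBI::dbDisconnect(con)

# One row per revenue source, with its share of the grand total
shares <- totals %>%
  pivot_longer(everything(), names_to = "revenue_source", values_to = "revenue") %>%
  mutate(share = revenue / sum(revenue)) %>%
  arrange(desc(share))
```

Only the final division and reshaping happen in R, on a one-row result.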

Revenue ranges

Looking at the data, starting a career as a streamer doesn’t look very attractive. Over the last 2 years, 85% of streamers received less than $10’000 in total.

Total revenue range        % of streamers
$0 - $9                    14.366%
$10 - $99                  8.296%
$100 - $999                28.823%
$1000 - $9999              34.197%
$10000 - $99999            12.069%
$100000 - $999999          2.010%
$1000000 - $9999999        0.223%
$10000000 - $99999999      0.015%
$100000000 - $999999999    0.000%
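The ranges above are orders of magnitude of each streamer’s total revenue. A minimal sketch of that bucketing follows; `revenue_ranges()` is a hypothetical helper, the toy table stands in for the real one with only two revenue columns, and a streamer’s total is assumed to be the sum over all of their rows. Negative totals, if any exist, would land in the first bucket.

```r
library(dplyr)

# Hypothetical helper: bucket per-streamer totals into [0, 10), [10, 100), ...
revenue_ranges <- function(totals) {
  breaks <- c(-Inf, 10^(1:9))  # intervals are closed on the left: [a, b)
  labels <- sprintf("$%.0f - $%.0f", c(0, 10^(1:8)), 10^(1:9) - 1)
  ranges <- cut(totals, breaks = breaks, labels = labels, right = FALSE)
  counts <- as.vector(table(ranges))  # table() keeps empty levels, in order
  tibble(
    range = labels,
    pct_streamers = 100 * counts / length(totals),
    cum_pct = cumsum(pct_streamers)
  )
}

# Made-up toy table; with the real data, all revenue columns would be added
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
DBI::dbWriteTable(con, "twitch", data.frame(
  user_id = c("a", "b", "b", "c", "d", "e"),
  sub_share_gross = c(5, 30, 20, 500, 4000, 4500),
  bits_share_gross = c(0, 0, 0, 0, 1000, 500)
))

# One total per streamer, computed in SQLite; pull() returns a plain vector
totals <- tbl(con, "twitch") %>%
  group_by(user_id) %>%
  summarise(total = sum(sub_share_gross + bits_share_gross, na.rm = TRUE)) %>%
  pull(total)
DBI::dbDisconnect(con)

ranges_tbl <- revenue_ranges(totals)
```

The cumulative column makes a headline figure easy to read off: in the table above, the first four ranges add up to 85.682% of streamers.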

And the picture seems even more dire when looking at the monthly average:

Monthly average range    % of streamers
$0 - $9                  27.993%
$10 - $99                33.264%
$100 - $999              29.313%
$1000 - $9999            8.096%
$10000 - $99999          1.207%
$100000 - $999999        0.121%
$1000000 - $9999999      0.006%
$10000000 - $99999999    0.000%
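How the monthly average is defined changes the result. One plausible reading, sketched below purely as an assumption, is to sum each streamer’s revenue per calendar month and then average over the months in which they appear; counting months without any record as zero would lower the averages. The rows are made up, and `total` stands for a row’s summed revenue columns.

```r
library(dplyr)
library(lubridate)

# Made-up daily rows; `total` is a row's summed revenue
daily <- tibble(
  user_id = c("a", "a", "a", "b"),
  report_date = as_date(c("2019-08-03", "2019-08-20", "2019-09-05", "2019-08-10")),
  total = c(10, 20, 60, 5)
)

monthly_avg <- daily %>%
  mutate(month = floor_date(report_date, unit = "month")) %>%
  # Step 1: one revenue figure per streamer and calendar month
  group_by(user_id, month) %>%
  summarise(month_total = sum(total), .groups = "drop") %>%
  # Step 2: average over the months each streamer appears in
  group_by(user_id) %>%
  summarise(monthly_avg = mean(month_total), n_months = n(), .groups = "drop")
```

The resulting `monthly_avg` values can then be bucketed into the same order-of-magnitude ranges as the totals.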

As such, for most streamers, streaming seems to be a complement to another activity rather than a main one: over 90% of them average less than $1000 per month. The success of a handful of streamers hides the experience of all the others.

Final words and next steps

The main issue with this data is that it does not include stream-adjacent revenue streams, such as partnerships or sponsored streams, which tend to be the main revenue streams for streamers.

One thing I would like to see is whether revenue differs by country or by main streaming language. But having to go through 2.4 million unique user IDs to collect this information is not something I am looking forward to.