Live Streaming Sessions Dataset
Thousands of live streaming sessions from two major service providers: Twitch.tv and YouTube Live

Dataset accepted to the dataset track at MMSys 2015

Downloads

Download | Size Compressed | Size Uncompressed | 1-day data sample (compressed/uncompressed)
Twitch Dataset | 12G | 241G | Sample Twitch Dataset (244M/3G)
YouTube Live Dataset | 1G | 81G | Sample YouTube Dataset (64M/713M)
Twitch Summary | 676M | 2.4G | not applicable
YouTube Live Summary | 54M | 134M | not applicable
Twitch Sessions | 4.7G | 25G | Sample Twitch Sessions (52M/286M)
YouTube Live Sessions | 819M | 2.3G | Sample YouTube Sessions (6.8M/20M)
Twitch Sessions Filtered | 4.7G | 14G | Sample Twitch Sessions Filtered (28M/145M)
YouTube Live Sessions Filtered | 819M | 1.2G | Sample YouTube Sessions Filtered (2.9M/10M)

Files Contents

Twitch Dataset - file name: twitch.7z

Contains the original logs from the Twitch API in SQLite database files.

Each SQLite database holds one day of data. Files are named 2014-<MONTH>-<DAY>.sqlite, for example 2014-01-06.sqlite.

Sample contents of 2014-01-06.sqlite, as returned by the following SQL command:

select date_utc,url,substr(content,0,100) from data limit 3;

date_utc | url | content
2014-01-06 00:00:26 | http://api.justin.tv/api/stream/list.json?limit=100&offset=0 | [{"broadcast_part":14,"featured":true,"channel_subscription":true,"audio_codec":"aac","id":"8119764 ...
2014-01-06 00:00:31 | http://api.justin.tv/api/stream/list.json?limit=100&offset=100 | [{"subcategory":"series","broadcast_part":1,"featured":false,"channel_subscription":false,"audio_co ...
2014-01-06 00:00:34 | http://api.justin.tv/api/stream/list.json?limit=100&offset=200 | [{"broadcast_part":34,"featured":false,"channel_subscription":false,"audio_codec":"aac","id":"81153 ...
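A minimal Python sketch of how one day's database could be read with the standard sqlite3 and json modules. It assumes each row's content column holds the complete JSON array returned by the API (the sample above shows only a truncated prefix). The function name iter_streams is hypothetical; the table name data and the columns date_utc, url, and content come from the SQL command above.

```python
import json
import sqlite3


def iter_streams(conn):
    """Yield (date_utc, stream) pairs from the `data` table.

    Assumption: `content` is a JSON array of stream objects, as the
    truncated sample suggests. Rows that do not parse are skipped.
    """
    query = "select date_utc, url, content from data order by date_utc"
    for date_utc, _url, content in conn.execute(query):
        try:
            streams = json.loads(content)
        except (TypeError, ValueError):
            continue  # empty or malformed API response
        if not isinstance(streams, list):
            continue  # not the expected list of streams
        for stream in streams:
            yield date_utc, stream
```

For example, after conn = sqlite3.connect("2014-01-06.sqlite"), iterating over iter_streams(conn) would yield one pair per stream listed in each API response.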

YouTube Live Dataset - file name: youtube.7z

Contains the original logs from the YouTube Live API in SQLite database files.

Each SQLite database holds one day of data. Files are named 2014-<MONTH>-<DAY>.sqlite, for example 2014-01-06.sqlite.

Sample contents of 2014-01-06.sqlite, as returned by the following SQL command:

select date_utc,url,substr(content,0,100) from data limit 3;

date_utc | url | content
2014-01-06 00:00:03 | https://gdata.youtube.com/feeds/api/charts/live/events/live_now?v=2&inline=true&max-results=50&start-index=1 | <?xml version='1.0' encoding='UTF-8'?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:app='http://w ...
2014-01-06 00:00:04 | https://gdata.youtube.com/feeds/api/charts/live/events/live_now?start-index=51&max-results=50&inline=true&v=2 | <?xml version='1.0' encoding='UTF-8'?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:app='http://w ...
2014-01-06 00:00:06 | https://gdata.youtube.com/feeds/api/charts/live/events/live_now?start-index=101&max-results=50&inline=true&v=2 | <?xml version='1.0' encoding='UTF-8'?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:app='http://w ...
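Since the content column here holds an Atom XML feed, a response could be inspected with Python's standard xml.etree.ElementTree module. The sketch below only counts entries; it assumes each live event is a direct entry child of the feed element in the Atom namespace shown in the sample. The function name count_entries is hypothetical.

```python
import xml.etree.ElementTree as ET

# Atom namespace, as declared in the sample feed above.
ATOM_NS = "{http://www.w3.org/2005/Atom}"


def count_entries(content):
    """Return the number of Atom <entry> elements in one API response."""
    if isinstance(content, str):
        # The feed declares UTF-8, so parse it as UTF-8 bytes.
        content = content.encode("utf-8")
    root = ET.fromstring(content)
    return len(root.findall(ATOM_NS + "entry"))
```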

Twitch Summary - file name: twitch-info.tar.gz

Contains the summary information of channels and sessions from the Twitch API.

There are two files: channels.csv, with the channel information, and sessions.csv, with the session information.

Sample contents of channels.csv:

channel_id | rows_count | viewers_max | viewers_min | viewers_sum | video_bitrate_max | video_bitrate_min | video_bitrate_sum | accessed_at_utc_std_max | accessed_at_utc_std_min | category | mature | featured | broadcaster | channel_login | audio_codec | timezone | video_codec | geo | session_id_count
383640851940353030.22656251240.929687539935.093752014-02-11 18:25:002014-01-23 04:15:00gamingFalseobsbongkanoeaacAmerica/Los_AngelesAVCUS6
507072913000939.21875438.843752036.8281252014-02-18 16:00:002014-01-18 13:25:00gamingFalseoctodadheydudeeeeaacAVCGB2
5540686185012425.32031250.02097.83593752014-01-22 13:30:002014-01-22 12:55:00gamingFalseobslevolshadeaacAmerica/Los_AngelesAVCPT6

Sample contents of sessions.csv:

session_id | rows_count | viewers_max | viewers_min | viewers_sum | video_bitrate_max | video_bitrate_min | video_bitrate_sum | accessed_at_utc_std_max | accessed_at_utc_std_min | category | mature | channel_id | featured | broadcaster | channel_login | audio_codec | timezone | video_codec | geo
8849407616 | 5 | 5 | 2 | 15 | 3458.4453125 | 2870.234375 | 15786.8359375 | 2014-03-13 23:10:00 | 2014-03-13 22:50:00 | gaming | | 30206559 | False | obs | robk1ll | mp3 | Europe/Berlin | AVC | DE
8367868016 | 1 | 0 | 0 | 0 | 1824.7890625 | 1824.7890625 | 1824.7890625 | 2014-01-28 14:50:00 | 2014-01-28 14:50:00 | gaming | True | 39885982 | False | obs | kick1337 | aac | Europe/Amsterdam | AVC | BG
8540483024 | 11 | 2 | 0 | 6 | 1719.5625 | 1460.0546875 | 17648.953125 | 2014-02-14 12:40:00 | 2014-02-14 11:50:00 | gaming | | 43698438 | False | obs | xawitor | aac | Europe/Warsaw | AVC | PL
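The summary columns read as per-session aggregates over the 5-minute snapshot rows. The Python sketch below is a hypothetical reconstruction of that relationship for the viewer and timestamp columns, not the authors' script. The function name summarize_sessions and the dict-based row layout are assumptions.

```python
def summarize_sessions(rows):
    """Aggregate snapshot rows into one summary dict per session_id.

    `rows` is an iterable of dicts with the keys session_id, viewers,
    and accessed_at_utc_std ("YYYY-MM-DD HH:MM:SS" strings, which sort
    chronologically when compared as text).
    """
    summary = {}
    for row in rows:
        sid = row["session_id"]
        viewers = int(row["viewers"])
        ts = row["accessed_at_utc_std"]
        s = summary.get(sid)
        if s is None:
            # First snapshot seen for this session.
            summary[sid] = {
                "rows_count": 1,
                "viewers_max": viewers,
                "viewers_min": viewers,
                "viewers_sum": viewers,
                "accessed_at_utc_std_max": ts,
                "accessed_at_utc_std_min": ts,
            }
            continue
        s["rows_count"] += 1
        s["viewers_max"] = max(s["viewers_max"], viewers)
        s["viewers_min"] = min(s["viewers_min"], viewers)
        s["viewers_sum"] += viewers
        s["accessed_at_utc_std_max"] = max(s["accessed_at_utc_std_max"], ts)
        s["accessed_at_utc_std_min"] = min(s["accessed_at_utc_std_min"], ts)
    return summary
```

With these aggregates, the mean number of viewers of a session is viewers_sum divided by rows_count.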

YouTube Live Summary - file name: youtube-info.tar.gz

Contains the summary information of channels and sessions from the YouTube Live API.

There are two files: channels.csv, with the channel information, and sessions.csv, with the session information.

Sample contents of channels.csv:

channel_id | rows_count | viewers_max | viewers_min | viewers_sum | accessed_at_utc_std_max | accessed_at_utc_std_min | category | rating_avg | channel_login | favorite_count | session_id_count
cDkCoJM0JxDFRxCMi_fmqw2220112014-03-28 02:35:002014-03-28 00:25:00EntertainmentJane Wilde2
EiSlJJL-22IGQcm0m37DNA70002014-03-10 15:15:002014-03-10 13:45:00EntertainmentJuankar Rodriguez2
qpyILgSP-4gLYV5uUBQ-TA47602021302014-03-20 05:35:002014-03-06 03:30:00People5.0TalkingNerd2

Sample contents of sessions.csv:

session_id | rows_count | viewers_max | viewers_min | viewers_sum | accessed_at_utc_std_max | accessed_at_utc_std_min | category | channel_id | rating_avg | channel_login | favorite_count
Fpq9V5GPTpg | 1 | 1 | 1 | 1 | 2014-02-19 03:35:00 | 2014-02-19 03:35:00 | Entertainment | m3HRGBa1blGUV6qvWwGL6A | | Vasili Cojemeachin |
U0cLWRpum60 | 1 | 0 | 0 | 0 | 2014-03-21 00:45:00 | 2014-03-21 00:45:00 | Games | Ff5toC3WtpKukAm7VkCs3Q | | Beastgonerage |
5iW_gbP6n0M | 6 | 1 | 0 | 2 | 2014-02-02 23:10:00 | 2014-02-02 22:45:00 | Entertainment | bXOMk57V6-5bJRdgnnTYTQ | | Hany Elgoarany |

Twitch Sessions - file name: twitch-sessions.tar.gz

Contains the parsed information from the Twitch API in CSV files.

Each CSV file holds one day of data. Files are named 2014-<MONTH>-<DAY>.csv, for example 2014-01-06.csv.

Sample contents of 2014-01-06.csv:

accessed_at_utc | accessed_at_utc_std | session_id | channel_id | channel_login | viewers | geo | category | video_bitrate | uptime | uptime_sec | language_channel | language_session | video_height | video_width | embed_count | site_count | timezone | session_count | video_codec | channel_view_count | broadcaster | broadcast_part | featured | channel_subscription | audio_codec | producer | mature
2014-01-06 00:00:262014-01-06 00:00:00811976403222510310speeddemosarchivesda58136GBgaming2961.4765625Sun Jan 5 09:02:05 201432301.0enen720128000US/EasternAVC21161133obs14TrueTrueaacTrue
2014-01-06 00:00:262014-01-06 00:00:00811818656037701508phantoml0rd48254Nonegaming2861.34375Sun Jan 5 06:17:57 201442149.0enen1080192000US/PacificAVC48487513obs20TrueTrueaacTrue
2014-01-06 00:00:262014-01-06 00:00:00811957835221390470sing_sing17281NLgaming1871.640625Sun Jan 5 08:43:31 201433415.0nlnl720128000Europe/AmsterdamAVC14876907obs15TrueTrueaacTrueTrue
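The daily CSV files could be read straight from the archive without unpacking it to disk. The Python sketch below uses the standard tarfile and csv modules and rests on several assumptions not confirmed by this page: the files are comma-delimited with a header row, they are UTF-8 encoded, and member names inside the archive end with <day>.csv. The function name iter_day_rows is hypothetical.

```python
import csv
import io
import tarfile


def iter_day_rows(tar, day):
    """Yield one dict per CSV row for the given day, e.g. "2014-01-06".

    `tar` is an open tarfile.TarFile. Assumptions: comma-delimited
    files with a header row, UTF-8 text, member names ending with
    "<day>.csv".
    """
    for member in tar:
        if member.isfile() and member.name.endswith(day + ".csv"):
            raw = tar.extractfile(member)
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            yield from csv.DictReader(text)
            return  # stop after the first matching member
```

A possible call is to open the archive with tarfile.open("twitch-sessions.tar.gz", "r:gz") and pass the resulting object together with "2014-01-06".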

YouTube Live Sessions - file name: youtube-sessions.tar.gz

Contains the parsed information from the YouTube Live API in CSV files.

Each CSV file holds one day of data. Files are named 2014-<MONTH>-<DAY>.csv, for example 2014-01-06.csv.

Sample contents of 2014-01-06.csv:

accessed_at_utc | accessed_at_utc_std | session_id | channel_id | channel_login | viewers | category | uptime | uptime_sec | favorite_count | all_viewers | rating_avg | rating_count | rating_likes | rating_dislikes
2014-01-06 00:00:03 | 2014-01-06 00:00:00 | bZ1s1bj6AZE | MEiyV8N2J93GdPNltPYM6w | ЕСПРЕСО ТВ | 1337 | Nonprofit | 2013-11-24 23:35:03 | 3630300.0 | 0 | 228266 | 4.4051433 | 9488 | 8077 | 1411
2014-01-06 00:00:03 | 2014-01-06 00:00:00 | Crw3xqGNUIo | LJun193ZJyheIJ9jFM2QOw | The Sport of Bowling - USBC | 871 | Sports | 2013-12-31 22:34:48 | 437115.0 | 0 | 610 | 4.6923075 | 26 | 24 | 2
2014-01-06 00:00:03 | 2014-01-06 00:00:00 | 492qhhJH2hs | 2oGvjIJwxn1KeZR3JtE-uQ | Громадське ТБ | 768 | News | 2014-01-03 14:22:50 | 207433.0 | 0 | 14969 | 4.8004246 | 942 | 895 | 47
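Two derived columns in this sample can be reproduced with Python's standard datetime module: accessed_at_utc_std (the 5-minute normalization) and uptime_sec (seconds between uptime and accessed_at_utc). This is a sketch under assumptions: that normalization rounds down to the start of the 5-minute slot (the samples are consistent with this but do not prove it), and that both timestamps use the "YYYY-MM-DD HH:MM:SS" format shown for YouTube Live. The Twitch uptime format differs and is not handled here. The function names are hypothetical.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"


def normalize_5min(accessed_at_utc):
    """Round a timestamp string down to the start of its 5-minute slot."""
    t = datetime.strptime(accessed_at_utc, FMT)
    t = t.replace(minute=t.minute - t.minute % 5, second=0, microsecond=0)
    return t.strftime(FMT)


def uptime_seconds(accessed_at_utc, uptime):
    """Seconds elapsed between the session start and the snapshot."""
    start = datetime.strptime(uptime, FMT)
    seen = datetime.strptime(accessed_at_utc, FMT)
    return (seen - start).total_seconds()
```

For the first sample row, uptime_seconds("2014-01-06 00:00:03", "2013-11-24 23:35:03") gives 3630300.0, which matches the uptime_sec value in the file.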

Twitch Sessions Filtered - file name: twitch-sessions-fix.tar.gz

Contains the parsed information from the Twitch API in CSV files.

Sessions with no viewers, or that appear in only one snapshot, were filtered out.

We added rows with approximated values for sessions that have missing rows between two (or more) consecutive snapshots. These rows have the flag fixed_row=1.

Each CSV file holds one day of data. Files are named 2014-<MONTH>-<DAY>-fix.csv, for example 2014-01-06-fix.csv.

Sample contents of 2014-01-06-fix.csv:

date | session_id | channel_id | viewers | uptime | bitrate | fixed_row
2014-01-06 00:00:0080886263203940103037Fri Jan 3 00:14:19 2014921.4843750
2014-01-06 00:00:0080888123365959099282Fri Jan 3 00:50:38 20142120.3593750
2014-01-06 00:00:00808935044852539494181Fri Jan 3 02:36:31 2014773.75781250
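The page does not say how the approximated values in the added rows were computed. As an illustration only, the Python sketch below fills missing 5-minute snapshots of one session by linear interpolation of the viewer count and marks the added rows with fixed_row=1. Linear interpolation is an assumption, not the authors' documented method, and the function name fill_gaps is hypothetical.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S"
STEP = timedelta(minutes=5)  # snapshot interval used by the dataset


def fill_gaps(rows):
    """Return the rows of one session with missing snapshots added.

    `rows` is a list of (date, viewers) tuples sorted by date.
    Output tuples are (date, viewers, fixed_row).
    """
    filled = []
    for i, (date, viewers) in enumerate(rows):
        if i > 0:
            prev_date, prev_viewers = rows[i - 1]
            t0 = datetime.strptime(prev_date, FMT)
            t1 = datetime.strptime(date, FMT)
            steps = int((t1 - t0) / STEP)
            for k in range(1, steps):
                # Assumed approximation: linear interpolation between
                # the two known snapshots.
                v = prev_viewers + (viewers - prev_viewers) * k / steps
                filled.append(((t0 + k * STEP).strftime(FMT), round(v), 1))
        filled.append((date, viewers, 0))
    return filled
```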

YouTube Live Sessions Filtered - file name: youtube-sessions-fix.tar.gz

Contains the parsed information from the YouTube Live API in CSV files.

Sessions with no viewers, or that appear in only one snapshot, were filtered out.

We added rows with approximated values for sessions that have missing rows between two (or more) consecutive snapshots. These rows have the flag fixed_row=1.

Each CSV file holds one day of data. Files are named 2014-<MONTH>-<DAY>-fix.csv, for example 2014-01-06-fix.csv.

Sample contents of 2014-01-06-fix.csv:

date | session_id | channel_id | viewers | uptime | bitrate | fixed_row
2014-01-06 00:00:00 | 3Q0EgzvLjt0 | yS7HkaIUX2FBJkJUsnFZUA | 0 | 2013-04-09 21:51:41 | 0 | 0
2014-01-06 00:00:00 | roRe2Iuu_bw | PIvT-zcQl2H0vabdXJGcpg | 5 | 2013-08-30 18:14:28 | 0 | 0
2014-01-06 00:00:00 | 5b23nApD49M | PIvT-zcQl2H0vabdXJGcpg | 9 | 2013-08-30 18:27:28 | 0 | 0
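The filtering rule described for this file can be sketched in Python as follows. The sketch reads "no viewers" as a session whose viewer count is zero in every snapshot and "one appearance" as a session seen in a single snapshot. Both readings are assumptions, as is the idea that the rule is applied over all rows passed in. The function name filter_sessions is hypothetical.

```python
def filter_sessions(rows):
    """Drop rows of sessions seen only once or never having a viewer.

    `rows` is a list of dicts with the keys session_id and viewers.
    """
    appearances = {}
    max_viewers = {}
    for row in rows:
        sid = row["session_id"]
        appearances[sid] = appearances.get(sid, 0) + 1
        max_viewers[sid] = max(max_viewers.get(sid, 0), int(row["viewers"]))
    return [
        row
        for row in rows
        if appearances[row["session_id"]] > 1
        and max_viewers[row["session_id"]] > 0
    ]
```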

Fields Description

accessed_at_utc Date and time in UTC when the data was collected.
accessed_at_utc_std Date and time in UTC when the data was collected, normalized to 5-minute intervals. For example, for accessed_at_utc 2014-01-06 00:00:03, accessed_at_utc_std is 2014-01-06 00:00:00.
accessed_at_utc_std_max Maximum accessed_at_utc_std found in the aggregated data of this row.
accessed_at_utc_std_min Minimum accessed_at_utc_std found in the aggregated data of this row.
all_viewers Counter of viewers from the service's API.
audio_codec Codec used by the broadcaster for audio.
bitrate Bit rate of the video.
broadcast_part Broadcast part.
broadcaster Broadcaster software.
category Category of the channel.
channel_id Channel unique identification.
channel_login Channel alphanumeric login.
channel_subscription Counter for the subscriptions of the channel.
channel_view_count Counter of viewers from the service's API.
content Content received from the URL requested from the service's API.
date Date and time in UTC when the data was collected, normalized to 5-minute intervals.
date_utc Date and time in UTC when the data was collected.
embed_count Counter of embeds from the service's API.
favorite_count Counter of favorites from the service's API.
featured Flag for the featured channels.
fixed_row Flag set to 1 if the row was created to fix the data set.
geo Geolocation of the broadcaster.
language_channel Channel language.
language_session Session language.
mature Flag for mature content channel.
producer Flag for producer.
rating_avg Average rating.
rating_count Number of ratings.
rating_dislikes Sum of rating dislikes.
rating_likes Sum of rating likes.
rows_count Row count found in the aggregated data of this row.
session_count Session count found in the aggregated data of this row.
session_id Session unique identification.
session_id_count Session_id count found in the aggregated data of this row.
site_count Site count from the service's API.
timezone Timezone.
uptime Session start time.
uptime_sec Calculated number of seconds since the start time.
url URL requested from the service's API.
video_bitrate Video bitrate.
video_bitrate_max Maximum video bitrate found in the aggregated data of this row.
video_bitrate_min Minimum video bitrate found in the aggregated data of this row.
video_bitrate_sum Sum of video bitrate found in the aggregated data of this row.
video_codec Codec used for the video.
video_height Video height.
video_width Video width.
viewers Number of viewers.
viewers_max Maximum viewers found in the aggregated data of this row.
viewers_min Minimum viewers found in the aggregated data of this row.
viewers_sum Sum of viewers found in the aggregated data of this row.