No one gets it right the first time
My entire first week of Insight was spent coming up with at least a dozen independent project ideas. They ranged from tracking popular trends to politics and natural resources. None of them survived a quick search for data or a deeper look into the problem I was trying to solve (if there was one). Instead of throwing more random ideas together, I focused on what I really wanted to work on and what components the project should involve:
- Machine learning
- Something with a personal touch
With at least a framework of what a project should look like, I started looking for inspiration everywhere. As an avid gamer, one place I turned to was Twitch, since it was an amazing source of video. After scanning through the site, I found that The International, a Dota 2 championship, had aired a month or so prior and that over 50 hours of video, audio, and chat logs were available.
If I could get the video and chat from the broadcast of The International, I could potentially use the chat logs as a guide to find highlights in the very long videos, each of which contained multiple games. I could use machine learning to find the game segments in the video and generate analytics on the chat logs, and on top of that, it would be something that really had a personal touch.
That's if I could make it work.
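To make the chat-as-guide idea concrete, here is a minimal sketch of one way it could work. It is not the code Livebeat used. It assumes the chat log has already been reduced to a list of non-negative message timestamps (seconds from the start of the broadcast), and it flags time bins whose message count is unusually high. The function name `chat_spikes`, the 30-second bin size, and the z-score threshold of 2.0 are all illustrative choices.

```python
from collections import Counter
from statistics import mean, pstdev


def chat_spikes(timestamps, bin_seconds=30, z_threshold=2.0):
    """Return the start times (in seconds) of bins with unusually many messages.

    `timestamps` is assumed to be an iterable of non-negative numbers,
    one per chat message. All parameters are hypothetical defaults.
    """
    timestamps = list(timestamps)
    if not timestamps:
        return []

    # Count how many messages fall into each fixed-width time bin.
    counts = Counter(int(t // bin_seconds) for t in timestamps)
    n_bins = max(counts) + 1
    per_bin = [counts.get(i, 0) for i in range(n_bins)]

    # Compare each bin to the average activity across the whole log.
    mu = mean(per_bin)
    sigma = pstdev(per_bin)
    if sigma == 0:
        return []  # perfectly flat chat activity: nothing stands out

    return [
        i * bin_seconds
        for i, count in enumerate(per_bin)
        if (count - mu) / sigma > z_threshold
    ]
```

A spike in chat does not guarantee a highlight, so the returned times are best treated as candidates to check against the video rather than as final answers.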
I was able to figure out how to get the data I needed (I'll leave that as an exercise for the reader), but that was one of the few successes I had during my first week of working on Livebeat. It turns out that processing 10+ hours of 720p, 60 fps video is really hard, and I didn't have a fancy server cluster, just my laptop. Python, even with all of its available packages, didn't have the best tools for processing such large videos. I tried OpenCV but couldn't get it working with my videos, even after about 12 hours of work. I had some success with ImageIO, a Python library that can read video through ffmpeg, but it would crash after processing about 25% of each video. I'd have to cut each video into quarters and babysit my computer to make sure I wasn't losing any time.
As the deadline for having a minimum viable product loomed, I had to find a better solution: a way to process each video in full without breaking it up.