Saturday, September 17, 2011

Finding a YouTube video title by scraping

I was recently writing a tool in C# and needed a way to get a YouTube video title. I could have used the YouTube API, but I didn't want to set everything up and go through the documentation, so I decided to scrape the title off the video page itself.

For this, I first wrote a very simple helper that executes a GET request and returns the response body (all error checking removed for brevity):
// Requires: using System.IO; using System.Net;
public static string ExecuteGETRequest(string url)
{
    var request = (HttpWebRequest)WebRequest.Create(url);

    // Dispose both the response and the reader once the body has been read.
    using (var response = (HttpWebResponse)request.GetResponse())
    using (var reader = new StreamReader(response.GetResponseStream()))
    {
        return reader.ReadToEnd();
    }
}
And here's the method that does the task at hand:
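public static string GetYouTubeVideoTitle(string youtubeLinkUrl)
{
    const string marker = "<title>\n";
    string response = ExecuteGETRequest(youtubeLinkUrl);

    // IndexOf returns -1 when the marker is missing; return null instead of slicing at a bogus offset.
    int start = response.IndexOf(marker, StringComparison.Ordinal);
    if (start < 0) return null;

    // The title is the rest of the line that follows the marker.
    string title = response.Substring(start + marker.Length);
    int end = title.IndexOf('\n');
    return end < 0 ? null : title.Substring(0, end).Trim();
}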

Usage

var title = GetYouTubeVideoTitle("http://www.youtube.com/watch?v=TY9jN6hm3N0"); // "Wicked Lady - Ship of Ghosts (Part 1 of 2) 1972"
Be wary of relying on this method: it depends on the exact markup YouTube emits (here, a newline right after <title>), so it can break whenever YouTube changes how the page is rendered. It may still be useful for quick scripts.
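As a hedge against that fragility, here is a minimal sketch of a more tolerant extraction that does not depend on where the newlines fall. It is not the method used above: it swaps the Substring arithmetic for a regular expression over the <title> element and decodes HTML entities. The class name TitleSketch and the method ExtractTitle are hypothetical names chosen for illustration, and the input is assumed to be the page HTML already downloaded as a string.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

public static class TitleSketch
{
    // Hypothetical helper (not from the original post): returns the text between
    // <title> and </title>, regardless of the whitespace around it.
    public static string ExtractTitle(string html)
    {
        var match = Regex.Match(html ?? "", @"<title\b[^>]*>(.*?)</title>",
            RegexOptions.IgnoreCase | RegexOptions.Singleline);

        if (!match.Success)
            return null; // no <title> element found

        // Decode entities such as &amp; and trim the surrounding whitespace.
        return WebUtility.HtmlDecode(match.Groups[1].Value).Trim();
    }
}
```

For example, TitleSketch.ExtractTitle("<html><title>\n  Foo &amp; Bar\n</title></html>") returns "Foo & Bar". This is still scraping, so it can break too; it only removes the dependence on the exact line layout.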

Checking for link validity

If you want to make sure the given link actually points to a YouTube video, you can use something like this:
public static bool IsValidYoutubeLink(string youtubeLink)
{
    // Requires: using System.Text.RegularExpressions;
    // Adapted from http://www.regexlib.com/REDetails.aspx?regexp_id=2569 (dot escaped, stray '+' dropped, https allowed)
    return Regex.IsMatch(youtubeLink, @"^https?://(\w{1,3}\.)?youtube\.\w{2,3}/watch\?v=[\w-]{11}");
}
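
If you also need the video ID itself, a variant of the pattern above with a capture group around the 11-character ID can return it directly. This is only a sketch: YouTubeLinkSketch and GetVideoId are hypothetical names, and the pattern assumes the same watch?v= URL shape as the check above, with https also accepted.

```csharp
using System;
using System.Text.RegularExpressions;

public static class YouTubeLinkSketch
{
    // Same URL shape as the validity check, plus a capture group around the ID.
    private static readonly Regex WatchUrl = new Regex(
        @"^https?://(\w{1,3}\.)?youtube\.\w{2,3}/watch\?v=([\w-]{11})");

    // Hypothetical helper: returns the video ID, or null when the link
    // does not look like a YouTube watch URL.
    public static string GetVideoId(string youtubeLink)
    {
        var match = WatchUrl.Match(youtubeLink ?? "");
        return match.Success ? match.Groups[2].Value : null;
    }
}
```

With the link from the usage example, YouTubeLinkSketch.GetVideoId("http://www.youtube.com/watch?v=TY9jN6hm3N0") returns "TY9jN6hm3N0".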