Friday, April 26, 2013

Using the Blogger API to search within a post's HTML content

A couple of months ago I wanted to switch the syntax highlighter I used on this blog to SunlightJS. The thing is, I didn't want to go through every post I had written and check whether it used the old syntax highlighter so that I could update its HTML markup.

Another problem was that the only way to truly know whether a particular post used the old syntax highlighter was to go through its HTML content and determine whether it contained a tag like the following: <pre class="

The issue here is that the normal Google-driven search box packaged with Blogger (the one you see on the right-hand side) does not search through a post's HTML content but only through its textual content, which means it was useless for my task.

Because of this, I decided to write a script to semi-automate the task for me. I say semi-automate because, although I wanted it to show me all the posts in which I had used the old highlighter, I could not risk letting it update the posts automatically to use the new syntax highlighter. That would have involved HTML parsing and it was not worth the hassle. Besides, I didn't want to risk ruining my old posts because of a script fuckup.

So what the script does is use the Blogger API to batch-fetch all my previous blog posts, go through each post's HTML content and tell me whether that particular post is using the old syntax highlighter. Searching is done with a regular expression and, for my case, the function was pretty trivial:

// Counts how many times needle (treated as a regular expression) matches in haystack.
var getOccurences = function (haystack, needle) {
    var matches = haystack.match(new RegExp(needle, "g"));
    return matches ? matches.length : 0;
};
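One caveat with the function above is that the needle is handed straight to RegExp, so characters such as . or ( would be treated as pattern syntax. The following is a minimal sketch of a variant that matches the needle literally. The names escapeRegExp and countOccurrences, and the class name "old" in the sample markup, are made up for illustration and are not part of the original script.

```javascript
// Escape regex metacharacters so the needle is matched literally.
function escapeRegExp(text) {
    return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Count literal, non-overlapping occurrences of needle in haystack.
function countOccurrences(haystack, needle) {
    if (!needle) {
        return 0; // an empty needle would otherwise match at every position
    }
    var matches = haystack.match(new RegExp(escapeRegExp(needle), "g"));
    return matches ? matches.length : 0;
}

// Sample markup; the class name "old" is hypothetical.
var html = '<pre class="old">a</pre><p>text</p><pre class="old">b</pre>';

console.log(countOccurrences(html, '<pre class="')); // 2
console.log(countOccurrences("a.b axb", "a.b"));     // 1 (the dot is literal)
```

Without the escaping step, the second call would also count "axb", because an unescaped dot matches any character.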

The Blogger API URL I used to fetch my posts' details is as follows:

https://www.googleapis.com/blogger/v3/blogs/<blogid>/posts?key=<API key>
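The API returns posts one page at a time, with a nextPageToken in the response when more pages are available. The sketch below shows one way to follow those tokens until every post has been collected. The function names and the getJson(url, callback) transport are hypothetical, so the example runs without a network; in the browser, jQuery's $.get could be passed in its place.

```javascript
// Build the posts URL; pageToken is optional.
function buildPostsUrl(blogId, key, pageToken) {
    var url = "https://www.googleapis.com/blogger/v3/blogs/" +
        encodeURIComponent(blogId) + "/posts?key=" + encodeURIComponent(key);
    return pageToken ? url + "&pageToken=" + encodeURIComponent(pageToken) : url;
}

// Fetch page after page until the response carries no nextPageToken.
// getJson(url, callback) is a hypothetical transport (assumption).
function fetchAllPosts(blogId, key, getJson, done) {
    var posts = [];
    (function next(pageToken) {
        getJson(buildPostsUrl(blogId, key, pageToken), function (response) {
            posts = posts.concat(response.items || []);
            if (response.nextPageToken) {
                return next(response.nextPageToken);
            }
            done(posts);
        });
    }());
}

// Usage with a fake, synchronous transport that serves two canned pages.
var fakePages = [
    { items: [{ id: "1" }, { id: "2" }], nextPageToken: "abc" },
    { items: [{ id: "3" }] }
];
var requested = [];
function fakeGetJson(url, callback) {
    requested.push(url);
    callback(fakePages[requested.length - 1]);
}

var collected;
fetchAllPosts("123", "KEY", fakeGetJson, function (posts) {
    collected = posts;
});

console.log(collected.length); // 3
console.log(requested[1]);     // .../blogs/123/posts?key=KEY&pageToken=abc
```

Passing the transport in as an argument keeps the paging logic separate from the HTTP call, which makes it easy to try out with canned responses like the ones above.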

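For the script to work I needed to provide it with three things: an API key, the ID of the blog, and the needle to search for. These are the key, blogId and baseMatcher variables at the top of the full script.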

Here's a demo of the script in action: http://jsfiddle.net/ydtuc/77/.

And this is what it looks like:


From the live demo (or the screenshot), you can see the output of the script in the bottom right pane. The script formats each match so that clicking on the #number of the post takes me to the actual blog post, while clicking on the name of the post takes me to the Blogger editor for that post so that I can easily update it. The number in parentheses next to the title is the number of times the needle appears in the haystack.
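As a rough illustration of how one such output line can be put together, here is a small sketch. The editor URL pattern is the one used in the script; escapeHtml and formatResultLine are hypothetical helpers, and escaping the title is an extra precaution that assumes titles arrive as plain text.

```javascript
// Escape characters that have special meaning in HTML.
function escapeHtml(text) {
    return String(text)
        .replace(/&/g, "&amp;")
        .replace(/</g, "&lt;")
        .replace(/>/g, "&gt;")
        .replace(/"/g, "&quot;");
}

// Build an anchor that opens in a new tab.
function formatLink(text, href) {
    return '<a href="' + escapeHtml(href) + '" target="_blank">' +
        escapeHtml(text) + '</a>';
}

// One line per match: "#index" links to the post, the title links to the editor.
function formatResultLine(index, post, blogId, count) {
    var editorUrl = "http://www.blogger.com/blogger.g?blogID=" + blogId +
        "#editor/target=post;postID=" + post.id;
    return formatLink("#" + index, post.url) + ": " +
        formatLink(post.title, editorUrl) + " (" + count + ")";
}

// Sample post; the id, url and title are made up.
var line = formatResultLine(
    0,
    { id: "42", url: "http://example.com/post", title: "Tips & <tricks>" },
    "123",
    3
);
console.log(line);
```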

This is the full script:

var key = 'AIzaSyCEsBaStg4bOHA2Yp75OqfbjBkk_kq0cMw',
    blogId = '7117612757607239790',
    baseMatcher = 'JavaScript', // the needle.
    allPosts = [],
    isProcessing,
    updateButton = $("#update"),
    update = (function (button) {
        var updateText = function (message, color) {
            button.attr("value", message);
            if (color) {
                button.css("color", color);
            }

        },
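            revertButtonMessage = function () {
                // Reset the label directly; going through changeButtonMessage
                // would schedule another revert every second, forever.
                updateText("Update", "black");
            },
            changeButtonMessage = function (message, color) {
                // Show a temporary message, then revert after one second.
                updateText(message, color);
                setTimeout(revertButtonMessage, 1000);
            },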
            switchButtonMessage = function (message, color) {
                updateText(message, color);
            };

        return {
            changeButtonMessage: changeButtonMessage,
            switchButtonMessage: switchButtonMessage,
            revertButtonMessage: revertButtonMessage
        };
    }(updateButton)),
    formatLink = function (text, href) {
        return '<a href="' + href + '" target="_blank">' + text + '</a>';
    },
    getOccurences = function (haystack, needle) {
        var matches = haystack.match(new RegExp(needle, "g"));
        return matches ? matches.length : 0;
    },
    logger = (function () {
        var logs = $("#logs"),
            appendLine = function (line) {
                logs.append((line ? line : "") + "<br/>");
            };

        return {
            log: function (items) {
                appendLine('<span class="logHeader">Logs #' + new Date().getTime() + '</span>');
                appendLine();
                $.each(items, function (i, item) {
                    appendLine(formatLink('#' + i, item.url) + ': ' + formatLink(item.title, "http://www.blogger.com/blogger.g?blogID=" + blogId + "#editor/target=post;postID=" + item.id) + ' (' + getOccurences(item.content, baseMatcher) + ')');
                });
                appendLine();
            },
            clear: function () {
                logs.html('');
            }
        };
    }()),
    filter = function (items, predicateForPass) {
        // Returns the items for which predicateForPass returns true.
        var i = 0,
            j = items.length,
            item, passed = [];

        for (; i < j; ++i) {
            item = items[i];
            if (!predicateForPass(item)) {
                continue;
            }

            passed.push(item);
        }

        return passed;
    },
    getPosts = function (pageToken) {
        var url = 'https://www.googleapis.com/blogger/v3/blogs/' + blogId + '/posts?key=' + key,
            get = function (token) {
                var pagedUrl = url + (token ? ("&pageToken=" + token) : "");
                $.get(pagedUrl, function (response) {
                    var token = response.nextPageToken;
                    // items may be missing when a page contains no posts.
                    allPosts = allPosts.concat(response.items || []);

                    if (token) {
                        return get(token);
                    }

                    isProcessing = false;
                    update.changeButtonMessage("Finished", "green");
                    var filtered = filter(allPosts, function (item) {
                        var preTotal = getOccurences(item.content, baseMatcher),
                            preNewSyntaxHighlighter = getOccurences(item.content, baseMatcher + "sunlight");
                        return (preTotal - preNewSyntaxHighlighter) > 0;
                    });

                    console.log(filtered);
                    logger.clear();
                    logger.log(filtered);
                    allPosts.length = 0;
                });
            };

        update.switchButtonMessage("Getting posts...", "green");
        isProcessing = true;

        get(pageToken);
    };
updateButton.click(function () {
    if (isProcessing) {
        return update.changeButtonMessage("Busy; try again later", "red");
    }

    getPosts();
});

getPosts();
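The filtering step in the script boils down to a subtraction: count every occurrence of the needle, count the occurrences that are already followed by "sunlight", and keep the post if the difference is positive. Below is a compact sketch of that idea using Array.prototype.filter. The helper names, the sample posts and the class names in the sample markup are assumptions for illustration only.

```javascript
// Count literal, non-overlapping occurrences of needle in haystack.
function countLiteral(haystack, needle) {
    return haystack.split(needle).length - 1;
}

// A post still needs work if some needles are not followed by newMarker.
function usesOldHighlighter(post, needle, newMarker) {
    var total = countLiteral(post.content, needle),
        converted = countLiteral(post.content, needle + newMarker);
    return total - converted > 0;
}

// Sample posts; the class names are hypothetical.
var posts = [
    { title: "Old", content: '<pre class="brush: js">x</pre>' },
    { title: "New", content: '<pre class="sunlight-highlight-javascript">x</pre>' },
    { title: "Mixed", content: '<pre class="sunlight-highlight-javascript">x</pre>' +
                               '<pre class="brush: js">y</pre>' },
    { title: "None", content: '<p>no code here</p>' }
];

var pending = posts.filter(function (post) {
    return usesOldHighlighter(post, '<pre class="', "sunlight");
});

console.log(pending.map(function (post) { return post.title; })); // [ 'Old', 'Mixed' ]
```

Splitting on the needle avoids regular expressions altogether, which sidesteps the escaping question when the needle is a fixed string.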