Fun With the Chrome JavaScript Console and the Pluralsight Website

I'm currently working on my third course for Pluralsight. Everyone already knows that Scott Allen is a "dominating force" for Pluralsight, but I was curious how many courses other authors have published as well. The Pluralsight Authors page (http://pluralsight.com/training/Authors) shows all 146 authors, and you can click on any author's page to see how many (and which) courses they have authored. The problem is: I don't want to have to click into 146 pages to get a count for each author.

With this in mind, I figured I could write a little JavaScript using the Chrome JavaScript console to do some "detective work." My first step was to figure out how the HTML was structured on this page so I could do some screen-scraping. I right-clicked the first author and chose "Inspect Element". I can see there is a primary <div> with a class of "main" which contains all the authors. Each author is in an <h3> with an <a> tag containing their name and a link to their page.

This web page already has jQuery loaded, so I can use $ directly from the console to inspect items on the current page. A first command can loop over the author links and print each name and URL. To enter a multi-line command like that in the console, press SHIFT-ENTER to go to the next line.
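A minimal sketch of that first step, assuming the markup described above (a "main" container with one h3 > a per author). The function name collectAuthors is hypothetical, and passing jQuery in as a parameter is just a choice to keep the sketch self-contained:

```javascript
// Hypothetical helper: gathers each author's name and detail-page URL.
// $ is expected to be the page's jQuery object.
function collectAuthors($) {
    var authors = [];
    $(".main h3 a").each(function () {
        // Inside each(), "this" is the current <a> element.
        authors.push({
            name: $(this).text(),
            url: $(this).attr("href")
        });
    });
    return authors;
}

// On the authors page, in the Chrome console:
// collectAuthors(jQuery).forEach(function (a) { console.log(a.name + " - " + a.url); });
```

Returning an array instead of logging inside the loop makes it easy to eyeball the data first and reuse it later.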

Now I can see I'm extracting data just fine. At this point I want to follow each URL and then screen-scrape the page it leads to, so I can see how many courses each author has done. Let's take a look at the author detail page.

I can see we have a table (with a CSS class of "course") that contains a row for each course authored. This means I can get the number of courses pretty easily by counting those rows.
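Counting the rows could look roughly like this. The selector is the same one used in the complete command later in this post; the countCourses name and the $ parameter are assumptions made for illustration:

```javascript
// Hypothetical helper: counts the course rows on an author detail page.
// $ is expected to be the page's jQuery object.
function countCourses($) {
    // One <tr> in the table body per authored course.
    return $("table.course tbody tr").length;
}

// On an author detail page, in the Chrome console:
// countCourses(jQuery);
```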

Now I can put this all together. Back on the authors page, I want to follow each URL, extract the returned HTML, and grab the count. I use the jQuery $.get() method to fetch the author detail page; the "data" argument passed to the callback contains the HTML. A nice feature of jQuery is that I can pass this HTML string to $() and then use jQuery selectors on it via the find() method.
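Putting those pieces together might look something like the sketch below. It assumes the same markup as before; the logCourseCounts name and its $ and log parameters are hypothetical and exist only so the snippet does not depend on globals:

```javascript
// Hypothetical helper: for each author link, fetch the detail page and
// report how many course rows it contains.
function logCourseCounts($, log) {
    $(".main h3 a").each(function () {
        var authorName = $(this).text();
        var authorUrl = $(this).attr("href");
        // $.get() is asynchronous; the callback receives the page HTML as a string.
        $.get(authorUrl, function (data) {
            // Wrap the HTML string in $() so jQuery selectors can search it.
            var courseCount = $(data).find("table.course tbody tr").length;
            log(authorName + ": " + courseCount);
        });
    });
}

// On the authors page, in the Chrome console:
// logCourseCounts(jQuery, function (line) { console.log(line); });
```

Because the requests finish in whatever order the server answers them, the logged lines will not necessarily match the order of authors on the page.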

Now I'm getting somewhere. I have every Pluralsight author and how many courses each one has authored. But that's not quite what I'm after: what I want to see are the authors that have the MOST courses in the library. What I'd like to do is put all of the data in an array and then sort that array descending by number of courses. I can add an item to the array as each author detail page is returned, but the catch is that I can't perform the sort until ALL of the author detail pages have come back. The jQuery $.get() method is asynchronous, so I essentially have 146 async calls in flight and I don't want to sort until every one has completed (side note: don't run this script too many times or the Pluralsight servers might think you're an evil hacker attempting a DoS attack and deny you). My C# brain wants to use WaitHandle.WaitAll() here, but this is JavaScript.
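The sorting half is the easy part. Here is a small sketch of a descending sort on objects shaped like the ones I want to collect; the sample names and counts are made up purely for illustration:

```javascript
// Made-up sample data; the real list would be filled in by the page requests.
var sampleAuthors = [
    { name: "Author A", numberOfCourses: 3 },
    { name: "Author B", numberOfCourses: 12 },
    { name: "Author C", numberOfCourses: 7 }
];

// Comparator: a positive result sorts obj2 ahead of obj1, so larger counts come first.
function byCourseCountDescending(obj1, obj2) {
    return obj2.numberOfCourses - obj1.numberOfCourses;
}

// slice() copies the array so the original order is left untouched.
var sortedSample = sampleAuthors.slice().sort(byCourseCountDescending);
sortedSample.forEach(function (author) {
    console.log(author.name + ": " + author.numberOfCourses);
});
```

The hard part is knowing when the array is complete, which is the problem the rest of this post deals with.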

I was able to wait for all of the requests by using the jQuery Deferred() object. I create a new deferred object for each request and push it onto an array of deferreds. When a request completes, I signal completion by calling that deferred's resolve() method. Finally, I use $.when.apply() to pass the whole array to $.when(), so my descending sort runs only once all requests are complete. Here is my complete console command:

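// Collected results, plus one Deferred per outstanding request.
var authorList = [],
    defList = [];
$(".main h3 a").each(function() {
    var def = $.Deferred();
    defList.push(def);
    var authorName = $(this).text();
    var authorUrl = $(this).attr('href');
    // Fetch the author detail page and count the rows in its course table.
    // Note: if a request fails, this callback never runs and def is never resolved.
    $.get(authorUrl, function(data) {
        var courseCount = $(data).find("table.course tbody tr").length;
        authorList.push({ name: authorName, numberOfCourses: courseCount });
        def.resolve();
    });
});
// Runs only after every Deferred in defList has been resolved.
$.when.apply($, defList).then(function() {
    console.log("*Everything* is complete");
    // Sort descending by course count (sort() also reorders authorList in place).
    var sortedList = authorList.sort(function(obj1, obj2) {
        return obj2.numberOfCourses - obj1.numberOfCourses;
    });
    for (var i = 0; i < sortedList.length; i++) {
        console.log(sortedList[i]);
    }
});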

And here are the results:

WOW! John Sonmez has 44 courses!! And Matt Milner has 29! I guess Scott Allen isn't the only "dominating force". I would have assumed Scott Allen was #1, but he comes in at #3 in total course count (of course, Scott has 11 courses in the Top 50 and 14 in the Top 100, which is incredible!). Given that I'm in the middle of producing only my third course, I'd better get to work!