What Is “Most Popular?”

The right-hand column of this blog notes the five most popular posts. It uses the MostViewed Nucleus plug-in, which displays the items that have received the most hits. This method favors old posts, since the longer they are on the site, the more hits they get (mostly from Google). MostViewed uses the following SQL query to find the top five most popular items:

"SELECT i.inumber id, v.views views, i.ititle title ".
"FROM ".sql_table('plugin_views')." v, ".sql_table('item')." i ".
"WHERE v.id = i.inumber ".
"ORDER BY views DESC ".
"LIMIT 0, ".intval($numOfPostsToShow);

I wrote a new query to take the item’s age into account. This produces the top posts by calculating the rate of hits over time.

SELECT i.inumber id, v.views views, i.ititle title,
((UNIX_TIMESTAMP(NOW()) - UNIX_TIMESTAMP(i.itime))/v.views) AS pop
FROM nucleus_plugin_views v, nucleus_item i
WHERE v.id = i.inumber AND i.itime > '1970-01-01'
ORDER BY pop;

(Because of rounding, this calculates the ratio of the number of seconds the post has been online to the number of hits and then sorts ascending. The 1970 limit keeps drafts out of the results, as draft posts have a time of 0, which mySQL stores as 1969-12-31)

This produced the opposite effect, favoring recent posts. This is because old posts are found primarily through Google search results, whereas new posts are found through RSS subscriptions and home page views in addition to web searches. The rate of hits taper off once the item has disappeared from these two sources.

Another plugin, MostPopular, determines popularity by the number of comments rather than the number of page views. I don’t prefer this approach, but most readers are lurkers, and it doesn’t make a lot of sense to exclude most readers from a measure of popularity.

Periodically resetting the view counts appears to produce the best results, giving newer items a fair shot to rise to the top, but it requires manual intervention from time to time. Since the Views plugin stores hit totals, not individual hits, it’s not possible to count hits only since a specific date, or to exclude recent hits that are a result of post prominence.

If I really want to capture a more accurate measure of popularity (a questionable endeavor at best), then I should modify the Views plugin to log individual hits by date and then plot a frequency distribution of hits over time. I expect this would produce a bubble in the first couple of weeks of a post’s existence when the item is within the RSS feed and home page, and then taper off to a baseline popularity level based on search engine and link hits. Once the hits are stored by date, it would be possible to measure and correct for the bubble effect or implement a cutoff date in order to capture just the baseline popularity rate.

I didn’t know that mySQL had so many functions for performing calculations! That’s one unexpected benefit of this investigation.

Comments are closed.