Making your web applications search-engine friendly has always been important for scoring high in search results. More and more front-end applications are evolving towards SPAs (Single Page Applications), which are inherently difficult to crawl and can therefore hurt your search ranking. The dilemma between focusing on UX or on SEO is a false one, however. In this post we’ll look at how you can offer a dynamic, fast and user-friendly web application while still keeping search engine crawlers happy.
In The Beginning All Was Static
In the early days of the Internet, all HTML pages sent to clients were assembled server side. It was in that period that Google first came up with the idea for its search engine. Google’s crawler could extract content and structure by simply analysing static HTML pages. The HTML that crawlers consumed was identical to what users got to see in their browsers.
That changed when JavaScript and AJAX turned web pages into dynamic applications: the crawlers of search engine providers were lagging behind. For a fair share of public-facing websites there was a discrepancy between what the crawler saw and the end result in the user’s browser. In order not to undermine the relevance and completeness of its search engine, Google took the initiative in 2009 and published a ‘proposal for making AJAX crawlable’. The proposal describes an agreement between web servers serving dynamic web pages and the Google crawler. Shortly thereafter Google implemented the proposal, after which it became the de facto standard. Competing search engines such as Bing/Yahoo and DuckDuckGo have since started supporting it as well.
Google’s Offer You Can’t Refuse
According to Google’s AJAX crawling specification, there are two ways to tell crawlers they should activate their AJAX crawling mechanism when visiting your site:
- The use of hashbangs (#!) in URLs for client-side routing purposes. In the original HTML specification, the fragment identifier (#) was intended to point a user to a specific section of a web page. Because these fragments are never sent to the web server, modern front-end frameworks started (mis)using them as a means to store client-side state and enable back and forward navigation in browsers. Google states that whenever it encounters a URL containing a hashbang as fragment identifier, it will consider the page to be ‘AJAX crawlable’. The routing module of AngularJS, for example, allows developers to easily customize the hash prefix to meet that requirement.
- Including a special meta tag in the head of the HTML of your page: `<meta name="fragment" content="!">`
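For AngularJS 1.x, for instance, the hashbang prefix mentioned above is a one-line routing configuration. A minimal configuration sketch (the module name `myApp` is a placeholder, not from this article):

```javascript
// AngularJS 1.x configuration sketch: make client-side routes use the
// '#!' hashbang prefix so Google treats them as AJAX crawlable.
// 'myApp' is an illustrative module name.
angular.module('myApp', ['ngRoute'])
  .config(['$locationProvider', function ($locationProvider) {
    $locationProvider.hashPrefix('!');
  }]);
```

With this in place, a route such as /products renders under the URL fragment #!/products, which is exactly the shape the crawler looks for.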
Once the crawler encounters a web page that meets one of these requirements, it will transform the URL by adding the query parameter ‘_escaped_fragment_’ with the hash fragment as its value. For example, when you have a hashbang URL like this:

http://www.example.com/products#!category=shoes

The crawler will transform it to this:

http://www.example.com/products?_escaped_fragment_=category=shoes
Similarly, when you have a web page that contains the special meta tag:

http://www.example.com/about

Google’s crawler will convert it to the following:

http://www.example.com/about?_escaped_fragment_=
This strange-looking query parameter makes sure the complete URL is sent to the web server. According to the standard, a web server should respond to these kinds of requests with an HTML snapshot. That snapshot should be identical to the end result a user gets to see in their browser, but in pure HTML form. HTML is a crawler’s favourite dish, so it will happily index your snapshot and expose the clean URL in search results.
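The rewrite rule itself is mechanical enough to capture in a few lines. Below is a sketch in JavaScript of the transformation the crawler applies to a hashbang URL; the function names are my own, and only a handful of reserved characters (%, #, & and +) are escaped in the fragment, which is roughly what the specification prescribes:

```javascript
// Sketch of the crawler's URL rewrite (illustrative function names).
// A hashbang URL is mapped to its '_escaped_fragment_' equivalent by
// escaping a few reserved characters in the fragment and moving it
// into a query parameter.
function escapeFragment(fragment) {
  return fragment
    .replace(/%/g, '%25')   // escape '%' first to avoid double-escaping
    .replace(/#/g, '%23')
    .replace(/&/g, '%26')
    .replace(/\+/g, '%2B');
}

function toEscapedFragmentUrl(url) {
  const i = url.indexOf('#!');
  if (i === -1) return url; // no hashbang: only meta-tag pages get rewritten
  const base = url.slice(0, i);
  const separator = base.includes('?') ? '&' : '?';
  return base + separator + '_escaped_fragment_=' + escapeFragment(url.slice(i + 2));
}

console.log(toEscapedFragmentUrl('http://www.example.com/products#!category=shoes'));
// → http://www.example.com/products?_escaped_fragment_=category=shoes
```

Note that ‘=’ inside the fragment is left alone, while ‘&’ is escaped to %26, so the server can unambiguously recover the original fragment from the parameter value.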
When SEO is important to your business, you should definitely consider following Google’s crawling protocol. One way to do this would be to intercept requests containing the ‘_escaped_fragment_’ query parameter and serve a static, trimmed-down version of certain parts of your application in pure HTML. This approach, however, is a form of cloaking, which is considered bad practice in SEO land. It also adds overhead on the server side.
You could certainly write your own prerenderer using a headless browser emulator such as HtmlUnit or PhantomJS. Luckily, there are already quite a few companies offering this functionality as a service: BromBone, prerender.io, SnapSearch and seo4ajax. In general, they all use the same underlying mechanism. The prerenderer is typically registered at the middleware level and intercepts HTTP requests server side. When it detects that a request originates from a crawler (thanks to the ‘_escaped_fragment_’ parameter), it renders the requested page and returns the resulting HTML to the crawler. HTML snapshots are often cached, for example on Amazon S3, to ensure subsequent crawl attempts can be served quickly. Most of these services are free for a small number of web pages. Prerender.io has even open sourced its complete prerendering middleware, which allows you to run the whole infrastructure in-house if you so desire.
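Stripped to its essence, the detection step of such middleware is just the inverse of the crawler’s rewrite: recognize the ‘_escaped_fragment_’ parameter and reconstruct the original URL that should be fed to the headless browser. A minimal sketch (illustrative names, not any vendor’s actual API):

```javascript
// Sketch of a prerenderer's request detection (illustrative, not a real
// vendor API). Maps an '_escaped_fragment_' request URL back to the
// original client-side URL that a headless browser should render.
function unescapeFragment(value) {
  return value
    .replace(/%2B/g, '+')
    .replace(/%26/g, '&')
    .replace(/%23/g, '#')
    .replace(/%25/g, '%'); // unescape '%' last to avoid double-decoding
}

function toOriginalUrl(requestUrl) {
  const marker = '_escaped_fragment_=';
  const i = requestUrl.indexOf(marker);
  if (i === -1) return null; // not a crawler request: serve the SPA as usual
  const base = requestUrl.slice(0, i - 1); // drop the preceding '?' or '&'
  const value = requestUrl.slice(i + marker.length);
  // An empty value means the page used the meta tag; render the page itself.
  return value ? base + '#!' + unescapeFragment(value) : base;
}

console.log(toOriginalUrl('http://www.example.com/products?_escaped_fragment_=category=shoes'));
// → http://www.example.com/products#!category=shoes
```

A real implementation would additionally check the User-Agent, render the reconstructed URL in a headless browser, and serve a cached snapshot when one exists.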
For some sites, SEO can mean the difference between profit and failure. An investment in SEO should be taken into account upfront, even more so when your company’s next web application will be a SPA. Consider using specialized prerender middleware when you take SEO seriously. Crawlers are closing in on prerenderers, but they will need some more time before they can offer the same capabilities, especially when dealing with client-side routing.
In the end, it all boils down to optimizing the User Experience. That experience often starts in the Google Search Box.