Dynamic Sitemaps for a static React app

Written by
David Watts

Sitemaps are essential for informing search engines about what URLs are available for their bots to crawl. A new sitemap needs to be generated and reuploaded whenever a new page is published on a website.

This creates unique challenges for websites that rely on user-generated content and dynamic pages because new URLs are created continuously. For example, a recent React project at FullStack Labs required that each item a user creates is given a dynamically generated URL containing details about that item. For search engines to find these new URLs, we need to generate, reupload, and ping Google about a new sitemap whenever an item is created.

This article covers how to generate sitemaps for dynamic pages using Node.js, upload them to AWS S3, and ping Google nightly using Serverless. The repo for this project can be found at https://github.com/dwatts1772/serverless-lambda-sitemap-generator.

Getting Started

Clone the repository from GitHub, install Serverless globally, and install the Node packages:


git clone https://github.com/dwatts1772/serverless-lambda-sitemap-generator.git
cd serverless-lambda-sitemap-generator
npm install -g serverless
npm install

Next, name your service and select your AWS region in serverless.yml.

A Note About Sitemap.xml Limits

A sitemap.xml file cannot exceed 50MB (uncompressed) or 50,000 URLs, which means that a single, ever-growing sitemap.xml would eventually surpass these limits. However, since a sitemap index file can reference other sitemap files, we can generate one index file that references all of the child sitemap files. For example:


/* sitemap.xml.gz */
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://website.com/sitemap_1.xml.gz</loc>
    <lastmod>2019-08-22T21:03:06.832Z</lastmod>
  </sitemap>
</sitemapindex>

/* sitemap_1.xml.gz */
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
  xmlns:news="http://www.google.com/schemas/sitemap-news/0.9"
  xmlns:xhtml="http://www.w3.org/1999/xhtml"
  xmlns:mobile="http://www.google.com/schemas/sitemap-mobile/1.0"
  xmlns:image="http://www.google.com/schemas/sitemap-image/1.1"
  xmlns:video="http://www.google.com/schemas/sitemap-video/1.1">
  <url>
    <loc>https://website.com/url</loc>
    <lastmod>2019-03-11</lastmod>
    <changefreq>daily</changefreq>
    <priority>0.5</priority>
  </url>
  <url>
    <loc>https://website.com/url-2</loc>
    <lastmod>2019-03-11</lastmod>
    <changefreq>daily</changefreq>
    <priority>0.5</priority>
  </url>
</urlset>
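
To stay under the 50,000-URL limit, the list of items has to be split into groups before any child sitemap is written. The snippet below is a minimal sketch of that splitting step in plain JavaScript; the names chunkItems and MAX_URLS_PER_SITEMAP are illustrative and are not taken from the project repo.

```javascript
// Maximum number of URLs allowed in a single sitemap file.
const MAX_URLS_PER_SITEMAP = 50000;

// Split an array into consecutive chunks of at most `size` elements.
function chunkItems(items, size = MAX_URLS_PER_SITEMAP) {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError('size must be a positive integer');
  }
  const chunks = [];
  for (let start = 0; start < items.length; start += size) {
    chunks.push(items.slice(start, start + size));
  }
  return chunks;
}

// Example: 120,001 placeholder items need three child sitemaps.
const exampleItems = Array.from({ length: 120001 }, (_, i) => ({ id: i + 1 }));
const exampleChunks = chunkItems(exampleItems);
console.log(exampleChunks.map((c) => c.length)); // [ 50000, 50000, 20001 ]
```

Since the article's code already uses lodash, _.chunk(items, 50000) should give the same grouping.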

Generating Item URL Sitemaps

Before we can generate each sitemap, we need a list of all items from which to build the URLs. Since this process will be different for each project, the getArrayOfItems function requires custom code that pulls the data needed to generate each URL.


const items = getArrayOfItems();

Once you have your array of items, you will need to shape the data into the slug which will be appended to your URL.


const slug = '{YOUR_UNIQUE_SLUG_HERE}';
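
How the slug is shaped depends entirely on your data. As one possible sketch, assume each item has a numeric id and a free-text name (a hypothetical shape, not the one used in the original project) and that item pages live under /items/:

```javascript
// Hypothetical item shape: { id: number, name: string }.
// Returns a path segment that can be appended to the site's base path.
function buildSlug(item) {
  const namePart = String(item.name)
    .toLowerCase()
    .trim()
    .replace(/[^a-z0-9]+/g, '-') // turn each run of other characters into one hyphen
    .replace(/^-+|-+$/g, ''); // drop leading and trailing hyphens
  return `/items/${item.id}/${namePart}`;
}

console.log(buildSlug({ id: 42, name: '  Red Mountain Bike! ' }));
// /items/42/red-mountain-bike
```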

For each item in the current chunk, the slug is appended to the base path and pushed onto a new array of URLs.


_.each(chunk, function(item) {
  const type = 'customType';
  const slug = '{YOUR_UNIQUE_SLUG_HERE}';
  urls.push({
    url: basePath + slug,
    changefreq: 'daily',
    priority: 0.5,
    lastmod: date,
  });
});

This array is then passed to a third-party sitemap library, which transforms the entries into a sitemap file.


let sitemap = sm.createSitemap({
  hostname: siteURL,
  cacheTime: 600000, /* 600 sec (10 min) cache purge period */
  urls,
});

Finally, upload each sitemap to S3:


await upload({
  content: sitemap.toString(),
  filename: sitemapGeneratedPrefix + (index + 1) + '.xml.gz',
  siteMapBucket,
});
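
The file names end in .xml.gz, but the body of the upload helper is not shown here, so where the compression happens is not visible. The sketch below assumes the helper is responsible for gzipping and shows one way to prepare the upload parameters with Node's built-in zlib module. The function name buildGzipUploadParams and the bucket name are hypothetical; Bucket, Key, Body, and ContentType are the standard parameter names for an S3 object upload.

```javascript
const zlib = require('zlib');

// Build the parameter object for an S3 upload of a gzipped sitemap.
// Assumption: the caller passes the raw XML string, and compression happens here.
function buildGzipUploadParams({ content, filename, bucket }) {
  return {
    Bucket: bucket,
    Key: filename,
    Body: zlib.gzipSync(Buffer.from(content, 'utf8')),
    ContentType: 'application/gzip',
  };
}

const sampleXml = '<?xml version="1.0" encoding="UTF-8"?><urlset></urlset>';
const uploadParams = buildGzipUploadParams({
  content: sampleXml,
  filename: 'sitemap_1.xml.gz',
  bucket: 'my-sitemap-bucket', // placeholder bucket name
});

// Round trip: decompressing the body gives back the original XML.
console.log(zlib.gunzipSync(uploadParams.Body).toString('utf8') === sampleXml); // true
```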

Generate Master Index Sitemap

Use xmlbuilder to create the sitemap index file:


let root = builder  
  .create('sitemapindex', {encoding: 'UTF-8'})  
  .att('xmlns', 'http://www.sitemaps.org/schemas/sitemap/0.9');

Add each sitemap reference to the master sitemap file:


_.each(chunks, function(chunk, index) {
  let sitemap = root.ele('sitemap');
  sitemap.ele(
    'loc',
    siteURL + '/' + sitemapGeneratedPrefix + (index + 1) + '.xml.gz',
  );
  sitemap.ele('lastmod', new Date().toISOString());
});

Finally, upload the master sitemap file to S3:


let xmlString = root.end({
  pretty: true,
  indent: '  ',
  newline: '\n',
  allowEmpty: false,
});

await upload({content: xmlString, filename: sitemapIndex, siteMapBucket});
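
To make the shape of the resulting index file concrete, here is a dependency-free sketch that builds the same kind of document with template strings instead of xmlbuilder. It is not the project's implementation; buildSitemapIndex, escapeXml, and the argument names are made up for illustration.

```javascript
// Escape the characters that are not allowed verbatim in XML text nodes.
function escapeXml(text) {
  return text.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');
}

// Build a sitemap index that references `count` child sitemaps.
function buildSitemapIndex({ siteURL, prefix, count, lastmod }) {
  const entries = [];
  for (let index = 0; index < count; index += 1) {
    const loc = `${siteURL}/${prefix}${index + 1}.xml.gz`;
    entries.push(
      '  <sitemap>\n' +
        `    <loc>${escapeXml(loc)}</loc>\n` +
        `    <lastmod>${lastmod}</lastmod>\n` +
        '  </sitemap>'
    );
  }
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ...entries,
    '</sitemapindex>',
  ].join('\n');
}

const indexXml = buildSitemapIndex({
  siteURL: 'https://website.com',
  prefix: 'sitemap_',
  count: 2,
  lastmod: new Date().toISOString(),
});
console.log(indexXml);
```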

Ping Google

The final step is to ping Google about the uploaded sitemaps. Set at the root of your application using the AWS JS SDK in the upload function.


async function pingGoogle({siteURL, sitemapIndex}) {
  console.log('Pinging Google sitemap has been updated...');
  await when.promise(function(resolve, reject, notify) {
    rest
      .get('http://google.com/ping?sitemap=' + siteURL + '/' + sitemapIndex)
      .on('success', function(data, response) {
        console.log('Google Ping: ' + data);
        resolve();
      })
      .on('fail', function(data, response) {
        // Log the failure but resolve anyway so the rest of the job can finish.
        console.log('Google Ping Error:', data);
        resolve();
      });
  });
  console.log('Google pinged.');
}
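
The ping URL above is built by plain string concatenation. Because the sitemap address is itself a URL passed as a query parameter, it is safer to percent-encode it. A small sketch of that step, using a hypothetical buildPingUrl helper and the same ping endpoint as the function above:

```javascript
// Build the ping URL with the sitemap address percent-encoded.
function buildPingUrl({ siteURL, sitemapIndex }) {
  // Remove any trailing slashes so the joined URL has exactly one separator.
  const sitemapUrl = siteURL.replace(/\/+$/, '') + '/' + sitemapIndex;
  return 'http://google.com/ping?sitemap=' + encodeURIComponent(sitemapUrl);
}

console.log(buildPingUrl({ siteURL: 'https://website.com/', sitemapIndex: 'sitemap.xml.gz' }));
// http://google.com/ping?sitemap=https%3A%2F%2Fwebsite.com%2Fsitemap.xml.gz
```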

Using the Serverless Framework and Node.js, we can create an AWS Lambda function that runs nightly on a cron-style schedule. Serverless makes this easy: the schedule is declared directly in the serverless.yml file.


functions:
  sitemap:
    handler: handler.sitemap
    events:
      # - schedule: rate(24 hours)   # alternative: an equivalent rate expression
      - schedule: cron(0 12 * * ? *)

The configuration above declares which function to run and the schedule it runs on: cron(0 12 * * ? *) fires once a day at 12:00 UTC. An AWS CloudWatch rate expression such as rate(24 hours) would work as well in this scenario; we use cron because it is more widely known. See https://docs.aws.amazon.com/lambda/latest/dg/tutorial-scheduled-events-schedule-expressions.html for the expression syntax.
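
The function behind handler.sitemap ties the previous steps together. The sketch below shows one way that flow could be organized, with every step injected as a function so it can be exercised locally with stubs. All names here (runSitemapJob and its parameters) are assumptions for illustration and do not mirror the repo's actual handler.

```javascript
// Run the full job: fetch items, write one sitemap per chunk,
// write the index, then ping. Returns the child sitemap file names.
async function runSitemapJob({ getItems, buildSitemap, buildIndex, upload, ping, chunkSize = 50000 }) {
  const items = await getItems();
  const filenames = [];
  for (let start = 0; start < items.length; start += chunkSize) {
    const chunk = items.slice(start, start + chunkSize);
    const filename = `sitemap_${filenames.length + 1}.xml.gz`;
    await upload({ content: buildSitemap(chunk), filename });
    filenames.push(filename);
  }
  await upload({ content: buildIndex(filenames), filename: 'sitemap.xml.gz' });
  await ping();
  return filenames;
}

// Local dry run with stubs in place of the database, S3, and the ping request.
const uploaded = [];
let pingCount = 0;
const dryRun = runSitemapJob({
  getItems: async () => [1, 2, 3, 4, 5],
  buildSitemap: (chunk) => `urls:${chunk.join(',')}`,
  buildIndex: (names) => names.join('\n'),
  upload: async (file) => { uploaded.push(file.filename); },
  ping: async () => { pingCount += 1; },
  chunkSize: 2,
});
dryRun.then((names) => console.log(names, uploaded, pingCount));
```

Injecting the steps keeps the scheduled handler thin and lets the chunking and ordering logic be checked without touching AWS.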

Deploying and Testing

The package.json file contains a script, npm run start:offline, which runs a local test of the code and outputs the sitemap files for review.

Once everything is in place, run serverless deploy to upload all necessary files to AWS Lambda and let Serverless set everything up.

NOTE: Sometimes the VPC setup in AWS can be a bit difficult.
