Dynamic Sitemaps for a static React app

Written by David Watts

Sitemaps are essential for informing search engines about what URLs are available for their bots to crawl. A new sitemap needs to be generated and reuploaded whenever a new page is published on a website.


This creates unique challenges for websites that rely on user-generated content and dynamic pages, because new URLs are created continuously. For example, a recent React project at FullStack Labs required that each item a user creates be given a dynamically generated URL containing details about that item. For search engines to find these new URLs, we need to generate and re-upload a new sitemap, and ping Google about it, whenever an item is created.



This article covers how to generate sitemaps for dynamic pages using Node.js, upload them to AWS S3, and ping Google nightly using the Serverless Framework. The repo for this project is at https://github.com/dwatts1772/serverless-lambda-sitemap-generator.



Getting Started


Clone the repository from GitHub, install Serverless, and install the Node packages.

    
git clone https://github.com/dwatts1772/serverless-lambda-sitemap-generator.git

npm install -g serverless

npm install
    
  

Next, name your service and select your AWS region in serverless.yml.


A Note About Sitemap.xml Limits


A sitemap file cannot contain more than 50,000 URLs or be larger than 50 MB uncompressed, which means an ever-growing sitemap.xml would eventually surpass these limits. However, a sitemap index file can reference other sitemap files, so we can generate one index file that references every child sitemap file. For example:

    
// sitemap.xml.gz
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://website.com/sitemap_1.xml.gz</loc>
    <lastmod>2019-08-22T21:03:06.832Z</lastmod>
  </sitemap>
</sitemapindex>
// sitemap_1.xml.gz
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" 
    xmlns:news="http://www.google.com/schemas/sitemap-news/0.9" 
    xmlns:xhtml="http://www.w3.org/1999/xhtml" 
    xmlns:mobile="http://www.google.com/schemas/sitemap-mobile/1.0" 
    xmlns:image="http://www.google.com/schemas/sitemap-image/1.1" 
    xmlns:video="http://www.google.com/schemas/sitemap-video/1.1">
    <url>
        <loc>https://website.com/url</loc>
        <lastmod>2019-03-11</lastmod>
        <changefreq>daily</changefreq>
        <priority>0.5</priority>
    </url>
    <url>
        <loc>https://website.com/url-2</loc>
        <lastmod>2019-03-11</lastmod>
        <changefreq>daily</changefreq>
        <priority>0.5</priority>
    </url>
</urlset>
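
Because of these limits, the URL list has to be split into batches before any child sitemap is written. Below is a minimal sketch of that split in plain JavaScript, assuming the URLs are already collected in one array; `chunkUrls`, `childSitemapNames` and `MAX_URLS_PER_SITEMAP` are illustrative names, not part of the repo. (lodash's `_.chunk(array, size)` does the same job.)

```javascript
// Maximum number of <url> entries allowed in one sitemap file (sitemaps.org protocol).
const MAX_URLS_PER_SITEMAP = 50000;

// Hypothetical helper: split a flat array of URLs into batches, one batch per child sitemap.
function chunkUrls(urls, maxPerFile = MAX_URLS_PER_SITEMAP) {
  if (!Number.isInteger(maxPerFile) || maxPerFile < 1) {
    throw new RangeError('maxPerFile must be a positive integer');
  }
  const chunks = [];
  for (let start = 0; start < urls.length; start += maxPerFile) {
    chunks.push(urls.slice(start, start + maxPerFile));
  }
  return chunks;
}

// Hypothetical helper: child file names follow the sitemap_1.xml.gz, sitemap_2.xml.gz, ... pattern.
function childSitemapNames(chunkCount, prefix = 'sitemap_') {
  return Array.from({length: chunkCount}, (unused, i) => `${prefix}${i + 1}.xml.gz`);
}
```

With 120,001 URLs, `chunkUrls` returns three batches of 50,000, 50,000 and 20,001 URLs. If your URLs are long, a smaller batch size keeps each file under the 50 MB limit as well.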
    
  

Generating Item URL Sitemaps


Before we can generate each sitemap, we need a list of all items from which to build each URL. Since this process differs for each project, the getArrayOfItems function requires custom code that pulls the data needed to generate each URL.

    
const items = getArrayOfItems();
    
  

Once you have your array of items, shape each item's data into the slug that will be appended to your URL.

    
const slug = '{YOUR_UNIQUE_SLUG_HERE}';
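
How the slug is built depends entirely on your data. As one illustration, assuming each item has an `id` and a `title` (hypothetical field names) and that item pages live under a hypothetical `/items/` path, a URL-safe slug could be produced like this:

```javascript
// Turn arbitrary text into a lowercase, hyphen-separated, URL-safe string.
function slugify(text) {
  return String(text)
    .normalize('NFKD')               // split accented characters into base letter + diacritic
    .replace(/[\u0300-\u036f]/g, '') // drop the diacritics
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '-')     // any run of other characters becomes one hyphen
    .replace(/^-+|-+$/g, '');        // trim leading and trailing hyphens
}

// Hypothetical slug shape: `/items/<id>-<slugified title>`.
function buildSlug(item) {
  return `/items/${item.id}-${slugify(item.title)}`;
}
```

Including the `id` keeps each slug unique even when two items share the same title.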
    
  

Each slug is combined with the base path and added to a new array of URLs.

    
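const _ = require('lodash');

// `chunk` is one batch of `items`; `basePath` is the base URL for item pages.
const urls = [];
const date = new Date().toISOString(); // lastmod shared by every URL in this batch
_.each(chunk, function(item) {
  // Build the slug from this item's data.
  const slug = '{YOUR_UNIQUE_SLUG_HERE}';
  urls.push({
    url: basePath + slug,
    changefreq: 'daily',
    priority: 0.5,
    lastmod: date,
  });
});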
    
  

This array is then passed to the third-party sitemap library, which transforms each entry into a sitemap file.

    
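// createSitemap belongs to the pre-v6 API of the sitemap package.
const sm = require('sitemap');

const sitemap = sm.createSitemap({
  hostname: siteURL,
  cacheTime: 600000, // 600 sec (10 min) cache purge period
  urls,
});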
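
To make the transformation concrete, here is a dependency-free sketch that turns the same `urls` array into a `<urlset>` document like the `sitemap_1.xml.gz` example earlier. It illustrates the output format only and is not the sitemap library's implementation; `buildUrlset` and `escapeXml` are hypothetical names. One detail worth noting is that `<loc>` values must be XML-escaped, so `&` in a query string becomes `&amp;`.

```javascript
// Escape the five characters that are special in XML text.
function escapeXml(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');
}

// Hypothetical helper: build a <urlset> from entries shaped like the `urls` array
// ({url, changefreq, priority, lastmod}). Relative URLs are resolved against `hostname`.
function buildUrlset(urls, hostname) {
  const entries = urls.map(({url, changefreq, priority, lastmod}) => {
    const loc = new URL(url, hostname).href;
    const lines = ['  <url>', `    <loc>${escapeXml(loc)}</loc>`];
    if (lastmod) lines.push(`    <lastmod>${escapeXml(lastmod)}</lastmod>`);
    if (changefreq) lines.push(`    <changefreq>${escapeXml(changefreq)}</changefreq>`);
    if (priority !== undefined) lines.push(`    <priority>${escapeXml(priority)}</priority>`);
    lines.push('  </url>');
    return lines.join('\n');
  });
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ...entries,
    '</urlset>',
  ].join('\n');
}
```

Optional fields are only emitted when present, which keeps the output valid for entries that carry nothing but a URL.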
    
  

Finally, upload each sitemap to S3:

    
await upload({
  content: sitemap.toString(),
  filename: sitemapGeneratedPrefix + (index + 1) + '.xml.gz',
  siteMapBucket,
});
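
The `upload` helper itself lives in the repo. Note that `sitemap.toString()` returns plain XML, while the file names end in `.xml.gz`. If your helper does not already compress its input, here is a hedged sketch of one that gzips the XML before writing it, assuming the AWS SDK for JavaScript v2 (`new AWS.S3()` and `putObject(params).promise()`). `buildPutObjectParams` and `makeUpload` are hypothetical names, and the S3 client is passed in so the logic can be exercised without AWS.

```javascript
const zlib = require('zlib');

// Hypothetical: build S3 putObject parameters for one sitemap file.
// The XML string is gzip-compressed so a `.xml.gz` key really holds gzip data.
function buildPutObjectParams({content, filename, siteMapBucket}) {
  return {
    Bucket: siteMapBucket,
    Key: filename,
    Body: zlib.gzipSync(Buffer.from(content, 'utf8')),
    ContentType: 'application/x-gzip', // assumption: serve the object as a gzip file
  };
}

// Hypothetical factory: `s3` is an AWS SDK v2 S3 client (e.g. `new AWS.S3()`),
// or any object exposing the same putObject(params).promise() shape.
function makeUpload(s3) {
  return async function upload({content, filename, siteMapBucket}) {
    const params = buildPutObjectParams({content, filename, siteMapBucket});
    await s3.putObject(params).promise();
    return params.Key;
  };
}
```

In the Lambda, `makeUpload(new AWS.S3())` would produce an `upload` function with the same call signature used above.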
    
  

Generate Master Index Sitemap


Use xmlbuilder to create the sitemap index file:

    
const builder = require('xmlbuilder');

const root = builder
  .create('sitemapindex', {encoding: 'UTF-8'})
  .att('xmlns', 'http://www.sitemaps.org/schemas/sitemap/0.9');
    
  

Add each sitemap reference to the master sitemap file:

    
_.each(chunks, function(chunk, index) {
  let sitemap = root.ele('sitemap');
  sitemap.ele(
    'loc',
    siteURL + '/' + sitemapGeneratedPrefix + (index + 1) + '.xml.gz',
  );
  sitemap.ele('lastmod', new Date().toISOString());
});
    
  

Finally, upload the master sitemap file to S3:

    
let xmlString = root.end({
  pretty: true,
  indent: '  ',
  newline: '\n',
  allowEmpty: false,
});

await upload({content: xmlString, filename: sitemapIndex, siteMapBucket});
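
The index is simple enough to build without a library. As a dependency-free sketch (not the repo's code), assuming the same `siteURL` and `sitemapGeneratedPrefix` values and one child sitemap per chunk, the equivalent XML string could be produced like this; `buildSitemapIndex` is a hypothetical name:

```javascript
// Hypothetical helper mirroring the xmlbuilder code: one <sitemap> entry per child file.
// Assumes siteURL and the prefix contain no characters that need XML escaping.
function buildSitemapIndex({siteURL, sitemapGeneratedPrefix, chunkCount, lastmod = new Date().toISOString()}) {
  const entries = [];
  for (let index = 0; index < chunkCount; index++) {
    const loc = `${siteURL}/${sitemapGeneratedPrefix}${index + 1}.xml.gz`;
    entries.push(
      '  <sitemap>',
      `    <loc>${loc}</loc>`,
      `    <lastmod>${lastmod}</lastmod>`,
      '  </sitemap>',
    );
  }
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ...entries,
    '</sitemapindex>',
  ].join('\n');
}
```

Passing a single `lastmod` stamps every entry with the run time, much like the `new Date().toISOString()` call inside the xmlbuilder loop.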
    
  

Ping Google


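The final step is to ping Google about the uploaded sitemaps. (The S3 uploads in the previous steps are handled by the upload function, which uses the AWS JS SDK set up at the root of your application.) Note that Google has since deprecated this ping endpoint (announced in June 2023); listing the sitemap index in robots.txt or submitting it through Search Console are the documented alternatives.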

    
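const when = require('when');
const rest = require('restler');

async function pingGoogle({siteURL, sitemapIndex}) {
  console.log('Pinging Google: sitemap has been updated...');
  const sitemapUrl = siteURL + '/' + sitemapIndex;
  await when.promise(function(resolve) {
    rest
      // The sitemap URL is a query parameter, so it must be URL-encoded.
      .get('https://www.google.com/ping?sitemap=' + encodeURIComponent(sitemapUrl))
      .on('success', function(data) {
        console.log('Google Ping: ' + data);
        resolve();
      })
      .on('fail', function(data) {
        // HTTP error status: log it, but don't fail the whole job.
        console.log('Google Ping Error:', data);
        resolve();
      })
      .on('error', function(err) {
        // Network-level error: without this handler the promise would never settle.
        console.log('Google Ping Error:', err);
        resolve();
      });
  });
  console.log('Google pinged.');
}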
    
  

Using the Serverless Framework and Node.js, we can create an AWS Lambda function that runs nightly on a scheduled "cron" event (a CloudWatch Events rule that Serverless creates to trigger the function). Serverless makes this easy: just declare the schedule in the serverless.yml file.

    
functions:
  sitemap:
    handler: handler.sitemap
    events:
      - schedule: cron(0 12 * * ? *) # every day at 12:00 UTC
    
  

In the code above, we declare which function to run and the cron schedule it runs on: once every 24 hours, at 12:00 UTC.

You could also use an AWS CloudWatch rate expression here, such as rate(24 hours); just don't declare both, or the function will run twice a day. We use a cron expression because its syntax is more widely known. See https://docs.aws.amazon.com/lambda/latest/dg/tutorial-scheduled-events-schedule-expressions.html
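
Putting the steps together, the scheduled function referenced as `handler.sitemap` has to fetch the items, build and upload one sitemap per chunk, upload the index, and ping Google. The repo has its own handler; the following is only a sketch of that flow, with each step injected as a function (`getItems`, `buildChildSitemap`, `buildIndex`, `upload` and `ping` are hypothetical names) so the orchestration can be run and checked without AWS.

```javascript
// Hypothetical factory for the scheduled Lambda handler. Injected dependencies:
//   getItems()                                  -> Promise<Array> of items to index
//   buildChildSitemap(chunk, index)             -> XML string for one child sitemap
//   buildIndex(chunkCount)                      -> XML string for the sitemap index
//   upload({content, filename, siteMapBucket})  -> Promise
//   ping()                                      -> Promise
function createSitemapHandler({getItems, buildChildSitemap, buildIndex, upload, ping, config}) {
  const {siteMapBucket, sitemapGeneratedPrefix, sitemapIndex, maxPerFile = 50000} = config;
  return async function sitemap() {
    const items = await getItems();
    const chunks = [];
    for (let start = 0; start < items.length; start += maxPerFile) {
      chunks.push(items.slice(start, start + maxPerFile));
    }
    // Upload child sitemaps one at a time, numbered from 1 to match the index.
    for (let index = 0; index < chunks.length; index++) {
      await upload({
        content: buildChildSitemap(chunks[index], index),
        filename: sitemapGeneratedPrefix + (index + 1) + '.xml.gz',
        siteMapBucket,
      });
    }
    // Upload the index last so it never points at a child file that doesn't exist yet.
    await upload({content: buildIndex(chunks.length), filename: sitemapIndex, siteMapBucket});
    await ping();
    return {sitemaps: chunks.length};
  };
}
```

In `handler.js`, this could be exported as `module.exports.sitemap = createSitemapHandler({...})`, with `ping` set to `() => pingGoogle({siteURL, sitemapIndex})` and the other steps wired to the real implementations.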



Deploying and Testing


The package.json file contains a script, npm run start:offline, which runs the code locally and outputs the sitemap files for review.

Once everything is in place, run serverless deploy to upload all the necessary files to AWS Lambda and let Serverless set everything up.

NOTE: The VPC setup in AWS can sometimes be a bit difficult.


