FullStack Labs

Dynamic Sitemaps for a static React app

Written by David Watts, Software Architect

Sitemaps tell search engines which URLs are available for their bots to crawl. A new sitemap needs to be generated and reuploaded whenever a new page is published on a website.

This creates unique challenges for websites that rely on user-generated content and dynamic pages because new URLs are created continuously. For example, a recent React project at FullStack Labs required that each item a user creates is given a dynamically generated URL containing details about that item. For search engines to find these new URLs, we need to generate, reupload, and ping Google about a new sitemap whenever an item is created.

This article covers how to generate sitemaps for dynamic pages using Node.js, upload them to AWS S3, and ping Google nightly using Serverless. The repo for this project is on GitHub: https://github.com/dwatts1772/serverless-lambda-sitemap-generator

Getting Started

Clone the repository from GitHub, install Serverless, and install the Node packages.

-- CODE language-jsx keep-markup --
git clone https://github.com/dwatts1772/serverless-lambda-sitemap-generator.git
cd serverless-lambda-sitemap-generator
npm install -g serverless
npm install

Next, name your service and select your AWS region in serverless.yml.

A Note About Sitemap.xml Limits

A single sitemap file cannot exceed 50MB (uncompressed) or 50,000 URLs, so an ever-growing sitemap.xml would eventually surpass these limits. However, since a sitemap index file can reference other sitemap files, we can generate one index file that references all of the child sitemap files. For example:

-- CODE language-jsx keep-markup --
/* sitemap.xml.gz */
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://website.com/sitemap_1.xml.gz</loc>
    <lastmod>2019-08-22T21:03:06.832Z</lastmod>
  </sitemap>
</sitemapindex>

/* sitemap_1.xml.gz */
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
  xmlns:news="http://www.google.com/schemas/sitemap-news/0.9"
  xmlns:xhtml="http://www.w3.org/1999/xhtml"
  xmlns:mobile="http://www.google.com/schemas/sitemap-mobile/1.0"
  xmlns:image="http://www.google.com/schemas/sitemap-image/1.1"
  xmlns:video="http://www.google.com/schemas/sitemap-video/1.1">
  <url>
    <loc>https://website.com/url</loc>
    <lastmod>2019-03-11</lastmod>
    <changefreq>daily</changefreq>
    <priority>0.5</priority>
  </url>
  <url>
    <loc>https://website.com/url-2</loc>
    <lastmod>2019-03-11</lastmod>
    <changefreq>daily</changefreq>
    <priority>0.5</priority>
  </url>
</urlset>
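The later snippets iterate over `chunks`, but the article does not show how the item list is split. As a minimal sketch, assuming the split is a plain fixed-size grouping, the helper below breaks an array into groups of at most 50,000 entries so that each group fits in one child sitemap. The names `chunkItems` and `MAX_URLS_PER_SITEMAP` are hypothetical; lodash's `_.chunk(array, size)` would do the same job.

```javascript
// Maximum number of URLs allowed in a single sitemap file.
const MAX_URLS_PER_SITEMAP = 50000;

// Split an array into consecutive groups of at most `size` elements.
// Hypothetical helper, not taken from the repository.
function chunkItems(items, size = MAX_URLS_PER_SITEMAP) {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError('size must be a positive integer');
  }
  const chunks = [];
  for (let start = 0; start < items.length; start += size) {
    chunks.push(items.slice(start, start + size));
  }
  return chunks;
}

// Made-up data: 120,001 items need three child sitemaps.
const exampleItems = Array.from({ length: 120001 }, (_, i) => ({ id: i + 1 }));
const exampleChunks = chunkItems(exampleItems);
console.log(exampleChunks.map((chunk) => chunk.length));
// → [ 50000, 50000, 20001 ]
```

Each element of the result then becomes one `sitemap_N.xml.gz` file, and the number of chunks determines how many entries the index file needs.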

Generating Item URL Sitemaps

Before we can generate each sitemap, we need a list of all items so that we can build a URL for each one. Since this process will be different for each project, the getArrayOfItems function requires custom code that pulls the data needed to generate each URL.

-- CODE language-jsx keep-markup --
const items = getArrayOfItems();

Once you have your array of items, you will need to shape the data into the slug which will be appended to your URL.

-- CODE language-jsx keep-markup --
const slug = '{YOUR_UNIQUE_SLUG_HERE}';
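How the slug is built depends entirely on your data. As one possible sketch, assuming each item has an `id` and a `name` (a hypothetical shape, as are the `slugify` and `buildSlug` helpers and the `/items/` path prefix), the slug could combine the ID with a URL-safe version of the name:

```javascript
// Turn free text into a lowercase, hyphen-separated URL segment.
// Hypothetical helper for illustration only.
function slugify(text) {
  return String(text)
    .normalize('NFKD')                // split accented letters into base + mark
    .replace(/[\u0300-\u036f]/g, '')  // drop the combining marks
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '-')      // collapse every other run into one hyphen
    .replace(/^-+|-+$/g, '');         // trim leading and trailing hyphens
}

// Assumed item shape: { id, name }. Adjust to match your own data.
function buildSlug(item) {
  return `/items/${item.id}-${slugify(item.name)}`;
}

console.log(buildSlug({ id: 42, name: 'Blue Café Table (Large)' }));
// → /items/42-blue-cafe-table-large
```

Including the ID keeps the slug unique even when two items share a name.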

The slug will be added to a new array of URLs.

-- CODE language-jsx keep-markup --
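// `chunk` is one group of items; `urls`, `basePath`, and `date` are
// assumed to be defined by the surrounding code.
_.each(chunk, function(item) {
  const type = 'customType';
  const slug = '{YOUR_UNIQUE_SLUG_HERE}'; // build this from `item`
  urls.push({
    url: basePath + slug,
    changefreq: 'daily',
    priority: 0.5,
    lastmod: date,
  });
});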

This array is then passed to a third-party sitemap library, which transforms the entries into a sitemap file.

-- CODE language-jsx keep-markup --
let sitemap = sm.createSitemap({  
  hostname: siteURL,  
  cacheTime: 600000, /* 600 sec (10 min) cache purge period */  
  urls,
});

Finally, upload each sitemap to S3:

-- CODE language-jsx keep-markup --
await upload({  
  content: sitemap.toString(),  
  filename: sitemapGeneratedPrefix + (index + 1) + '.xml.gz',  
  siteMapBucket,
});
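The `upload` function itself is not shown here. Since the filenames end in `.xml.gz`, the sketch below assumes the content should be gzip-compressed before it reaches S3, and builds the parameter object that an S3 `putObject` call would take. The helper name `buildSitemapUploadParams` and the bucket name are hypothetical; the repository's own `upload` may work differently.

```javascript
const zlib = require('zlib');

// Build S3 upload parameters for a gzip-compressed sitemap.
// Assumption: the returned object is later handed to an S3 client
// (for example, a putObject call), which is omitted from this sketch.
function buildSitemapUploadParams({ content, filename, siteMapBucket }) {
  return {
    Bucket: siteMapBucket,
    Key: filename,
    Body: zlib.gzipSync(Buffer.from(content, 'utf8')),
    ContentType: 'application/gzip',
  };
}

// Made-up values for illustration.
const exampleParams = buildSitemapUploadParams({
  content: '<urlset></urlset>',
  filename: 'sitemap_1.xml.gz',
  siteMapBucket: 'example-bucket',
});
console.log(exampleParams.Key, exampleParams.Body.length + ' bytes');
```

Keeping the parameter construction separate from the network call makes it easy to check locally that the compressed body round-trips before anything is sent to AWS.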

Generate Master Index Sitemap

Use xmlbuilder to create the sitemap index file:

-- CODE language-jsx keep-markup --
let root = builder  
  .create('sitemapindex', {encoding: 'UTF-8'})  
  .att('xmlns', 'http://www.sitemaps.org/schemas/sitemap/0.9');

Add each sitemap reference to the master sitemap file:

-- CODE language-jsx keep-markup --
_.each(chunks, function(chunk, index) {  
  let sitemap = root.ele('sitemap');  
  sitemap.ele(    
    'loc',    
    siteURL + '/' + sitemapGeneratedPrefix + (index + 1) + '.xml.gz',  
  );  
  sitemap.ele('lastmod', new Date().toISOString());
});

Finally, upload the master sitemap file to S3:

-- CODE language-jsx keep-markup --
let xmlString = root.end({
  pretty: true,
  indent: '  ',
  newline: '\n',
  allowEmpty: false,
});

await upload({content: xmlString, filename: sitemapIndex, siteMapBucket});

Ping Google

The final step is to ping Google about the uploaded sitemaps, which the upload function placed at the root of your application using the AWS JS SDK.

-- CODE language-jsx keep-markup --
async function pingGoogle({siteURL, sitemapIndex}) {
  console.log('Pinging Google sitemap has been updated...');
  await when.promise(function(resolve, reject, notify) {
    rest
      .get('http://google.com/ping?sitemap=' + siteURL + '/' + sitemapIndex)
      .on('success', function(data, response) {
        console.log('Google Ping: ' + data);
        resolve();
      })
      .on('fail', function(data, response) {
        console.log('Google Ping Error:', data);
        // Resolve rather than reject so a failed ping does not fail the whole run.
        resolve();
      });
  });
  console.log('Google pinged.');
}
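The ping URL carries the sitemap address as a query parameter, so it is safer to percent-encode that address than to concatenate it raw. A minimal sketch, reusing the endpoint from the snippet above and a hypothetical helper name `buildPingUrl`:

```javascript
// Build the ping URL with the sitemap address percent-encoded, so that
// characters such as ':' '/' '?' and '&' stay inside the single
// `sitemap` query parameter. Hypothetical helper for illustration.
function buildPingUrl({ siteURL, sitemapIndex }) {
  // Avoid a double slash when siteURL already ends with '/'.
  const sitemapUrl = siteURL.replace(/\/+$/, '') + '/' + sitemapIndex;
  return 'http://google.com/ping?sitemap=' + encodeURIComponent(sitemapUrl);
}

console.log(
  buildPingUrl({ siteURL: 'https://website.com', sitemapIndex: 'sitemap.xml.gz' })
);
// → http://google.com/ping?sitemap=https%3A%2F%2Fwebsite.com%2Fsitemap.xml.gz
```

The returned string can be passed to whichever HTTP client performs the GET request.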

Using the Serverless Framework and Node.js, we can create an AWS Lambda function that runs nightly on a schedule. Serverless makes this easy: just declare the schedule in the serverless.yml file.

-- CODE language-jsx keep-markup --
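functions:
  sitemap:
    handler: handler.sitemap
    events:
      # Run once a day at 12:00 UTC.
      - schedule: cron(0 12 * * ? *)
      # Alternative rate expression (use one or the other, not both,
      # or the function will run twice a day):
      # - schedule: rate(24 hours)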

In the code above, we declare which function to run and the cron schedule it should run on (once every 24 hours, at 12:00 UTC). You could use an AWS CloudWatch rate expression instead; we use cron because it is more widely known. See https://docs.aws.amazon.com/lambda/latest/dg/tutorial-scheduled-events-schedule-expressions.html
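The `handler.sitemap` entry points at an exported function that ties the earlier steps together. The sketch below shows one way that flow could be wired, with every step injected so it can be exercised locally. The factory `createSitemapHandler` and the step names `generateChildSitemaps` and `generateIndexSitemap` are hypothetical; the repository's handler may be organized differently.

```javascript
// Build a Lambda-style handler from the individual steps.
// Steps are injected so the order of operations can be tested with stubs.
function createSitemapHandler({
  getArrayOfItems,
  generateChildSitemaps,
  generateIndexSitemap,
  pingGoogle,
}) {
  return async function sitemap() {
    const items = await getArrayOfItems();                 // 1. load items
    const childCount = await generateChildSitemaps(items); // 2. build and upload children
    await generateIndexSitemap(childCount);                // 3. build and upload the index
    await pingGoogle();                                    // 4. notify Google
    return { items: items.length, sitemaps: childCount };
  };
}

// Stubbed example run; real steps would talk to your data store and S3.
const exampleHandler = createSitemapHandler({
  getArrayOfItems: async () => [{ id: 1 }, { id: 2 }],
  generateChildSitemaps: async () => 1,
  generateIndexSitemap: async () => {},
  pingGoogle: async () => {},
});
exampleHandler().then((summary) => console.log(summary));
```

In handler.js this would be exported as `module.exports.sitemap` so that it matches the `handler: handler.sitemap` entry in serverless.yml.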

Deploying and Testing

The package.json file contains a script, npm run start:offline, which runs a local test of the code and outputs the sitemap files for review.

Once everything is in place, run serverless deploy to upload all necessary files to AWS Lambda and let Serverless set everything up.

NOTE: VPC setup in AWS can sometimes be a bit difficult.

---

Written by David Watts

As a Senior Software Engineer at FullStack Labs, I have a demonstrated history of delivering a variety of complex custom software solutions. I have extensive experience building high-performance, scalable applications and deploying enterprise-level systems across diverse industries, with an emphasis on user functionality and satisfaction. I also have an in-depth background in creating and architecting large and small data collection services.
