If you are working on an application that requires you to store large files locally, you may run into some challenges. Maybe you’re working with big data and want to give your client the ability to work offline. Or maybe you need to store a large number of media files, such as images, videos, music, etc. The most obvious approach would be to include these dependencies along with your application’s download and installation.
Then, what if you need to update some of the dependencies? What if your app is hosted on the web?
Several years back, I worked on a trading card game (TCG) and ran into this exact problem. Our software was written in C#, and we used the WebClient class for our HTTP downloads. At first, we went with the simplest solution, which seemed almost obvious. The client starts by asking the server for a list of what it needs to download. The server returns a list with each file's download link, local path, size, hash, etc. The client then runs through that list, comparing each entry's hash with the hash of the matching local file, and marks for download both the mismatches and the files that don't exist locally. Then it proceeds to download.
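The comparison step described above can be sketched roughly as follows. This is a minimal illustration in Python rather than the C# we actually used, and the manifest field names ("path", "sha256") and the choice of SHA-256 are assumptions made for the example, not the original format.

```python
import hashlib
from pathlib import Path


def file_sha256(path: Path) -> str:
    """Hash a local file in chunks so large files don't need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def files_to_download(manifest: list[dict], root: Path) -> list[dict]:
    """Return manifest entries that are missing locally or whose hash differs.

    Each entry is assumed (hypothetically) to look like
    {"path": "cards/001.png", "sha256": "..."}.
    """
    pending = []
    for entry in manifest:
        local = root / entry["path"]
        if not local.is_file() or file_sha256(local) != entry["sha256"]:
            pending.append(entry)
    return pending
```

Hashing every local file on every launch has its own cost, so a real client might cache hashes alongside file sizes and modification times.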
Files are downloaded one by one. Our TCG game had over 1,500 media files at the point where we started running into speed issues, and they were about double their regular sizes due to data encryption. Downloading 1,500 files one by one is problematic, even if those files are less than 1 MB each.
So my first update to the system was to make it download up to 6 files simultaneously, ideally cutting the download time a great deal. One thing to note here is that this may require configuration changes on whatever class/component you are using for your downloads. The right number of simultaneous downloads depends on your needs, so treat 6 as just an example from the TCG’s use case at the time.
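One way to cap the number of simultaneous downloads is a fixed-size worker pool. The sketch below is an illustration in Python, not our original C# code. The fetch callable is a hypothetical placeholder for whatever actually downloads one file, which also keeps the example independent of any particular HTTP library.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable

MAX_PARALLEL_DOWNLOADS = 6  # the TCG's value at the time; tune for your own case


def download_all(
    entries: Iterable[dict],
    fetch: Callable[[dict], None],
    max_parallel: int = MAX_PARALLEL_DOWNLOADS,
) -> list[dict]:
    """Run `fetch` for every entry with at most `max_parallel` in flight.

    Returns the entries whose fetch raised, so the caller can retry them.
    """
    entries = list(entries)
    failed = []
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        futures = [(entry, pool.submit(fetch, entry)) for entry in entries]
        for entry, future in futures:
            # exception() waits for the task to finish, then reports any error.
            if future.exception() is not None:
                failed.append(entry)
    return failed
```

Returning the failed entries instead of raising on the first error lets the remaining downloads finish before anything is retried.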
There's no worse feeling than rolling out an update and your users getting stuck waiting on a tedious update, painfully sitting through a slow download.
Downloading more than one file at a time will increase your speed, but if you have over 1,000 files, and especially if that list includes small files, your download client may never reach full speed. You can host these files on a server with upload speeds of up to 1 GB per second and download them from a client with download speeds of up to 1 GB per second, but this does not mean that six 160 MB files will download instantly. When each download starts, a few things need to happen before data flows at full rate: a connection typically has to be set up, the request has to be sent and answered, and the download speed needs to ramp up to full speed.
That’s right, "the download speed needs to ramp up to full speed," and I believe that’s what someone new to building something like this may overlook entirely. Downloading about 10,000 1 KB files may take some time, simply because that startup process has to happen for each individual file in a file-by-file system.
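A back-of-the-envelope model shows why the per-file startup cost dominates for small files. The numbers below (100 MB/s of bandwidth, 0.1 seconds of overhead per file) are assumptions picked purely for illustration, not measurements from our system. The model is also deliberately crude: it treats overhead as a fixed delay per request that parallelizes perfectly, and it ignores compression.

```python
def estimated_seconds(
    file_count: int,
    total_bytes: int,
    bandwidth_bytes_per_s: float,
    per_file_overhead_s: float,
    parallel: int = 1,
) -> float:
    """Crude estimate: pure transfer time plus a fixed startup cost per request."""
    transfer = total_bytes / bandwidth_bytes_per_s
    overhead = file_count * per_file_overhead_s / parallel
    return transfer + overhead


# Hypothetical figures, chosen only to illustrate the effect.
BANDWIDTH = 100_000_000   # 100 MB/s
OVERHEAD = 0.1            # seconds of setup and ramp-up per request
TOTAL = 10_000 * 1_000    # 10,000 files of 1 KB each

file_by_file = estimated_seconds(10_000, TOTAL, BANDWIDTH, OVERHEAD)
six_at_a_time = estimated_seconds(10_000, TOTAL, BANDWIDTH, OVERHEAD, parallel=6)
one_bundle = estimated_seconds(1, TOTAL, BANDWIDTH, OVERHEAD)

print(f"one by one:    {file_by_file:8.1f} s")
print(f"six at a time: {six_at_a_time:8.1f} s")
print(f"single bundle: {one_bundle:8.1f} s")
```

Under these assumptions the transfer itself takes a fraction of a second in every case, and almost all of the estimated time in the file-by-file runs is startup overhead.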
Introducing file bundles. This isn’t reinventing the wheel or anything, just a little bit of combining existing technology to work together and help us achieve our goals. Instead of having a direct file-by-file download, your publishing logic can and should compress each individual file, as well as organize these files into different directories. For our TCG, I kept each directory at 500MB max, unless an individual file exceeded that, in which case it was the only file in that directory.
Publishing systems will vary, and there are many ways to accomplish this, but in my specific TCG example, we had logic in place to know what our most recent directory was, and its current size. We published files individually, so when a new file was published, we checked if it was eligible to join the last directory without exceeding our size limit; otherwise, we would spin off a new directory and go from there. Other newly published files could then join the party until it reaches the limit again, and so on.
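The "join the last directory or spin off a new one" rule can be sketched like this. It is a simplified Python illustration, not our production code: the BundleDir class and the publish function are hypothetical names, and the sketch only handles newly published files, not updates to files that already live in an older directory.

```python
from dataclasses import dataclass, field

MAX_BUNDLE_BYTES = 500 * 1024 * 1024  # the 500 MB cap used for the TCG


@dataclass
class BundleDir:
    index: int
    files: dict[str, int] = field(default_factory=dict)  # path -> size in bytes

    @property
    def size(self) -> int:
        return sum(self.files.values())


def publish(
    bundles: list[BundleDir], path: str, size: int, limit: int = MAX_BUNDLE_BYTES
) -> BundleDir:
    """Put a new file in the most recent directory if it fits, else start a new one.

    A file larger than `limit` always lands in a fresh directory, and the next
    published file starts another one, so the oversized file stays alone.
    """
    last = bundles[-1] if bundles else None
    if last is None or last.size + size > limit:
        last = BundleDir(index=len(bundles))
        bundles.append(last)
    last.files[path] = size
    return last
```

The returned directory is the one that changed, which is the one that then needs to be re-archived.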
After each publication, the directory that’s just been updated would then get archived into a file bundle. The file list system mentioned above is still used, but now we also include a reference to which bundle it belongs. When your client logic cross-checks for missing files that it needs to download, we add logic to determine if it would be faster to download the file individually or download the bundle instead based on what the client is missing and all of the configuration mentioned above.
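The "individual file or whole bundle" decision can be approximated by comparing estimated costs per bundle. The sketch below is one possible heuristic in Python, not the exact logic we shipped. The bandwidth and per-request overhead parameters are assumed inputs that a real client would have to measure or configure, and the tuple layout of the missing list is hypothetical.

```python
from collections import defaultdict


def plan_downloads(missing, bundle_sizes, bandwidth_bytes_per_s, per_request_overhead_s):
    """Choose, per bundle, between fetching the whole bundle or only the missing files.

    `missing` is a list of (path, size_bytes, bundle_id) tuples and
    `bundle_sizes` maps bundle_id -> archive size in bytes.
    Returns (bundle_ids_to_fetch, paths_to_fetch_individually).
    """
    by_bundle = defaultdict(list)
    for path, size, bundle_id in missing:
        by_bundle[bundle_id].append((path, size))

    bundles, singles = [], []
    for bundle_id, files in by_bundle.items():
        # Every individual download pays the startup overhead once.
        individual_cost = sum(
            per_request_overhead_s + size / bandwidth_bytes_per_s for _, size in files
        )
        # The bundle pays it once, but may include files the client already has.
        bundle_cost = (
            per_request_overhead_s + bundle_sizes[bundle_id] / bandwidth_bytes_per_s
        )
        if bundle_cost < individual_cost:
            bundles.append(bundle_id)
        else:
            singles.extend(path for path, _ in files)
    return bundles, singles
```

A client missing most of a bundle ends up fetching the archive, while a client missing one small file fetches just that file.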
Now you have a system that lets you maximize your download speed. As long as your bundle size is configured to complement your client/server max capacity, downloading a file bundle allows your client to ramp up to, and remain at, its max speed potential until all of those files are downloaded.
In the case of our TCG, with about 2 GB of media data, it took our system about 2 hours to do a fresh install (remembering that this was several years ago). This solution reduced that download time to under 3 minutes, with plenty of room for improvement via configuration.
In the end, the key takeaways are that if you want max speed, you need to allow your client to reach and stay at max speed for as long as it makes sense, while at the same time managing your data distribution so clients are only downloading the files they need.
The most challenging part to build is your bundle management system, which needs to edit and create new bundles on the fly as files are updated. The end result allows your application’s end-users to download updates at the full speed potential of the client/server internet connection.