Static website converter?

Jun 21, 2020

Does anyone know a good tool that can convert a dynamic website into a static website?

The Thimbleweed Park blog was built using PHP and a MongoDB database and it's quite complex. The website won't ever change and I'd like to turn it into a static site so I can host the files somewhere else and shut down the server.

Ideally it would be a tool that you just point at the url and it scrapes the site, producing a bunch of .html files.

Does such a thing exist?


Darren Jun 21, 2020
I'd give HTTrack a try, it's worked well for me in the past.

Misel Jun 21, 2020
What about the good old

wget --recursive --no-parent <url>

Ron Gilbert Jun 21, 2020
wget pulls down the site, but it doesn't fix up the links, so the result isn't navigable as a site in a browser. The downloaded "files" don't end in .html and the links that point to them don't get .html added.

Chris Armstrong Jun 21, 2020
Have you looked into adding a REST API to turn the existing site into a ‘headless CMS’, and then using a static site generator like Gatsby (https://github.com/gatsbyjs/gatsby/) or Jekyll (https://github.com/contentful/jekyll-contentful-data-import) to generate the HTML pages?

Ron Gilbert Jun 21, 2020
That sounds like a lot of work. I just want to get all the html files and host them statically.  If it takes me more than an hour, it's not worth my time.

Daniel Lewis Jun 21, 2020
HTTrack

Darius Jun 21, 2020
Maybe wget with the HTTP option "--adjust-extension" could do the trick?
http://www.gnu.org/software/wget/manual/wget.html#HTTP-Options

Maxime Priv Jun 21, 2020
I used SiteSucker in the past for this. I think it will do what you need (if you're on a Mac). You can try the free version on their website ;)

https://ricks-apps.com/osx/sitesucker/index.html

Jon B Jun 21, 2020
I'd second the recommendation for Gatsby. It might take a bit over an hour, but it has a wonderful model for pulling dynamic sources into structured data and then formatting it into static output. Tons of plugins on the source side to pull from just about anything. I haven't personally used a Mongo source, but I see that there's a first-party source plugin for it:

https://www.gatsbyjs.org/packages/gatsby-source-mongodb/
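
If you wanted to try it, the rough workflow would be something like this (untested against this site, and you'd still need to point gatsby-source-mongodb at the blog's database in gatsby-config.js):

npm install gatsby react react-dom gatsby-source-mongodb
npx gatsby build   # writes the static HTML into ./public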

AlgoMango Jun 21, 2020
It's already archived on archive.org (or you can request that it rescan the latest version), so you can easily download it from there as a complete static site with wayback_machine_downloader. It's a Ruby script: install Ruby and then "gem install wayback_machine_downloader". Once it's installed, all you need to do is type "wayback_machine_downloader http://grumpygamer.com/" and wait for the magic to happen ;)
The only issue might be that you really just get the front-end, no logins etc., but I've found it useful. It only takes five minutes, so I think it's worth a try... Enjoy!
Repo:
https://github.com/hartator/wayback-machine-downloader
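
For the blog in question that would presumably just be:

wayback_machine_downloader https://blog.thimbleweedpark.com/

(assuming the Wayback Machine has a reasonably complete crawl of it).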

Brian Jun 21, 2020
Setting aside Jekyll-based hosting, this sort of scraping is exactly what ArchiveTeam does. They have resources up at https://github.com/dhamaniasad/WARCTools

I suspect more than one of them will convert your front-end into a WARC without issue. The trick is then rendering that WARC as non-awful output. I'd say this is very much a plan B if wget doesn't adjust paths for you.
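
If you do end up with a WARC, one way to browse it again (I haven't tried this on your site; the collection and file names below are just placeholders) is pywb:

pip install pywb
wb-manager init twp-blog
wb-manager add twp-blog blog.warc.gz
wayback   # replays the collection locally on port 8080

Note that this serves the archive rather than producing flat files, so it doesn't solve the static-hosting part on its own.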

Kevin Jun 21, 2020
GitLab Pages & Hugo - powerful yet simple to use.

Matteo Jun 21, 2020
As @Darren already said HTTrack should work like a charm:
https://www.httrack.com/

Björn Tantau Jun 21, 2020
wget should work with the --mirror option. I've used it quite often for exactly this purpose.

aasasd Jun 21, 2020
Ron, wget absolutely does the things you want, I used it for this very purpose. The difficult part is picking the right options among the couple hundred that it has: recursive download with assets, under the specified directory—and alas I don't remember the proper options offhand. But you can be sure that leafing through the man page will get you the desired results.

aasasd Jun 21, 2020
Specifically, some options you'll want are --page-requisites --convert-links --adjust-extension.
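
Put together with the recursive flags mentioned above, something like this should get close (untested against this particular site):

wget --recursive --no-parent --page-requisites --convert-links --adjust-extension https://blog.thimbleweedpark.com/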

Steve Jun 21, 2020
wget does a pretty good job. Also, there's httrack. See https://stackoverflow.com/questions/6348289/download-a-working-local-copy-of-a-webpage

Simounet Jun 22, 2020
Hi Ron,
I think it should do the trick.
wget --no-verbose \
--mirror \
--adjust-extension \
--convert-links \
--force-directories \
--backup-converted \
--span-hosts \
--no-parent \
-e robots=off \
--restrict-file-names=windows \
--timeout=5 \
--warc-file=archive.warc \
--page-requisites \
--no-check-certificate \
--no-hsts \
--domains blog.thimbleweedpark.com \
"https://blog.thimbleweedpark.com/"

Stay safe and have a nice-not-so-grumpy day.

Dan Jones Jun 23, 2020
It looks like your site is some sort of home-grown CMS, right?
Then you've already got all your site's posts in a database somewhere.
You should be able to write a basic PHP script to pull those entries from the database and dump them to a bunch of HTML files.

Johnny Walker Jun 29, 2020
You're on a Mac, right? Got Homebrew? It should be as simple as:

brew install httrack

Then

httrack "https://blog.thimbleweedpark.com/" -O "/blog.thimbleweedpark.com/" -v

(-O means the output directory -- so you can change the second part. -v means verbose.) If you need anything on another subdomain, then:

httrack "https://blog.thimbleweedpark.com/" -O "/blog.thimbleweedpark.com/" "+*.thimbleweedpark.com//*" -v

David Choy Jul 15, 2020
Curious; what did you pick? I manage lots of websites and have used httrack in the past, but it doesn't always work.

Rez Aug 10, 2020
You can use Hugo
