fossilizer
This is an attempt to build a static site generator that ingests Mastodon exports and produces a web site from the content, either as a personal archive or as a way to publish a backup copy of your stuff.
Quick Start
These are rough instructions for a rough command-line tool. There is no GUI, yet.
- Request and download an export from your Mastodon instance (e.g. archive-20230720182703-36f08a7ce74bbf59f141b496b2b7f457.tar.gz)
- Download a release of Pagefind and install it, or use a precompiled binary
- Download a release of Fossilizer. There is no installation, just a standalone command.
  - Note: on macOS, you'll need to make an exception to run fossilizer in Security & Privacy settings
- Make a working directory somewhere and change into it
- Initialize the data directory: fossilizer init
- Ingest your Mastodon export and extract media attachments: fossilizer import archive-20230720182703-36f08a7ce74bbf59f141b496b2b7f457.tar.gz
- Build your static website in the build directory: fossilizer build
- Build Pagefind assets for search: pagefind --keep-index-url --site build
- Serve the build directory with a local web server (the --open option will attempt to open a browser): fossilizer serve --open
- Enjoy a static web site of your Mastodon toots.
Tips
- Try fossilizer by itself for a list of subcommands, and add --help to any command to get more details.
- Try fossilizer upgrade to upgrade the SQLite database and other assets when you download a new version. This is not (yet) automatic.
- data/config.toml can be used to set many as-yet undocumented configuration options.
- data/data.sqlite3 is a persistent SQLite database that accumulates all imported data.
- data/media is where media attachments are unpacked.
- You can import data repeatedly, and from multiple Mastodon instances. Everything will be merged.
- Try fossilizer init --customize, which unpacks the following for customization:
  - a data/web directory with static web assets that will be copied into the build directory
  - a data/templates directory with Tera templates used to produce the HTML output
  - Note: this will not overwrite the database for an existing data directory, though it will overwrite any existing templates or web directories.
Command Line Tool
The fossilizer command-line tool can be used to do all the things.
The following sections describe the different commands available:
- fossilizer init
- fossilizer import <export>
- fossilizer mastodon
- fossilizer build
- fossilizer serve
- fossilizer upgrade
The init command
The init command prepares the current directory with data and configuration
files needed by Fossilizer. It's used like so:
mkdir my-mastodon-site
cd my-mastodon-site
fossilizer init
When using the init command for the first time, some files and directories
will be set up for you:
my-mastodon-site/
├── build
└── data
    └── data.sqlite3
- The build directory is where your static site will be generated
- The data/data.sqlite3 file is a SQLite database into which things like posts and user account data will be stored.
After you've run this command, you can try the import command to
ingest data from one or more Mastodon exports.
Options
--clean
The --clean flag will delete existing build and data directories before
setting things up. Be careful with this, because it will wipe out any existing
data!
fossilizer init --clean
--customize
By default, Fossilizer will use templates and assets embedded in the executable to generate a static web site. However, if you'd like to customize how your site is generated, you can extract these into external files to edit:
fossilizer init --customize
This will result in a file structure something like this:
my-mastodon-site/
├── build
└── data
    ├── media
    ├── config.toml
    ├── data.sqlite3
    └── themes
        └── default
            ├── templates
            │   ├── activity.html
            │   ├── day.html
            │   ├── index.html
            │   └── layout.html
            └── web
                ├── index.css
                └── index.js
- The config.toml file can be used to supply configuration settings
- The data/themes directory holds themes that can be used to customize the appearance of the site. A theme named default is provided out of the box. To use a different theme, copy the default directory, give the copy a different name, and modify it. You can then supply that name to the build command with the --theme option.
- The data/themes/default/templates directory holds Tera templates used to generate HTML pages.
- The data/themes/default/web directory holds web assets which will be copied into the root directory of your static site when it's generated.
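To make the theme layout concrete, here is a minimal Rust sketch of how a theme name might map to its template and web-asset directories. It assumes only the data/themes/<name>/templates and data/themes/<name>/web layout described above; ThemeDirs and theme_dirs are hypothetical names, not Fossilizer's actual code, and falling back to default when no name is given is an assumption.

```rust
use std::path::{Path, PathBuf};

/// Directories a theme is assumed to provide (see the layout above).
#[derive(Debug)]
struct ThemeDirs {
    templates: PathBuf,
    web: PathBuf,
}

/// Hypothetical helper: resolve a theme name, such as the value passed to
/// `build --theme`, to data/themes/<name>/{templates,web}.
/// Falling back to "default" when no name is given is an assumption.
fn theme_dirs(data_dir: &Path, theme: Option<&str>) -> ThemeDirs {
    let root = data_dir.join("themes").join(theme.unwrap_or("default"));
    ThemeDirs {
        templates: root.join("templates"),
        web: root.join("web"),
    }
}

fn main() {
    let dirs = theme_dirs(Path::new("data"), Some("my-theme"));
    println!("templates:  {}", dirs.templates.display());
    println!("web assets: {}", dirs.web.display());
}
```

With --theme my-theme, this would look for templates in data/themes/my-theme/templates and web assets in data/themes/my-theme/web.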
TODO: Need to document configuration settings and templates. For now, just play around with the templates used by cli/build.rs and see what happens! 😅 Configuration settings can be found in the config.rs module.
The import command
The import command is used to ingest the content from a Mastodon export into
the SQLite database and extract media attachments. It's used like so:
cd my-mastodon-site
fossilizer import ../archive-20230720182703-36f08a7ce74bbf59f141b496b2b7f457.tar.gz
Depending on the size of your export, this command should take a few seconds or minutes to extract all the posts and attachments.
Along with inserting database records, you'll find files like the following added to your data directory, including all the media attachments associated with the export under a directory based on the SHA-256 hash of the account address:
my-mastodon-site/
└── data
    ├── data.sqlite3
    ├── media
    │   └── acc0bb231a7a2757c7e5c63aa68ce3cdbcfd32a43eb67a6bdedffe173c721184
    │       ├── avatar.png
    │       ├── header.jpg
    │       └── media_attachments
    │           └── files
    │               ├── 002
    │               │   ├── ...
    │               ├── 105
    │               │   ├── ...
    │               ├── 106
    │               │   ├── ...
You can run this command repeatedly, either with fresh exports from one Mastodon instance or with exports from many instances. All the data will be merged into the database from previous imports.
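One way to picture the merge is as an upsert keyed by each activity's ID. The Rust sketch below models that assumption; it is not Fossilizer's implementation, which stores JSON rows in SQLite. Activity, merge, and act are hypothetical names, and an in-memory BTreeMap stands in for the database.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for an imported activity: its ID plus raw JSON.
#[derive(Debug)]
struct Activity {
    id: String,
    json: String,
}

/// Upsert a batch of activities keyed by ID. An activity seen in an earlier
/// import replaces the stored copy rather than being duplicated.
/// Returns how many IDs were new.
fn merge(store: &mut BTreeMap<String, Activity>, batch: Vec<Activity>) -> usize {
    let mut added = 0;
    for activity in batch {
        if store.insert(activity.id.clone(), activity).is_none() {
            added += 1;
        }
    }
    added
}

/// Build a fake activity whose JSON just contains its ID.
fn act(id: &str) -> Activity {
    Activity { id: id.to_string(), json: format!("{{\"id\":\"{}\"}}", id) }
}

fn main() {
    let mut store = BTreeMap::new();
    // An export from one instance...
    let first = merge(&mut store, vec![act("https://a.example/1"), act("https://a.example/2")]);
    // ...then a newer, overlapping export plus one from another instance.
    let second = merge(
        &mut store,
        vec![act("https://a.example/2"), act("https://a.example/3"), act("https://b.example/1")],
    );
    println!("added {} then {}; {} stored", first, second, store.len());
    // The stored row keeps the raw JSON as-is.
    println!("{}", store["https://a.example/1"].json);
}
```

In this model, re-importing an overlapping export only adds the activities that weren't already stored, which is why repeated imports are harmless.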
After you've run this command, you can try the build command to
generate a static web site.
The mastodon sub-commands
The mastodon collection of sub-commands is used to connect to a Mastodon instance and fetch toots from an account via the Mastodon API.
To use these commands, you'll first need to connect to an existing account on a Mastodon instance using the link, code, and verify sub-commands, in that order.
Then, you can fetch toots from that account and import them into the local database using the fetch sub-command.
Selecting a Mastodon instance
By default, the mastodon command will connect to the instance at https://mastodon.social. You can specify a different instance hostname with the --instance / -i option:
fossilizer mastodon --instance mstdn.social link
Configuration and secrets for connecting to the selected Mastodon instance are stored in a file named config-instance-{instance}.toml (e.g. config-instance-mastodon.social.toml) in the data directory.
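As a rough illustration of where these settings end up, here is a minimal Rust sketch that derives the per-instance file path. It assumes the config-instance-{instance}.toml naming seen in the example data listing further down, plus the mastodon.social default mentioned above; DEFAULT_INSTANCE and instance_config_path are hypothetical names.

```rust
use std::path::{Path, PathBuf};

/// Instance used when no --instance / -i option is given.
const DEFAULT_INSTANCE: &str = "mastodon.social";

/// Hypothetical helper: where the per-instance config (including the access
/// token secret) would live. The naming pattern is inferred from the example
/// listing and is an assumption.
fn instance_config_path(data_dir: &Path, instance: Option<&str>) -> PathBuf {
    let instance = instance.unwrap_or(DEFAULT_INSTANCE);
    data_dir.join(format!("config-instance-{}.toml", instance))
}

fn main() {
    let data = Path::new("data");
    println!("{}", instance_config_path(data, None).display());
    println!("{}", instance_config_path(data, Some("mstdn.social")).display());
}
```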
Connecting to a Mastodon instance
Before importing toots from a Mastodon account, you'll need to connect to the instance and authenticate with an account.
The link sub-command will begin this process by attempting to register a new application with your instance and then offering an authorization URL to visit in a web browser. For example:
$ fossilizer mastodon link
[2024-04-18T20:06:21Z INFO fossilizer::cli::mastodon::link] Visit this link to begin authorization:
[2024-04-18T20:06:21Z INFO fossilizer::cli::mastodon::link] https://mastodon.social/oauth/authorize?client_id=w1pCC1ANqOqnrG6pk8cnbcMa0vTQjgmLQBHCrMqhEzY&scope=read+read%3Anotifications+read%3Astatuses+write+follow+push&redirect_uri=urn%3Aietf%3Awg%3Aoauth%3A2.0%3Aoob&response_type=code
Once you've visited this link and authorized the application, you'll be given a code to paste back into the terminal to complete the process.
The code sub-command will complete the process by exchanging the code for an access token:
$ fossilizer mastodon code 8675309jennyabcdefghiZZZFUVMixgjTlQMF0vK1I
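The link and code steps follow the usual OAuth 2.0 authorization-code flow with an out-of-band redirect. The Rust sketch below rebuilds the authorization URL shown in the link log above, plus the form body that would be POSTed to /oauth/token to trade the pasted code for an access token. It only illustrates the request shapes. The helper names are hypothetical, the client ID and secret are assumed to come from the app registration done by link, and Fossilizer itself may build these requests differently (for example, through a Mastodon client library).

```rust
/// Encode a value the way HTML forms do (application/x-www-form-urlencoded):
/// unreserved characters pass through, spaces become '+', the rest become %XX.
fn form_encode(value: &str) -> String {
    let mut out = String::new();
    for byte in value.bytes() {
        match byte {
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'_' | b'.' | b'~' => {
                out.push(byte as char)
            }
            b' ' => out.push('+'),
            _ => out.push_str(&format!("%{:02X}", byte)),
        }
    }
    out
}

/// Join key/value pairs into an encoded query string or form body.
fn encode_pairs(pairs: &[(&str, &str)]) -> String {
    pairs
        .iter()
        .map(|(k, v)| format!("{}={}", k, form_encode(v)))
        .collect::<Vec<_>>()
        .join("&")
}

/// "Out of band" redirect: the server shows the code instead of redirecting.
const OOB_REDIRECT: &str = "urn:ietf:wg:oauth:2.0:oob";

/// Step 1 (link): the authorization URL the user visits in a browser.
fn authorize_url(instance: &str, client_id: &str, scopes: &[&str]) -> String {
    let scope = scopes.join(" ");
    let query = encode_pairs(&[
        ("client_id", client_id),
        ("scope", scope.as_str()),
        ("redirect_uri", OOB_REDIRECT),
        ("response_type", "code"),
    ]);
    format!("https://{}/oauth/authorize?{}", instance, query)
}

/// Step 2 (code): the form body POSTed to https://{instance}/oauth/token
/// to exchange the pasted code for an access token.
fn token_request_body(client_id: &str, client_secret: &str, code: &str) -> String {
    encode_pairs(&[
        ("grant_type", "authorization_code"),
        ("code", code),
        ("client_id", client_id),
        ("client_secret", client_secret),
        ("redirect_uri", OOB_REDIRECT),
    ])
}

fn main() {
    let scopes = ["read", "read:notifications", "read:statuses", "write", "follow", "push"];
    let client_id = "w1pCC1ANqOqnrG6pk8cnbcMa0vTQjgmLQBHCrMqhEzY";
    println!("{}", authorize_url("mastodon.social", client_id, &scopes));
    println!("{}", token_request_body(client_id, "CLIENT_SECRET", "PASTED_CODE"));
}
```

The first line printed matches the authorization URL logged by link; the second is a token request body with placeholder values.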
After running the code sub-command, you can then run the verify sub-command to check that the connection is working:
$ fossilizer mastodon verify
[2024-04-18T20:09:04Z INFO fossilizer::cli::mastodon::verify] Verified as AuthVerifyResult { username: "lmorchard", url: "https://mastodon.social/@lmorchard", display_name: "Les Orchard 🕹\u{fe0f}🔧🐱🐰", created_at: "2016-11-01T00:00:00.000Z" }
Note that the access token secret obtained through the above steps is stored in the config-instance-{instance}.toml file in the data directory:
data
├── config-instance-hackers.town.toml
├── config-instance-mastodon.social.toml
└── data.sqlite3
Keep these files safe and don't publish them anywhere! Also, once you've connected to an instance, you can use the --instance / -i option to select it without needing to run link or code again.
Fetching toots
Once you've connected to a Mastodon instance, you can import toots from an account with the fetch sub-command. By default, this command will attempt to fetch and import the newest 100 toots in pages of 25.
$ fossilizer mastodon fetch
[2024-04-18T20:13:00Z INFO fossilizer::mastodon::fetcher] Fetching statuses for account https://mastodon.social/@lmorchard
[2024-04-18T20:13:01Z INFO fossilizer::mastodon::fetcher] Fetched 25 (of 100 max)...
[2024-04-18T20:13:04Z INFO fossilizer::mastodon::fetcher] Fetched 50 (of 100 max)...
[2024-04-18T20:13:04Z INFO fossilizer::mastodon::fetcher] Fetched 75 (of 100 max)...
[2024-04-18T20:13:05Z INFO fossilizer::mastodon::fetcher] Fetched 100 (of 100 max)...
You can adjust the number of toots fetched with the --max / -m option and the page size with the --page / -p option. However, note that the Mastodon API may cap the number of toots returned by a single request. In the example below, asking for pages of 100 yields 40 toots per request:
$ fossilizer mastodon fetch --max 200 --page 100
[2024-04-18T20:15:28Z INFO fossilizer::mastodon::fetcher] Fetching statuses for account https://mastodon.social/@lmorchard
[2024-04-18T20:15:29Z INFO fossilizer::mastodon::fetcher] Fetched 40 (of 200 max)...
[2024-04-18T20:15:29Z INFO fossilizer::mastodon::fetcher] Fetched 80 (of 200 max)...
[2024-04-18T20:15:30Z INFO fossilizer::mastodon::fetcher] Fetched 120 (of 200 max)...
[2024-04-18T20:15:31Z INFO fossilizer::mastodon::fetcher] Fetched 160 (of 200 max)...
[2024-04-18T20:15:31Z INFO fossilizer::mastodon::fetcher] Fetched 200 (of 200 max)...
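The progress totals in both logs can be reproduced with a small calculation. This Rust sketch assumes the server caps each request at 40 statuses (which matches the second log, and is believed to be Mastodon's limit for the account statuses endpoint), that the account has at least --max toots, and that the last total is clamped to --max. SERVER_PAGE_CAP and progress_totals are hypothetical names.

```rust
/// Assumed per-request cap on the account statuses endpoint. The second log
/// shows 40 statuses per request even though --page 100 was requested.
const SERVER_PAGE_CAP: usize = 40;

/// Hypothetical model of the "Fetched N (of MAX max)" progress totals.
/// Assumes the account has at least `max` toots, and that the final
/// request is trimmed so the total never exceeds `max`.
fn progress_totals(max: usize, page: usize) -> Vec<usize> {
    let per_request = page.min(SERVER_PAGE_CAP);
    let mut totals = Vec::new();
    let mut fetched = 0;
    // Guard against a zero page size, which would otherwise loop forever.
    while per_request > 0 && fetched < max {
        fetched = (fetched + per_request).min(max);
        totals.push(fetched);
    }
    totals
}

fn main() {
    // Defaults: --max 100 --page 25
    println!("{:?}", progress_totals(100, 25));
    // --max 200 --page 100 (server caps each page at 40)
    println!("{:?}", progress_totals(200, 100));
}
```

This prints [25, 50, 75, 100] and [40, 80, 120, 160, 200], the same sequences as the two logs above.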
Incremental fetching
If you've already imported most of your toots and would like to fetch only the newest ones, you can use the --incremental option. This will stop the fetch process as soon as a page is encountered that contains a toot already in the database:
$ fossilizer mastodon fetch --incremental
[2024-04-18T20:17:49Z INFO fossilizer::mastodon::fetcher] Fetching statuses for account https://mastodon.social/@lmorchard
[2024-04-18T20:17:50Z INFO fossilizer::mastodon::fetcher] Fetched 25 (of 100 max)...
[2024-04-18T20:17:50Z INFO fossilizer::mastodon::fetcher] Stopping incremental fetch after catching up to imported activities
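The stopping rule for --incremental can be sketched like this in Rust. It's a simplified model, not Fossilizer's fetcher: pages are given up front as lists of numeric IDs (real Mastodon status IDs are strings), and the page that overlaps with the database is still kept in full, on the assumption that re-importing known toots is harmless. incremental_fetch is a hypothetical name.

```rust
use std::collections::HashSet;

/// Hypothetical model of `fetch --incremental`: walk pages newest-first,
/// keep every status on each page, and stop after the first page that
/// contains a status already in the database.
/// Returns (statuses fetched, number of pages requested).
fn incremental_fetch(pages: &[Vec<u64>], known: &HashSet<u64>) -> (Vec<u64>, usize) {
    let mut fetched = Vec::new();
    let mut requests = 0;
    for page in pages {
        requests += 1;
        fetched.extend(page.iter().copied());
        if page.iter().any(|id| known.contains(id)) {
            // Caught up to previously imported activities.
            break;
        }
    }
    (fetched, requests)
}

fn main() {
    // Pages as the server would return them, newest first.
    let pages = vec![vec![105, 104, 103], vec![102, 101, 100], vec![99, 98, 97]];
    // IDs already imported by an earlier run.
    let known: HashSet<u64> = [101, 100, 99, 98, 97].iter().copied().collect();
    let (fetched, requests) = incremental_fetch(&pages, &known);
    println!("{} requests, fetched {:?}", requests, fetched);
}
```

Here the second page contains 101, which is already known, so the third page is never requested.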
The build command
The build command is used to generate a static web site from imported
content and media attachments. It's used like so:
cd my-mastodon-site
fossilizer build
pagefind --keep-index-url --site build
Note: Until or unless Pagefind can be integrated into Fossilizer, it needs to be run as a separate command to provide search indexes and code modules for the site.
After using the build command, you should end up with a build directory
with a structure somewhat like this:
my-mastodon-site/
├── build
│ ├── 2020
│ ├── 2021
│ ├── 2022
│ ├── 2023
│ ├── index.css
│ ├── index.html
│ ├── index.js
│ ├── media
│ ├── pagefind
│ └── vendor
- Activities are organized into a {year}/{month}/{day}.html file structure
- An index.html page is generated for the site overall, linking to the pages for each day
- The media directory is copied directly from data/media
- The pagefind directory is generated by Pagefind for client-side search
- Other files and directories like index.js, index.css, and vendor are static assets copied into the build
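Here's a minimal Rust sketch of how activities could be bucketed into the {year}/{month}/{day}.html pages. It assumes each activity carries an RFC 3339 published timestamp and that the date part is used as-is, with no time zone conversion. Fossilizer may well localize dates differently. day_page_path and group_by_day are hypothetical helpers.

```rust
use std::collections::BTreeMap;

/// Hypothetical helper: map an RFC 3339 timestamp like
/// "2023-07-20T18:27:03Z" to its {year}/{month}/{day}.html page.
/// Uses the date part as-is (no time zone conversion).
fn day_page_path(published: &str) -> Option<String> {
    let date = published.get(..10)?; // "YYYY-MM-DD"
    let mut parts = date.split('-');
    let (year, month, day) = (parts.next()?, parts.next()?, parts.next()?);
    if year.len() != 4 || month.len() != 2 || day.len() != 2 {
        return None;
    }
    Some(format!("{}/{}/{}.html", year, month, day))
}

/// Group activity IDs by day page. BTreeMap keeps the pages sorted
/// chronologically, which suits an index that links to each day.
fn group_by_day<'a>(activities: &[(&'a str, &str)]) -> BTreeMap<String, Vec<&'a str>> {
    let mut pages: BTreeMap<String, Vec<&'a str>> = BTreeMap::new();
    for &(id, published) in activities {
        if let Some(path) = day_page_path(published) {
            pages.entry(path).or_default().push(id);
        }
    }
    pages
}

fn main() {
    let activities = [
        ("a1", "2023-07-20T18:27:03Z"),
        ("a2", "2023-07-20T21:00:00Z"),
        ("a3", "2022-12-31T23:59:59Z"),
    ];
    for (page, ids) in group_by_day(&activities) {
        println!("{} -> {:?}", page, ids);
    }
}
```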
You can customize both the templates and the static web assets used in this build. Check out the init --customize option for more information.
Options
--theme
Use the theme named <THEME> for rendering the site. This will look for a directory named themes/<THEME> in the data directory.
--clean
Delete the build directory before proceeding
--skip-index
Skip building the index page in HTML
--skip-index-json
Skip building the index page in JSON
--skip-activities
Skip building pages for activities
--skip-assets
Skip copying over web assets
The serve command
The serve command starts up a local web server to allow access to the static web site.
fossilizer serve
Options
--host
Listen on the specified <HOST> address. Default is 127.0.0.1.
--port
Listen on the specified <PORT> number. Default is 8881.
--open
Open a web browser to the server URL after starting.
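For reference, this Rust sketch shows how the --host and --port values, with the defaults listed above, would combine into the URL that --open points a browser at. It only accepts IP literals (not host names like localhost), and server_url is a hypothetical helper rather than Fossilizer's own code.

```rust
use std::net::{AddrParseError, IpAddr, SocketAddr};

/// Defaults documented for `fossilizer serve`.
const DEFAULT_HOST: &str = "127.0.0.1";
const DEFAULT_PORT: u16 = 8881;

/// Hypothetical helper: combine optional --host / --port values into a
/// socket address and the URL a browser would be pointed at.
/// Only IP literals are accepted here, not names like "localhost".
fn server_url(host: Option<&str>, port: Option<u16>) -> Result<(SocketAddr, String), AddrParseError> {
    let ip: IpAddr = host.unwrap_or(DEFAULT_HOST).parse()?;
    let addr = SocketAddr::new(ip, port.unwrap_or(DEFAULT_PORT));
    // SocketAddr's Display wraps IPv6 addresses in brackets, as URLs require.
    Ok((addr, format!("http://{}/", addr)))
}

fn main() {
    match server_url(None, None) {
        Ok((addr, url)) => println!("listening on {}, open {}", addr, url),
        Err(err) => eprintln!("bad --host value: {}", err),
    }
}
```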
The upgrade command
The upgrade command is used to upgrade the database and perform any other
necessary changes after downloading a new version of Fossilizer.
Run this command whenever you upgrade Fossilizer.
cd my-mastodon-site
fossilizer upgrade
Customization
Try fossilizer init --customize, which unpacks the following for customization:
- a data/web directory with static web assets that will be copied into the build directory
- a data/templates directory with Tera templates used to produce the HTML output
- Note: this will not overwrite the database for an existing data directory, though it will overwrite any existing templates or web directories.
Check out the templates to see how the pages are built. For a more in-depth reference on what variables are supplied when rendering templates, check out the crate documentation:
For Developers
TODO: jot down design notions and useful information for folks aiming to help contribute to or customize this software.
fossilizer has not yet been published as a crate, but you can see the module docs here:
Odds & Ends
- For some details on how SQLite is used here as an ad-hoc document database, check out this blog post on Using SQLite as a document database for Mastodon exports. TL;DR: JSON is stored as the main column in each row, while json_extract() is used mainly to generate virtual columns for lookup indexes.
- When ingesting data, care is taken to store JSON from APIs and input files as close to the original source as possible. That way, data parsing and models can be incrementally upgraded over time without losing any information from imported sources.