fossilizer


This is an attempt to build a static site generator that ingests Mastodon exports and produces a web site from the content, either as a personal archive or as a way to publish a backup copy of your stuff.

Quick Start

These are rough instructions for a rough command-line tool. There is no GUI, yet.

  1. Request and download an export from your Mastodon instance (e.g. archive-20230720182703-36f08a7ce74bbf59f141b496b2b7f457.tar.gz)
  2. Download a release of pagefind and install it or use a precompiled binary
  3. Download a release of Fossilizer - there is no installation, just a standalone command.
    • Note: on macOS, you'll need to make an exception to run fossilizer in Security & Privacy settings
  4. Make a working directory somewhere
  5. Initialize the data directory:
    fossilizer init
    
  6. Ingest your Mastodon export and extract media attachments:
    fossilizer import archive-20230720182703-36f08a7ce74bbf59f141b496b2b7f457.tar.gz
    
  7. Build your static website in the build directory:
    fossilizer build
    
  8. Build pagefind assets for search:
    pagefind --keep-index-url --site build
    
  9. Serve the build directory up with a local web server - the --open option will attempt to open a browser:
    fossilizer serve --open
    
  10. Enjoy a static web site of your Mastodon toots.

Tips

  • Try fossilizer by itself for a list of subcommands; add --help to any command for more details.

  • Try fossilizer upgrade to upgrade the SQLite database and other assets when you download a new version. This is not (yet) automatic.

  • data/config.toml can be used to set many as-yet undocumented configuration options.

  • data/data.sqlite3 is a persistent SQLite database that accumulates all imported data.

  • data/media is where media attachments are unpacked.

  • You can repeatedly import data and import from multiple Mastodon instances. Everything will be merged.

  • Try fossilizer init --customize, which unpacks the following for customization:

    • a data/web directory with static web assets that will be copied into the build directory

    • a data/templates directory with Tera templates used to produce the HTML output

    • Note: this will not overwrite the database for an existing data directory, though it will overwrite any existing templates or web directories.

Command Line Tool

The fossilizer command-line tool can be used to do all the things.

The following sections describe the different commands available:

The init command

The init command prepares the current directory with data and configuration files needed by Fossilizer. It's used like so:

mkdir my-mastodon-site
cd my-mastodon-site
fossilizer init

When using the init command for the first time, some files and directories will be set up for you:

my-mastodon-site/
└── build
└── data
    └── data.sqlite3
  • The build directory is where your static site will be generated

  • The data/data.sqlite3 file is a SQLite database into which things like posts and user account data will be stored.

After you've run this command, you can try the import command to ingest data from one or more Mastodon exports.

Options

--clean

The --clean flag will delete existing build and data directories before setting things up. Be careful with this, because it will wipe out any existing data!

fossilizer init --clean

--customize

By default, Fossilizer will use templates and assets embedded in the executable to generate a static web site. However, if you'd like to customize how your site is generated, you can extract these into external files to edit:

fossilizer init --customize

This will result in a file structure something like this:

my-mastodon-site/
└── build
└── data
    └── media
    ├── config.toml
    ├── data.sqlite3
    └── themes
        └── default
            ├── templates
            │   ├── activity.html
            │   ├── day.html
            │   ├── index.html
            │   └── layout.html
            └── web
                ├── index.css
                └── index.js
  • The config.toml file can be used to supply configuration settings

  • The data/themes directory holds themes that can be used to customize the appearance of the site. A default theme is included. To create your own, copy the default directory under a different name and modify it; that name can then be supplied to the build command with the --theme option.

  • The data/themes/default/templates directory holds Tera templates used to generate HTML pages.

  • The data/themes/default/web directory holds web assets which will be copied into the root directory of your static site when it's generated.
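Cloning the default theme amounts to copying its directory under a new name. A minimal sketch in Python (the helper function and directory layout here are illustrative, not part of fossilizer):

```python
import pathlib
import shutil
import tempfile

def clone_theme(data_dir: pathlib.Path, new_name: str) -> pathlib.Path:
    """Copy the default theme so it can be edited independently and
    later selected with `fossilizer build --theme <new_name>`."""
    src = data_dir / "themes" / "default"
    dst = data_dir / "themes" / new_name
    shutil.copytree(src, dst)
    return dst

# Demonstration against a throwaway directory layout:
data_dir = pathlib.Path(tempfile.mkdtemp())
templates = data_dir / "themes" / "default" / "templates"
templates.mkdir(parents=True)
(templates / "layout.html").write_text("<!-- base layout -->")
clone_theme(data_dir, "mytheme")
print((data_dir / "themes" / "mytheme" / "templates" / "layout.html").exists())
```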

TODO: Need to document configuration settings and templates. For now, just play around with the templates used by cli/build.rs and see what happens! 😅 Configuration settings can be found in the config.rs module.

The import command

The import command is used to ingest the content from a Mastodon export into the SQLite database and extract media attachments. It's used like so:

cd my-mastodon-site
fossilizer import ../archive-20230720182703-36f08a7ce74bbf59f141b496b2b7f457.tar.gz

Depending on the size of your export, this command may take anywhere from a few seconds to several minutes to extract all the posts and attachments.

Along with inserting database records, this command adds files like the following to your data directory, including all the media attachments from the export under a directory named for the SHA-256 hash of the account address:

my-mastodon-site/
└── data
    ├── data.sqlite3
    ├── media
    │   └── acc0bb231a7a2757c7e5c63aa68ce3cdbcfd32a43eb67a6bdedffe173c721184
    │       ├── avatar.png
    │       ├── header.jpg
    │       └── media_attachments
    │           └── files
    │               ├── 002
    │               │   ├── ...
    │               ├── 105
    │               │   ├── ...
    │               ├── 106
    │               │   ├── ...

You can run this command repeatedly, either with fresh exports from one Mastodon instance or with exports from many instances. All the data will be merged into the database from previous imports.

After you've run this command, you can try the build command to generate a static web site.

The mastodon sub-commands

The mastodon collection of sub-commands is used to connect to a Mastodon instance and fetch toots from an account via the Mastodon API.

To use these commands, first you'll need to connect to an existing account on a Mastodon instance using link, code, and then verify sub-commands.

Then, you can fetch toots from that account and import them into the local database using the fetch sub-command.

Selecting a Mastodon instance

By default, the mastodon command will connect to the instance at https://mastodon.social. You can specify a different instance hostname with the --instance / -i option:

fossilizer mastodon --instance mstdn.social link

Configuration and secrets for connecting to the selected Mastodon instance are stored in a file named config-instance-{instance}.toml in the data directory.

Connecting to a Mastodon instance

Before importing toots from a Mastodon account, you'll need to connect to the instance and authenticate with an account.

The link sub-command will begin this process by attempting to register a new application with your instance and then offering an authorization URL to visit in a web browser. For example:

$ fossilizer mastodon link

[2024-04-18T20:06:21Z INFO  fossilizer::cli::mastodon::link] Visit this link to begin authorization:
[2024-04-18T20:06:21Z INFO  fossilizer::cli::mastodon::link] https://mastodon.social/oauth/authorize?client_id=w1pCC1ANqOqnrG6pk8cnbcMa0vTQjgmLQBHCrMqhEzY&scope=read+read%3Anotifications+read%3Astatuses+write+follow+push&redirect_uri=urn%3Aietf%3Awg%3Aoauth%3A2.0%3Aoob&response_type=code

Once you've visited this link and authorized the application, you'll be given a code to paste back into the terminal to complete the process.

The code sub-command will complete the process by exchanging the code for an access token:

$ fossilizer mastodon code 8675309jennyabcdefghiZZZFUVMixgjTlQMF0vK1I

After running the code sub-command, you can then run the verify sub-command to check that the connection is working:

$ fossilizer mastodon verify

[2024-04-18T20:09:04Z INFO  fossilizer::cli::mastodon::verify] Verified as AuthVerifyResult { username: "lmorchard", url: "https://mastodon.social/@lmorchard", display_name: "Les Orchard 🕹\u{fe0f}🔧🐱🐰", created_at: "2016-11-01T00:00:00.000Z" }

Note that the access token secret obtained through the above steps is stored in the config-instance-{instance}.toml file in the data directory:

data
├── config-instance-hackers.town.toml
├── config-instance-mastodon.social.toml
└── data.sqlite3

Keep these files safe and don't publish them anywhere! Also, once you've connected to an instance, you can use the --instance / -i option to select it without needing to run link or code again.

Fetching toots

Once you've connected to a Mastodon instance, you can import toots from an account with the fetch sub-command. By default, this command will attempt to fetch and import the newest 100 toots in pages of 25.

$ fossilizer mastodon fetch

[2024-04-18T20:13:00Z INFO  fossilizer::mastodon::fetcher] Fetching statuses for account https://mastodon.social/@lmorchard
[2024-04-18T20:13:01Z INFO  fossilizer::mastodon::fetcher] Fetched 25 (of 100 max)...
[2024-04-18T20:13:04Z INFO  fossilizer::mastodon::fetcher] Fetched 50 (of 100 max)...
[2024-04-18T20:13:04Z INFO  fossilizer::mastodon::fetcher] Fetched 75 (of 100 max)...
[2024-04-18T20:13:05Z INFO  fossilizer::mastodon::fetcher] Fetched 100 (of 100 max)...

You can adjust the number of toots fetched with the --max / -m option and the page size with the --page / -p option. However, note that the Mastodon API may limit the number of toots you can fetch in a single request:

$ fossilizer mastodon fetch --max 200 --page 100

[2024-04-18T20:15:28Z INFO  fossilizer::mastodon::fetcher] Fetching statuses for account https://mastodon.social/@lmorchard
[2024-04-18T20:15:29Z INFO  fossilizer::mastodon::fetcher] Fetched 40 (of 200 max)...
[2024-04-18T20:15:29Z INFO  fossilizer::mastodon::fetcher] Fetched 80 (of 200 max)...
[2024-04-18T20:15:30Z INFO  fossilizer::mastodon::fetcher] Fetched 120 (of 200 max)...
[2024-04-18T20:15:31Z INFO  fossilizer::mastodon::fetcher] Fetched 160 (of 200 max)...
[2024-04-18T20:15:31Z INFO  fossilizer::mastodon::fetcher] Fetched 200 (of 200 max)...

Incremental fetching

If you've already imported most of your toots and would like to fetch only the newest ones, you can use the --incremental option. This will stop the fetch process as soon as a page is encountered that contains a toot already in the database:

$ fossilizer mastodon fetch --incremental

[2024-04-18T20:17:49Z INFO  fossilizer::mastodon::fetcher] Fetching statuses for account https://mastodon.social/@lmorchard
[2024-04-18T20:17:50Z INFO  fossilizer::mastodon::fetcher] Fetched 25 (of 100 max)...
[2024-04-18T20:17:50Z INFO  fossilizer::mastodon::fetcher] Stopping incremental fetch after catching up to imported activities
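The strategy can be sketched roughly like this (an illustrative sketch, not fossilizer's actual implementation):

```python
def fetch_incremental(pages, known_ids):
    """Stop fetching after the first page that contains a status
    already present in the local database."""
    imported = []
    for page in pages:
        fresh = [status_id for status_id in page if status_id not in known_ids]
        imported.extend(fresh)
        if len(fresh) < len(page):
            break  # caught up to previously imported activities
    return imported

# Newest-first pages of status IDs; "101" and older are already imported.
pages = [["104", "103"], ["102", "101"], ["100", "099"]]
print(fetch_incremental(pages, known_ids={"101", "100", "099"}))
```

The third page is never requested, which is what makes incremental fetching cheap for a mostly up-to-date archive.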

The build command

The build command is used to generate a static web site from imported content and media attachments. It's used like so:

cd my-mastodon-site
fossilizer build
pagefind --keep-index-url --site build

Note: Until or unless Pagefind can be integrated into Fossilizer, it needs to be run as a separate command to provide search indexes and code modules for the site.

After using the build command, you should end up with a build directory with a structure somewhat like this:

my-mastodon-site/
├── build
│   ├── 2020
│   ├── 2021
│   ├── 2022
│   ├── 2023
│   ├── index.css
│   ├── index.html
│   ├── index.js
│   ├── media
│   ├── pagefind
│   └── vendor
  • Activities are organized into a {year}/{month}/{day}.html file structure

  • An index.html page is generated for the site overall, linking to the pages for each day

  • The media directory is copied directly from data/media

  • The pagefind directory is generated by Pagefind for client-side search

  • Other files and directories like index.js, index.css, vendor are static assets copied into the build

You can customize both the templates and the static web assets used in this build. Check out the init --customize option for more information.
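For example, an activity's page path can be derived from its publication date like so (assuming zero-padded date components, which is an assumption here; check your own build output):

```python
from datetime import datetime, timezone

# Map an activity's publication date to its {year}/{month}/{day}.html page.
published = datetime(2023, 7, 20, 18, 27, 3, tzinfo=timezone.utc)
print(published.strftime("%Y/%m/%d.html"))
```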

Options

--theme

Use the theme named <THEME> for rendering the site. This will look for a directory named themes/<THEME> in the data directory.

--clean

Delete build directory before proceeding

--skip-index

Skip building index page in HTML

--skip-index-json

Skip building index page in JSON

--skip-activities

Skip building pages for activities

--skip-assets

Skip copying over web assets

The serve command

The serve command starts up a local web server to allow access to the static web site.

fossilizer serve

Options

--host

Listen on the specified <HOST> address. Default is 127.0.0.1.

--port

Listen on the specified <PORT> number. Default is 8881.

--open

Open a web browser to the server URL after starting.

The upgrade command

The upgrade command is used to upgrade the database and perform any other necessary changes after downloading a new version of Fossilizer.

Run this command whenever you upgrade Fossilizer.

cd my-mastodon-site
fossilizer upgrade

Customization

Try fossilizer init --customize, which unpacks the following for customization:

  • a data/web directory with static web assets that will be copied into the build directory

  • a data/templates directory with Tera templates used to produce the HTML output

  • Note: this will not overwrite the database for an existing data directory, though it will overwrite any existing templates or web directories.

Check out the templates to see how the pages are built. For a more in-depth reference on what variables are supplied when rendering templates, check out the crate documentation.

For Developers

TODO: jot down design notions and useful information for folks aiming to help contribute to or customize this software.

fossilizer has not yet been published as a crate, but you can build the module docs locally with cargo doc.

Odds & Ends

  • For some details on how SQLite is used here as an ad-hoc document database, check out this blog post on Using SQLite as a document database for Mastodon exports. TL;DR: JSON is stored as the main column in each row, while json_extract() is used mainly to generate virtual columns for lookup indexes.

  • When ingesting data, care is taken to attempt to store JSON as close to the original source as possible from APIs and input files. That way, data parsing and models can be incrementally upgraded over time without having lost any information from imported sources.
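A minimal sketch of that document-database approach (the table and column names here are illustrative, not fossilizer's actual schema; virtual generated columns require SQLite 3.31+):

```python
import json
import sqlite3

con = sqlite3.connect(":memory:")
# Store the raw JSON as the main column; expose fields for lookup
# via json_extract() in a virtual generated column, then index it.
con.execute("""
    CREATE TABLE activities (
        json TEXT NOT NULL,
        id TEXT GENERATED ALWAYS AS (json_extract(json, '$.id')) VIRTUAL
    )
""")
con.execute("CREATE UNIQUE INDEX activities_id ON activities (id)")
con.execute(
    "INSERT INTO activities (json) VALUES (?)",
    (json.dumps({"id": "123", "content": "<p>Hello, fediverse!</p>"}),),
)
(raw,) = con.execute(
    "SELECT json FROM activities WHERE id = ?", ("123",)
).fetchone()
print(json.loads(raw)["content"])
```

Because the original JSON blob is what's stored, lookup columns can be added or changed later without losing any of the imported source data.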