# fossilizer

This is an attempt to build a static site generator that ingests Mastodon exports and produces a web site based on the content, either as a personal archive or even as a way to publish a backup copy of your stuff.
## Quick Start

These are rough instructions for a rough command-line tool. There is no GUI, yet.

- Request and download an export from your Mastodon instance (e.g. `archive-20230720182703-36f08a7ce74bbf59f141b496b2b7f457.tar.gz`)
- Download a release of pagefind and install it, or use a precompiled binary
- Download a release of Fossilizer - there is no installation, just a standalone command.
  - Note: on macOS, you'll need to make an exception to run `fossilizer` in Security & Privacy settings
- Make a working directory somewhere
- Initialize the `data` directory: `fossilizer init`
- Ingest your Mastodon export and extract media attachments: `fossilizer import archive-20230720182703-36f08a7ce74bbf59f141b496b2b7f457.tar.gz`
- Build your static website in the `build` directory: `fossilizer build`
- Build pagefind assets for search: `pagefind --keep-index-url --site build`
- Serve the `build` directory with a local web server; the `--open` option will attempt to open a browser: `fossilizer serve --open`
- Enjoy a static web site of your Mastodon toots.
## Tips

- Try `fossilizer` by itself for a list of subcommands; try `--help` as an option to get more details on any command.
- Try `fossilizer upgrade` to upgrade the SQLite database and other assets when you download a new version. This is not (yet) automatic.
- `data/config.toml` can be used to set many as-yet undocumented configuration options.
- `data/data.sqlite3` is a persistent SQLite database that accumulates all imported data.
- `data/media` is where media attachments are unpacked.
- You can repeatedly import data and import from multiple Mastodon instances. Everything will be merged.
- Try `fossilizer init --customize`, which unpacks the following for customization:
  - a `data/web` directory with static web assets that will be copied into the `build` directory
  - a `data/templates` directory with Tera templates used to produce the HTML output
  - Note: this will not overwrite the database for an existing `data` directory, though it will overwrite any existing `templates` or `web` directories.
## Command Line Tool

The `fossilizer` command-line tool can be used to do all the things. The following sections describe the commands available:

- `fossilizer init`
- `fossilizer import <export>`
- `fossilizer mastodon`
- `fossilizer build`
- `fossilizer serve`
- `fossilizer upgrade`
## The `init` command

The `init` command prepares the current directory with data and configuration files needed by Fossilizer. It's used like so:

```
mkdir my-mastodon-site
cd my-mastodon-site
fossilizer init
```

When using the `init` command for the first time, some files and directories will be set up for you:

```
my-mastodon-site/
├── build
└── data
    └── data.sqlite3
```
- The `build` directory is where your static site will be generated
- The `data/data.sqlite3` file is a SQLite database into which things like posts and user account data will be stored.

After you've run this command, you can try the `import` command to ingest data from one or more Mastodon exports.
### Options

#### `--clean`

The `--clean` flag will delete existing `build` and `data` directories before setting things up. Be careful with this, because it will wipe out any existing data!

```
fossilizer init --clean
```
#### `--customize`

By default, Fossilizer will use templates and assets embedded in the executable to generate a static web site. However, if you'd like to customize how your site is generated, you can extract these into external files to edit:

```
fossilizer init --customize
```

This will result in a file structure something like this:

```
my-mastodon-site/
├── build
└── data
    ├── media
    ├── config.toml
    ├── data.sqlite3
    └── themes
        └── default
            ├── templates
            │   ├── activity.html
            │   ├── day.html
            │   ├── index.html
            │   └── layout.html
            └── web
                ├── index.css
                └── index.js
```
- The `config.toml` file can be used to supply configuration settings
- The `data/themes` directory holds themes that can be used to customize the appearance of the site. The `default` theme is provided by default. If you want to use a different theme, you can copy the `default` directory and modify it under a directory with a different name. This name can then be supplied to the `build` command with the `--theme` option.
- The `data/themes/default/templates` directory holds Tera templates used to generate HTML pages.
- The `data/themes/default/web` directory holds web assets which will be copied into the root directory of your static site when it's generated.

TODO: Need to document configuration settings and templates. For now, just play around with the templates used by `cli/build.rs` and see what happens! 😅 Configuration settings can be found in the `config.rs` module.
## The `import` command

The `import` command is used to ingest the content from a Mastodon export into the SQLite database and extract media attachments. It's used like so:

```
cd my-mastodon-site
fossilizer import ../archive-20230720182703-36f08a7ce74bbf59f141b496b2b7f457.tar.gz
```

Depending on the size of your export, this command should take a few seconds or minutes to extract all the posts and attachments.

Along with inserting database records, you'll find files like the following added to your data directory, including all the media attachments associated with the export under a directory based on the SHA-256 hash of the account address:

```
my-mastodon-site/
└── data
    ├── data.sqlite3
    └── media
        └── acc0bb231a7a2757c7e5c63aa68ce3cdbcfd32a43eb67a6bdedffe173c721184
            ├── avatar.png
            ├── header.jpg
            └── media_attachments
                └── files
                    ├── 002
                    │   ├── ...
                    ├── 105
                    │   ├── ...
                    └── 106
                        ├── ...
```
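The hex-named directory above is a SHA-256 digest of the account address. As an illustration only (the exact address format fossilizer hashes is an assumption here), such a directory name can be derived with Python's standard library:

```python
import hashlib

def media_dir_name(account_address: str) -> str:
    """Hypothetical sketch: derive a media directory name from an
    account address via SHA-256, like the hex-named directory above.
    The exact input format fossilizer uses is an assumption."""
    return hashlib.sha256(account_address.encode("utf-8")).hexdigest()

# Hypothetical account address; yields 64 lowercase hex characters.
name = media_dir_name("lmorchard@mastodon.social")
print(name)
```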
You can run this command repeatedly, either with fresh exports from one Mastodon instance or with exports from many instances. All the data will be merged into the database alongside previous imports.

After you've run this command, you can try the `build` command to generate a static web site.
## The `mastodon` sub-commands

The `mastodon` collection of sub-commands is used to connect to a Mastodon instance and fetch toots from an account via the Mastodon API.

To use these commands, first you'll need to connect to an existing account on a Mastodon instance using the `link`, `code`, and `verify` sub-commands. Then, you can fetch toots from that account and import them into the local database using the `fetch` sub-command.

### Selecting a Mastodon instance

By default, the `mastodon` command will connect to the instance at `https://mastodon.social`. You can specify a different instance hostname with the `--instance` / `-i` option:

```
fossilizer mastodon --instance mstdn.social link
```

Configuration and secrets for connecting to the selected Mastodon instance are stored in a file named `config-{instance}.toml` in the `data` directory.
### Connecting to a Mastodon instance

Before importing toots from a Mastodon account, you'll need to connect to the instance and authenticate with an account.

The `link` sub-command begins this process by attempting to register a new application with your instance and then offering an authorization URL to visit in a web browser. For example:

```
$ fossilizer mastodon link
[2024-04-18T20:06:21Z INFO fossilizer::cli::mastodon::link] Visit this link to begin authorization:
[2024-04-18T20:06:21Z INFO fossilizer::cli::mastodon::link] https://mastodon.social/oauth/authorize?client_id=w1pCC1ANqOqnrG6pk8cnbcMa0vTQjgmLQBHCrMqhEzY&scope=read+read%3Anotifications+read%3Astatuses+write+follow+push&redirect_uri=urn%3Aietf%3Awg%3Aoauth%3A2.0%3Aoob&response_type=code
```

Once you've visited this link and authorized the application, you'll be given a code to paste back into the terminal to complete the process.

The `code` sub-command completes the process by exchanging the code for an access token:

```
$ fossilizer mastodon code 8675309jennyabcdefghiZZZFUVMixgjTlQMF0vK1I
```

After running the `code` sub-command, you can then run the `verify` sub-command to check that the connection is working:

```
$ fossilizer mastodon verify
[2024-04-18T20:09:04Z INFO fossilizer::cli::mastodon::verify] Verified as AuthVerifyResult { username: "lmorchard", url: "https://mastodon.social/@lmorchard", display_name: "Les Orchard 🕹\u{fe0f}🔧🐱🐰", created_at: "2016-11-01T00:00:00.000Z" }
```
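The flow above is a standard OAuth 2.0 authorization-code exchange. As a sketch only (values are taken from the example log; this is not fossilizer's actual code), the authorization URL the `link` step offers can be assembled like so:

```python
from urllib.parse import urlencode

# Values from the example log above; in the real flow, the client_id is
# issued when the application registers with the instance.
params = {
    "client_id": "w1pCC1ANqOqnrG6pk8cnbcMa0vTQjgmLQBHCrMqhEzY",
    "scope": "read read:notifications read:statuses write follow push",
    "redirect_uri": "urn:ietf:wg:oauth:2.0:oob",  # out-of-band: code shown in browser
    "response_type": "code",
}
authorize_url = f"https://mastodon.social/oauth/authorize?{urlencode(params)}"
print(authorize_url)
```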
Note that the access token secret obtained through the above steps is stored in the `config-{instance}.toml` file in the `data` directory:

```
data
├── config-instance-hackers.town.toml
├── config-instance-mastodon.social.toml
└── data.sqlite3
```

Keep these files safe and don't publish them anywhere! Also, once you've connected to an instance, you can use the `--instance` / `-i` option to select it without needing to run `link` or `code` again.
### Fetching toots

Once you've connected to a Mastodon instance, you can import toots from an account with the `fetch` sub-command. By default, this command will attempt to fetch and import the newest 100 toots in pages of 25:

```
$ fossilizer mastodon fetch
[2024-04-18T20:13:00Z INFO fossilizer::mastodon::fetcher] Fetching statuses for account https://mastodon.social/@lmorchard
[2024-04-18T20:13:01Z INFO fossilizer::mastodon::fetcher] Fetched 25 (of 100 max)...
[2024-04-18T20:13:04Z INFO fossilizer::mastodon::fetcher] Fetched 50 (of 100 max)...
[2024-04-18T20:13:04Z INFO fossilizer::mastodon::fetcher] Fetched 75 (of 100 max)...
[2024-04-18T20:13:05Z INFO fossilizer::mastodon::fetcher] Fetched 100 (of 100 max)...
```

You can adjust the number of toots fetched with the `--max` / `-m` option and the page size with the `--page` / `-p` option. However, note that the Mastodon API may limit the number of toots you can fetch in a single request:

```
$ fossilizer mastodon fetch --max 200 --page 100
[2024-04-18T20:15:28Z INFO fossilizer::mastodon::fetcher] Fetching statuses for account https://mastodon.social/@lmorchard
[2024-04-18T20:15:29Z INFO fossilizer::mastodon::fetcher] Fetched 40 (of 200 max)...
[2024-04-18T20:15:29Z INFO fossilizer::mastodon::fetcher] Fetched 80 (of 200 max)...
[2024-04-18T20:15:30Z INFO fossilizer::mastodon::fetcher] Fetched 120 (of 200 max)...
[2024-04-18T20:15:31Z INFO fossilizer::mastodon::fetcher] Fetched 160 (of 200 max)...
[2024-04-18T20:15:31Z INFO fossilizer::mastodon::fetcher] Fetched 200 (of 200 max)...
```
### Incremental fetching

If you've already imported most of your toots and would like to fetch only the newest ones, you can use the `--incremental` option. This will stop the fetch process as soon as a page is encountered that contains a toot already in the database:

```
$ fossilizer mastodon fetch --incremental
[2024-04-18T20:17:49Z INFO fossilizer::mastodon::fetcher] Fetching statuses for account https://mastodon.social/@lmorchard
[2024-04-18T20:17:50Z INFO fossilizer::mastodon::fetcher] Fetched 25 (of 100 max)...
[2024-04-18T20:17:50Z INFO fossilizer::mastodon::fetcher] Stopping incremental fetch after catching up to imported activities
```
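The stop condition above can be pictured as a simple paging loop. This is an illustrative sketch, not fossilizer's actual fetcher: `fetch_page` and the set of known IDs are hypothetical stand-ins for the Mastodon API client and the SQLite database.

```python
def fetch_incremental(fetch_page, known_ids, max_count=100, page_size=25):
    """Fetch pages of statuses, newest first, stopping either at
    max_count or at the first page containing an already-seen ID."""
    fetched = []
    while len(fetched) < max_count:
        page = fetch_page(limit=page_size)
        if not page:
            break  # no more statuses available
        fetched.extend(page)
        if any(status["id"] in known_ids for status in page):
            print("Stopping incremental fetch after catching up")
            break
    return fetched

# Hypothetical usage with canned pages standing in for the API:
pages = iter([[{"id": "3"}, {"id": "2"}], [{"id": "1"}, {"id": "0"}]])
result = fetch_incremental(lambda **kw: next(pages, []), known_ids={"1"})
print(len(result))  # 4: the second page contained a known ID, so fetching stopped
```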
## The `build` command

The `build` command is used to generate a static web site from imported content and media attachments. It's used like so:

```
cd my-mastodon-site
fossilizer build
pagefind --keep-index-url --site build
```

Note: Until or unless Pagefind can be integrated into Fossilizer, it needs to be run as a separate command to provide search indexes and code modules for the site.

After using the `build` command, you should end up with a `build` directory with a structure somewhat like this:

```
my-mastodon-site/
├── build
│   ├── 2020
│   ├── 2021
│   ├── 2022
│   ├── 2023
│   ├── index.css
│   ├── index.html
│   ├── index.js
│   ├── media
│   ├── pagefind
│   └── vendor
```

- Activities are organized into a `{year}/{month}/{day}.html` file structure
- An `index.html` page is generated for the site overall, linking to the pages for each day
- The `media` directory is copied directly from `data/media`
- The `pagefind` directory is generated by Pagefind for client-side search
- Other files and directories like `index.js`, `index.css`, and `vendor` are static assets copied into the build

You can customize both the templates and the static web assets used in this build. Check out the `init --customize` option for more information.
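The `{year}/{month}/{day}.html` layout described above amounts to grouping activities by their published date. A hypothetical sketch (the `published` field name is an assumption borrowed from ActivityPub exports, not fossilizer's actual model):

```python
from collections import defaultdict

def day_pages(activities):
    """Group activities into day-page paths like '2023/07/20.html',
    mirroring the build layout described above."""
    pages = defaultdict(list)
    for activity in activities:
        # Assumes an ISO 8601 'published' timestamp on each activity.
        year, month, day = activity["published"][:10].split("-")
        pages[f"{year}/{month}/{day}.html"].append(activity)
    return dict(pages)

pages = day_pages([
    {"published": "2023-07-20T18:27:03Z", "content": "hello"},
    {"published": "2023-07-20T19:00:00Z", "content": "again"},
    {"published": "2020-01-02T00:00:00Z", "content": "old"},
])
print(sorted(pages))  # ['2020/01/02.html', '2023/07/20.html']
```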
### Options

#### `--theme`

Use the theme named `<THEME>` for rendering the site. This will look for a directory named `themes/<THEME>` in the `data` directory.

#### `--clean`

Delete the build directory before proceeding.

#### `--skip-index`

Skip building the index page in HTML.

#### `--skip-index-json`

Skip building the index page in JSON.

#### `--skip-activities`

Skip building pages for activities.

#### `--skip-assets`

Skip copying over web assets.
## The `serve` command

The `serve` command starts up a local web server to allow access to the static web site:

```
fossilizer serve
```

### Options

#### `--host`

Listen on the specified `<HOST>` address. Default is `127.0.0.1`.

#### `--port`

Listen on the specified `<PORT>` number. Default is `8881`.

#### `--open`

Open a web browser to the server URL after starting.
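Since the output is a plain static site, any static file server can stand in for `fossilizer serve`. A minimal Python sketch (this is not fossilizer's server; a temp directory stands in for `build/`, and an ephemeral port is used here even though fossilizer's default is `8881`):

```python
import functools
import http.server
import pathlib
import tempfile
import threading
import urllib.request

# Stand-in for the build/ directory: a temp dir with one page.
site = pathlib.Path(tempfile.mkdtemp())
(site / "index.html").write_text("<h1>hello from the archive</h1>")

# Serve that directory; bind port 0 so the sketch runs even if 8881 is taken.
handler = functools.partial(http.server.SimpleHTTPRequestHandler, directory=str(site))
httpd = http.server.ThreadingHTTPServer(("127.0.0.1", 0), handler)
port = httpd.server_address[1]
threading.Thread(target=httpd.serve_forever, daemon=True).start()

with urllib.request.urlopen(f"http://127.0.0.1:{port}/index.html") as resp:
    body = resp.read().decode()
httpd.shutdown()
print(body)  # <h1>hello from the archive</h1>
```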
## The `upgrade` command

The `upgrade` command is used to upgrade the database and perform any other necessary changes after downloading a new version of Fossilizer. Run this command whenever you upgrade Fossilizer:

```
cd my-mastodon-site
fossilizer upgrade
```
## Customization

Try `fossilizer init --customize`, which unpacks the following for customization:

- a `data/web` directory with static web assets that will be copied into the `build` directory
- a `data/templates` directory with Tera templates used to produce the HTML output
- Note: this will not overwrite the database for an existing `data` directory, though it will overwrite any existing `templates` or `web` directories.

Check out the templates to see how the pages are built. For a more in-depth reference on what variables are supplied when rendering templates, check out the crate documentation.
## For Developers

TODO: jot down design notions and useful information for folks aiming to help contribute to or customize this software.

`fossilizer` has not yet been published as a crate, but you can see the module docs here:

## Odds & Ends

- For some details on how SQLite is used here as an ad-hoc document database, check out this blog post on Using SQLite as a document database for Mastodon exports. TL;DR: JSON is stored as the main column in each row, while `json_extract()` is used mainly to generate virtual columns for lookup indexes.
- When ingesting data, care is taken to store JSON as close to the original source as possible from APIs and input files. That way, data parsing and models can be incrementally upgraded over time without losing any information from imported sources.
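The document-database approach described above can be sketched with SQLite's JSON functions: store the raw JSON whole, then index a generated column extracted from it. A minimal illustration using Python's bundled `sqlite3` (table and column names here are assumptions, not fossilizer's actual schema; generated columns require SQLite 3.31+):

```python
import json
import sqlite3

db = sqlite3.connect(":memory:")
# Store the original JSON as the main column; derive a virtual column
# via json_extract() purely for lookup indexing.
db.executescript("""
    CREATE TABLE activities (
        json TEXT NOT NULL,
        id TEXT GENERATED ALWAYS AS (json_extract(json, '$.id')) VIRTUAL
    );
    CREATE UNIQUE INDEX activities_id ON activities (id);
""")

activity = {"id": "https://mastodon.social/users/lmorchard/statuses/1",
            "type": "Create", "published": "2023-07-20T18:27:03Z"}
db.execute("INSERT INTO activities (json) VALUES (?)", (json.dumps(activity),))

# Lookups by id use the index, while the stored JSON stays as imported.
row = db.execute("SELECT json FROM activities WHERE id = ?",
                 (activity["id"],)).fetchone()
print(json.loads(row[0])["type"])  # Create
```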