If you follow this document and still have troubles, please open an issue on GitHub: so I can fill in the gaps.
This guide is split into two parts: short one and very detailed one. Short Version is meant for people who are confortable with development tools like Python, Django, pip, git or even gcc. In this case I will skip most of details as they will sound boring for such persons.
Detailed Version is for people who probably heard of Python - as programming language - but do not have any experience working with it. Any form of prior programming (in ruby, perl, java, C/C++) experience will help you to grasp the concepts quicker.
In both versions of this guide you need to know what is a command line terminal and how to work with it. In general Papermerge is a Linux/Unix based software. In this guide I will use Ubuntu 20.04 as example, but this instructions can be adopted easily for any Linux distribution.
First download the sorce code. In this guide we clone latest stable version 1.5.5 into folder PapermergeDMS (located in home folder):
Install required Ubuntu 20.04 LTS deb packages:
$ sudo apt install build-essential \ python3-pip \ python3-venv \ git \ imagemagick \ poppler-utils \ pdftk \ tesseract-ocr \ tesseract-ocr-eng \ tesseract-ocr-deu \ tesseract-ocr-fra \ tesseract-ocr-spa
Create python virutal environment and activate it:
$ cd ~/PapermergeDMS $ python3 -m venv .venv --system-site-packages $ source .venv/bin/activate
Alternativelly you can create python virtual environment with command:
Advantage of last command is that you can specify exact python version. Papermerge requires python version >= 3.7
Install necessary dependencies:
Initialize SQLite database with:
Create a user for Papermerge instance:
Start webserver with:
If no specific IP or PORT is given, the default is 127.0.0.1:8000 also known as http://localhost:8000/.
7. In a separate window, change to the project’s root directory again, but this time, you should start the worker script with:
Papermerge is written in Python. First thing you need to make sure python interpreter is installed.
Ubuntu 20.04 comes with python interpreter already installed.
However, the command to invoke python interpreter is
Python community now transitions from python2 (which is not maintained
anymore) to python3. This is why in many Linux distributions you can
access python interpreter either with
python command or with
A close python’s friend is pip command. pip is python’s package manager.
Similar as with python interpreter story - there might be either pip or
pip3 - latter is specifically for
python version 3. We will use pip3:
Make sure it was installed:
Another package which you need to install is
python3-venv. This one will be
used in Step 2 we’ll learn and setup python virtual environment:
Although Papermerge is written in python, it uses some special python modules which are compiled from C sources and used in binary form. This means that you need gcc compiler installed as well.
In Ubuntu 20.04,
gcc is available via
Double check that
gcc was installed:
$ gcc --version gcc (Ubuntu 9.3.0-10ubuntu2) 9.3.0 Copyright (C) 2019 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Ok, great! We have to install one more utility -
git. Strictly speaking,
you don’t need
git, because you can download the tarball and unzip
(or untar or extract) sources. But let’s stick with git:
Double check that
git was installed:
And now clone the latest stable version of Papermerge directly from GitHub repository:
As of writing this - latest stable version of Papermerge is 1.5.5, thus the argument
with respective version. Also, notice that repository was cloned inside folder named PapermergeDMS, you
can choose whatever title for that folder suites you.
From now on, I will refer to PapermergeDMS folder as project root or top level folder of the project
Change directory to project’s root (folder where you cloned/extracted sources):
Our goal in this step is to install python dependencies - like Django for example. Most important dependencies -
without which project won’t start - are listed in
(relative to project’s root).
The command to do that is as simple as
pip install -r
requirements/base.txt - but please don’t rush to type it yet. We need to
clarify the concept of python virtual environment first.
If you simply type
pip insall -r requirements/base.txt - it will install
packages system-wide. A better approach is to install dependencies per
project. So, the place designated for python packages specific for this
project is called a virtual environment. Let’s create a python virtual
environment and name it
For virtual environment you can choose whatever name you want. For example
you can choose to name your virtual environment py37, in that case command
python3 -m venv py37. However, I chose .venv because in
Unix/Linux dotted folders have special meaning - they are hidden so to
--system-site-packages is necessary, it will install a package called
wheel package, pip install later will issue warning.
Alternatively you can create python virtual environment with following command:
Advantage of last command is that it specifies exact python version.
Next we need to activate virtual environment:
Terminal’s prompt (bash - in Ubuntu) will change to indicate that python virtual environment is active.
Python virtual environment is just a directory where python packages will
be installed. Activating python environment is basically changing your
PATH variable - this is exactly what
.venv/bin/activate command does. If you will run
echo $PATH you will
PapermergeDMS/.venv/bin is first in path, this means that
python interpreter first of all will look up for its dependencies there.
And now, with python virtual environment active, let’s install Papermerge dependencies:
$ source .venv/bin/activate # a reminder to run this first $ pip3 install -r requirements/base.txt
At this point, all python packages were installed inside directory
.venv/lib/python3.8/site-packages and that directory is first one in your
Papermerge is based on Django Web Framework.
Django provides a
manage.py script which is used to run all project related commands.
manage.py script is located in project’s root directory.
First let’s run migrations:
At this point you will see warnings about missing binary dependencies. We
deal with them little bit later when we will discuss
run migrations - in django’s jargon means that you prepare the database. By
default Papermerge (as any Django based project) uses sqlite database - which is stored in a single file.
./manage.py migrate command will create that file (it is called db.sqlite3
and is in project’s root directory) and create database schema for the
We are not ready yet, but at this point, you can built-in web server and access login screen:
runserver command will start web server on port
can access login screen via any web browser by pointing it to
But as I mentioned, we are not ready yet. First of all, when you run
./manage.py runserver command you probably noticed couple of warnings. To see
if all binary dependencies were installed run following command:
$ ./manage.py check
On freshly installed Ubuntu 20.04 LTS you will see following warnings:
System check identified some issues: WARNINGS: ?: Papermerge can't find convert. Without it, image resizing is not possible. HINT: Either it's not in your PATH or it's not installed. ?: Papermerge can't find identify. Without it, it is not possible to count pages in TIFF. HINT: Either it's not in your PATH or it's not installed. ?: Papermerge can't find tesseract. Without it, OCR of the documents is impossible. HINT: Either it's not in your PATH or it's not installed. ?: papermerge.conf.py file was found. Following locations attempted /etc/papermerge.conf.py, papermerge.conf.py HINT: Create one of those files or point PAPERMERGE_CONFIG environment name to it. System check identified 5 issues (0 silenced).
This means that you need to install all above dependencies. Let’s install all of them in one shot:
sudo apt install imagemagick \ poppler-utils \ tesseract-ocr \ tesseract-ocr-eng \ tesseract-ocr-deu \ tesseract-ocr-fra \ tesseract-ocr-spa
When installation is complete, run check again:
$ ./manage.py check System check identified some issues: WARNINGS: ?: papermerge.conf.py file was found. Following locations attempted /etc/papermerge.conf.py, papermerge.conf.py HINT: Create one of those files or point PAPERMERGE_CONFIG environment name to it. System check identified 1 issue (0 silenced).
To silence last warning, just create an empty
papermerge.conf.py file in project’s root,
we will turn our attention to that file little bit later:
It’s time to create administrative (superuser) user for your Papermerge instance:
The username and password you will type above you will use as login credentials. So, start server again (in case it is not running):
Point your web browser to
http://localhost:8000 and use superuser’s
username/password to login.
In a separate window, change to the project’s root directory again, but this
time, you should start the worker with
Remember to activate python virtual environment first:
Worker is the part which performs OCR process. For correct function of Papermerge you must have both parts running:
main app - the one which you start with
worker - the one which you start with
Now, you can start uploading documents. Remember that only PDF, TIFF, jpeg and png File Formats are supported.
By default, you don’t need configuration
However, if there is no configuration file - Papermerge will issue a warning.
In one of previous steps we created an empty configuration file:
What is the purpose of empty configuration file? It has one - it raises awareness of administrator that such file exists. The logic is following - if administrator created papermerge.conf.py => Papermerge application immediately concludes admin is aware of such file and silence the warning message. Again - it is absolutely OK to leave papermerge.conf.py file empty.
By default, in language dropdown menu, two languages will be displayed German and English. You can change that with following configuration:
Now four languages will be displayed in language dropdown.
In previous steps we installed english, spanish, french and german tesseract language packs (packages named tesseract-ocr-eng, tesseract-ocr-deu, tesseract-ocr-fra, tesseract-ocr-spa). For each language you want to OCR you need to have tesseract language pack installed.
Learn more Papermerge configurations in settings
Once you’ve tested things and are happy with the work flow, you should secure the installation and automate the process of starting the webserver and worker. Server Configurations explains different configuration scenarios of how you can make your bare metal setup - more stable.