Settings

Papermerge loads its settings from a configuration file. It first tries to read the following files, in order:

  1. /etc/papermerge.conf.py

  2. papermerge.conf.py in the current project directory

If neither of the above files exists, it checks the environment variable PAPERMERGE_CONFIG_FILE. If PAPERMERGE_CONFIG_FILE points to an existing file, Papermerge reads its configuration from that file.

If all of the above attempts fail, Papermerge uses default configuration values and issues a warning. To get rid of the warning message, create an empty configuration file papermerge.conf.py in the project root directory (right next to papermerge.conf.py.example) or at /etc/papermerge.conf.py.
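The lookup order above can be sketched in Python as follows. find_config_file is a hypothetical helper that mirrors the documented order; it is not Papermerge's own loader, and the exists parameter is only there so the logic can be tried without touching the real file system.

```python
import os
from typing import Callable, Mapping, Optional


def find_config_file(
    project_dir: str,
    environ: Mapping[str, str] = os.environ,
    exists: Callable[[str], bool] = os.path.isfile,
) -> Optional[str]:
    """Return the first configuration file found, or None.

    Hypothetical helper mirroring the documented lookup order;
    not Papermerge's actual implementation.
    """
    candidates = [
        "/etc/papermerge.conf.py",                         # 1. system-wide file
        os.path.join(project_dir, "papermerge.conf.py"),   # 2. project directory
    ]
    for path in candidates:
        if exists(path):
            return path
    # 3. Fall back to the PAPERMERGE_CONFIG_FILE environment variable.
    env_path = environ.get("PAPERMERGE_CONFIG_FILE")
    if env_path and exists(env_path):
        return env_path
    # Nothing found: Papermerge would use defaults and issue a warning.
    return None
```

A None result corresponds to the "defaults plus warning" case described above.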

The configuration file uses Python syntax.
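Because the file is plain Python, one way to read it is to execute it and collect its upper-case names. The sketch below uses the standard library's runpy.run_path; load_settings is a hypothetical helper, and Papermerge's own loader may work differently.

```python
import runpy


def load_settings(path: str) -> dict:
    """Execute a Python-syntax config file and return its UPPERCASE names.

    Hypothetical helper for illustration only.
    """
    namespace = runpy.run_path(path)
    # Keep only setting-style names such as MEDIA_DIR; imported modules,
    # lowercase helpers and dunder entries like __builtins__ are skipped.
    return {name: value for name, value in namespace.items() if name.isupper()}
```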

Django Settings

Papermerge is based on the Django web framework. If you know how Django projects are configured, you will find Papermerge’s configuration internals familiar. One particularly important thing to be aware of is the Django-specific DJANGO_SETTINGS_MODULE environment variable. Learn more about Django’s settings from the Django documentation.

PAPERMERGE_ Prefix

Where you place the Papermerge settings listed below makes a slight difference. In short, when placed in the papermerge.conf.py file they don’t need the PAPERMERGE_ prefix, while the very same setting placed in the Django settings file needs the PAPERMERGE_ prefix.

Papermerge settings can be placed in either:

  1. the papermerge.conf.py file

  2. the Django settings file (the one referenced by the DJANGO_SETTINGS_MODULE environment variable)

In the papermerge.conf.py file, settings have no PAPERMERGE_ prefix, because all (well, 90%) of them are Papermerge specific. The Django settings file, however, holds all sorts of settings: for Celery (prefixed with CELERY_), for allauth (prefixed with ACCOUNT_). Accordingly, settings specific to Papermerge are prefixed as well. Thus any setting listed below needs the PAPERMERGE_ prefix when added directly to the Django settings file.

The papermerge.conf.py configuration file exists for convenience; most of the time it is the only file you will need.
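The naming rule fits in a few lines of Python. For example, OCR_DEFAULT_LANGUAGE = "spa" in papermerge.conf.py becomes PAPERMERGE_OCR_DEFAULT_LANGUAGE = "spa" in the Django settings file. to_django_settings is a hypothetical helper that only illustrates the rule for the settings on this page; it is not part of Papermerge.

```python
PREFIX = "PAPERMERGE_"


def to_django_settings(conf: dict) -> dict:
    """Rename papermerge.conf.py settings to their Django settings names.

    Hypothetical helper: adds the PAPERMERGE_ prefix unless it is
    already present.
    """
    return {
        (name if name.startswith(PREFIX) else PREFIX + name): value
        for name, value in conf.items()
    }
```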

Main App, Worker or Both?

Some configuration variables are for the worker only (the part which OCRs documents, imports documents from a local directory or fetches them from an IMAP/email account), some are for the main app only, and some are for both. This distinction becomes apparent when you deploy the main app and the worker on separate hosts. It also matters for containerized deployments via Docker, because the main app and the worker usually run in different containers and thus have separate copies of the papermerge.conf.py file.

Each setting below states which part of the deployment it is addressed to. When it says “context: worker”, the variable applies only to the worker, i.e. it needs to be changed in papermerge.conf.py on the worker instance/host/container.

When a setting’s description states “context: main app, worker”, the setting needs to be changed on both the main app and the worker in order to function properly.
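When the main app and the worker have separate copies of papermerge.conf.py, the context labels tell you which settings go where. The sketch below encodes a few of the contexts documented on this page and splits one settings dictionary into per-role dictionaries; CONTEXTS and split_by_context are hypothetical names, and settings missing from CONTEXTS are simply dropped.

```python
# Contexts of some settings, copied from the descriptions on this page.
# Hypothetical helper data, not part of Papermerge.
CONTEXTS = {
    "DBDIR": ("main app",),
    "MEDIA_DIR": ("main app", "worker"),
    "STATIC_DIR": ("main app",),
    "IMPORTER_DIR": ("worker",),
    "OCR_LANGUAGES": ("main app", "worker"),
    "OCR_DEFAULT_LANGUAGE": ("main app", "worker"),
    "IMPORT_MAIL_HOST": ("worker",),
    "BINARY_OCR": ("worker",),
}


def split_by_context(conf: dict) -> dict:
    """Split one settings dict into dicts for the main app and the worker."""
    roles = {"main app": {}, "worker": {}}
    for name, value in conf.items():
        # Unknown settings have no documented context here and are skipped.
        for role in CONTEXTS.get(name, ()):
            roles[role][name] = value
    return roles
```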

Some of the most commonly used settings you might be interested in:

Paths and Folders

DBDIR

  • /path/to/papermerge/sqlite/db/

  • context: main app

Defines the location where db.sqlite3 will be saved. By default, the project’s local directory is used.

Example:

python
DBDIR = "/opt/papermerge/db/"

MEDIA_DIR

  • /path/to/media/

  • context: main app, worker

Defines the directory where all uploaded documents will be stored.

By default, a folder named media in the project’s local directory is used.
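Example (the path is only an illustration; since the context is main app, worker, set it in papermerge.conf.py on both):

```python
MEDIA_DIR = "/opt/papermerge/media/"
```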

STATIC_DIR

  • /path/to/collected/static/assets/

  • context: main app

Location where all static assets of the Papermerge project (JavaScript and CSS files) will be copied by the ./manage.py collectstatic command.

By default, a folder named static in the project’s local directory is used.

Example:

python
STATIC_DIR = "/opt/papermerge/static/"

Document Importer

The importer is a command line utility, invoked with ./manage.py importer, which imports all documents from a local directory.

IMPORTER_DIR

  • /path/where/documents/will/be/imported/from/

  • context: worker

Location on the local file system from which Papermerge will import documents.

Example:

python
IMPORTER_DIR = "/opt/papermerge/import/"

OCR

OCR_LANGUAGES

  • context: main app, worker

Additional languages for OCR. A dictionary where each key is an ISO 639-2/T language code and each value is the human-readable name of the language.

Example:

python
OCR_LANGUAGES = {
    'heb': 'hebrew',
    'jpn': 'japanese'
}

Note that Tesseract language data for both Hebrew and Japanese must be installed. You can list Tesseract’s available languages with the following command:

bash
$ tesseract --list-langs
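The same check can be scripted. This sketch runs tesseract --list-langs through the default BINARY_OCR path and reports OCR_LANGUAGES keys that have no installed data. The helper names are hypothetical, and skipping header lines that start with "List of" is an assumption about the command's output format.

```python
import subprocess


def installed_tesseract_langs(output: str) -> set:
    """Parse the text printed by `tesseract --list-langs` into language codes."""
    langs = set()
    for line in output.splitlines():
        line = line.strip()
        # Skip blank lines and the header ("List of available languages ...").
        if not line or line.startswith("List of"):
            continue
        langs.add(line)
    return langs


def missing_ocr_langs(ocr_languages: dict, binary: str = "/usr/bin/tesseract") -> set:
    """Return OCR_LANGUAGES keys for which Tesseract reports no language data."""
    result = subprocess.run(
        [binary, "--list-langs"], capture_output=True, text=True, check=True
    )
    # Look at both streams in case a Tesseract version writes the list to stderr.
    installed = installed_tesseract_langs(result.stdout + result.stderr)
    return set(ocr_languages) - installed
```

For instance, missing_ocr_langs({"heb": "hebrew", "jpn": "japanese"}) returning an empty set means both language packs are present.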

The default value of OCR_LANGUAGES is:

python
OCR_LANGUAGES = {
    "deu": "Deutsch",  # German language
    "eng": "English",
}

OCR_DEFAULT_LANGUAGE

  • context: main app, worker

By default, Papermerge performs OCR using the language specified by this option. Set it to the language used by the majority of your documents.

Example:

python
OCR_DEFAULT_LANGUAGE = "spa"

Default value is “deu” (German language).
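A combined example, assuming the default OCR language should also be one of the configured OCR_LANGUAGES (the defaults satisfy this, since "deu" is a key of the default dictionary). The default entries are repeated so the example stays valid whether OCR_LANGUAGES replaces or extends the default dictionary; Spanish Tesseract data ("spa") must be installed as well.

```python
# papermerge.conf.py (illustrative): make Spanish available and the default.
OCR_LANGUAGES = {
    "deu": "Deutsch",
    "eng": "English",
    "spa": "Spanish",
}
OCR_DEFAULT_LANGUAGE = "spa"
```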

I18n and Localization

LANGUAGE_CODE

  • context: main app

This option specifies the language of the user interface. Two values are supported:

  • en - English user interface

  • de - German user interface

English is the default fallback, i.e. if you don’t specify anything or specify an unsupported language, English will be used. Instead of en you can use en-US, en-GB etc. Instead of de you can use de-DE, de-AT etc. See here for the full list of available language codes. You can also translate Papermerge into your own language.

Default value: en

LANGUAGE_FROM_AGENT

If set to True, Papermerge uses the same language code as your web browser (agent). Browsers send an ‘Accept-Language’ header with their locale. For more, read here.

  • If True, it overrides the LANGUAGE_CODE option: whatever locale your web browser runs in will also be used by the Papermerge instance.

  • If False, the language code specified in the LANGUAGE_CODE option is used and the browser’s ‘Accept-Language’ header is ignored.

Default value: False
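To make the interplay of LANGUAGE_CODE and LANGUAGE_FROM_AGENT concrete, here is a simplified Python sketch. It is not Papermerge’s implementation: the function names are hypothetical, the Accept-Language parsing is deliberately minimal, and falling back to LANGUAGE_CODE when none of the browser’s languages is supported is an assumption.

```python
SUPPORTED_UI_LANGUAGES = {"en", "de"}  # the two values listed above
FALLBACK_LANGUAGE = "en"


def base_language(code: str) -> str:
    """Reduce a tag such as 'de-AT' or 'en_US' to its primary subtag."""
    return code.replace("_", "-").split("-")[0].strip().lower()


def parse_accept_language(header: str) -> list:
    """Return the tags of an Accept-Language header, most preferred first."""
    entries = []
    for index, part in enumerate(header.split(",")):
        tag, _, params = part.strip().partition(";")
        tag = tag.strip()
        if not tag:
            continue
        quality = 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0
        if quality <= 0:
            continue  # q=0 means "not acceptable"
        # Negative quality sorts best first; index keeps header order on ties.
        entries.append((-quality, index, tag))
    return [tag for _, _, tag in sorted(entries)]


def resolve_ui_language(language_code, language_from_agent=False, accept_language=""):
    """Pick the UI language following the rules described above (sketch)."""
    candidates = []
    if language_from_agent:
        candidates.extend(parse_accept_language(accept_language))
    # Assumption: LANGUAGE_CODE is still consulted if no browser language fits.
    candidates.append(language_code)
    for code in candidates:
        base = base_language(code)
        if base in SUPPORTED_UI_LANGUAGES:
            return base
    return FALLBACK_LANGUAGE  # unsupported or missing: English
```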

Database

By default, Papermerge uses an SQLite3 database (a file located in DBDIR). Alternatively, you can use a PostgreSQL or MySQL/MariaDB database. The following options configure PostgreSQL and MySQL/MariaDB database connections.

DBTYPE

context: main app

Database type (if different from SQLite3). For a PostgreSQL database, use one of the following values:

  • pg

  • postgre

  • postgres

  • postgresql

For a MySQL/MariaDB database (they share the same database backend), use one of the following values:

  • my

  • mysql

  • maria

  • mariadb

Example:

python
DBTYPE = "mysql"
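The aliases above boil down to two Django database backends. The sketch below shows one plausible mapping onto Django’s ENGINE strings (django.db.backends.postgresql, django.db.backends.mysql, and django.db.backends.sqlite3 when DBTYPE is not set). django_engine is a hypothetical helper; the case-insensitive match and the error on unknown values are assumptions, not documented Papermerge behavior.

```python
from typing import Optional

# Aliases accepted by DBTYPE, as listed above.
POSTGRES_ALIASES = {"pg", "postgre", "postgres", "postgresql"}
MYSQL_ALIASES = {"my", "mysql", "maria", "mariadb"}


def django_engine(dbtype: Optional[str] = None) -> str:
    """Map a DBTYPE value to a Django database ENGINE string (sketch)."""
    if dbtype is None:
        return "django.db.backends.sqlite3"  # default: SQLite3 file in DBDIR
    value = dbtype.lower()
    if value in POSTGRES_ALIASES:
        return "django.db.backends.postgresql"
    if value in MYSQL_ALIASES:
        # MySQL and MariaDB share the same Django backend.
        return "django.db.backends.mysql"
    raise ValueError(f"Unsupported DBTYPE: {dbtype!r}")
```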

DBUSER

context: main app

Database user for the connection.

Example:

python
DBUSER = "john"

DBNAME

context: main app

Database name. Default value is papermerge.

DBHOST

context: main app

Database host. Default value is localhost.

DBPORT

context: main app

Database port. The port must be specified as an integer, without string quotes.

Example:

python
DBPORT = 5432

Default value is 5432 for PostgreSQL and 3306 for MySQL/MariaDB.

DBPASS

context: main app

Password for connecting to the database. Default value is an empty string.
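Putting the database options together, a papermerge.conf.py for a PostgreSQL server might look like the following. All values are placeholders; DBNAME and DBHOST could be omitted when the defaults (papermerge, localhost) fit.

```python
# papermerge.conf.py on the main app (illustrative values only)
DBTYPE = "postgresql"
DBNAME = "papermerge"
DBUSER = "papermerge"
DBPASS = "change-me"
DBHOST = "db.example.com"
DBPORT = 5432  # an integer, not the string "5432"
```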

Email

You can import documents directly from an email/IMAP account. All email importer settings must be defined in papermerge.conf.py on the worker side. Read more about ingesting documents via an IMAP account in the document consumption chapter.

IMPORT_MAIL_HOST

context: worker

IMAP server host.

IMPORT_MAIL_USER

context: worker

Email account/IMAP user. The IMAP user needs read and write access to the IMAP “INBOX” folder.

IMPORT_MAIL_PASS

context: worker

Email account/IMAP password.

IMPORT_MAIL_INBOX

context: worker

IMAP folder to read email from. Default value for this setting is “INBOX”.

IMPORT_MAIL_BY_USER

context: worker

Whether to deliver emails sent from a user’s own email address into that user’s inbox folder. This capability of assigning attached documents to the correct user’s inbox is called email routing and is described at length in One IMAP Account for Many Papermerge Users.

IMPORT_MAIL_BY_SECRET

context: worker

Whether to deliver emails containing a user’s own secret into that user’s inbox folder. This capability of assigning attached documents to the correct user’s inbox is called email routing and is described at length in One IMAP Account for Many Papermerge Users.

IMPORT_MAIL_DELETE

context: worker

Whether to delete emails after processing.
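A worker-side papermerge.conf.py combining the email options might look like this. Host, user and password are placeholders, and the True/False values assume that IMPORT_MAIL_BY_USER, IMPORT_MAIL_BY_SECRET and IMPORT_MAIL_DELETE take booleans.

```python
# papermerge.conf.py on the worker (all values are placeholders)
IMPORT_MAIL_HOST = "imap.example.com"
IMPORT_MAIL_USER = "documents@example.com"
IMPORT_MAIL_PASS = "app-password"
IMPORT_MAIL_INBOX = "INBOX"      # the default folder
IMPORT_MAIL_BY_USER = True       # route by the sender's email address
IMPORT_MAIL_BY_SECRET = False    # route by a user's secret
IMPORT_MAIL_DELETE = False       # keep emails after processing
```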

Binary Dependencies

Papermerge uses a number of open source third-party tools for various purposes. One of the most obvious examples is tesseract, used to OCR documents (extract text from binary image files). A less obvious example is the pdfinfo utility provided by the poppler-utils package: pdfinfo is used to count the number of pages in a PDF document. The settings listed below allow you to override the path to a specific dependency.

BINARY_OCR

context: worker

Full path to the tesseract binary/executable. Tesseract is used for OCR operations, i.e. extracting text from binary image files (jpeg, png, tiff). Default value is:

python
BINARY_OCR = "/usr/bin/tesseract"

BINARY_FILE

context: main app, worker

The file utility, used to determine the MIME type of a given file. Default value is:

python
BINARY_FILE = "/usr/bin/file"

BINARY_CONVERT

context: main app, worker

The convert utility is provided by the ImageMagick package. It is used for resizing images. Default value is:

python
BINARY_CONVERT = "/usr/bin/convert"

BINARY_PDFTOPPM

context: main app, worker

Provided by Poppler Utils. Used to extract images from a PDF file. Default value is:

python
BINARY_PDFTOPPM = "/usr/bin/pdftoppm"

BINARY_PDFINFO

context: main app, worker

Provided by Poppler Utils. Used to get the page count of a PDF file. Default value is:

python
BINARY_PDFINFO = "/usr/bin/pdfinfo"

BINARY_STAPLER

context: main app, worker

Provided by stapler. This external tool is used to reorder, cut/paste and delete pages within a PDF document.

Default value is:

python
BINARY_STAPLER = "/usr/bin/stapler"

Depending on your system and the way you installed stapler, you may need to adjust the BINARY_STAPLER path.
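Before adjusting any of these paths, it can help to check which configured binaries actually exist. The sketch below tests each documented default path with os.access and, for missing ones, suggests whatever shutil.which finds on PATH under the same program name. check_binaries is a hypothetical helper, not part of Papermerge.

```python
import os
import shutil

# Default paths as documented in this section.
DEFAULT_BINARIES = {
    "BINARY_OCR": "/usr/bin/tesseract",
    "BINARY_FILE": "/usr/bin/file",
    "BINARY_CONVERT": "/usr/bin/convert",
    "BINARY_PDFTOPPM": "/usr/bin/pdftoppm",
    "BINARY_PDFINFO": "/usr/bin/pdfinfo",
    "BINARY_STAPLER": "/usr/bin/stapler",
}


def check_binaries(binaries: dict = DEFAULT_BINARIES) -> dict:
    """Return {setting: suggestion} for paths that are not executable files.

    The suggestion is the path shutil.which() finds on PATH for the same
    program name, or None if nothing is found.
    """
    problems = {}
    for setting, path in binaries.items():
        if os.path.isfile(path) and os.access(path, os.X_OK):
            continue  # configured path is fine
        problems[setting] = shutil.which(os.path.basename(path))
    return problems
```

If stapler lives somewhere on PATH other than /usr/bin, the suggestion returned for BINARY_STAPLER is a candidate value for that setting.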