+ - 0:00:00
Notes for current slide

Presenter notes contain extra information which might be useful if you intend to use these slides for teaching.

Press P again to switch presenter notes off

Press C to create a new window where the same presentation will be displayed. This window is linked to the main window. Changing slides on one will cause the slide to change on the other.

Useful when presenting.

Notes for next slide
  • Galaxy objects like users, histories, datasets, and workflows are stored in the database.
  • All job information is likewise stored there.
  • User and reference data, however, are stored outside of the database.
  • You will want to backup both user data and your database.



Galactic Database



last_modification Updated:   purlPURL: gxy.io/GTN:S00008

video-slides Video slides | text-document Plain-text slides |

Tip: press P to view the presenter notes | arrow-keys Use arrow keys to move between slides
1 / 15

Presenter notes contain extra information which might be useful if you intend to use these slides for teaching.

Press P again to switch presenter notes off

Press C to create a new window where the same presentation will be displayed. This window is linked to the main window. Changing slides on one will cause the slide to change on the other.

Useful when presenting.

Galactic Database

Galaxy uses a database for:

  • Galaxy objects and all their relations (users, histories, datasets, workflows)
  • Job state persistence, job dispatching

Galaxy does not use a database for:

  • Dataset contents: files on disk
2 / 15
  • Galaxy objects like users, histories, datasets, and workflows are stored in the database.
  • All job information is likewise stored there.
  • User and reference data, however, are stored outside of the database.
  • You will want to backup both user data and your database.

Defaults

  • Galaxy uses the SQLAlchemy database abstraction layer. This allows for different database servers to be plugged in.
  • By default Galaxy automatically creates and uses an SQLite database during the first startup.
    • The database is in the file database/universe.sqlite
3 / 15
  • Galaxy uses SQLAlchemy for interacting with databases.
  • This allows Galaxy to transparently use sqlite or postgres.

Choices

  • SQLite
    • Useful for single-user Galaxy or development.
  • PostgreSQL
    • The recommended standard for anything serious.
  • MySQL
    • Supported by SQLAlchemy but Galaxy is not tested against it.
4 / 15
  • sqlite is fine in development, but should not be used for production.
  • postgres is the best for any production server.
  • Do not use mysql if possible.

Sizing

Galaxy rarely deletes from the database, most objects are marked deleted.

Allocate at least 20 GB of disk to start, 50+ GB if expanding would be difficult.

8-16 GB memory should be sufficient.

Recommended: Run PostgreSQL on a different server for resource isolation.

5 / 15
  • Data is not removed from the database, so plan accordingly.
  • Allocated at least 20 GB of disk to start, and 50 if expanding later is difficult.
  • The RAM usage is usually not significant.
  • We recommend running the database on a separate server from Galaxy for better isolation.

Configuration

database_connection is specified as a database URL in galaxy.yml

  • Default SQLite: sqlite:///./database/universe.sqlite?isolation_level=IMMEDIATE

  • Local PostgreSQL (socket): postgresql:///<db_name>[?host=/var/run/postgresql]

  • Network PostgreSQL: postgresql://[user][:password]@<host>[:5432]/<db_name>

6 / 15
  • The parameter database_connection tells galaxy where the database is.
  • There are several styles, depending on how you connect to your database.

New Database

On first startup with an empty database, Galaxy creates its schema

7 / 15
  • On first startup with an empty database, Galaxy creates its schema.

Migrations

Changes in the Galaxy DB model (when upgrading Galaxy) are captured incrementally in the form of atomic scripts.

Each script can both upgrade and downgrade a DB.

$ ./manage_db.sh upgrade
$ ./manage_db.sh downgrade --version=132
8 / 15
  • Whenever Galaxy is upgraded, there may be changes to the database schema.
  • These are stored in migration scripts, and you can use the manage db script to up or downgrade.

Tuning - Pool

If the server logs errors about not having enough database pool connections.

Galaxy config option default value usegalaxy.org value
database_engine_option_pool_size 5 10
database_engine_option_max_overflow 10 20

Values for usegalaxy.org

9 / 15
  • Databases have a limited number of connection slots.
  • Galaxy can pool connections, and re-use existing connections from a pool when it needs to query the DB.
  • There are a couple of options for controlling pooling.
  • The defaults are generally fine until you see errors.

Tuning - Server-side cursors

If large database query results are causing memory or response time issues in the Galaxy process, leave it on server (PostgreSQL only, recommended).

Galaxy config option default value usegalaxy.org value
database_engine_option_server_side_cursors false true
10 / 15
  • Large queries may slow down Galaxy or the DB.
  • You can enable server side cursors to help with this.

Tuning - Slow query logging

Queries slower than this threshold (in s) will be logged at debug level.

Galaxy config option default value usegalaxy.org value
slow_query_log_threshold 0 2
11 / 15
  • If you notice slow responses, you can enable slow query logging.
  • This will print a message in your Galaxy log if a query takes more than a specified time period.
  • This can be useful to help you debug issues.

Tuning - TS install database

Galaxy can track Tool Shed data in a separate DB.

Galaxy config option default value usegalaxy.org value
install_database_connection value of database_connection SQLite DB in CVMFS

All other database config options but prefixed with install_ are also available.

This allows:

  • Bootstrapping fresh Galaxy instances with prebuilt/tested tool sets
  • Atomic installation/rollback (esp. with SQLite: backup and restore DB file)
12 / 15
  • Galaxy can track tool shed data in a separate database
  • This can enable deploying Galaxies with prebuilt tool sets

Access through model

Python script to access Galaxy's database layer via the Galaxy model.

(venv)$ python -i scripts/db_shell.py --config config/galaxy.yml
>>> new_user = User('foo@example.org', 'secret')
>>> new_user.username = 'foo'
>>> sa_session.add(new_user)
>>> sa_session.flush()
>>> sa_session.query(User).all()
13 / 15
  • You can use the Galaxy python models to interact with the database.
  • After activating the galaxy virtual environment, you can use the DB shell script to interface.
  • This can allow scripting tasks like resetting passwords.

Useful queries

Captured in gxadmin (tutorial)

14 / 15
  • Many useful DB queries are captured in gxadmin
  • Look into this if you need to query the database for information like running jobs or recent users.

Thank You!

This material is the result of a collaborative work. Thanks to the Galaxy Training Network and all the contributors!

page logo

Tutorial Content is licensed under Creative Commons Attribution 4.0 International License.

15 / 15

Galactic Database

Galaxy uses a database for:

  • Galaxy objects and all their relations (users, histories, datasets, workflows)
  • Job state persistence, job dispatching

Galaxy does not use a database for:

  • Dataset contents: files on disk
2 / 15
  • Galaxy objects like users, histories, datasets, and workflows are stored in the database.
  • All job information is likewise stored there.
  • User and reference data, however, are stored outside of the database.
  • You will want to backup both user data and your database.
Paused

Help

Keyboard shortcuts

, , Pg Up, k Go to previous slide
, , Pg Dn, Space, j Go to next slide
Home Go to first slide
End Go to last slide
Number + Return Go to specific slide
b / m / f Toggle blackout / mirrored / fullscreen mode
c Clone slideshow
p Toggle presenter mode
t Restart the presentation timer
?, h Toggle this help
Esc Back to slideshow