RStudio in Galaxy
Author(s): Bérénice Batut, Fotis E. Psomopoulos, Toby Hodges
Editor(s): Armin Dadras
Overview
Questions:
- How can I manipulate data using R in Galaxy?
Objectives:
- Launch RStudio in Galaxy
Time estimation: 3 hours
Published: Oct 8, 2019
Last modification: Jan 21, 2025
License: Tutorial Content is licensed under Creative Commons Attribution 4.0 International License. The GTN Framework is licensed under MIT.
PURL: https://gxy.io/GTN:T00157
Rating: 4.0 (1 recent rating, 12 all time)
Revision: 14
This tutorial will introduce you to running RStudio in Galaxy.
Comment: This tutorial is largely based on the Carpentries “Intro to R and RStudio for Genomics” lesson.
RStudio is an Integrated Development Environment (IDE). Like most IDEs, it provides a graphical interface to R, making it more user-friendly and providing dozens of useful features. We will introduce additional benefits of using RStudio as you work through the lesson. Here we are specifically using RStudio Server, a version of RStudio that is accessed in your web browser. RStudio Server has the same features as the desktop version of RStudio, which you can download as standalone software.
RStudio
Opening up RStudio in Galaxy is easy:
Hands-on: Launch RStudio
Depending on which server you are using, you may be able to run RStudio directly in Galaxy. If that is not available, RStudio Cloud can be an alternative.
Currently, RStudio in Galaxy is only available on UseGalaxy.eu and UseGalaxy.org.
- Open the RStudio tool in Galaxy
- Click Run Tool
- The tool will start running and will keep running until you stop it
- Click on the “User” menu at the top, go to “Active InteractiveTools”, and locate the RStudio instance you started
If RStudio is not available on the Galaxy instance:
- Register for RStudio Cloud, or log in if you already have an account
- Create a new project
You should now be looking at a page with the RStudio interface.
Creating your first R script
Now that we are ready to start exploring R, we will want to keep a record of the commands we are using. To do this, we can create an R script.
Hands-on: Create an R script
- Click the File menu
- Select New File
- Click on R Script
A new panel appears on the top left. Before we go any further, you should save your script.
Hands-on: Save an R script
Click the save icon (Save current document) in the bar above the first line of the script editor
Alternatively, you can also:
- Click the File menu and select Save
- Press Ctrl+S (Cmd+S on macOS)
In the Save File window that opens, name your file genomics_r_basics
The new script genomics_r_basics.R should appear under Files in the bottom-right panel. By convention, R scripts end with the file extension .R.
Overview and customization of the RStudio layout
Here are the major windows (or panels) of the RStudio environment:
- Source: This panel is where you will write and view R scripts. Some outputs (such as when you view a dataset using View()) will appear as a tab here.
- Console/Terminal: This is where you see the execution of commands. This is the same display you would see if you were using R at the command line without RStudio. You can work interactively (i.e. enter R commands here), but for the most part we will run a script (or lines in a script) in the Source panel and watch their execution and output here.
- Environment/History: RStudio shows you here which datasets and objects (variables) you have created and which are defined in memory. You can also see some properties of objects/datasets, such as their type and dimensions. The History tab contains a history of the R commands you’ve executed.
- Files/Plots/Packages/Help: This multipurpose panel shows you the contents of directories on your computer (with RStudio Server, the machine running RStudio) and more:
  - Files: You can also use this tab to navigate and set the working directory
  - Plots: This tab will show the output of any plots generated
  - Packages: In this tab you will see which packages are actively loaded, or you can attach installed packages
  - Help: It will display help files for R functions and packages
All of the panels in RStudio have configuration options. For example, you can minimize or maximize a panel, or resize panels by dragging the space between them. The most important customization options for panel layout are in the View menu. Other options, such as font sizes and colors/themes, are in the Tools menu under Global Options.
Comment: Working with R at the terminal
Although we won’t be working with R at the terminal, there are lots of reasons to do so.
For example, once you have written an R script, you can run it at any Linux or Windows terminal without starting RStudio. We don’t want you to get confused: RStudio runs R, but R is not RStudio.
For more on running an R script at the terminal, see the dedicated Software Carpentry lesson.
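As a minimal sketch of a script that works both in RStudio and at the terminal, the hypothetical file round_args.R below reads its inputs with commandArgs() when launched as `Rscript round_args.R 3.14159 2`, and falls back to fixed example values otherwise (for instance when run line by line in RStudio). The file name and the fallback values are illustrative assumptions.

```r
# round_args.R (hypothetical file name)
# From a terminal, run for example:  Rscript round_args.R 3.14159 2

# commandArgs(trailingOnly = TRUE) returns only the arguments given after
# the script name, as a character vector (empty if there are none)
cli_args <- commandArgs(trailingOnly = TRUE)

# Fall back to example values when fewer than two arguments were supplied,
# e.g. when the script is run inside RStudio
if (length(cli_args) < 2) {
  cli_args <- c("3.14159", "2")
}

value  <- as.numeric(cli_args[1])
digits <- as.integer(cli_args[2])

rounded <- round(value, digits = digits)
cat("Rounded value:", rounded, "\n")
```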
How to call functions in R without needing to master them
A function in R (or any computing language) is a short program that takes some input and returns some output.
Hands-on: Calling a function in R
- Type date() in the Console panel
- Press Enter
- Check what is displayed in the Console panel
You should obtain something like:
[1] "Tue Mar 26 15:12:24 2019"
Comment: Display of function calls in the tutorial
From now on, the tutorial will display a function call and its output like this:
> date()
[1] "Tue Mar 26 15:12:24 2019"
Another way to execute these functions is to write them in the script we just created, which also keeps track of the functions we ran.
Hands-on: Running a function via a script
- Type date() in the Script panel
- Click on Run the current line or selection, or press Ctrl+Enter (Cmd+Enter on macOS)
You should see the same output in the Console panel as when you ran the function directly in the console.
Now we would like to record what this function does.
Hands-on: Comment in a script
- On the line before date(), write the comment:
# Gives the current date
- Select both lines
- Execute them
- Check the output
The comment line is displayed in the console but not executed.
Question: What do these functions do?
Try the following functions by writing them in your script. See if you can guess what they do, and make sure to add comments to your script about their assumed purpose.
- dir()
- sessionInfo()
- Sys.time()
Solution:
- dir() lists the files in the working directory
- sessionInfo() gives the version of R and additional information, including the attached packages
- Sys.time() gives the current date and time
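For example, the commented script might end up looking like the sketch below, with one comment above each call. The getwd() call is an extra not included in the question; it prints the working directory whose contents dir() lists.

```r
# Lists the files and folders in the current working directory
dir()

# Prints the path of the current working directory (the one dir() lists)
getwd()

# Reports the R version, the platform and the attached packages
sessionInfo()

# Gives the current date and time
Sys.time()
```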
Warning: Commands are case sensitive!
In R, commands are case sensitive, so be careful when you type them.
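A quick way to see case sensitivity in action is exists(), which reports whether an object with exactly the given name can be found. Sys.time is a real base R function; sys.time, with a lower-case s, is not defined (this assumes you have not created an object with that name yourself).

```r
# exists() checks whether an object with exactly this name can be found
exists("Sys.time")   # TRUE: the function is spelled with a capital S
exists("sys.time")   # FALSE: a lower-case "s" names a different, undefined object

# Calling the misspelled name fails; tryCatch() captures the error message
# instead of stopping the script
tryCatch(sys.time(), error = function(e) conditionMessage(e))
```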
You have hopefully noticed a pattern: an R function has three key properties:
- A name (e.g. dir, getwd) first
- A pair of () after the name
- 0 or more arguments inside the parentheses
An argument may be a specific input for your function and/or may modify the function’s behavior. For example, the function round() will round a number with a decimal:
# This will round a number to the nearest integer
> round(3.14)
[1] 3
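The pair of parentheses is what actually calls the function. As a small illustration, typing the name alone returns the function object itself, which can even be stored under another name and called from there; the name rounder below is purely illustrative.

```r
# Without parentheses, round is just the function object, not a result
is.function(round)   # TRUE

# Store the function under a new (illustrative) name ...
rounder <- round

# ... and call it with parentheses and an argument, as with round() itself
rounder(3.14)        # 3

# Zero arguments is also valid for some functions: date() needs none
date()
```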
Getting help
What if you wanted to round to one decimal place? round() can do this, but you may first need to read the help to find out how.
To see the help, you need to enter a ? in front of the function name. The Help tab (in the bottom-right panel) will show you information.
Hands-on: Get help
- Add a ? in front of the function name to see the help
> ?round
- Check the Help tab
In R, these help pages are not the same as vignettes, which are longer, tutorial-style documents that some packages provide. Often a help page contains too much information, but you will slowly learn how to read and make sense of them:
- Checking the Usage or Examples headings is often a good place to look first
- Under Arguments, we can also see what arguments we can pass to this function to modify its behavior
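If you prefer a function call to the ? shortcut (for example inside a script), help() does the same job, and example() runs the code from the Examples section mentioned above. This is a small base R sketch; nothing here is specific to Galaxy.

```r
# These three forms open the same help page in the Help tab:
#   ?round
#   help(round)
#   help("round")
# help() returns an object describing the matching help file(s); assigning it
# to a name stores it without displaying the page
h <- help("round")
class(h)             # "help_files_with_topic"

# example() runs the code from the Examples section of a help page, which is
# often the quickest way to see what a function does
example("round")
```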
We can also see the arguments of a function without opening its help.
Hands-on: Get the function arguments
- Type args() to see the arguments of the round function
> args(round)
function (x, digits = 0)
NULL
round() takes two arguments:
- x: the number to be rounded
- digits: integer indicating the number of decimal places to be used
The = sign indicates that a default (in this case 0) is already set.
Since x is not set, round() requires us to provide it, in contrast to digits, where R will use the default value 0 unless you explicitly provide a different value.
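args() returns a function with an empty body, so its argument list can also be inspected programmatically with formals(). This small sketch does that for round(); depending on your R version the list may include extra entries such as ..., which is why the comment only promises x and digits.

```r
# args() also works for built-in (primitive) functions such as round();
# formals() then turns its result into a named list of arguments
arg_list <- formals(args(round))

names(arg_list)      # the argument names, including "x" and "digits"
arg_list$digits      # the default value of digits: 0

# x has no default value, which is why round() cannot be called without it
```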
We can explicitly set the digits parameter when we call the function.
Hands-on: Call a function with several parameters
- Call round with 2 arguments:
  - x: 3.14159
  - digits: 2
> round(3.14159, digits = 2)
[1] 3.14
- Call round with 2 arguments:
  - 3.14159
  - 2
> round(3.14159, 2)
[1] 3.14
R accepts what we call “positional arguments”. If you pass arguments to a function separated by commas, R assumes that they are in the order you saw when we used args(). In the call above, that means that x is 3.14159 and digits is 2.
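The calls below put named and positional arguments side by side, and show what happens when positional arguments are given in the wrong order. The last line uses signif(), a separate base R function, to contrast decimal places with significant digits.

```r
# Named arguments: R matches each value to the argument with that name
round(3.14159, digits = 2)        # 3.14

# Positional arguments: matched in the order shown by args(round)
round(3.14159, 2)                 # 3.14

# With names, the order no longer matters
round(digits = 2, x = 3.14159)    # 3.14

# Swapping positional arguments silently changes their meaning:
# here x = 2 and digits = 3.14159, so the result is just 2
round(2, 3.14159)                 # 2

# Rounding to significant digits (rather than decimal places) is done by a
# separate function, signif()
signif(3.14159, 3)                # 3.14
```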
Finally, what if you are using ? to get help for a function in a package that is not installed on your system, such as when you are running a script that has dependencies?
Hands-on: Get help for a missing function
- Ask for help for geom_point()
- Check the generated error
> ?geom_point()
Error in .helpForCall(topicExpr, parent.frame()) :
  no methods for ‘geom_point’ and no documentation for it as a function
- Type ??geom_point()
- Check the Help tab
Using two question marks (here ??geom_point()), R searches the documentation of the packages installed on your computer and returns the results in the Help tab.
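Inside a script, you can check for a missing package before asking for its help page. The sketch below assumes ggplot2 (the package that provides geom_point()) may or may not be installed, and only uses base R functions.

```r
# requireNamespace() returns TRUE if a package is installed (loading it
# without attaching it), FALSE otherwise; quietly = TRUE suppresses messages
has_ggplot2 <- requireNamespace("ggplot2", quietly = TRUE)

if (has_ggplot2) {
  # help() can point directly at a topic inside a specific package
  h <- help("geom_point", package = "ggplot2")
} else {
  message("ggplot2 is not installed; install.packages(\"ggplot2\") would add it from CRAN")
}

# ?? is shorthand for help.search(); this call performs the same search
# through the documentation of all installed packages
res <- help.search("geom_point")
```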
Finally, you may think there should be a function, for example for a statistical test, but not know what it is called in R or what functions are available. In that case, you can search for it.
Hands-on: Search for a function
- Type help.search('chi-Squared test')
- Check the Help panel
A list of potentially relevant functions related to “chi-Squared test” is displayed. You can click on one of them to see its help. Remember to put your search query in quotes inside the function’s parentheses.
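help.search() searches the text of the installed documentation. A complementary base R tool, apropos(), searches the names of the objects currently available in your session, which is handy when you remember part of a function's name. A minimal sketch:

```r
# apropos() lists objects on the search path whose names match a pattern;
# here: every available function with "test" in its name
apropos("test")

# The chi-squared and Student t tests are among them (both from stats)
c("chisq.test", "t.test") %in% apropos("test")   # TRUE TRUE

# help.search() can also be restricted to certain fields, e.g. titles only
res <- help.search("chi-squared", fields = "title")
```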
Question: Search for R functions
Search the R functions for the following statistical functions:
- Student-t test
- mixed linear model
While your search results may return several tests, here are a few you might find:
- Student-t test: stats::TDist, stats::t.test
- mixed linear model: stats::lm.glm
We will not discuss now where to look for the libraries and packages that contain the functions you want to use. For now, be aware that two important ones are:
- CRAN: the main repository for R
- Bioconductor: a popular repository for bioinformatics-related R packages
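To check whether the packages a script needs are already installed, whichever repository they come from, you can test each one with requireNamespace(). In this sketch, ggplot2 (CRAN) and DESeq2 (Bioconductor) are only example package names; the install commands are left as comments because they download software.

```r
# Hypothetical list of packages a script depends on: one from CRAN and
# one from Bioconductor
needed <- c("ggplot2", "DESeq2")

# TRUE for each package that is installed, FALSE otherwise
installed <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)
installed

# Missing CRAN packages can be installed with install.packages(); Bioconductor
# packages are usually installed through the BiocManager package:
# install.packages("ggplot2")
# install.packages("BiocManager"); BiocManager::install("DESeq2")
```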
RStudio contextual help
Here is one last bonus we will mention about RStudio. It’s difficult to remember all of the arguments and definitions associated with a given function.
Hands-on: Search for a function
Stopping RStudio
RStudio will keep running until you stop it, so you can always come back to your analysis later. However, once you are finished with your analysis, you should save the work you did within RStudio by exporting any files you created back to your Galaxy history, as well as the log describing all the commands you ran. Then you can safely shut down RStudio.
Hands-on: Stop RStudio
When you have finished your R analysis, it’s time to stop RStudio.
- First, save your work into Galaxy to ensure reproducibility:
  - You can use gx_put(filename) to save individual files by supplying the filename
  - You can use gx_save() to save the entire analysis transcript and any data objects loaded into your environment
- Once you have saved your data, you can stop RStudio in one of two ways:
  - Delete the corresponding history dataset named RStudio, which is shown in the “in progress” (yellow) state, OR
  - Click on the “User” menu at the top, go to “Active InteractiveTools”, locate the RStudio instance you started, select the corresponding box, and click the “Stop” button at the bottom
Interaction between RStudio and Galaxy
Getting data in and out from Galaxy
Import Data from the Galaxy History to RStudio
To import a dataset from the history into RStudio, you need the path to that file. To get it, you can use the gx_get() function with the dataset ID (its number in the Galaxy history). For example, if you want to import the dataset with history number 7 into RStudio, you can get the path to the file with:
gx_get(7)
It is important to know that the gx_get() function copies the data from the Galaxy history to the RStudio session and returns the path to the copied file. You then need to use an appropriate R function to read the file. For example, you can pass the path to a function that reads tables, such as read.table (base R) or read_tsv (from the readr package). Let’s assume that dataset 7 in the history is a tab-separated table (TSV) with a header row, and that you want to read it into RStudio. You can do it as follows:
table_name <- read.table(gx_get(7), sep = "\t", header = TRUE)
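Outside Galaxy, gx_get() is not available, so the sketch below stands in for dataset 7 with a small, made-up TSV file in a temporary directory. It shows why sep and header matter with read.table(), and that read.delim() (base R) is a shortcut whose defaults already match a tab-separated file with a header row.

```r
# Hypothetical example data standing in for Galaxy dataset 7.
# In RStudio on Galaxy you would instead use:  path <- gx_get(7)
path <- tempfile(fileext = ".tsv")
writeLines(c("gene\tcount", "geneA\t10", "geneB\t25"), path)

# read.table() splits on any whitespace and assumes there is no header by
# default, so state both the tab separator and the header row explicitly
tsv_table <- read.table(path, sep = "\t", header = TRUE)

# read.delim() wraps read.table() with sep = "\t" and header = TRUE as defaults
tsv_delim <- read.delim(path)

str(tsv_table)
all.equal(tsv_table, tsv_delim)   # TRUE for this simple file
```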
Export Data from the RStudio
You can export the R history and all objects from RStudio to Galaxy as follows (analysis_17.01.2025 is an arbitrary name):
gx_save(session_name = "analysis_17.01.2025")
The saved data objects can later be loaded back into R.
If you want to export just one file from your R environment to your Galaxy history, you first need to write the object from memory to a file and then use the path to that file to export it. For example, if you want to export a table called results from RStudio to Galaxy, you can do the following:
write.csv(results, "./result.csv", row.names = FALSE)  # row.names = FALSE omits the row names; use TRUE to keep them
gx_put("result.csv")
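The sketch below shows the write-then-export pattern without Galaxy: it uses a small, made-up results table and a temporary directory, and leaves gx_put() as a comment since it only exists inside RStudio on Galaxy. It also shows what the row.names argument changes in the written file.

```r
# Hypothetical results table standing in for your own analysis output
results <- data.frame(gene = c("geneA", "geneB"), log2fc = c(1.5, -0.8))

# Write to a temporary directory here; in RStudio on Galaxy you would write
# to the working directory and then export the file
out_file <- file.path(tempdir(), "result.csv")
write.csv(results, out_file, row.names = FALSE)
# gx_put(out_file)   # only available inside RStudio on Galaxy

# Reading the file back gives the same table
reread <- read.csv(out_file)
all.equal(results, reread)    # TRUE

# With the default row.names = TRUE, the row names become an extra, unnamed
# first column, which read.csv() names "X"
with_names <- file.path(tempdir(), "result_with_rownames.csv")
write.csv(results, with_names)
names(read.csv(with_names))   # "X" "gene" "log2fc"
```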