.. _tutorial: =============== Getting started =============== This tutorial is for bioinformaticians. It will help you install the ReSDK and explain some basic commands. We will connect to an instance of `Genialis server`_, do some basic queries, and align raw reads to a genome. .. _Genialis server: https://app.genialis.com Installation ============ Installing is easy, just make sure you have Python_ and pip_ installed on your computer. Run this command in the terminal (CMD on Windows):: pip install resdk .. _Python: https://www.python.org/downloads/ .. _pip: https://pip.pypa.io/en/stable/installing/ .. note:: If you are using Apple silicon you should use Python version 3.10 or higher. Registration ============ The examples presented here require access to a public `Genialis Server`_ that is configured for the examples in this tutorial. Some parts of the documentation will work for registered users only. Please `request a Demo`_ on Genialis Server before you continue, and remember your username and password. .. _`Genialis Server`: https://app.genialis.com .. _`request a Demo`: http://genial.is/Demo-Request Connect to Genialis Server ========================== Start the Python interpreter by typing ``python`` into the command line. You'll recognize the interpreter by '>>>'. Now we can connect to the Genialis Server: .. literalinclude:: files/start.py :lines: 2-9 .. note:: If you omit the ``login()`` line you will be logged as anonymous user. Note that anonymous users do not have access to the ful set of features. The ``login()`` call will perform interactive login in a web browser. If you wish to log in as a different user, open the link in an incognito window. .. note:: When connecting to the server through an interactive session, we suggest you use the ``resdk.start_logging()`` command. This allows you to see important messages (*e.g.* warnings and errors) when executing commands. .. note:: To avoid copy-pasting of the commands, you can :download:`download all the code ` used in this section. Query data ========== Before we start querying data on the server we should become familiar with what a data object is. Everything that is uploaded or created (via processes) on a server is a data object. The data object contains a complete record of the processing that has occurred. It stores the inputs (files, arguments, parameters...), the process (the algorithm) and the outputs (files, images, numbers...). Let's count all data objects on the server that we can access: .. literalinclude:: files/start.py :lines: 11 This is all of the data on the server you have permissions for. As a new user you can only see a small subset of all data objects. We can see the data objects are referenced by *id*, *slug*, and *name*. .. note:: ``id`` is the auto-generated **unique identifier** of an object. IDs are integers. ``slug`` is the **unique name** of an object. The slug is automatically created from the ``name`` but can also be edited by the user. Only lowercase letters, numbers and dashes are allowed (will not accept white space or uppercase letters). ``name`` is an arbitrary, **non unique name** of an object. Let's say we now want to find some genome indices. We don't always know the *id*, *slug*, or *name* by heart, but we can use `filters`_ to find them. We will first count all genome index data objects: .. _filters: http://resdk.readthedocs.io/en/latest/ref.html#resdk.ResolweQuery .. literalinclude:: files/start.py :lines: 13 This is quite a lot of objects! We can filter even further: .. literalinclude:: files/start.py :lines: 15 .. note:: For a complete list of filtering options use a "wrong" filtering argument and you will receive an informative message with all options listed. For example:: res.data.filter(foo="bar") For future work we want to get the genome with a specific slug. We will `get`_ it and store a reference to it for later: .. _get: http://resdk.readthedocs.io/en/latest/ref.html#resdk.ResolweQuery.get .. literalinclude:: files/start.py :lines: 17,18 We have now seen how to use filters to find and get what we want. Let's query and get a paired-end FASTQ data object: .. literalinclude:: files/start.py :lines: 20-24 We now have ``genome`` and ``reads`` data objects. We can learn about an object by calling certain object attributes. We can find out who created the object: .. literalinclude:: files/start.py :lines: 26 and inspect the list of files it contains: .. literalinclude:: files/start.py :lines: 28 These and many other data object attributes/methods are described `here`_. .. _here: http://resdk.readthedocs.io/en/latest/ref.html#resdk.resources.Data Run alignment ============= A common analysis in bioinformatics is to align sequencing reads to a reference genome. This is done by running a certain *process*. A process uses an algorithm or a sequence of algorithms to turn given inputs into outputs. Here we will only test the STAR alignment process, but many more processes are available (see the `Process catalog`_). This process automatically creates a BAM alignment file and BAI index, along with some other files. .. _Process catalog: http://resolwe-bio.readthedocs.io/en/latest/catalog.html Let's run STAR on our reads, using our genome: .. literalinclude:: files/start.py :lines: 30-36 This might seem like a complicated statement, but note that we only run a process with specific slug and required inputs. The processing may take some time. Note that we have stored the reference to the alignment object in a ``bam`` variable. We can check the `status`_ of the process to determine if the processing has finished: .. _status: http://resdk.readthedocs.io/en/latest/ref.html#resdk.resources.Data.status .. literalinclude:: files/start.py :lines: 38 Status ``OK`` indicates that processing has finished successfully. If the status is not ``OK`` yet, run the ``bam.update()`` and ``bam.status`` commands again in few minutes. We can inspect our newly created data object: .. literalinclude:: files/start.py :lines: 40-42 As with any other data object, it has its own *id*, *slug*, and *name*. We can explore the process inputs and outputs: .. literalinclude:: files/start.py :lines: 44-48 Download the outputs to your local disk: .. literalinclude:: files/start.py :lines: 50 We have come to the end of Getting started. You now know some basic ReSDK concepts and commands. Yet, we have only scratched the surface. By continuing with the Tutorials, you will become familiar with more advanced features, and will soon be able to perform powerful analyses on your data.