Having spent so long within tertiary education (seriously, I started my Bachelor’s in 2011), I think it’s fair to say that I really enjoy learning new things. It can feel tedious sometimes, but I think at the core of it, I would never want to stop learning.
I mentioned in my previous post that I chose my Ph.D. project partly because I would get to expand my skill set and learn a whole new field (i.e. metabolomics, which in this particular instance was the study of bacterial metabolism). So, naturally enough, even when I was looking for work after my Ph.D., I wanted to do something that would let me use some of the skills gained from said degree, but also give me the opportunity to learn more stuff.
And this leads me to my past month and a half-ish, where I’ve been learning how to use this abomination... I mean, coding language:
For those who aren’t familiar (those who know it are probably recoiling a little, maybe with some cat hisses), this is R, the language that many scientists and data analysts use to sift through vast amounts of data.
In its simplest form, you could use it as a calculator. Type in an equation in your script file and hit ‘run’, and the console will spit out the answer.
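As a rough sketch of the calculator idea, these are the sort of lines you could type into a script and run. The variable name `result` is made up purely for illustration.

```r
# Basic arithmetic, evaluated one line at a time in the console
2 + 3 * 4          # 14: multiplication happens before addition
(2 + 3) * 4        # 20: brackets change the order
sqrt(16)           # 4: built-in square root function

# Store an answer under a name so it can be reused later
result <- 10 / 4
result             # 2.5
```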
You can also use it to load data frames. For instance, this random list of passengers on board the Titanic for its ill-fated voyage:
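Loading a file might look something like the sketch below. To keep it self-contained, it first writes a tiny, made-up four-row CSV to a temporary file; the passenger names, the values and the column names (`Name`, `Sex`, `Age`, `Fare`) are all assumptions for illustration, not the real passenger list.

```r
# A made-up, four-row stand-in for the passenger list (not the real data)
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "Name,Sex,Age,Fare",
  "Passenger A,female,35,53.10",
  "Passenger B,male,22,7.25",
  "Passenger C,female,27,11.13",
  "Passenger D,male,54,51.86"
), csv_path)

# Load the file into a data frame and take a first look at it
passengers <- read.csv(csv_path)
head(passengers)   # first few rows
dim(passengers)    # number of rows, then number of columns
str(passengers)    # column names and the type of data in each
```

With a real file you would skip the `writeLines()` step and point `read.csv()` straight at the file's path.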
But hypothetically speaking, you could import and load data frames from, say, a screen of hundreds of chemical compounds to see if any of them could kill cancer cells, or a database that records characteristics of samples received by a lab (e.g. when each was received, where it was from, what it is, what tests it had undergone, etc).
With such vast amounts of data, extracting the key bits of information (and leaving out the rest) could be time-consuming if done manually.
Many people use this program because you can essentially write a command that tells it to do the sifting for you automatically.
For instance, say I wanted to look at only female passengers over the age of 30:
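A minimal sketch of that kind of filter, using base R's `subset()` on a tiny made-up data frame (the rows and the column names `Sex` and `Age` are assumptions for illustration; the real data frame is far bigger):

```r
# Tiny made-up stand-in for the passenger data frame
passengers <- data.frame(
  Name = c("Passenger A", "Passenger B", "Passenger C", "Passenger D"),
  Sex  = c("female", "male", "female", "male"),
  Age  = c(35, 22, 27, 54),
  Fare = c(53.10, 7.25, 11.13, 51.86)
)

# Keep only the rows where BOTH conditions are true
women_over_30 <- subset(passengers, Sex == "female" & Age > 30)
women_over_30
```

Rows with a missing age are dropped by `subset()`, since the condition can't be confirmed as true for them.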
This literally took less than 1 second to produce. It comes down to the processing power of your computer, but given I sifted through a data frame that contained 721 rows and 11 columns, that’s pretty nifty.
If I wanted to save this as a spreadsheet (to be viewed in Excel), I could do that:
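One way to do this, sketched here with base R's `write.csv()`, is to save the filtered rows as a CSV file, which Excel can open. The data frame and the output location (a temporary file) are made up for illustration.

```r
# Made-up filtered result standing in for the real one
women_over_30 <- data.frame(
  Name = "Passenger A",
  Sex  = "female",
  Age  = 35,
  Fare = 53.10
)

# Write it out as a CSV; row.names = FALSE avoids an extra column of 1, 2, 3...
out_path <- tempfile(fileext = ".csv")
write.csv(women_over_30, out_path, row.names = FALSE)

out_path   # where the file ended up
```

In practice you would swap the temporary path for a file name of your own choosing.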
Or, I could keep playing around with it.
How about I make a graph of all 721 passengers in this data frame?
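A bare-bones scatter plot could be sketched like this with base R's `plot()`. Which columns go on which axis is my assumption here (age against fare), and the four rows are made up.

```r
# Tiny made-up stand-in for the passenger data frame
passengers <- data.frame(
  Sex  = c("female", "male", "female", "male"),
  Age  = c(35, 22, 27, 54),
  Fare = c(53.10, 7.25, 11.13, 51.86)
)

# When run as a script rather than in RStudio, draw to a throwaway device
if (!interactive()) pdf(NULL)

# One point per passenger: age along the bottom, fare up the side
plot(passengers$Age, passengers$Fare)
```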
I could add another factor and colour code each point according to the Sex of the passengers:
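Colour coding might look something like the sketch below: a small lookup of one colour per sex, used to pick a colour for every point. The colours, the made-up rows and the choice of axes are all assumptions for illustration.

```r
# Tiny made-up stand-in for the passenger data frame
passengers <- data.frame(
  Sex  = c("female", "male", "female", "male"),
  Age  = c(35, 22, 27, 54),
  Fare = c(53.10, 7.25, 11.13, 51.86)
)

# Lookup table: one colour per value in the Sex column
sex_colours <- c(female = "tomato", male = "steelblue")

# Look up the colour for each passenger, in row order
point_colours <- sex_colours[as.character(passengers$Sex)]

# When run as a script rather than in RStudio, draw to a throwaway device
if (!interactive()) pdf(NULL)

# pch = 19 draws solid dots so the colours are easy to see
plot(passengers$Age, passengers$Fare, col = point_colours, pch = 19)
```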
And then I could just make things a little easier to interpret by adding more features to the graph…
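The extra features could be things like a title, clearer axis labels and a legend. Here is a rough sketch in base R; the wording of the labels, the colours and the data are all made up for illustration.

```r
# Tiny made-up stand-in for the passenger data frame
passengers <- data.frame(
  Sex  = c("female", "male", "female", "male"),
  Age  = c(35, 22, 27, 54),
  Fare = c(53.10, 7.25, 11.13, 51.86)
)

sex_colours   <- c(female = "tomato", male = "steelblue")
point_colours <- sex_colours[as.character(passengers$Sex)]

# When run as a script rather than in RStudio, draw to a throwaway device
if (!interactive()) pdf(NULL)

# Same scatter plot, now with a title and readable axis labels
plot(passengers$Age, passengers$Fare,
     col = point_colours, pch = 19,
     main = "Passenger age against ticket fare",
     xlab = "Age (years)", ylab = "Fare")

# A legend explaining which colour means what
legend("topleft", legend = names(sex_colours),
       col = sex_colours, pch = 19, title = "Sex")
```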
I could even save the plot as its own individual file:
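Saving a plot to a file generally follows an open, draw, close pattern. The sketch below writes a PDF to a temporary location; the data, the colours and the file path are made up for illustration.

```r
# Tiny made-up stand-in for the passenger data frame
passengers <- data.frame(
  Sex  = c("female", "male", "female", "male"),
  Age  = c(35, 22, 27, 54),
  Fare = c(53.10, 7.25, 11.13, 51.86)
)
sex_colours <- c(female = "tomato", male = "steelblue")

# 1. Open a file to draw into (width and height are in inches)
plot_path <- tempfile(fileext = ".pdf")
pdf(plot_path, width = 8, height = 6)

# 2. Draw the plot as usual; it goes to the file instead of the screen
plot(passengers$Age, passengers$Fare,
     col = sex_colours[as.character(passengers$Sex)], pch = 19,
     xlab = "Age (years)", ylab = "Fare")

# 3. Close the file so it is actually written out
dev.off()

plot_path   # where the file ended up
```

Swapping `pdf()` for `png()` gives an image file instead, though its width and height are then measured in pixels by default.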
You can go nuts with this; the world is your oyster. This is literally just the tip of the iceberg (pun intended), and the stuff I’m actually doing for work is on another level entirely.
Hopefully one day I’ll be able to write my own code more comfortably, but right now it’s sort of just at that point where I’m going, ‘I think I understood some of that?’. It really is like learning a whole new language.
For those who are also learning, there’s a bunch of resources out there for this program. I got sent many links, so I’m just going to dump them below:
- An introductory tutorial on the use of R and RStudio, plus some of its packages: https://bookdown.org/ansellbr/WEHI_tidyR_course_book/#how-to-use-this-book
- Fundamentals of Data Visualization: https://serialmentor.com/dataviz/
- The R Graph Gallery: https://www.r-graph-gallery.com/
- R Graphics Cookbook: https://r-graphics.org/index.html
- R for Data Science: https://r4ds.had.co.nz/
- Data analysis using R: https://uomresearchit.github.io/r-tidyverse-intro/
But let’s be honest: the best way to solve an RStudio or R query is to just Google it.
A former wet-lab-based Bacteriology Ph.D. student residing in Australia. Now working part time at a secret location as a Communications and Data Officer. 👀 🦠 🧫 🧬