12/02/2021
Monday: my history of coding
Hello! I'm Samantha, and I'll be tweeting from this account for the week! I’m a 3rd year PhD student at the University of Birmingham in the UK. My field is Clinical Bioinformatics and I am looking at inflammation! Today I’ll talk about my experience learning to code throughout education: when it started, how it progressed, and what I can do now!
I was lucky to attend a primary school with a computer suite, they allowed students to use them for homework, which I found useful since I didn’t have a computer at home. I would spend lots of time on the computers, my first search engine was Ask Jeeves. I didn’t understand what a search engine was, so seeing the logo “Ask Jeeves” I would literally ask it full questions! These days Google already knows what I want to ask before I have finished typing.
The first time I ever came across coding was when I was around 8 - Someone came to our school to show us “drag & drop” visual programming. They showed us a tool called Flowol, which is somewhat like Scratch. We started with a few exercises to practice, being shown how to drag and drop these shapes and that they make a sequence of commands. I enjoyed it so much that I rushed to make sure I could go through all the exercises before we ran out of time. We finished with a competition: we were grouped up and it was a race to solve a traffic light problem - there were lots of rules to think about: traffic light 2 can’t be green until traffic light 1 is red, and to remember lights are amber for a few seconds. I remember my group tried working on it together but we couldn’t figure it out. I didn’t want to stop, so I kept working on my own. I remember my sequence of shapes became very complex at the end with arrows pointing everywhere, but it worked!
Our school also had a programmable floor robot. There was only one, so the teacher would just show us how to use it. I remember it driving around from one spot to another and back! Reflecting, I enjoyed Flowol programming more in comparison to the floor robot. Perhaps because I didn’t get to use the floor robot and Flowol was a more hands-on experience... As much as I enjoyed these experiences I wouldn’t cross paths with programming again until I was in college as I had no computer at home and didn’t know where to start.
The next time I met programming was during college, when I was 16. I decided to study IT in college because I enjoyed it during secondary school. I realised I was actually pretty good at it too, especially compared to my other subjects. I originally wanted to be a midwife and undertook child development and health and social care studies alongside IT. I wondered whether there might be a way to combine healthcare and IT in my future. My IT course at college was really broad and covered animation, business, and web development. I even have the animation project I made!
I really enjoyed the web modules. I remember our first task was to build a basic website with HTML and CSS using Adobe Dreamweaver. I still didn’t have a computer at home, so I had to spend my evenings at school to make sure I wouldn’t fall behind. The goal for the next web module was to build another website with some interactive elements on top - this is where I met JavaScript! My IT course wasn’t programming-focussed so the JavaScript code snippets were from the teacher, but we could explore editing the snippets. For example, a script to output the date could be changed to reformat how the date was output. When I got a laptop of my own, I tried to learn JavaScript in my spare time to include in my website, but I wasn’t the best at it. I did enjoy HTML and CSS! I liked making websites look nice: you can change fonts, colours, structure!
It was at this stage that I had to start thinking about my future: what were my career options? Did I actually want to be a midwife? Did I want to focus on IT? I found IT much easier and it was so fun. I asked my IT teacher for advice… My IT teacher strongly encouraged me to apply for computer science and to choose a university outside my hometown to gain independence and confidence.
I started my undergraduate degree at Aberystwyth University, UK, studying Computer Science in 2014. Some coding skills I learnt in my first year included: Java, Arduino C, and my old friend JavaScript. The CompSci department runs an activity weekend: almost all students in the first year go to an old manor house in the middle of Wales to do various team building activities without phone signal - it was a great weekend to meet others in the department!
One of my first challenges as an undergrad was the introduction to Linux and command line interfaces. It was nothing like anything I’d done before. I stepped into this whole new computing world that included other operating systems! I had previously only ever interacted with a Windows OS. I enjoyed but struggled with the Arduino C work: I found it hard to wrap my head around all these new concepts. I took out a book or two from the library but I didn’t understand it as fully as I hoped. On the other hand, I failed Java. I think going from Arduino C to object-orientated Java was too confusing for me.
After my first year it was recommended that I swap over to Business IT for my second year onward, and I’m happy with that decision. I found programming languages difficult yet enjoyed web building. But it was not the end of my coding experience! My second year modules included: web programming, sys admin, and databases. During web programming I learnt PHP and in databases I learnt PostgreSQL. I also chose some modules outside the department as I wanted to widen my career choices: a Web developer or IT teacher. My third year included a group project, my final dissertation, and other business modules. But I found that I lost interest in business IT so I chose a computer science module: machine learning.
My final dissertation was creating a website for a business with a database. I made a small cake shop with a forum for users to talk recipes or general chats. My machine learning module did not include any programming, instead the assignments were to use the Weka tool. At the end of my third year, I needed to start thinking about my next steps: I had applied for various Masters courses: IT management, Digital Curation, and Data Science.
Over the Summer of my undergraduate graduation (2017), with some help I started to learn other skills, specifically I tried Python. Python I clicked with. Despite my troubles with JavaScript, PHP, and Arduino C, I finally felt like I understood a language with ease! During that Summer, I also did a research project with Amanda Clare. With Amanda, I looked at DNA sequences and the current tools available for analysis. This research project is where I found my interest in Bioinformatics! I properly dived into Python at this point: using the pip package manager, calling modules, and studying these DNA sequences and the technology used to make them! I had access to the ABER cluster, which was scary because I need to visualise my work and this was the first time I couldn’t rely on an IDE: I had to use a terminal and the command line interface to access directories and scripts, but it was at this point that I learnt bash scripts!
I chose my Masters in Data Science! Learning Python and the Summer project were reasons why I chose Data Science over the other choices. I lost my interest in IT and Data Science meant I would stay in computer science, plus I was told it would be a great career choice!
During the Masters course in Data Science, we explored various topics: statistics, machine learning, and databases. I also learnt new coding skills: R, scikit-learn in Python, and MongoDB. I also had a Python module which I loved! This module was taught via lectures and live coding during the session, and the lecturer would get us involved. I learnt about dictionaries, SQLite, and APIs. My Masters project was with Amanda, we looked at Acidobacteria: the content in DNA sequences, “Metagenomic of acid soil: a study of Nanopore long-reads and Acidobacteria”. As part of this I made my first Python tool which looks at Acidobacteria reads from Nanopore metagenomic data!
Tuesday: PhD application tips & PhD work
The first thing I wanted to discuss is PhD applications! I’ve had a few questions from Masters students over the past year about PhD application essays & interviews - I’ll write some tips from the blog in this thread!
1. Skills to highlight: mention all the modules you did during your undergraduate and/or masters! All are relevant, but maybe focus a little more on the ones that might be specifically related to the PhD project! Also what skills have you USED?
2. Don’t worry if you don’t think you’re good at a particular skill - what’s important is that you know about it and doing the PhD gives you the chance to improve that skill!
3. Don’t forget the non-technical skills like “communication” or “presenting” - you’ll probably have to present your PhD work at some point!
4. If you are presenting at the interview, put skills or buzzwords in bold to highlight what you can do! Be prepared for possible questions like, “how would you do it differently?” - but questions about the presentation depend on the topic
5. Interview questions: they can range from technical questions to questions about yourself! Be prepared for the dreaded question: “what is your biggest weakness?”
6. A personal opinion is that I felt like the interview panel wanted to get to know me: was I a good fit? Did I know what I was talking about?
7. Technical questions can include, “how would you handle a particular dataset?” or “how would you run a GWAS?” - remember, some questions have no wrong answers - they genuinely just want to know your response!
8. Try and throw in some buzzwords that may be relevant for the PhD in terms of skill! But of course make sure you know what these words are - they may pick up if you start throwing random words around
9. Do some background research into the PhD supervisor! Find out their main paper topics, or what people in their lab are doing
10. The PhD is a learning and development journey, so even if you don't fully understand some things try not to worry: the PhD is a time for the supervisor to pass on their knowledge and you’ll learn other stuff
11. Be yourself! Don’t be afraid to answer a question in your own way! And don’t be afraid if you don’t have an answer!
My PhD is focused on inflammation! I am looking into non-traditional biomarkers of inflammation through various methods & data sources: structured electronic health records and unstructured clinical letters. Some possible non-traditional biomarkers could include blood counts/biochemistry or clinical letters telling us a patient’s current status, treatment, and symptoms - additionally some associated genetic traits.
My original PhD title was different: the plan was to use various Biobank studies and link up missing gaps, however we soon learnt that the data wasn’t available... The new aim is to work on patients in the UK Biobank, which is a long-term study that aims to investigate the genetic and environmental contributions to the development of disease.
"Creating an AI based data assistant to bridge genotype to metadata linking primary clinical data to biobank sample" - my original PhD title
To link back to yesterday and my coding experience - currently my main language is Python, sometimes I use R for statistics or plotting. At first, I struggled a lot with Python dictionaries but I’ve finally become friends with them.
Wednesday: additional PhD work, skills, and tools (both made & recommendations)
My main interests and career ambitions are: machine learning, ontologies, and natural language processing!
Machine learning (ML) is “learning from data”: it can be supervised prediction with labeled data or unsupervised clustering, which finds underlying relationships in datasets. I’m personally interested in clustering as I like to observe patterns in my data and try to interpret the cluster groups. I’ve used K-means, hierarchical, DBSCAN, plus more! Because I work with such large datasets, I have used various dimensionality reduction methods: specifically t-SNE and PCA. An interesting tool I found is PCAmixdata for R - essentially it combines the results of a PCA (for continuous variables) and an MCA (for categorical variables) - I would definitely recommend! The main toolkit I use for my clustering pipelines is the scikit-learn Python module! I use it for dimensionality reduction, finding the optimal number of clusters (K), and clustering! I like to say that there’s no “right way” to do machine learning - but parameter finetuning can improve results! Don’t forget visualisation - it’s important to see what the data looks like too.
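To make the clustering idea a bit more concrete, here is a minimal sketch of K-means written in plain Python. It is only an illustration under stated assumptions: the toy points, the `kmeans` function name, and the choice to start from the first K points are all made up for this example. In a real pipeline a library such as scikit-learn would do this work (with smarter starting points and many more options).

```python
from math import dist


def kmeans(points, k, iterations=10):
    """Toy K-means: assign points to the nearest centroid, then update centroids."""
    # Assumption for this sketch: start from the first k points.
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its closest centroid.
        labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its points.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels, centroids


# Hypothetical toy data: two obvious groups in two dimensions.
points = [(1.0, 1.0), (8.0, 8.0), (1.2, 0.8), (7.8, 8.2), (0.9, 1.1), (8.1, 7.9)]
labels, centroids = kmeans(points, k=2)
print(labels)
```

With data this tidy the two groups separate straight away; real datasets usually need scaling, a sensible choice of K, and a plot to check that the clusters mean something.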
Ontologies condense a domain of knowledge in a formalised structure. The concepts of an ontology have metadata and relationships, for example in human anatomy: hand “part of” arm [synonym = upper limb]. Ontologies allow us to do association text mining: we can extract important sentences from a document using terms and their corresponding synonyms. This links with natural language processing (NLP), which is using computers to look at human language and interaction. I aim to look at those clinical letters of patients with inflammation! So far I have looked into word vectorising (word2vec): looking at how closely terms from different sources are related to one another. Combining this with ontologies, we can do semantic similarity! If you want a way to visualise an ontology, I recommend: WebVOWL - it’s an online application with interactive features! If you are interested in NLP, I recommend: spaCy - it supports many languages, is very fast, and has pre-trained models! I personally find this tool the best one I’ve used for PoS tagging and such thus far.
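As a rough illustration of association text mining with an ontology, the sketch below pulls out the sentences that mention a concept or any of its synonyms. Everything in it is hypothetical: the tiny `ontology` dictionary, the example text, and the `find_sentences` helper were invented for this example and are not taken from a real ontology or from real clinical letters. The sentence splitting is deliberately naive, and an NLP library would handle it better.

```python
import re

# Hypothetical mini-ontology: each concept maps to its synonyms (made up for this sketch).
ontology = {
    "arm": ["upper limb"],
    "inflammation": ["swelling", "inflamed"],
}


def find_sentences(text, concept):
    """Return sentences mentioning the concept or any of its synonyms."""
    terms = [concept] + ontology.get(concept, [])
    # Whole-word, case-insensitive match on any of the terms.
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, terms)) + r")\b", re.IGNORECASE)
    # Naive sentence split on ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [s for s in sentences if pattern.search(s)]


# Invented example text, not a real clinical letter.
letter = "Patient reports pain in the upper limb. Swelling was noted. Bloods are normal."
print(find_sentences(letter, "inflammation"))
print(find_sentences(letter, "arm"))
```

Searching for "arm" still finds the sentence about the "upper limb", which is the benefit of bringing the ontology's synonyms into the search.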
Doing my work for association text mining or semantic similarity with documents and ontologies is difficult because a lot of tools that exist for NLP tasks don’t include features for ontology extraction... As a first step to overcome this barrier, I made Jabberwocky! It’s an initial plan to plug in an ontology for associated text mining: given an ontology term, it collects the synonyms - my future plan is to incorporate spaCy with it!
Finally, I have recently been doing some genome-wide association studies (GWAS): looking into genetic variants of a group of people to observe if any variant is associated with a trait. An example of this can be looking at patients with diabetes and observing if they have the same variant which is associated with a particular trait. To be quite honest, this is not my strongest point, as I don’t feel quite knowledgeable in the genetics area yet, but individuals in my lab co-created an R & bash pipeline which they shared with me!
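To show the basic idea of "is this variant associated with the trait?", here is a small sketch in Python. It is not the lab's R & bash pipeline, just a toy under stated assumptions: the variant names and counts are invented, and the `chi_square_2x2` helper is written for this example. It compares how often an allele appears in cases and controls using a chi-square statistic. Real GWAS work uses dedicated tools and also has to handle things like multiple testing and population structure.

```python
def chi_square_2x2(a, b, c, d):
    """Chi-square statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denominator if denominator else 0.0


# Invented counts per variant: (cases with allele, cases without,
# controls with allele, controls without). Not real data.
variants = {
    "variant_A": (30, 70, 10, 90),
    "variant_B": (20, 80, 20, 80),
}

for name, counts in variants.items():
    print(name, round(chi_square_2x2(*counts), 2))
```

A larger statistic means the allele is spread more unevenly between cases and controls; here the hypothetical `variant_A` stands out while `variant_B` shows no difference at all.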
Thursday: day of struggles
One of my biggest struggles is trying to explain my methods or results to others - I’m still learning about terms I should be using or still trying to interpret the biological meaning... An example of this is: using “feature” instead of “column” or “variable” when describing my data or methods.
Writing - I actually don’t mind writing even though my grammar does seem to be terrible...BUT it doesn’t stop me from trying, and I appreciate that others spend time improving my work! It started off quite difficult to accept feedback, I felt like I was so bad at writing and I was embarrassed - but I’ve learnt to accept and value comments! I appreciate those who help me to improve my work.
It’s important for me to have some sort of IDE while programming. I like to easily run a script and observe my variables - I need to “see” my output before I can continue, maybe it’s because I’m still not 100% confident in my programming skills. I use Spyder for Python! But to run lines separately, some good tools are Jupyter Notebook (Python) or RStudio (R).
Conferences: I’m quite nervous about conferences because I worry that I’ll get asked a question about my work which I can’t answer - but I know they’re good for networking and career experiences!
Mental health! It’s very important to take time for yourself and do things you enjoy! I’ve had to take a day or few off in the past year because of the overwhelming stress… No matter if you’re a student, in academia, or industry, you need to take care of yourself and it’s important for businesses/companies to take care of their employees! In 2018 I wrote a blog post about witnessing my partner writing their PhD: watching them sad, happy, angry…it was hard to watch that - and now being in that position, I completely understand!
Trying to survive this pandemic has been difficult. I’m lucky to not have my work affected as it’s computational - but I still struggle: my body aches as I don’t get my daily walk into work & my back hurts from my cheap chair! I’m too nervous to go outside for a walk because I see a lot of people not wearing masks and these variants scare me...
Friday: FUNday
Hobbies! Since the pandemic I’ve started a bunch of hobbies: knitting, clay sculpting, and even coding for fun! I have a Colours project on GitHub, I make small simple tools with the techniques I’ve learnt throughout my PhD! Music I listen to while I work: I love movie soundtracks! Such as Tron Legacy & Interstellar - Hans Zimmer is my go-to!
Note: I ran the account from the 8th until the 12th February 2021 - some information may be missing/incorrect.
@sap218 my Twitter account
@ResearcHersCode ResearcHersCode Twitter account
Lonely Bird my Animation IT project (2014) - may contain mild peril
undergraduate dissertation the report (2017)
Bioinformatics poster the Summer research project (2017)
Bioinformatics blog post the Summer Research
postgraduate dissertation the report (2018)
acidoseq first Python tool for the Masters dissertation
Masters blog post the project timeline
PhD application tips from essays to interviews
Jabberwocky repository GitHub
Jabberwocky paper JOSS (2020)
PhD emotions a blog post
personal GitHub to see my coding hobby projects