Data Analysis in R: Basics and Beyond
In a rapidly changing (professional) context, lifelong learning has become a must. With microcredentials, Ghent University offers a new type of course for a broad group of lifelong learners.
Practical information:
Learning track
Microcredentials are short academic programmes that meet the high quality standards of Ghent University. Both in terms of content and practice, the development of a microcredential is primarily focused on the needs of professionals and lifelong learners. The focus of the programme is on a well-defined set of learning outcomes. Each microcredential consists of a limited number of subjects to which credits are linked. Your learning account will be used when you follow a microcredential. If you successfully complete the programme, you will receive credit certificates for the individual subjects and a recognised certificate that offers clear added value for professionals and employers. Microcredentials are organised by the 'academies for lifelong learning' of Ghent University.
Module 2 - Getting Started with R Software for Data Analysis
R is a flexible environment for statistical computing and graphics, which is becoming increasingly popular as a tool for gaining insight into often complex data. While in some ways similar to other programming languages (such as C, Java and Perl), R is particularly suited for data analysis because ready-made functions are available for a wide variety of statistical techniques (classical statistical tests, linear and nonlinear modeling, time series analysis, classification, clustering, ...) and graphical techniques.
The base R program can be extended with user-submitted packages, which means new techniques are often implemented in R before being available in other software. This is one of the reasons why R is becoming the de facto standard in certain fields such as bioinformatics (Bioconductor) and financial services.
This course introduces the use of the R environment for the implementation of data management, data exploration, basic statistical analysis and automation of procedures.
It starts with a description of the R GUI, the use of the command line and an overview of basic data structures. The application of standard procedures to import data or to export results to external files will be illustrated.
Creation of new variables, subsetting, merging and stacking of data sets will be covered in the data management section. Exploration of the data by histograms, box plots, scatter plots, summary numbers, correlation coefficients and cross-tabulations will be performed.
Simple statistical procedures that will be covered are:
- comparisons of observed group means (t-test, ANOVA and their non-parametric versions) and proportions
- tests for independence in 2-way cross tables and linear regression (focusing on the R implementation of the statistical methods that are the subject of other modules of the statistics series)
Finally, installing new packages and automation of analysis procedures will also be discussed.
Practical sessions and specific exercises will be provided to allow participants to practice their R skills in interaction with the teacher.
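To give a flavour of the kind of base R code this module works towards, here is a minimal sketch. It is not taken from the course material: the data frame `scores`, its values and all variable names are hypothetical and invented purely for illustration. Only functions that ship with base R are used.

```r
# Hypothetical toy data, invented purely for illustration
scores <- data.frame(
  group = rep(c("A", "B"), each = 5),
  value = c(5.1, 4.9, 5.6, 5.8, 5.0, 6.2, 6.8, 6.1, 7.0, 6.5)
)

# Data management: create a new variable and subset the data
scores$above_mean <- scores$value > mean(scores$value)
group_a <- subset(scores, group == "A")

# Exploration: summary numbers and a cross-tabulation
summary(scores$value)
cross_tab <- table(scores$group, scores$above_mean)

# Comparison of group means: t-test and its non-parametric counterpart
t_result <- t.test(value ~ group, data = scores)
w_result <- wilcox.test(value ~ group, data = scores)

# Test for independence in a 2-way cross table.
# Fisher's exact test is used here because the counts are tiny;
# chisq.test() is the usual choice for larger tables.
f_result <- fisher.test(cross_tab)

# Linear regression of value on group
fit <- lm(value ~ group, data = scores)
coef(fit)
```

Each test returns an object whose components, such as `t_result$p.value`, can be extracted and reused, which is what makes automating an analysis straightforward.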
Module 6 - Leverage your R Skills: Data Wrangling & Plotting with Tidyverse
The tidyverse is a collection of R packages for data wrangling and visualization that share a common design philosophy. The goal of this course is to get you up to speed with the most up-to-date and essential tidyverse tools for data exploration. After attending this course, you’ll have the tools to tackle a wide variety of data wrangling and visualization challenges, using the best parts of the tidyverse.
This course covers the most essential tools from three main tidyverse packages that are frequently used in general data analysis. Lectures with R code demonstrations are blended with hands-on exercises, which allow you to try out, under guidance, the tools you’ve seen in class.
What you will learn:
- Data transformation and summarization with dplyr: narrowing in on observations of interest, creating new variables that are functions of existing variables, and calculating a set of summary statistics (like counts or means)
- Data visualization with ggplot2: creating more informative graphs (e.g., scatter plot, bar plot, histogram, smoother/regression line, …) in an elegant and efficient way. Arranging multiple plots on a grid
- Data import and tidying with tidyr: storing data in a consistent form, so that the way it is stored matches the semantics of the dataset.
- Extra tools for programming: merging and comparing two datasets based on various matching or filtering criteria. Other useful tools for R programming.
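The topics listed above could look roughly like this in practice. This is a minimal sketch, not course material: it assumes the dplyr, tidyr and ggplot2 packages are installed, uses the `mtcars` dataset that ships with R, and the lookup table `cyl_labels` and the derived column `kpl` are hypothetical names introduced only for illustration.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# dplyr: narrow in on observations, create a new variable, summarise per group
summary_by_cyl <- mtcars %>%
  filter(hp > 100) %>%                      # keep cars with more than 100 hp
  mutate(kpl = mpg * 0.425) %>%             # approximate km per litre (hypothetical column)
  group_by(cyl) %>%
  summarise(n = n(), mean_kpl = mean(kpl))

# tidyr: reshape three columns from wide to long form
long <- mtcars %>%
  select(mpg, hp, wt) %>%
  pivot_longer(cols = everything(), names_to = "variable", values_to = "value")

# ggplot2: scatter plot with a regression line (built here, not printed)
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)

# Merging and comparing two datasets with a hypothetical lookup table
cyl_labels <- tibble(cyl = c(4, 6), label = c("four", "six"))
joined    <- inner_join(summary_by_cyl, cyl_labels, by = "cyl")  # rows with a match
unmatched <- anti_join(summary_by_cyl, cyl_labels, by = "cyl")   # rows without a match
```

Typing `p` at the console, or calling `print(p)`, draws the plot. `anti_join()` is a quick way to see which rows of one dataset have no counterpart in another.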
Not included in this course:
- A systematic introduction to the basics of R. If you have never used R or RStudio before, we highly recommend taking Module 2 of this year's program, which will familiarize you with the R environment for data management and exploration tasks.
- Big data. This course focuses on small, in-memory datasets as you can’t tackle big data easily unless you have experience with small data.
- Statistics. Although you will see many basic statistics in this course, the main focus is on R and the tidyverse tools instead of explaining the statistical concepts.
Module 7 - Dynamic Report Generation with R Markdown and Quarto
R offers many first-class features for statistics and data science. One of these is certainly R Markdown, which allows seamless integration of analysis (code) and text. This greatly improves reproducibility, reduces copy-paste and other errors, and enhances the possibilities for automation.
R Markdown offers three main types of output: PDF, HTML and DOCX. The first session introduces the basic framework, the output-specific possibilities, the bookdown extension, and the easy and rewarding move to its more recent, less platform-dependent successor, Quarto.
The second session explores some general approaches to automation (using self-built templates for report sections or complete reports) and presents officedown. The latter is less flexible than R Markdown, but offers more options for DOCX output.