This week's assignment is mainly practice with more advanced ways of managing and working with data.
1. Because the groups differ in size, the tests for period and treatment are order-dependent. In layman's terms, the results of the tests change depending on the order in which period and treat appear in the model formula. The ANOVA table from any single ordering is therefore somewhat misleading when read on its own.
2. The only singularity in these models occurs in the last one (z ~ b * (x + y)). In this model, x and y are proportional within each level of b. R cannot estimate all of the resulting terms separately, so some coefficients are reported as NA. The main issue is that the singularity only shows up once a main effect (b in this case) is crossed with the other variables in the formula.
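Both points can be reproduced on a small scale. The sketch below uses made-up data rather than the assignment's dataset: the data frames `d` and `s`, their values, and the response name `resp` are all assumptions chosen for illustration. The first half fits the same two-factor model in both orders on unbalanced groups; the second half builds a case where y is an exact multiple of x within each level of b.

```r
# Made-up unbalanced data; NOT the assignment's dataset.
d <- data.frame(
  period = factor(c(1, 1, 1, 2, 2, 3, 3, 3, 3, 3)),
  treat  = factor(c("a", "b", "a", "a", "b", "b", "b", "a", "b", "b")),
  resp   = c(3.1, 4.0, 2.7, 5.2, 6.1, 7.3, 6.8, 5.5, 7.9, 7.0)
)

# anova() on an lm fit uses sequential sums of squares, so each term is
# tested after the terms that precede it in the formula.
a_period_first <- anova(lm(resp ~ period + treat, data = d))
a_treat_first  <- anova(lm(resp ~ treat + period, data = d))

ss_period_first <- a_period_first["period", "Sum Sq"]
ss_period_last  <- a_treat_first["period", "Sum Sq"]
print(c(period_first = ss_period_first, period_last = ss_period_last))

# Made-up data where y is an exact multiple of x within each level of b
# (y = 2x in level A, y = 3x in level B).
s <- data.frame(
  b = factor(rep(c("A", "B"), each = 4)),
  x = rep(1:4, times = 2),
  z = c(2.3, 3.9, 6.2, 8.1, 1.8, 4.4, 5.9, 8.6)
)
s$y <- ifelse(s$b == "A", 2, 3) * s$x

fit_simple <- lm(z ~ x + y, data = s)        # no singularity here
fit_full   <- lm(z ~ b * (x + y), data = s)  # interaction columns are aliased
print(coef(fit_full))                        # aliased terms show up as NA
```

The sum of squares for period differs between the two orderings because the group sizes are unequal. In the second model, the NA entries mark the terms that lm() dropped as linearly dependent on earlier columns.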
I've been experimenting with time-series data in ggplot2 for quite some time; last year, for example, I wrote a series of functions to visualize the price of certain video games over the course of a decade. For this assignment, I opted for something simpler: a line graph showing the number of unemployed people from the late 1960s up to the mid-2010s, using the economics dataset that ships with ggplot2.

    library(ggplot2)

    # Monthly unemployment (in thousands) with a smoothed trend line on top
    ggplot(economics, aes(x = date, y = unemploy)) +
      geom_line() +
      geom_smooth() +
      labs(
        title = "Time Series Plot of Unemployment with Smooth Trend Line",
        x = "Date",
        y = "Unemployment"
      ) +
      theme_minimal()

The addition of a trend line makes the data easier for onlookers to understand and makes the main message of the plot clear: that unemployment in recent times has been steadily incre...
For the R package due at the end of this course, I've decided to build one called DescribeR. This package will (hopefully) be able to instantly provide a detailed summary of any object given to it, from a single variable held in the R session's memory to the columns of a given data frame. I'd also like the package to use ggplot2 to generate example visualizations of a data frame, so that the user can better understand its variables and how they relate to one another. No two graph outputs should be alike when coming from the same data frame.
GitHub link to description file: https://github.com/Retrolovania/R_Programming/blob/main/DescribeR/DESCRIPTION.txt
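As a rough idea of what the summary side could look like, here is a minimal base-R sketch. The function name `describe_any` and the fields it returns are hypothetical and are not taken from the DescribeR repository; it only illustrates handling a data frame differently from any other object.

```r
# Hypothetical helper for illustration only; not part of DescribeR.
describe_any <- function(x) {
  if (is.data.frame(x)) {
    # One row per column: name, first class, and missing-value count.
    return(data.frame(
      column    = names(x),
      class     = unname(vapply(x, function(col) class(col)[1], character(1))),
      n_missing = unname(vapply(x, function(col) sum(is.na(col)), integer(1))),
      row.names = NULL,
      stringsAsFactors = FALSE
    ))
  }
  # Any other object: report its first class, length, and missing-value count.
  list(class = class(x)[1], length = length(x), n_missing = sum(is.na(x)))
}

print(describe_any(airquality))      # built-in data frame with some NAs
print(describe_any(c(1L, 2L, NA)))   # plain integer vector
```

Returning a data frame for the data-frame case keeps the per-column summary easy to pass on to ggplot2 later, though the real package may well organize this differently.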