Kindergarten Data Science

Last year I taught a data science lesson to my daughter’s kindergarten class. I learned that they already practice data science: each morning they observe the weather, record it on the computer, and then look at a visualization of the weather so far that month. Amazing! I was able to share with them how that same process (observe/collect data, clean data, analyze data, and visualize data) is what I do at work.

I walked them through two data analyses. First we recorded their birth months and then discussed which months and seasons had the most and the fewest birthdays. After the lesson I passed out a blank bar chart for them to fill in with the actual birth month counts.

Next, we read the book A Pig, A Fox, and Stinky Socks. I asked the kids which words they thought occurred the most, and then we looked at a wordcloud I created and compared it to their guesses.

Here are the packages used to create the wordcloud with a non-standard color palette.
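
The filter() and wordcloud() calls further down suggest the dplyr and wordcloud packages, and a custom palette like pal2 is often built with RColorBrewer. A minimal setup sketch under those assumptions follows; the package set and the "Dark2" palette are guesses, not something the post confirms.

```r
# Assumed package set, inferred from the function calls later in the post
pkgs <- c("dplyr", "wordcloud", "RColorBrewer")
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)

if (all(installed)) {
  library(dplyr)         # filter()
  library(wordcloud)     # wordcloud()
  library(RColorBrewer)  # brewer.pal()
  # Assumed palette: eight colors from ColorBrewer's "Dark2" set
  pal2 <- brewer.pal(8, "Dark2")
} else {
  message("Missing packages: ", paste(pkgs[!installed], collapse = ", "))
}
```

Any character vector of colors can stand in for pal2, so swapping in a different palette only changes that one line.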


The data collection process required me to type the entire book into a text file. Enjoy the first few pages of literary gold:

book <- readLines("C:\\Users\\Chris\\Documents\\GITHUB\\\\files\\pigfoxcorpus.txt")
head(book)
## [1] "I am fox"          "I am pig"          "I am little"      
## [4] "I am big"          "I have some socks" "I like to play"

Next, the text needed to be converted to a standard case and split from lines into individual words.

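# Read the book in, one line per element
words <- scan("C:\\Users\\Chris\\Documents\\GITHUB\\\\files\\pigfoxcorpus.txt", what = "character", sep = "\n")
words <- tolower(words)            # standardize the case
words <- strsplit(words, "\\W+")   # split each line on runs of non-word characters
words <- unlist(words)             # flatten the list into one character vector
words <- words[words != ""]        # drop empty strings left by leading punctuation
head(words)
## [1] "i"   "am"  "fox" "i"   "am"  "pig"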
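
To try the cleaning step without the text file, here is a minimal sketch that runs on the six lines printed earlier. The `tokenize` helper and the `sample_*` names are hypothetical. The helper splits on runs of non-word characters (`\\W+`) and drops empty strings, because splitting on a single `\\W` leaves an empty string wherever punctuation is followed by a space.

```r
# First six lines of the book, as printed earlier in the post
sample_lines <- c("I am fox", "I am pig", "I am little",
                  "I am big", "I have some socks", "I like to play")

# Hypothetical helper: lowercase, split on runs of non-word characters,
# and drop any empty strings left behind by leading punctuation
tokenize <- function(lines) {
  tokens <- unlist(strsplit(tolower(lines), "\\W+"))
  tokens[tokens != ""]
}

sample_words <- tokenize(sample_lines)
head(sample_words)
## [1] "i"   "am"  "fox" "i"   "am"  "pig"
```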

Finally, the words are counted, common words are removed, and the wordcloud is created.

# freq holds the word counts (columns x and freq); pal2 is the color palette
stopwords <- c("is", "and", "the", "am", "of", "a", "to", "in", "for", "i")
freq2 <- filter(freq, !x %in% stopwords)
wordcloud(freq2$x, freq2$freq, random.order = FALSE, min.freq = 2, colors = pal2)
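
The counting step that produces `freq` is not shown. Here is a minimal base R sketch of one way to do it, assuming a data frame with columns `x` (the word) and `freq` (its count) to match the filter call. The `demo_*` names are hypothetical, and the input is just the first six lines of the book rather than the full text.

```r
# Stand-in for the cleaned word vector (first six lines of the book)
demo_words <- c("i", "am", "fox", "i", "am", "pig", "i", "am", "little",
                "i", "am", "big", "i", "have", "some", "socks",
                "i", "like", "to", "play")

# Count each word; name the columns x and freq to match the filter step
demo_counts <- table(demo_words)
demo_freq <- data.frame(x = names(demo_counts),
                        freq = as.vector(demo_counts),
                        stringsAsFactors = FALSE)

# Drop the common words, then sort so the most frequent come first
demo_stopwords <- c("is", "and", "the", "am", "of", "a", "to", "in", "for", "i")
demo_freq2 <- demo_freq[!demo_freq$x %in% demo_stopwords, ]
demo_freq2 <- demo_freq2[order(-demo_freq2$freq, demo_freq2$x), ]
head(demo_freq2)
```

With only six lines, every remaining word appears once, so min.freq = 2 would leave nothing to draw. The full book is needed before the cloud has anything to show.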