Lately, it seems that every new project requires some aspect of Data Science, whether it’s an extract, transform, load (ETL) layer, a machine learning module, work on streams of data, or a big data project. Finding trained engineers to meet the market demand has been challenging. Moreover, many engineering teams are unaware of how Data Science can help them in various tasks. The road to Data Science is usually serpentine and requires some guidance and motivation to travel. Since most software engineers are already familiar with programming and databases and have a decent mathematical grounding, it is easier to train them for Data Science work.

With this in mind, I conducted around fifteen hands-on workshops. Along with these workshops, the attendees were given IPython notebooks to test their skills. Currently, I’m refining the workshops for a second round of training. Here, I’m posting links to some of these workshops. Some other workshops were created as Shiny web apps; I’ll convert them into RMarkdown files to share here as well. Note that the presentations for all these workshops were made using Slidify, and the code produces reproducible results when compiled in R. The links to these workshops follow.

  1. Data Science overview
  2. Regression
  3. Gradient Descent
  4. Model selection
  5. Linear Classifiers
  6. Decision trees
  7. Information retrieval
  8. Clustering
  9. Big data paradigm
  10. MapReduce and RDBMS
  11. NoSQL selection
  12. Some useful resources for Data Science

As you can see from the slides, the intention behind the workshops is to provide a basic understanding of commonly used techniques and methods. I’ve also tried to keep the mathematical depth to a minimum while making these workshops. Suggestions for improvement are more than welcome.