My job switched gears a year ago when I joined ADDO AI as Chief Data Scientist and Senior Partner. In addition to my responsibilities as a Data Scientist, I get to write responses to RFPs, train data science and data architecture teams, and manage the team and our external partnerships. Unfortunately, I've been neglecting this blog for quite some time. I've decided to revive it, as I usually have strong opinions about technology and, occasionally, something interesting to share with the community.

There has been one more change: I now use Python rather than R in my machine learning pipelines. I've previously mentioned my extensive use of R Markdown, nbconvert and Jekyll for blogging, so my first hurdle was setting up this blog to publish both R Markdown files and Jupyter notebooks as posts. I'll write a post soon on how that was done. My posts will be organized under the following tags:
- Programming: programming paradigms and tricks to optimize data science pipelines.
- Productivity: IDEs, collaboration tools and management tools.
- Production: production-grade machine learning and the points to watch out for.
- Workshops: either my presentations or the major watch points from each workshop.
- Infrastructure: the latest technology, comparisons, and the paradigms needed for production-grade machine learning.
- Data Modeling: data model and query design for different relational and NoSQL paradigms.
- Machine learning: machine learning problems that interest me.

Let me take the liberty of confessing some of my biases up front. Among machine learning frameworks, my first choice is TensorFlow, unless the task calls for Random Forests or XGBoost, in which case I prefer H2O. If a client has no regulatory constraints, I recommend moving to the cloud rather than an on-premises deployment. Among cloud providers, I really enjoy working on the Google Cloud Platform (read: BigQuery, Bigtable, Cloud ML). If the client requires an on-premises deployment, I go with Cloudera distributions. I'm also somewhat vocal about my choice of IDE: one Emacs to rule them all. In programming, I've moved from R to Python, and my style is more functional than imperative (I know, Python does not optimize tail calls, so what do I even mean by functional? More on that later). In future posts, I'll shed light on the reasons for these proclivities.