Category Archives: Data Science

Learn the Ropes – Pig

Pig is a high-level platform for creating MapReduce programs that run on Hadoop, originally developed at Yahoo in 2006. It is a powerful tool for querying data in a Hadoop cluster, and it makes MapReduce jobs easier to write. Pig is handy … Continue reading

Posted in Big data analytics, Data Science, Distributed Computing, Hadoop, MapReduce

The Journey to Hadoop

In 1965, Intel co-founder Gordon Moore noticed that the number of transistors per square inch on integrated circuits had doubled every year since their invention. This later became known as “Moore’s Law” (REF). A common corollary is that the frequency … Continue reading

Posted in Big data analytics, Data Science, Distributed Computing, Hadoop, Parallel Computing

Learning from mistakes

The Use Case: My first endeavor in the field came in the form of a proof-of-concept: a near-real-time dashboard to monitor and alert on issues during a particular stage of order processing. As a developer of Order Pipeline … Continue reading

Posted in Data Science, Statistics

Getting the basics down

Machine learning is a field that has piqued my interest since college (yes, yes... it piques everyone’s interest; it’s a field everyone likes and a term people throw around randomly). When an opportunity to develop ‘predictive models for order processing’ using machine learning … Continue reading

Posted in Big data analytics, Data Science, Hadoop, Python, Statistics

A New Beginning

Introduction to the blog. Continue reading

Posted in Data Science