- By Adam Nagus
Building a strong partner ecosystem, even between technology companies, is one of the main reasons technology organisations survive and grow into successful businesses. At Digimasters we focus on supporting the growth of alliances between companies, from technology firms to large consultancies and small resellers. In this article we explore a new partnership between Dataiku and Snowflake, why we believe it is a good match, and how it will support a strategy focused on customer success for both organisations.
Dataiku is a relatively new partner of Digimasters. The partnership came about not just because we are both based in the same building in Aldgate, London, or even because there are ex-Qlik employees now working for Dataiku, but because Dataiku was the first company I had met that focused on data science as a collaborative discipline, rather than something for the nomad data scientist developing his or her own R/Python models in a corner of the office.
Collaborating, iterating and scaling in a very intuitive environment is the core of the Dataiku Data Science Studio (DSS) platform. If you haven't seen DSS then I recommend you check out this link.
One of the most important parts of a partnership is being able to pronounce your partner's name correctly. When we talk about Dataiku I hear many different variations, so before we get into the meat of this article, let me cover how the company name was created.
Dataiku is a "portmanteau" word combining Data - information that is produced or stored by a computer - and Haiku - a very short and structured form of Japanese poetry renowned for its soothing simplicity. And that is the core concept of the company: simplicity. Keep data and the technologies around it SIMPLE yet complete, straightforward yet structured.
Founded in 2013 by four French data scientists, Dataiku released its first version of Data Science Studio (DSS) just one year later. Since then the company has grown to more than 130 members and 150 customers around the world and across industries. Its innovative profile has already been recognised by market leaders, and Gartner has named it a "Visionary" for the second consecutive year.
What I have enjoyed about using Dataiku DSS is not only its ease of use but also its effectiveness as an end-to-end data science platform. With its emphasis on productionisation, DSS tackles the issue most businesses struggle with: getting machine learning running in production.
Snowflake is another young company that has exploded on to the data and analytics scene, this time from the west coast of the US, with almost half a billion dollars in series A and B funding from investors.
Snowflake is a 100% cloud-only data warehouse solution, currently available on AWS and about to go live on Microsoft Azure in the latter part of 2018. To its growing enterprise customer base, Snowflake is a relational SQL database with a massive twist. Snowflake uses AWS S3 to store structured and semi-structured data, which means that the cost of storing data is extremely small. Blob storage like S3 may be cheap, but it isn't particularly performant. This is where Snowflake's innovation and USP come in.
Snowflake integrates with a cloud platform's elastic scaling architecture in order to grow and shrink its computing power in seconds, based on the demand for data coming from applications, reports, analytics and other data queries. With Snowflake's per-second billing, you know you are paying only for the compute power you are actually using, and there won't be any bottlenecks in the data architecture at 9am, when everyone starts running their queries and analysis for the day.
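To make the billing point concrete, here is a minimal sketch in Python comparing per-second billing with billing rounded up to whole hours. The function names and the hourly rate are hypothetical and chosen purely for illustration; they are not Snowflake's actual prices or billing rules.

```python
import math


def per_second_cost(seconds_used: int, rate_per_hour: float) -> float:
    """Cost when every second of compute is billed individually."""
    return seconds_used * rate_per_hour / 3600


def per_hour_cost(seconds_used: int, rate_per_hour: float) -> float:
    """Cost when usage is rounded up to the next whole hour."""
    return math.ceil(seconds_used / 3600) * rate_per_hour


# Hypothetical figures: a 20-minute burst of queries at an assumed rate of 4.00 per hour.
burst_seconds = 20 * 60
assumed_rate = 4.00

print(f"per-second billing: {per_second_cost(burst_seconds, assumed_rate):.2f}")
print(f"hourly billing:     {per_hour_cost(burst_seconds, assumed_rate):.2f}")
```

The shorter and spikier the workload, the bigger the gap between the two figures, which is why per-second billing pairs so well with compute that scales up and down on demand.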
There are many more benefits to Snowflake's services, and if you want to geek out with us about their use of NoSQL to optimise how data is stored in S3, we would be more than happy to talk you through the details. For now, however, we want to focus on the benefits that a technology partnership between Dataiku, Snowflake and Digimasters can bring to the world of data science.
Why a plugin connector?
Connecting Dataiku to a data source such as S3 is natively supported. However, Digimasters' analytics architecture always recommends that you design your analytics platform to make use of 'in-database' processing where possible. This means that you don't have to pull the data from your data warehouse or data lake into your analytics tool for processing; all the processing takes place where the data is stored, in the warehouse or analytics mart.
With very large data volumes, it makes little sense to move all the data from S3 to DSS. To make sure your analytics workflow is efficient and performant, you ideally want a fast, scalable and cost-effective analytics database working with Dataiku that supports in-database processing.
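The difference between pulling data out and processing it in-database can be sketched in a few lines of Python. This example uses the standard library's sqlite3 module purely as a stand-in for a data warehouse; the table, column and function names are hypothetical, and nothing here is specific to Snowflake or DSS.

```python
import sqlite3

# An in-memory SQLite database stands in for the warehouse in this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north", 120.0), ("south", 80.0), ("north", 30.0), ("south", 20.0)],
)


def totals_pulled(connection: sqlite3.Connection) -> dict:
    """Pull every row into the analytics tool, then aggregate there."""
    totals = {}
    for region, amount in connection.execute("SELECT region, amount FROM sales"):
        totals[region] = totals.get(region, 0.0) + amount
    return totals


def totals_in_database(connection: sqlite3.Connection) -> dict:
    """Push the aggregation down so only one row per region leaves the database."""
    query = "SELECT region, SUM(amount) FROM sales GROUP BY region"
    return dict(connection.execute(query).fetchall())


# Both approaches give the same answer; only the amount of data moved differs.
print(totals_pulled(conn))
print(totals_in_database(conn))
```

With four rows the difference is invisible, but with billions of rows the first approach moves the whole table over the network while the second moves only the summary.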
DSS comes with a number of connectors and also extensions or, in other words, plugins. Open to everyone, a plugin is a package of code with a basic graphical interface on top of DSS that enables the platform to connect or integrate with other technologies. After a customer asked Dataiku to support the evaluation of Snowflake, the decision was to investigate how Dataiku could utilise S3 and Snowflake to create a performant and very scalable productionised workflow for machine learning. The output of the investigation is a live plugin for DSS which supports a graphical interface for moving data between S3 and Snowflake as well as using in-database processing of Machine Learning models on the data in Snowflake.
To find out more about the Dataiku and Snowflake plugin, and how to install and configure it, please click here.
What does Snowflake enable?
This DSS Plugin offers the ability to quickly load data stored in S3 into Snowflake. Dataiku and Snowflake are two complementary solutions. Snowflake is a high performance, scalable data warehouse optimised for analytics workloads, and DSS can use it as a "backend" computation engine for machine learning workflows.
The Plugin emulates a DSS "Sync" Recipe but leverages the built-in Snowflake mechanisms for fast loading of data stored in Amazon S3, enabling high performance data processing and ML modelling.
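As a rough illustration of what a bulk load from S3 can look like, the sketch below builds a Snowflake COPY INTO statement in Python. This is an assumption about the kind of mechanism involved, not a description of how the plugin is actually implemented. The helper function, table name, stage name and path are all hypothetical, and the statement assumes an external stage pointing at the S3 bucket has already been created in Snowflake.

```python
def build_copy_statement(table: str, stage: str, path: str = "", skip_header: int = 1) -> str:
    """Build a COPY INTO statement that loads CSV files from a named stage.

    Hypothetical helper for illustration only. Identifiers are inserted as-is,
    so real code should validate them before building the statement.
    """
    location = f"@{stage}/{path}" if path else f"@{stage}"
    return (
        f"COPY INTO {table} "
        f"FROM {location} "
        f"FILE_FORMAT = (TYPE = CSV SKIP_HEADER = {skip_header})"
    )


# Hypothetical table, stage and path names.
statement = build_copy_statement("web_orders", "s3_landing_stage", "orders/2018/")
print(statement)
```

The appeal of this approach is that the files travel directly from S3 into Snowflake, so the data never has to pass through the analytics tool on the way.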
So what does that mean? If you work in retail, you can develop a productionised recommendation engine for your website that can scale to any number of concurrent users, because Snowflake's automatic elastic scaling technology is part of the core architecture, supporting the decisions and recommendations created by the models developed in DSS.
If you are working in cyber security or fraud detection, you now have a big data scalable warehouse hosted in the cloud powering your threat analysis and predictive models.
How the plugin was developed
As I mentioned above, the idea of the plugin connector was initiated by a Dataiku customer. The customer wanted to implement a Snowflake data warehouse and have DSS running on top to perform the Machine Learning (ML) tasks. The customer soon realised that performance suffered as data volumes increased. They liked both products, but the performance made it difficult for them to scale their projects (who likes slow processes anyway?). Both teams, Dataiku and Snowflake, showed a willingness to do something about that. Dataiku, in collaboration with Snowflake, built a new Plugin that resolves the problem. For both companies that was a customer success story, but it is only the beginning of a new partnership.
What next with the partnership
Digimasters hopes to see a lot more collaboration and innovation between technology companies and we enjoy testing and working with our own partners and customers to realise the new benefits that technology alliances can bring. As part of our own Partner Management and Alliance strategy service, we focus on identifying similar synergies between companies and help them build a business case for creating new assets. If you'd like to know more about how to develop a strategic partner programme between technology companies and consulting organisations then please get in touch with us here at Digimasters.
For Dataiku and Snowflake, I'd like to see continued investment in the plugin to the point it becomes a native connector. With Snowflake likely to expand to cloud providers other than AWS, I hope Dataiku will get ahead of the game and have integration with Snowflake on Azure close to Snowflake's own release schedule.
I hope you found this article interesting, thought provoking and helpful. Please give it a Like/share on LinkedIn. I would love to hear your thoughts in a comment below or on the LinkedIn post.
Please check out our other blog articles or try some of our visual analytics demos. If you would like to set up a meeting to discuss anything you have read or heard on our website, then please click here.
For more information on Dataiku, please check out this link.