How to deliver on data initiatives in a competitive talent market?

In recent years, the term data science has become more popular due to the influx of data in all businesses. Data science is about getting valuable insights and answering questions by analyzing data using statistical methods, computing power, and automation. When a business looks to answer a data-driven question, it should follow a set of predefined steps, known as the data science process, and understand what each step involves.

The data science process involves more than one role: business analysts, data engineers, data scientists, and developers. Even though there can be some overlap, each of these roles plays a vital part in the process. The business analyst provides the business understanding that guides the project. The data engineer prepares the data for use by the data scientist in model training. The data scientist must understand the data in order to train and test the model. The developer is responsible for deploying and operationalizing the model.
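The hand-offs between these roles can be sketched as a simple pipeline, one function per role. This is a hypothetical illustration only; the function names, the toy data, and the threshold "model" are my own inventions, not part of any specific toolkit:

```python
# A toy walk-through of the data science process, one stage per role.
# All names and data here are hypothetical illustrations.

def understand_business():
    """Business analyst: frame the question the model should answer."""
    return "Will a customer churn, given low monthly usage?"

def prepare_data(raw_rows):
    """Data engineer: clean raw records and keep only usable fields."""
    return [
        {"usage_hours": r["usage_hours"], "churned": r["churned"]}
        for r in raw_rows
        if r.get("usage_hours") is not None
    ]

def train_and_test(rows):
    """Data scientist: fit a toy threshold 'model' and check accuracy."""
    train, test = rows[: len(rows) // 2], rows[len(rows) // 2 :]
    # "Model": predict churn when usage is below the training-set average.
    threshold = sum(r["usage_hours"] for r in train) / len(train)
    correct = sum(
        (r["usage_hours"] < threshold) == r["churned"] for r in test
    )
    return threshold, correct / len(test)

def deploy(threshold):
    """Developer: wrap the trained model as a callable service."""
    return lambda usage_hours: usage_hours < threshold

raw = [
    {"usage_hours": 2, "churned": True},
    {"usage_hours": 40, "churned": False},
    {"usage_hours": 3, "churned": True},
    {"usage_hours": 35, "churned": False},
    {"usage_hours": None, "churned": False},  # dropped during preparation
]

question = understand_business()
rows = prepare_data(raw)
threshold, accuracy = train_and_test(rows)
predict = deploy(threshold)
print(question, f"accuracy={accuracy:.0%}", predict(1))
```

The point is the sequence of hand-offs, not the model itself: each role's output becomes the next role's input, which is why a gap in any one role stalls the whole process.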


These days, organizations are finding it hard to retain talent for their data science processes. Fueled by big data and AI, demand for data science skills is growing exponentially, according to job sites, while the supply of skilled applicants is growing at a slower pace. A KPMG CIO Survey of over 3,600 technology leaders at companies across the U.S. found that 46% of chief information officers see "big data and analytics" as the area most affected by the shortage in the nation's job market.

One way to address this shortage is to partner with vendors who offer data science services. This approach gives in-house data science teams access to resources, including industry knowledge, skills, and experience, to deliver great data products for data-driven decision making. Most vendors offer these services on a project basis. This is a great way to accelerate data work in large organizations, but it is hard to sustain over a long period of time due to cost, especially for small to midsize companies, and can cause data initiatives to slow down or go undelivered.

The model I have found to be more effective over the long term, especially for small to medium-size businesses, is DSaaS (Data Science as a Service), where the client has access to an entire data science team on a monthly subscription basis. This model keeps costs down and takes away the headaches that come with retaining a large data science team. Another reason I like this approach is that it aligns with the agile philosophy of delivery, which has a higher rate of success than the traditional waterfall approach. A few firms offer data strategy and engineering services in this format, delivering customized analytics and AI solutions.

What is Data Lake Storage?

A data lake is a large repository that stores both structured and unstructured data, allowing you to perform analytics on your data and prepare reports. Data Lake Storage combines the scalability and cost benefits of object storage with the reliability and performance of big data file system capabilities. The following illustration shows how Azure Data Lake stores all your business data and makes it available for analysis.
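As a minimal sketch of the "structured and unstructured together" idea, the snippet below lays out a toy lake as plain folders and files. The zone and folder names are hypothetical conventions of my own; they are not part of Azure Data Lake Storage itself, which exposes a similar hierarchical namespace at cloud scale:

```python
import csv
import json
import tempfile
from pathlib import Path

# Hypothetical lake layout: a "raw" zone holding files in their
# native formats, whatever their structure.
lake = Path(tempfile.mkdtemp()) / "lake" / "raw"

# Structured data: tabular sales records as CSV.
sales_dir = lake / "sales"
sales_dir.mkdir(parents=True)
with open(sales_dir / "2024-06.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerows([["order_id", "amount"], ["1001", "250"], ["1002", "99"]])

# Semi-structured data: clickstream events as JSON lines.
events_dir = lake / "events"
events_dir.mkdir(parents=True)
(events_dir / "clicks.jsonl").write_text(
    json.dumps({"user": "u1", "page": "/pricing"}) + "\n"
)

# Unstructured data: free-text support notes.
notes_dir = lake / "notes"
notes_dir.mkdir(parents=True)
(notes_dir / "ticket-42.txt").write_text("Customer asked about bulk pricing.")

# Analytics tooling can later walk the lake and pick a reader per format.
files = sorted(
    p.relative_to(lake).as_posix() for p in lake.rglob("*") if p.is_file()
)
print(files)
```

Unlike a data warehouse, nothing here forces a schema at write time: the CSV, JSON, and free-text files all land side by side, and structure is imposed only when the data is read for analysis.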