Developed an ETL pipeline and dashboard to process and display cloud-stored data for 75K+ live devices

Client Overview

A leading US-based manufacturer of residential and commercial water heaters and boilers, as well as heating, ventilating and air conditioning equipment, delivering a new level of efficiency, convenience and comfort to users.

Business Challenge

The client has 75K+ devices in the field, and their data is stored on different platforms: some devices’ data in the ClearBlade cloud and the rest in MongoDB. This made it challenging for the client to monitor devices, analyze the data and find possible errors or anomalies. The client was looking for a DataLake solution that would continuously extract data from the existing ecosystem, transform it to generate aggregations, and store the processed data in a centralized, easily accessible location to support analytics such as time series and histograms.

VOLANSYS Contribution

VOLANSYS helped its home automation client define a DataLake pipeline using Apache Airflow as the job scheduler, AWS EMR for data processing, AWS Glue as the data catalog and AWS S3 as the data store. We also developed a dashboard utility to showcase the processed data, supporting several analyses of the system and helping the client’s support team quickly identify issues. Our team also set up the AWS architecture for automated deployment of the ETL jobs and dashboard applications. The processed data can be used to develop algorithms and machine learning models that derive insights for future decision making.
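The extract, transform and load stages described above can be illustrated with a minimal standard-library sketch. It is not the production code, which runs as PySpark jobs on EMR scheduled by Airflow; the record fields (`device_id`, `ts`, `temp_c`), the sample rows and the `processed/date=.../hour=...` key layout are all assumptions made for illustration.

```python
from collections import defaultdict
from datetime import datetime, timezone

# Hypothetical raw records standing in for rows pulled from the two source platforms.
CLEARBLADE_ROWS = [
    {"device_id": "wh-001", "ts": "2023-05-01T10:05:00+00:00", "temp_c": 48.0},
    {"device_id": "wh-001", "ts": "2023-05-01T10:35:00+00:00", "temp_c": 52.0},
]
MONGODB_ROWS = [
    {"device_id": "bl-007", "ts": "2023-05-01T10:15:00+00:00", "temp_c": 70.0},
    {"device_id": "bl-007", "ts": "2023-05-01T11:20:00+00:00", "temp_c": 74.0},
]


def extract():
    """Extract step: merge records from both sources into one list."""
    return CLEARBLADE_ROWS + MONGODB_ROWS


def transform(rows):
    """Transform step: average temp_c per device per UTC hour."""
    buckets = defaultdict(list)
    for row in rows:
        ts = datetime.fromisoformat(row["ts"]).astimezone(timezone.utc)
        hour = ts.replace(minute=0, second=0, microsecond=0)
        buckets[(row["device_id"], hour)].append(row["temp_c"])
    return [
        {
            "device_id": device_id,
            "hour": hour,
            "avg_temp_c": sum(values) / len(values),
            "samples": len(values),
        }
        for (device_id, hour), values in sorted(buckets.items())
    ]


def load(aggregates):
    """Load step: map each aggregate to a date-partitioned key (assumed layout)."""
    store = {}
    for agg in aggregates:
        key = (
            f"processed/date={agg['hour']:%Y-%m-%d}/"
            f"hour={agg['hour']:%H}/{agg['device_id']}.json"
        )
        store[key] = agg
    return store


def run_pipeline():
    """Run the three stages in order and return the in-memory 'data store'."""
    return load(transform(extract()))
```

Calling `run_pipeline()` returns a dictionary keyed by partition path. In a real deployment each stage would typically be a separate scheduled task so that a failed step can be retried on its own.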

  • Set up the job workflow using Apache Airflow
  • Developed ETL processing scripts in Python and PySpark for data processing using AWS S3, RDS, Athena, Glue and EMR
  • Developed a UI (dashboard) hosted on AWS using the Python Dash framework to display and monitor field device data such as
    • Number of connected devices
    • Device status – active/inactive
    • Device geographical location
    • Number of alarms processed in a given time range
    • Histogram
    • Time series graphs
    • Alarm Analysis
    • Logs and errors in devices at a given time
  • Developed a common authentication/authorization gateway in Angular and Node.js, integrated with the client’s Azure Active Directory
  • Set up automated deployment for Apache Airflow, data processing scripts, the visualization tool and other applications using GitHub Actions and AWS CloudFormation. Deployment runs on AWS ECS (Fargate) with Load Balancer mapping, and application Docker images are stored on ECR
  • Set up monitoring dashboards and alerts using AWS CloudWatch
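
Several of the dashboard views listed above (device status, alarms in a time range, histogram) reduce to small aggregations over the processed data. The following is a rough sketch of that logic using only the Python standard library, not the dashboard's actual code; the one-hour inactivity threshold, the alarm codes and the sample values are assumptions chosen for illustration.

```python
from collections import Counter
from datetime import datetime, timedelta


def device_status(last_seen, now, max_silence=timedelta(hours=1)):
    """Label a device active if it reported within max_silence (assumed threshold)."""
    return "active" if now - last_seen <= max_silence else "inactive"


def alarms_in_range(alarms, start, end):
    """Count alarms per code with start <= ts < end."""
    return Counter(a["code"] for a in alarms if start <= a["ts"] < end)


def histogram(values, bin_width):
    """Count values per fixed-width bin, keyed by each bin's lower edge."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))


# Hypothetical sample data for a quick check of the helpers above.
NOW = datetime(2023, 5, 1, 12, 0)
LAST_SEEN = {
    "wh-001": datetime(2023, 5, 1, 11, 30),
    "bl-007": datetime(2023, 5, 1, 8, 0),
}
ALARMS = [
    {"code": "E01", "ts": datetime(2023, 5, 1, 9, 0)},
    {"code": "E01", "ts": datetime(2023, 5, 1, 10, 30)},
    {"code": "E14", "ts": datetime(2023, 5, 1, 11, 0)},
    {"code": "E14", "ts": datetime(2023, 5, 1, 13, 0)},
]

if __name__ == "__main__":
    for device_id, seen in LAST_SEEN.items():
        print(device_id, device_status(seen, NOW))
    print(alarms_in_range(ALARMS, datetime(2023, 5, 1, 10, 0), NOW))
    print(histogram([48, 52, 55, 61, 70], bin_width=10))
```

In a Dash application, functions like these would typically be called from callbacks that feed Plotly figures, with the inputs queried from the processed data store rather than hard-coded.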
Technologies | Engineering Expertise

Python 3 | PySpark | Node.js | Angular | Dash | Plotly | AWS ECS | S3 | Elastic Load Balancer | Auto Scaling Group | ECR | RDS (Postgres) | EMR | Apache Airflow | Glue | CloudWatch | Athena | Cloud Migration | Data Engineering | Software Engineering | Cloud

Solution Architecture

[Figure: ETL pipeline and dashboard architecture diagram]

Benefits Delivered
  • Robust and simplified architecture for ETL operation
  • Improved the efficiency of the client’s support team in finding device anomalies and their root causes, with an easy-to-use dashboard showing all device-related data, graphs and logs in a single place