Data Engineering for a Quick Service Restaurant Category

Challenge

The client approached us to implement a data engineering platform that would collate and transform data from multiple ERPs, with the following objectives:

Unique Proposition:

1. Thin and scalable data engineering layer
2. Designed for the cloud, with pay-per-use pricing to minimize cost
3. Designed so that any micro-strategy can be added on top of the data architecture

The client had an idea, but they needed a technical partner:

1. Who could work in tandem with them to brainstorm, plan, design, develop, and implement a robust platform with all the features they had envisaged
2. With the capacity to scale up resources as their product grows and evolves in the future

Architecture

AWS architecture

Technology Stack

01

AWS

Redshift

  • Faster data loading from S3 and extremely fast transformations
  • Ability to pause the compute node when not in use – thereby saving costs
  • Columnar data store provides no limitation on concurrent queries
  • Optimized for structured data processing & traditional data warehousing use cases (surrogate key lookup, joins, aggregates)
  • Applicable for batch as well as near real time
  • Doesn’t require startup time
  • A curated data lake is optional; data can be exported to an S3 bucket
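The S3 load and the optional export described above map onto Redshift's COPY and UNLOAD statements. The following is a minimal sketch of how such statements could be assembled, not the client's actual pipeline: the table name, bucket paths, and IAM role ARN are hypothetical placeholders, and Parquet is an assumed file format.

```python
def build_copy_sql(table: str, s3_path: str, iam_role: str) -> str:
    """Return a Redshift COPY statement that bulk-loads Parquet files from S3."""
    return (
        f"COPY {table}\n"
        f"FROM '{s3_path}'\n"
        f"IAM_ROLE '{iam_role}'\n"
        "FORMAT AS PARQUET;"
    )


def build_unload_sql(query: str, s3_prefix: str, iam_role: str) -> str:
    """Return a Redshift UNLOAD statement that exports a query result to S3."""
    # UNLOAD takes the query as a quoted string, so inner quotes are doubled.
    escaped = query.replace("'", "''")
    return (
        f"UNLOAD ('{escaped}')\n"
        f"TO '{s3_prefix}'\n"
        f"IAM_ROLE '{iam_role}'\n"
        "FORMAT AS PARQUET;"
    )


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    role = "arn:aws:iam::123456789012:role/example-redshift-role"
    print(build_copy_sql("staging.pos_sales", "s3://example-bucket/pos/", role))
    print(build_unload_sql(
        "SELECT * FROM staging.pos_sales WHERE region = 'north'",
        "s3://example-bucket/curated/pos_north_",
        role,
    ))
```

The generated statements would be run through any Redshift SQL client; keeping them as plain strings makes the load step easy to review and test without a live cluster.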

02

EMR

  • Spot instances help keep the cost low
  • Ideal for unstructured/semi-structured data, transformations that are difficult to express in SQL, and data science use cases
  • Requires startup time (transient clusters)
  • The data lake was curated before loading to Redshift
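A transient, Spot-backed cluster of the kind described above could be requested roughly as follows. This is a sketch under stated assumptions: Spark, the release label, the instance types, the step name, and the script location are all illustrative choices not taken from the project. The dictionary uses the parameter names of boto3's EMR `run_job_flow` call.

```python
def build_transient_emr_request(name: str, script_s3_uri: str, core_count: int = 2) -> dict:
    """Build keyword arguments for boto3's EMR run_job_flow call.

    The cluster runs one step and then terminates (transient cluster),
    with core nodes requested on the Spot market to reduce cost.
    """
    return {
        "Name": name,
        "ReleaseLabel": "emr-6.15.0",          # assumed release
        "Applications": [{"Name": "Spark"}],   # assumed engine
        "Instances": {
            "InstanceGroups": [
                {
                    "Name": "primary",
                    "InstanceRole": "MASTER",
                    "Market": "ON_DEMAND",
                    "InstanceType": "m5.xlarge",
                    "InstanceCount": 1,
                },
                {
                    "Name": "core",
                    "InstanceRole": "CORE",
                    "Market": "SPOT",  # Spot capacity keeps the cost low
                    "InstanceType": "m5.xlarge",
                    "InstanceCount": core_count,
                },
            ],
            # Shut the cluster down once all steps have finished.
            "KeepJobFlowAliveWhenNoSteps": False,
            "TerminationProtected": False,
        },
        "Steps": [
            {
                "Name": "curate-data-lake",  # hypothetical step name
                "ActionOnFailure": "TERMINATE_CLUSTER",
                "HadoopJarStep": {
                    "Jar": "command-runner.jar",
                    "Args": ["spark-submit", script_s3_uri],
                },
            }
        ],
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


if __name__ == "__main__":
    request = build_transient_emr_request(
        "curation-job", "s3://example-bucket/jobs/curate.py"
    )
    # With AWS credentials configured, this would be submitted as:
    #   boto3.client("emr").run_job_flow(**request)
    print(request["Instances"]["InstanceGroups"][1]["Market"])
```

Because Spot capacity can be reclaimed, a job submitted this way should be safe to rerun; the startup time noted above is the price of not keeping a cluster running between jobs.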

Outcome

1

Ability to create a data lake by acquiring and transforming data from various sources such as PoS and ERP systems.

2

Curated data was split into multiple marts in Redshift on the basis of business and geographical demarcations.
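One way such a split could be expressed is a CREATE TABLE AS statement per demarcation. The sketch below is an illustration only: the `curated.sales` table, the `region` column, the `mart_` schema prefix, and the region values are hypothetical, and the values are assumed to be simple identifiers that are safe to use in a schema name.

```python
from typing import List


def build_mart_statements(source_table: str, column: str, values: List[str]) -> List[str]:
    """Return a CREATE SCHEMA and a CTAS statement for each mart value."""
    table_name = source_table.split(".")[-1]
    statements = []
    for value in values:
        schema = f"mart_{value.lower()}"
        literal = value.replace("'", "''")  # escape quotes in the SQL literal
        statements.append(f"CREATE SCHEMA IF NOT EXISTS {schema};")
        statements.append(
            f"CREATE TABLE {schema}.{table_name} AS "
            f"SELECT * FROM {source_table} WHERE {column} = '{literal}';"
        )
    return statements


if __name__ == "__main__":
    for statement in build_mart_statements("curated.sales", "region", ["North", "South"]):
        print(statement)
```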

3

Faster data queries: query time was reduced from several minutes to milliseconds.

4

Efficient indexing

5

Multiple query options were provided: on files (S3 via Athena) and on the database.
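The file-based option could look roughly like the sketch below, which builds the arguments for boto3's Athena `start_query_execution` call. The database name, table name, and result location are hypothetical placeholders, not details from the project.

```python
def build_athena_request(sql: str, database: str, output_location: str) -> dict:
    """Build keyword arguments for boto3's Athena start_query_execution call."""
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        # Athena writes query results to this S3 location.
        "ResultConfiguration": {"OutputLocation": output_location},
    }


if __name__ == "__main__":
    request = build_athena_request(
        "SELECT store_id, SUM(amount) AS total FROM pos_sales GROUP BY store_id",
        "curated_lake",                         # hypothetical database
        "s3://example-bucket/athena-results/",  # hypothetical result location
    )
    # With AWS credentials configured, this would be submitted as:
    #   boto3.client("athena").start_query_execution(**request)
    print(request["QueryExecutionContext"]["Database"])
```

The same SQL could be pointed at the Redshift marts instead, which is what makes the two query paths interchangeable for analysts.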
