Snowflake vs Redshift

Snowflake Vs Redshift: Cloud Data Warehousing Comparison

Data warehousing is rapidly moving to the cloud and with attractive pricings, unique architectures, and highly scalable solutions – Snowflake and Redshift Cloud Data Warehouse are two of the most powerful systems in the industry currently for overall data management – data storage, data processing, data cleansing, data sharing, and data analytics.

Comparing Snowflake with Amazon Redshift

With companies moving from on-premises traditional data warehouses to cloud-based warehouses, there are many options for cloud data platforms such as Redshift, Bigquery, etc. Here we compare two of the most popular data warehouse services available at this point –

FeatureSnowflakeRedshift
MaintenanceThe maintenance is automated and requires no human intervention.  Redshift required some housekeeping activities by system administrator as it uses manual workload management (WLM) to monitor the usage
Data StructureSnowflake supports structures and semi structured data types (JSON, AVRO)Redshift works seamlessly with structured data but with data types such as JSON, there are major speed implications
ScalingSnowflake can instantly scale without redistributing data or interrupting usersWhile Redshift also allows Scaling, it can take minutes to hours to add new nodes to the cluster. 
IntegrationsSnowflake lacks the seamless integration with Amazon Technology but does well with Informatica, IBM Cognos, Qlik, Power BI, etc. Amazon Redshift integrates perfect with other Amazon products and services such as Athena, Glue, Glue, Database Migration Service (DMS) building a complete ecosystem

Diving deeper into Snowflake

Being a relatively new player in the industry and with their recent initial public offering that raised $3.36 billion, let’s look deeper into the company and what makes their platform so successful.

Snowflake is a cost-effective, powerful and secure data warehouse built for the cloud.  Built on Amazon Web Services(AWS), Google Cloud Platform, and Microsoft Azure, Snowflake is a pure SaaS (Software as a Service) offering, which means that users require no physical or virtual hardware setup, no software setup. 

Easy to scale and very cost-effective, Snowflake’s distinguishing feature is that it separates cloud computing from storage. Basically, you are only paying for the capacity and performance you use. With Snowflake, you can store all your data in a single place, and size your compute independently using a multi-cluster, shared data architecture that enables you to manage your user and query concurrency needs as they change.

Architecture

On a high level, the data in the tables are stored on S3 and consume only the storage cost. There is no computer cost attached to the database unless you’re executing DDL or DML queries. You will only pay for the computing costs such as when you’re using DDL statements to create databases, schemas, or other structural objects. 

For example- if all of the building activities take 30 mins, you’d pay only for 30 mins of that computing. Once your structure is in place, you’d want to load some data into the tables. If the data load takes 2 hours a day, you again only for the compute cost of the 2 hours. The other cost associated will only be for the storage. Similarly, if you’re running queries for 3 hours a day, you’d be paying for the 3 hours of computing power.

Snowflake’s architecture consists of three key layers:

  • Database Storage: At this layer, the loaded data is reorganized into an optimized, compressed, columnar format, on the cloud. The organization, metadata, compression, statistics, and other aspects of data storage are handled by Snowflake. Snowflake supports semi-structured variables such as JSON, AVRO, Parquet, XML, and ORC in addition to Varchar and Number.
  • Query Processing: The “processing layer” runs queries. Snowflake processes query using “virtual warehouses” or “VW”. VW can be of varying size (single node or multiple – and are nothing but EC2 instances internal to VW). Each virtual warehouse is an independent compute cluster not sharing compute resources with no impact on the performance of each virtual warehouse.
  • Cloud Services: The cloud services layer is a collection of services that coordinate activities across Snowflake. This layer serves as the SQL client interface for DDL and DML operations. 
    • Authentication and Access Control
    • Infrastructure management
    • Metadata management
    • Query parsing and optimization
    • Security

Key Benefits

  • Instant Scalability and Performance: Snowflake provides instant data warehouse scaling to handle concurrency bottlenecks during higher and lower demand periods.
  • Leverages standard ANSI SQL: Because Snowflake used standard SQL queries, the need for special DBAs is eliminated as most teams already use SQL for their IT activities making it easy to get the system up and running quickly.  
  • Supports Structures and Semi-structured Data: While other data warehouses mostly support Varchar and number data formats, Snowflake also provides support for unstructured data types such as JSON, AVRO, Parquet, XML, and ORC.
  • Security: Snowflake ensures the highest levels of encryption and security for users, accounts, and data. You can check out the detailed summary of the security features here
  • Cost-Effective: The Snowflake interface doesn’t track idle time and only considers the usage time. You are only billed for the storage and compute time. 
  • Cloud Benefits: With the cloud infrastructure, Snowflake comes with no tedious hardware/software setup requirements, easily accessible, handling its own maintenance and performance tuning.

Conclusion

To find value in the colossal amount of data generated, companies need a place to store, organize, and analyze the data. If your organization has a diverse data ecosystem, either of the two systems – Snowflake or Redshift might be the right fit for your ever-growing business.

Eeshita Srivastava

Product at HappyFox. I like SaaS products, hikes, and Bon Iver.