Modernizing SQL Analytics: Dremio and FlashBlade

  • Cloud-native disaggregation of query engines and storage, making it simple to independently scale, operate, and upgrade systems.
  • Ability to query filesystem and object data together, allowing you to store data at the best location without requiring extra copies.
  • Support for Hive tables enables in-place migration from legacy Hadoop warehouses.

How to Configure Dremio and FlashBlade

Configure an S3 Data Lake Source

pureuser@irp210-c01> pureobjaccount create dremio
pureuser@irp210-c01> pureobjuser create dremio/dremio-user
pureuser@irp210-c01> pureobjuser access-key create --user dremio/dremio-user
  • fs.s3a.endpoint should point to the FlashBlade data VIP
  • fs.s3a.path.style.access should be set to "true"
  • fs.s3a.connection.ssl.enabled should be set to "false", unless you have imported a valid certificate into the FlashBlade, in which case you can enable SSL connections

Configure a Filesystem Data Lake

# Extra Volumes
# Array to add extra volumes to all Dremio resources.
extraVolumes:
- name: irp210-data
  persistentVolumeClaim:
    claimName: irp210-pvc-import
- name: phonehome-data
  nfs:
    server: 10.61.204.100
    path: /phonehome
# Extra Volume Mounts
# Array to add extra volume mounts to all Dremio resources, normally
# used in conjunction with extraVolumes.
extraVolumeMounts:
- name: irp210-data
  mountPath: "/datalake"
- name: phonehome-data
  mountPath: "/phonehome"
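Each entry in extraVolumeMounts has to name a volume defined in extraVolumes, and a typo in either list only shows up once the pods fail to start. The following is a minimal sketch of a pre-deployment check, assuming the two lists have already been loaded into Python structures; the helper name `unmatched_mounts` is hypothetical and not part of Dremio or Kubernetes tooling.

```python
def unmatched_mounts(extra_volumes, extra_volume_mounts):
    """Return names of mounts that do not reference any defined volume."""
    defined = {volume["name"] for volume in extra_volumes}
    return [m["name"] for m in extra_volume_mounts if m["name"] not in defined]


# The same values as the Helm snippet above, written as Python structures.
extra_volumes = [
    {"name": "irp210-data",
     "persistentVolumeClaim": {"claimName": "irp210-pvc-import"}},
    {"name": "phonehome-data",
     "nfs": {"server": "10.61.204.100", "path": "/phonehome"}},
]
extra_volume_mounts = [
    {"name": "irp210-data", "mountPath": "/datalake"},
    {"name": "phonehome-data", "mountPath": "/phonehome"},
]

missing = unmatched_mounts(extra_volumes, extra_volume_mounts)
print(missing or "all mounts reference a defined volume")
```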

Configure a Hive Data Lake

  • fs.s3a.endpoint should point to a FlashBlade data VIP
  • fs.s3a.path.style.access should be set to true
  • fs.s3a.connection.ssl.enabled can optionally be set to false if you have not imported a certificate into the FlashBlade
  • fs.s3a.access.key should be set to the access key from the FlashBlade
  • fs.s3a.secret.key should be set to the secret key from the FlashBlade
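The properties listed above can be assembled programmatically, which keeps the S3 and Hive sources consistent. This is a rough sketch only: the function `s3a_properties`, the name/value list shape, and the environment variable names are assumptions made for illustration, and the endpoint address is just the example data VIP used elsewhere in this post.

```python
import os


def s3a_properties(endpoint, use_ssl=False, access_key=None, secret_key=None):
    """Build the fs.s3a.* settings for a FlashBlade-backed source.

    Credentials are included only when both keys are supplied, since
    the plain S3 source takes them through its own fields.
    """
    props = {
        "fs.s3a.endpoint": endpoint,
        "fs.s3a.path.style.access": "true",
        "fs.s3a.connection.ssl.enabled": "true" if use_ssl else "false",
    }
    if access_key is not None and secret_key is not None:
        props["fs.s3a.access.key"] = access_key
        props["fs.s3a.secret.key"] = secret_key
    # A name/value list is an assumed, convenient shape for a property editor.
    return [{"name": name, "value": value} for name, value in props.items()]


# Hypothetical environment variable names; the fallbacks are placeholders.
hive_props = s3a_properties(
    "10.62.64.200",
    access_key=os.environ.get("FLASHBLADE_ACCESS_KEY", "ACCESS"),
    secret_key=os.environ.get("FLASHBLADE_SECRET_KEY", "SECRET"),
)
for prop in hive_props:
    print(prop["name"], "=", prop["value"])
```

Calling the function without keys yields only the three endpoint settings, which matches the list given for the S3 source.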

Automating Data Lake Configurations

Filesystem or Object Store? Both

SELECT cc.cc_class,
       SUM(cs.cs_sales_price) AS "sales total"
FROM "flashblade-s3".joshuarobinson."external_tpcds"."catalog_sales" AS cs
INNER JOIN "flashblade-nfs"."call_center" AS cc
  ON cs.cs_call_center_sk = cc.cc_call_center_sk
GROUP BY cc.cc_class
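To see what this join computes without a running cluster, the same query shape can be tried against two small in-memory tables. This is only an illustration using SQLite in place of Dremio: the rows are made-up toy data, and the two tables stand in for the object-store and NFS sources.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Stand-in for the fact table stored on S3.
    CREATE TABLE catalog_sales (cs_call_center_sk INTEGER, cs_sales_price REAL);
    -- Stand-in for the dimension table stored on NFS.
    CREATE TABLE call_center (cc_call_center_sk INTEGER, cc_class TEXT);
    INSERT INTO catalog_sales VALUES (1, 10.0), (1, 5.5), (2, 20.0), (3, 7.0);
    INSERT INTO call_center VALUES (1, 'large'), (2, 'medium'), (3, 'large');
""")

# Same join and aggregation as the query above; ORDER BY is added here
# only to make the toy output deterministic.
rows = conn.execute("""
    SELECT cc.cc_class,
           SUM(cs.cs_sales_price) AS "sales total"
    FROM catalog_sales AS cs
    INNER JOIN call_center AS cc
      ON cs.cs_call_center_sk = cc.cc_call_center_sk
    GROUP BY cc.cc_class
    ORDER BY cc.cc_class
""").fetchall()

for cc_class, total in rows:
    print(cc_class, total)
```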

Distributed Store on S3

distStorage:
  type: "aws"
  aws:
    bucketName: "dremio"
    path: "/"
    authentication: "accessKeySecret"
    credentials:
      accessKey: "ACCESS"
      secret: "SECRET"

    extraProperties: |-
      <property>
        <name>fs.s3a.endpoint</name>
        <value>10.62.64.200</value>
      </property>
      <property>
        <name>fs.s3a.path.style.access</name>
        <value>true</value>
      </property>
      <property>
        <name>dremio.s3.compat</name>
        <value>true</value>
      </property>
      <property>
        <name>fs.s3a.connection.ssl.enabled</name>
        <value>false</value>
      </property>
      <property>
        <name>fs.s3a.aws.credentials.provider</name>
        <value>org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider</value>
      </property>
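Writing the property blocks by hand is error-prone, so one option is to generate them from a dictionary and paste the result under extraProperties. The sketch below assumes nothing beyond the Python standard library; the helper name `render_properties` is hypothetical, and the values are the ones shown in the configuration above.

```python
from xml.sax.saxutils import escape


def render_properties(props):
    """Render a dict as Hadoop-style <property> elements."""
    blocks = []
    for name, value in props.items():
        blocks.append(
            "<property>\n"
            f"  <name>{escape(name)}</name>\n"
            f"  <value>{escape(str(value))}</value>\n"
            "</property>"
        )
    return "\n".join(blocks)


flashblade_props = {
    "fs.s3a.endpoint": "10.62.64.200",
    "fs.s3a.path.style.access": "true",
    "dremio.s3.compat": "true",
    "fs.s3a.connection.ssl.enabled": "false",
    "fs.s3a.aws.credentials.provider":
        "org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider",
}

print(render_properties(flashblade_props))
```

The output still needs to be indented to sit inside the YAML block scalar when it is pasted into the values file.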

Performance Testing

Conclusion

Joshua Robinson