S5cmd for High Performance Object Storage

Why Use S5cmd

  1. Written in Go, a compiled language designed for concurrency, rather than Python. The tool can make better use of multiple threads, and it runs faster because it is compiled rather than interpreted.
  2. Makes better use of multiple TCP connections to move data to and from the object store, which yields higher-throughput transfers.
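
The effect of keeping several transfers in flight can be sketched without any object store at all. The snippet below is only an illustration, not how s5cmd is implemented: `run_transfers` is a hypothetical helper, and `sleep 1` stands in for a network-bound request. It assumes `seq` and an `xargs` that supports `-P`.

```shell
#!/bin/sh
# Hypothetical stand-in for a network-bound transfer: each "part" just sleeps for one second.
# run_transfers PARTS WORKERS runs PARTS of them with at most WORKERS in flight at once.
run_transfers() {
    parts=$1
    workers=$2
    seq 1 "$parts" | xargs -P "$workers" -I{} sh -c 'sleep 1; echo "part {} done"'
}

# Four parts with four workers finish in roughly one second;
# with a single worker the same work would take roughly four.
run_transfers 4 4
```

Running `time run_transfers 4 1` next to `time run_transfers 4 4` shows the gap. The order of the output lines is not deterministic when more than one worker is used.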

Installation and Usage

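# ~/.aws/credentials
[default]
aws_access_key_id = XXXXXXX
aws_secret_access_key = YYYYYYYY

# Install s5cmd (requires a Go toolchain)
go get -u github.com/peakgames/s5cmd

# List the contents of a bucket
> s5cmd --endpoint-url http://10.62.64.200 ls s3://joshuarobinson/
+ DIR backup/
+ 2017/10/13 10:31:29 73 people.json
+ 2019/07/10 12:39:43 53687091200 two.txt
2019/08/02 03:08:13 +OK "ls s3://joshuarobinson" (13)

# Upload with the upload concurrency (-uw) raised to 64
> s5cmd --endpoint-url http://10.62.64.200:80 -uw 64 cp /source s3://joshuarobinson/dest

# Alias that fixes the endpoint and sets download (-dw) and upload (-uw) concurrency to 32
alias s5cmd='s5cmd --endpoint-url http://10.62.64.200 -dw 32 -uw 32'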

Test Environment

# Dockerfile
FROM golang:alpine
RUN apk add git
RUN go get -u github.com/peakgames/s5cmd

# Time a single upload from inside the container
docker run -it --rm \
--entrypoint=time \
-v /home/ir/.aws/credentials:/root/.aws/credentials \
-v /tmp/one.txt:/tmp/one.txt \
$IMGNAME s5cmd --endpoint-url http://10.62.64.200:80 -uw 32 cp /tmp/one.txt s3://joshuarobinson/one.txt

Alternative Tools

S3cmd

# Dockerfile
FROM ubuntu:18.04
RUN apt-get update && apt-get install -y s3cmd --no-install-recommends \
&& rm -rf /var/lib/apt/lists/*

# s3cmd configuration
access_key = XXXXXXXX
proxy_host = $FB_DATAVIP
proxy_port = 80
secret_key = YYYYYYYYY

S4cmd

# Dockerfile
FROM ubuntu:18.04
RUN apt-get update && apt-get install -y git python3 python3-pip python3-setuptools --no-install-recommends \
&& rm -rf /var/lib/apt/lists/*
ARG S4RELEASE=2.1.0
# GitHub no longer serves the unauthenticated git:// protocol, so clone over HTTPS
RUN git clone https://github.com/bloomreach/s4cmd.git \
&& cd s4cmd && git checkout tags/$S4RELEASE -b release \
&& pip3 install pytz boto3 && python3 setup.py install

# Upload a single file with 128 threads
docker run -it --rm -v /tmp:/tmp \
$IMGNAME s4cmd --endpoint-url http://10.62.64.200:80 --num-threads=128 put /tmp/one.txt s3://joshuarobinson/one.txt

Goofys

> export GOPATH=$HOME/work
> go get github.com/kahing/goofys
> go install github.com/kahing/goofys
> goofys --endpoint http://10.62.64.200 joshuarobinson /mountpoint/
> time cp /tmp/one.txt /mountpoint/one.txt
> rm /mountpoint/one.txt

Aws-cli

# ~/.aws/config (the nested s3 settings must be indented under "s3 =")
[default]
s3 =
  max_concurrent_requests = 1000
  max_queue_size = 10000
  multipart_threshold = 64MB
  multipart_chunksize = 16MB

# Dockerfile
FROM ubuntu:18.04
RUN apt-get update && apt-get install -y awscli --no-install-recommends \
&& rm -rf /var/lib/apt/lists/*
COPY config /root/.aws/config

Large Object Performance

upload failed: tmp/one.txt to s3://joshuarobinson/one.txt filedescriptor out of range in select()

Small Object Upload

> split -C 1M /mnt/joshua/one.txt prefix-
> s5cmd cp /tmp/prefix-* s3://joshuarobinson/tempdata/

# Parallelize s3cmd by running up to 64 processes with xargs (-I {} replaces the deprecated -i)
ls /src/data-* | xargs -I {} -P 64 s3cmd -q put {} s3://bucketname/some/
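
The splitting step can be tried on a throwaway file before pointing it at real data. The sketch below is a minimal example under a few assumptions: `split_into_chunks` is a hypothetical helper, the input is a generated 3 MiB file of zeros, and it uses the portable `-b` (exact byte boundaries) rather than the GNU-specific `-C` (whole lines) used above.

```shell
#!/bin/sh
# Hypothetical helper: split SRC into 1 MiB pieces under OUTDIR and print how many were created.
split_into_chunks() {
    src=$1
    outdir=$2
    mkdir -p "$outdir"
    # -b cuts at exact byte boundaries; -C (GNU split) would keep lines whole instead.
    split -b 1048576 "$src" "$outdir/prefix-"
    ls "$outdir" | wc -l | tr -d ' '
}

# Generate a throwaway 3 MiB input so the sketch does not touch real data.
workdir=$(mktemp -d)
dd if=/dev/zero of="$workdir/sample.bin" bs=1048576 count=3 2>/dev/null
chunk_count=$(split_into_chunks "$workdir/sample.bin" "$workdir/chunks")
echo "created $chunk_count chunks"

# Remove the temporary files again.
rm -rf "$workdir"
```

The resulting `prefix-*` files are the kind of input the upload commands above expect; the chunk size can be changed to explore how object size affects throughput.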

Performance Comparison in AWS

Summary

Joshua Robinson

Data science, software engineering, hacking
