Another NAS with a Banana Pi

For a few years now I have been using a SheevaPlug as a NAS (see https://www.bluemind.org/linux-sheevaplug-perfect-nas and https://www.bluemind.org/linux-sheevaplug-perfect-nas-reloaded). The SheevaPlug was running fine, but I needed more processing power and memory to speed up my document search and SMB file copies, and to host a SaltStack master.

This install was made with custom SaltStack states driven from another machine, before the Banana Pi became my SaltStack master itself.

Base install

Arch Linux

Installing Arch Linux on a Banana Pi was not officially supported (at the time I did the install, a year ago), but as the hardware is very similar to the Cubieboard 2, it only needed a dedicated U-Boot build. Fortunately, someone made one:

http://archlinuxarm.org/forum/viewtopic.php?f=27&t=9445

I also optimized the SD card by formatting it as follows (see here for more info):

mkfs.ext4 -O ^has_journal -E stride=2,stripe-width=1024 -L "bananapi" -b 4096 -v /dev/sdb1
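
The stride and stripe width actually set can be verified afterwards (a quick check, assuming the same device):

tune2fs -l /dev/sdb1 | grep -Ei "stride|stripe"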

Then I changed the root password (su, then passwd) and set the hostname (/etc/hostname).

Finally, right after a full system upgrade, I copied over my custom SaltStack minion installer script (see here) and ran it:

pacman -Syu
chmod +x salt-minion-install.sh
./salt-minion-install.sh

On the Salt master side, thanks to the magic of desired-state configuration, a single command deployed my full NAS setup:

sudo salt 'mynas.local.lan' state.highstate

I’m not going to share every state I used, simply because some are pretty obvious and others are more a matter of personal taste.

Basically, here is what was done:

  • Remove the default alarm user
  • Install netatalk with Time Machine support
  • Manage hard disk power
    (https://wiki.archlinux.org/index.php/Hdparm#Power_management_configuration)
  • Install cronie / set timezone / change the anacrontab so jobs run during the night
  • Install samba + custom configuration and optimizations (see the sketch after this list)
  • Install + configure SSMTP (email sending)
  • Optimizations for the A20 SoC
  • Install and configure smartmontools
  • Install and configure NFS
  • Set custom mount options / custom IO scheduler
  • Set the cpufreq governor to ondemand
  • Install some sysadmin tools: bash settings, lsof, unzip…
  • Install + configure regain
  • Install + configure vsftpd
  • Create users
  • Set network optimizations
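
As an illustration of what these states look like, here is a simplified sketch of the Samba one (the file layout is hypothetical and the smb.conf content is not shown):

samba:
  pkg.installed: []
 
/etc/samba/smb.conf:
  file.managed:
    - source: salt://samba/smb.conf
    - user: root
    - group: root
    - mode: 644
    - require:
      - pkg: samba
 
smb:
  service.running:
    - enable: True
    - watch:
      - file: /etc/samba/smb.conf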

Performance tuning

The following forum thread was very instructive; based on it, I kept the optimizations explained below.

http://forum.lemaker.org/thread-15543-1-1.html

Initial calibration

Without any modification to the base Arch Linux install, I had:

  • hdparm -t -T /dev/sda: around 139 MB/s
  • FTP transfer (vsftpd): 18 MB/s read
  • Samba: 11.3 MB/s read and 12.0 MB/s write

All tests were done over a Cat 5 cable, with both machines on the same switch.

Tuning operations

Force the Ethernet IRQ onto CPU 1:

# find the IRQ number used by eth0 (48 on this board)
grep eth0 /proc/interrupts | cut -f 1 -d ":" | tr -d " "
# pin this IRQ to CPU 1 (bitmask 2 = CPU1)
echo 2 > /proc/irq/48/smp_affinity
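
To avoid hardcoding the IRQ number, both steps can be combined into one (a sketch):

IRQ=$(grep eth0 /proc/interrupts | cut -f 1 -d ":" | tr -d " ")
echo 2 > "/proc/irq/$IRQ/smp_affinity"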

Tune CPU frequency scaling:

echo ondemand > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
echo 960000 > /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq
echo 528000 > /sys/devices/system/cpu/cpu0/cpufreq/scaling_min_freq
echo 1 > /sys/devices/system/cpu/cpufreq/ondemand/io_is_busy
echo 25 > /sys/devices/system/cpu/cpufreq/ondemand/up_threshold
echo 10 > /sys/devices/system/cpu/cpufreq/ondemand/sampling_down_factor

Adjust TCP stack buffers and properties:

sysctl -w net.core.rmem_max=8738000
sysctl -w net.core.wmem_max=6553600
sysctl -w net.ipv4.tcp_rmem="8192 873800 8738000"
sysctl -w net.ipv4.tcp_wmem="4096 655360 6553600"
sysctl -w vm.min_free_kbytes=65536
sysctl -w net.ipv4.tcp_window_scaling=1
sysctl -w net.ipv4.tcp_timestamps=1
sysctl -w net.ipv4.tcp_sack=1
sysctl -w net.ipv4.tcp_no_metrics_save=1

Finally, set a bigger transmit queue on eth0:

ip link set eth0 txqueuelen 10000

Results

  • FTP transfer (vsftpd): 77 MB/s read
  • Samba: 24.1 MB/s read and 20.3 MB/s write

While transferring a file, the smbd process takes 100% of one CPU core. Since a single file transfer seems to use only one thread, the CPU would have to be overclocked to get better performance, and the A20 does not cope well with heavy overclocking.
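
For reference, this is the kind of Samba tuning involved; the smb.conf snippet below is an illustrative sketch rather than my exact configuration:

[global]
    # avoid Nagle delays and enlarge the socket buffers
    socket options = TCP_NODELAY IPTOS_LOWDELAY SO_RCVBUF=65536 SO_SNDBUF=65536
    # let the kernel push file data directly to the socket
    use sendfile = yes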

NFS is more efficient, so I tested it as well (NFSv4; ports 2049 and 111 must be open on the firewall, for both TCP and UDP).

Result:

  • 57.3 MB/s read (I did not benchmark write speed).
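
For reference, a minimal NFSv4 export could look like this (the path and subnet below are placeholders):

# /etc/exports
/srv/nas 192.168.1.0/24(rw,async,no_subtree_check)

Then apply it with:

exportfs -ra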

Some of the SaltStack states used

As I said earlier, I’m not sharing all my states, but the following ones relate directly to the tuning described above.

SD card optimization

sls file:

# SD-card-optimized mount options for a root fs on SD card or eMMC flash
/:
  mount.mounted:
    - device: {{ grains['rootfs'] }}
    - fstype: ext4
    - opts: defaults,async,barrier=0,commit=100,noatime,nodiratime,errors=remount-ro
    - dump: 0
    - pass_num: 1
 
# set the default IO scheduler to deadline for the SD card
# deadline can merge small accesses, lowering SD card latency
/etc/udev/rules.d/60-schedulers.rules:
  file.managed:
    - source: salt://sdcard_optim/60-schedulers.rules
    - user: root
    - group: root
    - mode: 644

 

60-schedulers.rules file:

# set deadline scheduler for sdcard
ACTION=="add|change", KERNEL=="mmcblk[0-9]", ATTR{queue/scheduler}="deadline"

Gigabit optimization

sls file:

/etc/sysctl.d/10-iptuning.conf:
  file.managed:
    - source: salt://gbnetoptim/10-iptuning.conf
    - user: root
    - group: root
    - mode: 644
 
/etc/udev/rules.d/60-custom-txqueuelen.rules:
  file.managed:
    - source: salt://gbnetoptim/60-custom-txqueuelen.rules
    - user: root
    - group: root
    - mode: 644
 
gbnetoptim_reload_udev:
  cmd.run:
    - name: udevadm control --reload-rules
 
gbnetoptim_change_txqueuelen:
  cmd.run:
    - name: ip link set eth0 txqueuelen 10000
 
gbnetoptim_reload_sysctl:
  cmd.run:
    - name: sysctl --system

 

file 10-iptuning.conf:

net.core.rmem_max = 8738000
net.core.wmem_max = 6553600
net.ipv4.tcp_rmem = 8192 873800 8738000
net.ipv4.tcp_wmem = 4096 655360 6553600
# tcp timestamps disabled: less CPU usage on a small ARM SoC
net.ipv4.tcp_timestamps = 0
net.ipv4.tcp_window_scaling = 1
net.ipv4.tcp_sack = 1
net.ipv4.tcp_no_metrics_save = 1
 
vm.min_free_kbytes = 65536

 

60-custom-txqueuelen.rules file:

KERNEL=="eth[0,1]", RUN+="/sbin/ip link set %k txqueuelen 10000"
KERNEL=="eth[0,1]", RUN+="/sbin/ip link set %k txqueuelen 10000"

A20 CPU optimizations

sls file:

# set the default IO scheduler to deadline for SATA disks
# deadline gives the best performance on the A20
/etc/udev/rules.d/65-schedulers.rules:
  file.managed:
    - source: salt://a20_optim/65-schedulers-sata.rules
    - user: root
    - group: root
    - mode: 644
 
/etc/systemd/system/a20_optim.service:
  file.managed:
    - source: salt://a20_optim/a20_optim.service
    - user: root
    - group: root
    - mode: 644
 
a20_optim_reload_systemd:
  cmd.run:
    - name: systemctl daemon-reload
 
a20_optim:
  service.running:
    - enable: True
 
a20_reload_udev:
  cmd.run:
    - name: udevadm control --reload-rules

 

file 65-schedulers-sata.rules:

# set the deadline scheduler for SATA disks (best performance on the A20)
ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/scheduler}="deadline"

 

file a20_optim.service:

[Unit]
Description=a20 optimizations service
After=network.target
 
[Service]
Type=oneshot
# set min and max CPU frequencies
ExecStart=/bin/sh -c "echo 528000 >/sys/devices/system/cpu/cpu0/cpufreq/scaling_min_freq"
ExecStart=/bin/sh -c "echo 960000 >/sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq"
# do not count IO wait as idle time (faster transfers)
ExecStart=/bin/sh -c "echo 1 > /sys/devices/system/cpu/cpufreq/ondemand/io_is_busy"
# tune ondemand to be more reactive
ExecStart=/bin/sh -c "echo 25 > /sys/devices/system/cpu/cpufreq/ondemand/up_threshold"
ExecStart=/bin/sh -c "echo 10 > /sys/devices/system/cpu/cpufreq/ondemand/sampling_down_factor"
# route the network interface IRQ to CPU1 (CPU0 handles the SATA IRQ)
ExecStart=/bin/sh -c "echo 2 > /proc/irq/48/smp_affinity"
RemainAfterExit=yes
 
[Install]
WantedBy=multi-user.target
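
Once deployed, the unit can be checked by hand (a sketch):

systemctl status a20_optim.service
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq   # should print 960000
cat /proc/irq/48/smp_affinity                               # should print 2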
