
Another NAS with a BananaPI
For a few years now, I have been using a SheevaPlug as a NAS (see https://www.bluemind.org/linux-sheevaplug-perfect-nas and https://www.bluemind.org/linux-sheevaplug-perfect-nas-reloaded). The SheevaPlug was running fine, but I needed more processing power and memory to speed up my document search and SMB file copies, and to host a SaltStack master.
This install was done with custom SaltStack states pushed from another machine, before the BananaPi itself became my SaltStack master.
Base install
Archlinux
Installing Arch Linux on a BananaPi was not officially supported (at the time I did the install, about a year ago), but since the hardware is very similar to the Cubieboard 2, it only needed a dedicated U-Boot. Fortunately, someone had already built one:
http://archlinuxarm.org/forum/viewtopic.php?f=27&t=9445
I also optimized the SD card by formatting it with the following command (see here for more info):
mkfs.ext4 -O ^has_journal -E stride=2,stripe-width=1024 -L "bananapi" -b 4096 -v /dev/sdb1
Then I changed the root password (su, passwd) and the hostname (/etc/hostname).
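In practice that boils down to a couple of commands (the hostname value below is just an example; on a fresh Arch Linux ARM image the root password defaults to root):

su                              # become root (default password: root)
passwd                          # set a new root password
echo "mynas" > /etc/hostname    # example hostname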
Finally, I copied over my custom SaltStack minion installer script (see here) and ran it right after a full system upgrade:
pacman -Syu
chmod +x salt-minion-install.sh
./salt-minion-install.sh
On the Salt master side, thanks to the magic of desired state configuration, a single command deployed my full NAS setup:
sudo salt 'mynas.local.lan' state.highstate
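When iterating on a single state, it is also possible to apply just that one instead of the full highstate; for example (the state name here is only an illustration):

sudo salt 'mynas.local.lan' state.sls sdcard_optim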
I’m not going to share every state I used, simply because some are pretty obvious and others are a matter of personal taste.
Basically, what was done is:
- Remove the default alarm user
- Install netatalk with Time Machine support
- Manage hard disk power (https://wiki.archlinux.org/index.php/Hdparm#Power_management_configuration); a minimal sketch is shown right after this list
- Install cronie / set the timezone / adjust the anacrontab so jobs run during the night
- Install samba + custom configuration and optimizations
- Install + configure SSMTP (email sending)
- Optimizations for the A20 SoC
- Install and configure smartmontools
- Install and configure NFS
- Set custom mount option / custom io scheduler
- Set cpufreq to ondemand
- Install some sysadmin tools: bash settings, lsof, unzip…
- Install + configure regain
- Install + configure vsftpd
- Create users
- Set network optimization
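For the hard disk power management item, the state essentially boils down to applying hdparm settings like these (a minimal sketch; the APM level and spindown timeout are example values to adapt to your disk):

hdparm -B 127 /dev/sda   # APM level below 128 allows the drive to spin down
hdparm -S 120 /dev/sda   # spin down after 120 * 5 s = 10 minutes of inactivity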
Performance tuning
Based on the following forum thread, which was very instructive, I kept the optimizations described below:
http://forum.lemaker.org/thread-15543-1-1.html
Initial Calibration
Without any modification to the base Arch Linux install, I got:
- hdparm -t -T /dev/sda: around 139 MB/s
- FTP transfer (vsftpd): 18 MB/s read
- Samba: 11.3 MB/s read and 12.0 MB/s write
All tests were done over a Cat 5 cable on the same switch.
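To reproduce this kind of measurement, a sequential read of a large file over the mounted share is enough (the mount point and file name below are placeholders):

dd if=/mnt/nas/bigfile.bin of=/dev/null bs=1M count=1024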
Tuning operations
Force ethernet IRQ to CPU 1:
# find the IRQ number used by eth0 (gives 48 on this board)
cat /proc/interrupts | grep eth0 | cut -f 1 -d ":" | tr -d " "
# pin that IRQ to CPU 1 (affinity mask 2)
echo 2 > /proc/irq/48/smp_affinity
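To avoid hard-coding the IRQ number, both steps can be combined into one command (assuming the interface is named eth0):

echo 2 > /proc/irq/$(awk -F: '/eth0/ {gsub(/ /,"",$1); print $1}' /proc/interrupts)/smp_affinity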
Tune CPU frequency scaling:
echo ondemand > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
echo 960000 > /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq
echo 528000 > /sys/devices/system/cpu/cpu0/cpufreq/scaling_min_freq
echo 1 > /sys/devices/system/cpu/cpufreq/ondemand/io_is_busy
echo 25 > /sys/devices/system/cpu/cpufreq/ondemand/up_threshold
echo 10 > /sys/devices/system/cpu/cpufreq/ondemand/sampling_down_factor
Adjust TCP stack buffers and properties:
sysctl -w net/core/rmem_max=8738000
sysctl -w net/core/wmem_max=6553600
sysctl -w net/ipv4/tcp_rmem="8192 873800 8738000"
sysctl -w net/ipv4/tcp_wmem="4096 655360 6553600"
sysctl -w vm/min_free_kbytes=65536
sysctl -w net.ipv4.tcp_window_scaling=1
sysctl -w net.ipv4.tcp_timestamps=1
sysctl -w net.ipv4.tcp_sack=1
sysctl -w net.ipv4.tcp_no_metrics_save=1
Finally, set a bigger transmit queue on eth0:
ip link set eth0 txqueuelen 10000
Results
- FTP transfer (vsftpd): 77 MB/s read
- Samba: 24.1 MB/s read and 20.3 MB/s write
While transferring a file, the smbd process takes 100% of one CPU core. Since a single file transfer seems to use only one thread, the CPU would need to be overclocked to get better performance, but the A20 does not handle large overclocks well.
NFS being more performant, I tested it as well (NFSv4; ports 2049 and 111 must be open on the firewall, both TCP and UDP).
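With iptables, opening those ports and mounting the share from a client look roughly like this (a sketch assuming a default-deny INPUT chain; the export path is an example):

# on the NAS: open the NFS ports
iptables -A INPUT -p tcp --dport 2049 -j ACCEPT
iptables -A INPUT -p udp --dport 2049 -j ACCEPT
iptables -A INPUT -p tcp --dport 111 -j ACCEPT
iptables -A INPUT -p udp --dport 111 -j ACCEPT
# on the client: mount the NFSv4 export
mount -t nfs4 mynas.local.lan:/ /mnt/nas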
Result:
- 57.3 MB/s read (I did not benchmark write speed).
Some of the SaltStack states used
As I said earlier, I’m not sharing all my states, but the following ones are directly related to the tuning mentioned above.
SD card optimization
sls file:
# define optimized mounting options for a root fs on sdcard or emmc flash
/:
  mount.mounted:
    - device: {{ grains['rootfs'] }}
    - fstype: ext4
    - opts: defaults,async,barrier=0,commit=100,noatime,nodiratime,errors=remount-ro
    - dump: 0
    - pass_num: 1

# set default IO scheduler to deadline for the sdcard
# the deadline scheduler can group small accesses and reduce sdcard latency
/etc/udev/rules.d/60-schedulers.rules:
  file.managed:
    - source: salt://sdcard_optim/60-schedulers.rules
    - user: root
    - group: root
    - mode: 644
60-schedulers.rules file:
# set deadline scheduler for sdcard
ACTION=="add|change", KERNEL=="mmcblk[0-9]", ATTR{queue/scheduler}="deadline"
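Once the rule is in place (after a reboot or a udev re-trigger), the active scheduler can be checked; mmcblk0 is the usual device name for the SD card:

cat /sys/block/mmcblk0/queue/scheduler   # the active scheduler appears in brackets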
Gigabit optimization
sls file:
/etc/sysctl.d/10-iptuning.conf:
  file.managed:
    - source: salt://gbnetoptim/10-iptuning.conf
    - user: root
    - group: root
    - mode: 644

/etc/udev/rules.d/60-custom-txqueuelen.rules:
  file.managed:
    - source: salt://gbnetoptim/60-custom-txqueuelen.rules
    - user: root
    - group: root
    - mode: 644

gbnetoptim_reload_udev:
  cmd.run:
    - name: udevadm control --reload-rules

gbnetoptim_change_txqueuelen:
  cmd.run:
    - name: ip link set eth0 txqueuelen 10000

gbnetoptim_reload_sysctl:
  cmd.run:
    - name: sysctl --system
file 10-iptuning.conf:
net.core.rmem_max = 8738000
net.core.wmem_max = 6553600
net.ipv4.tcp_rmem = 8192 873800 8738000
net.ipv4.tcp_wmem = 4096 655360 6553600
# timestamps disabled: less CPU usage on a small arm soc
net.ipv4.tcp_timestamps = 0
net.ipv4.tcp_window_scaling = 1
net.ipv4.tcp_sack = 1
net.ipv4.tcp_no_metrics_save = 1
vm.min_free_kbytes = 65536
60-custom-txqueuelen.rules file:
# set a 10000 packet transmit queue on the ethernet interfaces
KERNEL=="eth[0,1]", RUN+="/sbin/ip link set %k txqueuelen 10000"
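These settings can also be applied and verified without rebooting (the state above already runs similar commands):

udevadm control --reload-rules
udevadm trigger --subsystem-match=net   # re-run udev rules for network devices
sysctl --system                         # reload all sysctl.d files
ip link show eth0 | grep qlen           # should report qlen 10000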
A20 CPU optimizations
sls file:
# set default IO scheduler to deadline for the sata disk
# the deadline scheduler can group small accesses and reduce latency
/etc/udev/rules.d/65-schedulers.rules:
  file.managed:
    - source: salt://a20_optim/65-schedulers-sata.rules
    - user: root
    - group: root
    - mode: 644

/etc/systemd/system/a20_optim.service:
  file.managed:
    - source: salt://a20_optim/a20_optim.service
    - user: root
    - group: root
    - mode: 644

a20_optim_reload_systemd:
  cmd.run:
    - name: systemctl daemon-reload

a20_optim:
  service.running:
    - enable: True

a20_reload_udev:
  cmd.run:
    - name: udevadm control --reload-rules
file 65-schedulers-sata.rules:
# set deadline scheduler for sata (best perf for a20)
ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/scheduler}="deadline"
file a20_optim.service:
[Unit]
Description=a20 optimizations service
After=network.target

[Service]
Type=oneshot
# set lower and higher cpu freq
ExecStart=/bin/sh -c "echo 528000 > /sys/devices/system/cpu/cpu0/cpufreq/scaling_min_freq"
ExecStart=/bin/sh -c "echo 960000 > /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq"
# avoid the cpu being detected as idle when there is IO wait (faster transfers)
ExecStart=/bin/sh -c "echo 1 > /sys/devices/system/cpu/cpufreq/ondemand/io_is_busy"
# tune ondemand to be more reactive
ExecStart=/bin/sh -c "echo 25 > /sys/devices/system/cpu/cpufreq/ondemand/up_threshold"
ExecStart=/bin/sh -c "echo 10 > /sys/devices/system/cpu/cpufreq/ondemand/sampling_down_factor"
# handle the network interface IRQ on cpu1 (cpu0 handles sata)
ExecStart=/bin/sh -c "echo 2 > /proc/irq/48/smp_affinity"
RemainAfterExit=yes

[Install]
WantedBy=multi-user.target
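Once deployed, the unit can also be enabled and checked by hand:

systemctl enable a20_optim.service
systemctl start a20_optim.service
cat /proc/irq/48/smp_affinity   # should now read 2 (CPU 1)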