Wednesday, September 6, 2017

OpenSUSE Tumbleweed on AMD APU Kabini

In order to have Tumbleweed GUI working on Kabini, one needs to install the kernel-firmware package.

I recently moved my Linux workstation to an old AMD APU platform with Kabini (Athlon 5350). The installation completed successfully, but the console went blank and X could not use the radeon driver, falling back to VESA.

After some digging, I found that although the radeon and amdgpu modules are loaded, there are error messages in dmesg saying that some firmware files can't be loaded.

The problem can be easily fixed by installing the kernel firmware:

sudo zypper install kernel-firmware

Saturday, September 2, 2017

Shrinking the Linux guest storage file of VirtualBox

Shrinking the dynamic storage file of VirtualBox used to be tedious: first zero out the free space in the guest, then compact the file from the host.

With Linux supporting TRIM and VirtualBox supporting DISCARD, it can now be done much more easily from within the guest.

First, on the host, prepare the storage file with DISCARD support:

VBoxManage storageattach "Ubuntu server" --storagectl "SATA" --port 0 --discard on --nonrotational on

- "Ubuntu server" is the VM name
- use "--storagectl" and "--port" to specify the storage controller and port



Then, whenever the storage file needs to be compacted, execute fstrim in the guest, e.g.

sudo fstrim -v /

where "/" is the mount point.



Tuesday, June 20, 2017

Compiling Nvidia 340.102 driver for Linux 4.11.x kernel

Further to the patch for compiling the 340.102 driver on 4.10.x kernels, compiling for the 4.11.x kernel requires the following change to kernel/nv-drm.c as well (the DRM unload callback returns void as of kernel 4.11).

From:

static int nv_drm_unload(
    struct drm_device *dev
)

to

#if LINUX_VERSION_CODE < KERNEL_VERSION(4, 11, 0)
static int nv_drm_unload(
#else
static void nv_drm_unload(
#endif
    struct drm_device *dev
)

Saturday, June 17, 2017

A Java stream approach on testing triangular Fibonacci numbers

I read an interesting blog post on testing for triangular Fibonacci numbers and decided to implement a similar test in Java.

First, a Java Stream to produce triangular numbers.

    public static Stream<BigInteger> stream() {
        return Stream
            .iterate(
                BigInteger.ONE,
                i -> i.add(BigInteger.ONE))
            .map(i -> i.multiply(i.add(BigInteger.ONE)).divide(TWO));
    }

And a Stream for Fibonacci sequence.

    public static Stream<BigInteger> stream() {
        return Stream
            .iterate(
                new BigInteger[] { BigInteger.ONE, BigInteger.ONE },
                p -> new BigInteger[] { p[1], p[0].add(p[1]) })
            .map(p -> p[0]);
    }

Now, a simple and naive way to test for a triangular Fibonacci number is to loop the Fibonacci sequence while testing for the number's existence in the stream of triangular numbers.

        Iterator<BigInteger> fib = FibonacciNum.stream().limit(TEST_LIMIT).iterator();
        Iterator<BigInteger> tri = TriangularNum.stream().iterator();
     
        BigInteger t = tri.next();
     
        List<BigInteger> result = new ArrayList<BigInteger>();
     
        while (fib.hasNext()) {
            BigInteger f = fib.next();
            while (t.compareTo(f) <= 0) {
                if (t.equals(f)) {
                    result.add(t);
                }
                t = tri.next();
            }
        }

But since the Fibonacci sequence grows so quickly, it is a waste of CPU time to generate all those triangular numbers.  A quicker way is to ditch the triangular number stream and implement a test function for triangular numbers, which we then use to filter the Fibonacci stream.

        List<BigInteger> result = FibonacciNum
            .stream()
            .limit(TEST_LIMIT)
            .parallel()
            .filter(f -> TriangularNum.isTriangular(f))
            .distinct()
            .sorted()
            .collect(Collectors.toList());
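The isTriangular test used in the filter can avoid generating triangular numbers altogether: n is triangular exactly when 8n + 1 is a perfect square. Here is a minimal sketch of that check, written in Python for brevity (the names are mine, not the actual Java source):

```python
import math

def is_triangular(n: int) -> bool:
    # n = k(k+1)/2  <=>  8n + 1 = (2k+1)^2, so test 8n + 1 for squareness
    d = 8 * n + 1
    r = math.isqrt(d)  # exact integer square root, works for big ints
    return r * r == d

# Filter the first few Fibonacci numbers, mirroring the Java stream above
fibs = [1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
print([f for f in fibs if is_triangular(f)])  # → [1, 1, 3, 21, 55]
```

The same constant-time check is what makes filtering the Fibonacci stream so much faster than merging two sorted streams.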

Testing the first 70 Fibonacci numbers, the time difference between the two approaches is huge (24 ms vs. 4 s).


And with the fast approach, testing the first 50,000 Fibonacci numbers takes about 93 s on an i5 4210 machine.

See source below or on GitHub.

Saturday, May 6, 2017

Compiling Nvidia 340.102 driver for Linux 4.10.x kernel

I upgraded my Tumbleweed to a 4.10.x kernel and the Nvidia 340 driver won't build.  Until Nvidia fixes it, here are the steps to patch the 340.102 driver to make it work.

Download the 340.102 driver from Nvidia:
http://www.nvidia.ca/object/unix.html

Unpack the driver:
./NVIDIA-Linux-x86_64-340.102.run -x

Apply patch for kernel 4.9.x:
https://pkgs.rpmfusion.org/cgit/nonfree/nvidia-340xx-kmod.git/tree/4.9.0_kernel.patch

Apply patch for kernel 4.10.x:
https://raw.githubusercontent.com/MilhouseVH/LibreELEC.tv/8d3956f72d79ea3648b19f4c705a38307bb03efb/packages/x11/driver/xf86-video-nvidia-legacy/patches/xf86-video-nvidia-legacy-kernel-4.10.patch

Proceed to install. Note that the unified memory module can't be built with this patch, so we are disabling it:
sudo ./nvidia-installer --no-unified-memory



Sunday, January 8, 2017

Face recognition with OpenCV 3.x

Here is another experiment with OpenCV face detection and recognition.  Full source code is available on GitHub.

The picture below is the recognition program running on my Windows laptop and the webcam is feeding live images to the program. My phone is in front of the webcam showing a photo.

The face recognizer was trained with ~10 photos each of Obama, Trump, and Trudeau that I found on the internet.

To increase the accuracy, a simple transformation step was added to level the face images when training the recognizer.





Saturday, December 31, 2016

OpenCV face detection on UDOO with Arch Linux

A few years ago, I tried to run face detection on a BBB.  This time I am trying it on the more powerful UDOO platform.  Some minor code changes:

- Move from Python module cv to cv2
- OpenCV 3.x

My UDOO is running Arch Linux with kernel 4.9.  Note that the libGL included in "imx-gpu-viv-fb" doesn't work properly with OpenCV.  Switching to "mesa-libgl" solved the issue.

$ sudo pacman -Sy mesa-libgl opencv python2-numpy

A Logitech C200 USB web camera is plugged in and detected automatically.

$ lsusb
Bus 001 Device 003: ID 148f:5370 Ralink Technology, Corp. RT5370 Wireless Adapter
Bus 001 Device 004: ID 046d:0802 Logitech, Inc. Webcam C200
Bus 001 Device 002: ID 0424:2514 Standard Microsystems Corp. USB 2.0 Hub
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub

$ lsusb -s 001:004 -v | egrep "Width|Height"
Couldn't open device, some information will be missing
        wWidth                            640
        wHeight                           480
        wWidth                            160
        wHeight                           120
        wWidth                            176
        wHeight                           144
        wWidth                            320
        wHeight                           240
......

The camera supports a maximum resolution of 640x480, but reducing the capture to 320x240 (lines 9-10) produces a smoother video on the UDOO.

The refresh rate is approximately 30 fps.  Change the waitKey parameter (line 58) to adjust it.




Tuesday, November 1, 2016

Hashing files with MD5 / SHA1 / SHA256

A quick and dirty hash program implemented in Go.
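The Go source is not reproduced here, but the idea is simple; a rough Python equivalent (the function name file_digests is mine) computes all three digests in a single pass over the file:

```python
import hashlib
import sys

def file_digests(path, algos=("md5", "sha1", "sha256")):
    """Compute several digests in one pass over the file."""
    hashers = {name: hashlib.new(name) for name in algos}
    with open(path, "rb") as f:
        # Read in 64 KiB chunks so large files need not fit in memory
        for chunk in iter(lambda: f.read(65536), b""):
            for h in hashers.values():
                h.update(chunk)
    return {name: h.hexdigest() for name, h in hashers.items()}

if __name__ == "__main__" and len(sys.argv) > 1:
    for name, digest in file_digests(sys.argv[1]).items():
        print(f"{name}: {digest}")
```

Feeding each chunk to every hasher avoids reading the file once per algorithm, which is presumably what a "quick and dirty" Go version would do with io.MultiWriter.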


Sunday, August 28, 2016

Loading data into Elasticsearch

I was playing around with Elasticsearch and tried to load some data into it.  One way to do it is to write a script that parses the JSON file and uses an Elasticsearch client to index each document, e.g. using elasticsearch-py:

from elasticsearch import Elasticsearch
import argparse
import json
import sys


parser = argparse.ArgumentParser(description='Import JSON files into Elasticsearch')
parser.add_argument('-f', '--file', help='file to import', required=True)
parser.add_argument('-i', '--index', help='Elasticsearch index name', required=True)
parser.add_argument('-t', '--type', help='Elasticsearch type name', required=True)
parser.add_argument('--id', help='id field of each document')
parser.add_argument('--empty_as_null', help='convert empty objects to null')
args = parser.parse_args()

es = Elasticsearch()

with open(args.file, 'r') as json_file:
    for line in json_file:
        doc = json.loads(line)
        if args.id is not None:
            doc_id = doc[args.id]
            #doc.pop(args.id)
        else:
            doc_id = None

        try:
            es.index(index=args.index, doc_type=args.type, id=doc_id, body=doc)
        except Exception:
            print('Problem processing document:')
            print(doc)
            print(sys.exc_info()[0])


But this is slow.  It took almost an hour to index 61k documents.  A much faster way is to use the Bulk API, but first we need to modify the JSON file.  The Yelp sample business data comes in this format:

...
{"business_id": "UsFtqoBl7naz8AVUBZMjQQ", "full_address": "202 McClure St\nDravosburg, PA 15034", "hours": {}, "open": true, "categories": ["Nightlife"], "city": "Dravosburg", "review_count": 4, "name": "Clancy's Pub", "neighborhoods": [], "longitude": -79.886930000000007, "state": "PA", "stars": 3.5, "latitude": 40.350518999999998, "attributes": {"Happy Hour": true, "Accepts Credit Cards": true, "Good For Groups": true, "Outdoor Seating": false, "Price Range": 1}, "type": "business"}

...

We will need to insert an action line before each record.  A simple sed command will do the trick:

sed -i.bak 's/^/{ "index": { "_index": "yelp", "_type": "business" } }\n/' business.json

Then we can load the file directly into Elasticsearch:

curl -s -XPOST localhost:9200/_bulk --data-binary "@business.json"; echo


And this takes only 30 seconds for the same 61k documents.
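The same interleaving can also be done in Python instead of sed.  This small helper (to_bulk is a name I made up for illustration) builds a request body in the format the Bulk API expects:

```python
import json

def to_bulk(doc_lines, index, doc_type):
    """Interleave an "index" action before each JSON document,
    just like the sed one-liner does."""
    action = json.dumps({"index": {"_index": index, "_type": doc_type}})
    out = []
    for line in doc_lines:
        out.append(action)
        out.append(line.rstrip("\n"))
    return "\n".join(out) + "\n"  # the Bulk API body must end with a newline

print(to_bulk(['{"name": "Tea House"}'], "yelp", "business"), end="")
```

The resulting string can then be POSTed to the _bulk endpoint exactly as the curl command above does.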

Tuesday, July 5, 2016

Compiling Cyanogenmod for TF300t

Inspired by JustArchi's optimizations for compiling the Cyanogenmod code, I set out to compile an optimized ROM for my dated Asus TF300t tablet. The official Cyanogenmod build does work, but it just feels a bit laggy.

The idea is simple: follow the Cyanogenmod how-to guide to set up the environment and repo sync to retrieve the code, then apply the patches from JustArchi's ArchiDroid.

However, the options used by JustArchi are quite "exotic".  I am more interested in improving performance by, e.g., removing "-g" and turning on NEON auto-vectorization.  Besides, the tf300t build has problems with "-O3" on target ARM (it is OK on THUMB though). So I ended up using these parameters instead:

ARCHIDROID_GCC_CFLAGS_ARM := -O2

ARCHIDROID_GCC_CFLAGS := -O2 -funsafe-math-optimizations -ftree-vectorize -mvectorize-with-neon-quad -fgcse-las -fgcse-sm -fipa-pta -fivopts -fomit-frame-pointer -frename-registers -fsection-anchors -ftracer -ftree-loop-im -ftree-loop-ivcanon -funsafe-loop-optimizations -funswitch-loops -fweb -Wno-error=array-bounds -Wno-error=clobbered -Wno-error=maybe-uninitialized -Wno-error=strict-overflow

Note that with auto-vectorization turned on, the file external/libopus/celt/rate.c fails to compile due to a known compiler bug.  The error is as follows:

target thumb C: libopus <= external/libopus/celt/rate.c
external/libopus/celt/rate.c: In function 'compute_allocation':
external/libopus/celt/rate.c:638:1: error: unrecognizable insn:
 }
 ^
(insn 1122 1121 1123 153 (set (reg:V4SI 1012)
        (unspec:V4SI [
                (const_vector:V4SI [
                        (const_int 0 [0])
                        (const_int 0 [0])
                        (const_int 0 [0])
                        (const_int 0 [0])
                    ])
                (reg:V4SI 1008 [ vect_var_.64 ])
                (const_int 1 [0x1])
            ] UNSPEC_VCGE)) external/libopus/celt/rate.c:521 -1
     (nil))
external/libopus/celt/rate.c:638:1: internal compiler error: in extract_insn, at recog.c:2150


So I modified the code to disable auto vectorization for the compute_allocation function in external/libopus/celt/rate.c:

__attribute__((optimize("no-tree-vectorize")))
int compute_allocation(const CELTMode *m, int start, int end, const int *offsets, const int *cap, int alloc_trim, int *intensity, int *dual_stereo,
      opus_int32 total, opus_int32 *balance, int *pulses, int *ebits, int *fine_priority, int C, int LM, ec_ctx *ec, int encode, int prev, int signalBandwidth)


After all the changes, clean the source tree (make clean) and delete the ccache.  Then use the "breakfast" and "brunch" commands to build the zip.  Flash the ROM as usual.

References:

ARM Floating point reference
gcc auto-vectorization
gcc optimization options
gcc ARM options