Sunday, August 28, 2016

Loading data into Elasticsearch

I was playing around with Elasticsearch and tried to load some data into it.  One way to do it is to write a script that parses the JSON file and uses an Elasticsearch client to index each doc, e.g. using elasticsearch-py:

from elasticsearch import Elasticsearch
import argparse
import json
import sys


parser = argparse.ArgumentParser(description='Import JSON files into Elasticsearch')
parser.add_argument('-f', '--file', help='file to import', required=True)
parser.add_argument('-i', '--index', help='Elasticsearch index name', required=True)
parser.add_argument('-t', '--type', help='Elasticsearch type name', required=True)
parser.add_argument('--id', help='id field of each document')
parser.add_argument('--empty_as_null', help='Convert empty objects to null')  # parsed but not used in this snippet
args = parser.parse_args()

es = Elasticsearch()

with open(args.file, 'r') as json_file:
    for line in json_file:
        doc = json.loads(line)
        if args.id is not None:
            doc_id = doc[args.id]
            #doc.pop(args.id)
        else:
            doc_id = None

        try:
            es.index(index=args.index, doc_type=args.type, id=doc_id, body=doc)
        except Exception:
            # report the failing document and carry on with the rest
            print('Problem processing:')
            print(doc)
            print(sys.exc_info()[0])
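
To run it against the Yelp sample data (a sketch; import_json.py is a hypothetical name for the script above):

python import_json.py -f business.json -i yelp -t business --id business_id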


But this is slow.  It took almost an hour to index 61k documents.  A much faster way is to use the Bulk API.  But first we need to modify the JSON file.  The Yelp sample business data comes in this format:

...
{"business_id": "UsFtqoBl7naz8AVUBZMjQQ", "full_address": "202 McClure St\nDravosburg, PA 15034", "hours": {}, "open": true, "categories": ["Nightlife"], "city": "Dravosburg", "review_count": 4, "name": "Clancy's Pub", "neighborhoods": [], "longitude": -79.886930000000007, "state": "PA", "stars": 3.5, "latitude": 40.350518999999998, "attributes": {"Happy Hour": true, "Accepts Credit Cards": true, "Good For Groups": true, "Outdoor Seating": false, "Price Range": 1}, "type": "business"}
...

We will need to insert an action line before each record.  A simple sed command will do the trick:

sed -i.bak 's/^/{ "index": { "_index": "yelp", "_type": "business" } }\n/' business.json
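
After the sed run, each record is preceded by its action line:

{ "index": { "_index": "yelp", "_type": "business" } }
{"business_id": "UsFtqoBl7naz8AVUBZMjQQ", "full_address": "202 McClure St\nDravosburg, PA 15034", ...}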

Then we can load the file directly into Elasticsearch:

curl -s -XPOST localhost:9200/_bulk --data-binary "@business.json"; echo


And this takes only 30 seconds for the same 61k documents.
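
Alternatively, you can stay in Python and use the bulk helper that ships with elasticsearch-py.  A minimal sketch, assuming the same one-JSON-document-per-line file and the yelp/business index and type:

from elasticsearch import Elasticsearch, helpers
import json

es = Elasticsearch()

def actions(filename):
    # generate one bulk "index" action per line of the input file
    with open(filename, 'r') as json_file:
        for line in json_file:
            doc = json.loads(line)
            yield {
                '_index': 'yelp',
                '_type': 'business',
                '_id': doc.get('business_id'),
                '_source': doc,
            }

helpers.bulk(es, actions('business.json'))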

Tuesday, July 5, 2016

Compiling CyanogenMod for TF300t

Inspired by JustArchi's optimizations for compiling the CyanogenMod code, I set out to compile an optimized ROM for my dated Asus TF300t tablet. The official CyanogenMod build does work, but it just feels a bit laggy.

The idea is simple. Follow the CyanogenMod how-to guide to set up the environment and run repo sync to retrieve the code.  Then apply the patches from JustArchi's ArchiDroid.

However, the options JustArchi uses are quite "exotic".  I am more interested in improving performance by, e.g., removing "-g" and turning on NEON auto-vectorization.  Besides, the tf300t build has problems with "-O3" on target ARM (it is OK on THUMB though). So I ended up using these parameters instead:

ARCHIDROID_GCC_CFLAGS_ARM := -O2

ARCHIDROID_GCC_CFLAGS := -O2 -funsafe-math-optimizations -ftree-vectorize -mvectorize-with-neon-quad -fgcse-las -fgcse-sm -fipa-pta -fivopts -fomit-frame-pointer -frename-registers -fsection-anchors -ftracer -ftree-loop-im -ftree-loop-ivcanon -funsafe-loop-optimizations -funswitch-loops -fweb -Wno-error=array-bounds -Wno-error=clobbered -Wno-error=maybe-uninitialized -Wno-error=strict-overflow

Note that with auto-vectorization turned on, the file external/libopus/celt/rate.c failed to compile due to a known bug.  The error is as follows:

target thumb C: libopus <= external/libopus/celt/rate.c
external/libopus/celt/rate.c: In function 'compute_allocation':
external/libopus/celt/rate.c:638:1: error: unrecognizable insn:
 }
 ^
(insn 1122 1121 1123 153 (set (reg:V4SI 1012)
        (unspec:V4SI [
                (const_vector:V4SI [
                        (const_int 0 [0])
                        (const_int 0 [0])
                        (const_int 0 [0])
                        (const_int 0 [0])
                    ])
                (reg:V4SI 1008 [ vect_var_.64 ])
                (const_int 1 [0x1])
            ] UNSPEC_VCGE)) external/libopus/celt/rate.c:521 -1
     (nil))
external/libopus/celt/rate.c:638:1: internal compiler error: in extract_insn, at recog.c:2150


So I modified the code to disable auto-vectorization for the compute_allocation function in external/libopus/celt/rate.c:

__attribute__((optimize("no-tree-vectorize")))
int compute_allocation(const CELTMode *m, int start, int end, const int *offsets, const int *cap, int alloc_trim, int *intensity, int *dual_stereo,
      opus_int32 total, opus_int32 *balance, int *pulses, int *ebits, int *fine_priority, int C, int LM, ec_ctx *ec, int encode, int prev, int signalBandwidth)


After all the changes, clean the source tree (make clean) and delete the ccache.  Then use the "breakfast" and "brunch" commands to build the zip, roughly as sketched below.  Flash the ROM as usual.
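
A sketch of the full sequence (assuming ccache is on your PATH; tf300t is the device codename):

make clean
ccache -C                    # drop the stale ccache
source build/envsetup.sh
breakfast tf300t
brunch tf300t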

References:

ARM Floating point reference
gcc auto-vectorization
gcc optimization options
gcc ARM options

Wednesday, June 15, 2016

Adding seccomp support to Elasticsearch on ARM

The Linux kernel has supported seccomp since 2.6.12.  ARM support was added in 2012.

However, the current Elasticsearch source only supports seccomp on x86 and amd64 platforms.  When starting Elasticsearch on an ARM platform, you will see that bootstrap failed to install the seccomp filter:

[2016-06-15 22:11:00,078][WARN ][bootstrap                ] unable to install syscall filter: seccomp unavailable: 'arm' architecture unsupported


To add support for ARM platforms, it is just a matter of finding the correct audit code for the ARM architecture and the appropriate syscall numbers for the blocked functions.

Here is the code change required:


diff --git a/core/src/main/java/org/elasticsearch/bootstrap/Seccomp.java b/core/src/main/java/org/elasticsearch/bootstrap/Seccomp.java
index 46908e6..d94c848 100644
--- a/core/src/main/java/org/elasticsearch/bootstrap/Seccomp.java
+++ b/core/src/main/java/org/elasticsearch/bootstrap/Seccomp.java
@@ -243,6 +243,9 @@ final class Seccomp {
         Map<String,Arch> m = new HashMap<>();
         m.put("amd64", new Arch(0xC000003E, 0x3FFFFFFF, 57, 58, 59, 322, 317));
         m.put("i386",  new Arch(0x40000003, 0xFFFFFFFF, 2, 190, 11, 358, 354));
+        // ARM syscall number ref based on kernel 4.6
+        // https://github.com/torvalds/linux/blob/v4.6/arch/arm/kernel/calls.S
+        m.put("arm", new Arch(0x40000028, 0xFFFFFFFF, 2, 190, 11, 387, 383));
         ARCHITECTURES = Collections.unmodifiableMap(m);
     }
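
For reference, the 0x40000028 value follows the kernel's audit architecture encoding.  A sketch of the derivation, with constant names as in the kernel's uapi audit headers:

// EM_ARM is the ELF machine id for ARM; __AUDIT_ARCH_LE marks little-endian
static final int EM_ARM = 40;                                 // 0x28
static final int __AUDIT_ARCH_LE = 0x40000000;
static final int AUDIT_ARCH_ARM = EM_ARM | __AUDIT_ARCH_LE;   // 0x40000028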


I also forked the Elasticsearch source on GitHub with this change.

Friday, June 3, 2016

Compiling Nvidia driver 340.96 for Linux 4.6.x kernel

Update Oct 2016: The latest Nvidia driver supports kernels 4.6.x, 4.7.x, and 4.8.x.  There is no need to use this patch anymore.

Update 2016-08-13: The patch works for Linux kernel 4.7.x too.

With the latest Linux 4.6.x kernel, the Nvidia 340.96 driver won't compile.  Here is a quick fix to compile and install the driver until Nvidia releases a new version.

First, download and extract the driver package:

./NVIDIA-Linux-x86_64-340.96.run -x

Then patch the files accordingly:

diff -r NVIDIA-Linux-x86_64-340.96/kernel/os-mlock.c NVIDIA-Linux-x86_64-340.96.mod/kernel/os-mlock.c
48c48
<     ret = get_user_pages(current, mm, (unsigned long)address,
---
>     ret = get_user_pages_remote(current, mm, (unsigned long)address,
61c61,62
<             page_cache_release(user_pages[i]);
---
>             //page_cache_release(user_pages[i]);
>             put_page(user_pages[i]);
88c89,90
<         page_cache_release(user_pages[i]);
---
>         //page_cache_release(user_pages[i]);
>         put_page(user_pages[i]);

diff -r NVIDIA-Linux-x86_64-340.96/kernel/uvm/nvidia_uvm_lite.c NVIDIA-Linux-x86_64-340.96.mod/kernel/uvm/nvidia_uvm_lite.c
788c788,789
<         retValue = VM_FAULT_MINOR;
---
>         //retValue = VM_FAULT_MINOR;
>         retValue = 0;
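
You can edit the two files by hand per the diff above, or apply it with patch (a sketch, assuming the diff was saved as nvidia-340.96-kernel-4.6.patch next to the extracted directory):

cd NVIDIA-Linux-x86_64-340.96
patch -p1 < ../nvidia-340.96-kernel-4.6.patch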


Finally, compile and install:

./nvidia-installer

Sunday, May 22, 2016

Event driven GPIO with Python on UDOO

In the previous post, a shell script was used to periodically poll a GPIO pin for input.  That could be inefficient and, in the worst case, miss the input completely.



Using the same hardware design as in the previous post, we can change the software part to a more efficient event-driven approach.  First, set up the GPIO pin (GPIO 42 in this example) as input:

echo 42 > /sys/class/gpio/export
echo in > /sys/class/gpio/gpio42/direction
echo falling > /sys/class/gpio/gpio42/edge

Note that on UDOO, GPIO pins can be set to trigger an interrupt when the value changes.  Here, we set the "edge" to "falling" to indicate that we want to be notified when the input changes from 1 to 0.  You can also set it to "rising" or "both" to suit your needs.  You can also change the value in the file "active_low" to invert the logic, as shown below.  See the Sysfs document for details.
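
For example, to invert the logic so that a physical 0 reads back as 1:

echo 1 > /sys/class/gpio/gpio42/active_low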

With the GPIO pin set up properly, we can use a Python script to wait for input using Linux epoll(7):

import sys
import os
import select
import datetime

if len(sys.argv) < 2:
    print('Missing gpio')
    sys.exit(1)

fd = None
e = None
ignore = True  # epoll returns immediately on the first poll(); skip it
gpio = sys.argv[1]

try:
    fd = os.open("/sys/class/gpio/gpio%s/value" % gpio, os.O_RDONLY)
    e = select.epoll()
    # edge-triggered (EPOLLET): wake up only when the value changes
    e.register(fd, select.EPOLLIN | select.EPOLLET)

    while True:
        events = e.poll()
        if not ignore:
            for fd, event_type in events:
                print(datetime.datetime.now().isoformat() + " event_type " + str(event_type) + " detected on " + str(fd))
            break
        ignore = False

finally:
    if e is not None:
        e.close()
    if fd is not None:
        os.close(fd)


We register the GPIO pin with an epoll object.  Since we want to wait until the value changes from 1 to 0, we use the select.EPOLLET flag for edge-triggered instead of the default level-triggered mechanism.  Then the program enters an infinite loop to wait for the value to change.

Note that the first trigger is ignored, as epoll returns immediately on the first call.

Also, you can register multiple GPIO pins with the same epoll object.  Check the (file descriptor, event type) tuples returned by poll() to handle each pin differently, as sketched below.
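
A minimal sketch (hypothetical second pin 43; both pins already exported and configured as above):

import os
import select

pins = ["42", "43"]
e = select.epoll()
fd_to_pin = {}                 # map value-file descriptors back to pin numbers
for pin in pins:
    fd = os.open("/sys/class/gpio/gpio%s/value" % pin, os.O_RDONLY)
    fd_to_pin[fd] = pin
    e.register(fd, select.EPOLLIN | select.EPOLLET)

first = True
while True:
    events = e.poll()
    if first:                  # the first poll() returns immediately; skip it
        first = False
        continue
    for fd, event_type in events:
        print("event %d on GPIO %s" % (event_type, fd_to_pin[fd]))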

The whole Python script will not return until the GPIO input changes from 1 to 0.  To use it to trigger a shutdown, create a shell script similar to the following to set up the GPIO pin and wait for the Python script to return. Run the shell script on every reboot using systemd.  For details, refer to the previous post.

#!/bin/sh

GPIO=42

echo $GPIO > /sys/class/gpio/export
echo in > /sys/class/gpio/gpio$GPIO/direction
echo falling > /sys/class/gpio/gpio$GPIO/edge

/usr/bin/python /root/scripts/pwrbtncheck/poll.py $GPIO
if [ $? = 0 ]
then
  echo "Shutdown button pressed"
  /usr/bin/sync; /usr/bin/sync; /usr/bin/shutdown -h now
fi




Yes. 42 is "The Answer to the Ultimate Question of Life, The Universe, and Everything" :)

Sunday, May 15, 2016

Adding shutdown button to UDOO

By default, the PWR button on UDOO is for waking up the board after a proper shutdown (for a reset, use the reset button instead).  It cannot be used to shut down Linux. But we can implement a shutdown button using one of the GPIO pins.

First, the physical connection.  We will be connecting a GPIO pin to a push button.  In this example we will be using GPIO42, which is pin 7 on UDOO.  A 10K resistor is used as a pull-up resistor.



Then we need to enable and monitor the GPIO pin in Linux. I am using Arch Linux, but it should be similar for other flavors of Linux.  Here is the script:

#!/bin/sh

GPIO=42

echo $GPIO > /sys/class/gpio/export
echo in > /sys/class/gpio/gpio$GPIO/direction

while true; do
 value=`cat /sys/class/gpio/gpio$GPIO/value`
 if [ "$value" = "0" ]
 then
   echo "Shutdown button pressed"
   /usr/bin/sync; /usr/bin/sync; /usr/bin/shutdown -h now
 fi
 sleep 2
done


Some explanation of the script: with the kernel's GPIO Sysfs interface, we need to specify which pin we will be using by writing the GPIO number (42 here) to /sys/class/gpio/export.  Since we will be reading from the pin, we also write "in" to /sys/class/gpio/gpioXX/direction.

Then the script enters an infinite loop to check whether the GPIO value is 0 (i.e. the push button is pressed).  If so, it syncs the disks and shuts down the system.

This polling method is less efficient than an event-triggered approach. But that will be another exercise. :P

Save this script (e.g. /root/scripts/pwrbtncheck/pwrbtncheck.sh) and we will tell systemd to run it every time the board is booted.

Create a file named pwrbtncheck.service in /etc/systemd/system:

[Unit]
Description=Power button check

[Service]
# systemd does not pass ExecStart through a shell, so no redirection or "&"
ExecStart=/root/scripts/pwrbtncheck/pwrbtncheck.sh

[Install]
WantedBy=multi-user.target


Then enable it by running this command:

sudo systemctl enable pwrbtncheck.service
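
To start it immediately without rebooting:

sudo systemctl start pwrbtncheck.service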



Saturday, May 7, 2016

Setup FIDO Universal 2nd Factor (U2F) testing environment in 2 minutes

This is a quick-start guide to setting up a node.js testing environment for U2F.  For details, please refer to the github page of u2f-sample-server.

What is u2f-sample-server?

It is a ready-to-use node.js package to test U2F tokens.  It is a demo showing how to register a U2F device and later authenticate with it.  Messages exchanged between the server (relying party), the browser (client), and the U2F device are shown.

To allow the use of the built-in U2F plugin of the Chrome browser, the package contains a self-signed certificate for SSL connections.

Note that although u2f-sample-server demonstrates the full register-and-authenticate workflow, it is not the proper way to do it in a real-life application.  For example, registered U2F devices should be associated with a particular account and stored in a database rather than in the session.


Steps


  • Make sure you have a node.js environment set up properly on your machine
  • Clone the u2f-sample-server from github:
git clone https://github.com/kitsook/u2f-sample-server
  • Install dependencies
cd u2f-sample-server
npm install
  • Start the server
node index.js

  • In your Chrome / Chromium browser, navigate to https://localhost:4430/demo and start testing the U2F registration and authentication workflow. 

As of May 2016, there is a bug in the node-u2flib-server module. If you encounter the following error when starting the server, you will need to comment out one line of code.  Please refer to the github page for details.


module.js:328
    throw err;
    ^

Error: Cannot find module './crypto/random_challenge_generator.js'
    at Function.Module._resolveFilename (module.js:326:15)
    at Function.Module._load (module.js:277:25)
    at Module.require (module.js:354:17)
    at require (internal/module.js:12:17)
......

Saturday, April 30, 2016

Node.js server setup for FIDO U2F

Just put together a quick hack of a node.js server to demonstrate FIDO Universal 2nd Factor authentication.



Full source code at github.

Saturday, April 16, 2016

Static IP with connman

Just a quick note.  When googling how to set a static IP with connman, many solutions incorrectly state that you should edit the settings file under /var/lib/connman/wifi_<HASH>_managed_psk/ directly.

Instead, one should edit a service-name.config file under /var/lib/connman/, e.g.:

debian@beaglebone:/var/lib/connman$ sudo cat wifi.config
[service_home]
Type = wifi
Name = yyyyyyyyy
Security = wpa
Passphrase = xxxxxxxxxx
IPv4=192.168.1.4/255.255.255.0/192.168.1.254
IPv6=off
Nameservers=8.8.8.8,8.8.4.4


Also note that when the service-name.config file is updated, connman will automatically pick up the changes without restarting.
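
To double-check, you can inspect the service properties with connmanctl (the service id is the same hash-named directory mentioned above), e.g.:

connmanctl services
connmanctl services wifi_<HASH>_managed_psk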

For more info and settings, do a "man connman-service.config".


Sunday, February 7, 2016

Remove invalid SIDs from Windows ACL

Here is a quick note on how to remove invalid SIDs / unknown accounts from a Windows file system ACL (e.g. after moving files from one domain to another).


First, download SubInAcl from Microsoft.


Then open a command prompt with administrator permissions.  Change directory to the target folder.

If necessary, execute the following command to take ownership first.  Otherwise, if the unknown SID is the owner of the files, suppressing the SID with SubInAcl will change the owner to Everyone.

takeown /f * /r /d y

Execute the following command to remove the SID recursively.  Refer to the HTML file that comes with SubInAcl for other available options.

"c:\Program Files (x86)\Windows Resource Kits\tools\subinacl.exe" /noverbose /subdirectories *  /suppresssid=S-1-5-21-3393913859-1150651423-3580285917-1000