Sunday, December 27, 2020

Testing the tests: What should we do when the test case itself is buggy?

Nowadays, software testing (unit tests, A/B tests, and so on) is widely used to ensure quality and prevent bugs. But a growing number of test cases brings problems of its own: a test case can itself be buggy, so the test fails and developers waste time chasing down problems that may not really exist.

Facebook provides a method to detect buggy test cases:

1. Use ML to predict which test cases to run.

2. All end-to-end tests have some degree of flakiness, so build an index of how reliable each test case is.

3. Assert that a test is sufficiently reliable, and provide a scale that shows which tests are less reliable than they should be.
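
One way to picture the reliability index in step 2 is a per-test flakiness score computed from repeated runs on unchanged code. The sketch below is only an illustration: the function names, the score definition (the fraction of runs that disagree with the majority outcome), and the 0.05 threshold are all assumptions, not Facebook's actual method.

```python
from collections import Counter


def flakiness_score(outcomes):
    """Fraction of runs that disagree with the most common outcome.

    `outcomes` is a list of booleans (True = pass) collected by re-running
    one test on the same code revision. 0.0 means fully consistent.
    """
    if not outcomes:
        raise ValueError("need at least one run")
    majority_count = Counter(outcomes).most_common(1)[0][1]
    return 1.0 - majority_count / len(outcomes)


def unreliable_tests(history, threshold=0.05):
    """Return names of tests scoring above the (hypothetical) threshold, worst first."""
    scores = {name: flakiness_score(runs) for name, runs in history.items()}
    flagged = [name for name, score in scores.items() if score > threshold]
    return sorted(flagged, key=lambda name: scores[name], reverse=True)


if __name__ == "__main__":
    # Made-up run history, for illustration only.
    history = {
        "test_login": [True] * 20,
        "test_checkout": [True] * 17 + [False] * 3,
        "test_search": [True, False] * 10,
    }
    print(unreliable_tests(history))
```

A test that always passes or always fails scores 0.0, so a consistently failing test is treated as a real failure rather than a flaky one.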

Software testing is a good tool for ensuring product quality. As code bases and test cases grow more complicated, it is worth carefully considering methods that test the test cases themselves, to avoid wasted time and improve testing quality.


Friday, November 13, 2020

Performance tuning: Leveraging the modern CPU branch prediction mechanism.

In modern CPUs the branch predictor is complicated, and deeper pipelines make a branch misprediction more costly.

How can we leverage the modern CPU branch predictor? In the example code in [1], the branch miss rate can be reduced just by sorting the input data before executing the original algorithm.
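
The effect can be illustrated without hardware counters by simulating a simple predictor. The sketch below is a toy model, not the code from [1]: it assumes a single 2-bit saturating counter and a hypothetical branch `value >= 128`, and counts how often the predictor guesses wrong on unsorted versus sorted input. Real CPUs use far more elaborate predictors, so treat the numbers as qualitative only.

```python
import random


def count_mispredictions(values, threshold=128):
    """Simulate one 2-bit saturating counter on the branch `value >= threshold`.

    Counter states 0-1 predict "not taken", states 2-3 predict "taken".
    """
    state = 2  # start in "weakly taken"
    misses = 0
    for value in values:
        taken = value >= threshold
        predicted_taken = state >= 2
        if predicted_taken != taken:
            misses += 1
        # Move the counter toward the actual outcome, saturating at 0 and 3.
        state = min(state + 1, 3) if taken else max(state - 1, 0)
    return misses


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    data = [rng.randrange(256) for _ in range(10_000)]
    print("unsorted misses:", count_mispredictions(data))
    print("sorted misses:  ", count_mispredictions(sorted(data)))
```

With sorted input the branch outcome changes only once, so the simulated predictor misses only a handful of times, while on random input it misses on a large share of the branches.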



Conclusion 

1. Add a predictable pattern to your algorithm's input (for example, sorted data).

2. Use the likely()/unlikely() macros to hint which branch is expected.


Reference:

3. Linux likely()/unlikely() macros.

Monday, June 22, 2020

Machine Learning Foundations NLP(2): Using the Sequencing APIs

Using the Sequencing APIs

Sequencing turns an array of sentences into arrays of tokens.
For example:

"I have a car"
"Car have a door"
 
Tokenize: [I:1] [have:2] [a:3] [car:4] [door:5]
 
Then these two sentences can be represented as:
[1 2 3 4] 
[4 2 3 5]

Sequencing is a useful way to represent sentence data as input to a neural network.
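
The mapping in the example above can be reproduced with a few lines of plain Python. This is a sketch of the idea only, not the TensorFlow sequencing API: the helper names are made up, and indices are assigned in order of first appearance to match the example, whereas the Keras Tokenizer ranks words by frequency and so may number them differently.

```python
def build_word_index(sentences):
    """Give each lowercased word an index, starting at 1, in order of first appearance."""
    word_index = {}
    for sentence in sentences:
        for word in sentence.lower().split():
            if word not in word_index:
                word_index[word] = len(word_index) + 1
    return word_index


def texts_to_sequences(sentences, word_index):
    """Replace every known word with its index; unknown words are skipped."""
    return [
        [word_index[word] for word in sentence.lower().split() if word in word_index]
        for sentence in sentences
    ]


if __name__ == "__main__":
    sentences = ["I have a car", "Car have a door"]
    word_index = build_word_index(sentences)
    print(word_index)
    print(texts_to_sequences(sentences, word_index))  # [[1, 2, 3, 4], [4, 2, 3, 5]]
```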

Code

Result:

Reference:






Wednesday, June 17, 2020

Machine Learning Foundations NLP(1): Tokenization for Natural Language Processing

Tokenization for Natural Language Processing


Tokenizing means breaking a sentence down into several words, for example:

I have a car. -> I / have / a / car

TensorFlow provides a tokenizing tool, Tokenizer, which makes it easy to tokenize input sentences.
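
To make the idea concrete without depending on TensorFlow, here is a plain-Python sketch that loosely mimics what such a tokenizer does: lowercase the text, strip punctuation, split into words, and rank the words by frequency. The function names and the sample sentences are assumptions for illustration; this is not the Tokenizer implementation.

```python
import string
from collections import Counter


def tokenize(sentence):
    """Lowercase a sentence, drop ASCII punctuation, and split on whitespace."""
    cleaned = sentence.lower().translate(str.maketrans("", "", string.punctuation))
    return cleaned.split()


def fit_word_index(sentences):
    """Map each word to a rank starting at 1, most frequent word first."""
    counts = Counter(word for sentence in sentences for word in tokenize(sentence))
    # most_common() keeps first-seen order among words with equal counts.
    return {word: rank for rank, (word, _) in enumerate(counts.most_common(), start=1)}


if __name__ == "__main__":
    print(tokenize("I have a car."))
    print(fit_word_index(["I have a car.", "He have a pen."]))
```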


Code:



Result:

{'have': 1, 'a': 2, 'i': 3, 'he': 4, 'apple': 5, 'bike': 6, 'pen': 7, 'car': 8}



Reference:

Machine Learning Foundations: Ep #8 - Tokenization for Natural Language Processing



Thursday, May 28, 2020

Linux kernel module: Add entry to debugfs and a read/write file.

Goal

This module creates a Ray_DBG directory in debugfs and a file named REG inside it for reading and writing. It is similar to the previous module (which creates /proc/Ray), but is easier to use and needs fewer lines of code.
 

Code


Makefile


Usage

1. Insert module 
$>insmod test_module.ko
$>dmesg | tail
[7501255.251763] Create Ray_DBG !
[7501255.251776] Create REG debug !
2. Mount debugfs 
$>mount -t debugfs none /sys/kernel/debug
3. Read data
$>cat /sys/kernel/debug/Ray_DBG/REG
0x000000aa
4. Write data 
$>echo 0xff > /sys/kernel/debug/Ray_DBG/REG
$>cat /sys/kernel/debug/Ray_DBG/REG
0x000000ff

Reference

  1. Kernel document: debugfs
  2. Debugfs

Linux kernel module: Add an entry in /proc and pass arguments at insmod

Goal 

Implement a kernel module that accepts parameters when it is inserted and adds an entry to /proc that can be read and written. Embedded Linux systems often use a similar facility to help with debugging.

Code:


Makefile:



Usage:

1. Insert module and make /proc/Ray entry
 #>sudo insmod test_module.ko entry="Ray" mode=1238
 
2. Write data 
#>echo 123 > /proc/Ray

3. Read data
 #>cat /proc/Ray

Wednesday, May 27, 2020

DeepSpeech: Speech to text AI model.

  • Overview

"DeepSpeech is an open source Speech-To-Text engine, using a model trained by machine learning techniques based on Baidu’s Deep Speech research paper."
DeepSpeech provides API bindings for many languages, such as Python, JavaScript, and C, and is easy to integrate into an application.

  • Install DeepSpeech

Follow the instructions in the user guide.

  • Demo

Use the command-line tool to run inference on audio data.

$> deepspeech --model deepspeech-0.7.0-models.pbmm --audio audio/2830-3980-0043.wav

Output:
Loading model from file deepspeech-0.7.0-models.pbmm
TensorFlow: v1.15.0-24-gceb46aa
DeepSpeech: v0.7.1-0-g2e9c281
Loaded model in 0.0093s.
Loading scorer from files deepspeech-0.7.0-models.scorer
Loaded scorer in 0.00023s.
Running inference.
experience proves this
Inference took 1.480s for 1.975s audio file.
 
The line "experience proves this" is the text inferred from the input audio.

Use the Python API to run inference on audio data.

Output:
your power is sufficient i said sound data
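
A minimal sketch of the Python API route, assuming the `deepspeech` 0.7.x and `numpy` packages are installed and the input is a 16-bit mono WAV file at the sample rate the model expects. The helper names `read_wav` and `transcribe` are made up for this sketch, and it is not necessarily the script that produced the output above.

```python
import wave


def read_wav(path):
    """Return (sample_rate, raw_pcm_bytes) for a WAV file."""
    with wave.open(path, "rb") as wav:
        return wav.getframerate(), wav.readframes(wav.getnframes())


def transcribe(model_path, scorer_path, audio_path):
    """Run speech-to-text on one WAV file and return the decoded string."""
    # Imported here so read_wav() stays usable without these packages.
    import numpy as np
    from deepspeech import Model

    model = Model(model_path)
    model.enableExternalScorer(scorer_path)
    _, pcm = read_wav(audio_path)
    audio = np.frombuffer(pcm, dtype=np.int16)  # stt() expects 16-bit samples
    return model.stt(audio)
```

Calling `transcribe("deepspeech-0.7.0-models.pbmm", "deepspeech-0.7.0-models.scorer", "audio/2830-3980-0043.wav")` with the files from the command-line demo should return the transcript as a plain string.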


Reference:

Friday, May 22, 2020

Machine Learning Foundations: Exercise 4 Happy and sad image classification model with 99.9% accuracy.

Machine Learning Foundations: Exercise 4 Happy and sad image classification model with 99.9% accuracy: code lab link

Build a model that classifies happy and sad images with a convolutional neural network, with more than 99.9% accuracy.
 
Note:
  •  Model over-fitting: near 100% accuracy on the training data, but lower accuracy on the test data.
  •  ImageDataGenerator: labels the data automatically using the directory names.

Code: 

Wednesday, May 20, 2020

Conference note: Making C Less Dangerous in the Linux kernel - Kees Cook


This talk discusses the topics below about unsafe C usage, and how the Linux kernel has removed such usage or added facilities to detect it.

1. Variable-length arrays (VLA)
  • Use a compiler option to detect VLAs: gcc -Wvla
  • Use guard pages to prevent stack overflow. A VLA also needs many more instructions than a fixed-size array.
2. Switch case: break or fall through
  • Mark every non-break case with a "fall through" annotation, to show whether the programmer intended to fall through or it is a bug.
  • The compiler supports this check: -Wimplicit-fallthrough
3. Arithmetic overflow detection
  • Use a compiler option to detect overflow at compile time
  • Supports different handling levels: ignore it or treat it as a warning
4. Comparing different APIs for string copy
  • The safer string copy function: strscpy().
5. Safe stack - Shadow stack
  • Separate the local variable stack from the return address stack
  • Supported by hardware:
    • ARM pointer authentication (signs the return address to distinguish it from local variables)

Reference :

Monday, May 18, 2020

Conference note: Safety vs Security: A Tale of Two Updates - Jeremy Rosen, Smile.fr

Safety and security are different aspects of a system. The talk briefly discusses the differences between them and the challenge of building a system that addresses both.

  • Safety: Prove it, and keep it simple for reliability

    1. Define the spec completely.
    2. Proven: check that every state of the system meets the design.
    3. Change as little as possible: if a bug can be solved by periodically rebooting the machine, just reboot it periodically rather than updating the software.

  • Security: Prevent the system from being used in unwanted ways

  1. Fast iteration: replace an old encryption algorithm with a new one.
  2. Preventive: the spec needs updating to face new challenges.

These two aspects are often neglected in embedded systems nowadays, especially in the consumer market. But as more and more devices connect to the internet and take responsibility for critical tasks such as healthcare, designers need to take care of both: include them from the beginning of system design, and try to find an optimized combination of the two that meets the system's standards.


Sunday, May 17, 2020

Machine Learning Foundations: Exercise 3 Improve accuracy of MNIST using Convolutions.

Exercise 3: Improve accuracy of MNIST using Convolutions: code lab link

Improve MNIST to 99.8% accuracy or more using only a single convolutional layer and a single MaxPooling 2D.
The number of filters affects both the accuracy and the training time.

Code: 

Saturday, May 16, 2020

GO: Fixed warning: go env -w GOPATH=... does not override conflicting OS environment variable

Because GOPATH is already set as an OS-level environment variable, it can't be overridden with the go env -w command.
We need to use an OS-specific way to modify this variable.


1. Use CMD to set the GOPATH variable (environment: Windows 10)
    > setx GOPATH ##YOURPATH##

2. Check that GOPATH has been set
   > go env 



Reference 
     2. GO/env_write.txt

Wednesday, May 13, 2020

Conference note: Linux I2C in the 21st Century - Wolfram Sang, Consultant / Renesas

This talk gives a brief overview of what's new in the Linux I2C subsystem.
I found some interesting parts of the I2C subsystem in the latest Linux kernel:


  • The API i2c_new_dummy_device(), for I2C devices that have more than one slave address.

Declare a dummy I2C device that shares the same physical device but uses a different slave address.

  • Recommended new API to create such an I2C device: i2c_new_ancillary_device()

  • Dynamic address assignment: detect dynamically which address to use on the same I2C bus.


I2C has been widely used in industry for decades and has gained enhanced features, such as multiple slave addresses per device or dynamic address assignment for slaves on the same bus.
These bring lots of challenges for developers, and the Linux kernel provides some new APIs for more generic driver development.


Further note:
I3C is the next-generation serial bus intended to replace I2C, but I haven't seen many applications of it yet.


Reference : 
  1.  Linux I2C in the 21st Century - Wolfram Sang, Consultant / Renesas
  2.  I2C and SMBus Subsystem
  3.  I3C 



Tuesday, May 12, 2020

Machine Learning Foundations: Exercise 2 Handwritten digit model with 99% accuracy

Exercise 2: Handwritten digit model with 99% accuracy: code lab link
Write an MNIST classifier that trains to 99% accuracy or above, and does it without a fixed number of epochs -- i.e. you should stop training once you reach that level of accuracy.

Code:

Result:







Machine Learning Foundations : Exercise 1 House Price Question


Exercise 1 : House Prices Question : code lab link
Build a neural network that predicts the price of a house according to a simple formula: a house costs 50k plus 50k per bedroom, so a 1-bedroom house costs 100k, a 2-bedroom house costs 150k, and so on.

Training data set
Number of bedrooms: [1, 2, 3, 4]
House price (in thousands): [100, 150, 200, 250]
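
Because the formula is linear, a single neuron (one weight and one bias) is enough to learn it. The sketch below is not the code lab's Keras solution: it trains that one neuron with plain gradient descent in pure Python, and the learning rate, step count, and function names are assumptions chosen for illustration. Prices are in thousands.

```python
def train_linear(xs, ys, lr=0.05, steps=5000):
    """Fit y = w * x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradients of mean((w*x + b - y)^2) with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def predict(w, b, bedrooms):
    """Predicted price in thousands for a house with the given bedroom count."""
    return w * bedrooms + b


if __name__ == "__main__":
    bedrooms = [1, 2, 3, 4]
    prices = [100, 150, 200, 250]
    w, b = train_linear(bedrooms, prices)
    print(round(predict(w, b, 7), 2))  # close to 400, i.e. 50k + 7 * 50k
```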

Code:

Wednesday, May 6, 2020

Octave: Read data from csv file.

Octave/MATLAB can read a CSV file and process or plot the data quickly and easily.
Below is simple code that reads a CSV file and does some parsing.


1. Read the Sample_Data.csv file and split it into an array:
    char = strsplit(fileread ("Sample_Data.csv"));

2. Simple parse: iterate over the elements and skip the first 5 columns of data

    for i = 1:size(char,2)
        result(i,:) = char((10*i)-5:(10*i));
    end





Ref:

Wednesday, March 11, 2020

How to get the main() function's return value in a Linux terminal.

Sometimes we want to get the exit value of the main() function. Running "echo $?" prints the exit value of the most recently executed program.
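
The same exit status can also be read from a script instead of the shell. The sketch below uses Python's `subprocess` module; the helper name and the child command (a Python one-liner standing in for a C program whose main() returns 3) are assumptions for illustration.

```python
import subprocess
import sys


def run_and_get_exit_status(argv):
    """Run a command and return its exit status (what $? shows after a normal exit)."""
    return subprocess.run(argv).returncode


if __name__ == "__main__":
    # Hypothetical child: stands in for a program whose main() returns 3.
    child = [sys.executable, "-c", "import sys; sys.exit(3)"]
    print(run_and_get_exit_status(child))  # 3
```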

Example program:



Result:



Reference:
  1. UNIX Shell Programming 

Linux driver: How to enable dynamic debug at boot time for a built-in driver.

 Dynamic debug is useful for debugging drivers, and can be enabled by: 1. Mount debugfs #>mount -t debugfs none /sys/kernel/debug 2. Enable dy...