Fancy Coder
share my projects and discoveries
Friendoc and its future
http://wp.me/sSs4G-friendoc is a great success: it attracts 1.2 million users within 3 weeks. I believe it is the most fast-spreading web app in history.
Maintaining code which would effect million users is very stressful. My friend and I both need a good rest. Now we’ve already pass this product to one of my senior, which is also a part-time staff of RenRen. RenRen will take responsibility maintaining this software.
Although stressful, developing with Fan and have millions of users using our software is a great experience : )
Friendoc

Friendoc is a web application based on RenRen open platform(facebook c2c ver). It does one simple thing: collect data from you and your friends, visualize them as nice images. My friend Fan Xinyuan and I are only developers. My friend works on interface design, and I work on back end part.
It was one and a half months ago, my friend told me he want to build friendoc on RenRen. I thought his idea was great, and technical problems are not difficult, thus I agreed to help him.
It is my first time building a web app on open platform. RenRen’s API is not as comprehensive as facebook’s and document is poor. It took me quite some time to figure out how to call its API. After we know how to collect data, we start working on visualization.
Initially, I wished to use third-party API to visualize data like google chart. However, my friend insist that we need our own style rather than statistical diagrams. I suggest to use image processing library to draw images dynamically with data as input. The most easy-to-use image processing library I know is PIL. Then he started drawing small image parts manually using PS, and I write python script to organize pieces into a complete image.

One important feature is add these images into user’s albums, so that users can “circle out” their friends, and their friends would also try our app. I encountered many problems when developing this feature.
Now out app is still under verification. If things goes well, it will be online in a few days. RenRen’s testers gave very high comment on our app, and they are willing to provide free server to hold our app. I think this would be the first time that RenRen provide server to hold web app for developers.
If things goes very well, we would consider transplantation our app into facebook : )
GCC Allows Arrays of Variable Length!!
This is very surprising. When I was learning C, I was convinced that array size must be determined at compile time, and I never wrote a line of code that violate this. Today, my friend (a C beginner) asked me to help him to debug his code. His code contains following lines:
..... int T, R, C, i, j; ..... char tile[R][C]; .....
To my great surprise, this code compiles with no error. According to this post and this page, C99 allows array size to be a variable, which is supported by GCC. This feature may free programmer from many new, delete, malloc, free and memory leak in many cases.
Javascript tips
I am reading Secrete of Javascript Ninja these days. This book introduced many js tips/patterns, and I wish to summarize what I find useful in this post.
Force the “new” operator
JS is not that object oriented. In JS, all functions is a constructor when it is called with a “new” operator, and “this” variable in the function is bounded to the newly constructed object automatically. This was considered a harmful pattern, because if I forgot to add “new”, then the function will not be considered a constructor, and “this” variable refers to global variable. However, this situation can be avoided by adding the “new” operator automatically if we forgot. Following code does this:
function User(first, last){
if ( !(this instanceof arguments.callee) )
return new User(first, last);
....
}
In this case, “User” cannot be used as a normal function.
Get function it self inside function execution
JS allows you to write code in functional programming style. In other functional programming languages like Scheme, it is not easy to do recursion using a lambda function, because you don’t know how to call the function it self: it got no name. In JS, you can always call current function through arguments.callee.
var factorial=function(n){return n==0?1:n*arguments.callee(n-1);}
Use setTimtout to help break time consuming script
If a script is running when page is loaded, then the page will only be displayed after the script finishes running. However, we can use setTimeout to break long script into small pieces. When the script is constructing page, page view update is not done immediately. Instead, it is pushed into a queue, and everything in the queue will be handled after script block finish running. setTimeout function will break the current script block, and let those tasks in the queue have a chance to be handled.
bd=document.getElementsByTagName("body")[0];
tb=document.createElement("table");
bd.appendChild(tb);
for(var i=0;i<10000;i++){
var tr=document.createElement("tr");
var td=document.createElement("td")
td.appendChild(document.createTextNode(i));
tr.appendChild(td);
td=document.createElement("td")
td.appendChild(document.createTextNode(i*i));
tr.appendChild(td);
tb.appendChild(tr);
}
bd=document.getElementsByTagName("body")[0];
tb=document.createElement("table");
bd.appendChild(tb);
var j=10;
(function(){
for(var i=0;i<1000;i++){
var tr=document.createElement("tr");
var td=document.createElement("td")
td.appendChild(document.createTextNode(i));
tr.appendChild(td);
td=document.createElement("td")
td.appendChild(document.createTextNode(i*i));
tr.appendChild(td);
tb.appendChild(tr);
}
if(j--)setTimeout(arguments.callee,0);
})();
Refer to group in regular expression
Like regular expressions in many other languages, JS use “\1″ to refer to first matching group, “\2″ for the second…However, this only works inside regular expression itself. One application is primality testing in my earlier post.
JS string also has “replace” method, which support regular expression replacement. To refer to matching group in the second argument of replace method, we need to use “$1″, “$2″ instead of “\1″, “\2″.
"hello world".replace(/(\w+)\s+(\w+)/,"$2 $1");
Remove empty text node from HTML
When using DOM to get html node, text nodes containing white space only is very annoying. Following code removes white space text nodes using DFS.
function cleanWhitespace( element ) {
// If no element is provided, do the whole HTML document
element = element || document;
// Use the first child as a starting point
var cur = element.firstChild;
// Go until there are no more child nodes
while ( cur != null ) {
// If the node is a text node, and it contains nothing but whitespace
if ( cur.nodeType == 3 && ! /\S/.test(cur.nodeValue) ) {
// Remove the text node
element.removeChild( cur );
// Otherwise, if it's an element
} else if ( cur.nodeType == 1 ) {
// Recurse down through the document
cleanWhitespace( cur );
}
cur = cur.nextSibling; // Move through the child nodes
}
}
Get element by class name
function hasClass(name,type) {
var r = [];
var e = document.getElementsByTagName(type || "*");
for ( var j = 0; j < e.length; j++ )
if ( e[j].classList.contains(name) )
r.push( e[j] );
return r;
}
Expressive Javascript
When I was learning Scheme, I was told that Scheme is a expressive language, because it can implement higher order function, closure, and even OOP.
Now I am learning javascript, and I find that JS is can be even more expressive than scheme. First, you can write C-style code in JS, which is difficult, or impossible in scheme. Moreover, JS has scheme’s functional programming behavior, and it can also implement higher order function, closure, and even OOP.
Let’s implement a Scheme’s classic example: print all permutation of a set of numbers
cons=function(a,b){return {fst:a,rst:b};};
car=function(p){return p.fst;};
cdr=function(p){return p.rst;};
range=function(n){
var res=null;
for(i=n;i>0;i--){
res=cons(i,res);
}
return res;
};
list=function(){
var arg=[].slice.call(arguments);
if(arg.length==0)return null;
return cons(arg[0],list.apply(undefined,arg.slice(1)));
};
print=function(l){
var cur=l;
document.write("(");
while(cur){
var v=car(cur);
cur=cdr(cur);
if(typeof v == "number")
document.write(v+(cur?", ":""));
else
print(v);
}
document.write(")<br>");
};
map=function(f,l){
return l==null?null:cons(f(car(l)),map(f,cdr(l)));
}
filter=function(f,l){
if(l==null)return null;
if(f(car(l)))return cons(car(l),filter(f,cdr(l)));
return filter(f,cdr(l));
}
append=function(l1,l2){
if(l1==null)return l2;
return cons(car(l1),append(cdr(l1),l2));
}
flat=function(l){
if(l==null)return null;
if(typeof car(l) == 'number')return cons(car(l),flat(cdr(l)));
return append(car(l),flat(cdr(l)));
}
length=function(l){
return l==null?0:1+length(cdr(l));
};
perm=function(l){
if(length(l)==1)return list(l);
return flat(map(function(x){ return map(function(rst){return cons(x,rst);},perm(filter( function(t){ return t!=x; },l))) },l));
};
print(perm(list(1,2,3,4)));
We start from simulating Pair and List in scheme, then implement operations on list, and finally implement “perm” operation. All implementations are following the way they implemented in scheme style (use recursions instead of loops).
An impressive example of scheme is it’s higher order function, which takes functions as arguments, or return functions. Higher order function is a more general function, and an good example is memoization.
Memoization is one implementation of Dynamic Programming: it keeps DP formula’s recursive style in code, which is easy to understand, while avoid repeated calculations by memorize all result for each states.
All memoizations have similar format:
if state in memo: return memo[state] if state is base case: get result for state else: calculate result using recursive formula add result into memo return result
Although all memoization code looks similar, we need to rewrite all code from scratch every time in languages like C. However, if we can use higher order function, memoization can be generalized into a template, where we only need to pass recursive formula and base case into it, and the higher order function handle the rest automatically.
var memo=function(base,dp){
var f=[];
return function(){
var arg=[].slice.apply(arguments),res;
if(f[arg]!=undefined)return f[arg];
res=base.apply(undefined,arg);
if(res!=undefined)return f[arg]=res;
return f[arg]=dp.apply(0,arg);
};
};
The higher order memoization can be implemented in lines of code in js. base argument is a function that returns answer inly on base case parameters, while return undefined otherwise. dp is function that implement dp formula using recursive calls.
Some interesting examples:
var fact=memo(function(n){if(n==0)return 1;},function(n){return fact(n-1)*n;});
var fib =memo(function(n){if(n<=1)return n;},function(n){return fib(n-1)+fib(n-2);});
var comb=memo(
function(n,m){
if(n<0 || m<0 || m>n)
return 0;
if(n==m || m==0)
return 1;
},
function(n,m){
return comb(n-1,m-1)+comb(n-1,m);
});
DP functions can also call each other:
var even=memo(function(n){if(n==0)return true;},function(n){return odd(n-1);})
var odd=memo(function(n){if(n==0)return false;},function(n){return even(n-1);})
Javascript is also able to do “currying”. Currying is how Haskell handle functions: “+” is a function takes 2 arguments, return the sum; “1+” is a function takes 1 arguments x, return x+1. That is to say, if you pass function with partial arguments in Haskell, you will get a function taking the rest arguments, and return the result with complete arguments.
Function.prototype.curry=function(){
var args=[].slice.apply(arguments),f=this;
return function(){
return f.apply(this,args.concat([].slice.apply(arguments)));
};
};
add=function(a,b){return a+b;}
addOne=add.curry(1)
Recursion -> Loop
Sometimes, we need to convert recursive functions into loop for efficiency consideration, or stack size constrain. Converting tail recursion into iteration is trivial. The method we discuss in this post is a general method to convert any recursive functions into iterative ones.
The idea is to simulate stack behavior: we store frame record as a structure into a simulated stack when calling a function, and restore the top record in stack when function returns.
The record should contain 2 part of data:
- local variable, including parameter
- next instruction to execute(used when restoring the record)
We also need a global variable to store return value (the same as eax register).
“Next instruction to execute” can be stored using label, and we can “goto” corresponding instruction after restoring record. However, “goto” statement is not supported in some languages, and also not recommended. Thus, we use integer number to represent label id, and use switch statement to jump to different part of code.
Following example shows recursive and iterative version of factorization function. Iterative version is significantly longer in code length, and slightly slower in practice. However, iterative version can work on large input where recursive version may get stack overflow.
We select javascript to demonstrate because:
- Dynamic type saves declaration code
- Closure can simulate behavior of frame record perfectly (stack elements are actually functions)
- Powerful literal value representation
var fact=function(n){
return n==0?1:n*fact(n-1);
};
var Fact=function(n){
var _ret;
var stk=[];
var construct=function(n){
var l=0;
return function(){
switch(l){
case 0:
l=1;
if(n==0){
_ret=1;
stk.pop();
return;
}
stk.push(construct(n-1));
return;
case 1:
_ret*=n;
stk.pop();
return;
}
};
};
stk.push(construct(n));
while(stk){
if(stk.length==0)return _ret;
stk[stk.length-1]();
}
};
Now let’s look at more complicated examples:
fibonacci number
var Fib=function(n){
var _ret;
var stk=[];
var construct=function(n){
var l=0,a,b;
return function(){
switch(l){
case 0:
l=1;
if(n<=1){
_ret=n;
stk.pop();
return;
}
stk.push(construct(n-1));
return;
case 1:
l=2;
a=_ret;
stk.push(construct(n-2));
return;
case 2:
b=_ret;
_ret=a+b;
stk.pop();
return;
}
};
};
stk.push(construct(n));
while(true){
if(stk.length==0)return _ret;
stk[stk.length-1]();
}
}
hanoi tower
var Hanoi=function(a,b,c,n){
var stk=[];
var construct=function(a,b,c,n){
var l=0;
return function(){
switch(l){
case 0:
l=1;
if(n>0){
stk.push(construct(a,c,b,n-1));
return;
}
stk.pop();
return;
case 1:
l=2;
document.write(a+"->"+c+"<br>");
stk.push(construct(b,a,c,n-1));
return;
case 2:
stk.pop();
return;
}
};
};
stk.push(construct(a,b,c,n));
while(stk){
if(stk.length==0)return;
stk[stk.length-1]();
}
};
merge sort
var Hanoi=function(a,b,c,n){
var stk=[];
var construct=function(a,b,c,n){
var l=0;
return function(){
switch(l){
case 0:
l=1;
if(n>0){
stk.push(construct(a,c,b,n-1));
return;
}
stk.pop();
return;
case 1:
l=2;
document.write(a+"->"+c+"<br>");
stk.push(construct(b,a,c,n-1));
return;
case 2:
stk.pop();
return;
}
};
};
stk.push(construct(a,b,c,n));
while(stk){
if(stk.length==0)return;
stk[stk.length-1]();
}
};
merge sort
var MergeSort=function(a,i,j){
if(!i)i=0;
if(!j)j=a.length;
var stk=[];
var construct=function(a,i,j){
var l=0;
var mid,t=[],x,y;
return function(){
switch(l){
case 0:
l++;
if(j-i<=1){
stk.pop();
return;
}
mid=floor((i+j)/2);
stk.push(construct(a,i,mid));
return;
case 1:
l++;
stk.push(construct(a,mid,j));
return;
case 2:
x=i, y=mid;
while(x<mid && y<j){
if(a[x]<a[y]){
t.push(a[x]);
x++;
}else{
t.push(a[y]);
y++;
}
}
while(x<mid)t.push(a[x++]);
while(y<j)t.push(a[y++]);
for(var k=i;k<j;k++)
a[k]=t[k-i];
stk.pop();
return;
}
};
};
stk.push(construct(a,i,j));
while(stk){
if(stk.length==0)return;
stk[stk.length-1]();
}
};
Example for TreeList and BinaryIndexedTree
We mentioned BinaryIndexedTree and TreeList in previous post. Now I will give an example problem where both data structure are needed.
Given array a[0...n-1], which stores a permutation of 0 to n-1. We define array b[0...n-1] using following equation:
b[i]=|{x|x occurs before i in a[], and x > i}|
The problem is: given a[], compute b[] in O(nlogn) time; given b[], compute a[] in O(nlogn) time.
To convert a[] into b[], assume we have an array t[0...n](initially all 0). We go through each elements in a[], and for each element a[i] we encounter, b[a[i]]=t[0]+…+t[a[i]], and then we set t[a[i]]=1. The correctness is obvious. Implementing this method naively gives O(n^2). Notice that binary indexed tree is designed for this kind of prefix sum operation:
vector<int> a2b(vector<int> a){
vector<int> b(a.size());
BITWraper t(a.size(),0);
for (int i = 0; i < a.size(); i++) {
b[a[i]]=i-t.sum(a[i]);
t.add(a[i]);
}
return b;
}
By default, binary indexed tree’s array index starts from 1. Thus, we implemented a wrapper class, which allows arbitrary starting index.
How to computing a[] given b[]? Let’s look at large elements first. Let’s first write down the largest element, for each elements i, there are b[i] elements greater than it. Since we write down elements in descending order, all b[i] elements must have been written down. Thus, we insert i into b[i] position. After all elements are inserted, a[] is constructed. Again, implementing using array or linked list will result in O(n^2) complexity, while TreeList only need O(nlogn) time to do all insertions.
vector<int> b2a(vector<int> b){
TreeList<int> a;
for(int i=b.size()-1;i>=0;i--)
a.insert(i,b[i]);
return a.toVector();
}
C code that print its own source
Very interesting. The trick is that use s to print itself, thus s should not contain any character like “\n”, “\”" which will be displayed differently as its literal value. Thus, we use numbers to represent those special characters.
#include <stdio.h>
int main(){const char *s="#include<stdio.h>%cint main(){const char *s=%c%s%c;printf(s,10,34,s,34);}";printf(s,10,34,s,34);}
Binary Indexed Tree
Binary Indexed Tree (also called Fenwick tree), a good tutorial is available here
Binary indexed tree’s logical structure is an integer array. Initially array is filled with 0. Binary Indexed Tree allows you to update value of element in array, and compute sum of elements in any index interval. All operations are O(log n).
The implementation is simple and efficient, although quite tricky, and it is very popular in programming contests. Source code follows.
//index from 1 to N
template<size_t N>
class BinaryIndexedTree{
int *a;
int low_bit(int x){return x & -x;}
public:
// i \in [1,N]
void add(int i,int k=1){
while(i<=N){
a[i]+=k;
i+=low_bit(i);
}
}
// i \in [0,N]
int sum(int i){
int res=0;
while(i){
res+=a[i];
i-=low_bit(i);
}
return res;
}
// i \in [0,N], j \in [i,N]
int rangeSum(int i,int j){
return sum(j)-sum(i-1);
}
BinaryIndexedTree(){
a=new int[N+1];
fill(a,a+N,0);
}
~BinaryIndexedTree(){
delete[] a;
}
};
TreeList
As we know, array data structure has O(1) access time and O(n) insertion/deletion(kth element); linked list need O(n) for all operations, and O(1) insertion/deletion only when position pointer is given.
We also know that binary search tree and skip-list provide O(log n) for accessing, insertion and deletion when element value is given.
Now what if we want a data structure, which behaves like a list, which can insert/delete element at kth position efficiently?
One of my friend suggest “blocked linked list”, which is a combination of array and linked list. Assume the input data size N is known, we can use a list of arrays to store data. List has length sqrt(N), each array also has length sqrt(N). Insertion and deletion has complexity O(sqrt(N)), and array is splitted when length exceed sqrt(N), and adjacent arrays are merged when length smaller than sqrt(N). This data structure is very popular in high school programming contest, where data size is known in advance. However, O(sqrt(N)) is not good enough: we want O(log n) for all operations.
Another method is to use binary search tree: binary search tree with “size” field can find “kth” element in O(log n) time. Elements have a floating point value as key for comparison. When inserting kth element, we first look for element k-1 and element k, and take their mean as key for new element. Thus new element is inserted into kth position. However, this method is not that elegant: size of the list depends on floating point precision.
Finally, comes our “TreeList” data structure. I don’t think I am the first guy who invent it, and I don’t know whether there is any other name for such data structure. The TreeList support 3 operations:
- insert element into kth position
- remove element into kth position
- get kth element
The implementation is based on treap. Actually, any binary search tree can be used. The complexity for all 3 operations are O(log n). Source code is shown below.
When do we need this data structure? Follow my latest post
template<class T>
class TreeList{
// list structure, support 3 operations:
// -insert element into kth position
// -remove element into kth position
// -get kth element
// implemented using treap
struct node{
T value;
int pri,size;
node *left,*right;
node(){}
node(const T& v,node* l,node* r){
value=v;
left=l;
right=r;
size=1;
pri=rand();
}
}*root,*null;
void resize(node* cur){
if(cur!=null)cur->size=cur->left->size+cur->right->size+1;
}
void left_rotate(node* &cur){
node *res=cur->left;
cur->left=res->right;
res->right=cur;
resize(cur);
cur=res;
resize(cur);
}
void right_rotate(node* &cur){
node *res=cur->right;
cur->right=res->left;
res->left=cur;
resize(cur);
cur=res;
resize(cur);
}
void remove_node(node* &cur){
if(cur->left == null && cur->right == null){
cur=null;
}else if(cur->left->pri < cur->right->pri){
left_rotate(cur);
remove_node(cur->right);
resize(cur);
}else{
right_rotate(cur);
remove_node(cur->left);
resize(cur);
}
}
void insert_to_kth_pos(const T& x,node* &cur,int k){
if(cur == null && k==0){
cur = new node(x,null,null);
}else if(k <= cur->left->size){
insert_to_kth_pos(x,cur->left,k);
if(cur->pri > cur->left->pri)
left_rotate(cur);
resize(cur);
}else if(k <= cur->size){
insert_to_kth_pos(x,cur->right,k-cur->left->size-1);
if(cur->pri > cur->right->pri)
right_rotate(cur);
resize(cur);
}
}
void remove_kth_element(node* & cur,int k){
if(k == cur->left->size){
remove_node(cur);
}else if(k < cur->left->size){
remove_kth_element(cur->left,k);
if(cur->pri > cur->left->pri)
left_rotate(cur);
resize(cur);
}else if(k < cur->size){
remove_kth_element(cur->right,k-cur->left->size-1);
if(cur->pri > cur->right->pri)
right_rotate(cur);
resize(cur);
}
}
void print(node* cur){
if(cur == null)return;
print(cur->left);
cout<<cur->value<<" ";
print(cur->right);
}
public:
TreeList (){
null=new node();
null->pri = 0x7fffffff;
null->size = 0;
null->left=null->right=null;
root=null;
}
void print(){
print(root);
cout<<endl;
}
void insert(const T& x,int k){
insert_to_kth_pos(x,root,k);
}
void remove(int k){
remove_kth_element(root,k);
}
T get(int k){
node* cur=root;
while(cur!=null){
if(k < cur->left->size){
cur=cur->left;
}else if(k == cur->left->size){
break;
}else{
k-=cur->left->size+1;
cur=cur->right;
}
}
return cur->value;
}
};