January 19, 2014

Git, NGINX, web development and deploy-by-push (part 3)

This is the third and hopefully the last part of the series of posts I'm writing about how to set up Gitolite and NGINX and build a deploy-by-push system. If you haven't read the previous two posts, you should!

As I was saying at the end of the second post, now that we have everything set up correctly, let's try it!

First you must clone the admin repository. Note that I'll be doing this from my machine instead of the server, but it works either way.

$ git clone git@alexandernst.com:/gitolite-admin

Add a new repository inside the gitolite.conf file in the conf folder. I'll add mydomain.com as a test repository. Once that's done, add, commit and push the changes.
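
With the conf I showed in part 1, that can be as simple as appending the new name to one of the @projects lines (a dedicated repo block with its own access rules works just as well):

@projects = random_stuff more_stuff mydomain.com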

$ git status
    modified:   conf/gitolite.conf
$ git commit -am "Add mydomain.com"
    1 file changed, 1 insertion(+), 1 deletion(-)
$ git push
Counting objects: 7, done.
Delta compression using up to 8 threads.
Compressing objects: 100% (3/3), done.
Writing objects: 100% (4/4), 403 bytes | 0 bytes/s, done.
Total 4 (delta 0), reused 0 (delta 0)
remote: Initialized empty Git repository in /srv/git/repositories/mydomain.com.git/
remote: Cloning into 'mydomain.com'...
remote: warning: You appear to have cloned an empty repository.
remote: [master (root-commit) 2fc4a4e] Initial commit
remote:  6 files changed, 9 insertions(+)
remote:  create mode 100644 conf/.gitignore
remote:  create mode 100755 conf/autorun.sh
remote:  create mode 100755 conf/post-commit.sh
remote:  create mode 100644 logs/.gitignore
remote:  create mode 100644 www/404.html
remote:  create mode 100644 www/50x.html
remote:  -----Start deploy-----
remote:  Cloning new repo...
remote:  Cloning into 'mydomain.com'...
remote:  fatal: bad object 0000000000000000000000000000000000000000
remote:  -----End deploy-----
remote: To git@localhost:mydomain.com
remote:  * [new branch]      master -> master
To git@alexandernst.com:gitolite-admin
    34f2281..404d913  master -> master
Don't worry about the fatal: bad object error. On the very first push, the old revision that git hands to the post-receive hook is all zeros, so the git diff in the deploy script has nothing to compare against. Subsequent pushes won't show it.

Now, let's go to the server and check whether it actually worked!

$ tree -a -L 3 /srv/http/
/srv/http/
`-- mydomain.com
    |-- .git
    |   |-- HEAD
    |   |-- branches
    |   |-- config
    |   |-- description
    |   |-- hooks
    |   |-- index
    |   |-- info
    |   |-- logs
    |   |-- objects
    |   |-- packed-refs
    |   `-- refs
    |-- conf
    |   |-- .gitignore
    |   |-- autorun.sh
    |   `-- post-commit.sh
    |-- logs
    |   `-- .gitignore
    `-- www
        |-- 404.html
        `-- 50x.html

11 directories, 11 files

It did! It's working. Now you can clone that new repository

$ git clone git@alexandernst.com:mydomain.com

and add a .nginx.conf file in the conf folder, an index.html file in the www folder, or whatever you want.
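
A first deploy could be as simple as this (the vhost file name is only an example; it just has to end in .nginx.conf so the include rule from part 2 picks it up):

$ cd mydomain.com
$ echo "Hello world" > www/index.html
$ $EDITOR conf/mydomain.com.nginx.conf   # a server block like the one shown in part 2
$ git add -A
$ git commit -m "Add index page and NGINX vhost"
$ git push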

Just one more thing. Note that when you remove a repository from the conf file, the repository isn't removed from /srv/http/ or from /srv/git/repositories/. The correct way of removing a repository is to remove it from the conf file, push, and then rm -rf both directories on the server.
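
On the server that boils down to something like:

$ rm -rf /srv/git/repositories/mydomain.com.git
$ rm -rf /srv/http/mydomain.com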

I hope you found this three-part series useful. If you find any bugs or errors, or if you have any comments, please use the comments section or email me and I'll try to fix them and/or help you.

Git, NGINX, web development and deploy-by-push (part 2)

If you are reading this, then you probably read the first part, and if not, then you should!

In the first part I showed how to set up Gitolite and SSH keys. In this part I'll show you how I made Gitolite and NGINX deploy my webpages, plus some extra tips.

Before we start, I want to remind you that we will be working on the server side (where you installed Gitolite), as the git user. Also keep in mind I set my git user's $HOME path to /srv/git. One last thing: I'll be serving my webpages from /srv/http.

Let's start!


Before we start with Gitolite and NGINX, I'll set the permissions and ownership of /srv/http. I assume the user/group http already exists on your machine. If it doesn't, create them.

$ sudo members http
http
$ sudo usermod -a -G http git
$ sudo chown -R http:http /srv/http
$ sudo chmod -R 770 /srv/http

Later we will configure NGINX to use the http group to read from /srv/http.

The very first thing we want Gitolite to do is to create a folder structure in each new repository. That is, whenever we add a new repository in the conf file inside Gitolite's admin repository, Gitolite will create that repository, clone it, create a basic folder structure and commit it.

I spent some time thinking about a good folder structure and I came up with one that lets me keep my website, my conf and my logs together.
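
In short, every repository ends up looking like this:

mydomain.com/
|-- conf/   (NGINX conf, autorun.sh, post-commit.sh)
|-- logs/   (per-domain server logs, kept out of git)
`-- www/    (the content NGINX actually serves)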

Tip: actually, it's really easy to add mail or whatever you want.

First we need to edit Gitolite's configuration, not the conf file in the admin repository, but Gitolite's own rc file.

$ pwd
/srv/git
$ ls -a
.bash_history
.gitconfig
.gitolite
.gitolite.rc
.local
.ssh
projects.list
repositories

We want to edit the .gitolite.rc file. There are 2 things that we need to do.
  1. Uncomment the LOCAL_CODE variable which points to $ENV{HOME}/local, or create it if it doesn't exist.
  2. Create a POST_CREATE variable with the value ['post_create'].
After you are done editing the file, it should look like this:

%RC = (
    LOCAL_CODE => "$ENV{HOME}/local",
    POST_CREATE => ['post_create'],
    ...
    ...
    ...
)

The next thing we must do is create the scripts that will actually build the folder structure and add it to each new repository.

$ pwd
/srv/git
$ mkdir -p local/hooks/common local/triggers

Create a file called post_create inside the local/triggers folder with the following contents:

#!/bin/bash

# POST_CREATE trigger: gitolite passes the repository name as the second argument
repo="$2"

# clone the freshly created (empty) repository into a temporary location
cd /tmp
git clone git@localhost:"$repo"
cd "$repo"

# basic structure: conf for NGINX/deploy scripts, logs for server logs, www for content
mkdir conf www logs
echo "" > conf/.gitignore
echo -e '#!/bin/bash\n' > conf/autorun.sh
echo -e '#!/bin/bash\n' > conf/post-commit.sh
chmod 770 conf/autorun.sh conf/post-commit.sh
echo -e '*\n!.gitignore' > logs/.gitignore   # keep the folder, ignore its contents
echo "404" > www/404.html
echo "50x" > www/50x.html

# commit and push the skeleton, then clean up the temporary clone
git --git-dir=/tmp/"$repo"/.git --work-tree=/tmp/"$repo" add -A
git --git-dir=/tmp/"$repo"/.git --work-tree=/tmp/"$repo" commit -m "Initial commit"
git --git-dir=/tmp/"$repo"/.git --work-tree=/tmp/"$repo" push
cd ..
rm -rf "$repo"

As you can see, this script is going to be triggered after a new repository is created and it will create some basic folders and some files/scripts.

The logs folder gets a .gitignore file that ignores everything except itself for two reasons. First, we want the logs folder itself to get added to git (as a reminder), and second, we don't want the server logs for that domain to get added to git. Of course, you can change this to whatever best fits your needs.

The conf folder gets an empty .gitignore file because we want to keep the folder itself and everything that we will place inside it later (NGINX conf, BindDNS conf, scripts, etc...). I'll explain this later.

Tip: You can create a mail folder the same way as the logs folder gets created.
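
For example, two extra lines in the post_create trigger above would be enough (mail here is just an example name):

mkdir mail
echo -e '*\n!.gitignore' > mail/.gitignore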

Create yet another file, called post-receive, inside the local/hooks/common folder with the following contents:

#!/bin/bash

repo=$(basename "$PWD")
repo=${repo%.git}

nginx=0

echo -----Start deploy-----
# if a working copy already exists, make sure the git user can write to it before updating
if [ -d /srv/http/$repo ]; then
    chown -R git:http /srv/http/$repo
    chmod -R 770 /srv/http/$repo
fi

if [ -d /srv/http/$repo ]; then
    echo "Updating existing repo..."
    cd /srv/http/$repo
    git --git-dir=/srv/http/$repo/.git --work-tree=/srv/http/$repo fetch origin
    git --git-dir=/srv/http/$repo/.git --work-tree=/srv/http/$repo update-index --refresh &> /dev/null # refresh the index so the reset below only rewrites files whose content actually changed, not files that merely differ in permissions
    git --git-dir=/srv/http/$repo/.git --work-tree=/srv/http/$repo reset --hard origin/master
else
    echo "Cloning new repo..."
    cd /srv/http/
    git clone git@localhost:$repo
fi

chown -R git:http /srv/http/$repo
chmod -R 770 /srv/http/$repo
chmod -R 550 /srv/http/$repo/www #we are done updating, set www to read only

# walk every ref in the push: remember the old revision (passed to post-commit.sh later)
# and check whether any *.nginx.conf file under conf/ was touched
postcommit_arg_oldrev=0
while read oldrev newrev refname; do
    diff=$(git --git-dir=/srv/http/$repo/.git --work-tree=/srv/http/$repo diff --name-only $oldrev $newrev)

    if [ $nginx -eq 0 ]; then
        nginx=$(echo $diff | grep 'conf/.*nginx\.conf' | wc -l)
    fi

    if [ $nginx -ne 0 ]; then
        break;
    fi

    if [ $postcommit_arg_oldrev -eq 0 ]; then
        postcommit_arg_oldrev=$oldrev
    fi
done

if [ -f /srv/http/$repo/conf/post-commit.sh ]; then
    cd /srv/http/$repo/conf
    /srv/http/$repo/conf/post-commit.sh $postcommit_arg_oldrev
    cd /srv/http/$repo
fi

if [ $nginx -ne 0 ]; then
    echo "Changes in NGINX conf found, restarting..."
    sudo nginx -t &> /dev/null
    if [ $? -eq 0 ]; then
        sudo systemctl restart nginx
    else
        sudo nginx -t
        echo "Not restarting NGINX due to bad config!"
    fi
fi
echo -----End deploy-----

This script is a bit longer, but it's easy to understand what it does. It gets triggered after each push. It checks whether the repository we pushed to is already cloned inside /srv/http; if it isn't, it clones it there, and if it is, it updates it. This is where the tricky part is. Remember that I said I don't want to run a blind git update command that could wipe who-knows-what? Well, what this script does is:
  1. Fetch (but not merge) all the changes from the Gitolite server
  2. Check for actual content changes (ignoring permission-only changes)
  3. Reset to origin/master only the files whose content changed (hello, nasty PHP viruses!). Why only the changed files instead of wiping everything? Because you want to preserve the files that were uploaded to your website from an external source, say avatars from a forum or attachments on a WordPress post.
The script also sets some permissions (change them to fit your needs) and iterates over all the added/changed files from every commit in the push; if it finds one inside the conf folder whose name ends in .nginx.conf, it restarts NGINX.

Tip: You can add support for BindDNS the same way I added support for NGINX.

There is one more thing the script does: it runs the post-commit.sh script inside the conf folder! This is extremely useful and you can use it for whatever you want. I use it to set write permissions on some paths inside www (cache, avatars, uploads, etc...) and to start/restart some NodeJS instances, but you can make it do literally whatever you like.
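
To give you an idea, a post-commit.sh could look something like this (the paths are made up for the example; the hook runs from the repository's conf folder and receives the old revision of the push as its first argument):

#!/bin/bash

# example post-commit.sh -- adapt the paths to your own site
repo_root=/srv/http/mydomain.com

# re-enable write access on the paths the web application needs
# (post-receive sets the whole www folder to read-only after every deploy)
chmod -R 770 "$repo_root/www/uploads" "$repo_root/www/cache" 2> /dev/null

# restart/reload whatever services this site depends on, e.g. a NodeJS instance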

Note: For the git user to be able to restart NGINX, you will either have to allow it (via the sudoers file) to run sudo systemctl restart nginx and sudo nginx -t, or you'll have to write a small helper script, executable but not writeable by the git user, that restarts NGINX.
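
The sudoers route could look something like this (edit with visudo; the binary paths differ between distributions, so double-check them on your system):

git ALL=(root) NOPASSWD: /usr/bin/nginx -t, /usr/bin/systemctl restart nginx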

Finally, we need to make both scripts executable.

$ chmod ug+rx local/hooks/common/post-receive
$ chmod ug+rx local/triggers/post_create

Now that we are done configuring Gitolite, it's time to do the same with NGINX. We just need to tell NGINX:
"Hey! Look for conf files that you can use in each folder under /srv/http/*/conf"
This way we can place one (or multiple) conf files in the conf folder of each repository, and NGINX will use them all as vhost definitions.

Edit your /etc/nginx/nginx.conf file and add this line inside the http directive.

include /srv/http/*/conf/*.nginx.conf;

You will also want to set access_log and error_log to off, because you'll want individual logs per domain/vhost. Just add those two rules to the http directive.

access_log off;
error_log off;

And make it run as the http user and use the http group.

user http http;

When you want to add a vhost, just create a new .nginx.conf file inside your repository's conf folder and write a server directive. Like this one:

server {
    listen 80;
    server_name mydomain.com www.mydomain.com;

    root /srv/http/mydomain.com/www;

    error_page 404 /404.html;
    error_page 500 502 503 504 /50x.html;

    location / {

    }
}
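
And if you want the per-domain logs we disabled globally, you can point them at the repository's logs folder (that's exactly what it and its .gitignore are there for), with something along these lines inside the server directive:

    access_log /srv/http/mydomain.com/logs/access.log;
    error_log /srv/http/mydomain.com/logs/error.log;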

We are done here! If you followed everything step by step, you now have a completely functional Git server with a deploy-by-push system.

I'll write yet another part, hopefully the last one, showing the actual process of creating a new repository, so make sure to check that too!

January 18, 2014

Git, NGINX, web development and deploy-by-push (part 1)

I work as a web developer (please, help?) and what I want from my environment/servers is reliability and a fast, easy deployment method. I found a combination of tools that does the job exactly the way I want: Git, NGINX and some shell scripts.

The idea behind this is to be able to clone a git repository which would contain a particular web page, work on that repository and then push the changes back to the server, triggering an automatic deploy.
Of course, we don't want to trigger a blind git operation that will wipe out who-knows-what.
There is something more. Sometimes, because of WordPress bugs or bad karma, web pages get infected with those crappy PHP viruses. That is something I want to be able to fix in a few seconds, without restoring old backups or doing any kind of manual cleaning.

I managed to get all of that working, and I'd like to share it here. Hopefully you'll find it useful.

Here is how. Start by installing NGINX and Gitolite. Don't configure anything yet, just install them. I'll wait while you're dealing with your favorite package manager.


Are you done? Ok, let's move on.

You'll (probably) notice that the Gitolite package created a git user/group. If it didn't, create them. This is an important step, so don't skip it! But keep one thing in mind: the $HOME of the git user is the path where the repositories are going to be stored. You're free to set that path to whatever you want; I'll set mine to /srv/git. You'll also have to set a password, set the right permissions/ownership, configure some basic git settings and generate an SSH key. I won't set a passphrase for the SSH key, but you are free to set one if you wish.

$ usermod -d /srv/git git
$ passwd git
$ chown -R git:git /srv/git
$ chmod -R u+rwx /srv/git
$ chmod -R go-rwx /srv/git
$ su - git
$ ssh-keygen -t rsa -C "git@alexandernst.com"
$ git config --global user.email "git@alexandernst.com"
$ git config --global user.name "Git System"
$ git config --global push.default simple

The next step is to set up Gitolite itself. It's done with a single command that creates the required folder structure and the admin repository.

$ gitolite setup -pk /srv/git/.ssh/id_rsa.pub

$ pwd
/srv/git

$ tree -a -L 2
.
|-- .bash_history
|-- .gitconfig
|-- .gitolite
|   |-- conf
|   |-- hooks
|   |-- keydir
|   `-- logs
|-- .gitolite.rc
|-- .local
|   `-- share
|-- .ssh
|   |-- authorized_keys
|   |-- id_rsa
|   `-- id_rsa.pub
|-- projects.list
`-- repositories
    |-- gitolite-admin.git
    `-- testing.git

11 directories, 7 files

You'll be using the admin repository to add and/or delete repositories and users. For that, you'll need to clone it, edit the config file inside and push it back.

$ git clone git@localhost:gitolite-admin

Note: If the cloning process asked you for the git user's password instead of using the key you just generated, or if you got a warning saying that you cloned an empty repository, then you did something wrong. SSH is extremely picky about permissions, so triple-check that you followed the exact commands I wrote. Only the git user should be able to read, write and execute the $HOME path; the group and others should have no access at all to that directory.

$ tree -a -L 2
.
|-- .bash_history
|-- .gitconfig
|-- .gitolite
|   |-- conf
|   |-- hooks
|   |-- keydir
|   `-- logs
|-- .gitolite.rc
|-- .local
|   `-- share
|-- .ssh
|   |-- authorized_keys
|   |-- id_rsa
|   |-- id_rsa.pub
|   `-- known_hosts
|-- gitolite-admin
|   |-- .git
|   |-- conf
|   `-- keydir
|-- projects.list
`-- repositories
    |-- gitolite-admin.git
    `-- testing.git

15 directories, 8 files

$ cd gitolite-admin
$ cat conf/gitolite.conf
repo gitolite-admin
    RW+     =   id_rsa

repo testing
    RW+     =   @all

I won't be explaining in depth how to configure Gitolite. If you want to learn everything about it, go to Gitolite's docs and read them. I'll just show you what I did and what works for me.

My conf file looks like this.

@projects = random_stuff more_stuff
@projects = yet_more_stuff

repo gitolite-admin
    RW+     =   @all

repo @projects
    RW+     =   @all

Note that I'm granting both read and write access to all the keys that I have added (keep reading, I'll get to this in a moment). Maybe this won't fit your needs, or maybe it will. Configure your Gitolite instance as you want.

After editing the config file, as I already said earlier, you need to push it back to Gitolite's admin repository.

$ git status
        modified:   conf/gitolite.conf
$ git commit -am "Grant RW to @all"
        1 file changed, 4 insertions(+), 2 deletions(-)
$ git push

After each push, Gitolite will scan your conf file and detect all the changes you did. Which means that if you added/deleted repositories, Gitolite will create/delete them, if you changed permissions, Gitolite will apply those changes, etc...

Now that we are done with Gitolite's basic setup, we need to add some users, because... yeah... if we can't clone/push from/to the server, what's the point of having one? 

Go to your machine and generate an SSH key the same way you did for the git user. Then copy the public (.pub) key to the server, into the keydir folder inside the Gitolite admin repository.
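
Generating the key on my machine looks something like this (use whatever comment you like):

$ ssh-keygen -t rsa -C "alexandernst@laptop"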

$ scp /home/alexandernst/.ssh/id_rsa.pub git@alexandernst.com:/srv/git/gitolite-admin/keydir/alexandernst.pub

Note that I copied my key from my machine to the server where I installed Gitolite, and that I renamed it from id_rsa.pub to alexandernst.pub. Gitolite uses the file name (without the .pub extension) as the user name.

Now go back to the server and add the key you just copied.

$ git status
        keydir/alexandernst.pub
$ git add keydir/alexandernst.pub
$ git commit -am "Add alexandernst user"
        1 file changed, 1 insertion(+)
$ git push

From now on you should be able to clone the admin repository on your own machine and add/delete users/repositories from there.
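
For example:

$ git clone git@alexandernst.com:gitolite-admin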

I'll be writing a second post about how to actually set up Gitolite and NGINX for deploy-by-push, as this post already looks quite long.

January 5, 2014

Why DuckDuckGo will fail

Maybe a better title would be 'Why all search engines with DDG's policy will fail'; anyway, the current one works too.

Let me make something clear: I won't discuss any bugs that DDG had in the past, has right now or will have in the future. Every piece of software has bugs, missing or incomplete features and so on. I'll skip those, as they are fixable given enough time, manpower or other resources.

What I'll focus on in this post is why the idea/concept of a search engine that doesn't track people won't work.

If you're reading this, then you probably know how DDG works and what they're trying to do. But in case you've been living in a cave for the last 10 years: DDG is a search engine that emphasizes protecting searchers' privacy and avoiding the filter bubble of personalized search results. It distinguishes itself from other search engines by not profiling its users and by deliberately showing all users the same results for a given search term.

That means that DDG does the opposite of what Google does when you search for something. If you don't know how Google works, let's just say that they save the query terms you use when you search and use them to build a (sort of) personal profile, which is later used to show you targeted ads, better results based on your search history and who-knows-what-more.

So... I already wrote a few paragraphs and I still haven't said why DDG will fail. Well, it's pretty simple. It will fail because we are all different. No, really, let me explain.

By now Google already knows that my main interests are developing, motorbikes and strip clubs dogs. That also means that every single time I search something in Google I'll get search results based on my preferences or based on the topics I mostly search for. Which means that when I search for "tree" I'll most probably get results containing "binary tree", which is actually fine, because that's probably what I'm looking for.
When I search for "Shark", I'll probably get results about the motorbike helmet company, which is also fine because that's probably what I'm looking for.
When Bob, who is a biologist, searches for "tree", he will get results related to nature, because that's what he is probably looking for.

On the other hand, DDG, which doesn't know my search preferences, will give me results that don't make any sense to me.
DDG will do its best to match the words I typed in the search field against all the pages it has indexed, trying to find matches in the most important/relevant parts of each page (title, short description, etc...). But that just isn't enough.

That's how search engines used to work 7 years ago. And let's be honest, we didn't get the results we were looking for on page one; we had to use more words to describe better what we were looking for.

That's the main reason why DDG will fail. The search engine itself is doomed because it won't be able to deliver the right results.

One possible solution for DDG would be to actually track users and give them better results without sharing that tracking information with third parties. That way users would benefit from more accurate results without having to worry about their privacy.
They could even let every single user choose whether to be tracked and get better search results, or stay untracked and get the "default" results that everybody gets right now.

One thing is clear: DDG's current strategy won't make them the search engine, a title that right now is in Google's possession.