Are You A Sysadmin? Here’s Why You Should Choose an IT Company

Throughout my career, I’ve had the opportunity to work for different types of companies: a financial institution, a large publishing company and an insurance company. Each of them had a different perspective on IT.

MTL Data Meetup at Radialpoint

Love open data? Radialpoint is opening its doors to the MTL Data Meetup group, Wednesday, October 22, 2014 at 5:30 PM.

We’ll be exploring the evolution of open data with some great minds in the space. Dr. Diane Mercier will keynote the event, talking to us about Montreal’s open data portal. After a quick break, Toby Hocking will show us data visualizations of bike counts at several locations in Montreal between 2009 and 2013. We’ll wrap up the meetup with a new solution to Kaggle’s Titanic competition, a challenge where candidates use machine learning tools to predict which passengers survived the tragic 1912 shipwreck. A team has implemented a new Python solution using Theano (a deep learning Python library) and they are going to demo what they’ve been able to uncover using it.

Exciting stuff to come! Be sure to secure your spot here.



Elasticsearch Montreal Meetup Tonight!

Interested in stretching the possibilities of what you can do with Elasticsearch?

In the Near Future, Data Sovereignty, Security and Privacy Will Be Why Organizations Run to the Cloud, Not from It

When I attended the Gigaom Structure conference in San Francisco this summer, two things stood out for me most. One was how few organizations are actually running loads in the cloud today. The second was the huge amount of work, both legislative and technical, that cloud providers were doing to resolve the concerns companies have around data sovereignty, security and privacy. Millions of dollars are being spent to conform to different regions’ privacy requirements, along with huge lobbying efforts to shape policy that is friendly to the public cloud.

During the conference, a question was put to a standing-room-only crowd of about 300 IT professionals and IT architects from major North American corporations. “How many of you have working loads in the public cloud today – that you know about?” asked the presenter. There was a small round of chuckles at the last part of the question. As I looked around, I was blown away to find that only about 10 other hands were raised besides mine.

It reminded me of a Microsoft conference that I attended in 2012. The presenter asked the room full of hundreds of IT professionals how many of them had their organizations on Windows 7 (which had been released to the public three years earlier), and I remember being amazed that I was one of four people in a room of hundreds who raised their hands. (The most unsettling thought was how many organizations might be running Vista!)

This, to me, illustrates the lingering anxiety and resistance companies still have towards change. The cloud is change, and its implications for data sovereignty, privacy and security remain convenient excuses to stay stagnant.

Don’t get me wrong. The nature of some businesses (think banks) makes entry into the cloud computing arena more complex. One reason is that federal regulations haven’t adjusted to allow it. Despite this, as time goes on, heavy on-premises IT infrastructures will cease to be the standard and become the exception to the rule.

“That You Know About”

Circling back to the initial question about how many organizations are running loads in the cloud – the “that you know about” part speaks to the fact that cloud use within companies is already happening despite policies to control or prevent it. While the decision-makers, lawyers and other managers hem and haw over whether to move to the cloud, their developers have often already found ways to work there.

While there are multitudes of reasons why the developers are running to the cloud, the fact that they are doing it despite corporate policies against it speaks to the inevitability of the approaching storm.

The Cloud is the Future

The rare edge cases like the story of quickly get pointed out as the reason why it’s too soon to move to the cloud. The reality is that no matter how good your security team is, how many lawyers you have and how many of them spend their days worrying about data privacy and security, no company in the world is spending more resources, time and focus on this than companies like Google, Amazon and Microsoft.

The reason they are hiring the world’s best minds in these fields is not only because they want to resolve every country’s concerns regarding privacy legislation, not only because they want the most secure public clouds, but because they want there to be no more excuses that prevent key decision-makers from spending their dollars with them instead of with the Blade, SAN and Network equipment makers of the world.

In the very near future companies will be running to the public cloud because if they are genuinely concerned about having great security and privacy without any data sovereignty issues, they would be crazy to build their infrastructure anywhere else.


This post originally appeared on IT World Canada as part of their So You Think You Can Blog contest.

OSGi: The Gateway Into Micro-Services Architecture

The terms “modularity” and “microservices architecture” pop up quite often these days in the context of building scalable, reliable distributed systems. The Java platform itself is known to be weak with regard to modularity (Java 9 is going to address this by delivering Project Jigsaw), which gave frameworks like OSGi and JBoss Modules a chance to emerge.

When I first heard about OSGi back in 2007, I was truly excited about all the advantages Java applications might gain by being built on top of it. But very quickly frustration replaced the excitement: no tooling support, a very limited set of compatible libraries and frameworks, and a runtime that was quite unstable and hard to troubleshoot. Clearly, it was not ready to be used by the average Java developer, so I had to put it on the shelf. Over the years, OSGi has matured a lot and gained widespread community support.

The curious reader may ask: what are the benefits of using modules and OSGi in particular? To name just a few problems it helps solve:

  • explicit (and versioned) dependency management: modules declare what they need (and optionally the version ranges)
  • small footprint: modules are not packaged with all their dependencies
  • easy release: modules can be developed and released independently
  • hot redeploy: individual modules may be redeployed without affecting others

In today’s post we are going to take a 10,000-foot view of the state of the art in building modular Java applications using OSGi. Leaving aside discussions of how good or bad OSGi is, we are going to build an example application consisting of the following modules:

  • data access module
  • business services module
  • REST services module

The two main building blocks of the application are Apache OpenJPA 2.3.0 / JPA 2.0 for data access (unfortunately, JPA 2.1 is not yet supported by the OSGi implementation of our choice) and Apache CXF 3.0.1 / JAX-RS 2.0 for the REST layer. I found Christian Schneider’s blog, Liquid Reality, to be an invaluable source of information about OSGi (as well as many other topics).

In the OSGi world, modules are called bundles. Bundles declare in their manifest the packages they depend on (import packages) and the packages they expose (export packages) so other bundles are able to use them. Apache Maven supports this packaging model as well. The bundles are managed by an OSGi runtime, or container, which in our case is going to be Apache Karaf 3.0.1 (actually, it is the only thing we need to download and unpack).
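
To make the import/export idea a bit more concrete, here is a minimal sketch that reads the relevant headers from a bundle manifest using only the JDK. The manifest text, the bundle name example.rest and the package names are hypothetical and chosen purely for illustration; a real bundle's manifest is normally generated by the build tooling rather than written by hand.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.jar.Attributes;
import java.util.jar.Manifest;

// Illustrative only: the bundle and package names below are hypothetical.
class BundleManifestSketch {
    // A hand-written manifest for an imaginary REST bundle.
    static final String MANIFEST =
        "Manifest-Version: 1.0\n" +
        "Bundle-SymbolicName: example.rest\n" +
        "Bundle-Version: 1.0.0\n" +
        "Import-Package: example.services;version=\"[1.0,2.0)\"\n" +
        "Export-Package: example.rest;version=\"1.0.0\"\n";

    // Parses manifest text and returns its main attributes (the headers).
    static Attributes parse(String text) {
        try {
            Manifest manifest = new Manifest(
                new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8)));
            return manifest.getMainAttributes();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Attributes attrs = parse(MANIFEST);
        // What this bundle needs from others, with an accepted version range
        System.out.println("imports: " + attrs.getValue("Import-Package"));
        // What this bundle offers to others
        System.out.println("exports: " + attrs.getValue("Export-Package"));
    }
}
```

The container reads these same headers when it resolves bundles: a bundle is only activated once every package it imports is exported by some other installed bundle in a matching version.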

Let me stop talking and show some code. We are going to start from the top (REST) and go all the way to the bottom (data access) as it would be easier to follow. Our PeopleRestService is a typical example of JAX-RS 2.0 service implementation:

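package com.example.jaxrs;

import java.util.Collection;

import javax.ws.rs.DELETE;
import javax.ws.rs.DefaultValue;
import javax.ws.rs.FormParam;
import javax.ws.rs.GET;
import javax.ws.rs.POST;
import javax.ws.rs.PUT;
import javax.ws.rs.Path;
import javax.ws.rs.PathParam;
import javax.ws.rs.Produces;
import javax.ws.rs.QueryParam;
import javax.ws.rs.core.Context;
import javax.ws.rs.core.MediaType;
import javax.ws.rs.core.Response;
import javax.ws.rs.core.UriInfo;

// Person and PeopleService are provided by the other modules
// (their import lines are not shown here).

@Path( "/people" )
public class PeopleRestService {
    private PeopleService peopleService;

    // Returns one page of people, five per page
    @GET
    @Produces( { MediaType.APPLICATION_JSON } )
    public Collection< Person > getPeople(
            @QueryParam( "page") @DefaultValue( "1" ) final int page ) {
        return peopleService.getPeople( page, 5 );
    }

    @GET
    @Produces( { MediaType.APPLICATION_JSON } )
    @Path( "/{email}" )
    public Person getPerson( @PathParam( "email" ) final String email ) {
        return peopleService.getByEmail( email );
    }

    @POST
    @Produces( { MediaType.APPLICATION_JSON  } )
    public Response addPerson( @Context final UriInfo uriInfo,
            @FormParam( "email" ) final String email,
            @FormParam( "firstName" ) final String firstName,
            @FormParam( "lastName" ) final String lastName ) {

        peopleService.addPerson( email, firstName, lastName );
        // Point the Location header at the newly created resource
        return Response.created( uriInfo
            .getRequestUriBuilder()
            .path( email )
            .build() ).build();
    }

    @PUT
    @Produces( { MediaType.APPLICATION_JSON  } )
    @Path( "/{email}" )
    public Person updatePerson( @PathParam( "email" ) final String email,
            @FormParam( "firstName" ) final String firstName,
            @FormParam( "lastName" )  final String lastName ) {

        final Person person = peopleService.getByEmail( email );

        // Only overwrite the fields that were actually submitted
        if( firstName != null ) {
            person.setFirstName( firstName );
        }

        if( lastName != null ) {
            person.setLastName( lastName );
        }

        return person;
    }

    @DELETE
    @Path( "/{email}" )
    public Response deletePerson( @PathParam( "email" ) final String email ) {
        peopleService.removePerson( email );
        return Response.ok().build();
    }

    // Setter used by the blueprint container to inject the service
    public void setPeopleService( final PeopleService peopleService ) {
        this.peopleService = peopleService;
    }
}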
As we can see, there is nothing here telling us about OSGi. The only dependency is the PeopleService, which somehow should be injected into the PeopleRestService. How? Typically, OSGi applications use Blueprint as the dependency injection framework, very similar to our old buddy, XML-based Spring configuration. It should be packaged along with the application inside the OSGI-INF/blueprint folder. Here is a blueprint example for our REST module, built on top of Apache CXF 3.0.1:

<blueprint xmlns=""

    <cxf:bus id="bus">

    <jaxrs:server address="/api" id="api">
             <ref component-id="peopleRestService"/>
            <bean class="com.fasterxml.jackson.jaxrs.json.JacksonJsonProvider" />

    <!-- Implementation of the rest service -->
    <bean id="peopleRestService" class="com.example.jaxrs.PeopleRestService">
        <property name="peopleService" ref="peopleService"/>

    <reference id="peopleService" interface="" />

Very small and simple: basically the configuration just states that in order for the module to work, a reference to the PeopleService should be provided (effectively, by the OSGi container). To see how that is going to happen, let us take a look at another module, which exposes services. It contains only one interface, PeopleService:


import java.util.Collection;


public interface PeopleService {
    Collection< Person > getPeople( int page, int pageSize );
    Person getByEmail( final String email );
    Person addPerson( final String email, final String firstName, final String lastName );
    void removePerson( final String email );
}

And also provides its implementation as PeopleServiceImpl class:


import java.util.Collection;

import org.osgi.service.log.LogService;


public class PeopleServiceImpl implements PeopleService {
    private PeopleDao peopleDao;
    private LogService logService;

    @Override
    public Collection< Person > getPeople( final int page, final int pageSize ) {
        logService.log( LogService.LOG_INFO, "Getting all people" );
        return peopleDao.findAll( page, pageSize );
    }

    @Override
    public Person getByEmail( final String email ) {
        logService.log( LogService.LOG_INFO, "Looking for a person with e-mail: " + email );
        return peopleDao.find( email );
    }

    @Override
    public Person addPerson( final String email, final String firstName,
            final String lastName ) {
        logService.log( LogService.LOG_INFO, "Adding new person with e-mail: " + email );
        // Persist the new person through the data access module
        return peopleDao.save( new Person( email, firstName, lastName ) );
    }

    @Override
    public void removePerson( final String email ) {
        logService.log( LogService.LOG_INFO, "Removing a person with e-mail: " + email );
        peopleDao.delete( email );
    }

    // Setters used by the blueprint container for injection
    public void setPeopleDao( final PeopleDao peopleDao ) {
        this.peopleDao = peopleDao;
    }

    public void setLogService( final LogService logService ) {
        this.logService = logService;
    }
}

Again, this is a very small and clean implementation with two injectable dependencies, org.osgi.service.log.LogService and PeopleDao. Its blueprint configuration, located inside the OSGI-INF/blueprint folder, looks quite compact as well:

<blueprint xmlns=""

    <service ref="peopleService" interface="" />
    <bean id="peopleService" class="">
        <property name="peopleDao" ref="peopleDao" />
        <property name="logService" ref="logService" />

    <reference id="peopleDao" interface="" />
    <reference id="logService" interface="org.osgi.service.log.LogService" />

The references to PeopleDao and LogService are expected to be provided by the OSGi container at runtime. The PeopleService implementation, in turn, is exposed as a service, and the OSGi container will be able to inject it into PeopleRestService once its bundle is activated.

The last piece of the puzzle, the data access module, is a bit more complicated: it contains the persistence configuration (META-INF/persistence.xml) and basically depends on the JPA 2.0 capabilities of the OSGi container. The persistence.xml is quite basic:

<persistence xmlns=""

    <persistence-unit name="peopleDb" transaction-type="JTA">

            <property name="openjpa.jdbc.SynchronizeMappings" value="buildSchema"/>

Similarly to the service module, there is an interface PeopleDao exposed:


import java.util.Collection;


public interface PeopleDao {
    Person save( final Person person );
    Person find( final String email );
    Collection< Person > findAll( final int page, final int pageSize );
    void delete( final String email );
}

With its implementation PeopleDaoImpl:


import java.util.Collection;

import javax.persistence.EntityManager;
import javax.persistence.criteria.CriteriaBuilder;
import javax.persistence.criteria.CriteriaQuery;


public class PeopleDaoImpl implements PeopleDao {
    private EntityManager entityManager;

    @Override
    public Person save( final Person person ) {
        entityManager.persist( person );
        return person;
    }

    @Override
    public Person find( final String email ) {
        return entityManager.find( Person.class, email );
    }

    // Setter used by the container to inject the entity manager
    public void setEntityManager( final EntityManager entityManager ) {
        this.entityManager = entityManager;
    }

    @Override
    public Collection< Person > findAll( final int page, final int pageSize ) {
        final CriteriaBuilder cb = entityManager.getCriteriaBuilder();

        final CriteriaQuery< Person > query = cb.createQuery( Person.class );
        query.from( Person.class );

        // Pages are 1-based, so page 1 starts at row 0
        return entityManager
            .createQuery( query )
            .setFirstResult(( page - 1 ) * pageSize )
            .setMaxResults( pageSize )
            .getResultList();
    }

    @Override
    public void delete( final String email ) {
        entityManager.remove( find( email ) );
    }
}
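
The paging arithmetic used in findAll is easy to check in isolation. The following is a small sketch, with the hypothetical helper name firstResult, of how a 1-based page number maps to the zero-based row offset passed to setFirstResult:

```java
// Illustrative only: PagingSketch and firstResult are hypothetical names.
class PagingSketch {
    // Zero-based offset of the first row on a 1-based page
    static int firstResult(int page, int pageSize) {
        return (page - 1) * pageSize;
    }

    public static void main(String[] args) {
        final int pageSize = 5; // the page size the REST service passes in
        for (int page = 1; page <= 3; page++) {
            System.out.println("page " + page + " starts at row "
                + firstResult(page, pageSize));
        }
    }
}
```

Note that a page number of 0 or less would produce a negative offset, so a production service would probably want to validate the page query parameter before it reaches the DAO.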

Please notice that although we are performing data manipulations, there is no mention of transactions and no explicit calls to the entity manager’s transaction API. We are going to use the declarative approach to transactions, as the blueprint configuration supports that (the location is unchanged, the OSGI-INF/blueprint folder):

<blueprint xmlns=""

    <service ref="peopleDao" interface="" />
    <bean id="peopleDao" class="">
        <jpa:context unitname="peopleDb" property="entityManager" />
        <tx:transaction method="*" value="Required"/>

    <bean id="dataSource" class="org.hsqldb.jdbc.JDBCDataSource">
        <property name="url" value="jdbc:hsqldb:mem:peopleDb"/>

    <service ref="dataSource" interface="javax.sql.DataSource">
            <entry key="" value="peopleDb" />

One thing to keep in mind: the application doesn’t need to create the JPA entity manager itself. The OSGi runtime is able to do that and inject it everywhere it is required, driven by jpa:context declarations. Similarly, tx:transaction instructs the runtime to wrap the selected service methods inside a transaction.
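
To give a feel for what wrapping service methods in a transaction means, here is a minimal sketch using a plain JDK dynamic proxy. It is not how the blueprint container implements tx:transaction. The Repository interface, the EVENTS list and the transactional helper are hypothetical names invented for illustration, and begin, commit and rollback are only recorded as strings rather than driving a real transaction manager.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

// Illustrative only: all names in this class are hypothetical.
class TransactionProxySketch {
    // Stand-in for a DAO interface exposed as a service
    interface Repository {
        void save(String value);
    }

    // Records what happened so the wrapping order is observable
    static final List<String> EVENTS = new ArrayList<>();

    // Returns a proxy that surrounds every call on the target with
    // "begin" and "commit", or "rollback" if the target throws.
    static Repository transactional(Repository target) {
        InvocationHandler handler = (proxy, method, args) -> {
            EVENTS.add("begin");
            try {
                Object result = method.invoke(target, args);
                EVENTS.add("commit");
                return result;
            } catch (InvocationTargetException e) {
                EVENTS.add("rollback");
                throw e.getCause(); // rethrow the target's own exception
            }
        };
        return (Repository) Proxy.newProxyInstance(
            Repository.class.getClassLoader(),
            new Class<?>[] { Repository.class },
            handler);
    }

    public static void main(String[] args) {
        Repository repo = transactional(value -> EVENTS.add("save " + value));
        repo.save("tom");
        System.out.println(EVENTS);
    }
}
```

The caller only ever sees the Repository interface, which is why the DAO code itself can stay free of any transaction handling.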

Now that the last service, PeopleDao, is exposed, we are ready to deploy our modules with Apache Karaf 3.0.1. It is quite easy to do in three steps:

  • run the Apache Karaf 3.0.1 container
    bin/karaf (or bin\karaf.bat on Windows)
  • execute following commands from the Apache Karaf 3.0.1 shell:
    feature:repo-add cxf 3.0.1 
    feature:install http cxf jpa openjpa transaction jndi jdbc 
    install -s mvn:org.hsqldb/hsqldb/2.3.2 
    install -s mvn:com.fasterxml.jackson.core/jackson-core/2.4.0
    install -s mvn:com.fasterxml.jackson.core/jackson-annotations/2.4.0 
    install -s mvn:com.fasterxml.jackson.core/jackson-databind/2.4.0 
    install -s mvn:com.fasterxml.jackson.jaxrs/jackson-jaxrs-base/2.4.0 
    install -s mvn:com.fasterxml.jackson.jaxrs/jackson-jaxrs-json-provider/2.4.0
  • build our modules and copy them into Apache Karaf 3.0.1‘s deploy folder (while container is still running):
    mvn clean package
    cp module*/target/*jar apache-karaf-3.0.1/deploy/

When you run the list command in the Apache Karaf 3.0.1 shell, you should see the list of all activated bundles (modules), where module-service, module-jax-rs and module-data correspond to the ones we have developed. By default, all our Apache CXF 3.0.1 services will be available at the base URL http://:8181/cxf/api/. It is easy to check by executing the cxf:list-endpoints -f command in the Apache Karaf 3.0.1 shell. Let us make sure our REST layer works as expected by sending a couple of HTTP requests. Let us create a new person:

curl http://localhost:8181/cxf/api/people -iX POST -d "firstName=Tom&lastName=Knocker&"

HTTP/1.1 201 Created
Content-Length: 0
Date: Sat, 09 Aug 2014 15:26:17 GMT
Location: http://localhost:8181/cxf/api/people/
Server: Jetty(8.1.14.v20131031)

And verify that person has been created successfully:

curl -i http://localhost:8181/cxf/api/people

HTTP/1.1 200 OK
Content-Type: application/json
Date: Sat, 09 Aug 2014 15:28:20 GMT
Transfer-Encoding: chunked
Server: Jetty(8.1.14.v20131031)


It would be nice to check whether the database has the person populated as well. With the Apache Karaf 3.0.1 shell it is very simple to do by executing just two commands: jdbc:datasources and jdbc:query peopleDb "select * from people".

Awesome! I hope this introductory blog post opens yet another piece of interesting technology you may use for developing robust, scalable, modular and manageable software. We have not touched on many, many things but these are here for you to discover. The complete source code is available on GitHub.

Note to Hibernate 4.2.x / 4.3.x users: unfortunately, in the current release of Apache Karaf 3.0.1, Hibernate 4.3.x does not work properly at all (as JPA 2.1 is not yet supported), and although I have managed to run with Hibernate 4.2.x, the container often refused to resolve the JPA-related dependencies.

Source: OSGi: the gateway into micro-services architecture

Canadian AI 2014 recap

Here is a recap of AI 2014 that I published on Medium:

PyLadies Meetup “Python for Natural Language Processing” Held at Radialpoint This Thursday

PyLadies Meetup MTL

This Thursday, July 17, Radialpoint will be hosting a PyLadies meetup: “Python for Natural Language Processing”. Come and learn how Laura Hernandez, a PhD student from Ecole de Technologie Superieure, is aiming to detect Alzheimer’s disease using NLP, while Zareen Syed deep-dives into the challenges of NLP. PyLadies organizer Françoise Provencher will provide a demo of NLP tools in Python.

Come enjoy free snacks and refreshments alongside talented PyLadies!

Looking forward to seeing you!

Meetup Link :

Date: Thursday, July 17, 2014

Time: 6:30 PM to 9:00 PM

Address: 2050 Bleury, Suite 300, Montréal, QC (map)

Entity Linking and Retrieval for Semantic Search Montreal 2014


A few weeks ago Radialpoint had the privilege of hosting a tutorial for Entity Linking and Retrieval for Semantic Search. For this purpose, we brought three presenters from Europe, researchers Edgar Meij from Yahoo! Labs, Krisztian Balog from the University of Stavanger and Daan Odijk from the University of Amsterdam. They spent a full day with us and a large cross-section of the Information Retrieval community here in Montreal, both on the academic and industrial sides.

This tutorial was the latest incarnation of a series of tutorials by the same presenters that started at SIGIR 2013. Besides SIGIR, the same material has been presented at WWW 2013 and WSDM 2014, among others. These are among the most important conferences in the field. SIGIR 2013 took place in Ireland, WWW 2013 in Brazil and WSDM 2014 in the US. It is thus a great feat for us to have been able to host such a world-class tutorial here in downtown Montreal.

A Merger of Great Minds

I was very happy to see such a talented and diverse crowd attend the tutorial. There were people with a wide variety of interests and experience. I saw seasoned experts together with students, managers together with developers. Best of all, I saw people driven by a sincere interest in the topic who were taking the opportunity to attend a training that, on top of travel costs to places such as Brazil, would have cost hundreds of dollars as part of SIGIR. This type of event cements the growing interest in data-driven R&D that Radialpoint is pushing to new levels.

The tutorial covered a key technology we’re integrating into the award-winning Radialpoint Reveal and other upcoming products – the ability to detect entities (e.g., a hardware device or a software program) in running text and link them to an existing representation of the item (e.g., its Wikipedia page or its Tech Support page).

In true Open Source fashion, the presenters are making all the material for the tutorial available on GitHub. It had two parts:

  • In the first part, they present the problem of entity linking, recognizing entities in running text and linking them back to a generalized concept graph (ontology).
  • In the second part, they cover entity retrieval, discussing the semantic search problem and statistical approaches to it.

The material is coupled with online exercises that can be found on Codecademy.

Post-Tutorial Discussions

After the event, we enjoyed some time at our office discussing how Entity Linking and Retrieval technology is best applied to knowledge management in the customer support industry. In particular, we discussed how technical support anomalies might be effectively represented as labeled subgraphs, an intriguing possibility we might explore in the very near future.

Closing Thoughts

It was a great tutorial, full of positive energy and forward-thinking ideas. The presenters, Edgar, Krisztian and Daan, also told us how unique this experience was for them, and in particular how happy they were to see people from such diverse backgrounds interested in this topic. We wish them good luck and hope to see them during their next visit to Montreal!

Research behind Reveal wins Best Paper Award!

Alexis Phil Pablo and Ary

To build Radialpoint Reveal we applied a combination of machine learning techniques to process search query logs. The research we performed formed the basis for the academic paper called Filtering Personal Queries from Mixed-Use Query Logs. We recently submitted it to the 27th Canadian Conference on Artificial Intelligence. It was Radialpoint’s first academic paper that we had a chance to present at Canadian AI, a major AI conference. Here’s the abstract of the paper:

Queries performed against the open Web during working hours reveal missing content in the internal documentation within an organization. Mining such queries is thus advantageous but it must strictly adhere to privacy policy and meet privacy expectations of the employees. Particularly, we need to filter queries related to non-work activities. We show that, in the case of technical support agents, 78.7% of personal queries can be filtered using a words-as-features Maximum Entropy approach, while losing only 9.3% of the business related queries. Further improvements can be expected when running a data mining algorithm on the queries and when filtering private information from its output.

And guess what, we won the Best Application Award at CCAI and we couldn’t be more excited! We view this award as an important recognition from AI thought leaders that validates our approach. I would like to sincerely thank all of my collaborators for their work on this project, and the Conference for awarding us such an honour!

The conference was in Montreal from May 6th to 9th and is a gathering of world leaders in the development of artificial intelligence (AI) and machine learning technologies and research. Of the 86 papers they received from around the world, our paper beat out 21 other finalists vying for two prizes for original work in Theoretical and Applied AI. The prize is in the Applied category.

I’m very happy and proud about this, for many reasons. It’s a local conference and it’s the first time we’ve submitted. I feel it’s an independent validation of our work by world-renowned experts, that even if we know we’re right, other people confirm it as well. This is not the same as having a company auditing our code, it’s AI thought leaders and experts looking at our ideas and saying they are sound.

Co-authored by Ary Fagundes Bressane Neto, Philippe Desaulniers, Alexis Smirnov, and yours truly, the paper describes how we analyzed web searches by tech support agents to determine which technical issues they were researching. Our goal was to obtain valuable tech support knowledge that other tech support agents can leverage for solving technical problems faster. We created an end-to-end process for sifting through huge amounts of data and separating personal searches from business ones while protecting the agents’ privacy. The approach we came up with was clearly appreciated by the industry. I think the judges liked our practical approach to a real-life problem.

The experience of working on this project was truly gratifying. Although we applied our process to analyzing tech support searches, I feel this could be used in many other ways, such as finding out what’s trending in an organization, helping libraries mine queries to determine what knowledge they should purchase, or helping companies figure out where the greatest need for training is. I think it could really help organizations be more responsive to what people are looking for. The possibilities are endless!


Bring your VMWare Infrastructure to the next level using CloudStack

A few weeks ago, back in April 2014, Radialpoint and CloudOps had a chance to present at the CloudStack Collaboration Conference 2014 in Denver. Our presentation was called Success Story: Bring your VMWare Infrastructure to the next level using CloudStack. We talked about how CloudStack helped us transform our VMware-based infrastructure to help development teams move faster. With this transition we also started using SaltStack to create complete environments for distributed systems and Twelve-Factor apps.