Sunday, November 17, 2024

Getting started with Julia – a high level, high performance language for computing

Learning new tools and techniques in data science is a bit like running on a treadmill: you have to keep running to stay on top of it. The minute you stop, you start falling behind.

As part of this learning, I keep an eye out for developments in new tools and techniques. That is how I came across Julia about a year ago. It was in its very early stages then, and it still is!

But there is something special about Julia, which makes it a compelling tool to learn for all future data scientists. So I decided to write a few articles on it. This is the first of these articles. It covers the motivation to learn Julia, its installation, the packages currently available and ways to become part of the Julia community.

 

What is Julia?

Julia is a high-level, high-performance dynamic programming language for technical computing, with an easy-to-write syntax. It provides a sophisticated compiler, distributed parallel execution, numerical accuracy, and an extensive mathematical function library.

 

Why another programming language?

The simplest way to understand its power is to think of it as a language that has a wide range of statistical packages like R, is as easy to write and learn as Python, and has execution speed similar to C / C++. If you are still not convinced, have a look at the results of a few common benchmarks below:
[Figure: Julia benchmark chart]

C compiled by gcc 4.8.2, taking best timing from all optimization levels (-O0 through -O3). C, Fortran and Julia use OpenBLAS v0.2.12. The Python implementations of rand_mat_stat and rand_mat_mul use NumPy (v1.8.2) functions; the rest are pure Python implementations.
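
To give a feel for the "easy to write" part of the claim, here is a minimal sketch of a small numeric kernel in Julia, of the kind that micro-benchmark suites typically time. The function name `fib` and the input value 20 are my own choices for illustration and are not read off the chart above. The timing you see will depend on your machine and Julia version.

```julia
# Recursive Fibonacci: deliberately naive, so the cost is dominated by function calls.
fib(n) = n < 2 ? n : fib(n - 1) + fib(n - 2)

# @time is a built-in macro that reports elapsed time and memory allocations.
result = @time fib(20)
println(result)  # 6765
```

The first call includes compilation time, so run the `@time` line twice if you want a fairer measurement.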

 

A Summary of Features in Julia

Some of the important features to highlight from a data science perspective are:

A more comprehensive list of features can be accessed here

 

Installation of Julia

Now that you might be raring to give Julia a try for all the promises made above, let me quickly walk through various options to test drive your new sedan (which has sports car like acceleration):

  • Option 1: Try Juliabox in browser – The simplest option, with no setup required. Just go to Juliabox, sign in using Google (sorry if you don’t have a Google account; try the next option) and your instance is ready to fire.

  • Option 2 – Use an IDE – Juno seems to be the best IDE available right now. Sadly, JuliaStudio is no longer supported. The best way to install Juno is to download the combo package from the Julia site itself.
  • Option 3 – Using Command line – If you are a hardcore programmer who can’t think of a programming language without a command line, don’t worry! There is an option for you as well. You can download the package here.
  • Option 4 – Using IJulia notebooks – If you are a Python explorer and have used IPython for your interactive data exploration, here is some awesome news. IJulia notebooks are equally awesome and offer a similar interface. To install IJulia, you need to install IPython first, then install Julia 0.3 or later. Next, start Julia, add the package “IJulia” and start using it. You can find more details here.

The installation was pretty simple and straightforward. I have tried Juliabox as well as Juno. Options 1 and 2 come with a few demo examples beforehand. You can just follow the comments (starting with #) to understand the code and give it a test run.

 

A few important packages

There are a total of 610 packages for Julia as of today (9th July 2015). If you filter out packages for which tests have failed or which have not been tested, you are left with only 381 packages. From these, I have kept the ones that are related to data science and have more than 15 stars. That leaves us with the following packages:

Package | Description | Version | Stars
BackpropNeuralNet | A neural network in Julia | 0.0.3 | 18
Bokeh | Bokeh Bindings for Julia | 0.1.0 | 26
Boltzmann | Restricted Boltzmann Machines in Julia | 0.1.0 | 19
Calculus | Calculus functions in Julia | 0.1.8 | 46
Clustering | A Julia package for data clustering | 0.4.0 | 33
Convex | A julia package for disciplined convex programming. | 0.0.6 | 108
Cpp | Utilities for calling C++ from Julia | 0.1.0 | 18
DataArrays | Data structures that allow missing values | 0.2.16 | 21
DataFrames | library for working with tabular data in Julia | 0.6.7 | 206
DataFramesMeta | Metaprogramming tools for DataFrames | 0.0.1 | 33
DataStructures | Julia implementation of Data structures | 0.3.10 | 52
DecisionTree | Decision Tree Classifier and Regressor | 0.3.8 | 36
Distances | A package for evaluating distances(metrics) between vectors. | 0.2.0 | 21
Distributions | A package for probability distributions & associated functions. | 0.7.4 | 101
DSP | Filter design, periodograms, window functions, and other digital signal processing functionality | 0.0.8 | 32
FunctionalCollections | Functional and persistent data structures for Julia | 0.1.2 | 34
Gadfly | Crafty statistical graphics for Julia. | 0.3.13 | 684
GeneticAlgorithms | A lightweight framework for writing genetic algorithms in Julia | 0.0.3 | 86
GLM | Generalized linear models in Julia | 0.4.6 | 78
GLMNet | Wrapper for fitting Lasso/ElasticNet GLM models using glmnet | 0.0.4 | 23
Graphs | Working with graphs in Julia | 0.5.5 | 90
HDF5 | Saving and loading Julia variables | 0.4.18 | 65
HypothesisTests | Hypothesis tests for Julia | 0.2.9 | 16
Images | An image library for Julia | 0.4.39 | 73
JuMP | Modeling language for Mathematical Programming (linear, mixed-integer, conic, nonlinear) | 0.9.2 | 162
MachineLearning | Julia Machine Learning library | 0.0.3 | 37
Mamba | Markov chain Monte Carlo (MCMC) for Bayesian analysis in julia | 0.4.11 | 44
Markdown | Markdown parsing for Julia | 0.3.0 | 21
Match | Advanced Pattern Matching for Julia | 0.1.3 | 29
MixedModels | A Julia package for fitting (statistical) mixed-effects models | 0.3.22 | 41
MLBase | A set of functions to support the development of machine learning algorithms | 0.5.1 | 41
Mocha | Deep Learning framework for Julia | 0.0.8 | 297
MultivariateStats | A Julia package for multivariate statistics & data analysis (e.g. dimension reduction) | 0.2.1 | 21
NLopt | Package to call the NLopt nonlinear-optimization library from the Julia language | 0.2.1 | 31
OpenStreetMap | Julia OpenStreetMap Package | 0.8.1 | 20
Optim | Optimization functions for Julia | 0.4.2 | 116
Orchestra | Heterogeneous ensemble learning for Julia. | 0.0.5 | 27
PGM | A Julia framework for probabilistic graphical models. | 0.0.1 | 25
PyCall | Package to call Python functions from the Julia language | 0.8.1 | 183
RCall | Embedded R within Julia | 0.2.1 | 16
RDatasets | Julia package for loading many of the data sets available in R | 0.1.2 | 34
Regression | Algorithms for regression (e.g. linear / logistic regression) | 0.3.2 | 17
Rif | Julia-to-R interface | 0.0.12 | 47
StatsBase | Basic statistics for Julia | 0.6.15 | 57
StreamStats | Compute statistics over data streams in pure Julia | 0.0.2 | 27
TimeSeries | Time series toolkit for Julia | 0.5.10 | 37

P.S. There is a lot of development happening on the language and the libraries. So this can change very quickly.

 

A few things to note:

  • Gadfly looks to be the most popular package. This might well be because it is used as a showcase library across all the products in the ecosystem.
  • The core data science libraries look more evolved than some of the other libraries. Mocha for deep learning, Orchestra for ensemble learning, DataFrames and Distributions are all at comparatively more evolved versions.

 

How to install & use a package?

Installing and using a package in Julia is dead simple. If you want to install / add a package, simply type this in your programming interface:

Pkg.add("Gadfly")

This will install the package as well as its dependencies.

 

Once the package is installed, you can load it simply by calling “using”:

using Gadfly

Simple!
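
The two commands above were written for Julia 0.3. On Julia 1.0 and later, the package manager is a standard library that has to be loaded before it can be called. Here is a minimal sketch of the same steps, assuming a recent Julia release and a working internet connection:

```julia
using Pkg            # on Julia 1.0+, Pkg must be loaded before it can be called

Pkg.add("Gadfly")    # downloads and installs Gadfly plus its dependencies
using Gadfly         # brings Gadfly's exported names into scope
```

At the interactive prompt you can also press `]` to switch to package mode and type `add Gadfly`, which does the same thing.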

 

The Julia ecosystem:

Julia is supported by a close-knit community of developers. Here are a few mailing lists you can join:

  • julia-news – for important announcements, such as new releases.
  • julia-users – discussion around the usage of Julia. New users of Julia can ask their questions here.
  • julia-stats – special purpose mailing list for discussions related to statistical programming with Julia. Topics of interest include DataFrame support, GLM modeling, and automatic generation of MCMC code for Bayesian models.
  • julia-opt – discussions related to numerical optimization in Julia. This includes Mathematical Programming (linear, mixed-integer, conic, semi-definite, etc.), constrained and unconstrained gradient-based and gradient-free optimization, and related topics.

In addition to these mailing lists, you can also look at juliabloggers.com. The site is still at an early stage as of now, though.

 

End Notes

I hope that you now have a good overview of this powerful language under development. I was pretty excited when I first saw it, and I continue to follow its development closely. In the articles to come, we will look at the data structures available in Julia and its interface with other languages such as Python, and solve a case study using Julia to understand its power.

What do you think of Julia? Are you all set to give it a try? Does the future excite you? Do let us know your thoughts through comments below.

Kunal Jain

12 Jul 2020

Kunal is a postgraduate from IIT Bombay in Aerospace Engineering. He has spent more than 10 years in the field of Data Science. His work experience ranges from mature markets like the UK to a developing market like India. During this period he has led teams of various sizes and has worked with various tools like SAS, SPSS, Qlikview, R, Python and Matlab.
