Google Summer of Code

Anže Starič edited this page Feb 9, 2017 · 8 revisions

Jump straight to ideas below.

Orange - Data mining fruitful & fun!

About Orange

Orange is an open-source, cross-platform, component-based data mining and machine learning software suite which features friendly yet powerful and flexible visual programming front-end for exploratory data analysis, visualization, model construction, evaluation, and forecast. It includes a comprehensive set of components (we call them widgets) for data preprocessing, feature scoring and filtering, modeling, model evaluation and exploration. It is maintained and developed at the Bioinformatics Laboratory of the Faculty of Computer and Information Science, University of Ljubljana, Slovenia.

Website: http://orange.biolab.si/
Wikipedia: https://en.wikipedia.org/wiki/Orange_(software)

Orange workflow screenshot

Ideal student candidate

Since Orange is mostly written in Python 3, the ideal student will have strong skills in idiomatic Python 🐍 and NumPy, with at least two years of experience. They would be comfortable reading technical and scientific articles. 🎓 They would also possess some knowledge of Git and GitHub because this is how we roll. Some understanding of how GUI widget toolkits (Qt in particular) behave and work is a strong plus.

The best candidates will understand the foundations of the Unix Philosophy and will chant its koans regularly before bedtime. The best candidates use a POSIX system because everyone knows you can't develop on Windows.

The Google Summer of Code selection process is quite competitive. Accepted students typically have thoroughly researched the technologies of their proposed project and have been in frequent contact with potential mentors.

How to start

Here's the recipe to win:

If you're set to win with Orange:

If you intend to come visit us in person, we share lunch (pizzas or Indian food) on Fridays!

GSoC application resources

Contact

Open-source development includes open and transparent communication. To get in contact with us, please use one of the preferred means of communication:

  • gitter chat to say hi and to discuss your GSoC application, proposed ideas, implementation details, and so on,
  • Orange issue tracker for queries that look like issues (bugs, legitimate feature requests),
  • pull requests section for issues that include patches 👍 that fix them.

Project ideas

Listed in no particular order, sometimes vague and incomplete, are some ideas for projects that might be interesting to carry out during this year's Google Summer of Code program.

If you'd like to discuss a particular implementation of any of these ideas, or if you have questions regarding your own idea, please join us on gitter chat and don't be shy about asking questions there. 😃

To be clear, your own ideas that complement Orange are most welcome!

ODBC SQL backend

Orange can currently connect to PostgreSQL and MSSQL databases. Support for alternative databases could be provided via a well-maintained third-party Python library that supports connections to ODBC data sources. Besides a working backend, the end result should include automated tests that run on Travis CI and/or AppVeyor, documentation of the feature, and an installation guide. Any external packages used should be installable on all supported platforms (Windows, macOS, Linux) via pip and conda.

Intensity: moderate
Extent: moderate
Involves: Python, SQL, ODBC
Mentors: @astaric
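
As a rough illustration of the ODBC backend idea above, the sketch below assembles an ODBC connection string and hands it to a third-party library. It assumes pyodbc is the library chosen (the idea itself does not name one); the helper names, the driver name, and the credentials are hypothetical and exist only for this example.

```python
def quote_odbc_value(value):
    """Wrap a value in braces when it contains characters that are
    special in ODBC connection strings; a closing brace is doubled."""
    text = str(value)
    if any(ch in text for ch in ";{}") or text != text.strip():
        return "{" + text.replace("}", "}}") + "}"
    return text


def build_connection_string(driver, server, database, user=None, password=None):
    """Assemble a 'KEY=value;KEY=value' ODBC connection string."""
    # The driver name is conventionally always written in braces.
    parts = ["DRIVER={" + driver + "}"]
    parts.append("SERVER=" + quote_odbc_value(server))
    parts.append("DATABASE=" + quote_odbc_value(database))
    if user is not None:
        parts.append("UID=" + quote_odbc_value(user))
    if password is not None:
        parts.append("PWD=" + quote_odbc_value(password))
    return ";".join(parts)


def open_connection(connection_string):
    """Open a connection, assuming pyodbc is installed (an assumption,
    not a decision the project has made)."""
    import pyodbc  # imported lazily so the helpers above work without it
    return pyodbc.connect(connection_string)


# Example values only; the driver must match one installed on the machine.
example = build_connection_string(
    "MySQL ODBC 8.0 Unicode Driver", "localhost", "iris", "orange", "s3;cret"
)
```

Keeping string assembly separate from the actual connection makes the first part easy to cover with automated tests on machines where no ODBC driver is installed.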

Improved Feature Constructor

Constructing new features is a crucial task in data mining. There is a widget for this in Orange (Feature Constructor), but it has some unnecessary limitations and, most of all, is not very intuitive or easy to use. The need for better solutions has already led to the introduction of the Create Class widget, which is limited to a more specific case but handles it much better. However, a good general Feature Constructor is still needed.

The widget should be redone / extensively improved, with a focus on:

  • the best user experience possible
  • high efficiency and effectiveness
  • compromises between ease of use and advanced features (somehow hidden at first?)
  • very good documentation and in-widget help (probably with some examples etc)

We would like to see a good first proposal with ideas and suggestions, but expect a lot of coordination about design decisions with other developers after that.

Intensity: moderate
Extent: limited
Involves: Python, Qt
Mentors: @janezd, @lanzagar
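
To make the feature construction task above more concrete, here is a minimal sketch of computing a new feature from an arithmetic expression over named columns. It is not how the existing Feature Constructor widget is implemented; the function names and the row-as-dictionary representation are assumptions made for this example, and a real widget would work on Orange tables instead.

```python
import ast
import operator

# Whitelisted operators; anything else in the expression is rejected.
_BINARY = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
}
_UNARY = {ast.USub: operator.neg, ast.UAdd: operator.pos}


def _evaluate(node, row):
    """Recursively evaluate a parsed expression against one data row."""
    if isinstance(node, ast.Expression):
        return _evaluate(node.body, row)
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    if isinstance(node, ast.Name):
        if node.id not in row:
            raise ValueError(f"unknown feature: {node.id}")
        return row[node.id]
    if isinstance(node, ast.BinOp) and type(node.op) in _BINARY:
        left = _evaluate(node.left, row)
        right = _evaluate(node.right, row)
        return _BINARY[type(node.op)](left, right)
    if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY:
        return _UNARY[type(node.op)](_evaluate(node.operand, row))
    raise ValueError(f"unsupported expression element: {type(node).__name__}")


def construct_feature(expression, rows):
    """Return the new feature's value for every row (a dict per row)."""
    tree = ast.parse(expression, mode="eval")
    return [_evaluate(tree, row) for row in rows]


rows = [{"width": 2.0, "length": 3.0}, {"width": 1.5, "length": 4.0}]
areas = construct_feature("width * length", rows)
```

Walking the syntax tree with a whitelist, rather than calling eval on user input, is one way to keep the advanced (free-form expression) mode from executing arbitrary code.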

Orange Add-on: Statistics

To make Orange a complete data analysis package, we wish to introduce simple statistics widgets. The new add-on would include t-tests (one-sample, independent, and paired-samples t-tests, plus their Bayesian counterparts), ANOVA (plus Bayesian ANOVA), Pearson's r, correlations, normalization, etc. We also wish to extend the Box Plot widget to output basic statistics (mean, median, variance, confidence intervals, standard deviation) to a Data Table.

Intensity: moderate
Extent: extensive
Involves: Python, Qt
Mentors: @janezd, @ajdapretnar, @kernc
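
The basic statistics mentioned above for the Box Plot output can be sketched with the Python standard library alone. The function names are hypothetical, the confidence interval uses a normal approximation (an assumption; a t-distribution would be more appropriate for small samples), and the t statistic shown is Welch's variant of the independent two-sample test.

```python
import math
import statistics


def basic_statistics(values, confidence=0.95):
    """Mean, median, sample variance, standard deviation, and a
    normal-approximation confidence interval for the mean."""
    n = len(values)
    mean = statistics.mean(values)
    stdev = statistics.stdev(values)  # sample standard deviation (n - 1)
    # Two-sided critical value from the standard normal distribution.
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * stdev / math.sqrt(n)
    return {
        "mean": mean,
        "median": statistics.median(values),
        "variance": statistics.variance(values),
        "stdev": stdev,
        "ci": (mean - half_width, mean + half_width),
    }


def welch_t_statistic(a, b):
    """t statistic of Welch's independent two-sample test
    (does not assume equal variances)."""
    standard_error = math.sqrt(
        statistics.variance(a) / len(a) + statistics.variance(b) / len(b)
    )
    return (statistics.mean(a) - statistics.mean(b)) / standard_error


summary = basic_statistics([2, 4, 4, 4, 5, 5, 7, 9])
```

An actual add-on would more likely build on NumPy and SciPy, which Orange already depends on; the standard library is used here only to keep the sketch self-contained.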

Porting Orange 2 widgets to Orange 3

Orange has had a long history. There was a period when most of the Orange core was custom C++ code, with glue code and widgets in Python 2 and PyQwt. It eventually proved hard to manage, so we migrated to the current basic stack: Python 3, PyQtGraph, NumPy.

We haven't yet managed to port all the widgets (there's so much to do!), and we would really like to see the following Orange 2 widgets ported to Orange 3: Interaction Graph, SOM (self-organizing maps) with viewer, Reliability (classifier reliability estimation), Ensemble, and multiple visualization widgets.

Any of those widgets can be a proposal itself!

Intensity: hard
Extent: extensive
Involves: Orange 2, Cython
Mentors: @janezd, @BlazZupan

Orange package for Debian/Ubuntu GNU/Linux

GNU/Linux is one of our three target platforms. While the inferior platforms enjoy prebuilt executable packages, GNU/Linux users are left to themselves. This is all fine and dandy as GNU/Linux users usually find their way around pretty easily, but it would still be convenient if users could more simply just dpkg that deb or apt-get orange onto their 'buntu boxen.

The only way to ever become a Debian maintainer and positively affect millions is to start.

Intensity: easy
Extent: limited
Involves: git-buildpackage, Open Build Service
Mentors: @kernc

Export OWS to Python code

Imagine analyzing some data in Orange. You build a model that you are satisfied with, but you just want to make some final adjustments to your model in code. Unfortunately, there is currently no way to export the widgets' functionality and the built OWS workflow scheme into the underlying Python code (think of the --to script option of nbconvert for IPython/Jupyter notebooks).

It'd be useful if we could transparently export (linearize) Orange workflow schemes into raw Python code that one could further edit and run as a Python script.

Note, the mentors have no idea how this could be done. They just wish it was.

Intensity: hard
Extent: broad
Involves: Python
Mentors: @kernc
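
One possible starting point for the linearization described above, offered only as a sketch: treat the workflow as a directed graph of widgets and links, order it topologically, and emit one line of code per widget. The in-memory representation (a dictionary of node ids and a list of links), the `out_` variable prefix, and the emitted function names are all hypothetical; this does not read real OWS files. It relies on `graphlib`, available in the standard library from Python 3.9.

```python
from graphlib import TopologicalSorter


def linearize(nodes, links):
    """Turn a workflow graph into a linear script.

    nodes: dict mapping a node id to the name of the function that
           would stand in for that widget (hypothetical names).
    links: list of (source_id, sink_id) pairs.
    """
    # Collect, for every node, the nodes whose output it consumes.
    predecessors = {node_id: [] for node_id in nodes}
    for source, sink in links:
        predecessors[sink].append(source)

    # static_order() yields each node after all of its predecessors
    # and raises graphlib.CycleError if the graph contains a cycle.
    order = TopologicalSorter(predecessors).static_order()

    lines = []
    for node_id in order:
        arguments = ", ".join(f"out_{p}" for p in predecessors[node_id])
        lines.append(f"out_{node_id} = {nodes[node_id]}({arguments})")
    return "\n".join(lines)


nodes = {"file": "load_table", "impute": "impute_missing", "tree": "fit_tree"}
links = [("file", "impute"), ("impute", "tree")]
script = linearize(nodes, links)
```

For the three-widget chain above, the result is three assignments in dependency order. The hard part of the project, mapping each widget and its settings to equivalent scripting code, is left entirely open by this sketch.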