Case Studies

IMDB analysis

The Nakkeb team wanted to know which movies on IMDB have a rating above 7 stars and a Metascore above 80.
Searching the database for new movies is not easy: there are many different types of movies, and they do not usually appear together. So we asked our NakkebWeb™ solution to crawl imdb.com. We then fed the crawled data into our automatic data extraction system, which identified many features of each movie, such as title, year, duration, category, rating, Metascore, director, writer, and cast. The engine returned the extracted fields, annotated and identified, in a database, so answering the question came down to a simple database query. We exported the result to an Excel sheet that can be found below. We also included the distribution of those movies by rating, number of reviews, and other interesting parameters.
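As a rough illustration of how simple that final query can be, here is a minimal sketch in Python using the standard sqlite3 module. The table name `movies`, its column names, and the sample rows are assumptions made for the example; they are not the actual NakkebWeb schema or extracted data.

```python
import sqlite3

def high_rated_movies(conn, min_rating=7.0, min_metascore=80):
    """Return (title, year, rating, metascore) rows above both thresholds."""
    cur = conn.execute(
        "SELECT title, year, rating, metascore FROM movies "
        "WHERE rating > ? AND metascore > ? "
        "ORDER BY metascore DESC, rating DESC",
        (min_rating, min_metascore),
    )
    return cur.fetchall()

# Hypothetical schema and rows, for illustration only (not extracted data).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE movies (title TEXT, year INTEGER, rating REAL, metascore INTEGER)"
)
conn.executemany(
    "INSERT INTO movies VALUES (?, ?, ?, ?)",
    [
        ("Movie A", 2001, 8.1, 90),
        ("Movie B", 2005, 7.5, 70),  # Metascore too low
        ("Movie C", 2010, 6.9, 85),  # rating too low
        ("Movie D", 2012, 7.2, 81),
    ],
)
rows = high_rated_movies(conn)
```

The thresholds are passed as query parameters, so the same function can answer variations of the question, such as a rating above 8 and a Metascore above 90.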

GooglePlay analysis

The Nakkeb team wanted to estimate the income that Google generates from its Google Play app store. Of course, the goal is not to discover any confidential numbers at Google but rather to see how large and profitable the Android market is. We also wanted to answer other questions: Is it really profitable to enter the business? How much does an average game generate? What is the median number of downloads if the price of the game is USD 1, 2, or 5? What is the variance in each of these cases?
There are over 500k applications on the app store; some are free and some are paid. For each application, Google shows the number of downloads as a range (e.g. 5,000-10,000 or 10m-50m). We used NakkebWeb to gather information from play.google.com. The engine was able to crawl, extract, tag, and identify important information, and it easily identified the major fields such as title, description, screenshots, price, and number of downloads. NakkebWeb then exported this information to a database. We then wrote a small script to apply equations to the resulting data. Here are some of the results.
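A script of this kind might look like the sketch below. It is not the script we ran. It assumes that each download range is reduced to its midpoint as a point estimate, and the function names and sample rows are hypothetical. It uses the population variance so that a price with a single application still yields a value.

```python
import statistics
from collections import defaultdict

_SUFFIXES = {"k": 1_000, "m": 1_000_000}

def parse_count(text):
    """Convert '5000', '10,000', or '10m' to an integer."""
    text = text.strip().lower().replace(",", "")
    if text and text[-1] in _SUFFIXES:
        return int(float(text[:-1]) * _SUFFIXES[text[-1]])
    return int(text)

def range_midpoint(download_range):
    """Midpoint of a range such as '5000-10,000' (an assumed point estimate)."""
    low, high = download_range.split("-")
    return (parse_count(low) + parse_count(high)) / 2

def downloads_by_price(apps):
    """Group midpoint estimates by price; return {price: (median, variance)}."""
    groups = defaultdict(list)
    for price, download_range in apps:
        groups[price].append(range_midpoint(download_range))
    return {
        price: (statistics.median(vals), statistics.pvariance(vals))
        for price, vals in groups.items()
    }

# Hypothetical (price in USD, download range) rows, for illustration only.
apps = [
    (1.0, "5000-10,000"),
    (1.0, "10,000-50,000"),
    (1.0, "1000-5000"),
    (2.0, "10m-50m"),
]
stats = downloads_by_price(apps)
```

Because the ranges are wide, the midpoint is only a coarse estimate; using the lower bound instead gives a more conservative figure.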

iTunes analysis

The Nakkeb team wanted to know the distribution of applications on the App Store by price and category.
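A distribution like this can be computed as a simple cross-tabulation. The sketch below is only an illustration: the price buckets, category names, and sample rows are assumptions, not values taken from the extracted data.

```python
from collections import Counter

def price_bucket(price):
    """Map a price in USD to a coarse, hypothetical bucket label."""
    if price == 0:
        return "free"
    if price < 2:
        return "0.01-1.99"
    if price < 5:
        return "2.00-4.99"
    return "5.00+"

def distribution(apps):
    """Count applications per (category, price bucket) pair."""
    return Counter((category, price_bucket(price)) for category, price in apps)

# Hypothetical (category, price in USD) rows, for illustration only.
apps = [
    ("Games", 0.0),
    ("Games", 0.99),
    ("Games", 0.99),
    ("Productivity", 4.99),
    ("Productivity", 9.99),
]
counts = distribution(apps)
```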

CrunchBase analysis

We wanted to study which startups get funding. What are the factors that affect this funding? Is it the competitors, the sector, or the founders?
To find out, we analyzed data from CrunchBase. We again used our NakkebWeb™ solution and asked it to crawl crunchbase.com. We then fed the crawled data into our automatic data extraction system, which identified many features in the web pages automatically. The engine returned the extracted fields, annotated and identified, in a database. We then studied the correlation between the industry, important keywords and concepts, the amount of funding, the date of funding, and the period over which the startup raised it.
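For numeric fields, one common way to measure such a relationship is the Pearson correlation coefficient. The sketch below is a generic illustration and not necessarily the method used in our study; the variable names and sample values are hypothetical. A categorical field such as industry would first have to be encoded numerically, for example as a 0/1 indicator per sector.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical values, for illustration only (not CrunchBase data).
funding_period_months = [6, 12, 18, 24]
funding_musd = [1.0, 2.0, 3.0, 4.0]
r = pearson(funding_period_months, funding_musd)
```

A value near 1 or -1 indicates a strong linear relationship and a value near 0 indicates none, but correlation alone does not show which factor causes the funding.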

LastFM analysis

The music industry is one of the biggest. Which bands generate more sales than others? Is there a secret to discover?
Our goal was to study the information available for each artist and their songs, and to analyse whether there is a correlation between a song's type, its lyrics, and its success rate. Are famous artists shifting toward a specific genre because it generates good income? We used the NakkebWeb™ solution to crawl and extract information from last.fm. NakkebWeb™ extracted all the information from these pages and stored it in a database.