The customer wanted to create an online music library: a mashup of the VK.com social network and LastFM. The most obvious way to organize search was to build a semantic database of artists, albums, tracks, and the connections between them. Querying that database would let us identify the right track first and then search VK for the matching mp3 file. To keep the semantic data fresh, we would have had to configure replication between our server and the MusicBrainz service or a similar one. That was the proper way to do it, but after estimating the size of the semantic database (around 100 GB) and the replication problems, we settled on a simpler solution.

The quality of the VK search API is questionable, and the VK database is not semantic. It is just a list of tracks uploaded by users, with lots of duplicates and misprints. Still, we decided to take this "dirty" output and filter it through the LastFM API. We also removed duplicates and fixed typos with our own algorithm. If we could not find a track's author in LastFM, we listed the track under the "Other" category. This gave us an almost semantic grouping without maintaining our own semantic database. To minimize the number of requests and improve performance, we implemented several cache levels: one for popular requests and one for artist data.
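The filtering step described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: `KNOWN_ARTISTS` is a hypothetical in-memory stand-in for a LastFM artist lookup, `normalize` only lowercases and strips punctuation (the real typo-fixing algorithm is not described here), and `lru_cache` plays the role of the artist-data cache.

```python
import re
from functools import lru_cache
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical stand-in for a LastFM artist lookup (assumption: a real
# version would call the external API instead of reading a dict).
KNOWN_ARTISTS = {"radiohead": "Radiohead", "daft punk": "Daft Punk"}


def normalize(text: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


@lru_cache(maxsize=10_000)
def canonical_artist(raw_name: str) -> Optional[str]:
    """Cached lookup: repeated requests for the same name skip the lookup."""
    return KNOWN_ARTISTS.get(normalize(raw_name))


def group_tracks(raw_tracks: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Drop duplicate (artist, title) pairs and group titles by artist.

    Tracks whose artist is not recognized go under "Other".
    """
    groups: Dict[str, List[str]] = {}
    seen = set()
    for artist, title in raw_tracks:
        canonical = canonical_artist(artist) or "Other"
        key = (canonical, normalize(title))
        if key in seen:
            continue  # same track uploaded again under a slightly different spelling
        seen.add(key)
        groups.setdefault(canonical, []).append(title.strip())
    return groups
```

For example, `group_tracks([("Radiohead", "Creep"), ("radiohead ", "creep!")])` keeps a single "Creep" entry under "Radiohead", because both uploads normalize to the same key.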

 

[Figure: with a semantic database]

[Figure: the schema we implemented]

As for the UI design, the whole website is a music player. The current playlist, search results, and user playlists are all displayed in the same manner to keep the UI clean and intuitive.

 

[Figure: search results]

[Figure: create a new playlist]

[Figure: loaded playlist]

We built an MVP and deployed the website. As traffic increased, we found out that the VK API wasn't a good fit: it had undocumented flaws, and its actual limits were much lower than announced. The modular architecture that we always use was a great help; it allowed us to rewrite the search module without touching the rest of the code. We replaced the API calls with HTML parsing of the VK search page. This preserved the whole initial schema and kept the service completely functional.
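To illustrate why the swap was cheap, here is a hedged sketch of a search module hidden behind one small interface. All names (`SearchBackend`, `ApiSearch`, `HtmlSearch`, `find_tracks`) are hypothetical, and the `<div class="track" ...>` markup is invented for the example; it is not VK's actual page structure.

```python
from dataclasses import dataclass
from html.parser import HTMLParser
from typing import Callable, Dict, List, Protocol


@dataclass(frozen=True)
class Track:
    artist: str
    title: str


class SearchBackend(Protocol):
    """The only contract the rest of the application depends on."""

    def search(self, query: str) -> List[Track]: ...


class ApiSearch:
    """Backend that would wrap an API client; here it filters canned records."""

    def __init__(self, records: List[Dict[str, str]]) -> None:
        self._records = records

    def search(self, query: str) -> List[Track]:
        q = query.lower()
        return [
            Track(r["artist"], r["title"])
            for r in self._records
            if q in r["artist"].lower() or q in r["title"].lower()
        ]


class _TrackParser(HTMLParser):
    """Collects tracks from the invented <div class="track"> elements."""

    def __init__(self) -> None:
        super().__init__()
        self.tracks: List[Track] = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "div" and attr.get("class") == "track":
            self.tracks.append(
                Track(attr.get("data-artist") or "", attr.get("data-title") or "")
            )


class HtmlSearch:
    """Backend that parses a search results page instead of calling an API."""

    def __init__(self, fetch_page: Callable[[str], str]) -> None:
        self._fetch_page = fetch_page  # hypothetical: maps a query to page HTML

    def search(self, query: str) -> List[Track]:
        parser = _TrackParser()
        parser.feed(self._fetch_page(query))
        return parser.tracks


def find_tracks(backend: SearchBackend, query: str) -> List[Track]:
    """Caller code: unchanged no matter which backend is plugged in."""
    return backend.search(query)
```

Because `find_tracks` only relies on the `search` method, replacing `ApiSearch` with `HtmlSearch` is a one-line change at the place where the backend is constructed.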

So we managed to create a service built on top of public APIs, with caching to improve performance. It was crucial to write several prototypes to choose the optimal solution. The modular architecture approach helped us a lot: when the external service failed us, we only had to rewrite the faulty module.