[Short general description]: Most people remain unaware of the extensive struggle that a small section of the AI community faces in making AI easy and accessible for all. This stems from the realisation that AI is here and will become part of our lives in ways we may not yet fathom. AI companies, and companies seeking to implement AI into their systems, strive to find new ways to improve life with AI, yet find themselves hamstrung when it comes to fully exploring their ideas. Raven aims to help such individuals and companies exploit the potential of AI economically.
[Main problems tackled]: Training AI/ML models can take weeks or even months when run on basic computers with limited capacity, and the cost of acquiring better compute chips (GPUs) does not make the process any easier. The intensive, frequent use of fast compute resources to calculate and update the gradients of the many neurons in a deep neural network, inferred from the training data, usually costs more than small- and medium-scale developers and companies can afford. Cloud computing helps to an extent, but the cost of acquiring resources through it is still unaffordable for AI development tasks: the usual spend ranges from $2.50 to $17 USD per hour on any given cloud platform.
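To put those hourly rates in perspective, a quick back-of-the-envelope calculation shows how the bill grows; the four-week training duration below is a hypothetical assumption chosen for illustration, not a figure from Raven.

```python
# Illustrative cloud GPU cost estimate using the hourly rates quoted above.
# The four-week training duration is a hypothetical assumption.
low_rate, high_rate = 2.50, 17.00   # USD per hour, as quoted above
hours = 24 * 7 * 4                  # a hypothetical four-week training run

print(f"Low estimate:  ${low_rate * hours:,.2f}")   # $1,680.00
print(f"High estimate: ${high_rate * hours:,.2f}")  # $11,424.00
```

Even at the lowest rate, a month-long run costs well over a thousand dollars, which is the affordability gap the protocol targets.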
The simple solution to this inaccessibility now lies in crowdsourcing. Crowdsourcing has long been disrupting existing markets, as the Davids of the world (think Uber and Airbnb) undercut various Goliaths and make those services cheaper and more viable. The world of AI has been witnessing this too: from crowdsourcing development on Kaggle to gathering data through Ocean Protocol, the AI ecosystem is welcoming these new approaches. Raven aims to carry the torch further by building one of the first truly decentralized and distributed deep learning training systems, utilizing idle compute resources to train deep learning models economically.
[Main contribution proposal]: The theoretical constraints in decentralised and distributed training of Deep Neural Networks (DNNs) lie in how a DNN is either trained centrally on a single node and then fetched by different servers for their applications, or split among several servers to be trained. Needless to say, the computation capacity needed for such training is huge, limiting it to powerful GPUs and servers. Raven approaches this problem by dynamically allocating work to the participating devices in the network, eliminating any added dependencies on the host nodes and significantly reducing the compute power required in-house.
Where Raven Protocol differs from similar systems is in how it tackles the latency arising from asynchronous updates and the parallelization of data shards. This latency, for which there was no previous solution, means that training a model on those platforms can consume weeks to months, irrespective of how much computational power is available. Even where parallelization is achieved, it remains confined to users whose systems can handle enormous loads, which shuts small-scale users out of the platform.
Raven is able to build a dynamic graph for the vast number of small synchronous calculations required to train a model.
The cost of acquiring powerful CPUs and GPUs to train DNNs becomes minimal through Raven Protocol, which pools the idle compute power of individual contributors' devices. Sharing idle computing power to facilitate training saves the enormous expense involved; in return, the contributors are compensated with Raven tokens (RAV).
[Innovation]: Traditional methods of distributed Deep Learning training involve Data and Model Parallelism, which only partially meet the demand for compute.
- Data Parallelism - Data Parallelism is used in distributed training when the data cannot fit on a single machine, and also to achieve faster training. The data is cut into smaller chunks to be consumed by different machines (ref. Fig 1.0), and the Model is replicated on each of those machines. The Model replicas are trained individually on their Data Shards (Bn), and the resulting weights are collated at the Parameter (Master) Server to obtain the final model. In practice this method introduces considerable latency into the overall execution.
- Model Parallelism - Researchers came up with another method to overcome the limitations of data parallelism: splitting the model architecture across machines on a network. In Model Parallelism, the dataset is kept in one system or storage location accessible to all machines, each of which holds a split of the architecture ready to be trained. But even with this method, every system participating in the training needs sophisticated compute resources such as advanced GPUs. It therefore has its own scalability limitation, which becomes a bottleneck and creates latency in the network.
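To make the data-parallel scheme above concrete, here is a minimal single-process sketch using a toy one-weight linear model; all names are illustrative and this is not any framework's actual API. Each "worker" computes a gradient on its own data shard against an identical model replica, and the parameter server averages those gradients into one update.

```python
# Toy data parallelism: the model (a single weight w) is replicated on
# every worker, each worker computes a gradient on its own data shard,
# and the parameter server averages those gradients into one update.

def shard_gradient(w, shard):
    """Mean gradient of the squared error for y = w * x over one shard."""
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)

data = [(x, 3.0 * x) for x in range(1, 9)]   # target relationship: y = 3 * x
shards = [data[i::4] for i in range(4)]      # data shards B1..B4, one per worker

w = 0.0                                      # identical model replica everywhere
for step in range(200):
    grads = [shard_gradient(w, s) for s in shards]  # one gradient per worker
    w -= 0.01 * sum(grads) / len(grads)             # parameter-server update

print(round(w, 3))  # converges to 3.0
```

The averaging step is exactly where the latency criticised above appears in real deployments: every update must wait for the slowest worker's gradient to arrive at the parameter server.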
Raven Protocol performs Distributed Training of Deep Learning Models using a shared network of compute power within a blockchain environment. The lack of an economical supply of ample compute power for individuals and businesses performing resource-intensive DL training brought forth this concept of gathering compute resources from the willing public.
Raven combines the Data and Model Parallelisation approaches to form a different model of distribution. This is, in essence, crowdsourcing of compute power from sources as modest as the smartphone in your pocket or the PC on your desk. Being built on a blockchain provides additional security and anonymity while distributing the training across multiple devices over the Internet. It also brings new revenue opportunities to the contributors and partners growing the ecosystem, in the form of a constant source of income from such DL trainings.
Dynamic Graph Computation
Deep learning frameworks operate on tensors and are built on a computational graph structured as a Directed Acyclic Graph. In most current and popular frameworks, including TensorFlow (before Eager Execution), the computational graph is static in nature. Frameworks like PyTorch, however, are dynamic, giving researchers and developers far more room for creativity and experimentation.
A major difference between static and dynamic computation graphs is that in the former, the model's structure is fixed in advance and data is substituted into placeholder tensors, whereas in the latter the nodes of the network are executed without any need for placeholder tensors. Dynamic computation holds a distinct advantage in cases like language modelling, where the shapes of the tensors vary over the course of training. The benefit of a dynamic graph is its concurrency: it is robust enough to handle contributors being added or removed, making the whole Raven training sustainable.
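A minimal sketch of a dynamic computation graph, in plain Python (purely illustrative; this is neither PyTorch's nor Raven's implementation): each operation records a graph node as it executes, so the graph's depth follows the input's length, which is exactly the property that suits variable-length sequences.

```python
# Toy dynamic computation graph: nodes are created as operations run,
# so the graph's shape can differ for every input.
class Node:
    def __init__(self, value, parents=(), grad_fn=None):
        self.value, self.parents, self.grad_fn = value, parents, grad_fn
        self.grad = 0.0

def mul(a, b):
    # Each call appends a node to the graph; a longer input sequence
    # therefore produces a deeper graph, with no placeholders declared.
    return Node(a.value * b.value, (a, b),
                lambda g: (g * b.value, g * a.value))

def backward(out):
    """Walk the recorded graph in reverse, accumulating gradients."""
    out.grad = 1.0
    stack = [out]
    while stack:
        node = stack.pop()
        if node.grad_fn:
            for parent, g in zip(node.parents, node.grad_fn(node.grad)):
                parent.grad += g
                stack.append(parent)

# Graph depth depends on input length, as in language modelling
xs = [Node(2.0), Node(3.0), Node(4.0)]
out = xs[0]
for x in xs[1:]:
    out = mul(out, x)        # the graph grows dynamically, element by element
backward(out)
print(out.value, xs[0].grad)  # 24.0 12.0
```

Because the graph is rebuilt on every forward pass, dropping or adding an input (or, by analogy, a contributor's chunk of work) simply yields a different graph next pass instead of invalidating a precompiled one.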
Raven is thus able to eliminate the latency and scalability issues of both approaches, distributing the training of deeper neural networks and their larger datasets without the added dependency of Model replication. The Model stays intact at the Master Node while the data is sharded into the tiniest snippets, and the heavy lifting is distributed across those data subsets over the network of contributors. The resulting gradients, calculated at the contributor nodes, are sent back to the Master Node.
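The flow described above can be sketched as follows. This is a hypothetical single-process illustration, not Raven's actual API: the model lives only on the master node, contributors return gradients computed on tiny data snippets, and the set of online contributors changes from step to step.

```python
import random

# Hypothetical sketch: the model (one weight w) stays intact on the
# master node; contributors compute gradients on tiny data snippets and
# may join or drop out between steps. All names are illustrative.
random.seed(0)

def contributor_gradient(w, snippet):
    """Runs on a contributor device: gradient of squared error for y = w * x."""
    return sum(2 * (w * x - y) * x for x, y in snippet) / len(snippet)

data = [(x, 2.0 * x) for x in range(1, 17)]   # target relationship: y = 2 * x
snippets = [data[i::8] for i in range(8)]     # tiny data subsets, one per device

w = 0.0  # the full model lives only on the master node
for step in range(400):
    # a random subset of contributors happens to be online this step
    online = random.sample(snippets, k=random.randint(4, 8))
    grads = [contributor_gradient(w, s) for s in online]
    w -= 0.005 * sum(grads) / len(grads)      # master applies the averaged update

print(round(w, 3))  # approaches 2.0 despite the fluctuating membership
```

The point of the sketch is that training still converges while contributors come and go, since the master only ever needs a batch of gradients per step, not any fixed roster of devices.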