Calls for AI models to be stored on Bitcoin gain traction


Large language models (LLMs) could soon be stored on Bitcoin as a way to distribute publicly-trained LLMs securely and anonymously.

An AI developer has urged fellow developers to disseminate LLaMA, Meta’s leaked LLM, and Alpaca, a model built on it, via BitTorrent and Bitcoin.

The process aims to democratise generative AI and would involve users publishing AI model data via a torrent and then inscribing it onto the Bitcoin blockchain through a process known as Ordinals, which allows data to be stored on top of Bitcoin.

Once the files have been successfully uploaded, they could become accessible to any user around the world in a decentralised manner.

Torrent files are linked via magnet URIs, which identify files by cryptographic hash rather than by location. These have been bundled together into a ‘goatfile’ - YAML text that can be inscribed onto $10-20 (£8-16) of Bitcoin - in order to anonymously direct users to the leaked LLaMA files.

Any user who has the Bitcoin address of the model holder would be able to query Bitcoin for the holder’s transaction history, and use the goatfile to access the torrented models via the bundled magnet URIs.
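To illustrate how a magnet URI identifies content by hash rather than by location, here is a minimal Python sketch that pulls the BitTorrent info hash, display name, and trackers out of such a link. The example URI, its hash, and the tracker address are placeholders invented for illustration, and the function name `parse_magnet` is hypothetical. This is not the goatfile format itself, whose exact layout is not described here.

```python
from urllib.parse import urlparse, parse_qs


def parse_magnet(uri: str) -> dict:
    """Extract the info hash, display name, and trackers from a magnet URI."""
    parsed = urlparse(uri)
    if parsed.scheme != "magnet":
        raise ValueError("not a magnet URI")

    # parse_qs percent-decodes values and groups repeated keys into lists.
    params = parse_qs(parsed.query)

    # The 'xt' (exact topic) parameter carries the content hash.
    prefix = "urn:btih:"
    info_hash = None
    for topic in params.get("xt", []):
        if topic.lower().startswith(prefix):
            info_hash = topic[len(prefix):].lower()
            break

    return {
        "info_hash": info_hash,
        "name": params.get("dn", [None])[0],
        "trackers": params.get("tr", []),
    }


# Placeholder link: the hash below is made up and points at no real torrent.
EXAMPLE_URI = (
    "magnet:?xt=urn:btih:" + "ab" * 20
    + "&dn=example-model"
    + "&tr=udp%3A%2F%2Ftracker.example.org%3A1337"
)

if __name__ == "__main__":
    print(parse_magnet(EXAMPLE_URI))
```

Nothing in the link says where the files are hosted. Any peer holding data that matches the hash can serve it, which is why a short text record of magnet URIs is enough to point users at much larger files.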

The developer tweeted the instructions to widespread interest on 23 March. They noted that once one person has followed all the steps, the models will be forever accessible online.

In this way, blockchain technology could be used to distribute publicly-trained LLMs securely and anonymously.

Meta’s LLaMA LLM was leaked online in early March, and quickly became available via torrent as well as through the imageboard website 4chan.

The model, which Meta made available on a request-by-request basis, is a versatile build capable of processing 20 languages that has performed well against AI benchmarks.

A key benefit of LLaMA in comparison to competing models such as LaMDA or GPT-3 and GPT-4 is its small size. The model is available in sizes of 7 billion, 13 billion, 33 billion, and 65 billion parameters, and users can run the smallest of these on consumer hardware as affordable as a single GPU.
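As a rough back-of-the-envelope check on why the smaller sizes suit consumer hardware, the sketch below estimates the memory needed to hold the weights alone at a few numeric precisions. The bytes-per-parameter figures are standard for those formats, but which precision a given user would run is an assumption. The estimate ignores activations and other runtime overhead, so real requirements would be higher.

```python
# Parameter counts quoted above for the four LLaMA sizes.
PARAM_COUNTS = {"7B": 7e9, "13B": 13e9, "33B": 33e9, "65B": 65e9}

# Bytes needed to store one parameter at each precision (assumed options).
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_size_gb(n_params: float, precision: str) -> float:
    """Approximate size of the weights alone, in gigabytes (10**9 bytes)."""
    return n_params * BYTES_PER_PARAM[precision] / 1e9


if __name__ == "__main__":
    for name, count in PARAM_COUNTS.items():
        sizes = ", ".join(
            f"{precision}: {weight_size_gb(count, precision):.1f} GB"
            for precision in BYTES_PER_PARAM
        )
        print(f"{name} -> {sizes}")
```

By this estimate the 7 billion-parameter model needs about 14 GB for its weights at 16-bit precision, while the 65 billion-parameter model needs about 130 GB.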

A debate over democratised AI

While generative AI has long been the remit of academia, the past year has seen rapid expansion into the field by private companies such as OpenAI, Microsoft, Google, Meta, and Nvidia.

The development of generative AI has fallen largely into the hands of big tech companies, partly due to the need for hyperscaler infrastructure in order to adequately train models containing many billions of parameters.

There are some exceptions to this trend. AWS and Hugging Face have partnered to ‘democratise’ ML and AI and, through LLaMA, Meta intended to allow approved parties to develop and implement LLMs widely at no cost.

Localising models to AWS, Azure, or Google Cloud architecture will allow developers to complete vast training in short timeframes at manageable cost.

But open source developers, some of whom regard any limitations on software access as an existential threat to the standard, oppose the limitations on access being enforced by such companies.

In the face of this, the LLaMA leak on 4chan radically altered the landscape for the public availability of LLMs. Although the model is far from plug-and-play in its raw form, the fact that it can be run locally means that the model can be used to easily power any number of systems.


There are concerns that the lightweight and powerful model could be used for malicious purposes, and with access achievable through the blockchain it will not be possible to identify which threat actors have obtained the model.

“Anyone can fine-tune this model for anything they want now,” tweeted information security consultant Jeffrey Ladish.

“Fine tune it on 4chan and get endless racist trash. Want a model that constantly tries to gaslight you in subtle ways? Should be achievable. Phishing, scams, and spam are my immediate concerns, but we'll see…”

Just how much of a benefit or a risk some of these open source systems could be has yet to be seen.

Stanford’s Alpaca model, its own academic iteration built on the 7 billion-parameter version of LLaMA, demonstrated that LLaMA is a viable foundation for competing with GPT-3.

It also showed that fine-tuning LLaMA on instruction-following examples generated by OpenAI’s text-davinci-003 model could produce marked improvements in the resulting model.

Rory Bathgate is Features and Multimedia Editor at ITPro, overseeing all in-depth content and case studies. He can also be found co-hosting the ITPro Podcast with Jane McCallion, swapping a keyboard for a microphone to discuss the latest learnings with thought leaders from across the tech sector.

In his free time, Rory enjoys photography, video editing, and good science fiction. After graduating from the University of Kent with a BA in English and American Literature, Rory undertook an MA in Eighteenth-Century Studies at King’s College London. He joined ITPro in 2022 as a graduate, following four years in student journalism. You can contact Rory at rory.bathgate@futurenet.com or on LinkedIn.
