randomname

joined 2 days ago

cross-posted from: https://scribe.disroot.org/post/1835428

cross-posted from: https://scribe.disroot.org/post/1835375

cross-posted from: https://scribe.disroot.org/post/1835374

DeepSeek-R1 is a blockbuster open-source model that is now at the top of the U.S. App Store.

As a Chinese company, DeepSeek is beholden to CCP policy. This is reflected even in the open-source model, prompting concerns about censorship and other influence.

Today we’re publishing a dataset of prompts covering sensitive topics that are likely to be censored by the CCP. These topics include perennial issues like Taiwanese independence, historical narratives around the Cultural Revolution, and questions about Xi Jinping.

...

 

cross-posted from: https://scribe.disroot.org/post/1839567

Here is the link to the report: https://graphika.com/reports/chinese-state-influence

A Chinese social media operation that aims to whip up political anger in the West has called for the overthrow of a foreign government while impersonating protesters criticising flood relief efforts in Spain, online analysis outfit Graphika said.

Graphika said an operation dubbed Spamouflage, which it believed was linked to the Chinese state, posed this month as human rights group Safeguard Defenders to spread online calls for the government to be toppled in response to the catastrophic floods in October that killed 224 people.

"This is the first time we have seen Spamouflage directly calling to overthrow a foreign government," Graphika said in its latest report.

...

The report also finds:

  • Chinese covert influence operations have impersonated human rights organizations critical of Beijing, almost certainly in an effort to discredit their activities and disrupt domestic political conversations in Western countries. The state-linked Spamouflage operation, for instance, has repeatedly targeted the Spain-based non-profit Safeguard Defenders and in January posed as the organization to spread online calls for the Spanish government to be overthrown in response to deadly floods in Valencia. This is the first time we have seen Spamouflage directly calling for the overthrow of a foreign government.
  • Chinese state influence actors and pro-China communities continue to leverage international trade issues in their efforts to advance Beijing’s strategic interests. In recent weeks, this has included attempts to orchestrate a boycott of Japanese retail brand Uniqlo due to the company’s reported refusal to use cotton from China’s Xinjiang region, and efforts to exacerbate tensions between the U.S. and Japan over a blocked steel company merger.
  • Chinese officials and state media have used social media and other online platforms to dismiss and deflect allegations of Chinese state hacking activity. After Japan accused China in January of orchestrating a years-long hacking campaign against Japanese government agencies and companies, for example, Chinese state actors spread statements dismissing the allegations as groundless and disseminated cartoons casting Tokyo as an agent of U.S. “disinformation.”
  • Overt and covert Chinese state influence actors have engaged in a sustained effort to advance narratives that reinforce Beijing’s territorial claims in the South China Sea and attempt to legitimize its activities in the region. In November, these actors amplified comments by an international law scholar that appeared to support China’s position.
[–] randomname@scribe.disroot.org 0 points 1 day ago* (last edited 1 day ago)

The guys at HF (and many others) appear to have a different understanding of Open Source.

As the Open Source AI Definition says, among other things:

Data Information: Sufficiently detailed information about the data used to train the system so that a skilled person can build a substantially equivalent system. Data Information shall be made available under OSI-approved terms.

  • In particular, this must include: (1) the complete description of all data used for training, including (if used) of unshareable data, disclosing the provenance of the data, its scope and characteristics, how the data was obtained and selected, the labeling procedures, and data processing and filtering methodologies; (2) a listing of all publicly available training data and where to obtain it; and (3) a listing of all training data obtainable from third parties and where to obtain it, including for fee.

Code: The complete source code used to train and run the system. The Code shall represent the full specification of how the data was processed and filtered, and how the training was done. Code shall be made available under OSI-approved licenses.

  • For example, if used, this must include code used for processing and filtering data, code used for training including arguments and settings used, validation and testing, supporting libraries like tokenizers and hyperparameters search code, inference code, and model architecture.

Parameters: The model parameters, such as weights or other configuration settings. Parameters shall be made available under OSI-approved terms.

The licensing or other terms applied to these elements and to any combination thereof may contain conditions that require any modified version to be released under the same terms as the original.

These three components (data, code, and parameters) shall be released under the same conditions.

[–] randomname@scribe.disroot.org 7 points 1 day ago (2 children)

Is Deepseek Open Source?

Hugging Face researchers are trying to build a more open version of DeepSeek’s AI ‘reasoning’ model

Hugging Face head of research Leandro von Werra and several company engineers have launched Open-R1, a project that seeks to build a duplicate of R1 and open source all of its components, including the data used to train it.

The engineers said they were compelled to act by DeepSeek’s “black box” release philosophy. Technically, R1 is “open” in that the model is permissively licensed, which means it can be deployed largely without restrictions. However, R1 isn’t “open source” by the widely accepted definition because some of the tools used to build it are shrouded in mystery. Like many high-flying AI companies, DeepSeek is loath to reveal its secret sauce.

[–] randomname@scribe.disroot.org -2 points 1 day ago (1 children)

I feel safer knowing that my data is not in a country where the company can use it against me

Where is this country that can't use your data against you?
