Anthropic, a San Francisco-based start-up, has developed "constitutional classifiers" to protect against harmful content generated by AI models such as its Claude chatbot, as tech giants including Microsoft and Meta work to address the risks posed by AI technology. The new system acts as a safeguard layer on top of a language model, monitoring both inputs and outputs for dangerous information. Anthropic offered bug bounties to testers who attempted to bypass the security measures; with the classifiers in place, the system rejected more than 95% of bypass attempts.
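
The architecture described here, a classifier layer screening both what goes into the model and what comes out, can be illustrated with a minimal sketch. Every name below (classify_prompt, classify_response, generate, THRESHOLD) is a hypothetical stand-in for illustration, not Anthropic's actual implementation or API.

```python
# Minimal sketch of an input/output safeguard layer, as described above:
# classifiers screen both the user's prompt and the model's response.
# All functions and the threshold are illustrative assumptions.

from dataclasses import dataclass

THRESHOLD = 0.5  # assumed probability cutoff for flagging harmful content


@dataclass
class Verdict:
    harmful_score: float  # classifier's estimated probability of harm


def classify_prompt(prompt: str) -> Verdict:
    """Stand-in for an input classifier trained against a set of rules."""
    return Verdict(harmful_score=0.0)  # placeholder: a real classifier scores here


def classify_response(response: str) -> Verdict:
    """Stand-in for an output classifier that screens generated text."""
    return Verdict(harmful_score=0.0)  # placeholder


def generate(prompt: str) -> str:
    """Stand-in for the underlying language model."""
    return "model response"


def guarded_generate(prompt: str) -> str:
    # Reject dangerous prompts before they ever reach the model.
    if classify_prompt(prompt).harmful_score > THRESHOLD:
        return "Request declined by input classifier."
    response = generate(prompt)
    # Screen the output too: a benign-looking prompt can still elicit
    # harmful content, which is the jailbreak case the testers probed.
    if classify_response(response).harmful_score > THRESHOLD:
        return "Response withheld by output classifier."
    return response


if __name__ == "__main__":
    print(guarded_generate("How do clouds form?"))
```

The two-sided check is the key design point suggested by the article: filtering inputs alone misses attacks that smuggle a harmful request past the prompt screen, so the generated output is screened independently before it is returned.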