

Re-leveling Books Using AI


Question

[This question comes from a regional BOCES.]

Our technology integration specialist suggested that we use an AI tool to re-level books/text by an original author to a more appropriate reading level for students who are struggling. This is now being used regularly with our special education staff for students who are struggling readers. Is this an infringement of copyright?

Answer

In the spirit of learning, I am going to answer this question in a multiple-choice quiz.  For purposes of the quiz, we’ll use the member’s term “re-level” for generating simplified versions of curricular materials.

[NOTE: If you are not feeling playful and just need the answer, please read footnote #2 and skip to the “Final Paragraphs” section of this response.]

Name: ______________________________          Date: ______________

Copyright Quiz

 

  1. A teacher uses software[1] to create a “re-levelled” version of “The Gettysburg Address,” which was published before 1900. Is it infringement?
     A. Yes, because creating a “re-levelled” version of a book is creating a “derivative work”[2] protected by the Copyright Act.
     B. No, because even if it is a derivative work, the book is no longer protected by copyright.
     C. Maybe, if the work was recently turned into a movie.
  2. A teacher uses software to create a “re-levelled” version of the 2020 young adult book All Boys Aren’t Blue, and the district does not have the permission of the copyright owner. Is it infringement?
     A. No, because the use is for education.
     B. No, because the software removes all the parts people are complaining to the school board about.
     C. Yes.
  3. A teacher uses software to create a re-levelled version of a New York Times article for a learning-disabled student, and the district does not have the permission of the copyright owner. The teacher only allows access to the student. Is it infringement?
     A. No, because the simpler version is a modification of a single article to accommodate a person with a disability.
     B. No, because the district is a state institution that is arguably exempt from copyright claims in federal court.
     C. Yes.
  4. A teacher uses software to “re-level” a short excerpt of a history textbook to illustrate the dangers of relying on AI to modify learning content, and the district does not have the permission of the copyright owner. The class is given a hard copy of the modified paragraph with the unmodified paragraph next to it for comparison, and the assignment is also posted on the class’s LMS[3]. Is it infringement?
     A. Yes, but kudos to the teacher for emphasizing critical thinking.
     B. No, so long as the excerpt is only long enough to demonstrate the point of the modification and is not used as a substitute for the original, allowing it to be considered a “fair use”.
     C. No, not even when the district decides it likes the modified version better and decides to re-level the entire book.
  5. A teacher uses software to re-level an entire collection of curricular materials with permission of the publisher, who is not the copyright owner but has an unlimited exclusive license to authorize “derivative works” of the content. Is it infringement?
     A. No, but I am concerned this type of thing could dull our vigilance against the prospect of a future subject to the binary whim of robot overlords.
     B. Yes, because there is no specific permission from the actual author.
     C. No.

 

 

Answer Key:

  1. B
  2. C
  3. C
  4. B
  5. A or C, depending on your POV.

Final Paragraphs

As the above quiz scenarios illustrate, the answer to the member’s question is: it depends on a variety of factors. But even if the use is limited to a specific student with an IEP[4], the only ways to ensure the creation/use of an AI-modified version of an entire work is not an infringing “derivative work” are to: 1) only modify works in the public domain; OR 2) only modify works for which a district has specific permission to create derivative works.

The sole exception to this would be a modification that met the criteria for “fair use”[5] (as modelled in question 4).

I will (mostly) leave the ethical/educational/social/futuristic terror aspects of this question to philosophers,[6] ethicists, educators, Writers Guild members, artists, and speculative fiction writers.

That said, if someone uses AI to “re-level” this answer for a 4-year-old, I hope the modified version will be: “Don’t use people’s work without permission, and please don’t give up on people.”

 

[1] I am going to use the term “software” since the function described could be done by “AI” or (I believe) could be done by a sophisticated “find-and-replace” computer program. In making this distinction, I rely on the definition of “Artificial Intelligence” in 15 USCS 9401, which defines AI as: “… a machine-based system that can, for a given set of human-defined objectives, make predictions, recommendations or decisions influencing real or virtual environments. Artificial intelligence systems use machine and human-based inputs to—

(A) perceive real and virtual environments;

(B) abstract such perceptions into models through analysis in an automated manner; and

(C) use model inference to formulate options for information or action.”

[2] A “derivative” work is a defined term in Section 101 of the Copyright Act. The definition is: “[A] work based upon one or more preexisting works, such as a translation, musical arrangement, dramatization, fictionalization, motion picture version, sound recording, art reproduction, abridgment, condensation, or any other form in which a work may be recast, transformed, or adapted.” An excellent discussion of how AI-generated output can (or might not) be a “derivative work” can be found in the case Andersen v. Stability AI Ltd., 23-cv-00201-WHO (N.D. Cal. Oct. 30, 2023).

[3] “Learning management system.”

[4] An IEP is an “Individualized Education Program” (as I am sure many people reading this know). While modified formats of copyright-protected works can be generated to meet the needs of a person with an IEP (for instance, generating a Braille edition of a printed book), creating a “derivative” work (basically, a simpler or “re-levelled” version of the original work) does not currently fall within this exception to infringement.

[5] “Fair use” is defined by Section 107 of the Copyright Act. For more on fair use, check out the “fair use” tags on Ask the Lawyer, and for educators, review your institution’s “fair use” policy.

[6] I will share a personal story, though. The other day (specifically, “the other day” in November 2023), my 4th-grader came home with a one-page read-aloud assignment called “The Man Who Lived in a Hollow Tree.” It was such an incoherent mishmash, I decided to research what the heck was going on. By dint of research, I found that the one-page assignment was most likely an abridged version of “The Man Who Lived in a Hollow Tree” (reviewed at https://www.goodreads.com/en/book/show/3866740), except the modified version left out critical facts like the main character’s name, the fact that he was a carpenter, and why he chose to live in a tree. I found myself wondering, “Who the heck wrote this?” And now, perhaps, I know.

Privacy And Zoom's AI


Question

Recently, Zoom introduced new AI features and updated their terms of service agreement, indicating that any user data can be used to train their AI products (TOS 10.4: https://explore.zoom.us/en/terms/). There was a backlash and Zoom quickly put out a clarification and stated that these features are opt-in only (https://blog.zoom.us/zooms-term-service-ai/). Despite this clarification, I am wondering if there are any privacy or FERPA concerns that librarians and educators need to be worried about since Zoom is still used heavily in both library and school worlds. Should we be looking for alternatives or is this just the way of the world now?

Answer

The day this story really broke (August 7, 2023, a day that will live in minor infamy), Nathan in my office pointed this issue out to me.

"Did you see that Zoom is going to use customer content to train AI?" he asked (this is what passes for casual morning conversation in my office).

My eyebrows went up, mostly because Zoom was being upfront about it, rather than because it was being done at all (because yes, this is the way of the world now).  That said, there are some tricks libraries and educators—and any business that cares about use of personal data—can employ to resist it.

Not surprisingly, this comes down to two simple things: awareness, and language.

We'll use the recent Zoom scenario to illustrate:

I am not sure how awareness of the new clause first broke (I am going to outsource that research to Nathan, and if he finds out, he'll put it in a footnote here[1]).  But it is clear that fairly soon, consumers were unambiguously aware of the privacy and use concerns posed by the "we'll suck you into our AI" Terms of Use.

Here is the language Zoom used[2] (and has since retracted) to announce it would use our conferences, etc. to train AI:

"[You agree Zoom can use your Content] ... for the purpose of product and service development, marketing, analytics, quality assurance, machine learning, artificial intelligence, training, testing, improvement of the Services, Software, or Zoom's other products, services, and software, or any combination thereof..."

This is where language comes in.

As the world soon knew, this "old" language listed "artificial intelligence" as well as "training" (although the Terms' dubious use of commas suggests to me that Zoom could use our Content for training not just AI, but humans, too... actually an even more terrifying prospect, from some perspectives).[3]  So yes, there is lots to be concerned about when it comes to "Customer Content" (which is Zoom’s term for the recordings/data/analytics that come from "Customer Input", the raw content you put into Zoom[4]).

Now let's use our awareness of the current Terms of Use (current as of August 24, 2023, at least), and see what the language says:

"10.2 Permitted Uses and Customer License Grant. Zoom will only access, process or use Customer Content for the following reasons (the “Permitted Uses”): (i) consistent with this Agreement and as required to perform our obligations and provide the Services; (ii) in accordance with our Privacy Statement; (iii) as authorized or instructed by you; (iv) as required by Law; or (v) for legal, safety or security purposes, including enforcing our Acceptable Use Guidelines. You grant Zoom a perpetual, worldwide, non-exclusive, royalty-free, sublicensable, and transferable license and all other rights required or necessary for the Permitted Uses."

Although not as stark as the old language, there is still a lot of wiggle room to squeeze a blending of Customer Content with AI there.  What if Zoom is "obligated" to provide a service, and decides to use AI to do it?  What if Zoom decides AI is needed for "enforcing Acceptable Use Guidelines?"  What if Zoom decides that AI is needed for your safety, and that, also for your safety, Customer Content must be used to train that AI?

Of course, right now, the Terms also say (in bold, so you know they mean it[5]):

"Zoom does not use any of your audio, video, chat, screen sharing, attachments or other communications-like Customer Content (such as poll results, whiteboard and reactions) to train Zoom or third-party artificial intelligence models".

So can this assurance be trusted?  This brings us back to language.

Back in the day, of course, computer systems were not "trained" (as one would train a dog, or a small child to use the toilet) but rather, "programmed."

However, even in the (relatively) slow-moving world of the law, this is no longer the case.

Here is an excerpt from a recent case[6] where lawyers were squabbling over how to gather "Electronically Stored Evidence" ("ESI"):

Defendants propose the following method for searching and producing relevant ESI:

1) Narrow the existing universe of approximately 27,000 documents...

2) Undersigned counsel reviews a statistically significant sample of the remaining e-mails at issue and marks them relevant/irrelevant to create a "training set;"

3) That training set is then used to "train" the eDiscovery vendor's artificial intelligence/predictive coding tool, which "reviews" the remaining e-mails and assigns each a percentage-based score that measures likelihood to be responsive...

So even in the law, computer systems are being "trained", and there is a precise meaning to the term (which in plain[7] terms is "repeatedly using data and parameters to create patterns desired by the user").
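For readers who want to see what that plain-terms definition looks like in practice, here is a toy sketch of the "label a sample, train, then score the rest" workflow the court describes. This is an illustrative simplification I wrote for this answer (a crude keyword-frequency scorer), not the eDiscovery vendor's actual predictive-coding tool:

```python
# Toy "predictive coding" sketch: learn word patterns from a small
# labeled training set, then score new documents for likely relevance.
from collections import Counter

def train(labeled_docs):
    """Count word frequencies per label -- the learned 'pattern'."""
    model = {"relevant": Counter(), "irrelevant": Counter()}
    for text, label in labeled_docs:
        model[label].update(text.lower().split())
    return model

def score(model, text):
    """Rough likelihood (0 to 1) that a new document is relevant."""
    words = text.lower().split()
    rel = sum(model["relevant"][w] for w in words) + 1
    irr = sum(model["irrelevant"][w] for w in words) + 1
    return rel / (rel + irr)

# Counsel reviews a sample and marks each document relevant/irrelevant:
training_set = [
    ("invoice for produce delivery", "relevant"),
    ("delivery schedule and invoice terms", "relevant"),
    ("office birthday party friday", "irrelevant"),
    ("parking lot repaving notice", "irrelevant"),
]
model = train(training_set)

# The "trained" model then scores the remaining e-mails:
print(round(score(model, "question about the delivery invoice"), 2))  # → 0.83
```

Real tools use far more sophisticated statistical models, but the shape is the same: repeated exposure to human-labeled data produces the pattern the user wants.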

So, with all that said, let's look at the member's questions:

Question 1: I am wondering if there are any privacy or FERPA concerns that librarians and educators need to be worried about since Zoom is still used heavily in both library and school worlds.

The short answer is: yes.

Question 2: Should we be looking for alternatives or is this just the way of the world now?

The short answer is: yes.

Here is the reason for my first short answer:  Many contracts have what I call a "we were just kidding" clause that allows the contractor to change their terms at will, and without notice.  Here is the one in the current version of Zoom:

15.2 Other Changes. You agree that Zoom may modify, delete, and make additions to its guides, statements, policies, and notices, with or without notice to you, and for similar guides, statements, policies, and notices applicable to your use of the Services by posting an updated version on the applicable webpage. In most instances, you may subscribe to these webpages using an authorized email in order to receive certain updates to policies and notices.

What does this mean?  Even though they are in bold, Zoom can change its assurance on AI at any time.

The reason for my second short answer is this: Libraries and education institutions have incredible commercial leverage when they work together.  For this reason, libraries and educational institutions should always be using their awareness of data, ethics, use, and privacy issues to demand contract language that meets their expectations.

Those expectations will change from product to product. With a product like Zoom, which can generate audio, video, text, analytics, and more, including content that may later be part of a student file (FERPA) or a library record (protected under various state laws), the assurances should be:

  • All content entered is the property of the customer (library or school);
  • At all times, all content entered into the service, or content generated with the use of customer-supplied content, may only be used to provide the current service(s) specifically authorized by the customer;
  • Any other use of data (for product improvement, for marketing) must be via a specific opt-in;
  • Terms cannot change without notice, and the terms in effect at the time content was generated will govern that content, regardless of future changes;
  • Customers can receive assurance that all data is purged upon request; and
  • Customers can verify that they can enforce and comply with all their own internal policies and obligations regarding data creation, use, and storage.

In addition, libraries and educational institutions should have a clear set of policies for how they, as the potential owners of recordings and other data associated with the use, will use their ownership and control of the content.  It would be unfortunate, to say the least, for a student to find that their college disciplinary hearing for underage drinking is now available on YouTube.[8]

Many public library groups and academic consortia are already working to develop criteria of this type[9] (which should focus more on isolating aspirations and expectations than on legal wording, since legal wording will vary from state to state). And some institutions are designing their own services[10] in order to avoid contract terms that don't meet their criteria.

At the individual institutional level, this means building assessment of such services, and bargaining time, into the procurement process.  It also means thinking through that institution's own particular ethics and responsibilities and developing internal policies to promote them.

So, while this is the world we live in, libraries and educational institutions are well-situated to make a better one. 

Thanks for an important question.

 

 

[1] It may have been first pointed out by an anonymous user of the Reddit-like website Hacker News (https://news.ycombinator.com/item?id=37021160). This story (https://stackdiary.com/zoom-terms-now-allow-training-ai-on-user-content-with-no-opt-out/), published the same day, was shared on Twitter the next day.

[2] We didn't Wayback this.  On the day Nathan informed me of this, I asked him to pull the Terms off the site, so I could review.  We got the question to "Ask the Lawyer" about a week later.  Sometimes things just work out.

[3] What perspectives?  Ethical, moral, psychological, legal, to name a few.

[4] Definition is from paragraph "10" of the Zoom Terms of Use in effect on 8/7/2023.

[5] Like all things in law, the rules on use and interpretation of bold, underline, and italics vary from state to state.  I am not kidding.  For a great book on typography and legal writing, check out Matthew Butterick's "Typography for Lawyers."

[6] Maurer v. Sysco Albany, LLC, 2021 U.S. Dist. LEXIS 100351

[7] I trust it is painfully obvious I am not a programmer.

[8] An extreme example...then again, think of the use people have tried to make of old letters, files, and yearbooks.  Also, do we think YouTube will make it to 2033?