Asia-Pacific Forum on Science Learning and Teaching, Volume 19, Issue 2, Article 17 (Dec., 2018)
Discussion and Recommendations
The construct validity analysis of the Arabic version of the STEBI-B instrument using Rasch analysis techniques revealed that the instrument is adequately valid and reliable. The item quality measures revealed that all items of both scales, OE and PE, fit the model expectations based on both MNSQ and ZSTD values. The person-item maps for both scales demonstrate that most persons tend to select options at or above the middle of the Likert scale. This may be due to language misinterpretation, differences between the Arabic and English languages, and the homogeneity of the study sample in terms of several variables. For instance, most participants were female students in the second year of a four-year preparation program; they had successfully completed about 2-3 college courses in school science and in methods of teaching science, and they had taken almost the same high school science courses.
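The item fit check mentioned above can be illustrated with a minimal sketch. Assuming the Andrich rating scale model, with person measures (`thetas`), an item difficulty (`delta`) and category thresholds (`taus`) already estimated in logits, the hypothetical helper `item_fit` below computes infit and outfit mean-squares (MNSQ) and their Wilson-Hilferty standardized forms (ZSTD) for one item. Estimating the measures themselves, which Rasch software does, is not shown. The bounds in `acceptable` (0.5 to 1.5 for MNSQ, |ZSTD| < 2) are a commonly cited rule of thumb assumed here for illustration, not necessarily the criteria applied in this study, and the data are invented.

```python
import math
from typing import Sequence

def category_probs(theta: float, delta: float, taus: Sequence[float]) -> list[float]:
    """P(X = k), k = 0..m, under the Andrich rating scale model (m = len(taus) >= 1)."""
    # Category k has log-numerator sum_{j<=k} (theta - delta - tau_j); category 0 has 0.
    logits = [0.0]
    for tau in taus:
        logits.append(logits[-1] + (theta - delta - tau))
    top = max(logits)  # shift by the max to avoid overflow in exp
    weights = [math.exp(v - top) for v in logits]
    total = sum(weights)
    return [w / total for w in weights]

def zstd(mnsq: float, q: float) -> float:
    """Wilson-Hilferty cube-root transform of a mean-square with model SD q."""
    return (mnsq ** (1 / 3) - 1) * (3 / q) + q / 3

def item_fit(responses: Sequence[int], thetas: Sequence[float],
             delta: float, taus: Sequence[float]) -> dict[str, float]:
    """Infit/outfit MNSQ and ZSTD for one item; responses are scored 0..m."""
    n = len(responses)
    sum_z2 = sum_sq = sum_w = out_terms = in_terms = 0.0
    for x, theta in zip(responses, thetas):
        p = category_probs(theta, delta, taus)
        e = sum(k * pk for k, pk in enumerate(p))             # expected score
        w = sum((k - e) ** 2 * pk for k, pk in enumerate(p))  # model variance
        c = sum((k - e) ** 4 * pk for k, pk in enumerate(p))  # fourth central moment
        sum_z2 += (x - e) ** 2 / w   # squared standardized residual
        sum_sq += (x - e) ** 2       # squared raw residual
        sum_w += w
        out_terms += c / w ** 2
        in_terms += c - w ** 2
    outfit = sum_z2 / n              # unweighted mean-square
    infit = sum_sq / sum_w           # information-weighted mean-square
    q_out = math.sqrt(max(out_terms / n ** 2 - 1 / n, 1e-12))
    q_in = math.sqrt(max(in_terms / sum_w ** 2, 1e-12))
    return {"infit_mnsq": infit, "infit_zstd": zstd(infit, q_in),
            "outfit_mnsq": outfit, "outfit_zstd": zstd(outfit, q_out)}

def acceptable(fit: dict[str, float]) -> bool:
    """Rule-of-thumb screen (assumed bounds): 0.5 <= MNSQ <= 1.5 and |ZSTD| < 2."""
    mnsq_ok = all(0.5 <= fit[k] <= 1.5 for k in ("infit_mnsq", "outfit_mnsq"))
    zstd_ok = all(abs(fit[k]) < 2 for k in ("infit_zstd", "outfit_zstd"))
    return mnsq_ok and zstd_ok

# Hypothetical data: 5-point responses rescored 0..4, illustrative measures only.
thetas = [-1.5, -0.8, -0.2, 0.4, 1.0, 1.7]
responses = [0, 1, 2, 2, 3, 4]
fit = item_fit(responses, thetas, delta=0.0, taus=[-1.5, -0.5, 0.5, 1.5])
print(fit, acceptable(fit))
```

Outfit weights every person equally, so it is sensitive to unexpected responses from persons far from the item; infit weights by model variance and so emphasizes persons near the item's difficulty. This is why the two are usually reported together.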
Moreover, the analysis of the response scale categories revealed that the sample responses to three items of the PE scale and seven items of the PE scale fail to increase with category value. This failure of the item responses to map the underlying trait in an ordered fashion is problematic. Pruski, Blanco, Riggs, Grimes, Fordtran, Barbola, Cornell & Lichtenstein (2013) argue that the potential sources of the observed disordered responses include:
1. The use of negative words and negatively phrased statements in the composition of the items:
(a) Respondents often misread negatives (e.g., ''not'') and interpret the item as if it were stated affirmatively.
(b) The use of negative-sounding phrases such as ''difficult to teach,'' ''might be better at,'' ''anxious when,'' and ''wish I understood better'' may invite a negative mindset or impart a defensive posture in respondents.
2. The choice of scaling in which Strongly Agree is assigned 1 and Strongly Disagree is assigned 5 is counterintuitive for many respondents.
3. Combined, the above conditions can create overlapping problems for respondents.
4. Another problem is introduced by the alternation of positively and negatively worded items: when the wording switches from one item to the next, respondents may inadvertently circle a ''4'' (Agree) rather than a ''2'' (Disagree), or vice versa.
5. There is debate about whether including the option ''undecided/uncertain'' is appropriate for persons already working in the field. For pre-service teachers, it makes sense that they might feel unprepared on any or all of these items; but in-service teachers, being in the field, should be able to indicate either that they ''can'' or ''cannot'' do something. In this case, offering ''undecided/uncertain'' on some items may have prompted respondents to avoid admitting a lack of skill or commitment. In addition, respondents showed another type of confusion, between choosing ''agree'' (2) and ''undecided/uncertain'' (3).
6. The use of modifiers such as ''typically able'' in item 3 and ''continually improvising'' in item 11 could add to respondent confusion: what is ''typical,'' and who does anything ''continually''?
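The category disordering described above, and its interaction with scale direction (point 2) and item alternation (point 4), can be illustrated with a short sketch. Assuming responses on a 1 to 5 Likert coding and person measures already estimated by Rasch software, the hypothetical helpers below reverse-code negatively worded items so that a higher score always means higher efficacy, then apply one common diagnostic: whether the average person measure rises with each successive category. A full analysis would also inspect the category thresholds reported by the software; the data below are invented for illustration.

```python
from collections import defaultdict
from statistics import mean
from typing import Sequence

def reverse_code(score: int, low: int = 1, high: int = 5) -> int:
    """Flip a Likert score so that a higher value always means more of the trait."""
    return low + high - score

def category_average_measures(responses: Sequence[int],
                              thetas: Sequence[float]) -> dict[int, float]:
    """Mean person measure (logits) for each observed category of one item."""
    groups: dict[int, list[float]] = defaultdict(list)
    for x, theta in zip(responses, thetas):
        groups[x].append(theta)
    return {k: mean(groups[k]) for k in sorted(groups)}

def is_ordered(avg_measures: dict[int, float]) -> bool:
    """True if average measures increase strictly from the lowest to the highest category."""
    values = [avg_measures[k] for k in sorted(avg_measures)]
    return all(a < b for a, b in zip(values, values[1:]))

# Hypothetical negatively worded item, assumed coded 1 = Strongly Disagree ... 5 = Strongly Agree,
# so agreeing (a high raw score) signals LOWER efficacy.
thetas = [-1.2, -0.4, 0.3, 0.9, 1.5]
raw = [5, 4, 3, 2, 1]

print(is_ordered(category_average_measures(raw, thetas)))       # False: not yet reversed
rescored = [reverse_code(x) for x in raw]
print(is_ordered(category_average_measures(rescored, thetas)))  # True after reverse-coding
```

If an item still shows disordered averages after it has been oriented correctly, the disordering reflects how respondents used the categories rather than a coding slip, which is the situation the sources listed above are meant to explain.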
In a review comparing efficacy research from 1986 to 1997 with that conducted from 1998 to 2009, Klassen, Tze, Betts & Gordon (2011, pp. 39–40) suggested four key directions for future efficacy research. There is a need to: (a) conduct qualitative studies to determine the sources of teacher efficacy, that is, how they ''form, develop, and change over time,'' which have yet to be fully researched and may vary over the career span and across cultures; (b) offer valid measurements, given the prevalence of invalid or ill-reported measurements in the research literature; (c) connect science teachers' self-efficacy to student outcomes; and (d) determine how teacher self-efficacy can be enhanced (e.g., through teacher professional development and teacher-researcher collaborations).
In conclusion, the Arabic version of the STEBI-B instrument is a valid and reliable measure, but some revisions are needed to improve the quality of the instrument's items, such as using positively worded statements and using a different rating scale for each of the two scales of the STEBI-B. Lin and Gorrell (2001) argued that the concept of teacher efficacy might be culturally oriented; thus, translated items need to be examined carefully when the instrument is applied in different cultures. Language editing of the item statements in the Arabic version is therefore highly needed, so that they are consistent with the characteristics of the Arabic language and reflect cultural differences.
Copyright (C) 2018 EdUHK APFSLT. Volume 19, Issue 2, Article 17 (Dec., 2018). All Rights Reserved.